author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-07-09 15:34:26 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-07-09 15:34:26 -0400 |
commit | e9a83bd2322035ed9d7dcf35753d3f984d76c6a5 (patch) | |
tree | 66dc466ff9aec0f9bb7f39cba50a47eab6585559 /Documentation/x86 | |
parent | 7011b7e1b702cc76f9e969b41d9a95969f2aecaa (diff) | |
parent | 454f96f2b738374da4b0a703b1e2e7aed82c4486 (diff) |
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
"It's been a relatively busy cycle for docs:
- A fair pile of RST conversions, many from Mauro. These create more
than the usual number of simple but annoying merge conflicts with
other trees, unfortunately. He has a lot more of these waiting on
the wings that, I think, will go to you directly later on.
- A new document on how to use merges and rebases in kernel repos,
and one on Spectre vulnerabilities.
- Various improvements to the build system, including automatic
markup of function() references because some people, for reasons I
will never understand, were of the opinion that
:c:func:``function()`` is unattractive and not fun to type.
- We now recommend using sphinx 1.7, but still support back to 1.4.
- Lots of smaller improvements, warning fixes, typo fixes, etc"
* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
docs: automarkup.py: ignore exceptions when seeking for xrefs
docs: Move binderfs to admin-guide
Disable Sphinx SmartyPants in HTML output
doc: RCU callback locks need only _bh, not necessarily _irq
docs: format kernel-parameters -- as code
Doc : doc-guide : Fix a typo
platform: x86: get rid of a non-existent document
Add the RCU docs to the core-api manual
Documentation: RCU: Add TOC tree hooks
Documentation: RCU: Rename txt files to rst
Documentation: RCU: Convert RCU UP systems to reST
Documentation: RCU: Convert RCU linked list to reST
Documentation: RCU: Convert RCU basic concepts to reST
docs: filesystems: Remove uneeded .rst extension on toctables
scripts/sphinx-pre-install: fix out-of-tree build
docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
Documentation: PGP: update for newer HW devices
Documentation: Add section about CPU vulnerabilities for Spectre
Documentation: platform: Delete x86-laptop-drivers.txt
docs: Note that :c:func: should no longer be used
...
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- | Documentation/x86/index.rst | 1 |
-rw-r--r-- | Documentation/x86/protection-keys.rst | 99 |
-rw-r--r-- | Documentation/x86/resctrl_ui.rst | 30 |
-rw-r--r-- | Documentation/x86/x86_64/5level-paging.rst | 2 |
-rw-r--r-- | Documentation/x86/x86_64/boot-options.rst | 4 |
-rw-r--r-- | Documentation/x86/x86_64/fake-numa-for-cpusets.rst | 2 |
6 files changed, 22 insertions, 116 deletions
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index ae36fc5fc649..f2de1b2d3ac7 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -19,7 +19,6 @@ x86-specific Documentation | |||
19 | tlb | 19 | tlb |
20 | mtrr | 20 | mtrr |
21 | pat | 21 | pat |
22 | protection-keys | ||
23 | intel_mpx | 22 | intel_mpx |
24 | amd-memory-encryption | 23 | amd-memory-encryption |
25 | pti | 24 | pti |
diff --git a/Documentation/x86/protection-keys.rst b/Documentation/x86/protection-keys.rst
deleted file mode 100644
index 49d9833af871..000000000000
--- a/Documentation/x86/protection-keys.rst
+++ /dev/null
@@ -1,99 +0,0 @@ | |||
1 | .. SPDX-License-Identifier: GPL-2.0 | ||
2 | |||
3 | ====================== | ||
4 | Memory Protection Keys | ||
5 | ====================== | ||
6 | |||
7 | Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature | ||
8 | which is found on Intel's Skylake "Scalable Processor" Server CPUs. | ||
9 | It will be avalable in future non-server parts. | ||
10 | |||
11 | For anyone wishing to test or use this feature, it is available in | ||
12 | Amazon's EC2 C5 instances and is known to work there using an Ubuntu | ||
13 | 17.04 image. | ||
14 | |||
15 | Memory Protection Keys provides a mechanism for enforcing page-based | ||
16 | protections, but without requiring modification of the page tables | ||
17 | when an application changes protection domains. It works by | ||
18 | dedicating 4 previously ignored bits in each page table entry to a | ||
19 | "protection key", giving 16 possible keys. | ||
20 | |||
21 | There is also a new user-accessible register (PKRU) with two separate | ||
22 | bits (Access Disable and Write Disable) for each key. Being a CPU | ||
23 | register, PKRU is inherently thread-local, potentially giving each | ||
24 | thread a different set of protections from every other thread. | ||
25 | |||
26 | There are two new instructions (RDPKRU/WRPKRU) for reading and writing | ||
27 | to the new register. The feature is only available in 64-bit mode, | ||
28 | even though there is theoretically space in the PAE PTEs. These | ||
29 | permissions are enforced on data access only and have no effect on | ||
30 | instruction fetches. | ||
31 | |||
32 | Syscalls | ||
33 | ======== | ||
34 | |||
35 | There are 3 system calls which directly interact with pkeys:: | ||
36 | |||
37 | int pkey_alloc(unsigned long flags, unsigned long init_access_rights) | ||
38 | int pkey_free(int pkey); | ||
39 | int pkey_mprotect(unsigned long start, size_t len, | ||
40 | unsigned long prot, int pkey); | ||
41 | |||
42 | Before a pkey can be used, it must first be allocated with | ||
43 | pkey_alloc(). An application calls the WRPKRU instruction | ||
44 | directly in order to change access permissions to memory covered | ||
45 | with a key. In this example WRPKRU is wrapped by a C function | ||
46 | called pkey_set(). | ||
47 | :: | ||
48 | |||
49 | int real_prot = PROT_READ|PROT_WRITE; | ||
50 | pkey = pkey_alloc(0, PKEY_DISABLE_WRITE); | ||
51 | ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); | ||
52 | ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey); | ||
53 | ... application runs here | ||
54 | |||
55 | Now, if the application needs to update the data at 'ptr', it can | ||
56 | gain access, do the update, then remove its write access:: | ||
57 | |||
58 | pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE | ||
59 | *ptr = foo; // assign something | ||
60 | pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again | ||
61 | |||
62 | Now when it frees the memory, it will also free the pkey since it | ||
63 | is no longer in use:: | ||
64 | |||
65 | munmap(ptr, PAGE_SIZE); | ||
66 | pkey_free(pkey); | ||
67 | |||
68 | .. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions. | ||
69 | An example implementation can be found in | ||
70 | tools/testing/selftests/x86/protection_keys.c. | ||
71 | |||
72 | Behavior | ||
73 | ======== | ||
74 | |||
75 | The kernel attempts to make protection keys consistent with the | ||
76 | behavior of a plain mprotect(). For instance if you do this:: | ||
77 | |||
78 | mprotect(ptr, size, PROT_NONE); | ||
79 | something(ptr); | ||
80 | |||
81 | you can expect the same effects with protection keys when doing this:: | ||
82 | |||
83 | pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ); | ||
84 | pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey); | ||
85 | something(ptr); | ||
86 | |||
87 | That should be true whether something() is a direct access to 'ptr' | ||
88 | like:: | ||
89 | |||
90 | *ptr = foo; | ||
91 | |||
92 | or when the kernel does the access on the application's behalf like | ||
93 | with a read():: | ||
94 | |||
95 | read(fd, ptr, 1); | ||
96 | |||
97 | The kernel will send a SIGSEGV in both cases, but si_code will be set | ||
98 | to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when | ||
99 | the plain mprotect() permissions are violated. | ||
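The removed protection-keys.rst text above describes PKRU as holding two bits per key (Access Disable and Write Disable) and refers to a pkey_set() wrapper around RDPKRU/WRPKRU. The following is only a minimal sketch of the bit arithmetic such a wrapper would need, assuming the Access Disable bit sits at position 2*pkey and the Write Disable bit at 2*pkey+1. The names `sketch_pkru_update` and the `SKETCH_*` constants are hypothetical and are not taken from the kernel selftest; the sketch neither reads nor writes the real register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local names; the values mirror PKEY_DISABLE_ACCESS (0x1)
 * and PKEY_DISABLE_WRITE (0x2) from the pkeys userspace API. */
#define SKETCH_PKEY_DISABLE_ACCESS 0x1u
#define SKETCH_PKEY_DISABLE_WRITE  0x2u
#define SKETCH_PKRU_BITS_PER_PKEY  2u
#define SKETCH_NR_PKEYS            16

/*
 * Return a new PKRU value in which the two bits belonging to 'pkey'
 * are replaced by 'rights'. All other keys keep their current bits.
 * A real pkey_set() would obtain 'pkru' with RDPKRU and store the
 * result with WRPKRU; that part is deliberately left out here.
 */
static uint32_t sketch_pkru_update(uint32_t pkru, int pkey, uint32_t rights)
{
    assert(pkey >= 0 && pkey < SKETCH_NR_PKEYS);

    unsigned int shift = (unsigned int)pkey * SKETCH_PKRU_BITS_PER_PKEY;
    uint32_t mask = 0x3u << shift;          /* both bits of this key */

    return (pkru & ~mask) | ((rights & 0x3u) << shift);
}
```

Under these assumptions, clearing write protection for a key corresponds to calling the helper with `rights` set to 0, and restoring it corresponds to passing `SKETCH_PKEY_DISABLE_WRITE` again, which matches the pkey_set(pkey, 0) / pkey_set(pkey, PKEY_DISABLE_WRITE) sequence in the removed text.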
diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst
index 225cfd4daaee..5368cedfb530 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl_ui.rst
@@ -40,7 +40,7 @@ mount options are: | |||
40 | Enable the MBA Software Controller(mba_sc) to specify MBA | 40 | Enable the MBA Software Controller(mba_sc) to specify MBA |
41 | bandwidth in MBps | 41 | bandwidth in MBps |
42 | 42 | ||
43 | L2 and L3 CDP are controlled seperately. | 43 | L2 and L3 CDP are controlled separately. |
44 | 44 | ||
45 | RDT features are orthogonal. A particular system may support only | 45 | RDT features are orthogonal. A particular system may support only |
46 | monitoring, only control, or both monitoring and control. Cache | 46 | monitoring, only control, or both monitoring and control. Cache |
@@ -118,7 +118,7 @@ related to allocation: | |||
118 | Corresponding region is pseudo-locked. No | 118 | Corresponding region is pseudo-locked. No |
119 | sharing allowed. | 119 | sharing allowed. |
120 | 120 | ||
121 | Memory bandwitdh(MB) subdirectory contains the following files | 121 | Memory bandwidth(MB) subdirectory contains the following files |
122 | with respect to allocation: | 122 | with respect to allocation: |
123 | 123 | ||
124 | "min_bandwidth": | 124 | "min_bandwidth": |
@@ -209,7 +209,7 @@ All groups contain the following files: | |||
209 | CPUs to/from this group. As with the tasks file a hierarchy is | 209 | CPUs to/from this group. As with the tasks file a hierarchy is |
210 | maintained where MON groups may only include CPUs owned by the | 210 | maintained where MON groups may only include CPUs owned by the |
211 | parent CTRL_MON group. | 211 | parent CTRL_MON group. |
212 | When the resouce group is in pseudo-locked mode this file will | 212 | When the resource group is in pseudo-locked mode this file will |
213 | only be readable, reflecting the CPUs associated with the | 213 | only be readable, reflecting the CPUs associated with the |
214 | pseudo-locked region. | 214 | pseudo-locked region. |
215 | 215 | ||
@@ -342,7 +342,7 @@ For cache resources we describe the portion of the cache that is available | |||
342 | for allocation using a bitmask. The maximum value of the mask is defined | 342 | for allocation using a bitmask. The maximum value of the mask is defined |
343 | by each cpu model (and may be different for different cache levels). It | 343 | by each cpu model (and may be different for different cache levels). It |
344 | is found using CPUID, but is also provided in the "info" directory of | 344 | is found using CPUID, but is also provided in the "info" directory of |
345 | the resctrl file system in "info/{resource}/cbm_mask". X86 hardware | 345 | the resctrl file system in "info/{resource}/cbm_mask". Intel hardware |
346 | requires that these masks have all the '1' bits in a contiguous block. So | 346 | requires that these masks have all the '1' bits in a contiguous block. So |
347 | 0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9 | 347 | 0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9 |
348 | and 0xA are not. On a system with a 20-bit mask each bit represents 5% | 348 | and 0xA are not. On a system with a 20-bit mask each bit represents 5% |
@@ -380,7 +380,7 @@ where L2 external is 10GBps (hence aggregate L2 external bandwidth is | |||
380 | 240GBps) and L3 external bandwidth is 100GBps. Now a workload with '20 | 380 | 240GBps) and L3 external bandwidth is 100GBps. Now a workload with '20 |
381 | threads, having 50% bandwidth, each consuming 5GBps' consumes the max L3 | 381 | threads, having 50% bandwidth, each consuming 5GBps' consumes the max L3 |
382 | bandwidth of 100GBps although the percentage value specified is only 50% | 382 | bandwidth of 100GBps although the percentage value specified is only 50% |
383 | << 100%. Hence increasing the bandwidth percentage will not yeild any | 383 | << 100%. Hence increasing the bandwidth percentage will not yield any |
384 | more bandwidth. This is because although the L2 external bandwidth still | 384 | more bandwidth. This is because although the L2 external bandwidth still |
385 | has capacity, the L3 external bandwidth is fully used. Also note that | 385 | has capacity, the L3 external bandwidth is fully used. Also note that |
386 | this would be dependent on number of cores the benchmark is run on. | 386 | this would be dependent on number of cores the benchmark is run on. |
@@ -398,7 +398,7 @@ In order to mitigate this and make the interface more user friendly, | |||
398 | resctrl added support for specifying the bandwidth in MBps as well. The | 398 | resctrl added support for specifying the bandwidth in MBps as well. The |
399 | kernel underneath would use a software feedback mechanism or a "Software | 399 | kernel underneath would use a software feedback mechanism or a "Software |
400 | Controller(mba_sc)" which reads the actual bandwidth using MBM counters | 400 | Controller(mba_sc)" which reads the actual bandwidth using MBM counters |
401 | and adjust the memowy bandwidth percentages to ensure:: | 401 | and adjust the memory bandwidth percentages to ensure:: |
402 | 402 | ||
403 | "actual bandwidth < user specified bandwidth". | 403 | "actual bandwidth < user specified bandwidth". |
404 | 404 | ||
@@ -418,16 +418,22 @@ L3 schemata file details (CDP enabled via mount option to resctrl) | |||
418 | When CDP is enabled L3 control is split into two separate resources | 418 | When CDP is enabled L3 control is split into two separate resources |
419 | so you can specify independent masks for code and data like this:: | 419 | so you can specify independent masks for code and data like this:: |
420 | 420 | ||
421 | L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... | 421 | L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... |
422 | L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... | 422 | L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... |
423 | 423 | ||
424 | L2 schemata file details | 424 | L2 schemata file details |
425 | ------------------------ | 425 | ------------------------ |
426 | L2 cache does not support code and data prioritization, so the | 426 | CDP is supported at L2 using the 'cdpl2' mount option. The schemata |
427 | schemata format is always:: | 427 | format is either:: |
428 | 428 | ||
429 | L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... | 429 | L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... |
430 | 430 | ||
431 | or | ||
432 | |||
433 | L2DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... | ||
434 | L2CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;... | ||
435 | |||
436 | |||
431 | Memory bandwidth Allocation (default mode) | 437 | Memory bandwidth Allocation (default mode) |
432 | ------------------------------------------ | 438 | ------------------------------------------ |
433 | 439 | ||
@@ -671,8 +677,8 @@ allocations can overlap or not. The allocations specifies the maximum | |||
671 | b/w that the group may be able to use and the system admin can configure | 677 | b/w that the group may be able to use and the system admin can configure |
672 | the b/w accordingly. | 678 | the b/w accordingly. |
673 | 679 | ||
674 | If the MBA is specified in MB(megabytes) then user can enter the max b/w in MB | 680 | If resctrl is using the software controller (mba_sc) then user can enter the |
675 | rather than the percentage values. | 681 | max b/w in MB rather than the percentage values. |
676 | :: | 682 | :: |
677 | 683 | ||
678 | # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata | 684 | # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata |
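The resctrl_ui.rst hunks above state that cache bitmasks must have all '1' bits in one contiguous block (0x3, 0x6 and 0xC are legal 4-bit masks, 0x5, 0x9 and 0xA are not) and that on a 20-bit mask each bit represents 5% of the cache. A minimal sketch of both rules follows. The function names are hypothetical, this is not the kernel's own validation code, and treating an empty mask as invalid is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if every set bit in 'cbm' belongs to one contiguous run.
 * Adding the lowest set bit to the mask carries through a contiguous
 * run and clears it, so any bit shared between the sum and the
 * original mask must belong to a second, separate run.
 */
static bool sketch_cbm_is_contiguous(uint32_t cbm)
{
    if (cbm == 0)
        return false;                   /* assumption: empty mask is rejected */

    uint32_t lowest = cbm & (~cbm + 1u); /* isolate the lowest set bit */
    uint32_t sum = cbm + lowest;         /* unsigned wrap-around is fine */

    return (sum & cbm) == 0;
}

/*
 * Return the share of the cache, in whole percent, covered by 'cbm'
 * when the full mask is 'cbm_len' bits wide.
 */
static unsigned int sketch_cbm_share_percent(uint32_t cbm, unsigned int cbm_len)
{
    assert(cbm_len > 0 && cbm_len <= 32);

    unsigned int bits = 0;
    for (uint32_t m = cbm; m != 0; m >>= 1)
        bits += m & 1u;                  /* count set bits */

    return bits * 100u / cbm_len;
}
```

With a 20-bit mask, a value such as 0x3 has two bits set and therefore covers 10% of the cache in this model.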
diff --git a/Documentation/x86/x86_64/5level-paging.rst b/Documentation/x86/x86_64/5level-paging.rst
index ab88a4514163..44856417e6a5 100644
--- a/Documentation/x86/x86_64/5level-paging.rst
+++ b/Documentation/x86/x86_64/5level-paging.rst
@@ -20,7 +20,7 @@ physical address space. This "ought to be enough for anybody" ©. | |||
20 | QEMU 2.9 and later support 5-level paging. | 20 | QEMU 2.9 and later support 5-level paging. |
21 | 21 | ||
22 | Virtual memory layout for 5-level paging is described in | 22 | Virtual memory layout for 5-level paging is described in |
23 | Documentation/x86/x86_64/mm.txt | 23 | Documentation/x86/x86_64/mm.rst |
24 | 24 | ||
25 | 25 | ||
26 | Enabling 5-level paging | 26 | Enabling 5-level paging |
diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
index 2f69836b8445..6a4285a3c7a4 100644
--- a/Documentation/x86/x86_64/boot-options.rst
+++ b/Documentation/x86/x86_64/boot-options.rst
@@ -9,7 +9,7 @@ only the AMD64 specific ones are listed here. | |||
9 | 9 | ||
10 | Machine check | 10 | Machine check |
11 | ============= | 11 | ============= |
12 | Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables. | 12 | Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables. |
13 | 13 | ||
14 | mce=off | 14 | mce=off |
15 | Disable machine check | 15 | Disable machine check |
@@ -89,7 +89,7 @@ APICs | |||
89 | Don't use the local APIC (alias for i386 compatibility) | 89 | Don't use the local APIC (alias for i386 compatibility) |
90 | 90 | ||
91 | pirq=... | 91 | pirq=... |
92 | See Documentation/x86/i386/IO-APIC.txt | 92 | See Documentation/x86/i386/IO-APIC.rst |
93 | 93 | ||
94 | noapictimer | 94 | noapictimer |
95 | Don't set up the APIC timer | 95 | Don't set up the APIC timer |
diff --git a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
index a6926cd40f70..30108684ae87 100644
--- a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
+++ b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
@@ -18,7 +18,7 @@ For more information on the features of cpusets, see | |||
18 | Documentation/cgroup-v1/cpusets.rst. | 18 | Documentation/cgroup-v1/cpusets.rst. |
19 | There are a number of different configurations you can use for your needs. For | 19 | There are a number of different configurations you can use for your needs. For |
20 | more information on the numa=fake command line option and its various ways of | 20 | more information on the numa=fake command line option and its various ways of |
21 | configuring fake nodes, see Documentation/x86/x86_64/boot-options.txt. | 21 | configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst. |
22 | 22 | ||
23 | For the purposes of this introduction, we'll assume a very primitive NUMA | 23 | For the purposes of this introduction, we'll assume a very primitive NUMA |
24 | emulation setup of "numa=fake=4*512,". This will split our system memory into | 24 | emulation setup of "numa=fake=4*512,". This will split our system memory into |