author    Linus Torvalds <torvalds@linux-foundation.org>  2019-07-09 15:34:26 -0400
committer Linus Torvalds <torvalds@linux-foundation.org>  2019-07-09 15:34:26 -0400
commit    e9a83bd2322035ed9d7dcf35753d3f984d76c6a5
tree      66dc466ff9aec0f9bb7f39cba50a47eab6585559 /Documentation/x86
parent    7011b7e1b702cc76f9e969b41d9a95969f2aecaa
parent    454f96f2b738374da4b0a703b1e2e7aed82c4486
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet:
 "It's been a relatively busy cycle for docs:

   - A fair pile of RST conversions, many from Mauro. These create more
     than the usual number of simple but annoying merge conflicts with
     other trees, unfortunately. He has a lot more of these waiting on
     the wings that, I think, will go to you directly later on.

   - A new document on how to use merges and rebases in kernel repos,
     and one on Spectre vulnerabilities.

   - Various improvements to the build system, including automatic
     markup of function() references because some people, for reasons I
     will never understand, were of the opinion that
     :c:func:``function()`` is unattractive and not fun to type.

   - We now recommend using sphinx 1.7, but still support back to 1.4.

   - Lots of smaller improvements, warning fixes, typo fixes, etc"

* tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits)
  docs: automarkup.py: ignore exceptions when seeking for xrefs
  docs: Move binderfs to admin-guide
  Disable Sphinx SmartyPants in HTML output
  doc: RCU callback locks need only _bh, not necessarily _irq
  docs: format kernel-parameters -- as code
  Doc : doc-guide : Fix a typo
  platform: x86: get rid of a non-existent document
  Add the RCU docs to the core-api manual
  Documentation: RCU: Add TOC tree hooks
  Documentation: RCU: Rename txt files to rst
  Documentation: RCU: Convert RCU UP systems to reST
  Documentation: RCU: Convert RCU linked list to reST
  Documentation: RCU: Convert RCU basic concepts to reST
  docs: filesystems: Remove uneeded .rst extension on toctables
  scripts/sphinx-pre-install: fix out-of-tree build
  docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/
  Documentation: PGP: update for newer HW devices
  Documentation: Add section about CPU vulnerabilities for Spectre
  Documentation: platform: Delete x86-laptop-drivers.txt
  docs: Note that :c:func: should no longer be used
  ...
Diffstat (limited to 'Documentation/x86')
-rw-r--r-- Documentation/x86/index.rst                        |  1
-rw-r--r-- Documentation/x86/protection-keys.rst              | 99
-rw-r--r-- Documentation/x86/resctrl_ui.rst                   | 30
-rw-r--r-- Documentation/x86/x86_64/5level-paging.rst         |  2
-rw-r--r-- Documentation/x86/x86_64/boot-options.rst          |  4
-rw-r--r-- Documentation/x86/x86_64/fake-numa-for-cpusets.rst |  2
6 files changed, 22 insertions, 116 deletions
diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index ae36fc5fc649..f2de1b2d3ac7 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -19,7 +19,6 @@ x86-specific Documentation
    tlb
    mtrr
    pat
-   protection-keys
    intel_mpx
    amd-memory-encryption
    pti
diff --git a/Documentation/x86/protection-keys.rst b/Documentation/x86/protection-keys.rst
deleted file mode 100644
index 49d9833af871..000000000000
--- a/Documentation/x86/protection-keys.rst
+++ /dev/null
@@ -1,99 +0,0 @@
-.. SPDX-License-Identifier: GPL-2.0
-
-======================
-Memory Protection Keys
-======================
-
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake "Scalable Processor" Server CPUs.
-It will be avalable in future non-server parts.
-
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
-
-Memory Protection Keys provides a mechanism for enforcing page-based
-protections, but without requiring modification of the page tables
-when an application changes protection domains. It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
-
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key. Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
-
-There are two new instructions (RDPKRU/WRPKRU) for reading and writing
-to the new register. The feature is only available in 64-bit mode,
-even though there is theoretically space in the PAE PTEs. These
-permissions are enforced on data access only and have no effect on
-instruction fetches.
-
-Syscalls
-========
-
-There are 3 system calls which directly interact with pkeys::
-
-	int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
-	int pkey_free(int pkey);
-	int pkey_mprotect(unsigned long start, size_t len,
-			  unsigned long prot, int pkey);
-
-Before a pkey can be used, it must first be allocated with
-pkey_alloc(). An application calls the WRPKRU instruction
-directly in order to change access permissions to memory covered
-with a key. In this example WRPKRU is wrapped by a C function
-called pkey_set().
-::
-
-	int real_prot = PROT_READ|PROT_WRITE;
-	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
-	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
-	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
-	... application runs here
-
-Now, if the application needs to update the data at 'ptr', it can
-gain access, do the update, then remove its write access::
-
-	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
-	*ptr = foo; // assign something
-	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
-
-Now when it frees the memory, it will also free the pkey since it
-is no longer in use::
-
-	munmap(ptr, PAGE_SIZE);
-	pkey_free(pkey);
-
-.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
-          An example implementation can be found in
-          tools/testing/selftests/x86/protection_keys.c.
-
-Behavior
-========
-
-The kernel attempts to make protection keys consistent with the
-behavior of a plain mprotect(). For instance if you do this::
-
-	mprotect(ptr, size, PROT_NONE);
-	something(ptr);
-
-you can expect the same effects with protection keys when doing this::
-
-	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
-	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
-	something(ptr);
-
-That should be true whether something() is a direct access to 'ptr'
-like::
-
-	*ptr = foo;
-
-or when the kernel does the access on the application's behalf like
-with a read()::
-
-	read(fd, ptr, 1);
-
-The kernel will send a SIGSEGV in both cases, but si_code will be set
-to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
-the plain mprotect() permissions are violated.
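Reviewer note on the removed text above: the document describes PKRU as holding two bits (Access Disable and Write Disable) for each of the 16 keys. The sketch below only illustrates that bit arithmetic; it does not execute RDPKRU/WRPKRU. It assumes the usual layout in which key i uses bit 2*i for Access Disable and bit 2*i+1 for Write Disable. The names PK_AD, PK_WD, pkru_with_rights() and pkru_rights_of() are hypothetical and are not taken from the kernel or glibc.

```c
#include <stdint.h>

/* Hypothetical names for this sketch; the values are assumed to match
 * PKEY_DISABLE_ACCESS (0x1) and PKEY_DISABLE_WRITE (0x2). */
#define PK_AD 0x1u              /* Access Disable */
#define PK_WD 0x2u              /* Write Disable  */
#define PKRU_BITS_PER_KEY 2u

/* Return a copy of pkru in which the two bits belonging to pkey (0..15)
 * are replaced by rights; all other keys are left untouched. */
static uint32_t pkru_with_rights(uint32_t pkru, unsigned pkey, uint32_t rights)
{
    unsigned shift = pkey * PKRU_BITS_PER_KEY;
    uint32_t mask = 0x3u << shift;

    return (pkru & ~mask) | ((rights & 0x3u) << shift);
}

/* Extract the two permission bits of pkey from a PKRU value. */
static uint32_t pkru_rights_of(uint32_t pkru, unsigned pkey)
{
    return (pkru >> (pkey * PKRU_BITS_PER_KEY)) & 0x3u;
}
```

A pkey_set()-style wrapper would, under these assumptions, read the register, pass the value through pkru_with_rights(), and write the result back.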
diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl_ui.rst
index 225cfd4daaee..5368cedfb530 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl_ui.rst
@@ -40,7 +40,7 @@ mount options are:
 	Enable the MBA Software Controller(mba_sc) to specify MBA
 	bandwidth in MBps
 
-L2 and L3 CDP are controlled seperately.
+L2 and L3 CDP are controlled separately.
 
 RDT features are orthogonal. A particular system may support only
 monitoring, only control, or both monitoring and control. Cache
@@ -118,7 +118,7 @@ related to allocation:
 	Corresponding region is pseudo-locked. No
 	sharing allowed.
 
-Memory bandwitdh(MB) subdirectory contains the following files
+Memory bandwidth(MB) subdirectory contains the following files
 with respect to allocation:
 
 "min_bandwidth":
@@ -209,7 +209,7 @@ All groups contain the following files:
 	CPUs to/from this group. As with the tasks file a hierarchy is
 	maintained where MON groups may only include CPUs owned by the
 	parent CTRL_MON group.
-	When the resouce group is in pseudo-locked mode this file will
+	When the resource group is in pseudo-locked mode this file will
 	only be readable, reflecting the CPUs associated with the
 	pseudo-locked region.
 
@@ -342,7 +342,7 @@ For cache resources we describe the portion of the cache that is available
 for allocation using a bitmask. The maximum value of the mask is defined
 by each cpu model (and may be different for different cache levels). It
 is found using CPUID, but is also provided in the "info" directory of
-the resctrl file system in "info/{resource}/cbm_mask". X86 hardware
+the resctrl file system in "info/{resource}/cbm_mask". Intel hardware
 requires that these masks have all the '1' bits in a contiguous block. So
 0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
 and 0xA are not. On a system with a 20-bit mask each bit represents 5%
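Reviewer note on the hunk above: the contiguity rule and the "each bit represents 5%" remark can be checked with a few lines of bit arithmetic. This is a minimal sketch, not the kernel's validation code. The function names are hypothetical, and rejecting an all-zero mask is an assumption made here for simplicity.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when all '1' bits of cbm form a single contiguous block.
 * An empty mask is rejected (an assumption of this sketch). */
static bool cbm_is_contiguous(uint32_t cbm)
{
    if (cbm == 0)
        return false;

    uint32_t lowest = cbm & (~cbm + 1u);  /* isolate the lowest set bit */
    uint32_t carried = cbm + lowest;      /* carry ripples through the first run of ones */

    /* If any original bit survives the carry, there was a second run. */
    return (cbm & carried) == 0;
}

/* Share of the cache covered by cbm, in percent, for a mask that is
 * width bits wide (for example 20 bits gives 5% per bit). */
static unsigned cbm_share_percent(uint32_t cbm, unsigned width)
{
    unsigned bits = 0;

    if (width == 0)
        return 0;
    for (; cbm != 0; cbm >>= 1)
        bits += cbm & 1u;
    return bits * 100u / width;
}
```

With these helpers, the legal and illegal 4-bit masks listed in the text give true and false respectively, and a two-bit mask on a 20-bit system corresponds to 10% of the cache.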
@@ -380,7 +380,7 @@ where L2 external is 10GBps (hence aggregate L2 external bandwidth is
 240GBps) and L3 external bandwidth is 100GBps. Now a workload with '20
 threads, having 50% bandwidth, each consuming 5GBps' consumes the max L3
 bandwidth of 100GBps although the percentage value specified is only 50%
-<< 100%. Hence increasing the bandwidth percentage will not yeild any
+<< 100%. Hence increasing the bandwidth percentage will not yield any
 more bandwidth. This is because although the L2 external bandwidth still
 has capacity, the L3 external bandwidth is fully used. Also note that
 this would be dependent on number of cores the benchmark is run on.
@@ -398,7 +398,7 @@ In order to mitigate this and make the interface more user friendly,
 resctrl added support for specifying the bandwidth in MBps as well. The
 kernel underneath would use a software feedback mechanism or a "Software
 Controller(mba_sc)" which reads the actual bandwidth using MBM counters
-and adjust the memowy bandwidth percentages to ensure::
+and adjust the memory bandwidth percentages to ensure::
 
 	"actual bandwidth < user specified bandwidth".
 
@@ -418,16 +418,22 @@ L3 schemata file details (CDP enabled via mount option to resctrl)
 When CDP is enabled L3 control is split into two separate resources
 so you can specify independent masks for code and data like this::
 
-	L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
-	L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+	L3DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+	L3CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
 
 L2 schemata file details
 ------------------------
-L2 cache does not support code and data prioritization, so the
-schemata format is always::
+CDP is supported at L2 using the 'cdpl2' mount option. The schemata
+format is either::
 
 	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
 
+or
+
+	L2DATA:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+	L2CODE:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
+
+
 Memory bandwidth Allocation (default mode)
 ------------------------------------------
 
@@ -671,8 +677,8 @@ allocations can overlap or not. The allocations specifies the maximum
 b/w that the group may be able to use and the system admin can configure
 the b/w accordingly.
 
-If the MBA is specified in MB(megabytes) then user can enter the max b/w in MB
-rather than the percentage values.
+If resctrl is using the software controller (mba_sc) then user can enter the
+max b/w in MB rather than the percentage values.
 ::
 
   # echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
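Reviewer note on the hunk above: each schemata line has the shape `RESOURCE:id=value;id=value`. The sketch below splits one such line into its resource name and (id, value) pairs. It is an illustration only and is not how the kernel parses the file. The struct and function names are hypothetical, and the caller picks the number base: 16 for cache bitmasks such as `L3:0=3;1=c`, 10 for MB values (an assumption based on the example line).

```c
#include <stdlib.h>
#include <string.h>

/* One "id=value" pair from a schemata line (hypothetical type). */
struct schemata_entry {
    unsigned long id;
    unsigned long value;
};

/* Parse "RES:id=val;id=val..." into res and out[].
 * Returns the number of pairs stored (at most max), or -1 on a
 * malformed line. Values are read in the given base. */
static int parse_schemata_line(const char *line, char *res, size_t res_len,
                               struct schemata_entry *out, int max, int base)
{
    const char *colon = strchr(line, ':');
    const char *p;
    int n = 0;

    if (colon == NULL || colon == line || (size_t)(colon - line) >= res_len)
        return -1;
    memcpy(res, line, (size_t)(colon - line));
    res[colon - line] = '\0';

    p = colon + 1;
    while (*p != '\0' && n < max) {
        char *end;

        out[n].id = strtoul(p, &end, 10);       /* domain (cache) id */
        if (end == p || *end != '=')
            return -1;
        p = end + 1;
        out[n].value = strtoul(p, &end, base);  /* bitmask or bandwidth */
        if (end == p)
            return -1;
        n++;
        p = end;
        if (*p == ';')
            p++;                                /* another pair may follow */
        else if (*p != '\0')
            return -1;
    }
    return n;
}
```

For the two lines written by the echo command above, this yields the pairs (0, 0x3), (1, 0xc) for L3 and (0, 1024), (1, 500) for MB.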
diff --git a/Documentation/x86/x86_64/5level-paging.rst b/Documentation/x86/x86_64/5level-paging.rst
index ab88a4514163..44856417e6a5 100644
--- a/Documentation/x86/x86_64/5level-paging.rst
+++ b/Documentation/x86/x86_64/5level-paging.rst
@@ -20,7 +20,7 @@ physical address space. This "ought to be enough for anybody" ©.
 QEMU 2.9 and later support 5-level paging.
 
 Virtual memory layout for 5-level paging is described in
-Documentation/x86/x86_64/mm.txt
+Documentation/x86/x86_64/mm.rst
 
 
 Enabling 5-level paging
diff --git a/Documentation/x86/x86_64/boot-options.rst b/Documentation/x86/x86_64/boot-options.rst
index 2f69836b8445..6a4285a3c7a4 100644
--- a/Documentation/x86/x86_64/boot-options.rst
+++ b/Documentation/x86/x86_64/boot-options.rst
@@ -9,7 +9,7 @@ only the AMD64 specific ones are listed here.
 
 Machine check
 =============
-Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables.
+Please see Documentation/x86/x86_64/machinecheck.rst for sysfs runtime tunables.
 
    mce=off
 		Disable machine check
@@ -89,7 +89,7 @@ APICs
 		Don't use the local APIC (alias for i386 compatibility)
 
    pirq=...
-		See Documentation/x86/i386/IO-APIC.txt
+		See Documentation/x86/i386/IO-APIC.rst
 
    noapictimer
 		Don't set up the APIC timer
diff --git a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
index a6926cd40f70..30108684ae87 100644
--- a/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
+++ b/Documentation/x86/x86_64/fake-numa-for-cpusets.rst
@@ -18,7 +18,7 @@ For more information on the features of cpusets, see
 Documentation/cgroup-v1/cpusets.rst.
 There are a number of different configurations you can use for your needs. For
 more information on the numa=fake command line option and its various ways of
-configuring fake nodes, see Documentation/x86/x86_64/boot-options.txt.
+configuring fake nodes, see Documentation/x86/x86_64/boot-options.rst.
 
 For the purposes of this introduction, we'll assume a very primitive NUMA
 emulation setup of "numa=fake=4*512,". This will split our system memory into