author | Kirill A. Shutemov <kirill.shutemov@linux.intel.com> | 2017-07-16 18:59:54 -0400 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2017-07-21 04:05:19 -0400 |
commit | 77ef56e4f0fbb350d93289aa025c7d605af012d4 (patch) | |
tree | 74562d80774df19f6be3a40304b0e77d65d6f9b4 | |
parent | ee00f4a32a76ef631394f31d5b6028d50462b357 (diff) |
x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
Most of the pieces are in place, so we can enable support for 5-level paging.
The patch makes XEN_PV and XEN_PVH dependent on !X86_5LEVEL. Both are
not ready to work with 5-level paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20170716225954.74185-9-kirill.shutemov@linux.intel.com
[ Minor readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-rw-r--r-- | Documentation/x86/x86_64/5level-paging.txt | 64 | ||||
-rw-r--r-- | arch/x86/Kconfig | 19 | ||||
-rw-r--r-- | arch/x86/xen/Kconfig | 5 |
3 files changed, 88 insertions, 0 deletions
diff --git a/Documentation/x86/x86_64/5level-paging.txt b/Documentation/x86/x86_64/5level-paging.txt
new file mode 100644
index 000000000000..087251a0d99c
--- /dev/null
+++ b/Documentation/x86/x86_64/5level-paging.txt
@@ -0,0 +1,64 @@ | |||
1 | == Overview == | ||
2 | |||
3 | Original x86-64 was limited by 4-level paging to 256 TiB of virtual address | ||
4 | space and 64 TiB of physical address space. We are already bumping into | ||
5 | this limit: some vendors offer servers with 64 TiB of memory today. | ||
6 | |||
7 | To overcome the limitation, upcoming hardware will introduce support for | ||
8 | 5-level paging. It is a straightforward extension of the current page | ||
9 | table structure, adding one more layer of translation. | ||
10 | |||
11 | It bumps the limits to 128 PiB of virtual address space and 4 PiB of | ||
12 | physical address space. This "ought to be enough for anybody" ©. | ||
13 | |||
14 | QEMU 2.9 and later support 5-level paging. | ||
15 | |||
16 | Virtual memory layout for 5-level paging is described in | ||
17 | Documentation/x86/x86_64/mm.txt | ||
18 | |||
19 | == Enabling 5-level paging == | ||
20 | |||
21 | CONFIG_X86_5LEVEL=y enables the feature. | ||
22 | |||
23 | So far, a kernel compiled with the option enabled will be able to boot | ||
24 | only on machines that support the feature -- look for the 'la57' flag in | ||
25 | /proc/cpuinfo. | ||
26 | |||
27 | The plan is to implement boot-time switching between 4- and 5-level paging | ||
28 | in the future. | ||
29 | |||
30 | == User-space and large virtual address space == | ||
31 | |||
32 | On x86, 5-level paging enables 56-bit userspace virtual address space. | ||
33 | Not all user space is ready to handle wide addresses. It's known that | ||
34 | at least some JIT compilers use higher bits in pointers to encode their | ||
35 | information. This collides with valid pointers under 5-level paging and | ||
36 | leads to crashes. | ||
37 | |||
38 | To mitigate this, we are not going to allocate virtual address space | ||
39 | above 47-bit by default. | ||
40 | |||
41 | But userspace can ask for allocation from the full address space by | ||
42 | specifying a hint address (with or without MAP_FIXED) above 47 bits. | ||
43 | |||
44 | If the hint address is above 47 bits but MAP_FIXED is not specified, we | ||
45 | try to find an unmapped area at the specified address. If it's already | ||
46 | occupied, we look for an unmapped area in the *full* address space, | ||
47 | rather than in the 47-bit window. | ||
48 | |||
49 | A high hint address would only affect the allocation in question, but not | ||
50 | any future mmap()s. | ||
51 | |||
52 | Specifying a high hint address on an older kernel or on a machine without | ||
53 | 5-level paging support is safe. The hint will be ignored and the kernel | ||
54 | will fall back to allocation from the 47-bit address space. | ||
55 | |||
56 | This approach makes it easy to make an application's memory allocator | ||
57 | aware of the large address space without manually tracking allocated | ||
58 | virtual address space. | ||
59 | |||
60 | One important case we need to handle here is interaction with MPX. | ||
61 | MPX (without MAWA extension) cannot handle addresses above 47-bit, so we | ||
62 | need to make sure that MPX cannot be enabled if we already have a VMA | ||
63 | above the boundary, and forbid creating such VMAs once MPX is enabled. | ||
64 | |||
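As an illustration of the hint-address behaviour described in the new documentation file, the following is a minimal userspace sketch, assuming a 64-bit Linux target. The helper names `map_with_high_hint()` and `is_above_47bit()` and the hint value `1 << 48` are hypothetical choices made for this example and are not part of the patch. On a kernel or machine without 5-level paging the hint should simply be ignored, so the call is expected to succeed either way and only the returned address differs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes of anonymous memory while passing
 * a hint address above the 47-bit boundary. MAP_FIXED is deliberately not
 * used, so the kernel is free to place the mapping elsewhere.
 */
static void *map_with_high_hint(size_t len)
{
	/* Arbitrary example hint; any address above 47 bits would do. */
	void *hint = (void *)(uintptr_t)(UINT64_C(1) << 48);

	assert(len > 0);
	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Hypothetical helper: did the mapping land above the 47-bit window? */
static int is_above_47bit(const void *p)
{
	return (uintptr_t)p >= ((uintptr_t)1 << 47);
}
```

A caller would check the result against `MAP_FAILED` as usual and could use `is_above_47bit()` to see whether the kernel actually handed out a high address; subsequent `mmap()` calls without a high hint stay within the 47-bit window.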
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8328bcb9ce8b..ff637dedfafa 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -326,6 +326,7 @@ config FIX_EARLYCON_MEM | |||
326 | 326 | ||
327 | config PGTABLE_LEVELS | 327 | config PGTABLE_LEVELS |
328 | int | 328 | int |
329 | default 5 if X86_5LEVEL | ||
329 | default 4 if X86_64 | 330 | default 4 if X86_64 |
330 | default 3 if X86_PAE | 331 | default 3 if X86_PAE |
331 | default 2 | 332 | default 2 |
@@ -1398,6 +1399,24 @@ config X86_PAE | |||
1398 | has the cost of more pagetable lookup overhead, and also | 1399 | has the cost of more pagetable lookup overhead, and also |
1399 | consumes more pagetable space per process. | 1400 | consumes more pagetable space per process. |
1400 | 1401 | ||
1402 | config X86_5LEVEL | ||
1403 | bool "Enable 5-level page tables support" | ||
1404 | depends on X86_64 | ||
1405 | ---help--- | ||
1406 | 5-level paging enables access to a larger address space: | ||
1407 | up to 128 PiB of virtual address space and 4 PiB of | ||
1408 | physical address space. | ||
1409 | |||
1410 | It will be supported by future Intel CPUs. | ||
1411 | |||
1412 | Note: a kernel with this option enabled can only be booted | ||
1413 | on machines that support the feature. | ||
1414 | |||
1415 | See Documentation/x86/x86_64/5level-paging.txt for more | ||
1416 | information. | ||
1417 | |||
1418 | Say N if unsure. | ||
1419 | |||
1401 | config ARCH_PHYS_ADDR_T_64BIT | 1420 | config ARCH_PHYS_ADDR_T_64BIT |
1402 | def_bool y | 1421 | def_bool y |
1403 | depends on X86_64 || X86_PAE | 1422 | depends on X86_64 || X86_PAE |
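The address-space figures quoted in the help text and documentation follow from the page table geometry. The sketch below works through the arithmetic, assuming 4 KiB pages (12 offset bits) and 512-entry tables (9 index bits per level), which holds for the 4- and 5-level x86-64 cases only. The function names are hypothetical, and the physical widths of 46 and 52 bits are derived here from the quoted 64 TiB and 4 PiB limits.

```c
#include <assert.h>
#include <stdint.h>

enum {
	PAGE_OFFSET_BITS = 12,	/* 4 KiB pages */
	INDEX_BITS_PER_LEVEL = 9	/* 512 entries per table */
};

/* Virtual address width for an x86-64 page table with `levels` levels. */
static unsigned int va_bits(unsigned int levels)
{
	assert(levels == 4 || levels == 5);
	return PAGE_OFFSET_BITS + INDEX_BITS_PER_LEVEL * levels;
}

/* Size in TiB of an address space that is `bits` wide (1 TiB = 2^40 bytes). */
static uint64_t size_in_tib(unsigned int bits)
{
	assert(bits >= 40 && bits < 64);
	return UINT64_C(1) << (bits - 40);
}
```

With these definitions, `va_bits(4)` is 48 and `size_in_tib(48)` is 256 (256 TiB), while `va_bits(5)` is 57 and `size_in_tib(57)` is 131072 (128 PiB). Likewise `size_in_tib(46)` is 64 and `size_in_tib(52)` is 4096 (4 PiB) for the physical limits.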
diff --git a/arch/x86/xen/Kconfig b/arch/x86/xen/Kconfig
index 027987638e98..1ecd419811a2 100644
--- a/arch/x86/xen/Kconfig
+++ b/arch/x86/xen/Kconfig
@@ -17,6 +17,9 @@ config XEN_PV | |||
17 | bool "Xen PV guest support" | 17 | bool "Xen PV guest support" |
18 | default y | 18 | default y |
19 | depends on XEN | 19 | depends on XEN |
20 | # XEN_PV is not ready to work with 5-level paging. | ||
21 | # Changes to hypervisor are also required. | ||
22 | depends on !X86_5LEVEL | ||
20 | select XEN_HAVE_PVMMU | 23 | select XEN_HAVE_PVMMU |
21 | select XEN_HAVE_VPMU | 24 | select XEN_HAVE_VPMU |
22 | help | 25 | help |
@@ -75,4 +78,6 @@ config XEN_DEBUG_FS | |||
75 | config XEN_PVH | 78 | config XEN_PVH |
76 | bool "Support for running as a PVH guest" | 79 | bool "Support for running as a PVH guest" |
77 | depends on XEN && XEN_PVHVM && ACPI | 80 | depends on XEN && XEN_PVHVM && ACPI |
81 | # Pre-built page tables are not ready to handle 5-level paging. | ||
82 | depends on !X86_5LEVEL | ||
78 | def_bool n | 83 | def_bool n |
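Because a kernel built with CONFIG_X86_5LEVEL=y boots only on machines that support the feature, it can be useful to check for the 'la57' flag that the documentation mentions before deploying such a kernel. The following is a rough userspace sketch, assuming the usual Linux /proc/cpuinfo layout with a line starting with "flags"; the helper names `has_cpu_flag()` and `cpu_has_la57()` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` appears as a whole word in `flags_line`, else 0. */
static int has_cpu_flag(const char *flags_line, const char *flag)
{
	size_t n = strlen(flag);
	const char *p = flags_line;

	if (n == 0)
		return 0;
	while ((p = strstr(p, flag)) != NULL) {
		int starts = (p == flags_line) || p[-1] == ' ' ||
			     p[-1] == '\t' || p[-1] == ':';
		int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\n';

		if (starts && ends)
			return 1;
		p += n;	/* skip this partial match and keep searching */
	}
	return 0;
}

/* Return 1 if the first "flags" line in /proc/cpuinfo lists la57. */
static int cpu_has_la57(void)
{
	char line[8192];
	int found = 0;
	FILE *f = fopen("/proc/cpuinfo", "r");

	if (!f)
		return 0;	/* treat an unreadable file as "not supported" */
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "flags", 5) == 0) {
			found = has_cpu_flag(line, "la57");
			break;
		}
	}
	fclose(f);
	return found;
}
```

Matching whole words avoids false positives from flags that merely contain the substring. Only the first "flags" line is inspected, on the assumption that all CPUs in the machine report the same feature set.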