Diffstat (limited to 'Documentation/x86/intel_mpx.txt')
-rw-r--r-- Documentation/x86/intel_mpx.txt | 234
1 files changed, 234 insertions, 0 deletions
diff --git a/Documentation/x86/intel_mpx.txt b/Documentation/x86/intel_mpx.txt
new file mode 100644
index 000000000000..4472ed2ad921
--- /dev/null
+++ b/Documentation/x86/intel_mpx.txt
@@ -0,0 +1,234 @@
1. Intel(R) MPX Overview
========================

Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
introduced into Intel Architecture. Intel MPX provides hardware features
that can be used in conjunction with compiler changes to check memory
references whose intended compile-time behavior is subverted at runtime
by a buffer overflow or underflow.

For more information, please refer to the Intel(R) Architecture Instruction
Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
Extensions.

Note: Currently no hardware with the MPX ISA is available, but it is always
possible to use SDE (Intel(R) Software Development Emulator) instead, which
can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator


2. How to take advantage of MPX
===============================

For MPX to work, changes are required in the kernel, binutils and compiler.
No source changes are required for applications, just a recompile.

A lot of moving parts have to work together for this to function correctly.
The following is how we expect the compiler, application and kernel to work
together.

1) The application developer compiles with -fmpx. The compiler will add the
   instrumentation as well as some setup code called early after the app
   starts. New instruction prefixes are noops for old CPUs.
2) That setup code allocates (virtual) space for the "bounds directory",
   points the "bndcfgu" register to the directory and notifies the kernel
   (via the new prctl(PR_MPX_ENABLE_MANAGEMENT)) that the app will be using
   MPX.
3) The kernel detects that the CPU has MPX, allows the new prctl() to
   succeed, and notes the location of the bounds directory. Userspace is
   expected to keep the bounds directory at that location. We note it
   instead of reading it each time because the 'xsave' operation needed
   to access the bounds directory register is an expensive operation.
4) If the application needs to spill bounds out of the 4 registers, it
   issues a bndstx instruction. Since the bounds directory is empty at
   this point, a bounds fault (#BR) is raised, the kernel allocates a
   bounds table (in the user address space) and makes the relevant entry
   in the bounds directory point to the new table.
5) If the application violates the bounds specified in the bounds registers,
   a separate kind of #BR is raised which will deliver a signal with
   information about the violation in the 'struct siginfo'.
6) Whenever memory is freed, we know that it can no longer contain valid
   pointers, and we attempt to free the associated space in the bounds
   tables. If an entire table becomes unused, we will attempt to free
   the table and remove the entry in the directory.

To summarize, there are essentially three things interacting here:

GCC with -fmpx:
 * enables annotation of code with MPX instructions and prefixes
 * inserts code early in the application to call into the "gcc runtime"
GCC MPX Runtime:
 * checks for hardware MPX support in the cpuid leaf
 * allocates virtual space for the bounds directory (malloc() essentially)
 * points the hardware BNDCFGU register at the directory
 * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
   start managing the bounds directories
Kernel MPX Code:
 * checks for hardware MPX support in the cpuid leaf
 * handles #BR exceptions and sends SIGSEGV to the app when it violates
   bounds, like during a buffer overflow
 * when bounds are spilled into an unallocated bounds table, the kernel
   notices in the #BR exception, allocates the virtual space, then
   updates the bounds directory to point to the new table. It keeps
   special track of the memory with a VM_MPX flag.
 * frees unused bounds tables at the time that the memory they described
   is unmapped
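
The runtime's part of this handshake can be sketched roughly as follows.
This is only an illustration, not the actual GCC runtime: the helper name
mpx_runtime_setup_sketch() is hypothetical, the 2GB directory size is the
figure quoted later in this document, and the cpuid check and the step that
loads BNDCFGU are left out, so on most systems the prctl() is expected to
fail and the helper simply reports the error.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Value taken from this document; current headers may not define it. */
#ifndef PR_MPX_ENABLE_MANAGEMENT
#define PR_MPX_ENABLE_MANAGEMENT 43
#endif

/* Assumption: 64-bit bounds directory size (the 2GB quoted in the text). */
#define BD_SIZE_BYTES ((size_t)2 << 30)

/*
 * Hypothetical helper: reserve virtual space for the bounds directory,
 * then ask the kernel to start managing bounds tables.
 * Returns 0 on success or a negative errno value on failure.
 */
static int mpx_runtime_setup_sketch(void **bd_out)
{
	void *bd;

	assert(bd_out != NULL);

	/* Virtual space only: MAP_NORESERVE avoids committing 2GB up front. */
	bd = mmap(NULL, BD_SIZE_BYTES, PROT_READ | PROT_WRITE,
		  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (bd == MAP_FAILED)
		return -errno;

	/* A real runtime would point BNDCFGU at 'bd' here (omitted). */

	if (prctl(PR_MPX_ENABLE_MANAGEMENT, 0UL, 0UL, 0UL, 0UL) != 0) {
		int err = errno;	/* save before munmap() can change it */

		munmap(bd, BD_SIZE_BYTES);
		return -err;
	}

	*bd_out = bd;
	return 0;
}
```

A caller would treat any negative return value as "run without MPX", which
matches the expectation above that no source changes are needed in the
application itself.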


3. How does MPX kernel code work
================================

Handling #BR faults caused by MPX
---------------------------------

When MPX is enabled, there are 2 new situations that can generate
#BR faults.
 * new bounds tables (BT) need to be allocated to save bounds.
 * bounds violation caused by MPX instructions.

We hook the #BR handler to handle these two new situations.

On-demand kernel allocation of bounds tables
--------------------------------------------

MPX only has 4 hardware registers for storing bounds information. If
MPX-enabled code needs more than these 4 registers, it needs to spill
them somewhere. It has two special instructions for this which allow
the bounds to be moved between the bounds registers and some new "bounds
tables".

#BR exceptions are a new class of exceptions just for MPX. They are
similar conceptually to a page fault and will be raised by the MPX
hardware either during bounds violations or when the tables are not
present. The kernel handles those #BR exceptions for not-present tables
by carving the space out of the process's normal address space and then
pointing the bounds directory over to it.
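
To make the "not-present table" case concrete, here is a minimal sketch of
the two-level lookup. The bit positions, entry sizes and valid bit are
assumptions taken from the Intel programming reference for 64-bit mode, not
from this document, and all names are hypothetical. They are consistent
with the 2GB directory and the 4x table overhead discussed below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 64-bit layout (Intel programming reference, not this document). */
#define MPX_BD_INDEX_BITS	28	/* directory indexed by address bits 47:20 */
#define MPX_BD_ENTRY_SIZE	8
#define MPX_BT_INDEX_BITS	17	/* table indexed by address bits 19:3 */
#define MPX_BT_ENTRY_SIZE	32
#define MPX_BD_ENTRY_VALID	0x1ULL	/* bit 0 of a directory entry */

#define MPX_BD_SIZE ((uint64_t)MPX_BD_ENTRY_SIZE << MPX_BD_INDEX_BITS) /* 2GB */
#define MPX_BT_SIZE ((uint64_t)MPX_BT_ENTRY_SIZE << MPX_BT_INDEX_BITS) /* 4MB */

/* Directory slot for the memory location that holds the pointer. */
static inline uint64_t mpx_bd_index(uint64_t ptr_addr)
{
	return (ptr_addr >> 20) & ((1ULL << MPX_BD_INDEX_BITS) - 1);
}

/* Slot inside the bounds table selected by the directory entry. */
static inline uint64_t mpx_bt_index(uint64_t ptr_addr)
{
	return (ptr_addr >> 3) & ((1ULL << MPX_BT_INDEX_BITS) - 1);
}

/*
 * Returns 1 when a bounds store for ptr_addr would find no table. This is
 * the situation in which the kernel allocates one on demand.
 */
static inline int mpx_needs_new_table(const uint64_t *bd, uint64_t ptr_addr)
{
	assert(bd != NULL);
	return !(bd[mpx_bd_index(ptr_addr)] & MPX_BD_ENTRY_VALID);
}
```

Under these assumptions each table covers 1MB of address space and occupies
4MB, which is where the 4x worst-case overhead comes from.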

The tables need to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register points to
memory. Any direct kernel involvement (like a syscall) to access the
tables would obviously destroy performance.

Why not do this in userspace? MPX does not strictly require anything in
the kernel. It can theoretically be done completely from userspace. Here
are a few ways this could be done. We don't think any of them are practical
in the real world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so that we
   never have to allocate them?
A: An MPX-enabled application may create a lot of bounds tables in its
   address space to save bounds information. These tables can take
   up huge swaths of memory (as much as 80% of the memory on the system)
   even if we clean them up aggressively. In the worst-case scenario, the
   tables can be 4x the size of the data structure being tracked. IOW, a
   1-page structure can require 4 bounds-table pages. An X-GB virtual
   area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
   If we were to preallocate them for the 128TB of user virtual address
   space, we would need to reserve 512TB+2GB, which is larger than the
   entire virtual address space today. This means they can not be reserved
   ahead of time. Also, a single process's pre-populated bounds directory
   consumes 2GB of virtual *AND* physical memory. IOW, it's completely
   infeasible to prepopulate bounds directories.
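
The arithmetic in this answer can be written down as a small worked
example. The function name and the GiB/TiB macros are illustrative only;
the 4x factor and the 2GB directory size are the figures given above.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)
#define TiB (1ULL << 40)

/*
 * Worst case from the text: 4 bytes of bounds table per byte of tracked
 * memory, plus a fixed 2GB bounds directory.
 */
static inline uint64_t mpx_worst_case_reservation(uint64_t tracked_bytes)
{
	/* Guard against overflow of the 64-bit result. */
	assert(tracked_bytes <= (UINT64_MAX - 2 * GiB) / 4);
	return 4 * tracked_bytes + 2 * GiB;
}
```

For the 128TB user address space, mpx_worst_case_reservation(128 * TiB)
evaluates to 512TB+2GB, the figure quoted in the answer.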

Q: Can we preallocate bounds table space at the same time memory is
   allocated which might contain pointers that might eventually need
   bounds tables?
A: This would work if we could hook the site of each and every memory
   allocation syscall. This can be done for small, constrained applications.
   But, it isn't practical at a larger scale since a given app has no
   way of controlling how all the parts of the app might allocate memory
   (think libraries). The kernel is really the only place to intercept
   these calls.

Q: Could a bounds fault be handed to userspace and the tables allocated
   there in a signal handler instead of in the kernel?
A: mmap() is not on the list of async-signal-safe functions, and even
   if mmap() would work it still requires locking or nasty tricks to
   keep track of the allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.

Decoding MPX instructions
-------------------------

If a #BR is generated due to a bounds violation caused by MPX, we need
to decode the MPX instruction to get the violation address and store
this address in the extended struct siginfo.

The _sigfault field of struct siginfo is extended as follows:

	/* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
	struct {
		void __user *_addr; /* faulting insn/memory ref. */
	#ifdef __ARCH_SI_TRAPNO
		int _trapno;	/* TRAP # which caused the signal */
	#endif
		short _addr_lsb; /* LSB of the reported address */
		struct {
			void __user *_lower;
			void __user *_upper;
		} _addr_bnd;
	} _sigfault;

The '_addr' field refers to the violation address, and the new '_addr_bnd'
field refers to the upper/lower bounds in effect when the #BR was raised.

Glibc will also be updated to support this new siginfo, so users
can get the violation address and bounds when bounds violations occur.

Cleanup unused bounds tables
----------------------------

When a BNDSTX instruction attempts to save bounds to a bounds directory
entry marked as invalid, a #BR is generated. This is an indication that
no bounds table exists for this entry. In this case the fault handler
will allocate a new bounds table on demand.

Since the kernel allocated those tables on-demand without userspace
knowledge, it is also responsible for freeing them when the associated
mappings go away.

The solution is to hook do_munmap() to check whether the process is
MPX enabled. If so, the bounds tables that cover the virtual address
region being unmapped are freed as well.
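
One piece of that cleanup, deciding how many whole tables an unmapped
region spans, can be sketched as below. This is not the kernel's
implementation: the function name is hypothetical, and the 1MB coverage per
table is an assumption from the Intel programming reference. Tables only
partially covered by the region would need a finer-grained check, which is
not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one bounds table describes 1MB of address space. */
#define MPX_BT_COVERAGE (1ULL << 20)

/*
 * Number of bounds tables whose entire coverage lies inside the unmapped
 * region [start, start + len).
 */
static uint64_t mpx_tables_fully_unmapped(uint64_t start, uint64_t len)
{
	uint64_t end, first, last;

	assert(len <= UINT64_MAX - start);	/* no address wrap-around */
	assert(start <= UINT64_MAX - (MPX_BT_COVERAGE - 1));

	end = start + len;
	/* First table boundary at or after start. */
	first = (start + MPX_BT_COVERAGE - 1) / MPX_BT_COVERAGE;
	/* Last table boundary at or before end (exclusive index). */
	last = end / MPX_BT_COVERAGE;

	return last > first ? last - first : 0;
}
```

For example, unmapping 2MB starting at a 1MB-aligned address spans two
whole tables, while a 1MB region that straddles a table boundary spans none.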

Adding new prctl commands
-------------------------

Two new prctl commands are added to enable and disable MPX bounds tables
management in the kernel.

	#define PR_MPX_ENABLE_MANAGEMENT	43
	#define PR_MPX_DISABLE_MANAGEMENT	44

The runtime library in userspace is responsible for allocating the bounds
directory, so the kernel has to use the XSAVE instruction to get the base
of the bounds directory from the BNDCFGU register.

But XSAVE is expected to be very expensive. As a performance optimization,
we get the base of the bounds directory once, during execution of the
PR_MPX_ENABLE_MANAGEMENT command, and save it in struct mm_struct for
future use.
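
The caching idea can be sketched as follows. Everything here is
illustrative: struct mm_sketch stands in for struct mm_struct, the BNDCFGU
bit layout (enable flag in bit 0, page-aligned base in the upper bits) is
an assumption from the Intel programming reference, and the choice of
-ENXIO for a disabled register is just an example error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BNDCFGU_ENABLE		0x1ULL		/* assumed: bit 0 enables MPX */
#define BNDCFGU_BASE_MASK	(~0xfffULL)	/* assumed: page-aligned base */
#define BD_INVALID		UINT64_MAX	/* hypothetical "not managed" marker */

/* Stand-in for the field that would live in struct mm_struct. */
struct mm_sketch {
	uint64_t bd_addr;
};

/*
 * Enable path: 'bndcfgu' is the register value obtained once via the
 * expensive XSAVE read. The base is cached so later faults need no XSAVE.
 */
static int mpx_enable_sketch(struct mm_sketch *mm, uint64_t bndcfgu)
{
	assert(mm != NULL);

	if (!(bndcfgu & BNDCFGU_ENABLE)) {
		mm->bd_addr = BD_INVALID;
		return -ENXIO;
	}
	mm->bd_addr = bndcfgu & BNDCFGU_BASE_MASK;
	return 0;
}

/* Disable path: forget the cached base. */
static void mpx_disable_sketch(struct mm_sketch *mm)
{
	assert(mm != NULL);
	mm->bd_addr = BD_INVALID;
}

/* Cheap check usable from the #BR and munmap paths. */
static int mpx_kernel_managing(const struct mm_sketch *mm)
{
	assert(mm != NULL);
	return mm->bd_addr != BD_INVALID;
}
```

This also shows why userspace is expected to keep the directory at the same
location: the cached base is never refreshed after the enable call.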


4. Special rules
================

1) If userspace is requesting help from the kernel to do the management
of bounds tables, it may not create or modify entries in the bounds directory.

Certainly users can allocate bounds tables and forcibly point the bounds
directory at them through the XSAVE instruction, and then set the valid bit
of the bounds directory entry to mark the entry valid. But, the kernel will
decline to assist in managing these tables.

2) Userspace may not take multiple bounds directory entries and point
them at the same bounds table.

This is allowed architecturally. For more information, see "Intel(R)
Architecture Instruction Set Extensions Programming Reference" (9.3.4).

However, if users did this, the kernel might be fooled into unmapping an
in-use bounds table since it does not recognize sharing.