Diffstat (limited to 'Documentation/arm64/sve.txt')
-rw-r--r-- | Documentation/arm64/sve.txt | 508 |
1 files changed, 508 insertions, 0 deletions
diff --git a/Documentation/arm64/sve.txt b/Documentation/arm64/sve.txt
new file mode 100644
index 000000000000..f128f736b4a5
--- /dev/null
+++ b/Documentation/arm64/sve.txt
@@ -0,0 +1,508 @@
1 | Scalable Vector Extension support for AArch64 Linux | ||
2 | =================================================== | ||
3 | |||
4 | Author: Dave Martin <Dave.Martin@arm.com> | ||
5 | Date: 4 August 2017 | ||
6 | |||
7 | This document outlines briefly the interface provided to userspace by Linux in | ||
8 | order to support use of the ARM Scalable Vector Extension (SVE). | ||
9 | |||
10 | This is an outline of the most important features and issues only and is not | ||
11 | intended to be exhaustive. | ||
12 | |||
13 | This document does not aim to describe the SVE architecture or programmer's | ||
14 | model. To aid understanding, a minimal description of relevant programmer's | ||
15 | model features for SVE is included in Appendix A. | ||
16 | |||
17 | |||
18 | 1. General | ||
19 | ----------- | ||
20 | |||
21 | * SVE registers Z0..Z31, P0..P15 and FFR, and the current vector length VL, are | ||
22 | tracked per-thread. | ||
23 | |||
24 | * The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector | ||
25 | AT_HWCAP entry. Presence of this flag implies the presence of the SVE | ||
26 | instructions and registers, and the Linux-specific system interfaces | ||
27 | described in this document. SVE is reported in /proc/cpuinfo as "sve". | ||
28 | |||
29 | * Support for the execution of SVE instructions in userspace can also be | ||
30 | detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS | ||
31 | instruction, and checking that the value of the SVE field is nonzero. [3] | ||
32 | |||
33 | However, a nonzero SVE field does not guarantee the presence of the system | ||
34 | interfaces described in the following sections: software that needs to verify | ||
35 | that those interfaces are present must check for HWCAP_SVE instead. | ||
36 | |||
37 | * Debuggers should restrict themselves to interacting with the target via the | ||
38 | NT_ARM_SVE regset. The recommended way of detecting support for this regset | ||
39 | is to connect to a target process first and then attempt a | ||
40 | ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). | ||
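
The HWCAP_SVE check described above might be sketched as follows. This is an
illustration only: the helper names are hypothetical, and the fallback value of
HWCAP_SVE (bit 22) is an assumption for toolchains whose headers do not provide
the arm64 definition. The bit is only meaningful on AArch64.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/* Assumption: fallback for headers that lack the arm64 definition.
 * Bit 22 is the value used by the arm64 uapi <asm/hwcap.h>. */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)
#endif

/* Hypothetical helper: test the SVE bit in an AT_HWCAP value. */
static bool hwcap_has_sve(unsigned long hwcap)
{
	return (hwcap & HWCAP_SVE) != 0;
}

/* Hypothetical helper: query the aux vector of the running process.
 * On anything other than AArch64 the bit has a different meaning,
 * so report "not available" there. */
static bool sve_available(void)
{
#if defined(__aarch64__)
	return hwcap_has_sve(getauxval(AT_HWCAP));
#else
	return false;
#endif
}
```

A program would typically call sve_available() once at startup and select
SVE or non-SVE code paths accordingly.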
41 | |||
42 | |||
43 | 2. Vector length terminology | ||
44 | ----------------------------- | ||
45 | |||
46 | The size of an SVE vector (Z) register is referred to as the "vector length". | ||
47 | |||
48 | To avoid confusion about the units used to express vector length, the kernel | ||
49 | adopts the following conventions: | ||
50 | |||
51 | * Vector length (VL) = size of a Z-register in bytes | ||
52 | |||
53 | * Vector quadwords (VQ) = size of a Z-register in units of 128 bits | ||
54 | |||
55 | (So, VL = 16 * VQ.) | ||
56 | |||
57 | The VQ convention is used where the underlying granularity is important, such | ||
58 | as in data structure definitions. In most other situations, the VL convention | ||
59 | is used. This is consistent with the meaning of the "VL" pseudo-register in | ||
60 | the SVE instruction set architecture. | ||
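
The two conventions can be converted with simple arithmetic. The sketch below
uses hypothetical helper names (it does not reproduce the kernel's own macros)
and takes the upper bound on VL as a parameter rather than assuming a
particular limit. The validity rule (a multiple of 16, at least 16) is the one
given in Appendix A.

```c
#include <stdbool.h>

#define VQ_BYTES 16u	/* one quadword = 128 bits = 16 bytes */

/* Hypothetical helper: true if vl is a multiple of 16 in [16, vl_max]. */
static bool vl_is_valid(unsigned int vl, unsigned int vl_max)
{
	return vl >= VQ_BYTES && vl <= vl_max && vl % VQ_BYTES == 0;
}

/* VL (bytes) -> VQ (units of 128 bits); assumes vl is valid. */
static unsigned int vq_from_vl(unsigned int vl)
{
	return vl / VQ_BYTES;
}

/* VQ (units of 128 bits) -> VL (bytes). */
static unsigned int vl_from_vq(unsigned int vq)
{
	return vq * VQ_BYTES;
}
```

For example, a 512-bit implementation has VL = 64 and VQ = 4.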
61 | |||
62 | |||
63 | 3. System call behaviour | ||
64 | ------------------------- | ||
65 | |||
66 | * On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of | ||
67 | Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR | ||
68 | become unspecified on return from a syscall. | ||
69 | |||
70 | * The SVE registers are not used to pass arguments to or receive results from | ||
71 | any syscall. | ||
72 | |||
73 | * In practice the affected registers/bits will be preserved or will be replaced | ||
74 | with zeros on return from a syscall, but userspace should not make | ||
75 | assumptions about this. The kernel behaviour may vary on a case-by-case | ||
76 | basis. | ||
77 | |||
78 | * All other SVE state of a thread, including the currently configured vector | ||
79 | length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector | ||
80 | length (if any), is preserved across all syscalls, subject to the specific | ||
81 | exceptions for execve() described in section 6. | ||
82 | |||
83 | In particular, on return from a fork() or clone(), the parent and new child | ||
84 | process or thread share identical SVE configuration, matching that of the | ||
85 | parent before the call. | ||
86 | |||
87 | |||
88 | 4. Signal handling | ||
89 | ------------------- | ||
90 | |||
91 | * A new signal frame record sve_context encodes the SVE registers on signal | ||
92 | delivery. [1] | ||
93 | |||
94 | * This record is supplementary to fpsimd_context. The FPSR and FPCR registers | ||
95 | are only present in fpsimd_context. For convenience, the content of V0..V31 | ||
96 | is duplicated between sve_context and fpsimd_context. | ||
97 | |||
98 | * The signal frame record for SVE always contains basic metadata, in particular | ||
99 | the thread's vector length (in sve_context.vl). | ||
100 | |||
101 | * The SVE registers may or may not be included in the record, depending on | ||
102 | whether the registers are live for the thread. The registers are present if | ||
103 | and only if: | ||
104 | sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)). | ||
105 | |||
106 | * If the registers are present, the remainder of the record has a vl-dependent | ||
107 | size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to | ||
108 | the members. | ||
109 | |||
110 | * If the SVE context is too big to fit in sigcontext.__reserved[], then extra | ||
111 | space is allocated on the stack, and an extra_context record is written in | ||
112 | __reserved[] referencing this space. sve_context is then written in the | ||
113 | extra space. Refer to [1] for further details about this mechanism. | ||
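
A signal handler could locate sve_context and apply the size test above roughly
as follows. This is a sketch under stated assumptions: the struct layouts and
the magic value are local mirrors of what [1] defines (names prefixed sketch_
or SKETCH_ are hypothetical), the size formula is one reading of the
SVE_SIG_* layout, and the walker does not follow extra_context. Real code
should use the definitions and SVE_SIG_* macros from [1].

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: mirrors SVE_MAGIC from [1]. */
#define SKETCH_SVE_MAGIC 0x53564501u

struct sketch_ctx_head {		/* mirrors struct _aarch64_ctx */
	uint32_t magic;
	uint32_t size;
};

struct sketch_sve_context {		/* mirrors struct sve_context */
	struct sketch_ctx_head head;
	uint16_t vl;
	uint16_t reserved[3];
};

/* Walk the records in buf and return the offset of the first record
 * whose magic matches, or -1.  A record with magic 0 and size 0
 * terminates the list. */
static long find_record(const unsigned char *buf, size_t len, uint32_t magic)
{
	size_t off = 0;

	while (off + sizeof(struct sketch_ctx_head) <= len) {
		struct sketch_ctx_head head;

		memcpy(&head, buf + off, sizeof(head));
		if (head.magic == 0 && head.size == 0)
			break;			/* terminator record */
		if (head.size < sizeof(head) || head.size > len - off)
			break;			/* malformed record */
		if (head.magic == magic)
			return (long)off;
		off += head.size;
	}
	return -1;
}

/* Assumed full record size: 16-byte header, then 32 Z registers of
 * vl bytes each, then 16 P registers and FFR of vl/8 bytes each. */
static size_t sve_context_full_size(unsigned int vl)
{
	return sizeof(struct sketch_sve_context) + 32u * vl + 17u * (vl / 8);
}

/* Register data is present only if the record is large enough. */
static int sve_regs_present(const struct sketch_sve_context *ctx)
{
	return ctx->head.size >= sve_context_full_size(ctx->vl);
}
```

In a real handler, buf would be the __reserved[] area of the sigcontext and
the vl field would be read from the record found at the returned offset.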
114 | |||
115 | |||
116 | 5. Signal return | ||
117 | ----------------- | ||
118 | |||
119 | When returning from a signal handler: | ||
120 | |||
121 | * If there is no sve_context record in the signal frame, or if the record is | ||
122 | present but contains no register data as described in the previous section, | ||
123 | then the SVE registers/bits become non-live and take unspecified values. | ||
124 | |||
125 | * If sve_context is present in the signal frame and contains full register | ||
126 | data, the SVE registers become live and are populated with the specified | ||
127 | data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31 | ||
128 | are always restored from the corresponding members of fpsimd_context.vregs[] | ||
129 | and not from sve_context. The remaining bits are restored from sve_context. | ||
130 | |||
131 | * Inclusion of fpsimd_context in the signal frame remains mandatory, | ||
132 | irrespective of whether sve_context is present or not. | ||
133 | |||
134 | * The vector length cannot be changed via signal return. If sve_context.vl in | ||
135 | the signal frame does not match the current vector length, the signal return | ||
136 | attempt is treated as illegal, resulting in a forced SIGSEGV. | ||
137 | |||
138 | |||
139 | 6. prctl extensions | ||
140 | -------------------- | ||
141 | |||
142 | Some new prctl() calls are added to allow programs to manage the SVE vector | ||
143 | length: | ||
144 | |||
145 | prctl(PR_SVE_SET_VL, unsigned long arg) | ||
146 | |||
147 | Sets the vector length of the calling thread and related flags, where | ||
148 | arg == vl | flags. Other threads of the calling process are unaffected. | ||
149 | |||
150 | vl is the desired vector length, where sve_vl_valid(vl) must be true. | ||
151 | |||
152 | flags: | ||
153 | |||
154 | PR_SVE_VL_INHERIT | ||
155 | |||
156 | Inherit the current vector length across execve(). Otherwise, the | ||
157 | vector length is reset to the system default at execve(). (See | ||
158 | Section 9.) | ||
159 | |||
160 | PR_SVE_SET_VL_ONEXEC | ||
161 | |||
162 | Defer the requested vector length change until the next execve() | ||
163 | performed by this thread. | ||
164 | |||
165 | The effect is equivalent to implicit execution of the following | ||
166 | call immediately after the next execve() (if any) by the thread: | ||
167 | |||
168 | prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC) | ||
169 | |||
170 | This allows launching of a new program with a different vector | ||
171 | length, while avoiding runtime side effects in the caller. | ||
172 | |||
173 | |||
174 | Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect | ||
175 | immediately. | ||
176 | |||
177 | |||
178 | Return value: a nonnegative value on success, or a negative value on error: | ||
179 | EINVAL: SVE not supported, invalid vector length requested, or | ||
180 | invalid flags. | ||
181 | |||
182 | |||
183 | On success: | ||
184 | |||
185 | * Either the calling thread's vector length or the deferred vector length | ||
186 | to be applied at the next execve() by the thread (dependent on whether | ||
187 | PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value | ||
188 | supported by the system that is less than or equal to vl. If vl == | ||
189 | SVE_VL_MAX, the value set will be the largest value supported by the | ||
190 | system. | ||
191 | |||
192 | * Any previously outstanding deferred vector length change in the calling | ||
193 | thread is cancelled. | ||
194 | |||
195 | * The returned value describes the resulting configuration, encoded as for | ||
196 | PR_SVE_GET_VL. The vector length reported in this value is the new | ||
197 | current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not | ||
198 | present in arg; otherwise, the reported vector length is the deferred | ||
199 | vector length that will be applied at the next execve() by the calling | ||
200 | thread. | ||
201 | |||
202 | * Changing the vector length causes all of P0..P15, FFR and all bits of | ||
203 | Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become | ||
204 | unspecified. Calling PR_SVE_SET_VL with vl equal to the thread's current | ||
205 | vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC | ||
206 | flag, does not constitute a change to the vector length for this purpose. | ||
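
Composing the argument as vl | flags might look like the sketch below. The
helper names are hypothetical. The fallback constants are assumptions that
mirror <linux/prctl.h>, where the inherit flag is spelled PR_SVE_VL_INHERIT;
they are only used if the installed headers predate SVE support.

```c
#include <sys/prctl.h>

/* Assumption: fallbacks mirroring <linux/prctl.h> for older headers. */
#ifndef PR_SVE_SET_VL
#define PR_SVE_SET_VL		50
#define PR_SVE_SET_VL_ONEXEC	(1 << 18)
#define PR_SVE_GET_VL		51
#define PR_SVE_VL_LEN_MASK	0xffff
#define PR_SVE_VL_INHERIT	(1 << 17)
#endif

/* Hypothetical helper: build arg == vl | flags. */
static unsigned long sve_set_vl_arg(unsigned int vl, int inherit, int onexec)
{
	unsigned long arg = vl;

	if (inherit)
		arg |= PR_SVE_VL_INHERIT;
	if (onexec)
		arg |= PR_SVE_SET_VL_ONEXEC;
	return arg;
}

/* Hypothetical helper: issue the request.  Returns the prctl() result:
 * the encoded configuration on success, or a negative value on error. */
static int sve_request_vl(unsigned int vl, int inherit, int onexec)
{
	return prctl(PR_SVE_SET_VL, sve_set_vl_arg(vl, inherit, onexec));
}
```

A launcher that wants a child program to run with a 32-byte vector length,
without disturbing its own registers, could call sve_request_vl(32, 0, 1)
between fork() and execve().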
207 | |||
208 | |||
209 | prctl(PR_SVE_GET_VL) | ||
210 | |||
211 | Gets the vector length of the calling thread. | ||
212 | |||
213 | The following flag may be OR-ed into the result: | ||
214 | |||
215 | PR_SVE_VL_INHERIT | ||
216 | |||
217 | Vector length will be inherited across execve(). | ||
218 | |||
219 | There is no way to determine whether there is an outstanding deferred | ||
220 | vector length change (which would only normally be the case between a | ||
221 | fork() or vfork() and the corresponding execve() in typical use). | ||
222 | |||
223 | To extract the vector length from the result, bitwise-AND it with | ||
224 | PR_SVE_VL_LEN_MASK. | ||
225 | |||
226 | Return value: a nonnegative value on success, or a negative value on error: | ||
227 | EINVAL: SVE not supported. | ||
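
Decoding the value returned by PR_SVE_GET_VL (or by a successful
PR_SVE_SET_VL) could be sketched as below. The struct and helper names are
hypothetical, and the fallback constants are assumptions mirroring
<linux/prctl.h> for headers that predate SVE support.

```c
#include <stdbool.h>
#include <sys/prctl.h>

/* Assumption: fallbacks mirroring <linux/prctl.h> for older headers. */
#ifndef PR_SVE_GET_VL
#define PR_SVE_GET_VL		51
#define PR_SVE_VL_LEN_MASK	0xffff
#define PR_SVE_VL_INHERIT	(1 << 17)
#endif

/* Hypothetical container for the decoded configuration. */
struct sve_vl_config {
	unsigned int vl;	/* vector length in bytes */
	bool inherit;		/* VL is inherited across execve() */
};

/* Split a nonnegative prctl() result into length and flag. */
static struct sve_vl_config sve_decode_vl(int ret)
{
	struct sve_vl_config cfg;

	cfg.vl = (unsigned int)ret & PR_SVE_VL_LEN_MASK;
	cfg.inherit = (ret & PR_SVE_VL_INHERIT) != 0;
	return cfg;
}

/* Returns 0 and fills *cfg on success, or -1 if the prctl() fails
 * (for example because SVE is not supported). */
static int sve_get_vl(struct sve_vl_config *cfg)
{
	int ret = prctl(PR_SVE_GET_VL);

	if (ret < 0)
		return -1;
	*cfg = sve_decode_vl(ret);
	return 0;
}
```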
228 | |||
229 | |||
230 | 7. ptrace extensions | ||
231 | --------------------- | ||
232 | |||
233 | * A new regset NT_ARM_SVE is defined for use with PTRACE_GETREGSET and | ||
234 | PTRACE_SETREGSET. | ||
235 | |||
236 | Refer to [2] for definitions. | ||
237 | |||
238 | The regset data starts with struct user_sve_header, containing: | ||
239 | |||
240 | size | ||
241 | |||
242 | Size of the complete regset, in bytes. | ||
243 | This depends on vl and possibly on other things in the future. | ||
244 | |||
245 | If a call to PTRACE_GETREGSET requests less data than the value of | ||
246 | size, the caller can allocate a larger buffer and retry in order to | ||
247 | read the complete regset. | ||
248 | |||
249 | max_size | ||
250 | |||
251 | Maximum size in bytes that the regset can grow to for the target | ||
252 | thread. The regset won't grow bigger than this even if the target | ||
253 | thread changes its vector length etc. | ||
254 | |||
255 | vl | ||
256 | |||
257 | Target thread's current vector length, in bytes. | ||
258 | |||
259 | max_vl | ||
260 | |||
261 | Maximum possible vector length for the target thread. | ||
262 | |||
263 | flags | ||
264 | |||
265 | either | ||
266 | |||
267 | SVE_PT_REGS_FPSIMD | ||
268 | |||
269 | SVE registers are not live (GETREGSET) or are to be made | ||
270 | non-live (SETREGSET). | ||
271 | |||
272 | The payload is of type struct user_fpsimd_state, with the same | ||
273 | meaning as for NT_PRFPREG, starting at offset | ||
274 | SVE_PT_FPSIMD_OFFSET from the start of user_sve_header. | ||
275 | |||
276 | Extra data might be appended in the future: the size of the | ||
277 | payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags). | ||
278 | |||
279 | vq should be obtained using sve_vq_from_vl(vl). | ||
280 | |||
281 | or | ||
282 | |||
283 | SVE_PT_REGS_SVE | ||
284 | |||
285 | SVE registers are live (GETREGSET) or are to be made live | ||
286 | (SETREGSET). | ||
287 | |||
288 | The payload contains the SVE register data, starting at offset | ||
289 | SVE_PT_SVE_OFFSET from the start of user_sve_header, and with | ||
290 | size SVE_PT_SVE_SIZE(vq, flags); | ||
291 | |||
292 | ... OR-ed with zero or more of the following flags, which have the same | ||
293 | meaning and behaviour as the corresponding PR_SVE_* prctl flags: | ||
294 | |||
295 | SVE_PT_VL_INHERIT | ||
296 | |||
297 | SVE_PT_VL_ONEXEC (SETREGSET only). | ||
298 | |||
299 | * The effects of changing the vector length and/or flags are equivalent to | ||
300 | those documented for PR_SVE_SET_VL. | ||
301 | |||
302 | The caller must make a further GETREGSET call if it needs to know what VL is | ||
303 | actually set by SETREGSET, unless it is known in advance that the requested | ||
304 | VL is supported. | ||
305 | |||
306 | * In the SVE_PT_REGS_SVE case, the size and layout of the payload depend on | ||
307 | the header fields. The SVE_PT_SVE_*() macros are provided to facilitate | ||
308 | access to the members. | ||
309 | |||
310 | * In either case, for SETREGSET it is permissible to omit the payload, in which | ||
311 | case only the vector length and flags are changed (along with any | ||
312 | consequences of those changes). | ||
313 | |||
314 | * For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the | ||
315 | requested VL is not supported, the effect will be the same as if the | ||
316 | payload were omitted, except that an EIO error is reported. No | ||
317 | attempt is made to translate the payload data to the correct layout | ||
318 | for the vector length actually set. The thread's FPSIMD state is | ||
319 | preserved, but the remaining bits of the SVE registers become | ||
320 | unspecified. It is up to the caller to translate the payload layout | ||
321 | for the actual VL and retry. | ||
322 | |||
323 | * The effect of writing a partial, incomplete payload is unspecified. | ||
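
A debugger might interpret the header returned by PTRACE_GETREGSET along the
following lines. This is a sketch: the struct and the flag values are local
mirrors of struct user_sve_header and the SVE_PT_* definitions in [2] (all
sketch_/SKETCH_ names are hypothetical, and the numeric values are
assumptions). Real code should include the definitions from [2] directly.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors struct user_sve_header from [2]. */
struct sketch_sve_header {
	uint32_t size;		/* size of the complete regset, in bytes */
	uint32_t max_size;	/* maximum size the regset can grow to */
	uint16_t vl;		/* current vector length, in bytes */
	uint16_t max_vl;	/* maximum possible vector length */
	uint16_t flags;
	uint16_t reserved;
};

/* Assumption: values mirror the SVE_PT_* flags from [2]. */
#define SKETCH_PT_REGS_MASK	(1u << 0)
#define SKETCH_PT_REGS_FPSIMD	0u
#define SKETCH_PT_REGS_SVE	SKETCH_PT_REGS_MASK
#define SKETCH_PT_VL_INHERIT	(1u << 1)
#define SKETCH_PT_VL_ONEXEC	(1u << 2)

/* True if the payload is SVE register data rather than FPSIMD state. */
static bool hdr_has_sve_payload(const struct sketch_sve_header *h)
{
	return (h->flags & SKETCH_PT_REGS_MASK) == SKETCH_PT_REGS_SVE;
}

/* True if the target's vector length is inherited across execve(). */
static bool hdr_vl_inherit(const struct sketch_sve_header *h)
{
	return (h->flags & SKETCH_PT_VL_INHERIT) != 0;
}

/* True if a GETREGSET that returned `got` bytes was truncated, so the
 * caller should retry with a buffer of at least h->size bytes. */
static bool hdr_needs_retry(const struct sketch_sve_header *h, size_t got)
{
	return got < h->size;
}
```

Allocating max_size bytes up front avoids the retry, at the cost of a larger
buffer than the current vector length may need.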
324 | |||
325 | |||
326 | 8. ELF coredump extensions | ||
327 | --------------------------- | ||
328 | |||
329 | * A NT_ARM_SVE note will be added to each coredump for each thread of the | ||
330 | dumped process. The contents will be equivalent to the data that would have | ||
331 | been read if a PTRACE_GETREGSET of NT_ARM_SVE were executed for each thread | ||
332 | when the coredump was generated. | ||
333 | |||
334 | |||
335 | 9. System runtime configuration | ||
336 | -------------------------------- | ||
337 | |||
338 | * To mitigate the ABI impact of expansion of the signal frame, a policy | ||
339 | mechanism is provided for administrators, distro maintainers and developers | ||
340 | to set the default vector length for userspace processes: | ||
341 | |||
342 | /proc/sys/abi/sve_default_vector_length | ||
343 | |||
344 | Writing the text representation of an integer to this file sets the system | ||
345 | default vector length to the specified value, unless the value is greater | ||
346 | than the maximum vector length supported by the system in which case the | ||
347 | default vector length is set to that maximum. | ||
348 | |||
349 | The result can be determined by reopening the file and reading its | ||
350 | contents. | ||
351 | |||
352 | At boot, the default vector length is initially set to 64 or the maximum | ||
353 | supported vector length, whichever is smaller. This determines the initial | ||
354 | vector length of the init process (PID 1). | ||
355 | |||
356 | Reading this file returns the current system default vector length. | ||
357 | |||
358 | * At every execve() call, the new vector length of the new process is set to | ||
359 | the system default vector length, unless | ||
360 | |||
361 | * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the | ||
362 | calling thread, or | ||
363 | |||
364 | * a deferred vector length change is pending, established via the | ||
365 | PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC). | ||
366 | |||
367 | * Modifying the system default vector length does not affect the vector length | ||
368 | of any existing process or thread that does not make an execve() call. | ||
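
Reading the system default, and the clamping rule applied on write, could be
sketched as follows. The helper names are hypothetical, and the reader takes
the path as a parameter so that it can be exercised on systems without SVE.

```c
#include <stdio.h>

#define SVE_DEFAULT_VL_PATH "/proc/sys/abi/sve_default_vector_length"

/* Hypothetical helper: return the integer stored in the file at path,
 * or -1 if the file cannot be opened or parsed. */
static long read_default_vl(const char *path)
{
	FILE *f = fopen(path, "r");
	long vl = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &vl) != 1)
		vl = -1;
	fclose(f);
	return vl;
}

/* The rule described above: a requested value greater than the system
 * maximum is reduced to that maximum. */
static unsigned int clamp_default_vl(unsigned int requested,
				     unsigned int max_vl)
{
	return requested > max_vl ? max_vl : requested;
}
```

With these helpers, the boot-time default corresponds to
clamp_default_vl(64, max_vl), and read_default_vl(SVE_DEFAULT_VL_PATH)
returns the value currently in effect.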
369 | |||
370 | |||
371 | Appendix A. SVE programmer's model (informative) | ||
372 | ================================================= | ||
373 | |||
374 | This section provides a minimal description of the additions made by SVE to the | ||
375 | ARMv8-A programmer's model that are relevant to this document. | ||
376 | |||
377 | Note: This section is for information only and is not intended to be complete | ||
378 | or to replace any architectural specification. | ||
379 | |||
380 | A.1. Registers | ||
381 | --------------- | ||
382 | |||
383 | In A64 state, SVE adds the following: | ||
384 | |||
385 | * 32 8VL-bit vector registers Z0..Z31 | ||
386 | For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn. | ||
387 | |||
388 | A register write using a Vn register name zeros all bits of the corresponding | ||
389 | Zn except for bits [127:0]. | ||
390 | |||
391 | * 16 VL-bit predicate registers P0..P15 | ||
392 | |||
393 | * 1 VL-bit special-purpose predicate register FFR (the "first-fault register") | ||
394 | |||
395 | * a VL "pseudo-register" that determines the size of each vector register | ||
396 | |||
397 | The SVE instruction set architecture provides no way to write VL directly. | ||
398 | Instead, it can be modified only by EL1 and above, by writing appropriate | ||
399 | system registers. | ||
400 | |||
401 | * The value of VL can be configured at runtime by EL1 and above: | ||
402 | 16 <= VL <= VLmax, where VL must be a multiple of 16. | ||
403 | |||
404 | * The maximum vector length is determined by the hardware: | ||
405 | 16 <= VLmax <= 256. | ||
406 | |||
407 | (The SVE architecture specifies 256, but permits future architecture | ||
408 | revisions to raise this limit.) | ||
409 | |||
410 | * FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point | ||
411 | operations in a similar way to the way in which they interact with ARMv8 | ||
412 | floating-point operations. | ||
413 | |||
414 |         8VL-1                       128               0  bit index | ||
415 |         +----          ////            -----------------+ | ||
416 |      Z0 |                               :       V0      | | ||
417 |       :                                          : | ||
418 |      Z7 |                               :       V7      | | ||
419 |      Z8 |                               :     * V8      | | ||
420 |       :                                       :  : | ||
421 |     Z15 |                               :     *V15      | | ||
422 |     Z16 |                               :      V16      | | ||
423 |       :                                          : | ||
424 |     Z31 |                               :      V31      | | ||
425 |         +----          ////            -----------------+ | ||
426 |                                                  31    0 | ||
427 |          VL-1                  0                +-------+ | ||
428 |         +----       ////      --+          FPSR |       | | ||
429 |      P0 |                       |               +-------+ | ||
430 |       : |                       |         *FPCR |       | | ||
431 |     P15 |                       |               +-------+ | ||
432 |         +----       ////      --+ | ||
433 |     FFR |                       |               +-----+ | ||
434 |         +----       ////      --+            VL |     | | ||
435 |                                                 +-----+ | ||
436 | |||
437 | (*) callee-save: | ||
438 | This only applies to bits [63:0] of Z-/V-registers. | ||
439 | FPCR contains callee-save and caller-save bits. See [4] for details. | ||
440 | |||
441 | |||
442 | A.2. Procedure call standard | ||
443 | ----------------------------- | ||
444 | |||
445 | The ARMv8-A base procedure call standard is extended as follows with respect to | ||
446 | the additional SVE register state: | ||
447 | |||
448 | * All SVE register bits that are not shared with FP/SIMD are caller-save. | ||
449 | |||
450 | * Z8 bits [63:0] .. Z15 bits [63:0] are callee-save. | ||
451 | |||
452 | This follows from the way these bits are mapped to V8..V15, which are | ||
453 | callee-save in the base procedure call standard. | ||
454 | |||
455 | |||
456 | Appendix B. ARMv8-A FP/SIMD programmer's model | ||
457 | =============================================== | ||
458 | |||
459 | Note: This section is for information only and is not intended to be complete | ||
460 | or to replace any architectural specification. | ||
461 | |||
462 | Refer to [4] for more information. | ||
463 | |||
464 | ARMv8-A defines the following floating-point / SIMD register state: | ||
465 | |||
466 | * 32 128-bit vector registers V0..V31 | ||
467 | * 2 32-bit status/control registers FPSR, FPCR | ||
468 | |||
469 |         127                 0  bit index | ||
470 |         +---------------+ | ||
471 |      V0 |               | | ||
472 |       : :               : | ||
473 |      V7 |               | | ||
474 |    * V8 |               | | ||
475 |    :  : :               : | ||
476 |    *V15 |               | | ||
477 |     V16 |               | | ||
478 |       : :               : | ||
479 |     V31 |               | | ||
480 |         +---------------+ | ||
481 | |||
482 |                  31    0 | ||
483 |                 +-------+ | ||
484 |            FPSR |       | | ||
485 |                 +-------+ | ||
486 |           *FPCR |       | | ||
487 |                 +-------+ | ||
488 | |||
489 | (*) callee-save: | ||
490 | This only applies to bits [63:0] of V-registers. | ||
491 | FPCR contains a mixture of callee-save and caller-save bits. | ||
492 | |||
493 | |||
494 | References | ||
495 | ========== | ||
496 | |||
497 | [1] arch/arm64/include/uapi/asm/sigcontext.h | ||
498 | AArch64 Linux signal ABI definitions | ||
499 | |||
500 | [2] arch/arm64/include/uapi/asm/ptrace.h | ||
501 | AArch64 Linux ptrace ABI definitions | ||
502 | |||
503 | [3] linux/Documentation/arm64/cpu-feature-registers.txt | ||
504 | |||
505 | [4] ARM IHI0055C | ||
506 | http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf | ||
507 | http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html | ||
508 | Procedure Call Standard for the ARM 64-bit Architecture (AArch64) | ||