diff options
author | Ard Biesheuvel <ard.biesheuvel@linaro.org> | 2013-08-23 09:47:09 -0400 |
---|---|---|
committer | Russell King <rmk+kernel@arm.linux.org.uk> | 2013-08-25 12:09:48 -0400 |
commit | 2afd0a05241d2754f738003c2ed6d6821dac3d09 (patch) | |
tree | 91d2a7b7d86890c7dd3b97b80e52795e5491ad03 | |
parent | 83d26d1113d82e455439587d0d9418a357a1728e (diff) |
ARM: 7825/1: document the use of NEON in kernel mode
Add a file to Documentation/arm explaining how kernel mode NEON
is supposed to be used.
Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
-rw-r--r-- | Documentation/arm/kernel_mode_neon.txt | 121 |
1 files changed, 121 insertions, 0 deletions
diff --git a/Documentation/arm/kernel_mode_neon.txt b/Documentation/arm/kernel_mode_neon.txt new file mode 100644 index 000000000000..525452726d31 --- /dev/null +++ b/Documentation/arm/kernel_mode_neon.txt | |||
@@ -0,0 +1,121 @@ | |||
1 | Kernel mode NEON | ||
2 | ================ | ||
3 | |||
4 | TL;DR summary | ||
5 | ------------- | ||
6 | * Use only NEON instructions, or VFP instructions that don't rely on support | ||
7 | code | ||
8 | * Isolate your NEON code in a separate compilation unit, and compile it with | ||
9 | '-mfpu=neon -mfloat-abi=softfp' | ||
10 | * Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your | ||
11 | NEON code | ||
12 | * Don't sleep in your NEON code, and be aware that it will be executed with | ||
13 | preemption disabled | ||
14 | |||
15 | |||
16 | Introduction | ||
17 | ------------ | ||
18 | It is possible to use NEON instructions (and in some cases, VFP instructions) in | ||
19 | code that runs in kernel mode. However, for performance reasons, the NEON/VFP | ||
20 | register file is not preserved and restored at every context switch or taken | ||
21 | exception like the normal register file is, so some manual intervention is | ||
22 | required. Furthermore, special care is required for code that may sleep [i.e., | ||
23 | may call schedule()], as NEON or VFP instructions will be executed in a | ||
24 | non-preemptible section for reasons outlined below. | ||
25 | |||
26 | |||
27 | Lazy preserve and restore | ||
28 | ------------------------- | ||
29 | The NEON/VFP register file is managed using lazy preserve (on UP systems) and | ||
30 | lazy restore (on both SMP and UP systems). This means that the register file is | ||
31 | kept 'live', and is only preserved and restored when multiple tasks are | ||
32 | contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to | ||
33 | another core). Lazy restore is implemented by disabling the NEON/VFP unit after | ||
34 | every context switch, resulting in a trap when subsequently a NEON/VFP | ||
35 | instruction is issued, allowing the kernel to step in and perform the restore if | ||
36 | necessary. | ||
37 | |||
38 | Any use of the NEON/VFP unit in kernel mode should not interfere with this, so | ||
39 | it is required to do an 'eager' preserve of the NEON/VFP register file, and | ||
40 | enable the NEON/VFP unit explicitly so no exceptions are generated on first | ||
41 | subsequent use. This is handled by the function kernel_neon_begin(), which | ||
42 | should be called before any kernel mode NEON or VFP instructions are issued. | ||
43 | Likewise, the NEON/VFP unit should be disabled again after use to make sure user | ||
44 | mode will hit the lazy restore trap upon next use. This is handled by the | ||
45 | function kernel_neon_end(). | ||
46 | |||
47 | |||
48 | Interruptions in kernel mode | ||
49 | ---------------------------- | ||
50 | For reasons of performance and simplicity, it was decided that there shall be no | ||
51 | preserve/restore mechanism for the kernel mode NEON/VFP register contents. This | ||
52 | implies that interruptions of a kernel mode NEON section can only be allowed if | ||
53 | they are guaranteed not to touch the NEON/VFP registers. For this reason, the | ||
54 | following rules and restrictions apply in the kernel: | ||
55 | * NEON/VFP code is not allowed in interrupt context; | ||
56 | * NEON/VFP code is not allowed to sleep; | ||
57 | * NEON/VFP code is executed with preemption disabled. | ||
58 | |||
59 | If latency is a concern, it is possible to put back to back calls to | ||
60 | kernel_neon_end() and kernel_neon_begin() in places in your code where none of | ||
61 | the NEON registers are live. (Additional calls to kernel_neon_begin() should be | ||
62 | reasonably cheap if no context switch occurred in the meantime) | ||
63 | |||
64 | |||
65 | VFP and support code | ||
66 | -------------------- | ||
67 | Earlier versions of VFP (prior to version 3) rely on software support for things | ||
68 | like IEEE-754 compliant underflow handling etc. When the VFP unit needs such | ||
69 | software assistance, it signals the kernel by raising an undefined instruction | ||
70 | exception. The kernel responds by inspecting the VFP control registers and the | ||
71 | current instruction and arguments, and emulates the instruction in software. | ||
72 | |||
73 | Such software assistance is currently not implemented for VFP instructions | ||
74 | executed in kernel mode. If such a condition is encountered, the kernel will | ||
75 | fail and generate an OOPS. | ||
76 | |||
77 | |||
78 | Separating NEON code from ordinary code | ||
79 | --------------------------------------- | ||
80 | The compiler is not aware of the special significance of kernel_neon_begin() and | ||
81 | kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions | ||
82 | between calls to these respective functions. Furthermore, GCC may generate NEON | ||
83 | instructions of its own at -O3 level if -mfpu=neon is selected, and even if the | ||
84 | kernel is currently compiled at -O2, future changes may result in NEON/VFP | ||
85 | instructions appearing in unexpected places if no special care is taken. | ||
86 | |||
87 | Therefore, the recommended and only supported way of using NEON/VFP in the | ||
88 | kernel is by adhering to the following rules: | ||
89 | * isolate the NEON code in a separate compilation unit and compile it with | ||
90 | '-mfpu=neon -mfloat-abi=softfp'; | ||
91 | * issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls | ||
92 | into the unit containing the NEON code from a compilation unit which is *not* | ||
93 | built with the GCC flag '-mfpu=neon' set. | ||
94 | |||
95 | As the kernel is compiled with '-msoft-float', the above will guarantee that | ||
96 | both NEON and VFP instructions will only ever appear in designated compilation | ||
97 | units at any optimization level. | ||
98 | |||
99 | |||
100 | NEON assembler | ||
101 | -------------- | ||
102 | NEON assembler is supported with no additional caveats as long as the rules | ||
103 | above are followed. | ||
104 | |||
105 | |||
106 | NEON code generated by GCC | ||
107 | -------------------------- | ||
108 | The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit | ||
109 | parallelism, and generates NEON code from ordinary C source code. This is fully | ||
110 | supported as long as the rules above are followed. | ||
111 | |||
112 | |||
113 | NEON intrinsics | ||
114 | --------------- | ||
115 | NEON intrinsics are also supported. However, as code using NEON intrinsics | ||
116 | relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should | ||
117 | observe the following in addition to the rules above: | ||
118 | * Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC | ||
119 | uses its builtin version of <stdint.h> (this is a C99 header which the kernel | ||
120 | does not supply); | ||
121 | * Include <arm_neon.h> last, or at least after <linux/types.h> | ||