Diffstat (limited to 'Documentation/fujitsu/frv/atomic-ops.txt')
-rw-r--r-- | Documentation/fujitsu/frv/atomic-ops.txt | 134 |
1 files changed, 134 insertions, 0 deletions
diff --git a/Documentation/fujitsu/frv/atomic-ops.txt b/Documentation/fujitsu/frv/atomic-ops.txt
new file mode 100644
index 000000000000..96638e9b9fe0
--- /dev/null
+++ b/Documentation/fujitsu/frv/atomic-ops.txt
@@ -0,0 +1,134 @@
=====================================
FUJITSU FR-V KERNEL ATOMIC OPERATIONS
=====================================

On the FR-V CPUs, there is only one atomic Read-Modify-Write operation: the SWAP/SWAPI
instruction. Unfortunately, this alone can't be used to implement the following operations:

 (*) Atomic add to memory

 (*) Atomic subtract from memory

 (*) Atomic bit modification (set, clear or invert)

 (*) Atomic compare and exchange

On such CPUs, the standard way of emulating such operations in uniprocessor mode is to disable
interrupts, but on the FR-V CPUs, modifying the PSR takes a lot of clock cycles, and it has to be
done twice (once to disable interrupts and once to re-enable them). This means the CPU runs for a
relatively long time with interrupts disabled, which can significantly increase interrupt latency.
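
For comparison, the interrupt-disabling approach described above can be modelled in portable C. This
is only a sketch: sim_irq_save(), sim_irq_restore() and the counters are hypothetical stand-ins for
the PSR manipulation, not kernel functions, and nothing here touches real hardware state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the interrupt-enable state held in the PSR. */
static bool sim_irqs_enabled = true;

/* Counts simulated PSR writes, the part the text describes as expensive. */
static unsigned int sim_psr_writes;

/* Disable "interrupts" and return the previous state. */
static bool sim_irq_save(void)
{
    bool was_enabled = sim_irqs_enabled;

    sim_irqs_enabled = false;
    sim_psr_writes++;
    return was_enabled;
}

/* Put the "interrupt" state back to what sim_irq_save() returned. */
static void sim_irq_restore(bool was_enabled)
{
    sim_irqs_enabled = was_enabled;
    sim_psr_writes++;
}

/* Uniprocessor add-and-return: nothing can interleave while interrupts are off. */
static int sim_add_return_irqoff(int i, int *counter)
{
    bool flags = sim_irq_save();        /* first PSR write */
    int val;

    assert(!sim_irqs_enabled);          /* the critical section runs with IRQs off */
    val = *counter + i;
    *counter = val;
    sim_irq_restore(flags);             /* second PSR write */
    return val;
}
```

Every call pays for two simulated PSR writes, which is the cost the algorithm in the next section
is designed to avoid.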


=============
NEW ALGORITHM
=============

To get around this, the following algorithm has been implemented. It operates in a way similar to
the LL/SC instruction pairs supported on a number of platforms.

 (*) The CCCR.CC3 condition is reserved within the kernel to act as an atomic modify abort flag.

 (*) In the exception prologues run on kernel->kernel entry, CCCR.CC3 is set to 0 (Undefined
     state).

 (*) All atomic operations can then be broken down into the following algorithm:

     (1) Set ICC3.Z to true and set CC3 to True (ORCC/CKEQ/ORCR).

     (2) Load the value currently in the memory to be modified into a register.

     (3) Make changes to the value.

     (4) If CC3 is still True, simultaneously and atomically (by VLIW packing):

         (a) Store the modified value back to memory.

         (b) Set ICC3.Z to false (CORCC on GR29 is sufficient for this - GR29 holds the current
             task pointer in the kernel, and so is guaranteed to be non-zero).

     (5) If ICC3.Z is still true, go back to step (1).

This works in a non-SMP environment because any interrupt or other exception that happens between
steps (1) and (4) will set CC3 to the Undefined state, thus aborting the store in (4a) and leaving
the Z flag in ICC3 set, which in turn causes step (5) to loop back to step (1).


This algorithm suffers from two problems:

 (1) The condition CCCR.CC3 is cleared unconditionally by an exception, irrespective of whether or
     not any changes were made to the target memory location during that exception. An operation
     may therefore be retried even though nothing touched its target.

 (2) The branch from step (5) back to step (1) may have to happen more than once until the store
     manages to take place. In theory, this loop could cycle forever if interrupts keep arriving
     before the store can complete, but this is unlikely.
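
The steps and the two problems above can be modelled in portable C. The following is a simulation
sketch only: struct sim_cpu, sim_exception() and the handler hooks are hypothetical names invented
for illustration, and the simultaneous store/flag update of step (4) is modelled by two ordinary
statements that no simulated interrupt can split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated CPU state; illustrative names, not kernel identifiers. */
struct sim_cpu {
    bool cc3_true;          /* models CCCR.CC3: true = True, false = Undefined */
    bool icc3_z;            /* models ICC3.Z */
    unsigned int retries;   /* how often step (5) looped back */
};

/* Hypothetical hook: the work done by an interrupt handler. */
typedef void (*sim_irq_fn)(int *target);

static void sim_irq_noop(int *target) { (void)target; }   /* leaves the target alone */
static void sim_irq_bump(int *target) { *target += 100; } /* modifies the target */

/* An exception resets CC3 unconditionally, then runs its handler. */
static void sim_exception(struct sim_cpu *cpu, sim_irq_fn handler, int *target)
{
    cpu->cc3_true = false;
    handler(target);
}

/*
 * Add i to *target following steps (1) to (5). irqs[n] is the interrupt to
 * inject during attempt n; a NULL entry or a short array means no interrupt.
 */
static int sim_add_return(struct sim_cpu *cpu, int i, int *target,
                          const sim_irq_fn *irqs, size_t nirqs)
{
    size_t attempt = 0;
    int val;

    cpu->retries = 0;
    for (;;) {
        cpu->icc3_z = true;                     /* (1) */
        cpu->cc3_true = true;                   /* (1) */
        val = *target;                          /* (2) */

        /* An interrupt may land here, between the load and the store. */
        if (attempt < nirqs && irqs[attempt])
            sim_exception(cpu, irqs[attempt], target);
        attempt++;

        val += i;                               /* (3) */
        if (cpu->cc3_true) {                    /* (4) */
            *target = val;                      /* (4a) */
            cpu->icc3_z = false;                /* (4b) */
        }
        if (!cpu->icc3_z)                       /* (5) */
            return val;
        cpu->retries++;
    }
}
```

Injecting sim_irq_bump shows the abort protecting the update: the stale value is discarded and the
add is redone on the new contents. Injecting sim_irq_noop shows problem (1): the retry happens even
though the target was never modified.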


=======
EXAMPLE
=======

Taking an example from include/asm-frv/atomic.h:

    static inline int atomic_add_return(int i, atomic_t *v)
    {
        unsigned long val;

        asm("0:                                         \n"

It starts by setting ICC3.Z to true for later use, and then derives the True state of CC3 from
that flag.

            "   orcc        gr0,gr0,gr0,icc3            \n"     <-- (1)
            "   ckeq        icc3,cc7                    \n"     <-- (1)

Then it does the load. Note that the final phase of step (1) is done at the same time as the
load. The VLIW packing ensures they are done simultaneously. The ".p" on the load must not be
removed without swapping the order of these two instructions.

            "   ld.p        %M0,%1                      \n"     <-- (2)
            "   orcr        cc7,cc7,cc3                 \n"     <-- (1)

Then the proposed modification is generated. Note that the old value can be retained if required
(such as in test_and_set_bit()).

            "   add%I2      %1,%2,%1                    \n"     <-- (3)

Then it attempts to store the value back, contingent on no exception having cleared CC3 since it
was set to True.

            "   cst.p       %1,%M0          ,cc3,#1     \n"     <-- (4a)

It simultaneously records the success or failure of the store in ICC3.Z.

            "   corcc       gr29,gr29,gr0   ,cc3,#1     \n"     <-- (4b)

The branch back to the start is then taken if the operation was aborted.

            "   beq         icc3,#0,0b                  \n"     <-- (5)
            : "+U"(v->counter), "=&r"(val)
            : "NPr"(i)
            : "memory", "cc7", "cc3", "icc3"
            );

        return val;
    }
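
The remark about retaining the old value can be illustrated with a portable analogue. The sketch
below is not FR-V code and not the kernel's test_and_set_bit(): it uses the C11 <stdatomic.h>
compare-and-exchange loop as a stand-in for the conditional store, and sketch_test_and_set_bit() is
a name invented for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative analogue only; returns whether the bit was already set. */
static bool sketch_test_and_set_bit(unsigned int nr, atomic_ulong *word)
{
    unsigned long mask, old;

    assert(nr < sizeof(unsigned long) * CHAR_BIT);
    mask = 1UL << nr;

    old = atomic_load(word);                    /* comparable to step (2) */

    /*
     * The weak compare-and-exchange may fail, including spuriously, much as
     * the conditional store is aborted when CC3 has been cleared. On failure
     * it reloads old with the current contents, so each retry recomputes the
     * new value from fresh data.
     */
    while (!atomic_compare_exchange_weak(word, &old, old | mask))
        ;

    return (old & mask) != 0;                   /* the old value is kept for the result */
}
```

The loop has the same shape as the assembly above: load, compute, attempt a store that can be
refused, and retry until it is accepted.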


=============
CONFIGURATION
=============

The atomic ops implementation can be made inline or out-of-line by changing the
CONFIG_FRV_OUTOFLINE_ATOMIC_OPS configuration variable. Making it out-of-line has a number of
advantages:

 - The resulting kernel image may be smaller
 - Debugging is easier, as atomic ops can just be stepped over and breakpoints can be set on them

Keeping it inline also has a number of advantages:

 - The resulting kernel may be faster, because:
   - no out-of-line function calls need to be made
   - the compiler doesn't have half its registers clobbered by making a call

The out-of-line implementations live in arch/frv/lib/atomic-ops.S.
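
As a rough illustration of the trade-off, the sketch below shows one way a configuration macro could
switch a function between the two forms. Only the name CONFIG_FRV_OUTOFLINE_ATOMIC_OPS comes from
the text; the SKETCH_ATOMIC_LINKAGE macro, the function name and the plain C body are assumptions
made for this example, and the real out-of-line versions are assembly, not C.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Build with -DCONFIG_FRV_OUTOFLINE_ATOMIC_OPS to select the out-of-line
 * form; leave it undefined for the inline form.
 */
#ifdef CONFIG_FRV_OUTOFLINE_ATOMIC_OPS
/* One shared copy reached by a function call: smaller image, easy to breakpoint. */
#define SKETCH_ATOMIC_LINKAGE
int sketch_atomic_add_return(int i, int *counter);
#else
/* Body expanded at each call site: no call overhead, no call-clobbered registers. */
#define SKETCH_ATOMIC_LINKAGE static inline
#endif

/* Placeholder body: a plain add standing in for the real atomic sequence. */
SKETCH_ATOMIC_LINKAGE int sketch_atomic_add_return(int i, int *counter)
{
    assert(counter != NULL);
    *counter += i;
    return *counter;
}
```

Callers are written the same way in both configurations; only the linkage of the operation
changes.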