1 files changed, 134 insertions, 0 deletions
diff --git a/Documentation/fujitsu/frv/atomic-ops.txt b/Documentation/fujitsu/frv/atomic-ops.txt
new file mode 100644
index 000000000000..96638e9b9fe0
--- /dev/null
+++ b/Documentation/fujitsu/frv/atomic-ops.txt
@@ -0,0 +1,134 @@
+                               =====================================
+                               FUJITSU FR-V KERNEL ATOMIC OPERATIONS
+                               =====================================
+On the FR-V CPUs, there is only one atomic Read-Modify-Write operation: the SWAP/SWAPI
+instruction. Unfortunately, this alone can't be used to implement the following operations:
+ (*) Atomic add to memory
+ (*) Atomic subtract from memory
+ (*) Atomic bit modification (set, clear or invert)
+ (*) Atomic compare and exchange
+On such CPUs, the standard way of emulating such operations in uniprocessor mode is to disable
+interrupts, but on the FR-V CPUs, modifying the PSR takes a lot of clock cycles, and it has to be
+done twice (once to disable interrupts and once to re-enable them). This means the CPU runs for a
+relatively long time with interrupts disabled, which can significantly increase interrupt latency.
+=============
+NEW ALGORITHM
+=============
+To get around this, the following algorithm has been implemented. It operates in a way similar to
+the LL/SC instruction pairs supported on a number of platforms.
+ (*) The CCCR.CC3 register is reserved within the kernel to act as an atomic modify abort flag.
+ (*) In the exception prologues run on kernel->kernel entry, CCCR.CC3 is set to 0 (Undefined
+     state).
+ (*) All atomic operations can then be broken down into the following algorithm:
+     (1) Set ICC3.Z to true and set CC3 to True (ORCC/CKEQ/ORCR).
+     (2) Load the value currently in the memory to be modified into a register.
+     (3) Make changes to the value.
+     (4) If CC3 is still True, simultaneously and atomically (by VLIW packing):
+         (a) Store the modified value back to memory.
+         (b) Set ICC3.Z to false (CORCC on GR29 is sufficient for this - GR29 holds the current
+             task pointer in the kernel, and so is guaranteed to be non-zero).
+     (5) If ICC3.Z is still true, go back to step (1).
+This works in a non-SMP environment because any interrupt or other exception that happens between
+steps (1) and (4) will set CC3 to the Undefined state, thus aborting the store in (4a) and leaving
+the Z flag in ICC3 set, which in turn causes step (5) to loop back to step (1).
+This algorithm suffers from two problems:
+ (1) The condition CCCR.CC3 is cleared unconditionally by an exception, irrespective of whether or
+     not any changes were made to the target memory location during that exception, so an
+     unrelated exception still forces a retry.
+ (2) The branch from step (5) back to step (1) may have to happen more than once before the store
+     manages to take place. In theory, this loop could cycle forever if interrupts keep arriving
+     between steps (1) and (4), but this is unlikely.
+=======
+EXAMPLE
+=======
+Taking an example from include/asm-frv/atomic.h:
+        static inline int atomic_add_return(int i, atomic_t *v)
+        {
+                unsigned long val;
+                asm("0:                                         \n"
+It starts by setting ICC3.Z to true for later use, and also transforming that into CC3 being in the
+True state.
+                    "   orcc            gr0,gr0,gr0,icc3        \n"     <-- (1)
+                    "   ckeq            icc3,cc7                \n"     <-- (1)
+Then it does the load. Note that the final phase of step (1) is done at the same time as the
+load. The VLIW packing ensures they are done simultaneously. The ".p" on the load must not be
+removed without swapping the order of these two instructions.
+                    "   ld.p            %M0,%1                  \n"     <-- (2)
+                    "   orcr            cc7,cc7,cc3             \n"     <-- (1)
+Then the proposed modification is generated. Note that the old value can be retained if required
+(such as in test_and_set_bit()).
+                    "   add%I2          %1,%2,%1                \n"     <-- (3)
+Then it attempts to store the value back, contingent on no exception having cleared CC3 since it
+was set to True.
+                    "   cst.p           %1,%M0          ,cc3,#1 \n"     <-- (4a)
+It simultaneously records the success or failure of the store in ICC3.Z.
+                    "   corcc           gr29,gr29,gr0   ,cc3,#1 \n"     <-- (4b)
+The branch back to the start is then taken if the operation was aborted.
+                    "   beq             icc3,#0,0b              \n"     <-- (5)
+                    : "+U"(v->counter), "=&r"(val)
+                    : "NPr"(i)
+                    : "memory", "cc7", "cc3", "icc3"
+                    );
+                return val;
+        }
+=============
+CONFIGURATION
+=============
+The atomic ops implementation can be made inline or out-of-line by changing the
+CONFIG_FRV_OUTOFLINE_ATOMIC_OPS configuration variable. Making it out-of-line has a number of
+advantages:
+ - The resulting kernel image may be smaller
+ - Debugging is easier as atomic ops can just be stepped over and they can be breakpointed
+Keeping it inline also has a number of advantages:
+ - The resulting kernel may be faster
+   - no out-of-line function calls need to be made
+   - the compiler doesn't have half its registers clobbered by making a call
+The out-of-line implementations live in arch/frv/lib/atomic-ops.S.
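
The retry loop described under NEW ALGORITHM can be modelled in portable C to make the control flow
easier to follow. This is only a sketch under stated assumptions, not the kernel implementation:
the names cc3_true, pending_interrupts, retries, simulated_exception(), model_atomic_add_return()
and run_demo() are hypothetical, the exception is injected by hand between steps (3) and (4), and
the VLIW packing that makes steps (4a) and (4b) indivisible on real hardware is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for hardware state; none of these names exist in the kernel. */
static bool cc3_true;           /* models CCCR.CC3: true = True, false = Undefined */
static int pending_interrupts;  /* simulated interrupts still to be injected */
static int retries;             /* number of times step (5) looped back to step (1) */

/* Models an exception: the prologue clears CC3, and this handler also changes the counter. */
static void simulated_exception(int *v)
{
        cc3_true = false;
        *v += 100;
}

/* Software model of the five-step algorithm; step numbers match the text above. */
static int model_atomic_add_return(int i, int *v)
{
        bool icc3_z;
        int val;

        do {
                icc3_z = true;                  /* (1) set ICC3.Z to true */
                cc3_true = true;                /* (1) set CC3 to True */
                val = *v;                       /* (2) load the current value */
                val += i;                       /* (3) make the change */

                /* Assumption: an interrupt may arrive here, before the store. */
                if (pending_interrupts > 0) {
                        pending_interrupts--;
                        simulated_exception(v);
                }

                if (cc3_true) {                 /* (4) only if CC3 is still True */
                        *v = val;               /* (4a) store the modified value */
                        icc3_z = false;         /* (4b) record success in ICC3.Z */
                }
                if (icc3_z)
                        retries++;
        } while (icc3_z);                       /* (5) retry while ICC3.Z is still true */

        return val;
}

/* Runs the model with two interrupts arriving during the first two attempts. */
int run_demo(void)
{
        int counter = 5;
        int result;

        pending_interrupts = 2;
        retries = 0;
        result = model_atomic_add_return(3, &counter);

        assert(retries == 2);                   /* both interrupted attempts were redone */
        assert(counter == 5 + 2 * 100 + 3);     /* no update was lost */
        return result;
}
```

In this model run_demo() returns 208: the two interrupted attempts are discarded without storing,
and the third attempt reloads the value left by the handlers before adding 3. The first problem
listed under NEW ALGORITHM is also visible here, since the retry would happen even if
simulated_exception() left the counter untouched.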
