locking/lglocks: Add documentation of current lglocks implementation

Local/global locks are currently not documented anywhere other than in
a somewhat out-of-date LWN article - this is an attempt to document
the current state of lglocks.

This patch is against linux-next 3.18.0-rc6

Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Carsten Emde <c.emde@osadl.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20141208083326.GA29895@opentech.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
author: Nicholas Mc Guire <der.herr@hofr.at> 2014-12-08 03:33:26 -0500
committer: Ingo Molnar <mingo@kernel.org> 2014-12-08 03:55:00 -0500
commit: 9fd7fc34cfcaf9f6c932ee1710cce83da3b7bd59 (patch)
tree: e57d5717ef514533d4885cc0748c6a7c7150f136 /Documentation
parent: 6f008e72cd111a119b5d8de8c5438d892aae99eb (diff)
1 files changed, 166 insertions, 0 deletions
diff --git a/Documentation/locking/lglock.txt b/Documentation/locking/lglock.txt
new file mode 100644
index 000000000000..a6971e34fabe
--- /dev/null
+++ b/Documentation/locking/lglock.txt
@@ -0,0 +1,166 @@
+lglock - local/global locks for mostly local access patterns
+------------------------------------------------------------
+Origin: Nick Piggin's VFS scalability series introduced during
+        2.6.35++ [1] [2]
+Location: kernel/locking/lglock.c
+        include/linux/lglock.h
+Users: currently only the VFS and stop_machine related code
+Design Goal:
+------------
+Improve scalability of globally used large data sets that are
+distributed over all CPUs as per_cpu elements.
+To manage global data structures that are partitioned over all CPUs
+as per_cpu elements but can mostly be handled by CPU-local actions,
+lglocks are used where the majority of accesses are CPU-local reads,
+with occasional CPU-local writes and very infrequent global write
+access.
+* deal with things locally whenever possible
+        - very fast access to the local per_cpu data
+        - reasonably fast access to specific per_cpu data on a different
+          CPU
+* while making global action possible when needed
+        - by expensive access to all CPUs' locks - effectively
+          resulting in a globally visible critical section.
+Design:
+-------
+Basically it is an array of per_cpu spinlocks, with
+lg_local_lock/unlock accessing the local CPU's lock object and
+lg_local_lock_cpu/unlock_cpu accessing a remote CPU's lock object.
+lg_local_lock has to disable preemption as migration protection so
+that the reference to the local CPU's lock does not go out of scope.
+Because lg_local_lock/unlock only touches CPU-local resources, it
+is fast. Taking the lock of a different CPU is more expensive but
+still relatively cheap.
+One can relax the migration constraints by acquiring the current
+CPU's lock with lg_local_lock_cpu, remembering the cpu, and releasing
+that lock at the end of the critical section even if migrated. This
+should give most of the performance benefits without inhibiting
+migration, though it needs careful consideration of lglock nesting
+and of deadlocks with lg_global_lock.
+The lg_global_lock/unlock locks all underlying spinlocks of all
+possible CPUs (including those off-line). The preemption disable/enable
+are needed in the non-RT kernels to prevent deadlocks like:
+                     on cpu 1
+              task A          task B
+         lg_global_lock
+           got cpu 0 lock
+                 <<<< preempt <<<<
+                         lg_local_lock_cpu for cpu 0
+                           spin on cpu 0 lock
+On -RT this deadlock scenario is resolved by the arch_spin_locks in the
+lglocks being replaced by rt_mutexes which resolve the above deadlock
+by boosting the lock-holder.
+Implementation:
+---------------
+The initial lglock implementation from Nick Piggin used some complex
+macros to generate the lglock/brlock in lglock.h - they were later
+turned into a set of functions by Andi Kleen [7]. The change to functions
+was motivated by the presence of multiple lock users and also by them
+being easier to maintain than the generating macros. This change to
+functions is also the basis for eliminating the restriction that
+lglocks cannot be initialized in kernel modules (the remaining problem
+is that locks are not explicitly initialized - see lockdep-design.txt).
+Declaration and initialization:
+-------------------------------
+  #include <linux/lglock.h>
+  DEFINE_LGLOCK(name);
+  or:
+  DEFINE_STATIC_LGLOCK(name);
+  lg_lock_init(&name, "lockdep_name_string");
+  On UP this is mapped to DEFINE_SPINLOCK(name) in both cases. Note
+  also that as of 3.18-rc6 all declarations in use are of the _STATIC_
+  variant (and it seems that the non-static variant was never in use).
+  lg_lock_init only initializes the lockdep map.
+Usage:
+------
+From the locking semantics it is a spinlock. It could be called a
+locality aware spinlock. lg_local_* behaves like a per_cpu
+spinlock and lg_global_* like a global spinlock.
+No surprises in the API.
+  lg_local_lock(*lglock);
+     access to protected per_cpu object on this CPU
+  lg_local_unlock(*lglock);
+  lg_local_lock_cpu(*lglock, cpu);
+     access to protected per_cpu object on other CPU cpu
+  lg_local_unlock_cpu(*lglock, cpu);
+  lg_global_lock(*lglock);
+     access all protected per_cpu objects on all CPUs
+  lg_global_unlock(*lglock);
+  There are no _trylock variants of the lglocks.
+Note that lg_global_lock/unlock has to iterate over all possible
+CPUs rather than only the CPUs actually present, as otherwise a CPU
+could go off-line with a held lock [4]. This makes it very expensive.
+A discussion of these issues can be found at [5].
+Constraints:
+------------
+  * currently the declaration of lglocks in kernel modules is not
+    possible, though this should be doable with little change.
+  * lglocks are not recursive.
+  * suitable for code that can do most operations on the CPU local
+    data and will very rarely need the global lock
+  * lg_global_lock/unlock is *very* expensive and does not scale
+  * on UP systems all lg_* primitives are simply spinlocks
+  * in PREEMPT_RT the spinlock becomes an rt-mutex and can sleep but
+    does not change the task's state while sleeping [6].
+  * in PREEMPT_RT the preempt_disable/enable in lg_local_lock/unlock
+    is downgraded to a migrate_disable/enable, the other
+    preempt_disable/enable are downgraded to barriers [6].
+    The deadlock noted for non-RT above is resolved due to rt_mutexes
+    boosting the lock-holder in this case which arch_spin_locks do
+    not do.
+lglocks were designed for very specific problems in the VFS and are
+probably only the right answer in these corner cases. Any new user that
+looks at lglocks probably wants to look at the seqlock and RCU
+alternatives as her first choice. There are also efforts to resolve the
+RCU issues that currently prevent using RCU in place of the few
+remaining lglocks.
+Note on brlock history:
+-----------------------
+The 'Big Reader' read-write spinlocks were originally introduced by
+Ingo Molnar in 2000 (2.4/2.5 kernel series) and removed in 2003. They
+were later reintroduced by the VFS scalability patch set in the 2.6
+series as the "big reader lock" brlock [2] variant of lglock. Its users
+were converted to seqlock primitives or RCU based primitives, as was
+suggested in [3] in 2003, and the brlock was entirely removed in the
+3.13 kernel series.
+Link: 1 http://lkml.org/lkml/2010/8/2/81
+Link: 2 http://lwn.net/Articles/401738/
+Link: 3 http://lkml.org/lkml/2003/3/9/205
+Link: 4 https://lkml.org/lkml/2011/8/24/185
+Link: 5 http://lkml.org/lkml/2011/12/18/189
+Link: 6 https://www.kernel.org/pub/linux/kernel/projects/rt/
+        patch series - lglocks-rt.patch.patch
+Link: 7 http://lkml.org/lkml/2012/3/5/26
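
For readers who want to experiment with the locking pattern described in
the Design and Usage sections, the following is a minimal userspace
sketch, not the kernel implementation. It models an lglock as an array of
C11 atomic spinlocks, one per possible CPU. All names (lglock_model,
percpu_counter_model, the model_* functions, MODEL_NR_CPUS) are made up
for this illustration, and the CPU is passed in explicitly, so only the
lg_local_lock_cpu and lg_global_lock variants are modelled. The sketch
leaves out preemption and migration handling, lockdep and CPU hotplug,
which the real code has to deal with.

```c
/* Userspace model of the lglock pattern; this is not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* assumed fixed number of "possible" CPUs */

/* One spinlock per possible CPU, standing in for the per_cpu spinlocks. */
struct lglock_model {
	atomic_bool locked[MODEL_NR_CPUS];
};

/* Hypothetical protected data: one counter slot per CPU. */
struct percpu_counter_model {
	struct lglock_model lock;
	long count[MODEL_NR_CPUS];
};

static void model_spin_lock(atomic_bool *l)
{
	/* Spin until the previous value was false, i.e. we took the lock. */
	while (atomic_exchange_explicit(l, true, memory_order_acquire))
		;
}

static void model_spin_unlock(atomic_bool *l)
{
	atomic_store_explicit(l, false, memory_order_release);
}

/* Counterpart of lg_local_lock_cpu(): take the lock of one given CPU. */
static void model_local_lock_cpu(struct lglock_model *lg, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_spin_lock(&lg->locked[cpu]);
}

static void model_local_unlock_cpu(struct lglock_model *lg, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_spin_unlock(&lg->locked[cpu]);
}

/* Counterpart of lg_global_lock(): take every CPU's lock in fixed order. */
static void model_global_lock(struct lglock_model *lg)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_spin_lock(&lg->locked[cpu]);
}

static void model_global_unlock(struct lglock_model *lg)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_spin_unlock(&lg->locked[cpu]);
}

/* Number of per-CPU locks currently held; only for inspecting the model. */
static int model_held_count(struct lglock_model *lg)
{
	int held = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (atomic_load(&lg->locked[cpu]))
			held++;
	return held;
}

static void model_counter_init(struct percpu_counter_model *c)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		atomic_init(&c->lock.locked[cpu], false);
		c->count[cpu] = 0;
	}
}

/* Fast path: touch only the slot and the lock of the given CPU. */
static void model_counter_inc(struct percpu_counter_model *c, int cpu)
{
	model_local_lock_cpu(&c->lock, cpu);
	c->count[cpu]++;
	model_local_unlock_cpu(&c->lock, cpu);
}

/* Slow path: a consistent sum over all slots needs all per-CPU locks. */
static long model_counter_sum(struct percpu_counter_model *c)
{
	long sum = 0;

	model_global_lock(&c->lock);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += c->count[cpu];
	model_global_unlock(&c->lock);
	return sum;
}
```

In this model an increment takes exactly one of the MODEL_NR_CPUS locks,
while the sum takes all of them, which is why the global path is the
expensive one and should stay rare. The global lock always walks the CPUs
in the same ascending order, so two global lockers cannot deadlock against
each other.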