aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/RCU/Design/Data-Structures/Data-Structures.html
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/RCU/Design/Data-Structures/Data-Structures.html')
-rw-r--r--Documentation/RCU/Design/Data-Structures/Data-Structures.html173
1 files changed, 55 insertions, 118 deletions
diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
index 1d2051c0c3fc..18f179807563 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html
@@ -23,8 +23,6 @@ to each other.
23 The <tt>rcu_segcblist</tt> Structure</a> 23 The <tt>rcu_segcblist</tt> Structure</a>
24<li> <a href="#The rcu_data Structure"> 24<li> <a href="#The rcu_data Structure">
25 The <tt>rcu_data</tt> Structure</a> 25 The <tt>rcu_data</tt> Structure</a>
26<li> <a href="#The rcu_dynticks Structure">
27 The <tt>rcu_dynticks</tt> Structure</a>
28<li> <a href="#The rcu_head Structure"> 26<li> <a href="#The rcu_head Structure">
29 The <tt>rcu_head</tt> Structure</a> 27 The <tt>rcu_head</tt> Structure</a>
30<li> <a href="#RCU-Specific Fields in the task_struct Structure"> 28<li> <a href="#RCU-Specific Fields in the task_struct Structure">
@@ -127,9 +125,11 @@ CPUs, RCU would configure the <tt>rcu_node</tt> tree as follows:
127</p><p>RCU currently permits up to a four-level tree, which on a 64-bit system 125</p><p>RCU currently permits up to a four-level tree, which on a 64-bit system
128accommodates up to 4,194,304 CPUs, though only a mere 524,288 CPUs for 126accommodates up to 4,194,304 CPUs, though only a mere 524,288 CPUs for
12932-bit systems. 12732-bit systems.
130On the other hand, you can set <tt>CONFIG_RCU_FANOUT</tt> to be 128On the other hand, you can set both <tt>CONFIG_RCU_FANOUT</tt> and
131as small as 2 if you wish, which would permit only 16 CPUs, which 129<tt>CONFIG_RCU_FANOUT_LEAF</tt> to be as small as 2, which would result
132is useful for testing. 130in a 16-CPU test using a 4-level tree.
131This can be useful for testing large-system capabilities on small test
132machines.
133 133
134</p><p>This multi-level combining tree allows us to get most of the 134</p><p>This multi-level combining tree allows us to get most of the
135performance and scalability 135performance and scalability
@@ -154,44 +154,9 @@ on that root <tt>rcu_node</tt> structure remains acceptably low.
154keeping lock contention under control at all tree levels regardless 154keeping lock contention under control at all tree levels regardless
155of the level of loading on the system. 155of the level of loading on the system.
156 156
157</p><p>The Linux kernel actually supports multiple flavors of RCU
158running concurrently, so RCU builds separate data structures for each
159flavor.
160For example, for <tt>CONFIG_TREE_RCU=y</tt> kernels, RCU provides
161rcu_sched and rcu_bh, as shown below:
162
163</p><p><img src="BigTreeClassicRCUBH.svg" alt="BigTreeClassicRCUBH.svg" width="33%">
164
165</p><p>Energy efficiency is increasingly important, and for that
166reason the Linux kernel provides <tt>CONFIG_NO_HZ_IDLE</tt>, which
167turns off the scheduling-clock interrupts on idle CPUs, which in
168turn allows those CPUs to attain deeper sleep states and to consume
169less energy.
170CPUs whose scheduling-clock interrupts have been turned off are
171said to be in <i>dyntick-idle mode</i>.
172RCU must handle dyntick-idle CPUs specially
173because RCU would otherwise wake up each CPU on every grace period,
174which would defeat the whole purpose of <tt>CONFIG_NO_HZ_IDLE</tt>.
175RCU uses the <tt>rcu_dynticks</tt> structure to track
176which CPUs are in dyntick idle mode, as shown below:
177
178</p><p><img src="BigTreeClassicRCUBHdyntick.svg" alt="BigTreeClassicRCUBHdyntick.svg" width="33%">
179
180</p><p>However, if a CPU is in dyntick-idle mode, it is in that mode
181for all flavors of RCU.
182Therefore, a single <tt>rcu_dynticks</tt> structure is allocated per
183CPU, and all of a given CPU's <tt>rcu_data</tt> structures share
184that <tt>rcu_dynticks</tt>, as shown in the figure.
185
186</p><p>Kernels built with <tt>CONFIG_PREEMPT_RCU</tt> support
187rcu_preempt in addition to rcu_sched and rcu_bh, as shown below:
188
189</p><p><img src="BigTreePreemptRCUBHdyntick.svg" alt="BigTreePreemptRCUBHdyntick.svg" width="35%">
190
191</p><p>RCU updaters wait for normal grace periods by registering 157</p><p>RCU updaters wait for normal grace periods by registering
192RCU callbacks, either directly via <tt>call_rcu()</tt> and 158RCU callbacks, either directly via <tt>call_rcu()</tt> and
193friends (namely <tt>call_rcu_bh()</tt> and <tt>call_rcu_sched()</tt>), 159friends (namely <tt>call_rcu_bh()</tt> and <tt>call_rcu_sched()</tt>),
194there being a separate interface per flavor of RCU)
195or indirectly via <tt>synchronize_rcu()</tt> and friends. 160or indirectly via <tt>synchronize_rcu()</tt> and friends.
196RCU callbacks are represented by <tt>rcu_head</tt> structures, 161RCU callbacks are represented by <tt>rcu_head</tt> structures,
197which are queued on <tt>rcu_data</tt> structures while they are 162which are queued on <tt>rcu_data</tt> structures while they are
@@ -214,9 +179,6 @@ its own synchronization:
214<li> Each <tt>rcu_node</tt> structure has a spinlock. 179<li> Each <tt>rcu_node</tt> structure has a spinlock.
215<li> The fields in <tt>rcu_data</tt> are private to the corresponding 180<li> The fields in <tt>rcu_data</tt> are private to the corresponding
216 CPU, although a few can be read and written by other CPUs. 181 CPU, although a few can be read and written by other CPUs.
217<li> Similarly, the fields in <tt>rcu_dynticks</tt> are private
218 to the corresponding CPU, although a few can be read by
219 other CPUs.
220</ol> 182</ol>
221 183
222<p>It is important to note that different data structures can have 184<p>It is important to note that different data structures can have
@@ -272,11 +234,6 @@ follows:
272 access to this information from the corresponding CPU. 234 access to this information from the corresponding CPU.
273 Finally, this structure records past dyntick-idle state 235 Finally, this structure records past dyntick-idle state
274 for the corresponding CPU and also tracks statistics. 236 for the corresponding CPU and also tracks statistics.
275<li> <tt>rcu_dynticks</tt>:
276 This per-CPU structure tracks the current dyntick-idle
277 state for the corresponding CPU.
278 Unlike the other three structures, the <tt>rcu_dynticks</tt>
279 structure is not replicated per RCU flavor.
280<li> <tt>rcu_head</tt>: 237<li> <tt>rcu_head</tt>:
281 This structure represents RCU callbacks, and is the 238 This structure represents RCU callbacks, and is the
282 only structure allocated and managed by RCU users. 239 only structure allocated and managed by RCU users.
@@ -287,14 +244,14 @@ follows:
287<p>If all you wanted from this article was a general notion of how 244<p>If all you wanted from this article was a general notion of how
288RCU's data structures are related, you are done. 245RCU's data structures are related, you are done.
289Otherwise, each of the following sections give more details on 246Otherwise, each of the following sections give more details on
290the <tt>rcu_state</tt>, <tt>rcu_node</tt>, <tt>rcu_data</tt>, 247the <tt>rcu_state</tt>, <tt>rcu_node</tt> and <tt>rcu_data</tt> data
291and <tt>rcu_dynticks</tt> data structures. 248structures.
292 249
293<h3><a name="The rcu_state Structure"> 250<h3><a name="The rcu_state Structure">
294The <tt>rcu_state</tt> Structure</a></h3> 251The <tt>rcu_state</tt> Structure</a></h3>
295 252
296<p>The <tt>rcu_state</tt> structure is the base structure that 253<p>The <tt>rcu_state</tt> structure is the base structure that
297represents a flavor of RCU. 254represents the state of RCU in the system.
298This structure forms the interconnection between the 255This structure forms the interconnection between the
299<tt>rcu_node</tt> and <tt>rcu_data</tt> structures, 256<tt>rcu_node</tt> and <tt>rcu_data</tt> structures,
300tracks grace periods, contains the lock used to 257tracks grace periods, contains the lock used to
@@ -389,7 +346,7 @@ sequence number.
389The bottom two bits are the state of the current grace period, 346The bottom two bits are the state of the current grace period,
390which can be zero for not yet started or one for in progress. 347which can be zero for not yet started or one for in progress.
391In other words, if the bottom two bits of <tt>-&gt;gp_seq</tt> are 348In other words, if the bottom two bits of <tt>-&gt;gp_seq</tt> are
392zero, the corresponding flavor of RCU is idle. 349zero, then RCU is idle.
393Any other value in the bottom two bits indicates that something is broken. 350Any other value in the bottom two bits indicates that something is broken.
394This field is protected by the root <tt>rcu_node</tt> structure's 351This field is protected by the root <tt>rcu_node</tt> structure's
395<tt>-&gt;lock</tt> field. 352<tt>-&gt;lock</tt> field.
@@ -419,10 +376,10 @@ as follows:
419grace period in jiffies. 376grace period in jiffies.
420It is protected by the root <tt>rcu_node</tt>'s <tt>-&gt;lock</tt>. 377It is protected by the root <tt>rcu_node</tt>'s <tt>-&gt;lock</tt>.
421 378
422<p>The <tt>-&gt;name</tt> field points to the name of the RCU flavor 379<p>The <tt>-&gt;name</tt> and <tt>-&gt;abbr</tt> fields distinguish
423(for example, &ldquo;rcu_sched&rdquo;), and is constant. 380between preemptible RCU (&ldquo;rcu_preempt&rdquo; and &ldquo;p&rdquo;)
424The <tt>-&gt;abbr</tt> field contains a one-character abbreviation, 381and non-preemptible RCU (&ldquo;rcu_sched&rdquo; and &ldquo;s&rdquo;).
425for example, &ldquo;s&rdquo; for RCU-sched. 382These fields are used for diagnostic and tracing purposes.
426 383
427<h3><a name="The rcu_node Structure"> 384<h3><a name="The rcu_node Structure">
428The <tt>rcu_node</tt> Structure</a></h3> 385The <tt>rcu_node</tt> Structure</a></h3>
@@ -971,25 +928,31 @@ this <tt>rcu_segcblist</tt> structure, <i>not</i> the <tt>-&gt;head</tt>
971pointer. 928pointer.
972The reason for this is that all the ready-to-invoke callbacks 929The reason for this is that all the ready-to-invoke callbacks
973(that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted 930(that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted
974all at once at callback-invocation time. 931all at once at callback-invocation time (<tt>rcu_do_batch</tt>), due
932to which <tt>-&gt;head</tt> may be set to NULL if there are no not-done
933callbacks remaining in the <tt>rcu_segcblist</tt>.
975If callback invocation must be postponed, for example, because a 934If callback invocation must be postponed, for example, because a
976high-priority process just woke up on this CPU, then the remaining 935high-priority process just woke up on this CPU, then the remaining
977callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment. 936callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment and
978Either way, the <tt>-&gt;len</tt> and <tt>-&gt;len_lazy</tt> counts 937<tt>-&gt;head</tt> once again points to the start of the segment.
979are adjusted after the corresponding callbacks have been invoked, and so 938In short, the head field can briefly be <tt>NULL</tt> even though the
980again it is the <tt>-&gt;len</tt> count that accurately reflects whether 939CPU has callbacks present the entire time.
981or not there are callbacks associated with this <tt>rcu_segcblist</tt> 940Therefore, it is not appropriate to test the <tt>-&gt;head</tt> pointer
982structure. 941for <tt>NULL</tt>.
942
943<p>In contrast, the <tt>-&gt;len</tt> and <tt>-&gt;len_lazy</tt> counts
944are adjusted only after the corresponding callbacks have been invoked.
945This means that the <tt>-&gt;len</tt> count is zero only if
946the <tt>rcu_segcblist</tt> structure really is devoid of callbacks.
983Of course, off-CPU sampling of the <tt>-&gt;len</tt> count requires 947Of course, off-CPU sampling of the <tt>-&gt;len</tt> count requires
984the use of appropriate synchronization, for example, memory barriers. 948careful use of appropriate synchronization, for example, memory barriers.
985This synchronization can be a bit subtle, particularly in the case 949This synchronization can be a bit subtle, particularly in the case
986of <tt>rcu_barrier()</tt>. 950of <tt>rcu_barrier()</tt>.
987 951
988<h3><a name="The rcu_data Structure"> 952<h3><a name="The rcu_data Structure">
989The <tt>rcu_data</tt> Structure</a></h3> 953The <tt>rcu_data</tt> Structure</a></h3>
990 954
991<p>The <tt>rcu_data</tt> maintains the per-CPU state for the 955<p>The <tt>rcu_data</tt> maintains the per-CPU state for the RCU subsystem.
992corresponding flavor of RCU.
993The fields in this structure may be accessed only from the corresponding 956The fields in this structure may be accessed only from the corresponding
994CPU (and from tracing) unless otherwise stated. 957CPU (and from tracing) unless otherwise stated.
995This structure is the 958This structure is the
@@ -1015,30 +978,19 @@ as follows:
1015 978
1016<pre> 979<pre>
1017 1 int cpu; 980 1 int cpu;
1018 2 struct rcu_state *rsp; 981 2 struct rcu_node *mynode;
1019 3 struct rcu_node *mynode; 982 3 unsigned long grpmask;
1020 4 struct rcu_dynticks *dynticks; 983 4 bool beenonline;
1021 5 unsigned long grpmask;
1022 6 bool beenonline;
1023</pre> 984</pre>
1024 985
1025<p>The <tt>-&gt;cpu</tt> field contains the number of the 986<p>The <tt>-&gt;cpu</tt> field contains the number of the
1026corresponding CPU, the <tt>-&gt;rsp</tt> pointer references 987corresponding CPU and the <tt>-&gt;mynode</tt> field references the
1027the corresponding <tt>rcu_state</tt> structure (and is most frequently 988corresponding <tt>rcu_node</tt> structure.
1028used to locate the name of the corresponding flavor of RCU for tracing),
1029and the <tt>-&gt;mynode</tt> field references the corresponding
1030<tt>rcu_node</tt> structure.
1031The <tt>-&gt;mynode</tt> is used to propagate quiescent states 989The <tt>-&gt;mynode</tt> is used to propagate quiescent states
1032up the combining tree. 990up the combining tree.
1033<p>The <tt>-&gt;dynticks</tt> pointer references the 991These two fields are constant and therefore do not require synchronization.
1034<tt>rcu_dynticks</tt> structure corresponding to this
1035CPU.
1036Recall that a single per-CPU instance of the <tt>rcu_dynticks</tt>
1037structure is shared among all flavors of RCU.
1038These first four fields are constant and therefore require not
1039synchronization.
1040 992
1041</p><p>The <tt>-&gt;grpmask</tt> field indicates the bit in 993<p>The <tt>-&gt;grpmask</tt> field indicates the bit in
1042the <tt>-&gt;mynode-&gt;qsmask</tt> corresponding to this 994the <tt>-&gt;mynode-&gt;qsmask</tt> corresponding to this
1043<tt>rcu_data</tt> structure, and is also used when propagating 995<tt>rcu_data</tt> structure, and is also used when propagating
1044quiescent states. 996quiescent states.
@@ -1057,12 +1009,12 @@ as follows:
1057 3 bool cpu_no_qs; 1009 3 bool cpu_no_qs;
1058 4 bool core_needs_qs; 1010 4 bool core_needs_qs;
1059 5 bool gpwrap; 1011 5 bool gpwrap;
1060 6 unsigned long rcu_qs_ctr_snap;
1061</pre> 1012</pre>
1062 1013
1063<p>The <tt>-&gt;gp_seq</tt> and <tt>-&gt;gp_seq_needed</tt> 1014<p>The <tt>-&gt;gp_seq</tt> field is the counterpart of the field of the same
1064fields are the counterparts of the fields of the same name 1015name in the <tt>rcu_state</tt> and <tt>rcu_node</tt> structures. The
1065in the <tt>rcu_state</tt> and <tt>rcu_node</tt> structures. 1016<tt>-&gt;gp_seq_needed</tt> field is the counterpart of the field of the same
1017name in the rcu_node</tt> structure.
1066They may each lag up to one behind their <tt>rcu_node</tt> 1018They may each lag up to one behind their <tt>rcu_node</tt>
1067counterparts, but in <tt>CONFIG_NO_HZ_IDLE</tt> and 1019counterparts, but in <tt>CONFIG_NO_HZ_IDLE</tt> and
1068<tt>CONFIG_NO_HZ_FULL</tt> kernels can lag 1020<tt>CONFIG_NO_HZ_FULL</tt> kernels can lag
@@ -1103,10 +1055,6 @@ CPU has remained idle for so long that the
1103<tt>gp_seq</tt> counter is in danger of overflow, which 1055<tt>gp_seq</tt> counter is in danger of overflow, which
1104will cause the CPU to disregard the values of its counters on 1056will cause the CPU to disregard the values of its counters on
1105its next exit from idle. 1057its next exit from idle.
1106Finally, the <tt>rcu_qs_ctr_snap</tt> field is used to detect
1107cases where a given operation has resulted in a quiescent state
1108for all flavors of RCU, for example, <tt>cond_resched()</tt>
1109when RCU has indicated a need for quiescent states.
1110 1058
1111<h5>RCU Callback Handling</h5> 1059<h5>RCU Callback Handling</h5>
1112 1060
@@ -1179,26 +1127,22 @@ Finally, the <tt>-&gt;dynticks_fqs</tt> field is used to
1179count the number of times this CPU is determined to be in 1127count the number of times this CPU is determined to be in
1180dyntick-idle state, and is used for tracing and debugging purposes. 1128dyntick-idle state, and is used for tracing and debugging purposes.
1181 1129
1182<h3><a name="The rcu_dynticks Structure"> 1130<p>
1183The <tt>rcu_dynticks</tt> Structure</a></h3> 1131This portion of the rcu_data structure is declared as follows:
1184
1185<p>The <tt>rcu_dynticks</tt> maintains the per-CPU dyntick-idle state
1186for the corresponding CPU.
1187Unlike the other structures, <tt>rcu_dynticks</tt> is not
1188replicated over the different flavors of RCU.
1189The fields in this structure may be accessed only from the corresponding
1190CPU (and from tracing) unless otherwise stated.
1191Its fields are as follows:
1192 1132
1193<pre> 1133<pre>
1194 1 long dynticks_nesting; 1134 1 long dynticks_nesting;
1195 2 long dynticks_nmi_nesting; 1135 2 long dynticks_nmi_nesting;
1196 3 atomic_t dynticks; 1136 3 atomic_t dynticks;
1197 4 bool rcu_need_heavy_qs; 1137 4 bool rcu_need_heavy_qs;
1198 5 unsigned long rcu_qs_ctr; 1138 5 bool rcu_urgent_qs;
1199 6 bool rcu_urgent_qs;
1200</pre> 1139</pre>
1201 1140
1141<p>These fields in the rcu_data structure maintain the per-CPU dyntick-idle
1142state for the corresponding CPU.
1143The fields may be accessed only from the corresponding CPU (and from tracing)
1144unless otherwise stated.
1145
1202<p>The <tt>-&gt;dynticks_nesting</tt> field counts the 1146<p>The <tt>-&gt;dynticks_nesting</tt> field counts the
1203nesting depth of process execution, so that in normal circumstances 1147nesting depth of process execution, so that in normal circumstances
1204this counter has value zero or one. 1148this counter has value zero or one.
@@ -1240,19 +1184,12 @@ it is willing to call for heavy-weight dyntick-counter operations.
1240This flag is checked by RCU's context-switch and <tt>cond_resched()</tt> 1184This flag is checked by RCU's context-switch and <tt>cond_resched()</tt>
1241code, which provide a momentary idle sojourn in response. 1185code, which provide a momentary idle sojourn in response.
1242 1186
1243</p><p>The <tt>-&gt;rcu_qs_ctr</tt> field is used to record
1244quiescent states from <tt>cond_resched()</tt>.
1245Because <tt>cond_resched()</tt> can execute quite frequently, this
1246must be quite lightweight, as in a non-atomic increment of this
1247per-CPU field.
1248
1249</p><p>Finally, the <tt>-&gt;rcu_urgent_qs</tt> field is used to record 1187</p><p>Finally, the <tt>-&gt;rcu_urgent_qs</tt> field is used to record
1250the fact that the RCU core code would really like to see a quiescent 1188the fact that the RCU core code would really like to see a quiescent state from
1251state from the corresponding CPU, with the various other fields indicating 1189the corresponding CPU, with the various other fields indicating just how badly
1252just how badly RCU wants this quiescent state. 1190RCU wants this quiescent state.
1253This flag is checked by RCU's context-switch and <tt>cond_resched()</tt> 1191This flag is checked by RCU's context-switch path
1254code, which, if nothing else, non-atomically increment <tt>-&gt;rcu_qs_ctr</tt> 1192(<tt>rcu_note_context_switch</tt>) and the cond_resched code.
1255in response.
1256 1193
1257<table> 1194<table>
1258<tr><th>&nbsp;</th></tr> 1195<tr><th>&nbsp;</th></tr>
@@ -1425,11 +1362,11 @@ the last part of the array, thus traversing only the leaf
1425<h3><a name="Summary"> 1362<h3><a name="Summary">
1426Summary</a></h3> 1363Summary</a></h3>
1427 1364
1428So each flavor of RCU is represented by an <tt>rcu_state</tt> structure, 1365So the state of RCU is represented by an <tt>rcu_state</tt> structure,
1429which contains a combining tree of <tt>rcu_node</tt> and 1366which contains a combining tree of <tt>rcu_node</tt> and
1430<tt>rcu_data</tt> structures. 1367<tt>rcu_data</tt> structures.
1431Finally, in <tt>CONFIG_NO_HZ_IDLE</tt> kernels, each CPU's dyntick-idle 1368Finally, in <tt>CONFIG_NO_HZ_IDLE</tt> kernels, each CPU's dyntick-idle
1432state is tracked by an <tt>rcu_dynticks</tt> structure. 1369state is tracked by dynticks-related fields in the <tt>rcu_data</tt> structure.
1433 1370
1434If you made it this far, you are well prepared to read the code 1371If you made it this far, you are well prepared to read the code
1435walkthroughs in the other articles in this series. 1372walkthroughs in the other articles in this series.