diff options
-rw-r--r-- | Documentation/RCU/Design/Data-Structures/Data-Structures.html | 207 | ||||
-rw-r--r-- | Documentation/RCU/Design/Data-Structures/nxtlist.svg | 34 |
2 files changed, 156 insertions, 85 deletions
diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.html b/Documentation/RCU/Design/Data-Structures/Data-Structures.html index d583c653a703..2ab38ee420c5 100644 --- a/Documentation/RCU/Design/Data-Structures/Data-Structures.html +++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.html | |||
@@ -19,6 +19,8 @@ to each other. | |||
19 | The <tt>rcu_state</tt> Structure</a> | 19 | The <tt>rcu_state</tt> Structure</a> |
20 | <li> <a href="#The rcu_node Structure"> | 20 | <li> <a href="#The rcu_node Structure"> |
21 | The <tt>rcu_node</tt> Structure</a> | 21 | The <tt>rcu_node</tt> Structure</a> |
22 | <li> <a href="#The rcu_segcblist Structure"> | ||
23 | The <tt>rcu_segcblist</tt> Structure</a> | ||
22 | <li> <a href="#The rcu_data Structure"> | 24 | <li> <a href="#The rcu_data Structure"> |
23 | The <tt>rcu_data</tt> Structure</a> | 25 | The <tt>rcu_data</tt> Structure</a> |
24 | <li> <a href="#The rcu_dynticks Structure"> | 26 | <li> <a href="#The rcu_dynticks Structure"> |
@@ -841,6 +843,134 @@ for lockdep lock-class names. | |||
841 | Finally, lines 64-66 produce an error if the maximum number of | 843 | Finally, lines 64-66 produce an error if the maximum number of |
842 | CPUs is too large for the specified fanout. | 844 | CPUs is too large for the specified fanout. |
843 | 845 | ||
846 | <h3><a name="The rcu_segcblist Structure"> | ||
847 | The <tt>rcu_segcblist</tt> Structure</a></h3> | ||
848 | |||
849 | The <tt>rcu_segcblist</tt> structure maintains a segmented list of | ||
850 | callbacks as follows: | ||
851 | |||
852 | <pre> | ||
853 | 1 #define RCU_DONE_TAIL 0 | ||
854 | 2 #define RCU_WAIT_TAIL 1 | ||
855 | 3 #define RCU_NEXT_READY_TAIL 2 | ||
856 | 4 #define RCU_NEXT_TAIL 3 | ||
857 | 5 #define RCU_CBLIST_NSEGS 4 | ||
858 | 6 | ||
859 | 7 struct rcu_segcblist { | ||
860 | 8 struct rcu_head *head; | ||
861 | 9 struct rcu_head **tails[RCU_CBLIST_NSEGS]; | ||
862 | 10 unsigned long gp_seq[RCU_CBLIST_NSEGS]; | ||
863 | 11 long len; | ||
864 | 12 long len_lazy; | ||
865 | 13 }; | ||
866 | </pre> | ||
867 | |||
868 | <p> | ||
869 | The segments are as follows: | ||
870 | |||
871 | <ol> | ||
872 | <li> <tt>RCU_DONE_TAIL</tt>: Callbacks whose grace periods have elapsed. | ||
873 | These callbacks are ready to be invoked. | ||
874 | <li> <tt>RCU_WAIT_TAIL</tt>: Callbacks that are waiting for the | ||
875 | current grace period. | ||
876 | Note that different CPUs can have different ideas about which | ||
877 | grace period is current, hence the <tt>->gp_seq</tt> field. | ||
878 | <li> <tt>RCU_NEXT_READY_TAIL</tt>: Callbacks waiting for the next | ||
879 | grace period to start. | ||
880 | <li> <tt>RCU_NEXT_TAIL</tt>: Callbacks that have not yet been | ||
881 | associated with a grace period. | ||
882 | </ol> | ||
883 | |||
884 | <p> | ||
885 | The <tt>->head</tt> pointer references the first callback or | ||
886 | is <tt>NULL</tt> if the list contains no callbacks (which is | ||
887 | <i>not</i> the same as being empty). | ||
888 | Each element of the <tt>->tails[]</tt> array references the | ||
889 | <tt>->next</tt> pointer of the last callback in the corresponding | ||
890 | segment of the list, or the list's <tt>->head</tt> pointer if | ||
891 | that segment and all previous segments are empty. | ||
892 | If the corresponding segment is empty but some previous segment is | ||
893 | not empty, then the array element is identical to its predecessor. | ||
894 | Older callbacks are closer to the head of the list, and new callbacks | ||
895 | are added at the tail. | ||
896 | This relationship between the <tt>->head</tt> pointer, the | ||
897 | <tt>->tails[]</tt> array, and the callbacks is shown in this | ||
898 | diagram: | ||
899 | |||
900 | </p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%"> | ||
901 | |||
902 | </p><p>In this figure, the <tt>->head</tt> pointer references the | ||
903 | first | ||
904 | RCU callback in the list. | ||
905 | The <tt>->tails[RCU_DONE_TAIL]</tt> array element references | ||
906 | the <tt>->head</tt> pointer itself, indicating that none | ||
907 | of the callbacks is ready to invoke. | ||
908 | The <tt>->tails[RCU_WAIT_TAIL]</tt> array element references callback | ||
909 | CB 2's <tt>->next</tt> pointer, which indicates that | ||
910 | CB 1 and CB 2 are both waiting on the current grace period, | ||
911 | give or take possible disagreements about exactly which grace period | ||
912 | is the current one. | ||
913 | The <tt>->tails[RCU_NEXT_READY_TAIL]</tt> array element | ||
914 | references the same RCU callback that <tt>->tails[RCU_WAIT_TAIL]</tt> | ||
915 | does, which indicates that there are no callbacks waiting on the next | ||
916 | RCU grace period. | ||
917 | The <tt>->tails[RCU_NEXT_TAIL]</tt> array element references | ||
918 | CB 4's <tt>->next</tt> pointer, indicating that all the | ||
919 | remaining RCU callbacks have not yet been assigned to an RCU grace | ||
920 | period. | ||
921 | Note that the <tt>->tails[RCU_NEXT_TAIL]</tt> array element | ||
922 | always references the last RCU callback's <tt>->next</tt> pointer | ||
923 | unless the callback list is empty, in which case it references | ||
924 | the <tt>->head</tt> pointer. | ||
925 | |||
926 | <p> | ||
927 | There is one additional important special case for the | ||
928 | <tt>->tails[RCU_NEXT_TAIL]</tt> array element: It can be <tt>NULL</tt> | ||
929 | when this list is <i>disabled</i>. | ||
930 | Lists are disabled when the corresponding CPU is offline or when | ||
931 | the corresponding CPU's callbacks are offloaded to a kthread, | ||
932 | both of which are described elsewhere. | ||
933 | |||
934 | </p><p>CPUs advance their callbacks from the | ||
935 | <tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the | ||
936 | <tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments | ||
937 | as grace periods advance. | ||
938 | |||
939 | </p><p>The <tt>->gp_seq[]</tt> array records grace-period | ||
940 | numbers corresponding to the list segments. | ||
941 | This is what allows different CPUs to have different ideas as to | ||
942 | which is the current grace period while still avoiding premature | ||
943 | invocation of their callbacks. | ||
944 | In particular, this allows CPUs that go idle for extended periods | ||
945 | to determine which of their callbacks are ready to be invoked after | ||
946 | reawakening. | ||
947 | |||
948 | </p><p>The <tt>->len</tt> counter contains the number of | ||
949 | callbacks in <tt>->head</tt>, and the | ||
950 | <tt>->len_lazy</tt> contains the number of those callbacks that | ||
951 | are known to only free memory, and whose invocation can therefore | ||
952 | be safely deferred. | ||
953 | |||
954 | <p><b>Important note</b>: It is the <tt>->len</tt> field that | ||
955 | determines whether or not there are callbacks associated with | ||
956 | this <tt>rcu_segcblist</tt> structure, <i>not</i> the <tt>->head</tt> | ||
957 | pointer. | ||
958 | The reason for this is that all the ready-to-invoke callbacks | ||
959 | (that is, those in the <tt>RCU_DONE_TAIL</tt> segment) are extracted | ||
960 | all at once at callback-invocation time. | ||
961 | If callback invocation must be postponed, for example, because a | ||
962 | high-priority process just woke up on this CPU, then the remaining | ||
963 | callbacks are placed back on the <tt>RCU_DONE_TAIL</tt> segment. | ||
964 | Either way, the <tt>->len</tt> and <tt>->len_lazy</tt> counts | ||
965 | are adjusted after the corresponding callbacks have been invoked, and so | ||
966 | again it is the <tt>->len</tt> count that accurately reflects whether | ||
967 | or not there are callbacks associated with this <tt>rcu_segcblist</tt> | ||
968 | structure. | ||
969 | Of course, off-CPU sampling of the <tt>->len</tt> count requires | ||
970 | the use of appropriate synchronization, for example, memory barriers. | ||
971 | This synchronization can be a bit subtle, particularly in the case | ||
972 | of <tt>rcu_barrier()</tt>. | ||
973 | |||
844 | <h3><a name="The rcu_data Structure"> | 974 | <h3><a name="The rcu_data Structure"> |
845 | The <tt>rcu_data</tt> Structure</a></h3> | 975 | The <tt>rcu_data</tt> Structure</a></h3> |
846 | 976 | ||
@@ -983,62 +1113,18 @@ choice. | |||
983 | as follows: | 1113 | as follows: |
984 | 1114 | ||
985 | <pre> | 1115 | <pre> |
986 | 1 struct rcu_head *nxtlist; | 1116 | 1 struct rcu_segcblist cblist; |
987 | 2 struct rcu_head **nxttail[RCU_NEXT_SIZE]; | 1117 | 2 long qlen_last_fqs_check; |
988 | 3 unsigned long nxtcompleted[RCU_NEXT_SIZE]; | 1118 | 3 unsigned long n_cbs_invoked; |
989 | 4 long qlen_lazy; | 1119 | 4 unsigned long n_nocbs_invoked; |
990 | 5 long qlen; | 1120 | 5 unsigned long n_cbs_orphaned; |
991 | 6 long qlen_last_fqs_check; | 1121 | 6 unsigned long n_cbs_adopted; |
992 | 7 unsigned long n_force_qs_snap; | 1122 | 7 unsigned long n_force_qs_snap; |
993 | 8 unsigned long n_cbs_invoked; | 1123 | 8 long blimit; |
994 | 9 unsigned long n_cbs_orphaned; | ||
995 | 10 unsigned long n_cbs_adopted; | ||
996 | 11 long blimit; | ||
997 | </pre> | 1124 | </pre> |
998 | 1125 | ||
999 | <p>The <tt>->nxtlist</tt> pointer and the | 1126 | <p>The <tt>->cblist</tt> structure is the segmented callback list |
1000 | <tt>->nxttail[]</tt> array form a four-segment list with | 1127 | described earlier. |
1001 | older callbacks near the head and newer ones near the tail. | ||
1002 | Each segment contains callbacks with the corresponding relationship | ||
1003 | to the current grace period. | ||
1004 | The pointer out of the end of each of the four segments is referenced | ||
1005 | by the element of the <tt>->nxttail[]</tt> array indexed by | ||
1006 | <tt>RCU_DONE_TAIL</tt> (for callbacks handled by a prior grace period), | ||
1007 | <tt>RCU_WAIT_TAIL</tt> (for callbacks waiting on the current grace period), | ||
1008 | <tt>RCU_NEXT_READY_TAIL</tt> (for callbacks that will wait on the next | ||
1009 | grace period), and | ||
1010 | <tt>RCU_NEXT_TAIL</tt> (for callbacks that are not yet associated | ||
1011 | with a specific grace period) | ||
1012 | respectively, as shown in the following figure. | ||
1013 | |||
1014 | </p><p><img src="nxtlist.svg" alt="nxtlist.svg" width="40%"> | ||
1015 | |||
1016 | </p><p>In this figure, the <tt>->nxtlist</tt> pointer references the | ||
1017 | first | ||
1018 | RCU callback in the list. | ||
1019 | The <tt>->nxttail[RCU_DONE_TAIL]</tt> array element references | ||
1020 | the <tt>->nxtlist</tt> pointer itself, indicating that none | ||
1021 | of the callbacks is ready to invoke. | ||
1022 | The <tt>->nxttail[RCU_WAIT_TAIL]</tt> array element references callback | ||
1023 | CB 2's <tt>->next</tt> pointer, which indicates that | ||
1024 | CB 1 and CB 2 are both waiting on the current grace period. | ||
1025 | The <tt>->nxttail[RCU_NEXT_READY_TAIL]</tt> array element | ||
1026 | references the same RCU callback that <tt>->nxttail[RCU_WAIT_TAIL]</tt> | ||
1027 | does, which indicates that there are no callbacks waiting on the next | ||
1028 | RCU grace period. | ||
1029 | The <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element references | ||
1030 | CB 4's <tt>->next</tt> pointer, indicating that all the | ||
1031 | remaining RCU callbacks have not yet been assigned to an RCU grace | ||
1032 | period. | ||
1033 | Note that the <tt>->nxttail[RCU_NEXT_TAIL]</tt> array element | ||
1034 | always references the last RCU callback's <tt>->next</tt> pointer | ||
1035 | unless the callback list is empty, in which case it references | ||
1036 | the <tt>->nxtlist</tt> pointer. | ||
1037 | |||
1038 | </p><p>CPUs advance their callbacks from the | ||
1039 | <tt>RCU_NEXT_TAIL</tt> to the <tt>RCU_NEXT_READY_TAIL</tt> to the | ||
1040 | <tt>RCU_WAIT_TAIL</tt> to the <tt>RCU_DONE_TAIL</tt> list segments | ||
1041 | as grace periods advance. | ||
1042 | The CPU advances the callbacks in its <tt>rcu_data</tt> structure | 1128 | The CPU advances the callbacks in its <tt>rcu_data</tt> structure |
1043 | whenever it notices that another RCU grace period has completed. | 1129 | whenever it notices that another RCU grace period has completed. |
1044 | The CPU detects the completion of an RCU grace period by noticing | 1130 | The CPU detects the completion of an RCU grace period by noticing |
@@ -1049,16 +1135,7 @@ Recall that each <tt>rcu_node</tt> structure's | |||
1049 | <tt>->completed</tt> field is updated at the end of each | 1135 | <tt>->completed</tt> field is updated at the end of each |
1050 | grace period. | 1136 | grace period. |
1051 | 1137 | ||
1052 | </p><p>The <tt>->nxtcompleted[]</tt> array records grace-period | 1138 | <p> |
1053 | numbers corresponding to the list segments. | ||
1054 | This allows CPUs that go idle for extended periods to determine | ||
1055 | which of their callbacks are ready to be invoked after reawakening. | ||
1056 | |||
1057 | </p><p>The <tt>->qlen</tt> counter contains the number of | ||
1058 | callbacks in <tt>->nxtlist</tt>, and the | ||
1059 | <tt>->qlen_lazy</tt> contains the number of those callbacks that | ||
1060 | are known to only free memory, and whose invocation can therefore | ||
1061 | be safely deferred. | ||
1062 | The <tt>->qlen_last_fqs_check</tt> and | 1139 | The <tt>->qlen_last_fqs_check</tt> and |
1063 | <tt>->n_force_qs_snap</tt> coordinate the forcing of quiescent | 1140 | <tt>->n_force_qs_snap</tt> coordinate the forcing of quiescent |
1064 | states from <tt>call_rcu()</tt> and friends when callback | 1141 | states from <tt>call_rcu()</tt> and friends when callback |
@@ -1069,6 +1146,10 @@ lists grow excessively long. | |||
1069 | fields count the number of callbacks invoked, | 1146 | fields count the number of callbacks invoked, |
1070 | sent to other CPUs when this CPU goes offline, | 1147 | sent to other CPUs when this CPU goes offline, |
1071 | and received from other CPUs when those other CPUs go offline. | 1148 | and received from other CPUs when those other CPUs go offline. |
1149 | The <tt>->n_nocbs_invoked</tt> is used when the CPU's callbacks | ||
1150 | are offloaded to a kthread. | ||
1151 | |||
1152 | <p> | ||
1072 | Finally, the <tt>->blimit</tt> counter is the maximum number of | 1153 | Finally, the <tt>->blimit</tt> counter is the maximum number of |
1073 | RCU callbacks that may be invoked at a given time. | 1154 | RCU callbacks that may be invoked at a given time. |
1074 | 1155 | ||
diff --git a/Documentation/RCU/Design/Data-Structures/nxtlist.svg b/Documentation/RCU/Design/Data-Structures/nxtlist.svg index abc4cc73a097..0223e79c38e0 100644 --- a/Documentation/RCU/Design/Data-Structures/nxtlist.svg +++ b/Documentation/RCU/Design/Data-Structures/nxtlist.svg | |||
@@ -19,7 +19,7 @@ | |||
19 | id="svg2" | 19 | id="svg2" |
20 | version="1.1" | 20 | version="1.1" |
21 | inkscape:version="0.48.4 r9939" | 21 | inkscape:version="0.48.4 r9939" |
22 | sodipodi:docname="nxtlist.fig"> | 22 | sodipodi:docname="segcblist.svg"> |
23 | <metadata | 23 | <metadata |
24 | id="metadata94"> | 24 | id="metadata94"> |
25 | <rdf:RDF> | 25 | <rdf:RDF> |
@@ -28,7 +28,7 @@ | |||
28 | <dc:format>image/svg+xml</dc:format> | 28 | <dc:format>image/svg+xml</dc:format> |
29 | <dc:type | 29 | <dc:type |
30 | rdf:resource="http://purl.org/dc/dcmitype/StillImage" /> | 30 | rdf:resource="http://purl.org/dc/dcmitype/StillImage" /> |
31 | <dc:title></dc:title> | 31 | <dc:title /> |
32 | </cc:Work> | 32 | </cc:Work> |
33 | </rdf:RDF> | 33 | </rdf:RDF> |
34 | </metadata> | 34 | </metadata> |
@@ -241,61 +241,51 @@ | |||
241 | xml:space="preserve" | 241 | xml:space="preserve" |
242 | x="225" | 242 | x="225" |
243 | y="675" | 243 | y="675" |
244 | fill="#000000" | ||
245 | font-family="Courier" | ||
246 | font-style="normal" | 244 | font-style="normal" |
247 | font-weight="bold" | 245 | font-weight="bold" |
248 | font-size="324" | 246 | font-size="324" |
249 | text-anchor="start" | 247 | id="text64" |
250 | id="text64">nxtlist</text> | 248 | style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->head</text> |
251 | <!-- Text --> | 249 | <!-- Text --> |
252 | <text | 250 | <text |
253 | xml:space="preserve" | 251 | xml:space="preserve" |
254 | x="225" | 252 | x="225" |
255 | y="1800" | 253 | y="1800" |
256 | fill="#000000" | ||
257 | font-family="Courier" | ||
258 | font-style="normal" | 254 | font-style="normal" |
259 | font-weight="bold" | 255 | font-weight="bold" |
260 | font-size="324" | 256 | font-size="324" |
261 | text-anchor="start" | 257 | id="text66" |
262 | id="text66">nxttail[RCU_DONE_TAIL]</text> | 258 | style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_DONE_TAIL]</text> |
263 | <!-- Text --> | 259 | <!-- Text --> |
264 | <text | 260 | <text |
265 | xml:space="preserve" | 261 | xml:space="preserve" |
266 | x="225" | 262 | x="225" |
267 | y="2925" | 263 | y="2925" |
268 | fill="#000000" | ||
269 | font-family="Courier" | ||
270 | font-style="normal" | 264 | font-style="normal" |
271 | font-weight="bold" | 265 | font-weight="bold" |
272 | font-size="324" | 266 | font-size="324" |
273 | text-anchor="start" | 267 | id="text68" |
274 | id="text68">nxttail[RCU_WAIT_TAIL]</text> | 268 | style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_WAIT_TAIL]</text> |
275 | <!-- Text --> | 269 | <!-- Text --> |
276 | <text | 270 | <text |
277 | xml:space="preserve" | 271 | xml:space="preserve" |
278 | x="225" | 272 | x="225" |
279 | y="4050" | 273 | y="4050" |
280 | fill="#000000" | ||
281 | font-family="Courier" | ||
282 | font-style="normal" | 274 | font-style="normal" |
283 | font-weight="bold" | 275 | font-weight="bold" |
284 | font-size="324" | 276 | font-size="324" |
285 | text-anchor="start" | 277 | id="text70" |
286 | id="text70">nxttail[RCU_NEXT_READY_TAIL]</text> | 278 | style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_READY_TAIL]</text> |
287 | <!-- Text --> | 279 | <!-- Text --> |
288 | <text | 280 | <text |
289 | xml:space="preserve" | 281 | xml:space="preserve" |
290 | x="225" | 282 | x="225" |
291 | y="5175" | 283 | y="5175" |
292 | fill="#000000" | ||
293 | font-family="Courier" | ||
294 | font-style="normal" | 284 | font-style="normal" |
295 | font-weight="bold" | 285 | font-weight="bold" |
296 | font-size="324" | 286 | font-size="324" |
297 | text-anchor="start" | 287 | id="text72" |
298 | id="text72">nxttail[RCU_NEXT_TAIL]</text> | 288 | style="font-size:324px;font-style:normal;font-weight:bold;text-anchor:start;fill:#000000;font-family:Courier">->tails[RCU_NEXT_TAIL]</text> |
299 | <!-- Text --> | 289 | <!-- Text --> |
300 | <text | 290 | <text |
301 | xml:space="preserve" | 291 | xml:space="preserve" |