diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/RCU/stallwarn.txt | 94 | ||||
-rw-r--r-- | Documentation/RCU/torture.txt | 10 | ||||
-rw-r--r-- | Documentation/RCU/trace.txt | 35 | ||||
-rw-r--r-- | Documentation/intel_txt.txt | 16 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 8 | ||||
-rw-r--r-- | Documentation/kprobes.txt | 10 | ||||
-rw-r--r-- | Documentation/rbtree.txt | 58 | ||||
-rw-r--r-- | Documentation/scheduler/sched-design-CFS.txt | 54 | ||||
-rw-r--r-- | Documentation/scheduler/sched-rt-group.txt | 20 | ||||
-rw-r--r-- | Documentation/trace/events.txt | 3 | ||||
-rw-r--r-- | Documentation/trace/ftrace.txt | 50 | ||||
-rw-r--r-- | Documentation/trace/kprobetrace.txt | 4 |
12 files changed, 226 insertions, 136 deletions
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt index 1423d2570d78..44c6dcc93d6d 100644 --- a/Documentation/RCU/stallwarn.txt +++ b/Documentation/RCU/stallwarn.txt | |||
@@ -3,35 +3,79 @@ Using RCU's CPU Stall Detector | |||
3 | The CONFIG_RCU_CPU_STALL_DETECTOR kernel config parameter enables | 3 | The CONFIG_RCU_CPU_STALL_DETECTOR kernel config parameter enables |
4 | RCU's CPU stall detector, which detects conditions that unduly delay | 4 | RCU's CPU stall detector, which detects conditions that unduly delay |
5 | RCU grace periods. The stall detector's idea of what constitutes | 5 | RCU grace periods. The stall detector's idea of what constitutes |
6 | "unduly delayed" is controlled by a pair of C preprocessor macros: | 6 | "unduly delayed" is controlled by a set of C preprocessor macros: |
7 | 7 | ||
8 | RCU_SECONDS_TILL_STALL_CHECK | 8 | RCU_SECONDS_TILL_STALL_CHECK |
9 | 9 | ||
10 | This macro defines the period of time that RCU will wait from | 10 | This macro defines the period of time that RCU will wait from |
11 | the beginning of a grace period until it issues an RCU CPU | 11 | the beginning of a grace period until it issues an RCU CPU |
12 | stall warning. It is normally ten seconds. | 12 | stall warning. This time period is normally ten seconds. |
13 | 13 | ||
14 | RCU_SECONDS_TILL_STALL_RECHECK | 14 | RCU_SECONDS_TILL_STALL_RECHECK |
15 | 15 | ||
16 | This macro defines the period of time that RCU will wait after | 16 | This macro defines the period of time that RCU will wait after |
17 | issuing a stall warning until it issues another stall warning. | 17 | issuing a stall warning until it issues another stall warning |
18 | It is normally set to thirty seconds. | 18 | for the same stall. This time period is normally set to thirty |
19 | seconds. | ||
19 | 20 | ||
20 | RCU_STALL_RAT_DELAY | 21 | RCU_STALL_RAT_DELAY |
21 | 22 | ||
22 | The CPU stall detector tries to make the offending CPU rat on itself, | 23 | The CPU stall detector tries to make the offending CPU print its |
23 | as this often gives better-quality stack traces. However, if | 24 | own warnings, as this often gives better-quality stack traces. |
24 | the offending CPU does not detect its own stall in the number | 25 | However, if the offending CPU does not detect its own stall in |
25 | of jiffies specified by RCU_STALL_RAT_DELAY, then other CPUs will | 26 | the number of jiffies specified by RCU_STALL_RAT_DELAY, then |
26 | complain. This is normally set to two jiffies. | 27 | some other CPU will complain. This delay is normally set to |
28 | two jiffies. | ||
27 | 29 | ||
28 | The following problems can result in an RCU CPU stall warning: | 30 | When a CPU detects that it is stalling, it will print a message similar |
31 | to the following: | ||
32 | |||
33 | INFO: rcu_sched_state detected stall on CPU 5 (t=2500 jiffies) | ||
34 | |||
35 | This message indicates that CPU 5 detected that it was causing a stall, | ||
36 | and that the stall was affecting RCU-sched. This message will normally be | ||
37 | followed by a stack dump of the offending CPU. On TREE_RCU kernel builds, | ||
38 | RCU and RCU-sched are implemented by the same underlying mechanism, | ||
39 | while on TREE_PREEMPT_RCU kernel builds, RCU is instead implemented | ||
40 | by rcu_preempt_state. | ||
41 | |||
42 | On the other hand, if the offending CPU fails to print out a stall-warning | ||
43 | message quickly enough, some other CPU will print a message similar to | ||
44 | the following: | ||
45 | |||
46 | INFO: rcu_bh_state detected stalls on CPUs/tasks: { 3 5 } (detected by 2, 2502 jiffies) | ||
47 | |||
48 | This message indicates that CPU 2 detected that CPUs 3 and 5 were both | ||
49 | causing stalls, and that the stall was affecting RCU-bh. This message | ||
50 | will normally be followed by stack dumps for each CPU. Please note that | ||
51 | TREE_PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, | ||
52 | and that the tasks will be indicated by PID, for example, "P3421". | ||
53 | It is even possible for a rcu_preempt_state stall to be caused by both | ||
54 | CPUs -and- tasks, in which case the offending CPUs and tasks will all | ||
55 | be called out in the list. | ||
56 | |||
57 | Finally, if the grace period ends just as the stall warning starts | ||
58 | printing, there will be a spurious stall-warning message: | ||
59 | |||
60 | INFO: rcu_bh_state detected stalls on CPUs/tasks: { } (detected by 4, 2502 jiffies) | ||
61 | |||
62 | This is rare, but does happen from time to time in real life. | ||
63 | |||
64 | So your kernel printed an RCU CPU stall warning. The next question is | ||
65 | "What caused it?" The following problems can result in RCU CPU stall | ||
66 | warnings: | ||
29 | 67 | ||
30 | o A CPU looping in an RCU read-side critical section. | 68 | o A CPU looping in an RCU read-side critical section. |
31 | 69 | ||
32 | o A CPU looping with interrupts disabled. | 70 | o A CPU looping with interrupts disabled. This condition can |
71 | result in RCU-sched and RCU-bh stalls. | ||
33 | 72 | ||
34 | o A CPU looping with preemption disabled. | 73 | o A CPU looping with preemption disabled. This condition can |
74 | result in RCU-sched stalls and, if ksoftirqd is in use, RCU-bh | ||
75 | stalls. | ||
76 | |||
77 | o A CPU looping with bottom halves disabled. This condition can | ||
78 | result in RCU-sched and RCU-bh stalls. | ||
35 | 79 | ||
36 | o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel | 80 | o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel |
37 | without invoking schedule(). | 81 | without invoking schedule(). |
@@ -39,20 +83,24 @@ o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel | |||
39 | o A bug in the RCU implementation. | 83 | o A bug in the RCU implementation. |
40 | 84 | ||
41 | o A hardware failure. This is quite unlikely, but has occurred | 85 | o A hardware failure. This is quite unlikely, but has occurred |
42 | at least once in a former life. A CPU failed in a running system, | 86 | at least once in real life. A CPU failed in a running system, |
43 | becoming unresponsive, but not causing an immediate crash. | 87 | becoming unresponsive, but not causing an immediate crash. |
44 | This resulted in a series of RCU CPU stall warnings, eventually | 88 | This resulted in a series of RCU CPU stall warnings, eventually |
45 | leading the realization that the CPU had failed. | 89 | leading the realization that the CPU had failed. |
46 | 90 | ||
47 | The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning. | 91 | The RCU, RCU-sched, and RCU-bh implementations have CPU stall |
48 | SRCU does not do so directly, but its calls to synchronize_sched() will | 92 | warning. SRCU does not have its own CPU stall warnings, but its |
49 | result in RCU-sched detecting any CPU stalls that might be occurring. | 93 | calls to synchronize_sched() will result in RCU-sched detecting |
50 | 94 | RCU-sched-related CPU stalls. Please note that RCU only detects | |
51 | To diagnose the cause of the stall, inspect the stack traces. The offending | 95 | CPU stalls when there is a grace period in progress. No grace period, |
52 | function will usually be near the top of the stack. If you have a series | 96 | no CPU stall warnings. |
53 | of stall warnings from a single extended stall, comparing the stack traces | 97 | |
54 | can often help determine where the stall is occurring, which will usually | 98 | To diagnose the cause of the stall, inspect the stack traces. |
55 | be in the function nearest the top of the stack that stays the same from | 99 | The offending function will usually be near the top of the stack. |
56 | trace to trace. | 100 | If you have a series of stall warnings from a single extended stall, |
101 | comparing the stack traces can often help determine where the stall | ||
102 | is occurring, which will usually be in the function nearest the top of | ||
103 | that portion of the stack which remains the same from trace to trace. | ||
104 | If you can reliably trigger the stall, ftrace can be quite helpful. | ||
57 | 105 | ||
58 | RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE. | 106 | RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE. |
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 0e50bc2aa1e2..5d9016795fd8 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt | |||
@@ -182,16 +182,6 @@ Similarly, sched_expedited RCU provides the following: | |||
182 | sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0 | 182 | sched_expedited-torture: Reader Pipe: 12660320201 95875 0 0 0 0 0 0 0 0 0 |
183 | sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0 | 183 | sched_expedited-torture: Reader Batch: 12660424885 0 0 0 0 0 0 0 0 0 0 |
184 | sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0 | 184 | sched_expedited-torture: Free-Block Circulation: 1090795 1090795 1090794 1090793 1090792 1090791 1090790 1090789 1090788 1090787 0 |
185 | state: -1 / 0:0 3:0 4:0 | ||
186 | |||
187 | As before, the first four lines are similar to those for RCU. | ||
188 | The last line shows the task-migration state. The first number is | ||
189 | -1 if synchronize_sched_expedited() is idle, -2 if in the process of | ||
190 | posting wakeups to the migration kthreads, and N when waiting on CPU N. | ||
191 | Each of the colon-separated fields following the "/" is a CPU:state pair. | ||
192 | Valid states are "0" for idle, "1" for waiting for quiescent state, | ||
193 | "2" for passed through quiescent state, and "3" when a race with a | ||
194 | CPU-hotplug event forces use of the synchronize_sched() primitive. | ||
195 | 185 | ||
196 | 186 | ||
197 | USAGE | 187 | USAGE |
diff --git a/Documentation/RCU/trace.txt b/Documentation/RCU/trace.txt index 8608fd85e921..efd8cc95c06b 100644 --- a/Documentation/RCU/trace.txt +++ b/Documentation/RCU/trace.txt | |||
@@ -256,23 +256,23 @@ o Each element of the form "1/1 0:127 ^0" represents one struct | |||
256 | The output of "cat rcu/rcu_pending" looks as follows: | 256 | The output of "cat rcu/rcu_pending" looks as follows: |
257 | 257 | ||
258 | rcu_sched: | 258 | rcu_sched: |
259 | 0 np=255892 qsp=53936 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741 | 259 | 0 np=255892 qsp=53936 rpq=85 cbr=0 cng=14417 gpc=10033 gps=24320 nf=6445 nn=146741 |
260 | 1 np=261224 qsp=54638 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792 | 260 | 1 np=261224 qsp=54638 rpq=33 cbr=0 cng=25723 gpc=16310 gps=2849 nf=5912 nn=155792 |
261 | 2 np=237496 qsp=49664 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629 | 261 | 2 np=237496 qsp=49664 rpq=23 cbr=0 cng=2762 gpc=45478 gps=1762 nf=1201 nn=136629 |
262 | 3 np=236249 qsp=48766 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723 | 262 | 3 np=236249 qsp=48766 rpq=98 cbr=0 cng=286 gpc=48049 gps=1218 nf=207 nn=137723 |
263 | 4 np=221310 qsp=46850 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110 | 263 | 4 np=221310 qsp=46850 rpq=7 cbr=0 cng=26 gpc=43161 gps=4634 nf=3529 nn=123110 |
264 | 5 np=237332 qsp=48449 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456 | 264 | 5 np=237332 qsp=48449 rpq=9 cbr=0 cng=54 gpc=47920 gps=3252 nf=201 nn=137456 |
265 | 6 np=219995 qsp=46718 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834 | 265 | 6 np=219995 qsp=46718 rpq=12 cbr=0 cng=50 gpc=42098 gps=6093 nf=4202 nn=120834 |
266 | 7 np=249893 qsp=49390 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888 | 266 | 7 np=249893 qsp=49390 rpq=42 cbr=0 cng=72 gpc=38400 gps=17102 nf=41 nn=144888 |
267 | rcu_bh: | 267 | rcu_bh: |
268 | 0 np=146741 qsp=1419 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314 | 268 | 0 np=146741 qsp=1419 rpq=6 cbr=0 cng=6 gpc=0 gps=0 nf=2 nn=145314 |
269 | 1 np=155792 qsp=12597 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180 | 269 | 1 np=155792 qsp=12597 rpq=3 cbr=0 cng=0 gpc=4 gps=8 nf=3 nn=143180 |
270 | 2 np=136629 qsp=18680 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936 | 270 | 2 np=136629 qsp=18680 rpq=1 cbr=0 cng=0 gpc=7 gps=6 nf=0 nn=117936 |
271 | 3 np=137723 qsp=2843 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863 | 271 | 3 np=137723 qsp=2843 rpq=0 cbr=0 cng=0 gpc=10 gps=7 nf=0 nn=134863 |
272 | 4 np=123110 qsp=12433 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671 | 272 | 4 np=123110 qsp=12433 rpq=0 cbr=0 cng=0 gpc=4 gps=2 nf=0 nn=110671 |
273 | 5 np=137456 qsp=4210 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235 | 273 | 5 np=137456 qsp=4210 rpq=1 cbr=0 cng=0 gpc=6 gps=5 nf=0 nn=133235 |
274 | 6 np=120834 qsp=9902 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921 | 274 | 6 np=120834 qsp=9902 rpq=2 cbr=0 cng=0 gpc=6 gps=3 nf=2 nn=110921 |
275 | 7 np=144888 qsp=26336 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542 | 275 | 7 np=144888 qsp=26336 rpq=0 cbr=0 cng=0 gpc=8 gps=2 nf=0 nn=118542 |
276 | 276 | ||
277 | As always, this is once again split into "rcu_sched" and "rcu_bh" | 277 | As always, this is once again split into "rcu_sched" and "rcu_bh" |
278 | portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional | 278 | portions, with CONFIG_TREE_PREEMPT_RCU kernels having an additional |
@@ -284,6 +284,9 @@ o "np" is the number of times that __rcu_pending() has been invoked | |||
284 | o "qsp" is the number of times that the RCU was waiting for a | 284 | o "qsp" is the number of times that the RCU was waiting for a |
285 | quiescent state from this CPU. | 285 | quiescent state from this CPU. |
286 | 286 | ||
287 | o "rpq" is the number of times that the CPU had passed through | ||
288 | a quiescent state, but not yet reported it to RCU. | ||
289 | |||
287 | o "cbr" is the number of times that this CPU had RCU callbacks | 290 | o "cbr" is the number of times that this CPU had RCU callbacks |
288 | that had passed through a grace period, and were thus ready | 291 | that had passed through a grace period, and were thus ready |
289 | to be invoked. | 292 | to be invoked. |
diff --git a/Documentation/intel_txt.txt b/Documentation/intel_txt.txt index f40a1f030019..87c8990dbbd9 100644 --- a/Documentation/intel_txt.txt +++ b/Documentation/intel_txt.txt | |||
@@ -161,13 +161,15 @@ o In order to put a system into any of the sleep states after a TXT | |||
161 | has been restored, it will restore the TPM PCRs and then | 161 | has been restored, it will restore the TPM PCRs and then |
162 | transfer control back to the kernel's S3 resume vector. | 162 | transfer control back to the kernel's S3 resume vector. |
163 | In order to preserve system integrity across S3, the kernel | 163 | In order to preserve system integrity across S3, the kernel |
164 | provides tboot with a set of memory ranges (kernel | 164 | provides tboot with a set of memory ranges (RAM and RESERVED_KERN |
165 | code/data/bss, S3 resume code, and AP trampoline) that tboot | 165 | in the e820 table, but not any memory that BIOS might alter over |
166 | will calculate a MAC (message authentication code) over and then | 166 | the S3 transition) that tboot will calculate a MAC (message |
167 | seal with the TPM. On resume and once the measured environment | 167 | authentication code) over and then seal with the TPM. On resume |
168 | has been re-established, tboot will re-calculate the MAC and | 168 | and once the measured environment has been re-established, tboot |
169 | verify it against the sealed value. Tboot's policy determines | 169 | will re-calculate the MAC and verify it against the sealed value. |
170 | what happens if the verification fails. | 170 | Tboot's policy determines what happens if the verification fails. |
171 | Note that the c/s 194 of tboot which has the new MAC code supports | ||
172 | this. | ||
171 | 173 | ||
172 | That's pretty much it for TXT support. | 174 | That's pretty much it for TXT support. |
173 | 175 | ||
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 839b21b0699a..567b7a8eb878 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -324,6 +324,8 @@ and is between 256 and 4096 characters. It is defined in the file | |||
324 | they are unmapped. Otherwise they are | 324 | they are unmapped. Otherwise they are |
325 | flushed before they will be reused, which | 325 | flushed before they will be reused, which |
326 | is a lot of faster | 326 | is a lot of faster |
327 | off - do not initialize any AMD IOMMU found in | ||
328 | the system | ||
327 | 329 | ||
328 | amijoy.map= [HW,JOY] Amiga joystick support | 330 | amijoy.map= [HW,JOY] Amiga joystick support |
329 | Map of devices attached to JOY0DAT and JOY1DAT | 331 | Map of devices attached to JOY0DAT and JOY1DAT |
@@ -784,8 +786,12 @@ and is between 256 and 4096 characters. It is defined in the file | |||
784 | as early as possible in order to facilitate early | 786 | as early as possible in order to facilitate early |
785 | boot debugging. | 787 | boot debugging. |
786 | 788 | ||
787 | ftrace_dump_on_oops | 789 | ftrace_dump_on_oops[=orig_cpu] |
788 | [FTRACE] will dump the trace buffers on oops. | 790 | [FTRACE] will dump the trace buffers on oops. |
791 | If no parameter is passed, ftrace will dump | ||
792 | buffers of all CPUs, but if you pass orig_cpu, it will | ||
793 | dump only the buffer of the CPU that triggered the | ||
794 | oops. | ||
789 | 795 | ||
790 | ftrace_filter=[function-list] | 796 | ftrace_filter=[function-list] |
791 | [FTRACE] Limit the functions traced by the function | 797 | [FTRACE] Limit the functions traced by the function |
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 2f9115c0ae62..61c291cddf18 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt | |||
@@ -165,8 +165,8 @@ the user entry_handler invocation is also skipped. | |||
165 | 165 | ||
166 | 1.4 How Does Jump Optimization Work? | 166 | 1.4 How Does Jump Optimization Work? |
167 | 167 | ||
168 | If you configured your kernel with CONFIG_OPTPROBES=y (currently | 168 | If your kernel is built with CONFIG_OPTPROBES=y (currently this flag |
169 | this option is supported on x86/x86-64, non-preemptive kernel) and | 169 | is automatically set 'y' on x86/x86-64, non-preemptive kernel) and |
170 | the "debug.kprobes_optimization" kernel parameter is set to 1 (see | 170 | the "debug.kprobes_optimization" kernel parameter is set to 1 (see |
171 | sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump | 171 | sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump |
172 | instruction instead of a breakpoint instruction at each probepoint. | 172 | instruction instead of a breakpoint instruction at each probepoint. |
@@ -271,8 +271,6 @@ tweak the kernel's execution path, you need to suppress optimization, | |||
271 | using one of the following techniques: | 271 | using one of the following techniques: |
272 | - Specify an empty function for the kprobe's post_handler or break_handler. | 272 | - Specify an empty function for the kprobe's post_handler or break_handler. |
273 | or | 273 | or |
274 | - Config CONFIG_OPTPROBES=n. | ||
275 | or | ||
276 | - Execute 'sysctl -w debug.kprobes_optimization=n' | 274 | - Execute 'sysctl -w debug.kprobes_optimization=n' |
277 | 275 | ||
278 | 2. Architectures Supported | 276 | 2. Architectures Supported |
@@ -307,10 +305,6 @@ it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO), | |||
307 | so you can use "objdump -d -l vmlinux" to see the source-to-object | 305 | so you can use "objdump -d -l vmlinux" to see the source-to-object |
308 | code mapping. | 306 | code mapping. |
309 | 307 | ||
310 | If you want to reduce probing overhead, set "Kprobes jump optimization | ||
311 | support" (CONFIG_OPTPROBES) to "y". You can find this option under the | ||
312 | "Kprobes" line. | ||
313 | |||
314 | 4. API Reference | 308 | 4. API Reference |
315 | 309 | ||
316 | The Kprobes API includes a "register" function and an "unregister" | 310 | The Kprobes API includes a "register" function and an "unregister" |
diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt index aae8355d3166..221f38be98f4 100644 --- a/Documentation/rbtree.txt +++ b/Documentation/rbtree.txt | |||
@@ -190,3 +190,61 @@ Example: | |||
190 | for (node = rb_first(&mytree); node; node = rb_next(node)) | 190 | for (node = rb_first(&mytree); node; node = rb_next(node)) |
191 | printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring); | 191 | printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring); |
192 | 192 | ||
193 | Support for Augmented rbtrees | ||
194 | ----------------------------- | ||
195 | |||
196 | Augmented rbtree is an rbtree with "some" additional data stored in each node. | ||
197 | This data can be used to augment some new functionality to rbtree. | ||
198 | Augmented rbtree is an optional feature built on top of basic rbtree | ||
199 | infrastructure. rbtree user who wants this feature will have an augment | ||
200 | callback function in rb_root initialized. | ||
201 | |||
202 | This callback function will be called from rbtree core routines whenever | ||
203 | a node has a change in one or both of its children. It is the responsibility | ||
204 | of the callback function to recalculate the additional data that is in the | ||
205 | rb node using new children information. Note that if this new additional | ||
206 | data affects the parent node's additional data, then callback function has | ||
207 | to handle it and do the recursive updates. | ||
208 | |||
209 | |||
210 | Interval tree is an example of augmented rb tree. Reference - | ||
211 | "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein. | ||
212 | More details about interval trees: | ||
213 | |||
214 | Classical rbtree has a single key and it cannot be directly used to store | ||
215 | interval ranges like [lo:hi] and do a quick lookup for any overlap with a new | ||
216 | lo:hi or to find whether there is an exact match for a new lo:hi. | ||
217 | |||
218 | However, rbtree can be augmented to store such interval ranges in a structured | ||
219 | way making it possible to do efficient lookup and exact match. | ||
220 | |||
221 | This "extra information" stored in each node is the maximum hi | ||
222 | (max_hi) value among all the nodes that are its descendents. This | ||
223 | information can be maintained at each node just be looking at the node | ||
224 | and its immediate children. And this will be used in O(log n) lookup | ||
225 | for lowest match (lowest start address among all possible matches) | ||
226 | with something like: | ||
227 | |||
228 | find_lowest_match(lo, hi, node) | ||
229 | { | ||
230 | lowest_match = NULL; | ||
231 | while (node) { | ||
232 | if (max_hi(node->left) > lo) { | ||
233 | // Lowest overlap if any must be on left side | ||
234 | node = node->left; | ||
235 | } else if (overlap(lo, hi, node)) { | ||
236 | lowest_match = node; | ||
237 | break; | ||
238 | } else if (lo > node->lo) { | ||
239 | // Lowest overlap if any must be on right side | ||
240 | node = node->right; | ||
241 | } else { | ||
242 | break; | ||
243 | } | ||
244 | } | ||
245 | return lowest_match; | ||
246 | } | ||
247 | |||
248 | Finding exact match will be to first find lowest match and then to follow | ||
249 | successor nodes looking for exact match, until the start of a node is beyond | ||
250 | the hi value we are looking for. | ||
diff --git a/Documentation/scheduler/sched-design-CFS.txt b/Documentation/scheduler/sched-design-CFS.txt index 6f33593e59e2..8239ebbcddce 100644 --- a/Documentation/scheduler/sched-design-CFS.txt +++ b/Documentation/scheduler/sched-design-CFS.txt | |||
@@ -211,7 +211,7 @@ provide fair CPU time to each such task group. For example, it may be | |||
211 | desirable to first provide fair CPU time to each user on the system and then to | 211 | desirable to first provide fair CPU time to each user on the system and then to |
212 | each task belonging to a user. | 212 | each task belonging to a user. |
213 | 213 | ||
214 | CONFIG_GROUP_SCHED strives to achieve exactly that. It lets tasks to be | 214 | CONFIG_CGROUP_SCHED strives to achieve exactly that. It lets tasks to be |
215 | grouped and divides CPU time fairly among such groups. | 215 | grouped and divides CPU time fairly among such groups. |
216 | 216 | ||
217 | CONFIG_RT_GROUP_SCHED permits to group real-time (i.e., SCHED_FIFO and | 217 | CONFIG_RT_GROUP_SCHED permits to group real-time (i.e., SCHED_FIFO and |
@@ -220,38 +220,11 @@ SCHED_RR) tasks. | |||
220 | CONFIG_FAIR_GROUP_SCHED permits to group CFS (i.e., SCHED_NORMAL and | 220 | CONFIG_FAIR_GROUP_SCHED permits to group CFS (i.e., SCHED_NORMAL and |
221 | SCHED_BATCH) tasks. | 221 | SCHED_BATCH) tasks. |
222 | 222 | ||
223 | At present, there are two (mutually exclusive) mechanisms to group tasks for | 223 | These options need CONFIG_CGROUPS to be defined, and let the administrator |
224 | CPU bandwidth control purposes: | ||
225 | |||
226 | - Based on user id (CONFIG_USER_SCHED) | ||
227 | |||
228 | With this option, tasks are grouped according to their user id. | ||
229 | |||
230 | - Based on "cgroup" pseudo filesystem (CONFIG_CGROUP_SCHED) | ||
231 | |||
232 | This options needs CONFIG_CGROUPS to be defined, and lets the administrator | ||
233 | create arbitrary groups of tasks, using the "cgroup" pseudo filesystem. See | 224 | create arbitrary groups of tasks, using the "cgroup" pseudo filesystem. See |
234 | Documentation/cgroups/cgroups.txt for more information about this filesystem. | 225 | Documentation/cgroups/cgroups.txt for more information about this filesystem. |
235 | 226 | ||
236 | Only one of these options to group tasks can be chosen and not both. | 227 | When CONFIG_FAIR_GROUP_SCHED is defined, a "cpu.shares" file is created for each |
237 | |||
238 | When CONFIG_USER_SCHED is defined, a directory is created in sysfs for each new | ||
239 | user and a "cpu_share" file is added in that directory. | ||
240 | |||
241 | # cd /sys/kernel/uids | ||
242 | # cat 512/cpu_share # Display user 512's CPU share | ||
243 | 1024 | ||
244 | # echo 2048 > 512/cpu_share # Modify user 512's CPU share | ||
245 | # cat 512/cpu_share # Display user 512's CPU share | ||
246 | 2048 | ||
247 | # | ||
248 | |||
249 | CPU bandwidth between two users is divided in the ratio of their CPU shares. | ||
250 | For example: if you would like user "root" to get twice the bandwidth of user | ||
251 | "guest," then set the cpu_share for both the users such that "root"'s cpu_share | ||
252 | is twice "guest"'s cpu_share. | ||
253 | |||
254 | When CONFIG_CGROUP_SCHED is defined, a "cpu.shares" file is created for each | ||
255 | group created using the pseudo filesystem. See example steps below to create | 228 | group created using the pseudo filesystem. See example steps below to create |
256 | task groups and modify their CPU share using the "cgroups" pseudo filesystem. | 229 | task groups and modify their CPU share using the "cgroups" pseudo filesystem. |
257 | 230 | ||
@@ -273,24 +246,3 @@ task groups and modify their CPU share using the "cgroups" pseudo filesystem. | |||
273 | 246 | ||
274 | # #Launch gmplayer (or your favourite movie player) | 247 | # #Launch gmplayer (or your favourite movie player) |
275 | # echo <movie_player_pid> > multimedia/tasks | 248 | # echo <movie_player_pid> > multimedia/tasks |
276 | |||
277 | 8. Implementation note: user namespaces | ||
278 | |||
279 | User namespaces are intended to be hierarchical. But they are currently | ||
280 | only partially implemented. Each of those has ramifications for CFS. | ||
281 | |||
282 | First, since user namespaces are hierarchical, the /sys/kernel/uids | ||
283 | presentation is inadequate. Eventually we will likely want to use sysfs | ||
284 | tagging to provide private views of /sys/kernel/uids within each user | ||
285 | namespace. | ||
286 | |||
287 | Second, the hierarchical nature is intended to support completely | ||
288 | unprivileged use of user namespaces. So if using user groups, then | ||
289 | we want the users in a user namespace to be children of the user | ||
290 | who created it. | ||
291 | |||
292 | That is currently unimplemented. So instead, every user in a new | ||
293 | user namespace will receive 1024 shares just like any user in the | ||
294 | initial user namespace. Note that at the moment creation of a new | ||
295 | user namespace requires each of CAP_SYS_ADMIN, CAP_SETUID, and | ||
296 | CAP_SETGID. | ||
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt index 86eabe6c3419..605b0d40329d 100644 --- a/Documentation/scheduler/sched-rt-group.txt +++ b/Documentation/scheduler/sched-rt-group.txt | |||
@@ -126,23 +126,12 @@ priority! | |||
126 | 2.3 Basis for grouping tasks | 126 | 2.3 Basis for grouping tasks |
127 | ---------------------------- | 127 | ---------------------------- |
128 | 128 | ||
129 | There are two compile-time settings for allocating CPU bandwidth. These are | 129 | Enabling CONFIG_RT_GROUP_SCHED lets you explicitly allocate real |
130 | configured using the "Basis for grouping tasks" multiple choice menu under | 130 | CPU bandwidth to task groups. |
131 | General setup > Group CPU Scheduler: | ||
132 | |||
133 | a. CONFIG_USER_SCHED (aka "Basis for grouping tasks" = "user id") | ||
134 | |||
135 | This lets you use the virtual files under | ||
136 | "/sys/kernel/uids/<uid>/cpu_rt_runtime_us" to control he CPU time reserved for | ||
137 | each user . | ||
138 | |||
139 | The other option is: | ||
140 | |||
141 | .o CONFIG_CGROUP_SCHED (aka "Basis for grouping tasks" = "Control groups") | ||
142 | 131 | ||
143 | This uses the /cgroup virtual file system and | 132 | This uses the /cgroup virtual file system and |
144 | "/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each | 133 | "/cgroup/<cgroup>/cpu.rt_runtime_us" to control the CPU time reserved for each |
145 | control group instead. | 134 | control group. |
146 | 135 | ||
147 | For more information on working with control groups, you should read | 136 | For more information on working with control groups, you should read |
148 | Documentation/cgroups/cgroups.txt as well. | 137 | Documentation/cgroups/cgroups.txt as well. |
@@ -161,8 +150,7 @@ For now, this can be simplified to just the following (but see Future plans): | |||
161 | =============== | 150 | =============== |
162 | 151 | ||
163 | There is work in progress to make the scheduling period for each group | 152 | There is work in progress to make the scheduling period for each group |
164 | ("/sys/kernel/uids/<uid>/cpu_rt_period_us" or | 153 | ("/cgroup/<cgroup>/cpu.rt_period_us") configurable as well. |
165 | "/cgroup/<cgroup>/cpu.rt_period_us" respectively) configurable as well. | ||
166 | 154 | ||
167 | The constraint on the period is that a subgroup must have a smaller or | 155 | The constraint on the period is that a subgroup must have a smaller or |
168 | equal period to its parent. But realistically its not very useful _yet_ | 156 | equal period to its parent. But realistically its not very useful _yet_ |
diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt index 02ac6ed38b2d..778ddf38b82c 100644 --- a/Documentation/trace/events.txt +++ b/Documentation/trace/events.txt | |||
@@ -90,7 +90,8 @@ In order to facilitate early boot debugging, use boot option: | |||
90 | 90 | ||
91 | trace_event=[event-list] | 91 | trace_event=[event-list] |
92 | 92 | ||
93 | The format of this boot option is the same as described in section 2.1. | 93 | event-list is a comma separated list of events. See section 2.1 for event |
94 | format. | ||
94 | 95 | ||
95 | 3. Defining an event-enabled tracepoint | 96 | 3. Defining an event-enabled tracepoint |
96 | ======================================= | 97 | ======================================= |
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index 03485bfbd797..557c1edeccaf 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt | |||
@@ -155,6 +155,9 @@ of ftrace. Here is a list of some of the key files: | |||
155 | to be traced. Echoing names of functions into this file | 155 | to be traced. Echoing names of functions into this file |
156 | will limit the trace to only those functions. | 156 | will limit the trace to only those functions. |
157 | 157 | ||
158 | This interface also allows for commands to be used. See the | ||
159 | "Filter commands" section for more details. | ||
160 | |||
158 | set_ftrace_notrace: | 161 | set_ftrace_notrace: |
159 | 162 | ||
160 | This has an effect opposite to that of | 163 | This has an effect opposite to that of |
@@ -1337,12 +1340,14 @@ ftrace_dump_on_oops must be set. To set ftrace_dump_on_oops, one | |||
1337 | can either use the sysctl function or set it via the proc system | 1340 | can either use the sysctl function or set it via the proc system |
1338 | interface. | 1341 | interface. |
1339 | 1342 | ||
1340 | sysctl kernel.ftrace_dump_on_oops=1 | 1343 | sysctl kernel.ftrace_dump_on_oops=n |
1341 | 1344 | ||
1342 | or | 1345 | or |
1343 | 1346 | ||
1344 | echo 1 > /proc/sys/kernel/ftrace_dump_on_oops | 1347 | echo n > /proc/sys/kernel/ftrace_dump_on_oops |
1345 | 1348 | ||
1349 | If n = 1, ftrace will dump buffers of all CPUs, if n = 2 ftrace will | ||
1350 | only dump the buffer of the CPU that triggered the oops. | ||
1346 | 1351 | ||
1347 | Here's an example of such a dump after a null pointer | 1352 | Here's an example of such a dump after a null pointer |
1348 | dereference in a kernel module: | 1353 | dereference in a kernel module: |
@@ -1822,6 +1827,47 @@ this special filter via: | |||
1822 | echo > set_graph_function | 1827 | echo > set_graph_function |
1823 | 1828 | ||
1824 | 1829 | ||
1830 | Filter commands | ||
1831 | --------------- | ||
1832 | |||
1833 | A few commands are supported by the set_ftrace_filter interface. | ||
1834 | Trace commands have the following format: | ||
1835 | |||
1836 | <function>:<command>:<parameter> | ||
1837 | |||
1838 | The following commands are supported: | ||
1839 | |||
1840 | - mod | ||
1841 | This command enables function filtering per module. The | ||
1842 | parameter defines the module. For example, if only the write* | ||
1843 | functions in the ext3 module are desired, run: | ||
1844 | |||
1845 | echo 'write*:mod:ext3' > set_ftrace_filter | ||
1846 | |||
1847 | This command interacts with the filter in the same way as | ||
1848 | filtering based on function names. Thus, adding more functions | ||
1849 | in a different module is accomplished by appending (>>) to the | ||
1850 | filter file. Remove specific module functions by prepending | ||
1851 | '!': | ||
1852 | |||
1853 | echo '!writeback*:mod:ext3' >> set_ftrace_filter | ||
1854 | |||
1855 | - traceon/traceoff | ||
1856 | These commands turn tracing on and off when the specified | ||
1857 | functions are hit. The parameter determines how many times the | ||
1858 | tracing system is turned on and off. If unspecified, there is | ||
1859 | no limit. For example, to disable tracing when a schedule bug | ||
1860 | is hit the first 5 times, run: | ||
1861 | |||
1862 | echo '__schedule_bug:traceoff:5' > set_ftrace_filter | ||
1863 | |||
1864 | These commands are cumulative whether or not they are appended | ||
1865 | to set_ftrace_filter. To remove a command, prepend it by '!' | ||
1866 | and drop the parameter: | ||
1867 | |||
1868 | echo '!__schedule_bug:traceoff' > set_ftrace_filter | ||
1869 | |||
1870 | |||
1825 | trace_pipe | 1871 | trace_pipe |
1826 | ---------- | 1872 | ---------- |
1827 | 1873 | ||
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index a9100b28eb84..ec94748ae65b 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt | |||
@@ -40,7 +40,9 @@ Synopsis of kprobe_events | |||
40 | $stack : Fetch stack address. | 40 | $stack : Fetch stack address. |
41 | $retval : Fetch return value.(*) | 41 | $retval : Fetch return value.(*) |
42 | +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) | 42 | +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) |
43 | NAME=FETCHARG: Set NAME as the argument name of FETCHARG. | 43 | NAME=FETCHARG : Set NAME as the argument name of FETCHARG. |
44 | FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types | ||
45 | (u8/u16/u32/u64/s8/s16/s32/s64) are supported. | ||
44 | 46 | ||
45 | (*) only for return probe. | 47 | (*) only for return probe. |
46 | (**) this is useful for fetching a field of data structures. | 48 | (**) this is useful for fetching a field of data structures. |