aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorIngo Molnar <mingo@elte.hu>2011-03-18 05:38:53 -0400
committerIngo Molnar <mingo@elte.hu>2011-03-18 05:39:00 -0400
commit8dd8997d2c56c9f248294805e129e1fc69444380 (patch)
tree3b030a04295fc031db98746c4074c2df1ed6a19f /Documentation
parent1eda75c131ea42ec173323b6c34aeed78ae637c1 (diff)
parent016aa2ed1cc9cf704cf76d8df07751b6daa9750f (diff)
Merge branch 'linus' into x86/urgent
Merge reason: Merge upstream commits to avoid conflicts in upcoming patches. Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/RCU/whatisRCU.txt31
-rw-r--r--Documentation/devicetree/bindings/i2c/ce4100-i2c.txt93
-rw-r--r--Documentation/devicetree/bindings/rtc/rtc-cmos.txt28
-rw-r--r--Documentation/devicetree/bindings/x86/ce4100.txt38
-rw-r--r--Documentation/devicetree/bindings/x86/interrupt.txt26
-rw-r--r--Documentation/devicetree/bindings/x86/timer.txt6
-rw-r--r--Documentation/devicetree/booting-without-of.txt20
-rw-r--r--Documentation/kernel-parameters.txt4
-rw-r--r--Documentation/memory-barriers.txt58
-rw-r--r--Documentation/rtc.txt29
-rw-r--r--Documentation/spinlocks.txt24
-rw-r--r--Documentation/trace/ftrace-design.txt7
-rw-r--r--Documentation/trace/ftrace.txt151
-rw-r--r--Documentation/trace/kprobetrace.txt16
14 files changed, 360 insertions, 171 deletions
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index cfaac34c4557..6ef692667e2f 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -849,6 +849,37 @@ All: lockdep-checked RCU-protected pointer access
849See the comment headers in the source code (or the docbook generated 849See the comment headers in the source code (or the docbook generated
850from them) for more information. 850from them) for more information.
851 851
852However, given that there are no fewer than four families of RCU APIs
853in the Linux kernel, how do you choose which one to use? The following
854list can be helpful:
855
856a. Will readers need to block? If so, you need SRCU.
857
858b. What about the -rt patchset? If readers would need to block
859 in an non-rt kernel, you need SRCU. If readers would block
860 in a -rt kernel, but not in a non-rt kernel, SRCU is not
861 necessary.
862
863c. Do you need to treat NMI handlers, hardirq handlers,
864 and code segments with preemption disabled (whether
865 via preempt_disable(), local_irq_save(), local_bh_disable(),
866 or some other mechanism) as if they were explicit RCU readers?
867 If so, you need RCU-sched.
868
869d. Do you need RCU grace periods to complete even in the face
870 of softirq monopolization of one or more of the CPUs? For
871 example, is your code subject to network-based denial-of-service
872 attacks? If so, you need RCU-bh.
873
874e. Is your workload too update-intensive for normal use of
875 RCU, but inappropriate for other synchronization mechanisms?
876 If so, consider SLAB_DESTROY_BY_RCU. But please be careful!
877
878f. Otherwise, use RCU.
879
880Of course, this all assumes that you have determined that RCU is in fact
881the right tool for your job.
882
852 883
8538. ANSWERS TO QUICK QUIZZES 8848. ANSWERS TO QUICK QUIZZES
854 885
diff --git a/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt b/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt
new file mode 100644
index 000000000000..569b16248514
--- /dev/null
+++ b/Documentation/devicetree/bindings/i2c/ce4100-i2c.txt
@@ -0,0 +1,93 @@
1CE4100 I2C
2----------
3
4CE4100 has one PCI device which is described as the I2C-Controller. This
5PCI device has three PCI-bars, each bar contains a complete I2C
6controller. So we have a total of three independent I2C-Controllers
7which share only an interrupt line.
8The driver is probed via the PCI-ID and is gathering the information of
9attached devices from the devices tree.
10Grant Likely recommended to use the ranges property to map the PCI-Bar
11number to its physical address and to use this to find the child nodes
12of the specific I2C controller. This were his exact words:
13
14 Here's where the magic happens. Each entry in
15 ranges describes how the parent pci address space
16 (middle group of 3) is translated to the local
17 address space (first group of 2) and the size of
18 each range (last cell). In this particular case,
19 the first cell of the local address is chosen to be
20 1:1 mapped to the BARs, and the second is the
21 offset from be base of the BAR (which would be
22 non-zero if you had 2 or more devices mapped off
23 the same BAR)
24
25 ranges allows the address mapping to be described
26 in a way that the OS can interpret without
27 requiring custom device driver code.
28
29This is an example which is used on FalconFalls:
30------------------------------------------------
31 i2c-controller@b,2 {
32 #address-cells = <2>;
33 #size-cells = <1>;
34 compatible = "pci8086,2e68.2",
35 "pci8086,2e68",
36 "pciclass,ff0000",
37 "pciclass,ff00";
38
39 reg = <0x15a00 0x0 0x0 0x0 0x0>;
40 interrupts = <16 1>;
41
42 /* as described by Grant, the first number in the group of
43 * three is the bar number followed by the 64bit bar address
44 * followed by size of the mapping. The bar address
45 * requires also a valid translation in parents ranges
46 * property.
47 */
48 ranges = <0 0 0x02000000 0 0xdffe0500 0x100
49 1 0 0x02000000 0 0xdffe0600 0x100
50 2 0 0x02000000 0 0xdffe0700 0x100>;
51
52 i2c@0 {
53 #address-cells = <1>;
54 #size-cells = <0>;
55 compatible = "intel,ce4100-i2c-controller";
56
57 /* The first number in the reg property is the
58 * number of the bar
59 */
60 reg = <0 0 0x100>;
61
62 /* This I2C controller has no devices */
63 };
64
65 i2c@1 {
66 #address-cells = <1>;
67 #size-cells = <0>;
68 compatible = "intel,ce4100-i2c-controller";
69 reg = <1 0 0x100>;
70
71 /* This I2C controller has one gpio controller */
72 gpio@26 {
73 #gpio-cells = <2>;
74 compatible = "ti,pcf8575";
75 reg = <0x26>;
76 gpio-controller;
77 };
78 };
79
80 i2c@2 {
81 #address-cells = <1>;
82 #size-cells = <0>;
83 compatible = "intel,ce4100-i2c-controller";
84 reg = <2 0 0x100>;
85
86 gpio@26 {
87 #gpio-cells = <2>;
88 compatible = "ti,pcf8575";
89 reg = <0x26>;
90 gpio-controller;
91 };
92 };
93 };
diff --git a/Documentation/devicetree/bindings/rtc/rtc-cmos.txt b/Documentation/devicetree/bindings/rtc/rtc-cmos.txt
new file mode 100644
index 000000000000..7382989b3052
--- /dev/null
+++ b/Documentation/devicetree/bindings/rtc/rtc-cmos.txt
@@ -0,0 +1,28 @@
1 Motorola mc146818 compatible RTC
2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3
4Required properties:
5 - compatible : "motorola,mc146818"
6 - reg : should contain registers location and length.
7
8Optional properties:
9 - interrupts : should contain interrupt.
10 - interrupt-parent : interrupt source phandle.
11 - ctrl-reg : Contains the initial value of the control register also
12 called "Register B".
13 - freq-reg : Contains the initial value of the frequency register also
14 called "Regsiter A".
15
16"Register A" and "B" are usually initialized by the firmware (BIOS for
17instance). If this is not done, it can be performed by the driver.
18
19ISA Example:
20
21 rtc@70 {
22 compatible = "motorola,mc146818";
23 interrupts = <8 3>;
24 interrupt-parent = <&ioapic1>;
25 ctrl-reg = <2>;
26 freq-reg = <0x26>;
27 reg = <1 0x70 2>;
28 };
diff --git a/Documentation/devicetree/bindings/x86/ce4100.txt b/Documentation/devicetree/bindings/x86/ce4100.txt
new file mode 100644
index 000000000000..b49ae593a60b
--- /dev/null
+++ b/Documentation/devicetree/bindings/x86/ce4100.txt
@@ -0,0 +1,38 @@
1CE4100 Device Tree Bindings
2---------------------------
3
4The CE4100 SoC uses for in core peripherals the following compatible
5format: <vendor>,<chip>-<device>.
6Many of the "generic" devices like HPET or IO APIC have the ce4100
7name in their compatible property because they first appeared in this
8SoC.
9
10The CPU node
11------------
12 cpu@0 {
13 device_type = "cpu";
14 compatible = "intel,ce4100";
15 reg = <0>;
16 lapic = <&lapic0>;
17 };
18
19The reg property describes the CPU number. The lapic property points to
20the local APIC timer.
21
22The SoC node
23------------
24
25This node describes the in-core peripherals. Required property:
26 compatible = "intel,ce4100-cp";
27
28The PCI node
29------------
30This node describes the PCI bus on the SoC. Its property should be
31 compatible = "intel,ce4100-pci", "pci";
32
33If the OS is using the IO-APIC for interrupt routing then the reported
34interrupt numbers for devices is no longer true. In order to obtain the
35correct interrupt number, the child node which represents the device has
36to contain the interrupt property. Besides the interrupt property it has
37to contain at least the reg property containing the PCI bus address and
38compatible property according to "PCI Bus Binding Revision 2.1".
diff --git a/Documentation/devicetree/bindings/x86/interrupt.txt b/Documentation/devicetree/bindings/x86/interrupt.txt
new file mode 100644
index 000000000000..7d19f494f19a
--- /dev/null
+++ b/Documentation/devicetree/bindings/x86/interrupt.txt
@@ -0,0 +1,26 @@
1Interrupt chips
2---------------
3
4* Intel I/O Advanced Programmable Interrupt Controller (IO APIC)
5
6 Required properties:
7 --------------------
8 compatible = "intel,ce4100-ioapic";
9 #interrupt-cells = <2>;
10
11 Device's interrupt property:
12
13 interrupts = <P S>;
14
15 The first number (P) represents the interrupt pin which is wired to the
16 IO APIC. The second number (S) represents the sense of interrupt which
17 should be configured and can be one of:
18 0 - Edge Rising
19 1 - Level Low
20 2 - Level High
21 3 - Edge Falling
22
23* Local APIC
24 Required property:
25
26 compatible = "intel,ce4100-lapic";
diff --git a/Documentation/devicetree/bindings/x86/timer.txt b/Documentation/devicetree/bindings/x86/timer.txt
new file mode 100644
index 000000000000..c688af58e3bd
--- /dev/null
+++ b/Documentation/devicetree/bindings/x86/timer.txt
@@ -0,0 +1,6 @@
1Timers
2------
3
4* High Precision Event Timer (HPET)
5 Required property:
6 compatible = "intel,ce4100-hpet";
diff --git a/Documentation/devicetree/booting-without-of.txt b/Documentation/devicetree/booting-without-of.txt
index 28b1c9d3d351..55fd2623445b 100644
--- a/Documentation/devicetree/booting-without-of.txt
+++ b/Documentation/devicetree/booting-without-of.txt
@@ -13,6 +13,7 @@ Table of Contents
13 13
14 I - Introduction 14 I - Introduction
15 1) Entry point for arch/powerpc 15 1) Entry point for arch/powerpc
16 2) Entry point for arch/x86
16 17
17 II - The DT block format 18 II - The DT block format
18 1) Header 19 1) Header
@@ -225,6 +226,25 @@ it with special cases.
225 cannot support both configurations with Book E and configurations 226 cannot support both configurations with Book E and configurations
226 with classic Powerpc architectures. 227 with classic Powerpc architectures.
227 228
2292) Entry point for arch/x86
230-------------------------------
231
232 There is one single 32bit entry point to the kernel at code32_start,
233 the decompressor (the real mode entry point goes to the same 32bit
234 entry point once it switched into protected mode). That entry point
235 supports one calling convention which is documented in
236 Documentation/x86/boot.txt
237 The physical pointer to the device-tree block (defined in chapter II)
238 is passed via setup_data which requires at least boot protocol 2.09.
239 The type filed is defined as
240
241 #define SETUP_DTB 2
242
243 This device-tree is used as an extension to the "boot page". As such it
244 does not parse / consider data which is already covered by the boot
245 page. This includes memory size, reserved ranges, command line arguments
246 or initrd address. It simply holds information which can not be retrieved
247 otherwise like interrupt routing or a list of devices behind an I2C bus.
228 248
229II - The DT block format 249II - The DT block format
230======================== 250========================
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index f4a04c0c7edc..738c6fda3fb0 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2444,6 +2444,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
2444 <deci-seconds>: poll all this frequency 2444 <deci-seconds>: poll all this frequency
2445 0: no polling (default) 2445 0: no polling (default)
2446 2446
2447 threadirqs [KNL]
2448 Force threading of all interrupt handlers except those
2449 marked explicitely IRQF_NO_THREAD.
2450
2447 topology= [S390] 2451 topology= [S390]
2448 Format: {off | on} 2452 Format: {off | on}
2449 Specify if the kernel should make use of the cpu 2453 Specify if the kernel should make use of the cpu
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 631ad2f1b229..f0d3a8026a56 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -21,6 +21,7 @@ Contents:
21 - SMP barrier pairing. 21 - SMP barrier pairing.
22 - Examples of memory barrier sequences. 22 - Examples of memory barrier sequences.
23 - Read memory barriers vs load speculation. 23 - Read memory barriers vs load speculation.
24 - Transitivity
24 25
25 (*) Explicit kernel barriers. 26 (*) Explicit kernel barriers.
26 27
@@ -959,6 +960,63 @@ the speculation will be cancelled and the value reloaded:
959 retrieved : : +-------+ 960 retrieved : : +-------+
960 961
961 962
963TRANSITIVITY
964------------
965
966Transitivity is a deeply intuitive notion about ordering that is not
967always provided by real computer systems. The following example
968demonstrates transitivity (also called "cumulativity"):
969
970 CPU 1 CPU 2 CPU 3
971 ======================= ======================= =======================
972 { X = 0, Y = 0 }
973 STORE X=1 LOAD X STORE Y=1
974 <general barrier> <general barrier>
975 LOAD Y LOAD X
976
977Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
978This indicates that CPU 2's load from X in some sense follows CPU 1's
979store to X and that CPU 2's load from Y in some sense preceded CPU 3's
980store to Y. The question is then "Can CPU 3's load from X return 0?"
981
982Because CPU 2's load from X in some sense came after CPU 1's store, it
983is natural to expect that CPU 3's load from X must therefore return 1.
984This expectation is an example of transitivity: if a load executing on
985CPU A follows a load from the same variable executing on CPU B, then
986CPU A's load must either return the same value that CPU B's load did,
987or must return some later value.
988
989In the Linux kernel, use of general memory barriers guarantees
990transitivity. Therefore, in the above example, if CPU 2's load from X
991returns 1 and its load from Y returns 0, then CPU 3's load from X must
992also return 1.
993
994However, transitivity is -not- guaranteed for read or write barriers.
995For example, suppose that CPU 2's general barrier in the above example
996is changed to a read barrier as shown below:
997
998 CPU 1 CPU 2 CPU 3
999 ======================= ======================= =======================
1000 { X = 0, Y = 0 }
1001 STORE X=1 LOAD X STORE Y=1
1002 <read barrier> <general barrier>
1003 LOAD Y LOAD X
1004
1005This substitution destroys transitivity: in this example, it is perfectly
1006legal for CPU 2's load from X to return 1, its load from Y to return 0,
1007and CPU 3's load from X to return 0.
1008
1009The key point is that although CPU 2's read barrier orders its pair
1010of loads, it does not guarantee to order CPU 1's store. Therefore, if
1011this example runs on a system where CPUs 1 and 2 share a store buffer
1012or a level of cache, CPU 2 might have early access to CPU 1's writes.
1013General barriers are therefore required to ensure that all CPUs agree
1014on the combined order of CPU 1's and CPU 2's accesses.
1015
1016To reiterate, if your code requires transitivity, use general barriers
1017throughout.
1018
1019
962======================== 1020========================
963EXPLICIT KERNEL BARRIERS 1021EXPLICIT KERNEL BARRIERS
964======================== 1022========================
diff --git a/Documentation/rtc.txt b/Documentation/rtc.txt
index 9104c1062084..250160469d83 100644
--- a/Documentation/rtc.txt
+++ b/Documentation/rtc.txt
@@ -178,38 +178,29 @@ RTC class framework, but can't be supported by the older driver.
178 setting the longer alarm time and enabling its IRQ using a single 178 setting the longer alarm time and enabling its IRQ using a single
179 request (using the same model as EFI firmware). 179 request (using the same model as EFI firmware).
180 180
181 * RTC_UIE_ON, RTC_UIE_OFF ... if the RTC offers IRQs, it probably 181 * RTC_UIE_ON, RTC_UIE_OFF ... if the RTC offers IRQs, the RTC framework
182 also offers update IRQs whenever the "seconds" counter changes. 182 will emulate this mechanism.
183 If needed, the RTC framework can emulate this mechanism.
184 183
185 * RTC_PIE_ON, RTC_PIE_OFF, RTC_IRQP_SET, RTC_IRQP_READ ... another 184 * RTC_PIE_ON, RTC_PIE_OFF, RTC_IRQP_SET, RTC_IRQP_READ ... these icotls
186 feature often accessible with an IRQ line is a periodic IRQ, issued 185 are emulated via a kernel hrtimer.
187 at settable frequencies (usually 2^N Hz).
188 186
189In many cases, the RTC alarm can be a system wake event, used to force 187In many cases, the RTC alarm can be a system wake event, used to force
190Linux out of a low power sleep state (or hibernation) back to a fully 188Linux out of a low power sleep state (or hibernation) back to a fully
191operational state. For example, a system could enter a deep power saving 189operational state. For example, a system could enter a deep power saving
192state until it's time to execute some scheduled tasks. 190state until it's time to execute some scheduled tasks.
193 191
194Note that many of these ioctls need not actually be implemented by your 192Note that many of these ioctls are handled by the common rtc-dev interface.
195driver. The common rtc-dev interface handles many of these nicely if your 193Some common examples:
196driver returns ENOIOCTLCMD. Some common examples:
197 194
198 * RTC_RD_TIME, RTC_SET_TIME: the read_time/set_time functions will be 195 * RTC_RD_TIME, RTC_SET_TIME: the read_time/set_time functions will be
199 called with appropriate values. 196 called with appropriate values.
200 197
201 * RTC_ALM_SET, RTC_ALM_READ, RTC_WKALM_SET, RTC_WKALM_RD: the 198 * RTC_ALM_SET, RTC_ALM_READ, RTC_WKALM_SET, RTC_WKALM_RD: gets or sets
202 set_alarm/read_alarm functions will be called. 199 the alarm rtc_timer. May call the set_alarm driver function.
203 200
204 * RTC_IRQP_SET, RTC_IRQP_READ: the irq_set_freq function will be called 201 * RTC_IRQP_SET, RTC_IRQP_READ: These are emulated by the generic code.
205 to set the frequency while the framework will handle the read for you
206 since the frequency is stored in the irq_freq member of the rtc_device
207 structure. Your driver needs to initialize the irq_freq member during
208 init. Make sure you check the requested frequency is in range of your
209 hardware in the irq_set_freq function. If it isn't, return -EINVAL. If
210 you cannot actually change the frequency, do not define irq_set_freq.
211 202
212 * RTC_PIE_ON, RTC_PIE_OFF: the irq_set_state function will be called. 203 * RTC_PIE_ON, RTC_PIE_OFF: These are also emulated by the generic code.
213 204
214If all else fails, check out the rtc-test.c driver! 205If all else fails, check out the rtc-test.c driver!
215 206
diff --git a/Documentation/spinlocks.txt b/Documentation/spinlocks.txt
index 178c831b907d..2e3c64b1a6a5 100644
--- a/Documentation/spinlocks.txt
+++ b/Documentation/spinlocks.txt
@@ -86,7 +86,7 @@ to change the variables it has to get an exclusive write lock.
86 86
87The routines look the same as above: 87The routines look the same as above:
88 88
89 rwlock_t xxx_lock = RW_LOCK_UNLOCKED; 89 rwlock_t xxx_lock = __RW_LOCK_UNLOCKED(xxx_lock);
90 90
91 unsigned long flags; 91 unsigned long flags;
92 92
@@ -196,25 +196,3 @@ appropriate:
196 196
197For static initialization, use DEFINE_SPINLOCK() / DEFINE_RWLOCK() or 197For static initialization, use DEFINE_SPINLOCK() / DEFINE_RWLOCK() or
198__SPIN_LOCK_UNLOCKED() / __RW_LOCK_UNLOCKED() as appropriate. 198__SPIN_LOCK_UNLOCKED() / __RW_LOCK_UNLOCKED() as appropriate.
199
200SPIN_LOCK_UNLOCKED and RW_LOCK_UNLOCKED are deprecated. These interfere
201with lockdep state tracking.
202
203Most of the time, you can simply turn:
204 static spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED;
205into:
206 static DEFINE_SPINLOCK(xxx_lock);
207
208Static structure member variables go from:
209
210 struct foo bar {
211 .lock = SPIN_LOCK_UNLOCKED;
212 };
213
214to:
215
216 struct foo bar {
217 .lock = __SPIN_LOCK_UNLOCKED(bar.lock);
218 };
219
220Declaration of static rw_locks undergo a similar transformation.
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
index dc52bd442c92..79fcafc7fd64 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.txt
@@ -247,6 +247,13 @@ You need very few things to get the syscalls tracing in an arch.
247- Support the TIF_SYSCALL_TRACEPOINT thread flags. 247- Support the TIF_SYSCALL_TRACEPOINT thread flags.
248- Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace 248- Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace
249 in the ptrace syscalls tracing path. 249 in the ptrace syscalls tracing path.
250- If the system call table on this arch is more complicated than a simple array
251 of addresses of the system calls, implement an arch_syscall_addr to return
252 the address of a given system call.
253- If the symbol names of the system calls do not match the function names on
254 this arch, define ARCH_HAS_SYSCALL_MATCH_SYM_NAME in asm/ftrace.h and
255 implement arch_syscall_match_sym_name with the appropriate logic to return
256 true if the function name corresponds with the symbol name.
250- Tag this arch as HAVE_SYSCALL_TRACEPOINTS. 257- Tag this arch as HAVE_SYSCALL_TRACEPOINTS.
251 258
252 259
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 557c1edeccaf..1ebc24cf9a55 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -80,11 +80,11 @@ of ftrace. Here is a list of some of the key files:
80 tracers listed here can be configured by 80 tracers listed here can be configured by
81 echoing their name into current_tracer. 81 echoing their name into current_tracer.
82 82
83 tracing_enabled: 83 tracing_on:
84 84
85 This sets or displays whether the current_tracer 85 This sets or displays whether writing to the trace
86 is activated and tracing or not. Echo 0 into this 86 ring buffer is enabled. Echo 0 into this file to disable
87 file to disable the tracer or 1 to enable it. 87 the tracer or 1 to enable it.
88 88
89 trace: 89 trace:
90 90
@@ -202,10 +202,6 @@ Here is the list of current tracers that may be configured.
202 to draw a graph of function calls similar to C code 202 to draw a graph of function calls similar to C code
203 source. 203 source.
204 204
205 "sched_switch"
206
207 Traces the context switches and wakeups between tasks.
208
209 "irqsoff" 205 "irqsoff"
210 206
211 Traces the areas that disable interrupts and saves 207 Traces the areas that disable interrupts and saves
@@ -273,39 +269,6 @@ format, the function name that was traced "path_put" and the
273parent function that called this function "path_walk". The 269parent function that called this function "path_walk". The
274timestamp is the time at which the function was entered. 270timestamp is the time at which the function was entered.
275 271
276The sched_switch tracer also includes tracing of task wakeups
277and context switches.
278
279 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
280 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S
281 ksoftirqd/1-7 [01] 1453.070013: 7:115:R ==> 10:115:R
282 events/1-10 [01] 1453.070013: 10:115:S ==> 2916:115:R
283 kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
284 ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R
285
286Wake ups are represented by a "+" and the context switches are
287shown as "==>". The format is:
288
289 Context switches:
290
291 Previous task Next Task
292
293 <pid>:<prio>:<state> ==> <pid>:<prio>:<state>
294
295 Wake ups:
296
297 Current task Task waking up
298
299 <pid>:<prio>:<state> + <pid>:<prio>:<state>
300
301The prio is the internal kernel priority, which is the inverse
302of the priority that is usually displayed by user-space tools.
303Zero represents the highest priority (99). Prio 100 starts the
304"nice" priorities with 100 being equal to nice -20 and 139 being
305nice 19. The prio "140" is reserved for the idle task which is
306the lowest priority thread (pid 0).
307
308
309Latency trace format 272Latency trace format
310-------------------- 273--------------------
311 274
@@ -491,78 +454,10 @@ x494] <- /root/a.out[+0x4a8] <- /lib/libc-2.7.so[+0x1e1a6]
491 latencies, as described in "Latency 454 latencies, as described in "Latency
492 trace format". 455 trace format".
493 456
494sched_switch 457 overwrite - This controls what happens when the trace buffer is
495------------ 458 full. If "1" (default), the oldest events are
496 459 discarded and overwritten. If "0", then the newest
497This tracer simply records schedule switches. Here is an example 460 events are discarded.
498of how to use it.
499
500 # echo sched_switch > current_tracer
501 # echo 1 > tracing_enabled
502 # sleep 1
503 # echo 0 > tracing_enabled
504 # cat trace
505
506# tracer: sched_switch
507#
508# TASK-PID CPU# TIMESTAMP FUNCTION
509# | | | | |
510 bash-3997 [01] 240.132281: 3997:120:R + 4055:120:R
511 bash-3997 [01] 240.132284: 3997:120:R ==> 4055:120:R
512 sleep-4055 [01] 240.132371: 4055:120:S ==> 3997:120:R
513 bash-3997 [01] 240.132454: 3997:120:R + 4055:120:S
514 bash-3997 [01] 240.132457: 3997:120:R ==> 4055:120:R
515 sleep-4055 [01] 240.132460: 4055:120:D ==> 3997:120:R
516 bash-3997 [01] 240.132463: 3997:120:R + 4055:120:D
517 bash-3997 [01] 240.132465: 3997:120:R ==> 4055:120:R
518 <idle>-0 [00] 240.132589: 0:140:R + 4:115:S
519 <idle>-0 [00] 240.132591: 0:140:R ==> 4:115:R
520 ksoftirqd/0-4 [00] 240.132595: 4:115:S ==> 0:140:R
521 <idle>-0 [00] 240.132598: 0:140:R + 4:115:S
522 <idle>-0 [00] 240.132599: 0:140:R ==> 4:115:R
523 ksoftirqd/0-4 [00] 240.132603: 4:115:S ==> 0:140:R
524 sleep-4055 [01] 240.133058: 4055:120:S ==> 3997:120:R
525 [...]
526
527
528As we have discussed previously about this format, the header
529shows the name of the trace and points to the options. The
530"FUNCTION" is a misnomer since here it represents the wake ups
531and context switches.
532
533The sched_switch file only lists the wake ups (represented with
534'+') and context switches ('==>') with the previous task or
535current task first followed by the next task or task waking up.
536The format for both of these is PID:KERNEL-PRIO:TASK-STATE.
537Remember that the KERNEL-PRIO is the inverse of the actual
538priority with zero (0) being the highest priority and the nice
539values starting at 100 (nice -20). Below is a quick chart to map
540the kernel priority to user land priorities.
541
542 Kernel Space User Space
543 ===============================================================
544 0(high) to 98(low) user RT priority 99(high) to 1(low)
545 with SCHED_RR or SCHED_FIFO
546 ---------------------------------------------------------------
547 99 sched_priority is not used in scheduling
548 decisions(it must be specified as 0)
549 ---------------------------------------------------------------
550 100(high) to 139(low) user nice -20(high) to 19(low)
551 ---------------------------------------------------------------
552 140 idle task priority
553 ---------------------------------------------------------------
554
555The task states are:
556
557 R - running : wants to run, may not actually be running
558 S - sleep : process is waiting to be woken up (handles signals)
559 D - disk sleep (uninterruptible sleep) : process must be woken up
560 (ignores signals)
561 T - stopped : process suspended
562 t - traced : process is being traced (with something like gdb)
563 Z - zombie : process waiting to be cleaned up
564 X - unknown
565
566 461
567ftrace_enabled 462ftrace_enabled
568-------------- 463--------------
@@ -607,10 +502,10 @@ an example:
607 # echo irqsoff > current_tracer 502 # echo irqsoff > current_tracer
608 # echo latency-format > trace_options 503 # echo latency-format > trace_options
609 # echo 0 > tracing_max_latency 504 # echo 0 > tracing_max_latency
610 # echo 1 > tracing_enabled 505 # echo 1 > tracing_on
611 # ls -ltr 506 # ls -ltr
612 [...] 507 [...]
613 # echo 0 > tracing_enabled 508 # echo 0 > tracing_on
614 # cat trace 509 # cat trace
615# tracer: irqsoff 510# tracer: irqsoff
616# 511#
@@ -715,10 +610,10 @@ is much like the irqsoff tracer.
715 # echo preemptoff > current_tracer 610 # echo preemptoff > current_tracer
716 # echo latency-format > trace_options 611 # echo latency-format > trace_options
717 # echo 0 > tracing_max_latency 612 # echo 0 > tracing_max_latency
718 # echo 1 > tracing_enabled 613 # echo 1 > tracing_on
719 # ls -ltr 614 # ls -ltr
720 [...] 615 [...]
721 # echo 0 > tracing_enabled 616 # echo 0 > tracing_on
722 # cat trace 617 # cat trace
723# tracer: preemptoff 618# tracer: preemptoff
724# 619#
@@ -863,10 +758,10 @@ tracers.
863 # echo preemptirqsoff > current_tracer 758 # echo preemptirqsoff > current_tracer
864 # echo latency-format > trace_options 759 # echo latency-format > trace_options
865 # echo 0 > tracing_max_latency 760 # echo 0 > tracing_max_latency
866 # echo 1 > tracing_enabled 761 # echo 1 > tracing_on
867 # ls -ltr 762 # ls -ltr
868 [...] 763 [...]
869 # echo 0 > tracing_enabled 764 # echo 0 > tracing_on
870 # cat trace 765 # cat trace
871# tracer: preemptirqsoff 766# tracer: preemptirqsoff
872# 767#
@@ -1026,9 +921,9 @@ Instead of performing an 'ls', we will run 'sleep 1' under
1026 # echo wakeup > current_tracer 921 # echo wakeup > current_tracer
1027 # echo latency-format > trace_options 922 # echo latency-format > trace_options
1028 # echo 0 > tracing_max_latency 923 # echo 0 > tracing_max_latency
1029 # echo 1 > tracing_enabled 924 # echo 1 > tracing_on
1030 # chrt -f 5 sleep 1 925 # chrt -f 5 sleep 1
1031 # echo 0 > tracing_enabled 926 # echo 0 > tracing_on
1032 # cat trace 927 # cat trace
1033# tracer: wakeup 928# tracer: wakeup
1034# 929#
@@ -1140,9 +1035,9 @@ ftrace_enabled is set; otherwise this tracer is a nop.
1140 1035
1141 # sysctl kernel.ftrace_enabled=1 1036 # sysctl kernel.ftrace_enabled=1
1142 # echo function > current_tracer 1037 # echo function > current_tracer
1143 # echo 1 > tracing_enabled 1038 # echo 1 > tracing_on
1144 # usleep 1 1039 # usleep 1
1145 # echo 0 > tracing_enabled 1040 # echo 0 > tracing_on
1146 # cat trace 1041 # cat trace
1147# tracer: function 1042# tracer: function
1148# 1043#
@@ -1180,7 +1075,7 @@ int trace_fd;
1180[...] 1075[...]
1181int main(int argc, char *argv[]) { 1076int main(int argc, char *argv[]) {
1182 [...] 1077 [...]
1183 trace_fd = open(tracing_file("tracing_enabled"), O_WRONLY); 1078 trace_fd = open(tracing_file("tracing_on"), O_WRONLY);
1184 [...] 1079 [...]
1185 if (condition_hit()) { 1080 if (condition_hit()) {
1186 write(trace_fd, "0", 1); 1081 write(trace_fd, "0", 1);
@@ -1631,9 +1526,9 @@ If I am only interested in sys_nanosleep and hrtimer_interrupt:
1631 # echo sys_nanosleep hrtimer_interrupt \ 1526 # echo sys_nanosleep hrtimer_interrupt \
1632 > set_ftrace_filter 1527 > set_ftrace_filter
1633 # echo function > current_tracer 1528 # echo function > current_tracer
1634 # echo 1 > tracing_enabled 1529 # echo 1 > tracing_on
1635 # usleep 1 1530 # usleep 1
1636 # echo 0 > tracing_enabled 1531 # echo 0 > tracing_on
1637 # cat trace 1532 # cat trace
1638# tracer: ftrace 1533# tracer: ftrace
1639# 1534#
@@ -1879,9 +1774,9 @@ different. The trace is live.
1879 # echo function > current_tracer 1774 # echo function > current_tracer
1880 # cat trace_pipe > /tmp/trace.out & 1775 # cat trace_pipe > /tmp/trace.out &
1881[1] 4153 1776[1] 4153
1882 # echo 1 > tracing_enabled 1777 # echo 1 > tracing_on
1883 # usleep 1 1778 # usleep 1
1884 # echo 0 > tracing_enabled 1779 # echo 0 > tracing_on
1885 # cat trace 1780 # cat trace
1886# tracer: function 1781# tracer: function
1887# 1782#
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 5f77d94598dd..6d27ab8d6e9f 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -42,11 +42,25 @@ Synopsis of kprobe_events
42 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) 42 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
43 NAME=FETCHARG : Set NAME as the argument name of FETCHARG. 43 NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
44 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types 44 FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
45 (u8/u16/u32/u64/s8/s16/s32/s64) and string are supported. 45 (u8/u16/u32/u64/s8/s16/s32/s64), "string" and bitfield
46 are supported.
46 47
47 (*) only for return probe. 48 (*) only for return probe.
48 (**) this is useful for fetching a field of data structures. 49 (**) this is useful for fetching a field of data structures.
49 50
51Types
52-----
53Several types are supported for fetch-args. Kprobe tracer will access memory
54by given type. Prefix 's' and 'u' means those types are signed and unsigned
55respectively. Traced arguments are shown in decimal (signed) or hex (unsigned).
56String type is a special type, which fetches a "null-terminated" string from
57kernel space. This means it will fail and store NULL if the string container
58has been paged out.
59Bitfield is another special type, which takes 3 parameters, bit-width, bit-
60offset, and container-size (usually 32). The syntax is;
61
62 b<bit-width>@<bit-offset>/<container-size>
63
50 64
51Per-Probe Event Filtering 65Per-Probe Event Filtering
52------------------------- 66-------------------------