aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/testing/sysfs-class-bdi4
-rw-r--r--Documentation/DocBook/kernel-locking.tmpl25
-rw-r--r--Documentation/DocBook/kgdb.tmpl20
-rw-r--r--Documentation/HOWTO2
-rw-r--r--Documentation/SubmittingPatches46
-rw-r--r--Documentation/accounting/taskstats-struct.txt6
-rw-r--r--Documentation/auxdisplay/cfag12864b4
-rw-r--r--Documentation/auxdisplay/cfag12864b-example.c2
-rw-r--r--Documentation/auxdisplay/ks01084
-rw-r--r--Documentation/cciss.txt5
-rw-r--r--Documentation/cgroups.txt4
-rw-r--r--Documentation/controllers/devices.txt8
-rw-r--r--Documentation/cpu-freq/governors.txt8
-rw-r--r--Documentation/cpusets.txt20
-rw-r--r--Documentation/feature-removal-schedule.txt9
-rw-r--r--Documentation/filesystems/ext4.txt12
-rw-r--r--Documentation/filesystems/sysfs-pci.txt1
-rw-r--r--Documentation/ftrace.txt1353
-rw-r--r--Documentation/hwmon/ibmaem37
-rw-r--r--Documentation/hwmon/sysfs-interface33
-rw-r--r--Documentation/i2c/writing-clients18
-rw-r--r--Documentation/kernel-doc-nano-HOWTO.txt99
-rw-r--r--Documentation/kernel-docs.txt8
-rw-r--r--Documentation/kernel-parameters.txt12
-rw-r--r--Documentation/kobject.txt2
-rw-r--r--Documentation/laptops/thinkpad-acpi.txt2
-rw-r--r--Documentation/lguest/lguest.c12
-rw-r--r--Documentation/networking/arcnet.txt2
-rw-r--r--Documentation/networking/bridge.txt2
-rw-r--r--Documentation/networking/ip-sysctl.txt268
-rw-r--r--Documentation/networking/s2io.txt6
-rw-r--r--Documentation/video4linux/CARDLIST.au08282
-rw-r--r--Documentation/video4linux/CARDLIST.cx882
-rw-r--r--Documentation/video4linux/cx18.txt4
-rw-r--r--Documentation/vm/pagemap.txt77
-rw-r--r--Documentation/vm/slabinfo.c4
-rw-r--r--Documentation/vm/slub.txt2
37 files changed, 1992 insertions, 133 deletions
diff --git a/Documentation/ABI/testing/sysfs-class-bdi b/Documentation/ABI/testing/sysfs-class-bdi
index 5ac1e01bbd48..5f500977b42f 100644
--- a/Documentation/ABI/testing/sysfs-class-bdi
+++ b/Documentation/ABI/testing/sysfs-class-bdi
@@ -14,6 +14,10 @@ MAJOR:MINOR
14 non-block filesystems which provide their own BDI, such as NFS 14 non-block filesystems which provide their own BDI, such as NFS
15 and FUSE. 15 and FUSE.
16 16
17MAJOR:MINOR-fuseblk
18
19 Value of st_dev on fuseblk filesystems.
20
17default 21default
18 22
19 The default backing dev, used for non-block device backed 23 The default backing dev, used for non-block device backed
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index 77c42f40be5d..2510763295d0 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -703,6 +703,31 @@
703</sect1> 703</sect1>
704</chapter> 704</chapter>
705 705
706<chapter id="trylock-functions">
707 <title>The trylock Functions</title>
708 <para>
709 There are functions that try to acquire a lock only once and immediately
710 return a value telling about success or failure to acquire the lock.
711 They can be used if you need no access to the data protected with the lock
712 when some other thread is holding the lock. You should acquire the lock
713 later if you then need access to the data protected with the lock.
714 </para>
715
716 <para>
717 <function>spin_trylock()</function> does not spin but returns non-zero if
718 it acquires the spinlock on the first try or 0 if not. This function can
719 be used in all contexts like <function>spin_lock</function>: you must have
720 disabled the contexts that might interrupt you and acquire the spin lock.
721 </para>
722
723 <para>
724 <function>mutex_trylock()</function> does not suspend your task
725 but returns non-zero if it could lock the mutex on the first try
726 or 0 if not. This function cannot be safely used in hardware or software
727 interrupt contexts despite not sleeping.
728 </para>
729</chapter>
730
706 <chapter id="Examples"> 731 <chapter id="Examples">
707 <title>Common Examples</title> 732 <title>Common Examples</title>
708 <para> 733 <para>
diff --git a/Documentation/DocBook/kgdb.tmpl b/Documentation/DocBook/kgdb.tmpl
index 028a8444d95e..e8acd1f03456 100644
--- a/Documentation/DocBook/kgdb.tmpl
+++ b/Documentation/DocBook/kgdb.tmpl
@@ -84,10 +84,9 @@
84 runs an instance of gdb against the vmlinux file which contains 84 runs an instance of gdb against the vmlinux file which contains
85 the symbols (not boot image such as bzImage, zImage, uImage...). 85 the symbols (not boot image such as bzImage, zImage, uImage...).
86 In gdb the developer specifies the connection parameters and 86 In gdb the developer specifies the connection parameters and
87 connects to kgdb. Depending on which kgdb I/O modules exist in 87 connects to kgdb. The type of connection a developer makes with
88 the kernel for a given architecture, it may be possible to debug 88 gdb depends on the availability of kgdb I/O modules compiled as
89 the test machine's kernel with the development machine using a 89 builtin's or kernel modules in the test machine's kernel.
90 rs232 or ethernet connection.
91 </para> 90 </para>
92 </chapter> 91 </chapter>
93 <chapter id="CompilingAKernel"> 92 <chapter id="CompilingAKernel">
@@ -223,7 +222,7 @@
223 </para> 222 </para>
224 <para> 223 <para>
225 IMPORTANT NOTE: Using this option with kgdb over the console 224 IMPORTANT NOTE: Using this option with kgdb over the console
226 (kgdboc) or kgdb over ethernet (kgdboe) is not supported. 225 (kgdboc) is not supported.
227 </para> 226 </para>
228 </sect1> 227 </sect1>
229 </chapter> 228 </chapter>
@@ -249,18 +248,11 @@
249 (gdb) target remote /dev/ttyS0 248 (gdb) target remote /dev/ttyS0
250 </programlisting> 249 </programlisting>
251 <para> 250 <para>
252 Example (kgdb to a terminal server): 251 Example (kgdb to a terminal server on tcp port 2012):
253 </para> 252 </para>
254 <programlisting> 253 <programlisting>
255 % gdb ./vmlinux 254 % gdb ./vmlinux
256 (gdb) target remote udp:192.168.2.2:6443 255 (gdb) target remote 192.168.2.2:2012
257 </programlisting>
258 <para>
259 Example (kgdb over ethernet):
260 </para>
261 <programlisting>
262 % gdb ./vmlinux
263 (gdb) target remote udp:192.168.2.2:6443
264 </programlisting> 256 </programlisting>
265 <para> 257 <para>
266 Once connected, you can debug a kernel the way you would debug an 258 Once connected, you can debug a kernel the way you would debug an
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 0291ade44c17..619e8caf30db 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -377,7 +377,7 @@ Bug Reporting
377bugzilla.kernel.org is where the Linux kernel developers track kernel 377bugzilla.kernel.org is where the Linux kernel developers track kernel
378bugs. Users are encouraged to report all bugs that they find in this 378bugs. Users are encouraged to report all bugs that they find in this
379tool. For details on how to use the kernel bugzilla, please see: 379tool. For details on how to use the kernel bugzilla, please see:
380 http://test.kernel.org/bugzilla/faq.html 380 http://bugzilla.kernel.org/page.cgi?id=faq.html
381 381
382The file REPORTING-BUGS in the main kernel source directory has a good 382The file REPORTING-BUGS in the main kernel source directory has a good
383template for how to report a possible kernel bug, and details what kind 383template for how to report a possible kernel bug, and details what kind
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 9c93a03ea33b..118ca6e9404f 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -327,6 +327,52 @@ Some people also put extra tags at the end. They'll just be ignored for
327now, but you can do this to mark internal company procedures or just 327now, but you can do this to mark internal company procedures or just
328point out some special detail about the sign-off. 328point out some special detail about the sign-off.
329 329
330If you are a subsystem or branch maintainer, sometimes you need to slightly
331modify patches you receive in order to merge them, because the code is not
332exactly the same in your tree and the submitters'. If you stick strictly to
333rule (c), you should ask the submitter to rediff, but this is a totally
334counter-productive waste of time and energy. Rule (b) allows you to adjust
335the code, but then it is very impolite to change one submitter's code and
336make him endorse your bugs. To solve this problem, it is recommended that
337you add a line between the last Signed-off-by header and yours, indicating
338the nature of your changes. While there is nothing mandatory about this, it
339seems like prepending the description with your mail and/or name, all
340enclosed in square brackets, is noticeable enough to make it obvious that
341you are responsible for last-minute changes. Example :
342
343 Signed-off-by: Random J Developer <random@developer.example.org>
344 [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
345 Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>
346
347This practise is particularly helpful if you maintain a stable branch and
348want at the same time to credit the author, track changes, merge the fix,
349and protect the submitter from complaints. Note that under no circumstances
350can you change the author's identity (the From header), as it is the one
351which appears in the changelog.
352
353Special note to back-porters: It seems to be a common and useful practise
354to insert an indication of the origin of a patch at the top of the commit
355message (just after the subject line) to facilitate tracking. For instance,
356here's what we see in 2.6-stable :
357
358 Date: Tue May 13 19:10:30 2008 +0000
359
360 SCSI: libiscsi regression in 2.6.25: fix nop timer handling
361
362 commit 4cf1043593db6a337f10e006c23c69e5fc93e722 upstream
363
364And here's what appears in 2.4 :
365
366 Date: Tue May 13 22:12:27 2008 +0200
367
368 wireless, airo: waitbusy() won't delay
369
370 [backport of 2.6 commit b7acbdfbd1f277c1eb23f344f899cfa4cd0bf36a]
371
372Whatever the format, this information provides a valuable help to people
373tracking your trees, and to people trying to trouble-shoot bugs in your
374tree.
375
330 376
33113) When to use Acked-by: and Cc: 37713) When to use Acked-by: and Cc:
332 378
diff --git a/Documentation/accounting/taskstats-struct.txt b/Documentation/accounting/taskstats-struct.txt
index 8aa7529f8258..cd784f46bf8a 100644
--- a/Documentation/accounting/taskstats-struct.txt
+++ b/Documentation/accounting/taskstats-struct.txt
@@ -24,6 +24,8 @@ There are three different groups of fields in the struct taskstats:
24 24
254) Per-task and per-thread context switch count statistics 254) Per-task and per-thread context switch count statistics
26 26
275) Time accounting for SMT machines
28
27Future extension should add fields to the end of the taskstats struct, and 29Future extension should add fields to the end of the taskstats struct, and
28should not change the relative position of each field within the struct. 30should not change the relative position of each field within the struct.
29 31
@@ -164,4 +166,8 @@ struct taskstats {
164 __u64 nvcsw; /* Context voluntary switch counter */ 166 __u64 nvcsw; /* Context voluntary switch counter */
165 __u64 nivcsw; /* Context involuntary switch counter */ 167 __u64 nivcsw; /* Context involuntary switch counter */
166 168
1695) Time accounting for SMT machines
170 __u64 ac_utimescaled; /* utime scaled on frequency etc */
171 __u64 ac_stimescaled; /* stime scaled on frequency etc */
172 __u64 cpu_scaled_run_real_total; /* scaled cpu_run_real_total */
167} 173}
diff --git a/Documentation/auxdisplay/cfag12864b b/Documentation/auxdisplay/cfag12864b
index b714183d4125..eb7be393a510 100644
--- a/Documentation/auxdisplay/cfag12864b
+++ b/Documentation/auxdisplay/cfag12864b
@@ -3,7 +3,7 @@
3 =================================== 3 ===================================
4 4
5License: GPLv2 5License: GPLv2
6Author & Maintainer: Miguel Ojeda Sandonis <maxextreme@gmail.com> 6Author & Maintainer: Miguel Ojeda Sandonis
7Date: 2006-10-27 7Date: 2006-10-27
8 8
9 9
@@ -22,7 +22,7 @@ Date: 2006-10-27
221. DRIVER INFORMATION 221. DRIVER INFORMATION
23--------------------- 23---------------------
24 24
25This driver support one cfag12864b display at time. 25This driver supports a cfag12864b LCD.
26 26
27 27
28--------------------- 28---------------------
diff --git a/Documentation/auxdisplay/cfag12864b-example.c b/Documentation/auxdisplay/cfag12864b-example.c
index 7bfac354d4c9..2caeea5e4993 100644
--- a/Documentation/auxdisplay/cfag12864b-example.c
+++ b/Documentation/auxdisplay/cfag12864b-example.c
@@ -4,7 +4,7 @@
4 * Description: cfag12864b LCD userspace example program 4 * Description: cfag12864b LCD userspace example program
5 * License: GPLv2 5 * License: GPLv2
6 * 6 *
7 * Author: Copyright (C) Miguel Ojeda Sandonis <maxextreme@gmail.com> 7 * Author: Copyright (C) Miguel Ojeda Sandonis
8 * Date: 2006-10-31 8 * Date: 2006-10-31
9 * 9 *
10 * This program is free software; you can redistribute it and/or modify 10 * This program is free software; you can redistribute it and/or modify
diff --git a/Documentation/auxdisplay/ks0108 b/Documentation/auxdisplay/ks0108
index 92b03b60c613..8ddda0c8ceef 100644
--- a/Documentation/auxdisplay/ks0108
+++ b/Documentation/auxdisplay/ks0108
@@ -3,7 +3,7 @@
3 ========================================== 3 ==========================================
4 4
5License: GPLv2 5License: GPLv2
6Author & Maintainer: Miguel Ojeda Sandonis <maxextreme@gmail.com> 6Author & Maintainer: Miguel Ojeda Sandonis
7Date: 2006-10-27 7Date: 2006-10-27
8 8
9 9
@@ -21,7 +21,7 @@ Date: 2006-10-27
211. DRIVER INFORMATION 211. DRIVER INFORMATION
22--------------------- 22---------------------
23 23
24This driver support the ks0108 LCD controller. 24This driver supports the ks0108 LCD controller.
25 25
26 26
27--------------------- 27---------------------
diff --git a/Documentation/cciss.txt b/Documentation/cciss.txt
index e65736c6b8bc..63e59b8847c5 100644
--- a/Documentation/cciss.txt
+++ b/Documentation/cciss.txt
@@ -21,6 +21,11 @@ This driver is known to work with the following cards:
21 * SA E200 21 * SA E200
22 * SA E200i 22 * SA E200i
23 * SA E500 23 * SA E500
24 * SA P212
25 * SA P410
26 * SA P410i
27 * SA P411
28 * SA P812
24 29
25Detecting drive failures: 30Detecting drive failures:
26------------------------- 31-------------------------
diff --git a/Documentation/cgroups.txt b/Documentation/cgroups.txt
index 824fc0274471..d9014aa0eb68 100644
--- a/Documentation/cgroups.txt
+++ b/Documentation/cgroups.txt
@@ -390,6 +390,10 @@ If you have several tasks to attach, you have to do it one after another:
390 ... 390 ...
391# /bin/echo PIDn > tasks 391# /bin/echo PIDn > tasks
392 392
393You can attach the current shell task by echoing 0:
394
395# echo 0 > tasks
396
3933. Kernel API 3973. Kernel API
394============= 398=============
395 399
diff --git a/Documentation/controllers/devices.txt b/Documentation/controllers/devices.txt
index 4dcea42432c2..7cc6e6a60672 100644
--- a/Documentation/controllers/devices.txt
+++ b/Documentation/controllers/devices.txt
@@ -13,7 +13,7 @@ either an integer or * for all. Access is a composition of r
13The root device cgroup starts with rwm to 'all'. A child device 13The root device cgroup starts with rwm to 'all'. A child device
14cgroup gets a copy of the parent. Administrators can then remove 14cgroup gets a copy of the parent. Administrators can then remove
15devices from the whitelist or add new entries. A child cgroup can 15devices from the whitelist or add new entries. A child cgroup can
16never receive a device access which is denied its parent. However 16never receive a device access which is denied by its parent. However
17when a device access is removed from a parent it will not also be 17when a device access is removed from a parent it will not also be
18removed from the child(ren). 18removed from the child(ren).
19 19
@@ -29,7 +29,11 @@ allows cgroup 1 to read and mknod the device usually known as
29 29
30 echo a > /cgroups/1/devices.deny 30 echo a > /cgroups/1/devices.deny
31 31
32will remove the default 'a *:* mrw' entry. 32will remove the default 'a *:* rwm' entry. Doing
33
34 echo a > /cgroups/1/devices.allow
35
36will add the 'a *:* rwm' entry to the whitelist.
33 37
343. Security 383. Security
35 39
diff --git a/Documentation/cpu-freq/governors.txt b/Documentation/cpu-freq/governors.txt
index 6a9c55bd556b..dcec0564d040 100644
--- a/Documentation/cpu-freq/governors.txt
+++ b/Documentation/cpu-freq/governors.txt
@@ -129,14 +129,6 @@ to its default value of '80' it means that between the checking
129intervals the CPU needs to be on average more than 80% in use to then 129intervals the CPU needs to be on average more than 80% in use to then
130decide that the CPU frequency needs to be increased. 130decide that the CPU frequency needs to be increased.
131 131
132sampling_down_factor: this parameter controls the rate that the CPU
133makes a decision on when to decrease the frequency. When set to its
134default value of '5' it means that at 1/5 the sampling_rate the kernel
135makes a decision to lower the frequency. Five "lower rate" decisions
136have to be made in a row before the CPU frequency is actually lower.
137If set to '1' then the frequency decreases as quickly as it increases,
138if set to '2' it decreases at half the rate of the increase.
139
140ignore_nice_load: this parameter takes a value of '0' or '1'. When 132ignore_nice_load: this parameter takes a value of '0' or '1'. When
141set to '0' (its default), all processes are counted towards the 133set to '0' (its default), all processes are counted towards the
142'cpu utilisation' value. When set to '1', the processes that are 134'cpu utilisation' value. When set to '1', the processes that are
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index fb7b361e6eea..1f5a924d1e56 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -154,13 +154,15 @@ browsing and modifying the cpusets presently known to the kernel. No
154new system calls are added for cpusets - all support for querying and 154new system calls are added for cpusets - all support for querying and
155modifying cpusets is via this cpuset file system. 155modifying cpusets is via this cpuset file system.
156 156
157The /proc/<pid>/status file for each task has two added lines, 157The /proc/<pid>/status file for each task has four added lines,
158displaying the tasks cpus_allowed (on which CPUs it may be scheduled) 158displaying the tasks cpus_allowed (on which CPUs it may be scheduled)
159and mems_allowed (on which Memory Nodes it may obtain memory), 159and mems_allowed (on which Memory Nodes it may obtain memory),
160in the format seen in the following example: 160in the two formats seen in the following example:
161 161
162 Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff 162 Cpus_allowed: ffffffff,ffffffff,ffffffff,ffffffff
163 Cpus_allowed_list: 0-127
163 Mems_allowed: ffffffff,ffffffff 164 Mems_allowed: ffffffff,ffffffff
165 Mems_allowed_list: 0-63
164 166
165Each cpuset is represented by a directory in the cgroup file system 167Each cpuset is represented by a directory in the cgroup file system
166containing (on top of the standard cgroup files) the following 168containing (on top of the standard cgroup files) the following
@@ -199,7 +201,7 @@ using the sched_setaffinity, mbind and set_mempolicy system calls.
199The following rules apply to each cpuset: 201The following rules apply to each cpuset:
200 202
201 - Its CPUs and Memory Nodes must be a subset of its parents. 203 - Its CPUs and Memory Nodes must be a subset of its parents.
202 - It can only be marked exclusive if its parent is. 204 - It can't be marked exclusive unless its parent is.
203 - If its cpu or memory is exclusive, they may not overlap any sibling. 205 - If its cpu or memory is exclusive, they may not overlap any sibling.
204 206
205These rules, and the natural hierarchy of cpusets, enable efficient 207These rules, and the natural hierarchy of cpusets, enable efficient
@@ -345,7 +347,7 @@ is modified to perform an inline check for this PF_SPREAD_PAGE task
345flag, and if set, a call to a new routine cpuset_mem_spread_node() 347flag, and if set, a call to a new routine cpuset_mem_spread_node()
346returns the node to prefer for the allocation. 348returns the node to prefer for the allocation.
347 349
348Similarly, setting 'memory_spread_cache' turns on the flag 350Similarly, setting 'memory_spread_slab' turns on the flag
349PF_SPREAD_SLAB, and appropriately marked slab caches will allocate 351PF_SPREAD_SLAB, and appropriately marked slab caches will allocate
350pages from the node returned by cpuset_mem_spread_node(). 352pages from the node returned by cpuset_mem_spread_node().
351 353
@@ -542,7 +544,10 @@ otherwise initial value -1 that indicates the cpuset has no request.
542 2 : search cores in a package. 544 2 : search cores in a package.
543 3 : search cpus in a node [= system wide on non-NUMA system] 545 3 : search cpus in a node [= system wide on non-NUMA system]
544 ( 4 : search nodes in a chunk of node [on NUMA system] ) 546 ( 4 : search nodes in a chunk of node [on NUMA system] )
545 ( 5~ : search system wide [on NUMA system]) 547 ( 5 : search system wide [on NUMA system] )
548
549The system default is architecture dependent. The system default
550can be changed using the relax_domain_level= boot parameter.
546 551
547This file is per-cpuset and affect the sched domain where the cpuset 552This file is per-cpuset and affect the sched domain where the cpuset
548belongs to. Therefore if the flag 'sched_load_balance' of a cpuset 553belongs to. Therefore if the flag 'sched_load_balance' of a cpuset
@@ -709,7 +714,10 @@ Now you want to do something with this cpuset.
709 714
710In this directory you can find several files: 715In this directory you can find several files:
711# ls 716# ls
712cpus cpu_exclusive mems mem_exclusive mem_hardwall tasks 717cpu_exclusive memory_migrate mems tasks
718cpus memory_pressure notify_on_release
719mem_exclusive memory_spread_page sched_load_balance
720mem_hardwall memory_spread_slab sched_relax_domain_level
713 721
714Reading them will give you information about the state of this cpuset: 722Reading them will give you information about the state of this cpuset:
715the CPUs and Memory Nodes it can use, the processes that are using 723the CPUs and Memory Nodes it can use, the processes that are using
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 5b3f31faed56..46ece3fba6f9 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -312,3 +312,12 @@ When: 2.6.26
312Why: Implementation became generic; users should now include 312Why: Implementation became generic; users should now include
313 linux/semaphore.h instead. 313 linux/semaphore.h instead.
314Who: Matthew Wilcox <willy@linux.intel.com> 314Who: Matthew Wilcox <willy@linux.intel.com>
315
316---------------------------
317
318What: CONFIG_THERMAL_HWMON
319When: January 2009
320Why: This option was introduced just to allow older lm-sensors userspace
321 to keep working over the upgrade to 2.6.26. At the scheduled time of
322 removal fixed lm-sensors (2.x or 3.x) should be readily available.
323Who: Rene Herman <rene.herman@gmail.com>
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 560f88dc7090..0c5086db8352 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -139,8 +139,16 @@ commit=nrsec (*) Ext4 can be told to sync all its data and metadata
139 Setting it to very large values will improve 139 Setting it to very large values will improve
140 performance. 140 performance.
141 141
142barrier=1 This enables/disables barriers. barrier=0 disables 142barrier=<0|1(*)> This enables/disables the use of write barriers in
143 it, barrier=1 enables it. 143 the jbd code. barrier=0 disables, barrier=1 enables.
144 This also requires an IO stack which can support
145 barriers, and if jbd gets an error on a barrier
146 write, it will disable again with a warning.
147 Write barriers enforce proper on-disk ordering
148 of journal commits, making volatile disk write caches
149 safe to use, at some performance penalty. If
150 your disks are battery-backed in one way or another,
151 disabling barriers may safely improve performance.
144 152
145orlov (*) This enables the new Orlov block allocator. It is 153orlov (*) This enables the new Orlov block allocator. It is
146 enabled by default. 154 enabled by default.
diff --git a/Documentation/filesystems/sysfs-pci.txt b/Documentation/filesystems/sysfs-pci.txt
index 5daa2aaec2c5..68ef48839c04 100644
--- a/Documentation/filesystems/sysfs-pci.txt
+++ b/Documentation/filesystems/sysfs-pci.txt
@@ -36,6 +36,7 @@ files, each with their own function.
36 local_cpus nearby CPU mask (cpumask, ro) 36 local_cpus nearby CPU mask (cpumask, ro)
37 resource PCI resource host addresses (ascii, ro) 37 resource PCI resource host addresses (ascii, ro)
38 resource0..N PCI resource N, if present (binary, mmap) 38 resource0..N PCI resource N, if present (binary, mmap)
39 resource0_wc..N_wc PCI WC map resource N, if prefetchable (binary, mmap)
39 rom PCI ROM resource, if present (binary, ro) 40 rom PCI ROM resource, if present (binary, ro)
40 subsystem_device PCI subsystem device (ascii, ro) 41 subsystem_device PCI subsystem device (ascii, ro)
41 subsystem_vendor PCI subsystem vendor (ascii, ro) 42 subsystem_vendor PCI subsystem vendor (ascii, ro)
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
new file mode 100644
index 000000000000..13e4bf054c38
--- /dev/null
+++ b/Documentation/ftrace.txt
@@ -0,0 +1,1353 @@
1 ftrace - Function Tracer
2 ========================
3
4Copyright 2008 Red Hat Inc.
5Author: Steven Rostedt <srostedt@redhat.com>
6
7
8Introduction
9------------
10
11Ftrace is an internal tracer designed to help out developers and
12designers of systems to find what is going on inside the kernel.
13It can be used for debugging or analyzing latencies and performance
14issues that take place outside of user-space.
15
16Although ftrace is the function tracer, it also includes an
17infrastructure that allows for other types of tracing. Some of the
18tracers that are currently in ftrace is a tracer to trace
19context switches, the time it takes for a high priority task to
20run after it was woken up, the time interrupts are disabled, and
21more.
22
23
24The File System
25---------------
26
27Ftrace uses the debugfs file system to hold the control files as well
28as the files to display output.
29
30To mount the debugfs system:
31
32 # mkdir /debug
33 # mount -t debugfs nodev /debug
34
35
36That's it! (assuming that you have ftrace configured into your kernel)
37
38After mounting the debugfs, you can see a directory called
39"tracing". This directory contains the control and output files
40of ftrace. Here is a list of some of the key files:
41
42
43 Note: all time values are in microseconds.
44
45 current_tracer : This is used to set or display the current tracer
46 that is configured.
47
48 available_tracers : This holds the different types of tracers that
49 has been compiled into the kernel. The tracers
50 listed here can be configured by echoing in their
51 name into current_tracer.
52
53 tracing_enabled : This sets or displays whether the current_tracer
54 is activated and tracing or not. Echo 0 into this
55 file to disable the tracer or 1 (or non-zero) to
56 enable it.
57
58 trace : This file holds the output of the trace in a human readable
59 format.
60
61 latency_trace : This file shows the same trace but the information
62 is organized more to display possible latencies
63 in the system.
64
65 trace_pipe : The output is the same as the "trace" file but this
66 file is meant to be streamed with live tracing.
67 Reads from this file will block until new data
68 is retrieved. Unlike the "trace" and "latency_trace"
69 files, this file is a consumer. This means reading
70 from this file causes sequential reads to display
71 more current data. Once data is read from this
72 file, it is consumed, and will not be read
73 again with a sequential read. The "trace" and
74 "latency_trace" files are static, and if the
75 tracer isn't adding more data, they will display
76 the same information every time they are read.
77
78 iter_ctrl : This file lets the user control the amount of data
79 that is displayed in one of the above output
80 files.
81
82 trace_max_latency : Some of the tracers record the max latency.
83 For example, the time interrupts are disabled.
84 This time is saved in this file. The max trace
85 will also be stored, and displayed by either
86 "trace" or "latency_trace". A new max trace will
87 only be recorded if the latency is greater than
88 the value in this file. (in microseconds)
89
90 trace_entries : This sets or displays the number of trace
91 entries each CPU buffer can hold. The tracer buffers
92 are the same size for each CPU, so care must be
93 taken when modifying the trace_entries. The number
94 of actually entries will be the number given
95 times the number of possible CPUS. The buffers
96 are saved as individual pages, and the actual entries
97 will always be rounded up to entries per page.
98
99 This can only be updated when the current_tracer
100 is set to "none".
101
102 NOTE: It is planned on changing the allocated buffers
103 from being the number of possible CPUS to
104 the number of online CPUS.
105
106 tracing_cpumask : This is a mask that lets the user only trace
107 on specified CPUS. The format is a hex string
108 representing the CPUS.
109
110 set_ftrace_filter : When dynamic ftrace is configured in, the
111 code is dynamically modified to disable calling
112 of the function profiler (mcount). This lets
113 tracing be configured in with practically no overhead
114 in performance. This also has a side effect of
115 enabling or disabling specific functions to be
116 traced. Echoing in names of functions into this
117 file will limit the trace to only those files.
118
119 set_ftrace_notrace: This has the opposite effect that
120 set_ftrace_filter has. Any function that is added
121 here will not be traced. If a function exists
122 in both set_ftrace_filter and set_ftrace_notrace
123 the function will _not_ bet traced.
124
125 available_filter_functions : When a function is encountered the first
126 time by the dynamic tracer, it is recorded and
127 later the call is converted into a nop. This file
128 lists the functions that have been recorded
129 by the dynamic tracer and these functions can
130 be used to set the ftrace filter by the above
131 "set_ftrace_filter" file.
132
133
134The Tracers
135-----------
136
137Here are the list of current tracers that can be configured.
138
139 ftrace - function tracer that uses mcount to trace all functions.
140 It is possible to filter out which functions that are
141 traced when dynamic ftrace is configured in.
142
143 sched_switch - traces the context switches between tasks.
144
145 irqsoff - traces the areas that disable interrupts and saves off
146 the trace with the longest max latency.
147 See tracing_max_latency. When a new max is recorded,
148 it replaces the old trace. It is best to view this
149 trace with the latency_trace file.
150
151 preemptoff - Similar to irqsoff but traces and records the time
152 preemption is disabled.
153
154 preemptirqsoff - Similar to irqsoff and preemptoff, but traces and
155 records the largest time irqs and/or preemption is
156 disabled.
157
158 wakeup - Traces and records the max latency that it takes for
159 the highest priority task to get scheduled after
160 it has been woken up.
161
162 none - This is not a tracer. To remove all tracers from tracing
163 simply echo "none" into current_tracer.
164
165
166Examples of using the tracer
167----------------------------
168
169Here are typical examples of using the tracers with only controlling
170them with the debugfs interface (without using any user-land utilities).
171
172Output format:
173--------------
174
175Here's an example of the output format of the file "trace"
176
177 --------
178# tracer: ftrace
179#
180# TASK-PID CPU# TIMESTAMP FUNCTION
181# | | | | |
182 bash-4251 [01] 10152.583854: path_put <-path_walk
183 bash-4251 [01] 10152.583855: dput <-path_put
184 bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput
185 --------
186
187A header is printed with the trace that is represented. In this case
188the tracer is "ftrace". Then a header showing the format. Task name
189"bash", the task PID "4251", the CPU that it was running on
190"01", the timestamp in <secs>.<usecs> format, the function name that was
191traced "path_put" and the parent function that called this function
192"path_walk".
193
194The sched_switch tracer also includes tracing of task wake ups and
195context switches.
196
197 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
198 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S
199 ksoftirqd/1-7 [01] 1453.070013: 7:115:R ==> 10:115:R
200 events/1-10 [01] 1453.070013: 10:115:S ==> 2916:115:R
201 kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
202 ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R
203
204Wake ups are represented by a "+" and the context switches show
205"==>". The format is:
206
207 Context switches:
208
209 Previous task Next Task
210
211 <pid>:<prio>:<state> ==> <pid>:<prio>:<state>
212
213 Wake ups:
214
215 Current task Task waking up
216
217 <pid>:<prio>:<state> + <pid>:<prio>:<state>
218
219The prio is the internal kernel priority, which is inverse to the
220priority that is usually displayed by user-space tools. Zero represents
221the highest priority (99). Prio 100 starts the "nice" priorities with
222100 being equal to nice -20 and 139 being nice 19. The prio "140" is
223reserved for the idle task which is the lowest priority thread (pid 0).
224
225
226Latency trace format
227--------------------
228
229For traces that display latency times, the latency_trace file gives
230a bit more information to see why a latency happened. Here's a typical
231trace.
232
233# tracer: irqsoff
234#
235irqsoff latency trace v1.1.5 on 2.6.26-rc8
236--------------------------------------------------------------------
237 latency: 97 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
238 -----------------
239 | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
240 -----------------
241 => started at: apic_timer_interrupt
242 => ended at: do_softirq
243
244# _------=> CPU#
245# / _-----=> irqs-off
246# | / _----=> need-resched
247# || / _---=> hardirq/softirq
248# ||| / _--=> preempt-depth
249# |||| /
250# ||||| delay
251# cmd pid ||||| time | caller
252# \ / ||||| \ | /
253 <idle>-0 0d..1 0us+: trace_hardirqs_off_thunk (apic_timer_interrupt)
254 <idle>-0 0d.s. 97us : __do_softirq (do_softirq)
255 <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq)
256
257
258vim:ft=help
259
260
261This shows that the current tracer is "irqsoff" tracing the time
262interrupts are disabled. It gives the trace version and the kernel
263this was executed on (2.6.26-rc8). Then it displays the max latency
264in microsecs (97 us). The number of trace entries displayed
265by the total number recorded (both are three: #3/3). The type of
266preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero
267and reserved for later use. #P is the number of online CPUS (#P:2).
268
269The task is the process that was running when the latency happened.
270(swapper pid: 0).
271
272The start and stop that caused the latencies:
273
274 apic_timer_interrupt is where the interrupts were disabled.
275 do_softirq is where they were enabled again.
276
277The next lines after the header are the trace itself. The header
278explains which is which.
279
280 cmd: The name of the process in the trace.
281
282 pid: The PID of that process.
283
284 CPU#: The CPU that the process was running on.
285
286 irqs-off: 'd' interrupts are disabled. '.' otherwise.
287
288 need-resched: 'N' task need_resched is set, '.' otherwise.
289
290 hardirq/softirq:
291 'H' - hard irq happened inside a softirq.
292 'h' - hard irq is running
293 's' - soft irq is running
294 '.' - normal context.
295
296 preempt-depth: The level of preempt_disabled
297
298The above is mostly meaningful for kernel developers.
299
300 time: This differs from the trace output where as the trace output
301 contained a absolute timestamp. This timestamp is relative
302 to the start of the first entry in the the trace.
303
304 delay: This is just to help catch your eye a bit better. And
305 needs to be fixed to be only relative to the same CPU.
306 The marks is determined by the difference between this
307 current trace and the next trace.
308 '!' - greater than preempt_mark_thresh (default 100)
309 '+' - greater than 1 microsecond
310 ' ' - less than or equal to 1 microsecond.
311
312 The rest is the same as the 'trace' file.
313
314
315iter_ctrl
316---------
317
318The iter_ctrl file is used to control what gets printed in the trace
319output. To see what is available, simply cat the file:
320
321 cat /debug/tracing/iter_ctrl
322 print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
323 noblock nostacktrace nosched-tree
324
325To disable one of the options, echo in the option appended with "no".
326
327 echo noprint-parent > /debug/tracing/iter_ctrl
328
329To enable an option, leave off the "no".
330
331 echo sym-offest > /debug/tracing/iter_ctrl
332
333Here are the available options:
334
335 print-parent - On function traces, display the calling function
336 as well as the function being traced.
337
338 print-parent:
339 bash-4000 [01] 1477.606694: simple_strtoul <-strict_strtoul
340
341 noprint-parent:
342 bash-4000 [01] 1477.606694: simple_strtoul
343
344
345 sym-offset - Display not only the function name, but also the offset
346 in the function. For example, instead of seeing just
347 "ktime_get" you will see "ktime_get+0xb/0x20"
348
349 sym-offset:
350 bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
351
352 sym-addr - this will also display the function address as well as
353 the function name.
354
355 sym-addr:
356 bash-4000 [01] 1477.606694: simple_strtoul <c0339346>
357
358 verbose - This deals with the latency_trace file.
359
360 bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
361 (+0.000ms): simple_strtoul (strict_strtoul)
362
363 raw - This will display raw numbers. This option is best for use with
364 user applications that can translate the raw numbers better than
365 having it done in the kernel.
366
367 hex - similar to raw, but the numbers will be in a hexadecimal format.
368
369 bin - This will print out the formats in raw binary.
370
371 block - TBD (needs update)
372
373 stacktrace - This is one of the options that changes the trace itself.
374 When a trace is recorded, so is the stack of functions.
375 This allows for back traces of trace sites.
376
377 sched-tree - TBD (any users??)
378
379
380sched_switch
381------------
382
383This tracer simply records schedule switches. Here's an example
384on how to implement it.
385
386 # echo sched_switch > /debug/tracing/current_tracer
387 # echo 1 > /debug/tracing/tracing_enabled
388 # sleep 1
389 # echo 0 > /debug/tracing/tracing_enabled
390 # cat /debug/tracing/trace
391
392# tracer: sched_switch
393#
394# TASK-PID CPU# TIMESTAMP FUNCTION
395# | | | | |
396 bash-3997 [01] 240.132281: 3997:120:R + 4055:120:R
397 bash-3997 [01] 240.132284: 3997:120:R ==> 4055:120:R
398 sleep-4055 [01] 240.132371: 4055:120:S ==> 3997:120:R
399 bash-3997 [01] 240.132454: 3997:120:R + 4055:120:S
400 bash-3997 [01] 240.132457: 3997:120:R ==> 4055:120:R
401 sleep-4055 [01] 240.132460: 4055:120:D ==> 3997:120:R
402 bash-3997 [01] 240.132463: 3997:120:R + 4055:120:D
403 bash-3997 [01] 240.132465: 3997:120:R ==> 4055:120:R
404 <idle>-0 [00] 240.132589: 0:140:R + 4:115:S
405 <idle>-0 [00] 240.132591: 0:140:R ==> 4:115:R
406 ksoftirqd/0-4 [00] 240.132595: 4:115:S ==> 0:140:R
407 <idle>-0 [00] 240.132598: 0:140:R + 4:115:S
408 <idle>-0 [00] 240.132599: 0:140:R ==> 4:115:R
409 ksoftirqd/0-4 [00] 240.132603: 4:115:S ==> 0:140:R
410 sleep-4055 [01] 240.133058: 4055:120:S ==> 3997:120:R
411 [...]
412
413
414As we have discussed previously about this format, the header shows
415the name of the trace and points to the options. The "FUNCTION"
416is a misnomer since here it represents the wake ups and context
417switches.
418
419The sched_switch only lists the wake ups (represented with '+')
420and context switches ('==>') with the previous task or current
421first followed by the next task or task waking up. The format for both
422of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO
423is the inverse of the actual priority with zero (0) being the highest
424priority and the nice values starting at 100 (nice -20). Below is
425a quick chart to map the kernel priority to user land priorities.
426
427 Kernel priority: 0 to 99 ==> user RT priority 99 to 0
428 Kernel priority: 100 to 139 ==> user nice -20 to 19
429 Kernel priority: 140 ==> idle task priority
430
431The task states are:
432
433 R - running : wants to run, may not actually be running
434 S - sleep : process is waiting to be woken up (handles signals)
435 D - deep sleep : process must be woken up (ignores signals)
436 T - stopped : process suspended
437 t - traced : process is being traced (with something like gdb)
438 Z - zombie : process waiting to be cleaned up
439 X - unknown
440
441
442ftrace_enabled
443--------------
444
445The following tracers give different output depending on whether
446or not the sysctl ftrace_enabled is set. To set ftrace_enabled,
447one can either use the sysctl function or set it via the proc
448file system interface.
449
450 sysctl kernel.ftrace_enabled=1
451
452 or
453
454 echo 1 > /proc/sys/kernel/ftrace_enabled
455
456To disable ftrace_enabled simply replace the '1' with '0' in
457the above commands.
458
459When ftrace_enabled is set the tracers will also record the functions
460that are within the trace. The descriptions of the tracers
461will also show an example with ftrace enabled.
462
463
464irqsoff
465-------
466
467When interrupts are disabled, the CPU can not react to any other
468external event (besides NMIs and SMIs). This prevents the timer
469interrupt from triggering or the mouse interrupt from letting the
470kernel know of a new mouse event. The result is a latency with the
471reaction time.
472
473The irqsoff tracer tracks the time interrupts are disabled and when
474they are re-enabled. When a new maximum latency is hit, it saves off
475the trace so that it may be retrieved at a later time. Every time a
476new maximum in reached, the old saved trace is discarded and the new
477trace is saved.
478
479To reset the maximum, echo 0 into tracing_max_latency. Here's an
480example:
481
482 # echo irqsoff > /debug/tracing/current_tracer
483 # echo 0 > /debug/tracing/tracing_max_latency
484 # echo 1 > /debug/tracing/tracing_enabled
485 # ls -ltr
486 [...]
487 # echo 0 > /debug/tracing/tracing_enabled
488 # cat /debug/tracing/latency_trace
489# tracer: irqsoff
490#
491irqsoff latency trace v1.1.5 on 2.6.26-rc8
492--------------------------------------------------------------------
493 latency: 6 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
494 -----------------
495 | task: bash-4269 (uid:0 nice:0 policy:0 rt_prio:0)
496 -----------------
497 => started at: copy_page_range
498 => ended at: copy_page_range
499
500# _------=> CPU#
501# / _-----=> irqs-off
502# | / _----=> need-resched
503# || / _---=> hardirq/softirq
504# ||| / _--=> preempt-depth
505# |||| /
506# ||||| delay
507# cmd pid ||||| time | caller
508# \ / ||||| \ | /
509 bash-4269 1...1 0us+: _spin_lock (copy_page_range)
510 bash-4269 1...1 7us : _spin_unlock (copy_page_range)
511 bash-4269 1...2 7us : trace_preempt_on (copy_page_range)
512
513
514vim:ft=help
515
516Here we see that that we had a latency of 6 microsecs (which is
517very good). The spin_lock in copy_page_range disabled interrupts.
518The difference between the 6 and the displayed timestamp 7us is
519because the clock must have incremented between the time of recording
520the max latency and recording the function that had that latency.
521
522Note the above had ftrace_enabled not set. If we set the ftrace_enabled
523we get a much larger output:
524
525# tracer: irqsoff
526#
527irqsoff latency trace v1.1.5 on 2.6.26-rc8
528--------------------------------------------------------------------
529 latency: 50 us, #101/101, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
530 -----------------
531 | task: ls-4339 (uid:0 nice:0 policy:0 rt_prio:0)
532 -----------------
533 => started at: __alloc_pages_internal
534 => ended at: __alloc_pages_internal
535
536# _------=> CPU#
537# / _-----=> irqs-off
538# | / _----=> need-resched
539# || / _---=> hardirq/softirq
540# ||| / _--=> preempt-depth
541# |||| /
542# ||||| delay
543# cmd pid ||||| time | caller
544# \ / ||||| \ | /
545 ls-4339 0...1 0us+: get_page_from_freelist (__alloc_pages_internal)
546 ls-4339 0d..1 3us : rmqueue_bulk (get_page_from_freelist)
547 ls-4339 0d..1 3us : _spin_lock (rmqueue_bulk)
548 ls-4339 0d..1 4us : add_preempt_count (_spin_lock)
549 ls-4339 0d..2 4us : __rmqueue (rmqueue_bulk)
550 ls-4339 0d..2 5us : __rmqueue_smallest (__rmqueue)
551 ls-4339 0d..2 5us : __mod_zone_page_state (__rmqueue_smallest)
552 ls-4339 0d..2 6us : __rmqueue (rmqueue_bulk)
553 ls-4339 0d..2 6us : __rmqueue_smallest (__rmqueue)
554 ls-4339 0d..2 7us : __mod_zone_page_state (__rmqueue_smallest)
555 ls-4339 0d..2 7us : __rmqueue (rmqueue_bulk)
556 ls-4339 0d..2 8us : __rmqueue_smallest (__rmqueue)
557[...]
558 ls-4339 0d..2 46us : __rmqueue_smallest (__rmqueue)
559 ls-4339 0d..2 47us : __mod_zone_page_state (__rmqueue_smallest)
560 ls-4339 0d..2 47us : __rmqueue (rmqueue_bulk)
561 ls-4339 0d..2 48us : __rmqueue_smallest (__rmqueue)
562 ls-4339 0d..2 48us : __mod_zone_page_state (__rmqueue_smallest)
563 ls-4339 0d..2 49us : _spin_unlock (rmqueue_bulk)
564 ls-4339 0d..2 49us : sub_preempt_count (_spin_unlock)
565 ls-4339 0d..1 50us : get_page_from_freelist (__alloc_pages_internal)
566 ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal)
567
568
569vim:ft=help
570
571
572Here we traced a 50 microsecond latency. But we also see all the
573functions that were called during that time. Note that enabling
574function tracing we endure an added overhead. This overhead may
575extend the latency times. But never the less, this trace has provided
576some very helpful debugging.
577
578
579preemptoff
580----------
581
582When preemption is disabled we may be able to receive interrupts but
583the task can not be preempted and a higher priority task must wait
584for preemption to be enabled again before it can preempt a lower
585priority task.
586
587The preemptoff tracer traces the places that disables preemption.
588Like the irqsoff, it records the maximum latency that preemption
589was disabled. The control of preemptoff is much like the irqsoff.
590
591 # echo preemptoff > /debug/tracing/current_tracer
592 # echo 0 > /debug/tracing/tracing_max_latency
593 # echo 1 > /debug/tracing/tracing_enabled
594 # ls -ltr
595 [...]
596 # echo 0 > /debug/tracing/tracing_enabled
597 # cat /debug/tracing/latency_trace
598# tracer: preemptoff
599#
600preemptoff latency trace v1.1.5 on 2.6.26-rc8
601--------------------------------------------------------------------
602 latency: 29 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
603 -----------------
604 | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
605 -----------------
606 => started at: do_IRQ
607 => ended at: __do_softirq
608
609# _------=> CPU#
610# / _-----=> irqs-off
611# | / _----=> need-resched
612# || / _---=> hardirq/softirq
613# ||| / _--=> preempt-depth
614# |||| /
615# ||||| delay
616# cmd pid ||||| time | caller
617# \ / ||||| \ | /
618 sshd-4261 0d.h. 0us+: irq_enter (do_IRQ)
619 sshd-4261 0d.s. 29us : _local_bh_enable (__do_softirq)
620 sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq)
621
622
623vim:ft=help
624
625This has some more changes. Preemption was disabled when an interrupt
626came in (notice the 'h'), and was enabled while doing a softirq.
627(notice the 's'). But we also see that interrupts have been disabled
628when entering the preempt off section and leaving it (the 'd').
629We do not know if interrupts were enabled in the mean time.
630
631# tracer: preemptoff
632#
633preemptoff latency trace v1.1.5 on 2.6.26-rc8
634--------------------------------------------------------------------
635 latency: 63 us, #87/87, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
636 -----------------
637 | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
638 -----------------
639 => started at: remove_wait_queue
640 => ended at: __do_softirq
641
642# _------=> CPU#
643# / _-----=> irqs-off
644# | / _----=> need-resched
645# || / _---=> hardirq/softirq
646# ||| / _--=> preempt-depth
647# |||| /
648# ||||| delay
649# cmd pid ||||| time | caller
650# \ / ||||| \ | /
651 sshd-4261 0d..1 0us : _spin_lock_irqsave (remove_wait_queue)
652 sshd-4261 0d..1 1us : _spin_unlock_irqrestore (remove_wait_queue)
653 sshd-4261 0d..1 2us : do_IRQ (common_interrupt)
654 sshd-4261 0d..1 2us : irq_enter (do_IRQ)
655 sshd-4261 0d..1 2us : idle_cpu (irq_enter)
656 sshd-4261 0d..1 3us : add_preempt_count (irq_enter)
657 sshd-4261 0d.h1 3us : idle_cpu (irq_enter)
658 sshd-4261 0d.h. 4us : handle_fasteoi_irq (do_IRQ)
659[...]
660 sshd-4261 0d.h. 12us : add_preempt_count (_spin_lock)
661 sshd-4261 0d.h1 12us : ack_ioapic_quirk_irq (handle_fasteoi_irq)
662 sshd-4261 0d.h1 13us : move_native_irq (ack_ioapic_quirk_irq)
663 sshd-4261 0d.h1 13us : _spin_unlock (handle_fasteoi_irq)
664 sshd-4261 0d.h1 14us : sub_preempt_count (_spin_unlock)
665 sshd-4261 0d.h1 14us : irq_exit (do_IRQ)
666 sshd-4261 0d.h1 15us : sub_preempt_count (irq_exit)
667 sshd-4261 0d..2 15us : do_softirq (irq_exit)
668 sshd-4261 0d... 15us : __do_softirq (do_softirq)
669 sshd-4261 0d... 16us : __local_bh_disable (__do_softirq)
670 sshd-4261 0d... 16us+: add_preempt_count (__local_bh_disable)
671 sshd-4261 0d.s4 20us : add_preempt_count (__local_bh_disable)
672 sshd-4261 0d.s4 21us : sub_preempt_count (local_bh_enable)
673 sshd-4261 0d.s5 21us : sub_preempt_count (local_bh_enable)
674[...]
675 sshd-4261 0d.s6 41us : add_preempt_count (__local_bh_disable)
676 sshd-4261 0d.s6 42us : sub_preempt_count (local_bh_enable)
677 sshd-4261 0d.s7 42us : sub_preempt_count (local_bh_enable)
678 sshd-4261 0d.s5 43us : add_preempt_count (__local_bh_disable)
679 sshd-4261 0d.s5 43us : sub_preempt_count (local_bh_enable_ip)
680 sshd-4261 0d.s6 44us : sub_preempt_count (local_bh_enable_ip)
681 sshd-4261 0d.s5 44us : add_preempt_count (__local_bh_disable)
682 sshd-4261 0d.s5 45us : sub_preempt_count (local_bh_enable)
683[...]
684 sshd-4261 0d.s. 63us : _local_bh_enable (__do_softirq)
685 sshd-4261 0d.s1 64us : trace_preempt_on (__do_softirq)
686
687
688The above is an example of the preemptoff trace with ftrace_enabled
689set. Here we see that interrupts were disabled the entire time.
690The irq_enter code lets us know that we entered an interrupt 'h'.
691Before that, the functions being traced still show that it is not
692in an interrupt, but we can see by the functions themselves that
693this is not the case.
694
695Notice that the __do_softirq when called doesn't have a preempt_count.
696It may seem that we missed a preempt enabled. What really happened
697is that the preempt count is held on the threads stack and we
698switched to the softirq stack (4K stacks in effect). The code
699does not copy the preempt count, but because interrupts are disabled
700we don't need to worry about it. Having a tracer like this is good
701to let people know what really happens inside the kernel.
702
703
704preemptirqsoff
705--------------
706
707Knowing the locations that have interrupts disabled or preemption
708disabled for the longest times is helpful. But sometimes we would
709like to know when either preemption and/or interrupts are disabled.
710
711The following code:
712
713 local_irq_disable();
714 call_function_with_irqs_off();
715 preempt_disable();
716 call_function_with_irqs_and_preemption_off();
717 local_irq_enable();
718 call_function_with_preemption_off();
719 preempt_enable();
720
721The irqsoff tracer will record the total length of
722call_function_with_irqs_off() and
723call_function_with_irqs_and_preemption_off().
724
725The preemptoff tracer will record the total length of
726call_function_with_irqs_and_preemption_off() and
727call_function_with_preemption_off().
728
729But neither will trace the time that interrupts and/or preemption
730is disabled. This total time is the time that we can not schedule.
731To record this time, use the preemptirqsoff tracer.
732
733Again, using this trace is much like the irqsoff and preemptoff tracers.
734
735 # echo preemptoff > /debug/tracing/current_tracer
736 # echo 0 > /debug/tracing/tracing_max_latency
737 # echo 1 > /debug/tracing/tracing_enabled
738 # ls -ltr
739 [...]
740 # echo 0 > /debug/tracing/tracing_enabled
741 # cat /debug/tracing/latency_trace
742# tracer: preemptirqsoff
743#
744preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
745--------------------------------------------------------------------
746 latency: 293 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
747 -----------------
748 | task: ls-4860 (uid:0 nice:0 policy:0 rt_prio:0)
749 -----------------
750 => started at: apic_timer_interrupt
751 => ended at: __do_softirq
752
753# _------=> CPU#
754# / _-----=> irqs-off
755# | / _----=> need-resched
756# || / _---=> hardirq/softirq
757# ||| / _--=> preempt-depth
758# |||| /
759# ||||| delay
760# cmd pid ||||| time | caller
761# \ / ||||| \ | /
762 ls-4860 0d... 0us!: trace_hardirqs_off_thunk (apic_timer_interrupt)
763 ls-4860 0d.s. 294us : _local_bh_enable (__do_softirq)
764 ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq)
765
766
767vim:ft=help
768
769
770The trace_hardirqs_off_thunk is called from assembly on x86 when
771interrupts are disabled in the assembly code. Without the function
772tracing, we don't know if interrupts were enabled within the preemption
773points. We do see that it started with preemption enabled.
774
775Here is a trace with ftrace_enabled set:
776
777
778# tracer: preemptirqsoff
779#
780preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
781--------------------------------------------------------------------
782 latency: 105 us, #183/183, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
783 -----------------
784 | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
785 -----------------
786 => started at: write_chan
787 => ended at: __do_softirq
788
789# _------=> CPU#
790# / _-----=> irqs-off
791# | / _----=> need-resched
792# || / _---=> hardirq/softirq
793# ||| / _--=> preempt-depth
794# |||| /
795# ||||| delay
796# cmd pid ||||| time | caller
797# \ / ||||| \ | /
798 ls-4473 0.N.. 0us : preempt_schedule (write_chan)
799 ls-4473 0dN.1 1us : _spin_lock (schedule)
800 ls-4473 0dN.1 2us : add_preempt_count (_spin_lock)
801 ls-4473 0d..2 2us : put_prev_task_fair (schedule)
802[...]
803 ls-4473 0d..2 13us : set_normalized_timespec (ktime_get_ts)
804 ls-4473 0d..2 13us : __switch_to (schedule)
805 sshd-4261 0d..2 14us : finish_task_switch (schedule)
806 sshd-4261 0d..2 14us : _spin_unlock_irq (finish_task_switch)
807 sshd-4261 0d..1 15us : add_preempt_count (_spin_lock_irqsave)
808 sshd-4261 0d..2 16us : _spin_unlock_irqrestore (hrtick_set)
809 sshd-4261 0d..2 16us : do_IRQ (common_interrupt)
810 sshd-4261 0d..2 17us : irq_enter (do_IRQ)
811 sshd-4261 0d..2 17us : idle_cpu (irq_enter)
812 sshd-4261 0d..2 18us : add_preempt_count (irq_enter)
813 sshd-4261 0d.h2 18us : idle_cpu (irq_enter)
814 sshd-4261 0d.h. 18us : handle_fasteoi_irq (do_IRQ)
815 sshd-4261 0d.h. 19us : _spin_lock (handle_fasteoi_irq)
816 sshd-4261 0d.h. 19us : add_preempt_count (_spin_lock)
817 sshd-4261 0d.h1 20us : _spin_unlock (handle_fasteoi_irq)
818 sshd-4261 0d.h1 20us : sub_preempt_count (_spin_unlock)
819[...]
820 sshd-4261 0d.h1 28us : _spin_unlock (handle_fasteoi_irq)
821 sshd-4261 0d.h1 29us : sub_preempt_count (_spin_unlock)
822 sshd-4261 0d.h2 29us : irq_exit (do_IRQ)
823 sshd-4261 0d.h2 29us : sub_preempt_count (irq_exit)
824 sshd-4261 0d..3 30us : do_softirq (irq_exit)
825 sshd-4261 0d... 30us : __do_softirq (do_softirq)
826 sshd-4261 0d... 31us : __local_bh_disable (__do_softirq)
827 sshd-4261 0d... 31us+: add_preempt_count (__local_bh_disable)
828 sshd-4261 0d.s4 34us : add_preempt_count (__local_bh_disable)
829[...]
830 sshd-4261 0d.s3 43us : sub_preempt_count (local_bh_enable_ip)
831 sshd-4261 0d.s4 44us : sub_preempt_count (local_bh_enable_ip)
832 sshd-4261 0d.s3 44us : smp_apic_timer_interrupt (apic_timer_interrupt)
833 sshd-4261 0d.s3 45us : irq_enter (smp_apic_timer_interrupt)
834 sshd-4261 0d.s3 45us : idle_cpu (irq_enter)
835 sshd-4261 0d.s3 46us : add_preempt_count (irq_enter)
836 sshd-4261 0d.H3 46us : idle_cpu (irq_enter)
837 sshd-4261 0d.H3 47us : hrtimer_interrupt (smp_apic_timer_interrupt)
838 sshd-4261 0d.H3 47us : ktime_get (hrtimer_interrupt)
839[...]
840 sshd-4261 0d.H3 81us : tick_program_event (hrtimer_interrupt)
841 sshd-4261 0d.H3 82us : ktime_get (tick_program_event)
842 sshd-4261 0d.H3 82us : ktime_get_ts (ktime_get)
843 sshd-4261 0d.H3 83us : getnstimeofday (ktime_get_ts)
844 sshd-4261 0d.H3 83us : set_normalized_timespec (ktime_get_ts)
845 sshd-4261 0d.H3 84us : clockevents_program_event (tick_program_event)
846 sshd-4261 0d.H3 84us : lapic_next_event (clockevents_program_event)
847 sshd-4261 0d.H3 85us : irq_exit (smp_apic_timer_interrupt)
848 sshd-4261 0d.H3 85us : sub_preempt_count (irq_exit)
849 sshd-4261 0d.s4 86us : sub_preempt_count (irq_exit)
850 sshd-4261 0d.s3 86us : add_preempt_count (__local_bh_disable)
851[...]
852 sshd-4261 0d.s1 98us : sub_preempt_count (net_rx_action)
853 sshd-4261 0d.s. 99us : add_preempt_count (_spin_lock_irq)
854 sshd-4261 0d.s1 99us+: _spin_unlock_irq (run_timer_softirq)
855 sshd-4261 0d.s. 104us : _local_bh_enable (__do_softirq)
856 sshd-4261 0d.s. 104us : sub_preempt_count (_local_bh_enable)
857 sshd-4261 0d.s. 105us : _local_bh_enable (__do_softirq)
858 sshd-4261 0d.s1 105us : trace_preempt_on (__do_softirq)
859
860
861This is a very interesting trace. It started with the preemption of
862the ls task. We see that the task had the "need_resched" bit set
863with the 'N' in the trace. Interrupts are disabled in the spin_lock
864and the trace started. We see that a schedule took place to run
865sshd. When the interrupts were enabled we took an interrupt.
866On return of the interrupt the softirq ran. We took another interrupt
867while running the softirq as we see with the capital 'H'.
868
869
870wakeup
871------
872
873In Real-Time environment it is very important to know the wakeup
874time it takes for the highest priority task that wakes up to the
875time it executes. This is also known as "schedule latency".
876I stress the point that this is about RT tasks. It is also important
877to know the scheduling latency of non-RT tasks, but the average
878schedule latency is better for non-RT tasks. Tools like
879LatencyTop is more appropriate for such measurements.
880
881Real-Time environments is interested in the worst case latency.
882That is the longest latency it takes for something to happen, and
883not the average. We can have a very fast scheduler that may only
884have a large latency once in a while, but that would not work well
885with Real-Time tasks. The wakeup tracer was designed to record
886the worst case wakeups of RT tasks. Non-RT tasks are not recorded
887because the tracer only records one worst case and tracing non-RT
888tasks that are unpredictable will overwrite the worst case latency
889of RT tasks.
890
891Since this tracer only deals with RT tasks, we will run this slightly
892different than we did with the previous tracers. Instead of performing
893an 'ls' we will run 'sleep 1' under 'chrt' which changes the
894priority of the task.
895
896 # echo wakeup > /debug/tracing/current_tracer
897 # echo 0 > /debug/tracing/tracing_max_latency
898 # echo 1 > /debug/tracing/tracing_enabled
899 # chrt -f 5 sleep 1
900 # echo 0 > /debug/tracing/tracing_enabled
901 # cat /debug/tracing/latency_trace
902# tracer: wakeup
903#
904wakeup latency trace v1.1.5 on 2.6.26-rc8
905--------------------------------------------------------------------
906 latency: 4 us, #2/2, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
907 -----------------
908 | task: sleep-4901 (uid:0 nice:0 policy:1 rt_prio:5)
909 -----------------
910
911# _------=> CPU#
912# / _-----=> irqs-off
913# | / _----=> need-resched
914# || / _---=> hardirq/softirq
915# ||| / _--=> preempt-depth
916# |||| /
917# ||||| delay
918# cmd pid ||||| time | caller
919# \ / ||||| \ | /
920 <idle>-0 1d.h4 0us+: try_to_wake_up (wake_up_process)
921 <idle>-0 1d..4 4us : schedule (cpu_idle)
922
923
924vim:ft=help
925
926
927Running this on an idle system we see that it only took 4 microseconds
928to perform the task switch. Note, since the trace marker in the
929schedule is before the actual "switch" we stop the tracing when
930the recorded task is about to schedule in. This may change if
931we add a new marker at the end of the scheduler.
932
933Notice that the recorded task is 'sleep' with the PID of 4901 and it
934has an rt_prio of 5. This priority is user-space priority and not
935the internal kernel priority. The policy is 1 for SCHED_FIFO and 2
936for SCHED_RR.
937
938Doing the same with chrt -r 5 and ftrace_enabled set.
939
940# tracer: wakeup
941#
942wakeup latency trace v1.1.5 on 2.6.26-rc8
943--------------------------------------------------------------------
944 latency: 50 us, #60/60, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
945 -----------------
946 | task: sleep-4068 (uid:0 nice:0 policy:2 rt_prio:5)
947 -----------------
948
949# _------=> CPU#
950# / _-----=> irqs-off
951# | / _----=> need-resched
952# || / _---=> hardirq/softirq
953# ||| / _--=> preempt-depth
954# |||| /
955# ||||| delay
956# cmd pid ||||| time | caller
957# \ / ||||| \ | /
958ksoftirq-7 1d.H3 0us : try_to_wake_up (wake_up_process)
959ksoftirq-7 1d.H4 1us : sub_preempt_count (marker_probe_cb)
960ksoftirq-7 1d.H3 2us : check_preempt_wakeup (try_to_wake_up)
961ksoftirq-7 1d.H3 3us : update_curr (check_preempt_wakeup)
962ksoftirq-7 1d.H3 4us : calc_delta_mine (update_curr)
963ksoftirq-7 1d.H3 5us : __resched_task (check_preempt_wakeup)
964ksoftirq-7 1d.H3 6us : task_wake_up_rt (try_to_wake_up)
965ksoftirq-7 1d.H3 7us : _spin_unlock_irqrestore (try_to_wake_up)
966[...]
967ksoftirq-7 1d.H2 17us : irq_exit (smp_apic_timer_interrupt)
968ksoftirq-7 1d.H2 18us : sub_preempt_count (irq_exit)
969ksoftirq-7 1d.s3 19us : sub_preempt_count (irq_exit)
970ksoftirq-7 1..s2 20us : rcu_process_callbacks (__do_softirq)
971[...]
972ksoftirq-7 1..s2 26us : __rcu_process_callbacks (rcu_process_callbacks)
973ksoftirq-7 1d.s2 27us : _local_bh_enable (__do_softirq)
974ksoftirq-7 1d.s2 28us : sub_preempt_count (_local_bh_enable)
975ksoftirq-7 1.N.3 29us : sub_preempt_count (ksoftirqd)
976ksoftirq-7 1.N.2 30us : _cond_resched (ksoftirqd)
977ksoftirq-7 1.N.2 31us : __cond_resched (_cond_resched)
978ksoftirq-7 1.N.2 32us : add_preempt_count (__cond_resched)
979ksoftirq-7 1.N.2 33us : schedule (__cond_resched)
980ksoftirq-7 1.N.2 33us : add_preempt_count (schedule)
981ksoftirq-7 1.N.3 34us : hrtick_clear (schedule)
982ksoftirq-7 1dN.3 35us : _spin_lock (schedule)
983ksoftirq-7 1dN.3 36us : add_preempt_count (_spin_lock)
984ksoftirq-7 1d..4 37us : put_prev_task_fair (schedule)
985ksoftirq-7 1d..4 38us : update_curr (put_prev_task_fair)
986[...]
987ksoftirq-7 1d..5 47us : _spin_trylock (tracing_record_cmdline)
988ksoftirq-7 1d..5 48us : add_preempt_count (_spin_trylock)
989ksoftirq-7 1d..6 49us : _spin_unlock (tracing_record_cmdline)
990ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock)
991ksoftirq-7 1d..4 50us : schedule (__cond_resched)
992
993The interrupt went off while running ksoftirqd. This task runs at
994SCHED_OTHER. Why didn't we see the 'N' set early? This may be
995a harmless bug with x86_32 and 4K stacks. The need_reched() function
996that tests if we need to reschedule looks on the actual stack.
997Where as the setting of the NEED_RESCHED bit happens on the
998task's stack. But because we are in a hard interrupt, the test
999is with the interrupts stack which has that to be false. We don't
1000see the 'N' until we switch back to the task's stack.
1001
1002ftrace
1003------
1004
1005ftrace is not only the name of the tracing infrastructure, but it
1006is also a name of one of the tracers. The tracer is the function
1007tracer. Enabling the function tracer can be done from the
1008debug file system. Make sure the ftrace_enabled is set otherwise
1009this tracer is a nop.
1010
1011 # sysctl kernel.ftrace_enabled=1
1012 # echo ftrace > /debug/tracing/current_tracer
1013 # echo 1 > /debug/tracing/tracing_enabled
1014 # usleep 1
1015 # echo 0 > /debug/tracing/tracing_enabled
1016 # cat /debug/tracing/trace
1017# tracer: ftrace
1018#
1019# TASK-PID CPU# TIMESTAMP FUNCTION
1020# | | | | |
1021 bash-4003 [00] 123.638713: finish_task_switch <-schedule
1022 bash-4003 [00] 123.638714: _spin_unlock_irq <-finish_task_switch
1023 bash-4003 [00] 123.638714: sub_preempt_count <-_spin_unlock_irq
1024 bash-4003 [00] 123.638715: hrtick_set <-schedule
1025 bash-4003 [00] 123.638715: _spin_lock_irqsave <-hrtick_set
1026 bash-4003 [00] 123.638716: add_preempt_count <-_spin_lock_irqsave
1027 bash-4003 [00] 123.638716: _spin_unlock_irqrestore <-hrtick_set
1028 bash-4003 [00] 123.638717: sub_preempt_count <-_spin_unlock_irqrestore
1029 bash-4003 [00] 123.638717: hrtick_clear <-hrtick_set
1030 bash-4003 [00] 123.638718: sub_preempt_count <-schedule
1031 bash-4003 [00] 123.638718: sub_preempt_count <-preempt_schedule
1032 bash-4003 [00] 123.638719: wait_for_completion <-__stop_machine_run
1033 bash-4003 [00] 123.638719: wait_for_common <-wait_for_completion
1034 bash-4003 [00] 123.638720: _spin_lock_irq <-wait_for_common
1035 bash-4003 [00] 123.638720: add_preempt_count <-_spin_lock_irq
1036[...]
1037
1038
1039Note: It is sometimes better to enable or disable tracing directly from
1040a program, because the buffer may be overflowed by the echo commands
1041before you get to the point you want to trace. It is also easier to
1042stop the tracing at the point that you hit the part that you are
1043interested in. Since the ftrace buffer is a ring buffer with the
1044oldest data being overwritten, usually it is sufficient to start the
1045tracer with an echo command but have you code stop it. Something
1046like the following is usually appropriate for this.
1047
1048int trace_fd;
1049[...]
1050int main(int argc, char *argv[]) {
1051 [...]
1052 trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY);
1053 [...]
1054 if (condition_hit()) {
1055 write(trace_fd, "0", 1);
1056 }
1057 [...]
1058}
1059
1060
1061dynamic ftrace
1062--------------
1063
1064If CONFIG_DYNAMIC_FTRACE is set, then the system will run with
1065virtually no overhead when function tracing is disabled. The way
1066this works is the mcount function call (placed at the start of
1067every kernel function, produced by the -pg switch in gcc), starts
1068of pointing to a simple return.
1069
1070When dynamic ftrace is initialized, it calls kstop_machine to make it
1071act like a uniprocessor so that it can freely modify code without
1072worrying about other processors executing that same code. At
1073initialization, the mcount calls are change to call a "record_ip"
1074function. After this, the first time a kernel function is called,
1075it has the calling address saved in a hash table.
1076
1077Later on the ftraced kernel thread is awoken and will again call
1078kstop_machine if new functions have been recorded. The ftraced thread
1079will change all calls to mcount to "nop". Just calling mcount
1080and having mcount return has shown a 10% overhead. By converting
1081it to a nop, there is no recordable overhead to the system.
1082
1083One special side-effect to the recording of the functions being
1084traced, is that we can now selectively choose which functions we
1085want to trace and which ones we want the mcount calls to remain as
1086nops.
1087
1088Two files that contain to the enabling and disabling of recorded
1089functions are:
1090
1091 set_ftrace_filter
1092
1093and
1094
1095 set_ftrace_notrace
1096
1097A list of available functions that you can add to this files is listed
1098in:
1099
1100 available_filter_functions
1101
1102 # cat /debug/tracing/available_filter_functions
1103put_prev_task_idle
1104kmem_cache_create
1105pick_next_task_rt
1106get_online_cpus
1107pick_next_task_fair
1108mutex_lock
1109[...]
1110
1111If I'm only interested in sys_nanosleep and hrtimer_interrupt:
1112
1113 # echo sys_nanosleep hrtimer_interrupt \
1114 > /debug/tracing/set_ftrace_filter
1115 # echo ftrace > /debug/tracing/current_tracer
1116 # echo 1 > /debug/tracing/tracing_enabled
1117 # usleep 1
1118 # echo 0 > /debug/tracing/tracing_enabled
1119 # cat /debug/tracing/trace
1120# tracer: ftrace
1121#
1122# TASK-PID CPU# TIMESTAMP FUNCTION
1123# | | | | |
1124 usleep-4134 [00] 1317.070017: hrtimer_interrupt <-smp_apic_timer_interrupt
1125 usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call
1126 <idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt
1127
1128To see what functions are being traced, you can cat the file:
1129
1130 # cat /debug/tracing/set_ftrace_filter
1131hrtimer_interrupt
1132sys_nanosleep
1133
1134
1135Perhaps this isn't enough. The filters also allow simple wild cards.
1136Only the following is currently available
1137
1138 <match>* - will match functions that begins with <match>
1139 *<match> - will match functions that end with <match>
1140 *<match>* - will match functions that have <match> in it
1141
1142Thats all the wild cards that are allowed.
1143
1144 <match>*<match> will not work.
1145
1146 # echo hrtimer_* > /debug/tracing/set_ftrace_filter
1147
1148Produces:
1149
1150# tracer: ftrace
1151#
1152# TASK-PID CPU# TIMESTAMP FUNCTION
1153# | | | | |
1154 bash-4003 [00] 1480.611794: hrtimer_init <-copy_process
1155 bash-4003 [00] 1480.611941: hrtimer_start <-hrtick_set
1156 bash-4003 [00] 1480.611956: hrtimer_cancel <-hrtick_clear
1157 bash-4003 [00] 1480.611956: hrtimer_try_to_cancel <-hrtimer_cancel
1158 <idle>-0 [00] 1480.612019: hrtimer_get_next_event <-get_next_timer_interrupt
1159 <idle>-0 [00] 1480.612025: hrtimer_get_next_event <-get_next_timer_interrupt
1160 <idle>-0 [00] 1480.612032: hrtimer_get_next_event <-get_next_timer_interrupt
1161 <idle>-0 [00] 1480.612037: hrtimer_get_next_event <-get_next_timer_interrupt
1162 <idle>-0 [00] 1480.612382: hrtimer_get_next_event <-get_next_timer_interrupt
1163
1164
1165Notice that we lost the sys_nanosleep.
1166
1167 # cat /debug/tracing/set_ftrace_filter
1168hrtimer_run_queues
1169hrtimer_run_pending
1170hrtimer_init
1171hrtimer_cancel
1172hrtimer_try_to_cancel
1173hrtimer_forward
1174hrtimer_start
1175hrtimer_reprogram
1176hrtimer_force_reprogram
1177hrtimer_get_next_event
1178hrtimer_interrupt
1179hrtimer_nanosleep
1180hrtimer_wakeup
1181hrtimer_get_remaining
1182hrtimer_get_res
1183hrtimer_init_sleeper
1184
1185
1186This is because the '>' and '>>' act just like they do in bash.
1187To rewrite the filters, use '>'
1188To append to the filters, use '>>'
1189
1190To clear out a filter so that all functions will be recorded again.
1191
1192 # echo > /debug/tracing/set_ftrace_filter
1193 # cat /debug/tracing/set_ftrace_filter
1194 #
1195
1196Again, now we want to append.
1197
1198 # echo sys_nanosleep > /debug/tracing/set_ftrace_filter
1199 # cat /debug/tracing/set_ftrace_filter
1200sys_nanosleep
1201 # echo hrtimer_* >> /debug/tracing/set_ftrace_filter
1202 # cat /debug/tracing/set_ftrace_filter
1203hrtimer_run_queues
1204hrtimer_run_pending
1205hrtimer_init
1206hrtimer_cancel
1207hrtimer_try_to_cancel
1208hrtimer_forward
1209hrtimer_start
1210hrtimer_reprogram
1211hrtimer_force_reprogram
1212hrtimer_get_next_event
1213hrtimer_interrupt
1214sys_nanosleep
1215hrtimer_nanosleep
1216hrtimer_wakeup
1217hrtimer_get_remaining
1218hrtimer_get_res
1219hrtimer_init_sleeper
1220
1221
1222The set_ftrace_notrace prevents those functions from being traced.
1223
1224 # echo '*preempt*' '*lock*' > /debug/tracing/set_ftrace_notrace
1225
1226Produces:
1227
1228# tracer: ftrace
1229#
1230# TASK-PID CPU# TIMESTAMP FUNCTION
1231# | | | | |
1232 bash-4043 [01] 115.281644: finish_task_switch <-schedule
1233 bash-4043 [01] 115.281645: hrtick_set <-schedule
1234 bash-4043 [01] 115.281645: hrtick_clear <-hrtick_set
1235 bash-4043 [01] 115.281646: wait_for_completion <-__stop_machine_run
1236 bash-4043 [01] 115.281647: wait_for_common <-wait_for_completion
1237 bash-4043 [01] 115.281647: kthread_stop <-stop_machine_run
1238 bash-4043 [01] 115.281648: init_waitqueue_head <-kthread_stop
1239 bash-4043 [01] 115.281648: wake_up_process <-kthread_stop
1240 bash-4043 [01] 115.281649: try_to_wake_up <-wake_up_process
1241
1242We can see that there's no more lock or preempt tracing.
1243
1244ftraced
1245-------
1246
1247As mentioned above, when dynamic ftrace is configured in, a kernel
1248thread wakes up once a second and checks to see if there are mcount
1249calls that need to be converted into nops. If there is not, then
1250it simply goes back to sleep. But if there is, it will call
1251kstop_machine to convert the calls to nops.
1252
1253There may be a case that you do not want this added latency.
1254Perhaps you are doing some audio recording and this activity might
1255cause skips in the playback. There is an interface to disable
1256and enable the ftraced kernel thread.
1257
1258 # echo 0 > /debug/tracing/ftraced_enabled
1259
1260This will disable the calling of the kstop_machine to update the
1261mcount calls to nops. Remember that there's a large overhead
1262to calling mcount. Without this kernel thread, that overhead will
1263exist.
1264
1265Any write to the ftraced_enabled file will cause the kstop_machine
1266to run if there are recorded calls to mcount. This means that a
1267user can manually perform the updates when they want to by simply
1268echoing a '0' into the ftraced_enabled file.
1269
1270The updates are also done at the beginning of enabling a tracer
1271that uses ftrace function recording.
1272
1273
1274trace_pipe
1275----------
1276
1277The trace_pipe outputs the same as trace, but the effect on the
1278tracing is different. Every read from trace_pipe is consumed.
1279This means that subsequent reads will be different. The trace
1280is live.
1281
1282 # echo ftrace > /debug/tracing/current_tracer
1283 # cat /debug/tracing/trace_pipe > /tmp/trace.out &
1284[1] 4153
1285 # echo 1 > /debug/tracing/tracing_enabled
1286 # usleep 1
1287 # echo 0 > /debug/tracing/tracing_enabled
1288 # cat /debug/tracing/trace
1289# tracer: ftrace
1290#
1291# TASK-PID CPU# TIMESTAMP FUNCTION
1292# | | | | |
1293
1294 #
1295 # cat /tmp/trace.out
1296 bash-4043 [00] 41.267106: finish_task_switch <-schedule
1297 bash-4043 [00] 41.267106: hrtick_set <-schedule
1298 bash-4043 [00] 41.267107: hrtick_clear <-hrtick_set
1299 bash-4043 [00] 41.267108: wait_for_completion <-__stop_machine_run
1300 bash-4043 [00] 41.267108: wait_for_common <-wait_for_completion
1301 bash-4043 [00] 41.267109: kthread_stop <-stop_machine_run
1302 bash-4043 [00] 41.267109: init_waitqueue_head <-kthread_stop
1303 bash-4043 [00] 41.267110: wake_up_process <-kthread_stop
1304 bash-4043 [00] 41.267110: try_to_wake_up <-wake_up_process
1305 bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up
1306
1307
1308Note, reading the trace_pipe will block until more input is added.
1309By changing the tracer, trace_pipe will issue an EOF. We needed
1310to set the ftrace tracer _before_ cating the trace_pipe file.
1311
1312
1313trace entries
1314-------------
1315
1316Having too much or not enough data can be troublesome in diagnosing
1317some issue in the kernel. The file trace_entries is used to modify
1318the size of the internal trace buffers. The numbers listed
1319is the number of entries that can be recorded per CPU. To know
1320the full size, multiply the number of possible CPUS with the
1321number of entries.
1322
1323 # cat /debug/tracing/trace_entries
132465620
1325
1326Note, to modify this you must have tracing fulling disabled. To do that,
1327echo "none" into the current_tracer.
1328
1329 # echo none > /debug/tracing/current_tracer
1330 # echo 100000 > /debug/tracing/trace_entries
1331 # cat /debug/tracing/trace_entries
1332100045
1333
1334
1335Notice that we echoed in 100,000 but the size is 100,045. The entries
1336are held by individual pages. It allocates the number of pages it takes
1337to fulfill the request. If more entries may fit on the last page
1338it will add them.
1339
1340 # echo 1 > /debug/tracing/trace_entries
1341 # cat /debug/tracing/trace_entries
134285
1343
1344This shows us that 85 entries can fit on a single page.
1345
1346The number of pages that will be allocated is a percentage of available
1347memory. Allocating too much will produces an error.
1348
1349 # echo 1000000000000 > /debug/tracing/trace_entries
1350-bash: echo: write error: Cannot allocate memory
1351 # cat /debug/tracing/trace_entries
135285
1353
diff --git a/Documentation/hwmon/ibmaem b/Documentation/hwmon/ibmaem
new file mode 100644
index 000000000000..2fefaf582a43
--- /dev/null
+++ b/Documentation/hwmon/ibmaem
@@ -0,0 +1,37 @@
1Kernel driver ibmaem
2======================
3
4Supported systems:
5 * Any recent IBM System X server with Active Energy Manager support.
6 This includes the x3350, x3550, x3650, x3655, x3755, x3850 M2,
7 x3950 M2, and certain HS2x/LS2x/QS2x blades. The IPMI host interface
8 driver ("ipmi-si") needs to be loaded for this driver to do anything.
9 Prefix: 'ibmaem'
10 Datasheet: Not available
11
12Author: Darrick J. Wong
13
14Description
15-----------
16
17This driver implements sensor reading support for the energy and power
18meters available on various IBM System X hardware through the BMC. All
19sensor banks will be exported as platform devices; this driver can talk
20to both v1 and v2 interfaces. This driver is completely separate from the
21older ibmpex driver.
22
23The v1 AEM interface has a simple set of features to monitor energy use.
24There is a register that displays an estimate of raw energy consumption
25since the last BMC reset, and a power sensor that returns average power
26use over a configurable interval.
27
28The v2 AEM interface is a bit more sophisticated, being able to present
29a wider range of energy and power use registers, the power cap as
30set by the AEM software, and temperature sensors.
31
32Special Features
33----------------
34
35The "power_cap" value displays the current system power cap, as set by
36the Active Energy Manager software. Setting the power cap from the host
37is not currently supported.
diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface
index f4a8ebc1ef1a..2d845730d4e0 100644
--- a/Documentation/hwmon/sysfs-interface
+++ b/Documentation/hwmon/sysfs-interface
@@ -2,17 +2,12 @@ Naming and data format standards for sysfs files
2------------------------------------------------ 2------------------------------------------------
3 3
4The libsensors library offers an interface to the raw sensors data 4The libsensors library offers an interface to the raw sensors data
5through the sysfs interface. See libsensors documentation and source for 5through the sysfs interface. Since lm-sensors 3.0.0, libsensors is
6further information. As of writing this document, libsensors 6completely chip-independent. It assumes that all the kernel drivers
7(from lm_sensors 2.8.3) is heavily chip-dependent. Adding or updating 7implement the standard sysfs interface described in this document.
8support for any given chip requires modifying the library's code. 8This makes adding or updating support for any given chip very easy, as
9This is because libsensors was written for the procfs interface 9libsensors, and applications using it, do not need to be modified.
10older kernel modules were using, which wasn't standardized enough. 10This is a major improvement compared to lm-sensors 2.
11Recent versions of libsensors (from lm_sensors 2.8.2 and later) have
12support for the sysfs interface, though.
13
14The new sysfs interface was designed to be as chip-independent as
15possible.
16 11
17Note that motherboards vary widely in the connections to sensor chips. 12Note that motherboards vary widely in the connections to sensor chips.
18There is no standard that ensures, for example, that the second 13There is no standard that ensures, for example, that the second
@@ -35,19 +30,17 @@ access this data in a simple and consistent way. That said, such programs
35will have to implement conversion, labeling and hiding of inputs. For 30will have to implement conversion, labeling and hiding of inputs. For
36this reason, it is still not recommended to bypass the library. 31this reason, it is still not recommended to bypass the library.
37 32
38If you are developing a userspace application please send us feedback on
39this standard.
40
41Note that this standard isn't completely established yet, so it is subject
42to changes. If you are writing a new hardware monitoring driver those
43features can't seem to fit in this interface, please contact us with your
44extension proposal. Keep in mind that backward compatibility must be
45preserved.
46
47Each chip gets its own directory in the sysfs /sys/devices tree. To 33Each chip gets its own directory in the sysfs /sys/devices tree. To
48find all sensor chips, it is easier to follow the device symlinks from 34find all sensor chips, it is easier to follow the device symlinks from
49/sys/class/hwmon/hwmon*. 35/sys/class/hwmon/hwmon*.
50 36
37Up to lm-sensors 3.0.0, libsensors looks for hardware monitoring attributes
38in the "physical" device directory. Since lm-sensors 3.0.1, attributes found
39in the hwmon "class" device directory are also supported. Complex drivers
40(e.g. drivers for multifunction chips) may want to use this possibility to
41avoid namespace pollution. The only drawback will be that older versions of
42libsensors won't support the driver in question.
43
51All sysfs values are fixed point numbers. 44All sysfs values are fixed point numbers.
52 45
53There is only one value per file, unlike the older /proc specification. 46There is only one value per file, unlike the older /proc specification.
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index ee75cbace28d..d4cd4126d1ad 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -25,12 +25,23 @@ routines, and should be zero-initialized except for fields with data you
25provide. A client structure holds device-specific information like the 25provide. A client structure holds device-specific information like the
26driver model device node, and its I2C address. 26driver model device node, and its I2C address.
27 27
28/* iff driver uses driver model ("new style") binding model: */
29
30static struct i2c_device_id foo_idtable[] = {
31 { "foo", my_id_for_foo },
32 { "bar", my_id_for_bar },
33 { }
34};
35
36MODULE_DEVICE_TABLE(i2c, foo_idtable);
37
28static struct i2c_driver foo_driver = { 38static struct i2c_driver foo_driver = {
29 .driver = { 39 .driver = {
30 .name = "foo", 40 .name = "foo",
31 }, 41 },
32 42
33 /* iff driver uses driver model ("new style") binding model: */ 43 /* iff driver uses driver model ("new style") binding model: */
44 .id_table = foo_ids,
34 .probe = foo_probe, 45 .probe = foo_probe,
35 .remove = foo_remove, 46 .remove = foo_remove,
36 47
@@ -173,10 +184,9 @@ handle may be used during foo_probe(). If foo_probe() reports success
173(zero not a negative status code) it may save the handle and use it until 184(zero not a negative status code) it may save the handle and use it until
174foo_remove() returns. That binding model is used by most Linux drivers. 185foo_remove() returns. That binding model is used by most Linux drivers.
175 186
176Drivers match devices when i2c_client.driver_name and the driver name are 187The probe function is called when an entry in the id_table name field
177the same; this approach is used in several other busses that don't have 188matches the device's name. It is passed the entry that was matched so
178device typing support in the hardware. The driver and module name should 189the driver knows which one in the table matched.
179match, so hotplug/coldplug mechanisms will modprobe the driver.
180 190
181 191
182Device Creation (Standard driver model) 192Device Creation (Standard driver model)
diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt
index 2075c0658bf5..0bd32748a467 100644
--- a/Documentation/kernel-doc-nano-HOWTO.txt
+++ b/Documentation/kernel-doc-nano-HOWTO.txt
@@ -1,6 +1,105 @@
1kernel-doc nano-HOWTO 1kernel-doc nano-HOWTO
2===================== 2=====================
3 3
4How to format kernel-doc comments
5---------------------------------
6
7In order to provide embedded, 'C' friendly, easy to maintain,
8but consistent and extractable documentation of the functions and
9data structures in the Linux kernel, the Linux kernel has adopted
10a consistent style for documenting functions and their parameters,
11and structures and their members.
12
13The format for this documentation is called the kernel-doc format.
14It is documented in this Documentation/kernel-doc-nano-HOWTO.txt file.
15
16This style embeds the documentation within the source files, using
17a few simple conventions. The scripts/kernel-doc perl script, some
18SGML templates in Documentation/DocBook, and other tools understand
19these conventions, and are used to extract this embedded documentation
20into various documents.
21
22In order to provide good documentation of kernel functions and data
23structures, please use the following conventions to format your
24kernel-doc comments in Linux kernel source.
25
26We definitely need kernel-doc formatted documentation for functions
27that are exported to loadable modules using EXPORT_SYMBOL.
28
29We also look to provide kernel-doc formatted documentation for
30functions externally visible to other kernel files (not marked
31"static").
32
33We also recommend providing kernel-doc formatted documentation
34for private (file "static") routines, for consistency of kernel
35source code layout. But this is lower priority and at the
36discretion of the MAINTAINER of that kernel source file.
37
38Data structures visible in kernel include files should also be
39documented using kernel-doc formatted comments.
40
41The opening comment mark "/**" is reserved for kernel-doc comments.
42Only comments so marked will be considered by the kernel-doc scripts,
43and any comment so marked must be in kernel-doc format. Do not use
44"/**" to be begin a comment block unless the comment block contains
45kernel-doc formatted comments. The closing comment marker for
46kernel-doc comments can be either "*/" or "**/".
47
48Kernel-doc comments should be placed just before the function
49or data structure being described.
50
51Example kernel-doc function comment:
52
53/**
54 * foobar() - short function description of foobar
55 * @arg1: Describe the first argument to foobar.
56 * @arg2: Describe the second argument to foobar.
57 * One can provide multiple line descriptions
58 * for arguments.
59 *
60 * A longer description, with more discussion of the function foobar()
61 * that might be useful to those using or modifying it. Begins with
62 * empty comment line, and may include additional embedded empty
63 * comment lines.
64 *
65 * The longer description can have multiple paragraphs.
66 **/
67
68The first line, with the short description, must be on a single line.
69
70The @argument descriptions must begin on the very next line following
71this opening short function description line, with no intervening
72empty comment lines.
73
74Example kernel-doc data structure comment.
75
76/**
77 * struct blah - the basic blah structure
78 * @mem1: describe the first member of struct blah
79 * @mem2: describe the second member of struct blah,
80 * perhaps with more lines and words.
81 *
82 * Longer description of this structure.
83 **/
84
85The kernel-doc function comments describe each parameter to the
86function, in order, with the @name lines.
87
88The kernel-doc data structure comments describe each structure member
89in the data structure, with the @name lines.
90
91The longer description formatting is "reflowed", losing your line
92breaks. So presenting carefully formatted lists within these
93descriptions won't work so well; derived documentation will lose
94the formatting.
95
96See the section below "How to add extractable documentation to your
97source files" for more details and notes on how to format kernel-doc
98comments.
99
100Components of the kernel-doc system
101-----------------------------------
102
4Many places in the source tree have extractable documentation in the 103Many places in the source tree have extractable documentation in the
5form of block comments above functions. The components of this system 104form of block comments above functions. The components of this system
6are: 105are:
diff --git a/Documentation/kernel-docs.txt b/Documentation/kernel-docs.txt
index 5a4ef48224ae..28cdc2af2131 100644
--- a/Documentation/kernel-docs.txt
+++ b/Documentation/kernel-docs.txt
@@ -715,14 +715,14 @@
715 715
716 * Name: "Gary's Encyclopedia - The Linux Kernel" 716 * Name: "Gary's Encyclopedia - The Linux Kernel"
717 Author: Gary (I suppose...). 717 Author: Gary (I suppose...).
718 URL: http://www.lisoleg.net/cgi-bin/lisoleg.pl?view=kernel.htm 718 URL: http://slencyclopedia.berlios.de/index.html
719 Keywords: links, not found here?. 719 Keywords: linux, community, everything!
720 Description: Gary's Encyclopedia exists to allow the rapid finding 720 Description: Gary's Encyclopedia exists to allow the rapid finding
721 of documentation and other information of interest to GNU/Linux 721 of documentation and other information of interest to GNU/Linux
722 users. It has about 4000 links to external pages in 150 major 722 users. It has about 4000 links to external pages in 150 major
723 categories. This link is for kernel-specific links, documents, 723 categories. This link is for kernel-specific links, documents,
724 sites... Look there if you could not find here what you were 724 sites... This list is now hosted by developer.Berlios.de,
725 looking for. 725 but seems not to have been updated since sometime in 1999.
726 726
727 * Name: "The home page of Linux-MM" 727 * Name: "The home page of Linux-MM"
728 Author: The Linux-MM team. 728 Author: The Linux-MM team.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index e07c432c731f..f22386321e7a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -295,7 +295,7 @@ and is between 256 and 4096 characters. It is defined in the file
295 when initialising the APIC and IO-APIC components. 295 when initialising the APIC and IO-APIC components.
296 296
297 apm= [APM] Advanced Power Management 297 apm= [APM] Advanced Power Management
298 See header of arch/i386/kernel/apm.c. 298 See header of arch/x86/kernel/apm_32.c.
299 299
300 arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards 300 arcrimi= [HW,NET] ARCnet - "RIM I" (entirely mem-mapped) cards
301 Format: <io>,<irq>,<nodeID> 301 Format: <io>,<irq>,<nodeID>
@@ -638,7 +638,7 @@ and is between 256 and 4096 characters. It is defined in the file
638 638
639 elanfreq= [X86-32] 639 elanfreq= [X86-32]
640 See comment before function elanfreq_setup() in 640 See comment before function elanfreq_setup() in
641 arch/i386/kernel/cpu/cpufreq/elanfreq.c. 641 arch/x86/kernel/cpu/cpufreq/elanfreq.c.
642 642
643 elevator= [IOSCHED] 643 elevator= [IOSCHED]
644 Format: {"anticipatory" | "cfq" | "deadline" | "noop"} 644 Format: {"anticipatory" | "cfq" | "deadline" | "noop"}
@@ -1571,6 +1571,10 @@ and is between 256 and 4096 characters. It is defined in the file
1571 Format: { parport<nr> | timid | 0 } 1571 Format: { parport<nr> | timid | 0 }
1572 See also Documentation/parport.txt. 1572 See also Documentation/parport.txt.
1573 1573
1574 pmtmr= [X86] Manual setup of pmtmr I/O Port.
1575 Override pmtimer IOPort with a hex value.
1576 e.g. pmtmr=0x508
1577
1574 pnpacpi= [ACPI] 1578 pnpacpi= [ACPI]
1575 { off } 1579 { off }
1576 1580
@@ -1679,6 +1683,10 @@ and is between 256 and 4096 characters. It is defined in the file
1679 Format: <reboot_mode>[,<reboot_mode2>[,...]] 1683 Format: <reboot_mode>[,<reboot_mode2>[,...]]
1680 See arch/*/kernel/reboot.c or arch/*/kernel/process.c 1684 See arch/*/kernel/reboot.c or arch/*/kernel/process.c
1681 1685
1686 relax_domain_level=
1687 [KNL, SMP] Set scheduler's default relax_domain_level.
1688 See Documentation/cpusets.txt.
1689
1682 reserve= [KNL,BUGS] Force the kernel to ignore some iomem area 1690 reserve= [KNL,BUGS] Force the kernel to ignore some iomem area
1683 1691
1684 reservetop= [X86-32] 1692 reservetop= [X86-32]
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt
index bf3256e04027..51a8021ee532 100644
--- a/Documentation/kobject.txt
+++ b/Documentation/kobject.txt
@@ -305,7 +305,7 @@ should not be manipulated by any other user.
305 305
306A kset keeps its children in a standard kernel linked list. Kobjects point 306A kset keeps its children in a standard kernel linked list. Kobjects point
307back to their containing kset via their kset field. In almost all cases, 307back to their containing kset via their kset field. In almost all cases,
308the kobjects belonging to a ket have that kset (or, strictly, its embedded 308the kobjects belonging to a kset have that kset (or, strictly, its embedded
309kobject) in their parent. 309kobject) in their parent.
310 310
311As a kset contains a kobject within it, it should always be dynamically 311As a kset contains a kobject within it, it should always be dynamically
diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt
index 01c6c3d8a7e3..64b3f146e4b0 100644
--- a/Documentation/laptops/thinkpad-acpi.txt
+++ b/Documentation/laptops/thinkpad-acpi.txt
@@ -503,7 +503,7 @@ generate input device EV_KEY events.
503In addition to the EV_KEY events, thinkpad-acpi may also issue EV_SW 503In addition to the EV_KEY events, thinkpad-acpi may also issue EV_SW
504events for switches: 504events for switches:
505 505
506SW_RADIO T60 and later hardare rfkill rocker switch 506SW_RFKILL_ALL T60 and later hardare rfkill rocker switch
507SW_TABLET_MODE Tablet ThinkPads HKEY events 0x5009 and 0x500A 507SW_TABLET_MODE Tablet ThinkPads HKEY events 0x5009 and 0x500A
508 508
509Non hot-key ACPI HKEY event map: 509Non hot-key ACPI HKEY event map:
diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index 3be8ab2a886a..82fafe0429fe 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -157,6 +157,9 @@ struct virtqueue
157 157
158 /* The routine to call when the Guest pings us. */ 158 /* The routine to call when the Guest pings us. */
159 void (*handle_output)(int fd, struct virtqueue *me); 159 void (*handle_output)(int fd, struct virtqueue *me);
160
161 /* Outstanding buffers */
162 unsigned int inflight;
160}; 163};
161 164
162/* Remember the arguments to the program so we can "reboot" */ 165/* Remember the arguments to the program so we can "reboot" */
@@ -702,6 +705,7 @@ static unsigned get_vq_desc(struct virtqueue *vq,
702 errx(1, "Looped descriptor"); 705 errx(1, "Looped descriptor");
703 } while ((i = next_desc(vq, i)) != vq->vring.num); 706 } while ((i = next_desc(vq, i)) != vq->vring.num);
704 707
708 vq->inflight++;
705 return head; 709 return head;
706} 710}
707 711
@@ -719,6 +723,7 @@ static void add_used(struct virtqueue *vq, unsigned int head, int len)
719 /* Make sure buffer is written before we update index. */ 723 /* Make sure buffer is written before we update index. */
720 wmb(); 724 wmb();
721 vq->vring.used->idx++; 725 vq->vring.used->idx++;
726 vq->inflight--;
722} 727}
723 728
724/* This actually sends the interrupt for this virtqueue */ 729/* This actually sends the interrupt for this virtqueue */
@@ -726,8 +731,9 @@ static void trigger_irq(int fd, struct virtqueue *vq)
726{ 731{
727 unsigned long buf[] = { LHREQ_IRQ, vq->config.irq }; 732 unsigned long buf[] = { LHREQ_IRQ, vq->config.irq };
728 733
729 /* If they don't want an interrupt, don't send one. */ 734 /* If they don't want an interrupt, don't send one, unless empty. */
730 if (vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT) 735 if ((vq->vring.avail->flags & VRING_AVAIL_F_NO_INTERRUPT)
736 && vq->inflight)
731 return; 737 return;
732 738
733 /* Send the Guest an interrupt tell them we used something up. */ 739 /* Send the Guest an interrupt tell them we used something up. */
@@ -1107,6 +1113,7 @@ static void add_virtqueue(struct device *dev, unsigned int num_descs,
1107 vq->next = NULL; 1113 vq->next = NULL;
1108 vq->last_avail_idx = 0; 1114 vq->last_avail_idx = 0;
1109 vq->dev = dev; 1115 vq->dev = dev;
1116 vq->inflight = 0;
1110 1117
1111 /* Initialize the configuration. */ 1118 /* Initialize the configuration. */
1112 vq->config.num = num_descs; 1119 vq->config.num = num_descs;
@@ -1368,6 +1375,7 @@ static void setup_tun_net(const char *arg)
1368 1375
1369 /* Tell Guest what MAC address to use. */ 1376 /* Tell Guest what MAC address to use. */
1370 add_feature(dev, VIRTIO_NET_F_MAC); 1377 add_feature(dev, VIRTIO_NET_F_MAC);
1378 add_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY);
1371 set_config(dev, sizeof(conf), &conf); 1379 set_config(dev, sizeof(conf), &conf);
1372 1380
1373 /* We don't need the socket any more; setup is done. */ 1381 /* We don't need the socket any more; setup is done. */
diff --git a/Documentation/networking/arcnet.txt b/Documentation/networking/arcnet.txt
index 770fc41a78e8..796012540386 100644
--- a/Documentation/networking/arcnet.txt
+++ b/Documentation/networking/arcnet.txt
@@ -46,7 +46,7 @@ These are the ARCnet drivers for Linux.
46 46
47 47
48This new release (2.91) has been put together by David Woodhouse 48This new release (2.91) has been put together by David Woodhouse
49<dwmw2@cam.ac.uk>, in an attempt to tidy up the driver after adding support 49<dwmw2@infradead.org>, in an attempt to tidy up the driver after adding support
50for yet another chipset. Now the generic support has been separated from the 50for yet another chipset. Now the generic support has been separated from the
51individual chipset drivers, and the source files aren't quite so packed with 51individual chipset drivers, and the source files aren't quite so packed with
52#ifdefs! I've changed this file a bit, but kept it in the first person from 52#ifdefs! I've changed this file a bit, but kept it in the first person from
diff --git a/Documentation/networking/bridge.txt b/Documentation/networking/bridge.txt
index bdae2db4119c..bec69a8a1697 100644
--- a/Documentation/networking/bridge.txt
+++ b/Documentation/networking/bridge.txt
@@ -1,6 +1,6 @@
1In order to use the Ethernet bridging functionality, you'll need the 1In order to use the Ethernet bridging functionality, you'll need the
2userspace tools. These programs and documentation are available 2userspace tools. These programs and documentation are available
3at http://bridge.sourceforge.net. The download page is 3at http://www.linux-foundation.org/en/Net:Bridge. The download page is
4http://prdownloads.sourceforge.net/bridge. 4http://prdownloads.sourceforge.net/bridge.
5 5
6If you still have questions, don't hesitate to post to the mailing list 6If you still have questions, don't hesitate to post to the mailing list
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 17a6e46fbd43..946b66e1b652 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -81,23 +81,23 @@ inet_peer_minttl - INTEGER
81 Minimum time-to-live of entries. Should be enough to cover fragment 81 Minimum time-to-live of entries. Should be enough to cover fragment
82 time-to-live on the reassembling side. This minimum time-to-live is 82 time-to-live on the reassembling side. This minimum time-to-live is
83 guaranteed if the pool size is less than inet_peer_threshold. 83 guaranteed if the pool size is less than inet_peer_threshold.
84 Measured in jiffies(1). 84 Measured in seconds.
85 85
86inet_peer_maxttl - INTEGER 86inet_peer_maxttl - INTEGER
87 Maximum time-to-live of entries. Unused entries will expire after 87 Maximum time-to-live of entries. Unused entries will expire after
88 this period of time if there is no memory pressure on the pool (i.e. 88 this period of time if there is no memory pressure on the pool (i.e.
89 when the number of entries in the pool is very small). 89 when the number of entries in the pool is very small).
90 Measured in jiffies(1). 90 Measured in seconds.
91 91
92inet_peer_gc_mintime - INTEGER 92inet_peer_gc_mintime - INTEGER
93 Minimum interval between garbage collection passes. This interval is 93 Minimum interval between garbage collection passes. This interval is
94 in effect under high memory pressure on the pool. 94 in effect under high memory pressure on the pool.
95 Measured in jiffies(1). 95 Measured in seconds.
96 96
97inet_peer_gc_maxtime - INTEGER 97inet_peer_gc_maxtime - INTEGER
98 Minimum interval between garbage collection passes. This interval is 98 Minimum interval between garbage collection passes. This interval is
99 in effect under low (or absent) memory pressure on the pool. 99 in effect under low (or absent) memory pressure on the pool.
100 Measured in jiffies(1). 100 Measured in seconds.
101 101
102TCP variables: 102TCP variables:
103 103
@@ -148,9 +148,9 @@ tcp_available_congestion_control - STRING
148 but not loaded. 148 but not loaded.
149 149
150tcp_base_mss - INTEGER 150tcp_base_mss - INTEGER
151 The initial value of search_low to be used by Packetization Layer 151 The initial value of search_low to be used by the packetization layer
152 Path MTU Discovery (MTU probing). If MTU probing is enabled, 152 Path MTU discovery (MTU probing). If MTU probing is enabled,
153 this is the inital MSS used by the connection. 153 this is the initial MSS used by the connection.
154 154
155tcp_congestion_control - STRING 155tcp_congestion_control - STRING
156 Set the congestion control algorithm to be used for new 156 Set the congestion control algorithm to be used for new
@@ -185,10 +185,9 @@ tcp_frto - INTEGER
185 timeouts. It is particularly beneficial in wireless environments 185 timeouts. It is particularly beneficial in wireless environments
186 where packet loss is typically due to random radio interference 186 where packet loss is typically due to random radio interference
187 rather than intermediate router congestion. F-RTO is sender-side 187 rather than intermediate router congestion. F-RTO is sender-side
188 only modification. Therefore it does not require any support from 188 only modification. Therefore it does not require any support from
189 the peer, but in a typical case, however, where wireless link is 189 the peer.
190 the local access link and most of the data flows downlink, the 190
191 faraway servers should have F-RTO enabled to take advantage of it.
192 If set to 1, basic version is enabled. 2 enables SACK enhanced 191 If set to 1, basic version is enabled. 2 enables SACK enhanced
193 F-RTO if flow uses SACK. The basic version can be used also when 192 F-RTO if flow uses SACK. The basic version can be used also when
194 SACK is in use though scenario(s) with it exists where F-RTO 193 SACK is in use though scenario(s) with it exists where F-RTO
@@ -276,7 +275,7 @@ tcp_mem - vector of 3 INTEGERs: min, pressure, max
276 memory. 275 memory.
277 276
278tcp_moderate_rcvbuf - BOOLEAN 277tcp_moderate_rcvbuf - BOOLEAN
279 If set, TCP performs receive buffer autotuning, attempting to 278 If set, TCP performs receive buffer auto-tuning, attempting to
280 automatically size the buffer (no greater than tcp_rmem[2]) to 279 automatically size the buffer (no greater than tcp_rmem[2]) to
281 match the size required by the path for full throughput. Enabled by 280 match the size required by the path for full throughput. Enabled by
282 default. 281 default.
@@ -336,7 +335,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
336 pressure. 335 pressure.
337 Default: 8K 336 Default: 8K
338 337
339 default: default size of receive buffer used by TCP sockets. 338 default: initial size of receive buffer used by TCP sockets.
340 This value overrides net.core.rmem_default used by other protocols. 339 This value overrides net.core.rmem_default used by other protocols.
341 Default: 87380 bytes. This value results in window of 65535 with 340 Default: 87380 bytes. This value results in window of 65535 with
342 default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit 341 default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit
@@ -344,8 +343,10 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
344 343
345 max: maximal size of receive buffer allowed for automatically 344 max: maximal size of receive buffer allowed for automatically
346 selected receiver buffers for TCP socket. This value does not override 345 selected receiver buffers for TCP socket. This value does not override
347 net.core.rmem_max, "static" selection via SO_RCVBUF does not use this. 346 net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables
348 Default: 87380*2 bytes. 347 automatic tuning of that socket's receive buffer size, in which
348 case this value is ignored.
349 Default: between 87380B and 4MB, depending on RAM size.
349 350
350tcp_sack - BOOLEAN 351tcp_sack - BOOLEAN
351 Enable select acknowledgments (SACKS). 352 Enable select acknowledgments (SACKS).
@@ -358,7 +359,7 @@ tcp_slow_start_after_idle - BOOLEAN
358 Default: 1 359 Default: 1
359 360
360tcp_stdurg - BOOLEAN 361tcp_stdurg - BOOLEAN
361 Use the Host requirements interpretation of the TCP urg pointer field. 362 Use the Host requirements interpretation of the TCP urgent pointer field.
362 Most hosts use the older BSD interpretation, so if you turn this on 363 Most hosts use the older BSD interpretation, so if you turn this on
363 Linux might not communicate correctly with them. 364 Linux might not communicate correctly with them.
364 Default: FALSE 365 Default: FALSE
@@ -371,12 +372,12 @@ tcp_synack_retries - INTEGER
371tcp_syncookies - BOOLEAN 372tcp_syncookies - BOOLEAN
372 Only valid when the kernel was compiled with CONFIG_SYNCOOKIES 373 Only valid when the kernel was compiled with CONFIG_SYNCOOKIES
373 Send out syncookies when the syn backlog queue of a socket 374 Send out syncookies when the syn backlog queue of a socket
374 overflows. This is to prevent against the common 'syn flood attack' 375 overflows. This is to prevent against the common 'SYN flood attack'
375 Default: FALSE 376 Default: FALSE
376 377
377 Note, that syncookies is fallback facility. 378 Note, that syncookies is fallback facility.
378 It MUST NOT be used to help highly loaded servers to stand 379 It MUST NOT be used to help highly loaded servers to stand
379 against legal connection rate. If you see synflood warnings 380 against legal connection rate. If you see SYN flood warnings
380 in your logs, but investigation shows that they occur 381 in your logs, but investigation shows that they occur
381 because of overload with legal connections, you should tune 382 because of overload with legal connections, you should tune
382 another parameters until this warning disappear. 383 another parameters until this warning disappear.
@@ -386,7 +387,7 @@ tcp_syncookies - BOOLEAN
386 to use TCP extensions, can result in serious degradation 387 to use TCP extensions, can result in serious degradation
387 of some services (f.e. SMTP relaying), visible not by you, 388 of some services (f.e. SMTP relaying), visible not by you,
388 but your clients and relays, contacting you. While you see 389 but your clients and relays, contacting you. While you see
389 synflood warnings in logs not being really flooded, your server 390 SYN flood warnings in logs not being really flooded, your server
390 is seriously misconfigured. 391 is seriously misconfigured.
391 392
392tcp_syn_retries - INTEGER 393tcp_syn_retries - INTEGER
@@ -419,19 +420,21 @@ tcp_window_scaling - BOOLEAN
419 Enable window scaling as defined in RFC1323. 420 Enable window scaling as defined in RFC1323.
420 421
421tcp_wmem - vector of 3 INTEGERs: min, default, max 422tcp_wmem - vector of 3 INTEGERs: min, default, max
422 min: Amount of memory reserved for send buffers for TCP socket. 423 min: Amount of memory reserved for send buffers for TCP sockets.
423 Each TCP socket has rights to use it due to fact of its birth. 424 Each TCP socket has rights to use it due to fact of its birth.
424 Default: 4K 425 Default: 4K
425 426
426 default: Amount of memory allowed for send buffers for TCP socket 427 default: initial size of send buffer used by TCP sockets. This
427 by default. This value overrides net.core.wmem_default used 428 value overrides net.core.wmem_default used by other protocols.
428 by other protocols, it is usually lower than net.core.wmem_default. 429 It is usually lower than net.core.wmem_default.
429 Default: 16K 430 Default: 16K
430 431
431 max: Maximal amount of memory allowed for automatically selected 432 max: Maximal amount of memory allowed for automatically tuned
432 send buffers for TCP socket. This value does not override 433 send buffers for TCP sockets. This value does not override
433 net.core.wmem_max, "static" selection via SO_SNDBUF does not use this. 434 net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables
434 Default: 128K 435 automatic tuning of that socket's send buffer size, in which case
436 this value is ignored.
437 Default: between 64K and 4MB, depending on RAM size.
435 438
436tcp_workaround_signed_windows - BOOLEAN 439tcp_workaround_signed_windows - BOOLEAN
437 If set, assume no receipt of a window scaling option means the 440 If set, assume no receipt of a window scaling option means the
@@ -794,10 +797,6 @@ tag - INTEGER
794 Allows you to write a number, which can be used as required. 797 Allows you to write a number, which can be used as required.
795 Default value is 0. 798 Default value is 0.
796 799
797(1) Jiffie: internal timeunit for the kernel. On the i386 1/100s, on the
798Alpha 1/1024s. See the HZ define in /usr/include/asm/param.h for the exact
799value on your system.
800
801Alexey Kuznetsov. 800Alexey Kuznetsov.
802kuznet@ms2.inr.ac.ru 801kuznet@ms2.inr.ac.ru
803 802
@@ -1064,24 +1063,193 @@ bridge-nf-filter-pppoe-tagged - BOOLEAN
1064 Default: 1 1063 Default: 1
1065 1064
1066 1065
1067UNDOCUMENTED: 1066proc/sys/net/sctp/* Variables:
1067
1068addip_enable - BOOLEAN
1069 Enable or disable extension of Dynamic Address Reconfiguration
1070 (ADD-IP) functionality specified in RFC5061. This extension provides
1071 the ability to dynamically add and remove new addresses for the SCTP
1072 associations.
1073
1074 1: Enable extension.
1075
1076 0: Disable extension.
1077
1078 Default: 0
1079
1080addip_noauth_enable - BOOLEAN
1081 Dynamic Address Reconfiguration (ADD-IP) requires the use of
1082 authentication to protect the operations of adding or removing new
1083 addresses. This requirement is mandated so that unauthorized hosts
1084 would not be able to hijack associations. However, older
1085 implementations may not have implemented this requirement while
1086 allowing the ADD-IP extension. For reasons of interoperability,
1087 we provide this variable to control the enforcement of the
1088 authentication requirement.
1089
1090 1: Allow ADD-IP extension to be used without authentication. This
1091 should only be set in a closed environment for interoperability
1092 with older implementations.
1093
1094 0: Enforce the authentication requirement
1095
1096 Default: 0
1097
1098auth_enable - BOOLEAN
1099 Enable or disable Authenticated Chunks extension. This extension
1100 provides the ability to send and receive authenticated chunks and is
1101 required for secure operation of Dynamic Address Reconfiguration
1102 (ADD-IP) extension.
1103
1104 1: Enable this extension.
1105 0: Disable this extension.
1106
1107 Default: 0
1108
1109prsctp_enable - BOOLEAN
1110 Enable or disable the Partial Reliability extension (RFC3758) which
1111 is used to notify peers that a given DATA should no longer be expected.
1112
1113 1: Enable extension
1114 0: Disable
1115
1116 Default: 1
1117
1118max_burst - INTEGER
1119 The limit of the number of new packets that can be initially sent. It
1120 controls how bursty the generated traffic can be.
1121
1122 Default: 4
1123
1124association_max_retrans - INTEGER
1125 Set the maximum number for retransmissions that an association can
1126 attempt deciding that the remote end is unreachable. If this value
1127 is exceeded, the association is terminated.
1128
1129 Default: 10
1130
1131max_init_retransmits - INTEGER
1132 The maximum number of retransmissions of INIT and COOKIE-ECHO chunks
1133 that an association will attempt before declaring the destination
1134 unreachable and terminating.
1135
1136 Default: 8
1137
1138path_max_retrans - INTEGER
1139 The maximum number of retransmissions that will be attempted on a given
1140 path. Once this threshold is exceeded, the path is considered
1141 unreachable, and new traffic will use a different path when the
1142 association is multihomed.
1143
1144 Default: 5
1145
1146rto_initial - INTEGER
1147 The initial round trip timeout value in milliseconds that will be used
1148 in calculating round trip times. This is the initial time interval
1149 for retransmissions.
1068 1150
1069dev_weight FIXME 1151 Default: 3000
1070discovery_slots FIXME 1152
1071discovery_timeout FIXME 1153rto_max - INTEGER
1072fast_poll_increase FIXME 1154 The maximum value (in milliseconds) of the round trip timeout. This
1073ip6_queue_maxlen FIXME 1155 is the largest time interval that can elapse between retransmissions.
1074lap_keepalive_time FIXME 1156
1075lo_cong FIXME 1157 Default: 60000
1076max_baud_rate FIXME 1158
1077max_dgram_qlen FIXME 1159rto_min - INTEGER
1078max_noreply_time FIXME 1160 The minimum value (in milliseconds) of the round trip timeout. This
1079max_tx_data_size FIXME 1161 is the smallest time interval the can elapse between retransmissions.
1080max_tx_window FIXME 1162
1081min_tx_turn_time FIXME 1163 Default: 1000
1082mod_cong FIXME 1164
1083no_cong FIXME 1165hb_interval - INTEGER
1084no_cong_thresh FIXME 1166 The interval (in milliseconds) between HEARTBEAT chunks. These chunks
1085slot_timeout FIXME 1167 are sent at the specified interval on idle paths to probe the state of
1086warn_noreply_time FIXME 1168 a given path between 2 associations.
1169
1170 Default: 30000
1171
1172sack_timeout - INTEGER
1173 The amount of time (in milliseconds) that the implementation will wait
1174 to send a SACK.
1175
1176 Default: 200
1177
1178valid_cookie_life - INTEGER
1179 The default lifetime of the SCTP cookie (in milliseconds). The cookie
1180 is used during association establishment.
1181
1182 Default: 60000
1183
1184cookie_preserve_enable - BOOLEAN
1185 Enable or disable the ability to extend the lifetime of the SCTP cookie
1186 that is used during the establishment phase of SCTP association
1187
1188 1: Enable cookie lifetime extension.
1189 0: Disable
1190
1191 Default: 1
1192
1193rcvbuf_policy - INTEGER
1194 Determines if the receive buffer is attributed to the socket or to
1195 association. SCTP supports the capability to create multiple
1196 associations on a single socket. When using this capability, it is
1197 possible that a single stalled association that's buffering a lot
1198 of data may block other associations from delivering their data by
1199 consuming all of the receive buffer space. To work around this,
1200 the rcvbuf_policy could be set to attribute the receiver buffer space
1201 to each association instead of the socket. This prevents the described
1202 blocking.
1203
1204 1: rcvbuf space is per association
1205 0: recbuf space is per socket
1206
1207 Default: 0
1208
1209sndbuf_policy - INTEGER
1210 Similar to rcvbuf_policy above, this applies to send buffer space.
1211
1212 1: Send buffer is tracked per association
1213 0: Send buffer is tracked per socket.
1214
1215 Default: 0
1216
1217sctp_mem - vector of 3 INTEGERs: min, pressure, max
1218 Number of pages allowed for queueing by all SCTP sockets.
1219
1220 min: Below this number of pages SCTP is not bothered about its
1221 memory appetite. When amount of memory allocated by SCTP exceeds
1222 this number, SCTP starts to moderate memory usage.
1223
1224 pressure: This value was introduced to follow format of tcp_mem.
1225
1226 max: Number of pages allowed for queueing by all SCTP sockets.
1227
1228 Default is calculated at boot time from amount of available memory.
1229
1230sctp_rmem - vector of 3 INTEGERs: min, default, max
1231 See tcp_rmem for a description.
1232
1233sctp_wmem - vector of 3 INTEGERs: min, default, max
1234 See tcp_wmem for a description.
1235
1236UNDOCUMENTED:
1087 1237
1238/proc/sys/net/core/*
1239 dev_weight FIXME
1240
1241/proc/sys/net/unix/*
1242 max_dgram_qlen FIXME
1243
1244/proc/sys/net/irda/*
1245 fast_poll_increase FIXME
1246 warn_noreply_time FIXME
1247 discovery_slots FIXME
1248 slot_timeout FIXME
1249 max_baud_rate FIXME
1250 discovery_timeout FIXME
1251 lap_keepalive_time FIXME
1252 max_noreply_time FIXME
1253 max_tx_data_size FIXME
1254 max_tx_window FIXME
1255 min_tx_turn_time FIXME
diff --git a/Documentation/networking/s2io.txt b/Documentation/networking/s2io.txt
index 4bde53e85f3f..1e28e2ddb90a 100644
--- a/Documentation/networking/s2io.txt
+++ b/Documentation/networking/s2io.txt
@@ -83,9 +83,9 @@ Valid range: Limited by memory on system
83Default: 30 83Default: 30
84 84
85e. intr_type 85e. intr_type
86Specifies interrupt type. Possible values 1(INTA), 2(MSI), 3(MSI-X) 86Specifies interrupt type. Possible values 0(INTA), 2(MSI-X)
87Valid range: 1-3 87Valid values: 0, 2
88Default: 1 88Default: 2
89 89
905. Performance suggestions 905. Performance suggestions
91General: 91General:
diff --git a/Documentation/video4linux/CARDLIST.au0828 b/Documentation/video4linux/CARDLIST.au0828
index aaae360312e4..86d1c8e7b18f 100644
--- a/Documentation/video4linux/CARDLIST.au0828
+++ b/Documentation/video4linux/CARDLIST.au0828
@@ -1,4 +1,4 @@
1 0 -> Unknown board (au0828) 1 0 -> Unknown board (au0828)
2 1 -> Hauppauge HVR950Q (au0828) [2040:7200] 2 1 -> Hauppauge HVR950Q (au0828) [2040:7200,2040:7210,2040:7217,2040:721b,2040:721f,2040:7280,0fd9:0008]
3 2 -> Hauppauge HVR850 (au0828) [2040:7240] 3 2 -> Hauppauge HVR850 (au0828) [2040:7240]
4 3 -> DViCO FusionHDTV USB (au0828) [0fe9:d620] 4 3 -> DViCO FusionHDTV USB (au0828) [0fe9:d620]
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88
index 543957346469..7cf5685d3645 100644
--- a/Documentation/video4linux/CARDLIST.cx88
+++ b/Documentation/video4linux/CARDLIST.cx88
@@ -60,7 +60,7 @@
60 59 -> DViCO FusionHDTV 5 PCI nano [18ac:d530] 60 59 -> DViCO FusionHDTV 5 PCI nano [18ac:d530]
61 60 -> Pinnacle Hybrid PCTV [12ab:1788] 61 60 -> Pinnacle Hybrid PCTV [12ab:1788]
62 61 -> Winfast TV2000 XP Global [107d:6f18] 62 61 -> Winfast TV2000 XP Global [107d:6f18]
63 62 -> PowerColor Real Angel 330 [14f1:ea3d] 63 62 -> PowerColor RA330 [14f1:ea3d]
64 63 -> Geniatech X8000-MT DVBT [14f1:8852] 64 63 -> Geniatech X8000-MT DVBT [14f1:8852]
65 64 -> DViCO FusionHDTV DVB-T PRO [18ac:db30] 65 64 -> DViCO FusionHDTV DVB-T PRO [18ac:db30]
66 65 -> DViCO FusionHDTV 7 Gold [18ac:d610] 66 65 -> DViCO FusionHDTV 7 Gold [18ac:d610]
diff --git a/Documentation/video4linux/cx18.txt b/Documentation/video4linux/cx18.txt
index 077d56ec3f3d..6842c262890f 100644
--- a/Documentation/video4linux/cx18.txt
+++ b/Documentation/video4linux/cx18.txt
@@ -1,7 +1,9 @@
1Some notes regarding the cx18 driver for the Conexant CX23418 MPEG 1Some notes regarding the cx18 driver for the Conexant CX23418 MPEG
2encoder chip: 2encoder chip:
3 3
41) The only hardware currently supported is the Hauppauge HVR-1600. 41) The only hardware currently supported is the Hauppauge HVR-1600
5 card and the Compro VideoMate H900 (note that this card only
6 supports analog input, it has no digital tuner!).
5 7
62) Some people have problems getting the i2c bus to work. Cause unknown. 82) Some people have problems getting the i2c bus to work. Cause unknown.
7 The symptom is that the eeprom cannot be read and the card is 9 The symptom is that the eeprom cannot be read and the card is
diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
new file mode 100644
index 000000000000..ce72c0fe6177
--- /dev/null
+++ b/Documentation/vm/pagemap.txt
@@ -0,0 +1,77 @@
1pagemap, from the userspace perspective
2---------------------------------------
3
4pagemap is a new (as of 2.6.25) set of interfaces in the kernel that allow
5userspace programs to examine the page tables and related information by
6reading files in /proc.
7
8There are three components to pagemap:
9
10 * /proc/pid/pagemap. This file lets a userspace process find out which
11 physical frame each virtual page is mapped to. It contains one 64-bit
12 value for each virtual page, containing the following data (from
13 fs/proc/task_mmu.c, above pagemap_read):
14
15 * Bits 0-55 page frame number (PFN) if present
16 * Bits 0-4 swap type if swapped
17 * Bits 5-55 swap offset if swapped
18 * Bits 55-60 page shift (page size = 1<<page shift)
19 * Bit 61 reserved for future use
20 * Bit 62 page swapped
21 * Bit 63 page present
22
23 If the page is not present but in swap, then the PFN contains an
24 encoding of the swap file number and the page's offset into the
25 swap. Unmapped pages return a null PFN. This allows determining
26 precisely which pages are mapped (or in swap) and comparing mapped
27 pages between processes.
28
29 Efficient users of this interface will use /proc/pid/maps to
30 determine which areas of memory are actually mapped and llseek to
31 skip over unmapped regions.
32
33 * /proc/kpagecount. This file contains a 64-bit count of the number of
34 times each page is mapped, indexed by PFN.
35
36 * /proc/kpageflags. This file contains a 64-bit set of flags for each
37 page, indexed by PFN.
38
39 The flags are (from fs/proc/proc_misc, above kpageflags_read):
40
41 0. LOCKED
42 1. ERROR
43 2. REFERENCED
44 3. UPTODATE
45 4. DIRTY
46 5. LRU
47 6. ACTIVE
48 7. SLAB
49 8. WRITEBACK
50 9. RECLAIM
51 10. BUDDY
52
53Using pagemap to do something useful:
54
55The general procedure for using pagemap to find out about a process' memory
56usage goes like this:
57
58 1. Read /proc/pid/maps to determine which parts of the memory space are
59 mapped to what.
60 2. Select the maps you are interested in -- all of them, or a particular
61 library, or the stack or the heap, etc.
62 3. Open /proc/pid/pagemap and seek to the pages you would like to examine.
63 4. Read a u64 for each page from pagemap.
64 5. Open /proc/kpagecount and/or /proc/kpageflags. For each PFN you just
65 read, seek to that entry in the file, and read the data you want.
66
67For example, to find the "unique set size" (USS), which is the amount of
68memory that a process is using that is not shared with any other process,
69you can go through every map in the process, find the PFNs, look those up
70in kpagecount, and tally up the number of pages that are only referenced
71once.
72
73Other notes:
74
75Reading from any of the files will return -EINVAL if you are not starting
76the read on an 8-byte boundary (e.g., if you seeked an odd number of bytes
77into the file), or if the size of the read is not a multiple of 8 bytes.
diff --git a/Documentation/vm/slabinfo.c b/Documentation/vm/slabinfo.c
index e4230ed16ee7..df3227605d59 100644
--- a/Documentation/vm/slabinfo.c
+++ b/Documentation/vm/slabinfo.c
@@ -1,7 +1,7 @@
1/* 1/*
2 * Slabinfo: Tool to get reports about slabs 2 * Slabinfo: Tool to get reports about slabs
3 * 3 *
4 * (C) 2007 sgi, Christoph Lameter <clameter@sgi.com> 4 * (C) 2007 sgi, Christoph Lameter
5 * 5 *
6 * Compile by: 6 * Compile by:
7 * 7 *
@@ -99,7 +99,7 @@ void fatal(const char *x, ...)
99 99
100void usage(void) 100void usage(void)
101{ 101{
102 printf("slabinfo 5/7/2007. (c) 2007 sgi. clameter@sgi.com\n\n" 102 printf("slabinfo 5/7/2007. (c) 2007 sgi.\n\n"
103 "slabinfo [-ahnpvtsz] [-d debugopts] [slab-regexp]\n" 103 "slabinfo [-ahnpvtsz] [-d debugopts] [slab-regexp]\n"
104 "-a|--aliases Show aliases\n" 104 "-a|--aliases Show aliases\n"
105 "-A|--activity Most active slabs first\n" 105 "-A|--activity Most active slabs first\n"
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index 7c13f22a0c9e..bb1f5c6e28b3 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -266,4 +266,4 @@ of other objects.
266 266
267 slub_debug=FZ,dentry 267 slub_debug=FZ,dentry
268 268
269Christoph Lameter, <clameter@sgi.com>, May 30, 2007 269Christoph Lameter, May 30, 2007