Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/stable/sysfs-devices-node | 7
-rw-r--r-- Documentation/fault-injection/provoke-crashes.txt | 38
-rw-r--r-- Documentation/feature-removal-schedule.txt | 38
-rw-r--r-- Documentation/filesystems/Locking | 18
-rw-r--r-- Documentation/filesystems/nfs/nfs41-server.txt | 5
-rw-r--r-- Documentation/filesystems/proc.txt | 53
-rw-r--r-- Documentation/gpio.txt | 64
-rw-r--r-- Documentation/hwmon/adt7411 | 42
-rw-r--r-- Documentation/hwmon/adt7473 | 74
-rw-r--r-- Documentation/hwmon/asc7621 | 296
-rw-r--r-- Documentation/hwmon/it87 | 53
-rw-r--r-- Documentation/hwmon/lm90 | 22
-rw-r--r-- Documentation/init.txt | 49
-rw-r--r-- Documentation/kprobes.txt | 207
-rw-r--r-- Documentation/kvm/api.txt | 12
-rw-r--r-- Documentation/vm/slub.txt | 1
16 files changed, 844 insertions, 135 deletions
diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
new file mode 100644
index 000000000000..49b82cad7003
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -0,0 +1,7 @@
1What: /sys/devices/system/node/nodeX
2Date: October 2002
3Contact: Linux Memory Management list <linux-mm@kvack.org>
4Description:
5 When CONFIG_NUMA is enabled, this is a directory containing
6 information on node X such as what CPUs are local to the
7 node.
diff --git a/Documentation/fault-injection/provoke-crashes.txt b/Documentation/fault-injection/provoke-crashes.txt
new file mode 100644
index 000000000000..7a9d3d81525b
--- /dev/null
+++ b/Documentation/fault-injection/provoke-crashes.txt
@@ -0,0 +1,38 @@
1The lkdtm module provides an interface to crash or injure the kernel at
2predefined crashpoints to evaluate the reliability of crash dumps obtained
3using different dumping solutions. The module uses KPROBEs to instrument
4crashing points, but can also crash the kernel directly without KPROBE
5support.
6
7
8You can specify the crash point and action either through module arguments
9when inserting the module, or through a debugfs interface.
10
11Usage: insmod lkdtm.ko [recur_count={>0}] cpoint_name=<> cpoint_type=<>
12 [cpoint_count={>0}]
13
14 recur_count : Recursion level for the stack overflow test. Default is 10.
15
16 cpoint_name : Crash point where the kernel is to be crashed. It can be
17 one of INT_HARDWARE_ENTRY, INT_HW_IRQ_EN, INT_TASKLET_ENTRY,
18 FS_DEVRW, MEM_SWAPOUT, TIMERADD, SCSI_DISPATCH_CMD,
19 IDE_CORE_CP, DIRECT
20
21 cpoint_type : Indicates the action to be taken on hitting the crash point.
22 It can be one of PANIC, BUG, EXCEPTION, LOOP, OVERFLOW,
23 CORRUPT_STACK, UNALIGNED_LOAD_STORE_WRITE, OVERWRITE_ALLOCATION,
24 WRITE_AFTER_FREE
25
26 cpoint_count : Indicates the number of times the crash point is to be hit
27 to trigger an action. The default is 10.
28
29You can also induce failures by mounting debugfs and writing the type to
30<mountpoint>/provoke-crash/<crashpoint>. E.g.,
31
32 mount -t debugfs debugfs /mnt
33 echo EXCEPTION > /mnt/provoke-crash/INT_HARDWARE_ENTRY
34
35
36A special file is `DIRECT' which will induce the crash directly without
37KPROBE instrumentation. This mode is the only one available when the module
38is built on a kernel without KPROBEs support.
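
The debugfs usage above can be sketched from userspace as follows. This is
only an illustration: the helper names are hypothetical and not part of lkdtm,
the crash point and action names are copied from the lists above, and calling
arm_crash_point() on a real system will injure or crash the running kernel.

```python
# Names copied from the cpoint_name and cpoint_type lists in this file.
CRASH_POINTS = {
    "INT_HARDWARE_ENTRY", "INT_HW_IRQ_EN", "INT_TASKLET_ENTRY", "FS_DEVRW",
    "MEM_SWAPOUT", "TIMERADD", "SCSI_DISPATCH_CMD", "IDE_CORE_CP", "DIRECT",
}
CRASH_TYPES = {
    "PANIC", "BUG", "EXCEPTION", "LOOP", "OVERFLOW", "CORRUPT_STACK",
    "UNALIGNED_LOAD_STORE_WRITE", "OVERWRITE_ALLOCATION", "WRITE_AFTER_FREE",
}


def provoke_crash_path(mountpoint, cpoint_name):
    """Return <mountpoint>/provoke-crash/<crashpoint> (hypothetical helper)."""
    if cpoint_name not in CRASH_POINTS:
        raise ValueError("unknown crash point: " + cpoint_name)
    return mountpoint.rstrip("/") + "/provoke-crash/" + cpoint_name


def arm_crash_point(mountpoint, cpoint_name, cpoint_type):
    """Write the action to the crash point file. This really hurts the kernel."""
    if cpoint_type not in CRASH_TYPES:
        raise ValueError("unknown crash type: " + cpoint_type)
    with open(provoke_crash_path(mountpoint, cpoint_name), "w") as f:
        f.write(cpoint_type)
```

Validating the names before opening the file keeps a typo from turning into a
confusing write error on debugfs.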
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 03497909539e..a5cc0db63d7a 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -449,12 +449,6 @@ Who: Alok N Kataria <akataria@vmware.com>
449 449
450---------------------------- 450----------------------------
451 451
452What: adt7473 hardware monitoring driver
453When: February 2010
454Why: Obsoleted by the adt7475 driver.
455Who: Jean Delvare <khali@linux-fr.org>
456
457---------------------------
458What: Support for lcd_switch and display_get in asus-laptop driver 452What: Support for lcd_switch and display_get in asus-laptop driver
459When: March 2010 453When: March 2010
460Why: These two features use non-standard interfaces. There are the 454Why: These two features use non-standard interfaces. There are the
@@ -556,3 +550,35 @@ Why: udev fully replaces this special file system that only contains CAPI
556 NCCI TTY device nodes. User space (pppdcapiplugin) works without 550 NCCI TTY device nodes. User space (pppdcapiplugin) works without
557 noticing the difference. 551 noticing the difference.
558Who: Jan Kiszka <jan.kiszka@web.de> 552Who: Jan Kiszka <jan.kiszka@web.de>
553
554----------------------------
555
556What: KVM memory aliases support
557When: July 2010
558Why: Memory aliasing support is used for speeding up guest vga access
559 through the vga windows.
560
561 Modern userspace no longer uses this feature, so it's just bitrotted
562 code and can be removed with no impact.
563Who: Avi Kivity <avi@redhat.com>
564
565----------------------------
566
567What: KVM kernel-allocated memory slots
568When: July 2010
569Why: Since 2.6.25, kvm supports user-allocated memory slots, which are
570 much more flexible than kernel-allocated slots. All current userspace
571 supports the newer interface and this code can be removed with no
572 impact.
573Who: Avi Kivity <avi@redhat.com>
574
575----------------------------
576
577What: KVM paravirt mmu host support
578When: January 2011
579Why: The paravirt mmu host support is slower than non-paravirt mmu, both
580 on newer and older hardware. It is already not exposed to the guest,
581 and kept only for live migration purposes.
582Who: Avi Kivity <avi@redhat.com>
583
584----------------------------
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 18b9d0ca0630..06bbbed71206 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -460,13 +460,6 @@ in sys_read() and friends.
460 460
461--------------------------- dquot_operations ------------------------------- 461--------------------------- dquot_operations -------------------------------
462prototypes: 462prototypes:
463 int (*initialize) (struct inode *, int);
464 int (*drop) (struct inode *);
465 int (*alloc_space) (struct inode *, qsize_t, int);
466 int (*alloc_inode) (const struct inode *, unsigned long);
467 int (*free_space) (struct inode *, qsize_t);
468 int (*free_inode) (const struct inode *, unsigned long);
469 int (*transfer) (struct inode *, struct iattr *);
470 int (*write_dquot) (struct dquot *); 463 int (*write_dquot) (struct dquot *);
471 int (*acquire_dquot) (struct dquot *); 464 int (*acquire_dquot) (struct dquot *);
472 int (*release_dquot) (struct dquot *); 465 int (*release_dquot) (struct dquot *);
@@ -479,13 +472,6 @@ a proper locking wrt the filesystem and call the generic quota operations.
479What filesystem should expect from the generic quota functions: 472What filesystem should expect from the generic quota functions:
480 473
481 FS recursion Held locks when called 474 FS recursion Held locks when called
482initialize: yes maybe dqonoff_sem
483drop: yes -
484alloc_space: ->mark_dirty() -
485alloc_inode: ->mark_dirty() -
486free_space: ->mark_dirty() -
487free_inode: ->mark_dirty() -
488transfer: yes -
489write_dquot: yes dqonoff_sem or dqptr_sem 475write_dquot: yes dqonoff_sem or dqptr_sem
490acquire_dquot: yes dqonoff_sem or dqptr_sem 476acquire_dquot: yes dqonoff_sem or dqptr_sem
491release_dquot: yes dqonoff_sem or dqptr_sem 477release_dquot: yes dqonoff_sem or dqptr_sem
@@ -495,10 +481,6 @@ write_info: yes dqonoff_sem
495FS recursion means calling ->quota_read() and ->quota_write() from superblock 481FS recursion means calling ->quota_read() and ->quota_write() from superblock
496operations. 482operations.
497 483
498->alloc_space(), ->alloc_inode(), ->free_space(), ->free_inode() are called
499only directly by the filesystem and do not call any fs functions only
500the ->mark_dirty() operation.
501
502More details about quota locking can be found in fs/dquot.c. 484More details about quota locking can be found in fs/dquot.c.
503 485
504--------------------------- vm_operations_struct ----------------------------- 486--------------------------- vm_operations_struct -----------------------------
diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt
index 1bd0d0c05171..6a53a84afc72 100644
--- a/Documentation/filesystems/nfs/nfs41-server.txt
+++ b/Documentation/filesystems/nfs/nfs41-server.txt
@@ -17,8 +17,7 @@ kernels must turn 4.1 on or off *before* turning support for version 4
17on or off; rpc.nfsd does this correctly.) 17on or off; rpc.nfsd does this correctly.)
18 18
19The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based 19The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
20on the latest NFSv4.1 Internet Draft: 20on RFC 5661.
21http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29
22 21
23From the many new features in NFSv4.1 the current implementation 22From the many new features in NFSv4.1 the current implementation
24focuses on the mandatory-to-implement NFSv4.1 Sessions, providing 23focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
@@ -44,7 +43,7 @@ interoperability problems with future clients. Known issues:
44 trunking, but this is a mandatory feature, and its use is 43 trunking, but this is a mandatory feature, and its use is
45 recommended to clients in a number of places. (E.g. to ensure 44 recommended to clients in a number of places. (E.g. to ensure
46 timely renewal in case an existing connection's retry timeouts 45 timely renewal in case an existing connection's retry timeouts
47 have gotten too long; see section 8.3 of the draft.) 46 have gotten too long; see section 8.3 of the RFC.)
48 Therefore, lack of this feature may cause future clients to 47 Therefore, lack of this feature may cause future clients to
49 fail. 48 fail.
50 - Incomplete backchannel support: incomplete backchannel gss 49 - Incomplete backchannel support: incomplete backchannel gss
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 0d07513a67a6..96a44dd95e03 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -164,6 +164,7 @@ read the file /proc/PID/status:
164 VmExe: 68 kB 164 VmExe: 68 kB
165 VmLib: 1412 kB 165 VmLib: 1412 kB
166 VmPTE: 20 kb 166 VmPTE: 20 kb
167 VmSwap: 0 kB
167 Threads: 1 168 Threads: 1
168 SigQ: 0/28578 169 SigQ: 0/28578
169 SigPnd: 0000000000000000 170 SigPnd: 0000000000000000
@@ -188,6 +189,12 @@ memory usage. Its seven fields are explained in Table 1-3. The stat file
188contains details information about the process itself. Its fields are 189contains details information about the process itself. Its fields are
189explained in Table 1-4. 190explained in Table 1-4.
190 191
192(for SMP CONFIG users)
193To make accounting scalable, RSS-related information is handled in an
194asynchronous manner and the value may not be very precise. To see a precise
195snapshot of a moment, you can read the /proc/<pid>/smaps file, which scans the
196page table. It's slow but very precise.
197
191Table 1-2: Contents of the statm files (as of 2.6.30-rc7) 198Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
192.............................................................................. 199..............................................................................
193 Field Content 200 Field Content
@@ -213,6 +220,7 @@ Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
213 VmExe size of text segment 220 VmExe size of text segment
214 VmLib size of shared library code 221 VmLib size of shared library code
215 VmPTE size of page table entries 222 VmPTE size of page table entries
223 VmSwap size of swap usage (the number of referred swapents)
216 Threads number of threads 224 Threads number of threads
217 SigQ number of signals queued/max. number for queue 225 SigQ number of signals queued/max. number for queue
218 SigPnd bitmap of pending signals for the thread 226 SigPnd bitmap of pending signals for the thread
@@ -430,6 +438,7 @@ Table 1-5: Kernel info in /proc
430 modules List of loaded modules 438 modules List of loaded modules
431 mounts Mounted filesystems 439 mounts Mounted filesystems
432 net Networking info (see text) 440 net Networking info (see text)
441 pagetypeinfo Additional page allocator information (see text) (2.5)
433 partitions Table of partitions known to the system 442 partitions Table of partitions known to the system
434 pci Deprecated info of PCI bus (new way -> /proc/bus/pci/, 443 pci Deprecated info of PCI bus (new way -> /proc/bus/pci/,
435 decoupled by lspci (2.4) 444 decoupled by lspci (2.4)
@@ -584,7 +593,7 @@ Node 0, zone DMA 0 4 5 4 4 3 ...
584Node 0, zone Normal 1 0 0 1 101 8 ... 593Node 0, zone Normal 1 0 0 1 101 8 ...
585Node 0, zone HighMem 2 0 0 1 1 0 ... 594Node 0, zone HighMem 2 0 0 1 1 0 ...
586 595
587Memory fragmentation is a problem under some workloads, and buddyinfo is a 596External fragmentation is a problem under some workloads, and buddyinfo is a
588useful tool for helping diagnose these problems. Buddyinfo will give you a 597useful tool for helping diagnose these problems. Buddyinfo will give you a
589clue as to how big an area you can safely allocate, or why a previous 598clue as to how big an area you can safely allocate, or why a previous
590allocation failed. 599allocation failed.
@@ -594,6 +603,48 @@ available. In this case, there are 0 chunks of 2^0*PAGE_SIZE available in
594ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE 603ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
595available in ZONE_NORMAL, etc... 604available in ZONE_NORMAL, etc...
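
As a worked example of the arithmetic above, the following sketch adds up the
free pages in one buddyinfo row: the column at index i counts free chunks of
2^i pages. The function names are hypothetical, and the input is the first six
columns of the "Node 0, zone DMA" row shown above, which is truncated in the
example, so the total covers orders 0-5 only.

```python
def free_pages(counts):
    """Total free pages for one zone: counts[i] chunks of 2**i pages each."""
    return sum(count << order for order, count in enumerate(counts))


def largest_free_order(counts):
    """Highest order with at least one free chunk, or None if nothing is free."""
    orders = [order for order, count in enumerate(counts) if count > 0]
    return max(orders) if orders else None


# Orders 0-5 of the "Node 0, zone DMA" row from the example output.
dma = [0, 4, 5, 4, 4, 3]
print(free_pages(dma), largest_free_order(dma))
```

Multiplying the page total by PAGE_SIZE gives bytes; the largest order with a
non-zero count is the biggest contiguous allocation that can currently succeed
from that zone without reclaim.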
596 605
606More information relevant to external fragmentation can be found in
607pagetypeinfo.
608
609> cat /proc/pagetypeinfo
610Page block order: 9
611Pages per block: 512
612
613Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
614Node 0, zone DMA, type Unmovable 0 0 0 1 1 1 1 1 1 1 0
615Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
616Node 0, zone DMA, type Movable 1 1 2 1 2 1 1 0 1 0 2
617Node 0, zone DMA, type Reserve 0 0 0 0 0 0 0 0 0 1 0
618Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
619Node 0, zone DMA32, type Unmovable 103 54 77 1 1 1 11 8 7 1 9
620Node 0, zone DMA32, type Reclaimable 0 0 2 1 0 0 0 0 1 0 0
621Node 0, zone DMA32, type Movable 169 152 113 91 77 54 39 13 6 1 452
622Node 0, zone DMA32, type Reserve 1 2 2 2 2 0 1 1 1 1 0
623Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
624
625Number of blocks type Unmovable Reclaimable Movable Reserve Isolate
626Node 0, zone DMA 2 0 5 1 0
627Node 0, zone DMA32 41 6 967 2 0
628
629Fragmentation avoidance in the kernel works by grouping pages of the same
630migrate type into contiguous regions of memory called page blocks.
631A page block is typically the default hugepage size, e.g. 2MB on
632X86-64. By keeping pages grouped based on their ability to move, the kernel
633can reclaim pages within a page block to satisfy a high-order allocation.
634
635The pagetypeinfo file begins with information on the size of a page block. It
636then gives the same type of information as buddyinfo except broken down
637by migrate-type and finishes with details on how many page blocks of each
638type exist.
639
640If min_free_kbytes has been tuned correctly (recommendations made by hugeadm
641from libhugetlbfs http://sourceforge.net/projects/libhugetlbfs/), one can
642make an estimate of the likely number of huge pages that can be allocated
643at a given point in time. All the "Movable" blocks should be allocatable
644unless memory has been mlock()'d. Some of the Reclaimable blocks should
645also be allocatable although a lot of filesystem metadata may have to be
646reclaimed to achieve this.
647
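The estimate described above can be sketched as follows. This is a minimal
illustration, not a tool shipped with the kernel: the function names are
hypothetical, the sample text is copied from the example output above, and the
result is only an upper bound that assumes the page block order equals the
huge page order (order 9, i.e. 512 pages, in the sample).

```python
SAMPLE = """\
Node 0, zone DMA, type Movable 1 1 2 1 2 1 1 0 1 0 2

Number of blocks type Unmovable Reclaimable Movable Reserve Isolate
Node 0, zone DMA 2 0 5 1 0
Node 0, zone DMA32 41 6 967 2 0
"""


def count_blocks(text):
    """Return {(node, zone): {migrate_type: blocks}} from pagetypeinfo text."""
    result = {}
    types = None
    for line in text.splitlines():
        fields = line.split()
        if line.startswith("Number of blocks type"):
            types = fields[4:]  # column names follow the four header words
        elif types and line.startswith("Node"):
            # e.g. "Node 0, zone DMA32 41 6 967 2 0"
            node = int(fields[1].rstrip(","))
            zone = fields[3]
            counts = [int(field) for field in fields[4:]]
            result[(node, zone)] = dict(zip(types, counts))
    return result


def movable_blocks(blocks):
    """Sum the Movable page blocks over all nodes and zones."""
    return sum(per_type.get("Movable", 0) for per_type in blocks.values())


blocks = count_blocks(SAMPLE)
print(movable_blocks(blocks))
```

Rows of the per-order free page table are skipped because they appear before
the "Number of blocks" header. On a live system the same functions can be fed
the contents of /proc/pagetypeinfo.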
597.............................................................................. 648..............................................................................
598 649
599meminfo: 650meminfo:
diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt
index 1866c27eec69..c2c6e9b39bbe 100644
--- a/Documentation/gpio.txt
+++ b/Documentation/gpio.txt
@@ -253,6 +253,70 @@ pin setup (e.g. controlling which pin the GPIO uses, pullup/pulldown).
253Also note that it's your responsibility to have stopped using a GPIO 253Also note that it's your responsibility to have stopped using a GPIO
254before you free it. 254before you free it.
255 255
256Because GPIOs are in most cases configured right after they are claimed,
257three additional calls are defined:
258
259 /* request a single GPIO, with initial configuration specified by
260 * 'flags', identical to gpio_request() wrt other arguments and
261 * return value
262 */
263 int gpio_request_one(unsigned gpio, unsigned long flags, const char *label);
264
265 /* request multiple GPIOs in a single call
266 */
267 int gpio_request_array(struct gpio *array, size_t num);
268
269 /* release multiple GPIOs in a single call
270 */
271 void gpio_free_array(struct gpio *array, size_t num);
272
273where 'flags' is currently defined to specify the following properties:
274
275 * GPIOF_DIR_IN - to configure direction as input
276 * GPIOF_DIR_OUT - to configure direction as output
277
278 * GPIOF_INIT_LOW - as output, set initial level to LOW
279 * GPIOF_INIT_HIGH - as output, set initial level to HIGH
280
281Since GPIOF_INIT_* are only valid when the GPIO is configured as output, the
282valid combinations are grouped as:
283
284 * GPIOF_IN - configure as input
285 * GPIOF_OUT_INIT_LOW - configured as output, initial level LOW
286 * GPIOF_OUT_INIT_HIGH - configured as output, initial level HIGH
287
288In the future, these flags can be extended to support more properties such
289as open-drain status.
290
291Furthermore, to ease the claim/release of multiple GPIOs, 'struct gpio' is
292introduced to encapsulate all three fields as:
293
294 struct gpio {
295 unsigned gpio;
296 unsigned long flags;
297 const char *label;
298 };
299
300A typical example of usage:
301
302 static struct gpio leds_gpios[] = {
303 { 32, GPIOF_OUT_INIT_HIGH, "Power LED" }, /* default to ON */
304 { 33, GPIOF_OUT_INIT_LOW, "Green LED" }, /* default to OFF */
305 { 34, GPIOF_OUT_INIT_LOW, "Red LED" }, /* default to OFF */
306 { 35, GPIOF_OUT_INIT_LOW, "Blue LED" }, /* default to OFF */
307 { ... },
308 };
309
310 err = gpio_request_one(31, GPIOF_IN, "Reset Button");
311 if (err)
312 ...
313
314 err = gpio_request_array(leds_gpios, ARRAY_SIZE(leds_gpios));
315 if (err)
316 ...
317
318 gpio_free_array(leds_gpios, ARRAY_SIZE(leds_gpios));
319
256 320
257GPIOs mapped to IRQs 321GPIOs mapped to IRQs
258-------------------- 322--------------------
diff --git a/Documentation/hwmon/adt7411 b/Documentation/hwmon/adt7411
new file mode 100644
index 000000000000..1632960f9745
--- /dev/null
+++ b/Documentation/hwmon/adt7411
@@ -0,0 +1,42 @@
1Kernel driver adt7411
2=====================
3
4Supported chips:
5 * Analog Devices ADT7411
6 Prefix: 'adt7411'
7 Addresses scanned: 0x48, 0x4a, 0x4b
8 Datasheet: Publicly available at the Analog Devices website
9
10Author: Wolfram Sang (based on adt7470 by Darrick J. Wong)
11
12Description
13-----------
14
15This driver implements support for the Analog Devices ADT7411 chip. There may
16be other chips that implement this interface.
17
18The ADT7411 can use an I2C/SMBus compatible 2-wire interface or an
19SPI-compatible 4-wire interface. It provides a 10-bit analog to digital
20converter which measures 1 temperature, vdd and 8 input voltages. It has an
21internal temperature sensor, but an external one can also be connected (one
22loses 2 inputs then). There are high- and low-limit registers for all inputs.
23
24Check the datasheet for details.
25
26sysfs-Interface
27---------------
28
29in0_input - vdd voltage input
30in[1-8]_input - analog 1-8 input
31temp1_input - temperature input
32
33Besides standard interfaces, this driver adds (0 = off, 1 = on):
34
35 adc_ref_vdd - Use vdd as reference instead of 2.25 V
36 fast_sampling - Sample at 22.5 kHz instead of 1.4 kHz, but drop filters
37 no_average - Turn off averaging over 16 samples
38
39Notes
40-----
41
42SPI, external temperature sensor and limit registers are not supported yet.
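
To illustrate how a 10-bit reading relates to the reference voltage mentioned
above, here is a rough sketch. It assumes the usual linear ADC scaling
(raw * Vref / 1024), which this file does not state explicitly, and the
function name is hypothetical. The in*_input files already report scaled
values, so this is only meant to show the arithmetic.

```python
ADC_STEPS = 1 << 10  # 10-bit converter: raw readings are 0..1023


def raw_to_millivolts(raw, vref_mv=2250):
    """Scale a raw reading linearly against the reference (assumed formula).

    vref_mv defaults to the internal 2.25 V reference; pass the vdd value in
    millivolts when adc_ref_vdd is set to 1.
    """
    if not 0 <= raw < ADC_STEPS:
        raise ValueError("raw reading out of range for a 10-bit ADC")
    return raw * vref_mv // ADC_STEPS
```

With the default reference, one ADC step is a little over 2 mV, which bounds
the resolution of the voltage inputs.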
diff --git a/Documentation/hwmon/adt7473 b/Documentation/hwmon/adt7473
deleted file mode 100644
index 446612bd1fb9..000000000000
--- a/Documentation/hwmon/adt7473
+++ /dev/null
@@ -1,74 +0,0 @@
1Kernel driver adt7473
2======================
3
4Supported chips:
5 * Analog Devices ADT7473
6 Prefix: 'adt7473'
7 Addresses scanned: I2C 0x2C, 0x2D, 0x2E
8 Datasheet: Publicly available at the Analog Devices website
9
10Author: Darrick J. Wong
11
12This driver is depreacted, please use the adt7475 driver instead.
13
14Description
15-----------
16
17This driver implements support for the Analog Devices ADT7473 chip family.
18
19The ADT7473 uses the 2-wire interface compatible with the SMBUS 2.0
20specification. Using an analog to digital converter it measures three (3)
21temperatures and two (2) voltages. It has four (4) 16-bit counters for
22measuring fan speed. There are three (3) PWM outputs that can be used
23to control fan speed.
24
25A sophisticated control system for the PWM outputs is designed into the
26ADT7473 that allows fan speed to be adjusted automatically based on any of the
27three temperature sensors. Each PWM output is individually adjustable and
28programmable. Once configured, the ADT7473 will adjust the PWM outputs in
29response to the measured temperatures without further host intervention.
30This feature can also be disabled for manual control of the PWM's.
31
32Each of the measured inputs (voltage, temperature, fan speed) has
33corresponding high/low limit values. The ADT7473 will signal an ALARM if
34any measured value exceeds either limit.
35
36The ADT7473 samples all inputs continuously. The driver will not read
37the registers more often than once every other second. Further,
38configuration data is only read once per minute.
39
40Special Features
41----------------
42
43The ADT7473 have a 10-bit ADC and can therefore measure temperatures
44with 0.25 degC resolution. Temperature readings can be configured either
45for twos complement format or "Offset 64" format, wherein 63 is subtracted
46from the raw value to get the temperature value.
47
48The Analog Devices datasheet is very detailed and describes a procedure for
49determining an optimal configuration for the automatic PWM control.
50
51Configuration Notes
52-------------------
53
54Besides standard interfaces driver adds the following:
55
56* PWM Control
57
58* pwm#_auto_point1_pwm and temp#_auto_point1_temp and
59* pwm#_auto_point2_pwm and temp#_auto_point2_temp -
60
61point1: Set the pwm speed at a lower temperature bound.
62point2: Set the pwm speed at a higher temperature bound.
63
64The ADT7473 will scale the pwm between the lower and higher pwm speed when
65the temperature is between the two temperature boundaries. PWM values range
66from 0 (off) to 255 (full speed). Fan speed will be set to maximum when the
67temperature sensor associated with the PWM control exceeds temp#_max.
68
69Notes
70-----
71
72The NVIDIA binary driver presents an ADT7473 chip via an on-card i2c bus.
73Unfortunately, they fail to set the i2c adapter class, so this driver may
74fail to find the chip until the nvidia driver is patched.
diff --git a/Documentation/hwmon/asc7621 b/Documentation/hwmon/asc7621
new file mode 100644
index 000000000000..7287be7e1f21
--- /dev/null
+++ b/Documentation/hwmon/asc7621
@@ -0,0 +1,296 @@
1Kernel driver asc7621
2=====================
3
4Supported chips:
5 Andigilog aSC7621 and aSC7621a
6 Prefix: 'asc7621'
7 Addresses scanned: I2C 0x2c, 0x2d, 0x2e
8 Datasheet: http://www.fairview5.com/linux/asc7621/asc7621.pdf
9
10Author:
11 George Joseph
12
13Description provided by Dave Pivin @ Andigilog:
14
15Andigilog has both the PECI and pre-PECI versions of the Heceta-6, as
16Intel calls them. Heceta-6e has high frequency PWM and Heceta-6p has
17added PECI and a 4th thermal zone. The Andigilog aSC7611 is the
18Heceta-6e part and aSC7621 is the Heceta-6p part. They are both in
19volume production, shipping to Intel and their subs.
20
21We have enhanced both parts relative to the governing Intel
22specification. First enhancement is temperature reading resolution. We
23have used registers below 20h for vendor-specific functions in addition
24to those in the Intel-specified vendor range.
25
26Our conversion process produces a result that is reported as two bytes.
27The fan speed control uses this finer value to produce a "step-less" fan
28PWM output. These two bytes are "read-locked" to guarantee that once a
29high or low byte is read, the other byte is locked-in until after the
30next read of any register. So to get an atomic reading, read high or low
31byte, then the very next read should be the opposite byte. Our data
32sheet says 10 bits of resolution. Although you may find the lower bits
33are active, they are not necessarily reliable or useful externally. We
34chose not to mask them.
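
The read ordering described above can be sketched as follows. The byte layout
is an assumption made for illustration (high byte holds signed whole degrees,
the top two bits of the low byte hold quarter degrees, giving 10 bits at
0.25C); check the data sheet before relying on it. read_reg and the register
addresses are hypothetical stand-ins for whatever bus access is in use.

```python
def temp10_to_millidegrees(msb, lsb):
    """Combine the two bytes into millidegrees C (assumed 10-bit layout)."""
    signed_msb = msb - 256 if msb >= 128 else msb  # two's complement byte
    quarters = (lsb >> 6) & 0x03                   # lower six bits are ignored
    return signed_msb * 1000 + quarters * 250


def read_temp_millidegrees(read_reg, msb_addr, lsb_addr):
    """Read both halves back to back so the read-lock keeps them consistent."""
    msb = read_reg(msb_addr)  # first read locks the partner byte
    lsb = read_reg(lsb_addr)  # the very next read must be the partner byte
    return temp10_to_millidegrees(msb, lsb)
```

No other register may be read between the two accesses, since any read
releases the lock.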
35
36We employ significant filtering that is user tunable as described in the
37data sheet. Our temperature reports and fan PWM outputs are very smooth
38when compared to the competition, in addition to the higher resolution
39temperature reports. The smoother PWM output does not require user
40intervention.
41
42We offer GPIO features on the former VID pins. These are open-drain
43outputs or inputs and may be used as general purpose I/O or as alarm
44outputs that are based on temperature limits. These are in 19h and 1Ah.
45
46We offer flexible mapping of temperature readings to thermal zones. Any
47temperature may be mapped to any zone, which has a default assignment
48that follows Intel's specs.
49
50Since there is a fan to zone assignment that allows for the "hotter" of
51a set of zones to control the PWM of an individual fan, but there is no
52indication to the user, we have added an indicator that shows which zone
53is currently controlling the PWM for a given fan. This is in register
5400h.
55
56Both remote diode temperature readings may be given an offset value such
57that the reported reading as well as the temperature used to determine
58PWM may be offset for system calibration purposes.
59
60PECI Extended configuration allows for having more than two domains per
61PECI address and also provides an enabling function for each PECI
62address. One could use our flexible zone assignment to have a zone
63assigned to up to 4 PECI addresses. This is not possible in the default
64Intel configuration. This would be useful in multi-CPU systems with
65individual fans on each that would benefit from individual fan control.
66This is in register 0Eh.
67
68The tachometer measurement system is flexible and able to adapt to many
69fan types. We can also support pulse-stretched PWM so that 3-wire fans
70may be used. These characteristics are in registers 04h to 07h.
71
72Finally, we have added a tach disable function that turns off the tach
73measurement system for individual tachs in order to save power. That is
74in register 75h.
75
76--
77aSC7621 Product Description
78
79The aSC7621 has a two wire digital interface compatible with SMBus 2.0.
80Using a 10-bit ADC, the aSC7621 measures the temperature of two remote diode
81connected transistors as well as its own die. Support for Platform
82Environmental Control Interface (PECI) is included.
83
84Using temperature information from these four zones, an automatic fan speed
85control algorithm is employed to minimize acoustic impact while achieving
86recommended CPU temperature under varying operational loads.
87
88To set fan speed, the aSC7621 has three independent pulse width modulation
89(PWM) outputs that are controlled by one, or a combination of three,
90temperature zones. Both high- and low-frequency PWM ranges are supported.
91
92The aSC7621 also includes a digital filter that can be invoked to smooth
93temperature readings for better control of fan speed and minimum acoustic
94impact.
95
96The aSC7621 has tachometer inputs to measure fan speed on up to four fans.
97Limit and status registers for all measured values are included to alert
98the system host that any measurements are outside of programmed limits
99via status registers.
100
101System voltages of VCCP, 2.5V, 3.3V, 5.0V, and 12V motherboard power are
102monitored efficiently with internal scaling resistors.
103
104Features
105- Supports PECI interface and monitors internal and remote thermal diodes
106- 2-wire, SMBus 2.0 compliant, serial interface
107- 10-bit ADC
108- Monitors VCCP, 2.5V, 3.3V, 5.0V, and 12V motherboard/processor supplies
109- Programmable autonomous fan control based on temperature readings
110- Noise filtering of temperature reading for fan speed control
111- 0.25C digital temperature sensor resolution
112- 3 PWM fan speed control outputs for 2-, 3- or 4-wire fans and up to 4 fan
113 tachometer inputs
114- Enhanced measured temperature to Temperature Zone assignment.
115- Provides high and low PWM frequency ranges
116- 3 GPIO pins for custom use
117- 24-Lead QSOP package
118
119Configuration Notes
120===================
121
122Except where noted below, the sysfs entries created by this driver follow
123the standards defined in "sysfs-interface".
124
125temp1_source
126 0 (default) peci_legacy = 0, Remote 1 Temperature
127 peci_legacy = 1, PECI Processor Temperature 0
128 1 Remote 1 Temperature
129 2 Remote 2 Temperature
130 3 Internal Temperature
131 4 PECI Processor Temperature 0
132 5 PECI Processor Temperature 1
133 6 PECI Processor Temperature 2
134 7 PECI Processor Temperature 3
135
136temp2_source
137 0 (default) Internal Temperature
138 1 Remote 1 Temperature
139 2 Remote 2 Temperature
140 3 Internal Temperature
141 4 PECI Processor Temperature 0
142 5 PECI Processor Temperature 1
143 6 PECI Processor Temperature 2
144 7 PECI Processor Temperature 3
145
146temp3_source
147 0 (default) Remote 2 Temperature
148 1 Remote 1 Temperature
149 2 Remote 2 Temperature
150 3 Internal Temperature
151 4 PECI Processor Temperature 0
152 5 PECI Processor Temperature 1
153 6 PECI Processor Temperature 2
154 7 PECI Processor Temperature 3
155
156temp4_source
157 0 (default) peci_legacy = 0, PECI Processor Temperature 0
158 peci_legacy = 1, Remote 1 Temperature
159 1 Remote 1 Temperature
160 2 Remote 2 Temperature
161 3 Internal Temperature
162 4 PECI Processor Temperature 0
163 5 PECI Processor Temperature 1
164 6 PECI Processor Temperature 2
165 7 PECI Processor Temperature 3
166
167temp[1-4]_smoothing_enable
168temp[1-4]_smoothing_time
169 Smooths spikes in temp readings caused by noise.
170 Valid values in milliseconds are:
171 35000
172 17600
173 11800
174 7000
175 4400
176 3000
177 1600
178 800
179
180temp[1-4]_crit
181 When the corresponding zone temperature reaches this value,
182 ALL pwm outputs will go to 100%.
183
184temp[5-8]_input
185temp[5-8]_enable
186 The aSC7621 can also read temperatures provided by the processor
187 via the PECI bus. Usually these are "core" temps and are relative
188 to the point where the automatic thermal control circuit starts
189 throttling. This means that these are usually negative numbers.
190
191pwm[1-3]_enable
192 0 Fan off.
193 1 Fan on manual control.
194 2 Fan on automatic control and will run at the minimum pwm
195 if the temperature for the zone is below the minimum.
196 3 Fan on automatic control but will be off if the temperature
197 for the zone is below the minimum.
198 4-254 Ignored.
199 255 Fan on full.
200
201pwm[1-3]_auto_channels
202 Bitmap as described in sysfs-interface with the following
203 exceptions...
204 Only the following combinations of zones (and their corresponding masks)
205 are valid:
206 1
207 2
208 3
209 2,3
210 1,2,3
211 4
212 1,2,3,4
213
214 Special values:
215 0 Disabled.
216 16 Fan on manual control.
217 31 Fan on full.
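
The zone combinations above translate into bitmap values as sketched here,
using the generic hwmon convention that bit n-1 selects temperature channel n
(1 is temp1, 2 is temp2, 4 is temp3 and so on). The helper name is
hypothetical; the list of valid combinations is copied from above.

```python
# Zone sets accepted by the chip, as listed in this file.
VALID_ZONE_SETS = [{1}, {2}, {3}, {2, 3}, {1, 2, 3}, {4}, {1, 2, 3, 4}]


def zones_to_mask(zones):
    """Return the auto_channels bitmap for a supported set of zones."""
    zones = set(zones)
    if zones not in VALID_ZONE_SETS:
        raise ValueError("zone combination not supported: %s" % sorted(zones))
    return sum(1 << (zone - 1) for zone in zones)
```

The special values 16 and 31 lie outside this range because they use bit 4,
which does not correspond to any of the four zones.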
218
219
220pwm[1-3]_invert
221 When set, inverts the meaning of pwm[1-3].
222 i.e. when pwm = 0, the fan will be on full and
223 when pwm = 255 the fan will be off.
224
225pwm[1-3]_freq
226 PWM frequency in Hz
227 Valid values in Hz are:
228
229 10
230 15
231 23
232 30 (default)
233 38
234 47
235 62
236 94
237 23000
238 24000
239 25000
240 26000
241 27000
242 28000
243 29000
244 30000
245
246 Setting any other value will be ignored.
247
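Because any value outside the list above is ignored, a caller may want to check a frequency before writing it. The following is a minimal sketch of such a check; the function names are hypothetical, and picking the nearest listed frequency is a convenience of this example, not something the driver does.

```python
# Valid pwm[1-3]_freq values in Hz, copied from the list above.
VALID_PWM_FREQS_HZ = (
    10, 15, 23, 30, 38, 47, 62, 94,
    23000, 24000, 25000, 26000, 27000, 28000, 29000, 30000,
)


def is_valid_pwm_freq(freq_hz):
    """Return True if the value is one the text lists as accepted."""
    return freq_hz in VALID_PWM_FREQS_HZ


def nearest_valid_pwm_freq(requested_hz):
    """Return the listed frequency closest to the request.

    Ties are resolved in favour of the lower frequency.
    """
    return min(VALID_PWM_FREQS_HZ, key=lambda f: (abs(f - requested_hz), f))
```

A caller could pass a requested frequency through nearest_valid_pwm_freq() and write the result to the sysfs attribute, so that the write is never silently ignored.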
248peci_enable
249 Enables or disables PECI
250
251peci_avg
252 Input filter average time.
253
254 0 0 Sec. (no Smoothing) (default)
255 1 0.25 Sec.
256 2 0.5 Sec.
257 3 1.0 Sec.
258 4 2.0 Sec.
259 5 4.0 Sec.
260 6 8.0 Sec.
261 7 0.0 Sec.
262
263peci_legacy
264
265 0 Standard Mode (default)
266 Remote Diode 1 reading is associated with
267 Temperature Zone 1, PECI is associated with
268 Zone 4
269
270 1 Legacy Mode
271 PECI is associated with Temperature Zone 1,
272 Remote Diode 1 is associated with Zone 4
273
274peci_diode
275 Diode filter
276
277 0 0.25 Sec.
278 1 1.1 Sec.
279 2 2.4 Sec. (default)
280 3 3.4 Sec.
281 4 5.0 Sec.
282 5 6.8 Sec.
283 6 10.2 Sec.
284 7 16.4 Sec.
285
286peci_4domain
287 Four domain enable
288
289 0 1 or 2 Domains for enabled processors (default)
290 1 3 or 4 Domains for enabled processors
291
292peci_domain
293 Domain
294
295 0 Processor contains a single domain (0) (default)
296 1 Processor contains two domains (0,1)
diff --git a/Documentation/hwmon/it87 b/Documentation/hwmon/it87
index f9ba96c0ac4a..8d08bf0d38ed 100644
--- a/Documentation/hwmon/it87
+++ b/Documentation/hwmon/it87
@@ -5,31 +5,23 @@ Supported chips:
5 * IT8705F 5 * IT8705F
6 Prefix: 'it87' 6 Prefix: 'it87'
7 Addresses scanned: from Super I/O config space (8 I/O ports) 7 Addresses scanned: from Super I/O config space (8 I/O ports)
8 Datasheet: Publicly available at the ITE website 8 Datasheet: Once publicly available at the ITE website, but no longer
9 http://www.ite.com.tw/product_info/file/pc/IT8705F_V.0.4.1.pdf
10 * IT8712F 9 * IT8712F
11 Prefix: 'it8712' 10 Prefix: 'it8712'
12 Addresses scanned: from Super I/O config space (8 I/O ports) 11 Addresses scanned: from Super I/O config space (8 I/O ports)
13 Datasheet: Publicly available at the ITE website 12 Datasheet: Once publicly available at the ITE website, but no longer
14 http://www.ite.com.tw/product_info/file/pc/IT8712F_V0.9.1.pdf
15 http://www.ite.com.tw/product_info/file/pc/Errata%20V0.1%20for%20IT8712F%20V0.9.1.pdf
16 http://www.ite.com.tw/product_info/file/pc/IT8712F_V0.9.3.pdf
17 * IT8716F/IT8726F 13 * IT8716F/IT8726F
18 Prefix: 'it8716' 14 Prefix: 'it8716'
19 Addresses scanned: from Super I/O config space (8 I/O ports) 15 Addresses scanned: from Super I/O config space (8 I/O ports)
20 Datasheet: Publicly available at the ITE website 16 Datasheet: Once publicly available at the ITE website, but no longer
21 http://www.ite.com.tw/product_info/file/pc/IT8716F_V0.3.ZIP
22 http://www.ite.com.tw/product_info/file/pc/IT8726F_V0.3.pdf
23 * IT8718F 17 * IT8718F
24 Prefix: 'it8718' 18 Prefix: 'it8718'
25 Addresses scanned: from Super I/O config space (8 I/O ports) 19 Addresses scanned: from Super I/O config space (8 I/O ports)
26 Datasheet: Publicly available at the ITE website 20 Datasheet: Once publicly available at the ITE website, but no longer
27 http://www.ite.com.tw/product_info/file/pc/IT8718F_V0.2.zip
28 http://www.ite.com.tw/product_info/file/pc/IT8718F_V0%203_(for%20C%20version).zip
29 * IT8720F 21 * IT8720F
30 Prefix: 'it8720' 22 Prefix: 'it8720'
31 Addresses scanned: from Super I/O config space (8 I/O ports) 23 Addresses scanned: from Super I/O config space (8 I/O ports)
32 Datasheet: Not yet publicly available. 24 Datasheet: Not publicly available
33 * SiS950 [clone of IT8705F] 25 * SiS950 [clone of IT8705F]
34 Prefix: 'it87' 26 Prefix: 'it87'
35 Addresses scanned: from Super I/O config space (8 I/O ports) 27 Addresses scanned: from Super I/O config space (8 I/O ports)
@@ -136,6 +128,10 @@ registers are read whenever any data is read (unless it is less than 1.5
136seconds since the last update). This means that you can easily miss 128seconds since the last update). This means that you can easily miss
137once-only alarms. 129once-only alarms.
138 130
131Out-of-limit readings can also result in beeping, if the chip is properly
132wired and configured. Beeping can be enabled or disabled per sensor type
133(temperatures, voltages and fans.)
134
139The IT87xx only updates its values each 1.5 seconds; reading it more often 135The IT87xx only updates its values each 1.5 seconds; reading it more often
140will do no harm, but will return 'old' values. 136will do no harm, but will return 'old' values.
141 137
@@ -150,11 +146,38 @@ Fan speed control
150----------------- 146-----------------
151 147
152The fan speed control features are limited to manual PWM mode. Automatic 148The fan speed control features are limited to manual PWM mode. Automatic
153"Smart Guardian" mode control handling is not implemented. However 149"Smart Guardian" mode control handling is only implemented for older chips
154if you want to go for "manual mode" just write 1 to pwmN_enable. 150(see below.) However if you want to go for "manual mode" just write 1 to
151pwmN_enable.
155 152
156If you are only able to control the fan speed with very small PWM values, 153If you are only able to control the fan speed with very small PWM values,
157try lowering the PWM base frequency (pwm1_freq). Depending on the fan, 154try lowering the PWM base frequency (pwm1_freq). Depending on the fan,
158it may give you a somewhat greater control range. The same frequency is 155it may give you a somewhat greater control range. The same frequency is
159used to drive all fan outputs, which is why pwm2_freq and pwm3_freq are 156used to drive all fan outputs, which is why pwm2_freq and pwm3_freq are
160read-only. 157read-only.
158
159
160Automatic fan speed control (old interface)
161-------------------------------------------
162
163The driver supports the old interface to automatic fan speed control
164which is implemented by IT8705F chips up to revision F and IT8712F
165chips up to revision G.
166
167This interface implements 4 temperature vs. PWM output trip points.
168The PWM output of trip point 4 is always the maximum value (fan running
169at full speed) while the PWM output of the other 3 trip points can be
170freely chosen. The temperature of all 4 trip points can be freely chosen.
171Additionally, trip point 1 has a hysteresis temperature attached, to
172prevent fast switching between fan on and off.
173
174The chip automatically computes the PWM output value based on the input
175temperature, based on this simple rule: if the temperature value is
176between trip point N and trip point N+1 then the PWM output value is
177the one of trip point N. The automatic control mode is less flexible
178than the manual control mode, but it reacts faster, is more robust and
179doesn't use CPU cycles.
180
181Trip points must be set properly before switching to automatic fan speed
182control mode. The driver will perform basic integrity checks before
183actually switching to automatic control mode.
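The trip point rule described above can be sketched in a few lines. This is an illustration of the rule only, not the chip's or the driver's logic: it ignores the hysteresis on trip point 1, and it assumes that the output is 0 below trip point 1 and that the maximum PWM value is 255. The function and parameter names are hypothetical.

```python
import bisect


def auto_pwm(temp, trip_temps, trip_pwms, pwm_max=255):
    """Return the PWM output for a temperature under the 4-trip-point rule.

    trip_temps: temperatures of trip points 1-4, in ascending order.
    trip_pwms:  PWM outputs of trip points 1-3 (trip point 4 is always
                the maximum value).
    pwm_max:    assumed maximum PWM value (255 is an assumption).
    """
    if len(trip_temps) != 4 or len(trip_pwms) != 3:
        raise ValueError("expected 4 trip temperatures and 3 PWM values")
    if list(trip_temps) != sorted(trip_temps):
        raise ValueError("trip temperatures must be in ascending order")

    outputs = list(trip_pwms) + [pwm_max]
    # Index of the highest trip point whose temperature is <= temp.
    index = bisect.bisect_right(list(trip_temps), temp) - 1
    if index < 0:
        # Assumption: below trip point 1 the fan is off. Hysteresis is
        # not modelled here.
        return 0
    return outputs[index]
```

The ordering check at the top loosely mirrors the idea that trip points must be set properly before switching to automatic mode; the checks the driver actually performs are not described in the text.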
diff --git a/Documentation/hwmon/lm90 b/Documentation/hwmon/lm90
index 93d8e3d55150..6a03dd4bcc94 100644
--- a/Documentation/hwmon/lm90
+++ b/Documentation/hwmon/lm90
@@ -84,6 +84,10 @@ Supported chips:
84 Addresses scanned: I2C 0x4c 84 Addresses scanned: I2C 0x4c
85 Datasheet: Publicly available at the Maxim website 85 Datasheet: Publicly available at the Maxim website
86 http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3500 86 http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3500
87 * Winbond/Nuvoton W83L771AWG/ASG
88 Prefix: 'w83l771'
89 Addresses scanned: I2C 0x4c
90 Datasheet: Not publicly available, can be requested from Nuvoton
87 91
88 92
89Author: Jean Delvare <khali@linux-fr.org> 93Author: Jean Delvare <khali@linux-fr.org>
@@ -147,6 +151,12 @@ MAX6680 and MAX6681:
147 * Selectable address 151 * Selectable address
148 * Remote sensor type selection 152 * Remote sensor type selection
149 153
154W83L771AWG/ASG
155 * The AWG and ASG variants only differ in package format.
156 * Filter and alert configuration register at 0xBF
157 * Diode ideality factor configuration (remote sensor) at 0xE3
158 * Moving average (depending on conversion rate)
159
150All temperature values are given in degrees Celsius. Resolution 160All temperature values are given in degrees Celsius. Resolution
151is 1.0 degree for the local temperature, 0.125 degree for the remote 161is 1.0 degree for the local temperature, 0.125 degree for the remote
152temperature, except for the MAX6657, MAX6658 and MAX6659 which have a 162temperature, except for the MAX6657, MAX6658 and MAX6659 which have a
@@ -163,6 +173,18 @@ The lm90 driver will not update its values more frequently than every
163other second; reading them more often will do no harm, but will return 173other second; reading them more often will do no harm, but will return
164'old' values. 174'old' values.
165 175
176SMBus Alert Support
177-------------------
178
179This driver has basic support for SMBus alert. When an alert is received,
180the status register is read and the faulty temperature channel is logged.
181
182The Analog Devices chips (ADM1032 and ADT7461) do not implement the SMBus
183alert protocol properly so additional care is needed: the ALERT output is
184disabled when an alert is received, and is re-enabled only when the alarm
185is gone. Otherwise the chip would block alerts from other chips in the bus
186as long as the alarm is active.
187
166PEC Support 188PEC Support
167----------- 189-----------
168 190
diff --git a/Documentation/init.txt b/Documentation/init.txt
new file mode 100644
index 000000000000..535ad5e82b98
--- /dev/null
+++ b/Documentation/init.txt
@@ -0,0 +1,49 @@
1Explaining the dreaded "No init found." boot hang message
2=========================================================
3
4OK, so you've got this pretty unintuitive message (currently located
5in init/main.c) and are wondering what the H*** went wrong.
6Some high-level reasons for failure to load the init binary (listed roughly
7in order of execution) are:
8A) Unable to mount root FS
9B) init binary doesn't exist on rootfs
10C) broken console device
11D) binary exists but dependencies not available
12E) binary cannot be loaded
13
14Detailed explanations:
150) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
16 to get more detailed kernel messages.
17A) make sure you have the correct root FS type
18 (and root= kernel parameter points to the correct partition),
19 required drivers such as storage hardware (such as SCSI or USB!)
20 and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
21 to be pre-loaded by an initrd)
22C) Possibly a conflict in console= setup --> initial console unavailable.
23 E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
24 missing interrupt-based configuration).
25 Try using a different console= device or e.g. netconsole= .
26D) e.g. required library dependencies of the init binary such as
27 /lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
28 to find out which libraries are required.
29E) make sure the binary's architecture matches your hardware.
30 E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
31 In case you tried loading a non-binary file here (shell script?),
32 you should make sure that the script specifies an interpreter in its shebang
33 header line (#!/...) that is fully working (including its library
34 dependencies). And before tackling scripts, better first test a simple
35 non-script binary such as /bin/sh and confirm its successful execution.
36 To find out more, add code to init/main.c to display kernel_execve()'s
37 return values.
38
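For case E, a quick look at the first bytes of the init file shows whether it is a script or an ELF binary, and for which architecture the binary was built. The following is a minimal sketch run on a working system against the target root filesystem; the function name is hypothetical, and only a handful of ELF machine codes are listed.

```python
import struct

# A few e_machine values from the ELF specification.
ELF_MACHINES = {3: "i386", 40: "ARM", 62: "x86_64", 183: "AArch64"}


def classify_init(header):
    """Classify the first bytes of a candidate init file.

    Returns ("script", interpreter), ("elf", machine name) or
    ("unknown", "").
    """
    if header.startswith(b"#!"):
        # Shebang line: the first word after "#!" is the interpreter.
        words = header[2:].split(b"\n", 1)[0].strip().split()
        interpreter = words[0].decode(errors="replace") if words else ""
        return ("script", interpreter)
    if header[:4] == b"\x7fELF" and len(header) >= 20:
        # Byte 5 (EI_DATA) gives the byte order: 1 = little, 2 = big.
        byte_order = "<" if header[5] == 1 else ">"
        # e_machine is a 16-bit field at offset 18.
        (machine,) = struct.unpack_from(byte_order + "H", header, 18)
        return ("elf", ELF_MACHINES.get(machine, "unknown (%d)" % machine))
    return ("unknown", "")
```

To use it, read the first 64 bytes of the file in binary mode and pass them to classify_init(). If the result is a script, the interpreter it names has to be checked in the same way, together with its library dependencies.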
39Please extend this explanation whenever you find new failure causes
40(after all loading the init binary is a CRITICAL and hard transition step
41which needs to be made as painless as possible), then submit a patch to LKML.
42Further TODOs:
43- Implement the various run_init_process() invocations via a struct array
44 which can then store the kernel_execve() result value and on failure
45 log it all by iterating over _all_ results (very important usability fix).
46- try to make the implementation itself more helpful in general,
47 e.g. by providing additional error messages at affected places.
48
49Andreas Mohr <andi at lisas period de>
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 053037a1fe6d..2f9115c0ae62 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -1,6 +1,7 @@
1Title : Kernel Probes (Kprobes) 1Title : Kernel Probes (Kprobes)
2Authors : Jim Keniston <jkenisto@us.ibm.com> 2Authors : Jim Keniston <jkenisto@us.ibm.com>
3 : Prasanna S Panchamukhi <prasanna@in.ibm.com> 3 : Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com>
4 : Masami Hiramatsu <mhiramat@redhat.com>
4 5
5CONTENTS 6CONTENTS
6 7
@@ -15,6 +16,7 @@ CONTENTS
159. Jprobes Example 169. Jprobes Example
1610. Kretprobes Example 1710. Kretprobes Example
17Appendix A: The kprobes debugfs interface 18Appendix A: The kprobes debugfs interface
19Appendix B: The kprobes sysctl interface
18 20
191. Concepts: Kprobes, Jprobes, Return Probes 211. Concepts: Kprobes, Jprobes, Return Probes
20 22
@@ -42,13 +44,13 @@ registration/unregistration of a group of *probes. These functions
42can speed up unregistration process when you have to unregister 44can speed up unregistration process when you have to unregister
43a lot of probes at once. 45a lot of probes at once.
44 46
45The next three subsections explain how the different types of 47The next four subsections explain how the different types of
46probes work. They explain certain things that you'll need to 48probes work and how jump optimization works. They explain certain
47know in order to make the best use of Kprobes -- e.g., the 49things that you'll need to know in order to make the best use of
48difference between a pre_handler and a post_handler, and how 50Kprobes -- e.g., the difference between a pre_handler and
49to use the maxactive and nmissed fields of a kretprobe. But 51a post_handler, and how to use the maxactive and nmissed fields of
50if you're in a hurry to start using Kprobes, you can skip ahead 52a kretprobe. But if you're in a hurry to start using Kprobes, you
51to section 2. 53can skip ahead to section 2.
52 54
531.1 How Does a Kprobe Work? 551.1 How Does a Kprobe Work?
54 56
@@ -161,13 +163,125 @@ In case probed function is entered but there is no kretprobe_instance
161object available, then in addition to incrementing the nmissed count, 163object available, then in addition to incrementing the nmissed count,
162the user entry_handler invocation is also skipped. 164the user entry_handler invocation is also skipped.
163 165
1661.4 How Does Jump Optimization Work?
167
168If you configured your kernel with CONFIG_OPTPROBES=y (currently
169this option is supported on x86/x86-64, non-preemptive kernel) and
170the "debug.kprobes_optimization" kernel parameter is set to 1 (see
171sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
172instruction instead of a breakpoint instruction at each probepoint.
173
1741.4.1 Init a Kprobe
175
176When a probe is registered, before attempting this optimization,
177Kprobes inserts an ordinary, breakpoint-based kprobe at the specified
178address. So, even if it's not possible to optimize this particular
179probepoint, there'll be a probe there.
180
1811.4.2 Safety Check
182
183Before optimizing a probe, Kprobes performs the following safety checks:
184
185- Kprobes verifies that the region that will be replaced by the jump
186instruction (the "optimized region") lies entirely within one function.
187(A jump instruction is multiple bytes, and so may overlay multiple
188instructions.)
189
190- Kprobes analyzes the entire function and verifies that there is no
191jump into the optimized region. Specifically:
192 - the function contains no indirect jump;
193 - the function contains no instruction that causes an exception (since
194 the fixup code triggered by the exception could jump back into the
195 optimized region -- Kprobes checks the exception tables to verify this);
196 and
197 - there is no near jump to the optimized region (other than to the first
198 byte).
199
200- For each instruction in the optimized region, Kprobes verifies that
201the instruction can be executed out of line.
202
2031.4.3 Preparing Detour Buffer
204
205Next, Kprobes prepares a "detour" buffer, which contains the following
206instruction sequence:
207- code to push the CPU's registers (emulating a breakpoint trap)
208- a call to the trampoline code which calls user's probe handlers.
209- code to restore registers
210- the instructions from the optimized region
211- a jump back to the original execution path.
212
2131.4.4 Pre-optimization
214
215After preparing the detour buffer, Kprobes verifies that none of the
216following situations exist:
217- The probe has either a break_handler (i.e., it's a jprobe) or a
218post_handler.
219- Other instructions in the optimized region are probed.
220- The probe is disabled.
221In any of the above cases, Kprobes won't start optimizing the probe.
222Since these are temporary situations, Kprobes tries to start
223optimizing it again if the situation is changed.
224
225If the kprobe can be optimized, Kprobes enqueues the kprobe to an
226optimizing list, and kicks the kprobe-optimizer workqueue to optimize
227it. If the to-be-optimized probepoint is hit before being optimized,
228Kprobes returns control to the original instruction path by setting
229the CPU's instruction pointer to the copied code in the detour buffer
230-- thus at least avoiding the single-step.
231
2321.4.5 Optimization
233
234The Kprobe-optimizer doesn't insert the jump instruction immediately;
235rather, it calls synchronize_sched() for safety first, because it's
236possible for a CPU to be interrupted in the middle of executing the
237optimized region(*). As you know, synchronize_sched() can ensure
238that all interruptions that were active when synchronize_sched()
239was called are done, but only if CONFIG_PREEMPT=n. So, this version
240of kprobe optimization supports only kernels with CONFIG_PREEMPT=n.(**)
241
242After that, the Kprobe-optimizer calls stop_machine() to replace
243the optimized region with a jump instruction to the detour buffer,
244using text_poke_smp().
245
2461.4.6 Unoptimization
247
248When an optimized kprobe is unregistered, disabled, or blocked by
249another kprobe, it will be unoptimized. If this happens before
250the optimization is complete, the kprobe is just dequeued from the
251optimized list. If the optimization has been done, the jump is
252replaced with the original code (except for an int3 breakpoint in
253the first byte) by using text_poke_smp().
254
255(*)Please imagine that the 2nd instruction is interrupted and then
256the optimizer replaces the 2nd instruction with the jump *address*
257while the interrupt handler is running. When the interrupt
258returns to original address, there is no valid instruction,
259and it causes an unexpected result.
260
261(**)This optimization-safety checking may be replaced with the
262stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y
263kernel.
264
265NOTE for geeks:
266The jump optimization changes the kprobe's pre_handler behavior.
267Without optimization, the pre_handler can change the kernel's execution
268path by changing regs->ip and returning 1. However, when the probe
269is optimized, that modification is ignored. Thus, if you want to
270tweak the kernel's execution path, you need to suppress optimization,
271using one of the following techniques:
272- Specify an empty function for the kprobe's post_handler or break_handler.
273 or
274- Configure the kernel with CONFIG_OPTPROBES=n.
275 or
276- Execute 'sysctl -w debug.kprobes_optimization=0'
277
1642. Architectures Supported 2782. Architectures Supported
165 279
166Kprobes, jprobes, and return probes are implemented on the following 280Kprobes, jprobes, and return probes are implemented on the following
167architectures: 281architectures:
168 282
169- i386 283- i386 (Supports jump optimization)
170- x86_64 (AMD-64, EM64T) 284- x86_64 (AMD-64, EM64T) (Supports jump optimization)
171- ppc64 285- ppc64
172- ia64 (Does not support probes on instruction slot1.) 286- ia64 (Does not support probes on instruction slot1.)
173- sparc64 (Return probes not yet implemented.) 287- sparc64 (Return probes not yet implemented.)
@@ -193,6 +307,10 @@ it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
193so you can use "objdump -d -l vmlinux" to see the source-to-object 307so you can use "objdump -d -l vmlinux" to see the source-to-object
194code mapping. 308code mapping.
195 309
310If you want to reduce probing overhead, set "Kprobes jump optimization
311support" (CONFIG_OPTPROBES) to "y". You can find this option under the
312"Kprobes" line.
313
1964. API Reference 3144. API Reference
197 315
198The Kprobes API includes a "register" function and an "unregister" 316The Kprobes API includes a "register" function and an "unregister"
@@ -389,7 +507,10 @@ the probe which has been registered.
389 507
390Kprobes allows multiple probes at the same address. Currently, 508Kprobes allows multiple probes at the same address. Currently,
391however, there cannot be multiple jprobes on the same function at 509however, there cannot be multiple jprobes on the same function at
392the same time. 510the same time. Also, a probepoint for which there is a jprobe or
511a post_handler cannot be optimized. So if you install a jprobe,
512or a kprobe with a post_handler, at an optimized probepoint, the
513probepoint will be unoptimized automatically.
393 514
394In general, you can install a probe anywhere in the kernel. 515In general, you can install a probe anywhere in the kernel.
395In particular, you can probe interrupt handlers. Known exceptions 516In particular, you can probe interrupt handlers. Known exceptions
@@ -453,6 +574,38 @@ reason, Kprobes doesn't support return probes (or kprobes or jprobes)
453on the x86_64 version of __switch_to(); the registration functions 574on the x86_64 version of __switch_to(); the registration functions
454return -EINVAL. 575return -EINVAL.
455 576
577On x86/x86-64, since the Jump Optimization of Kprobes modifies
578instructions widely, there are some limitations to optimization. To
579explain it, we introduce some terminology. Imagine a 3-instruction
580sequence consisting of a two 2-byte instructions and one 3-byte
581instruction.
582
583 IA
584 |
585[-2][-1][0][1][2][3][4][5][6][7]
586 [ins1][ins2][ ins3 ]
587 [<- DCR ->]
588 [<- JTPR ->]
589
590ins1: 1st Instruction
591ins2: 2nd Instruction
592ins3: 3rd Instruction
593IA: Insertion Address
594JTPR: Jump Target Prohibition Region
595DCR: Detoured Code Region
596
597The instructions in DCR are copied to the out-of-line buffer
598of the kprobe, because the bytes in DCR are replaced by
599a 5-byte jump instruction. So there are several limitations.
600
601a) The instructions in DCR must be relocatable.
602b) The instructions in DCR must not include a call instruction.
603c) JTPR must not be targeted by any jump or call instruction.
604d) DCR must not straddle the border between functions.
605
606Anyway, these limitations are checked by the in-kernel instruction
607decoder, so you don't need to worry about that.
608
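The DCR and JTPR in the diagram above can be derived from the instruction lengths and the 5-byte jump. The following is a small sketch of that arithmetic, reading the JTPR from the diagram as the jump bytes other than the first. It is not the in-kernel decoder, and the function name is hypothetical.

```python
JUMP_SIZE = 5  # bytes occupied by the jump instruction, as stated above


def detoured_regions(insn_lengths, jump_size=JUMP_SIZE):
    """Compute the DCR and JTPR for a probe at the insertion address.

    insn_lengths: byte lengths of consecutive instructions, starting
                  with the one at the insertion address (IA).
    Returns (dcr, jtpr), each a half-open (start, end) range of byte
    offsets relative to IA.
    """
    covered = 0
    for length in insn_lengths:
        if length <= 0:
            raise ValueError("instruction length must be positive")
        covered += length
        if covered >= jump_size:
            # DCR: every instruction the jump overwrites, in full.
            # JTPR: the bytes of the jump after its first byte.
            return (0, covered), (1, jump_size)
    raise ValueError("instructions do not cover the jump")
```

For the example sequence of two 2-byte instructions and one 3-byte instruction, this gives a DCR covering offsets 0 to 6 and a JTPR covering offsets 1 to 4, which matches the diagram.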
4566. Probe Overhead 6096. Probe Overhead
457 610
458On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0 611On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
@@ -476,6 +629,19 @@ k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07
476ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU) 629ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
477k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 630k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
478 631
6326.1 Optimized Probe Overhead
633
634Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to
635process. Here are sample overhead figures (in usec) for x86 architectures.
636k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe,
637r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe.
638
639i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
640k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33
641
642x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
643k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30
644
4797. TODO 6457. TODO
480 646
481a. SystemTap (http://sourceware.org/systemtap): Provides a simplified 647a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
@@ -523,7 +689,8 @@ is also specified. Following columns show probe status. If the probe is on
523a virtual address that is no longer valid (module init sections, module 689a virtual address that is no longer valid (module init sections, module
524virtual addresses that correspond to modules that've been unloaded), 690virtual addresses that correspond to modules that've been unloaded),
525such probes are marked with [GONE]. If the probe is temporarily disabled, 691such probes are marked with [GONE]. If the probe is temporarily disabled,
526such probes are marked with [DISABLED]. 692such probes are marked with [DISABLED]. If the probe is optimized, it is
693marked with [OPTIMIZED].
527 694
528/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly. 695/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
529 696
@@ -533,3 +700,19 @@ registered probes will be disarmed, till such time a "1" is echoed to this
533file. Note that this knob just disarms and arms all kprobes and doesn't 700file. Note that this knob just disarms and arms all kprobes and doesn't
534change each probe's disabling state. This means that disabled kprobes (marked 701change each probe's disabling state. This means that disabled kprobes (marked
535[DISABLED]) will be not enabled if you turn ON all kprobes by this knob. 702[DISABLED]) will be not enabled if you turn ON all kprobes by this knob.
703
704
705Appendix B: The kprobes sysctl interface
706
707/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.
708
709When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides
710a knob to globally and forcibly turn jump optimization (see section
7111.4) ON or OFF. By default, jump optimization is allowed (ON).
712If you echo "0" to this file or set "debug.kprobes_optimization" to
7130 via sysctl, all optimized probes will be unoptimized, and any new
714probes registered after that will not be optimized. Note that this
715knob *changes* the optimized state. This means that optimized probes
716(marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be
717removed). If the knob is turned on, they will be optimized again.
718
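The knob can also be read and written from a script. The following is a minimal sketch that uses the /proc path given in this appendix; the function names are hypothetical, and writing to the real file requires root privileges and a kernel built with CONFIG_OPTPROBES=y.

```python
# Path taken from this appendix.
DEFAULT_KNOB = "/proc/sys/debug/kprobes-optimization"


def read_optimization(path=DEFAULT_KNOB):
    """Return True if jump optimization is currently allowed (knob is 1)."""
    with open(path) as knob:
        return knob.read().strip() == "1"


def set_optimization(enabled, path=DEFAULT_KNOB):
    """Write 1 (allow) or 0 (forbid) to the knob."""
    with open(path, "w") as knob:
        knob.write("1\n" if enabled else "0\n")
```

The path parameter exists so that the helpers can be tried against an ordinary file first. After set_optimization(False) on a real system, the [OPTIMIZED] tag should disappear from the probes listed in the debugfs interface, as described above.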
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 2811e452f756..c6416a398163 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -23,12 +23,12 @@ of a virtual machine. The ioctls belong to three classes
23 Only run vcpu ioctls from the same thread that was used to create the 23 Only run vcpu ioctls from the same thread that was used to create the
24 vcpu. 24 vcpu.
25 25
262. File descritpors 262. File descriptors
27 27
28The kvm API is centered around file descriptors. An initial 28The kvm API is centered around file descriptors. An initial
29open("/dev/kvm") obtains a handle to the kvm subsystem; this handle 29open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
30can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this 30can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
31handle will create a VM file descripror which can be used to issue VM 31handle will create a VM file descriptor which can be used to issue VM
32ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu 32ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
33and return a file descriptor pointing to it. Finally, ioctls on a vcpu 33and return a file descriptor pointing to it. Finally, ioctls on a vcpu
34fd can be used to control the vcpu, including the important task of 34fd can be used to control the vcpu, including the important task of
@@ -643,7 +643,7 @@ Type: vm ioctl
643Parameters: struct kvm_clock_data (in) 643Parameters: struct kvm_clock_data (in)
644Returns: 0 on success, -1 on error 644Returns: 0 on success, -1 on error
645 645
646Sets the current timestamp of kvmclock to the valued specific in its parameter. 646Sets the current timestamp of kvmclock to the value specified in its parameter.
647In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios 647In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
648such as migration. 648such as migration.
649 649
@@ -795,11 +795,11 @@ Unused.
795 __u64 data_offset; /* relative to kvm_run start */ 795 __u64 data_offset; /* relative to kvm_run start */
796 } io; 796 } io;
797 797
798If exit_reason is KVM_EXIT_IO_IN or KVM_EXIT_IO_OUT, then the vcpu has 798If exit_reason is KVM_EXIT_IO, then the vcpu has
799executed a port I/O instruction which could not be satisfied by kvm. 799executed a port I/O instruction which could not be satisfied by kvm.
800data_offset describes where the data is located (KVM_EXIT_IO_OUT) or 800data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
801where kvm expects application code to place the data for the next 801where kvm expects application code to place the data for the next
802KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a patcked array. 802KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
803 803
804 struct { 804 struct {
805 struct kvm_debug_exit_arch arch; 805 struct kvm_debug_exit_arch arch;
@@ -815,7 +815,7 @@ Unused.
815 __u8 is_write; 815 __u8 is_write;
816 } mmio; 816 } mmio;
817 817
818If exit_reason is KVM_EXIT_MMIO or KVM_EXIT_IO_OUT, then the vcpu has 818If exit_reason is KVM_EXIT_MMIO, then the vcpu has
819executed a memory-mapped I/O instruction which could not be satisfied 819executed a memory-mapped I/O instruction which could not be satisfied
820by kvm. The 'data' member contains the written data if 'is_write' is 820by kvm. The 'data' member contains the written data if 'is_write' is
821true, and should be filled by application code otherwise. 821true, and should be filled by application code otherwise.
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index b37300edf27c..07375e73981a 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -41,6 +41,7 @@ Possible debug options are
41 P Poisoning (object and padding) 41 P Poisoning (object and padding)
42 U User tracking (free and alloc) 42 U User tracking (free and alloc)
43 T Trace (please only use on single slabs) 43 T Trace (please only use on single slabs)
44 A Toggle failslab filter mark for the cache
44 O Switch debugging off for caches that would have 45 O Switch debugging off for caches that would have
45 caused higher minimum slab orders 46 caused higher minimum slab orders
46 - Switch all debugging off (useful if the kernel is 47 - Switch all debugging off (useful if the kernel is