aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/ABI/stable/sysfs-devices-node7
-rw-r--r--Documentation/ABI/testing/sysfs-block14
-rw-r--r--Documentation/ABI/testing/sysfs-bus-usb11
-rw-r--r--Documentation/ABI/testing/sysfs-devices-power79
-rw-r--r--Documentation/ABI/testing/sysfs-platform-asus-laptop12
-rw-r--r--Documentation/ABI/testing/sysfs-platform-eeepc-laptop10
-rw-r--r--Documentation/ABI/testing/sysfs-power13
-rw-r--r--Documentation/DMA-API.txt122
-rw-r--r--Documentation/DocBook/device-drivers.tmpl2
-rw-r--r--Documentation/DocBook/deviceiobook.tmpl2
-rw-r--r--Documentation/DocBook/mac80211.tmpl3
-rw-r--r--Documentation/DocBook/mtdnand.tmpl6
-rw-r--r--Documentation/DocBook/v4l/common.xml2
-rw-r--r--Documentation/DocBook/v4l/io.xml3
-rw-r--r--Documentation/DocBook/v4l/vidioc-g-parm.xml2
-rw-r--r--Documentation/DocBook/v4l/vidioc-qbuf.xml40
-rw-r--r--Documentation/DocBook/v4l/vidioc-querybuf.xml7
-rw-r--r--Documentation/DocBook/v4l/vidioc-reqbufs.xml36
-rw-r--r--Documentation/HOWTO113
-rw-r--r--Documentation/IPMI.txt12
-rw-r--r--Documentation/Makefile4
-rw-r--r--Documentation/PCI/PCI-DMA-mapping.txt352
-rw-r--r--Documentation/RCU/00-INDEX10
-rw-r--r--Documentation/RCU/RTFP.txt61
-rw-r--r--Documentation/RCU/checklist.txt208
-rw-r--r--Documentation/RCU/lockdep.txt67
-rw-r--r--Documentation/RCU/rcu.txt48
-rw-r--r--Documentation/RCU/stallwarn.txt58
-rw-r--r--Documentation/RCU/torture.txt12
-rw-r--r--Documentation/RCU/whatisRCU.txt16
-rw-r--r--Documentation/SubmitChecklist8
-rw-r--r--Documentation/arm/Samsung-S3C24XX/CPUfreq.txt4
-rw-r--r--Documentation/arm/Samsung/Overview.txt86
-rwxr-xr-xDocumentation/arm/Samsung/clksrc-change-registers.awk167
-rw-r--r--Documentation/arm/memory.txt6
-rw-r--r--Documentation/block/queue-sysfs.txt10
-rw-r--r--Documentation/cachetlb.txt30
-rw-r--r--Documentation/cdrom/ide-cd39
-rw-r--r--Documentation/cgroups/cgroup_event_listener.c110
-rw-r--r--Documentation/cgroups/cgroups.txt39
-rw-r--r--Documentation/cgroups/cpusets.txt127
-rw-r--r--Documentation/cgroups/memcg_test.txt47
-rw-r--r--Documentation/cgroups/memory.txt80
-rw-r--r--Documentation/console/console.txt2
-rw-r--r--Documentation/cpu-freq/pcc-cpufreq.txt207
-rw-r--r--Documentation/device-mapper/snapshot.txt44
-rw-r--r--Documentation/dontdiff1
-rw-r--r--Documentation/driver-model/platform.txt2
-rw-r--r--Documentation/dvb/get_dvb_firmware23
-rw-r--r--Documentation/eisa.txt2
-rw-r--r--Documentation/email-clients.txt30
-rw-r--r--Documentation/fault-injection/provoke-crashes.txt38
-rw-r--r--Documentation/feature-removal-schedule.txt137
-rw-r--r--Documentation/filesystems/00-INDEX4
-rw-r--r--Documentation/filesystems/Locking18
-rw-r--r--Documentation/filesystems/Makefile8
-rw-r--r--Documentation/filesystems/dentry-locking.txt3
-rw-r--r--Documentation/filesystems/dnotify.txt39
-rw-r--r--Documentation/filesystems/dnotify_test.c34
-rw-r--r--Documentation/filesystems/logfs.txt241
-rw-r--r--Documentation/filesystems/nfs/nfs41-server.txt5
-rw-r--r--Documentation/filesystems/nilfs2.txt3
-rw-r--r--Documentation/filesystems/proc.txt55
-rw-r--r--Documentation/filesystems/sharedsubtree.txt16
-rw-r--r--Documentation/gpio.txt64
-rw-r--r--Documentation/hwmon/abituguru2
-rw-r--r--Documentation/hwmon/adt741142
-rw-r--r--Documentation/hwmon/adt747374
-rw-r--r--Documentation/hwmon/asc7621296
-rw-r--r--Documentation/hwmon/it8753
-rw-r--r--Documentation/hwmon/lm9022
-rw-r--r--Documentation/i2c/busses/i2c-i8013
-rw-r--r--Documentation/i2c/busses/i2c-parport3
-rw-r--r--Documentation/i2c/busses/i2c-parport-light11
-rw-r--r--Documentation/i2c/smbus-protocol16
-rw-r--r--Documentation/i2c/writing-clients5
-rw-r--r--Documentation/init.txt49
-rw-r--r--Documentation/input/rotary-encoder.txt2
-rw-r--r--Documentation/input/sentelic.txt124
-rw-r--r--Documentation/ioctl/ioctl-number.txt1
-rw-r--r--Documentation/isdn/INTERFACE.CAPI9
-rw-r--r--Documentation/isdn/README.gigaset10
-rw-r--r--Documentation/kernel-parameters.txt57
-rw-r--r--Documentation/kobject.txt2
-rw-r--r--Documentation/kprobes.txt207
-rw-r--r--Documentation/kvm/api.txt12
-rw-r--r--Documentation/laptops/00-INDEX6
-rw-r--r--Documentation/laptops/Makefile8
-rw-r--r--Documentation/laptops/dslm.c166
-rw-r--r--Documentation/laptops/laptop-mode.txt170
-rw-r--r--Documentation/laptops/thinkpad-acpi.txt4
-rw-r--r--Documentation/lguest/lguest.c1
-rw-r--r--Documentation/networking/00-INDEX2
-rw-r--r--Documentation/networking/cxacru-cf.py48
-rw-r--r--Documentation/networking/cxacru.txt16
-rw-r--r--Documentation/networking/dccp.txt6
-rw-r--r--Documentation/networking/ip-sysctl.txt66
-rwxr-xr-xDocumentation/networking/ixgbevf.txt90
-rw-r--r--Documentation/networking/packet_mmap.txt8
-rw-r--r--Documentation/networking/regulatory.txt24
-rw-r--r--Documentation/networking/skfp.txt2
-rw-r--r--Documentation/networking/tcp-thin.txt47
-rw-r--r--Documentation/networking/timestamping/timestamping.c2
-rw-r--r--Documentation/pcmcia/locking.txt118
-rw-r--r--Documentation/pnp.txt13
-rw-r--r--Documentation/power/runtime_pm.txt95
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/can.txt53
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/dma.txt8
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/i2c.txt30
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/mpc5121-psc.txt70
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/mpc5200.txt9
-rw-r--r--Documentation/powerpc/dts-bindings/fsl/spi.txt7
-rw-r--r--Documentation/powerpc/ptrace.txt134
-rw-r--r--Documentation/s390/CommonIO6
-rw-r--r--Documentation/s390/driver-model.txt4
-rw-r--r--Documentation/s390/kvm.txt2
-rw-r--r--Documentation/scsi/ChangeLog.lpfc10
-rw-r--r--Documentation/scsi/ChangeLog.megaraid_sas16
-rw-r--r--Documentation/serial/tty.txt4
-rw-r--r--Documentation/sound/alsa/ALSA-Configuration.txt27
-rw-r--r--Documentation/sound/alsa/HD-Audio-Models.txt5
-rw-r--r--Documentation/sound/alsa/HD-Audio.txt27
-rw-r--r--Documentation/sysctl/vm.txt5
-rw-r--r--Documentation/timers/00-INDEX2
-rw-r--r--Documentation/timers/Makefile8
-rw-r--r--Documentation/timers/hpet.txt273
-rw-r--r--Documentation/timers/hpet_example.c269
-rw-r--r--Documentation/trace/ftrace-design.txt5
-rw-r--r--Documentation/trace/ftrace.txt2
-rw-r--r--Documentation/trace/kprobetrace.txt57
-rw-r--r--Documentation/usb/error-codes.txt6
-rw-r--r--Documentation/usb/power-management.txt235
-rw-r--r--Documentation/video4linux/CARDLIST.cx238851
-rw-r--r--Documentation/video4linux/CARDLIST.saa71341
-rw-r--r--Documentation/video4linux/CARDLIST.tuner1
-rw-r--r--Documentation/video4linux/README.tlg230047
-rw-r--r--Documentation/video4linux/gspca.txt25
-rw-r--r--Documentation/video4linux/v4l2-framework.txt106
-rw-r--r--Documentation/video4linux/videobuf360
-rw-r--r--Documentation/vm/00-INDEX16
-rw-r--r--Documentation/vm/Makefile2
-rw-r--r--Documentation/vm/hugepage-mmap.c91
-rw-r--r--Documentation/vm/hugepage-shm.c98
-rw-r--r--Documentation/vm/hugetlbpage.txt169
-rw-r--r--Documentation/vm/map_hugetlb.c6
-rw-r--r--Documentation/vm/slub.txt1
-rw-r--r--Documentation/voyager.txt95
-rw-r--r--Documentation/x86/x86_64/boot-options.txt20
148 files changed, 5279 insertions, 1987 deletions
diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
new file mode 100644
index 000000000000..49b82cad7003
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -0,0 +1,7 @@
1What: /sys/devices/system/node/nodeX
2Date: October 2002
3Contact: Linux Memory Management list <linux-mm@kvack.org>
4Description:
5 When CONFIG_NUMA is enabled, this is a directory containing
6 information on node X such as what CPUs are local to the
7 node.
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index d2f90334bb93..4873c759d535 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -128,3 +128,17 @@ Description:
128 preferred request size for workloads where sustained 128 preferred request size for workloads where sustained
129 throughput is desired. If no optimal I/O size is 129 throughput is desired. If no optimal I/O size is
130 reported this file contains 0. 130 reported this file contains 0.
131
132What: /sys/block/<disk>/queue/nomerges
133Date: January 2010
134Contact:
135Description:
136 Standard I/O elevator operations include attempts to
137 merge contiguous I/Os. For known random I/O loads these
138 attempts will always fail and result in extra cycles
139 being spent in the kernel. This allows one to turn off
140 this behavior on one of two ways: When set to 1, complex
141 merge checks are disabled, but the simple one-shot merges
142 with the previous I/O request are enabled. When set to 2,
143 all merge tries are disabled. The default value is 0 -
144 which enables all types of merge tries.
diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb
index a07c0f366f91..a986e9bbba3d 100644
--- a/Documentation/ABI/testing/sysfs-bus-usb
+++ b/Documentation/ABI/testing/sysfs-bus-usb
@@ -159,3 +159,14 @@ Description:
159 device. This is useful to ensure auto probing won't 159 device. This is useful to ensure auto probing won't
160 match the driver to the device. For example: 160 match the driver to the device. For example:
161 # echo "046d c315" > /sys/bus/usb/drivers/foo/remove_id 161 # echo "046d c315" > /sys/bus/usb/drivers/foo/remove_id
162
163What: /sys/bus/usb/device/.../avoid_reset
164Date: December 2009
165Contact: Oliver Neukum <oliver@neukum.org>
166Description:
167 Writing 1 to this file tells the kernel that this
168 device will morph into another mode when it is reset.
169 Drivers will not use reset for error handling for
170 such devices.
171Users:
172 usb_modeswitch
diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power
new file mode 100644
index 000000000000..6123c523bfd7
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-power
@@ -0,0 +1,79 @@
1What: /sys/devices/.../power/
2Date: January 2009
3Contact: Rafael J. Wysocki <rjw@sisk.pl>
4Description:
5 The /sys/devices/.../power directory contains attributes
6 allowing the user space to check and modify some power
7 management related properties of given device.
8
9What: /sys/devices/.../power/wakeup
10Date: January 2009
11Contact: Rafael J. Wysocki <rjw@sisk.pl>
12Description:
13 The /sys/devices/.../power/wakeup attribute allows the user
14 space to check if the device is enabled to wake up the system
15 from sleep states, such as the memory sleep state (suspend to
16 RAM) and hibernation (suspend to disk), and to enable or disable
17 it to do that as desired.
18
19 Some devices support "wakeup" events, which are hardware signals
20 used to activate the system from a sleep state. Such devices
21 have one of the following two values for the sysfs power/wakeup
22 file:
23
24 + "enabled\n" to issue the events;
25 + "disabled\n" not to do so;
26
27 In that cases the user space can change the setting represented
28 by the contents of this file by writing either "enabled", or
29 "disabled" to it.
30
31 For the devices that are not capable of generating system wakeup
32 events this file contains "\n". In that cases the user space
33 cannot modify the contents of this file and the device cannot be
34 enabled to wake up the system.
35
36What: /sys/devices/.../power/control
37Date: January 2009
38Contact: Rafael J. Wysocki <rjw@sisk.pl>
39Description:
40 The /sys/devices/.../power/control attribute allows the user
41 space to control the run-time power management of the device.
42
43 All devices have one of the following two values for the
44 power/control file:
45
46 + "auto\n" to allow the device to be power managed at run time;
47 + "on\n" to prevent the device from being power managed;
48
49 The default for all devices is "auto", which means that they may
50 be subject to automatic power management, depending on their
51 drivers. Changing this attribute to "on" prevents the driver
52 from power managing the device at run time. Doing that while
53 the device is suspended causes it to be woken up.
54
55What: /sys/devices/.../power/async
56Date: January 2009
57Contact: Rafael J. Wysocki <rjw@sisk.pl>
58Description:
59 The /sys/devices/.../async attribute allows the user space to
60 enable or diasble the device's suspend and resume callbacks to
61 be executed asynchronously (ie. in separate threads, in parallel
62 with the main suspend/resume thread) during system-wide power
63 transitions (eg. suspend to RAM, hibernation).
64
65 All devices have one of the following two values for the
66 power/async file:
67
68 + "enabled\n" to permit the asynchronous suspend/resume;
69 + "disabled\n" to forbid it;
70
71 The value of this attribute may be changed by writing either
72 "enabled", or "disabled" to it.
73
74 It generally is unsafe to permit the asynchronous suspend/resume
75 of a device unless it is certain that all of the PM dependencies
76 of the device are known to the PM core. However, for some
77 devices this attribute is set to "enabled" by bus type code or
78 device drivers and in that cases it should be safe to leave the
79 default value.
diff --git a/Documentation/ABI/testing/sysfs-platform-asus-laptop b/Documentation/ABI/testing/sysfs-platform-asus-laptop
index a1cb660c50cf..1d775390e856 100644
--- a/Documentation/ABI/testing/sysfs-platform-asus-laptop
+++ b/Documentation/ABI/testing/sysfs-platform-asus-laptop
@@ -1,4 +1,4 @@
1What: /sys/devices/platform/asus-laptop/display 1What: /sys/devices/platform/asus_laptop/display
2Date: January 2007 2Date: January 2007
3KernelVersion: 2.6.20 3KernelVersion: 2.6.20
4Contact: "Corentin Chary" <corentincj@iksaif.net> 4Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -13,7 +13,7 @@ Description:
13 Ex: - 0 (0000b) means no display 13 Ex: - 0 (0000b) means no display
14 - 3 (0011b) CRT+LCD. 14 - 3 (0011b) CRT+LCD.
15 15
16What: /sys/devices/platform/asus-laptop/gps 16What: /sys/devices/platform/asus_laptop/gps
17Date: January 2007 17Date: January 2007
18KernelVersion: 2.6.20 18KernelVersion: 2.6.20
19Contact: "Corentin Chary" <corentincj@iksaif.net> 19Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -21,7 +21,7 @@ Description:
21 Control the gps device. 1 means on, 0 means off. 21 Control the gps device. 1 means on, 0 means off.
22Users: Lapsus 22Users: Lapsus
23 23
24What: /sys/devices/platform/asus-laptop/ledd 24What: /sys/devices/platform/asus_laptop/ledd
25Date: January 2007 25Date: January 2007
26KernelVersion: 2.6.20 26KernelVersion: 2.6.20
27Contact: "Corentin Chary" <corentincj@iksaif.net> 27Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -29,11 +29,11 @@ Description:
29 Some models like the W1N have a LED display that can be 29 Some models like the W1N have a LED display that can be
30 used to display several informations. 30 used to display several informations.
31 To control the LED display, use the following : 31 To control the LED display, use the following :
32 echo 0x0T000DDD > /sys/devices/platform/asus-laptop/ 32 echo 0x0T000DDD > /sys/devices/platform/asus_laptop/
33 where T control the 3 letters display, and DDD the 3 digits display. 33 where T control the 3 letters display, and DDD the 3 digits display.
34 The DDD table can be found in Documentation/laptops/asus-laptop.txt 34 The DDD table can be found in Documentation/laptops/asus-laptop.txt
35 35
36What: /sys/devices/platform/asus-laptop/bluetooth 36What: /sys/devices/platform/asus_laptop/bluetooth
37Date: January 2007 37Date: January 2007
38KernelVersion: 2.6.20 38KernelVersion: 2.6.20
39Contact: "Corentin Chary" <corentincj@iksaif.net> 39Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -42,7 +42,7 @@ Description:
42 This may control the led, the device or both. 42 This may control the led, the device or both.
43Users: Lapsus 43Users: Lapsus
44 44
45What: /sys/devices/platform/asus-laptop/wlan 45What: /sys/devices/platform/asus_laptop/wlan
46Date: January 2007 46Date: January 2007
47KernelVersion: 2.6.20 47KernelVersion: 2.6.20
48Contact: "Corentin Chary" <corentincj@iksaif.net> 48Contact: "Corentin Chary" <corentincj@iksaif.net>
diff --git a/Documentation/ABI/testing/sysfs-platform-eeepc-laptop b/Documentation/ABI/testing/sysfs-platform-eeepc-laptop
index 7445dfb321b5..5b026c69587a 100644
--- a/Documentation/ABI/testing/sysfs-platform-eeepc-laptop
+++ b/Documentation/ABI/testing/sysfs-platform-eeepc-laptop
@@ -1,4 +1,4 @@
1What: /sys/devices/platform/eeepc-laptop/disp 1What: /sys/devices/platform/eeepc/disp
2Date: May 2008 2Date: May 2008
3KernelVersion: 2.6.26 3KernelVersion: 2.6.26
4Contact: "Corentin Chary" <corentincj@iksaif.net> 4Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -9,21 +9,21 @@ Description:
9 - 3 = LCD+CRT 9 - 3 = LCD+CRT
10 If you run X11, you should use xrandr instead. 10 If you run X11, you should use xrandr instead.
11 11
12What: /sys/devices/platform/eeepc-laptop/camera 12What: /sys/devices/platform/eeepc/camera
13Date: May 2008 13Date: May 2008
14KernelVersion: 2.6.26 14KernelVersion: 2.6.26
15Contact: "Corentin Chary" <corentincj@iksaif.net> 15Contact: "Corentin Chary" <corentincj@iksaif.net>
16Description: 16Description:
17 Control the camera. 1 means on, 0 means off. 17 Control the camera. 1 means on, 0 means off.
18 18
19What: /sys/devices/platform/eeepc-laptop/cardr 19What: /sys/devices/platform/eeepc/cardr
20Date: May 2008 20Date: May 2008
21KernelVersion: 2.6.26 21KernelVersion: 2.6.26
22Contact: "Corentin Chary" <corentincj@iksaif.net> 22Contact: "Corentin Chary" <corentincj@iksaif.net>
23Description: 23Description:
24 Control the card reader. 1 means on, 0 means off. 24 Control the card reader. 1 means on, 0 means off.
25 25
26What: /sys/devices/platform/eeepc-laptop/cpufv 26What: /sys/devices/platform/eeepc/cpufv
27Date: Jun 2009 27Date: Jun 2009
28KernelVersion: 2.6.31 28KernelVersion: 2.6.31
29Contact: "Corentin Chary" <corentincj@iksaif.net> 29Contact: "Corentin Chary" <corentincj@iksaif.net>
@@ -42,7 +42,7 @@ Description:
42 `------------ Availables modes 42 `------------ Availables modes
43 For example, 0x301 means: mode 1 selected, 3 available modes. 43 For example, 0x301 means: mode 1 selected, 3 available modes.
44 44
45What: /sys/devices/platform/eeepc-laptop/available_cpufv 45What: /sys/devices/platform/eeepc/available_cpufv
46Date: Jun 2009 46Date: Jun 2009
47KernelVersion: 2.6.31 47KernelVersion: 2.6.31
48Contact: "Corentin Chary" <corentincj@iksaif.net> 48Contact: "Corentin Chary" <corentincj@iksaif.net>
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
index dcff4d0623ad..d6a801f45b48 100644
--- a/Documentation/ABI/testing/sysfs-power
+++ b/Documentation/ABI/testing/sysfs-power
@@ -101,3 +101,16 @@ Description:
101 101
102 CAUTION: Using it will cause your machine's real-time (CMOS) 102 CAUTION: Using it will cause your machine's real-time (CMOS)
103 clock to be set to a random invalid time after a resume. 103 clock to be set to a random invalid time after a resume.
104
105What: /sys/power/pm_async
106Date: January 2009
107Contact: Rafael J. Wysocki <rjw@sisk.pl>
108Description:
109 The /sys/power/pm_async file controls the switch allowing the
110 user space to enable or disable asynchronous suspend and resume
111 of devices. If enabled, this feature will cause some device
112 drivers' suspend and resume callbacks to be executed in parallel
113 with each other and with the main suspend thread. It is enabled
114 if this file contains "1", which is the default. It may be
115 disabled by writing "0" to this file, in which case all devices
116 will be suspended and resumed synchronously.
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 5aceb88b3f8b..05e2ae236865 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -4,20 +4,18 @@
4 James E.J. Bottomley <James.Bottomley@HansenPartnership.com> 4 James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
5 5
6This document describes the DMA API. For a more gentle introduction 6This document describes the DMA API. For a more gentle introduction
7phrased in terms of the pci_ equivalents (and actual examples) see 7of the API (and actual examples) see
8Documentation/PCI/PCI-DMA-mapping.txt. 8Documentation/DMA-API-HOWTO.txt.
9 9
10This API is split into two pieces. Part I describes the API and the 10This API is split into two pieces. Part I describes the API. Part II
11corresponding pci_ API. Part II describes the extensions to the API 11describes the extensions to the API for supporting non-consistent
12for supporting non-consistent memory machines. Unless you know that 12memory machines. Unless you know that your driver absolutely has to
13your driver absolutely has to support non-consistent platforms (this 13support non-consistent platforms (this is usually only legacy
14is usually only legacy platforms) you should only use the API 14platforms) you should only use the API described in part I.
15described in part I.
16 15
17Part I - pci_ and dma_ Equivalent API 16Part I - dma_ API
18------------------------------------- 17-------------------------------------
19 18
20To get the pci_ API, you must #include <linux/pci.h>
21To get the dma_ API, you must #include <linux/dma-mapping.h> 19To get the dma_ API, you must #include <linux/dma-mapping.h>
22 20
23 21
@@ -27,9 +25,6 @@ Part Ia - Using large dma-coherent buffers
27void * 25void *
28dma_alloc_coherent(struct device *dev, size_t size, 26dma_alloc_coherent(struct device *dev, size_t size,
29 dma_addr_t *dma_handle, gfp_t flag) 27 dma_addr_t *dma_handle, gfp_t flag)
30void *
31pci_alloc_consistent(struct pci_dev *dev, size_t size,
32 dma_addr_t *dma_handle)
33 28
34Consistent memory is memory for which a write by either the device or 29Consistent memory is memory for which a write by either the device or
35the processor can immediately be read by the processor or device 30the processor can immediately be read by the processor or device
@@ -53,15 +48,11 @@ The simplest way to do that is to use the dma_pool calls (see below).
53The flag parameter (dma_alloc_coherent only) allows the caller to 48The flag parameter (dma_alloc_coherent only) allows the caller to
54specify the GFP_ flags (see kmalloc) for the allocation (the 49specify the GFP_ flags (see kmalloc) for the allocation (the
55implementation may choose to ignore flags that affect the location of 50implementation may choose to ignore flags that affect the location of
56the returned memory, like GFP_DMA). For pci_alloc_consistent, you 51the returned memory, like GFP_DMA).
57must assume GFP_ATOMIC behaviour.
58 52
59void 53void
60dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, 54dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
61 dma_addr_t dma_handle) 55 dma_addr_t dma_handle)
62void
63pci_free_consistent(struct pci_dev *dev, size_t size, void *cpu_addr,
64 dma_addr_t dma_handle)
65 56
66Free the region of consistent memory you previously allocated. dev, 57Free the region of consistent memory you previously allocated. dev,
67size and dma_handle must all be the same as those passed into the 58size and dma_handle must all be the same as those passed into the
@@ -89,10 +80,6 @@ for alignment, like queue heads needing to be aligned on N-byte boundaries.
89 dma_pool_create(const char *name, struct device *dev, 80 dma_pool_create(const char *name, struct device *dev,
90 size_t size, size_t align, size_t alloc); 81 size_t size, size_t align, size_t alloc);
91 82
92 struct pci_pool *
93 pci_pool_create(const char *name, struct pci_device *dev,
94 size_t size, size_t align, size_t alloc);
95
96The pool create() routines initialize a pool of dma-coherent buffers 83The pool create() routines initialize a pool of dma-coherent buffers
97for use with a given device. It must be called in a context which 84for use with a given device. It must be called in a context which
98can sleep. 85can sleep.
@@ -108,9 +95,6 @@ from this pool must not cross 4KByte boundaries.
108 void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags, 95 void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags,
109 dma_addr_t *dma_handle); 96 dma_addr_t *dma_handle);
110 97
111 void *pci_pool_alloc(struct pci_pool *pool, gfp_t gfp_flags,
112 dma_addr_t *dma_handle);
113
114This allocates memory from the pool; the returned memory will meet the size 98This allocates memory from the pool; the returned memory will meet the size
115and alignment requirements specified at creation time. Pass GFP_ATOMIC to 99and alignment requirements specified at creation time. Pass GFP_ATOMIC to
116prevent blocking, or if it's permitted (not in_interrupt, not holding SMP locks), 100prevent blocking, or if it's permitted (not in_interrupt, not holding SMP locks),
@@ -122,9 +106,6 @@ pool's device.
122 void dma_pool_free(struct dma_pool *pool, void *vaddr, 106 void dma_pool_free(struct dma_pool *pool, void *vaddr,
123 dma_addr_t addr); 107 dma_addr_t addr);
124 108
125 void pci_pool_free(struct pci_pool *pool, void *vaddr,
126 dma_addr_t addr);
127
128This puts memory back into the pool. The pool is what was passed to 109This puts memory back into the pool. The pool is what was passed to
129the pool allocation routine; the cpu (vaddr) and dma addresses are what 110the pool allocation routine; the cpu (vaddr) and dma addresses are what
130were returned when that routine allocated the memory being freed. 111were returned when that routine allocated the memory being freed.
@@ -132,8 +113,6 @@ were returned when that routine allocated the memory being freed.
132 113
133 void dma_pool_destroy(struct dma_pool *pool); 114 void dma_pool_destroy(struct dma_pool *pool);
134 115
135 void pci_pool_destroy(struct pci_pool *pool);
136
137The pool destroy() routines free the resources of the pool. They must be 116The pool destroy() routines free the resources of the pool. They must be
138called in a context which can sleep. Make sure you've freed all allocated 117called in a context which can sleep. Make sure you've freed all allocated
139memory back to the pool before you destroy it. 118memory back to the pool before you destroy it.
@@ -144,8 +123,6 @@ Part Ic - DMA addressing limitations
144 123
145int 124int
146dma_supported(struct device *dev, u64 mask) 125dma_supported(struct device *dev, u64 mask)
147int
148pci_dma_supported(struct pci_dev *hwdev, u64 mask)
149 126
150Checks to see if the device can support DMA to the memory described by 127Checks to see if the device can support DMA to the memory described by
151mask. 128mask.
@@ -159,8 +136,14 @@ driver writers.
159 136
160int 137int
161dma_set_mask(struct device *dev, u64 mask) 138dma_set_mask(struct device *dev, u64 mask)
139
140Checks to see if the mask is possible and updates the device
141parameters if it is.
142
143Returns: 0 if successful and a negative error if not.
144
162int 145int
163pci_set_dma_mask(struct pci_device *dev, u64 mask) 146dma_set_coherent_mask(struct device *dev, u64 mask)
164 147
165Checks to see if the mask is possible and updates the device 148Checks to see if the mask is possible and updates the device
166parameters if it is. 149parameters if it is.
@@ -187,9 +170,6 @@ Part Id - Streaming DMA mappings
187dma_addr_t 170dma_addr_t
188dma_map_single(struct device *dev, void *cpu_addr, size_t size, 171dma_map_single(struct device *dev, void *cpu_addr, size_t size,
189 enum dma_data_direction direction) 172 enum dma_data_direction direction)
190dma_addr_t
191pci_map_single(struct pci_dev *hwdev, void *cpu_addr, size_t size,
192 int direction)
193 173
194Maps a piece of processor virtual memory so it can be accessed by the 174Maps a piece of processor virtual memory so it can be accessed by the
195device and returns the physical handle of the memory. 175device and returns the physical handle of the memory.
@@ -198,14 +178,10 @@ The direction for both api's may be converted freely by casting.
198However the dma_ API uses a strongly typed enumerator for its 178However the dma_ API uses a strongly typed enumerator for its
199direction: 179direction:
200 180
201DMA_NONE = PCI_DMA_NONE no direction (used for 181DMA_NONE no direction (used for debugging)
202 debugging) 182DMA_TO_DEVICE data is going from the memory to the device
203DMA_TO_DEVICE = PCI_DMA_TODEVICE data is going from the 183DMA_FROM_DEVICE data is coming from the device to the memory
204 memory to the device 184DMA_BIDIRECTIONAL direction isn't known
205DMA_FROM_DEVICE = PCI_DMA_FROMDEVICE data is coming from
206 the device to the
207 memory
208DMA_BIDIRECTIONAL = PCI_DMA_BIDIRECTIONAL direction isn't known
209 185
210Notes: Not all memory regions in a machine can be mapped by this 186Notes: Not all memory regions in a machine can be mapped by this
211API. Further, regions that appear to be physically contiguous in 187API. Further, regions that appear to be physically contiguous in
@@ -268,9 +244,6 @@ cache lines are updated with data that the device may have changed).
268void 244void
269dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size, 245dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
270 enum dma_data_direction direction) 246 enum dma_data_direction direction)
271void
272pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr,
273 size_t size, int direction)
274 247
275Unmaps the region previously mapped. All the parameters passed in 248Unmaps the region previously mapped. All the parameters passed in
276must be identical to those passed in (and returned) by the mapping 249must be identical to those passed in (and returned) by the mapping
@@ -280,15 +253,9 @@ dma_addr_t
280dma_map_page(struct device *dev, struct page *page, 253dma_map_page(struct device *dev, struct page *page,
281 unsigned long offset, size_t size, 254 unsigned long offset, size_t size,
282 enum dma_data_direction direction) 255 enum dma_data_direction direction)
283dma_addr_t
284pci_map_page(struct pci_dev *hwdev, struct page *page,
285 unsigned long offset, size_t size, int direction)
286void 256void
287dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, 257dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
288 enum dma_data_direction direction) 258 enum dma_data_direction direction)
289void
290pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address,
291 size_t size, int direction)
292 259
293API for mapping and unmapping for pages. All the notes and warnings 260API for mapping and unmapping for pages. All the notes and warnings
294for the other mapping APIs apply here. Also, although the <offset> 261for the other mapping APIs apply here. Also, although the <offset>
@@ -299,9 +266,6 @@ cache width is.
299int 266int
300dma_mapping_error(struct device *dev, dma_addr_t dma_addr) 267dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
301 268
302int
303pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr)
304
305In some circumstances dma_map_single and dma_map_page will fail to create 269In some circumstances dma_map_single and dma_map_page will fail to create
306a mapping. A driver can check for these errors by testing the returned 270a mapping. A driver can check for these errors by testing the returned
307dma address with dma_mapping_error(). A non-zero return value means the mapping 271dma address with dma_mapping_error(). A non-zero return value means the mapping
@@ -311,9 +275,6 @@ reduce current DMA mapping usage or delay and try again later).
311 int 275 int
312 dma_map_sg(struct device *dev, struct scatterlist *sg, 276 dma_map_sg(struct device *dev, struct scatterlist *sg,
313 int nents, enum dma_data_direction direction) 277 int nents, enum dma_data_direction direction)
314 int
315 pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
316 int nents, int direction)
317 278
318Returns: the number of physical segments mapped (this may be shorter 279Returns: the number of physical segments mapped (this may be shorter
319than <nents> passed in if some elements of the scatter/gather list are 280than <nents> passed in if some elements of the scatter/gather list are
@@ -353,9 +314,6 @@ accessed sg->address and sg->length as shown above.
353 void 314 void
354 dma_unmap_sg(struct device *dev, struct scatterlist *sg, 315 dma_unmap_sg(struct device *dev, struct scatterlist *sg,
355 int nhwentries, enum dma_data_direction direction) 316 int nhwentries, enum dma_data_direction direction)
356 void
357 pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
358 int nents, int direction)
359 317
360Unmap the previously mapped scatter/gather list. All the parameters 318Unmap the previously mapped scatter/gather list. All the parameters
361must be the same as those and passed in to the scatter/gather mapping 319must be the same as those and passed in to the scatter/gather mapping
@@ -365,21 +323,23 @@ Note: <nents> must be the number you passed in, *not* the number of
365physical entries returned. 323physical entries returned.
366 324
367void 325void
368dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size, 326dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size,
369 enum dma_data_direction direction) 327 enum dma_data_direction direction)
370void 328void
371pci_dma_sync_single(struct pci_dev *hwdev, dma_addr_t dma_handle, 329dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, size_t size,
372 size_t size, int direction) 330 enum dma_data_direction direction)
373void 331void
374dma_sync_sg(struct device *dev, struct scatterlist *sg, int nelems, 332dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems,
375 enum dma_data_direction direction) 333 enum dma_data_direction direction)
376void 334void
377pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg, 335dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems,
378 int nelems, int direction) 336 enum dma_data_direction direction)
379 337
380Synchronise a single contiguous or scatter/gather mapping. All the 338Synchronise a single contiguous or scatter/gather mapping for the cpu
381parameters must be the same as those passed into the single mapping 339and device. With the sync_sg API, all the parameters must be the same
382API. 340as those passed into the single mapping API. With the sync_single API,
341you can use dma_handle and size parameters that aren't identical to
342those passed into the single mapping API to do a partial sync.
383 343
384Notes: You must do this: 344Notes: You must do this:
385 345
@@ -461,9 +421,9 @@ void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr,
461Part II - Advanced dma_ usage 421Part II - Advanced dma_ usage
462----------------------------- 422-----------------------------
463 423
464Warning: These pieces of the DMA API have no PCI equivalent. They 424Warning: These pieces of the DMA API should not be used in the
465should also not be used in the majority of cases, since they cater for 425majority of cases, since they cater for unlikely corner cases that
466unlikely corner cases that don't belong in usual drivers. 426don't belong in usual drivers.
467 427
468If you don't understand how cache line coherency works between a 428If you don't understand how cache line coherency works between a
469processor and an I/O device, you should not be using this part of the 429processor and an I/O device, you should not be using this part of the
@@ -514,16 +474,6 @@ into the width returned by this call. It will also always be a power
514of two for easy alignment. 474of two for easy alignment.
515 475
516void 476void
517dma_sync_single_range(struct device *dev, dma_addr_t dma_handle,
518 unsigned long offset, size_t size,
519 enum dma_data_direction direction)
520
521Does a partial sync, starting at offset and continuing for size. You
522must be careful to observe the cache alignment and width when doing
523anything like this. You must also be extra careful about accessing
524memory you intend to sync partially.
525
526void
527dma_cache_sync(struct device *dev, void *vaddr, size_t size, 477dma_cache_sync(struct device *dev, void *vaddr, size_t size,
528 enum dma_data_direction direction) 478 enum dma_data_direction direction)
529 479
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl
index f9a6e2c75f12..1b2dd4fc3db2 100644
--- a/Documentation/DocBook/device-drivers.tmpl
+++ b/Documentation/DocBook/device-drivers.tmpl
@@ -45,7 +45,7 @@
45 </sect1> 45 </sect1>
46 46
47 <sect1><title>Atomic and pointer manipulation</title> 47 <sect1><title>Atomic and pointer manipulation</title>
48!Iarch/x86/include/asm/atomic_32.h 48!Iarch/x86/include/asm/atomic.h
49!Iarch/x86/include/asm/unaligned.h 49!Iarch/x86/include/asm/unaligned.h
50 </sect1> 50 </sect1>
51 51
diff --git a/Documentation/DocBook/deviceiobook.tmpl b/Documentation/DocBook/deviceiobook.tmpl
index 3ed88126ab8f..c1ed6a49e598 100644
--- a/Documentation/DocBook/deviceiobook.tmpl
+++ b/Documentation/DocBook/deviceiobook.tmpl
@@ -316,7 +316,7 @@ CPU B: spin_unlock_irqrestore(&amp;dev_lock, flags)
316 316
317 <chapter id="pubfunctions"> 317 <chapter id="pubfunctions">
318 <title>Public Functions Provided</title> 318 <title>Public Functions Provided</title>
319!Iarch/x86/include/asm/io_32.h 319!Iarch/x86/include/asm/io.h
320!Elib/iomap.c 320!Elib/iomap.c
321 </chapter> 321 </chapter>
322 322
diff --git a/Documentation/DocBook/mac80211.tmpl b/Documentation/DocBook/mac80211.tmpl
index f3f37f141dbd..affb15a344a1 100644
--- a/Documentation/DocBook/mac80211.tmpl
+++ b/Documentation/DocBook/mac80211.tmpl
@@ -144,7 +144,7 @@ usage should require reading the full document.
144 this though and the recommendation to allow only a single 144 this though and the recommendation to allow only a single
145 interface in STA mode at first! 145 interface in STA mode at first!
146 </para> 146 </para>
147!Finclude/net/mac80211.h ieee80211_if_init_conf 147!Finclude/net/mac80211.h ieee80211_vif
148 </chapter> 148 </chapter>
149 149
150 <chapter id="rx-tx"> 150 <chapter id="rx-tx">
@@ -234,7 +234,6 @@ usage should require reading the full document.
234 <title>Multiple queues and QoS support</title> 234 <title>Multiple queues and QoS support</title>
235 <para>TBD</para> 235 <para>TBD</para>
236!Finclude/net/mac80211.h ieee80211_tx_queue_params 236!Finclude/net/mac80211.h ieee80211_tx_queue_params
237!Finclude/net/mac80211.h ieee80211_tx_queue_stats
238 </chapter> 237 </chapter>
239 238
240 <chapter id="AP"> 239 <chapter id="AP">
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl
index 5e7d84b48505..133cd6c3f3c1 100644
--- a/Documentation/DocBook/mtdnand.tmpl
+++ b/Documentation/DocBook/mtdnand.tmpl
@@ -488,7 +488,7 @@ static void board_select_chip (struct mtd_info *mtd, int chip)
488 The ECC bytes must be placed immidiately after the data 488 The ECC bytes must be placed immidiately after the data
489 bytes in order to make the syndrome generator work. This 489 bytes in order to make the syndrome generator work. This
490 is contrary to the usual layout used by software ECC. The 490 is contrary to the usual layout used by software ECC. The
491 seperation of data and out of band area is not longer 491 separation of data and out of band area is not longer
492 possible. The nand driver code handles this layout and 492 possible. The nand driver code handles this layout and
493 the remaining free bytes in the oob area are managed by 493 the remaining free bytes in the oob area are managed by
494 the autoplacement code. Provide a matching oob-layout 494 the autoplacement code. Provide a matching oob-layout
@@ -560,7 +560,7 @@ static void board_select_chip (struct mtd_info *mtd, int chip)
560 bad blocks. They have factory marked good blocks. The marker pattern 560 bad blocks. They have factory marked good blocks. The marker pattern
561 is erased when the block is erased to be reused. So in case of 561 is erased when the block is erased to be reused. So in case of
562 powerloss before writing the pattern back to the chip this block 562 powerloss before writing the pattern back to the chip this block
563 would be lost and added to the bad blocks. Therefor we scan the 563 would be lost and added to the bad blocks. Therefore we scan the
564 chip(s) when we detect them the first time for good blocks and 564 chip(s) when we detect them the first time for good blocks and
565 store this information in a bad block table before erasing any 565 store this information in a bad block table before erasing any
566 of the blocks. 566 of the blocks.
@@ -1094,7 +1094,7 @@ in this page</entry>
1094 manufacturers specifications. This applies similar to the spare area. 1094 manufacturers specifications. This applies similar to the spare area.
1095 </para> 1095 </para>
1096 <para> 1096 <para>
1097 Therefor NAND aware filesystems must either write in page size chunks 1097 Therefore NAND aware filesystems must either write in page size chunks
1098 or hold a writebuffer to collect smaller writes until they sum up to 1098 or hold a writebuffer to collect smaller writes until they sum up to
1099 pagesize. Available NAND aware filesystems: JFFS2, YAFFS. 1099 pagesize. Available NAND aware filesystems: JFFS2, YAFFS.
1100 </para> 1100 </para>
diff --git a/Documentation/DocBook/v4l/common.xml b/Documentation/DocBook/v4l/common.xml
index c65f0ac9b6ee..cea23e1c4fc6 100644
--- a/Documentation/DocBook/v4l/common.xml
+++ b/Documentation/DocBook/v4l/common.xml
@@ -1170,7 +1170,7 @@ frames per second. If less than this number of frames is to be
1170captured or output, applications can request frame skipping or 1170captured or output, applications can request frame skipping or
1171duplicating on the driver side. This is especially useful when using 1171duplicating on the driver side. This is especially useful when using
1172the &func-read; or &func-write;, which are not augmented by timestamps 1172the &func-read; or &func-write;, which are not augmented by timestamps
1173or sequence counters, and to avoid unneccessary data copying.</para> 1173or sequence counters, and to avoid unnecessary data copying.</para>
1174 1174
1175 <para>Finally these ioctls can be used to determine the number of 1175 <para>Finally these ioctls can be used to determine the number of
1176buffers used internally by a driver in read/write mode. For 1176buffers used internally by a driver in read/write mode. For
diff --git a/Documentation/DocBook/v4l/io.xml b/Documentation/DocBook/v4l/io.xml
index f92f24323b2a..e870330cbf77 100644
--- a/Documentation/DocBook/v4l/io.xml
+++ b/Documentation/DocBook/v4l/io.xml
@@ -589,7 +589,8 @@ number of a video input as in &v4l2-input; field
589 <entry></entry> 589 <entry></entry>
590 <entry>A place holder for future extensions and custom 590 <entry>A place holder for future extensions and custom
591(driver defined) buffer types 591(driver defined) buffer types
592<constant>V4L2_BUF_TYPE_PRIVATE</constant> and higher.</entry> 592<constant>V4L2_BUF_TYPE_PRIVATE</constant> and higher. Applications
593should set this to 0.</entry>
593 </row> 594 </row>
594 </tbody> 595 </tbody>
595 </tgroup> 596 </tgroup>
diff --git a/Documentation/DocBook/v4l/vidioc-g-parm.xml b/Documentation/DocBook/v4l/vidioc-g-parm.xml
index 78332d365ce9..392aa9e5571e 100644
--- a/Documentation/DocBook/v4l/vidioc-g-parm.xml
+++ b/Documentation/DocBook/v4l/vidioc-g-parm.xml
@@ -55,7 +55,7 @@ captured or output, applications can request frame skipping or
55duplicating on the driver side. This is especially useful when using 55duplicating on the driver side. This is especially useful when using
56the <function>read()</function> or <function>write()</function>, which 56the <function>read()</function> or <function>write()</function>, which
57are not augmented by timestamps or sequence counters, and to avoid 57are not augmented by timestamps or sequence counters, and to avoid
58unneccessary data copying.</para> 58unnecessary data copying.</para>
59 59
60 <para>Further these ioctls can be used to determine the number of 60 <para>Further these ioctls can be used to determine the number of
61buffers used internally by a driver in read/write mode. For 61buffers used internally by a driver in read/write mode. For
diff --git a/Documentation/DocBook/v4l/vidioc-qbuf.xml b/Documentation/DocBook/v4l/vidioc-qbuf.xml
index 187081778154..b843bd7b3897 100644
--- a/Documentation/DocBook/v4l/vidioc-qbuf.xml
+++ b/Documentation/DocBook/v4l/vidioc-qbuf.xml
@@ -54,12 +54,10 @@ to enqueue an empty (capturing) or filled (output) buffer in the
54driver's incoming queue. The semantics depend on the selected I/O 54driver's incoming queue. The semantics depend on the selected I/O
55method.</para> 55method.</para>
56 56
57 <para>To enqueue a <link linkend="mmap">memory mapped</link> 57 <para>To enqueue a buffer applications set the <structfield>type</structfield>
58buffer applications set the <structfield>type</structfield> field of a 58field of a &v4l2-buffer; to the same buffer type as was previously used
59&v4l2-buffer; to the same buffer type as previously &v4l2-format; 59with &v4l2-format; <structfield>type</structfield> and &v4l2-requestbuffers;
60<structfield>type</structfield> and &v4l2-requestbuffers; 60<structfield>type</structfield>. Applications must also set the
61<structfield>type</structfield>, the <structfield>memory</structfield>
62field to <constant>V4L2_MEMORY_MMAP</constant> and the
63<structfield>index</structfield> field. Valid index numbers range from 61<structfield>index</structfield> field. Valid index numbers range from
64zero to the number of buffers allocated with &VIDIOC-REQBUFS; 62zero to the number of buffers allocated with &VIDIOC-REQBUFS;
65(&v4l2-requestbuffers; <structfield>count</structfield>) minus one. The 63(&v4l2-requestbuffers; <structfield>count</structfield>) minus one. The
@@ -70,8 +68,19 @@ intended for output (<structfield>type</structfield> is
70<constant>V4L2_BUF_TYPE_VBI_OUTPUT</constant>) applications must also 68<constant>V4L2_BUF_TYPE_VBI_OUTPUT</constant>) applications must also
71initialize the <structfield>bytesused</structfield>, 69initialize the <structfield>bytesused</structfield>,
72<structfield>field</structfield> and 70<structfield>field</structfield> and
73<structfield>timestamp</structfield> fields. See <xref 71<structfield>timestamp</structfield> fields, see <xref
74 linkend="buffer" /> for details. When 72linkend="buffer" /> for details.
73Applications must also set <structfield>flags</structfield> to 0. If a driver
74supports capturing from specific video inputs and you want to specify a video
75input, then <structfield>flags</structfield> should be set to
76<constant>V4L2_BUF_FLAG_INPUT</constant> and the field
77<structfield>input</structfield> must be initialized to the desired input.
78The <structfield>reserved</structfield> field must be set to 0.
79</para>
80
81 <para>To enqueue a <link linkend="mmap">memory mapped</link>
82buffer applications set the <structfield>memory</structfield>
83field to <constant>V4L2_MEMORY_MMAP</constant>. When
75<constant>VIDIOC_QBUF</constant> is called with a pointer to this 84<constant>VIDIOC_QBUF</constant> is called with a pointer to this
76structure the driver sets the 85structure the driver sets the
77<constant>V4L2_BUF_FLAG_MAPPED</constant> and 86<constant>V4L2_BUF_FLAG_MAPPED</constant> and
@@ -81,14 +90,10 @@ structure the driver sets the
81&EINVAL;.</para> 90&EINVAL;.</para>
82 91
83 <para>To enqueue a <link linkend="userp">user pointer</link> 92 <para>To enqueue a <link linkend="userp">user pointer</link>
84buffer applications set the <structfield>type</structfield> field of a 93buffer applications set the <structfield>memory</structfield>
85&v4l2-buffer; to the same buffer type as previously &v4l2-format; 94field to <constant>V4L2_MEMORY_USERPTR</constant>, the
86<structfield>type</structfield> and &v4l2-requestbuffers;
87<structfield>type</structfield>, the <structfield>memory</structfield>
88field to <constant>V4L2_MEMORY_USERPTR</constant> and the
89<structfield>m.userptr</structfield> field to the address of the 95<structfield>m.userptr</structfield> field to the address of the
90buffer and <structfield>length</structfield> to its size. When the 96buffer and <structfield>length</structfield> to its size.
91buffer is intended for output additional fields must be set as above.
92When <constant>VIDIOC_QBUF</constant> is called with a pointer to this 97When <constant>VIDIOC_QBUF</constant> is called with a pointer to this
93structure the driver sets the <constant>V4L2_BUF_FLAG_QUEUED</constant> 98structure the driver sets the <constant>V4L2_BUF_FLAG_QUEUED</constant>
94flag and clears the <constant>V4L2_BUF_FLAG_MAPPED</constant> and 99flag and clears the <constant>V4L2_BUF_FLAG_MAPPED</constant> and
@@ -96,13 +101,14 @@ flag and clears the <constant>V4L2_BUF_FLAG_MAPPED</constant> and
96<structfield>flags</structfield> field, or it returns an error code. 101<structfield>flags</structfield> field, or it returns an error code.
97This ioctl locks the memory pages of the buffer in physical memory, 102This ioctl locks the memory pages of the buffer in physical memory,
98they cannot be swapped out to disk. Buffers remain locked until 103they cannot be swapped out to disk. Buffers remain locked until
99dequeued, until the &VIDIOC-STREAMOFF; or &VIDIOC-REQBUFS; ioctl are 104dequeued, until the &VIDIOC-STREAMOFF; or &VIDIOC-REQBUFS; ioctl is
100called, or until the device is closed.</para> 105called, or until the device is closed.</para>
101 106
102 <para>Applications call the <constant>VIDIOC_DQBUF</constant> 107 <para>Applications call the <constant>VIDIOC_DQBUF</constant>
103ioctl to dequeue a filled (capturing) or displayed (output) buffer 108ioctl to dequeue a filled (capturing) or displayed (output) buffer
104from the driver's outgoing queue. They just set the 109from the driver's outgoing queue. They just set the
105<structfield>type</structfield> and <structfield>memory</structfield> 110<structfield>type</structfield>, <structfield>memory</structfield>
111and <structfield>reserved</structfield>
106fields of a &v4l2-buffer; as above, when <constant>VIDIOC_DQBUF</constant> 112fields of a &v4l2-buffer; as above, when <constant>VIDIOC_DQBUF</constant>
107is called with a pointer to this structure the driver fills the 113is called with a pointer to this structure the driver fills the
108remaining fields or returns an error code.</para> 114remaining fields or returns an error code.</para>
diff --git a/Documentation/DocBook/v4l/vidioc-querybuf.xml b/Documentation/DocBook/v4l/vidioc-querybuf.xml
index d834993e6191..e649805a4908 100644
--- a/Documentation/DocBook/v4l/vidioc-querybuf.xml
+++ b/Documentation/DocBook/v4l/vidioc-querybuf.xml
@@ -54,12 +54,13 @@ buffer at any time after buffers have been allocated with the
54&VIDIOC-REQBUFS; ioctl.</para> 54&VIDIOC-REQBUFS; ioctl.</para>
55 55
56 <para>Applications set the <structfield>type</structfield> field 56 <para>Applications set the <structfield>type</structfield> field
57 of a &v4l2-buffer; to the same buffer type as previously 57 of a &v4l2-buffer; to the same buffer type as was previously used with
58&v4l2-format; <structfield>type</structfield> and &v4l2-requestbuffers; 58&v4l2-format; <structfield>type</structfield> and &v4l2-requestbuffers;
59<structfield>type</structfield>, and the <structfield>index</structfield> 59<structfield>type</structfield>, and the <structfield>index</structfield>
60 field. Valid index numbers range from zero 60 field. Valid index numbers range from zero
61to the number of buffers allocated with &VIDIOC-REQBUFS; 61to the number of buffers allocated with &VIDIOC-REQBUFS;
62 (&v4l2-requestbuffers; <structfield>count</structfield>) minus one. 62 (&v4l2-requestbuffers; <structfield>count</structfield>) minus one.
63The <structfield>reserved</structfield> field should to set to 0.
63After calling <constant>VIDIOC_QUERYBUF</constant> with a pointer to 64After calling <constant>VIDIOC_QUERYBUF</constant> with a pointer to
64 this structure drivers return an error code or fill the rest of 65 this structure drivers return an error code or fill the rest of
65the structure.</para> 66the structure.</para>
@@ -68,8 +69,8 @@ the structure.</para>
68<constant>V4L2_BUF_FLAG_MAPPED</constant>, 69<constant>V4L2_BUF_FLAG_MAPPED</constant>,
69<constant>V4L2_BUF_FLAG_QUEUED</constant> and 70<constant>V4L2_BUF_FLAG_QUEUED</constant> and
70<constant>V4L2_BUF_FLAG_DONE</constant> flags will be valid. The 71<constant>V4L2_BUF_FLAG_DONE</constant> flags will be valid. The
71<structfield>memory</structfield> field will be set to 72<structfield>memory</structfield> field will be set to the current
72<constant>V4L2_MEMORY_MMAP</constant>, the <structfield>m.offset</structfield> 73I/O method, the <structfield>m.offset</structfield>
73contains the offset of the buffer from the start of the device memory, 74contains the offset of the buffer from the start of the device memory,
74the <structfield>length</structfield> field its size. The driver may 75the <structfield>length</structfield> field its size. The driver may
75or may not set the remaining fields and flags, they are meaningless in 76or may not set the remaining fields and flags, they are meaningless in
diff --git a/Documentation/DocBook/v4l/vidioc-reqbufs.xml b/Documentation/DocBook/v4l/vidioc-reqbufs.xml
index bab38084454f..1c0816372074 100644
--- a/Documentation/DocBook/v4l/vidioc-reqbufs.xml
+++ b/Documentation/DocBook/v4l/vidioc-reqbufs.xml
@@ -54,23 +54,23 @@ I/O. Memory mapped buffers are located in device memory and must be
54allocated with this ioctl before they can be mapped into the 54allocated with this ioctl before they can be mapped into the
55application's address space. User buffers are allocated by 55application's address space. User buffers are allocated by
56applications themselves, and this ioctl is merely used to switch the 56applications themselves, and this ioctl is merely used to switch the
57driver into user pointer I/O mode.</para> 57driver into user pointer I/O mode and to setup some internal structures.</para>
58 58
59 <para>To allocate device buffers applications initialize three 59 <para>To allocate device buffers applications initialize all
60fields of a <structname>v4l2_requestbuffers</structname> structure. 60fields of the <structname>v4l2_requestbuffers</structname> structure.
61They set the <structfield>type</structfield> field to the respective 61They set the <structfield>type</structfield> field to the respective
62stream or buffer type, the <structfield>count</structfield> field to 62stream or buffer type, the <structfield>count</structfield> field to
63the desired number of buffers, and <structfield>memory</structfield> 63the desired number of buffers, <structfield>memory</structfield>
64must be set to <constant>V4L2_MEMORY_MMAP</constant>. When the ioctl 64must be set to the requested I/O method and the reserved array
65is called with a pointer to this structure the driver attempts to 65must be zeroed. When the ioctl
66allocate the requested number of buffers and stores the actual number 66is called with a pointer to this structure the driver will attempt to allocate
67the requested number of buffers and it stores the actual number
67allocated in the <structfield>count</structfield> field. It can be 68allocated in the <structfield>count</structfield> field. It can be
68smaller than the number requested, even zero, when the driver runs out 69smaller than the number requested, even zero, when the driver runs out
69of free memory. A larger number is possible when the driver requires 70of free memory. A larger number is also possible when the driver requires
70more buffers to function correctly.<footnote> 71more buffers to function correctly. For example video output requires at least two buffers,
71 <para>For example video output requires at least two buffers,
72one displayed and one filled by the application.</para> 72one displayed and one filled by the application.</para>
73 </footnote> When memory mapping I/O is not supported the ioctl 73 <para>When the I/O method is not supported the ioctl
74returns an &EINVAL;.</para> 74returns an &EINVAL;.</para>
75 75
76 <para>Applications can call <constant>VIDIOC_REQBUFS</constant> 76 <para>Applications can call <constant>VIDIOC_REQBUFS</constant>
@@ -81,14 +81,6 @@ in progress, an implicit &VIDIOC-STREAMOFF;. <!-- mhs: I see no
81reason why munmap()ping one or even all buffers must imply 81reason why munmap()ping one or even all buffers must imply
82streamoff.--></para> 82streamoff.--></para>
83 83
84 <para>To negotiate user pointer I/O, applications initialize only
85the <structfield>type</structfield> field and set
86<structfield>memory</structfield> to
87<constant>V4L2_MEMORY_USERPTR</constant>. When the ioctl is called
88with a pointer to this structure the driver prepares for user pointer
89I/O, when this I/O method is not supported the ioctl returns an
90&EINVAL;.</para>
91
92 <table pgwide="1" frame="none" id="v4l2-requestbuffers"> 84 <table pgwide="1" frame="none" id="v4l2-requestbuffers">
93 <title>struct <structname>v4l2_requestbuffers</structname></title> 85 <title>struct <structname>v4l2_requestbuffers</structname></title>
94 <tgroup cols="3"> 86 <tgroup cols="3">
@@ -97,9 +89,7 @@ I/O, when this I/O method is not supported the ioctl returns an
97 <row> 89 <row>
98 <entry>__u32</entry> 90 <entry>__u32</entry>
99 <entry><structfield>count</structfield></entry> 91 <entry><structfield>count</structfield></entry>
100 <entry>The number of buffers requested or granted. This 92 <entry>The number of buffers requested or granted.</entry>
101field is only used when <structfield>memory</structfield> is set to
102<constant>V4L2_MEMORY_MMAP</constant>.</entry>
103 </row> 93 </row>
104 <row> 94 <row>
105 <entry>&v4l2-buf-type;</entry> 95 <entry>&v4l2-buf-type;</entry>
@@ -120,7 +110,7 @@ as the &v4l2-format; <structfield>type</structfield> field. See <xref
120 <entry><structfield>reserved</structfield>[2]</entry> 110 <entry><structfield>reserved</structfield>[2]</entry>
121 <entry>A place holder for future extensions and custom 111 <entry>A place holder for future extensions and custom
122(driver defined) buffer types <constant>V4L2_BUF_TYPE_PRIVATE</constant> and 112(driver defined) buffer types <constant>V4L2_BUF_TYPE_PRIVATE</constant> and
123higher.</entry> 113higher. This array should be zeroed by applications.</entry>
124 </row> 114 </row>
125 </tbody> 115 </tbody>
126 </tgroup> 116 </tgroup>
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 8495fc970391..f5395af88a41 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -221,8 +221,8 @@ branches. These different branches are:
221 - main 2.6.x kernel tree 221 - main 2.6.x kernel tree
222 - 2.6.x.y -stable kernel tree 222 - 2.6.x.y -stable kernel tree
223 - 2.6.x -git kernel patches 223 - 2.6.x -git kernel patches
224 - 2.6.x -mm kernel patches
225 - subsystem specific kernel trees and patches 224 - subsystem specific kernel trees and patches
225 - the 2.6.x -next kernel tree for integration tests
226 226
2272.6.x kernel tree 2272.6.x kernel tree
228----------------- 228-----------------
@@ -232,7 +232,7 @@ process is as follows:
232 - As soon as a new kernel is released a two weeks window is open, 232 - As soon as a new kernel is released a two weeks window is open,
233 during this period of time maintainers can submit big diffs to 233 during this period of time maintainers can submit big diffs to
234 Linus, usually the patches that have already been included in the 234 Linus, usually the patches that have already been included in the
235 -mm kernel for a few weeks. The preferred way to submit big changes 235 -next kernel for a few weeks. The preferred way to submit big changes
236 is using git (the kernel's source management tool, more information 236 is using git (the kernel's source management tool, more information
237 can be found at http://git.or.cz/) but plain patches are also just 237 can be found at http://git.or.cz/) but plain patches are also just
238 fine. 238 fine.
@@ -293,84 +293,43 @@ daily and represent the current state of Linus' tree. They are more
293experimental than -rc kernels since they are generated automatically 293experimental than -rc kernels since they are generated automatically
294without even a cursory glance to see if they are sane. 294without even a cursory glance to see if they are sane.
295 295
2962.6.x -mm kernel patches
297------------------------
298These are experimental kernel patches released by Andrew Morton. Andrew
299takes all of the different subsystem kernel trees and patches and mushes
300them together, along with a lot of patches that have been plucked from
301the linux-kernel mailing list. This tree serves as a proving ground for
302new features and patches. Once a patch has proved its worth in -mm for
303a while Andrew or the subsystem maintainer pushes it on to Linus for
304inclusion in mainline.
305
306It is heavily encouraged that all new patches get tested in the -mm tree
307before they are sent to Linus for inclusion in the main kernel tree. Code
308which does not make an appearance in -mm before the opening of the merge
309window will prove hard to merge into the mainline.
310
311These kernels are not appropriate for use on systems that are supposed
312to be stable and they are more risky to run than any of the other
313branches.
314
315If you wish to help out with the kernel development process, please test
316and use these kernel releases and provide feedback to the linux-kernel
317mailing list if you have any problems, and if everything works properly.
318
319In addition to all the other experimental patches, these kernels usually
320also contain any changes in the mainline -git kernels available at the
321time of release.
322
323The -mm kernels are not released on a fixed schedule, but usually a few
324-mm kernels are released in between each -rc kernel (1 to 3 is common).
325
326Subsystem Specific kernel trees and patches 296Subsystem Specific kernel trees and patches
327------------------------------------------- 297-------------------------------------------
328A number of the different kernel subsystem developers expose their 298The maintainers of the various kernel subsystems --- and also many
329development trees so that others can see what is happening in the 299kernel subsystem developers --- expose their current state of
330different areas of the kernel. These trees are pulled into the -mm 300development in source repositories. That way, others can see what is
331kernel releases as described above. 301happening in the different areas of the kernel. In areas where
332 302development is rapid, a developer may be asked to base his submissions
333Here is a list of some of the different kernel trees available: 303onto such a subsystem kernel tree so that conflicts between the
334 git trees: 304submission and other already ongoing work are avoided.
335 - Kbuild development tree, Sam Ravnborg <sam@ravnborg.org> 305
336 git.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild.git 306Most of these repositories are git trees, but there are also other SCMs
337 307in use, or patch queues being published as quilt series. Addresses of
338 - ACPI development tree, Len Brown <len.brown@intel.com> 308these subsystem repositories are listed in the MAINTAINERS file. Many
339 git.kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git 309of them can be browsed at http://git.kernel.org/.
340 310
341 - Block development tree, Jens Axboe <jens.axboe@oracle.com> 311Before a proposed patch is committed to such a subsystem tree, it is
342 git.kernel.org:/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git 312subject to review which primarily happens on mailing lists (see the
343 313respective section below). For several kernel subsystems, this review
344 - DRM development tree, Dave Airlie <airlied@linux.ie> 314process is tracked with the tool patchwork. Patchwork offers a web
345 git.kernel.org:/pub/scm/linux/kernel/git/airlied/drm-2.6.git 315interface which shows patch postings, any comments on a patch or
346 316revisions to it, and maintainers can mark patches as under review,
347 - ia64 development tree, Tony Luck <tony.luck@intel.com> 317accepted, or rejected. Most of these patchwork sites are listed at
348 git.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6.git 318http://patchwork.kernel.org/ or http://patchwork.ozlabs.org/.
349 319
350 - infiniband, Roland Dreier <rolandd@cisco.com> 3202.6.x -next kernel tree for integration tests
351 git.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git 321---------------------------------------------
352 322Before updates from subsystem trees are merged into the mainline 2.6.x
353 - libata, Jeff Garzik <jgarzik@pobox.com> 323tree, they need to be integration-tested. For this purpose, a special
354 git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git 324testing repository exists into which virtually all subsystem trees are
355 325pulled on an almost daily basis:
356 - network drivers, Jeff Garzik <jgarzik@pobox.com> 326 http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git
357 git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git 327 http://linux.f-seidel.de/linux-next/pmwiki/
358 328
359 - pcmcia, Dominik Brodowski <linux@dominikbrodowski.net> 329This way, the -next kernel gives a summary outlook onto what will be
360 git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git 330expected to go into the mainline kernel at the next merge period.
361 331Adventurous testers are very welcome to runtime-test the -next kernel.
362 - SCSI, James Bottomley <James.Bottomley@hansenpartnership.com>
363 git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
364
365 - x86, Ingo Molnar <mingo@elte.hu>
366 git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
367
368 quilt trees:
369 - USB, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de>
370 kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
371 332
372 Other kernel trees can be found listed at http://git.kernel.org/ and in
373 the MAINTAINERS file.
374 333
375Bug Reporting 334Bug Reporting
376------------- 335-------------
diff --git a/Documentation/IPMI.txt b/Documentation/IPMI.txt
index bc38283379f0..69dd29ed824e 100644
--- a/Documentation/IPMI.txt
+++ b/Documentation/IPMI.txt
@@ -365,6 +365,7 @@ You can change this at module load time (for a module) with:
365 regshifts=<shift1>,<shift2>,... 365 regshifts=<shift1>,<shift2>,...
366 slave_addrs=<addr1>,<addr2>,... 366 slave_addrs=<addr1>,<addr2>,...
367 force_kipmid=<enable1>,<enable2>,... 367 force_kipmid=<enable1>,<enable2>,...
368 kipmid_max_busy_us=<ustime1>,<ustime2>,...
368 unload_when_empty=[0|1] 369 unload_when_empty=[0|1]
369 370
370Each of these except si_trydefaults is a list, the first item for the 371Each of these except si_trydefaults is a list, the first item for the
@@ -433,6 +434,7 @@ kernel command line as:
433 ipmi_si.regshifts=<shift1>,<shift2>,... 434 ipmi_si.regshifts=<shift1>,<shift2>,...
434 ipmi_si.slave_addrs=<addr1>,<addr2>,... 435 ipmi_si.slave_addrs=<addr1>,<addr2>,...
435 ipmi_si.force_kipmid=<enable1>,<enable2>,... 436 ipmi_si.force_kipmid=<enable1>,<enable2>,...
437 ipmi_si.kipmid_max_busy_us=<ustime1>,<ustime2>,...
436 438
437It works the same as the module parameters of the same names. 439It works the same as the module parameters of the same names.
438 440
@@ -450,6 +452,16 @@ force this thread on or off. If you force it off and don't have
450interrupts, the driver will run VERY slowly. Don't blame me, 452interrupts, the driver will run VERY slowly. Don't blame me,
451these interfaces suck. 453these interfaces suck.
452 454
455Unfortunately, this thread can use a lot of CPU depending on the
456interface's performance. This can waste a lot of CPU and cause
457various issues with detecting idle CPU and using extra power. To
458avoid this, the kipmid_max_busy_us sets the maximum amount of time, in
459microseconds, that kipmid will spin before sleeping for a tick. This
460value sets a balance between performance and CPU waste and needs to be
461tuned to your needs. Maybe, someday, auto-tuning will be added, but
462that's not a simple thing and even the auto-tuning would need to be
463tuned to the user's desired performance.
464
453The driver supports a hot add and remove of interfaces. This way, 465The driver supports a hot add and remove of interfaces. This way,
454interfaces can be added or removed after the kernel is up and running. 466interfaces can be added or removed after the kernel is up and running.
455This is done using /sys/modules/ipmi_si/parameters/hotmod, which is a 467This is done using /sys/modules/ipmi_si/parameters/hotmod, which is a
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 94b945733534..6fc7ea1d1f9d 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -1,3 +1,3 @@
1obj-m := DocBook/ accounting/ auxdisplay/ connector/ \ 1obj-m := DocBook/ accounting/ auxdisplay/ connector/ \
2 filesystems/configfs/ ia64/ networking/ \ 2 filesystems/ filesystems/configfs/ ia64/ laptops/ networking/ \
3 pcmcia/ spi/ video4linux/ vm/ watchdog/src/ 3 pcmcia/ spi/ timers/ video4linux/ vm/ watchdog/src/
diff --git a/Documentation/PCI/PCI-DMA-mapping.txt b/Documentation/PCI/PCI-DMA-mapping.txt
index ecad88d9fe59..52618ab069ad 100644
--- a/Documentation/PCI/PCI-DMA-mapping.txt
+++ b/Documentation/PCI/PCI-DMA-mapping.txt
@@ -1,12 +1,12 @@
1 Dynamic DMA mapping 1 Dynamic DMA mapping Guide
2 =================== 2 =========================
3 3
4 David S. Miller <davem@redhat.com> 4 David S. Miller <davem@redhat.com>
5 Richard Henderson <rth@cygnus.com> 5 Richard Henderson <rth@cygnus.com>
6 Jakub Jelinek <jakub@redhat.com> 6 Jakub Jelinek <jakub@redhat.com>
7 7
8This document describes the DMA mapping system in terms of the pci_ 8This is a guide to device driver writers on how to use the DMA API
9API. For a similar API that works for generic devices, see 9with example pseudo-code. For a concise description of the API, see
10DMA-API.txt. 10DMA-API.txt.
11 11
12Most of the 64bit platforms have special hardware that translates bus 12Most of the 64bit platforms have special hardware that translates bus
@@ -26,12 +26,15 @@ mapped only for the time they are actually used and unmapped after the DMA
26transfer. 26transfer.
27 27
28The following API will work of course even on platforms where no such 28The following API will work of course even on platforms where no such
29hardware exists, see e.g. arch/x86/include/asm/pci.h for how it is implemented on 29hardware exists.
30top of the virt_to_bus interface. 30
31Note that the DMA API works with any bus independent of the underlying
32microprocessor architecture. You should use the DMA API rather than
33the bus specific DMA API (e.g. pci_dma_*).
31 34
32First of all, you should make sure 35First of all, you should make sure
33 36
34#include <linux/pci.h> 37#include <linux/dma-mapping.h>
35 38
36is in your driver. This file will obtain for you the definition of the 39is in your driver. This file will obtain for you the definition of the
37dma_addr_t (which can hold any valid DMA address for the platform) 40dma_addr_t (which can hold any valid DMA address for the platform)
@@ -78,44 +81,43 @@ for you to DMA from/to.
78 DMA addressing limitations 81 DMA addressing limitations
79 82
80Does your device have any DMA addressing limitations? For example, is 83Does your device have any DMA addressing limitations? For example, is
81your device only capable of driving the low order 24-bits of address 84your device only capable of driving the low order 24-bits of address?
82on the PCI bus for SAC DMA transfers? If so, you need to inform the 85If so, you need to inform the kernel of this fact.
83PCI layer of this fact.
84 86
85By default, the kernel assumes that your device can address the full 87By default, the kernel assumes that your device can address the full
8632-bits in a SAC cycle. For a 64-bit DAC capable device, this needs 8832-bits. For a 64-bit capable device, this needs to be increased.
87to be increased. And for a device with limitations, as discussed in 89And for a device with limitations, as discussed in the previous
88the previous paragraph, it needs to be decreased. 90paragraph, it needs to be decreased.
89 91
90pci_alloc_consistent() by default will return 32-bit DMA addresses. 92Special note about PCI: PCI-X specification requires PCI-X devices to
91PCI-X specification requires PCI-X devices to support 64-bit 93support 64-bit addressing (DAC) for all transactions. And at least
92addressing (DAC) for all transactions. And at least one platform (SGI 94one platform (SGI SN2) requires 64-bit consistent allocations to
93SN2) requires 64-bit consistent allocations to operate correctly when 95operate correctly when the IO bus is in PCI-X mode.
94the IO bus is in PCI-X mode. Therefore, like with pci_set_dma_mask(), 96
95it's good practice to call pci_set_consistent_dma_mask() to set the 97For correct operation, you must interrogate the kernel in your device
96appropriate mask even if your device only supports 32-bit DMA 98probe routine to see if the DMA controller on the machine can properly
97(default) and especially if it's a PCI-X device. 99support the DMA addressing limitation your device has. It is good
98 100style to do this even if your device holds the default setting,
99For correct operation, you must interrogate the PCI layer in your
100device probe routine to see if the PCI controller on the machine can
101properly support the DMA addressing limitation your device has. It is
102good style to do this even if your device holds the default setting,
103because this shows that you did think about these issues wrt. your 101because this shows that you did think about these issues wrt. your
104device. 102device.
105 103
106The query is performed via a call to pci_set_dma_mask(): 104The query is performed via a call to dma_set_mask():
107 105
108 int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask); 106 int dma_set_mask(struct device *dev, u64 mask);
109 107
110The query for consistent allocations is performed via a call to 108The query for consistent allocations is performed via a call to
111pci_set_consistent_dma_mask(): 109dma_set_coherent_mask():
112 110
113 int pci_set_consistent_dma_mask(struct pci_dev *pdev, u64 device_mask); 111 int dma_set_coherent_mask(struct device *dev, u64 mask);
114 112
115Here, pdev is a pointer to the PCI device struct of your device, and 113Here, dev is a pointer to the device struct of your device, and mask
116device_mask is a bit mask describing which bits of a PCI address your 114is a bit mask describing which bits of an address your device
117device supports. It returns zero if your card can perform DMA 115supports. It returns zero if your card can perform DMA properly on
118properly on the machine given the address mask you provided. 116the machine given the address mask you provided. In general, the
117device struct of your device is embedded in the bus specific device
118struct of your device. For example, a pointer to the device struct of
119your PCI device is pdev->dev (pdev is a pointer to the PCI device
120struct of your device).
119 121
120If it returns non-zero, your device cannot perform DMA properly on 122If it returns non-zero, your device cannot perform DMA properly on
121this platform, and attempting to do so will result in undefined 123this platform, and attempting to do so will result in undefined
@@ -133,31 +135,30 @@ of your driver reports that performance is bad or that the device is not
133even detected, you can ask them for the kernel messages to find out 135even detected, you can ask them for the kernel messages to find out
134exactly why. 136exactly why.
135 137
136The standard 32-bit addressing PCI device would do something like 138The standard 32-bit addressing device would do something like this:
137this:
138 139
139 if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { 140 if (dma_set_mask(dev, DMA_BIT_MASK(32))) {
140 printk(KERN_WARNING 141 printk(KERN_WARNING
141 "mydev: No suitable DMA available.\n"); 142 "mydev: No suitable DMA available.\n");
142 goto ignore_this_device; 143 goto ignore_this_device;
143 } 144 }
144 145
145Another common scenario is a 64-bit capable device. The approach 146Another common scenario is a 64-bit capable device. The approach here
146here is to try for 64-bit DAC addressing, but back down to a 147is to try for 64-bit addressing, but back down to a 32-bit mask that
14732-bit mask should that fail. The PCI platform code may fail the 148should not fail. The kernel may fail the 64-bit mask not because the
14864-bit mask not because the platform is not capable of 64-bit 149platform is not capable of 64-bit addressing. Rather, it may fail in
149addressing. Rather, it may fail in this case simply because 150this case simply because 32-bit addressing is done more efficiently
15032-bit SAC addressing is done more efficiently than DAC addressing. 151than 64-bit addressing. For example, Sparc64 PCI SAC addressing is
151Sparc64 is one platform which behaves in this way. 152more efficient than DAC addressing.
152 153
153Here is how you would handle a 64-bit capable device which can drive 154Here is how you would handle a 64-bit capable device which can drive
154all 64-bits when accessing streaming DMA: 155all 64-bits when accessing streaming DMA:
155 156
156 int using_dac; 157 int using_dac;
157 158
158 if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) { 159 if (!dma_set_mask(dev, DMA_BIT_MASK(64))) {
159 using_dac = 1; 160 using_dac = 1;
160 } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { 161 } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) {
161 using_dac = 0; 162 using_dac = 0;
162 } else { 163 } else {
163 printk(KERN_WARNING 164 printk(KERN_WARNING
@@ -170,36 +171,36 @@ the case would look like this:
170 171
171 int using_dac, consistent_using_dac; 172 int using_dac, consistent_using_dac;
172 173
173 if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) { 174 if (!dma_set_mask(dev, DMA_BIT_MASK(64))) {
174 using_dac = 1; 175 using_dac = 1;
175 consistent_using_dac = 1; 176 consistent_using_dac = 1;
176 pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)); 177 dma_set_coherent_mask(dev, DMA_BIT_MASK(64));
177 } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { 178 } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) {
178 using_dac = 0; 179 using_dac = 0;
179 consistent_using_dac = 0; 180 consistent_using_dac = 0;
180 pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)); 181 dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
181 } else { 182 } else {
182 printk(KERN_WARNING 183 printk(KERN_WARNING
183 "mydev: No suitable DMA available.\n"); 184 "mydev: No suitable DMA available.\n");
184 goto ignore_this_device; 185 goto ignore_this_device;
185 } 186 }
186 187
187pci_set_consistent_dma_mask() will always be able to set the same or a 188dma_set_coherent_mask() will always be able to set the same or a
188smaller mask as pci_set_dma_mask(). However for the rare case that a 189smaller mask as dma_set_mask(). However for the rare case that a
189device driver only uses consistent allocations, one would have to 190device driver only uses consistent allocations, one would have to
190check the return value from pci_set_consistent_dma_mask(). 191check the return value from dma_set_coherent_mask().
191 192
192Finally, if your device can only drive the low 24-bits of 193Finally, if your device can only drive the low 24-bits of
193address during PCI bus mastering you might do something like: 194address you might do something like:
194 195
195 if (pci_set_dma_mask(pdev, DMA_BIT_MASK(24))) { 196 if (dma_set_mask(dev, DMA_BIT_MASK(24))) {
196 printk(KERN_WARNING 197 printk(KERN_WARNING
197 "mydev: 24-bit DMA addressing not available.\n"); 198 "mydev: 24-bit DMA addressing not available.\n");
198 goto ignore_this_device; 199 goto ignore_this_device;
199 } 200 }
200 201
201When pci_set_dma_mask() is successful, and returns zero, the PCI layer 202When dma_set_mask() is successful, and returns zero, the kernel saves
202saves away this mask you have provided. The PCI layer will use this 203away this mask you have provided. The kernel will use this
203information later when you make DMA mappings. 204information later when you make DMA mappings.
204 205
205There is a case which we are aware of at this time, which is worth 206There is a case which we are aware of at this time, which is worth
@@ -208,7 +209,7 @@ functions (for example a sound card provides playback and record
208functions) and the various different functions have _different_ 209functions) and the various different functions have _different_
209DMA addressing limitations, you may wish to probe each mask and 210DMA addressing limitations, you may wish to probe each mask and
210only provide the functionality which the machine can handle. It 211only provide the functionality which the machine can handle. It
211is important that the last call to pci_set_dma_mask() be for the 212is important that the last call to dma_set_mask() be for the
212most specific mask. 213most specific mask.
213 214
214Here is pseudo-code showing how this might be done: 215Here is pseudo-code showing how this might be done:
@@ -217,17 +218,17 @@ Here is pseudo-code showing how this might be done:
217 #define RECORD_ADDRESS_BITS DMA_BIT_MASK(24) 218 #define RECORD_ADDRESS_BITS DMA_BIT_MASK(24)
218 219
219 struct my_sound_card *card; 220 struct my_sound_card *card;
220 struct pci_dev *pdev; 221 struct device *dev;
221 222
222 ... 223 ...
223 if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) { 224 if (!dma_set_mask(dev, PLAYBACK_ADDRESS_BITS)) {
224 card->playback_enabled = 1; 225 card->playback_enabled = 1;
225 } else { 226 } else {
226 card->playback_enabled = 0; 227 card->playback_enabled = 0;
227 printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n", 228 printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n",
228 card->name); 229 card->name);
229 } 230 }
230 if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) { 231 if (!dma_set_mask(dev, RECORD_ADDRESS_BITS)) {
231 card->record_enabled = 1; 232 card->record_enabled = 1;
232 } else { 233 } else {
233 card->record_enabled = 0; 234 card->record_enabled = 0;
@@ -252,8 +253,8 @@ There are two types of DMA mappings:
252 Think of "consistent" as "synchronous" or "coherent". 253 Think of "consistent" as "synchronous" or "coherent".
253 254
254 The current default is to return consistent memory in the low 32 255 The current default is to return consistent memory in the low 32
255 bits of the PCI bus space. However, for future compatibility you 256 bits of the bus space. However, for future compatibility you should
256 should set the consistent mask even if this default is fine for your 257 set the consistent mask even if this default is fine for your
257 driver. 258 driver.
258 259
259 Good examples of what to use consistent mappings for are: 260 Good examples of what to use consistent mappings for are:
@@ -285,9 +286,9 @@ There are two types of DMA mappings:
285 found in PCI bridges (such as by reading a register's value 286 found in PCI bridges (such as by reading a register's value
286 after writing it). 287 after writing it).
287 288
288- Streaming DMA mappings which are usually mapped for one DMA transfer, 289- Streaming DMA mappings which are usually mapped for one DMA
289 unmapped right after it (unless you use pci_dma_sync_* below) and for which 290 transfer, unmapped right after it (unless you use dma_sync_* below)
290 hardware can optimize for sequential accesses. 291 and for which hardware can optimize for sequential accesses.
291 292
292 This of "streaming" as "asynchronous" or "outside the coherency 293 This of "streaming" as "asynchronous" or "outside the coherency
293 domain". 294 domain".
@@ -302,8 +303,8 @@ There are two types of DMA mappings:
302 optimizations the hardware allows. To this end, when using 303 optimizations the hardware allows. To this end, when using
303 such mappings you must be explicit about what you want to happen. 304 such mappings you must be explicit about what you want to happen.
304 305
305Neither type of DMA mapping has alignment restrictions that come 306Neither type of DMA mapping has alignment restrictions that come from
306from PCI, although some devices may have such restrictions. 307the underlying bus, although some devices may have such restrictions.
307Also, systems with caches that aren't DMA-coherent will work better 308Also, systems with caches that aren't DMA-coherent will work better
308when the underlying buffers don't share cache lines with other data. 309when the underlying buffers don't share cache lines with other data.
309 310
@@ -315,33 +316,27 @@ you should do:
315 316
316 dma_addr_t dma_handle; 317 dma_addr_t dma_handle;
317 318
318 cpu_addr = pci_alloc_consistent(pdev, size, &dma_handle); 319 cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp);
319
320where pdev is a struct pci_dev *. This may be called in interrupt context.
321You should use dma_alloc_coherent (see DMA-API.txt) for buses
322where devices don't have struct pci_dev (like ISA, EISA).
323 320
324This argument is needed because the DMA translations may be bus 321where device is a struct device *. This may be called in interrupt
325specific (and often is private to the bus which the device is attached 322context with the GFP_ATOMIC flag.
326to).
327 323
328Size is the length of the region you want to allocate, in bytes. 324Size is the length of the region you want to allocate, in bytes.
329 325
330This routine will allocate RAM for that region, so it acts similarly to 326This routine will allocate RAM for that region, so it acts similarly to
331__get_free_pages (but takes size instead of a page order). If your 327__get_free_pages (but takes size instead of a page order). If your
332driver needs regions sized smaller than a page, you may prefer using 328driver needs regions sized smaller than a page, you may prefer using
333the pci_pool interface, described below. 329the dma_pool interface, described below.
334 330
335The consistent DMA mapping interfaces, for non-NULL pdev, will by 331The consistent DMA mapping interfaces, for non-NULL dev, will by
336default return a DMA address which is SAC (Single Address Cycle) 332default return a DMA address which is 32-bit addressable. Even if the
337addressable. Even if the device indicates (via PCI dma mask) that it 333device indicates (via DMA mask) that it may address the upper 32-bits,
338may address the upper 32-bits and thus perform DAC cycles, consistent 334consistent allocation will only return > 32-bit addresses for DMA if
339allocation will only return > 32-bit PCI addresses for DMA if the 335the consistent DMA mask has been explicitly changed via
340consistent dma mask has been explicitly changed via 336dma_set_coherent_mask(). This is true of the dma_pool interface as
341pci_set_consistent_dma_mask(). This is true of the pci_pool interface 337well.
342as well. 338
343 339dma_alloc_coherent returns two values: the virtual address which you
344pci_alloc_consistent returns two values: the virtual address which you
345can use to access it from the CPU and dma_handle which you pass to the 340can use to access it from the CPU and dma_handle which you pass to the
346card. 341card.
347 342
@@ -354,54 +349,54 @@ buffer you receive will not cross a 64K boundary.
354 349
355To unmap and free such a DMA region, you call: 350To unmap and free such a DMA region, you call:
356 351
357 pci_free_consistent(pdev, size, cpu_addr, dma_handle); 352 dma_free_coherent(dev, size, cpu_addr, dma_handle);
358 353
359where pdev, size are the same as in the above call and cpu_addr and 354where dev, size are the same as in the above call and cpu_addr and
360dma_handle are the values pci_alloc_consistent returned to you. 355dma_handle are the values dma_alloc_coherent returned to you.
361This function may not be called in interrupt context. 356This function may not be called in interrupt context.
362 357
363If your driver needs lots of smaller memory regions, you can write 358If your driver needs lots of smaller memory regions, you can write
364custom code to subdivide pages returned by pci_alloc_consistent, 359custom code to subdivide pages returned by dma_alloc_coherent,
365or you can use the pci_pool API to do that. A pci_pool is like 360or you can use the dma_pool API to do that. A dma_pool is like
366a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages. 361a kmem_cache, but it uses dma_alloc_coherent not __get_free_pages.
367Also, it understands common hardware constraints for alignment, 362Also, it understands common hardware constraints for alignment,
368like queue heads needing to be aligned on N byte boundaries. 363like queue heads needing to be aligned on N byte boundaries.
369 364
370Create a pci_pool like this: 365Create a dma_pool like this:
371 366
372 struct pci_pool *pool; 367 struct dma_pool *pool;
373 368
374 pool = pci_pool_create(name, pdev, size, align, alloc); 369 pool = dma_pool_create(name, dev, size, align, alloc);
375 370
376The "name" is for diagnostics (like a kmem_cache name); pdev and size 371The "name" is for diagnostics (like a kmem_cache name); dev and size
377are as above. The device's hardware alignment requirement for this 372are as above. The device's hardware alignment requirement for this
378type of data is "align" (which is expressed in bytes, and must be a 373type of data is "align" (which is expressed in bytes, and must be a
379power of two). If your device has no boundary crossing restrictions, 374power of two). If your device has no boundary crossing restrictions,
380pass 0 for alloc; passing 4096 says memory allocated from this pool 375pass 0 for alloc; passing 4096 says memory allocated from this pool
381must not cross 4KByte boundaries (but at that time it may be better to 376must not cross 4KByte boundaries (but at that time it may be better to
382go for pci_alloc_consistent directly instead). 377go for dma_alloc_coherent directly instead).
383 378
384Allocate memory from a pci pool like this: 379Allocate memory from a dma pool like this:
385 380
386 cpu_addr = pci_pool_alloc(pool, flags, &dma_handle); 381 cpu_addr = dma_pool_alloc(pool, flags, &dma_handle);
387 382
388flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor 383flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor
389holding SMP locks), SLAB_ATOMIC otherwise. Like pci_alloc_consistent, 384holding SMP locks), SLAB_ATOMIC otherwise. Like dma_alloc_coherent,
390this returns two values, cpu_addr and dma_handle. 385this returns two values, cpu_addr and dma_handle.
391 386
392Free memory that was allocated from a pci_pool like this: 387Free memory that was allocated from a dma_pool like this:
393 388
394 pci_pool_free(pool, cpu_addr, dma_handle); 389 dma_pool_free(pool, cpu_addr, dma_handle);
395 390
396where pool is what you passed to pci_pool_alloc, and cpu_addr and 391where pool is what you passed to dma_pool_alloc, and cpu_addr and
397dma_handle are the values pci_pool_alloc returned. This function 392dma_handle are the values dma_pool_alloc returned. This function
398may be called in interrupt context. 393may be called in interrupt context.
399 394
400Destroy a pci_pool by calling: 395Destroy a dma_pool by calling:
401 396
402 pci_pool_destroy(pool); 397 dma_pool_destroy(pool);
403 398
404Make sure you've called pci_pool_free for all memory allocated 399Make sure you've called dma_pool_free for all memory allocated
405from a pool before you destroy the pool. This function may not 400from a pool before you destroy the pool. This function may not
406be called in interrupt context. 401be called in interrupt context.
407 402
@@ -411,15 +406,15 @@ The interfaces described in subsequent portions of this document
411take a DMA direction argument, which is an integer and takes on 406take a DMA direction argument, which is an integer and takes on
412one of the following values: 407one of the following values:
413 408
414 PCI_DMA_BIDIRECTIONAL 409 DMA_BIDIRECTIONAL
415 PCI_DMA_TODEVICE 410 DMA_TO_DEVICE
416 PCI_DMA_FROMDEVICE 411 DMA_FROM_DEVICE
417 PCI_DMA_NONE 412 DMA_NONE
418 413
419One should provide the exact DMA direction if you know it. 414One should provide the exact DMA direction if you know it.
420 415
421PCI_DMA_TODEVICE means "from main memory to the PCI device" 416DMA_TO_DEVICE means "from main memory to the device"
422PCI_DMA_FROMDEVICE means "from the PCI device to main memory" 417DMA_FROM_DEVICE means "from the device to main memory"
423It is the direction in which the data moves during the DMA 418It is the direction in which the data moves during the DMA
424transfer. 419transfer.
425 420
@@ -427,12 +422,12 @@ You are _strongly_ encouraged to specify this as precisely
427as you possibly can. 422as you possibly can.
428 423
429If you absolutely cannot know the direction of the DMA transfer, 424If you absolutely cannot know the direction of the DMA transfer,
430specify PCI_DMA_BIDIRECTIONAL. It means that the DMA can go in 425specify DMA_BIDIRECTIONAL. It means that the DMA can go in
431either direction. The platform guarantees that you may legally 426either direction. The platform guarantees that you may legally
432specify this, and that it will work, but this may be at the 427specify this, and that it will work, but this may be at the
433cost of performance for example. 428cost of performance for example.
434 429
435The value PCI_DMA_NONE is to be used for debugging. One can 430The value DMA_NONE is to be used for debugging. One can
436hold this in a data structure before you come to know the 431hold this in a data structure before you come to know the
437precise direction, and this will help catch cases where your 432precise direction, and this will help catch cases where your
438direction tracking logic has failed to set things up properly. 433direction tracking logic has failed to set things up properly.
@@ -442,21 +437,21 @@ potential platform-specific optimizations of such) is for debugging.
442Some platforms actually have a write permission boolean which DMA 437Some platforms actually have a write permission boolean which DMA
443mappings can be marked with, much like page protections in the user 438mappings can be marked with, much like page protections in the user
444program address space. Such platforms can and do report errors in the 439program address space. Such platforms can and do report errors in the
445kernel logs when the PCI controller hardware detects violation of the 440kernel logs when the DMA controller hardware detects violation of the
446permission setting. 441permission setting.
447 442
448Only streaming mappings specify a direction, consistent mappings 443Only streaming mappings specify a direction, consistent mappings
449implicitly have a direction attribute setting of 444implicitly have a direction attribute setting of
450PCI_DMA_BIDIRECTIONAL. 445DMA_BIDIRECTIONAL.
451 446
452The SCSI subsystem tells you the direction to use in the 447The SCSI subsystem tells you the direction to use in the
453'sc_data_direction' member of the SCSI command your driver is 448'sc_data_direction' member of the SCSI command your driver is
454working on. 449working on.
455 450
456For Networking drivers, it's a rather simple affair. For transmit 451For Networking drivers, it's a rather simple affair. For transmit
457packets, map/unmap them with the PCI_DMA_TODEVICE direction 452packets, map/unmap them with the DMA_TO_DEVICE direction
458specifier. For receive packets, just the opposite, map/unmap them 453specifier. For receive packets, just the opposite, map/unmap them
459with the PCI_DMA_FROMDEVICE direction specifier. 454with the DMA_FROM_DEVICE direction specifier.
460 455
461 Using Streaming DMA mappings 456 Using Streaming DMA mappings
462 457
@@ -467,43 +462,43 @@ scatterlist.
467 462
468To map a single region, you do: 463To map a single region, you do:
469 464
470 struct pci_dev *pdev = mydev->pdev; 465 struct device *dev = &my_dev->dev;
471 dma_addr_t dma_handle; 466 dma_addr_t dma_handle;
472 void *addr = buffer->ptr; 467 void *addr = buffer->ptr;
473 size_t size = buffer->len; 468 size_t size = buffer->len;
474 469
475 dma_handle = pci_map_single(pdev, addr, size, direction); 470 dma_handle = dma_map_single(dev, addr, size, direction);
476 471
477and to unmap it: 472and to unmap it:
478 473
479 pci_unmap_single(pdev, dma_handle, size, direction); 474 dma_unmap_single(dev, dma_handle, size, direction);
480 475
481You should call pci_unmap_single when the DMA activity is finished, e.g. 476You should call dma_unmap_single when the DMA activity is finished, e.g.
482from the interrupt which told you that the DMA transfer is done. 477from the interrupt which told you that the DMA transfer is done.
483 478
484Using cpu pointers like this for single mappings has a disadvantage, 479Using cpu pointers like this for single mappings has a disadvantage,
485you cannot reference HIGHMEM memory in this way. Thus, there is a 480you cannot reference HIGHMEM memory in this way. Thus, there is a
486map/unmap interface pair akin to pci_{map,unmap}_single. These 481map/unmap interface pair akin to dma_{map,unmap}_single. These
487interfaces deal with page/offset pairs instead of cpu pointers. 482interfaces deal with page/offset pairs instead of cpu pointers.
488Specifically: 483Specifically:
489 484
490 struct pci_dev *pdev = mydev->pdev; 485 struct device *dev = &my_dev->dev;
491 dma_addr_t dma_handle; 486 dma_addr_t dma_handle;
492 struct page *page = buffer->page; 487 struct page *page = buffer->page;
493 unsigned long offset = buffer->offset; 488 unsigned long offset = buffer->offset;
494 size_t size = buffer->len; 489 size_t size = buffer->len;
495 490
496 dma_handle = pci_map_page(pdev, page, offset, size, direction); 491 dma_handle = dma_map_page(dev, page, offset, size, direction);
497 492
498 ... 493 ...
499 494
500 pci_unmap_page(pdev, dma_handle, size, direction); 495 dma_unmap_page(dev, dma_handle, size, direction);
501 496
502Here, "offset" means byte offset within the given page. 497Here, "offset" means byte offset within the given page.
503 498
504With scatterlists, you map a region gathered from several regions by: 499With scatterlists, you map a region gathered from several regions by:
505 500
506 int i, count = pci_map_sg(pdev, sglist, nents, direction); 501 int i, count = dma_map_sg(dev, sglist, nents, direction);
507 struct scatterlist *sg; 502 struct scatterlist *sg;
508 503
509 for_each_sg(sglist, sg, count, i) { 504 for_each_sg(sglist, sg, count, i) {
@@ -527,16 +522,16 @@ accessed sg->address and sg->length as shown above.
527 522
528To unmap a scatterlist, just call: 523To unmap a scatterlist, just call:
529 524
530 pci_unmap_sg(pdev, sglist, nents, direction); 525 dma_unmap_sg(dev, sglist, nents, direction);
531 526
532Again, make sure DMA activity has already finished. 527Again, make sure DMA activity has already finished.
533 528
534PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be 529PLEASE NOTE: The 'nents' argument to the dma_unmap_sg call must be
535 the _same_ one you passed into the pci_map_sg call, 530 the _same_ one you passed into the dma_map_sg call,
536 it should _NOT_ be the 'count' value _returned_ from the 531 it should _NOT_ be the 'count' value _returned_ from the
537 pci_map_sg call. 532 dma_map_sg call.
538 533
539Every pci_map_{single,sg} call should have its pci_unmap_{single,sg} 534Every dma_map_{single,sg} call should have its dma_unmap_{single,sg}
540counterpart, because the bus address space is a shared resource (although 535counterpart, because the bus address space is a shared resource (although
541in some ports the mapping is per each BUS so less devices contend for the 536in some ports the mapping is per each BUS so less devices contend for the
542same bus address space) and you could render the machine unusable by eating 537same bus address space) and you could render the machine unusable by eating
@@ -547,14 +542,14 @@ the data in between the DMA transfers, the buffer needs to be synced
547properly in order for the cpu and device to see the most uptodate and 542properly in order for the cpu and device to see the most uptodate and
548correct copy of the DMA buffer. 543correct copy of the DMA buffer.
549 544
550So, firstly, just map it with pci_map_{single,sg}, and after each DMA 545So, firstly, just map it with dma_map_{single,sg}, and after each DMA
551transfer call either: 546transfer call either:
552 547
553 pci_dma_sync_single_for_cpu(pdev, dma_handle, size, direction); 548 dma_sync_single_for_cpu(dev, dma_handle, size, direction);
554 549
555or: 550or:
556 551
557 pci_dma_sync_sg_for_cpu(pdev, sglist, nents, direction); 552 dma_sync_sg_for_cpu(dev, sglist, nents, direction);
558 553
559as appropriate. 554as appropriate.
560 555
@@ -562,27 +557,27 @@ Then, if you wish to let the device get at the DMA area again,
562finish accessing the data with the cpu, and then before actually 557finish accessing the data with the cpu, and then before actually
563giving the buffer to the hardware call either: 558giving the buffer to the hardware call either:
564 559
565 pci_dma_sync_single_for_device(pdev, dma_handle, size, direction); 560 dma_sync_single_for_device(dev, dma_handle, size, direction);
566 561
567or: 562or:
568 563
569 pci_dma_sync_sg_for_device(dev, sglist, nents, direction); 564 dma_sync_sg_for_device(dev, sglist, nents, direction);
570 565
571as appropriate. 566as appropriate.
572 567
573After the last DMA transfer call one of the DMA unmap routines 568After the last DMA transfer call one of the DMA unmap routines
574pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_* 569dma_unmap_{single,sg}. If you don't touch the data from the first dma_map_*
575call till pci_unmap_*, then you don't have to call the pci_dma_sync_* 570call till dma_unmap_*, then you don't have to call the dma_sync_*
576routines at all. 571routines at all.
577 572
578Here is pseudo code which shows a situation in which you would need 573Here is pseudo code which shows a situation in which you would need
579to use the pci_dma_sync_*() interfaces. 574to use the dma_sync_*() interfaces.
580 575
581 my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len) 576 my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
582 { 577 {
583 dma_addr_t mapping; 578 dma_addr_t mapping;
584 579
585 mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE); 580 mapping = dma_map_single(cp->dev, buffer, len, DMA_FROM_DEVICE);
586 581
587 cp->rx_buf = buffer; 582 cp->rx_buf = buffer;
588 cp->rx_len = len; 583 cp->rx_len = len;
@@ -606,25 +601,25 @@ to use the pci_dma_sync_*() interfaces.
606 * the DMA transfer with the CPU first 601 * the DMA transfer with the CPU first
607 * so that we see updated contents. 602 * so that we see updated contents.
608 */ 603 */
609 pci_dma_sync_single_for_cpu(cp->pdev, cp->rx_dma, 604 dma_sync_single_for_cpu(&cp->dev, cp->rx_dma,
610 cp->rx_len, 605 cp->rx_len,
611 PCI_DMA_FROMDEVICE); 606 DMA_FROM_DEVICE);
612 607
613 /* Now it is safe to examine the buffer. */ 608 /* Now it is safe to examine the buffer. */
614 hp = (struct my_card_header *) cp->rx_buf; 609 hp = (struct my_card_header *) cp->rx_buf;
615 if (header_is_ok(hp)) { 610 if (header_is_ok(hp)) {
616 pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len, 611 dma_unmap_single(&cp->dev, cp->rx_dma, cp->rx_len,
617 PCI_DMA_FROMDEVICE); 612 DMA_FROM_DEVICE);
618 pass_to_upper_layers(cp->rx_buf); 613 pass_to_upper_layers(cp->rx_buf);
619 make_and_setup_new_rx_buf(cp); 614 make_and_setup_new_rx_buf(cp);
620 } else { 615 } else {
621 /* Just sync the buffer and give it back 616 /* Just sync the buffer and give it back
622 * to the card. 617 * to the card.
623 */ 618 */
624 pci_dma_sync_single_for_device(cp->pdev, 619 dma_sync_single_for_device(&cp->dev,
625 cp->rx_dma, 620 cp->rx_dma,
626 cp->rx_len, 621 cp->rx_len,
627 PCI_DMA_FROMDEVICE); 622 DMA_FROM_DEVICE);
628 give_rx_buf_to_card(cp); 623 give_rx_buf_to_card(cp);
629 } 624 }
630 } 625 }
@@ -634,19 +629,19 @@ Drivers converted fully to this interface should not use virt_to_bus any
634longer, nor should they use bus_to_virt. Some drivers have to be changed a 629longer, nor should they use bus_to_virt. Some drivers have to be changed a
635little bit, because there is no longer an equivalent to bus_to_virt in the 630little bit, because there is no longer an equivalent to bus_to_virt in the
636dynamic DMA mapping scheme - you have to always store the DMA addresses 631dynamic DMA mapping scheme - you have to always store the DMA addresses
637returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single 632returned by the dma_alloc_coherent, dma_pool_alloc, and dma_map_single
638calls (pci_map_sg stores them in the scatterlist itself if the platform 633calls (dma_map_sg stores them in the scatterlist itself if the platform
639supports dynamic DMA mapping in hardware) in your driver structures and/or 634supports dynamic DMA mapping in hardware) in your driver structures and/or
640in the card registers. 635in the card registers.
641 636
642All PCI drivers should be using these interfaces with no exceptions. 637All drivers should be using these interfaces with no exceptions. It
643It is planned to completely remove virt_to_bus() and bus_to_virt() as 638is planned to completely remove virt_to_bus() and bus_to_virt() as
644they are entirely deprecated. Some ports already do not provide these 639they are entirely deprecated. Some ports already do not provide these
645as it is impossible to correctly support them. 640as it is impossible to correctly support them.
646 641
647 Optimizing Unmap State Space Consumption 642 Optimizing Unmap State Space Consumption
648 643
649On many platforms, pci_unmap_{single,page}() is simply a nop. 644On many platforms, dma_unmap_{single,page}() is simply a nop.
650Therefore, keeping track of the mapping address and length is a waste 645Therefore, keeping track of the mapping address and length is a waste
651of space. Instead of filling your drivers up with ifdefs and the like 646of space. Instead of filling your drivers up with ifdefs and the like
652to "work around" this (which would defeat the whole purpose of a 647to "work around" this (which would defeat the whole purpose of a
@@ -655,7 +650,7 @@ portable API) the following facilities are provided.
655Actually, instead of describing the macros one by one, we'll 650Actually, instead of describing the macros one by one, we'll
656transform some example code. 651transform some example code.
657 652
6581) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures. 6531) Use DEFINE_DMA_UNMAP_{ADDR,LEN} in state saving structures.
659 Example, before: 654 Example, before:
660 655
661 struct ring_state { 656 struct ring_state {
@@ -668,14 +663,11 @@ transform some example code.
668 663
669 struct ring_state { 664 struct ring_state {
670 struct sk_buff *skb; 665 struct sk_buff *skb;
671 DECLARE_PCI_UNMAP_ADDR(mapping) 666 DEFINE_DMA_UNMAP_ADDR(mapping);
672 DECLARE_PCI_UNMAP_LEN(len) 667 DEFINE_DMA_UNMAP_LEN(len);
673 }; 668 };
674 669
675 NOTE: DO NOT put a semicolon at the end of the DECLARE_*() 6702) Use dma_unmap_{addr,len}_set to set these values.
676 macro.
677
6782) Use pci_unmap_{addr,len}_set to set these values.
679 Example, before: 671 Example, before:
680 672
681 ringp->mapping = FOO; 673 ringp->mapping = FOO;
@@ -683,21 +675,21 @@ transform some example code.
683 675
684 after: 676 after:
685 677
686 pci_unmap_addr_set(ringp, mapping, FOO); 678 dma_unmap_addr_set(ringp, mapping, FOO);
687 pci_unmap_len_set(ringp, len, BAR); 679 dma_unmap_len_set(ringp, len, BAR);
688 680
6893) Use pci_unmap_{addr,len} to access these values. 6813) Use dma_unmap_{addr,len} to access these values.
690 Example, before: 682 Example, before:
691 683
692 pci_unmap_single(pdev, ringp->mapping, ringp->len, 684 dma_unmap_single(dev, ringp->mapping, ringp->len,
693 PCI_DMA_FROMDEVICE); 685 DMA_FROM_DEVICE);
694 686
695 after: 687 after:
696 688
697 pci_unmap_single(pdev, 689 dma_unmap_single(dev,
698 pci_unmap_addr(ringp, mapping), 690 dma_unmap_addr(ringp, mapping),
699 pci_unmap_len(ringp, len), 691 dma_unmap_len(ringp, len),
700 PCI_DMA_FROMDEVICE); 692 DMA_FROM_DEVICE);
701 693
702It really should be self-explanatory. We treat the ADDR and LEN 694It really should be self-explanatory. We treat the ADDR and LEN
703separately, because it is possible for an implementation to only 695separately, because it is possible for an implementation to only
@@ -732,15 +724,15 @@ to "Closing".
732DMA address space is limited on some architectures and an allocation 724DMA address space is limited on some architectures and an allocation
733failure can be determined by: 725failure can be determined by:
734 726
735- checking if pci_alloc_consistent returns NULL or pci_map_sg returns 0 727- checking if dma_alloc_coherent returns NULL or dma_map_sg returns 0
736 728
737- checking the returned dma_addr_t of pci_map_single and pci_map_page 729- checking the returned dma_addr_t of dma_map_single and dma_map_page
738 by using pci_dma_mapping_error(): 730 by using dma_mapping_error():
739 731
740 dma_addr_t dma_handle; 732 dma_addr_t dma_handle;
741 733
742 dma_handle = pci_map_single(pdev, addr, size, direction); 734 dma_handle = dma_map_single(dev, addr, size, direction);
743 if (pci_dma_mapping_error(pdev, dma_handle)) { 735 if (dma_mapping_error(dev, dma_handle)) {
744 /* 736 /*
745 * reduce current DMA mapping usage, 737 * reduce current DMA mapping usage,
746 * delay and try again later or 738 * delay and try again later or
diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX
index 9bb62f7b89c3..71b6f500ddb9 100644
--- a/Documentation/RCU/00-INDEX
+++ b/Documentation/RCU/00-INDEX
@@ -6,16 +6,22 @@ checklist.txt
6 - Review Checklist for RCU Patches 6 - Review Checklist for RCU Patches
7listRCU.txt 7listRCU.txt
8 - Using RCU to Protect Read-Mostly Linked Lists 8 - Using RCU to Protect Read-Mostly Linked Lists
9lockdep.txt
10 - RCU and lockdep checking
9NMI-RCU.txt 11NMI-RCU.txt
10 - Using RCU to Protect Dynamic NMI Handlers 12 - Using RCU to Protect Dynamic NMI Handlers
13rcubarrier.txt
14 - RCU and Unloadable Modules
15rculist_nulls.txt
16 - RCU list primitives for use with SLAB_DESTROY_BY_RCU
11rcuref.txt 17rcuref.txt
12 - Reference-count design for elements of lists/arrays protected by RCU 18 - Reference-count design for elements of lists/arrays protected by RCU
13rcu.txt 19rcu.txt
14 - RCU Concepts 20 - RCU Concepts
15rcubarrier.txt
16 - Unloading modules that use RCU callbacks
17RTFP.txt 21RTFP.txt
18 - List of RCU papers (bibliography) going back to 1980. 22 - List of RCU papers (bibliography) going back to 1980.
23stallwarn.txt
24 - RCU CPU stall warnings (CONFIG_RCU_CPU_STALL_DETECTOR)
19torture.txt 25torture.txt
20 - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST) 26 - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST)
21trace.txt 27trace.txt
diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
index d2b85237c76e..5aea459e3dd6 100644
--- a/Documentation/RCU/RTFP.txt
+++ b/Documentation/RCU/RTFP.txt
@@ -25,10 +25,10 @@ to be referencing the data structure. However, this mechanism was not
25optimized for modern computer systems, which is not surprising given 25optimized for modern computer systems, which is not surprising given
26that these overheads were not so expensive in the mid-80s. Nonetheless, 26that these overheads were not so expensive in the mid-80s. Nonetheless,
27passive serialization appears to be the first deferred-destruction 27passive serialization appears to be the first deferred-destruction
28mechanism to be used in production. Furthermore, the relevant patent has 28mechanism to be used in production. Furthermore, the relevant patent
29lapsed, so this approach may be used in non-GPL software, if desired. 29has lapsed, so this approach may be used in non-GPL software, if desired.
30(In contrast, use of RCU is permitted only in software licensed under 30(In contrast, implementation of RCU is permitted only in software licensed
31GPL. Sorry!!!) 31under either GPL or LGPL. Sorry!!!)
32 32
33In 1990, Pugh [Pugh90] noted that explicitly tracking which threads 33In 1990, Pugh [Pugh90] noted that explicitly tracking which threads
34were reading a given data structure permitted deferred free to operate 34were reading a given data structure permitted deferred free to operate
@@ -150,6 +150,18 @@ preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part
150LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally, 150LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally,
151PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI]. 151PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI].
152 152
1532008 saw a journal paper on real-time RCU [DinakarGuniguntala2008IBMSysJ],
154a history of how Linux changed RCU more than RCU changed Linux
155[PaulEMcKenney2008RCUOSR], and a design overview of hierarchical RCU
156[PaulEMcKenney2008HierarchicalRCU].
157
1582009 introduced user-level RCU algorithms [PaulEMcKenney2009MaliciousURCU],
159which Mathieu Desnoyers is now maintaining [MathieuDesnoyers2009URCU]
160[MathieuDesnoyersPhD]. TINY_RCU [PaulEMcKenney2009BloatWatchRCU] made
161its appearance, as did expedited RCU [PaulEMcKenney2009expeditedRCU].
162The problem of resizeable RCU-protected hash tables may now be on a path
163to a solution [JoshTriplett2009RPHash].
164
153Bibtex Entries 165Bibtex Entries
154 166
155@article{Kung80 167@article{Kung80
@@ -730,6 +742,11 @@ Revised:
730" 742"
731} 743}
732 744
745#
746# "What is RCU?" LWN series.
747#
748########################################################################
749
733@article{DinakarGuniguntala2008IBMSysJ 750@article{DinakarGuniguntala2008IBMSysJ
734,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole" 751,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole"
735,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}" 752,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}"
@@ -820,3 +837,39 @@ Revised:
820 Uniprocessor assumptions allow simplified RCU implementation. 837 Uniprocessor assumptions allow simplified RCU implementation.
821" 838"
822} 839}
840
841@unpublished{PaulEMcKenney2009expeditedRCU
842,Author="Paul E. McKenney"
843,Title="[{PATCH} -tip 0/3] expedited 'big hammer' {RCU} grace periods"
844,month="June"
845,day="25"
846,year="2009"
847,note="Available:
848\url{http://lkml.org/lkml/2009/6/25/306}
849[Viewed August 16, 2009]"
850,annotation="
851 First posting of expedited RCU to be accepted into -tip.
852"
853}
854
855@unpublished{JoshTriplett2009RPHash
856,Author="Josh Triplett"
857,Title="Scalable concurrent hash tables via relativistic programming"
858,month="September"
859,year="2009"
860,note="Linux Plumbers Conference presentation"
861,annotation="
862 RP fun with hash tables.
863"
864}
865
866@phdthesis{MathieuDesnoyersPhD
867, title = "Low-Impact Operating System Tracing"
868, author = "Mathieu Desnoyers"
869, school = "Ecole Polytechnique de Montr\'{e}al"
870, month = "December"
871, year = 2009
872,note="Available:
873\url{http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf}
874[Viewed December 9, 2009]"
875}
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 51525a30e8b4..cbc180f90194 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -8,13 +8,12 @@ would cause. This list is based on experiences reviewing such patches
8over a rather long period of time, but improvements are always welcome! 8over a rather long period of time, but improvements are always welcome!
9 9
100. Is RCU being applied to a read-mostly situation? If the data 100. Is RCU being applied to a read-mostly situation? If the data
11 structure is updated more than about 10% of the time, then 11 structure is updated more than about 10% of the time, then you
12 you should strongly consider some other approach, unless 12 should strongly consider some other approach, unless detailed
13 detailed performance measurements show that RCU is nonetheless 13 performance measurements show that RCU is nonetheless the right
14 the right tool for the job. Yes, you might think of RCU 14 tool for the job. Yes, RCU does reduce read-side overhead by
15 as simply cutting overhead off of the readers and imposing it 15 increasing write-side overhead, which is exactly why normal uses
16 on the writers. That is exactly why normal uses of RCU will 16 of RCU will do much more reading than updating.
17 do much more reading than updating.
18 17
19 Another exception is where performance is not an issue, and RCU 18 Another exception is where performance is not an issue, and RCU
20 provides a simpler implementation. An example of this situation 19 provides a simpler implementation. An example of this situation
@@ -35,13 +34,13 @@ over a rather long period of time, but improvements are always welcome!
35 34
36 If you choose #b, be prepared to describe how you have handled 35 If you choose #b, be prepared to describe how you have handled
37 memory barriers on weakly ordered machines (pretty much all of 36 memory barriers on weakly ordered machines (pretty much all of
38 them -- even x86 allows reads to be reordered), and be prepared 37 them -- even x86 allows later loads to be reordered to precede
39 to explain why this added complexity is worthwhile. If you 38 earlier stores), and be prepared to explain why this added
40 choose #c, be prepared to explain how this single task does not 39 complexity is worthwhile. If you choose #c, be prepared to
41 become a major bottleneck on big multiprocessor machines (for 40 explain how this single task does not become a major bottleneck on
42 example, if the task is updating information relating to itself 41 big multiprocessor machines (for example, if the task is updating
43 that other tasks can read, there by definition can be no 42 information relating to itself that other tasks can read, there
44 bottleneck). 43 by definition can be no bottleneck).
45 44
462. Do the RCU read-side critical sections make proper use of 452. Do the RCU read-side critical sections make proper use of
47 rcu_read_lock() and friends? These primitives are needed 46 rcu_read_lock() and friends? These primitives are needed
@@ -51,8 +50,10 @@ over a rather long period of time, but improvements are always welcome!
51 actuarial risk of your kernel. 50 actuarial risk of your kernel.
52 51
53 As a rough rule of thumb, any dereference of an RCU-protected 52 As a rough rule of thumb, any dereference of an RCU-protected
54 pointer must be covered by rcu_read_lock() or rcu_read_lock_bh() 53 pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
55 or by the appropriate update-side lock. 54 rcu_read_lock_sched(), or by the appropriate update-side lock.
55 Disabling of preemption can serve as rcu_read_lock_sched(), but
56 is less readable.
56 57
573. Does the update code tolerate concurrent accesses? 583. Does the update code tolerate concurrent accesses?
58 59
@@ -62,25 +63,27 @@ over a rather long period of time, but improvements are always welcome!
62 of ways to handle this concurrency, depending on the situation: 63 of ways to handle this concurrency, depending on the situation:
63 64
64 a. Use the RCU variants of the list and hlist update 65 a. Use the RCU variants of the list and hlist update
65 primitives to add, remove, and replace elements on an 66 primitives to add, remove, and replace elements on
66 RCU-protected list. Alternatively, use the RCU-protected 67 an RCU-protected list. Alternatively, use the other
67 trees that have been added to the Linux kernel. 68 RCU-protected data structures that have been added to
69 the Linux kernel.
68 70
69 This is almost always the best approach. 71 This is almost always the best approach.
70 72
71 b. Proceed as in (a) above, but also maintain per-element 73 b. Proceed as in (a) above, but also maintain per-element
72 locks (that are acquired by both readers and writers) 74 locks (that are acquired by both readers and writers)
73 that guard per-element state. Of course, fields that 75 that guard per-element state. Of course, fields that
74 the readers refrain from accessing can be guarded by the 76 the readers refrain from accessing can be guarded by
75 update-side lock. 77 some other lock acquired only by updaters, if desired.
76 78
77 This works quite well, also. 79 This works quite well, also.
78 80
79 c. Make updates appear atomic to readers. For example, 81 c. Make updates appear atomic to readers. For example,
80 pointer updates to properly aligned fields will appear 82 pointer updates to properly aligned fields will
81 atomic, as will individual atomic primitives. Operations 83 appear atomic, as will individual atomic primitives.
82 performed under a lock and sequences of multiple atomic 84 Sequences of perations performed under a lock will -not-
83 primitives will -not- appear to be atomic. 85 appear to be atomic to RCU readers, nor will sequences
86 of multiple atomic primitives.
84 87
85 This can work, but is starting to get a bit tricky. 88 This can work, but is starting to get a bit tricky.
86 89
@@ -98,9 +101,9 @@ over a rather long period of time, but improvements are always welcome!
98 a new structure containing updated values. 101 a new structure containing updated values.
99 102
1004. Weakly ordered CPUs pose special challenges. Almost all CPUs 1034. Weakly ordered CPUs pose special challenges. Almost all CPUs
101 are weakly ordered -- even i386 CPUs allow reads to be reordered. 104 are weakly ordered -- even x86 CPUs allow later loads to be
102 RCU code must take all of the following measures to prevent 105 reordered to precede earlier stores. RCU code must take all of
103 memory-corruption problems: 106 the following measures to prevent memory-corruption problems:
104 107
105 a. Readers must maintain proper ordering of their memory 108 a. Readers must maintain proper ordering of their memory
106 accesses. The rcu_dereference() primitive ensures that 109 accesses. The rcu_dereference() primitive ensures that
@@ -113,14 +116,25 @@ over a rather long period of time, but improvements are always welcome!
113 The rcu_dereference() primitive is also an excellent 116 The rcu_dereference() primitive is also an excellent
114 documentation aid, letting the person reading the code 117 documentation aid, letting the person reading the code
115 know exactly which pointers are protected by RCU. 118 know exactly which pointers are protected by RCU.
116 119 Please note that compilers can also reorder code, and
117 The rcu_dereference() primitive is used by the various 120 they are becoming increasingly aggressive about doing
118 "_rcu()" list-traversal primitives, such as the 121 just that. The rcu_dereference() primitive therefore
119 list_for_each_entry_rcu(). Note that it is perfectly 122 also prevents destructive compiler optimizations.
120 legal (if redundant) for update-side code to use 123
121 rcu_dereference() and the "_rcu()" list-traversal 124 The rcu_dereference() primitive is used by the
122 primitives. This is particularly useful in code 125 various "_rcu()" list-traversal primitives, such
123 that is common to readers and updaters. 126 as the list_for_each_entry_rcu(). Note that it is
127 perfectly legal (if redundant) for update-side code to
128 use rcu_dereference() and the "_rcu()" list-traversal
129 primitives. This is particularly useful in code that
130 is common to readers and updaters. However, lockdep
131 will complain if you access rcu_dereference() outside
132 of an RCU read-side critical section. See lockdep.txt
133 to learn what to do about this.
134
135 Of course, neither rcu_dereference() nor the "_rcu()"
136 list-traversal primitives can substitute for a good
137 concurrency design coordinating among multiple updaters.
124 138
125 b. If the list macros are being used, the list_add_tail_rcu() 139 b. If the list macros are being used, the list_add_tail_rcu()
126 and list_add_rcu() primitives must be used in order 140 and list_add_rcu() primitives must be used in order
@@ -135,11 +149,14 @@ over a rather long period of time, but improvements are always welcome!
135 readers. Similarly, if the hlist macros are being used, 149 readers. Similarly, if the hlist macros are being used,
136 the hlist_del_rcu() primitive is required. 150 the hlist_del_rcu() primitive is required.
137 151
138 The list_replace_rcu() primitive may be used to 152 The list_replace_rcu() and hlist_replace_rcu() primitives
139 replace an old structure with a new one in an 153 may be used to replace an old structure with a new one
140 RCU-protected list. 154 in their respective types of RCU-protected lists.
155
156 d. Rules similar to (4b) and (4c) apply to the "hlist_nulls"
157 type of RCU-protected linked lists.
141 158
142 d. Updates must ensure that initialization of a given 159 e. Updates must ensure that initialization of a given
143 structure happens before pointers to that structure are 160 structure happens before pointers to that structure are
144 publicized. Use the rcu_assign_pointer() primitive 161 publicized. Use the rcu_assign_pointer() primitive
145 when publicizing a pointer to a structure that can 162 when publicizing a pointer to a structure that can
@@ -151,16 +168,31 @@ over a rather long period of time, but improvements are always welcome!
151 it cannot block. 168 it cannot block.
152 169
1536. Since synchronize_rcu() can block, it cannot be called from 1706. Since synchronize_rcu() can block, it cannot be called from
154 any sort of irq context. Ditto for synchronize_sched() and 171 any sort of irq context. The same rule applies for
155 synchronize_srcu(). 172 synchronize_rcu_bh(), synchronize_sched(), synchronize_srcu(),
156 173 synchronize_rcu_expedited(), synchronize_rcu_bh_expedited(),
1577. If the updater uses call_rcu(), then the corresponding readers 174 synchronize_sched_expedite(), and synchronize_srcu_expedited().
158 must use rcu_read_lock() and rcu_read_unlock(). If the updater 175
159 uses call_rcu_bh(), then the corresponding readers must use 176 The expedited forms of these primitives have the same semantics
160 rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater 177 as the non-expedited forms, but expediting is both expensive
161 uses call_rcu_sched(), then the corresponding readers must 178 and unfriendly to real-time workloads. Use of the expedited
162 disable preemption. Mixing things up will result in confusion 179 primitives should be restricted to rare configuration-change
163 and broken kernels. 180 operations that would not normally be undertaken while a real-time
181 workload is running.
182
1837. If the updater uses call_rcu() or synchronize_rcu(), then the
184 corresponding readers must use rcu_read_lock() and
185 rcu_read_unlock(). If the updater uses call_rcu_bh() or
186 synchronize_rcu_bh(), then the corresponding readers must
187 use rcu_read_lock_bh() and rcu_read_unlock_bh(). If the
188 updater uses call_rcu_sched() or synchronize_sched(), then
189 the corresponding readers must disable preemption, possibly
190 by calling rcu_read_lock_sched() and rcu_read_unlock_sched().
191 If the updater uses synchronize_srcu(), the the corresponding
192 readers must use srcu_read_lock() and srcu_read_unlock(),
193 and with the same srcu_struct. The rules for the expedited
194 primitives are the same as for their non-expedited counterparts.
195 Mixing things up will result in confusion and broken kernels.
164 196
165 One exception to this rule: rcu_read_lock() and rcu_read_unlock() 197 One exception to this rule: rcu_read_lock() and rcu_read_unlock()
166 may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh() 198 may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
@@ -212,6 +244,8 @@ over a rather long period of time, but improvements are always welcome!
212 e. Periodically invoke synchronize_rcu(), permitting a limited 244 e. Periodically invoke synchronize_rcu(), permitting a limited
213 number of updates per grace period. 245 number of updates per grace period.
214 246
247 The same cautions apply to call_rcu_bh() and call_rcu_sched().
248
2159. All RCU list-traversal primitives, which include 2499. All RCU list-traversal primitives, which include
216 rcu_dereference(), list_for_each_entry_rcu(), 250 rcu_dereference(), list_for_each_entry_rcu(),
217 list_for_each_continue_rcu(), and list_for_each_safe_rcu(), 251 list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
@@ -219,7 +253,9 @@ over a rather long period of time, but improvements are always welcome!
219 must be protected by appropriate update-side locks. RCU 253 must be protected by appropriate update-side locks. RCU
220 read-side critical sections are delimited by rcu_read_lock() 254 read-side critical sections are delimited by rcu_read_lock()
221 and rcu_read_unlock(), or by similar primitives such as 255 and rcu_read_unlock(), or by similar primitives such as
222 rcu_read_lock_bh() and rcu_read_unlock_bh(). 256 rcu_read_lock_bh() and rcu_read_unlock_bh(), in which case
257 the matching rcu_dereference() primitive must be used in order
258 to keep lockdep happy, in this case, rcu_dereference_bh().
223 259
224 The reason that it is permissible to use RCU list-traversal 260 The reason that it is permissible to use RCU list-traversal
225 primitives when the update-side lock is held is that doing so 261 primitives when the update-side lock is held is that doing so
@@ -229,7 +265,8 @@ over a rather long period of time, but improvements are always welcome!
22910. Conversely, if you are in an RCU read-side critical section, 26510. Conversely, if you are in an RCU read-side critical section,
230 and you don't hold the appropriate update-side lock, you -must- 266 and you don't hold the appropriate update-side lock, you -must-
231 use the "_rcu()" variants of the list macros. Failing to do so 267 use the "_rcu()" variants of the list macros. Failing to do so
232 will break Alpha and confuse people reading your code. 268 will break Alpha, cause aggressive compilers to generate bad code,
269 and confuse people trying to read your code.
233 270
23411. Note that synchronize_rcu() -only- guarantees to wait until 27111. Note that synchronize_rcu() -only- guarantees to wait until
235 all currently executing rcu_read_lock()-protected RCU read-side 272 all currently executing rcu_read_lock()-protected RCU read-side
@@ -239,15 +276,21 @@ over a rather long period of time, but improvements are always welcome!
239 rcu_read_lock()-protected read-side critical sections, do -not- 276 rcu_read_lock()-protected read-side critical sections, do -not-
240 use synchronize_rcu(). 277 use synchronize_rcu().
241 278
242 If you want to wait for some of these other things, you might 279 Similarly, disabling preemption is not an acceptable substitute
243 instead need to use synchronize_irq() or synchronize_sched(). 280 for rcu_read_lock(). Code that attempts to use preemption
281 disabling where it should be using rcu_read_lock() will break
282 in real-time kernel builds.
283
284 If you want to wait for interrupt handlers, NMI handlers, and
285 code under the influence of preempt_disable(), you instead
286 need to use synchronize_irq() or synchronize_sched().
244 287
24512. Any lock acquired by an RCU callback must be acquired elsewhere 28812. Any lock acquired by an RCU callback must be acquired elsewhere
246 with softirq disabled, e.g., via spin_lock_irqsave(), 289 with softirq disabled, e.g., via spin_lock_irqsave(),
247 spin_lock_bh(), etc. Failing to disable irq on a given 290 spin_lock_bh(), etc. Failing to disable irq on a given
248 acquisition of that lock will result in deadlock as soon as the 291 acquisition of that lock will result in deadlock as soon as
249 RCU callback happens to interrupt that acquisition's critical 292 the RCU softirq handler happens to run your RCU callback while
250 section. 293 interrupting that acquisition's critical section.
251 294
25213. RCU callbacks can be and are executed in parallel. In many cases, 29513. RCU callbacks can be and are executed in parallel. In many cases,
253 the callback code simply wrappers around kfree(), so that this 296 the callback code simply wrappers around kfree(), so that this
@@ -265,29 +308,30 @@ over a rather long period of time, but improvements are always welcome!
265 not the case, a self-spawning RCU callback would prevent the 308 not the case, a self-spawning RCU callback would prevent the
266 victim CPU from ever going offline.) 309 victim CPU from ever going offline.)
267 310
26814. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) 31114. SRCU (srcu_read_lock(), srcu_read_unlock(), srcu_dereference(),
269 may only be invoked from process context. Unlike other forms of 312 synchronize_srcu(), and synchronize_srcu_expedited()) may only
270 RCU, it -is- permissible to block in an SRCU read-side critical 313 be invoked from process context. Unlike other forms of RCU, it
271 section (demarked by srcu_read_lock() and srcu_read_unlock()), 314 -is- permissible to block in an SRCU read-side critical section
272 hence the "SRCU": "sleepable RCU". Please note that if you 315 (demarked by srcu_read_lock() and srcu_read_unlock()), hence the
273 don't need to sleep in read-side critical sections, you should 316 "SRCU": "sleepable RCU". Please note that if you don't need
274 be using RCU rather than SRCU, because RCU is almost always 317 to sleep in read-side critical sections, you should be using
275 faster and easier to use than is SRCU. 318 RCU rather than SRCU, because RCU is almost always faster and
319 easier to use than is SRCU.
276 320
277 Also unlike other forms of RCU, explicit initialization 321 Also unlike other forms of RCU, explicit initialization
278 and cleanup is required via init_srcu_struct() and 322 and cleanup is required via init_srcu_struct() and
279 cleanup_srcu_struct(). These are passed a "struct srcu_struct" 323 cleanup_srcu_struct(). These are passed a "struct srcu_struct"
280 that defines the scope of a given SRCU domain. Once initialized, 324 that defines the scope of a given SRCU domain. Once initialized,
281 the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock() 325 the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock()
282 and synchronize_srcu(). A given synchronize_srcu() waits only 326 synchronize_srcu(), and synchronize_srcu_expedited(). A given
283 for SRCU read-side critical sections governed by srcu_read_lock() 327 synchronize_srcu() waits only for SRCU read-side critical
284 and srcu_read_unlock() calls that have been passd the same 328 sections governed by srcu_read_lock() and srcu_read_unlock()
285 srcu_struct. This property is what makes sleeping read-side 329 calls that have been passed the same srcu_struct. This property
286 critical sections tolerable -- a given subsystem delays only 330 is what makes sleeping read-side critical sections tolerable --
287 its own updates, not those of other subsystems using SRCU. 331 a given subsystem delays only its own updates, not those of other
288 Therefore, SRCU is less prone to OOM the system than RCU would 332 subsystems using SRCU. Therefore, SRCU is less prone to OOM the
289 be if RCU's read-side critical sections were permitted to 333 system than RCU would be if RCU's read-side critical sections
290 sleep. 334 were permitted to sleep.
291 335
292 The ability to sleep in read-side critical sections does not 336 The ability to sleep in read-side critical sections does not
293 come for free. First, corresponding srcu_read_lock() and 337 come for free. First, corresponding srcu_read_lock() and
@@ -311,12 +355,12 @@ over a rather long period of time, but improvements are always welcome!
311 destructive operation, and -only- -then- invoke call_rcu(), 355 destructive operation, and -only- -then- invoke call_rcu(),
312 synchronize_rcu(), or friends. 356 synchronize_rcu(), or friends.
313 357
314 Because these primitives only wait for pre-existing readers, 358 Because these primitives only wait for pre-existing readers, it
315 it is the caller's responsibility to guarantee safety to 359 is the caller's responsibility to guarantee that any subsequent
316 any subsequent readers. 360 readers will execute safely.
317 361
31816. The various RCU read-side primitives do -not- contain memory 36216. The various RCU read-side primitives do -not- necessarily contain
319 barriers. The CPU (and in some cases, the compiler) is free 363 memory barriers. You should therefore plan for the CPU
320 to reorder code into and out of RCU read-side critical sections. 364 and the compiler to freely reorder code into and out of RCU
321 It is the responsibility of the RCU update-side primitives to 365 read-side critical sections. It is the responsibility of the
322 deal with this. 366 RCU update-side primitives to deal with this.
diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt
new file mode 100644
index 000000000000..fe24b58627bd
--- /dev/null
+++ b/Documentation/RCU/lockdep.txt
@@ -0,0 +1,67 @@
1RCU and lockdep checking
2
3All flavors of RCU have lockdep checking available, so that lockdep is
4aware of when each task enters and leaves any flavor of RCU read-side
5critical section. Each flavor of RCU is tracked separately (but note
6that this is not the case in 2.6.32 and earlier). This allows lockdep's
7tracking to include RCU state, which can sometimes help when debugging
8deadlocks and the like.
9
10In addition, RCU provides the following primitives that check lockdep's
11state:
12
13 rcu_read_lock_held() for normal RCU.
14 rcu_read_lock_bh_held() for RCU-bh.
15 rcu_read_lock_sched_held() for RCU-sched.
16 srcu_read_lock_held() for SRCU.
17
18These functions are conservative, and will therefore return 1 if they
19aren't certain (for example, if CONFIG_DEBUG_LOCK_ALLOC is not set).
20This prevents things like WARN_ON(!rcu_read_lock_held()) from giving false
21positives when lockdep is disabled.
22
23In addition, a separate kernel config parameter CONFIG_PROVE_RCU enables
24checking of rcu_dereference() primitives:
25
26 rcu_dereference(p):
27 Check for RCU read-side critical section.
28 rcu_dereference_bh(p):
29 Check for RCU-bh read-side critical section.
30 rcu_dereference_sched(p):
31 Check for RCU-sched read-side critical section.
32 srcu_dereference(p, sp):
33 Check for SRCU read-side critical section.
34 rcu_dereference_check(p, c):
35 Use explicit check expression "c".
36 rcu_dereference_raw(p)
37 Don't check. (Use sparingly, if at all.)
38
39The rcu_dereference_check() check expression can be any boolean
40expression, but would normally include one of the rcu_read_lock_held()
41family of functions and a lockdep expression. However, any boolean
42expression can be used. For a moderately ornate example, consider
43the following:
44
45 file = rcu_dereference_check(fdt->fd[fd],
46 rcu_read_lock_held() ||
47 lockdep_is_held(&files->file_lock) ||
48 atomic_read(&files->count) == 1);
49
50This expression picks up the pointer "fdt->fd[fd]" in an RCU-safe manner,
51and, if CONFIG_PROVE_RCU is configured, verifies that this expression
52is used in:
53
541. An RCU read-side critical section, or
552. with files->file_lock held, or
563. on an unshared files_struct.
57
58In case (1), the pointer is picked up in an RCU-safe manner for vanilla
59RCU read-side critical sections, in case (2) the ->file_lock prevents
60any change from taking place, and finally, in case (3) the current task
61is the only task accessing the file_struct, again preventing any change
62from taking place.
63
64There are currently only "universal" versions of the rcu_assign_pointer()
65and RCU list-/tree-traversal primitives, which do not (yet) check for
66being in an RCU read-side critical section. In the future, separate
67versions of these primitives might be created.
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
index 2a23523ce471..31852705b586 100644
--- a/Documentation/RCU/rcu.txt
+++ b/Documentation/RCU/rcu.txt
@@ -75,6 +75,8 @@ o I hear that RCU is patented? What is with that?
75 search for the string "Patent" in RTFP.txt to find them. 75 search for the string "Patent" in RTFP.txt to find them.
76 Of these, one was allowed to lapse by the assignee, and the 76 Of these, one was allowed to lapse by the assignee, and the
77 others have been contributed to the Linux kernel under GPL. 77 others have been contributed to the Linux kernel under GPL.
78 There are now also LGPL implementations of user-level RCU
79 available (http://lttng.org/?q=node/18).
78 80
79o I hear that RCU needs work in order to support realtime kernels? 81o I hear that RCU needs work in order to support realtime kernels?
80 82
@@ -91,48 +93,4 @@ o Where can I find more information on RCU?
91 93
92o What are all these files in this directory? 94o What are all these files in this directory?
93 95
94 96 See 00-INDEX for the list.
95 NMI-RCU.txt
96
97 Describes how to use RCU to implement dynamic
98 NMI handlers, which can be revectored on the fly,
99 without rebooting.
100
101 RTFP.txt
102
103 List of RCU-related publications and web sites.
104
105 UP.txt
106
107 Discussion of RCU usage in UP kernels.
108
109 arrayRCU.txt
110
111 Describes how to use RCU to protect arrays, with
112 resizeable arrays whose elements reference other
113 data structures being of the most interest.
114
115 checklist.txt
116
117 Lists things to check for when inspecting code that
118 uses RCU.
119
120 listRCU.txt
121
122 Describes how to use RCU to protect linked lists.
123 This is the simplest and most common use of RCU
124 in the Linux kernel.
125
126 rcu.txt
127
128 You are reading it!
129
130 rcuref.txt
131
132 Describes how to combine use of reference counts
133 with RCU.
134
135 whatisRCU.txt
136
137 Overview of how the RCU implementation works. Along
138 the way, presents a conceptual view of RCU.
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt
new file mode 100644
index 000000000000..1423d2570d78
--- /dev/null
+++ b/Documentation/RCU/stallwarn.txt
@@ -0,0 +1,58 @@
1Using RCU's CPU Stall Detector
2
3The CONFIG_RCU_CPU_STALL_DETECTOR kernel config parameter enables
4RCU's CPU stall detector, which detects conditions that unduly delay
5RCU grace periods. The stall detector's idea of what constitutes
6"unduly delayed" is controlled by a pair of C preprocessor macros:
7
8RCU_SECONDS_TILL_STALL_CHECK
9
10 This macro defines the period of time that RCU will wait from
11 the beginning of a grace period until it issues an RCU CPU
12 stall warning. It is normally ten seconds.
13
14RCU_SECONDS_TILL_STALL_RECHECK
15
16 This macro defines the period of time that RCU will wait after
17 issuing a stall warning until it issues another stall warning.
18 It is normally set to thirty seconds.
19
20RCU_STALL_RAT_DELAY
21
22 The CPU stall detector tries to make the offending CPU rat on itself,
23 as this often gives better-quality stack traces. However, if
24 the offending CPU does not detect its own stall in the number
25 of jiffies specified by RCU_STALL_RAT_DELAY, then other CPUs will
26 complain. This is normally set to two jiffies.
27
28The following problems can result in an RCU CPU stall warning:
29
30o A CPU looping in an RCU read-side critical section.
31
32o A CPU looping with interrupts disabled.
33
34o A CPU looping with preemption disabled.
35
36o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
37 without invoking schedule().
38
39o A bug in the RCU implementation.
40
41o A hardware failure. This is quite unlikely, but has occurred
42 at least once in a former life. A CPU failed in a running system,
43 becoming unresponsive, but not causing an immediate crash.
44 This resulted in a series of RCU CPU stall warnings, eventually
45 leading the realization that the CPU had failed.
46
47The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning.
48SRCU does not do so directly, but its calls to synchronize_sched() will
49result in RCU-sched detecting any CPU stalls that might be occurring.
50
51To diagnose the cause of the stall, inspect the stack traces. The offending
52function will usually be near the top of the stack. If you have a series
53of stall warnings from a single extended stall, comparing the stack traces
54can often help determine where the stall is occurring, which will usually
55be in the function nearest the top of the stack that stays the same from
56trace to trace.
57
58RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE.
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index 9dba3bb90e60..0e50bc2aa1e2 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -30,6 +30,18 @@ MODULE PARAMETERS
30 30
31This module has the following parameters: 31This module has the following parameters:
32 32
33fqs_duration Duration (in microseconds) of artificially induced bursts
34 of force_quiescent_state() invocations. In RCU
35 implementations having force_quiescent_state(), these
36 bursts help force races between forcing a given grace
37 period and that grace period ending on its own.
38
39fqs_holdoff Holdoff time (in microseconds) between consecutive calls
40 to force_quiescent_state() within a burst.
41
42fqs_stutter Wait time (in seconds) between consecutive bursts
43 of calls to force_quiescent_state().
44
33irqreaders Says to invoke RCU readers from irq level. This is currently 45irqreaders Says to invoke RCU readers from irq level. This is currently
34 done via timers. Defaults to "1" for variants of RCU that 46 done via timers. Defaults to "1" for variants of RCU that
35 permit this. (Or, more accurately, variants of RCU that do 47 permit this. (Or, more accurately, variants of RCU that do
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index d542ca243b80..1dc00ee97163 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -323,14 +323,17 @@ used as follows:
323 Defer Protect 323 Defer Protect
324 324
325a. synchronize_rcu() rcu_read_lock() / rcu_read_unlock() 325a. synchronize_rcu() rcu_read_lock() / rcu_read_unlock()
326 call_rcu() 326 call_rcu() rcu_dereference()
327 327
328b. call_rcu_bh() rcu_read_lock_bh() / rcu_read_unlock_bh() 328b. call_rcu_bh() rcu_read_lock_bh() / rcu_read_unlock_bh()
329 rcu_dereference_bh()
329 330
330c. synchronize_sched() preempt_disable() / preempt_enable() 331c. synchronize_sched() rcu_read_lock_sched() / rcu_read_unlock_sched()
332 preempt_disable() / preempt_enable()
331 local_irq_save() / local_irq_restore() 333 local_irq_save() / local_irq_restore()
332 hardirq enter / hardirq exit 334 hardirq enter / hardirq exit
333 NMI enter / NMI exit 335 NMI enter / NMI exit
336 rcu_dereference_sched()
334 337
335These three mechanisms are used as follows: 338These three mechanisms are used as follows:
336 339
@@ -780,9 +783,8 @@ Linux-kernel source code, but it helps to have a full list of the
780APIs, since there does not appear to be a way to categorize them 783APIs, since there does not appear to be a way to categorize them
781in docbook. Here is the list, by category. 784in docbook. Here is the list, by category.
782 785
783RCU pointer/list traversal: 786RCU list traversal:
784 787
785 rcu_dereference
786 list_for_each_entry_rcu 788 list_for_each_entry_rcu
787 hlist_for_each_entry_rcu 789 hlist_for_each_entry_rcu
788 hlist_nulls_for_each_entry_rcu 790 hlist_nulls_for_each_entry_rcu
@@ -808,7 +810,7 @@ RCU: Critical sections Grace period Barrier
808 810
809 rcu_read_lock synchronize_net rcu_barrier 811 rcu_read_lock synchronize_net rcu_barrier
810 rcu_read_unlock synchronize_rcu 812 rcu_read_unlock synchronize_rcu
811 synchronize_rcu_expedited 813 rcu_dereference synchronize_rcu_expedited
812 call_rcu 814 call_rcu
813 815
814 816
@@ -816,7 +818,7 @@ bh: Critical sections Grace period Barrier
816 818
817 rcu_read_lock_bh call_rcu_bh rcu_barrier_bh 819 rcu_read_lock_bh call_rcu_bh rcu_barrier_bh
818 rcu_read_unlock_bh synchronize_rcu_bh 820 rcu_read_unlock_bh synchronize_rcu_bh
819 synchronize_rcu_bh_expedited 821 rcu_dereference_bh synchronize_rcu_bh_expedited
820 822
821 823
822sched: Critical sections Grace period Barrier 824sched: Critical sections Grace period Barrier
@@ -825,12 +827,14 @@ sched: Critical sections Grace period Barrier
825 rcu_read_unlock_sched call_rcu_sched 827 rcu_read_unlock_sched call_rcu_sched
826 [preempt_disable] synchronize_sched_expedited 828 [preempt_disable] synchronize_sched_expedited
827 [and friends] 829 [and friends]
830 rcu_dereference_sched
828 831
829 832
830SRCU: Critical sections Grace period Barrier 833SRCU: Critical sections Grace period Barrier
831 834
832 srcu_read_lock synchronize_srcu N/A 835 srcu_read_lock synchronize_srcu N/A
833 srcu_read_unlock synchronize_srcu_expedited 836 srcu_read_unlock synchronize_srcu_expedited
837 srcu_dereference
834 838
835SRCU: Initialization/cleanup 839SRCU: Initialization/cleanup
836 init_srcu_struct 840 init_srcu_struct
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index 1053a56be3b1..8916ca48bc95 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -9,10 +9,14 @@ Documentation/SubmittingPatches and elsewhere regarding submitting Linux
9kernel patches. 9kernel patches.
10 10
11 11
121: Builds cleanly with applicable or modified CONFIG options =y, =m, and 121: If you use a facility then #include the file that defines/declares
13 that facility. Don't depend on other header files pulling in ones
14 that you use.
15
162: Builds cleanly with applicable or modified CONFIG options =y, =m, and
13 =n. No gcc warnings/errors, no linker warnings/errors. 17 =n. No gcc warnings/errors, no linker warnings/errors.
14 18
152: Passes allnoconfig, allmodconfig 192b: Passes allnoconfig, allmodconfig
16 20
173: Builds on multiple CPU architectures by using local cross-compile tools 213: Builds on multiple CPU architectures by using local cross-compile tools
18 or some other build farm. 22 or some other build farm.
diff --git a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
index 76b3a11e90be..fa968aa99d67 100644
--- a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
+++ b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt
@@ -14,8 +14,8 @@ Introduction
14 how the clocks are arranged. The first implementation used as single 14 how the clocks are arranged. The first implementation used as single
15 PLL to feed the ARM, memory and peripherals via a series of dividers 15 PLL to feed the ARM, memory and peripherals via a series of dividers
16 and muxes and this is the implementation that is documented here. A 16 and muxes and this is the implementation that is documented here. A
17 newer version where there is a seperate PLL and clock divider for the 17 newer version where there is a separate PLL and clock divider for the
18 ARM core is available as a seperate driver. 18 ARM core is available as a separate driver.
19 19
20 20
21Layout 21Layout
diff --git a/Documentation/arm/Samsung/Overview.txt b/Documentation/arm/Samsung/Overview.txt
new file mode 100644
index 000000000000..7cced1fea9c3
--- /dev/null
+++ b/Documentation/arm/Samsung/Overview.txt
@@ -0,0 +1,86 @@
1 Samsung ARM Linux Overview
2 ==========================
3
4Introduction
5------------
6
7 The Samsung range of ARM SoCs spans many similar devices, from the initial
8 ARM9 through to the newest ARM cores. This document shows an overview of
9 the current kernel support, how to use it and where to find the code
10 that supports this.
11
12 The currently supported SoCs are:
13
14 - S3C24XX: See Documentation/arm/Samsung-S3C24XX/Overview.txt for full list
15 - S3C64XX: S3C6400 and S3C6410
16 - S5PC6440
17
18 S5PC100 and S5PC110 support is currently being merged
19
20
21S3C24XX Systems
22---------------
23
24 There is still documentation in Documnetation/arm/Samsung-S3C24XX/ which
25 deals with the architecture and drivers specific to these devices.
26
27 See Documentation/arm/Samsung-S3C24XX/Overview.txt for more information
28 on the implementation details and specific support.
29
30
31Configuration
32-------------
33
34 A number of configurations are supplied, as there is no current way of
35 unifying all the SoCs into one kernel.
36
37 s5p6440_defconfig - S5P6440 specific default configuration
38 s5pc100_defconfig - S5PC100 specific default configuration
39
40
41Layout
42------
43
44 The directory layout is currently being restructured, and consists of
45 several platform directories and then the machine specific directories
46 of the CPUs being built for.
47
48 plat-samsung provides the base for all the implementations, and is the
49 last in the line of include directories that are processed for the build
50 specific information. It contains the base clock, GPIO and device definitions
51 to get the system running.
52
53 plat-s3c is the s3c24xx/s3c64xx platform directory, although it is currently
54 involved in other builds this will be phased out once the relevant code is
55 moved elsewhere.
56
57 plat-s3c24xx is for s3c24xx specific builds, see the S3C24XX docs.
58
59 plat-s3c64xx is for the s3c64xx specific bits, see the S3C24XX docs.
60
61 plat-s5p is for s5p specific builds, more to be added.
62
63
64 [ to finish ]
65
66
67Port Contributors
68-----------------
69
70 Ben Dooks (BJD)
71 Vincent Sanders
72 Herbert Potzl
73 Arnaud Patard (RTP)
74 Roc Wu
75 Klaus Fetscher
76 Dimitry Andric
77 Shannon Holland
78 Guillaume Gourat (NexVision)
79 Christer Weinigel (wingel) (Acer N30)
80 Lucas Correia Villa Real (S3C2400 port)
81
82
83Document Author
84---------------
85
86Copyright 2009-2010 Ben Dooks <ben-linux@fluff.org>
diff --git a/Documentation/arm/Samsung/clksrc-change-registers.awk b/Documentation/arm/Samsung/clksrc-change-registers.awk
new file mode 100755
index 000000000000..0c50220851fb
--- /dev/null
+++ b/Documentation/arm/Samsung/clksrc-change-registers.awk
@@ -0,0 +1,167 @@
1#!/usr/bin/awk -f
2#
3# Copyright 2010 Ben Dooks <ben-linux@fluff.org>
4#
5# Released under GPLv2
6
7# example usage
8# ./clksrc-change-registers.awk arch/arm/plat-s5pc1xx/include/plat/regs-clock.h < src > dst
9
10function extract_value(s)
11{
12 eqat = index(s, "=")
13 comat = index(s, ",")
14 return substr(s, eqat+2, (comat-eqat)-2)
15}
16
17function remove_brackets(b)
18{
19 return substr(b, 2, length(b)-2)
20}
21
22function splitdefine(l, p)
23{
24 r = split(l, tp)
25
26 p[0] = tp[2]
27 p[1] = remove_brackets(tp[3])
28}
29
30function find_length(f)
31{
32 if (0)
33 printf "find_length " f "\n" > "/dev/stderr"
34
35 if (f ~ /0x1/)
36 return 1
37 else if (f ~ /0x3/)
38 return 2
39 else if (f ~ /0x7/)
40 return 3
41 else if (f ~ /0xf/)
42 return 4
43
44 printf "unknown legnth " f "\n" > "/dev/stderr"
45 exit
46}
47
48function find_shift(s)
49{
50 id = index(s, "<")
51 if (id <= 0) {
52 printf "cannot find shift " s "\n" > "/dev/stderr"
53 exit
54 }
55
56 return substr(s, id+2)
57}
58
59
60BEGIN {
61 if (ARGC < 2) {
62 print "too few arguments" > "/dev/stderr"
63 exit
64 }
65
66# read the header file and find the mask values that we will need
67# to replace and create an associative array of values
68
69 while (getline line < ARGV[1] > 0) {
70 if (line ~ /\#define.*_MASK/ &&
71 !(line ~ /S5PC100_EPLL_MASK/) &&
72 !(line ~ /USB_SIG_MASK/)) {
73 splitdefine(line, fields)
74 name = fields[0]
75 if (0)
76 printf "MASK " line "\n" > "/dev/stderr"
77 dmask[name,0] = find_length(fields[1])
78 dmask[name,1] = find_shift(fields[1])
79 if (0)
80 printf "=> '" name "' LENGTH=" dmask[name,0] " SHIFT=" dmask[name,1] "\n" > "/dev/stderr"
81 } else {
82 }
83 }
84
85 delete ARGV[1]
86}
87
88/clksrc_clk.*=.*{/ {
89 shift=""
90 mask=""
91 divshift=""
92 reg_div=""
93 reg_src=""
94 indent=1
95
96 print $0
97
98 for(; indent >= 1;) {
99 if ((getline line) <= 0) {
100 printf "unexpected end of file" > "/dev/stderr"
101 exit 1;
102 }
103
104 if (line ~ /\.shift/) {
105 shift = extract_value(line)
106 } else if (line ~ /\.mask/) {
107 mask = extract_value(line)
108 } else if (line ~ /\.reg_divider/) {
109 reg_div = extract_value(line)
110 } else if (line ~ /\.reg_source/) {
111 reg_src = extract_value(line)
112 } else if (line ~ /\.divider_shift/) {
113 divshift = extract_value(line)
114 } else if (line ~ /{/) {
115 indent++
116 print line
117 } else if (line ~ /}/) {
118 indent--
119
120 if (indent == 0) {
121 if (0) {
122 printf "shift '" shift "' ='" dmask[shift,0] "'\n" > "/dev/stderr"
123 printf "mask '" mask "'\n" > "/dev/stderr"
124 printf "dshft '" divshift "'\n" > "/dev/stderr"
125 printf "rdiv '" reg_div "'\n" > "/dev/stderr"
126 printf "rsrc '" reg_src "'\n" > "/dev/stderr"
127 }
128
129 generated = mask
130 sub(reg_src, reg_div, generated)
131
132 if (0) {
133 printf "/* rsrc " reg_src " */\n"
134 printf "/* rdiv " reg_div " */\n"
135 printf "/* shift " shift " */\n"
136 printf "/* mask " mask " */\n"
137 printf "/* generated " generated " */\n"
138 }
139
140 if (reg_div != "") {
141 printf "\t.reg_div = { "
142 printf ".reg = " reg_div ", "
143 printf ".shift = " dmask[generated,1] ", "
144 printf ".size = " dmask[generated,0] ", "
145 printf "},\n"
146 }
147
148 printf "\t.reg_src = { "
149 printf ".reg = " reg_src ", "
150 printf ".shift = " dmask[mask,1] ", "
151 printf ".size = " dmask[mask,0] ", "
152
153 printf "},\n"
154
155 }
156
157 print line
158 } else {
159 print line
160 }
161
162 if (0)
163 printf indent ":" line "\n" > "/dev/stderr"
164 }
165}
166
167// && ! /clksrc_clk.*=.*{/ { print $0 }
diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt
index 9d58c7c5eddd..eb0fae18ffb1 100644
--- a/Documentation/arm/memory.txt
+++ b/Documentation/arm/memory.txt
@@ -59,7 +59,11 @@ PAGE_OFFSET high_memory-1 Kernel direct-mapped RAM region.
59 This maps the platforms RAM, and typically 59 This maps the platforms RAM, and typically
60 maps all platform RAM in a 1:1 relationship. 60 maps all platform RAM in a 1:1 relationship.
61 61
62TASK_SIZE PAGE_OFFSET-1 Kernel module space 62PKMAP_BASE PAGE_OFFSET-1 Permanent kernel mappings
63 One way of mapping HIGHMEM pages into kernel
64 space.
65
66MODULES_VADDR MODULES_END-1 Kernel module space
63 Kernel modules inserted via insmod are 67 Kernel modules inserted via insmod are
64 placed here using dynamic mappings. 68 placed here using dynamic mappings.
65 69
diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt
index e164403f60e1..f65274081c8d 100644
--- a/Documentation/block/queue-sysfs.txt
+++ b/Documentation/block/queue-sysfs.txt
@@ -25,11 +25,11 @@ size allowed by the hardware.
25 25
26nomerges (RW) 26nomerges (RW)
27------------- 27-------------
28This enables the user to disable the lookup logic involved with IO merging 28This enables the user to disable the lookup logic involved with IO
29requests in the block layer. Merging may still occur through a direct 29merging requests in the block layer. By default (0) all merges are
301-hit cache, since that comes for (almost) free. The IO scheduler will not 30enabled. When set to 1 only simple one-hit merges will be tried. When
31waste cycles doing tree/hash lookups for merges if nomerges is 1. Defaults 31set to 2 no merge algorithms will be tried (including one-hit or more
32to 0, enabling all merges. 32complex tree/hash lookups).
33 33
34nr_requests (RW) 34nr_requests (RW)
35---------------- 35----------------
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index da42ab414c48..2b5f823abd03 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -88,12 +88,12 @@ changes occur:
88 This is used primarily during fault processing. 88 This is used primarily during fault processing.
89 89
905) void update_mmu_cache(struct vm_area_struct *vma, 905) void update_mmu_cache(struct vm_area_struct *vma,
91 unsigned long address, pte_t pte) 91 unsigned long address, pte_t *ptep)
92 92
93 At the end of every page fault, this routine is invoked to 93 At the end of every page fault, this routine is invoked to
94 tell the architecture specific code that a translation 94 tell the architecture specific code that a translation
95 described by "pte" now exists at virtual address "address" 95 now exists at virtual address "address" for address space
96 for address space "vma->vm_mm", in the software page tables. 96 "vma->vm_mm", in the software page tables.
97 97
98 A port may use this information in any way it so chooses. 98 A port may use this information in any way it so chooses.
99 For example, it could use this event to pre-load TLB 99 For example, it could use this event to pre-load TLB
@@ -377,3 +377,27 @@ maps this page at its virtual address.
377 All the functionality of flush_icache_page can be implemented in 377 All the functionality of flush_icache_page can be implemented in
378 flush_dcache_page and update_mmu_cache. In 2.7 the hope is to 378 flush_dcache_page and update_mmu_cache. In 2.7 the hope is to
379 remove this interface completely. 379 remove this interface completely.
380
381The final category of APIs is for I/O to deliberately aliased address
382ranges inside the kernel. Such aliases are set up by use of the
383vmap/vmalloc API. Since kernel I/O goes via physical pages, the I/O
384subsystem assumes that the user mapping and kernel offset mapping are
385the only aliases. This isn't true for vmap aliases, so anything in
386the kernel trying to do I/O to vmap areas must manually manage
387coherency. It must do this by flushing the vmap range before doing
388I/O and invalidating it after the I/O returns.
389
390 void flush_kernel_vmap_range(void *vaddr, int size)
391 flushes the kernel cache for a given virtual address range in
392 the vmap area. This is to make sure that any data the kernel
393 modified in the vmap range is made visible to the physical
394 page. The design is to make this area safe to perform I/O on.
395 Note that this API does *not* also flush the offset map alias
396 of the area.
397
398 void invalidate_kernel_vmap_range(void *vaddr, int size) invalidates
399 the cache for a given virtual address range in the vmap area
400 which prevents the processor from making the cache stale by
401 speculatively reading data while the I/O was occurring to the
402 physical pages. This is only necessary for data reads into the
403 vmap area.
diff --git a/Documentation/cdrom/ide-cd b/Documentation/cdrom/ide-cd
index 2c558cd6c1ef..f4dc9de2694e 100644
--- a/Documentation/cdrom/ide-cd
+++ b/Documentation/cdrom/ide-cd
@@ -159,42 +159,7 @@ two arguments: the CDROM device, and the slot number to which you wish
159to change. If the slot number is -1, the drive is unloaded. 159to change. If the slot number is -1, the drive is unloaded.
160 160
161 161
1624. Compilation options 1624. Common problems
163----------------------
164
165There are a few additional options which can be set when compiling the
166driver. Most people should not need to mess with any of these; they
167are listed here simply for completeness. A compilation option can be
168enabled by adding a line of the form `#define <option> 1' to the top
169of ide-cd.c. All these options are disabled by default.
170
171VERBOSE_IDE_CD_ERRORS
172 If this is set, ATAPI error codes will be translated into textual
173 descriptions. In addition, a dump is made of the command which
174 provoked the error. This is off by default to save the memory used
175 by the (somewhat long) table of error descriptions.
176
177STANDARD_ATAPI
178 If this is set, the code needed to deal with certain drives which do
179 not properly implement the ATAPI spec will be disabled. If you know
180 your drive implements ATAPI properly, you can turn this on to get a
181 slightly smaller kernel.
182
183NO_DOOR_LOCKING
184 If this is set, the driver will never attempt to lock the door of
185 the drive.
186
187CDROM_NBLOCKS_BUFFER
188 This sets the size of the buffer to be used for a CDROMREADAUDIO
189 ioctl. The default is 8.
190
191TEST
192 This currently enables an additional ioctl which enables a user-mode
193 program to execute an arbitrary packet command. See the source for
194 details. This should be left off unless you know what you're doing.
195
196
1975. Common problems
198------------------ 163------------------
199 164
200This section discusses some common problems encountered when trying to 165This section discusses some common problems encountered when trying to
@@ -371,7 +336,7 @@ f. Data corruption.
371 expense of low system performance. 336 expense of low system performance.
372 337
373 338
3746. cdchange.c 3395. cdchange.c
375------------- 340-------------
376 341
377/* 342/*
diff --git a/Documentation/cgroups/cgroup_event_listener.c b/Documentation/cgroups/cgroup_event_listener.c
new file mode 100644
index 000000000000..8c2bfc4a6358
--- /dev/null
+++ b/Documentation/cgroups/cgroup_event_listener.c
@@ -0,0 +1,110 @@
1/*
2 * cgroup_event_listener.c - Simple listener of cgroup events
3 *
4 * Copyright (C) Kirill A. Shutemov <kirill@shutemov.name>
5 */
6
7#include <assert.h>
8#include <errno.h>
9#include <fcntl.h>
10#include <libgen.h>
11#include <limits.h>
12#include <stdio.h>
13#include <string.h>
14#include <unistd.h>
15
16#include <sys/eventfd.h>
17
18#define USAGE_STR "Usage: cgroup_event_listener <path-to-control-file> <args>\n"
19
20int main(int argc, char **argv)
21{
22 int efd = -1;
23 int cfd = -1;
24 int event_control = -1;
25 char event_control_path[PATH_MAX];
26 char line[LINE_MAX];
27 int ret;
28
29 if (argc != 3) {
30 fputs(USAGE_STR, stderr);
31 return 1;
32 }
33
34 cfd = open(argv[1], O_RDONLY);
35 if (cfd == -1) {
36 fprintf(stderr, "Cannot open %s: %s\n", argv[1],
37 strerror(errno));
38 goto out;
39 }
40
41 ret = snprintf(event_control_path, PATH_MAX, "%s/cgroup.event_control",
42 dirname(argv[1]));
43 if (ret >= PATH_MAX) {
44 fputs("Path to cgroup.event_control is too long\n", stderr);
45 goto out;
46 }
47
48 event_control = open(event_control_path, O_WRONLY);
49 if (event_control == -1) {
50 fprintf(stderr, "Cannot open %s: %s\n", event_control_path,
51 strerror(errno));
52 goto out;
53 }
54
55 efd = eventfd(0, 0);
56 if (efd == -1) {
57 perror("eventfd() failed");
58 goto out;
59 }
60
61 ret = snprintf(line, LINE_MAX, "%d %d %s", efd, cfd, argv[2]);
62 if (ret >= LINE_MAX) {
63 fputs("Arguments string is too long\n", stderr);
64 goto out;
65 }
66
67 ret = write(event_control, line, strlen(line) + 1);
68 if (ret == -1) {
69 perror("Cannot write to cgroup.event_control");
70 goto out;
71 }
72
73 while (1) {
74 uint64_t result;
75
76 ret = read(efd, &result, sizeof(result));
77 if (ret == -1) {
78 if (errno == EINTR)
79 continue;
80 perror("Cannot read from eventfd");
81 break;
82 }
83 assert(ret == sizeof(result));
84
85 ret = access(event_control_path, W_OK);
86 if ((ret == -1) && (errno == ENOENT)) {
87 puts("The cgroup seems to have removed.");
88 ret = 0;
89 break;
90 }
91
92 if (ret == -1) {
93 perror("cgroup.event_control "
94 "is not accessable any more");
95 break;
96 }
97
98 printf("%s %s: crossed\n", argv[1], argv[2]);
99 }
100
101out:
102 if (efd >= 0)
103 close(efd);
104 if (event_control >= 0)
105 close(event_control);
106 if (cfd >= 0)
107 close(cfd);
108
109 return (ret != 0);
110}
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 0b33bfe7dde9..fd588ff0e296 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -22,6 +22,8 @@ CONTENTS:
222. Usage Examples and Syntax 222. Usage Examples and Syntax
23 2.1 Basic Usage 23 2.1 Basic Usage
24 2.2 Attaching processes 24 2.2 Attaching processes
25 2.3 Mounting hierarchies by name
26 2.4 Notification API
253. Kernel API 273. Kernel API
26 3.1 Overview 28 3.1 Overview
27 3.2 Synchronization 29 3.2 Synchronization
@@ -434,6 +436,25 @@ you give a subsystem a name.
434The name of the subsystem appears as part of the hierarchy description 436The name of the subsystem appears as part of the hierarchy description
435in /proc/mounts and /proc/<pid>/cgroups. 437in /proc/mounts and /proc/<pid>/cgroups.
436 438
4392.4 Notification API
440--------------------
441
442There is mechanism which allows to get notifications about changing
443status of a cgroup.
444
445To register new notification handler you need:
446 - create a file descriptor for event notification using eventfd(2);
447 - open a control file to be monitored (e.g. memory.usage_in_bytes);
448 - write "<event_fd> <control_fd> <args>" to cgroup.event_control.
449 Interpretation of args is defined by control file implementation;
450
451eventfd will be woken up by control file implementation or when the
452cgroup is removed.
453
454To unregister notification handler just close eventfd.
455
456NOTE: Support of notifications should be implemented for the control
457file. See documentation for the subsystem.
437 458
4383. Kernel API 4593. Kernel API
439============= 460=============
@@ -488,6 +509,11 @@ Each subsystem should:
488- add an entry in linux/cgroup_subsys.h 509- add an entry in linux/cgroup_subsys.h
489- define a cgroup_subsys object called <name>_subsys 510- define a cgroup_subsys object called <name>_subsys
490 511
512If a subsystem can be compiled as a module, it should also have in its
513module initcall a call to cgroup_load_subsys(), and in its exitcall a
514call to cgroup_unload_subsys(). It should also set its_subsys.module =
515THIS_MODULE in its .c file.
516
491Each subsystem may export the following methods. The only mandatory 517Each subsystem may export the following methods. The only mandatory
492methods are create/destroy. Any others that are null are presumed to 518methods are create/destroy. Any others that are null are presumed to
493be successful no-ops. 519be successful no-ops.
@@ -536,10 +562,21 @@ returns an error, this will abort the attach operation. If a NULL
536task is passed, then a successful result indicates that *any* 562task is passed, then a successful result indicates that *any*
537unspecified task can be moved into the cgroup. Note that this isn't 563unspecified task can be moved into the cgroup. Note that this isn't
538called on a fork. If this method returns 0 (success) then this should 564called on a fork. If this method returns 0 (success) then this should
539remain valid while the caller holds cgroup_mutex. If threadgroup is 565remain valid while the caller holds cgroup_mutex and it is ensured that either
566attach() or cancel_attach() will be called in future. If threadgroup is
540true, then a successful result indicates that all threads in the given 567true, then a successful result indicates that all threads in the given
541thread's threadgroup can be moved together. 568thread's threadgroup can be moved together.
542 569
570void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
571 struct task_struct *task, bool threadgroup)
572(cgroup_mutex held by caller)
573
574Called when a task attach operation has failed after can_attach() has succeeded.
575A subsystem whose can_attach() has some side-effects should provide this
576function, so that the subsytem can implement a rollback. If not, not necessary.
577This will be called only about subsystems whose can_attach() operation have
578succeeded.
579
543void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, 580void attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
544 struct cgroup *old_cgrp, struct task_struct *task, 581 struct cgroup *old_cgrp, struct task_struct *task,
545 bool threadgroup) 582 bool threadgroup)
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index 1d7e9784439a..4160df82b3f5 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -168,20 +168,20 @@ Each cpuset is represented by a directory in the cgroup file system
168containing (on top of the standard cgroup files) the following 168containing (on top of the standard cgroup files) the following
169files describing that cpuset: 169files describing that cpuset:
170 170
171 - cpus: list of CPUs in that cpuset 171 - cpuset.cpus: list of CPUs in that cpuset
172 - mems: list of Memory Nodes in that cpuset 172 - cpuset.mems: list of Memory Nodes in that cpuset
173 - memory_migrate flag: if set, move pages to cpusets nodes 173 - cpuset.memory_migrate flag: if set, move pages to cpusets nodes
174 - cpu_exclusive flag: is cpu placement exclusive? 174 - cpuset.cpu_exclusive flag: is cpu placement exclusive?
175 - mem_exclusive flag: is memory placement exclusive? 175 - cpuset.mem_exclusive flag: is memory placement exclusive?
176 - mem_hardwall flag: is memory allocation hardwalled 176 - cpuset.mem_hardwall flag: is memory allocation hardwalled
177 - memory_pressure: measure of how much paging pressure in cpuset 177 - cpuset.memory_pressure: measure of how much paging pressure in cpuset
178 - memory_spread_page flag: if set, spread page cache evenly on allowed nodes 178 - cpuset.memory_spread_page flag: if set, spread page cache evenly on allowed nodes
179 - memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes 179 - cpuset.memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes
180 - sched_load_balance flag: if set, load balance within CPUs on that cpuset 180 - cpuset.sched_load_balance flag: if set, load balance within CPUs on that cpuset
181 - sched_relax_domain_level: the searching range when migrating tasks 181 - cpuset.sched_relax_domain_level: the searching range when migrating tasks
182 182
183In addition, the root cpuset only has the following file: 183In addition, the root cpuset only has the following file:
184 - memory_pressure_enabled flag: compute memory_pressure? 184 - cpuset.memory_pressure_enabled flag: compute memory_pressure?
185 185
186New cpusets are created using the mkdir system call or shell 186New cpusets are created using the mkdir system call or shell
187command. The properties of a cpuset, such as its flags, allowed 187command. The properties of a cpuset, such as its flags, allowed
@@ -229,7 +229,7 @@ If a cpuset is cpu or mem exclusive, no other cpuset, other than
229a direct ancestor or descendant, may share any of the same CPUs or 229a direct ancestor or descendant, may share any of the same CPUs or
230Memory Nodes. 230Memory Nodes.
231 231
232A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled", 232A cpuset that is cpuset.mem_exclusive *or* cpuset.mem_hardwall is "hardwalled",
233i.e. it restricts kernel allocations for page, buffer and other data 233i.e. it restricts kernel allocations for page, buffer and other data
234commonly shared by the kernel across multiple users. All cpusets, 234commonly shared by the kernel across multiple users. All cpusets,
235whether hardwalled or not, restrict allocations of memory for user 235whether hardwalled or not, restrict allocations of memory for user
@@ -304,15 +304,15 @@ times 1000.
304--------------------------- 304---------------------------
305There are two boolean flag files per cpuset that control where the 305There are two boolean flag files per cpuset that control where the
306kernel allocates pages for the file system buffers and related in 306kernel allocates pages for the file system buffers and related in
307kernel data structures. They are called 'memory_spread_page' and 307kernel data structures. They are called 'cpuset.memory_spread_page' and
308'memory_spread_slab'. 308'cpuset.memory_spread_slab'.
309 309
310If the per-cpuset boolean flag file 'memory_spread_page' is set, then 310If the per-cpuset boolean flag file 'cpuset.memory_spread_page' is set, then
311the kernel will spread the file system buffers (page cache) evenly 311the kernel will spread the file system buffers (page cache) evenly
312over all the nodes that the faulting task is allowed to use, instead 312over all the nodes that the faulting task is allowed to use, instead
313of preferring to put those pages on the node where the task is running. 313of preferring to put those pages on the node where the task is running.
314 314
315If the per-cpuset boolean flag file 'memory_spread_slab' is set, 315If the per-cpuset boolean flag file 'cpuset.memory_spread_slab' is set,
316then the kernel will spread some file system related slab caches, 316then the kernel will spread some file system related slab caches,
317such as for inodes and dentries evenly over all the nodes that the 317such as for inodes and dentries evenly over all the nodes that the
318faulting task is allowed to use, instead of preferring to put those 318faulting task is allowed to use, instead of preferring to put those
@@ -337,21 +337,21 @@ their containing tasks memory spread settings. If memory spreading
337is turned off, then the currently specified NUMA mempolicy once again 337is turned off, then the currently specified NUMA mempolicy once again
338applies to memory page allocations. 338applies to memory page allocations.
339 339
340Both 'memory_spread_page' and 'memory_spread_slab' are boolean flag 340Both 'cpuset.memory_spread_page' and 'cpuset.memory_spread_slab' are boolean flag
341files. By default they contain "0", meaning that the feature is off 341files. By default they contain "0", meaning that the feature is off
342for that cpuset. If a "1" is written to that file, then that turns 342for that cpuset. If a "1" is written to that file, then that turns
343the named feature on. 343the named feature on.
344 344
345The implementation is simple. 345The implementation is simple.
346 346
347Setting the flag 'memory_spread_page' turns on a per-process flag 347Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag
348PF_SPREAD_PAGE for each task that is in that cpuset or subsequently 348PF_SPREAD_PAGE for each task that is in that cpuset or subsequently
349joins that cpuset. The page allocation calls for the page cache 349joins that cpuset. The page allocation calls for the page cache
350is modified to perform an inline check for this PF_SPREAD_PAGE task 350is modified to perform an inline check for this PF_SPREAD_PAGE task
351flag, and if set, a call to a new routine cpuset_mem_spread_node() 351flag, and if set, a call to a new routine cpuset_mem_spread_node()
352returns the node to prefer for the allocation. 352returns the node to prefer for the allocation.
353 353
354Similarly, setting 'memory_spread_slab' turns on the flag 354Similarly, setting 'cpuset.memory_spread_slab' turns on the flag
355PF_SPREAD_SLAB, and appropriately marked slab caches will allocate 355PF_SPREAD_SLAB, and appropriately marked slab caches will allocate
356pages from the node returned by cpuset_mem_spread_node(). 356pages from the node returned by cpuset_mem_spread_node().
357 357
@@ -404,24 +404,24 @@ the following two situations:
404 system overhead on those CPUs, including avoiding task load 404 system overhead on those CPUs, including avoiding task load
405 balancing if that is not needed. 405 balancing if that is not needed.
406 406
407When the per-cpuset flag "sched_load_balance" is enabled (the default 407When the per-cpuset flag "cpuset.sched_load_balance" is enabled (the default
408setting), it requests that all the CPUs in that cpusets allowed 'cpus' 408setting), it requests that all the CPUs in that cpusets allowed 'cpuset.cpus'
409be contained in a single sched domain, ensuring that load balancing 409be contained in a single sched domain, ensuring that load balancing
410can move a task (not otherwised pinned, as by sched_setaffinity) 410can move a task (not otherwised pinned, as by sched_setaffinity)
411from any CPU in that cpuset to any other. 411from any CPU in that cpuset to any other.
412 412
413When the per-cpuset flag "sched_load_balance" is disabled, then the 413When the per-cpuset flag "cpuset.sched_load_balance" is disabled, then the
414scheduler will avoid load balancing across the CPUs in that cpuset, 414scheduler will avoid load balancing across the CPUs in that cpuset,
415--except-- in so far as is necessary because some overlapping cpuset 415--except-- in so far as is necessary because some overlapping cpuset
416has "sched_load_balance" enabled. 416has "sched_load_balance" enabled.
417 417
418So, for example, if the top cpuset has the flag "sched_load_balance" 418So, for example, if the top cpuset has the flag "cpuset.sched_load_balance"
419enabled, then the scheduler will have one sched domain covering all 419enabled, then the scheduler will have one sched domain covering all
420CPUs, and the setting of the "sched_load_balance" flag in any other 420CPUs, and the setting of the "cpuset.sched_load_balance" flag in any other
421cpusets won't matter, as we're already fully load balancing. 421cpusets won't matter, as we're already fully load balancing.
422 422
423Therefore in the above two situations, the top cpuset flag 423Therefore in the above two situations, the top cpuset flag
424"sched_load_balance" should be disabled, and only some of the smaller, 424"cpuset.sched_load_balance" should be disabled, and only some of the smaller,
425child cpusets have this flag enabled. 425child cpusets have this flag enabled.
426 426
427When doing this, you don't usually want to leave any unpinned tasks in 427When doing this, you don't usually want to leave any unpinned tasks in
@@ -433,7 +433,7 @@ scheduler might not consider the possibility of load balancing that
433task to that underused CPU. 433task to that underused CPU.
434 434
435Of course, tasks pinned to a particular CPU can be left in a cpuset 435Of course, tasks pinned to a particular CPU can be left in a cpuset
436that disables "sched_load_balance" as those tasks aren't going anywhere 436that disables "cpuset.sched_load_balance" as those tasks aren't going anywhere
437else anyway. 437else anyway.
438 438
439There is an impedance mismatch here, between cpusets and sched domains. 439There is an impedance mismatch here, between cpusets and sched domains.
@@ -443,19 +443,19 @@ overlap and each CPU is in at most one sched domain.
443It is necessary for sched domains to be flat because load balancing 443It is necessary for sched domains to be flat because load balancing
444across partially overlapping sets of CPUs would risk unstable dynamics 444across partially overlapping sets of CPUs would risk unstable dynamics
445that would be beyond our understanding. So if each of two partially 445that would be beyond our understanding. So if each of two partially
446overlapping cpusets enables the flag 'sched_load_balance', then we 446overlapping cpusets enables the flag 'cpuset.sched_load_balance', then we
447form a single sched domain that is a superset of both. We won't move 447form a single sched domain that is a superset of both. We won't move
448a task to a CPU outside it cpuset, but the scheduler load balancing 448a task to a CPU outside it cpuset, but the scheduler load balancing
449code might waste some compute cycles considering that possibility. 449code might waste some compute cycles considering that possibility.
450 450
451This mismatch is why there is not a simple one-to-one relation 451This mismatch is why there is not a simple one-to-one relation
452between which cpusets have the flag "sched_load_balance" enabled, 452between which cpusets have the flag "cpuset.sched_load_balance" enabled,
453and the sched domain configuration. If a cpuset enables the flag, it 453and the sched domain configuration. If a cpuset enables the flag, it
454will get balancing across all its CPUs, but if it disables the flag, 454will get balancing across all its CPUs, but if it disables the flag,
455it will only be assured of no load balancing if no other overlapping 455it will only be assured of no load balancing if no other overlapping
456cpuset enables the flag. 456cpuset enables the flag.
457 457
458If two cpusets have partially overlapping 'cpus' allowed, and only 458If two cpusets have partially overlapping 'cpuset.cpus' allowed, and only
459one of them has this flag enabled, then the other may find its 459one of them has this flag enabled, then the other may find its
460tasks only partially load balanced, just on the overlapping CPUs. 460tasks only partially load balanced, just on the overlapping CPUs.
461This is just the general case of the top_cpuset example given a few 461This is just the general case of the top_cpuset example given a few
@@ -468,23 +468,23 @@ load balancing to the other CPUs.
4681.7.1 sched_load_balance implementation details. 4681.7.1 sched_load_balance implementation details.
469------------------------------------------------ 469------------------------------------------------
470 470
471The per-cpuset flag 'sched_load_balance' defaults to enabled (contrary 471The per-cpuset flag 'cpuset.sched_load_balance' defaults to enabled (contrary
472to most cpuset flags.) When enabled for a cpuset, the kernel will 472to most cpuset flags.) When enabled for a cpuset, the kernel will
473ensure that it can load balance across all the CPUs in that cpuset 473ensure that it can load balance across all the CPUs in that cpuset
474(makes sure that all the CPUs in the cpus_allowed of that cpuset are 474(makes sure that all the CPUs in the cpus_allowed of that cpuset are
475in the same sched domain.) 475in the same sched domain.)
476 476
477If two overlapping cpusets both have 'sched_load_balance' enabled, 477If two overlapping cpusets both have 'cpuset.sched_load_balance' enabled,
478then they will be (must be) both in the same sched domain. 478then they will be (must be) both in the same sched domain.
479 479
480If, as is the default, the top cpuset has 'sched_load_balance' enabled, 480If, as is the default, the top cpuset has 'cpuset.sched_load_balance' enabled,
481then by the above that means there is a single sched domain covering 481then by the above that means there is a single sched domain covering
482the whole system, regardless of any other cpuset settings. 482the whole system, regardless of any other cpuset settings.
483 483
484The kernel commits to user space that it will avoid load balancing 484The kernel commits to user space that it will avoid load balancing
485where it can. It will pick as fine a granularity partition of sched 485where it can. It will pick as fine a granularity partition of sched
486domains as it can while still providing load balancing for any set 486domains as it can while still providing load balancing for any set
487of CPUs allowed to a cpuset having 'sched_load_balance' enabled. 487of CPUs allowed to a cpuset having 'cpuset.sched_load_balance' enabled.
488 488
489The internal kernel cpuset to scheduler interface passes from the 489The internal kernel cpuset to scheduler interface passes from the
490cpuset code to the scheduler code a partition of the load balanced 490cpuset code to the scheduler code a partition of the load balanced
@@ -495,9 +495,9 @@ all the CPUs that must be load balanced.
495The cpuset code builds a new such partition and passes it to the 495The cpuset code builds a new such partition and passes it to the
496scheduler sched domain setup code, to have the sched domains rebuilt 496scheduler sched domain setup code, to have the sched domains rebuilt
497as necessary, whenever: 497as necessary, whenever:
498 - the 'sched_load_balance' flag of a cpuset with non-empty CPUs changes, 498 - the 'cpuset.sched_load_balance' flag of a cpuset with non-empty CPUs changes,
499 - or CPUs come or go from a cpuset with this flag enabled, 499 - or CPUs come or go from a cpuset with this flag enabled,
500 - or 'sched_relax_domain_level' value of a cpuset with non-empty CPUs 500 - or 'cpuset.sched_relax_domain_level' value of a cpuset with non-empty CPUs
501 and with this flag enabled changes, 501 and with this flag enabled changes,
502 - or a cpuset with non-empty CPUs and with this flag enabled is removed, 502 - or a cpuset with non-empty CPUs and with this flag enabled is removed,
503 - or a cpu is offlined/onlined. 503 - or a cpu is offlined/onlined.
@@ -542,7 +542,7 @@ As the result, task B on CPU X need to wait task A or wait load balance
542on the next tick. For some applications in special situation, waiting 542on the next tick. For some applications in special situation, waiting
5431 tick may be too long. 5431 tick may be too long.
544 544
545The 'sched_relax_domain_level' file allows you to request changing 545The 'cpuset.sched_relax_domain_level' file allows you to request changing
546this searching range as you like. This file takes int value which 546this searching range as you like. This file takes int value which
547indicates size of searching range in levels ideally as follows, 547indicates size of searching range in levels ideally as follows,
548otherwise initial value -1 that indicates the cpuset has no request. 548otherwise initial value -1 that indicates the cpuset has no request.
@@ -559,8 +559,8 @@ The system default is architecture dependent. The system default
559can be changed using the relax_domain_level= boot parameter. 559can be changed using the relax_domain_level= boot parameter.
560 560
561This file is per-cpuset and affect the sched domain where the cpuset 561This file is per-cpuset and affect the sched domain where the cpuset
562belongs to. Therefore if the flag 'sched_load_balance' of a cpuset 562belongs to. Therefore if the flag 'cpuset.sched_load_balance' of a cpuset
563is disabled, then 'sched_relax_domain_level' have no effect since 563is disabled, then 'cpuset.sched_relax_domain_level' have no effect since
564there is no sched domain belonging the cpuset. 564there is no sched domain belonging the cpuset.
565 565
566If multiple cpusets are overlapping and hence they form a single sched 566If multiple cpusets are overlapping and hence they form a single sched
@@ -607,9 +607,9 @@ from one cpuset to another, then the kernel will adjust the tasks
607memory placement, as above, the next time that the kernel attempts 607memory placement, as above, the next time that the kernel attempts
608to allocate a page of memory for that task. 608to allocate a page of memory for that task.
609 609
610If a cpuset has its 'cpus' modified, then each task in that cpuset 610If a cpuset has its 'cpuset.cpus' modified, then each task in that cpuset
611will have its allowed CPU placement changed immediately. Similarly, 611will have its allowed CPU placement changed immediately. Similarly,
612if a tasks pid is written to another cpusets 'tasks' file, then its 612if a tasks pid is written to another cpusets 'cpuset.tasks' file, then its
613allowed CPU placement is changed immediately. If such a task had been 613allowed CPU placement is changed immediately. If such a task had been
614bound to some subset of its cpuset using the sched_setaffinity() call, 614bound to some subset of its cpuset using the sched_setaffinity() call,
615the task will be allowed to run on any CPU allowed in its new cpuset, 615the task will be allowed to run on any CPU allowed in its new cpuset,
@@ -622,8 +622,8 @@ and the processor placement is updated immediately.
622Normally, once a page is allocated (given a physical page 622Normally, once a page is allocated (given a physical page
623of main memory) then that page stays on whatever node it 623of main memory) then that page stays on whatever node it
624was allocated, so long as it remains allocated, even if the 624was allocated, so long as it remains allocated, even if the
625cpusets memory placement policy 'mems' subsequently changes. 625cpusets memory placement policy 'cpuset.mems' subsequently changes.
626If the cpuset flag file 'memory_migrate' is set true, then when 626If the cpuset flag file 'cpuset.memory_migrate' is set true, then when
627tasks are attached to that cpuset, any pages that task had 627tasks are attached to that cpuset, any pages that task had
628allocated to it on nodes in its previous cpuset are migrated 628allocated to it on nodes in its previous cpuset are migrated
629to the tasks new cpuset. The relative placement of the page within 629to the tasks new cpuset. The relative placement of the page within
@@ -631,12 +631,12 @@ the cpuset is preserved during these migration operations if possible.
631For example if the page was on the second valid node of the prior cpuset 631For example if the page was on the second valid node of the prior cpuset
632then the page will be placed on the second valid node of the new cpuset. 632then the page will be placed on the second valid node of the new cpuset.
633 633
634Also if 'memory_migrate' is set true, then if that cpusets 634Also if 'cpuset.memory_migrate' is set true, then if that cpusets
635'mems' file is modified, pages allocated to tasks in that 635'cpuset.mems' file is modified, pages allocated to tasks in that
636cpuset, that were on nodes in the previous setting of 'mems', 636cpuset, that were on nodes in the previous setting of 'cpuset.mems',
637will be moved to nodes in the new setting of 'mems.' 637will be moved to nodes in the new setting of 'mems.'
638Pages that were not in the tasks prior cpuset, or in the cpusets 638Pages that were not in the tasks prior cpuset, or in the cpusets
639prior 'mems' setting, will not be moved. 639prior 'cpuset.mems' setting, will not be moved.
640 640
641There is an exception to the above. If hotplug functionality is used 641There is an exception to the above. If hotplug functionality is used
642to remove all the CPUs that are currently assigned to a cpuset, 642to remove all the CPUs that are currently assigned to a cpuset,
@@ -678,8 +678,8 @@ and then start a subshell 'sh' in that cpuset:
678 cd /dev/cpuset 678 cd /dev/cpuset
679 mkdir Charlie 679 mkdir Charlie
680 cd Charlie 680 cd Charlie
681 /bin/echo 2-3 > cpus 681 /bin/echo 2-3 > cpuset.cpus
682 /bin/echo 1 > mems 682 /bin/echo 1 > cpuset.mems
683 /bin/echo $$ > tasks 683 /bin/echo $$ > tasks
684 sh 684 sh
685 # The subshell 'sh' is now running in cpuset Charlie 685 # The subshell 'sh' is now running in cpuset Charlie
@@ -725,10 +725,13 @@ Now you want to do something with this cpuset.
725 725
726In this directory you can find several files: 726In this directory you can find several files:
727# ls 727# ls
728cpu_exclusive memory_migrate mems tasks 728cpuset.cpu_exclusive cpuset.memory_spread_slab
729cpus memory_pressure notify_on_release 729cpuset.cpus cpuset.mems
730mem_exclusive memory_spread_page sched_load_balance 730cpuset.mem_exclusive cpuset.sched_load_balance
731mem_hardwall memory_spread_slab sched_relax_domain_level 731cpuset.mem_hardwall cpuset.sched_relax_domain_level
732cpuset.memory_migrate notify_on_release
733cpuset.memory_pressure tasks
734cpuset.memory_spread_page
732 735
733Reading them will give you information about the state of this cpuset: 736Reading them will give you information about the state of this cpuset:
734the CPUs and Memory Nodes it can use, the processes that are using 737the CPUs and Memory Nodes it can use, the processes that are using
@@ -736,13 +739,13 @@ it, its properties. By writing to these files you can manipulate
736the cpuset. 739the cpuset.
737 740
738Set some flags: 741Set some flags:
739# /bin/echo 1 > cpu_exclusive 742# /bin/echo 1 > cpuset.cpu_exclusive
740 743
741Add some cpus: 744Add some cpus:
742# /bin/echo 0-7 > cpus 745# /bin/echo 0-7 > cpuset.cpus
743 746
744Add some mems: 747Add some mems:
745# /bin/echo 0-7 > mems 748# /bin/echo 0-7 > cpuset.mems
746 749
747Now attach your shell to this cpuset: 750Now attach your shell to this cpuset:
748# /bin/echo $$ > tasks 751# /bin/echo $$ > tasks
@@ -774,28 +777,28 @@ echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent
774This is the syntax to use when writing in the cpus or mems files 777This is the syntax to use when writing in the cpus or mems files
775in cpuset directories: 778in cpuset directories:
776 779
777# /bin/echo 1-4 > cpus -> set cpus list to cpus 1,2,3,4 780# /bin/echo 1-4 > cpuset.cpus -> set cpus list to cpus 1,2,3,4
778# /bin/echo 1,2,3,4 > cpus -> set cpus list to cpus 1,2,3,4 781# /bin/echo 1,2,3,4 > cpuset.cpus -> set cpus list to cpus 1,2,3,4
779 782
780To add a CPU to a cpuset, write the new list of CPUs including the 783To add a CPU to a cpuset, write the new list of CPUs including the
781CPU to be added. To add 6 to the above cpuset: 784CPU to be added. To add 6 to the above cpuset:
782 785
783# /bin/echo 1-4,6 > cpus -> set cpus list to cpus 1,2,3,4,6 786# /bin/echo 1-4,6 > cpuset.cpus -> set cpus list to cpus 1,2,3,4,6
784 787
785Similarly to remove a CPU from a cpuset, write the new list of CPUs 788Similarly to remove a CPU from a cpuset, write the new list of CPUs
786without the CPU to be removed. 789without the CPU to be removed.
787 790
788To remove all the CPUs: 791To remove all the CPUs:
789 792
790# /bin/echo "" > cpus -> clear cpus list 793# /bin/echo "" > cpuset.cpus -> clear cpus list
791 794
7922.3 Setting flags 7952.3 Setting flags
793----------------- 796-----------------
794 797
795The syntax is very simple: 798The syntax is very simple:
796 799
797# /bin/echo 1 > cpu_exclusive -> set flag 'cpu_exclusive' 800# /bin/echo 1 > cpuset.cpu_exclusive -> set flag 'cpuset.cpu_exclusive'
798# /bin/echo 0 > cpu_exclusive -> unset flag 'cpu_exclusive' 801# /bin/echo 0 > cpuset.cpu_exclusive -> unset flag 'cpuset.cpu_exclusive'
799 802
8002.4 Attaching processes 8032.4 Attaching processes
801----------------------- 804-----------------------
diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt
index 72db89ed0609..f7f68b2ac199 100644
--- a/Documentation/cgroups/memcg_test.txt
+++ b/Documentation/cgroups/memcg_test.txt
@@ -1,6 +1,6 @@
1Memory Resource Controller(Memcg) Implementation Memo. 1Memory Resource Controller(Memcg) Implementation Memo.
2Last Updated: 2009/1/20 2Last Updated: 2010/2
3Base Kernel Version: based on 2.6.29-rc2. 3Base Kernel Version: based on 2.6.33-rc7-mm(candidate for 34).
4 4
5Because VM is getting complex (one of reasons is memcg...), memcg's behavior 5Because VM is getting complex (one of reasons is memcg...), memcg's behavior
6is complex. This is a document for memcg's internal behavior. 6is complex. This is a document for memcg's internal behavior.
@@ -337,7 +337,7 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
337 race and lock dependency with other cgroup subsystems. 337 race and lock dependency with other cgroup subsystems.
338 338
339 example) 339 example)
340 # mount -t cgroup none /cgroup -t cpuset,memory,cpu,devices 340 # mount -t cgroup none /cgroup -o cpuset,memory,cpu,devices
341 341
342 and do task move, mkdir, rmdir etc...under this. 342 and do task move, mkdir, rmdir etc...under this.
343 343
@@ -348,7 +348,7 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
348 348
349 For example, test like following is good. 349 For example, test like following is good.
350 (Shell-A) 350 (Shell-A)
351 # mount -t cgroup none /cgroup -t memory 351 # mount -t cgroup none /cgroup -o memory
352 # mkdir /cgroup/test 352 # mkdir /cgroup/test
353 # echo 40M > /cgroup/test/memory.limit_in_bytes 353 # echo 40M > /cgroup/test/memory.limit_in_bytes
354 # echo 0 > /cgroup/test/tasks 354 # echo 0 > /cgroup/test/tasks
@@ -378,3 +378,42 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y.
378 #echo 50M > memory.limit_in_bytes 378 #echo 50M > memory.limit_in_bytes
379 #echo 50M > memory.memsw.limit_in_bytes 379 #echo 50M > memory.memsw.limit_in_bytes
380 run 51M of malloc 380 run 51M of malloc
381
382 9.9 Move charges at task migration
383 Charges associated with a task can be moved along with task migration.
384
385 (Shell-A)
386 #mkdir /cgroup/A
387 #echo $$ >/cgroup/A/tasks
388 run some programs which uses some amount of memory in /cgroup/A.
389
390 (Shell-B)
391 #mkdir /cgroup/B
392 #echo 1 >/cgroup/B/memory.move_charge_at_immigrate
393 #echo "pid of the program running in group A" >/cgroup/B/tasks
394
395 You can see charges have been moved by reading *.usage_in_bytes or
396 memory.stat of both A and B.
397 See 8.2 of Documentation/cgroups/memory.txt to see what value should be
398 written to move_charge_at_immigrate.
399
400 9.10 Memory thresholds
401 Memory controler implements memory thresholds using cgroups notification
402 API. You can use Documentation/cgroups/cgroup_event_listener.c to test
403 it.
404
405 (Shell-A) Create cgroup and run event listener
406 # mkdir /cgroup/A
407 # ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M
408
409 (Shell-B) Add task to cgroup and try to allocate and free memory
410 # echo $$ >/cgroup/A/tasks
411 # a="$(dd if=/dev/zero bs=1M count=10)"
412 # a=
413
414 You will see message from cgroup_event_listener every time you cross
415 the thresholds.
416
417 Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds.
418
419 It's good idea to test root cgroup as well.
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index b871f2552b45..f8bc802d70b9 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -182,6 +182,8 @@ list.
182NOTE: Reclaim does not work for the root cgroup, since we cannot set any 182NOTE: Reclaim does not work for the root cgroup, since we cannot set any
183limits on the root cgroup. 183limits on the root cgroup.
184 184
185Note2: When panic_on_oom is set to "2", the whole system will panic.
186
1852. Locking 1872. Locking
186 188
187The memory controller uses the following hierarchy 189The memory controller uses the following hierarchy
@@ -262,10 +264,12 @@ some of the pages cached in the cgroup (page cache pages).
2624.2 Task migration 2644.2 Task migration
263 265
264When a task migrates from one cgroup to another, it's charge is not 266When a task migrates from one cgroup to another, it's charge is not
265carried forward. The pages allocated from the original cgroup still 267carried forward by default. The pages allocated from the original cgroup still
266remain charged to it, the charge is dropped when the page is freed or 268remain charged to it, the charge is dropped when the page is freed or
267reclaimed. 269reclaimed.
268 270
271Note: You can move charges of a task along with task migration. See 8.
272
2694.3 Removing a cgroup 2734.3 Removing a cgroup
270 274
271A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a 275A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a
@@ -377,7 +381,8 @@ The feature can be disabled by
377NOTE1: Enabling/disabling will fail if the cgroup already has other 381NOTE1: Enabling/disabling will fail if the cgroup already has other
378cgroups created below it. 382cgroups created below it.
379 383
380NOTE2: This feature can be enabled/disabled per subtree. 384NOTE2: When panic_on_oom is set to "2", the whole system will panic in
385case of an oom event in any cgroup.
381 386
3827. Soft limits 3877. Soft limits
383 388
@@ -414,7 +419,76 @@ NOTE1: Soft limits take effect over a long period of time, since they involve
414NOTE2: It is recommended to set the soft limit always below the hard limit, 419NOTE2: It is recommended to set the soft limit always below the hard limit,
415 otherwise the hard limit will take precedence. 420 otherwise the hard limit will take precedence.
416 421
4178. TODO 4228. Move charges at task migration
423
424Users can move charges associated with a task along with task migration, that
425is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
426This feature is not supported in !CONFIG_MMU environments because of lack of
427page tables.
428
4298.1 Interface
430
431This feature is disabled by default. It can be enabled(and disabled again) by
432writing to memory.move_charge_at_immigrate of the destination cgroup.
433
434If you want to enable it:
435
436# echo (some positive value) > memory.move_charge_at_immigrate
437
438Note: Each bits of move_charge_at_immigrate has its own meaning about what type
439 of charges should be moved. See 8.2 for details.
440Note: Charges are moved only when you move mm->owner, IOW, a leader of a thread
441 group.
442Note: If we cannot find enough space for the task in the destination cgroup, we
443 try to make space by reclaiming memory. Task migration may fail if we
444 cannot make enough space.
445Note: It can take several seconds if you move charges in giga bytes order.
446
447And if you want disable it again:
448
449# echo 0 > memory.move_charge_at_immigrate
450
4518.2 Type of charges which can be move
452
453Each bits of move_charge_at_immigrate has its own meaning about what type of
454charges should be moved.
455
456 bit | what type of charges would be moved ?
457 -----+------------------------------------------------------------------------
458 0 | A charge of an anonymous page(or swap of it) used by the target task.
459 | Those pages and swaps must be used only by the target task. You must
460 | enable Swap Extension(see 2.4) to enable move of swap charges.
461
462Note: Those pages and swaps must be charged to the old cgroup.
463Note: More type of pages(e.g. file cache, shmem,) will be supported by other
464 bits in future.
465
4668.3 TODO
467
468- Add support for other types of pages(e.g. file cache, shmem, etc.).
469- Implement madvise(2) to let users decide the vma to be moved or not to be
470 moved.
471- All of moving charge operations are done under cgroup_mutex. It's not good
472 behavior to hold the mutex too long, so we may need some trick.
473
4749. Memory thresholds
475
476Memory controler implements memory thresholds using cgroups notification
477API (see cgroups.txt). It allows to register multiple memory and memsw
478thresholds and gets notifications when it crosses.
479
480To register a threshold application need:
481 - create an eventfd using eventfd(2);
482 - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
483 - write string like "<event_fd> <memory.usage_in_bytes> <threshold>" to
484 cgroup.event_control.
485
486Application will be notified through eventfd when memory usage crosses
487threshold in any direction.
488
489It's applicable for root and non-root cgroup.
490
49110. TODO
418 492
4191. Add support for accounting huge pages (as a separate controller) 4931. Add support for accounting huge pages (as a separate controller)
4202. Make per-cgroup scanner reclaim not-shared pages first 4942. Make per-cgroup scanner reclaim not-shared pages first
diff --git a/Documentation/console/console.txt b/Documentation/console/console.txt
index 877a1b26cc3d..926cf1b5e63e 100644
--- a/Documentation/console/console.txt
+++ b/Documentation/console/console.txt
@@ -74,7 +74,7 @@ driver takes over the consoles vacated by the driver. Binding, on the other
74hand, will bind the driver to the consoles that are currently occupied by a 74hand, will bind the driver to the consoles that are currently occupied by a
75system driver. 75system driver.
76 76
77NOTE1: Binding and binding must be selected in Kconfig. It's under: 77NOTE1: Binding and unbinding must be selected in Kconfig. It's under:
78 78
79Device Drivers -> Character devices -> Support for binding and unbinding 79Device Drivers -> Character devices -> Support for binding and unbinding
80console drivers 80console drivers
diff --git a/Documentation/cpu-freq/pcc-cpufreq.txt b/Documentation/cpu-freq/pcc-cpufreq.txt
new file mode 100644
index 000000000000..9e3c3b33514c
--- /dev/null
+++ b/Documentation/cpu-freq/pcc-cpufreq.txt
@@ -0,0 +1,207 @@
1/*
2 * pcc-cpufreq.txt - PCC interface documentation
3 *
4 * Copyright (C) 2009 Red Hat, Matthew Garrett <mjg@redhat.com>
5 * Copyright (C) 2009 Hewlett-Packard Development Company, L.P.
6 * Nagananda Chumbalkar <nagananda.chumbalkar@hp.com>
7 *
8 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
9 *
10 * This program is free software; you can redistribute it and/or modify
11 * it under the terms of the GNU General Public License as published by
12 * the Free Software Foundation; version 2 of the License.
13 *
14 * This program is distributed in the hope that it will be useful, but
15 * WITHOUT ANY WARRANTY; without even the implied warranty of
16 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or NON
17 * INFRINGEMENT. See the GNU General Public License for more details.
18 *
19 * You should have received a copy of the GNU General Public License along
20 * with this program; if not, write to the Free Software Foundation, Inc.,
21 * 675 Mass Ave, Cambridge, MA 02139, USA.
22 *
23 * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
24 */
25
26
27 Processor Clocking Control Driver
28 ---------------------------------
29
30Contents:
31---------
321. Introduction
331.1 PCC interface
341.1.1 Get Average Frequency
351.1.2 Set Desired Frequency
361.2 Platforms affected
372. Driver and /sys details
382.1 scaling_available_frequencies
392.2 cpuinfo_transition_latency
402.3 cpuinfo_cur_freq
412.4 related_cpus
423. Caveats
43
441. Introduction:
45----------------
46Processor Clocking Control (PCC) is an interface between the platform
47firmware and OSPM. It is a mechanism for coordinating processor
48performance (ie: frequency) between the platform firmware and the OS.
49
50The PCC driver (pcc-cpufreq) allows OSPM to take advantage of the PCC
51interface.
52
53OS utilizes the PCC interface to inform platform firmware what frequency the
54OS wants for a logical processor. The platform firmware attempts to achieve
55the requested frequency. If the request for the target frequency could not be
56satisfied by platform firmware, then it usually means that power budget
57conditions are in place, and "power capping" is taking place.
58
591.1 PCC interface:
60------------------
61The complete PCC specification is available here:
62http://www.acpica.org/download/Processor-Clocking-Control-v1p0.pdf
63
64PCC relies on a shared memory region that provides a channel for communication
65between the OS and platform firmware. PCC also implements a "doorbell" that
66is used by the OS to inform the platform firmware that a command has been
67sent.
68
69The ACPI PCCH() method is used to discover the location of the PCC shared
70memory region. The shared memory region header contains the "command" and
71"status" interface. PCCH() also contains details on how to access the platform
72doorbell.
73
74The following commands are supported by the PCC interface:
75* Get Average Frequency
76* Set Desired Frequency
77
78The ACPI PCCP() method is implemented for each logical processor and is
79used to discover the offsets for the input and output buffers in the shared
80memory region.
81
82When PCC mode is enabled, the platform will not expose processor performance
83or throttle states (_PSS, _TSS and related ACPI objects) to OSPM. Therefore,
84the native P-state driver (such as acpi-cpufreq for Intel, powernow-k8 for
85AMD) will not load.
86
87However, OSPM remains in control of policy. The governor (eg: "ondemand")
88computes the required performance for each processor based on server workload.
89The PCC driver fills in the command interface, and the input buffer and
90communicates the request to the platform firmware. The platform firmware is
91responsible for delivering the requested performance.
92
93Each PCC command is "global" in scope and can affect all the logical CPUs in
94the system. Therefore, PCC is capable of performing "group" updates. With PCC
95the OS is capable of getting/setting the frequency of all the logical CPUs in
96the system with a single call to the BIOS.
97
981.1.1 Get Average Frequency:
99----------------------------
100This command is used by the OSPM to query the running frequency of the
101processor since the last time this command was completed. The output buffer
102indicates the average unhalted frequency of the logical processor expressed as
103a percentage of the nominal (ie: maximum) CPU frequency. The output buffer
104also signifies if the CPU frequency is limited by a power budget condition.
105
1061.1.2 Set Desired Frequency:
107----------------------------
108This command is used by the OSPM to communicate to the platform firmware the
109desired frequency for a logical processor. The output buffer is currently
110ignored by OSPM. The next invocation of "Get Average Frequency" will inform
111OSPM if the desired frequency was achieved or not.
112
1131.2 Platforms affected:
114-----------------------
115The PCC driver will load on any system where the platform firmware:
116* supports the PCC interface, and the associated PCCH() and PCCP() methods
117* assumes responsibility for managing the hardware clocking controls in order
118to deliver the requested processor performance
119
120Currently, certain HP ProLiant platforms implement the PCC interface. On those
121platforms PCC is the "default" choice.
122
123However, it is possible to disable this interface via a BIOS setting. In
124such an instance, as is also the case on platforms where the PCC interface
125is not implemented, the PCC driver will fail to load silently.
126
1272. Driver and /sys details:
128---------------------------
129When the driver loads, it merely prints the lowest and the highest CPU
130frequencies supported by the platform firmware.
131
132The PCC driver loads with a message such as:
133pcc-cpufreq: (v1.00.00) driver loaded with frequency limits: 1600 MHz, 2933
134MHz
135
136This means that the OPSM can request the CPU to run at any frequency in
137between the limits (1600 MHz, and 2933 MHz) specified in the message.
138
139Internally, there is no need for the driver to convert the "target" frequency
140to a corresponding P-state.
141
142The VERSION number for the driver will be of the format v.xy.ab.
143eg: 1.00.02
144 ----- --
145 | |
146 | -- this will increase with bug fixes/enhancements to the driver
147 |-- this is the version of the PCC specification the driver adheres to
148
149
150The following is a brief discussion on some of the fields exported via the
151/sys filesystem and how their values are affected by the PCC driver:
152
1532.1 scaling_available_frequencies:
154----------------------------------
155scaling_available_frequencies is not created in /sys. No intermediate
156frequencies need to be listed because the BIOS will try to achieve any
157frequency, within limits, requested by the governor. A frequency does not have
158to be strictly associated with a P-state.
159
1602.2 cpuinfo_transition_latency:
161-------------------------------
162The cpuinfo_transition_latency field is 0. The PCC specification does
163not include a field to expose this value currently.
164
1652.3 cpuinfo_cur_freq:
166---------------------
167A) Often cpuinfo_cur_freq will show a value different than what is declared
168in the scaling_available_frequencies or scaling_cur_freq, or scaling_max_freq.
169This is due to "turbo boost" available on recent Intel processors. If certain
170conditions are met the BIOS can achieve a slightly higher speed than requested
171by OSPM. An example:
172
173scaling_cur_freq : 2933000
174cpuinfo_cur_freq : 3196000
175
176B) There is a round-off error associated with the cpuinfo_cur_freq value.
177Since the driver obtains the current frequency as a "percentage" (%) of the
178nominal frequency from the BIOS, sometimes, the values displayed by
179scaling_cur_freq and cpuinfo_cur_freq may not match. An example:
180
181scaling_cur_freq : 1600000
182cpuinfo_cur_freq : 1583000
183
184In this example, the nominal frequency is 2933 MHz. The driver obtains the
185current frequency, cpuinfo_cur_freq, as 54% of the nominal frequency:
186
187 54% of 2933 MHz = 1583 MHz
188
189Nominal frequency is the maximum frequency of the processor, and it usually
190corresponds to the frequency of the P0 P-state.
191
1922.4 related_cpus:
193-----------------
194The related_cpus field is identical to affected_cpus.
195
196affected_cpus : 4
197related_cpus : 4
198
199Currently, the PCC driver does not evaluate _PSD. The platforms that support
200PCC do not implement SW_ALL. So OSPM doesn't need to perform any coordination
201to ensure that the same frequency is requested of all dependent CPUs.
202
2033. Caveats:
204-----------
205The "cpufreq_stats" module in its present form cannot be loaded and
206expected to work with the PCC driver. Since the "cpufreq_stats" module
207provides information wrt each P-state, it is not applicable to the PCC driver.
diff --git a/Documentation/device-mapper/snapshot.txt b/Documentation/device-mapper/snapshot.txt
index e3a77b215135..0d5bc46dc167 100644
--- a/Documentation/device-mapper/snapshot.txt
+++ b/Documentation/device-mapper/snapshot.txt
@@ -122,3 +122,47 @@ volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16
122brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real 122brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
123brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow 123brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
124brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base 124brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
125
126
127How to determine when a merging is complete
128===========================================
129The snapshot-merge and snapshot status lines end with:
130 <sectors_allocated>/<total_sectors> <metadata_sectors>
131
132Both <sectors_allocated> and <total_sectors> include both data and metadata.
133During merging, the number of sectors allocated gets smaller and
134smaller. Merging has finished when the number of sectors holding data
135is zero, in other words <sectors_allocated> == <metadata_sectors>.
136
137Here is a practical example (using a hybrid of lvm and dmsetup commands):
138
139# lvs
140 LV VG Attr LSize Origin Snap% Move Log Copy% Convert
141 base volumeGroup owi-a- 4.00g
142 snap volumeGroup swi-a- 1.00g base 18.97
143
144# dmsetup status volumeGroup-snap
1450 8388608 snapshot 397896/2097152 1560
146 ^^^^ metadata sectors
147
148# lvconvert --merge -b volumeGroup/snap
149 Merging of volume snap started.
150
151# lvs volumeGroup/snap
152 LV VG Attr LSize Origin Snap% Move Log Copy% Convert
153 base volumeGroup Owi-a- 4.00g 17.23
154
155# dmsetup status volumeGroup-base
1560 8388608 snapshot-merge 281688/2097152 1104
157
158# dmsetup status volumeGroup-base
1590 8388608 snapshot-merge 180480/2097152 712
160
161# dmsetup status volumeGroup-base
1620 8388608 snapshot-merge 16/2097152 16
163
164Merging has finished.
165
166# lvs
167 LV VG Attr LSize Origin Snap% Move Log Copy% Convert
168 base volumeGroup owi-a- 4.00g
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 3ad6acead949..d9bcffd59433 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -69,7 +69,6 @@ av_permissions.h
69bbootsect 69bbootsect
70bin2c 70bin2c
71binkernel.spec 71binkernel.spec
72binoffset
73bootsect 72bootsect
74bounds.h 73bounds.h
75bsetup 74bsetup
diff --git a/Documentation/driver-model/platform.txt b/Documentation/driver-model/platform.txt
index 2e2c2ea90ceb..41f41632ee55 100644
--- a/Documentation/driver-model/platform.txt
+++ b/Documentation/driver-model/platform.txt
@@ -192,7 +192,7 @@ command line. This will execute all matching early_param() callbacks.
192User specified early platform devices will be registered at this point. 192User specified early platform devices will be registered at this point.
193For the early serial console case the user can specify port on the 193For the early serial console case the user can specify port on the
194kernel command line as "earlyprintk=serial.0" where "earlyprintk" is 194kernel command line as "earlyprintk=serial.0" where "earlyprintk" is
195the class string, "serial" is the name of the platfrom driver and 195the class string, "serial" is the name of the platform driver and
1960 is the platform device id. If the id is -1 then the dot and the 1960 is the platform device id. If the id is -1 then the dot and the
197id can be omitted. 197id can be omitted.
198 198
diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware
index 14b7b5a3bcb9..239cbdbf4d12 100644
--- a/Documentation/dvb/get_dvb_firmware
+++ b/Documentation/dvb/get_dvb_firmware
@@ -26,7 +26,7 @@ use IO::Handle;
26 "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004", 26 "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004",
27 "or51211", "or51132_qam", "or51132_vsb", "bluebird", 27 "or51211", "or51132_qam", "or51132_vsb", "bluebird",
28 "opera1", "cx231xx", "cx18", "cx23885", "pvrusb2", "mpc718", 28 "opera1", "cx231xx", "cx18", "cx23885", "pvrusb2", "mpc718",
29 "af9015"); 29 "af9015", "ngene");
30 30
31# Check args 31# Check args
32syntax() if (scalar(@ARGV) != 1); 32syntax() if (scalar(@ARGV) != 1);
@@ -39,7 +39,7 @@ for ($i=0; $i < scalar(@components); $i++) {
39 die $@ if $@; 39 die $@ if $@;
40 print STDERR <<EOF; 40 print STDERR <<EOF;
41Firmware(s) $outfile extracted successfully. 41Firmware(s) $outfile extracted successfully.
42Now copy it(they) to either /usr/lib/hotplug/firmware or /lib/firmware 42Now copy it(them) to either /usr/lib/hotplug/firmware or /lib/firmware
43(depending on configuration of firmware hotplug). 43(depending on configuration of firmware hotplug).
44EOF 44EOF
45 exit(0); 45 exit(0);
@@ -549,6 +549,24 @@ sub af9015 {
549 close INFILE; 549 close INFILE;
550} 550}
551 551
552sub ngene {
553 my $url = "http://www.digitaldevices.de/download/";
554 my $file1 = "ngene_15.fw";
555 my $hash1 = "d798d5a757121174f0dbc5f2833c0c85";
556 my $file2 = "ngene_17.fw";
557 my $hash2 = "26b687136e127b8ac24b81e0eeafc20b";
558
559 checkstandard();
560
561 wgetfile($file1, $url . $file1);
562 verify($file1, $hash1);
563
564 wgetfile($file2, $url . $file2);
565 verify($file2, $hash2);
566
567 "$file1, $file2";
568}
569
552# --------------------------------------------------------------- 570# ---------------------------------------------------------------
553# Utilities 571# Utilities
554 572
@@ -667,6 +685,7 @@ sub delzero{
667sub syntax() { 685sub syntax() {
668 print STDERR "syntax: get_dvb_firmware <component>\n"; 686 print STDERR "syntax: get_dvb_firmware <component>\n";
669 print STDERR "Supported components:\n"; 687 print STDERR "Supported components:\n";
688 @components = sort @components;
670 for($i=0; $i < scalar(@components); $i++) { 689 for($i=0; $i < scalar(@components); $i++) {
671 print STDERR "\t" . $components[$i] . "\n"; 690 print STDERR "\t" . $components[$i] . "\n";
672 } 691 }
diff --git a/Documentation/eisa.txt b/Documentation/eisa.txt
index 60e361ba08c0..f297fc1202ae 100644
--- a/Documentation/eisa.txt
+++ b/Documentation/eisa.txt
@@ -171,7 +171,7 @@ device.
171virtual_root.force_probe : 171virtual_root.force_probe :
172 172
173Force the probing code to probe EISA slots even when it cannot find an 173Force the probing code to probe EISA slots even when it cannot find an
174EISA compliant mainboard (nothing appears on slot 0). Defaultd to 0 174EISA compliant mainboard (nothing appears on slot 0). Defaults to 0
175(don't force), and set to 1 (force probing) when either 175(don't force), and set to 1 (force probing) when either
176CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set. 176CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set.
177 177
diff --git a/Documentation/email-clients.txt b/Documentation/email-clients.txt
index a618efab7b15..945ff3fda433 100644
--- a/Documentation/email-clients.txt
+++ b/Documentation/email-clients.txt
@@ -216,26 +216,14 @@ Works. Use "Insert file..." or external editor.
216~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 216~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
217Gmail (Web GUI) 217Gmail (Web GUI)
218 218
219If you just have to use Gmail to send patches, it CAN be made to work. It 219Does not work for sending patches.
220requires a bit of external help, though. 220
221 221Gmail web client converts tabs to spaces automatically.
222The first problem is that Gmail converts tabs to spaces. This will 222
223totally break your patches. To prevent this, you have to use a different 223At the same time it wraps lines every 78 chars with CRLF style line breaks
224editor. There is a firefox extension called "ViewSourceWith" 224although tab2space problem can be solved with external editor.
225(https://addons.mozilla.org/en-US/firefox/addon/394) which allows you to 225
226edit any text box in the editor of your choice. Configure it to launch 226Another problem is that Gmail will base64-encode any message that has a
227your favorite editor. When you want to send a patch, use this technique. 227non-ASCII character. That includes things like European names.
228Once you have crafted your messsage + patch, save and exit the editor,
229which should reload the Gmail edit box. GMAIL WILL PRESERVE THE TABS.
230Hoorah. Apparently you can cut-n-paste literal tabs, but Gmail will
231convert those to spaces upon sending!
232
233The second problem is that Gmail converts tabs to spaces on replies. If
234you reply to a patch, don't expect to be able to apply it as a patch.
235
236The last problem is that Gmail will base64-encode any message that has a
237non-ASCII character. That includes things like European names. Be aware.
238
239Gmail is not convenient for lkml patches, but CAN be made to work.
240 228
241 ### 229 ###
diff --git a/Documentation/fault-injection/provoke-crashes.txt b/Documentation/fault-injection/provoke-crashes.txt
new file mode 100644
index 000000000000..7a9d3d81525b
--- /dev/null
+++ b/Documentation/fault-injection/provoke-crashes.txt
@@ -0,0 +1,38 @@
1The lkdtm module provides an interface to crash or injure the kernel at
2predefined crashpoints to evaluate the reliability of crash dumps obtained
3using different dumping solutions. The module uses KPROBEs to instrument
4crashing points, but can also crash the kernel directly without KRPOBE
5support.
6
7
8You can provide the way either through module arguments when inserting
9the module, or through a debugfs interface.
10
11Usage: insmod lkdtm.ko [recur_count={>0}] cpoint_name=<> cpoint_type=<>
12 [cpoint_count={>0}]
13
14 recur_count : Recursion level for the stack overflow test. Default is 10.
15
16 cpoint_name : Crash point where the kernel is to be crashed. It can be
17 one of INT_HARDWARE_ENTRY, INT_HW_IRQ_EN, INT_TASKLET_ENTRY,
18 FS_DEVRW, MEM_SWAPOUT, TIMERADD, SCSI_DISPATCH_CMD,
19 IDE_CORE_CP, DIRECT
20
21 cpoint_type : Indicates the action to be taken on hitting the crash point.
22 It can be one of PANIC, BUG, EXCEPTION, LOOP, OVERFLOW,
23 CORRUPT_STACK, UNALIGNED_LOAD_STORE_WRITE, OVERWRITE_ALLOCATION,
24 WRITE_AFTER_FREE,
25
26 cpoint_count : Indicates the number of times the crash point is to be hit
27 to trigger an action. The default is 10.
28
29You can also induce failures by mounting debugfs and writing the type to
30<mountpoint>/provoke-crash/<crashpoint>. E.g.,
31
32 mount -t debugfs debugfs /mnt
33 echo EXCEPTION > /mnt/provoke-crash/INT_HARDWARE_ENTRY
34
35
36A special file is `DIRECT' which will induce the crash directly without
37KPROBE instrumentation. This mode is the only one available when the module
38is built on a kernel without KPROBEs support.
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 0a46833c1b76..ed511af0f79a 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -6,21 +6,6 @@ be removed from this file.
6 6
7--------------------------- 7---------------------------
8 8
9What: USER_SCHED
10When: 2.6.34
11
12Why: USER_SCHED was implemented as a proof of concept for group scheduling.
13 The effect of USER_SCHED can already be achieved from userspace with
14 the help of libcgroup. The removal of USER_SCHED will also simplify
15 the scheduler code with the removal of one major ifdef. There are also
16 issues USER_SCHED has with USER_NS. A decision was taken not to fix
17 those and instead remove USER_SCHED. Also new group scheduling
18 features will not be implemented for USER_SCHED.
19
20Who: Dhaval Giani <dhaval@linux.vnet.ibm.com>
21
22---------------------------
23
24What: PRISM54 9What: PRISM54
25When: 2.6.34 10When: 2.6.34
26 11
@@ -64,6 +49,17 @@ Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com>
64 49
65--------------------------- 50---------------------------
66 51
52What: Deprecated snapshot ioctls
53When: 2.6.36
54
55Why: The ioctls in kernel/power/user.c were marked as deprecated long time
56 ago. Now they notify users about that so that they need to replace
57 their userspace. After some more time, remove them completely.
58
59Who: Jiri Slaby <jirislaby@gmail.com>
60
61---------------------------
62
67What: The ieee80211_regdom module parameter 63What: The ieee80211_regdom module parameter
68When: March 2010 / desktop catchup 64When: March 2010 / desktop catchup
69 65
@@ -88,27 +84,6 @@ Who: Luis R. Rodriguez <lrodriguez@atheros.com>
88 84
89--------------------------- 85---------------------------
90 86
91What: CONFIG_WIRELESS_OLD_REGULATORY - old static regulatory information
92When: March 2010 / desktop catchup
93
94Why: The old regulatory infrastructure has been replaced with a new one
95 which does not require statically defined regulatory domains. We do
96 not want to keep static regulatory domains in the kernel due to the
97 the dynamic nature of regulatory law and localization. We kept around
98 the old static definitions for the regulatory domains of:
99
100 * US
101 * JP
102 * EU
103
104 and used by default the US when CONFIG_WIRELESS_OLD_REGULATORY was
105 set. We will remove this option once the standard Linux desktop catches
106 up with the new userspace APIs we have implemented.
107
108Who: Luis R. Rodriguez <lrodriguez@atheros.com>
109
110---------------------------
111
112What: dev->power.power_state 87What: dev->power.power_state
113When: July 2007 88When: July 2007
114Why: Broken design for runtime control over driver power states, confusing 89Why: Broken design for runtime control over driver power states, confusing
@@ -142,19 +117,25 @@ Who: Mauro Carvalho Chehab <mchehab@infradead.org>
142--------------------------- 117---------------------------
143 118
144What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) 119What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl])
145When: November 2005 120When: 2.6.35/2.6.36
146Files: drivers/pcmcia/: pcmcia_ioctl.c 121Files: drivers/pcmcia/: pcmcia_ioctl.c
147Why: With the 16-bit PCMCIA subsystem now behaving (almost) like a 122Why: With the 16-bit PCMCIA subsystem now behaving (almost) like a
148 normal hotpluggable bus, and with it using the default kernel 123 normal hotpluggable bus, and with it using the default kernel
149 infrastructure (hotplug, driver core, sysfs) keeping the PCMCIA 124 infrastructure (hotplug, driver core, sysfs) keeping the PCMCIA
150 control ioctl needed by cardmgr and cardctl from pcmcia-cs is 125 control ioctl needed by cardmgr and cardctl from pcmcia-cs is
151 unnecessary, and makes further cleanups and integration of the 126 unnecessary and potentially harmful (it does not provide for
127 proper locking), and makes further cleanups and integration of the
152 PCMCIA subsystem into the Linux kernel device driver model more 128 PCMCIA subsystem into the Linux kernel device driver model more
153 difficult. The features provided by cardmgr and cardctl are either 129 difficult. The features provided by cardmgr and cardctl are either
154 handled by the kernel itself now or are available in the new 130 handled by the kernel itself now or are available in the new
155 pcmciautils package available at 131 pcmciautils package available at
156 http://kernel.org/pub/linux/utils/kernel/pcmcia/ 132 http://kernel.org/pub/linux/utils/kernel/pcmcia/
157Who: Dominik Brodowski <linux@brodo.de> 133
134 For all architectures except ARM, the associated config symbol
135 has been removed from kernel 2.6.34; for ARM, it will be likely
136 be removed from kernel 2.6.35. The actual code will then likely
137 be removed from kernel 2.6.36.
138Who: Dominik Brodowski <linux@dominikbrodowski.net>
158 139
159--------------------------- 140---------------------------
160 141
@@ -468,12 +449,6 @@ Who: Alok N Kataria <akataria@vmware.com>
468 449
469---------------------------- 450----------------------------
470 451
471What: adt7473 hardware monitoring driver
472When: February 2010
473Why: Obsoleted by the adt7475 driver.
474Who: Jean Delvare <khali@linux-fr.org>
475
476---------------------------
477What: Support for lcd_switch and display_get in asus-laptop driver 452What: Support for lcd_switch and display_get in asus-laptop driver
478When: March 2010 453When: March 2010
479Why: These two features use non-standard interfaces. There are the 454Why: These two features use non-standard interfaces. There are the
@@ -542,3 +517,75 @@ Why: Duplicate functionality with the gspca_zc3xx driver, zc0301 only
542 sensors) wich are also supported by the gspca_zc3xx driver 517 sensors) wich are also supported by the gspca_zc3xx driver
543 (which supports 53 USB-ID's in total) 518 (which supports 53 USB-ID's in total)
544Who: Hans de Goede <hdegoede@redhat.com> 519Who: Hans de Goede <hdegoede@redhat.com>
520
521----------------------------
522
523What: corgikbd, spitzkbd, tosakbd driver
524When: 2.6.35
525Files: drivers/input/keyboard/{corgi,spitz,tosa}kbd.c
526Why: We now have a generic GPIO based matrix keyboard driver that
527 are fully capable of handling all the keys on these devices.
528 The original drivers manipulate the GPIO registers directly
529 and so are difficult to maintain.
530Who: Eric Miao <eric.y.miao@gmail.com>
531
532----------------------------
533
534What: corgi_ssp and corgi_ts driver
535When: 2.6.35
536Files: arch/arm/mach-pxa/corgi_ssp.c, drivers/input/touchscreen/corgi_ts.c
537Why: The corgi touchscreen is now deprecated in favour of the generic
538 ads7846.c driver. The noise reduction technique used in corgi_ts.c,
539 that's to wait till vsync before ADC sampling, is also integrated into
540 ads7846 driver now. Provided that the original driver is not generic
541 and is difficult to maintain, it will be removed later.
542Who: Eric Miao <eric.y.miao@gmail.com>
543
544----------------------------
545
546What: capifs
547When: February 2011
548Files: drivers/isdn/capi/capifs.*
549Why: udev fully replaces this special file system that only contains CAPI
550 NCCI TTY device nodes. User space (pppdcapiplugin) works without
551 noticing the difference.
552Who: Jan Kiszka <jan.kiszka@web.de>
553
554----------------------------
555
556What: KVM memory aliases support
557When: July 2010
558Why: Memory aliasing support is used for speeding up guest vga access
559 through the vga windows.
560
561 Modern userspace no longer uses this feature, so it's just bitrotted
562 code and can be removed with no impact.
563Who: Avi Kivity <avi@redhat.com>
564
565----------------------------
566
567What: KVM kernel-allocated memory slots
568When: July 2010
569Why: Since 2.6.25, kvm supports user-allocated memory slots, which are
570 much more flexible than kernel-allocated slots. All current userspace
571 supports the newer interface and this code can be removed with no
572 impact.
573Who: Avi Kivity <avi@redhat.com>
574
575----------------------------
576
577What: KVM paravirt mmu host support
578When: January 2011
579Why: The paravirt mmu host support is slower than non-paravirt mmu, both
580 on newer and older hardware. It is already not exposed to the guest,
581 and kept only for live migration purposes.
582Who: Avi Kivity <avi@redhat.com>
583
584----------------------------
585
586What: "acpi=ht" boot option
587When: 2.6.35
588Why: Useful in 2003, implementation is a hack.
589 Generally invoked by accident today.
590 Seen as doing more harm than good.
591Who: Len Brown <len.brown@intel.com>
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 875d49696b6e..3bae418c6ad3 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -32,6 +32,8 @@ dlmfs.txt
32 - info on the userspace interface to the OCFS2 DLM. 32 - info on the userspace interface to the OCFS2 DLM.
33dnotify.txt 33dnotify.txt
34 - info about directory notification in Linux. 34 - info about directory notification in Linux.
35dnotify_test.c
36 - example program for dnotify
35ecryptfs.txt 37ecryptfs.txt
36 - docs on eCryptfs: stacked cryptographic filesystem for Linux. 38 - docs on eCryptfs: stacked cryptographic filesystem for Linux.
37exofs.txt 39exofs.txt
@@ -62,6 +64,8 @@ jfs.txt
62 - info and mount options for the JFS filesystem. 64 - info and mount options for the JFS filesystem.
63locks.txt 65locks.txt
64 - info on file locking implementations, flock() vs. fcntl(), etc. 66 - info on file locking implementations, flock() vs. fcntl(), etc.
67logfs.txt
68 - info on the LogFS flash filesystem.
65mandatory-locking.txt 69mandatory-locking.txt
66 - info on the Linux implementation of Sys V mandatory file locking. 70 - info on the Linux implementation of Sys V mandatory file locking.
67ncpfs.txt 71ncpfs.txt
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 18b9d0ca0630..06bbbed71206 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -460,13 +460,6 @@ in sys_read() and friends.
460 460
461--------------------------- dquot_operations ------------------------------- 461--------------------------- dquot_operations -------------------------------
462prototypes: 462prototypes:
463 int (*initialize) (struct inode *, int);
464 int (*drop) (struct inode *);
465 int (*alloc_space) (struct inode *, qsize_t, int);
466 int (*alloc_inode) (const struct inode *, unsigned long);
467 int (*free_space) (struct inode *, qsize_t);
468 int (*free_inode) (const struct inode *, unsigned long);
469 int (*transfer) (struct inode *, struct iattr *);
470 int (*write_dquot) (struct dquot *); 463 int (*write_dquot) (struct dquot *);
471 int (*acquire_dquot) (struct dquot *); 464 int (*acquire_dquot) (struct dquot *);
472 int (*release_dquot) (struct dquot *); 465 int (*release_dquot) (struct dquot *);
@@ -479,13 +472,6 @@ a proper locking wrt the filesystem and call the generic quota operations.
479What filesystem should expect from the generic quota functions: 472What filesystem should expect from the generic quota functions:
480 473
481 FS recursion Held locks when called 474 FS recursion Held locks when called
482initialize: yes maybe dqonoff_sem
483drop: yes -
484alloc_space: ->mark_dirty() -
485alloc_inode: ->mark_dirty() -
486free_space: ->mark_dirty() -
487free_inode: ->mark_dirty() -
488transfer: yes -
489write_dquot: yes dqonoff_sem or dqptr_sem 475write_dquot: yes dqonoff_sem or dqptr_sem
490acquire_dquot: yes dqonoff_sem or dqptr_sem 476acquire_dquot: yes dqonoff_sem or dqptr_sem
491release_dquot: yes dqonoff_sem or dqptr_sem 477release_dquot: yes dqonoff_sem or dqptr_sem
@@ -495,10 +481,6 @@ write_info: yes dqonoff_sem
495FS recursion means calling ->quota_read() and ->quota_write() from superblock 481FS recursion means calling ->quota_read() and ->quota_write() from superblock
496operations. 482operations.
497 483
498->alloc_space(), ->alloc_inode(), ->free_space(), ->free_inode() are called
499only directly by the filesystem and do not call any fs functions only
500the ->mark_dirty() operation.
501
502More details about quota locking can be found in fs/dquot.c. 484More details about quota locking can be found in fs/dquot.c.
503 485
504--------------------------- vm_operations_struct ----------------------------- 486--------------------------- vm_operations_struct -----------------------------
diff --git a/Documentation/filesystems/Makefile b/Documentation/filesystems/Makefile
new file mode 100644
index 000000000000..a5dd114da14f
--- /dev/null
+++ b/Documentation/filesystems/Makefile
@@ -0,0 +1,8 @@
1# kbuild trick to avoid linker error. Can be omitted if a module is built.
2obj- := dummy.o
3
4# List of programs to build
5hostprogs-y := dnotify_test
6
7# Tell kbuild to always build the programs
8always := $(hostprogs-y)
diff --git a/Documentation/filesystems/dentry-locking.txt b/Documentation/filesystems/dentry-locking.txt
index 4c0c575a4012..79334ed5daa7 100644
--- a/Documentation/filesystems/dentry-locking.txt
+++ b/Documentation/filesystems/dentry-locking.txt
@@ -62,7 +62,8 @@ changes are :
622. Insertion of a dentry into the hash table is done using 622. Insertion of a dentry into the hash table is done using
63 hlist_add_head_rcu() which take care of ordering the writes - the 63 hlist_add_head_rcu() which take care of ordering the writes - the
64 writes to the dentry must be visible before the dentry is 64 writes to the dentry must be visible before the dentry is
65 inserted. This works in conjunction with hlist_for_each_rcu() while 65 inserted. This works in conjunction with hlist_for_each_rcu(),
66 which has since been replaced by hlist_for_each_entry_rcu(), while
66 walking the hash chain. The only requirement is that all 67 walking the hash chain. The only requirement is that all
67 initialization to the dentry must be done before 68 initialization to the dentry must be done before
68 hlist_add_head_rcu() since we don't have dcache_lock protection 69 hlist_add_head_rcu() since we don't have dcache_lock protection
diff --git a/Documentation/filesystems/dnotify.txt b/Documentation/filesystems/dnotify.txt
index 9f5d338ddbb8..6baf88f46859 100644
--- a/Documentation/filesystems/dnotify.txt
+++ b/Documentation/filesystems/dnotify.txt
@@ -62,38 +62,9 @@ disabled, fcntl(fd, F_NOTIFY, ...) will return -EINVAL.
62 62
63Example 63Example
64------- 64-------
65See Documentation/filesystems/dnotify_test.c for an example.
65 66
66 #define _GNU_SOURCE /* needed to get the defines */ 67NOTE
67 #include <fcntl.h> /* in glibc 2.2 this has the needed 68----
68 values defined */ 69Beginning with Linux 2.6.13, dnotify has been replaced by inotify.
69 #include <signal.h> 70See Documentation/filesystems/inotify.txt for more information on it.
70 #include <stdio.h>
71 #include <unistd.h>
72
73 static volatile int event_fd;
74
75 static void handler(int sig, siginfo_t *si, void *data)
76 {
77 event_fd = si->si_fd;
78 }
79
80 int main(void)
81 {
82 struct sigaction act;
83 int fd;
84
85 act.sa_sigaction = handler;
86 sigemptyset(&act.sa_mask);
87 act.sa_flags = SA_SIGINFO;
88 sigaction(SIGRTMIN + 1, &act, NULL);
89
90 fd = open(".", O_RDONLY);
91 fcntl(fd, F_SETSIG, SIGRTMIN + 1);
92 fcntl(fd, F_NOTIFY, DN_MODIFY|DN_CREATE|DN_MULTISHOT);
93 /* we will now be notified if any of the files
94 in "." is modified or new files are created */
95 while (1) {
96 pause();
97 printf("Got event on fd=%d\n", event_fd);
98 }
99 }
diff --git a/Documentation/filesystems/dnotify_test.c b/Documentation/filesystems/dnotify_test.c
new file mode 100644
index 000000000000..8b37b4a1e18d
--- /dev/null
+++ b/Documentation/filesystems/dnotify_test.c
@@ -0,0 +1,34 @@
1#define _GNU_SOURCE /* needed to get the defines */
2#include <fcntl.h> /* in glibc 2.2 this has the needed
3 values defined */
4#include <signal.h>
5#include <stdio.h>
6#include <unistd.h>
7
8static volatile int event_fd;
9
10static void handler(int sig, siginfo_t *si, void *data)
11{
12 event_fd = si->si_fd;
13}
14
15int main(void)
16{
17 struct sigaction act;
18 int fd;
19
20 act.sa_sigaction = handler;
21 sigemptyset(&act.sa_mask);
22 act.sa_flags = SA_SIGINFO;
23 sigaction(SIGRTMIN + 1, &act, NULL);
24
25 fd = open(".", O_RDONLY);
26 fcntl(fd, F_SETSIG, SIGRTMIN + 1);
27 fcntl(fd, F_NOTIFY, DN_MODIFY|DN_CREATE|DN_MULTISHOT);
28 /* we will now be notified if any of the files
29 in "." is modified or new files are created */
30 while (1) {
31 pause();
32 printf("Got event on fd=%d\n", event_fd);
33 }
34}
diff --git a/Documentation/filesystems/logfs.txt b/Documentation/filesystems/logfs.txt
new file mode 100644
index 000000000000..e64c94ba401a
--- /dev/null
+++ b/Documentation/filesystems/logfs.txt
@@ -0,0 +1,241 @@
1
2The LogFS Flash Filesystem
3==========================
4
5Specification
6=============
7
8Superblocks
9-----------
10
11Two superblocks exist at the beginning and end of the filesystem.
12Each superblock is 256 Bytes large, with another 3840 Bytes reserved
13for future purposes, making a total of 4096 Bytes.
14
15Superblock locations may differ for MTD and block devices. On MTD the
16first non-bad block contains a superblock in the first 4096 Bytes and
17the last non-bad block contains a superblock in the last 4096 Bytes.
18On block devices, the first 4096 Bytes of the device contain the first
19superblock and the last aligned 4096 Byte-block contains the second
20superblock.
21
22For the most part, the superblocks can be considered read-only. They
23are written only to correct errors detected within the superblocks,
24move the journal and change the filesystem parameters through tunefs.
25As a result, the superblock does not contain any fields that require
26constant updates, like the amount of free space, etc.
27
28Segments
29--------
30
31The space in the device is split up into equal-sized segments.
32Segments are the primary write unit of LogFS. Within each segments,
33writes happen from front (low addresses) to back (high addresses. If
34only a partial segment has been written, the segment number, the
35current position within and optionally a write buffer are stored in
36the journal.
37
38Segments are erased as a whole. Therefore Garbage Collection may be
39required to completely free a segment before doing so.
40
41Journal
42--------
43
44The journal contains all global information about the filesystem that
45is subject to frequent change. At mount time, it has to be scanned
46for the most recent commit entry, which contains a list of pointers to
47all currently valid entries.
48
49Object Store
50------------
51
52All space except for the superblocks and journal is part of the object
53store. Each segment contains a segment header and a number of
54objects, each consisting of the object header and the payload.
55Objects are either inodes, directory entries (dentries), file data
56blocks or indirect blocks.
57
58Levels
59------
60
61Garbage collection (GC) may fail if all data is written
62indiscriminately. One requirement of GC is that data is seperated
63roughly according to the distance between the tree root and the data.
64Effectively that means all file data is on level 0, indirect blocks
65are on levels 1, 2, 3 4 or 5 for 1x, 2x, 3x, 4x or 5x indirect blocks,
66respectively. Inode file data is on level 6 for the inodes and 7-11
67for indirect blocks.
68
69Each segment contains objects of a single level only. As a result,
70each level requires its own seperate segment to be open for writing.
71
72Inode File
73----------
74
75All inodes are stored in a special file, the inode file. Single
76exception is the inode file's inode (master inode) which for obvious
77reasons is stored in the journal instead. Instead of data blocks, the
78leaf nodes of the inode files are inodes.
79
80Aliases
81-------
82
83Writes in LogFS are done by means of a wandering tree. A naïve
84implementation would require that for each write or a block, all
85parent blocks are written as well, since the block pointers have
86changed. Such an implementation would not be very efficient.
87
88In LogFS, the block pointer changes are cached in the journal by means
89of alias entries. Each alias consists of its logical address - inode
90number, block index, level and child number (index into block) - and
91the changed data. Any 8-byte word can be changes in this manner.
92
93Currently aliases are used for block pointers, file size, file used
94bytes and the height of an inodes indirect tree.
95
96Segment Aliases
97---------------
98
99Related to regular aliases, these are used to handle bad blocks.
100Initially, bad blocks are handled by moving the affected segment
101content to a spare segment and noting this move in the journal with a
102segment alias, a simple (to, from) tupel. GC will later empty this
103segment and the alias can be removed again. This is used on MTD only.
104
105Vim
106---
107
108By cleverly predicting the life time of data, it is possible to
109seperate long-living data from short-living data and thereby reduce
110the GC overhead later. Each type of distinc life expectency (vim) can
111have a seperate segment open for writing. Each (level, vim) tupel can
112be open just once. If an open segment with unknown vim is encountered
113at mount time, it is closed and ignored henceforth.
114
115Indirect Tree
116-------------
117
118Inodes in LogFS are similar to FFS-style filesystems with direct and
119indirect block pointers. One difference is that LogFS uses a single
120indirect pointer that can be either a 1x, 2x, etc. indirect pointer.
121A height field in the inode defines the height of the indirect tree
122and thereby the indirection of the pointer.
123
124Another difference is the addressing of indirect blocks. In LogFS,
125the first 16 pointers in the first indirect block are left empty,
126corresponding to the 16 direct pointers in the inode. In ext2 (maybe
127others as well) the first pointer in the first indirect block
128corresponds to logical block 12, skipping the 12 direct pointers.
129So where ext2 is using arithmetic to better utilize space, LogFS keeps
130arithmetic simple and uses compression to save space.
131
132Compression
133-----------
134
135Both file data and metadata can be compressed. Compression for file
136data can be enabled with chattr +c and disabled with chattr -c. Doing
137so has no effect on existing data, but new data will be stored
138accordingly. New inodes will inherit the compression flag of the
139parent directory.
140
141Metadata is always compressed. However, the space accounting ignores
142this and charges for the uncompressed size. Failing to do so could
143result in GC failures when, after moving some data, indirect blocks
144compress worse than previously. Even on a 100% full medium, GC may
145not consume any extra space, so the compression gains are lost space
146to the user.
147
148However, they are not lost space to the filesystem internals. By
149cheating the user for those bytes, the filesystem gained some slack
150space and GC will run less often and faster.
151
152Garbage Collection and Wear Leveling
153------------------------------------
154
155Garbage collection is invoked whenever the number of free segments
156falls below a threshold. The best (known) candidate is picked based
157on the least amount of valid data contained in the segment. All
158remaining valid data is copied elsewhere, thereby invalidating it.
159
160The GC code also checks for aliases and writes then back if their
161number gets too large.
162
163Wear leveling is done by occasionally picking a suboptimal segment for
164garbage collection. If a stale segments erase count is significantly
165lower than the active segments' erase counts, it will be picked. Wear
166leveling is rate limited, so it will never monopolize the device for
167more than one segment worth at a time.
168
169Values for "occasionally", "significantly lower" are compile time
170constants.
171
172Hashed directories
173------------------
174
175To satisfy efficient lookup(), directory entries are hashed and
176located based on the hash. In order to both support large directories
177and not be overly inefficient for small directories, several hash
178tables of increasing size are used. For each table, the hash value
179modulo the table size gives the table index.
180
181Tables sizes are chosen to limit the number of indirect blocks with a
182fully populated table to 0, 1, 2 or 3 respectively. So the first
183table contains 16 entries, the second 512-16, etc.
184
185The last table is special in several ways. First its size depends on
186the effective 32bit limit on telldir/seekdir cookies. Since logfs
187uses the upper half of the address space for indirect blocks, the size
188is limited to 2^31. Secondly the table contains hash buckets with 16
189entries each.
190
191Using single-entry buckets would result in birthday "attacks". At
192just 2^16 used entries, hash collisions would be likely (P >= 0.5).
193My math skills are insufficient to do the combinatorics for the 17x
194collisions necessary to overflow a bucket, but testing showed that in
19510,000 runs the lowest directory fill before a bucket overflow was
196188,057,130 entries with an average of 315,149,915 entries. So for
197directory sizes of up to a million, bucket overflows should be
198virtually impossible under normal circumstances.
199
200With carefully chosen filenames, it is obviously possible to cause an
201overflow with just 21 entries (4 higher tables + 16 entries + 1). So
202there may be a security concern if a malicious user has write access
203to a directory.
204
205Open For Discussion
206===================
207
208Device Address Space
209--------------------
210
211A device address space is used for caching. Both block devices and
212MTD provide functions to either read a single page or write a segment.
213Partial segments may be written for data integrity, but where possible
214complete segments are written for performance on simple block device
215flash media.
216
217Meta Inodes
218-----------
219
220Inodes are stored in the inode file, which is just a regular file for
221most purposes. At umount time, however, the inode file needs to
222remain open until all dirty inodes are written. So
223generic_shutdown_super() may not close this inode, but shouldn't
224complain about remaining inodes due to the inode file either. Same
225goes for mapping inode of the device address space.
226
227Currently logfs uses a hack that essentially copies part of fs/inode.c
228code over. A general solution would be preferred.
229
230Indirect block mapping
231----------------------
232
233With compression, the block device (or mapping inode) cannot be used
234to cache indirect blocks. Some other place is required. Currently
235logfs uses the top half of each inode's address space. The low 8TB
236(on 32bit) are filled with file data, the high 8TB are used for
237indirect blocks.
238
239One problem is that 16TB files created on 64bit systems actually have
240data in the top 8TB. But files >16TB would cause problems anyway, so
241only the limit has changed.
diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt
index 1bd0d0c05171..6a53a84afc72 100644
--- a/Documentation/filesystems/nfs/nfs41-server.txt
+++ b/Documentation/filesystems/nfs/nfs41-server.txt
@@ -17,8 +17,7 @@ kernels must turn 4.1 on or off *before* turning support for version 4
17on or off; rpc.nfsd does this correctly.) 17on or off; rpc.nfsd does this correctly.)
18 18
19The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based 19The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based
20on the latest NFSv4.1 Internet Draft: 20on RFC 5661.
21http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29
22 21
23From the many new features in NFSv4.1 the current implementation 22From the many new features in NFSv4.1 the current implementation
24focuses on the mandatory-to-implement NFSv4.1 Sessions, providing 23focuses on the mandatory-to-implement NFSv4.1 Sessions, providing
@@ -44,7 +43,7 @@ interoperability problems with future clients. Known issues:
44 trunking, but this is a mandatory feature, and its use is 43 trunking, but this is a mandatory feature, and its use is
45 recommended to clients in a number of places. (E.g. to ensure 44 recommended to clients in a number of places. (E.g. to ensure
46 timely renewal in case an existing connection's retry timeouts 45 timely renewal in case an existing connection's retry timeouts
47 have gotten too long; see section 8.3 of the draft.) 46 have gotten too long; see section 8.3 of the RFC.)
48 Therefore, lack of this feature may cause future clients to 47 Therefore, lack of this feature may cause future clients to
49 fail. 48 fail.
50 - Incomplete backchannel support: incomplete backchannel gss 49 - Incomplete backchannel support: incomplete backchannel gss
diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt
index 839efd8a8a8c..cf6d0d85ca82 100644
--- a/Documentation/filesystems/nilfs2.txt
+++ b/Documentation/filesystems/nilfs2.txt
@@ -74,6 +74,9 @@ norecovery Disable recovery of the filesystem on mount.
74 This disables every write access on the device for 74 This disables every write access on the device for
75 read-only mounts or snapshots. This option will fail 75 read-only mounts or snapshots. This option will fail
76 for r/w mounts on an unclean volume. 76 for r/w mounts on an unclean volume.
77discard Issue discard/TRIM commands to the underlying block
78 device when blocks are freed. This is useful for SSD
79 devices and sparse/thinly-provisioned LUNs.
77 80
78NILFS2 usage 81NILFS2 usage
79============ 82============
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 0d07513a67a6..a4f30faa4f1f 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -164,6 +164,7 @@ read the file /proc/PID/status:
164 VmExe: 68 kB 164 VmExe: 68 kB
165 VmLib: 1412 kB 165 VmLib: 1412 kB
166 VmPTE: 20 kb 166 VmPTE: 20 kb
167 VmSwap: 0 kB
167 Threads: 1 168 Threads: 1
168 SigQ: 0/28578 169 SigQ: 0/28578
169 SigPnd: 0000000000000000 170 SigPnd: 0000000000000000
@@ -188,7 +189,13 @@ memory usage. Its seven fields are explained in Table 1-3. The stat file
188contains details information about the process itself. Its fields are 189contains details information about the process itself. Its fields are
189explained in Table 1-4. 190explained in Table 1-4.
190 191
191Table 1-2: Contents of the statm files (as of 2.6.30-rc7) 192(for SMP CONFIG users)
193For making accounting scalable, RSS related information are handled in
194asynchronous manner and the vaule may not be very precise. To see a precise
195snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table.
196It's slow but very precise.
197
198Table 1-2: Contents of the status files (as of 2.6.30-rc7)
192.............................................................................. 199..............................................................................
193 Field Content 200 Field Content
194 Name filename of the executable 201 Name filename of the executable
@@ -213,6 +220,7 @@ Table 1-2: Contents of the statm files (as of 2.6.30-rc7)
213 VmExe size of text segment 220 VmExe size of text segment
214 VmLib size of shared library code 221 VmLib size of shared library code
215 VmPTE size of page table entries 222 VmPTE size of page table entries
223 VmSwap size of swap usage (the number of referred swapents)
216 Threads number of threads 224 Threads number of threads
217 SigQ number of signals queued/max. number for queue 225 SigQ number of signals queued/max. number for queue
218 SigPnd bitmap of pending signals for the thread 226 SigPnd bitmap of pending signals for the thread
@@ -430,6 +438,7 @@ Table 1-5: Kernel info in /proc
430 modules List of loaded modules 438 modules List of loaded modules
431 mounts Mounted filesystems 439 mounts Mounted filesystems
432 net Networking info (see text) 440 net Networking info (see text)
441 pagetypeinfo Additional page allocator information (see text) (2.5)
433 partitions Table of partitions known to the system 442 partitions Table of partitions known to the system
434 pci Deprecated info of PCI bus (new way -> /proc/bus/pci/, 443 pci Deprecated info of PCI bus (new way -> /proc/bus/pci/,
435 decoupled by lspci (2.4) 444 decoupled by lspci (2.4)
@@ -584,7 +593,7 @@ Node 0, zone DMA 0 4 5 4 4 3 ...
584Node 0, zone Normal 1 0 0 1 101 8 ... 593Node 0, zone Normal 1 0 0 1 101 8 ...
585Node 0, zone HighMem 2 0 0 1 1 0 ... 594Node 0, zone HighMem 2 0 0 1 1 0 ...
586 595
587Memory fragmentation is a problem under some workloads, and buddyinfo is a 596External fragmentation is a problem under some workloads, and buddyinfo is a
588useful tool for helping diagnose these problems. Buddyinfo will give you a 597useful tool for helping diagnose these problems. Buddyinfo will give you a
589clue as to how big an area you can safely allocate, or why a previous 598clue as to how big an area you can safely allocate, or why a previous
590allocation failed. 599allocation failed.
@@ -594,6 +603,48 @@ available. In this case, there are 0 chunks of 2^0*PAGE_SIZE available in
594ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE 603ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE
595available in ZONE_NORMAL, etc... 604available in ZONE_NORMAL, etc...
596 605
606More information relevant to external fragmentation can be found in
607pagetypeinfo.
608
609> cat /proc/pagetypeinfo
610Page block order: 9
611Pages per block: 512
612
613Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10
614Node 0, zone DMA, type Unmovable 0 0 0 1 1 1 1 1 1 1 0
615Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0
616Node 0, zone DMA, type Movable 1 1 2 1 2 1 1 0 1 0 2
617Node 0, zone DMA, type Reserve 0 0 0 0 0 0 0 0 0 1 0
618Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0
619Node 0, zone DMA32, type Unmovable 103 54 77 1 1 1 11 8 7 1 9
620Node 0, zone DMA32, type Reclaimable 0 0 2 1 0 0 0 0 1 0 0
621Node 0, zone DMA32, type Movable 169 152 113 91 77 54 39 13 6 1 452
622Node 0, zone DMA32, type Reserve 1 2 2 2 2 0 1 1 1 1 0
623Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0
624
625Number of blocks type Unmovable Reclaimable Movable Reserve Isolate
626Node 0, zone DMA 2 0 5 1 0
627Node 0, zone DMA32 41 6 967 2 0
628
629Fragmentation avoidance in the kernel works by grouping pages of different
630migrate types into the same contiguous regions of memory called page blocks.
631A page block is typically the size of the default hugepage size e.g. 2MB on
632X86-64. By keeping pages grouped based on their ability to move, the kernel
633can reclaim pages within a page block to satisfy a high-order allocation.
634
635The pagetypinfo begins with information on the size of a page block. It
636then gives the same type of information as buddyinfo except broken down
637by migrate-type and finishes with details on how many page blocks of each
638type exist.
639
640If min_free_kbytes has been tuned correctly (recommendations made by hugeadm
641from libhugetlbfs http://sourceforge.net/projects/libhugetlbfs/), one can
642make an estimate of the likely number of huge pages that can be allocated
643at a given point in time. All the "Movable" blocks should be allocatable
644unless memory has been mlock()'d. Some of the Reclaimable blocks should
645also be allocatable although a lot of filesystem metadata may have to be
646reclaimed to achieve this.
647
597.............................................................................. 648..............................................................................
598 649
599meminfo: 650meminfo:
diff --git a/Documentation/filesystems/sharedsubtree.txt b/Documentation/filesystems/sharedsubtree.txt
index 23a181074f94..fc0e39af43c3 100644
--- a/Documentation/filesystems/sharedsubtree.txt
+++ b/Documentation/filesystems/sharedsubtree.txt
@@ -837,6 +837,9 @@ replicas continue to be exactly same.
837 individual lists does not affect propagation or the way propagation 837 individual lists does not affect propagation or the way propagation
838 tree is modified by operations. 838 tree is modified by operations.
839 839
840 All vfsmounts in a peer group have the same ->mnt_master. If it is
841 non-NULL, they form a contiguous (ordered) segment of slave list.
842
840 A example propagation tree looks as shown in the figure below. 843 A example propagation tree looks as shown in the figure below.
841 [ NOTE: Though it looks like a forest, if we consider all the shared 844 [ NOTE: Though it looks like a forest, if we consider all the shared
842 mounts as a conceptual entity called 'pnode', it becomes a tree] 845 mounts as a conceptual entity called 'pnode', it becomes a tree]
@@ -874,8 +877,19 @@ replicas continue to be exactly same.
874 877
875 NOTE: The propagation tree is orthogonal to the mount tree. 878 NOTE: The propagation tree is orthogonal to the mount tree.
876 879
8808B Locking:
881
882 ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
883 by namespace_sem (exclusive for modifications, shared for reading).
884
885 Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
886 There are two exceptions: do_add_mount() and clone_mnt().
887 The former modifies a vfsmount that has not been visible in any shared
888 data structures yet.
889 The latter holds namespace_sem and the only references to vfsmount
890 are in lists that can't be traversed without namespace_sem.
877 891
8788B Algorithm: 8928C Algorithm:
879 893
880 The crux of the implementation resides in rbind/move operation. 894 The crux of the implementation resides in rbind/move operation.
881 895
diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt
index 1866c27eec69..c2c6e9b39bbe 100644
--- a/Documentation/gpio.txt
+++ b/Documentation/gpio.txt
@@ -253,6 +253,70 @@ pin setup (e.g. controlling which pin the GPIO uses, pullup/pulldown).
253Also note that it's your responsibility to have stopped using a GPIO 253Also note that it's your responsibility to have stopped using a GPIO
254before you free it. 254before you free it.
255 255
256Considering in most cases GPIOs are actually configured right after they
257are claimed, three additional calls are defined:
258
259 /* request a single GPIO, with initial configuration specified by
260 * 'flags', identical to gpio_request() wrt other arguments and
261 * return value
262 */
263 int gpio_request_one(unsigned gpio, unsigned long flags, const char *label);
264
265 /* request multiple GPIOs in a single call
266 */
267 int gpio_request_array(struct gpio *array, size_t num);
268
269 /* release multiple GPIOs in a single call
270 */
271 void gpio_free_array(struct gpio *array, size_t num);
272
273where 'flags' is currently defined to specify the following properties:
274
275 * GPIOF_DIR_IN - to configure direction as input
276 * GPIOF_DIR_OUT - to configure direction as output
277
278 * GPIOF_INIT_LOW - as output, set initial level to LOW
279 * GPIOF_INIT_HIGH - as output, set initial level to HIGH
280
281since GPIOF_INIT_* are only valid when configured as output, so group valid
282combinations as:
283
284 * GPIOF_IN - configure as input
285 * GPIOF_OUT_INIT_LOW - configured as output, initial level LOW
286 * GPIOF_OUT_INIT_HIGH - configured as output, initial level HIGH
287
288In the future, these flags can be extended to support more properties such
289as open-drain status.
290
291Further more, to ease the claim/release of multiple GPIOs, 'struct gpio' is
292introduced to encapsulate all three fields as:
293
294 struct gpio {
295 unsigned gpio;
296 unsigned long flags;
297 const char *label;
298 };
299
300A typical example of usage:
301
302 static struct gpio leds_gpios[] = {
303 { 32, GPIOF_OUT_INIT_HIGH, "Power LED" }, /* default to ON */
304 { 33, GPIOF_OUT_INIT_LOW, "Green LED" }, /* default to OFF */
305 { 34, GPIOF_OUT_INIT_LOW, "Red LED" }, /* default to OFF */
306 { 35, GPIOF_OUT_INIT_LOW, "Blue LED" }, /* default to OFF */
307 { ... },
308 };
309
310 err = gpio_request_one(31, GPIOF_IN, "Reset Button");
311 if (err)
312 ...
313
314 err = gpio_request_array(leds_gpios, ARRAY_SIZE(leds_gpios));
315 if (err)
316 ...
317
318 gpio_free_array(leds_gpios, ARRAY_SIZE(leds_gpios));
319
256 320
257GPIOs mapped to IRQs 321GPIOs mapped to IRQs
258-------------------- 322--------------------
diff --git a/Documentation/hwmon/abituguru b/Documentation/hwmon/abituguru
index 87ffa0f5ec70..5eb3b9d5f0d5 100644
--- a/Documentation/hwmon/abituguru
+++ b/Documentation/hwmon/abituguru
@@ -30,7 +30,7 @@ Supported chips:
30 bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1 30 bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1
31 You may also need to specify the fan_sensors option for these boards 31 You may also need to specify the fan_sensors option for these boards
32 fan_sensors=5 32 fan_sensors=5
33 2) There is a seperate abituguru3 driver for these motherboards, 33 2) There is a separate abituguru3 driver for these motherboards,
34 the abituguru (without the 3 !) driver will not work on these 34 the abituguru (without the 3 !) driver will not work on these
35 motherboards (and visa versa)! 35 motherboards (and visa versa)!
36 36
diff --git a/Documentation/hwmon/adt7411 b/Documentation/hwmon/adt7411
new file mode 100644
index 000000000000..1632960f9745
--- /dev/null
+++ b/Documentation/hwmon/adt7411
@@ -0,0 +1,42 @@
1Kernel driver adt7411
2=====================
3
4Supported chips:
5 * Analog Devices ADT7411
6 Prefix: 'adt7411'
7 Addresses scanned: 0x48, 0x4a, 0x4b
8 Datasheet: Publicly available at the Analog Devices website
9
10Author: Wolfram Sang (based on adt7470 by Darrick J. Wong)
11
12Description
13-----------
14
15This driver implements support for the Analog Devices ADT7411 chip. There may
16be other chips that implement this interface.
17
18The ADT7411 can use an I2C/SMBus compatible 2-wire interface or an
19SPI-compatible 4-wire interface. It provides a 10-bit analog to digital
20converter which measures 1 temperature, vdd and 8 input voltages. It has an
21internal temperature sensor, but an external one can also be connected (one
22loses 2 inputs then). There are high- and low-limit registers for all inputs.
23
24Check the datasheet for details.
25
26sysfs-Interface
27---------------
28
29in0_input - vdd voltage input
30in[1-8]_input - analog 1-8 input
31temp1_input - temperature input
32
33Besides standard interfaces, this driver adds (0 = off, 1 = on):
34
35 adc_ref_vdd - Use vdd as reference instead of 2.25 V
36 fast_sampling - Sample at 22.5 kHz instead of 1.4 kHz, but drop filters
37 no_average - Turn off averaging over 16 samples
38
39Notes
40-----
41
42SPI, external temperature sensor and limit registers are not supported yet.
diff --git a/Documentation/hwmon/adt7473 b/Documentation/hwmon/adt7473
deleted file mode 100644
index 446612bd1fb9..000000000000
--- a/Documentation/hwmon/adt7473
+++ /dev/null
@@ -1,74 +0,0 @@
1Kernel driver adt7473
2======================
3
4Supported chips:
5 * Analog Devices ADT7473
6 Prefix: 'adt7473'
7 Addresses scanned: I2C 0x2C, 0x2D, 0x2E
8 Datasheet: Publicly available at the Analog Devices website
9
10Author: Darrick J. Wong
11
12This driver is depreacted, please use the adt7475 driver instead.
13
14Description
15-----------
16
17This driver implements support for the Analog Devices ADT7473 chip family.
18
19The ADT7473 uses the 2-wire interface compatible with the SMBUS 2.0
20specification. Using an analog to digital converter it measures three (3)
21temperatures and two (2) voltages. It has four (4) 16-bit counters for
22measuring fan speed. There are three (3) PWM outputs that can be used
23to control fan speed.
24
25A sophisticated control system for the PWM outputs is designed into the
26ADT7473 that allows fan speed to be adjusted automatically based on any of the
27three temperature sensors. Each PWM output is individually adjustable and
28programmable. Once configured, the ADT7473 will adjust the PWM outputs in
29response to the measured temperatures without further host intervention.
30This feature can also be disabled for manual control of the PWM's.
31
32Each of the measured inputs (voltage, temperature, fan speed) has
33corresponding high/low limit values. The ADT7473 will signal an ALARM if
34any measured value exceeds either limit.
35
36The ADT7473 samples all inputs continuously. The driver will not read
37the registers more often than once every other second. Further,
38configuration data is only read once per minute.
39
40Special Features
41----------------
42
43The ADT7473 have a 10-bit ADC and can therefore measure temperatures
44with 0.25 degC resolution. Temperature readings can be configured either
45for twos complement format or "Offset 64" format, wherein 63 is subtracted
46from the raw value to get the temperature value.
47
48The Analog Devices datasheet is very detailed and describes a procedure for
49determining an optimal configuration for the automatic PWM control.
50
51Configuration Notes
52-------------------
53
54Besides standard interfaces driver adds the following:
55
56* PWM Control
57
58* pwm#_auto_point1_pwm and temp#_auto_point1_temp and
59* pwm#_auto_point2_pwm and temp#_auto_point2_temp -
60
61point1: Set the pwm speed at a lower temperature bound.
62point2: Set the pwm speed at a higher temperature bound.
63
64The ADT7473 will scale the pwm between the lower and higher pwm speed when
65the temperature is between the two temperature boundaries. PWM values range
66from 0 (off) to 255 (full speed). Fan speed will be set to maximum when the
67temperature sensor associated with the PWM control exceeds temp#_max.
68
69Notes
70-----
71
72The NVIDIA binary driver presents an ADT7473 chip via an on-card i2c bus.
73Unfortunately, they fail to set the i2c adapter class, so this driver may
74fail to find the chip until the nvidia driver is patched.
diff --git a/Documentation/hwmon/asc7621 b/Documentation/hwmon/asc7621
new file mode 100644
index 000000000000..7287be7e1f21
--- /dev/null
+++ b/Documentation/hwmon/asc7621
@@ -0,0 +1,296 @@
1Kernel driver asc7621
2==================
3
4Supported chips:
5 Andigilog aSC7621 and aSC7621a
6 Prefix: 'asc7621'
7 Addresses scanned: I2C 0x2c, 0x2d, 0x2e
8 Datasheet: http://www.fairview5.com/linux/asc7621/asc7621.pdf
9
10Author:
11 George Joseph
12
13Description provided by Dave Pivin @ Andigilog:
14
15Andigilog has both the PECI and pre-PECI versions of the Heceta-6, as
16Intel calls them. Heceta-6e has high frequency PWM and Heceta-6p has
17added PECI and a 4th thermal zone. The Andigilog aSC7611 is the
18Heceta-6e part and aSC7621 is the Heceta-6p part. They are both in
19volume production, shipping to Intel and their subs.
20
21We have enhanced both parts relative to the governing Intel
22specification. First enhancement is temperature reading resolution. We
23have used registers below 20h for vendor-specific functions in addition
24to those in the Intel-specified vendor range.
25
26Our conversion process produces a result that is reported as two bytes.
27The fan speed control uses this finer value to produce a "step-less" fan
28PWM output. These two bytes are "read-locked" to guarantee that once a
29high or low byte is read, the other byte is locked-in until after the
30next read of any register. So to get an atomic reading, read high or low
31byte, then the very next read should be the opposite byte. Our data
32sheet says 10-bits of resolution, although you may find the lower bits
33are active, they are not necessarily reliable or useful externally. We
34chose not to mask them.
35
36We employ significant filtering that is user tunable as described in the
37data sheet. Our temperature reports and fan PWM outputs are very smooth
38when compared to the competition, in addition to the higher resolution
39temperature reports. The smoother PWM output does not require user
40intervention.
41
42We offer GPIO features on the former VID pins. These are open-drain
43outputs or inputs and may be used as general purpose I/O or as alarm
44outputs that are based on temperature limits. These are in 19h and 1Ah.
45
46We offer flexible mapping of temperature readings to thermal zones. Any
47temperature may be mapped to any zone, which has a default assignment
48that follows Intel's specs.
49
50Since there is a fan to zone assignment that allows for the "hotter" of
51a set of zones to control the PWM of an individual fan, but there is no
52indication to the user, we have added an indicator that shows which zone
53is currently controlling the PWM for a given fan. This is in register
5400h.
55
56Both remote diode temperature readings may be given an offset value such
57that the reported reading as well as the temperature used to determine
58PWM may be offset for system calibration purposes.
59
60PECI Extended configuration allows for having more than two domains per
61PECI address and also provides an enabling function for each PECI
62address. One could use our flexible zone assignment to have a zone
63assigned to up to 4 PECI addresses. This is not possible in the default
64Intel configuration. This would be useful in multi-CPU systems with
65individual fans on each that would benefit from individual fan control.
66This is in register 0Eh.
67
68The tachometer measurement system is flexible and able to adapt to many
69fan types. We can also support pulse-stretched PWM so that 3-wire fans
70may be used. These characteristics are in registers 04h to 07h.
71
72Finally, we have added a tach disable function that turns off the tach
73measurement system for individual tachs in order to save power. That is
74in register 75h.
75
76--
77aSC7621 Product Description
78
79The aSC7621 has a two wire digital interface compatible with SMBus 2.0.
80Using a 10-bit ADC, the aSC7621 measures the temperature of two remote diode
81connected transistors as well as its own die. Support for Platform
82Environmental Control Interface (PECI) is included.
83
84Using temperature information from these four zones, an automatic fan speed
85control algorithm is employed to minimize acoustic impact while achieving
86recommended CPU temperature under varying operational loads.
87
88To set fan speed, the aSC7621 has three independent pulse width modulation
89(PWM) outputs that are controlled by one, or a combination of three,
90temperature zones. Both high- and low-frequency PWM ranges are supported.
91
92The aSC7621 also includes a digital filter that can be invoked to smooth
93temperature readings for better control of fan speed and minimum acoustic
94impact.
95
96The aSC7621 has tachometer inputs to measure fan speed on up to four fans.
97Limit and status registers for all measured values are included to alert
98the system host that any measurements are outside of programmed limits
99via status registers.
100
101System voltages of VCCP, 2.5V, 3.3V, 5.0V, and 12V motherboard power are
102monitored efficiently with internal scaling resistors.
103
104Features
105- Supports PECI interface and monitors internal and remote thermal diodes
106- 2-wire, SMBus 2.0 compliant, serial interface
107- 10-bit ADC
108- Monitors VCCP, 2.5V, 3.3V, 5.0V, and 12V motherboard/processor supplies
109- Programmable autonomous fan control based on temperature readings
110- Noise filtering of temperature reading for fan speed control
111- 0.25C digital temperature sensor resolution
112- 3 PWM fan speed control outputs for 2-, 3- or 4-wire fans and up to 4 fan
113 tachometer inputs
114- Enhanced measured temperature to Temperature Zone assignment.
115- Provides high and low PWM frequency ranges
116- 3 GPIO pins for custom use
117- 24-Lead QSOP package
118
119Configuration Notes
120===================
121
122Except where noted below, the sysfs entries created by this driver follow
123the standards defined in "sysfs-interface".
124
125temp1_source
126 0 (default) peci_legacy = 0, Remote 1 Temperature
127 peci_legacy = 1, PECI Processor Temperature 0
128 1 Remote 1 Temperature
129 2 Remote 2 Temperature
130 3 Internal Temperature
131 4 PECI Processor Temperature 0
132 5 PECI Processor Temperature 1
133 6 PECI Processor Temperature 2
134 7 PECI Processor Temperature 3
135
136temp2_source
137 0 (default) Internal Temperature
138 1 Remote 1 Temperature
139 2 Remote 2 Temperature
140 3 Internal Temperature
141 4 PECI Processor Temperature 0
142 5 PECI Processor Temperature 1
143 6 PECI Processor Temperature 2
144 7 PECI Processor Temperature 3
145
146temp3_source
147 0 (default) Remote 2 Temperature
148 1 Remote 1 Temperature
149 2 Remote 2 Temperature
150 3 Internal Temperature
151 4 PECI Processor Temperature 0
152 5 PECI Processor Temperature 1
153 6 PECI Processor Temperature 2
154 7 PECI Processor Temperature 3
155
156temp4_source
157 0 (default) peci_legacy = 0, PECI Processor Temperature 0
158 peci_legacy = 1, Remote 1 Temperature
159 1 Remote 1 Temperature
160 2 Remote 2 Temperature
161 3 Internal Temperature
162 4 PECI Processor Temperature 0
163 5 PECI Processor Temperature 1
164 6 PECI Processor Temperature 2
165 7 PECI Processor Temperature 3
166
167temp[1-4]_smoothing_enable
168temp[1-4]_smoothing_time
169 Smooths spikes in temp readings caused by noise.
170 Valid values in milliseconds are:
171 35000
172 17600
173 11800
174 7000
175 4400
176 3000
177 1600
178 800
179
180temp[1-4]_crit
181 When the corresponding zone temperature reaches this value,
182 ALL pwm outputs will got to 100%.
183
184temp[5-8]_input
185temp[5-8]_enable
186 The aSC7621 can also read temperatures provided by the processor
187 via the PECI bus. Usually these are "core" temps and are relative
188 to the point where the automatic thermal control circuit starts
189 throttling. This means that these are usually negative numbers.
190
191pwm[1-3]_enable
192 0 Fan off.
193 1 Fan on manual control.
194 2 Fan on automatic control and will run at the minimum pwm
195 if the temperature for the zone is below the minimum.
196 3 Fan on automatic control but will be off if the temperature
197 for the zone is below the minimum.
198 4-254 Ignored.
199 255 Fan on full.
200
201pwm[1-3]_auto_channels
202 Bitmap as described in sysctl-interface with the following
203 exceptions...
204 Only the following combination of zones (and their corresponding masks)
205 are valid:
206 1
207 2
208 3
209 2,3
210 1,2,3
211 4
212 1,2,3,4
213
214 Special values:
215 0 Disabled.
216 16 Fan on manual control.
217 31 Fan on full.
218
219
220pwm[1-3]_invert
221 When set, inverts the meaning of pwm[1-3].
222 i.e. when pwm = 0, the fan will be on full and
223 when pwm = 255 the fan will be off.
224
225pwm[1-3]_freq
226 PWM frequency in Hz
227 Valid values in Hz are:
228
229 10
230 15
231 23
232 30 (default)
233 38
234 47
235 62
236 94
237 23000
238 24000
239 25000
240 26000
241 27000
242 28000
243 29000
244 30000
245
246 Setting any other value will be ignored.
247
248peci_enable
249 Enables or disables PECI
250
251peci_avg
252 Input filter average time.
253
254 0 0 Sec. (no Smoothing) (default)
255 1 0.25 Sec.
256 2 0.5 Sec.
257 3 1.0 Sec.
258 4 2.0 Sec.
259 5 4.0 Sec.
260 6 8.0 Sec.
261 7 0.0 Sec.
262
263peci_legacy
264
265 0 Standard Mode (default)
266 Remote Diode 1 reading is associated with
267 Temperature Zone 1, PECI is associated with
268 Zone 4
269
270 1 Legacy Mode
271 PECI is associated with Temperature Zone 1,
272 Remote Diode 1 is associated with Zone 4
273
274peci_diode
275 Diode filter
276
277 0 0.25 Sec.
278 1 1.1 Sec.
279 2 2.4 Sec. (default)
280 3 3.4 Sec.
281 4 5.0 Sec.
282 5 6.8 Sec.
283 6 10.2 Sec.
284 7 16.4 Sec.
285
286peci_4domain
287 Four domain enable
288
289 0 1 or 2 Domains for enabled processors (default)
290 1 3 or 4 Domains for enabled processors
291
292peci_domain
293 Domain
294
295 0 Processor contains a single domain (0) (default)
296 1 Processor contains two domains (0,1)
diff --git a/Documentation/hwmon/it87 b/Documentation/hwmon/it87
index f9ba96c0ac4a..8d08bf0d38ed 100644
--- a/Documentation/hwmon/it87
+++ b/Documentation/hwmon/it87
@@ -5,31 +5,23 @@ Supported chips:
5 * IT8705F 5 * IT8705F
6 Prefix: 'it87' 6 Prefix: 'it87'
7 Addresses scanned: from Super I/O config space (8 I/O ports) 7 Addresses scanned: from Super I/O config space (8 I/O ports)
8 Datasheet: Publicly available at the ITE website 8 Datasheet: Once publicly available at the ITE website, but no longer
9 http://www.ite.com.tw/product_info/file/pc/IT8705F_V.0.4.1.pdf
10 * IT8712F 9 * IT8712F
11 Prefix: 'it8712' 10 Prefix: 'it8712'
12 Addresses scanned: from Super I/O config space (8 I/O ports) 11 Addresses scanned: from Super I/O config space (8 I/O ports)
13 Datasheet: Publicly available at the ITE website 12 Datasheet: Once publicly available at the ITE website, but no longer
14 http://www.ite.com.tw/product_info/file/pc/IT8712F_V0.9.1.pdf
15 http://www.ite.com.tw/product_info/file/pc/Errata%20V0.1%20for%20IT8712F%20V0.9.1.pdf
16 http://www.ite.com.tw/product_info/file/pc/IT8712F_V0.9.3.pdf
17 * IT8716F/IT8726F 13 * IT8716F/IT8726F
18 Prefix: 'it8716' 14 Prefix: 'it8716'
19 Addresses scanned: from Super I/O config space (8 I/O ports) 15 Addresses scanned: from Super I/O config space (8 I/O ports)
20 Datasheet: Publicly available at the ITE website 16 Datasheet: Once publicly available at the ITE website, but no longer
21 http://www.ite.com.tw/product_info/file/pc/IT8716F_V0.3.ZIP
22 http://www.ite.com.tw/product_info/file/pc/IT8726F_V0.3.pdf
23 * IT8718F 17 * IT8718F
24 Prefix: 'it8718' 18 Prefix: 'it8718'
25 Addresses scanned: from Super I/O config space (8 I/O ports) 19 Addresses scanned: from Super I/O config space (8 I/O ports)
26 Datasheet: Publicly available at the ITE website 20 Datasheet: Once publicly available at the ITE website, but no longer
27 http://www.ite.com.tw/product_info/file/pc/IT8718F_V0.2.zip
28 http://www.ite.com.tw/product_info/file/pc/IT8718F_V0%203_(for%20C%20version).zip
29 * IT8720F 21 * IT8720F
30 Prefix: 'it8720' 22 Prefix: 'it8720'
31 Addresses scanned: from Super I/O config space (8 I/O ports) 23 Addresses scanned: from Super I/O config space (8 I/O ports)
32 Datasheet: Not yet publicly available. 24 Datasheet: Not publicly available
33 * SiS950 [clone of IT8705F] 25 * SiS950 [clone of IT8705F]
34 Prefix: 'it87' 26 Prefix: 'it87'
35 Addresses scanned: from Super I/O config space (8 I/O ports) 27 Addresses scanned: from Super I/O config space (8 I/O ports)
@@ -136,6 +128,10 @@ registers are read whenever any data is read (unless it is less than 1.5
136seconds since the last update). This means that you can easily miss 128seconds since the last update). This means that you can easily miss
137once-only alarms. 129once-only alarms.
138 130
131Out-of-limit readings can also result in beeping, if the chip is properly
132wired and configured. Beeping can be enabled or disabled per sensor type
133(temperatures, voltages and fans.)
134
139The IT87xx only updates its values each 1.5 seconds; reading it more often 135The IT87xx only updates its values each 1.5 seconds; reading it more often
140will do no harm, but will return 'old' values. 136will do no harm, but will return 'old' values.
141 137
@@ -150,11 +146,38 @@ Fan speed control
150----------------- 146-----------------
151 147
152The fan speed control features are limited to manual PWM mode. Automatic 148The fan speed control features are limited to manual PWM mode. Automatic
153"Smart Guardian" mode control handling is not implemented. However 149"Smart Guardian" mode control handling is only implemented for older chips
154if you want to go for "manual mode" just write 1 to pwmN_enable. 150(see below.) However if you want to go for "manual mode" just write 1 to
151pwmN_enable.
155 152
156If you are only able to control the fan speed with very small PWM values, 153If you are only able to control the fan speed with very small PWM values,
157try lowering the PWM base frequency (pwm1_freq). Depending on the fan, 154try lowering the PWM base frequency (pwm1_freq). Depending on the fan,
158it may give you a somewhat greater control range. The same frequency is 155it may give you a somewhat greater control range. The same frequency is
159used to drive all fan outputs, which is why pwm2_freq and pwm3_freq are 156used to drive all fan outputs, which is why pwm2_freq and pwm3_freq are
160read-only. 157read-only.
158
159
160Automatic fan speed control (old interface)
161-------------------------------------------
162
163The driver supports the old interface to automatic fan speed control
164which is implemented by IT8705F chips up to revision F and IT8712F
165chips up to revision G.
166
167This interface implements 4 temperature vs. PWM output trip points.
168The PWM output of trip point 4 is always the maximum value (fan running
169at full speed) while the PWM output of the other 3 trip points can be
170freely chosen. The temperature of all 4 trip points can be freely chosen.
171Additionally, trip point 1 has an hysteresis temperature attached, to
172prevent fast switching between fan on and off.
173
174The chip automatically computes the PWM output value based on the input
175temperature, based on this simple rule: if the temperature value is
176between trip point N and trip point N+1 then the PWM output value is
177the one of trip point N. The automatic control mode is less flexible
178than the manual control mode, but it reacts faster, is more robust and
179doesn't use CPU cycles.
180
181Trip points must be set properly before switching to automatic fan speed
182control mode. The driver will perform basic integrity checks before
183actually switching to automatic control mode.
diff --git a/Documentation/hwmon/lm90 b/Documentation/hwmon/lm90
index 93d8e3d55150..6a03dd4bcc94 100644
--- a/Documentation/hwmon/lm90
+++ b/Documentation/hwmon/lm90
@@ -84,6 +84,10 @@ Supported chips:
84 Addresses scanned: I2C 0x4c 84 Addresses scanned: I2C 0x4c
85 Datasheet: Publicly available at the Maxim website 85 Datasheet: Publicly available at the Maxim website
86 http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3500 86 http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3500
87 * Winbond/Nuvoton W83L771AWG/ASG
88 Prefix: 'w83l771'
89 Addresses scanned: I2C 0x4c
90 Datasheet: Not publicly available, can be requested from Nuvoton
87 91
88 92
89Author: Jean Delvare <khali@linux-fr.org> 93Author: Jean Delvare <khali@linux-fr.org>
@@ -147,6 +151,12 @@ MAX6680 and MAX6681:
147 * Selectable address 151 * Selectable address
148 * Remote sensor type selection 152 * Remote sensor type selection
149 153
154W83L771AWG/ASG
155 * The AWG and ASG variants only differ in package format.
156 * Filter and alert configuration register at 0xBF
157 * Diode ideality factor configuration (remote sensor) at 0xE3
158 * Moving average (depending on conversion rate)
159
150All temperature values are given in degrees Celsius. Resolution 160All temperature values are given in degrees Celsius. Resolution
151is 1.0 degree for the local temperature, 0.125 degree for the remote 161is 1.0 degree for the local temperature, 0.125 degree for the remote
152temperature, except for the MAX6657, MAX6658 and MAX6659 which have a 162temperature, except for the MAX6657, MAX6658 and MAX6659 which have a
@@ -163,6 +173,18 @@ The lm90 driver will not update its values more frequently than every
163other second; reading them more often will do no harm, but will return 173other second; reading them more often will do no harm, but will return
164'old' values. 174'old' values.
165 175
176SMBus Alert Support
177-------------------
178
179This driver has basic support for SMBus alert. When an alert is received,
180the status register is read and the faulty temperature channel is logged.
181
182The Analog Devices chips (ADM1032 and ADT7461) do not implement the SMBus
183alert protocol properly so additional care is needed: the ALERT output is
184disabled when an alert is received, and is re-enabled only when the alarm
185is gone. Otherwise the chip would block alerts from other chips in the bus
186as long as the alarm is active.
187
166PEC Support 188PEC Support
167----------- 189-----------
168 190
diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801
index 81c0c59a60ea..e1bb5b261693 100644
--- a/Documentation/i2c/busses/i2c-i801
+++ b/Documentation/i2c/busses/i2c-i801
@@ -15,7 +15,8 @@ Supported adapters:
15 * Intel 82801I (ICH9) 15 * Intel 82801I (ICH9)
16 * Intel EP80579 (Tolapai) 16 * Intel EP80579 (Tolapai)
17 * Intel 82801JI (ICH10) 17 * Intel 82801JI (ICH10)
18 * Intel PCH 18 * Intel 3400/5 Series (PCH)
19 * Intel Cougar Point (PCH)
19 Datasheets: Publicly available at the Intel website 20 Datasheets: Publicly available at the Intel website
20 21
21Authors: 22Authors:
diff --git a/Documentation/i2c/busses/i2c-parport b/Documentation/i2c/busses/i2c-parport
index dceaba1ad930..2461c7b53b2c 100644
--- a/Documentation/i2c/busses/i2c-parport
+++ b/Documentation/i2c/busses/i2c-parport
@@ -29,6 +29,9 @@ can be easily added when needed.
29Earlier kernels defaulted to type=0 (Philips). But now, if the type 29Earlier kernels defaulted to type=0 (Philips). But now, if the type
30parameter is missing, the driver will simply fail to initialize. 30parameter is missing, the driver will simply fail to initialize.
31 31
32SMBus alert support is available on adapters which have this line properly
33connected to the parallel port's interrupt pin.
34
32 35
33Building your own adapter 36Building your own adapter
34------------------------- 37-------------------------
diff --git a/Documentation/i2c/busses/i2c-parport-light b/Documentation/i2c/busses/i2c-parport-light
index 287436478520..bdc9cbb2e0f2 100644
--- a/Documentation/i2c/busses/i2c-parport-light
+++ b/Documentation/i2c/busses/i2c-parport-light
@@ -9,3 +9,14 @@ parport handling is not an option. The drawback is a reduced portability
9and the impossibility to daisy-chain other parallel port devices. 9and the impossibility to daisy-chain other parallel port devices.
10 10
11Please see i2c-parport for documentation. 11Please see i2c-parport for documentation.
12
13Module parameters:
14
15* type: type of adapter (see i2c-parport or modinfo)
16
17* base: base I/O address
18 Default is 0x378 which is fairly common for parallel ports, at least on PC.
19
20* irq: optional IRQ
21 This must be passed if you want SMBus alert support, assuming your adapter
22 actually supports this.
diff --git a/Documentation/i2c/smbus-protocol b/Documentation/i2c/smbus-protocol
index 9df47441f0e7..7c19d1a2bea0 100644
--- a/Documentation/i2c/smbus-protocol
+++ b/Documentation/i2c/smbus-protocol
@@ -185,6 +185,22 @@ the protocol. All ARP communications use slave address 0x61 and
185require PEC checksums. 185require PEC checksums.
186 186
187 187
188SMBus Alert
189===========
190
191SMBus Alert was introduced in Revision 1.0 of the specification.
192
193The SMBus alert protocol allows several SMBus slave devices to share a
194single interrupt pin on the SMBus master, while still allowing the master
195to know which slave triggered the interrupt.
196
197This is implemented the following way in the Linux kernel:
198* I2C bus drivers which support SMBus alert should call
199 i2c_setup_smbus_alert() to setup SMBus alert support.
200* I2C drivers for devices which can trigger SMBus alerts should implement
201 the optional alert() callback.
202
203
188I2C Block Transactions 204I2C Block Transactions
189====================== 205======================
190 206
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index 0a74603eb671..3219ee0dbfef 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -318,8 +318,9 @@ Plain I2C communication
318These routines read and write some bytes from/to a client. The client 318These routines read and write some bytes from/to a client. The client
319contains the i2c address, so you do not have to include it. The second 319contains the i2c address, so you do not have to include it. The second
320parameter contains the bytes to read/write, the third the number of bytes 320parameter contains the bytes to read/write, the third the number of bytes
321to read/write (must be less than the length of the buffer.) Returned is 321to read/write (must be less than the length of the buffer, also should be
322the actual number of bytes read/written. 322less than 64k since msg.len is u16.) Returned is the actual number of bytes
323read/written.
323 324
324 int i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msg, 325 int i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msg,
325 int num); 326 int num);
diff --git a/Documentation/init.txt b/Documentation/init.txt
new file mode 100644
index 000000000000..535ad5e82b98
--- /dev/null
+++ b/Documentation/init.txt
@@ -0,0 +1,49 @@
1Explaining the dreaded "No init found." boot hang message
2=========================================================
3
4OK, so you've got this pretty unintuitive message (currently located
5in init/main.c) and are wondering what the H*** went wrong.
6Some high-level reasons for failure (listed roughly in order of execution)
7to load the init binary are:
8A) Unable to mount root FS
9B) init binary doesn't exist on rootfs
10C) broken console device
11D) binary exists but dependencies not available
12E) binary cannot be loaded
13
14Detailed explanations:
150) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE)
16 to get more detailed kernel messages.
17A) make sure you have the correct root FS type
18 (and root= kernel parameter points to the correct partition),
19 required drivers such as storage hardware (such as SCSI or USB!)
20 and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules,
21 to be pre-loaded by an initrd)
22C) Possibly a conflict in console= setup --> initial console unavailable.
23 E.g. some serial consoles are unreliable due to serial IRQ issues (e.g.
24 missing interrupt-based configuration).
25 Try using a different console= device or e.g. netconsole= .
26D) e.g. required library dependencies of the init binary such as
27 /lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED
28 to find out which libraries are required.
29E) make sure the binary's architecture matches your hardware.
30 E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware.
31 In case you tried loading a non-binary file here (shell script?),
32 you should make sure that the script specifies an interpreter in its shebang
33 header line (#!/...) that is fully working (including its library
34 dependencies). And before tackling scripts, better first test a simple
35 non-script binary such as /bin/sh and confirm its successful execution.
36 To find out more, add code to init/main.c to display kernel_execve()s
37 return values.
38
39Please extend this explanation whenever you find new failure causes
40(after all loading the init binary is a CRITICAL and hard transition step
41which needs to be made as painless as possible), then submit patch to LKML.
42Further TODOs:
43- Implement the various run_init_process() invocations via a struct array
44 which can then store the kernel_execve() result value and on failure
45 log it all by iterating over _all_ results (very important usability fix).
46- try to make the implementation itself more helpful in general,
47 e.g. by providing additional error messages at affected places.
48
49Andreas Mohr <andi at lisas period de>
diff --git a/Documentation/input/rotary-encoder.txt b/Documentation/input/rotary-encoder.txt
index 3a6aec40c0b0..8b4129de1d2d 100644
--- a/Documentation/input/rotary-encoder.txt
+++ b/Documentation/input/rotary-encoder.txt
@@ -75,7 +75,7 @@ and the number of steps or will clamp at the maximum and zero depending on
75the configuration. 75the configuration.
76 76
77Because GPIO to IRQ mapping is platform specific, this information must 77Because GPIO to IRQ mapping is platform specific, this information must
78be given in seperately to the driver. See the example below. 78be given in separately to the driver. See the example below.
79 79
80---------<snip>--------- 80---------<snip>---------
81 81
diff --git a/Documentation/input/sentelic.txt b/Documentation/input/sentelic.txt
index f7160a2fb6a2..b35affd5c649 100644
--- a/Documentation/input/sentelic.txt
+++ b/Documentation/input/sentelic.txt
@@ -1,5 +1,5 @@
1Copyright (C) 2002-2008 Sentelic Corporation. 1Copyright (C) 2002-2010 Sentelic Corporation.
2Last update: Oct-31-2008 2Last update: Jan-13-2010
3 3
4============================================================================== 4==============================================================================
5* Finger Sensing Pad Intellimouse Mode(scrolling wheel, 4th and 5th buttons) 5* Finger Sensing Pad Intellimouse Mode(scrolling wheel, 4th and 5th buttons)
@@ -44,7 +44,7 @@ B) MSID 6: Horizontal and Vertical scrolling.
44Packet 1 44Packet 1
45 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 45 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
46BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| 46BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
47 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|l|r|u|d| 47 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|r|l|u|d|
48 |---------------| |---------------| |---------------| |---------------| 48 |---------------| |---------------| |---------------| |---------------|
49 49
50Byte 1: Bit7 => Y overflow 50Byte 1: Bit7 => Y overflow
@@ -59,15 +59,15 @@ Byte 2: X Movement(9-bit 2's complement integers)
59Byte 3: Y Movement(9-bit 2's complement integers) 59Byte 3: Y Movement(9-bit 2's complement integers)
60Byte 4: Bit0 => the Vertical scrolling movement downward. 60Byte 4: Bit0 => the Vertical scrolling movement downward.
61 Bit1 => the Vertical scrolling movement upward. 61 Bit1 => the Vertical scrolling movement upward.
62 Bit2 => the Vertical scrolling movement rightward. 62 Bit2 => the Horizontal scrolling movement leftward.
63 Bit3 => the Vertical scrolling movement leftward. 63 Bit3 => the Horizontal scrolling movement rightward.
64 Bit4 => 1 = 4th mouse button is pressed, Forward one page. 64 Bit4 => 1 = 4th mouse button is pressed, Forward one page.
65 0 = 4th mouse button is not pressed. 65 0 = 4th mouse button is not pressed.
66 Bit5 => 1 = 5th mouse button is pressed, Backward one page. 66 Bit5 => 1 = 5th mouse button is pressed, Backward one page.
67 0 = 5th mouse button is not pressed. 67 0 = 5th mouse button is not pressed.
68 68
69C) MSID 7: 69C) MSID 7:
70# FSP uses 2 packets(8 Bytes) data to represent Absolute Position 70# FSP uses 2 packets (8 Bytes) to represent Absolute Position.
71 so we have PACKET NUMBER to identify packets. 71 so we have PACKET NUMBER to identify packets.
72 If PACKET NUMBER is 0, the packet is Packet 1. 72 If PACKET NUMBER is 0, the packet is Packet 1.
73 If PACKET NUMBER is 1, the packet is Packet 2. 73 If PACKET NUMBER is 1, the packet is Packet 2.
@@ -129,7 +129,7 @@ Byte 3: Message Type => 0x00 (Disabled)
129Byte 4: Bit7~Bit0 => Don't Care 129Byte 4: Bit7~Bit0 => Don't Care
130 130
131============================================================================== 131==============================================================================
132* Absolute position for STL3888-A0. 132* Absolute position for STL3888-Ax.
133============================================================================== 133==============================================================================
134Packet 1 (ABSOLUTE POSITION) 134Packet 1 (ABSOLUTE POSITION)
135 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 135 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
@@ -179,14 +179,14 @@ Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0])
179 Bit5~Bit4 => y2_g 179 Bit5~Bit4 => y2_g
180 Bit7~Bit6 => x2_g 180 Bit7~Bit6 => x2_g
181 181
182Notify Packet for STL3888-A0 182Notify Packet for STL3888-Ax
183 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 183 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
184BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| 184BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
185 1 |1|0|1|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|d|u|0|0|0|0| 185 1 |1|0|1|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|d|u|0|0|0|0|
186 |---------------| |---------------| |---------------| |---------------| 186 |---------------| |---------------| |---------------| |---------------|
187 187
188Byte 1: Bit7~Bit6 => 00, Normal data packet 188Byte 1: Bit7~Bit6 => 00, Normal data packet
189 => 01, Absolute coordination packet 189 => 01, Absolute coordinates packet
190 => 10, Notify packet 190 => 10, Notify packet
191 Bit5 => 1 191 Bit5 => 1
192 Bit4 => when in absolute coordinates mode (valid when EN_PKT_GO is 1): 192 Bit4 => when in absolute coordinates mode (valid when EN_PKT_GO is 1):
@@ -205,15 +205,106 @@ Byte 4: Bit7 => scroll right button
205 Bit6 => scroll left button 205 Bit6 => scroll left button
206 Bit5 => scroll down button 206 Bit5 => scroll down button
207 Bit4 => scroll up button 207 Bit4 => scroll up button
208 * Note that if gesture and additional button (Bit4~Bit7) 208 * Note that if gesture and additional buttoni (Bit4~Bit7)
209 happen at the same time, the button information will not 209 happen at the same time, the button information will not
210 be sent. 210 be sent.
211 Bit3~Bit0 => Reserved
212
213Sample sequence of Multi-finger, Multi-coordinate mode:
214
215 notify packet (valid bit == 1), abs pkt 1, abs pkt 2, abs pkt 1,
216 abs pkt 2, ..., notify packet (valid bit == 0)
217
218==============================================================================
219* Absolute position for STL3888-B0.
220==============================================================================
221Packet 1(ABSOLUTE POSITION)
222 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
223BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
224 1 |0|1|V|F|1|0|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |r|l|u|d|X|X|Y|Y|
225 |---------------| |---------------| |---------------| |---------------|
226
227Byte 1: Bit7~Bit6 => 00, Normal data packet
228 => 01, Absolute coordinates packet
229 => 10, Notify packet
230 Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up.
231 When both fingers are up, the last two reports have zero valid
232 bit.
233 Bit4 => finger up/down information. 1: finger down, 0: finger up.
234 Bit3 => 1
235 Bit2 => finger index, 0 is the first finger, 1 is the second finger.
236 Bit1 => Right Button, 1 is pressed, 0 is not pressed.
237 Bit0 => Left Button, 1 is pressed, 0 is not pressed.
238Byte 2: X coordinate (xpos[9:2])
239Byte 3: Y coordinate (ypos[9:2])
240Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0])
241 Bit3~Bit2 => X coordinate (ypos[1:0])
242 Bit4 => scroll down button
243 Bit5 => scroll up button
244 Bit6 => scroll left button
245 Bit7 => scroll right button
246
247Packet 2 (ABSOLUTE POSITION)
248 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
249BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
250 1 |0|1|V|F|1|1|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |r|l|u|d|X|X|Y|Y|
251 |---------------| |---------------| |---------------| |---------------|
252
253Byte 1: Bit7~Bit6 => 00, Normal data packet
254 => 01, Absolute coordination packet
255 => 10, Notify packet
256 Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up.
257 When both fingers are up, the last two reports have zero valid
258 bit.
259 Bit4 => finger up/down information. 1: finger down, 0: finger up.
260 Bit3 => 1
261 Bit2 => finger index, 0 is the first finger, 1 is the second finger.
262 Bit1 => Right Button, 1 is pressed, 0 is not pressed.
263 Bit0 => Left Button, 1 is pressed, 0 is not pressed.
264Byte 2: X coordinate (xpos[9:2])
265Byte 3: Y coordinate (ypos[9:2])
266Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0])
267 Bit3~Bit2 => X coordinate (ypos[1:0])
268 Bit4 => scroll down button
269 Bit5 => scroll up button
270 Bit6 => scroll left button
271 Bit7 => scroll right button
272
273Notify Packet for STL3888-B0
274 Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
275BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------|
276 1 |1|0|1|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|u|d|0|0|0|0|
277 |---------------| |---------------| |---------------| |---------------|
278
279Byte 1: Bit7~Bit6 => 00, Normal data packet
280 => 01, Absolute coordination packet
281 => 10, Notify packet
282 Bit5 => 1
283 Bit4 => when in absolute coordinate mode (valid when EN_PKT_GO is 1):
284 0: left button is generated by the on-pad command
285 1: left button is generated by the external button
286 Bit3 => 1
287 Bit2 => Middle Button, 1 is pressed, 0 is not pressed.
288 Bit1 => Right Button, 1 is pressed, 0 is not pressed.
289 Bit0 => Left Button, 1 is pressed, 0 is not pressed.
290Byte 2: Message Type => 0xB7 (Multi Finger, Multi Coordinate mode)
291Byte 3: Bit7~Bit6 => Don't care
292 Bit5~Bit4 => Number of fingers
293 Bit3~Bit1 => Reserved
294 Bit0 => 1: enter gesture mode; 0: leaving gesture mode
295Byte 4: Bit7 => scroll right button
296 Bit6 => scroll left button
297 Bit5 => scroll up button
298 Bit4 => scroll down button
299 * Note that if gesture and additional button(Bit4~Bit7)
300 happen at the same time, the button information will not
301 be sent.
211 Bit3~Bit0 => Reserved 302 Bit3~Bit0 => Reserved
212 303
213Sample sequence of Multi-finger, Multi-coordinate mode: 304Sample sequence of Multi-finger, Multi-coordinate mode:
214 305
215 notify packet (valid bit == 1), abs pkt 1, abs pkt 2, abs pkt 1, 306 notify packet (valid bit == 1), abs pkt 1, abs pkt 2, abs pkt 1,
216 abs pkt 2, ..., notify packet(valid bit == 0) 307 abs pkt 2, ..., notify packet (valid bit == 0)
217 308
218============================================================================== 309==============================================================================
219* FSP Enable/Disable packet 310* FSP Enable/Disable packet
@@ -409,7 +500,8 @@ offset width default r/w name
409 0: read only, 1: read/write enable 500 0: read only, 1: read/write enable
410 (Note that following registers does not require clock gating being 501 (Note that following registers does not require clock gating being
411 enabled prior to write: 05 06 07 08 09 0c 0f 10 11 12 16 17 18 23 2e 502 enabled prior to write: 05 06 07 08 09 0c 0f 10 11 12 16 17 18 23 2e
412 40 41 42 43.) 503 40 41 42 43. In addition to that, this bit must be 1 when gesture
504 mode is enabled)
413 505
4140x31 RW on-pad command detection 5060x31 RW on-pad command detection
415 bit7 0 RW on-pad command left button down tag 507 bit7 0 RW on-pad command left button down tag
@@ -463,6 +555,10 @@ offset width default r/w name
463 absolute coordinates; otherwise, host only receives packets with 555 absolute coordinates; otherwise, host only receives packets with
464 relative coordinate.) 556 relative coordinate.)
465 557
558 bit7 0 RW EN_PS2_F2: PS/2 gesture mode 2nd
559 finger packet enable
560 0: disable, 1: enable
561
4660x43 RW on-pad control 5620x43 RW on-pad control
467 bit0 0 RW on-pad control enable 563 bit0 0 RW on-pad control enable
468 0: disable, 1: enable 564 0: disable, 1: enable
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 35cf64d4436d..35c9b51d20ea 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -139,7 +139,6 @@ Code Seq#(hex) Include File Comments
139'K' all linux/kd.h 139'K' all linux/kd.h
140'L' 00-1F linux/loop.h conflict! 140'L' 00-1F linux/loop.h conflict!
141'L' 10-1F drivers/scsi/mpt2sas/mpt2sas_ctl.h conflict! 141'L' 10-1F drivers/scsi/mpt2sas/mpt2sas_ctl.h conflict!
142'L' 20-2F linux/usb/vstusb.h
143'L' E0-FF linux/ppdd.h encrypted disk device driver 142'L' E0-FF linux/ppdd.h encrypted disk device driver
144 <http://linux01.gwdg.de/~alatham/ppdd.html> 143 <http://linux01.gwdg.de/~alatham/ppdd.html>
145'M' all linux/soundcard.h conflict! 144'M' all linux/soundcard.h conflict!
diff --git a/Documentation/isdn/INTERFACE.CAPI b/Documentation/isdn/INTERFACE.CAPI
index 5fe8de5cc727..f172091fb7cd 100644
--- a/Documentation/isdn/INTERFACE.CAPI
+++ b/Documentation/isdn/INTERFACE.CAPI
@@ -149,10 +149,11 @@ char *(*procinfo)(struct capi_ctr *ctrlr)
149 pointer to a callback function returning the entry for the device in 149 pointer to a callback function returning the entry for the device in
150 the CAPI controller info table, /proc/capi/controller 150 the CAPI controller info table, /proc/capi/controller
151 151
152read_proc_t *ctr_read_proc 152const struct file_operations *proc_fops
153 pointer to the read_proc callback function for the device's proc file 153 pointers to callback functions for the device's proc file
154 system entry, /proc/capi/controllers/<n>; will be called with a 154 system entry, /proc/capi/controllers/<n>; pointer to the device's
155 pointer to the device's capi_ctr structure as the last (data) argument 155 capi_ctr structure is available from struct proc_dir_entry::data
156 which is available from struct inode.
156 157
157Note: Callback functions except send_message() are never called in interrupt 158Note: Callback functions except send_message() are never called in interrupt
158context. 159context.
diff --git a/Documentation/isdn/README.gigaset b/Documentation/isdn/README.gigaset
index 794941fc9493..e472df842323 100644
--- a/Documentation/isdn/README.gigaset
+++ b/Documentation/isdn/README.gigaset
@@ -292,10 +292,10 @@ GigaSet 307x Device Driver
292 to /etc/modprobe.d/gigaset, /etc/modprobe.conf.local or a similar file. 292 to /etc/modprobe.d/gigaset, /etc/modprobe.conf.local or a similar file.
293 293
294 Problem: 294 Problem:
295 Your isdn script aborts with a message about isdnlog. 295 The isdnlog program emits error messages or just doesn't work.
296 Solution: 296 Solution:
297 Try deactivating (or commenting out) isdnlog. This driver does not 297 Isdnlog supports only the HiSax driver. Do not attempt to use it with
298 support it. 298 other drivers such as Gigaset.
299 299
300 Problem: 300 Problem:
301 You have two or more DECT data adapters (M101/M105) and only the 301 You have two or more DECT data adapters (M101/M105) and only the
@@ -321,8 +321,8 @@ GigaSet 307x Device Driver
321 writing an appropriate value to /sys/module/gigaset/parameters/debug, e.g. 321 writing an appropriate value to /sys/module/gigaset/parameters/debug, e.g.
322 echo 0 > /sys/module/gigaset/parameters/debug 322 echo 0 > /sys/module/gigaset/parameters/debug
323 switches off debugging output completely, 323 switches off debugging output completely,
324 echo 0x10a020 > /sys/module/gigaset/parameters/debug 324 echo 0x302020 > /sys/module/gigaset/parameters/debug
325 enables the standard set of debugging output messages. These values are 325 enables a reasonable set of debugging output messages. These values are
326 bit patterns where every bit controls a certain type of debugging output. 326 bit patterns where every bit controls a certain type of debugging output.
327 See the constants DEBUG_* in the source file gigaset.h for details. 327 See the constants DEBUG_* in the source file gigaset.h for details.
328 328
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 826b6e148316..e4cbca58536c 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -54,6 +54,7 @@ parameter is applicable:
54 IMA Integrity measurement architecture is enabled. 54 IMA Integrity measurement architecture is enabled.
55 IOSCHED More than one I/O scheduler is enabled. 55 IOSCHED More than one I/O scheduler is enabled.
56 IP_PNP IP DHCP, BOOTP, or RARP is enabled. 56 IP_PNP IP DHCP, BOOTP, or RARP is enabled.
57 IPV6 IPv6 support is enabled.
57 ISAPNP ISA PnP code is enabled. 58 ISAPNP ISA PnP code is enabled.
58 ISDN Appropriate ISDN support is enabled. 59 ISDN Appropriate ISDN support is enabled.
59 JOY Appropriate joystick support is enabled. 60 JOY Appropriate joystick support is enabled.
@@ -199,10 +200,6 @@ and is between 256 and 4096 characters. It is defined in the file
199 acpi_display_output=video 200 acpi_display_output=video
200 See above. 201 See above.
201 202
202 acpi_early_pdc_eval [HW,ACPI] Evaluate processor _PDC methods
203 early. Needed on some platforms to properly
204 initialize the EC.
205
206 acpi_irq_balance [HW,ACPI] 203 acpi_irq_balance [HW,ACPI]
207 ACPI will balance active IRQs 204 ACPI will balance active IRQs
208 default in APIC mode 205 default in APIC mode
@@ -315,6 +312,11 @@ and is between 256 and 4096 characters. It is defined in the file
315 aic79xx= [HW,SCSI] 312 aic79xx= [HW,SCSI]
316 See Documentation/scsi/aic79xx.txt. 313 See Documentation/scsi/aic79xx.txt.
317 314
315 alignment= [KNL,ARM]
316 Allow the default userspace alignment fault handler
317 behaviour to be specified. Bit 0 enables warnings,
318 bit 1 enables fixups, and bit 2 sends a segfault.
319
318 amd_iommu= [HW,X86-84] 320 amd_iommu= [HW,X86-84]
319 Pass parameters to the AMD IOMMU driver in the system. 321 Pass parameters to the AMD IOMMU driver in the system.
320 Possible values are: 322 Possible values are:
@@ -351,6 +353,9 @@ and is between 256 and 4096 characters. It is defined in the file
351 Change the amount of debugging information output 353 Change the amount of debugging information output
352 when initialising the APIC and IO-APIC components. 354 when initialising the APIC and IO-APIC components.
353 355
356 autoconf= [IPV6]
357 See Documentation/networking/ipv6.txt.
358
354 show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller 359 show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller
355 Limit apic dumping. The parameter defines the maximal 360 Limit apic dumping. The parameter defines the maximal
356 number of local apics being dumped. Also it is possible 361 number of local apics being dumped. Also it is possible
@@ -633,6 +638,12 @@ and is between 256 and 4096 characters. It is defined in the file
633 See drivers/char/README.epca and 638 See drivers/char/README.epca and
634 Documentation/serial/digiepca.txt. 639 Documentation/serial/digiepca.txt.
635 640
641 disable= [IPV6]
642 See Documentation/networking/ipv6.txt.
643
644 disable_ipv6= [IPV6]
645 See Documentation/networking/ipv6.txt.
646
636 disable_mtrr_cleanup [X86] 647 disable_mtrr_cleanup [X86]
637 The kernel tries to adjust MTRR layout from continuous 648 The kernel tries to adjust MTRR layout from continuous
638 to discrete, to make X server driver able to add WB 649 to discrete, to make X server driver able to add WB
@@ -1733,6 +1744,9 @@ and is between 256 and 4096 characters. It is defined in the file
1733 nomfgpt [X86-32] Disable Multi-Function General Purpose 1744 nomfgpt [X86-32] Disable Multi-Function General Purpose
1734 Timer usage (for AMD Geode machines). 1745 Timer usage (for AMD Geode machines).
1735 1746
1747 nopat [X86] Disable PAT (page attribute table extension of
1748 pagetables) support.
1749
1736 norandmaps Don't use address space randomization. Equivalent to 1750 norandmaps Don't use address space randomization. Equivalent to
1737 echo 0 > /proc/sys/kernel/randomize_va_space 1751 echo 0 > /proc/sys/kernel/randomize_va_space
1738 1752
@@ -1776,6 +1790,12 @@ and is between 256 and 4096 characters. It is defined in the file
1776 purges which is reported from either PAL_VM_SUMMARY or 1790 purges which is reported from either PAL_VM_SUMMARY or
1777 SAL PALO. 1791 SAL PALO.
1778 1792
1793 nr_cpus= [SMP] Maximum number of processors that an SMP kernel
1794 could support. nr_cpus=n : n >= 1 limits the kernel to
1795 supporting 'n' processors. Later in runtime you can not
1796 use hotplug cpu feature to put more cpu back to online.
1797 just like you compile the kernel NR_CPUS=n
1798
1779 nr_uarts= [SERIAL] maximum number of UARTs to be registered. 1799 nr_uarts= [SERIAL] maximum number of UARTs to be registered.
1780 1800
1781 numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA. 1801 numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA.
@@ -1943,8 +1963,12 @@ and is between 256 and 4096 characters. It is defined in the file
1943 IRQ routing is enabled. 1963 IRQ routing is enabled.
1944 noacpi [X86] Do not use ACPI for IRQ routing 1964 noacpi [X86] Do not use ACPI for IRQ routing
1945 or for PCI scanning. 1965 or for PCI scanning.
1946 use_crs [X86] Use _CRS for PCI resource 1966 use_crs [X86] Use PCI host bridge window information
1947 allocation. 1967 from ACPI. On BIOSes from 2008 or later, this
1968 is enabled by default. If you need to use this,
1969 please report a bug.
1970 nocrs [X86] Ignore PCI host bridge windows from ACPI.
1971 If you need to use this, please report a bug.
1948 routeirq Do IRQ routing for all PCI devices. 1972 routeirq Do IRQ routing for all PCI devices.
1949 This is normally done in pci_enable_device(), 1973 This is normally done in pci_enable_device(),
1950 so this option is a temporary workaround 1974 so this option is a temporary workaround
@@ -1993,6 +2017,14 @@ and is between 256 and 4096 characters. It is defined in the file
1993 force Enable ASPM even on devices that claim not to support it. 2017 force Enable ASPM even on devices that claim not to support it.
1994 WARNING: Forcing ASPM on may cause system lockups. 2018 WARNING: Forcing ASPM on may cause system lockups.
1995 2019
2020 pcie_pme= [PCIE,PM] Native PCIe PME signaling options:
2021 off Do not use native PCIe PME signaling.
2022 force Use native PCIe PME signaling even if the BIOS refuses
2023 to allow the kernel to control the relevant PCIe config
2024 registers.
2025 nomsi Do not use MSI for native PCIe PME signaling (this makes
2026 all PCIe root ports use INTx for everything).
2027
1996 pcmv= [HW,PCMCIA] BadgePAD 4 2028 pcmv= [HW,PCMCIA] BadgePAD 4
1997 2029
1998 pd. [PARIDE] 2030 pd. [PARIDE]
@@ -2698,6 +2730,13 @@ and is between 256 and 4096 characters. It is defined in the file
2698 medium is write-protected). 2730 medium is write-protected).
2699 Example: quirks=0419:aaf5:rl,0421:0433:rc 2731 Example: quirks=0419:aaf5:rl,0421:0433:rc
2700 2732
2733 userpte=
2734 [X86] Flags controlling user PTE allocations.
2735
2736 nohigh = do not allocate PTE pages in
2737 HIGHMEM regardless of setting
2738 of CONFIG_HIGHPTE.
2739
2701 vdso= [X86,SH] 2740 vdso= [X86,SH]
2702 vdso=2: enable compat VDSO (default with COMPAT_VDSO) 2741 vdso=2: enable compat VDSO (default with COMPAT_VDSO)
2703 vdso=1: enable VDSO (default) 2742 vdso=1: enable VDSO (default)
@@ -2791,6 +2830,12 @@ and is between 256 and 4096 characters. It is defined in the file
2791 default x2apic cluster mode on platforms 2830 default x2apic cluster mode on platforms
2792 supporting x2apic. 2831 supporting x2apic.
2793 2832
2833 x86_mrst_timer= [X86-32,APBT]
2834 Choose timer option for x86 Moorestown MID platform.
2835 Two valid options are apbt timer only and lapic timer
2836 plus one apbt timer for broadcast timer.
2837 x86_mrst_timer=apbt_only | lapic_and_apbt
2838
2794 xd= [HW,XT] Original XT pre-IDE (RLL encoded) disks. 2839 xd= [HW,XT] Original XT pre-IDE (RLL encoded) disks.
2795 xd_geo= See header of drivers/block/xd.c. 2840 xd_geo= See header of drivers/block/xd.c.
2796 2841
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt
index c79ab996dada..bdb13817e1e9 100644
--- a/Documentation/kobject.txt
+++ b/Documentation/kobject.txt
@@ -266,7 +266,7 @@ kobj_type:
266 266
267 struct kobj_type { 267 struct kobj_type {
268 void (*release)(struct kobject *); 268 void (*release)(struct kobject *);
269 struct sysfs_ops *sysfs_ops; 269 const struct sysfs_ops *sysfs_ops;
270 struct attribute **default_attrs; 270 struct attribute **default_attrs;
271 }; 271 };
272 272
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 053037a1fe6d..2f9115c0ae62 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -1,6 +1,7 @@
1Title : Kernel Probes (Kprobes) 1Title : Kernel Probes (Kprobes)
2Authors : Jim Keniston <jkenisto@us.ibm.com> 2Authors : Jim Keniston <jkenisto@us.ibm.com>
3 : Prasanna S Panchamukhi <prasanna@in.ibm.com> 3 : Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com>
4 : Masami Hiramatsu <mhiramat@redhat.com>
4 5
5CONTENTS 6CONTENTS
6 7
@@ -15,6 +16,7 @@ CONTENTS
159. Jprobes Example 169. Jprobes Example
1610. Kretprobes Example 1710. Kretprobes Example
17Appendix A: The kprobes debugfs interface 18Appendix A: The kprobes debugfs interface
19Appendix B: The kprobes sysctl interface
18 20
191. Concepts: Kprobes, Jprobes, Return Probes 211. Concepts: Kprobes, Jprobes, Return Probes
20 22
@@ -42,13 +44,13 @@ registration/unregistration of a group of *probes. These functions
42can speed up unregistration process when you have to unregister 44can speed up unregistration process when you have to unregister
43a lot of probes at once. 45a lot of probes at once.
44 46
45The next three subsections explain how the different types of 47The next four subsections explain how the different types of
46probes work. They explain certain things that you'll need to 48probes work and how jump optimization works. They explain certain
47know in order to make the best use of Kprobes -- e.g., the 49things that you'll need to know in order to make the best use of
48difference between a pre_handler and a post_handler, and how 50Kprobes -- e.g., the difference between a pre_handler and
49to use the maxactive and nmissed fields of a kretprobe. But 51a post_handler, and how to use the maxactive and nmissed fields of
50if you're in a hurry to start using Kprobes, you can skip ahead 52a kretprobe. But if you're in a hurry to start using Kprobes, you
51to section 2. 53can skip ahead to section 2.
52 54
531.1 How Does a Kprobe Work? 551.1 How Does a Kprobe Work?
54 56
@@ -161,13 +163,125 @@ In case probed function is entered but there is no kretprobe_instance
161object available, then in addition to incrementing the nmissed count, 163object available, then in addition to incrementing the nmissed count,
162the user entry_handler invocation is also skipped. 164the user entry_handler invocation is also skipped.
163 165
1661.4 How Does Jump Optimization Work?
167
168If you configured your kernel with CONFIG_OPTPROBES=y (currently
169this option is supported on x86/x86-64, non-preemptive kernel) and
170the "debug.kprobes_optimization" kernel parameter is set to 1 (see
171sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
172instruction instead of a breakpoint instruction at each probepoint.
173
1741.4.1 Init a Kprobe
175
176When a probe is registered, before attempting this optimization,
177Kprobes inserts an ordinary, breakpoint-based kprobe at the specified
178address. So, even if it's not possible to optimize this particular
179probepoint, there'll be a probe there.
180
1811.4.2 Safety Check
182
183Before optimizing a probe, Kprobes performs the following safety checks:
184
185- Kprobes verifies that the region that will be replaced by the jump
186instruction (the "optimized region") lies entirely within one function.
187(A jump instruction is multiple bytes, and so may overlay multiple
188instructions.)
189
190- Kprobes analyzes the entire function and verifies that there is no
191jump into the optimized region. Specifically:
192 - the function contains no indirect jump;
193 - the function contains no instruction that causes an exception (since
194 the fixup code triggered by the exception could jump back into the
195 optimized region -- Kprobes checks the exception tables to verify this);
196 and
197 - there is no near jump to the optimized region (other than to the first
198 byte).
199
200- For each instruction in the optimized region, Kprobes verifies that
201the instruction can be executed out of line.
202
2031.4.3 Preparing Detour Buffer
204
205Next, Kprobes prepares a "detour" buffer, which contains the following
206instruction sequence:
207- code to push the CPU's registers (emulating a breakpoint trap)
208- a call to the trampoline code which calls user's probe handlers.
209- code to restore registers
210- the instructions from the optimized region
211- a jump back to the original execution path.
212
2131.4.4 Pre-optimization
214
215After preparing the detour buffer, Kprobes verifies that none of the
216following situations exist:
217- The probe has either a break_handler (i.e., it's a jprobe) or a
218post_handler.
219- Other instructions in the optimized region are probed.
220- The probe is disabled.
221In any of the above cases, Kprobes won't start optimizing the probe.
222Since these are temporary situations, Kprobes tries to start
223optimizing it again if the situation is changed.
224
225If the kprobe can be optimized, Kprobes enqueues the kprobe to an
226optimizing list, and kicks the kprobe-optimizer workqueue to optimize
227it. If the to-be-optimized probepoint is hit before being optimized,
228Kprobes returns control to the original instruction path by setting
229the CPU's instruction pointer to the copied code in the detour buffer
230-- thus at least avoiding the single-step.
231
2321.4.5 Optimization
233
234The Kprobe-optimizer doesn't insert the jump instruction immediately;
235rather, it calls synchronize_sched() for safety first, because it's
236possible for a CPU to be interrupted in the middle of executing the
237optimized region(*). As you know, synchronize_sched() can ensure
238that all interruptions that were active when synchronize_sched()
239was called are done, but only if CONFIG_PREEMPT=n. So, this version
240of kprobe optimization supports only kernels with CONFIG_PREEMPT=n.(**)
241
242After that, the Kprobe-optimizer calls stop_machine() to replace
243the optimized region with a jump instruction to the detour buffer,
244using text_poke_smp().
245
2461.4.6 Unoptimization
247
248When an optimized kprobe is unregistered, disabled, or blocked by
249another kprobe, it will be unoptimized. If this happens before
250the optimization is complete, the kprobe is just dequeued from the
251optimized list. If the optimization has been done, the jump is
252replaced with the original code (except for an int3 breakpoint in
253the first byte) by using text_poke_smp().
254
255(*)Please imagine that the 2nd instruction is interrupted and then
256the optimizer replaces the 2nd instruction with the jump *address*
257while the interrupt handler is running. When the interrupt
258returns to original address, there is no valid instruction,
259and it causes an unexpected result.
260
261(**)This optimization-safety checking may be replaced with the
262stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y
263kernel.
264
265NOTE for geeks:
266The jump optimization changes the kprobe's pre_handler behavior.
267Without optimization, the pre_handler can change the kernel's execution
268path by changing regs->ip and returning 1. However, when the probe
269is optimized, that modification is ignored. Thus, if you want to
270tweak the kernel's execution path, you need to suppress optimization,
271using one of the following techniques:
272- Specify an empty function for the kprobe's post_handler or break_handler.
273 or
274- Config CONFIG_OPTPROBES=n.
275 or
276- Execute 'sysctl -w debug.kprobes_optimization=n'
277
1642. Architectures Supported 2782. Architectures Supported
165 279
166Kprobes, jprobes, and return probes are implemented on the following 280Kprobes, jprobes, and return probes are implemented on the following
167architectures: 281architectures:
168 282
169- i386 283- i386 (Supports jump optimization)
170- x86_64 (AMD-64, EM64T) 284- x86_64 (AMD-64, EM64T) (Supports jump optimization)
171- ppc64 285- ppc64
172- ia64 (Does not support probes on instruction slot1.) 286- ia64 (Does not support probes on instruction slot1.)
173- sparc64 (Return probes not yet implemented.) 287- sparc64 (Return probes not yet implemented.)
@@ -193,6 +307,10 @@ it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
193so you can use "objdump -d -l vmlinux" to see the source-to-object 307so you can use "objdump -d -l vmlinux" to see the source-to-object
194code mapping. 308code mapping.
195 309
310If you want to reduce probing overhead, set "Kprobes jump optimization
311support" (CONFIG_OPTPROBES) to "y". You can find this option under the
312"Kprobes" line.
313
1964. API Reference 3144. API Reference
197 315
198The Kprobes API includes a "register" function and an "unregister" 316The Kprobes API includes a "register" function and an "unregister"
@@ -389,7 +507,10 @@ the probe which has been registered.
389 507
390Kprobes allows multiple probes at the same address. Currently, 508Kprobes allows multiple probes at the same address. Currently,
391however, there cannot be multiple jprobes on the same function at 509however, there cannot be multiple jprobes on the same function at
392the same time. 510the same time. Also, a probepoint for which there is a jprobe or
511a post_handler cannot be optimized. So if you install a jprobe,
512or a kprobe with a post_handler, at an optimized probepoint, the
513probepoint will be unoptimized automatically.
393 514
394In general, you can install a probe anywhere in the kernel. 515In general, you can install a probe anywhere in the kernel.
395In particular, you can probe interrupt handlers. Known exceptions 516In particular, you can probe interrupt handlers. Known exceptions
@@ -453,6 +574,38 @@ reason, Kprobes doesn't support return probes (or kprobes or jprobes)
453on the x86_64 version of __switch_to(); the registration functions 574on the x86_64 version of __switch_to(); the registration functions
454return -EINVAL. 575return -EINVAL.
455 576
577On x86/x86-64, since the Jump Optimization of Kprobes modifies
578instructions widely, there are some limitations to optimization. To
579explain it, we introduce some terminology. Imagine a 3-instruction
580sequence consisting of a two 2-byte instructions and one 3-byte
581instruction.
582
583 IA
584 |
585[-2][-1][0][1][2][3][4][5][6][7]
586 [ins1][ins2][ ins3 ]
587 [<- DCR ->]
588 [<- JTPR ->]
589
590ins1: 1st Instruction
591ins2: 2nd Instruction
592ins3: 3rd Instruction
593IA: Insertion Address
594JTPR: Jump Target Prohibition Region
595DCR: Detoured Code Region
596
597The instructions in DCR are copied to the out-of-line buffer
598of the kprobe, because the bytes in DCR are replaced by
599a 5-byte jump instruction. So there are several limitations.
600
601a) The instructions in DCR must be relocatable.
602b) The instructions in DCR must not include a call instruction.
603c) JTPR must not be targeted by any jump or call instruction.
604d) DCR must not straddle the border betweeen functions.
605
606Anyway, these limitations are checked by the in-kernel instruction
607decoder, so you don't need to worry about that.
608
4566. Probe Overhead 6096. Probe Overhead
457 610
458On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0 611On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
@@ -476,6 +629,19 @@ k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07
476ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU) 629ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
477k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 630k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
478 631
6326.1 Optimized Probe Overhead
633
634Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to
635process. Here are sample overhead figures (in usec) for x86 architectures.
636k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe,
637r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe.
638
639i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
640k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33
641
642x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
643k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30
644
4797. TODO 6457. TODO
480 646
481a. SystemTap (http://sourceware.org/systemtap): Provides a simplified 647a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
@@ -523,7 +689,8 @@ is also specified. Following columns show probe status. If the probe is on
523a virtual address that is no longer valid (module init sections, module 689a virtual address that is no longer valid (module init sections, module
524virtual addresses that correspond to modules that've been unloaded), 690virtual addresses that correspond to modules that've been unloaded),
525such probes are marked with [GONE]. If the probe is temporarily disabled, 691such probes are marked with [GONE]. If the probe is temporarily disabled,
526such probes are marked with [DISABLED]. 692such probes are marked with [DISABLED]. If the probe is optimized, it is
693marked with [OPTIMIZED].
527 694
528/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly. 695/sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly.
529 696
@@ -533,3 +700,19 @@ registered probes will be disarmed, till such time a "1" is echoed to this
533file. Note that this knob just disarms and arms all kprobes and doesn't 700file. Note that this knob just disarms and arms all kprobes and doesn't
534change each probe's disabling state. This means that disabled kprobes (marked 701change each probe's disabling state. This means that disabled kprobes (marked
535[DISABLED]) will be not enabled if you turn ON all kprobes by this knob. 702[DISABLED]) will be not enabled if you turn ON all kprobes by this knob.
703
704
705Appendix B: The kprobes sysctl interface
706
707/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.
708
709When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides
710a knob to globally and forcibly turn jump optimization (see section
7111.4) ON or OFF. By default, jump optimization is allowed (ON).
712If you echo "0" to this file or set "debug.kprobes_optimization" to
7130 via sysctl, all optimized probes will be unoptimized, and any new
714probes registered after that will not be optimized. Note that this
715knob *changes* the optimized state. This means that optimized probes
716(marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be
717removed). If the knob is turned on, they will be optimized again.
718
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
index 2811e452f756..c6416a398163 100644
--- a/Documentation/kvm/api.txt
+++ b/Documentation/kvm/api.txt
@@ -23,12 +23,12 @@ of a virtual machine. The ioctls belong to three classes
23 Only run vcpu ioctls from the same thread that was used to create the 23 Only run vcpu ioctls from the same thread that was used to create the
24 vcpu. 24 vcpu.
25 25
262. File descritpors 262. File descriptors
27 27
28The kvm API is centered around file descriptors. An initial 28The kvm API is centered around file descriptors. An initial
29open("/dev/kvm") obtains a handle to the kvm subsystem; this handle 29open("/dev/kvm") obtains a handle to the kvm subsystem; this handle
30can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this 30can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
31handle will create a VM file descripror which can be used to issue VM 31handle will create a VM file descriptor which can be used to issue VM
32ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu 32ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu
33and return a file descriptor pointing to it. Finally, ioctls on a vcpu 33and return a file descriptor pointing to it. Finally, ioctls on a vcpu
34fd can be used to control the vcpu, including the important task of 34fd can be used to control the vcpu, including the important task of
@@ -643,7 +643,7 @@ Type: vm ioctl
643Parameters: struct kvm_clock_data (in) 643Parameters: struct kvm_clock_data (in)
644Returns: 0 on success, -1 on error 644Returns: 0 on success, -1 on error
645 645
646Sets the current timestamp of kvmclock to the valued specific in its parameter. 646Sets the current timestamp of kvmclock to the value specified in its parameter.
647In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios 647In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios
648such as migration. 648such as migration.
649 649
@@ -795,11 +795,11 @@ Unused.
795 __u64 data_offset; /* relative to kvm_run start */ 795 __u64 data_offset; /* relative to kvm_run start */
796 } io; 796 } io;
797 797
798If exit_reason is KVM_EXIT_IO_IN or KVM_EXIT_IO_OUT, then the vcpu has 798If exit_reason is KVM_EXIT_IO, then the vcpu has
799executed a port I/O instruction which could not be satisfied by kvm. 799executed a port I/O instruction which could not be satisfied by kvm.
800data_offset describes where the data is located (KVM_EXIT_IO_OUT) or 800data_offset describes where the data is located (KVM_EXIT_IO_OUT) or
801where kvm expects application code to place the data for the next 801where kvm expects application code to place the data for the next
802KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a patcked array. 802KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array.
803 803
804 struct { 804 struct {
805 struct kvm_debug_exit_arch arch; 805 struct kvm_debug_exit_arch arch;
@@ -815,7 +815,7 @@ Unused.
815 __u8 is_write; 815 __u8 is_write;
816 } mmio; 816 } mmio;
817 817
818If exit_reason is KVM_EXIT_MMIO or KVM_EXIT_IO_OUT, then the vcpu has 818If exit_reason is KVM_EXIT_MMIO, then the vcpu has
819executed a memory-mapped I/O instruction which could not be satisfied 819executed a memory-mapped I/O instruction which could not be satisfied
820by kvm. The 'data' member contains the written data if 'is_write' is 820by kvm. The 'data' member contains the written data if 'is_write' is
821true, and should be filled by application code otherwise. 821true, and should be filled by application code otherwise.
diff --git a/Documentation/laptops/00-INDEX b/Documentation/laptops/00-INDEX
index ee5692b26dd4..fa688538e757 100644
--- a/Documentation/laptops/00-INDEX
+++ b/Documentation/laptops/00-INDEX
@@ -2,6 +2,12 @@
2 - This file 2 - This file
3acer-wmi.txt 3acer-wmi.txt
4 - information on the Acer Laptop WMI Extras driver. 4 - information on the Acer Laptop WMI Extras driver.
5asus-laptop.txt
6 - information on the Asus Laptop Extras driver.
7disk-shock-protection.txt
8 - information on hard disk shock protection.
9dslm.c
10 - Simple Disk Sleep Monitor program
5laptop-mode.txt 11laptop-mode.txt
6 - how to conserve battery power using laptop-mode. 12 - how to conserve battery power using laptop-mode.
7sony-laptop.txt 13sony-laptop.txt
diff --git a/Documentation/laptops/Makefile b/Documentation/laptops/Makefile
new file mode 100644
index 000000000000..5cb144af3c09
--- /dev/null
+++ b/Documentation/laptops/Makefile
@@ -0,0 +1,8 @@
1# kbuild trick to avoid linker error. Can be omitted if a module is built.
2obj- := dummy.o
3
4# List of programs to build
5hostprogs-y := dslm
6
7# Tell kbuild to always build the programs
8always := $(hostprogs-y)
diff --git a/Documentation/laptops/dslm.c b/Documentation/laptops/dslm.c
new file mode 100644
index 000000000000..72ff290c5fc6
--- /dev/null
+++ b/Documentation/laptops/dslm.c
@@ -0,0 +1,166 @@
1/*
2 * dslm.c
3 * Simple Disk Sleep Monitor
4 * by Bartek Kania
5 * Licenced under the GPL
6 */
7#include <unistd.h>
8#include <stdlib.h>
9#include <stdio.h>
10#include <fcntl.h>
11#include <errno.h>
12#include <time.h>
13#include <string.h>
14#include <signal.h>
15#include <sys/ioctl.h>
16#include <linux/hdreg.h>
17
18#ifdef DEBUG
19#define D(x) x
20#else
21#define D(x)
22#endif
23
24int endit = 0;
25
26/* Check if the disk is in powersave-mode
27 * Most of the code is stolen from hdparm.
28 * 1 = active, 0 = standby/sleep, -1 = unknown */
29static int check_powermode(int fd)
30{
31 unsigned char args[4] = {WIN_CHECKPOWERMODE1,0,0,0};
32 int state;
33
34 if (ioctl(fd, HDIO_DRIVE_CMD, &args)
35 && (args[0] = WIN_CHECKPOWERMODE2) /* try again with 0x98 */
36 && ioctl(fd, HDIO_DRIVE_CMD, &args)) {
37 if (errno != EIO || args[0] != 0 || args[1] != 0) {
38 state = -1; /* "unknown"; */
39 } else
40 state = 0; /* "sleeping"; */
41 } else {
42 state = (args[2] == 255) ? 1 : 0;
43 }
44 D(printf(" drive state is: %d\n", state));
45
46 return state;
47}
48
49static char *state_name(int i)
50{
51 if (i == -1) return "unknown";
52 if (i == 0) return "sleeping";
53 if (i == 1) return "active";
54
55 return "internal error";
56}
57
58static char *myctime(time_t time)
59{
60 char *ts = ctime(&time);
61 ts[strlen(ts) - 1] = 0;
62
63 return ts;
64}
65
66static void measure(int fd)
67{
68 time_t start_time;
69 int last_state;
70 time_t last_time;
71 int curr_state;
72 time_t curr_time = 0;
73 time_t time_diff;
74 time_t active_time = 0;
75 time_t sleep_time = 0;
76 time_t unknown_time = 0;
77 time_t total_time = 0;
78 int changes = 0;
79 float tmp;
80
81 printf("Starting measurements\n");
82
83 last_state = check_powermode(fd);
84 start_time = last_time = time(0);
85 printf(" System is in state %s\n\n", state_name(last_state));
86
87 while(!endit) {
88 sleep(1);
89 curr_state = check_powermode(fd);
90
91 if (curr_state != last_state || endit) {
92 changes++;
93 curr_time = time(0);
94 time_diff = curr_time - last_time;
95
96 if (last_state == 1) active_time += time_diff;
97 else if (last_state == 0) sleep_time += time_diff;
98 else unknown_time += time_diff;
99
100 last_state = curr_state;
101 last_time = curr_time;
102
103 printf("%s: State-change to %s\n", myctime(curr_time),
104 state_name(curr_state));
105 }
106 }
107 changes--; /* Compensate for SIGINT */
108
109 total_time = time(0) - start_time;
110 printf("\nTotal running time: %lus\n", curr_time - start_time);
111 printf(" State changed %d times\n", changes);
112
113 tmp = (float)sleep_time / (float)total_time * 100;
114 printf(" Time in sleep state: %lus (%.2f%%)\n", sleep_time, tmp);
115 tmp = (float)active_time / (float)total_time * 100;
116 printf(" Time in active state: %lus (%.2f%%)\n", active_time, tmp);
117 tmp = (float)unknown_time / (float)total_time * 100;
118 printf(" Time in unknown state: %lus (%.2f%%)\n", unknown_time, tmp);
119}
120
121static void ender(int s)
122{
123 endit = 1;
124}
125
126static void usage(void)
127{
128 puts("usage: dslm [-w <time>] <disk>");
129 exit(0);
130}
131
132int main(int argc, char **argv)
133{
134 int fd;
135 char *disk = 0;
136 int settle_time = 60;
137
138 /* Parse the simple command-line */
139 if (argc == 2)
140 disk = argv[1];
141 else if (argc == 4) {
142 settle_time = atoi(argv[2]);
143 disk = argv[3];
144 } else
145 usage();
146
147 if (!(fd = open(disk, O_RDONLY|O_NONBLOCK))) {
148 printf("Can't open %s, because: %s\n", disk, strerror(errno));
149 exit(-1);
150 }
151
152 if (settle_time) {
153 printf("Waiting %d seconds for the system to settle down to "
154 "'normal'\n", settle_time);
155 sleep(settle_time);
156 } else
157 puts("Not waiting for system to settle down");
158
159 signal(SIGINT, ender);
160
161 measure(fd);
162
163 close(fd);
164
165 return 0;
166}
diff --git a/Documentation/laptops/laptop-mode.txt b/Documentation/laptops/laptop-mode.txt
index eeedee11c8c2..2c3c35093023 100644
--- a/Documentation/laptops/laptop-mode.txt
+++ b/Documentation/laptops/laptop-mode.txt
@@ -779,172 +779,4 @@ Monitoring tool
779--------------- 779---------------
780 780
781Bartek Kania submitted this, it can be used to measure how much time your disk 781Bartek Kania submitted this, it can be used to measure how much time your disk
782spends spun up/down. 782spends spun up/down. See Documentation/laptops/dslm.c
783
784---------------------------dslm.c BEGIN-----------------------------------------
785/*
786 * Simple Disk Sleep Monitor
787 * by Bartek Kania
788 * Licenced under the GPL
789 */
790#include <unistd.h>
791#include <stdlib.h>
792#include <stdio.h>
793#include <fcntl.h>
794#include <errno.h>
795#include <time.h>
796#include <string.h>
797#include <signal.h>
798#include <sys/ioctl.h>
799#include <linux/hdreg.h>
800
801#ifdef DEBUG
802#define D(x) x
803#else
804#define D(x)
805#endif
806
807int endit = 0;
808
809/* Check if the disk is in powersave-mode
810 * Most of the code is stolen from hdparm.
811 * 1 = active, 0 = standby/sleep, -1 = unknown */
812int check_powermode(int fd)
813{
814 unsigned char args[4] = {WIN_CHECKPOWERMODE1,0,0,0};
815 int state;
816
817 if (ioctl(fd, HDIO_DRIVE_CMD, &args)
818 && (args[0] = WIN_CHECKPOWERMODE2) /* try again with 0x98 */
819 && ioctl(fd, HDIO_DRIVE_CMD, &args)) {
820 if (errno != EIO || args[0] != 0 || args[1] != 0) {
821 state = -1; /* "unknown"; */
822 } else
823 state = 0; /* "sleeping"; */
824 } else {
825 state = (args[2] == 255) ? 1 : 0;
826 }
827 D(printf(" drive state is: %d\n", state));
828
829 return state;
830}
831
832char *state_name(int i)
833{
834 if (i == -1) return "unknown";
835 if (i == 0) return "sleeping";
836 if (i == 1) return "active";
837
838 return "internal error";
839}
840
841char *myctime(time_t time)
842{
843 char *ts = ctime(&time);
844 ts[strlen(ts) - 1] = 0;
845
846 return ts;
847}
848
849void measure(int fd)
850{
851 time_t start_time;
852 int last_state;
853 time_t last_time;
854 int curr_state;
855 time_t curr_time = 0;
856 time_t time_diff;
857 time_t active_time = 0;
858 time_t sleep_time = 0;
859 time_t unknown_time = 0;
860 time_t total_time = 0;
861 int changes = 0;
862 float tmp;
863
864 printf("Starting measurements\n");
865
866 last_state = check_powermode(fd);
867 start_time = last_time = time(0);
868 printf(" System is in state %s\n\n", state_name(last_state));
869
870 while(!endit) {
871 sleep(1);
872 curr_state = check_powermode(fd);
873
874 if (curr_state != last_state || endit) {
875 changes++;
876 curr_time = time(0);
877 time_diff = curr_time - last_time;
878
879 if (last_state == 1) active_time += time_diff;
880 else if (last_state == 0) sleep_time += time_diff;
881 else unknown_time += time_diff;
882
883 last_state = curr_state;
884 last_time = curr_time;
885
886 printf("%s: State-change to %s\n", myctime(curr_time),
887 state_name(curr_state));
888 }
889 }
890 changes--; /* Compensate for SIGINT */
891
892 total_time = time(0) - start_time;
893 printf("\nTotal running time: %lus\n", curr_time - start_time);
894 printf(" State changed %d times\n", changes);
895
896 tmp = (float)sleep_time / (float)total_time * 100;
897 printf(" Time in sleep state: %lus (%.2f%%)\n", sleep_time, tmp);
898 tmp = (float)active_time / (float)total_time * 100;
899 printf(" Time in active state: %lus (%.2f%%)\n", active_time, tmp);
900 tmp = (float)unknown_time / (float)total_time * 100;
901 printf(" Time in unknown state: %lus (%.2f%%)\n", unknown_time, tmp);
902}
903
904void ender(int s)
905{
906 endit = 1;
907}
908
909void usage()
910{
911 puts("usage: dslm [-w <time>] <disk>");
912 exit(0);
913}
914
915int main(int argc, char **argv)
916{
917 int fd;
918 char *disk = 0;
919 int settle_time = 60;
920
921 /* Parse the simple command-line */
922 if (argc == 2)
923 disk = argv[1];
924 else if (argc == 4) {
925 settle_time = atoi(argv[2]);
926 disk = argv[3];
927 } else
928 usage();
929
930 if (!(fd = open(disk, O_RDONLY|O_NONBLOCK))) {
931 printf("Can't open %s, because: %s\n", disk, strerror(errno));
932 exit(-1);
933 }
934
935 if (settle_time) {
936 printf("Waiting %d seconds for the system to settle down to "
937 "'normal'\n", settle_time);
938 sleep(settle_time);
939 } else
940 puts("Not waiting for system to settle down");
941
942 signal(SIGINT, ender);
943
944 measure(fd);
945
946 close(fd);
947
948 return 0;
949}
950---------------------------dslm.c END-------------------------------------------
diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt
index 75afa1229fd7..39c0a09d0105 100644
--- a/Documentation/laptops/thinkpad-acpi.txt
+++ b/Documentation/laptops/thinkpad-acpi.txt
@@ -650,6 +650,10 @@ LCD, CRT or DVI (if available). The following commands are available:
650 echo expand_toggle > /proc/acpi/ibm/video 650 echo expand_toggle > /proc/acpi/ibm/video
651 echo video_switch > /proc/acpi/ibm/video 651 echo video_switch > /proc/acpi/ibm/video
652 652
653NOTE: Access to this feature is restricted to processes owning the
654CAP_SYS_ADMIN capability for safety reasons, as it can interact badly
655enough with some versions of X.org to crash it.
656
653Each video output device can be enabled or disabled individually. 657Each video output device can be enabled or disabled individually.
654Reading /proc/acpi/ibm/video shows the status of each device. 658Reading /proc/acpi/ibm/video shows the status of each device.
655 659
diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c
index 42208511b5c0..3119f5db75bd 100644
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -34,7 +34,6 @@
34#include <sys/uio.h> 34#include <sys/uio.h>
35#include <termios.h> 35#include <termios.h>
36#include <getopt.h> 36#include <getopt.h>
37#include <zlib.h>
38#include <assert.h> 37#include <assert.h>
39#include <sched.h> 38#include <sched.h>
40#include <limits.h> 39#include <limits.h>
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX
index 50189bf07d53..fe5c099b8fc8 100644
--- a/Documentation/networking/00-INDEX
+++ b/Documentation/networking/00-INDEX
@@ -32,6 +32,8 @@ cs89x0.txt
32 - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver 32 - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver
33cxacru.txt 33cxacru.txt
34 - Conexant AccessRunner USB ADSL Modem 34 - Conexant AccessRunner USB ADSL Modem
35cxacru-cf.py
36 - Conexant AccessRunner USB ADSL Modem configuration file parser
35de4x5.txt 37de4x5.txt
36 - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver 38 - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver
37decnet.txt 39decnet.txt
diff --git a/Documentation/networking/cxacru-cf.py b/Documentation/networking/cxacru-cf.py
new file mode 100644
index 000000000000..b41d298398c8
--- /dev/null
+++ b/Documentation/networking/cxacru-cf.py
@@ -0,0 +1,48 @@
1#!/usr/bin/env python
2# Copyright 2009 Simon Arlott
3#
4# This program is free software; you can redistribute it and/or modify it
5# under the terms of the GNU General Public License as published by the Free
6# Software Foundation; either version 2 of the License, or (at your option)
7# any later version.
8#
9# This program is distributed in the hope that it will be useful, but WITHOUT
10# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
11# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
12# more details.
13#
14# You should have received a copy of the GNU General Public License along with
15# this program; if not, write to the Free Software Foundation, Inc., 59
16# Temple Place - Suite 330, Boston, MA 02111-1307, USA.
17#
18# Usage: cxacru-cf.py < cxacru-cf.bin
19# Output: values string suitable for the sysfs adsl_config attribute
20#
21# Warning: cxacru-cf.bin with MD5 hash cdbac2689969d5ed5d4850f117702110
22# contains mis-aligned values which will stop the modem from being able
23# to make a connection. If the first and last two bytes are removed then
24# the values become valid, but the modulation will be forced to ANSI
25# T1.413 only which may not be appropriate.
26#
27# The original binary format is a packed list of le32 values.
28
29import sys
30import struct
31
32i = 0
33while True:
34 buf = sys.stdin.read(4)
35
36 if len(buf) == 0:
37 break
38 elif len(buf) != 4:
39 sys.stdout.write("\n")
40 sys.stderr.write("Error: read {0} not 4 bytes\n".format(len(buf)))
41 sys.exit(1)
42
43 if i > 0:
44 sys.stdout.write(" ")
45 sys.stdout.write("{0:x}={1}".format(i, struct.unpack("<I", buf)[0]))
46 i += 1
47
48sys.stdout.write("\n")
diff --git a/Documentation/networking/cxacru.txt b/Documentation/networking/cxacru.txt
index b074681a963e..2cce04457b4d 100644
--- a/Documentation/networking/cxacru.txt
+++ b/Documentation/networking/cxacru.txt
@@ -4,6 +4,12 @@ While it is capable of managing/maintaining the ADSL connection without the
4module loaded, the device will sometimes stop responding after unloading the 4module loaded, the device will sometimes stop responding after unloading the
5driver and it is necessary to unplug/remove power to the device to fix this. 5driver and it is necessary to unplug/remove power to the device to fix this.
6 6
7Note: support for cxacru-cf.bin has been removed. It was not loaded correctly
8so it had no effect on the device configuration. Fixing it could have stopped
9existing devices working when an invalid configuration is supplied.
10
11There is a script cxacru-cf.py to convert an existing file to the sysfs form.
12
7Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/ 13Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/
8these are directories named cxacruN where N is the device number. A symlink 14these are directories named cxacruN where N is the device number. A symlink
9named device points to the USB interface device's directory which contains 15named device points to the USB interface device's directory which contains
@@ -15,6 +21,15 @@ several sysfs attribute files for retrieving device statistics:
15* adsl_headend_environment 21* adsl_headend_environment
16 Information about the remote headend. 22 Information about the remote headend.
17 23
24* adsl_config
25 Configuration writing interface.
26 Write parameters in hexadecimal format <index>=<value>,
27 separated by whitespace, e.g.:
28 "1=0 a=5"
29 Up to 7 parameters at a time will be sent and the modem will restart
30 the ADSL connection when any value is set. These are logged for future
31 reference.
32
18* downstream_attenuation (dB) 33* downstream_attenuation (dB)
19* downstream_bits_per_frame 34* downstream_bits_per_frame
20* downstream_rate (kbps) 35* downstream_rate (kbps)
@@ -61,6 +76,7 @@ several sysfs attribute files for retrieving device statistics:
61* mac_address 76* mac_address
62 77
63* modulation 78* modulation
79 "" (when not connected)
64 "ANSI T1.413" 80 "ANSI T1.413"
65 "ITU-T G.992.1 (G.DMT)" 81 "ITU-T G.992.1 (G.DMT)"
66 "ITU-T G.992.2 (G.LITE)" 82 "ITU-T G.992.2 (G.LITE)"
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt
index b132e4a3cf0f..a62fdf7a6bff 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -58,8 +58,10 @@ DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
58size (application payload size) in bytes, see RFC 4340, section 14. 58size (application payload size) in bytes, see RFC 4340, section 14.
59 59
60DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs 60DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs
61supported by the endpoint (see include/linux/dccp.h for symbolic constants). 61supported by the endpoint. The option value is an array of type uint8_t whose
62The caller needs to provide a sufficiently large (> 2) array of type uint8_t. 62size is passed as option length. The minimum array size is 4 elements, the
63value returned in the optlen argument always reflects the true number of
64built-in CCIDs.
63 65
64DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same 66DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same
65time, combining the operation of the next two socket options. This option is 67time, combining the operation of the next two socket options. This option is
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 006b39dec87d..8b72c88ba213 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -487,6 +487,30 @@ tcp_dma_copybreak - INTEGER
487 and CONFIG_NET_DMA is enabled. 487 and CONFIG_NET_DMA is enabled.
488 Default: 4096 488 Default: 4096
489 489
490tcp_thin_linear_timeouts - BOOLEAN
491 Enable dynamic triggering of linear timeouts for thin streams.
492 If set, a check is performed upon retransmission by timeout to
493 determine if the stream is thin (less than 4 packets in flight).
494 As long as the stream is found to be thin, up to 6 linear
495 timeouts may be performed before exponential backoff mode is
496 initiated. This improves retransmission latency for
497 non-aggressive thin streams, often found to be time-dependent.
498 For more information on thin streams, see
499 Documentation/networking/tcp-thin.txt
500 Default: 0
501
502tcp_thin_dupack - BOOLEAN
503 Enable dynamic triggering of retransmissions after one dupACK
504 for thin streams. If set, a check is performed upon reception
505 of a dupACK to determine if the stream is thin (less than 4
506 packets in flight). As long as the stream is found to be thin,
507 data is retransmitted on the first received dupACK. This
508 improves retransmission latency for non-aggressive thin
509 streams, often found to be time-dependent.
510 For more information on thin streams, see
511 Documentation/networking/tcp-thin.txt
512 Default: 0
513
490UDP variables: 514UDP variables:
491 515
492udp_mem - vector of 3 INTEGERs: min, pressure, max 516udp_mem - vector of 3 INTEGERs: min, pressure, max
@@ -692,6 +716,25 @@ proxy_arp - BOOLEAN
692 conf/{all,interface}/proxy_arp is set to TRUE, 716 conf/{all,interface}/proxy_arp is set to TRUE,
693 it will be disabled otherwise 717 it will be disabled otherwise
694 718
719proxy_arp_pvlan - BOOLEAN
720 Private VLAN proxy arp.
721 Basically allow proxy arp replies back to the same interface
722 (from which the ARP request/solicitation was received).
723
724 This is done to support (ethernet) switch features, like RFC
725 3069, where the individual ports are NOT allowed to
726 communicate with each other, but they are allowed to talk to
727 the upstream router. As described in RFC 3069, it is possible
728 to allow these hosts to communicate through the upstream
729 router by proxy_arp'ing. Don't need to be used together with
730 proxy_arp.
731
732 This technology is known by different names:
733 In RFC 3069 it is called VLAN Aggregation.
734 Cisco and Allied Telesyn call it Private VLAN.
735 Hewlett-Packard call it Source-Port filtering or port-isolation.
736 Ericsson call it MAC-Forced Forwarding (RFC Draft).
737
695shared_media - BOOLEAN 738shared_media - BOOLEAN
696 Send(router) or accept(host) RFC1620 shared media redirects. 739 Send(router) or accept(host) RFC1620 shared media redirects.
697 Overrides ip_secure_redirects. 740 Overrides ip_secure_redirects.
@@ -833,9 +876,18 @@ arp_notify - BOOLEAN
833 or hardware address changes. 876 or hardware address changes.
834 877
835arp_accept - BOOLEAN 878arp_accept - BOOLEAN
836 Define behavior when gratuitous arp replies are received: 879 Define behavior for gratuitous ARP frames who's IP is not
837 0 - drop gratuitous arp frames 880 already present in the ARP table:
838 1 - accept gratuitous arp frames 881 0 - don't create new entries in the ARP table
882 1 - create new entries in the ARP table
883
884 Both replies and requests type gratuitous arp will trigger the
885 ARP table to be updated, if this setting is on.
886
887 If the ARP table already contains the IP address of the
888 gratuitous arp frame, the arp table will be updated regardless
889 if this setting is on or off.
890
839 891
840app_solicit - INTEGER 892app_solicit - INTEGER
841 The maximum number of probes to send to the user space ARP daemon 893 The maximum number of probes to send to the user space ARP daemon
@@ -1074,10 +1126,10 @@ regen_max_retry - INTEGER
1074 Default: 5 1126 Default: 5
1075 1127
1076max_addresses - INTEGER 1128max_addresses - INTEGER
1077 Number of maximum addresses per interface. 0 disables limitation. 1129 Maximum number of autoconfigured addresses per interface. Setting
1078 It is recommended not set too large value (or 0) because it would 1130 to zero disables the limitation. It is not recommended to set this
1079 be too easy way to crash kernel to allow to create too much of 1131 value too large (or to zero) because it would be an easy way to
1080 autoconfigured addresses. 1132 crash the kernel by allowing too many addresses to be created.
1081 Default: 16 1133 Default: 16
1082 1134
1083disable_ipv6 - BOOLEAN 1135disable_ipv6 - BOOLEAN
diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt
new file mode 100755
index 000000000000..19015de6725f
--- /dev/null
+++ b/Documentation/networking/ixgbevf.txt
@@ -0,0 +1,90 @@
1Linux* Base Driver for Intel(R) Network Connection
2==================================================
3
4November 24, 2009
5
6Contents
7========
8
9- In This Release
10- Identifying Your Adapter
11- Known Issues/Troubleshooting
12- Support
13
14In This Release
15===============
16
17This file describes the ixgbevf Linux* Base Driver for Intel Network
18Connection.
19
20The ixgbevf driver supports 82599-based virtual function devices that can only
21be activated on kernels with CONFIG_PCI_IOV enabled.
22
23The ixgbevf driver supports virtual functions generated by the ixgbe driver
24with a max_vfs value of 1 or greater.
25
26The guest OS loading the ixgbevf driver must support MSI-X interrupts.
27
28VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs.
29
30Identifying Your Adapter
31========================
32
33For more information on how to identify your adapter, go to the Adapter &
34Driver ID Guide at:
35
36 http://support.intel.com/support/network/sb/CS-008441.htm
37
38Known Issues/Troubleshooting
39============================
40
41 Unloading Physical Function (PF) Driver Causes System Reboots When VM is
42 Running and VF is Loaded on the VM
43 ------------------------------------------------------------------------
44 Do not unload the PF driver (ixgbe) while VFs are assigned to guests.
45
46Support
47=======
48
49For general information, go to the Intel support website at:
50
51 http://support.intel.com
52
53or the Intel Wired Networking project hosted by Sourceforge at:
54
55 http://sourceforge.net/projects/e1000
56
57If an issue is identified with the released source code on the supported
58kernel with a supported adapter, email the specific information related
59to the issue to e1000-devel@lists.sf.net
60
61License
62=======
63
64Intel 10 Gigabit Linux driver.
65Copyright(c) 1999 - 2009 Intel Corporation.
66
67This program is free software; you can redistribute it and/or modify it
68under the terms and conditions of the GNU General Public License,
69version 2, as published by the Free Software Foundation.
70
71This program is distributed in the hope it will be useful, but WITHOUT
72ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
73FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
74more details.
75
76You should have received a copy of the GNU General Public License along with
77this program; if not, write to the Free Software Foundation, Inc.,
7851 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
79
80The full GNU General Public License is included in this distribution in
81the file called "COPYING".
82
83Trademarks
84==========
85
86Intel, Itanium, and Pentium are trademarks or registered trademarks of
87Intel Corporation or its subsidiaries in the United States and other
88countries.
89
90* Other names and brands may be claimed as the property of others.
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt
index a22fd85e3796..09ab0d290326 100644
--- a/Documentation/networking/packet_mmap.txt
+++ b/Documentation/networking/packet_mmap.txt
@@ -2,7 +2,7 @@
2+ ABSTRACT 2+ ABSTRACT
3-------------------------------------------------------------------------------- 3--------------------------------------------------------------------------------
4 4
5This file documents the CONFIG_PACKET_MMAP option available with the PACKET 5This file documents the mmap() facility available with the PACKET
6socket interface on 2.4 and 2.6 kernels. This type of sockets is used for 6socket interface on 2.4 and 2.6 kernels. This type of sockets is used for
7capture network traffic with utilities like tcpdump or any other that needs 7capture network traffic with utilities like tcpdump or any other that needs
8raw access to network interface. 8raw access to network interface.
@@ -44,7 +44,7 @@ enabled. For transmission, check the MTU (Maximum Transmission Unit) used and
44supported by devices of your network. 44supported by devices of your network.
45 45
46-------------------------------------------------------------------------------- 46--------------------------------------------------------------------------------
47+ How to use CONFIG_PACKET_MMAP to improve capture process 47+ How to use mmap() to improve capture process
48-------------------------------------------------------------------------------- 48--------------------------------------------------------------------------------
49 49
50From the user standpoint, you should use the higher level libpcap library, which 50From the user standpoint, you should use the higher level libpcap library, which
@@ -64,7 +64,7 @@ the low level details or want to improve libpcap by including PACKET_MMAP
64support. 64support.
65 65
66-------------------------------------------------------------------------------- 66--------------------------------------------------------------------------------
67+ How to use CONFIG_PACKET_MMAP directly to improve capture process 67+ How to use mmap() directly to improve capture process
68-------------------------------------------------------------------------------- 68--------------------------------------------------------------------------------
69 69
70From the system calls stand point, the use of PACKET_MMAP involves 70From the system calls stand point, the use of PACKET_MMAP involves
@@ -105,7 +105,7 @@ also the mapping of the circular buffer in the user process and
105the use of this buffer. 105the use of this buffer.
106 106
107-------------------------------------------------------------------------------- 107--------------------------------------------------------------------------------
108+ How to use CONFIG_PACKET_MMAP directly to improve transmission process 108+ How to use mmap() directly to improve transmission process
109-------------------------------------------------------------------------------- 109--------------------------------------------------------------------------------
110Transmission process is similar to capture as shown below. 110Transmission process is similar to capture as shown below.
111 111
diff --git a/Documentation/networking/regulatory.txt b/Documentation/networking/regulatory.txt
index ee31369e9e5b..9551622d0a7b 100644
--- a/Documentation/networking/regulatory.txt
+++ b/Documentation/networking/regulatory.txt
@@ -188,3 +188,27 @@ Then in some part of your code after your wiphy has been registered:
188 &mydriver_jp_regdom.reg_rules[i], 188 &mydriver_jp_regdom.reg_rules[i],
189 sizeof(struct ieee80211_reg_rule)); 189 sizeof(struct ieee80211_reg_rule));
190 regulatory_struct_hint(rd); 190 regulatory_struct_hint(rd);
191
192Statically compiled regulatory database
193---------------------------------------
194
195In most situations the userland solution using CRDA as described
196above is the preferred solution. However in some cases a set of
197rules built into the kernel itself may be desirable. To account
198for this situation, a configuration option has been provided
199(i.e. CONFIG_CFG80211_INTERNAL_REGDB). With this option enabled,
200the wireless database information contained in net/wireless/db.txt is
201used to generate a data structure encoded in net/wireless/regdb.c.
202That option also enables code in net/wireless/reg.c which queries
203the data in regdb.c as an alternative to using CRDA.
204
205The file net/wireless/db.txt should be kept up-to-date with the db.txt
206file available in the git repository here:
207
208 git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-regdb.git
209
210Again, most users in most situations should be using the CRDA package
211provided with their distribution, and in most other situations users
212should be building and using CRDA on their own rather than using
213this option. If you are not absolutely sure that you should be using
214CONFIG_CFG80211_INTERNAL_REGDB then _DO_NOT_USE_IT_.
diff --git a/Documentation/networking/skfp.txt b/Documentation/networking/skfp.txt
index abfddf81e34a..203ec66c9fb4 100644
--- a/Documentation/networking/skfp.txt
+++ b/Documentation/networking/skfp.txt
@@ -68,7 +68,7 @@ Compaq adapters (not tested):
68======================= 68=======================
69 69
70From v2.01 on, the driver is integrated in the linux kernel sources. 70From v2.01 on, the driver is integrated in the linux kernel sources.
71Therefor, the installation is the same as for any other adapter 71Therefore, the installation is the same as for any other adapter
72supported by the kernel. 72supported by the kernel.
73Refer to the manual of your distribution about the installation 73Refer to the manual of your distribution about the installation
74of network adapters. 74of network adapters.
diff --git a/Documentation/networking/tcp-thin.txt b/Documentation/networking/tcp-thin.txt
new file mode 100644
index 000000000000..151e229980f1
--- /dev/null
+++ b/Documentation/networking/tcp-thin.txt
@@ -0,0 +1,47 @@
1Thin-streams and TCP
2====================
3A wide range of Internet-based services that use reliable transport
4protocols display what we call thin-stream properties. This means
5that the application sends data with such a low rate that the
6retransmission mechanisms of the transport protocol are not fully
7effective. In time-dependent scenarios (like online games, control
8systems, stock trading etc.) where the user experience depends
9on the data delivery latency, packet loss can be devastating for
10the service quality. Extreme latencies are caused by TCP's
11dependency on the arrival of new data from the application to trigger
12retransmissions effectively through fast retransmit instead of
13waiting for long timeouts.
14
15After analysing a large number of time-dependent interactive
16applications, we have seen that they often produce thin streams
17and also stay with this traffic pattern throughout its entire
18lifespan. The combination of time-dependency and the fact that the
19streams provoke high latencies when using TCP is unfortunate.
20
21In order to reduce application-layer latency when packets are lost,
22a set of mechanisms has been made, which address these latency issues
23for thin streams. In short, if the kernel detects a thin stream,
24the retransmission mechanisms are modified in the following manner:
25
261) If the stream is thin, fast retransmit on the first dupACK.
272) If the stream is thin, do not apply exponential backoff.
28
29These enhancements are applied only if the stream is detected as
30thin. This is accomplished by defining a threshold for the number
31of packets in flight. If there are less than 4 packets in flight,
32fast retransmissions can not be triggered, and the stream is prone
33to experience high retransmission latencies.
34
35Since these mechanisms are targeted at time-dependent applications,
36they must be specifically activated by the application using the
37TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK IOCTLS or the
38tcp_thin_linear_timeouts and tcp_thin_dupack sysctls. Both
39modifications are turned off by default.
40
41References
42==========
43More information on the modifications, as well as a wide range of
44experimental data can be found here:
45"Improving latency for interactive, thin-stream applications over
46reliable transport"
47http://simula.no/research/nd/publications/Simula.nd.477/simula_pdf_file
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c
index a7936fe8444a..bab619a48214 100644
--- a/Documentation/networking/timestamping/timestamping.c
+++ b/Documentation/networking/timestamping/timestamping.c
@@ -370,7 +370,7 @@ int main(int argc, char **argv)
370 } 370 }
371 371
372 sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP); 372 sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
373 if (socket < 0) 373 if (sock < 0)
374 bail("socket"); 374 bail("socket");
375 375
376 memset(&device, 0, sizeof(device)); 376 memset(&device, 0, sizeof(device));
diff --git a/Documentation/pcmcia/locking.txt b/Documentation/pcmcia/locking.txt
new file mode 100644
index 000000000000..68f622bc4064
--- /dev/null
+++ b/Documentation/pcmcia/locking.txt
@@ -0,0 +1,118 @@
1This file explains the locking and exclusion scheme used in the PCCARD
2and PCMCIA subsystems.
3
4
5A) Overview, Locking Hierarchy:
6===============================
7
8pcmcia_socket_list_rwsem - protects only the list of sockets
9- skt_mutex - serializes card insert / ejection
10 - ops_mutex - serializes socket operation
11
12
13B) Exclusion
14============
15
16The following functions and callbacks to struct pcmcia_socket must
17be called with "skt_mutex" held:
18
19 socket_detect_change()
20 send_event()
21 socket_reset()
22 socket_shutdown()
23 socket_setup()
24 socket_remove()
25 socket_insert()
26 socket_early_resume()
27 socket_late_resume()
28 socket_resume()
29 socket_suspend()
30
31 struct pcmcia_callback *callback
32
33The following functions and callbacks to struct pcmcia_socket must
34be called with "ops_mutex" held:
35
36 socket_reset()
37 socket_setup()
38
39 struct pccard_operations *ops
40 struct pccard_resource_ops *resource_ops;
41
42Note that send_event() and struct pcmcia_callback *callback must not be
43called with "ops_mutex" held.
44
45
46C) Protection
47=============
48
491. Global Data:
50---------------
51struct list_head pcmcia_socket_list;
52
53protected by pcmcia_socket_list_rwsem;
54
55
562. Per-Socket Data:
57-------------------
58The resource_ops and their data are protected by ops_mutex.
59
60The "main" struct pcmcia_socket is protected as follows (read-only fields
61or single-use fields not mentioned):
62
63- by pcmcia_socket_list_rwsem:
64 struct list_head socket_list;
65
66- by thread_lock:
67 unsigned int thread_events;
68
69- by skt_mutex:
70 u_int suspended_state;
71 void (*tune_bridge);
72 struct pcmcia_callback *callback;
73 int resume_status;
74
75- by ops_mutex:
76 socket_state_t socket;
77 u_int state;
78 u_short lock_count;
79 pccard_mem_map cis_mem;
80 void __iomem *cis_virt;
81 struct { } irq;
82 io_window_t io[];
83 pccard_mem_map win[];
84 struct list_head cis_cache;
85 size_t fake_cis_len;
86 u8 *fake_cis;
87 u_int irq_mask;
88 void (*zoom_video);
89 int (*power_hook);
90 u8 resource...;
91 struct list_head devices_list;
92 u8 device_count;
93 struct pcmcia_state;
94
95
963. Per PCMCIA-device Data:
97--------------------------
98
99The "main" struct pcmcia_devie is protected as follows (read-only fields
100or single-use fields not mentioned):
101
102
103- by pcmcia_socket->ops_mutex:
104 struct list_head socket_device_list;
105 struct config_t *function_config;
106 u16 _irq:1;
107 u16 _io:1;
108 u16 _win:4;
109 u16 _locked:1;
110 u16 allow_func_id_match:1;
111 u16 suspended:1;
112 u16 _removed:1;
113
114- by the PCMCIA driver:
115 io_req_t io;
116 irq_req_t irq;
117 config_req_t conf;
118 window_handle_t win;
diff --git a/Documentation/pnp.txt b/Documentation/pnp.txt
index a327db67782a..763e4659bf18 100644
--- a/Documentation/pnp.txt
+++ b/Documentation/pnp.txt
@@ -57,7 +57,7 @@ PC standard floppy disk controller
57# cat resources 57# cat resources
58DISABLED 58DISABLED
59 59
60- Notice the string "DISABLED". THis means the device is not active. 60- Notice the string "DISABLED". This means the device is not active.
61 61
623.) check the device's possible configurations (optional) 623.) check the device's possible configurations (optional)
63# cat options 63# cat options
@@ -139,7 +139,7 @@ Plug and Play but it is planned to be in the near future.
139 139
140Requirements for a Linux PnP protocol: 140Requirements for a Linux PnP protocol:
1411.) the protocol must use EISA IDs 1411.) the protocol must use EISA IDs
1422.) the protocol must inform the PnP Layer of a devices current configuration 1422.) the protocol must inform the PnP Layer of a device's current configuration
143- the ability to set resources is optional but preferred. 143- the ability to set resources is optional but preferred.
144 144
145The following are PnP protocol related functions: 145The following are PnP protocol related functions:
@@ -158,7 +158,7 @@ pnp_remove_device
158- automatically will free mem used by the device and related structures 158- automatically will free mem used by the device and related structures
159 159
160pnp_add_id 160pnp_add_id
161- adds a EISA ID to the list of supported IDs for the specified device 161- adds an EISA ID to the list of supported IDs for the specified device
162 162
163For more information consult the source of a protocol such as 163For more information consult the source of a protocol such as
164/drivers/pnp/pnpbios/core.c. 164/drivers/pnp/pnpbios/core.c.
@@ -167,7 +167,7 @@ For more information consult the source of a protocol such as
167 167
168Linux Plug and Play Drivers 168Linux Plug and Play Drivers
169--------------------------- 169---------------------------
170 This section contains information for linux PnP driver developers. 170 This section contains information for Linux PnP driver developers.
171 171
172The New Way 172The New Way
173........... 173...........
@@ -235,11 +235,10 @@ static int __init serial8250_pnp_init(void)
235The Old Way 235The Old Way
236........... 236...........
237 237
238a series of compatibility functions have been created to make it easy to convert 238A series of compatibility functions have been created to make it easy to convert
239
240ISAPNP drivers. They should serve as a temporary solution only. 239ISAPNP drivers. They should serve as a temporary solution only.
241 240
242they are as follows: 241They are as follows:
243 242
244struct pnp_card *pnp_find_card(unsigned short vendor, 243struct pnp_card *pnp_find_card(unsigned short vendor,
245 unsigned short device, 244 unsigned short device,
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt
index 356fd86f4ea8..55b859b3bc72 100644
--- a/Documentation/power/runtime_pm.txt
+++ b/Documentation/power/runtime_pm.txt
@@ -224,6 +224,12 @@ defined in include/linux/pm.h:
224 RPM_SUSPENDED, which means that each device is initially regarded by the 224 RPM_SUSPENDED, which means that each device is initially regarded by the
225 PM core as 'suspended', regardless of its real hardware status 225 PM core as 'suspended', regardless of its real hardware status
226 226
227 unsigned int runtime_auto;
228 - if set, indicates that the user space has allowed the device driver to
229 power manage the device at run time via the /sys/devices/.../power/control
230 interface; it may only be modified with the help of the pm_runtime_allow()
231 and pm_runtime_forbid() helper functions
232
227All of the above fields are members of the 'power' member of 'struct device'. 233All of the above fields are members of the 'power' member of 'struct device'.
228 234
2294. Run-time PM Device Helper Functions 2354. Run-time PM Device Helper Functions
@@ -250,7 +256,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
250 to suspend the device again in future 256 to suspend the device again in future
251 257
252 int pm_runtime_resume(struct device *dev); 258 int pm_runtime_resume(struct device *dev);
253 - execute the subsystem-leve resume callback for the device; returns 0 on 259 - execute the subsystem-level resume callback for the device; returns 0 on
254 success, 1 if the device's run-time PM status was already 'active' or 260 success, 1 if the device's run-time PM status was already 'active' or
255 error code on failure, where -EAGAIN means it may be safe to attempt to 261 error code on failure, where -EAGAIN means it may be safe to attempt to
256 resume the device again in future, but 'power.runtime_error' should be 262 resume the device again in future, but 'power.runtime_error' should be
@@ -329,6 +335,20 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h:
329 'power.runtime_error' is set or 'power.disable_depth' is greater than 335 'power.runtime_error' is set or 'power.disable_depth' is greater than
330 zero) 336 zero)
331 337
338 bool pm_runtime_suspended(struct device *dev);
339 - return true if the device's runtime PM status is 'suspended', or false
340 otherwise
341
342 void pm_runtime_allow(struct device *dev);
343 - set the power.runtime_auto flag for the device and decrease its usage
344 counter (used by the /sys/devices/.../power/control interface to
345 effectively allow the device to be power managed at run time)
346
347 void pm_runtime_forbid(struct device *dev);
348 - unset the power.runtime_auto flag for the device and increase its usage
349 counter (used by the /sys/devices/.../power/control interface to
350 effectively prevent the device from being power managed at run time)
351
332It is safe to execute the following helper functions from interrupt context: 352It is safe to execute the following helper functions from interrupt context:
333 353
334pm_request_idle() 354pm_request_idle()
@@ -382,6 +402,18 @@ may be desirable to suspend the device as soon as ->probe() or ->remove() has
382finished, so the PM core uses pm_runtime_idle_sync() to invoke the 402finished, so the PM core uses pm_runtime_idle_sync() to invoke the
383subsystem-level idle callback for the device at that time. 403subsystem-level idle callback for the device at that time.
384 404
405The user space can effectively disallow the driver of the device to power manage
406it at run time by changing the value of its /sys/devices/.../power/control
407attribute to "on", which causes pm_runtime_forbid() to be called. In principle,
408this mechanism may also be used by the driver to effectively turn off the
409run-time power management of the device until the user space turns it on.
410Namely, during the initialization the driver can make sure that the run-time PM
411status of the device is 'active' and call pm_runtime_forbid(). It should be
412noted, however, that if the user space has already intentionally changed the
413value of /sys/devices/.../power/control to "auto" to allow the driver to power
414manage the device at run time, the driver may confuse it by using
415pm_runtime_forbid() this way.
416
3856. Run-time PM and System Sleep 4176. Run-time PM and System Sleep
386 418
387Run-time PM and system sleep (i.e., system suspend and hibernation, also known 419Run-time PM and system sleep (i.e., system suspend and hibernation, also known
@@ -431,3 +463,64 @@ The PM core always increments the run-time usage counter before calling the
431->prepare() callback and decrements it after calling the ->complete() callback. 463->prepare() callback and decrements it after calling the ->complete() callback.
432Hence disabling run-time PM temporarily like this will not cause any run-time 464Hence disabling run-time PM temporarily like this will not cause any run-time
433suspend callbacks to be lost. 465suspend callbacks to be lost.
466
4677. Generic subsystem callbacks
468
469Subsystems may wish to conserve code space by using the set of generic power
470management callbacks provided by the PM core, defined in
471driver/base/power/generic_ops.c:
472
473 int pm_generic_runtime_idle(struct device *dev);
474 - invoke the ->runtime_idle() callback provided by the driver of this
475 device, if defined, and call pm_runtime_suspend() for this device if the
476 return value is 0 or the callback is not defined
477
478 int pm_generic_runtime_suspend(struct device *dev);
479 - invoke the ->runtime_suspend() callback provided by the driver of this
480 device and return its result, or return -EINVAL if not defined
481
482 int pm_generic_runtime_resume(struct device *dev);
483 - invoke the ->runtime_resume() callback provided by the driver of this
484 device and return its result, or return -EINVAL if not defined
485
486 int pm_generic_suspend(struct device *dev);
487 - if the device has not been suspended at run time, invoke the ->suspend()
488 callback provided by its driver and return its result, or return 0 if not
489 defined
490
491 int pm_generic_resume(struct device *dev);
492 - invoke the ->resume() callback provided by the driver of this device and,
493 if successful, change the device's runtime PM status to 'active'
494
495 int pm_generic_freeze(struct device *dev);
496 - if the device has not been suspended at run time, invoke the ->freeze()
497 callback provided by its driver and return its result, or return 0 if not
498 defined
499
500 int pm_generic_thaw(struct device *dev);
501 - if the device has not been suspended at run time, invoke the ->thaw()
502 callback provided by its driver and return its result, or return 0 if not
503 defined
504
505 int pm_generic_poweroff(struct device *dev);
506 - if the device has not been suspended at run time, invoke the ->poweroff()
507 callback provided by its driver and return its result, or return 0 if not
508 defined
509
510 int pm_generic_restore(struct device *dev);
511 - invoke the ->restore() callback provided by the driver of this device and,
512 if successful, change the device's runtime PM status to 'active'
513
514These functions can be assigned to the ->runtime_idle(), ->runtime_suspend(),
515->runtime_resume(), ->suspend(), ->resume(), ->freeze(), ->thaw(), ->poweroff(),
516or ->restore() callback pointers in the subsystem-level dev_pm_ops structures.
517
518If a subsystem wishes to use all of them at the same time, it can simply assign
519the GENERIC_SUBSYS_PM_OPS macro, defined in include/linux/pm.h, to its
520dev_pm_ops structure pointer.
521
522Device drivers that wish to use the same function as a system suspend, freeze,
523poweroff and run-time suspend callback, and similarly for system resume, thaw,
524restore, and run-time resume, can achieve this with the help of the
525UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its
526last argument to NULL).
diff --git a/Documentation/powerpc/dts-bindings/fsl/can.txt b/Documentation/powerpc/dts-bindings/fsl/can.txt
new file mode 100644
index 000000000000..2fa4fcd38fd6
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/can.txt
@@ -0,0 +1,53 @@
1CAN Device Tree Bindings
2------------------------
3
4(c) 2006-2009 Secret Lab Technologies Ltd
5Grant Likely <grant.likely@secretlab.ca>
6
7fsl,mpc5200-mscan nodes
8-----------------------
9In addition to the required compatible-, reg- and interrupt-properties, you can
10also specify which clock source shall be used for the controller:
11
12- fsl,mscan-clock-source : a string describing the clock source. Valid values
13 are: "ip" for ip bus clock
14 "ref" for reference clock (XTAL)
15 "ref" is default in case this property is not
16 present.
17
18fsl,mpc5121-mscan nodes
19-----------------------
20In addition to the required compatible-, reg- and interrupt-properties, you can
21also specify which clock source and divider shall be used for the controller:
22
23- fsl,mscan-clock-source : a string describing the clock source. Valid values
24 are: "ip" for ip bus clock
25 "ref" for reference clock
26 "sys" for system clock
27 If this property is not present, an optimal CAN
28 clock source and frequency based on the system
29 clock will be selected. If this is not possible,
30 the reference clock will be used.
31
32- fsl,mscan-clock-divider: for the reference and system clock, an additional
33 clock divider can be specified. By default, a
34 value of 1 is used.
35
36Note that the MPC5121 Rev. 1 processor is not supported.
37
38Examples:
39 can@1300 {
40 compatible = "fsl,mpc5121-mscan";
41 interrupts = <12 0x8>;
42 interrupt-parent = <&ipic>;
43 reg = <0x1300 0x80>;
44 };
45
46 can@1380 {
47 compatible = "fsl,mpc5121-mscan";
48 interrupts = <13 0x8>;
49 interrupt-parent = <&ipic>;
50 reg = <0x1380 0x80>;
51 fsl,mscan-clock-source = "ref";
52 fsl,mscan-clock-divider = <3>;
53 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/dma.txt b/Documentation/powerpc/dts-bindings/fsl/dma.txt
index 0732cdd05ba1..2a4b4bce6110 100644
--- a/Documentation/powerpc/dts-bindings/fsl/dma.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/dma.txt
@@ -44,21 +44,29 @@ Example:
44 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; 44 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
45 cell-index = <0>; 45 cell-index = <0>;
46 reg = <0 0x80>; 46 reg = <0 0x80>;
47 interrupt-parent = <&ipic>;
48 interrupts = <71 8>;
47 }; 49 };
48 dma-channel@80 { 50 dma-channel@80 {
49 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; 51 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
50 cell-index = <1>; 52 cell-index = <1>;
51 reg = <0x80 0x80>; 53 reg = <0x80 0x80>;
54 interrupt-parent = <&ipic>;
55 interrupts = <71 8>;
52 }; 56 };
53 dma-channel@100 { 57 dma-channel@100 {
54 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; 58 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
55 cell-index = <2>; 59 cell-index = <2>;
56 reg = <0x100 0x80>; 60 reg = <0x100 0x80>;
61 interrupt-parent = <&ipic>;
62 interrupts = <71 8>;
57 }; 63 };
58 dma-channel@180 { 64 dma-channel@180 {
59 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; 65 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
60 cell-index = <3>; 66 cell-index = <3>;
61 reg = <0x180 0x80>; 67 reg = <0x180 0x80>;
68 interrupt-parent = <&ipic>;
69 interrupts = <71 8>;
62 }; 70 };
63 }; 71 };
64 72
diff --git a/Documentation/powerpc/dts-bindings/fsl/i2c.txt b/Documentation/powerpc/dts-bindings/fsl/i2c.txt
index b6d2e21474f9..50da20310585 100644
--- a/Documentation/powerpc/dts-bindings/fsl/i2c.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/i2c.txt
@@ -2,15 +2,14 @@
2 2
3Required properties : 3Required properties :
4 4
5 - device_type : Should be "i2c"
6 - reg : Offset and length of the register set for the device 5 - reg : Offset and length of the register set for the device
6 - compatible : should be "fsl,CHIP-i2c" where CHIP is the name of a
7 compatible processor, e.g. mpc8313, mpc8543, mpc8544, mpc5121,
8 mpc5200 or mpc5200b. For the mpc5121, an additional node
9 "fsl,mpc5121-i2c-ctrl" is required as shown in the example below.
7 10
8Recommended properties : 11Recommended properties :
9 12
10 - compatible : compatibility list with 2 entries, the first should
11 be "fsl,CHIP-i2c" where CHIP is the name of a compatible processor,
12 e.g. mpc8313, mpc8543, mpc8544, mpc5200 or mpc5200b. The second one
13 should be "fsl-i2c".
14 - interrupts : <a b> where a is the interrupt number and b is a 13 - interrupts : <a b> where a is the interrupt number and b is a
15 field that represents an encoding of the sense and level 14 field that represents an encoding of the sense and level
16 information for the interrupt. This should be encoded based on 15 information for the interrupt. This should be encoded based on
@@ -24,25 +23,40 @@ Recommended properties :
24 23
25Examples : 24Examples :
26 25
26 /* MPC5121 based board */
27 i2c@1740 {
28 #address-cells = <1>;
29 #size-cells = <0>;
30 compatible = "fsl,mpc5121-i2c", "fsl-i2c";
31 reg = <0x1740 0x20>;
32 interrupts = <11 0x8>;
33 interrupt-parent = <&ipic>;
34 clock-frequency = <100000>;
35 };
36
37 i2ccontrol@1760 {
38 compatible = "fsl,mpc5121-i2c-ctrl";
39 reg = <0x1760 0x8>;
40 };
41
42 /* MPC5200B based board */
27 i2c@3d00 { 43 i2c@3d00 {
28 #address-cells = <1>; 44 #address-cells = <1>;
29 #size-cells = <0>; 45 #size-cells = <0>;
30 compatible = "fsl,mpc5200b-i2c","fsl,mpc5200-i2c","fsl-i2c"; 46 compatible = "fsl,mpc5200b-i2c","fsl,mpc5200-i2c","fsl-i2c";
31 cell-index = <0>;
32 reg = <0x3d00 0x40>; 47 reg = <0x3d00 0x40>;
33 interrupts = <2 15 0>; 48 interrupts = <2 15 0>;
34 interrupt-parent = <&mpc5200_pic>; 49 interrupt-parent = <&mpc5200_pic>;
35 fsl,preserve-clocking; 50 fsl,preserve-clocking;
36 }; 51 };
37 52
53 /* MPC8544 base board */
38 i2c@3100 { 54 i2c@3100 {
39 #address-cells = <1>; 55 #address-cells = <1>;
40 #size-cells = <0>; 56 #size-cells = <0>;
41 cell-index = <1>;
42 compatible = "fsl,mpc8544-i2c", "fsl-i2c"; 57 compatible = "fsl,mpc8544-i2c", "fsl-i2c";
43 reg = <0x3100 0x100>; 58 reg = <0x3100 0x100>;
44 interrupts = <43 2>; 59 interrupts = <43 2>;
45 interrupt-parent = <&mpic>; 60 interrupt-parent = <&mpic>;
46 clock-frequency = <400000>; 61 clock-frequency = <400000>;
47 }; 62 };
48
diff --git a/Documentation/powerpc/dts-bindings/fsl/mpc5121-psc.txt b/Documentation/powerpc/dts-bindings/fsl/mpc5121-psc.txt
new file mode 100644
index 000000000000..8832e8798912
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/mpc5121-psc.txt
@@ -0,0 +1,70 @@
1MPC5121 PSC Device Tree Bindings
2
3PSC in UART mode
4----------------
5
6For PSC in UART mode the needed PSC serial devices
7are specified by fsl,mpc5121-psc-uart nodes in the
8fsl,mpc5121-immr SoC node. Additionally the PSC FIFO
9Controller node fsl,mpc5121-psc-fifo is requered there:
10
11fsl,mpc5121-psc-uart nodes
12--------------------------
13
14Required properties :
15 - compatible : Should contain "fsl,mpc5121-psc-uart" and "fsl,mpc5121-psc"
16 - cell-index : Index of the PSC in hardware
17 - reg : Offset and length of the register set for the PSC device
18 - interrupts : <a b> where a is the interrupt number of the
19 PSC FIFO Controller and b is a field that represents an
20 encoding of the sense and level information for the interrupt.
21 - interrupt-parent : the phandle for the interrupt controller that
22 services interrupts for this device.
23
24Recommended properties :
25 - fsl,rx-fifo-size : the size of the RX fifo slice (a multiple of 4)
26 - fsl,tx-fifo-size : the size of the TX fifo slice (a multiple of 4)
27
28
29fsl,mpc5121-psc-fifo node
30-------------------------
31
32Required properties :
33 - compatible : Should be "fsl,mpc5121-psc-fifo"
34 - reg : Offset and length of the register set for the PSC
35 FIFO Controller
36 - interrupts : <a b> where a is the interrupt number of the
37 PSC FIFO Controller and b is a field that represents an
38 encoding of the sense and level information for the interrupt.
39 - interrupt-parent : the phandle for the interrupt controller that
40 services interrupts for this device.
41
42
43Example for a board using PSC0 and PSC1 devices in serial mode:
44
45serial@11000 {
46 compatible = "fsl,mpc5121-psc-uart", "fsl,mpc5121-psc";
47 cell-index = <0>;
48 reg = <0x11000 0x100>;
49 interrupts = <40 0x8>;
50 interrupt-parent = < &ipic >;
51 fsl,rx-fifo-size = <16>;
52 fsl,tx-fifo-size = <16>;
53};
54
55serial@11100 {
56 compatible = "fsl,mpc5121-psc-uart", "fsl,mpc5121-psc";
57 cell-index = <1>;
58 reg = <0x11100 0x100>;
59 interrupts = <40 0x8>;
60 interrupt-parent = < &ipic >;
61 fsl,rx-fifo-size = <16>;
62 fsl,tx-fifo-size = <16>;
63};
64
65pscfifo@11f00 {
66 compatible = "fsl,mpc5121-psc-fifo";
67 reg = <0x11f00 0x100>;
68 interrupts = <40 0x8>;
69 interrupt-parent = < &ipic >;
70};
diff --git a/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt b/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt
index 5c6602dbfdc2..4ccb2cd5df94 100644
--- a/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt
@@ -195,11 +195,4 @@ External interrupts:
195 195
196fsl,mpc5200-mscan nodes 196fsl,mpc5200-mscan nodes
197----------------------- 197-----------------------
198In addition to the required compatible-, reg- and interrupt-properites, you can 198See file can.txt in this directory.
199also specify which clock source shall be used for the controller:
200
201- fsl,mscan-clock-source- a string describing the clock source. Valid values
202 are: "ip" for ip bus clock
203 "ref" for reference clock (XTAL)
204 "ref" is default in case this property is not
205 present.
diff --git a/Documentation/powerpc/dts-bindings/fsl/spi.txt b/Documentation/powerpc/dts-bindings/fsl/spi.txt
index e7d9a344c4f4..80510c018eea 100644
--- a/Documentation/powerpc/dts-bindings/fsl/spi.txt
+++ b/Documentation/powerpc/dts-bindings/fsl/spi.txt
@@ -13,6 +13,11 @@ Required properties:
13- interrupt-parent : the phandle for the interrupt controller that 13- interrupt-parent : the phandle for the interrupt controller that
14 services interrupts for this device. 14 services interrupts for this device.
15 15
16Optional properties:
17- gpios : specifies the gpio pins to be used for chipselects.
18 The gpios will be referred to as reg = <index> in the SPI child nodes.
19 If unspecified, a single SPI device without a chip select can be used.
20
16Example: 21Example:
17 spi@4c0 { 22 spi@4c0 {
18 cell-index = <0>; 23 cell-index = <0>;
@@ -21,4 +26,6 @@ Example:
21 interrupts = <82 0>; 26 interrupts = <82 0>;
22 interrupt-parent = <700>; 27 interrupt-parent = <700>;
23 mode = "cpu"; 28 mode = "cpu";
29 gpios = <&gpio 18 1 // device reg=<0>
30 &gpio 19 1>; // device reg=<1>
24 }; 31 };
diff --git a/Documentation/powerpc/ptrace.txt b/Documentation/powerpc/ptrace.txt
new file mode 100644
index 000000000000..f4a5499b7bc6
--- /dev/null
+++ b/Documentation/powerpc/ptrace.txt
@@ -0,0 +1,134 @@
1GDB intends to support the following hardware debug features of BookE
2processors:
3
44 hardware breakpoints (IAC)
52 hardware watchpoints (read, write and read-write) (DAC)
62 value conditions for the hardware watchpoints (DVC)
7
8For that, we need to extend ptrace so that GDB can query and set these
9resources. Since we're extending, we're trying to create an interface
10that's extendable and that covers both BookE and server processors, so
11that GDB doesn't need to special-case each of them. We added the
12following 3 new ptrace requests.
13
141. PTRACE_PPC_GETHWDEBUGINFO
15
16Query for GDB to discover the hardware debug features. The main info to
17be returned here is the minimum alignment for the hardware watchpoints.
18BookE processors don't have restrictions here, but server processors have
19an 8-byte alignment restriction for hardware watchpoints. We'd like to avoid
20adding special cases to GDB based on what it sees in AUXV.
21
22Since we're at it, we added other useful info that the kernel can return to
23GDB: this query will return the number of hardware breakpoints, hardware
24watchpoints and whether it supports a range of addresses and a condition.
25The query will fill the following structure provided by the requesting process:
26
27struct ppc_debug_info {
28 unit32_t version;
29 unit32_t num_instruction_bps;
30 unit32_t num_data_bps;
31 unit32_t num_condition_regs;
32 unit32_t data_bp_alignment;
33 unit32_t sizeof_condition; /* size of the DVC register */
34 uint64_t features; /* bitmask of the individual flags */
35};
36
37features will have bits indicating whether there is support for:
38
39#define PPC_DEBUG_FEATURE_INSN_BP_RANGE 0x1
40#define PPC_DEBUG_FEATURE_INSN_BP_MASK 0x2
41#define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4
42#define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8
43
442. PTRACE_SETHWDEBUG
45
46Sets a hardware breakpoint or watchpoint, according to the provided structure:
47
48struct ppc_hw_breakpoint {
49 uint32_t version;
50#define PPC_BREAKPOINT_TRIGGER_EXECUTE 0x1
51#define PPC_BREAKPOINT_TRIGGER_READ 0x2
52#define PPC_BREAKPOINT_TRIGGER_WRITE 0x4
53 uint32_t trigger_type; /* only some combinations allowed */
54#define PPC_BREAKPOINT_MODE_EXACT 0x0
55#define PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE 0x1
56#define PPC_BREAKPOINT_MODE_RANGE_EXCLUSIVE 0x2
57#define PPC_BREAKPOINT_MODE_MASK 0x3
58 uint32_t addr_mode; /* address match mode */
59
60#define PPC_BREAKPOINT_CONDITION_MODE 0x3
61#define PPC_BREAKPOINT_CONDITION_NONE 0x0
62#define PPC_BREAKPOINT_CONDITION_AND 0x1
63#define PPC_BREAKPOINT_CONDITION_EXACT 0x1 /* different name for the same thing as above */
64#define PPC_BREAKPOINT_CONDITION_OR 0x2
65#define PPC_BREAKPOINT_CONDITION_AND_OR 0x3
66#define PPC_BREAKPOINT_CONDITION_BE_ALL 0x00ff0000 /* byte enable bits */
67#define PPC_BREAKPOINT_CONDITION_BE(n) (1<<((n)+16))
68 uint32_t condition_mode; /* break/watchpoint condition flags */
69
70 uint64_t addr;
71 uint64_t addr2;
72 uint64_t condition_value;
73};
74
75A request specifies one event, not necessarily just one register to be set.
76For instance, if the request is for a watchpoint with a condition, both the
77DAC and DVC registers will be set in the same request.
78
79With this GDB can ask for all kinds of hardware breakpoints and watchpoints
80that the BookE supports. COMEFROM breakpoints available in server processors
81are not contemplated, but that is out of the scope of this work.
82
83ptrace will return an integer (handle) uniquely identifying the breakpoint or
84watchpoint just created. This integer will be used in the PTRACE_DELHWDEBUG
85request to ask for its removal. Return -ENOSPC if the requested breakpoint
86can't be allocated on the registers.
87
88Some examples of using the structure to:
89
90- set a breakpoint in the first breakpoint register
91
92 p.version = PPC_DEBUG_CURRENT_VERSION;
93 p.trigger_type = PPC_BREAKPOINT_TRIGGER_EXECUTE;
94 p.addr_mode = PPC_BREAKPOINT_MODE_EXACT;
95 p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
96 p.addr = (uint64_t) address;
97 p.addr2 = 0;
98 p.condition_value = 0;
99
100- set a watchpoint which triggers on reads in the second watchpoint register
101
102 p.version = PPC_DEBUG_CURRENT_VERSION;
103 p.trigger_type = PPC_BREAKPOINT_TRIGGER_READ;
104 p.addr_mode = PPC_BREAKPOINT_MODE_EXACT;
105 p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
106 p.addr = (uint64_t) address;
107 p.addr2 = 0;
108 p.condition_value = 0;
109
110- set a watchpoint which triggers only with a specific value
111
112 p.version = PPC_DEBUG_CURRENT_VERSION;
113 p.trigger_type = PPC_BREAKPOINT_TRIGGER_READ;
114 p.addr_mode = PPC_BREAKPOINT_MODE_EXACT;
115 p.condition_mode = PPC_BREAKPOINT_CONDITION_AND | PPC_BREAKPOINT_CONDITION_BE_ALL;
116 p.addr = (uint64_t) address;
117 p.addr2 = 0;
118 p.condition_value = (uint64_t) condition;
119
120- set a ranged hardware breakpoint
121
122 p.version = PPC_DEBUG_CURRENT_VERSION;
123 p.trigger_type = PPC_BREAKPOINT_TRIGGER_EXECUTE;
124 p.addr_mode = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE;
125 p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
126 p.addr = (uint64_t) begin_range;
127 p.addr2 = (uint64_t) end_range;
128 p.condition_value = 0;
129
1303. PTRACE_DELHWDEBUG
131
132Takes an integer which identifies an existing breakpoint or watchpoint
133(i.e., the value returned from PTRACE_SETHWDEBUG), and deletes the
134corresponding breakpoint or watchpoint..
diff --git a/Documentation/s390/CommonIO b/Documentation/s390/CommonIO
index 339207d11d95..d378cba66456 100644
--- a/Documentation/s390/CommonIO
+++ b/Documentation/s390/CommonIO
@@ -87,6 +87,12 @@ Command line parameters
87 compatibility, by the device number in hexadecimal (0xabcd or abcd). Device 87 compatibility, by the device number in hexadecimal (0xabcd or abcd). Device
88 numbers given as 0xabcd will be interpreted as 0.0.abcd. 88 numbers given as 0xabcd will be interpreted as 0.0.abcd.
89 89
90* /proc/cio_settle
91
92 A write request to this file is blocked until all queued cio actions are
93 handled. This will allow userspace to wait for pending work affecting
94 device availability after changing cio_ignore or the hardware configuration.
95
90* For some of the information present in the /proc filesystem in 2.4 (namely, 96* For some of the information present in the /proc filesystem in 2.4 (namely,
91 /proc/subchannels and /proc/chpids), see driver-model.txt. 97 /proc/subchannels and /proc/chpids), see driver-model.txt.
92 Information formerly in /proc/irq_count is now in /proc/interrupts. 98 Information formerly in /proc/irq_count is now in /proc/interrupts.
diff --git a/Documentation/s390/driver-model.txt b/Documentation/s390/driver-model.txt
index bde473df748d..ed265cf54cde 100644
--- a/Documentation/s390/driver-model.txt
+++ b/Documentation/s390/driver-model.txt
@@ -223,8 +223,8 @@ touched by the driver - it should use the ccwgroup device's driver_data for its
223private data. 223private data.
224 224
225To implement a ccwgroup driver, please refer to include/asm/ccwgroup.h. Keep in 225To implement a ccwgroup driver, please refer to include/asm/ccwgroup.h. Keep in
226mind that most drivers will need to implement both a ccwgroup and a ccw driver 226mind that most drivers will need to implement both a ccwgroup and a ccw
227(unless you have a meta ccw driver, like cu3088 for lcs and ctc). 227driver.
228 228
229 229
2302. Channel paths 2302. Channel paths
diff --git a/Documentation/s390/kvm.txt b/Documentation/s390/kvm.txt
index 6f5ceb0f09fc..85f3280d7ef6 100644
--- a/Documentation/s390/kvm.txt
+++ b/Documentation/s390/kvm.txt
@@ -102,7 +102,7 @@ args: unsigned long
102see also: include/linux/kvm.h 102see also: include/linux/kvm.h
103This ioctl stores the state of the cpu at the guest real address given as 103This ioctl stores the state of the cpu at the guest real address given as
104argument, unless one of the following values defined in include/linux/kvm.h 104argument, unless one of the following values defined in include/linux/kvm.h
105is given as arguement: 105is given as argument:
106KVM_S390_STORE_STATUS_NOADDR - the CPU stores its status to the save area in 106KVM_S390_STORE_STATUS_NOADDR - the CPU stores its status to the save area in
107absolute lowcore as defined by the principles of operation 107absolute lowcore as defined by the principles of operation
108KVM_S390_STORE_STATUS_PREFIXED - the CPU stores its status to the save area in 108KVM_S390_STORE_STATUS_PREFIXED - the CPU stores its status to the save area in
diff --git a/Documentation/scsi/ChangeLog.lpfc b/Documentation/scsi/ChangeLog.lpfc
index ff19a52fe004..2ffc1148eb95 100644
--- a/Documentation/scsi/ChangeLog.lpfc
+++ b/Documentation/scsi/ChangeLog.lpfc
@@ -989,8 +989,8 @@ Changes from 20040709 to 20040716
989 * Remove redundant port_cmp != 2 check in if 989 * Remove redundant port_cmp != 2 check in if
990 (!port_cmp) { .... if (port_cmp != 2).... } 990 (!port_cmp) { .... if (port_cmp != 2).... }
991 * Clock changes: removed struct clk_data and timerList. 991 * Clock changes: removed struct clk_data and timerList.
992 * Clock changes: seperate nodev_tmo and els_retry_delay into 2 992 * Clock changes: separate nodev_tmo and els_retry_delay into 2
993 seperate timers and convert to 1 argument changed 993 separate timers and convert to 1 argument changed
994 LPFC_NODE_FARP_PEND_t to struct lpfc_node_farp_pend convert 994 LPFC_NODE_FARP_PEND_t to struct lpfc_node_farp_pend convert
995 ipfarp_tmo to 1 argument convert target struct tmofunc and 995 ipfarp_tmo to 1 argument convert target struct tmofunc and
996 rtplunfunc to 1 argument * cr_count, cr_delay and 996 rtplunfunc to 1 argument * cr_count, cr_delay and
@@ -1514,7 +1514,7 @@ Changes from 20040402 to 20040409
1514 * Remove unused elxclock declaration in elx_sli.h. 1514 * Remove unused elxclock declaration in elx_sli.h.
1515 * Since everywhere IOCB_ENTRY is used, the return value is cast, 1515 * Since everywhere IOCB_ENTRY is used, the return value is cast,
1516 move the cast into the macro. 1516 move the cast into the macro.
1517 * Split ioctls out into seperate files 1517 * Split ioctls out into separate files
1518 1518
1519Changes from 20040326 to 20040402 1519Changes from 20040326 to 20040402
1520 1520
@@ -1534,7 +1534,7 @@ Changes from 20040326 to 20040402
1534 * Unused variable cleanup 1534 * Unused variable cleanup
1535 * Use Linux list macros for DMABUF_t 1535 * Use Linux list macros for DMABUF_t
1536 * Break up ioctls into 3 sections, dfc, util, hbaapi 1536 * Break up ioctls into 3 sections, dfc, util, hbaapi
1537 rearranged code so this could be easily seperated into a 1537 rearranged code so this could be easily separated into a
1538 differnet module later All 3 are currently turned on by 1538 differnet module later All 3 are currently turned on by
1539 defines in lpfc_ioctl.c LPFC_DFC_IOCTL, LPFC_UTIL_IOCTL, 1539 defines in lpfc_ioctl.c LPFC_DFC_IOCTL, LPFC_UTIL_IOCTL,
1540 LPFC_HBAAPI_IOCTL 1540 LPFC_HBAAPI_IOCTL
@@ -1551,7 +1551,7 @@ Changes from 20040326 to 20040402
1551 started by lpfc_online(). lpfc_offline() only stopped 1551 started by lpfc_online(). lpfc_offline() only stopped
1552 els_timeout routine. It now stops all timeout routines 1552 els_timeout routine. It now stops all timeout routines
1553 associated with that hba. 1553 associated with that hba.
1554 * Replace seperate next and prev pointers in struct 1554 * Replace separate next and prev pointers in struct
1555 lpfc_bindlist with list_head type. In elxHBA_t, replace 1555 lpfc_bindlist with list_head type. In elxHBA_t, replace
1556 fc_nlpbind_start and _end with fc_nlpbind_list and use 1556 fc_nlpbind_start and _end with fc_nlpbind_list and use
1557 list_head macros to access it. 1557 list_head macros to access it.
diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas
index 17ffa0607712..30023568805e 100644
--- a/Documentation/scsi/ChangeLog.megaraid_sas
+++ b/Documentation/scsi/ChangeLog.megaraid_sas
@@ -1,3 +1,19 @@
11 Release Date : Thur. Oct 29, 2009 09:12:45 PST 2009 -
2 (emaild-id:megaraidlinux@lsi.com)
3 Bo Yang
4
52 Current Version : 00.00.04.17.1-rc1
63 Older Version : 00.00.04.12
7
81. Add the pad_0 in mfi frame structure to 0 to fix the
9 context value larger than 32bit value issue.
10
112. Add the logic drive list to the driver. Driver will
12 keep the logic drive list internal after driver load.
13
143. driver fixed the device update issue after get the AEN
15 PD delete/ADD, LD add/delete from FW.
16
11 Release Date : Tues. July 28, 2009 10:12:45 PST 2009 - 171 Release Date : Tues. July 28, 2009 10:12:45 PST 2009 -
2 (emaild-id:megaraidlinux@lsi.com) 18 (emaild-id:megaraidlinux@lsi.com)
3 Bo Yang 19 Bo Yang
diff --git a/Documentation/serial/tty.txt b/Documentation/serial/tty.txt
index 5e5349a4fcd2..7c900507279f 100644
--- a/Documentation/serial/tty.txt
+++ b/Documentation/serial/tty.txt
@@ -105,6 +105,10 @@ write_wakeup() - May be called at any point between open and close.
105 is permitted to call the driver write method from 105 is permitted to call the driver write method from
106 this function. In such a situation defer it. 106 this function. In such a situation defer it.
107 107
108dcd_change() - Report to the tty line the current DCD pin status
109 changes and the relative timestamp. The timestamp
110 can be NULL.
111
108 112
109Driver Access 113Driver Access
110 114
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 8923597bd2bd..bfcbbf88c44d 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -482,6 +482,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
482 482
483 reference_rate - reference sample rate, 44100 or 48000 (default) 483 reference_rate - reference sample rate, 44100 or 48000 (default)
484 multiple - multiple to ref. sample rate, 1 or 2 (default) 484 multiple - multiple to ref. sample rate, 1 or 2 (default)
485 subsystem - override the PCI SSID for probing; the value
486 consists of SSVID << 16 | SSDID. The default is
487 zero, which means no override.
485 488
486 This module supports multiple cards. 489 This module supports multiple cards.
487 490
@@ -1123,6 +1126,21 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
1123 1126
1124 This module supports multiple cards, autoprobe and ISA PnP. 1127 This module supports multiple cards, autoprobe and ISA PnP.
1125 1128
1129 Module snd-jazz16
1130 -------------------
1131
1132 Module for Media Vision Jazz16 chipset. The chipset consists of 3 chips:
1133 MVD1216 + MVA416 + MVA514.
1134
1135 port - port # for SB DSP chip (0x210,0x220,0x230,0x240,0x250,0x260)
1136 irq - IRQ # for SB DSP chip (3,5,7,9,10,15)
1137 dma8 - DMA # for SB DSP chip (1,3)
1138 dma16 - DMA # for SB DSP chip (5,7)
1139 mpu_port - MPU-401 port # (0x300,0x310,0x320,0x330)
1140 mpu_irq - MPU-401 irq # (2,3,5,7)
1141
1142 This module supports multiple cards.
1143
1126 Module snd-korg1212 1144 Module snd-korg1212
1127 ------------------- 1145 -------------------
1128 1146
@@ -1791,6 +1809,13 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
1791 1809
1792 The power-management is supported. 1810 The power-management is supported.
1793 1811
1812 Module snd-ua101
1813 ----------------
1814
1815 Module for the Edirol UA-101/UA-1000 audio/MIDI interfaces.
1816
1817 This module supports multiple devices, autoprobe and hotplugging.
1818
1794 Module snd-usb-audio 1819 Module snd-usb-audio
1795 -------------------- 1820 --------------------
1796 1821
@@ -1923,7 +1948,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
1923 ------------------- 1948 -------------------
1924 1949
1925 Module for sound cards based on the Asus AV100/AV200 chips, 1950 Module for sound cards based on the Asus AV100/AV200 chips,
1926 i.e., Xonar D1, DX, D2, D2X, HDAV1.3 (Deluxe), Essence ST 1951 i.e., Xonar D1, DX, D2, D2X, DS, HDAV1.3 (Deluxe), Essence ST
1927 (Deluxe) and Essence STX. 1952 (Deluxe) and Essence STX.
1928 1953
1929 This module supports autoprobe and multiple cards. 1954 This module supports autoprobe and multiple cards.
diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt
index e72cee9e2a71..1d38b0dfba95 100644
--- a/Documentation/sound/alsa/HD-Audio-Models.txt
+++ b/Documentation/sound/alsa/HD-Audio-Models.txt
@@ -124,6 +124,8 @@ ALC882/883/885/888/889
124 asus-a7m ASUS A7M 124 asus-a7m ASUS A7M
125 macpro MacPro support 125 macpro MacPro support
126 mb5 Macbook 5,1 126 mb5 Macbook 5,1
127 macmini3 Macmini 3,1
128 mba21 Macbook Air 2,1
127 mbp3 Macbook Pro rev3 129 mbp3 Macbook Pro rev3
128 imac24 iMac 24'' with jack detection 130 imac24 iMac 24'' with jack detection
129 imac91 iMac 9,1 131 imac91 iMac 9,1
@@ -279,13 +281,16 @@ Conexant 5051
279 laptop Basic Laptop config (default) 281 laptop Basic Laptop config (default)
280 hp HP Spartan laptop 282 hp HP Spartan laptop
281 hp-dv6736 HP dv6736 283 hp-dv6736 HP dv6736
284 hp-f700 HP Compaq Presario F700
282 lenovo-x200 Lenovo X200 laptop 285 lenovo-x200 Lenovo X200 laptop
286 toshiba Toshiba Satellite M300
283 287
284Conexant 5066 288Conexant 5066
285============= 289=============
286 laptop Basic Laptop config (default) 290 laptop Basic Laptop config (default)
287 dell-laptop Dell laptops 291 dell-laptop Dell laptops
288 olpc-xo-1_5 OLPC XO 1.5 292 olpc-xo-1_5 OLPC XO 1.5
293 ideapad Lenovo IdeaPad U150
289 294
290STAC9200 295STAC9200
291======== 296========
diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt
index 6325bec06a72..f4dd3bf99d12 100644
--- a/Documentation/sound/alsa/HD-Audio.txt
+++ b/Documentation/sound/alsa/HD-Audio.txt
@@ -452,6 +452,33 @@ Similarly, the lines after `[verb]` are parsed as `init_verbs`
452sysfs entries, and the lines after `[hint]` are parsed as `hints` 452sysfs entries, and the lines after `[hint]` are parsed as `hints`
453sysfs entries, respectively. 453sysfs entries, respectively.
454 454
455Another example to override the codec vendor id from 0x12345678 to
4560xdeadbeef is like below:
457------------------------------------------------------------------------
458 [codec]
459 0x12345678 0xabcd1234 2
460
461 [vendor_id]
462 0xdeadbeef
463------------------------------------------------------------------------
464
465In the similar way, you can override the codec subsystem_id via
466`[subsystem_id]`, the revision id via `[revision_id]` line.
467Also, the codec chip name can be rewritten via `[chip_name]` line.
468------------------------------------------------------------------------
469 [codec]
470 0x12345678 0xabcd1234 2
471
472 [subsystem_id]
473 0xffff1111
474
475 [revision_id]
476 0x10
477
478 [chip_name]
479 My-own NEWS-0002
480------------------------------------------------------------------------
481
455The hd-audio driver reads the file via request_firmware(). Thus, 482The hd-audio driver reads the file via request_firmware(). Thus,
456a patch file has to be located on the appropriate firmware path, 483a patch file has to be located on the appropriate firmware path,
457typically, /lib/firmware. For example, when you pass the option 484typically, /lib/firmware. For example, when you pass the option
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index fc5790d36cd9..6c7d18c53f84 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -573,11 +573,14 @@ Because other nodes' memory may be free. This means system total status
573may be not fatal yet. 573may be not fatal yet.
574 574
575If this is set to 2, the kernel panics compulsorily even on the 575If this is set to 2, the kernel panics compulsorily even on the
576above-mentioned. 576above-mentioned. Even oom happens under memory cgroup, the whole
577system panics.
577 578
578The default value is 0. 579The default value is 0.
5791 and 2 are for failover of clustering. Please select either 5801 and 2 are for failover of clustering. Please select either
580according to your policy of failover. 581according to your policy of failover.
582panic_on_oom=2+kdump gives you very strong tool to investigate
583why oom happens. You can get snapshot.
581 584
582============================================================= 585=============================================================
583 586
diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX
index 397dc35e1323..a9248da5cdbc 100644
--- a/Documentation/timers/00-INDEX
+++ b/Documentation/timers/00-INDEX
@@ -4,6 +4,8 @@ highres.txt
4 - High resolution timers and dynamic ticks design notes 4 - High resolution timers and dynamic ticks design notes
5hpet.txt 5hpet.txt
6 - High Precision Event Timer Driver for Linux 6 - High Precision Event Timer Driver for Linux
7hpet_example.c
8 - sample hpet timer test program
7hrtimers.txt 9hrtimers.txt
8 - subsystem for high-resolution kernel timers 10 - subsystem for high-resolution kernel timers
9timer_stats.txt 11timer_stats.txt
diff --git a/Documentation/timers/Makefile b/Documentation/timers/Makefile
new file mode 100644
index 000000000000..c85625f4ab25
--- /dev/null
+++ b/Documentation/timers/Makefile
@@ -0,0 +1,8 @@
1# kbuild trick to avoid linker error. Can be omitted if a module is built.
2obj- := dummy.o
3
4# List of programs to build
5hostprogs-y := hpet_example
6
7# Tell kbuild to always build the programs
8always := $(hostprogs-y)
diff --git a/Documentation/timers/hpet.txt b/Documentation/timers/hpet.txt
index 16d25e6b5a00..767392ffd31e 100644
--- a/Documentation/timers/hpet.txt
+++ b/Documentation/timers/hpet.txt
@@ -26,274 +26,5 @@ initialization. An example of this initialization can be found in
26arch/x86/kernel/hpet.c. 26arch/x86/kernel/hpet.c.
27 27
28The driver provides a userspace API which resembles the API found in the 28The driver provides a userspace API which resembles the API found in the
29RTC driver framework. An example user space program is provided below. 29RTC driver framework. An example user space program is provided in
30 30file:Documentation/timers/hpet_example.c
31#include <stdio.h>
32#include <stdlib.h>
33#include <unistd.h>
34#include <fcntl.h>
35#include <string.h>
36#include <memory.h>
37#include <malloc.h>
38#include <time.h>
39#include <ctype.h>
40#include <sys/types.h>
41#include <sys/wait.h>
42#include <signal.h>
43#include <fcntl.h>
44#include <errno.h>
45#include <sys/time.h>
46#include <linux/hpet.h>
47
48
49extern void hpet_open_close(int, const char **);
50extern void hpet_info(int, const char **);
51extern void hpet_poll(int, const char **);
52extern void hpet_fasync(int, const char **);
53extern void hpet_read(int, const char **);
54
55#include <sys/poll.h>
56#include <sys/ioctl.h>
57#include <signal.h>
58
59struct hpet_command {
60 char *command;
61 void (*func)(int argc, const char ** argv);
62} hpet_command[] = {
63 {
64 "open-close",
65 hpet_open_close
66 },
67 {
68 "info",
69 hpet_info
70 },
71 {
72 "poll",
73 hpet_poll
74 },
75 {
76 "fasync",
77 hpet_fasync
78 },
79};
80
81int
82main(int argc, const char ** argv)
83{
84 int i;
85
86 argc--;
87 argv++;
88
89 if (!argc) {
90 fprintf(stderr, "-hpet: requires command\n");
91 return -1;
92 }
93
94
95 for (i = 0; i < (sizeof (hpet_command) / sizeof (hpet_command[0])); i++)
96 if (!strcmp(argv[0], hpet_command[i].command)) {
97 argc--;
98 argv++;
99 fprintf(stderr, "-hpet: executing %s\n",
100 hpet_command[i].command);
101 hpet_command[i].func(argc, argv);
102 return 0;
103 }
104
105 fprintf(stderr, "do_hpet: command %s not implemented\n", argv[0]);
106
107 return -1;
108}
109
110void
111hpet_open_close(int argc, const char **argv)
112{
113 int fd;
114
115 if (argc != 1) {
116 fprintf(stderr, "hpet_open_close: device-name\n");
117 return;
118 }
119
120 fd = open(argv[0], O_RDONLY);
121 if (fd < 0)
122 fprintf(stderr, "hpet_open_close: open failed\n");
123 else
124 close(fd);
125
126 return;
127}
128
129void
130hpet_info(int argc, const char **argv)
131{
132}
133
134void
135hpet_poll(int argc, const char **argv)
136{
137 unsigned long freq;
138 int iterations, i, fd;
139 struct pollfd pfd;
140 struct hpet_info info;
141 struct timeval stv, etv;
142 struct timezone tz;
143 long usec;
144
145 if (argc != 3) {
146 fprintf(stderr, "hpet_poll: device-name freq iterations\n");
147 return;
148 }
149
150 freq = atoi(argv[1]);
151 iterations = atoi(argv[2]);
152
153 fd = open(argv[0], O_RDONLY);
154
155 if (fd < 0) {
156 fprintf(stderr, "hpet_poll: open of %s failed\n", argv[0]);
157 return;
158 }
159
160 if (ioctl(fd, HPET_IRQFREQ, freq) < 0) {
161 fprintf(stderr, "hpet_poll: HPET_IRQFREQ failed\n");
162 goto out;
163 }
164
165 if (ioctl(fd, HPET_INFO, &info) < 0) {
166 fprintf(stderr, "hpet_poll: failed to get info\n");
167 goto out;
168 }
169
170 fprintf(stderr, "hpet_poll: info.hi_flags 0x%lx\n", info.hi_flags);
171
172 if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) {
173 fprintf(stderr, "hpet_poll: HPET_EPI failed\n");
174 goto out;
175 }
176
177 if (ioctl(fd, HPET_IE_ON, 0) < 0) {
178 fprintf(stderr, "hpet_poll, HPET_IE_ON failed\n");
179 goto out;
180 }
181
182 pfd.fd = fd;
183 pfd.events = POLLIN;
184
185 for (i = 0; i < iterations; i++) {
186 pfd.revents = 0;
187 gettimeofday(&stv, &tz);
188 if (poll(&pfd, 1, -1) < 0)
189 fprintf(stderr, "hpet_poll: poll failed\n");
190 else {
191 long data;
192
193 gettimeofday(&etv, &tz);
194 usec = stv.tv_sec * 1000000 + stv.tv_usec;
195 usec = (etv.tv_sec * 1000000 + etv.tv_usec) - usec;
196
197 fprintf(stderr,
198 "hpet_poll: expired time = 0x%lx\n", usec);
199
200 fprintf(stderr, "hpet_poll: revents = 0x%x\n",
201 pfd.revents);
202
203 if (read(fd, &data, sizeof(data)) != sizeof(data)) {
204 fprintf(stderr, "hpet_poll: read failed\n");
205 }
206 else
207 fprintf(stderr, "hpet_poll: data 0x%lx\n",
208 data);
209 }
210 }
211
212out:
213 close(fd);
214 return;
215}
216
217static int hpet_sigio_count;
218
219static void
220hpet_sigio(int val)
221{
222 fprintf(stderr, "hpet_sigio: called\n");
223 hpet_sigio_count++;
224}
225
226void
227hpet_fasync(int argc, const char **argv)
228{
229 unsigned long freq;
230 int iterations, i, fd, value;
231 sig_t oldsig;
232 struct hpet_info info;
233
234 hpet_sigio_count = 0;
235 fd = -1;
236
237 if ((oldsig = signal(SIGIO, hpet_sigio)) == SIG_ERR) {
238 fprintf(stderr, "hpet_fasync: failed to set signal handler\n");
239 return;
240 }
241
242 if (argc != 3) {
243 fprintf(stderr, "hpet_fasync: device-name freq iterations\n");
244 goto out;
245 }
246
247 fd = open(argv[0], O_RDONLY);
248
249 if (fd < 0) {
250 fprintf(stderr, "hpet_fasync: failed to open %s\n", argv[0]);
251 return;
252 }
253
254
255 if ((fcntl(fd, F_SETOWN, getpid()) == 1) ||
256 ((value = fcntl(fd, F_GETFL)) == 1) ||
257 (fcntl(fd, F_SETFL, value | O_ASYNC) == 1)) {
258 fprintf(stderr, "hpet_fasync: fcntl failed\n");
259 goto out;
260 }
261
262 freq = atoi(argv[1]);
263 iterations = atoi(argv[2]);
264
265 if (ioctl(fd, HPET_IRQFREQ, freq) < 0) {
266 fprintf(stderr, "hpet_fasync: HPET_IRQFREQ failed\n");
267 goto out;
268 }
269
270 if (ioctl(fd, HPET_INFO, &info) < 0) {
271 fprintf(stderr, "hpet_fasync: failed to get info\n");
272 goto out;
273 }
274
275 fprintf(stderr, "hpet_fasync: info.hi_flags 0x%lx\n", info.hi_flags);
276
277 if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) {
278 fprintf(stderr, "hpet_fasync: HPET_EPI failed\n");
279 goto out;
280 }
281
282 if (ioctl(fd, HPET_IE_ON, 0) < 0) {
283 fprintf(stderr, "hpet_fasync, HPET_IE_ON failed\n");
284 goto out;
285 }
286
287 for (i = 0; i < iterations; i++) {
288 (void) pause();
289 fprintf(stderr, "hpet_fasync: count = %d\n", hpet_sigio_count);
290 }
291
292out:
293 signal(SIGIO, oldsig);
294
295 if (fd >= 0)
296 close(fd);
297
298 return;
299}
diff --git a/Documentation/timers/hpet_example.c b/Documentation/timers/hpet_example.c
new file mode 100644
index 000000000000..f9ce2d9fdfd5
--- /dev/null
+++ b/Documentation/timers/hpet_example.c
@@ -0,0 +1,269 @@
1#include <stdio.h>
2#include <stdlib.h>
3#include <unistd.h>
4#include <fcntl.h>
5#include <string.h>
6#include <memory.h>
7#include <malloc.h>
8#include <time.h>
9#include <ctype.h>
10#include <sys/types.h>
11#include <sys/wait.h>
12#include <signal.h>
13#include <fcntl.h>
14#include <errno.h>
15#include <sys/time.h>
16#include <linux/hpet.h>
17
18
19extern void hpet_open_close(int, const char **);
20extern void hpet_info(int, const char **);
21extern void hpet_poll(int, const char **);
22extern void hpet_fasync(int, const char **);
23extern void hpet_read(int, const char **);
24
25#include <sys/poll.h>
26#include <sys/ioctl.h>
27#include <signal.h>
28
29struct hpet_command {
30 char *command;
31 void (*func)(int argc, const char ** argv);
32} hpet_command[] = {
33 {
34 "open-close",
35 hpet_open_close
36 },
37 {
38 "info",
39 hpet_info
40 },
41 {
42 "poll",
43 hpet_poll
44 },
45 {
46 "fasync",
47 hpet_fasync
48 },
49};
50
51int
52main(int argc, const char ** argv)
53{
54 int i;
55
56 argc--;
57 argv++;
58
59 if (!argc) {
60 fprintf(stderr, "-hpet: requires command\n");
61 return -1;
62 }
63
64
65 for (i = 0; i < (sizeof (hpet_command) / sizeof (hpet_command[0])); i++)
66 if (!strcmp(argv[0], hpet_command[i].command)) {
67 argc--;
68 argv++;
69 fprintf(stderr, "-hpet: executing %s\n",
70 hpet_command[i].command);
71 hpet_command[i].func(argc, argv);
72 return 0;
73 }
74
75 fprintf(stderr, "do_hpet: command %s not implemented\n", argv[0]);
76
77 return -1;
78}
79
80void
81hpet_open_close(int argc, const char **argv)
82{
83 int fd;
84
85 if (argc != 1) {
86 fprintf(stderr, "hpet_open_close: device-name\n");
87 return;
88 }
89
90 fd = open(argv[0], O_RDONLY);
91 if (fd < 0)
92 fprintf(stderr, "hpet_open_close: open failed\n");
93 else
94 close(fd);
95
96 return;
97}
98
99void
100hpet_info(int argc, const char **argv)
101{
102}
103
104void
105hpet_poll(int argc, const char **argv)
106{
107 unsigned long freq;
108 int iterations, i, fd;
109 struct pollfd pfd;
110 struct hpet_info info;
111 struct timeval stv, etv;
112 struct timezone tz;
113 long usec;
114
115 if (argc != 3) {
116 fprintf(stderr, "hpet_poll: device-name freq iterations\n");
117 return;
118 }
119
120 freq = atoi(argv[1]);
121 iterations = atoi(argv[2]);
122
123 fd = open(argv[0], O_RDONLY);
124
125 if (fd < 0) {
126 fprintf(stderr, "hpet_poll: open of %s failed\n", argv[0]);
127 return;
128 }
129
130 if (ioctl(fd, HPET_IRQFREQ, freq) < 0) {
131 fprintf(stderr, "hpet_poll: HPET_IRQFREQ failed\n");
132 goto out;
133 }
134
135 if (ioctl(fd, HPET_INFO, &info) < 0) {
136 fprintf(stderr, "hpet_poll: failed to get info\n");
137 goto out;
138 }
139
140 fprintf(stderr, "hpet_poll: info.hi_flags 0x%lx\n", info.hi_flags);
141
142 if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) {
143 fprintf(stderr, "hpet_poll: HPET_EPI failed\n");
144 goto out;
145 }
146
147 if (ioctl(fd, HPET_IE_ON, 0) < 0) {
148 fprintf(stderr, "hpet_poll, HPET_IE_ON failed\n");
149 goto out;
150 }
151
152 pfd.fd = fd;
153 pfd.events = POLLIN;
154
155 for (i = 0; i < iterations; i++) {
156 pfd.revents = 0;
157 gettimeofday(&stv, &tz);
158 if (poll(&pfd, 1, -1) < 0)
159 fprintf(stderr, "hpet_poll: poll failed\n");
160 else {
161 long data;
162
163 gettimeofday(&etv, &tz);
164 usec = stv.tv_sec * 1000000 + stv.tv_usec;
165 usec = (etv.tv_sec * 1000000 + etv.tv_usec) - usec;
166
167 fprintf(stderr,
168 "hpet_poll: expired time = 0x%lx\n", usec);
169
170 fprintf(stderr, "hpet_poll: revents = 0x%x\n",
171 pfd.revents);
172
173 if (read(fd, &data, sizeof(data)) != sizeof(data)) {
174 fprintf(stderr, "hpet_poll: read failed\n");
175 }
176 else
177 fprintf(stderr, "hpet_poll: data 0x%lx\n",
178 data);
179 }
180 }
181
182out:
183 close(fd);
184 return;
185}
186
187static int hpet_sigio_count;
188
189static void
190hpet_sigio(int val)
191{
192 fprintf(stderr, "hpet_sigio: called\n");
193 hpet_sigio_count++;
194}
195
196void
197hpet_fasync(int argc, const char **argv)
198{
199 unsigned long freq;
200 int iterations, i, fd, value;
201 sig_t oldsig;
202 struct hpet_info info;
203
204 hpet_sigio_count = 0;
205 fd = -1;
206
207 if ((oldsig = signal(SIGIO, hpet_sigio)) == SIG_ERR) {
208 fprintf(stderr, "hpet_fasync: failed to set signal handler\n");
209 return;
210 }
211
212 if (argc != 3) {
213 fprintf(stderr, "hpet_fasync: device-name freq iterations\n");
214 goto out;
215 }
216
217 fd = open(argv[0], O_RDONLY);
218
219 if (fd < 0) {
220 fprintf(stderr, "hpet_fasync: failed to open %s\n", argv[0]);
221 return;
222 }
223
224
225 if ((fcntl(fd, F_SETOWN, getpid()) == 1) ||
226 ((value = fcntl(fd, F_GETFL)) == 1) ||
227 (fcntl(fd, F_SETFL, value | O_ASYNC) == 1)) {
228 fprintf(stderr, "hpet_fasync: fcntl failed\n");
229 goto out;
230 }
231
232 freq = atoi(argv[1]);
233 iterations = atoi(argv[2]);
234
235 if (ioctl(fd, HPET_IRQFREQ, freq) < 0) {
236 fprintf(stderr, "hpet_fasync: HPET_IRQFREQ failed\n");
237 goto out;
238 }
239
240 if (ioctl(fd, HPET_INFO, &info) < 0) {
241 fprintf(stderr, "hpet_fasync: failed to get info\n");
242 goto out;
243 }
244
245 fprintf(stderr, "hpet_fasync: info.hi_flags 0x%lx\n", info.hi_flags);
246
247 if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) {
248 fprintf(stderr, "hpet_fasync: HPET_EPI failed\n");
249 goto out;
250 }
251
252 if (ioctl(fd, HPET_IE_ON, 0) < 0) {
253 fprintf(stderr, "hpet_fasync, HPET_IE_ON failed\n");
254 goto out;
255 }
256
257 for (i = 0; i < iterations; i++) {
258 (void) pause();
259 fprintf(stderr, "hpet_fasync: count = %d\n", hpet_sigio_count);
260 }
261
262out:
263 signal(SIGIO, oldsig);
264
265 if (fd >= 0)
266 close(fd);
267
268 return;
269}
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
index 6a5a579126b0..f1f81afee8a0 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.txt
@@ -238,11 +238,10 @@ HAVE_SYSCALL_TRACEPOINTS
238 238
239You need very few things to get the syscalls tracing in an arch. 239You need very few things to get the syscalls tracing in an arch.
240 240
241- Support HAVE_ARCH_TRACEHOOK (see arch/Kconfig).
241- Have a NR_syscalls variable in <asm/unistd.h> that provides the number 242- Have a NR_syscalls variable in <asm/unistd.h> that provides the number
242 of syscalls supported by the arch. 243 of syscalls supported by the arch.
243- Implement arch_syscall_addr() that resolves a syscall address from a 244- Support the TIF_SYSCALL_TRACEPOINT thread flags.
244 syscall number.
245- Support the TIF_SYSCALL_TRACEPOINT thread flags
246- Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace 245- Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace
247 in the ptrace syscalls tracing path. 246 in the ptrace syscalls tracing path.
248- Tag this arch as HAVE_SYSCALL_TRACEPOINTS. 247- Tag this arch as HAVE_SYSCALL_TRACEPOINTS.
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index bab3040da548..03485bfbd797 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1588,7 +1588,7 @@ module author does not need to worry about it.
1588 1588
1589When tracing is enabled, kstop_machine is called to prevent 1589When tracing is enabled, kstop_machine is called to prevent
1590races with the CPUS executing code being modified (which can 1590races with the CPUS executing code being modified (which can
1591cause the CPU to do undesireable things), and the nops are 1591cause the CPU to do undesirable things), and the nops are
1592patched back to calls. But this time, they do not call mcount 1592patched back to calls. But this time, they do not call mcount
1593(which is just a function stub). They now call into the ftrace 1593(which is just a function stub). They now call into the ftrace
1594infrastructure. 1594infrastructure.
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
index 47aabeebbdf6..a9100b28eb84 100644
--- a/Documentation/trace/kprobetrace.txt
+++ b/Documentation/trace/kprobetrace.txt
@@ -24,6 +24,7 @@ Synopsis of kprobe_events
24------------------------- 24-------------------------
25 p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe 25 p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe
26 r[:[GRP/]EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe 26 r[:[GRP/]EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
27 -:[GRP/]EVENT : Clear a probe
27 28
28 GRP : Group name. If omitted, use "kprobes" for it. 29 GRP : Group name. If omitted, use "kprobes" for it.
29 EVENT : Event name. If omitted, the event name is generated 30 EVENT : Event name. If omitted, the event name is generated
@@ -37,15 +38,12 @@ Synopsis of kprobe_events
37 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) 38 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
38 $stackN : Fetch Nth entry of stack (N >= 0) 39 $stackN : Fetch Nth entry of stack (N >= 0)
39 $stack : Fetch stack address. 40 $stack : Fetch stack address.
40 $argN : Fetch function argument. (N >= 0)(*) 41 $retval : Fetch return value.(*)
41 $retval : Fetch return value.(**) 42 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
42 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(***)
43 NAME=FETCHARG: Set NAME as the argument name of FETCHARG. 43 NAME=FETCHARG: Set NAME as the argument name of FETCHARG.
44 44
45 (*) aN may not correct on asmlinkaged functions and at the middle of 45 (*) only for return probe.
46 function body. 46 (**) this is useful for fetching a field of data structures.
47 (**) only for return probe.
48 (***) this is useful for fetching a field of data structures.
49 47
50 48
51Per-Probe Event Filtering 49Per-Probe Event Filtering
@@ -82,13 +80,16 @@ Usage examples
82To add a probe as a new event, write a new definition to kprobe_events 80To add a probe as a new event, write a new definition to kprobe_events
83as below. 81as below.
84 82
85 echo p:myprobe do_sys_open dfd=$arg0 filename=$arg1 flags=$arg2 mode=$arg3 > /sys/kernel/debug/tracing/kprobe_events 83 echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events
86 84
87 This sets a kprobe on the top of do_sys_open() function with recording 85 This sets a kprobe on the top of do_sys_open() function with recording
881st to 4th arguments as "myprobe" event. As this example shows, users can 861st to 4th arguments as "myprobe" event. Note, which register/stack entry is
89choose more familiar names for each arguments. 87assigned to each function argument depends on arch-specific ABI. If you unsure
88the ABI, please try to use probe subcommand of perf-tools (you can find it
89under tools/perf/).
90As this example shows, users can choose more familiar names for each arguments.
90 91
91 echo r:myretprobe do_sys_open $retval >> /sys/kernel/debug/tracing/kprobe_events 92 echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events
92 93
93 This sets a kretprobe on the return point of do_sys_open() function with 94 This sets a kretprobe on the return point of do_sys_open() function with
94recording return value as "myretprobe" event. 95recording return value as "myretprobe" event.
@@ -97,23 +98,24 @@ recording return value as "myretprobe" event.
97 98
98 cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format 99 cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
99name: myprobe 100name: myprobe
100ID: 75 101ID: 780
101format: 102format:
102 field:unsigned short common_type; offset:0; size:2; 103 field:unsigned short common_type; offset:0; size:2; signed:0;
103 field:unsigned char common_flags; offset:2; size:1; 104 field:unsigned char common_flags; offset:2; size:1; signed:0;
104 field:unsigned char common_preempt_count; offset:3; size:1; 105 field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
105 field:int common_pid; offset:4; size:4; 106 field:int common_pid; offset:4; size:4; signed:1;
106 field:int common_tgid; offset:8; size:4; 107 field:int common_lock_depth; offset:8; size:4; signed:1;
107 108
108 field: unsigned long ip; offset:16;tsize:8; 109 field:unsigned long __probe_ip; offset:12; size:4; signed:0;
109 field: int nargs; offset:24;tsize:4; 110 field:int __probe_nargs; offset:16; size:4; signed:1;
110 field: unsigned long dfd; offset:32;tsize:8; 111 field:unsigned long dfd; offset:20; size:4; signed:0;
111 field: unsigned long filename; offset:40;tsize:8; 112 field:unsigned long filename; offset:24; size:4; signed:0;
112 field: unsigned long flags; offset:48;tsize:8; 113 field:unsigned long flags; offset:28; size:4; signed:0;
113 field: unsigned long mode; offset:56;tsize:8; 114 field:unsigned long mode; offset:32; size:4; signed:0;
114 115
115print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->ip, REC->dfd, REC->filename, REC->flags, REC->mode
116 116
117print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip,
118REC->dfd, REC->filename, REC->flags, REC->mode
117 119
118 You can see that the event has 4 arguments as in the expressions you specified. 120 You can see that the event has 4 arguments as in the expressions you specified.
119 121
@@ -121,6 +123,12 @@ print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->ip, REC->dfd, R
121 123
122 This clears all probe points. 124 This clears all probe points.
123 125
126 Or,
127
128 echo -:myprobe >> kprobe_events
129
130 This clears probe points selectively.
131
124 Right after definition, each event is disabled by default. For tracing these 132 Right after definition, each event is disabled by default. For tracing these
125events, you need to enable it. 133events, you need to enable it.
126 134
@@ -146,4 +154,3 @@ events, you need to enable it.
146returns from SYMBOL(e.g. "sys_open+0x1b/0x1d <- do_sys_open" means kernel 154returns from SYMBOL(e.g. "sys_open+0x1b/0x1d <- do_sys_open" means kernel
147returns from do_sys_open to sys_open+0x1b). 155returns from do_sys_open to sys_open+0x1b).
148 156
149
diff --git a/Documentation/usb/error-codes.txt b/Documentation/usb/error-codes.txt
index 9cf83e8c27b8..d83703ea74b2 100644
--- a/Documentation/usb/error-codes.txt
+++ b/Documentation/usb/error-codes.txt
@@ -41,8 +41,8 @@ USB-specific:
41 41
42-EFBIG Host controller driver can't schedule that many ISO frames. 42-EFBIG Host controller driver can't schedule that many ISO frames.
43 43
44-EPIPE Specified endpoint is stalled. For non-control endpoints, 44-EPIPE The pipe type specified in the URB doesn't match the
45 reset this status with usb_clear_halt(). 45 endpoint's actual type.
46 46
47-EMSGSIZE (a) endpoint maxpacket size is zero; it is not usable 47-EMSGSIZE (a) endpoint maxpacket size is zero; it is not usable
48 in the current interface altsetting. 48 in the current interface altsetting.
@@ -60,6 +60,8 @@ USB-specific:
60 60
61-EHOSTUNREACH URB was rejected because the device is suspended. 61-EHOSTUNREACH URB was rejected because the device is suspended.
62 62
63-ENOEXEC A control URB doesn't contain a Setup packet.
64
63 65
64************************************************************************** 66**************************************************************************
65* Error codes returned by in urb->status * 67* Error codes returned by in urb->status *
diff --git a/Documentation/usb/power-management.txt b/Documentation/usb/power-management.txt
index 3bf6818c8cf5..2790ad48cfc2 100644
--- a/Documentation/usb/power-management.txt
+++ b/Documentation/usb/power-management.txt
@@ -2,7 +2,7 @@
2 2
3 Alan Stern <stern@rowland.harvard.edu> 3 Alan Stern <stern@rowland.harvard.edu>
4 4
5 November 10, 2009 5 December 11, 2009
6 6
7 7
8 8
@@ -29,9 +29,9 @@ covered to some extent (see Documentation/power/*.txt for more
29information about system PM). 29information about system PM).
30 30
31Note: Dynamic PM support for USB is present only if the kernel was 31Note: Dynamic PM support for USB is present only if the kernel was
32built with CONFIG_USB_SUSPEND enabled. System PM support is present 32built with CONFIG_USB_SUSPEND enabled (which depends on
33only if the kernel was built with CONFIG_SUSPEND or CONFIG_HIBERNATION 33CONFIG_PM_RUNTIME). System PM support is present only if the kernel
34enabled. 34was built with CONFIG_SUSPEND or CONFIG_HIBERNATION enabled.
35 35
36 36
37 What is Remote Wakeup? 37 What is Remote Wakeup?
@@ -229,6 +229,11 @@ necessary operations by hand or add them to a udev script. You can
229also change the idle-delay time; 2 seconds is not the best choice for 229also change the idle-delay time; 2 seconds is not the best choice for
230every device. 230every device.
231 231
232If a driver knows that its device has proper suspend/resume support,
233it can enable autosuspend all by itself. For example, the video
234driver for a laptop's webcam might do this, since these devices are
235rarely used and so should normally be autosuspended.
236
232Sometimes it turns out that even when a device does work okay with 237Sometimes it turns out that even when a device does work okay with
233autosuspend there are still problems. For example, there are 238autosuspend there are still problems. For example, there are
234experimental patches adding autosuspend support to the usbhid driver, 239experimental patches adding autosuspend support to the usbhid driver,
@@ -321,69 +326,81 @@ driver does so by calling these six functions:
321 void usb_autopm_get_interface_no_resume(struct usb_interface *intf); 326 void usb_autopm_get_interface_no_resume(struct usb_interface *intf);
322 void usb_autopm_put_interface_no_suspend(struct usb_interface *intf); 327 void usb_autopm_put_interface_no_suspend(struct usb_interface *intf);
323 328
324The functions work by maintaining a counter in the usb_interface 329The functions work by maintaining a usage counter in the
325structure. When intf->pm_usage_count is > 0 then the interface is 330usb_interface's embedded device structure. When the counter is > 0
326deemed to be busy, and the kernel will not autosuspend the interface's 331then the interface is deemed to be busy, and the kernel will not
327device. When intf->pm_usage_count is <= 0 then the interface is 332autosuspend the interface's device. When the usage counter is = 0
328considered to be idle, and the kernel may autosuspend the device. 333then the interface is considered to be idle, and the kernel may
334autosuspend the device.
329 335
330(There is a similar pm_usage_count field in struct usb_device, 336(There is a similar usage counter field in struct usb_device,
331associated with the device itself rather than any of its interfaces. 337associated with the device itself rather than any of its interfaces.
332This field is used only by the USB core.) 338This counter is used only by the USB core.)
333 339
334Drivers must not modify intf->pm_usage_count directly; its value 340Drivers need not be concerned about balancing changes to the usage
335should be changed only be using the functions listed above. Drivers 341counter; the USB core will undo any remaining "get"s when a driver
336are responsible for insuring that the overall change to pm_usage_count 342is unbound from its interface. As a corollary, drivers must not call
337during their lifetime balances out to 0 (it may be necessary for the 343any of the usb_autopm_* functions after their diconnect() routine has
338disconnect method to call usb_autopm_put_interface() one or more times 344returned.
339to fulfill this requirement). The first two routines use the PM mutex 345
340in struct usb_device for mutual exclusion; drivers using the async 346Drivers using the async routines are responsible for their own
341routines are responsible for their own synchronization and mutual 347synchronization and mutual exclusion.
342exclusion. 348
343 349 usb_autopm_get_interface() increments the usage counter and
344 usb_autopm_get_interface() increments pm_usage_count and 350 does an autoresume if the device is suspended. If the
345 attempts an autoresume if the new value is > 0 and the 351 autoresume fails, the counter is decremented back.
346 device is suspended. 352
347 353 usb_autopm_put_interface() decrements the usage counter and
348 usb_autopm_put_interface() decrements pm_usage_count and 354 attempts an autosuspend if the new value is = 0.
349 attempts an autosuspend if the new value is <= 0 and the
350 device isn't suspended.
351 355
352 usb_autopm_get_interface_async() and 356 usb_autopm_get_interface_async() and
353 usb_autopm_put_interface_async() do almost the same things as 357 usb_autopm_put_interface_async() do almost the same things as
354 their non-async counterparts. The differences are: they do 358 their non-async counterparts. The big difference is that they
355 not acquire the PM mutex, and they use a workqueue to do their 359 use a workqueue to do the resume or suspend part of their
356 jobs. As a result they can be called in an atomic context, 360 jobs. As a result they can be called in an atomic context,
357 such as an URB's completion handler, but when they return the 361 such as an URB's completion handler, but when they return the
358 device will not generally not yet be in the desired state. 362 device will generally not yet be in the desired state.
359 363
360 usb_autopm_get_interface_no_resume() and 364 usb_autopm_get_interface_no_resume() and
361 usb_autopm_put_interface_no_suspend() merely increment or 365 usb_autopm_put_interface_no_suspend() merely increment or
362 decrement the pm_usage_count value; they do not attempt to 366 decrement the usage counter; they do not attempt to carry out
363 carry out an autoresume or an autosuspend. Hence they can be 367 an autoresume or an autosuspend. Hence they can be called in
364 called in an atomic context. 368 an atomic context.
365 369
366The conventional usage pattern is that a driver calls 370The simplest usage pattern is that a driver calls
367usb_autopm_get_interface() in its open routine and 371usb_autopm_get_interface() in its open routine and
368usb_autopm_put_interface() in its close or release routine. But 372usb_autopm_put_interface() in its close or release routine. But other
369other patterns are possible. 373patterns are possible.
370 374
371The autosuspend attempts mentioned above will often fail for one 375The autosuspend attempts mentioned above will often fail for one
372reason or another. For example, the power/level attribute might be 376reason or another. For example, the power/level attribute might be
373set to "on", or another interface in the same device might not be 377set to "on", or another interface in the same device might not be
374idle. This is perfectly normal. If the reason for failure was that 378idle. This is perfectly normal. If the reason for failure was that
375the device hasn't been idle for long enough, a delayed workqueue 379the device hasn't been idle for long enough, a timer is scheduled to
376routine is automatically set up to carry out the operation when the 380carry out the operation automatically when the autosuspend idle-delay
377autosuspend idle-delay has expired. 381has expired.
378 382
379Autoresume attempts also can fail, although failure would mean that 383Autoresume attempts also can fail, although failure would mean that
380the device is no longer present or operating properly. Unlike 384the device is no longer present or operating properly. Unlike
381autosuspend, there's no delay for an autoresume. 385autosuspend, there's no idle-delay for an autoresume.
382 386
383 387
384 Other parts of the driver interface 388 Other parts of the driver interface
385 ----------------------------------- 389 -----------------------------------
386 390
391Drivers can enable autosuspend for their devices by calling
392
393 usb_enable_autosuspend(struct usb_device *udev);
394
395in their probe() routine, if they know that the device is capable of
396suspending and resuming correctly. This is exactly equivalent to
397writing "auto" to the device's power/level attribute. Likewise,
398drivers can disable autosuspend by calling
399
400 usb_disable_autosuspend(struct usb_device *udev);
401
402This is exactly the same as writing "on" to the power/level attribute.
403
387Sometimes a driver needs to make sure that remote wakeup is enabled 404Sometimes a driver needs to make sure that remote wakeup is enabled
388during autosuspend. For example, there's not much point 405during autosuspend. For example, there's not much point
389autosuspending a keyboard if the user can't cause the keyboard to do a 406autosuspending a keyboard if the user can't cause the keyboard to do a
@@ -395,26 +412,27 @@ though, setting this flag won't cause the kernel to autoresume it.
395Normally a driver would set this flag in its probe method, at which 412Normally a driver would set this flag in its probe method, at which
396time the device is guaranteed not to be autosuspended.) 413time the device is guaranteed not to be autosuspended.)
397 414
398The synchronous usb_autopm_* routines have to run in a sleepable 415If a driver does its I/O asynchronously in interrupt context, it
399process context; they must not be called from an interrupt handler or 416should call usb_autopm_get_interface_async() before starting output and
400while holding a spinlock. In fact, the entire autosuspend mechanism 417usb_autopm_put_interface_async() when the output queue drains. When
401is not well geared toward interrupt-driven operation. However there 418it receives an input event, it should call
402is one thing a driver can do in an interrupt handler:
403 419
404 usb_mark_last_busy(struct usb_device *udev); 420 usb_mark_last_busy(struct usb_device *udev);
405 421
406This sets udev->last_busy to the current time. udev->last_busy is the 422in the event handler. This sets udev->last_busy to the current time.
407field used for idle-delay calculations; updating it will cause any 423udev->last_busy is the field used for idle-delay calculations;
408pending autosuspend to be moved back. The usb_autopm_* routines will 424updating it will cause any pending autosuspend to be moved back. Most
409also set the last_busy field to the current time. 425of the usb_autopm_* routines will also set the last_busy field to the
410 426current time.
411Calling urb_mark_last_busy() from within an URB completion handler is 427
412subject to races: The kernel may have just finished deciding the 428Asynchronous operation is always subject to races. For example, a
413device has been idle for long enough but not yet gotten around to 429driver may call one of the usb_autopm_*_interface_async() routines at
414calling the driver's suspend method. The driver would have to be 430a time when the core has just finished deciding the device has been
415responsible for synchronizing its suspend method with its URB 431idle for long enough but not yet gotten around to calling the driver's
416completion handler and causing the autosuspend to fail with -EBUSY if 432suspend method. The suspend method must be responsible for
417an URB had completed too recently. 433synchronizing with the output request routine and the URB completion
434handler; it should cause autosuspends to fail with -EBUSY if the
435driver needs to use the device.
418 436
419External suspend calls should never be allowed to fail in this way, 437External suspend calls should never be allowed to fail in this way,
420only autosuspend calls. The driver can tell them apart by checking 438only autosuspend calls. The driver can tell them apart by checking
@@ -422,75 +440,23 @@ the PM_EVENT_AUTO bit in the message.event argument to the suspend
422method; this bit will be set for internal PM events (autosuspend) and 440method; this bit will be set for internal PM events (autosuspend) and
423clear for external PM events. 441clear for external PM events.
424 442
425Many of the ingredients in the autosuspend framework are oriented
426towards interfaces: The usb_interface structure contains the
427pm_usage_cnt field, and the usb_autopm_* routines take an interface
428pointer as their argument. But somewhat confusingly, a few of the
429pieces (i.e., usb_mark_last_busy()) use the usb_device structure
430instead. Drivers need to keep this straight; they can call
431interface_to_usbdev() to find the device structure for a given
432interface.
433
434 443
435 Locking requirements 444 Mutual exclusion
436 -------------------- 445 ----------------
437 446
438All three suspend/resume methods are always called while holding the 447For external events -- but not necessarily for autosuspend or
439usb_device's PM mutex. For external events -- but not necessarily for 448autoresume -- the device semaphore (udev->dev.sem) will be held when a
440autosuspend or autoresume -- the device semaphore (udev->dev.sem) will 449suspend or resume method is called. This implies that external
441also be held. This implies that external suspend/resume events are 450suspend/resume events are mutually exclusive with calls to probe,
442mutually exclusive with calls to probe, disconnect, pre_reset, and 451disconnect, pre_reset, and post_reset; the USB core guarantees that
443post_reset; the USB core guarantees that this is true of internal 452this is true of autosuspend/autoresume events as well.
444suspend/resume events as well.
445 453
446If a driver wants to block all suspend/resume calls during some 454If a driver wants to block all suspend/resume calls during some
447critical section, it can simply acquire udev->pm_mutex. Note that 455critical section, the best way is to lock the device and call
448calls to resume may be triggered indirectly. Block IO due to memory 456usb_autopm_get_interface() (and do the reverse at the end of the
449allocations can make the vm subsystem resume a device. Thus while 457critical section). Holding the device semaphore will block all
450holding this lock you must not allocate memory with GFP_KERNEL or 458external PM calls, and the usb_autopm_get_interface() will prevent any
451GFP_NOFS. 459internal PM calls, even if it fails. (Exercise: Why?)
452
453Alternatively, if the critical section might call some of the
454usb_autopm_* routines, the driver can avoid deadlock by doing:
455
456 down(&udev->dev.sem);
457 rc = usb_autopm_get_interface(intf);
458
459and at the end of the critical section:
460
461 if (!rc)
462 usb_autopm_put_interface(intf);
463 up(&udev->dev.sem);
464
465Holding the device semaphore will block all external PM calls, and the
466usb_autopm_get_interface() will prevent any internal PM calls, even if
467it fails. (Exercise: Why?)
468
469The rules for locking order are:
470
471 Never acquire any device semaphore while holding any PM mutex.
472
473 Never acquire udev->pm_mutex while holding the PM mutex for
474 a device that isn't a descendant of udev.
475
476In other words, PM mutexes should only be acquired going up the device
477tree, and they should be acquired only after locking all the device
478semaphores you need to hold. These rules don't matter to drivers very
479much; they usually affect just the USB core.
480
481Still, drivers do need to be careful. For example, many drivers use a
482private mutex to synchronize their normal I/O activities with their
483disconnect method. Now if the driver supports autosuspend then it
484must call usb_autopm_put_interface() from somewhere -- maybe from its
485close method. It should make the call while holding the private mutex,
486since a driver shouldn't call any of the usb_autopm_* functions for an
487interface from which it has been unbound.
488
489But the usb_autpm_* routines always acquire the device's PM mutex, and
490consequently the locking order has to be: private mutex first, PM
491mutex second. Since the suspend method is always called with the PM
492mutex held, it mustn't try to acquire the private mutex. It has to
493synchronize with the driver's I/O activities in some other way.
494 460
495 461
496 Interaction between dynamic PM and system PM 462 Interaction between dynamic PM and system PM
@@ -499,22 +465,11 @@ synchronize with the driver's I/O activities in some other way.
499Dynamic power management and system power management can interact in 465Dynamic power management and system power management can interact in
500a couple of ways. 466a couple of ways.
501 467
502Firstly, a device may already be manually suspended or autosuspended 468Firstly, a device may already be autosuspended when a system suspend
503when a system suspend occurs. Since system suspends are supposed to 469occurs. Since system suspends are supposed to be as transparent as
504be as transparent as possible, the device should remain suspended 470possible, the device should remain suspended following the system
505following the system resume. The 2.6.23 kernel obeys this principle 471resume. But this theory may not work out well in practice; over time
506for manually suspended devices but not for autosuspended devices; they 472the kernel's behavior in this regard has changed.
507do get resumed when the system wakes up. (Presumably they will be
508autosuspended again after their idle-delay time expires.) In later
509kernels this behavior will be fixed.
510
511(There is an exception. If a device would undergo a reset-resume
512instead of a normal resume, and the device is enabled for remote
513wakeup, then the reset-resume takes place even if the device was
514already suspended when the system suspend began. The justification is
515that a reset-resume is a kind of remote-wakeup event. Or to put it
516another way, a device which needs a reset won't be able to generate
517normal remote-wakeup signals, so it ought to be resumed immediately.)
518 473
519Secondly, a dynamic power-management event may occur as a system 474Secondly, a dynamic power-management event may occur as a system
520suspend is underway. The window for this is short, since system 475suspend is underway. The window for this is short, since system
diff --git a/Documentation/video4linux/CARDLIST.cx23885 b/Documentation/video4linux/CARDLIST.cx23885
index 7539e8fa1ffd..16ca030e1185 100644
--- a/Documentation/video4linux/CARDLIST.cx23885
+++ b/Documentation/video4linux/CARDLIST.cx23885
@@ -26,3 +26,4 @@
26 25 -> Compro VideoMate E800 [1858:e800] 26 25 -> Compro VideoMate E800 [1858:e800]
27 26 -> Hauppauge WinTV-HVR1290 [0070:8551] 27 26 -> Hauppauge WinTV-HVR1290 [0070:8551]
28 27 -> Mygica X8558 PRO DMB-TH [14f1:8578] 28 27 -> Mygica X8558 PRO DMB-TH [14f1:8578]
29 28 -> LEADTEK WinFast PxTV1200 [107d:6f22]
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134
index fce1e7eb0474..b4a767060ed7 100644
--- a/Documentation/video4linux/CARDLIST.saa7134
+++ b/Documentation/video4linux/CARDLIST.saa7134
@@ -174,3 +174,4 @@
174173 -> Zolid Hybrid TV Tuner PCI [1131:2004] 174173 -> Zolid Hybrid TV Tuner PCI [1131:2004]
175174 -> Asus Europa Hybrid OEM [1043:4847] 175174 -> Asus Europa Hybrid OEM [1043:4847]
176175 -> Leadtek Winfast DTV1000S [107d:6655] 176175 -> Leadtek Winfast DTV1000S [107d:6655]
177176 -> Beholder BeholdTV 505 RDS [0000:5051]
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner
index e0d298fe8830..9b2e0dd6017e 100644
--- a/Documentation/video4linux/CARDLIST.tuner
+++ b/Documentation/video4linux/CARDLIST.tuner
@@ -81,3 +81,4 @@ tuner=80 - Philips FQ1216LME MK3 PAL/SECAM w/active loopthrough
81tuner=81 - Partsnic (Daewoo) PTI-5NF05 81tuner=81 - Partsnic (Daewoo) PTI-5NF05
82tuner=82 - Philips CU1216L 82tuner=82 - Philips CU1216L
83tuner=83 - NXP TDA18271 83tuner=83 - NXP TDA18271
84tuner=84 - Sony BTF-Pxn01Z
diff --git a/Documentation/video4linux/README.tlg2300 b/Documentation/video4linux/README.tlg2300
new file mode 100644
index 000000000000..416ccb93d8c9
--- /dev/null
+++ b/Documentation/video4linux/README.tlg2300
@@ -0,0 +1,47 @@
1tlg2300 release notes
2====================
3
4This is a v4l2/dvb device driver for the tlg2300 chip.
5
6
7current status
8==============
9
10video
11 - support mmap and read().(no overlay)
12
13audio
14 - The driver will register a ALSA card for the audio input.
15
16vbi
17 - Works for almost TV norms.
18
19dvb-t
20 - works for DVB-T
21
22FM
23 - Works for radio.
24
25---------------------------------------------------------------------------
26TESTED APPLICATIONS:
27
28-VLC1.0.4 test the video and dvb. The GUI is friendly to use.
29
30-Mplayer test the video.
31
32-Mplayer test the FM. The mplayer should be compiled with --enable-radio and
33 --enable-radio-capture.
34 The command runs as this(The alsa audio registers to card 1):
35 #mplayer radio://103.7/capture/ -radio adevice=hw=1,0:arate=48000 \
36 -rawaudio rate=48000:channels=2
37
38---------------------------------------------------------------------------
39KNOWN PROBLEMS:
40about preemphasis:
41 You can set the preemphasis for radio by the following command:
42 #v4l2-ctl -d /dev/radio0 --set-ctrl=pre_emphasis_settings=1
43
44 "pre_emphasis_settings=1" means that you select the 50us. If you want
45 to select the 75us, please use "pre_emphasis_settings=2"
46
47
diff --git a/Documentation/video4linux/gspca.txt b/Documentation/video4linux/gspca.txt
index 1800a62cf135..181b9e6fd984 100644
--- a/Documentation/video4linux/gspca.txt
+++ b/Documentation/video4linux/gspca.txt
@@ -42,6 +42,7 @@ ov519 041e:4064 Creative Live! VISTA VF0420
42ov519 041e:4067 Creative Live! Cam Video IM (VF0350) 42ov519 041e:4067 Creative Live! Cam Video IM (VF0350)
43ov519 041e:4068 Creative Live! VISTA VF0470 43ov519 041e:4068 Creative Live! VISTA VF0470
44spca561 0458:7004 Genius VideoCAM Express V2 44spca561 0458:7004 Genius VideoCAM Express V2
45sn9c2028 0458:7005 Genius Smart 300, version 2
45sunplus 0458:7006 Genius Dsc 1.3 Smart 46sunplus 0458:7006 Genius Dsc 1.3 Smart
46zc3xx 0458:7007 Genius VideoCam V2 47zc3xx 0458:7007 Genius VideoCam V2
47zc3xx 0458:700c Genius VideoCam V3 48zc3xx 0458:700c Genius VideoCam V3
@@ -109,6 +110,7 @@ sunplus 04a5:3003 Benq DC 1300
109sunplus 04a5:3008 Benq DC 1500 110sunplus 04a5:3008 Benq DC 1500
110sunplus 04a5:300a Benq DC 3410 111sunplus 04a5:300a Benq DC 3410
111spca500 04a5:300c Benq DC 1016 112spca500 04a5:300c Benq DC 1016
113benq 04a5:3035 Benq DC E300
112finepix 04cb:0104 Fujifilm FinePix 4800 114finepix 04cb:0104 Fujifilm FinePix 4800
113finepix 04cb:0109 Fujifilm FinePix A202 115finepix 04cb:0109 Fujifilm FinePix A202
114finepix 04cb:010b Fujifilm FinePix A203 116finepix 04cb:010b Fujifilm FinePix A203
@@ -142,6 +144,7 @@ sunplus 04fc:5360 Sunplus Generic
142spca500 04fc:7333 PalmPixDC85 144spca500 04fc:7333 PalmPixDC85
143sunplus 04fc:ffff Pure DigitalDakota 145sunplus 04fc:ffff Pure DigitalDakota
144spca501 0506:00df 3Com HomeConnect Lite 146spca501 0506:00df 3Com HomeConnect Lite
147sunplus 052b:1507 Megapixel 5 Pretec DC-1007
145sunplus 052b:1513 Megapix V4 148sunplus 052b:1513 Megapix V4
146sunplus 052b:1803 MegaImage VI 149sunplus 052b:1803 MegaImage VI
147tv8532 0545:808b Veo Stingray 150tv8532 0545:808b Veo Stingray
@@ -151,6 +154,7 @@ sunplus 0546:3191 Polaroid Ion 80
151sunplus 0546:3273 Polaroid PDC2030 154sunplus 0546:3273 Polaroid PDC2030
152ov519 054c:0154 Sonny toy4 155ov519 054c:0154 Sonny toy4
153ov519 054c:0155 Sonny toy5 156ov519 054c:0155 Sonny toy5
157cpia1 0553:0002 CPIA CPiA (version1) based cameras
154zc3xx 055f:c005 Mustek Wcam300A 158zc3xx 055f:c005 Mustek Wcam300A
155spca500 055f:c200 Mustek Gsmart 300 159spca500 055f:c200 Mustek Gsmart 300
156sunplus 055f:c211 Kowa Bs888e Microcamera 160sunplus 055f:c211 Kowa Bs888e Microcamera
@@ -188,8 +192,7 @@ spca500 06bd:0404 Agfa CL20
188spca500 06be:0800 Optimedia 192spca500 06be:0800 Optimedia
189sunplus 06d6:0031 Trust 610 LCD PowerC@m Zoom 193sunplus 06d6:0031 Trust 610 LCD PowerC@m Zoom
190spca506 06e1:a190 ADS Instant VCD 194spca506 06e1:a190 ADS Instant VCD
191ov534 06f8:3002 Hercules Blog Webcam 195ov534_9 06f8:3003 Hercules Dualpix HD Weblog
192ov534 06f8:3003 Hercules Dualpix HD Weblog
193sonixj 06f8:3004 Hercules Classic Silver 196sonixj 06f8:3004 Hercules Classic Silver
194sonixj 06f8:3008 Hercules Deluxe Optical Glass 197sonixj 06f8:3008 Hercules Deluxe Optical Glass
195pac7302 06f8:3009 Hercules Classic Link 198pac7302 06f8:3009 Hercules Classic Link
@@ -204,6 +207,7 @@ sunplus 0733:2221 Mercury Digital Pro 3.1p
204sunplus 0733:3261 Concord 3045 spca536a 207sunplus 0733:3261 Concord 3045 spca536a
205sunplus 0733:3281 Cyberpix S550V 208sunplus 0733:3281 Cyberpix S550V
206spca506 0734:043b 3DeMon USB Capture aka 209spca506 0734:043b 3DeMon USB Capture aka
210cpia1 0813:0001 QX3 camera
207ov519 0813:0002 Dual Mode USB Camera Plus 211ov519 0813:0002 Dual Mode USB Camera Plus
208spca500 084d:0003 D-Link DSC-350 212spca500 084d:0003 D-Link DSC-350
209spca500 08ca:0103 Aiptek PocketDV 213spca500 08ca:0103 Aiptek PocketDV
@@ -225,7 +229,8 @@ sunplus 08ca:2050 Medion MD 41437
225sunplus 08ca:2060 Aiptek PocketDV5300 229sunplus 08ca:2060 Aiptek PocketDV5300
226tv8532 0923:010f ICM532 cams 230tv8532 0923:010f ICM532 cams
227mars 093a:050f Mars-Semi Pc-Camera 231mars 093a:050f Mars-Semi Pc-Camera
228mr97310a 093a:010f Sakar Digital no. 77379 232mr97310a 093a:010e All known CIF cams with this ID
233mr97310a 093a:010f All known VGA cams with this ID
229pac207 093a:2460 Qtec Webcam 100 234pac207 093a:2460 Qtec Webcam 100
230pac207 093a:2461 HP Webcam 235pac207 093a:2461 HP Webcam
231pac207 093a:2463 Philips SPC 220 NC 236pac207 093a:2463 Philips SPC 220 NC
@@ -302,6 +307,7 @@ sonixj 0c45:613b Surfer SN-206
302sonixj 0c45:613c Sonix Pccam168 307sonixj 0c45:613c Sonix Pccam168
303sonixj 0c45:6143 Sonix Pccam168 308sonixj 0c45:6143 Sonix Pccam168
304sonixj 0c45:6148 Digitus DA-70811/ZSMC USB PC Camera ZS211/Microdia 309sonixj 0c45:6148 Digitus DA-70811/ZSMC USB PC Camera ZS211/Microdia
310sonixj 0c45:614a Frontech E-Ccam (JIL-2225)
305sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001) 311sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001)
306sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111) 312sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111)
307sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655) 313sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655)
@@ -324,6 +330,10 @@ sn9c20x 0c45:62b0 PC Camera (SN9C202 + MT9V011/MT9V111/MT9V112)
324sn9c20x 0c45:62b3 PC Camera (SN9C202 + OV9655) 330sn9c20x 0c45:62b3 PC Camera (SN9C202 + OV9655)
325sn9c20x 0c45:62bb PC Camera (SN9C202 + OV7660) 331sn9c20x 0c45:62bb PC Camera (SN9C202 + OV7660)
326sn9c20x 0c45:62bc PC Camera (SN9C202 + HV7131R) 332sn9c20x 0c45:62bc PC Camera (SN9C202 + HV7131R)
333sn9c2028 0c45:8001 Wild Planet Digital Spy Camera
334sn9c2028 0c45:8003 Sakar #11199, #6637x, #67480 keychain cams
335sn9c2028 0c45:8008 Mini-Shotz ms-350
336sn9c2028 0c45:800a Vivitar Vivicam 3350B
327sunplus 0d64:0303 Sunplus FashionCam DXG 337sunplus 0d64:0303 Sunplus FashionCam DXG
328ov519 0e96:c001 TRUST 380 USB2 SPACEC@M 338ov519 0e96:c001 TRUST 380 USB2 SPACEC@M
329etoms 102c:6151 Qcam Sangha CIF 339etoms 102c:6151 Qcam Sangha CIF
@@ -341,10 +351,11 @@ spca501 1776:501c Arowana 300K CMOS Camera
341t613 17a1:0128 TASCORP JPEG Webcam, NGS Cyclops 351t613 17a1:0128 TASCORP JPEG Webcam, NGS Cyclops
342vc032x 17ef:4802 Lenovo Vc0323+MI1310_SOC 352vc032x 17ef:4802 Lenovo Vc0323+MI1310_SOC
343pac207 2001:f115 D-Link DSB-C120 353pac207 2001:f115 D-Link DSB-C120
344sq905c 2770:9050 sq905c 354sq905c 2770:9050 Disney pix micro (CIF)
345sq905c 2770:905c DualCamera 355sq905c 2770:9052 Disney pix micro 2 (VGA)
346sq905 2770:9120 Argus Digital Camera DC1512 356sq905c 2770:905c All 11 known cameras with this ID
347sq905c 2770:913d sq905c 357sq905 2770:9120 All 24 known cameras with this ID
358sq905c 2770:913d All 4 known cameras with this ID
348spca500 2899:012c Toptro Industrial 359spca500 2899:012c Toptro Industrial
349ov519 8020:ef04 ov519 360ov519 8020:ef04 ov519
350spca508 8086:0110 Intel Easy PC Camera 361spca508 8086:0110 Intel Easy PC Camera
diff --git a/Documentation/video4linux/v4l2-framework.txt b/Documentation/video4linux/v4l2-framework.txt
index 74d677c8b036..5155700c206b 100644
--- a/Documentation/video4linux/v4l2-framework.txt
+++ b/Documentation/video4linux/v4l2-framework.txt
@@ -599,99 +599,13 @@ video_device::minor fields.
599video buffer helper functions 599video buffer helper functions
600----------------------------- 600-----------------------------
601 601
602The v4l2 core API provides a standard method for dealing with video 602The v4l2 core API provides a set of standard methods (called "videobuf")
603buffers. Those methods allow a driver to implement read(), mmap() and 603for dealing with video buffers. Those methods allow a driver to implement
604overlay() on a consistent way. 604read(), mmap() and overlay() in a consistent way. There are currently
605 605methods for using video buffers on devices that supports DMA with
606There are currently methods for using video buffers on devices that 606scatter/gather method (videobuf-dma-sg), DMA with linear access
607supports DMA with scatter/gather method (videobuf-dma-sg), DMA with 607(videobuf-dma-contig), and vmalloced buffers, mostly used on USB drivers
608linear access (videobuf-dma-contig), and vmalloced buffers, mostly 608(videobuf-vmalloc).
609used on USB drivers (videobuf-vmalloc). 609
610 610Please see Documentation/video4linux/videobuf for more information on how
611Any driver using videobuf should provide operations (callbacks) for 611to use the videobuf layer.
612four handlers:
613
614ops->buf_setup - calculates the size of the video buffers and avoid they
615 to waste more than some maximum limit of RAM;
616ops->buf_prepare - fills the video buffer structs and calls
617 videobuf_iolock() to alloc and prepare mmaped memory;
618ops->buf_queue - advices the driver that another buffer were
619 requested (by read() or by QBUF);
620ops->buf_release - frees any buffer that were allocated.
621
622In order to use it, the driver need to have a code (generally called at
623interrupt context) that will properly handle the buffer request lists,
624announcing that a new buffer were filled.
625
626The irq handling code should handle the videobuf task lists, in order
627to advice videobuf that a new frame were filled, in order to honor to a
628request. The code is generally like this one:
629 if (list_empty(&dma_q->active))
630 return;
631
632 buf = list_entry(dma_q->active.next, struct vbuffer, vb.queue);
633
634 if (!waitqueue_active(&buf->vb.done))
635 return;
636
637 /* Some logic to handle the buf may be needed here */
638
639 list_del(&buf->vb.queue);
640 do_gettimeofday(&buf->vb.ts);
641 wake_up(&buf->vb.done);
642
643Those are the videobuffer functions used on drivers, implemented on
644videobuf-core:
645
646- Videobuf init functions
647 videobuf_queue_sg_init()
648 Initializes the videobuf infrastructure. This function should be
649 called before any other videobuf function on drivers that uses DMA
650 Scatter/Gather buffers.
651
652 videobuf_queue_dma_contig_init
653 Initializes the videobuf infrastructure. This function should be
654 called before any other videobuf function on drivers that need DMA
655 contiguous buffers.
656
657 videobuf_queue_vmalloc_init()
658 Initializes the videobuf infrastructure. This function should be
659 called before any other videobuf function on USB (and other drivers)
660 that need a vmalloced type of videobuf.
661
662- videobuf_iolock()
663 Prepares the videobuf memory for the proper method (read, mmap, overlay).
664
665- videobuf_queue_is_busy()
666 Checks if a videobuf is streaming.
667
668- videobuf_queue_cancel()
669 Stops video handling.
670
671- videobuf_mmap_free()
672 frees mmap buffers.
673
674- videobuf_stop()
675 Stops video handling, ends mmap and frees mmap and other buffers.
676
677- V4L2 api functions. Those functions correspond to VIDIOC_foo ioctls:
678 videobuf_reqbufs(), videobuf_querybuf(), videobuf_qbuf(),
679 videobuf_dqbuf(), videobuf_streamon(), videobuf_streamoff().
680
681- V4L1 api function (corresponds to VIDIOCMBUF ioctl):
682 videobuf_cgmbuf()
683 This function is used to provide backward compatibility with V4L1
684 API.
685
686- Some help functions for read()/poll() operations:
687 videobuf_read_stream()
688 For continuous stream read()
689 videobuf_read_one()
690 For snapshot read()
691 videobuf_poll_stream()
692 polling help function
693
694The better way to understand it is to take a look at vivi driver. One
695of the main reasons for vivi is to be a videobuf usage example. the
696vivi_thread_tick() does the task that the IRQ callback would do on PCI
697drivers (or the irq callback on USB).
diff --git a/Documentation/video4linux/videobuf b/Documentation/video4linux/videobuf
new file mode 100644
index 000000000000..17a1f9abf260
--- /dev/null
+++ b/Documentation/video4linux/videobuf
@@ -0,0 +1,360 @@
1An introduction to the videobuf layer
2Jonathan Corbet <corbet@lwn.net>
3Current as of 2.6.33
4
5The videobuf layer functions as a sort of glue layer between a V4L2 driver
6and user space. It handles the allocation and management of buffers for
7the storage of video frames. There is a set of functions which can be used
8to implement many of the standard POSIX I/O system calls, including read(),
9poll(), and, happily, mmap(). Another set of functions can be used to
10implement the bulk of the V4L2 ioctl() calls related to streaming I/O,
11including buffer allocation, queueing and dequeueing, and streaming
12control. Using videobuf imposes a few design decisions on the driver
13author, but the payback comes in the form of reduced code in the driver and
14a consistent implementation of the V4L2 user-space API.
15
16Buffer types
17
18Not all video devices use the same kind of buffers. In fact, there are (at
19least) three common variations:
20
21 - Buffers which are scattered in both the physical and (kernel) virtual
22 address spaces. (Almost) all user-space buffers are like this, but it
23 makes great sense to allocate kernel-space buffers this way as well when
24 it is possible. Unfortunately, it is not always possible; working with
25 this kind of buffer normally requires hardware which can do
26 scatter/gather DMA operations.
27
28 - Buffers which are physically scattered, but which are virtually
29 contiguous; buffers allocated with vmalloc(), in other words. These
30 buffers are just as hard to use for DMA operations, but they can be
31 useful in situations where DMA is not available but virtually-contiguous
32 buffers are convenient.
33
34 - Buffers which are physically contiguous. Allocation of this kind of
35 buffer can be unreliable on fragmented systems, but simpler DMA
36 controllers cannot deal with anything else.
37
38Videobuf can work with all three types of buffers, but the driver author
39must pick one at the outset and design the driver around that decision.
40
41[It's worth noting that there's a fourth kind of buffer: "overlay" buffers
42which are located within the system's video memory. The overlay
43functionality is considered to be deprecated for most use, but it still
44shows up occasionally in system-on-chip drivers where the performance
45benefits merit the use of this technique. Overlay buffers can be handled
46as a form of scattered buffer, but there are very few implementations in
47the kernel and a description of this technique is currently beyond the
48scope of this document.]
49
50Data structures, callbacks, and initialization
51
52Depending on which type of buffers are being used, the driver should
53include one of the following files:
54
55 <media/videobuf-dma-sg.h> /* Physically scattered */
56 <media/videobuf-vmalloc.h> /* vmalloc() buffers */
57 <media/videobuf-dma-contig.h> /* Physically contiguous */
58
59The driver's data structure describing a V4L2 device should include a
60struct videobuf_queue instance for the management of the buffer queue,
61along with a list_head for the queue of available buffers. There will also
62need to be an interrupt-safe spinlock which is used to protect (at least)
63the queue.
64
65The next step is to write four simple callbacks to help videobuf deal with
66the management of buffers:
67
68 struct videobuf_queue_ops {
69 int (*buf_setup)(struct videobuf_queue *q,
70 unsigned int *count, unsigned int *size);
71 int (*buf_prepare)(struct videobuf_queue *q,
72 struct videobuf_buffer *vb,
73 enum v4l2_field field);
74 void (*buf_queue)(struct videobuf_queue *q,
75 struct videobuf_buffer *vb);
76 void (*buf_release)(struct videobuf_queue *q,
77 struct videobuf_buffer *vb);
78 };
79
80buf_setup() is called early in the I/O process, when streaming is being
81initiated; its purpose is to tell videobuf about the I/O stream. The count
82parameter will be a suggested number of buffers to use; the driver should
83check it for rationality and adjust it if need be. As a practical rule, a
84minimum of two buffers are needed for proper streaming, and there is
85usually a maximum (which cannot exceed 32) which makes sense for each
86device. The size parameter should be set to the expected (maximum) size
87for each frame of data.
88
89Each buffer (in the form of a struct videobuf_buffer pointer) will be
90passed to buf_prepare(), which should set the buffer's size, width, height,
91and field fields properly. If the buffer's state field is
92VIDEOBUF_NEEDS_INIT, the driver should pass it to:
93
94 int videobuf_iolock(struct videobuf_queue* q, struct videobuf_buffer *vb,
95 struct v4l2_framebuffer *fbuf);
96
97Among other things, this call will usually allocate memory for the buffer.
98Finally, the buf_prepare() function should set the buffer's state to
99VIDEOBUF_PREPARED.
100
101When a buffer is queued for I/O, it is passed to buf_queue(), which should
102put it onto the driver's list of available buffers and set its state to
103VIDEOBUF_QUEUED. Note that this function is called with the queue spinlock
104held; if it tries to acquire it as well things will come to a screeching
105halt. Yes, this is the voice of experience. Note also that videobuf may
106wait on the first buffer in the queue; placing other buffers in front of it
107could again gum up the works. So use list_add_tail() to enqueue buffers.
108
109Finally, buf_release() is called when a buffer is no longer intended to be
110used. The driver should ensure that there is no I/O active on the buffer,
111then pass it to the appropriate free routine(s):
112
113 /* Scatter/gather drivers */
114 int videobuf_dma_unmap(struct videobuf_queue *q,
115 struct videobuf_dmabuf *dma);
116 int videobuf_dma_free(struct videobuf_dmabuf *dma);
117
118 /* vmalloc drivers */
119 void videobuf_vmalloc_free (struct videobuf_buffer *buf);
120
121 /* Contiguous drivers */
122 void videobuf_dma_contig_free(struct videobuf_queue *q,
123 struct videobuf_buffer *buf);
124
125One way to ensure that a buffer is no longer under I/O is to pass it to:
126
127 int videobuf_waiton(struct videobuf_buffer *vb, int non_blocking, int intr);
128
129Here, vb is the buffer, non_blocking indicates whether non-blocking I/O
130should be used (it should be zero in the buf_release() case), and intr
131controls whether an interruptible wait is used.
132
133File operations
134
135At this point, much of the work is done; much of the rest is slipping
136videobuf calls into the implementation of the other driver callbacks. The
137first step is in the open() function, which must initialize the
138videobuf queue. The function to use depends on the type of buffer used:
139
140 void videobuf_queue_sg_init(struct videobuf_queue *q,
141 struct videobuf_queue_ops *ops,
142 struct device *dev,
143 spinlock_t *irqlock,
144 enum v4l2_buf_type type,
145 enum v4l2_field field,
146 unsigned int msize,
147 void *priv);
148
149 void videobuf_queue_vmalloc_init(struct videobuf_queue *q,
150 struct videobuf_queue_ops *ops,
151 struct device *dev,
152 spinlock_t *irqlock,
153 enum v4l2_buf_type type,
154 enum v4l2_field field,
155 unsigned int msize,
156 void *priv);
157
158 void videobuf_queue_dma_contig_init(struct videobuf_queue *q,
159 struct videobuf_queue_ops *ops,
160 struct device *dev,
161 spinlock_t *irqlock,
162 enum v4l2_buf_type type,
163 enum v4l2_field field,
164 unsigned int msize,
165 void *priv);
166
167In each case, the parameters are the same: q is the queue structure for the
168device, ops is the set of callbacks as described above, dev is the device
169structure for this video device, irqlock is an interrupt-safe spinlock to
170protect access to the data structures, type is the buffer type used by the
171device (cameras will use V4L2_BUF_TYPE_VIDEO_CAPTURE, for example), field
172describes which field is being captured (often V4L2_FIELD_NONE for
173progressive devices), msize is the size of any containing structure used
174around struct videobuf_buffer, and priv is a private data pointer which
175shows up in the priv_data field of struct videobuf_queue. Note that these
176are void functions which, evidently, are immune to failure.
177
178V4L2 capture drivers can be written to support either of two APIs: the
179read() system call and the rather more complicated streaming mechanism. As
180a general rule, it is necessary to support both to ensure that all
181applications have a chance of working with the device. Videobuf makes it
182easy to do that with the same code. To implement read(), the driver need
183only make a call to one of:
184
185 ssize_t videobuf_read_one(struct videobuf_queue *q,
186 char __user *data, size_t count,
187 loff_t *ppos, int nonblocking);
188
189 ssize_t videobuf_read_stream(struct videobuf_queue *q,
190 char __user *data, size_t count,
191 loff_t *ppos, int vbihack, int nonblocking);
192
193Either one of these functions will read frame data into data, returning the
194amount actually read; the difference is that videobuf_read_one() will only
195read a single frame, while videobuf_read_stream() will read multiple frames
196if they are needed to satisfy the count requested by the application. A
197typical driver read() implementation will start the capture engine, call
198one of the above functions, then stop the engine before returning (though a
199smarter implementation might leave the engine running for a little while in
200anticipation of another read() call happening in the near future).
201
202The poll() function can usually be implemented with a direct call to:
203
204 unsigned int videobuf_poll_stream(struct file *file,
205 struct videobuf_queue *q,
206 poll_table *wait);
207
208Note that the actual wait queue eventually used will be the one associated
209with the first available buffer.
210
211When streaming I/O is done to kernel-space buffers, the driver must support
212the mmap() system call to enable user space to access the data. In many
213V4L2 drivers, the often-complex mmap() implementation simplifies to a
214single call to:
215
216 int videobuf_mmap_mapper(struct videobuf_queue *q,
217 struct vm_area_struct *vma);
218
219Everything else is handled by the videobuf code.
220
221The release() function requires two separate videobuf calls:
222
223 void videobuf_stop(struct videobuf_queue *q);
224 int videobuf_mmap_free(struct videobuf_queue *q);
225
226The call to videobuf_stop() terminates any I/O in progress - though it is
227still up to the driver to stop the capture engine. The call to
228videobuf_mmap_free() will ensure that all buffers have been unmapped; if
229so, they will all be passed to the buf_release() callback. If buffers
230remain mapped, videobuf_mmap_free() returns an error code instead. The
231purpose is clearly to cause the closing of the file descriptor to fail if
232buffers are still mapped, but every driver in the 2.6.32 kernel cheerfully
233ignores its return value.
234
235ioctl() operations
236
237The V4L2 API includes a very long list of driver callbacks to respond to
238the many ioctl() commands made available to user space. A number of these
239- those associated with streaming I/O - turn almost directly into videobuf
240calls. The relevant helper functions are:
241
242 int videobuf_reqbufs(struct videobuf_queue *q,
243 struct v4l2_requestbuffers *req);
244 int videobuf_querybuf(struct videobuf_queue *q, struct v4l2_buffer *b);
245 int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b);
246 int videobuf_dqbuf(struct videobuf_queue *q, struct v4l2_buffer *b,
247 int nonblocking);
248 int videobuf_streamon(struct videobuf_queue *q);
249 int videobuf_streamoff(struct videobuf_queue *q);
250 int videobuf_cgmbuf(struct videobuf_queue *q, struct video_mbuf *mbuf,
251 int count);
252
253So, for example, a VIDIOC_REQBUFS call turns into a call to the driver's
254vidioc_reqbufs() callback which, in turn, usually only needs to locate the
255proper struct videobuf_queue pointer and pass it to videobuf_reqbufs().
256These support functions can replace a great deal of buffer management
257boilerplate in a lot of V4L2 drivers.
258
259The vidioc_streamon() and vidioc_streamoff() functions will be a bit more
260complex, of course, since they will also need to deal with starting and
261stopping the capture engine. videobuf_cgmbuf(), called from the driver's
262vidiocgmbuf() function, only exists if the V4L1 compatibility module has
263been selected with CONFIG_VIDEO_V4L1_COMPAT, so its use must be surrounded
264with #ifdef directives.
265
266Buffer allocation
267
268Thus far, we have talked about buffers, but have not looked at how they are
269allocated. The scatter/gather case is the most complex on this front. For
270allocation, the driver can leave buffer allocation entirely up to the
271videobuf layer; in this case, buffers will be allocated as anonymous
272user-space pages and will be very scattered indeed. If the application is
273using user-space buffers, no allocation is needed; the videobuf layer will
274take care of calling get_user_pages() and filling in the scatterlist array.
275
276If the driver needs to do its own memory allocation, it should be done in
277the vidioc_reqbufs() function, *after* calling videobuf_reqbufs(). The
278first step is a call to:
279
280 struct videobuf_dmabuf *videobuf_to_dma(struct videobuf_buffer *buf);
281
282The returned videobuf_dmabuf structure (defined in
283<media/videobuf-dma-sg.h>) includes a couple of relevant fields:
284
285 struct scatterlist *sglist;
286 int sglen;
287
288The driver must allocate an appropriately-sized scatterlist array and
289populate it with pointers to the pieces of the allocated buffer; sglen
290should be set to the length of the array.
291
292Drivers using the vmalloc() method need not (and cannot) concern themselves
293with buffer allocation at all; videobuf will handle those details. The
294same is normally true of contiguous-DMA drivers as well; videobuf will
295allocate the buffers (with dma_alloc_coherent()) when it sees fit. That
296means that these drivers may be trying to do high-order allocations at any
297time, an operation which is not always guaranteed to work. Some drivers
298play tricks by allocating DMA space at system boot time; videobuf does not
299currently play well with those drivers.
300
301As of 2.6.31, contiguous-DMA drivers can work with a user-supplied buffer,
302as long as that buffer is physically contiguous. Normal user-space
303allocations will not meet that criterion, but buffers obtained from other
304kernel drivers, or those contained within huge pages, will work with these
305drivers.
306
307Filling the buffers
308
309The final part of a videobuf implementation has no direct callback - it's
310the portion of the code which actually puts frame data into the buffers,
311usually in response to interrupts from the device. For all types of
312drivers, this process works approximately as follows:
313
314 - Obtain the next available buffer and make sure that somebody is actually
315 waiting for it.
316
317 - Get a pointer to the memory and put video data there.
318
319 - Mark the buffer as done and wake up the process waiting for it.
320
321Step (1) above is done by looking at the driver-managed list_head structure
322- the one which is filled in the buf_queue() callback. Because starting
323the engine and enqueueing buffers are done in separate steps, it's possible
324for the engine to be running without any buffers available - in the
325vmalloc() case especially. So the driver should be prepared for the list
326to be empty. It is equally possible that nobody is yet interested in the
327buffer; the driver should not remove it from the list or fill it until a
328process is waiting on it. That test can be done by examining the buffer's
329done field (a wait_queue_head_t structure) with waitqueue_active().
330
331A buffer's state should be set to VIDEOBUF_ACTIVE before being mapped for
332DMA; that ensures that the videobuf layer will not try to do anything with
333it while the device is transferring data.
334
335For scatter/gather drivers, the needed memory pointers will be found in the
336scatterlist structure described above. Drivers using the vmalloc() method
337can get a memory pointer with:
338
339 void *videobuf_to_vmalloc(struct videobuf_buffer *buf);
340
341For contiguous DMA drivers, the function to use is:
342
343 dma_addr_t videobuf_to_dma_contig(struct videobuf_buffer *buf);
344
345The contiguous DMA API goes out of its way to hide the kernel-space address
346of the DMA buffer from drivers.
347
348The final step is to set the size field of the relevant videobuf_buffer
349structure to the actual size of the captured image, set state to
350VIDEOBUF_DONE, then call wake_up() on the done queue. At this point, the
351buffer is owned by the videobuf layer and the driver should not touch it
352again.
353
354Developers who are interested in more information can go into the relevant
355header files; there are a few low-level functions declared there which have
356not been talked about here. Also worthwhile is the vivi driver
357(drivers/media/video/vivi.c), which is maintained as an example of how V4L2
358drivers should be written. Vivi only uses the vmalloc() API, but it's good
359enough to get started with. Note also that all of these calls are exported
360GPL-only, so they will not be available to non-GPL kernel modules.
diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
index e57d6a9dd32b..dca82d7c83d8 100644
--- a/Documentation/vm/00-INDEX
+++ b/Documentation/vm/00-INDEX
@@ -4,23 +4,35 @@ active_mm.txt
4 - An explanation from Linus about tsk->active_mm vs tsk->mm. 4 - An explanation from Linus about tsk->active_mm vs tsk->mm.
5balance 5balance
6 - various information on memory balancing. 6 - various information on memory balancing.
7hugepage-mmap.c
8 - Example app using huge page memory with the mmap system call.
9hugepage-shm.c
10 - Example app using huge page memory with Sys V shared memory system calls.
7hugetlbpage.txt 11hugetlbpage.txt
8 - a brief summary of hugetlbpage support in the Linux kernel. 12 - a brief summary of hugetlbpage support in the Linux kernel.
13hwpoison.txt
14 - explains what hwpoison is
9ksm.txt 15ksm.txt
10 - how to use the Kernel Samepage Merging feature. 16 - how to use the Kernel Samepage Merging feature.
11locking 17locking
12 - info on how locking and synchronization is done in the Linux vm code. 18 - info on how locking and synchronization is done in the Linux vm code.
19map_hugetlb.c
20 - an example program that uses the MAP_HUGETLB mmap flag.
13numa 21numa
14 - information about NUMA specific code in the Linux vm. 22 - information about NUMA specific code in the Linux vm.
15numa_memory_policy.txt 23numa_memory_policy.txt
16 - documentation of concepts and APIs of the 2.6 memory policy support. 24 - documentation of concepts and APIs of the 2.6 memory policy support.
17overcommit-accounting 25overcommit-accounting
18 - description of the Linux kernels overcommit handling modes. 26 - description of the Linux kernels overcommit handling modes.
27page-types.c
28 - Tool for querying page flags
19page_migration 29page_migration
20 - description of page migration in NUMA systems. 30 - description of page migration in NUMA systems.
31pagemap.txt
32 - pagemap, from the userspace perspective
21slabinfo.c 33slabinfo.c
22 - source code for a tool to get reports about slabs. 34 - source code for a tool to get reports about slabs.
23slub.txt 35slub.txt
24 - a short users guide for SLUB. 36 - a short users guide for SLUB.
25map_hugetlb.c 37unevictable-lru.txt
26 - an example program that uses the MAP_HUGETLB mmap flag. 38 - Unevictable LRU infrastructure
diff --git a/Documentation/vm/Makefile b/Documentation/vm/Makefile
index 5bd269b3731a..9dcff328b964 100644
--- a/Documentation/vm/Makefile
+++ b/Documentation/vm/Makefile
@@ -2,7 +2,7 @@
2obj- := dummy.o 2obj- := dummy.o
3 3
4# List of programs to build 4# List of programs to build
5hostprogs-y := slabinfo page-types 5hostprogs-y := slabinfo page-types hugepage-mmap hugepage-shm map_hugetlb
6 6
7# Tell kbuild to always build the programs 7# Tell kbuild to always build the programs
8always := $(hostprogs-y) 8always := $(hostprogs-y)
diff --git a/Documentation/vm/hugepage-mmap.c b/Documentation/vm/hugepage-mmap.c
new file mode 100644
index 000000000000..db0dd9a33d54
--- /dev/null
+++ b/Documentation/vm/hugepage-mmap.c
@@ -0,0 +1,91 @@
1/*
2 * hugepage-mmap:
3 *
4 * Example of using huge page memory in a user application using the mmap
5 * system call. Before running this application, make sure that the
6 * administrator has mounted the hugetlbfs filesystem (on some directory
7 * like /mnt) using the command mount -t hugetlbfs nodev /mnt. In this
8 * example, the app is requesting memory of size 256MB that is backed by
9 * huge pages.
10 *
11 * For the ia64 architecture, the Linux kernel reserves Region number 4 for
12 * huge pages. That means that if one requires a fixed address, a huge page
13 * aligned address starting with 0x800000... will be required. If a fixed
14 * address is not required, the kernel will select an address in the proper
15 * range.
16 * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
17 */
18
19#include <stdlib.h>
20#include <stdio.h>
21#include <unistd.h>
22#include <sys/mman.h>
23#include <fcntl.h>
24
25#define FILE_NAME "/mnt/hugepagefile"
26#define LENGTH (256UL*1024*1024)
27#define PROTECTION (PROT_READ | PROT_WRITE)
28
29/* Only ia64 requires this */
30#ifdef __ia64__
31#define ADDR (void *)(0x8000000000000000UL)
32#define FLAGS (MAP_SHARED | MAP_FIXED)
33#else
34#define ADDR (void *)(0x0UL)
35#define FLAGS (MAP_SHARED)
36#endif
37
38static void check_bytes(char *addr)
39{
40 printf("First hex is %x\n", *((unsigned int *)addr));
41}
42
43static void write_bytes(char *addr)
44{
45 unsigned long i;
46
47 for (i = 0; i < LENGTH; i++)
48 *(addr + i) = (char)i;
49}
50
51static void read_bytes(char *addr)
52{
53 unsigned long i;
54
55 check_bytes(addr);
56 for (i = 0; i < LENGTH; i++)
57 if (*(addr + i) != (char)i) {
58 printf("Mismatch at %lu\n", i);
59 break;
60 }
61}
62
63int main(void)
64{
65 void *addr;
66 int fd;
67
68 fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755);
69 if (fd < 0) {
70 perror("Open failed");
71 exit(1);
72 }
73
74 addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0);
75 if (addr == MAP_FAILED) {
76 perror("mmap");
77 unlink(FILE_NAME);
78 exit(1);
79 }
80
81 printf("Returned address is %p\n", addr);
82 check_bytes(addr);
83 write_bytes(addr);
84 read_bytes(addr);
85
86 munmap(addr, LENGTH);
87 close(fd);
88 unlink(FILE_NAME);
89
90 return 0;
91}
diff --git a/Documentation/vm/hugepage-shm.c b/Documentation/vm/hugepage-shm.c
new file mode 100644
index 000000000000..07956d8592c9
--- /dev/null
+++ b/Documentation/vm/hugepage-shm.c
@@ -0,0 +1,98 @@
1/*
2 * hugepage-shm:
3 *
4 * Example of using huge page memory in a user application using Sys V shared
5 * memory system calls. In this example the app is requesting 256MB of
6 * memory that is backed by huge pages. The application uses the flag
7 * SHM_HUGETLB in the shmget system call to inform the kernel that it is
8 * requesting huge pages.
9 *
10 * For the ia64 architecture, the Linux kernel reserves Region number 4 for
11 * huge pages. That means that if one requires a fixed address, a huge page
12 * aligned address starting with 0x800000... will be required. If a fixed
13 * address is not required, the kernel will select an address in the proper
14 * range.
15 * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
16 *
17 * Note: The default shared memory limit is quite low on many kernels,
18 * you may need to increase it via:
19 *
20 * echo 268435456 > /proc/sys/kernel/shmmax
21 *
22 * This will increase the maximum size per shared memory segment to 256MB.
23 * The other limit that you will hit eventually is shmall which is the
24 * total amount of shared memory in pages. To set it to 16GB on a system
25 * with a 4kB pagesize do:
26 *
27 * echo 4194304 > /proc/sys/kernel/shmall
28 */
29
30#include <stdlib.h>
31#include <stdio.h>
32#include <sys/types.h>
33#include <sys/ipc.h>
34#include <sys/shm.h>
35#include <sys/mman.h>
36
37#ifndef SHM_HUGETLB
38#define SHM_HUGETLB 04000
39#endif
40
41#define LENGTH (256UL*1024*1024)
42
43#define dprintf(x) printf(x)
44
45/* Only ia64 requires this */
46#ifdef __ia64__
47#define ADDR (void *)(0x8000000000000000UL)
48#define SHMAT_FLAGS (SHM_RND)
49#else
50#define ADDR (void *)(0x0UL)
51#define SHMAT_FLAGS (0)
52#endif
53
54int main(void)
55{
56 int shmid;
57 unsigned long i;
58 char *shmaddr;
59
60 if ((shmid = shmget(2, LENGTH,
61 SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) {
62 perror("shmget");
63 exit(1);
64 }
65 printf("shmid: 0x%x\n", shmid);
66
67 shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS);
68 if (shmaddr == (char *)-1) {
69 perror("Shared memory attach failure");
70 shmctl(shmid, IPC_RMID, NULL);
71 exit(2);
72 }
73 printf("shmaddr: %p\n", shmaddr);
74
75 dprintf("Starting the writes:\n");
76 for (i = 0; i < LENGTH; i++) {
77 shmaddr[i] = (char)(i);
78 if (!(i % (1024 * 1024)))
79 dprintf(".");
80 }
81 dprintf("\n");
82
83 dprintf("Starting the Check...");
84 for (i = 0; i < LENGTH; i++)
85 if (shmaddr[i] != (char)i)
86 printf("\nIndex %lu mismatched\n", i);
87 dprintf("Done.\n");
88
89 if (shmdt((const void *)shmaddr) != 0) {
90 perror("Detach failure");
91 shmctl(shmid, IPC_RMID, NULL);
92 exit(3);
93 }
94
95 shmctl(shmid, IPC_RMID, NULL);
96
97 return 0;
98}
diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index bc31636973e3..457634c1e03e 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -299,176 +299,11 @@ map_hugetlb.c.
299******************************************************************* 299*******************************************************************
300 300
301/* 301/*
302 * Example of using huge page memory in a user application using Sys V shared 302 * hugepage-shm: see Documentation/vm/hugepage-shm.c
303 * memory system calls. In this example the app is requesting 256MB of
304 * memory that is backed by huge pages. The application uses the flag
305 * SHM_HUGETLB in the shmget system call to inform the kernel that it is
306 * requesting huge pages.
307 *
308 * For the ia64 architecture, the Linux kernel reserves Region number 4 for
309 * huge pages. That means that if one requires a fixed address, a huge page
310 * aligned address starting with 0x800000... will be required. If a fixed
311 * address is not required, the kernel will select an address in the proper
312 * range.
313 * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
314 *
315 * Note: The default shared memory limit is quite low on many kernels,
316 * you may need to increase it via:
317 *
318 * echo 268435456 > /proc/sys/kernel/shmmax
319 *
320 * This will increase the maximum size per shared memory segment to 256MB.
321 * The other limit that you will hit eventually is shmall which is the
322 * total amount of shared memory in pages. To set it to 16GB on a system
323 * with a 4kB pagesize do:
324 *
325 * echo 4194304 > /proc/sys/kernel/shmall
326 */ 303 */
327#include <stdlib.h>
328#include <stdio.h>
329#include <sys/types.h>
330#include <sys/ipc.h>
331#include <sys/shm.h>
332#include <sys/mman.h>
333
334#ifndef SHM_HUGETLB
335#define SHM_HUGETLB 04000
336#endif
337
338#define LENGTH (256UL*1024*1024)
339
340#define dprintf(x) printf(x)
341
342#define ADDR (void *)(0x0UL) /* let kernel choose address */
343#define SHMAT_FLAGS (0)
344
345int main(void)
346{
347 int shmid;
348 unsigned long i;
349 char *shmaddr;
350
351 if ((shmid = shmget(2, LENGTH,
352 SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) {
353 perror("shmget");
354 exit(1);
355 }
356 printf("shmid: 0x%x\n", shmid);
357
358 shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS);
359 if (shmaddr == (char *)-1) {
360 perror("Shared memory attach failure");
361 shmctl(shmid, IPC_RMID, NULL);
362 exit(2);
363 }
364 printf("shmaddr: %p\n", shmaddr);
365
366 dprintf("Starting the writes:\n");
367 for (i = 0; i < LENGTH; i++) {
368 shmaddr[i] = (char)(i);
369 if (!(i % (1024 * 1024)))
370 dprintf(".");
371 }
372 dprintf("\n");
373
374 dprintf("Starting the Check...");
375 for (i = 0; i < LENGTH; i++)
376 if (shmaddr[i] != (char)i)
377 printf("\nIndex %lu mismatched\n", i);
378 dprintf("Done.\n");
379
380 if (shmdt((const void *)shmaddr) != 0) {
381 perror("Detach failure");
382 shmctl(shmid, IPC_RMID, NULL);
383 exit(3);
384 }
385
386 shmctl(shmid, IPC_RMID, NULL);
387
388 return 0;
389}
390 304
391******************************************************************* 305*******************************************************************
392 306
393/* 307/*
394 * Example of using huge page memory in a user application using the mmap 308 * hugepage-mmap: see Documentation/vm/hugepage-mmap.c
395 * system call. Before running this application, make sure that the
396 * administrator has mounted the hugetlbfs filesystem (on some directory
397 * like /mnt) using the command mount -t hugetlbfs nodev /mnt. In this
398 * example, the app is requesting memory of size 256MB that is backed by
399 * huge pages.
400 *
401 * For the ia64 architecture, the Linux kernel reserves Region number 4 for
402 * huge pages. That means that if one requires a fixed address, a huge page
403 * aligned address starting with 0x800000... will be required. If a fixed
404 * address is not required, the kernel will select an address in the proper
405 * range.
406 * Other architectures, such as ppc64, i386 or x86_64 are not so constrained.
407 */ 309 */
408#include <stdlib.h>
409#include <stdio.h>
410#include <unistd.h>
411#include <sys/mman.h>
412#include <fcntl.h>
413
414#define FILE_NAME "/mnt/hugepagefile"
415#define LENGTH (256UL*1024*1024)
416#define PROTECTION (PROT_READ | PROT_WRITE)
417
418#define ADDR (void *)(0x0UL) /* let kernel choose address */
419#define FLAGS (MAP_SHARED)
420
421void check_bytes(char *addr)
422{
423 printf("First hex is %x\n", *((unsigned int *)addr));
424}
425
426void write_bytes(char *addr)
427{
428 unsigned long i;
429
430 for (i = 0; i < LENGTH; i++)
431 *(addr + i) = (char)i;
432}
433
434void read_bytes(char *addr)
435{
436 unsigned long i;
437
438 check_bytes(addr);
439 for (i = 0; i < LENGTH; i++)
440 if (*(addr + i) != (char)i) {
441 printf("Mismatch at %lu\n", i);
442 break;
443 }
444}
445
446int main(void)
447{
448 void *addr;
449 int fd;
450
451 fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755);
452 if (fd < 0) {
453 perror("Open failed");
454 exit(1);
455 }
456
457 addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0);
458 if (addr == MAP_FAILED) {
459 perror("mmap");
460 unlink(FILE_NAME);
461 exit(1);
462 }
463
464 printf("Returned address is %p\n", addr);
465 check_bytes(addr);
466 write_bytes(addr);
467 read_bytes(addr);
468
469 munmap(addr, LENGTH);
470 close(fd);
471 unlink(FILE_NAME);
472
473 return 0;
474}
diff --git a/Documentation/vm/map_hugetlb.c b/Documentation/vm/map_hugetlb.c
index e2bdae37f499..9969c7d9f985 100644
--- a/Documentation/vm/map_hugetlb.c
+++ b/Documentation/vm/map_hugetlb.c
@@ -31,12 +31,12 @@
31#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) 31#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
32#endif 32#endif
33 33
34void check_bytes(char *addr) 34static void check_bytes(char *addr)
35{ 35{
36 printf("First hex is %x\n", *((unsigned int *)addr)); 36 printf("First hex is %x\n", *((unsigned int *)addr));
37} 37}
38 38
39void write_bytes(char *addr) 39static void write_bytes(char *addr)
40{ 40{
41 unsigned long i; 41 unsigned long i;
42 42
@@ -44,7 +44,7 @@ void write_bytes(char *addr)
44 *(addr + i) = (char)i; 44 *(addr + i) = (char)i;
45} 45}
46 46
47void read_bytes(char *addr) 47static void read_bytes(char *addr)
48{ 48{
49 unsigned long i; 49 unsigned long i;
50 50
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt
index b37300edf27c..07375e73981a 100644
--- a/Documentation/vm/slub.txt
+++ b/Documentation/vm/slub.txt
@@ -41,6 +41,7 @@ Possible debug options are
41 P Poisoning (object and padding) 41 P Poisoning (object and padding)
42 U User tracking (free and alloc) 42 U User tracking (free and alloc)
43 T Trace (please only use on single slabs) 43 T Trace (please only use on single slabs)
44 A Toggle failslab filter mark for the cache
44 O Switch debugging off for caches that would have 45 O Switch debugging off for caches that would have
45 caused higher minimum slab orders 46 caused higher minimum slab orders
46 - Switch all debugging off (useful if the kernel is 47 - Switch all debugging off (useful if the kernel is
diff --git a/Documentation/voyager.txt b/Documentation/voyager.txt
deleted file mode 100644
index 2749af552cdf..000000000000
--- a/Documentation/voyager.txt
+++ /dev/null
@@ -1,95 +0,0 @@
1Running Linux on the Voyager Architecture
2=========================================
3
4For full details and current project status, see
5
6http://www.hansenpartnership.com/voyager
7
8The voyager architecture was designed by NCR in the mid 80s to be a
9fully SMP capable RAS computing architecture built around intel's 486
10chip set. The voyager came in three levels of architectural
11sophistication: 3,4 and 5 --- 1 and 2 never made it out of prototype.
12The linux patches support only the Level 5 voyager architecture (any
13machine class 3435 and above).
14
15The Voyager Architecture
16------------------------
17
18Voyager machines consist of a Baseboard with a 386 diagnostic
19processor, a Power Supply Interface (PSI) a Primary and possibly
20Secondary Microchannel bus and between 2 and 20 voyager slots. The
21voyager slots can be populated with memory and cpu cards (up to 4GB
22memory and from 1 486 to 32 Pentium Pro processors). Internally, the
23voyager has a dual arbitrated system bus and a configuration and test
24bus (CAT). The voyager bus speed is 40MHz. Therefore (since all
25voyager cards are dual ported for each system bus) the maximum
26transfer rate is 320Mb/s but only if you have your slot configuration
27tuned (only memory cards can communicate with both busses at once, CPU
28cards utilise them one at a time).
29
30Voyager SMP
31-----------
32
33Since voyager was the first intel based SMP system, it is slightly
34more primitive than the Intel IO-APIC approach to SMP. Voyager allows
35arbitrary interrupt routing (including processor affinity routing) of
36all 16 PC type interrupts. However it does this by using a modified
375259 master/slave chip set instead of an APIC bus. Additionally,
38voyager supports Cross Processor Interrupts (CPI) equivalent to the
39APIC IPIs. There are two routed voyager interrupt lines provided to
40each slot.
41
42Processor Cards
43---------------
44
45These come in single, dyadic and quad configurations (the quads are
46problematic--see later). The maximum configuration is 8 quad cards
47for 32 way SMP.
48
49Quad Processors
50---------------
51
52Because voyager only supplies two interrupt lines to each Processor
53card, the Quad processors have to be configured (and Bootstrapped) in
54as a pair of Master/Slave processors.
55
56In fact, most Quad cards only accept one VIC interrupt line, so they
57have one interrupt handling processor (called the VIC extended
58processor) and three non-interrupt handling processors.
59
60Current Status
61--------------
62
63The System will boot on Mono, Dyad and Quad cards. There was
64originally a Quad boot problem which has been fixed by proper gdt
65alignment in the initial boot loader. If you still cannot get your
66voyager system to boot, email me at:
67
68<J.E.J.Bottomley@HansenPartnership.com>
69
70
71The Quad cards now support using the separate Quad CPI vectors instead
72of going through the VIC mailbox system.
73
74The Level 4 architecture (3430 and 3360 Machines) should also work
75fine.
76
77Dump Switch
78-----------
79
80The voyager dump switch sends out a broadcast NMI which the voyager
81code intercepts and does a task dump.
82
83Power Switch
84------------
85
86The front panel power switch is intercepted by the kernel and should
87cause a system shutdown and power off.
88
89A Note About Mixed CPU Systems
90------------------------------
91
92Linux isn't designed to handle mixed CPU systems very well. In order
93to get everything going you *must* make sure that your lowest
94capability CPU is used for booting. Also, mixing CPU classes
95(e.g. 486 and 586) is really not going to work very well at all.
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 29a6ff8bc7d3..7fbbaf85f5b7 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -166,19 +166,13 @@ NUMA
166 166
167 numa=noacpi Don't parse the SRAT table for NUMA setup 167 numa=noacpi Don't parse the SRAT table for NUMA setup
168 168
169 numa=fake=CMDLINE 169 numa=fake=<size>[MG]
170 If a number, fakes CMDLINE nodes and ignores NUMA setup of the 170 If given as a memory unit, fills all system RAM with nodes of
171 actual machine. Otherwise, system memory is configured 171 size interleaved over physical nodes.
172 depending on the sizes and coefficients listed. For example: 172
173 numa=fake=2*512,1024,4*256,*128 173 numa=fake=<N>
174 gives two 512M nodes, a 1024M node, four 256M nodes, and the 174 If given as an integer, fills all system RAM with N fake nodes
175 rest split into 128M chunks. If the last character of CMDLINE 175 interleaved over physical nodes.
176 is a *, the remaining memory is divided up equally among its
177 coefficient:
178 numa=fake=2*512,2*
179 gives two 512M nodes and the rest split into two nodes.
180 Otherwise, the remaining system RAM is allocated to an
181 additional node.
182 176
183ACPI 177ACPI
184 178