Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/00-INDEX | 2
-rw-r--r-- Documentation/ABI/removed/devfs (renamed from Documentation/ABI/obsolete/devfs) | 5
-rw-r--r-- Documentation/ABI/testing/sysfs-power | 88
-rw-r--r-- Documentation/Changes | 22
-rw-r--r-- Documentation/CodingStyle | 34
-rw-r--r-- Documentation/DMA-mapping.txt | 8
-rw-r--r-- Documentation/DocBook/Makefile | 3
-rw-r--r-- Documentation/DocBook/genericirq.tmpl | 474
-rw-r--r-- Documentation/DocBook/kernel-api.tmpl | 130
-rw-r--r-- Documentation/DocBook/kernel-locking.tmpl | 2
-rw-r--r-- Documentation/DocBook/libata.tmpl | 12
-rw-r--r-- Documentation/DocBook/mtdnand.tmpl | 17
-rw-r--r-- Documentation/DocBook/usb.tmpl | 123
-rw-r--r-- Documentation/DocBook/videobook.tmpl | 2
-rw-r--r-- Documentation/HOWTO | 3
-rw-r--r-- Documentation/IPMI.txt | 4
-rw-r--r-- Documentation/IRQ.txt | 22
-rw-r--r-- Documentation/RCU/checklist.txt | 44
-rw-r--r-- Documentation/RCU/torture.txt | 34
-rw-r--r-- Documentation/RCU/whatisRCU.txt | 17
-rw-r--r-- Documentation/README.DAC960 | 6
-rw-r--r-- Documentation/SubmitChecklist | 79
-rw-r--r-- Documentation/SubmittingDrivers | 21
-rw-r--r-- Documentation/SubmittingPatches | 51
-rw-r--r-- Documentation/accounting/delay-accounting.txt | 112
-rw-r--r-- Documentation/accounting/getdelays.c | 396
-rw-r--r-- Documentation/accounting/taskstats.txt | 181
-rw-r--r-- Documentation/arm/IXP4xx | 2
-rw-r--r-- Documentation/arm/Samsung-S3C24XX/Overview.txt | 35
-rw-r--r-- Documentation/arm/Samsung-S3C24XX/S3C2412.txt | 120
-rw-r--r-- Documentation/arm/Samsung-S3C24XX/S3C2413.txt | 21
-rw-r--r-- Documentation/atomic_ops.txt | 28
-rw-r--r-- Documentation/cciss.txt | 1
-rw-r--r-- Documentation/connector/ucon.c | 206
-rw-r--r-- Documentation/console/console.txt | 144
-rw-r--r-- Documentation/cpu-freq/user-guide.txt | 5
-rw-r--r-- Documentation/cpu-hotplug.txt | 12
-rw-r--r-- Documentation/cpusets.txt | 6
-rw-r--r-- Documentation/crypto/api-intro.txt | 36
-rw-r--r-- Documentation/devices.txt | 18
-rw-r--r-- Documentation/digiepca.txt | 2
-rw-r--r-- Documentation/dontdiff | 1
-rw-r--r-- Documentation/driver-model/overview.txt | 2
-rw-r--r-- Documentation/drivers/edac/edac.txt | 152
-rw-r--r-- Documentation/fb/fbcon.txt | 180
-rw-r--r-- Documentation/fb/imacfb.txt | 31
-rw-r--r-- Documentation/feature-removal-schedule.txt | 172
-rw-r--r-- Documentation/filesystems/00-INDEX | 4
-rw-r--r-- Documentation/filesystems/Locking | 4
-rw-r--r-- Documentation/filesystems/automount-support.txt | 2
-rw-r--r-- Documentation/filesystems/configfs/configfs_example.c | 19
-rw-r--r-- Documentation/filesystems/devfs/ChangeLog | 1977
-rw-r--r-- Documentation/filesystems/devfs/README | 1959
-rw-r--r-- Documentation/filesystems/devfs/ToDo | 40
-rw-r--r-- Documentation/filesystems/devfs/boot-options | 65
-rw-r--r-- Documentation/filesystems/ext3.txt | 8
-rw-r--r-- Documentation/filesystems/fuse.txt | 118
-rw-r--r-- Documentation/filesystems/proc.txt | 32
-rw-r--r-- Documentation/filesystems/ramfs-rootfs-initramfs.txt | 146
-rw-r--r-- Documentation/filesystems/relay.txt | 479
-rw-r--r-- Documentation/filesystems/relayfs.txt | 442
-rw-r--r-- Documentation/filesystems/vfs.txt | 4
-rw-r--r-- Documentation/hwmon/abituguru | 32
-rw-r--r-- Documentation/hwmon/it87 | 61
-rw-r--r-- Documentation/hwmon/k8temp | 52
-rw-r--r-- Documentation/hwmon/vt1211 | 206
-rw-r--r-- Documentation/hwmon/w83627ehf | 85
-rw-r--r-- Documentation/hwmon/w83791d | 69
-rw-r--r-- Documentation/i2c/busses/i2c-sis96x | 4
-rw-r--r-- Documentation/i2c/busses/i2c-viapro | 7
-rw-r--r-- Documentation/i2c/i2c-stub | 15
-rw-r--r-- Documentation/i386/boot.txt | 1
-rw-r--r-- Documentation/i386/zero-page.txt | 4
-rw-r--r-- Documentation/infiniband/ipoib.txt | 2
-rw-r--r-- Documentation/initrd.txt | 40
-rw-r--r-- Documentation/input/joystick.txt | 1
-rw-r--r-- Documentation/ioctl-number.txt | 1
-rw-r--r-- Documentation/irqflags-tracing.txt | 57
-rw-r--r-- Documentation/kbuild/kconfig-language.txt | 12
-rw-r--r-- Documentation/kbuild/makefiles.txt | 290
-rw-r--r-- Documentation/kbuild/modules.txt | 161
-rw-r--r-- Documentation/kdump/gdbmacros.txt | 2
-rw-r--r-- Documentation/kdump/kdump.txt | 420
-rw-r--r-- Documentation/kernel-parameters.txt | 78
-rw-r--r-- Documentation/keys-request-key.txt | 54
-rw-r--r-- Documentation/keys.txt | 72
-rw-r--r-- Documentation/kobject.txt | 2
-rw-r--r-- Documentation/lockdep-design.txt | 197
-rw-r--r-- Documentation/md.txt | 67
-rw-r--r-- Documentation/memory-barriers.txt | 41
-rw-r--r-- Documentation/mips/time.README | 10
-rw-r--r-- Documentation/netlabel/00-INDEX | 10
-rw-r--r-- Documentation/netlabel/cipso_ipv4.txt | 48
-rw-r--r-- Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt | 791
-rw-r--r-- Documentation/netlabel/introduction.txt | 46
-rw-r--r-- Documentation/netlabel/lsm_interface.txt | 47
-rw-r--r-- Documentation/networking/LICENSE.qla3xxx | 46
-rw-r--r-- Documentation/networking/bonding.txt | 59
-rw-r--r-- Documentation/networking/dccp.txt | 8
-rw-r--r-- Documentation/networking/ip-sysctl.txt | 56
-rw-r--r-- Documentation/networking/ipvs-sysctl.txt | 143
-rw-r--r-- Documentation/networking/pktgen.txt | 18
-rw-r--r-- Documentation/networking/secid.txt | 14
-rw-r--r-- Documentation/nfsroot.txt | 275
-rw-r--r-- Documentation/nommu-mmap.txt | 46
-rw-r--r-- Documentation/pci.txt | 2
-rw-r--r-- Documentation/pcieaer-howto.txt | 253
-rw-r--r-- Documentation/pcmcia/crc32hash.c | 32
-rw-r--r-- Documentation/pcmcia/devicetable.txt | 36
-rw-r--r-- Documentation/pi-futex.txt | 121
-rw-r--r-- Documentation/power/devices.txt | 725
-rw-r--r-- Documentation/power/interface.txt | 15
-rw-r--r-- Documentation/powerpc/booting-without-of.txt | 20
-rw-r--r-- Documentation/ramdisk.txt | 12
-rw-r--r-- Documentation/robust-futexes.txt | 2
-rw-r--r-- Documentation/rt-mutex-design.txt | 781
-rw-r--r-- Documentation/rt-mutex.txt | 79
-rw-r--r-- Documentation/rtc.txt | 7
-rw-r--r-- Documentation/scsi/ChangeLog.arcmsr | 56
-rw-r--r-- Documentation/scsi/ChangeLog.megaraid | 123
-rw-r--r-- Documentation/scsi/ChangeLog.megaraid_sas | 16
-rw-r--r-- Documentation/scsi/aacraid.txt | 53
-rw-r--r-- Documentation/scsi/arcmsr_spec.txt | 574
-rw-r--r-- Documentation/scsi/libsas.txt | 484
-rw-r--r-- Documentation/scsi/ppa.txt | 2
-rw-r--r-- Documentation/scsi/tmscsim.txt | 2
-rw-r--r-- Documentation/seclvl.txt | 97
-rw-r--r-- Documentation/sh/new-machine.txt | 128
-rw-r--r-- Documentation/sh/register-banks.txt | 33
-rw-r--r-- Documentation/sound/alsa/ALSA-Configuration.txt | 148
-rw-r--r-- Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl | 17
-rw-r--r-- Documentation/sparc/sbus_drivers.txt | 95
-rw-r--r-- Documentation/sparse.txt | 8
-rw-r--r-- Documentation/sysctl/fs.txt | 20
-rw-r--r-- Documentation/sysctl/kernel.txt | 25
-rw-r--r-- Documentation/sysctl/vm.txt | 40
-rw-r--r-- Documentation/sysrq.txt | 5
-rw-r--r-- Documentation/tty.txt | 7
-rw-r--r-- Documentation/usb/error-codes.txt | 11
-rw-r--r-- Documentation/usb/proc_usb_info.txt | 2
-rw-r--r-- Documentation/usb/usb-help.txt | 3
-rw-r--r-- Documentation/usb/usb-serial.txt | 9
-rw-r--r-- Documentation/video4linux/CARDLIST.bttv | 4
-rw-r--r-- Documentation/video4linux/CARDLIST.cx88 | 10
-rw-r--r-- Documentation/video4linux/CARDLIST.saa7134 | 1
-rw-r--r-- Documentation/video4linux/CARDLIST.tuner | 3
-rw-r--r-- Documentation/video4linux/CQcam.txt | 203
-rw-r--r-- Documentation/video4linux/README.pvrusb2 | 212
-rw-r--r-- Documentation/video4linux/Zoran | 23
-rw-r--r-- Documentation/video4linux/bttv/CONTRIBUTORS | 8
-rw-r--r-- Documentation/video4linux/cx2341x/fw-calling.txt | 69
-rw-r--r-- Documentation/video4linux/cx2341x/fw-decoder-api.txt | 319
-rw-r--r-- Documentation/video4linux/cx2341x/fw-dma.txt | 94
-rw-r--r-- Documentation/video4linux/cx2341x/fw-encoder-api.txt | 694
-rw-r--r-- Documentation/video4linux/cx2341x/fw-memory.txt | 141
-rw-r--r-- Documentation/video4linux/cx2341x/fw-osd-api.txt | 342
-rw-r--r-- Documentation/video4linux/cx2341x/fw-upload.txt | 49
-rw-r--r-- Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt | 54
-rw-r--r-- Documentation/video4linux/et61x251.txt | 52
-rw-r--r-- Documentation/video4linux/ibmcam.txt | 168
-rw-r--r-- Documentation/video4linux/ov511.txt | 32
-rw-r--r-- Documentation/video4linux/sn9c102.txt | 78
-rw-r--r-- Documentation/video4linux/v4lgrab.c | 192
-rw-r--r-- Documentation/video4linux/w9968cf.txt | 162
-rw-r--r-- Documentation/video4linux/zc0301.txt | 80
-rw-r--r-- Documentation/watchdog/pcwd-watchdog.txt | 75
-rw-r--r-- Documentation/watchdog/src/watchdog-simple.c | 15
-rw-r--r-- Documentation/watchdog/src/watchdog-test.c | 68
-rw-r--r-- Documentation/watchdog/watchdog-api.txt | 56
-rw-r--r-- Documentation/watchdog/watchdog.txt | 23
-rw-r--r-- Documentation/x86_64/boot-options.txt | 40
-rw-r--r-- Documentation/x86_64/kernel-stacks | 99
172 files changed, 12649 insertions, 6775 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index 5f7f7d7f77d2..02457ec9c94f 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -184,6 +184,8 @@ mtrr.txt
184 - how to use PPro Memory Type Range Registers to increase performance. 184 - how to use PPro Memory Type Range Registers to increase performance.
185nbd.txt 185nbd.txt
186 - info on a TCP implementation of a network block device. 186 - info on a TCP implementation of a network block device.
187netlabel/
188 - directory with information on the NetLabel subsystem.
187networking/ 189networking/
188 - directory with info on various aspects of networking with Linux. 190 - directory with info on various aspects of networking with Linux.
189nfsroot.txt 191nfsroot.txt
diff --git a/Documentation/ABI/obsolete/devfs b/Documentation/ABI/removed/devfs
index b8b87399bc8f..8195c4e0d0a1 100644
--- a/Documentation/ABI/obsolete/devfs
+++ b/Documentation/ABI/removed/devfs
@@ -1,13 +1,12 @@
1What: devfs 1What: devfs
2Date: July 2005 2Date: July 2005 (scheduled), finally removed in kernel v2.6.18
3Contact: Greg Kroah-Hartman <gregkh@suse.de> 3Contact: Greg Kroah-Hartman <gregkh@suse.de>
4Description: 4Description:
5 devfs has been unmaintained for a number of years, has unfixable 5 devfs has been unmaintained for a number of years, has unfixable
6 races, contains a naming policy within the kernel that is 6 races, contains a naming policy within the kernel that is
7 against the LSB, and can be replaced by using udev. 7 against the LSB, and can be replaced by using udev.
8 The files fs/devfs/*, include/linux/devfs_fs*.h will be removed, 8 The files fs/devfs/*, include/linux/devfs_fs*.h were removed,
9 along with the assorted devfs function calls throughout the 9 along with the assorted devfs function calls throughout the
10 kernel tree. 10 kernel tree.
11 11
12Users: 12Users:
13
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power
new file mode 100644
index 000000000000..d882f8093871
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-power
@@ -0,0 +1,88 @@
1What: /sys/power/
2Date: August 2006
3Contact: Rafael J. Wysocki <rjw@sisk.pl>
4Description:
5 The /sys/power directory will contain files that will
6 provide a unified interface to the power management
7 subsystem.
8
9What: /sys/power/state
10Date: August 2006
11Contact: Rafael J. Wysocki <rjw@sisk.pl>
12Description:
13 The /sys/power/state file controls the system power state.
14 Reading from this file returns what states are supported,
15 which is hard-coded to 'standby' (Power-On Suspend), 'mem'
16 (Suspend-to-RAM), and 'disk' (Suspend-to-Disk).
17
18 Writing to this file one of these strings causes the system to
19 transition into that state. Please see the file
20 Documentation/power/states.txt for a description of each of
21 these states.
22
23What: /sys/power/disk
24Date: August 2006
25Contact: Rafael J. Wysocki <rjw@sisk.pl>
26Description:
27 The /sys/power/disk file controls the operating mode of the
28 suspend-to-disk mechanism. Reading from this file returns
29 the name of the method by which the system will be put to
30 sleep on the next suspend. There are four methods supported:
31 'firmware' - means that the memory image will be saved to disk
32 by some firmware, in which case we also assume that the
33 firmware will handle the system suspend.
34 'platform' - the memory image will be saved by the kernel and
35 the system will be put to sleep by the platform driver (e.g.
36 ACPI or other PM registers).
37 'shutdown' - the memory image will be saved by the kernel and
38 the system will be powered off.
39 'reboot' - the memory image will be saved by the kernel and
40 the system will be rebooted.
41
42 The suspend-to-disk method may be chosen by writing to this
43 file one of the accepted strings:
44
45 'firmware'
46 'platform'
47 'shutdown'
48 'reboot'
49
50 It will only change to 'firmware' or 'platform' if the system
51 supports that.
52
53What: /sys/power/image_size
54Date: August 2006
55Contact: Rafael J. Wysocki <rjw@sisk.pl>
56Description:
57 The /sys/power/image_size file controls the size of the image
58 created by the suspend-to-disk mechanism. It can be written a
59 string representing a non-negative integer that will be used
60 as an upper limit of the image size, in bytes. The kernel's
61 suspend-to-disk code will do its best to ensure the image size
62 will not exceed this number. However, if it turns out to be
63 impossible, the kernel will try to suspend anyway using the
64 smallest image possible. In particular, if "0" is written to
65 this file, the suspend image will be as small as possible.
66
67 Reading from this file will display the current image size
68 limit, which is set to 500 MB by default.
69
70What: /sys/power/pm_trace
71Date: August 2006
72Contact: Rafael J. Wysocki <rjw@sisk.pl>
73Description:
74 The /sys/power/pm_trace file controls the code which saves the
75 last PM event point in the RTC across reboots, so that you can
76 debug a machine that just hangs during suspend (or more
77 commonly, during resume). Namely, the RTC is only used to save
78 the last PM event point if this file contains '1'. Initially
79 it contains '0' which may be changed to '1' by writing a
80 string representing a nonzero integer into it.
81
82 To use this debugging feature you should attempt to suspend
83 the machine, then reboot it and run
84
85 dmesg -s 1000000 | grep 'hash matches'
86
87 CAUTION: Using it will cause your machine's real-time (CMOS)
88 clock to be set to a random invalid time after a resume.
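As a rough illustration of how a userspace tool might interpret the contents of /sys/power/state described above, the following C sketch checks whether a given state name appears in the list returned by a read. The helper name pm_state_supported() is hypothetical and is not part of any kernel or libc API. The sketch assumes the file holds whitespace-separated tokens such as "standby mem disk".

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel or libc API): return 1 if `name`
 * appears as a whole whitespace-separated token in `list`, else 0.
 * `list` is assumed to hold the text read from /sys/power/state.
 */
int pm_state_supported(const char *list, const char *name)
{
	size_t want;

	assert(list != NULL && name != NULL);
	want = strlen(name);
	if (want == 0)
		return 0;

	while (*list != '\0') {
		size_t len;

		/* Skip separators between tokens. */
		list += strspn(list, " \t\n");
		/* Measure the next token and compare it as a whole word. */
		len = strcspn(list, " \t\n");
		if (len == want && strncmp(list, name, len) == 0)
			return 1;
		list += len;
	}
	return 0;
}
```

A caller would typically read the file into a buffer with fopen() and fgets() and pass that buffer as `list`. Writing a state name back to the file to trigger the transition normally requires root privileges and is left out of this sketch.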
diff --git a/Documentation/Changes b/Documentation/Changes
index b02f476c2973..abee7f58c1ed 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -37,15 +37,14 @@ o e2fsprogs 1.29 # tune2fs
37o jfsutils 1.1.3 # fsck.jfs -V 37o jfsutils 1.1.3 # fsck.jfs -V
38o reiserfsprogs 3.6.3 # reiserfsck -V 2>&1|grep reiserfsprogs 38o reiserfsprogs 3.6.3 # reiserfsck -V 2>&1|grep reiserfsprogs
39o xfsprogs 2.6.0 # xfs_db -V 39o xfsprogs 2.6.0 # xfs_db -V
40o pcmciautils 004 40o pcmciautils 004 # pccardctl -V
41o pcmcia-cs 3.1.21 # cardmgr -V
42o quota-tools 3.09 # quota -V 41o quota-tools 3.09 # quota -V
43o PPP 2.4.0 # pppd --version 42o PPP 2.4.0 # pppd --version
44o isdn4k-utils 3.1pre1 # isdnctrl 2>&1|grep version 43o isdn4k-utils 3.1pre1 # isdnctrl 2>&1|grep version
45o nfs-utils 1.0.5 # showmount --version 44o nfs-utils 1.0.5 # showmount --version
46o procps 3.2.0 # ps --version 45o procps 3.2.0 # ps --version
47o oprofile 0.9 # oprofiled --version 46o oprofile 0.9 # oprofiled --version
48o udev 071 # udevinfo -V 47o udev 081 # udevinfo -V
49 48
50Kernel compilation 49Kernel compilation
51================== 50==================
@@ -181,8 +180,8 @@ Intel IA32 microcode
181-------------------- 180--------------------
182 181
183A driver has been added to allow updating of Intel IA32 microcode, 182A driver has been added to allow updating of Intel IA32 microcode,
184accessible as both a devfs regular file and as a normal (misc) 183accessible as a normal (misc) character device. If you are not using
185character device. If you are not using devfs you may need to: 184udev you may need to:
186 185
187mkdir /dev/cpu 186mkdir /dev/cpu
188mknod /dev/cpu/microcode c 10 184 187mknod /dev/cpu/microcode c 10 184
@@ -201,7 +200,9 @@ with programs using shared memory.
201udev 200udev
202---- 201----
203udev is a userspace application for populating /dev dynamically with 202udev is a userspace application for populating /dev dynamically with
204only entries for devices actually present. udev replaces devfs. 203only entries for devices actually present. udev replaces the basic
204functionality of devfs, while allowing persistent device naming for
205devices.
205 206
206FUSE 207FUSE
207---- 208----
@@ -231,18 +232,13 @@ The PPP driver has been restructured to support multilink and to
231enable it to operate over diverse media layers. If you use PPP, 232enable it to operate over diverse media layers. If you use PPP,
232upgrade pppd to at least 2.4.0. 233upgrade pppd to at least 2.4.0.
233 234
234If you are not using devfs, you must have the device file /dev/ppp 235If you are not using udev, you must have the device file /dev/ppp
235which can be made by: 236which can be made by:
236 237
237mknod /dev/ppp c 108 0 238mknod /dev/ppp c 108 0
238 239
239as root. 240as root.
240 241
241If you use devfsd and build ppp support as modules, you will need
242the following in your /etc/devfsd.conf file:
243
244LOOKUP PPP MODLOAD
245
246Isdn4k-utils 242Isdn4k-utils
247------------ 243------------
248 244
@@ -271,7 +267,7 @@ active clients.
271 267
272To enable this new functionality, you need to: 268To enable this new functionality, you need to:
273 269
274 mount -t nfsd nfsd /proc/fs/nfs 270 mount -t nfsd nfsd /proc/fs/nfsd
275 271
276before running exportfs or mountd. It is recommended that all NFS 272before running exportfs or mountd. It is recommended that all NFS
277services be protected from the internet-at-large by a firewall where 273services be protected from the internet-at-large by a firewall where
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index 6d2412ec91ed..29c18966b050 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -532,6 +532,40 @@ appears outweighs the potential value of the hint that tells gcc to do
532something it would have done anyway. 532something it would have done anyway.
533 533
534 534
535 Chapter 16: Function return values and names
536
537Functions can return values of many different kinds, and one of the
538most common is a value indicating whether the function succeeded or
539failed. Such a value can be represented as an error-code integer
540(-Exxx = failure, 0 = success) or a "succeeded" boolean (0 = failure,
541non-zero = success).
542
543Mixing up these two sorts of representations is a fertile source of
544difficult-to-find bugs. If the C language included a strong distinction
545between integers and booleans then the compiler would find these mistakes
546for us... but it doesn't. To help prevent such bugs, always follow this
547convention:
548
549 If the name of a function is an action or an imperative command,
550 the function should return an error-code integer. If the name
551 is a predicate, the function should return a "succeeded" boolean.
552
553For example, "add work" is a command, and the add_work() function returns 0
554for success or -EBUSY for failure. In the same way, "PCI device present" is
555a predicate, and the pci_dev_present() function returns 1 if it succeeds in
556finding a matching device or 0 if it doesn't.
557
558All EXPORTed functions must respect this convention, and so should all
559public functions. Private (static) functions need not, but it is
560recommended that they do.
561
562Functions whose return value is the actual result of a computation, rather
563than an indication of whether the computation succeeded, are not subject to
564this rule. Generally they indicate failure by returning some out-of-range
565result. Typical examples would be functions that return pointers; they use
566NULL or the ERR_PTR mechanism to report failure.
567
568
535 569
536 Appendix I: References 570 Appendix I: References
537 571
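The naming convention from the new CodingStyle Chapter 16 can be sketched in a few lines of userspace C. The type and function names below (struct work_queue, queue_is_full(), queue_add_work()) are made up for this illustration and do not exist in the kernel. The predicate returns a boolean, while the imperative function returns 0 or a negative error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define QUEUE_CAPACITY 2	/* arbitrary size for the illustration */

/* Hypothetical fixed-size queue, used only to demonstrate the convention. */
struct work_queue {
	int items[QUEUE_CAPACITY];
	size_t count;
};

/* Predicate name: returns a boolean (1 = queue is full, 0 = it is not). */
int queue_is_full(const struct work_queue *q)
{
	assert(q != NULL);
	return q->count == QUEUE_CAPACITY;
}

/* Imperative name: returns an error code (0 = success, -EBUSY = failure). */
int queue_add_work(struct work_queue *q, int item)
{
	assert(q != NULL);
	if (queue_is_full(q))
		return -EBUSY;
	q->items[q->count++] = item;
	return 0;
}
```

With this split, a call site such as `if (queue_add_work(&q, 1))` reads as "if adding failed", and `if (queue_is_full(&q))` reads as "if the queue is full", so the two return styles are hard to confuse.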
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt
index 7c717699032c..63392c9132b4 100644
--- a/Documentation/DMA-mapping.txt
+++ b/Documentation/DMA-mapping.txt
@@ -698,12 +698,12 @@ these interfaces. Remember that, as defined, consistent mappings are
698always going to be SAC addressable. 698always going to be SAC addressable.
699 699
700The first thing your driver needs to do is query the PCI platform 700The first thing your driver needs to do is query the PCI platform
701layer with your device's DAC addressing capabilities: 701layer if it is capable of handling your device's DAC addressing
702capabilities:
702 703
703 int pci_dac_set_dma_mask(struct pci_dev *pdev, u64 mask); 704 int pci_dac_dma_supported(struct pci_dev *hwdev, u64 mask);
704 705
705This routine behaves identically to pci_set_dma_mask. You may not 706You may not use the following interfaces if this routine fails.
706use the following interfaces if this routine fails.
707 707
708Next, DMA addresses using this API are kept track of using the 708Next, DMA addresses using this API are kept track of using the
709dma64_addr_t type. It is guaranteed to be big enough to hold any 709dma64_addr_t type. It is guaranteed to be big enough to hold any
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 5a2882d275ba..66e1cf733571 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -10,7 +10,8 @@ DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \
10 kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ 10 kernel-hacking.xml kernel-locking.xml deviceiobook.xml \
11 procfs-guide.xml writing_usb_driver.xml \ 11 procfs-guide.xml writing_usb_driver.xml \
12 kernel-api.xml journal-api.xml lsm.xml usb.xml \ 12 kernel-api.xml journal-api.xml lsm.xml usb.xml \
13 gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml 13 gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \
14 genericirq.xml
14 15
15### 16###
16# The build process is as follows (targets): 17# The build process is as follows (targets):
diff --git a/Documentation/DocBook/genericirq.tmpl b/Documentation/DocBook/genericirq.tmpl
new file mode 100644
index 000000000000..0f4a4b6321e4
--- /dev/null
+++ b/Documentation/DocBook/genericirq.tmpl
@@ -0,0 +1,474 @@
1<?xml version="1.0" encoding="UTF-8"?>
2<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
3 "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []>
4
5<book id="Generic-IRQ-Guide">
6 <bookinfo>
7 <title>Linux generic IRQ handling</title>
8
9 <authorgroup>
10 <author>
11 <firstname>Thomas</firstname>
12 <surname>Gleixner</surname>
13 <affiliation>
14 <address>
15 <email>tglx@linutronix.de</email>
16 </address>
17 </affiliation>
18 </author>
19 <author>
20 <firstname>Ingo</firstname>
21 <surname>Molnar</surname>
22 <affiliation>
23 <address>
24 <email>mingo@elte.hu</email>
25 </address>
26 </affiliation>
27 </author>
28 </authorgroup>
29
30 <copyright>
31 <year>2005-2006</year>
32 <holder>Thomas Gleixner</holder>
33 </copyright>
34 <copyright>
35 <year>2005-2006</year>
36 <holder>Ingo Molnar</holder>
37 </copyright>
38
39 <legalnotice>
40 <para>
41 This documentation is free software; you can redistribute
42 it and/or modify it under the terms of the GNU General Public
43 License version 2 as published by the Free Software Foundation.
44 </para>
45
46 <para>
47 This program is distributed in the hope that it will be
48 useful, but WITHOUT ANY WARRANTY; without even the implied
49 warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
50 See the GNU General Public License for more details.
51 </para>
52
53 <para>
54 You should have received a copy of the GNU General Public
55 License along with this program; if not, write to the Free
56 Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
57 MA 02111-1307 USA
58 </para>
59
60 <para>
61 For more details see the file COPYING in the source
62 distribution of Linux.
63 </para>
64 </legalnotice>
65 </bookinfo>
66
67<toc></toc>
68
69 <chapter id="intro">
70 <title>Introduction</title>
71 <para>
72 The generic interrupt handling layer is designed to provide a
73 complete abstraction of interrupt handling for device drivers.
74 It is able to handle all the different types of interrupt controller
75 hardware. Device drivers use generic API functions to request, enable,
76 disable and free interrupts. The drivers do not have to know anything
77 about interrupt hardware details, so they can be used on different
78 platforms without code changes.
79 </para>
80 <para>
81 This documentation is provided to developers who want to implement
82 an interrupt subsystem for their architecture, with the help
83 of the generic IRQ handling layer.
84 </para>
85 </chapter>
86
87 <chapter id="rationale">
88 <title>Rationale</title>
89 <para>
90 The original implementation of interrupt handling in Linux uses
91 the __do_IRQ() super-handler, which is able to deal with every
92 type of interrupt logic.
93 </para>
94 <para>
95 Originally, Russell King identified different types of handlers to
96 build a quite universal set for the ARM interrupt handler
97 implementation in Linux 2.5/2.6. He distinguished between:
98 <itemizedlist>
99 <listitem><para>Level type</para></listitem>
100 <listitem><para>Edge type</para></listitem>
101 <listitem><para>Simple type</para></listitem>
102 </itemizedlist>
103 In the SMP world of the __do_IRQ() super-handler another type
104 was identified:
105 <itemizedlist>
106 <listitem><para>Per CPU type</para></listitem>
107 </itemizedlist>
108 </para>
109 <para>
110 This split implementation of highlevel IRQ handlers allows us to
111 optimize the flow of the interrupt handling for each specific
112 interrupt type. This reduces complexity in that particular codepath
113 and allows the optimized handling of a given type.
114 </para>
115 <para>
116 The original general IRQ implementation used hw_interrupt_type
117 structures and their ->ack(), ->end() [etc.] callbacks to
118 differentiate the flow control in the super-handler. This leads to
119 a mix of flow logic and lowlevel hardware logic, and it also leads
120 to unnecessary code duplication: for example in i386, there is an
121 ioapic_level_irq and an ioapic_edge_irq irq-type which share many
122 of the lowlevel details but have different flow handling.
123 </para>
124 <para>
125 A more natural abstraction is the clean separation of the
126 'irq flow' and the 'chip details'.
127 </para>
128 <para>
129 Analysing a couple of architecture's IRQ subsystem implementations
130 reveals that most of them can use a generic set of 'irq flow'
131 methods and only need to add the chip level specific code.
132 The separation is also valuable for (sub)architectures
133 which need specific quirks in the irq flow itself but not in the
134 chip-details - and thus provides a more transparent IRQ subsystem
135 design.
136 </para>
137 <para>
138 Each interrupt descriptor is assigned its own highlevel flow
139 handler, which is normally one of the generic
140 implementations. (This highlevel flow handler implementation also
141 makes it simple to provide demultiplexing handlers which can be
142 found in embedded platforms on various architectures.)
143 </para>
144 <para>
145 The separation makes the generic interrupt handling layer more
146 flexible and extensible. For example, a (sub)architecture can
147 use a generic irq-flow implementation for 'level type' interrupts
148 and add a (sub)architecture specific 'edge type' implementation.
149 </para>
150 <para>
151 To make the transition to the new model easier and prevent the
152 breakage of existing implementations, the __do_IRQ() super-handler
153 is still available. This leads to a kind of duality for the time
154 being. Over time the new model should be used in more and more
155 architectures, as it enables smaller and cleaner IRQ subsystems.
156 </para>
157 </chapter>
158 <chapter id="bugs">
159 <title>Known Bugs And Assumptions</title>
160 <para>
161 None (knock on wood).
162 </para>
163 </chapter>
164
165 <chapter id="Abstraction">
166 <title>Abstraction layers</title>
167 <para>
168 There are three main levels of abstraction in the interrupt code:
169 <orderedlist>
170 <listitem><para>Highlevel driver API</para></listitem>
171 <listitem><para>Highlevel IRQ flow handlers</para></listitem>
172 <listitem><para>Chiplevel hardware encapsulation</para></listitem>
173 </orderedlist>
174 </para>
175 <sect1>
176 <title>Interrupt control flow</title>
177 <para>
178 Each interrupt is described by an interrupt descriptor structure
179 irq_desc. The interrupt is referenced by an 'unsigned int' numeric
180 value which selects the corresponding interrupt description structure
181 in the descriptor structures array.
182 The descriptor structure contains status information and pointers
183 to the interrupt flow method and the interrupt chip structure
184 which are assigned to this interrupt.
185 </para>
186 <para>
187 Whenever an interrupt triggers, the lowlevel arch code calls into
188 the generic interrupt code by calling desc->handle_irq().
189 This highlevel IRQ handling function only uses desc->chip primitives
190 referenced by the assigned chip descriptor structure.
191 </para>
192 </sect1>
193 <sect1>
194 <title>Highlevel Driver API</title>
195 <para>
196 The highlevel Driver API consists of the following functions:
197 <itemizedlist>
198 <listitem><para>request_irq()</para></listitem>
199 <listitem><para>free_irq()</para></listitem>
200 <listitem><para>disable_irq()</para></listitem>
201 <listitem><para>enable_irq()</para></listitem>
202 <listitem><para>disable_irq_nosync() (SMP only)</para></listitem>
203 <listitem><para>synchronize_irq() (SMP only)</para></listitem>
204 <listitem><para>set_irq_type()</para></listitem>
205 <listitem><para>set_irq_wake()</para></listitem>
206 <listitem><para>set_irq_data()</para></listitem>
207 <listitem><para>set_irq_chip()</para></listitem>
208 <listitem><para>set_irq_chip_data()</para></listitem>
209 </itemizedlist>
210 See the autogenerated function documentation for details.
211 </para>
212 </sect1>
213 <sect1>
214 <title>Highlevel IRQ flow handlers</title>
215 <para>
216 The generic layer provides a set of pre-defined irq-flow methods:
217 <itemizedlist>
218 <listitem><para>handle_level_irq</para></listitem>
219 <listitem><para>handle_edge_irq</para></listitem>
220 <listitem><para>handle_simple_irq</para></listitem>
221 <listitem><para>handle_percpu_irq</para></listitem>
222 </itemizedlist>
223 The interrupt flow handlers (either predefined or architecture
224 specific) are assigned to specific interrupts by the architecture
225 either during bootup or during device initialization.
226 </para>
227 <sect2>
228 <title>Default flow implementations</title>
229 <sect3>
230 <title>Helper functions</title>
231 <para>
232 The helper functions call the chip primitives and
233 are used by the default flow implementations.
234 The following helper functions are implemented (simplified excerpt):
235 <programlisting>
236default_enable(irq)
237{
238 desc->chip->unmask(irq);
239}
240
241default_disable(irq)
242{
243 if (!delay_disable(irq))
244 desc->chip->mask(irq);
245}
246
247default_ack(irq)
248{
249 chip->ack(irq);
250}
251
252default_mask_ack(irq)
253{
254 if (chip->mask_ack) {
255 chip->mask_ack(irq);
256 } else {
257 chip->mask(irq);
258 chip->ack(irq);
259 }
260}
261
262noop(irq)
263{
264}
265
266 </programlisting>
267 </para>
268 </sect3>
269 </sect2>
270 <sect2>
271 <title>Default flow handler implementations</title>
272 <sect3>
273 <title>Default Level IRQ flow handler</title>
274 <para>
275 handle_level_irq provides a generic implementation
276 for level-triggered interrupts.
277 </para>
278 <para>
279 The following control flow is implemented (simplified excerpt):
280 <programlisting>
281desc->chip->start();
282handle_IRQ_event(desc->action);
283desc->chip->end();
284 </programlisting>
285 </para>
286 </sect3>
287 <sect3>
288 <title>Default Edge IRQ flow handler</title>
289 <para>
290 handle_edge_irq provides a generic implementation
291 for edge-triggered interrupts.
292 </para>
293 <para>
294 The following control flow is implemented (simplified excerpt):
295 <programlisting>
296if (desc->status &amp; running) {
297 desc->chip->hold();
298 desc->status |= pending | masked;
299 return;
300}
301desc->chip->start();
302desc->status |= running;
303do {
304 if (desc->status &amp; masked)
305 desc->chip->enable();
306 desc->status &amp;= ~pending;
307 handle_IRQ_event(desc->action);
308} while (desc->status &amp; pending);
309desc->status &amp;= ~running;
310desc->chip->end();
311 </programlisting>
312 </para>
313 </sect3>
314 <sect3>
315 <title>Default simple IRQ flow handler</title>
316 <para>
317 handle_simple_irq provides a generic implementation
318 for simple interrupts.
319 </para>
320 <para>
321 Note: The simple flow handler does not call any
322 handler/chip primitives.
323 </para>
324 <para>
325 The following control flow is implemented (simplified excerpt):
326 <programlisting>
327handle_IRQ_event(desc->action);
328 </programlisting>
329 </para>
330 </sect3>
331 <sect3>
332 <title>Default per CPU flow handler</title>
333 <para>
334 handle_percpu_irq provides a generic implementation
335 for per CPU interrupts.
336 </para>
337 <para>
338 Per CPU interrupts are only available on SMP and
339 the handler provides a simplified version without
340 locking.
341 </para>
342 <para>
343 The following control flow is implemented (simplified excerpt):
344 <programlisting>
345desc->chip->start();
346handle_IRQ_event(desc->action);
347desc->chip->end();
348 </programlisting>
349 </para>
350 </sect3>
351 </sect2>
352 <sect2>
353 <title>Quirks and optimizations</title>
354 <para>
355 The generic functions are intended for 'clean' architectures and chips,
356 which have no platform-specific IRQ handling quirks. If an architecture
357 needs to implement quirks on the 'flow' level then it can do so by
358 overriding the highlevel irq-flow handler.
359 </para>
360 </sect2>
361 <sect2>
362 <title>Delayed interrupt disable</title>
363 <para>
364 This per interrupt selectable feature, which was introduced by Russell
365 King in the ARM interrupt implementation, does not mask an interrupt
366 at the hardware level when disable_irq() is called. The interrupt is
367 kept enabled and is masked in the flow handler when an interrupt event
368 happens. This prevents losing edge interrupts on hardware which does
369 not store an edge interrupt event while the interrupt is disabled at
370 the hardware level. When an interrupt arrives while the IRQ_DISABLED
371 flag is set, then the interrupt is masked at the hardware level and
372 the IRQ_PENDING bit is set. When the interrupt is re-enabled by
373 enable_irq() the pending bit is checked and if it is set, the
374 interrupt is resent either via hardware or by a software resend
375 mechanism. (It's necessary to enable CONFIG_HARDIRQS_SW_RESEND when
376 you want to use the delayed interrupt disable feature and your
377 hardware is not capable of retriggering an interrupt.)
378 The delayed interrupt disable can be runtime enabled, per interrupt,
379 by setting the IRQ_DELAYED_DISABLE flag in the irq_desc status field.
380 </para>
381 </sect2>
382 </sect1>
383 <sect1>
384 <title>Chiplevel hardware encapsulation</title>
385 <para>
386 The chip level hardware descriptor structure irq_chip
387 contains all the direct chip relevant functions, which
388 can be utilized by the irq flow implementations.
389 <itemizedlist>
390 <listitem><para>ack()</para></listitem>
391 <listitem><para>mask_ack() - Optional, recommended for performance</para></listitem>
392 <listitem><para>mask()</para></listitem>
393 <listitem><para>unmask()</para></listitem>
394 <listitem><para>retrigger() - Optional</para></listitem>
395 <listitem><para>set_type() - Optional</para></listitem>
396 <listitem><para>set_wake() - Optional</para></listitem>
397 </itemizedlist>
398 These primitives are strictly intended to mean what they say: ack means
399 ACK, masking means masking of an IRQ line, etc. It is up to the flow
400 handler(s) to use these basic units of lowlevel functionality.
401 </para>
402 </sect1>
403 </chapter>
404
405 <chapter id="doirq">
406 <title>__do_IRQ entry point</title>
407 <para>
408 The original implementation __do_IRQ() is an alternative entry
409 point for all types of interrupts.
410 </para>
411 <para>
412 This handler turned out not to be suitable for all
413 interrupt hardware and was therefore reimplemented with split
414 functionality for edge/level/simple/percpu interrupts. This is not
415 only a functional optimization. It also shortens code paths for
416 interrupts.
417 </para>
418 <para>
419 To make use of the split implementation, replace the call to
420 __do_IRQ by a call to desc->handle_irq() and associate
421 the appropriate handler function with desc->handle_irq.
422 In most cases the generic handler implementations should
423 be sufficient.
424 </para>
425 </chapter>
426
427 <chapter id="locking">
428 <title>Locking on SMP</title>
429 <para>
430 The locking of chip registers is up to the architecture that
431 defines the chip primitives. There is a chip->lock field that can be used
432 for serialization, but the generic layer does not touch it. The per-irq
433 structure is protected via desc->lock, by the generic layer.
434 </para>
435 </chapter>
436 <chapter id="structs">
437 <title>Structures</title>
438 <para>
439 This chapter contains the autogenerated documentation of the structures which are
440 used in the generic IRQ layer.
441 </para>
442!Iinclude/linux/irq.h
443 </chapter>
444
445 <chapter id="pubfunctions">
446 <title>Public Functions Provided</title>
447 <para>
448 This chapter contains the autogenerated documentation of the kernel API functions
449 which are exported.
450 </para>
451!Ekernel/irq/manage.c
452!Ekernel/irq/chip.c
453 </chapter>
454
455 <chapter id="intfunctions">
456 <title>Internal Functions Provided</title>
457 <para>
458 This chapter contains the autogenerated documentation of the internal functions.
459 </para>
460!Ikernel/irq/handle.c
461!Ikernel/irq/chip.c
462 </chapter>
463
464 <chapter id="credits">
465 <title>Credits</title>
466 <para>
467 The following people have contributed to this document:
468 <orderedlist>
469 <listitem><para>Thomas Gleixner<email>tglx@linutronix.de</email></para></listitem>
470 <listitem><para>Ingo Molnar<email>mingo@elte.hu</email></para></listitem>
471 </orderedlist>
472 </para>
473 </chapter>
474</book>
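Note on the handle_edge_irq excerpt in genericirq.tmpl above: the re-entry logic can be modelled in user space. The following is a minimal sketch, not kernel code. The names model_desc, model_action, model_handle_edge and the ST_* bits are hypothetical, the chip primitives are reduced to counters, and the model assumes the masked bit is cleared once the line is re-enabled (the excerpt leaves that step implicit).

```c
/* Hypothetical status bits; they mirror the excerpt's running/pending/masked
 * names and are NOT the kernel's IRQ_* flag values. */
enum {
    ST_RUNNING = 1 << 0,
    ST_PENDING = 1 << 1,
    ST_MASKED  = 1 << 2
};

/* Hypothetical stand-in for struct irq_desc, reduced to what the model needs. */
struct model_desc {
    unsigned int status;
    int handled;       /* number of times the action ran */
    int nested_left;   /* edges to inject while the action is running */
    int hold_calls;    /* stands in for desc->chip->hold() */
    int enable_calls;  /* stands in for desc->chip->enable() */
};

static void model_handle_edge(struct model_desc *desc);

/* Stands in for handle_IRQ_event(): counts the call and, if requested,
 * simulates another edge arriving while the handler is still running. */
static void model_action(struct model_desc *desc)
{
    desc->handled++;
    if (desc->nested_left > 0) {
        desc->nested_left--;
        model_handle_edge(desc);
    }
}

static void model_handle_edge(struct model_desc *desc)
{
    /* Already being handled: hold the line and remember the edge. */
    if (desc->status & ST_RUNNING) {
        desc->hold_calls++;
        desc->status |= ST_PENDING | ST_MASKED;
        return;
    }

    desc->status |= ST_RUNNING;
    do {
        /* Assumption: re-enabling the line also clears the masked bit. */
        if (desc->status & ST_MASKED) {
            desc->enable_calls++;
            desc->status &= ~ST_MASKED;
        }
        desc->status &= ~ST_PENDING;
        model_action(desc);
    } while (desc->status & ST_PENDING);
    desc->status &= ~ST_RUNNING;
}
```

Calling model_handle_edge() on a zeroed model_desc with nested_left set to 2 runs the action three times within a single outer invocation: once for the original edge and once for each edge that arrived while the handler was running.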
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 31b727ceb127..6d4b1ef5b6f1 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -59,9 +59,14 @@
59!Iinclude/linux/hrtimer.h 59!Iinclude/linux/hrtimer.h
60!Ekernel/hrtimer.c 60!Ekernel/hrtimer.c
61 </sect1> 61 </sect1>
62 <sect1><title>Workqueues and Kevents</title>
63!Ekernel/workqueue.c
64 </sect1>
62 <sect1><title>Internal Functions</title> 65 <sect1><title>Internal Functions</title>
63!Ikernel/exit.c 66!Ikernel/exit.c
64!Ikernel/signal.c 67!Ikernel/signal.c
68!Iinclude/linux/kthread.h
69!Ekernel/kthread.c
65 </sect1> 70 </sect1>
66 71
67 <sect1><title>Kernel objects manipulation</title> 72 <sect1><title>Kernel objects manipulation</title>
@@ -114,6 +119,29 @@ X!Ilib/string.c
114 </sect1> 119 </sect1>
115 </chapter> 120 </chapter>
116 121
122 <chapter id="kernel-lib">
123 <title>Basic Kernel Library Functions</title>
124
125 <para>
126 The Linux kernel provides more basic utility functions.
127 </para>
128
129 <sect1><title>Bitmap Operations</title>
130!Elib/bitmap.c
131!Ilib/bitmap.c
132 </sect1>
133
134 <sect1><title>Command-line Parsing</title>
135!Elib/cmdline.c
136 </sect1>
137
138 <sect1><title>CRC Functions</title>
139!Elib/crc16.c
140!Elib/crc32.c
141!Elib/crc-ccitt.c
142 </sect1>
143 </chapter>
144
117 <chapter id="mm"> 145 <chapter id="mm">
118 <title>Memory Management in Linux</title> 146 <title>Memory Management in Linux</title>
119 <sect1><title>The Slab Cache</title> 147 <sect1><title>The Slab Cache</title>
@@ -153,27 +181,6 @@ X!Ilib/string.c
153 </sect1> 181 </sect1>
154 </chapter> 182 </chapter>
155 183
156 <chapter id="proc">
157 <title>The proc filesystem</title>
158
159 <sect1><title>sysctl interface</title>
160!Ekernel/sysctl.c
161 </sect1>
162
163 <sect1><title>proc filesystem interface</title>
164!Ifs/proc/base.c
165 </sect1>
166 </chapter>
167
168 <chapter id="debugfs">
169 <title>The debugfs filesystem</title>
170
171 <sect1><title>debugfs interface</title>
172!Efs/debugfs/inode.c
173!Efs/debugfs/file.c
174 </sect1>
175 </chapter>
176
177 <chapter id="vfs"> 184 <chapter id="vfs">
178 <title>The Linux VFS</title> 185 <title>The Linux VFS</title>
179 <sect1><title>The Filesystem types</title> 186 <sect1><title>The Filesystem types</title>
@@ -206,6 +213,50 @@ X!Ilib/string.c
206 </sect1> 213 </sect1>
207 </chapter> 214 </chapter>
208 215
216 <chapter id="proc">
217 <title>The proc filesystem</title>
218
219 <sect1><title>sysctl interface</title>
220!Ekernel/sysctl.c
221 </sect1>
222
223 <sect1><title>proc filesystem interface</title>
224!Ifs/proc/base.c
225 </sect1>
226 </chapter>
227
228 <chapter id="sysfs">
229 <title>The Filesystem for Exporting Kernel Objects</title>
230!Efs/sysfs/file.c
231!Efs/sysfs/symlink.c
232!Efs/sysfs/bin.c
233 </chapter>
234
235 <chapter id="debugfs">
236 <title>The debugfs filesystem</title>
237
238 <sect1><title>debugfs interface</title>
239!Efs/debugfs/inode.c
240!Efs/debugfs/file.c
241 </sect1>
242 </chapter>
243
244 <chapter id="relayfs">
245 <title>relay interface support</title>
246
247 <para>
248 Relay interface support
249 is designed to provide an efficient mechanism for tools and
250 facilities to relay large amounts of data from kernel space to
251 user space.
252 </para>
253
254 <sect1><title>relay interface</title>
255!Ekernel/relay.c
256!Ikernel/relay.c
257 </sect1>
258 </chapter>
259
209 <chapter id="netcore"> 260 <chapter id="netcore">
210 <title>Linux Networking</title> 261 <title>Linux Networking</title>
211 <sect1><title>Networking Base Types</title> 262 <sect1><title>Networking Base Types</title>
@@ -275,20 +326,19 @@ X!Ekernel/module.c
275 </sect1> 326 </sect1>
276 327
277 <sect1><title>Resources Management</title> 328 <sect1><title>Resources Management</title>
278!Ekernel/resource.c 329!Ikernel/resource.c
279 </sect1> 330 </sect1>
280 331
281 <sect1><title>MTRR Handling</title> 332 <sect1><title>MTRR Handling</title>
282!Earch/i386/kernel/cpu/mtrr/main.c 333!Earch/i386/kernel/cpu/mtrr/main.c
283 </sect1> 334 </sect1>
335
284 <sect1><title>PCI Support Library</title> 336 <sect1><title>PCI Support Library</title>
285!Edrivers/pci/pci.c 337!Edrivers/pci/pci.c
286!Edrivers/pci/pci-driver.c 338!Edrivers/pci/pci-driver.c
287!Edrivers/pci/remove.c 339!Edrivers/pci/remove.c
288!Edrivers/pci/pci-acpi.c 340!Edrivers/pci/pci-acpi.c
289<!-- kerneldoc does not understand to __devinit 341!Edrivers/pci/search.c
290X!Edrivers/pci/search.c
291 -->
292!Edrivers/pci/msi.c 342!Edrivers/pci/msi.c
293!Edrivers/pci/bus.c 343!Edrivers/pci/bus.c
294<!-- FIXME: Removed for now since no structured comments in source 344<!-- FIXME: Removed for now since no structured comments in source
@@ -315,16 +365,11 @@ X!Earch/i386/kernel/mca.c
315 </sect1> 365 </sect1>
316 </chapter> 366 </chapter>
317 367
318 <chapter id="devfs"> 368 <chapter id="firmware">
319 <title>The Device File System</title> 369 <title>Firmware Interfaces</title>
320!Efs/devfs/base.c 370 <sect1><title>DMI Interfaces</title>
321 </chapter> 371!Edrivers/firmware/dmi_scan.c
322 372 </sect1>
323 <chapter id="sysfs">
324 <title>The Filesystem for Exporting Kernel Objects</title>
325!Efs/sysfs/file.c
326!Efs/sysfs/symlink.c
327!Efs/sysfs/bin.c
328 </chapter> 373 </chapter>
329 374
330 <chapter id="security"> 375 <chapter id="security">
@@ -357,6 +402,7 @@ X!Iinclude/linux/device.h
357--> 402-->
358!Edrivers/base/driver.c 403!Edrivers/base/driver.c
359!Edrivers/base/core.c 404!Edrivers/base/core.c
405!Edrivers/base/class.c
360!Edrivers/base/firmware_class.c 406!Edrivers/base/firmware_class.c
361!Edrivers/base/transport_class.c 407!Edrivers/base/transport_class.c
362!Edrivers/base/dmapool.c 408!Edrivers/base/dmapool.c
@@ -403,17 +449,29 @@ X!Edrivers/pnp/system.c
403 </sect1> 449 </sect1>
404 </chapter> 450 </chapter>
405 451
406
407 <chapter id="blkdev"> 452 <chapter id="blkdev">
408 <title>Block Devices</title> 453 <title>Block Devices</title>
409!Eblock/ll_rw_blk.c 454!Eblock/ll_rw_blk.c
410 </chapter> 455 </chapter>
411 456
457 <chapter id="chrdev">
458 <title>Char devices</title>
459!Efs/char_dev.c
460 </chapter>
461
412 <chapter id="miscdev"> 462 <chapter id="miscdev">
413 <title>Miscellaneous Devices</title> 463 <title>Miscellaneous Devices</title>
414!Edrivers/char/misc.c 464!Edrivers/char/misc.c
415 </chapter> 465 </chapter>
416 466
467 <chapter id="parportdev">
468 <title>Parallel Port Devices</title>
469!Iinclude/linux/parport.h
470!Edrivers/parport/ieee1284.c
471!Edrivers/parport/share.c
472!Idrivers/parport/daisy.c
473 </chapter>
474
417 <chapter id="viddev"> 475 <chapter id="viddev">
418 <title>Video4Linux</title> 476 <title>Video4Linux</title>
419!Edrivers/media/video/videodev.c 477!Edrivers/media/video/videodev.c
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl
index 158ffe9bfade..644c3884fab9 100644
--- a/Documentation/DocBook/kernel-locking.tmpl
+++ b/Documentation/DocBook/kernel-locking.tmpl
@@ -1590,7 +1590,7 @@ the amount of locking which needs to be done.
1590 <para> 1590 <para>
1591 Our final dilemma is this: when can we actually destroy the 1591 Our final dilemma is this: when can we actually destroy the
1592 removed element? Remember, a reader might be stepping through 1592 removed element? Remember, a reader might be stepping through
1593 this element in the list right now: it we free this element and 1593 this element in the list right now: if we free this element and
1594 the <symbol>next</symbol> pointer changes, the reader will jump 1594 the <symbol>next</symbol> pointer changes, the reader will jump
1595 off into garbage and crash. We need to wait until we know that 1595 off into garbage and crash. We need to wait until we know that
1596 all the readers who were traversing the list when we deleted the 1596 all the readers who were traversing the list when we deleted the
diff --git a/Documentation/DocBook/libata.tmpl b/Documentation/DocBook/libata.tmpl
index e97c32314541..065e8dc23e3a 100644
--- a/Documentation/DocBook/libata.tmpl
+++ b/Documentation/DocBook/libata.tmpl
@@ -868,18 +868,18 @@ and other resources, etc.
868 868
869 <chapter id="libataExt"> 869 <chapter id="libataExt">
870 <title>libata Library</title> 870 <title>libata Library</title>
871!Edrivers/scsi/libata-core.c 871!Edrivers/ata/libata-core.c
872 </chapter> 872 </chapter>
873 873
874 <chapter id="libataInt"> 874 <chapter id="libataInt">
875 <title>libata Core Internals</title> 875 <title>libata Core Internals</title>
876!Idrivers/scsi/libata-core.c 876!Idrivers/ata/libata-core.c
877 </chapter> 877 </chapter>
878 878
879 <chapter id="libataScsiInt"> 879 <chapter id="libataScsiInt">
880 <title>libata SCSI translation/emulation</title> 880 <title>libata SCSI translation/emulation</title>
881!Edrivers/scsi/libata-scsi.c 881!Edrivers/ata/libata-scsi.c
882!Idrivers/scsi/libata-scsi.c 882!Idrivers/ata/libata-scsi.c
883 </chapter> 883 </chapter>
884 884
885 <chapter id="ataExceptions"> 885 <chapter id="ataExceptions">
@@ -1600,12 +1600,12 @@ and other resources, etc.
1600 1600
1601 <chapter id="PiixInt"> 1601 <chapter id="PiixInt">
1602 <title>ata_piix Internals</title> 1602 <title>ata_piix Internals</title>
1603!Idrivers/scsi/ata_piix.c 1603!Idrivers/ata/ata_piix.c
1604 </chapter> 1604 </chapter>
1605 1605
1606 <chapter id="SILInt"> 1606 <chapter id="SILInt">
1607 <title>sata_sil Internals</title> 1607 <title>sata_sil Internals</title>
1608!Idrivers/scsi/sata_sil.c 1608!Idrivers/ata/sata_sil.c
1609 </chapter> 1609 </chapter>
1610 1610
1611 <chapter id="libataThanks"> 1611 <chapter id="libataThanks">
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl
index 6e463d0db266..a8c8cce50633 100644
--- a/Documentation/DocBook/mtdnand.tmpl
+++ b/Documentation/DocBook/mtdnand.tmpl
@@ -109,7 +109,7 @@
109 for most of the implementations. These functions can be replaced by the 109 for most of the implementations. These functions can be replaced by the
110 board driver if necessary. Those functions are called via pointers in the 110 board driver if necessary. Those functions are called via pointers in the
111 NAND chip description structure. The board driver can set the functions which 111 NAND chip description structure. The board driver can set the functions which
112 should be replaced by board dependend functions before calling nand_scan(). 112 should be replaced by board dependent functions before calling nand_scan().
113 If the function pointer is NULL on entry to nand_scan() then the pointer 113 If the function pointer is NULL on entry to nand_scan() then the pointer
114 is set to the default function which is suitable for the detected chip type. 114 is set to the default function which is suitable for the detected chip type.
115 </para></listitem> 115 </para></listitem>
@@ -133,7 +133,7 @@
133 [REPLACEABLE]</para><para> 133 [REPLACEABLE]</para><para>
134 Replaceable members hold hardware related functions which can be 134 Replaceable members hold hardware related functions which can be
135 provided by the board driver. The board driver can set the functions which 135 provided by the board driver. The board driver can set the functions which
136 should be replaced by board dependend functions before calling nand_scan(). 136 should be replaced by board dependent functions before calling nand_scan().
137 If the function pointer is NULL on entry to nand_scan() then the pointer 137 If the function pointer is NULL on entry to nand_scan() then the pointer
138 is set to the default function which is suitable for the detected chip type. 138 is set to the default function which is suitable for the detected chip type.
139 </para></listitem> 139 </para></listitem>
@@ -156,9 +156,8 @@
156 <title>Basic board driver</title> 156 <title>Basic board driver</title>
157 <para> 157 <para>
158 For most boards it will be sufficient to provide just the 158 For most boards it will be sufficient to provide just the
159 basic functions and fill out some really board dependend 159 basic functions and fill out some really board dependent
160 members in the nand chip description structure. 160 members in the nand chip description structure.
161 See drivers/mtd/nand/skeleton for reference.
162 </para> 161 </para>
163 <sect1> 162 <sect1>
164 <title>Basic defines</title> 163 <title>Basic defines</title>
@@ -189,9 +188,9 @@ static unsigned long baseaddr;
189 <sect1> 188 <sect1>
190 <title>Partition defines</title> 189 <title>Partition defines</title>
191 <para> 190 <para>
192 If you want to divide your device into parititions, then 191 If you want to divide your device into partitions, then
193 enable the configuration switch CONFIG_MTD_PARITIONS and define 192 enable the configuration switch CONFIG_MTD_PARTITIONS and define
194 a paritioning scheme suitable to your board. 193 a partitioning scheme suitable to your board.
195 </para> 194 </para>
196 <programlisting> 195 <programlisting>
197#define NUM_PARTITIONS 2 196#define NUM_PARTITIONS 2
@@ -1295,7 +1294,9 @@ in this page</entry>
1295 </para> 1294 </para>
1296!Idrivers/mtd/nand/nand_base.c 1295!Idrivers/mtd/nand/nand_base.c
1297!Idrivers/mtd/nand/nand_bbt.c 1296!Idrivers/mtd/nand/nand_bbt.c
1298!Idrivers/mtd/nand/nand_ecc.c 1297<!-- No internal functions for kernel-doc:
1298X!Idrivers/mtd/nand/nand_ecc.c
1299-->
1299 </chapter> 1300 </chapter>
1300 1301
1301 <chapter id="credits"> 1302 <chapter id="credits">
diff --git a/Documentation/DocBook/usb.tmpl b/Documentation/DocBook/usb.tmpl
index 320af25de3a2..3608472d7b74 100644
--- a/Documentation/DocBook/usb.tmpl
+++ b/Documentation/DocBook/usb.tmpl
@@ -43,59 +43,52 @@
43 43
44 <para>A Universal Serial Bus (USB) is used to connect a host, 44 <para>A Universal Serial Bus (USB) is used to connect a host,
45 such as a PC or workstation, to a number of peripheral 45 such as a PC or workstation, to a number of peripheral
46 devices. USB uses a tree structure, with the host at the 46 devices. USB uses a tree structure, with the host as the
47 root (the system's master), hubs as interior nodes, and 47 root (the system's master), hubs as interior nodes, and
48 peripheral devices as leaves (and slaves). 48 peripherals as leaves (and slaves).
49 Modern PCs support several such trees of USB devices, usually 49 Modern PCs support several such trees of USB devices, usually
50 one USB 2.0 tree (480 Mbit/sec each) with 50 one USB 2.0 tree (480 Mbit/sec each) with
51 a few USB 1.1 trees (12 Mbit/sec each) that are used when you 51 a few USB 1.1 trees (12 Mbit/sec each) that are used when you
52 connect a USB 1.1 device directly to the machine's "root hub". 52 connect a USB 1.1 device directly to the machine's "root hub".
53 </para> 53 </para>
54 54
55 <para>That master/slave asymmetry was designed in part for 55 <para>That master/slave asymmetry was designed-in for a number of
56 ease of use. It is not physically possible to assemble 56 reasons, one being ease of use. It is not physically possible to
57 (legal) USB cables incorrectly: all upstream "to-the-host" 57 assemble (legal) USB cables incorrectly: all upstream "to the host"
58 connectors are the rectangular type, matching the sockets on 58 connectors are the rectangular type (matching the sockets on
59 root hubs, and the downstream type are the squarish type 59 root hubs), and all downstream connectors are the squarish type
60 (or they are built in to the peripheral). 60 (or they are built into the peripheral).
61 Software doesn't need to deal with distributed autoconfiguration 61 Also, the host software doesn't need to deal with distributed
62 since the pre-designated master node manages all that. 62 auto-configuration since the pre-designated master node manages all that.
63 At the electrical level, bus protocol overhead is reduced by 63 And finally, at the electrical level, bus protocol overhead is reduced by
64 eliminating arbitration and moving scheduling into host software. 64 eliminating arbitration and moving scheduling into the host software.
65 </para> 65 </para>
66 66
67 <para>USB 1.0 was announced in January 1996, and was revised 67 <para>USB 1.0 was announced in January 1996 and was revised
68 as USB 1.1 (with improvements in hub specification and 68 as USB 1.1 (with improvements in hub specification and
69 support for interrupt-out transfers) in September 1998. 69 support for interrupt-out transfers) in September 1998.
70 USB 2.0 was released in April 2000, including high speed 70 USB 2.0 was released in April 2000, adding high-speed
71 transfers and transaction translating hubs (used for USB 1.1 71 transfers and transaction-translating hubs (used for USB 1.1
72 and 1.0 backward compatibility). 72 and 1.0 backward compatibility).
73 </para> 73 </para>
74 74
75 <para>USB support was added to Linux early in the 2.2 kernel series 75 <para>Kernel developers added USB support to Linux early in the 2.2 kernel
76 shortly before the 2.3 development forked off. Updates 76 series, shortly before 2.3 development forked. Updates from 2.3 were
77 from 2.3 were regularly folded back into 2.2 releases, bringing 77 regularly folded back into 2.2 releases, which improved reliability and
78 new features such as <filename>/sbin/hotplug</filename> support, 78 brought <filename>/sbin/hotplug</filename> support as well as more drivers.
79 more drivers, and more robustness. 79 Such improvements were continued in the 2.5 kernel series, where they added
80 The 2.5 kernel series continued such improvements, and also 80 USB 2.0 support, improved performance, and made the host controller drivers
81 worked on USB 2.0 support, 81 (HCDs) more consistent. They also simplified the API (to make bugs less
82 higher performance, 82 likely) and added internal "kerneldoc" documentation.
83 better consistency between host controller drivers,
84 API simplification (to make bugs less likely),
85 and providing internal "kerneldoc" documentation.
86 </para> 83 </para>
87 84
88 <para>Linux can run inside USB devices as well as on 85 <para>Linux can run inside USB devices as well as on
89 the hosts that control the devices. 86 the hosts that control the devices.
90 Because the Linux 2.x USB support evolved to support mass market 87 But USB device drivers running inside those peripherals
91 platforms such as Apple Macintosh or PC-compatible systems,
92 it didn't address design concerns for those types of USB systems.
93 So it can't be used inside mass-market PDAs, or other peripherals.
94 USB device drivers running inside those Linux peripherals
95 don't do the same things as the ones running inside hosts, 88 don't do the same things as the ones running inside hosts,
96 and so they've been given a different name: 89 so they've been given a different name:
97 they're called <emphasis>gadget drivers</emphasis>. 90 <emphasis>gadget drivers</emphasis>.
98 This document does not present gadget drivers. 91 This document does not cover gadget drivers.
99 </para> 92 </para>
100 93
101 </chapter> 94 </chapter>
@@ -103,17 +96,14 @@
103<chapter id="host"> 96<chapter id="host">
104 <title>USB Host-Side API Model</title> 97 <title>USB Host-Side API Model</title>
105 98
106 <para>Within the kernel, 99 <para>Host-side drivers for USB devices talk to the "usbcore" APIs.
107 host-side drivers for USB devices talk to the "usbcore" APIs. 100 There are two. One is intended for
108 There are two types of public "usbcore" APIs, targetted at two different 101 <emphasis>general-purpose</emphasis> drivers (exposed through
109 layers of USB driver. Those are 102 driver frameworks), and the other is for drivers that are
110 <emphasis>general purpose</emphasis> drivers, exposed through 103 <emphasis>part of the core</emphasis>.
111 driver frameworks such as block, character, or network devices; 104 Such core drivers include the <emphasis>hub</emphasis> driver
112 and drivers that are <emphasis>part of the core</emphasis>, 105 (which manages trees of USB devices) and several different kinds
113 which are involved in managing a USB bus. 106 of <emphasis>host controller drivers</emphasis>,
114 Such core drivers include the <emphasis>hub</emphasis> driver,
115 which manages trees of USB devices, and several different kinds
116 of <emphasis>host controller driver (HCD)</emphasis>,
117 which control individual busses. 107 which control individual busses.
118 </para> 108 </para>
119 109
@@ -122,21 +112,21 @@
122 112
123 <itemizedlist> 113 <itemizedlist>
124 114
125 <listitem><para>USB supports four kinds of data transfer 115 <listitem><para>USB supports four kinds of data transfers
126 (control, bulk, interrupt, and isochronous). Two transfer 116 (control, bulk, interrupt, and isochronous). Two of them (control
127 types use bandwidth as it's available (control and bulk), 117 and bulk) use bandwidth as it's available,
128 while the other two types of transfer (interrupt and isochronous) 118 while the other two (interrupt and isochronous)
129 are scheduled to provide guaranteed bandwidth. 119 are scheduled to provide guaranteed bandwidth.
130 </para></listitem> 120 </para></listitem>
131 121
132 <listitem><para>The device description model includes one or more 122 <listitem><para>The device description model includes one or more
133 "configurations" per device, only one of which is active at a time. 123 "configurations" per device, only one of which is active at a time.
134 Devices that are capable of high speed operation must also support 124 Devices that are capable of high-speed operation must also support
135 full speed configurations, along with a way to ask about the 125 full-speed configurations, along with a way to ask about the
136 "other speed" configurations that might be used. 126 "other speed" configurations which might be used.
137 </para></listitem> 127 </para></listitem>
138 128
139 <listitem><para>Configurations have one or more "interface", each 129 <listitem><para>Configurations have one or more "interfaces", each
140 of which may have "alternate settings". Interfaces may be 130 of which may have "alternate settings". Interfaces may be
141 standardized by USB "Class" specifications, or may be specific to 131 standardized by USB "Class" specifications, or may be specific to
142 a vendor or device.</para> 132 a vendor or device.</para>
@@ -162,7 +152,7 @@
162 </para></listitem> 152 </para></listitem>
163 153
164 <listitem><para>The Linux USB API supports synchronous calls for 154 <listitem><para>The Linux USB API supports synchronous calls for
165 control and bulk messaging. 155 control and bulk messages.
166 It also supports asynchronous calls for all kinds of data transfer, 156 It also supports asynchronous calls for all kinds of data transfer,
167 using request structures called "URBs" (USB Request Blocks). 157 using request structures called "URBs" (USB Request Blocks).
168 </para></listitem> 158 </para></listitem>
@@ -463,14 +453,25 @@
463 file in your Linux kernel sources. 453 file in your Linux kernel sources.
464 </para> 454 </para>
465 455
466 <para>Otherwise the main use for this file from programs 456 <para>This file, in combination with the poll() system call, can
467 is to poll() it to get notifications of usb devices 457 also be used to detect when devices are added or removed:
468 as they're plugged or unplugged. 458<programlisting>int fd;
469 To see what changed, you'd need to read the file and 459struct pollfd pfd;
470 compare "before" and "after" contents, scan the filesystem, 460
471 or see its hotplug event. 461fd = open("/proc/bus/usb/devices", O_RDONLY);
462pfd = { fd, POLLIN, 0 };
463for (;;) {
464 /* The first time through, this call will return immediately. */
465 poll(&amp;pfd, 1, -1);
466
467 /* To see what's changed, compare the file's previous and current
468 contents or scan the filesystem. (Scanning is more precise.) */
469}</programlisting>
470 Note that this behavior is intended to be used for informational
471 and debug purposes. It would be more appropriate to use programs
472 such as udev or HAL to initialize a device or start a user-mode
473 helper program, for instance.
472 </para> 474 </para>
473
474 </sect1> 475 </sect1>
475 476
476 <sect1> 477 <sect1>
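Note on the poll() example added to usb.tmpl above: as printed, the excerpt is not compilable C, because a braced list cannot be assigned to an already declared struct and the required headers are omitted. A minimal sketch of one way to express the same idea follows. The names open_for_polling and wait_for_change are hypothetical helpers, and error handling is kept deliberately small.

```c
#include <fcntl.h>
#include <poll.h>

/* Open a file read-only for later polling; returns the fd, or -1 on error. */
static int open_for_polling(const char *path)
{
    return open(path, O_RDONLY);
}

/*
 * Block until `fd` signals readable data or `timeout_ms` elapses
 * (a negative timeout waits indefinitely).
 * Returns 1 if POLLIN was reported, 0 on timeout, -1 on error.
 */
static int wait_for_change(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ret;

    /* Fill the struct member by member instead of assigning a braced list. */
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    /* An event without POLLIN (POLLERR, POLLHUP, POLLNVAL) is an error here. */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

A caller would open /proc/bus/usb/devices once with open_for_polling(), then call wait_for_change(fd, -1) in a loop and rescan after each return. Per the documentation text above, the first call returns immediately.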
diff --git a/Documentation/DocBook/videobook.tmpl b/Documentation/DocBook/videobook.tmpl
index fdff984a5161..b629da33951d 100644
--- a/Documentation/DocBook/videobook.tmpl
+++ b/Documentation/DocBook/videobook.tmpl
@@ -976,7 +976,7 @@ static int camera_close(struct video_device *dev)
976 <title>Interrupt Handling</title> 976 <title>Interrupt Handling</title>
977 <para> 977 <para>
978 Our example handler is for an ISA bus device. If it was PCI you would be 978 Our example handler is for an ISA bus device. If it was PCI you would be
979 able to share the interrupt and would have set SA_SHIRQ to indicate a 979 able to share the interrupt and would have set IRQF_SHARED to indicate a
980 shared IRQ. We pass the device pointer as the interrupt routine argument. We 980 shared IRQ. We pass the device pointer as the interrupt routine argument. We
981 don't need to since we only support one card but doing this will make it 981 don't need to since we only support one card but doing this will make it
982 easier to upgrade the driver for multiple devices in the future. 982 easier to upgrade the driver for multiple devices in the future.
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 915ae8c986c6..1d6560413cc5 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -358,7 +358,8 @@ Here is a list of some of the different kernel trees available:
358 quilt trees: 358 quilt trees:
359 - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> 359 - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de>
360 kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ 360 kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
361 361 - x86-64, partly i386, Andi Kleen <ak@suse.de>
362 ftp.firstfloor.org:/pub/ak/x86_64/quilt/
362 363
363Bug Reporting 364Bug Reporting
364------------- 365-------------
diff --git a/Documentation/IPMI.txt b/Documentation/IPMI.txt
index bf1cf98d2a27..0256805b548f 100644
--- a/Documentation/IPMI.txt
+++ b/Documentation/IPMI.txt
@@ -10,7 +10,7 @@ standard for controlling intelligent devices that monitor a system.
10It provides for dynamic discovery of sensors in the system and the 10It provides for dynamic discovery of sensors in the system and the
11ability to monitor the sensors and be informed when the sensor's 11ability to monitor the sensors and be informed when the sensor's
12values change or go outside certain boundaries. It also has a 12values change or go outside certain boundaries. It also has a
13standardized database for field-replacable units (FRUs) and a watchdog 13standardized database for field-replaceable units (FRUs) and a watchdog
14timer. 14timer.
15 15
16To use this, you need an interface to an IPMI controller in your 16To use this, you need an interface to an IPMI controller in your
@@ -64,7 +64,7 @@ situation, you need to read the section below named 'The SI Driver' or
64IPMI defines a standard watchdog timer. You can enable this with the 64IPMI defines a standard watchdog timer. You can enable this with the
65'IPMI Watchdog Timer' config option. If you compile the driver into 65'IPMI Watchdog Timer' config option. If you compile the driver into
66the kernel, then via a kernel command-line option you can have the 66the kernel, then via a kernel command-line option you can have the
67watchdog timer start as soon as it intitializes. It also has a lot 67watchdog timer start as soon as it initializes. It also has a lot
68of other options, see the 'Watchdog' section below for more details. 68of other options, see the 'Watchdog' section below for more details.
69Note that you can also have the watchdog continue to run if it is 69Note that you can also have the watchdog continue to run if it is
70closed (by default it is disabled on close). Go into the 'Watchdog 70closed (by default it is disabled on close). Go into the 'Watchdog
diff --git a/Documentation/IRQ.txt b/Documentation/IRQ.txt
new file mode 100644
index 000000000000..1011e7175021
--- /dev/null
+++ b/Documentation/IRQ.txt
@@ -0,0 +1,22 @@
1What is an IRQ?
2
3An IRQ is an interrupt request from a device.
4Currently they can come in over a pin, or over a packet.
5Several devices may be connected to the same pin thus
6sharing an IRQ.
7
8An IRQ number is a kernel identifier used to talk about a hardware
9interrupt source. Typically this is an index into the global irq_desc
10array, but except for what linux/interrupt.h implements the details
11are architecture specific.
12
13An IRQ number is an enumeration of the possible interrupt sources on a
14machine. Typically what is enumerated is the number of input pins on
15all of the interrupt controllers in the system. In the case of ISA
16what is enumerated are the 16 input pins on the two i8259 interrupt
17controllers.
18
19Architectures can assign additional meaning to the IRQ numbers, and
20are encouraged to in the case where there is any manual configuration
21of the hardware involved. The ISA IRQs are a classic example of
22assigning this kind of additional meaning.
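
As a rough illustration of the ISA case described above, the sketch below
maps an ISA IRQ number to a controller and an input pin. It assumes the
conventional PC arrangement in which the first i8259 carries IRQs 0-7 and
the second carries IRQs 8-15. The names isa_irq_pin and isa_irq_to_pin()
are hypothetical and are not kernel interfaces.

```c
#include <assert.h>

#define ISA_IRQS        16	/* input pins enumerated for ISA */
#define PINS_PER_I8259   8	/* input pins on one i8259 */

/* Hypothetical helper type: which controller and which pin on it. */
struct isa_irq_pin {
	unsigned int controller;	/* 0 = first i8259, 1 = second */
	unsigned int pin;		/* input pin on that controller, 0-7 */
};

/* Map an ISA IRQ number (0-15) to its controller and input pin. */
static struct isa_irq_pin isa_irq_to_pin(unsigned int irq)
{
	struct isa_irq_pin p;

	assert(irq < ISA_IRQS);
	p.controller = irq / PINS_PER_I8259;
	p.pin = irq % PINS_PER_I8259;
	return p;
}
```

Under these assumptions, IRQ 14 lands on pin 6 of the second controller.
Other architectures are free to enumerate their interrupt sources
differently, so this mapping should not be assumed outside the ISA case.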
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 49e27cc19385..1d50cf0c905e 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -144,9 +144,47 @@ over a rather long period of time, but improvements are always welcome!
144 whether the increased speed is worth it. 144 whether the increased speed is worth it.
145 145
1468. Although synchronize_rcu() is a bit slower than is call_rcu(), 1468. Although synchronize_rcu() is a bit slower than is call_rcu(),
147 it usually results in simpler code. So, unless update performance 147 it usually results in simpler code. So, unless update
148 is important or the updaters cannot block, synchronize_rcu() 148 performance is critically important or the updaters cannot block,
149 should be used in preference to call_rcu(). 149 synchronize_rcu() should be used in preference to call_rcu().
150
151 An especially important property of the synchronize_rcu()
152 primitive is that it automatically self-limits: if grace periods
153 are delayed for whatever reason, then the synchronize_rcu()
154 primitive will correspondingly delay updates. In contrast,
155 code using call_rcu() should explicitly limit update rate in
156 cases where grace periods are delayed, as failing to do so can
157 result in excessive realtime latencies or even OOM conditions.
158
159 Ways of gaining this self-limiting property when using call_rcu()
160 include:
161
162 a. Keeping a count of the number of data-structure elements
163 used by the RCU-protected data structure, including those
164 waiting for a grace period to elapse. Enforce a limit
165 on this number, stalling updates as needed to allow
166 previously deferred frees to complete.
167
168 Alternatively, limit only the number awaiting deferred
169 free rather than the total number of elements.
170
171 b. Limiting update rate. For example, if updates occur only
172 once per hour, then no explicit rate limiting is required,
173 unless your system is already badly broken. The dcache
174 subsystem takes this approach -- updates are guarded
175 by a global lock, limiting their rate.
176
177 c. Trusted update -- if updates can only be done manually by
178 superuser or some other trusted user, then it might not
179 be necessary to automatically limit them. The theory
180 here is that superuser already has lots of ways to crash
181 the machine.
182
183 d. Use call_rcu_bh() rather than call_rcu(), in order to take
184 advantage of call_rcu_bh()'s faster grace periods.
185
186 e. Periodically invoke synchronize_rcu(), permitting a limited
187 number of updates per grace period.
150 188
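
The variant of approach (a) that limits only the number of elements
awaiting deferred free can be modeled in userspace as shown below. This
is a sketch under stated assumptions, not kernel code: every name in it
(deferred_free_model, model_defer_free(), MAX_PENDING_FREES) is
hypothetical, and the grace-period wait is replaced by a stand-in that
simply completes all pending frees.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING_FREES 4	/* hypothetical limit on deferred frees */

/* Userspace model of an updater's bookkeeping. */
struct deferred_free_model {
	size_t pending;	/* elements waiting for a grace period */
	size_t stalls;	/* times an update had to wait */
};

/*
 * Stand-in for waiting out a grace period.  In the kernel this is where
 * an updater would block; here all pending frees simply complete.
 */
static void model_wait_for_grace_period(struct deferred_free_model *m)
{
	m->pending = 0;
}

/* Queue one element for deferred free, stalling if the limit is hit. */
static void model_defer_free(struct deferred_free_model *m)
{
	if (m->pending >= MAX_PENDING_FREES) {
		m->stalls++;
		model_wait_for_grace_period(m);
	}
	m->pending++;
	assert(m->pending <= MAX_PENDING_FREES);
}
```

In this model the updater never has more than MAX_PENDING_FREES elements
outstanding, so delayed grace periods slow the updater down instead of
letting memory awaiting free grow without bound.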
1519. All RCU list-traversal primitives, which include 1899. All RCU list-traversal primitives, which include
152 list_for_each_rcu(), list_for_each_entry_rcu(), 190 list_for_each_rcu(), list_for_each_entry_rcu(),
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index e4c38152f7f7..a4948591607d 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -7,7 +7,7 @@ The CONFIG_RCU_TORTURE_TEST config option is available for all RCU
7implementations. It creates an rcutorture kernel module that can 7implementations. It creates an rcutorture kernel module that can
8be loaded to run a torture test. The test periodically outputs 8be loaded to run a torture test. The test periodically outputs
9status messages via printk(), which can be examined via the dmesg 9status messages via printk(), which can be examined via the dmesg
10command (perhaps grepping for "rcutorture"). The test is started 10command (perhaps grepping for "torture"). The test is started
11when the module is loaded, and stops when the module is unloaded. 11when the module is loaded, and stops when the module is unloaded.
12 12
13However, actually setting this config option to "y" results in the system 13However, actually setting this config option to "y" results in the system
@@ -35,6 +35,19 @@ stat_interval The number of seconds between output of torture
35 be printed -only- when the module is unloaded, and this 35 be printed -only- when the module is unloaded, and this
36 is the default. 36 is the default.
37 37
38shuffle_interval
39 The number of seconds to keep the test threads affinitied
40 to a particular subset of the CPUs. Used in conjunction
41 with test_no_idle_hz.
42
43test_no_idle_hz Whether or not to test the ability of RCU to operate in
44 a kernel that disables the scheduling-clock interrupt to
45 idle CPUs. Boolean parameter, "1" to test, "0" otherwise.
46
47torture_type The type of RCU to test: "rcu" for the rcu_read_lock()
48 API, "rcu_bh" for the rcu_read_lock_bh() API, and "srcu"
49 for the "srcu_read_lock()" API.
50
38verbose Enable debug printk()s. Default is disabled. 51verbose Enable debug printk()s. Default is disabled.
39 52
40 53
@@ -42,14 +55,14 @@ OUTPUT
42 55
43The statistics output is as follows: 56The statistics output is as follows:
44 57
45 rcutorture: --- Start of test: nreaders=16 stat_interval=0 verbose=0 58 rcu-torture: --- Start of test: nreaders=16 stat_interval=0 verbose=0
46 rcutorture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915 59 rcu-torture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915
47 rcutorture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0 60 rcu-torture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0
48 rcutorture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0 61 rcu-torture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0
49 rcutorture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0 62 rcu-torture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0
50 rcutorture: --- End of test 63 rcu-torture: --- End of test
51 64
52The command "dmesg | grep rcutorture:" will extract this information on 65The command "dmesg | grep torture:" will extract this information on
53most systems. On more esoteric configurations, it may be necessary to 66most systems. On more esoteric configurations, it may be necessary to
54use other commands to access the output of the printk()s used by 67use other commands to access the output of the printk()s used by
55the RCU torture test. The printk()s use KERN_ALERT, so they should 68the RCU torture test. The printk()s use KERN_ALERT, so they should
@@ -115,8 +128,9 @@ The following script may be used to torture RCU:
115 modprobe rcutorture 128 modprobe rcutorture
116 sleep 100 129 sleep 100
117 rmmod rcutorture 130 rmmod rcutorture
118 dmesg | grep rcutorture: 131 dmesg | grep torture:
119 132
120The output can be manually inspected for the error flag of "!!!". 133The output can be manually inspected for the error flag of "!!!".
121One could of course create a more elaborate script that automatically 134One could of course create a more elaborate script that automatically
122checked for such errors. 135checked for such errors. The "rmmod" command forces a "SUCCESS" or
136"FAILURE" indication to be printk()ed.
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 6e459420ee9f..318df44259b3 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -184,7 +184,17 @@ synchronize_rcu()
184 blocking, it registers a function and argument which are invoked 184 blocking, it registers a function and argument which are invoked
185 after all ongoing RCU read-side critical sections have completed. 185 after all ongoing RCU read-side critical sections have completed.
186 This callback variant is particularly useful in situations where 186 This callback variant is particularly useful in situations where
187 it is illegal to block. 187 it is illegal to block or where update-side performance is
188 critically important.
189
190 However, the call_rcu() API should not be used lightly, as use
191 of the synchronize_rcu() API generally results in simpler code.
192 In addition, the synchronize_rcu() API has the nice property
193 of automatically limiting update rate should grace periods
194 be delayed. This property results in system resilience in the face
195 of denial-of-service attacks. Code using call_rcu() should limit
196 update rate in order to gain this same sort of resilience. See
197 checklist.txt for some approaches to limiting the update rate.
188 198
189rcu_assign_pointer() 199rcu_assign_pointer()
190 200
@@ -677,8 +687,9 @@ diff shows how closely related RCU and reader-writer locking can be.
677 + spin_lock(&listmutex); 687 + spin_lock(&listmutex);
678 list_for_each_entry(p, head, lp) { 688 list_for_each_entry(p, head, lp) {
679 if (p->key == key) { 689 if (p->key == key) {
680 list_del(&p->list); 690 - list_del(&p->list);
681 - write_unlock(&listmutex); 691 - write_unlock(&listmutex);
692 + list_del_rcu(&p->list);
682 + spin_unlock(&listmutex); 693 + spin_unlock(&listmutex);
683 + synchronize_rcu(); 694 + synchronize_rcu();
684 kfree(p); 695 kfree(p);
@@ -726,7 +737,7 @@ Or, for those who prefer a side-by-side listing:
726 5 write_lock(&listmutex); 5 spin_lock(&listmutex); 737 5 write_lock(&listmutex); 5 spin_lock(&listmutex);
727 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) { 738 6 list_for_each_entry(p, head, lp) { 6 list_for_each_entry(p, head, lp) {
728 7 if (p->key == key) { 7 if (p->key == key) { 739 7 if (p->key == key) { 7 if (p->key == key) {
729 8 list_del(&p->list); 8 list_del(&p->list); 740 8 list_del(&p->list); 8 list_del_rcu(&p->list);
730 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex); 741 9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);
731 10 synchronize_rcu(); 742 10 synchronize_rcu();
73210 kfree(p); 11 kfree(p); 74310 kfree(p); 11 kfree(p);
diff --git a/Documentation/README.DAC960 b/Documentation/README.DAC960
index 98ea617a0dd6..0e8f618ab534 100644
--- a/Documentation/README.DAC960
+++ b/Documentation/README.DAC960
@@ -78,9 +78,9 @@ also known as "System Drives", and Drive Groups are also called "Packs". Both
78terms are in use in the Mylex documentation; I have chosen to standardize on 78terms are in use in the Mylex documentation; I have chosen to standardize on
79the more generic "Logical Drive" and "Drive Group". 79the more generic "Logical Drive" and "Drive Group".
80 80
81DAC960 RAID disk devices are named in the style of the Device File System 81DAC960 RAID disk devices are named in the style of the obsolete Device File
82(DEVFS). The device corresponding to Logical Drive D on Controller C is 82System (DEVFS). The device corresponding to Logical Drive D on Controller C
83referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1 83is referred to as /dev/rd/cCdD, and the partitions are called /dev/rd/cCdDp1
84through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on 84through /dev/rd/cCdDp7. For example, partition 3 of Logical Drive 5 on
85Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI 85Controller 2 is referred to as /dev/rd/c2d5p3. Note that unlike with SCSI
86disks the device names will not change in the event of a disk drive failure. 86disks the device names will not change in the event of a disk drive failure.
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
index 8230098da529..a6cb6ffd2933 100644
--- a/Documentation/SubmitChecklist
+++ b/Documentation/SubmitChecklist
@@ -1,57 +1,66 @@
1Linux Kernel patch submittal checklist 1Linux Kernel patch submittal checklist
2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3 3
4Here are some basic things that developers should do if they 4Here are some basic things that developers should do if they want to see their
5want to see their kernel patch submittals accepted quicker. 5kernel patch submissions accepted more quickly.
6 6
7These are all above and beyond the documentation that is provided 7These are all above and beyond the documentation that is provided in
8in Documentation/SubmittingPatches and elsewhere about submitting 8Documentation/SubmittingPatches and elsewhere regarding submitting Linux
9Linux kernel patches. 9kernel patches.
10 10
11 11
12 12
13- Builds cleanly with applicable or modified CONFIG options =y, =m, and =n. 131: Builds cleanly with applicable or modified CONFIG options =y, =m, and
14 No gcc warnings/errors, no linker warnings/errors. 14 =n. No gcc warnings/errors, no linker warnings/errors.
15 15
16- Passes allnoconfig, allmodconfig 162: Passes allnoconfig, allmodconfig
17 17
18- Builds on multiple CPU arch-es by using local cross-compile tools 183: Builds on multiple CPU architectures by using local cross-compile tools
19 or something like PLM at OSDL. 19 or something like PLM at OSDL.
20 20
21- ppc64 is a good architecture for cross-compilation checking because it 214: ppc64 is a good architecture for cross-compilation checking because it
22 tends to use `unsigned long' for 64-bit quantities. 22 tends to use `unsigned long' for 64-bit quantities.
23 23
24- Matches kernel coding style(!) 245: Matches kernel coding style(!)
25 25
26- Any new or modified CONFIG options don't muck up the config menu. 266: Any new or modified CONFIG options don't muck up the config menu.
27 27
28- All new Kconfig options have help text. 287: All new Kconfig options have help text.
29 29
30- Has been carefully reviewed with respect to relevant Kconfig 308: Has been carefully reviewed with respect to relevant Kconfig
31 combinations. This is very hard to get right with testing -- 31 combinations. This is very hard to get right with testing -- brainpower
32 brainpower pays off here. 32 pays off here.
33 33
34- Check cleanly with sparse. 349: Check cleanly with sparse.
35 35
36- Use 'make checkstack' and 'make namespacecheck' and fix any 3610: Use 'make checkstack' and 'make namespacecheck' and fix any problems
37 problems that they find. Note: checkstack does not point out 37 that they find. Note: checkstack does not point out problems explicitly,
38 problems explicitly, but any one function that uses more than 38 but any one function that uses more than 512 bytes on the stack is a
39 512 bytes on the stack is a candidate for change. 39 candidate for change.
40 40
41- Include kernel-doc to document global kernel APIs. (Not required 4111: Include kernel-doc to document global kernel APIs. (Not required for
42 for static functions, but OK there also.) Use 'make htmldocs' 42 static functions, but OK there also.) Use 'make htmldocs' or 'make
43 or 'make mandocs' to check the kernel-doc and fix any issues. 43 mandocs' to check the kernel-doc and fix any issues.
44 44
45- Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, 4512: Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
46 CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, 46 CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
47 CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously 47 CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
48 enabled. 48 enabled.
49 49
50- Has been build- and runtime tested with and without CONFIG_SMP and 5013: Has been build- and runtime tested with and without CONFIG_SMP and
51 CONFIG_PREEMPT. 51 CONFIG_PREEMPT.
52 52
53- If the patch affects IO/Disk, etc: has been tested with and without 5314: If the patch affects IO/Disk, etc: has been tested with and without
54 CONFIG_LBD. 54 CONFIG_LBD.
55 55
5615: All codepaths have been exercised with all lockdep features enabled.
56 57
572006-APR-27 5816: All new /proc entries are documented under Documentation/
59
6017: All new kernel boot parameters are documented in
61 Documentation/kernel-parameters.txt.
62
6318: All new module parameters are documented with MODULE_PARM_DESC()
64
6519: All new userspace interfaces are documented in Documentation/ABI/.
66 See Documentation/ABI/README for more information.
diff --git a/Documentation/SubmittingDrivers b/Documentation/SubmittingDrivers
index 6bd30fdd0786..58bead05eabb 100644
--- a/Documentation/SubmittingDrivers
+++ b/Documentation/SubmittingDrivers
@@ -59,11 +59,11 @@ Copyright: The copyright owner must agree to use of GPL.
59 are the same person/entity. If not, the name of 59 are the same person/entity. If not, the name of
60 the person/entity authorizing use of GPL should be 60 the person/entity authorizing use of GPL should be
61 listed in case it's necessary to verify the will of 61 listed in case it's necessary to verify the will of
62 the copright owner. 62 the copyright owner.
63 63
64Interfaces: If your driver uses existing interfaces and behaves like 64Interfaces: If your driver uses existing interfaces and behaves like
65 other drivers in the same class it will be much more likely 65 other drivers in the same class it will be much more likely
66 to be accepted than if it invents gratuitous new ones. 66 to be accepted than if it invents gratuitous new ones.
67 If you need to implement a common API over Linux and NT 67 If you need to implement a common API over Linux and NT
68 drivers do it in userspace. 68 drivers do it in userspace.
69 69
@@ -88,7 +88,7 @@ Clarity: It helps if anyone can see how to fix the driver. It helps
88 it will go in the bitbucket. 88 it will go in the bitbucket.
89 89
90Control: In general if there is active maintenance of a driver by 90Control: In general if there is active maintenance of a driver by
91 the author then patches will be redirected to them unless 91 the author then patches will be redirected to them unless
92 they are totally obvious and without need of checking. 92 they are totally obvious and without need of checking.
93 If you want to be the contact and update point for the 93 If you want to be the contact and update point for the
94 driver it is a good idea to state this in the comments, 94 driver it is a good idea to state this in the comments,
@@ -100,7 +100,7 @@ What Criteria Do Not Determine Acceptance
100Vendor: Being the hardware vendor and maintaining the driver is 100Vendor: Being the hardware vendor and maintaining the driver is
101 often a good thing. If there is a stable working driver from 101 often a good thing. If there is a stable working driver from
102 other people already in the tree don't expect 'we are the 102 other people already in the tree don't expect 'we are the
103 vendor' to get your driver chosen. Ideally work with the 103 vendor' to get your driver chosen. Ideally work with the
104 existing driver author to build a single perfect driver. 104 existing driver author to build a single perfect driver.
105 105
106Author: It doesn't matter if a large Linux company wrote the driver, 106Author: It doesn't matter if a large Linux company wrote the driver,
@@ -116,17 +116,13 @@ Linux kernel master tree:
116 ftp.??.kernel.org:/pub/linux/kernel/... 116 ftp.??.kernel.org:/pub/linux/kernel/...
117 ?? == your country code, such as "us", "uk", "fr", etc. 117 ?? == your country code, such as "us", "uk", "fr", etc.
118 118
119Linux kernel mailing list: 119Linux kernel mailing list:
120 linux-kernel@vger.kernel.org 120 linux-kernel@vger.kernel.org
121 [mail majordomo@vger.kernel.org to subscribe] 121 [mail majordomo@vger.kernel.org to subscribe]
122 122
123Linux Device Drivers, Third Edition (covers 2.6.10): 123Linux Device Drivers, Third Edition (covers 2.6.10):
124 http://lwn.net/Kernel/LDD3/ (free version) 124 http://lwn.net/Kernel/LDD3/ (free version)
125 125
126Kernel traffic:
127 Weekly summary of kernel list activity (much easier to read)
128 http://www.kerneltraffic.org/kernel-traffic/
129
130LWN.net: 126LWN.net:
131 Weekly summary of kernel development activity - http://lwn.net/ 127 Weekly summary of kernel development activity - http://lwn.net/
132 2.6 API changes: 128 2.6 API changes:
@@ -145,11 +141,8 @@ KernelNewbies:
145Linux USB project: 141Linux USB project:
146 http://www.linux-usb.org/ 142 http://www.linux-usb.org/
147 143
148How to NOT write kernel driver by arjanv@redhat.com 144How to NOT write kernel driver by Arjan van de Ven:
149 http://people.redhat.com/arjanv/olspaper.pdf 145 http://www.fenrus.org/how-to-not-write-a-device-driver-paper.pdf
150 146
151Kernel Janitor: 147Kernel Janitor:
152 http://janitor.kernelnewbies.org/ 148 http://janitor.kernelnewbies.org/
153
154--
155Last updated on 17 Nov 2005.
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index c2c85bcb3d43..302d148c2e18 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -10,7 +10,9 @@ kernel, the process can sometimes be daunting if you're not familiar
10with "the system." This text is a collection of suggestions which 10with "the system." This text is a collection of suggestions which
11can greatly increase the chances of your change being accepted. 11can greatly increase the chances of your change being accepted.
12 12
13If you are submitting a driver, also read Documentation/SubmittingDrivers. 13Read Documentation/SubmitChecklist for a list of items to check
14before submitting code. If you are submitting a driver, also read
15Documentation/SubmittingDrivers.
14 16
15 17
16 18
@@ -74,9 +76,6 @@ There are a number of scripts which can aid in this:
74Quilt: 76Quilt:
75http://savannah.nongnu.org/projects/quilt 77http://savannah.nongnu.org/projects/quilt
76 78
77Randy Dunlap's patch scripts:
78http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz
79
80Andrew Morton's patch scripts: 79Andrew Morton's patch scripts:
81http://www.zip.com.au/~akpm/linux/patches/ 80http://www.zip.com.au/~akpm/linux/patches/
82Instead of these scripts, quilt is the recommended patch management 81Instead of these scripts, quilt is the recommended patch management
@@ -174,15 +173,15 @@ For small patches you may want to CC the Trivial Patch Monkey
174trivial@kernel.org managed by Adrian Bunk; which collects "trivial" 173trivial@kernel.org managed by Adrian Bunk; which collects "trivial"
175patches. Trivial patches must qualify for one of the following rules: 174patches. Trivial patches must qualify for one of the following rules:
176 Spelling fixes in documentation 175 Spelling fixes in documentation
177 Spelling fixes which could break grep(1). 176 Spelling fixes which could break grep(1)
178 Warning fixes (cluttering with useless warnings is bad) 177 Warning fixes (cluttering with useless warnings is bad)
179 Compilation fixes (only if they are actually correct) 178 Compilation fixes (only if they are actually correct)
180 Runtime fixes (only if they actually fix things) 179 Runtime fixes (only if they actually fix things)
181 Removing use of deprecated functions/macros (eg. check_region). 180 Removing use of deprecated functions/macros (eg. check_region)
182 Contact detail and documentation fixes 181 Contact detail and documentation fixes
183 Non-portable code replaced by portable code (even in arch-specific, 182 Non-portable code replaced by portable code (even in arch-specific,
184 since people copy, as long as it's trivial) 183 since people copy, as long as it's trivial)
185 Any fix by the author/maintainer of the file. (ie. patch monkey 184 Any fix by the author/maintainer of the file (ie. patch monkey
186 in re-transmission mode) 185 in re-transmission mode)
187URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/> 186URL: <http://www.kernel.org/pub/linux/kernel/people/bunk/trivial/>
188 187
@@ -210,6 +209,19 @@ Exception: If your mailer is mangling patches then someone may ask
210you to re-send them using MIME. 209you to re-send them using MIME.
211 210
212 211
212WARNING: Some mailers like Mozilla send your messages with
213---- message header ----
214Content-Type: text/plain; charset=us-ascii; format=flowed
215---- message header ----
216The problem is that "format=flowed" makes some mailers on the
217receiving side replace TABs with spaces and make similar
218changes. Your patches can then look corrupted.
219
220To fix this, make your Mozilla defaults/pref/mailnews.js file look like:
221pref("mailnews.send_plaintext_flowed", false); // RFC 2646
222pref("mailnews.display.disable_format_flowed_support", true);
223
224
213 225
2147) E-mail size. 2267) E-mail size.
215 227
@@ -246,13 +258,13 @@ updated change.
246It is quite common for Linus to "drop" your patch without comment. 258It is quite common for Linus to "drop" your patch without comment.
247That's the nature of the system. If he drops your patch, it could be 259That's the nature of the system. If he drops your patch, it could be
248due to 260due to
249* Your patch did not apply cleanly to the latest kernel version 261* Your patch did not apply cleanly to the latest kernel version.
250* Your patch was not sufficiently discussed on linux-kernel. 262* Your patch was not sufficiently discussed on linux-kernel.
251* A style issue (see section 2), 263* A style issue (see section 2).
252* An e-mail formatting issue (re-read this section) 264* An e-mail formatting issue (re-read this section).
253* A technical problem with your change 265* A technical problem with your change.
254* He gets tons of e-mail, and yours got lost in the shuffle 266* He gets tons of e-mail, and yours got lost in the shuffle.
255* You are being annoying (See Figure 1) 267* You are being annoying.
256 268
257When in doubt, solicit comments on linux-kernel mailing list. 269When in doubt, solicit comments on linux-kernel mailing list.
258 270
@@ -309,6 +321,8 @@ then you just add a line saying
309 321
310 Signed-off-by: Random J Developer <random@developer.example.org> 322 Signed-off-by: Random J Developer <random@developer.example.org>
311 323
324using your real name (sorry, no pseudonyms or anonymous contributions.)
325
312Some people also put extra tags at the end. They'll just be ignored for 326Some people also put extra tags at the end. They'll just be ignored for
313now, but you can do this to mark internal company procedures or just 327now, but you can do this to mark internal company procedures or just
314point out some special detail about the sign-off. 328point out some special detail about the sign-off.
@@ -475,22 +489,21 @@ SECTION 3 - REFERENCES
475Andrew Morton, "The perfect patch" (tpp). 489Andrew Morton, "The perfect patch" (tpp).
476 <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt> 490 <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt>
477 491
478Jeff Garzik, "Linux kernel patch submission format." 492Jeff Garzik, "Linux kernel patch submission format".
479 <http://linux.yyz.us/patch-format.html> 493 <http://linux.yyz.us/patch-format.html>
480 494
481Greg Kroah-Hartman "How to piss off a kernel subsystem maintainer". 495Greg Kroah-Hartman, "How to piss off a kernel subsystem maintainer".
482 <http://www.kroah.com/log/2005/03/31/> 496 <http://www.kroah.com/log/2005/03/31/>
483 <http://www.kroah.com/log/2005/07/08/> 497 <http://www.kroah.com/log/2005/07/08/>
484 <http://www.kroah.com/log/2005/10/19/> 498 <http://www.kroah.com/log/2005/10/19/>
485 <http://www.kroah.com/log/2006/01/11/> 499 <http://www.kroah.com/log/2006/01/11/>
486 500
487NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!. 501NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!
488 <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2> 502 <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2>
489 503
490Kernel Documentation/CodingStyle 504Kernel Documentation/CodingStyle:
491 <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle> 505 <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle>
492 506
493Linus Torvald's mail on the canonical patch format: 507Linus Torvalds's mail on the canonical patch format:
494 <http://lkml.org/lkml/2005/4/7/183> 508 <http://lkml.org/lkml/2005/4/7/183>
495-- 509--
496Last updated on 17 Nov 2005.
diff --git a/Documentation/accounting/delay-accounting.txt b/Documentation/accounting/delay-accounting.txt
new file mode 100644
index 000000000000..1443cd71d263
--- /dev/null
+++ b/Documentation/accounting/delay-accounting.txt
@@ -0,0 +1,112 @@
1Delay accounting
2----------------
3
4Tasks encounter delays in execution when they wait
5for some kernel resource to become available e.g. a
6runnable task may wait for a free CPU to run on.
7
8The per-task delay accounting functionality measures
9the delays experienced by a task while
10
11a) waiting for a CPU (while being runnable)
12b) waiting for completion of synchronous block I/O initiated by the task
13c) swapping in pages
14
15and makes these statistics available to userspace through
16the taskstats interface.
17
18Such delays provide feedback for setting a task's cpu priority,
19io priority and rss limit values appropriately. Long delays for an
20important task could be a trigger for raising its corresponding priority.
21
22The functionality, through its use of the taskstats interface, also provides
23delay statistics aggregated for all tasks (or threads) belonging to a
24thread group (corresponding to a traditional Unix process). This is a commonly
25needed aggregation that is more efficiently done by the kernel.
26
27Userspace utilities, particularly resource management applications, can also
28aggregate delay statistics into arbitrary groups. To enable this, delay
29statistics of a task are available both during its lifetime and on its
30exit, ensuring that continuous and complete monitoring can be done.
31
32
33Interface
34---------
35
36Delay accounting uses the taskstats interface which is described
37in detail in a separate document in this directory. Taskstats returns a
38generic data structure to userspace corresponding to per-pid and per-tgid
39statistics. The delay accounting functionality populates specific fields of
40this structure. See
41 include/linux/taskstats.h
42for a description of the fields pertaining to delay accounting.
43It will generally be in the form of counters returning the cumulative
44delay seen for cpu, sync block I/O, swapin etc.
45
46Taking the difference of two successive readings of a given
47counter (say cpu_delay_total) for a task will give the delay
48experienced by the task waiting for the corresponding resource
49in that interval.
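
The interval calculation described above can be sketched as follows. This
is only an illustration and not part of getdelays.c: struct delay_sample
and both helper names are hypothetical, and the two fields merely mirror
the cpu_count and cpu_delay_total counters of struct taskstats. The unit of
the result is whatever taskstats.h documents for the counter being read.

```c
#include <stdint.h>

/* Hypothetical snapshot of two cumulative counters read via taskstats. */
struct delay_sample {
	uint64_t cpu_count;		/* number of delay events so far */
	uint64_t cpu_delay_total;	/* cumulative delay so far */
};

/* Delay accumulated between two successive readings of the same task. */
static uint64_t delay_in_interval(const struct delay_sample *earlier,
				  const struct delay_sample *later)
{
	return later->cpu_delay_total - earlier->cpu_delay_total;
}

/* Average delay per event in the interval, or 0 if no event occurred. */
static uint64_t avg_delay_in_interval(const struct delay_sample *earlier,
				      const struct delay_sample *later)
{
	uint64_t events = later->cpu_count - earlier->cpu_count;

	return events ? delay_in_interval(earlier, later) / events : 0;
}
```

A monitoring tool would take one snapshot per polling period and feed each
consecutive pair to these helpers.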
50
51When a task exits, records containing the per-task statistics
52are sent to userspace without requiring a command. If it is the last exiting
53task of a thread group, the per-tgid statistics are also sent. More details
54are given in the taskstats interface description.
55
56The getdelays.c userspace utility in this directory allows simple commands to
57be run and the corresponding delay statistics to be displayed. It also serves
58as an example of using the taskstats interface.
59
60Usage
61-----
62
63Compile the kernel with
64 CONFIG_TASK_DELAY_ACCT=y
65 CONFIG_TASKSTATS=y
66
67Delay accounting is enabled by default at boot up.
68To disable, add
69 nodelayacct
70to the kernel boot options. The rest of the instructions
71below assume this has not been done.
72
73After the system has booted up, use a utility
74similar to getdelays.c to access the delays
75seen by a given task or a task group (tgid).
76The utility also allows a given command to be
77executed and the corresponding delays to be
78seen.
79
80General format of the getdelays command
81
82getdelays [-t tgid] [-p pid] [-c cmd...]
83
84
85Get delays, since system boot, for pid 10
86# ./getdelays -p 10
87(output similar to next case)
88
89Get sum of delays, since system boot, for all pids with tgid 5
90# ./getdelays -t 5
91
92
93CPU count real total virtual total delay total
94 7876 92005750 100000000 24001500
95IO count delay total
96 0 0
97MEM count delay total
98 0 0
99
100Get delays seen in executing a given simple command
101# ./getdelays -c ls /
102
103bin data1 data3 data5 dev home media opt root srv sys usr
104boot data2 data4 data6 etc lib mnt proc sbin subdomain tmp var
105
106
107CPU count real total virtual total delay total
108 6 4000250 4000000 0
109IO count delay total
110 0 0
111MEM count delay total
112 0 0
diff --git a/Documentation/accounting/getdelays.c b/Documentation/accounting/getdelays.c
new file mode 100644
index 000000000000..795ca3911cc5
--- /dev/null
+++ b/Documentation/accounting/getdelays.c
@@ -0,0 +1,396 @@
1/* getdelays.c
2 *
3 * Utility to get per-pid and per-tgid delay accounting statistics
4 * Also illustrates usage of the taskstats interface
5 *
6 * Copyright (C) Shailabh Nagar, IBM Corp. 2005
7 * Copyright (C) Balbir Singh, IBM Corp. 2006
8 * Copyright (c) Jay Lan, SGI. 2006
9 *
10 */
11
12#include <stdio.h>
13#include <stdlib.h>
14#include <errno.h>
15#include <unistd.h>
16#include <poll.h>
17#include <string.h>
18#include <fcntl.h>
19#include <sys/types.h>
20#include <sys/stat.h>
21#include <sys/socket.h>
22#include <sys/types.h>
23#include <signal.h>
24
25#include <linux/genetlink.h>
26#include <linux/taskstats.h>
27
28/*
29 * Generic macros for dealing with netlink sockets. Might be duplicated
30 * elsewhere. It is recommended that commercial grade applications use
31 * libnl or libnetlink and use the interfaces provided by the library
32 */
33#define GENLMSG_DATA(glh) ((void *)(NLMSG_DATA(glh) + GENL_HDRLEN))
34#define GENLMSG_PAYLOAD(glh) (NLMSG_PAYLOAD(glh, 0) - GENL_HDRLEN)
35#define NLA_DATA(na) ((void *)((char*)(na) + NLA_HDRLEN))
36#define NLA_PAYLOAD(len) (len - NLA_HDRLEN)
37
38#define err(code, fmt, arg...) do { printf(fmt, ##arg); exit(code); } while (0)
39int done = 0;
40int rcvbufsz=0;
41
42 char name[100];
43int dbg=0, print_delays=0;
44__u64 stime, utime;
45#define PRINTF(fmt, arg...) { \
46 if (dbg) { \
47 printf(fmt, ##arg); \
48 } \
49 }
50
51/* Maximum size of response requested or message sent */
52#define MAX_MSG_SIZE 256
53/* Maximum number of cpus expected to be specified in a cpumask */
54#define MAX_CPUS 32
55/* Maximum length of pathname to log file */
56#define MAX_FILENAME 256
57
58struct msgtemplate {
59 struct nlmsghdr n;
60 struct genlmsghdr g;
61 char buf[MAX_MSG_SIZE];
62};
63
64char cpumask[100+6*MAX_CPUS];
65
66/*
67 * Create a raw netlink socket and bind
68 */
69static int create_nl_socket(int protocol)
70{
71 int fd;
72 struct sockaddr_nl local;
73
74 fd = socket(AF_NETLINK, SOCK_RAW, protocol);
75 if (fd < 0)
76 return -1;
77
78 if (rcvbufsz)
79 if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
80 &rcvbufsz, sizeof(rcvbufsz)) < 0) {
81 printf("Unable to set socket rcv buf size to %d\n",
82 rcvbufsz);
83 return -1;
84 }
85
86 memset(&local, 0, sizeof(local));
87 local.nl_family = AF_NETLINK;
88
89 if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0)
90 goto error;
91
92 return fd;
93error:
94 close(fd);
95 return -1;
96}
97
98
99int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid,
100 __u8 genl_cmd, __u16 nla_type,
101 void *nla_data, int nla_len)
102{
103 struct nlattr *na;
104 struct sockaddr_nl nladdr;
105 int r, buflen;
106 char *buf;
107
108 struct msgtemplate msg;
109
110 msg.n.nlmsg_len = NLMSG_LENGTH(GENL_HDRLEN);
111 msg.n.nlmsg_type = nlmsg_type;
112 msg.n.nlmsg_flags = NLM_F_REQUEST;
113 msg.n.nlmsg_seq = 0;
114 msg.n.nlmsg_pid = nlmsg_pid;
115 msg.g.cmd = genl_cmd;
116 msg.g.version = 0x1;
117 na = (struct nlattr *) GENLMSG_DATA(&msg);
118 na->nla_type = nla_type;
119 na->nla_len = nla_len + 1 + NLA_HDRLEN;
120 memcpy(NLA_DATA(na), nla_data, nla_len);
121 msg.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);
122
123 buf = (char *) &msg;
124 buflen = msg.n.nlmsg_len ;
125 memset(&nladdr, 0, sizeof(nladdr));
126 nladdr.nl_family = AF_NETLINK;
127 while ((r = sendto(sd, buf, buflen, 0, (struct sockaddr *) &nladdr,
128 sizeof(nladdr))) < buflen) {
129 if (r > 0) {
130 buf += r;
131 buflen -= r;
132 } else if (errno != EAGAIN)
133 return -1;
134 }
135 return 0;
136}
137
138
139/*
140 * Probe the controller in genetlink to find the family id
141 * for the TASKSTATS family
142 */
143int get_family_id(int sd)
144{
145 struct {
146 struct nlmsghdr n;
147 struct genlmsghdr g;
148 char buf[256];
149 } ans;
150
151 int id = 0, rc;
152 struct nlattr *na;
153 int rep_len;
154
155 strcpy(name, TASKSTATS_GENL_NAME);
156 rc = send_cmd(sd, GENL_ID_CTRL, getpid(), CTRL_CMD_GETFAMILY,
157 CTRL_ATTR_FAMILY_NAME, (void *)name,
158 strlen(TASKSTATS_GENL_NAME)+1);
159
160 rep_len = recv(sd, &ans, sizeof(ans), 0);
161 if (ans.n.nlmsg_type == NLMSG_ERROR ||
162 (rep_len < 0) || !NLMSG_OK((&ans.n), rep_len))
163 return 0;
164
165 na = (struct nlattr *) GENLMSG_DATA(&ans);
166 na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len));
167 if (na->nla_type == CTRL_ATTR_FAMILY_ID) {
168 id = *(__u16 *) NLA_DATA(na);
169 }
170 return id;
171}
172
173void print_delayacct(struct taskstats *t)
174{
175 printf("\n\nCPU %15s%15s%15s%15s\n"
176 " %15llu%15llu%15llu%15llu\n"
177 "IO %15s%15s\n"
178 " %15llu%15llu\n"
179 "MEM %15s%15s\n"
180 " %15llu%15llu\n\n",
181 "count", "real total", "virtual total", "delay total",
182 t->cpu_count, t->cpu_run_real_total, t->cpu_run_virtual_total,
183 t->cpu_delay_total,
184 "count", "delay total",
185 t->blkio_count, t->blkio_delay_total,
186 "count", "delay total", t->swapin_count, t->swapin_delay_total);
187}
188
189int main(int argc, char *argv[])
190{
191 int c, rc, rep_len, aggr_len, len2, cmd_type;
192 __u16 id;
193 __u32 mypid;
194
195 struct nlattr *na;
196 int nl_sd = -1;
197 int len = 0;
198 pid_t tid = 0;
199 pid_t rtid = 0;
200
201 int fd = 0;
202 int count = 0;
203 int write_file = 0;
204 int maskset = 0;
205 char logfile[128];
206 int loop = 0;
207
208 struct msgtemplate msg;
209
210 while (1) {
211 c = getopt(argc, argv, "dw:r:m:t:p:vl");
212 if (c < 0)
213 break;
214
215 switch (c) {
216 case 'd':
217 printf("print delayacct stats ON\n");
218 print_delays = 1;
219 break;
220 case 'w':
221 snprintf(logfile, sizeof(logfile), "%s", optarg);
222 printf("write to file %s\n", logfile);
223 write_file = 1;
224 break;
225 case 'r':
226 rcvbufsz = atoi(optarg);
227 printf("receive buf size %d\n", rcvbufsz);
228 if (rcvbufsz < 0)
229 err(1, "Invalid rcv buf size\n");
230 break;
231 case 'm':
232 snprintf(cpumask, sizeof(cpumask), "%s", optarg);
233 maskset = 1;
234 printf("cpumask %s maskset %d\n", cpumask, maskset);
235 break;
236 case 't':
237 tid = atoi(optarg);
238 if (!tid)
239 err(1, "Invalid tgid\n");
240 cmd_type = TASKSTATS_CMD_ATTR_TGID;
241 print_delays = 1;
242 break;
243 case 'p':
244 tid = atoi(optarg);
245 if (!tid)
246 err(1, "Invalid pid\n");
247 cmd_type = TASKSTATS_CMD_ATTR_PID;
248 print_delays = 1;
249 break;
250 case 'v':
251 printf("debug on\n");
252 dbg = 1;
253 break;
254 case 'l':
255 printf("listen forever\n");
256 loop = 1;
257 break;
258 default:
259 printf("Unknown option %d\n", c);
260 exit(-1);
261 }
262 }
263
264 if (write_file) {
265 fd = open(logfile, O_WRONLY | O_CREAT | O_TRUNC,
266 S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH);
267 if (fd == -1) {
268 perror("Cannot open output file");
269 exit(1);
270 }
271 }
272
273 if ((nl_sd = create_nl_socket(NETLINK_GENERIC)) < 0)
274 err(1, "error creating Netlink socket\n");
275
276
277 mypid = getpid();
278 id = get_family_id(nl_sd);
279 if (!id) {
280 printf("Error getting family id, errno %d\n", errno);
281 goto err;
282 }
283 PRINTF("family id %d\n", id);
284
285 if (maskset) {
286 rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET,
287 TASKSTATS_CMD_ATTR_REGISTER_CPUMASK,
288 &cpumask, sizeof(cpumask));
289 PRINTF("Sent register cpumask, retval %d\n", rc);
290 if (rc < 0) {
291 printf("error sending register cpumask\n");
292 goto err;
293 }
294 }
295
296 if (tid) {
297 rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET,
298 cmd_type, &tid, sizeof(__u32));
299 PRINTF("Sent pid/tgid, retval %d\n", rc);
300 if (rc < 0) {
301 printf("error sending tid/tgid cmd\n");
302 goto done;
303 }
304 }
305
306 do {
307 int i;
308
309 rep_len = recv(nl_sd, &msg, sizeof(msg), 0);
310 PRINTF("received %d bytes\n", rep_len);
311
312 if (rep_len < 0) {
313 printf("nonfatal reply error: errno %d\n", errno);
314 continue;
315 }
316 if (msg.n.nlmsg_type == NLMSG_ERROR ||
317 !NLMSG_OK((&msg.n), rep_len)) {
318 printf("fatal reply error, errno %d\n", errno);
319 goto done;
320 }
321
322 PRINTF("nlmsghdr size=%zu, nlmsg_len=%d, rep_len=%d\n",
323 sizeof(struct nlmsghdr), msg.n.nlmsg_len, rep_len);
324
325
326 rep_len = GENLMSG_PAYLOAD(&msg.n);
327
328 na = (struct nlattr *) GENLMSG_DATA(&msg);
329 len = 0;
330 i = 0;
331 while (len < rep_len) {
332 len += NLA_ALIGN(na->nla_len);
333 switch (na->nla_type) {
334 case TASKSTATS_TYPE_AGGR_TGID:
335 /* Fall through */
336 case TASKSTATS_TYPE_AGGR_PID:
337 aggr_len = NLA_PAYLOAD(na->nla_len);
338 len2 = 0;
339 /* For nested attributes, na follows */
340 na = (struct nlattr *) NLA_DATA(na);
341 done = 0;
342 while (len2 < aggr_len) {
343 switch (na->nla_type) {
344 case TASKSTATS_TYPE_PID:
345 rtid = *(int *) NLA_DATA(na);
346 if (print_delays)
347 printf("PID\t%d\n", rtid);
348 break;
349 case TASKSTATS_TYPE_TGID:
350 rtid = *(int *) NLA_DATA(na);
351 if (print_delays)
352 printf("TGID\t%d\n", rtid);
353 break;
354 case TASKSTATS_TYPE_STATS:
355 count++;
356 if (print_delays)
357 print_delayacct((struct taskstats *) NLA_DATA(na));
358 if (fd) {
359 if (write(fd, NLA_DATA(na), na->nla_len) < 0) {
360 err(1,"write error\n");
361 }
362 }
363 if (!loop)
364 goto done;
365 break;
366 default:
367 printf("Unknown nested nla_type %d\n", na->nla_type);
368 break;
369 }
370 len2 += NLA_ALIGN(na->nla_len);
371 na = (struct nlattr *) ((char *) na + NLA_ALIGN(na->nla_len));
372 }
373 break;
374
375 default:
376 printf("Unknown nla_type %d\n", na->nla_type);
377 break;
378 }
379 na = (struct nlattr *) (GENLMSG_DATA(&msg) + len);
380 }
381 } while (loop);
382done:
383 if (maskset) {
384 rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET,
385 TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK,
386 &cpumask, sizeof(cpumask));
387 printf("Sent deregister mask, retval %d\n", rc);
388 if (rc < 0)
389 err(rc, "error sending deregister cpumask\n");
390 }
391err:
392 close(nl_sd);
393 if (fd)
394 close(fd);
395 return 0;
396}
diff --git a/Documentation/accounting/taskstats.txt b/Documentation/accounting/taskstats.txt
new file mode 100644
index 000000000000..92ebf29e9041
--- /dev/null
+++ b/Documentation/accounting/taskstats.txt
@@ -0,0 +1,181 @@
1Per-task statistics interface
2-----------------------------
3
4
5Taskstats is a netlink-based interface for sending per-task and
6per-process statistics from the kernel to userspace.
7
8Taskstats was designed for the following benefits:
9
10- efficiently provide statistics during lifetime of a task and on its exit
11- unified interface for multiple accounting subsystems
12- extensibility for use by future accounting patches
13
14Terminology
15-----------
16
17"pid", "tid" and "task" are used interchangeably and refer to the standard
18Linux task defined by struct task_struct. per-pid stats are the same as
19per-task stats.
20
21"tgid", "process" and "thread group" are used interchangeably and refer to the
22tasks that share an mm_struct i.e. the traditional Unix process. Despite the
23use of tgid, there is no special treatment for the task that is thread group
24leader - a process is deemed alive as long as it has any task belonging to it.
25
26Usage
27-----
28
29To get statistics during a task's lifetime, userspace opens a unicast netlink
30socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
31The response contains statistics for a task (if pid is specified) or the sum of
32statistics for all tasks of the process (if tgid is specified).
33
34To obtain statistics for tasks which are exiting, the userspace listener
35sends a register command and specifies a cpumask. Whenever a task exits on
36one of the cpus in the cpumask, its per-pid statistics are sent to the
37registered listener. Using cpumasks allows the data received by one listener
38to be limited and assists in flow control over the netlink interface and is
39explained in more detail below.
40
41If the exiting task is the last thread exiting its thread group,
42an additional record containing the per-tgid stats is also sent to userspace.
43The latter contains the sum of per-pid stats for all threads in the thread
44group, both past and present.
45
46getdelays.c is a simple utility demonstrating usage of the taskstats interface
47for reporting delay accounting statistics. Users can register cpumasks,
48send commands and process responses, listen for per-tid/tgid exit data,
49write the data received to a file and do basic flow control by increasing
50receive buffer sizes.
51
52Interface
53---------
54
55The user-kernel interface is encapsulated in include/linux/taskstats.h
56
57To avoid this documentation becoming obsolete as the interface evolves, only
58an outline of the current version is given. taskstats.h always overrides the
59description here.
60
61struct taskstats is the common accounting structure for both per-pid and
62per-tgid data. It is versioned and can be extended by each accounting subsystem
63that is added to the kernel. The fields and their semantics are defined in the
64taskstats.h file.
65
66The data exchanged between user and kernel space is a netlink message belonging
67to the NETLINK_GENERIC family and using the netlink attributes interface.
68The messages are in the format
69
70 +----------+- - -+-------------+-------------------+
71 | nlmsghdr | Pad | genlmsghdr | taskstats payload |
72 +----------+- - -+-------------+-------------------+
73
74
75The taskstats payload is one of the following three kinds:
76
771. Commands: Sent from user to kernel. Commands to get data on
78a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID,
79containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes
80the task/process for which userspace wants statistics.
81
82Commands to register/deregister interest in exit data from a set of cpus
83consist of one attribute, of type
84TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the
85attribute payload. The cpumask is specified as an ascii string of
86comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8
87the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest
88in cpus before closing the listening socket, the kernel cleans up its interest
89set over time. However, for the sake of efficiency, an explicit deregistration
90is advisable.
91
922. Response for a command: sent from the kernel in response to a userspace
93command. The payload is a series of three attributes of type:
94
95a) TASKSTATS_TYPE_AGGR_PID/TGID: attribute containing no payload; it indicates
96that a pid/tgid will be followed by some stats.
97
98b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
99are being returned.
100
101c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
102same structure is used for both per-pid and per-tgid stats.
103
1043. New message sent by kernel whenever a task exits. The payload consists of a
105 series of attributes of the following type:
106
107a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
108b) TASKSTATS_TYPE_PID: contains exiting task's pid
109c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
110d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
111e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
112f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process
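
Each attribute in the payloads listed above is a type-length-value record,
so a receiver locates the one it wants by stepping from header to header.
The sketch below shows that walk over a flat buffer. It is an illustration
under stated assumptions: struct attr_hdr mirrors the 16-bit length and
16-bit type of struct nlattr, ATTR_ALIGN mirrors the 4-byte netlink
attribute alignment, and find_attr is a hypothetical helper. Real programs
should prefer the definitions in the kernel headers or a library such as
libnl.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: 16-bit length (header included), 16-bit type, payload. */
struct attr_hdr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGN(len)	(((len) + 3u) & ~3u)
#define ATTR_HDRLEN	((size_t)ATTR_ALIGN(sizeof(struct attr_hdr)))

/*
 * Find the first attribute of the given type in buf[0..buflen).
 * Returns a pointer to its payload and stores the payload length in
 * *payload_len, or returns NULL if it is absent or the buffer is malformed.
 */
static const void *find_attr(const void *buf, size_t buflen,
			     uint16_t type, size_t *payload_len)
{
	const unsigned char *p = buf;
	size_t off = 0;

	while (buflen - off >= ATTR_HDRLEN) {
		struct attr_hdr h;

		memcpy(&h, p + off, sizeof(h));	/* avoid unaligned access */
		if (h.len < ATTR_HDRLEN || h.len > buflen - off)
			return NULL;		/* truncated or corrupt */
		if (h.type == type) {
			*payload_len = h.len - ATTR_HDRLEN;
			return p + off + ATTR_HDRLEN;
		}
		off += ATTR_ALIGN(h.len);	/* next header is 4-byte aligned */
		if (off > buflen)
			break;
	}
	return NULL;
}
```

Checking each length against the remaining buffer before using it keeps a
short or corrupt message from walking the parser past the end of the data.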
113
114
115per-tgid stats
116--------------
117
118Taskstats provides per-process stats, in addition to per-task stats, since
119resource management is often done at a process granularity and aggregating task
120stats in userspace alone is inefficient and potentially inaccurate (due to lack
121of atomicity).
122
123However, maintaining per-process, in addition to per-task stats, within the
124kernel has space and time overheads. To address this, the taskstats code
125accumulates each exiting task's statistics into a process-wide data structure.
126When the last task of a process exits, the accumulated process-level data also
127gets sent to userspace (along with the per-task data).
128
129When a user queries per-tgid data, the stats of all live threads in the group
130are summed and added to the accumulated total for previously exited
131threads of the same thread group.
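
The per-tgid sum described above amounts to the following arithmetic for any
single counter. This is an illustrative sketch only, not the kernel's code:
tgid_total and its parameters are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration of the per-tgid sum: 'exited_total' stands for
 * the value already accumulated from threads that have exited, 'live' holds
 * the current per-task values of the threads that are still alive.
 */
static uint64_t tgid_total(uint64_t exited_total,
			   const uint64_t *live, size_t nr_live)
{
	uint64_t sum = exited_total;
	size_t i;

	for (i = 0; i < nr_live; i++)
		sum += live[i];
	return sum;
}
```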
132
133Extending taskstats
134-------------------
135
136There are two ways to extend the taskstats interface to export more
137per-task/process stats as patches to collect them get added to the kernel
138in future:
139
1401. Adding more fields to the end of the existing struct taskstats. Backward
141 compatibility is ensured by the version number within the
142 structure. Userspace will use only the fields of the struct that correspond
143 to the version it is using.
144
1452. Defining separate statistic structs and using the netlink attributes
146 interface to return them. Since userspace processes each netlink attribute
147 independently, it can always ignore attributes whose type it does not
148 understand (because it is using an older version of the interface).
149
150
151Choosing between 1. and 2. is a matter of trading off flexibility and
152overhead. If only a few fields need to be added, then 1. is the preferable
153path since the kernel and userspace don't need to incur the overhead of
154processing new netlink attributes. But if the new fields expand the existing
155struct too much, requiring disparate userspace accounting utilities to
156unnecessarily receive large structures whose fields are of no interest, then
157extending the attributes structure would be worthwhile.
158
159Flow control for taskstats
160--------------------------
161
162When the rate of task exits becomes large, a listener may not be able to keep
163up with the kernel's rate of sending per-tid/tgid exit data leading to data
164loss. This possibility gets compounded when the taskstats structure gets
165extended and the number of cpus grows large.
166
167To avoid losing statistics, userspace should do one or more of the following:
168
169- increase the receive buffer sizes for the netlink sockets opened by
170listeners to receive exit data.
171
172- create more listeners and reduce the number of cpus being listened to by
173each listener. In the extreme case, there could be one listener for each cpu.
174Users may also consider setting the cpu affinity of the listener to the subset
175of cpus to which it listens, especially if they are listening to just one cpu.
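
As one way of spreading cpus over several listeners, the sketch below
computes the cpumask string that each listener could register. It assumes
cpus are numbered contiguously from 0 and gives every listener a single
contiguous range; listener_cpumask is a hypothetical helper, not part of
getdelays.c.

```c
#include <stdio.h>

/*
 * Write the cpumask string for listener 'idx' (0-based) when 'ncpus' cpus,
 * numbered 0..ncpus-1, are divided as evenly as possible among 'nlisteners'
 * listeners. Returns 0 on success, or -1 on invalid arguments, if the
 * listener would get no cpus, or if the buffer is too small.
 */
static int listener_cpumask(char *buf, size_t buflen,
			    unsigned int ncpus, unsigned int nlisteners,
			    unsigned int idx)
{
	unsigned int base, extra, first, count;
	int n;

	if (!nlisteners || idx >= nlisteners)
		return -1;

	base = ncpus / nlisteners;
	extra = ncpus % nlisteners;	/* first 'extra' listeners get one more */
	count = base + (idx < extra ? 1 : 0);
	first = idx * base + (idx < extra ? idx : extra);
	if (!count)
		return -1;

	if (count == 1)
		n = snprintf(buf, buflen, "%u", first);
	else
		n = snprintf(buf, buflen, "%u-%u", first, first + count - 1);

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

Each listener would then pass its own string with the register command. On
systems with sparse or hot-pluggable cpu numbering the ranges would have to
be built from the actual set of cpus present.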
176
177Despite these measures, if userspace receives ENOBUFS errors
178indicating overflow of receive buffers, it should take measures to handle the
179loss of data.
180
181----
diff --git a/Documentation/arm/IXP4xx b/Documentation/arm/IXP4xx
index d4c6d3aa0c25..43edb4ecf27d 100644
--- a/Documentation/arm/IXP4xx
+++ b/Documentation/arm/IXP4xx
@@ -85,7 +85,7 @@ IXP4xx provides two methods of accessing PCI memory space:
852) If > 64MB of memory space is required, the IXP4xx can be 852) If > 64MB of memory space is required, the IXP4xx can be
86 configured to use indirect registers to access PCI This allows 86 configured to use indirect registers to access PCI This allows
87 for up to 128MB (0x48000000 to 0x4fffffff) of memory on the bus. 87 for up to 128MB (0x48000000 to 0x4fffffff) of memory on the bus.
88 The disadvantadge of this is that every PCI access requires 88 The disadvantage of this is that every PCI access requires
89 three local register accesses plus a spinlock, but in some 89 three local register accesses plus a spinlock, but in some
90 cases the performance hit is acceptable. In addition, you cannot 90 cases the performance hit is acceptable. In addition, you cannot
91 mmap() PCI devices in this case due to the indirect nature 91 mmap() PCI devices in this case due to the indirect nature
diff --git a/Documentation/arm/Samsung-S3C24XX/Overview.txt b/Documentation/arm/Samsung-S3C24XX/Overview.txt
index 8c6ee684174c..3e46d2a31158 100644
--- a/Documentation/arm/Samsung-S3C24XX/Overview.txt
+++ b/Documentation/arm/Samsung-S3C24XX/Overview.txt
@@ -7,11 +7,13 @@ Introduction
7------------ 7------------
8 8
9 The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported 9 The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported
10 by the 's3c2410' architecture of ARM Linux. Currently the S3C2410 and 10 by the 's3c2410' architecture of ARM Linux. Currently the S3C2410,
11 the S3C2440 are supported CPUs. 11 S3C2440 and S3C2442 devices are supported.
12 12
13 Support for the S3C2400 series is in progress. 13 Support for the S3C2400 series is in progress.
14 14
15 Support for the S3C2412 and S3C2413 CPUs is being merged.
16
15 17
16Configuration 18Configuration
17------------- 19-------------
@@ -43,9 +45,18 @@ Machines
43 45
44 Samsung's own development board, geared for PDA work. 46 Samsung's own development board, geared for PDA work.
45 47
48 Samsung/Aiji SMDK2412
49
50 The S3C2412 version of the SMDK2440.
51
52 Samsung/Aiji SMDK2413
53
54 The S3C2412 version of the SMDK2440.
55
46 Samsung/Meritech SMDK2440 56 Samsung/Meritech SMDK2440
47 57
48 The S3C2440 compatible version of the SMDK2440 58 The S3C2440 compatible version of the SMDK2440, which has the
59 option of an S3C2440 or S3C2442 CPU module.
49 60
50 Thorcom VR1000 61 Thorcom VR1000
51 62
@@ -211,24 +222,6 @@ Port Contributors
211 Lucas Correia Villa Real (S3C2400 port) 222 Lucas Correia Villa Real (S3C2400 port)
212 223
213 224
214Document Changes
215----------------
216
217 05 Sep 2004 - BJD - Added Document Changes section
218 05 Sep 2004 - BJD - Added Klaus Fetscher to list of contributors
219 25 Oct 2004 - BJD - Added Dimitry Andric to list of contributors
220 25 Oct 2004 - BJD - Updated the MTD from the 2.6.9 merge
221 21 Jan 2005 - BJD - Added rx3715, added Shannon to contributors
222 10 Feb 2005 - BJD - Added Guillaume Gourat to contributors
223 02 Mar 2005 - BJD - Added SMDK2440 to list of machines
224 06 Mar 2005 - BJD - Added Christer Weinigel
225 08 Mar 2005 - BJD - Added LCVR to list of people, updated introduction
226 08 Mar 2005 - BJD - Added section on adding machines
227 09 Sep 2005 - BJD - Added section on platform data
228 11 Feb 2006 - BJD - Added I2C, RTC and Watchdog sections
229 11 Feb 2006 - BJD - Added Osiris machine, and S3C2400 information
230
231
232Document Author 225Document Author
233--------------- 226---------------
234 227
diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2412.txt b/Documentation/arm/Samsung-S3C24XX/S3C2412.txt
new file mode 100644
index 000000000000..cb82a7fc7901
--- /dev/null
+++ b/Documentation/arm/Samsung-S3C24XX/S3C2412.txt
@@ -0,0 +1,120 @@
1 S3C2412 ARM Linux Overview
2 ==========================
3
4Introduction
5------------
6
7 The S3C2412 is part of the S3C24XX range of ARM9 System-on-Chip CPUs
8 from Samsung. This part has an ARM926EJ-S core, capable of running up
9 to 266MHz (see data-sheet for more information).
10
11
12Clock
13-----
14
15 The core clock code provides a set of clocks to the drivers, and allows
16 for source selection and a number of other features.
17
18
19Power
20-----
21
22 No support for suspend/resume to RAM in the current system.
23
24
25DMA
26---
27
28 No current support for DMA.
29
30
31GPIO
32----
33
34 There is support for setting the GPIO to input/output/special function
35 and reading or writing to them.
36
37
38UART
39----
40
41 The UART hardware is similar to the S3C2440, and is supported by the
42 s3c2410 driver in the drivers/serial directory.
43
44
45NAND
46----
47
48 The NAND hardware is similar to the S3C2440, and is supported by the
49 s3c2410 driver in the drivers/mtd/nand directory.
50
51
52USB Host
53--------
54
55 The USB hardware is similar to the S3C2410, with extended clock source
56 control. The OHCI portion is supported by the ohci-s3c2410 driver, and
57 the clock control selection is supported by the core clock code.
58
59
60USB Device
61----------
62
63 No current support in the kernel.
64
65
66IRQs
67----
68
69 All the standard and external interrupt sources are supported. The
70 extra sub-sources are not yet supported.
71
72
73RTC
74---
75
76 The RTC hardware is similar to the S3C2410, and is supported by the
77 s3c2410-rtc driver.
78
79
80Watchdog
81--------
82
83 The watchdog hardware is the same as the S3C2410, and is supported by
84 the s3c2410_wdt driver.
85
86
87MMC/SD/SDIO
88-----------
89
90 No current support for the MMC/SD/SDIO block.
91
92IIC
93---
94
95 The IIC hardware is the same as the S3C2410, and is supported by the
96 i2c-s3c24xx driver.
97
98
99IIS
100---
101
102 No current support for the IIS interface.
103
104
105SPI
106---
107
108 No current support for the SPI interfaces.
109
110
111ATA
112---
113
114 No current support for the on-board ATA block.
115
116
117Document Author
118---------------
119
120Ben Dooks, (c) 2006 Simtec Electronics
diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2413.txt b/Documentation/arm/Samsung-S3C24XX/S3C2413.txt
new file mode 100644
index 000000000000..ab2a88858f12
--- /dev/null
+++ b/Documentation/arm/Samsung-S3C24XX/S3C2413.txt
@@ -0,0 +1,21 @@
1 S3C2413 ARM Linux Overview
2 ==========================
3
4Introduction
5------------
6
7 The S3C2413 is an extended version of the S3C2412, with a camera
8 interface and mobile DDR memory support. See the S3C2412 support
9 documentation for more information.
10
11
12Camera Interface
13----------------
14
15 This block is currently not supported.
16
17
18Document Author
19---------------
20
21Ben Dooks, (c) 2006 Simtec Electronics
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt
index 23a1c2402bcc..2a63d5662a93 100644
--- a/Documentation/atomic_ops.txt
+++ b/Documentation/atomic_ops.txt
@@ -157,13 +157,13 @@ For example, smp_mb__before_atomic_dec() can be used like so:
157 smp_mb__before_atomic_dec(); 157 smp_mb__before_atomic_dec();
158 atomic_dec(&obj->ref_count); 158 atomic_dec(&obj->ref_count);
159 159
160It makes sure that all memory operations preceeding the atomic_dec() 160It makes sure that all memory operations preceding the atomic_dec()
161call are strongly ordered with respect to the atomic counter 161call are strongly ordered with respect to the atomic counter
162operation. In the above example, it guarentees that the assignment of 162operation. In the above example, it guarantees that the assignment of
163"1" to obj->dead will be globally visible to other cpus before the 163"1" to obj->dead will be globally visible to other cpus before the
164atomic counter decrement. 164atomic counter decrement.
165 165
166Without the explicitl smp_mb__before_atomic_dec() call, the 166Without the explicit smp_mb__before_atomic_dec() call, the
167implementation could legally allow the atomic counter update visible 167implementation could legally allow the atomic counter update visible
168to other cpus before the "obj->dead = 1;" assignment. 168to other cpus before the "obj->dead = 1;" assignment.
169 169
@@ -173,11 +173,11 @@ ordering with respect to memory operations after an atomic_dec() call
173(smp_mb__{before,after}_atomic_inc()). 173(smp_mb__{before,after}_atomic_inc()).
174 174
175A missing memory barrier in the cases where they are required by the 175A missing memory barrier in the cases where they are required by the
176atomic_t implementation above can have disasterous results. Here is 176atomic_t implementation above can have disastrous results. Here is
177an example, which follows a pattern occuring frequently in the Linux 177an example, which follows a pattern occurring frequently in the Linux
178kernel. It is the use of atomic counters to implement reference 178kernel. It is the use of atomic counters to implement reference
179counting, and it works such that once the counter falls to zero it can 179counting, and it works such that once the counter falls to zero it can
180be guarenteed that no other entity can be accessing the object: 180be guaranteed that no other entity can be accessing the object:
181 181
182static void obj_list_add(struct obj *obj) 182static void obj_list_add(struct obj *obj)
183{ 183{
@@ -291,9 +291,9 @@ to the size of an "unsigned long" C data type, and are least of that
291size. The endianness of the bits within each "unsigned long" are the 291size. The endianness of the bits within each "unsigned long" are the
292native endianness of the cpu. 292native endianness of the cpu.
293 293
294 void set_bit(unsigned long nr, volatils unsigned long *addr); 294 void set_bit(unsigned long nr, volatile unsigned long *addr);
295 void clear_bit(unsigned long nr, volatils unsigned long *addr); 295 void clear_bit(unsigned long nr, volatile unsigned long *addr);
296 void change_bit(unsigned long nr, volatils unsigned long *addr); 296 void change_bit(unsigned long nr, volatile unsigned long *addr);
297 297
298These routines set, clear, and change, respectively, the bit number 298These routines set, clear, and change, respectively, the bit number
299indicated by "nr" on the bit mask pointed to by "ADDR". 299indicated by "nr" on the bit mask pointed to by "ADDR".
@@ -301,9 +301,9 @@ indicated by "nr" on the bit mask pointed to by "ADDR".
301They must execute atomically, yet there are no implicit memory barrier 301They must execute atomically, yet there are no implicit memory barrier
302semantics required of these interfaces. 302semantics required of these interfaces.
303 303
304 int test_and_set_bit(unsigned long nr, volatils unsigned long *addr); 304 int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
305 int test_and_clear_bit(unsigned long nr, volatils unsigned long *addr); 305 int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
306 int test_and_change_bit(unsigned long nr, volatils unsigned long *addr); 306 int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);
307 307
308Like the above, except that these routines return a boolean which 308Like the above, except that these routines return a boolean which
309indicates whether the changed bit was set _BEFORE_ the atomic bit 309indicates whether the changed bit was set _BEFORE_ the atomic bit
@@ -335,7 +335,7 @@ subsequent memory operation is made visible. For example:
335 /* ... */; 335 /* ... */;
336 obj->killed = 1; 336 obj->killed = 1;
337 337
338The implementation of test_and_set_bit() must guarentee that 338The implementation of test_and_set_bit() must guarantee that
339"obj->dead = 1;" is visible to cpus before the atomic memory operation 339"obj->dead = 1;" is visible to cpus before the atomic memory operation
340done by test_and_set_bit() becomes visible. Likewise, the atomic 340done by test_and_set_bit() becomes visible. Likewise, the atomic
341memory operation done by test_and_set_bit() must become visible before 341memory operation done by test_and_set_bit() must become visible before
@@ -474,7 +474,7 @@ Now, as far as memory barriers go, as long as spin_lock()
474strictly orders all subsequent memory operations (including 474strictly orders all subsequent memory operations (including
475the cas()) with respect to itself, things will be fine. 475the cas()) with respect to itself, things will be fine.
476 476
477Said another way, _atomic_dec_and_lock() must guarentee that 477Said another way, _atomic_dec_and_lock() must guarantee that
478a counter dropping to zero is never made visible before the 478a counter dropping to zero is never made visible before the
479spinlock being acquired. 479spinlock being acquired.
480 480
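As a userspace illustration of the test_and_set_bit() and
test_and_clear_bit() semantics described in the atomic_ops.txt hunks above
(report whether the bit was set _BEFORE_ the operation, with ordering on both
sides), here is a minimal sketch using C11 <stdatomic.h>. It is an analogy
only and not how any kernel architecture implements these routines; the names
demo_test_and_set_bit(), demo_test_and_clear_bit() and DEMO_BITS_PER_LONG are
hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names: a userspace analogy, not kernel code. */
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Atomically set bit "nr" in the bitmap at "addr" and return whether it
 * was already set. atomic_fetch_or() uses memory_order_seq_cst by default,
 * so the update is ordered against earlier and later memory accesses.
 */
static int demo_test_and_set_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
	unsigned long old;

	assert(addr != NULL);
	old = atomic_fetch_or(&addr[nr / DEMO_BITS_PER_LONG], mask);
	return (old & mask) != 0;
}

/* Same idea for clearing: fetch-and with the inverted mask. */
static int demo_test_and_clear_bit(unsigned long nr, _Atomic unsigned long *addr)
{
	unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
	unsigned long old;

	assert(addr != NULL);
	old = atomic_fetch_and(&addr[nr / DEMO_BITS_PER_LONG], ~mask);
	return (old & mask) != 0;
}
```

Calling demo_test_and_set_bit() twice on the same bit returns 0 the first
time and 1 the second, which matches the "was set before" return convention
described above.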
diff --git a/Documentation/cciss.txt b/Documentation/cciss.txt
index 15378422fc46..9c629ffa0e58 100644
--- a/Documentation/cciss.txt
+++ b/Documentation/cciss.txt
@@ -20,6 +20,7 @@ This driver is known to work with the following cards:
20 * SA P400i 20 * SA P400i
21 * SA E200 21 * SA E200
22 * SA E200i 22 * SA E200i
23 * SA E500
23 24
24If nodes are not already created in the /dev/cciss directory, run as root: 25If nodes are not already created in the /dev/cciss directory, run as root:
25 26
diff --git a/Documentation/connector/ucon.c b/Documentation/connector/ucon.c
new file mode 100644
index 000000000000..d738cde2a8d5
--- /dev/null
+++ b/Documentation/connector/ucon.c
@@ -0,0 +1,206 @@
1/*
2 * ucon.c
3 *
4 * Copyright (c) 2004+ Evgeniy Polyakov <johnpol@2ka.mipt.ru>
5 *
6 *
7 * This program is free software; you can redistribute it and/or modify
8 * it under the terms of the GNU General Public License as published by
9 * the Free Software Foundation; either version 2 of the License, or
10 * (at your option) any later version.
11 *
12 * This program is distributed in the hope that it will be useful,
13 * but WITHOUT ANY WARRANTY; without even the implied warranty of
14 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
15 * GNU General Public License for more details.
16 *
17 * You should have received a copy of the GNU General Public License
18 * along with this program; if not, write to the Free Software
19 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
20 */
21
22#include <asm/types.h>
23
24#include <sys/types.h>
25#include <sys/socket.h>
26#include <sys/poll.h>
27
28#include <linux/netlink.h>
29#include <linux/rtnetlink.h>
30
31#include <arpa/inet.h>
32
33#include <stdio.h>
34#include <stdlib.h>
35#include <unistd.h>
36#include <string.h>
37#include <errno.h>
38#include <time.h>
39
40#include <linux/connector.h>
41
42#define DEBUG
43#define NETLINK_CONNECTOR 11
44
45#ifdef DEBUG
46#define ulog(f, a...) fprintf(stdout, f, ##a)
47#else
48#define ulog(f, a...) do {} while (0)
49#endif
50
51static int need_exit;
52static __u32 seq;
53
54static int netlink_send(int s, struct cn_msg *msg)
55{
56 struct nlmsghdr *nlh;
57 unsigned int size;
58 int err;
59 char buf[128];
60 struct cn_msg *m;
61
62 size = NLMSG_SPACE(sizeof(struct cn_msg) + msg->len);
63
64 nlh = (struct nlmsghdr *)buf;
65 nlh->nlmsg_seq = seq++;
66 nlh->nlmsg_pid = getpid();
67 nlh->nlmsg_type = NLMSG_DONE;
68 nlh->nlmsg_len = NLMSG_LENGTH(size - sizeof(*nlh));
69 nlh->nlmsg_flags = 0;
70
71 m = NLMSG_DATA(nlh);
72#if 0
73 ulog("%s: [%08x.%08x] len=%u, seq=%u, ack=%u.\n",
74 __func__, msg->id.idx, msg->id.val, msg->len, msg->seq, msg->ack);
75#endif
76 memcpy(m, msg, sizeof(*m) + msg->len);
77
78 err = send(s, nlh, size, 0);
79 if (err == -1)
80 ulog("Failed to send: %s [%d].\n",
81 strerror(errno), errno);
82
83 return err;
84}
85
86int main(int argc, char *argv[])
87{
88 int s;
89 char buf[1024];
90 int len;
91 struct nlmsghdr *reply;
92 struct sockaddr_nl l_local;
93 struct cn_msg *data;
94 FILE *out;
95 time_t tm;
96 struct pollfd pfd;
97
98 if (argc < 2)
99 out = stdout;
100 else {
101 out = fopen(argv[1], "a+");
102 if (!out) {
103 ulog("Unable to open %s for writing: %s\n",
104 argv[1], strerror(errno));
105 out = stdout;
106 }
107 }
108
109 memset(buf, 0, sizeof(buf));
110
111 s = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
112 if (s == -1) {
113 perror("socket");
114 return -1;
115 }
116
117 l_local.nl_family = AF_NETLINK;
118 l_local.nl_groups = 0x123; /* bitmask of requested groups */
119 l_local.nl_pid = 0;
120
121 if (bind(s, (struct sockaddr *)&l_local, sizeof(struct sockaddr_nl)) == -1) {
122 perror("bind");
123 close(s);
124 return -1;
125 }
126
127#if 0
128 {
129 int on = 0x57; /* Additional group number */
130 setsockopt(s, SOL_NETLINK, NETLINK_ADD_MEMBERSHIP, &on, sizeof(on));
131 }
132#endif
133 if (0) {
134 int i, j;
135
136 memset(buf, 0, sizeof(buf));
137
138 data = (struct cn_msg *)buf;
139
140 data->id.idx = 0x123;
141 data->id.val = 0x456;
142 data->seq = seq++;
143 data->ack = 0;
144 data->len = 0;
145
146 for (j=0; j<10; ++j) {
147 for (i=0; i<1000; ++i) {
148 len = netlink_send(s, data);
149 }
150
151 ulog("%d messages have been sent to %08x.%08x.\n", i, data->id.idx, data->id.val);
152 }
153
154 return 0;
155 }
156
157
158 pfd.fd = s;
159
160 while (!need_exit) {
161 pfd.events = POLLIN;
162 pfd.revents = 0;
163 switch (poll(&pfd, 1, -1)) {
164 case 0:
165 need_exit = 1;
166 break;
167 case -1:
168 if (errno != EINTR) {
169 need_exit = 1;
170 break;
171 }
172 continue;
173 }
174 if (need_exit)
175 break;
176
177 memset(buf, 0, sizeof(buf));
178 len = recv(s, buf, sizeof(buf), 0);
179 if (len == -1) {
180 perror("recv buf");
181 close(s);
182 return -1;
183 }
184 reply = (struct nlmsghdr *)buf;
185
186 switch (reply->nlmsg_type) {
187 case NLMSG_ERROR:
188 fprintf(out, "Error message received.\n");
189 fflush(out);
190 break;
191 case NLMSG_DONE:
192 data = (struct cn_msg *)NLMSG_DATA(reply);
193
194 time(&tm);
195 fprintf(out, "%.24s : [%x.%x] [%08u.%08u].\n",
196 ctime(&tm), data->id.idx, data->id.val, data->seq, data->ack);
197 fflush(out);
198 break;
199 default:
200 break;
201 }
202 }
203
204 close(s);
205 return 0;
206}
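Note that netlink_send() in ucon.c above copies the message into a fixed
128-byte buffer without checking that it fits. A caller could guard against
that with a size check along the following lines. This is a sketch under
stated assumptions: the constants mirror the 4-byte NLMSG_ALIGNTO and the
16-byte struct nlmsghdr from <linux/netlink.h>, the connector header length
is passed in by the caller rather than hard-coded, and the names
demo_nlmsg_space() and demo_msg_fits() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match NLMSG_ALIGNTO and the aligned sizeof(struct nlmsghdr). */
#define DEMO_NLMSG_ALIGNTO 4U
#define DEMO_NLMSG_HDRLEN  16U

/* Total bytes for a netlink header plus payload, rounded up to 4 bytes. */
static size_t demo_nlmsg_space(size_t payload)
{
	size_t len = DEMO_NLMSG_HDRLEN + payload;

	return (len + DEMO_NLMSG_ALIGNTO - 1) &
	       ~(size_t)(DEMO_NLMSG_ALIGNTO - 1);
}

/*
 * Return 1 if a connector message with a header of cn_hdr_len bytes and
 * data_len bytes of data fits into a buffer of buf_size bytes, else 0.
 */
static int demo_msg_fits(size_t buf_size, size_t cn_hdr_len, size_t data_len)
{
	assert(buf_size > 0);
	return demo_nlmsg_space(cn_hdr_len + data_len) <= buf_size;
}
```

In netlink_send() the equivalent test would compare the computed size
against sizeof(buf) before the memcpy() and return an error when the message
is too large.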
diff --git a/Documentation/console/console.txt b/Documentation/console/console.txt
new file mode 100644
index 000000000000..d3e17447321c
--- /dev/null
+++ b/Documentation/console/console.txt
@@ -0,0 +1,144 @@
1Console Drivers
2===============
3
4The linux kernel has 2 general types of console drivers. The first type is
5assigned by the kernel to all the virtual consoles during the boot process.
6This type will be called 'system driver', and only one system driver is allowed
7to exist. The system driver is persistent and it can never be unloaded, though
8it may become inactive.
9
10The second type has to be explicitly loaded and unloaded. This will be called
11'modular driver' by this document. Multiple modular drivers can coexist at
12any time with each driver sharing the console with other drivers including
13the system driver. However, modular drivers cannot take over the console
14that is currently occupied by another modular driver. (Exception: Drivers that
15call take_over_console() will succeed in the takeover regardless of the type
16of driver occupying the consoles.) They can only take over the console that is
17occupied by the system driver. By the same token, if the modular driver is
18released by the console, the system driver will take over.
19
20Modular drivers, from the programmer's point of view, have to call:
21
22 take_over_console() - load and bind driver to console layer
23 give_up_console() - unbind and unload driver
24
25In newer kernels, the following are also available:
26
27 register_con_driver()
28 unregister_con_driver()
29
30If sysfs is enabled, the contents of /sys/class/vtconsole can be
31examined. This shows the console backends currently registered by the
32system which are named vtcon<n> where <n> is an integer from 0 to 15. Thus:
33
34 ls /sys/class/vtconsole
35 . .. vtcon0 vtcon1
36
37Each directory in /sys/class/vtconsole has 3 files:
38
39 ls /sys/class/vtconsole/vtcon0
40 . .. bind name uevent
41
42What do these files signify?
43
44 1. bind - this is a read/write file. It shows the status of the driver if
45 read, or acts to bind or unbind the driver to the virtual consoles
46 when written to. The possible values are:
47
48 0 - means the driver is not bound and if echo'ed, commands the driver
49 to unbind
50
51 1 - means the driver is bound and if echo'ed, commands the driver to
52 bind
53
54 2. name - read-only file. Shows the name of the driver in this format:
55
56 cat /sys/class/vtconsole/vtcon0/name
57 (S) VGA+
58
59 '(S)' stands for a (S)ystem driver, i.e., it cannot be directly
60 commanded to bind or unbind
61
62 'VGA+' is the name of the driver
63
64 cat /sys/class/vtconsole/vtcon1/name
65 (M) frame buffer device
66
67 In this case, '(M)' stands for a (M)odular driver, one that can be
68 directly commanded to bind or unbind.
69
70 3. uevent - ignore this file
71
72When unbinding, the modular driver is detached first, and then the system
73driver takes over the consoles vacated by the driver. Binding, on the other
74hand, will bind the driver to the consoles that are currently occupied by a
75system driver.
76
77NOTE1: Binding and unbinding must be selected in Kconfig. It's under:
78
79Device Drivers -> Character devices -> Support for binding and unbinding
80console drivers
81
82NOTE2: If any of the virtual consoles are in KD_GRAPHICS mode, then binding or
83unbinding will not succeed. An example of an application that sets the console
84to KD_GRAPHICS is X.
85
86How useful is this feature? This is very useful for console driver
87developers. By unbinding the driver from the console layer, one can unload the
88driver, make changes, recompile, reload and rebind the driver without any need
89for rebooting the kernel. For regular users who may want to switch from
90framebuffer console to VGA console and vice versa, this feature also makes
91this possible. (NOTE NOTE NOTE: Please read fbcon.txt under Documentation/fb
92for more details).
93
94Notes for developers:
95=====================
96
97take_over_console() is now broken up into:
98
99 register_con_driver()
100 bind_con_driver() - private function
101
102give_up_console() is a wrapper to unregister_con_driver(), and a driver must
103be fully unbound for this call to succeed. con_is_bound() will check if the
104driver is bound or not.
105
106Guidelines for console driver writers:
107=====================================
108
109In order for binding to and unbinding from the console to properly work,
110console drivers must follow these guidelines:
111
1121. All drivers, except system drivers, must call either register_con_driver()
113 or take_over_console(). register_con_driver() will just add the driver to
114 the console's internal list. It won't take over the
115 console. take_over_console(), as its name implies, will also take over (or
116 bind to) the console.
117
1182. All resources allocated during con->con_init() must be released in
119 con->con_deinit().
120
1213. All resources allocated in con->con_startup() must be released when the
122 driver, which was previously bound, becomes unbound. The console layer
123 does not have a complementary call to con->con_startup() so it's up to the
124 driver to check when it's legal to release these resources. Calling
125 con_is_bound() in con->con_deinit() will help. If the call returns
126 false, then it's safe to release the resources. This balance has to be
127 ensured because con->con_startup() can be called again when a request to
128 rebind the driver to the console arrives.
129
1304. Upon exit of the driver, ensure that the driver is totally unbound. If the
131 condition is satisfied, then the driver must call unregister_con_driver()
132 or give_up_console().
133
1345. unregister_con_driver() can also be called on conditions which make it
135 impossible for the driver to service console requests. This can happen
136 with the framebuffer console that suddenly lost all of its drivers.
137
138The current crop of console drivers should still work correctly, but binding
139and unbinding them may cause problems. With minimal fixes, these drivers can
140be made to work correctly.
141
142==========================
143Antonino Daplas <adaplas@pol.net>
144
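The 'name' file format described in console.txt above, a '(S)' or '(M)' tag
followed by the driver name, is easy to split in a userspace tool. The
following is a minimal sketch, assuming the line has already been read from
/sys/class/vtconsole/vtcon<n>/name; the helper name demo_parse_vtcon_name()
is hypothetical and not part of any kernel or libc interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a line such as "(S) VGA+" into the driver
 * type ('S' for system, 'M' for modular) and the driver name.
 * Returns 0 on success, -1 if the line does not match the format.
 */
static int demo_parse_vtcon_name(const char *line, char *type,
				 char *name, size_t name_len)
{
	size_t n;

	assert(line != NULL && type != NULL && name != NULL);
	if (name_len == 0)
		return -1;

	/* Expect "(X) " where X is S or M; checks short-circuit on '\0'. */
	if (line[0] != '(' || (line[1] != 'S' && line[1] != 'M') ||
	    line[2] != ')' || line[3] != ' ')
		return -1;
	*type = line[1];

	/* Copy the rest, dropping the trailing newline a sysfs read adds. */
	n = strcspn(line + 4, "\n");
	if (n >= name_len)
		n = name_len - 1;	/* truncate to fit the caller's buffer */
	memcpy(name, line + 4, n);
	name[n] = '\0';
	return 0;
}
```

A tool could use the returned type to decide whether writing to the 'bind'
file makes sense, since only '(M)' drivers can be directly commanded to bind
or unbind.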
diff --git a/Documentation/cpu-freq/user-guide.txt b/Documentation/cpu-freq/user-guide.txt
index 7fedc00c3d30..555c8cf3650a 100644
--- a/Documentation/cpu-freq/user-guide.txt
+++ b/Documentation/cpu-freq/user-guide.txt
@@ -153,10 +153,13 @@ scaling_governor, and by "echoing" the name of another
153 that some governors won't load - they only 153 that some governors won't load - they only
154 work on some specific architectures or 154 work on some specific architectures or
155 processors. 155 processors.
156scaling_min_freq and 156scaling_min_freq and
157scaling_max_freq show the current "policy limits" (in 157scaling_max_freq show the current "policy limits" (in
158 kHz). By echoing new values into these 158 kHz). By echoing new values into these
159 files, you can change these limits. 159 files, you can change these limits.
160 NOTE: when setting a policy you need to
161 first set scaling_max_freq, then
162 scaling_min_freq.
160 163
161 164
162If you have selected the "userspace" governor which allows you to 165If you have selected the "userspace" governor which allows you to
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 1bcf69996c9d..bc107cb157a8 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -251,16 +251,24 @@ A: This is what you would need in your kernel code to receive notifications.
251 return NOTIFY_OK; 251 return NOTIFY_OK;
252 } 252 }
253 253
254 static struct notifier_block foobar_cpu_notifier = 254 static struct notifier_block __cpuinitdata foobar_cpu_notifier =
255 { 255 {
256 .notifier_call = foobar_cpu_callback, 256 .notifier_call = foobar_cpu_callback,
257 }; 257 };
258 258
259You need to call register_cpu_notifier() from your init function.
260Init functions could be of two types:
2611. early init (init function called when only the boot processor is online).
2622. late init (init function called _after_ all the CPUs are online).
259 263
260In your init function, 264For the first case, you should add the following to your init function
261 265
262 register_cpu_notifier(&foobar_cpu_notifier); 266 register_cpu_notifier(&foobar_cpu_notifier);
263 267
268For the second case, you should add the following to your init function
269
270 register_hotcpu_notifier(&foobar_cpu_notifier);
271
264You can fail PREPARE notifiers if something doesn't work to prepare resources. 272You can fail PREPARE notifiers if something doesn't work to prepare resources.
265This will stop the activity and send a following CANCELED event back. 273This will stop the activity and send a following CANCELED event back.
266 274
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index 159e2a0c3e80..842f0d1ab216 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -217,6 +217,12 @@ exclusive cpuset. Also, the use of a Linux virtual file system (vfs)
217to represent the cpuset hierarchy provides for a familiar permission 217to represent the cpuset hierarchy provides for a familiar permission
218and name space for cpusets, with a minimum of additional kernel code. 218and name space for cpusets, with a minimum of additional kernel code.
219 219
220The cpus and mems files in the root (top_cpuset) cpuset are
221read-only. The cpus file automatically tracks the value of
222cpu_online_map using a CPU hotplug notifier, and the mems file
223automatically tracks the value of node_online_map using the
224cpuset_track_online_nodes() hook.
225
220 226
2211.4 What are exclusive cpusets ? 2271.4 What are exclusive cpusets ?
222-------------------------------- 228--------------------------------
diff --git a/Documentation/crypto/api-intro.txt b/Documentation/crypto/api-intro.txt
index 74dffc68ff9f..5a03a2801d67 100644
--- a/Documentation/crypto/api-intro.txt
+++ b/Documentation/crypto/api-intro.txt
@@ -19,15 +19,14 @@ At the lowest level are algorithms, which register dynamically with the
19API. 19API.
20 20
21'Transforms' are user-instantiated objects, which maintain state, handle all 21'Transforms' are user-instantiated objects, which maintain state, handle all
22of the implementation logic (e.g. manipulating page vectors), provide an 22of the implementation logic (e.g. manipulating page vectors) and provide an
23abstraction to the underlying algorithms, and handle common logical 23abstraction to the underlying algorithms. However, at the user
24operations (e.g. cipher modes, HMAC for digests). However, at the user
25level they are very simple. 24level they are very simple.
26 25
27Conceptually, the API layering looks like this: 26Conceptually, the API layering looks like this:
28 27
29 [transform api] (user interface) 28 [transform api] (user interface)
30 [transform ops] (per-type logic glue e.g. cipher.c, digest.c) 29 [transform ops] (per-type logic glue e.g. cipher.c, compress.c)
31 [algorithm api] (for registering algorithms) 30 [algorithm api] (for registering algorithms)
32 31
33The idea is to make the user interface and algorithm registration API 32The idea is to make the user interface and algorithm registration API
@@ -44,22 +43,27 @@ under development.
44Here's an example of how to use the API: 43Here's an example of how to use the API:
45 44
46 #include <linux/crypto.h> 45 #include <linux/crypto.h>
46 #include <linux/err.h>
47 #include <linux/scatterlist.h>
47 48
48 struct scatterlist sg[2]; 49 struct scatterlist sg[2];
49 char result[128]; 50 char result[128];
50 struct crypto_tfm *tfm; 51 struct crypto_hash *tfm;
52 struct hash_desc desc;
51 53
52 tfm = crypto_alloc_tfm("md5", 0); 54 tfm = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC);
53 if (tfm == NULL) 55 if (IS_ERR(tfm))
54 fail(); 56 fail();
55 57
56 /* ... set up the scatterlists ... */ 58 /* ... set up the scatterlists ... */
59
60 desc.tfm = tfm;
61 desc.flags = 0;
57 62
58 crypto_digest_init(tfm); 63 if (crypto_hash_digest(&desc, &sg, 2, result))
59 crypto_digest_update(tfm, &sg, 2); 64 fail();
60 crypto_digest_final(tfm, result);
61 65
62 crypto_free_tfm(tfm); 66 crypto_free_hash(tfm);
63 67
64 68
65Many real examples are available in the regression test module (tcrypt.c). 69Many real examples are available in the regression test module (tcrypt.c).
@@ -126,7 +130,7 @@ might already be working on.
126BUGS 130BUGS
127 131
128Send bug reports to: 132Send bug reports to:
129James Morris <jmorris@redhat.com> 133Herbert Xu <herbert@gondor.apana.org.au>
130Cc: David S. Miller <davem@redhat.com> 134Cc: David S. Miller <davem@redhat.com>
131 135
132 136
@@ -134,13 +138,14 @@ FURTHER INFORMATION
134 138
135For further patches and various updates, including the current TODO 139For further patches and various updates, including the current TODO
136list, see: 140list, see:
137http://samba.org/~jamesm/crypto/ 141http://gondor.apana.org.au/~herbert/crypto/
138 142
139 143
140AUTHORS 144AUTHORS
141 145
142James Morris 146James Morris
143David S. Miller 147David S. Miller
148Herbert Xu
144 149
145 150
146CREDITS 151CREDITS
@@ -238,8 +243,11 @@ Anubis algorithm contributors:
238Tiger algorithm contributors: 243Tiger algorithm contributors:
239 Aaron Grothe 244 Aaron Grothe
240 245
246VIA PadLock contributors:
247 Michal Ludvig
248
241Generic scatterwalk code by Adam J. Richter <adam@yggdrasil.com> 249Generic scatterwalk code by Adam J. Richter <adam@yggdrasil.com>
242 250
243Please send any credits updates or corrections to: 251Please send any credits updates or corrections to:
244James Morris <jmorris@redhat.com> 252Herbert Xu <herbert@gondor.apana.org.au>
245 253
diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index b2f593fc76ca..addc67b1d770 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -3,7 +3,7 @@
3 3
4 Maintained by Torben Mathiasen <device@lanana.org> 4 Maintained by Torben Mathiasen <device@lanana.org>
5 5
6 Last revised: 01 March 2006 6 Last revised: 15 May 2006
7 7
8This list is the Linux Device List, the official registry of allocated 8This list is the Linux Device List, the official registry of allocated
9device numbers and /dev directory nodes for the Linux operating 9device numbers and /dev directory nodes for the Linux operating
@@ -2543,6 +2543,9 @@ Your cooperation is appreciated.
2543 64 = /dev/usb/rio500 Diamond Rio 500 2543 64 = /dev/usb/rio500 Diamond Rio 500
2544 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de) 2544 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de)
2545 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD) 2545 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD)
2546 67 = /dev/usb/adutux0 1st Ontrak ADU device
2547 ...
2548 76 = /dev/usb/adutux10 10th Ontrak ADU device
2546 96 = /dev/usb/hiddev0 1st USB HID device 2549 96 = /dev/usb/hiddev0 1st USB HID device
2547 ... 2550 ...
2548 111 = /dev/usb/hiddev15 16th USB HID device 2551 111 = /dev/usb/hiddev15 16th USB HID device
@@ -2565,10 +2568,10 @@ Your cooperation is appreciated.
2565 243 = /dev/usb/dabusb3 Fourth dabusb device 2568 243 = /dev/usb/dabusb3 Fourth dabusb device
2566 2569
2567180 block USB block devices 2570180 block USB block devices
2568 0 = /dev/uba First USB block device 2571 0 = /dev/uba First USB block device
2569 8 = /dev/ubb Second USB block device 2572 8 = /dev/ubb Second USB block device
2570 16 = /dev/ubc Thrid USB block device 2573 16 = /dev/ubc Third USB block device
2571 ... 2574 ...
2572 2575
2573181 char Conrad Electronic parallel port radio clocks 2576181 char Conrad Electronic parallel port radio clocks
2574 0 = /dev/pcfclock0 First Conrad radio clock 2577 0 = /dev/pcfclock0 First Conrad radio clock
@@ -2791,6 +2794,7 @@ Your cooperation is appreciated.
2791 170 = /dev/ttyNX0 Hilscher netX serial port 0 2794 170 = /dev/ttyNX0 Hilscher netX serial port 0
2792 ... 2795 ...
2793 185 = /dev/ttyNX15 Hilscher netX serial port 15 2796 185 = /dev/ttyNX15 Hilscher netX serial port 15
2797 186 = /dev/ttyJ0 JTAG1 DCC protocol based serial port emulation
2794 2798
2795205 char Low-density serial ports (alternate device) 2799205 char Low-density serial ports (alternate device)
2796 0 = /dev/culu0 Callout device for ttyLU0 2800 0 = /dev/culu0 Callout device for ttyLU0
@@ -3108,6 +3112,10 @@ Your cooperation is appreciated.
3108 ... 3112 ...
3109 240 = /dev/rfdp 16th RFD FTL layer 3113 240 = /dev/rfdp 16th RFD FTL layer
3110 3114
3115257 char Phoenix Technologies Cryptographic Services Driver
3116 0 = /dev/ptlsec Crypto Services Driver
3117
3118
3111 3119
3112 **** ADDITIONAL /dev DIRECTORY ENTRIES 3120 **** ADDITIONAL /dev DIRECTORY ENTRIES
3113 3121
diff --git a/Documentation/digiepca.txt b/Documentation/digiepca.txt
index 88820fe38dad..f2560e22f2c9 100644
--- a/Documentation/digiepca.txt
+++ b/Documentation/digiepca.txt
@@ -2,7 +2,7 @@ NOTE: This driver is obsolete. Digi provides a 2.6 driver (dgdm) at
2http://www.digi.com for PCI cards. They no longer maintain this driver, 2http://www.digi.com for PCI cards. They no longer maintain this driver,
3and have no 2.6 driver for ISA cards. 3and have no 2.6 driver for ISA cards.
4 4
5This driver requires a number of user-space tools. They can be aquired from 5This driver requires a number of user-space tools. They can be acquired from
6http://www.digi.com, but only works with 2.4 kernels. 6http://www.digi.com, but only works with 2.4 kernels.
7 7
8 8
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 24adfe9af3ca..63c2d0c55aa2 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -135,6 +135,7 @@ tags
135times.h* 135times.h*
136tkparse 136tkparse
137trix_boot.h 137trix_boot.h
138utsrelease.h*
138version.h* 139version.h*
139vmlinux 140vmlinux
140vmlinux-* 141vmlinux-*
diff --git a/Documentation/driver-model/overview.txt b/Documentation/driver-model/overview.txt
index ac4a7a737e43..2050c9ffc629 100644
--- a/Documentation/driver-model/overview.txt
+++ b/Documentation/driver-model/overview.txt
@@ -18,7 +18,7 @@ Traditional driver models implemented some sort of tree-like structure
18(sometimes just a list) for the devices they control. There wasn't any 18(sometimes just a list) for the devices they control. There wasn't any
19uniformity across the different bus types. 19uniformity across the different bus types.
20 20
21The current driver model provides a comon, uniform data model for describing 21The current driver model provides a common, uniform data model for describing
22a bus and the devices that can appear under the bus. The unified bus 22a bus and the devices that can appear under the bus. The unified bus
23model includes a set of common attributes which all busses carry, and a set 23model includes a set of common attributes which all busses carry, and a set
24of common callbacks, such as device discovery during bus probing, bus 24of common callbacks, such as device discovery during bus probing, bus
diff --git a/Documentation/drivers/edac/edac.txt b/Documentation/drivers/edac/edac.txt
index 70d96a62e5e1..7b3d969d2964 100644
--- a/Documentation/drivers/edac/edac.txt
+++ b/Documentation/drivers/edac/edac.txt
@@ -35,15 +35,14 @@ the vendor should tie the parity status bits to 0 if they do not intend
35to generate parity. Some vendors do not do this, and thus the parity bit 35to generate parity. Some vendors do not do this, and thus the parity bit
36can "float" giving false positives. 36can "float" giving false positives.
37 37
38The PCI Parity EDAC device has the ability to "skip" known flaky 38[There are patches in the kernel queue which will allow for storage of
39cards during the parity scan. These are set by the parity "blacklist" 39quirks of PCI devices reporting false parity positives. The 2.6.18
40interface in the sysfs for PCI Parity. (See the PCI section in the sysfs 40kernel should have those patches included. When that becomes available,
41section below.) There is also a parity "whitelist" which is used as 41then EDAC will be patched to utilize that information to "skip" such
42an explicit list of devices to scan, while the blacklist is a list 42devices.]
43of devices to skip.
44 43
45EDAC will have future error detectors that will be added or integrated 44EDAC will have future error detectors that will be integrated with
46into EDAC in the following list: 45EDAC or added to it, in the following list:
47 46
48 MCE Machine Check Exception 47 MCE Machine Check Exception
49 MCA Machine Check Architecture 48 MCA Machine Check Architecture
@@ -93,22 +92,24 @@ EDAC lives in the /sys/devices/system/edac directory. Within this directory
93there currently reside 2 'edac' components: 92there currently reside 2 'edac' components:
94 93
95 mc memory controller(s) system 94 mc memory controller(s) system
96 pci PCI status system 95 pci PCI control and status system
97 96
98 97
99============================================================================ 98============================================================================
100Memory Controller (mc) Model 99Memory Controller (mc) Model
101 100
102First a background on the memory controller's model abstracted in EDAC. 101First a background on the memory controller's model abstracted in EDAC.
103Each mc device controls a set of DIMM memory modules. These modules are 102Each 'mc' device controls a set of DIMM memory modules. These modules are
104laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can 103laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can
105be multiple csrows and two channels. 104be multiple csrows and multiple channels.
106 105
107Memory controllers allow for several csrows, with 8 csrows being a typical value. 106Memory controllers allow for several csrows, with 8 csrows being a typical value.
108Yet, the actual number of csrows depends on the electrical "loading" 107Yet, the actual number of csrows depends on the electrical "loading"
109of a given motherboard, memory controller and DIMM characteristics. 108of a given motherboard, memory controller and DIMM characteristics.
110 109
111Dual channels allows for 128 bit data transfers to the CPU from memory. 110Dual channels allows for 128 bit data transfers to the CPU from memory.
111Some newer chipsets allow for more than 2 channels, like Fully Buffered DIMMs
112(FB-DIMMs). The following example will assume 2 channels:
112 113
113 114
114 Channel 0 Channel 1 115 Channel 0 Channel 1
@@ -234,23 +235,15 @@ Polling period control file:
234 The time period, in milliseconds, for polling for error information. 235 The time period, in milliseconds, for polling for error information.
235 Too small a value wastes resources. Too large a value might delay 236 Too small a value wastes resources. Too large a value might delay
236 necessary handling of errors and might lose valuable information for 237 necessary handling of errors and might lose valuable information for
237 locating the error. 1000 milliseconds (once each second) is about 238 locating the error. 1000 milliseconds (once each second) is the current
238 right for most uses. 239 default. Systems which require all the bandwidth they can get, may
240 increase this.
239 241
240 LOAD TIME: module/kernel parameter: poll_msec=[0|1] 242 LOAD TIME: module/kernel parameter: poll_msec=[0|1]
241 243
242 RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec 244 RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec
243 245
244 246
245Module Version read-only attribute file:
246
247 'mc_version'
248
249 The EDAC CORE module's version and compile date are shown here to
250 indicate what EDAC is running.
251
252
253
254============================================================================ 247============================================================================
255'mcX' DIRECTORIES 248'mcX' DIRECTORIES
256 249
@@ -284,35 +277,6 @@ Seconds since last counter reset control file:
284 277
285 278
286 279
287DIMM capability attribute file:
288
289 'edac_capability'
290
291 The EDAC (Error Detection and Correction) capabilities/modes of
292 the memory controller hardware.
293
294
295DIMM Current Capability attribute file:
296
297 'edac_current_capability'
298
299 The EDAC capabilities available with the hardware
300 configuration. This may not be the same as "EDAC capability"
301 if the correct memory is not used. If a memory controller is
302 capable of EDAC, but DIMMs without check bits are in use, then
303 Parity, SECDED, S4ECD4ED capabilities will not be available
304 even though the memory controller might be capable of those
305 modes with the proper memory loaded.
306
307
308Memory Type supported on this controller attribute file:
309
310 'supported_mem_type'
311
312 This attribute file displays the memory type, usually
313 buffered and unbuffered DIMMs.
314
315
316Memory Controller name attribute file: 280Memory Controller name attribute file:
317 281
318 'mc_name' 282 'mc_name'
@@ -321,16 +285,6 @@ Memory Controller name attribute file:
321 that is being utilized. 285 that is being utilized.
322 286
323 287
324Memory Controller Module name attribute file:
325
326 'module_name'
327
328 This attribute file displays the memory controller module name,
329 version and date built. The name of the memory controller
330 hardware - some drivers work with multiple controllers and
331 this field shows which hardware is present.
332
333
334Total memory managed by this memory controller attribute file: 288Total memory managed by this memory controller attribute file:
335 289
336 'size_mb' 290 'size_mb'
@@ -432,6 +386,9 @@ Memory Type attribute file:
432 386
433 This attribute file will display what type of memory is currently 387 This attribute file will display what type of memory is currently
434 on this csrow. Normally, either buffered or unbuffered memory. 388 on this csrow. Normally, either buffered or unbuffered memory.
389 Examples:
390 Registered-DDR
391 Unbuffered-DDR
435 392
436 393
437EDAC Mode of operation attribute file: 394EDAC Mode of operation attribute file:
@@ -446,8 +403,13 @@ Device type attribute file:
446 403
447 'dev_type' 404 'dev_type'
448 405
449 This attribute file will display what type of DIMM device is 406 This attribute file will display what type of DRAM device is
450 being utilized. Example: x4 407 being utilized on this DIMM.
408 Examples:
409 x1
410 x2
411 x4
412 x8
451 413
452 414
453Channel 0 CE Count attribute file: 415Channel 0 CE Count attribute file:
@@ -522,10 +484,10 @@ SYSTEM LOGGING
522If logging for UEs and CEs are enabled then system logs will have 484If logging for UEs and CEs are enabled then system logs will have
523error notices indicating errors that have been detected: 485error notices indicating errors that have been detected:
524 486
525MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, 487EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
526channel 1 "DIMM_B1": amd76x_edac 488channel 1 "DIMM_B1": amd76x_edac
527 489
528MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, 490EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
529channel 1 "DIMM_B1": amd76x_edac 491channel 1 "DIMM_B1": amd76x_edac
530 492
531 493
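The log format above lends itself to simple text processing. The following is a minimal sketch, not part of EDAC itself: parse_edac_line is a hypothetical helper name, and it assumes the message arrives as a single line (the sample above is wrapped only for display) in exactly the layout shown.

```shell
#!/bin/sh
# Hypothetical helper: extract the controller, error type, row, channel
# and DIMM label from one EDAC log message passed as the first argument.
# Assumes the "EDAC MCn: CE|UE ... row R, channel C "LABEL"" layout shown
# in the sample messages; prints nothing if the line does not match.
parse_edac_line() {
    printf '%s\n' "$1" | sed -n \
        's/.*EDAC \(MC[0-9][0-9]*\): \([CU]E\) .* row \([0-9][0-9]*\), *channel \([0-9][0-9]*\) "\([^"]*\)".*/\1 \2 row=\3 channel=\4 label=\5/p'
}
```

Feeding it the first sample message should yield the controller, the error type and the DIMM location on one line, which is easier to count or sort than the raw log text.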
@@ -610,64 +572,4 @@ Parity Count:
610 572
611 573
612 574
613PCI Device Whitelist:
614
615 'pci_parity_whitelist'
616
617 This control file allows for an explicit list of PCI devices to be
618 scanned for parity errors. Only devices found on this list will
619 be examined. The list is a line of hexadecimal VENDOR and DEVICE
620 ID tuples:
621
622 1022:7450,1434:16a6
623
624 One or more can be inserted, separated by a comma.
625
626 To write the above list doing the following as one command line:
627
628 echo "1022:7450,1434:16a6"
629 > /sys/devices/system/edac/pci/pci_parity_whitelist
630
631
632
633 To display what the whitelist is, simply 'cat' the same file.
634
635
636PCI Device Blacklist:
637
638 'pci_parity_blacklist'
639
640 This control file allows for a list of PCI devices to be
641 skipped for scanning.
642 The list is a line of hexadecimal VENDOR and DEVICE ID tuples:
643
644 1022:7450,1434:16a6
645
646 One or more can be inserted, separated by a comma.
647
648 To write the above list doing the following as one command line:
649
650 echo "1022:7450,1434:16a6"
651 > /sys/devices/system/edac/pci/pci_parity_blacklist
652
653
654 To display what the whitelist currently contains,
655 simply 'cat' the same file.
656
657======================================================================= 575=======================================================================
658
659PCI Vendor and Devices IDs can be obtained with the lspci command. Using
660the -n option lspci will display the vendor and device IDs. The system
661administrator will have to determine which devices should be scanned or
662skipped.
663
664
665
666The two lists (white and black) are prioritized. blacklist is the lower
667priority and will NOT be utilized when a whitelist has been set.
668Turn OFF a whitelist by an empty echo command:
669
670 echo > /sys/devices/system/edac/pci/pci_parity_whitelist
671
672and any previous blacklist will be utilized.
673
diff --git a/Documentation/fb/fbcon.txt b/Documentation/fb/fbcon.txt
index 08dce0f631bf..f373df12ed4c 100644
--- a/Documentation/fb/fbcon.txt
+++ b/Documentation/fb/fbcon.txt
@@ -135,10 +135,10 @@ C. Boot options
135 135
136 The angle can be changed anytime afterwards by 'echoing' the same 136 The angle can be changed anytime afterwards by 'echoing' the same
137 numbers to any one of the 2 attributes found in 137 numbers to any one of the 2 attributes found in
138 /sys/class/graphics/fb{x} 138 /sys/class/graphics/fbcon
139 139
140 con_rotate - rotate the display of the active console 140 rotate - rotate the display of the active console
141 con_rotate_all - rotate the display of all consoles 141 rotate_all - rotate the display of all consoles
142 142
143 Console rotation will only become available if Console Rotation 143 Console rotation will only become available if Console Rotation
144 Support is compiled in your kernel. 144 Support is compiled in your kernel.
@@ -148,5 +148,177 @@ C. Boot options
148 Actually, the underlying fb driver is totally ignorant of console 148 Actually, the underlying fb driver is totally ignorant of console
149 rotation. 149 rotation.
150 150
151--- 151D. Attaching, Detaching and Unloading
152
153Before going on to how to attach, detach and unload the framebuffer console, an
154illustration of the dependencies may help.
155
156The console layer, as with most subsystems, needs a driver that interfaces with
157the hardware. Thus, in a VGA console:
158
159console ---> VGA driver ---> hardware.
160
161Assuming the VGA driver can be unloaded, one must first unbind the VGA driver
162from the console layer before unloading the driver. The VGA driver cannot be
163unloaded if it is still bound to the console layer. (See
164Documentation/console/console.txt for more information).
165
166This is more complicated in the case of the framebuffer console (fbcon),
167because fbcon is an intermediate layer between the console and the drivers:
168
169console ---> fbcon ---> fbdev drivers ---> hardware
170
171The fbdev drivers cannot be unloaded if they are bound to fbcon, and fbcon cannot
172be unloaded if it's bound to the console layer.
173
174So to unload the fbdev drivers, one must first unbind fbcon from the console,
175then unbind the fbdev drivers from fbcon. Fortunately, unbinding fbcon from
176the console layer will automatically unbind framebuffer drivers from
177fbcon. Thus, there is no need to explicitly unbind the fbdev drivers from
178fbcon.
179
180So, how do we unbind fbcon from the console? Part of the answer is in
181Documentation/console/console.txt. To summarize:
182
183Echo a value to the bind file that represents the framebuffer console
184driver. So assuming vtcon1 represents fbcon, then:
185
186echo 1 > /sys/class/vtconsole/vtcon1/bind - attach framebuffer console to
187 console layer
188echo 0 > /sys/class/vtconsole/vtcon1/bind - detach framebuffer console from
189 console layer
190
191If fbcon is detached from the console layer, your boot console driver (which is
192usually VGA text mode) will take over. A few drivers (rivafb and i810fb) will
193restore VGA text mode for you. With the rest, before detaching fbcon, you
194must take a few additional steps to make sure that your VGA text mode is
195restored properly. The following is one of several methods you can use:
196
1971. Download or install vbetool. This utility is included with most
198 distributions nowadays, and is usually part of the suspend/resume tool.
199
2002. In your kernel configuration, ensure that CONFIG_FRAMEBUFFER_CONSOLE is set
201 to 'y' or 'm'. Enable one or more of your favorite framebuffer drivers.
202
2033. Boot into text mode and as root run:
204
205 vbetool vbestate save > <vga state file>
206
207 The above command saves the register contents of your graphics
208 hardware to <vga state file>. You need to do this step only once as
209 the state file can be reused.
210
2114. If fbcon is compiled as a module, load fbcon by doing:
212
213 modprobe fbcon
214
2155. Now to detach fbcon:
216
217 vbetool vbestate restore < <vga state file> && \
218 echo 0 > /sys/class/vtconsole/vtcon1/bind
219
2206. That's it, you're back to VGA mode. And if you compiled fbcon as a module,
221 you can unload it by 'rmmod fbcon'
222
2237. To reattach fbcon:
224
225 echo 1 > /sys/class/vtconsole/vtcon1/bind
226
2278. Once fbcon is unbound, all drivers registered to the system will also
228become unbound. This means that fbcon and individual framebuffer drivers
229can be unloaded or reloaded at will. Reloading the drivers or fbcon will
230automatically bind the console, fbcon and the drivers together. Unloading
231all the drivers without unloading fbcon will make it impossible for the
232console to bind fbcon.
233
234Notes for vesafb users:
235=======================
236
237Unfortunately, if your bootline includes a vga=xxx parameter that sets the
238hardware in graphics mode, such as when loading vesafb, vgacon will not load.
239Instead, vgacon will replace the default boot console with dummycon, and you
240won't get any display after detaching fbcon. Your machine is still alive, so
241you can reattach vesafb. However, to reattach vesafb, you need to do one of
242the following:
243
244Variation 1:
245
246 a. Before detaching fbcon, do
247
248 vbetool vbestate save > <vesa state file> # do once for each vesafb mode,
249 # the file can be reused
250
251 b. Detach fbcon as in step 5.
252
253 c. Attach fbcon
254
255 vbetool vbestate restore < <vesa state file> && \
256 echo 1 > /sys/class/vtconsole/vtcon1/bind
257
258Variation 2:
259
260 a. Before detaching fbcon, do:
261 echo <ID> > /sys/class/tty/console/bind
262
263
264 vbetool vbemode get
265
266 b. Take note of the mode number
267
268 c. Detach fbcon as in step 5.
269
270 d. Attach fbcon:
271
272 vbetool vbemode set <mode number> && \
273 echo 1 > /sys/class/vtconsole/vtcon1/bind
274
275Samples:
276========
277
278Here are 2 sample bash scripts that you can use to bind or unbind the
279framebuffer console driver if you are in an X86 box:
280
281---------------------------------------------------------------------------
282#!/bin/bash
283# Unbind fbcon
284
285# Change this to where your actual vgastate file is located
286# Or Use VGASTATE=$1 to indicate the state file at runtime
287VGASTATE=/tmp/vgastate
288
289# path to vbetool
290VBETOOL=/usr/local/bin
291
292
293for (( i = 0; i < 16; i++))
294do
295 if test -x /sys/class/vtconsole/vtcon$i; then
296 if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \
297 = 1 ]; then
298 if test -x $VBETOOL/vbetool; then
299 echo Unbinding vtcon$i
300 $VBETOOL/vbetool vbestate restore < $VGASTATE
301 echo 0 > /sys/class/vtconsole/vtcon$i/bind
302 fi
303 fi
304 fi
305done
306
307---------------------------------------------------------------------------
308#!/bin/bash
309# Bind fbcon
310
311for (( i = 0; i < 16; i++))
312do
313 if test -x /sys/class/vtconsole/vtcon$i; then
314 if [ `cat /sys/class/vtconsole/vtcon$i/name | grep -c "frame buffer"` \
315 = 1 ]; then
316 echo Binding vtcon$i
317 echo 1 > /sys/class/vtconsole/vtcon$i/bind
318 fi
319 fi
320done
321---------------------------------------------------------------------------
322
323--
152Antonino Daplas <adaplas@pol.net> 324Antonino Daplas <adaplas@pol.net>
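The two sample scripts in fbcon.txt can also be written for a plain POSIX shell. This is a minimal sketch under stated assumptions, not a replacement for them: find_fbcon and set_fbcon_bind are hypothetical names, the optional directory argument exists only so the functions can be tried against a scratch tree, and the vbetool state restore described in step 5 is deliberately left out.

```shell
#!/bin/sh
# find_fbcon [vtconsole dir]: print every vtcon directory whose "name"
# file mentions "frame buffer". Defaults to the sysfs location used in
# the text; the argument is an illustration-only override.
find_fbcon() {
    root=${1:-/sys/class/vtconsole}
    for dir in "$root"/vtcon*; do
        [ -r "$dir/name" ] || continue
        if grep -q "frame buffer" "$dir/name"; then
            printf '%s\n' "$dir"
        fi
    done
}

# set_fbcon_bind <0|1> [vtconsole dir]: write 0 (detach) or 1 (attach)
# to the bind file of every framebuffer console found above.
# Does NOT restore VGA state; run vbetool first where that is needed.
set_fbcon_bind() {
    for dir in $(find_fbcon "$2"); do
        printf '%s\n' "$1" > "$dir/bind"
    done
}
```

Matching on the name file rather than hard-coding vtcon1 follows the sample scripts, since the index assigned to fbcon is not guaranteed.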
diff --git a/Documentation/fb/imacfb.txt b/Documentation/fb/imacfb.txt
new file mode 100644
index 000000000000..759028545a7e
--- /dev/null
+++ b/Documentation/fb/imacfb.txt
@@ -0,0 +1,31 @@
1
2What is imacfb?
3===============
4
5This is a generic EFI platform driver for Intel based Apple computers.
6Imacfb is only for EFI booted Intel Macs.
7
8Supported Hardware
9==================
10
11iMac 17"/20"
12Macbook
13Macbook Pro 15"/17"
14MacMini
15
16How to use it?
17==============
18
19Imacfb does not have any kind of autodetection of your machine.
20You have to add the following kernel parameters in your elilo.conf:
21 Macbook :
22 video=imacfb:macbook
23 MacMini :
24 video=imacfb:mini
25 Macbook Pro 15", iMac 17" :
26 video=imacfb:i17
27 Macbook Pro 17", iMac 20" :
28 video=imacfb:i20
29
30--
31Edgar Hucek <gimli@dark-green.com>
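Since imacfb has no autodetection, the model-to-parameter table in imacfb.txt could be captured in a small lookup. This is only a sketch: imacfb_param and the model keywords it accepts are hypothetical, and the four mappings are the only ones the text lists.

```shell
#!/bin/sh
# imacfb_param <model>: print the video= option for a machine, using the
# four mappings from imacfb.txt. The model keywords are made up for this
# example; anything else is reported as unknown.
imacfb_param() {
    case "$1" in
        macbook)              echo "video=imacfb:macbook" ;;
        macmini)              echo "video=imacfb:mini" ;;
        macbookpro15|imac17)  echo "video=imacfb:i17" ;;
        macbookpro17|imac20)  echo "video=imacfb:i20" ;;
        *) echo "unknown model: $1" >&2; return 1 ;;
    esac
}
```

The printed string is what would be added to the kernel parameters in elilo.conf.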
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 027285d0c26c..436697cb9388 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -6,14 +6,18 @@ be removed from this file.
6 6
7--------------------------- 7---------------------------
8 8
9What: devfs 9What: /sys/devices/.../power/state
10When: July 2005 10 dev->power.power_state
11Files: fs/devfs/*, include/linux/devfs_fs*.h and assorted devfs 11 dpm_runtime_{suspend,resume}()
12 function calls throughout the kernel tree 12When: July 2007
13Why: It has been unmaintained for a number of years, has unfixable 13Why: Broken design for runtime control over driver power states, confusing
14 races, contains a naming policy within the kernel that is 14 driver-internal runtime power management with: mechanisms to support
15 against the LSB, and can be replaced by using udev. 15 system-wide sleep state transitions; event codes that distinguish
16Who: Greg Kroah-Hartman <greg@kroah.com> 16 different phases of swsusp "sleep" transitions; and userspace policy
17 inputs. This framework was never widely used, and most attempts to
18 use it were broken. Drivers should instead be exposing domain-specific
19 interfaces either to kernel or to userspace.
20Who: Pavel Machek <pavel@suse.cz>
17 21
18--------------------------- 22---------------------------
19 23
@@ -66,11 +70,15 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br>
66 70
67--------------------------- 71---------------------------
68 72
69What: remove EXPORT_SYMBOL(insert_resource) 73What: sys_sysctl
70When: April 2006 74When: January 2007
71Files: kernel/resource.c 75Why: The same information is available through /proc/sys and that is the
72Why: No modular usage in the kernel. 76 interface user space prefers to use. And there do not appear to be
73Who: Adrian Bunk <bunk@stusta.de> 77 any existing users in user space of sys_sysctl. The additional
78 maintenance overhead of keeping a set of binary names gets
79 in the way of doing a good job of maintaining this interface.
80
81Who: Eric Biederman <ebiederm@xmission.com>
74 82
75--------------------------- 83---------------------------
76 84
@@ -132,16 +140,6 @@ Who: NeilBrown <neilb@suse.de>
132 140
133--------------------------- 141---------------------------
134 142
135What: au1x00_uart driver
136When: January 2006
137Why: The 8250 serial driver now has the ability to deal with the differences
138 between the standard 8250 family of UARTs and their slightly strange
139 brother on Alchemy SOCs. The loss of features is not considered an
140 issue.
141Who: Ralf Baechle <ralf@linux-mips.org>
142
143---------------------------
144
145What: eepro100 network driver 143What: eepro100 network driver
146When: January 2007 144When: January 2007
147Why: replaced by the e100 driver 145Why: replaced by the e100 driver
@@ -149,6 +147,13 @@ Who: Adrian Bunk <bunk@stusta.de>
149 147
150--------------------------- 148---------------------------
151 149
150What: drivers depending on OSS_OBSOLETE_DRIVER
151When: options in 2.6.20, code in 2.6.22
152Why: OSS drivers with ALSA replacements
153Who: Adrian Bunk <bunk@stusta.de>
154
155---------------------------
156
152What: pci_module_init(driver) 157What: pci_module_init(driver)
153When: January 2007 158When: January 2007
154Why: Is replaced by pci_register_driver(pci_driver). 159Why: Is replaced by pci_register_driver(pci_driver).
@@ -177,14 +182,13 @@ Who: Jean Delvare <khali@linux-fr.org>
177 182
178--------------------------- 183---------------------------
179 184
180What: remove EXPORT_SYMBOL(tasklist_lock) 185What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports
181When: August 2006 186 (temporary transition config option provided until then)
182Files: kernel/fork.c 187 The transition config option will also be removed at the same time.
183Why: tasklist_lock protects the kernel internal task list. Modules have 188When: before 2.6.19
184 no business looking at it, and all instances in drivers have been due 189Why: Unused symbols are both increasing the size of the kernel binary
185 to use of too-lowlevel APIs. Having this symbol exported prevents 190 and are often a sign of "wrong API"
186 moving to more scalable locking schemes for the task list. 191Who: Arjan van de Ven <arjan@linux.intel.com>
187Who: Christoph Hellwig <hch@lst.de>
188 192
189--------------------------- 193---------------------------
190 194
@@ -224,3 +228,109 @@ Why: The interface no longer has any callers left in the kernel. It
224Who: Nick Piggin <npiggin@suse.de> 228Who: Nick Piggin <npiggin@suse.de>
225 229
226--------------------------- 230---------------------------
231
232What: Support for the Momentum / PMC-Sierra Jaguar ATX evaluation board
233When: September 2006
234Why: Has not built for quite some time, and was never popular,
235 due to the platform being replaced by successor models. Apparently
236 no user base left. It also is one of the last users of
237 WANT_PAGE_VIRTUAL.
238Who: Ralf Baechle <ralf@linux-mips.org>
239
240---------------------------
241
242What: Support for the Momentum Ocelot, Ocelot 3, Ocelot C and Ocelot G
243When: September 2006
244Why: Some no longer build and apparently there is no user base left
245 for these platforms.
246Who: Ralf Baechle <ralf@linux-mips.org>
247
248---------------------------
249
250What: Support for MIPS Technologies' Atlas and SEAD evaluation boards
251When: September 2006
252Why: Some no longer build and apparently there is no user base left
253 for these platforms. Hardware out of production for several years.
254Who: Ralf Baechle <ralf@linux-mips.org>
255
256---------------------------
257
258What: Support for the IT8172-based platforms, ITE 8172G and Globespan IVR
259When: September 2006
260Why: Code has not built since at least 2.6.0, and apparently there is
261 no user base left for these platforms. Hardware out of production
262 for several years and hardly a trace of the manufacturer left on
263 the net.
264Who: Ralf Baechle <ralf@linux-mips.org>
265
266---------------------------
267
268What: Interrupt only SA_* flags
269When: January 2007
270Why: The interrupt related SA_* flags are replaced by IRQF_* to move them
271 out of the signal namespace.
272
273Who: Thomas Gleixner <tglx@linutronix.de>
274
275---------------------------
276
277What: i2c-ite and i2c-algo-ite drivers
278When: September 2006
279Why: These drivers have never compiled since they were added to the kernel
280 tree 5 years ago. This feature removal can be reevaluated if
281 someone shows interest in the drivers, fixes them and takes over
282 maintenance.
283 http://marc.theaimsgroup.com/?l=linux-mips&m=115040510817448
284Who: Jean Delvare <khali@linux-fr.org>
285
286---------------------------
287
288What: Bridge netfilter deferred IPv4/IPv6 output hook calling
289When: January 2007
290Why: The deferred output hooks are a layering violation causing unusual
291 and broken behaviour on bridge devices. Examples of things they
292 break include QoS classification using the MARK or CLASSIFY targets,
293 the IPsec policy match and connection tracking with VLANs on a
294 bridge. Their only use is to enable bridge output port filtering
295 within iptables with the physdev match, which can also be done by
296 combining iptables and ebtables using netfilter marks. Until it
297 is removed, the hook deferral is disabled by default and is
298 only enabled when needed.
299
300Who: Patrick McHardy <kaber@trash.net>
301
302---------------------------
303
304What: frame diverter
305When: November 2006
306Why: The frame diverter is included in most distribution kernels, but is
307 broken. It does not correctly handle many things:
308 - IPV6
309 - non-linear skb's
310 - network device RCU on removal
311 - input frames not correctly checked for protocol errors
312 It also adds allocation overhead even if not enabled.
313 It is not clear if anyone is still using it.
314Who: Stephen Hemminger <shemminger@osdl.org>
315
316---------------------------
317
318
319What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment
320When: October 2008
321Why: The stacking of class devices makes these values misleading and
322 inconsistent.
323 Class devices should not carry any of these properties, and bus
324 devices have SUBSYSTEM and DRIVER as a replacement.
325Who: Kay Sievers <kay.sievers@suse.de>
326
327---------------------------
328
329What: i2c-isa
330When: December 2006
331Why: i2c-isa is nonsense and doesn't fit in the device driver
332 model. Drivers relying on it are better implemented as platform
333 drivers.
334Who: Jean Delvare <khali@linux-fr.org>
335
336---------------------------
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 66fdc0744fe0..16dec61d7671 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -62,8 +62,8 @@ ramfs-rootfs-initramfs.txt
62 - info on the 'in memory' filesystems ramfs, rootfs and initramfs. 62 - info on the 'in memory' filesystems ramfs, rootfs and initramfs.
63reiser4.txt 63reiser4.txt
64 - info on the Reiser4 filesystem based on dancing tree algorithms. 64 - info on the Reiser4 filesystem based on dancing tree algorithms.
65relayfs.txt 65relay.txt
66 - info on relayfs, for efficient streaming from kernel to user space. 66 - info on relay, for efficient streaming from kernel to user space.
67romfs.txt 67romfs.txt
68 - description of the ROMFS filesystem. 68 - description of the ROMFS filesystem.
69smbfs.txt 69smbfs.txt
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index d31efbbdfe50..247d7f619aa2 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -142,8 +142,8 @@ see also dquot_operations section.
142 142
143--------------------------- file_system_type --------------------------- 143--------------------------- file_system_type ---------------------------
144prototypes: 144prototypes:
145 struct int (*get_sb) (struct file_system_type *, int, 145 int (*get_sb) (struct file_system_type *, int,
146 const char *, void *, struct vfsmount *); 146 const char *, void *, struct vfsmount *);
147 void (*kill_sb) (struct super_block *); 147 void (*kill_sb) (struct super_block *);
148locking rules: 148locking rules:
149 may block BKL 149 may block BKL
diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt
index 58c65a1713e5..7cac200e2a85 100644
--- a/Documentation/filesystems/automount-support.txt
+++ b/Documentation/filesystems/automount-support.txt
@@ -19,7 +19,7 @@ following procedure:
19 19
20 (2) Have the follow_link() op do the following steps: 20 (2) Have the follow_link() op do the following steps:
21 21
22 (a) Call do_kern_mount() to call the appropriate filesystem to set up a 22 (a) Call vfs_kern_mount() to call the appropriate filesystem to set up a
23 superblock and gain a vfsmount structure representing it. 23 superblock and gain a vfsmount structure representing it.
24 24
25 (b) Copy the nameidata provided as an argument and substitute the dentry 25 (b) Copy the nameidata provided as an argument and substitute the dentry
diff --git a/Documentation/filesystems/configfs/configfs_example.c b/Documentation/filesystems/configfs/configfs_example.c
index 3d4713a6c207..2d6a14a463e0 100644
--- a/Documentation/filesystems/configfs/configfs_example.c
+++ b/Documentation/filesystems/configfs/configfs_example.c
@@ -264,6 +264,15 @@ static struct config_item_type simple_child_type = {
264}; 264};
265 265
266 266
267struct simple_children {
268 struct config_group group;
269};
270
271static inline struct simple_children *to_simple_children(struct config_item *item)
272{
273 return item ? container_of(to_config_group(item), struct simple_children, group) : NULL;
274}
275
267static struct config_item *simple_children_make_item(struct config_group *group, const char *name) 276static struct config_item *simple_children_make_item(struct config_group *group, const char *name)
268{ 277{
269 struct simple_child *simple_child; 278 struct simple_child *simple_child;
@@ -304,7 +313,13 @@ static ssize_t simple_children_attr_show(struct config_item *item,
304"items have only one attribute that is readable and writeable.\n"); 313"items have only one attribute that is readable and writeable.\n");
305} 314}
306 315
316static void simple_children_release(struct config_item *item)
317{
318 kfree(to_simple_children(item));
319}
320
307static struct configfs_item_operations simple_children_item_ops = { 321static struct configfs_item_operations simple_children_item_ops = {
322 .release = simple_children_release,
308 .show_attribute = simple_children_attr_show, 323 .show_attribute = simple_children_attr_show,
309}; 324};
310 325
@@ -345,10 +360,6 @@ static struct configfs_subsystem simple_children_subsys = {
345 * children of its own. 360 * children of its own.
346 */ 361 */
347 362
348struct simple_children {
349 struct config_group group;
350};
351
352static struct config_group *group_children_make_group(struct config_group *group, const char *name) 363static struct config_group *group_children_make_group(struct config_group *group, const char *name)
353{ 364{
354 struct simple_children *simple_children; 365 struct simple_children *simple_children;
diff --git a/Documentation/filesystems/devfs/ChangeLog b/Documentation/filesystems/devfs/ChangeLog
deleted file mode 100644
index e5aba5246d7c..000000000000
--- a/Documentation/filesystems/devfs/ChangeLog
+++ /dev/null
@@ -1,1977 +0,0 @@
1/* -*- auto-fill -*- */
2===============================================================================
3Changes for patch v1
4
5- creation of devfs
6
7- modified miscellaneous character devices to support devfs
8===============================================================================
9Changes for patch v2
10
11- bug fix with manual inode creation
12===============================================================================
13Changes for patch v3
14
15- bugfixes
16
17- documentation improvements
18
19- created a couple of scripts (one to save&restore a devfs and the
20 other to set up compatibility symlinks)
21
22- devfs support for SCSI discs. New name format is: sd_hHcCiIlL
23===============================================================================
24Changes for patch v4
25
26- bugfix for the directory reading code
27
28- bugfix for compilation with kerneld
29
30- devfs support for generic hard discs
31
32- rationalisation of the various watchdog drivers
33===============================================================================
34Changes for patch v5
35
36- support for mounting directly from entries in the devfs (it doesn't
37 need to be mounted to do this), including the root filesystem.
38 Mounting of swap partitions also works. Hence, now if you set
39 CONFIG_DEVFS_ONLY to 'Y' then you won't be able to access your discs
40 via ordinary device nodes. Naturally, the default is 'N' so that you
41 can still use your old device nodes. If you want to mount from devfs
42 entries, make sure you use: append = "root=/dev/sd_..." in your
43 lilo.conf. It seems LILO looks for the device number (major&minor)
44 and writes that into the kernel image :-(
45
46- support for character memory devices (/dev/null, /dev/zero, /dev/full
47 and so on). Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
48===============================================================================
49Changes for patch v6
50
51- support for subdirectories
52
53- support for symbolic links (created by devfs_mk_symlink(), no
54 support yet for creation via symlink(2))
55
56- SCSI disc naming now cast in stone, with the format:
57 /dev/sd/c0b1t2u3 controller=0, bus=1, ID=2, LUN=3, whole disc
58 /dev/sd/c0b1t2u3p4 controller=0, bus=1, ID=2, LUN=3, 4th partition
59
60- loop devices now appear in devfs
61
62- tty devices, console, serial ports, etc. now appear in devfs
63 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
64
65- bugs with mounting devfs-only devices now fixed
66===============================================================================
67Changes for patch v7
68
69- SCSI CD-ROMS, tapes and generic devices now appear in devfs
70===============================================================================
71Changes for patch v8
72
73- bugfix with no-rewind SCSI tapes
74
75- RAMDISCs now appear in devfs
76
77- better cleaning up of devfs entries created by various modules
78
79- interface change to <devfs_register>
80===============================================================================
81Changes for patch v9
82
83- the v8 patch was corrupted somehow, which would affect the patch for
84 linux/fs/filesystems.c
85 I've also fixed the v8 patch file on the WWW
86
87- MetaDevices (/dev/md*) should now appear in devfs
88===============================================================================
89Changes for patch v10
90
91- bugfix in meta device support for devfs
92
93- created this ChangeLog file
94
95- added devfs support to the floppy driver
96
97- added support for creating sockets in a devfs
98===============================================================================
99Changes for patch v11
100
101- added DEVFS_FL_HIDE_UNREG flag
102
103- incorporated better patch for ttyname() in libc 5.4.43 from H.J. Lu.
104
105- interface change to <devfs_mk_symlink>
106
107- support for creating symlinks with symlink(2)
108
109- parallel port printer (/dev/lp*) now appears in devfs
110===============================================================================
111Changes for patch v12
112
113- added inode check to <devfs_fill_file> function
114
115- improved devfs support when mounting from devfs
116
117- added call to <<release>> operation when removing swap areas on
118 devfs devices
119
120- increased NR_SUPER to 128 to support large numbers of devfs mounts
121 (for chroot(2) gaols)
122
123- fixed bug in SCSI disc support: was generating incorrect minors if
124 SCSI ID's did not start at 0 and increase by 1
125
126- support symlink traversal when mounting root
127===============================================================================
128Changes for patch v13
129
130- added devfs support to soundcard driver
131 Thanks to Eric Dumas <dumas@linux.eu.org> and
132 C. Scott Ananian <cananian@alumni.princeton.edu>
133
134- added devfs support to the joystick driver
135
136- loop driver now has its own subdirectory "/dev/loop/"
137
138- created <devfs_get_flags> and <devfs_set_flags> functions
139
140- fix problem with SCSI disc compatibility names (sd{a,b,c,d,e,f})
141 which assumes IDs start at 0 and increase by 1. Also only create
142 devfs entries for SCSI disc partitions which actually exist
143 Show new names in partition check
144 Thanks to Jakub Jelinek <jj@sunsite.ms.mff.cuni.cz>
145===============================================================================
146Changes for patch v14
147
148- bug fix in floppy driver: would not compile without
149 CONFIG_DEVFS_FS='Y'
150 Thanks to Jurgen Botz <jbotz@nova.botz.org>
151
152- bug fix in loop driver
153 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
154
155- do not create devfs entries for printers not configured
156 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
157
158- do not create devfs entries for serial ports not present
159 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
160
161- ensure <tty_register_devfs> is exported from tty_io.c
162 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
163
164- allow unregistering of devfs symlink entries
165
166- fixed bug in SCSI disc naming introduced in last patch version
167===============================================================================
168Changes for patch v15
169
170- ported to kernel 2.1.81
171===============================================================================
172Changes for patch v16
173
174- created <devfs_set_symlink_destination> function
175
176- moved DEVFS_SUPER_MAGIC into header file
177
178- added DEVFS_FL_HIDE flag
179
180- created <devfs_get_maj_min>
181
182- created <devfs_get_handle_from_inode>
183
184- fixed bugs in searching by major&minor
185
186- changed interface to <devfs_unregister>, <devfs_fill_file> and
187 <devfs_find_handle>
188
189- fixed inode times when symlink created with symlink(2)
190
191- change tty driver to do auto-creation of devfs entries
192 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
193
194- fixed bug in genhd.c: whole disc (non-SCSI) was not registered to
195 devfs
196
197- updated libc 5.4.43 patch for ttyname()
198===============================================================================
199Changes for patch v17
200
201- added CONFIG_DEVFS_TTY_COMPAT
202 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
203
204- bugfix in devfs support for drivers/char/lp.c
205 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
206
207- clean up serial driver so that PCMCIA devices unregister correctly
208 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
209
210- fixed bug in genhd.c: whole disc (non-SCSI) was not registered to
211 devfs [was missing in patch v16]
212
213- updated libc 5.4.43 patch for ttyname() [was missing in patch v16]
214
215- all SCSI devices now registered in /dev/sg
216
217- support removal of devfs entries via unlink(2)
218===============================================================================
219Changes for patch v18
220
221- added floppy/?u720 floppy entry
222
223- fixed kerneld support for entries in devfs subdirectories
224
225- incorporated latest patch for ttyname() in libc 5.4.43 from H.J. Lu.
226===============================================================================
227Changes for patch v19
228
229- bug fix when looking up unregistered entries: kerneld was not called
230
231- fixes for kernel 2.1.86 (now requires 2.1.86)
232===============================================================================
233Changes for patch v20
234
235- only create available floppy entries
236 Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
237
238- new IDE naming scheme following SCSI format (i.e. /dev/id/c0b0t0u0p1
239 instead of /dev/hda1)
240 Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
241
242- new XT disc naming scheme following SCSI format (i.e. /dev/xd/c0t0p1
243 instead of /dev/xda1)
244 Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
245
246- new non-standard CD-ROM names (i.e. /dev/sbp/c#t#)
247 Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
248
249- allow symlink traversal when mounting the root filesystem
250
251- Create entries for MD devices at MD init
252 Thanks to Christophe Leroy <christophe.leroy5@capway.com>
253===============================================================================
254Changes for patch v21
255
256- ported to kernel 2.1.91
257===============================================================================
258Changes for patch v22
259
260- SCSI host number patch ("scsihosts=" kernel option)
261 Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
262===============================================================================
263Changes for patch v23
264
265- Fixed persistence bug with device numbers for manually created
266 device files
267
268- Fixed problem with recreating symlinks with different content
269
270- Added CONFIG_DEVFS_MOUNT (mount devfs on /dev at boot time)
271===============================================================================
272Changes for patch v24
273
274- Switched from CONFIG_KERNELD to CONFIG_KMOD: module autoloading
275 should now work again
276
277- Hide entries which are manually unlinked
278
279- Always invalidate devfs dentry cache when registering entries
280
281- Support removal of devfs directories via rmdir(2)
282
283- Ensure directories created by <devfs_mk_dir> are visible
284
285- Default no access for "other" for floppy device
286===============================================================================
287Changes for patch v25
288
289- Updates to CREDITS file and minor IDE numbering change
290 Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
291
292- Invalidate devfs dentry cache when making directories
293
294- Invalidate devfs dentry cache when removing entries
295
296- More informative message if root FS mount fails when devfs
297 configured
298
299- Fixed persistence bug with fifos
300===============================================================================
301Changes for patch v26
302
303- ported to kernel 2.1.97
304
305- Changed serial directory from "/dev/serial" to "/dev/tts" and
306 "/dev/consoles" to "/dev/vc" to be more friendly to new procps
307===============================================================================
308Changes for patch v27
309
310- Added support for IDE4 and IDE5
311 Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
312
313- Documented "scsihosts=" boot parameter
314
315- Print process command when debugging kerneld/kmod
316
317- Added debugging for register/unregister/change operations
318
319- Added "devfs=" boot options
320
321- Hide unregistered entries by default
322===============================================================================
323Changes for patch v28
324
325- No longer lock/unlock superblock in <devfs_put_super> (cope with
326 recent VFS interface change)
327
328- Do not automatically change ownership/protection of /dev/tty
329
330- Drop negative dentries when they are released
331
332- Manage dcache more efficiently
333===============================================================================
334Changes for patch v29
335
336- Added DEVFS_FL_AUTO_DEVNUM flag
337===============================================================================
338Changes for patch v30
339
340- No longer set unnecessary methods
341
342- Ported to kernel 2.1.99-pre3
343===============================================================================
344Changes for patch v31
345
346- Added PID display to <call_kerneld> debugging message
347
348- Added "diread" and "diwrite" options
349
350- Ported to kernel 2.1.102
351
352- Fixed persistence problem with permissions
353===============================================================================
354Changes for patch v32
355
356- Fixed devfs support in drivers/block/md.c
357===============================================================================
358Changes for patch v33
359
360- Support legacy device nodes
361
362- Fixed bug where recreated inodes were hidden
363
364- New IDE naming scheme: everything is under /dev/ide
365===============================================================================
366Changes for patch v34
367
368- Improved debugging in <get_vfs_inode>
369
370- Prevent duplicate calls to <devfs_mk_dir> in SCSI layer
371
372- No longer free old dentries in <devfs_mk_dir>
373
374- Free all dentries for a given entry when deleting inodes
375===============================================================================
376Changes for patch v35
377
378- Ported to kernel 2.1.105 (sound driver changes)
379===============================================================================
380Changes for patch v36
381
382- Fixed sound driver port
383===============================================================================
384Changes for patch v37
385
386- Minor documentation tweaks
387===============================================================================
388Changes for patch v38
389
390- More documentation tweaks
391
392- Fix for sound driver port
393
394- Removed ttyname-patch (grab libc 5.4.44 instead)
395
396- Ported to kernel 2.1.107-pre2 (loop driver fix)
397===============================================================================
398Changes for patch v39
399
400- Ported to kernel 2.1.107 (hd.c hunk broke due to spelling "fixes"). Sigh
401
402- Removed many #ifdef's, replaced with trickery in include/devfs_fs.h
403===============================================================================
404Changes for patch v40
405
406- Fix for sound driver port
407
408- Limit auto-device numbering to majors 128 to 239
409===============================================================================
410Changes for patch v41
411
412- Fixed inode times persistence problem
413===============================================================================
414Changes for patch v42
415
416- Ported to kernel 2.1.108 (drivers/scsi/hosts.c hunk broke)
417===============================================================================
418Changes for patch v43
419
420- Fixed spelling in <devfs_readlink> debug
421
422- Fixed bug in <devfs_setup> parsing "dilookup"
423
424- More #ifdef's removed
425
426- Supported Sparc keyboard (/dev/kbd)
427
428- Supported DSP56001 digital signal processor (/dev/dsp56k)
429
430- Supported Apple Desktop Bus (/dev/adb)
431
432- Supported Coda network file system (/dev/cfs*)
433===============================================================================
434Changes for patch v44
435
436- Fixed devfs inode leak when manually recreating inodes
437
438- Fixed permission persistence problem when recreating inodes
439===============================================================================
440Changes for patch v45
441
442- Ported to kernel 2.1.110
443===============================================================================
444Changes for patch v46
445
446- Ported to kernel 2.1.112-pre1
447
448- Removed harmless "unused variable" compiler warning
449
450- Fixed modes for manually recreated device nodes
451===============================================================================
452Changes for patch v47
453
454- Added NULL devfs inode warning in <devfs_read_inode>
455
456- Force all inode nlink values to 1
457===============================================================================
458Changes for patch v48
459
460- Added "dimknod" option
461
462- Set inode nlink to 0 when freeing dentries
463
464- Added support for virtual console capture devices (/dev/vcs*)
465 Thanks to Dennis Hou <smilax@mindmeld.yi.org>
466
467- Fixed modes for manually recreated symlinks
468===============================================================================
469Changes for patch v49
470
471- Ported to kernel 2.1.113
472===============================================================================
473Changes for patch v50
474
475- Fixed bugs in recreated directories and symlinks
476===============================================================================
477Changes for patch v51
478
479- Improved robustness of rc.devfs script
480 Thanks to Roderich Schupp <rsch@experteam.de>
481
482- Fixed bugs in recreated device nodes
483
484- Fixed bug in currently unused <devfs_get_handle_from_inode>
485
486- Defined new <devfs_handle_t> type
487
488- Improved debugging when getting entries
489
490- Fixed bug where directories could be emptied
491
492- Ported to kernel 2.1.115
493===============================================================================
494Changes for patch v52
495
496- Replaced dummy .epoch inode with .devfsd character device
497
498- Modified rc.devfs to take account of above change
499
500- Removed spurious driver warning messages when CONFIG_DEVFS_FS=n
501
502- Implemented devfsd protocol revision 0
503===============================================================================
504Changes for patch v53
505
506- Ported to kernel 2.1.116 (kmod change broke hunk)
507
508- Updated Documentation/Configure.help
509
510- Test and tty pattern patch for rc.devfs script
511 Thanks to Roderich Schupp <rsch@experteam.de>
512
513- Added soothing message to warning in <devfs_d_iput>
514===============================================================================
515Changes for patch v54
516
517- Ported to kernel 2.1.117
518
519- Fixed default permissions in sound driver
520
521- Added support for frame buffer devices (/dev/fb*)
522===============================================================================
523Changes for patch v55
524
525- Ported to kernel 2.1.119
526
527- Use GCC extensions for structure initialisations
528
529- Implemented async open notification
530
531- Incremented devfsd protocol revision to 1
532===============================================================================
533Changes for patch v56
534
535- Ported to kernel 2.1.120-pre3
536
537- Moved async open notification to end of <devfs_open>
538===============================================================================
539Changes for patch v57
540
541- Ported to kernel 2.1.121
542
543- Prepended "/dev/" to module load request
544
545- Renamed <call_kerneld> to <call_kmod>
546
547- Created sample modules.conf file
548===============================================================================
549Changes for patch v58
550
551- Fixed typo "AYSNC" -> "ASYNC"
552===============================================================================
553Changes for patch v59
554
555- Added open flag for files
556===============================================================================
557Changes for patch v60
558
559- Ported to kernel 2.1.123-pre2
560===============================================================================
561Changes for patch v61
562
563- Set i_blocks=0 and i_blksize=1024 in <devfs_read_inode>
564===============================================================================
565Changes for patch v62
566
567- Ported to kernel 2.1.123
568===============================================================================
569Changes for patch v63
570
571- Ported to kernel 2.1.124-pre2
572===============================================================================
573Changes for patch v64
574
575- Fixed Unix98 pty support
576
577- Increased buffer size in <get_partition_list> to avoid crash and
578 burn
579===============================================================================
580Changes for patch v65
581
582- More Unix98 pty support fixes
583
584- Added test for empty <<name>> in <devfs_find_handle>
585
586- Renamed <generate_path> to <devfs_generate_path> and published
587
588- Created /dev/root symlink
589 Thanks to Roderich Schupp <rsch@ExperTeam.de>
590 with further modifications by me
591===============================================================================
592Changes for patch v66
593
594- Yet more Unix98 pty support fixes (now tested)
595
596- Created <devfs_get_fops>
597
598- Support media change checks when CONFIG_DEVFS_ONLY=y
599
600- Abolished Unix98-style PTY names for old PTY devices
601===============================================================================
602Changes for patch v67
603
604- Added inline declaration for dummy <devfs_generate_path>
605
606- Removed spurious "unable to register... in devfs" messages when
607 CONFIG_DEVFS_FS=n
608
609- Fixed misc. devices when CONFIG_DEVFS_FS=n
610
611- Limit auto-device numbering to majors 144 to 239
612===============================================================================
613Changes for patch v68
614
615- Hide unopened virtual consoles from directory listings
616
617- Added support for video capture devices
618
619- Ported to kernel 2.1.125
620===============================================================================
621Changes for patch v69
622
623- Fix for CONFIG_VT=n
624===============================================================================
625Changes for patch v70
626
627- Added support for non-OSS/Free sound cards
628===============================================================================
629Changes for patch v71
630
631- Ported to kernel 2.1.126-pre2
632===============================================================================
633Changes for patch v72
634
635- #ifdef's for CONFIG_DEVFS_DISABLE_OLD_NAMES removed
636===============================================================================
637Changes for patch v73
638
639- CONFIG_DEVFS_DISABLE_OLD_NAMES replaced with "nocompat" boot option
640
641- CONFIG_DEVFS_BOOT_OPTIONS removed: boot options always available
642===============================================================================
643Changes for patch v74
644
645- Removed CONFIG_DEVFS_MOUNT and "mount" boot option and replaced with
646 "nomount" boot option
647
648- Documentation updates
649
650- Updated sample modules.conf
651===============================================================================
652Changes for patch v75
653
654- Updated sample modules.conf
655
656- Remount devfs after initrd finishes
657
658- Ported to kernel 2.1.127
659
660- Added support for ISDN
661 Thanks to Christophe Leroy <christophe.leroy5@capway.com>
662===============================================================================
663Changes for patch v76
664
665- Updated an email address in ChangeLog
666
667- CONFIG_DEVFS_ONLY replaced with "only" boot option
668===============================================================================
669Changes for patch v77
670
671- Added DEVFS_FL_REMOVABLE flag
672
673- Check for disc change when listing directories with removable media
674 devices
675
676- Use DEVFS_FL_REMOVABLE in sd.c
677
678- Ported to kernel 2.1.128
679===============================================================================
680Changes for patch v78
681
682- Only call <scan_dir_for_removable> on first call to <devfs_readdir>
683
684- Ported to kernel 2.1.129-pre5
685
686- ISDN support improvements
687 Thanks to Christophe Leroy <christophe.leroy5@capway.com>
688===============================================================================
689Changes for patch v79
690
691- Ported to kernel 2.1.130
692
693- Renamed miscdevice "apm" to "apm_bios" to be consistent with
694 devices.txt
695===============================================================================
696Changes for patch v80
697
698- Ported to kernel 2.1.131
699
700- Updated <devfs_rmdir> for VFS change in 2.1.131
701===============================================================================
702Changes for patch v81
703
704- Fixed permissions on /dev/ptmx
705===============================================================================
706Changes for patch v82
707
708- Ported to kernel 2.1.132-pre4
709
710- Changed initial permissions on /dev/pts/*
711
712- Created <devfs_mk_compat>
713
714- Added "symlinks" boot option
715
716- Changed devfs_register_blkdev() back to register_blkdev() for IDE
717
718- Check for partitions on removable media in <devfs_lookup>
719===============================================================================
720Changes for patch v83
721
722- Fixed support for ramdisc when using string-based root FS name
723
724- Ported to kernel 2.2.0-pre1
725===============================================================================
726Changes for patch v84
727
728- Ported to kernel 2.2.0-pre7
729===============================================================================
730Changes for patch v85
731
732- Compile fixes for drivers/sound/sound_common.c (non-module) and
733 drivers/isdn/isdn_common.c
734 Thanks to Christophe Leroy <christophe.leroy5@capway.com>
735
736- Added support for registering regular files
737
738- Created <devfs_set_file_size>
739
740- Added /dev/cpu/mtrr as an alternative interface to /proc/mtrr
741
742- Update devfs inodes from entries if not changed through FS
743===============================================================================
744Changes for patch v86
745
746- Ported to kernel 2.2.0-pre9
747===============================================================================
748Changes for patch v87
749
750- Fixed bug when mounting non-devfs devices in a devfs
751===============================================================================
752Changes for patch v88
753
754- Fixed <devfs_fill_file> to only initialise temporary inodes
755
756- Trap for NULL fops in <devfs_register>
757
758- Return -ENODEV in <devfs_fill_file> for non-driver inodes
759
760- Fixed bug when unswapping non-devfs devices in a devfs
761===============================================================================
762Changes for patch v89
763
764- Switched to C data types in include/linux/devfs_fs.h
765
766- Switched from PATH_MAX to DEVFS_PATHLEN
767
768- Updated Documentation/filesystems/devfs/modules.conf to take account
769 of reverse scanning (!) by modprobe
770
771- Ported to kernel 2.2.0
772===============================================================================
773Changes for patch v90
774
775- CONFIG_DEVFS_DISABLE_OLD_TTY_NAMES replaced with "nottycompat" boot
776 option
777
778- CONFIG_DEVFS_TTY_COMPAT removed: existing "symlinks" boot option now
779 controls this. This means you must have libc 5.4.44 or later, or a
780 recent version of libc 6 if you use the "symlinks" option
781===============================================================================
782Changes for patch v91
783
784- Switch from <devfs_mk_symlink> to <devfs_mk_compat> in
785 drivers/char/vc_screen.c to fix problems with Midnight Commander
786===============================================================================
787Changes for patch v92
788
789- Ported to kernel 2.2.2-pre5
790===============================================================================
791Changes for patch v93
792
793- Modified <sd_name> in drivers/scsi/sd.c to cope with devices that
794 don't exist (which happens with new RAID autostart code printk()s)
795===============================================================================
796Changes for patch v94
797
798- Fixed bug in joystick driver: only first joystick was registered
799===============================================================================
800Changes for patch v95
801
802- Fixed another bug in joystick driver
803
804- Fixed <devfsd_read> to not overrun event buffer
805===============================================================================
806Changes for patch v96
807
808- Ported to kernel 2.2.5-2
809
810- Created <devfs_auto_unregister>
811
812- Fixed bugs: compatibility entries were not unregistered for:
813 loop driver
814 floppy driver
815 RAMDISC driver
816 IDE tape driver
817 SCSI CD-ROM driver
818 SCSI HDD driver
819===============================================================================
820Changes for patch v97
821
822- Fixed bugs: compatibility entries were not unregistered for:
823 ALSA sound driver
824 partitions in generic disc driver
825
826- Don't return unregistered entries in <devfs_find_handle>
827
828- Panic in <devfs_unregister> if entry unregistered
829
830- Don't panic in <devfs_auto_unregister> for duplicates
831===============================================================================
832Changes for patch v98
833
834- Don't unregister already unregistered entries in <unregister>
835
836- Register entry in <sd_detect>
837
838- Unregister entry in <sd_detach>
839
840- Changed to <devfs_*register_chrdev> in drivers/char/tty_io.c
841
842- Ported to kernel 2.2.7
843===============================================================================
844Changes for patch v99
845
846- Ported to kernel 2.2.8
847
848- Fixed bug in drivers/scsi/sd.c when >16 SCSI discs
849
850- Disable warning messages when unable to read partition table for
851 removable media
852===============================================================================
853Changes for patch v100
854
855- Ported to kernel 2.3.1-pre5
856
857- Added "oops-on-panic" boot option
858
859- Improved debugging in <devfs_register> and <devfs_unregister>
860
861- Register entry in <sr_detect>
862
863- Unregister entry in <sr_detach>
864
865- Register entry in <sg_detect>
866
867- Unregister entry in <sg_detach>
868
869- Added support for ALSA drivers
870===============================================================================
871Changes for patch v101
872
873- Ported to kernel 2.3.2
874===============================================================================
875Changes for patch v102
876
877- Update serial driver to register PCMCIA entries
878 Thanks to Roch-Alexandre Nomine-Beguin <roch@samarkand.infini.fr>
879
880- Updated an email address in ChangeLog
881
882- Hide virtual console capture entries from directory listings when
883 corresponding console device is not open
884===============================================================================
885Changes for patch v103
886
887- Ported to kernel 2.3.3
888===============================================================================
889Changes for patch v104
890
891- Added documentation for some functions
892
893- Added "doc" target to fs/devfs/Makefile
894
895- Added "v4l" directory for video4linux devices
896
897- Replaced call to <devfs_unregister> in <sd_detach> with call to
898 <devfs_register_partitions>
899
900- Moved registration for sr and sg drivers from detect() to attach()
901 methods
902
903- Register entries in <st_attach> and unregister in <st_detach>
904
905- Work around IDE driver treating CD-ROM as gendisk
906
907- Use <sed> instead of <tr> in rc.devfs
908
909- Updated ToDo list
910
911- Removed "oops-on-panic" boot option: now always Oops
912===============================================================================
913Changes for patch v105
914
915- Unregister SCSI host from <scsi_host_no_list> in <scsi_unregister>
916 Thanks to Zoltán Böszörményi <zboszor@mail.externet.hu>
917
918- Don't save /dev/log in rc.devfs
919
920- Ported to kernel 2.3.4-pre1
921===============================================================================
922Changes for patch v106
923
924- Fixed silly typo in drivers/scsi/st.c
925
926- Improved debugging in <devfs_register>
927===============================================================================
928Changes for patch v107
929
930- Added "diunlink" and "nokmod" boot options
931
932- Removed superfluous warning message in <devfs_d_iput>
933===============================================================================
934Changes for patch v108
935
936- Remove entries when unloading sound module
937===============================================================================
938Changes for patch v109
939
940- Ported to kernel 2.3.6-pre2
941===============================================================================
942Changes for patch v110
943
944- Took account of change to <d_alloc_root>
945===============================================================================
946Changes for patch v111
947
948- Created separate event queue for each mounted devfs
949
950- Removed <devfs_invalidate_dcache>
951
952- Created new ioctl()s for devfsd
953
954- Incremented devfsd protocol revision to 3
955
956- Fixed bug when re-creating directories: contents were lost
957
958- Block access to inodes until devfsd updates permissions
959===============================================================================
960Changes for patch v112
961
962- Modified patch so it applies against 2.3.5 and 2.3.6
963
964- Updated an email address in ChangeLog
965
966- Do not automatically change ownership/protection of /dev/tty<n>
967
968- Updated sample modules.conf
969
970- Switched to sending process uid/gid to devfsd
971
972- Renamed <call_kmod> to <try_modload>
973
974- Added DEVFSD_NOTIFY_LOOKUP event
975
976- Added DEVFSD_NOTIFY_CHANGE event
977
978- Added DEVFSD_NOTIFY_CREATE event
979
980- Incremented devfsd protocol revision to 4
981
982- Moved kernel-specific stuff to include/linux/devfs_fs_kernel.h
983===============================================================================
984Changes for patch v113
985
986- Ported to kernel 2.3.9
987
988- Restricted permissions on some block devices
989===============================================================================
990Changes for patch v114
991
992- Added support for /dev/netlink
993 Thanks to Dennis Hou <smilax@mindmeld.yi.org>
994
995- Return EISDIR rather than EINVAL for read(2) on directories
996
997- Ported to kernel 2.3.10
998===============================================================================
999Changes for patch v115
1000
1001- Added support for all remaining character devices
1002 Thanks to Dennis Hou <smilax@mindmeld.yi.org>
1003
1004- Cleaned up netlink support
1005===============================================================================
1006Changes for patch v116
1007
1008- Added support for /dev/parport%d
1009 Thanks to Tim Waugh <tim@cyberelk.demon.co.uk>
1010
1011- Fixed parallel port ATAPI tape driver
1012
1013- Fixed Atari SLM laser printer driver
1014===============================================================================
1015Changes for patch v117
1016
1017- Added support for COSA card
1018 Thanks to Dennis Hou <smilax@mindmeld.yi.org>
1019
1020- Fixed drivers/char/ppdev.c: missing #include <linux/init.h>
1021
1022- Fixed drivers/char/ftape/zftape/zftape-init.c
1023 Thanks to Vladimir Popov <mashgrad@usa.net>
1024===============================================================================
1025Changes for patch v118
1026
1027- Ported to kernel 2.3.15-pre3
1028
1029- Fixed bug in loop driver
1030
1031- Unregister /dev/lp%d entries in drivers/char/lp.c
1032 Thanks to Maciej W. Rozycki <macro@ds2.pg.gda.pl>
1033===============================================================================
1034Changes for patch v119
1035
1036- Ported to kernel 2.3.16
1037===============================================================================
1038Changes for patch v120
1039
1040- Fixed bug in drivers/scsi/scsi.c
1041
1042- Added /dev/ppp
1043 Thanks to Dennis Hou <smilax@mindmeld.yi.org>
1044
1045- Ported to kernel 2.3.17
1046===============================================================================
1047Changes for patch v121
1048
1049- Fixed bug in drivers/block/loop.c
1050
1051- Ported to kernel 2.3.18
1052===============================================================================
1053Changes for patch v122
1054
1055- Ported to kernel 2.3.19
1056===============================================================================
1057Changes for patch v123
1058
1059- Ported to kernel 2.3.20
1060===============================================================================
1061Changes for patch v124
1062
1063- Ported to kernel 2.3.21
1064===============================================================================
1065Changes for patch v125
1066
1067- Created <devfs_get_info>, <devfs_set_info>,
1068 <devfs_get_first_child> and <devfs_get_next_sibling>
1069 Added <<dir>> parameter to <devfs_register>, <devfs_mk_compat>,
1070 <devfs_mk_dir> and <devfs_find_handle>
1071 Work sponsored by SGI
1072
1073- Fixed apparent bug in COSA driver
1074
1075- Re-instated "scsihosts=" boot option
1076===============================================================================
1077Changes for patch v126
1078
1079- Always create /dev/pts if CONFIG_UNIX98_PTYS=y
1080
1081- Fixed call to <devfs_mk_dir> in drivers/block/ide-disk.c
1082 Thanks to Dennis Hou <smilax@mindmeld.yi.org>
1083
1084- Allow multiple unregistrations
1085
1086- Created /dev/scsi hierarchy
1087 Work sponsored by SGI
1088===============================================================================
1089Changes for patch v127
1090
1091Work sponsored by SGI
1092
1093- No longer disable devpts if devfs enabled (caveat emptor)
1094
1095- Added flags array to struct gendisk and removed code from
1096 drivers/scsi/sd.c
1097
1098- Created /dev/discs hierarchy
1099===============================================================================
1100Changes for patch v128
1101
1102Work sponsored by SGI
1103
1104- Created /dev/cdroms hierarchy
1105===============================================================================
1106Changes for patch v129
1107
1108Work sponsored by SGI
1109
1110- Removed compatibility entries for sound devices
1111
1112- Removed compatibility entries for printer devices
1113
1114- Removed compatibility entries for video4linux devices
1115
1116- Removed compatibility entries for parallel port devices
1117
1118- Removed compatibility entries for frame buffer devices
1119===============================================================================
1120Changes for patch v130
1121
1122Work sponsored by SGI
1123
1124- Added major and minor number to devfsd protocol
1125
1126- Incremented devfsd protocol revision to 5
1127
1128- Removed compatibility entries for SoundBlaster CD-ROMs
1129
1130- Removed compatibility entries for netlink devices
1131
1132- Removed compatibility entries for SCSI generic devices
1133
1134- Removed compatibility entries for SCSI tape devices
1135===============================================================================
1136Changes for patch v131
1137
1138Work sponsored by SGI
1139
1140- Support info pointer for all devfs entry types
1141
1142- Added <<info>> parameter to <devfs_mk_dir> and <devfs_mk_symlink>
1143
1144- Removed /dev/st hierarchy
1145
1146- Removed /dev/sg hierarchy
1147
1148- Removed compatibility entries for loop devices
1149
1150- Removed compatibility entries for IDE tape devices
1151
1152- Removed compatibility entries for SCSI CD-ROMs
1153
1154- Removed /dev/sr hierarchy
1155===============================================================================
1156Changes for patch v132
1157
1158Work sponsored by SGI
1159
1160- Removed compatibility entries for floppy devices
1161
1162- Removed compatibility entries for RAMDISCs
1163
1164- Removed compatibility entries for meta-devices
1165
1166- Removed compatibility entries for SCSI discs
1167
1168- Created <devfs_make_root>
1169
1170- Removed /dev/sd hierarchy
1171
1172- Support "../" when searching devfs namespace
1173
1174- Created /dev/ide/host* hierarchy
1175
1176- Supported IDE hard discs in /dev/ide/host* hierarchy
1177
1178- Removed compatibility entries for IDE discs
1179
1180- Removed /dev/ide/hd hierarchy
1181
1182- Supported IDE CD-ROMs in /dev/ide/host* hierarchy
1183
1184- Removed compatibility entries for IDE CD-ROMs
1185
1186- Removed /dev/ide/cd hierarchy
1187===============================================================================
1188Changes for patch v133
1189
1190Work sponsored by SGI
1191
1192- Created <devfs_get_unregister_slave>
1193
1194- Fixed bug in fs/partitions/check.c when rescanning
1195===============================================================================
1196Changes for patch v134
1197
1198Work sponsored by SGI
1199
1200- Removed /dev/sd, /dev/sr, /dev/st and /dev/sg directories
1201
1202- Removed /dev/ide/hd directory
1203
1204- Exported <devfs_get_parent>
1205
1206- Created <devfs_register_tape> and /dev/tapes hierarchy
1207
1208- Removed /dev/ide/mt hierarchy
1209
1210- Removed /dev/ide/fd hierarchy
1211
1212- Ported to kernel 2.3.25
1213===============================================================================
1214Changes for patch v135
1215
1216Work sponsored by SGI
1217
1218- Removed compatibility entries for virtual console capture devices
1219
1220- Removed unused <devfs_set_symlink_destination>
1221
1222- Removed compatibility entries for serial devices
1223
1224- Removed compatibility entries for console devices
1225
1226- Do not hide entries from devfsd or children
1227
1228- Removed DEVFS_FL_TTY_COMPAT flag
1229
1230- Removed "nottycompat" boot option
1231
1232- Removed <devfs_mk_compat>
1233===============================================================================
1234Changes for patch v136
1235
1236Work sponsored by SGI
1237
1238- Moved BSD pty devices to /dev/pty
1239
1240- Added DEVFS_FL_WAIT flag
1241===============================================================================
1242Changes for patch v137
1243
1244Work sponsored by SGI
1245
1246- Really fixed bug in fs/partitions/check.c when rescanning
1247
1248- Support new "disc" naming scheme in <get_removable_partition>
1249
1250- Allow NULL fops in <devfs_register>
1251
1252- Removed redundant name functions in SCSI disc and IDE drivers
1253===============================================================================
1254Changes for patch v138
1255
1256Work sponsored by SGI
1257
1258- Fixed old bugs in drivers/block/paride/pt.c, drivers/char/tpqic02.c,
1259 drivers/net/wan/cosa.c and drivers/scsi/scsi.c
1260 Thanks to Sergey Kubushin <ksi@ksi-linux.com>
1261
1262- Fall back to major table if NULL fops given to <devfs_register>
1263===============================================================================
1264Changes for patch v139
1265
1266Work sponsored by SGI
1267
1268- Corrected and moved <get_blkfops> and <get_chrfops> declarations
1269 from arch/alpha/kernel/osf_sys.c to include/linux/fs.h
1270
1271- Removed name function from struct gendisk
1272
1273- Updated devfs FAQ
1274===============================================================================
1275Changes for patch v140
1276
1277Work sponsored by SGI
1278
1279- Ported to kernel 2.3.27
1280===============================================================================
1281Changes for patch v141
1282
1283Work sponsored by SGI
1284
1285- Bug fix in arch/m68k/atari/joystick.c
1286
1287- Moved ISDN and capi devices to /dev/isdn
1288===============================================================================
1289Changes for patch v142
1290
1291Work sponsored by SGI
1292
1293- Bug fix in drivers/block/ide-probe.c (patch confusion)
1294===============================================================================
1295Changes for patch v143
1296
1297Work sponsored by SGI
1298
1299- Bug fix in drivers/block/blkpg.c:partition_name()
1300===============================================================================
1301Changes for patch v144
1302
1303Work sponsored by SGI
1304
1305- Ported to kernel 2.3.29
1306
1307- Removed calls to <devfs_register> from cdu31a, cm206, mcd and mcdx
1308 CD-ROM drivers: generic driver handles this now
1309
1310- Moved joystick devices to /dev/joysticks
1311===============================================================================
1312Changes for patch v145
1313
1314Work sponsored by SGI
1315
1316- Ported to kernel 2.3.30-pre3
1317
1318- Register whole-disc entry even for invalid partition tables
1319
1320- Fixed bug in mounting root FS when initrd enabled
1321
1322- Fixed device entry leak with IDE CD-ROMs
1323
1324- Fixed compile problem with drivers/isdn/isdn_common.c
1325
1326- Moved COSA devices to /dev/cosa
1327
1328- Support fifos when unregistering
1329
1330- Created <devfs_register_series> and used in many drivers
1331
1332- Moved Coda devices to /dev/coda
1333
1334- Moved parallel port IDE tapes to /dev/pt
1335
1336- Moved parallel port IDE generic devices to /dev/pg
1337===============================================================================
1338Changes for patch v146
1339
1340Work sponsored by SGI
1341
1342- Removed obsolete DEVFS_FL_COMPAT and DEVFS_FL_TOLERANT flags
1343
1344- Fixed compile problem with fs/coda/psdev.c
1345
1346- Reinstate change to <devfs_register_blkdev> in
1347 drivers/block/ide-probe.c now that fs/isofs/inode.c is fixed
1348
1349- Switched to <devfs_register_blkdev> in drivers/block/floppy.c,
1350 drivers/scsi/sr.c and drivers/block/md.c
1351
1352- Moved DAC960 devices to /dev/dac960
1353===============================================================================
1354Changes for patch v147
1355
1356Work sponsored by SGI
1357
1358- Ported to kernel 2.3.32-pre4
1359===============================================================================
1360Changes for patch v148
1361
1362Work sponsored by SGI
1363
1364- Removed kmod support: use devfsd instead
1365
1366- Moved miscellaneous character devices to /dev/misc
1367===============================================================================
1368Changes for patch v149
1369
1370Work sponsored by SGI
1371
1372- Ensure include/linux/joystick.h is OK for user-space
1373
1374- Improved debugging in <get_vfs_inode>
1375
1376- Ensure dentries created by devfsd will be cleaned up
1377===============================================================================
1378Changes for patch v150
1379
1380Work sponsored by SGI
1381
1382- Ported to kernel 2.3.34
1383===============================================================================
1384Changes for patch v151
1385
1386Work sponsored by SGI
1387
1388- Ported to kernel 2.3.35-pre1
1389
1390- Created <devfs_get_name>
1391===============================================================================
1392Changes for patch v152
1393
1394Work sponsored by SGI
1395
1396- Updated sample modules.conf
1397
1398- Ported to kernel 2.3.36-pre1
1399===============================================================================
1400Changes for patch v153
1401
1402Work sponsored by SGI
1403
1404- Ported to kernel 2.3.42
1405
1406- Removed <devfs_fill_file>
1407===============================================================================
1408Changes for patch v154
1409
1410Work sponsored by SGI
1411
1412- Took account of device number changes for /dev/fb*
1413===============================================================================
1414Changes for patch v155
1415
1416Work sponsored by SGI
1417
1418- Ported to kernel 2.3.43-pre8
1419
1420- Moved /dev/tty0 to /dev/vc/0
1421
1422- Moved sequence number formatting from <_tty_make_name> to drivers
1423===============================================================================
1424Changes for patch v156
1425
1426Work sponsored by SGI
1427
1428- Fixed breakage in drivers/scsi/sd.c due to recent SCSI changes
1429===============================================================================
1430Changes for patch v157
1431
1432Work sponsored by SGI
1433
1434- Ported to kernel 2.3.45
1435===============================================================================
1436Changes for patch v158
1437
1438Work sponsored by SGI
1439
1440- Ported to kernel 2.3.46-pre2
1441===============================================================================
1442Changes for patch v159
1443
1444Work sponsored by SGI
1445
1446- Fixed drivers/block/md.c
1447 Thanks to Mike Galbraith <mikeg@weiden.de>
1448
1449- Documentation fixes
1450
1451- Moved device registration from <lp_init> to <lp_register>
1452 Thanks to Tim Waugh <twaugh@redhat.com>
1453===============================================================================
1454Changes for patch v160
1455
1456Work sponsored by SGI
1457
1458- Fixed drivers/char/joystick/joystick.c
1459 Thanks to Vojtech Pavlik <vojtech@suse.cz>
1460
1461- Documentation updates
1462
1463- Fixed arch/i386/kernel/mtrr.c if procfs and devfs not enabled
1464
1465- Fixed drivers/char/stallion.c
1466===============================================================================
1467Changes for patch v161
1468
1469Work sponsored by SGI
1470
1471- Remove /dev/ide when ide-mod is unloaded
1472
1473- Fixed bug in drivers/block/ide-probe.c when secondary but no primary
1474
1475- Added DEVFS_FL_NO_PERSISTENCE flag
1476
1477- Used new DEVFS_FL_NO_PERSISTENCE flag for Unix98 pty slaves
1478
1479- Removed unnecessary call to <update_devfs_inode_from_entry> in
1480 <devfs_readdir>
1481
1482- Only set auto-ownership for /dev/pty/s*
1483===============================================================================
1484Changes for patch v162
1485
1486Work sponsored by SGI
1487
1488- Set inode->i_size to correct size for symlinks
1489 Thanks to Jeremy Fitzhardinge <jeremy@goop.org>
1490
1491- Only give lookup() method to directories to comply with new VFS
1492 assumptions
1493
1494- Remove unnecessary tests in symlink methods
1495
1496- Don't kill existing block ops in <devfs_read_inode>
1497
1498- Restore auto-ownership for /dev/pty/m*
1499===============================================================================
1500Changes for patch v163
1501
1502Work sponsored by SGI
1503
1504- Don't create missing directories in <devfs_find_handle>
1505
1506- Removed Documentation/filesystems/devfs/mk-devlinks
1507
1508- Updated Documentation/filesystems/devfs/README
1509===============================================================================
1510Changes for patch v164
1511
1512Work sponsored by SGI
1513
1514- Fixed CONFIG_DEVFS breakage in drivers/char/serial.c introduced in
1515 linux-2.3.99-pre6-7
1516===============================================================================
1517Changes for patch v165
1518
1519Work sponsored by SGI
1520
1521- Ported to kernel 2.3.99-pre6
1522===============================================================================
1523Changes for patch v166
1524
1525Work sponsored by SGI
1526
1527- Added CONFIG_DEVFS_MOUNT
1528===============================================================================
1529Changes for patch v167
1530
1531Work sponsored by SGI
1532
1533- Updated Documentation/filesystems/devfs/README
1534
1535- Updated sample modules.conf
1536===============================================================================
1537Changes for patch v168
1538
1539Work sponsored by SGI
1540
1541- Disabled multi-mount capability (use VFS bindings instead)
1542
1543- Updated README from master HTML file
1544===============================================================================
1545Changes for patch v169
1546
1547Work sponsored by SGI
1548
1549- Removed multi-mount code
1550
1551- Removed compatibility macros: VFS has changed too much
1552===============================================================================
1553Changes for patch v170
1554
1555Work sponsored by SGI
1556
1557- Updated README from master HTML file
1558
1559- Merged devfs inode into devfs entry
1560===============================================================================
1561Changes for patch v171
1562
1563Work sponsored by SGI
1564
1565- Updated sample modules.conf
1566
1567- Removed dead code in <devfs_register> which used to call
1568 <free_dentries>
1569
1570- Ported to kernel 2.4.0-test2-pre3
1571===============================================================================
1572Changes for patch v172
1573
1574Work sponsored by SGI
1575
1576- Changed interface to <devfs_register>
1577
1578- Changed interface to <devfs_register_series>
1579===============================================================================
1580Changes for patch v173
1581
1582Work sponsored by SGI
1583
1584- Simplified interface to <devfs_mk_symlink>
1585
1586- Simplified interface to <devfs_mk_dir>
1587
1588- Simplified interface to <devfs_find_handle>
1589===============================================================================
1590Changes for patch v174
1591
1592Work sponsored by SGI
1593
1594- Updated README from master HTML file
1595===============================================================================
1596Changes for patch v175
1597
1598Work sponsored by SGI
1599
1600- DocBook update for fs/devfs/base.c
1601 Thanks to Tim Waugh <twaugh@redhat.com>
1602
1603- Removed stale fs/tunnel.c (was never used or completed)
1604===============================================================================
1605Changes for patch v176
1606
1607Work sponsored by SGI
1608
1609- Updated ToDo list
1610
1611- Removed sample modules.conf: now distributed with devfsd
1612
1613- Updated README from master HTML file
1614
1615- Ported to kernel 2.4.0-test3-pre4 (which had devfs-patch-v174)
1616===============================================================================
1617Changes for patch v177
1618
1619- Updated README from master HTML file
1620
1621- Documentation cleanups
1622
1623- Ensure <devfs_generate_path> terminates string for root entry
1624 Thanks to Tim Jansen <tim@tjansen.de>
1625
1626- Exported <devfs_get_name> to modules
1627
1628- Make <devfs_mk_symlink> send events to devfsd
1629
1630- Cleaned up option processing in <devfs_setup>
1631
1632- Fixed bugs in handling symlinks: could leak or cause Oops
1633
1634- Cleaned up directory handling by separating fops
1635 Thanks to Alexander Viro <viro@parcelfarce.linux.theplanet.co.uk>
1636===============================================================================
1637Changes for patch v178
1638
1639- Fixed handling of inverted options in <devfs_setup>
1640===============================================================================
1641Changes for patch v179
1642
1643- Adjusted <try_modload> to account for <devfs_generate_path> fix
1644===============================================================================
1645Changes for patch v180
1646
1647- Fixed !CONFIG_DEVFS_FS stub declaration of <devfs_get_info>
1648===============================================================================
1649Changes for patch v181
1650
1651- Answered question posed by Al Viro and removed his comments from <devfs_open>
1652
1653- Moved setting of registered flag after other fields are changed
1654
1655- Fixed race between <devfsd_close> and <devfsd_notify_one>
1656
1657- Global VFS changes added bogus BKL to devfsd_close(): removed
1658
1659- Widened locking in <devfs_readlink> and <devfs_follow_link>
1660
1661- Replaced <devfsd_read> stack usage with <devfsd_ioctl> kmalloc
1662
1663- Simplified locking in <devfsd_ioctl> and fixed memory leak
1664===============================================================================
1665Changes for patch v182
1666
1667- Created <devfs_*alloc_major> and <devfs_*alloc_devnum>
1668
1669- Removed broken devnum allocation and use <devfs_alloc_devnum>
1670
1671- Fixed old devnum leak by calling new <devfs_dealloc_devnum>
1672
1673- Created <devfs_*alloc_unique_number>
1674
1675- Fixed number leak for /dev/cdroms/cdrom%d
1676
1677- Fixed number leak for /dev/discs/disc%d
1678===============================================================================
1679Changes for patch v183
1680
1681- Fixed bug in <devfs_setup> which could hang boot process
1682===============================================================================
1683Changes for patch v184
1684
1685- Documentation typo fix for fs/devfs/util.c
1686
1687- Fixed drivers/char/stallion.c for devfs
1688
1689- Added DEVFSD_NOTIFY_DELETE event
1690
1691- Updated README from master HTML file
1692
1693- Removed #include <asm/segment.h> from fs/devfs/base.c
1694===============================================================================
1695Changes for patch v185
1696
1697- Made <block_semaphore> and <char_semaphore> in fs/devfs/util.c
1698 private
1699
1700- Fixed inode table races by removing it and using inode->u.generic_ip
1701 instead
1702
1703- Moved <devfs_read_inode> into <get_vfs_inode>
1704
1705- Moved <devfs_write_inode> into <devfs_notify_change>
1706===============================================================================
1707Changes for patch v186
1708
1709- Fixed race in <devfs_do_symlink> for uni-processor
1710
1711- Updated README from master HTML file
1712===============================================================================
1713Changes for patch v187
1714
1715- Fixed drivers/char/stallion.c for devfs
1716
1717- Fixed drivers/char/rocket.c for devfs
1718
1719- Fixed bug in <devfs_alloc_unique_number>: limited to 128 numbers
1720===============================================================================
1721Changes for patch v188
1722
1723- Updated major masks in fs/devfs/util.c up to Linus' "no new majors"
1724 proclamation. Block: were 126 now 122 free, char: were 26 now 19 free
1725
1726- Updated README from master HTML file
1727
1728- Removed remnant of multi-mount support in <devfs_mknod>
1729
1730- Removed unused DEVFS_FL_SHOW_UNREG flag
1731===============================================================================
1732Changes for patch v189
1733
1734- Removed nlink field from struct devfs_inode
1735
1736- Removed auto-ownership for /dev/pty/* (BSD ptys) and used
1737 DEVFS_FL_CURRENT_OWNER|DEVFS_FL_NO_PERSISTENCE for /dev/pty/s* (just
1738 like Unix98 pty slaves) and made /dev/pty/m* rw-rw-rw- access
1739===============================================================================
1740Changes for patch v190
1741
1742- Updated README from master HTML file
1743
1744- Replaced BKL with global rwsem to protect symlink data (quick and
1745 dirty hack)
1746===============================================================================
1747Changes for patch v191
1748
1749- Replaced global rwsem for symlink with per-link refcount
1750===============================================================================
1751Changes for patch v192
1752
1753- Removed unnecessary #ifdef CONFIG_DEVFS_FS from arch/i386/kernel/mtrr.c
1754
1755- Ported to kernel 2.4.10-pre11
1756
1757- Set inode->i_mapping->a_ops for block nodes in <get_vfs_inode>
1758===============================================================================
1759Changes for patch v193
1760
1761- Went back to global rwsem for symlinks (refcount scheme no good)
1762===============================================================================
1763Changes for patch v194
1764
1765- Fixed overrun in <devfs_link> by removing function (not needed)
1766
1767- Updated README from master HTML file
1768===============================================================================
1769Changes for patch v195
1770
1771- Fixed buffer underrun in <try_modload>
1772
1773- Moved down_read() from <search_for_entry_in_dir> to <find_entry>
1774===============================================================================
1775Changes for patch v196
1776
1777- Fixed race in <devfsd_ioctl> when setting event mask
1778 Thanks to Kari Hurtta <hurtta@leija.mh.fmi.fi>
1779
1780- Avoid deadlock in <devfs_follow_link> by using temporary buffer
1781===============================================================================
1782Changes for patch v197
1783
1784- First release of new locking code for devfs core (v1.0)
1785
1786- Fixed bug in drivers/cdrom/cdrom.c
1787===============================================================================
1788Changes for patch v198
1789
1790- Discard temporary buffer, now use "%s" for dentry names
1791
1792- Don't generate path in <try_modload>: use fake entry instead
1793
1794- Use "existing" directory in <_devfs_make_parent_for_leaf>
1795
1796- Use slab cache rather than fixed buffer for devfsd events
1797===============================================================================
1798Changes for patch v199
1799
1800- Removed obsolete usage of DEVFS_FL_NO_PERSISTENCE
1801
1802- Send DEVFSD_NOTIFY_REGISTERED events in <devfs_mk_dir>
1803
1804- Fixed locking bug in <devfs_d_revalidate_wait> due to typo
1805
1806- Do not send CREATE, CHANGE, ASYNC_OPEN or DELETE events from devfsd
1807 or children
1808===============================================================================
1809Changes for patch v200
1810
1811- Ported to kernel 2.5.1-pre2
1812===============================================================================
1813Changes for patch v201
1814
1815- Fixed bug in <devfsd_read>: was dereferencing freed pointer
1816===============================================================================
1817Changes for patch v202
1818
1819- Fixed bug in <devfsd_close>: was dereferencing freed pointer
1820
1821- Added process group check for devfsd privileges
1822===============================================================================
1823Changes for patch v203
1824
1825- Use SLAB_ATOMIC in <devfsd_notify_de> from <devfs_d_delete>
1826===============================================================================
1827Changes for patch v204
1828
1829- Removed long obsolete rc.devfs
1830
1831- Return old entry in <devfs_mk_dir> for 2.4.x kernels
1832
1833- Updated README from master HTML file
1834
1835- Increment refcount on module in <check_disc_changed>
1836
1837- Created <devfs_get_handle> and exported <devfs_put>
1838
1839- Increment refcount on module in <devfs_get_ops>
1840
1841- Created <devfs_put_ops> and used where needed to fix races
1842
1843- Added clarifying comments in response to preliminary EMC code review
1844
1845- Added poisoning to <devfs_put>
1846
1847- Improved debugging messages
1848
1849- Fixed unregister bugs in drivers/md/lvm-fs.c
1850===============================================================================
1851Changes for patch v205
1852
1853- Corrected (made useful) debugging message in <unregister>
1854
1855- Moved <kmem_cache_create> in <mount_devfs_fs> to <init_devfs_fs>
1856
1857- Fixed drivers/md/lvm-fs.c to create "lvm" entry
1858
1859- Added magic number to guard against scribbling drivers
1860
1861- Only return old entry in <devfs_mk_dir> if a directory
1862
1863- Defined macros for error and debug messages
1864
1865- Updated README from master HTML file
1866===============================================================================
1867Changes for patch v206
1868
1869- Added support for multiple Compaq cpqarray controllers
1870
1871- Fixed (rare, old) race in <devfs_lookup>
1872===============================================================================
1873Changes for patch v207
1874
1875- Fixed deadlock bug in <devfs_d_revalidate_wait>
1876
1877- Tag VFS deletable in <devfs_mk_symlink> if handle ignored
1878
1879- Updated README from master HTML file
1880===============================================================================
1881Changes for patch v208
1882
1883- Added KERN_* to remaining messages
1884
1885- Cleaned up declaration of <stat_read>
1886
1887- Updated README from master HTML file
1888===============================================================================
1889Changes for patch v209
1890
1891- Updated README from master HTML file
1892
1893- Removed silently introduced calls to lock_kernel() and
1894 unlock_kernel() due to recent VFS locking changes. BKL isn't
1895 required in devfs
1896
1897- Changed <devfs_rmdir> to allow later additions if not yet empty
1898
1899- Added calls to <devfs_register_partitions> in drivers/block/blkpg.c
1900 <add_partition> and <del_partition>
1901
1902- Fixed bug in <devfs_alloc_unique_number>: was clearing beyond
1903 bitfield
1904
1905- Fixed bitfield data type for <devfs_*alloc_devnum>
1906
1907- Made major bitfield type and initialiser 64 bit safe
1908===============================================================================
1909Changes for patch v210
1910
1911- Updated fs/devfs/util.c to fix shift warning on 64 bit machines
1912 Thanks to Anton Blanchard <anton@samba.org>
1913
1914- Updated README from master HTML file
1915===============================================================================
1916Changes for patch v211
1917
1918- Do not put miscellaneous character devices in /dev/misc if they
1919 specify their own directory (i.e. contain a '/' character)
1920
1921- Copied macro for error messages from fs/devfs/base.c to
1922 fs/devfs/util.c and made use of this macro
1923
1924- Removed 2.4.x compatibility code from fs/devfs/base.c
1925===============================================================================
1926Changes for patch v212
1927
1928- Added BKL to <devfs_open> because drivers still need it
1929===============================================================================
1930Changes for patch v213
1931
1932- Protected <scan_dir_for_removable> and <get_removable_partition>
1933 from changing directory contents
1934===============================================================================
1935Changes for patch v214
1936
1937- Switched to ISO C structure field initialisers
1938
1939- Switch to set_current_state() and move before add_wait_queue()
1940
1941- Updated README from master HTML file
1942
1943- Fixed devfs entry leak in <devfs_readdir> when *readdir fails
1944===============================================================================
1945Changes for patch v215
1946
1947- Created <devfs_find_and_unregister>
1948
1949- Switched many functions from <devfs_find_handle> to
1950 <devfs_find_and_unregister>
1951
1952- Switched many functions from <devfs_find_handle> to <devfs_get_handle>
1953===============================================================================
1954Changes for patch v216
1955
1956- Switched arch/ia64/sn/io/hcl.c from <devfs_find_handle> to
1957 <devfs_get_handle>
1958
1959- Removed deprecated <devfs_find_handle>
1960===============================================================================
1961Changes for patch v217
1962
1963- Exported <devfs_find_and_unregister> and <devfs_only> to modules
1964
1965- Updated README from master HTML file
1966
1967- Fixed module unload race in <devfs_open>
1968===============================================================================
1969Changes for patch v218
1970
1971- Removed DEVFS_FL_AUTO_OWNER flag
1972
1973- Switched lingering structure field initialiser to ISO C
1974
1975- Added locking when setting/clearing flags
1976
1977- Documentation fix in fs/devfs/util.c
diff --git a/Documentation/filesystems/devfs/README b/Documentation/filesystems/devfs/README
deleted file mode 100644
index aabfba24bc2e..000000000000
--- a/Documentation/filesystems/devfs/README
+++ /dev/null
@@ -1,1959 +0,0 @@
1Devfs (Device File System) FAQ
2
3
4Linux Devfs (Device File System) FAQ
5Richard Gooch
620-AUG-2002
7
8
9Document languages:
10
11
12
13
14
15
16
17-----------------------------------------------------------------------------
18
19NOTE: the master copy of this document is available online at:
20
21http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html
22and looks much better than the text version distributed with the
23kernel sources. A mirror site is available at:
24
25http://www.ras.ucalgary.ca/~rgooch/linux/docs/devfs.html
26
27There is also an optional daemon that may be used with devfs. You can
28find out more about it at:
29
30http://www.atnf.csiro.au/~rgooch/linux/
31
32A mailing list is available which you may subscribe to. Send
33email
34to majordomo@oss.sgi.com with the following line in the
35body of the message:
36subscribe devfs
37To unsubscribe, send the message body:
38unsubscribe devfs
39instead. The list is archived at
40
41http://oss.sgi.com/projects/devfs/archive/.
42
43-----------------------------------------------------------------------------
44
45Contents
46
47
48What is it?
49
50Why do it?
51
52Who else does it?
53
54How it works
55
56Operational issues (essential reading)
57
58Instructions for the impatient
59Permissions persistence across reboots
60Dealing with drivers without devfs support
61All the way with Devfs
62Other Issues
63Kernel Naming Scheme
64Devfsd Naming Scheme
65Old Compatibility Names
66SCSI Host Probing Issues
67
68
69
70Device drivers currently ported
71
72Allocation of Device Numbers
73
74Questions and Answers
75
76Making things work
77Alternatives to devfs
78What I don't like about devfs
79How to report bugs
80Strange kernel messages
81Compilation problems with devfsd
82
83
84Other resources
85
86Translations of this document
87
88
89-----------------------------------------------------------------------------
90
91
92What is it?
93
94Devfs is an alternative to "real" character and block special devices
95on your root filesystem. Kernel device drivers can register devices by
96name rather than major and minor numbers. These devices will appear in
97devfs automatically, with whatever default ownership and
98protection the driver specified. A daemon (devfsd) can be used to
99override these defaults. Devfs has been in the kernel since 2.3.46.
100
101NOTE that devfs is entirely optional. If you prefer the old
102disc-based device nodes, then simply leave CONFIG_DEVFS_FS=n (the
103default). In this case, nothing will change. ALSO NOTE that if you do
104enable devfs, the defaults are such that full compatibility is
105maintained with the old device names.
106
107There are two aspects to devfs: one is the underlying device
108namespace, which is a namespace just like any mounted filesystem. The
109other aspect is the filesystem code which provides a view of the
110device namespace. The reason I make a distinction is because devfs
111can be mounted many times, with each mount showing the same device
112namespace. Changes made are global to all mounted devfs filesystems.
113Also, because the devfs namespace exists without any devfs mounts, you
114can easily mount the root filesystem by referring to an entry in the
115devfs namespace.
116
117
118The cost of devfs is a small increase in kernel code size and memory
119usage. About 7 pages of code (some of that in __init sections) and 72
120bytes for each entry in the namespace. A modest system has only a
121couple of hundred device entries, so this costs a few more
122pages. Compare this with the suggestion to put /dev on a
123ramdisc.
124
125On a typical machine, the cost is under 0.2 percent. On a modest
126system with 64 MBytes of RAM, the cost is under 0.1 percent. The
127accusations of "bloatware" levelled at devfs are not justified.
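
The figures above can be checked with a short sketch. It assumes a
4096-byte page (the page size is not stated above) and takes "a couple
of hundred" to mean 200 entries. The function names are made up for
illustration.

```python
PAGE_SIZE = 4096        # assumption: 4 KiB pages, as on x86
CODE_PAGES = 7          # "about 7 pages of code"
BYTES_PER_ENTRY = 72    # "72 bytes for each entry in the namespace"


def devfs_cost_bytes(entries, page_size=PAGE_SIZE):
    """Approximate memory cost of devfs for a given number of entries."""
    return CODE_PAGES * page_size + BYTES_PER_ENTRY * entries


def cost_percent(entries, ram_bytes):
    """Cost as a percentage of total RAM."""
    return 100.0 * devfs_cost_bytes(entries) / ram_bytes
```

Under these assumptions, 200 entries cost 43072 bytes, which is about
0.064 percent of 64 MBytes of RAM.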
128
129-----------------------------------------------------------------------------
130
131
132Why do it?
133
134There are several problems that devfs addresses. Some of these
135problems are more serious than others (depending on your point of
136view), and some can be solved without devfs. However, the totality of
137these problems really calls out for devfs.
138
139The choice is between a patchwork of inefficient user-space solutions,
140which are complex and likely to be fragile, and a simple and efficient
141devfs which is robust.
142
143There have been many counter-proposals to devfs, all seeking to
144provide some of the benefits without actually implementing devfs. So
145far there has been an absence of code and no proposed alternative has
146been able to provide all the features that devfs does. Further,
147alternative proposals require far more complexity in user-space (and
148still deliver less functionality than devfs). Some people have the
149mantra of reducing "kernel bloat", but don't consider the effects on
150user-space.
151
152A good solution limits the total complexity of kernel-space and
153user-space.
154
155
156Major&minor allocation
157
158The existing scheme requires the allocation of major and minor device
159numbers for each and every device. This means that a central
160co-ordinating authority is required to issue these device numbers
161(unless you're developing a "private" device driver), in order to
162preserve uniqueness. Devfs shifts the burden to a namespace. This may
163not seem like a huge benefit, but actually it is. Since driver authors
164will naturally choose a device name which reflects the functionality
165of the device, there is far less potential for namespace conflict.
166Solving this requires a kernel change.
167
168/dev management
169
170Because you currently access devices through device nodes, these must
171be created by the system administrator. For standard devices you can
172usually find a MAKEDEV programme which creates all these (hundreds!)
173of nodes. This means that changes in the kernel must be reflected by
174changes in the MAKEDEV programme, or else the system administrator
175creates device nodes by hand.
176
177The basic problem is that there are two separate databases of
178major and minor numbers. One is in the kernel and one is in /dev (or
179in a MAKEDEV programme, if you want to look at it that way). This is
180duplication of information, which is not good practice.
181Solving this requires a kernel change.
182
183/dev growth
184
185A typical /dev has over 1200 nodes! Most of these devices simply don't
186exist because the hardware is not available. A huge /dev increases the
187time to access devices (I'm just referring to the dentry lookup times
188and the time taken to read inodes off disc: the next subsection shows
189some more horrors).
190
191An example of how big /dev can grow is if we consider SCSI devices:
192
193host 6 bits (say up to 64 hosts on a really big machine)
194channel 4 bits (say up to 16 SCSI buses per host)
195id 4 bits
196lun 3 bits
197partition 6 bits
198TOTAL 23 bits
199
200
201This requires 8 Mega (1024*1024) inodes if we want to store all
202possible device nodes. Even if we scrap everything but id and partition
203and assume a single host adapter with a single SCSI bus and only one
204logical unit per SCSI target (id), that's still 10 bits or 1024
205inodes. Each VFS inode takes around 256 bytes (kernel 2.1.78), so
206that's 256 kBytes of inode storage on disc (assuming real inodes take
207a similar amount of space as VFS inodes). This is actually not so bad,
208because disc is cheap these days. Embedded systems would care about
209256 kBytes of /dev inodes, but you could argue that embedded systems
210would have hand-tuned /dev directories. I've had to do just that on my
211embedded systems, but I would rather just leave it to devfs.
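
The SCSI arithmetic above can be reproduced with a few lines. This is
only a sketch: the field widths and the 256-byte inode size are taken
from the text, while the function names are invented for illustration.

```python
# Field widths in bits, as listed in the table above.
SCSI_FIELD_BITS = {"host": 6, "channel": 4, "id": 4, "lun": 3, "partition": 6}
VFS_INODE_BYTES = 256  # "around 256 bytes (kernel 2.1.78)"


def node_count(fields):
    """Number of distinct device nodes addressable by the given fields."""
    return 2 ** sum(SCSI_FIELD_BITS[name] for name in fields)


def inode_storage_bytes(fields):
    """Storage needed if every addressable node had its own inode."""
    return node_count(fields) * VFS_INODE_BYTES
```

Using all five fields gives 2**23 = 8388608 nodes. Keeping only id and
partition gives 1024 nodes, or 262144 bytes (256 kBytes) of inodes.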
212
213Another issue is the time taken to look up an inode when it is first
214referenced. This includes not only the time spent scanning a list in
215memory, but also the seek times to read the inodes off disc.
216This could be solved in user-space using a clever programme which
217scanned the kernel logs and deleted /dev entries which are not
218available and created them when they were available. This programme
219would need to be run every time a new module was loaded, which would
220slow things down a lot.
221
222There is an existing programme called scsidev which will automatically
223create device nodes for SCSI devices. It can do this by scanning files
224in /proc/scsi. Unfortunately, to extend this idea to other device
225nodes would require significant modifications to existing drivers (so
226they too would provide information in /proc). This is a non-trivial
227change (I should know: devfs has had to do something similar). Once
228you go to this much effort, you may as well use devfs itself (which
229also provides this information). Furthermore, such a system would
230likely be implemented in an ad-hoc fashion, as different drivers will
231provide their information in different ways.
232
233Devfs is much cleaner, because it (naturally) has a uniform mechanism
234to provide this information: the device nodes themselves!
235
236
237Node to driver file_operations translation
238
239There is an important difference between the way disc-based character
240and block nodes and devfs entries make the connection between an entry
241in /dev and the actual device driver.
242
243With the current 8 bit major and minor numbers the connection between
244disc-based c&b nodes and per-major drivers is done through a
245fixed-length table of 128 entries. The various filesystem types set
246the inode operations for c&b nodes to {chr,blk}dev_inode_operations,
247so when a device is opened a few quick levels of indirection bring us
248to the driver file_operations.
249
250For miscellaneous character devices a second step is required: there
251is a scan for the driver entry with the same minor number as the file
252that was opened, and the appropriate minor open method is called. This
253scanning is done *every time* you open a device node. Potentially, you
254may be searching through dozens of misc. entries before you find your
255open method. While not an enormous performance overhead, this does
256seem pointless.
257
258Linux *must* move beyond the 8 bit major and minor barrier,
259somehow. If we simply increase each to 16 bits, then the indexing
260scheme used for major driver lookup becomes untenable, because the
261major tables (one each for character and block devices) would need to
262be 64 k entries long (512 kBytes on x86, 1 MByte for 64 bit
263systems). So we would have to use a scheme like that used for
264miscellaneous character devices, which means the search time goes up
265linearly with the average number of major device drivers on your
266system. Not all "devices" are hardware, some are higher-level drivers
267like KGI, so you can get more "devices" without adding hardware.
268You can improve this by creating an ordered (balanced:-)
269binary tree, in which case your search time becomes log(N).
270Alternatively, you can use hashing to speed up the search.
271But why do that search at all if you don't have to? Once again, it
272seems pointless.
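
The table sizes quoted above can be derived as follows. This sketch
assumes one pointer per table entry (4 bytes on x86, 8 bytes on 64 bit
systems) and that the quoted figures cover both the character and the
block table. Neither assumption is spelled out in the text.

```python
def major_table_bytes(major_bits, pointer_bytes, tables=2):
    """Size of directly indexed major tables (one each for char and block)."""
    return tables * (2 ** major_bits) * pointer_bytes
```

With 16 bit majors this gives 524288 bytes (512 kBytes) for 4-byte
pointers and 1048576 bytes (1 MByte) for 8-byte pointers.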
273
274Note that devfs doesn't use the major&minor system. For devfs
275entries, the connection is done when you lookup the /dev entry. When
276devfs_register() is called, an internal table is appended which has
277the entry name and the file_operations. If the dentry cache doesn't
278have the /dev entry already, this internal table is scanned to get the
279file_operations, and an inode is created. If the dentry cache already
280has the entry, there is *no lookup time* (other than the dentry scan
281itself, but we can't avoid that anyway, and besides Linux dentries
282cream other OS's which don't have them:-). Furthermore, the number of
283node entries in a devfs is only the number of available device
284entries, not the number of *conceivable* entries. Even if you remove
285unnecessary entries in a disc-based /dev, the number of conceivable
286entries remains the same: you just limit yourself in order to save
287space.
288
289Devfs provides a fast connection between a VFS node and the device
290driver, in a scalable way.
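
The lookup path described above can be modelled in user space. This is
a toy model only, not the kernel code: the class, its attributes and
the use of a dictionary as a stand-in for the dentry cache are all
assumptions made for illustration.

```python
class ToyDevfs:
    """User-space toy model of the devfs lookup path."""

    def __init__(self):
        self.table = []       # (name, file_operations) pairs, appended on register
        self.dcache = {}      # stands in for the dentry cache
        self.table_scans = 0  # how often the internal table had to be scanned

    def register(self, name, fops):
        """Model of devfs_register(): append to the internal table."""
        self.table.append((name, fops))

    def lookup(self, name):
        """Return the file_operations for name, or None if not registered."""
        if name in self.dcache:           # already cached: no table scan
            return self.dcache[name]
        self.table_scans += 1
        for entry_name, fops in self.table:
            if entry_name == name:
                self.dcache[name] = fops  # corresponds to creating the inode
                return fops
        return None
```

Looking up the same name twice scans the table only once. The second
lookup is served from the cache.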
291
292/dev as a system administration tool
293
294Right now /dev contains a list of conceivable devices, most of which I
295don't have. Devfs only shows those devices available on my
296system. This means that listing /dev is a handy way of checking what
297devices are available.
298
299Major&minor size
300
301Existing major and minor numbers are limited to 8 bits each. This is
302now a limiting factor for some drivers, particularly the SCSI disc
303driver, which consumes a single major number. Only 16 discs are
304supported, and each disc may have only 15 partitions. Maybe this isn't
305a problem for you, but some of us are building huge Linux systems with
306disc arrays. With devfs an arbitrary pointer can be associated with
307each device entry, which can be used to give an effective 32 bit
308device identifier (i.e. that's like having a 32 bit minor
309number). Since this is private to the kernel, there are no C library
310compatibility issues which you would have with increasing major and
311minor number sizes. See the section on "Allocation of Device Numbers"
312for details on maintaining compatibility with userspace.
313
314Solving this requires a kernel change.
315
316Since writing this, the kernel has been modified so that the SCSI disc
317driver has more major numbers allocated to it and now supports up to
318128 discs. Since these major numbers are non-contiguous (a result of
319unplanned expansion), the implementation is a little more cumbersome
320than originally.
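
The disc limits mentioned above follow from the 8 bit minor number.
The sketch below assumes each disc takes a block of 16 minors (the
whole disc plus 15 partitions) and that 128 discs correspond to 8
major numbers. The function name is hypothetical.

```python
MINOR_BITS = 8  # "limited to 8 bits each"


def discs_supported(majors, minors_per_disc=16):
    """Discs addressable when each disc takes a fixed block of minors."""
    return majors * (2 ** MINOR_BITS // minors_per_disc)
```

One major number yields 16 discs, and 8 major numbers yield 128.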
321
322Just like the changes to IPv4 to fix impending limitations in the
323address space, people find ways around the limitations. In the long
324run, however, solutions like IPv6 or devfs can't be put off forever.
325
326Read-only root filesystem
327
328Having your device nodes on the root filesystem means that you can't
329operate properly with a read-only root filesystem. This is because you
330want to change ownerships and protections of tty devices. Existing
331practice prevents you using a CD-ROM as your root filesystem for a
332*real* system. Sure, you can boot off a CD-ROM, but you can't change
333tty ownerships, so it's only good for installing.
334
335Also, you can't use a shared NFS root filesystem for a cluster of
336discless Linux machines (having tty ownerships changed on a common
337/dev is not good). Nor can you embed your root filesystem in a
338ROM-FS.
339
340You can get around this by creating a RAMDISC at boot time, making
341an ext2 filesystem in it, mounting it somewhere and copying the
342contents of /dev into it, then unmounting it and mounting it over
343/dev.
344
345A devfs is a cleaner way of solving this.
346
347Non-Unix root filesystem
348
349Non-Unix filesystems (such as NTFS) can't be used for a root
350filesystem because they variously don't support character and block
351special files or symbolic links. You can't have a separate disc-based
352or RAMDISC-based filesystem mounted on /dev because you need device
353nodes before you can mount these. Devfs can be mounted without any
354device nodes. Devlinks won't work because symlinks aren't supported.
355An alternative solution is to use initrd to mount a RAMDISC initial
356root filesystem (which is populated with a minimal set of device
357nodes), and then construct a new /dev in another RAMDISC, and finally
358switch to your non-Unix root filesystem. This requires clever boot
359scripts and a fragile and conceptually complex boot procedure.
360
361Devfs solves this in a robust and conceptually simple way.
362
363PTY security
364
365Current pseudo-tty (pty) devices are owned by root and read-writable
366by everyone. The user of a pty-pair cannot change
367ownership/protections without being suid-root.
368
369This could be solved with a secure user-space daemon which runs as
370root and does the actual creation of pty-pairs. Such a daemon would
371require modification to *every* programme that wants to use this new
372mechanism. It also slows down creation of pty-pairs.
373
374An alternative is to create a new open_pty() syscall which does much
375the same thing as the user-space daemon. Once again, this requires
376modifications to pty-handling programmes.
377
378The devfs solution allows a device driver to "tag" certain device
379files so that when an unopened device is opened, the ownerships are
380changed to the current euid and egid of the opening process, and the
381protections are changed to the default registered by the driver. When
382the device is closed ownership is set back to root and protections are
383set back to read-write for everybody. No programme need be changed.
384The devpts filesystem provides this auto-ownership feature for Unix98
385ptys. It doesn't support old-style pty devices, nor does it have all
386the other features of devfs.
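
The auto-ownership behaviour described above amounts to a small state
machine. The following is a user-space model under stated assumptions:
the class and attribute names are invented, and the default mode
passed in stands for whatever the driver registered.

```python
class AutoOwnedNode:
    """Toy model of a device node tagged for auto-ownership."""

    def __init__(self, default_mode):
        self.default_mode = default_mode  # protections registered by the driver
        self.uid, self.gid, self.mode = 0, 0, 0o666
        self.open_count = 0

    def open(self, euid, egid):
        # Only the first open of an unopened device changes ownership.
        if self.open_count == 0:
            self.uid, self.gid, self.mode = euid, egid, self.default_mode
        self.open_count += 1

    def close(self):
        self.open_count -= 1
        # On last close: back to root, read-write for everybody.
        if self.open_count == 0:
            self.uid, self.gid, self.mode = 0, 0, 0o666
```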
387
388Intelligent device management
389
390Devfs implements a simple yet powerful protocol for communication with
391a device management daemon (devfsd) which runs in user space. It is
392possible to send a message (either synchronously or asynchronously) to
393devfsd on any event, such as registration/unregistration of device
394entries, opening and closing devices, looking up inodes, scanning
395directories and more. This has many possibilities. Some of these are
396already implemented. See:
397
398
399http://www.atnf.csiro.au/~rgooch/linux/
400
401Device entry registration events can be used by devfsd to change
402permissions of newly-created device nodes. This is one mechanism to
403control device permissions.
404
405Device entry registration/unregistration events can be used to run
406programmes or scripts. This can be used to provide automatic mounting
407of filesystems when a new block device medium is inserted into the
408drive.
409
410Asynchronous device open and close events can be used to implement
411clever permissions management. For example, the default permissions on
412/dev/dsp do not allow everybody to read from the device. This is
413sensible, as you don't want some remote user recording what you say at
414your console. However, the console user is also prevented from
415recording. This behaviour is not desirable. With asynchronous device
416open and close events, you can have devfsd run a programme or script
417when console devices are opened to change the ownerships for *other*
418device nodes (such as /dev/dsp). On closure, you can run a different
419script to restore permissions. An advantage of this scheme over
420modifying the C library tty handling is that this works even if your
421programme crashes (how many times have you seen the utmp database with
422lingering entries for non-existent logins?).
423
424Synchronous device open events can be used to perform intelligent
425device access protections. Before the device driver open() method is
426called, the daemon must first validate the open attempt, by running an
427external programme or script. This is far more flexible than access
428control lists, as access can be determined on the basis of other
429system conditions instead of just the UID and GID.
430
431Inode lookup events can be used to authenticate module autoload
432requests. Instead of using kmod directly, the event is sent to
433devfsd which can implement an arbitrary authentication before loading
434the module itself.
435
436Inode lookup events can also be used to construct arbitrary
437namespaces, without having to resort to populating devfs with symlinks
438to devices that don't exist.
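
The idea of reacting to events for particular device entries can be
sketched as a small dispatcher. This is not devfsd's actual matching
logic or protocol: the class, the event names and the policy of
running every matching handler are assumptions made for illustration.

```python
import re


class ToyDaemon:
    """Toy event dispatcher keyed on event type and device name pattern."""

    def __init__(self):
        self.rules = []  # (event, compiled pattern, handler) triples

    def on(self, event, pattern, handler):
        self.rules.append((event, re.compile(pattern), handler))

    def dispatch(self, event, devname):
        """Run every handler whose event and pattern match; return results."""
        return [handler(devname)
                for rule_event, pattern, handler in self.rules
                if rule_event == event and pattern.search(devname)]
```

A handler registered for registration events on "dsp" would run when
that entry appears, and would be skipped for other events or names.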
439
440Speculative Device Scanning
441
442Consider an application (like cdparanoia) that wants to find all
443CD-ROM devices on the system (SCSI, IDE and other types), whether or
444not their respective modules are loaded. The application must
445speculatively open certain device nodes (such as /dev/sr0 for the SCSI
446CD-ROMs) in order to make sure the module is loaded. This requires
447that all Linux distributions follow the standard device naming scheme
448(last time I looked RedHat did things differently). Devfs solves the
449naming problem.
450
451The same application also wants to see which devices are actually
452available on the system. With the existing system it needs to read the
453/dev directory and speculatively open each /dev/sr* device to
454determine if the device exists or not. With a large /dev this is an
455inefficient operation, especially if there are many /dev/sr* nodes. A
456solution like scsidev could reduce the number of /dev/sr* entries (but
457of course that also requires all that inefficient directory scanning).
458
459With devfs, the application can open the /dev/sr directory
460(which triggers the module autoloading if required), and proceed to
461read /dev/sr. Since only the available devices will have
462entries, there are no inefficiencies in directory scanning or device
463openings.
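
From the application's point of view, the devfs approach reduces to
reading one directory. A minimal sketch, assuming the directory holds
only available devices (the function name is hypothetical, and a
missing directory is treated as "no devices"):

```python
import os


def available_devices(directory):
    """Return the sorted entry names in a device directory, or [] if absent."""
    try:
        return sorted(os.listdir(directory))
    except FileNotFoundError:
        return []
```

An application would call this on /dev/sr and open only the entries
returned, rather than probing every conceivable /dev/sr* node.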
464
465-----------------------------------------------------------------------------
466
467Who else does it?
468
469FreeBSD has a devfs implementation. Solaris and AIX each have a
470pseudo-devfs (something akin to scsidev but for all devices, with some
471unspecified kernel support). BeOS, Plan9 and QNX also have it. SGI's
472IRIX 6.4 and above also have a device filesystem.
473
474While we shouldn't just automatically do something because others do
475it, we should not ignore the work of others either. FreeBSD has a lot
476of competent people working on it, so their opinion should not be
477blithely ignored.
478
479-----------------------------------------------------------------------------
480
481
482How it works
483
484Registering device entries
485
486For every entry (device node) in a devfs-based /dev a driver must call
487devfs_register(). This adds the name of the device entry, the
488file_operations structure pointer and a few other things to an
489internal table. Device entries may be added and removed at any
490time. When a device entry is registered, it automagically appears in
491every mounted devfs.
492
493Inode lookup
494
495When a lookup operation is performed on an entry for which there is no
496driver information, devfs will attempt to call devfsd. If driver
497information still cannot be found, a negative dentry is yielded and
498the next stage operation will be called by the VFS (such as the
499create() or mknod() inode methods). If driver information can be
500found, an inode is created (if one does not exist already) and all is
501well.
502
503Manually creating device nodes
504
505The mknod() method allows you to create an ordinary named pipe in the
506devfs, or you can create a character or block special inode if one
507does not already exist. You may wish to create a character or block
508special inode so that you can set permissions and ownership. Later, if
509a device driver registers an entry with the same name, the
510permissions, ownership and times are retained. This is how you can set
511the protections on a device even before the driver is loaded. Once you
512create an inode it appears in the directory listing.
513
514Unregistering device entries
515
516A device driver calls devfs_unregister() to unregister an entry.
517
518Chroot() gaols
519
5202.2.x kernels
521
522The semantics of inode creation are different when devfs is mounted
523with the "explicit" option. Now, when a device entry is registered, it
524will not appear until you use mknod() to create the device. It doesn't
525matter if you mknod() before or after the device is registered with
526devfs_register(). The purpose of this behaviour is to support
527chroot(2) gaols, where you want to mount a minimal devfs inside the
528gaol. Only the devices you specifically want to be available (through
529your mknod() setup) will be accessible.
530
5312.4.x kernels
532
533As of kernel 2.3.99, the VFS has had the ability to rebind parts of
534the global filesystem namespace into another part of the namespace.
535This now works even at the leaf-node level, which means that
536individual files and device nodes may be bound into other parts of the
537namespace. This is like making links, but better, because it works
538across filesystems (unlike hard links) and works through chroot()
539gaols (unlike symbolic links).
540
541Because of these improvements to the VFS, the multi-mount capability
542in devfs is no longer needed. The administrator may create a minimal
543device tree inside a chroot(2) gaol by using VFS bindings. As this
544provides most of the features of the devfs multi-mount capability, I
545removed the multi-mount support code (after issuing an RFC). This
546yielded code size reductions and simplifications.
547
548If you want to construct a minimal chroot() gaol, the following
549command should suffice:
550
551mount --bind /dev/null /gaol/dev/null
552
553
554Repeat for other device nodes you want to expose. Simple!
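
When several nodes are to be exposed, the command lines can be
generated rather than typed. The sketch below only builds the strings
and does not run them. The function name and the example gaol path are
assumptions.

```python
def bind_commands(nodes, gaol_root, dev_dir="/dev"):
    """Build one 'mount --bind' command line per device node."""
    return [f"mount --bind {dev_dir}/{node} {gaol_root}{dev_dir}/{node}"
            for node in nodes]
```

For the nodes null and zero and a gaol at /gaol, this yields the
command shown above plus the equivalent line for /dev/zero. Note that
mount expects each target to exist before it is bound over.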
555
556-----------------------------------------------------------------------------
557
558
559Operational issues
560
561
562Instructions for the impatient
563
564Nobody likes reading documentation. People just want to get in there
565and play. So this section tells you quickly the steps you need to take
566to run with devfs mounted over /dev. Skip these steps and you will end
567up with a nearly unbootable system. Subsequent sections describe the
568issues in more detail, and discuss non-essential configuration
569options.
570
571Devfsd
572OK, if you're reading this, I assume you want to play with
573devfs. First you should ensure that /usr/src/linux contains a
574recent kernel source tree. Then you need to compile devfsd, the device
575management daemon, available at
576
577http://www.atnf.csiro.au/~rgooch/linux/.
578Because the kernel has a naming scheme
579which is quite different from the old naming scheme, you need to
580install devfsd so that software and configuration files that use the
581old naming scheme will not break.
582
583Compile and install devfsd. You will be provided with a default
584configuration file /etc/devfsd.conf which will provide
585compatibility symlinks for the old naming scheme. Don't change this
586config file unless you know what you're doing. Even if you think you
587do know what you're doing, don't change it until you've followed all
588the steps below and booted a devfs-enabled system and verified that it
589works.
590
591Now edit your main system boot script so that devfsd is started at the
592very beginning (before any filesystem
593checks). /etc/rc.d/rc.sysinit is often the main boot script
594on systems with SysV-style boot scripts. On systems with BSD-style
595boot scripts it is often /etc/rc. Also check
596/sbin/rc.
597
598NOTE that the line you put into the boot
599script should be exactly:
600
601/sbin/devfsd /dev
602
603DO NOT use some special daemon-launching
604programme, otherwise the boot script may not wait for devfsd to finish
605initialising.
606
607System Libraries
608There may still be some problems because of broken software making
609assumptions about device names. In particular, some software does not
610handle devices which are symbolic links. If you are running a libc 5
611based system, install libc 5.4.44 (if you have libc 5.4.46, go back to
612libc 5.4.44, which is actually correct). If you are running a glibc
613based system, make sure you have glibc 2.1.3 or later.
614
615/etc/securetty
616PAM (Pluggable Authentication Modules) is supposed to be a flexible
617mechanism for providing better user authentication and access to
618services. Unfortunately, it's also fragile, complex and undocumented
619(check out RedHat 6.1, and probably other distributions as well). PAM
620has problems with symbolic links. Append the following lines to your
621/etc/securetty file:
622
623vc/1
624vc/2
625vc/3
626vc/4
627vc/5
628vc/6
629vc/7
630vc/8
631
632This will not weaken security. If you have a version of util-linux
633earlier than 2.10.h, please upgrade to 2.10.h or later. If you
634absolutely cannot upgrade, then also append the following lines to
635your /etc/securetty file:
636
6371
6382
6393
6404
6415
6426
6437
6448
645
646This may potentially weaken security by allowing root logins over the
647network (a password is still required, though). However, since there
648are problems with dealing with symlinks, I'm suspicious of the level
649of security offered in any case.
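
To avoid appending duplicate lines to /etc/securetty, the entries
still missing can be worked out first. A minimal sketch that operates
on the file contents as a string (the function name is hypothetical,
and it does not write to the file):

```python
def missing_vc_entries(securetty_text, count=8):
    """Return the vc/N lines (N = 1..count) not yet present in the text."""
    present = {line.strip() for line in securetty_text.splitlines()}
    wanted = [f"vc/{n}" for n in range(1, count + 1)]
    return [entry for entry in wanted if entry not in present]
```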
650
651XFree86
652While not essential, it's probably a good idea to upgrade to XFree86
6534.0, as patches went in to make it more devfs-friendly. If you don't,
654you'll probably need to apply the following patch to
655/etc/security/console.perms so that ordinary users can run
656startx. Note that not all distributions have this file (e.g. Debian),
657so if it's not present, don't worry about it.
658
659--- /etc/security/console.perms.orig Sat Apr 17 16:26:47 1999
660+++ /etc/security/console.perms Fri Feb 25 23:53:55 2000
661@@ -14,7 +14,7 @@
662 # man 5 console.perms
663
664 # file classes -- these are regular expressions
665-<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
666+<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
667
668 # device classes -- these are shell-style globs
669 <floppy>=/dev/fd[0-1]*
670
671If the patch does not apply, then replace the line:
672
673<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
674
675with:
676
677<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
678
679
680Disable devpts
681I've had a report of devpts mounted on /dev/pts not working
682correctly. Since devfs will also manage /dev/pts, there is no
683need to mount devpts as well. You should either edit your
684/etc/fstab so devpts is not mounted, or disable devpts from
685your kernel configuration.
686
687Unsupported drivers
688Not all drivers have devfs support. If you depend on one of these
689drivers, you will need to create a script or tarfile that you can use
690at boot time to create device nodes as appropriate. There is a
691section which describes this. Another
692section lists the drivers which have
693devfs support.
694
695/dev/mouse
696
697Many distributions configure /dev/mouse to be the mouse device
698for XFree86 and GPM. I actually think this is a bad idea, because it
699adds another level of indirection. When looking at a config file, if
700you see /dev/mouse you're left wondering which mouse
701is being referred to. Hence I recommend putting the actual mouse
702device (for example /dev/psaux) into your
703/etc/X11/XF86Config file (and similarly for the GPM
704configuration file).
705
706Alternatively, use the same technique used for unsupported drivers
707described above.
708
709The Kernel
710Finally, you need to make sure devfs is compiled into your kernel. Set
711CONFIG_EXPERIMENTAL=y, CONFIG_DEVFS_FS=y and CONFIG_DEVFS_MOUNT=y
712using your favourite configuration tool (e.g. make config or make
713xconfig), then run make clean and recompile your kernel and modules.
714At boot, devfs will be mounted onto /dev.
715
716If you encounter problems booting (for example if you forgot a
717configuration step), you can pass devfs=nomount at the kernel
718boot command line. This will prevent the kernel from mounting devfs at
719boot time onto /dev.
720
721In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting
722devfs onto /dev is completely safe, and requires no
723configuration changes. One exception to take note of is when
724LABEL= directives are used in /etc/fstab. In this
725case you will be unable to boot properly. This is because the
726mount(8) programme uses /proc/partitions as part of
727the volume label search process, and the device names it finds are not
728available, because setting CONFIG_DEVFS_FS=y changes the names in
729/proc/partitions, irrespective of whether devfs is mounted.
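
Before booting such a kernel it may be worth checking whether
/etc/fstab uses any LABEL= directives. The following sketch scans the
file contents passed in as a string. The function name is made up, and
it assumes the usual whitespace-separated fstab fields.

```python
def label_mounts(fstab_text):
    """Return (label, mount_point) pairs for fstab lines using LABEL=."""
    found = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Comment lines start with '#', so they never match LABEL= here.
        if len(fields) >= 2 and fields[0].startswith("LABEL="):
            found.append((fields[0][len("LABEL="):], fields[1]))
    return found
```

An empty result suggests the exception described above does not apply.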
730
731Now you've finished all the steps required. You're now ready to boot
732your shiny new kernel. Enjoy.
733
734Changing the configuration
735
736OK, you've now booted a devfs-enabled system, and everything works.
737Now you may feel like changing the configuration (common targets are
738/etc/fstab and /etc/devfsd.conf). Since you have a
739system that works, if you make any changes and it doesn't work, you
740now know that you only have to restore your configuration files to the
741default and it will work again.
742
743
744Permissions persistence across reboots
745
746If you don't use mknod(2) to create a device file, nor use chmod(2) or
747chown(2) to change the ownerships/permissions, the inode ctime will
748remain at 0 (the epoch, 12 am, 1-JAN-1970, GMT). Anything with a ctime
749later than this has had its ownership/permissions changed. Hence, a
750simple script or programme may be used to tar up all changed inodes,
751prior to shutdown. Although effective, many consider this approach a
752kludge.
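
The "simple script or programme" mentioned above would first need to
find the changed inodes. A minimal sketch of that step, assuming the
ctime rule just described (the function name is hypothetical, and the
archiving step itself is left out):

```python
import os


def changed_nodes(dev_root):
    """Return paths under dev_root whose inode ctime is later than the epoch."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(dev_root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are examined, not their targets.
            if os.lstat(path).st_ctime > 0:
                changed.append(path)
    return sorted(changed)
```

On an ordinary disc-based filesystem every entry is reported, since
all inodes there have a ctime later than the epoch. The filter is only
meaningful on a mounted devfs.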
753
754A much better approach is to use devfsd to save and restore
755permissions. It may be configured to record changes in permissions and
756will save them in a database (in fact a directory tree), and restore
757these upon boot. This is an efficient method and results in immediate
758saving of current permissions (unlike the tar approach, which saves
759permissions at some unspecified future time).
760
761The default configuration file supplied with devfsd has config entries
762which you may uncomment to enable persistence management.
763
764If you decide to use the tar approach anyway, be aware that tar will
765first unlink(2) an inode before creating a new device node. The
766unlink(2) has the effect of breaking the connection between a devfs
767entry and the device driver. If you use the "devfs=only" boot option,
768you lose access to the device driver, requiring you to reload the
769module. I consider this a bug in tar (there is no real need to
770unlink(2) the inode first).
771
772Alternatively, you can use devfsd to provide more sophisticated
773management of device permissions. You can use devfsd to store
774permissions for whole groups of devices with a single configuration
775entry, rather than the conventional single entry per device entry.
776
777Permissions database stored in mounted-over /dev
778
779If you wish to save and restore your device permissions into the
780disc-based /dev while still mounting devfs onto /dev
781you may do so. This requires a 2.4.x kernel (in fact, 2.3.99 or
782later), which has the VFS binding facility. You need to do the
783following to set this up:
784
785
786
787make sure the kernel does not mount devfs at boot time
788
789
790make sure you have a correct /dev/console entry in your
791root file-system (where your disc-based /dev lives)
792
793create the /dev-state directory
794
795
796add the following lines near the very beginning of your boot
797scripts:
798
799mount --bind /dev /dev-state
800mount -t devfs none /dev
801devfsd /dev
802
803
804
805
806add the following lines to your /etc/devfsd.conf file:
807
808REGISTER ^pt[sy] IGNORE
809CREATE ^pt[sy] IGNORE
810CHANGE ^pt[sy] IGNORE
811DELETE ^pt[sy] IGNORE
812REGISTER .* COPY /dev-state/$devname $devpath
813CREATE .* COPY $devpath /dev-state/$devname
814CHANGE .* COPY $devpath /dev-state/$devname
815DELETE .* CFUNCTION GLOBAL unlink /dev-state/$devname
816RESTORE /dev-state
817
Note that the sample devfsd.conf file contains these lines,
as well as other sample configurations you may find useful. See the
devfsd distribution.
821
822
823reboot.
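
The boot-script step above can be wrapped in a small helper. This is only a
sketch: the function names run and setup_dev_state and the DRY_RUN variable
are illustrative and not part of devfsd. With DRY_RUN=1 (the default here) the
commands are printed rather than executed, so the ordering can be checked
before anything is mounted.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are assumptions, not devfsd features.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_dev_state() {
    # Bind the disc-based /dev to /dev-state before devfs hides it,
    # then mount devfs over /dev and start the daemon.
    run mount --bind /dev /dev-state &&
    run mount -t devfs none /dev &&
    run devfsd /dev
}
```

Calling setup_dev_state with DRY_RUN=0 near the start of the boot scripts
would run the three commands in order and stop at the first failure.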
824
825
826
827
828Permissions database stored in normal directory
829
830If you are using an older kernel which doesn't support VFS binding,
831then you won't be able to have the permissions database in a
832mounted-over /dev. However, you can still use a regular
833directory to store the database. The sample /etc/devfsd.conf
834file above may still be used. You will need to create the
835/dev-state directory prior to installing devfsd. If you have
836old permissions in /dev, then just copy (or move) the device
837nodes over to the new directory.
838
839Which method is better?
840
841The best method is to have the permissions database stored in the
842mounted-over /dev. This is because you will not need to copy
843device nodes over to /dev-state, and because it allows you to
844switch between devfs and non-devfs kernels, without requiring you to
845copy permissions between /dev-state (for devfs) and
846/dev (for non-devfs).
847
848
849Dealing with drivers without devfs support
850
851Currently, not all device drivers in the kernel have been modified to
852use devfs. Device drivers which do not yet have devfs support will not
853automagically appear in devfs. The simplest way to create device nodes
854for these drivers is to unpack a tarfile containing the required
855device nodes. You can do this in your boot scripts. All your drivers
856will now work as before.
857
858Hopefully for most people devfs will have enough support so that they
859can mount devfs directly over /dev without losing most functionality
860(i.e. losing access to various devices). As of 22-JAN-1998 (devfs
861patch version 10) I am now running this way. All the devices I have
862are available in devfs, so I don't lose anything.
863
WARNING: if your configuration requires the old-style device names
(e.g. /dev/hda1 or /dev/sda1), you must install devfsd and configure
it to maintain compatibility entries. It is almost certain that you
will require this. Note that the kernel creates a compatibility entry
for the root device, so you don't need initrd.
869
870Note that you no longer need to mount devpts if you use Unix98 PTYs,
871as devfs can manage /dev/pts itself. This saves you some RAM, as you
872don't need to compile and install devpts. Note that some versions of
873glibc have a bug with Unix98 pty handling on devfs systems. Contact
874the glibc maintainers for a fix. Glibc 2.1.3 has the fix.
875
876Note also that apart from editing /etc/fstab, other things will need
877to be changed if you *don't* install devfsd. Some software (like the X
878server) hard-wire device names in their source. It really is much
879easier to install devfsd so that compatibility entries are created.
880You can then slowly migrate your system to using the new device names
881(for example, by starting with /etc/fstab), and then limiting the
882compatibility entries that devfsd creates.
883
884IF YOU CONFIGURE TO MOUNT DEVFS AT BOOT, MAKE SURE YOU INSTALL DEVFSD
885BEFORE YOU BOOT A DEVFS-ENABLED KERNEL!
886
Now that devfs has gone into the 2.3.46 kernel, I'm getting a lot of
reports back. Many of these are because people are trying to run
without devfsd, and hence some things break. Please just run devfsd if
things break. I want to concentrate on real bugs rather than
misconfiguration problems at the moment. If people are willing to fix
bugs/false assumptions in other code (e.g. glibc, X server) and submit
that to the respective maintainers, that would be great.
894
895
896All the way with Devfs
897
The devfs kernel patch creates a rationalised device tree. As stated
above, if you want to keep using the old /dev naming scheme,
you just need to configure devfsd appropriately (see the man
page). People who prefer the old names can ignore this section. For
those of us who like the rationalised names and an uncluttered
/dev, read on.
904
905If you don't run devfsd, or don't enable compatibility entry
906management, then you will have to configure your system to use the new
907names. For example, you will then need to edit your
908/etc/fstab to use the new disc naming scheme. If you want to
909be able to boot non-devfs kernels, you will need compatibility
910symlinks in the underlying disc-based /dev pointing back to
911the old-style names for when you boot a kernel without devfs.
912
You can selectively decide which devices you want compatibility
entries for. For example, you may only want compatibility entries for
BSD pseudo-terminal devices (otherwise you'll have to patch your C
library or use Unix98 ptys instead). It's just a matter of putting
the correct regular expression into /etc/devfsd.conf.
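
As a hedged illustration, the fragment below restricts old compatibility
entries to the BSD pseudo-terminals. It reuses the MKOLDCOMPAT and
RMOLDCOMPAT actions described later in this document; the pattern ^pty/ is an
assumption based on the /dev/pty names in the kernel naming scheme, so check
it against the devfsd man page before relying on it.

```
# Assumed pattern: only entries under pty/ get old-style compatibility names
REGISTER	^pty/.*		MKOLDCOMPAT
UNREGISTER	^pty/.*		RMOLDCOMPAT
```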
918
919There are other choices of naming schemes that you may prefer. For
920example, I don't use the kernel-supplied
921names, because they are too verbose. A common misconception is
922that the kernel-supplied names are meant to be used directly in
923configuration files. This is not the case. They are designed to
924reflect the layout of the devices attached and to provide easy
925classification.
926
927If you like the kernel-supplied names, that's fine. If you don't then
928you should be using devfsd to construct a namespace more to your
929liking. Devfsd has built-in code to construct a
930namespace that is both logical and easy to
931manage. In essence, it creates a convenient abbreviation of the
932kernel-supplied namespace.
933
934You are of course free to build your own namespace. Devfsd has all the
935infrastructure required to make this easy for you. All you need do is
936write a script. You can even write some C code and devfsd can load the
937shared object as a callable extension.
938
939
940Other Issues
941
942The init programme
943Another thing to take note of is whether your init programme
944creates a Unix socket /dev/telinit. Some versions of init
945create /dev/telinit so that the telinit programme can
946communicate with the init process. If you have such a system you need
947to make sure that devfs is mounted over /dev *before* init
948starts. In other words, you can't leave the mounting of devfs to
949/etc/rc, since this is executed after init. Other
950versions of init require a named pipe /dev/initctl
951which must exist *before* init starts. Once again, you need to
952mount devfs and then create the named pipe *before* init
953starts.
954
955The default behaviour now is not to mount devfs onto /dev at
956boot time for 2.3.x and later kernels. You can correct this with the
957"devfs=mount" boot option. This solves any problems with init,
958and also prevents the dreaded:
959
960Cannot open initial console
961
962message. For 2.2.x kernels where you need to apply the devfs patch,
963the default is to mount.
964
965If you have automatic mounting of devfs onto /dev then you
966may need to create /dev/initctl in your boot scripts. The
967following lines should suffice:
968
969mknod /dev/initctl p
970kill -SIGUSR1 1 # tell init that /dev/initctl now exists
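
A slightly more defensive variant of the mknod step is sketched below. The
helper name ensure_fifo is illustrative. It creates the named pipe only when
one is not already present, so rerunning the boot script is harmless.

```shell
#!/bin/sh
# Sketch only: ensure_fifo is a hypothetical helper name.
ensure_fifo() {
    path=$1
    if [ -p "$path" ]; then
        return 0            # already a named pipe, nothing to do
    fi
    mknod "$path" p         # fails if something else occupies the path
}

# Intended use in a boot script (not executed here):
#   ensure_fifo /dev/initctl
# followed by the signal to init shown above.
```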
971
Alternatively, if you don't want the kernel to mount devfs onto
/dev then you could use the following procedure as a
guideline for how to get around /dev/initctl problems:
975
976# cd /sbin
977# mv init init.real
978# cat > init
979#! /bin/sh
980mount -n -t devfs none /dev
981mknod /dev/initctl p
982exec /sbin/init.real $*
983[control-D]
984# chmod a+x init
985
986Note that newer versions of init create /dev/initctl
987automatically, so you don't have to worry about this.
988
989Module autoloading
990You will need to configure devfsd to enable module
991autoloading. The following lines should be placed in your
992/etc/devfsd.conf file:
993
994LOOKUP .* MODLOAD
995
996
As of devfsd-v1.3.10, a generic /etc/modules.devfs
configuration file is installed, which is used by the MODLOAD
action. This should be sufficient for most configurations. If you
require further configuration, edit your /etc/modules.conf
file. The way module autoloading works with devfs is:
1002
1003
1004a process attempts to lookup a device node (e.g. /dev/fred)
1005
1006
1007if that device node does not exist, the full pathname is passed to
1008devfsd as a string
1009
1010
1011devfsd will pass the string to the modprobe programme (provided the
1012configuration line shown above is present), and specifies that
1013/etc/modules.devfs is the configuration file
1014
1015
1016/etc/modules.devfs includes /etc/modules.conf to
1017access local configurations
1018
modprobe will search its configuration files, looking for an alias
that translates the pathname into a module name
1021
1022
1023the translated pathname is then used to load the module.
1024
1025
1026If you wanted a lookup of /dev/fred to load the
1027mymod module, you would require the following configuration
1028line in /etc/modules.conf:
1029
1030alias /dev/fred mymod
1031
1032The /etc/modules.devfs configuration file provides many such
1033aliases for standard device names. If you look closely at this file,
1034you will note that some modules require multiple alias configuration
1035lines. This is required to support module autoloading for old and new
1036device names.
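
The alias translation step can be imitated with a few lines of shell. This is
a deliberately simplified sketch, not how modprobe is implemented: the
function name lookup_alias is illustrative, and only exact
"alias PATHNAME MODULE" lines are matched.

```shell
#!/bin/sh
# lookup_alias PATHNAME CONFIG...
# Print the module named by the first matching alias line.
# Returns non-zero when no alias matches. Always pass at least one
# configuration file, otherwise awk would read standard input.
lookup_alias() {
    wanted=$1
    shift
    awk -v wanted="$wanted" '
        $1 == "alias" && $2 == wanted { print $3; found = 1; exit }
        END { exit !found }
    ' "$@"
}

# Example (not executed here):
#   lookup_alias /dev/fred /etc/modules.devfs /etc/modules.conf
```

With the alias line shown earlier present in /etc/modules.conf, the example
call would print mymod.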
1037
1038Mounting root off a devfs device
1039If you wish to mount root off a devfs device when you pass the
1040"devfs=only" boot option, then you need to pass in the
1041"root=<device>" option to the kernel when booting. If you use
1042LILO, then you must have this in lilo.conf:
1043
1044append = "root=<device>"
1045
1046Surprised? Yep, so was I. It turns out if you have (as most people
1047do):
1048
1049root = <device>
1050
1051
1052then LILO will determine the device number of <device> and will
1053write that device number into a special place in the kernel image
1054before starting the kernel, and the kernel will use that device number
1055to mount the root filesystem. So, using the "append" variety ensures
1056that LILO passes the root filesystem device as a string, which devfs
1057can then use.
1058
1059Note that this isn't an issue if you don't pass "devfs=only".
1060
1061TTY issues
1062The ttyname(3) function in some versions of the C library makes
1063false assumptions about device entries which are symbolic links. The
1064tty(1) programme is one that depends on this function. I've
1065written a patch to libc 5.4.43 which fixes this. This has been
1066included in libc 5.4.44 and a similar fix is in glibc 2.1.3.
1067
1068
1069Kernel Naming Scheme
1070
1071The kernel provides a default naming scheme. This scheme is designed
1072to make it easy to search for specific devices or device types, and to
1073view the available devices. Some device types (such as hard discs),
1074have a directory of entries, making it easy to see what devices of
1075that class are available. Often, the entries are symbolic links into a
1076directory tree that reflects the topology of available devices. The
1077topological tree is useful for finding how your devices are arranged.
1078
1079Below is a list of the naming schemes for the most common drivers. A
1080list of reserved device names is
1081available for reference. Please send email to
1082rgooch@atnf.csiro.au to obtain an allocation. Please be
1083patient (the maintainer is busy). An alternative name may be allocated
1084instead of the requested name, at the discretion of the maintainer.
1085
1086Disc Devices
1087
1088All discs, whether SCSI, IDE or whatever, are placed under the
1089/dev/discs hierarchy:
1090
1091 /dev/discs/disc0 first disc
1092 /dev/discs/disc1 second disc
1093
1094
1095Each of these entries is a symbolic link to the directory for that
1096device. The device directory contains:
1097
1098 disc for the whole disc
1099 part* for individual partitions
1100
1101
1102CD-ROM Devices
1103
1104All CD-ROMs, whether SCSI, IDE or whatever, are placed under the
1105/dev/cdroms hierarchy:
1106
1107 /dev/cdroms/cdrom0 first CD-ROM
1108 /dev/cdroms/cdrom1 second CD-ROM
1109
1110
1111Each of these entries is a symbolic link to the real device entry for
1112that device.
1113
1114Tape Devices
1115
1116All tapes, whether SCSI, IDE or whatever, are placed under the
1117/dev/tapes hierarchy:
1118
1119 /dev/tapes/tape0 first tape
1120 /dev/tapes/tape1 second tape
1121
1122
1123Each of these entries is a symbolic link to the directory for that
1124device. The device directory contains:
1125
1126 mt for mode 0
1127 mtl for mode 1
1128 mtm for mode 2
1129 mta for mode 3
1130 mtn for mode 0, no rewind
1131 mtln for mode 1, no rewind
1132 mtmn for mode 2, no rewind
1133 mtan for mode 3, no rewind
1134
1135
1136SCSI Devices
1137
1138To uniquely identify any SCSI device requires the following
1139information:
1140
1141 controller (host adapter)
1142 bus (SCSI channel)
1143 target (SCSI ID)
1144 unit (Logical Unit Number)
1145
1146
1147All SCSI devices are placed under /dev/scsi (assuming devfs
1148is mounted on /dev). Hence, a SCSI device with the following
1149parameters: c=1,b=2,t=3,u=4 would appear as:
1150
1151 /dev/scsi/host1/bus2/target3/lun4 device directory
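
The mapping from the four parameters to the device directory is mechanical,
as this small sketch shows. The function name scsi_devfs_dir is illustrative,
and devfs is assumed to be mounted on /dev.

```shell
#!/bin/sh
# scsi_devfs_dir CONTROLLER BUS TARGET UNIT
# Print the devfs directory for a SCSI device (devfs assumed on /dev).
scsi_devfs_dir() {
    printf '/dev/scsi/host%d/bus%d/target%d/lun%d\n' "$1" "$2" "$3" "$4"
}
```

For the parameters c=1,b=2,t=3,u=4 the call is scsi_devfs_dir 1 2 3 4.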
1152
1153
1154Inside this directory, a number of device entries may be created,
1155depending on which SCSI device-type drivers were installed.
1156
1157See the section on the disc naming scheme to see what entries the SCSI
1158disc driver creates.
1159
1160See the section on the tape naming scheme to see what entries the SCSI
1161tape driver creates.
1162
1163The SCSI CD-ROM driver creates:
1164
1165 cd
1166
1167
1168The SCSI generic driver creates:
1169
1170 generic
1171
1172
1173IDE Devices
1174
1175To uniquely identify any IDE device requires the following
1176information:
1177
1178 controller
1179 bus (aka. primary/secondary)
1180 target (aka. master/slave)
1181 unit
1182
1183
All IDE devices are placed under /dev/ide, and use a similar
naming scheme to the SCSI subsystem.
1186
1187XT Hard Discs
1188
1189All XT discs are placed under /dev/xd. The first XT disc has
1190the directory /dev/xd/disc0.
1191
1192TTY devices
1193
1194The tty devices now appear as:
1195
 New name             Old-name            Device Type
 --------             --------            -----------
 /dev/tts/{0,1,...}   /dev/ttyS{0,1,...}  Serial ports
 /dev/cua/{0,1,...}   /dev/cua{0,1,...}   Call out devices
 /dev/vc/0            /dev/tty            Current virtual console
 /dev/vc/{1,2,...}    /dev/tty{1...63}    Virtual consoles
 /dev/vcc/{0,1,...}   /dev/vcs{1...63}    Virtual consoles
 /dev/pty/m{0,1,...}  /dev/ptyp??         PTY masters
 /dev/pty/s{0,1,...}  /dev/ttyp??         PTY slaves
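
When migrating configuration files by hand, part of the table above can be
applied with a small helper. This sketch covers only the serial port,
call-out and numbered virtual console rows. The current-console, vcs and PTY
rows are left out because their index mapping is not spelled out here. The
function name tty_new_name is illustrative.

```shell
#!/bin/sh
# tty_new_name OLDNAME
# Translate an old-style tty name to its devfs name for the rows handled.
# Returns non-zero for names this sketch does not cover.
tty_new_name() {
    old=${1#/dev/}
    case $old in
        ttyS[0-9]*) echo "/dev/tts/${old#ttyS}" ;;
        cua[0-9]*)  echo "/dev/cua/${old#cua}" ;;
        tty[1-9]*)  echo "/dev/vc/${old#tty}" ;;
        *)          return 1 ;;
    esac
}
```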
1205
1206
1207RAMDISCS
1208
1209The RAMDISCS are placed in their own directory, and are named thus:
1210
1211 /dev/rd/{0,1,2,...}
1212
1213
1214Meta Devices
1215
1216The meta devices are placed in their own directory, and are named
1217thus:
1218
1219 /dev/md/{0,1,2,...}
1220
1221
1222Floppy discs
1223
1224Floppy discs are placed in the /dev/floppy directory.
1225
1226Loop devices
1227
1228Loop devices are placed in the /dev/loop directory.
1229
1230Sound devices
1231
1232Sound devices are placed in the /dev/sound directory
1233(audio, sequencer, ...).
1234
1235
1236Devfsd Naming Scheme
1237
1238Devfsd provides a naming scheme which is a convenient abbreviation of
1239the kernel-supplied namespace. In some
1240cases, the kernel-supplied naming scheme is quite convenient, so
1241devfsd does not provide another naming scheme. The convenience names
1242that devfsd creates are in fact the same names as the original devfs
1243kernel patch created (before Linus mandated the Big Name
1244Change). These are referred to as "new compatibility entries".
1245
1246In order to configure devfsd to create these convenience names, the
1247following lines should be placed in your /etc/devfsd.conf:
1248
1249REGISTER .* MKNEWCOMPAT
1250UNREGISTER .* RMNEWCOMPAT
1251
1252This will cause devfsd to create (and destroy) symbolic links which
1253point to the kernel-supplied names.
1254
1255SCSI Hard Discs
1256
1257All SCSI discs are placed under /dev/sd (assuming devfs is
1258mounted on /dev). Hence, a SCSI disc with the following
1259parameters: c=1,b=2,t=3,u=4 would appear as:
1260
1261 /dev/sd/c1b2t3u4 for the whole disc
1262 /dev/sd/c1b2t3u4p5 for the 5th partition
1263 /dev/sd/c1b2t3u4p5s6 for the 6th slice in the 5th partition
1264
1265
1266SCSI Tapes
1267
All SCSI tapes are placed under /dev/st. A similar naming
scheme is used as for SCSI discs. A SCSI tape with the
parameters: c=1,b=2,t=3,u=4 would appear as:
1271
1272 /dev/st/c1b2t3u4m0 for mode 0
1273 /dev/st/c1b2t3u4m1 for mode 1
1274 /dev/st/c1b2t3u4m2 for mode 2
1275 /dev/st/c1b2t3u4m3 for mode 3
1276 /dev/st/c1b2t3u4m0n for mode 0, no rewind
1277 /dev/st/c1b2t3u4m1n for mode 1, no rewind
1278 /dev/st/c1b2t3u4m2n for mode 2, no rewind
1279 /dev/st/c1b2t3u4m3n for mode 3, no rewind
1280
1281
1282SCSI CD-ROMs
1283
All SCSI CD-ROMs are placed under /dev/sr. A similar naming
scheme is used as for SCSI discs. A SCSI CD-ROM with the
parameters: c=1,b=2,t=3,u=4 would appear as:
1287
1288 /dev/sr/c1b2t3u4
1289
1290
1291SCSI Generic Devices
1292
The generic (aka. raw) interface for all SCSI devices is placed under
/dev/sg. A similar naming scheme is used as for SCSI discs. A
SCSI generic device with the parameters: c=1,b=2,t=3,u=4 would appear
as:
1297
1298 /dev/sg/c1b2t3u4
1299
1300
1301IDE Hard Discs
1302
1303All IDE discs are placed under /dev/ide/hd, using a similar
1304convention to SCSI discs. The following mappings exist between the new
1305and the old names:
1306
1307 /dev/hda /dev/ide/hd/c0b0t0u0
1308 /dev/hdb /dev/ide/hd/c0b0t1u0
1309 /dev/hdc /dev/ide/hd/c0b1t0u0
1310 /dev/hdd /dev/ide/hd/c0b1t1u0
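
The four mappings above can be captured in a lookup helper. This is a sketch
under two assumptions: the function name ide_compat_name is illustrative, and
the "pN" partition suffix is carried over from the SCSI disc convention, since
this section only says the IDE convention is similar. Discs beyond /dev/hdd
are rejected because no mapping for them is given here.

```shell
#!/bin/sh
# ide_compat_name OLDNAME
# Translate /dev/hda../dev/hdd (optionally with a partition number) to the
# devfsd convenience name. Returns non-zero for anything else.
ide_compat_name() {
    name=${1#/dev/}
    disc=${name%%[0-9]*}        # e.g. hda1 -> hda
    part=${name#"$disc"}        # e.g. hda1 -> 1 (empty for the whole disc)
    case $disc in
        hda) new=/dev/ide/hd/c0b0t0u0 ;;
        hdb) new=/dev/ide/hd/c0b0t1u0 ;;
        hdc) new=/dev/ide/hd/c0b1t0u0 ;;
        hdd) new=/dev/ide/hd/c0b1t1u0 ;;
        *)   return 1 ;;
    esac
    # Assumed suffix, by analogy with /dev/sd/c1b2t3u4p5.
    if [ -n "$part" ]; then
        new="${new}p$part"
    fi
    echo "$new"
}
```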
1311
1312
1313IDE Tapes
1314
1315A similar naming scheme is used as for IDE discs. The entries will
1316appear in the /dev/ide/mt directory.
1317
1318IDE CD-ROM
1319
1320A similar naming scheme is used as for IDE discs. The entries will
1321appear in the /dev/ide/cd directory.
1322
1323IDE Floppies
1324
1325A similar naming scheme is used as for IDE discs. The entries will
1326appear in the /dev/ide/fd directory.
1327
1328XT Hard Discs
1329
1330All XT discs are placed under /dev/xd. The first XT disc
1331would appear as /dev/xd/c0t0.
1332
1333
1334Old Compatibility Names
1335
1336The old compatibility names are the legacy device names, such as
1337/dev/hda, /dev/sda, /dev/rtc and so on.
1338Devfsd can be configured to create compatibility symlinks so that you
1339may continue to use the old names in your configuration files and so
1340that old applications will continue to function correctly.
1341
1342In order to configure devfsd to create these legacy names, the
1343following lines should be placed in your /etc/devfsd.conf:
1344
1345REGISTER .* MKOLDCOMPAT
1346UNREGISTER .* RMOLDCOMPAT
1347
1348This will cause devfsd to create (and destroy) symbolic links which
1349point to the kernel-supplied names.
1350
1351
1352-----------------------------------------------------------------------------
1353
1354
1355Device drivers currently ported
1356
1357- All miscellaneous character devices support devfs (this is done
1358 transparently through misc_register())
1359
1360- SCSI discs and generic hard discs
1361
1362- Character memory devices (null, zero, full and so on)
1363 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
1364
1365- Loop devices (/dev/loop?)
1366
1367- TTY devices (console, serial ports, terminals and pseudo-terminals)
1368 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
1369
1370- SCSI tapes (/dev/scsi and /dev/tapes)
1371
1372- SCSI CD-ROMs (/dev/scsi and /dev/cdroms)
1373
1374- SCSI generic devices (/dev/scsi)
1375
1376- RAMDISCS (/dev/ram?)
1377
1378- Meta Devices (/dev/md*)
1379
1380- Floppy discs (/dev/floppy)
1381
1382- Parallel port printers (/dev/printers)
1383
1384- Sound devices (/dev/sound)
1385 Thanks to Eric Dumas <dumas@linux.eu.org> and
1386 C. Scott Ananian <cananian@alumni.princeton.edu>
1387
1388- Joysticks (/dev/joysticks)
1389
1390- Sparc keyboard (/dev/kbd)
1391
1392- DSP56001 digital signal processor (/dev/dsp56k)
1393
1394- Apple Desktop Bus (/dev/adb)
1395
1396- Coda network file system (/dev/cfs*)
1397
1398- Virtual console capture devices (/dev/vcc)
1399 Thanks to Dennis Hou <smilax@mindmeld.yi.org>
1400
1401- Frame buffer devices (/dev/fb)
1402
1403- Video capture devices (/dev/v4l)
1404
1405
1406-----------------------------------------------------------------------------
1407
1408
1409Allocation of Device Numbers
1410
1411Devfs allows you to write a driver which doesn't need to allocate a
1412device number (major&minor numbers) for the internal operation of the
1413kernel. However, there are a number of userspace programmes that use
1414the device number as a unique handle for a device. An example is the
1415find programme, which uses device numbers to determine whether
1416an inode is on a different filesystem than another inode. The device
1417number used is the one for the block device which a filesystem is
1418using. To preserve compatibility with userspace programmes, block
1419devices using devfs need to have unique device numbers allocated to
1420them. Furthermore, POSIX specifies device numbers, so some kind of
1421device number needs to be presented to userspace.
1422
1423The simplest option (especially when porting drivers to devfs) is to
1424keep using the old major and minor numbers. Devfs will take whatever
1425values are given for major&minor and pass them onto userspace.
1426
1427This device number is a 16 bit number, so this leaves plenty of space
1428for large numbers of discs and partitions. This scheme can also be
1429used for character devices, in particular the tty devices, which are
1430currently limited to 256 pseudo-ttys (this limits the total number of
1431simultaneous xterms and remote logins). Note that the device number
1432is limited to the range 36864-61439 (majors 144-239), in order to
1433avoid any possible conflicts with existing official allocations.
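
The range quoted above follows from the traditional 16 bit layout, in which
the major number occupies the upper 8 bits and the minor number the lower 8
bits. A short sketch of the arithmetic, with illustrative function names
makedev16 and in_reserved_range:

```shell
#!/bin/sh
# makedev16 MAJOR MINOR
# Combine 8 bit major and minor numbers into a 16 bit device number.
makedev16() {
    echo $(( $1 * 256 + $2 ))
}

# in_reserved_range DEVNUM
# Succeed if DEVNUM lies in 36864-61439 (majors 144-239).
in_reserved_range() {
    [ "$1" -ge 36864 ] && [ "$1" -le 61439 ]
}
```

Major 144 with minor 0 gives 144 * 256 = 36864, and major 239 with minor 255
gives 239 * 256 + 255 = 61439, matching the two ends of the range.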
1434
1435Please note that using dynamically allocated block device numbers may
1436break the NFS daemons (both user and kernel mode), which expect dev_t
1437for a given device to be constant over the lifetime of remote mounts.
1438
1439A final note on this scheme: since it doesn't increase the size of
1440device numbers, there are no compatibility issues with userspace.
1441
1442-----------------------------------------------------------------------------
1443
1444
1445Questions and Answers
1446
1447
1448Making things work
1449Alternatives to devfs
1450What I don't like about devfs
1451How to report bugs
1452Strange kernel messages
1453Compilation problems with devfsd
1454
1455
1456
1457Making things work
1458
1459Here are some common questions and answers.
1460
1461
1462
1463Devfsd doesn't start
1464
1465Make sure you have compiled and installed devfsd
1466Make sure devfsd is being started from your boot
1467scripts
1468Make sure you have configured your kernel to enable devfs (see
1469below)
1470Make sure devfs is mounted (see below)
1471
1472
1473Devfsd is not managing all my permissions
1474
1475Make sure you are capturing the appropriate events. For example,
1476device entries created by the kernel generate REGISTER events,
1477but those created by devfsd generate CREATE events.
1478
1479
1480Devfsd is not capturing all REGISTER events
1481
1482See the previous entry: you may need to capture CREATE events.
1483
1484
1485X will not start
1486
1487Make sure you followed the steps
1488outlined above.
1489
1490
1491Why don't my network devices appear in devfs?
1492
This is not a bug. Network devices have their own, completely separate
namespace. They are accessed via socket(2) and
setsockopt(2) calls, and thus require no device nodes. I have
raised the possibility of moving network devices into the device
namespace, but have had no response.
1498
1499
1500How can I test if I have devfs compiled into my kernel?
1501
1502All filesystems built-in or currently loaded are listed in
1503/proc/filesystems. If you see a devfs entry, then
1504you know that devfs was compiled into your kernel. If you have
1505correctly configured and rebuilt your kernel, then devfs will be
1506built-in. If you think you've configured it in, but
1507/proc/filesystems doesn't show it, you've made a mistake.
1508Common mistakes include:
1509
1510Using a 2.2.x kernel without applying the devfs patch (if you
1511don't know how to patch your kernel, use 2.4.x instead, don't bother
1512asking me how to patch)
1513Forgetting to set CONFIG_EXPERIMENTAL=y
1514Forgetting to set CONFIG_DEVFS_FS=y
1515Forgetting to set CONFIG_DEVFS_MOUNT=y (if you want devfs
1516to be automatically mounted at boot)
1517Editing your .config manually, instead of using make
1518config or make xconfig
1519Forgetting to run make dep; make clean after changing the
1520configuration and before compiling
1521Forgetting to compile your kernel and modules
1522Forgetting to install your kernel
1523Forgetting to install your modules
1524
1525Please check twice that you've done all these steps before sending in
1526a bug report.
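
The /proc/filesystems check can be scripted. In this sketch the function name
fs_supported is illustrative. It looks at the last field of each line, because
entries for filesystems without a backing device are prefixed with "nodev".

```shell
#!/bin/sh
# fs_supported NAME [FILE]
# Succeed if NAME is listed in FILE (default: /proc/filesystems).
fs_supported() {
    awk -v fs="$1" '
        $NF == fs { found = 1 }
        END { exit !found }
    ' "${2:-/proc/filesystems}"
}

# Example (not executed here):
#   fs_supported devfs && echo "devfs is compiled in"
```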
1527
1528
1529
1530How can I test if devfs is mounted on /dev?
1531
1532The device filesystem will always create an entry called
1533".devfsd", which is used to communicate with the daemon. Even
1534if the daemon is not running, this entry will exist. Testing for the
1535existence of this entry is the approved method of determining if devfs
1536is mounted or not. Note that the type of entry (i.e. regular file,
1537character device, named pipe, etc.) may change without notice. Only
1538the existence of the entry should be relied upon.
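
In a script, the test reduces to a single existence check. The function name
devfs_mounted is illustrative. It uses -e rather than a type-specific test
because only the existence of the entry may be relied upon.

```shell
#!/bin/sh
# devfs_mounted [DIR]
# Succeed if DIR (default: /dev) contains the .devfsd entry.
devfs_mounted() {
    [ -e "${1:-/dev}/.devfsd" ]
}

# Example (not executed here):
#   devfs_mounted || echo "devfs is not mounted on /dev"
```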
1539
1540
1541When I start devfsd, I see the error:
1542Error opening file: ".devfsd" No such file or directory?
1543
1544This means that devfs is not mounted. Make sure you have devfs mounted.
1545
1546
1547How do I mount devfs?
1548
1549First make sure you have devfs compiled into your kernel (see
1550above). Then you will either need to:
1551
1552set CONFIG_DEVFS_MOUNT=y in your kernel config
1553pass devfs=mount to your boot loader
mount devfs manually in your boot scripts with:
mount -t devfs none /dev
1556
1557
1558
1559Mount by volume LABEL=<label> doesn't work with
1560devfs
1561
1562Most probably you are not mounting devfs onto /dev. What
1563happens is that if your kernel config has CONFIG_DEVFS_FS=y
1564then the contents of /proc/partitions will have the devfs
1565names (such as scsi/host0/bus0/target0/lun0/part1). The
1566contents of /proc/partitions are used by mount(8) when
1567mounting by volume label. If devfs is not mounted on /dev,
1568then mount(8) will fail to find devices. The solution is to
1569make sure that devfs is mounted on /dev. See above for how to
1570do that.
1571
1572
1573I have extra or incorrect entries in /dev
1574
1575You may have stale entries in your dev-state area. Check for a
1576RESTORE configuration line in your devfsd configuration
1577(typically /etc/devfsd.conf). If you have this line, check
1578the contents of the specified directory for stale entries. Remove
1579any entries which are incorrect, then reboot.
1580
1581
1582I get "Unable to open initial console" messages at boot
1583
1584This usually happens when you don't have devfs automounted onto
1585/dev at boot time, and there is no valid
1586/dev/console entry on your root file-system. Create a valid
1587/dev/console device node.
1588
1589
1590
1591
1592
1593Alternatives to devfs
1594
1595I've attempted to collate all the anti-devfs proposals and explain
1596their limitations. Under construction.
1597
1598
1599Why not just pass device create/remove events to a daemon?
1600
1601Here the suggestion is to develop an API in the kernel so that devices
1602can register create and remove events, and a daemon listens for those
1603events. The daemon would then populate/depopulate /dev (which
1604resides on disc).
1605
1606This has several limitations:
1607
1608
1609it only works for modules loaded and unloaded (or devices inserted
1610and removed) after the kernel has finished booting. Without a database
1611of events, there is no way the daemon could fully populate
1612/dev
1613
1614
1615if you add a database to this scheme, the question is then how to
1616present that database to user-space. If you make it a list of strings
1617with embedded event codes which are passed through a pipe to the
1618daemon, then this is only of use to the daemon. I would argue that the
1619natural way to present this data is via a filesystem (since many of
1620the events will be of a hierarchical nature), such as devfs.
1621Presenting the data as a filesystem makes it easy for the user to see
1622what is available and also makes it easy to write scripts to scan the
1623"database"
1624
1625
1626the tight binding between device nodes and drivers is no longer
1627possible (requiring the otherwise perfectly avoidable
1628table lookups)
1629
1630
1631you cannot catch inode lookup events on /dev which means
1632that module autoloading requires device nodes to be created. This is a
1633problem, particularly for drivers where only a few inodes are created
1634from a potentially large set
1635
1636
1637this technique can't be used when the root FS is mounted
1638read-only
1639
1640
1641
1642
1643Just implement a better scsidev
1644
1645This suggestion involves taking the scsidev programme and
1646extending it to scan for all devices, not just SCSI devices. The
1647scsidev programme works by scanning /proc/scsi
1648
1649Problems:
1650
1651
1652the kernel does not currently provide a list of all devices
1653available. Not all drivers register entries in /proc or
1654generate kernel messages
1655
1656
1657there is no uniform mechanism to register devices other than the
1658devfs API
1659
1660
1661implementing such an API is then the same as the
1662proposal above
1663
1664
1665
1666
1667Put /dev on a ramdisc
1668
1669This suggestion involves creating a ramdisc and populating it with
1670device nodes and then mounting it over /dev.
1671
1672Problems:
1673
1674
1675
1676this doesn't help when mounting the root filesystem, since you
1677still need a device node to do that
1678
1679
1680if you want to use this technique for the root device node as
1681well, you need to use initrd. This complicates the booting sequence
1682and makes it significantly harder to administer and configure. The
1683initrd is essentially opaque, robbing the system administrator of easy
1684configuration
1685
1686
1687insufficient information is available to correctly populate the
1688ramdisc. So we come back to the
1689proposal above to "solve" this
1690
1691
1692a ramdisc-based solution would take more kernel memory, since the
1693backing store would be (at best) normal VFS inodes and dentries, which
1694take 284 bytes and 112 bytes, respectively, for each entry. Compare
1695that to 72 bytes for devfs
1696
1697
1698
1699
1700Do nothing: there's no problem
1701
1702Sometimes people can be heard to claim that the existing scheme is
1703fine. This is what they're ignoring:
1704
1705
1706device number size (8 bits each for major and minor) is a real
1707limitation, and must be fixed somehow. Systems with large numbers of
1708SCSI devices, for example, will continue to consume the remaining
1709unallocated major numbers. USB will also need to push beyond the 8 bit
1710minor limitation
1711
1712
1713simply increasing the device number size is insufficient. Apart
1714from causing a lot of pain, it doesn't solve the management issues
1715of a /dev with thousands or more device nodes
1716
1717
1718ignoring the problem of a huge /dev will not make it go
1719away, and dismisses the legitimacy of a large number of people who
1720want a dynamic /dev
1721
1722
1723the standard response then becomes: "write a device management
1724daemon", which brings us back to the
1725proposal above
1726
1727
1728
1729
1730What I don't like about devfs
1731
1732Here are some common complaints about devfs, and some suggestions and
1733solutions that may make it more palatable for you. I can't please
1734everybody, but I do try :-)
1735
1736I hate the naming scheme
1737
1738First, remember that no naming scheme will please everybody. You hate
1739the scheme, others love it. Who's to say who's right and who's wrong?
1740Ultimately, the person who writes the code gets to choose, and what
1741exists now is a combination of the choices made by the
1742devfs author and the
1743kernel maintainer (Linus).
1744
1745However, not all is lost. If you want to create your own naming
1746scheme, it is a simple matter to write a standalone script, hack
1747devfsd, or write a script called by devfsd. You can create whatever
1748naming scheme you like.
1749
1750Further, if you want to remove all traces of the devfs naming scheme
1751from /dev, you can mount devfs elsewhere (say
1752/devfs) and populate /dev with links into
1753/devfs. This population can be automated using devfsd if you
1754wish.
1755
1756You can even use the VFS binding facility to make the links, rather
1757than using symbolic links. This way, you don't even have to see the
1758"destination" of these symbolic links.
1759
1760Devfs puts policy into the kernel
1761
1762There's already policy in the kernel. Device numbers are in fact
1763policy (why should the kernel dictate what device numbers I use?).
1764Face it, some policy has to be in the kernel. The real difference
1765between device names as policy and device numbers as policy is that
1766no one will use device numbers directly, because device
1767numbers are devoid of meaning to humans and are ugly. At least with
1768the devfs device names, (even though you can add your own naming
1769scheme) some people will use the devfs-supplied names directly. This
1770offends some people :-)
1771
1772Devfs is bloatware
1773
1774This is not even remotely true. As shown above,
1775both code and data size are quite modest.
1776
1777
1778How to report bugs
1779
1780If you have (or think you have) a bug with devfs, please follow the
1781steps below:
1782
1783
1784
1785make sure you have enabled debugging output when configuring your
1786kernel. You will need to set (at least) the following config options:
1787
1788CONFIG_DEVFS_DEBUG=y
1789CONFIG_DEBUG_KERNEL=y
1790CONFIG_DEBUG_SLAB=y
1791
1792
1793
1794please make sure you have the latest devfs patches applied. The
1795latest kernel version might not have the latest devfs patches applied
1796yet (Linus is very busy)
1797
1798
1799save a copy of your complete kernel logs (preferably by
1800using the dmesg programme) for later inclusion in your bug
1801report. You may need to use the -s switch to increase the
1802internal buffer size so you can capture all the boot messages.
1803Don't edit or trim the dmesg output
1804
1805
1806
1807
1808try booting with devfs=dall passed to the kernel boot
1809command line (read the documentation on your bootloader on how to do
1810this), and save the result to a file. This may be quite verbose, and
1811it may overflow the messages buffer, but try to get as much of it as
1812you can
1813
1814
1815send a copy of your devfsd configuration file(s)
1816
1817send the bug report to me first.
1818Don't expect that I will see it if you post it to the linux-kernel
1819mailing list. Include all the information listed above, plus
1820anything else that you think might be relevant. Put the string
1821devfs somewhere in the subject line, so my mail filters mark
1822it as urgent
1823
1824
1825
1826
1827Here is a general guide on how to ask questions in a way that greatly
1828improves your chances of getting a reply:
1829
1830http://www.tuxedo.org/~esr/faqs/smart-questions.html. If you have
1831a bug to report, you should also read
1832
1833http://www.chiark.greenend.org.uk/~sgtatham/bugs.html.
1834
1835
1836Strange kernel messages
1837
1838You may see devfs-related messages in your kernel logs. Below are some
1839messages and what they mean (and what you should do about them, if
1840anything).
1841
1842
1843
1844devfs_register(fred): could not append to parent, err: -17
1845
1846You need to check what the error code means, but usually 17 means
1847EEXIST. This means that a driver attempted to create an entry
1848fred in a directory, but there already was an entry with that
1849name. This is often caused by flawed boot scripts which untar a bunch
1850of inodes into /dev, as a way to restore permissions. This
1851message is harmless, as the device nodes will still
1852provide access to the driver (unless you use the devfs=only
1853boot option, which is only for dedicated souls:-). If you want to get
1854rid of these annoying messages, upgrade to devfsd-v1.3.20 and use the
1855recommended RESTORE directive to restore permissions.
1856
1857
1858devfs_mk_dir(bill): using old entry in dir: c1808724 ""
1859
1860This is similar to the message above, except that a driver attempted
1861to create a directory named bill, and the parent directory
1862has an entry with the same name. In this case, to ensure that drivers
1863continue to work properly, the old entry is re-used and given to the
1864driver. In 2.5 kernels, the driver is given a NULL entry, and thus,
1865under rare circumstances, may not create the required device nodes.
1866The solution is the same as above.
1867
1868
1869
1870
1871
1872Compilation problems with devfsd
1873
1874Usually, you can compile devfsd just by typing in
1875make in the source directory, followed by a make
1876install (as root). Sometimes, you may have problems, particularly
1877on broken configurations.
1878
1879
1880
1881error messages relating to DEVFSD_NOTIFY_DELETE
1882
1883This happened because you have an ancient set of kernel headers
1884installed in /usr/include/linux or /usr/src/linux.
1885Install kernel 2.4.10 or later. You may need to pass the
1886KERNEL_DIR variable to make (if you did not install
1887the new kernel sources as /usr/src/linux), or you may copy
1888the devfs_fs.h file in the kernel source tree into
1889/usr/include/linux.
1890
1891
1892
1893
1894-----------------------------------------------------------------------------
1895
1896
1897Other resources
1898
1899
1900
1901Douglas Gilbert has written a useful document at
1902
1903http://www.torque.net/sg/devfs_scsi.html which
1904explores the SCSI subsystem and how it interacts with devfs
1905
1906
1907Douglas Gilbert has written another useful document at
1908
1909http://www.torque.net/scsi/SCSI-2.4-HOWTO/ which
1910discusses the Linux SCSI subsystem in 2.4.
1911
1912
1913Johannes Erdfelt has started a discussion paper on Linux and
1914hot-swap devices, describing what the requirements are for a scalable
1915solution and how and why he's used devfs+devfsd. Note that this is an
1916early draft only, available in plain text form at:
1917
1918http://johannes.erdfelt.com/hotswap.txt.
1919Johannes has promised that an HTML version will follow.
1920
1921
1922I presented an invited
1923paper
1924at the
1925
19262nd Annual Storage Management Workshop held in Miami, Florida,
1927U.S.A. in October 2000.
1928
1929
1930
1931
1932-----------------------------------------------------------------------------
1933
1934
1935Translations of this document
1936
1937This document has been translated into other languages.
1938
1939
1940
1941
1942The document master (in English) by rgooch@atnf.csiro.au is
1943available at
1944
1945http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html
1946
1947
1948
1949A Korean translation by viatoris@nownuri.net is available at
1950
1951http://your.destiny.pe.kr/devfs/devfs.html
1952
1953
1954
1955
1956-----------------------------------------------------------------------------
1957Most flags courtesy of ITA's
1958Flags of All Countries
1959used with permission.
diff --git a/Documentation/filesystems/devfs/ToDo b/Documentation/filesystems/devfs/ToDo
deleted file mode 100644
index afd5a8f2c19b..000000000000
--- a/Documentation/filesystems/devfs/ToDo
+++ /dev/null
@@ -1,40 +0,0 @@
1 Device File System (devfs) ToDo List
2
3 Richard Gooch <rgooch@atnf.csiro.au>
4
5 3-JUL-2000
6
7This is a list of things to be done for better devfs support in the
8Linux kernel. If you'd like to contribute to the devfs, please have a
9look at this list for anything that is unallocated. Also, if there are
10items missing (surely), please contact me so I can add them to the
11list (preferably with your name attached to them:-).
12
13
14- >256 ptys
15 Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
16
17- Amiga floppy driver (drivers/block/amiflop.c)
18
19- Atari floppy driver (drivers/block/ataflop.c)
20
21- SWIM3 (Super Woz Integrated Machine 3) floppy driver (drivers/block/swim3.c)
22
23- Amiga ZorroII ramdisc driver (drivers/block/z2ram.c)
24
25- Parallel port ATAPI CD-ROM (drivers/block/paride/pcd.c)
26
27- Parallel port ATAPI floppy (drivers/block/paride/pf.c)
28
29- AP1000 block driver (drivers/ap1000/ap.c, drivers/ap1000/ddv.c)
30
31- Archimedes floppy (drivers/acorn/block/fd1772.c)
32
33- MFM hard drive (drivers/acorn/block/mfmhd.c)
34
35- I2O block device (drivers/message/i2o/i2o_block.c)
36
37- ST-RAM device (arch/m68k/atari/stram.c)
38
39- Raw devices
40
diff --git a/Documentation/filesystems/devfs/boot-options b/Documentation/filesystems/devfs/boot-options
deleted file mode 100644
index df3d33b03e0a..000000000000
--- a/Documentation/filesystems/devfs/boot-options
+++ /dev/null
@@ -1,65 +0,0 @@
1/* -*- auto-fill -*- */
2
3 Device File System (devfs) Boot Options
4
5 Richard Gooch <rgooch@atnf.csiro.au>
6
7 18-AUG-2001
8
9
10When CONFIG_DEVFS_DEBUG is enabled, you can pass several boot options
11to the kernel to debug devfs. The boot options are prefixed by
12"devfs=", and are separated by commas. Spaces are not allowed. The
13syntax looks like this:
14
15devfs=<option1>,<option2>,<option3>
16
17and so on. For example, if you wanted to turn on debugging for module
18load requests and device registration, you would do:
19
20devfs=dmod,dreg
21
22You may prefix "no" to any option. This will invert the option.
23
24
25Debugging Options
26=================
27
28These require CONFIG_DEVFS_DEBUG to be enabled.
29Note that all debugging options have 'd' as the first character. By
30default all options are off. All debugging output is sent to the
31kernel logs. The debugging options do not take effect until the devfs
32version message appears (just prior to the root filesystem being
33mounted).
34
35These are the options:
36
37dmod print module load requests to <request_module>
38
39dreg print device register requests to <devfs_register>
40
41dunreg print device unregister requests to <devfs_unregister>
42
43dchange print device change requests to <devfs_set_flags>
44
45dilookup print inode lookup requests
46
47diget print VFS inode allocations
48
49diunlink print inode unlinks
50
51dichange print inode changes
52
53dimknod print calls to mknod(2)
54
55dall some debugging turned on
56
57
58Other Options
59=============
60
61These control the default behaviour of devfs. The options are:
62
63mount mount devfs onto /dev at boot time
64
65only disable non-devfs device nodes for devfs-capable drivers
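
The option syntax described in boot-options above (a "devfs=" prefix,
comma-separated options, an optional "no" prefix that inverts an option)
can be illustrated with a minimal shell sketch. The function name
parse_devfs_arg and its "on"/"off" output format are hypothetical and
exist only for this illustration; the kernel's own parser is not shown
here.

```shell
#!/bin/sh
# Hypothetical helper: split a "devfs=" boot argument into its options.
# Only the syntax rules come from the text (comma-separated, no spaces,
# optional "no" prefix inverts the option); the output format is made up.
parse_devfs_arg() {
    arg=${1#devfs=}            # drop the "devfs=" prefix
    old_ifs=$IFS
    IFS=,
    set -- $arg                # split the remainder on commas
    IFS=$old_ifs
    for opt in "$@"; do
        case $opt in
            no*) echo "${opt#no} off" ;;   # "no" prefix inverts the option
            *)   echo "$opt on" ;;
        esac
    done
}

parse_devfs_arg "devfs=dmod,nodreg"
```

With the argument shown, the sketch reports dmod as on and dreg as off.
None of the documented option names begins with "no", so the prefix test
is unambiguous for them.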
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index afb1335c05d6..4aecc9bdb273 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -113,6 +113,14 @@ noquota
113grpquota 113grpquota
114usrquota 114usrquota
115 115
116bh (*) ext3 associates buffer heads to data pages to
117nobh (a) cache disk block mapping information
118 (b) link pages into transaction to provide
119 ordering guarantees.
120 "bh" option forces use of buffer heads.
121 "nobh" option tries to avoid associating buffer
122 heads (supported only for "writeback" mode).
123
116 124
117Specification 125Specification
118============= 126=============
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt
index 33f74310d161..a584f05403a4 100644
--- a/Documentation/filesystems/fuse.txt
+++ b/Documentation/filesystems/fuse.txt
@@ -18,6 +18,14 @@ Non-privileged mount (or user mount):
18 user. NOTE: this is not the same as mounts allowed with the "user" 18 user. NOTE: this is not the same as mounts allowed with the "user"
19 option in /etc/fstab, which is not discussed here. 19 option in /etc/fstab, which is not discussed here.
20 20
21Filesystem connection:
22
23 A connection between the filesystem daemon and the kernel. The
24 connection exists until either the daemon dies, or the filesystem is
25 umounted. Note that detaching (or lazy umounting) the filesystem
26 does _not_ break the connection, in this case it will exist until
27 the last reference to the filesystem is released.
28
21Mount owner: 29Mount owner:
22 30
23 The user who does the mounting. 31 The user who does the mounting.
@@ -86,16 +94,20 @@ Mount options
86 The default is infinite. Note that the size of read requests is 94 The default is infinite. Note that the size of read requests is
87 limited anyway to 32 pages (which is 128kbyte on i386). 95 limited anyway to 32 pages (which is 128kbyte on i386).
88 96
89Sysfs 97Control filesystem
90~~~~~ 98~~~~~~~~~~~~~~~~~~
99
100There's a control filesystem for FUSE, which can be mounted by:
91 101
92FUSE sets up the following hierarchy in sysfs: 102 mount -t fusectl none /sys/fs/fuse/connections
93 103
94 /sys/fs/fuse/connections/N/ 104Mounting it under the '/sys/fs/fuse/connections' directory makes it
105backwards compatible with earlier versions.
95 106
96where N is an increasing number allocated to each new connection. 107Under the fuse control filesystem each connection has a directory
108named by a unique number.
97 109
98For each connection the following attributes are defined: 110For each connection the following files exist within this directory:
99 111
100 'waiting' 112 'waiting'
101 113
@@ -110,7 +122,47 @@ For each connection the following attributes are defined:
110 connection. This means that all waiting requests will be aborted and 122 connection. This means that all waiting requests will be aborted and
111 error returned for all aborted and new requests. 123 error returned for all aborted and new requests.
112 124
113Only a privileged user may read or write these attributes. 125Only the owner of the mount may read or write these files.
126
127Interrupting filesystem operations
128~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
129
130If a process issuing a FUSE filesystem request is interrupted, the
131following will happen:
132
133 1) If the request is not yet sent to userspace AND the signal is
134 fatal (SIGKILL or unhandled fatal signal), then the request is
135 dequeued and returns immediately.
136
137 2) If the request is not yet sent to userspace AND the signal is not
138 fatal, then an 'interrupted' flag is set for the request. When
139 the request has been successfully transferred to userspace and
140 this flag is set, an INTERRUPT request is queued.
141
142 3) If the request is already sent to userspace, then an INTERRUPT
143 request is queued.
144
145INTERRUPT requests take precedence over other requests, so the
146userspace filesystem will receive queued INTERRUPTs before any others.
147
148The userspace filesystem may ignore the INTERRUPT requests entirely,
149or may honor them by sending a reply to the _original_ request, with
150the error set to EINTR.
151
152It is also possible that there's a race between processing the
153original request and its INTERRUPT request. There are two possibilities:
154
155 1) The INTERRUPT request is processed before the original request is
156 processed
157
158 2) The INTERRUPT request is processed after the original request has
159 been answered
160
161If the filesystem cannot find the original request, it should wait for
162some timeout and/or a number of new requests to arrive, after which it
163should reply to the INTERRUPT request with an EAGAIN error. In case
1641) the INTERRUPT request will be requeued. In case 2) the INTERRUPT
165reply will be ignored.
114 166
115Aborting a filesystem connection 167Aborting a filesystem connection
116~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 168~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -139,8 +191,8 @@ the filesystem. There are several ways to do this:
139 - Use forced umount (umount -f). Works in all cases but only if 191 - Use forced umount (umount -f). Works in all cases but only if
140 filesystem is still attached (it hasn't been lazy unmounted) 192 filesystem is still attached (it hasn't been lazy unmounted)
141 193
142 - Abort filesystem through the sysfs interface. Most powerful 194 - Abort filesystem through the FUSE control filesystem. Most
143 method, always works. 195 powerful method, always works.
144 196
145How do non-privileged mounts work? 197How do non-privileged mounts work?
146~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 198~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -304,25 +356,7 @@ Scenario 1 - Simple deadlock
304 | | for "file"] 356 | | for "file"]
305 | | *DEADLOCK* 357 | | *DEADLOCK*
306 358
307The solution for this is to allow requests to be interrupted while 359The solution for this is to allow the filesystem to be aborted.
308they are in userspace:
309
310 | [interrupted by signal] |
311 | <fuse_unlink() |
312 | [release semaphore] | [semaphore acquired]
313 | <sys_unlink() |
314 | | >fuse_unlink()
315 | | [queue req on fc->pending]
316 | | [wake up fc->waitq]
317 | | [sleep on req->waitq]
318
319If the filesystem daemon was single threaded, this will stop here,
320since there's no other thread to dequeue and execute the request.
321In this case the solution is to kill the FUSE daemon as well. If
322there are multiple serving threads, you just have to kill them as
323long as any remain.
324
325Moral: a filesystem which deadlocks, can soon find itself dead.
326 360
327Scenario 2 - Tricky deadlock 361Scenario 2 - Tricky deadlock
328---------------------------- 362----------------------------
@@ -355,24 +389,14 @@ but is caused by a pagefault.
355 | | [lock page] 389 | | [lock page]
356 | | * DEADLOCK * 390 | | * DEADLOCK *
357 391
358Solution is again to let the request be interrupted (not 392Solution is basically the same as above.
359elaborated further).
360
361An additional problem is that while the write buffer is being
362copied to the request, the request must not be interrupted. This
363is because the destination address of the copy may not be valid
364after the request is interrupted.
365
366This is solved with doing the copy atomically, and allowing
367interruption while the page(s) belonging to the write buffer are
368faulted with get_user_pages(). The 'req->locked' flag indicates
369when the copy is taking place, and interruption is delayed until
370this flag is unset.
371 393
372Scenario 3 - Tricky deadlock with asynchronous read 394An additional problem is that while the write buffer is being copied
373--------------------------------------------------- 395to the request, the request must not be interrupted/aborted. This is
396because the destination address of the copy may not be valid after the
397request has returned.
374 398
375The same situation as above, except thread-1 will wait on page lock 399This is solved with doing the copy atomically, and allowing abort
376and hence it will be uninterruptible as well. The solution is to 400while the page(s) belonging to the write buffer are faulted with
377abort the connection with forced umount (if mount is attached) or 401get_user_pages(). The 'req->locked' flag indicates when the copy is
378through the abort attribute in sysfs. 402taking place, and abort is delayed until this flag is unset.
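
As a rough illustration of the FUSE control filesystem described in the
fuse.txt changes above, the following shell sketch prints the contents
of the 'waiting' file for each connection directory. The function name
fuse_waiting and its optional directory argument are hypothetical; the
sketch assumes the control filesystem is already mounted on
/sys/fs/fuse/connections and that the caller owns the mounts, since only
the mount owner may read these files.

```shell
#!/bin/sh
# Sketch: show the 'waiting' value for each FUSE connection.
# Assumes one numbered directory per connection, each containing a
# 'waiting' file, as described in the text. The directory argument is an
# illustrative addition so the function can be pointed at another path.
fuse_waiting() {
    ctl=${1:-/sys/fs/fuse/connections}
    for conn in "$ctl"/*/; do
        # Skip unreadable entries and the unexpanded glob (no connections).
        [ -r "${conn}waiting" ] || continue
        echo "$(basename "$conn"): $(cat "${conn}waiting")"
    done
}

fuse_waiting
```

Aborting a connection through the same filesystem would be a write to
the corresponding 'abort' file in that directory; that step is left out
of the sketch because it is destructive.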
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 99902ae6804e..7240ee7515de 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -39,6 +39,8 @@ Table of Contents
39 2.9 Appletalk 39 2.9 Appletalk
40 2.10 IPX 40 2.10 IPX
41 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem 41 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
42 2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
43 2.13 /proc/<pid>/oom_score - Display current oom-killer score
42 44
43------------------------------------------------------------------------------ 45------------------------------------------------------------------------------
44Preface 46Preface
@@ -1124,11 +1126,15 @@ debugging information is displayed on console.
1124NMI switch that most IA32 servers have fires unknown NMI up, for example. 1126NMI switch that most IA32 servers have fires unknown NMI up, for example.
1125If a system hangs up, try pressing the NMI switch. 1127If a system hangs up, try pressing the NMI switch.
1126 1128
1127[NOTE] 1129nmi_watchdog
1128 This function and oprofile share a NMI callback. Therefore this function 1130------------
1129 cannot be enabled when oprofile is activated. 1131
1130 And NMI watchdog will be disabled when the value in this file is set to 1132Enables/Disables the NMI watchdog on x86 systems. When the value is non-zero
1131 non-zero. 1133the NMI watchdog is enabled and will continuously test all online cpus to
1134determine whether or not they are still functioning properly.
1135
1136Because the NMI watchdog shares registers with oprofile, by disabling the NMI
1137watchdog, oprofile may have more registers to utilize.
1132 1138
1133 1139
11342.4 /proc/sys/vm - The virtual memory subsystem 11402.4 /proc/sys/vm - The virtual memory subsystem
@@ -1958,6 +1964,22 @@ a queue must be less or equal then msg_max.
1958maximum message size value (it is every message queue's attribute set during 1964maximum message size value (it is every message queue's attribute set during
1959its creation). 1965its creation).
1960 1966
19672.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
1968------------------------------------------------------
1969
1970This file can be used to adjust the score used to select which processes
1971should be killed in an out-of-memory situation. Giving it a high score will
1972increase the likelihood of this process being killed by the oom-killer. Valid
1973values are in the range -16 to +15, plus the special value -17, which disables
1974oom-killing altogether for this process.
1975
19762.13 /proc/<pid>/oom_score - Display current oom-killer score
1977-------------------------------------------------------------
1978
1979------------------------------------------------------------------------------
1980This file can be used to check the current score used by the oom-killer for
1981any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which
1982process should be killed in an out-of-memory situation.
1961 1983
1962------------------------------------------------------------------------------ 1984------------------------------------------------------------------------------
1963Summary 1985Summary
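
The oom_adj range described in section 2.12 above (-16 to +15, plus the
special value -17) can be sketched in shell as follows. The helper names
valid_oom_adj and set_oom_adj are hypothetical, and the sketch assumes a
kernel that provides /proc/<pid>/oom_adj and a caller with permission to
write it.

```shell
#!/bin/sh
# Sketch: validate a value against the documented oom_adj range before
# writing it. Function names are illustrative, not part of any tool.
valid_oom_adj() {
    case $1 in
        ''|*[!0-9-]*|?*-*|-) return 1 ;;   # reject anything but an integer
    esac
    # -17 disables oom-killing for the process; -16..15 adjust the score.
    [ "$1" -ge -17 ] && [ "$1" -le 15 ]
}

set_oom_adj() {
    pid=$1
    value=$2
    if ! valid_oom_adj "$value"; then
        echo "oom_adj must be an integer from -17 to 15" >&2
        return 1
    fi
    # Assumes /proc/<pid>/oom_adj exists and is writable by the caller.
    echo "$value" > "/proc/$pid/oom_adj"
}
```

A call such as "set_oom_adj $$ 10" would raise the likelihood of the
current shell being chosen by the oom-killer, and reading
/proc/$$/oom_score afterwards shows the resulting score.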
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
index 60ab61e54e8a..25981e2e51be 100644
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
@@ -70,11 +70,13 @@ tmpfs mounts. See Documentation/filesystems/tmpfs.txt for more information.
70What is rootfs? 70What is rootfs?
71--------------- 71---------------
72 72
73Rootfs is a special instance of ramfs, which is always present in 2.6 systems. 73Rootfs is a special instance of ramfs (or tmpfs, if that's enabled), which is
74(It's used internally as the starting and stopping point for searches of the 74always present in 2.6 systems. You can't unmount rootfs for approximately the
75kernel's doubly-linked list of mount points.) 75same reason you can't kill the init process; rather than having special code
76to check for and handle an empty list, it's smaller and simpler for the kernel
77to just make sure certain lists can't become empty.
76 78
77Most systems just mount another filesystem over it and ignore it. The 79Most systems just mount another filesystem over rootfs and ignore it. The
78amount of space an empty instance of ramfs takes up is tiny. 80amount of space an empty instance of ramfs takes up is tiny.
79 81
80What is initramfs? 82What is initramfs?
@@ -92,14 +94,16 @@ out of that.
92 94
93All this differs from the old initrd in several ways: 95All this differs from the old initrd in several ways:
94 96
95 - The old initrd was a separate file, while the initramfs archive is linked 97 - The old initrd was always a separate file, while the initramfs archive is
96 into the linux kernel image. (The directory linux-*/usr is devoted to 98 linked into the linux kernel image. (The directory linux-*/usr is devoted
97 generating this archive during the build.) 99 to generating this archive during the build.)
98 100
99 - The old initrd file was a gzipped filesystem image (in some file format, 101 - The old initrd file was a gzipped filesystem image (in some file format,
100 such as ext2, that had to be built into the kernel), while the new 102 such as ext2, that needed a driver built into the kernel), while the new
101 initramfs archive is a gzipped cpio archive (like tar only simpler, 103 initramfs archive is a gzipped cpio archive (like tar only simpler,
102 see cpio(1) and Documentation/early-userspace/buffer-format.txt). 104 see cpio(1) and Documentation/early-userspace/buffer-format.txt). The
105 kernel's cpio extraction code is not only extremely small, it's also
106 __init data that can be discarded during the boot process.
103 107
104 - The program run by the old initrd (which was called /initrd, not /init) did 108 - The program run by the old initrd (which was called /initrd, not /init) did
105 some setup and then returned to the kernel, while the init program from 109 some setup and then returned to the kernel, while the init program from
@@ -124,13 +128,14 @@ Populating initramfs:
124 128
125The 2.6 kernel build process always creates a gzipped cpio format initramfs 129The 2.6 kernel build process always creates a gzipped cpio format initramfs
126archive and links it into the resulting kernel binary. By default, this 130archive and links it into the resulting kernel binary. By default, this
127archive is empty (consuming 134 bytes on x86). The config option 131archive is empty (consuming 134 bytes on x86).
128CONFIG_INITRAMFS_SOURCE (for some reason buried under devices->block devices 132
129in menuconfig, and living in usr/Kconfig) can be used to specify a source for 133The config option CONFIG_INITRAMFS_SOURCE (for some reason buried under
130the initramfs archive, which will automatically be incorporated into the 134devices->block devices in menuconfig, and living in usr/Kconfig) can be used
131resulting binary. This option can point to an existing gzipped cpio archive, a 135to specify a source for the initramfs archive, which will automatically be
132directory containing files to be archived, or a text file specification such 136incorporated into the resulting binary. This option can point to an existing
133as the following example: 137gzipped cpio archive, a directory containing files to be archived, or a text
138file specification such as the following example:
134 139
135 dir /dev 755 0 0 140 dir /dev 755 0 0
136 nod /dev/console 644 0 0 c 5 1 141 nod /dev/console 644 0 0 c 5 1
@@ -146,23 +151,84 @@ as the following example:
146Run "usr/gen_init_cpio" (after the kernel build) to get a usage message 151Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
147documenting the above file format. 152documenting the above file format.
148 153
149One advantage of the text file is that root access is not required to 154One advantage of the configuration file is that root access is not required to
150set permissions or create device nodes in the new archive. (Note that those 155set permissions or create device nodes in the new archive. (Note that those
151two example "file" entries expect to find files named "init.sh" and "busybox" in 156two example "file" entries expect to find files named "init.sh" and "busybox" in
152a directory called "initramfs", under the linux-2.6.* directory. See 157a directory called "initramfs", under the linux-2.6.* directory. See
153Documentation/early-userspace/README for more details.) 158Documentation/early-userspace/README for more details.)
154 159
155The kernel does not depend on external cpio tools, gen_init_cpio is created 160The kernel does not depend on external cpio tools. If you specify a
156from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's 161directory instead of a configuration file, the kernel's build infrastructure
157boot-time extractor is also (obviously) self-contained. However, if you _do_ 162creates a configuration file from that directory (usr/Makefile calls
158happen to have cpio installed, the following command line can extract the 163scripts/gen_initramfs_list.sh), and proceeds to package up that directory
159generated cpio image back into its component files: 164using the config file (by feeding it to usr/gen_init_cpio, which is created
165from usr/gen_init_cpio.c). The kernel's build-time cpio creation code is
166entirely self-contained, and the kernel's boot-time extractor is also
167(obviously) self-contained.
168
169The one thing you might need external cpio utilities installed for is creating
170or extracting your own preprepared cpio files to feed to the kernel build
171(instead of a config file or directory).
172
173The following command line can extract a cpio image (either by the above script
174or by the kernel build) back into its component files:
160 175
161 cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames 176 cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
162 177
178The following shell script can create a prebuilt cpio archive you can
179use in place of the above config file:
180
181 #!/bin/sh
182
183 # Copyright 2006 Rob Landley <rob@landley.net> and TimeSys Corporation.
184 # Licensed under GPL version 2
185
186 if [ $# -ne 2 ]
187 then
188 echo "usage: mkinitramfs directory imagename.cpio.gz"
189 exit 1
190 fi
191
192 if [ -d "$1" ]
193 then
194 echo "creating $2 from $1"
195 (cd "$1"; find . | cpio -o -H newc | gzip) > "$2"
196 else
197 echo "First argument must be a directory"
198 exit 1
199 fi
200
201Note: The cpio man page contains some bad advice that will break your initramfs
202archive if you follow it. It says "A typical way to generate the list
203of filenames is with the find command; you should give find the -depth option
204to minimize problems with permissions on directories that are unwritable or not
205searchable." Don't do this when creating initramfs.cpio.gz images; it won't
206work. The Linux kernel cpio extractor won't create files in a directory that
207doesn't exist, so the directory entries must go before the files that go in
208those directories. The above script gets them in the right order.
209
210External initramfs images:
211--------------------------
212
213If the kernel has initrd support enabled, an external cpio.gz archive can also
214be passed into a 2.6 kernel in place of an initrd. In this case, the kernel
215will autodetect the type (initramfs, not initrd) and extract the external cpio
216archive into rootfs before trying to run /init.
217
218This has the memory efficiency advantages of initramfs (no ramdisk block
219device) but the separate packaging of initrd (which is nice if you have
220non-GPL code you'd like to run from initramfs, without conflating it with
221the GPL licensed Linux kernel binary).
222
223It can also be used to supplement the kernel's built-in initramfs image. The
224files in the external archive will overwrite any conflicting files in
225the built-in initramfs archive. Some distributors also prefer to customize
226a single kernel image with task-specific initramfs images, without recompiling.
227
163Contents of initramfs: 228Contents of initramfs:
164---------------------- 229----------------------
165 230
231An initramfs archive is a complete self-contained root filesystem for Linux.
166If you don't already understand what shared libraries, devices, and paths 232If you don't already understand what shared libraries, devices, and paths
167you need to get a minimal root filesystem up and running, here are some 233you need to get a minimal root filesystem up and running, here are some
168references: 234references:
@@ -176,13 +242,36 @@ code against, along with some related utilities. It is BSD licensed.
176 242
177I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) 243I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
178myself. These are LGPL and GPL, respectively. (A self-contained initramfs 244myself. These are LGPL and GPL, respectively. (A self-contained initramfs
179package is planned for the busybox 1.2 release.) 245package is planned for the busybox 1.3 release.)
180 246
181In theory you could use glibc, but that's not well suited for small embedded 247In theory you could use glibc, but that's not well suited for small embedded
182uses like this. (A "hello world" program statically linked against glibc is 248uses like this. (A "hello world" program statically linked against glibc is
183over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do 249over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do
184name lookups, even when otherwise statically linked.) 250name lookups, even when otherwise statically linked.)
185 251
252A good first step is to get initramfs to run a statically linked "hello world"
253program as init, and test it under an emulator like qemu (www.qemu.org) or
254User Mode Linux, like so:
255
256 cat > hello.c << EOF
257 #include <stdio.h>
258 #include <unistd.h>
259
260 int main(int argc, char *argv[])
261 {
262 printf("Hello world!\n");
263 sleep(999999999);
264 }
265 EOF
266 gcc -static hello.c -o init
267 echo init | cpio -o -H newc | gzip > test.cpio.gz
268 # Testing external initramfs using the initrd loading mechanism.
269 qemu -kernel /boot/vmlinuz -initrd test.cpio.gz /dev/zero
270
271When debugging a normal root filesystem, it's nice to be able to boot with
272"init=/bin/sh". The initramfs equivalent is "rdinit=/bin/sh", and it's
273just as useful.
274
186Why cpio rather than tar? 275Why cpio rather than tar?
187------------------------- 276-------------------------
188 277
@@ -241,7 +330,7 @@ the above threads) is:
241Future directions: 330Future directions:
242------------------ 331------------------
243 332
244Today (2.6.14), initramfs is always compiled in, but not always used. The 333Today (2.6.16), initramfs is always compiled in, but not always used. The
245kernel falls back to legacy boot code that is reached only if initramfs does 334kernel falls back to legacy boot code that is reached only if initramfs does
246not contain an /init program. The fallback is legacy code, there to ensure a 335not contain an /init program. The fallback is legacy code, there to ensure a
247smooth transition and allowing early boot functionality to gradually move to 336smooth transition and allowing early boot functionality to gradually move to
@@ -258,8 +347,9 @@ and so on.
258 347
259This kind of complexity (which inevitably includes policy) is rightly handled 348This kind of complexity (which inevitably includes policy) is rightly handled
260in userspace. Both klibc and busybox/uClibc are working on simple initramfs 349in userspace. Both klibc and busybox/uClibc are working on simple initramfs
261packages to drop into a kernel build, and when standard solutions are ready 350packages to drop into a kernel build.
262and widely deployed, the kernel's legacy early boot code will become obsolete
263and a candidate for the feature removal schedule.
264 351
265But that's a while off yet. 352The klibc package has now been accepted into Andrew Morton's 2.6.17-mm tree.
353The kernel's current early boot code (partition detection, etc) will probably
354be migrated into a default initramfs, automatically created and used by the
355kernel build.
diff --git a/Documentation/filesystems/relay.txt b/Documentation/filesystems/relay.txt
new file mode 100644
index 000000000000..d6788dae0349
--- /dev/null
+++ b/Documentation/filesystems/relay.txt
@@ -0,0 +1,479 @@
1relay interface (formerly relayfs)
2==================================
3
4The relay interface provides a means for kernel applications to
5efficiently log and transfer large quantities of data from the kernel
6to userspace via user-defined 'relay channels'.
7
8A 'relay channel' is a kernel->user data relay mechanism implemented
9as a set of per-cpu kernel buffers ('channel buffers'), each
10represented as a regular file ('relay file') in user space. Kernel
11clients write into the channel buffers using efficient write
12functions; these automatically log into the current cpu's channel
13buffer. User space applications mmap() or read() from the relay files
14and retrieve the data as it becomes available. The relay files
15themselves are files created in a host filesystem, e.g. debugfs, and
16are associated with the channel buffers using the API described below.
17
18The format of the data logged into the channel buffers is completely
19up to the kernel client; the relay interface does however provide
20hooks which allow kernel clients to impose some structure on the
21buffer data. The relay interface doesn't implement any form of data
22filtering - this also is left to the kernel client. The purpose is to
23keep things as simple as possible.
24
25This document provides an overview of the relay interface API. The
26details of the function parameters are documented along with the
27functions in the relay interface code - please see that for details.
28
29Semantics
30=========
31
32Each relay channel has one buffer per CPU, each buffer has one or more
33sub-buffers. Messages are written to the first sub-buffer until it is
34too full to contain a new message, in which case it is written to
35the next (if available). Messages are never split across sub-buffers.
36At this point, userspace can be notified so it empties the first
37sub-buffer, while the kernel continues writing to the next.
38
39When notified that a sub-buffer is full, the kernel knows how many
40bytes of it are padding i.e. unused space occurring because a complete
41message couldn't fit into a sub-buffer. Userspace can use this
42knowledge to copy only valid data.
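
The sub-buffer and padding rules above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: it is not
the kernel implementation, the names (subbuf_model, model_write) and the
sizes are hypothetical, and the model stops instead of wrapping when it
runs out of sub-buffers.

```c
#include <stddef.h>

#define MODEL_SUBBUF_SIZE 16	/* hypothetical sub-buffer size in bytes */
#define MODEL_N_SUBBUFS    4	/* hypothetical number of sub-buffers */

struct subbuf_model {
	size_t cur;				/* sub-buffer being written */
	size_t offset;				/* bytes used in that sub-buffer */
	size_t padding[MODEL_N_SUBBUFS];	/* unused tail bytes per sub-buffer */
};

/*
 * Place a message of len bytes. A message is never split: if it does not
 * fit in the current sub-buffer, the remaining space is recorded as
 * padding and the message goes to the next sub-buffer.
 * Returns the index of the sub-buffer that received the message, or -1
 * if the message can never fit or no sub-buffer is left.
 */
int model_write(struct subbuf_model *m, size_t len)
{
	if (len > MODEL_SUBBUF_SIZE)
		return -1;

	if (m->offset + len > MODEL_SUBBUF_SIZE) {
		if (m->cur + 1 >= MODEL_N_SUBBUFS)
			return -1;	/* this simple model does not wrap */
		m->padding[m->cur] = MODEL_SUBBUF_SIZE - m->offset;
		m->cur++;
		m->offset = 0;
	}

	m->offset += len;
	return (int)m->cur;
}
```

With 16-byte sub-buffers, two 10-byte messages land in sub-buffers 0 and 1,
and sub-buffer 0 is left with 6 bytes of padding that a reader should skip.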
43
44After copying it, userspace can notify the kernel that a sub-buffer
45has been consumed.
46
47A relay channel can operate in a mode where it will overwrite data not
48yet collected by userspace, and not wait for it to be consumed.
49
50The relay channel itself does not provide for communication of such
51data between userspace and kernel, allowing the kernel side to remain
52simple and not impose a single interface on userspace. It does
53provide a set of examples and a separate helper though, described
54below.
55
56The read() interface both removes padding and internally consumes the
57read sub-buffers; thus in cases where read(2) is being used to drain
58the channel buffers, special-purpose communication between kernel and
59user isn't necessary for basic operation.
60
61One of the major goals of the relay interface is to provide a low
62overhead mechanism for conveying kernel data to userspace. While the
63read() interface is easy to use, it's not as efficient as the mmap()
64approach; the example code attempts to make the tradeoff between the
65two approaches as small as possible.
66
67klog and relay-apps example code
68================================
69
70The relay interface itself is ready to use, but to make things easier,
71a couple of simple utility functions and a set of examples are provided.
72
73The relay-apps example tarball, available on the relay sourceforge
74site, contains a set of self-contained examples, each consisting of a
75pair of .c files containing boilerplate code for each of the user and
76kernel sides of a relay application. When combined these two sets of
77boilerplate code provide glue to easily stream data to disk, without
78having to bother with mundane housekeeping chores.
79
80The 'klog debugging functions' patch (klog.patch in the relay-apps
81tarball) provides a couple of high-level logging functions to the
82kernel which allow writing formatted text or raw data to a channel,
83regardless of whether a channel to write into exists or not, or even
84whether the relay interface is compiled into the kernel or not. These
85functions allow you to put unconditional 'trace' statements anywhere
86in the kernel or kernel modules; only when there is a 'klog handler'
87registered will data actually be logged (see the klog and kleak
88examples for details).
89
90It is of course possible to use the relay interface from scratch,
91i.e. without using any of the relay-apps example code or klog, but
92you'll have to implement communication between userspace and kernel,
93allowing both to convey the state of buffers (full, empty, amount of
94padding). The read() interface both removes padding and internally
95consumes the read sub-buffers; thus in cases where read(2) is being
96used to drain the channel buffers, special-purpose communication
97between kernel and user isn't necessary for basic operation. Things
98such as buffer-full conditions would still need to be communicated via
99some channel though.
100
101klog and the relay-apps examples can be found in the relay-apps
102tarball on http://relayfs.sourceforge.net
103
104The relay interface user space API
105==================================
106
107The relay interface implements basic file operations for user space
108access to relay channel buffer data. Here are the file operations
109that are available and some comments regarding their behavior:
110
111open() enables user to open an _existing_ channel buffer.
112
113mmap() results in channel buffer being mapped into the caller's
114 memory space. Note that you can't do a partial mmap - you
115 must map the entire file, which is NRBUF * SUBBUFSIZE.
116
117read() read the contents of a channel buffer. The bytes read are
118 'consumed' by the reader, i.e. they won't be available
119 again to subsequent reads. If the channel is being used
120 in no-overwrite mode (the default), it can be read at any
121 time even if there's an active kernel writer. If the
122 channel is being used in overwrite mode and there are
123 active channel writers, results may be unpredictable -
124 users should make sure that all logging to the channel has
125 ended before using read() with overwrite mode. Sub-buffer
126 padding is automatically removed and will not be seen by
127 the reader.
128
129sendfile() transfer data from a channel buffer to an output file
130 descriptor. Sub-buffer padding is automatically removed
131 and will not be seen by the reader.
132
133poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are
134 notified when sub-buffer boundaries are crossed.
135
136close() decrements the channel buffer's refcount. When the refcount
137 reaches 0, i.e. when no process or kernel client has the
138 buffer open, the channel buffer is freed.
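
As a rough illustration of the poll() and read() entries above, a user
space consumer might drain a relay file with a loop like the following.
This is a minimal sketch, not code from relay-apps: drain_fd() and its
timeout parameter are hypothetical, and it only uses standard POSIX calls,
so it works on any readable file descriptor.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Copy data from in_fd to out_fd until read() reports no more data or
 * poll() times out. Returns the number of bytes copied, or -1 on error.
 */
ssize_t drain_fd(int in_fd, int out_fd, int timeout_ms)
{
	char chunk[4096];
	ssize_t total = 0;
	struct pollfd pfd = { .fd = in_fd, .events = POLLIN };

	for (;;) {
		int ready = poll(&pfd, 1, timeout_ms);

		if (ready < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (ready == 0)
			break;		/* nothing arrived within the timeout */

		/* Even on POLLERR/POLLHUP, let read() decide what is left. */
		ssize_t n = read(in_fd, chunk, sizeof(chunk));

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;		/* no more data available */

		/* write() may be partial, so loop until the chunk is out. */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(out_fd, chunk + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
	return total;
}
```

Because read() removes padding and consumes the sub-buffers it reads, a
loop of this shape needs no extra kernel/user protocol for basic operation.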
139
140In order for a user application to make use of relay files, the
141host filesystem must be mounted. For example,
142
143 mount -t debugfs debugfs /debug
144
145NOTE: the host filesystem doesn't need to be mounted for kernel
146 clients to create or use channels - it only needs to be
147 mounted when user space applications need access to the buffer
148 data.
149
150
151The relay interface kernel API
152==============================
153
154Here's a summary of the API the relay interface provides to in-kernel clients:
155
156
157 channel management functions:
158
159 relay_open(base_filename, parent, subbuf_size, n_subbufs,
160 callbacks)
161 relay_close(chan)
162 relay_flush(chan)
163 relay_reset(chan)
164
165 channel management typically called on instigation of userspace:
166
167 relay_subbufs_consumed(chan, cpu, subbufs_consumed)
168
169 write functions:
170
171 relay_write(chan, data, length)
172 __relay_write(chan, data, length)
173 relay_reserve(chan, length)
174
175 callbacks:
176
177 subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
178 buf_mapped(buf, filp)
179 buf_unmapped(buf, filp)
180 create_buf_file(filename, parent, mode, buf, is_global)
181 remove_buf_file(dentry)
182
183 helper functions:
184
185 relay_buf_full(buf)
186 subbuf_start_reserve(buf, length)
187
188
189Creating a channel
190------------------
191
192relay_open() is used to create a channel, along with its per-cpu
193channel buffers. Each channel buffer will have an associated file
194created for it in the host filesystem, which can be mmapped or
195read from in user space. The files are named basename0...basenameN-1
196where N is the number of online cpus, and by default will be created
197in the root of the filesystem (if the parent param is NULL). If you
198want a directory structure to contain your relay files, you should
199create it using the host filesystem's directory creation function,
200e.g. debugfs_create_dir(), and pass the parent directory to
201relay_open(). Users are responsible for cleaning up any directory
202structure they create, when the channel is closed - again the host
203filesystem's directory removal functions should be used for that,
204e.g. debugfs_remove().
205
206In order for a channel to be created and the host filesystem's files
207associated with its channel buffers, the user must provide definitions
208for two callback functions, create_buf_file() and remove_buf_file().
209create_buf_file() is called once for each per-cpu buffer from
210relay_open() and allows the user to create the file which will be used
211to represent the corresponding channel buffer. The callback should
212return the dentry of the file created to represent the channel buffer.
213remove_buf_file() must also be defined; it's responsible for deleting
214the file(s) created in create_buf_file() and is called during
215relay_close().
216
217Here are some typical definitions for these callbacks, in this case
218using debugfs:
219
220/*
221 * create_buf_file() callback. Creates relay file in debugfs.
222 */
223static struct dentry *create_buf_file_handler(const char *filename,
224 struct dentry *parent,
225 int mode,
226 struct rchan_buf *buf,
227 int *is_global)
228{
229 return debugfs_create_file(filename, mode, parent, buf,
230 &relay_file_operations);
231}
232
233/*
234 * remove_buf_file() callback. Removes relay file from debugfs.
235 */
236static int remove_buf_file_handler(struct dentry *dentry)
237{
238 debugfs_remove(dentry);
239
240 return 0;
241}
242
243/*
244 * relay interface callbacks
245 */
246static struct rchan_callbacks relay_callbacks =
247{
248 .create_buf_file = create_buf_file_handler,
249 .remove_buf_file = remove_buf_file_handler,
250};
251
252And an example relay_open() invocation using them:
253
254 chan = relay_open("cpu", NULL, SUBBUF_SIZE, N_SUBBUFS, &relay_callbacks);
255
256If the create_buf_file() callback fails, or isn't defined, channel
257creation and thus relay_open() will fail.
258
259The total size of each per-cpu buffer is calculated by multiplying the
260number of sub-buffers by the sub-buffer size passed into relay_open().
261The idea behind sub-buffers is that they're basically an extension of
262double-buffering to N buffers, and they also allow applications to
263easily implement random-access-on-buffer-boundary schemes, which can
264be important for some high-volume applications. The number and size
265of sub-buffers is completely dependent on the application and even for
266the same application, different conditions will warrant different
267values for these parameters at different times. Typically, the right
268values to use are best decided after some experimentation; in general,
269though, it's safe to assume that having only 1 sub-buffer is a bad
270idea - you're guaranteed to either overwrite data or lose events
271depending on the channel mode being used.
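
Since the per-cpu buffer size is the sub-buffer size times the number of
sub-buffers, and a relay file must be mapped in full, a user space mapping
helper could look like the sketch below. The helper name
map_channel_buffer() is hypothetical, and the PROT_READ/MAP_SHARED flags
are an assumption rather than something this document specifies; the two
size arguments must match the values the kernel client passed to
relay_open().

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map an entire per-cpu channel buffer read-only.
 * On success returns the mapping and stores its length in *len_out;
 * returns MAP_FAILED on bad arguments, overflow, or mmap() failure.
 */
void *map_channel_buffer(int fd, size_t subbuf_size, size_t n_subbufs,
			 size_t *len_out)
{
	/* Reject empty geometries and sizes whose product would overflow. */
	if (subbuf_size == 0 || n_subbufs == 0 ||
	    subbuf_size > SIZE_MAX / n_subbufs)
		return MAP_FAILED;

	*len_out = subbuf_size * n_subbufs;	/* whole buffer, never partial */
	return mmap(NULL, *len_out, PROT_READ, MAP_SHARED, fd, 0);
}
```

The caller is expected to munmap() the region with the returned length
when it is done with the buffer.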
272
273The create_buf_file() implementation can also be defined in such a way
274as to allow the creation of a single 'global' buffer instead of the
275default per-cpu set. This can be useful for applications interested
276mainly in seeing the relative ordering of system-wide events without
277the need to bother with saving explicit timestamps for the purpose of
278merging/sorting per-cpu files in a postprocessing step.
279
280To have relay_open() create a global buffer, the create_buf_file()
281implementation should set the value of the is_global outparam to a
282non-zero value in addition to creating the file that will be used to
283represent the single buffer. In the case of a global buffer,
284create_buf_file() and remove_buf_file() will be called only once. The
285normal channel-writing functions, e.g. relay_write(), can still be
286used - writes from any cpu will transparently end up in the global
287buffer - but since it is a global buffer, callers should make sure
288they use the proper locking for such a buffer, either by wrapping
289writes in a spinlock, or by copying a write function from relay.h and
290creating a local version that internally does the proper locking.
291
292Channel 'modes'
293---------------
294
295relay channels can be used in either of two modes - 'overwrite' or
296'no-overwrite'. The mode is entirely determined by the implementation
297of the subbuf_start() callback, as described below. The default if no
298subbuf_start() callback is defined is 'no-overwrite' mode. If the
299default mode suits your needs, and you plan to use the read()
300interface to retrieve channel data, you can ignore the details of this
301section, as it pertains mainly to mmap() implementations.
302
303In 'overwrite' mode, also known as 'flight recorder' mode, writes
304continuously cycle around the buffer and will never fail, but will
305unconditionally overwrite old data regardless of whether it's actually
306been consumed. In no-overwrite mode, writes will fail, i.e. data will
307be lost, if the number of unconsumed sub-buffers equals the total
308number of sub-buffers in the channel. It should be clear that if
309there is no consumer or if the consumer can't consume sub-buffers fast
310enough, data will be lost in either case; the only difference is
311whether data is lost from the beginning or the end of a buffer.
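
The difference between the two modes can be modelled in user space by
counting produced and consumed sub-buffers. The following is a simplified
sketch under stated assumptions, not the kernel code: relay_model and the
model_*() functions are hypothetical names, and each call stands for the
writer filling one whole sub-buffer.

```c
#include <stdbool.h>

struct relay_model {
	unsigned int n_subbufs;		/* sub-buffers in the buffer */
	unsigned int produced;		/* sub-buffers filled by the writer */
	unsigned int consumed;		/* sub-buffers released by the reader */
	unsigned int dropped;		/* fills rejected (no-overwrite) */
	unsigned int overwritten;	/* unread sub-buffers lost (overwrite) */
};

/* Full means every sub-buffer holds data the reader has not consumed. */
static bool model_buf_full(const struct relay_model *m)
{
	return m->produced - m->consumed >= m->n_subbufs;
}

/* The writer tries to fill one more sub-buffer. Returns 1 if it could. */
int model_fill_subbuf(struct relay_model *m, bool overwrite)
{
	if (model_buf_full(m)) {
		if (!overwrite) {
			m->dropped++;	/* newest data is lost */
			return 0;
		}
		m->consumed++;		/* oldest unread data is lost */
		m->overwritten++;
	}
	m->produced++;
	return 1;
}

/* The reader reports that it has finished with count sub-buffers. */
void model_consume(struct relay_model *m, unsigned int count)
{
	unsigned int ready = m->produced - m->consumed;

	m->consumed += count < ready ? count : ready;
}
```

With two sub-buffers and no consumer, the third fill is dropped in
no-overwrite mode, while in overwrite mode it succeeds at the cost of the
oldest sub-buffer.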
312
313As explained above, a relay channel is made up of one or more
314per-cpu channel buffers, each implemented as a circular buffer
315subdivided into one or more sub-buffers. Messages are written into
316the current sub-buffer of the channel's current per-cpu buffer via the
317write functions described below. Whenever a message can't fit into
318the current sub-buffer, because there's no room left for it, the
319client is notified via the subbuf_start() callback that a switch to a
320new sub-buffer is about to occur. The client uses this callback to 1)
321initialize the next sub-buffer if appropriate 2) finalize the previous
322sub-buffer if appropriate and 3) return a boolean value indicating
323whether or not to actually move on to the next sub-buffer.
324
325To implement 'no-overwrite' mode, the kernel client would provide
326an implementation of the subbuf_start() callback something like the
327following:
328
329static int subbuf_start(struct rchan_buf *buf,
330 void *subbuf,
331 void *prev_subbuf,
332 unsigned int prev_padding)
333{
334 if (prev_subbuf)
335 *((unsigned *)prev_subbuf) = prev_padding;
336
337 if (relay_buf_full(buf))
338 return 0;
339
340 subbuf_start_reserve(buf, sizeof(unsigned int));
341
342 return 1;
343}
344
345If the current buffer is full, i.e. all sub-buffers remain unconsumed,
346the callback returns 0 to indicate that the buffer switch should not
347occur yet, i.e. until the consumer has had a chance to read the
348current set of ready sub-buffers. For the relay_buf_full() function
349to make sense, the consumer is responsible for notifying the relay
350interface when sub-buffers have been consumed via
351relay_subbufs_consumed(). Any subsequent attempts to write into the
352buffer will again invoke the subbuf_start() callback with the same
353parameters; only when the consumer has consumed one or more of the
354ready sub-buffers will relay_buf_full() return 0, in which case the
355buffer switch can continue.
356
357The implementation of the subbuf_start() callback for 'overwrite' mode
358would be very similar:
359
360static int subbuf_start(struct rchan_buf *buf,
361 void *subbuf,
362 void *prev_subbuf,
363 unsigned int prev_padding)
364{
365 if (prev_subbuf)
366 *((unsigned *)prev_subbuf) = prev_padding;
367
368 subbuf_start_reserve(buf, sizeof(unsigned int));
369
370 return 1;
371}
372
373In this case, the relay_buf_full() check is meaningless and the
374callback always returns 1, causing the buffer switch to occur
375unconditionally. It's also meaningless for the client to use the
376relay_subbufs_consumed() function in this mode, as it's never
377consulted.
378
379The default subbuf_start() implementation, used if the client doesn't
380define any callbacks, or doesn't define the subbuf_start() callback,
381implements the simplest possible 'no-overwrite' mode, i.e. it does
382nothing but return 0.
383
384Header information can be reserved at the beginning of each sub-buffer
385by calling the subbuf_start_reserve() helper function from within the
386subbuf_start() callback. This reserved area can be used to store
387whatever information the client wants. In the example above, room is
388reserved in each sub-buffer to store the padding count for that
389sub-buffer. This is filled in for the previous sub-buffer in the
390subbuf_start() implementation; the padding value for the previous
391sub-buffer is passed into the subbuf_start() callback along with a
392pointer to the previous sub-buffer, since the padding value isn't
393known until a sub-buffer is filled. The subbuf_start() callback is
394also called for the first sub-buffer when the channel is opened, to
395give the client a chance to reserve space in it. In this case the
396previous sub-buffer pointer passed into the callback will be NULL, so
397the client should check the value of the prev_subbuf pointer before
398writing into the previous sub-buffer.
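
On the user space side, a reader of an mmapped buffer could use such a
header to find the valid data in each sub-buffer. The sketch below assumes
the layout produced by the example callbacks above, i.e. an unsigned int
padding count stored at the start of every sub-buffer; subbuf_view and
subbuf_payload() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

struct subbuf_view {
	const unsigned char *data;	/* first byte after the header */
	size_t len;			/* valid bytes, padding excluded */
};

/*
 * Locate the valid payload of one finished sub-buffer.
 * Returns 0 on success, -1 if the header is missing or inconsistent.
 */
int subbuf_payload(const void *subbuf, size_t subbuf_size,
		   struct subbuf_view *out)
{
	unsigned int padding;

	if (subbuf_size < sizeof(padding))
		return -1;

	/* memcpy avoids alignment assumptions about the mapping. */
	memcpy(&padding, subbuf, sizeof(padding));
	if (padding > subbuf_size - sizeof(padding))
		return -1;

	out->data = (const unsigned char *)subbuf + sizeof(padding);
	out->len = subbuf_size - sizeof(padding) - padding;
	return 0;
}
```

A consumer would call this once per ready sub-buffer, copy out->len bytes,
and then tell the kernel side that the sub-buffer has been consumed.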
399
400Writing to a channel
401--------------------
402
403Kernel clients write data into the current cpu's channel buffer using
404relay_write() or __relay_write(). relay_write() is the main logging
405function - it uses local_irq_save() to protect the buffer and should be
406used if you might be logging from interrupt context. If you know
407you'll never be logging from interrupt context, you can use
408__relay_write(), which only disables preemption. These functions
409don't return a value, so you can't determine whether or not they
410failed - the assumption is that you wouldn't want to check a return
411value in the fast logging path anyway, and that they'll always succeed
412unless the buffer is full and no-overwrite mode is being used, in
413which case you can detect a failed write in the subbuf_start()
414callback by calling the relay_buf_full() helper function.
415
416relay_reserve() is used to reserve a slot in a channel buffer which
417can be written to later. This would typically be used in applications
418that need to write directly into a channel buffer without having to
419stage data in a temporary buffer beforehand. Because the actual write
420may not happen immediately after the slot is reserved, applications
421using relay_reserve() can keep a count of the number of bytes actually
422written, either in space reserved in the sub-buffers themselves or as
423a separate array. See the 'reserve' example in the relay-apps tarball
424at http://relayfs.sourceforge.net for an example of how this can be
425done. Because the write is under control of the client and is
426separated from the reserve, relay_reserve() doesn't protect the buffer
427at all - it's up to the client to provide the appropriate
428synchronization when using relay_reserve().
429
430Closing a channel
431-----------------
432
433The client calls relay_close() when it's finished using the channel.
434The channel and its associated buffers are destroyed when there are no
435longer any references to any of the channel buffers. relay_flush()
436forces a sub-buffer switch on all the channel buffers, and can be used
437to finalize and process the last sub-buffers before the channel is
438closed.
439
440Misc
441----
442
443Some applications may want to keep a channel around and re-use it
444rather than open and close a new channel for each use. relay_reset()
445can be used for this purpose - it resets a channel to its initial
446state without reallocating channel buffer memory or destroying
447existing mappings. It should however only be called when it's safe to
448do so, i.e. when the channel isn't currently being written to.
449
450Finally, there are a couple of utility callbacks that can be used for
451different purposes. buf_mapped() is called whenever a channel buffer
452is mmapped from user space and buf_unmapped() is called when it's
453unmapped. The client can use this notification to trigger actions
454within the kernel application, such as enabling/disabling logging to
455the channel.
456
457
458Resources
459=========
460
461For news, example code, mailing list, etc. see the relay interface homepage:
462
463 http://relayfs.sourceforge.net
464
465
466Credits
467=======
468
469The ideas and specs for the relay interface came about as a result of
470discussions on tracing involving the following:
471
472Michel Dagenais <michel.dagenais@polymtl.ca>
473Richard Moore <richardj_moore@uk.ibm.com>
474Bob Wisniewski <bob@watson.ibm.com>
475Karim Yaghmour <karim@opersys.com>
476Tom Zanussi <zanussi@us.ibm.com>
477
478Also thanks to Hubertus Franke for a lot of useful suggestions and bug
479reports.
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt
deleted file mode 100644
index 5832377b7340..000000000000
--- a/Documentation/filesystems/relayfs.txt
+++ /dev/null
@@ -1,442 +0,0 @@
1
2relayfs - a high-speed data relay filesystem
3============================================
4
5relayfs is a filesystem designed to provide an efficient mechanism for
6tools and facilities to relay large and potentially sustained streams
7of data from kernel space to user space.
8
9The main abstraction of relayfs is the 'channel'. A channel consists
10of a set of per-cpu kernel buffers each represented by a file in the
11relayfs filesystem. Kernel clients write into a channel using
12efficient write functions which automatically log to the current cpu's
13channel buffer. User space applications mmap() the per-cpu files and
14retrieve the data as it becomes available.
15
16The format of the data logged into the channel buffers is completely
17up to the relayfs client; relayfs does however provide hooks which
18allow clients to impose some structure on the buffer data. Nor does
19relayfs implement any form of data filtering - this also is left to
20the client. The purpose is to keep relayfs as simple as possible.
21
22This document provides an overview of the relayfs API. The details of
23the function parameters are documented along with the functions in the
24filesystem code - please see that for details.
25
26Semantics
27=========
28
29Each relayfs channel has one buffer per CPU, each buffer has one or
30more sub-buffers. Messages are written to the first sub-buffer until
31it is too full to contain a new message, in which case it it is
32written to the next (if available). Messages are never split across
33sub-buffers. At this point, userspace can be notified so it empties
34the first sub-buffer, while the kernel continues writing to the next.
35
36When notified that a sub-buffer is full, the kernel knows how many
37bytes of it are padding i.e. unused. Userspace can use this knowledge
38to copy only valid data.
39
40After copying it, userspace can notify the kernel that a sub-buffer
41has been consumed.
42
43relayfs can operate in a mode where it will overwrite data not yet
44collected by userspace, and not wait for it to consume it.
45
46relayfs itself does not provide for communication of such data between
47userspace and kernel, allowing the kernel side to remain simple and
48not impose a single interface on userspace. It does provide a set of
49examples and a separate helper though, described below.
50
51klog and relay-apps example code
52================================
53
54relayfs itself is ready to use, but to make things easier, a couple
55simple utility functions and a set of examples are provided.
56
57The relay-apps example tarball, available on the relayfs sourceforge
58site, contains a set of self-contained examples, each consisting of a
59pair of .c files containing boilerplate code for each of the user and
60kernel sides of a relayfs application; combined these two sets of
61boilerplate code provide glue to easily stream data to disk, without
62having to bother with mundane housekeeping chores.
63
64The 'klog debugging functions' patch (klog.patch in the relay-apps
65tarball) provides a couple of high-level logging functions to the
66kernel which allow writing formatted text or raw data to a channel,
67regardless of whether a channel to write into exists or not, or
68whether relayfs is compiled into the kernel or is configured as a
69module. These functions allow you to put unconditional 'trace'
70statements anywhere in the kernel or kernel modules; only when there
71is a 'klog handler' registered will data actually be logged (see the
72klog and kleak examples for details).
73
74It is of course possible to use relayfs from scratch i.e. without
75using any of the relay-apps example code or klog, but you'll have to
76implement communication between userspace and kernel, allowing both to
77convey the state of buffers (full, empty, amount of padding).
78
79klog and the relay-apps examples can be found in the relay-apps
80tarball on http://relayfs.sourceforge.net
81
82
83The relayfs user space API
84==========================
85
86relayfs implements basic file operations for user space access to
87relayfs channel buffer data. Here are the file operations that are
88available and some comments regarding their behavior:
89
90open() enables user to open an _existing_ buffer.
91
92mmap() results in channel buffer being mapped into the caller's
93 memory space. Note that you can't do a partial mmap - you must
94 map the entire file, which is NRBUF * SUBBUFSIZE.
95
96read() read the contents of a channel buffer. The bytes read are
97 'consumed' by the reader i.e. they won't be available again
98 to subsequent reads. If the channel is being used in
99 no-overwrite mode (the default), it can be read at any time
100 even if there's an active kernel writer. If the channel is
101 being used in overwrite mode and there are active channel
102 writers, results may be unpredictable - users should make
103 sure that all logging to the channel has ended before using
104 read() with overwrite mode.
105
106poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are
107 notified when sub-buffer boundaries are crossed.
108
109close() decrements the channel buffer's refcount. When the refcount
110 reaches 0 i.e. when no process or kernel client has the buffer
111 open, the channel buffer is freed.
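The whole-file mmap() rule above can be sketched as a small piece of
user-space arithmetic. This is only an illustration: the helper names
relay_map_len() and relay_subbuf_addr() are hypothetical and not part of
relayfs, and the sub-buffer size and count are assumed to be known to the
application (for example, agreed on with the kernel client).

```c
#include <stddef.h>

/* Length to pass to mmap() for one per-cpu relay file: the whole buffer. */
size_t relay_map_len(size_t subbuf_size, size_t n_subbufs)
{
	return subbuf_size * n_subbufs;
}

/* Start of sub-buffer 'idx' inside a mapping that begins at 'base'. */
const unsigned char *relay_subbuf_addr(const void *base, size_t subbuf_size,
				       size_t idx)
{
	return (const unsigned char *)base + idx * subbuf_size;
}
```

A reader would pass relay_map_len() as the length argument of mmap() on
the opened buffer file and then walk the mapping one sub-buffer at a time
with relay_subbuf_addr().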
112
113
114In order for a user application to make use of relayfs files, the
115relayfs filesystem must be mounted. For example,
116
117 mount -t relayfs relayfs /mnt/relay
118
119NOTE: relayfs doesn't need to be mounted for kernel clients to create
120 or use channels - it only needs to be mounted when user space
121 applications need access to the buffer data.
122
123
124The relayfs kernel API
125======================
126
127Here's a summary of the API relayfs provides to in-kernel clients:
128
129
130 channel management functions:
131
132 relay_open(base_filename, parent, subbuf_size, n_subbufs,
133 callbacks)
134 relay_close(chan)
135 relay_flush(chan)
136 relay_reset(chan)
137 relayfs_create_dir(name, parent)
138 relayfs_remove_dir(dentry)
139 relayfs_create_file(name, parent, mode, fops, data)
140 relayfs_remove_file(dentry)
141
142 channel management typically called on instigation of userspace:
143
144 relay_subbufs_consumed(chan, cpu, subbufs_consumed)
145
146 write functions:
147
148 relay_write(chan, data, length)
149 __relay_write(chan, data, length)
150 relay_reserve(chan, length)
151
152 callbacks:
153
154 subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
155 buf_mapped(buf, filp)
156 buf_unmapped(buf, filp)
157 create_buf_file(filename, parent, mode, buf, is_global)
158 remove_buf_file(dentry)
159
160 helper functions:
161
162 relay_buf_full(buf)
163 subbuf_start_reserve(buf, length)
164
165
166Creating a channel
167------------------
168
169relay_open() is used to create a channel, along with its per-cpu
170channel buffers. Each channel buffer will have an associated file
171created for it in the relayfs filesystem, which can be opened and
172mmapped from user space if desired. The files are named
173basename0...basenameN-1 where N is the number of online cpus, and by
174default will be created in the root of the filesystem. If you want a
175directory structure to contain your relayfs files, you can create it
176with relayfs_create_dir() and pass the parent directory to
177relay_open(). Clients are responsible for cleaning up any directory
178structure they create when the channel is closed - use
179relayfs_remove_dir() for that.
180
181The total size of each per-cpu buffer is calculated by multiplying the
182number of sub-buffers by the sub-buffer size passed into relay_open().
183The idea behind sub-buffers is that they're basically an extension of
184double-buffering to N buffers, and they also allow applications to
185easily implement random-access-on-buffer-boundary schemes, which can
186be important for some high-volume applications. The number and size
187of sub-buffers is completely dependent on the application and even for
188the same application, different conditions will warrant different
189values for these parameters at different times. Typically, the right
190values to use are best decided after some experimentation; in general,
191though, it's safe to assume that having only 1 sub-buffer is a bad
192idea - you're guaranteed to either overwrite data or lose events
193depending on the channel mode being used.
194
195Channel 'modes'
196---------------
197
198relayfs channels can be used in either of two modes - 'overwrite' or
199'no-overwrite'. The mode is entirely determined by the implementation
200of the subbuf_start() callback, as described below. In 'overwrite'
201mode, also known as 'flight recorder' mode, writes continuously cycle
202around the buffer and will never fail, but will unconditionally
203overwrite old data regardless of whether it's actually been consumed.
204In no-overwrite mode, writes will fail i.e. data will be lost, if the
205number of unconsumed sub-buffers equals the total number of
206sub-buffers in the channel. It should be clear that if there is no
207consumer or if the consumer can't consume sub-buffers fast enough,
208data will be lost in either case; the only difference is whether data
209is lost from the beginning or the end of a buffer.
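The difference between the two modes can be illustrated with a small
user-space model. This is a sketch only, not kernel code: struct model_buf
and the model_*() functions are hypothetical names, and the model counts
whole sub-buffers rather than reproducing the exact timing of the
subbuf_start() callback.

```c
#include <stddef.h>

/* Hypothetical model of one per-cpu buffer, counted in sub-buffers. */
struct model_buf {
	size_t n_subbufs;	/* sub-buffers in the ring */
	size_t produced;	/* sub-buffers filled so far */
	size_t consumed;	/* sub-buffers released by the consumer */
	size_t dropped;		/* no-overwrite: newest data refused */
	size_t overwritten;	/* overwrite: oldest unread data clobbered */
};

/* Full when every sub-buffer still holds unconsumed data. */
int model_buf_full(const struct model_buf *b)
{
	return b->produced - b->consumed >= b->n_subbufs;
}

/* Writer tries to fill one more sub-buffer; returns 1 if it was stored. */
int model_fill_subbuf(struct model_buf *b, int overwrite)
{
	if (model_buf_full(b)) {
		if (!overwrite) {
			b->dropped++;	/* data lost from the end */
			return 0;
		}
		b->overwritten++;	/* data lost from the beginning */
		b->consumed++;		/* oldest unread sub-buffer is gone */
	}
	b->produced++;
	return 1;
}

/* Consumer releases up to 'n' ready sub-buffers. */
void model_consume(struct model_buf *b, size_t n)
{
	size_t ready = b->produced - b->consumed;

	b->consumed += n < ready ? n : ready;
}
```

With four sub-buffers and no consumer, six fills leave two sub-buffers
dropped in no-overwrite mode and two overwritten in overwrite mode; the
amount lost is the same, only which end of the stream it comes from
differs.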
210
211As explained above, a relayfs channel is made up of one or more
212per-cpu channel buffers, each implemented as a circular buffer
213subdivided into one or more sub-buffers. Messages are written into
214the current sub-buffer of the channel's current per-cpu buffer via the
215write functions described below. Whenever a message can't fit into
216the current sub-buffer, because there's no room left for it, the
217client is notified via the subbuf_start() callback that a switch to a
218new sub-buffer is about to occur. The client uses this callback to 1)
219initialize the next sub-buffer if appropriate 2) finalize the previous
220sub-buffer if appropriate and 3) return a boolean value indicating
221whether or not to actually go ahead with the sub-buffer switch.
222
223To implement 'no-overwrite' mode, the kernel client would provide
224an implementation of the subbuf_start() callback something like the
225following:
226
static int subbuf_start(struct rchan_buf *buf,
                        void *subbuf,
                        void *prev_subbuf,
                        unsigned int prev_padding)
{
	if (prev_subbuf)
		*((unsigned *)prev_subbuf) = prev_padding;

	if (relay_buf_full(buf))
		return 0;

	subbuf_start_reserve(buf, sizeof(unsigned int));

	return 1;
}
242
243If the current buffer is full i.e. all sub-buffers remain unconsumed,
244the callback returns 0 to indicate that the buffer switch should not
245occur yet i.e. until the consumer has had a chance to read the current
246set of ready sub-buffers. For the relay_buf_full() function to make
247sense, the consumer is responsible for notifying relayfs when
248sub-buffers have been consumed via relay_subbufs_consumed(). Any
249subsequent attempts to write into the buffer will again invoke the
250subbuf_start() callback with the same parameters; only when the
251consumer has consumed one or more of the ready sub-buffers will
252relay_buf_full() return 0, in which case the buffer switch can
253continue.
254
255The implementation of the subbuf_start() callback for 'overwrite' mode
256would be very similar:
257
static int subbuf_start(struct rchan_buf *buf,
                        void *subbuf,
                        void *prev_subbuf,
                        unsigned int prev_padding)
{
	if (prev_subbuf)
		*((unsigned *)prev_subbuf) = prev_padding;

	subbuf_start_reserve(buf, sizeof(unsigned int));

	return 1;
}
270
271In this case, the relay_buf_full() check is meaningless and the
272callback always returns 1, causing the buffer switch to occur
273unconditionally. It's also meaningless for the client to use the
274relay_subbufs_consumed() function in this mode, as it's never
275consulted.
276
277The default subbuf_start() implementation, used if the client doesn't
278define any callbacks, or doesn't define the subbuf_start() callback,
279implements the simplest possible 'no-overwrite' mode i.e. it does
280nothing but return 0.
281
282Header information can be reserved at the beginning of each sub-buffer
283by calling the subbuf_start_reserve() helper function from within the
284subbuf_start() callback. This reserved area can be used to store
285whatever information the client wants. In the example above, room is
286reserved in each sub-buffer to store the padding count for that
287sub-buffer. This is filled in for the previous sub-buffer in the
288subbuf_start() implementation; the padding value for the previous
289sub-buffer is passed into the subbuf_start() callback along with a
290pointer to the previous sub-buffer, since the padding value isn't
291known until a sub-buffer is filled. The subbuf_start() callback is
292also called for the first sub-buffer when the channel is opened, to
293give the client a chance to reserve space in it. In this case the
294previous sub-buffer pointer passed into the callback will be NULL, so
295the client should check the value of the prev_subbuf pointer before
296writing into the previous sub-buffer.
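On the consumer side, the padding header written by the example callbacks
can be used to copy only valid data. The following is a minimal user-space
sketch under stated assumptions: the helper name subbuf_copy_valid() is
hypothetical, and the layout (an unsigned int padding count at the start
of each sub-buffer, payload after it, unused bytes at the end) is the one
chosen by the example callbacks above, not something relayfs imposes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the valid payload of one full sub-buffer into 'out'.
 * Assumed layout: [unsigned int padding][payload][padding bytes].
 * Returns the payload length, or 0 if the header is inconsistent
 * or 'out' is too small.
 */
size_t subbuf_copy_valid(const void *subbuf, size_t subbuf_size,
			 void *out, size_t out_size)
{
	unsigned int padding;
	size_t hdr = sizeof(padding);
	size_t len;

	if (subbuf_size < hdr)
		return 0;
	memcpy(&padding, subbuf, hdr);	/* no alignment assumptions */
	if (padding > subbuf_size - hdr)
		return 0;
	len = subbuf_size - hdr - padding;
	if (len > out_size)
		return 0;
	memcpy(out, (const unsigned char *)subbuf + hdr, len);
	return len;
}
```

A consumer would call this once per ready sub-buffer before telling the
kernel side that the sub-buffer has been consumed.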
297
298Writing to a channel
299--------------------
300
301kernel clients write data into the current cpu's channel buffer using
302relay_write() or __relay_write(). relay_write() is the main logging
303function - it uses local_irq_save() to protect the buffer and should be
304used if you might be logging from interrupt context. If you know
305you'll never be logging from interrupt context, you can use
306__relay_write(), which only disables preemption. These functions
307don't return a value, so you can't determine whether or not they
308failed - the assumption is that you wouldn't want to check a return
309value in the fast logging path anyway, and that they'll always succeed
310unless the buffer is full and no-overwrite mode is being used, in
311which case you can detect a failed write in the subbuf_start()
312callback by calling the relay_buf_full() helper function.
313
314relay_reserve() is used to reserve a slot in a channel buffer which
315can be written to later. This would typically be used in applications
316that need to write directly into a channel buffer without having to
317stage data in a temporary buffer beforehand. Because the actual write
318may not happen immediately after the slot is reserved, applications
319using relay_reserve() can keep a count of the number of bytes actually
320written, either in space reserved in the sub-buffers themselves or as
321a separate array. See the 'reserve' example in the relay-apps tarball
322at http://relayfs.sourceforge.net for an example of how this can be
323done. Because the write is under control of the client and is
324separated from the reserve, relay_reserve() doesn't protect the buffer
325at all - it's up to the client to provide the appropriate
326synchronization when using relay_reserve().
327
328Closing a channel
329-----------------
330
331The client calls relay_close() when it's finished using the channel.
332The channel and its associated buffers are destroyed when there are no
333longer any references to any of the channel buffers. relay_flush()
334forces a sub-buffer switch on all the channel buffers, and can be used
335to finalize and process the last sub-buffers before the channel is
336closed.
337
338Creating non-relay files
339------------------------
340
341relay_open() automatically creates files in the relayfs filesystem to
342represent the per-cpu kernel buffers; it's often useful for
343applications to be able to create their own files alongside the relay
344files in the relayfs filesystem as well e.g. 'control' files much like
345those created in /proc or debugfs for similar purposes, used to
346communicate control information between the kernel and user sides of a
347relayfs application. For this purpose the relayfs_create_file() and
348relayfs_remove_file() API functions exist. For relayfs_create_file(),
349the caller passes in a set of user-defined file operations to be used
350for the file and an optional void * to a user-specified data item,
351which will be accessible via inode->u.generic_ip (see the relay-apps
352tarball for examples). The file_operations are a required parameter
353to relayfs_create_file() and thus the semantics of these files are
354completely defined by the caller.
355
356See the relay-apps tarball at http://relayfs.sourceforge.net for
357examples of how these non-relay files are meant to be used.
358
359Creating relay files in other filesystems
360-----------------------------------------
361
362By default of course, relay_open() creates relay files in the relayfs
363filesystem. Because relay_file_operations is exported, however, it's
364also possible to create and use relay files in other pseudo-filesystems
365such as debugfs.
366
367For this purpose, two callback functions are provided,
368create_buf_file() and remove_buf_file(). create_buf_file() is called
369once for each per-cpu buffer from relay_open() to allow the client to
370create a file to be used to represent the corresponding buffer; if
371this callback is not defined, the default implementation will create
372and return a file in the relayfs filesystem to represent the buffer.
373The callback should return the dentry of the file created to represent
374the relay buffer. Note that the parent directory passed to
375relay_open() (and passed along to the callback), if specified, must
376exist in the same filesystem the new relay file is created in. If
377create_buf_file() is defined, remove_buf_file() must also be defined;
378it's responsible for deleting the file(s) created in create_buf_file()
379and is called during relay_close().
380
381The create_buf_file() implementation can also be defined in such a way
382as to allow the creation of a single 'global' buffer instead of the
383default per-cpu set. This can be useful for applications interested
384mainly in seeing the relative ordering of system-wide events without
385the need to bother with saving explicit timestamps for the purpose of
386merging/sorting per-cpu files in a postprocessing step.
387
388To have relay_open() create a global buffer, the create_buf_file()
389implementation should set the value of the is_global outparam to a
390non-zero value in addition to creating the file that will be used to
391represent the single buffer. In the case of a global buffer,
392create_buf_file() and remove_buf_file() will be called only once. The
393normal channel-writing functions e.g. relay_write() can still be used
394- writes from any cpu will transparently end up in the global buffer -
395but since it is a global buffer, callers should make sure they use the
396proper locking for such a buffer, either by wrapping writes in a
397spinlock, or by copying a write function from relayfs_fs.h and
398creating a local version that internally does the proper locking.
399
400See the 'exported-relayfile' examples in the relay-apps tarball for
401examples of creating and using relay files in debugfs.
402
403Misc
404----
405
406Some applications may want to keep a channel around and re-use it
407rather than open and close a new channel for each use. relay_reset()
408can be used for this purpose - it resets a channel to its initial
409state without reallocating channel buffer memory or destroying
410existing mappings. It should however only be called when it's safe to
411do so i.e. when the channel isn't currently being written to.
412
413Finally, there are a couple of utility callbacks that can be used for
414different purposes. buf_mapped() is called whenever a channel buffer
415is mmapped from user space and buf_unmapped() is called when it's
416unmapped. The client can use this notification to trigger actions
417within the kernel application, such as enabling/disabling logging to
418the channel.
419
420
421Resources
422=========
423
424For news, example code, mailing list, etc. see the relayfs homepage:
425
426 http://relayfs.sourceforge.net
427
428
429Credits
430=======
431
432The ideas and specs for relayfs came about as a result of discussions
433on tracing involving the following:
434
435Michel Dagenais <michel.dagenais@polymtl.ca>
436Richard Moore <richardj_moore@uk.ibm.com>
437Bob Wisniewski <bob@watson.ibm.com>
438Karim Yaghmour <karim@opersys.com>
439Tom Zanussi <zanussi@us.ibm.com>
440
441Also thanks to Hubertus Franke for a lot of useful suggestions and bug
442reports.
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 9d3aed628bc1..1cb7e8be927a 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -113,8 +113,8 @@ members are defined:
113struct file_system_type { 113struct file_system_type {
114 const char *name; 114 const char *name;
115 int fs_flags; 115 int fs_flags;
116 struct int (*get_sb) (struct file_system_type *, int, 116 int (*get_sb) (struct file_system_type *, int,
117 const char *, void *, struct vfsmount *); 117 const char *, void *, struct vfsmount *);
118 void (*kill_sb) (struct super_block *); 118 void (*kill_sb) (struct super_block *);
119 struct module *owner; 119 struct module *owner;
120 struct file_system_type * next; 120 struct file_system_type * next;
diff --git a/Documentation/hwmon/abituguru b/Documentation/hwmon/abituguru
index 69cdb527d58f..b2c0d61b39a2 100644
--- a/Documentation/hwmon/abituguru
+++ b/Documentation/hwmon/abituguru
@@ -2,13 +2,36 @@ Kernel driver abituguru
2======================= 2=======================
3 3
4Supported chips: 4Supported chips:
5 * Abit uGuru (Hardware Monitor part only) 5 * Abit uGuru revision 1-3 (Hardware Monitor part only)
6 Prefix: 'abituguru' 6 Prefix: 'abituguru'
7 Addresses scanned: ISA 0x0E0 7 Addresses scanned: ISA 0x0E0
8 Datasheet: Not available, this driver is based on reverse engineering. 8 Datasheet: Not available, this driver is based on reverse engineering.
9 A "Datasheet" has been written based on the reverse engineering it 9 A "Datasheet" has been written based on the reverse engineering it
10 should be available in the same dir as this file under the name 10 should be available in the same dir as this file under the name
11 abituguru-datasheet. 11 abituguru-datasheet.
12 Note:
13 The uGuru is a microcontroller with onboard firmware which programs
14 it to behave as a hwmon IC. There are many different revisions of the
15 firmware and thus effectively many different revisions of the uGuru.
16 Below is an incomplete list with which revisions are used for which
17 Motherboards:
18 uGuru 1.00 ~ 1.24 (AI7, KV8-MAX3, AN7) (1)
19 uGuru 2.0.0.0 ~ 2.0.4.2 (KV8-PRO)
20 uGuru 2.1.0.0 ~ 2.1.2.8 (AS8, AV8, AA8, AG8, AA8XE, AX8)
21 uGuru 2.2.0.0 ~ 2.2.0.6 (AA8 Fatal1ty)
22 uGuru 2.3.0.0 ~ 2.3.0.9 (AN8)
23 uGuru 3.0.0.0 ~ 3.0.1.2 (AW8, AL8, NI8)
24 uGuru 4.xxxxx? (AT8 32X) (2)
25 1) For revisions 2 and 3 uGuru's the driver can autodetect the
26 sensortype (Volt or Temp) for bank1 sensors, for revision 1 uGuru's
27 this does not always work. For these uGuru's the autodetection can
28 be overridden with the bank1_types module param. For all 3 known
29 revision 1 motherboards the correct use of this param is:
30 bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1
31 You may also need to specify the fan_sensors option for these boards
32 fan_sensors=5
33 2) The current version of the abituguru driver is known to NOT work
34 on these Motherboards
12 35
13Authors: 36Authors:
14 Hans de Goede <j.w.r.degoede@hhs.nl>, 37 Hans de Goede <j.w.r.degoede@hhs.nl>,
@@ -22,6 +45,11 @@ Module Parameters
22* force: bool Force detection. Note this parameter only causes the 45* force: bool Force detection. Note this parameter only causes the
23 detection to be skipped, if the uGuru can't be read 46 detection to be skipped, if the uGuru can't be read
24 the module initialization (insmod) will still fail. 47 the module initialization (insmod) will still fail.
48* bank1_types: int[] Bank1 sensortype autodetection override:
49 -1 autodetect (default)
50 0 volt sensor
51 1 temp sensor
52 2 not connected
25* fan_sensors: int Tell the driver how many fan speed sensors there are 53* fan_sensors: int Tell the driver how many fan speed sensors there are
26 on your motherboard. Default: 0 (autodetect). 54 on your motherboard. Default: 0 (autodetect).
27* pwms: int Tell the driver how many fan speed controls (fan 55* pwms: int Tell the driver how many fan speed controls (fan
@@ -29,7 +57,7 @@ Module Parameters
29* verbose: int How verbose should the driver be? (0-3): 57* verbose: int How verbose should the driver be? (0-3):
30 0 normal output 58 0 normal output
31 1 + verbose error reporting 59 1 + verbose error reporting
32 2 + sensors type probing info\n" 60 2 + sensors type probing info (default)
33 3 + retryable error reporting 61 3 + retryable error reporting
34 Default: 2 (the driver is still in the testing phase) 62 Default: 2 (the driver is still in the testing phase)
35 63
diff --git a/Documentation/hwmon/it87 b/Documentation/hwmon/it87
index 9555be1ed999..e783fd62e308 100644
--- a/Documentation/hwmon/it87
+++ b/Documentation/hwmon/it87
@@ -13,12 +13,25 @@ Supported chips:
13 from Super I/O config space (8 I/O ports) 13 from Super I/O config space (8 I/O ports)
14 Datasheet: Publicly available at the ITE website 14 Datasheet: Publicly available at the ITE website
15 http://www.ite.com.tw/ 15 http://www.ite.com.tw/
16 * IT8716F
17 Prefix: 'it8716'
18 Addresses scanned: from Super I/O config space (8 I/O ports)
19 Datasheet: Publicly available at the ITE website
20 http://www.ite.com.tw/product_info/file/pc/IT8716F_V0.3.ZIP
21 * IT8718F
22 Prefix: 'it8718'
23 Addresses scanned: from Super I/O config space (8 I/O ports)
24 Datasheet: Publicly available at the ITE website
25 http://www.ite.com.tw/product_info/file/pc/IT8718F_V0.2.zip
26 http://www.ite.com.tw/product_info/file/pc/IT8718F_V0%203_(for%20C%20version).zip
16 * SiS950 [clone of IT8705F] 27 * SiS950 [clone of IT8705F]
17 Prefix: 'it87' 28 Prefix: 'it87'
18 Addresses scanned: from Super I/O config space (8 I/O ports) 29 Addresses scanned: from Super I/O config space (8 I/O ports)
19 Datasheet: No longer available 30 Datasheet: No longer available
20 31
21Author: Christophe Gauthron <chrisg@0-in.com> 32Authors:
33 Christophe Gauthron <chrisg@0-in.com>
34 Jean Delvare <khali@linux-fr.org>
22 35
23 36
24Module Parameters 37Module Parameters
@@ -43,26 +56,46 @@ Module Parameters
43Description 56Description
44----------- 57-----------
45 58
46This driver implements support for the IT8705F, IT8712F and SiS950 chips. 59This driver implements support for the IT8705F, IT8712F, IT8716F,
47 60IT8718F and SiS950 chips.
48This driver also supports IT8712F, which adds SMBus access, and a VID
49input, used to report the Vcore voltage of the Pentium processor.
50The IT8712F additionally features VID inputs.
51 61
52These chips are 'Super I/O chips', supporting floppy disks, infrared ports, 62These chips are 'Super I/O chips', supporting floppy disks, infrared ports,
53joysticks and other miscellaneous stuff. For hardware monitoring, they 63joysticks and other miscellaneous stuff. For hardware monitoring, they
54include an 'environment controller' with 3 temperature sensors, 3 fan 64include an 'environment controller' with 3 temperature sensors, 3 fan
55rotation speed sensors, 8 voltage sensors, and associated alarms. 65rotation speed sensors, 8 voltage sensors, and associated alarms.
56 66
67The IT8712F and IT8716F additionally feature VID inputs, used to report
68the Vcore voltage of the processor. The early IT8712F have 5 VID pins,
69the IT8716F and late IT8712F have 6. They are shared with other functions
70though, so the functionality may not be available on a given system.
71The driver dumbly assumes it is there.
72
73The IT8718F also features VID inputs (up to 8 pins) but the value is
74stored in the Super-I/O configuration space. Due to technical limitations,
75this value can currently only be read once at initialization time, so
76the driver won't notice and report changes in the VID value. The two
77upper VID bits share their pins with voltage inputs (in5 and in6) so you
78can't have both on a given board.
79
80The IT8716F, IT8718F and later IT8712F revisions have support for
812 additional fans. They are not yet supported by the driver.
82
83The IT8716F and IT8718F, and late IT8712F and IT8705F also have optional
8416-bit tachometer counters for fans 1 to 3. This is better (no more fan
85clock divider mess) but not compatible with the older chips and
86revisions. For now, the driver only uses the 16-bit mode on the
87IT8716F and IT8718F.
88
57Temperatures are measured in degrees Celsius. An alarm is triggered once 89Temperatures are measured in degrees Celsius. An alarm is triggered once
58when the Overtemperature Shutdown limit is crossed. 90when the Overtemperature Shutdown limit is crossed.
59 91
60Fan rotation speeds are reported in RPM (rotations per minute). An alarm is 92Fan rotation speeds are reported in RPM (rotations per minute). An alarm is
61triggered if the rotation speed has dropped below a programmable limit. Fan 93triggered if the rotation speed has dropped below a programmable limit. When
62readings can be divided by a programmable divider (1, 2, 4 or 8) to give the 9416-bit tachometer counters aren't used, fan readings can be divided by
63readings more range or accuracy. Not all RPM values can accurately be 95a programmable divider (1, 2, 4 or 8) to give the readings more range or
64represented, so some rounding is done. With a divider of 2, the lowest 96accuracy. With a divider of 2, the lowest representable value is around
65representable value is around 2600 RPM. 972600 RPM. Not all RPM values can accurately be represented, so some rounding
98is done.
66 99
67Voltage sensors (also known as IN sensors) report their values in volts. An 100Voltage sensors (also known as IN sensors) report their values in volts. An
68alarm is triggered if the voltage has crossed a programmable minimum or 101alarm is triggered if the voltage has crossed a programmable minimum or
@@ -71,9 +104,9 @@ zero'; this is important for negative voltage measurements. All voltage
71inputs can measure voltages between 0 and 4.08 volts, with a resolution of 104inputs can measure voltages between 0 and 4.08 volts, with a resolution of
720.016 volt. The battery voltage in8 does not have limit registers. 1050.016 volt. The battery voltage in8 does not have limit registers.
73 106
74The VID lines (IT8712F only) encode the core voltage value: the voltage 107The VID lines (IT8712F/IT8716F/IT8718F) encode the core voltage value:
75level your processor should work with. This is hardcoded by the mainboard 108the voltage level your processor should work with. This is hardcoded by
76and/or processor itself. It is a value in volts. 109the mainboard and/or processor itself. It is a value in volts.
77 110
78If an alarm triggers, it will remain triggered until the hardware register 111If an alarm triggers, it will remain triggered until the hardware register
79is read at least once. This means that the cause for the alarm may already 112is read at least once. This means that the cause for the alarm may already
diff --git a/Documentation/hwmon/k8temp b/Documentation/hwmon/k8temp
new file mode 100644
index 000000000000..bab445ab0f52
--- /dev/null
+++ b/Documentation/hwmon/k8temp
@@ -0,0 +1,52 @@
1Kernel driver k8temp
2====================
3
4Supported chips:
5 * AMD K8 CPU
6 Prefix: 'k8temp'
7 Addresses scanned: PCI space
8 Datasheet: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32559.pdf
9
10Author: Rudolf Marek
11Contact: Rudolf Marek <r.marek@sh.cvut.cz>
12
13Description
14-----------
15
16This driver permits reading temperature sensor(s) embedded inside AMD K8 CPUs.
17Official documentation says that it works from revision F of K8 core, but
18in fact it seems to be implemented for all revisions of K8 except the first
19two revisions (SH-B0 and SH-B3).
20
21There can be up to four temperature sensors inside a single CPU. The driver
22will auto-detect the sensors and will display only temperatures from
23implemented sensors.
24
25Mapping of /sys files is as follows:
26
27temp1_input - temperature of Core 0 and "place" 0
28temp2_input - temperature of Core 0 and "place" 1
29temp3_input - temperature of Core 1 and "place" 0
30temp4_input - temperature of Core 1 and "place" 1
31
32Temperatures are measured in degrees Celsius and measurement resolution is
331 degree C. It is expected that future CPUs will have better resolution. The
34temperature is updated once a second. Valid temperatures are from -49 to
35206 degrees C.
36
37Temperature known as TCaseMax was specified for processors up to revision E.
38This temperature is defined as temperature between heat-spreader and CPU
39case, so the internal CPU temperature supplied by this driver can be higher.
40There is no easy way to measure a temperature that correlates
41with the TCaseMax temperature.
42
43For newer revisions of CPU (rev F, socket AM2) there is a mathematically
44computed temperature called TControl, which must be lower than TControlMax.
45
46The relationship is as follows:
47
48temp1_input - TjOffset*2 < TControlMax,
49
50TjOffset is not yet exported by the driver; TControlMax is usually
5170 degrees C. As a rule of thumb, the CPU temperature should not rise
52much above 60 degrees C.
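The TControl relationship above can be written as a one-line check. This
is only a sketch: the function name k8_tcontrol_ok() is hypothetical, all
values are assumed to be whole degrees C already, and since TjOffset is
not exported by the driver the caller would have to obtain it by other
means.

```c
/* Returns 1 while temp - 2 * TjOffset stays below TControlMax. */
int k8_tcontrol_ok(int temp_c, int tj_offset, int tcontrol_max_c)
{
	return temp_c - tj_offset * 2 < tcontrol_max_c;
}
```

For example, with TControlMax at 70 degrees C and a TjOffset of 0, a
reading of 69 passes the check and a reading of 70 does not.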
diff --git a/Documentation/hwmon/vt1211 b/Documentation/hwmon/vt1211
new file mode 100644
index 000000000000..77fa633b97a8
--- /dev/null
+++ b/Documentation/hwmon/vt1211
@@ -0,0 +1,206 @@
1Kernel driver vt1211
2====================
3
4Supported chips:
5 * VIA VT1211
6 Prefix: 'vt1211'
7 Addresses scanned: none, address read from Super-I/O config space
8 Datasheet: Provided by VIA upon request and under NDA
9
10Authors: Juerg Haefliger <juergh@gmail.com>
11
12This driver is based on the driver for kernel 2.4 by Mark D. Studebaker and
13its port to kernel 2.6 by Lars Ekman.
14
15Thanks to Joseph Chan and Fiona Gatt from VIA for providing documentation and
16technical support.
17
18
19Module Parameters
20-----------------
21
22* uch_config: int Override the BIOS default universal channel (UCH)
23 configuration for channels 1-5.
24 Legal values are in the range of 0-31. Bit 0 maps to
25 UCH1, bit 1 maps to UCH2 and so on. Setting a bit to 1
26 enables the thermal input of that particular UCH and
27 setting a bit to 0 enables the voltage input.
28
29* int_mode: int Override the BIOS default temperature interrupt mode.
30 The only possible value is 0 which forces interrupt
31 mode 0. In this mode, any pending interrupt is cleared
32 when the status register is read but is regenerated as
33 long as the temperature stays above the hysteresis
34 limit.
35
36Be aware that overriding BIOS defaults might cause some unwanted side effects!
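As an illustration of the uch_config bit layout described above, a value
can be built from a list of channels that should act as thermal inputs.
The helper name vt1211_uch_config() is hypothetical; only the bit
assignment (bit 0 for UCH1 up to bit 4 for UCH5) comes from the parameter
description.

```c
/*
 * Build a uch_config value: bit (n - 1) set selects the thermal input
 * of UCHn, bit clear selects the voltage input.
 * Returns -1 if a channel number is outside 1..5.
 */
int vt1211_uch_config(const int *thermal_uch, int count)
{
	int mask = 0;

	for (int i = 0; i < count; i++) {
		if (thermal_uch[i] < 1 || thermal_uch[i] > 5)
			return -1;
		mask |= 1 << (thermal_uch[i] - 1);
	}
	return mask;
}
```

Selecting UCH1 and UCH3 as thermal inputs gives 5, which would be passed
as uch_config=5 when loading the module.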
37
38
39Description
40-----------
41
42The VIA VT1211 Super-I/O chip includes complete hardware monitoring
43capabilities. It monitors 2 dedicated temperature sensor inputs (temp1 and
44temp2), 1 dedicated voltage (in5) and 2 fans. Additionally, the chip
45implements 5 universal input channels (UCH1-5) that can be individually
46programmed to either monitor a voltage or a temperature.
47
48This chip also provides manual and automatic control of fan speeds (according
49to the datasheet). The driver only supports automatic control since the manual
50mode doesn't seem to work as advertised in the datasheet. In fact I couldn't
51get manual mode to work at all! Be aware that automatic mode hasn't been
52tested very well (due to the fact that my EPIA M10000 doesn't have the fans
53connected to the PWM outputs of the VT1211 :-().
54
55The following table shows the relationship between the vt1211 inputs and the
56sysfs nodes.
57
58Sensor          Voltage Mode   Temp Mode   Default Use (from the datasheet)
59------          ------------   ---------   --------------------------------
60Reading 1                      temp1       Intel thermal diode
61Reading 3                      temp2       Internal thermal diode
62UCH1/Reading2   in0            temp3       NTC type thermistor
63UCH2            in1            temp4       +2.5V
64UCH3            in2            temp5       VccP (processor core)
65UCH4            in3            temp6       +5V
66UCH5            in4            temp7       +12V
67+3.3V           in5                        Internal VCC (+3.3V)
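
As a rough illustration of how uch_config and the table above fit together,
the following Python sketch decodes a uch_config value into the sysfs node
that each universal channel would expose. The helper name and the dictionary
layout are invented for this example and are not part of the driver; only the
bit-to-channel mapping and the in/temp node names come from the text above.

```python
# Hypothetical helper (not part of the vt1211 driver): decode uch_config.
# Bit 0 maps to UCH1 ... bit 4 maps to UCH5.
# A bit set to 1 selects the thermal input, a bit set to 0 the voltage input.
UCH_NODES = {
    1: ("in0", "temp3"),
    2: ("in1", "temp4"),
    3: ("in2", "temp5"),
    4: ("in3", "temp6"),
    5: ("in4", "temp7"),
}


def decode_uch_config(uch_config):
    """Return a dict mapping 'UCHn' to the sysfs node selected by uch_config."""
    if not 0 <= uch_config <= 31:
        raise ValueError("uch_config must be in the range 0-31")
    nodes = {}
    for channel, (voltage_node, temp_node) in UCH_NODES.items():
        thermal = (uch_config >> (channel - 1)) & 1
        nodes["UCH%d" % channel] = temp_node if thermal else voltage_node
    return nodes
```

For instance, a value of 5 (binary 00101) would put UCH1 and UCH3 in
temperature mode and leave UCH2, UCH4 and UCH5 in voltage mode.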
68
69
70Voltage Monitoring
71------------------
72
73Voltages are sampled by an 8-bit ADC with a LSB of ~10mV. The supported input
74range is thus from 0 to 2.60V. Voltage values outside of this range need
75external scaling resistors. This external scaling needs to be compensated for
76via compute lines in sensors.conf, like:
77
78compute inx @*(1+R1/R2), @/(1+R1/R2)
79
80The board level scaling resistors according to VIA's recommendation are as
81follows. And this is of course totally dependent on the actual board
82implementation :-) You will have to find documentation for your own
83motherboard and edit sensors.conf accordingly.
84
85                                      Expected
86Voltage       R1     R2     Divider   Raw Value
87-----------------------------------------------
88+2.5V         2K     10K    1.2       2083 mV
89VccP          ---    ---    1.0       1400 mV (1)
90+5V           14K    10K    2.4       2083 mV
91+12V          47K    10K    5.7       2105 mV
92+3.3V (int)   2K     3.4K   1.588     3300 mV (2)
93+3.3V (ext)   6.8K   10K    1.68      1964 mV
94
95(1) Depending on the CPU (1.4V is for a VIA C3 Nehemiah).
96(2) R1 and R2 for 3.3V (int) are internal to the VT1211 chip and the driver
97 performs the scaling and returns the properly scaled voltage value.
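
The divider arithmetic behind the compute line and the table can be checked
with a few lines of Python. This is only a sketch: the function names are
made up here, and the resistor values are VIA's recommended ones from the
table, which may not match a given board.

```python
# Sketch of the sensors.conf scaling "compute inx @*(1+R1/R2), @/(1+R1/R2)".
# Function names are hypothetical; R1 and R2 only need to share a unit.

def raw_to_actual_mv(raw_mv, r1, r2):
    """Scale the voltage seen at the pin up to the real rail voltage."""
    return raw_mv * (1 + r1 / r2)


def actual_to_raw_mv(actual_mv, r1, r2):
    """Scale a real rail voltage down to what the pin is expected to see."""
    return actual_mv / (1 + r1 / r2)


if __name__ == "__main__":
    # Rails and VIA's recommended resistors (in kOhm) from the table.
    for name, rail_mv, r1, r2 in [("+2.5V", 2500, 2, 10),
                                  ("+5V", 5000, 14, 10),
                                  ("+12V", 12000, 47, 10),
                                  ("+3.3V (ext)", 3300, 6.8, 10)]:
        print(name, round(actual_to_raw_mv(rail_mv, r1, r2)), "mV at the pin")
```

With those resistor values the computed pin voltages round to the "Expected
Raw Value" column of the table.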
98
99Each measured voltage has an associated low and high limit which triggers an
100alarm when crossed.
101
102
103Temperature Monitoring
104----------------------
105
106Temperatures are reported in millidegree Celsius. Each measured temperature
107has a high limit which triggers an alarm if crossed. There is an associated
108hysteresis value with each temperature below which the temperature has to drop
109before the alarm is cleared (this is only true for interrupt mode 0). The
110interrupt mode can be forced to 0 in case the BIOS doesn't do it
111automatically. See the 'Module Parameters' section for details.
112
113All temperature channels except temp2 are external. Temp2 is the VT1211
114internal thermal diode and the driver does all the scaling for temp2 and
115returns the temperature in millidegree Celsius. For the external channels
116temp1 and temp3-temp7, scaling depends on the board implementation and needs
117to be performed in userspace via sensors.conf.
118
119Temp1 is an Intel-type thermal diode which requires the following formula to
120convert between sysfs readings and real temperatures:
121
122compute temp1 (@-Offset)/Gain, (@*Gain)+Offset
123
124According to the VIA VT1211 BIOS porting guide, the following gain and offset
125values should be used:
126
127Diode Type      Offset   Gain
128----------      ------   ----
129Intel CPU       88.638   0.9528
130                65.000   0.9686   *)
131VIA C3 Ezra     83.869   0.9528
132VIA C3 Ezra-T   73.869   0.9528
134*) This is the formula from the lm_sensors 2.10.0 sensors.conf file. I don't
135know where it comes from or how it was derived, it's just listed here for
136completeness.
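
To make the two halves of the temp1 compute line concrete, here is a small
Python sketch of the same conversion. The function names are invented for
this example, and the units are whatever sensors.conf passes in as '@'; the
gain and offset pairs are copied from the table above.

```python
# Sketch of "compute temp1 (@-Offset)/Gain, (@*Gain)+Offset".
# Names are hypothetical; (offset, gain) pairs come from the table.
DIODES = {
    "Intel CPU": (88.638, 0.9528),
    "VIA C3 Ezra": (83.869, 0.9528),
    "VIA C3 Ezra-T": (73.869, 0.9528),
}


def reading_to_temp(reading, offset, gain):
    """First expression: convert a sysfs reading to a real temperature."""
    return (reading - offset) / gain


def temp_to_reading(temp, offset, gain):
    """Second expression: convert a real temperature (e.g. a limit) back."""
    return temp * gain + offset
```

The second function is the exact inverse of the first, which is what
sensors.conf needs in order to write limits back to the chip.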
137
138Temp3-temp7 support NTC thermistors. For these channels, the driver returns
139the voltages as seen at the individual pins of UCH1-UCH5. The voltage at the
140pin (Vpin) is formed by a voltage divider made of the thermistor (Rth) and a
141scaling resistor (Rs):
142
143Vpin = 2200 * Rth / (Rs + Rth) (2200 is the ADC max limit of 2200 mV)
144
145The equation for the thermistor is as follows (google it if you want to know
146more about it):
147
148Rth = Ro * exp(B * (1 / T - 1 / To)) (To is 298.15K (25C) and Ro is the
149 nominal resistance at 25C)
150
151Mingling the above two equations and assuming Rs = Ro and B = 3435 yields the
152following formula for sensors.conf:
153
154compute tempx 1 / (1 / 298.15 - (` (2200 / @ - 1)) / 3435) - 273.15,
155 2200 / (1 + (^ (3435 / 298.15 - 3435 / (273.15 + @))))
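
The same derivation can be written out in Python as a sanity check. This is
a sketch under the assumptions stated above (Rs = Ro, B = 3435, an ADC limit
of 2200 mV); the function and constant names are made up for illustration.
In sensors.conf syntax, ` is the natural logarithm and ^ is the exponential.

```python
import math

# Assumptions taken from the text: Rs = Ro, B = 3435, To = 298.15 K,
# and an ADC limit of 2200 mV. All names here are hypothetical.
ADC_MAX_MV = 2200.0
B = 3435.0
T0_KELVIN = 298.15


def pin_mv_to_celsius(vpin_mv):
    """Forward formula: pin voltage in mV (0 < vpin_mv < 2200) to deg C."""
    # Vpin = 2200 * Rth / (Rs + Rth)  =>  Rs / Rth = 2200 / Vpin - 1
    rs_over_rth = ADC_MAX_MV / vpin_mv - 1.0
    # Rth = Ro * exp(B * (1/T - 1/To))  =>  1/T = 1/To - ln(Ro/Rth) / B
    return 1.0 / (1.0 / T0_KELVIN - math.log(rs_over_rth) / B) - 273.15


def celsius_to_pin_mv(temp_c):
    """Inverse formula: temperature in deg C to pin voltage in mV."""
    return ADC_MAX_MV / (1.0 + math.exp(B / T0_KELVIN - B / (273.15 + temp_c)))
```

At 25C the thermistor equals Ro, so the divider sits at half of the ADC
limit, i.e. 1100 mV.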
156
157
158Fan Speed Control
159-----------------
160
161The VT1211 provides 2 programmable PWM outputs to control the speeds of 2
162fans. Writing a 2 to any of the two pwm[1-2]_enable sysfs nodes will put the
163PWM controller in automatic mode. There is only a single controller that
164controls both PWM outputs but each PWM output can be individually enabled and
165disabled.
166
167Each PWM has 4 associated distinct output duty-cycles: full, high, low and
168off. Full and off are internally hard-wired to 255 (100%) and 0 (0%),
169respectively. High and low can be programmed via
170pwm[1-2]_auto_point[2-3]_pwm. Each PWM output can be associated with a
171different thermal input but - and here's the weird part - only one set of
172thermal thresholds exists that controls both PWMs' output duty-cycles. The
173thermal thresholds are accessible via pwm[1-2]_auto_point[1-4]_temp. Note
174that even though there are 2 sets of 4 auto points each, they map to the same
175registers in the VT1211 and programming one set is sufficient (actually only
176the first set pwm1_auto_point[1-4]_temp is writable, the second set is
177read-only).
178
179PWM Auto Point             PWM Output Duty-Cycle
180------------------------------------------------
181pwm[1-2]_auto_point4_pwm   full speed duty-cycle (hard-wired to 255)
182pwm[1-2]_auto_point3_pwm   high speed duty-cycle
183pwm[1-2]_auto_point2_pwm   low speed duty-cycle
184pwm[1-2]_auto_point1_pwm   off duty-cycle (hard-wired to 0)
185
186Temp Auto Point             Thermal Threshold
187---------------------------------------------
188pwm[1-2]_auto_point4_temp   full speed temp
189pwm[1-2]_auto_point3_temp   high speed temp
190pwm[1-2]_auto_point2_temp   low speed temp
191pwm[1-2]_auto_point1_temp   off temp
192
193Long story short, the controller implements the following algorithm to set the
194PWM output duty-cycle based on the input temperature:
195
196Thermal Threshold   Output Duty-Cycle
197                    (Rising Temp)           (Falling Temp)
198----------------------------------------------------------
199                    full speed duty-cycle   full speed duty-cycle
200full speed temp
201                    high speed duty-cycle   full speed duty-cycle
202high speed temp
203                    low speed duty-cycle    high speed duty-cycle
204low speed temp
205                    off duty-cycle          low speed duty-cycle
206off temp
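
The table above can be read as a lookup that depends on whether the
temperature is rising or falling. The Python sketch below is one way to model
it and is not taken from the driver or the datasheet. Two points are
assumptions: a temperature exactly equal to a threshold is treated as being
above it, and anything below the off temp yields the off duty-cycle in both
directions (the table does not show that row). The example threshold and
duty-cycle values are hypothetical.

```python
# Hypothetical model of the VT1211 automatic PWM table; names are invented.
def auto_duty_cycle(temp, rising, temps, duties):
    """temps and duties are dicts keyed by 'off', 'low', 'high' and 'full'."""
    if rising:
        if temp >= temps["full"]:
            return duties["full"]
        if temp >= temps["high"]:
            return duties["high"]
        if temp >= temps["low"]:
            return duties["low"]
        return duties["off"]
    # Falling temperature: each band keeps the duty-cycle of the band above.
    if temp >= temps["high"]:
        return duties["full"]
    if temp >= temps["low"]:
        return duties["high"]
    if temp >= temps["off"]:
        return duties["low"]
    return duties["off"]  # assumption: below the off temp the fan is off


# Example values only; full and off duty-cycles are hard-wired to 255 and 0.
EXAMPLE_TEMPS = {"off": 30000, "low": 40000, "high": 50000, "full": 60000}
EXAMPLE_DUTIES = {"off": 0, "low": 100, "high": 180, "full": 255}
```

With the example values, a temperature of 45000 gives the low speed
duty-cycle while rising and the high speed duty-cycle while falling, which is
the hysteresis the table describes.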
diff --git a/Documentation/hwmon/w83627ehf b/Documentation/hwmon/w83627ehf
new file mode 100644
index 000000000000..fae3b781d82d
--- /dev/null
+++ b/Documentation/hwmon/w83627ehf
@@ -0,0 +1,85 @@
1Kernel driver w83627ehf
2=======================
3
4Supported chips:
5 * Winbond W83627EHF/EHG (ISA access ONLY)
6 Prefix: 'w83627ehf'
7 Addresses scanned: ISA address retrieved from Super I/O registers
8 Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83627EHF_%20W83627EHGb.pdf
9
10Authors:
11 Jean Delvare <khali@linux-fr.org>
12 Yuan Mu (Winbond)
13 Rudolf Marek <r.marek@sh.cvut.cz>
14
15Description
16-----------
17
18This driver implements support for the Winbond W83627EHF and W83627EHG
19super I/O chips. We will refer to them collectively as Winbond chips.
20
21The chips implement three temperature sensors, five fan rotation
22speed sensors, ten analog voltage sensors, alarms with beep warnings (control
23unimplemented), and some automatic fan regulation strategies (plus manual
24fan control mode).
25
26Temperatures are measured in degrees Celsius and measurement resolution is 1
27degC for temp1 and 0.5 degC for temp2 and temp3. An alarm is triggered when
28the temperature gets higher than the high limit; it stays on until the temperature
29falls below the Hysteresis value.
30
31Fan rotation speeds are reported in RPM (rotations per minute). An alarm is
32triggered if the rotation speed has dropped below a programmable limit. Fan
33readings can be divided by a programmable divider (1, 2, 4, 8, 16, 32, 64 or
34128) to give the readings more range or accuracy. The driver sets the most
35suitable fan divisor itself. Some fans might not be present because they
36share pins with other functions.
37
38Voltage sensors (also known as IN sensors) report their values in millivolts.
39An alarm is triggered if the voltage has crossed a programmable minimum
40or maximum limit.
41
42The driver supports an automatic fan control mode known as Thermal Cruise.
43In this mode, the chip attempts to keep the measured temperature in a
44predefined temperature range. If the temperature goes out of range, the fan
45is driven slower/faster to reach the predefined range again.
46
47The mode works for fan1-fan4. Mapping of temperatures to pwm outputs is as
48follows:
49
50temp1 -> pwm1
51temp2 -> pwm2
52temp3 -> pwm3
53prog -> pwm4 (the programmable setting is not supported by the driver)
54
55/sys files
56----------
57
58pwm[1-4] - this file stores PWM duty cycle or DC value (fan speed) in range:
59 0 (stop) to 255 (full)
60
61pwm[1-4]_enable - this file controls mode of fan/temperature control:
62 * 1 Manual Mode, write to pwm file any value 0-255 (full speed)
63 * 2 Thermal Cruise
64
65Thermal Cruise mode
66-------------------
67
68If the temperature is in the range defined by:
69
70pwm[1-4]_target - set target temperature, unit millidegree Celsius
71 (range 0 - 127000)
72pwm[1-4]_tolerance - tolerance, unit millidegree Celsius (range 0 - 15000)
73
74there are no changes to fan speed. Once the temperature leaves the interval,
75fan speed increases (if the temperature is higher) or decreases (if it is
76lower). The steps and times are defined, but not exported by the driver yet.
77
78pwm[1-4]_min_output - minimum fan speed (range 1 - 255), when the temperature
79 is below defined range.
80pwm[1-4]_stop_time - how many milliseconds [ms] must elapse before the
81 corresponding fan is switched off (when the temperature has
82 been below the defined range).
83
84Note: the last two functions are influenced by other control bits, not yet exported
85 by the driver, so a change might not have any effect.
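
The Thermal Cruise decision described above can be sketched in Python as
follows. This is an illustration only: the function name is invented, and it
assumes the "range" is the symmetric interval from target minus tolerance to
target plus tolerance, which the text above does not spell out. The step
sizes and timing used by the chip are not modelled.

```python
# Hypothetical sketch; assumes the range is [target - tol, target + tol].
def thermal_cruise_action(temp_mc, target_mc, tolerance_mc):
    """Return 'hold', 'increase' or 'decrease' for a millidegree reading."""
    if not 0 <= target_mc <= 127000:
        raise ValueError("target must be in the range 0 - 127000")
    if not 0 <= tolerance_mc <= 15000:
        raise ValueError("tolerance must be in the range 0 - 15000")
    if temp_mc > target_mc + tolerance_mc:
        return "increase"   # too hot: fan speed goes up
    if temp_mc < target_mc - tolerance_mc:
        return "decrease"   # cooler than desired: fan speed goes down
    return "hold"           # inside the interval: no change
```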
diff --git a/Documentation/hwmon/w83791d b/Documentation/hwmon/w83791d
index 83a3836289c2..19b2ed739fa1 100644
--- a/Documentation/hwmon/w83791d
+++ b/Documentation/hwmon/w83791d
@@ -5,7 +5,7 @@ Supported chips:
5 * Winbond W83791D 5 * Winbond W83791D
6 Prefix: 'w83791d' 6 Prefix: 'w83791d'
7 Addresses scanned: I2C 0x2c - 0x2f 7 Addresses scanned: I2C 0x2c - 0x2f
8 Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83791Da.pdf 8 Datasheet: http://www.winbond-usa.com/products/winbond_products/pdfs/PCIC/W83791D_W83791Gb.pdf
9 9
10Author: Charles Spirakis <bezaur@gmail.com> 10Author: Charles Spirakis <bezaur@gmail.com>
11 11
@@ -20,6 +20,9 @@ Credits:
20 Chunhao Huang <DZShen@Winbond.com.tw>, 20 Chunhao Huang <DZShen@Winbond.com.tw>,
21 Rudolf Marek <r.marek@sh.cvut.cz> 21 Rudolf Marek <r.marek@sh.cvut.cz>
22 22
23Additional contributors:
24 Sven Anders <anders@anduras.de>
25
23Module Parameters 26Module Parameters
24----------------- 27-----------------
25 28
@@ -46,7 +49,8 @@ Module Parameters
46Description 49Description
47----------- 50-----------
48 51
49This driver implements support for the Winbond W83791D chip. 52This driver implements support for the Winbond W83791D chip. The W83791G
53chip appears to be the same as the W83791D but is lead free.
50 54
51Detection of the chip can sometimes be foiled because it can be in an 55Detection of the chip can sometimes be foiled because it can be in an
52internal state that allows no clean access (Bank with ID register is not 56internal state that allows no clean access (Bank with ID register is not
@@ -71,34 +75,36 @@ Voltage sensors (also known as IN sensors) report their values in millivolts.
71An alarm is triggered if the voltage has crossed a programmable minimum 75An alarm is triggered if the voltage has crossed a programmable minimum
72or maximum limit. 76or maximum limit.
73 77
74Alarms are provided as output from a "realtime status register". The 78The bit ordering for the alarm "realtime status register" and the
75following bits are defined: 79"beep enable registers" are different.
76 80
77bit - alarm on: 81in0 (VCORE) : alarms: 0x000001 beep_enable: 0x000001
780 - Vcore 82in1 (VINR0) : alarms: 0x000002 beep_enable: 0x002000 <== mismatch
791 - VINR0 83in2 (+3.3VIN): alarms: 0x000004 beep_enable: 0x000004
802 - +3.3VIN 84in3 (5VDD) : alarms: 0x000008 beep_enable: 0x000008
813 - 5VDD 85in4 (+12VIN) : alarms: 0x000100 beep_enable: 0x000100
824 - temp1 86in5 (-12VIN) : alarms: 0x000200 beep_enable: 0x000200
835 - temp2 87in6 (-5VIN) : alarms: 0x000400 beep_enable: 0x000400
846 - fan1 88in7 (VSB) : alarms: 0x080000 beep_enable: 0x010000 <== mismatch
857 - fan2 89in8 (VBAT) : alarms: 0x100000 beep_enable: 0x020000 <== mismatch
868 - +12VIN 90in9 (VINR1) : alarms: 0x004000 beep_enable: 0x004000
879 - -12VIN 91temp1 : alarms: 0x000010 beep_enable: 0x000010
8810 - -5VIN 92temp2 : alarms: 0x000020 beep_enable: 0x000020
8911 - fan3 93temp3 : alarms: 0x002000 beep_enable: 0x000002 <== mismatch
9012 - chassis 94fan1 : alarms: 0x000040 beep_enable: 0x000040
9113 - temp3 95fan2 : alarms: 0x000080 beep_enable: 0x000080
9214 - VINR1 96fan3 : alarms: 0x000800 beep_enable: 0x000800
9315 - reserved 97fan4 : alarms: 0x200000 beep_enable: 0x200000
9416 - tart1 98fan5 : alarms: 0x400000 beep_enable: 0x400000
9517 - tart2 99tart1 : alarms: 0x010000 beep_enable: 0x040000 <== mismatch
9618 - tart3 100tart2 : alarms: 0x020000 beep_enable: 0x080000 <== mismatch
9719 - VSB 101tart3 : alarms: 0x040000 beep_enable: 0x100000 <== mismatch
9820 - VBAT 102case_open : alarms: 0x001000 beep_enable: 0x001000
9921 - fan4 103user_enable : alarms: -------- beep_enable: 0x800000
10022 - fan5 104
10123 - reserved 105*** NOTE: It is the responsibility of user-space code to handle the fact
106that the beep enable and alarm bits are in different positions when using that
107feature of the chip.
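
One way user-space code might handle the differing bit positions is a simple
lookup table, sketched below in Python. The mask values are copied from the
list above; the table and function names are invented for this example and
do not correspond to any existing tool.

```python
# (alarm mask, beep_enable mask) per sensor, copied from the list above.
# Names are hypothetical; this is not part of the driver or of lm-sensors.
ALARM_BEEP_MASKS = {
    "in0": (0x000001, 0x000001), "in1": (0x000002, 0x002000),
    "in2": (0x000004, 0x000004), "in3": (0x000008, 0x000008),
    "in4": (0x000100, 0x000100), "in5": (0x000200, 0x000200),
    "in6": (0x000400, 0x000400), "in7": (0x080000, 0x010000),
    "in8": (0x100000, 0x020000), "in9": (0x004000, 0x004000),
    "temp1": (0x000010, 0x000010), "temp2": (0x000020, 0x000020),
    "temp3": (0x002000, 0x000002),
    "fan1": (0x000040, 0x000040), "fan2": (0x000080, 0x000080),
    "fan3": (0x000800, 0x000800), "fan4": (0x200000, 0x200000),
    "fan5": (0x400000, 0x400000),
    "tart1": (0x010000, 0x040000), "tart2": (0x020000, 0x080000),
    "tart3": (0x040000, 0x100000),
    "case_open": (0x001000, 0x001000),
}


def alarms_to_beep_mask(alarms):
    """Translate a set of alarm bits into the matching beep_enable bits."""
    beep = 0
    for alarm_mask, beep_mask in ALARM_BEEP_MASKS.values():
        if alarms & alarm_mask:
            beep |= beep_mask
    return beep
```

Note that in1 and temp3 effectively swap positions between the two
registers, so passing an alarms value through unchanged would enable the
wrong beeps.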
102 108
103When an alarm goes off, you can be warned by a beeping signal through your 109When an alarm goes off, you can be warned by a beeping signal through your
104computer speaker. It is possible to enable all beeping globally, or only 110computer speaker. It is possible to enable all beeping globally, or only
@@ -109,5 +115,6 @@ often will do no harm, but will return 'old' values.
109 115
110W83791D TODO: 116W83791D TODO:
111--------------- 117---------------
112Provide a patch for per-file alarms as discussed on the mailing list 118Provide a patch for per-file alarms and beep enables as defined in the hwmon
119 documentation (Documentation/hwmon/sysfs-interface)
113Provide a patch for smart-fan control (still need appropriate motherboard/fans) 120Provide a patch for smart-fan control (still need appropriate motherboard/fans)
diff --git a/Documentation/i2c/busses/i2c-sis96x b/Documentation/i2c/busses/i2c-sis96x
index 00a009b977e9..08d7b2dac69a 100644
--- a/Documentation/i2c/busses/i2c-sis96x
+++ b/Documentation/i2c/busses/i2c-sis96x
@@ -42,8 +42,8 @@ I suspect that this driver could be made to work for the following SiS
42chipsets as well: 635, and 635T. If anyone owns a board with those chips 42chipsets as well: 635, and 635T. If anyone owns a board with those chips
43AND is willing to risk crashing & burning an otherwise well-behaved kernel 43AND is willing to risk crashing & burning an otherwise well-behaved kernel
44in the name of progress... please contact me at <mhoffman@lightlink.com> or 44in the name of progress... please contact me at <mhoffman@lightlink.com> or
45via the project's mailing list: <lm-sensors@lm-sensors.org>. Please 45via the project's mailing list: <i2c@lm-sensors.org>. Please send bug
46send bug reports and/or success stories as well. 46reports and/or success stories as well.
47 47
48 48
49TO DOs 49TO DOs
diff --git a/Documentation/i2c/busses/i2c-viapro b/Documentation/i2c/busses/i2c-viapro
index 16775663b9f5..25680346e0ac 100644
--- a/Documentation/i2c/busses/i2c-viapro
+++ b/Documentation/i2c/busses/i2c-viapro
@@ -7,9 +7,12 @@ Supported adapters:
7 * VIA Technologies, Inc. VT82C686A/B 7 * VIA Technologies, Inc. VT82C686A/B
8 Datasheet: Sometimes available at the VIA website 8 Datasheet: Sometimes available at the VIA website
9 9
10 * VIA Technologies, Inc. VT8231, VT8233, VT8233A, VT8235, VT8237R 10 * VIA Technologies, Inc. VT8231, VT8233, VT8233A
11 Datasheet: available on request from VIA 11 Datasheet: available on request from VIA
12 12
13 * VIA Technologies, Inc. VT8235, VT8237R, VT8237A, VT8251
14 Datasheet: available on request and under NDA from VIA
15
13Authors: 16Authors:
14 Kyösti Mälkki <kmalkki@cc.hut.fi>, 17 Kyösti Mälkki <kmalkki@cc.hut.fi>,
15 Mark D. Studebaker <mdsxyz123@yahoo.com>, 18 Mark D. Studebaker <mdsxyz123@yahoo.com>,
@@ -39,6 +42,8 @@ Your lspci -n listing must show one of these :
39 device 1106:8235 (VT8231 function 4) 42 device 1106:8235 (VT8231 function 4)
40 device 1106:3177 (VT8235) 43 device 1106:3177 (VT8235)
41 device 1106:3227 (VT8237R) 44 device 1106:3227 (VT8237R)
45 device 1106:3337 (VT8237A)
46 device 1106:3287 (VT8251)
42 47
43If none of these show up, you should look in the BIOS for settings like 48If none of these show up, you should look in the BIOS for settings like
44enable ACPI / SMBus or even USB. 49enable ACPI / SMBus or even USB.
diff --git a/Documentation/i2c/i2c-stub b/Documentation/i2c/i2c-stub
index d6dcb138abf5..9cc081e69764 100644
--- a/Documentation/i2c/i2c-stub
+++ b/Documentation/i2c/i2c-stub
@@ -6,9 +6,12 @@ This module is a very simple fake I2C/SMBus driver. It implements four
6types of SMBus commands: write quick, (r/w) byte, (r/w) byte data, and 6types of SMBus commands: write quick, (r/w) byte, (r/w) byte data, and
7(r/w) word data. 7(r/w) word data.
8 8
9You need to provide a chip address as a module parameter when loading
10this driver, which will then only react to SMBus commands to this address.
11
9No hardware is needed nor associated with this module. It will accept write 12No hardware is needed nor associated with this module. It will accept write
10quick commands to all addresses; it will respond to the other commands (also 13quick commands to one address; it will respond to the other commands (also
11to all addresses) by reading from or writing to an array in memory. It will 14to one address) by reading from or writing to an array in memory. It will
12also spam the kernel logs for every command it handles. 15also spam the kernel logs for every command it handles.
13 16
14A pointer register with auto-increment is implemented for all byte 17A pointer register with auto-increment is implemented for all byte
@@ -21,6 +24,11 @@ The typical use-case is like this:
21 3. load the target sensors chip driver module 24 3. load the target sensors chip driver module
22 4. observe its behavior in the kernel log 25 4. observe its behavior in the kernel log
23 26
27PARAMETERS:
28
29int chip_addr:
30 The SMBus address to emulate a chip at.
31
24CAVEATS: 32CAVEATS:
25 33
26There are independent arrays for byte/data and word/data commands. Depending 34There are independent arrays for byte/data and word/data commands. Depending
@@ -33,6 +41,9 @@ If the hardware for your driver has banked registers (e.g. Winbond sensors
33chips) this module will not work well - although it could be extended to 41chips) this module will not work well - although it could be extended to
34support that pretty easily. 42support that pretty easily.
35 43
44Only one chip address is supported - although this module could be
45extended to support more.
46
36If you spam it hard enough, printk can be lossy. This module really wants 47If you spam it hard enough, printk can be lossy. This module really wants
37something like relayfs. 48something like relayfs.
38 49
diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index 10312bebe55d..c51314b1a463 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -181,6 +181,7 @@ filled out, however:
181 5 ELILO 181 5 ELILO
182 7 GRuB 182 7 GRuB
183 8 U-BOOT 183 8 U-BOOT
184 9 Xen
184 185
185 Please contact <hpa@zytor.com> if you need a bootloader ID 186 Please contact <hpa@zytor.com> if you need a bootloader ID
186 value assigned. 187 value assigned.
diff --git a/Documentation/i386/zero-page.txt b/Documentation/i386/zero-page.txt
index df28c7416781..c04a421f4a7c 100644
--- a/Documentation/i386/zero-page.txt
+++ b/Documentation/i386/zero-page.txt
@@ -63,6 +63,10 @@ Offset Type Description
63 2 for bootsect-loader 63 2 for bootsect-loader
64 3 for SYSLINUX 64 3 for SYSLINUX
65 4 for ETHERBOOT 65 4 for ETHERBOOT
66 5 for ELILO
67 7 for GRuB
68 8 for U-BOOT
69 9 for Xen
66 V = version 70 V = version
670x211 char loadflags: 710x211 char loadflags:
68 bit0 = 1: kernel is loaded high (bzImage) 72 bit0 = 1: kernel is loaded high (bzImage)
diff --git a/Documentation/infiniband/ipoib.txt b/Documentation/infiniband/ipoib.txt
index 187035560d7f..864ff3283780 100644
--- a/Documentation/infiniband/ipoib.txt
+++ b/Documentation/infiniband/ipoib.txt
@@ -51,8 +51,6 @@ Debugging Information
51 51
52References 52References
53 53
54 IETF IP over InfiniBand (ipoib) Working Group
55 http://ietf.org/html.charters/ipoib-charter.html
56 Transmission of IP over InfiniBand (IPoIB) (RFC 4391) 54 Transmission of IP over InfiniBand (IPoIB) (RFC 4391)
57 http://ietf.org/rfc/rfc4391.txt 55 http://ietf.org/rfc/rfc4391.txt
58 IP over InfiniBand (IPoIB) Architecture (RFC 4392) 56 IP over InfiniBand (IPoIB) Architecture (RFC 4392)
diff --git a/Documentation/initrd.txt b/Documentation/initrd.txt
index 7de1c80cd719..15f1b35deb34 100644
--- a/Documentation/initrd.txt
+++ b/Documentation/initrd.txt
@@ -67,12 +67,27 @@ initrd adds the following new options:
67 as the last process has closed it, all data is freed and /dev/initrd 67 as the last process has closed it, all data is freed and /dev/initrd
68 can't be opened anymore. 68 can't be opened anymore.
69 69
70 root=/dev/ram0 (without devfs) 70 root=/dev/ram0
71 root=/dev/rd/0 (with devfs)
72 71
73 initrd is mounted as root, and the normal boot procedure is followed, 72 initrd is mounted as root, and the normal boot procedure is followed,
74 with the RAM disk still mounted as root. 73 with the RAM disk still mounted as root.
75 74
75Compressed cpio images
76----------------------
77
78Recent kernels have support for populating a ramdisk from a compressed cpio
79archive. On such systems, the creation of a ramdisk image doesn't need to
80involve special block devices or loopbacks; you merely create a directory on
81disk with the desired initrd content, cd to that directory, and run (as an
82example):
83
84find . | cpio --quiet -c -o | gzip -9 -n > /boot/imagefile.img
85
86Examining the contents of an existing image file is just as simple:
87
88mkdir /tmp/imagefile
89cd /tmp/imagefile
90gzip -cd /boot/imagefile.img | cpio -imd --quiet
76 91
77Installation 92Installation
78------------ 93------------
@@ -90,8 +105,7 @@ you're building an install floppy), the root file system creation
90procedure should create the /initrd directory. 105procedure should create the /initrd directory.
91 106
92If initrd will not be mounted in some cases, its content is still 107If initrd will not be mounted in some cases, its content is still
93accessible if the following device has been created (note that this 108accessible if the following device has been created:
94does not work if using devfs):
95 109
96# mknod /dev/initrd b 1 250 110# mknod /dev/initrd b 1 250
97# chmod 400 /dev/initrd 111# chmod 400 /dev/initrd
@@ -119,8 +133,7 @@ We'll describe the loopback device method:
119 (if space is critical, you may want to use the Minix FS instead of Ext2) 133 (if space is critical, you may want to use the Minix FS instead of Ext2)
120 3) mount the file system, e.g. 134 3) mount the file system, e.g.
121 # mount -t ext2 -o loop initrd /mnt 135 # mount -t ext2 -o loop initrd /mnt
122 4) create the console device (not necessary if using devfs, but it can't 136 4) create the console device:
123 hurt to do it anyway):
124 # mkdir /mnt/dev 137 # mkdir /mnt/dev
125 # mknod /mnt/dev/console c 5 1 138 # mknod /mnt/dev/console c 5 1
126 5) copy all the files that are needed to properly use the initrd 139 5) copy all the files that are needed to properly use the initrd
@@ -152,12 +165,7 @@ have to be given:
152 165
153 root=/dev/ram0 init=/linuxrc rw 166 root=/dev/ram0 init=/linuxrc rw
154 167
155if not using devfs, or 168(rw is only necessary if writing to the initrd file system.)
156
157 root=/dev/rd/0 init=/linuxrc rw
158
159if using devfs. (rw is only necessary if writing to the initrd file
160system.)
161 169
162With LOADLIN, you simply execute 170With LOADLIN, you simply execute
163 171
@@ -217,9 +225,9 @@ following command:
217# exec chroot . what-follows <dev/console >dev/console 2>&1 225# exec chroot . what-follows <dev/console >dev/console 2>&1
218 226
219Where what-follows is a program under the new root, e.g. /sbin/init 227Where what-follows is a program under the new root, e.g. /sbin/init
220If the new root file system will be used with devfs and has no valid 228If the new root file system will be used with udev and has no valid
221/dev directory, devfs must be mounted before invoking chroot in order to 229/dev directory, udev must be initialized before invoking chroot in order
222provide /dev/console. 230to provide /dev/console.
223 231
224Note: implementation details of pivot_root may change with time. In order 232Note: implementation details of pivot_root may change with time. In order
225to ensure compatibility, the following points should be observed: 233to ensure compatibility, the following points should be observed:
@@ -236,7 +244,7 @@ Now, the initrd can be unmounted and the memory allocated by the RAM
236disk can be freed: 244disk can be freed:
237 245
238# umount /initrd 246# umount /initrd
239# blockdev --flushbufs /dev/ram0 # /dev/rd/0 if using devfs 247# blockdev --flushbufs /dev/ram0
240 248
241It is also possible to use initrd with an NFS-mounted root, see the 249It is also possible to use initrd with an NFS-mounted root, see the
242pivot_root(8) man page for details. 250pivot_root(8) man page for details.
diff --git a/Documentation/input/joystick.txt b/Documentation/input/joystick.txt
index d53b857a3710..841c353297e6 100644
--- a/Documentation/input/joystick.txt
+++ b/Documentation/input/joystick.txt
@@ -39,7 +39,6 @@ them. Bug reports and success stories are also welcome.
39 39
40 The input project website is at: 40 The input project website is at:
41 41
42 http://www.suse.cz/development/input/
43 http://atrey.karlin.mff.cuni.cz/~vojtech/input/ 42 http://atrey.karlin.mff.cuni.cz/~vojtech/input/
44 43
45 There is also a mailing list for the driver at: 44 There is also a mailing list for the driver at:
diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt
index 1543802ef53e..edc04d74ae23 100644
--- a/Documentation/ioctl-number.txt
+++ b/Documentation/ioctl-number.txt
@@ -119,7 +119,6 @@ Code Seq# Include File Comments
119'c' 00-7F linux/comstats.h conflict! 119'c' 00-7F linux/comstats.h conflict!
120'c' 00-7F linux/coda.h conflict! 120'c' 00-7F linux/coda.h conflict!
121'd' 00-FF linux/char/drm/drm/h conflict! 121'd' 00-FF linux/char/drm/drm/h conflict!
122'd' 00-1F linux/devfs_fs.h conflict!
123'd' 00-DF linux/video_decoder.h conflict! 122'd' 00-DF linux/video_decoder.h conflict!
124'd' F0-FF linux/digi1.h 123'd' F0-FF linux/digi1.h
125'e' all linux/digi1.h conflict! 124'e' all linux/digi1.h conflict!
diff --git a/Documentation/irqflags-tracing.txt b/Documentation/irqflags-tracing.txt
new file mode 100644
index 000000000000..6a444877ee0b
--- /dev/null
+++ b/Documentation/irqflags-tracing.txt
@@ -0,0 +1,57 @@
1IRQ-flags state tracing
2
3started by Ingo Molnar <mingo@redhat.com>
4
5The "irq-flags tracing" feature "traces" hardirq and softirq state, in
6that it gives interested subsystems an opportunity to be notified of
7every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that
8happens in the kernel.
9
10CONFIG_TRACE_IRQFLAGS_SUPPORT is needed for CONFIG_PROVE_SPIN_LOCKING
11and CONFIG_PROVE_RW_LOCKING to be offered by the generic lock debugging
12code. Otherwise only CONFIG_PROVE_MUTEX_LOCKING and
13CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these
14are locking APIs that are not used in IRQ context. (the one exception
15for rwsems is worked around)
16
17Architecture support for this is certainly not in the "trivial"
18category, because lots of lowlevel assembly code deals with irq-flags
19state changes. But an architecture can be irq-flags-tracing enabled in a
20rather straightforward and risk-free manner.
21
22Architectures that want to support this need to do a couple of
23code-organizational changes first:
24
25- move their irq-flags manipulation code from their asm/system.h header
26 to asm/irqflags.h
27
28- rename local_irq_disable()/etc to raw_local_irq_disable()/etc. so that
29 the linux/irqflags.h code can inject callbacks and can construct the
30 real local_irq_disable()/etc APIs.
31
32- add and enable TRACE_IRQFLAGS_SUPPORT in their arch level Kconfig file
33
34Then a couple of functional changes are needed as well to implement
35irq-flags-tracing support:
36
37- in lowlevel entry code add (build-conditional) calls to the
38 trace_hardirqs_off()/trace_hardirqs_on() functions. The lock validator
39 closely guards whether the 'real' irq-flags matches the 'virtual'
40 irq-flags state, and complains loudly (and turns itself off) if the
41 two do not match. Usually most of the time for arch support for
42 irq-flags-tracing is spent in this state: look at the lockdep
43 complaint, try to figure out the assembly code we did not cover yet,
44 fix and repeat. Once the system has booted up and works without a
45 lockdep complaint in the irq-flags-tracing functions arch support is
46 complete.
47- if the architecture has non-maskable interrupts then those need to be
48 excluded from the irq-tracing [and lock validation] mechanism via
49 lockdep_off()/lockdep_on().
50
51In general there is no risk from having an incomplete irq-flags-tracing
52implementation in an architecture: lockdep will detect that and will
53turn itself off. I.e. the lock validator will still be reliable. There
54should be no crashes due to irq-tracing bugs (except if the assembly
55changes break other code by modifying conditions or registers that
56shouldn't be modified).
57
diff --git a/Documentation/kbuild/kconfig-language.txt b/Documentation/kbuild/kconfig-language.txt
index ca1967f36423..003fccc14d24 100644
--- a/Documentation/kbuild/kconfig-language.txt
+++ b/Documentation/kbuild/kconfig-language.txt
@@ -67,19 +67,19 @@ applicable everywhere (see syntax).
67- default value: "default" <expr> ["if" <expr>] 67- default value: "default" <expr> ["if" <expr>]
68 A config option can have any number of default values. If multiple 68 A config option can have any number of default values. If multiple
69 default values are visible, only the first defined one is active. 69 default values are visible, only the first defined one is active.
70 Default values are not limited to the menu entry, where they are 70 Default values are not limited to the menu entry where they are
71 defined, this means the default can be defined somewhere else or be 71 defined. This means the default can be defined somewhere else or be
72 overridden by an earlier definition. 72 overridden by an earlier definition.
73 The default value is only assigned to the config symbol if no other 73 The default value is only assigned to the config symbol if no other
74 value was set by the user (via the input prompt above). If an input 74 value was set by the user (via the input prompt above). If an input
75 prompt is visible the default value is presented to the user and can 75 prompt is visible the default value is presented to the user and can
76 be overridden by him. 76 be overridden by him.
77 Optionally dependencies only for this default value can be added with 77 Optionally, dependencies only for this default value can be added with
78 "if". 78 "if".
79 79
80- dependencies: "depends on"/"requires" <expr> 80- dependencies: "depends on"/"requires" <expr>
81 This defines a dependency for this menu entry. If multiple 81 This defines a dependency for this menu entry. If multiple
82 dependencies are defined they are connected with '&&'. Dependencies 82 dependencies are defined, they are connected with '&&'. Dependencies
83 are applied to all other options within this menu entry (which also 83 are applied to all other options within this menu entry (which also
84 accept an "if" expression), so these two examples are equivalent: 84 accept an "if" expression), so these two examples are equivalent:
85 85
@@ -153,7 +153,7 @@ Nonconstant symbols are the most common ones and are defined with the
153'config' statement. Nonconstant symbols consist entirely of alphanumeric 153'config' statement. Nonconstant symbols consist entirely of alphanumeric
154characters or underscores. 154characters or underscores.
155Constant symbols are only part of expressions. Constant symbols are 155Constant symbols are only part of expressions. Constant symbols are
156always surrounded by single or double quotes. Within the quote any 156always surrounded by single or double quotes. Within the quote, any
157other character is allowed and the quotes can be escaped using '\'. 157other character is allowed and the quotes can be escaped using '\'.
158 158
159Menu structure 159Menu structure
@@ -237,7 +237,7 @@ choices:
237 <choice block> 237 <choice block>
238 "endchoice" 238 "endchoice"
239 239
240This defines a choice group and accepts any of above attributes as 240This defines a choice group and accepts any of the above attributes as
241options. A choice can only be of type bool or tristate, while a boolean 241options. A choice can only be of type bool or tristate, while a boolean
242choice only allows a single config entry to be selected, a tristate 242choice only allows a single config entry to be selected, a tristate
243choice also allows any number of config entries to be set to 'm'. This 243choice also allows any number of config entries to be set to 'm'. This
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index a9c00facdf40..e2cbd59cf2d0 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -22,7 +22,7 @@ This document describes the Linux kernel Makefiles.
22 === 4 Host Program support 22 === 4 Host Program support
23 --- 4.1 Simple Host Program 23 --- 4.1 Simple Host Program
24 --- 4.2 Composite Host Programs 24 --- 4.2 Composite Host Programs
25 --- 4.3 Defining shared libraries 25 --- 4.3 Defining shared libraries
26 --- 4.4 Using C++ for host programs 26 --- 4.4 Using C++ for host programs
27 --- 4.5 Controlling compiler options for host programs 27 --- 4.5 Controlling compiler options for host programs
28 --- 4.6 When host programs are actually built 28 --- 4.6 When host programs are actually built
@@ -69,7 +69,7 @@ architecture-specific information to the top Makefile.
69 69
70Each subdirectory has a kbuild Makefile which carries out the commands 70Each subdirectory has a kbuild Makefile which carries out the commands
71passed down from above. The kbuild Makefile uses information from the 71passed down from above. The kbuild Makefile uses information from the
72.config file to construct various file lists used by kbuild to build 72.config file to construct various file lists used by kbuild to build
73any built-in or modular targets. 73any built-in or modular targets.
74 74
75scripts/Makefile.* contains all the definitions/rules etc. that 75scripts/Makefile.* contains all the definitions/rules etc. that
@@ -86,7 +86,7 @@ any kernel Makefiles (or any other source files).
86 86
87*Normal developers* are people who work on features such as device 87*Normal developers* are people who work on features such as device
88drivers, file systems, and network protocols. These people need to 88drivers, file systems, and network protocols. These people need to
89maintain the kbuild Makefiles for the subsystem that they are 89maintain the kbuild Makefiles for the subsystem they are
90working on. In order to do this effectively, they need some overall 90working on. In order to do this effectively, they need some overall
91knowledge about the kernel Makefiles, plus detailed knowledge about the 91knowledge about the kernel Makefiles, plus detailed knowledge about the
92public interface for kbuild. 92public interface for kbuild.
@@ -104,10 +104,10 @@ This document is aimed towards normal developers and arch developers.
104=== 3 The kbuild files 104=== 3 The kbuild files
105 105
106Most Makefiles within the kernel are kbuild Makefiles that use the 106Most Makefiles within the kernel are kbuild Makefiles that use the
107kbuild infrastructure. This chapter introduce the syntax used in the 107kbuild infrastructure. This chapter introduces the syntax used in the
108kbuild makefiles. 108kbuild makefiles.
109The preferred name for the kbuild files is 'Makefile' but 'Kbuild' can 109The preferred name for the kbuild files is 'Makefile' but 'Kbuild' can
110be used and if both a 'Makefile' and a 'Kbuild' file exist then the 'Kbuild' 110be used and if both a 'Makefile' and a 'Kbuild' file exist, then the 'Kbuild'
111file will be used. 111file will be used.
112 112
113Section 3.1 "Goal definitions" is a quick intro, further chapters provide 113Section 3.1 "Goal definitions" is a quick intro, further chapters provide
@@ -124,7 +124,7 @@ more details, with real examples.
124 Example: 124 Example:
125 obj-y += foo.o 125 obj-y += foo.o
126 126
127 This tells kbuild that there is one object in that directory named 127 This tells kbuild that there is one object in that directory, named
128 foo.o. foo.o will be built from foo.c or foo.S. 128 foo.o. foo.o will be built from foo.c or foo.S.
129 129
130 If foo.o shall be built as a module, the variable obj-m is used. 130 If foo.o shall be built as a module, the variable obj-m is used.
@@ -140,7 +140,7 @@ more details, with real examples.
140--- 3.2 Built-in object goals - obj-y 140--- 3.2 Built-in object goals - obj-y
141 141
142 The kbuild Makefile specifies object files for vmlinux 142 The kbuild Makefile specifies object files for vmlinux
143 in the lists $(obj-y). These lists depend on the kernel 143 in the $(obj-y) lists. These lists depend on the kernel
144 configuration. 144 configuration.
145 145
146 Kbuild compiles all the $(obj-y) files. It then calls 146 Kbuild compiles all the $(obj-y) files. It then calls
@@ -154,8 +154,8 @@ more details, with real examples.
154 Link order is significant, because certain functions 154 Link order is significant, because certain functions
155 (module_init() / __initcall) will be called during boot in the 155 (module_init() / __initcall) will be called during boot in the
156 order they appear. So keep in mind that changing the link 156 order they appear. So keep in mind that changing the link
157 order may e.g. change the order in which your SCSI 157 order may e.g. change the order in which your SCSI
158 controllers are detected, and thus you disks are renumbered. 158 controllers are detected, and thus your disks are renumbered.
159 159
160 Example: 160 Example:
161 #drivers/isdn/i4l/Makefile 161 #drivers/isdn/i4l/Makefile
@@ -203,11 +203,11 @@ more details, with real examples.
203 Example: 203 Example:
204 #fs/ext2/Makefile 204 #fs/ext2/Makefile
205 obj-$(CONFIG_EXT2_FS) += ext2.o 205 obj-$(CONFIG_EXT2_FS) += ext2.o
206 ext2-y := balloc.o bitmap.o 206 ext2-y := balloc.o bitmap.o
207 ext2-$(CONFIG_EXT2_FS_XATTR) += xattr.o 207 ext2-$(CONFIG_EXT2_FS_XATTR) += xattr.o
208 208
209 In this example xattr.o is only part of the composite object 209 In this example, xattr.o is only part of the composite object
210 ext2.o, if $(CONFIG_EXT2_FS_XATTR) evaluates to 'y'. 210 ext2.o if $(CONFIG_EXT2_FS_XATTR) evaluates to 'y'.
211 211
212 Note: Of course, when you are building objects into the kernel, 212 Note: Of course, when you are building objects into the kernel,
213 the syntax above will also work. So, if you have CONFIG_EXT2_FS=y, 213 the syntax above will also work. So, if you have CONFIG_EXT2_FS=y,
@@ -221,16 +221,16 @@ more details, with real examples.
221 221
222--- 3.5 Library file goals - lib-y 222--- 3.5 Library file goals - lib-y
223 223
224 Objects listed with obj-* are used for modules or 224 Objects listed with obj-* are used for modules, or
225 combined in a built-in.o for that specific directory. 225 combined in a built-in.o for that specific directory.
226 There is also the possibility to list objects that will 226 There is also the possibility to list objects that will
227 be included in a library, lib.a. 227 be included in a library, lib.a.
228 All objects listed with lib-y are combined in a single 228 All objects listed with lib-y are combined in a single
229 library for that directory. 229 library for that directory.
230 Objects that are listed in obj-y and additional listed in 230 Objects that are listed in obj-y and additionally listed in
231 lib-y will not be included in the library, since they will anyway 231 lib-y will not be included in the library, since they will anyway
232 be accessible. 232 be accessible.
233 For consistency objects listed in lib-m will be included in lib.a. 233 For consistency, objects listed in lib-m will be included in lib.a.
234 234
235 Note that the same kbuild makefile may list files to be built-in 235 Note that the same kbuild makefile may list files to be built-in
236 and to be part of a library. Therefore the same directory 236 and to be part of a library. Therefore the same directory
@@ -241,11 +241,11 @@ more details, with real examples.
241 lib-y := checksum.o delay.o 241 lib-y := checksum.o delay.o
242 242
243 This will create a library lib.a based on checksum.o and delay.o. 243 This will create a library lib.a based on checksum.o and delay.o.
244 For kbuild to actually recognize that there is a lib.a being build 244 For kbuild to actually recognize that there is a lib.a being built,
245 the directory shall be listed in libs-y. 245 the directory shall be listed in libs-y.
246 See also "6.3 List directories to visit when descending". 246 See also "6.3 List directories to visit when descending".
247 247
248 Usage of lib-y is normally restricted to lib/ and arch/*/lib. 248 Use of lib-y is normally restricted to lib/ and arch/*/lib.
249 249
250--- 3.6 Descending down in directories 250--- 3.6 Descending down in directories
251 251
@@ -255,7 +255,7 @@ more details, with real examples.
255 invoke make recursively in subdirectories, provided you let it know of 255 invoke make recursively in subdirectories, provided you let it know of
256 them. 256 them.
257 257
258 To do so obj-y and obj-m are used. 258 To do so, obj-y and obj-m are used.
259 ext2 lives in a separate directory, and the Makefile present in fs/ 259 ext2 lives in a separate directory, and the Makefile present in fs/
260 tells kbuild to descend down using the following assignment. 260 tells kbuild to descend down using the following assignment.
261 261
@@ -353,8 +353,8 @@ more details, with real examples.
353 Special rules are used when the kbuild infrastructure does 353 Special rules are used when the kbuild infrastructure does
354 not provide the required support. A typical example is 354 not provide the required support. A typical example is
355 header files generated during the build process. 355 header files generated during the build process.
356 Another example is the architecture specific Makefiles which 356 Another example are the architecture specific Makefiles which
357 needs special rules to prepare boot images etc. 357 need special rules to prepare boot images etc.
358 358
359 Special rules are written as normal Make rules. 359 Special rules are written as normal Make rules.
360 Kbuild is not executing in the directory where the Makefile is 360 Kbuild is not executing in the directory where the Makefile is
@@ -387,28 +387,47 @@ more details, with real examples.
387 387
388--- 3.11 $(CC) support functions 388--- 3.11 $(CC) support functions
389 389
390 The kernel may be build with several different versions of 390 The kernel may be built with several different versions of
391 $(CC), each supporting a unique set of features and options. 391 $(CC), each supporting a unique set of features and options.
392 kbuild provides basic support to check for valid options for $(CC). 392 kbuild provides basic support to check for valid options for $(CC).
393 $(CC) is usually the gcc compiler, but other alternatives are 393 $(CC) is usually the gcc compiler, but other alternatives are
394 available. 394 available.
395 395
396 as-option 396 as-option
397 as-option is used to check if $(CC) when used to compile 397 as-option is used to check if $(CC) -- when used to compile
398 assembler (*.S) files supports the given option. An optional 398 assembler (*.S) files -- supports the given option. An optional
399 second option may be specified if first option are not supported. 399 second option may be specified if the first option is not supported.
400 400
401 Example: 401 Example:
402 #arch/sh/Makefile 402 #arch/sh/Makefile
403 cflags-y += $(call as-option,-Wa$(comma)-isa=$(isa-y),) 403 cflags-y += $(call as-option,-Wa$(comma)-isa=$(isa-y),)
404 404
405 In the above example cflags-y will be assinged the the option 405 In the above example, cflags-y will be assigned the option
406 -Wa$(comma)-isa=$(isa-y) if it is supported by $(CC). 406 -Wa$(comma)-isa=$(isa-y) if it is supported by $(CC).
407 The second argument is optional, and if supplied will be used 407 The second argument is optional, and if supplied will be used
408 if first argument is not supported. 408 if first argument is not supported.
409 409
410 ld-option
411 ld-option is used to check if $(CC), when used to link object files,
412 supports the given option. An optional second option may be
413 specified if the first option is not supported.
414
415 Example:
416 #arch/i386/kernel/Makefile
417 vsyscall-flags += $(call ld-option, -Wl$(comma)--hash-style=sysv)
418
419 In the above example, vsyscall-flags will be assigned the option
420 -Wl$(comma)--hash-style=sysv if it is supported by $(CC).
421 The second argument is optional, and if supplied will be used
422 if the first argument is not supported.
423
424 as-instr
425 as-instr checks if the assembler reports a specific instruction
426 and then outputs either option1 or option2.
427 C escapes are supported in the test instruction.
428
410 cc-option 429 cc-option
411 cc-option is used to check if $(CC) support a given option, and not 430 cc-option is used to check if $(CC) supports a given option, and not
412 supported to use an optional second option. 431 supported to use an optional second option.
413 432
414 Example: 433 Example:
@@ -416,12 +435,12 @@ more details, with real examples.
416 cflags-y += $(call cc-option,-march=pentium-mmx,-march=i586) 435 cflags-y += $(call cc-option,-march=pentium-mmx,-march=i586)
417 436
418 In the above example cflags-y will be assigned the option 437 In the above example cflags-y will be assigned the option
419 -march=pentium-mmx if supported by $(CC), otherwise -march-i586. 438 -march=pentium-mmx if supported by $(CC), otherwise -march=i586.
420 The second argument to cc-option is optional, and if omitted 439 The second argument to cc-option is optional, and if omitted,
421 cflags-y will be assigned no value if first option is not supported. 440 cflags-y will be assigned no value if first option is not supported.
422 441
423 cc-option-yn 442 cc-option-yn
424 cc-option-yn is used to check if gcc supports a given option 443 cc-option-yn is used to check if gcc supports a given option
425 and return 'y' if supported, otherwise 'n'. 444 and return 'y' if supported, otherwise 'n'.
426 445
427 Example: 446 Example:
@@ -429,32 +448,33 @@ more details, with real examples.
429 biarch := $(call cc-option-yn, -m32) 448 biarch := $(call cc-option-yn, -m32)
430 aflags-$(biarch) += -a32 449 aflags-$(biarch) += -a32
431 cflags-$(biarch) += -m32 450 cflags-$(biarch) += -m32
432 451
433 In the above example $(biarch) is set to y if $(CC) supports the -m32 452 In the above example, $(biarch) is set to y if $(CC) supports the -m32
434 option. When $(biarch) equals to y the expanded variables $(aflags-y) 453 option. When $(biarch) equals 'y', the expanded variables $(aflags-y)
435 and $(cflags-y) will be assigned the values -a32 and -m32. 454 and $(cflags-y) will be assigned the values -a32 and -m32,
455 respectively.
436 456
437 cc-option-align 457 cc-option-align
438 gcc version >= 3.0 shifted type of options used to speify 458 gcc versions >= 3.0 changed the type of options used to specify
439 alignment of functions, loops etc. $(cc-option-align) whrn used 459 alignment of functions, loops etc. $(cc-option-align), when used
440 as prefix to the align options will select the right prefix: 460 as prefix to the align options, will select the right prefix:
441 gcc < 3.00 461 gcc < 3.00
442 cc-option-align = -malign 462 cc-option-align = -malign
443 gcc >= 3.00 463 gcc >= 3.00
444 cc-option-align = -falign 464 cc-option-align = -falign
445 465
446 Example: 466 Example:
447 CFLAGS += $(cc-option-align)-functions=4 467 CFLAGS += $(cc-option-align)-functions=4
448 468
449 In the above example the option -falign-functions=4 is used for 469 In the above example, the option -falign-functions=4 is used for
450 gcc >= 3.00. For gcc < 3.00 -malign-functions=4 is used. 470 gcc >= 3.00. For gcc < 3.00, -malign-functions=4 is used.
451 471
452 cc-version 472 cc-version
453 cc-version return a numerical version of the $(CC) compiler version. 473 cc-version returns a numerical version of the $(CC) compiler version.
454 The format is <major><minor> where both are two digits. So for example 474 The format is <major><minor> where both are two digits. So for example
455 gcc 3.41 would return 0341. 475 gcc 3.41 would return 0341.
456 cc-version is useful when a specific $(CC) version is faulty in one 476 cc-version is useful when a specific $(CC) version is faulty in one
457 area, for example the -mregparm=3 were broken in some gcc version 477 area, for example -mregparm=3 was broken in some gcc versions
458 even though the option was accepted by gcc. 478 even though the option was accepted by gcc.
459 479
460 Example: 480 Example:
@@ -463,20 +483,20 @@ more details, with real examples.
463 if [ $(call cc-version) -ge 0300 ] ; then \ 483 if [ $(call cc-version) -ge 0300 ] ; then \
464 echo "-mregparm=3"; fi ;) 484 echo "-mregparm=3"; fi ;)
465 485
466 In the above example -mregparm=3 is only used for gcc version greater 486 In the above example, -mregparm=3 is only used for gcc version greater
467 than or equal to gcc 3.0. 487 than or equal to gcc 3.0.
468 488
469 cc-ifversion 489 cc-ifversion
470 cc-ifversion test the version of $(CC) and equals last argument if 490 cc-ifversion tests the version of $(CC) and equals last argument if
471 version expression is true. 491 version expression is true.
472 492
473 Example: 493 Example:
474 #fs/reiserfs/Makefile 494 #fs/reiserfs/Makefile
475 EXTRA_CFLAGS := $(call cc-ifversion, -lt, 0402, -O1) 495 EXTRA_CFLAGS := $(call cc-ifversion, -lt, 0402, -O1)
476 496
477 In this example EXTRA_CFLAGS will be assigned the value -O1 if the 497 In this example, EXTRA_CFLAGS will be assigned the value -O1 if the
478 $(CC) version is less than 4.2. 498 $(CC) version is less than 4.2.
479 cc-ifversion takes all the shell operators: 499 cc-ifversion takes all the shell operators:
480 -eq, -ne, -lt, -le, -gt, and -ge 500 -eq, -ne, -lt, -le, -gt, and -ge
481 The third parameter may be a text as in this example, but it may also 501 The third parameter may be a text as in this example, but it may also
482 be an expanded variable or a macro. 502 be an expanded variable or a macro.
@@ -492,7 +512,7 @@ The first step is to tell kbuild that a host program exists. This is
492done utilising the variable hostprogs-y. 512done utilising the variable hostprogs-y.
493 513
494The second step is to add an explicit dependency to the executable. 514The second step is to add an explicit dependency to the executable.
495This can be done in two ways. Either add the dependency in a rule, 515This can be done in two ways. Either add the dependency in a rule,
496or utilise the variable $(always). 516or utilise the variable $(always).
497Both possibilities are described in the following. 517Both possibilities are described in the following.
498 518
@@ -509,28 +529,28 @@ Both possibilities are described in the following.
509 Kbuild assumes in the above example that bin2hex is made from a single 529 Kbuild assumes in the above example that bin2hex is made from a single
510 c-source file named bin2hex.c located in the same directory as 530 c-source file named bin2hex.c located in the same directory as
511 the Makefile. 531 the Makefile.
512 532
513--- 4.2 Composite Host Programs 533--- 4.2 Composite Host Programs
514 534
515 Host programs can be made up based on composite objects. 535 Host programs can be made up based on composite objects.
516 The syntax used to define composite objects for host programs is 536 The syntax used to define composite objects for host programs is
517 similar to the syntax used for kernel objects. 537 similar to the syntax used for kernel objects.
518 $(<executable>-objs) list all objects used to link the final 538 $(<executable>-objs) lists all objects used to link the final
519 executable. 539 executable.
520 540
521 Example: 541 Example:
522 #scripts/lxdialog/Makefile 542 #scripts/lxdialog/Makefile
523 hostprogs-y := lxdialog 543 hostprogs-y := lxdialog
524 lxdialog-objs := checklist.o lxdialog.o 544 lxdialog-objs := checklist.o lxdialog.o
525 545
526 Objects with extension .o are compiled from the corresponding .c 546 Objects with extension .o are compiled from the corresponding .c
527 files. In the above example checklist.c is compiled to checklist.o 547 files. In the above example, checklist.c is compiled to checklist.o
528 and lxdialog.c is compiled to lxdialog.o. 548 and lxdialog.c is compiled to lxdialog.o.
529 Finally the two .o files are linked to the executable, lxdialog. 549 Finally, the two .o files are linked to the executable, lxdialog.
530 Note: The syntax <executable>-y is not permitted for host-programs. 550 Note: The syntax <executable>-y is not permitted for host-programs.
531 551
532--- 4.3 Defining shared libraries 552--- 4.3 Defining shared libraries
533 553
534 Objects with extension .so are considered shared libraries, and 554 Objects with extension .so are considered shared libraries, and
535 will be compiled as position independent objects. 555 will be compiled as position independent objects.
536 Kbuild provides support for shared libraries, but the usage 556 Kbuild provides support for shared libraries, but the usage
@@ -543,7 +563,7 @@ Both possibilities are described in the following.
543 hostprogs-y := conf 563 hostprogs-y := conf
544 conf-objs := conf.o libkconfig.so 564 conf-objs := conf.o libkconfig.so
545 libkconfig-objs := expr.o type.o 565 libkconfig-objs := expr.o type.o
546 566
547 Shared libraries always require a corresponding -objs line, and 567 Shared libraries always require a corresponding -objs line, and
548 in the example above the shared library libkconfig is composed by 568 in the example above the shared library libkconfig is composed by
549 the two objects expr.o and type.o. 569 the two objects expr.o and type.o.
@@ -564,7 +584,7 @@ Both possibilities are described in the following.
564 584
565 In the example above the executable is composed of the C++ file 585 In the example above the executable is composed of the C++ file
566 qconf.cc - identified by $(qconf-cxxobjs). 586 qconf.cc - identified by $(qconf-cxxobjs).
567 587
568 If qconf is composed by a mixture of .c and .cc files, then an 588 If qconf is composed by a mixture of .c and .cc files, then an
569 additional line can be used to identify this. 589 additional line can be used to identify this.
570 590
@@ -573,34 +593,35 @@ Both possibilities are described in the following.
573 hostprogs-y := qconf 593 hostprogs-y := qconf
574 qconf-cxxobjs := qconf.o 594 qconf-cxxobjs := qconf.o
575 qconf-objs := check.o 595 qconf-objs := check.o
576 596
577--- 4.5 Controlling compiler options for host programs 597--- 4.5 Controlling compiler options for host programs
578 598
579 When compiling host programs, it is possible to set specific flags. 599 When compiling host programs, it is possible to set specific flags.
580 The programs will always be compiled utilising $(HOSTCC) passed 600 The programs will always be compiled utilising $(HOSTCC) passed
581 the options specified in $(HOSTCFLAGS). 601 the options specified in $(HOSTCFLAGS).
582 To set flags that will take effect for all host programs created 602 To set flags that will take effect for all host programs created
583 in that Makefile use the variable HOST_EXTRACFLAGS. 603 in that Makefile, use the variable HOST_EXTRACFLAGS.
584 604
585 Example: 605 Example:
586 #scripts/lxdialog/Makefile 606 #scripts/lxdialog/Makefile
587 HOST_EXTRACFLAGS += -I/usr/include/ncurses 607 HOST_EXTRACFLAGS += -I/usr/include/ncurses
588 608
589 To set specific flags for a single file the following construction 609 To set specific flags for a single file the following construction
590 is used: 610 is used:
591 611
592 Example: 612 Example:
593 #arch/ppc64/boot/Makefile 613 #arch/ppc64/boot/Makefile
594 HOSTCFLAGS_piggyback.o := -DKERNELBASE=$(KERNELBASE) 614 HOSTCFLAGS_piggyback.o := -DKERNELBASE=$(KERNELBASE)
595 615
596 It is also possible to specify additional options to the linker. 616 It is also possible to specify additional options to the linker.
597 617
598 Example: 618 Example:
599 #scripts/kconfig/Makefile 619 #scripts/kconfig/Makefile
600 HOSTLOADLIBES_qconf := -L$(QTDIR)/lib 620 HOSTLOADLIBES_qconf := -L$(QTDIR)/lib
601 621
602 When linking qconf it will be passed the extra option "-L$(QTDIR)/lib". 622 When linking qconf, it will be passed the extra option
603 623 "-L$(QTDIR)/lib".
624
604--- 4.6 When host programs are actually built 625--- 4.6 When host programs are actually built
605 626
606 Kbuild will only build host-programs when they are referenced 627 Kbuild will only build host-programs when they are referenced
@@ -615,7 +636,7 @@ Both possibilities are described in the following.
615 $(obj)/devlist.h: $(src)/pci.ids $(obj)/gen-devlist 636 $(obj)/devlist.h: $(src)/pci.ids $(obj)/gen-devlist
616 ( cd $(obj); ./gen-devlist ) < $< 637 ( cd $(obj); ./gen-devlist ) < $<
617 638
618 The target $(obj)/devlist.h will not be built before 639 The target $(obj)/devlist.h will not be built before
619 $(obj)/gen-devlist is updated. Note that references to 640 $(obj)/gen-devlist is updated. Note that references to
620 the host programs in special rules must be prefixed with $(obj). 641 the host programs in special rules must be prefixed with $(obj).
621 642
@@ -634,7 +655,7 @@ Both possibilities are described in the following.
634 655
635--- 4.7 Using hostprogs-$(CONFIG_FOO) 656--- 4.7 Using hostprogs-$(CONFIG_FOO)
636 657
637 A typcal pattern in a Kbuild file lok like this: 658 A typical pattern in a Kbuild file looks like this:
638 659
639 Example: 660 Example:
640 #scripts/Makefile 661 #scripts/Makefile
@@ -642,13 +663,13 @@ Both possibilities are described in the following.
642 663
643 Kbuild knows about both 'y' for built-in and 'm' for module. 664 Kbuild knows about both 'y' for built-in and 'm' for module.
644 So if a config symbol evaluates to 'm', kbuild will still build 665 So if a config symbol evaluates to 'm', kbuild will still build
645 the binary. In other words Kbuild handle hostprogs-m exactly 666 the binary. In other words, Kbuild handles hostprogs-m exactly
646 like hostprogs-y. But only hostprogs-y is recommend used 667 like hostprogs-y. But only hostprogs-y is recommended to be used
647 when no CONFIG symbol are involved. 668 when no CONFIG symbols are involved.
648 669
649=== 5 Kbuild clean infrastructure 670=== 5 Kbuild clean infrastructure
650 671
651"make clean" deletes most generated files in the src tree where the kernel 672"make clean" deletes most generated files in the obj tree where the kernel
652is compiled. This includes generated files such as host programs. 673is compiled. This includes generated files such as host programs.
653Kbuild knows targets listed in $(hostprogs-y), $(hostprogs-m), $(always), 674Kbuild knows targets listed in $(hostprogs-y), $(hostprogs-m), $(always),
654$(extra-y) and $(targets). They are all deleted during "make clean". 675$(extra-y) and $(targets). They are all deleted during "make clean".
@@ -666,7 +687,8 @@ When executing "make clean", the two files "devlist.h classlist.h" will
666be deleted. Kbuild will assume files to be in same relative directory as the 687be deleted. Kbuild will assume files to be in same relative directory as the
667Makefile except if an absolute path is specified (path starting with '/'). 688Makefile except if an absolute path is specified (path starting with '/').
668 689
669To delete a directory hirachy use: 690To delete a directory hierarchy use:
691
670 Example: 692 Example:
671 #scripts/package/Makefile 693 #scripts/package/Makefile
672 clean-dirs := $(objtree)/debian/ 694 clean-dirs := $(objtree)/debian/
@@ -709,29 +731,29 @@ be visited during "make clean".
709 731
710The top level Makefile sets up the environment and does the preparation, 732The top level Makefile sets up the environment and does the preparation,
711before starting to descend down in the individual directories. 733before starting to descend down in the individual directories.
712The top level makefile contains the generic part, whereas the 734The top level makefile contains the generic part, whereas
713arch/$(ARCH)/Makefile contains what is required to set-up kbuild 735arch/$(ARCH)/Makefile contains what is required to set up kbuild
714to the said architecture. 736for said architecture.
715To do so arch/$(ARCH)/Makefile sets a number of variables, and defines 737To do so, arch/$(ARCH)/Makefile sets up a number of variables and defines
716a few targets. 738a few targets.
717 739
718When kbuild executes the following steps are followed (roughly): 740When kbuild executes, the following steps are followed (roughly):
7191) Configuration of the kernel => produced .config 7411) Configuration of the kernel => produce .config
7202) Store kernel version in include/linux/version.h 7422) Store kernel version in include/linux/version.h
7213) Symlink include/asm to include/asm-$(ARCH) 7433) Symlink include/asm to include/asm-$(ARCH)
7224) Updating all other prerequisites to the target prepare: 7444) Updating all other prerequisites to the target prepare:
723 - Additional prerequisites are specified in arch/$(ARCH)/Makefile 745 - Additional prerequisites are specified in arch/$(ARCH)/Makefile
7245) Recursively descend down in all directories listed in 7465) Recursively descend down in all directories listed in
725 init-* core* drivers-* net-* libs-* and build all targets. 747 init-* core* drivers-* net-* libs-* and build all targets.
726 - The value of the above variables are extended in arch/$(ARCH)/Makefile. 748 - The values of the above variables are expanded in arch/$(ARCH)/Makefile.
7276) All object files are then linked and the resulting file vmlinux is 7496) All object files are then linked and the resulting file vmlinux is
728 located at the root of the src tree. 750 located at the root of the obj tree.
729 The very first objects linked are listed in head-y, assigned by 751 The very first objects linked are listed in head-y, assigned by
730 arch/$(ARCH)/Makefile. 752 arch/$(ARCH)/Makefile.
7317) Finally the architecture specific part does any required post processing 7537) Finally, the architecture specific part does any required post processing
732 and builds the final bootimage. 754 and builds the final bootimage.
733 - This includes building boot records 755 - This includes building boot records
734 - Preparing initrd images and the like 756 - Preparing initrd images and the like
735 757
736 758
737--- 6.1 Set variables to tweak the build to the architecture 759--- 6.1 Set variables to tweak the build to the architecture
@@ -746,7 +768,7 @@ When kbuild executes the following steps are followed (roughly):
746 LDFLAGS := -m elf_s390 768 LDFLAGS := -m elf_s390
747 Note: EXTRA_LDFLAGS and LDFLAGS_$@ can be used to further customise 769 Note: EXTRA_LDFLAGS and LDFLAGS_$@ can be used to further customise
748 the flags used. See chapter 7. 770 the flags used. See chapter 7.
749 771
750 LDFLAGS_MODULE Options for $(LD) when linking modules 772 LDFLAGS_MODULE Options for $(LD) when linking modules
751 773
752 LDFLAGS_MODULE is used to set specific flags for $(LD) when 774 LDFLAGS_MODULE is used to set specific flags for $(LD) when
@@ -756,7 +778,7 @@ When kbuild executes the following steps are followed (roughly):
756 LDFLAGS_vmlinux Options for $(LD) when linking vmlinux 778 LDFLAGS_vmlinux Options for $(LD) when linking vmlinux
757 779
758 LDFLAGS_vmlinux is used to specify additional flags to pass to 780 LDFLAGS_vmlinux is used to specify additional flags to pass to
759 the linker when linking the final vmlinux. 781 the linker when linking the final vmlinux image.
760 LDFLAGS_vmlinux uses the LDFLAGS_$@ support. 782 LDFLAGS_vmlinux uses the LDFLAGS_$@ support.
761 783
762 Example: 784 Example:
@@ -766,7 +788,7 @@ When kbuild executes the following steps are followed (roughly):
766 OBJCOPYFLAGS objcopy flags 788 OBJCOPYFLAGS objcopy flags
767 789
768 When $(call if_changed,objcopy) is used to translate a .o file, 790 When $(call if_changed,objcopy) is used to translate a .o file,
769 then the flags specified in OBJCOPYFLAGS will be used. 791 the flags specified in OBJCOPYFLAGS will be used.
770 $(call if_changed,objcopy) is often used to generate raw binaries on 792 $(call if_changed,objcopy) is often used to generate raw binaries on
771 vmlinux. 793 vmlinux.
772 794
@@ -778,7 +800,7 @@ When kbuild executes the following steps are followed (roughly):
778 $(obj)/image: vmlinux FORCE 800 $(obj)/image: vmlinux FORCE
779 $(call if_changed,objcopy) 801 $(call if_changed,objcopy)
780 802
781 In this example the binary $(obj)/image is a binary version of 803 In this example, the binary $(obj)/image is a binary version of
782 vmlinux. The usage of $(call if_changed,xxx) will be described later. 804 vmlinux. The usage of $(call if_changed,xxx) will be described later.
783 805
784 AFLAGS $(AS) assembler flags 806 AFLAGS $(AS) assembler flags
@@ -795,7 +817,7 @@ When kbuild executes the following steps are followed (roughly):
795 Default value - see top level Makefile 817 Default value - see top level Makefile
796 Append or modify as required per architecture. 818 Append or modify as required per architecture.
797 819
798 Often the CFLAGS variable depends on the configuration. 820 Often, the CFLAGS variable depends on the configuration.
799 821
800 Example: 822 Example:
801 #arch/i386/Makefile 823 #arch/i386/Makefile
@@ -816,7 +838,7 @@ When kbuild executes the following steps are followed (roughly):
816 ... 838 ...
817 839
818 840
819 The first examples utilises the trick that a config option expands 841 The first example utilises the trick that a config option expands
820 to 'y' when selected. 842 to 'y' when selected.
821 843
822 CFLAGS_KERNEL $(CC) options specific for built-in 844 CFLAGS_KERNEL $(CC) options specific for built-in
@@ -829,18 +851,18 @@ When kbuild executes the following steps are followed (roughly):
829 $(CFLAGS_MODULE) contains extra C compiler flags used to compile code 851 $(CFLAGS_MODULE) contains extra C compiler flags used to compile code
830 for loadable kernel modules. 852 for loadable kernel modules.
831 853
832 854
833--- 6.2 Add prerequisites to archprepare: 855--- 6.2 Add prerequisites to archprepare:
834 856
835 The archprepare: rule is used to list prerequisites that needs to be 857 The archprepare: rule is used to list prerequisites that need to be
836 built before starting to descend down in the subdirectories. 858 built before starting to descend down in the subdirectories.
837 This is usual header files containing assembler constants. 859 This is usually used for header files containing assembler constants.
838 860
839 Example: 861 Example:
840 #arch/arm/Makefile 862 #arch/arm/Makefile
841 archprepare: maketools 863 archprepare: maketools
842 864
843 In this example the file target maketools will be processed 865 In this example, the file target maketools will be processed
844 before descending down in the subdirectories. 866 before descending down in the subdirectories.
845 See also chapter XXX-TODO that describes how kbuild supports 867 See also chapter XXX-TODO that describes how kbuild supports
846 generating offset header files. 868 generating offset header files.
@@ -853,18 +875,19 @@ When kbuild executes the following steps are followed (roughly):
853 corresponding arch-specific section for modules; the module-building 875 corresponding arch-specific section for modules; the module-building
854 machinery is all architecture-independent. 876 machinery is all architecture-independent.
855 877
856 878
857 head-y, init-y, core-y, libs-y, drivers-y, net-y 879 head-y, init-y, core-y, libs-y, drivers-y, net-y
858 880
859 $(head-y) list objects to be linked first in vmlinux. 881 $(head-y) lists objects to be linked first in vmlinux.
860 $(libs-y) list directories where a lib.a archive can be located. 882 $(libs-y) lists directories where a lib.a archive can be located.
861 The rest list directories where a built-in.o object file can be located. 883 The rest lists directories where a built-in.o object file can be
884 located.
862 885
863 $(init-y) objects will be located after $(head-y). 886 $(init-y) objects will be located after $(head-y).
864 Then the rest follows in this order: 887 Then the rest follows in this order:
865 $(core-y), $(libs-y), $(drivers-y) and $(net-y). 888 $(core-y), $(libs-y), $(drivers-y) and $(net-y).
866 889
867 The top level Makefile define values for all generic directories, 890 The top level Makefile defines values for all generic directories,
868 and arch/$(ARCH)/Makefile only adds architecture specific directories. 891 and arch/$(ARCH)/Makefile only adds architecture specific directories.
869 892
870 Example: 893 Example:
@@ -901,27 +924,27 @@ When kbuild executes the following steps are followed (roughly):
901 "$(Q)$(MAKE) $(build)=<dir>" is the recommended way to invoke 924 "$(Q)$(MAKE) $(build)=<dir>" is the recommended way to invoke
902 make in a subdirectory. 925 make in a subdirectory.
903 926
904 There are no rules for naming of the architecture specific targets, 927 There are no rules for naming architecture specific targets,
905 but executing "make help" will list all relevant targets. 928 but executing "make help" will list all relevant targets.
906 To support this $(archhelp) must be defined. 929 To support this, $(archhelp) must be defined.
907 930
908 Example: 931 Example:
909 #arch/i386/Makefile 932 #arch/i386/Makefile
910 define archhelp 933 define archhelp
911 echo '* bzImage - Image (arch/$(ARCH)/boot/bzImage)' 934 echo '* bzImage - Image (arch/$(ARCH)/boot/bzImage)'
912 endef 935 endef
913 936
914 When make is executed without arguments, the first goal encountered 937 When make is executed without arguments, the first goal encountered
915 will be built. In the top level Makefile the first goal present 938 will be built. In the top level Makefile the first goal present
916 is all:. 939 is all:.
917 An architecture shall always per default build a bootable image. 940 An architecture shall always, per default, build a bootable image.
918 In "make help" the default goal is highlighted with a '*'. 941 In "make help", the default goal is highlighted with a '*'.
919 Add a new prerequisite to all: to select a default goal different 942 Add a new prerequisite to all: to select a default goal different
920 from vmlinux. 943 from vmlinux.
921 944
922 Example: 945 Example:
923 #arch/i386/Makefile 946 #arch/i386/Makefile
924 all: bzImage 947 all: bzImage
925 948
926 When "make" is executed without arguments, bzImage will be built. 949 When "make" is executed without arguments, bzImage will be built.
927 950
@@ -941,10 +964,10 @@ When kbuild executes the following steps are followed (roughly):
941 #arch/i386/kernel/Makefile 964 #arch/i386/kernel/Makefile
942 extra-y := head.o init_task.o 965 extra-y := head.o init_task.o
943 966
944 In this example extra-y is used to list object files that 967 In this example, extra-y is used to list object files that
945 shall be built, but shall not be linked as part of built-in.o. 968 shall be built, but shall not be linked as part of built-in.o.
946 969
947 970
948--- 6.6 Commands useful for building a boot image 971--- 6.6 Commands useful for building a boot image
949 972
950 Kbuild provides a few macros that are useful when building a 973 Kbuild provides a few macros that are useful when building a
@@ -958,8 +981,8 @@ When kbuild executes the following steps are followed (roughly):
958 target: source(s) FORCE 981 target: source(s) FORCE
959 $(call if_changed,ld/objcopy/gzip) 982 $(call if_changed,ld/objcopy/gzip)
960 983
961 When the rule is evaluated it is checked to see if any files 984 When the rule is evaluated, it is checked to see if any files
962 needs an update, or the commandline has changed since last 985 need an update, or the command line has changed since the last
963 invocation. The latter will force a rebuild if any options 986 invocation. The latter will force a rebuild if any options
964 to the executable have changed. 987 to the executable have changed.
965 Any target that utilises if_changed must be listed in $(targets), 988 Any target that utilises if_changed must be listed in $(targets),
@@ -977,8 +1000,8 @@ When kbuild executes the following steps are followed (roughly):
977 #WRONG!# $(call if_changed, ld/objcopy/gzip) 1000 #WRONG!# $(call if_changed, ld/objcopy/gzip)
978 1001
979 ld 1002 ld
980 Link target. Often LDFLAGS_$@ is used to set specific options to ld. 1003 Link target. Often, LDFLAGS_$@ is used to set specific options to ld.
981 1004
982 objcopy 1005 objcopy
983 Copy binary. Uses OBJCOPYFLAGS usually specified in 1006 Copy binary. Uses OBJCOPYFLAGS usually specified in
984 arch/$(ARCH)/Makefile. 1007 arch/$(ARCH)/Makefile.
@@ -996,10 +1019,10 @@ When kbuild executes the following steps are followed (roughly):
996 $(obj)/setup $(obj)/bootsect: %: %.o FORCE 1019 $(obj)/setup $(obj)/bootsect: %: %.o FORCE
997 $(call if_changed,ld) 1020 $(call if_changed,ld)
998 1021
999 In this example there are two possible targets, requiring different 1022 In this example, there are two possible targets, requiring different
1000 options to the linker. the linker options are specified using the 1023 options to the linker. The linker options are specified using the
1001 LDFLAGS_$@ syntax - one for each potential target. 1024 LDFLAGS_$@ syntax - one for each potential target.
1002 $(targets) are assinged all potential targets, herby kbuild knows 1025 $(targets) are assigned all potential targets, by which kbuild knows
1003 the targets and will: 1026 the targets and will:
1004 1) check for commandline changes 1027 1) check for commandline changes
1005 2) delete target during make clean 1028 2) delete target during make clean
@@ -1013,7 +1036,7 @@ When kbuild executes the following steps are followed (roughly):
1013 1036
1014--- 6.7 Custom kbuild commands 1037--- 6.7 Custom kbuild commands
1015 1038
1016 When kbuild is executing with KBUILD_VERBOSE=0 then only a shorthand 1039 When kbuild is executing with KBUILD_VERBOSE=0, then only a shorthand
1017 of a command is normally displayed. 1040 of a command is normally displayed.
1018 To enable this behaviour for custom commands kbuild requires 1041 To enable this behaviour for custom commands kbuild requires
1019 two variables to be set: 1042 two variables to be set:
@@ -1031,34 +1054,34 @@ When kbuild executes the following steps are followed (roughly):
1031 $(call if_changed,image) 1054 $(call if_changed,image)
1032 @echo 'Kernel: $@ is ready' 1055 @echo 'Kernel: $@ is ready'
1033 1056
1034 When updating the $(obj)/bzImage target the line: 1057 When updating the $(obj)/bzImage target, the line
1035 1058
1036 BUILD arch/i386/boot/bzImage 1059 BUILD arch/i386/boot/bzImage
1037 1060
1038 will be displayed with "make KBUILD_VERBOSE=0". 1061 will be displayed with "make KBUILD_VERBOSE=0".
1039 1062
1040 1063
1041--- 6.8 Preprocessing linker scripts 1064--- 6.8 Preprocessing linker scripts
1042 1065
1043 When the vmlinux image is build the linker script: 1066 When the vmlinux image is built, the linker script
1044 arch/$(ARCH)/kernel/vmlinux.lds is used. 1067 arch/$(ARCH)/kernel/vmlinux.lds is used.
1045 The script is a preprocessed variant of the file vmlinux.lds.S 1068 The script is a preprocessed variant of the file vmlinux.lds.S
1046 located in the same directory. 1069 located in the same directory.
1047 kbuild knows .lds file and includes a rule *lds.S -> *lds. 1070 kbuild knows .lds files and includes a rule *lds.S -> *lds.
1048 1071
1049 Example: 1072 Example:
1050 #arch/i386/kernel/Makefile 1073 #arch/i386/kernel/Makefile
1051 always := vmlinux.lds 1074 always := vmlinux.lds
1052 1075
1053 #Makefile 1076 #Makefile
1054 export CPPFLAGS_vmlinux.lds += -P -C -U$(ARCH) 1077 export CPPFLAGS_vmlinux.lds += -P -C -U$(ARCH)
1055 1078
1056 The assigment to $(always) is used to tell kbuild to build the 1079 The assignment to $(always) is used to tell kbuild to build the
1057 target: vmlinux.lds. 1080 target vmlinux.lds.
1058 The assignment to $(CPPFLAGS_vmlinux.lds) tell kbuild to use the 1081 The assignment to $(CPPFLAGS_vmlinux.lds) tells kbuild to use the
1059 specified options when building the target vmlinux.lds. 1082 specified options when building the target vmlinux.lds.
1060 1083
1061 When building the *.lds target kbuild used the variakles: 1084 When building the *.lds target, kbuild uses the variables:
1062 CPPFLAGS : Set in top-level Makefile 1085 CPPFLAGS : Set in top-level Makefile
1063 EXTRA_CPPFLAGS : May be set in the kbuild makefile 1086 EXTRA_CPPFLAGS : May be set in the kbuild makefile
1064 CPPFLAGS_$(@F) : Target specific flags. 1087 CPPFLAGS_$(@F) : Target specific flags.
@@ -1123,9 +1146,17 @@ The top Makefile exports the following variables:
1123 $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE). The user may 1146 $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE). The user may
1124 override this value on the command line if desired. 1147 override this value on the command line if desired.
1125 1148
1149 INSTALL_MOD_STRIP
1150
1151 If this variable is specified, it will cause modules to be stripped
1152 after they are installed. If INSTALL_MOD_STRIP is '1', then the
1153 default option --strip-debug will be used. Otherwise,
1154 INSTALL_MOD_STRIP will be used as the option(s) to the strip command.
1155
1156
1126=== 8 Makefile language 1157=== 8 Makefile language
1127 1158
1128The kernel Makefiles are designed to run with GNU Make. The Makefiles 1159The kernel Makefiles are designed to be run with GNU Make. The Makefiles
1129use only the documented features of GNU Make, but they do use many 1160use only the documented features of GNU Make, but they do use many
1130GNU extensions. 1161GNU extensions.
1131 1162
@@ -1147,10 +1178,13 @@ is the right choice.
1147Original version made by Michael Elizabeth Chastain, <mailto:mec@shout.net> 1178Original version made by Michael Elizabeth Chastain, <mailto:mec@shout.net>
1148Updates by Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de> 1179Updates by Kai Germaschewski <kai@tp1.ruhr-uni-bochum.de>
1149Updates by Sam Ravnborg <sam@ravnborg.org> 1180Updates by Sam Ravnborg <sam@ravnborg.org>
1181Language QA by Jan Engelhardt <jengelh@gmx.de>
1150 1182
1151=== 10 TODO 1183=== 10 TODO
1152 1184
1153- Describe how kbuild support shipped files with _shipped. 1185- Describe how kbuild supports shipped files with _shipped.
1154- Generating offset header files. 1186- Generating offset header files.
1155- Add more variables to section 7? 1187- Add more variables to section 7?
1156 1188
1189
1190
diff --git a/Documentation/kbuild/modules.txt b/Documentation/kbuild/modules.txt
index 61fc079eb966..2e7702e94a78 100644
--- a/Documentation/kbuild/modules.txt
+++ b/Documentation/kbuild/modules.txt
@@ -1,7 +1,7 @@
1 1
2In this document you will find information about: 2In this document you will find information about:
3- how to build external modules 3- how to build external modules
4- how to make your module use kbuild infrastructure 4- how to make your module use the kbuild infrastructure
5- how kbuild will install a kernel 5- how kbuild will install a kernel
6- how to install modules in a non-standard location 6- how to install modules in a non-standard location
7 7
@@ -24,7 +24,7 @@ In this document you will find information about:
24 --- 6.1 INSTALL_MOD_PATH 24 --- 6.1 INSTALL_MOD_PATH
25 --- 6.2 INSTALL_MOD_DIR 25 --- 6.2 INSTALL_MOD_DIR
26 === 7. Module versioning & Module.symvers 26 === 7. Module versioning & Module.symvers
27 --- 7.1 Symbols fron the kernel (vmlinux + modules) 27 --- 7.1 Symbols from the kernel (vmlinux + modules)
28 --- 7.2 Symbols and external modules 28 --- 7.2 Symbols and external modules
29 --- 7.3 Symbols from another external module 29 --- 7.3 Symbols from another external module
30 === 8. Tips & Tricks 30 === 8. Tips & Tricks
@@ -36,13 +36,13 @@ In this document you will find information about:
36 36
37kbuild includes functionality for building modules both 37kbuild includes functionality for building modules both
38within the kernel source tree and outside the kernel source tree. 38within the kernel source tree and outside the kernel source tree.
39The latter is usually referred to as external modules and is used 39The latter is usually referred to as external or "out-of-tree"
40both during development and for modules that are not planned to be 40modules and is used both during development and for modules that
41included in the kernel tree. 41are not planned to be included in the kernel tree.
42 42
43What is covered within this file is mainly information to authors 43What is covered within this file is mainly information to authors
44of modules. The author of an external modules should supply 44of modules. The author of an external module should supply
45a makefile that hides most of the complexity so one only has to type 45a makefile that hides most of the complexity, so one only has to type
46'make' to build the module. A complete example will be present in 46'make' to build the module. A complete example will be present in
47chapter 4, "Creating a kbuild file for an external module". 47chapter 4, "Creating a kbuild file for an external module".
48 48
@@ -63,14 +63,15 @@ when building an external module.
63 For the running kernel use: 63 For the running kernel use:
64 make -C /lib/modules/`uname -r`/build M=`pwd` 64 make -C /lib/modules/`uname -r`/build M=`pwd`
65 65
66 For the above command to succeed the kernel must have been built with 66 For the above command to succeed, the kernel must have been
67 modules enabled. 67 built with modules enabled.
68 68
69 To install the modules that were just built: 69 To install the modules that were just built:
70 70
71 make -C <path-to-kernel> M=`pwd` modules_install 71 make -C <path-to-kernel> M=`pwd` modules_install
72 72
73 More complex examples later, the above should get you going. 73 More complex examples will be shown later; the above should
74 be enough to get you started.
74 75
75--- 2.2 Available targets 76--- 2.2 Available targets
76 77
@@ -89,13 +90,13 @@ when building an external module.
89 Same functionality as if no target was specified. 90 Same functionality as if no target was specified.
90 See description above. 91 See description above.
91 92
92 make -C $KDIR M=$PWD modules_install 93 make -C $KDIR M=`pwd` modules_install
93 Install the external module(s). 94 Install the external module(s).
94 Installation default is in /lib/modules/<kernel-version>/extra, 95 Installation default is in /lib/modules/<kernel-version>/extra,
95 but may be prefixed with INSTALL_MOD_PATH - see separate 96 but may be prefixed with INSTALL_MOD_PATH - see separate
96 chapter. 97 chapter.
97 98
98 make -C $KDIR M=$PWD clean 99 make -C $KDIR M=`pwd` clean
99 Remove all generated files for the module - the kernel 100 Remove all generated files for the module - the kernel
100 source directory is not modified. 101 source directory is not modified.
101 102
@@ -129,29 +130,28 @@ when building an external module.
129 130
130 To make sure the kernel contains the information required to 131 To make sure the kernel contains the information required to
131 build external modules the target 'modules_prepare' must be used. 132 build external modules the target 'modules_prepare' must be used.
132 'module_prepare' solely exists as a simple way to prepare 133 'modules_prepare' exists solely as a simple way to prepare
133 a kernel for building external modules. 134 a kernel source tree for building external modules.
134 Note: modules_prepare will not build Module.symvers even if 135 Note: modules_prepare will not build Module.symvers even if
135 CONFIG_MODULEVERSIONING is set. 136 CONFIG_MODVERSIONS is set. Therefore a full kernel build
136 Therefore a full kernel build needs to be executed to make 137 needs to be executed to make module versioning work.
137 module versioning work.
138 138
139--- 2.5 Building separate files for a module 139--- 2.5 Building separate files for a module
140 It is possible to build single files which is part of a module. 140 It is possible to build single files which are part of a module.
141 This works equal for the kernel, a module and even for external 141 This works equally well for the kernel, a module and even for
142 modules. 142 external modules.
143 Examples (module foo.ko, consisting of bar.o, baz.o): 143 Examples (module foo.ko, consisting of bar.o, baz.o):
144 make -C $KDIR M=`pwd` bar.lst 144 make -C $KDIR M=`pwd` bar.lst
145 make -C $KDIR M=`pwd` bar.o 145 make -C $KDIR M=`pwd` bar.o
146 make -C $KDIR M=`pwd` foo.ko 146 make -C $KDIR M=`pwd` foo.ko
147 make -C $KDIR M=`pwd` / 147 make -C $KDIR M=`pwd` /
148 148
149 149
150=== 3. Example commands 150=== 3. Example commands
151 151
152This example shows the actual commands to be executed when building 152This example shows the actual commands to be executed when building
153an external module for the currently running kernel. 153an external module for the currently running kernel.
154In the example below the distribution is supposed to use the 154In the example below, the distribution is supposed to use the
155facility to locate output files for a kernel compile in a different 155facility to locate output files for a kernel compile in a different
156directory than the kernel source - but the examples will also work 156directory than the kernel source - but the examples will also work
157when the source and the output files are mixed in the same directory. 157when the source and the output files are mixed in the same directory.
@@ -170,14 +170,14 @@ the following commands to build the module:
170 O=/lib/modules/`uname -r`/build \ 170 O=/lib/modules/`uname -r`/build \
171 M=`pwd` 171 M=`pwd`
172 172
173Then to install the module use the following command: 173Then, to install the module use the following command:
174 174
175 make -C /usr/src/`uname -r`/source \ 175 make -C /usr/src/`uname -r`/source \
176 O=/lib/modules/`uname-r`/build \ 176 O=/lib/modules/`uname-r`/build \
177 M=`pwd` \ 177 M=`pwd` \
178 modules_install 178 modules_install
179 179
180If one looks closely you will see that this is the same commands as 180If you look closely you will see that this is the same command as
181listed before - with the directories spelled out. 181listed before - with the directories spelled out.
182 182
183The above are rather long commands, and the following chapter 183The above are rather long commands, and the following chapter
@@ -230,7 +230,7 @@ following files:
230 230
231 endif 231 endif
232 232
233 In example 1 the check for KERNELRELEASE is used to separate 233 In example 1, the check for KERNELRELEASE is used to separate
234 the two parts of the Makefile. kbuild will only see the two 234 the two parts of the Makefile. kbuild will only see the two
235 assignments whereas make will see everything except the two 235 assignments whereas make will see everything except the two
236 kbuild assignments. 236 kbuild assignments.
@@ -255,7 +255,7 @@ following files:
255 echo "X" > 8123_bin_shipped 255 echo "X" > 8123_bin_shipped
256 256
257 257
258 In example 2 we are down to two fairly simple files and for simple 258 In example 2, we are down to two fairly simple files and for simple
259 files as used in this example the split is questionable. But some 259 files as used in this example the split is questionable. But some
260 external modules use Makefiles of several hundred lines and here it 260 external modules use Makefiles of several hundred lines and here it
261 really pays off to separate the kbuild part from the rest. 261 really pays off to separate the kbuild part from the rest.
@@ -282,9 +282,9 @@ following files:
282 282
283 endif 283 endif
284 284
285 The trick here is to include the Kbuild file from Makefile so 285 The trick here is to include the Kbuild file from Makefile, so
286 if an older version of kbuild picks up the Makefile the Kbuild 286 if an older version of kbuild picks up the Makefile, the Kbuild
287 file will be included. 287 file will be included.
288 288
289--- 4.2 Binary blobs included in a module 289--- 4.2 Binary blobs included in a module
290 290
@@ -301,18 +301,19 @@ following files:
301 obj-m := 8123.o 301 obj-m := 8123.o
302 8123-y := 8123_if.o 8123_pci.o 8123_bin.o 302 8123-y := 8123_if.o 8123_pci.o 8123_bin.o
303 303
304 In example 4 there is no distinction between the ordinary .c/.h files 304 In example 4, there is no distinction between the ordinary .c/.h files
305 and the binary file. But kbuild will pick up different rules to create 305 and the binary file. But kbuild will pick up different rules to create
306 the .o file. 306 the .o file.
307 307
308 308
309=== 5. Include files 309=== 5. Include files
310 310
311Include files are a necessity when a .c file uses something from another .c 311Include files are a necessity when a .c file uses something from other .c
312files (not strictly in the sense of .c but if good programming practice is 312files (not strictly in the sense of C, but if good programming practice is
313used). Any module that consist of more than one .c file will have a .h file 313used). Any module that consists of more than one .c file will have a .h file
314for one of the .c files. 314for one of the .c files.
315- If the .h file only describes a module internal interface then the .h file 315
316- If the .h file only describes a module internal interface, then the .h file
316 shall be placed in the same directory as the .c files. 317 shall be placed in the same directory as the .c files.
317- If the .h files describe an interface used by other parts of the kernel 318- If the .h files describe an interface used by other parts of the kernel
318 located in different directories, the .h files shall be located in 319 located in different directories, the .h files shall be located in
@@ -323,11 +324,11 @@ under include/ such as include/scsi. Another exception is arch-specific
323.h files which are located under include/asm-$(ARCH)/*. 324.h files which are located under include/asm-$(ARCH)/*.
324 325
325External modules have a tendency to locate include files in a separate include/ 326External modules have a tendency to locate include files in a separate include/
326directory and therefore needs to deal with this in their kbuild file. 327directory and therefore need to deal with this in their kbuild file.
327 328
328--- 5.1 How to include files from the kernel include dir 329--- 5.1 How to include files from the kernel include dir
329 330
330 When a module needs to include a file from include/linux/ then one 331 When a module needs to include a file from include/linux/, then one
331 just uses: 332 just uses:
332 333
333 #include <linux/module.h> 334 #include <linux/module.h>
@@ -348,7 +349,7 @@ directory and therefore needs to deal with this in their kbuild file.
348 The trick here is to use either EXTRA_CFLAGS (take effect for all .c 349 The trick here is to use either EXTRA_CFLAGS (take effect for all .c
349 files) or CFLAGS_$F.o (take effect only for a single file). 350 files) or CFLAGS_$F.o (take effect only for a single file).
350 351
351 In our example if we move 8123_if.h to a subdirectory named include/ 352 In our example, if we move 8123_if.h to a subdirectory named include/
352 the resulting Kbuild file would look like: 353 the resulting Kbuild file would look like:
353 354
354 --> filename: Kbuild 355 --> filename: Kbuild
@@ -362,19 +363,19 @@ directory and therefore needs to deal with this in their kbuild file.
362 363
363--- 5.3 External modules using several directories 364--- 5.3 External modules using several directories
364 365
365 If an external module does not follow the usual kernel style but 366 If an external module does not follow the usual kernel style, but
366 decide to spread files over several directories then kbuild can 367 decides to spread files over several directories, then kbuild can
367 support this too. 368 handle this too.
368 369
369 Consider the following example: 370 Consider the following example:
370 371
371 | 372 |
372 +- src/complex_main.c 373 +- src/complex_main.c
373 | +- hal/hardwareif.c 374 | +- hal/hardwareif.c
374 | +- hal/include/hardwareif.h 375 | +- hal/include/hardwareif.h
375 +- include/complex.h 376 +- include/complex.h
376 377
377 To build a single module named complex.ko we then need the following 378 To build a single module named complex.ko, we then need the following
378 kbuild file: 379 kbuild file:
379 380
380 Kbuild: 381 Kbuild:
@@ -387,12 +388,12 @@ directory and therefore needs to deal with this in their kbuild file.
387 388
388 389
389 kbuild knows how to handle .o files located in another directory - 390 kbuild knows how to handle .o files located in another directory -
390 although this is NOT reccommended practice. The syntax is to specify 391 although this is NOT recommended practice. The syntax is to specify
391 the directory relative to the directory where the Kbuild file is 392 the directory relative to the directory where the Kbuild file is
392 located. 393 located.
393 394
394 To find the .h files we have to explicitly tell kbuild where to look 395 To find the .h files, we have to explicitly tell kbuild where to look
395 for the .h files. When kbuild executes current directory is always 396 for the .h files. When kbuild executes, the current directory is always
396 the root of the kernel tree (argument to -C) and therefore we have to 397 the root of the kernel tree (argument to -C) and therefore we have to
397 tell kbuild how to find the .h files using absolute paths. 398 tell kbuild how to find the .h files using absolute paths.
398 $(src) will specify the absolute path to the directory where the 399 $(src) will specify the absolute path to the directory where the
@@ -412,7 +413,7 @@ External modules are installed in the directory:
412 413
413--- 6.1 INSTALL_MOD_PATH 414--- 6.1 INSTALL_MOD_PATH
414 415
415 Above are the default directories, but as always some level of 416 Above are the default directories, but as always, some level of
416 customization is possible. One can prefix the path using the variable 417 customization is possible. One can prefix the path using the variable
417 INSTALL_MOD_PATH: 418 INSTALL_MOD_PATH:
418 419
@@ -420,17 +421,17 @@ External modules are installed in the directory:
420 => Install dir: /frodo/lib/modules/$(KERNELRELEASE)/kernel 421 => Install dir: /frodo/lib/modules/$(KERNELRELEASE)/kernel
421 422
422 INSTALL_MOD_PATH may be set as an ordinary shell variable or as in the 423 INSTALL_MOD_PATH may be set as an ordinary shell variable or as in the
423 example above be specified on the command line when calling make. 424 example above, can be specified on the command line when calling make.
424 INSTALL_MOD_PATH has effect both when installing modules included in 425 INSTALL_MOD_PATH has effect both when installing modules included in
425 the kernel as well as when installing external modules. 426 the kernel as well as when installing external modules.
426 427
427--- 6.2 INSTALL_MOD_DIR 428--- 6.2 INSTALL_MOD_DIR
428 429
429 When installing external modules they are default installed in a 430 When installing external modules they are by default installed to a
430 directory under /lib/modules/$(KERNELRELEASE)/extra, but one may wish 431 directory under /lib/modules/$(KERNELRELEASE)/extra, but one may wish
431 to locate modules for a specific functionality in a separate 432 to locate modules for a specific functionality in a separate
432 directory. For this purpose one can use INSTALL_MOD_DIR to specify an 433 directory. For this purpose, one can use INSTALL_MOD_DIR to specify an
433 alternative name than 'extra'. 434 alternative name to 'extra'.
434 435
435 $ make INSTALL_MOD_DIR=gandalf -C KERNELDIR \ 436 $ make INSTALL_MOD_DIR=gandalf -C KERNELDIR \
436 M=`pwd` modules_install 437 M=`pwd` modules_install
@@ -444,16 +445,16 @@ Module versioning is enabled by the CONFIG_MODVERSIONS tag.
444Module versioning is used as a simple ABI consistency check. The Module 445Module versioning is used as a simple ABI consistency check. The Module
445versioning creates a CRC value of the full prototype for an exported symbol and 446versioning creates a CRC value of the full prototype for an exported symbol and
446when a module is loaded/used then the CRC values contained in the kernel are 447when a module is loaded/used then the CRC values contained in the kernel are
447compared with similar values in the module. If they are not equal then the 448compared with similar values in the module. If they are not equal, then the
448kernel refuses to load the module. 449kernel refuses to load the module.
449 450
450Module.symvers contains a list of all exported symbols from a kernel build. 451Module.symvers contains a list of all exported symbols from a kernel build.
451 452
452--- 7.1 Symbols from the kernel (vmlinux + modules) 453--- 7.1 Symbols from the kernel (vmlinux + modules)
453 454
454 During a kernel build a file named Module.symvers will be generated. 455 During a kernel build, a file named Module.symvers will be generated.
455 Module.symvers contains all exported symbols from the kernel and 456 Module.symvers contains all exported symbols from the kernel and
456 compiled modules. For each symbols the corresponding CRC value 457 compiled modules. For each symbol, the corresponding CRC value
457 is stored too. 458 is stored too.
458 459
459 The syntax of the Module.symvers file is: 460 The syntax of the Module.symvers file is:
@@ -461,27 +462,27 @@ Module.symvers contains a list of all exported symbols from a kernel build.
461 Sample: 462 Sample:
462 0x2d036834 scsi_remove_host drivers/scsi/scsi_mod 463 0x2d036834 scsi_remove_host drivers/scsi/scsi_mod
463 464
464 For a kernel build without CONFIG_MODVERSIONING enabled the crc 465 For a kernel build without CONFIG_MODVERSIONS enabled, the crc
465 would read: 0x00000000 466 would read: 0x00000000
466 467
467 Module.symvers serve two purposes. 468 Module.symvers serves two purposes:
468 1) It list all exported symbols both from vmlinux and all modules 469 1) It lists all exported symbols both from vmlinux and all modules
469 2) It list CRC if CONFIG_MODVERSION is enabled 470 2) It lists the CRC if CONFIG_MODVERSIONS is enabled
470 471
471--- 7.2 Symbols and external modules 472--- 7.2 Symbols and external modules
472 473
473 When building an external module the build system needs access to 474 When building an external module, the build system needs access to
474 the symbols from the kernel to check if all external symbols are 475 the symbols from the kernel to check if all external symbols are
475 defined. This is done in the MODPOST step and to obtain all 476 defined. This is done in the MODPOST step and to obtain all
476 symbols modpost reads Module.symvers from the kernel. 477 symbols, modpost reads Module.symvers from the kernel.
477 If a Module.symvers file is present in the directory where 478 If a Module.symvers file is present in the directory where
478 the external module is being build this file will be read too. 479 the external module is being built, this file will be read too.
479 During the MODPOST step a new Module.symvers file will be written 480 During the MODPOST step, a new Module.symvers file will be written
480 containing all exported symbols that was not defined in the kernel. 481 containing all exported symbols that were not defined in the kernel.
481 482
482--- 7.3 Symbols from another external module 483--- 7.3 Symbols from another external module
483 484
484 Sometimes one external module uses exported symbols from another 485 Sometimes, an external module uses exported symbols from another
485 external module. Kbuild needs to have full knowledge on all symbols 486 external module. Kbuild needs to have full knowledge on all symbols
486 to avoid spitting out warnings about undefined symbols. 487 to avoid spitting out warnings about undefined symbols.
487 Two solutions exist to let kbuild know all symbols of more than 488 Two solutions exist to let kbuild know all symbols of more than
@@ -490,15 +491,15 @@ Module.symvers contains a list of all exported symbols from a kernel build.
490 impractical in certain situations. 491 impractical in certain situations.
491 492
492 Use a top-level Kbuild file 493 Use a top-level Kbuild file
493 If you have two modules: 'foo', 'bar' and 'foo' needs symbols 494 If you have two modules: 'foo' and 'bar', and 'foo' needs
494 from 'bar' then one can use a common top-level kbuild file so 495 symbols from 'bar', then one can use a common top-level kbuild
495 both modules are compiled in same build. 496 file so both modules are compiled in same build.
496 497
497 Consider following directory layout: 498 Consider following directory layout:
498 ./foo/ <= contains the foo module 499 ./foo/ <= contains the foo module
499 ./bar/ <= contains the bar module 500 ./bar/ <= contains the bar module
500 The top-level Kbuild file would then look like: 501 The top-level Kbuild file would then look like:
501 502
502 #./Kbuild: (this file may also be named Makefile) 503 #./Kbuild: (this file may also be named Makefile)
503 obj-y := foo/ bar/ 504 obj-y := foo/ bar/
504 505
@@ -509,23 +510,23 @@ Module.symvers contains a list of all exported symbols from a kernel build.
509 knowledge on symbols from both modules. 510 knowledge on symbols from both modules.
510 511
511 Use an extra Module.symvers file 512 Use an extra Module.symvers file
512 When an external module is build a Module.symvers file is 513 When an external module is built, a Module.symvers file is
513 generated containing all exported symbols which are not 514 generated containing all exported symbols which are not
514 defined in the kernel. 515 defined in the kernel.
515 To get access to symbols from module 'bar' one can copy the 516 To get access to symbols from module 'bar', one can copy the
516 Module.symvers file from the compilation of the 'bar' module 517 Module.symvers file from the compilation of the 'bar' module
517 to the directory where the 'foo' module is build. 518 to the directory where the 'foo' module is built.
518 During the module build kbuild will read the Module.symvers 519 During the module build, kbuild will read the Module.symvers
519 file in the directory of the external module and when the 520 file in the directory of the external module and when the
520 build is finished a new Module.symvers file is created 521 build is finished, a new Module.symvers file is created
521 containing the sum of all symbols defined and not part of the 522 containing the sum of all symbols defined and not part of the
522 kernel. 523 kernel.
523 524
524=== 8. Tips & Tricks 525=== 8. Tips & Tricks
525 526
526--- 8.1 Testing for CONFIG_FOO_BAR 527--- 8.1 Testing for CONFIG_FOO_BAR
527 528
528 Modules often needs to check for certain CONFIG_ options to decide if 529 Modules often need to check for certain CONFIG_ options to decide if
529 a specific feature shall be included in the module. When kbuild is used 530 a specific feature shall be included in the module. When kbuild is used
530 this is done by referencing the CONFIG_ variable directly. 531 this is done by referencing the CONFIG_ variable directly.
531 532
@@ -537,7 +538,7 @@ Module.symvers contains a list of all exported symbols from a kernel build.
537 538
538 External modules have traditionally used grep to check for specific 539 External modules have traditionally used grep to check for specific
539 CONFIG_ settings directly in .config. This usage is broken. 540 CONFIG_ settings directly in .config. This usage is broken.
540 As introduced before external modules shall use kbuild when building 541 As introduced before, external modules shall use kbuild when building
541 and therefore can use the same methods as in-kernel modules when testing 542 and therefore can use the same methods as in-kernel modules when
542 for CONFIG_ definitions. 543 testing for CONFIG_ definitions.
543 544
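As an illustration of the Module.symvers layout shown in section 7.1 of the kbuild hunk above (CRC, symbol name, module path), the following is a minimal sketch of a line parser. The names SymverEntry, parse_symvers_line and has_modversions are hypothetical and not part of kbuild. The sketch assumes the three fields are separated by whitespace, as in the sample line.

```python
from typing import Iterable, NamedTuple


class SymverEntry(NamedTuple):
    """One Module.symvers record (hypothetical helper type for this sketch)."""
    crc: int
    symbol: str
    module: str


def parse_symvers_line(line: str) -> SymverEntry:
    """Parse a line such as '0x2d036834 scsi_remove_host drivers/scsi/scsi_mod'.

    Assumption: fields are whitespace-separated, as in the sample above.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected '<crc> <symbol> <module>', got: {line!r}")
    crc_text, symbol, module = fields[:3]
    # int() with base 16 accepts the leading '0x' prefix.
    return SymverEntry(int(crc_text, 16), symbol, module)


def has_modversions(entries: Iterable[SymverEntry]) -> bool:
    """Return True if any CRC is non-zero.

    Per the text above, a build without CONFIG_MODVERSIONS records
    0x00000000 for every symbol.
    """
    return any(entry.crc != 0 for entry in entries)
```

For example, parse_symvers_line("0x2d036834 scsi_remove_host drivers/scsi/scsi_mod") yields an entry whose symbol is "scsi_remove_host" and whose module is "drivers/scsi/scsi_mod".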
diff --git a/Documentation/kdump/gdbmacros.txt b/Documentation/kdump/gdbmacros.txt
index dcf5580380ab..9b9b454b048a 100644
--- a/Documentation/kdump/gdbmacros.txt
+++ b/Documentation/kdump/gdbmacros.txt
@@ -175,7 +175,7 @@ end
175document trapinfo 175document trapinfo
176 Run info threads and lookup pid of thread #1 176 Run info threads and lookup pid of thread #1
177 'trapinfo <pid>' will tell you by which trap & possibly 177 'trapinfo <pid>' will tell you by which trap & possibly
178 addresthe kernel paniced. 178 address the kernel panicked.
179end 179end
180 180
181 181
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 212cf3c21abf..08bafa8c1caa 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -1,155 +1,325 @@
1Documentation for kdump - the kexec-based crash dumping solution 1================================================================
2Documentation for Kdump - The kexec-based Crash Dumping Solution
2================================================================ 3================================================================
3 4
4DESIGN 5This document includes overview, setup and installation, and analysis
5====== 6information.
6 7
7Kdump uses kexec to reboot to a second kernel whenever a dump needs to be 8Overview
8taken. This second kernel is booted with very little memory. The first kernel 9========
9reserves the section of memory that the second kernel uses. This ensures that
10on-going DMA from the first kernel does not corrupt the second kernel.
11 10
12All the necessary information about Core image is encoded in ELF format and 11Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
13stored in reserved area of memory before crash. Physical address of start of 12dump of the system kernel's memory needs to be taken (for example, when
14ELF header is passed to new kernel through command line parameter elfcorehdr=. 13the system panics). The system kernel's memory image is preserved across
14the reboot and is accessible to the dump-capture kernel.
15 15
16On i386, the first 640 KB of physical memory is needed to boot, irrespective 16You can use common Linux commands, such as cp and scp, to copy the
17of where the kernel loads. Hence, this region is backed up by kexec just before 17memory image to a dump file on the local disk, or across the network to
18rebooting into the new kernel. 18a remote system.
19 19
20In the second kernel, "old memory" can be accessed in two ways. 20Kdump and kexec are currently supported on the x86, x86_64, and ppc64
21architectures.
21 22
22- The first one is through a /dev/oldmem device interface. A capture utility 23When the system kernel boots, it reserves a small section of memory for
23 can read the device file and write out the memory in raw format. This is raw 24the dump-capture kernel. This ensures that ongoing Direct Memory Access
24 dump of memory and analysis/capture tool should be intelligent enough to 25(DMA) from the system kernel does not corrupt the dump-capture kernel.
25 determine where to look for the right information. ELF headers (elfcorehdr=) 26The kexec -p command loads the dump-capture kernel into this reserved
26 can become handy here. 27memory.
27 28
28- The second interface is through /proc/vmcore. This exports the dump as an ELF 29On x86 machines, the first 640 KB of physical memory is needed to boot,
29 format file which can be written out using any file copy command 30regardless of where the kernel loads. Therefore, kexec backs up this
30 (cp, scp, etc). Further, gdb can be used to perform limited debugging on 31region just before rebooting into the dump-capture kernel.
31 the dump file. This method ensures methods ensure that there is correct
32 ordering of the dump pages (corresponding to the first 640 KB that has been
33 relocated).
34 32
35SETUP 33All of the necessary information about the system kernel's core image is
36===== 34encoded in the ELF format, and stored in a reserved area of memory
35before a crash. The physical address of the start of the ELF header is
36passed to the dump-capture kernel through the elfcorehdr= boot
37parameter.
38
39With the dump-capture kernel, you can access the memory image, or "old
40memory," in two ways:
41
42- Through a /dev/oldmem device interface. A capture utility can read the
43 device file and write out the memory in raw format. This is a raw dump
44 of memory. Analysis and capture tools must be intelligent enough to
45 determine where to look for the right information.
46
47- Through /proc/vmcore. This exports the dump as an ELF-format file that
48 you can write out using file copy commands such as cp or scp. Further,
49 you can use analysis tools such as the GNU Debugger (GDB) and the Crash
50 tool to debug the dump file. This method ensures that the dump pages are
51 correctly ordered.
52
53
54Setup and Installation
55======================
56
57Install kexec-tools and the Kdump patch
58---------------------------------------
59
601) Login as the root user.
61
622) Download the kexec-tools user-space package from the following URL:
63
64 http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz
65
663) Unpack the tarball with the tar command, as follows:
67
68 tar xvpzf kexec-tools-1.101.tar.gz
69
704) Download the latest consolidated Kdump patch from the following URL:
71
72 http://lse.sourceforge.net/kdump/
73
74 (This location is being used until all the user-space Kdump patches
75 are integrated with the kexec-tools package.)
76
775) Change to the kexec-tools-1.101 directory, as follows:
78
79 cd kexec-tools-1.101
80
816) Apply the consolidated patch to the kexec-tools-1.101 source tree
82 with the patch command, as follows. (Modify the path to the downloaded
83 patch as necessary.)
84
85 patch -p1 < /path-to-kdump-patch/kexec-tools-1.101-kdump.patch
86
877) Configure the package, as follows:
88
89 ./configure
90
918) Compile the package, as follows:
92
93 make
94
959) Install the package, as follows:
96
97 make install
98
99
100Download and build the system and dump-capture kernels
101------------------------------------------------------
102
103Download the mainline (vanilla) kernel source code (2.6.13-rc1 or newer)
104from http://www.kernel.org. Two kernels must be built: a system kernel
105and a dump-capture kernel. Use the following steps to configure these
106kernels with the necessary kexec and Kdump features:
107
108System kernel
109-------------
110
1111) Enable "kexec system call" in "Processor type and features."
112
113 CONFIG_KEXEC=y
114
1152) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
116 filesystems." This is usually enabled by default.
117
118 CONFIG_SYSFS=y
119
120 Note that "sysfs file system support" might not appear in the "Pseudo
121 filesystems" menu if "Configure standard kernel features (for small
122 systems)" is not enabled in "General Setup." In this case, check the
123 .config file itself to ensure that sysfs is turned on, as follows:
124
125 grep 'CONFIG_SYSFS' .config
126
1273) Enable "Compile the kernel with debug info" in "Kernel hacking."
128
129 CONFIG_DEBUG_INFO=y
130
131 This causes the kernel to be built with debug symbols. The dump
132 analysis tools require a vmlinux with debug symbols in order to read
133 and analyze a dump file.
134
1354) Make and install the kernel and its modules. Update the boot loader
136 (such as grub, yaboot, or lilo) configuration files as necessary.
137
1385) Boot the system kernel with the boot parameter "crashkernel=Y@X",
139 where Y specifies how much memory to reserve for the dump-capture kernel
140 and X specifies the beginning of this reserved memory. For example,
141 "crashkernel=64M@16M" tells the system kernel to reserve 64 MB of memory
142 starting at physical address 0x01000000 for the dump-capture kernel.
143
144 On x86 and x86_64, use "crashkernel=64M@16M".
145
146 On ppc64, use "crashkernel=128M@32M".
147
148
149The dump-capture kernel
150-----------------------
37 151
381) Download the upstream kexec-tools userspace package from 1521) Under "General setup," append "-kdump" to the current string in
39 http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz. 153 "Local version."
40 154
41 Apply the latest consolidated kdump patch on top of kexec-tools-1.101 1552) On x86, enable high memory support under "Processor type and
42 from http://lse.sourceforge.net/kdump/. This arrangment has been made 156 features":
43 till all the userspace patches supporting kdump are integrated with 157
44 upstream kexec-tools userspace. 158 CONFIG_HIGHMEM64G=y
45 159 or
462) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernels. 160 CONFIG_HIGHMEM4G
47 Two kernels need to be built in order to get this feature working. 160 CONFIG_HIGHMEM4G=y
48 Following are the steps to properly configure the two kernels specific 1623) On x86 and x86_64, disable symmetric multi-processing support
49 to kexec and kdump features: 163 under "Processor type and features":
50 164
51 A) First kernel or regular kernel: 165 CONFIG_SMP=n
52 ---------------------------------- 166 (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
53 a) Enable "kexec system call" feature (in Processor type and features). 167 when loading the dump-capture kernel, see section "Load the Dump-capture
54 CONFIG_KEXEC=y 168 Kernel".)
55 b) Enable "sysfs file system support" (in Pseudo filesystems). 169
56 CONFIG_SYSFS=y 1704) On ppc64, disable NUMA support and enable EMBEDDED support:
57 c) make 171
58 d) Boot into first kernel with the command line parameter "crashkernel=Y@X". 172 CONFIG_NUMA=n
59 Use appropriate values for X and Y. Y denotes how much memory to reserve 173 CONFIG_EMBEDDED=y
60 for the second kernel, and X denotes at what physical address the 174 CONFIG_EEH=N for the dump-capture kernel
61 reserved memory section starts. For example: "crashkernel=64M@16M". 175
62 1765) Enable "kernel crash dumps" support under "Processor type and
63 177 features":
64 B) Second kernel or dump capture kernel: 178
65 --------------------------------------- 179 CONFIG_CRASH_DUMP=y
66 a) For i386 architecture enable Highmem support 180
67 CONFIG_HIGHMEM=y 1816) Use a suitable value for "Physical address where the kernel is
68 b) Enable "kernel crash dumps" feature (under "Processor type and features") 182 loaded" (under "Processor type and features"). This only appears when
69 CONFIG_CRASH_DUMP=y 183 "kernel crash dumps" is enabled. By default this value is 0x1000000
70 c) Make sure a suitable value for "Physical address where the kernel is 184 (16MB). It should be the same as X in the "crashkernel=Y@X" boot
71 loaded" (under "Processor type and features"). By default this value 185 parameter discussed above.
72 is 0x1000000 (16MB) and it should be same as X (See option d above), 186
73 e.g., 16 MB or 0x1000000. 187 On x86 and x86_64, use "CONFIG_PHYSICAL_START=0x1000000".
74 CONFIG_PHYSICAL_START=0x1000000 188
75 d) Enable "/proc/vmcore support" (Optional, under "Pseudo filesystems"). 189 On ppc64 the value is automatically set at 32MB when
76 CONFIG_PROC_VMCORE=y 190 CONFIG_CRASH_DUMP is set.
77 191
783) After booting to regular kernel or first kernel, load the second kernel 1927) Optionally enable "/proc/vmcore support" under "Filesystems" ->
79 using the following command: 193 "Pseudo filesystems".
80 194
81 kexec -p <second-kernel> --args-linux --elf32-core-headers 195 CONFIG_PROC_VMCORE=y
82 --append="root=<root-dev> init 1 irqpoll maxcpus=1" 196 (CONFIG_PROC_VMCORE is set by default when CONFIG_CRASH_DUMP is selected.)
83 197
84 Notes: 1988) Make and install the kernel and its modules. DO NOT add this kernel
85 ====== 199 to the boot loader configuration files.
86 i) <second-kernel> has to be a vmlinux image ie uncompressed elf image. 200
87 bzImage will not work, as of now. 201
88 ii) --args-linux has to be speicfied as if kexec it loading an elf image, 202Load the Dump-capture Kernel
89 it needs to know that the arguments supplied are of linux type. 203============================
90 iii) By default ELF headers are stored in ELF64 format to support systems 204
91 with more than 4GB memory. Option --elf32-core-headers forces generation 205After booting to the system kernel, load the dump-capture kernel using
92 of ELF32 headers. The reason for this option being, as of now gdb can 206the following command:
93 not open vmcore file with ELF64 headers on a 32 bit systems. So ELF32 207
94 headers can be used if one has non-PAE systems and hence memory less 208 kexec -p <dump-capture-kernel> \
95 than 4GB. 209 --initrd=<initrd-for-dump-capture-kernel> --args-linux \
96 iv) Specify "irqpoll" as command line parameter. This reduces driver 210 --append="root=<root-dev> init 1 irqpoll"
97 initialization failures in second kernel due to shared interrupts. 211
98 v) <root-dev> needs to be specified in a format corresponding to the root 212
99 device name in the output of mount command. 213Notes on loading the dump-capture kernel:
100 vi) If you have built the drivers required to mount root file system as 214
101 modules in <second-kernel>, then, specify 215* <dump-capture-kernel> must be a vmlinux image (that is, an
102 --initrd=<initrd-for-second-kernel>. 216 uncompressed ELF image). bzImage does not work at this time.
103 vii) Specify maxcpus=1 as, if during first kernel run, if panic happens on 217
104 non-boot cpus, second kernel doesn't seem to be boot up all the cpus. 218* By default, the ELF headers are stored in ELF64 format to support
105 The other option is to always built the second kernel without SMP 219 systems with more than 4GB memory. The --elf32-core-headers option can
106 support ie CONFIG_SMP=n 220 be used to force the generation of ELF32 headers. This is necessary
107 221 because GDB currently cannot open vmcore files with ELF64 headers on
1084) After successfully loading the second kernel as above, if a panic occurs 222 32-bit systems. ELF32 headers can be used on non-PAE systems (that is,
109 system reboots into the second kernel. A module can be written to force 223 less than 4GB of memory).
110 the panic or "ALT-SysRq-c" can be used initiate a crash dump for testing 224
111 purposes. 225* The "irqpoll" boot parameter reduces driver initialization failures
112 226 due to shared interrupts in the dump-capture kernel.
1135) Once the second kernel has booted, write out the dump file using 227
228* You must specify <root-dev> in the format corresponding to the root
229 device name in the output of mount command.
230
231* "init 1" boots the dump-capture kernel into single-user mode without
232 networking. If you want networking, use "init 3."
233
234
235Kernel Panic
236============
237
238After successfully loading the dump-capture kernel as previously
239described, the system will reboot into the dump-capture kernel if a
240system crash is triggered. Trigger points are located in panic(),
241die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).
242
243The following conditions will execute a crash trigger point:
244
245If a hard lockup is detected and "NMI watchdog" is configured, the system
246will boot into the dump-capture kernel ( die_nmi() ).
247
248If die() is called, and it happens to be a thread with pid 0 or 1, or die()
249is called inside interrupt context or die() is called and panic_on_oops is set,
250the system will boot into the dump-capture kernel.
251
252On powerpc systems, when a soft-reset is generated, die() is called by all CPUs and the system will boot into the dump-capture kernel.
253
254For testing purposes, you can trigger a crash by using "ALT-SysRq-c",
255"echo c > /proc/sysrq-trigger or write a module to force the panic.
256
257Write Out the Dump File
258=======================
259
260After the dump-capture kernel is booted, write out the dump file with
261the following command:
114 262
115 cp /proc/vmcore <dump-file> 263 cp /proc/vmcore <dump-file>
116 264
117 Dump memory can also be accessed as a /dev/oldmem device for a linear/raw 265You can also access dumped memory as a /dev/oldmem device for a linear
118 view. To create the device, type: 266and raw view. To create the device, use the following command:
119 267
120 mknod /dev/oldmem c 1 12 268 mknod /dev/oldmem c 1 12
121 269
122 Use "dd" with suitable options for count, bs and skip to access specific 270Use the dd command with suitable options for count, bs, and skip to
123 portions of the dump. 271access specific portions of the dump.
124 272
125 Entire memory: dd if=/dev/oldmem of=oldmem.001 273To see the entire memory, use the following command:
126 274
275 dd if=/dev/oldmem of=oldmem.001
127 276
128ANALYSIS 277
278Analysis
129======== 279========
130Limited analysis can be done using gdb on the dump file copied out of
131/proc/vmcore. Use vmlinux built with -g and run
132 280
133 gdb vmlinux <dump-file> 281Before analyzing the dump image, you should reboot into a stable kernel.
282
283You can do limited analysis using GDB on the dump file copied out of
284/proc/vmcore. Use the debug vmlinux built with -g and run the following
285command:
286
287 gdb vmlinux <dump-file>
134 288
135Stack trace for the task on processor 0, register display, memory display 289Stack trace for the task on processor 0, register display, and memory
136work fine. 290display work fine.
137 291
138Note: gdb cannot analyse core files generated in ELF64 format for i386. 292Note: GDB cannot analyze core files generated in ELF64 format for x86.
293On systems with a maximum of 4GB of memory, you can generate
294ELF32-format headers by passing the --elf32-core-headers option to kexec
295when loading the dump-capture kernel.
139 296
140Latest "crash" (crash-4.0-2.18) as available on Dave Anderson's site 297You can also use the Crash utility to analyze dump files in Kdump
141http://people.redhat.com/~anderson/ works well with kdump format. 298format. Crash is available on Dave Anderson's site at the following URL:
142 299
300 http://people.redhat.com/~anderson/
301
302
303To Do
304=====
143 305
144TODO 3061) Provide a kernel pages filtering mechanism, so core file size is not
145==== 307 extreme on systems with huge memory banks.
1461) Provide a kernel pages filtering mechanism so that core file size is not
147 insane on systems having huge memory banks.
1482) Relocatable kernel can help in maintaining multiple kernels for crashdump
149 and same kernel as the first kernel can be used to capture the dump.
150 308
3092) Relocatable kernel can help in maintaining multiple kernels for
310 crash_dump, and the same kernel as the system kernel can be used to
311 capture the dump.
151 312
152CONTACT 313
314Contact
153======= 315=======
316
154Vivek Goyal (vgoyal@in.ibm.com) 317Vivek Goyal (vgoyal@in.ibm.com)
155Maneesh Soni (maneesh@in.ibm.com) 318Maneesh Soni (maneesh@in.ibm.com)
319
320
321Trademark
322=========
323
324Linux is a trademark of Linus Torvalds in the United States, other
325countries, or both.
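The kdump.txt text above explains that "crashkernel=Y@X" reserves Y bytes starting at physical address X, and that X should equal CONFIG_PHYSICAL_START of the dump-capture kernel. The following is a minimal sketch of that arithmetic. The function names are hypothetical, and it assumes only the decimal form with an optional K, M, or G suffix used in the examples above; the kernel's own parser may accept other forms.

```python
import re

# Binary multipliers for the size suffixes used in the examples above.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def _to_bytes(text: str) -> int:
    """Convert a value such as '64M' to bytes (assumed syntax: digits + K/M/G)."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def parse_crashkernel(param: str) -> tuple[int, int]:
    """Return (size_in_bytes, start_address) for a 'crashkernel=Y@X' string."""
    prefix = "crashkernel="
    if not param.startswith(prefix):
        raise ValueError(f"not a crashkernel parameter: {param!r}")
    size_text, sep, start_text = param[len(prefix):].partition("@")
    if not sep:
        raise ValueError("expected the form crashkernel=Y@X")
    return _to_bytes(size_text), _to_bytes(start_text)


def matches_physical_start(param: str, physical_start: int) -> bool:
    """Check that X in crashkernel=Y@X equals the given CONFIG_PHYSICAL_START."""
    _, start = parse_crashkernel(param)
    return start == physical_start
```

With the x86 values quoted above, parse_crashkernel("crashkernel=64M@16M") gives a 64 MB reservation starting at 0x1000000, which matches CONFIG_PHYSICAL_START=0x1000000.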
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bca6f389da66..137e993f4329 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -35,7 +35,6 @@ parameter is applicable:
35 APM Advanced Power Management support is enabled. 35 APM Advanced Power Management support is enabled.
36 AX25 Appropriate AX.25 support is enabled. 36 AX25 Appropriate AX.25 support is enabled.
37 CD Appropriate CD support is enabled. 37 CD Appropriate CD support is enabled.
38 DEVFS devfs support is enabled.
39 DRM Direct Rendering Management support is enabled. 38 DRM Direct Rendering Management support is enabled.
40 EDD BIOS Enhanced Disk Drive Services (EDD) is enabled 39 EDD BIOS Enhanced Disk Drive Services (EDD) is enabled
41 EFI EFI Partitioning (GPT) is enabled 40 EFI EFI Partitioning (GPT) is enabled
@@ -61,6 +60,7 @@ parameter is applicable:
61 MTD MTD support is enabled. 60 MTD MTD support is enabled.
62 NET Appropriate network support is enabled. 61 NET Appropriate network support is enabled.
63 NUMA NUMA support is enabled. 62 NUMA NUMA support is enabled.
63 GENERIC_TIME The generic timeofday code is enabled.
64 NFS Appropriate NFS support is enabled. 64 NFS Appropriate NFS support is enabled.
65 OSS OSS sound support is enabled. 65 OSS OSS sound support is enabled.
66 PARIDE The ParIDE subsystem is enabled. 66 PARIDE The ParIDE subsystem is enabled.
@@ -110,6 +110,13 @@ be entered as an environment variable, whereas its absence indicates that
110it will appear as a kernel argument readable via /proc/cmdline by programs 110it will appear as a kernel argument readable via /proc/cmdline by programs
111running once the system is up. 111running once the system is up.
112 112
113The number of kernel parameters is not limited, but the length of the
114complete command line (parameters including spaces etc.) is limited to
115a fixed number of characters. This limit depends on the architecture
116and is between 256 and 4096 characters. It is defined in the file
117./include/asm/setup.h as COMMAND_LINE_SIZE.
118
119
113 53c7xx= [HW,SCSI] Amiga SCSI controllers 120 53c7xx= [HW,SCSI] Amiga SCSI controllers
114 See header of drivers/scsi/53c7xx.c. 121 See header of drivers/scsi/53c7xx.c.
115 See also Documentation/scsi/ncr53c7xx.txt. 122 See also Documentation/scsi/ncr53c7xx.txt.
@@ -179,6 +186,11 @@ running once the system is up.
179 override platform specific driver. 186 override platform specific driver.
180 See also Documentation/acpi-hotkey.txt. 187 See also Documentation/acpi-hotkey.txt.
181 188
189 acpi_pm_good [IA-32,X86-64]
190 Override the pmtimer bug detection: force the kernel
191 to assume that this machine's pmtimer latches its value
192 and always returns good values.
193
182 enable_timer_pin_1 [i386,x86-64] 194 enable_timer_pin_1 [i386,x86-64]
183 Enable PIN 1 of APIC timer 195 Enable PIN 1 of APIC timer
184 Can be useful to work around chipset bugs 196 Can be useful to work around chipset bugs
@@ -341,10 +353,11 @@ running once the system is up.
341 Value can be changed at runtime via 353 Value can be changed at runtime via
342 /selinux/checkreqprot. 354 /selinux/checkreqprot.
343 355
344 clock= [BUGS=IA-32,HW] gettimeofday timesource override. 356 clock= [BUGS=IA-32, HW] gettimeofday clocksource override.
345 Forces specified timesource (if avaliable) to be used 357 [Deprecated]
346 when calculating gettimeofday(). If specicified 358 Forces specified clocksource (if available) to be used
347 timesource is not avalible, it defaults to PIT. 359 when calculating gettimeofday(). If specified
360 clocksource is not available, it defaults to PIT.
348 Format: { pit | tsc | cyclone | pmtmr } 361 Format: { pit | tsc | cyclone | pmtmr }
349 362
350 disable_8254_timer 363 disable_8254_timer
@@ -429,13 +442,19 @@ running once the system is up.
429 442
430 debug [KNL] Enable kernel debugging (events log level). 443 debug [KNL] Enable kernel debugging (events log level).
431 444
445 debug_locks_verbose=
446 [KNL] verbose self-tests
447 Format=<0|1>
448 Print debugging info while doing the locking API
449 self-tests.
450 We default to 0 (no extra messages), setting it to
451 1 will print _a lot_ more information - normally
452 only useful to kernel developers.
453
432 decnet= [HW,NET] 454 decnet= [HW,NET]
433 Format: <area>[,<node>] 455 Format: <area>[,<node>]
434 See also Documentation/networking/decnet.txt. 456 See also Documentation/networking/decnet.txt.
435 457
436 devfs= [DEVFS]
437 See Documentation/filesystems/devfs/boot-options.
438
439 dhash_entries= [KNL] 458 dhash_entries= [KNL]
440 Set number of hash buckets for dentry cache. 459 Set number of hash buckets for dentry cache.
441 460
@@ -561,8 +580,6 @@ running once the system is up.
561 gscd= [HW,CD] 580 gscd= [HW,CD]
562 Format: <io> 581 Format: <io>
563 582
564 gt96100eth= [NET] MIPS GT96100 Advanced Communication Controller
565
566 gus= [HW,OSS] 583 gus= [HW,OSS]
567 Format: <io>,<irq>,<dma>,<dma16> 584 Format: <io>,<irq>,<dma>,<dma16>
568 585
@@ -685,6 +702,12 @@ running once the system is up.
685 ips= [HW,SCSI] Adaptec / IBM ServeRAID controller 702 ips= [HW,SCSI] Adaptec / IBM ServeRAID controller
686 See header of drivers/scsi/ips.c. 703 See header of drivers/scsi/ips.c.
687 704
705 ports= [IP_VS_FTP] IPVS ftp helper module
706 Default is 21.
707 Up to 8 (IP_VS_APP_MAX_PORTS) ports
708 may be specified.
709 Format: <port>,<port>....
710
688 irqfixup [HW] 711 irqfixup [HW]
689 When an interrupt is not handled search all handlers 712 When an interrupt is not handled search all handlers
690 for it. Intended to get systems with badly broken 713 for it. Intended to get systems with badly broken
@@ -1017,6 +1040,8 @@ running once the system is up.
1017 1040
1018 nocache [ARM] 1041 nocache [ARM]
1019 1042
1043 nodelayacct [KNL] Disable per-task delay accounting
1044
1020 nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects. 1045 nodisconnect [HW,SCSI,M68K] Disables SCSI disconnects.
1021 1046
1022 noexec [IA-64] 1047 noexec [IA-64]
@@ -1220,7 +1245,11 @@ running once the system is up.
1220 bootloader. This is currently used on 1245 bootloader. This is currently used on
1221 IXP2000 systems where the bus has to be 1246 IXP2000 systems where the bus has to be
1222 configured a certain way for adjunct CPUs. 1247 configured a certain way for adjunct CPUs.
1223 1248 noearly [X86] Don't do any early type 1 scanning.
1249 This might help on some broken boards which
1250 machine check when some devices' config space
1251 is read. But various workarounds are disabled
1252 and some IOMMU drivers will not work.
1224 pcmv= [HW,PCMCIA] BadgePAD 4 1253 pcmv= [HW,PCMCIA] BadgePAD 4
1225 1254
1226 pd. [PARIDE] 1255 pd. [PARIDE]
@@ -1302,7 +1331,7 @@ running once the system is up.
1302 pt. [PARIDE] 1331 pt. [PARIDE]
1303 See Documentation/paride.txt. 1332 See Documentation/paride.txt.
1304 1333
1305 quiet= [KNL] Disable log messages 1334 quiet [KNL] Disable most log messages
1306 1335
1307 r128= [HW,DRM] 1336 r128= [HW,DRM]
1308 1337
@@ -1343,6 +1372,14 @@ running once the system is up.
1343 1372
1344 reserve= [KNL,BUGS] Force the kernel to ignore some iomem area 1373 reserve= [KNL,BUGS] Force the kernel to ignore some iomem area
1345 1374
1375 reservetop= [IA-32]
1376 Format: nn[KMG]
1377 Reserves a hole at the top of the kernel virtual
1378 address space.
1379
1380 reset_devices [KNL] Force drivers to reset the underlying device
1381 during initialization.
1382
1346 resume= [SWSUSP] 1383 resume= [SWSUSP]
1347 Specify the partition device for software suspend 1384 Specify the partition device for software suspend
1348 1385
@@ -1617,6 +1654,10 @@ running once the system is up.
1617 1654
1618 time Show timing data prefixed to each printk message line 1655 time Show timing data prefixed to each printk message line
1619 1656
1657 clocksource= [GENERIC_TIME] Override the default clocksource
1658 Override the default clocksource and use the clocksource
1659 with the name specified.
1660
1620 tipar.timeout= [HW,PPT] 1661 tipar.timeout= [HW,PPT]
1621 Set communications timeout in tenths of a second 1662 Set communications timeout in tenths of a second
1622 (default 15). 1663 (default 15).
@@ -1658,6 +1699,10 @@ running once the system is up.
1658 usbhid.mousepoll= 1699 usbhid.mousepoll=
1659 [USBHID] The interval which mice are to be polled at. 1700 [USBHID] The interval which mice are to be polled at.
1660 1701
1702 vdso= [IA-32]
1703 vdso=1: enable VDSO (default)
1704 vdso=0: disable VDSO mapping
1705
1661 video= [FB] Frame buffer configuration 1706 video= [FB] Frame buffer configuration
1662 See Documentation/fb/modedb.txt. 1707 See Documentation/fb/modedb.txt.
1663 1708
@@ -1674,9 +1719,14 @@ running once the system is up.
1674 decrease the size and leave more room for directly 1719 decrease the size and leave more room for directly
1675 mapped kernel RAM. 1720 mapped kernel RAM.
1676 1721
1677 vmhalt= [KNL,S390] 1722 vmhalt= [KNL,S390] Perform z/VM CP command after system halt.
1723 Format: <command>
1724
1725 vmpanic= [KNL,S390] Perform z/VM CP command after kernel panic.
1726 Format: <command>
1678 1727
1679 vmpoff= [KNL,S390] 1728 vmpoff= [KNL,S390] Perform z/VM CP command after power off.
1729 Format: <command>
1680 1730
1681 waveartist= [HW,OSS] 1731 waveartist= [HW,OSS]
1682 Format: <io>,<irq>,<dma>,<dma2> 1732 Format: <io>,<irq>,<dma>,<dma2>
diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt
index 22488d791168..c1f64fdf84cb 100644
--- a/Documentation/keys-request-key.txt
+++ b/Documentation/keys-request-key.txt
@@ -3,16 +3,23 @@
3 =================== 3 ===================
4 4
5The key request service is part of the key retention service (refer to 5The key request service is part of the key retention service (refer to
6Documentation/keys.txt). This document explains more fully how that the 6Documentation/keys.txt). This document explains more fully how the requesting
7requesting algorithm works. 7algorithm works.
8 8
9The process starts by either the kernel requesting a service by calling 9The process starts by either the kernel requesting a service by calling
10request_key(): 10request_key*():
11 11
12 struct key *request_key(const struct key_type *type, 12 struct key *request_key(const struct key_type *type,
13 const char *description, 13 const char *description,
14 const char *callout_string); 14 const char *callout_string);
15 15
16or:
17
18 struct key *request_key_with_auxdata(const struct key_type *type,
19 const char *description,
20 const char *callout_string,
21 void *aux);
22
16Or by userspace invoking the request_key system call: 23Or by userspace invoking the request_key system call:
17 24
18 key_serial_t request_key(const char *type, 25 key_serial_t request_key(const char *type,
@@ -20,16 +27,26 @@ Or by userspace invoking the request_key system call:
20 const char *callout_info, 27 const char *callout_info,
21 key_serial_t dest_keyring); 28 key_serial_t dest_keyring);
22 29
23The main difference between the two access points is that the in-kernel 30The main difference between the access points is that the in-kernel interface
24interface does not need to link the key to a keyring to prevent it from being 31does not need to link the key to a keyring to prevent it from being immediately
25immediately destroyed. The kernel interface returns a pointer directly to the 32destroyed. The kernel interface returns a pointer directly to the key, and
26key, and it's up to the caller to destroy the key. 33it's up to the caller to destroy the key.
34
35The request_key_with_auxdata() call is like the in-kernel request_key() call,
36except that it permits auxiliary data to be passed to the upcaller (the default
37is NULL). This is only useful for those key types that define their own upcall
38mechanism rather than using /sbin/request-key.
27 39
28The userspace interface links the key to a keyring associated with the process 40The userspace interface links the key to a keyring associated with the process
29to prevent the key from going away, and returns the serial number of the key to 41to prevent the key from going away, and returns the serial number of the key to
30the caller. 42the caller.
31 43
32 44
45The following example assumes that the key types involved don't define their
46own upcall mechanisms. If they do, then those should be substituted for the
47forking and execution of /sbin/request-key.
48
49
33=========== 50===========
34THE PROCESS 51THE PROCESS
35=========== 52===========
@@ -40,8 +57,8 @@ A request proceeds in the following manner:
40 interface]. 57 interface].
41 58
42 (2) request_key() searches the process's subscribed keyrings to see if there's 59 (2) request_key() searches the process's subscribed keyrings to see if there's
43 a suitable key there. If there is, it returns the key. If there isn't, and 60 a suitable key there. If there is, it returns the key. If there isn't,
44 callout_info is not set, an error is returned. Otherwise the process 61 and callout_info is not set, an error is returned. Otherwise the process
45 proceeds to the next step. 62 proceeds to the next step.
46 63
47 (3) request_key() sees that A doesn't have the desired key yet, so it creates 64 (3) request_key() sees that A doesn't have the desired key yet, so it creates
@@ -62,7 +79,7 @@ A request proceeds in the following manner:
62 instantiation. 79 instantiation.
63 80
64 (7) The program may want to access another key from A's context (say a 81 (7) The program may want to access another key from A's context (say a
65 Kerberos TGT key). It just requests the appropriate key, and the keyring 82 Kerberos TGT key). It just requests the appropriate key, and the keyring
66 search notes that the session keyring has auth key V in its bottom level. 83 search notes that the session keyring has auth key V in its bottom level.
67 84
68 This will permit it to then search the keyrings of process A with the 85 This will permit it to then search the keyrings of process A with the
@@ -79,10 +96,11 @@ A request proceeds in the following manner:
79(10) The program then exits 0 and request_key() deletes key V and returns key 96(10) The program then exits 0 and request_key() deletes key V and returns key
80 U to the caller. 97 U to the caller.
81 98
82This also extends further. If key W (step 7 above) didn't exist, key W would be 99This also extends further. If key W (step 7 above) didn't exist, key W would
83created uninstantiated, another auth key (X) would be created (as per step 3) 100be created uninstantiated, another auth key (X) would be created (as per step
84and another copy of /sbin/request-key spawned (as per step 4); but the context 1013) and another copy of /sbin/request-key spawned (as per step 4); but the
85specified by auth key X will still be process A, as it was in auth key V. 102context specified by auth key X will still be process A, as it was in auth key
103V.
86 104
87This is because process A's keyrings can't simply be attached to 105This is because process A's keyrings can't simply be attached to
88/sbin/request-key at the appropriate places because (a) execve will discard two 106/sbin/request-key at the appropriate places because (a) execve will discard two
@@ -118,17 +136,17 @@ A search of any particular keyring proceeds in the following fashion:
118 136
119 (2) It considers all the non-keyring keys within that keyring and, if any key 137 (2) It considers all the non-keyring keys within that keyring and, if any key
120 matches the criteria specified, calls key_permission(SEARCH) on it to see 138 matches the criteria specified, calls key_permission(SEARCH) on it to see
121 if the key is allowed to be found. If it is, that key is returned; if 139 if the key is allowed to be found. If it is, that key is returned; if
122 not, the search continues, and the error code is retained if of higher 140 not, the search continues, and the error code is retained if of higher
123 priority than the one currently set. 141 priority than the one currently set.
124 142
125 (3) It then considers all the keyring-type keys in the keyring it's currently 143 (3) It then considers all the keyring-type keys in the keyring it's currently
126 searching. It calls key_permission(SEARCH) on each keyring, and if this 144 searching. It calls key_permission(SEARCH) on each keyring, and if this
127 grants permission, it recurses, executing steps (2) and (3) on that 145 grants permission, it recurses, executing steps (2) and (3) on that
128 keyring. 146 keyring.
129 147
130The process stops immediately a valid key is found with permission granted to 148The process stops immediately a valid key is found with permission granted to
131use it. Any error from a previous match attempt is discarded and the key is 149use it. Any error from a previous match attempt is discarded and the key is
132returned. 150returned.
133 151
134When search_process_keyrings() is invoked, it performs the following searches 152When search_process_keyrings() is invoked, it performs the following searches
@@ -153,7 +171,7 @@ The moment one succeeds, all pending errors are discarded and the found key is
153returned. 171returned.
154 172
155Only if all these fail does the whole thing fail with the highest priority 173Only if all these fail does the whole thing fail with the highest priority
156error. Note that several errors may have come from LSM. 174error. Note that several errors may have come from LSM.
157 175
158The error priority is: 176The error priority is:
159 177
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 3bbe157b45e4..e373f0212843 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -241,25 +241,30 @@ The security class "key" has been added to SELinux so that mandatory access
241controls can be applied to keys created within various contexts. This support 241controls can be applied to keys created within various contexts. This support
242is preliminary, and is likely to change quite significantly in the near future. 242is preliminary, and is likely to change quite significantly in the near future.
243Currently, all of the basic permissions explained above are provided in SELinux 243Currently, all of the basic permissions explained above are provided in SELinux
244as well; SE Linux is simply invoked after all basic permission checks have been 244as well; SELinux is simply invoked after all basic permission checks have been
245performed. 245performed.
246 246
247Each key is labeled with the same context as the task to which it belongs. 247The value of the file /proc/self/attr/keycreate influences the labeling of
248Typically, this is the same task that was running when the key was created. 248newly-created keys. If the contents of that file correspond to an SELinux
249The default keyrings are handled differently, but in a way that is very 249security context, then the key will be assigned that context. Otherwise, the
250intuitive: 250key will be assigned the current context of the task that invoked the key
251creation request. Tasks must be granted explicit permission to assign a
252particular context to newly-created keys, using the "create" permission in the
253key security class.
251 254
252 (*) The user and user session keyrings that are created when the user logs in 255The default keyrings associated with users will be labeled with the default
253 are currently labeled with the context of the login manager. 256context of the user if and only if the login programs have been instrumented to
254 257properly initialize keycreate during the login process. Otherwise, they will
255 (*) The keyrings associated with new threads are each labeled with the context 258be labeled with the context of the login program itself.
256 of their associated thread, and both session and process keyrings are
257 handled similarly.
258 259
259Note, however, that the default keyrings associated with the root user are 260Note, however, that the default keyrings associated with the root user are
260labeled with the default kernel context, since they are created early in the 261labeled with the default kernel context, since they are created early in the
261boot process, before root has a chance to log in. 262boot process, before root has a chance to log in.
262 263
264The keyrings associated with new threads are each labeled with the context of
265their associated thread, and both session and process keyrings are handled
266similarly.
267
263 268
264================ 269================
265NEW PROCFS FILES 270NEW PROCFS FILES
@@ -270,9 +275,17 @@ about the status of the key service:
270 275
271 (*) /proc/keys 276 (*) /proc/keys
272 277
273 This lists all the keys on the system, giving information about their 278 This lists the keys that are currently viewable by the task reading the
274 type, description and permissions. The payload of the key is not available 279 file, giving information about their type, description and permissions.
275 this way: 280 It is not possible to view the payload of the key this way, though some
281 information about it may be given.
282
283 The only keys included in the list are those that grant View permission to
284 the reading process whether or not it possesses them. Note that LSM
285 security checks are still performed, and may further filter out keys that
286 the current process is not authorised to view.
287
288 The contents of the file look like this:
276 289
277 SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY 290 SERIAL FLAGS USAGE EXPY PERM UID GID TYPE DESCRIPTION: SUMMARY
278 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4 291 00000001 I----- 39 perm 1f3f0000 0 0 keyring _uid_ses.0: 1/4
@@ -300,7 +313,7 @@ about the status of the key service:
300 (*) /proc/key-users 313 (*) /proc/key-users
301 314
302 This file lists the tracking data for each user that has at least one key 315 This file lists the tracking data for each user that has at least one key
303 on the system. Such data includes quota information and statistics: 316 on the system. Such data includes quota information and statistics:
304 317
305 [root@andromeda root]# cat /proc/key-users 318 [root@andromeda root]# cat /proc/key-users
306 0: 46 45/45 1/100 13/10000 319 0: 46 45/45 1/100 13/10000
@@ -767,6 +780,17 @@ payload contents" for more information.
767 See also Documentation/keys-request-key.txt. 780 See also Documentation/keys-request-key.txt.
768 781
769 782
783(*) To search for a key, passing auxiliary data to the upcaller, call:
784
785 struct key *request_key_with_auxdata(const struct key_type *type,
786 const char *description,
787 const char *callout_string,
788 void *aux);
789
790 This is identical to request_key(), except that the auxiliary data is
791 passed to the key_type->request_key() op if it exists.
792
793
770(*) When it is no longer required, the key should be released using: 794(*) When it is no longer required, the key should be released using:
771 795
772 void key_put(struct key *key); 796 void key_put(struct key *key);
@@ -1018,6 +1042,24 @@ The structure has a number of fields, some of which are mandatory:
1018 as might happen when the userspace buffer is accessed. 1042 as might happen when the userspace buffer is accessed.
1019 1043
1020 1044
1045 (*) int (*request_key)(struct key *key, struct key *authkey, const char *op,
1046 void *aux);
1047
1048 This method is optional. If provided, request_key() and
1049 request_key_with_auxdata() will invoke this function rather than
1050 upcalling to /sbin/request-key to operate upon a key of this type.
1051
1052 The aux parameter is as passed to request_key_with_auxdata() or is NULL
1053 otherwise. Also passed are the key to be operated upon, the
1054 authorisation key for this operation and the operation type (currently
1055 only "create").
1056
1057 This function should return only when the upcall is complete. Upon return
1058 the authorisation key will be revoked, and the target key will be
1059 negatively instantiated if it is still uninstantiated. The error will be
1060 returned to the caller of request_key*().
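
The dispatch rule described above (use the type's own request_key op if it exists, otherwise fall back to the /sbin/request-key upcall) can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: construct_key(), default_upcall() and counting_request_key() are hypothetical names invented for illustration, struct key is left opaque, and none of this is the kernel's actual implementation.

```c
#include <assert.h>
#include <stddef.h>

struct key;	/* opaque in this sketch */

/* Cut-down stand-in for the kernel's struct key_type. */
struct key_type {
	const char *name;
	/* Optional op; the signature mirrors the one quoted above. */
	int (*request_key)(struct key *key, struct key *authkey,
			   const char *op, void *aux);
};

/* Hypothetical stand-in for forking and executing /sbin/request-key. */
static int default_upcall_calls;

static int default_upcall(struct key *key, struct key *authkey,
			  const char *op)
{
	(void)key; (void)authkey; (void)op;
	default_upcall_calls++;
	return 0;
}

/* Hypothetical type-specific op: counts its invocations through aux. */
static int counting_request_key(struct key *key, struct key *authkey,
				const char *op, void *aux)
{
	(void)key; (void)authkey; (void)op;
	if (aux)
		++*(int *)aux;
	return 0;
}

/*
 * Hypothetical dispatcher: prefer the type's own op and hand it the
 * auxiliary data; otherwise use the default upcall, which never sees aux.
 */
static int construct_key(const struct key_type *type, struct key *key,
			 struct key *authkey, void *aux)
{
	assert(type != NULL);
	if (type->request_key)
		return type->request_key(key, authkey, "create", aux);
	return default_upcall(key, authkey, "create");
}
```

A type that leaves the op NULL goes through default_upcall(); a type that sets it (for example to counting_request_key) receives the aux pointer unchanged.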
1061
1062
1021============================ 1063============================
1022REQUEST-KEY CALLBACK SERVICE 1064REQUEST-KEY CALLBACK SERVICE
1023============================ 1065============================
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt
index 8d9bffbd192c..949f7b5a2053 100644
--- a/Documentation/kobject.txt
+++ b/Documentation/kobject.txt
@@ -247,7 +247,7 @@ the object-specific fields, which include:
247- default_attrs: Default attributes to be exported via sysfs when the 247- default_attrs: Default attributes to be exported via sysfs when the
248 object is registered.Note that the last attribute has to be 248 object is registered.Note that the last attribute has to be
249 initialized to NULL ! You can find a complete implementation 249 initialized to NULL ! You can find a complete implementation
250 in drivers/block/genhd.c 250 in block/genhd.c
251 251
252 252
253Instances of struct kobj_type are not registered; only referenced by 253Instances of struct kobj_type are not registered; only referenced by
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
new file mode 100644
index 000000000000..00d93605bfd3
--- /dev/null
+++ b/Documentation/lockdep-design.txt
@@ -0,0 +1,197 @@
1Runtime locking correctness validator
2=====================================
3
4started by Ingo Molnar <mingo@redhat.com>
5additions by Arjan van de Ven <arjan@linux.intel.com>
6
7Lock-class
8----------
9
10The basic object the validator operates upon is a 'class' of locks.
11
12A class of locks is a group of locks that are logically the same with
13respect to locking rules, even if the locks may have multiple (possibly
14tens of thousands of) instantiations. For example a lock in the inode
15struct is one class, while each inode has its own instantiation of that
16lock class.
17
18The validator tracks the 'state' of lock-classes, and it tracks
19dependencies between different lock-classes. The validator maintains a
20rolling proof that the state and the dependencies are correct.
21
22Unlike a lock instantiation, the lock-class itself never goes away: when
23a lock-class is used for the first time after bootup it gets registered,
24and all subsequent uses of that lock-class will be attached to this
25lock-class.
26
27State
28-----
29
30The validator tracks lock-class usage history into 5 separate state bits:
31
32- 'ever held in hardirq context' [ == hardirq-safe ]
33- 'ever held in softirq context' [ == softirq-safe ]
34- 'ever held with hardirqs enabled' [ == hardirq-unsafe ]
35- 'ever held with softirqs and hardirqs enabled' [ == softirq-unsafe ]
36
37- 'ever used' [ == !unused ]
38
39Single-lock state rules:
40------------------------
41
42A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The
43following states are exclusive, and only one of them is allowed to be
44set for any lock-class:
45
46 <hardirq-safe> and <hardirq-unsafe>
47 <softirq-safe> and <softirq-unsafe>
48
49The validator detects and reports lock usage that violates these
50single-lock state rules.
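
The single-lock state rules above can be modelled as a small bitmask check. The following is a minimal userspace sketch, not the validator's code: the flag names and mark_usage() are hypothetical and chosen only to mirror the five state bits listed in the State section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical usage bits, one per state listed above. */
enum lock_usage {
	USED_IN_HARDIRQ  = 1u << 0,	/* hardirq-safe */
	USED_IN_SOFTIRQ  = 1u << 1,	/* softirq-safe */
	HELD_HARDIRQS_ON = 1u << 2,	/* hardirq-unsafe */
	HELD_SOFTIRQS_ON = 1u << 3,	/* softirq-unsafe */
	EVER_USED        = 1u << 4,	/* !unused */
};

/*
 * Record a new usage bit for a lock-class.  Returns false, leaving the
 * state untouched, when the new bit would make the class both -safe and
 * -unsafe for the same irq type.
 */
static bool mark_usage(unsigned int *state, unsigned int new_bit)
{
	unsigned int next;

	assert(state != NULL);
	next = *state | new_bit | EVER_USED;

	/* A softirq-unsafe class is automatically hardirq-unsafe too. */
	if (next & HELD_SOFTIRQS_ON)
		next |= HELD_HARDIRQS_ON;

	if ((next & USED_IN_HARDIRQ) && (next & HELD_HARDIRQS_ON))
		return false;
	if ((next & USED_IN_SOFTIRQ) && (next & HELD_SOFTIRQS_ON))
		return false;

	*state = next;
	return true;
}
```

A false return corresponds to the point at which the validator would report a violation.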
51
52Multi-lock dependency rules:
53----------------------------
54
55The same lock-class must not be acquired twice, because this could lead
56to lock recursion deadlocks.
57
58Furthermore, two locks may not be taken in different order:
59
60 <L1> -> <L2>
61 <L2> -> <L1>
62
63because this could lead to lock inversion deadlocks. (The validator
64finds such dependencies in arbitrary complexity, i.e. there can be any
65other locking sequence between the acquire-lock operations, the
66validator will still track all dependencies between locks.)
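
The ordering rule can be pictured as cycle detection in a graph of lock-classes. The sketch below is a userspace illustration under assumptions, not the validator's algorithm or data structures: the adjacency matrix, MAX_CLASSES, reachable() and add_dependency() are all hypothetical. An edge "held -> acquired" is rejected if the acquired class can already reach the held class, which covers both the direct <L1>/<L2> inversion and longer chains, and also the case of taking the same class twice.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 16	/* arbitrary size for the sketch */

/* dep[a][b] is true once class b was acquired while class a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];

/* Depth-first search: can 'to' be reached from 'from' via recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++) {
		if (dep[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	}
	return false;
}

/*
 * Record the dependency "held -> acquired".  Returns false and records
 * nothing when the edge would close a cycle, i.e. when some earlier
 * sequence already ordered the two classes the other way round, or when
 * both arguments name the same class.
 */
static bool add_dependency(int held, int acquired)
{
	bool seen[MAX_CLASSES] = { false };

	assert(held >= 0 && held < MAX_CLASSES);
	assert(acquired >= 0 && acquired < MAX_CLASSES);

	if (reachable(acquired, held, seen))
		return false;
	dep[held][acquired] = true;
	return true;
}
```

Because edges are remembered per class rather than per task, the two conflicting orders never have to run at the same time for the conflict to be noticed.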
67
68Furthermore, the following usage based lock dependencies are not allowed
69between any two lock-classes:
70
71 <hardirq-safe> -> <hardirq-unsafe>
72 <softirq-safe> -> <softirq-unsafe>
73
74The first rule comes from the fact that a hardirq-safe lock could be
75taken by a hardirq context, interrupting a hardirq-unsafe lock - and
76thus could result in a lock inversion deadlock. Likewise, a softirq-safe
77lock could be taken by a softirq context, interrupting a softirq-unsafe
78lock.
79
80The above rules are enforced for any locking sequence that occurs in the
81kernel: when acquiring a new lock, the validator checks whether there is
82any rule violation between the new lock and any of the held locks.
83
84When a lock-class changes its state, the following aspects of the above
85dependency rules are enforced:
86
87- if a new hardirq-safe lock is discovered, we check whether it
88 took any hardirq-unsafe lock in the past.
89
90- if a new softirq-safe lock is discovered, we check whether it took
91 any softirq-unsafe lock in the past.
92
93- if a new hardirq-unsafe lock is discovered, we check whether any
94 hardirq-safe lock took it in the past.
95
96- if a new softirq-unsafe lock is discovered, we check whether any
97 softirq-safe lock took it in the past.
98
99(Again, we do these checks too on the basis that an interrupt context
100could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which
101could lead to a lock inversion deadlock - even if that lock scenario did
102not trigger in practice yet.)
103
104Exception: Nested data dependencies leading to nested locking
105-------------------------------------------------------------
106
107There are a few cases where the Linux kernel acquires more than one
108instance of the same lock-class. Such cases typically happen when there
109is some sort of hierarchy within objects of the same type. In these
110cases there is an inherent "natural" ordering between the two objects
111(defined by the properties of the hierarchy), and the kernel grabs the
112locks in this fixed order on each of the objects.
113
114An example of such an object hierarchy that results in "nested locking"
115is that of a "whole disk" block-dev object and a "partition" block-dev
116object; the partition is "part of" the whole device and as long as one
117always takes the whole disk lock as a higher lock than the partition
118lock, the lock ordering is fully correct. The validator does not
119automatically detect this natural ordering, as the locking rule behind
120the ordering is not static.
121
122In order to teach the validator about this correct usage model, new
123versions of the various locking primitives were added that allow you to
124specify a "nesting level". An example call, for the block device mutex,
125looks like this:
126
127enum bdev_bd_mutex_lock_class
128{
129 BD_MUTEX_NORMAL,
130 BD_MUTEX_WHOLE,
131 BD_MUTEX_PARTITION
132};
133
134 mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
135
136In this case the locking is done on a bdev object that is known to be a
137partition.
138
139The validator treats a lock that is taken in such a nested fashion as a
140separate (sub)class for the purposes of validation.
141
142Note: When changing code to use the _nested() primitives, be careful and
143check really thoroughly that the hierarchy is correctly mapped; otherwise
144you can get false positives or false negatives.
145
146Proof of 100% correctness:
147--------------------------
148
149The validator achieves perfect, mathematical 'closure' (proof of locking
150correctness) in the sense that for every simple, standalone single-task
151locking sequence that occurred at least once during the lifetime of the
152kernel, the validator proves it with a 100% certainty that no
153combination and timing of these locking sequences can cause any class of
154lock related deadlock. [*]
155
156I.e. complex multi-CPU and multi-task locking scenarios do not have to
157occur in practice to prove a deadlock: only the simple 'component'
158locking chains have to occur at least once (anytime, in any
159task/context) for the validator to be able to prove correctness. (For
160example, complex deadlocks that would normally need more than 3 CPUs and
161a very unlikely constellation of tasks, irq-contexts and timings to
162occur, can be detected on a plain, lightly loaded single-CPU system as
163well!)
164
165This radically decreases the complexity of locking related QA of the
166kernel: what has to be done during QA is to trigger as many "simple"
167single-task locking dependencies in the kernel as possible, at least
168once, to prove locking correctness - instead of having to trigger every
169possible combination of locking interaction between CPUs, combined with
170every possible hardirq and softirq nesting scenario (which is impossible
171to do in practice).
172
173[*] assuming that the validator itself is 100% correct, and no other
174 part of the system corrupts the state of the validator in any way.
175 We also assume that all NMI/SMM paths [which could interrupt
176 even hardirq-disabled codepaths] are correct and do not interfere
177 with the validator. We also assume that the 64-bit 'chain hash'
178 value is unique for every lock-chain in the system. Also, lock
179 recursion must not be higher than 20.
180
181Performance:
182------------
183
184The above rules require _massive_ amounts of runtime checking. If we did
185that for every lock taken and for every irqs-enable event, it would
186render the system practically unusably slow. The complexity of checking
187is O(N^2), so even with just a few hundred lock-classes we'd have to do
188tens of thousands of checks for every event.
189
190This problem is solved by checking any given 'locking scenario' (unique
191sequence of locks taken after each other) only once. A simple stack of
192held locks is maintained, and a lightweight 64-bit hash value is
193calculated, which hash is unique for every lock chain. The hash value,
194when the chain is validated for the first time, is then put into a hash
195table, which hash-table can be checked in a lockfree manner. If the
196locking chain occurs again later on, the hash table tells us that we
197don't have to validate the chain again.
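
The caching scheme in the Performance section can be sketched in userspace as follows. Everything here is an assumption made for illustration: the FNV-1a-style mixing stands in for whatever hash the validator really uses, the open-addressed table, its size and the function names are hypothetical, and full_validations merely counts how often the expensive rule checks would have to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH  20	/* matches the recursion limit quoted above */
#define TABLE_SIZE 1024	/* power of two, arbitrary for the sketch */

static int held[MAX_DEPTH];		/* stack of held lock-class ids */
static size_t depth;
static uint64_t table[TABLE_SIZE];	/* validated chain hashes, 0 = empty */
static unsigned long full_validations;

/* 64-bit hash over the current stack of held classes (FNV-1a style). */
static uint64_t chain_hash(void)
{
	uint64_t h = 14695981039346656037ULL;

	for (size_t i = 0; i < depth; i++) {
		h ^= (uint64_t)held[i];
		h *= 1099511628211ULL;
	}
	return h ? h : 1;	/* 0 is reserved for empty slots */
}

/* Returns true if this chain was validated before; records it otherwise. */
static bool chain_seen(uint64_t h)
{
	size_t slot = (size_t)(h & (TABLE_SIZE - 1));

	for (size_t n = 0; n < TABLE_SIZE; n++) {
		if (table[slot] == h)
			return true;
		if (table[slot] == 0) {
			table[slot] = h;
			return false;
		}
		slot = (slot + 1) & (TABLE_SIZE - 1);	/* linear probing */
	}
	return false;	/* table full: validate every time */
}

static void acquire(int class_id)
{
	assert(depth < MAX_DEPTH);
	held[depth++] = class_id;
	if (!chain_seen(chain_hash()))
		full_validations++;	/* expensive checks would run here */
}

static void release(void)
{
	assert(depth > 0);
	depth--;
}
```

Repeating a locking sequence that has been seen before leaves full_validations unchanged; only a new chain, such as the same classes taken in a different order, triggers another full check.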
diff --git a/Documentation/md.txt b/Documentation/md.txt
index 03a13c462cf2..0668f9dc9d29 100644
--- a/Documentation/md.txt
+++ b/Documentation/md.txt
@@ -200,6 +200,17 @@ All md devices contain:
200 This can be written only while the array is being assembled, not 200 This can be written only while the array is being assembled, not
201 after it is started. 201 after it is started.
202 202
203 layout
204 The "layout" for the array for the particular level. This is
205 simply a number that is interpreted differently by different
206 levels. It can be written while assembling an array.
207
208 resync_start
209 The point at which resync should start. If no resync is needed,
210 this will be a very large number. At array creation it will
211 default to 0, though starting the array as 'clean' will
212 set it much larger.
213
203 new_dev 214 new_dev
204 This file can be written but not read. The value written should 215 This file can be written but not read. The value written should
205 be a block device number as major:minor. e.g. 8:0 216 be a block device number as major:minor. e.g. 8:0
@@ -207,6 +218,54 @@ All md devices contain:
207 available. It will then appear at md/dev-XXX (depending on the 218 available. It will then appear at md/dev-XXX (depending on the
208 name of the device) and further configuration is then possible. 219 name of the device) and further configuration is then possible.
209 220
221 safe_mode_delay
222 When an md array has seen no write requests for a certain period
223 of time, it will be marked as 'clean'. When another write
224 request arrives, the array is marked as 'dirty' before the write
225 commences. This is known as 'safe_mode'.
226 The 'certain period' is controlled by this file which stores the
227 period as a number of seconds. The default is 200msec (0.200).
228 Writing a value of 0 disables safemode.
229
230 array_state
231 This file contains a single word which describes the current
232 state of the array. In many cases, the state can be set by
233 writing the word for the desired state, however some states
234 cannot be explicitly set, and some transitions are not allowed.
235
236 clear
237 No devices, no size, no level
238 Writing is equivalent to STOP_ARRAY ioctl
239 inactive
240 May have some settings, but array is not active
241 all IO results in error
242 When written, doesn't tear down array, but just stops it
243 suspended (not supported yet)
244 All IO requests will block. The array can be reconfigured.
245 Writing this, if accepted, will block until array is quiescent
246 readonly
247 no resync can happen. no superblocks get written.
248 write requests fail
249 read-auto
250 like readonly, but behaves like 'clean' on a write request.
251
252 clean - no pending writes, but otherwise active.
253 When written to inactive array, starts without resync
254 If a write request arrives then
255 if metadata is known, mark 'dirty' and switch to 'active'.
256 if not known, block and switch to write-pending
257 If written to an active array that has pending writes, then fails.
258 active
259 fully active: IO and resync can be happening.
260 When written to inactive array, starts with resync
261
262 write-pending
263 clean, but writes are blocked waiting for 'active' to be written.
264
265 active-idle
266 like active, but no writes have been seen for a while (safe_mode_delay).
267
268
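
As a rough illustration of the array_state file described above, the
following sketch reads and writes the state word from user space. The
helper names and the sysfs_root parameter are assumptions made for the
example; the set of words is taken from the list above, and the kernel may
still reject a write that is not a permitted transition.

```python
import os

# State words listed in the array_state description above.
ARRAY_STATES = {
    "clear", "inactive", "suspended", "readonly", "read-auto",
    "clean", "active", "write-pending", "active-idle",
}


def array_state_path(md_name, sysfs_root="/sys/block"):
    # e.g. /sys/block/md0/md/array_state
    return os.path.join(sysfs_root, md_name, "md", "array_state")


def read_array_state(md_name, sysfs_root="/sys/block"):
    with open(array_state_path(md_name, sysfs_root)) as f:
        return f.read().strip()


def write_array_state(md_name, state, sysfs_root="/sys/block"):
    # Catch obvious typos before touching sysfs; the kernel makes the final
    # decision on whether the transition is allowed.
    if state not in ARRAY_STATES:
        raise ValueError("unknown array state: %r" % state)
    with open(array_state_path(md_name, sysfs_root), "w") as f:
        f.write(state)
```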
210 sync_speed_min 269 sync_speed_min
211 sync_speed_max 270 sync_speed_max
212 These are similar to /proc/sys/dev/raid/speed_limit_{min,max} 271 These are similar to /proc/sys/dev/raid/speed_limit_{min,max}
@@ -250,10 +309,18 @@ Each directory contains:
250 faulty - device has been kicked from active use due to 309 faulty - device has been kicked from active use due to
251 a detected fault 310 a detected fault
252 in_sync - device is a fully in-sync member of the array 311 in_sync - device is a fully in-sync member of the array
312 writemostly - device will only be subject to read
313 requests if there are no other options.
314 This applies only to raid1 arrays.
253 spare - device is working, but not a full member. 315 spare - device is working, but not a full member.
254 This includes spares that are in the process 316 This includes spares that are in the process
255 of being recoverred to 317 of being recoverred to
256 This list make grow in future. 318 This list make grow in future.
319 This can be written to.
320 Writing "faulty" simulates a failure on the device.
321 Writing "remove" removes the device from the array.
322 Writing "writemostly" sets the writemostly flag.
323 Writing "-writemostly" clears the writemostly flag.
257 324
258 errors 325 errors
259 An approximate count of read errors that have been detected on 326 An approximate count of read errors that have been detected on
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 4710845dbac4..46b9b389df35 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -262,9 +262,14 @@ What is required is some way of intervening to instruct the compiler and the
262CPU to restrict the order. 262CPU to restrict the order.
263 263
264Memory barriers are such interventions. They impose a perceived partial 264Memory barriers are such interventions. They impose a perceived partial
265ordering between the memory operations specified on either side of the barrier. 265ordering over the memory operations on either side of the barrier.
266They request that the sequence of memory events generated appears to other 266
267parts of the system as if the barrier is effective on that CPU. 267Such enforcement is important because the CPUs and other devices in a system
268can use a variety of tricks to improve performance - including reordering,
269deferral and combination of memory operations; speculative loads; speculative
270branch prediction and various types of caching. Memory barriers are used to
271override or suppress these tricks, allowing the code to sanely control the
272interaction of multiple CPUs and/or devices.
268 273
269 274
270VARIETIES OF MEMORY BARRIER 275VARIETIES OF MEMORY BARRIER
@@ -282,7 +287,7 @@ Memory barriers come in four basic varieties:
282 A write barrier is a partial ordering on stores only; it is not required 287 A write barrier is a partial ordering on stores only; it is not required
283 to have any effect on loads. 288 to have any effect on loads.
284 289
285 A CPU can be viewed as as commiting a sequence of store operations to the 290 A CPU can be viewed as committing a sequence of store operations to the
286 memory system as time progresses. All stores before a write barrier will 291 memory system as time progresses. All stores before a write barrier will
287 occur in the sequence _before_ all the stores after the write barrier. 292 occur in the sequence _before_ all the stores after the write barrier.
288 293
@@ -413,7 +418,7 @@ There are certain things that the Linux kernel memory barriers do not guarantee:
413 indirect effect will be the order in which the second CPU sees the effects 418 indirect effect will be the order in which the second CPU sees the effects
414 of the first CPU's accesses occur, but see the next point: 419 of the first CPU's accesses occur, but see the next point:
415 420
416 (*) There is no guarantee that the a CPU will see the correct order of effects 421 (*) There is no guarantee that a CPU will see the correct order of effects
417 from a second CPU's accesses, even _if_ the second CPU uses a memory 422 from a second CPU's accesses, even _if_ the second CPU uses a memory
418 barrier, unless the first CPU _also_ uses a matching memory barrier (see 423 barrier, unless the first CPU _also_ uses a matching memory barrier (see
419 the subsection on "SMP Barrier Pairing"). 424 the subsection on "SMP Barrier Pairing").
@@ -461,8 +466,8 @@ Whilst this may seem like a failure of coherency or causality maintenance, it
461isn't, and this behaviour can be observed on certain real CPUs (such as the DEC 466isn't, and this behaviour can be observed on certain real CPUs (such as the DEC
462Alpha). 467Alpha).
463 468
464To deal with this, a data dependency barrier must be inserted between the 469To deal with this, a data dependency barrier or better must be inserted
465address load and the data load: 470between the address load and the data load:
466 471
467 CPU 1 CPU 2 472 CPU 1 CPU 2
468 =============== =============== 473 =============== ===============
@@ -484,7 +489,7 @@ lines. The pointer P might be stored in an odd-numbered cache line, and the
484variable B might be stored in an even-numbered cache line. Then, if the 489variable B might be stored in an even-numbered cache line. Then, if the
485even-numbered bank of the reading CPU's cache is extremely busy while the 490even-numbered bank of the reading CPU's cache is extremely busy while the
486odd-numbered bank is idle, one can see the new value of the pointer P (&B), 491odd-numbered bank is idle, one can see the new value of the pointer P (&B),
487but the old value of the variable B (1). 492but the old value of the variable B (2).
488 493
489 494
490Another example of where data dependency barriers might by required is where a 495Another example of where data dependency barriers might by required is where a
@@ -597,7 +602,7 @@ Consider the following sequence of events:
597 602
598This sequence of events is committed to the memory coherence system in an order 603This sequence of events is committed to the memory coherence system in an order
599that the rest of the system might perceive as the unordered set of { STORE A, 604that the rest of the system might perceive as the unordered set of { STORE A,
600STORE B, STORE C } all occuring before the unordered set of { STORE D, STORE E 605STORE B, STORE C } all occurring before the unordered set of { STORE D, STORE E
601}: 606}:
602 607
603 +-------+ : : 608 +-------+ : :
@@ -744,7 +749,7 @@ some effectively random order, despite the write barrier issued by CPU 1:
744 : : 749 : :
745 750
746 751
747If, however, a read barrier were to be placed between the load of E and the 752If, however, a read barrier were to be placed between the load of B and the
748load of A on CPU 2: 753load of A on CPU 2:
749 754
750 CPU 1 CPU 2 755 CPU 1 CPU 2
@@ -1010,10 +1015,9 @@ CPU from reordering them.
1010There are some more advanced barrier functions: 1015There are some more advanced barrier functions:
1011 1016
1012 (*) set_mb(var, value) 1017 (*) set_mb(var, value)
1013 (*) set_wmb(var, value)
1014 1018
1015 These assign the value to the variable and then insert at least a write 1019 This assigns the value to the variable and then inserts at least a write
1016 barrier after it, depending on the function. They aren't guaranteed to 1020 barrier after it, depending on the function. It isn't guaranteed to
1017 insert anything more than a compiler barrier in a UP compilation. 1021 insert anything more than a compiler barrier in a UP compilation.
1018 1022
1019 1023
@@ -1461,9 +1465,8 @@ instruction itself is complete.
1461 1465
1462On a UP system - where this wouldn't be a problem - the smp_mb() is just a 1466On a UP system - where this wouldn't be a problem - the smp_mb() is just a
1463compiler barrier, thus making sure the compiler emits the instructions in the 1467compiler barrier, thus making sure the compiler emits the instructions in the
1464right order without actually intervening in the CPU. Since there there's only 1468right order without actually intervening in the CPU. Since there's only one
1465one CPU, that CPU's dependency ordering logic will take care of everything 1469CPU, that CPU's dependency ordering logic will take care of everything else.
1466else.
1467 1470
1468 1471
1469ATOMIC OPERATIONS 1472ATOMIC OPERATIONS
@@ -1640,9 +1643,9 @@ functions:
1640 1643
1641 The PCI bus, amongst others, defines an I/O space concept - which on such 1644 The PCI bus, amongst others, defines an I/O space concept - which on such
1642 CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O 1645 CPUs as i386 and x86_64 cpus readily maps to the CPU's concept of I/O
1643 space. However, it may also mapped as a virtual I/O space in the CPU's 1646 space. However, it may also be mapped as a virtual I/O space in the CPU's
1644 memory map, particularly on those CPUs that don't support alternate 1647 memory map, particularly on those CPUs that don't support alternate I/O
1645 I/O spaces. 1648 spaces.
1646 1649
1647 Accesses to this space may be fully synchronous (as on i386), but 1650 Accesses to this space may be fully synchronous (as on i386), but
1648 intermediary bridges (such as the PCI host bridge) may not fully honour 1651 intermediary bridges (such as the PCI host bridge) may not fully honour
diff --git a/Documentation/mips/time.README b/Documentation/mips/time.README
index 70bc0dd43d6d..69ddc5c14b79 100644
--- a/Documentation/mips/time.README
+++ b/Documentation/mips/time.README
@@ -65,7 +65,7 @@ the following functions or values:
65 1. (optional) set up RTC routines 65 1. (optional) set up RTC routines
66 2. (optional) calibrate and set the mips_counter_frequency 66 2. (optional) calibrate and set the mips_counter_frequency
67 67
68 b) board_timer_setup - a function pointer. Invoked at the end of time_init() 68 b) plat_timer_setup - a function pointer. Invoked at the end of time_init()
69 1. (optional) over-ride any decisions made in time_init() 69 1. (optional) over-ride any decisions made in time_init()
70 2. set up the irqaction for timer interrupt. 70 2. set up the irqaction for timer interrupt.
71 3. enable the timer interrupt 71 3. enable the timer interrupt
@@ -116,19 +116,17 @@ Step 2: the machine setup() function
116 116
117 If you supply board_time_init(), set the function poointer. 117 If you supply board_time_init(), set the function poointer.
118 118
119 Set the function pointer board_timer_setup() (mandatory)
120 119
121 120Step 3: implement rtc routines, board_time_init() and plat_timer_setup()
122Step 3: implement rtc routines, board_time_init() and board_timer_setup()
123 if needed. 121 if needed.
124 122
125 board_time_init() - 123 board_time_init() -
126 a) (optional) set up RTC routines, 124 a) (optional) set up RTC routines,
127 b) (optional) calibrate and set the mips_counter_frequency 125 b) (optional) calibrate and set the mips_counter_frequency
128 (only needed if you intended to use fixed_rate_gettimeoffset 126 (only needed if you intended to use fixed_rate_gettimeoffset
129 or use cpu counter as timer interrupt source) 127 or use cpu counter as timer interrupt source)
130 128
131 board_timer_setup() - 129 plat_timer_setup() -
132 a) (optional) over-write any choices made above by time_init(). 130 a) (optional) over-write any choices made above by time_init().
133 b) machine specific code should setup the timer irqaction. 131 b) machine specific code should setup the timer irqaction.
134 c) enable the timer interrupt 132 c) enable the timer interrupt
diff --git a/Documentation/netlabel/00-INDEX b/Documentation/netlabel/00-INDEX
new file mode 100644
index 000000000000..837bf35990e2
--- /dev/null
+++ b/Documentation/netlabel/00-INDEX
@@ -0,0 +1,10 @@
100-INDEX
2 - this file.
3cipso_ipv4.txt
4 - documentation on the IPv4 CIPSO protocol engine.
5draft-ietf-cipso-ipsecurity-01.txt
6 - IETF draft of the CIPSO protocol, dated 16 July 1992.
7introduction.txt
8 - NetLabel introduction, READ THIS FIRST.
9lsm_interface.txt
10 - documentation on the NetLabel kernel security module API.
diff --git a/Documentation/netlabel/cipso_ipv4.txt b/Documentation/netlabel/cipso_ipv4.txt
new file mode 100644
index 000000000000..93dacb132c3c
--- /dev/null
+++ b/Documentation/netlabel/cipso_ipv4.txt
@@ -0,0 +1,48 @@
1NetLabel CIPSO/IPv4 Protocol Engine
2==============================================================================
3Paul Moore, paul.moore@hp.com
4
5May 17, 2006
6
7 * Overview
8
9The NetLabel CIPSO/IPv4 protocol engine is based on the IETF Commercial IP
10Security Option (CIPSO) draft from July 16, 1992. A copy of this draft can be
11found in this directory, consult '00-INDEX' for the filename. While the IETF
12draft never made it to an RFC standard it has become a de-facto standard for
13labeled networking and is used in many trusted operating systems.
14
15 * Outbound Packet Processing
16
17The CIPSO/IPv4 protocol engine applies the CIPSO IP option to packets by
18adding the CIPSO label to the socket. This causes all packets leaving the
19system through the socket to have the CIPSO IP option applied. The socket's
20CIPSO label can be changed at any point in time, however, it is recommended
21that it is set upon the socket's creation. The LSM can set the socket's CIPSO
22label by using the NetLabel security module API; if the NetLabel "domain" is
23configured to use CIPSO for packet labeling then a CIPSO IP option will be
24generated and attached to the socket.
25
26 * Inbound Packet Processing
27
28The CIPSO/IPv4 protocol engine validates every CIPSO IP option it finds at the
29IP layer without any special handling required by the LSM. However, in order
30to decode and translate the CIPSO label on the packet the LSM must use the
31NetLabel security module API to extract the security attributes of the packet.
32This is typically done at the socket layer using the 'socket_sock_rcv_skb()'
33LSM hook.
34
35 * Label Translation
36
37The CIPSO/IPv4 protocol engine contains a mechanism to translate CIPSO security
38attributes such as sensitivity level and category to values which are
39appropriate for the host. These mappings are defined as part of a CIPSO
40Domain Of Interpretation (DOI) definition and are configured through the
41NetLabel user space communication layer. Each DOI definition can have a
42different security attribute mapping table.
43
44 * Label Translation Cache
45
46The NetLabel system provides a framework for caching security attribute
47mappings from the network labels to the corresponding LSM identifiers. The
48CIPSO/IPv4 protocol engine supports this caching mechanism.
diff --git a/Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt b/Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt
new file mode 100644
index 000000000000..256c2c9d4f50
--- /dev/null
+++ b/Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt
@@ -0,0 +1,791 @@
1IETF CIPSO Working Group
216 July, 1992
3
4
5
6 COMMERCIAL IP SECURITY OPTION (CIPSO 2.2)
7
8
9
101. Status
11
12This Internet Draft provides the high level specification for a Commercial
13IP Security Option (CIPSO). This draft reflects the version as approved by
14the CIPSO IETF Working Group. Distribution of this memo is unlimited.
15
16This document is an Internet Draft. Internet Drafts are working documents
17of the Internet Engineering Task Force (IETF), its Areas, and its Working
18Groups. Note that other groups may also distribute working documents as
19Internet Drafts.
20
21Internet Drafts are draft documents valid for a maximum of six months.
22Internet Drafts may be updated, replaced, or obsoleted by other documents
23at any time. It is not appropriate to use Internet Drafts as reference
24material or to cite them other than as a "working draft" or "work in
25progress."
26
27Please check the I-D abstract listing contained in each Internet Draft
28directory to learn the current status of this or any other Internet Draft.
29
30
31
32
332. Background
34
35Currently the Internet Protocol includes two security options. One of
36these options is the DoD Basic Security Option (BSO) (Type 130) which allows
37IP datagrams to be labeled with security classifications. This option
38provides sixteen security classifications and a variable number of handling
39restrictions. To handle additional security information, such as security
40categories or compartments, another security option (Type 133) exists and
41is referred to as the DoD Extended Security Option (ESO). The values for
42the fixed fields within these two options are administered by the Defense
43Information Systems Agency (DISA).
44
45Computer vendors are now building commercial operating systems with
46mandatory access controls and multi-level security. These systems are
47no longer built specifically for a particular group in the defense or
48intelligence communities. They are generally available commercial systems
49for use in a variety of government and civil sector environments.
50
51The small number of ESO format codes can not support all the possible
52applications of a commercial security option. The BSO and ESO were
53designed to only support the United States DoD. CIPSO has been designed
54to support multiple security policies. This Internet Draft provides the
55format and procedures required to support a Mandatory Access Control
56security policy. Support for additional security policies shall be
57defined in future RFCs.
58
59
60
61
62Internet Draft, Expires 15 Jan 93 [PAGE 1]
63
64
65
66CIPSO INTERNET DRAFT 16 July, 1992
67
68
69
70
713. CIPSO Format
72
73Option type: 134 (Class 0, Number 6, Copy on Fragmentation)
74Option length: Variable
75
76This option permits security related information to be passed between
77systems within a single Domain of Interpretation (DOI). A DOI is a
78collection of systems which agree on the meaning of particular values
79in the security option. An authority that has been assigned a DOI
80identifier will define a mapping between appropriate CIPSO field values
81and their human readable equivalent. This authority will distribute that
82mapping to hosts within the authority's domain. These mappings may be
83sensitive, therefore a DOI authority is not required to make these
84mappings available to anyone other than the systems that are included in
85the DOI.
86
87This option MUST be copied on fragmentation. This option appears at most
88once in a datagram. All multi-octet fields in the option are defined to be
89transmitted in network byte order. The format of this option is as follows:
90
91+----------+----------+------//------+-----------//---------+
92| 10000110 | LLLLLLLL | DDDDDDDDDDDD | TTTTTTTTTTTTTTTTTTTT |
93+----------+----------+------//------+-----------//---------+
94
95 TYPE=134 OPTION DOMAIN OF TAGS
96 LENGTH INTERPRETATION
97
98
99 Figure 1. CIPSO Format
100
101
1023.1 Type
103
104This field is 1 octet in length. Its value is 134.
105
106
1073.2 Length
108
109This field is 1 octet in length. It is the total length of the option
110including the type and length fields. With the current IP header length
111restriction of 40 octets the value of this field MUST not exceed 40.
112
113
1143.3 Domain of Interpretation Identifier
115
116This field is an unsigned 32 bit integer. The value 0 is reserved and MUST
117not appear as the DOI identifier in any CIPSO option. Implementations
118should assume that the DOI identifier field is not aligned on any particular
119byte boundary.
120
121To conserve space in the protocol, security levels and categories are
122represented by numbers rather than their ASCII equivalent. This requires
123a mapping table within CIPSO hosts to map these numbers to their
124corresponding ASCII representations. Non-related groups of systems may
125
126
127
128Internet Draft, Expires 15 Jan 93 [PAGE 2]
129
130
131
132CIPSO INTERNET DRAFT 16 July, 1992
133
134
135
136have their own unique mappings. For example, one group of systems may
137use the number 5 to represent Unclassified while another group may use the
138number 1 to represent that same security level. The DOI identifier is used
139to identify which mapping was used for the values within the option.
140
141
1423.4 Tag Types
143
144A common format for passing security related information is necessary
145for interoperability. CIPSO uses sets of "tags" to contain the security
146information relevant to the data in the IP packet. Each tag begins with
147a tag type identifier followed by the length of the tag and ends with the
148actual security information to be passed. All multi-octet fields in a tag
149are defined to be transmitted in network byte order. Like the DOI
150identifier field in the CIPSO header, implementations should assume that
151all tags, as well as fields within a tag, are not aligned on any particular
152octet boundary. The tag types defined in this document contain alignment
153bytes to assist alignment of some information, however alignment can not
154be guaranteed if CIPSO is not the first IP option.
155
156CIPSO tag types 0 through 127 are reserved for defining standard tag
157formats. Their definitions will be published in RFCs. Tag types whose
158identifiers are greater than 127 are defined by the DOI authority and may
159only be meaningful in certain Domains of Interpretation. For these tag
160types, implementations will require the DOI identifier as well as the tag
161number to determine the security policy and the format associated with the
162tag. Use of tag types above 127 are restricted to closed networks where
163interoperability with other networks will not be an issue. Implementations
164that support a tag type greater than 127 MUST support at least one DOI that
165requires only tag types 1 to 127.
166
167Tag type 0 is reserved. Tag types 1, 2, and 5 are defined in this
168Internet Draft. Types 3 and 4 are reserved for work in progress.
169The standard format for all current and future CIPSO tags is shown below:
170
171+----------+----------+--------//--------+
172| TTTTTTTT | LLLLLLLL | IIIIIIIIIIIIIIII |
173+----------+----------+--------//--------+
174 TAG TAG TAG
175 TYPE LENGTH INFORMATION
176
177 Figure 2: Standard Tag Format
178
179In the three tag types described in this document, the length and count
180restrictions are based on the current IP limitation of 40 octets for all
181IP options. If the IP header is later expanded, then the length and count
182restrictions specified in this document may increase to use the full area
183provided for IP options.
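
The option layout in sections 3.1 to 3.4 (type, length, 32 bit DOI, then a
sequence of tags) can be walked as in the sketch below. It is a minimal
illustration under the rules stated above, not a reference parser; the
function name and the returned tuple shape are assumptions of the example.

```python
import struct

CIPSO_OPTION_TYPE = 134   # section 3.1
CIPSO_HEADER_LEN = 6      # type (1) + length (1) + DOI (4)
IP_OPTIONS_MAX = 40       # current IP header limit, section 3.2


def parse_cipso_option(data):
    """Return (doi, [(tag_type, tag_information_bytes), ...])."""
    if len(data) < CIPSO_HEADER_LEN:
        raise ValueError("truncated CIPSO option")
    # "!" selects network byte order with no alignment padding.
    opt_type, opt_len, doi = struct.unpack_from("!BBI", data, 0)
    if opt_type != CIPSO_OPTION_TYPE:
        raise ValueError("not a CIPSO option")
    if not CIPSO_HEADER_LEN <= opt_len <= IP_OPTIONS_MAX or opt_len > len(data):
        raise ValueError("bad option length")
    if doi == 0:
        raise ValueError("DOI identifier 0 is reserved")

    tags = []
    pos = CIPSO_HEADER_LEN
    while pos < opt_len:
        if pos + 2 > opt_len:
            raise ValueError("truncated tag header")
        tag_type, tag_len = data[pos], data[pos + 1]
        # The tag length counts the type and length octets themselves.
        if tag_len < 2 or pos + tag_len > opt_len:
            raise ValueError("bad tag length")
        tags.append((tag_type, bytes(data[pos + 2:pos + tag_len])))
        pos += tag_len
    return doi, tags
```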
184
185
1863.4.1 Tag Type Classes
187
188Tag classes consist of tag types that have common processing requirements
189and support the same security policy. The three tags defined in this
190Internet Draft belong to the Mandatory Access Control (MAC) Sensitivity
191
192
193
194Internet Draft, Expires 15 Jan 93 [PAGE 3]
195
196
197
198CIPSO INTERNET DRAFT 16 July, 1992
199
200
201
202class and support the MAC Sensitivity security policy.
203
204
2053.4.2 Tag Type 1
206
207This is referred to as the "bit-mapped" tag type. Tag type 1 is included
208in the MAC Sensitivity tag type class. The format of this tag type is as
209follows:
210
211+----------+----------+----------+----------+--------//---------+
212| 00000001 | LLLLLLLL | 00000000 | LLLLLLLL | CCCCCCCCCCCCCCCCC |
213+----------+----------+----------+----------+--------//---------+
214
215 TAG TAG ALIGNMENT SENSITIVITY BIT MAP OF
216 TYPE LENGTH OCTET LEVEL CATEGORIES
217
218 Figure 3. Tag Type 1 Format
219
220
2213.4.2.1 Tag Type
222
223This field is 1 octet in length and has a value of 1.
224
225
2263.4.2.2 Tag Length
227
228This field is 1 octet in length. It is the total length of the tag type
229including the type and length fields. With the current IP header length
230restriction of 40 bytes the value within this field is between 4 and 34.
231
232
2333.4.2.3 Alignment Octet
234
235This field is 1 octet in length and always has the value of 0. Its purpose
236is to align the category bitmap field on an even octet boundary. This will
237speed many implementations including router implementations.
238
239
2403.4.2.4 Sensitivity Level
241
242This field is 1 octet in length. Its value is from 0 to 255. The values
243are ordered with 0 being the minimum value and 255 representing the maximum
244value.
245
246
2473.4.2.5 Bit Map of Categories
248
249The length of this field is variable and ranges from 0 to 30 octets. This
250provides representation of categories 0 to 239. The ordering of the bits
251is left to right or MSB to LSB. For example category 0 is represented by
252the most significant bit of the first byte and category 15 is represented
253by the least significant bit of the second byte. Figure 4 graphically
254shows this ordering. Bit N is binary 1 if category N is part of the label
255for the datagram, and bit N is binary 0 if category N is not part of the
256label. Except for the optimized tag 1 format described in the next section,
257
258
259
260Internet Draft, Expires 15 Jan 93 [PAGE 4]
261
262
263
264CIPSO INTERNET DRAFT 16 July, 1992
265
266
267
268minimal encoding SHOULD be used resulting in no trailing zero octets in the
269category bitmap.
270
271 octet 0 octet 1 octet 2 octet 3 octet 4 octet 5
272 XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX . . .
273bit 01234567 89111111 11112222 22222233 33333333 44444444
274number 012345 67890123 45678901 23456789 01234567
275
276 Figure 4. Ordering of Bits in Tag 1 Bit Map
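
The MSB-first bit ordering of Figure 4 can be expressed as the following
sketch. The function names are illustrative only; the encoder produces the
minimal encoding, with no trailing zero octets.

```python
MAX_BITMAP_CATEGORY = 239   # 30 octets * 8 bits, categories 0 to 239


def categories_from_bitmap(bitmap):
    """Decode a tag type 1 category bitmap into a sorted list of categories."""
    categories = []
    for octet_index, octet in enumerate(bitmap):
        for bit in range(8):
            # Bit 0 of each octet is its most significant bit.
            if octet & (0x80 >> bit):
                categories.append(octet_index * 8 + bit)
    return categories


def bitmap_from_categories(categories):
    """Encode categories minimally: the last octet always has a bit set."""
    if not categories:
        return b""
    if min(categories) < 0 or max(categories) > MAX_BITMAP_CATEGORY:
        raise ValueError("category out of range for tag type 1")
    bitmap = bytearray(max(categories) // 8 + 1)
    for cat in categories:
        bitmap[cat // 8] |= 0x80 >> (cat % 8)
    return bytes(bitmap)
```

For example, categories 0 and 15 encode as the two octets 0x80 0x01, which
matches the description in section 3.4.2.5.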
277
278
2793.4.2.6 Optimized Tag 1 Format
280
281Routers work most efficiently when processing fixed length fields. To
282support these routers there is an optimized form of tag type 1. The format
283does not change. The only change is to the category bitmap which is set to
284a constant length of 10 octets. Trailing octets required to fill out the 10
285octets are zero filled. Ten octets, allowing for 80 categories, was chosen
286because it makes the total length of the CIPSO option 20 octets. If CIPSO
287is the only option then the option will be full word aligned and additional
288filler octets will not be required.
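
The length arithmetic behind the optimized format can be checked with a
short sketch: 6 octets of option header, 4 fixed tag octets and a 10 octet
bitmap give 20 octets. The function and constant names are assumptions of
this example.

```python
CIPSO_HEADER_LEN = 6        # type + length + 4 octet DOI
TAG1_FIXED_LEN = 4          # tag type, tag length, alignment, sensitivity
OPTIMIZED_BITMAP_LEN = 10   # fixed bitmap size in the optimized form


def build_optimized_tag1(level, bitmap):
    """Build a tag type 1 whose bitmap is zero-padded to exactly 10 octets."""
    if not 0 <= level <= 255:
        raise ValueError("sensitivity level must fit in one octet")
    if len(bitmap) > OPTIMIZED_BITMAP_LEN:
        raise ValueError("optimized form holds at most 80 categories")
    padded = bytes(bitmap).ljust(OPTIMIZED_BITMAP_LEN, b"\x00")
    tag_len = TAG1_FIXED_LEN + OPTIMIZED_BITMAP_LEN
    return bytes([1, tag_len, 0, level]) + padded
```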
289
290
2913.4.3 Tag Type 2
292
293This is referred to as the "enumerated" tag type. It is used to describe
294large but sparsely populated sets of categories. Tag type 2 is in the MAC
295Sensitivity tag type class. The format of this tag type is as follows:
296
297+----------+----------+----------+----------+-------------//-------------+
298| 00000010 | LLLLLLLL | 00000000 | LLLLLLLL | CCCCCCCCCCCCCCCCCCCCCCCCCC |
299+----------+----------+----------+----------+-------------//-------------+
300
301 TAG TAG ALIGNMENT SENSITIVITY ENUMERATED
302 TYPE LENGTH OCTET LEVEL CATEGORIES
303
304 Figure 5. Tag Type 2 Format
305
306
3073.4.3.1 Tag Type
308
309This field is one octet in length and has a value of 2.
310
311
3123.4.3.2 Tag Length
313
314This field is 1 octet in length. It is the total length of the tag type
315including the type and length fields. With the current IP header length
316restriction of 40 bytes the value within this field is between 4 and 34.
317
318
3193.4.3.3 Alignment Octet
320
321This field is 1 octet in length and always has the value of 0. Its purpose
322is to align the category field on an even octet boundary. This will
323
324
325
326Internet Draft, Expires 15 Jan 93 [PAGE 5]
327
328
329
330CIPSO INTERNET DRAFT 16 July, 1992
331
332
333
334speed many implementations including router implementations.
335
336
3373.4.3.4 Sensitivity Level
338
339This field is 1 octet in length. Its value is from 0 to 255. The values
340are ordered with 0 being the minimum value and 255 representing the
341maximum value.
342
343
3443.4.3.5 Enumerated Categories
345
346In this tag, categories are represented by their actual value rather than
347by their position within a bit field. The length of each category is 2
348octets. Up to 15 categories may be represented by this tag. Valid values
349for categories are 0 to 65534. Category 65535 is not a valid category
350value. The categories MUST be listed in ascending order within the tag.
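
A sketch of the enumerated encoding follows, assuming the rules above: 2
octet categories in network byte order, at most 15 of them, values 0 to
65534, listed in ascending order. The function names are illustrative.

```python
MAX_ENUM_CATEGORIES = 15
MAX_CATEGORY = 65534


def build_tag2(level, categories):
    """Build a tag type 2 from a sensitivity level and a set of categories."""
    cats = sorted(set(categories))          # ascending order is mandatory
    if len(cats) > MAX_ENUM_CATEGORIES:
        raise ValueError("at most 15 categories fit in tag type 2")
    if not 0 <= level <= 255:
        raise ValueError("sensitivity level must fit in one octet")
    if any(not 0 <= c <= MAX_CATEGORY for c in cats):
        raise ValueError("category out of range")
    body = b"".join(c.to_bytes(2, "big") for c in cats)
    return bytes([2, 4 + len(body), 0, level]) + body


def parse_tag2(tag):
    """Return (level, categories) from a complete tag type 2."""
    if len(tag) < 4 or tag[0] != 2 or tag[1] != len(tag) or (len(tag) - 4) % 2:
        raise ValueError("malformed tag type 2")
    cats = [int.from_bytes(tag[i:i + 2], "big") for i in range(4, len(tag), 2)]
    if len(cats) > MAX_ENUM_CATEGORIES or any(c > MAX_CATEGORY for c in cats):
        raise ValueError("invalid category list")
    if any(a >= b for a, b in zip(cats, cats[1:])):
        raise ValueError("categories must be in ascending order")
    return tag[3], cats
```

With 15 categories the tag is 4 + 30 = 34 octets long, which is the upper
bound given in section 3.4.3.2.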
351
352
3533.4.4 Tag Type 5
354
355This is referred to as the "range" tag type. It is used to represent
356labels where all categories in a range, or set of ranges, are included
357in the sensitivity label. Tag type 5 is in the MAC Sensitivity tag type
358class. The format of this tag type is as follows:
359
360+----------+----------+----------+----------+------------//-------------+
361| 00000101 | LLLLLLLL | 00000000 | LLLLLLLL | Top/Bottom | Top/Bottom |
362+----------+----------+----------+----------+------------//-------------+
363
364 TAG TAG ALIGNMENT SENSITIVITY CATEGORY RANGES
365 TYPE LENGTH OCTET LEVEL
366
367 Figure 6. Tag Type 5 Format
368
369
3703.4.4.1 Tag Type
371
372This field is one octet in length and has a value of 5.
373
374
3753.4.4.2 Tag Length
376
377This field is 1 octet in length. It is the total length of the tag type
378including the type and length fields. With the current IP header length
379restriction of 40 bytes the value within this field is between 4 and 34.
380
381
3823.4.4.3 Alignment Octet
383
384This field is 1 octet in length and always has the value of 0. Its purpose
385is to align the category range field on an even octet boundary. This will
386speed many implementations including router implementations.
387
388
389
390
391
392Internet Draft, Expires 15 Jan 93 [PAGE 6]
393
394
395
396CIPSO INTERNET DRAFT 16 July, 1992
397
398
399
4003.4.4.4 Sensitivity Level
401
402This field is 1 octet in length. Its value is from 0 to 255. The values
403are ordered with 0 being the minimum value and 255 representing the maximum
404value.
405
406
4073.4.4.5 Category Ranges
408
409A category range is a 4 octet field comprised of the 2 octet index of the
410highest numbered category followed by the 2 octet index of the lowest
411numbered category. These range endpoints are inclusive within the range of
412categories. All categories within a range are included in the sensitivity
413label. This tag may contain a maximum of 7 category pairs. The bottom
414category endpoint for the last pair in the tag MAY be omitted and SHOULD be
415assumed to be 0. The ranges MUST be non-overlapping and be listed in
416descending order. Valid values for categories are 0 to 65534. Category
41765535 is not a valid category value.
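
The rules for tag type 5 can be sketched as a small parser. The function name is hypothetical and big-endian byte order for the category indexes is an assumption. The sketch checks the tag length bounds from section 3.4.4.2 but does not enforce the limit on the number of pairs.

```python
def parse_range_tag(tag):
    """Return (sensitivity_level, [(top, bottom), ...]) for a tag type 5."""
    if len(tag) < 4 or tag[0] != 5:
        raise ValueError("not a tag type 5")
    if tag[1] != len(tag) or not 4 <= tag[1] <= 34:
        raise ValueError("bad tag length")
    if tag[2] != 0:
        raise ValueError("alignment octet must be 0")
    level = tag[3]
    body = tag[4:]
    if len(body) % 2:
        raise ValueError("category indexes are 2 octets each")
    # Assumption: indexes are in network (big-endian) byte order.
    values = [int.from_bytes(body[i:i + 2], "big")
              for i in range(0, len(body), 2)]
    if len(values) % 2:
        values.append(0)  # omitted bottom endpoint of the last pair
    ranges = list(zip(values[0::2], values[1::2]))
    previous_bottom = None
    for top, bottom in ranges:
        if top > 65534 or bottom > top:
            raise ValueError("invalid category range")
        if previous_bottom is not None and top >= previous_bottom:
            raise ValueError("ranges must be descending and non-overlapping")
        previous_bottom = bottom
    return level, ranges
```

For example, a tag carrying level 3, the pair 10/5 and a final top endpoint of 2 with its bottom omitted decodes to the ranges 10..5 and 2..0.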
418
419
4203.4.5 Minimum Requirements
421
422A CIPSO implementation MUST be capable of generating at least tag type 1 in
423the non-optimized form. In addition, a CIPSO implementation MUST be able
424to receive any valid tag type 1 even those using the optimized tag type 1
425format.
426
427
4284. Configuration Parameters
429
430The configuration parameters defined below are required for all CIPSO hosts,
431gateways, and routers that support multiple sensitivity labels. A CIPSO
432host is defined to be the origination or destination system for an IP
433datagram. A CIPSO gateway provides IP routing services between two or more
434IP networks and may be required to perform label translations between
435networks. A CIPSO gateway may be an enhanced CIPSO host or it may just
436provide gateway services with no end system CIPSO capabilities. A CIPSO
437router is a dedicated IP router that routes IP datagrams between two or more
438IP networks.
439
440An implementation of CIPSO on a host MUST have the capability to reject a
441datagram for reasons that the information contained can not be adequately
442protected by the receiving host or if acceptance may result in violation of
443the host or network security policy. In addition, a CIPSO gateway or router
444MUST be able to reject datagrams going to networks that can not provide
445adequate protection or may violate the network's security policy. To
446provide this capability the following minimal set of configuration
447parameters are required for CIPSO implementations:
448
449HOST_LABEL_MAX - This parameter contains the maximum sensitivity label that
450a CIPSO host is authorized to handle. All datagrams that have a label
451greater than this maximum MUST be rejected by the CIPSO host. This
452parameter does not apply to CIPSO gateways or routers. This parameter need
453not be defined explicitly as it can be implicitly derived from the
454PORT_LABEL_MAX parameters for the associated interfaces.
455
467HOST_LABEL_MIN - This parameter contains the minimum sensitivity label that
468a CIPSO host is authorized to handle. All datagrams that have a label less
469than this minimum MUST be rejected by the CIPSO host. This parameter does
470not apply to CIPSO gateways or routers. This parameter need not be defined
471explicitly as it can be implicitly derived from the PORT_LABEL_MIN
472parameters for the associated interfaces.
473
474PORT_LABEL_MAX - This parameter contains the maximum sensitivity label for
475all datagrams that may exit a particular network interface port. All
476outgoing datagrams that have a label greater than this maximum MUST be
477rejected by the CIPSO system. The label within this parameter MUST be
478less than or equal to the label within the HOST_LABEL_MAX parameter. This
479parameter does not apply to CIPSO hosts that support only one network port.
480
481PORT_LABEL_MIN - This parameter contains the minimum sensitivity label for
482all datagrams that may exit a particular network interface port. All
483outgoing datagrams that have a label less than this minimum MUST be
484rejected by the CIPSO system. The label within this parameter MUST be
485greater than or equal to the label within the HOST_LABEL_MIN parameter.
486This parameter does not apply to CIPSO hosts that support only one network
487port.
488
489PORT_DOI - This parameter is used to assign a DOI identifier value to a
490particular network interface port. All CIPSO labels within datagrams
491going out this port MUST use the specified DOI identifier. All CIPSO
492hosts and gateways MUST support either this parameter, the NET_DOI
493parameter, or the HOST_DOI parameter.
494
495NET_DOI - This parameter is used to assign a DOI identifier value to a
496particular IP network address. All CIPSO labels within datagrams destined
497for the particular IP network MUST use the specified DOI identifier. All
498CIPSO hosts and gateways MUST support either this parameter, the PORT_DOI
499parameter, or the HOST_DOI parameter.
500
501HOST_DOI - This parameter is used to assign a DOI identifier value to a
502particular IP host address. All CIPSO labels within datagrams destined for
503the particular IP host will use the specified DOI identifier. All CIPSO
504hosts and gateways MUST support either this parameter, the PORT_DOI
505parameter, or the NET_DOI parameter.
506
507This list represents the minimal set of configuration parameters required
508to be compliant. Implementors are encouraged to add to this list to
509provide enhanced functionality and control. For example, many security
510policies may require both incoming and outgoing datagrams be checked against
511the port and host label ranges.
512
513
5144.1 Port Range Parameters
515
516The labels represented by the PORT_LABEL_MAX and PORT_LABEL_MIN parameters
517MAY be in CIPSO or local format. Some CIPSO systems, such as routers, may
518want to have the range parameters expressed in CIPSO format so that incoming
519labels do not have to be converted to a local format before being compared
520against the range. If multiple DOIs are supported by one of these CIPSO
532systems then multiple port range parameters would be needed, one set for
533each DOI supported on a particular port.
534
535The port range will usually represent the total set of labels that may
536exist on the logical network accessed through the corresponding network
537interface. It may, however, represent a subset of these labels that are
538allowed to enter the CIPSO system.
539
540
5414.2 Single Label CIPSO Hosts
542
543CIPSO implementations that support only one label are not required to
544support the parameters described above. These limited implementations are
545only required to support a NET_LABEL parameter. This parameter contains
546the CIPSO label that may be inserted in datagrams that exit the host. In
547addition, the host MUST reject any incoming datagram that has a label which
548is not equivalent to the NET_LABEL parameter.
549
550
5515. Handling Procedures
552
553This section describes the processing requirements for incoming and
554outgoing IP datagrams. Just providing the correct CIPSO label format
555is not enough. Assumptions will be made by one system on how a
556receiving system will handle the CIPSO label. Wrong assumptions may
557lead to non-interoperability or even a security incident. The
558requirements described below represent the minimal set needed for
559interoperability and that provide users some level of confidence.
560Many other requirements could be added to increase user confidence,
561however at the risk of restricting creativity and limiting vendor
562participation.
563
564
5655.1 Input Procedures
566
567All datagrams received through a network port MUST have a security label
568associated with them, either contained in the datagram or assigned to the
569receiving port. Without this label the host, gateway, or router will not
570have the information it needs to make security decisions. This security
571label will be obtained from the CIPSO if the option is present in the
572datagram. See section 5.1.2 for handling procedures for unlabeled
573datagrams. This label will be compared against the PORT (if appropriate)
574and HOST configuration parameters defined in section 4.
575
576If any field within the CIPSO option, such as the DOI identifier, is not
577recognized the IP datagram is discarded and an ICMP "parameter problem"
578(type 12) is generated and returned. The ICMP code field is set to "bad
579parameter" (code 0) and the pointer is set to the start of the CIPSO field
580that is unrecognized.
581
582If the contents of the CIPSO are valid but the security label is
583outside of the configured host or port label range, the datagram is
584discarded and an ICMP "destination unreachable" (type 3) is generated
585and returned. The code field of the ICMP is set to "communication with
586destination network administratively prohibited" (code 9) or to
598"communication with destination host administratively prohibited"
599(code 10). The value of the code field used is dependent upon whether
600the originator of the ICMP message is acting as a CIPSO host or a CIPSO
601gateway. The recipient of the ICMP message MUST be able to handle either
602value. The same procedure is performed if a CIPSO can not be added to an
603IP packet because it is too large to fit in the IP options area.
604
605If the error is triggered by receipt of an ICMP message, the message
606is discarded and no response is permitted (consistent with general ICMP
607processing rules).
608
609
6105.1.1 Unrecognized tag types
611
612The default condition for any CIPSO implementation is that an
613unrecognized tag type MUST be treated as a "parameter problem" and
614handled as described in section 5.1. A CIPSO implementation MAY allow
615the system administrator to identify tag types that may safely be
616ignored. This capability is an allowable enhancement, not a
617requirement.
618
619
6205.1.2 Unlabeled Packets
621
622A network port may be configured to not require a CIPSO label for all
623incoming datagrams. For this configuration a CIPSO label must be
624assigned to that network port and associated with all unlabeled IP
625datagrams. This capability might be used for single level networks or
626networks that have CIPSO and non-CIPSO hosts and the non-CIPSO hosts
627all operate at the same label.
628
629If a CIPSO option is required and none is found, the datagram is
630discarded and an ICMP "parameter problem" (type 12) is generated and
631returned to the originator of the datagram. The code field of the ICMP
632is set to "option missing" (code 1) and the ICMP pointer is set to 134
633(the value of the option type for the missing CIPSO option).
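
The input procedures above amount to a small decision table, sketched here. All names are hypothetical. One assumption goes beyond the text: the sketch maps a gateway to ICMP code 9 and a host to code 10, while the draft only says the code depends on the role of the sender.

```python
from typing import NamedTuple, Optional

CIPSO_OPTION_TYPE = 134  # ICMP pointer value for a missing CIPSO option


class Verdict(NamedTuple):
    accept: bool
    icmp_type: Optional[int] = None
    icmp_code: Optional[int] = None
    pointer: Optional[int] = None


def input_verdict(has_option, option_required, bad_field_offset,
                  label_in_range, is_gateway, datagram_is_icmp):
    """Decide what to do with a received datagram.

    label_in_range refers to the effective label: the one in the option,
    or the label assigned to the port for unlabeled datagrams.
    """
    if not has_option and option_required:
        # "parameter problem", code 1 ("option missing")
        error = Verdict(False, 12, 1, CIPSO_OPTION_TYPE)
    elif has_option and bad_field_offset is not None:
        # "parameter problem", code 0, pointing at the unrecognized field
        error = Verdict(False, 12, 0, bad_field_offset)
    elif not label_in_range:
        # "destination unreachable"; gateway/host mapping is an assumption
        error = Verdict(False, 3, 9 if is_gateway else 10)
    else:
        return Verdict(True)
    if datagram_is_icmp:
        return Verdict(False)  # discard silently, no ICMP about an ICMP
    return error
```

A caller would send the ICMP message described by the verdict only when accept is false and icmp_type is set.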
634
635
6365.2 Output Procedures
637
638A CIPSO option MUST appear only once in a datagram. Only one tag type
639from the MAC Sensitivity class MAY be included in a CIPSO option. Given
640the current set of defined tag types, this means that CIPSO labels at
641first will contain only one tag.
642
643All datagrams leaving a CIPSO system MUST meet the following condition:
644
645 PORT_LABEL_MIN <= CIPSO label <= PORT_LABEL_MAX
646
647If this condition is not satisfied the datagram MUST be discarded.
648If the CIPSO system only supports one port, the HOST_LABEL_MIN and the
649HOST_LABEL_MAX parameters MAY be substituted for the PORT parameters in
650the above condition.
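
The output condition can be sketched as follows. The draft does not define here how two labels are compared, so this example assumes the usual dominance rule (higher or equal level and a superset of categories). The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    level: int
    categories: frozenset = frozenset()

    def dominates(self, other):
        # Assumed ordering: level >= and category set is a superset.
        return (self.level >= other.level
                and self.categories >= other.categories)


def may_exit_port(label, port_label_min, port_label_max):
    """PORT_LABEL_MIN <= CIPSO label <= PORT_LABEL_MAX"""
    return (port_label_max.dominates(label)
            and label.dominates(port_label_min))
```

A datagram for which may_exit_port() returns False would be discarded. On a single-port system the host range could be passed in place of the port range.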
651
652The DOI identifier to be used for all outgoing datagrams is configured by
664the administrator. If port level DOI identifier assignment is used, then
665the PORT_DOI configuration parameter MUST contain the DOI identifier to
666use. If network level DOI assignment is used, then the NET_DOI parameter
667MUST contain the DOI identifier to use. And if host level DOI assignment
668is employed, then the HOST_DOI parameter MUST contain the DOI identifier
669to use. A CIPSO implementation need only support one level of DOI
670assignment.
671
672
6735.3 DOI Processing Requirements
674
675A CIPSO implementation MUST support at least one DOI and SHOULD support
676multiple DOIs. System and network administrators are cautioned to
677ensure that at least one DOI is common within an IP network to allow for
678broadcasting of IP datagrams.
679
680CIPSO gateways MUST be capable of translating a CIPSO option from one
681DOI to another when forwarding datagrams between networks. For
682efficiency purposes this capability is only a desired feature for CIPSO
683routers.
684
685
6865.4 Label of ICMP Messages
687
688The CIPSO label to be used on all outgoing ICMP messages MUST be equivalent
689to the label of the datagram that caused the ICMP message. If the ICMP was
690generated due to a problem associated with the original CIPSO label then the
691following responses are allowed:
692
693 a. Use the CIPSO label of the original IP datagram
694 b. Drop the original datagram with no return message generated
695
696In most cases these options will have the same effect. If you can not
697interpret the label or if it is outside the label range of your host or
698interface then an ICMP message with the same label will probably not be
699able to exit the system.
700
701
7026. Assignment of DOI Identifier Numbers
703
704Requests for assignment of a DOI identifier number should be addressed to
705the Internet Assigned Numbers Authority (IANA).
706
707
7087. Acknowledgements
709
710Much of the material in this RFC is based on (and copied from) work
711done by Gary Winiger of Sun Microsystems and published as Commercial
712IP Security Option at the INTEROP 89, Commercial IPSO Workshop.
713
714
7158. Author's Address
716
717To submit mail for distribution to members of the IETF CIPSO Working
718Group, send mail to: cipso@wdl1.wdl.loral.com.
719
731To be added to or deleted from this distribution, send mail to:
732cipso-request@wdl1.wdl.loral.com.
733
734
7359. References
736
737RFC 1038, "Draft Revised IP Security Option", M. St. Johns, IETF, January
7381988.
739
740RFC 1108, "U.S. Department of Defense Security Options
741for the Internet Protocol", Stephen Kent, IAB, 1 March, 1991.
742
743
diff --git a/Documentation/netlabel/introduction.txt b/Documentation/netlabel/introduction.txt
new file mode 100644
index 000000000000..a4ffba1694c8
--- /dev/null
+++ b/Documentation/netlabel/introduction.txt
@@ -0,0 +1,46 @@
1NetLabel Introduction
2==============================================================================
3Paul Moore, paul.moore@hp.com
4
5August 2, 2006
6
7 * Overview
8
9NetLabel is a mechanism which can be used by kernel security modules to attach
10security attributes to outgoing network packets generated from user space
11applications and read security attributes from incoming network packets. It
12is composed of three main components, the protocol engines, the communication
13layer, and the kernel security module API.
14
15 * Protocol Engines
16
17The protocol engines are responsible for both applying and retrieving the
18network packet's security attributes. If any translation between the network
19security attributes and those on the host is required then the protocol
20engine will handle those tasks as well. Other kernel subsystems should
21refrain from calling the protocol engines directly, instead they should use
22the NetLabel kernel security module API described below.
23
24Detailed information about each NetLabel protocol engine can be found in this
25directory, consult '00-INDEX' for filenames.
26
27 * Communication Layer
28
29The communication layer exists to allow NetLabel configuration and monitoring
30from user space. The NetLabel communication layer uses a message based
31protocol built on top of the Generic NETLINK transport mechanism. The exact
32formatting of these NetLabel messages as well as the Generic NETLINK family
33names can be found in the 'net/netlabel/' directory as comments in the
34header files as well as in 'include/net/netlabel.h'.
35
36 * Security Module API
37
38The purpose of the NetLabel security module API is to provide a protocol
39independent interface to the underlying NetLabel protocol engines. In addition
40to protocol independence, the security module API is designed to be completely
41LSM independent which should allow multiple LSMs to leverage the same code
42base.
43
44Detailed information about the NetLabel security module API can be found in the
45'include/net/netlabel.h' header file as well as the 'lsm_interface.txt' file
46found in this directory.
diff --git a/Documentation/netlabel/lsm_interface.txt b/Documentation/netlabel/lsm_interface.txt
new file mode 100644
index 000000000000..98dd9f7430f2
--- /dev/null
+++ b/Documentation/netlabel/lsm_interface.txt
@@ -0,0 +1,47 @@
1NetLabel Linux Security Module Interface
2==============================================================================
3Paul Moore, paul.moore@hp.com
4
5May 17, 2006
6
7 * Overview
8
9NetLabel is a mechanism which can set and retrieve security attributes from
10network packets. It is intended to be used by LSM developers who want to make
11use of a common code base for several different packet labeling protocols.
12The NetLabel security module API is defined in 'include/net/netlabel.h' but a
13brief overview is given below.
14
15 * NetLabel Security Attributes
16
17Since NetLabel supports multiple different packet labeling protocols and LSMs
18it uses the concept of security attributes to refer to the packet's security
19labels. The NetLabel security attributes are defined by the
20'netlbl_lsm_secattr' structure in the NetLabel header file. Internally the
21NetLabel subsystem converts the security attributes to and from the correct
22low-level packet label depending on the NetLabel build time and run time
23configuration. It is up to the LSM developer to translate the NetLabel
24security attributes into whatever security identifiers are in use for their
25particular LSM.
26
27 * NetLabel LSM Protocol Operations
28
29These are the functions which allow the LSM developer to manipulate the labels
30on outgoing packets as well as read the labels on incoming packets. Functions
31exist to operate both on sockets as well as the sk_buffs directly. These high
32level functions are translated into low level protocol operations based on how
33the administrator has configured the NetLabel subsystem.
34
35 * NetLabel Label Mapping Cache Operations
36
37Depending on the exact configuration, translation between the network packet
38label and the internal LSM security identifier can be time consuming. The
39NetLabel label mapping cache is a caching mechanism which can be used to
40sidestep much of this overhead once a mapping has been established. Once the
41LSM has received a packet, used NetLabel to decode its security attributes,
42and translated the security attributes into a LSM internal identifier the LSM
43can use the NetLabel caching functions to associate the LSM internal
44identifier with the network packet's label. This means that in the future
45when an incoming packet matches a cached value not only are the internal
46NetLabel translation mechanisms bypassed but the LSM translation mechanisms are
47bypassed as well which should result in a significant reduction in overhead.
diff --git a/Documentation/networking/LICENSE.qla3xxx b/Documentation/networking/LICENSE.qla3xxx
new file mode 100644
index 000000000000..2f2077e34d81
--- /dev/null
+++ b/Documentation/networking/LICENSE.qla3xxx
@@ -0,0 +1,46 @@
1Copyright (c) 2003-2006 QLogic Corporation
2QLogic Linux Networking HBA Driver
3
4This program includes a device driver for Linux 2.6 that may be
5distributed with QLogic hardware specific firmware binary file.
6You may modify and redistribute the device driver code under the
7GNU General Public License as published by the Free Software
8Foundation (version 2 or a later version).
9
10You may redistribute the hardware specific firmware binary file
11under the following terms:
12
13 1. Redistribution of source code (only if applicable),
14 must retain the above copyright notice, this list of
15 conditions and the following disclaimer.
16
17 2. Redistribution in binary form must reproduce the above
18 copyright notice, this list of conditions and the
19 following disclaimer in the documentation and/or other
20 materials provided with the distribution.
21
22 3. The name of QLogic Corporation may not be used to
23 endorse or promote products derived from this software
24 without specific prior written permission
25
26REGARDLESS OF WHAT LICENSING MECHANISM IS USED OR APPLICABLE,
27THIS PROGRAM IS PROVIDED BY QLOGIC CORPORATION "AS IS'' AND ANY
28EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
29IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
30PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR
31BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
32EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
33TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
34DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
35ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
36OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
37OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
38POSSIBILITY OF SUCH DAMAGE.
39
40USER ACKNOWLEDGES AND AGREES THAT USE OF THIS PROGRAM WILL NOT
41CREATE OR GIVE GROUNDS FOR A LICENSE BY IMPLICATION, ESTOPPEL, OR
42OTHERWISE IN ANY INTELLECTUAL PROPERTY RIGHTS (PATENT, COPYRIGHT,
43TRADE SECRET, MASK WORK, OR OTHER PROPRIETARY RIGHT) EMBODIED IN
44ANY OTHER QLOGIC HARDWARE OR SOFTWARE EITHER SOLELY OR IN
45COMBINATION WITH THIS PROGRAM.
46
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index afac780445cd..dc942eaf490f 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -192,6 +192,17 @@ or, for backwards compatibility, the option value. E.g.,
192arp_interval
193
194 Specifies the ARP link monitoring frequency in milliseconds.
195
196 The ARP monitor works by periodically checking the slave
197 devices to determine whether they have sent or received
198 traffic recently (the precise criteria depend upon the
199 bonding mode, and the state of the slave). Regular traffic is
200 generated via ARP probes issued for the addresses specified by
201 the arp_ip_target option.
202
203 This behavior can be modified by the arp_validate option,
204 below.
205
206 If ARP monitoring is used in an etherchannel compatible mode
207 (modes 0 and 2), the switch should be configured in a mode
208 that evenly distributes packets across all links. If the
@@ -213,6 +224,54 @@ arp_ip_target
224 maximum number of targets that can be specified is 16. The
225 default value is no IP addresses.
226
227arp_validate
228
229 Specifies whether or not ARP probes and replies should be
230 validated in the active-backup mode. This causes the ARP
231 monitor to examine the incoming ARP requests and replies, and
232 only consider a slave to be up if it is receiving the
233 appropriate ARP traffic.
234
235 Possible values are:
236
237 none or 0
238
239 No validation is performed. This is the default.
240
241 active or 1
242
243 Validation is performed only for the active slave.
244
245 backup or 2
246
247 Validation is performed only for backup slaves.
248
249 all or 3
250
251 Validation is performed for all slaves.
252
253 For the active slave, the validation checks ARP replies to
254 confirm that they were generated by an arp_ip_target. Since
255 backup slaves do not typically receive these replies, the
256 validation performed for backup slaves is on the ARP request
257 sent out via the active slave. It is possible that some
258 switch or network configurations may result in situations
259 wherein the backup slaves do not receive the ARP requests; in
260 such a situation, validation of backup slaves must be
261 disabled.
262
263 This option is useful in network configurations in which
264 multiple bonding hosts are concurrently issuing ARPs to one or
265 more targets beyond a common switch. Should the link between
266 the switch and target fail (but not the switch itself), the
267 probe traffic generated by the multiple bonding instances will
268 fool the standard ARP monitor into considering the links as
269 still up. Use of the arp_validate option can resolve this, as
270 the ARP monitor will only consider ARP requests and replies
271 associated with its own instance of bonding.
272
273 This option was added in bonding version 3.1.0.
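
The four arp_validate settings can be read as a two-bit mask, since "all" (3) combines "active" (1) and "backup" (2). The sketch below illustrates that reading. The helper names are hypothetical and the bitmask interpretation is an assumption drawn from the values listed above, not a description of the driver's code.

```python
ARP_VALIDATE = {"none": 0, "active": 1, "backup": 2, "all": 3}


def parse_arp_validate(value):
    """Accept either the option name or its numeric value."""
    text = str(value).strip().lower()
    if text in ARP_VALIDATE:
        return ARP_VALIDATE[text]
    if text.isdigit() and int(text) in ARP_VALIDATE.values():
        return int(text)
    raise ValueError("invalid arp_validate value: %r" % (value,))


def slave_is_validated(setting, slave_is_active):
    """True if ARP traffic on this slave should be validated."""
    bit = 1 if slave_is_active else 2   # assumed: active=bit 0, backup=bit 1
    return bool(parse_arp_validate(setting) & bit)
```

With this reading, "backup" validates only the backup slaves, so slave_is_validated("backup", True) is False for the active slave.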
274
275downdelay
276
277 Specifies the time, in milliseconds, to wait before disabling
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt
index c45daabd3bfe..74563b38ffd9 100644
--- a/Documentation/networking/dccp.txt
+++ b/Documentation/networking/dccp.txt
@@ -1,7 +1,6 @@
 DCCP protocol
 ============

-Last updated: 10 November 2005

 Contents
 ========
@@ -42,8 +41,11 @@ Socket options
 DCCP_SOCKOPT_PACKET_SIZE is used for CCID3 to set default packet size for
 calculations.

-DCCP_SOCKOPT_SERVICE sets the service. This is compulsory as per the
-specification. If you don't set it you will get EPROTO.
+DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
+service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
+the socket will fall back to 0 (which means that no meaningful service code
+is present). Connecting sockets set at most one service option; for
+listening sockets, multiple service codes can be specified.

 Notes
 =====
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index d46338af6002..935e298f674a 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -102,9 +102,15 @@ inet_peer_gc_maxtime - INTEGER
 TCP variables:

 tcp_abc - INTEGER
-	Controls Appropriate Byte Count defined in RFC3465. If set to
-	0 then does congestion avoid once per ack. 1 is conservative
-	value, and 2 is more agressive.
+	Controls Appropriate Byte Count (ABC) defined in RFC3465.
+	ABC is a way of increasing congestion window (cwnd) more slowly
+	in response to partial acknowledgments.
+	Possible values are:
+		0 increase cwnd once per acknowledgment (no ABC)
+		1 increase cwnd once per acknowledgment of full sized segment
+		2 allow increase cwnd by two if acknowledgment is
+		  of two segments to compensate for delayed acknowledgments.
+	Default: 0 (off)

 tcp_syn_retries - INTEGER
 	Number of times initial SYNs for an active TCP connection attempt
@@ -294,15 +300,15 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
 	Default: 87380*2 bytes.

 tcp_mem - vector of 3 INTEGERs: min, pressure, max
-	low: below this number of pages TCP is not bothered about its
+	min: below this number of pages TCP is not bothered about its
 	memory appetite.

 	pressure: when amount of memory allocated by TCP exceeds this number
 	of pages, TCP moderates its memory consumption and enters memory
 	pressure mode, which is exited when memory consumption falls
-	under "low".
+	under "min".

-	high: number of pages allowed for queueing by all TCP sockets.
+	max: number of pages allowed for queueing by all TCP sockets.

 	Defaults are calculated at boot time from amount of available
 	memory.
@@ -369,6 +375,41 @@ tcp_slow_start_after_idle - BOOLEAN
375 be timed out after an idle period.
376 Default: 1
377
378CIPSOv4 Variables:
379
380cipso_cache_enable - BOOLEAN
381 If set, enable additions to and lookups from the CIPSO label mapping
382 cache. If unset, additions are ignored and lookups always result in a
383 miss. However, regardless of the setting the cache is still
384 invalidated when required which means you can safely toggle this on and
385 off and the cache will always be "safe".
386 Default: 1
387
388cipso_cache_bucket_size - INTEGER
389 The CIPSO label cache consists of a fixed size hash table with each
390 hash bucket containing a number of cache entries. This variable limits
391 the number of entries in each hash bucket; the larger the value the
392 more CIPSO label mappings that can be cached. When the number of
393 entries in a given hash bucket reaches this limit adding new entries
394 causes the oldest entry in the bucket to be removed to make room.
395 Default: 10
396
397cipso_rbm_optfmt - BOOLEAN
398 Enable the "Optimized Tag 1 Format" as defined in section 3.4.2.6 of
399 the CIPSO draft specification (see Documentation/netlabel for details).
400 This means that when set the CIPSO tag will be padded with empty
401 categories in order to make the packet data 32-bit aligned.
402 Default: 0
403
404cipso_rbm_structvalid - BOOLEAN
405 If set, do a very strict check of the CIPSO option when
406 ip_options_compile() is called. If unset, relax the checks done during
407 ip_options_compile(). Either way is "safe" as errors are caught
408 elsewhere in the CIPSO processing code but setting this to 0 (False) should
409 result in less work (i.e. it should be faster) but could cause problems
410 with other implementations that require strict checking.
411 Default: 0
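
The behaviour described for cipso_cache_enable and cipso_cache_bucket_size can be modelled with a short sketch. It is an illustration only, not the kernel implementation: the class name, the number of buckets, and the use of Python's hash() are assumptions.

```python
from collections import OrderedDict


class LabelCache:
    """Fixed-size hash table whose buckets each hold a bounded number
    of entries; a full bucket drops its oldest entry on insert."""

    def __init__(self, buckets=128, bucket_size=10, enabled=True):
        self.buckets = [OrderedDict() for _ in range(buckets)]
        self.bucket_size = bucket_size   # models cipso_cache_bucket_size
        self.enabled = enabled           # models cipso_cache_enable

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def add(self, key, value):
        if not self.enabled:
            return                       # additions are ignored
        bucket = self._bucket(key)
        bucket.pop(key, None)            # re-adding refreshes the entry
        if len(bucket) >= self.bucket_size:
            bucket.popitem(last=False)   # evict the oldest entry
        bucket[key] = value

    def lookup(self, key):
        if not self.enabled:
            return None                  # lookups always miss
        return self._bucket(key).get(key)

    def invalidate(self):
        # Works regardless of the enabled flag, so toggling stays safe.
        for bucket in self.buckets:
            bucket.clear()
```

Because invalidation ignores the enabled flag, entries added before the cache was disabled cannot be served stale after it is re-enabled and invalidated.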
412
413IP Variables:
414
415ip_local_port_range - 2 INTEGERS
@@ -724,6 +765,9 @@ conf/all/forwarding - BOOLEAN
765
766 This referred to as global forwarding.
767
768proxy_ndp - BOOLEAN
769 Do proxy ndp.
770
771conf/interface/*:
772 Change special settings per interface.
773
diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt
new file mode 100644
index 000000000000..4ccdbca03811
--- /dev/null
+++ b/Documentation/networking/ipvs-sysctl.txt
@@ -0,0 +1,143 @@
1/proc/sys/net/ipv4/vs/* Variables:
2
3am_droprate - INTEGER
4 default 10
5
6 It sets the always mode drop rate, which is used in mode 3
7 of the drop_packet defense.
8
9amemthresh - INTEGER
10 default 1024
11
12 It sets the available memory threshold (in pages), which is
13 used in the automatic modes of defense. When there is not
14 enough available memory, the respective strategy will be
15 enabled and the variable is automatically set to 2, otherwise
16 the strategy is disabled and the variable is set to 1.
17
18cache_bypass - BOOLEAN
19 0 - disabled (default)
20 not 0 - enabled
21
22 If it is enabled, forward packets to the original destination
23 directly when no cache server is available and the destination
24 address is not local (iph->daddr is RTN_UNICAST). It is mostly
25 used in a transparent web cache cluster.
26
27debug_level - INTEGER
28 0 - transmission error messages (default)
29 1 - non-fatal error messages
30 2 - configuration
31 3 - destination trash
32 4 - drop entry
33 5 - service lookup
34 6 - scheduling
35 7 - connection new/expire, lookup and synchronization
36 8 - state transition
37 9 - binding destination, template checks and applications
38 10 - IPVS packet transmission
39 11 - IPVS packet handling (ip_vs_in/ip_vs_out)
40 12 or more - packet traversal
41
42 Only available when IPVS is compiled with CONFIG_IPVS_DEBUG enabled.
43
44 Higher debugging levels include the messages for lower debugging
45 levels, so setting debug level 2 includes level 0, 1 and 2
46 messages. Thus, logging becomes more and more verbose the higher
47 the level.
48
49drop_entry - INTEGER
50 0 - disabled (default)
51
52 The drop_entry defense randomly drops entries in the
53 connection hash table in order to reclaim some memory for
54 new connections. In the current code, the drop_entry
55 procedure can be activated every second; it then randomly
56 scans 1/32 of the whole table and drops entries that are in
57 the SYN-RECV/SYNACK state, which should be effective against
58 SYN flooding attacks.
59
60 The valid values of drop_entry are from 0 to 3, where 0 means
61 that this strategy is always disabled, 1 and 2 mean automatic
62 modes (when there is not enough available memory, the strategy
63 is enabled and the variable is automatically set to 2,
64 otherwise the strategy is disabled and the variable is set to
65 1), and 3 means that the strategy is always enabled.
66
67drop_packet - INTEGER
68 0 - disabled (default)
69
70 The drop_packet defense is designed to drop 1/rate of the packets
71 before forwarding them to real servers. If the rate is 1, then
72 all the incoming packets are dropped.
73
74 The value definition is the same as that of drop_entry. In
75 the automatic mode, the rate is determined by the following
76 formula: rate = amemthresh / (amemthresh - available_memory)
77 when available memory is less than the available memory
78 threshold. When mode 3 is set, the always mode drop rate
79 is controlled by /proc/sys/net/ipv4/vs/am_droprate.
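The automatic-mode formula above can be worked through numerically. This is a minimal sketch, assuming both quantities are expressed in pages and using floating-point division for readability (the kernel's own arithmetic may differ); `drop_rate` is a hypothetical helper name, not a kernel function.

```python
def drop_rate(amemthresh, available_memory):
    """Return the automatic-mode rate, or None when memory is not below the threshold."""
    if available_memory >= amemthresh:
        return None  # enough memory: the formula does not apply
    return amemthresh / (amemthresh - available_memory)


# With the default threshold of 1024 pages and 512 pages available,
# the rate is 2, so 1/2 of the packets are dropped.
rate = drop_rate(1024, 512)
print(rate, 1 / rate)

# As available memory approaches zero the rate approaches 1,
# which means every incoming packet is dropped.
print(drop_rate(1024, 0))
```

The rate falls toward 1 as memory pressure grows, so the fraction of dropped packets rises smoothly instead of switching on all at once.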
80
81expire_nodest_conn - BOOLEAN
82 0 - disabled (default)
83 not 0 - enabled
84
85 By default (0), the load balancer silently drops packets
86 when their destination server is not available. This may
87 be useful when a user-space monitoring program deletes the
88 destination server (because of server overload or wrong
89 detection) and adds the server back later, so that the
90 connections to the server can continue.
91
92 If this feature is enabled, the load balancer will expire the
93 connection immediately when a packet arrives and its
94 destination server is not available, then the client program
95 will be notified that the connection is closed. This is
96 equivalent to the feature some people require to flush
97 connections when their destination is not available.
98
99expire_quiescent_template - BOOLEAN
100 0 - disabled (default)
101 not 0 - enabled
102
103 When set to a non-zero value, the load balancer will expire
104 persistent templates when the destination server is quiescent.
105 This may be useful, when a user makes a destination server
106 quiescent by setting its weight to 0 and it is desired that
107 subsequent otherwise persistent connections are sent to a
108 different destination server. By default new persistent
109 connections are allowed to quiescent destination servers.
110
111 If this feature is enabled, the load balancer will expire the
112 persistence template if it is to be used to schedule a new
113 connection and the destination server is quiescent.
114
115nat_icmp_send - BOOLEAN
116 0 - disabled (default)
117 not 0 - enabled
118
119 It controls sending icmp error messages (ICMP_DEST_UNREACH)
120 for VS/NAT when the load balancer receives packets from real
121 servers but the connection entries don't exist.
122
123secure_tcp - INTEGER
124 0 - disabled (default)
125
126 The secure_tcp defense uses a more complicated state
127 transition table and possibly short timeouts for each
128 state. In VS/NAT, it delays entering the ESTABLISHED state
129 until the real server starts to send data and ACK packets
130 (after the 3-way handshake).
131
132 The value definition is the same as that of drop_entry or
133 drop_packet.
134
135sync_threshold - INTEGER
136 default 3
137
138 It sets the synchronization threshold, which is the minimum number
139 of incoming packets that a connection needs to receive before
140 the connection will be synchronized. A connection will be
141 synchronized every time the number of its incoming packets
142 modulo 50 equals the threshold. The range of the threshold is
143 from 0 to 49.
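The modulo rule above can be illustrated with a short sketch. It assumes a simple per-connection packet counter; `should_sync` is a hypothetical name used only for illustration.

```python
def should_sync(in_pkts, sync_threshold=3):
    """True when the incoming packet count modulo 50 equals the threshold."""
    return in_pkts % 50 == sync_threshold


# With the default threshold of 3, a connection is synchronized on its
# 3rd incoming packet and then once every 50 packets after that.
sync_points = [n for n in range(1, 121) if should_sync(n)]
print(sync_points)  # [3, 53, 103]
```

This also shows why the threshold must lie between 0 and 49: a larger value could never equal a remainder of division by 50.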
diff --git a/Documentation/networking/pktgen.txt b/Documentation/networking/pktgen.txt
index 278771c9ad99..18d385c068fc 100644
--- a/Documentation/networking/pktgen.txt
+++ b/Documentation/networking/pktgen.txt
@@ -74,7 +74,7 @@ Examples:
74 pgset "pkt_size 9014" sets packet size to 9014 74 pgset "pkt_size 9014" sets packet size to 9014
75 pgset "frags 5" packet will consist of 5 fragments 75 pgset "frags 5" packet will consist of 5 fragments
76 pgset "count 200000" sets number of packets to send, set to zero 76 pgset "count 200000" sets number of packets to send, set to zero
77 for continious sends untill explicitl stopped. 77 for continuous sends until explicitly stopped.
78 78
79 pgset "delay 5000" adds delay to hard_start_xmit(). nanoseconds 79 pgset "delay 5000" adds delay to hard_start_xmit(). nanoseconds
80 80
@@ -100,6 +100,7 @@ Examples:
100 are: IPSRC_RND #IP Source is random (between min/max), 100 are: IPSRC_RND #IP Source is random (between min/max),
101 IPDST_RND, UDPSRC_RND, 101 IPDST_RND, UDPSRC_RND,
102 UDPDST_RND, MACSRC_RND, MACDST_RND 102 UDPDST_RND, MACSRC_RND, MACDST_RND
103 MPLS_RND, VID_RND, SVID_RND
103 104
104 pgset "udp_src_min 9" set UDP source port min, If < udp_src_max, then 105 pgset "udp_src_min 9" set UDP source port min, If < udp_src_max, then
105 cycle through the port range. 106 cycle through the port range.
@@ -125,6 +126,21 @@ Examples:
125 126
126 pgset "mpls 0" turn off mpls (or any invalid argument works too!) 127 pgset "mpls 0" turn off mpls (or any invalid argument works too!)
127 128
129 pgset "vlan_id 77" set VLAN ID 0-4095
130 pgset "vlan_p 3" set priority bit 0-7 (default 0)
131 pgset "vlan_cfi 0" set canonical format identifier 0-1 (default 0)
132
133 pgset "svlan_id 22" set SVLAN ID 0-4095
134 pgset "svlan_p 3" set priority bit 0-7 (default 0)
135 pgset "svlan_cfi 0" set canonical format identifier 0-1 (default 0)
136
137 pgset "vlan_id 9999" > 4095 remove vlan and svlan tags
138 pgset "svlan_id 9999" > 4095 remove svlan tag
139
140
141 pgset "tos XX" set former IPv4 TOS field (e.g. "tos 28" for AF11 no ECN, default 00)
142 pgset "traffic_class XX" set former IPv6 TRAFFIC CLASS (e.g. "traffic_class B8" for EF no ECN, default 00)
143
128 pgset stop aborts injection. Also, ^C aborts generator. 144 pgset stop aborts injection. Also, ^C aborts generator.
129 145
130 146
diff --git a/Documentation/networking/secid.txt b/Documentation/networking/secid.txt
new file mode 100644
index 000000000000..95ea06784333
--- /dev/null
+++ b/Documentation/networking/secid.txt
@@ -0,0 +1,14 @@
1flowi structure:
2
3The secid member in the flow structure is used in LSMs (e.g. SELinux) to indicate
4the label of the flow. This label is currently used in selecting
5matching labeled xfrm(s).
6
7If this is an outbound flow, the label is derived from the socket, if any, or
8the incoming packet this flow is being generated as a response to (e.g. tcp
9resets, timewait ack, etc.). It is also conceivable that the label could be
10derived from other sources such as process context, device, etc., in special
11cases, as may be appropriate.
12
13If this is an inbound flow, the label is derived from the IPSec security
14associations, if any, used by the packet.
diff --git a/Documentation/nfsroot.txt b/Documentation/nfsroot.txt
index d56dc71d9430..3cc953cb288f 100644
--- a/Documentation/nfsroot.txt
+++ b/Documentation/nfsroot.txt
@@ -4,15 +4,16 @@ Mounting the root filesystem via NFS (nfsroot)
4Written 1996 by Gero Kuhlmann <gero@gkminix.han.de> 4Written 1996 by Gero Kuhlmann <gero@gkminix.han.de>
5Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz> 5Updated 1997 by Martin Mares <mj@atrey.karlin.mff.cuni.cz>
6Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org> 6Updated 2006 by Nico Schottelius <nico-kernel-nfsroot@schottelius.org>
7Updated 2006 by Horms <horms@verge.net.au>
7 8
8 9
9 10
10If you want to use a diskless system, as an X-terminal or printer 11In order to use a diskless system, such as an X-terminal or printer server
11server for example, you have to put your root filesystem onto a 12for example, it is necessary for the root filesystem to be present on a
12non-disk device. This can either be a ramdisk (see initrd.txt in 13non-disk device. This may be an initramfs (see Documentation/filesystems/
13this directory for further information) or a filesystem mounted 14ramfs-rootfs-initramfs.txt), a ramdisk (see Documenation/initrd.txt) or a
14via NFS. The following text describes on how to use NFS for the 15filesystem mounted via NFS. The following text describes on how to use NFS
15root filesystem. For the rest of this text 'client' means the 16for the root filesystem. For the rest of this text 'client' means the
16diskless system, and 'server' means the NFS server. 17diskless system, and 'server' means the NFS server.
17 18
18 19
@@ -21,11 +22,13 @@ diskless system, and 'server' means the NFS server.
211.) Enabling nfsroot capabilities 221.) Enabling nfsroot capabilities
22 ----------------------------- 23 -----------------------------
23 24
24In order to use nfsroot you have to select support for NFS during 25In order to use nfsroot, NFS client support needs to be selected as
25kernel configuration. Note that NFS cannot be loaded as a module 26built-in during configuration. Once this has been selected, the nfsroot
26in this case. The configuration script will then ask you whether 27option will become available, which should also be selected.
27you want to use nfsroot, and if yes what kind of auto configuration 28
28system you want to use. Selecting both BOOTP and RARP is safe. 29In the networking options, kernel level autoconfiguration can be selected,
30along with the types of autoconfiguration to support. Selecting all of
31DHCP, BOOTP and RARP is safe.
29 32
30 33
31 34
@@ -33,11 +36,10 @@ system you want to use. Selecting both BOOTP and RARP is safe.
332.) Kernel command line 362.) Kernel command line
34 ------------------- 37 -------------------
35 38
36When the kernel has been loaded by a boot loader (either by loadlin, 39When the kernel has been loaded by a boot loader (see below) it needs to be
37LILO or a network boot program) it has to be told what root fs device 40told what root fs device to use. And in the case of nfsroot, where to find
38to use, and where to find the server and the name of the directory 41both the server and the name of the directory on the server to mount as root.
39on the server to mount as root. This can be established by a couple 42This can be established using the following kernel command line parameters:
40of kernel command line parameters:
41 43
42 44
43root=/dev/nfs 45root=/dev/nfs
@@ -49,23 +51,21 @@ root=/dev/nfs
49 51
50nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>] 52nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
51 53
52 If the `nfsroot' parameter is NOT given on the command line, the default 54 If the `nfsroot' parameter is NOT given on the command line,
53 "/tftpboot/%s" will be used. 55 the default "/tftpboot/%s" will be used.
54 56
55 <server-ip> Specifies the IP address of the NFS server. If this field 57 <server-ip> Specifies the IP address of the NFS server.
56 is not given, the default address as determined by the 58 The default address is determined by the `ip' parameter
57 `ip' variable (see below) is used. One use of this 59 (see below). This parameter allows the use of different
58 parameter is for example to allow using different servers 60 servers for IP autoconfiguration and NFS.
59 for RARP and NFS. Usually you can leave this blank.
60 61
61 <root-dir> Name of the directory on the server to mount as root. If 62 <root-dir> Name of the directory on the server to mount as root.
62 there is a "%s" token in the string, the token will be 63 If there is a "%s" token in the string, it will be
63 replaced by the ASCII-representation of the client's IP 64 replaced by the ASCII-representation of the client's
64 address. 65 IP address.
65 66
66 <nfs-options> Standard NFS options. All options are separated by commas. 67 <nfs-options> Standard NFS options. All options are separated by commas.
67 If the options field is not given, the following defaults 68 The following defaults are used:
68 will be used:
69 port = as given by server portmap daemon 69 port = as given by server portmap daemon
70 rsize = 1024 70 rsize = 1024
71 wsize = 1024 71 wsize = 1024
@@ -81,129 +81,174 @@ nfsroot=[<server-ip>:]<root-dir>[,<nfs-options>]
81ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf> 81ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
82 82
83 This parameter tells the kernel how to configure IP addresses of devices 83 This parameter tells the kernel how to configure IP addresses of devices
84 and also how to set up the IP routing table. It was originally called `nfsaddrs', 84 and also how to set up the IP routing table. It was originally called
85 but now the boot-time IP configuration works independently of NFS, so it 85 `nfsaddrs', but now the boot-time IP configuration works independently of
86 was renamed to `ip' and the old name remained as an alias for compatibility 86 NFS, so it was renamed to `ip' and the old name remained as an alias for
87 reasons. 87 compatibility reasons.
88 88
89 If this parameter is missing from the kernel command line, all fields are 89 If this parameter is missing from the kernel command line, all fields are
90 assumed to be empty, and the defaults mentioned below apply. In general 90 assumed to be empty, and the defaults mentioned below apply. In general
91 this means that the kernel tries to configure everything using both 91 this means that the kernel tries to configure everything using
92 RARP and BOOTP (depending on what has been enabled during kernel confi- 92 autoconfiguration.
93 guration, and if both what protocol answer got in first). 93
94 The <autoconf> parameter can appear alone as the value to the `ip'
95 parameter (without all the ':' characters before) in which case auto-
96 configuration is used.
97
98 <client-ip> IP address of the client.
94 99
95 <client-ip> IP address of the client. If empty, the address will either 100 Default: Determined using autoconfiguration.
96 be determined by RARP or BOOTP. What protocol is used de-
97 pends on what has been enabled during kernel configuration
98 and on the <autoconf> parameter. If this parameter is not
99 empty, neither RARP nor BOOTP will be used.
100 101
101 <server-ip> IP address of the NFS server. If RARP is used to determine 102 <server-ip> IP address of the NFS server. If RARP is used to determine
102 the client address and this parameter is NOT empty only 103 the client address and this parameter is NOT empty only
103 replies from the specified server are accepted. To use 104 replies from the specified server are accepted.
104 different RARP and NFS server, specify your RARP server 105
105 here (or leave it blank), and specify your NFS server in 106 Only required for NFS root. That is, autoconfiguration
106 the `nfsroot' parameter (see above). If this entry is blank 107 will not be triggered if it is missing and NFS root is not
107 the address of the server is used which answered the RARP 108 in operation.
108 or BOOTP request. 109
109 110 Default: Determined using autoconfiguration.
110 <gw-ip> IP address of a gateway if the server is on a different 111 The address of the autoconfiguration server is used.
111 subnet. If this entry is empty no gateway is used and the 112
112 server is assumed to be on the local network, unless a 113 <gw-ip> IP address of a gateway if the server is on a different subnet.
113 value has been received by BOOTP. 114
114 115 Default: Determined using autoconfiguration.
115 <netmask> Netmask for local network interface. If this is empty, 116
117 <netmask> Netmask for local network interface. If unspecified
116 the netmask is derived from the client IP address assuming 118 the netmask is derived from the client IP address assuming
117 classful addressing, unless overridden in BOOTP reply. 119 classful addressing.
118 120
119 <hostname> Name of the client. If empty, the client IP address is 121 Default: Determined using autoconfiguration.
120 used in ASCII notation, or the value received by BOOTP.
121 122
122 <device> Name of network device to use. If this is empty, all 123 <hostname> Name of the client. May be supplied by autoconfiguration,
123 devices are used for RARP and BOOTP requests, and the 124 but its absence will not trigger autoconfiguration.
124 first one we receive a reply on is configured. If you have
125 only one device, you can safely leave this blank.
126 125
127 <autoconf> Method to use for autoconfiguration. If this is either 126 Default: Client IP address is used in ASCII notation.
128 'rarp' or 'bootp', the specified protocol is used.
129 If the value is 'both' or empty, both protocols are used
130 so far as they have been enabled during kernel configura-
131 tion. 'off' means no autoconfiguration.
132 127
133 The <autoconf> parameter can appear alone as the value to the `ip' 128 <device> Name of network device to use.
134 parameter (without all the ':' characters before) in which case auto- 129
135 configuration is used. 130 Default: If the host only has one device, it is used.
131 Otherwise the device is determined using
132 autoconfiguration. This is done by sending
133 autoconfiguration requests out of all devices,
134 and using the device that received the first reply.
136 135
136 <autoconf> Method to use for autoconfiguration. In the case of options
137 which specify multiple autoconfiguration protocols,
138 requests are sent using all protocols, and the first one
139 to reply is used.
137 140
141 Only autoconfiguration protocols that have been compiled
142 into the kernel will be used, regardless of the value of
143 this option.
138 144
145 off or none: don't use autoconfiguration (default)
146 on or any: use any protocol available in the kernel
147 dhcp: use DHCP
148 bootp: use BOOTP
149 rarp: use RARP
150 both: use both BOOTP and RARP but not DHCP
151 (old option kept for backwards compatibility)
139 152
1403.) Kernel loader 153 Default: any
141 -------------
142 154
143To get the kernel into memory different approaches can be used. They
144depend on what facilities are available:
145 155
146 156
1473.1) Writing the kernel onto a floppy using dd:
148 As always you can just write the kernel onto a floppy using dd,
149 but then it's not possible to use kernel command lines at all.
150 To substitute the 'root=' parameter, create a dummy device on any
151 linux system with major number 0 and minor number 255 using mknod:
152 157
153 mknod /dev/boot255 c 0 255 1583.) Boot Loader
159 ----------
154 160
155 Then copy the kernel zImage file onto a floppy using dd: 161To get the kernel into memory different approaches can be used.
162They depend on various facilities being available:
156 163
157 dd if=/usr/src/linux/arch/i386/boot/zImage of=/dev/fd0
158 164
159 And finally use rdev to set the root device: 1653.1) Booting from a floppy using syslinux
160 166
161 rdev /dev/fd0 /dev/boot255 167 When building kernels, an easy way to create a boot floppy that uses
168 syslinux is to use the zdisk or bzdisk make targets which use
169 zimage and bzimage images respectively. Both targets accept the
170 FDARGS parameter which can be used to set the kernel command line.
162 171
163 You can then remove the dummy device /dev/boot255 again. There 172 e.g.
164 is no real device available for it. 173 make bzdisk FDARGS="root=/dev/nfs"
165 The other two kernel command line parameters cannot be substi- 174
166 tuted with rdev. Therefore, using this method the kernel will 175 Note that the user running this command will need to have
167 by default use RARP and/or BOOTP, and if it gets an answer via 176 access to the floppy drive device, /dev/fd0
168 RARP will mount the directory /tftpboot/<client-ip>/ as its 177
169 root. If it got a BOOTP answer the directory name in that answer 178 For more information on syslinux, including how to create bootdisks
170 is used. 179 for prebuilt kernels, see http://syslinux.zytor.com/
180
181 N.B: Previously it was possible to write a kernel directly to
182 a floppy using dd, configure the boot device using rdev, and
183 boot using the resulting floppy. Linux no longer supports this
184 method of booting.
185
1863.2) Booting from a cdrom using isolinux
187
188 When building kernels, an easy way to create a bootable cdrom that
189 uses isolinux is to use the isoimage target which uses a bzimage
190 image. Like zdisk and bzdisk, this target accepts the FDARGS
191 parameter which can be used to set the kernel command line.
192
193 e.g.
194 make isoimage FDARGS="root=/dev/nfs"
195
196 The resulting iso image will be arch/<ARCH>/boot/image.iso
197 This can be written to a cdrom using a variety of tools including
198 cdrecord.
199
200 e.g.
201 cdrecord dev=ATAPI:1,0,0 arch/i386/boot/image.iso
202
203 For more information on isolinux, including how to create bootdisks
204 for prebuilt kernels, see http://syslinux.zytor.com/
171 205
1723.2) Using LILO 2063.2) Using LILO
173 When using LILO you can specify all necessary command line 207 When using LILO all the necessary command line parameters may be
174 parameters with the 'append=' command in the LILO configuration 208 specified using the 'append=' directive in the LILO configuration
175 file. However, to use the 'root=' command you also need to 209 file.
176 set up a dummy device as described in 3.1 above. For how to use 210
177 LILO and its 'append=' command please refer to the LILO 211 However, to use the 'root=' directive you also need to create
178 documentation. 212 a dummy root device, which may be removed after LILO is run.
213
214 mknod /dev/boot255 c 0 255
215
216 For information on configuring LILO, please refer to its documentation.
179 217
1803.3) Using GRUB 2183.3) Using GRUB
181 When you use GRUB, you simply append the parameters after the kernel 219 When using GRUB, kernel parameters are simply appended after the kernel
182 specification: "kernel <kernel> <parameters>" (without the quotes). 220 specification: kernel <kernel> <parameters>
183 221
1843.4) Using loadlin 2223.4) Using loadlin
185 When you want to boot Linux from a DOS command prompt without 223 loadlin may be used to boot Linux from a DOS command prompt without
186 having a local hard disk to mount as root, you can use loadlin. 224 requiring a local hard disk to mount as root. This has not been
187 I was told that it works, but haven't used it myself yet. In 225 thoroughly tested by the authors of this document, but in general
188 general you should be able to create a kernel command line simi- 226 it should be possible to configure the kernel command line similarly
189 lar to how LILO is doing it. Please refer to the loadlin docu- 227 to the configuration of LILO.
190 mentation for further information. 228
229 Please refer to the loadlin documentation for further information.
191 230
1923.5) Using a boot ROM 2313.5) Using a boot ROM
193 This is probably the most elegant way of booting a diskless 232 This is probably the most elegant way of booting a diskless client.
194 client. With a boot ROM the kernel gets loaded using the TFTP 233 With a boot ROM the kernel is loaded using the TFTP protocol. The
195 protocol. As far as I know, no commercial boot ROMs yet 234 authors of this document are not aware of any commercial boot
196 support booting Linux over the network, but there are two 235 ROMs that support booting Linux over the network. However, there
197 free implementations of a boot ROM available on sunsite.unc.edu 236 are two free implementations of a boot ROM, netboot-nfs and
198 and its mirrors. They are called 'netboot-nfs' and 'etherboot'. 237 etherboot, both of which are available on sunsite.unc.edu, and both
199 Both contain everything you need to boot a diskless Linux client. 238 of which contain everything you need to boot a diskless Linux client.
200 239
2013.6) Using pxelinux 2403.6) Using pxelinux
202 Using pxelinux you specify the kernel you built with 241 Pxelinux may be used to boot linux using the PXE boot loader
242 which is present on many modern network cards.
243
244 When using pxelinux, the kernel image is specified using
203 "kernel <relative-path-below /tftpboot>". The nfsroot parameters 245 "kernel <relative-path-below /tftpboot>". The nfsroot parameters
204 are passed to the kernel by adding them to the "append" line. 246 are passed to the kernel by adding them to the "append" line.
205 You may perhaps also want to fine tune the console output, 247 It is common to use a serial console in conjunction with pxelinux,
206 see Documentation/serial-console.txt for serial console help. 248 see Documentation/serial-console.txt for more information.
249
250 For more information on pxelinux, including how to create bootdisks
251 for prebuilt kernels, see http://syslinux.zytor.com/
207 252
208 253
209 254
diff --git a/Documentation/nommu-mmap.txt b/Documentation/nommu-mmap.txt
index b88ebe4d808c..7714f57caad5 100644
--- a/Documentation/nommu-mmap.txt
+++ b/Documentation/nommu-mmap.txt
@@ -116,6 +116,9 @@ FURTHER NOTES ON NO-MMU MMAP
116 (*) A list of all the mappings on the system is visible through /proc/maps in 116 (*) A list of all the mappings on the system is visible through /proc/maps in
117 no-MMU mode. 117 no-MMU mode.
118 118
119 (*) A list of all the mappings in use by a process is visible through
120 /proc/<pid>/maps in no-MMU mode.
121
119 (*) Supplying MAP_FIXED or requesting a particular mapping address will 122 (*) Supplying MAP_FIXED or requesting a particular mapping address will
120 result in an error. 123 result in an error.
121 124
@@ -125,6 +128,49 @@ FURTHER NOTES ON NO-MMU MMAP
125 error will result if they don't. This is most likely to be encountered 128 error will result if they don't. This is most likely to be encountered
126 with character device files, pipes, fifos and sockets. 129 with character device files, pipes, fifos and sockets.
127 130
131
132==========================
133INTERPROCESS SHARED MEMORY
134==========================
135
136Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU
137mode. The former through the usual mechanism, the latter through files created
138on ramfs or tmpfs mounts.
139
140
141=======
142FUTEXES
143=======
144
145Futexes are supported in NOMMU mode if the arch supports them. An error will
146be given if an address passed to the futex system call lies outside the
147mappings made by a process or if the mapping in which the address lies does not
148support futexes (such as an I/O chardev mapping).
149
150
151=============
152NO-MMU MREMAP
153=============
154
155The mremap() function is partially supported. It may change the size of a
156mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size
157of the mapping exceeds the size of the slab object currently occupied by the
158memory to which the mapping refers, or if a smaller slab object could be used.
159
160MREMAP_FIXED is not supported, though it is ignored if there's no change of
161address and the object does not need to be moved.
162
163Shared mappings may not be moved. Shareable mappings may not be moved either,
164even if they are not currently shared.
165
166The mremap() function must be given an exact match for base address and size of
167a previously mapped object. It may not be used to create holes in existing
168mappings, move parts of existing mappings or resize parts of mappings. It must
169act on a complete mapping.
170
171[*] Not currently supported.
172
173
128============================================ 174============================================
129PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT 175PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT
130============================================ 176============================================
diff --git a/Documentation/pci.txt b/Documentation/pci.txt
index 3242e5c1ee9c..2b395e478961 100644
--- a/Documentation/pci.txt
+++ b/Documentation/pci.txt
@@ -225,7 +225,7 @@ Generic flavors of pci_request_region() are request_mem_region()
225Use these for address resources that are not described by "normal" PCI 225Use these for address resources that are not described by "normal" PCI
226interfaces (e.g. BAR). 226interfaces (e.g. BAR).
227 227
228 All interrupt handlers should be registered with SA_SHIRQ and use the devid 228 All interrupt handlers should be registered with IRQF_SHARED and use the devid
229to map IRQs to devices (remember that all PCI interrupts are shared). 229to map IRQs to devices (remember that all PCI interrupts are shared).
230 230
231 231
diff --git a/Documentation/pcieaer-howto.txt b/Documentation/pcieaer-howto.txt
new file mode 100644
index 000000000000..16c251230c82
--- /dev/null
+++ b/Documentation/pcieaer-howto.txt
@@ -0,0 +1,253 @@
1 The PCI Express Advanced Error Reporting Driver Guide HOWTO
2 T. Long Nguyen <tom.l.nguyen@intel.com>
3 Yanmin Zhang <yanmin.zhang@intel.com>
4 07/29/2006
5
6
71. Overview
8
91.1 About this guide
10
11This guide describes the basics of the PCI Express Advanced Error
12Reporting (AER) driver and provides information on how to use it, as
13well as how to enable the drivers of endpoint devices to conform with
14the PCI Express AER driver.
15
161.2 Copyright © Intel Corporation 2006.
17
181.3 What is the PCI Express AER Driver?
19
20PCI Express error signaling can occur on the PCI Express link itself
21or on behalf of transactions initiated on the link. PCI Express
22defines two error reporting paradigms: the baseline capability and
23the Advanced Error Reporting capability. The baseline capability is
24required of all PCI Express components providing a minimum defined
25set of error reporting requirements. Advanced Error Reporting
26capability is implemented with a PCI Express advanced error reporting
27extended capability structure providing more robust error reporting.
28
29The PCI Express AER driver provides the infrastructure to support PCI
30Express Advanced Error Reporting capability. The PCI Express AER
31driver provides three basic functions:
32
33- Gathers comprehensive error information if errors occur.
34- Reports errors to the users.
35- Performs error recovery actions.
36
37The AER driver only attaches to root ports which support the PCI Express
38AER capability.
39
40
412. User Guide
42
432.1 Include the PCI Express AER Root Driver into the Linux Kernel
44
45The PCI Express AER Root driver is a Root Port service driver attached
46to the PCI Express Port Bus driver. If a user wants to use it, the driver
47has to be compiled. Option CONFIG_PCIEAER supports this capability. It
48depends on CONFIG_PCIEPORTBUS, so please set CONFIG_PCIEPORTBUS=y and
49CONFIG_PCIEAER=y.
50
512.2 Load PCI Express AER Root Driver
52There is a case where a system has AER support in BIOS. Enabling the AER
53Root driver and having AER support in BIOS may result in unpredictable
54behavior. To avoid this conflict, a successful load of the AER Root driver
55requires ACPI _OSC support in the BIOS to allow the AER Root driver to
56request native control of AER. See the PCI FW 3.0 Specification for
57details regarding _OSC usage. Currently, much firmware does not provide
58_OSC support even though it uses PCI Express. To support such firmware,
59forceload, a parameter of type bool, lets AER initialization continue
60even when the firmware has no _OSC support. To enable the
61workaround, add aerdriver.forceload=y to the kernel boot parameter line
62when booting the kernel. Note that forceload=n by default.
63
642.3 AER error output
65When a PCI-E AER error is captured, an error message is output to the
66console. If it's a correctable error, it is output as a warning.
67Otherwise, it is printed as an error. Users can therefore choose a
68log level that filters out correctable error messages.
69
70An example is shown below.
71+------ PCI-Express Device Error -----+
72Error Severity : Uncorrected (Fatal)
73PCIE Bus Error type : Transaction Layer
74Unsupported Request : First
75Requester ID : 0500
76VendorID=8086h, DeviceID=0329h, Bus=05h, Device=00h, Function=00h
77TLB Header:
7804000001 00200a03 05010000 00050100
79
80In the example, 'Requester ID' means the ID of the device that sends
81the error message to the root port. Please refer to the PCI Express
82specifications for the other fields.
83
84
853. Developer Guide
86
87Enabling AER-aware support requires a software driver to configure
88the AER capability structure within its device and to provide callbacks.
89
90To support AER better, developers first need to understand how AER
91works.
92
93PCI Express errors are classified into two types: correctable errors
94and uncorrectable errors. This classification is based on the impacts
95of those errors, which may result in degraded performance or function
96failure.
97
98Correctable errors have no impact on the functionality of the
99interface. The PCI Express protocol can recover without any software
100intervention or any loss of data. These errors are detected and
101corrected by hardware. Unlike correctable errors, uncorrectable
102errors impact functionality of the interface. Uncorrectable errors
103can cause a particular transaction or a particular PCI Express link
104to be unreliable. Depending on those error conditions, uncorrectable
105errors are further classified into non-fatal errors and fatal errors.
106Non-fatal errors cause the particular transaction to be unreliable,
107but the PCI Express link itself is fully functional. Fatal errors, on
108the other hand, cause the link to be unreliable.
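
The classification above can be summarized as a small lookup. The
following is a minimal user-space sketch, not kernel code: the enum,
the struct and the function name are hypothetical and exist only to
restate the three severities and what each one implies for the link
and for software.

```c
#include <assert.h>

/* Hypothetical names for illustration; not kernel identifiers. */
enum aer_severity {
	AER_SEV_CORRECTABLE,	/* corrected by hardware */
	AER_SEV_NONFATAL,	/* a transaction is unreliable */
	AER_SEV_FATAL		/* the link itself is unreliable */
};

struct aer_policy {
	int link_reliable;	/* can the link still be trusted? */
	int needs_software;	/* is software recovery involved? */
	int needs_link_reset;	/* must the upstream link be reset? */
};

/* Map a severity to the consequences described in the text. */
struct aer_policy aer_policy_for(enum aer_severity sev)
{
	struct aer_policy p = { 1, 0, 0 };

	assert(sev <= AER_SEV_FATAL);

	switch (sev) {
	case AER_SEV_CORRECTABLE:
		/* Hardware already recovered; nothing to do. */
		break;
	case AER_SEV_NONFATAL:
		/* Link is fine, but drivers must be told. */
		p.needs_software = 1;
		break;
	case AER_SEV_FATAL:
		/* Link is unreliable and has to be reset. */
		p.link_reliable = 0;
		p.needs_software = 1;
		p.needs_link_reset = 1;
		break;
	}
	return p;
}
```

Only the fatal case marks the link as unreliable, which is why link
reset is tied to fatal errors in section 3.2.2.2.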
109
110When AER is enabled, a PCI Express device will automatically send an
111error message to the PCIE root port above it when the device captures
112an error. The Root Port, upon receiving an error reporting message,
113internally processes and logs the error message in its PCI Express
114capability structure. Error information being logged includes storing
115the error reporting agent's requestor ID into the Error Source
116Identification Registers and setting the error bits of the Root Error
117Status Register accordingly. If AER error reporting is enabled in Root
118Error Command Register, the Root Port generates an interrupt if an
119error is detected.
120
121Note that the errors as described above are related to the PCI Express
122hierarchy and links. These errors do not include any device specific
123errors because device specific errors will still get sent directly to
124the device driver.
125
1263.1 Configure the AER capability structure
127
128AER-aware drivers of PCI Express components need to change the device
129control registers to enable AER. They can also change AER registers,
130including the mask and severity registers. The helper function
131pci_enable_pcie_error_reporting can be used to enable AER. See
132section 3.3.
133
1343.2. Provide callbacks
135
1363.2.1 callback reset_link to reset pci express link
137
138This callback is used to reset the pci express physical link when a
139fatal error happens. The root port aer service driver provides a
140default reset_link function, but different upstream ports might
141have different specifications to reset pci express link, so all
142upstream ports should provide their own reset_link functions.
143
144In struct pcie_port_service_driver, a new pointer, reset_link, is
145added.
146
147pci_ers_result_t (*reset_link) (struct pci_dev *dev);
148
149Section 3.2.2.2 provides more detailed info on when to call
150reset_link.
151
1523.2.2 PCI error-recovery callbacks
153
154The PCI Express AER Root driver uses error callbacks to coordinate
155with downstream device drivers associated with a hierarchy in question
156when performing error recovery actions.
157
158The pci_driver data structure has a pointer, err_handler, that points to
159pci_error_handlers, which consists of several callback function
160pointers. The AER driver follows the rules defined in
161pci-error-recovery.txt except for the PCI Express specific parts (e.g.
162reset_link). Please refer to pci-error-recovery.txt for detailed
163definitions of the callbacks.
164
165The sections below specify when the error callback functions are called.
166
1673.2.2.1 Correctable errors
168
169Correctable errors have no impact on the functionality of
170the interface. The PCI Express protocol can recover without any
171software intervention or any loss of data. These errors do not
172require any recovery actions. The AER driver clears the device's
173correctable error status register accordingly and logs these errors.
174
1753.2.2.2 Non-correctable (non-fatal and fatal) errors
176
177If an error message indicates a non-fatal error, performing link reset
178at upstream is not required. The AER driver calls error_detected(dev,
179pci_channel_io_normal) for all drivers associated with the hierarchy in
180question. For example,
181EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort.
182If Upstream port A captures an AER error, the hierarchy consists of
183Downstream port B and EndPoint.
184
185A driver may return PCI_ERS_RESULT_CAN_RECOVER,
186PCI_ERS_RESULT_DISCONNECT, or PCI_ERS_RESULT_NEED_RESET, depending on
187whether it can recover; if it can, the AER driver calls mmio_enabled next.
188
189If an error message indicates a fatal error, the kernel broadcasts
190error_detected(dev, pci_channel_io_frozen) to all drivers within
191the hierarchy in question. Then, performing link reset at upstream is
192necessary. As different kinds of devices might use different approaches
193to reset the link, the AER port service driver is required to provide the
194function to reset the link. First, the kernel checks whether the upstream
195component has an AER driver. If it has, the kernel uses the reset_link
196callback of that AER driver. If the upstream component has no AER driver
197and the port is a downstream port, the kernel uses the AER driver of the
198root port that reports the AER error. As for upstream ports,
199they should provide their own AER service drivers with a reset_link
200function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
201reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
202to mmio_enabled.
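
The reset_link selection rule for fatal errors can be sketched as
follows. This is a user-space model under stated assumptions, not the
AER driver's code: the types, the enum values standing in for
PCI_ERS_RESULT_*, and every function name here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PCI_ERS_RESULT_* values. */
typedef enum {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED
} ers_result_t;

enum port_type { PORT_ROOT, PORT_UPSTREAM, PORT_DOWNSTREAM };

typedef ers_result_t (*reset_link_fn)(void);

struct port {
	enum port_type type;
	reset_link_fn reset_link;  /* NULL: no AER service driver bound */
};

/* Example reset routine that always reports success. */
ers_result_t demo_reset(void)
{
	return ERS_RECOVERED;
}

/*
 * Pick the reset routine: prefer the upstream component's own one,
 * fall back to the root port's only for a downstream port.
 */
reset_link_fn pick_reset_link(const struct port *upstream,
			      const struct port *root)
{
	assert(upstream != NULL && root != NULL);

	if (upstream->reset_link)
		return upstream->reset_link;
	if (upstream->type == PORT_DOWNSTREAM)
		return root->reset_link;
	/* An upstream port without reset_link: recovery fails. */
	return NULL;
}

/* Recovery proceeds to mmio_enabled only if both steps succeeded. */
int goes_to_mmio_enabled(ers_result_t detected, ers_result_t reset)
{
	return detected == ERS_CAN_RECOVER && reset == ERS_RECOVERED;
}
```

Returning NULL for an upstream port without its own reset_link mirrors
the FAQ answer in section 3.4: fatal error recovery fails in that case.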
203
2043.3 helper functions
205
2063.3.1 int pci_find_aer_capability(struct pci_dev *dev);
207pci_find_aer_capability locates the PCI Express AER capability
208in the device configuration space. If the device doesn't support
209PCI-Express AER, the function returns 0.
210
2113.3.2 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
212pci_enable_pcie_error_reporting enables the device to send error
213messages to the root port when an error is detected. Note that devices
214don't enable error reporting by default, so device drivers need
215to call this function to enable it.
216
2173.3.3 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
218pci_disable_pcie_error_reporting stops the device from sending error
219messages to the root port when an error is detected.
220
2213.3.4 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
222pci_cleanup_aer_uncorrect_error_status clears the uncorrectable
223error status register.
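
A driver might combine these helpers roughly as shown below. This is
only a sketch that compiles in user space: struct pci_dev and the three
helper bodies are stand-in stubs written for this example (the real
ones live in the kernel), the two struct fields are hypothetical, and
demo_probe/demo_remove are invented names for a driver's probe and
remove paths.

```c
#include <assert.h>
#include <stddef.h>

/* --- User-space stubs so the sketch builds outside the kernel --- */
struct pci_dev {
	int aer_cap_offset;	/* hypothetical: 0 means no AER capability */
	int aer_enabled;	/* hypothetical: tracks reporting state */
};

int pci_find_aer_capability(struct pci_dev *dev)
{
	return dev->aer_cap_offset;
}

int pci_enable_pcie_error_reporting(struct pci_dev *dev)
{
	if (!dev->aer_cap_offset)
		return -1;
	dev->aer_enabled = 1;
	return 0;
}

int pci_disable_pcie_error_reporting(struct pci_dev *dev)
{
	dev->aer_enabled = 0;
	return 0;
}

/* --- Pattern an AER-aware driver might follow --- */
int demo_probe(struct pci_dev *pdev)
{
	assert(pdev != NULL);

	/* No AER capability: the device still works, without AER. */
	if (!pci_find_aer_capability(pdev))
		return 0;

	/* Devices do not report errors until this is called. */
	return pci_enable_pcie_error_reporting(pdev);
}

void demo_remove(struct pci_dev *pdev)
{
	if (pdev->aer_enabled)
		pci_disable_pcie_error_reporting(pdev);
}
```

Checking the capability first keeps the probe path usable on devices
that do not implement AER at all.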
224
2253.4 Frequently Asked Questions
226
227Q: What happens if a PCI Express device driver does not provide an
228error recovery handler (pci_driver->err_handler is equal to NULL)?
229
230A: The devices attached to the driver won't be recovered. If the
231error is fatal, the kernel will print warning messages. Please refer
232to section 3 for more information.
233
234Q: What happens if an upstream port service driver does not provide
235callback reset_link?
236
237A: Fatal error recovery will fail if the errors are reported by the
238upstream ports that the service driver is attached to.
239
240Q: How does this infrastructure deal with a driver that is not PCI
241Express aware?
242
243A: This infrastructure calls the error callback functions of the
244driver when an error happens. But if the driver is not aware of
245PCI Express, the device might not report its own errors to root
246port.
247
248Q: What modifications will that driver need to make it compatible
249with the PCI Express AER Root driver?
250
251A: It can call the helper functions to enable AER in devices and
252clean up the uncorrectable status register. Please refer to section 3.3.
253
diff --git a/Documentation/pcmcia/crc32hash.c b/Documentation/pcmcia/crc32hash.c
new file mode 100644
index 000000000000..cbc36d299af8
--- /dev/null
+++ b/Documentation/pcmcia/crc32hash.c
@@ -0,0 +1,32 @@
1/* crc32hash.c - derived from linux/lib/crc32.c, GNU GPL v2 */
2/* Usage example:
3$ ./crc32hash "Dual Speed"
4*/
5
6#include <string.h>
7#include <stdio.h>
8#include <ctype.h>
9#include <stdlib.h>
10
11unsigned int crc32(unsigned char const *p, unsigned int len)
12{
13 int i;
14 unsigned int crc = 0;
15 while (len--) {
16 crc ^= *p++;
17 for (i = 0; i < 8; i++)
18 crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0);
19 }
20 return crc;
21}
22
23int main(int argc, char **argv) {
24 unsigned int result;
25 if (argc != 2) {
26 printf("no string passed as argument\n");
27 return -1;
28 }
29 result = crc32((unsigned char const *)argv[1], strlen(argv[1]));
30 printf("0x%x\n", result);
31 return 0;
32}
diff --git a/Documentation/pcmcia/devicetable.txt b/Documentation/pcmcia/devicetable.txt
index 3351c0355143..199afd100cf2 100644
--- a/Documentation/pcmcia/devicetable.txt
+++ b/Documentation/pcmcia/devicetable.txt
@@ -27,37 +27,7 @@ pcmcia:m0149cC1ABf06pfn00fn00pa725B842DpbF1EFEE84pc0877B627pd00000000
27The hex value after "pa" is the hash of product ID string 1, after "pb" for 27The hex value after "pa" is the hash of product ID string 1, after "pb" for
28string 2 and so on. 28string 2 and so on.
29 29
30Alternatively, you can use this small tool to determine the crc32 hash. 30Alternatively, you can use crc32hash (see Documentation/pcmcia/crc32hash.c)
31simply pass the string you want to evaluate as argument to this program, 31to determine the crc32 hash. Simply pass the string you want to evaluate
32e.g. 32as argument to this program, e.g.:
33$ ./crc32hash "Dual Speed" 33$ ./crc32hash "Dual Speed"
34
35-------------------------------------------------------------------------
36/* crc32hash.c - derived from linux/lib/crc32.c, GNU GPL v2 */
37#include <string.h>
38#include <stdio.h>
39#include <ctype.h>
40#include <stdlib.h>
41
42unsigned int crc32(unsigned char const *p, unsigned int len)
43{
44 int i;
45 unsigned int crc = 0;
46 while (len--) {
47 crc ^= *p++;
48 for (i = 0; i < 8; i++)
49 crc = (crc >> 1) ^ ((crc & 1) ? 0xedb88320 : 0);
50 }
51 return crc;
52}
53
54int main(int argc, char **argv) {
55 unsigned int result;
56 if (argc != 2) {
57 printf("no string passed as argument\n");
58 return -1;
59 }
60 result = crc32(argv[1], strlen(argv[1]));
61 printf("0x%x\n", result);
62 return 0;
63}
diff --git a/Documentation/pi-futex.txt b/Documentation/pi-futex.txt
new file mode 100644
index 000000000000..5d61dacd21f6
--- /dev/null
+++ b/Documentation/pi-futex.txt
@@ -0,0 +1,121 @@
1Lightweight PI-futexes
2----------------------
3
4We are calling them lightweight for 3 reasons:
5
6 - in the user-space fastpath a PI-enabled futex involves no kernel work
7 (or any other PI complexity) at all. No registration, no extra kernel
8 calls - just pure fast atomic ops in userspace.
9
10 - even in the slowpath, the system call and scheduling pattern is very
11 similar to normal futexes.
12
13 - the in-kernel PI implementation is streamlined around the mutex
14 abstraction, with strict rules that keep the implementation
15 relatively simple: only a single owner may own a lock (i.e. no
16 read-write lock support), only the owner may unlock a lock, no
17 recursive locking, etc.
18
19Priority Inheritance - why?
20---------------------------
21
22The short reply: user-space PI helps achieve/improve determinism for
23user-space applications. In the best-case, it can help achieve
24determinism and well-bound latencies. Even in the worst-case, PI will
25improve the statistical distribution of locking related application
26delays.
27
28The longer reply:
29-----------------
30
31Firstly, sharing locks between multiple tasks is a common programming
32technique that often cannot be replaced with lockless algorithms. As we
33can see it in the kernel [which is a quite complex program in itself],
34lockless structures are rather the exception than the norm - the current
35ratio of lockless vs. locky code for shared data structures is somewhere
36between 1:10 and 1:100. Lockless is hard, and the complexity of lockless
37algorithms often endangers the ability to do robust reviews of said code.
38I.e. critical RT apps often choose lock structures to protect critical
39data structures, instead of lockless algorithms. Furthermore, there are
40cases (like shared hardware, or other resource limits) where lockless
41access is mathematically impossible.
42
43Media players (such as Jack) are an example of reasonable application
44design with multiple tasks (with multiple priority levels) sharing
45short-held locks: for example, a highprio audio playback thread is
46combined with medium-prio construct-audio-data threads and low-prio
47display-colory-stuff threads. Add video and decoding to the mix and
48we've got even more priority levels.
49
50So once we accept that synchronization objects (locks) are an
51unavoidable fact of life, and once we accept that multi-task userspace
52apps have a very fair expectation of being able to use locks, we've got
53to think about how to offer the option of a deterministic locking
54implementation to user-space.
55
56Most of the technical counter-arguments against doing priority
57inheritance only apply to kernel-space locks. But user-space locks are
58different, there we cannot disable interrupts or make the task
59non-preemptible in a critical section, so the 'use spinlocks' argument
60does not apply (user-space spinlocks have the same priority inversion
61problems as other user-space locking constructs). Fact is, pretty much
62the only technique that currently enables good determinism for userspace
63locks (such as futex-based pthread mutexes) is priority inheritance:
64
65Currently (without PI), if a high-prio and a low-prio task share a lock
66[this is a quite common scenario for most non-trivial RT applications],
67even if all critical sections are coded carefully to be deterministic
68(i.e. all critical sections are short in duration and only execute a
69limited number of instructions), the kernel cannot guarantee any
70deterministic execution of the high-prio task: any medium-priority task
71could preempt the low-prio task while it holds the shared lock and
72executes the critical section, and could delay it indefinitely.
73
74Implementation:
75---------------
76
77As mentioned before, the userspace fastpath of PI-enabled pthread
78mutexes involves no kernel work at all - they behave quite similarly to
79normal futex-based locks: a 0 value means unlocked, and a value==TID
80means locked. (This is the same method as used by list-based robust
81futexes.) Userspace uses atomic ops to lock/unlock these mutexes without
82entering the kernel.
83
84To handle the slowpath, we have added two new futex ops:
85
86 FUTEX_LOCK_PI
87 FUTEX_UNLOCK_PI
88
89If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to
90TID fails], then FUTEX_LOCK_PI is called. The kernel does all the
91remaining work: if there is no futex-queue attached to the futex address
92yet then the code looks up the task that owns the futex [it has put its
93own TID into the futex value], and attaches a 'PI state' structure to
94the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware,
95kernel-based synchronization object. The 'other' task is made the owner
96of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the
97futex value. Then this task tries to lock the rt-mutex, on which it
98blocks. Once it returns, it has the mutex acquired, and it sets the
99futex value to its own TID and returns. Userspace has no other work to
100perform - it now owns the lock, and futex value contains
101FUTEX_WAITERS|TID.
102
103If the unlock side fastpath succeeds, [i.e. userspace manages to do a
104TID -> 0 atomic transition of the futex value], then no kernel work is
105triggered.
106
107If the unlock fastpath fails (because the FUTEX_WAITERS bit is set),
108then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the
109behalf of userspace - and it also unlocks the attached
110pi_state->rt_mutex and thus wakes up any potential waiters.
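
The userspace side of this protocol can be sketched as below. It is a
minimal illustration, not the glibc implementation: pi_lock, pi_unlock
and futex_op are hypothetical names, error handling and retries are
omitted, and the TID is fetched with a system call on every operation
for brevity (a real library would cache it to keep the fastpath free
of kernel entries).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the futex system call (hypothetical name). */
static long futex_op(_Atomic uint32_t *uaddr, int op)
{
	return syscall(SYS_futex, uaddr, op, 0, NULL, NULL, 0);
}

int pi_lock(_Atomic uint32_t *futex)
{
	uint32_t expected = 0;
	uint32_t tid = (uint32_t)syscall(SYS_gettid);

	assert(futex != NULL);

	/* Fastpath: atomic 0 -> TID transition. */
	if (atomic_compare_exchange_strong(futex, &expected, tid))
		return 0;

	/* Slowpath: the kernel queues this task on the rt-mutex. */
	return (int)futex_op(futex, FUTEX_LOCK_PI);
}

int pi_unlock(_Atomic uint32_t *futex)
{
	uint32_t expected = (uint32_t)syscall(SYS_gettid);

	/* Fastpath: atomic TID -> 0 transition. */
	if (atomic_compare_exchange_strong(futex, &expected, 0))
		return 0;

	/* Slowpath: FUTEX_WAITERS is set, so the kernel must unlock. */
	return (int)futex_op(futex, FUTEX_UNLOCK_PI);
}
```

When a waiter exists, the futex value is FUTEX_WAITERS|TID rather than
TID, so the compare-and-exchange in pi_unlock fails and the unlock is
handed to the kernel, as described above.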
111
112Note that under this approach, contrary to previous PI-futex approaches,
113there is no prior 'registration' of a PI-futex. [which is not quite
114possible anyway, due to existing ABI properties of pthread mutexes.]
115
116Also, under this scheme, 'robustness' and 'PI' are two orthogonal
117properties of futexes, and all four combinations are possible: futex,
118robust-futex, PI-futex, robust+PI-futex.
119
120More details about priority inheritance can be found in
121Documentation/rtmutex.txt.
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index fba1e05c47c7..d0e79d5820a5 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -1,208 +1,553 @@
1Most of the code in Linux is device drivers, so most of the Linux power
2management code is also driver-specific. Most drivers will do very little;
3others, especially for platforms with small batteries (like cell phones),
4will do a lot.
5
6This writeup gives an overview of how drivers interact with system-wide
7power management goals, emphasizing the models and interfaces that are
8shared by everything that hooks up to the driver model core. Read it as
9background for the domain-specific work you'd do with any specific driver.
10
11
12Two Models for Device Power Management
13======================================
14Drivers will use one or both of these models to put devices into low-power
15states:
16
17 System Sleep model:
18 Drivers can enter low power states as part of entering system-wide
19 low-power states like "suspend-to-ram", or (mostly for systems with
20 disks) "hibernate" (suspend-to-disk).
21
22 This is something that device, bus, and class drivers collaborate on
23 by implementing various role-specific suspend and resume methods to
24 cleanly power down hardware and software subsystems, then reactivate
25 them without loss of data.
26
27 Some drivers can manage hardware wakeup events, which make the system
28 leave that low-power state. This feature may be disabled using the
29 relevant /sys/devices/.../power/wakeup file; enabling it may cost some
30 power, but lets the whole system enter low power states more often.
31
32 Runtime Power Management model:
33 Drivers may also enter low power states while the system is running,
34 independently of other power management activity. Upstream drivers
35 will normally not know (or care) if the device is in some low power
36 state when issuing requests; the driver will auto-resume anything
37 that's needed when it gets a request.
38
39 This doesn't have, or need, much infrastructure; it's just something you
40 should do when writing your drivers. For example, clk_disable() unused
41 clocks as part of minimizing power drain for currently-unused hardware.
42 Of course, sometimes clusters of drivers will collaborate with each
43 other, which could involve task-specific power management.
44
45There's not a lot to be said about those low power states except that they
46are very system-specific, and often device-specific. Also, that if enough
47drivers put themselves into low power states (at "runtime"), the effect may be
48the same as entering some system-wide low-power state (system sleep) ... and
49that synergies exist, so that several drivers using runtime pm might put the
50system into a state where even deeper power saving options are available.
51
52Most suspended devices will have quiesced all I/O: no more DMA or irqs, no
53more data read or written, and requests from upstream drivers are no longer
54accepted. A given bus or platform may have different requirements though.
55
56Examples of hardware wakeup events include an alarm from a real time clock,
57network wake-on-LAN packets, keyboard or mouse activity, and media insertion
58or removal (for PCMCIA, MMC/SD, USB, and so on).
59
60
61Interfaces for Entering System Sleep States
62===========================================
63Most of the programming interfaces a device driver needs to know about
64relate to that first model: entering a system-wide low power state,
65rather than just minimizing power consumption by one device.
66
67
68Bus Driver Methods
69------------------
70The core methods to suspend and resume devices reside in struct bus_type.
71These are mostly of interest to people writing infrastructure for busses
72like PCI or USB, or because they define the primitives that device drivers
73may need to apply in domain-specific ways to their devices:
1 74
2Device Power Management 75struct bus_type {
76 ...
77 int (*suspend)(struct device *dev, pm_message_t state);
78 int (*suspend_late)(struct device *dev, pm_message_t state);
3 79
80 int (*resume_early)(struct device *dev);
81 int (*resume)(struct device *dev);
82};
4 83
5Device power management encompasses two areas - the ability to save 84Bus drivers implement those methods as appropriate for the hardware and
6state and transition a device to a low-power state when the system is 85the drivers using it; PCI works differently from USB, and so on. Not many
7entering a low-power state; and the ability to transition a device to 86people write bus drivers; most driver code is a "device driver" that
8a low-power state while the system is running (and independently of 87builds on top of bus-specific framework code.
9any other power management activity). 88
89For more information on these driver calls, see the description later;
90they are called in phases for every device, respecting the parent-child
91sequencing in the driver model tree. Note that as this is being written,
92only the suspend() and resume() are widely available; not many bus drivers
93leverage all of those phases, or pass them down to lower driver levels.
94
95
96/sys/devices/.../power/wakeup files
97-----------------------------------
98All devices in the driver model have two flags to control handling of
99wakeup events, which are hardware signals that can force the device and/or
100system out of a low power state. These are initialized by bus or device
101driver code using device_init_wakeup(dev,can_wakeup).
102
103The "can_wakeup" flag just records whether the device (and its driver) can
104physically support wakeup events. When that flag is clear, the sysfs
105"wakeup" file is empty, and device_may_wakeup() returns false.
106
107For devices that can issue wakeup events, a separate flag controls whether
108that device should try to use its wakeup mechanism. The initial value of
109device_may_wakeup() will be true, so that the device's "wakeup" file holds
110the value "enabled". Userspace can change that to "disabled" so that
111device_may_wakeup() returns false; or change it back to "enabled" (so that
112it returns true again).
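
The interaction of the two flags can be modelled in a few lines. This
is a user-space sketch only: struct wk_dev and the wk_* functions are
hypothetical stand-ins for the driver model's device structure,
device_init_wakeup(), device_may_wakeup() and the sysfs "wakeup" file,
and rejecting writes for devices that cannot wake the system is an
assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the two per-device wakeup flags. */
struct wk_dev {
	bool can_wakeup;	/* hardware and driver support wakeup */
	bool should_wakeup;	/* policy flag, userspace-controlled */
};

/* Models device_init_wakeup(dev, can_wakeup). */
void wk_init_wakeup(struct wk_dev *d, bool can)
{
	assert(d != NULL);
	d->can_wakeup = can;
	d->should_wakeup = can;	/* enabled by default if capable */
}

/* Models device_may_wakeup(dev). */
bool wk_may_wakeup(const struct wk_dev *d)
{
	return d->can_wakeup && d->should_wakeup;
}

/* Models reading the sysfs "wakeup" file. */
const char *wk_show(const struct wk_dev *d)
{
	if (!d->can_wakeup)
		return "";
	return d->should_wakeup ? "enabled" : "disabled";
}

/* Models writing the sysfs "wakeup" file; returns 0 on success. */
int wk_store(struct wk_dev *d, const char *buf)
{
	if (!d->can_wakeup)
		return -1;	/* assumption: writes are rejected */
	if (strcmp(buf, "enabled") == 0)
		d->should_wakeup = true;
	else if (strcmp(buf, "disabled") == 0)
		d->should_wakeup = false;
	else
		return -1;
	return 0;
}
```

Keeping capability and policy separate lets userspace turn wakeup off
and on again without the driver having to re-detect what the hardware
supports.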
113
114
115EXAMPLE: PCI Device Driver Methods
116-----------------------------------
117PCI framework software calls these methods when the PCI device driver bound
118to a device has provided them:
119
120struct pci_driver {
121 ...
122 int (*suspend)(struct pci_dev *pdev, pm_message_t state);
123 int (*suspend_late)(struct pci_dev *pdev, pm_message_t state);
124
125 int (*resume_early)(struct pci_dev *pdev);
126 int (*resume)(struct pci_dev *pdev);
127};
10 128
129Drivers will implement those methods, and call PCI-specific procedures
130like pci_set_power_state(), pci_enable_wake(), pci_save_state(), and
131pci_restore_state() to manage PCI-specific mechanisms. (PCI config space
132could be saved during driver probe, if it weren't for the fact that some
133systems rely on userspace tweaking using setpci.) Devices are suspended
134before their bridges enter low power states, and likewise bridges resume
135before their devices.
136
137
138Upper Layers of Driver Stacks
139-----------------------------
140Device drivers generally have at least two interfaces, and the methods
141sketched above are the ones which apply to the lower level (nearer PCI, USB,
142or other bus hardware). The network and block layers are examples of upper
143level interfaces, as is a character device talking to userspace.
144
145Power management requests normally need to flow through those upper levels,
146which often use domain-oriented requests like "blank that screen". In
147some cases those upper levels will have power management intelligence that
148relates to end-user activity, or other devices that work in cooperation.
149
150When those interfaces are structured using class interfaces, there is a
151standard way to have the upper layer stop issuing requests to a given
152class device (and restart later):
153
154struct class {
155 ...
156 int (*suspend)(struct device *dev, pm_message_t state);
157 int (*resume)(struct device *dev);
158};
11 159
12Methods 160Those calls are issued in specific phases of the process by which the
161system enters a low power "suspend" state, or resumes from it.
162
163
164Calling Drivers to Enter System Sleep States
165============================================
166When the system enters a low power state, each device's driver is asked
167to suspend the device by putting it into a state compatible with the target
168system state. That's usually some version of "off", but the details are
169system-specific. Also, wakeup-enabled devices will usually stay partly
170functional in order to wake the system.
171
172When the system leaves that low power state, the device's driver is asked
173to resume it. The suspend and resume operations always go together, and
174both are multi-phase operations.
175
176For simple drivers, suspend might quiesce the device using the class code
177and then turn its hardware as "off" as possible with suspend_late. The
178matching resume calls would then completely reinitialize the hardware
179before reactivating its class I/O queues.
180
181More power-aware drivers will use more than one device low power
182state, either at runtime or during system sleep states, and might trigger
183system wakeup events.
184
185
186Call Sequence Guarantees
187------------------------
188To ensure that bridges and similar links needed to talk to a device are
189available when the device is suspended or resumed, the device tree is
190walked in a bottom-up order to suspend devices. A top-down order is
191used to resume those devices.
192
193The ordering of the device tree is defined by the order in which devices
194get registered: a child can never be registered, probed or resumed before
195its parent; and can't be removed or suspended after that parent.
196
197The policy is that the device tree should match hardware bus topology.
198(Or at least the control bus, for devices which use multiple busses.)
199
200
201Suspending Devices
202------------------
203Suspending a given device is done in several phases. Suspending the
204system always includes every phase, executing calls for every device
205before the next phase begins. Not all busses or classes support all
206these callbacks; and not all drivers use all the callbacks.
207
208The phases are seen by driver notifications issued in this order:
209
210 1 class.suspend(dev, message) is called after tasks are frozen, for
211 devices associated with a class that has such a method. This
212 method may sleep.
213
214 Since I/O activity usually comes from such higher layers, this is
215 a good place to quiesce all drivers of a given type (and keep such
216 code out of those drivers).
217
218 2 bus.suspend(dev, message) is called next. This method may sleep,
219 and is often morphed into a device driver call with bus-specific
220 parameters and/or rules.
221
222 This call should handle parts of device suspend logic that require
223 sleeping. It probably does work to quiesce the device which hasn't
224 been abstracted into class.suspend() or bus.suspend_late().
225
226 3 bus.suspend_late(dev, message) is called with IRQs disabled, and
227 with only one CPU active. Until the bus.resume_early() phase
228 completes (see later), IRQs are not enabled again. This method
229 won't be exposed by all busses; for message based busses like USB,
230 I2C, or SPI, device interactions normally require IRQs. This bus
231 call may be morphed into a driver call with bus-specific parameters.
232
233 This call might save low level hardware state that might otherwise
234 be lost in the upcoming low power state, and actually put the
235 device into a low power state ... so that in some cases the device
236 may stay partly usable until this late. This "late" call may also
237 help when coping with hardware that behaves badly.
238
239The pm_message_t parameter is currently used to refine those semantics
240(described later).
241
242At the end of those phases, drivers should normally have stopped all I/O
243transactions (DMA, IRQs), saved enough state that they can re-initialize
244or restore previous state (as needed by the hardware), and placed the
245device into a low-power state. On many platforms they will also use
246clk_disable() to gate off one or more clock sources; sometimes they will
247also switch off power supplies, or reduce voltages. Drivers which have
248runtime PM support may already have performed some or all of the steps
249needed to prepare for the upcoming system sleep state.
250
251When any driver sees that its device_can_wakeup(dev), it should make sure
252to use the relevant hardware signals to trigger a system wakeup event.
253For example, enable_irq_wake() might identify GPIO signals hooked up to
254a switch or other external hardware, and pci_enable_wake() does something
255similar for PCI's PME# signal.
256
257If a driver (or bus, or class) fails its suspend method, the system won't
258enter the desired low power state; it will resume all the devices it's
259suspended so far.
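
The ordering and rollback rules can be sketched in user space as
follows. This is a simplified model, not the driver core: it collapses
the three suspend phases into one, and struct pm_dev, its fields and
the pm_* functions are hypothetical. The array is assumed to be in
registration order, so parents always precede their children.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record; devs[] is in registration order. */
struct pm_dev {
	const char *name;
	int fail_suspend;	/* knob to simulate a failing method */
	int suspended;
	int suspend_seq;	/* order in which suspend happened */
};

/* Top-down: parents are resumed before their children. */
void pm_resume_all(struct pm_dev *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		devs[i].suspended = 0;
}

/* Bottom-up: children are suspended before their parents. */
int pm_suspend_all(struct pm_dev *devs, size_t n)
{
	int seq = 0;
	size_t i = n;

	assert(devs != NULL || n == 0);

	while (i-- > 0) {
		if (devs[i].fail_suspend) {
			/* Undo what was suspended so far, parents first. */
			for (size_t j = i + 1; j < n; j++)
				devs[j].suspended = 0;
			return -1;
		}
		devs[i].suspended = 1;
		devs[i].suspend_seq = ++seq;
	}
	return 0;
}
```

With the devices registered as bridge, controller, disk, a successful
pm_suspend_all() suspends the disk first and the bridge last; if the
controller's method fails, the already suspended disk is resumed and
the call returns an error.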
260
261Note that drivers may need to perform different actions based on the target
262system lowpower/sleep state. At this writing, there are only platform
263specific APIs through which drivers could determine those target states.
264
265
266Device Low Power (suspend) States
267---------------------------------
268Device low-power states aren't very standard. One device might only handle
269"on" and "off", while another might support a dozen different versions of
270"on" (how many engines are active?), plus a state that gets back to "on"
271faster than from a full "off".
272
273Some busses define rules about what different suspend states mean. PCI
274gives one example: after the suspend sequence completes, a non-legacy
275PCI device may not perform DMA or issue IRQs, and any wakeup events it
276issues would be issued through the PME# bus signal. Plus, there are
277several PCI-standard device states, some of which are optional.
278
279In contrast, integrated system-on-chip processors often use IRQs as the
280wakeup event sources (so drivers would call enable_irq_wake()) and might
281be able to treat DMA completion as a wakeup event (sometimes DMA can stay
282active too; it'd only be the CPU and some peripherals that sleep).
283
284Some details here may be platform-specific. Systems may have devices that
285can be fully active in certain sleep states, such as an LCD display that's
286refreshed using DMA while most of the system is sleeping lightly ... and
287its frame buffer might even be updated by a DSP or other non-Linux CPU while
288the Linux control processor stays idle.
289
290Moreover, the specific actions taken may depend on the target system state.
291One target system state might allow a given device to be very operational;
292another might require a hard shut down with re-initialization on resume.
293And two different target systems might use the same device in different
294ways; the aforementioned LCD might be active in one product's "standby",
295but a different product using the same SOC might work differently.
296
297
298Meaning of pm_message_t.event
299-----------------------------
300Parameters to suspend calls include the device affected and a message of
301type pm_message_t, which has one field: the event. If a driver does not
302recognize the event code, suspend calls may abort the request and return
303a negative errno. However, most drivers will be fine if they implement
304PM_EVENT_SUSPEND semantics for all messages.
305
306The event codes are used to refine the goal of suspending the device, and
307mostly matter when creating or resuming system memory image snapshots, as
308used with suspend-to-disk:
309
310 PM_EVENT_SUSPEND -- quiesce the driver and put hardware into a low-power
311 state. When used with system sleep states like "suspend-to-RAM" or
312 "standby", the upcoming resume() call will often be able to rely on
313 state kept in hardware, or issue system wakeup events. When used
314 instead with suspend-to-disk, few devices support this capability;
315 most are completely powered off.
316
317 PM_EVENT_FREEZE -- quiesce the driver, but don't necessarily change into
318 any low power mode. A system snapshot is about to be taken, often
319 followed by a call to the driver's resume() method. Neither wakeup
320 events nor DMA are allowed.
321
322 PM_EVENT_PRETHAW -- quiesce the driver, knowing that the upcoming resume()
323 will restore a suspend-to-disk snapshot from a different kernel image.
324 Drivers that are smart enough to look at their hardware state during
325 resume() processing need that state to be correct ... a PRETHAW could
326 be used to invalidate that state (by resetting the device), like a
327 shutdown() invocation would before a kexec() or system halt. Other
328 drivers might handle this the same way as PM_EVENT_FREEZE. Neither
329 wakeup events nor DMA are allowed.
330
331To enter "standby" (ACPI S1) or "Suspend to RAM" (STR, ACPI S3) states, or
332the similarly named APM states, only PM_EVENT_SUSPEND is used; for "Suspend
333to Disk" (STD, hibernate, ACPI S4), all of those event codes are used.
334
335There's also PM_EVENT_ON, a value which never appears as a suspend event
336but is sometimes used to record the "not suspended" device state.
337
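The event codes above can be illustrated with a small user-space sketch of
how a driver's suspend method might branch on pm_message_t.event. This is
not code from the kernel: demo_dev and demo_suspend() are hypothetical names,
the pm_message_t type is a stand-in, and the numeric PM_EVENT_* values are
assumptions chosen to match the "0" and "2" values described for the sysfs
power/state file.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the kernel definitions; the numeric values are assumptions. */
#define PM_EVENT_ON      0
#define PM_EVENT_FREEZE  1
#define PM_EVENT_SUSPEND 2
#define PM_EVENT_PRETHAW 3

typedef struct pm_message { int event; } pm_message_t;

/* Hypothetical driver-private state, for illustration only. */
struct demo_dev {
	int quiesced;	/* I/O (DMA, IRQs) has been stopped */
	int low_power;	/* hardware was put into a low-power state */
	int was_reset;	/* hardware state was invalidated by a reset */
};

/* Sketch of a suspend method that refines its work by event code. */
static int demo_suspend(struct demo_dev *dev, pm_message_t msg)
{
	assert(dev != NULL);

	switch (msg.event) {
	case PM_EVENT_FREEZE:
		/* Quiesce only; no power state change is required. */
		dev->quiesced = 1;
		return 0;
	case PM_EVENT_PRETHAW:
		/* Quiesce, and reset so a later resume() sees sane state. */
		dev->quiesced = 1;
		dev->was_reset = 1;
		return 0;
	case PM_EVENT_SUSPEND:
		/* Quiesce and enter a low-power state. */
		dev->quiesced = 1;
		dev->low_power = 1;
		return 0;
	default:
		/* Unrecognized event (PM_EVENT_ON is never a suspend event). */
		return -EINVAL;
	}
}
```

A simpler driver could treat every recognized event like PM_EVENT_SUSPEND,
which the text above says is adequate for most drivers.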
338
339Resuming Devices
340----------------
341Resuming is done in multiple phases, much like suspending, with all
342devices processing each phase's calls before the next phase begins.
343
344The phases are seen by driver notifications issued in this order:
345
346 1 bus.resume_early(dev) is called with IRQs disabled, and with
347 only one CPU active. As with bus.suspend_late(), this method
348 won't be supported on busses that require IRQs in order to
349 interact with devices.
350
351 This reverses the effects of bus.suspend_late().
352
353 2 bus.resume(dev) is called next. This may be morphed into a device
354 driver call with bus-specific parameters; implementations may sleep.
355
356 This reverses the effects of bus.suspend().
357
358 3 class.resume(dev) is called for devices associated with a class
359 that has such a method. Implementations may sleep.
360
361 This reverses the effects of class.suspend(), and would usually
362 reactivate the device's I/O queue.
363
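The phase ordering above can be sketched in user-space C. The point it
illustrates is that every device handles one phase before any device starts
the next. All names here (demo_dev, demo_resume_all(), demo_trace) are
hypothetical, the hooks are reduced to trace entries, and error handling is
left out; real bus and class methods take a struct device and return int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-device record; not a kernel structure. */
struct demo_dev {
	const char *name;
	int has_early;	/* its bus provides resume_early() */
	int has_class;	/* it belongs to a class that has resume() */
};

/* Ordered trace of the calls that would have been made. */
static char demo_trace[256];

static void demo_note(const char *phase, const struct demo_dev *dev)
{
	size_t used = strlen(demo_trace);

	snprintf(demo_trace + used, sizeof(demo_trace) - used,
		 "%s(%s) ", phase, dev->name);
}

/* Run the three resume phases, one full pass over the devices per phase. */
static void demo_resume_all(const struct demo_dev *devs, size_t n)
{
	size_t i;

	assert(devs != NULL);
	demo_trace[0] = '\0';

	/* Phase 1: IRQs still disabled, only one CPU active. */
	for (i = 0; i < n; i++)
		if (devs[i].has_early)
			demo_note("bus.resume_early", &devs[i]);

	/* Phase 2: reverses bus.suspend(); implementations may sleep. */
	for (i = 0; i < n; i++)
		demo_note("bus.resume", &devs[i]);

	/* Phase 3: reverses class.suspend(), e.g. restarts I/O queues. */
	for (i = 0; i < n; i++)
		if (devs[i].has_class)
			demo_note("class.resume", &devs[i]);
}
```

After calling demo_resume_all() on an array of devices, demo_trace holds the
calls in the order they would be issued.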
364At the end of those phases, drivers should normally be as functional as
365they were before suspending: I/O can be performed using DMA and IRQs, and
366the relevant clocks are gated on. The device need not be "fully on"; it
367might be in a runtime lowpower/suspend state that acts as if it were.
368
369However, the details here may again be platform-specific. For example,
370some systems support multiple "run" states, and the mode in effect at
371the end of resume() might not be the one which preceded suspension.
372That means the availability of certain clocks or power supplies may have
373changed, which could easily affect how a driver works.
374
375
376Drivers need to be able to handle hardware which has been reset since the
377suspend methods were called, for example by complete reinitialization.
378This may be the hardest part, and the one most protected by NDA'd documents
379and chip errata. It's simplest if the hardware state hasn't changed since
380the suspend() was called, but that can't always be guaranteed.
381
382Drivers must also be prepared to notice that the device has been removed
383while the system was powered off, whenever that's physically possible.
384PCMCIA, MMC, USB, Firewire, SCSI, and even IDE are common examples of busses
385where common Linux platforms will see such removal. Details of how drivers
386will notice and handle such removals are currently bus-specific, and often
387involve a separate thread.
13 388
14The methods to suspend and resume devices reside in struct bus_type:
15 389
16struct bus_type { 390Note that the bus-specific runtime PM wakeup mechanism can exist, and might
17 ... 391be defined to share some of the same driver code as for system wakeup. For
18 int (*suspend)(struct device * dev, pm_message_t state); 392example, a bus-specific device driver's resume() method might be used there,
19 int (*resume)(struct device * dev); 393so it wouldn't only be called from bus.resume() during system-wide wakeup.
20}; 394See bus-specific information about how runtime wakeup events are handled.
21 395
22Each bus driver is responsible implementing these methods, translating
23the call into a bus-specific request and forwarding the call to the
24bus-specific drivers. For example, PCI drivers implement suspend() and
25resume() methods in struct pci_driver. The PCI core is simply
26responsible for translating the pointers to PCI-specific ones and
27calling the low-level driver.
28
29This is done to a) ease transition to the new power management methods
30and leverage the existing PM code in various bus drivers; b) allow
31buses to implement generic and default PM routines for devices, and c)
32make the flow of execution obvious to the reader.
33
34
35System Power Management
36
37When the system enters a low-power state, the device tree is walked in
38a depth-first fashion to transition each device into a low-power
39state. The ordering of the device tree is guaranteed by the order in
40which devices get registered - children are never registered before
41their ancestors, and devices are placed at the back of the list when
42registered. By walking the list in reverse order, we are guaranteed to
43suspend devices in the proper order.
44
45Devices are suspended once with interrupts enabled. Drivers are
46expected to stop I/O transactions, save device state, and place the
47device into a low-power state. Drivers may sleep, allocate memory,
48etc. at will.
49
50Some devices are broken and will inevitably have problems powering
51down or disabling themselves with interrupts enabled. For these
52special cases, they may return -EAGAIN. This will put the device on a
53list to be taken care of later. When interrupts are disabled, before
54we enter the low-power state, their drivers are called again to put
55their device to sleep.
56
57On resume, the devices that returned -EAGAIN will be called to power
58themselves back on with interrupts disabled. Once interrupts have been
59re-enabled, the rest of the drivers will be called to resume their
60devices. On resume, a driver is responsible for powering back on each
61device, restoring state, and re-enabling I/O transactions for that
62device.
63 396
397System Devices
398--------------
64System devices follow a slightly different API, which can be found in 399System devices follow a slightly different API, which can be found in
65 400
66 include/linux/sysdev.h 401 include/linux/sysdev.h
67 drivers/base/sys.c 402 drivers/base/sys.c
68 403
69System devices will only be suspended with interrupts disabled, and 404System devices will only be suspended with interrupts disabled, and after
70after all other devices have been suspended. On resume, they will be 405all other devices have been suspended. On resume, they will be resumed
71resumed before any other devices, and also with interrupts disabled. 406before any other devices, and also with interrupts disabled.
72 407
408That is, IRQs are disabled, the suspend_late() phase begins, then the
409sysdev_driver.suspend() phase, and the system enters a sleep state. Then
410the sysdev_driver.resume() phase begins, followed by the resume_early()
411phase, after which IRQs are enabled.
73 412
74Runtime Power Management 413Code to actually enter and exit the system-wide low power state sometimes
75 414involves hardware details that are only known to the boot firmware, and
76Many devices are able to dynamically power down while the system is 415may leave a CPU running software (from SRAM or flash memory) that monitors
77still running. This feature is useful for devices that are not being 416the system and manages its wakeup sequence.
78used, and can offer significant power savings on a running system.
79
80In each device's directory, there is a 'power' directory, which
81contains at least a 'state' file. Reading from this file displays what
82power state the device is currently in. Writing to this file initiates
83a transition to the specified power state, which must be a decimal in
84the range 1-3, inclusive; or 0 for 'On'.
85 417
86The PM core will call the ->suspend() method in the bus_type object
87that the device belongs to if the specified state is not 0, or
88->resume() if it is.
89 418
90Nothing will happen if the specified state is the same state the 419Runtime Power Management
91device is currently in. 420========================
92 421Many devices are able to dynamically power down while the system is still
93If the device is already in a low-power state, and the specified state 422running. This feature is useful for devices that are not being used, and
94is another, but different, low-power state, the ->resume() method will 423can offer significant power savings on a running system. These devices
95first be called to power the device back on, then ->suspend() will be 424often support a range of runtime power states, which might use names such
96called again with the new state. 425as "off", "sleep", "idle", "active", and so on. Those states will in some
97 426cases (like PCI) be partially constrained by a bus the device uses, and will
98The driver is responsible for saving the working state of the device 427usually include hardware states that are also used in system sleep states.
99and putting it into the low-power state specified. If this was 428
100successful, it returns 0, and the device's power_state field is 429However, note that if a driver puts a device into a runtime low power state
101updated. 430and the system then goes into a system-wide sleep state, it normally ought
102 431to resume into that runtime low power state rather than "full on". Such
103The driver must take care to know whether or not it is able to 432distinctions would be part of the driver-internal state machine for that
104properly resume the device, including all step of reinitialization 433hardware; the whole point of runtime power management is to be sure that
105necessary. (This is the hardest part, and the one most protected by 434drivers are decoupled in that way from the state machine governing phases
106NDA'd documents). 435of the system-wide power/sleep state transitions.
107 436
108The driver must also take care not to suspend a device that is 437
109currently in use. It is their responsibility to provide their own 438Power Saving Techniques
110exclusion mechanisms. 439-----------------------
111 440Normally runtime power management is handled by the drivers without specific
112The runtime power transition happens with interrupts enabled. If a 441userspace or kernel intervention, by device-aware use of techniques like:
113device cannot support being powered down with interrupts, it may 442
114return -EAGAIN (as it would during a system power management 443 Using information provided by other system layers
115transition), but it will _not_ be called again, and the transaction 444 - stay deeply "off" except between open() and close()
116will fail. 445 - if transceiver/PHY indicates "nobody connected", stay "off"
117 446 - application protocols may include power commands or hints
118There is currently no way to know what states a device or driver 447
119supports a priori. This will change in the future. 448 Using fewer CPU cycles
120 449 - using DMA instead of PIO
121pm_message_t meaning 450 - removing timers, or making them lower frequency
122 451 - shortening "hot" code paths
123pm_message_t has two fields. event ("major"), and flags. If driver 452 - eliminating cache misses
124does not know event code, it aborts the request, returning error. Some 453 - (sometimes) offloading work to device firmware
125drivers may need to deal with special cases based on the actual type 454
126of suspend operation being done at the system level. This is why 455 Reducing other resource costs
127there are flags. 456 - gating off unused clocks in software (or hardware)
128 457 - switching off unused power supplies
129Event codes are: 458 - eliminating (or delaying/merging) IRQs
130 459 - tuning DMA to use word and/or burst modes
131ON -- no need to do anything except special cases like broken 460
132HW. 461 Using device-specific low power states
133 462 - using lower voltages
134# NOTIFICATION -- pretty much same as ON? 463 - avoiding needless DMA transfers
135 464
136FREEZE -- stop DMA and interrupts, and be prepared to reinit HW from 465Read your hardware documentation carefully to see the opportunities that
137scratch. That probably means stop accepting upstream requests, the 466may be available. If you can, measure the actual power usage and check
138actual policy of what to do with them being specific to a given 467it against the budget established for your project.
139driver. It's acceptable for a network driver to just drop packets 468
140while a block driver is expected to block the queue so no request is 469
141lost. (Use IDE as an example on how to do that). FREEZE requires no 470Examples: USB hosts, system timer, system CPU
142power state change, and it's expected for drivers to be able to 471----------------------------------------------
143quickly transition back to operating state. 472USB host controllers make interesting, if complex, examples. In many cases
144 473these have no work to do: no USB devices are connected, or all of them are
145SUSPEND -- like FREEZE, but also put hardware into low-power state. If 474in the USB "suspend" state. Linux host controller drivers can then disable
146there's need to distinguish several levels of sleep, additional flag 475periodic DMA transfers that would otherwise be a constant power drain on the
147is probably best way to do that. 476memory subsystem, and enter a suspend state. In power-aware controllers,
148 477entering that suspend state may disable the clock used with USB signaling,
149Transitions are only from a resumed state to a suspended state, never 478saving a certain amount of power.
150between 2 suspended states. (ON -> FREEZE or ON -> SUSPEND can happen, 479
151FREEZE -> SUSPEND or SUSPEND -> FREEZE can not). 480The controller will be woken from that state (with an IRQ) by changes to the
152 481signal state on the data lines of a given port, for example by an existing
153All events are: 482peripheral requesting "remote wakeup" or by plugging a new peripheral. The
154 483same wakeup mechanism usually works from "standby" sleep states, and on some
155[NOTE NOTE NOTE: If you are driver author, you should not care; you 484systems also from "suspend to RAM" (or even "suspend to disk") states.
156should only look at event, and ignore flags.] 485(Except that ACPI may be involved instead of normal IRQs, on some hardware.)
157 486
158#Prepare for suspend -- userland is still running but we are going to 487System devices like timers and CPUs may have special roles in the platform
159#enter suspend state. This gives drivers chance to load firmware from 488power management scheme. For example, system timers using a "dynamic tick"
160#disk and store it in memory, or do other activities taht require 489approach don't just save CPU cycles (by eliminating needless timer IRQs),
161#operating userland, ability to kmalloc GFP_KERNEL, etc... All of these 490but they may also open the door to using lower power CPU "idle" states that
162#are forbiden once the suspend dance is started.. event = ON, flags = 491cost more than a jiffy to enter and exit. On x86 systems these are states
163#PREPARE_TO_SUSPEND 492like "C3"; note that periodic DMA transfers from a USB host controller will
164 493also prevent entry to a C3 state, much like a periodic timer IRQ.
165Apm standby -- prepare for APM event. Quiesce devices to make life 494
166easier for APM BIOS. event = FREEZE, flags = APM_STANDBY 495That kind of runtime mechanism interaction is common. "System On Chip" (SOC)
167 496processors often have low power idle modes that can't be entered unless
168Apm suspend -- same as APM_STANDBY, but it we should probably avoid 497certain medium-speed clocks (often 12 or 48 MHz) are gated off. When the
169spinning down disks. event = FREEZE, flags = APM_SUSPEND 498drivers gate those clocks effectively, then the system idle task may be able
170 499to use the lower power idle modes and thereby increase battery life.
171System halt, reboot -- quiesce devices to make life easier for BIOS. event 500
172= FREEZE, flags = SYSTEM_HALT or SYSTEM_REBOOT 501If the CPU can have a "cpufreq" driver, there also may be opportunities
173 502to shift to lower voltage settings and reduce the power cost of executing
174System shutdown -- at least disks need to be spun down, or data may be 503a given number of instructions. (Without voltage adjustment, it's rare
175lost. Quiesce devices, just to make life easier for BIOS. event = 504for cpufreq to save much power; the cost-per-instruction must go down.)
176FREEZE, flags = SYSTEM_SHUTDOWN 505
177 506
178Kexec -- turn off DMAs and put hardware into some state where new 507/sys/devices/.../power/state files
179kernel can take over. event = FREEZE, flags = KEXEC 508==================================
180 509For now you can also test some of this functionality using sysfs.
181Powerdown at end of swsusp -- very similar to SYSTEM_SHUTDOWN, except wake 510
182may need to be enabled on some devices. This actually has at least 3 511 DEPRECATED: USE "power/state" ONLY FOR DRIVER TESTING, AND
183subtypes, system can reboot, enter S4 and enter S5 at the end of 512 AVOID USING dev->power.power_state IN DRIVERS.
184swsusp. event = FREEZE, flags = SWSUSP and one of SYSTEM_REBOOT, 513
185SYSTEM_SHUTDOWN, SYSTEM_S4 514 THESE WILL BE REMOVED. IF THE "power/state" FILE GETS REPLACED,
186 515 IT WILL BECOME SOMETHING COUPLED TO THE BUS OR DRIVER.
187Suspend to ram -- put devices into low power state. event = SUSPEND, 516
188flags = SUSPEND_TO_RAM 517In each device's directory, there is a 'power' directory, which contains
189 518at least a 'state' file. The value of this field is effectively boolean,
190Freeze for swsusp snapshot -- stop DMA and interrupts. No need to put 519PM_EVENT_ON or PM_EVENT_SUSPEND.
191devices into low power mode, but you must be able to reinitialize 520
192device from scratch in resume method. This has two flavors, its done 521 * Reading from this file displays a value corresponding to
193once on suspending kernel, once on resuming kernel. event = FREEZE, 522 the power.power_state.event field. All nonzero values are
194flags = DURING_SUSPEND or DURING_RESUME 523 displayed as "2", corresponding to a low power state; zero
195 524 is displayed as "0", corresponding to normal operation.
196Device detach requested from /sys -- deinitialize device; proably same as 525
197SYSTEM_SHUTDOWN, I do not understand this one too much. probably event 526 * Writing to this file initiates a transition using the
198= FREEZE, flags = DEV_DETACH. 527 specified event code number; only '0', '2', and '3' are
199 528 accepted (without a newline); '2' and '3' are both
200#These are not really events sent: 529 mapped to PM_EVENT_SUSPEND.
201# 530
202#System fully on -- device is working normally; this is probably never 531On writes, the PM core relies on that recorded event code and the device/bus
203#passed to suspend() method... event = ON, flags = 0 532capabilities to determine whether it uses a partial suspend() or resume()
204# 533sequence to change things so that the recorded event corresponds to the
205#Ready after resume -- userland is now running, again. Time to free any 534numeric parameter.
206#memory you ate during prepare to suspend... event = ON, flags = 535
207#READY_AFTER_RESUME 536 - If the bus requires the irqs-disabled suspend_late()/resume_early()
208# 537 phases, writes fail because those operations are not supported here.
538
539 - If the recorded value is the expected value, nothing is done.
540
541 - If the recorded value is nonzero, the device is partially resumed,
542 using the bus.resume() and/or class.resume() methods.
543
544 - If the target value is nonzero, the device is partially suspended,
545 using the class.suspend() and/or bus.suspend() methods and the
546 PM_EVENT_SUSPEND message.
547
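The write rules above can be read as a small decision procedure. The
following user-space sketch is one interpretation of that text, not the PM
core's actual code: demo_state_store() and the DEMO_* action flags are
hypothetical, the PM_EVENT_* values are assumed from the "0" and "2" values
mentioned above, and the specific errno values returned are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed values, mirroring the "0" and "2" shown by the state file. */
#define PM_EVENT_ON      0
#define PM_EVENT_SUSPEND 2

/* Hypothetical action flags; both may be set in one result. */
enum demo_action {
	DEMO_NONE    = 0,
	DEMO_RESUME  = 1 << 0,	/* partial resume: bus/class resume() */
	DEMO_SUSPEND = 1 << 1	/* partial suspend: class/bus suspend() */
};

/*
 * Decide what a write to power/state would do.  Returns a negative errno
 * when the write is rejected, otherwise a mask of DEMO_* actions.
 */
static int demo_state_store(const char *buf, size_t len,
			    int recorded_event, int bus_needs_irqs_off)
{
	int target;
	int actions = DEMO_NONE;

	assert(buf != NULL);

	/* Only a single character is accepted, without a newline. */
	if (len != 1)
		return -EINVAL;

	switch (buf[0]) {
	case '0':
		target = PM_EVENT_ON;
		break;
	case '2':
	case '3':
		/* Both are mapped to PM_EVENT_SUSPEND. */
		target = PM_EVENT_SUSPEND;
		break;
	default:
		return -EINVAL;
	}

	/* suspend_late()/resume_early() phases are not supported here. */
	if (bus_needs_irqs_off)
		return -EOPNOTSUPP;

	/* Recorded value already matches: nothing is done. */
	if (recorded_event == target)
		return DEMO_NONE;

	if (recorded_event != PM_EVENT_ON)
		actions |= DEMO_RESUME;
	if (target != PM_EVENT_ON)
		actions |= DEMO_SUSPEND;

	return actions;
}
```

For instance, writing "2" to a device recorded as PM_EVENT_ON yields only
DEMO_SUSPEND, while writing "0" to a suspended device yields only
DEMO_RESUME.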
548Drivers have no way to tell whether their suspend() and resume() calls
549have come through the sysfs power/state file or as part of entering a
550system sleep state, except that when accessed through sysfs the normal
551parent/child sequencing rules are ignored. Drivers (such as bus, bridge,
552or hub drivers) which expose child devices may need to enforce those rules
553on their own.
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt
index 4117802af0f8..a66bec222b16 100644
--- a/Documentation/power/interface.txt
+++ b/Documentation/power/interface.txt
@@ -52,3 +52,18 @@ suspend image will be as small as possible.
52 52
53Reading from this file will display the current image size limit, which 53Reading from this file will display the current image size limit, which
54is set to 500 MB by default. 54is set to 500 MB by default.
55
56/sys/power/pm_trace controls the code which saves the last PM event point in
57the RTC across reboots, so that you can debug a machine that just hangs
58during suspend (or more commonly, during resume). Namely, the RTC is only
59used to save the last PM event point if this file contains '1'. Initially it
60contains '0' which may be changed to '1' by writing a string representing a
61nonzero integer into it.
62
63To use this debugging feature you should attempt to suspend the machine, then
64reboot it and run
65
66 dmesg -s 1000000 | grep 'hash matches'
67
68CAUTION: Using it will cause your machine's real-time (CMOS) clock to be
69set to a random invalid time after a resume.
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index 217e51768b87..5c0ba235f5a5 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -1136,10 +1136,10 @@ Sense and level information should be encoded as follows:
1136 Devices connected to openPIC-compatible controllers should encode 1136 Devices connected to openPIC-compatible controllers should encode
1137 sense and polarity as follows: 1137 sense and polarity as follows:
1138 1138
1139 0 = high to low edge sensitive type enabled 1139 0 = low to high edge sensitive type enabled
1140 1 = active low level sensitive type enabled 1140 1 = active low level sensitive type enabled
1141 2 = low to high edge sensitive type enabled 1141 2 = active high level sensitive type enabled
1142 3 = active high level sensitive type enabled 1142 3 = high to low edge sensitive type enabled
1143 1143
1144 ISA PIC interrupt controllers should adhere to the ISA PIC 1144 ISA PIC interrupt controllers should adhere to the ISA PIC
1145 encodings listed below: 1145 encodings listed below:
@@ -1196,7 +1196,7 @@ platforms are moved over to use the flattened-device-tree model.
1196 - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC" 1196 - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC"
1197 - compatible : Should be "gianfar" 1197 - compatible : Should be "gianfar"
1198 - reg : Offset and length of the register set for the device 1198 - reg : Offset and length of the register set for the device
1199 - address : List of bytes representing the ethernet address of 1199 - mac-address : List of bytes representing the ethernet address of
1200 this controller 1200 this controller
1201 - interrupts : <a b> where a is the interrupt number and b is a 1201 - interrupts : <a b> where a is the interrupt number and b is a
1202 field that represents an encoding of the sense and level 1202 field that represents an encoding of the sense and level
@@ -1216,7 +1216,7 @@ platforms are moved over to use the flattened-device-tree model.
1216 model = "TSEC"; 1216 model = "TSEC";
1217 compatible = "gianfar"; 1217 compatible = "gianfar";
1218 reg = <24000 1000>; 1218 reg = <24000 1000>;
1219 address = [ 00 E0 0C 00 73 00 ]; 1219 mac-address = [ 00 E0 0C 00 73 00 ];
1220 interrupts = <d 3 e 3 12 3>; 1220 interrupts = <d 3 e 3 12 3>;
1221 interrupt-parent = <40000>; 1221 interrupt-parent = <40000>;
1222 phy-handle = <2452000> 1222 phy-handle = <2452000>
@@ -1436,9 +1436,9 @@ platforms are moved over to use the flattened-device-tree model.
1436 interrupts = <1d 3>; 1436 interrupts = <1d 3>;
1437 interrupt-parent = <40000>; 1437 interrupt-parent = <40000>;
1438 num-channels = <4>; 1438 num-channels = <4>;
1439 channel-fifo-len = <24>; 1439 channel-fifo-len = <18>;
1440 exec-units-mask = <000000fe>; 1440 exec-units-mask = <000000fe>;
1441 descriptor-types-mask = <073f1127>; 1441 descriptor-types-mask = <012b0ebf>;
1442 }; 1442 };
1443 1443
1444 1444
@@ -1498,7 +1498,7 @@ not necessary as they are usually the same as the root node.
1498 model = "TSEC"; 1498 model = "TSEC";
1499 compatible = "gianfar"; 1499 compatible = "gianfar";
1500 reg = <24000 1000>; 1500 reg = <24000 1000>;
1501 address = [ 00 E0 0C 00 73 00 ]; 1501 mac-address = [ 00 E0 0C 00 73 00 ];
1502 interrupts = <d 3 e 3 12 3>; 1502 interrupts = <d 3 e 3 12 3>;
1503 interrupt-parent = <40000>; 1503 interrupt-parent = <40000>;
1504 phy-handle = <2452000>; 1504 phy-handle = <2452000>;
@@ -1511,7 +1511,7 @@ not necessary as they are usually the same as the root node.
1511 model = "TSEC"; 1511 model = "TSEC";
1512 compatible = "gianfar"; 1512 compatible = "gianfar";
1513 reg = <25000 1000>; 1513 reg = <25000 1000>;
1514 address = [ 00 E0 0C 00 73 01 ]; 1514 mac-address = [ 00 E0 0C 00 73 01 ];
1515 interrupts = <13 3 14 3 18 3>; 1515 interrupts = <13 3 14 3 18 3>;
1516 interrupt-parent = <40000>; 1516 interrupt-parent = <40000>;
1517 phy-handle = <2452001>; 1517 phy-handle = <2452001>;
@@ -1524,7 +1524,7 @@ not necessary as they are usually the same as the root node.
1524 model = "FEC"; 1524 model = "FEC";
1525 compatible = "gianfar"; 1525 compatible = "gianfar";
1526 reg = <26000 1000>; 1526 reg = <26000 1000>;
1527 address = [ 00 E0 0C 00 73 02 ]; 1527 mac-address = [ 00 E0 0C 00 73 02 ];
1528 interrupts = <19 3>; 1528 interrupts = <19 3>;
1529 interrupt-parent = <40000>; 1529 interrupt-parent = <40000>;
1530 phy-handle = <2452002>; 1530 phy-handle = <2452002>;
diff --git a/Documentation/ramdisk.txt b/Documentation/ramdisk.txt
index 7c25584e082c..52f75b7d51c2 100644
--- a/Documentation/ramdisk.txt
+++ b/Documentation/ramdisk.txt
@@ -6,7 +6,7 @@ Contents:
6 1) Overview 6 1) Overview
7 2) Kernel Command Line Parameters 7 2) Kernel Command Line Parameters
8 3) Using "rdev -r" 8 3) Using "rdev -r"
9 4) An Example of Creating a Compressed RAM Disk 9 4) An Example of Creating a Compressed RAM Disk
10 10
11 11
121) Overview 121) Overview
@@ -34,7 +34,7 @@ make it clearer. The original "ramdisk=<ram_size>" has been kept around for
34compatibility reasons, but it may be removed in the future. 34compatibility reasons, but it may be removed in the future.
35 35
36The new RAM disk also has the ability to load compressed RAM disk images, 36The new RAM disk also has the ability to load compressed RAM disk images,
37allowing one to squeeze more programs onto an average installation or 37allowing one to squeeze more programs onto an average installation or
38rescue floppy disk. 38rescue floppy disk.
39 39
40 40
@@ -51,7 +51,7 @@ default is 4096 (4 MB) (8192 (8 MB) on S390).
51 =================== 51 ===================
52 52
53This parameter tells the RAM disk driver how many bytes to use per block. The 53This parameter tells the RAM disk driver how many bytes to use per block. The
54default is 512. 54default is 1024 (BLOCK_SIZE).
55 55
56 56
573) Using "rdev -r" 573) Using "rdev -r"
@@ -70,7 +70,7 @@ These numbers are no magical secrets, as seen below:
70./arch/i386/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000 70./arch/i386/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
71./arch/i386/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000 71./arch/i386/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
72 72
73Consider a typical two floppy disk setup, where you will have the 73Consider a typical two floppy disk setup, where you will have the
74kernel on disk one, and have already put a RAM disk image onto disk #2. 74kernel on disk one, and have already put a RAM disk image onto disk #2.
75 75
76Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk 76Hence you want to set bits 0 to 13 as 0, meaning that your RAM disk
@@ -97,12 +97,12 @@ Since the default start = 0 and the default prompt = 1, you could use:
97 append = "load_ramdisk=1" 97 append = "load_ramdisk=1"
98 98
99 99
1004) An Example of Creating a Compressed RAM Disk 1004) An Example of Creating a Compressed RAM Disk
101---------------------------------------------- 101----------------------------------------------
102 102
103To create a RAM disk image, you will need a spare block device to 103To create a RAM disk image, you will need a spare block device to
104construct it on. This can be the RAM disk device itself, or an 104construct it on. This can be the RAM disk device itself, or an
105unused disk partition (such as an unmounted swap partition). For this 105unused disk partition (such as an unmounted swap partition). For this
106example, we will use the RAM disk device, "/dev/ram0". 106example, we will use the RAM disk device, "/dev/ram0".
107 107
108Note: This technique should not be done on a machine with less than 8 MB 108Note: This technique should not be done on a machine with less than 8 MB
diff --git a/Documentation/robust-futexes.txt b/Documentation/robust-futexes.txt
index df82d75245a0..76e8064b8c3a 100644
--- a/Documentation/robust-futexes.txt
+++ b/Documentation/robust-futexes.txt
@@ -95,7 +95,7 @@ comparison. If the thread has registered a list, then normally the list
95is empty. If the thread/process crashed or terminated in some incorrect 95is empty. If the thread/process crashed or terminated in some incorrect
96way then the list might be non-empty: in this case the kernel carefully 96way then the list might be non-empty: in this case the kernel carefully
97walks the list [not trusting it], and marks all locks that are owned by 97walks the list [not trusting it], and marks all locks that are owned by
98this thread with the FUTEX_OWNER_DEAD bit, and wakes up one waiter (if 98this thread with the FUTEX_OWNER_DIED bit, and wakes up one waiter (if
99any). 99any).
100 100
101The list is guaranteed to be private and per-thread at do_exit() time, 101The list is guaranteed to be private and per-thread at do_exit() time,
diff --git a/Documentation/rt-mutex-design.txt b/Documentation/rt-mutex-design.txt
new file mode 100644
index 000000000000..c472ffacc2f6
--- /dev/null
+++ b/Documentation/rt-mutex-design.txt
@@ -0,0 +1,781 @@
1#
2# Copyright (c) 2006 Steven Rostedt
3# Licensed under the GNU Free Documentation License, Version 1.2
4#
5
6RT-mutex implementation design
7------------------------------
8
9This document tries to describe the design of the rtmutex.c implementation.
10It doesn't describe the reasons why rtmutex.c exists. For that please see
11Documentation/rt-mutex.txt. This document does explain problems that
12happen without this code, but only as background for understanding
13what the code actually is doing.
14
15The goal of this document is to help others understand the priority
16inheritance (PI) algorithm that is used, as well as reasons for the
17decisions that were made to implement PI in the manner that was done.
18
19
20Unbounded Priority Inversion
21----------------------------
22
23Priority inversion is when a lower priority process executes while a higher
24priority process wants to run. This happens for several reasons, and
25most of the time it can't be helped. Anytime a high priority process wants
26to use a resource that a lower priority process has (a mutex for example),
27the high priority process must wait until the lower priority process is done
28with the resource. This is a priority inversion. What we want to prevent
29is something called unbounded priority inversion. That is when the high
30priority process is prevented from running by a lower priority process for
31an undetermined amount of time.
32
33The classic example of unbounded priority inversion is where you have three
34processes, let's call them processes A, B, and C, where A is the highest
35priority process, C is the lowest, and B is in between. A tries to grab a lock
36that C owns and must wait and lets C run to release the lock. But in the
37meantime, B executes, and since B is of a higher priority than C, it preempts C,
38but by doing so, it is in fact preempting A which is a higher priority process.
39Now there's no way of knowing how long A will be sleeping waiting for C
40to release the lock, because for all we know, B is a CPU hog and will
41never give C a chance to release the lock. This is called unbounded priority
42inversion.
43
44Here's a little ASCII art to show the problem.
45
46   grab lock L1 (owned by C)
47     |
48A ---+
49        C preempted by B
50          |
51C    +----+
52
53B         +-------->
54                B now keeps A from running.
55
56
57Priority Inheritance (PI)
58-------------------------
59
60There are several ways to solve this issue, but other ways are out of scope
61for this document. Here we only discuss PI.
62
63PI is where a process inherits the priority of another process if the other
64process blocks on a lock owned by the current process. To make this easier
65to understand, let's use the previous example, with processes A, B, and C again.
66
67This time, when A blocks on the lock owned by C, C would inherit the priority
68of A. So now if B becomes runnable, it would not preempt C, since C now has
69the high priority of A. As soon as C releases the lock, it loses its
70inherited priority, and A then can continue with the resource that C had.
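The effect of PI on the A, B, C example can be sketched in a few lines of
C. This is only a toy model and not the kernel's scheduler: the struct
task layout, effective_prio() and pick_next() are names made up for this
illustration, and here a larger number means a higher priority.

```c
#include <stddef.h>

/* Hypothetical model: a larger number means a higher priority. */
struct task {
	const char *name;
	int normal_prio;	/* priority the task was given */
	int inherited_prio;	/* priority inherited through PI, 0 if none */
	int runnable;		/* 0 while the task is blocked */
};

/* The priority the scheduler sees: the higher of the two values. */
static int effective_prio(const struct task *t)
{
	return t->inherited_prio > t->normal_prio ? t->inherited_prio
						  : t->normal_prio;
}

/* Pick the runnable task with the highest effective priority. */
static const struct task *pick_next(const struct task *tasks, size_t n)
{
	const struct task *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!tasks[i].runnable)
			continue;
		if (!best || effective_prio(&tasks[i]) > effective_prio(best))
			best = &tasks[i];
	}
	return best;
}
```

With A blocked and no inheritance, pick_next() chooses B over C. Once C's
inherited_prio is set to A's priority, C is chosen instead, which is the
whole point of PI.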
71
72Terminology
73-----------
74
75Here I explain some terminology that is used in this document to help describe
76the design that is used to implement PI.
77
78PI chain - The PI chain is an ordered series of locks and processes that cause
79 processes to inherit priorities from a previous process that is
80 blocked on one of its locks. This is described in more detail
81 later in this document.
82
83mutex - In this document, to differentiate from locks that implement
84 PI and spin locks that are used in the PI code, from now on
85 the PI locks will be called a mutex.
86
87lock - In this document from now on, I will use the term lock when
88 referring to spin locks that are used to protect parts of the PI
89 algorithm. These locks disable preemption for UP (when
90 CONFIG_PREEMPT is enabled) and on SMP prevent multiple CPUs from
91 entering critical sections simultaneously.
92
93spin lock - Same as lock above.
94
95waiter - A waiter is a struct that is stored on the stack of a blocked
96 process. Since the scope of the waiter is within the code for
97 a process being blocked on the mutex, it is fine to allocate
98 the waiter on the process's stack (local variable). This
99 structure holds a pointer to the task, as well as the mutex that
100 the task is blocked on. It also has the plist node structures to
101 place the task in the waiter_list of a mutex as well as the
102 pi_list of a mutex owner task (described below).
103
104 waiter is sometimes used in reference to the task that is waiting
105 on a mutex. This is the same as waiter->task.
106
107waiters - A list of processes that are blocked on a mutex.
108
109top waiter - The highest priority process waiting on a specific mutex.
110
111top pi waiter - The highest priority process waiting on one of the mutexes
112 that a specific process owns.
113
114Note: task and process are used interchangeably in this document, mostly to
115 differentiate between two processes that are being described together.
116
117
118PI chain
119--------
120
121The PI chain is a list of processes and mutexes that may cause priority
122inheritance to take place. Multiple chains may converge, but a chain
123would never diverge, since a process can't be blocked on more than one
124mutex at a time.
125
126Example:
127
128 Process: A, B, C, D, E
129 Mutexes: L1, L2, L3, L4
130
131 A owns: L1
132 B blocked on L1
133 B owns L2
134 C blocked on L2
135 C owns L3
136 D blocked on L3
137 D owns L4
138 E blocked on L4
139
140The chain would be:
141
142 E->L4->D->L3->C->L2->B->L1->A
143
144To show where two chains merge, we could add another process F and
145another mutex L5 where B owns L5 and F is blocked on mutex L5.
146
147The chain for F would be:
148
149 F->L5->B->L1->A
150
151Since a process may own more than one mutex, but never be blocked on more than
152one, the chains merge.
153
154Here we show both chains:
155
156   E->L4->D->L3->C->L2-+
157                       |
158                       +->B->L1->A
159                       |
160                 F->L5-+
161
162For PI to work, the processes at the right end of these chains (or we may
163also call it the Top of the chain) must be equal to or higher in priority
164than the processes to the left or below in the chain.
165
166Also since a mutex may have more than one process blocked on it, we can
167have multiple chains merge at mutexes. If we add another process G that is
168blocked on mutex L2:
169
170 G->L2->B->L1->A
171
172And once again, to show how this can grow I will show the merging chains
173again.
174
175   E->L4->D->L3->C-+
176                   +->L2-+
177                   |     |
178                 G-+     +->B->L1->A
179                         |
180                   F->L5-+
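The merging chains above can be modeled with two small structures. This
is a sketch under assumptions: struct pi_task, struct pi_mutex and
chain_top() are hypothetical names and not the types used in rtmutex.c,
and the walk assumes there is no deadlock (no cycle) in the chain.

```c
#include <stddef.h>

struct pi_mutex;

struct pi_task {
	const char *name;
	struct pi_mutex *blocked_on;	/* at most one mutex, or NULL */
};

struct pi_mutex {
	const char *name;
	struct pi_task *owner;		/* NULL when the mutex is free */
};

/*
 * Follow the task -> mutex -> owner links until we reach a task that
 * is not blocked. That task is the top of the chain.
 */
static struct pi_task *chain_top(struct pi_task *t)
{
	while (t->blocked_on && t->blocked_on->owner)
		t = t->blocked_on->owner;
	return t;
}
```

Since a task has only one blocked_on pointer, a chain can never diverge,
but several tasks may point at the same mutex and several mutexes at the
same owner, so chains can merge. Starting from E, F or G in the example,
chain_top() ends up at A every time.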
181
182
183Plist
184-----
185
186Before I go further and talk about how the PI chain is stored through lists
187on both mutexes and processes, I'll explain the plist. This is similar to
188the struct list_head functionality that is already in the kernel.
189The implementation of plist is out of scope for this document, but it is
190very important to understand what it does.
191
192There are a few differences between plist and list, the most important one
193being that plist is a priority sorted linked list. This means that the
194priorities of the plist are sorted, such that it takes O(1) to retrieve the
195highest priority item in the list. Obviously this is useful to store processes
196based on their priorities.
197
198Another difference, which is important for implementation, is that, unlike
199list, the head of the list is a different element than the nodes of a list.
200So the head of the list is declared as struct plist_head and nodes that will
201be added to the list are declared as struct plist_node.
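To make the two differences concrete, here is a minimal sketch of a
priority sorted list with a separate head type. It is not the kernel's
plist implementation: struct phead, struct pnode, pl_add() and pl_first()
are invented for this example, and a lower prio value is taken to mean a
higher priority.

```c
#include <stddef.h>

struct pnode {
	int prio;		/* lower value = higher priority */
	struct pnode *next;
};

struct phead {
	struct pnode *first;	/* always the highest priority node */
};

/* Insert in sorted order. Equal priorities keep FIFO order. O(n). */
static void pl_add(struct phead *head, struct pnode *node)
{
	struct pnode **pp = &head->first;

	while (*pp && (*pp)->prio <= node->prio)
		pp = &(*pp)->next;
	node->next = *pp;
	*pp = node;
}

/* Retrieving the highest priority node is O(1). */
static struct pnode *pl_first(const struct phead *head)
{
	return head->first;
}
```

The cost of keeping the list sorted is paid at insertion time, so that
looking up the top waiter never requires a scan.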
202
203
204Mutex Waiter List
205-----------------
206
207Every mutex keeps track of all the waiters that are blocked on itself. The mutex
208has a plist to store these waiters by priority. This list is protected by
209a spin lock that is located in the struct of the mutex. This lock is called
210wait_lock. Since the modification of the waiter list is never done in
211interrupt context, the wait_lock can be taken without disabling interrupts.
212
213
214Task PI List
215------------
216
217To keep track of the PI chains, each process has its own PI list. This is
218a list of all top waiters of the mutexes that are owned by the process.
219Note that this list only holds the top waiters and not all waiters that are
220blocked on mutexes owned by the process.
221
222The top of the task's PI list is always the highest priority task that
223is waiting on a mutex that is owned by the task. So if the task has
224inherited a priority, it will always be the priority of the task that is
225at the top of this list.
226
227This list is stored in the task structure of a process as a plist called
228pi_list. This list is protected by a spin lock also in the task structure,
229called pi_lock. This lock may also be taken in interrupt context, so when
230locking the pi_lock, interrupts must be disabled.
231
232
233Depth of the PI Chain
234---------------------
235
236The maximum depth of the PI chain is not dynamic, and could actually be
237defined. But it is very complex to figure out, since it depends on all
238the nesting of mutexes. Let's look at the example where we have 3 mutexes,
239L1, L2, and L3, and four separate functions func1, func2, func3 and func4.
240The following shows a locking order of L1->L2->L3, but may not actually
241be directly nested that way.
242
243void func1(void)
244{
245 mutex_lock(L1);
246
247 /* do anything */
248
249 mutex_unlock(L1);
250}
251
252void func2(void)
253{
254 mutex_lock(L1);
255 mutex_lock(L2);
256
257 /* do something */
258
259 mutex_unlock(L2);
260 mutex_unlock(L1);
261}
262
263void func3(void)
264{
265 mutex_lock(L2);
266 mutex_lock(L3);
267
268 /* do something else */
269
270 mutex_unlock(L3);
271 mutex_unlock(L2);
272}
273
274void func4(void)
275{
276 mutex_lock(L3);
277
278 /* do something again */
279
280 mutex_unlock(L3);
281}
282
283Now we add 4 processes that run each of these functions separately.
284Processes A, B, C, and D which run functions func1, func2, func3 and func4
285respectively, and such that D runs first and A last. With D being preempted
286in func4 in the "do something again" area, we have a locking that follows:
287
288D owns L3
289 C blocked on L3
290 C owns L2
291 B blocked on L2
292 B owns L1
293 A blocked on L1
294
295And thus we have the chain A->L1->B->L2->C->L3->D.
296
297This gives us a PI depth of 4 (four processes), but looking at any of the
298functions individually, it seems as though they only have at most a locking
299depth of two. So, although the locking depth is defined at compile time,
300it still is very difficult to find the possibilities of that depth.
301
302Now since mutexes can be defined by user-land applications, we don't want a DoS
303type of application that nests large numbers of mutexes to create a large
304PI chain, and have the code holding spin locks while looking at a large
305amount of data. So to prevent this, the implementation not only implements
306a maximum lock depth, but also only holds at most two different locks at a
307time, as it walks the PI chain. More about this below.
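A depth limit on the walk could look like the sketch below. The names
(struct d_task, struct d_mutex, chain_depth()) and the value of
MAX_CHAIN_DEPTH are assumptions chosen for illustration only. The sketch
takes no locks at all; it only shows how a cap bounds the work done on a
long (or even circular) chain.

```c
#include <stddef.h>

#define MAX_CHAIN_DEPTH 1024	/* hypothetical limit for this sketch */

struct d_mutex;

struct d_task {
	struct d_mutex *blocked_on;	/* NULL if not blocked */
};

struct d_mutex {
	struct d_task *owner;		/* NULL if not owned */
};

/*
 * Count the tasks on the chain that starts at t.
 * Returns -1 if the chain is longer than MAX_CHAIN_DEPTH.
 */
static int chain_depth(const struct d_task *t)
{
	int depth = 1;

	while (t->blocked_on && t->blocked_on->owner) {
		if (++depth > MAX_CHAIN_DEPTH)
			return -1;
		t = t->blocked_on->owner;
	}
	return depth;
}
```

For the chain A->L1->B->L2->C->L3->D this returns 4, and for a task that
is blocked on a mutex it owns itself the walk gives up and returns -1
instead of looping forever.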
308
309
310Mutex owner and flags
311---------------------
312
313The mutex structure contains a pointer to the owner of the mutex. If the
314mutex is not owned, this owner is set to NULL. Since all architectures
315have the task structure on at least a four byte alignment (and if this is
316not true, the rtmutex.c code will be broken!), this allows for the two
317least significant bits to be used as flags. This part is also described
318in Documentation/rt-mutex.txt, but will also be briefly described here.
319
320Bit 0 is used as the "Pending Owner" flag. This is described later.
321Bit 1 is used as the "Has Waiters" flag. This is also described later
322 in more detail, but is set whenever there are waiters on a mutex.
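The trick of storing flags in the low bits of an aligned pointer can be
sketched as follows. The macro and function names here are made up for
this example and are not the ones in rtmutex.c. The sketch stores the
owner as a uintptr_t and asserts the alignment assumption mentioned
above.

```c
#include <assert.h>
#include <stdint.h>

#define OWNER_PENDING	((uintptr_t)1)	/* bit 0: "Pending Owner" */
#define HAS_WAITERS	((uintptr_t)2)	/* bit 1: "Has Waiters" */
#define OWNER_MASK	((uintptr_t)3)

struct f_task {
	long id;	/* gives the struct at least 4 byte alignment
			 * on common platforms */
};

/* Combine a task pointer and flags into one word. */
static uintptr_t owner_pack(struct f_task *t, uintptr_t flags)
{
	/* The two low bits must be free, or the scheme is broken. */
	assert(((uintptr_t)t & OWNER_MASK) == 0);
	return (uintptr_t)t | (flags & OWNER_MASK);
}

/* Strip the flag bits to get the task pointer back. */
static struct f_task *owner_task(uintptr_t owner)
{
	return (struct f_task *)(owner & ~OWNER_MASK);
}

static int owner_is_pending(uintptr_t owner)
{
	return (owner & OWNER_PENDING) != 0;
}

static int owner_has_waiters(uintptr_t owner)
{
	return (owner & HAS_WAITERS) != 0;
}
```

Keeping the owner and both flags in a single word is what lets one
atomic operation examine or change all three at once.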
323
324
325cmpxchg Tricks
326--------------
327
328Some architectures implement an atomic cmpxchg (Compare and Exchange). This
329is used (when applicable) to keep the fast path of grabbing and releasing
330mutexes short.
331
332cmpxchg is basically the following function performed atomically:
333
334unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C)
335{
336 unsigned long T = *A;
337 if (*A == *B) {
338 *A = *C;
339 }
340 return T;
341}
342#define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c)
343
344This is really nice to have, since it allows you to only update a variable
345if the variable is what you expect it to be. You know if it succeeded if
346the return value (the old value of A) is equal to B.
347
348The macro rt_mutex_cmpxchg is used to try to lock and unlock mutexes. If
349the architecture does not support CMPXCHG, then this macro is simply set
350to fail every time. But if CMPXCHG is supported, then this
351greatly helps to keep the fast path short.
352
353The use of rt_mutex_cmpxchg with the flags in the owner field helps optimize
354the system for architectures that support it. This will also be explained
355later in this document.
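As a user-space illustration of the idea, the fast paths can be sketched
with C11 atomics. This is not the rt_mutex_cmpxchg macro itself:
struct fast_mutex, fast_trylock() and fast_unlock() are hypothetical
names, and the owner is represented as a plain uintptr_t whose low bits
are assumed to hold the flags.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct fast_mutex {
	_Atomic uintptr_t owner;	/* 0 = no owner and no flags set */
};

/* Fast lock: succeeds only if the owner word is exactly 0. */
static bool fast_trylock(struct fast_mutex *m, uintptr_t me)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&m->owner, &expected, me);
}

/*
 * Fast unlock: succeeds only if the owner word is exactly "me", that
 * is, we own the mutex and no flag bit (such as "Has Waiters") is set.
 */
static bool fast_unlock(struct fast_mutex *m, uintptr_t me)
{
	uintptr_t expected = me;

	return atomic_compare_exchange_strong(&m->owner, &expected,
					      (uintptr_t)0);
}
```

When either function returns false, the caller would fall back to a slow
path. Note how a waiter that sets a flag bit in the owner word is enough
to make fast_unlock() fail, without the owner having to check anything
else.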
356
357
358Priority adjustments
359--------------------
360
361The implementation of the PI code in rtmutex.c has several places that a
362process must adjust its priority. With the help of the pi_list of a
363process this is rather easy to know what needs to be adjusted.
364
365The functions implementing the task adjustments are rt_mutex_adjust_prio,
366__rt_mutex_adjust_prio (same as the former, but expects the task pi_lock
367to already be taken), rt_mutex_getprio, and rt_mutex_setprio.
368
369rt_mutex_getprio and rt_mutex_setprio are only used in __rt_mutex_adjust_prio.
370
371rt_mutex_getprio returns the priority that the task should have. Either the
372task's own normal priority, or if a process of a higher priority is waiting on
373a mutex owned by the task, then that higher priority should be returned.
374Since the pi_list of a task holds a priority-ordered list of all the top
375waiters of all the mutexes that the task owns, rt_mutex_getprio simply needs
376to compare the top pi waiter to its own normal priority, and return the higher
377priority back.
378
379(Note: if looking at the code, you will notice that the lower number of
380 prio is returned. This is because the prio field in the task structure
381 is an inverse order of the actual priority. So a "prio" of 5 is
382 of higher priority than a "prio" of 10.)
383
384__rt_mutex_adjust_prio examines the result of rt_mutex_getprio, and if the
385result does not equal the task's current priority, then rt_mutex_setprio
386is called to adjust the priority of the task to the new priority.
387Note that rt_mutex_setprio is defined in kernel/sched.c to implement the
388actual change in priority.
389
390It is interesting to note that __rt_mutex_adjust_prio can either increase
391or decrease the priority of the task. In the case that a higher priority
392process has just blocked on a mutex owned by the task, __rt_mutex_adjust_prio
393would increase/boost the task's priority. But if a higher priority task
394were for some reason to leave the mutex (timeout or signal), this same function
395would decrease/unboost the priority of the task. That is because the pi_list
396always contains the highest priority task that is waiting on a mutex owned
397by the task, so we only need to compare the priority of that top pi waiter
398to the normal priority of the given task.
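The logic described in this section boils down to very little code. The
sketch below is a simplification and not the functions from rtmutex.c:
struct p_task and its fields are invented, the pi_list is reduced to the
priority of its top entry, and no locking is shown. As in the kernel, a
lower prio value means a higher priority.

```c
struct p_task {
	int normal_prio;	/* lower value = higher priority */
	int prio;		/* current, possibly boosted, priority */
	int has_pi_waiters;	/* nonzero if the pi_list is not empty */
	int top_pi_waiter_prio;	/* prio of the top pi waiter, if any */
};

/* The priority the task should have right now. */
static int getprio(const struct p_task *t)
{
	if (!t->has_pi_waiters)
		return t->normal_prio;
	return t->top_pi_waiter_prio < t->normal_prio ?
		t->top_pi_waiter_prio : t->normal_prio;
}

/*
 * Boost or unboost the task as needed.
 * Returns 1 if the priority was changed, 0 otherwise.
 */
static int adjust_prio(struct p_task *t)
{
	int prio = getprio(t);

	if (prio == t->prio)
		return 0;
	t->prio = prio;
	return 1;
}
```

The same adjust_prio() call boosts the task when a higher priority
waiter shows up and unboosts it when that waiter goes away, since in
both cases it only compares the top pi waiter against the normal
priority.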
399
400
401High level overview of the PI chain walk
402----------------------------------------
403
404The PI chain walk is implemented by the function rt_mutex_adjust_prio_chain.
405
406The implementation has gone through several iterations, and has ended up
407with what we believe is the best. It walks the PI chain by only grabbing
408at most two locks at a time, and is very efficient.
409
410The rt_mutex_adjust_prio_chain can be used either to boost or lower process
411priorities.
412
413rt_mutex_adjust_prio_chain is called with a task to be checked for PI
414(de)boosting (the owner of a mutex that a process is blocking on), a flag to
415check for deadlocking, the mutex that the task owns, and a pointer to a waiter
416that is the process's waiter struct that is blocked on the mutex (although this
417parameter may be NULL for deboosting).
418
419For this explanation, I will not mention deadlock detection. This explanation
420will try to stay at a high level.
421
422When this function is called, there are no locks held. That also means
423that the state of the owner and lock can change when entered into this function.
424
425Before this function is called, the task has already had rt_mutex_adjust_prio
426performed on it. This means that the task is set to the priority that it
427should be at, but the plist nodes of the task's waiter have not been updated
428with the new priorities, and that this task may not be in the proper locations
429in the pi_lists and wait_lists that the task is blocked on. This function
430solves all that.
431
432A loop is entered, where task is the owner to be checked for PI changes that
433was passed by parameter (for the first iteration). The pi_lock of this task is
434taken to prevent any more changes to the pi_list of the task. This also
435prevents new tasks from completing the blocking on a mutex that is owned by this
436task.
437
438If the task is not blocked on a mutex then the loop is exited. We are at
439the top of the PI chain.
440
441A check is now done to see if the original waiter (the process that is blocked
442on the current mutex) is the top pi waiter of the task. That is, is this
443waiter on the top of the task's pi_list. If it is not, it either means that
444there is another process higher in priority that is blocked on one of the
445mutexes that the task owns, or that the waiter has just woken up via a signal
446or timeout and has left the PI chain. In either case, the loop is exited, since
447we don't need to do any more changes to the priority of the current task, or any
448task that owns a mutex that this current task is waiting on. A priority chain
449walk is only needed when a new top pi waiter is made to a task.
450
451The next check sees if the task's waiter plist node has the priority equal to
452the priority the task is set at. If they are equal, then we are done with
453the loop. Remember that the function started with the priority of the
454task adjusted, but the plist nodes that hold the task in other processes'
455pi_lists have not been adjusted.
456
457Next, we look at the mutex that the task is blocked on. The mutex's wait_lock
458is taken. This is done by a spin_trylock, because the locking order of the
459pi_lock and wait_lock goes in the opposite direction. If we fail to grab the
460lock, the pi_lock is released, and we restart the loop.
461
462Now that we have both the pi_lock of the task as well as the wait_lock of
463the mutex the task is blocked on, we update the task's waiter's plist node
464that is located on the mutex's wait_list.
465
466Now we release the pi_lock of the task.
467
468Next the owner of the mutex has its pi_lock taken, so we can update the
469task's entry in the owner's pi_list. If the task is the highest priority
470process on the mutex's wait_list, then we remove the previous top waiter
471from the owner's pi_list, and replace it with the task.
472
473Note: It is possible that the task was the current top waiter on the mutex,
474 in which case the task is not yet on the pi_list of the waiter. This
475 is OK, since plist_del does nothing if the plist node is not on any
476 list.
477
478If the task was not the top waiter of the mutex, but it was before we
479did the priority updates, that means we are deboosting/lowering the
480task. In this case, the task is removed from the pi_list of the owner,
481and the new top waiter is added.
482
483Lastly, we unlock both the pi_lock of the task, as well as the mutex's
484wait_lock, and continue the loop again. On the next iteration of the
485loop, the previous owner of the mutex will be the task that will be
486processed.
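Stripped of all locking, deadlock detection and deboosting, the loop
above has roughly the following shape. This is a heavily simplified,
single threaded sketch: struct w_task, struct w_mutex and boost_chain()
are hypothetical, each wait_list is reduced to the priority of its top
waiter, and the walk starts at the newly blocked task rather than at the
owner. A lower prio value means a higher priority.

```c
#include <limits.h>
#include <stddef.h>

#define NO_WAITER INT_MAX	/* marks a mutex with an empty wait_list */

struct w_mutex;

struct w_task {
	int prio;			/* lower value = higher priority */
	struct w_mutex *blocked_on;	/* NULL if not blocked */
};

struct w_mutex {
	struct w_task *owner;
	int top_waiter_prio;		/* prio of the top waiter */
};

/* Boost-only walk. Returns the number of tasks that were boosted. */
static int boost_chain(struct w_task *task)
{
	int boosted = 0;

	while (task->blocked_on) {
		struct w_mutex *m = task->blocked_on;
		struct w_task *owner = m->owner;

		/* Not a new top waiter: nothing more to propagate. */
		if (task->prio >= m->top_waiter_prio)
			break;
		m->top_waiter_prio = task->prio;

		/* The owner already runs at this priority or higher. */
		if (!owner || owner->prio <= task->prio)
			break;
		owner->prio = task->prio;
		boosted++;

		/* The owner is the task for the next iteration. */
		task = owner;
	}
	return boosted;
}
```

The two early exits correspond to the checks described above: the walk
stops as soon as the task is not the top waiter of its mutex, or once
an owner is reached that does not need a new priority.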
487
488Note: One might think that the owner of this mutex might have changed
489 since we just grabbed the mutex's wait_lock. And one could be right.
490 The important thing to remember is that the owner could not have
491 become the task that is being processed in the PI chain, since
492 we have taken that task's pi_lock at the beginning of the loop.
493 So as long as there is an owner of this mutex that is not the same
494 process as the task being worked on, we are OK.
495
496 Looking closely at the code, one might be confused. The check for the
497 end of the PI chain is when the task isn't blocked on anything or the
498 task's waiter structure "task" element is NULL. This check is
499 protected only by the task's pi_lock. But the code to unlock the mutex
500 sets the task's waiter structure "task" element to NULL with only
501 the protection of the mutex's wait_lock, which was not taken yet.
502 Isn't this a race condition if the task becomes the new owner?
503
504 The answer is No! The trick is the spin_trylock of the mutex's
505 wait_lock. If we fail that lock, we release the pi_lock of the
506 task and continue the loop, doing the end of PI chain check again.
507
508 In the code to release the lock, the wait_lock of the mutex is held
509 the entire time, and it is not let go when we grab the pi_lock of the
510 new owner of the mutex. So if the switch of a new owner were to happen
511 after the check for end of the PI chain and the grabbing of the
512 wait_lock, the unlocking code would spin on the new owner's pi_lock
513 but never give up the wait_lock. So the PI chain loop is guaranteed to
514 fail the spin_trylock on the wait_lock, release the pi_lock, and
515 try again.
516
517 If you don't quite understand the above, that's OK. You don't have to,
518 unless you really want to make a proof out of it ;)
519
520
521Pending Owners and Lock stealing
522--------------------------------
523
524One of the flags in the owner field of the mutex structure is "Pending Owner".
525What this means is that an owner was chosen by the process releasing the
526mutex, but that owner has yet to wake up and actually take the mutex.
527
528Why is this important? Why can't we just give the mutex to another process
529and be done with it?
530
531The PI code is to help with real-time processes, and to let the highest
532priority process run as long as possible with little latencies and delays.
533If a high priority process owns a mutex that a lower priority process is
534blocked on, when the mutex is released it would be given to the lower priority
535process. What if the higher priority process wants to take that mutex again?
536The high priority process would fail to take that mutex that it just gave up
537and it would need to boost the lower priority process to run with full
538latency of that critical section (since the low priority process just entered
539it).
540
541There's no reason a high priority process that gives up a mutex should be
542penalized if it tries to take that mutex again. If the new owner of the
543mutex has not woken up yet, there's no reason that the higher priority process
544could not take that mutex away.
545
546To solve this, we introduced Pending Ownership and Lock Stealing. When a
547new process is given a mutex that it was blocked on, it is only given
548pending ownership. This means that it's the new owner, unless a higher
549priority process comes in and tries to grab that mutex. If a higher priority
550process does come along and wants that mutex, we let the higher priority
551process "steal" the mutex from the pending owner (only if it is still pending)
552and continue with the mutex.
553
554
555Taking of a mutex (The walk through)
556------------------------------------
557
558OK, now let's take a look at the detailed walk through of what happens when
559taking a mutex.
560
561The first thing that is tried is the fast taking of the mutex. This is
562done when we have CMPXCHG enabled (otherwise the fast taking automatically
563fails). Only when the owner field of the mutex is NULL can the lock be
564taken with the CMPXCHG and nothing else needs to be done.
565
566If there is contention on the lock, whether it has an owner or a pending
567owner, we take the slow path (rt_mutex_slowlock).
568
569The slow path function is where the task's waiter structure is created on
570the stack. This is because the waiter structure is only needed for the
571scope of this function. The waiter structure holds the nodes to store
572the task on the wait_list of the mutex, and if need be, the pi_list of
573the owner.
574
575The wait_lock of the mutex is taken since the slow path of unlocking the
576mutex also takes this lock.
577
578We then call try_to_take_rt_mutex. This is where the architecture that
579does not implement CMPXCHG would always grab the lock (if there's no
580contention).
581
582try_to_take_rt_mutex is used every time the task tries to grab a mutex in the
583slow path. The first thing that is done here is an atomic setting of
584the "Has Waiters" flag of the mutex's owner field. Yes, this could really
585be false, because if the mutex has no owner, there are no waiters and
586the current task also won't have any waiters. But we don't have the lock
587yet, so we assume we are going to be a waiter. The reason for this is to
588play nice for those architectures that do have CMPXCHG. By setting this flag
589now, the owner of the mutex can't release the mutex without going into the
590slow unlock path, and it would then need to grab the wait_lock, which this
591code currently holds. So setting the "Has Waiters" flag forces the owner
592to synchronize with this code.
593
594Now that we know that we can't have any races with the owner releasing the
595mutex, we check to see if we can take the ownership. This is done if the
596mutex doesn't have an owner, or if we can steal the mutex from a pending
597owner. Let's look at the situations we have here.
598
599 1) Has owner that is pending
600 ----------------------------
601
602 The mutex has an owner, but it hasn't woken up and the mutex flag
603 "Pending Owner" is set. The first check is to see if the owner isn't the
604 current task. This is because this function is also used for the pending
605 owner to grab the mutex. When a pending owner wakes up, it checks to see
606 if it can take the mutex, and this is done if the owner is already set to
607 itself. If so, we succeed and leave the function, clearing the "Pending
608 Owner" bit.
609
610 If the pending owner is not current, we check to see if the current priority is
611 higher than the pending owner. If not, we fail the function and return.
612
613 There's also something special about a pending owner. That is a pending owner
614 is never blocked on a mutex. So there is no PI chain to worry about. It also
615 means that if the mutex doesn't have any waiters, there's no accounting needed
616 to update the pending owner's pi_list, since we only worry about processes
617 blocked on the current mutex.
618
619 If there are waiters on this mutex, and we just stole the ownership, we need
620 to take the top waiter, remove it from the pi_list of the pending owner, and
621 add it to the current pi_list. Note that at this moment, the pending owner
622 is no longer on the list of waiters. This is fine, since the pending owner
623 would add itself back when it realizes that it had the ownership stolen
624 from itself. When the pending owner tries to grab the mutex, it will fail
625 in try_to_take_rt_mutex if the owner field points to another process.
626
627 2) No owner
628 -----------
629
630 If there is no owner (or we successfully stole the lock), we set the owner
631 of the mutex to current, and set the flag of "Has Waiters" if the current
632 mutex actually has waiters, or we clear the flag if it doesn't. See, it was
633 OK that we set that flag early, since now it is cleared.
634
635 3) Failed to grab ownership
636 ---------------------------
637
638 The most interesting case is when we fail to take ownership. This means that
639 there exists an owner, or there's a pending owner with equal or higher
640 priority than the current task.
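The three situations can be put together in one small function. This is
only a sketch of the decision being made, not try_to_take_rt_mutex
itself: struct s_task, struct s_mutex and try_to_take() are made up for
this example, the flags are stored as separate fields instead of bits in
the owner pointer, the wait_list is reduced to a counter, and the
pi_list updates needed after a steal are left out. A lower prio value
means a higher priority.

```c
#include <stdbool.h>
#include <stddef.h>

struct s_task {
	int prio;		/* lower value = higher priority */
};

struct s_mutex {
	struct s_task *owner;	/* NULL if there is no owner */
	bool pending;		/* "Pending Owner" flag */
	bool has_waiters;	/* "Has Waiters" flag */
	int nr_waiters;		/* stands in for the wait_list */
};

/* Returns true if current now owns the mutex. */
static bool try_to_take(struct s_mutex *m, struct s_task *current)
{
	/* Set early, to force the owner into the slow unlock path. */
	m->has_waiters = true;

	if (m->owner) {
		/* 3) A real owner holds the mutex. */
		if (!m->pending)
			return false;
		/* 3) Pending owner of equal or higher priority. */
		if (m->owner != current &&
		    current->prio >= m->owner->prio)
			return false;
		/* 1) We are the pending owner, or we steal the mutex. */
	}

	/* 2) No owner, or the mutex was successfully stolen. */
	m->owner = current;
	m->pending = false;
	m->has_waiters = m->nr_waiters > 0;
	return true;
}
```

Note that the "Has Waiters" flag set at the top is corrected again at
the bottom whenever the mutex is taken, which is why setting it early is
harmless.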
641
642We'll continue on the failed case.
643
644If the mutex has a timeout, we set up a timer to go off to break us out
645of this mutex if we failed to get it after a specified amount of time.
646
647Now we enter a loop that will continue to try to take ownership of the mutex, or
648fail from a timeout or signal.
649
650Once again we try to take the mutex. This will usually fail the first time
651in the loop, since it had just failed to get the mutex. But the second time
652in the loop, this would likely succeed, since the task would likely be
653the pending owner.
654
655If the mutex is TASK_INTERRUPTIBLE a check for signals and timeout is done
656here.
657
658The waiter structure has a "task" field that points to the task that is blocked
659on the mutex. This field can be NULL the first time it goes through the loop
660or if the task is a pending owner and had its mutex stolen. If the "task"
661field is NULL then we need to set up the accounting for it.
662
663Task blocks on mutex
664--------------------
665
666The accounting of a mutex and process is done with the waiter structure of
667the process. The "task" field is set to the process, and the "lock" field
668to the mutex. The plist nodes are initialized to the process's current
669priority.
670
671Since the wait_lock was taken at the entry of the slow lock, we can safely
672add the waiter to the wait_list. If the current process is the highest
673priority process currently waiting on this mutex, then we remove the
674previous top waiter process (if it exists) from the pi_list of the owner,
675and add the current process to that list. Since the pi_list of the owner
676has changed, we call rt_mutex_adjust_prio on the owner to see if the owner
677should adjust its priority accordingly.
678
679If the owner is also blocked on a lock, and had its pi_list changed
680(or deadlock checking is on), we unlock the wait_lock of the mutex and go ahead
681and run rt_mutex_adjust_prio_chain on the owner, as described earlier.
682
683Now all locks are released, and if the current process is still blocked on a
684mutex (waiter "task" field is not NULL), then we go to sleep (call schedule).
685
686Waking up in the loop
687---------------------
688
689We can then wake up from schedule for a few reasons:
690 1) we were given pending ownership of the mutex.
691 2) we received a signal and were TASK_INTERRUPTIBLE
692 3) we had a timeout and were TASK_INTERRUPTIBLE
693
694In any of these cases, we continue the loop and once again try to grab the
695ownership of the mutex. If we succeed, we exit the loop. Otherwise, on a signal
696or timeout we exit the loop, and if we had the mutex stolen we simply add
697ourselves back on the lists and go back to sleep.
698
699Note: For various reasons, because of timeout and signals, the steal mutex
700 algorithm needs to be careful. This is because the current process is
701 still on the wait_list. And because of dynamic changing of priorities,
702 especially on SCHED_OTHER tasks, the current process can be the
703 highest priority task on the wait_list.
704
705Failed to get mutex on Timeout or Signal
706----------------------------------------
707
708If a timeout or signal occurred, the waiter's "task" field would not be
709NULL and the task needs to be taken off the wait_list of the mutex and perhaps
710pi_list of the owner. If this process was a high priority process, then
711the rt_mutex_adjust_prio_chain needs to be executed again on the owner,
712but this time it will be lowering the priorities.
713
714
715Unlocking the Mutex
716-------------------
717
718The unlocking of a mutex also has a fast path for those architectures with
719CMPXCHG. Since the taking of a mutex on contention always sets the
720"Has Waiters" flag of the mutex's owner, we use this to know if we need to
721take the slow path when unlocking the mutex. If the mutex doesn't have any
722waiters, the owner field of the mutex would equal the current process and
723the mutex can be unlocked by just replacing the owner field with NULL.
724
725If the owner field has the "Has Waiters" bit set (or CMPXCHG is not available),
726the slow unlock path is taken.
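
As a rough user-space illustration of the fast unlock path described above,
the sketch below uses C11 atomics in place of the kernel's CMPXCHG wrapper.
The type names, the function name and the position of the "Has Waiters" bit
are assumptions made for the example; this is not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in types; the real rt_mutex and task_struct hold far more state. */
struct task_sketch { int prio; };

struct mutex_sketch {
	_Atomic uintptr_t owner;	/* task pointer plus low flag bits */
};

#define SKETCH_HAS_WAITERS ((uintptr_t)2)	/* assumed bit position */

/*
 * Try to release the mutex without taking wait_lock.  The exchange only
 * succeeds when the owner field is exactly the caller's task pointer,
 * that is, when no flag bit is set.  A false return means the caller
 * has to fall back to the slow unlock path.
 */
static bool fast_unlock(struct mutex_sketch *m, struct task_sketch *self)
{
	uintptr_t expected = (uintptr_t)self;

	return atomic_compare_exchange_strong(&m->owner, &expected,
					      (uintptr_t)0);
}
```

Because the flag lives in the same word as the owner pointer, a single
compare-and-exchange both checks for waiters and releases the mutex.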
727
728The first thing done in the slow unlock path is to take the wait_lock of the
729mutex. This synchronizes the locking and unlocking of the mutex.
730
731A check is made to see if the mutex has waiters or not. On architectures that
732do not have CMPXCHG, this is the location that the owner of the mutex will
733determine if a waiter needs to be awoken or not. On architectures that
734do have CMPXCHG, that check is done in the fast path, but it is still needed
735in the slow path too. If a waiter of a mutex woke up because of a signal
736or timeout between the time the owner failed the fast path CMPXCHG check and
737the grabbing of the wait_lock, the mutex may not have any waiters, thus the
738owner still needs to make this check. If there are no waiters then the mutex
739owner field is set to NULL, the wait_lock is released and nothing more is
740needed.
741
742If there are waiters, then we need to wake one up and give that waiter
743pending ownership.
744
745In the wake up code, the pi_lock of the current owner is taken. The top
746waiter of the lock is found and removed from the wait_list of the mutex
747as well as the pi_list of the current owner. The task field of the new
748pending owner's waiter structure is set to NULL, and the owner field of the
749mutex is set to the new owner with the "Pending Owner" bit set, as well
750as the "Has Waiters" bit if there still are other processes blocked on the
751mutex.
752
753The pi_lock of the previous owner is released, and the new pending owner's
754pi_lock is taken. Remember that this is the trick to prevent the race
755condition in rt_mutex_adjust_prio_chain from adding itself as a waiter
756on the mutex.
757
758We now clear the "pi_blocked_on" field of the new pending owner, and if
759the mutex still has waiters pending, we add the new top waiter to the pi_list
760of the pending owner.
761
762Finally we unlock the pi_lock of the pending owner and wake it up.
763
764
765Contact
766-------
767
768For updates on this document, please email Steven Rostedt <rostedt@goodmis.org>
769
770
771Credits
772-------
773
774Author: Steven Rostedt <rostedt@goodmis.org>
775
776Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and Randy Dunlap
777
778Updates
779-------
780
781This document was originally written for 2.6.17-rc3-mm1
diff --git a/Documentation/rt-mutex.txt b/Documentation/rt-mutex.txt
new file mode 100644
index 000000000000..243393d882ee
--- /dev/null
+++ b/Documentation/rt-mutex.txt
@@ -0,0 +1,79 @@
1RT-mutex subsystem with PI support
2----------------------------------
3
4RT-mutexes with priority inheritance are used to support PI-futexes,
5which enable pthread_mutex_t priority inheritance attributes
6(PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details
7about PI-futexes.]
8
9This technology was developed in the -rt tree and streamlined for
10pthread_mutex support.
11
12Basic principles:
13-----------------
14
15RT-mutexes extend the semantics of simple mutexes by the priority
16inheritance protocol.
17
18A low priority owner of a rt-mutex inherits the priority of a higher
19priority waiter until the rt-mutex is released. If the temporarily
20boosted owner blocks on a rt-mutex itself it propagates the priority
21boosting to the owner of the other rt_mutex it gets blocked on. The
22priority boosting is immediately removed once the rt_mutex has been
23unlocked.
24
25This approach allows us to shorten the block of high-prio tasks on
26mutexes which protect shared resources. Priority inheritance is not a
27magic bullet for poorly designed applications, but it allows
28well-designed applications to use userspace locks in critical parts of
29a high priority thread, without losing determinism.
30
31The enqueueing of the waiters into the rtmutex waiter list is done in
32priority order. For same priorities FIFO order is chosen. For each
33rtmutex, only the top priority waiter is enqueued into the owner's
34priority waiters list. This list too queues in priority order. Whenever
35the top priority waiter of a task changes (for example it timed out or
36got a signal), the priority of the owner task is readjusted. [The
37priority enqueueing is handled by "plists", see include/linux/plist.h
38for more details.]
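
The queueing rule above (priority order, FIFO among equal priorities) can be
sketched with a plain singly linked list. This is only an illustration: it is
not the plist API, the names are made up for the example, and it assumes the
kernel convention that a lower prio value means a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter record; lower prio value = higher priority (assumed). */
struct waiter_sketch {
	int prio;
	struct waiter_sketch *next;
};

/*
 * Insert w so the list stays sorted by priority.  The scan walks past
 * every waiter whose priority is higher than or equal to the new one,
 * so waiters of the same priority keep their arrival (FIFO) order.
 */
static void waiter_enqueue(struct waiter_sketch **head, struct waiter_sketch *w)
{
	struct waiter_sketch **pos = head;

	while (*pos && (*pos)->prio <= w->prio)
		pos = &(*pos)->next;
	w->next = *pos;
	*pos = w;
}

/* The top waiter is simply the first element of the sorted list. */
static struct waiter_sketch *top_waiter(struct waiter_sketch *head)
{
	return head;
}
```

Only the element returned by top_waiter() would be linked into the owner's
priority waiters list, which is why a change of the top waiter triggers a
priority readjustment of the owner.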
39
40RT-mutexes are optimized for fastpath operations and have no internal
41locking overhead when locking an uncontended mutex or unlocking a mutex
42without waiters. The optimized fastpath operations require cmpxchg
43support. [If that is not available then the rt-mutex internal spinlock
44is used]
45
46The state of the rt-mutex is tracked via the owner field of the rt-mutex
47structure:
48
49rt_mutex->owner holds the task_struct pointer of the owner. Bit 0 and 1
50are used to keep track of the "owner is pending" and "rtmutex has
51waiters" state.
52
53 owner        bit1  bit0
54 NULL         0     0     mutex is free (fast acquire possible)
55 NULL         0     1     invalid state
56 NULL         1     0     Transitional state*
57 NULL         1     1     invalid state
58 taskpointer  0     0     mutex is held (fast release possible)
59 taskpointer  0     1     task is pending owner
60 taskpointer  1     0     mutex is held and has waiters
61 taskpointer  1     1     task is pending owner and mutex has waiters
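
The state table above can be read as a small decoding function. The sketch
below is illustrative only: the macro, enum and function names are invented
for the example, and it assumes task pointers are at least 4-byte aligned so
that bits 0 and 1 are free to carry the flags.

```c
#include <assert.h>
#include <stdint.h>

#define OWNER_PENDING     ((uintptr_t)1)	/* bit 0: "owner is pending" */
#define OWNER_HAS_WAITERS ((uintptr_t)2)	/* bit 1: "rtmutex has waiters" */
#define OWNER_FLAGS       (OWNER_PENDING | OWNER_HAS_WAITERS)

enum owner_state {
	STATE_FREE,
	STATE_INVALID,
	STATE_TRANSITIONAL,
	STATE_HELD,
	STATE_PENDING,
	STATE_HELD_WAITERS,
	STATE_PENDING_WAITERS,
};

/* Map a raw owner word to the row of the table it corresponds to. */
static enum owner_state owner_classify(uintptr_t owner)
{
	uintptr_t task = owner & ~OWNER_FLAGS;
	int pending = (owner & OWNER_PENDING) != 0;
	int waiters = (owner & OWNER_HAS_WAITERS) != 0;

	if (!task) {
		/* A pending bit without an owner task is never valid. */
		if (pending)
			return STATE_INVALID;
		return waiters ? STATE_TRANSITIONAL : STATE_FREE;
	}
	if (pending)
		return waiters ? STATE_PENDING_WAITERS : STATE_PENDING;
	return waiters ? STATE_HELD_WAITERS : STATE_HELD;
}
```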
62
63Pending-ownership handling is a performance optimization:
64pending-ownership is assigned to the first (highest priority) waiter of
65the mutex, when the mutex is released. The thread is woken up and once
66it starts executing it can acquire the mutex. Until the mutex is taken
67by it (bit 0 is cleared) a competing higher priority thread can "steal"
68the mutex which puts the woken up thread back on the waiters list.
69
70The pending-ownership optimization is especially important for the
71uninterrupted workflow of high-prio tasks which repeatedly
72take/release locks that have lower-prio waiters. Without this
73optimization the higher-prio thread would ping-pong to the lower-prio
74task [because at unlock time we always assign a new owner].
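
The stealing rule can be sketched as a simple predicate on the owner word.
This is a simplification under stated assumptions: the names are invented,
a lower prio value is taken to mean a higher priority, and the locking that
the real code needs around this check is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct task_sketch { int prio; };	/* lower value = higher priority (assumed) */

#define OWNER_PENDING ((uintptr_t)1)	/* bit 0 of the owner word */
#define OWNER_FLAGS   ((uintptr_t)3)	/* bits 0 and 1 */

/*
 * A mutex can only be stolen while its owner is still pending (bit 0
 * set), and only by a task of strictly higher priority than the
 * pending owner.
 */
static bool can_steal(uintptr_t owner, const struct task_sketch *thief)
{
	const struct task_sketch *pending;

	if (!(owner & OWNER_PENDING))
		return false;	/* no pending owner, nothing to steal */

	pending = (const struct task_sketch *)(owner & ~OWNER_FLAGS);
	return pending && thief->prio < pending->prio;
}
```

Once the pending owner has actually taken the mutex, bit 0 is cleared and the
predicate is false for every other task.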
75
76(*) The "mutex has waiters" bit gets set to take the lock. If the lock
77doesn't already have an owner, this bit is quickly cleared if there are
78no waiters. So this is a transitional state to synchronize with looking
79at the owner field of the mutex and the mutex owner releasing the lock.
diff --git a/Documentation/rtc.txt b/Documentation/rtc.txt
index 95d17b3e2eee..2a58f985795a 100644
--- a/Documentation/rtc.txt
+++ b/Documentation/rtc.txt
@@ -44,8 +44,10 @@ normal timer interrupt, which is 100Hz.
 Programming and/or enabling interrupt frequencies greater than 64Hz is
 only allowed by root. This is perhaps a bit conservative, but we don't want
 an evil user generating lots of IRQs on a slow 386sx-16, where it might have
-a negative impact on performance. Note that the interrupt handler is only
-a few lines of code to minimize any possibility of this effect.
+a negative impact on performance. This 64Hz limit can be changed by writing
+a different value to /proc/sys/dev/rtc/max-user-freq. Note that the
+interrupt handler is only a few lines of code to minimize any possibility
+of this effect.
 
 Also, if the kernel time is synchronized with an external source, the
 kernel will write the time back to the CMOS clock every 11 minutes. In
@@ -81,6 +83,7 @@ that will be using this driver.
  */
 
 #include <stdio.h>
+#include <stdlib.h>
 #include <linux/rtc.h>
 #include <sys/ioctl.h>
 #include <sys/time.h>
diff --git a/Documentation/scsi/ChangeLog.arcmsr b/Documentation/scsi/ChangeLog.arcmsr
new file mode 100644
index 000000000000..162c47fdf45f
--- /dev/null
+++ b/Documentation/scsi/ChangeLog.arcmsr
@@ -0,0 +1,56 @@
1**************************************************************************
2** History
3**
4** REV# DATE NAME DESCRIPTION
5** 1.00.00.00 3/31/2004 Erich Chen First release
6** 1.10.00.04 7/28/2004 Erich Chen modify for ioctl
7** 1.10.00.06 8/28/2004 Erich Chen modify for 2.6.x
8** 1.10.00.08 9/28/2004 Erich Chen modify for x86_64
9** 1.10.00.10 10/10/2004 Erich Chen bug fix for SMP & ioctl
10** 1.20.00.00 11/29/2004 Erich Chen bug fix with arcmsr_bus_reset when PHY error
11** 1.20.00.02 12/09/2004 Erich Chen bug fix with over 2T bytes RAID Volume
12** 1.20.00.04 1/09/2005 Erich Chen fits for Debian linux kernel version 2.2.xx
13** 1.20.00.05 2/20/2005 Erich Chen cleaned up to look like a Linux 2.6.x driver
14** thanks for kind comments from
15** Kornel Wieliczek
16** Christoph Hellwig
17** Adrian Bunk
18** Andrew Morton
19** Christoph Hellwig
20** James Bottomley
21** Arjan van de Ven
22** 1.20.00.06 3/12/2005 Erich Chen fix with arcmsr_pci_unmap_dma "unsigned long" cast,
23** modify PCCB POOL allocated by "dma_alloc_coherent"
24** (Kornel Wieliczek's comment)
25** 1.20.00.07 3/23/2005 Erich Chen bug fix with arcmsr_scsi_host_template_init
26** occur segmentation fault,
27** if RAID adapter does not on PCI slot
28** and modprobe/rmmod this driver twice.
29** bug fix enormous stack usage (Adrian Bunk's comment)
30** 1.20.00.08 6/23/2005 Erich Chen bug fix with abort command,
31** in case of heavy loading when sata cable
32** working on low quality connection
33** 1.20.00.09 9/12/2005 Erich Chen bug fix with abort command handling, firmware version check
34** and firmware update notify for hardware bug fix
35** 1.20.00.10 9/23/2005 Erich Chen enhance sysfs function for change driver's max tag Q number.
36** add DMA_64BIT_MASK for backward compatible with all 2.6.x
37** add some useful message for abort command
38** add ioctl code 'ARCMSR_IOCTL_FLUSH_ADAPTER_CACHE'
39** customer can send this command for sync raid volume data
40** 1.20.00.11 9/29/2005 Erich Chen by comment of Arjan van de Ven fix incorrect msleep redefine
41** cast off sizeof(dma_addr_t) condition for 64bit pci_set_dma_mask
42** 1.20.00.12 9/30/2005 Erich Chen bug fix with 64bit platform's ccbs using if over 4G system memory
43** change 64bit pci_set_consistent_dma_mask into 32bit
44** incorrect adapter count if adapter initialization fails.
45** miss edit at arcmsr_build_ccb....
46** psge += sizeof(struct _SG64ENTRY *) =>
47** psge += sizeof(struct _SG64ENTRY)
48** 64 bits sg entry would be incorrectly calculated
49** thanks Kornel Wieliczek give me kindly notify
50** and detail description
51** 1.20.00.13 11/15/2005 Erich Chen scheduling pending ccb with FIFO
52** change the architecture of arcmsr command queue list
53** for linux standard list
54** enable usage of pci message signal interrupt
55** clean up this code following Randy Dunlap's kind suggestion
56************************************************************************** \ No newline at end of file
diff --git a/Documentation/scsi/ChangeLog.megaraid b/Documentation/scsi/ChangeLog.megaraid
index c173806c91fa..a056bbe67c7e 100644
--- a/Documentation/scsi/ChangeLog.megaraid
+++ b/Documentation/scsi/ChangeLog.megaraid
@@ -1,3 +1,126 @@
1Release Date : Fri May 19 09:31:45 EST 2006 - Seokmann Ju <sju@lsil.com>
2Current Version : 2.20.4.9 (scsi module), 2.20.2.6 (cmm module)
3Older Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module)
4
51. Fixed a bug in megaraid_init_mbox().
6 Customer reported "garbage in file on x86_64 platform".
7 Root Cause: the driver registered controllers as 64-bit DMA capable
8 even when they do not support it.
9 Fix: Changed the function to add an identification mechanism
10 for 64-bit DMA capable controllers.
11
12 > -----Original Message-----
13 > From: Vasily Averin [mailto:vvs@sw.ru]
14 > Sent: Thursday, May 04, 2006 2:49 PM
15 > To: linux-scsi@vger.kernel.org; Kolli, Neela; Mukker, Atul;
16 > Ju, Seokmann; Bagalkote, Sreenivas;
17 > James.Bottomley@SteelEye.com; devel@openvz.org
18 > Subject: megaraid_mbox: garbage in file
19 >
20 > Hello all,
21 >
22 > I've investigated customers claim on the unstable work of
23 > their node and found a
24 > strange effect: reading from some files leads to the
25 > "attempt to access beyond end of device" messages.
26 >
27 > I've checked filesystem, memory on the node, motherboard BIOS
28 > version, but it
29 > does not help and issue still has been reproduced by simple
30 > file reading.
31 >
32 > Reproducer is simple:
33 >
34 > echo 0xffffffff >/proc/sys/dev/scsi/logging_level ;
35 > cat /vz/private/101/root/etc/ld.so.cache >/tmp/ttt ;
36 > echo 0 >/proc/sys/dev/scsi/logging
37 >
38 > It leads to the following messages in dmesg
39 >
40 > sd_init_command: disk=sda, block=871769260, count=26
41 > sda : block=871769260
42 > sda : reading 26/26 512 byte blocks.
43 > scsi_add_timer: scmd: f79ed980, time: 7500, (c02b1420)
44 > sd 0:1:0:0: send 0xf79ed980 sd 0:1:0:0:
45 > command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00
46 > buffer = 0xf7cfb540, bufflen = 13312, done = 0xc0366b40,
47 > queuecommand 0xc0344010
48 > leaving scsi_dispatch_cmnd()
49 > scsi_delete_timer: scmd: f79ed980, rtn: 1
50 > sd 0:1:0:0: done 0xf79ed980 SUCCESS 0 sd 0:1:0:0:
51 > command: Read (10): 28 00 33 f6 24 ac 00 00 1a 00
52 > scsi host busy 1 failed 0
53 > sd 0:1:0:0: Notifying upper driver of completion (result 0)
54 > sd_rw_intr: sda: res=0x0
55 > 26 sectors total, 13312 bytes done.
56 > use_sg is 4
57 > attempt to access beyond end of device
58 > sda6: rw=0, want=1044134458, limit=951401367
59 > Buffer I/O error on device sda6, logical block 522067228
60 > attempt to access beyond end of device
61
622. When INQUIRY with the EVPD bit set is issued to the MegaRAID controller,
63 system memory gets corrupted.
64 Root Cause: MegaRAID F/W handles the INQUIRY with EVPD bit set
65 incorrectly.
66 Fix: MegaRAID F/W has fixed the problem and the fix is in the process of
67 being released. Meanwhile, the driver will filter out the request.
68
693. One member of a driver data structure causes an unaligned access
70 issue on 64-bit platforms.
71 Customer reported "kernel unaligned access address" issue when
72 application communicates with MegaRAID HBA driver.
73 Root Cause: in the uioc_t structure, one member was misaligned, which
74 led the system to display the error message.
75 Fix: A patch was submitted to the community by the following contributor.
76
77 > -----Original Message-----
78 > From: linux-scsi-owner@vger.kernel.org
79 > [mailto:linux-scsi-owner@vger.kernel.org] On Behalf Of Sakurai Hiroomi
80 > Sent: Wednesday, July 12, 2006 4:20 AM
81 > To: linux-scsi@vger.kernel.org; linux-kernel@vger.kernel.org
82 > Subject: Re: Help: strange messages from kernel on IA64 platform
83 >
84 > Hi,
85 >
86 > I saw same message.
87 >
88 > When GAM(Global Array Manager) is started, The following
89 > message output.
90 > kernel: kernel unaligned access to 0xe0000001fe1080d4,
91 > ip=0xa000000200053371
92 >
93 > The uioc structure used by ioctl is defined by packed,
94 > the allignment of each member are disturbed.
95 > In a 64 bit structure, the allignment of member doesn't fit 64 bit
96 > boundary. this causes this messages.
97 > In a 32 bit structure, we don't see the message because the allinment
98 > of member fit 32 bit boundary even if packed is specified.
99 >
100 > patch
101 > I Add 32 bit dummy member to fit 64 bit boundary. I tested.
102 > We confirmed this patch fix the problem by IA64 server.
103 >
104 > **************************************************************
105 > ****************
106 > --- linux-2.6.9/drivers/scsi/megaraid/megaraid_ioctl.h.orig
107 > 2006-04-03 17:13:03.000000000 +0900
108 > +++ linux-2.6.9/drivers/scsi/megaraid/megaraid_ioctl.h
109 > 2006-04-03 17:14:09.000000000 +0900
110 > @@ -132,6 +132,10 @@
111 > /* Driver Data: */
112 > void __user * user_data;
113 > uint32_t user_data_len;
114 > +
115 > + /* 64bit alignment */
116 > + uint32_t pad_0xBC;
117 > +
118 > mraid_passthru_t __user *user_pthru;
119 >
120 > mraid_passthru_t *pthru32;
121 > **************************************************************
122 > ****************
123
124Release Date : Mon Apr 11 12:27:22 EST 2006 - Seokmann Ju <sju@lsil.com>
125Current Version : 2.20.4.8 (scsi module), 2.20.2.6 (cmm module)
126Older Version : 2.20.4.7 (scsi module), 2.20.2.6 (cmm module)
diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas
index 0a85a7e8120e..d9e5960dafd5 100644
--- a/Documentation/scsi/ChangeLog.megaraid_sas
+++ b/Documentation/scsi/ChangeLog.megaraid_sas
@@ -1,4 +1,20 @@
1 1
21 Release Date : Sun May 14 22:49:52 PDT 2006 - Sumant Patro <Sumant.Patro@lsil.com>
32 Current Version : 00.00.03.01
43 Older Version : 00.00.02.04
5
6i. Added support for ZCR controller.
7
8 New device id 0x413 added.
9
10ii. Bug fix : Disable controller interrupt before firing INIT cmd to FW.
11
12 Interrupt is enabled after required initialization is over.
13 This is done to ensure that driver is ready to handle interrupts when
14 it is generated by the controller.
15
16 -Sumant Patro <Sumant.Patro@lsil.com>
17
181 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com>
192 Current Version : 00.00.02.04
203 Older Version : 00.00.02.04
diff --git a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt
index be55670851a4..ee03678c8029 100644
--- a/Documentation/scsi/aacraid.txt
+++ b/Documentation/scsi/aacraid.txt
@@ -11,38 +11,43 @@ the original).
 Supported Cards/Chipsets
 -------------------------
  PCI ID (pci.ids) OEM Product
- 9005:0285:9005:028a Adaptec 2020ZCR (Skyhawk)
- 9005:0285:9005:028e Adaptec 2020SA (Skyhawk)
- 9005:0285:9005:028b Adaptec 2025ZCR (Terminator)
- 9005:0285:9005:028f Adaptec 2025SA (Terminator)
- 9005:0285:9005:0286 Adaptec 2120S (Crusader)
- 9005:0286:9005:028d Adaptec 2130S (Lancer)
+ 9005:0283:9005:0283 Adaptec Catapult (3210S with arc firmware)
+ 9005:0284:9005:0284 Adaptec Tomcat (3410S with arc firmware)
  9005:0285:9005:0285 Adaptec 2200S (Vulcan)
+ 9005:0285:9005:0286 Adaptec 2120S (Crusader)
  9005:0285:9005:0287 Adaptec 2200S (Vulcan-2m)
+ 9005:0285:9005:0288 Adaptec 3230S (Harrier)
+ 9005:0285:9005:0289 Adaptec 3240S (Tornado)
+ 9005:0285:9005:028a Adaptec 2020ZCR (Skyhawk)
+ 9005:0285:9005:028b Adaptec 2025ZCR (Terminator)
  9005:0286:9005:028c Adaptec 2230S (Lancer)
  9005:0286:9005:028c Adaptec 2230SLP (Lancer)
- 9005:0285:9005:0296 Adaptec 2240S (SabreExpress)
+ 9005:0286:9005:028d Adaptec 2130S (Lancer)
+ 9005:0285:9005:028e Adaptec 2020SA (Skyhawk)
+ 9005:0285:9005:028f Adaptec 2025SA (Terminator)
  9005:0285:9005:0290 Adaptec 2410SA (Jaguar)
- 9005:0285:9005:0293 Adaptec 21610SA (Corsair-16)
  9005:0285:103c:3227 Adaptec 2610SA (Bearcat HP release)
+ 9005:0285:9005:0293 Adaptec 21610SA (Corsair-16)
+ 9005:0285:9005:0296 Adaptec 2240S (SabreExpress)
  9005:0285:9005:0292 Adaptec 2810SA (Corsair-8)
  9005:0285:9005:0294 Adaptec Prowler
- 9005:0286:9005:029d Adaptec 2420SA (Intruder HP release)
- 9005:0286:9005:029c Adaptec 2620SA (Intruder)
- 9005:0286:9005:029b Adaptec 2820SA (Intruder)
- 9005:0286:9005:02a7 Adaptec 2830SA (Skyray)
- 9005:0286:9005:02a8 Adaptec 2430SA (Skyray)
- 9005:0285:9005:0288 Adaptec 3230S (Harrier)
- 9005:0285:9005:0289 Adaptec 3240S (Tornado)
- 9005:0285:9005:0298 Adaptec 4000SAS (BlackBird)
  9005:0285:9005:0297 Adaptec 4005SAS (AvonPark)
+ 9005:0285:9005:0298 Adaptec 4000SAS (BlackBird)
  9005:0285:9005:0299 Adaptec 4800SAS (Marauder-X)
  9005:0285:9005:029a Adaptec 4805SAS (Marauder-E)
+ 9005:0286:9005:029b Adaptec 2820SA (Intruder)
+ 9005:0286:9005:029c Adaptec 2620SA (Intruder)
+ 9005:0286:9005:029d Adaptec 2420SA (Intruder HP release)
  9005:0286:9005:02a2 Adaptec 3800SAS (Hurricane44)
+ 9005:0286:9005:02a7 Adaptec 3805SAS (Hurricane80)
+ 9005:0286:9005:02a8 Adaptec 3400SAS (Hurricane40)
+ 9005:0286:9005:02ac Adaptec 1800SAS (Typhoon44)
+ 9005:0286:9005:02b3 Adaptec 2400SAS (Hurricane40lm)
+ 9005:0285:9005:02b5 Adaptec ASR5800 (Voodoo44)
+ 9005:0285:9005:02b6 Adaptec ASR5805 (Voodoo80)
+ 9005:0285:9005:02b7 Adaptec ASR5808 (Voodoo08)
  1011:0046:9005:0364 Adaptec 5400S (Mustang)
  1011:0046:9005:0365 Adaptec 5400S (Mustang)
- 9005:0283:9005:0283 Adaptec Catapult (3210S with arc firmware)
- 9005:0284:9005:0284 Adaptec Tomcat (3410S with arc firmware)
  9005:0287:9005:0800 Adaptec Themisto (Jupiter)
  9005:0200:9005:0200 Adaptec Themisto (Jupiter)
  9005:0286:9005:0800 Adaptec Callisto (Jupiter)
@@ -64,18 +69,20 @@ Supported Cards/Chipsets
  9005:0285:9005:0290 IBM ServeRAID 7t (Jaguar)
  9005:0285:1014:02F2 IBM ServeRAID 8i (AvonPark)
  9005:0285:1014:0312 IBM ServeRAID 8i (AvonParkLite)
- 9005:0286:1014:9580 IBM ServeRAID 8k/8k-l8 (Aurora)
  9005:0286:1014:9540 IBM ServeRAID 8k/8k-l4 (AuroraLite)
- 9005:0286:9005:029f ICP ICP9014R0 (Lancer)
+ 9005:0286:1014:9580 IBM ServeRAID 8k/8k-l8 (Aurora)
+ 9005:0286:1014:034d IBM ServeRAID 8s (Hurricane)
  9005:0286:9005:029e ICP ICP9024R0 (Lancer)
+ 9005:0286:9005:029f ICP ICP9014R0 (Lancer)
  9005:0286:9005:02a0 ICP ICP9047MA (Lancer)
  9005:0286:9005:02a1 ICP ICP9087MA (Lancer)
+ 9005:0286:9005:02a3 ICP ICP5445AU (Hurricane44)
  9005:0286:9005:02a4 ICP ICP9085LI (Marauder-X)
  9005:0286:9005:02a5 ICP ICP5085BR (Marauder-E)
- 9005:0286:9005:02a3 ICP ICP5445AU (Hurricane44)
  9005:0286:9005:02a6 ICP ICP9067MA (Intruder-6)
- 9005:0286:9005:02a9 ICP ICP5087AU (Skyray)
- 9005:0286:9005:02aa ICP ICP5047AU (Skyray)
+ 9005:0286:9005:02a9 ICP ICP5085AU (Hurricane80)
+ 9005:0286:9005:02aa ICP ICP5045AU (Hurricane40)
+ 9005:0286:9005:02b4 ICP ICP5045AL (Hurricane40lm)
 
 People
 -------------------------
diff --git a/Documentation/scsi/arcmsr_spec.txt b/Documentation/scsi/arcmsr_spec.txt
new file mode 100644
index 000000000000..5e0042340fd3
--- /dev/null
+++ b/Documentation/scsi/arcmsr_spec.txt
@@ -0,0 +1,574 @@
1*******************************************************************************
2** ARECA FIRMWARE SPEC
3*******************************************************************************
4** Usage of IOP331 adapter
5** (All In/Out is in IOP331's view)
6** 1. Message 0 --> InitThread message and return code
7** 2. Doorbell is used for RS-232 emulation
8** inDoorBell : bit0 -- data in ready
9** (DRIVER DATA WRITE OK)
10** bit1 -- data out has been read
11** (DRIVER DATA READ OK)
12** outDoorBell: bit0 -- data out ready
13** (IOP331 DATA WRITE OK)
14** bit1 -- data in has been read
15** (IOP331 DATA READ OK)
16** 3. Index Memory Usage
17** offset 0xf00 : for RS232 out (request buffer)
18** offset 0xe00 : for RS232 in (scratch buffer)
19** offset 0xa00 : for inbound message code message_rwbuffer
20** (driver send to IOP331)
21** offset 0xa00 : for outbound message code message_rwbuffer
22** (IOP331 send to driver)
23** 4. RS-232 emulation
24** Currently 128 byte buffer is used
25** 1st uint32_t : Data length (1--124)
26** Byte 4--127 : Max 124 bytes of data
27** 5. PostQ
28** All SCSI Command must be sent through postQ:
29** (inbound queue port) Request frame must be 32 bytes aligned
30** #bit27--bit31 => flag for post ccb
31** #bit0--bit26 => real address (bit27--bit31) of post arcmsr_cdb
32** bit31 :
33** 0 : 256 bytes frame
34** 1 : 512 bytes frame
35** bit30 :
36** 0 : normal request
37** 1 : BIOS request
38** bit29 : reserved
39** bit28 : reserved
40** bit27 : reserved
41** ---------------------------------------------------------------------------
42** (outbound queue port) Request reply
43** #bit27--bit31
44** => flag for reply
45** #bit0--bit26
46** => real address (bit27--bit31) of reply arcmsr_cdb
47** bit31 : must be 0 (for this type of reply)
48** bit30 : reserved for BIOS handshake
49** bit29 : reserved
50** bit28 :
51** 0 : no error, ignore AdapStatus/DevStatus/SenseData
52** 1 : Error, error code in AdapStatus/DevStatus/SenseData
53** bit27 : reserved
54** 6. BIOS request
55** All BIOS request is the same with request from PostQ
56** Except :
57** Request frame is sent from configuration space
58** offset: 0x78 : Request Frame (bit30 == 1)
59** offset: 0x18 : writeonly to generate
60** IRQ to IOP331
61** Completion of request:
62** (bit30 == 0, bit28==err flag)
63** 7. Definition of SGL entry (structure)
64** 8. Message1 Out - Diag Status Code (????)
65** 9. Message0 message code :
66** 0x00 : NOP
67** 0x01 : Get Config
68** ->offset 0xa00 :for outbound message code message_rwbuffer
69** (IOP331 send to driver)
70** Signature 0x87974060(4)
71** Request len 0x00000200(4)
72** numbers of queue 0x00000100(4)
73** SDRAM Size 0x00000100(4)-->256 MB
74** IDE Channels 0x00000008(4)
75** vendor 40 bytes char
76** model 8 bytes char
77** FirmVer 16 bytes char
78** Device Map 16 bytes char
79** FirmwareVersion DWORD <== Added for checking of
80** new firmware capability
81** 0x02 : Set Config
82** ->offset 0xa00 :for inbound message code message_rwbuffer
83** (driver send to IOP331)
84** Signature 0x87974063(4)
85** UPPER32 of Request Frame (4)-->Driver Only
86** 0x03 : Reset (Abort all queued Command)
87** 0x04 : Stop Background Activity
88** 0x05 : Flush Cache
89** 0x06 : Start Background Activity
90** (re-start if background is halted)
91** 0x07 : Check If Host Command Pending
92** (Novell May Need This Function)
93** 0x08 : Set controller time
94** ->offset 0xa00 : for inbound message code message_rwbuffer
95** (driver to IOP331)
96** byte 0 : 0xaa <-- signature
97** byte 1 : 0x55 <-- signature
98** byte 2 : year (04)
99** byte 3 : month (1..12)
100** byte 4 : date (1..31)
101** byte 5 : hour (0..23)
102** byte 6 : minute (0..59)
103** byte 7 : second (0..59)
104*******************************************************************************
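
The PostQ word layout from item 5 above can be sketched as a pair of helper
functions. This is a reading of the spec, not driver code: it assumes that,
because request frames are 32-byte aligned, the 27 low bits of the word carry
the frame address shifted right by 5 bits. All names are made up for the
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define POSTQ_FLAG_512B  (1u << 31)	/* bit31: 1 = 512 bytes frame */
#define POSTQ_FLAG_BIOS  (1u << 30)	/* bit30: 1 = BIOS request */
#define REPLY_FLAG_ERROR (1u << 28)	/* bit28 of a reply: error */
#define POSTQ_ADDR_MASK  ((1u << 27) - 1)	/* bit0--bit26 */

/* Build the word written to the inbound queue port. */
static uint32_t postq_encode(uint32_t frame_addr, bool large, bool bios)
{
	uint32_t word;

	assert((frame_addr & 31u) == 0);	/* frame must be 32-byte aligned */
	word = (frame_addr >> 5) & POSTQ_ADDR_MASK;	/* assumed shift */
	if (large)
		word |= POSTQ_FLAG_512B;
	if (bios)
		word |= POSTQ_FLAG_BIOS;
	return word;
}

/* Recover the frame address from a posted or replied word. */
static uint32_t postq_frame_addr(uint32_t word)
{
	return (word & POSTQ_ADDR_MASK) << 5;
}

/* True when a reply word says AdapStatus/DevStatus/SenseData are valid. */
static bool reply_has_error(uint32_t word)
{
	return (word & REPLY_FLAG_ERROR) != 0;
}
```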
105*******************************************************************************
106** RS-232 Interface for Areca Raid Controller
107** The low level command interface is exclusive with VT100 terminal
108** --------------------------------------------------------------------
109** 1. Sequence of command execution
110** --------------------------------------------------------------------
111** (A) Header : 3 bytes sequence (0x5E, 0x01, 0x61)
112** (B) Command block : variable length of data including length,
113** command code, data and checksum byte
114** (C) Return data : variable length of data
115** --------------------------------------------------------------------
116** 2. Command block
117** --------------------------------------------------------------------
118** (A) 1st byte : command block length (low byte)
119** (B) 2nd byte : command block length (high byte)
120** note: command block length must not exceed 2040 bytes;
121** the length excludes these two bytes
122** (C) 3rd byte : command code
123** (D) 4th and following bytes : variable length data bytes
124** depends on command code
125** (E) last byte : checksum byte (sum of 1st byte until last data byte)
126** --------------------------------------------------------------------
127** 3. Command code and associated data
128** --------------------------------------------------------------------
129** The following are command codes defined in the raid controller.
130** Command codes 0x10--0x1? are used for system level management;
131** no password checking is needed, and they should be implemented in a separate
132** well controlled utility and not for end user access.
133** Command codes 0x20--0x?? always check the password;
134** the password must be entered to enable these commands.
135** enum
136** {
137** GUI_SET_SERIAL=0x10,
138** GUI_SET_VENDOR,
139** GUI_SET_MODEL,
140** GUI_IDENTIFY,
141** GUI_CHECK_PASSWORD,
142** GUI_LOGOUT,
143** GUI_HTTP,
144** GUI_SET_ETHERNET_ADDR,
145** GUI_SET_LOGO,
146** GUI_POLL_EVENT,
147** GUI_GET_EVENT,
148** GUI_GET_HW_MONITOR,
149** // GUI_QUICK_CREATE=0x20, (function removed)
150** GUI_GET_INFO_R=0x20,
151** GUI_GET_INFO_V,
152** GUI_GET_INFO_P,
153** GUI_GET_INFO_S,
154** GUI_CLEAR_EVENT,
155** GUI_MUTE_BEEPER=0x30,
156** GUI_BEEPER_SETTING,
157** GUI_SET_PASSWORD,
158** GUI_HOST_INTERFACE_MODE,
159** GUI_REBUILD_PRIORITY,
160** GUI_MAX_ATA_MODE,
161** GUI_RESET_CONTROLLER,
162** GUI_COM_PORT_SETTING,
163** GUI_NO_OPERATION,
164** GUI_DHCP_IP,
165** GUI_CREATE_PASS_THROUGH=0x40,
166** GUI_MODIFY_PASS_THROUGH,
167** GUI_DELETE_PASS_THROUGH,
168** GUI_IDENTIFY_DEVICE,
169** GUI_CREATE_RAIDSET=0x50,
170** GUI_DELETE_RAIDSET,
171** GUI_EXPAND_RAIDSET,
172** GUI_ACTIVATE_RAIDSET,
173** GUI_CREATE_HOT_SPARE,
174** GUI_DELETE_HOT_SPARE,
175** GUI_CREATE_VOLUME=0x60,
176** GUI_MODIFY_VOLUME,
177** GUI_DELETE_VOLUME,
178** GUI_START_CHECK_VOLUME,
179** GUI_STOP_CHECK_VOLUME
180** };
181** Command description :
182** GUI_SET_SERIAL : Set the controller serial#
183** byte 0,1 : length
184** byte 2 : command code 0x10
185** byte 3 : password length (should be 0x0f)
186** byte 4-0x13 : should be "ArEcATecHnoLogY"
187** byte 0x14--0x23 : Serial number string (must be 16 bytes)
188** GUI_SET_VENDOR : Set vendor string for the controller
189** byte 0,1 : length
190** byte 2 : command code 0x11
191** byte 3 : password length (should be 0x08)
192** byte 4-0x13 : should be "ArEcAvAr"
193** byte 0x14--0x3B : vendor string (must be 40 bytes)
194** GUI_SET_MODEL : Set the model name of the controller
195** byte 0,1 : length
196** byte 2 : command code 0x12
197** byte 3 : password length (should be 0x08)
198** byte 4-0x13 : should be "ArEcAvAr"
199** byte 0x14--0x1B : model string (must be 8 bytes)
200** GUI_IDENTIFY : Identify device
201** byte 0,1 : length
202** byte 2 : command code 0x13
203** return "Areca RAID Subsystem "
204** GUI_CHECK_PASSWORD : Verify password
205** byte 0,1 : length
206** byte 2 : command code 0x14
207** byte 3 : password length
208** byte 4-0x?? : user password to be checked
209** GUI_LOGOUT : Logout GUI (force password checking on next command)
210** byte 0,1 : length
211** byte 2 : command code 0x15
212** GUI_HTTP : HTTP interface (reserved for Http proxy service)(0x16)
213**
214** GUI_SET_ETHERNET_ADDR : Set the ethernet MAC address
215** byte 0,1 : length
216** byte 2 : command code 0x17
217** byte 3 : password length (should be 0x08)
218** byte 4-0x13 : should be "ArEcAvAr"
219** byte 0x14--0x19 : Ethernet MAC address (must be 6 bytes)
220** GUI_SET_LOGO : Set logo in HTTP
221** byte 0,1 : length
222** byte 2 : command code 0x18
223** byte 3 : Page# (0/1/2/3) (0xff --> clear OEM logo)
224** byte 4/5/6/7 : 0x55/0xaa/0xa5/0x5a
225** byte 8 : TITLE.JPG data (each page must be 2000 bytes)
226** note page0 1st 2 byte must be
227** actual length of the JPG file
228** GUI_POLL_EVENT : Poll If Event Log Changed
229** byte 0,1 : length
230** byte 2 : command code 0x19
231** GUI_GET_EVENT : Read Event
232** byte 0,1 : length
233** byte 2 : command code 0x1a
234** byte 3 : Event Page (0:1st page/1/2/3:last page)
235** GUI_GET_HW_MONITOR : Get HW monitor data
236** byte 0,1 : length
237** byte 2 : command code 0x1b
238** byte 3 : # of FANs(example 2)
239** byte 4 : # of Voltage sensor(example 3)
240** byte 5 : # of temperature sensor(example 2)
241** byte 6 : # of power
242** byte 7/8 : Fan#0 (RPM)
243** byte 9/10 : Fan#1
244** byte 11/12 : Voltage#0 original value in *1000
245** byte 13/14 : Voltage#0 value
246** byte 15/16 : Voltage#1 org
247** byte 17/18 : Voltage#1
248** byte 19/20 : Voltage#2 org
249** byte 21/22 : Voltage#2
250** byte 23 : Temp#0
251** byte 24 : Temp#1
252** byte 25 : Power indicator (bit0 : power#0,
253** bit1 : power#1)
254** byte 26 : UPS indicator
255** GUI_QUICK_CREATE : Quick create raid/volume set
256** byte 0,1 : length
257** byte 2 : command code 0x20
258** byte 3/4/5/6 : raw capacity
259** byte 7 : raid level
260** byte 8 : stripe size
261** byte 9 : spare
262** byte 10/11/12/13: device mask (the devices to create raid/volume)
** This function has been removed; an application that wants
** to implement a quick-create function
** needs to use GUI_CREATE_RAIDSET and GUI_CREATE_VOLUME instead.
266** GUI_GET_INFO_R : Get Raid Set Information
267** byte 0,1 : length
268** byte 2 : command code 0x20
269** byte 3 : raidset#
270** typedef struct sGUI_RAIDSET
271** {
272** BYTE grsRaidSetName[16];
273** DWORD grsCapacity;
274** DWORD grsCapacityX;
275** DWORD grsFailMask;
276** BYTE grsDevArray[32];
277** BYTE grsMemberDevices;
278** BYTE grsNewMemberDevices;
279** BYTE grsRaidState;
280** BYTE grsVolumes;
281** BYTE grsVolumeList[16];
282** BYTE grsRes1;
283** BYTE grsRes2;
284** BYTE grsRes3;
285** BYTE grsFreeSegments;
286** DWORD grsRawStripes[8];
287** DWORD grsRes4;
** DWORD grsRes5;
289** DWORD grsRes6; // Total to 128 bytes
290** } sGUI_RAIDSET, *pGUI_RAIDSET;
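The 128-byte total noted in the struct comments can be checked at compile time. The sketch below assumes that BYTE is an 8-bit and DWORD a 32-bit unsigned type, which the text does not define, and uses a hypothetical struct name to avoid clashing with the original typedef.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t  BYTE;   /* assumed: 8-bit unsigned */
typedef uint32_t DWORD;  /* assumed: 32-bit unsigned */

/* Same field order as sGUI_RAIDSET; the struct name is illustrative. */
struct gui_raidset_check {
    BYTE  grsRaidSetName[16];
    DWORD grsCapacity;
    DWORD grsCapacityX;
    DWORD grsFailMask;
    BYTE  grsDevArray[32];
    BYTE  grsMemberDevices;
    BYTE  grsNewMemberDevices;
    BYTE  grsRaidState;
    BYTE  grsVolumes;
    BYTE  grsVolumeList[16];
    BYTE  grsRes1;
    BYTE  grsRes2;
    BYTE  grsRes3;
    BYTE  grsFreeSegments;
    DWORD grsRawStripes[8];
    DWORD grsRes4;
    DWORD grsRes5;
    DWORD grsRes6;
};

/* With natural alignment every DWORD lands on a 4-byte boundary. */
_Static_assert(sizeof(struct gui_raidset_check) == 128,
               "sGUI_RAIDSET layout should total 128 bytes");
_Static_assert(offsetof(struct gui_raidset_check, grsRawStripes) == 84,
               "grsRawStripes should start at byte 84");
```

Because no padding is needed with this field order, the same layout results with or without a packing attribute on common ABIs.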
291** GUI_GET_INFO_V : Get Volume Set Information
292** byte 0,1 : length
293** byte 2 : command code 0x21
294** byte 3 : volumeset#
295** typedef struct sGUI_VOLUMESET
296** {
297** BYTE gvsVolumeName[16]; // 16
298** DWORD gvsCapacity;
299** DWORD gvsCapacityX;
300** DWORD gvsFailMask;
301** DWORD gvsStripeSize;
302** DWORD gvsNewFailMask;
303** DWORD gvsNewStripeSize;
304** DWORD gvsVolumeStatus;
305** DWORD gvsProgress; // 32
306** sSCSI_ATTR gvsScsi;
307** BYTE gvsMemberDisks;
308** BYTE gvsRaidLevel; // 8
309** BYTE gvsNewMemberDisks;
310** BYTE gvsNewRaidLevel;
311** BYTE gvsRaidSetNumber;
312** BYTE gvsRes0; // 4
313** BYTE gvsRes1[4]; // 64 bytes
314** } sGUI_VOLUMESET, *pGUI_VOLUMESET;
315** GUI_GET_INFO_P : Get Physical Drive Information
316** byte 0,1 : length
317** byte 2 : command code 0x22
318** byte 3 : drive # (from 0 to max-channels - 1)
319** typedef struct sGUI_PHY_DRV
320** {
321** BYTE gpdModelName[40];
322** BYTE gpdSerialNumber[20];
323** BYTE gpdFirmRev[8];
324** DWORD gpdCapacity;
325** DWORD gpdCapacityX; // Reserved for expansion
326** BYTE gpdDeviceState;
327** BYTE gpdPioMode;
328** BYTE gpdCurrentUdmaMode;
329** BYTE gpdUdmaMode;
330** BYTE gpdDriveSelect;
331** BYTE gpdRaidNumber; // 0xff if not belongs to a raid set
332** sSCSI_ATTR gpdScsi;
333** BYTE gpdReserved[40]; // Total to 128 bytes
334** } sGUI_PHY_DRV, *pGUI_PHY_DRV;
335** GUI_GET_INFO_S : Get System Information
336** byte 0,1 : length
337** byte 2 : command code 0x23
338** typedef struct sCOM_ATTR
339** {
340** BYTE comBaudRate;
341** BYTE comDataBits;
342** BYTE comStopBits;
343** BYTE comParity;
344** BYTE comFlowControl;
345** } sCOM_ATTR, *pCOM_ATTR;
346** typedef struct sSYSTEM_INFO
347** {
348** BYTE gsiVendorName[40];
349** BYTE gsiSerialNumber[16];
350** BYTE gsiFirmVersion[16];
351** BYTE gsiBootVersion[16];
352** BYTE gsiMbVersion[16];
353** BYTE gsiModelName[8];
354** BYTE gsiLocalIp[4];
355** BYTE gsiCurrentIp[4];
356** DWORD gsiTimeTick;
357** DWORD gsiCpuSpeed;
358** DWORD gsiICache;
359** DWORD gsiDCache;
360** DWORD gsiScache;
361** DWORD gsiMemorySize;
362** DWORD gsiMemorySpeed;
363** DWORD gsiEvents;
364** BYTE gsiMacAddress[6];
365** BYTE gsiDhcp;
366** BYTE gsiBeeper;
367** BYTE gsiChannelUsage;
368** BYTE gsiMaxAtaMode;
369** BYTE gsiSdramEcc; // 1:if ECC enabled
370** BYTE gsiRebuildPriority;
371** sCOM_ATTR gsiComA; // 5 bytes
372** sCOM_ATTR gsiComB; // 5 bytes
373** BYTE gsiIdeChannels;
374** BYTE gsiScsiHostChannels;
375** BYTE gsiIdeHostChannels;
376** BYTE gsiMaxVolumeSet;
377** BYTE gsiMaxRaidSet;
378** BYTE gsiEtherPort; // 1:if ether net port supported
379** BYTE gsiRaid6Engine; // 1:Raid6 engine supported
380** BYTE gsiRes[75];
381** } sSYSTEM_INFO, *pSYSTEM_INFO;
382** GUI_CLEAR_EVENT : Clear System Event
383** byte 0,1 : length
384** byte 2 : command code 0x24
385** GUI_MUTE_BEEPER : Mute current beeper
386** byte 0,1 : length
387** byte 2 : command code 0x30
** GUI_BEEPER_SETTING : Enable or disable the beeper
389** byte 0,1 : length
390** byte 2 : command code 0x31
391** byte 3 : 0->disable, 1->enable
392** GUI_SET_PASSWORD : Change password
393** byte 0,1 : length
394** byte 2 : command code 0x32
** byte 3 : password length (must be <= 15)
** byte 4 : password (must be alphanumeric)
397** GUI_HOST_INTERFACE_MODE : Set host interface mode
398** byte 0,1 : length
399** byte 2 : command code 0x33
400** byte 3 : 0->Independent, 1->cluster
401** GUI_REBUILD_PRIORITY : Set rebuild priority
402** byte 0,1 : length
403** byte 2 : command code 0x34
404** byte 3 : 0/1/2/3 (low->high)
405** GUI_MAX_ATA_MODE : Set maximum ATA mode to be used
406** byte 0,1 : length
407** byte 2 : command code 0x35
408** byte 3 : 0/1/2/3 (133/100/66/33)
409** GUI_RESET_CONTROLLER : Reset Controller
410** byte 0,1 : length
411** byte 2 : command code 0x36
412** *Response with VT100 screen (discard it)
413** GUI_COM_PORT_SETTING : COM port setting
414** byte 0,1 : length
415** byte 2 : command code 0x37
416** byte 3 : 0->COMA (term port),
417** 1->COMB (debug port)
418** byte 4 : 0/1/2/3/4/5/6/7
419** (1200/2400/4800/9600/19200/38400/57600/115200)
420** byte 5 : data bit
421** (0:7 bit, 1:8 bit : must be 8 bit)
422** byte 6 : stop bit (0:1, 1:2 stop bits)
** byte 7 : parity (0:none, 1:odd, 2:even)
424** byte 8 : flow control
425** (0:none, 1:xon/xoff, 2:hardware => must use none)
426** GUI_NO_OPERATION : No operation
427** byte 0,1 : length
428** byte 2 : command code 0x38
429** GUI_DHCP_IP : Set DHCP option and local IP address
430** byte 0,1 : length
431** byte 2 : command code 0x39
432** byte 3 : 0:dhcp disabled, 1:dhcp enabled
433** byte 4/5/6/7 : IP address
434** GUI_CREATE_PASS_THROUGH : Create pass through disk
435** byte 0,1 : length
436** byte 2 : command code 0x40
437** byte 3 : device #
438** byte 4 : scsi channel (0/1)
439** byte 5 : scsi id (0-->15)
440** byte 6 : scsi lun (0-->7)
441** byte 7 : tagged queue (1 : enabled)
442** byte 8 : cache mode (1 : enabled)
443** byte 9 : max speed (0/1/2/3/4,
444** async/20/40/80/160 for scsi)
445** (0/1/2/3/4, 33/66/100/133/150 for ide )
446** GUI_MODIFY_PASS_THROUGH : Modify pass through disk
447** byte 0,1 : length
448** byte 2 : command code 0x41
449** byte 3 : device #
450** byte 4 : scsi channel (0/1)
451** byte 5 : scsi id (0-->15)
452** byte 6 : scsi lun (0-->7)
453** byte 7 : tagged queue (1 : enabled)
454** byte 8 : cache mode (1 : enabled)
455** byte 9 : max speed (0/1/2/3/4,
456** async/20/40/80/160 for scsi)
457** (0/1/2/3/4, 33/66/100/133/150 for ide )
458** GUI_DELETE_PASS_THROUGH : Delete pass through disk
459** byte 0,1 : length
460** byte 2 : command code 0x42
461** byte 3 : device# to be deleted
462** GUI_IDENTIFY_DEVICE : Identify Device
463** byte 0,1 : length
464** byte 2 : command code 0x43
465** byte 3 : Flash Method
466** (0:flash selected, 1:flash not selected)
467** byte 4/5/6/7 : IDE device mask to be flashed
468** note .... no response data available
469** GUI_CREATE_RAIDSET : Create Raid Set
470** byte 0,1 : length
471** byte 2 : command code 0x50
472** byte 3/4/5/6 : device mask
473** byte 7-22 : raidset name (if byte 7 == 0:use default)
474** GUI_DELETE_RAIDSET : Delete Raid Set
475** byte 0,1 : length
476** byte 2 : command code 0x51
477** byte 3 : raidset#
478** GUI_EXPAND_RAIDSET : Expand Raid Set
479** byte 0,1 : length
480** byte 2 : command code 0x52
481** byte 3 : raidset#
482** byte 4/5/6/7 : device mask for expansion
483** byte 8/9/10 : (8:0 no change, 1 change, 0xff:terminate,
484** 9:new raid level,
485** 10:new stripe size
486** 0/1/2/3/4/5->4/8/16/32/64/128K )
487** byte 11/12/13 : repeat for each volume in the raidset
488** GUI_ACTIVATE_RAIDSET : Activate incomplete raid set
489** byte 0,1 : length
490** byte 2 : command code 0x53
491** byte 3 : raidset#
492** GUI_CREATE_HOT_SPARE : Create hot spare disk
493** byte 0,1 : length
494** byte 2 : command code 0x54
495** byte 3/4/5/6 : device mask for hot spare creation
496** GUI_DELETE_HOT_SPARE : Delete hot spare disk
497** byte 0,1 : length
498** byte 2 : command code 0x55
499** byte 3/4/5/6 : device mask for hot spare deletion
500** GUI_CREATE_VOLUME : Create volume set
501** byte 0,1 : length
502** byte 2 : command code 0x60
503** byte 3 : raidset#
504** byte 4-19 : volume set name
505** (if byte4 == 0, use default)
506** byte 20-27 : volume capacity (blocks)
507** byte 28 : raid level
508** byte 29 : stripe size
509** (0/1/2/3/4/5->4/8/16/32/64/128K)
510** byte 30 : channel
511** byte 31 : ID
512** byte 32 : LUN
513** byte 33 : 1 enable tag
514** byte 34 : 1 enable cache
515** byte 35 : speed
516** (0/1/2/3/4->async/20/40/80/160 for scsi)
517** (0/1/2/3/4->33/66/100/133/150 for IDE )
518** byte 36 : 1 to select quick init
519**
520** GUI_MODIFY_VOLUME : Modify volume Set
521** byte 0,1 : length
522** byte 2 : command code 0x61
523** byte 3 : volumeset#
524** byte 4-19 : new volume set name
525** (if byte4 == 0, not change)
526** byte 20-27 : new volume capacity (reserved)
527** byte 28 : new raid level
528** byte 29 : new stripe size
529** (0/1/2/3/4/5->4/8/16/32/64/128K)
530** byte 30 : new channel
531** byte 31 : new ID
532** byte 32 : new LUN
533** byte 33 : 1 enable tag
534** byte 34 : 1 enable cache
535** byte 35 : speed
536** (0/1/2/3/4->async/20/40/80/160 for scsi)
537** (0/1/2/3/4->33/66/100/133/150 for IDE )
538** GUI_DELETE_VOLUME : Delete volume set
539** byte 0,1 : length
540** byte 2 : command code 0x62
541** byte 3 : volumeset#
542** GUI_START_CHECK_VOLUME : Start volume consistency check
543** byte 0,1 : length
544** byte 2 : command code 0x63
545** byte 3 : volumeset#
546** GUI_STOP_CHECK_VOLUME : Stop volume consistency check
547** byte 0,1 : length
548** byte 2 : command code 0x64
549** ---------------------------------------------------------------------
550** 4. Returned data
551** ---------------------------------------------------------------------
552** (A) Header : 3 bytes sequence (0x5E, 0x01, 0x61)
553** (B) Length : 2 bytes
554** (low byte 1st, excludes length and checksum byte)
555** (C) status or data :
556** <1> If length == 1 ==> 1 byte status code
557** #define GUI_OK 0x41
558** #define GUI_RAIDSET_NOT_NORMAL 0x42
559** #define GUI_VOLUMESET_NOT_NORMAL 0x43
560** #define GUI_NO_RAIDSET 0x44
561** #define GUI_NO_VOLUMESET 0x45
562** #define GUI_NO_PHYSICAL_DRIVE 0x46
563** #define GUI_PARAMETER_ERROR 0x47
564** #define GUI_UNSUPPORTED_COMMAND 0x48
565** #define GUI_DISK_CONFIG_CHANGED 0x49
566** #define GUI_INVALID_PASSWORD 0x4a
567** #define GUI_NO_DISK_SPACE 0x4b
568** #define GUI_CHECKSUM_ERROR 0x4c
569** #define GUI_PASSWORD_REQUIRED 0x4d
570** <2> If length > 1 ==>
571** data block returned from controller
572** and the contents depends on the command code
** (D) Checksum : checksum of length and status or data byte
574**************************************************************************
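The reply format in section 4 could be validated along the following lines. This is a sketch under assumptions: arc_parse_reply() is a hypothetical helper, and the checksum is taken to be the 8-bit sum of the two length bytes and the status/data bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define GUI_OK 0x41

/*
 * Validate a reply frame: header, length, payload, checksum.
 * On success, returns the payload length and points *payload at the
 * status or data bytes. Returns -1 if the frame is malformed.
 * ASSUMPTION: the checksum is an 8-bit sum over the length bytes and
 * the payload.
 */
int arc_parse_reply(const uint8_t *buf, size_t len,
                    const uint8_t **payload)
{
    size_t plen, i;
    uint8_t sum = 0;

    if (len < 3 + 2 + 1)                    /* header, length, checksum */
        return -1;
    if (buf[0] != 0x5E || buf[1] != 0x01 || buf[2] != 0x61)
        return -1;

    plen = (size_t)buf[3] | ((size_t)buf[4] << 8);  /* low byte first */
    if (len < 3 + 2 + plen + 1)
        return -1;

    for (i = 3; i < 5 + plen; i++)
        sum = (uint8_t)(sum + buf[i]);
    if (sum != buf[5 + plen])
        return -1;

    *payload = buf + 5;
    return (int)plen;
}
```

A caller would treat a returned length of 1 as a status code (for example GUI_OK) and anything longer as a command-specific data block.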
diff --git a/Documentation/scsi/libsas.txt b/Documentation/scsi/libsas.txt
new file mode 100644
index 000000000000..9e2078b2a615
--- /dev/null
+++ b/Documentation/scsi/libsas.txt
@@ -0,0 +1,484 @@
1SAS Layer
2---------
3
4The SAS Layer is a management infrastructure which manages
5SAS LLDDs. It sits between SCSI Core and SAS LLDDs. The
6layout is as follows: while SCSI Core is concerned with
7SAM/SPC issues, and a SAS LLDD+sequencer is concerned with
8phy/OOB/link management, the SAS layer is concerned with:
9
10 * SAS Phy/Port/HA event management (LLDD generates,
11 SAS Layer processes),
12 * SAS Port management (creation/destruction),
13 * SAS Domain discovery and revalidation,
14 * SAS Domain device management,
15 * SCSI Host registration/unregistration,
16 * Device registration with SCSI Core (SAS) or libata
17 (SATA), and
18 * Expander management and exporting expander control
19 to user space.
20
21A SAS LLDD is a PCI device driver. It is concerned with
22phy/OOB management, and vendor specific tasks and generates
23events to the SAS layer.
24
25The SAS Layer does most SAS tasks as outlined in the SAS 1.1
26spec.
27
28The sas_ha_struct describes the SAS LLDD to the SAS layer.
29Most of it is used by the SAS Layer but a few fields need to
30be initialized by the LLDDs.
31
32After initializing your hardware, from the probe() function
33you call sas_register_ha(). It will register your LLDD with
34the SCSI subsystem, creating a SCSI host and it will
35register your SAS driver with the sysfs SAS tree it creates.
36It will then return. Then you enable your phys to actually
37start OOB (at which point your driver will start calling the
38notify_* event callbacks).
39
40Structure descriptions:
41
42struct sas_phy --------------------
Normally this is statically embedded in your driver's
phy structure:
	struct my_phy {
		blah;
		struct sas_phy sas_phy;
		bleh;
	};
All the phys are then kept as an array of my_phy in your HA
struct (shown below).
52
53Then as you go along and initialize your phys you also
54initialize the sas_phy struct, along with your own
55phy structure.
56
57In general, the phys are managed by the LLDD and the ports
58are managed by the SAS layer. So the phys are initialized
59and updated by the LLDD and the ports are initialized and
60updated by the SAS layer.
61
The scheme is that some fields are read-write for the LLDD
and read-only for the SAS layer, and vice versa.
The idea is to avoid unnecessary locking.
65
66enabled -- must be set (0/1)
67id -- must be set [0,MAX_PHYS)
68class, proto, type, role, oob_mode, linkrate -- must be set
69oob_mode -- you set this when OOB has finished and then notify
70the SAS Layer.
71
72sas_addr -- this normally points to an array holding the sas
73address of the phy, possibly somewhere in your my_phy
74struct.
75
76attached_sas_addr -- set this when you (LLDD) receive an
77IDENTIFY frame or a FIS frame, _before_ notifying the SAS
78layer. The idea is that sometimes the LLDD may want to fake
79or provide a different SAS address on that phy/port and this
80allows it to do this. At best you should copy the sas
81address from the IDENTIFY frame or maybe generate a SAS
82address for SATA directly attached devices. The Discover
83process may later change this.
84
85frame_rcvd -- this is where you copy the IDENTIFY/FIS frame
86when you get it; you lock, copy, set frame_rcvd_size and
87unlock the lock, and then call the event. It is a pointer
88since there's no way to know your hw frame size _exactly_,
89so you define the actual array in your phy struct and let
90this pointer point to it. You copy the frame from your
91DMAable memory to that area holding the lock.
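The lock, copy, set size, unlock, notify sequence for frame_rcvd might look like the user-space model below. Everything except the field names frame_rcvd and frame_rcvd_size, the callback notify_port_event and the event PORTE_BYTES_DMAED is a stand-in: the lock type, the lock field name frame_rcvd_lock, MY_FRAME_MAX, my_frame_received() and record_event() are assumptions made for illustration, and a real driver would use the kernel's locking primitives.

```c
#include <stddef.h>
#include <string.h>

#define MY_FRAME_MAX 32   /* hypothetical hardware frame size */

/* Stand-in for a kernel lock; it only records whether it is held. */
struct mock_lock { int held; };
void mock_lock(struct mock_lock *l)   { l->held = 1; }
void mock_unlock(struct mock_lock *l) { l->held = 0; }

enum port_event { PORTE_BYTES_DMAED };

struct sas_phy;

struct sas_ha_struct {
    void (*notify_port_event)(struct sas_phy *, enum port_event);
};

struct sas_phy {
    struct mock_lock frame_rcvd_lock;   /* field name is an assumption */
    unsigned char *frame_rcvd;          /* points into the LLDD's phy */
    size_t frame_rcvd_size;
    struct sas_ha_struct *ha;
};

struct my_phy {
    struct sas_phy sas_phy;
    unsigned char frame_buf[MY_FRAME_MAX];  /* the actual array */
};

/* Stand-in for the SAS layer's handler; it just counts events. */
int events_seen;
void record_event(struct sas_phy *phy, enum port_event ev)
{
    (void)phy;
    (void)ev;
    events_seen++;
}

/* Called by the LLDD when an IDENTIFY/FIS frame has been DMAed in. */
void my_frame_received(struct my_phy *phy, const void *dma_buf, size_t len)
{
    struct sas_phy *sp = &phy->sas_phy;

    if (len > sizeof(phy->frame_buf))
        len = sizeof(phy->frame_buf);

    mock_lock(&sp->frame_rcvd_lock);
    memcpy(sp->frame_rcvd, dma_buf, len);   /* copy while holding the lock */
    sp->frame_rcvd_size = len;
    mock_unlock(&sp->frame_rcvd_lock);

    /* Notify only after the lock has been dropped. */
    sp->ha->notify_port_event(sp, PORTE_BYTES_DMAED);
}
```

The point of the ordering is that the SAS layer never sees a size that does not match the bytes already copied.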
92
93sas_prim -- this is where primitives go when they're
94received. See sas.h. Grab the lock, set the primitive,
95release the lock, notify.
96
97port -- this points to the sas_port if the phy belongs
98to a port -- the LLDD only reads this. It points to the
99sas_port this phy is part of. Set by the SAS Layer.
100
101ha -- may be set; the SAS layer sets it anyway.
102
103lldd_phy -- you should set this to point to your phy so you
104can find your way around faster when the SAS layer calls one
105of your callbacks and passes you a phy. If the sas_phy is
106embedded you can also use container_of -- whatever you
107prefer.
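Both ways of getting from a sas_phy back to the driver's own phy can be compared in a small user-space sketch. The container_of() below is a simplified version of the kernel macro, the struct definitions are cut down to the fields needed here, and the two helper names are hypothetical.

```c
#include <stddef.h>

/* Simplified user-space equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Reduced stand-in for the real struct sas_phy. */
struct sas_phy {
    int id;
    void *lldd_phy;     /* set by the LLDD to point at its own phy */
};

struct my_phy {
    int blah;
    struct sas_phy sas_phy;     /* embedded, as described above */
    int bleh;
};

/* Option 1: follow the back pointer stored in lldd_phy. */
struct my_phy *my_phy_from_lldd(struct sas_phy *sp)
{
    return sp->lldd_phy;
}

/* Option 2: compute the enclosing structure from the embedded member. */
struct my_phy *my_phy_from_container(struct sas_phy *sp)
{
    return container_of(sp, struct my_phy, sas_phy);
}
```

The first option costs one pointer per phy and also works when the sas_phy is not embedded; the second needs no setup but only works for the embedded layout.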
108
109
110struct sas_port --------------------
111The LLDD doesn't set any fields of this struct -- it only
112reads them. They should be self explanatory.
113
114phy_mask is 32 bit, this should be enough for now, as I
115haven't heard of a HA having more than 8 phys.
116
lldd_port -- I haven't found a use for this -- maybe other
LLDDs that wish to have an internal port representation can
make use of it.
120
121
122struct sas_ha_struct --------------------
It is normally statically declared in your own LLDD
structure describing your adapter:
struct my_sas_ha {
	blah;
	struct sas_ha_struct sas_ha;
	struct my_phy phys[MAX_PHYS];
	struct sas_port sas_ports[MAX_PHYS]; /* (1) */
	bleh;
};
132
133(1) If your LLDD doesn't have its own port representation.
134
135What needs to be initialized (sample function given below).
136
137pcidev
138sas_addr -- since the SAS layer doesn't want to mess with
139 memory allocation, etc, this points to statically
140 allocated array somewhere (say in your host adapter
141 structure) and holds the SAS address of the host
142 adapter as given by you or the manufacturer, etc.
143sas_port
144sas_phy -- an array of pointers to structures. (see
145 note above on sas_addr).
146 These must be set. See more notes below.
num_phys -- the number of phys present in the sas_phy array,
	    and the number of ports present in the sas_port
	    array. There can be at most num_phys ports (one per
	    phy), so num_ports is dropped and only num_phys is
	    used.
152
153The event interface:
154
155 /* LLDD calls these to notify the class of an event. */
156 void (*notify_ha_event)(struct sas_ha_struct *, enum ha_event);
157 void (*notify_port_event)(struct sas_phy *, enum port_event);
158 void (*notify_phy_event)(struct sas_phy *, enum phy_event);
159
When sas_register_ha() returns, those are set and can be
called by the LLDD to notify the SAS layer of such events.
163
164The port notification:
165
166 /* The class calls these to notify the LLDD of an event. */
167 void (*lldd_port_formed)(struct sas_phy *);
168 void (*lldd_port_deformed)(struct sas_phy *);
169
170If the LLDD wants notification when a port has been formed
171or deformed it sets those to a function satisfying the type.
172
173A SAS LLDD should also implement at least one of the Task
174Management Functions (TMFs) described in SAM:
175
176 /* Task Management Functions. Must be called from process context. */
177 int (*lldd_abort_task)(struct sas_task *);
178 int (*lldd_abort_task_set)(struct domain_device *, u8 *lun);
179 int (*lldd_clear_aca)(struct domain_device *, u8 *lun);
180 int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
181 int (*lldd_I_T_nexus_reset)(struct domain_device *);
182 int (*lldd_lu_reset)(struct domain_device *, u8 *lun);
183 int (*lldd_query_task)(struct sas_task *);
184
185For more information please read SAM from T10.org.
186
187Port and Adapter management:
188
189 /* Port and Adapter management */
190 int (*lldd_clear_nexus_port)(struct sas_port *);
191 int (*lldd_clear_nexus_ha)(struct sas_ha_struct *);
192
193A SAS LLDD should implement at least one of those.
194
195Phy management:
196
197 /* Phy management */
198 int (*lldd_control_phy)(struct sas_phy *, enum phy_func);
199
200lldd_ha -- set this to point to your HA struct. You can also
201use container_of if you embedded it as shown above.
202
203A sample initialization and registration function
204can look like this (called last thing from probe())
205*but* before you enable the phys to do OOB:
206
static int register_sas_ha(struct my_sas_ha *my_ha)
{
	int i;
	static struct sas_phy *sas_phys[MAX_PHYS];
	static struct sas_port *sas_ports[MAX_PHYS];

	my_ha->sas_ha.sas_addr = &my_ha->sas_addr[0];

	for (i = 0; i < MAX_PHYS; i++) {
		sas_phys[i] = &my_ha->phys[i].sas_phy;
		sas_ports[i] = &my_ha->sas_ports[i];
	}

	my_ha->sas_ha.sas_phy = sas_phys;
	my_ha->sas_ha.sas_port = sas_ports;
	my_ha->sas_ha.num_phys = MAX_PHYS;

	my_ha->sas_ha.lldd_port_formed = my_port_formed;

	my_ha->sas_ha.lldd_dev_found = my_dev_found;
	my_ha->sas_ha.lldd_dev_gone = my_dev_gone;

	my_ha->sas_ha.lldd_max_execute_num = lldd_max_execute_num; /* (1) */

	my_ha->sas_ha.lldd_queue_size = ha_can_queue;
	my_ha->sas_ha.lldd_execute_task = my_execute_task;

	my_ha->sas_ha.lldd_abort_task = my_abort_task;
	my_ha->sas_ha.lldd_abort_task_set = my_abort_task_set;
	my_ha->sas_ha.lldd_clear_aca = my_clear_aca;
	my_ha->sas_ha.lldd_clear_task_set = my_clear_task_set;
	my_ha->sas_ha.lldd_I_T_nexus_reset = NULL; /* (2) */
	my_ha->sas_ha.lldd_lu_reset = my_lu_reset;
	my_ha->sas_ha.lldd_query_task = my_query_task;

	my_ha->sas_ha.lldd_clear_nexus_port = my_clear_nexus_port;
	my_ha->sas_ha.lldd_clear_nexus_ha = my_clear_nexus_ha;

	my_ha->sas_ha.lldd_control_phy = my_control_phy;

	return sas_register_ha(&my_ha->sas_ha);
}
249
(1) This is normally a LLDD parameter, something along the
lines of a task collector. It tells the SAS Layer whether
to run in Direct Mode (default: value 0 or 1) or Task
Collector Mode (value greater than 1).
254
255In Direct Mode, the SAS Layer calls Execute Task as soon as
256it has a command to send to the SDS, _and_ this is a single
257command, i.e. not linked.
258
259Some hardware (e.g. aic94xx) has the capability to DMA more
260than one task at a time (interrupt) from host memory. Task
261Collector Mode is an optional feature for HAs which support
262this in their hardware. (Again, it is completely optional
263even if your hardware supports it.)
264
In Task Collector Mode, the SAS Layer would do _natural_
coalescing of tasks and at the appropriate moment it would
call your driver to DMA more than one task in a single HA
interrupt. A DBMS may want to use this by setting
lldd_max_execute_num to something greater than 1 at
insmod/modprobe time.
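The rule for choosing between the two modes is simple enough to state as code. The enum and function names below are hypothetical; only the thresholds (0 or 1 for Direct Mode, greater than 1 for Task Collector Mode) come from the text.

```c
/* Hypothetical names; the SAS layer does not necessarily define these. */
enum sas_exec_mode {
    SAS_DIRECT_MODE,
    SAS_TASK_COLLECTOR_MODE
};

/* Map an lldd_max_execute_num value to the mode described above. */
enum sas_exec_mode sas_exec_mode_for(int lldd_max_execute_num)
{
    /* 0 and 1 both mean one task per Execute Task call. */
    if (lldd_max_execute_num > 1)
        return SAS_TASK_COLLECTOR_MODE;
    return SAS_DIRECT_MODE;
}
```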
271
272(2) SAS 1.1 does not define I_T Nexus Reset TMF.
273
274Events
275------
276
Events are _the only way_ a SAS LLDD notifies the SAS layer
of anything. There is no other way for a LLDD to tell the
SAS layer of anything happening internally or in the SAS
domain.
281
282Phy events:
283 PHYE_LOSS_OF_SIGNAL, (C)
284 PHYE_OOB_DONE,
285 PHYE_OOB_ERROR, (C)
286 PHYE_SPINUP_HOLD.
287
288Port events, passed on a _phy_:
289 PORTE_BYTES_DMAED, (M)
290 PORTE_BROADCAST_RCVD, (E)
291 PORTE_LINK_RESET_ERR, (C)
292 PORTE_TIMER_EVENT, (C)
293 PORTE_HARD_RESET.
294
295Host Adapter event:
296 HAE_RESET
297
A SAS LLDD should be able to generate
	- at least one event from group C (choice),
	- the event marked M (mandatory); there is only one,
	- the event marked E (expander) if it wants the SAS layer
	  to handle domain revalidation; there is only one such.
	- Unmarked events are optional.
304
305Meaning:
306
HAE_RESET -- when your HA got an internal error and was reset.
308
309PORTE_BYTES_DMAED -- on receiving an IDENTIFY/FIS frame
310PORTE_BROADCAST_RCVD -- on receiving a primitive
311PORTE_LINK_RESET_ERR -- timer expired, loss of signal, loss
312of DWS, etc. (*)
313PORTE_TIMER_EVENT -- DWS reset timeout timer expired (*)
314PORTE_HARD_RESET -- Hard Reset primitive received.
315
316PHYE_LOSS_OF_SIGNAL -- the device is gone (*)
317PHYE_OOB_DONE -- OOB went fine and oob_mode is valid
318PHYE_OOB_ERROR -- Error while doing OOB, the device probably
319got disconnected. (*)
320PHYE_SPINUP_HOLD -- SATA is present, COMWAKE not sent.
321
322(*) should set/clear the appropriate fields in the phy,
323 or alternatively call the inlined sas_phy_disconnected()
324 which is just a helper, from their tasklet.
325
326The Execute Command SCSI RPC:
327
328 int (*lldd_execute_task)(struct sas_task *, int num,
329 unsigned long gfp_flags);
330
Used to queue a task to the SAS LLDD. @task is the task(s) to
be executed. @num should be the number of tasks being
queued at this function call (they are linked via
task::list). @gfp_flags should be the gfp mask defining the
context of the caller.
336
337This function should implement the Execute Command SCSI RPC,
338or if you're sending a SCSI Task as linked commands, you
339should also use this function.
340
341That is, when lldd_execute_task() is called, the command(s)
342go out on the transport *immediately*. There is *no*
343queuing of any sort and at any level in a SAS LLDD.
344
345The use of task::list is two-fold, one for linked commands,
346the other discussed below.
347
348It is possible to queue up more than one task at a time, by
349initializing the list element of struct sas_task, and
350passing the number of tasks enlisted in this manner in num.
351
352Returns: -SAS_QUEUE_FULL, -ENOMEM, nothing was queued;
353 0, the task(s) were queued.
354
355If you want to pass num > 1, then either
356A) you're the only caller of this function and keep track
357 of what you've queued to the LLDD, or
358B) you know what you're doing and have a strategy of
359 retrying.
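The all-or-nothing return convention for batched submission can be modelled in user space as below. This is a simplified sketch, not the real interface: the function takes the host adapter explicitly and drops the gfp argument, struct my_ha and its free_slots counter are hypothetical, the next pointer stands in for the task::list linkage, and the value given to SAS_QUEUE_FULL is a placeholder.

```c
#include <stddef.h>

#define SAS_QUEUE_FULL 1    /* placeholder value, for illustration only */

struct sas_task {
    struct sas_task *next;  /* stand-in for the task::list linkage */
    int queued;
};

struct my_ha {
    int free_slots;         /* hypothetical: free hardware slots */
};

/*
 * Queue num linked tasks, or none at all.
 * Returns 0 if every task was queued, -SAS_QUEUE_FULL if nothing was.
 */
int my_execute_task(struct my_ha *ha, struct sas_task *task, int num)
{
    struct sas_task *t;
    int i;

    if (num > ha->free_slots)
        return -SAS_QUEUE_FULL;     /* nothing was queued */

    for (i = 0, t = task; i < num && t; i++, t = t->next)
        t->queued = 1;              /* hand the task to the hardware */

    ha->free_slots -= i;
    return 0;
}
```

Because a failed call queues nothing, a caller passing num > 1 can retry the whole batch without working out which tasks went through.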
360
361As opposed to queuing one task at a time (function call),
362batch queuing of tasks, by having num > 1, greatly
363simplifies LLDD code, sequencer code, and _hardware design_,
364and has some performance advantages in certain situations
365(DBMS).
366
367The LLDD advertises if it can take more than one command at
368a time at lldd_execute_task(), by setting the
369lldd_max_execute_num parameter (controlled by "collector"
370module parameter in aic94xx SAS LLDD).
371
372You should leave this to the default 1, unless you know what
373you're doing.
374
This is a function of the LLDD, which the SAS layer can
cater to.
377
378int lldd_queue_size
379 The host adapter's queue size. This is the maximum
380number of commands the lldd can have pending to domain
381devices on behalf of all upper layers submitting through
382lldd_execute_task().
383
384You really want to set this to something (much) larger than
3851.
386
387This _really_ has absolutely nothing to do with queuing.
388There is no queuing in SAS LLDDs.
389
390struct sas_task {
391 dev -- the device this task is destined to
392 list -- must be initialized (INIT_LIST_HEAD)
393 task_proto -- _one_ of enum sas_proto
394 scatter -- pointer to scatter gather list array
395 num_scatter -- number of elements in scatter
	total_xfer_len -- total number of bytes expected to be transferred
397 data_dir -- PCI_DMA_...
398 task_done -- callback when the task has finished execution
399};
400
When an external entity, i.e. an entity other than the LLDD
or the SAS Layer, wants to work with a struct domain_device,
it _must_ call kobject_get() when getting a handle on the
device and kobject_put() when it is done with the device.
405
406This does two things:
407 A) implements proper kfree() for the device;
408 B) increments/decrements the kref for all players:
409 domain_device
410 all domain_device's ... (if past an expander)
411 port
412 host adapter
413 pci device
414 and up the ladder, etc.
415
416DISCOVERY
417---------
418
419The sysfs tree has the following purposes:
420 a) It shows you the physical layout of the SAS domain at
421 the current time, i.e. how the domain looks in the
422 physical world right now.
423 b) Shows some device parameters _at_discovery_time_.
424
425This is a link to the tree(1) program, very useful in
426viewing the SAS domain:
427ftp://mama.indstate.edu/linux/tree/
428I expect user space applications to actually create a
429graphical interface of this.
430
That is, the sysfs domain tree doesn't show or keep state if
you e.g. change the READY LED MEANING setting, but it does
show you the current connection status of the domain device.

Keeping internal device state changes is the responsibility of
upper layers (Command set drivers) and user space.
438
When a device or devices are unplugged from the domain, this
is reflected in the sysfs tree immediately, and the device(s)
are removed from the system.
442
The structure domain_device describes any device in the SAS
domain. It is completely managed by the SAS layer. A task
points to a domain device; this is how the SAS LLDD knows
where to send the task(s). A SAS LLDD only reads the
contents of the domain_device structure; it never creates
or destroys one.
449
450Expander management from User Space
451-----------------------------------
452
453In each expander directory in sysfs, there is a file called
454"smp_portal". It is a binary sysfs attribute file, which
455implements an SMP portal (Note: this is *NOT* an SMP port),
456to which user space applications can send SMP requests and
457receive SMP responses.
458
459Functionality is deceptively simple:
460
4611. Build the SMP frame you want to send. The format and layout
462 are described in the SAS spec. Leave the CRC field equal to 0.
463
4642. open(2): Open the expander's SMP portal sysfs file in RW mode.
465
4663. write(2): Write the frame you built in step 1.
467
4684. read(2): Read the amount of data you expect to receive for the
469 frame you built. If you receive a different amount of data than
470 you expected, then there was some kind of error.
4715. close(2): Close the SMP portal file.
472This whole process is shown in detail in the function do_smp_func()
473and its callers, in the file "expander_conf.c".
474
475The kernel functionality is implemented in the file
476"sas_expander.c".
477
478The program "expander_conf.c" implements this. It takes one
479argument, the sysfs file name of the SMP portal to the
480expander, and gives expander information, including routing
481tables.
482
483The SMP portal gives you complete control of the expander,
484so please be careful.
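
The open/write/read/close sequence described above can be sketched in
user space roughly as follows. This is a minimal illustration only and
is not taken from expander_conf.c: the helper names
smp_build_report_general() and smp_transact() are hypothetical, and the
8-byte REPORT GENERAL request layout (frame type 0x40, function 0x00,
two reserved bytes, 4-byte CRC left at 0) is assumed from the SAS spec.

```c
/*
 * Hypothetical user-space sketch of talking to an expander through its
 * "smp_portal" sysfs file. Helper names and frame sizes are assumptions
 * made for illustration; they do not come from expander_conf.c.
 */
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define SMP_REQUEST_FRAME  0x40	/* SMP frame type: request */
#define SMP_REPORT_GENERAL 0x00	/* SMP function: REPORT GENERAL */

/*
 * Build a REPORT GENERAL request: a 4-byte header followed by a 4-byte
 * CRC field that is left as 0. Returns the frame length in bytes, or 0
 * if the caller's buffer is too small.
 */
static size_t smp_build_report_general(unsigned char *buf, size_t len)
{
	const size_t frame_len = 8;

	if (len < frame_len)
		return 0;
	memset(buf, 0, frame_len);
	buf[0] = SMP_REQUEST_FRAME;
	buf[1] = SMP_REPORT_GENERAL;
	return frame_len;
}

/*
 * Send one request through the portal and read back the response.
 * Returns 0 on success and -1 on any error, including a response whose
 * size differs from the expected resp_len.
 */
static int smp_transact(const char *portal, const void *req, size_t req_len,
			void *resp, size_t resp_len)
{
	int ret = -1;
	int fd = open(portal, O_RDWR);	/* step 2: open in RW mode */

	if (fd < 0)
		return -1;
	/* steps 3 and 4: write the frame, then read the expected amount */
	if (write(fd, req, req_len) == (ssize_t)req_len &&
	    read(fd, resp, resp_len) == (ssize_t)resp_len)
		ret = 0;
	close(fd);			/* step 5: close the portal */
	return ret;
}
```

A caller would build the frame with smp_build_report_general(), then
pass the path of the expander's smp_portal file and a response buffer of
the expected size to smp_transact(). For REPORT GENERAL that size is
assumed here to be 32 bytes. A short read is treated as an error, in
line with step 4 above.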
diff --git a/Documentation/scsi/ppa.txt b/Documentation/scsi/ppa.txt
index 0dac88d86d87..5d9223bc1bd5 100644
--- a/Documentation/scsi/ppa.txt
+++ b/Documentation/scsi/ppa.txt
@@ -12,5 +12,3 @@ http://www.torque.net/parport/
12Email list for Linux Parport 12Email list for Linux Parport
13linux-parport@torque.net 13linux-parport@torque.net
14 14
15Email for problems with ZIP or ZIP Plus drivers
16campbell@torque.net
diff --git a/Documentation/scsi/tmscsim.txt b/Documentation/scsi/tmscsim.txt
index e165229adf50..df7a02bfb5bf 100644
--- a/Documentation/scsi/tmscsim.txt
+++ b/Documentation/scsi/tmscsim.txt
@@ -109,7 +109,7 @@ than the 33.33 MHz being in the PCI spec.
109 109
110If you want to share the IRQ with another device and the driver refuses to 110If you want to share the IRQ with another device and the driver refuses to
111do so, you might succeed with changing the DC390_IRQ type in tmscsim.c to 111do so, you might succeed with changing the DC390_IRQ type in tmscsim.c to
112SA_SHIRQ | SA_INTERRUPT. 112IRQF_SHARED | IRQF_DISABLED.
113 113
114 114
1153.Features 1153.Features
diff --git a/Documentation/seclvl.txt b/Documentation/seclvl.txt
deleted file mode 100644
index 97274d122d0e..000000000000
--- a/Documentation/seclvl.txt
+++ /dev/null
@@ -1,97 +0,0 @@
1BSD Secure Levels Linux Security Module
2Michael A. Halcrow <mike@halcrow.us>
3
4
5Introduction
6
7Under the BSD Secure Levels security model, sets of policies are
8associated with levels. Levels range from -1 to 2, with -1 being the
9weakest and 2 being the strongest. These security policies are
10enforced at the kernel level, so not even the superuser is able to
11disable or circumvent them. This hardens the machine against attackers
12who gain root access to the system.
13
14
15Levels and Policies
16
17Level -1 (Permanently Insecure):
18 - Cannot increase the secure level
19
20Level 0 (Insecure):
21 - Cannot ptrace the init process
22
23Level 1 (Default):
24 - /dev/mem and /dev/kmem are read-only
25 - IMMUTABLE and APPEND extended attributes, if set, may not be unset
26 - Cannot load or unload kernel modules
27 - Cannot write directly to a mounted block device
28 - Cannot perform raw I/O operations
29 - Cannot perform network administrative tasks
30 - Cannot setuid any file
31
32Level 2 (Secure):
33 - Cannot decrement the system time
34 - Cannot write to any block device, whether mounted or not
35 - Cannot unmount any mounted filesystems
36
37
38Compilation
39
40To compile the BSD Secure Levels LSM, seclvl.ko, enable the
41SECURITY_SECLVL configuration option. This is found under Security
42options -> BSD Secure Levels in the kernel configuration menu.
43
44
45Basic Usage
46
47Once the machine is in a running state, with all the necessary modules
48loaded and all the filesystems mounted, you can load the seclvl.ko
49module:
50
51# insmod seclvl.ko
52
53The module defaults to secure level 1, except when compiled directly
54into the kernel, in which case it defaults to secure level 0. To raise
55the secure level to 2, the administrator writes ``2'' to the
56seclvl/seclvl file under the sysfs mount point (assumed to be /sys in
57these examples):
58
59# echo -n "2" > /sys/seclvl/seclvl
60
61Alternatively, you can initialize the module at secure level 2 with
62the initlvl module parameter:
63
64# insmod seclvl.ko initlvl=2
65
66At this point, it is impossible to remove the module or reduce the
67secure level. If the administrator wishes to have the option of doing
68so, he must provide a module parameter, sha1_passwd, that specifies
69the SHA1 hash of the password that can be used to reduce the secure
70level to 0.
71
72To generate this SHA1 hash, the administrator can use OpenSSL:
73
74# echo -n "boogabooga" | openssl sha1
75abeda4e0f33defa51741217592bf595efb8d289c
76
77In order to use password-instigated secure level reduction, the SHA1
78crypto module must be loaded or compiled into the kernel:
79
80# insmod sha1.ko
81
82The administrator can then insmod the seclvl module, including the
83SHA1 hash of the password:
84
85# insmod seclvl.ko \
86 sha1_passwd=abeda4e0f33defa51741217592bf595efb8d289c
87
88To reduce the secure level, write the password to seclvl/passwd under
89your sysfs mount point:
90
91# echo -n "boogabooga" > /sys/seclvl/passwd
92
93The September 2004 edition of Sys Admin Magazine has an article about
94the BSD Secure Levels LSM. I encourage you to refer to that article
95for a more in-depth treatment of this security module:
96
97http://www.samag.com/documents/s=9304/sam0409a/0409a.htm
diff --git a/Documentation/sh/new-machine.txt b/Documentation/sh/new-machine.txt
index eb2dd2e6993b..73988e0d112b 100644
--- a/Documentation/sh/new-machine.txt
+++ b/Documentation/sh/new-machine.txt
@@ -41,11 +41,6 @@ Board-specific code:
41 | 41 |
42 .. more boards here ... 42 .. more boards here ...
43 43
44It should also be noted that each board is required to have some certain
45headers. At the time of this writing, io.h is the only thing that needs
46to be provided for each board, and can generally just reference generic
47functions (with the exception of isa_port2addr).
48
49Next, for companion chips: 44Next, for companion chips:
50. 45.
51`-- arch 46`-- arch
@@ -104,12 +99,13 @@ and then populate that with sub-directories for each member of the family.
104Both the Solution Engine and the hp6xx boards are an example of this. 99Both the Solution Engine and the hp6xx boards are an example of this.
105 100
106After you have set up your new arch/sh/boards/ directory, remember that you 101After you have set up your new arch/sh/boards/ directory, remember that you
107also must add a directory in include/asm-sh for headers localized to this 102should also add a directory in include/asm-sh for headers localized to this
108board. In order to interoperate seamlessly with the build system, it's best 103board (if there are going to be more than one). In order to interoperate
109to have this directory the same as the arch/sh/boards/ directory name, 104seamlessly with the build system, it's best to have this directory the same
110though if your board is again part of a family, the build system has ways 105as the arch/sh/boards/ directory name, though if your board is again part of
111of dealing with this, and you can feel free to name the directory after 106a family, the build system has ways of dealing with this (via incdir-y
112the family member itself. 107overloading), and you can feel free to name the directory after the family
108member itself.
113 109
114There are a few things that each board is required to have, both in the 110There are a few things that each board is required to have, both in the
115arch/sh/boards and the include/asm-sh/ hierarchy. In order to better 111arch/sh/boards and the include/asm-sh/ hierarchy. In order to better
@@ -122,6 +118,7 @@ might look something like:
122 * arch/sh/boards/vapor/setup.c - Setup code for imaginary board 118 * arch/sh/boards/vapor/setup.c - Setup code for imaginary board
123 */ 119 */
124#include <linux/init.h> 120#include <linux/init.h>
121#include <asm/rtc.h> /* for board_time_init() */
125 122
126const char *get_system_type(void) 123const char *get_system_type(void)
127{ 124{
@@ -152,79 +149,57 @@ int __init platform_setup(void)
152} 149}
153 150
154Our new imaginary board will also have to tie into the machvec in order for it 151Our new imaginary board will also have to tie into the machvec in order for it
155to be of any use. Currently the machvec is slowly on its way out, but is still 152to be of any use.
156required for the time being. As such, let us take a look at what needs to be
157done for the machvec assignment.
158 153
159machvec functions fall into a number of categories: 154machvec functions fall into a number of categories:
160 155
161 - I/O functions to IO memory (inb etc) and PCI/main memory (readb etc). 156 - I/O functions to IO memory (inb etc) and PCI/main memory (readb etc).
162 - I/O remapping functions (ioremap etc) 157 - I/O mapping functions (ioport_map, ioport_unmap, etc).
163 - some initialisation functions 158 - a 'heartbeat' function.
164 - a 'heartbeat' function 159 - PCI and IRQ initialization routines.
165 - some miscellaneous flags 160 - Consistent allocators (for boards that need special allocators,
166 161 particularly for allocating out of some board-specific SRAM for DMA
167The tree can be built in two ways: 162 handles).
168 - as a fully generic build. All drivers are linked in, and all functions 163
169 go through the machvec 164There are machvec functions added and removed over time, so always be sure to
170 - as a machine specific build. In this case only the required drivers 165consult include/asm-sh/machvec.h for the current state of the machvec.
171 will be linked in, and some macros may be redefined to not go through 166
172 the machvec where performance is important (in particular IO functions). 167The kernel will automatically wrap in generic routines for undefined function
173 168pointers in the machvec at boot time, as machvec functions are referenced
174There are three ways in which IO can be performed: 169unconditionally throughout most of the tree. Some boards have incredibly
175 - none at all. This is really only useful for the 'unknown' machine type, 170sparse machvecs (such as the dreamcast and sh03), whereas others must define
176 which us designed to run on a machine about which we know nothing, and 171virtually everything (rts7751r2d).
177 so all all IO instructions do nothing. 172
178 - fully custom. In this case all IO functions go to a machine specific 173Adding a new machine is relatively trivial (using vapor as an example):
179 set of functions which can do what they like 174
180 - a generic set of functions. These will cope with most situations, 175If the board-specific definitions are quite minimalistic, as is the case for
181 and rely on a single function, mv_port2addr, which is called through the 176the vast majority of boards, simply having a single board-specific header is
182 machine vector, and converts an IO address into a memory address, which 177sufficient.
183 can be read from/written to directly. 178
184 179 - add a new file include/asm-sh/vapor.h which contains prototypes for
185Thus adding a new machine involves the following steps (I will assume I am
186adding a machine called vapor):
187
188 - add a new file include/asm-sh/vapor/io.h which contains prototypes for
189 any machine specific IO functions prefixed with the machine name, for 180 any machine specific IO functions prefixed with the machine name, for
190 example vapor_inb. These will be needed when filling out the machine 181 example vapor_inb. These will be needed when filling out the machine
191 vector. 182 vector.
192 183
193 This is the minimum that is required, however there are ample 184 Note that these prototypes are generated automatically by setting
194 opportunities to optimise this. In particular, by making the prototypes 185 __IO_PREFIX to something sensible. A typical example would be:
195 inline function definitions, it is possible to inline the function when 186
196 building machine specific versions. Note that the machine vector 187 #define __IO_PREFIX vapor
197 functions will still be needed, so that a module built for a generic 188 #include <asm/io_generic.h>
198 setup can be loaded. 189
199 190 somewhere in the board-specific header. Any boards being ported that still
200 - add a new file arch/sh/boards/vapor/mach.c. This contains the definition 191 have a legacy io.h should remove it entirely and switch to the new model.
201 of the machine vector. When building the machine specific version, this 192
202 will be the real machine vector (via an alias), while in the generic 193 - Add machine vector definitions to the board's setup.c. At a bare minimum,
203 version is used to initialise the machine vector, and then freed, by 194 this must be defined as something like:
204 making it initdata. This should be defined as: 195
205 196 struct sh_machine_vector mv_vapor __initmv = {
206 struct sh_machine_vector mv_vapor __initmv = { 197 .mv_name = "vapor",
207 .mv_name = "vapor", 198 };
208 } 199 ALIAS_MV(vapor)
209 ALIAS_MV(vapor) 200
210 201 - finally add a file arch/sh/boards/vapor/io.c, which contains definitions of
211 - finally add a file arch/sh/boards/vapor/io.c, which contains 202 the machine specific io functions (if there are enough to warrant it).
212 definitions of the machine specific io functions.
213
214A note about initialisation functions. Three initialisation functions are
215provided in the machine vector:
216 - mv_arch_init - called very early on from setup_arch
217 - mv_init_irq - called from init_IRQ, after the generic SH interrupt
218 initialisation
219 - mv_init_pci - currently not used
220
221Any other remaining functions which need to be called at start up can be
222added to the list using the __initcalls macro (or module_init if the code
223can be built as a module). Many generic drivers probe to see if the device
224they are targeting is present, however this may not always be appropriate,
225so a flag can be added to the machine vector which will be set on those
226machines which have the hardware in question, reducing the probe to a
227single conditional.
228 203
2293. Hooking into the Build System 2043. Hooking into the Build System
230================================ 205================================
@@ -303,4 +278,3 @@ which will in turn copy the defconfig for this board, run it through
303oldconfig (prompting you for any new options since the time of creation), 278oldconfig (prompting you for any new options since the time of creation),
304and start you on your way to having a functional kernel for your new 279and start you on your way to having a functional kernel for your new
305board. 280board.
306
diff --git a/Documentation/sh/register-banks.txt b/Documentation/sh/register-banks.txt
new file mode 100644
index 000000000000..a6719f2f6594
--- /dev/null
+++ b/Documentation/sh/register-banks.txt
@@ -0,0 +1,33 @@
1 Notes on register bank usage in the kernel
2 ==========================================
3
4Introduction
5------------
6
7The SH-3 and SH-4 CPU families traditionally include a single partial register
8bank (selected by SR.RB, only r0 ... r7 are banked), whereas other families
9may have more full-featured banking or simply no such capabilities at all.
10
11SR.RB banking
12-------------
13
14In the case of this type of banking, banked registers are mapped directly to
15r0 ... r7 if SR.RB is set to the bank we are interested in, otherwise ldc/stc
16can still be used to reference the banked registers (as r0_bank ... r7_bank)
17when in the context of another bank. The developer must keep the SR.RB value
18in mind when writing code that utilizes these banked registers, for obvious
19reasons. Userspace is also not able to poke at the bank1 values, so these can
20be used rather effectively as scratch registers by the kernel.
21
22Presently the kernel uses several of these registers.
23
24 - r0_bank, r1_bank (referenced as k0 and k1, used for scratch
25 registers when doing exception handling).
26 - r2_bank (used to track the EXPEVT/INTEVT code)
27 - Used by do_IRQ() and friends for doing irq mapping based off
28 of the interrupt exception vector jump table offset
29 - r6_bank (global interrupt mask)
30 - The SR.IMASK interrupt handler makes use of this to set the
31 interrupt priority level (used by local_irq_enable())
32 - r7_bank (current)
33
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 87d76a5c73d0..e6b57dd46a4f 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -472,6 +472,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
472 472
473 The power-management is supported. 473 The power-management is supported.
474 474
475 Module snd-darla20
476 ------------------
477
478 Module for Echoaudio Darla20
479
480 This module supports multiple cards.
481 The driver requires the firmware loader support on kernel.
482
483 Module snd-darla24
484 ------------------
485
486 Module for Echoaudio Darla24
487
488 This module supports multiple cards.
489 The driver requires the firmware loader support on kernel.
490
475 Module snd-dt019x 491 Module snd-dt019x
476 ----------------- 492 -----------------
477 493
@@ -499,6 +515,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
499 515
500 The power-management is supported. 516 The power-management is supported.
501 517
518 Module snd-echo3g
519 -----------------
520
521 Module for Echoaudio 3G cards (Gina3G/Layla3G)
522
523 This module supports multiple cards.
524 The driver requires the firmware loader support on kernel.
525
502 Module snd-emu10k1 526 Module snd-emu10k1
503 ------------------ 527 ------------------
504 528
@@ -657,6 +681,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
657 681
658 The power-management is supported. 682 The power-management is supported.
659 683
684 Module snd-gina20
685 -----------------
686
687 Module for Echoaudio Gina20
688
689 This module supports multiple cards.
690 The driver requires the firmware loader support on kernel.
691
692 Module snd-gina24
693 -----------------
694
695 Module for Echoaudio Gina24
696
697 This module supports multiple cards.
698 The driver requires the firmware loader support on kernel.
699
660 Module snd-gusclassic 700 Module snd-gusclassic
661 --------------------- 701 ---------------------
662 702
@@ -718,6 +758,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
718 position_fix - Fix DMA pointer (0 = auto, 1 = none, 2 = POSBUF, 3 = FIFO size) 758 position_fix - Fix DMA pointer (0 = auto, 1 = none, 2 = POSBUF, 3 = FIFO size)
719 single_cmd - Use single immediate commands to communicate with 759 single_cmd - Use single immediate commands to communicate with
720 codecs (for debugging only) 760 codecs (for debugging only)
761 disable_msi - Disable Message Signaled Interrupt (MSI)
721 762
722 This module supports one card and autoprobe. 763 This module supports one card and autoprobe.
723 764
@@ -738,11 +779,16 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
738 6stack-digout 6-jack with a SPDIF out 779 6stack-digout 6-jack with a SPDIF out
739 w810 3-jack 780 w810 3-jack
740 z71v 3-jack (HP shared SPDIF) 781 z71v 3-jack (HP shared SPDIF)
741 asus 3-jack 782 asus 3-jack (ASUS Mobo)
783 asus-w1v ASUS W1V
784 asus-dig ASUS with SPDIF out
785 asus-dig2 ASUS with SPDIF out (using GPIO2)
742 uniwill 3-jack 786 uniwill 3-jack
743 F1734 2-jack 787 F1734 2-jack
744 lg LG laptop (m1 express dual) 788 lg LG laptop (m1 express dual)
745 lg-lw LG LW20 laptop 789 lg-lw LG LW20/LW25 laptop
790 tcl TCL S700
791 clevo Clevo laptops (m520G, m665n)
746 test for testing/debugging purpose, almost all controls can be 792 test for testing/debugging purpose, almost all controls can be
747 adjusted. Appearing only when compiled with 793 adjusted. Appearing only when compiled with
748 $CONFIG_SND_DEBUG=y 794 $CONFIG_SND_DEBUG=y
@@ -750,6 +796,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
750 796
751 ALC260 797 ALC260
752 hp HP machines 798 hp HP machines
799 hp-3013 HP machines (3013-variant)
753 fujitsu Fujitsu S7020 800 fujitsu Fujitsu S7020
754 acer Acer TravelMate 801 acer Acer TravelMate
755 basic fixed pin assignment (old default model) 802 basic fixed pin assignment (old default model)
@@ -757,18 +804,32 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
757 804
758 ALC262 805 ALC262
759 fujitsu Fujitsu Laptop 806 fujitsu Fujitsu Laptop
807 hp-bpc HP xw4400/6400/8400/9400 laptops
808 benq Benq ED8
760 basic fixed pin assignment w/o SPDIF 809 basic fixed pin assignment w/o SPDIF
761 auto auto-config reading BIOS (default) 810 auto auto-config reading BIOS (default)
762 811
763 ALC882/883/885 812 ALC882/885
764 3stack-dig 3-jack with SPDIF I/O 813 3stack-dig 3-jack with SPDIF I/O
765 6stack-dig 6-jack digital with SPDIF I/O 814 6stack-dig 6-jack digital with SPDIF I/O
815 arima Arima W820Di1
816 auto auto-config reading BIOS (default)
817
818 ALC883/888
819 3stack-dig 3-jack with SPDIF I/O
820 6stack-dig 6-jack digital with SPDIF I/O
821 3stack-6ch 3-jack 6-channel
822 3stack-6ch-dig 3-jack 6-channel with SPDIF I/O
823 6stack-dig-demo 6-jack digital for Intel demo board
824 acer Acer laptops (Travelmate 3012WTMi, Aspire 5600, etc)
766 auto auto-config reading BIOS (default) 825 auto auto-config reading BIOS (default)
767 826
768 ALC861 827 ALC861/660
769 3stack 3-jack 828 3stack 3-jack
770 3stack-dig 3-jack with SPDIF I/O 829 3stack-dig 3-jack with SPDIF I/O
771 6stack-dig 6-jack with SPDIF I/O 830 6stack-dig 6-jack with SPDIF I/O
831 3stack-660 3-jack (for ALC660)
832 uniwill-m31 Uniwill M31 laptop
772 auto auto-config reading BIOS (default) 833 auto auto-config reading BIOS (default)
773 834
774 CMI9880 835 CMI9880
@@ -797,10 +858,21 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
797 3stack-dig ditto with SPDIF 858 3stack-dig ditto with SPDIF
798 laptop 3-jack with hp-jack automute 859 laptop 3-jack with hp-jack automute
799 laptop-dig ditto with SPDIF 860 laptop-dig ditto with SPDIF
800 auto auto-confgi reading BIOS (default) 861 auto auto-config reading BIOS (default)
862
863 STAC9200/9205/9220/9221/9254
864 ref Reference board
865 3stack D945 3stack
866 5stack D945 5stack + SPDIF
867
868 STAC9227/9228/9229/927x
869 ref Reference board
870 3stack D965 3stack
871 5stack D965 5stack + SPDIF
801 872
802 STAC7661(?) 873 STAC9872
803 vaio Setup for VAIO FE550G/SZ110 874 vaio Setup for VAIO FE550G/SZ110
875 vaio-ar Setup for VAIO AR
804 876
805 If the default configuration doesn't work and one of the above 877 If the default configuration doesn't work and one of the above
806 matches with your device, report it together with the PCI 878 matches with your device, report it together with the PCI
@@ -937,6 +1009,30 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
937 driver isn't configured properly or you want to try another 1009 driver isn't configured properly or you want to try another
938 type for testing. 1010 type for testing.
939 1011
1012 Module snd-indigo
1013 -----------------
1014
1015 Module for Echoaudio Indigo
1016
1017 This module supports multiple cards.
1018 The driver requires the firmware loader support on kernel.
1019
1020 Module snd-indigodj
1021 -------------------
1022
1023 Module for Echoaudio Indigo DJ
1024
1025 This module supports multiple cards.
1026 The driver requires the firmware loader support on kernel.
1027
1028 Module snd-indigoio
1029 -------------------
1030
1031 Module for Echoaudio Indigo IO
1032
1033 This module supports multiple cards.
1034 The driver requires the firmware loader support on kernel.
1035
940 Module snd-intel8x0 1036 Module snd-intel8x0
941 ------------------- 1037 -------------------
942 1038
@@ -1036,6 +1132,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
1036 1132
1037 This module supports multiple cards. 1133 This module supports multiple cards.
1038 1134
1135 Module snd-layla20
1136 ------------------
1137
1138 Module for Echoaudio Layla20
1139
1140 This module supports multiple cards.
1141 The driver requires the firmware loader support on kernel.
1142
1143 Module snd-layla24
1144 ------------------
1145
1146 Module for Echoaudio Layla24
1147
1148 This module supports multiple cards.
1149 The driver requires the firmware loader support on kernel.
1150
1039 Module snd-maestro3 1151 Module snd-maestro3
1040 ------------------- 1152 -------------------
1041 1153
@@ -1056,6 +1168,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
1056 1168
1057 The power-management is supported. 1169 The power-management is supported.
1058 1170
1171 Module snd-mia
1172 ---------------
1173
1174 Module for Echoaudio Mia
1175
1176 This module supports multiple cards.
1177 The driver requires the firmware loader support on kernel.
1178
1059 Module snd-miro 1179 Module snd-miro
1060 --------------- 1180 ---------------
1061 1181
@@ -1088,6 +1208,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
1088 When no hotplug fw loader is available, you need to load the 1208 When no hotplug fw loader is available, you need to load the
1089 firmware via mixartloader utility in alsa-tools package. 1209 firmware via mixartloader utility in alsa-tools package.
1090 1210
1211 Module snd-mona
1212 ---------------
1213
1214 Module for Echoaudio Mona
1215
1216 This module supports multiple cards.
1217 The driver requires the firmware loader support on kernel.
1218
1091 Module snd-mpu401 1219 Module snd-mpu401
1092 ----------------- 1220 -----------------
1093 1221
@@ -1111,6 +1239,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
1111 1239
1112 Module supports only 1 card. This module has no enable option. 1240 Module supports only 1 card. This module has no enable option.
1113 1241
1242 Module snd-mts64
1243 ----------------
1244
1245 Module for Ego Systems (ESI) Miditerminal 4140
1246
1247 This module supports multiple devices.
1248 Requires parport (CONFIG_PARPORT).
1249
1114 Module snd-nm256 1250 Module snd-nm256
1115 ---------------- 1251 ----------------
1116 1252
diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
index 635cbb94357c..4807ef79a94d 100644
--- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
+++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
@@ -1054,9 +1054,8 @@
1054 1054
1055 <para> 1055 <para>
1056 For a device which allows hotplugging, you can use 1056 For a device which allows hotplugging, you can use
1057 <function>snd_card_free_in_thread</function>. This one will 1057 <function>snd_card_free_when_closed</function>. This one will
1058 postpone the destruction and wait in a kernel-thread until all 1058 postpone the destruction until all devices are closed.
1059 devices are closed.
1060 </para> 1059 </para>
1061 1060
1062 </section> 1061 </section>
@@ -1149,7 +1148,7 @@
1149 } 1148 }
1150 chip->port = pci_resource_start(pci, 0); 1149 chip->port = pci_resource_start(pci, 0);
1151 if (request_irq(pci->irq, snd_mychip_interrupt, 1150 if (request_irq(pci->irq, snd_mychip_interrupt,
1152 SA_INTERRUPT|SA_SHIRQ, "My Chip", chip)) { 1151 IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) {
1153 printk(KERN_ERR "cannot grab irq %d\n", pci->irq); 1152 printk(KERN_ERR "cannot grab irq %d\n", pci->irq);
1154 snd_mychip_free(chip); 1153 snd_mychip_free(chip);
1155 return -EBUSY; 1154 return -EBUSY;
@@ -1172,7 +1171,7 @@
1172 } 1171 }
1173 1172
1174 /* PCI IDs */ 1173 /* PCI IDs */
1175 static struct pci_device_id snd_mychip_ids[] __devinitdata = { 1174 static struct pci_device_id snd_mychip_ids[] = {
1176 { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, 1175 { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR,
1177 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, 1176 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, },
1178 .... 1177 ....
@@ -1323,7 +1322,7 @@
1323 <programlisting> 1322 <programlisting>
1324<![CDATA[ 1323<![CDATA[
1325 if (request_irq(pci->irq, snd_mychip_interrupt, 1324 if (request_irq(pci->irq, snd_mychip_interrupt,
1326 SA_INTERRUPT|SA_SHIRQ, "My Chip", chip)) { 1325 IRQF_DISABLED|IRQF_SHARED, "My Chip", chip)) {
1327 printk(KERN_ERR "cannot grab irq %d\n", pci->irq); 1326 printk(KERN_ERR "cannot grab irq %d\n", pci->irq);
1328 snd_mychip_free(chip); 1327 snd_mychip_free(chip);
1329 return -EBUSY; 1328 return -EBUSY;
@@ -1342,7 +1341,7 @@
1342 1341
1343 <para> 1342 <para>
1344 On the PCI bus, the interrupts can be shared. Thus, 1343 On the PCI bus, the interrupts can be shared. Thus,
1345 <constant>SA_SHIRQ</constant> is given as the interrupt flag of 1344 <constant>IRQF_SHARED</constant> is given as the interrupt flag of
1346 <function>request_irq()</function>. 1345 <function>request_irq()</function>.
1347 </para> 1346 </para>
1348 1347
@@ -1565,7 +1564,7 @@
1565 <informalexample> 1564 <informalexample>
1566 <programlisting> 1565 <programlisting>
1567<![CDATA[ 1566<![CDATA[
1568 static struct pci_device_id snd_mychip_ids[] __devinitdata = { 1567 static struct pci_device_id snd_mychip_ids[] = {
1569 { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR, 1568 { PCI_VENDOR_ID_FOO, PCI_DEVICE_ID_BAR,
1570 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, }, 1569 PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0, },
1571 .... 1570 ....
@@ -3048,7 +3047,7 @@ struct _snd_pcm_runtime {
3048 </para> 3047 </para>
3049 3048
3050 <para> 3049 <para>
3051 If you aquire a spinlock in the interrupt handler, and the 3050 If you acquire a spinlock in the interrupt handler, and the
3052 lock is used in other pcm callbacks, too, then you have to 3051 lock is used in other pcm callbacks, too, then you have to
3053 release the lock before calling 3052 release the lock before calling
3054 <function>snd_pcm_period_elapsed()</function>, because 3053 <function>snd_pcm_period_elapsed()</function>, because
diff --git a/Documentation/sparc/sbus_drivers.txt b/Documentation/sparc/sbus_drivers.txt
index 876195dc2aef..4b9351624f13 100644
--- a/Documentation/sparc/sbus_drivers.txt
+++ b/Documentation/sparc/sbus_drivers.txt
@@ -25,42 +25,84 @@ the bits necessary to run your device. The most commonly
25used members of this structure, and their typical usage, 25used members of this structure, and their typical usage,
26will be detailed below. 26will be detailed below.
27 27
28 Here is how probing is performed by an SBUS driver 28 Here is a piece of skeleton code for performing a device
29under Linux: 29probe in an SBUS driver under Linux:
30 30
31 static void init_one_mydevice(struct sbus_dev *sdev) 31 static int __devinit mydevice_probe_one(struct sbus_dev *sdev)
32 { 32 {
33 struct mydevice *mp = kzalloc(sizeof(*mp), GFP_KERNEL);
34
35 if (!mp)
36 return -ENOMEM;
37
38 ...
39 dev_set_drvdata(&sdev->ofdev.dev, mp);
40 return 0;
33 ... 41 ...
34 } 42 }
35 43
36 static int mydevice_match(struct sbus_dev *sdev) 44 static int __devinit mydevice_probe(struct of_device *dev,
45 const struct of_device_id *match)
37 { 46 {
38 if (some_criteria(sdev)) 47 struct sbus_dev *sdev = to_sbus_device(&dev->dev);
39 return 1; 48
40 return 0; 49 return mydevice_probe_one(sdev);
41 } 50 }
42 51
43 static void mydevice_probe(void) 52 static int __devexit mydevice_remove(struct of_device *dev)
44 { 53 {
45 struct sbus_bus *sbus; 54 struct sbus_dev *sdev = to_sbus_device(&dev->dev);
46 struct sbus_dev *sdev; 55 struct mydevice *mp = dev_get_drvdata(&dev->dev);
47 56
48 for_each_sbus(sbus) { 57 return mydevice_remove_one(sdev, mp);
49 for_each_sbusdev(sdev, sbus) {
50 if (mydevice_match(sdev))
51 init_one_mydevice(sdev);
52 }
53 }
54 } 58 }
55 59
56 All this does is walk through all SBUS devices in the 60 static struct of_device_id mydevice_match[] = {
57system, checks each to see if it is of the type which 61 {
58your driver is written for, and if so it calls the init 62 .name = "mydevice",
59routine to attach the device and prepare to drive it. 63 },
64 {},
65 };
66
67 MODULE_DEVICE_TABLE(of, mydevice_match);
60 68
61 "init_one_mydevice" might do things like allocate software 69 static struct of_platform_driver mydevice_driver = {
62state structures, map in I/O registers, place the hardware 70 .name = "mydevice",
63into an initialized state, etc. 71 .match_table = mydevice_match,
72 .probe = mydevice_probe,
73 .remove = __devexit_p(mydevice_remove),
74 };
75
76 static int __init mydevice_init(void)
77 {
78 return of_register_driver(&mydevice_driver, &sbus_bus_type);
79 }
80
81 static void __exit mydevice_exit(void)
82 {
83 of_unregister_driver(&mydevice_driver);
84 }
85
86 module_init(mydevice_init);
87 module_exit(mydevice_exit);
88
89 The mydevice_match table is a series of entries which
90describes what SBUS devices your driver is meant for. In the
91simplest case you specify a string for the 'name' field. Every
92SBUS device with a 'name' property matching your string will
93be passed one-by-one to your .probe method.
94
95 You should store away your device private state structure
96pointer in the drvdata area so that you can retrieve it later on
97in your .remove method.
98
99 Any memory allocated, registers mapped, IRQs registered,
100etc. must be undone by your .remove method so that all resources
101of your device are released by the time it returns.
102
103 You should _NOT_ use the for_each_sbus(), for_each_sbusdev(),
104and for_all_sbusdev() interfaces. They are deprecated, will be
105removed, and no new driver should reference them ever.
64 106
65 Mapping and Accessing I/O Registers 107 Mapping and Accessing I/O Registers
66 108
@@ -263,10 +305,3 @@ discussed above and plus it handles both PCI and SBUS boards.
263 Lance driver abuses consistent mappings for data transfer. 305 Lance driver abuses consistent mappings for data transfer.
264It is a nifty trick which we do not particularly recommend... 306It is a nifty trick which we do not particularly recommend...
265Just check it out and know that it's legal. 307Just check it out and know that it's legal.
266
267 Bad examples, do NOT use
268
269 drivers/video/cgsix.c
270 This one uses result of sbus_ioremap as if it is an address.
271This does NOT work on sparc64 and therefore is broken. We will
272convert it at a later date.
diff --git a/Documentation/sparse.txt b/Documentation/sparse.txt
index 5a311c38dd1a..f9c99c9a54f9 100644
--- a/Documentation/sparse.txt
+++ b/Documentation/sparse.txt
@@ -69,10 +69,10 @@ recompiled, or use "make C=2" to run sparse on the files whether they need to
69be recompiled or not. The latter is a fast way to check the whole tree if you 69be recompiled or not. The latter is a fast way to check the whole tree if you
70have already built it. 70have already built it.
71 71
72The optional make variable CF can be used to pass arguments to sparse. The 72The optional make variable CHECKFLAGS can be used to pass arguments to sparse.
73build system passes -Wbitwise to sparse automatically. To perform endianness 73The build system passes -Wbitwise to sparse automatically. To perform
74checks, you may define __CHECK_ENDIAN__: 74endianness checks, you may define __CHECK_ENDIAN__:
75 75
76 make C=2 CF="-D__CHECK_ENDIAN__" 76 make C=2 CHECKFLAGS="-D__CHECK_ENDIAN__"
77 77
78These checks are disabled by default as they generate a host of warnings. 78These checks are disabled by default as they generate a host of warnings.
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
index 0b62c62142cf..5c3a51905969 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.txt
@@ -25,6 +25,7 @@ Currently, these files are in /proc/sys/fs:
25- inode-state 25- inode-state
26- overflowuid 26- overflowuid
27- overflowgid 27- overflowgid
28- suid_dumpable
28- super-max 29- super-max
29- super-nr 30- super-nr
30 31
@@ -131,6 +132,25 @@ The default is 65534.
131 132
132============================================================== 133==============================================================
133 134
135suid_dumpable:
136
137This value can be used to query and set the core dump mode for setuid
138or otherwise protected/tainted binaries. The modes are
139
1400 - (default) - traditional behaviour. Any process which has changed
141 privilege levels or is execute only will not be dumped
1421 - (debug) - all processes dump core when possible. The core dump is
143 owned by the current user and no security is applied. This is
144 intended for system debugging situations only. Ptrace is unchecked.
1452 - (suidsafe) - any binary which normally would not be dumped is dumped
146 readable by root only. This allows the end user to remove
147 such a dump but not access it directly. For security reasons
148 core dumps in this mode will not overwrite one another or
149 other files. This mode is appropriate when administrators are
150 attempting to debug problems in a normal environment.
151
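The three modes above can be handled from user space roughly as follows. This is only a sketch: the helper names are hypothetical, and the path is passed in by the caller (the file list earlier in this document places the entry under /proc/sys/fs).

```c
#include <stdio.h>
#include <string.h>

/* Map a suid_dumpable value to the mode name used in the text above.
 * Values the text does not describe are reported as "unknown". */
const char *suid_dumpable_mode_name(int mode)
{
	switch (mode) {
	case 0:
		return "default";
	case 1:
		return "debug";
	case 2:
		return "suidsafe";
	default:
		return "unknown";
	}
}

/* Read the current mode from the given sysctl file (hypothetical helper).
 * Returns the value read, or -1 if the file cannot be opened or parsed. */
int read_suid_dumpable(const char *path)
{
	FILE *f = fopen(path, "r");
	int mode = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}
```

A caller would typically pass "/proc/sys/fs/suid_dumpable" to read_suid_dumpable() and print the result of suid_dumpable_mode_name() for it.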
152==============================================================
153
134super-max & super-nr: 154super-max & super-nr:
135 155
136These numbers control the maximum number of superblocks, and 156These numbers control the maximum number of superblocks, and
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index b0c7ab93dcb9..89bf8c20a586 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -50,7 +50,6 @@ show up in /proc/sys/kernel:
50- shmmax [ sysv ipc ] 50- shmmax [ sysv ipc ]
51- shmmni 51- shmmni
52- stop-a [ SPARC only ] 52- stop-a [ SPARC only ]
53- suid_dumpable
54- sysrq ==> Documentation/sysrq.txt 53- sysrq ==> Documentation/sysrq.txt
55- tainted 54- tainted
56- threads-max 55- threads-max
@@ -211,9 +210,8 @@ Controls the kernel's behaviour when an oops or BUG is encountered.
211 210
2120: try to continue operation 2110: try to continue operation
213 212
2141: delay a few seconds (to give klogd time to record the oops output) and 2131: panic immediately. If the `panic' sysctl is also non-zero then the
215 then panic. If the `panic' sysctl is also non-zero then the machine will 214 machine will be rebooted.
216 be rebooted.
217 215
218============================================================== 216==============================================================
219 217
@@ -311,25 +309,6 @@ kernel. This value defaults to SHMMAX.
311 309
312============================================================== 310==============================================================
313 311
314suid_dumpable:
315
316This value can be used to query and set the core dump mode for setuid
317or otherwise protected/tainted binaries. The modes are
318
3190 - (default) - traditional behaviour. Any process which has changed
320 privilege levels or is execute only will not be dumped
3211 - (debug) - all processes dump core when possible. The core dump is
322 owned by the current user and no security is applied. This is
323 intended for system debugging situations only. Ptrace is unchecked.
3242 - (suidsafe) - any binary which normally would not be dumped is dumped
325 readable by root only. This allows the end user to remove
326 such a dump but not access it directly. For security reasons
327 core dumps in this mode will not overwrite one another or
328 other files. This mode is appropriate when administrators are
329 attempting to debug problems in a normal environment.
330
331==============================================================
332
333tainted: 312tainted:
334 313
335Non-zero if the kernel has been tainted. Numeric values, which 314Non-zero if the kernel has been tainted. Numeric values, which
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 2dc246af4885..20d0d797f539 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -28,7 +28,8 @@ Currently, these files are in /proc/sys/vm:
28- block_dump 28- block_dump
29- drop-caches 29- drop-caches
30- zone_reclaim_mode 30- zone_reclaim_mode
31- zone_reclaim_interval 31- min_unmapped_ratio
32- min_slab_ratio
32- panic_on_oom 33- panic_on_oom
33 34
34============================================================== 35==============================================================
@@ -138,7 +139,6 @@ This is value ORed together of
1381 = Zone reclaim on 1391 = Zone reclaim on
1392 = Zone reclaim writes dirty pages out 1402 = Zone reclaim writes dirty pages out
1404 = Zone reclaim swaps pages 1414 = Zone reclaim swaps pages
1418 = Also do a global slab reclaim pass
142 142
143zone_reclaim_mode is set during bootup to 1 if it is determined that pages 143zone_reclaim_mode is set during bootup to 1 if it is determined that pages
144from remote zones will cause a measurable performance reduction. The 144from remote zones will cause a measurable performance reduction. The
@@ -162,22 +162,36 @@ Allowing regular swap effectively restricts allocations to the local
162node unless explicitly overridden by memory policies or cpuset 162node unless explicitly overridden by memory policies or cpuset
163configurations. 163configurations.
164 164
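Because zone_reclaim_mode is a set of ORed bits (1, 2 and 4 after this change), a value such as 3 means "reclaim on" plus "write dirty pages out". The following user-space sketch decodes a value into text; the table, the function name and the short descriptions are illustrative assumptions, not kernel identifiers.

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

/* Bit values as listed in the text; the descriptions are abbreviations. */
static const struct {
	unsigned int bit;
	const char *text;
} zrm_bits[] = {
	{ 1, "reclaim on" },
	{ 2, "write dirty pages" },
	{ 4, "swap pages" },
};

/* Write a comma-separated description of 'mode' into buf.
 * Returns the number of flags that were written in full. */
int describe_zone_reclaim_mode(unsigned int mode, char *buf, size_t len)
{
	size_t i, used = 0;
	int count = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	for (i = 0; i < sizeof(zrm_bits) / sizeof(zrm_bits[0]); i++) {
		int n;

		if (!(mode & zrm_bits[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     count ? ", " : "", zrm_bits[i].text);
		if (n < 0 || (size_t)n >= len - used) {
			/* Out of room: drop the partial entry and stop. */
			buf[used] = '\0';
			break;
		}
		used += (size_t)n;
		count++;
	}
	return count;
}
```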
165It may be advisable to allow slab reclaim if the system makes heavy 165=============================================================
166use of files and builds up large slab caches. However, the slab 166
167shrink operation is global, may take a long time and free slabs 167min_unmapped_ratio:
168in all nodes of the system. 168
169This is available only on NUMA kernels.
170
171A percentage of the total pages in each zone. Zone reclaim will only
172occur if more than this percentage of pages are file backed and unmapped.
173This is to ensure that a minimal amount of local pages is still available for
174file I/O even if the node is overallocated.
175
176The default is 1 percent.
177
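As a worked example of the percentage: a zone of 262144 pages (1 GiB with 4 KiB pages) and the default ratio of 1 gives a threshold of 2621 pages, so zone reclaim would only be considered once more than 2621 pages are file backed and unmapped. The sketch below just restates that arithmetic; the function names are hypothetical, and rounding the threshold down is an assumption rather than something the text specifies.

```c
/* Threshold, in pages, derived from the ratio described above.
 * Integer division rounds down (an assumption of this sketch). */
unsigned long min_unmapped_pages(unsigned long zone_pages,
				 unsigned int ratio_percent)
{
	return (zone_pages * ratio_percent) / 100;
}

/* Non-zero if the number of file backed, unmapped pages exceeds the
 * threshold, i.e. zone reclaim may occur according to the text. */
int zone_reclaim_may_occur(unsigned long zone_pages,
			   unsigned long unmapped_file_pages,
			   unsigned int ratio_percent)
{
	return unmapped_file_pages >
	       min_unmapped_pages(zone_pages, ratio_percent);
}
```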
178=============================================================
169 179
170================================================================ 180min_slab_ratio:
171 181
172zone_reclaim_interval: 182This is available only on NUMA kernels.
173 183
174The time allowed for off node allocations after zone reclaim 184A percentage of the total pages in each zone. On Zone reclaim
175has failed to reclaim enough pages to allow a local allocation. 185(fallback from the local zone occurs) slabs will be reclaimed if more
186than this percentage of pages in a zone are reclaimable slab pages.
187This ensures that the slab growth stays under control even in NUMA
188systems that rarely perform global reclaim.
176 189
177Time is set in seconds and set by default to 30 seconds. 190The default is 5 percent.
178 191
179Reduce the interval if undesired off node allocations occur. However, too 192Note that slab reclaim is triggered in a per zone / node fashion.
180frequent scans will have a negative impact on off node allocation performance. 193The process of reclaiming slab memory is currently not node specific
194and may not be fast.
181 195
182============================================================= 196=============================================================
183 197
diff --git a/Documentation/sysrq.txt b/Documentation/sysrq.txt
index ad0bedf678b3..e0188a23fd5e 100644
--- a/Documentation/sysrq.txt
+++ b/Documentation/sysrq.txt
@@ -115,8 +115,9 @@ trojan program is running at console and which could grab your password
115when you would try to login. It will kill all programs on given console 115when you would try to login. It will kill all programs on given console
116and thus letting you make sure that the login prompt you see is actually 116and thus letting you make sure that the login prompt you see is actually
117the one from init, not some trojan program. 117the one from init, not some trojan program.
118IMPORTANT:In its true form it is not a true SAK like the one in :IMPORTANT 118IMPORTANT: In its true form it is not a true SAK like the one in a :IMPORTANT
119IMPORTANT:c2 compliant systems, and it should be mistook as such. :IMPORTANT 119IMPORTANT: c2 compliant system, and it should not be mistaken as :IMPORTANT
120IMPORTANT: such. :IMPORTANT
120 It seems others find it useful as (System Attention Key) which is 121 It seems others find it useful as (System Attention Key) which is
121useful when you want to exit a program that will not let you switch consoles. 122useful when you want to exit a program that will not let you switch consoles.
122(For example, X or a svgalib program.) 123(For example, X or a svgalib program.)
diff --git a/Documentation/tty.txt b/Documentation/tty.txt
index 8ff7bc2a0811..dab56604745d 100644
--- a/Documentation/tty.txt
+++ b/Documentation/tty.txt
@@ -80,13 +80,6 @@ receive_buf() - Hand buffers of bytes from the driver to the ldisc
80 for processing. Semantics currently rather 80 for processing. Semantics currently rather
81 mysterious 8( 81 mysterious 8(
82 82
83receive_room() - Can be called by the driver layer at any time when
84 the ldisc is opened. The ldisc must be able to
85 handle the reported amount of data at that instant.
86 Synchronization between active receive_buf and
87 receive_room calls is down to the driver not the
88 ldisc. Must not sleep.
89
90write_wakeup() - May be called at any point between open and close. 83write_wakeup() - May be called at any point between open and close.
91 The TTY_DO_WRITE_WAKEUP flag indicates if a call 84 The TTY_DO_WRITE_WAKEUP flag indicates if a call
92 is needed but always races versus calls. Thus the 85 is needed but always races versus calls. Thus the
diff --git a/Documentation/usb/error-codes.txt b/Documentation/usb/error-codes.txt
index 867f4c38f356..39c68f8c4e6c 100644
--- a/Documentation/usb/error-codes.txt
+++ b/Documentation/usb/error-codes.txt
@@ -98,13 +98,13 @@ one or more packets could finish before an error stops further endpoint I/O.
98 error, a failure to respond (often caused by 98 error, a failure to respond (often caused by
99 device disconnect), or some other fault. 99 device disconnect), or some other fault.
100 100
101-ETIMEDOUT (**) No response packet received within the prescribed 101-ETIME (**) No response packet received within the prescribed
102 bus turn-around time. This error may instead be 102 bus turn-around time. This error may instead be
103 reported as -EPROTO or -EILSEQ. 103 reported as -EPROTO or -EILSEQ.
104 104
105 Note that the synchronous USB message functions 105-ETIMEDOUT Synchronous USB message functions use this code
106 also use this code to indicate timeout expired 106 to indicate timeout expired before the transfer
107 before the transfer completed. 107 completed, and no other error was reported by HC.
108 108
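With this change a caller can tell the two timeout cases apart: -ETIME for a missing response packet on the bus, -ETIMEDOUT for a synchronous call whose own timeout expired. The following is a minimal sketch of such a status check, assuming a Linux <errno.h> where ETIME, ETIMEDOUT, EPROTO and EILSEQ are all defined and distinct; the function name and message strings are made up for illustration.

```c
#include <errno.h>
#include <string.h>

/* Translate a negative errno-style status into a short description,
 * following the distinction drawn in the text above. */
const char *describe_usb_timeout_status(int status)
{
	switch (status) {
	case 0:
		return "success";
	case -ETIME:
		return "no response packet within bus turn-around time";
	case -ETIMEDOUT:
		return "synchronous call timed out before completion";
	case -EPROTO:
	case -EILSEQ:
		/* The text notes these may be reported instead of -ETIME. */
		return "low-level fault (may be reported instead of -ETIME)";
	default:
		return "other status";
	}
}
```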
109-EPIPE (**) Endpoint stalled. For non-control endpoints, 109-EPIPE (**) Endpoint stalled. For non-control endpoints,
110 reset this status with usb_clear_halt(). 110 reset this status with usb_clear_halt().
@@ -163,6 +163,3 @@ usb_get_*/usb_set_*():
163usb_control_msg(): 163usb_control_msg():
164usb_bulk_msg(): 164usb_bulk_msg():
165-ETIMEDOUT Timeout expired before the transfer completed. 165-ETIMEDOUT Timeout expired before the transfer completed.
166 In the future this code may change to -ETIME,
167 whose definition is a closer match to this sort
168 of error.
diff --git a/Documentation/usb/proc_usb_info.txt b/Documentation/usb/proc_usb_info.txt
index f86550fe38ee..22c5331260ca 100644
--- a/Documentation/usb/proc_usb_info.txt
+++ b/Documentation/usb/proc_usb_info.txt
@@ -59,7 +59,7 @@ bind to an interface (or perhaps several) using an ioctl call. You
59would issue more ioctls to the device to communicate to it using 59would issue more ioctls to the device to communicate to it using
60control, bulk, or other kinds of USB transfers. The IOCTLs are 60control, bulk, or other kinds of USB transfers. The IOCTLs are
61listed in the <linux/usbdevice_fs.h> file, and at this writing the 61listed in the <linux/usbdevice_fs.h> file, and at this writing the
62source code (linux/drivers/usb/devio.c) is the primary reference 62source code (linux/drivers/usb/core/devio.c) is the primary reference
63for how to access devices through those files. 63for how to access devices through those files.
64 64
65Note that since by default these BBB/DDD files are writable only by 65Note that since by default these BBB/DDD files are writable only by
diff --git a/Documentation/usb/usb-help.txt b/Documentation/usb/usb-help.txt
index b7c324973695..a7408593829f 100644
--- a/Documentation/usb/usb-help.txt
+++ b/Documentation/usb/usb-help.txt
@@ -5,8 +5,7 @@ For USB help other than the readme files that are located in
5Documentation/usb/*, see the following: 5Documentation/usb/*, see the following:
6 6
7Linux-USB project: http://www.linux-usb.org 7Linux-USB project: http://www.linux-usb.org
8 mirrors at http://www.suse.cz/development/linux-usb/ 8 mirrors at http://usb.in.tum.de/linux-usb/
9 and http://usb.in.tum.de/linux-usb/
10 and http://it.linux-usb.org 9 and http://it.linux-usb.org
11Linux USB Guide: http://linux-usb.sourceforge.net 10Linux USB Guide: http://linux-usb.sourceforge.net
12Linux-USB device overview (working devices and drivers): 11Linux-USB device overview (working devices and drivers):
diff --git a/Documentation/usb/usb-serial.txt b/Documentation/usb/usb-serial.txt
index f001cd93b79b..a2dee6e6190d 100644
--- a/Documentation/usb/usb-serial.txt
+++ b/Documentation/usb/usb-serial.txt
@@ -399,10 +399,10 @@ REINER SCT cyberJack pinpad/e-com USB chipcard reader
399 399
400Prolific PL2303 Driver 400Prolific PL2303 Driver
401 401
402 This driver support any device that has the PL2303 chip from Prolific 402 This driver supports any device that has the PL2303 chip from Prolific
403 in it. This includes a number of single port USB to serial 403 in it. This includes a number of single port USB to serial
404 converters and USB GPS devices. Devices from Aten (the UC-232) and 404 converters and USB GPS devices. Devices from Aten (the UC-232) and
405 IO-Data work with this driver. 405 IO-Data work with this driver, as does the DCU-11 mobile-phone cable.
406 406
407 For any questions or problems with this driver, please contact Greg 407 For any questions or problems with this driver, please contact Greg
408 Kroah-Hartman at greg@kroah.com 408 Kroah-Hartman at greg@kroah.com
@@ -433,6 +433,11 @@ Options supported:
433 See http://www.uuhaus.de/linux/palmconnect.html for up-to-date 433 See http://www.uuhaus.de/linux/palmconnect.html for up-to-date
434 information on this driver. 434 information on this driver.
435 435
436AIRcable USB Dongle Bluetooth driver
437 If there is the cdc_acm driver loaded in the system, you will find that the
438 cdc_acm claims the device before AIRcable can. This is simply corrected
439 by unloading both modules and then loading the aircable module before
440 cdc_acm module.
436 441
437Generic Serial driver 442Generic Serial driver
438 443
diff --git a/Documentation/video4linux/CARDLIST.bttv b/Documentation/video4linux/CARDLIST.bttv
index b72706c58a44..4efa4645885f 100644
--- a/Documentation/video4linux/CARDLIST.bttv
+++ b/Documentation/video4linux/CARDLIST.bttv
@@ -87,7 +87,7 @@
87 86 -> Osprey 101/151 w/ svid 87 86 -> Osprey 101/151 w/ svid
88 87 -> Osprey 200/201/250/251 88 87 -> Osprey 200/201/250/251
89 88 -> Osprey 200/250 [0070:ff01] 89 88 -> Osprey 200/250 [0070:ff01]
90 89 -> Osprey 210/220 90 89 -> Osprey 210/220/230
91 90 -> Osprey 500 [0070:ff02] 91 90 -> Osprey 500 [0070:ff02]
92 91 -> Osprey 540 [0070:ff04] 92 91 -> Osprey 540 [0070:ff04]
93 92 -> Osprey 2000 [0070:ff03] 93 92 -> Osprey 2000 [0070:ff03]
@@ -111,7 +111,7 @@
111110 -> IVC-100 [ff00:a132] 111110 -> IVC-100 [ff00:a132]
112111 -> IVC-120G [ff00:a182,ff01:a182,ff02:a182,ff03:a182,ff04:a182,ff05:a182,ff06:a182,ff07:a182,ff08:a182,ff09:a182,ff0a:a182,ff0b:a182,ff0c:a182,ff0d:a182,ff0e:a182,ff0f:a182] 112111 -> IVC-120G [ff00:a182,ff01:a182,ff02:a182,ff03:a182,ff04:a182,ff05:a182,ff06:a182,ff07:a182,ff08:a182,ff09:a182,ff0a:a182,ff0b:a182,ff0c:a182,ff0d:a182,ff0e:a182,ff0f:a182]
113112 -> pcHDTV HD-2000 TV [7063:2000] 113112 -> pcHDTV HD-2000 TV [7063:2000]
114113 -> Twinhan DST + clones [11bd:0026,1822:0001,270f:fc00] 114113 -> Twinhan DST + clones [11bd:0026,1822:0001,270f:fc00,1822:0026]
115114 -> Winfast VC100 [107d:6607] 115114 -> Winfast VC100 [107d:6607]
116115 -> Teppro TEV-560/InterVision IV-560 116115 -> Teppro TEV-560/InterVision IV-560
117116 -> SIMUS GVC1100 [aa6a:82b2] 117116 -> SIMUS GVC1100 [aa6a:82b2]
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88
index 3b39a91b24bd..00d9a1f2a54c 100644
--- a/Documentation/video4linux/CARDLIST.cx88
+++ b/Documentation/video4linux/CARDLIST.cx88
@@ -15,7 +15,7 @@
15 14 -> KWorld/VStream XPert DVB-T [17de:08a6] 15 14 -> KWorld/VStream XPert DVB-T [17de:08a6]
16 15 -> DViCO FusionHDTV DVB-T1 [18ac:db00] 16 15 -> DViCO FusionHDTV DVB-T1 [18ac:db00]
17 16 -> KWorld LTV883RF 17 16 -> KWorld LTV883RF
18 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810] 18 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810,18ac:d800]
19 18 -> Hauppauge Nova-T DVB-T [0070:9002,0070:9001] 19 18 -> Hauppauge Nova-T DVB-T [0070:9002,0070:9001]
20 19 -> Conexant DVB-T reference design [14f1:0187] 20 19 -> Conexant DVB-T reference design [14f1:0187]
21 20 -> Provideo PV259 [1540:2580] 21 20 -> Provideo PV259 [1540:2580]
@@ -40,8 +40,14 @@
40 39 -> KWorld DVB-S 100 [17de:08b2] 40 39 -> KWorld DVB-S 100 [17de:08b2]
41 40 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid [0070:9400,0070:9402] 41 40 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid [0070:9400,0070:9402]
42 41 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid (Low Profile) [0070:9800,0070:9802] 42 41 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid (Low Profile) [0070:9800,0070:9802]
43 42 -> digitalnow DNTV Live! DVB-T Pro [1822:0025] 43 42 -> digitalnow DNTV Live! DVB-T Pro [1822:0025,1822:0019]
44 43 -> KWorld/VStream XPert DVB-T with cx22702 [17de:08a1] 44 43 -> KWorld/VStream XPert DVB-T with cx22702 [17de:08a1]
45 44 -> DViCO FusionHDTV DVB-T Dual Digital [18ac:db50,18ac:db54] 45 44 -> DViCO FusionHDTV DVB-T Dual Digital [18ac:db50,18ac:db54]
46 45 -> KWorld HardwareMpegTV XPert [17de:0840] 46 45 -> KWorld HardwareMpegTV XPert [17de:0840]
47 46 -> DViCO FusionHDTV DVB-T Hybrid [18ac:db40,18ac:db44] 47 46 -> DViCO FusionHDTV DVB-T Hybrid [18ac:db40,18ac:db44]
48 47 -> pcHDTV HD5500 HDTV [7063:5500]
49 48 -> Kworld MCE 200 Deluxe [17de:0841]
50 49 -> PixelView PlayTV P7000 [1554:4813]
51 50 -> NPG Tech Real TV FM Top 10 [14f1:0842]
52 51 -> WinFast DTV2000 H [107d:665e]
53 52 -> Geniatech DVB-S [14f1:0084]
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134
index bca50903233f..9068b669f5ee 100644
--- a/Documentation/video4linux/CARDLIST.saa7134
+++ b/Documentation/video4linux/CARDLIST.saa7134
@@ -93,3 +93,4 @@
93 92 -> AVerMedia A169 B1 [1461:6360] 93 92 -> AVerMedia A169 B1 [1461:6360]
94 93 -> Medion 7134 Bridge #2 [16be:0005] 94 93 -> Medion 7134 Bridge #2 [16be:0005]
95 94 -> LifeView FlyDVB-T Hybrid Cardbus [5168:3306,5168:3502] 95 94 -> LifeView FlyDVB-T Hybrid Cardbus [5168:3306,5168:3502]
96 95 -> LifeView FlyVIDEO3000 (NTSC) [5169:0138]
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner
index 1bcdac67dd8c..44134f04b82a 100644
--- a/Documentation/video4linux/CARDLIST.tuner
+++ b/Documentation/video4linux/CARDLIST.tuner
@@ -62,7 +62,7 @@ tuner=60 - Thomson DTT 761X (ATSC/NTSC)
62tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF 62tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF
63tuner=62 - Philips TEA5767HN FM Radio 63tuner=62 - Philips TEA5767HN FM Radio
64tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner 64tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner
65tuner=64 - LG TDVS-H062F/TUA6034 65tuner=64 - LG TDVS-H06xF
66tuner=65 - Ymec TVF66T5-B/DFF 66tuner=65 - Ymec TVF66T5-B/DFF
67tuner=66 - LG TALN series 67tuner=66 - LG TALN series
68tuner=67 - Philips TD1316 Hybrid Tuner 68tuner=67 - Philips TD1316 Hybrid Tuner
@@ -71,3 +71,4 @@ tuner=69 - Tena TNF 5335 and similar models
71tuner=70 - Samsung TCPN 2121P30A 71tuner=70 - Samsung TCPN 2121P30A
72tuner=71 - Xceive xc3028 72tuner=71 - Xceive xc3028
73tuner=72 - Thomson FE6600 73tuner=72 - Thomson FE6600
74tuner=73 - Samsung TCPG 6121P30A
diff --git a/Documentation/video4linux/CQcam.txt b/Documentation/video4linux/CQcam.txt
index 464e4cec94cb..ade8651e2443 100644
--- a/Documentation/video4linux/CQcam.txt
+++ b/Documentation/video4linux/CQcam.txt
@@ -185,207 +185,10 @@ this work is documented at the video4linux2 site listed below.
185 185
1869.0 --- A sample program using v4lgrabber, 1869.0 --- A sample program using v4lgrabber,
187 187
188This program is a simple image grabber that will copy a frame from the 188v4lgrab is a simple image grabber that will copy a frame from the
189first video device, /dev/video0 to standard output in portable pixmap 189first video device, /dev/video0 to standard output in portable pixmap
190format (.ppm) Using this like: 'v4lgrab | convert - c-qcam.jpg' 190format (.ppm). To produce .jpg output, you can use it like this:
191produced this picture of me at 191'v4lgrab | convert - c-qcam.jpg'
192 http://mug.sys.virginia.edu/~drf5n/extras/c-qcam.jpg
193
194-------------------- 8< ---------------- 8< -----------------------------
195
196/* Simple Video4Linux image grabber. */
197/*
198 * Video4Linux Driver Test/Example Framegrabbing Program
199 *
200 * Compile with:
201 * gcc -s -Wall -Wstrict-prototypes v4lgrab.c -o v4lgrab
202 * Use as:
203 * v4lgrab >image.ppm
204 *
205 * Copyright (C) 1998-05-03, Phil Blundell <philb@gnu.org>
206 * Copied from http://www.tazenda.demon.co.uk/phil/vgrabber.c
207 * with minor modifications (Dave Forrest, drf5n@virginia.edu).
208 *
209 */
210
211#include <unistd.h>
212#include <sys/types.h>
213#include <sys/stat.h>
214#include <fcntl.h>
215#include <stdio.h>
216#include <sys/ioctl.h>
217#include <stdlib.h>
218
219#include <linux/types.h>
220#include <linux/videodev.h>
221
222#define FILE "/dev/video0"
223
224/* Stole this from tvset.c */
225
226#define READ_VIDEO_PIXEL(buf, format, depth, r, g, b) \
227{ \
228 switch (format) \
229 { \
230 case VIDEO_PALETTE_GREY: \
231 switch (depth) \
232 { \
233 case 4: \
234 case 6: \
235 case 8: \
236 (r) = (g) = (b) = (*buf++ << 8);\
237 break; \
238 \
239 case 16: \
240 (r) = (g) = (b) = \
241 *((unsigned short *) buf); \
242 buf += 2; \
243 break; \
244 } \
245 break; \
246 \
247 \
248 case VIDEO_PALETTE_RGB565: \
249 { \
250 unsigned short tmp = *(unsigned short *)buf; \
251 (r) = tmp&0xF800; \
252 (g) = (tmp<<5)&0xFC00; \
253 (b) = (tmp<<11)&0xF800; \
254 buf += 2; \
255 } \
256 break; \
257 \
258 case VIDEO_PALETTE_RGB555: \
259 (r) = (buf[0]&0xF8)<<8; \
260 (g) = ((buf[0] << 5 | buf[1] >> 3)&0xF8)<<8; \
261 (b) = ((buf[1] << 2 ) & 0xF8)<<8; \
262 buf += 2; \
263 break; \
264 \
265 case VIDEO_PALETTE_RGB24: \
266 (r) = buf[0] << 8; (g) = buf[1] << 8; \
267 (b) = buf[2] << 8; \
268 buf += 3; \
269 break; \
270 \
271 default: \
272 fprintf(stderr, \
273 "Format %d not yet supported\n", \
274 format); \
275 } \
276}
277
278int get_brightness_adj(unsigned char *image, long size, int *brightness) {
279 long i, tot = 0;
280 for (i=0;i<size*3;i++)
281 tot += image[i];
282 *brightness = (128 - tot/(size*3))/3;
283 return !((tot/(size*3)) >= 126 && (tot/(size*3)) <= 130);
284}
285
286int main(int argc, char ** argv)
287{
288 int fd = open(FILE, O_RDONLY), f;
289 struct video_capability cap;
290 struct video_window win;
291 struct video_picture vpic;
292
293 unsigned char *buffer, *src;
294 int bpp = 24, r, g, b;
295 unsigned int i, src_depth;
296
297 if (fd < 0) {
298 perror(FILE);
299 exit(1);
300 }
301
302 if (ioctl(fd, VIDIOCGCAP, &cap) < 0) {
303 perror("VIDIOGCAP");
304 fprintf(stderr, "(" FILE " not a video4linux device?)\n");
305 close(fd);
306 exit(1);
307 }
308
309 if (ioctl(fd, VIDIOCGWIN, &win) < 0) {
310 perror("VIDIOCGWIN");
311 close(fd);
312 exit(1);
313 }
314
315 if (ioctl(fd, VIDIOCGPICT, &vpic) < 0) {
316 perror("VIDIOCGPICT");
317 close(fd);
318 exit(1);
319 }
320
321 if (cap.type & VID_TYPE_MONOCHROME) {
322 vpic.depth=8;
323 vpic.palette=VIDEO_PALETTE_GREY; /* 8bit grey */
324 if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) {
325 vpic.depth=6;
326 if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) {
327 vpic.depth=4;
328 if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) {
329 fprintf(stderr, "Unable to find a supported capture format.\n");
330 close(fd);
331 exit(1);
332 }
333 }
334 }
335 } else {
336 vpic.depth=24;
337 vpic.palette=VIDEO_PALETTE_RGB24;
338
339 if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) {
340 vpic.palette=VIDEO_PALETTE_RGB565;
341 vpic.depth=16;
342
343 if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) {
344 vpic.palette=VIDEO_PALETTE_RGB555;
345 vpic.depth=15;
346
347 if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) {
348 fprintf(stderr, "Unable to find a supported capture format.\n");
349 return -1;
350 }
351 }
352 }
353 }
354
355 buffer = malloc(win.width * win.height * bpp);
356 if (!buffer) {
357 fprintf(stderr, "Out of memory.\n");
358 exit(1);
359 }
360
361 do {
362 int newbright;
363 read(fd, buffer, win.width * win.height * bpp);
364 f = get_brightness_adj(buffer, win.width * win.height, &newbright);
365 if (f) {
366 vpic.brightness += (newbright << 8);
367 if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) {
368 perror("VIDIOSPICT");
369 break;
370 }
371 }
372 } while (f);
373
374 fprintf(stdout, "P6\n%d %d 255\n", win.width, win.height);
375
376 src = buffer;
377
378 for (i = 0; i < win.width * win.height; i++) {
379 READ_VIDEO_PIXEL(src, vpic.palette, src_depth, r, g, b);
380 fputc(r>>8, stdout);
381 fputc(g>>8, stdout);
382 fputc(b>>8, stdout);
383 }
384
385 close(fd);
386 return 0;
387}
388-------------------- 8< ---------------- 8< -----------------------------
389 192
390 193
39110.0 --- Other Information 19410.0 --- Other Information
diff --git a/Documentation/video4linux/README.pvrusb2 b/Documentation/video4linux/README.pvrusb2
new file mode 100644
index 000000000000..c73a32c34528
--- /dev/null
+++ b/Documentation/video4linux/README.pvrusb2
@@ -0,0 +1,212 @@
1
2$Id$
3Mike Isely <isely@pobox.com>
4
5 pvrusb2 driver
6
7Background:
8
9 This driver is intended for the "Hauppauge WinTV PVR USB 2.0", which
10 is a USB 2.0 hosted TV Tuner. This driver is a work in progress.
11 Its history started with the reverse-engineering effort by Björn
12 Danielsson <pvrusb2@dax.nu> whose web page can be found here:
13
14 http://pvrusb2.dax.nu/
15
16 From there Aurelien Alleaume <slts@free.fr> began an effort to
17 create a video4linux compatible driver. I began with Aurelien's
18 last known snapshot and evolved the driver to the state it is in
19 here.
20
21 More information on this driver can be found at:
22
23 http://www.isely.net/pvrusb2.html
24
25
26 This driver has a strong separation of layers. They are very
27 roughly:
28
29 1a. Low level wire-protocol implementation with the device.
30
31 1b. I2C adaptor implementation and corresponding I2C client drivers
32 implemented elsewhere in V4L.
33
34 1c. High level hardware driver implementation which coordinates all
35 activities that ensure correct operation of the device.
36
37 2. A "context" layer which manages instancing of driver, setup,
38 tear-down, arbitration, and interaction with high level
39 interfaces appropriately as devices are hotplugged in the
40 system.
41
42 3. High level interfaces which glue the driver to various published
43 Linux APIs (V4L, sysfs, maybe DVB in the future).
44
45 The most important shearing layer is between the top 2 layers. A
46 lot of work went into the driver to ensure that any kind of
47 conceivable API can be laid on top of the core driver. (Yes, the
48 driver internally leverages V4L to do its work but that really has
49 nothing to do with the API published by the driver to the outside
50 world.) The architecture allows for different APIs to
51 simultaneously access the driver. I have a strong sense of fairness
52 about APIs and also feel that it is a good design principle to keep
53 implementation and interface isolated from each other. Thus while
54 right now the V4L high level interface is the most complete, the
55 sysfs high level interface will work equally well for similar
56 functions, and there's no reason I see right now why it shouldn't be
57 possible to produce a DVB high level interface that can sit right
58 alongside V4L.
59
60 NOTE: Complete documentation on the pvrusb2 driver is contained in
61 the html files within the doc directory; these are exactly the same
62 as what is on the web site at the time. Browse those files
63 (especially the FAQ) before asking questions.
64
65
66Building
67
68 To build these modules essentially amounts to just running "make",
69 but you need the kernel source tree nearby and you will likely also
70 want to set a few controlling environment variables first in order
71 to link things up with that source tree. Please see the Makefile
72 here for comments that explain how to do that.
73
74
75Source file list / functional overview:
76
77 (Note: The term "module" used below generally refers to loosely
78 defined functional units within the pvrusb2 driver and bears no
79 relation to the Linux kernel's concept of a loadable module.)
80
81 pvrusb2-audio.[ch] - This is glue logic that resides between this
82 driver and the msp3400.ko I2C client driver (which is found
83 elsewhere in V4L).
84
85 pvrusb2-context.[ch] - This module implements the context for an
86 instance of the driver. Everything else eventually ties back to
87 or is otherwise instanced within the data structures implemented
88 here. Hotplugging is ultimately coordinated here. All high level
89 interfaces tie into the driver through this module. This module
90 helps arbitrate each interface's access to the actual driver core,
91 and is designed to allow concurrent access through multiple
92 instances of multiple interfaces (thus you can for example change
93 the tuner's frequency through sysfs while simultaneously streaming
94 video through V4L out to an instance of mplayer).
95
96 pvrusb2-debug.h - This header defines a printk() wrapper and a mask
97 of debugging bit definitions for the various kinds of debug
98 messages that can be enabled within the driver.
99
100 pvrusb2-debugifc.[ch] - This module implements a crude command line
101 oriented debug interface into the driver. Aside from being part
102 of the process for implementing manual firmware extraction (see
103 the pvrusb2 web site mentioned earlier), probably I'm the only one
104 who has ever used this. It is mainly a debugging aid.
105
106 pvrusb2-eeprom.[ch] - This is glue logic that resides between this
107 driver and the tveeprom.ko module, which is itself implemented
108 elsewhere in V4L.
109
110 pvrusb2-encoder.[ch] - This module implements all protocol needed to
111 interact with the Conexant mpeg2 encoder chip within the pvrusb2
112 device. It is a crude echo of corresponding logic in ivtv,
113 however the design goals (strict isolation) and physical layer
114 (proxy through USB instead of PCI) are different enough that this
115 implementation had to be completely different.
116
117 pvrusb2-hdw-internal.h - This header defines the core data structure
118 in the driver used to track ALL internal state related to control
119 of the hardware. Nobody outside of the core hardware-handling
120 modules should have any business using this header. All external
121 access to the driver should be through one of the high level
122 interfaces (e.g. V4L, sysfs, etc), and in fact even those high
123 level interfaces are restricted to the API defined in
124 pvrusb2-hdw.h and NOT this header.
125
126 pvrusb2-hdw.h - This header defines the full internal API for
127 controlling the hardware. High level interfaces (e.g. V4L, sysfs)
128 will work through here.
129
130 pvrusb2-hdw.c - This module implements all the various bits of logic
131 that handle overall control of a specific pvrusb2 device.
132 (Policy, instantiation, and arbitration of pvrusb2 devices fall
133 within the jurisdiction of pvrusb2-context, not here).
134
135 pvrusb2-i2c-chips-*.c - These modules implement the glue logic to
136 tie together and configure various I2C modules as they attach to
137 the I2C bus. There are two versions of this file. The "v4l2"
138 version is intended to be used in-tree alongside V4L, where we
139 implement just the logic that makes sense for a pure V4L
140 environment. The "all" version is intended for use outside of
141 V4L, where we might encounter other possibly "challenging" modules
142 from ivtv or older kernel snapshots (or even the support modules
143 in the standalone snapshot).
144
145 pvrusb2-i2c-cmd-v4l1.[ch] - This module implements generic V4L1
146 compatible commands to the I2C modules. It is here where state
147 changes inside the pvrusb2 driver are translated into V4L1
148 commands that are in turn sent to the various I2C modules.
149
150 pvrusb2-i2c-cmd-v4l2.[ch] - This module implements generic V4L2
151 compatible commands to the I2C modules. It is here where state
152 changes inside the pvrusb2 driver are translated into V4L2
153 commands that are in turn sent to the various I2C modules.
154
155 pvrusb2-i2c-core.[ch] - This module provides an implementation of a
156 kernel-friendly I2C adaptor driver, through which other external
157 I2C client drivers (e.g. msp3400, tuner, lirc) may connect and
158 operate corresponding chips within the pvrusb2 device. It is
159 through here that other V4L modules can reach into this driver to
160 operate specific pieces (and those modules are in turn driven by
161 glue logic which is coordinated by pvrusb2-hdw, doled out by
162 pvrusb2-context, and then ultimately made available to users
163 through one of the high level interfaces).
164
165 pvrusb2-io.[ch] - This module implements a very low level ring of
166 transfer buffers, required in order to stream data from the
167 device. This module is *very* low level. It only operates the
168 buffers and makes no attempt to define any policy or mechanism for
169 how such buffers might be used.
170
171 pvrusb2-ioread.[ch] - This module layers on top of pvrusb2-io.[ch]
172 to provide a streaming API usable by a read() system call style of
173 I/O. Right now this is the only layer on top of pvrusb2-io.[ch],
174 however the underlying architecture here was intended to allow for
175 other styles of I/O to be implemented with additional modules, like
176 mmap()'ed buffers or something even more exotic.
177
178 pvrusb2-main.c - This is the top level of the driver. Module level
179 and USB core entry points are here. This is our "main".
180
181 pvrusb2-sysfs.[ch] - This is the high level interface which ties the
182 pvrusb2 driver into sysfs. Through this interface you can do
183 everything with the driver except actually stream data.
184
185 pvrusb2-tuner.[ch] - This is glue logic that resides between this
186 driver and the tuner.ko I2C client driver (which is found
187 elsewhere in V4L).
188
189 pvrusb2-util.h - This header defines some common macros used
190 throughout the driver. These macros are not really specific to
191 the driver, but they had to go somewhere.
192
193 pvrusb2-v4l2.[ch] - This is the high level interface which ties the
194 pvrusb2 driver into video4linux. It is through here that V4L
195 applications can open and operate the driver in the usual V4L
196 ways. Note that **ALL** V4L functionality is published only
197 through here and nowhere else.
198
199 pvrusb2-video-*.[ch] - This is glue logic that resides between this
200 driver and the saa711x.ko I2C client driver (which is found
201 elsewhere in V4L). Note that saa711x.ko used to be known as
202 saa7115.ko in ivtv. There are two versions of this; one is
203 selected depending on the particular saa711[5x].ko that is found.
204
205 pvrusb2.h - This header contains compile time tunable parameters
206 (and at the moment the driver has very little that needs to be
207 tuned).
208
209
210 -Mike Isely
211 isely@pobox.com
212
diff --git a/Documentation/video4linux/Zoran b/Documentation/video4linux/Zoran
index be9f21b84555..040a2c841ae9 100644
--- a/Documentation/video4linux/Zoran
+++ b/Documentation/video4linux/Zoran
@@ -33,6 +33,21 @@ Inputs/outputs: Composite and S-video
33Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps) 33Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps)
34Card number: 7 34Card number: 7
35 35
36AverMedia 6 Eyes AVS6EYES:
37* Zoran zr36067 PCI controller
38* Zoran zr36060 MJPEG codec
39* Samsung ks0127 TV decoder
40* Conexant bt866 TV encoder
41Drivers to use: videodev, i2c-core, i2c-algo-bit,
42 videocodec, ks0127, bt866, zr36060, zr36067
43Inputs/outputs: Six physical inputs. 1-6 are composite,
44 1-2, 3-4, 5-6 double as S-video,
45 1-3 triple as component.
46 One composite output.
47Norms: PAL, SECAM (720x576 @ 25 fps), NTSC (720x480 @ 29.97 fps)
48Card number: 8
49Not autodetected, card=8 is necessary.
50
36Linux Media Labs LML33: 51Linux Media Labs LML33:
37* Zoran zr36067 PCI controller 52* Zoran zr36067 PCI controller
38* Zoran zr36060 MJPEG codec 53* Zoran zr36060 MJPEG codec
@@ -192,6 +207,10 @@ Micronas vpx3220a TV decoder
192was introduced in 1996, is used in the DC30 and DC30+ and 207was introduced in 1996, is used in the DC30 and DC30+ and
193can handle: PAL B/G/H/I, PAL N, PAL M, NTSC M, NTSC 44, PAL 60, SECAM,NTSC Comb 208can handle: PAL B/G/H/I, PAL N, PAL M, NTSC M, NTSC 44, PAL 60, SECAM,NTSC Comb
194 209
210Samsung ks0127 TV decoder
211is used in the AVS6EYES card and
212can handle: NTSC-M/N/44, PAL-M/N/B/G/H/I/D/K/L and SECAM
213
195=========================== 214===========================
196 215
1971.2 What the TV encoder can do an what not 2161.2 What the TV encoder can do an what not
@@ -221,6 +240,10 @@ ITT mse3000 TV encoder
221was introduced in 1991, is used in the DC10 old 240was introduced in 1991, is used in the DC10 old
222can generate: PAL , NTSC , SECAM 241can generate: PAL , NTSC , SECAM
223 242
243Conexant bt866 TV encoder
244is used in AVS6EYES, and
245can generate: NTSC/PAL, PAL-M, PAL-N
246
224The adv717x, should be able to produce PAL N. But you find nothing PAL N 247The adv717x, should be able to produce PAL N. But you find nothing PAL N
225specific in the registers. Seem that you have to reuse a other standard 248specific in the registers. Seem that you have to reuse a other standard
226to generate PAL N, maybe it would work if you use the PAL M settings. 249to generate PAL N, maybe it would work if you use the PAL M settings.
diff --git a/Documentation/video4linux/bttv/CONTRIBUTORS b/Documentation/video4linux/bttv/CONTRIBUTORS
index aef49db8847d..8aad6dd93d6b 100644
--- a/Documentation/video4linux/bttv/CONTRIBUTORS
+++ b/Documentation/video4linux/bttv/CONTRIBUTORS
@@ -1,4 +1,4 @@
1Contributors to bttv: 1Contributors to bttv:
2 2
3Michael Chu <mmchu@pobox.com> 3Michael Chu <mmchu@pobox.com>
4 AverMedia fix and more flexible card recognition 4 AverMedia fix and more flexible card recognition
@@ -8,8 +8,8 @@ Alan Cox <alan@redhat.com>
8 8
9Chris Kleitsch 9Chris Kleitsch
10 Hardware I2C 10 Hardware I2C
11 11
12Gerd Knorr <kraxel@cs.tu-berlin.de> 12Gerd Knorr <kraxel@cs.tu-berlin.de>
13 Radio card (ITT sound processor) 13 Radio card (ITT sound processor)
14 14
15bigfoot <bigfoot@net-way.net> 15bigfoot <bigfoot@net-way.net>
@@ -18,7 +18,7 @@ Ragnar Hojland Espinosa <ragnar@macula.net>
18 18
19 19
20+ many more (please mail me if you are missing in this list and would 20+ many more (please mail me if you are missing in this list and would
21 like to be mentioned) 21 like to be mentioned)
22 22
23 23
24 24
diff --git a/Documentation/video4linux/cx2341x/fw-calling.txt b/Documentation/video4linux/cx2341x/fw-calling.txt
new file mode 100644
index 000000000000..8d21181de537
--- /dev/null
+++ b/Documentation/video4linux/cx2341x/fw-calling.txt
@@ -0,0 +1,69 @@
1This page describes how to make calls to the firmware api.
2
3How to call
4===========
5
6The preferred calling convention is known as the firmware mailbox. The
7mailboxes are basically a fixed length array that serves as the call-stack.
8
9Firmware mailboxes can be located by searching the encoder and decoder memory
10for a 16 byte signature. That signature will be located on a 256-byte boundary.
11
12Signature:
130x78, 0x56, 0x34, 0x12, 0x12, 0x78, 0x56, 0x34,
140x34, 0x12, 0x78, 0x56, 0x56, 0x34, 0x12, 0x78
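
The signature search described above can be sketched as follows. This is only an illustration: the names find_mbox_signature and fw_mbox_signature are hypothetical, and the sketch assumes the card memory has already been copied into an ordinary buffer (a real driver would read the memory-mapped window with the kernel's I/O accessors instead).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 16 signature bytes, in the order listed above. */
static const uint8_t fw_mbox_signature[16] = {
	0x78, 0x56, 0x34, 0x12, 0x12, 0x78, 0x56, 0x34,
	0x34, 0x12, 0x78, 0x56, 0x56, 0x34, 0x12, 0x78,
};

/*
 * Return the byte offset of the signature inside mem, or -1 if it is
 * not present.  Only offsets on a 256-byte boundary are examined.
 * (Hypothetical helper; mem is assumed to be a plain copy of card memory.)
 */
static long find_mbox_signature(const uint8_t *mem, size_t len)
{
	size_t off;

	for (off = 0; off + sizeof(fw_mbox_signature) <= len; off += 256) {
		if (memcmp(mem + off, fw_mbox_signature,
			   sizeof(fw_mbox_signature)) == 0)
			return (long)off;
	}
	return -1;
}
```

Stepping by 256 bytes rather than scanning every offset keeps the search cheap and avoids false matches at unaligned positions.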
15
16The firmware implements 20 mailboxes of 20 32-bit words. The first 10 are
17reserved for API calls. The second 10 are used by the firmware for event
18notification.
19
20 Index Name
21 ----- ----
22 0 Flags
23 1 Command
24 2 Return value
25 3 Timeout
26 4-19 Parameter/Result
27
28
29The flags are defined in the following table. The direction is from the
30perspective of the firmware.
31
32 Bit Direction Purpose
33 --- --------- -------
34 2 O Firmware has processed the command.
35 1 I Driver has finished setting the parameters.
36 0 I Driver is using this mailbox.
37
38
39The command is a 32-bit enumerator. The API specifics may be found in the
40fw-*-api.txt documents.
41
42The return value is a 32-bit enumerator. Only two values are currently defined:
430=success and -1=command undefined.
44
45There are 16 32-bit parameter/result fields. The driver populates these fields
46with values for all the parameters required by the call. The firmware overwrites
47these fields with result values returned by the call. The API specifics may be
48found in the fw-*-api.txt documents.
49
50The timeout value protects the card from a hung driver thread. If the driver
51doesn't handle the completed call within the timeout specified, the firmware
52will reset that mailbox.
53
54To make an API call, the driver iterates over each mailbox looking for the
55first one available (bit 0 has been cleared). The driver sets that bit, fills
56in the command enumerator, the timeout value and any required parameters. The
57driver then sets the parameter ready bit (bit 1). The firmware scans the
58mailboxes for pending commands, processes them, sets the result code, populates
59the result value array with that call's return values and sets the call
60complete bit (bit 2). Once bit 2 is set, the driver should retrieve the results
61and clear all the flags. If the driver does not perform this task within the
62time set in the timeout register, the firmware will reset that mailbox.
63
64Event notifications are sent from the firmware to the host. The host tells the
65firmware which events it is interested in via an API call. That call tells the
66firmware which notification mailbox to use. The firmware signals the host via
67an interrupt. Only the 16 Results fields are used; the Flags, Command, Return
68value and Timeout words are not used.
69
diff --git a/Documentation/video4linux/cx2341x/fw-decoder-api.txt b/Documentation/video4linux/cx2341x/fw-decoder-api.txt
new file mode 100644
index 000000000000..9df4fb3ea0f2
--- /dev/null
+++ b/Documentation/video4linux/cx2341x/fw-decoder-api.txt
@@ -0,0 +1,319 @@
1Decoder firmware API description
2================================
3
4Note: this API is part of the decoder firmware, so it's cx23415 only.
5
6-------------------------------------------------------------------------------
7
8Name CX2341X_DEC_PING_FW
9Enum 0/0x00
10Description
11 This API call does nothing. It may be used to check if the firmware
12 is responding.
13
14-------------------------------------------------------------------------------
15
16Name CX2341X_DEC_START_PLAYBACK
17Enum 1/0x01
18Description
19 Begin or resume playback.
20Param[0]
21 0 based frame number in GOP to begin playback from.
22Param[1]
23 Specifies the number of muted audio frames to play before normal
24 audio resumes.
25
26-------------------------------------------------------------------------------
27
28Name CX2341X_DEC_STOP_PLAYBACK
29Enum 2/0x02
30Description
31 Ends playback and clears all decoder buffers. If PTS is not zero,
32 playback stops at specified PTS.
33Param[0]
34 Display 0=last frame, 1=black
35Param[1]
36 PTS low
37Param[2]
38 PTS high
39
40-------------------------------------------------------------------------------
41
42Name CX2341X_DEC_SET_PLAYBACK_SPEED
43Enum 3/0x03
44Description
45 Playback stream at speed other than normal. There are two modes of
46 operation:
47 Smooth: host transfers entire stream and firmware drops unused
48 frames.
49 Coarse: host drops frames based on indexing as required to achieve
50 desired speed.
51Param[0]
52 Bitmap:
53 0:7 0 normal
54 1 fast only "1.5 times"
55 n nX fast, 1/nX slow
56 30 Framedrop:
57 '0' during 1.5 times play, every other B frame is dropped
58 '1' during 1.5 times play, stream is unchanged (bitrate
59 must not exceed 8mbps)
60 31 Speed:
61 '0' slow
62 '1' fast
63Param[1]
64 Direction: 0=forward, 1=reverse
65Param[2]
66 Picture mask:
67 1=I frames
68 3=I, P frames
69 7=I, P, B frames
70Param[3]
71 B frames per GOP (for reverse play only)
72Param[4]
73 Mute audio: 0=disable, 1=enable
74Param[5]
75 Display 0=frame, 1=field
76Param[6]
77 Specifies the number of muted audio frames to play before normal audio
78 resumes.
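
For illustration, Param[0] of CX2341X_DEC_SET_PLAYBACK_SPEED could be assembled as below. The helper name and its argument names are hypothetical; only the bit positions come from the bitmap above.

```c
#include <stdint.h>

/*
 * Build Param[0] for CX2341X_DEC_SET_PLAYBACK_SPEED (hypothetical helper).
 *   n           - value for bits 0:7 (0 normal, 1 "1.5 times", n = nX)
 *   keep_stream - bit 30: nonzero leaves the stream unchanged at 1.5x
 *   fast        - bit 31: nonzero selects fast, zero selects slow
 */
static uint32_t dec_speed_param0(unsigned int n, int keep_stream, int fast)
{
	uint32_t v = n & 0xff;

	if (keep_stream)
		v |= 1u << 30;
	if (fast)
		v |= 1u << 31;
	return v;
}
```

With this encoding, 2X fast playback is dec_speed_param0(2, 0, 1) and 1/4X slow motion is dec_speed_param0(4, 0, 0).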
79
80-------------------------------------------------------------------------------
81
82Name CX2341X_DEC_STEP_VIDEO
83Enum 5/0x05
84Description
85 Each call to this API steps the playback to the next unit defined below
86 in the current playback direction.
87Param[0]
88 0=frame, 1=top field, 2=bottom field
89
90-------------------------------------------------------------------------------
91
92Name CX2341X_DEC_SET_DMA_BLOCK_SIZE
93Enum 8/0x08
94Description
95 Set DMA transfer block size. Counterpart to API 0xC9
96Param[0]
97 DMA transfer block size in bytes. A different size may be specified
98 when issuing the DMA transfer command.
99
100-------------------------------------------------------------------------------
101
102Name CX2341X_DEC_GET_XFER_INFO
103Enum 9/0x09
104Description
105 This API call may be used to detect an end of stream condition.
106Result[0]
107 Stream type
108Result[1]
109 Address offset
110Result[2]
111 Maximum bytes to transfer
112Result[3]
113 Buffer fullness
114
115-------------------------------------------------------------------------------
116
117Name CX2341X_DEC_GET_DMA_STATUS
118Enum 10/0x0A
119Description
120 Status of the last DMA transfer
121Result[0]
122 Bit 1 set means transfer complete
123 Bit 2 set means DMA error
124 Bit 3 set means linked list error
125Result[1]
126 DMA type: 0=MPEG, 1=OSD, 2=YUV
127
128-------------------------------------------------------------------------------
129
130Name CX2341X_DEC_SCHED_DMA_FROM_HOST
131Enum 11/0x0B
132Description
133 Setup DMA from host operation. Counterpart to API 0xCC
134Param[0]
135 Memory address of linked list
136Param[1]
137 Total # of bytes to transfer
138Param[2]
139 DMA type (0=MPEG, 1=OSD, 2=YUV)
140
141-------------------------------------------------------------------------------
142
143Name CX2341X_DEC_PAUSE_PLAYBACK
144Enum 13/0x0D
145Description
146 Freeze playback immediately. In this mode, when internal buffers are
147 full, no more data will be accepted and data request IRQs will be
148 masked.
149Param[0]
150 Display: 0=last frame, 1=black
151
152-------------------------------------------------------------------------------
153
154Name CX2341X_DEC_HALT_FW
155Enum 14/0x0E
156Description
157 The firmware is halted and no further API calls are serviced until
158 the firmware is uploaded again.
159
160-------------------------------------------------------------------------------
161
162Name CX2341X_DEC_SET_STANDARD
163Enum 16/0x10
164Description
165 Selects display standard
166Param[0]
167 0=NTSC, 1=PAL
168
169-------------------------------------------------------------------------------
170
171Name CX2341X_DEC_GET_VERSION
172Enum 17/0x11
173Description
174 Returns decoder firmware version information
175Result[0]
176 Version bitmask:
177 Bits 0:15 build
178 Bits 16:23 minor
179 Bits 24:31 major
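
A small sketch of how Result[0] of CX2341X_DEC_GET_VERSION might be unpacked, following the bit ranges above. The struct and function names are invented for this example.

```c
#include <stdint.h>

/* Decoded firmware version (hypothetical type for this sketch). */
struct dec_fw_version {
	unsigned int major;
	unsigned int minor;
	unsigned int build;
};

/* Split Result[0] into its major, minor and build fields. */
static struct dec_fw_version dec_parse_version(uint32_t result0)
{
	struct dec_fw_version v;

	v.build = result0 & 0xffff;		/* bits 0:15 */
	v.minor = (result0 >> 16) & 0xff;	/* bits 16:23 */
	v.major = (result0 >> 24) & 0xff;	/* bits 24:31 */
	return v;
}
```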
180
181-------------------------------------------------------------------------------
182
183Name CX2341X_DEC_SET_STREAM_INPUT
184Enum 20/0x14
185Description
186 Select decoder stream input port
187Param[0]
188 0=memory (default), 1=streaming
189
190-------------------------------------------------------------------------------
191
192Name CX2341X_DEC_GET_TIMING_INFO
193Enum 21/0x15
194Description
195 Returns timing information from start of playback
196Result[0]
197 Frame count by decode order
198Result[1]
199 Video PTS bits 0:31 by display order
200Result[2]
201 Video PTS bit 32 by display order
202Result[3]
203 SCR bits 0:31 by display order
204Result[4]
205 SCR bit 32 by display order
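
The PTS and SCR values are 33 bits wide and arrive split over two result words. One way to recombine them is sketched here; the function name is hypothetical, and the sketch assumes that only bit 0 of the high word is meaningful.

```c
#include <stdint.h>

/*
 * Combine the low word (bits 0:31) and the high word (bit 32) returned by
 * CX2341X_DEC_GET_TIMING_INFO into a single 33-bit timestamp.
 */
static uint64_t dec_timestamp33(uint32_t low, uint32_t high)
{
	return ((uint64_t)(high & 1) << 32) | low;
}
```

The same helper works for both the Result[1]/Result[2] pair (video PTS) and the Result[3]/Result[4] pair (SCR).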
206
207-------------------------------------------------------------------------------
208
209Name CX2341X_DEC_SET_AUDIO_MODE
210Enum 22/0x16
211Description
212 Select audio mode
213Param[0]
214 Dual mono mode action
215Param[1]
216 Stereo mode action:
217 0=Stereo, 1=Left, 2=Right, 3=Mono, 4=Swap, -1=Unchanged
218
219-------------------------------------------------------------------------------
220
221Name CX2341X_DEC_SET_EVENT_NOTIFICATION
222Enum 23/0x17
223Description
224 Setup firmware to notify the host about a particular event.
225 Counterpart to API 0xD5
226Param[0]
227 Event: 0=Audio mode change between stereo and dual channel
228Param[1]
229 Notification 0=disabled, 1=enabled
230Param[2]
231 Interrupt bit
232Param[3]
233 Mailbox slot, -1 if no mailbox required.
234
235-------------------------------------------------------------------------------
236
237Name CX2341X_DEC_SET_DISPLAY_BUFFERS
238Enum 24/0x18
239Description
240 Number of display buffers. To decode all frames in reverse playback you
241 must use nine buffers.
242Param[0]
243 0=six buffers, 1=nine buffers
244
245-------------------------------------------------------------------------------
246
247Name CX2341X_DEC_EXTRACT_VBI
248Enum 25/0x19
249Description
250 Extracts VBI data
251Param[0]
252 0=extract from extension & user data, 1=extract from private packets
253Result[0]
254 VBI table location
255Result[1]
256 VBI table size
257
258-------------------------------------------------------------------------------
259
260Name CX2341X_DEC_SET_DECODER_SOURCE
261Enum 26/0x1A
262Description
263 Selects decoder source. Ensure that the parameters passed to this
264 API match the encoder settings.
265Param[0]
266 Mode: 0=MPEG from host, 1=YUV from encoder, 2=YUV from host
267Param[1]
268 YUV picture width
269Param[2]
270 YUV picture height
271Param[3]
272 Bitmap: see Param[0] of API 0xBD
273
274-------------------------------------------------------------------------------
275
276Name CX2341X_DEC_SET_AUDIO_OUTPUT
277Enum 27/0x1B
278Description
279 Select audio output format
280Param[0]
281 Bitmask:
282 0:1 Data size:
283 '00' 16 bit
284 '01' 20 bit
285 '10' 24 bit
286 2:7 Unused
287 8:9 Mode:
288 '00' 2 channels
289 '01' 4 channels
290 '10' 6 channels
291 '11' 6 channels with one line data mode
292 (for left justified MSB first mode, 20 bit only)
293 10:11 Unused
294 12:13 Channel format:
295 '00' right justified MSB first mode
296 '01' left justified MSB first mode
297 '10' I2S mode
298 14:15 Unused
299 16:21 Right justify bit count
300 22:31 Unused
301
302-------------------------------------------------------------------------------
303
304Name CX2341X_DEC_SET_AV_DELAY
305Enum 28/0x1C
306Description
307 Set audio/video delay in 90kHz ticks
308Param[0]
309 0=A/V in sync, negative=audio lags, positive=video lags
310
311-------------------------------------------------------------------------------
312
313Name CX2341X_DEC_SET_PREBUFFERING
314Enum 30/0x1E
315Description
316 Decoder prebuffering. When enabled, up to 128KB are buffered for
317 streams <8mbps or 640KB for streams >8mbps
318Param[0]
319 0=off, 1=on
diff --git a/Documentation/video4linux/cx2341x/fw-dma.txt b/Documentation/video4linux/cx2341x/fw-dma.txt
new file mode 100644
index 000000000000..8123e262d5b6
--- /dev/null
+++ b/Documentation/video4linux/cx2341x/fw-dma.txt
@@ -0,0 +1,94 @@
1This page describes the structures and procedures used by the cx2341x DMA
2engine.
3
4Introduction
5============
6
7The cx2341x PCI interface is busmaster capable. This means it has a DMA
8engine to efficiently transfer large volumes of data between the card and main
9memory without requiring help from a CPU. Like most hardware, it must operate
10on contiguous physical memory. This is difficult to come by in large quantities
11on virtual memory machines.
12
13Therefore, it also supports a technique called "scatter-gather". The card can
14transfer multiple buffers in one operation. Instead of allocating one large
15contiguous buffer, the driver can allocate several smaller buffers.
16
17In practice, I've seen the average transfer to be roughly 80K, but transfers
18above 128K were not uncommon, particularly at startup. The 128K figure is
19important, because that is the largest block that the kernel can normally
20allocate. Even still, 128K blocks are hard to come by, so the driver writer is
21urged to choose a smaller block size and learn the scatter-gather technique.
22
23Mailbox #10 is reserved for DMA transfer information.
24
25Flow
26====
27
28This section describes, in general, the order of events when handling DMA
29transfers. Detailed information follows this section.
30
31- The card raises the Encoder interrupt.
32- The driver reads the transfer type, offset and size from Mailbox #10.
33- The driver constructs the scatter-gather array from enough free dma buffers
34 to cover the size.
35- The driver schedules the DMA transfer via the ScheduleDMAtoHost API call.
36- The card raises the DMA Complete interrupt.
37- The driver checks the DMA status register for any errors.
38- The driver post-processes the newly transferred buffers.
39
40NOTE! It is possible that the Encoder and DMA Complete interrupts get raised
41simultaneously. (End of the last, start of the next, etc.)
42
43Mailbox #10
44===========
45
46The Flags, Command, Return Value and Timeout fields are ignored.
47
48Name: Mailbox #10
49Results[0]: Type: 0: MPEG.
50Results[1]: Offset: The position relative to the card's memory space.
51Results[2]: Size: The exact number of bytes to transfer.
52
53My speculation is that, since the StartCapture API has a capture type of "RAW"
54available, the type field will have other values that correspond to YUV
55and PCM data.
56
57Scatter-Gather Array
58====================
59
60The scatter-gather array is a contiguously allocated block of memory that
61tells the card the source and destination of each data-block to transfer.
62Card "addresses" are derived from the offset supplied by Mailbox #10. Host
63addresses are the physical memory location of the target DMA buffer.
64
65Each S-G array element is a struct of three 32-bit words. The first word is
66the source address, the second is the destination address. Both take up the
67entire 32 bits. The lowest 16 bits of the third word is the transfer byte
68count. The high-bit of the third word is the "last" flag. The last-flag tells
69the card to raise the DMA_DONE interrupt. From hard personal experience, if
70you forget to set this bit, the card will still "work" but the stream will
71most likely get corrupted.
72
73The transfer count must be a multiple of 256. Therefore, the driver will need
74to track how much data in the target buffer is valid and deal with it
75accordingly.
76
77Array Element:
78
79- 32-bit Source Address
80- 32-bit Destination Address
81- 16-bit reserved (high bit is the last flag)
82- 16-bit byte count
83
84DMA Transfer Status
85===================
86
87Register 0x0004 holds the DMA Transfer Status:
88
89Bit
904 Scatter-Gather array error
913 DMA write error
922 DMA read error
931 write completed
940 read completed
diff --git a/Documentation/video4linux/cx2341x/fw-encoder-api.txt b/Documentation/video4linux/cx2341x/fw-encoder-api.txt
new file mode 100644
index 000000000000..001c68644b08
--- /dev/null
+++ b/Documentation/video4linux/cx2341x/fw-encoder-api.txt
@@ -0,0 +1,694 @@
1Encoder firmware API description
2================================
3
4-------------------------------------------------------------------------------
5
6Name CX2341X_ENC_PING_FW
7Enum 128/0x80
8Description
9 Does nothing. Can be used to check if the firmware is responding.
10
11-------------------------------------------------------------------------------
12
13Name CX2341X_ENC_START_CAPTURE
14Enum 129/0x81
15Description
16 Commences the capture of video, audio and/or VBI data. All encoding
17 parameters must be initialized prior to this API call. Captures frames
18 continuously or until a predefined number of frames have been captured.
19Param[0]
20 Capture stream type:
21 0=MPEG
22 1=Raw
23 2=Raw passthrough
24 3=VBI
25
26Param[1]
27 Bitmask:
28 Bit 0 when set, captures YUV
29 Bit 1 when set, captures PCM audio
30 Bit 2 when set, captures VBI (same as param[0]=3)
31 Bit 3 when set, the capture destination is the decoder
32 (same as param[0]=2)
33 Bit 4 when set, the capture destination is the host
34 Note: this parameter is only meaningful for RAW capture type.
35
36-------------------------------------------------------------------------------
37
38Name CX2341X_ENC_STOP_CAPTURE
39Enum 130/0x82
40Description
41 Ends a capture in progress
42Param[0]
43 0=stop at end of GOP (generates IRQ)
44 1=stop immediate (no IRQ)
45Param[1]
46 Stream type to stop, see param[0] of API 0x81
47Param[2]
48 Subtype, see param[1] of API 0x81
49
50-------------------------------------------------------------------------------
51
52Name CX2341X_ENC_SET_AUDIO_ID
53Enum 137/0x89
54Description
55 Assigns the transport stream ID of the encoded audio stream
56Param[0]
57 Audio Stream ID
58
59-------------------------------------------------------------------------------
60
61Name CX2341X_ENC_SET_VIDEO_ID
62Enum 139/0x8B
63Description
64 Set video transport stream ID
65Param[0]
66 Video stream ID
67
68-------------------------------------------------------------------------------
69
70Name CX2341X_ENC_SET_PCR_ID
71Enum 141/0x8D
72Description
73 Assigns the transport stream ID for PCR packets
74Param[0]
75 PCR Stream ID
76
77-------------------------------------------------------------------------------
78
79Name CX2341X_ENC_SET_FRAME_RATE
80Enum 143/0x8F
81Description
82 Set video frames per second. Change occurs at start of new GOP.
83Param[0]
84 0=30fps
85 1=25fps
86
87-------------------------------------------------------------------------------
88
89Name CX2341X_ENC_SET_FRAME_SIZE
90Enum 145/0x91
91Description
92 Select video stream encoding resolution.
93Param[0]
94 Height in lines. Default 480
95Param[1]
96 Width in pixels. Default 720
97
98-------------------------------------------------------------------------------
99
100Name CX2341X_ENC_SET_BIT_RATE
101Enum 149/0x95
102Description
103 Assign average video stream bitrate. Note on the last three params:
104 Param[3] and [4] seem to be always 0, param [5] doesn't seem to be used.
105Param[0]
106 0=variable bitrate, 1=constant bitrate
107Param[1]
108 bitrate in bits per second
109Param[2]
110 peak bitrate in bits per second, divided by 400
111Param[3]
112 Mux bitrate in bits per second, divided by 400. May be 0 (default).
113Param[4]
114 Rate Control VBR Padding
115Param[5]
116 VBV Buffer used by encoder
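
Note that Param[1] is given in bits per second while Param[2] is divided by 400, which is easy to get wrong. The following sketch fills the six parameters under the assumption, taken from the note in the description, that Param[3] to Param[5] can be left at 0. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Fill the parameters for CX2341X_ENC_SET_BIT_RATE (hypothetical helper).
 *   constant - nonzero for constant bitrate, zero for variable bitrate
 *   avg_bps  - average bitrate in bits per second
 *   peak_bps - peak bitrate in bits per second
 */
static void enc_bitrate_params(int constant, uint32_t avg_bps,
			       uint32_t peak_bps, uint32_t param[6])
{
	param[0] = constant ? 1 : 0;
	param[1] = avg_bps;		/* plain bits per second */
	param[2] = peak_bps / 400;	/* units of 400 bits per second */
	param[3] = 0;			/* mux bitrate: 0 selects the default */
	param[4] = 0;			/* assumed unused, left at 0 */
	param[5] = 0;			/* assumed unused, left at 0 */
}
```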
117
118-------------------------------------------------------------------------------
119
120Name CX2341X_ENC_SET_GOP_PROPERTIES
121Enum 151/0x97
122Description
123 Setup the GOP structure
124Param[0]
125 GOP size (maximum is 34)
126Param[1]
127 Number of B frames between the I and P frame, plus 1.
128 For example: IBBPBBPBBPBB --> GOP size: 12, number of B frames: 2+1 = 3
129 Note that GOP size must be a multiple of (B-frames + 1).
130
131-------------------------------------------------------------------------------
132
133Name CX2341X_ENC_SET_ASPECT_RATIO
134Enum 153/0x99
135Description
136 Sets the encoding aspect ratio. Changes in the aspect ratio take effect
137 at the start of the next GOP.
138Param[0]
139 '0000' forbidden
140 '0001' 1:1 square
141 '0010' 4:3
142 '0011' 16:9
143 '0100' 2.21:1
144 '0101' reserved
145 ....
146 '1111' reserved
147
148-------------------------------------------------------------------------------
149
150Name CX2341X_ENC_SET_DNR_FILTER_MODE
151Enum 155/0x9B
152Description
153 Assign Dynamic Noise Reduction operating mode
154Param[0]
155 Bit0: Spatial filter, set=auto, clear=manual
156 Bit1: Temporal filter, set=auto, clear=manual
157Param[1]
158 Median filter:
159 0=Disabled
160 1=Horizontal
161 2=Vertical
162 3=Horiz/Vert
163 4=Diagonal
164
165-------------------------------------------------------------------------------
166
167Name CX2341X_ENC_SET_DNR_FILTER_PROPS
168Enum 157/0x9D
169Description
170 These Dynamic Noise Reduction filter values are only meaningful when
171 the respective filter is set to "manual" (See API 0x9B)
172Param[0]
173 Spatial filter: default 0, range 0:15
174Param[1]
175 Temporal filter: default 0, range 0:31
176
177-------------------------------------------------------------------------------
178
179Name CX2341X_ENC_SET_CORING_LEVELS
180Enum 159/0x9F
181Description
182 Assign Dynamic Noise Reduction median filter properties.
183Param[0]
184 Threshold above which the luminance median filter is enabled.
185 Default: 0, range 0:255
186Param[1]
187 Threshold below which the luminance median filter is enabled.
188 Default: 255, range 0:255
189Param[2]
190 Threshold above which the chrominance median filter is enabled.
191 Default: 0, range 0:255
192Param[3]
193 Threshold below which the chrominance median filter is enabled.
194 Default: 255, range 0:255
195
196-------------------------------------------------------------------------------
197
198Name CX2341X_ENC_SET_SPATIAL_FILTER_TYPE
199Enum 161/0xA1
200Description
201 Assign spatial prefilter parameters
202Param[0]
203 Luminance filter
204 0=Off
205 1=1D Horizontal
206 2=1D Vertical
207 3=2D H/V Separable (default)
208 4=2D Symmetric non-separable
209Param[1]
210 Chrominance filter
211 0=Off
212 1=1D Horizontal (default)
213
214-------------------------------------------------------------------------------
215
216Name CX2341X_ENC_SET_3_2_PULLDOWN
217Enum 177/0xB1
218Description
219 3:2 pulldown properties
220Param[0]
221 0=enabled
222 1=disabled
223
224-------------------------------------------------------------------------------
225
226Name CX2341X_ENC_SET_VBI_LINE
227Enum 183/0xB7
228Description
229 Selects VBI line number.
230Param[0]
231 Bits 0:4 line number
232 Bit 31 0=top_field, 1=bottom_field
233 Bits 0:31 all set specifies "all lines"
234Param[1]
235 VBI line information features: 0=disabled, 1=enabled
236Param[2]
237 Slicing: 0=None, 1=Closed Caption
238 Almost certainly not implemented. Set to 0.
239Param[3]
240 Luminance samples in this line.
241 Almost certainly not implemented. Set to 0.
242Param[4]
243 Chrominance samples in this line
244 Almost certainly not implemented. Set to 0.
245
246-------------------------------------------------------------------------------
247
248Name CX2341X_ENC_SET_STREAM_TYPE
249Enum 185/0xB9
250Description
251 Assign stream type
252 Note: Transport stream is not working in recent firmwares.
253 And in older firmwares the timestamps in the TS seem to be
254 unreliable.
255Param[0]
256 0=Program stream
257 1=Transport stream
258 2=MPEG1 stream
259 3=PES A/V stream
260 5=PES Video stream
261 7=PES Audio stream
262 10=DVD stream
263 11=VCD stream
264 12=SVCD stream
265 13=DVD_S1 stream
266 14=DVD_S2 stream
267
268-------------------------------------------------------------------------------
269
270Name CX2341X_ENC_SET_OUTPUT_PORT
271Enum 187/0xBB
272Description
273 Assign stream output port. Normally 0 when the data is copied through
274 the PCI bus (DMA), and 1 when the data is streamed to another chip
275 (pvrusb and cx88-blackbird).
276Param[0]
277 0=Memory (default)
278 1=Streaming
279 2=Serial
280Param[1]
281 Unknown, but leaving this to 0 seems to work best. Indications are that
282 this might have to do with USB support, although passing anything but 0
283 only breaks things.
284
285-------------------------------------------------------------------------------
286
287Name CX2341X_ENC_SET_AUDIO_PROPERTIES
288Enum 189/0xBD
289Description
290 Set audio stream properties, may be called while encoding is in progress.
291 Note: all bitfields are consistent with ISO11172 documentation except
292 bits 2:3 which ISO docs define as:
293 '11' Layer I
294 '10' Layer II
295 '01' Layer III
296 '00' Undefined
297 This discrepancy may indicate a possible error in the documentation.
298 Testing indicated that only Layer II is actually working, and that
299 the minimum bitrate should be 192 kbps.
300Param[0]
301 Bitmask:
302 0:1 '00' 44.1 kHz
303 '01' 48 kHz
304 '10' 32 kHz
305 '11' reserved
306
307 2:3 '01'=Layer I
308 '10'=Layer II
309
310 4:7 Bitrate:
311 Index | Layer I | Layer II
312 ------+-------------+------------
313 '0000' | free format | free format
314 '0001' | 32 kbit/s | 32 kbit/s
315 '0010' | 64 kbit/s | 48 kbit/s
316 '0011' | 96 kbit/s | 56 kbit/s
317 '0100' | 128 kbit/s | 64 kbit/s
318 '0101' | 160 kbit/s | 80 kbit/s
319 '0110' | 192 kbit/s | 96 kbit/s
320 '0111' | 224 kbit/s | 112 kbit/s
321 '1000' | 256 kbit/s | 128 kbit/s
322 '1001' | 288 kbit/s | 160 kbit/s
323 '1010' | 320 kbit/s | 192 kbit/s
324 '1011' | 352 kbit/s | 224 kbit/s
325 '1100' | 384 kbit/s | 256 kbit/s
326 '1101' | 416 kbit/s | 320 kbit/s
327 '1110' | 448 kbit/s | 384 kbit/s
328 Note: For Layer II, not all combinations of total bitrate
329 and mode are allowed. See ISO11172-3 3-Annex B, Table 3-B.2
330
331 8:9 '00'=Stereo
332 '01'=JointStereo
333 '10'=Dual
334 '11'=Mono
335 Note: testing seems to indicate that Mono and possibly
336 JointStereo are not working (default to stereo).
337 Dual does work, though.
338
339 10:11 Mode Extension used in joint_stereo mode.
340 In Layer I and II they indicate which subbands are in
341 intensity_stereo. All other subbands are coded in stereo.
342 '00' subbands 4-31 in intensity_stereo, bound==4
343 '01' subbands 8-31 in intensity_stereo, bound==8
344 '10' subbands 12-31 in intensity_stereo, bound==12
345 '11' subbands 16-31 in intensity_stereo, bound==16
346
347 12:13 Emphasis:
348 '00' None
349 '01' 50/15uS
350 '10' reserved
351 '11' CCITT J.17
352
353 14 CRC:
354 '0' off
355 '1' on
356
357 15 Copyright:
358 '0' off
359 '1' on
360
361 16 Generation:
362 '0' copy
363 '1' original
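The bitmask layout above can be packed with plain shifts. The sketch below is illustrative only: `audio_props` is a hypothetical name, and each argument is the raw field value from the tables above (for example, bitrate index 11 is '1011').

```c
#include <stdint.h>

/* Hypothetical helper: packs Param[0] for CX2341X_ENC_SET_AUDIO_PROPERTIES. */
static uint32_t audio_props(unsigned rate, unsigned layer, unsigned bitrate_idx,
                            unsigned mode, unsigned mode_ext, unsigned emphasis,
                            unsigned crc, unsigned copyright, unsigned original)
{
    return (uint32_t)(rate & 0x3)                 /* bits 0:1   sampling rate  */
         | ((uint32_t)(layer & 0x3) << 2)         /* bits 2:3   layer          */
         | ((uint32_t)(bitrate_idx & 0xf) << 4)   /* bits 4:7   bitrate index  */
         | ((uint32_t)(mode & 0x3) << 8)          /* bits 8:9   channel mode   */
         | ((uint32_t)(mode_ext & 0x3) << 10)     /* bits 10:11 mode extension */
         | ((uint32_t)(emphasis & 0x3) << 12)     /* bits 12:13 emphasis       */
         | ((uint32_t)(crc & 0x1) << 14)          /* bit 14     CRC            */
         | ((uint32_t)(copyright & 0x1) << 15)    /* bit 15     copyright      */
         | ((uint32_t)(original & 0x1) << 16);    /* bit 16     generation     */
}
```

For instance, 48 kHz, Layer II, 224 kbit/s, stereo, with everything else off is `audio_props(1, 2, 11, 0, 0, 0, 0, 0, 0)`, which gives 0xB9.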
364
365-------------------------------------------------------------------------------
366
367Name CX2341X_ENC_HALT_FW
368Enum 195/0xC3
369Description
370 The firmware is halted and no further API calls are serviced until the
371 firmware is uploaded again.
372
373-------------------------------------------------------------------------------
374
375Name CX2341X_ENC_GET_VERSION
376Enum 196/0xC4
377Description
378 Returns the version of the encoder firmware.
379Result[0]
380 Version bitmask:
381 Bits 0:15 build
382 Bits 16:23 minor
383 Bits 24:31 major
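Splitting Result[0] into its three fields follows directly from the bit ranges above. A small sketch, with a hypothetical struct and function name:

```c
#include <stdint.h>

/* Hypothetical container for the decoded firmware version. */
struct fw_version {
    unsigned major;   /* bits 24:31 */
    unsigned minor;   /* bits 16:23 */
    unsigned build;   /* bits 0:15  */
};

/* Decode Result[0] of CX2341X_ENC_GET_VERSION. */
static struct fw_version decode_fw_version(uint32_t result0)
{
    struct fw_version v;

    v.major = (result0 >> 24) & 0xff;
    v.minor = (result0 >> 16) & 0xff;
    v.build = result0 & 0xffff;
    return v;
}
```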
384
385-------------------------------------------------------------------------------
386
387Name CX2341X_ENC_SET_GOP_CLOSURE
388Enum 197/0xC5
389Description
390 Assigns the GOP open/close property.
391Param[0]
392 0=Open
393 1=Closed
394
395-------------------------------------------------------------------------------
396
397Name CX2341X_ENC_GET_SEQ_END
398Enum 198/0xC6
399Description
400 Obtains the sequence end code of the encoder's buffer. When a capture
401 is started a number of interrupts are still generated, the last of
402 which will have Result[0] set to 1 and Result[1] will contain the size
403 of the buffer.
404Result[0]
405 State of the transfer (1 if last buffer)
406Result[1]
407 If Result[0] is 1, this contains the size of the last buffer, undefined
408 otherwise.
409
410-------------------------------------------------------------------------------
411
412Name CX2341X_ENC_SET_PGM_INDEX_INFO
413Enum 199/0xC7
414Description
415 Sets the Program Index Information.
416Param[0]
417 Picture Mask:
418 0=No index capture
419 1=I frames
420 3=I,P frames
421 7=I,P,B frames
422Param[1]
423 Elements requested (up to 400)
424Result[0]
425 Offset in SDF memory of the table.
426Result[1]
427 Number of allocated elements up to a maximum of Param[1]
428
429-------------------------------------------------------------------------------
430
431Name CX2341X_ENC_SET_VBI_CONFIG
432Enum 200/0xC8
433Description
434 Configure VBI settings
435Param[0]
436 Bitmap:
437 0 Mode '0' Sliced, '1' Raw
438 1:3 Insertion:
439 '000' insert in extension & user data
440 '001' insert in private packets
441 '010' separate stream and user data
442 '111' separate stream and private data
443 8:15 Stream ID (normally 0xBD)
444Param[1]
445 Frames per interrupt (max 8). Only valid in raw mode.
446Param[2]
447 Total raw VBI frames. Only valid in raw mode.
448Param[3]
449 Start codes
450Param[4]
451 Stop codes
452Param[5]
453 Lines per frame
454Param[6]
455 Bytes per line
456Result[0]
457 Observed frames per interrupt in raw mode only. Range 1 to Param[1]
458Result[1]
459 Observed number of frames in raw mode. Range 1 to Param[2]
460Result[2]
461 Memory offset to start of raw VBI data
462
463-------------------------------------------------------------------------------
464
465Name CX2341X_ENC_SET_DMA_BLOCK_SIZE
466Enum 201/0xC9
467Description
468 Set DMA transfer block size
469Param[0]
470 DMA transfer block size in bytes or frames. When unit is bytes,
471 supported block sizes are 2^7, 2^8 and 2^9 bytes.
472Param[1]
473 Unit: 0=bytes, 1=frames
474
475-------------------------------------------------------------------------------
476
477Name CX2341X_ENC_GET_PREV_DMA_INFO_MB_10
478Enum 202/0xCA
479Description
480 Returns information on the previous DMA transfer in conjunction with
481 bit 27 of the interrupt mask. Uses mailbox 10.
482Result[0]
483 Type of stream
484Result[1]
485 Address Offset
486Result[2]
487 Maximum size of transfer
488
489-------------------------------------------------------------------------------
490
491Name CX2341X_ENC_GET_PREV_DMA_INFO_MB_9
492Enum 203/0xCB
493Description
494 Returns information on the previous DMA transfer in conjunction with
495 bit 27 of the interrupt mask. Uses mailbox 9.
496Result[0]
497 Status bits:
498 Bit 0 set indicates transfer complete
499 Bit 2 set indicates transfer error
500 Bit 4 set indicates linked list error
501Result[1]
502 DMA type
503Result[2]
504 Presentation Time Stamp bits 0..31
505Result[3]
506 Presentation Time Stamp bit 32
507
508-------------------------------------------------------------------------------
509
510Name CX2341X_ENC_SCHED_DMA_TO_HOST
511Enum 204/0xCC
512Description
513 Setup DMA to host operation
514Param[0]
515 Memory address of link list
516Param[1]
517 Length of link list (units unknown)
518Param[2]
519 DMA type (0=MPEG)
520
521-------------------------------------------------------------------------------
522
523Name CX2341X_ENC_INITIALIZE_INPUT
524Enum 205/0xCD
525Description
526 Initializes the video input
527
528-------------------------------------------------------------------------------
529
530Name CX2341X_ENC_SET_FRAME_DROP_RATE
531Enum 208/0xD0
532Description
533 For each frame captured, skip specified number of frames.
534Param[0]
535 Number of frames to skip
536
537-------------------------------------------------------------------------------
538
539Name CX2341X_ENC_PAUSE_ENCODER
540Enum 210/0xD2
541Description
542 During a pause condition, all frames are dropped instead of being encoded.
543Param[0]
544 0=Pause encoding
545 1=Continue encoding
546
547-------------------------------------------------------------------------------
548
549Name CX2341X_ENC_REFRESH_INPUT
550Enum 211/0xD3
551Description
552 Refreshes the video input
553
554-------------------------------------------------------------------------------
555
556Name CX2341X_ENC_SET_COPYRIGHT
557Enum 212/0xD4
558Description
559 Sets stream copyright property
560Param[0]
561 0=Stream is not copyrighted
562 1=Stream is copyrighted
563
564-------------------------------------------------------------------------------
565
566Name CX2341X_ENC_SET_EVENT_NOTIFICATION
567Enum 213/0xD5
568Description
569 Setup firmware to notify the host about a particular event. Host must
570 unmask the interrupt bit.
571Param[0]
572 Event (0=refresh encoder input)
573Param[1]
574 Notification 0=disabled 1=enabled
575Param[2]
576 Interrupt bit
577Param[3]
578 Mailbox slot, -1 if no mailbox required.
579
580-------------------------------------------------------------------------------
581
582Name CX2341X_ENC_SET_NUM_VSYNC_LINES
583Enum 214/0xD6
584Description
585 Depending on the analog video decoder used, this assigns the number
586 of lines for field 1 and 2.
587Param[0]
588 Field 1 number of lines:
589 0x00EF for SAA7114
590 0x00F0 for SAA7115
591 0x0105 for Micronas
592Param[1]
593 Field 2 number of lines:
594 0x00EF for SAA7114
595 0x00F0 for SAA7115
596 0x0106 for Micronas
597
598-------------------------------------------------------------------------------
599
600Name CX2341X_ENC_SET_PLACEHOLDER
601Enum 215/0xD7
602Description
603 Provides a mechanism of inserting custom user data in the MPEG stream.
604Param[0]
605 0=extension & user data
606 1=private packet with stream ID 0xBD
607Param[1]
608 Rate at which to insert data, in units of frames (for private packet)
609 or GOPs (for ext. & user data)
610Param[2]
611 Number of data DWORDs (below) to insert
612Param[3]
613 Custom data 0
614Param[4]
615 Custom data 1
616Param[5]
617 Custom data 2
618Param[6]
619 Custom data 3
620Param[7]
621 Custom data 4
622Param[8]
623 Custom data 5
624Param[9]
625 Custom data 6
626Param[10]
627 Custom data 7
628Param[11]
629 Custom data 8
630
631-------------------------------------------------------------------------------
632
633Name CX2341X_ENC_MUTE_VIDEO
634Enum 217/0xD9
635Description
636 Video muting
637Param[0]
638 Bit usage:
639 0 '0'=video not muted
640 '1'=video muted, creates frames with the YUV color defined below
641 1:7 Unused
642 8:15 V chrominance information
643 16:23 U chrominance information
644 24:31 Y luminance information
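The mute flag and the YUV fill color share one 32-bit parameter. A sketch of packing it, with the hypothetical helper name `mute_video_param`:

```c
#include <stdint.h>

/* Hypothetical helper: packs Param[0] for CX2341X_ENC_MUTE_VIDEO. */
static uint32_t mute_video_param(int mute, uint8_t y, uint8_t u, uint8_t v)
{
    return (mute ? 1u : 0u)          /* bit 0:      mute on/off  */
         | ((uint32_t)v << 8)        /* bits 8:15:  V chrominance */
         | ((uint32_t)u << 16)       /* bits 16:23: U chrominance */
         | ((uint32_t)y << 24);      /* bits 24:31: Y luminance   */
}
```

Muting to the usual video black (Y=16, U=V=128) would pass `mute_video_param(1, 16, 128, 128)`, i.e. 0x10808001.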
645
646-------------------------------------------------------------------------------
647
648Name CX2341X_ENC_MUTE_AUDIO
649Enum 218/0xDA
650Description
651 Audio muting
652Param[0]
653 0=audio not muted
654 1=audio muted (produces silent mpeg audio stream)
655
656-------------------------------------------------------------------------------
657
658Name CX2341X_ENC_UNKNOWN
659Enum 219/0xDB
660Description
661 Unknown API, it's used by Hauppauge though.
662Param[0]
663 0 This is the value Hauppauge uses, Unknown what it means.
664
665-------------------------------------------------------------------------------
666
667Name CX2341X_ENC_MISC
668Enum 220/0xDC
669Description
670 Miscellaneous actions. It is not known for certain what this does; it is
671 really a sort of ioctl call. The first parameter is a command number, the
672 second the value.
673Param[0]
674 Command number:
675 1=set initial SCR value when starting encoding.
676 2=set quality mode (apparently some test setting).
677 3=setup advanced VIM protection handling (supposedly only for the cx23416
678 for raw YUV).
679 Actually it looks like this should be 0 for saa7114/5 based card and 1
680 for cx25840 based cards.
681 4=generate artificial PTS timestamps
682 5=USB flush mode
683 6=something to do with the quantization matrix
684 7=set navigation pack insertion for DVD
685 8=enable scene change detection (seems to be a failure)
686 9=set history parameters of the video input module
687 10=set input field order of VIM
688 11=set quantization matrix
689 12=reset audio interface
690 13=set audio volume delay
691 14=set audio delay
692
693Param[1]
694 Command value.
diff --git a/Documentation/video4linux/cx2341x/fw-memory.txt b/Documentation/video4linux/cx2341x/fw-memory.txt
new file mode 100644
index 000000000000..ef0aad3f88fc
--- /dev/null
+++ b/Documentation/video4linux/cx2341x/fw-memory.txt
@@ -0,0 +1,141 @@
1This document describes the cx2341x memory map and documents some of the register
2space.
3
4Warning! This information was figured out by searching through the memory and
5registers. It may not be correct, is certainly not complete, and was not
6derived from anything more than searching through the memory space with
7commands like:
8
9 ivtvctl -O min=0x02000000,max=0x020000ff
10
11So take this as is, I'm always searching for more stuff, it's a large
12register space :-).
13
14Memory Map
15==========
16
17The cx2341x exposes its entire 64M memory space to the PCI host via the PCI BAR0
18(Base Address Register 0). The addresses here are offsets relative to the
19address held in BAR0.
20
210x00000000-0x00ffffff Encoder memory space
220x00000000-0x0003ffff Encode.rom
23 ???-??? MPEG buffer(s)
24 ???-??? Raw video capture buffer(s)
25 ???-??? Raw audio capture buffer(s)
26 ???-??? Display buffers (6 or 9)
27
280x01000000-0x01ffffff Decoder memory space
290x01000000-0x0103ffff Decode.rom
30 ???-??? MPEG buffer(s)
310x0114b000-0x0115afff Audio.rom (deprecated?)
32
330x02000000-0x0200ffff Register Space
34
35Registers
36=========
37
38The registers occupy the 64k space starting at the 0x02000000 offset from BAR0.
39All of these registers are 32 bits wide.
40
41DMA Registers 0x000-0xff:
42
43 0x00 - Control:
44 0=reset/cancel, 1=read, 2=write, 4=stop
45 0x04 - DMA status:
46 1=read busy, 2=write busy, 4=read error, 8=write error, 16=link list error
47 0x08 - pci DMA pointer for read link list
48 0x0c - pci DMA pointer for write link list
49 0x10 - read/write DMA enable:
50 1=read enable, 2=write enable
51 0x14 - always 0xffffffff, if set any lower instability occurs, 0x00 crashes
52 0x18 - ??
53 0x1c - always 0x20 or 32, smaller values slow down DMA transactions
54 0x20 - always value of 0x780a010a
55 0x24-0x3c - usually just random values???
56 0x40 - Interrupt status
57 0x44 - A bit written here shows up in Interrupt status 0x40
58 0x48 - Interrupt Mask
59 0x4C - always value of 0xfffdffff,
60 if changed to 0xffffffff DMA write interrupts break.
61 0x50 - always 0xffffffff
62 0x54 - always 0xffffffff (0x4c, 0x50, 0x54 seem like interrupt masks; there
63 are 3 processors on the chip (Java ones): VPU, SPU and APU, so maybe these
64 are their interrupt masks???).
65 0x60-0x7C - random values
66 0x80 - first write linked list reg, for Encoder Memory addr
67 0x84 - first write linked list reg, for pci memory addr
68 0x88 - first write linked list reg, for length of buffer in memory addr
69 (OR this with 0x80000000 for the last link)
70 0x8c-0xcc - rest of write linked list reg, 8 sets of 3 total, DMA goes here
71 from linked list addr in reg 0x0c, firmware must push through or
72 something.
73 0xe0 - first (and only) read linked list reg, for pci memory addr
74 0xe4 - first (and only) read linked list reg, for Decoder memory addr
75 0xe8 - first (and only) read linked list reg, for length of buffer
76 0xec-0xff - Nothing seems to be in these registers, 0xec-f4 are 0x00000000.
77
78Memory locations for Encoder Buffers 0x700-0x7ff:
79
80These registers show offsets of memory locations pertaining to each
81buffer area used for encoding, have to shift them by <<1 first.
82
830x07F8: Encoder SDRAM refresh
840x07FC: Encoder SDRAM pre-charge
85
86Memory locations for Decoder Buffers 0x800-0x8ff:
87
88These registers show offsets of memory locations pertaining to each
89buffer area used for decoding, have to shift them by <<1 first.
90
910x08F8: Decoder SDRAM refresh
920x08FC: Decoder SDRAM pre-charge
93
94Other memory locations:
95
960x2800: Video Display Module control
970x2D00: AO (audio output?) control
980x2D24: Bytes Flushed
990x7000: LSB I2C write clock bit (inverted)
1000x7004: LSB I2C write data bit (inverted)
1010x7008: LSB I2C read clock bit
1020x700c: LSB I2C read data bit
1030x9008: GPIO get input state
1040x900c: GPIO set output state
1050x9020: GPIO direction (Bit7 (GPIO 0..7) - 0:input, 1:output)
1060x9050: SPU control
1070x9054: Reset HW blocks
1080x9058: VPU control
1090xA018: Bit6: interrupt pending?
1100xA064: APU command
111
112
113Interrupt Status Register
114=========================
115
116The definition of the bits in the interrupt status register 0x0040, and the
117interrupt mask 0x0048. If a bit is cleared in the mask, then we want our ISR to
118execute.
119
120Bit
12131 Encoder Start Capture
12230 Encoder EOS
12329 Encoder VBI capture
12428 Encoder Video Input Module reset event
12527 Encoder DMA complete
12626
12725 Decoder copy protect detection event
12824 Decoder audio mode change detection event
12923
13022 Decoder data request
13121 Decoder I-Frame? done
13220 Decoder DMA complete
13319 Decoder VBI re-insertion
13418 Decoder DMA err (linked-list bad)
135
136Missing
137Encoder API call completed
138Decoder API call completed
139Encoder API post(?)
140Decoder API post(?)
141Decoder VTRACE event
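The interrupt status bits listed above, together with the rule that a cleared mask bit means the interrupt is wanted, can be sketched as follows. All macro and function names here are hypothetical; only the bit positions come from the table.

```c
#include <stdint.h>

/* Hypothetical names for the documented bits of registers 0x0040/0x0048. */
#define IRQ_ENC_START_CAP      (1u << 31)
#define IRQ_ENC_EOS            (1u << 30)
#define IRQ_ENC_VBI_CAP        (1u << 29)
#define IRQ_ENC_VIM_RST        (1u << 28)
#define IRQ_ENC_DMA_COMPLETE   (1u << 27)
#define IRQ_DEC_COPY_PROTECT   (1u << 25)
#define IRQ_DEC_AUD_MODE_CHG   (1u << 24)
#define IRQ_DEC_DATA_REQ       (1u << 22)
#define IRQ_DEC_IFRAME_DONE    (1u << 21)
#define IRQ_DEC_DMA_COMPLETE   (1u << 20)
#define IRQ_DEC_VBI_RE_INSERT  (1u << 19)
#define IRQ_DEC_DMA_ERR        (1u << 18)

/* A bit cleared in the mask means the ISR should handle that source,
 * so the bits to service are the status bits that are not masked. */
static uint32_t irq_to_service(uint32_t status, uint32_t mask)
{
    return status & ~mask;
}
```

An ISR would typically read both registers, call `irq_to_service()`, and then test the result against the individual bit macros.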
diff --git a/Documentation/video4linux/cx2341x/fw-osd-api.txt b/Documentation/video4linux/cx2341x/fw-osd-api.txt
new file mode 100644
index 000000000000..da98ae30a37a
--- /dev/null
+++ b/Documentation/video4linux/cx2341x/fw-osd-api.txt
@@ -0,0 +1,342 @@
1OSD firmware API description
2============================
3
4Note: this API is part of the decoder firmware, so it's cx23415 only.
5
6-------------------------------------------------------------------------------
7
8Name CX2341X_OSD_GET_FRAMEBUFFER
9Enum 65/0x41
10Description
11 Return base and length of contiguous OSD memory.
12Result[0]
13 OSD base address
14Result[1]
15 OSD length
16
17-------------------------------------------------------------------------------
18
19Name CX2341X_OSD_GET_PIXEL_FORMAT
20Enum 66/0x42
21Description
22 Query OSD format
23Result[0]
24 0=8bit index, 4=AlphaRGB 8:8:8:8
25
26-------------------------------------------------------------------------------
27
28Name CX2341X_OSD_SET_PIXEL_FORMAT
29Enum 67/0x43
30Description
31 Assign pixel format
32Param[0]
33 0=8bit index, 4=AlphaRGB 8:8:8:8
34
35-------------------------------------------------------------------------------
36
37Name CX2341X_OSD_GET_STATE
38Enum 68/0x44
39Description
40 Query OSD state
41Result[0]
42 Bit 0 0=off, 1=on
43 Bits 1:2 alpha control
44 Bits 3:5 pixel format
45
46-------------------------------------------------------------------------------
47
48Name CX2341X_OSD_SET_STATE
49Enum 69/0x45
50Description
51 OSD switch
52Param[0]
53 0=off, 1=on
54
55-------------------------------------------------------------------------------
56
57Name CX2341X_OSD_GET_OSD_COORDS
58Enum 70/0x46
59Description
60 Retrieve coordinates of OSD area blended with video
61Result[0]
62 OSD buffer address
63Result[1]
64 Stride in pixels
65Result[2]
66 Lines in OSD buffer
67Result[3]
68 Horizontal offset in buffer
69Result[4]
70 Vertical offset in buffer
71
72-------------------------------------------------------------------------------
73
74Name CX2341X_OSD_SET_OSD_COORDS
75Enum 71/0x47
76Description
77 Assign the coordinates of the OSD area to blend with video
78Param[0]
79 buffer address
80Param[1]
81 buffer stride in pixels
82Param[2]
83 lines in buffer
84Param[3]
85 horizontal offset
86Param[4]
87 vertical offset
88
89-------------------------------------------------------------------------------
90
91Name CX2341X_OSD_GET_SCREEN_COORDS
92Enum 72/0x48
93Description
94 Retrieve OSD screen area coordinates
95Result[0]
96 top left horizontal offset
97Result[1]
98 top left vertical offset
99Result[2]
100 bottom right horizontal offset
101Result[3]
102 bottom right vertical offset
103
104-------------------------------------------------------------------------------
105
106Name CX2341X_OSD_SET_SCREEN_COORDS
107Enum 73/0x49
108Description
109 Assign the coordinates of the screen area to blend with video
110Param[0]
111 top left horizontal offset
112Param[1]
113 top left vertical offset
114Param[2]
115 bottom right horizontal offset
116Param[3]
117 bottom right vertical offset
118
119-------------------------------------------------------------------------------
120
121Name CX2341X_OSD_GET_GLOBAL_ALPHA
122Enum 74/0x4A
123Description
124 Retrieve OSD global alpha
125Result[0]
126 global alpha: 0=off, 1=on
127Result[1]
128 bits 0:7 global alpha
129
130-------------------------------------------------------------------------------
131
132Name CX2341X_OSD_SET_GLOBAL_ALPHA
133Enum 75/0x4B
134Description
135 Update global alpha
136Param[0]
137 global alpha: 0=off, 1=on
138Param[1]
139 global alpha (8 bits)
140Param[2]
141 local alpha: 0=on, 1=off
142
143-------------------------------------------------------------------------------
144
145Name CX2341X_OSD_SET_BLEND_COORDS
146Enum 76/0x4C
147Description
148 Move start of blending area within display buffer
149Param[0]
150 horizontal offset in buffer
151Param[1]
152 vertical offset in buffer
153
154-------------------------------------------------------------------------------
155
156Name CX2341X_OSD_GET_FLICKER_STATE
157Enum 79/0x4F
158Description
159 Retrieve flicker reduction module state
160Result[0]
161 flicker state: 0=off, 1=on
162
163-------------------------------------------------------------------------------
164
165Name CX2341X_OSD_SET_FLICKER_STATE
166Enum 80/0x50
167Description
168 Set flicker reduction module state
169Param[0]
170 State: 0=off, 1=on
171
172-------------------------------------------------------------------------------
173
174Name CX2341X_OSD_BLT_COPY
175Enum 82/0x52
176Description
177 BLT copy
178Param[0]
179'0000' zero
180'0001' ~destination AND ~source
181'0010' ~destination AND source
182'0011' ~destination
183'0100' destination AND ~source
184'0101' ~source
185'0110' destination XOR source
186'0111' ~destination OR ~source
187'1000' destination AND source
188'1001' destination XNOR source
189'1010' source
190'1011' ~destination OR source
191'1100' destination
192'1101' destination OR ~source
193'1110' destination OR source
194'1111' one
195
196Param[1]
197 Resulting alpha blending
198 '01' source_alpha
199 '10' destination_alpha
200 '11' source_alpha*destination_alpha+1
201 (zero if both source and destination alpha are zero)
202Param[2]
203 '00' output_pixel = source_pixel
204
205 '01' if source_alpha=0:
206 output_pixel = destination_pixel
207 if 256 > source_alpha > 1:
208 output_pixel = ((source_alpha + 1)*source_pixel +
209 (255 - source_alpha)*destination_pixel)/256
210
211 '10' if destination_alpha=0:
212 output_pixel = source_pixel
213 if 255 > destination_alpha > 0:
214 output_pixel = ((255 - destination_alpha)*source_pixel +
215 (destination_alpha + 1)*destination_pixel)/256
216
217 '11' if source_alpha=0:
218 source_temp = 0
219 if source_alpha=255:
220 source_temp = source_pixel*256
221 if 255 > source_alpha > 0:
222 source_temp = source_pixel*(source_alpha + 1)
223 if destination_alpha=0:
224 destination_temp = 0
225 if destination_alpha=255:
226 destination_temp = destination_pixel*256
227 if 255 > destination_alpha > 0:
228 destination_temp = destination_pixel*(destination_alpha + 1)
229 output_pixel = (source_temp + destination_temp)/256
230Param[3]
231 width
232Param[4]
233 height
234Param[5]
235 destination pixel mask
236Param[6]
237 destination rectangle start address
238Param[7]
239 destination stride in dwords
240Param[8]
241 source stride in dwords
242Param[9]
243 source rectangle start address
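The 16 raster operations in Param[0] of the BLT copy call behave like a 4-entry truth table indexed by the (destination, source) bit pair: bit 0 of the code covers the case where both are 0, bit 1 covers destination=0/source=1, bit 2 covers destination=1/source=0, and bit 3 covers both being 1. The sketch below is a software model of that reading, not firmware code; `blt_rop` is a hypothetical name, and it assumes that code '1000' means "destination AND source", which is what the pattern of the other fifteen entries implies.

```c
#include <stdint.h>

/* Software model of the 4-bit raster operation code, applied bitwise
 * to a 32-bit destination and source value. */
static uint32_t blt_rop(unsigned rop, uint32_t dst, uint32_t src)
{
    uint32_t out = 0;

    if (rop & 0x1)
        out |= ~dst & ~src;   /* bits where destination=0, source=0 */
    if (rop & 0x2)
        out |= ~dst & src;    /* bits where destination=0, source=1 */
    if (rop & 0x4)
        out |= dst & ~src;    /* bits where destination=1, source=0 */
    if (rop & 0x8)
        out |= dst & src;     /* bits where destination=1, source=1 */
    return out;
}
```

Under this model '1010' returns the source unchanged, '0110' gives XOR, and '1111' sets every bit.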
244
245-------------------------------------------------------------------------------
246
247Name CX2341X_OSD_BLT_FILL
248Enum 83/0x53
249Description
250 BLT fill color
251Param[0]
252 Same as Param[0] on API 0x52
253Param[1]
254 Same as Param[1] on API 0x52
255Param[2]
256 Same as Param[2] on API 0x52
257Param[3]
258 width
259Param[4]
260 height
261Param[5]
262 destination pixel mask
263Param[6]
264 destination rectangle start address
265Param[7]
266 destination stride in dwords
267Param[8]
268 color fill value
269
270-------------------------------------------------------------------------------
271
272Name CX2341X_OSD_BLT_TEXT
273Enum 84/0x54
274Description
275 BLT for 8 bit alpha text source
276Param[0]
277 Same as Param[0] on API 0x52
278Param[1]
279 Same as Param[1] on API 0x52
280Param[2]
281 Same as Param[2] on API 0x52
282Param[3]
283 width
284Param[4]
285 height
286Param[5]
287 destination pixel mask
288Param[6]
289 destination rectangle start address
290Param[7]
291 destination stride in dwords
292Param[8]
293 source stride in dwords
294Param[9]
295 source rectangle start address
296Param[10]
297 color fill value
298
299-------------------------------------------------------------------------------
300
301Name CX2341X_OSD_SET_FRAMEBUFFER_WINDOW
302Enum 86/0x56
303Description
304 Positions the main output window on the screen. The coordinates must be
305 such that the entire window fits on the screen.
306Param[0]
307 window width
308Param[1]
309 window height
310Param[2]
311 top left window corner horizontal offset
312Param[3]
313 top left window corner vertical offset
314
315-------------------------------------------------------------------------------
316
317Name CX2341X_OSD_SET_CHROMA_KEY
318Enum 96/0x60
319Description
320 Chroma key switch and color
321Param[0]
322 state: 0=off, 1=on
323Param[1]
324 color
325
326-------------------------------------------------------------------------------
327
328Name CX2341X_OSD_GET_ALPHA_CONTENT_INDEX
329Enum 97/0x61
330Description
331 Retrieve alpha content index
332Result[0]
333 alpha content index, Range 0:15
334
335-------------------------------------------------------------------------------
336
337Name CX2341X_OSD_SET_ALPHA_CONTENT_INDEX
338Enum 98/0x62
339Description
340 Assign alpha content index
341Param[0]
342 alpha content index, range 0:15
diff --git a/Documentation/video4linux/cx2341x/fw-upload.txt b/Documentation/video4linux/cx2341x/fw-upload.txt
new file mode 100644
index 000000000000..60c502ce3215
--- /dev/null
+++ b/Documentation/video4linux/cx2341x/fw-upload.txt
@@ -0,0 +1,49 @@
1This document describes how to upload the cx2341x firmware to the card.
2
3How to find
4===========
5
6See the web pages of the various projects that use this chip for information
7on how to obtain the firmware.
8
9The firmware stored in a Windows driver can be detected as follows:
10
11- Each firmware image is 256k bytes.
12- The 1st 32-bit word of the Encoder image is 0x00000da7
13- The 1st 32-bit word of the Decoder image is 0x000003a7
14- The 2nd 32-bit word of both images is 0xaa55bb66
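The detection rules above amount to a scan for a two-word signature with at least 256k bytes remaining. The following is a sketch under stated assumptions: the words are stored little-endian (reasonable for a Windows x86 driver, but an assumption), no alignment of the image inside the driver file is assumed, and all names (`find_fw`, the `FW_*` macros) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FW_IMAGE_SIZE  (256 * 1024)   /* each firmware image is 256k bytes */
#define FW_ENC_WORD0   0x00000da7u    /* 1st word of the encoder image */
#define FW_DEC_WORD0   0x000003a7u    /* 1st word of the decoder image */
#define FW_WORD1       0xaa55bb66u    /* 2nd word of both images */

/* Read a 32-bit little-endian word (assumed byte order). */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the byte offset of the first candidate image whose first word
 * is word0, or -1 if none is found. Only offsets that leave room for a
 * full 256k image are considered. */
static long find_fw(const uint8_t *buf, size_t len, uint32_t word0)
{
    size_t off;

    for (off = 0; off + FW_IMAGE_SIZE <= len; off++) {
        if (le32(buf + off) == word0 && le32(buf + off + 4) == FW_WORD1)
            return (long)off;
    }
    return -1;
}
```

Calling `find_fw(data, size, FW_ENC_WORD0)` and `find_fw(data, size, FW_DEC_WORD0)` on the driver file contents would locate the two images, if present.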
15
16How to load
17===========
18
19- Issue the FWapi command to stop the encoder if it is running. Wait for the
20 command to complete.
21- Issue the FWapi command to stop the decoder if it is running. Wait for the
22 command to complete.
23- Issue the I2C command to the digitizer to stop emitting VSYNC events.
24- Issue the FWapi command to halt the encoder's firmware.
25- Sleep for 10ms.
26- Issue the FWapi command to halt the decoder's firmware.
27- Sleep for 10ms.
28- Write 0x00000000 to register 0x2800 to stop the Video Display Module.
29- Write 0x00000005 to register 0x2D00 to stop the AO (audio output?).
30- Write 0x00000000 to register 0xA064 to ping? the APU.
31- Write 0xFFFFFFFE to register 0x9058 to stop the VPU.
32- Write 0xFFFFFFFF to register 0x9054 to reset the HW blocks.
33- Write 0x00000001 to register 0x9050 to stop the SPU.
34- Sleep for 10ms.
35- Write 0x0000001A to register 0x07FC to init the Encoder SDRAM's pre-charge.
36- Write 0x80000640 to register 0x07F8 to init the Encoder SDRAM's refresh to 1us.
37- Write 0x0000001A to register 0x08FC to init the Decoder SDRAM's pre-charge.
38- Write 0x80000640 to register 0x08F8 to init the Decoder SDRAM's refresh to 1us.
39- Sleep for 512ms. (600ms is recommended)
40- Transfer the encoder's firmware image to offset 0 in Encoder memory space.
41- Transfer the decoder's firmware image to offset 0 in Decoder memory space.
42- Use a read-modify-write operation to clear bit 0 of register 0x9050 to
43 re-enable the SPU.
44- Sleep for 1 second.
45- Use a read-modify-write operation to clear bits 3 and 0 of register 0x9058
46 to re-enable the VPU.
47- Sleep for 1 second.
48- Issue status API commands to both firmware images to verify.
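The register writes in the load sequence above lend themselves to a table-driven form. The sketch below only captures the register/value pairs and the read-modify-write step; the struct, table and function names are hypothetical, and the sleeps, firmware API calls and the actual register access are left to the caller.

```c
#include <stdint.h>

/* Hypothetical register/value pair. */
struct reg_write {
    uint32_t reg;
    uint32_t val;
};

/* Writes that stop and reset the hardware blocks, in the listed order. */
static const struct reg_write fw_halt_seq[] = {
    { 0x2800, 0x00000000 },   /* stop the Video Display Module */
    { 0x2D00, 0x00000005 },   /* stop the AO */
    { 0xA064, 0x00000000 },   /* ping? the APU */
    { 0x9058, 0xFFFFFFFE },   /* stop the VPU */
    { 0x9054, 0xFFFFFFFF },   /* reset the HW blocks */
    { 0x9050, 0x00000001 },   /* stop the SPU */
};

/* Writes that initialize SDRAM pre-charge and refresh (after a 10ms sleep). */
static const struct reg_write fw_sdram_seq[] = {
    { 0x07FC, 0x0000001A },   /* encoder SDRAM pre-charge */
    { 0x07F8, 0x80000640 },   /* encoder SDRAM refresh */
    { 0x08FC, 0x0000001A },   /* decoder SDRAM pre-charge */
    { 0x08F8, 0x80000640 },   /* decoder SDRAM refresh */
};

/* Value to write back in a read-modify-write that clears some bits. */
static uint32_t clear_bits(uint32_t old, uint32_t bits)
{
    return old & ~bits;
}
```

Re-enabling the SPU would then write back `clear_bits(old, 0x1)` to 0x9050, and re-enabling the VPU would write back `clear_bits(old, 0x9)` (bits 3 and 0) to 0x9058.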
49
diff --git a/Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt b/Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt
new file mode 100644
index 000000000000..93fec32a1188
--- /dev/null
+++ b/Documentation/video4linux/cx88/hauppauge-wintv-cx88-ir.txt
@@ -0,0 +1,54 @@
1The controls for the mux are GPIO [0,1] for source, and GPIO 2 for muting.
2
3GPIO0 GPIO1
4 0 0 TV Audio
5 1 0 FM radio
6 0 1 Line-In
7 1 1 Mono tuner bypass or CD passthru (tuner specific)
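
The table above can be expressed as a small C helper that returns the GPIO bits for a given source. This is a sketch, not driver code: the enum and macro names are hypothetical, and the mute polarity (GPIO 2 set means muted) is an assumption, since the text only says that GPIO 2 is used for muting.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the four mux inputs listed in the table. */
enum audio_source {
	SRC_TV_AUDIO,
	SRC_FM_RADIO,
	SRC_LINE_IN,
	SRC_TUNER_BYPASS,	/* mono tuner bypass or CD passthru */
};

#define GPIO0_BIT	(1u << 0)
#define GPIO1_BIT	(1u << 1)
#define GPIO2_MUTE	(1u << 2)	/* assumed polarity: 1 = muted */

/* Return the GPIO[2:0] pattern for a source and mute setting. */
static uint32_t mux_gpio_bits(enum audio_source src, int mute)
{
	uint32_t bits = 0;

	switch (src) {
	case SRC_TV_AUDIO:
		break;				/* GPIO0 = 0, GPIO1 = 0 */
	case SRC_FM_RADIO:
		bits = GPIO0_BIT;		/* GPIO0 = 1, GPIO1 = 0 */
		break;
	case SRC_LINE_IN:
		bits = GPIO1_BIT;		/* GPIO0 = 0, GPIO1 = 1 */
		break;
	case SRC_TUNER_BYPASS:
		bits = GPIO0_BIT | GPIO1_BIT;	/* GPIO0 = 1, GPIO1 = 1 */
		break;
	}
	if (mute)
		bits |= GPIO2_MUTE;
	return bits;
}
```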
8
9GPIO 16 (I believe) is tied to the IR port (if present).
10
11------------------------------------------------------------------------------------
12
13From the data sheet:
14 Register 24'h20004 PCI Interrupt Status
15 bit [18] IR_SMP_INT Set when 32 input samples have been collected over
16 gpio[16] pin into GP_SAMPLE register.
17
18What's missing from the data sheet:
19
20Set up a 4 kHz sampling rate (roughly 2x oversampled; good enough for our
21RC5-compatible remote)
22set register 0x35C050 to 0xa80a80
23
24enable sampling
25set register 0x35C054 to 0x5
26
27Of course, enable IRQ bit 18 in the interrupt mask register (and
28provide a handler for it).
29
30GP_SAMPLE register is at 0x35C058
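
The setup steps above amount to three register writes, sketched here in C. Everything except the register addresses and values is an assumption: the macro names, reg_write(), reg_read() and the write log are hypothetical, and the address of the interrupt mask register is passed in as a parameter because the text does not give it. SIM_INT_MASK_REG is a placeholder for the simulation, not a real address.

```c
#include <assert.h>
#include <stdint.h>

#define IR_RATE_REG	0x35C050u	/* sampling rate (name is hypothetical) */
#define IR_ENABLE_REG	0x35C054u	/* sampling enable (name is hypothetical) */
#define GP_SAMPLE_REG	0x35C058u	/* where the samples are read from */
#define IR_SMP_INT	(1u << 18)	/* bit [18] in PCI Interrupt Status */

/* Placeholder only: the mask register's address is not stated above. */
#define SIM_INT_MASK_REG 0xFFFFFFFFu

/* Simulated hardware: log every write, remember the mask register. */
struct reg_write_rec {
	uint32_t reg;
	uint32_t val;
};
static struct reg_write_rec write_log[8];
static unsigned n_writes;
static uint32_t sim_int_mask;

static void reg_write(uint32_t reg, uint32_t val)
{
	assert(n_writes < 8);
	write_log[n_writes].reg = reg;
	write_log[n_writes].val = val;
	n_writes++;
	if (reg == SIM_INT_MASK_REG)
		sim_int_mask = val;
}

static uint32_t reg_read(uint32_t reg)
{
	return reg == SIM_INT_MASK_REG ? sim_int_mask : 0;
}

/* Program the sampling rate, enable sampling, unmask the IR interrupt. */
static void ir_sampling_start(uint32_t int_mask_reg)
{
	reg_write(IR_RATE_REG, 0xa80a80);
	reg_write(IR_ENABLE_REG, 0x5);
	/* Read-modify-write so other enabled interrupts stay enabled. */
	reg_write(int_mask_reg, reg_read(int_mask_reg) | IR_SMP_INT);
}
```

The interrupt handler would then read GP_SAMPLE_REG each time the IR_SMP_INT bit is set in the status register.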
31
32Bits are then right shifted into the GP_SAMPLE register at the specified
33rate; you get an interrupt when a full DWORD is received.
34You need to recover the actual RC5 bits out of the (oversampled) IR sensor
35bits. (Hint: look for the 0/1 and 1/0 crossings of the RC5 bi-phase data.) An
36actual raw RC5 code will span 2-3 DWORDs, depending on the actual alignment.
37
38I'm pretty sure when no IR signal is present the receiver is always in a
39marking state (1), but stray light, etc. can cause intermittent noise values
40as well. Remember, this is a free running sample of the IR receiver state
41over time, so don't assume any sample starts at any particular place.
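
As a first step toward the crossing search hinted at above, the sketch below splits a sequence of GP_SAMPLE words into runs of equal level; every boundary between two runs is a 0/1 or 1/0 crossing. It is not a complete RC5 decoder. The struct and function names are hypothetical, and the sample order is an assumption: because bits are right shifted into the register, the oldest sample of each DWORD is taken to be bit 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One run of consecutive samples at the same level. */
struct ir_run {
	int level;	/* 0 or 1 */
	unsigned len;	/* number of samples in the run */
};

/*
 * Split nwords 32-sample words into runs, oldest sample first
 * (assumed to be bit 0 of each word). Runs continue across word
 * boundaries, since the sampling is free running. Returns the number
 * of runs stored; stops early if max_runs is reached.
 */
static size_t ir_find_runs(const uint32_t *words, size_t nwords,
			   struct ir_run *runs, size_t max_runs)
{
	size_t n = 0, w;
	unsigned bit;

	for (w = 0; w < nwords; w++) {
		for (bit = 0; bit < 32; bit++) {
			int level = (words[w] >> bit) & 1;

			if (n > 0 && runs[n - 1].level == level) {
				runs[n - 1].len++;	/* same level: extend */
			} else {
				if (n == max_runs)
					return n;	/* output is full */
				runs[n].level = level;	/* crossing: new run */
				runs[n].len = 1;
				n++;
			}
		}
	}
	return n;
}
```

In bi-phase data a run spans either one or two half-bit periods, so a decoder built on top of this would classify each run as short or long and treat a very long run at level 1 as idle. The thresholds depend on the actual sampling rate and are left open here.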
42
43http://www.atmel.com/dyn/resources/prod_documents/doc2817.pdf
44This data sheet (google search) seems to have a lovely description of the
45RC5 basics
46
47http://users.pandora.be/nenya/electronics/rc5/ and more data
48
49http://www.ee.washington.edu/circuit_archive/text/ir_decode.txt
50and even a reference to how to decode a bi-phase data stream.
51
52http://www.xs4all.nl/~sbp/knowledge/ir/rc5.htm
53still more info
54
diff --git a/Documentation/video4linux/et61x251.txt b/Documentation/video4linux/et61x251.txt
index 29340282ab5f..cd584f20a997 100644
--- a/Documentation/video4linux/et61x251.txt
+++ b/Documentation/video4linux/et61x251.txt
@@ -1,9 +1,9 @@
1 1
2 ET61X[12]51 PC Camera Controllers 2 ET61X[12]51 PC Camera Controllers
3 Driver for Linux 3 Driver for Linux
4 ================================= 4 =================================
5 5
6 - Documentation - 6 - Documentation -
7 7
8 8
9Index 9Index
@@ -156,46 +156,46 @@ Name: video_nr
156Type: short array (min = 0, max = 64) 156Type: short array (min = 0, max = 64)
157Syntax: <-1|n[,...]> 157Syntax: <-1|n[,...]>
158Description: Specify V4L2 minor mode number: 158Description: Specify V4L2 minor mode number:
159 -1 = use next available 159 -1 = use next available
160 n = use minor number n 160 n = use minor number n
161 You can specify up to 64 cameras this way. 161 You can specify up to 64 cameras this way.
162 For example: 162 For example:
163 video_nr=-1,2,-1 would assign minor number 2 to the second 163 video_nr=-1,2,-1 would assign minor number 2 to the second
164 registered camera and use auto for the first one and for every 164 registered camera and use auto for the first one and for every
165 other camera. 165 other camera.
166Default: -1 166Default: -1
167------------------------------------------------------------------------------- 167-------------------------------------------------------------------------------
168Name: force_munmap 168Name: force_munmap
169Type: bool array (min = 0, max = 64) 169Type: bool array (min = 0, max = 64)
170Syntax: <0|1[,...]> 170Syntax: <0|1[,...]>
171Description: Force the application to unmap previously mapped buffer memory 171Description: Force the application to unmap previously mapped buffer memory
172 before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not 172 before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not
173 all the applications support this feature. This parameter is 173 all the applications support this feature. This parameter is
174 specific for each detected camera. 174 specific for each detected camera.
175 0 = do not force memory unmapping 175 0 = do not force memory unmapping
176 1 = force memory unmapping (save memory) 176 1 = force memory unmapping (save memory)
177Default: 0 177Default: 0
178------------------------------------------------------------------------------- 178-------------------------------------------------------------------------------
179Name: frame_timeout 179Name: frame_timeout
180Type: uint array (min = 0, max = 64) 180Type: uint array (min = 0, max = 64)
181Syntax: <n[,...]> 181Syntax: <n[,...]>
182Description: Timeout for a video frame in seconds. This parameter is 182Description: Timeout for a video frame in seconds. This parameter is
183 specific for each detected camera. This parameter can be 183 specific for each detected camera. This parameter can be
184 changed at runtime thanks to the /sys filesystem interface. 184 changed at runtime thanks to the /sys filesystem interface.
185Default: 2 185Default: 2
186------------------------------------------------------------------------------- 186-------------------------------------------------------------------------------
187Name: debug 187Name: debug
188Type: ushort 188Type: ushort
189Syntax: <n> 189Syntax: <n>
190Description: Debugging information level, from 0 to 3: 190Description: Debugging information level, from 0 to 3:
191 0 = none (use carefully) 191 0 = none (use carefully)
192 1 = critical errors 192 1 = critical errors
193 2 = significant information 193 2 = significant information
194 3 = more verbose messages 194 3 = more verbose messages
195 Level 3 is useful for testing only, when only one device 195 Level 3 is useful for testing only, when only one device
196 is used at the same time. It also shows some more information 196 is used at the same time. It also shows some more information
197 about the hardware being detected. This module parameter can be 197 about the hardware being detected. This module parameter can be
198 changed at runtime thanks to the /sys filesystem interface. 198 changed at runtime thanks to the /sys filesystem interface.
199Default: 2 199Default: 2
200------------------------------------------------------------------------------- 200-------------------------------------------------------------------------------
201 201
diff --git a/Documentation/video4linux/ibmcam.txt b/Documentation/video4linux/ibmcam.txt
index 4a40a2e99451..397a94eb77b8 100644
--- a/Documentation/video4linux/ibmcam.txt
+++ b/Documentation/video4linux/ibmcam.txt
@@ -21,7 +21,7 @@ Internal interface: Video For Linux (V4L)
21Supported controls: 21Supported controls:
22- by V4L: Contrast, Brightness, Color, Hue 22- by V4L: Contrast, Brightness, Color, Hue
23- by driver options: frame rate, lighting conditions, video format, 23- by driver options: frame rate, lighting conditions, video format,
24 default picture settings, sharpness. 24 default picture settings, sharpness.
25 25
26SUPPORTED CAMERAS: 26SUPPORTED CAMERAS:
27 27
@@ -191,66 +191,66 @@ init_model2_sat Integer 0..255 [0x34] init_model2_sat=65
191init_model2_yb Integer 0..255 [0xa0] init_model2_yb=200 191init_model2_yb Integer 0..255 [0xa0] init_model2_yb=200
192 192
193debug You don't need this option unless you are a developer. 193debug You don't need this option unless you are a developer.
194 If you are a developer then you will see in the code 194 If you are a developer then you will see in the code
195 what values do what. 0=off. 195 what values do what. 0=off.
196 196
197flags This is a bit mask, and you can combine any number of 197flags This is a bit mask, and you can combine any number of
198 bits to produce what you want. Usually you don't want 198 bits to produce what you want. Usually you don't want
199 any of extra features this option provides: 199 any of extra features this option provides:
200 200
201 FLAGS_RETRY_VIDIOCSYNC 1 This bit allows to retry failed 201 FLAGS_RETRY_VIDIOCSYNC 1 This bit allows to retry failed
202 VIDIOCSYNC ioctls without failing. 202 VIDIOCSYNC ioctls without failing.
203 Will work with xawtv, will not 203 Will work with xawtv, will not
204 with xrealproducer. Default is 204 with xrealproducer. Default is
205 not set. 205 not set.
206 FLAGS_MONOCHROME 2 Activates monochrome (b/w) mode. 206 FLAGS_MONOCHROME 2 Activates monochrome (b/w) mode.
207 FLAGS_DISPLAY_HINTS 4 Shows colored pixels which have 207 FLAGS_DISPLAY_HINTS 4 Shows colored pixels which have
208 magic meaning to developers. 208 magic meaning to developers.
209 FLAGS_OVERLAY_STATS 8 Shows tiny numbers on screen, 209 FLAGS_OVERLAY_STATS 8 Shows tiny numbers on screen,
210 useful only for debugging. 210 useful only for debugging.
211 FLAGS_FORCE_TESTPATTERN 16 Shows blue screen with numbers. 211 FLAGS_FORCE_TESTPATTERN 16 Shows blue screen with numbers.
212 FLAGS_SEPARATE_FRAMES 32 Shows each frame separately, as 212 FLAGS_SEPARATE_FRAMES 32 Shows each frame separately, as
213 it was received from the camera. 213 it was received from the camera.
214 Default (not set) is to mix the 214 Default (not set) is to mix the
215 preceding frame in to compensate 215 preceding frame in to compensate
216 for occasional loss of Isoc data 216 for occasional loss of Isoc data
217 on high frame rates. 217 on high frame rates.
218 FLAGS_CLEAN_FRAMES 64 Forces "cleanup" of each frame 218 FLAGS_CLEAN_FRAMES 64 Forces "cleanup" of each frame
219 prior to use; relevant only if 219 prior to use; relevant only if
220 FLAGS_SEPARATE_FRAMES is set. 220 FLAGS_SEPARATE_FRAMES is set.
221 Default is not to clean frames, 221 Default is not to clean frames,
222 this is a little faster but may 222 this is a little faster but may
223 produce flicker if frame rate is 223 produce flicker if frame rate is
224 too high and Isoc data gets lost. 224 too high and Isoc data gets lost.
225 FLAGS_NO_DECODING 128 This flag turns the video stream 225 FLAGS_NO_DECODING 128 This flag turns the video stream
226 decoder off, and dumps the raw 226 decoder off, and dumps the raw
227 Isoc data from the camera into 227 Isoc data from the camera into
228 the reading process. Useful to 228 the reading process. Useful to
229 developers, but not to users. 229 developers, but not to users.
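
Since flags is a bit mask, the value to pass is the bitwise OR of the wanted bits. The sketch below restates the values from the table as C macros and checks the one dependency the table mentions. The helper name is hypothetical and the code is not taken from the driver.

```c
#include <assert.h>

/* Bit values as listed in the table above. */
#define FLAGS_RETRY_VIDIOCSYNC	1
#define FLAGS_MONOCHROME	2
#define FLAGS_DISPLAY_HINTS	4
#define FLAGS_OVERLAY_STATS	8
#define FLAGS_FORCE_TESTPATTERN	16
#define FLAGS_SEPARATE_FRAMES	32
#define FLAGS_CLEAN_FRAMES	64
#define FLAGS_NO_DECODING	128

/*
 * FLAGS_CLEAN_FRAMES is relevant only if FLAGS_SEPARATE_FRAMES is
 * also set; return nonzero when frame cleanup would take effect.
 */
static int clean_frames_effective(unsigned flags)
{
	return (flags & FLAGS_SEPARATE_FRAMES) &&
	       (flags & FLAGS_CLEAN_FRAMES);
}
```

For example, FLAGS_SEPARATE_FRAMES | FLAGS_CLEAN_FRAMES is 32 + 64 = 96, so the option would presumably be given as flags=96, following the name=value form used for the other options in this file.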
230 230
231framerate This setting controls frame rate of the camera. This is 231framerate This setting controls frame rate of the camera. This is
232 an approximate setting (in terms of "worst" ... "best") 232 an approximate setting (in terms of "worst" ... "best")
233 because camera changes frame rate depending on amount 233 because camera changes frame rate depending on amount
234 of light available. Setting 0 is slowest, 6 is fastest. 234 of light available. Setting 0 is slowest, 6 is fastest.
235 Beware - fast settings are very demanding and may not 235 Beware - fast settings are very demanding and may not
236 work well with all video sizes. Be conservative. 236 work well with all video sizes. Be conservative.
237 237
238hue_correction This highly optional setting allows to adjust the 238hue_correction This highly optional setting allows to adjust the
239 hue of the image in a way slightly different from 239 hue of the image in a way slightly different from
240 what usual "hue" control does. Both controls affect 240 what usual "hue" control does. Both controls affect
241 YUV colorspace: regular "hue" control adjusts only 241 YUV colorspace: regular "hue" control adjusts only
242 U component, and this "hue_correction" option similarly 242 U component, and this "hue_correction" option similarly
243 adjusts only V component. However usually it is enough 243 adjusts only V component. However usually it is enough
244 to tweak only U or V to compensate for colored light or 244 to tweak only U or V to compensate for colored light or
245 color temperature; this option simply allows more 245 color temperature; this option simply allows more
246 complicated correction when and if it is necessary. 246 complicated correction when and if it is necessary.
247 247
248init_brightness These settings specify _initial_ values which will be 248init_brightness These settings specify _initial_ values which will be
249init_contrast used to set up the camera. If your V4L application has 249init_contrast used to set up the camera. If your V4L application has
250init_color its own controls to adjust the picture then these 250init_color its own controls to adjust the picture then these
251init_hue controls will be used too. These options allow you to 251init_hue controls will be used too. These options allow you to
252 preconfigure the camera when it gets connected, before 252 preconfigure the camera when it gets connected, before
253 any V4L application connects to it. Good for webcams. 253 any V4L application connects to it. Good for webcams.
254 254
255init_model2_rg These initial settings alter color balance of the 255init_model2_rg These initial settings alter color balance of the
256init_model2_rg2 camera on hardware level. All four settings may be used 256init_model2_rg2 camera on hardware level. All four settings may be used
@@ -258,47 +258,47 @@ init_model2_sat to tune the camera to specific lighting conditions. These
258init_model2_yb settings only apply to Model 2 cameras. 258init_model2_yb settings only apply to Model 2 cameras.
259 259
260lighting This option selects one of three hardware-defined 260lighting This option selects one of three hardware-defined
261 photosensitivity settings of the camera. 0=bright light, 261 photosensitivity settings of the camera. 0=bright light,
262 1=Medium (default), 2=Low light. This setting affects 262 1=Medium (default), 2=Low light. This setting affects
263 frame rate: the dimmer the lighting the lower the frame 263 frame rate: the dimmer the lighting the lower the frame
264 rate (because longer exposure time is needed). The 264 rate (because longer exposure time is needed). The
265 Model 2 cameras allow values more than 2 for this option, 265 Model 2 cameras allow values more than 2 for this option,
266 thus enabling extremely high sensitivity at cost of frame 266 thus enabling extremely high sensitivity at cost of frame
267 rate, color saturation and imaging sensor noise. 267 rate, color saturation and imaging sensor noise.
268 268
269sharpness This option controls smoothing (noise reduction) 269sharpness This option controls smoothing (noise reduction)
270 made by camera. Setting 0 is most smooth, setting 6 270 made by camera. Setting 0 is most smooth, setting 6
271 is most sharp. Be aware that CMOS sensor used in the 271 is most sharp. Be aware that CMOS sensor used in the
272 camera is pretty noisy, so if you choose 6 you will 272 camera is pretty noisy, so if you choose 6 you will
273 be greeted with "snowy" image. Default is 4. Model 2 273 be greeted with "snowy" image. Default is 4. Model 2
274 cameras do not support this feature. 274 cameras do not support this feature.
275 275
276size This setting chooses one of several image sizes that are 276size This setting chooses one of several image sizes that are
277 supported by this driver. Cameras may support more, but 277 supported by this driver. Cameras may support more, but
278 it's difficult to reverse-engineer all formats. 278 it's difficult to reverse-engineer all formats.
279 Following video sizes are supported: 279 Following video sizes are supported:
280 280
281 size=0 128x96 (Model 1 only) 281 size=0 128x96 (Model 1 only)
282 size=1 160x120 282 size=1 160x120
283 size=2 176x144 283 size=2 176x144
284 size=3 320x240 (Model 2 only) 284 size=3 320x240 (Model 2 only)
285 size=4 352x240 (Model 2 only) 285 size=4 352x240 (Model 2 only)
286 size=5 352x288 286 size=5 352x288
287 size=6 640x480 (Model 3 only) 287 size=6 640x480 (Model 3 only)
288 288
289 The 352x288 is the native size of the Model 1 sensor 289 The 352x288 is the native size of the Model 1 sensor
290 array, so it's the best resolution the camera can 290 array, so it's the best resolution the camera can
291 yield. The best resolution of Model 2 is 176x144, and 291 yield. The best resolution of Model 2 is 176x144, and
292 larger images are produced by stretching the bitmap. 292 larger images are produced by stretching the bitmap.
293 Model 3 has sensor with 640x480 grid, and it works too, 293 Model 3 has sensor with 640x480 grid, and it works too,
294 but the frame rate will be exceptionally low (1-2 FPS); 294 but the frame rate will be exceptionally low (1-2 FPS);
295 it may be still OK for some applications, like security. 295 it may be still OK for some applications, like security.
296 Choose the image size you need. The smaller image can 296 Choose the image size you need. The smaller image can
297 support faster frame rate. Default is 352x288. 297 support faster frame rate. Default is 352x288.
298 298
299For more information and the Troubleshooting FAQ visit this URL: 299For more information and the Troubleshooting FAQ visit this URL:
300 300
301 http://www.linux-usb.org/ibmcam/ 301 http://www.linux-usb.org/ibmcam/
302 302
303WHAT NEEDS TO BE DONE: 303WHAT NEEDS TO BE DONE:
304 304
diff --git a/Documentation/video4linux/ov511.txt b/Documentation/video4linux/ov511.txt
index 142741e3c578..79af610d4ba5 100644
--- a/Documentation/video4linux/ov511.txt
+++ b/Documentation/video4linux/ov511.txt
@@ -81,7 +81,7 @@ MODULE PARAMETERS:
81 TYPE: integer (Boolean) 81 TYPE: integer (Boolean)
82 DEFAULT: 1 82 DEFAULT: 1
83 DESC: Brightness is normally under automatic control and can't be set 83 DESC: Brightness is normally under automatic control and can't be set
84 manually by the video app. Set to 0 for manual control. 84 manually by the video app. Set to 0 for manual control.
85 85
86 NAME: autogain 86 NAME: autogain
87 TYPE: integer (Boolean) 87 TYPE: integer (Boolean)
@@ -97,13 +97,13 @@ MODULE PARAMETERS:
97 TYPE: integer (0-6) 97 TYPE: integer (0-6)
98 DEFAULT: 3 98 DEFAULT: 3
99 DESC: Sets the threshold for printing debug messages. The higher the value, 99 DESC: Sets the threshold for printing debug messages. The higher the value,
100 the more is printed. The levels are cumulative, and are as follows: 100 the more is printed. The levels are cumulative, and are as follows:
101 0=no debug messages 101 0=no debug messages
102 1=init/detection/unload and other significant messages 102 1=init/detection/unload and other significant messages
103 2=some warning messages 103 2=some warning messages
104 3=config/control function calls 104 3=config/control function calls
105 4=most function calls and data parsing messages 105 4=most function calls and data parsing messages
106 5=highly repetitive messages 106 5=highly repetitive messages
107 107
108 NAME: snapshot 108 NAME: snapshot
109 TYPE: integer (Boolean) 109 TYPE: integer (Boolean)
@@ -116,24 +116,24 @@ MODULE PARAMETERS:
116 TYPE: integer (1-4 for OV511, 1-31 for OV511+) 116 TYPE: integer (1-4 for OV511, 1-31 for OV511+)
117 DEFAULT: 1 117 DEFAULT: 1
118 DESC: Number of cameras allowed to stream simultaneously on a single bus. 118 DESC: Number of cameras allowed to stream simultaneously on a single bus.
119 Values higher than 1 reduce the data rate of each camera, allowing two 119 Values higher than 1 reduce the data rate of each camera, allowing two
120 or more to be used at once. If you have a complicated setup involving 120 or more to be used at once. If you have a complicated setup involving
121 both OV511 and OV511+ cameras, trial-and-error may be necessary for 121 both OV511 and OV511+ cameras, trial-and-error may be necessary for
122 finding the optimum setting. 122 finding the optimum setting.
123 123
124 NAME: compress 124 NAME: compress
125 TYPE: integer (Boolean) 125 TYPE: integer (Boolean)
126 DEFAULT: 0 126 DEFAULT: 0
127 DESC: Set this to 1 to turn on the camera's compression engine. This can 127 DESC: Set this to 1 to turn on the camera's compression engine. This can
128 potentially increase the frame rate at the expense of quality, if you 128 potentially increase the frame rate at the expense of quality, if you
129 have a fast CPU. You must load the proper compression module for your 129 have a fast CPU. You must load the proper compression module for your
130 camera before starting your application (ov511_decomp or ov518_decomp). 130 camera before starting your application (ov511_decomp or ov518_decomp).
131 131
132 NAME: testpat 132 NAME: testpat
133 TYPE: integer (Boolean) 133 TYPE: integer (Boolean)
134 DEFAULT: 0 134 DEFAULT: 0
135 DESC: This configures the camera's sensor to transmit a colored test-pattern 135 DESC: This configures the camera's sensor to transmit a colored test-pattern
136 instead of an image. This does not work correctly yet. 136 instead of an image. This does not work correctly yet.
137 137
138 NAME: dumppix 138 NAME: dumppix
139 TYPE: integer (0-2) 139 TYPE: integer (0-2)
diff --git a/Documentation/video4linux/sn9c102.txt b/Documentation/video4linux/sn9c102.txt
index 142920bc011f..1d20895b4354 100644
--- a/Documentation/video4linux/sn9c102.txt
+++ b/Documentation/video4linux/sn9c102.txt
@@ -1,9 +1,9 @@
1 1
2 SN9C10x PC Camera Controllers 2 SN9C10x PC Camera Controllers
3 Driver for Linux 3 Driver for Linux
4 ============================= 4 =============================
5 5
6 - Documentation - 6 - Documentation -
7 7
8 8
9Index 9Index
@@ -176,46 +176,46 @@ Name: video_nr
176Type: short array (min = 0, max = 64) 176Type: short array (min = 0, max = 64)
177Syntax: <-1|n[,...]> 177Syntax: <-1|n[,...]>
178Description: Specify V4L2 minor mode number: 178Description: Specify V4L2 minor mode number:
179 -1 = use next available 179 -1 = use next available
180 n = use minor number n 180 n = use minor number n
181 You can specify up to 64 cameras this way. 181 You can specify up to 64 cameras this way.
182 For example: 182 For example:
183 video_nr=-1,2,-1 would assign minor number 2 to the second 183 video_nr=-1,2,-1 would assign minor number 2 to the second
184 recognized camera and use auto for the first one and for every 184 recognized camera and use auto for the first one and for every
185 other camera. 185 other camera.
186Default: -1 186Default: -1
187------------------------------------------------------------------------------- 187-------------------------------------------------------------------------------
188Name: force_munmap 188Name: force_munmap
189Type: bool array (min = 0, max = 64) 189Type: bool array (min = 0, max = 64)
190Syntax: <0|1[,...]> 190Syntax: <0|1[,...]>
191Description: Force the application to unmap previously mapped buffer memory 191Description: Force the application to unmap previously mapped buffer memory
192 before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not 192 before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not
193 all the applications support this feature. This parameter is 193 all the applications support this feature. This parameter is
194 specific for each detected camera. 194 specific for each detected camera.
195 0 = do not force memory unmapping 195 0 = do not force memory unmapping
196 1 = force memory unmapping (save memory) 196 1 = force memory unmapping (save memory)
197Default: 0 197Default: 0
198------------------------------------------------------------------------------- 198-------------------------------------------------------------------------------
199Name: frame_timeout 199Name: frame_timeout
200Type: uint array (min = 0, max = 64) 200Type: uint array (min = 0, max = 64)
201Syntax: <n[,...]> 201Syntax: <n[,...]>
202Description: Timeout for a video frame in seconds. This parameter is 202Description: Timeout for a video frame in seconds. This parameter is
203 specific for each detected camera. This parameter can be 203 specific for each detected camera. This parameter can be
204 changed at runtime thanks to the /sys filesystem interface. 204 changed at runtime thanks to the /sys filesystem interface.
205Default: 2 205Default: 2
206------------------------------------------------------------------------------- 206-------------------------------------------------------------------------------
207Name: debug 207Name: debug
208Type: ushort 208Type: ushort
209Syntax: <n> 209Syntax: <n>
210Description: Debugging information level, from 0 to 3: 210Description: Debugging information level, from 0 to 3:
211 0 = none (use carefully) 211 0 = none (use carefully)
212 1 = critical errors 212 1 = critical errors
213 2 = significant information 213 2 = significant information
214 3 = more verbose messages 214 3 = more verbose messages
215 Level 3 is useful for testing only, when only one device 215 Level 3 is useful for testing only, when only one device
216 is used. It also shows some more information about the 216 is used. It also shows some more information about the
217 hardware being detected. This parameter can be changed at 217 hardware being detected. This parameter can be changed at
218 runtime thanks to the /sys filesystem interface. 218 runtime thanks to the /sys filesystem interface.
219Default: 2 219Default: 2
220------------------------------------------------------------------------------- 220-------------------------------------------------------------------------------
221 221
@@ -280,24 +280,24 @@ Byte # Value Description
2800x04 0xC4 Frame synchronisation pattern. 2800x04 0xC4 Frame synchronisation pattern.
2810x05 0x96 Frame synchronisation pattern. 2810x05 0x96 Frame synchronisation pattern.
2820x06 0xXX Unknown meaning. The exact value depends on the chip; 2820x06 0xXX Unknown meaning. The exact value depends on the chip;
283 possible values are 0x00, 0x01 and 0x20. 283 possible values are 0x00, 0x01 and 0x20.
2840x07 0xXX Variable value, whose bits are ff00uzzc, where ff is a 2840x07 0xXX Variable value, whose bits are ff00uzzc, where ff is a
285 frame counter, u is unknown, zz is a size indicator 285 frame counter, u is unknown, zz is a size indicator
286 (00 = VGA, 01 = SIF, 10 = QSIF) and c stands for 286 (00 = VGA, 01 = SIF, 10 = QSIF) and c stands for
287 "compression enabled" (1 = yes, 0 = no). 287 "compression enabled" (1 = yes, 0 = no).
2880x08 0xXX Brightness sum inside Auto-Exposure area (low-byte). 2880x08 0xXX Brightness sum inside Auto-Exposure area (low-byte).
2890x09 0xXX Brightness sum inside Auto-Exposure area (high-byte). 2890x09 0xXX Brightness sum inside Auto-Exposure area (high-byte).
290 For a pure white image, this number will be equal to 500 290 For a pure white image, this number will be equal to 500
291 times the area of the specified AE area. For images 291 times the area of the specified AE area. For images
292 that are not pure white, the value scales down according 292 that are not pure white, the value scales down according
293 to relative whiteness. 293 to relative whiteness.
2940x0A 0xXX Brightness sum outside Auto-Exposure area (low-byte). 2940x0A 0xXX Brightness sum outside Auto-Exposure area (low-byte).
2950x0B 0xXX Brightness sum outside Auto-Exposure area (high-byte). 2950x0B 0xXX Brightness sum outside Auto-Exposure area (high-byte).
296 For a pure white image, this number will be equal to 125 296 For a pure white image, this number will be equal to 125
297 times the area outside of the specified AE area. For 297 times the area outside of the specified AE area. For
298 images that are not pure white, the value scales down 298 images that are not pure white, the value scales down
299 according to relative whiteness. 299 according to relative whiteness.
301 301
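
The bit layout of byte 0x07 and the two 16-bit brightness sums described above can be unpacked as in the following C sketch. The struct, enum and function names are hypothetical. It also assumes that "ff00uzzc" is written most significant bit first, so ff is bits 7-6, zz is bits 2-1 and c is bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Size indicator values from the table (zz field). */
enum sof_size {
	SOF_SIZE_VGA = 0,	/* 00 */
	SOF_SIZE_SIF = 1,	/* 01 */
	SOF_SIZE_QSIF = 2,	/* 10 */
};

struct sof_info {
	unsigned frame_counter;	/* ff, 0..3 */
	unsigned size;		/* zz, see enum sof_size */
	int compressed;		/* c, 1 = compression enabled */
	unsigned ae_inside_sum;	/* bytes 0x08 (low) and 0x09 (high) */
	unsigned ae_outside_sum;/* bytes 0x0A (low) and 0x0B (high) */
};

/* hdr must point at byte 0x00 of the header, at least 12 bytes long. */
static struct sof_info sof_parse(const uint8_t *hdr)
{
	struct sof_info s;

	s.frame_counter = (hdr[0x07] >> 6) & 0x3;
	s.size = (hdr[0x07] >> 1) & 0x3;
	s.compressed = hdr[0x07] & 0x1;
	s.ae_inside_sum = hdr[0x08] | ((unsigned)hdr[0x09] << 8);
	s.ae_outside_sum = hdr[0x0A] | ((unsigned)hdr[0x0B] << 8);
	return s;
}
```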
302The following bytes are used by the SN9C103 bridge only: 302The following bytes are used by the SN9C103 bridge only:
303 303
diff --git a/Documentation/video4linux/v4lgrab.c b/Documentation/video4linux/v4lgrab.c
new file mode 100644
index 000000000000..079b628481cf
--- /dev/null
+++ b/Documentation/video4linux/v4lgrab.c
@@ -0,0 +1,192 @@
1/* Simple Video4Linux image grabber. */
2/*
3 * Video4Linux Driver Test/Example Framegrabbing Program
4 *
5 * Compile with:
6 * gcc -s -Wall -Wstrict-prototypes v4lgrab.c -o v4lgrab
7 * Use as:
8 * v4lgrab >image.ppm
9 *
10 * Copyright (C) 1998-05-03, Phil Blundell <philb@gnu.org>
11 * Copied from http://www.tazenda.demon.co.uk/phil/vgrabber.c
12 * with minor modifications (Dave Forrest, drf5n@virginia.edu).
13 *
14 */
15
16#include <unistd.h>
17#include <sys/types.h>
18#include <sys/stat.h>
19#include <fcntl.h>
20#include <stdio.h>
21#include <sys/ioctl.h>
22#include <stdlib.h>
23
24#include <linux/types.h>
25#include <linux/videodev.h>
26
27#define FILE "/dev/video0"
28
29/* Stole this from tvset.c */
30
31#define READ_VIDEO_PIXEL(buf, format, depth, r, g, b) \
32{ \
33 switch (format) \
34 { \
35 case VIDEO_PALETTE_GREY: \
36 switch (depth) \
37 { \
38 case 4: \
39 case 6: \
40 case 8: \
41 (r) = (g) = (b) = (*buf++ << 8);\
42 break; \
43 \
44 case 16: \
45 (r) = (g) = (b) = \
46 *((unsigned short *) buf); \
47 buf += 2; \
48 break; \
49 } \
50 break; \
51 \
52 \
53 case VIDEO_PALETTE_RGB565: \
54 { \
55 unsigned short tmp = *(unsigned short *)buf; \
56 (r) = tmp&0xF800; \
57 (g) = (tmp<<5)&0xFC00; \
58 (b) = (tmp<<11)&0xF800; \
59 buf += 2; \
60 } \
61 break; \
62 \
63 case VIDEO_PALETTE_RGB555: \
64 (r) = (buf[0]&0xF8)<<8; \
65 (g) = ((buf[0] << 5 | buf[1] >> 3)&0xF8)<<8; \
66 (b) = ((buf[1] << 2 ) & 0xF8)<<8; \
67 buf += 2; \
68 break; \
69 \
70 case VIDEO_PALETTE_RGB24: \
71 (r) = buf[0] << 8; (g) = buf[1] << 8; \
72 (b) = buf[2] << 8; \
73 buf += 3; \
74 break; \
75 \
76 default: \
77 fprintf(stderr, \
78 "Format %d not yet supported\n", \
79 format); \
80 } \
81}
82
83int get_brightness_adj(unsigned char *image, long size, int *brightness) {
84 long i, tot = 0;
85 for (i=0;i<size*3;i++)
86 tot += image[i];
87 *brightness = (128 - tot/(size*3))/3;
88 return !((tot/(size*3)) >= 126 && (tot/(size*3)) <= 130);
89}
90
91int main(int argc, char ** argv)
92{
93 int fd = open(FILE, O_RDONLY), f;
94 struct video_capability cap;
95 struct video_window win;
96 struct video_picture vpic;
97
98 unsigned char *buffer, *src;
99 int bpp = 24, r, g, b;
100 unsigned int i, src_depth;
101
102 if (fd < 0) {
103 perror(FILE);
104 exit(1);
105 }
106
107 if (ioctl(fd, VIDIOCGCAP, &cap) < 0) {
108 perror("VIDIOCGCAP");
109 fprintf(stderr, "(" FILE " not a video4linux device?)\n");
110 close(fd);
111 exit(1);
112 }
113
114 if (ioctl(fd, VIDIOCGWIN, &win) < 0) {
115 perror("VIDIOCGWIN");
116 close(fd);
117 exit(1);
118 }
119
120 if (ioctl(fd, VIDIOCGPICT, &vpic) < 0) {
121 perror("VIDIOCGPICT");
122 close(fd);
123 exit(1);
124 }
125
126 if (cap.type & VID_TYPE_MONOCHROME) {
127 vpic.depth=8;
128 vpic.palette=VIDEO_PALETTE_GREY; /* 8bit grey */
129 if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) {
130 vpic.depth=6;
131 if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) {
132 vpic.depth=4;
133 if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) {
134 fprintf(stderr, "Unable to find a supported capture format.\n");
135 close(fd);
136 exit(1);
137 }
138 }
139 }
140 } else {
141 vpic.depth=24;
142 vpic.palette=VIDEO_PALETTE_RGB24;
143
144 if(ioctl(fd, VIDIOCSPICT, &vpic) < 0) {
145 vpic.palette=VIDEO_PALETTE_RGB565;
146 vpic.depth=16;
147
148 if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) {
149 vpic.palette=VIDEO_PALETTE_RGB555;
150 vpic.depth=15;
151
152 if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) {
153 fprintf(stderr, "Unable to find a supported capture format.\n");
154 return -1;
155 }
156 }
157 }
158 }
159
160 buffer = malloc(win.width * win.height * bpp);
161 if (!buffer) {
162 fprintf(stderr, "Out of memory.\n");
163 exit(1);
164 }
165
166 do {
167 int newbright;
168 read(fd, buffer, win.width * win.height * bpp);
169 f = get_brightness_adj(buffer, win.width * win.height, &newbright);
170 if (f) {
171 vpic.brightness += (newbright << 8);
172 if(ioctl(fd, VIDIOCSPICT, &vpic)==-1) {
173 perror("VIDIOCSPICT");
174 break;
175 }
176 }
177 } while (f);
178
179 fprintf(stdout, "P6\n%d %d 255\n", win.width, win.height);
180
181 src = buffer;
182
183 for (i = 0; i < win.width * win.height; i++) {
184 READ_VIDEO_PIXEL(src, vpic.palette, src_depth, r, g, b);
185 fputc(r>>8, stdout);
186 fputc(g>>8, stdout);
187 fputc(b>>8, stdout);
188 }
189
190 close(fd);
191 return 0;
192}
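A note on the buffer size used above: bpp holds a bit depth (24), so win.width * win.height * bpp asks for much more memory than one frame needs. A minimal sketch of sizing the buffer from the negotiated depth follows. frame_bytes is a hypothetical helper, not part of v4lgrab.c, and it assumes every pixel occupies a whole number of bytes (so the 15 bpp RGB555 case rounds up to 2 bytes).

```c
#include <stddef.h>

/*
 * Hypothetical helper (not in v4lgrab.c): bytes needed for one frame
 * when every pixel is stored in a whole number of bytes.
 * depth_bits is the value negotiated in vpic.depth (e.g. 8, 15, 16, 24).
 */
static size_t frame_bytes(unsigned int width, unsigned int height,
			  unsigned int depth_bits)
{
	size_t bytes_per_pixel = (depth_bits + 7) / 8;	/* round up to bytes */

	return (size_t)width * height * bytes_per_pixel;
}
```

Under that assumption, both the malloc() size and the read() length could be frame_bytes(win.width, win.height, vpic.depth).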
diff --git a/Documentation/video4linux/w9968cf.txt b/Documentation/video4linux/w9968cf.txt
index 3b704f2aae6d..0d53ce774b01 100644
--- a/Documentation/video4linux/w9968cf.txt
+++ b/Documentation/video4linux/w9968cf.txt
@@ -1,9 +1,9 @@
1 1
2 W996[87]CF JPEG USB Dual Mode Camera Chip 2 W996[87]CF JPEG USB Dual Mode Camera Chip
3 Driver for Linux 2.6 (basic version) 3 Driver for Linux 2.6 (basic version)
4 ========================================= 4 =========================================
5 5
6 - Documentation - 6 - Documentation -
7 7
8 8
9Index 9Index
@@ -188,57 +188,57 @@ Name: ovmod_load
188Type: bool 188Type: bool
189Syntax: <0|1> 189Syntax: <0|1>
190Description: Automatic 'ovcamchip' module loading: 0 disabled, 1 enabled. 190Description: Automatic 'ovcamchip' module loading: 0 disabled, 1 enabled.
191 If enabled, 'insmod' searches for the required 'ovcamchip' 191 If enabled, 'insmod' searches for the required 'ovcamchip'
192 module in the system, according to its configuration, and 192 module in the system, according to its configuration, and
193 loads that module automatically. This action is performed as 193 loads that module automatically. This action is performed as
194 soon as the 'w9968cf' module is loaded into memory. 194 soon as the 'w9968cf' module is loaded into memory.
195Default: 1 195Default: 1
196Note: The kernel must be compiled with the CONFIG_KMOD option 196Note: The kernel must be compiled with the CONFIG_KMOD option
197 enabled for the 'ovcamchip' module to be loaded and for 197 enabled for the 'ovcamchip' module to be loaded and for
198 this parameter to be present. 198 this parameter to be present.
199------------------------------------------------------------------------------- 199-------------------------------------------------------------------------------
200Name: simcams 200Name: simcams
201Type: int 201Type: int
202Syntax: <n> 202Syntax: <n>
203Description: Number of cameras allowed to stream simultaneously. 203Description: Number of cameras allowed to stream simultaneously.
204 n may vary from 0 to 32. 204 n may vary from 0 to 32.
205Default: 32 205Default: 32
206------------------------------------------------------------------------------- 206-------------------------------------------------------------------------------
207Name: video_nr 207Name: video_nr
208Type: int array (min = 0, max = 32) 208Type: int array (min = 0, max = 32)
209Syntax: <-1|n[,...]> 209Syntax: <-1|n[,...]>
210Description: Specify V4L minor mode number. 210Description: Specify V4L minor mode number.
211 -1 = use next available 211 -1 = use next available
212 n = use minor number n 212 n = use minor number n
213 You can specify up to 32 cameras this way. 213 You can specify up to 32 cameras this way.
214 For example: 214 For example:
215 video_nr=-1,2,-1 would assign minor number 2 to the second 215 video_nr=-1,2,-1 would assign minor number 2 to the second
216 recognized camera and use auto for the first one and for every 216 recognized camera and use auto for the first one and for every
217 other camera. 217 other camera.
218Default: -1 218Default: -1
219------------------------------------------------------------------------------- 219-------------------------------------------------------------------------------
220Name: packet_size 220Name: packet_size
221Type: int array (min = 0, max = 32) 221Type: int array (min = 0, max = 32)
222Syntax: <n[,...]> 222Syntax: <n[,...]>
223Description: Specify the maximum data payload size in bytes for alternate 223Description: Specify the maximum data payload size in bytes for alternate
224 settings, for each device. n is scaled between 63 and 1023. 224 settings, for each device. n is scaled between 63 and 1023.
225Default: 1023 225Default: 1023
226------------------------------------------------------------------------------- 226-------------------------------------------------------------------------------
227Name: max_buffers 227Name: max_buffers
228Type: int array (min = 0, max = 32) 228Type: int array (min = 0, max = 32)
229Syntax: <n[,...]> 229Syntax: <n[,...]>
230Description: For advanced users. 230Description: For advanced users.
231 Specify the maximum number of video frame buffers to allocate 231 Specify the maximum number of video frame buffers to allocate
232 for each device, from 2 to 32. 232 for each device, from 2 to 32.
233Default: 2 233Default: 2
234------------------------------------------------------------------------------- 234-------------------------------------------------------------------------------
235Name: double_buffer 235Name: double_buffer
236Type: bool array (min = 0, max = 32) 236Type: bool array (min = 0, max = 32)
237Syntax: <0|1[,...]> 237Syntax: <0|1[,...]>
238Description: Hardware double buffering: 0 disabled, 1 enabled. 238Description: Hardware double buffering: 0 disabled, 1 enabled.
239 It should be enabled if you want smooth video output: if you 239 It should be enabled if you want smooth video output: if you
240 obtain out of sync. video, disable it, or try to 240 obtain out of sync. video, disable it, or try to
241 decrease the 'clockdiv' module parameter value. 241 decrease the 'clockdiv' module parameter value.
242Default: 1 for every device. 242Default: 1 for every device.
243------------------------------------------------------------------------------- 243-------------------------------------------------------------------------------
244Name: clamping 244Name: clamping
@@ -251,9 +251,9 @@ Name: filter_type
251Type: int array (min = 0, max = 32) 251Type: int array (min = 0, max = 32)
252Syntax: <0|1|2[,...]> 252Syntax: <0|1|2[,...]>
253Description: Video filter type. 253Description: Video filter type.
254 0 none, 1 (1-2-1) 3-tap filter, 2 (2-3-6-3-2) 5-tap filter. 254 0 none, 1 (1-2-1) 3-tap filter, 2 (2-3-6-3-2) 5-tap filter.
255 The filter is used to reduce noise and aliasing artifacts 255 The filter is used to reduce noise and aliasing artifacts
256 produced by the CCD or CMOS image sensor. 256 produced by the CCD or CMOS image sensor.
257Default: 0 for every device. 257Default: 0 for every device.
258------------------------------------------------------------------------------- 258-------------------------------------------------------------------------------
259Name: largeview 259Name: largeview
@@ -266,9 +266,9 @@ Name: upscaling
266Type: bool array (min = 0, max = 32) 266Type: bool array (min = 0, max = 32)
267Syntax: <0|1[,...]> 267Syntax: <0|1[,...]>
268Description: Software scaling (for non-compressed video only): 268Description: Software scaling (for non-compressed video only):
269 0 disabled, 1 enabled. 269 0 disabled, 1 enabled.
270 Disable it if you have a slow CPU or you don't have enough 270 Disable it if you have a slow CPU or you don't have enough
271 memory. 271 memory.
272Default: 0 for every device. 272Default: 0 for every device.
273Note: If 'w9968cf-vpp' is not present, this parameter is set to 0. 273Note: If 'w9968cf-vpp' is not present, this parameter is set to 0.
274------------------------------------------------------------------------------- 274-------------------------------------------------------------------------------
@@ -276,36 +276,36 @@ Name: decompression
276Type: int array (min = 0, max = 32) 276Type: int array (min = 0, max = 32)
277Syntax: <0|1|2[,...]> 277Syntax: <0|1|2[,...]>
278Description: Software video decompression: 278Description: Software video decompression:
279 0 = disables decompression 279 0 = disables decompression
280 (doesn't allow formats needing decompression). 280 (doesn't allow formats needing decompression).
281 1 = forces decompression 281 1 = forces decompression
282 (allows formats needing decompression only). 282 (allows formats needing decompression only).
283 2 = allows any permitted formats. 283 2 = allows any permitted formats.
284 Formats supporting (de)compressed video are YUV422P and 284 Formats supporting (de)compressed video are YUV422P and
285 YUV420P/YUV420 in any resolutions where width and height are 285 YUV420P/YUV420 in any resolutions where width and height are
286 multiples of 16. 286 multiples of 16.
287Default: 2 for every device. 287Default: 2 for every device.
288Note: If 'w9968cf-vpp' is not present, forcing decompression is not 288Note: If 'w9968cf-vpp' is not present, forcing decompression is not
289 allowed; in this case this parameter is set to 2. 289 allowed; in this case this parameter is set to 2.
290------------------------------------------------------------------------------- 290-------------------------------------------------------------------------------
291Name: force_palette 291Name: force_palette
292Type: int array (min = 0, max = 32) 292Type: int array (min = 0, max = 32)
293Syntax: <0|9|10|13|15|8|7|1|6|3|4|5[,...]> 293Syntax: <0|9|10|13|15|8|7|1|6|3|4|5[,...]>
294Description: Force picture palette. 294Description: Force picture palette.
295 In order: 295 In order:
296 0 = Off - allows any of the following formats: 296 0 = Off - allows any of the following formats:
297 9 = UYVY 16 bpp - Original video, compression disabled 297 9 = UYVY 16 bpp - Original video, compression disabled
298 10 = YUV420 12 bpp - Original video, compression enabled 298 10 = YUV420 12 bpp - Original video, compression enabled
299 13 = YUV422P 16 bpp - Original video, compression enabled 299 13 = YUV422P 16 bpp - Original video, compression enabled
300 15 = YUV420P 12 bpp - Original video, compression enabled 300 15 = YUV420P 12 bpp - Original video, compression enabled
301 8 = YUVY 16 bpp - Software conversion from UYVY 301 8 = YUVY 16 bpp - Software conversion from UYVY
302 7 = YUV422 16 bpp - Software conversion from UYVY 302 7 = YUV422 16 bpp - Software conversion from UYVY
303 1 = GREY 8 bpp - Software conversion from UYVY 303 1 = GREY 8 bpp - Software conversion from UYVY
304 6 = RGB555 16 bpp - Software conversion from UYVY 304 6 = RGB555 16 bpp - Software conversion from UYVY
305 3 = RGB565 16 bpp - Software conversion from UYVY 305 3 = RGB565 16 bpp - Software conversion from UYVY
306 4 = RGB24 24 bpp - Software conversion from UYVY 306 4 = RGB24 24 bpp - Software conversion from UYVY
307 5 = RGB32 32 bpp - Software conversion from UYVY 307 5 = RGB32 32 bpp - Software conversion from UYVY
308 When not 0, this parameter will override 'decompression'. 308 When not 0, this parameter will override 'decompression'.
309Default: 0 for every device. Initial palette is 9 (UYVY). 309Default: 0 for every device. Initial palette is 9 (UYVY).
310Note: If 'w9968cf-vpp' is not present, this parameter is set to 9. 310Note: If 'w9968cf-vpp' is not present, this parameter is set to 9.
311------------------------------------------------------------------------------- 311-------------------------------------------------------------------------------
@@ -313,77 +313,77 @@ Name: force_rgb
313Type: bool array (min = 0, max = 32) 313Type: bool array (min = 0, max = 32)
314Syntax: <0|1[,...]> 314Syntax: <0|1[,...]>
315Description: Read RGB video data instead of BGR: 315Description: Read RGB video data instead of BGR:
316 1 = use RGB component ordering. 316 1 = use RGB component ordering.
317 0 = use BGR component ordering. 317 0 = use BGR component ordering.
318 This parameter has effect when using RGBX palettes only. 318 This parameter has effect when using RGBX palettes only.
319Default: 0 for every device. 319Default: 0 for every device.
320------------------------------------------------------------------------------- 320-------------------------------------------------------------------------------
321Name: autobright 321Name: autobright
322Type: bool array (min = 0, max = 32) 322Type: bool array (min = 0, max = 32)
323Syntax: <0|1[,...]> 323Syntax: <0|1[,...]>
324Description: Image sensor automatically changes brightness: 324Description: Image sensor automatically changes brightness:
325 0 = no, 1 = yes 325 0 = no, 1 = yes
326Default: 0 for every device. 326Default: 0 for every device.
327------------------------------------------------------------------------------- 327-------------------------------------------------------------------------------
328Name: autoexp 328Name: autoexp
329Type: bool array (min = 0, max = 32) 329Type: bool array (min = 0, max = 32)
330Syntax: <0|1[,...]> 330Syntax: <0|1[,...]>
331Description: Image sensor automatically changes exposure: 331Description: Image sensor automatically changes exposure:
332 0 = no, 1 = yes 332 0 = no, 1 = yes
333Default: 1 for every device. 333Default: 1 for every device.
334------------------------------------------------------------------------------- 334-------------------------------------------------------------------------------
335Name: lightfreq 335Name: lightfreq
336Type: int array (min = 0, max = 32) 336Type: int array (min = 0, max = 32)
337Syntax: <50|60[,...]> 337Syntax: <50|60[,...]>
338Description: Light frequency in Hz: 338Description: Light frequency in Hz:
339 50 for European and Asian lighting, 60 for American lighting. 339 50 for European and Asian lighting, 60 for American lighting.
340Default: 50 for every device. 340Default: 50 for every device.
341------------------------------------------------------------------------------- 341-------------------------------------------------------------------------------
342Name: bandingfilter 342Name: bandingfilter
343Type: bool array (min = 0, max = 32) 343Type: bool array (min = 0, max = 32)
344Syntax: <0|1[,...]> 344Syntax: <0|1[,...]>
345Description: Banding filter to reduce effects of fluorescent 345Description: Banding filter to reduce effects of fluorescent
346 lighting: 346 lighting:
347 0 disabled, 1 enabled. 347 0 disabled, 1 enabled.
348 This filter tries to reduce the pattern of horizontal 348 This filter tries to reduce the pattern of horizontal
349 light/dark bands caused by some (usually fluorescent) lighting. 349 light/dark bands caused by some (usually fluorescent) lighting.
350Default: 0 for every device. 350Default: 0 for every device.
351------------------------------------------------------------------------------- 351-------------------------------------------------------------------------------
352Name: clockdiv 352Name: clockdiv
353Type: int array (min = 0, max = 32) 353Type: int array (min = 0, max = 32)
354Syntax: <-1|n[,...]> 354Syntax: <-1|n[,...]>
355Description: Force pixel clock divisor to a specific value (for experts): 355Description: Force pixel clock divisor to a specific value (for experts):
356 n may vary from 0 to 127. 356 n may vary from 0 to 127.
357 -1 for automatic value. 357 -1 for automatic value.
358 See also the 'double_buffer' module parameter. 358 See also the 'double_buffer' module parameter.
359Default: -1 for every device. 359Default: -1 for every device.
360------------------------------------------------------------------------------- 360-------------------------------------------------------------------------------
361Name: backlight 361Name: backlight
362Type: bool array (min = 0, max = 32) 362Type: bool array (min = 0, max = 32)
363Syntax: <0|1[,...]> 363Syntax: <0|1[,...]>
364Description: Objects are lit from behind: 364Description: Objects are lit from behind:
365 0 = no, 1 = yes 365 0 = no, 1 = yes
366Default: 0 for every device. 366Default: 0 for every device.
367------------------------------------------------------------------------------- 367-------------------------------------------------------------------------------
368Name: mirror 368Name: mirror
369Type: bool array (min = 0, max = 32) 369Type: bool array (min = 0, max = 32)
370Syntax: <0|1[,...]> 370Syntax: <0|1[,...]>
371Description: Reverse image horizontally: 371Description: Reverse image horizontally:
372 0 = no, 1 = yes 372 0 = no, 1 = yes
373Default: 0 for every device. 373Default: 0 for every device.
374------------------------------------------------------------------------------- 374-------------------------------------------------------------------------------
375Name: monochrome 375Name: monochrome
376Type: bool array (min = 0, max = 32) 376Type: bool array (min = 0, max = 32)
377Syntax: <0|1[,...]> 377Syntax: <0|1[,...]>
378Description: The image sensor is monochrome: 378Description: The image sensor is monochrome:
379 0 = no, 1 = yes 379 0 = no, 1 = yes
380Default: 0 for every device. 380Default: 0 for every device.
381------------------------------------------------------------------------------- 381-------------------------------------------------------------------------------
382Name: brightness 382Name: brightness
383Type: long array (min = 0, max = 32) 383Type: long array (min = 0, max = 32)
384Syntax: <n[,...]> 384Syntax: <n[,...]>
385Description: Set picture brightness (0-65535). 385Description: Set picture brightness (0-65535).
386 This parameter has no effect if 'autobright' is enabled. 386 This parameter has no effect if 'autobright' is enabled.
387Default: 31000 for every device. 387Default: 31000 for every device.
388------------------------------------------------------------------------------- 388-------------------------------------------------------------------------------
389Name: hue 389Name: hue
@@ -414,23 +414,23 @@ Name: debug
414Type: int 414Type: int
415Syntax: <n> 415Syntax: <n>
416Description: Debugging information level, from 0 to 6: 416Description: Debugging information level, from 0 to 6:
417 0 = none (use carefully) 417 0 = none (use carefully)
418 1 = critical errors 418 1 = critical errors
419 2 = significant information 419 2 = significant information
420 3 = configuration or general messages 420 3 = configuration or general messages
421 4 = warnings 421 4 = warnings
422 5 = called functions 422 5 = called functions
423 6 = function internals 423 6 = function internals
424 Level 5 and 6 are useful for testing only, when only one 424 Level 5 and 6 are useful for testing only, when only one
425 device is used. 425 device is used.
426Default: 2 426Default: 2
427------------------------------------------------------------------------------- 427-------------------------------------------------------------------------------
428Name: specific_debug 428Name: specific_debug
429Type: bool 429Type: bool
430Syntax: <0|1> 430Syntax: <0|1>
431Description: Enable or disable specific debugging messages: 431Description: Enable or disable specific debugging messages:
432 0 = print messages concerning every level <= 'debug' level. 432 0 = print messages concerning every level <= 'debug' level.
433 1 = print messages concerning the level indicated by 'debug'. 433 1 = print messages concerning the level indicated by 'debug'.
434Default: 0 434Default: 0
435------------------------------------------------------------------------------- 435-------------------------------------------------------------------------------
436 436
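The interaction between 'debug' and 'specific_debug' described above can be read as a small predicate. The sketch below is only an illustration of the documented rule. should_print is a hypothetical name and the code is not taken from the w9968cf driver.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch of the filtering rule documented for the 'debug'
 * and 'specific_debug' parameters; not taken from the driver source.
 *
 * level          - level of the message being considered (1..6)
 * debug          - value of the 'debug' module parameter (0..6)
 * specific_debug - value of the 'specific_debug' module parameter
 */
static bool should_print(int level, int debug, bool specific_debug)
{
	if (specific_debug)
		return level == debug;	/* only the indicated level */
	return level <= debug;		/* every level up to 'debug' */
}
```

For example, with debug=2 a level 1 message is printed when specific_debug=0 and suppressed when specific_debug=1.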
diff --git a/Documentation/video4linux/zc0301.txt b/Documentation/video4linux/zc0301.txt
index f55262c6733b..f406f5e80046 100644
--- a/Documentation/video4linux/zc0301.txt
+++ b/Documentation/video4linux/zc0301.txt
@@ -1,9 +1,9 @@
1 1
2 ZC0301 Image Processor and Control Chip 2 ZC0301 and ZC0301P Image Processor and Control Chip
3 Driver for Linux 3 Driver for Linux
4 ======================================= 4 ===================================================
5 5
6 - Documentation - 6 - Documentation -
7 7
8 8
9Index 9Index
@@ -51,13 +51,13 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
51 51
524. Overview and features 524. Overview and features
53======================== 53========================
54This driver supports the video interface of the devices mounting the ZC0301 54This driver supports the video interface of the devices mounting the ZC0301 or
55Image Processor and Control Chip. 55ZC0301P Image Processors and Control Chips.
56 56
57The driver relies on the Video4Linux2 and USB core modules. It has been 57The driver relies on the Video4Linux2 and USB core modules. It has been
58designed to run properly on SMP systems as well. 58designed to run properly on SMP systems as well.
59 59
60The latest version of the ZC0301 driver can be found at the following URL: 60The latest version of the ZC0301[P] driver can be found at the following URL:
61http://www.linux-projects.org/ 61http://www.linux-projects.org/
62 62
63Some of the features of the driver are: 63Some of the features of the driver are:
@@ -117,7 +117,7 @@ supported by the USB Audio driver thanks to the ALSA API:
117 117
118And finally: 118And finally:
119 119
120 # USB Multimedia devices 120 # V4L USB devices
121 # 121 #
122 CONFIG_USB_ZC0301=m 122 CONFIG_USB_ZC0301=m
123 123
@@ -146,46 +146,46 @@ Name: video_nr
146Type: short array (min = 0, max = 64) 146Type: short array (min = 0, max = 64)
147Syntax: <-1|n[,...]> 147Syntax: <-1|n[,...]>
148Description: Specify V4L2 minor mode number: 148Description: Specify V4L2 minor mode number:
149 -1 = use next available 149 -1 = use next available
150 n = use minor number n 150 n = use minor number n
151 You can specify up to 64 cameras this way. 151 You can specify up to 64 cameras this way.
152 For example: 152 For example:
153 video_nr=-1,2,-1 would assign minor number 2 to the second 153 video_nr=-1,2,-1 would assign minor number 2 to the second
154 registered camera and use auto for the first one and for every 154 registered camera and use auto for the first one and for every
155 other camera. 155 other camera.
156Default: -1 156Default: -1
157------------------------------------------------------------------------------- 157-------------------------------------------------------------------------------
158Name: force_munmap 158Name: force_munmap
159Type: bool array (min = 0, max = 64) 159Type: bool array (min = 0, max = 64)
160Syntax: <0|1[,...]> 160Syntax: <0|1[,...]>
161Description: Force the application to unmap previously mapped buffer memory 161Description: Force the application to unmap previously mapped buffer memory
162 before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not 162 before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not
163 all the applications support this feature. This parameter is 163 all the applications support this feature. This parameter is
164 specific for each detected camera. 164 specific for each detected camera.
165 0 = do not force memory unmapping 165 0 = do not force memory unmapping
166 1 = force memory unmapping (save memory) 166 1 = force memory unmapping (save memory)
167Default: 0 167Default: 0
168------------------------------------------------------------------------------- 168-------------------------------------------------------------------------------
169Name: frame_timeout 169Name: frame_timeout
170Type: uint array (min = 0, max = 64) 170Type: uint array (min = 0, max = 64)
171Syntax: <n[,...]> 171Syntax: <n[,...]>
172Description: Timeout for a video frame in seconds. This parameter is 172Description: Timeout for a video frame in seconds. This parameter is
173 specific for each detected camera. This parameter can be 173 specific for each detected camera. This parameter can be
174 changed at runtime thanks to the /sys filesystem interface. 174 changed at runtime thanks to the /sys filesystem interface.
175Default: 2 175Default: 2
176------------------------------------------------------------------------------- 176-------------------------------------------------------------------------------
177Name: debug 177Name: debug
178Type: ushort 178Type: ushort
179Syntax: <n> 179Syntax: <n>
180Description: Debugging information level, from 0 to 3: 180Description: Debugging information level, from 0 to 3:
181 0 = none (use carefully) 181 0 = none (use carefully)
182 1 = critical errors 182 1 = critical errors
183 2 = significant information 183 2 = significant information
184 3 = more verbose messages 184 3 = more verbose messages
185 Level 3 is useful for testing only, when only one device 185 Level 3 is useful for testing only, when only one device
186 is used at the same time. It also shows some more information 186 is used at the same time. It also shows some more information
187 about the hardware being detected. This module parameter can be 187 about the hardware being detected. This module parameter can be
188 changed at runtime thanks to the /sys filesystem interface. 188 changed at runtime thanks to the /sys filesystem interface.
189Default: 2 189Default: 2
190------------------------------------------------------------------------------- 190-------------------------------------------------------------------------------
191 191
@@ -204,11 +204,25 @@ Vendor ID Product ID
2040x041e 0x4017 2040x041e 0x4017
2050x041e 0x401c 2050x041e 0x401c
2060x041e 0x401e 2060x041e 0x401e
2070x041e 0x401f
2080x041e 0x4022
2070x041e 0x4034 2090x041e 0x4034
2080x041e 0x4035 2100x041e 0x4035
2110x041e 0x4036
2120x041e 0x403a
2130x0458 0x7007
2140x0458 0x700C
2150x0458 0x700f
2160x046d 0x08ae
2170x055f 0xd003
2180x055f 0xd004
2090x046d 0x08ae 2190x046d 0x08ae
2100x0ac8 0x0301 2200x0ac8 0x0301
2210x0ac8 0x301b
2220x0ac8 0x303b
2230x10fd 0x0128
2110x10fd 0x8050 2240x10fd 0x8050
2250x10fd 0x804e
212 226
213The list above does not imply that all those devices work with this driver: up 227The list above does not imply that all those devices work with this driver: up
214until now only the ones that mount the following image sensors are supported; 228until now only the ones that mount the following image sensors are supported;
@@ -217,6 +231,7 @@ kernel messages will always tell you whether this is the case:
217Model Manufacturer 231Model Manufacturer
218----- ------------ 232----- ------------
219PAS202BCB PixArt Imaging, Inc. 233PAS202BCB PixArt Imaging, Inc.
234PB-0330 Photobit Corporation
220 235
221 236
2229. Notes for V4L2 application developers 2379. Notes for V4L2 application developers
@@ -250,5 +265,6 @@ the fingerprint is: '88E8 F32F 7244 68BA 3958 5D40 99DA 5D2A FCE6 35A4'.
250 been taken from the documentation of the ZC030x Video4Linux1 driver written 265 been taken from the documentation of the ZC030x Video4Linux1 driver written
251 by Andrew Birkett <andy@nobugs.org>; 266 by Andrew Birkett <andy@nobugs.org>;
252- The initialization values of the ZC0301 controller connected to the PAS202BCB 267- The initialization values of the ZC0301 controller connected to the PAS202BCB
253 image sensor have been taken from the SPCA5XX driver maintained by 268 and PB-0330 image sensors have been taken from the SPCA5XX driver maintained
254 Michel Xhaard <mxhaard@magic.fr>. 269 by Michel Xhaard <mxhaard@magic.fr>;
270- Stanislav Lechev donated one camera.
diff --git a/Documentation/watchdog/pcwd-watchdog.txt b/Documentation/watchdog/pcwd-watchdog.txt
index 12187a33e310..d9ee6336c1d4 100644
--- a/Documentation/watchdog/pcwd-watchdog.txt
+++ b/Documentation/watchdog/pcwd-watchdog.txt
@@ -22,78 +22,9 @@
22 to run the program with an "&" to run it in the background!) 22 to run the program with an "&" to run it in the background!)
23 23
24 If you want to write a program to be compatible with the PC Watchdog 24 If you want to write a program to be compatible with the PC Watchdog
25 driver, simply do the following: 25 driver, simply use or modify the watchdog test program:
26 26 Documentation/watchdog/src/watchdog-test.c
27-- Snippet of code -- 27
28/*
29 * Watchdog Driver Test Program
30 */
31
32#include <stdio.h>
33#include <stdlib.h>
34#include <string.h>
35#include <unistd.h>
36#include <fcntl.h>
37#include <sys/ioctl.h>
38#include <linux/types.h>
39#include <linux/watchdog.h>
40
41int fd;
42
43/*
44 * This function simply sends an IOCTL to the driver, which in turn ticks
45 * the PC Watchdog card to reset its internal timer so it doesn't trigger
46 * a computer reset.
47 */
48void keep_alive(void)
49{
50 int dummy;
51
52 ioctl(fd, WDIOC_KEEPALIVE, &dummy);
53}
54
55/*
56 * The main program. Run the program with "-d" to disable the card,
57 * or "-e" to enable the card.
58 */
59int main(int argc, char *argv[])
60{
61 fd = open("/dev/watchdog", O_WRONLY);
62
63 if (fd == -1) {
64 fprintf(stderr, "Watchdog device not enabled.\n");
65 fflush(stderr);
66 exit(-1);
67 }
68
69 if (argc > 1) {
70 if (!strncasecmp(argv[1], "-d", 2)) {
71 ioctl(fd, WDIOC_SETOPTIONS, WDIOS_DISABLECARD);
72 fprintf(stderr, "Watchdog card disabled.\n");
73 fflush(stderr);
74 exit(0);
75 } else if (!strncasecmp(argv[1], "-e", 2)) {
76 ioctl(fd, WDIOC_SETOPTIONS, WDIOS_ENABLECARD);
77 fprintf(stderr, "Watchdog card enabled.\n");
78 fflush(stderr);
79 exit(0);
80 } else {
81 fprintf(stderr, "-d to disable, -e to enable.\n");
82 fprintf(stderr, "run by itself to tick the card.\n");
83 fflush(stderr);
84 exit(0);
85 }
86 } else {
87 fprintf(stderr, "Watchdog Ticking Away!\n");
88 fflush(stderr);
89 }
90
91 while(1) {
92 keep_alive();
93 sleep(1);
94 }
95}
96-- End snippet --
97 28
98 Other IOCTL functions include: 29 Other IOCTL functions include:
99 30
diff --git a/Documentation/watchdog/src/watchdog-simple.c b/Documentation/watchdog/src/watchdog-simple.c
new file mode 100644
index 000000000000..85cf17c48669
--- /dev/null
+++ b/Documentation/watchdog/src/watchdog-simple.c
@@ -0,0 +1,15 @@
1#include <stdlib.h>
2#include <fcntl.h>
3
4int main(int argc, const char *argv[]) {
5 int fd = open("/dev/watchdog", O_WRONLY);
6 if (fd == -1) {
7 perror("watchdog");
8 exit(1);
9 }
10 while (1) {
11 write(fd, "\0", 1);
12 fsync(fd);
13 sleep(10);
14 }
15}
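As added, watchdog-simple.c calls perror(), write(), fsync() and sleep() but includes only <stdlib.h> and <fcntl.h>; the first is declared in <stdio.h> and the other three in <unistd.h>. The sketch below shows the ping step with its header included and the write() result checked. ping_watchdog is a hypothetical helper name, not something the patch defines.

```c
#include <unistd.h>	/* write() */

/*
 * Hypothetical helper: ping the watchdog by writing one byte to the
 * already opened device. Returns 0 on success, -1 if the write failed
 * or wrote nothing.
 */
static int ping_watchdog(int fd)
{
	ssize_t n = write(fd, "\0", 1);

	return n == 1 ? 0 : -1;
}
```

A daemon loop built on this could stop pinging, or log the failure, when the helper returns -1 instead of silently continuing.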
diff --git a/Documentation/watchdog/src/watchdog-test.c b/Documentation/watchdog/src/watchdog-test.c
new file mode 100644
index 000000000000..65f6c19cb865
--- /dev/null
+++ b/Documentation/watchdog/src/watchdog-test.c
@@ -0,0 +1,68 @@
1/*
2 * Watchdog Driver Test Program
3 */
4
5#include <stdio.h>
6#include <stdlib.h>
7#include <string.h>
8#include <unistd.h>
9#include <fcntl.h>
10#include <sys/ioctl.h>
11#include <linux/types.h>
12#include <linux/watchdog.h>
13
14int fd;
15
16/*
17 * This function simply sends an IOCTL to the driver, which in turn ticks
18 * the PC Watchdog card to reset its internal timer so it doesn't trigger
19 * a computer reset.
20 */
21void keep_alive(void)
22{
23 int dummy;
24
25 ioctl(fd, WDIOC_KEEPALIVE, &dummy);
26}
27
28/*
29 * The main program. Run the program with "-d" to disable the card,
30 * or "-e" to enable the card.
31 */
32int main(int argc, char *argv[])
33{
34 fd = open("/dev/watchdog", O_WRONLY);
35
36 if (fd == -1) {
37 fprintf(stderr, "Watchdog device not enabled.\n");
38 fflush(stderr);
39 exit(-1);
40 }
41
42 if (argc > 1) {
43 if (!strncasecmp(argv[1], "-d", 2)) {
44 ioctl(fd, WDIOC_SETOPTIONS, WDIOS_DISABLECARD);
45 fprintf(stderr, "Watchdog card disabled.\n");
46 fflush(stderr);
47 exit(0);
48 } else if (!strncasecmp(argv[1], "-e", 2)) {
49 ioctl(fd, WDIOC_SETOPTIONS, WDIOS_ENABLECARD);
50 fprintf(stderr, "Watchdog card enabled.\n");
51 fflush(stderr);
52 exit(0);
53 } else {
54 fprintf(stderr, "-d to disable, -e to enable.\n");
55 fprintf(stderr, "run by itself to tick the card.\n");
56 fflush(stderr);
57 exit(0);
58 }
59 } else {
60 fprintf(stderr, "Watchdog Ticking Away!\n");
61 fflush(stderr);
62 }
63
64 while(1) {
65 keep_alive();
66 sleep(1);
67 }
68}
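The test program passes WDIOS_DISABLECARD and WDIOS_ENABLECARD to WDIOC_SETOPTIONS by value and ignores the ioctl() result. The sketch below separates option parsing from the ioctl call and passes the flag by address. That last point is an assumption: it presumes the driver reads an int through the pointer, which should be checked against the driver in use. parse_option and set_card_option are hypothetical names.

```c
#include <strings.h>		/* strncasecmp() */
#include <sys/ioctl.h>
#include <linux/watchdog.h>	/* WDIOC_SETOPTIONS, WDIOS_* */

/* Hypothetical helper: map "-d"/"-e" to a WDIOS_* flag, 0 if unknown. */
static int parse_option(const char *arg)
{
	if (!strncasecmp(arg, "-d", 2))
		return WDIOS_DISABLECARD;
	if (!strncasecmp(arg, "-e", 2))
		return WDIOS_ENABLECARD;
	return 0;
}

/*
 * Hypothetical helper: hand the flag to the driver and report failure.
 * Assumption: the driver copies an int from the address it is given.
 */
static int set_card_option(int fd, int option)
{
	return ioctl(fd, WDIOC_SETOPTIONS, &option) < 0 ? -1 : 0;
}
```

With these, main() could print "Watchdog card disabled." only after set_card_option() returns 0.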
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt
index 21ed51173662..958ff3d48be3 100644
--- a/Documentation/watchdog/watchdog-api.txt
+++ b/Documentation/watchdog/watchdog-api.txt
@@ -34,22 +34,7 @@ activates as soon as /dev/watchdog is opened and will reboot unless
34the watchdog is pinged within a certain time, this time is called the 34the watchdog is pinged within a certain time, this time is called the
35timeout or margin. The simplest way to ping the watchdog is to write 35timeout or margin. The simplest way to ping the watchdog is to write
36some data to the device. So a very simple watchdog daemon would look 36some data to the device. So a very simple watchdog daemon would look
37like this: 37like this source file: see Documentation/watchdog/src/watchdog-simple.c
38
39#include <stdlib.h>
40#include <fcntl.h>
41
42int main(int argc, const char *argv[]) {
43 int fd=open("/dev/watchdog",O_WRONLY);
44 if (fd==-1) {
45 perror("watchdog");
46 exit(1);
47 }
48 while(1) {
49 write(fd, "\0", 1);
50 sleep(10);
51 }
52}
53 38
54A more advanced driver could for example check that a HTTP server is 39A more advanced driver could for example check that a HTTP server is
55still responding before doing the write call to ping the watchdog. 40still responding before doing the write call to ping the watchdog.
@@ -110,7 +95,40 @@ current timeout using the GETTIMEOUT ioctl.
110 ioctl(fd, WDIOC_GETTIMEOUT, &timeout); 95 ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
111 printf("The timeout is %d seconds\n", timeout); 96 printf("The timeout is %d seconds\n", timeout);
112 97
113Envinronmental monitoring: 98Pretimeouts:
99
100Some watchdog timers can be set to have a trigger go off before the
101actual time they will reset the system. This can be done with an NMI,
102interrupt, or other mechanism. This allows Linux to record useful
103information (like panic information and kernel coredumps) before it
104resets.
105
106 pretimeout = 10;
107 ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);
108
109Note that the pretimeout is the number of seconds before the time
110when the timeout will go off. It is not the number of seconds until
111the pretimeout. So, for instance, if you set the timeout to 60 seconds
112and the pretimeout to 10 seconds, the pretimeout will go off in 50
113seconds. Setting a pretimeout to zero disables it.
114
115There is also a get function for getting the pretimeout:
116
117 ioctl(fd, WDIOC_GETPRETIMEOUT, &timeout);
118 printf("The pretimeout is %d seconds\n", timeout);
119
120Not all watchdog drivers will support a pretimeout.
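
The arithmetic in the note above can be sketched as a small helper. This
is only an illustration: seconds_until_pretimeout() is a hypothetical
name, not part of the watchdog API, and rejecting a pretimeout that is
not smaller than the timeout is an assumption of the sketch.

```c
/*
 * Seconds after the watchdog timer is (re)started at which the
 * pretimeout fires, given the values passed to WDIOC_SETTIMEOUT and
 * WDIOC_SETPRETIMEOUT.
 *
 * Returns -1 when no pretimeout fires. A pretimeout of zero means
 * "disabled". Rejecting pretimeout >= timeout is an assumption of
 * this sketch; the text does not say how drivers treat that case.
 */
int seconds_until_pretimeout(int timeout, int pretimeout)
{
	if (pretimeout <= 0 || pretimeout >= timeout)
		return -1;
	return timeout - pretimeout;
}
```

With a timeout of 60 and a pretimeout of 10 the helper returns 50,
matching the example in the text.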
121
122Get the number of seconds before reboot:
123
124Some watchdog drivers have the ability to report the remaining time
125before the system will reboot. The WDIOC_GETTIMELEFT is the ioctl
126that returns the number of seconds before reboot.
127
128 ioctl(fd, WDIOC_GETTIMELEFT, &timeleft);
129 printf("The time left is %d seconds\n", timeleft);
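
Because only some drivers support this ioctl, a caller will usually want
to check the return value. A minimal sketch follows; watchdog_time_left()
is a hypothetical wrapper name, and fd is assumed to be a descriptor
obtained by opening /dev/watchdog (which, as noted earlier, also starts
the watchdog).

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Store the number of seconds before reboot in *seconds and return 0.
 * Return -1 and leave *seconds untouched if the ioctl fails, for
 * example because the driver does not implement WDIOC_GETTIMELEFT
 * or because fd is not a valid descriptor.
 */
int watchdog_time_left(int fd, int *seconds)
{
	int timeleft;

	if (ioctl(fd, WDIOC_GETTIMELEFT, &timeleft) == -1)
		return -1;
	*seconds = timeleft;
	return 0;
}
```

On failure errno is set by ioctl(), so the caller can report the reason
with perror().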
130
131Environmental monitoring:
114 132
115All watchdog drivers are required to return more information about the system, 133All watchdog drivers are required to return more information about the system,
116some do temperature, fan and power level monitoring, some can tell you 134some do temperature, fan and power level monitoring, some can tell you
@@ -169,6 +187,10 @@ The watchdog saw a keepalive ping since it was last queried.
169 187
170 WDIOF_SETTIMEOUT Can set/get the timeout 188 WDIOF_SETTIMEOUT Can set/get the timeout
171 189
190The watchdog can do pretimeouts.
191
192 WDIOF_PRETIMEOUT Pretimeout (in seconds), get/set
193
172 194
173For those drivers that return any bits set in the option field, the 195For those drivers that return any bits set in the option field, the
174GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current 196GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
diff --git a/Documentation/watchdog/watchdog.txt b/Documentation/watchdog/watchdog.txt
index dffda29c8799..4b1ff69cc19a 100644
--- a/Documentation/watchdog/watchdog.txt
+++ b/Documentation/watchdog/watchdog.txt
@@ -65,28 +65,7 @@ The external event interfaces on the WDT boards are not currently supported.
65Minor numbers are however allocated for it. 65Minor numbers are however allocated for it.
66 66
67 67
68Example Watchdog Driver 68Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c
69-----------------------
70
71#include <stdio.h>
72#include <unistd.h>
73#include <fcntl.h>
74
75int main(int argc, const char *argv[])
76{
77 int fd=open("/dev/watchdog",O_WRONLY);
78 if(fd==-1)
79 {
80 perror("watchdog");
81 exit(1);
82 }
83 while(1)
84 {
85 write(fd,"\0",1);
86 fsync(fd);
87 sleep(10);
88 }
89}
90 69
91 70
92Contact Information 71Contact Information
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index f2cd6ef53ff3..74b77f9e91bc 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -199,12 +199,38 @@ IOMMU
199 allowed overwrite iommu off workarounds for specific chipsets. 199 allowed overwrite iommu off workarounds for specific chipsets.
200 soft Use software bounce buffering (default for Intel machines) 200 soft Use software bounce buffering (default for Intel machines)
201 noaperture Don't touch the aperture for AGP. 201 noaperture Don't touch the aperture for AGP.
202 allowdac Allow DMA >4GB
203 When off, all DMA >4GB is forced through an IOMMU or bounce
204 buffering.
205 nodac Forbid DMA >4GB
206 panic Always panic when IOMMU overflows
202 207
203 swiotlb=pages[,force] 208 swiotlb=pages[,force]
204 209
205 pages Prereserve that many 128K pages for the software IO bounce buffering. 210 pages Prereserve that many 128K pages for the software IO bounce buffering.
206 force Force all IO through the software TLB. 211 force Force all IO through the software TLB.
207 212
213 calgary=[64k,128k,256k,512k,1M,2M,4M,8M]
214 calgary=[translate_empty_slots]
215 calgary=[disable=<PCI bus number>]
216
217 64k,...,8M - Set the size of each PCI slot's translation table
218 when using the Calgary IOMMU. This is the size of the translation
219 table itself in main memory. The smallest table, 64k, covers an IO
220 space of 32MB; the largest, 8MB table, can cover an IO space of
221 4GB. Normally the kernel will make the right choice by itself.
222
223 translate_empty_slots - Enable translation even on slots that have
224 no devices attached to them, in case a device will be hotplugged
225 in the future.
226
227 disable=<PCI bus number> - Disable translation on a given PHB. For
228 example, the built-in graphics adapter resides on the first bridge
229 (PCI bus number 0); if translation (isolation) is enabled on this
230 bridge, X servers that access the hardware directly from user
231 space might stop working. Use this option if you have devices that
232 are accessed from userspace directly on some PCI host bridge.
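
The two sizes quoted above (64k table for 32MB, 8M table for 4GB) are
consistent with each table entry mapping one IO page. The sketch below
reproduces that ratio. The 8-byte entry size and 4K IO page size are
assumptions inferred from those two numbers, not stated in the text, and
calgary_io_space() is a hypothetical helper name.

```c
/* Assumed, not stated in the text: 8-byte entries, 4K IO pages. */
#define CALGARY_ENTRY_BYTES	8ULL
#define CALGARY_IO_PAGE_BYTES	4096ULL

/* IO space in bytes covered by a translation table of table_bytes. */
unsigned long long calgary_io_space(unsigned long long table_bytes)
{
	return table_bytes / CALGARY_ENTRY_BYTES * CALGARY_IO_PAGE_BYTES;
}
```

Under these assumptions every doubling of the table size doubles the
covered IO space.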
233
208Debugging 234Debugging
209 235
210 oops=panic Always panic on oopses. Default is to just kill the process, 236 oops=panic Always panic on oopses. Default is to just kill the process,
@@ -217,6 +243,20 @@ Debugging
217 pagefaulttrace Dump all page faults. Only useful for extreme debugging 243 pagefaulttrace Dump all page faults. Only useful for extreme debugging
218 and will create a lot of output. 244 and will create a lot of output.
219 245
246 call_trace=[old|both|newfallback|new]
247 old: use old inexact backtracer
248 new: use new exact dwarf2 unwinder
249 both: print entries from both
250 newfallback: use new unwinder but fall back to old if it gets
251 stuck (default)
252
220Misc 260Misc
221 261
222 noreplacement Don't replace instructions with more appropriate ones 262 noreplacement Don't replace instructions with more appropriate ones
diff --git a/Documentation/x86_64/kernel-stacks b/Documentation/x86_64/kernel-stacks
new file mode 100644
index 000000000000..bddfddd466ab
--- /dev/null
+++ b/Documentation/x86_64/kernel-stacks
@@ -0,0 +1,99 @@
1Most of the text from Keith Owens, hacked by AK
2
3x86_64 page size (PAGE_SIZE) is 4K.
4
5Like all other architectures, x86_64 has a kernel stack for every
6active thread. These thread stacks are THREAD_SIZE (2*PAGE_SIZE) big.
7These stacks contain useful data as long as a thread is alive or a
8zombie. While the thread is in user space the kernel stack is empty
9except for the thread_info structure at the bottom.
10
11In addition to the per thread stacks, there are specialized stacks
12associated with each cpu. These stacks are only used while the kernel
13is in control on that cpu; when a cpu returns to user space the
14specialized stacks contain no useful data. The main cpu stack is:
15
16* Interrupt stack. IRQSTACKSIZE
17
18 Used for external hardware interrupts. If this is the first external
19 hardware interrupt (i.e. not a nested hardware interrupt) then the
20 kernel switches from the current task to the interrupt stack. Like
21 the split thread and interrupt stacks on i386 (with CONFIG_4KSTACKS),
22 this gives more room for kernel interrupt processing without having
23 to increase the size of every per thread stack.
24
25 The interrupt stack is also used when processing a softirq.
26
27Switching to the kernel interrupt stack is done by software based on a
28per CPU interrupt nest counter. This is needed because x86-64 "IST"
29hardware stacks cannot nest without races.
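
The nest-counter rule can be modelled in a few lines. This is a
simplified sketch, not the kernel's entry code: struct cpu_model and both
functions are hypothetical names, and the counter here starts at zero
purely for illustration.

```c
#include <stdbool.h>

/* Hypothetical per-cpu state used only by this model. */
struct cpu_model {
	int irq_nest;		/* hardware interrupts in progress */
	bool on_irq_stack;	/* running on the interrupt stack? */
};

/*
 * Interrupt entry. Only the first (non-nested) interrupt switches to
 * the interrupt stack. Returns true if a switch happened.
 */
bool irq_enter_model(struct cpu_model *cpu)
{
	bool switched = (cpu->irq_nest == 0);

	cpu->irq_nest++;
	if (switched)
		cpu->on_irq_stack = true;
	return switched;
}

/* Interrupt exit. The outermost interrupt switches back. */
void irq_exit_model(struct cpu_model *cpu)
{
	cpu->irq_nest--;
	if (cpu->irq_nest == 0)
		cpu->on_irq_stack = false;
}
```

A nested interrupt finds the counter already non-zero and keeps running
on the interrupt stack it interrupted.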
30
31x86_64 also has a feature which is not available on i386, the ability
32to automatically switch to a new stack for designated events such as
33double fault or NMI, which makes it easier to handle these unusual
34events on x86_64. This feature is called the Interrupt Stack Table
35(IST). There can be up to 7 IST entries per cpu. The IST code is an
36index into the Task State Segment (TSS). The IST entries in the TSS
37point to dedicated stacks; each stack can be a different size.
38
39An IST is selected by a non-zero value in the IST field of an
40interrupt-gate descriptor. When an interrupt occurs and the hardware
41loads such a descriptor, the hardware automatically sets the new stack
42pointer based on the IST value, then invokes the interrupt handler. If
43software wants to allow nested IST interrupts then the handler must
44adjust the IST values on entry to and exit from the interrupt handler.
45(This is occasionally done, e.g. for debug exceptions.)
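
The selection step done by the hardware can be modelled as below. This
is a simplified sketch: struct tss_model and select_stack() are
hypothetical names, the model keeps only the IST slots of the TSS, and
it ignores the separate stack switch that happens on a privilege-level
change when the IST field is zero.

```c
#include <stdint.h>

#define IST_ENTRIES 7	/* up to 7 IST entries per cpu */

/* Hypothetical stand-in for the IST part of the TSS. */
struct tss_model {
	uint64_t ist[IST_ENTRIES];	/* stack pointer per IST entry */
};

/*
 * Return the stack pointer the handler starts on. ist_field is the
 * IST value from the interrupt-gate descriptor: 0 selects no IST
 * stack, 1..7 select tss->ist[0..6]. Out-of-range values are treated
 * like 0 here as a defensive choice of this sketch.
 */
uint64_t select_stack(const struct tss_model *tss, unsigned int ist_field,
		      uint64_t current_sp)
{
	if (ist_field == 0 || ist_field > IST_ENTRIES)
		return current_sp;
	return tss->ist[ist_field - 1];
}
```

Because the same slot always yields the same stack pointer, a second
event with the same IST value would start at the same address and
overwrite the first, which is why nesting requires the handler to adjust
the IST values itself.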
46
47Events with different IST codes (i.e. with different stacks) can be
48nested. For example, a debug interrupt can safely be interrupted by an
49NMI. arch/x86_64/kernel/entry.S::paranoidentry adjusts the stack
50pointers on entry to and exit from all IST events, in theory allowing
51IST events with the same code to be nested. However in most cases, the
52stack size allocated to an IST assumes no nesting for the same code.
53If that assumption is ever broken then the stacks will become corrupt.
54
55The currently assigned IST stacks are:
56
57* STACKFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
58
59 Used for interrupt 12 - Stack Fault Exception (#SS).
60
61 This allows the kernel to recover from invalid stack segments. Rarely
62 happens.
63
64* DOUBLEFAULT_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
65
66 Used for interrupt 8 - Double Fault Exception (#DF).
67
68 Invoked when handling an exception causes another exception. Happens
69 when the kernel is very confused (e.g. kernel stack pointer corrupt).
70 Using a separate stack allows the kernel to recover well enough in many
71 cases to still output an oops.
72
73* NMI_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
74
75 Used for non-maskable interrupts (NMI).
76
77 NMI can be delivered at any time, including when the kernel is in the
78 middle of switching stacks. Using IST for NMI events avoids making
79 assumptions about the previous state of the kernel stack.
80
81* DEBUG_STACK. DEBUG_STKSZ
82
83 Used for hardware debug interrupts (interrupt 1) and for software
84 debug interrupts (INT3).
85
86 When debugging a kernel, debug interrupts (both hardware and
87 software) can occur at any time. Using IST for these interrupts
88 avoids making assumptions about the previous state of the kernel
89 stack.
90
91* MCE_STACK. EXCEPTION_STKSZ (PAGE_SIZE).
92
93 Used for interrupt 18 - Machine Check Exception (#MC).
94
95 MCE can be delivered at any time, including when the kernel is in the
96 middle of switching stacks. Using IST for MCE events avoids making
97 assumptions about the previous state of the kernel stack.
98
99For more details see the Intel IA32 or AMD AMD64 architecture manuals.