author Ingo Molnar <mingo@elte.hu> 2008-07-18 13:31:12 -0400
committer Ingo Molnar <mingo@elte.hu> 2008-07-18 13:31:12 -0400
commit 3e370b29d35fb01bfb92c2814d6f79bf6a2cb970
tree 3b8fb467d60bfe6a34686f4abdc3a60050ba40a4 /Documentation
parent 88d1dce3a74367291f65a757fbdcaf17f042f30c
parent 5b664cb235e97afbf34db9c4d77f08ebd725335e
Merge branch 'linus' into x86/pci-ioapic-boot-irq-quirks
Conflicts:
	drivers/pci/quirks.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/testing/sysfs-block | 34
-rw-r--r-- Documentation/ABI/testing/sysfs-bus-css | 35
-rw-r--r-- Documentation/ABI/testing/sysfs-firmware-acpi | 127
-rw-r--r-- Documentation/ABI/testing/sysfs-firmware-memmap | 71
-rw-r--r-- Documentation/HOWTO | 2
-rw-r--r-- Documentation/IRQ-affinity.txt | 37
-rw-r--r-- Documentation/RCU/NMI-RCU.txt | 3
-rw-r--r-- Documentation/RCU/RTFP.txt | 108
-rw-r--r-- Documentation/RCU/checklist.txt | 89
-rw-r--r-- Documentation/RCU/torture.txt | 48
-rw-r--r-- Documentation/RCU/whatisRCU.txt | 58
-rw-r--r-- Documentation/block/data-integrity.txt | 327
-rw-r--r-- Documentation/cputopology.txt | 26
-rw-r--r-- Documentation/feature-removal-schedule.txt | 7
-rw-r--r-- Documentation/filesystems/configfs/configfs.txt | 10
-rw-r--r-- Documentation/filesystems/configfs/configfs_example.c | 14
-rw-r--r-- Documentation/filesystems/ext4.txt | 125
-rw-r--r-- Documentation/filesystems/gfs2-glocks.txt | 114
-rw-r--r-- Documentation/filesystems/proc.txt | 29
-rw-r--r-- Documentation/filesystems/ubifs.txt | 164
-rw-r--r-- Documentation/ftrace.txt | 403
-rw-r--r-- Documentation/i2c/busses/i2c-i810 | 47
-rw-r--r-- Documentation/i2c/busses/i2c-prosavage | 23
-rw-r--r-- Documentation/i2c/busses/i2c-savage4 | 26
-rw-r--r-- Documentation/i2c/chips/max6875 | 2
-rw-r--r-- Documentation/i2c/chips/pca9539 | 10
-rw-r--r-- Documentation/i2c/chips/pcf8574 | 12
-rw-r--r-- Documentation/i2c/chips/pcf8575 | 9
-rw-r--r-- Documentation/i2c/fault-codes | 127
-rw-r--r-- Documentation/i2c/smbus-protocol | 4
-rw-r--r-- Documentation/i2c/writing-clients | 51
-rw-r--r-- Documentation/ioctl-number.txt | 1
-rw-r--r-- Documentation/ioctl/hdio.txt | 7
-rw-r--r-- Documentation/kernel-parameters.txt | 28
-rw-r--r-- Documentation/kprobes.txt | 1
-rw-r--r-- Documentation/laptops/acer-wmi.txt | 2
-rw-r--r-- Documentation/powerpc/booting-without-of.txt | 1082
-rw-r--r-- Documentation/powerpc/bootwrapper.txt | 141
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/board.txt | 29
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm.txt | 67
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/brg.txt | 21
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/i2c.txt | 41
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/pic.txt | 18
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/usb.txt | 15
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/network.txt | 45
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt | 58
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/firmware.txt | 24
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/par_io.txt | 51
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/pincfg.txt | 60
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/ucc.txt | 70
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/usb.txt | 22
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/cpm_qe/serial.txt | 21
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/diu.txt | 18
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/dma.txt | 127
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/gtm.txt | 31
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/guts.txt | 25
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/i2c.txt | 32
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/lbc.txt | 35
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/msi-pic.txt | 36
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/sata.txt | 29
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/sec.txt | 68
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/spi.txt | 24
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/ssi.txt | 38
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/tsec.txt | 69
-rw-r--r-- Documentation/powerpc/dts-bindings/fsl/usb.txt | 59
-rw-r--r-- Documentation/scheduler/sched-domains.txt | 7
-rw-r--r-- Documentation/scheduler/sched-rt-group.txt | 4
-rw-r--r-- Documentation/scsi/aacraid.txt | 24
-rw-r--r-- Documentation/sound/alsa/ALSA-Configuration.txt | 17
-rw-r--r-- Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl | 4
-rw-r--r-- Documentation/tracers/mmiotrace.txt | 164
71 files changed, 3130 insertions, 1627 deletions
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 4bd9ea539129..44f52a4f5903 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -26,3 +26,37 @@ Description:
26 I/O statistics of partition <part>. The format is the 26 I/O statistics of partition <part>. The format is the
27 same as the above-written /sys/block/<disk>/stat 27 same as the above-written /sys/block/<disk>/stat
28 format. 28 format.
29
30
31What: /sys/block/<disk>/integrity/format
32Date: June 2008
33Contact: Martin K. Petersen <martin.petersen@oracle.com>
34Description:
35 Metadata format for integrity capable block device.
36 E.g. T10-DIF-TYPE1-CRC.
37
38
39What: /sys/block/<disk>/integrity/read_verify
40Date: June 2008
41Contact: Martin K. Petersen <martin.petersen@oracle.com>
42Description:
43 Indicates whether the block layer should verify the
44 integrity of read requests serviced by devices that
45 support sending integrity metadata.
46
47
48What: /sys/block/<disk>/integrity/tag_size
49Date: June 2008
50Contact: Martin K. Petersen <martin.petersen@oracle.com>
51Description:
52 Number of bytes of integrity tag space available per
53 512 bytes of data.
54
55
56What: /sys/block/<disk>/integrity/write_generate
57Date: June 2008
58Contact: Martin K. Petersen <martin.petersen@oracle.com>
59Description:
60 Indicates whether the block layer should automatically
61 generate checksums for write requests bound for
62 devices that support receiving integrity metadata.
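
The four integrity attributes described above can be read from userspace like any other sysfs file. The following is a minimal sketch, not part of the patch: the function name `show_integrity` and its optional second argument (an alternative sysfs root, useful for testing) are hypothetical, and it assumes each attribute file holds a single line of text.

```shell
#!/bin/bash
# Print the integrity attributes of one block device.
# $1: disk name (e.g. sda)
# $2: sysfs block root (hypothetical override; defaults to /sys/block)
show_integrity() {
    local disk="$1" root="${2:-/sys/block}"
    local dir="$root/$disk/integrity" attr

    # Devices without integrity support have no such directory.
    if [ ! -d "$dir" ]; then
        echo "$disk: no integrity directory"
        return 1
    fi

    # Attribute names are taken from the ABI entries above.
    for attr in format read_verify tag_size write_generate; do
        if [ -r "$dir/$attr" ]; then
            printf '%s: %s=%s\n' "$disk" "$attr" "$(cat "$dir/$attr")"
        fi
    done
}
```

Calling `show_integrity sda` on a system with an integrity-capable disk would print one `attr=value` line per attribute that is present.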
diff --git a/Documentation/ABI/testing/sysfs-bus-css b/Documentation/ABI/testing/sysfs-bus-css
new file mode 100644
index 000000000000..b585ec258a08
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-css
@@ -0,0 +1,35 @@
1What: /sys/bus/css/devices/.../type
2Date: March 2008
3Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
4 linux-s390@vger.kernel.org
5Description: Contains the subchannel type, as reported by the hardware.
6 This attribute is present for all subchannel types.
7
8What: /sys/bus/css/devices/.../modalias
9Date: March 2008
10Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
11 linux-s390@vger.kernel.org
12Description: Contains the module alias as reported with uevents.
13 It is of the format css:t<type> and present for all
14 subchannel types.
15
16What: /sys/bus/css/drivers/io_subchannel/.../chpids
17Date: December 2002
18Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
19 linux-s390@vger.kernel.org
20Description: Contains the ids of the channel paths used by this
21 subchannel, as reported by the channel subsystem
22 during subchannel recognition.
23 Note: This is an I/O-subchannel specific attribute.
24Users: s390-tools, HAL
25
26What: /sys/bus/css/drivers/io_subchannel/.../pimpampom
27Date: December 2002
28Contact: Cornelia Huck <cornelia.huck@de.ibm.com>
29 linux-s390@vger.kernel.org
30Description: Contains the PIM/PAM/POM values, as reported by the
31 channel subsystem when last queried by the common I/O
32 layer (this implies that this attribute is not neccessarily
33 in sync with the values current in the channel subsystem).
34 Note: This is an I/O-subchannel specific attribute.
35Users: s390-tools, HAL
diff --git a/Documentation/ABI/testing/sysfs-firmware-acpi b/Documentation/ABI/testing/sysfs-firmware-acpi
index 9470ed9afcc0..f27be7d1a49f 100644
--- a/Documentation/ABI/testing/sysfs-firmware-acpi
+++ b/Documentation/ABI/testing/sysfs-firmware-acpi
@@ -29,46 +29,46 @@ Description:
29 29
30 $ cd /sys/firmware/acpi/interrupts 30 $ cd /sys/firmware/acpi/interrupts
31 $ grep . * 31 $ grep . *
32 error:0 32 error: 0
33 ff_gbl_lock:0 33 ff_gbl_lock: 0 enable
34 ff_pmtimer:0 34 ff_pmtimer: 0 invalid
35 ff_pwr_btn:0 35 ff_pwr_btn: 0 enable
36 ff_rt_clk:0 36 ff_rt_clk: 2 disable
37 ff_slp_btn:0 37 ff_slp_btn: 0 invalid
38 gpe00:0 38 gpe00: 0 invalid
39 gpe01:0 39 gpe01: 0 enable
40 gpe02:0 40 gpe02: 108 enable
41 gpe03:0 41 gpe03: 0 invalid
42 gpe04:0 42 gpe04: 0 invalid
43 gpe05:0 43 gpe05: 0 invalid
44 gpe06:0 44 gpe06: 0 enable
45 gpe07:0 45 gpe07: 0 enable
46 gpe08:0 46 gpe08: 0 invalid
47 gpe09:174 47 gpe09: 0 invalid
48 gpe0A:0 48 gpe0A: 0 invalid
49 gpe0B:0 49 gpe0B: 0 invalid
50 gpe0C:0 50 gpe0C: 0 invalid
51 gpe0D:0 51 gpe0D: 0 invalid
52 gpe0E:0 52 gpe0E: 0 invalid
53 gpe0F:0 53 gpe0F: 0 invalid
54 gpe10:0 54 gpe10: 0 invalid
55 gpe11:60 55 gpe11: 0 invalid
56 gpe12:0 56 gpe12: 0 invalid
57 gpe13:0 57 gpe13: 0 invalid
58 gpe14:0 58 gpe14: 0 invalid
59 gpe15:0 59 gpe15: 0 invalid
60 gpe16:0 60 gpe16: 0 invalid
61 gpe17:0 61 gpe17: 1084 enable
62 gpe18:0 62 gpe18: 0 enable
63 gpe19:7 63 gpe19: 0 invalid
64 gpe1A:0 64 gpe1A: 0 invalid
65 gpe1B:0 65 gpe1B: 0 invalid
66 gpe1C:0 66 gpe1C: 0 invalid
67 gpe1D:0 67 gpe1D: 0 invalid
68 gpe1E:0 68 gpe1E: 0 invalid
69 gpe1F:0 69 gpe1F: 0 invalid
70 gpe_all:241 70 gpe_all: 1192
71 sci:241 71 sci: 1194
72 72
73 sci - The total number of times the ACPI SCI 73 sci - The total number of times the ACPI SCI
74 has claimed an interrupt. 74 has claimed an interrupt.
@@ -89,6 +89,13 @@ Description:
89 89
90 error - an interrupt that can't be accounted for above. 90 error - an interrupt that can't be accounted for above.
91 91
92 invalid: it's either a wakeup GPE or a GPE/Fixed Event that
93 doesn't have an event handler.
94
95 disable: the GPE/Fixed Event is valid but disabled.
96
97 enable: the GPE/Fixed Event is valid and enabled.
98
92 Root has permission to clear any of these counters. Eg. 99 Root has permission to clear any of these counters. Eg.
93 # echo 0 > gpe11 100 # echo 0 > gpe11
94 101
@@ -97,3 +104,43 @@ Description:
97 104
98 None of these counters has an effect on the function 105 None of these counters has an effect on the function
99 of the system, they are simply statistics. 106 of the system, they are simply statistics.
107
108 Besides this, user can also write specific strings to these files
109 to enable/disable/clear ACPI interrupts in user space, which can be
110 used to debug some ACPI interrupt storm issues.
111
112 Note that only writing to VALID GPE/Fixed Event is allowed,
113 i.e. user can only change the status of runtime GPE and
114 Fixed Event with event handler installed.
115
116 Let's take power button fixed event for example, please kill acpid
117 and other user space applications so that the machine won't shutdown
118 when pressing the power button.
119 # cat ff_pwr_btn
120 0
121 # press the power button for 3 times;
122 # cat ff_pwr_btn
123 3
124 # echo disable > ff_pwr_btn
125 # cat ff_pwr_btn
126 disable
127 # press the power button for 3 times;
128 # cat ff_pwr_btn
129 disable
130 # echo enable > ff_pwr_btn
131 # cat ff_pwr_btn
132 4
133 /*
134 * this is because the status bit is set even if the enable bit is cleared,
135 * and it triggers an ACPI fixed event when the enable bit is set again
136 */
137 # press the power button for 3 times;
138 # cat ff_pwr_btn
139 7
140 # echo disable > ff_pwr_btn
141 # press the power button for 3 times;
142 # echo clear > ff_pwr_btn /* clear the status bit */
143 # echo disable > ff_pwr_btn
144 # cat ff_pwr_btn
145 7
146
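
When chasing an interrupt storm, it can help to list only the enabled events with a high count. The sketch below assumes each file holds a counter followed by an optional status word, as in the sample `grep . *` output above; the function name `acpi_busy_events` and its arguments are hypothetical, and files whose first field is not numeric are skipped.

```shell
#!/bin/bash
# List enabled ACPI events whose counter is at least a given minimum.
# $1: directory to scan (defaults to /sys/firmware/acpi/interrupts)
# $2: minimum count (defaults to 1)
acpi_busy_events() {
    local dir="${1:-/sys/firmware/acpi/interrupts}" min="${2:-1}"
    local f count status

    for f in "$dir"/*; do
        [ -r "$f" ] || continue
        count='' status=''
        # Assumed layout: "<count> <status>"; status may be absent
        # (e.g. for gpe_all and sci).
        read -r count status < "$f" || true
        case "$count" in
            ''|*[!0-9]*) continue ;;   # skip non-numeric content
        esac
        if [ "$status" = enable ] && [ "$count" -ge "$min" ]; then
            printf '%s %s\n' "${f##*/}" "$count"
        fi
    done
}
```

For instance, `acpi_busy_events /sys/firmware/acpi/interrupts 1000` would report only enabled events that have fired at least 1000 times.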
diff --git a/Documentation/ABI/testing/sysfs-firmware-memmap b/Documentation/ABI/testing/sysfs-firmware-memmap
new file mode 100644
index 000000000000..0d99ee6ae02e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-memmap
@@ -0,0 +1,71 @@
1What: /sys/firmware/memmap/
2Date: June 2008
3Contact: Bernhard Walle <bwalle@suse.de>
4Description:
5 On all platforms, the firmware provides a memory map which the
6 kernel reads. The resources from that memory map are registered
7 in the kernel resource tree and exposed to userspace via
8 /proc/iomem (together with other resources).
9
10 However, on most architectures that firmware-provided memory
11 map is modified afterwards by the kernel itself, either because
12 the kernel merges that memory map with other information or
13 just because the user overwrites that memory map via command
14 line.
15
16 kexec needs the raw firmware-provided memory map to set up the
17 parameter segment of the kernel that should be booted with
18 kexec. Also, the raw memory map is useful for debugging. For
19 that reason, /sys/firmware/memmap is an interface that provides
20 the raw memory map to userspace.
21
22 The structure is as follows: Under /sys/firmware/memmap there
23 are subdirectories with the number of the entry as their name:
24
25 /sys/firmware/memmap/0
26 /sys/firmware/memmap/1
27 /sys/firmware/memmap/2
28 /sys/firmware/memmap/3
29 ...
30
31 The maximum depends on the number of memory map entries provided
32 by the firmware. The order is just the order that the firmware
33 provides.
34
35 Each directory contains three files:
36
37 start : The start address (as hexadecimal number with the
38 '0x' prefix).
39 end : The end address, inclusive (regardless of whether the
40 firmware provides inclusive or exclusive ranges).
41 type : Type of the entry as string. See below for a list of
42 valid types.
43
44 So, for example:
45
46 /sys/firmware/memmap/0/start
47 /sys/firmware/memmap/0/end
48 /sys/firmware/memmap/0/type
49 /sys/firmware/memmap/1/start
50 ...
51
52 Currently the following types exist:
53
54 - System RAM
55 - ACPI Tables
56 - ACPI Non-volatile Storage
57 - reserved
58
59 The following shell snippet can be used to display that memory
60 map in a human-readable format:
61
62 -------------------- 8< ----------------------------------------
63 #!/bin/bash
64 cd /sys/firmware/memmap
65 for dir in * ; do
66 start=$(cat $dir/start)
67 end=$(cat $dir/end)
68 type=$(cat $dir/type)
69 printf "%016x-%016x (%s)\n" $start $((end + 1)) "$type"
70 done
71 -------------------- >8 ----------------------------------------
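
Because `end` is inclusive, the size of one entry is `end - start + 1`. As a rough sketch under that assumption, the hypothetical helper `memmap_bytes` below totals the bytes of all entries of one type; its optional second argument (an alternative root directory) is not part of the interface and exists only to make the function testable.

```shell
#!/bin/bash
# Sum the sizes of all firmware memory map entries of a given type.
# $1: entry type, exactly as the "type" file reports it (e.g. "System RAM")
# $2: memmap root (hypothetical override; defaults to /sys/firmware/memmap)
memmap_bytes() {
    local type="$1" root="${2:-/sys/firmware/memmap}"
    local dir start end total=0

    for dir in "$root"/*/; do
        [ -r "${dir}type" ] || continue
        [ "$(cat "${dir}type")" = "$type" ] || continue
        start=$(cat "${dir}start")
        end=$(cat "${dir}end")
        # Shell arithmetic accepts the 0x-prefixed values directly;
        # +1 because the end address is inclusive.
        total=$((total + end - start + 1))
    done
    echo "$total"
}
```

Running `memmap_bytes "System RAM"` prints the total in bytes as a decimal number.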
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 0291ade44c17..619e8caf30db 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -377,7 +377,7 @@ Bug Reporting
377bugzilla.kernel.org is where the Linux kernel developers track kernel 377bugzilla.kernel.org is where the Linux kernel developers track kernel
378bugs. Users are encouraged to report all bugs that they find in this 378bugs. Users are encouraged to report all bugs that they find in this
379tool. For details on how to use the kernel bugzilla, please see: 379tool. For details on how to use the kernel bugzilla, please see:
380 http://test.kernel.org/bugzilla/faq.html 380 http://bugzilla.kernel.org/page.cgi?id=faq.html
381 381
382The file REPORTING-BUGS in the main kernel source directory has a good 382The file REPORTING-BUGS in the main kernel source directory has a good
383template for how to report a possible kernel bug, and details what kind 383template for how to report a possible kernel bug, and details what kind
diff --git a/Documentation/IRQ-affinity.txt b/Documentation/IRQ-affinity.txt
index 938d7dd05490..b4a615b78403 100644
--- a/Documentation/IRQ-affinity.txt
+++ b/Documentation/IRQ-affinity.txt
@@ -1,17 +1,26 @@
1ChangeLog:
2 Started by Ingo Molnar <mingo@redhat.com>
3 Update by Max Krasnyansky <maxk@qualcomm.com>
1 4
2SMP IRQ affinity, started by Ingo Molnar <mingo@redhat.com> 5SMP IRQ affinity
3
4 6
5/proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted 7/proc/irq/IRQ#/smp_affinity specifies which target CPUs are permitted
6for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed 8for a given IRQ source. It's a bitmask of allowed CPUs. It's not allowed
7to turn off all CPUs, and if an IRQ controller does not support IRQ 9to turn off all CPUs, and if an IRQ controller does not support IRQ
8affinity then the value will not change from the default 0xffffffff. 10affinity then the value will not change from the default 0xffffffff.
9 11
12/proc/irq/default_smp_affinity specifies the default affinity mask that applies
13to all non-active IRQs. Once an IRQ is allocated/activated, its affinity bitmask
14will be set to the default mask. It can then be changed as described above.
15The default mask is 0xffffffff.
16
10Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting 17Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
11the IRQ to CPU4-7 (this is an 8-CPU SMP box): 18it to CPU4-7 (this is an 8-CPU SMP box):
12 19
20[root@moon 44]# cd /proc/irq/44
13[root@moon 44]# cat smp_affinity 21[root@moon 44]# cat smp_affinity
14ffffffff 22ffffffff
23
15[root@moon 44]# echo 0f > smp_affinity 24[root@moon 44]# echo 0f > smp_affinity
16[root@moon 44]# cat smp_affinity 25[root@moon 44]# cat smp_affinity
170000000f 260000000f
@@ -21,17 +30,27 @@ PING hell (195.4.7.3): 56 data bytes
21--- hell ping statistics --- 30--- hell ping statistics ---
226029 packets transmitted, 6027 packets received, 0% packet loss 316029 packets transmitted, 6027 packets received, 0% packet loss
23round-trip min/avg/max = 0.1/0.1/0.4 ms 32round-trip min/avg/max = 0.1/0.1/0.4 ms
24[root@moon 44]# cat /proc/interrupts | grep 44: 33[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
25 44: 0 1785 1785 1783 1783 1 34 CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
261 0 IO-APIC-level eth1 35 44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1
36
37As can be seen from the line above, IRQ44 was delivered only to the first four
38processors (0-3).
39Now let's restrict that IRQ to CPU(4-7).
40
27[root@moon 44]# echo f0 > smp_affinity 41[root@moon 44]# echo f0 > smp_affinity
42[root@moon 44]# cat smp_affinity
43000000f0
28[root@moon 44]# ping -f h 44[root@moon 44]# ping -f h
29PING hell (195.4.7.3): 56 data bytes 45PING hell (195.4.7.3): 56 data bytes
30.. 46..
31--- hell ping statistics --- 47--- hell ping statistics ---
322779 packets transmitted, 2777 packets received, 0% packet loss 482779 packets transmitted, 2777 packets received, 0% packet loss
33round-trip min/avg/max = 0.1/0.5/585.4 ms 49round-trip min/avg/max = 0.1/0.5/585.4 ms
34[root@moon 44]# cat /proc/interrupts | grep 44: 50[root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:'
35 44: 1068 1785 1785 1784 1784 1069 1070 1069 IO-APIC-level eth1 51 CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
36[root@moon 44]# 52 44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1
53
54This time around IRQ44 was delivered only to the last four processors.
55i.e. counters for CPU0-3 did not change.
37 56
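
The masks `0f` and `f0` used in the example are simply bit N set for each allowed CPU N. A small sketch of that arithmetic follows; the helper name `cpu_range_mask` is hypothetical, and it assumes a contiguous CPU range on a machine with at most 32 CPUs, so the result fits in one 32-bit word.

```shell
#!/bin/bash
# Print the smp_affinity bitmask for a contiguous range of CPUs.
# $1: first CPU number, $2: last CPU number (inclusive, both below 32)
cpu_range_mask() {
    local first="$1" last="$2" mask=0 cpu

    cpu=$first
    while [ "$cpu" -le "$last" ]; do
        mask=$((mask | (1 << cpu)))   # set the bit for this CPU
        cpu=$((cpu + 1))
    done
    printf '%08x\n' "$mask"
}
```

`cpu_range_mask 0 3` prints `0000000f` and `cpu_range_mask 4 7` prints `000000f0`, matching the values read back from `smp_affinity` in the session above. As root, the result could be written with `cpu_range_mask 4 7 > /proc/irq/44/smp_affinity`.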
diff --git a/Documentation/RCU/NMI-RCU.txt b/Documentation/RCU/NMI-RCU.txt
index c64158ecde43..a6d32e65d222 100644
--- a/Documentation/RCU/NMI-RCU.txt
+++ b/Documentation/RCU/NMI-RCU.txt
@@ -93,6 +93,9 @@ Since NMI handlers disable preemption, synchronize_sched() is guaranteed
93not to return until all ongoing NMI handlers exit. It is therefore safe 93not to return until all ongoing NMI handlers exit. It is therefore safe
94to free up the handler's data as soon as synchronize_sched() returns. 94to free up the handler's data as soon as synchronize_sched() returns.
95 95
96Important note: for this to work, the architecture in question must
97invoke irq_enter() and irq_exit() on NMI entry and exit, respectively.
98
96 99
97Answer to Quick Quiz 100Answer to Quick Quiz
98 101
diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
index 39ad8f56783a..9f711d2df91b 100644
--- a/Documentation/RCU/RTFP.txt
+++ b/Documentation/RCU/RTFP.txt
@@ -52,6 +52,10 @@ of each iteration. Unfortunately, chaotic relaxation requires highly
52structured data, such as the matrices used in scientific programs, and 52structured data, such as the matrices used in scientific programs, and
53is thus inapplicable to most data structures in operating-system kernels. 53is thus inapplicable to most data structures in operating-system kernels.
54 54
55In 1992, Henry (now Alexia) Massalin completed a dissertation advising
56parallel programmers to defer processing when feasible to simplify
57synchronization. RCU makes extremely heavy use of this advice.
58
55In 1993, Jacobson [Jacobson93] verbally described what is perhaps the 59In 1993, Jacobson [Jacobson93] verbally described what is perhaps the
56simplest deferred-free technique: simply waiting a fixed amount of time 60simplest deferred-free technique: simply waiting a fixed amount of time
57before freeing blocks awaiting deferred free. Jacobson did not describe 61before freeing blocks awaiting deferred free. Jacobson did not describe
@@ -138,6 +142,13 @@ blocking in read-side critical sections appeared [PaulEMcKenney2006c],
138Robert Olsson described an RCU-protected trie-hash combination 142Robert Olsson described an RCU-protected trie-hash combination
139[RobertOlsson2006a]. 143[RobertOlsson2006a].
140 144
1452007 saw the journal version of the award-winning RCU paper from 2006
146[ThomasEHart2007a], as well as a paper demonstrating use of Promela
147and Spin to mechanically verify an optimization to Oleg Nesterov's
148QRCU [PaulEMcKenney2007QRCUspin], a design document describing
149preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part
150LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally,
151PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI].
141 152
142Bibtex Entries 153Bibtex Entries
143 154
@@ -202,6 +213,20 @@ Bibtex Entries
202,Year="1991" 213,Year="1991"
203} 214}
204 215
216@phdthesis{HMassalinPhD
217,author="H. Massalin"
218,title="Synthesis: An Efficient Implementation of Fundamental Operating
219System Services"
220,school="Columbia University"
221,address="New York, NY"
222,year="1992"
223,annotation="
224 Mondo optimizing compiler.
225 Wait-free stuff.
226 Good advice: defer work to avoid synchronization.
227"
228}
229
205@unpublished{Jacobson93 230@unpublished{Jacobson93
206,author="Van Jacobson" 231,author="Van Jacobson"
207,title="Avoid Read-Side Locking Via Delayed Free" 232,title="Avoid Read-Side Locking Via Delayed Free"
@@ -635,3 +660,86 @@ Revised:
635" 660"
636} 661}
637 662
663@unpublished{PaulEMcKenney2007PreemptibleRCU
664,Author="Paul E. McKenney"
665,Title="The design of preemptible read-copy-update"
666,month="October"
667,day="8"
668,year="2007"
669,note="Available:
670\url{http://lwn.net/Articles/253651/}
671[Viewed October 25, 2007]"
672,annotation="
673 LWN article describing the design of preemptible RCU.
674"
675}
676
677########################################################################
678#
679# "What is RCU?" LWN series.
680#
681
682@unpublished{PaulEMcKenney2007WhatIsRCUFundamentally
683,Author="Paul E. McKenney and Jonathan Walpole"
684,Title="What is {RCU}, Fundamentally?"
685,month="December"
686,day="17"
687,year="2007"
688,note="Available:
689\url{http://lwn.net/Articles/262464/}
690[Viewed December 27, 2007]"
691,annotation="
692 Lays out the three basic components of RCU: (1) publish-subscribe,
693 (2) wait for pre-existing readers to complete, and (3) maintain
694 multiple versions.
695"
696}
697
698@unpublished{PaulEMcKenney2008WhatIsRCUUsage
699,Author="Paul E. McKenney"
700,Title="What is {RCU}? Part 2: Usage"
701,month="January"
702,day="4"
703,year="2008"
704,note="Available:
705\url{http://lwn.net/Articles/263130/}
706[Viewed January 4, 2008]"
707,annotation="
708 Lays out six uses of RCU:
709 1. RCU is a Reader-Writer Lock Replacement
710 2. RCU is a Restricted Reference-Counting Mechanism
711 3. RCU is a Bulk Reference-Counting Mechanism
712 4. RCU is a Poor Man's Garbage Collector
713 5. RCU is a Way of Providing Existence Guarantees
714 6. RCU is a Way of Waiting for Things to Finish
715"
716}
717
718@unpublished{PaulEMcKenney2008WhatIsRCUAPI
719,Author="Paul E. McKenney"
720,Title="{RCU} part 3: the {RCU} {API}"
721,month="January"
722,day="17"
723,year="2008"
724,note="Available:
725\url{http://lwn.net/Articles/264090/}
726[Viewed January 10, 2008]"
727,annotation="
728 Gives an overview of the Linux-kernel RCU API and a brief annotated RCU
729 bibliography.
730"
731}
732
733@article{DinakarGuniguntala2008IBMSysJ
734,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole"
735,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}"
736,Year="2008"
737,Month="April"
738,journal="IBM Systems Journal"
739,volume="47"
740,number="2"
741,pages="@@-@@"
742,annotation="
743 RCU, realtime RCU, sleepable RCU, performance.
744"
745}
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 42b01bc2e1b4..cf5562cbe356 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -13,10 +13,13 @@ over a rather long period of time, but improvements are always welcome!
13 detailed performance measurements show that RCU is nonetheless 13 detailed performance measurements show that RCU is nonetheless
14 the right tool for the job. 14 the right tool for the job.
15 15
16 The other exception would be where performance is not an issue, 16 Another exception is where performance is not an issue, and RCU
17 and RCU provides a simpler implementation. An example of this 17 provides a simpler implementation. An example of this situation
18 situation is the dynamic NMI code in the Linux 2.6 kernel, 18 is the dynamic NMI code in the Linux 2.6 kernel, at least on
19 at least on architectures where NMIs are rare. 19 architectures where NMIs are rare.
20
21 Yet another exception is where the low real-time latency of RCU's
22 read-side primitives is critically important.
20 23
211. Does the update code have proper mutual exclusion? 241. Does the update code have proper mutual exclusion?
22 25
@@ -39,9 +42,10 @@ over a rather long period of time, but improvements are always welcome!
39 42
402. Do the RCU read-side critical sections make proper use of 432. Do the RCU read-side critical sections make proper use of
41 rcu_read_lock() and friends? These primitives are needed 44 rcu_read_lock() and friends? These primitives are needed
42 to suppress preemption (or bottom halves, in the case of 45 to prevent grace periods from ending prematurely, which
43 rcu_read_lock_bh()) in the read-side critical sections, 46 could result in data being unceremoniously freed out from
44 and are also an excellent aid to readability. 47 under your read-side code, which can greatly increase the
48 actuarial risk of your kernel.
45 49
46 As a rough rule of thumb, any dereference of an RCU-protected 50 As a rough rule of thumb, any dereference of an RCU-protected
47 pointer must be covered by rcu_read_lock() or rcu_read_lock_bh() 51 pointer must be covered by rcu_read_lock() or rcu_read_lock_bh()
@@ -54,15 +58,30 @@ over a rather long period of time, but improvements are always welcome!
54 be running while updates are in progress. There are a number 58 be running while updates are in progress. There are a number
55 of ways to handle this concurrency, depending on the situation: 59 of ways to handle this concurrency, depending on the situation:
56 60
57 a. Make updates appear atomic to readers. For example, 61 a. Use the RCU variants of the list and hlist update
62 primitives to add, remove, and replace elements on an
63 RCU-protected list. Alternatively, use the RCU-protected
64 trees that have been added to the Linux kernel.
65
66 This is almost always the best approach.
67
68 b. Proceed as in (a) above, but also maintain per-element
69 locks (that are acquired by both readers and writers)
70 that guard per-element state. Of course, fields that
71 the readers refrain from accessing can be guarded by the
72 update-side lock.
73
74 This works quite well, also.
75
76 c. Make updates appear atomic to readers. For example,
58 pointer updates to properly aligned fields will appear 77 pointer updates to properly aligned fields will appear
59 atomic, as will individual atomic primitives. Operations 78 atomic, as will individual atomic primitives. Operations
60 performed under a lock and sequences of multiple atomic 79 performed under a lock and sequences of multiple atomic
61 primitives will -not- appear to be atomic. 80 primitives will -not- appear to be atomic.
62 81
63 This is almost always the best approach. 82 This can work, but is starting to get a bit tricky.
64 83
65 b. Carefully order the updates and the reads so that 84 d. Carefully order the updates and the reads so that
66 readers see valid data at all phases of the update. 85 readers see valid data at all phases of the update.
67 This is often more difficult than it sounds, especially 86 This is often more difficult than it sounds, especially
68 given modern CPUs' tendency to reorder memory references. 87 given modern CPUs' tendency to reorder memory references.
@@ -123,18 +142,22 @@ over a rather long period of time, but improvements are always welcome!
123 when publicizing a pointer to a structure that can 142 when publicizing a pointer to a structure that can
124 be traversed by an RCU read-side critical section. 143 be traversed by an RCU read-side critical section.
125 144
1265. If call_rcu(), or a related primitive such as call_rcu_bh(), 1455. If call_rcu(), or a related primitive such as call_rcu_bh() or
127 is used, the callback function must be written to be called 146 call_rcu_sched(), is used, the callback function must be
128 from softirq context. In particular, it cannot block. 147 written to be called from softirq context. In particular,
148 it cannot block.
129 149
1306. Since synchronize_rcu() can block, it cannot be called from 1506. Since synchronize_rcu() can block, it cannot be called from
131 any sort of irq context. 151 any sort of irq context. Ditto for synchronize_sched() and
152 synchronize_srcu().
132 153
1337. If the updater uses call_rcu(), then the corresponding readers 1547. If the updater uses call_rcu(), then the corresponding readers
134 must use rcu_read_lock() and rcu_read_unlock(). If the updater 155 must use rcu_read_lock() and rcu_read_unlock(). If the updater
135 uses call_rcu_bh(), then the corresponding readers must use 156 uses call_rcu_bh(), then the corresponding readers must use
136 rcu_read_lock_bh() and rcu_read_unlock_bh(). Mixing things up 157 rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater
137 will result in confusion and broken kernels. 158 uses call_rcu_sched(), then the corresponding readers must
159 disable preemption. Mixing things up will result in confusion
160 and broken kernels.
138 161
139 One exception to this rule: rcu_read_lock() and rcu_read_unlock() 162 One exception to this rule: rcu_read_lock() and rcu_read_unlock()
140 may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh() 163 may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
@@ -143,9 +166,9 @@ over a rather long period of time, but improvements are always welcome!
143 such cases is a must, of course! And the jury is still out on 166 such cases is a must, of course! And the jury is still out on
144 whether the increased speed is worth it. 167 whether the increased speed is worth it.
145 168
1468. Although synchronize_rcu() is a bit slower than is call_rcu(), 1698. Although synchronize_rcu() is slower than is call_rcu(), it
147 it usually results in simpler code. So, unless update 170 usually results in simpler code. So, unless update performance
148 performance is critically important or the updaters cannot block, 171 is critically important or the updaters cannot block,
149 synchronize_rcu() should be used in preference to call_rcu(). 172 synchronize_rcu() should be used in preference to call_rcu().
150 173
151 An especially important property of the synchronize_rcu() 174 An especially important property of the synchronize_rcu()
@@ -187,23 +210,23 @@ over a rather long period of time, but improvements are always welcome!
187 number of updates per grace period. 210 number of updates per grace period.
188 211
1899. All RCU list-traversal primitives, which include 2129. All RCU list-traversal primitives, which include
190 list_for_each_rcu(), list_for_each_entry_rcu(), 213 rcu_dereference(), list_for_each_rcu(), list_for_each_entry_rcu(),
191 list_for_each_continue_rcu(), and list_for_each_safe_rcu(), 214 list_for_each_continue_rcu(), and list_for_each_safe_rcu(),
192 must be within an RCU read-side critical section. RCU 215 must be either within an RCU read-side critical section or
216 must be protected by appropriate update-side locks. RCU
193 read-side critical sections are delimited by rcu_read_lock() 217 read-side critical sections are delimited by rcu_read_lock()
194 and rcu_read_unlock(), or by similar primitives such as 218 and rcu_read_unlock(), or by similar primitives such as
195 rcu_read_lock_bh() and rcu_read_unlock_bh(). 219 rcu_read_lock_bh() and rcu_read_unlock_bh().
196 220
197 Use of the _rcu() list-traversal primitives outside of an 221 The reason that it is permissible to use RCU list-traversal
198 RCU read-side critical section causes no harm other than 222 primitives when the update-side lock is held is that doing so
199 a slight performance degradation on Alpha CPUs. It can 223 can be quite helpful in reducing code bloat when common code is
200 also be quite helpful in reducing code bloat when common 224 shared between readers and updaters.
201 code is shared between readers and updaters.
202 225
20310. Conversely, if you are in an RCU read-side critical section, 22610. Conversely, if you are in an RCU read-side critical section,
204 you -must- use the "_rcu()" variants of the list macros. 227 and you don't hold the appropriate update-side lock, you -must-
205 Failing to do so will break Alpha and confuse people reading 228 use the "_rcu()" variants of the list macros. Failing to do so
206 your code. 229 will break Alpha and confuse people reading your code.
207 230
20811. Note that synchronize_rcu() -only- guarantees to wait until 23111. Note that synchronize_rcu() -only- guarantees to wait until
209 all currently executing rcu_read_lock()-protected RCU read-side 232 all currently executing rcu_read_lock()-protected RCU read-side
@@ -230,6 +253,14 @@ over a rather long period of time, but improvements are always welcome!
230 must use whatever locking or other synchronization is required 253 must use whatever locking or other synchronization is required
231 to safely access and/or modify that data structure. 254 to safely access and/or modify that data structure.
232 255
256 RCU callbacks are -usually- executed on the same CPU that executed
257 the corresponding call_rcu(), call_rcu_bh(), or call_rcu_sched(),
258 but are by -no- means guaranteed to be. For example, if a given
259 CPU goes offline while having an RCU callback pending, then that
260 RCU callback will execute on some surviving CPU. (If this was
261 not the case, a self-spawning RCU callback would prevent the
262 victim CPU from ever going offline.)
263
23314. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) 26414. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu())
234 may only be invoked from process context. Unlike other forms of 265 may only be invoked from process context. Unlike other forms of
235 RCU, it -is- permissible to block in an SRCU read-side critical 266 RCU, it -is- permissible to block in an SRCU read-side critical
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt
index 2967a65269d8..a342b6e1cc10 100644
--- a/Documentation/RCU/torture.txt
+++ b/Documentation/RCU/torture.txt
@@ -10,23 +10,30 @@ status messages via printk(), which can be examined via the dmesg
10command (perhaps grepping for "torture"). The test is started 10command (perhaps grepping for "torture"). The test is started
11when the module is loaded, and stops when the module is unloaded. 11when the module is loaded, and stops when the module is unloaded.
12 12
13However, actually setting this config option to "y" results in the system 13CONFIG_RCU_TORTURE_TEST_RUNNABLE
14running the test immediately upon boot, and ending only when the system 14
15is taken down. Normally, one will instead want to build the system 15It is also possible to specify CONFIG_RCU_TORTURE_TEST=y, which will
16with CONFIG_RCU_TORTURE_TEST=m and to use modprobe and rmmod to control 16result in the tests being loaded into the base kernel. In this case,
17the test, perhaps using a script similar to the one shown at the end of 17the CONFIG_RCU_TORTURE_TEST_RUNNABLE config option is used to specify
18this document. Note that you will need CONFIG_MODULE_UNLOAD in order 18whether the RCU torture tests are to be started immediately during
19to be able to end the test. 19boot or whether the /proc/sys/kernel/rcutorture_runnable file is used
20to enable them. This /proc file can be used to repeatedly pause and
21restart the tests, regardless of the initial state specified by the
22CONFIG_RCU_TORTURE_TEST_RUNNABLE config option.
23
24You will normally -not- want to start the RCU torture tests during boot
25(and thus the default is CONFIG_RCU_TORTURE_TEST_RUNNABLE=n), but doing
26this can sometimes be useful in finding boot-time bugs.
20 27
21 28
22MODULE PARAMETERS 29MODULE PARAMETERS
23 30
24This module has the following parameters: 31This module has the following parameters:
25 32
26nreaders This is the number of RCU reading threads supported. 33irqreaders Says to invoke RCU readers from irq level. This is currently
27 The default is twice the number of CPUs. Why twice? 34 done via timers. Defaults to "1" for variants of RCU that
28 To properly exercise RCU implementations with preemptible 35 permit this. (Or, more accurately, variants of RCU that do
29 read-side critical sections. 36 -not- permit this know to ignore this variable.)
30 37
31nfakewriters This is the number of RCU fake writer threads to run. Fake 38nfakewriters This is the number of RCU fake writer threads to run. Fake
32 writer threads repeatedly use the synchronous "wait for 39 writer threads repeatedly use the synchronous "wait for
@@ -37,6 +44,16 @@ nfakewriters This is the number of RCU fake writer threads to run. Fake
37 to trigger special cases caused by multiple writers, such as 44 to trigger special cases caused by multiple writers, such as
38 the synchronize_srcu() early return optimization. 45 the synchronize_srcu() early return optimization.
39 46
47nreaders This is the number of RCU reading threads supported.
48 The default is twice the number of CPUs. Why twice?
49 To properly exercise RCU implementations with preemptible
50 read-side critical sections.
51
52shuffle_interval
53 The number of seconds to keep the test threads affinitied
54 to a particular subset of the CPUs, defaults to 3 seconds.
55 Used in conjunction with test_no_idle_hz.
56
40stat_interval The number of seconds between output of torture 57stat_interval The number of seconds between output of torture
41 statistics (via printk()). Regardless of the interval, 58 statistics (via printk()). Regardless of the interval,
42 statistics are printed when the module is unloaded. 59 statistics are printed when the module is unloaded.
@@ -44,10 +61,11 @@ stat_interval The number of seconds between output of torture
44 be printed -only- when the module is unloaded, and this 61 be printed -only- when the module is unloaded, and this
45 is the default. 62 is the default.
46 63
47shuffle_interval 64stutter The length of time to run the test before pausing for this
48 The number of seconds to keep the test threads affinitied 65 same period of time. Defaults to "stutter=5", so as
49 to a particular subset of the CPUs, defaults to 5 seconds. 66 to run and pause for (roughly) five-second intervals.
50 Used in conjunction with test_no_idle_hz. 67 Specifying "stutter=0" causes the test to run continuously
68 without pausing, which is the old default behavior.
51 69
52test_no_idle_hz Whether or not to test the ability of RCU to operate in 70test_no_idle_hz Whether or not to test the ability of RCU to operate in
53 a kernel that disables the scheduling-clock interrupt to 71 a kernel that disables the scheduling-clock interrupt to
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index e0d6d99b8f9b..e04d643a9f57 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -1,3 +1,11 @@
1Please note that the "What is RCU?" LWN series is an excellent place
2to start learning about RCU:
3
41. What is RCU, Fundamentally? http://lwn.net/Articles/262464/
52. What is RCU? Part 2: Usage http://lwn.net/Articles/263130/
63. RCU part 3: the RCU API http://lwn.net/Articles/264090/
7
8
1What is RCU? 9What is RCU?
2 10
3RCU is a synchronization mechanism that was added to the Linux kernel 11RCU is a synchronization mechanism that was added to the Linux kernel
@@ -772,26 +780,18 @@ Linux-kernel source code, but it helps to have a full list of the
772APIs, since there does not appear to be a way to categorize them 780APIs, since there does not appear to be a way to categorize them
773in docbook. Here is the list, by category. 781in docbook. Here is the list, by category.
774 782
775Markers for RCU read-side critical sections:
776
777 rcu_read_lock
778 rcu_read_unlock
779 rcu_read_lock_bh
780 rcu_read_unlock_bh
781 srcu_read_lock
782 srcu_read_unlock
783
784RCU pointer/list traversal: 783RCU pointer/list traversal:
785 784
786 rcu_dereference 785 rcu_dereference
786 list_for_each_entry_rcu
787 hlist_for_each_entry_rcu
788
787 list_for_each_rcu (to be deprecated in favor of 789 list_for_each_rcu (to be deprecated in favor of
788 list_for_each_entry_rcu) 790 list_for_each_entry_rcu)
789 list_for_each_entry_rcu
790 list_for_each_continue_rcu (to be deprecated in favor of new 791 list_for_each_continue_rcu (to be deprecated in favor of new
791 list_for_each_entry_continue_rcu) 792 list_for_each_entry_continue_rcu)
792 hlist_for_each_entry_rcu
793 793
794RCU pointer update: 794RCU pointer/list update:
795 795
796 rcu_assign_pointer 796 rcu_assign_pointer
797 list_add_rcu 797 list_add_rcu
@@ -799,16 +799,36 @@ RCU pointer update:
799 list_del_rcu 799 list_del_rcu
800 list_replace_rcu 800 list_replace_rcu
801 hlist_del_rcu 801 hlist_del_rcu
802 hlist_add_after_rcu
803 hlist_add_before_rcu
802 hlist_add_head_rcu 804 hlist_add_head_rcu
805 hlist_replace_rcu
806 list_splice_init_rcu()
803 807
804RCU grace period: 808RCU: Critical sections Grace period Barrier
809
810 rcu_read_lock synchronize_net rcu_barrier
811 rcu_read_unlock synchronize_rcu
812 call_rcu
813
814
815bh: Critical sections Grace period Barrier
816
817 rcu_read_lock_bh call_rcu_bh rcu_barrier_bh
818 rcu_read_unlock_bh
819
820
821sched: Critical sections Grace period Barrier
822
823 [preempt_disable] synchronize_sched rcu_barrier_sched
824 [and friends] call_rcu_sched
825
826
827SRCU: Critical sections Grace period Barrier
828
829 srcu_read_lock synchronize_srcu N/A
830 srcu_read_unlock
805 831
806 synchronize_net
807 synchronize_sched
808 synchronize_rcu
809 synchronize_srcu
810 call_rcu
811 call_rcu_bh
812 832
813See the comment headers in the source code (or the docbook generated 833See the comment headers in the source code (or the docbook generated
814from them) for more information. 834from them) for more information.
diff --git a/Documentation/block/data-integrity.txt b/Documentation/block/data-integrity.txt
new file mode 100644
index 000000000000..e9dc8d86adc7
--- /dev/null
+++ b/Documentation/block/data-integrity.txt
@@ -0,0 +1,327 @@
1----------------------------------------------------------------------
21. INTRODUCTION
3
4Modern filesystems feature checksumming of data and metadata to
5protect against data corruption. However, the detection of the
6corruption is done at read time which could potentially be months
7after the data was written. At that point the original data that the
8application tried to write is most likely lost.
9
10The solution is to ensure that the disk is actually storing what the
11application meant it to. Recent additions to both the SCSI family
12protocols (SBC Data Integrity Field, SCC protection proposal) as well
13as SATA/T13 (External Path Protection) try to remedy this by adding
14support for appending integrity metadata to an I/O. The integrity
15metadata (or protection information in SCSI terminology) includes a
16checksum for each sector as well as an incrementing counter that
17ensures the individual sectors are written in the right order and,
18for some protection schemes, that the I/O is written to the right
19place on disk.
20
21Current storage controllers and devices implement various protective
22measures, for instance checksumming and scrubbing. But these
23technologies are working in their own isolated domains or at best
24between adjacent nodes in the I/O path. The interesting thing about
25DIF and the other integrity extensions is that the protection format
26is well defined and every node in the I/O path can verify the
27integrity of the I/O and reject it if corruption is detected. This
28allows not only corruption prevention but also isolation of the point
29of failure.
30
31----------------------------------------------------------------------
322. THE DATA INTEGRITY EXTENSIONS
33
34As written, the protocol extensions only protect the path between
35controller and storage device. However, many controllers actually
36allow the operating system to interact with the integrity metadata
37(IMD). We have been working with several FC/SAS HBA vendors to enable
38the protection information to be transferred to and from their
39controllers.
40
41The SCSI Data Integrity Field works by appending 8 bytes of protection
42information to each sector. The data + integrity metadata is stored
43in 520 byte sectors on disk. Data + IMD are interleaved when
44transferred between the controller and target. The T13 proposal is
45similar.
46
47Because it is highly inconvenient for operating systems to deal with
48520 (and 4104) byte sectors, we approached several HBA vendors and
49encouraged them to allow separation of the data and integrity metadata
50scatter-gather lists.
51
52The controller will interleave the buffers on write and split them on
53read. This means that Linux can DMA the data buffers to and from
54host memory without changes to the page cache.
55
56Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
57is somewhat heavy to compute in software. Benchmarks found that
58calculating this checksum had a significant impact on system
59performance for a number of workloads. Some controllers allow a
60lighter-weight checksum to be used when interfacing with the operating
61system. Emulex, for instance, supports the TCP/IP checksum instead.
62The IP checksum received from the OS is converted to the 16-bit CRC
63when writing and vice versa. This allows the integrity metadata to be
64generated by Linux or the application at very low cost (comparable to
65software RAID5).
66
67The IP checksum is weaker than the CRC in terms of detecting bit
68errors. However, the strength is really in the separation of the data
69buffers and the integrity metadata. These two distinct buffers must
70match up for an I/O to complete.
71
72The separation of the data and integrity metadata buffers as well as
73the choice in checksums is referred to as the Data Integrity
74Extensions. As these extensions are outside the scope of the protocol
75bodies (T10, T13), Oracle and its partners are trying to standardize
76them within the Storage Networking Industry Association.
77
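To make the trade-off between the two checksums described in section 2
concrete, here is a minimal user-space sketch, not the kernel's
implementation. It assumes the 16-bit CRC is the T10-DIF CRC (generator
polynomial 0x8BB7, zero initial value) and that the "IP checksum" is the
usual 16-bit ones' complement sum. The function names are made up for
this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bitwise CRC-16 with the T10-DIF generator polynomial 0x8BB7
 * (initial value 0, no bit reflection, no final XOR). A real
 * implementation would use a lookup table or a CPU instruction.
 */
uint16_t crc16_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/*
 * 16-bit ones' complement Internet checksum: sum big-endian 16-bit
 * words, fold the carries back in, then invert. The 32-bit accumulator
 * is ample for sector-sized buffers.
 */
uint16_t ip_checksum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;

	assert(buf != NULL || len == 0);
	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}
	if (len)			/* odd trailing byte, zero padded */
		sum += (uint32_t)buf[0] << 8;
	while (sum >> 16)		/* fold carries into the low 16 bits */
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}
```

The ones' complement sum needs only additions per 16-bit word, which is
why it is cheap in software. It is also insensitive to the order of
those words, so swapping two 16-bit words goes undetected, whereas the
CRC catches it. This is one way in which the IP checksum is "weaker".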
78----------------------------------------------------------------------
793. KERNEL CHANGES
80
81The data integrity framework in Linux enables protection information
82to be pinned to I/Os and sent to/received from controllers that
83support it.
84
85The advantage to the integrity extensions in SCSI and SATA is that
86they enable us to protect the entire path from application to storage
87device. However, at the same time this is also the biggest
88disadvantage. It means that the protection information must be in a
89format that can be understood by the disk.
90
91Generally Linux/POSIX applications are agnostic to the intricacies of
92the storage devices they are accessing. The virtual filesystem switch
93and the block layer make things like hardware sector size and
94transport protocols completely transparent to the application.
95
96However, this level of detail is required when preparing the
97protection information to send to a disk. Consequently, the very
98concept of an end-to-end protection scheme is a layering violation.
99It is completely unreasonable for an application to be aware whether
100it is accessing a SCSI or SATA disk.
101
102The data integrity support implemented in Linux attempts to hide this
103from the application. As far as the application (and to some extent
104the kernel) is concerned, the integrity metadata is opaque information
105that's attached to the I/O.
106
107The current implementation allows the block layer to automatically
108generate the protection information for any I/O. Eventually the
109intent is to move the integrity metadata calculation to userspace for
110user data. Metadata and other I/O that originates within the kernel
111will still use the automatic generation interface.
112
113Some storage devices allow each hardware sector to be tagged with a
11416-bit value. The owner of this tag space is the owner of the block
115device, i.e. the filesystem in most cases. The filesystem can use
116this extra space to tag sectors as it sees fit. Because the tag
117space is limited, the block interface allows tagging bigger chunks by
118way of interleaving. This way, 8*16 bits of information can be
119attached to a typical 4KB filesystem block.
120
121This also means that applications such as fsck and mkfs will need
122access to manipulate the tags from user space. A passthrough
123interface for this is being worked on.
124
125
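The sector and tag-space arithmetic used in sections 2 and 3 can be
checked with a small user-space sketch. The struct below is only an
illustration of an 8-byte protection tuple (checksum, 16-bit
application tag, incrementing counter). The type and field names are
assumptions made for this example, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative layout of one 8-byte protection tuple. Names are
 * hypothetical; byte order on the wire is not modelled here.
 */
struct dif_tuple_sketch {
	uint16_t guard_tag;	/* checksum of the sector data */
	uint16_t app_tag;	/* 16-bit tag owned by the block device owner */
	uint32_t ref_tag;	/* incrementing counter */
};

/* Size of one sector once the tuple has been appended to the data. */
size_t protected_sector_size(size_t sector_size)
{
	return sector_size + sizeof(struct dif_tuple_sketch);
}

/* Tag bytes available when the tag is interleaved across a whole I/O. */
size_t tag_bytes_available(size_t io_bytes, size_t sector_size,
			   size_t tag_bytes_per_sector)
{
	assert(sector_size != 0);
	return (io_bytes / sector_size) * tag_bytes_per_sector;
}
```

With 512-byte sectors this gives 520 bytes per protected sector, and
4104 bytes for 4096-byte sectors. A 4KB filesystem block spanning eight
512-byte sectors with 2 tag bytes each yields 16 bytes, i.e. the
8*16 bits mentioned above.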
126----------------------------------------------------------------------
1274. BLOCK LAYER IMPLEMENTATION DETAILS
128
1294.1 BIO
130
131The data integrity patches add a new field to struct bio when
132CONFIG_BLK_DEV_INTEGRITY is enabled. bio->bi_integrity is a pointer
133to a struct bip which contains the bio integrity payload. Essentially
134a bip is a trimmed down struct bio which holds a bio_vec containing
135the integrity metadata and the required housekeeping information (bvec
136pool, vector count, etc.)
137
138A kernel subsystem can enable data integrity protection on a bio by
139calling bio_integrity_alloc(bio). This will allocate and attach the
140bip to the bio.
141
142Individual pages containing integrity metadata can subsequently be
143attached using bio_integrity_add_page().
144
145bio_free() will automatically free the bip.
146
147
1484.2 BLOCK DEVICE
149
150Because the format of the protection data is tied to the physical
151disk, each block device has been extended with a block integrity
152profile (struct blk_integrity). This optional profile is registered
153with the block layer using blk_integrity_register().
154
155The profile contains callback functions for generating and verifying
156the protection data, as well as getting and setting application tags.
157The profile also contains a few constants to aid in completing,
158merging and splitting the integrity metadata.
159
160Layered block devices will need to pick a profile that's appropriate
161for all subdevices. blk_integrity_compare() can help with that. DM
162and MD linear, RAID0 and RAID1 are currently supported. RAID4/5/6
163will require extra work due to the application tag.
164
165
166----------------------------------------------------------------------
1675.0 BLOCK LAYER INTEGRITY API
168
1695.1 NORMAL FILESYSTEM
170
171 The normal filesystem is unaware that the underlying block device
172 is capable of sending/receiving integrity metadata. The IMD will
173 be automatically generated by the block layer at submit_bio() time
174 in case of a WRITE. A READ request will cause the I/O integrity
175 to be verified upon completion.
176
177 IMD generation and verification can be toggled using the
178
179 /sys/block/<bdev>/integrity/write_generate
180
181 and
182
183 /sys/block/<bdev>/integrity/read_verify
184
185 flags.
186
187
1885.2 INTEGRITY-AWARE FILESYSTEM
189
190 A filesystem that is integrity-aware can prepare I/Os with IMD
191 attached. It can also use the application tag space if this is
192 supported by the block device.
193
194
195 int bdev_integrity_enabled(block_device, int rw);
196
197 bdev_integrity_enabled() will return 1 if the block device
198 supports integrity metadata transfer for the data direction
199 specified in 'rw'.
200
201 bdev_integrity_enabled() honors the write_generate and
202 read_verify flags in sysfs and will respond accordingly.
203
204
205 int bio_integrity_prep(bio);
206
207 To generate IMD for WRITE and to set up buffers for READ, the
208 filesystem must call bio_integrity_prep(bio).
209
210 Prior to calling this function, the bio data direction and start
211 sector must be set, and the bio should have all data pages
212 added. It is up to the caller to ensure that the bio does not
213 change while I/O is in progress.
214
215 bio_integrity_prep() should only be called if
216 bio_integrity_enabled() returned 1.
217
218
219 int bio_integrity_tag_size(bio);
220
221 If the filesystem wants to use the application tag space it will
222 first have to find out how much storage space is available.
223 Because tag space is generally limited (usually 2 bytes per
224 sector regardless of sector size), the integrity framework
225 supports interleaving the information between the sectors in an
226 I/O.
227
228 Filesystems can call bio_integrity_tag_size(bio) to find out how
229 many bytes of storage are available for that particular bio.
230
231 Another option is bdev_get_tag_size(block_device) which will
232 return the number of available bytes per hardware sector.
233
234
235 int bio_integrity_set_tag(bio, void *tag_buf, len);
236
237 After a successful return from bio_integrity_prep(),
238 bio_integrity_set_tag() can be used to attach an opaque tag
239 buffer to a bio. Obviously this only makes sense if the I/O is
240 a WRITE.
241
242
243 int bio_integrity_get_tag(bio, void *tag_buf, len);
244
245 Similarly, at READ I/O completion time the filesystem can
246 retrieve the tag buffer using bio_integrity_get_tag().
247
248
2495.3 PASSING EXISTING INTEGRITY METADATA
250
251 Filesystems that either generate their own integrity metadata or
252 are capable of transferring IMD from user space can use the
253 following calls:
254
255
256 struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);
257
258 Allocates the bio integrity payload and hangs it off of the bio.
259 nr_pages indicate how many pages of protection data need to be
260 stored in the integrity bio_vec list (similar to bio_alloc()).
261
262 The integrity payload will be freed at bio_free() time.
263
264
265 int bio_integrity_add_page(bio, page, len, offset);
266
267 Attaches a page containing integrity metadata to an existing
268 bio. The bio must have an existing bip,
269 i.e. bio_integrity_alloc() must have been called. For a WRITE,
270 the integrity metadata in the pages must be in a format
271 understood by the target device with the notable exception that
272 the sector numbers will be remapped as the request traverses the
273 I/O stack. This implies that the pages added using this call
274 will be modified during I/O! The first reference tag in the
275 integrity metadata must have a value of bip->bip_sector.
276
277 Pages can be added using bio_integrity_add_page() as long as
278 there is room in the bip bio_vec array (nr_pages).
279
280 Upon completion of a READ operation, the attached pages will
281 contain the integrity metadata received from the storage device.
282 It is up to the receiver to process them and verify data
283 integrity upon completion.
284
285
2865.4 REGISTERING A BLOCK DEVICE AS CAPABLE OF EXCHANGING INTEGRITY
287 METADATA
288
289 To enable integrity exchange on a block device the gendisk must be
290 registered as capable:
291
292 int blk_integrity_register(gendisk, blk_integrity);
293
294 The blk_integrity struct is a template and should contain the
295 following:
296
297 static struct blk_integrity my_profile = {
298 .name = "STANDARDSBODY-TYPE-VARIANT-CSUM",
299 .generate_fn = my_generate_fn,
300 .verify_fn = my_verify_fn,
301 .get_tag_fn = my_get_tag_fn,
302 .set_tag_fn = my_set_tag_fn,
303 .tuple_size = sizeof(struct my_tuple_size),
304 .tag_size = <tag bytes per hw sector>,
305 };
306
307 'name' is a text string which will be visible in sysfs. This is
308 part of the userland API so choose it carefully and never change
309 it. The format is standards body-type-variant.
310 E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
311
312 'generate_fn' generates appropriate integrity metadata (for WRITE).
313
314 'verify_fn' verifies that the data buffer matches the integrity
315 metadata.
316
317 'tuple_size' must be set to match the size of the integrity
318 metadata per sector. I.e. 8 for DIF and EPP.
319
320 'tag_size' must be set to identify how many bytes of tag space
321 are available per hardware sector. For DIF this is either 2 or
322 0 depending on the value of the Control Mode Page ATO bit.
323
324 See 5.2 for a description of get_tag_fn and set_tag_fn.
325
326----------------------------------------------------------------------
3272007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
index b61cb9564023..bd699da24666 100644
--- a/Documentation/cputopology.txt
+++ b/Documentation/cputopology.txt
@@ -14,9 +14,8 @@ represent the thread siblings to cpu X in the same physical package;
14To implement it in an architecture-neutral way, a new source file, 14To implement it in an architecture-neutral way, a new source file,
15drivers/base/topology.c, is to export the 4 attributes. 15drivers/base/topology.c, is to export the 4 attributes.
16 16
17If one architecture wants to support this feature, it just needs to 17For an architecture to support this feature, it must define some of
18implement 4 defines, typically in file include/asm-XXX/topology.h. 18these macros in include/asm-XXX/topology.h:
19The 4 defines are:
20#define topology_physical_package_id(cpu) 19#define topology_physical_package_id(cpu)
21#define topology_core_id(cpu) 20#define topology_core_id(cpu)
22#define topology_thread_siblings(cpu) 21#define topology_thread_siblings(cpu)
@@ -25,17 +24,10 @@ The 4 defines are:
25The type of **_id is int. 24The type of **_id is int.
26The type of siblings is cpumask_t. 25The type of siblings is cpumask_t.
27 26
28To be consistent on all architectures, the 4 attributes should have 27To be consistent on all architectures, include/linux/topology.h
29default values if their values are unavailable. Below is the rule. 28provides default definitions for any of the above macros that are
301) physical_package_id: If cpu has no physical package id, -1 is the 29not defined by include/asm-XXX/topology.h:
31default value. 301) physical_package_id: -1
322) core_id: If cpu doesn't support multi-core, its core id is 0. 312) core_id: 0
333) thread_siblings: Just include itself, if the cpu doesn't support 323) thread_siblings: just the given CPU
34HT/multi-thread. 334) core_siblings: just the given CPU
354) core_siblings: Just include itself, if the cpu doesn't support
36multi-core and HT/Multi-thread.
37
38So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
39
40If an attribute isn't defined on an architecture, it won't be exported.
41
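The default-definition scheme described in the cputopology.txt change
above can be sketched in user space as follows. This is an illustration
under stated assumptions, not the contents of include/linux/topology.h:
cpumask_sketch_t stands in for cpumask_t, and the "architecture"
definition of topology_core_id() is invented for the example.

```c
#include <assert.h>

/* Stand-in for cpumask_t: one bit per CPU (hypothetical, user space only). */
typedef unsigned long cpumask_sketch_t;

/* An architecture header defines only the macros it can support. */
#define topology_core_id(cpu)	((cpu) % 4)	/* pretend: 4 cores per package */

/* Generic fallbacks, used only where the architecture gave no definition. */
#ifndef topology_physical_package_id
#define topology_physical_package_id(cpu)	((void)(cpu), -1)
#endif
#ifndef topology_core_id
#define topology_core_id(cpu)			((void)(cpu), 0)
#endif
#ifndef topology_thread_siblings
#define topology_thread_siblings(cpu)	((cpumask_sketch_t)1 << (cpu))
#endif
#ifndef topology_core_siblings
#define topology_core_siblings(cpu)	((cpumask_sketch_t)1 << (cpu))
#endif

/* Returns 1 if 'cpu' is the only member of its thread-sibling mask. */
int cpu_is_its_own_only_sibling(int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(cpumask_sketch_t)));
	return topology_thread_siblings(cpu) == ((cpumask_sketch_t)1 << cpu);
}
```

Because every fallback is wrapped in #ifndef, the architecture's own
topology_core_id() wins, while the three macros it left undefined pick
up the defaults listed above: -1, and a mask containing just the given
CPU.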
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 46ece3fba6f9..65a1482457a8 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -222,13 +222,6 @@ Who: Thomas Gleixner <tglx@linutronix.de>
222 222
223--------------------------- 223---------------------------
224 224
225What: i2c-i810, i2c-prosavage and i2c-savage4
226When: May 2008
227Why: These drivers are superseded by i810fb, intelfb and savagefb.
228Who: Jean Delvare <khali@linux-fr.org>
229
230---------------------------
231
232What (Why): 225What (Why):
233 - include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files 226 - include/linux/netfilter_ipv4/ipt_TOS.h ipt_tos.h header files
234 (superseded by xt_TOS/xt_tos target & match) 227 (superseded by xt_TOS/xt_tos target & match)
diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt
index 44c97e6accb2..15838d706ea2 100644
--- a/Documentation/filesystems/configfs/configfs.txt
+++ b/Documentation/filesystems/configfs/configfs.txt
@@ -233,10 +233,12 @@ accomplished via the group operations specified on the group's
233config_item_type. 233config_item_type.
234 234
235 struct configfs_group_operations { 235 struct configfs_group_operations {
236 struct config_item *(*make_item)(struct config_group *group, 236 int (*make_item)(struct config_group *group,
237 const char *name); 237 const char *name,
238 struct config_group *(*make_group)(struct config_group *group, 238 struct config_item **new_item);
239 const char *name); 239 int (*make_group)(struct config_group *group,
240 const char *name,
241 struct config_group **new_group);
240 int (*commit_item)(struct config_item *item); 242 int (*commit_item)(struct config_item *item);
241 void (*disconnect_notify)(struct config_group *group, 243 void (*disconnect_notify)(struct config_group *group,
242 struct config_item *item); 244 struct config_item *item);
diff --git a/Documentation/filesystems/configfs/configfs_example.c b/Documentation/filesystems/configfs/configfs_example.c
index 25151fd5c2c6..0b422acd470c 100644
--- a/Documentation/filesystems/configfs/configfs_example.c
+++ b/Documentation/filesystems/configfs/configfs_example.c
@@ -273,13 +273,13 @@ static inline struct simple_children *to_simple_children(struct config_item *ite
273 return item ? container_of(to_config_group(item), struct simple_children, group) : NULL; 273 return item ? container_of(to_config_group(item), struct simple_children, group) : NULL;
274} 274}
275 275
276static struct config_item *simple_children_make_item(struct config_group *group, const char *name) 276static int simple_children_make_item(struct config_group *group, const char *name, struct config_item **new_item)
277{ 277{
278 struct simple_child *simple_child; 278 struct simple_child *simple_child;
279 279
280 simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL); 280 simple_child = kzalloc(sizeof(struct simple_child), GFP_KERNEL);
281 if (!simple_child) 281 if (!simple_child)
282 return NULL; 282 return -ENOMEM;
283 283
284 284
285 config_item_init_type_name(&simple_child->item, name, 285 config_item_init_type_name(&simple_child->item, name,
@@ -287,7 +287,8 @@ static struct config_item *simple_children_make_item(struct config_group *group,
287 287
288 simple_child->storeme = 0; 288 simple_child->storeme = 0;
289 289
290 return &simple_child->item; 290 *new_item = &simple_child->item;
291 return 0;
291} 292}
292 293
293static struct configfs_attribute simple_children_attr_description = { 294static struct configfs_attribute simple_children_attr_description = {
@@ -359,20 +360,21 @@ static struct configfs_subsystem simple_children_subsys = {
359 * children of its own. 360 * children of its own.
360 */ 361 */
361 362
362static struct config_group *group_children_make_group(struct config_group *group, const char *name) 363static int group_children_make_group(struct config_group *group, const char *name, struct config_group **new_group)
363{ 364{
364 struct simple_children *simple_children; 365 struct simple_children *simple_children;
365 366
366 simple_children = kzalloc(sizeof(struct simple_children), 367 simple_children = kzalloc(sizeof(struct simple_children),
367 GFP_KERNEL); 368 GFP_KERNEL);
368 if (!simple_children) 369 if (!simple_children)
369 return NULL; 370 return -ENOMEM;
370 371
371 372
372 config_group_init_type_name(&simple_children->group, name, 373 config_group_init_type_name(&simple_children->group, name,
373 &simple_children_type); 374 &simple_children_type);
374 375
375 return &simple_children->group; 376 *new_group = &simple_children->group;
377 return 0;
376} 378}
377 379
378static struct configfs_attribute group_children_attr_description = { 380static struct configfs_attribute group_children_attr_description = {
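
The hunks above change simple_children_make_item() and group_children_make_group() from returning a pointer (NULL on failure) to returning an int status and handing the new object back through an out-parameter. The following is a minimal userspace sketch of that calling convention only. It assumes hypothetical stand-ins: struct item and make_item_sketch() are not configfs names, and calloc() takes the place of kzalloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel object; not a real configfs struct. */
struct item {
	int storeme;
};

/*
 * New-style constructor: the status travels in the return value and the
 * object travels through the out-parameter, so a caller can distinguish
 * -ENOMEM from other failures instead of seeing a bare NULL.
 */
static int make_item_sketch(struct item **new_item)
{
	struct item *it;

	assert(new_item != NULL);

	it = calloc(1, sizeof(*it));	/* userspace analogue of kzalloc() */
	if (!it)
		return -ENOMEM;

	it->storeme = 0;
	*new_item = it;			/* only written on success */
	return 0;
}
```

A caller checks the return value first and dereferences the out-parameter only when the result is 0.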
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 0c5086db8352..80e193d82e2e 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -13,72 +13,93 @@ Mailing list: linux-ext4@vger.kernel.org
131. Quick usage instructions: 131. Quick usage instructions:
14=========================== 14===========================
15 15
16 - Grab updated e2fsprogs from 16 - Compile and install the latest version of e2fsprogs (as of this
17 ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/ 17 writing version 1.41) from:
18 This is a patchset on top of e2fsprogs-1.39, which can be found at 18
19 http://sourceforge.net/project/showfiles.php?group_id=2406
20
21 or
22
19 ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/ 23 ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
20 24
21 - It's still mke2fs -j /dev/hda1 25 or grab the latest git repository from:
26
27 git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
28
29 - Create a new filesystem using the ext4dev filesystem type:
30
31 # mke2fs -t ext4dev /dev/hda1
32
33 Or configure an existing ext3 filesystem to support extents and set
34 the test_fs flag to indicate that it's ok for an in-development
35 filesystem to touch this filesystem:
22 36
23 - mount /dev/hda1 /wherever -t ext4dev 37 # tune2fs -O extents -E test_fs /dev/hda1
24 38
25 - To enable extents, 39 If the filesystem was created with 128 byte inodes, it can be
40 converted to use 256 byte for greater efficiency via:
26 41
27 mount /dev/hda1 /wherever -t ext4dev -o extents 42 # tune2fs -I 256 /dev/hda1
28 43
29 - The filesystem is compatible with the ext3 driver until you add a file 44 (Note: we currently do not have tools to convert an ext4dev
30 which has extents (ie: `mount -o extents', then create a file). 45 filesystem back to ext3; so please do not try this on production
46 filesystems.)
31 47
32 NOTE: The "extents" mount flag is temporary. It will soon go away and 48 - Mounting:
33 extents will be enabled by the "-o extents" flag to mke2fs or tune2fs 49
50 # mount -t ext4dev /dev/hda1 /wherever
34 51
35 - When comparing performance with other filesystems, remember that 52 - When comparing performance with other filesystems, remember that
36 ext3/4 by default offers higher data integrity guarantees than most. So 53 ext3/4 by default offers higher data integrity guarantees than most.
37 when comparing with a metadata-only journalling filesystem, use `mount -o 54 So when comparing with a metadata-only journalling filesystem, such
38 data=writeback'. And you might as well use `mount -o nobh' too along 55 as ext3, use `mount -o data=writeback'. And you might as well use
39 with it. Making the journal larger than the mke2fs default often helps 56 `mount -o nobh' too along with it. Making the journal larger than
40 performance with metadata-intensive workloads. 57 the mke2fs default often helps performance with metadata-intensive
58 workloads.
41 59
422. Features 602. Features
43=========== 61===========
44 62
452.1 Currently available 632.1 Currently available
46 64
47* ability to use filesystems > 16TB 65* ability to use filesystems > 16TB (e2fsprogs support not available yet)
48* extent format reduces metadata overhead (RAM, IO for access, transactions) 66* extent format reduces metadata overhead (RAM, IO for access, transactions)
49* extent format more robust in face of on-disk corruption due to magics, 67* extent format more robust in face of on-disk corruption due to magics,
50* internal redundancy in tree 68* internal redundancy in tree
51 69* improved file allocation (multi-block alloc)
522.1 Previously available, soon to be enabled by default by "mkefs.ext4": 70* fix 32000 subdirectory limit
53 71* nsec timestamps for mtime, atime, ctime, create time
54* dir_index and resize inode will be on by default 72* inode version field on disk (NFSv4, Lustre)
55* large inodes will be used by default for fast EAs, nsec timestamps, etc 73* reduced e2fsck time via uninit_bg feature
74* journal checksumming for robustness, performance
75* persistent file preallocation (e.g for streaming media, databases)
76* ability to pack bitmaps and inode tables into larger virtual groups via the
77 flex_bg feature
78* large file support
79* Inode allocation using large virtual block groups via flex_bg
80* delayed allocation
81* large block (up to pagesize) support
82* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
83 the ordering)
56 84
572.2 Candidate features for future inclusion 852.2 Candidate features for future inclusion
58 86
59There are several under discussion, whether they all make it in is 87* Online defrag (patches available but not well tested)
60partly a function of how much time everyone has to work on them: 88* reduced mke2fs time via lazy itable initialization in conjuction with
89 the uninit_bg feature (capability to do this is available in e2fsprogs
90 but a kernel thread to do lazy zeroing of unused inode table blocks
91 after filesystem is first mounted is required for safety)
61 92
62* improved file allocation (multi-block alloc, delayed alloc; basically done) 93There are several others under discussion, whether they all make it in is
63* fix 32000 subdirectory limit (patch exists, needs some e2fsck work) 94partly a function of how much time everyone has to work on them. Features like
64* nsec timestamps for mtime, atime, ctime, create time (patch exists, 95metadata checksumming have been discussed and planned for a bit but no patches
65 needs some e2fsck work) 96exist yet so I'm not sure they're in the near-term roadmap.
66* inode version field on disk (NFSv4, Lustre; prototype exists)
67* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
68* journal checksumming for robustness, performance (prototype exists)
69* persistent file preallocation (e.g for streaming media, databases)
70 97
71Features like metadata checksumming have been discussed and planned for 98The big performance win will come with mballoc, delalloc and flex_bg
72a bit but no patches exist yet so I'm not sure they're in the near-term 99grouping of bitmaps and inode tables. Some test results available here:
73roadmap.
74 100
75The big performance win will come with mballoc and delalloc. CFS has 101 - http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
76been using mballoc for a few years already with Lustre, and IBM + Bull 102 - http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html
77did a lot of benchmarking on it. The reason it isn't in the first set of
78patches is partly a manageability issue, and partly because it doesn't
79directly affect the on-disk format (outside of much better allocation)
80so it isn't critical to get into the first round of changes. I believe
81Alex is working on a new set of patches right now.
82 103
833. Options 1043. Options
84========== 105==========
@@ -222,9 +243,11 @@ stripe=n Number of filesystem blocks that mballoc will try
222 to use for allocation size and alignment. For RAID5/6 243 to use for allocation size and alignment. For RAID5/6
223 systems this should be the number of data 244 systems this should be the number of data
224 disks * RAID chunk size in file system blocks. 245 disks * RAID chunk size in file system blocks.
225 246delalloc (*) Deferring block allocation until write-out time.
247nodelalloc Disable delayed allocation. Blocks are allocated
248 when data is copied from user to page cache.
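
The stripe=n description above gives a formula: the number of data disks multiplied by the RAID chunk size, expressed in filesystem blocks. A small sketch of that arithmetic follows. The function name ext4_stripe_blocks() is hypothetical, and the sketch assumes the chunk size is a whole multiple of the block size. Parity disks are excluded by the caller: a RAID5 array of N disks has N-1 data disks and a RAID6 array has N-2.

```c
#include <assert.h>

/*
 * Value for the stripe= mount option:
 *   data disks * (RAID chunk size / filesystem block size)
 * Both sizes are given in bytes.
 */
static unsigned long ext4_stripe_blocks(unsigned int data_disks,
					unsigned long chunk_bytes,
					unsigned long block_bytes)
{
	assert(block_bytes > 0);
	assert(chunk_bytes % block_bytes == 0);

	return data_disks * (chunk_bytes / block_bytes);
}
```

For example, a 4-disk RAID5 array (3 data disks) with 64 KiB chunks and 4 KiB blocks gives 3 * 16 = 48, so the option would be stripe=48.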
226Data Mode 249Data Mode
227--------- 250=========
228There are 3 different data modes: 251There are 3 different data modes:
229 252
230* writeback mode 253* writeback mode
@@ -236,10 +259,10 @@ typically provide the best ext4 performance.
236 259
237* ordered mode 260* ordered mode
238In data=ordered mode, ext4 only officially journals metadata, but it logically 261In data=ordered mode, ext4 only officially journals metadata, but it logically
239groups metadata and data blocks into a single unit called a transaction. When 262groups metadata information related to data changes with the data blocks into a
240it's time to write the new metadata out to disk, the associated data blocks 263single unit called a transaction. When it's time to write the new metadata
241are written first. In general, this mode performs slightly slower than 264out to disk, the associated data blocks are written first. In general,
242writeback but significantly faster than journal mode. 265this mode performs slightly slower than writeback but significantly faster than journal mode.
243 266
244* journal mode 267* journal mode
245data=journal mode provides full data and metadata journaling. All new data is 268data=journal mode provides full data and metadata journaling. All new data is
@@ -247,7 +270,8 @@ written to the journal first, and then to its final location.
247In the event of a crash, the journal can be replayed, bringing both data and 270In the event of a crash, the journal can be replayed, bringing both data and
248metadata into a consistent state. This mode is the slowest except when data 271metadata into a consistent state. This mode is the slowest except when data
249needs to be read from and written to disk at the same time where it 272needs to be read from and written to disk at the same time where it
250outperforms all others modes. 273outperforms all other modes. Currently ext4 does not have delayed
274allocation support if this data journalling mode is selected.
251 275
252References 276References
253========== 277==========
@@ -256,7 +280,8 @@ kernel source: <file:fs/ext4/>
256 <file:fs/jbd2/> 280 <file:fs/jbd2/>
257 281
258programs: http://e2fsprogs.sourceforge.net/ 282programs: http://e2fsprogs.sourceforge.net/
259 http://ext2resize.sourceforge.net
260 283
261useful links: http://fedoraproject.org/wiki/ext3-devel 284useful links: http://fedoraproject.org/wiki/ext3-devel
262 http://www.bullopensource.org/ext4/ 285 http://www.bullopensource.org/ext4/
286 http://ext4.wiki.kernel.org/index.php/Main_Page
287 http://fedoraproject.org/wiki/Features/Ext4
diff --git a/Documentation/filesystems/gfs2-glocks.txt b/Documentation/filesystems/gfs2-glocks.txt
new file mode 100644
index 000000000000..4dae9a3840bf
--- /dev/null
+++ b/Documentation/filesystems/gfs2-glocks.txt
@@ -0,0 +1,114 @@
1 Glock internal locking rules
2 ------------------------------
3
4This documents the basic principles of the glock state machine
5internals. Each glock (struct gfs2_glock in fs/gfs2/incore.h)
6has two main (internal) locks:
7
8 1. A spinlock (gl_spin) which protects the internal state such
9 as gl_state, gl_target and the list of holders (gl_holders)
10 2. A non-blocking bit lock, GLF_LOCK, which is used to prevent other
11 threads from making calls to the DLM, etc. at the same time. If a
12 thread takes this lock, it must then call run_queue (usually via the
13 workqueue) when it releases it in order to ensure any pending tasks
14 are completed.
15
16The gl_holders list contains all the queued lock requests (not
17just the holders) associated with the glock. If there are any
18held locks, then they will be contiguous entries at the head
19of the list. Locks are granted in strictly the order that they
20are queued, except for those marked LM_FLAG_PRIORITY which are
21used only during recovery, and even then only for journal locks.
22
23There are three lock states that users of the glock layer can request,
24namely shared (SH), deferred (DF) and exclusive (EX). Those translate
25to the following DLM lock modes:
26
27Glock mode | DLM lock mode
28------------------------------
29 UN | IV/NL Unlocked (no DLM lock associated with glock) or NL
30 SH | PR (Protected read)
31 DF | CW (Concurrent write)
32 EX | EX (Exclusive)
33
34Thus DF is basically a shared mode which is incompatible with the "normal"
35shared lock mode, SH. In GFS2 the DF mode is used exclusively for direct I/O
36operations. The glocks are basically a lock plus some routines which deal
37with cache management. The following rules apply for the cache:
38
39Glock mode | Cache data | Cache Metadata | Dirty Data | Dirty Metadata
40--------------------------------------------------------------------------
41 UN | No | No | No | No
42 SH | Yes | Yes | No | No
43 DF | No | Yes | No | No
44 EX | Yes | Yes | Yes | Yes
45
46These rules are implemented using the various glock operations which
47are defined for each type of glock. Not all types of glocks use
48all the modes. Only inode glocks use the DF mode for example.
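
The two tables above (glock mode to DLM mode, and glock mode to cache rules) can be read as a single lookup keyed by mode. The sketch below encodes them that way. The enum and function names are illustrative only and are not the identifiers used in fs/gfs2; the table contents are copied from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; the real GFS2 constants are not shown in this file. */
enum glock_mode { GL_UN, GL_SH, GL_DF, GL_EX };

struct cache_rules {
	bool cache_data;
	bool cache_meta;
	bool dirty_data;
	bool dirty_meta;
};

/* One row per glock mode, in the same column order as the table. */
static const struct cache_rules rules[] = {
	[GL_UN] = { false, false, false, false },
	[GL_SH] = { true,  true,  false, false },
	[GL_DF] = { false, true,  false, false },
	[GL_EX] = { true,  true,  true,  true  },
};

/* DLM lock mode corresponding to each glock mode. */
static const char *dlm_mode_name(enum glock_mode m)
{
	switch (m) {
	case GL_UN: return "IV/NL";
	case GL_SH: return "PR";
	case GL_DF: return "CW";
	case GL_EX: return "EX";
	}
	return "?";
}

/* True if the mode permits any dirty data or metadata in the cache. */
static bool may_hold_dirty(enum glock_mode m)
{
	return rules[m].dirty_data || rules[m].dirty_meta;
}
```

Read this way, only EX may hold dirty data or metadata, and DF is the one mode that caches metadata but not data.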
49
50Table of glock operations and per type constants:
51
52Field | Purpose
53----------------------------------------------------------------------------
54go_xmote_th | Called before remote state change (e.g. to sync dirty data)
55go_xmote_bh | Called after remote state change (e.g. to refill cache)
56go_inval | Called if remote state change requires invalidating the cache
57go_demote_ok | Returns boolean value of whether it's ok to demote a glock
58 | (e.g. checks timeout, and that there is no cached data)
59go_lock | Called for the first local holder of a lock
60go_unlock | Called on the final local unlock of a lock
61go_dump | Called to print content of object for debugfs file, or on
62 | error to dump glock to the log.
63go_type | The type of the glock, LM_TYPE_.....
64go_min_hold_time | The minimum hold time
65
66The minimum hold time for each lock is the time after a remote lock
67grant for which we ignore remote demote requests. This is in order to
68prevent a situation where locks are being bounced around the cluster
69from node to node with none of the nodes making any progress. This
70tends to show up most with shared mmaped files which are being written
71to by multiple nodes. By delaying the demotion in response to a
72remote callback, that gives the userspace program time to make
73some progress before the pages are unmapped.
74
75There is a plan to try and remove the go_lock and go_unlock callbacks
76if possible, in order to try and speed up the fast path through the locking.
77Also, eventually we hope to make the glock "EX" mode locally shared
78such that any local locking will be done with the i_mutex as required
79rather than via the glock.
80
81Locking rules for glock operations:
82
83Operation | GLF_LOCK bit lock held | gl_spin spinlock held
84-----------------------------------------------------------------
85go_xmote_th | Yes | No
86go_xmote_bh | Yes | No
87go_inval | Yes | No
88go_demote_ok | Sometimes | Yes
89go_lock | Yes | No
90go_unlock | Yes | No
91go_dump | Sometimes | Yes
92
93N.B. Operations must not drop either the bit lock or the spinlock
94if it is held on entry. go_dump and go_demote_ok must never block.
95Note that go_dump will only be called if the glock's state
96indicates that it is caching uptodate data.
97
98Glock locking order within GFS2:
99
100 1. i_mutex (if required)
101 2. Rename glock (for rename only)
102 3. Inode glock(s)
103 (Parents before children, inodes at "same level" with same parent in
104 lock number order)
105 4. Rgrp glock(s) (for (de)allocation operations)
106 5. Transaction glock (via gfs2_trans_begin) for non-read operations
107 6. Page lock (always last, very important!)
108
109There are two glocks per inode. One deals with access to the inode
110itself (locking order as above), and the other, known as the iopen
111glock is used in conjunction with the i_nlink field in the inode to
112determine the lifetime of the inode in question. Locking of inodes
113is on a per-inode basis. Locking of rgrps is on a per rgrp basis.
114
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index dbc3c6a3650f..7f268f327d75 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -380,28 +380,35 @@ i386 and x86_64 platforms support the new IRQ vector displays.
380Of some interest is the introduction of the /proc/irq directory to 2.4. 380Of some interest is the introduction of the /proc/irq directory to 2.4.
381It could be used to set IRQ to CPU affinity, this means that you can "hook" an 381It could be used to set IRQ to CPU affinity, this means that you can "hook" an
382IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the 382IRQ to only one CPU, or to exclude a CPU of handling IRQs. The contents of the
383irq subdir is one subdir for each IRQ, and one file; prof_cpu_mask 383irq subdir is one subdir for each IRQ, and two files; default_smp_affinity and
384prof_cpu_mask.
384 385
385For example 386For example
386 > ls /proc/irq/ 387 > ls /proc/irq/
387 0 10 12 14 16 18 2 4 6 8 prof_cpu_mask 388 0 10 12 14 16 18 2 4 6 8 prof_cpu_mask
388 1 11 13 15 17 19 3 5 7 9 389 1 11 13 15 17 19 3 5 7 9 default_smp_affinity
389 > ls /proc/irq/0/ 390 > ls /proc/irq/0/
390 smp_affinity 391 smp_affinity
391 392
392The contents of the prof_cpu_mask file and each smp_affinity file for each IRQ 393smp_affinity is a bitmask, in which you can specify which CPUs can handle the
393is the same by default: 394IRQ, you can set it by doing:
394 395
395 > cat /proc/irq/0/smp_affinity 396 > echo 1 > /proc/irq/10/smp_affinity
396 ffffffff 397
398This means that only the first CPU will handle the IRQ, but you can also echo
3995 which means that only the first and third CPU can handle the IRQ.
397 400
398It's a bitmask, in which you can specify which CPUs can handle the IRQ, you can 401The contents of each smp_affinity file is the same by default:
399set it by doing: 402
403 > cat /proc/irq/0/smp_affinity
404 ffffffff
400 405
401 > echo 1 > /proc/irq/prof_cpu_mask 406The default_smp_affinity mask applies to all non-active IRQs, which are the
407IRQs which have not yet been allocated/activated, and hence which lack a
408/proc/irq/[0-9]* directory.
402 409
403This means that only the first CPU will handle the IRQ, but you can also echo 5 410prof_cpu_mask specifies which CPUs are to be profiled by the system wide
404which means that only the first and fourth CPU can handle the IRQ. 411profiler. Default value is ffffffff (all cpus).
405 412
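
Because smp_affinity is a bitmask, the value to echo can be derived mechanically: bit N set means CPU N may handle the IRQ. The sketch below shows that mapping. The helper names are hypothetical, and it assumes a mask of at most 32 CPUs to match the ffffffff default shown above. Note that 5 is binary 101, which selects CPU 0 and CPU 2, that is, the first and third CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask from a list of CPU numbers: bit N set allows CPU N. */
static uint32_t affinity_mask(const unsigned int *cpus, unsigned int count)
{
	uint32_t mask = 0;

	for (unsigned int i = 0; i < count; i++) {
		assert(cpus[i] < 32);	/* sketch handles 32 CPUs at most */
		mask |= UINT32_C(1) << cpus[i];
	}
	return mask;
}

/* Returns 1 if the given CPU is allowed by the mask, 0 otherwise. */
static int cpu_allowed(uint32_t mask, unsigned int cpu)
{
	assert(cpu < 32);
	return (int)((mask >> cpu) & 1u);
}
```

The resulting value is written in hexadecimal, so a mask covering CPUs 0 to 3 would be echoed as f.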
406The way IRQs are routed is handled by the IO-APIC, and it's Round Robin 413The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
407between all the CPUs which are allowed to handle it. As usual the kernel has 414between all the CPUs which are allowed to handle it. As usual the kernel has
diff --git a/Documentation/filesystems/ubifs.txt b/Documentation/filesystems/ubifs.txt
new file mode 100644
index 000000000000..540e9e7f59c5
--- /dev/null
+++ b/Documentation/filesystems/ubifs.txt
@@ -0,0 +1,164 @@
1Introduction
2=============
3
4UBIFS file-system stands for UBI File System. UBI stands for "Unsorted
5Block Images". UBIFS is a flash file system, which means it is designed
6to work with flash devices. It is important to understand that UBIFS
7is completely different to any traditional file-system in Linux, like
8Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems
9which work with MTD devices, not block devices. The other Linux
10file-system of this class is JFFS2.
11
12To make it more clear, here is a small comparison of MTD devices and
13block devices.
14
151 MTD devices represent flash devices and they consist of eraseblocks of
16 rather large size, typically about 128KiB. Block devices consist of
17 small blocks, typically 512 bytes.
182 MTD devices support 3 main operations - read from some offset within an
19 eraseblock, write to some offset within an eraseblock, and erase a whole
20 eraseblock. Block devices support 2 main operations - read a whole
21 block and write a whole block.
223 The whole eraseblock has to be erased before it becomes possible to
23 re-write its contents. Blocks may be just re-written.
244 Eraseblocks become worn out after some number of erase cycles -
25 typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC
26 NAND flashes. Blocks do not have the wear-out property.
275 Eraseblocks may become bad (only on NAND flashes) and software should
28 deal with this. Blocks on hard drives typically do not become bad,
29 because hardware has mechanisms to substitute bad blocks, at least in
30 modern LBA disks.
31
32It should be quite obvious why UBIFS is very different to traditional
33file-systems.
34
35UBIFS works on top of UBI. UBI is a separate software layer which may be
36found in drivers/mtd/ubi. UBI is basically a volume management and
37wear-leveling layer. It provides so-called UBI volumes, which are a higher
38level abstraction than an MTD device. The programming model of UBI devices
39is very similar to MTD devices - they still consist of large eraseblocks,
40they have read/write/erase operations, but UBI devices are devoid of
41limitations like wear and bad blocks (items 4 and 5 in the above list).
42
43In a sense, UBIFS is a next generation of JFFS2 file-system, but it is
44very different and incompatible to JFFS2. The following are the main
45differences.
46
47* JFFS2 works on top of MTD devices, UBIFS depends on UBI and works on
48 top of UBI volumes.
49* JFFS2 does not have on-media index and has to build it while mounting,
50 which requires full media scan. UBIFS maintains the FS indexing
51 information on the flash media and does not require full media scan,
52 so it mounts many times faster than JFFS2.
53* JFFS2 is a write-through file-system, while UBIFS supports write-back,
54 which makes UBIFS much faster on writes.
55
56Similarly to JFFS2, UBIFS supports on-the-fly compression which makes
57it possible to fit quite a lot of data to the flash.
58
59Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
60It does not need stuff like fsck.ext2. UBIFS automatically replays its
61journal and recovers from crashes, ensuring that the on-flash data
62structures are consistent.
63
64UBIFS scales logarithmically (most of the data structures it uses are
65trees), so the mount time and memory consumption do not linearly depend
66on the flash size, like in case of JFFS2. This is because UBIFS
67maintains the FS index on the flash media. However, UBIFS depends on
68UBI, which scales linearly. So overall UBI/UBIFS stack scales linearly.
69Nevertheless, UBI/UBIFS scales considerably better than JFFS2.
70
71The authors of UBIFS believe that it is possible to develop UBI2 which
72would scale logarithmically as well. UBI2 would support the same API as UBI,
73but it would be binary incompatible to UBI. So UBIFS would not need to be
74changed to use UBI2.
75
76
77Mount options
78=============
79
80(*) == default.
81
82norm_unmount (*) commit on unmount; the journal is committed
83 when the file-system is unmounted so that the
84 next mount does not have to replay the journal
85 and it becomes very fast;
86fast_unmount do not commit on unmount; this option makes
87 unmount faster, but the next mount slower
88 because of the need to replay the journal.
89
90
91Quick usage instructions
92========================
93
94The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
95where "X" is UBI device number, "Y" is UBI volume number, and "NAME" is
96UBI volume name.
97
98Mount volume 0 on UBI device 0 to /mnt/ubifs:
99$ mount -t ubifs ubi0_0 /mnt/ubifs
100
101Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is volume
102name):
103$ mount -t ubifs ubi0:rootfs /mnt/ubifs
104
105The following is an example of the kernel boot arguments to attach mtd0
106to UBI and mount volume "rootfs":
107ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs
108
109
110Module Parameters for Debugging
111===============================
112
113When UBIFS has been compiled with debugging enabled, there are 3 module
114parameters that are available to control aspects of testing and debugging.
115The parameters are unsigned integers where each bit controls an option.
116The parameters are:
117
118debug_msgs Selects which debug messages to display, as follows:
119
120 Message Type Flag value
121
122 General messages 1
123 Journal messages 2
124 Mount messages 4
125 Commit messages 8
126 LEB search messages 16
127 Budgeting messages 32
128 Garbage collection messages 64
129 Tree Node Cache (TNC) messages 128
130 LEB properties (lprops) messages 256
131 Input/output messages 512
132 Log messages 1024
133 Scan messages 2048
134 Recovery messages 4096
135
136debug_chks Selects extra checks that UBIFS can do while running:
137
138 Check Flag value
139
140 General checks 1
141 Check Tree Node Cache (TNC) 2
142 Check indexing tree size 4
143 Check orphan area 8
144 Check old indexing tree 16
145 Check LEB properties (lprops) 32
146 Check leaf nodes and inodes 64
147
148debug_tsts Selects a mode of testing, as follows:
149
150 Test mode Flag value
151
152 Force in-the-gaps method 2
153 Failure mode for recovery testing 4
154
155For example, set debug_msgs to 5 to display General messages and Mount
156messages.
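
Each debug parameter is the bitwise OR of the flag values listed in its table. The sketch below restates the debug_msgs table as constants and shows how the example value 5 is composed. The enumerator names and msg_enabled() are made up for illustration and are not UBIFS identifiers; only the numeric values come from the table above.

```c
#include <assert.h>

/* Flag values copied from the debug_msgs table; names are illustrative. */
enum debug_msgs_flag {
	MSG_GENERAL	= 1,
	MSG_JOURNAL	= 2,
	MSG_MOUNT	= 4,
	MSG_COMMIT	= 8,
	MSG_LEB_SEARCH	= 16,
	MSG_BUDGET	= 32,
	MSG_GC		= 64,
	MSG_TNC		= 128,
	MSG_LPROPS	= 256,
	MSG_IO		= 512,
	MSG_LOG		= 1024,
	MSG_SCAN	= 2048,
	MSG_RECOVERY	= 4096,
};

/* Returns 1 if the given message type is selected by a debug_msgs value. */
static int msg_enabled(unsigned int debug_msgs, enum debug_msgs_flag flag)
{
	assert(flag != 0);
	return (debug_msgs & (unsigned int)flag) != 0;
}
```

General plus Mount is 1 | 4 = 5, matching the example above, and enabling every message type gives 8191.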
157
158
159References
160==========
161
162UBIFS documentation and FAQ/HOWTO at the MTD web site:
163http://www.linux-mtd.infradead.org/doc/ubifs.html
164http://www.linux-mtd.infradead.org/faq/ubifs.html
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
index 13e4bf054c38..f218f616ff6b 100644
--- a/Documentation/ftrace.txt
+++ b/Documentation/ftrace.txt
@@ -2,8 +2,12 @@
2 ======================== 2 ========================
3 3
4Copyright 2008 Red Hat Inc. 4Copyright 2008 Red Hat Inc.
5Author: Steven Rostedt <srostedt@redhat.com> 5 Author: Steven Rostedt <srostedt@redhat.com>
6 License: The GNU Free Documentation License, Version 1.2
7Reviewers: Elias Oltmanns, Randy Dunlap, Andrew Morton,
8 John Kacur, and David Teigland.
6 9
10Written for: 2.6.27-rc1
7 11
8Introduction 12Introduction
9------------ 13------------
@@ -15,10 +19,11 @@ issues that take place outside of user-space.
15 19
16Although ftrace is the function tracer, it also includes an 20Although ftrace is the function tracer, it also includes an
17infrastructure that allows for other types of tracing. Some of the 21infrastructure that allows for other types of tracing. Some of the
18tracers that are currently in ftrace is a tracer to trace 22tracers that are currently in ftrace include a tracer to trace
19context switches, the time it takes for a high priority task to 23context switches, the time it takes for a high priority task to
20run after it was woken up, the time interrupts are disabled, and 24run after it was woken up, the time interrupts are disabled, and
21more. 25more (ftrace allows for tracer plugins, which means that the list of
26tracers can always grow).
22 27
23 28
24The File System 29The File System
@@ -32,6 +37,8 @@ To mount the debugfs system:
32 # mkdir /debug 37 # mkdir /debug
33 # mount -t debugfs nodev /debug 38 # mount -t debugfs nodev /debug
34 39
40(Note: it is more common to mount at /sys/kernel/debug, but for simplicity
41 this document will use /debug)
35 42
36That's it! (assuming that you have ftrace configured into your kernel) 43That's it! (assuming that you have ftrace configured into your kernel)
37 44
@@ -46,21 +53,20 @@ of ftrace. Here is a list of some of the key files:
46 that is configured. 53 that is configured.
47 54
48 available_tracers : This holds the different types of tracers that 55 available_tracers : This holds the different types of tracers that
49 has been compiled into the kernel. The tracers 56 have been compiled into the kernel. The tracers
50 listed here can be configured by echoing in their 57 listed here can be configured by echoing their name
51 name into current_tracer. 58 into current_tracer.
52 59
53 tracing_enabled : This sets or displays whether the current_tracer 60 tracing_enabled : This sets or displays whether the current_tracer
54 is activated and tracing or not. Echo 0 into this 61 is activated and tracing or not. Echo 0 into this
55 file to disable the tracer or 1 (or non-zero) to 62 file to disable the tracer or 1 to enable it.
56 enable it.
57 63
58 trace : This file holds the output of the trace in a human readable 64 trace : This file holds the output of the trace in a human readable
59 format. 65 format (described below).
60 66
61 latency_trace : This file shows the same trace but the information 67 latency_trace : This file shows the same trace but the information
62 is organized more to display possible latencies 68 is organized more to display possible latencies
63 in the system. 69 in the system (described below).
64 70
65 trace_pipe : The output is the same as the "trace" file but this 71 trace_pipe : The output is the same as the "trace" file but this
66 file is meant to be streamed with live tracing. 72 file is meant to be streamed with live tracing.
@@ -72,7 +78,7 @@ of ftrace. Here is a list of some of the key files:
72 file, it is consumed, and will not be read 78 file, it is consumed, and will not be read
73 again with a sequential read. The "trace" and 79 again with a sequential read. The "trace" and
74 "latency_trace" files are static, and if the 80 "latency_trace" files are static, and if the
75 tracer isn't adding more data, they will display 81 tracer is not adding more data, they will display
76 the same information every time they are read. 82 the same information every time they are read.
77 83
78 iter_ctrl : This file lets the user control the amount of data 84 iter_ctrl : This file lets the user control the amount of data
@@ -89,12 +95,14 @@ of ftrace. Here is a list of some of the key files:
89 95
90 trace_entries : This sets or displays the number of trace 96 trace_entries : This sets or displays the number of trace
91 entries each CPU buffer can hold. The tracer buffers 97 entries each CPU buffer can hold. The tracer buffers
92 are the same size for each CPU, so care must be 98 are the same size for each CPU. The displayed number
93 taken when modifying the trace_entries. The number 99 is the size of the CPU buffer and not total size. The
94 of actually entries will be the number given 100 trace buffers are allocated in pages (blocks of memory
95 times the number of possible CPUS. The buffers 101 that the kernel uses for allocation, usually 4 KB in size).
96 are saved as individual pages, and the actual entries 102 Since each entry is smaller than a page, if the last
97 will always be rounded up to entries per page. 103 allocated page has room for more entries than were
104 requested, the rest of the page is used to allocate
105 entries.
98 106
99 This can only be updated when the current_tracer 107 This can only be updated when the current_tracer
100 is set to "none". 108 is set to "none".
@@ -107,20 +115,19 @@ of ftrace. Here is a list of some of the key files:
107 on specified CPUS. The format is a hex string 115 on specified CPUS. The format is a hex string
108 representing the CPUS. 116 representing the CPUS.
109 117
110 set_ftrace_filter : When dynamic ftrace is configured in, the 118 set_ftrace_filter : When dynamic ftrace is configured in (see the
111 code is dynamically modified to disable calling 119 section below "dynamic ftrace"), the code is dynamically
112 of the function profiler (mcount). This lets 120 modified (code text rewrite) to disable calling of the
113 tracing be configured in with practically no overhead 121 function profiler (mcount). This lets tracing be configured
114 in performance. This also has a side effect of 122 in with practically no overhead in performance. This also
115 enabling or disabling specific functions to be 123 has a side effect of enabling or disabling specific functions
116 traced. Echoing in names of functions into this 124 to be traced. Echoing names of functions into this file
117 file will limit the trace to only those files. 125 will limit the trace to only those functions.
118 126
119 set_ftrace_notrace: This has the opposite effect that 127 set_ftrace_notrace: This has an effect opposite to that of
120 set_ftrace_filter has. Any function that is added 128 set_ftrace_filter. Any function that is added here will not
121 here will not be traced. If a function exists 129 be traced. If a function exists in both set_ftrace_filter
122 in both set_ftrace_filter and set_ftrace_notrace 130 and set_ftrace_notrace, the function will _not_ be traced.
123 the function will _not_ bet traced.
124 131
125 available_filter_functions : When a function is encountered the first 132 available_filter_functions : When a function is encountered the first
126 time by the dynamic tracer, it is recorded and 133 time by the dynamic tracer, it is recorded and
@@ -128,32 +135,31 @@ of ftrace. Here is a list of some of the key files:
128 lists the functions that have been recorded 135 lists the functions that have been recorded
129 by the dynamic tracer and these functions can 136 by the dynamic tracer and these functions can
130 be used to set the ftrace filter by the above 137 be used to set the ftrace filter by the above
131 "set_ftrace_filter" file. 138 "set_ftrace_filter" file. (See the section "dynamic ftrace"
139 below for more details).
132 140
133 141
134The Tracers 142The Tracers
135----------- 143-----------
136 144
137Here are the list of current tracers that can be configured. 145Here is the list of current tracers that may be configured.
138 146
139 ftrace - function tracer that uses mcount to trace all functions. 147 ftrace - function tracer that uses mcount to trace all functions.
140 It is possible to filter out which functions that are
141 traced when dynamic ftrace is configured in.
142 148
143 sched_switch - traces the context switches between tasks. 149 sched_switch - traces the context switches between tasks.
144 150
145 irqsoff - traces the areas that disable interrupts and saves off 151 irqsoff - traces the areas that disable interrupts and saves
146 the trace with the longest max latency. 152 the trace with the longest max latency.
147 See tracing_max_latency. When a new max is recorded, 153 See tracing_max_latency. When a new max is recorded,
148 it replaces the old trace. It is best to view this 154 it replaces the old trace. It is best to view this
149 trace with the latency_trace file. 155 trace via the latency_trace file.
150 156
151 preemptoff - Similar to irqsoff but traces and records the time 157 preemptoff - Similar to irqsoff but traces and records the amount of
152 preemption is disabled. 158 time for which preemption is disabled.
153 159
154 preemptirqsoff - Similar to irqsoff and preemptoff, but traces and 160 preemptirqsoff - Similar to irqsoff and preemptoff, but traces and
155 records the largest time irqs and/or preemption is 161 records the largest time for which irqs and/or preemption
156 disabled. 162 is disabled.
157 163
158 wakeup - Traces and records the max latency that it takes for 164 wakeup - Traces and records the max latency that it takes for
159 the highest priority task to get scheduled after 165 the highest priority task to get scheduled after
@@ -166,13 +172,13 @@ Here are the list of current tracers that can be configured.
166Examples of using the tracer 172Examples of using the tracer
167---------------------------- 173----------------------------
168 174
169Here are typical examples of using the tracers with only controlling 175Here are typical examples of using the tracers when controlling them only
170them with the debugfs interface (without using any user-land utilities). 176with the debugfs interface (without using any user-land utilities).
171 177
172Output format: 178Output format:
173-------------- 179--------------
174 180
175Here's an example of the output format of the file "trace" 181Here is an example of the output format of the file "trace"
176 182
177 -------- 183 --------
178# tracer: ftrace 184# tracer: ftrace
@@ -184,14 +190,15 @@ Here's an example of the output format of the file "trace"
184 bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput 190 bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput
185 -------- 191 --------
186 192
187A header is printed with the trace that is represented. In this case 193A header is printed with the tracer name that is represented by the trace.
188the tracer is "ftrace". Then a header showing the format. Task name 194In this case the tracer is "ftrace". Then a header showing the format. Task
189"bash", the task PID "4251", the CPU that it was running on 195name "bash", the task PID "4251", the CPU that it was running on
190"01", the timestamp in <secs>.<usecs> format, the function name that was 196"01", the timestamp in <secs>.<usecs> format, the function name that was
191traced "path_put" and the parent function that called this function 197traced "path_put" and the parent function that called this function
192"path_walk". 198"path_walk". The timestamp is the time at which the function was
199entered.
193 200
194The sched_switch tracer also includes tracing of task wake ups and 201The sched_switch tracer also includes tracing of task wakeups and
195context switches. 202context switches.
196 203
197 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S 204 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
@@ -201,7 +208,7 @@ context switches.
201 kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R 208 kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
202 ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R 209 ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R
203 210
204Wake ups are represented by a "+" and the context switches show 211Wake ups are represented by a "+" and the context switches are shown as
205"==>". The format is: 212"==>". The format is:
206 213
207 Context switches: 214 Context switches:
@@ -216,7 +223,7 @@ Wake ups are represented by a "+" and the context switches show
216 223
217 <pid>:<prio>:<state> + <pid>:<prio>:<state> 224 <pid>:<prio>:<state> + <pid>:<prio>:<state>
218 225
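A user-space tool that post-processes this output might split each <pid>:<prio>:<state> field as in the following sketch. The struct and function names are hypothetical and are not part of ftrace.

```c
#include <stdio.h>

/* One "<pid>:<prio>:<state>" field from a sched_switch trace line. */
struct task_field {
    int pid;
    int prio;    /* internal kernel priority, as printed in the trace */
    char state;  /* single-letter task state, e.g. 'R' or 'S' */
};

/* Returns 0 if all three parts were parsed, -1 otherwise. */
int parse_task_field(const char *s, struct task_field *out)
{
    if (sscanf(s, "%d:%d:%c", &out->pid, &out->prio, &out->state) != 3)
        return -1;
    return 0;
}
```

For the field "2916:115:S" this fills in pid 2916, prio 115 and state 'S'.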
219The prio is the internal kernel priority, which is inverse to the 226The prio is the internal kernel priority, which is the inverse of the
220priority that is usually displayed by user-space tools. Zero represents 227priority that is usually displayed by user-space tools. Zero represents
221the highest priority (99). Prio 100 starts the "nice" priorities with 228the highest priority (99). Prio 100 starts the "nice" priorities with
222100 being equal to nice -20 and 139 being nice 19. The prio "140" is 229100 being equal to nice -20 and 139 being nice 19. The prio "140" is
@@ -227,7 +234,7 @@ Latency trace format
227-------------------- 234--------------------
228 235
229For traces that display latency times, the latency_trace file gives 236For traces that display latency times, the latency_trace file gives
230a bit more information to see why a latency happened. Here's a typical 237somewhat more information to see why a latency happened. Here is a typical
231trace. 238trace.
232 239
233# tracer: irqsoff 240# tracer: irqsoff
@@ -255,21 +262,20 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8
255 <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq) 262 <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq)
256 263
257 264
258vim:ft=help
259
260 265
261This shows that the current tracer is "irqsoff" tracing the time 266This shows that the current tracer is "irqsoff" tracing the time for which
262interrupts are disabled. It gives the trace version and the kernel 267interrupts were disabled. It gives the trace version and the version
263this was executed on (2.6.26-rc8). Then it displays the max latency 268of the kernel upon which this was executed (2.6.26-rc8). Then it displays
264in microsecs (97 us). The number of trace entries displayed 269the max latency in microsecs (97 us). The number of trace entries displayed
265by the total number recorded (both are three: #3/3). The type of 270and the total number recorded (both are three: #3/3). The type of
266preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero 271preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero
267and reserved for later use. #P is the number of online CPUS (#P:2). 272and are reserved for later use. #P is the number of online CPUS (#P:2).
268 273
269The task is the process that was running when the latency happened. 274The task is the process that was running when the latency occurred.
270(swapper pid: 0). 275(swapper pid: 0).
271 276
272The start and stop that caused the latencies: 277The start and stop (the functions in which the interrupts were disabled and
278enabled respectively) that caused the latencies:
273 279
274 apic_timer_interrupt is where the interrupts were disabled. 280 apic_timer_interrupt is where the interrupts were disabled.
275 do_softirq is where they were enabled again. 281 do_softirq is where they were enabled again.
@@ -281,14 +287,14 @@ explains which is which.
281 287
282 pid: The PID of that process. 288 pid: The PID of that process.
283 289
284 CPU#: The CPU that the process was running on. 290 CPU#: The CPU which the process was running on.
285 291
286 irqs-off: 'd' interrupts are disabled. '.' otherwise. 292 irqs-off: 'd' interrupts are disabled. '.' otherwise.
287 293
288 need-resched: 'N' task need_resched is set, '.' otherwise. 294 need-resched: 'N' task need_resched is set, '.' otherwise.
289 295
290 hardirq/softirq: 296 hardirq/softirq:
291 'H' - hard irq happened inside a softirq. 297 'H' - hard irq occurred inside a softirq.
292 'h' - hard irq is running 298 'h' - hard irq is running
293 's' - soft irq is running 299 's' - soft irq is running
294 '.' - normal context. 300 '.' - normal context.
@@ -297,13 +303,13 @@ explains which is which.
297 303
298The above is mostly meaningful for kernel developers. 304The above is mostly meaningful for kernel developers.
299 305
300 time: This differs from the trace output where as the trace output 306 time: This differs from the trace file output. The trace file output
301 contained a absolute timestamp. This timestamp is relative 307 includes an absolute timestamp. The timestamp used by the
302 to the start of the first entry in the the trace. 308 latency_trace file is relative to the start of the trace.
303 309
304 delay: This is just to help catch your eye a bit better. And 310 delay: This is just to help catch your eye a bit better. And
305 needs to be fixed to be only relative to the same CPU. 311 needs to be fixed to be only relative to the same CPU.
306 The marks is determined by the difference between this 312 The marks are determined by the difference between this
307 current trace and the next trace. 313 current trace and the next trace.
308 '!' - greater than preempt_mark_thresh (default 100) 314 '!' - greater than preempt_mark_thresh (default 100)
309 '+' - greater than 1 microsecond 315 '+' - greater than 1 microsecond
@@ -322,13 +328,13 @@ output. To see what is available, simply cat the file:
322 print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \ 328 print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
323 noblock nostacktrace nosched-tree 329 noblock nostacktrace nosched-tree
324 330
325To disable one of the options, echo in the option appended with "no". 331To disable one of the options, echo in the option prepended with "no".
326 332
327 echo noprint-parent > /debug/tracing/iter_ctrl 333 echo noprint-parent > /debug/tracing/iter_ctrl
328 334
329To enable an option, leave off the "no". 335To enable an option, leave off the "no".
330 336
331 echo sym-offest > /debug/tracing/iter_ctrl 337 echo sym-offset > /debug/tracing/iter_ctrl
332 338
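The same enable/disable rule can be applied from a C program instead of echo. The following is a minimal sketch; both function names are hypothetical, and the path of the iter_ctrl file is passed in by the caller because the debugfs mount point may differ between systems.

```c
#include <stdio.h>

/*
 * Write the option name into out, prefixed with "no" when it is
 * to be disabled. Returns 0 on success, -1 if out is too small.
 */
int format_iter_ctrl_option(const char *option, int enable,
                            char *out, size_t outlen)
{
    int n = snprintf(out, outlen, "%s%s", enable ? "" : "no", option);

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}

/* Write the formatted option to the iter_ctrl file at the given path. */
int set_iter_ctrl_option(const char *path, const char *option, int enable)
{
    char buf[64];
    FILE *f;
    int ok;

    if (format_iter_ctrl_option(option, enable, buf, sizeof(buf)) != 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fputs(buf, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling set_iter_ctrl_option("/debug/tracing/iter_ctrl", "print-parent", 0) would correspond to the "echo noprint-parent" command shown above.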
333Here are the available options: 339Here are the available options:
334 340
@@ -344,7 +350,7 @@ Here are the available options:
344 350
345 sym-offset - Display not only the function name, but also the offset 351 sym-offset - Display not only the function name, but also the offset
346 in the function. For example, instead of seeing just 352 in the function. For example, instead of seeing just
347 "ktime_get" you will see "ktime_get+0xb/0x20" 353 "ktime_get", you will see "ktime_get+0xb/0x20".
348 354
349 sym-offset: 355 sym-offset:
350 bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0 356 bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
@@ -364,7 +370,7 @@ Here are the available options:
364 user applications that can translate the raw numbers better than 370 user applications that can translate the raw numbers better than
365 having it done in the kernel. 371 having it done in the kernel.
366 372
367 hex - similar to raw, but the numbers will be in a hexadecimal format. 373 hex - Similar to raw, but the numbers will be in a hexadecimal format.
368 374
369 bin - This will print out the formats in raw binary. 375 bin - This will print out the formats in raw binary.
370 376
@@ -380,8 +386,8 @@ Here are the available options:
380sched_switch 386sched_switch
381------------ 387------------
382 388
383This tracer simply records schedule switches. Here's an example 389This tracer simply records schedule switches. Here is an example
384on how to implement it. 390of how to use it.
385 391
386 # echo sched_switch > /debug/tracing/current_tracer 392 # echo sched_switch > /debug/tracing/current_tracer
387 # echo 1 > /debug/tracing/tracing_enabled 393 # echo 1 > /debug/tracing/tracing_enabled
@@ -416,8 +422,8 @@ the name of the trace and points to the options. The "FUNCTION"
416is a misnomer since here it represents the wake ups and context 422is a misnomer since here it represents the wake ups and context
417switches. 423switches.
418 424
419The sched_switch only lists the wake ups (represented with '+') 425The sched_switch file only lists the wake ups (represented with '+')
420and context switches ('==>') with the previous task or current 426and context switches ('==>') with the previous task or current task
421first followed by the next task or task waking up. The format for both 427first followed by the next task or task waking up. The format for both
422of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO 428of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO
423is the inverse of the actual priority with zero (0) being the highest 429is the inverse of the actual priority with zero (0) being the highest
@@ -432,7 +438,8 @@ The task states are:
432 438
433 R - running : wants to run, may not actually be running 439 R - running : wants to run, may not actually be running
434 S - sleep : process is waiting to be woken up (handles signals) 440 S - sleep : process is waiting to be woken up (handles signals)
435 D - deep sleep : process must be woken up (ignores signals) 441 D - disk sleep (uninterruptible sleep) : process must be woken up
442 (ignores signals)
436 T - stopped : process suspended 443 T - stopped : process suspended
437 t - traced : process is being traced (with something like gdb) 444 t - traced : process is being traced (with something like gdb)
438 Z - zombie : process waiting to be cleaned up 445 Z - zombie : process waiting to be cleaned up
@@ -442,8 +449,8 @@ The task states are:
442ftrace_enabled 449ftrace_enabled
443-------------- 450--------------
444 451
445The following tracers give different output depending on whether 452The following tracers (listed below) give different output depending
446or not the sysctl ftrace_enabled is set. To set ftrace_enabled, 453on whether or not the sysctl ftrace_enabled is set. To set ftrace_enabled,
447one can either use the sysctl function or set it via the proc 454one can either use the sysctl function or set it via the proc
448file system interface. 455file system interface.
449 456
@@ -470,13 +477,12 @@ interrupt from triggering or the mouse interrupt from letting the
470kernel know of a new mouse event. The result is a latency with the 477kernel know of a new mouse event. The result is a latency with the
471reaction time. 478reaction time.
472 479
473The irqsoff tracer tracks the time interrupts are disabled and when 480The irqsoff tracer tracks the time for which interrupts are disabled.
474they are re-enabled. When a new maximum latency is hit, it saves off 481When a new maximum latency is hit, the tracer saves the trace leading up
475the trace so that it may be retrieved at a later time. Every time a 482to that latency point so that every time a new maximum is reached, the old
476new maximum in reached, the old saved trace is discarded and the new 483saved trace is discarded and the new trace is saved.
477trace is saved.
478 484
479To reset the maximum, echo 0 into tracing_max_latency. Here's an 485To reset the maximum, echo 0 into tracing_max_latency. Here is an
480example: 486example:
481 487
482 # echo irqsoff > /debug/tracing/current_tracer 488 # echo irqsoff > /debug/tracing/current_tracer
@@ -488,14 +494,14 @@ example:
488 # cat /debug/tracing/latency_trace 494 # cat /debug/tracing/latency_trace
489# tracer: irqsoff 495# tracer: irqsoff
490# 496#
491irqsoff latency trace v1.1.5 on 2.6.26-rc8 497irqsoff latency trace v1.1.5 on 2.6.26
492-------------------------------------------------------------------- 498--------------------------------------------------------------------
493 latency: 6 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) 499 latency: 12 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
494 ----------------- 500 -----------------
495 | task: bash-4269 (uid:0 nice:0 policy:0 rt_prio:0) 501 | task: bash-3730 (uid:0 nice:0 policy:0 rt_prio:0)
496 ----------------- 502 -----------------
497 => started at: copy_page_range 503 => started at: sys_setpgid
498 => ended at: copy_page_range 504 => ended at: sys_setpgid
499 505
500# _------=> CPU# 506# _------=> CPU#
501# / _-----=> irqs-off 507# / _-----=> irqs-off
@@ -506,21 +512,19 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8
506# ||||| delay 512# ||||| delay
507# cmd pid ||||| time | caller 513# cmd pid ||||| time | caller
508# \ / ||||| \ | / 514# \ / ||||| \ | /
509 bash-4269 1...1 0us+: _spin_lock (copy_page_range) 515 bash-3730 1d... 0us : _write_lock_irq (sys_setpgid)
510 bash-4269 1...1 7us : _spin_unlock (copy_page_range) 516 bash-3730 1d..1 1us+: _write_unlock_irq (sys_setpgid)
511 bash-4269 1...2 7us : trace_preempt_on (copy_page_range) 517 bash-3730 1d..2 14us : trace_hardirqs_on (sys_setpgid)
512 518
513 519
514vim:ft=help 520Here we see that we had a latency of 12 microsecs (which is
521very good). The _write_lock_irq in sys_setpgid disabled interrupts.
522The difference between the 12 and the displayed timestamp 14us occurred
523because the clock was incremented between the time of recording the max
524latency and the time of recording the function that had that latency.
515 525
516Here we see that that we had a latency of 6 microsecs (which is 526Note the above example had ftrace_enabled not set. If we set the
517very good). The spin_lock in copy_page_range disabled interrupts. 527ftrace_enabled, we get a much larger output:
518The difference between the 6 and the displayed timestamp 7us is
519because the clock must have incremented between the time of recording
520the max latency and recording the function that had that latency.
521
522Note the above had ftrace_enabled not set. If we set the ftrace_enabled
523we get a much larger output:
524 528
525# tracer: irqsoff 529# tracer: irqsoff
526# 530#
@@ -566,27 +570,26 @@ irqsoff latency trace v1.1.5 on 2.6.26-rc8
566 ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal) 570 ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal)
567 571
568 572
569vim:ft=help
570
571 573
572Here we traced a 50 microsecond latency. But we also see all the 574Here we traced a 50 microsecond latency. But we also see all the
573functions that were called during that time. Note that enabling 575functions that were called during that time. Note that by enabling
574function tracing we endure an added overhead. This overhead may 576function tracing, we incur an added overhead. This overhead may
575extend the latency times. But never the less, this trace has provided 577extend the latency times. But nevertheless, this trace has provided
576some very helpful debugging. 578some very helpful debugging information.
577 579
578 580
579preemptoff 581preemptoff
580---------- 582----------
581 583
582When preemption is disabled we may be able to receive interrupts but 584When preemption is disabled, we may be able to receive interrupts but
583the task can not be preempted and a higher priority task must wait 585the task cannot be preempted and a higher priority task must wait
584for preemption to be enabled again before it can preempt a lower 586for preemption to be enabled again before it can preempt a lower
585priority task. 587priority task.
586 588
587The preemptoff tracer traces the places that disables preemption. 589The preemptoff tracer traces the places that disable preemption.
588Like the irqsoff, it records the maximum latency that preemption 590Like the irqsoff tracer, it records the maximum latency for which preemption
589was disabled. The control of preemptoff is much like the irqsoff. 591was disabled. The control of the preemptoff tracer is much like the irqsoff
592tracer.
590 593
591 # echo preemptoff > /debug/tracing/current_tracer 594 # echo preemptoff > /debug/tracing/current_tracer
592 # echo 0 > /debug/tracing/tracing_max_latency 595 # echo 0 > /debug/tracing/tracing_max_latency
@@ -620,8 +623,6 @@ preemptoff latency trace v1.1.5 on 2.6.26-rc8
620 sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq) 623 sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq)
621 624
622 625
623vim:ft=help
624
625This has some more changes. Preemption was disabled when an interrupt 626This has some more changes. Preemption was disabled when an interrupt
626came in (notice the 'h'), and was enabled while doing a softirq. 627came in (notice the 'h'), and was enabled while doing a softirq.
627(notice the 's'). But we also see that interrupts have been disabled 628(notice the 's'). But we also see that interrupts have been disabled
@@ -689,16 +690,16 @@ The above is an example of the preemptoff trace with ftrace_enabled
689set. Here we see that interrupts were disabled the entire time. 690set. Here we see that interrupts were disabled the entire time.
690The irq_enter code lets us know that we entered an interrupt 'h'. 691The irq_enter code lets us know that we entered an interrupt 'h'.
691Before that, the functions being traced still show that it is not 692Before that, the functions being traced still show that it is not
692in an interrupt, but we can see by the functions themselves that 693in an interrupt, but we can see from the functions themselves that
693this is not the case. 694this is not the case.
694 695
695Notice that the __do_softirq when called doesn't have a preempt_count. 696Notice that __do_softirq when called does not have a preempt_count.
696It may seem that we missed a preempt enabled. What really happened 697It may seem that we missed a preempt enabling. What really happened
697is that the preempt count is held on the threads stack and we 698is that the preempt count is held on the thread's stack and we
698switched to the softirq stack (4K stacks in effect). The code 699switched to the softirq stack (4K stacks in effect). The code
699does not copy the preempt count, but because interrupts are disabled 700does not copy the preempt count, but because interrupts are disabled,
700we don't need to worry about it. Having a tracer like this is good 701we do not need to worry about it. Having a tracer like this is good
701to let people know what really happens inside the kernel. 702for letting people know what really happens inside the kernel.
702 703
703 704
704preemptirqsoff 705preemptirqsoff
@@ -708,7 +709,7 @@ Knowing the locations that have interrupts disabled or preemption
708disabled for the longest times is helpful. But sometimes we would 709disabled for the longest times is helpful. But sometimes we would
709like to know when either preemption and/or interrupts are disabled. 710like to know when either preemption and/or interrupts are disabled.
710 711
711The following code: 712Consider the following code:
712 713
713 local_irq_disable(); 714 local_irq_disable();
714 call_function_with_irqs_off(); 715 call_function_with_irqs_off();
@@ -732,7 +733,7 @@ To record this time, use the preemptirqsoff tracer.
732 733
733Again, using this trace is much like the irqsoff and preemptoff tracers. 734Again, using this trace is much like the irqsoff and preemptoff tracers.
734 735
735 # echo preemptoff > /debug/tracing/current_tracer 736 # echo preemptirqsoff > /debug/tracing/current_tracer
736 # echo 0 > /debug/tracing/tracing_max_latency 737 # echo 0 > /debug/tracing/tracing_max_latency
737 # echo 1 > /debug/tracing/tracing_enabled 738 # echo 1 > /debug/tracing/tracing_enabled
738 # ls -ltr 739 # ls -ltr
@@ -764,12 +765,10 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
764 ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq) 765 ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq)
765 766
766 767
767vim:ft=help
768
769 768
770The trace_hardirqs_off_thunk is called from assembly on x86 when 769The trace_hardirqs_off_thunk is called from assembly on x86 when
771interrupts are disabled in the assembly code. Without the function 770interrupts are disabled in the assembly code. Without the function
772tracing, we don't know if interrupts were enabled within the preemption 771tracing, we do not know if interrupts were enabled within the preemption
773points. We do see that it started with preemption enabled. 772points. We do see that it started with preemption enabled.
774 773
775Here is a trace with ftrace_enabled set: 774Here is a trace with ftrace_enabled set:
@@ -860,25 +859,25 @@ preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
860 859
861This is a very interesting trace. It started with the preemption of 860This is a very interesting trace. It started with the preemption of
862the ls task. We see that the task had the "need_resched" bit set 861the ls task. We see that the task had the "need_resched" bit set
863with the 'N' in the trace. Interrupts are disabled in the spin_lock 862via the 'N' in the trace. Interrupts were disabled before the spin_lock
864and the trace started. We see that a schedule took place to run 863at the beginning of the trace. We see that a schedule took place to run
865sshd. When the interrupts were enabled we took an interrupt. 864sshd. When the interrupts were enabled, we took an interrupt.
866On return of the interrupt the softirq ran. We took another interrupt 865On return from the interrupt handler, the softirq ran. We took another
867while running the softirq as we see with the capital 'H'. 866interrupt while running the softirq as we see from the capital 'H'.
868 867
869 868
870wakeup 869wakeup
871------ 870------
872 871
873In Real-Time environment it is very important to know the wakeup 872In a Real-Time environment it is very important to know the wakeup
874time it takes for the highest priority task that wakes up to the 873time it takes for the highest priority task that is woken up to the
875time it executes. This is also known as "schedule latency". 874time that it executes. This is also known as "schedule latency".
876I stress the point that this is about RT tasks. It is also important 875I stress the point that this is about RT tasks. It is also important
877to know the scheduling latency of non-RT tasks, but the average 876to know the scheduling latency of non-RT tasks, but the average
878schedule latency is better for non-RT tasks. Tools like 877schedule latency is better for non-RT tasks. Tools like
879LatencyTop is more appropriate for such measurements. 878LatencyTop are more appropriate for such measurements.
880 879
881Real-Time environments is interested in the worst case latency. 880Real-Time environments are interested in the worst case latency.
882That is the longest latency it takes for something to happen, and 881That is the longest latency it takes for something to happen, and
883not the average. We can have a very fast scheduler that may only 882not the average. We can have a very fast scheduler that may only
884have a large latency once in a while, but that would not work well 883have a large latency once in a while, but that would not work well
@@ -889,8 +888,8 @@ tasks that are unpredictable will overwrite the worst case latency
889of RT tasks. 888of RT tasks.
890 889
891Since this tracer only deals with RT tasks, we will run this slightly 890Since this tracer only deals with RT tasks, we will run this slightly
892different than we did with the previous tracers. Instead of performing 891differently than we did with the previous tracers. Instead of performing
893an 'ls' we will run 'sleep 1' under 'chrt' which changes the 892an 'ls', we will run 'sleep 1' under 'chrt' which changes the
894priority of the task. 893priority of the task.
895 894
896 # echo wakeup > /debug/tracing/current_tracer 895 # echo wakeup > /debug/tracing/current_tracer
@@ -921,12 +920,10 @@ wakeup latency trace v1.1.5 on 2.6.26-rc8
921 <idle>-0 1d..4 4us : schedule (cpu_idle) 920 <idle>-0 1d..4 4us : schedule (cpu_idle)
922 921
923 922
924vim:ft=help
925
926 923
927Running this on an idle system we see that it only took 4 microseconds 924Running this on an idle system, we see that it only took 4 microseconds
928to perform the task switch. Note, since the trace marker in the 925to perform the task switch. Note, since the trace marker in the
929schedule is before the actual "switch" we stop the tracing when 926schedule is before the actual "switch", we stop the tracing when
930the recorded task is about to schedule in. This may change if 927the recorded task is about to schedule in. This may change if
931we add a new marker at the end of the scheduler. 928we add a new marker at the end of the scheduler.
932 929
@@ -991,13 +988,16 @@ ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock)
991ksoftirq-7 1d..4 50us : schedule (__cond_resched) 988ksoftirq-7 1d..4 50us : schedule (__cond_resched)
992 989
993The interrupt went off while running ksoftirqd. This task runs at 990The interrupt went off while running ksoftirqd. This task runs at
994SCHED_OTHER. Why didn't we see the 'N' set early? This may be 991SCHED_OTHER. Why did we not see the 'N' set early? This may be
995a harmless bug with x86_32 and 4K stacks. The need_reched() function 992a harmless bug with x86_32 and 4K stacks. On x86_32 with 4K stacks
996that tests if we need to reschedule looks on the actual stack. 993configured, the interrupt and softirq run with their own stack.
997Where as the setting of the NEED_RESCHED bit happens on the 994Some information is held on the top of the task's stack (need_resched
998task's stack. But because we are in a hard interrupt, the test 995and preempt_count are both stored there). The setting of the NEED_RESCHED
999is with the interrupts stack which has that to be false. We don't 996bit is done directly to the task's stack, but the reading of the
1000see the 'N' until we switch back to the task's stack. 997NEED_RESCHED is done by looking at the current stack, which in this case
998is the stack for the hard interrupt. This hides the fact that NEED_RESCHED
999has been set. We do not see the 'N' until we switch back to the task's
1000assigned stack.
1001 1001
1002ftrace 1002ftrace
1003------ 1003------
@@ -1036,14 +1036,14 @@ this tracer is a nop.
1036[...] 1036[...]
1037 1037
1038 1038
1039Note: It is sometimes better to enable or disable tracing directly from 1039Note: ftrace uses ring buffers to store the above entries. The newest data
1040a program, because the buffer may be overflowed by the echo commands 1040may overwrite the oldest data. Sometimes using echo to stop the trace
1041before you get to the point you want to trace. It is also easier to 1041is not sufficient because the tracing could have overwritten the data
1042stop the tracing at the point that you hit the part that you are 1042that you wanted to record. For this reason, it is sometimes better to
1043interested in. Since the ftrace buffer is a ring buffer with the 1043disable tracing directly from a program. This allows you to stop the
1044oldest data being overwritten, usually it is sufficient to start the 1044tracing at the point that you hit the part that you are interested in.
1045tracer with an echo command but have you code stop it. Something 1045To disable the tracing directly from a C program, something like the following
1046like the following is usually appropriate for this. 1046code snippet can be used:
1047 1047
1048int trace_fd; 1048int trace_fd;
1049[...] 1049[...]
@@ -1052,25 +1052,31 @@ int main(int argc, char *argv[]) {
1052 trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY); 1052 trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY);
1053 [...] 1053 [...]
1054 if (condition_hit()) { 1054 if (condition_hit()) {
1055 write(trace_fd, "0", 1); 1055 write(trace_fd, "0", 1);
1056 } 1056 }
1057 [...] 1057 [...]
1058} 1058}
1059 1059
1060Note: Here we hard coded the path name. The debugfs mount is not
1061guaranteed to be at /debug (and is more commonly at /sys/kernel/debug).
1062For simple one time traces, the above is sufficient. For anything else,
1063a search through /proc/mounts may be needed to find where the debugfs
1064file-system is mounted.
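
The search through /proc/mounts mentioned in the note could look roughly like the sketch below. It is only an illustration: the helper names find_debugfs_in() and find_debugfs() are hypothetical and not part of ftrace, and the buffer sizes are arbitrary. The parser takes a FILE pointer so it can be tried against any text in /proc/mounts format.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan a mount table (same layout as /proc/mounts: device, mount
 * point, filesystem type, ...) for the first entry of type "debugfs"
 * and copy its mount point into out.
 * Returns 0 on success, -1 if no debugfs entry was found.
 */
int find_debugfs_in(FILE *mounts, char *out, size_t outlen)
{
	char line[512];
	char dev[128], dir[256], type[64];

	while (fgets(line, sizeof(line), mounts)) {
		if (sscanf(line, "%127s %255s %63s", dev, dir, type) != 3)
			continue;
		if (strcmp(type, "debugfs") == 0) {
			snprintf(out, outlen, "%s", dir);
			return 0;
		}
	}
	return -1;
}

/* Convenience wrapper that reads the live /proc/mounts. */
int find_debugfs(char *out, size_t outlen)
{
	FILE *mounts = fopen("/proc/mounts", "r");
	int ret;

	if (!mounts)
		return -1;
	ret = find_debugfs_in(mounts, out, outlen);
	fclose(mounts);
	return ret;
}
```

A program would call find_debugfs() once at startup and append "/tracing/tracing_enabled" to the result instead of hard coding /debug.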
1060 1065
1061dynamic ftrace 1066dynamic ftrace
1062-------------- 1067--------------
1063 1068
1064If CONFIG_DYNAMIC_FTRACE is set, then the system will run with 1069If CONFIG_DYNAMIC_FTRACE is set, the system will run with
1065virtually no overhead when function tracing is disabled. The way 1070virtually no overhead when function tracing is disabled. The way
1066this works is the mcount function call (placed at the start of 1071this works is the mcount function call (placed at the start of
1067every kernel function, produced by the -pg switch in gcc), starts 1072every kernel function, produced by the -pg switch in gcc), starts
1068of pointing to a simple return. 1073of pointing to a simple return. (Enabling FTRACE will include the
1074-pg switch in the compiling of the kernel.)
1069 1075
1070When dynamic ftrace is initialized, it calls kstop_machine to make it 1076When dynamic ftrace is initialized, it calls kstop_machine to make
1071act like a uniprocessor so that it can freely modify code without 1077the machine act like a uniprocessor so that it can freely modify code
1072worrying about other processors executing that same code. At 1078without worrying about other processors executing that same code. At
1073initialization, the mcount calls are change to call a "record_ip" 1079initialization, the mcount calls are changed to call a "record_ip"
1074function. After this, the first time a kernel function is called, 1080function. After this, the first time a kernel function is called,
1075it has the calling address saved in a hash table. 1081it has the calling address saved in a hash table.
1076 1082
@@ -1078,15 +1084,15 @@ Later on the ftraced kernel thread is awoken and will again call
1078kstop_machine if new functions have been recorded. The ftraced thread 1084kstop_machine if new functions have been recorded. The ftraced thread
1079will change all calls to mcount to "nop". Just calling mcount 1085will change all calls to mcount to "nop". Just calling mcount
1080and having mcount return has shown a 10% overhead. By converting 1086and having mcount return has shown a 10% overhead. By converting
1081it to a nop, there is no recordable overhead to the system. 1087it to a nop, there is no measurable overhead to the system.
1082 1088
1083One special side-effect to the recording of the functions being 1089One special side-effect to the recording of the functions being
1084traced, is that we can now selectively choose which functions we 1090traced is that we can now selectively choose which functions we
1085want to trace and which ones we want the mcount calls to remain as 1091wish to trace and which ones we want the mcount calls to remain as
1086nops. 1092nops.
1087 1093
1088Two files that contain to the enabling and disabling of recorded 1094Two files are used, one for enabling and one for disabling the tracing
1089functions are: 1095of specified functions. They are:
1090 1096
1091 set_ftrace_filter 1097 set_ftrace_filter
1092 1098
@@ -1094,7 +1100,7 @@ and
1094 1100
1095 set_ftrace_notrace 1101 set_ftrace_notrace
1096 1102
1097A list of available functions that you can add to this files is listed 1103A list of available functions that you can add to these files is listed
1098in: 1104in:
1099 1105
1100 available_filter_functions 1106 available_filter_functions
@@ -1108,7 +1114,7 @@ pick_next_task_fair
1108mutex_lock 1114mutex_lock
1109[...] 1115[...]
1110 1116
1111If I'm only interested in sys_nanosleep and hrtimer_interrupt: 1117If I am only interested in sys_nanosleep and hrtimer_interrupt:
1112 1118
1113 # echo sys_nanosleep hrtimer_interrupt \ 1119 # echo sys_nanosleep hrtimer_interrupt \
1114 > /debug/tracing/set_ftrace_filter 1120 > /debug/tracing/set_ftrace_filter
@@ -1125,21 +1131,21 @@ If I'm only interested in sys_nanosleep and hrtimer_interrupt:
1125 usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call 1131 usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call
1126 <idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt 1132 <idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt
1127 1133
1128To see what functions are being traced, you can cat the file: 1134To see which functions are being traced, you can cat the file:
1129 1135
1130 # cat /debug/tracing/set_ftrace_filter 1136 # cat /debug/tracing/set_ftrace_filter
1131hrtimer_interrupt 1137hrtimer_interrupt
1132sys_nanosleep 1138sys_nanosleep
1133 1139
1134 1140
1135Perhaps this isn't enough. The filters also allow simple wild cards. 1141Perhaps this is not enough. The filters also allow simple wild cards.
1136Only the following is currently available 1142Only the following are currently available
1137 1143
1138 <match>* - will match functions that begins with <match> 1144 <match>* - will match functions that begin with <match>
1139 *<match> - will match functions that end with <match> 1145 *<match> - will match functions that end with <match>
1140 *<match>* - will match functions that have <match> in it 1146 *<match>* - will match functions that have <match> in it
1141 1147
1142Thats all the wild cards that are allowed. 1148These are the only wild cards which are supported.
1143 1149
1144 <match>*<match> will not work. 1150 <match>*<match> will not work.
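
The three supported wild card forms can be illustrated with a small user-space matcher. This is a sketch of the matching rules as described above, not the kernel's implementation; the name ftrace_style_match() is hypothetical, and treating a '*' in the middle of a pattern as matching nothing is an assumption made for the example.

```c
#include <string.h>

/*
 * Return 1 if name matches pat under the three documented forms:
 *   <match>*    name begins with <match>
 *   *<match>    name ends with <match>
 *   *<match>*   name contains <match>
 * A pattern without '*' must match exactly. A '*' in the middle
 * (<match>*<match>) is unsupported and matches nothing here.
 */
int ftrace_style_match(const char *pat, const char *name)
{
	size_t plen = strlen(pat);
	size_t nlen = strlen(name);
	int lead = plen > 0 && pat[0] == '*';
	int trail = plen > (size_t)lead && pat[plen - 1] == '*';
	size_t blen = plen - lead - trail;
	char body[128];

	if (blen >= sizeof(body))
		return 0;
	memcpy(body, pat + lead, blen);
	body[blen] = '\0';

	if (strchr(body, '*'))
		return 0;	/* <match>*<match> is not supported */

	if (lead && trail)
		return strstr(name, body) != NULL;
	if (lead)
		return nlen >= blen && strcmp(name + nlen - blen, body) == 0;
	if (trail)
		return strncmp(name, body, blen) == 0;
	return strcmp(name, body) == 0;
}
```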
1145 1151
@@ -1187,7 +1193,7 @@ This is because the '>' and '>>' act just like they do in bash.
1187To rewrite the filters, use '>' 1193To rewrite the filters, use '>'
1188To append to the filters, use '>>' 1194To append to the filters, use '>>'
1189 1195
1190To clear out a filter so that all functions will be recorded again. 1196To clear out a filter so that all functions will be recorded again:
1191 1197
1192 # echo > /debug/tracing/set_ftrace_filter 1198 # echo > /debug/tracing/set_ftrace_filter
1193 # cat /debug/tracing/set_ftrace_filter 1199 # cat /debug/tracing/set_ftrace_filter
@@ -1246,24 +1252,24 @@ ftraced
1246 1252
1247As mentioned above, when dynamic ftrace is configured in, a kernel 1253As mentioned above, when dynamic ftrace is configured in, a kernel
1248thread wakes up once a second and checks to see if there are mcount 1254thread wakes up once a second and checks to see if there are mcount
1249calls that need to be converted into nops. If there is not, then 1255calls that need to be converted into nops. If there are not any, then
1250it simply goes back to sleep. But if there is, it will call 1256it simply goes back to sleep. But if there are some, it will call
1251kstop_machine to convert the calls to nops. 1257kstop_machine to convert the calls to nops.
1252 1258
1253There may be a case that you do not want this added latency. 1259There may be a case in which you do not want this added latency.
1254Perhaps you are doing some audio recording and this activity might 1260Perhaps you are doing some audio recording and this activity might
1255cause skips in the playback. There is an interface to disable 1261cause skips in the playback. There is an interface to disable
1256and enable the ftraced kernel thread. 1262and enable the "ftraced" kernel thread.
1257 1263
1258 # echo 0 > /debug/tracing/ftraced_enabled 1264 # echo 0 > /debug/tracing/ftraced_enabled
1259 1265
1260This will disable the calling of the kstop_machine to update the 1266This will disable the calling of kstop_machine to update the
1261mcount calls to nops. Remember that there's a large overhead 1267mcount calls to nops. Remember that there is a large overhead
1262to calling mcount. Without this kernel thread, that overhead will 1268to calling mcount. Without this kernel thread, that overhead will
1263exist. 1269exist.
1264 1270
1265Any write to the ftraced_enabled file will cause the kstop_machine 1271If there are recorded calls to mcount, any write to the ftraced_enabled
1266to run if there are recorded calls to mcount. This means that a 1272file will cause the kstop_machine to run. This means that a
1267user can manually perform the updates when they want to by simply 1273user can manually perform the updates when they want to by simply
1268echoing a '0' into the ftraced_enabled file. 1274echoing a '0' into the ftraced_enabled file.
1269 1275
@@ -1274,8 +1280,8 @@ that uses ftrace function recording.
1274trace_pipe 1280trace_pipe
1275---------- 1281----------
1276 1282
1277The trace_pipe outputs the same as trace, but the effect on the 1283The trace_pipe outputs the same content as the trace file, but the effect
1278tracing is different. Every read from trace_pipe is consumed. 1284on the tracing is different. Every read from trace_pipe is consumed.
1279This means that subsequent reads will be different. The trace 1285This means that subsequent reads will be different. The trace
1280is live. 1286is live.
1281 1287
@@ -1305,7 +1311,7 @@ is live.
1305 bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up 1311 bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up
1306 1312
1307 1313
1308Note, reading the trace_pipe will block until more input is added. 1314Note, reading the trace_pipe file will block until more input is added.
1309By changing the tracer, trace_pipe will issue an EOF. We needed 1315By changing the tracer, trace_pipe will issue an EOF. We needed
1310to set the ftrace tracer _before_ cating the trace_pipe file. 1316to set the ftrace tracer _before_ cating the trace_pipe file.
1311 1317
@@ -1314,8 +1320,8 @@ trace entries
1314------------- 1320-------------
1315 1321
1316Having too much or not enough data can be troublesome in diagnosing 1322Having too much or not enough data can be troublesome in diagnosing
1317some issue in the kernel. The file trace_entries is used to modify 1323an issue in the kernel. The file trace_entries is used to modify
1318the size of the internal trace buffers. The numbers listed 1324the size of the internal trace buffers. The number listed
1319is the number of entries that can be recorded per CPU. To know 1325is the number of entries that can be recorded per CPU. To know
1320the full size, multiply the number of possible CPUS with the 1326the full size, multiply the number of possible CPUS with the
1321number of entries. 1327number of entries.
@@ -1323,8 +1329,9 @@ number of entries.
1323 # cat /debug/tracing/trace_entries 1329 # cat /debug/tracing/trace_entries
132465620 133065620
1325 1331
1326Note, to modify this you must have tracing fulling disabled. To do that, 1332Note, to modify this, you must have tracing completely disabled. To do that,
1327echo "none" into the current_tracer. 1333echo "none" into the current_tracer. If the current_tracer is not set
1334to "none", an EINVAL error will be returned.
1328 1335
1329 # echo none > /debug/tracing/current_tracer 1336 # echo none > /debug/tracing/current_tracer
1330 # echo 100000 > /debug/tracing/trace_entries 1337 # echo 100000 > /debug/tracing/trace_entries
@@ -1333,18 +1340,18 @@ echo "none" into the current_tracer.
1333 1340
1334 1341
1335Notice that we echoed in 100,000 but the size is 100,045. The entries 1342Notice that we echoed in 100,000 but the size is 100,045. The entries
1336are held by individual pages. It allocates the number of pages it takes 1343are held in individual pages. It allocates the number of pages it takes
1337to fulfill the request. If more entries may fit on the last page 1344to fulfill the request. If more entries may fit on the last page
1338it will add them. 1345then they will be added.
1339 1346
1340 # echo 1 > /debug/tracing/trace_entries 1347 # echo 1 > /debug/tracing/trace_entries
1341 # cat /debug/tracing/trace_entries 1348 # cat /debug/tracing/trace_entries
134285 134985
1343 1350
1344This shows us that 85 entries can fit on a single page. 1351This shows us that 85 entries can fit in a single page.
1345 1352
1346The number of pages that will be allocated is a percentage of available 1353The number of pages which will be allocated is limited to a percentage
1347memory. Allocating too much will produces an error. 1354of available memory. Allocating too much will produce an error.
1348 1355
1349 # echo 1000000000000 > /debug/tracing/trace_entries 1356 # echo 1000000000000 > /debug/tracing/trace_entries
1350-bash: echo: write error: Cannot allocate memory 1357-bash: echo: write error: Cannot allocate memory
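
The 100,045 figure in the trace_entries hunk is consistent with rounding the request up to whole pages of 85 entries: 100,000 entries need 1177 pages, and 1177 * 85 = 100,045. The arithmetic can be sketched as below. The function name round_up_entries() is hypothetical, and the assumption that the kernel does nothing more than round up to a page multiple is not confirmed by the text.

```c
/*
 * Round a requested entry count up to a whole number of pages,
 * given how many entries fit in one page.
 */
unsigned long round_up_entries(unsigned long requested,
			       unsigned long per_page)
{
	unsigned long pages = (requested + per_page - 1) / per_page;

	return pages * per_page;
}
```

With 85 entries per page, a request for 1 entry yields 85 and a request for 100000 yields 100045, matching the output shown in the documentation.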
diff --git a/Documentation/i2c/busses/i2c-i810 b/Documentation/i2c/busses/i2c-i810
deleted file mode 100644
index 778210ee1583..000000000000
--- a/Documentation/i2c/busses/i2c-i810
+++ /dev/null
@@ -1,47 +0,0 @@
1Kernel driver i2c-i810
2
3Supported adapters:
4 * Intel 82810, 82810-DC100, 82810E, and 82815 (GMCH)
5 * Intel 82845G (GMCH)
6
7Authors:
8 Frodo Looijaard <frodol@dds.nl>,
9 Philip Edelbrock <phil@netroedge.com>,
10 Kyösti Mälkki <kmalkki@cc.hut.fi>,
11 Ralph Metzler <rjkm@thp.uni-koeln.de>,
12 Mark D. Studebaker <mdsxyz123@yahoo.com>
13
14Main contact: Mark Studebaker <mdsxyz123@yahoo.com>
15
16Description
17-----------
18
19WARNING: If you have an '810' or '815' motherboard, your standard I2C
20temperature sensors are most likely on the 801's I2C bus. You want the
21i2c-i801 driver for those, not this driver.
22
23Now for the i2c-i810...
24
25The GMCH chip contains two I2C interfaces.
26
27The first interface is used for DDC (Data Display Channel) which is a
28serial channel through the VGA monitor connector to a DDC-compliant
29monitor. This interface is defined by the Video Electronics Standards
30Association (VESA). The standards are available for purchase at
31http://www.vesa.org .
32
33The second interface is a general-purpose I2C bus. It may be connected to a
34TV-out chip such as the BT869 or possibly to a digital flat-panel display.
35
36Features
37--------
38
39Both busses use the i2c-algo-bit driver for 'bit banging'
40and support for specific transactions is provided by i2c-algo-bit.
41
42Issues
43------
44
45If you enable bus testing in i2c-algo-bit (insmod i2c-algo-bit bit_test=1),
46the test may fail; if so, the i2c-i810 driver won't be inserted. However,
47we think this has been fixed.
diff --git a/Documentation/i2c/busses/i2c-prosavage b/Documentation/i2c/busses/i2c-prosavage
deleted file mode 100644
index 703687902511..000000000000
--- a/Documentation/i2c/busses/i2c-prosavage
+++ /dev/null
@@ -1,23 +0,0 @@
1Kernel driver i2c-prosavage
2
3Supported adapters:
4
5 S3/VIA KM266/VT8375 aka ProSavage8
6 S3/VIA KM133/VT8365 aka Savage4
7
8Author: Henk Vergonet <henk@god.dyndns.org>
9
10Description
11-----------
12
13The Savage4 chips contain two I2C interfaces (aka a I2C 'master' or
14'host').
15
16The first interface is used for DDC (Data Display Channel) which is a
17serial channel through the VGA monitor connector to a DDC-compliant
18monitor. This interface is defined by the Video Electronics Standards
19Association (VESA). The standards are available for purchase at
20http://www.vesa.org . The second interface is a general-purpose I2C bus.
21
22Usefull for gaining access to the TV Encoder chips.
23
diff --git a/Documentation/i2c/busses/i2c-savage4 b/Documentation/i2c/busses/i2c-savage4
deleted file mode 100644
index 6ecceab618d3..000000000000
--- a/Documentation/i2c/busses/i2c-savage4
+++ /dev/null
@@ -1,26 +0,0 @@
1Kernel driver i2c-savage4
2
3Supported adapters:
4 * Savage4
5 * Savage2000
6
7Authors:
8 Alexander Wold <awold@bigfoot.com>,
9 Mark D. Studebaker <mdsxyz123@yahoo.com>
10
11Description
12-----------
13
14The Savage4 chips contain two I2C interfaces (aka a I2C 'master'
15or 'host').
16
17The first interface is used for DDC (Data Display Channel) which is a
18serial channel through the VGA monitor connector to a DDC-compliant
19monitor. This interface is defined by the Video Electronics Standards
20Association (VESA). The standards are available for purchase at
21http://www.vesa.org . The DDC bus is not yet supported because its register
22is not directly memory-mapped.
23
24The second interface is a general-purpose I2C bus. This is the only
25interface supported by the driver at the moment.
26
diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875
index a0cd8af2f408..10ca43cd1a72 100644
--- a/Documentation/i2c/chips/max6875
+++ b/Documentation/i2c/chips/max6875
@@ -49,7 +49,7 @@ $ modprobe max6875 force=0,0x50
49 49
50The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple 50The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple
51addresses. For example, for address 0x50, it also reserves 0x51. 51addresses. For example, for address 0x50, it also reserves 0x51.
52The even-address instance is called 'max6875', the odd one is 'max6875 subclient'. 52The even-address instance is called 'max6875', the odd one is 'dummy'.
53 53
54 54
55Programming the chip using i2c-dev 55Programming the chip using i2c-dev
diff --git a/Documentation/i2c/chips/pca9539 b/Documentation/i2c/chips/pca9539
index 1d81c530c4a5..6aff890088b1 100644
--- a/Documentation/i2c/chips/pca9539
+++ b/Documentation/i2c/chips/pca9539
@@ -7,7 +7,7 @@ drivers/gpio/pca9539.c instead.
7Supported chips: 7Supported chips:
8 * Philips PCA9539 8 * Philips PCA9539
9 Prefix: 'pca9539' 9 Prefix: 'pca9539'
10 Addresses scanned: 0x74 - 0x77 10 Addresses scanned: none
11 Datasheet: 11 Datasheet:
12 http://www.semiconductors.philips.com/acrobat/datasheets/PCA9539_2.pdf 12 http://www.semiconductors.philips.com/acrobat/datasheets/PCA9539_2.pdf
13 13
@@ -23,6 +23,14 @@ The input sense can also be inverted.
23The 16 lines are split between two bytes. 23The 16 lines are split between two bytes.
24 24
25 25
26Detection
27---------
28
29The PCA9539 is difficult to detect and not commonly found in PC machines,
30so you have to pass the I2C bus and address of the installed PCA9539
31devices explicitly to the driver at load time via the force=... parameter.
32
33
26Sysfs entries 34Sysfs entries
27------------- 35-------------
28 36
diff --git a/Documentation/i2c/chips/pcf8574 b/Documentation/i2c/chips/pcf8574
index 5c1ad1376b62..235815c075ff 100644
--- a/Documentation/i2c/chips/pcf8574
+++ b/Documentation/i2c/chips/pcf8574
@@ -4,13 +4,13 @@ Kernel driver pcf8574
4Supported chips: 4Supported chips:
5 * Philips PCF8574 5 * Philips PCF8574
6 Prefix: 'pcf8574' 6 Prefix: 'pcf8574'
7 Addresses scanned: I2C 0x20 - 0x27 7 Addresses scanned: none
8 Datasheet: Publicly available at the Philips Semiconductors website 8 Datasheet: Publicly available at the Philips Semiconductors website
9 http://www.semiconductors.philips.com/pip/PCF8574P.html 9 http://www.semiconductors.philips.com/pip/PCF8574P.html
10 10
11 * Philips PCF8574A 11 * Philips PCF8574A
12 Prefix: 'pcf8574a' 12 Prefix: 'pcf8574a'
13 Addresses scanned: I2C 0x38 - 0x3f 13 Addresses scanned: none
14 Datasheet: Publicly available at the Philips Semiconductors website 14 Datasheet: Publicly available at the Philips Semiconductors website
15 http://www.semiconductors.philips.com/pip/PCF8574P.html 15 http://www.semiconductors.philips.com/pip/PCF8574P.html
16 16
@@ -38,12 +38,10 @@ For more informations see the datasheet.
38Accessing PCF8574(A) via /sys interface 38Accessing PCF8574(A) via /sys interface
39------------------------------------- 39-------------------------------------
40 40
41! Be careful !
42The PCF8574(A) is plainly impossible to detect ! Stupid chip. 41The PCF8574(A) is plainly impossible to detect ! Stupid chip.
43So every chip with address in the interval [20..27] and [38..3f] are 42So, you have to pass the I2C bus and address of the installed PCF8574
44detected as PCF8574(A). If you have other chips in this address 43and PCF8574A devices explicitly to the driver at load time via the
45range, the workaround is to load this module after the one 44force=... parameter.
46for your others chips.
47 45
48On detection (i.e. insmod, modprobe et al.), directories are being 46On detection (i.e. insmod, modprobe et al.), directories are being
49created for each detected PCF8574(A): 47created for each detected PCF8574(A):
diff --git a/Documentation/i2c/chips/pcf8575 b/Documentation/i2c/chips/pcf8575
index 25f5698a61cf..40b268eb276f 100644
--- a/Documentation/i2c/chips/pcf8575
+++ b/Documentation/i2c/chips/pcf8575
@@ -40,12 +40,9 @@ Detection
40--------- 40---------
41 41
42There is no method known to detect whether a chip on a given I2C address is 42There is no method known to detect whether a chip on a given I2C address is
43a PCF8575 or whether it is any other I2C device. So there are two alternatives 43a PCF8575 or whether it is any other I2C device, so you have to pass the I2C
44to let the driver find the installed PCF8575 devices: 44bus and address of the installed PCF8575 devices explicitly to the driver at
45- Load this driver after any other I2C driver for I2C devices with addresses 45load time via the force=... parameter.
46 in the range 0x20 .. 0x27.
47- Pass the I2C bus and address of the installed PCF8575 devices explicitly to
48 the driver at load time via the probe=... or force=... parameters.
49 46
50/sys interface 47/sys interface
51-------------- 48--------------
diff --git a/Documentation/i2c/fault-codes b/Documentation/i2c/fault-codes
new file mode 100644
index 000000000000..045765c0b9b5
--- /dev/null
+++ b/Documentation/i2c/fault-codes
@@ -0,0 +1,127 @@
1This is a summary of the most important conventions for use of fault
2codes in the I2C/SMBus stack.
3
4
5A "Fault" is not always an "Error"
6----------------------------------
7Not all fault reports imply errors; "page faults" should be a familiar
8example. Software often retries idempotent operations after transient
9faults. There may be fancier recovery schemes that are appropriate in
10some cases, such as re-initializing (and maybe resetting). After such
11recovery, triggered by a fault report, there is no error.
12
13In a similar way, sometimes a "fault" code just reports one defined
14result for an operation ... it doesn't indicate that anything is wrong
15at all, just that the outcome wasn't on the "golden path".
16
17In short, your I2C driver code may need to know these codes in order
18to respond correctly. Other code may need to rely on YOUR code reporting
19the right fault code, so that it can (in turn) behave correctly.
20
21
22I2C and SMBus fault codes
23-------------------------
24These are returned as negative numbers from most calls, with zero or
25some positive number indicating a non-fault return. The specific
26numbers associated with these symbols differ between architectures,
27though most Linux systems use <asm-generic/errno*.h> numbering.
28
29Note that the descriptions here are not exhaustive. There are other
30codes that may be returned, and other cases where these codes should
31be returned. However, drivers should not return other codes for these
32cases (unless the hardware doesn't provide unique fault reports).
33
34Also, codes returned by adapter probe methods follow rules which are
35specific to their host bus (such as PCI, or the platform bus).
36
37
38EAGAIN
39 Returned by I2C adapters when they lose arbitration in master
40 transmit mode: some other master was transmitting different
41 data at the same time.
42
43 Also returned when trying to invoke an I2C operation in an
44 atomic context, when some task is already using that I2C bus
45 to execute some other operation.
46
47EBADMSG
48 Returned by SMBus logic when an invalid Packet Error Code byte
49 is received. This code is a CRC covering all bytes in the
50 transaction, and is sent before the terminating STOP. This
51 fault is only reported on read transactions; the SMBus slave
52 may have a way to report PEC mismatches on writes from the
53 host. Note that even if PECs are in use, you should not rely
54 on these as the only way to detect incorrect data transfers.
55
56EBUSY
57 Returned by SMBus adapters when the bus was busy for longer
58 than allowed. This usually indicates some device (maybe the
59 SMBus adapter) needs some fault recovery (such as resetting),
60 or that the reset was attempted but failed.
61
62EINVAL
63 This rather vague error means an invalid parameter has been
64 detected before any I/O operation was started. Use a more
65 specific fault code when you can.
66
67 One example would be a driver trying an SMBus Block Write
68 with block size outside the range of 1-32 bytes.
69
70EIO
71 This rather vague error means something went wrong when
72 performing an I/O operation. Use a more specific fault
73 code when you can.
74
75ENODEV
76 Returned by driver probe() methods. This is a bit more
77 specific than ENXIO, implying the problem isn't with the
78 address, but with the device found there. Driver probes
79 may verify the device returns *correct* responses, and
80 return this as appropriate. (The driver core will warn
81 about probe faults other than ENXIO and ENODEV.)
82
83ENOMEM
84 Returned by any component that can't allocate memory when
85 it needs to do so.
86
87ENXIO
88 Returned by I2C adapters to indicate that the address phase
89 of a transfer didn't get an ACK. While it might just mean
90 an I2C device was temporarily not responding, usually it
91 means there's nothing listening at that address.
92
93 Returned by driver probe() methods to indicate that they
94 found no device to bind to. (ENODEV may also be used.)
95
96EOPNOTSUPP
97 Returned by an adapter when asked to perform an operation
98 that it doesn't, or can't, support.
99
100 For example, this would be returned when an adapter that
101 doesn't support SMBus block transfers is asked to execute
102 one. In that case, the driver making that request should
103 have verified that functionality was supported before it
104 made that block transfer request.
105
106 Similarly, if an I2C adapter can't execute all legal I2C
107 messages, it should return this when asked to perform a
108 transaction it can't. (These limitations can't be seen in
109 the adapter's functionality mask, since the assumption is
110 that if an adapter supports I2C it supports all of I2C.)
111
112EPROTO
113 Returned when slave does not conform to the relevant I2C
114 or SMBus (or chip-specific) protocol specifications. One
115 case is when the length of an SMBus block data response
116 (from the SMBus slave) is outside the range 1-32 bytes.
117
118ETIMEDOUT
119 This is returned by drivers when an operation took too much
120 time, and was aborted before it completed.
121
122 SMBus adapters may return it when an operation took more
123 time than allowed by the SMBus specification; for example,
124 when a slave stretches clocks too far. I2C has no such
125 timeouts, but it's normal for I2C adapters to impose some
126 arbitrary limits (much longer than SMBus!) too.
127
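
As an illustration of the point that a fault is not always an error, the sketch below retries an idempotent transfer while it reports -EAGAIN (lost arbitration) and passes every other result straight back to the caller. It is a plain C sketch, not kernel code: xfer_fn, xfer_with_retry() and fake_xfer() are hypothetical names, and returning -EINVAL for a non-positive retry count is a choice made here to follow the "invalid parameter detected before any I/O" convention.

```c
#include <errno.h>

/* Hypothetical transfer callback: 0 or positive on success, -Exxx on fault. */
typedef int (*xfer_fn)(void *ctx);

/*
 * Retry a transfer while it reports -EAGAIN, up to max_tries attempts.
 * Any other result (success or a different fault) is returned at once.
 */
int xfer_with_retry(xfer_fn xfer, void *ctx, int max_tries)
{
	int ret = -EINVAL;	/* reported if max_tries <= 0: no I/O attempted */
	int i;

	for (i = 0; i < max_tries; i++) {
		ret = xfer(ctx);
		if (ret != -EAGAIN)
			break;
	}
	return ret;
}

/*
 * Stand-in for a real transfer, for demonstration only: it loses
 * arbitration *remaining times and then succeeds.
 */
int fake_xfer(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

Only operations that are safe to repeat should be wrapped this way; a fault such as -ENXIO or -EPROTO is returned unchanged so the caller can decide what to do.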
diff --git a/Documentation/i2c/smbus-protocol b/Documentation/i2c/smbus-protocol
index 03f08fb491cc..24bfb65da17d 100644
--- a/Documentation/i2c/smbus-protocol
+++ b/Documentation/i2c/smbus-protocol
@@ -42,8 +42,8 @@ Count (8 bits): A data byte containing the length of a block operation.
42[..]: Data sent by I2C device, as opposed to data sent by the host adapter. 42[..]: Data sent by I2C device, as opposed to data sent by the host adapter.
43 43
44 44
45SMBus Quick Command: i2c_smbus_write_quick() 45SMBus Quick Command
46============================================= 46===================
47 47
48This sends a single bit to the device, at the place of the Rd/Wr bit. 48This sends a single bit to the device, at the place of the Rd/Wr bit.
49 49
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index d4cd4126d1ad..6b61b3a2e90b 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -44,6 +44,10 @@ static struct i2c_driver foo_driver = {
44 .id_table = foo_ids, 44 .id_table = foo_ids,
45 .probe = foo_probe, 45 .probe = foo_probe,
46 .remove = foo_remove, 46 .remove = foo_remove,
47 /* if device autodetection is needed: */
48 .class = I2C_CLASS_SOMETHING,
49 .detect = foo_detect,
50 .address_data = &addr_data,
47 51
48 /* else, driver uses "legacy" binding model: */ 52 /* else, driver uses "legacy" binding model: */
49 .attach_adapter = foo_attach_adapter, 53 .attach_adapter = foo_attach_adapter,
@@ -217,6 +221,31 @@ in the I2C bus driver. You may want to save the returned i2c_client
217reference for later use. 221reference for later use.
218 222
219 223
224Device Detection (Standard driver model)
225----------------------------------------
226
227Sometimes you do not know in advance which I2C devices are connected to
228a given I2C bus. This is for example the case of hardware monitoring
229devices on a PC's SMBus. In that case, you may want to let your driver
230detect supported devices automatically. This is how the legacy model
231was working, and is now available as an extension to the standard
232driver model (so that we can finally get rid of the legacy model.)
233
234You simply have to define a detect callback which will attempt to
235identify supported devices (returning 0 for supported ones and -ENODEV
236for unsupported ones), a list of addresses to probe, and a device type
237(or class) so that only I2C buses which may have that type of device
238connected (and not otherwise enumerated) will be probed. The i2c
239core will then call you back as needed and will instantiate a device
240for you for every successful detection.
241
242Note that this mechanism is purely optional and not suitable for all
243devices. You need some reliable way to identify the supported devices
244(typically using device-specific, dedicated identification registers),
245otherwise misdetections are likely to occur and things can go wrong
246quickly.
247
248
220Device Deletion (Standard driver model) 249Device Deletion (Standard driver model)
221--------------------------------------- 250---------------------------------------
222 251
@@ -569,7 +598,6 @@ SMBus communication
569 in terms of it. Never use this function directly! 598 in terms of it. Never use this function directly!
570 599
571 600
572 extern s32 i2c_smbus_write_quick(struct i2c_client * client, u8 value);
573 extern s32 i2c_smbus_read_byte(struct i2c_client * client); 601 extern s32 i2c_smbus_read_byte(struct i2c_client * client);
574 extern s32 i2c_smbus_write_byte(struct i2c_client * client, u8 value); 602 extern s32 i2c_smbus_write_byte(struct i2c_client * client, u8 value);
575 extern s32 i2c_smbus_read_byte_data(struct i2c_client * client, u8 command); 603 extern s32 i2c_smbus_read_byte_data(struct i2c_client * client, u8 command);
@@ -578,30 +606,31 @@ SMBus communication
578 extern s32 i2c_smbus_read_word_data(struct i2c_client * client, u8 command); 606 extern s32 i2c_smbus_read_word_data(struct i2c_client * client, u8 command);
579 extern s32 i2c_smbus_write_word_data(struct i2c_client * client, 607 extern s32 i2c_smbus_write_word_data(struct i2c_client * client,
580 u8 command, u16 value); 608 u8 command, u16 value);
609 extern s32 i2c_smbus_read_block_data(struct i2c_client * client,
610 u8 command, u8 *values);
581 extern s32 i2c_smbus_write_block_data(struct i2c_client * client, 611 extern s32 i2c_smbus_write_block_data(struct i2c_client * client,
582 u8 command, u8 length, 612 u8 command, u8 length,
583 u8 *values); 613 u8 *values);
584 extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client, 614 extern s32 i2c_smbus_read_i2c_block_data(struct i2c_client * client,
585 u8 command, u8 length, u8 *values); 615 u8 command, u8 length, u8 *values);
586
587These ones were removed in Linux 2.6.10 because they had no users, but could
588be added back later if needed:
589
590 extern s32 i2c_smbus_read_block_data(struct i2c_client * client,
591 u8 command, u8 *values);
592 extern s32 i2c_smbus_write_i2c_block_data(struct i2c_client * client, 616 extern s32 i2c_smbus_write_i2c_block_data(struct i2c_client * client,
593 u8 command, u8 length, 617 u8 command, u8 length,
594 u8 *values); 618 u8 *values);
619
620These ones were removed from i2c-core because they had no users, but could
621be added back later if needed:
622
623 extern s32 i2c_smbus_write_quick(struct i2c_client * client, u8 value);
595 extern s32 i2c_smbus_process_call(struct i2c_client * client, 624 extern s32 i2c_smbus_process_call(struct i2c_client * client,
596 u8 command, u16 value); 625 u8 command, u16 value);
597 extern s32 i2c_smbus_block_process_call(struct i2c_client *client, 626 extern s32 i2c_smbus_block_process_call(struct i2c_client *client,
598 u8 command, u8 length, 627 u8 command, u8 length,
599 u8 *values); 628 u8 *values);
600 629
601All these transactions return -1 on failure. The 'write' transactions 630All these transactions return a negative errno value on failure. The 'write'
602return 0 on success; the 'read' transactions return the read value, except 631transactions return 0 on success; the 'read' transactions return the read
603for read_block, which returns the number of values read. The block buffers 632value, except for block transactions, which return the number of values
604need not be longer than 32 bytes. 633read. The block buffers need not be longer than 32 bytes.
605 634
606You can read the file `smbus-protocol' for more information about the 635You can read the file `smbus-protocol' for more information about the
607actual SMBus protocol. 636actual SMBus protocol.
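The return convention described in the updated text (a negative errno
value on failure, the read value for 'read' transactions, and the number
of values for block transactions) can be handled as in the following
user-space sketch. The helper names and the choice of -EPROTO for an
oversized block count are assumptions made for illustration; they are not
part of i2c-core.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SMBUS_BLOCK_MAX 32 /* block buffers need not be longer than 32 bytes */

/*
 * Hypothetical helper for byte/word 'read' transactions: a negative
 * result is an errno value and is passed through; a non-negative result
 * is the value that was read.
 */
int demo_smbus_read_result(int32_t ret, uint16_t *value)
{
	if (ret < 0)
		return (int)ret;
	*value = (uint16_t)ret;
	return 0;
}

/*
 * Hypothetical helper for block transactions: a non-negative result is
 * the number of values read. A count above 32 is treated as a protocol
 * error here (an assumption of this sketch).
 */
int demo_smbus_block_count(int32_t ret, size_t *count)
{
	if (ret < 0)
		return (int)ret;
	if (ret > DEMO_SMBUS_BLOCK_MAX)
		return -EPROTO;
	*count = (size_t)ret;
	return 0;
}
```

Callers that previously compared the result against -1 would, under the
new convention, test for any negative value instead.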
diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt
index 240ce7a56c40..3bb5f466a90d 100644
--- a/Documentation/ioctl-number.txt
+++ b/Documentation/ioctl-number.txt
@@ -117,6 +117,7 @@ Code Seq# Include File Comments
117 <mailto:natalia@nikhefk.nikhef.nl> 117 <mailto:natalia@nikhefk.nikhef.nl>
118'c' 00-7F linux/comstats.h conflict! 118'c' 00-7F linux/comstats.h conflict!
119'c' 00-7F linux/coda.h conflict! 119'c' 00-7F linux/coda.h conflict!
120'c' 80-9F asm-s390/chsc.h
120'd' 00-FF linux/char/drm/drm/h conflict! 121'd' 00-FF linux/char/drm/drm/h conflict!
121'd' 00-DF linux/video_decoder.h conflict! 122'd' 00-DF linux/video_decoder.h conflict!
122'd' F0-FF linux/digi1.h 123'd' F0-FF linux/digi1.h
diff --git a/Documentation/ioctl/hdio.txt b/Documentation/ioctl/hdio.txt
index c19efdeace2c..91a6ecbae0bb 100644
--- a/Documentation/ioctl/hdio.txt
+++ b/Documentation/ioctl/hdio.txt
@@ -508,12 +508,13 @@ HDIO_DRIVE_RESET execute a device reset
508 508
509 error returns: 509 error returns:
510 EACCES Access denied: requires CAP_SYS_ADMIN 510 EACCES Access denied: requires CAP_SYS_ADMIN
511 ENXIO No such device: phy dead or ctl_addr == 0
512 EIO I/O error: reset timed out or hardware error
511 513
512 notes: 514 notes:
513 515
514 Abort any current command, prevent anything else from being 516 Execute a reset on the device as soon as the current IO
515 queued, execute a reset on the device, and issue BLKRRPART 517 operation has completed.
516 ioctl on the block device.
517 518
518 Executes an ATAPI soft reset if applicable, otherwise 519 Executes an ATAPI soft reset if applicable, otherwise
519 executes an ATA soft reset on the controller. 520 executes an ATA soft reset on the controller.
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index df262b3c3d6e..f5662b7a34d1 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -147,10 +147,14 @@ and is between 256 and 4096 characters. It is defined in the file
147 default: 0 147 default: 0
148 148
149 acpi_sleep= [HW,ACPI] Sleep options 149 acpi_sleep= [HW,ACPI] Sleep options
150 Format: { s3_bios, s3_mode, s3_beep } 150 Format: { s3_bios, s3_mode, s3_beep, old_ordering }
151 See Documentation/power/video.txt for s3_bios and s3_mode. 151 See Documentation/power/video.txt for s3_bios and s3_mode.
152 s3_beep is for debugging; it makes the PC's speaker beep 152 s3_beep is for debugging; it makes the PC's speaker beep
153 as soon as the kernel's real-mode entry point is called. 153 as soon as the kernel's real-mode entry point is called.
154 old_ordering causes the ACPI 1.0 ordering of the _PTS
155 control method, with respect to putting devices into low
156 power states, to be enforced (the ACPI 2.0 ordering of _PTS
157 is used by default).
154 158
155 acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode 159 acpi_sci= [HW,ACPI] ACPI System Control Interrupt trigger mode
156 Format: { level | edge | high | low } 160 Format: { level | edge | high | low }
@@ -571,6 +575,8 @@ and is between 256 and 4096 characters. It is defined in the file
571 575
572 debug_objects [KNL] Enable object debugging 576 debug_objects [KNL] Enable object debugging
573 577
578 debugpat [X86] Enable PAT debugging
579
574 decnet.addr= [HW,NET] 580 decnet.addr= [HW,NET]
575 Format: <area>[,<node>] 581 Format: <area>[,<node>]
576 See also Documentation/networking/decnet.txt. 582 See also Documentation/networking/decnet.txt.
@@ -756,9 +762,6 @@ and is between 256 and 4096 characters. It is defined in the file
756 hd= [EIDE] (E)IDE hard drive subsystem geometry 762 hd= [EIDE] (E)IDE hard drive subsystem geometry
757 Format: <cyl>,<head>,<sect> 763 Format: <cyl>,<head>,<sect>
758 764
759 hd?= [HW] (E)IDE subsystem
760 hd?lun= See Documentation/ide/ide.txt.
761
762 highmem=nn[KMG] [KNL,BOOT] forces the highmem zone to have an exact 765 highmem=nn[KMG] [KNL,BOOT] forces the highmem zone to have an exact
763 size of <nn>. This works even on boxes that have no 766 size of <nn>. This works even on boxes that have no
764 highmem otherwise. This also works to reduce highmem 767 highmem otherwise. This also works to reduce highmem
@@ -819,7 +822,7 @@ and is between 256 and 4096 characters. It is defined in the file
819 See Documentation/ide/ide.txt. 822 See Documentation/ide/ide.txt.
820 823
821 idle= [X86] 824 idle= [X86]
822 Format: idle=poll or idle=mwait 825 Format: idle=poll or idle=mwait, idle=halt, idle=nomwait
823 Poll forces a polling idle loop that can slightly improve the performance 826 Poll forces a polling idle loop that can slightly improve the performance
824 of waking up an idle CPU, but will use a lot of power and make the system 827 of waking up an idle CPU, but will use a lot of power and make the system
825 run hot. Not recommended. 828 run hot. Not recommended.
@@ -827,6 +830,9 @@ and is between 256 and 4096 characters. It is defined in the file
827 to not use it because it doesn't save as much power as a normal idle 830 to not use it because it doesn't save as much power as a normal idle
828 loop use the MONITOR/MWAIT idle loop anyways. Performance should be the same 831 loop use the MONITOR/MWAIT idle loop anyways. Performance should be the same
829 as idle=poll. 832 as idle=poll.
833 idle=halt. Halt is forced to be used for CPU idle.
834 In such a case C2/C3 won't be used again.
835 idle=nomwait. Disable mwait for CPU C-states.
830 836
831 ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem 837 ide-pci-generic.all-generic-ide [HW] (E)IDE subsystem
832 Claim all unknown PCI IDE storage controllers. 838 Claim all unknown PCI IDE storage controllers.
@@ -1242,6 +1248,11 @@ and is between 256 and 4096 characters. It is defined in the file
1242 mtdparts= [MTD] 1248 mtdparts= [MTD]
1243 See drivers/mtd/cmdlinepart.c. 1249 See drivers/mtd/cmdlinepart.c.
1244 1250
1251 mtdset= [ARM]
1252 ARM/S3C2412 JIVE boot control
1253
1254 See arch/arm/mach-s3c2412/mach-jive.c
1255
1245 mtouchusb.raw_coordinates= 1256 mtouchusb.raw_coordinates=
1246 [HW] Make the MicroTouch USB driver use raw coordinates 1257 [HW] Make the MicroTouch USB driver use raw coordinates
1247 ('y', default) or cooked coordinates ('n') 1258 ('y', default) or cooked coordinates ('n')
@@ -1537,6 +1548,9 @@ and is between 256 and 4096 characters. It is defined in the file
1537 Use with caution as certain devices share 1548 Use with caution as certain devices share
1538 address decoders between ROMs and other 1549 address decoders between ROMs and other
1539 resources. 1550 resources.
1551 norom [X86-32,X86_64] Do not assign address space to
1552 expansion ROMs that do not already have
1553 BIOS assigned address ranges.
1540 irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be 1554 irqmask=0xMMMM [X86-32] Set a bit mask of IRQs allowed to be
1541 assigned automatically to PCI devices. You can 1555 assigned automatically to PCI devices. You can
1542 make the kernel exclude IRQs of your ISA cards 1556 make the kernel exclude IRQs of your ISA cards
@@ -1612,6 +1626,10 @@ and is between 256 and 4096 characters. It is defined in the file
1612 Format: { parport<nr> | timid | 0 } 1626 Format: { parport<nr> | timid | 0 }
1613 See also Documentation/parport.txt. 1627 See also Documentation/parport.txt.
1614 1628
1629 pmtmr= [X86] Manual setup of pmtmr I/O Port.
1630 Override pmtimer IOPort with a hex value.
1631 e.g. pmtmr=0x508
1632
1615 pnpacpi= [ACPI] 1633 pnpacpi= [ACPI]
1616 { off } 1634 { off }
1617 1635
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 6877e7187113..a79633d702bf 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -172,6 +172,7 @@ architectures:
172- ia64 (Does not support probes on instruction slot1.) 172- ia64 (Does not support probes on instruction slot1.)
173- sparc64 (Return probes not yet implemented.) 173- sparc64 (Return probes not yet implemented.)
174- arm 174- arm
175- ppc
175 176
1763. Configuring Kprobes 1773. Configuring Kprobes
177 178
diff --git a/Documentation/laptops/acer-wmi.txt b/Documentation/laptops/acer-wmi.txt
index 79b7dbd22141..69b5dd4e5a59 100644
--- a/Documentation/laptops/acer-wmi.txt
+++ b/Documentation/laptops/acer-wmi.txt
@@ -174,8 +174,6 @@ The LED is exposed through the LED subsystem, and can be found in:
174The mail LED is autodetected, so if you don't have one, the LED device won't 174The mail LED is autodetected, so if you don't have one, the LED device won't
175be registered. 175be registered.
176 176
177If you have a mail LED that is not green, please report this to me.
178
179Backlight 177Backlight
180********* 178*********
181 179
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index 1d2a772506cf..de2e5c05d6e7 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -41,23 +41,12 @@ Table of Contents
41 VI - System-on-a-chip devices and nodes 41 VI - System-on-a-chip devices and nodes
42 1) Defining child nodes of an SOC 42 1) Defining child nodes of an SOC
43 2) Representing devices without a current OF specification 43 2) Representing devices without a current OF specification
44 a) MDIO IO device 44 a) PHY nodes
45 b) Gianfar-compatible ethernet nodes 45 b) Interrupt controllers
46 c) PHY nodes 46 c) CFI or JEDEC memory-mapped NOR flash
47 d) Interrupt controllers 47 d) 4xx/Axon EMAC ethernet nodes
48 e) I2C 48 e) Xilinx IP cores
49 f) Freescale SOC USB controllers 49 f) USB EHCI controllers
50 g) Freescale SOC SEC Security Engines
51 h) Board Control and Status (BCSR)
52 i) Freescale QUICC Engine module (QE)
53 j) CFI or JEDEC memory-mapped NOR flash
54 k) Global Utilities Block
55 l) Freescale Communications Processor Module
56 m) Chipselect/Local Bus
57 n) 4xx/Axon EMAC ethernet nodes
58 o) Xilinx IP cores
59 p) Freescale Synchronous Serial Interface
60 q) USB EHCI controllers
61 50
62 VII - Marvell Discovery mv64[345]6x System Controller chips 51 VII - Marvell Discovery mv64[345]6x System Controller chips
63 1) The /system-controller node 52 1) The /system-controller node
@@ -1246,80 +1235,7 @@ descriptions for the SOC devices for which new nodes have been
1246defined; this list will expand as more and more SOC-containing 1235defined; this list will expand as more and more SOC-containing
1247platforms are moved over to use the flattened-device-tree model. 1236platforms are moved over to use the flattened-device-tree model.
1248 1237
1249 a) MDIO IO device 1238 a) PHY nodes
1250
1251 The MDIO is a bus to which the PHY devices are connected. For each
1252 device that exists on this bus, a child node should be created. See
1253 the definition of the PHY node below for an example of how to define
1254 a PHY.
1255
1256 Required properties:
1257 - reg : Offset and length of the register set for the device
1258 - compatible : Should define the compatible device type for the
1259 mdio. Currently, this is most likely to be "fsl,gianfar-mdio"
1260
1261 Example:
1262
1263 mdio@24520 {
1264 reg = <24520 20>;
1265 compatible = "fsl,gianfar-mdio";
1266
1267 ethernet-phy@0 {
1268 ......
1269 };
1270 };
1271
1272
1273 b) Gianfar-compatible ethernet nodes
1274
1275 Required properties:
1276
1277 - device_type : Should be "network"
1278 - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC"
1279 - compatible : Should be "gianfar"
1280 - reg : Offset and length of the register set for the device
1281 - mac-address : List of bytes representing the ethernet address of
1282 this controller
1283 - interrupts : <a b> where a is the interrupt number and b is a
1284 field that represents an encoding of the sense and level
1285 information for the interrupt. This should be encoded based on
1286 the information in section 2) depending on the type of interrupt
1287 controller you have.
1288 - interrupt-parent : the phandle for the interrupt controller that
1289 services interrupts for this device.
1290 - phy-handle : The phandle for the PHY connected to this ethernet
1291 controller.
1292 - fixed-link : <a b c d e> where a is emulated phy id - choose any,
1293 but unique to the all specified fixed-links, b is duplex - 0 half,
1294 1 full, c is link speed - d#10/d#100/d#1000, d is pause - 0 no
1295 pause, 1 pause, e is asym_pause - 0 no asym_pause, 1 asym_pause.
1296
1297 Recommended properties:
1298
1299 - phy-connection-type : a string naming the controller/PHY interface type,
1300 i.e., "mii" (default), "rmii", "gmii", "rgmii", "rgmii-id", "sgmii",
1301 "tbi", or "rtbi". This property is only really needed if the connection
1302 is of type "rgmii-id", as all other connection types are detected by
1303 hardware.
1304
1305
1306 Example:
1307
1308 ethernet@24000 {
1309 #size-cells = <0>;
1310 device_type = "network";
1311 model = "TSEC";
1312 compatible = "gianfar";
1313 reg = <24000 1000>;
1314 mac-address = [ 00 E0 0C 00 73 00 ];
1315 interrupts = <d 3 e 3 12 3>;
1316 interrupt-parent = <40000>;
1317 phy-handle = <2452000>
1318 };
1319
1320
1321
1322 c) PHY nodes
1323 1239
1324 Required properties: 1240 Required properties:
1325 1241
@@ -1347,7 +1263,7 @@ platforms are moved over to use the flattened-device-tree model.
1347 }; 1263 };
1348 1264
1349 1265
1350 d) Interrupt controllers 1266 b) Interrupt controllers
1351 1267
1352 Some SOC devices contain interrupt controllers that are different 1268 Some SOC devices contain interrupt controllers that are different
1353 from the standard Open PIC specification. The SOC device nodes for 1269 from the standard Open PIC specification. The SOC device nodes for
@@ -1360,491 +1276,14 @@ platforms are moved over to use the flattened-device-tree model.
1360 1276
1361 pic@40000 { 1277 pic@40000 {
1362 linux,phandle = <40000>; 1278 linux,phandle = <40000>;
1363 clock-frequency = <0>;
1364 interrupt-controller; 1279 interrupt-controller;
1365 #address-cells = <0>; 1280 #address-cells = <0>;
1366 reg = <40000 40000>; 1281 reg = <40000 40000>;
1367 built-in;
1368 compatible = "chrp,open-pic"; 1282 compatible = "chrp,open-pic";
1369 device_type = "open-pic"; 1283 device_type = "open-pic";
1370 big-endian;
1371 };
1372
1373
1374 e) I2C
1375
1376 Required properties :
1377
1378 - device_type : Should be "i2c"
1379 - reg : Offset and length of the register set for the device
1380
1381 Recommended properties :
1382
1383 - compatible : Should be "fsl-i2c" for parts compatible with
1384 Freescale I2C specifications.
1385 - interrupts : <a b> where a is the interrupt number and b is a
1386 field that represents an encoding of the sense and level
1387 information for the interrupt. This should be encoded based on
1388 the information in section 2) depending on the type of interrupt
1389 controller you have.
1390 - interrupt-parent : the phandle for the interrupt controller that
1391 services interrupts for this device.
1392 - dfsrr : boolean; if defined, indicates that this I2C device has
1393 a digital filter sampling rate register
1394 - fsl5200-clocking : boolean; if defined, indicated that this device
1395 uses the FSL 5200 clocking mechanism.
1396
1397 Example :
1398
1399 i2c@3000 {
1400 interrupt-parent = <40000>;
1401 interrupts = <1b 3>;
1402 reg = <3000 18>;
1403 device_type = "i2c";
1404 compatible = "fsl-i2c";
1405 dfsrr;
1406 };
1407
1408
1409 f) Freescale SOC USB controllers
1410
1411 The device node for a USB controller that is part of a Freescale
1412 SOC is as described in the document "Open Firmware Recommended
1413 Practice : Universal Serial Bus" with the following modifications
1414 and additions :
1415
1416 Required properties :
1417 - compatible : Should be "fsl-usb2-mph" for multi port host USB
1418 controllers, or "fsl-usb2-dr" for dual role USB controllers
1419 - phy_type : For multi port host USB controllers, should be one of
1420 "ulpi", or "serial". For dual role USB controllers, should be
1421 one of "ulpi", "utmi", "utmi_wide", or "serial".
1422 - reg : Offset and length of the register set for the device
1423 - port0 : boolean; if defined, indicates port0 is connected for
1424 fsl-usb2-mph compatible controllers. Either this property or
1425 "port1" (or both) must be defined for "fsl-usb2-mph" compatible
1426 controllers.
1427 - port1 : boolean; if defined, indicates port1 is connected for
1428 fsl-usb2-mph compatible controllers. Either this property or
1429 "port0" (or both) must be defined for "fsl-usb2-mph" compatible
1430 controllers.
1431 - dr_mode : indicates the working mode for "fsl-usb2-dr" compatible
1432 controllers. Can be "host", "peripheral", or "otg". Default to
1433 "host" if not defined for backward compatibility.
1434
1435 Recommended properties :
1436 - interrupts : <a b> where a is the interrupt number and b is a
1437 field that represents an encoding of the sense and level
1438 information for the interrupt. This should be encoded based on
1439 the information in section 2) depending on the type of interrupt
1440 controller you have.
1441 - interrupt-parent : the phandle for the interrupt controller that
1442 services interrupts for this device.
1443
1444 Example multi port host USB controller device node :
1445 usb@22000 {
1446 compatible = "fsl-usb2-mph";
1447 reg = <22000 1000>;
1448 #address-cells = <1>;
1449 #size-cells = <0>;
1450 interrupt-parent = <700>;
1451 interrupts = <27 1>;
1452 phy_type = "ulpi";
1453 port0;
1454 port1;
1455 };
1456
1457 Example dual role USB controller device node :
1458 usb@23000 {
1459 compatible = "fsl-usb2-dr";
1460 reg = <23000 1000>;
1461 #address-cells = <1>;
1462 #size-cells = <0>;
1463 interrupt-parent = <700>;
1464 interrupts = <26 1>;
1465 dr_mode = "otg";
1466 phy = "ulpi";
1467 };
1468
1469
1470 g) Freescale SOC SEC Security Engines
1471
1472 Required properties:
1473
1474 - device_type : Should be "crypto"
1475 - model : Model of the device. Should be "SEC1" or "SEC2"
1476 - compatible : Should be "talitos"
1477 - reg : Offset and length of the register set for the device
1478 - interrupts : <a b> where a is the interrupt number and b is a
1479 field that represents an encoding of the sense and level
1480 information for the interrupt. This should be encoded based on
1481 the information in section 2) depending on the type of interrupt
1482 controller you have.
1483 - interrupt-parent : the phandle for the interrupt controller that
1484 services interrupts for this device.
1485 - num-channels : An integer representing the number of channels
1486 available.
1487 - channel-fifo-len : An integer representing the number of
1488 descriptor pointers each channel fetch fifo can hold.
1489 - exec-units-mask : The bitmask representing what execution units
1490 (EUs) are available. It's a single 32-bit cell. EU information
1491 should be encoded following the SEC's Descriptor Header Dword
1492 EU_SEL0 field documentation, i.e. as follows:
1493
1494 bit 0 = reserved - should be 0
1495 bit 1 = set if SEC has the ARC4 EU (AFEU)
1496 bit 2 = set if SEC has the DES/3DES EU (DEU)
1497 bit 3 = set if SEC has the message digest EU (MDEU)
1498 bit 4 = set if SEC has the random number generator EU (RNG)
1499 bit 5 = set if SEC has the public key EU (PKEU)
1500 bit 6 = set if SEC has the AES EU (AESU)
1501 bit 7 = set if SEC has the Kasumi EU (KEU)
1502
1503 bits 8 through 31 are reserved for future SEC EUs.
1504
1505 - descriptor-types-mask : The bitmask representing what descriptors
1506 are available. It's a single 32-bit cell. Descriptor type
1507 information should be encoded following the SEC's Descriptor
1508 Header Dword DESC_TYPE field documentation, i.e. as follows:
1509
1510 bit 0 = set if SEC supports the aesu_ctr_nonsnoop desc. type
1511 bit 1 = set if SEC supports the ipsec_esp descriptor type
1512 bit 2 = set if SEC supports the common_nonsnoop desc. type
1513 bit 3 = set if SEC supports the 802.11i AES ccmp desc. type
1514 bit 4 = set if SEC supports the hmac_snoop_no_afeu desc. type
1515 bit 5 = set if SEC supports the srtp descriptor type
1516 bit 6 = set if SEC supports the non_hmac_snoop_no_afeu desc.type
1517 bit 7 = set if SEC supports the pkeu_assemble descriptor type
1518 bit 8 = set if SEC supports the aesu_key_expand_output desc.type
1519 bit 9 = set if SEC supports the pkeu_ptmul descriptor type
1520 bit 10 = set if SEC supports the common_nonsnoop_afeu desc. type
1521 bit 11 = set if SEC supports the pkeu_ptadd_dbl descriptor type
1522
1523 ..and so on and so forth.
1524
1525 Example:
1526
1527 /* MPC8548E */
1528 crypto@30000 {
1529 device_type = "crypto";
1530 model = "SEC2";
1531 compatible = "talitos";
1532 reg = <30000 10000>;
1533 interrupts = <1d 3>;
1534 interrupt-parent = <40000>;
1535 num-channels = <4>;
1536 channel-fifo-len = <18>;
1537 exec-units-mask = <000000fe>;
1538 descriptor-types-mask = <012b0ebf>;
1539 };
1540
1541 h) Board Control and Status (BCSR)
1542
1543 Required properties:
1544
1545 - device_type : Should be "board-control"
1546 - reg : Offset and length of the register set for the device
1547
1548 Example:
1549
1550 bcsr@f8000000 {
1551 device_type = "board-control";
1552 reg = <f8000000 8000>;
1553 };
1554
1555 i) Freescale QUICC Engine module (QE)
1556 This represents qe module that is installed on PowerQUICC II Pro.
1557
1558 NOTE: This is an interim binding; it should be updated to fit
1559 in with the CPM binding later in this document.
1560
1561 Basically, it is a bus of devices, that could act more or less
1562 as a complete entity (UCC, USB etc ). All of them should be siblings on
1563 the "root" qe node, using the common properties from there.
1564 The description below applies to the qe of MPC8360 and
1565 more nodes and properties would be extended in the future.
1566
1567 i) Root QE device
1568
1569 Required properties:
1570 - compatible : should be "fsl,qe";
1571 - model : precise model of the QE, Can be "QE", "CPM", or "CPM2"
1572 - reg : offset and length of the device registers.
1573 - bus-frequency : the clock frequency for QUICC Engine.
1574
1575 Recommended properties
1576 - brg-frequency : the internal clock source frequency for baud-rate
1577 generators in Hz.
1578
1579 Example:
1580 qe@e0100000 {
1581 #address-cells = <1>;
1582 #size-cells = <1>;
1583 #interrupt-cells = <2>;
1584 compatible = "fsl,qe";
1585 ranges = <0 e0100000 00100000>;
1586 reg = <e0100000 480>;
1587 brg-frequency = <0>;
1588 bus-frequency = <179A7B00>;
1589 }
1590
1591
1592 ii) SPI (Serial Peripheral Interface)
1593
1594 Required properties:
1595 - cell-index : SPI controller index.
1596 - compatible : should be "fsl,spi".
1597 - mode : the SPI operation mode, it can be "cpu" or "cpu-qe".
1598 - reg : Offset and length of the register set for the device
1599 - interrupts : <a b> where a is the interrupt number and b is a
1600 field that represents an encoding of the sense and level
1601 information for the interrupt. This should be encoded based on
1602 the information in section 2) depending on the type of interrupt
1603 controller you have.
1604 - interrupt-parent : the phandle for the interrupt controller that
1605 services interrupts for this device.
1606
1607 Example:
1608 spi@4c0 {
1609 cell-index = <0>;
1610 compatible = "fsl,spi";
1611 reg = <4c0 40>;
1612 interrupts = <82 0>;
1613 interrupt-parent = <700>;
1614 mode = "cpu";
1615 };
1616
1617
1618 iii) USB (Universal Serial Bus Controller)
1619
1620 Required properties:
1621 - compatible : could be "qe_udc" or "fhci-hcd".
1622 - mode : the could be "host" or "slave".
1623 - reg : Offset and length of the register set for the device
1624 - interrupts : <a b> where a is the interrupt number and b is a
1625 field that represents an encoding of the sense and level
1626 information for the interrupt. This should be encoded based on
1627 the information in section 2) depending on the type of interrupt
1628 controller you have.
1629 - interrupt-parent : the phandle for the interrupt controller that
1630 services interrupts for this device.
1631
1632 Example(slave):
1633 usb@6c0 {
1634 compatible = "qe_udc";
1635 reg = <6c0 40>;
1636 interrupts = <8b 0>;
1637 interrupt-parent = <700>;
1638 mode = "slave";
1639 }; 1284 };
1640 1285
1641 1286 c) CFI or JEDEC memory-mapped NOR flash
1642 iv) UCC (Unified Communications Controllers)
1643
1644 Required properties:
1645 - device_type : should be "network", "hldc", "uart", "transparent"
1646 "bisync", "atm", or "serial".
1647 - compatible : could be "ucc_geth" or "fsl_atm" and so on.
1648 - cell-index : the ucc number(1-8), corresponding to UCCx in UM.
1649 - reg : Offset and length of the register set for the device
1650 - interrupts : <a b> where a is the interrupt number and b is a
1651 field that represents an encoding of the sense and level
1652 information for the interrupt. This should be encoded based on
1653 the information in section 2) depending on the type of interrupt
1654 controller you have.
1655 - interrupt-parent : the phandle for the interrupt controller that
1656 services interrupts for this device.
1657 - pio-handle : The phandle for the Parallel I/O port configuration.
1658 - port-number : for UART drivers, the port number to use, between 0 and 3.
1659 This usually corresponds to the /dev/ttyQE device, e.g. <0> = /dev/ttyQE0.
1660 The port number is added to the minor number of the device. Unlike the
1661 CPM UART driver, the port-number is required for the QE UART driver.
1662 - soft-uart : for UART drivers, if specified this means the QE UART device
1663 driver should use "Soft-UART" mode, which is needed on some SOCs that have
1664 broken UART hardware. Soft-UART is provided via a microcode upload.
1665 - rx-clock-name: the UCC receive clock source
1666 "none": clock source is disabled
1667 "brg1" through "brg16": clock source is BRG1-BRG16, respectively
1668 "clk1" through "clk24": clock source is CLK1-CLK24, respectively
1669 - tx-clock-name: the UCC transmit clock source
1670 "none": clock source is disabled
1671 "brg1" through "brg16": clock source is BRG1-BRG16, respectively
1672 "clk1" through "clk24": clock source is CLK1-CLK24, respectively
1673 The following two properties are deprecated. rx-clock has been replaced
1674 with rx-clock-name, and tx-clock has been replaced with tx-clock-name.
1675 Drivers that currently use the deprecated properties should continue to
1676 do so, in order to support older device trees, but they should be updated
1677 to check for the new properties first.
1678 - rx-clock : represents the UCC receive clock source.
1679 0x00 : clock source is disabled;
1680 0x1~0x10 : clock source is BRG1~BRG16 respectively;
1681 0x11~0x28: clock source is QE_CLK1~QE_CLK24 respectively.
1682 - tx-clock: represents the UCC transmit clock source;
1683 0x00 : clock source is disabled;
1684 0x1~0x10 : clock source is BRG1~BRG16 respectively;
1685 0x11~0x28: clock source is QE_CLK1~QE_CLK24 respectively.
1686
1687 Required properties for network device_type:
1688 - mac-address : list of bytes representing the ethernet address.
1689 - phy-handle : The phandle for the PHY connected to this controller.
1690
1691 Recommended properties:
1692 - phy-connection-type : a string naming the controller/PHY interface type,
1693 i.e., "mii" (default), "rmii", "gmii", "rgmii", "rgmii-id" (Internal
1694 Delay), "rgmii-txid" (delay on TX only), "rgmii-rxid" (delay on RX only),
1695 "tbi", or "rtbi".
1696
1697 Example:
1698 ucc@2000 {
1699 device_type = "network";
1700 compatible = "ucc_geth";
1701 cell-index = <1>;
1702 reg = <2000 200>;
1703 interrupts = <a0 0>;
1704 interrupt-parent = <700>;
1705 mac-address = [ 00 04 9f 00 23 23 ];
1706 rx-clock = "none";
1707 tx-clock = "clk9";
1708 phy-handle = <212000>;
1709 phy-connection-type = "gmii";
1710 pio-handle = <140001>;
1711 };
1712
1713
1714 v) Parallel I/O Ports
1715
1716 This node configures Parallel I/O ports for CPUs with QE support.
1717 The node should reside in the "soc" node of the tree. For each
1718 device that using parallel I/O ports, a child node should be created.
1719 See the definition of the Pin configuration nodes below for more
1720 information.
1721
1722 Required properties:
1723 - device_type : should be "par_io".
1724 - reg : offset to the register set and its length.
1725 - num-ports : number of Parallel I/O ports
1726
1727 Example:
1728 par_io@1400 {
1729 reg = <1400 100>;
1730 #address-cells = <1>;
1731 #size-cells = <0>;
1732 device_type = "par_io";
1733 num-ports = <7>;
1734 ucc_pin@01 {
1735 ......
1736 };
1737
1738
1739 vi) Pin configuration nodes
1740
1741 Required properties:
1742 - linux,phandle : phandle of this node; likely referenced by a QE
1743 device.
1744 - pio-map : array of pin configurations. Each pin is defined by 6
1745 integers. The six numbers are respectively: port, pin, dir,
1746 open_drain, assignment, has_irq.
1747 - port : port number of the pin; 0-6 represent port A-G in UM.
1748 - pin : pin number in the port.
1749 - dir : direction of the pin, should be encoded as follows:
1750
1751 0 = The pin is disabled
1752 1 = The pin is an output
1753 2 = The pin is an input
1754 3 = The pin is I/O
1755
1756 - open_drain : indicates the pin is normal or wired-OR:
1757
1758 0 = The pin is actively driven as an output
1759 1 = The pin is an open-drain driver. As an output, the pin is
1760 driven active-low, otherwise it is three-stated.
1761
1762 - assignment : function number of the pin according to the Pin Assignment
1763 tables in User Manual. Each pin can have up to 4 possible functions in
1764 QE and two options for CPM.
1765 - has_irq : indicates if the pin is used as source of external
1766 interrupts.
1767
1768 Example:
1769 ucc_pin@01 {
1770 linux,phandle = <140001>;
1771 pio-map = <
1772 /* port pin dir open_drain assignment has_irq */
1773 0 3 1 0 1 0 /* TxD0 */
1774 0 4 1 0 1 0 /* TxD1 */
1775 0 5 1 0 1 0 /* TxD2 */
1776 0 6 1 0 1 0 /* TxD3 */
1777 1 6 1 0 3 0 /* TxD4 */
1778 1 7 1 0 1 0 /* TxD5 */
1779 1 9 1 0 2 0 /* TxD6 */
1780 1 a 1 0 2 0 /* TxD7 */
1781 0 9 2 0 1 0 /* RxD0 */
1782 0 a 2 0 1 0 /* RxD1 */
1783 0 b 2 0 1 0 /* RxD2 */
1784 0 c 2 0 1 0 /* RxD3 */
1785 0 d 2 0 1 0 /* RxD4 */
1786 1 1 2 0 2 0 /* RxD5 */
1787 1 0 2 0 2 0 /* RxD6 */
1788 1 4 2 0 2 0 /* RxD7 */
1789 0 7 1 0 1 0 /* TX_EN */
1790 0 8 1 0 1 0 /* TX_ER */
1791 0 f 2 0 1 0 /* RX_DV */
1792 0 10 2 0 1 0 /* RX_ER */
1793 0 0 2 0 1 0 /* RX_CLK */
1794 2 9 1 0 3 0 /* GTX_CLK - CLK10 */
1795 2 8 2 0 1 0>; /* GTX125 - CLK9 */
1796 };
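
The six-integer layout of pio-map described above can be illustrated with a short decoding sketch. This Python is not part of the binding: the names PinConfig and decode_pio_map are hypothetical, and it assumes the cells are hexadecimal, as bare numbers are in the legacy dts syntax used by the example.

```python
from typing import NamedTuple

# Direction encoding taken from the "dir" field description above.
DIRECTIONS = {0: "disabled", 1: "output", 2: "input", 3: "I/O"}


class PinConfig(NamedTuple):
    """One decoded pio-map entry (hypothetical helper type)."""
    port: str          # "A".."G", from port numbers 0-6
    pin: int
    direction: str
    open_drain: bool
    assignment: int
    has_irq: bool


def decode_pio_map(cells):
    """Split a flat list of pio-map cells into six-integer pin entries."""
    if len(cells) % 6 != 0:
        raise ValueError("pio-map length must be a multiple of 6")
    pins = []
    for i in range(0, len(cells), 6):
        port, pin, direction, open_drain, assignment, has_irq = cells[i:i + 6]
        pins.append(PinConfig(
            port="ABCDEFG"[port],
            pin=pin,
            direction=DIRECTIONS[direction],
            open_drain=bool(open_drain),
            assignment=assignment,
            has_irq=bool(has_irq),
        ))
    return pins


# Three rows copied from the example: TxD0, RX_ER and GTX_CLK.
raw = "0 3 1 0 1 0   0 10 2 0 1 0   2 9 1 0 3 0"
pins = decode_pio_map([int(tok, 16) for tok in raw.split()])
for entry in pins:
    print(entry)
```

Under the hexadecimal assumption, the RX_ER row decodes to pin 16 of port A rather than pin 10.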
1797
1798 vii) Multi-User RAM (MURAM)
1799
1800 Required properties:
1801 - compatible : should be "fsl,qe-muram", "fsl,cpm-muram".
1802 - mode : the mode could be "host" or "slave".
1803 - ranges : Should be defined as specified in 1) to describe the
1804 translation of MURAM addresses.
1805 - data-only : sub-node which defines the address area under MURAM
1806 bus that can be allocated as data/parameter
1807
1808 Example:
1809
1810 muram@10000 {
1811 compatible = "fsl,qe-muram", "fsl,cpm-muram";
1812 ranges = <0 00010000 0000c000>;
1813
1814 data-only@0{
1815 compatible = "fsl,qe-muram-data",
1816 "fsl,cpm-muram-data";
1817 reg = <0 c000>;
1818 };
1819 };
1820
1821 viii) Uploaded QE firmware
1822
1823 If a new firmware has been uploaded to the QE (usually by the
1824 boot loader), then a 'firmware' child node should be added to the QE
1825 node. This node provides information on the uploaded firmware that
1826 device drivers may need.
1827
1828 Required properties:
1829 - id: The string name of the firmware. This is taken from the 'id'
1830 member of the qe_firmware structure of the uploaded firmware.
1831 Device drivers can search this string to determine if the
1832 firmware they want is already present.
1833 - extended-modes: The Extended Modes bitfield, taken from the
1834 firmware binary. It is a 64-bit number represented
1835 as an array of two 32-bit numbers.
1836 - virtual-traps: The virtual traps, taken from the firmware binary.
1837 It is an array of 8 32-bit numbers.
1838
1839 Example:
1840
1841 firmware {
1842 id = "Soft-UART";
1843 extended-modes = <0 0>;
1844 virtual-traps = <0 0 0 0 0 0 0 0>;
1845 };
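
The extended-modes property stores a 64-bit value as two 32-bit cells. The sketch below shows one way a consumer might recombine them. It assumes the first cell carries the most significant 32 bits, following the usual big-endian cell order of device tree properties; the function name cells_to_u64 is hypothetical.

```python
def cells_to_u64(cells):
    """Combine two 32-bit cells into one 64-bit integer.

    Assumption: cells[0] is the high word and cells[1] the low word.
    """
    if len(cells) != 2:
        raise ValueError("expected exactly two 32-bit cells")
    for cell in cells:
        if not 0 <= cell <= 0xFFFFFFFF:
            raise ValueError("cell does not fit in 32 bits")
    high, low = cells
    return (high << 32) | low


# The example node uses <0 0>, which yields zero.
extended_modes = cells_to_u64([0, 0])
```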
1846
1847 j) CFI or JEDEC memory-mapped NOR flash
1848 1287
1849 Flash chips (Memory Technology Devices) are often used for solid state 1288 Flash chips (Memory Technology Devices) are often used for solid state
1850 file systems on embedded devices. 1289 file systems on embedded devices.
@@ -1908,268 +1347,7 @@ platforms are moved over to use the flattened-device-tree model.
1908 }; 1347 };
1909 }; 1348 };
1910 1349
1911 k) Global Utilities Block 1350 d) 4xx/Axon EMAC ethernet nodes
1912
1913 The global utilities block controls power management, I/O device
1914 enabling, power-on-reset configuration monitoring, general-purpose
1915 I/O signal configuration, alternate function selection for multiplexed
1916 signals, and clock control.
1917
1918 Required properties:
1919
1920 - compatible : Should define the compatible device type for
1921 global-utilities.
1922 - reg : Offset and length of the register set for the device.
1923
1924 Recommended properties:
1925
1926 - fsl,has-rstcr : Indicates that the global utilities register set
1927 contains a functioning "reset control register" (i.e. the board
1928 is wired to reset upon setting the HRESET_REQ bit in this register).
1929
1930 Example:
1931
1932 global-utilities@e0000 { /* global utilities block */
1933 compatible = "fsl,mpc8548-guts";
1934 reg = <e0000 1000>;
1935 fsl,has-rstcr;
1936 };
1937
1938 l) Freescale Communications Processor Module
1939
1940 NOTE: This is an interim binding, and will likely change slightly,
1941 as more devices are supported. The QE bindings especially are
1942 incomplete.
1943
1944 i) Root CPM node
1945
1946 Properties:
1947 - compatible : "fsl,cpm1", "fsl,cpm2", or "fsl,qe".
1948 - reg : A 48-byte region beginning with CPCR.
1949
1950 Example:
1951 cpm@119c0 {
1952 #address-cells = <1>;
1953 #size-cells = <1>;
1954 #interrupt-cells = <2>;
1955 compatible = "fsl,mpc8272-cpm", "fsl,cpm2";
1956 reg = <119c0 30>;
1957 };
1958
1959 ii) Properties common to multiple CPM/QE devices
1960
1961 - fsl,cpm-command : This value is ORed with the opcode and command flag
1962 to specify the device on which a CPM command operates.
1963
1964 - fsl,cpm-brg : Indicates which baud rate generator the device
1965 is associated with. If absent, an unused BRG
1966 should be dynamically allocated. If zero, the
1967 device uses an external clock rather than a BRG.
1968
1969 - reg : Unless otherwise specified, the first resource represents the
1970 scc/fcc/ucc registers, and the second represents the device's
1971 parameter RAM region (if it has one).
1972
1973 iii) Serial
1974
1975 Currently defined compatibles:
1976 - fsl,cpm1-smc-uart
1977 - fsl,cpm2-smc-uart
1978 - fsl,cpm1-scc-uart
1979 - fsl,cpm2-scc-uart
1980 - fsl,qe-uart
1981
1982 Example:
1983
1984 serial@11a00 {
1985 device_type = "serial";
1986 compatible = "fsl,mpc8272-scc-uart",
1987 "fsl,cpm2-scc-uart";
1988 reg = <11a00 20 8000 100>;
1989 interrupts = <28 8>;
1990 interrupt-parent = <&PIC>;
1991 fsl,cpm-brg = <1>;
1992 fsl,cpm-command = <00800000>;
1993 };
1994
1995 iii) Network
1996
1997 Currently defined compatibles:
1998 - fsl,cpm1-scc-enet
1999 - fsl,cpm2-scc-enet
2000 - fsl,cpm1-fec-enet
2001 - fsl,cpm2-fcc-enet (third resource is GFEMR)
2002 - fsl,qe-enet
2003
2004 Example:
2005
2006 ethernet@11300 {
2007 device_type = "network";
2008 compatible = "fsl,mpc8272-fcc-enet",
2009 "fsl,cpm2-fcc-enet";
2010 reg = <11300 20 8400 100 11390 1>;
2011 local-mac-address = [ 00 00 00 00 00 00 ];
2012 interrupts = <20 8>;
2013 interrupt-parent = <&PIC>;
2014 phy-handle = <&PHY0>;
2015 fsl,cpm-command = <12000300>;
2016 };
2017
2018 iv) MDIO
2019
2020 Currently defined compatibles:
2021 fsl,pq1-fec-mdio (reg is same as first resource of FEC device)
2022 fsl,cpm2-mdio-bitbang (reg is port C registers)
2023
2024 Properties for fsl,cpm2-mdio-bitbang:
2025 fsl,mdio-pin : pin of port C controlling mdio data
2026 fsl,mdc-pin : pin of port C controlling mdio clock
2027
2028 Example:
2029
2030 mdio@10d40 {
2031 device_type = "mdio";
2032 compatible = "fsl,mpc8272ads-mdio-bitbang",
2033 "fsl,mpc8272-mdio-bitbang",
2034 "fsl,cpm2-mdio-bitbang";
2035 reg = <10d40 14>;
2036 #address-cells = <1>;
2037 #size-cells = <0>;
2038 fsl,mdio-pin = <12>;
2039 fsl,mdc-pin = <13>;
2040 };
2041
2042 v) Baud Rate Generators
2043
2044 Currently defined compatibles:
2045 fsl,cpm-brg
2046 fsl,cpm1-brg
2047 fsl,cpm2-brg
2048
2049 Properties:
2050 - reg : There may be an arbitrary number of reg resources; BRG
2051 numbers are assigned to these in order.
2052 - clock-frequency : Specifies the base frequency driving
2053 the BRG.
2054
2055 Example:
2056
2057 brg@119f0 {
2058 compatible = "fsl,mpc8272-brg",
2059 "fsl,cpm2-brg",
2060 "fsl,cpm-brg";
2061 reg = <119f0 10 115f0 10>;
2062 clock-frequency = <d#25000000>;
2063 };
2064
2065 vi) Interrupt Controllers
2066
2067 Currently defined compatibles:
2068 - fsl,cpm1-pic
2069 - only one interrupt cell
2070 - fsl,pq1-pic
2071 - fsl,cpm2-pic
2072 - second interrupt cell is level/sense:
2073 - 2 is falling edge
2074 - 8 is active low
2075
2076 Example:
2077
2078 interrupt-controller@10c00 {
2079 #interrupt-cells = <2>;
2080 interrupt-controller;
2081 reg = <10c00 80>;
2082 compatible = "mpc8272-pic", "fsl,cpm2-pic";
2083 };
2084
2085 vii) USB (Universal Serial Bus Controller)
2086
2087 Properties:
2088 - compatible : "fsl,cpm1-usb", "fsl,cpm2-usb", "fsl,qe-usb"
2089
2090 Example:
2091 usb@11bc0 {
2092 #address-cells = <1>;
2093 #size-cells = <0>;
2094 compatible = "fsl,cpm2-usb";
2095 reg = <11b60 18 8b00 100>;
2096 interrupts = <b 8>;
2097 interrupt-parent = <&PIC>;
2098 fsl,cpm-command = <2e600000>;
2099 };
2100
2101 viii) Multi-User RAM (MURAM)
2102
2103 The multi-user/dual-ported RAM is expressed as a bus under the CPM node.
2104
2105 Ranges must be set up subject to the following restrictions:
2106
2107 - Children's reg nodes must be offsets from the start of all muram, even
2108 if the user-data area does not begin at zero.
2109 - If multiple range entries are used, the difference between the parent
2110 address and the child address must be the same in all, so that a single
2111 mapping can cover them all while maintaining the ability to determine
2112 CPM-side offsets with pointer subtraction. It is recommended that
2113 multiple range entries not be used.
2114 - A child address of zero must be translatable, even if no reg resources
2115 contain it.
2116
2117 A child "data" node must exist, compatible with "fsl,cpm-muram-data", to
2118 indicate the portion of muram that is usable by the OS for arbitrary
2119 purposes. The data node may have an arbitrary number of reg resources,
2120 all of which contribute to the allocatable muram pool.
2121
2122 Example, based on mpc8272:
2123
2124 muram@0 {
2125 #address-cells = <1>;
2126 #size-cells = <1>;
2127 ranges = <0 0 10000>;
2128
2129 data@0 {
2130 compatible = "fsl,cpm-muram-data";
2131 reg = <0 2000 9800 800>;
2132 };
2133 };
2134
2135 m) Chipselect/Local Bus
2136
2137 Properties:
2138 - name : Should be localbus
2139 - #address-cells : Should be either two or three. The first cell is the
2140 chipselect number, and the remaining cells are the
2141 offset into the chipselect.
2142 - #size-cells : Either one or two, depending on how large each chipselect
2143 can be.
2144 - ranges : Each range corresponds to a single chipselect, and cover
2145 the entire access window as configured.
2146
2147 Example:
2148 localbus@f0010100 {
2149 compatible = "fsl,mpc8272-localbus",
2150 "fsl,pq2-localbus";
2151 #address-cells = <2>;
2152 #size-cells = <1>;
2153 reg = <f0010100 40>;
2154
2155 ranges = <0 0 fe000000 02000000
2156 1 0 f4500000 00008000>;
2157
2158 flash@0,0 {
2159 compatible = "jedec-flash";
2160 reg = <0 0 2000000>;
2161 bank-width = <4>;
2162 device-width = <1>;
2163 };
2164
2165 board-control@1,0 {
2166 reg = <1 0 20>;
2167 compatible = "fsl,mpc8272ads-bcsr";
2168 };
2169 };
2170
2171
2172 n) 4xx/Axon EMAC ethernet nodes
2173 1351
2174 The EMAC ethernet controller in IBM and AMCC 4xx chips, and also 1352 The EMAC ethernet controller in IBM and AMCC 4xx chips, and also
2175 the Axon bridge. To operate this needs to interact with a ths 1353 the Axon bridge. To operate this needs to interact with a ths
@@ -2317,7 +1495,7 @@ platforms are moved over to use the flattened-device-tree model.
2317 available. 1495 available.
2318 For Axon: 0x0000012a 1496 For Axon: 0x0000012a
2319 1497
2320 o) Xilinx IP cores 1498 e) Xilinx IP cores
2321 1499
2322 The Xilinx EDK toolchain ships with a set of IP cores (devices) for use 1500 The Xilinx EDK toolchain ships with a set of IP cores (devices) for use
2323 in Xilinx Spartan and Virtex FPGAs. The devices cover the whole range 1501 in Xilinx Spartan and Virtex FPGAs. The devices cover the whole range
@@ -2611,206 +1789,7 @@ platforms are moved over to use the flattened-device-tree model.
2611 - reg-offset : A value of 3 is required 1789 - reg-offset : A value of 3 is required
2612 - reg-shift : A value of 2 is required 1790 - reg-shift : A value of 2 is required
2613 1791
2614 1792 f) USB EHCI controllers
2615 p) Freescale Synchronous Serial Interface
2616
2617 The SSI is a serial device that communicates with audio codecs. It can
2618 be programmed in AC97, I2S, left-justified, or right-justified modes.
2619
2620 Required properties:
2621 - compatible : compatible list, containing "fsl,ssi"
2622 - cell-index : the SSI, <0> = SSI1, <1> = SSI2, and so on
2623 - reg : offset and length of the register set for the device
2624 - interrupts : <a b> where a is the interrupt number and b is a
2625 field that represents an encoding of the sense and
2626 level information for the interrupt. This should be
2627 encoded based on the information in section 2)
2628 depending on the type of interrupt controller you
2629 have.
2630 - interrupt-parent : the phandle for the interrupt controller that
2631 services interrupts for this device.
2632 - fsl,mode : the operating mode for the SSI interface
2633 "i2s-slave" - I2S mode, SSI is clock slave
2634 "i2s-master" - I2S mode, SSI is clock master
2635 "lj-slave" - left-justified mode, SSI is clock slave
2636 "lj-master" - l.j. mode, SSI is clock master
2637 "rj-slave" - right-justified mode, SSI is clock slave
2638 "rj-master" - r.j., SSI is clock master
2639 "ac97-slave" - AC97 mode, SSI is clock slave
2640 "ac97-master" - AC97 mode, SSI is clock master
2641
2642 Optional properties:
2643 - codec-handle : phandle to a 'codec' node that defines an audio
2644 codec connected to this SSI. This node is typically
2645 a child of an I2C or other control node.
2646
2647 Child 'codec' node required properties:
2648 - compatible : compatible list, contains the name of the codec
2649
2650 Child 'codec' node optional properties:
2651 - clock-frequency : The frequency of the input clock, which typically
2652 comes from an on-board dedicated oscillator.
2653
2654 * Freescale 83xx DMA Controller
2655
2656 Freescale PowerPC 83xx have on chip general purpose DMA controllers.
2657
2658 Required properties:
2659
2660 - compatible : compatible list, contains 2 entries, first is
2661 "fsl,CHIP-dma", where CHIP is the processor
2662 (mpc8349, mpc8360, etc.) and the second is
2663 "fsl,elo-dma"
2664 - reg : <registers mapping for DMA general status reg>
2665 - ranges : Should be defined as specified in 1) to describe the
2666 DMA controller channels.
2667 - cell-index : controller index. 0 for controller @ 0x8100
2668 - interrupts : <interrupt mapping for DMA IRQ>
2669 - interrupt-parent : optional, if needed for interrupt mapping
2670
2671
2672 - DMA channel nodes:
2673 - compatible : compatible list, contains 2 entries, first is
2674 "fsl,CHIP-dma-channel", where CHIP is the processor
2675 (mpc8349, mpc8350, etc.) and the second is
2676 "fsl,elo-dma-channel"
2677 - reg : <registers mapping for channel>
2678 - cell-index : dma channel index starts at 0.
2679
2680 Optional properties:
2681 - interrupts : <interrupt mapping for DMA channel IRQ>
2682 (on 83xx this is expected to be identical to
2683 the interrupts property of the parent node)
2684 - interrupt-parent : optional, if needed for interrupt mapping
2685
2686 Example:
2687 dma@82a8 {
2688 #address-cells = <1>;
2689 #size-cells = <1>;
2690 compatible = "fsl,mpc8349-dma", "fsl,elo-dma";
2691 reg = <82a8 4>;
2692 ranges = <0 8100 1a4>;
2693 interrupt-parent = <&ipic>;
2694 interrupts = <47 8>;
2695 cell-index = <0>;
2696 dma-channel@0 {
2697 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
2698 cell-index = <0>;
2699 reg = <0 80>;
2700 };
2701 dma-channel@80 {
2702 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
2703 cell-index = <1>;
2704 reg = <80 80>;
2705 };
2706 dma-channel@100 {
2707 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
2708 cell-index = <2>;
2709 reg = <100 80>;
2710 };
2711 dma-channel@180 {
2712 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
2713 cell-index = <3>;
2714 reg = <180 80>;
2715 };
2716 };
2717
2718 * Freescale 85xx/86xx DMA Controller
2719
2720 Freescale PowerPC 85xx/86xx have on chip general purpose DMA controllers.
2721
2722 Required properties:
2723
2724 - compatible : compatible list, contains 2 entries, first is
2725 "fsl,CHIP-dma", where CHIP is the processor
2726 (mpc8540, mpc8560, etc.) and the second is
2727 "fsl,eloplus-dma"
2728 - reg : <registers mapping for DMA general status reg>
2729 - cell-index : controller index. 0 for controller @ 0x21000,
2730 1 for controller @ 0xc000
2731 - ranges : Should be defined as specified in 1) to describe the
2732 DMA controller channels.
2733
2734 - DMA channel nodes:
2735 - compatible : compatible list, contains 2 entries, first is
2736 "fsl,CHIP-dma-channel", where CHIP is the processor
2737 (mpc8540, mpc8560, etc.) and the second is
2738 "fsl,eloplus-dma-channel"
2739 - cell-index : dma channel index starts at 0.
2740 - reg : <registers mapping for channel>
2741 - interrupts : <interrupt mapping for DMA channel IRQ>
2742 - interrupt-parent : optional, if needed for interrupt mapping
2743
2744 Example:
2745 dma@21300 {
2746 #address-cells = <1>;
2747 #size-cells = <1>;
2748 compatible = "fsl,mpc8540-dma", "fsl,eloplus-dma";
2749 reg = <21300 4>;
2750 ranges = <0 21100 200>;
2751 cell-index = <0>;
2752 dma-channel@0 {
2753 compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
2754 reg = <0 80>;
2755 cell-index = <0>;
2756 interrupt-parent = <&mpic>;
2757 interrupts = <14 2>;
2758 };
2759 dma-channel@80 {
2760 compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
2761 reg = <80 80>;
2762 cell-index = <1>;
2763 interrupt-parent = <&mpic>;
2764 interrupts = <15 2>;
2765 };
2766 dma-channel@100 {
2767 compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
2768 reg = <100 80>;
2769 cell-index = <2>;
2770 interrupt-parent = <&mpic>;
2771 interrupts = <16 2>;
2772 };
2773 dma-channel@180 {
2774 compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
2775 reg = <180 80>;
2776 cell-index = <3>;
2777 interrupt-parent = <&mpic>;
2778 interrupts = <17 2>;
2779 };
2780 };
2781
2782 * Freescale 8xxx/3.0 Gb/s SATA nodes
2783
2784 SATA nodes are defined to describe on-chip Serial ATA controllers.
2785 Each SATA port should have its own node.
2786
2787 Required properties:
2788 - compatible : compatible list, contains 2 entries, first is
2789 "fsl,CHIP-sata", where CHIP is the processor
2790 (mpc8315, mpc8379, etc.) and the second is
2791 "fsl,pq-sata"
2792 - interrupts : <interrupt mapping for SATA IRQ>
2793 - cell-index : controller index.
2794 1 for controller @ 0x18000
2795 2 for controller @ 0x19000
2796 3 for controller @ 0x1a000
2797 4 for controller @ 0x1b000
2798
2799 Optional properties:
2800 - interrupt-parent : optional, if needed for interrupt mapping
2801 - reg : <registers mapping>
2802
2803 Example:
2804
2805 sata@18000 {
2806 compatible = "fsl,mpc8379-sata", "fsl,pq-sata";
2807 reg = <0x18000 0x1000>;
2808 cell-index = <1>;
2809 interrupts = <2c 8>;
2810 interrupt-parent = < &ipic >;
2811 };
2812
2813 q) USB EHCI controllers
2814 1793
2815 Required properties: 1794 Required properties:
2816 - compatible : should be "usb-ehci". 1795 - compatible : should be "usb-ehci".
@@ -2836,40 +1815,6 @@ platforms are moved over to use the flattened-device-tree model.
2836 big-endian; 1815 big-endian;
2837 }; 1816 };
2838 1817
2839 r) Freescale Display Interface Unit
2840
2841 The Freescale DIU is an LCD controller. With proper hardware, it can also
2842 drive DVI monitors.
2843
2844 Required properties:
2845 - compatible : should be "fsl,diu".
2846 - reg : should contain at least address and length of the DIU register
2847 set.
2848 - interrupts : one DIU interrupt should be described here.
2849
2850 Example (MPC8610HPCD)
2851 display@2c000 {
2852 compatible = "fsl,diu";
2853 reg = <0x2c000 100>;
2854 interrupts = <72 2>;
2855 interrupt-parent = <&mpic>;
2856 };
2857
2858 s) Freescale on board FPGA
2859
2860 These are the memory-mapped registers for the on-board FPGA.
2861
2862 Required properties:
2863 - compatible : should be "fsl,fpga-pixis".
2864 - reg : should contain the address and the length of the FPGA register
2865 set.
2866
2867 Example (MPC8610HPCD)
2868 board-control@e8000000 {
2869 compatible = "fsl,fpga-pixis";
2870 reg = <0xe8000000 32>;
2871 };
2872
2873VII - Marvell Discovery mv64[345]6x System Controller chips 1818VII - Marvell Discovery mv64[345]6x System Controller chips
2874=========================================================== 1819===========================================================
2875 1820
@@ -3622,14 +2567,11 @@ not necessary as they are usually the same as the root node.
3622 2567
3623 pic@40000 { 2568 pic@40000 {
3624 linux,phandle = <40000>; 2569 linux,phandle = <40000>;
3625 clock-frequency = <0>;
3626 interrupt-controller; 2570 interrupt-controller;
3627 #address-cells = <0>; 2571 #address-cells = <0>;
3628 reg = <40000 40000>; 2572 reg = <40000 40000>;
3629 built-in;
3630 compatible = "chrp,open-pic"; 2573 compatible = "chrp,open-pic";
3631 device_type = "open-pic"; 2574 device_type = "open-pic";
3632 big-endian;
3633 }; 2575 };
3634 2576
3635 i2c@3000 { 2577 i2c@3000 {
diff --git a/Documentation/powerpc/bootwrapper.txt b/Documentation/powerpc/bootwrapper.txt
new file mode 100644
index 000000000000..d60fced5e1cc
--- /dev/null
+++ b/Documentation/powerpc/bootwrapper.txt
@@ -0,0 +1,141 @@
1The PowerPC boot wrapper
2------------------------
3Copyright (C) Secret Lab Technologies Ltd.
4
5PowerPC image targets compress and wrap the kernel image (vmlinux) with
6a boot wrapper to make it usable by the system firmware. There is no
7standard PowerPC firmware interface, so the boot wrapper is designed to
8be adaptable for each kind of image that needs to be built.
9
10The boot wrapper can be found in the arch/powerpc/boot/ directory. The
11Makefile in that directory has targets for all the available image types.
12The different image types are used to support all of the various firmware
13interfaces found on PowerPC platforms. OpenFirmware is the most commonly
14used firmware type on general purpose PowerPC systems from Apple, IBM and
15others. U-Boot is typically found on embedded PowerPC hardware, but there
16are a handful of other firmware implementations which are also popular. Each
17firmware interface requires a different image format.
18
19The boot wrapper is built from the makefile in arch/powerpc/boot/Makefile and
20it uses the wrapper script (arch/powerpc/boot/wrapper) to generate the target
21image. The details of the build system are discussed in the next section.
22Currently, the following image format targets exist:
23
24 cuImage.%: Backwards compatible uImage for older versions of
25 U-Boot (for versions that don't understand the device
26 tree). This image embeds a device tree blob inside
27 the image. The boot wrapper, kernel and device tree
28 are all embedded inside the U-Boot uImage file format
29 with boot wrapper code that extracts data from the old
30 bd_info structure and loads the data into the device
31 tree before jumping into the kernel.
32 Because of the series of #ifdefs found in the
33 bd_info structure used in the old U-Boot interfaces,
34 cuImages are platform specific. Each specific
35 U-Boot platform has a different platform init file
36 which populates the embedded device tree with data
37 from the platform specific bd_info file. The platform
38 specific cuImage platform init code can be found in
39 arch/powerpc/boot/cuboot.*.c. Selection of the correct
40 cuImage init code for a specific board can be found in
41 the wrapper structure.
42 dtbImage.%: Similar to zImage, except device tree blob is embedded
43 inside the image instead of provided by firmware. The
44 output image file can be either an elf file or a flat
45 binary depending on the platform.
46 dtbImages are used on systems which do not have an
47 interface for passing a device tree directly.
48 dtbImages are similar to simpleImages except that
49 dtbImages have platform specific code for extracting
50 data from the board firmware, but simpleImages do not
51 talk to the firmware at all.
52 PlayStation 3 support uses dtbImage. So do Embedded
53 Planet boards using the PlanetCore firmware. Board
54 specific initialization code is typically found in a
55 file named arch/powerpc/boot/<platform>.c; but this
56 can be overridden by the wrapper script.
57 simpleImage.%: Firmware independent compressed image that does not
58 depend on any particular firmware interface and embeds
59 a device tree blob. This image is a flat binary that
60 can be loaded to any location in RAM and jumped to.
61 Firmware cannot pass any configuration data to the
62 kernel with this image type and it depends entirely on
63 the embedded device tree for all information.
64 The simpleImage is useful for booting systems with
65 an unknown firmware interface or for booting from
66 a debugger when no firmware is present (such as on
67 the Xilinx Virtex platform). The only assumption that
68 simpleImage makes is that RAM is correctly initialized
69 and that the MMU is either off or has RAM mapped to
70 base address 0.
71 simpleImage also supports inserting special platform
72 specific initialization code to the start of the bootup
73 sequence. The virtex405 platform uses this feature to
74 ensure that the cache is invalidated before caching
75 is enabled. Platform specific initialization code is
76 added as part of the wrapper script and is keyed on
77 the image target name. For example, all
78 simpleImage.virtex405-* targets will add the
79 virtex405-head.S initialization code. (This also means
80 that the dts file for virtex405 targets should be
81 named virtex405-<board>.dts.) Search the wrapper
82 script for 'virtex405' and see the file
83 arch/powerpc/boot/virtex405-head.S for details.
84 treeImage.%: Image format for use with OpenBIOS firmware found
85 on some ppc4xx hardware. This image embeds a device
86 tree blob inside the image.
87 uImage: Native image format used by U-Boot. The uImage target
88 does not add any boot code. It just wraps a compressed
89 vmlinux in the uImage data structure. This image
90 requires a version of U-Boot that is able to pass
91 a device tree to the kernel at boot. If using an older
92 version of U-Boot, then you need to use a cuImage
93 instead.
94 zImage.%: Image format which does not embed a device tree.
95 Used by OpenFirmware and other firmware interfaces
96 which are able to supply a device tree. This image
97 expects firmware to provide the device tree at boot.
98 Typically, if you have general purpose PowerPC
99 hardware then you want this image format.
100
101Image types which embed a device tree blob (simpleImage, dtbImage, treeImage,
102and cuImage) all generate the device tree blob from a file in the
103arch/powerpc/boot/dts/ directory. The Makefile selects the correct device
104tree source based on the name of the target. Therefore, if the kernel is
105built with 'make treeImage.walnut simpleImage.virtex405-ml403', then the
106build system will use arch/powerpc/boot/dts/walnut.dts to build
107treeImage.walnut and arch/powerpc/boot/dts/virtex405-ml403.dts to build
108the simpleImage.virtex405-ml403.
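
The name-based selection described above can be sketched in a few lines. This Python is only an illustration of the naming rule, not the Makefile logic itself: the function dts_for_target is hypothetical, and variants such as initrd targets are deliberately not handled.

```python
DTS_DIR = "arch/powerpc/boot/dts"

# Image types that the text above lists as embedding a device tree blob.
EMBEDDING_PREFIXES = ("simpleImage.", "dtbImage.", "treeImage.", "cuImage.")


def dts_for_target(target):
    """Return the dts path implied by an image target name, or None.

    None means the image type does not embed a device tree (for example
    zImage.% or uImage), so no dts file is selected for it.
    """
    for prefix in EMBEDDING_PREFIXES:
        if target.startswith(prefix):
            board = target[len(prefix):]
            return f"{DTS_DIR}/{board}.dts"
    return None


for name in ("treeImage.walnut", "simpleImage.virtex405-ml403", "uImage"):
    print(name, "->", dts_for_target(name))
```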
109
110Two special targets called 'zImage' and 'zImage.initrd' also exist. These
111targets build all the default images as selected by the kernel configuration.
112Default images are selected by the boot wrapper Makefile
113(arch/powerpc/boot/Makefile) by adding targets to the $image-y variable. Look
114at the Makefile to see which default image targets are available.
115
116How it is built
117---------------
118arch/powerpc is designed to support multiplatform kernels, which means
119that a single vmlinux image can be booted on many different target boards.
120It also means that the boot wrapper must be able to wrap for many kinds of
121images on a single build. The design decision was made to not use any
122conditional compilation code (#ifdef, etc) in the boot wrapper source code.
123All of the boot wrapper pieces are buildable at any time regardless of the
124kernel configuration. Building all the wrapper bits on every kernel build
125also ensures that obscure parts of the wrapper are at the very least compile
126tested in a large variety of environments.
127
128The wrapper is adapted for different image types at link time by linking in
129just the wrapper bits that are appropriate for the image type. The 'wrapper
130script' (found in arch/powerpc/boot/wrapper) is called by the Makefile and
131is responsible for selecting the correct wrapper bits for the image type.
132The arguments are well documented in the script's comment block, so they
133are not repeated here. However, it is worth mentioning that the script
134uses the -p (platform) argument as the main method of deciding which wrapper
135bits to compile in. Look for the large 'case "$platform" in' block in the
136middle of the script. This is also the place where platform specific fixups
137can be selected by changing the link order.
138
139In particular, care should be taken when working with cuImages. cuImage
140wrapper bits are very board specific and care should be taken to make sure
141the target you are trying to build is supported by the wrapper bits.
diff --git a/Documentation/powerpc/dts-bindings/fsl/board.txt b/Documentation/powerpc/dts-bindings/fsl/board.txt
new file mode 100644
index 000000000000..74ae6f1cd2d6
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/board.txt
@@ -0,0 +1,29 @@
1* Board Control and Status (BCSR)
2
3Required properties:
4
5 - device_type : Should be "board-control"
6 - reg : Offset and length of the register set for the device
7
8Example:
9
10 bcsr@f8000000 {
11 device_type = "board-control";
12 reg = <f8000000 8000>;
13 };
14
15* Freescale on board FPGA
16
17These are the memory-mapped registers for the on-board FPGA.
18
19Required properties:
20- compatible : should be "fsl,fpga-pixis".
21- reg : should contain the address and the length of the FPGA register
22 set.
23
24Example (MPC8610HPCD):
25
26 board-control@e8000000 {
27 compatible = "fsl,fpga-pixis";
28 reg = <0xe8000000 32>;
29 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm.txt
new file mode 100644
index 000000000000..088fc471e03a
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm.txt
@@ -0,0 +1,67 @@
1* Freescale Communications Processor Module
2
3NOTE: This is an interim binding, and will likely change slightly,
4as more devices are supported. The QE bindings especially are
5incomplete.
6
7* Root CPM node
8
9Properties:
10- compatible : "fsl,cpm1", "fsl,cpm2", or "fsl,qe".
11- reg : A 48-byte region beginning with CPCR.
12
13Example:
14 cpm@119c0 {
15 #address-cells = <1>;
16 #size-cells = <1>;
17 #interrupt-cells = <2>;
18 compatible = "fsl,mpc8272-cpm", "fsl,cpm2";
19 reg = <119c0 30>;
20 };
21
22* Properties common to multiple CPM/QE devices
23
24- fsl,cpm-command : This value is ORed with the opcode and command flag
25 to specify the device on which a CPM command operates.
26
27- fsl,cpm-brg : Indicates which baud rate generator the device
28 is associated with. If absent, an unused BRG
29 should be dynamically allocated. If zero, the
30 device uses an external clock rather than a BRG.
31
32- reg : Unless otherwise specified, the first resource represents the
33 scc/fcc/ucc registers, and the second represents the device's
34 parameter RAM region (if it has one).
35
36* Multi-User RAM (MURAM)
37
38The multi-user/dual-ported RAM is expressed as a bus under the CPM node.
39
40Ranges must be set up subject to the following restrictions:
41
42- Children's reg nodes must be offsets from the start of all muram, even
43 if the user-data area does not begin at zero.
44- If multiple range entries are used, the difference between the parent
45 address and the child address must be the same in all, so that a single
46 mapping can cover them all while maintaining the ability to determine
47 CPM-side offsets with pointer subtraction. It is recommended that
48 multiple range entries not be used.
49- A child address of zero must be translatable, even if no reg resources
50 contain it.
51
52A child "data" node must exist, compatible with "fsl,cpm-muram-data", to
53indicate the portion of muram that is usable by the OS for arbitrary
54purposes. The data node may have an arbitrary number of reg resources,
55all of which contribute to the allocatable muram pool.
56
57Example, based on mpc8272:
58 muram@0 {
59 #address-cells = <1>;
60 #size-cells = <1>;
61 ranges = <0 0 10000>;
62
63 data@0 {
64 compatible = "fsl,cpm-muram-data";
65 reg = <0 2000 9800 800>;
66 };
67 };
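
The first and third restrictions above can be illustrated with a sketch in
which the OS-usable area does not start at offset zero. This is a
hypothetical layout, not taken from a real board: the 0x800 offset and the
0x1800 size are assumptions chosen for illustration, and the values use
0x-prefixed (dts-v1) notation. The data node's reg is still an offset from
the start of all muram, and ranges still makes child address zero
translatable even though no reg resource contains it.

	muram@0 {
		#address-cells = <1>;
		#size-cells = <1>;
		/* child address 0x0 maps to parent 0x0 and spans all of muram */
		ranges = <0x0 0x0 0x10000>;

		data@800 {
			compatible = "fsl,cpm-muram-data";
			/* assumed: usable pool begins 0x800 bytes into muram */
			reg = <0x800 0x1800>;
		};
	};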
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/brg.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/brg.txt
new file mode 100644
index 000000000000..4c7d45eaf025
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/brg.txt
@@ -0,0 +1,21 @@
1* Baud Rate Generators
2
3Currently defined compatibles:
4fsl,cpm-brg
5fsl,cpm1-brg
6fsl,cpm2-brg
7
8Properties:
9- reg : There may be an arbitrary number of reg resources; BRG
10 numbers are assigned to these in order.
11- clock-frequency : Specifies the base frequency driving
12 the BRG.
13
14Example:
15 brg@119f0 {
16 compatible = "fsl,mpc8272-brg",
17 "fsl,cpm2-brg",
18 "fsl,cpm-brg";
19 reg = <119f0 10 115f0 10>;
20 clock-frequency = <d#25000000>;
21 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/i2c.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/i2c.txt
new file mode 100644
index 000000000000..87bc6048667e
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/i2c.txt
@@ -0,0 +1,41 @@
1* I2C
2
3The I2C controller is expressed as a bus under the CPM node.
4
5Properties:
6- compatible : "fsl,cpm1-i2c", "fsl,cpm2-i2c"
7- reg : On CPM2 devices, the second resource doesn't specify the I2C
8 Parameter RAM itself, but the I2C_BASE field of the CPM2 Parameter RAM
9 (typically 0x8afc 0x2).
10- #address-cells : Should be one. The cell is the i2c device address with
11 the r/w bit set to zero.
12- #size-cells : Should be zero.
13- clock-frequency : Can be used to set the i2c clock frequency. If
14 unspecified, a default frequency of 60kHz is used.
15The following two properties are deprecated. They are only used by legacy
16i2c drivers to find the bus to probe:
17- linux,i2c-index : Can be used to hard code an i2c bus number. By default,
18 the bus number is dynamically assigned by the i2c core.
19- linux,i2c-class : Can be used to override the i2c class. The class is used
20 by legacy i2c device drivers to find a bus in a specific context like
21 system management, video or sound. By default, I2C_CLASS_HWMON (1) is
22 used. The class definitions can be found in
23 include/linux/i2c.h
24
25Example, based on mpc823:
26
27 i2c@860 {
28 compatible = "fsl,mpc823-i2c",
29 "fsl,cpm1-i2c";
30 reg = <0x860 0x20 0x3c80 0x30>;
31 interrupts = <16>;
32 interrupt-parent = <&CPM_PIC>;
33 fsl,cpm-command = <0x10>;
34 #address-cells = <1>;
35 #size-cells = <0>;
36
37 rtc@68 {
38 compatible = "dallas,ds1307";
39 reg = <0x68>;
40 };
41 };
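
A CPM2 variant might look like the sketch below. Only the second reg
resource (0x8afc 0x2, the I2C_BASE field mentioned above) and the
compatible "fsl,cpm2-i2c" come from this binding. The node address, the
chip-specific compatible, the interrupt specifier, the PIC label, the
fsl,cpm-command value and the 100 kHz clock are assumptions modelled on an
mpc8272-style layout and should be checked against the chip manual.

	i2c@11860 {
		compatible = "fsl,mpc8272-i2c",		/* assumed chip name */
			     "fsl,cpm2-i2c";
		/* second resource is I2C_BASE, not the parameter RAM itself */
		reg = <0x11860 0x20 0x8afc 0x2>;
		interrupts = <1 8>;			/* assumed */
		interrupt-parent = <&PIC>;		/* assumed label */
		fsl,cpm-command = <0x29600000>;		/* assumed */
		clock-frequency = <100000>;		/* overrides the 60kHz default */
		#address-cells = <1>;
		#size-cells = <0>;
	};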
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/pic.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/pic.txt
new file mode 100644
index 000000000000..8e3ee1681618
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/pic.txt
@@ -0,0 +1,18 @@
1* Interrupt Controllers
2
3Currently defined compatibles:
4- fsl,cpm1-pic
5 - only one interrupt cell
6- fsl,pq1-pic
7- fsl,cpm2-pic
8 - second interrupt cell is level/sense:
9 - 2 is falling edge
10 - 8 is active low
11
12Example:
13 interrupt-controller@10c00 {
14 #interrupt-cells = <2>;
15 interrupt-controller;
16 reg = <10c00 80>;
17 compatible = "fsl,mpc8272-pic", "fsl,cpm2-pic";
18 };
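
For comparison, a CPM1 controller uses a single interrupt cell. The sketch
below is illustrative only: the node address, register size, label and
chip-specific compatible are assumptions based on an mpc885-style part,
and any properties describing the link to a parent interrupt controller
are omitted.

	CPM_PIC: interrupt-controller@930 {
		interrupt-controller;
		#interrupt-cells = <1>;		/* fsl,cpm1-pic: one cell only */
		reg = <0x930 0x20>;		/* assumed */
		compatible = "fsl,mpc885-cpm-pic",	/* assumed chip name */
			     "fsl,cpm1-pic";
	};

A device below such a controller would then specify, for example,
interrupts = <16>, as in the CPM1 I2C example.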
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/usb.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/usb.txt
new file mode 100644
index 000000000000..74bfda4bb824
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/cpm/usb.txt
@@ -0,0 +1,15 @@
1* USB (Universal Serial Bus Controller)
2
3Properties:
4- compatible : "fsl,cpm1-usb", "fsl,cpm2-usb", "fsl,qe-usb"
5
6Example:
7 usb@11b60 {
8 #address-cells = <1>;
9 #size-cells = <0>;
10 compatible = "fsl,cpm2-usb";
11 reg = <11b60 18 8b00 100>;
12 interrupts = <b 8>;
13 interrupt-parent = <&PIC>;
14 fsl,cpm-command = <2e600000>;
15 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/network.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/network.txt
new file mode 100644
index 000000000000..0e4269446580
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/network.txt
@@ -0,0 +1,45 @@
1* Network
2
3Currently defined compatibles:
4- fsl,cpm1-scc-enet
5- fsl,cpm2-scc-enet
6- fsl,cpm1-fec-enet
7- fsl,cpm2-fcc-enet (third resource is GFEMR)
8- fsl,qe-enet
9
10Example:
11
12 ethernet@11300 {
13 device_type = "network";
14 compatible = "fsl,mpc8272-fcc-enet",
15 "fsl,cpm2-fcc-enet";
16 reg = <11300 20 8400 100 11390 1>;
17 local-mac-address = [ 00 00 00 00 00 00 ];
18 interrupts = <20 8>;
19 interrupt-parent = <&PIC>;
20 phy-handle = <&PHY0>;
21 fsl,cpm-command = <12000300>;
22 };
23
24* MDIO
25
26Currently defined compatibles:
27fsl,pq1-fec-mdio (reg is same as first resource of FEC device)
28fsl,cpm2-mdio-bitbang (reg is port C registers)
29
30Properties for fsl,cpm2-mdio-bitbang:
31fsl,mdio-pin : pin of port C controlling mdio data
32fsl,mdc-pin : pin of port C controlling mdio clock
33
34Example:
35 mdio@10d40 {
36 device_type = "mdio";
37 compatible = "fsl,mpc8272ads-mdio-bitbang",
38 "fsl,mpc8272-mdio-bitbang",
39 "fsl,cpm2-mdio-bitbang";
40 reg = <10d40 14>;
41 #address-cells = <1>;
42 #size-cells = <0>;
43 fsl,mdio-pin = <12>;
44 fsl,mdc-pin = <13>;
45 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt
new file mode 100644
index 000000000000..78790d58dc2c
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe.txt
@@ -0,0 +1,58 @@
1* Freescale QUICC Engine module (QE)
2This represents the QE module that is installed on PowerQUICC II Pro.
3
4NOTE: This is an interim binding; it should be updated to fit
5in with the CPM binding later in this document.
6
7Basically, it is a bus of devices, each of which can act more or less
8as a complete entity (UCC, USB, etc.). All of them should be siblings under
9the "root" qe node, using the common properties from there.
10The description below applies to the QE of the MPC8360;
11more nodes and properties will be added in the future.
12
13i) Root QE device
14
15Required properties:
16- compatible : should be "fsl,qe";
17- model : precise model of the QE; can be "QE", "CPM", or "CPM2".
18- reg : offset and length of the device registers.
19- bus-frequency : the clock frequency for QUICC Engine.
20
21Recommended properties:
22- brg-frequency : the internal clock source frequency for baud-rate
23 generators in Hz.
24
25Example:
26 qe@e0100000 {
27 #address-cells = <1>;
28 #size-cells = <1>;
29 #interrupt-cells = <2>;
30 compatible = "fsl,qe";
31 ranges = <0 e0100000 00100000>;
32 reg = <e0100000 480>;
33 brg-frequency = <0>;
34 bus-frequency = <179A7B00>;
35 };
36
37* Multi-User RAM (MURAM)
38
39Required properties:
40- compatible : should be "fsl,qe-muram", "fsl,cpm-muram".
41- mode : could be "host" or "slave".
42- ranges : Should be defined as specified in 1) to describe the
43 translation of MURAM addresses.
44- data-only : sub-node which defines the address area under MURAM
45 bus that can be allocated as data/parameter
46
47Example:
48
49 muram@10000 {
50 compatible = "fsl,qe-muram", "fsl,cpm-muram";
51 ranges = <0 00010000 0000c000>;
52
53 data-only@0{
54 compatible = "fsl,qe-muram-data",
55 "fsl,cpm-muram-data";
56 reg = <0 c000>;
57 };
58 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/firmware.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/firmware.txt
new file mode 100644
index 000000000000..6c238f59b2a9
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/firmware.txt
@@ -0,0 +1,24 @@
1* Uploaded QE firmware
2
3 If a new firmware has been uploaded to the QE (usually by the
4 boot loader), then a 'firmware' child node should be added to the QE
5 node. This node provides information on the uploaded firmware that
6 device drivers may need.
7
8 Required properties:
9 - id: The string name of the firmware. This is taken from the 'id'
10 member of the qe_firmware structure of the uploaded firmware.
11 Device drivers can search this string to determine if the
12 firmware they want is already present.
13 - extended-modes: The Extended Modes bitfield, taken from the
14 firmware binary. It is a 64-bit number represented
15 as an array of two 32-bit numbers.
16 - virtual-traps: The virtual traps, taken from the firmware binary.
17 It is an array of 8 32-bit numbers.
18
19Example:
20 firmware {
21 id = "Soft-UART";
22 extended-modes = <0 0>;
23 virtual-traps = <0 0 0 0 0 0 0 0>;
24 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/par_io.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/par_io.txt
new file mode 100644
index 000000000000..60984260207b
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/par_io.txt
@@ -0,0 +1,51 @@
1* Parallel I/O Ports
2
3This node configures Parallel I/O ports for CPUs with QE support.
4The node should reside in the "soc" node of the tree. For each
5device using parallel I/O ports, a child node should be created.
6See the definition of the Pin configuration nodes below for more
7information.
8
9Required properties:
10- device_type : should be "par_io".
11- reg : offset to the register set and its length.
12- num-ports : number of Parallel I/O ports
13
14Example:
15par_io@1400 {
16 reg = <1400 100>;
17 #address-cells = <1>;
18 #size-cells = <0>;
19 device_type = "par_io";
20 num-ports = <7>;
21 ucc_pin@01 {
22 ......
23 };
24
25Note that "par_io" nodes are obsolete, and should not be used in
26new device trees. Instead, each Par I/O bank should be represented
27via its own gpio-controller node:
28
29Required properties:
30- #gpio-cells : should be "2".
31- compatible : should be "fsl,<chip>-qe-pario-bank",
32 "fsl,mpc8323-qe-pario-bank".
33- reg : offset to the register set and its length.
34- gpio-controller : node to identify gpio controllers.
35
36Example:
37 qe_pio_a: gpio-controller@1400 {
38 #gpio-cells = <2>;
39 compatible = "fsl,mpc8360-qe-pario-bank",
40 "fsl,mpc8323-qe-pario-bank";
41 reg = <0x1400 0x18>;
42 gpio-controller;
43 };
44
45 qe_pio_e: gpio-controller@1460 {
46 #gpio-cells = <2>;
47 compatible = "fsl,mpc8360-qe-pario-bank",
48 "fsl,mpc8323-qe-pario-bank";
49 reg = <0x1460 0x18>;
50 gpio-controller;
51 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/pincfg.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/pincfg.txt
new file mode 100644
index 000000000000..c5b43061db3a
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/pincfg.txt
@@ -0,0 +1,60 @@
1* Pin configuration nodes
2
3Required properties:
4- linux,phandle : phandle of this node; likely referenced by a QE
5 device.
6- pio-map : array of pin configurations. Each pin is defined by 6
7 integers. The six numbers are respectively: port, pin, dir,
8 open_drain, assignment, has_irq.
9 - port : port number of the pin; 0-6 represent port A-G in UM.
10 - pin : pin number in the port.
11 - dir : direction of the pin, encoded as follows:
12
13 0 = The pin is disabled
14 1 = The pin is an output
15 2 = The pin is an input
16 3 = The pin is I/O
17
18 - open_drain : indicates whether the pin is normal or wired-OR:
19
20 0 = The pin is actively driven as an output
21 1 = The pin is an open-drain driver. As an output, the pin is
22 driven active-low, otherwise it is three-stated.
23
24 - assignment : function number of the pin according to the Pin Assignment
25 tables in User Manual. Each pin can have up to 4 possible functions in
26 QE and two options for CPM.
27 - has_irq : indicates if the pin is used as source of external
28 interrupts.
29
30Example:
31 ucc_pin@01 {
32 linux,phandle = <140001>;
33 pio-map = <
34 /* port pin dir open_drain assignment has_irq */
35 0 3 1 0 1 0 /* TxD0 */
36 0 4 1 0 1 0 /* TxD1 */
37 0 5 1 0 1 0 /* TxD2 */
38 0 6 1 0 1 0 /* TxD3 */
39 1 6 1 0 3 0 /* TxD4 */
40 1 7 1 0 1 0 /* TxD5 */
41 1 9 1 0 2 0 /* TxD6 */
42 1 a 1 0 2 0 /* TxD7 */
43 0 9 2 0 1 0 /* RxD0 */
44 0 a 2 0 1 0 /* RxD1 */
45 0 b 2 0 1 0 /* RxD2 */
46 0 c 2 0 1 0 /* RxD3 */
47 0 d 2 0 1 0 /* RxD4 */
48 1 1 2 0 2 0 /* RxD5 */
49 1 0 2 0 2 0 /* RxD6 */
50 1 4 2 0 2 0 /* RxD7 */
51 0 7 1 0 1 0 /* TX_EN */
52 0 8 1 0 1 0 /* TX_ER */
53 0 f 2 0 1 0 /* RX_DV */
54 0 10 2 0 1 0 /* RX_ER */
55 0 0 2 0 1 0 /* RX_CLK */
56 2 9 1 0 3 0 /* GTX_CLK - CLK10 */
57 2 8 2 0 1 0>; /* GTX125 - CLK9 */
58 };
59
60
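
To make the six-field encoding concrete, the short sketch below annotates
two entries. The node name, phandle and pin choices are hypothetical and
do not describe a real device; values are hexadecimal, as in the example
above.

	example_pin@02 {
		linux,phandle = <140002>;	/* hypothetical phandle */
		pio-map = <
		/* port pin dir open_drain assignment has_irq */
			3 4 3 1 2 0	/* port D, pin 4: I/O, open-drain, function 2 */
			3 5 2 0 1 1>;	/* port D, pin 5: input, function 1, IRQ source */
	};

A QE device would then refer to this node with pio-handle = <140002>.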
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/ucc.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/ucc.txt
new file mode 100644
index 000000000000..e47734bee3f0
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/ucc.txt
@@ -0,0 +1,70 @@
1* UCC (Unified Communications Controllers)
2
3Required properties:
4- device_type : should be "network", "hdlc", "uart", "transparent",
5 "bisync", "atm", or "serial".
6- compatible : could be "ucc_geth" or "fsl_atm" and so on.
7- cell-index : the ucc number(1-8), corresponding to UCCx in UM.
8- reg : Offset and length of the register set for the device
9- interrupts : <a b> where a is the interrupt number and b is a
10 field that represents an encoding of the sense and level
11 information for the interrupt. This should be encoded based on
12 the information in section 2) depending on the type of interrupt
13 controller you have.
14- interrupt-parent : the phandle for the interrupt controller that
15 services interrupts for this device.
16- pio-handle : The phandle for the Parallel I/O port configuration.
17- port-number : for UART drivers, the port number to use, between 0 and 3.
18 This usually corresponds to the /dev/ttyQE device, e.g. <0> = /dev/ttyQE0.
19 The port number is added to the minor number of the device. Unlike the
20 CPM UART driver, the port-number is required for the QE UART driver.
21- soft-uart : for UART drivers, if specified this means the QE UART device
22 driver should use "Soft-UART" mode, which is needed on some SOCs that have
23 broken UART hardware. Soft-UART is provided via a microcode upload.
24- rx-clock-name: the UCC receive clock source
25 "none": clock source is disabled
26 "brg1" through "brg16": clock source is BRG1-BRG16, respectively
27 "clk1" through "clk24": clock source is CLK1-CLK24, respectively
28- tx-clock-name: the UCC transmit clock source
29 "none": clock source is disabled
30 "brg1" through "brg16": clock source is BRG1-BRG16, respectively
31 "clk1" through "clk24": clock source is CLK1-CLK24, respectively
32The following two properties are deprecated. rx-clock has been replaced
33with rx-clock-name, and tx-clock has been replaced with tx-clock-name.
34Drivers that currently use the deprecated properties should continue to
35do so, in order to support older device trees, but they should be updated
36to check for the new properties first.
37- rx-clock : represents the UCC receive clock source.
38 0x00 : clock source is disabled;
39 0x1~0x10 : clock source is BRG1~BRG16 respectively;
40 0x11~0x28: clock source is QE_CLK1~QE_CLK24 respectively.
41- tx-clock: represents the UCC transmit clock source;
42 0x00 : clock source is disabled;
43 0x1~0x10 : clock source is BRG1~BRG16 respectively;
44 0x11~0x28: clock source is QE_CLK1~QE_CLK24 respectively.
45
46Required properties for network device_type:
47- mac-address : list of bytes representing the ethernet address.
48- phy-handle : The phandle for the PHY connected to this controller.
49
50Recommended properties:
51- phy-connection-type : a string naming the controller/PHY interface type,
52 i.e., "mii" (default), "rmii", "gmii", "rgmii", "rgmii-id" (Internal
53 Delay), "rgmii-txid" (delay on TX only), "rgmii-rxid" (delay on RX only),
54 "tbi", or "rtbi".
55
56Example:
57 ucc@2000 {
58 device_type = "network";
59 compatible = "ucc_geth";
60 cell-index = <1>;
61 reg = <2000 200>;
62 interrupts = <a0 0>;
63 interrupt-parent = <700>;
64 mac-address = [ 00 04 9f 00 23 23 ];
65 rx-clock-name = "none";
66 tx-clock-name = "clk9";
67 phy-handle = <212000>;
68 phy-connection-type = "gmii";
69 pio-handle = <140001>;
70 };
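
A UART-type UCC node could be sketched as follows, using the non-deprecated
clock properties. The "ucc_uart" compatible, the register address, the
interrupt specifier, the phandles and the BRG choice are assumptions made
for illustration; only the property names and their value formats come
from the binding above.

	ucc@2400 {
		device_type = "serial";
		compatible = "ucc_uart";	/* assumed driver compatible */
		cell-index = <5>;		/* UCC5 */
		reg = <0x2400 0x200>;		/* assumed */
		interrupts = <0x28 0>;		/* assumed */
		interrupt-parent = <0x700>;	/* assumed phandle */
		pio-handle = <0x140002>;	/* assumed phandle */
		port-number = <0>;		/* usually /dev/ttyQE0 */
		rx-clock-name = "brg7";
		tx-clock-name = "brg7";
		soft-uart;			/* only if the SOC needs Soft-UART */
	};

Under the deprecated numeric encoding listed above, "brg7" corresponds to
0x07 and "clk9" to 0x19 (0x11 is CLK1, so CLKn is 0x10 + n).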
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/usb.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/usb.txt
new file mode 100644
index 000000000000..c8f44d6bcbcf
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/qe/usb.txt
@@ -0,0 +1,22 @@
1* USB (Universal Serial Bus Controller)
2
3Required properties:
4- compatible : could be "qe_udc" or "fhci-hcd".
5- mode : could be "host" or "slave".
6- reg : Offset and length of the register set for the device
7- interrupts : <a b> where a is the interrupt number and b is a
8 field that represents an encoding of the sense and level
9 information for the interrupt. This should be encoded based on
10 the information in section 2) depending on the type of interrupt
11 controller you have.
12- interrupt-parent : the phandle for the interrupt controller that
13 services interrupts for this device.
14
15Example (slave):
16 usb@6c0 {
17 compatible = "qe_udc";
18 reg = <6c0 40>;
19 interrupts = <8b 0>;
20 interrupt-parent = <700>;
21 mode = "slave";
22 };
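
A host-mode node would differ in the compatible and mode properties. The
sketch below simply reuses the register block and interrupt values of the
slave example in 0x-prefixed (dts-v1) notation. That reuse is an
assumption, and a host controller driver may require further properties
that this binding does not list.

	usb@6c0 {
		compatible = "fhci-hcd";
		reg = <0x6c0 0x40>;		/* assumed: same block as above */
		interrupts = <0x8b 0>;		/* assumed */
		interrupt-parent = <0x700>;	/* assumed phandle */
		mode = "host";
	};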
diff --git a/Documentation/powerpc/dts-bindings/fsl/cpm_qe/serial.txt b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/serial.txt
new file mode 100644
index 000000000000..b35f3482e3e4
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/cpm_qe/serial.txt
@@ -0,0 +1,21 @@
1* Serial
2
3Currently defined compatibles:
4- fsl,cpm1-smc-uart
5- fsl,cpm2-smc-uart
6- fsl,cpm1-scc-uart
7- fsl,cpm2-scc-uart
8- fsl,qe-uart
9
10Example:
11
12 serial@11a00 {
13 device_type = "serial";
14 compatible = "fsl,mpc8272-scc-uart",
15 "fsl,cpm2-scc-uart";
16 reg = <11a00 20 8000 100>;
17 interrupts = <28 8>;
18 interrupt-parent = <&PIC>;
19 fsl,cpm-brg = <1>;
20 fsl,cpm-command = <00800000>;
21 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/diu.txt b/Documentation/powerpc/dts-bindings/fsl/diu.txt
new file mode 100644
index 000000000000..deb35de70988
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/diu.txt
@@ -0,0 +1,18 @@
1* Freescale Display Interface Unit
2
3The Freescale DIU is an LCD controller; with proper hardware, it can also
4drive DVI monitors.
5
6Required properties:
7- compatible : should be "fsl-diu".
8- reg : should contain at least address and length of the DIU register
9 set.
10- interrupts : one DIU interrupt should be described here.
11
12Example (MPC8610HPCD):
13 display@2c000 {
14 compatible = "fsl,diu";
15 reg = <0x2c000 100>;
16 interrupts = <72 2>;
17 interrupt-parent = <&mpic>;
18 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/dma.txt b/Documentation/powerpc/dts-bindings/fsl/dma.txt
new file mode 100644
index 000000000000..86826df00e64
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/dma.txt
@@ -0,0 +1,127 @@
1* Freescale 83xx DMA Controller
2
3Freescale PowerPC 83xx have on chip general purpose DMA controllers.
4
5Required properties:
6
7- compatible : compatible list, contains 2 entries, first is
8 "fsl,CHIP-dma", where CHIP is the processor
9 (mpc8349, mpc8360, etc.) and the second is
10 "fsl,elo-dma"
11- reg : <registers mapping for DMA general status reg>
12- ranges : Should be defined as specified in 1) to describe the
13 DMA controller channels.
14- cell-index : controller index. 0 for controller @ 0x8100
15- interrupts : <interrupt mapping for DMA IRQ>
16- interrupt-parent : optional, if needed for interrupt mapping
17
18
19- DMA channel nodes:
20 - compatible : compatible list, contains 2 entries, first is
21 "fsl,CHIP-dma-channel", where CHIP is the processor
22 (mpc8349, mpc8360, etc.) and the second is
23 "fsl,elo-dma-channel"
24 - reg : <registers mapping for channel>
25 - cell-index : dma channel index starts at 0.
26
27Optional properties:
28 - interrupts : <interrupt mapping for DMA channel IRQ>
29 (on 83xx this is expected to be identical to
30 the interrupts property of the parent node)
31 - interrupt-parent : optional, if needed for interrupt mapping
32
33Example:
34 dma@82a8 {
35 #address-cells = <1>;
36 #size-cells = <1>;
37 compatible = "fsl,mpc8349-dma", "fsl,elo-dma";
38 reg = <82a8 4>;
39 ranges = <0 8100 1a4>;
40 interrupt-parent = <&ipic>;
41 interrupts = <47 8>;
42 cell-index = <0>;
43 dma-channel@0 {
44 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
45 cell-index = <0>;
46 reg = <0 80>;
47 };
48 dma-channel@80 {
49 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
50 cell-index = <1>;
51 reg = <80 80>;
52 };
53 dma-channel@100 {
54 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
55 cell-index = <2>;
56 reg = <100 80>;
57 };
58 dma-channel@180 {
59 compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel";
60 cell-index = <3>;
61 reg = <180 80>;
62 };
63 };
64
65* Freescale 85xx/86xx DMA Controller
66
67Freescale PowerPC 85xx/86xx have on chip general purpose DMA controllers.
68
69Required properties:
70
71- compatible : compatible list, contains 2 entries, first is
72 "fsl,CHIP-dma", where CHIP is the processor
73 (mpc8540, mpc8560, etc.) and the second is
74 "fsl,eloplus-dma"
75- reg : <registers mapping for DMA general status reg>
76- cell-index : controller index. 0 for controller @ 0x21000,
77 1 for controller @ 0xc000
78- ranges : Should be defined as specified in 1) to describe the
79 DMA controller channels.
80
81- DMA channel nodes:
82 - compatible : compatible list, contains 2 entries, first is
83 "fsl,CHIP-dma-channel", where CHIP is the processor
84 (mpc8540, mpc8560, etc.) and the second is
85 "fsl,eloplus-dma-channel"
86 - cell-index : dma channel index starts at 0.
87 - reg : <registers mapping for channel>
88 - interrupts : <interrupt mapping for DMA channel IRQ>
89 - interrupt-parent : optional, if needed for interrupt mapping
90
91Example:
92 dma@21300 {
93 #address-cells = <1>;
94 #size-cells = <1>;
95 compatible = "fsl,mpc8540-dma", "fsl,eloplus-dma";
96 reg = <21300 4>;
97 ranges = <0 21100 200>;
98 cell-index = <0>;
99 dma-channel@0 {
100 compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
101 reg = <0 80>;
102 cell-index = <0>;
103 interrupt-parent = <&mpic>;
104 interrupts = <14 2>;
105 };
106 dma-channel@80 {
107 compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
108 reg = <80 80>;
109 cell-index = <1>;
110 interrupt-parent = <&mpic>;
111 interrupts = <15 2>;
112 };
113 dma-channel@100 {
114 compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
115 reg = <100 80>;
116 cell-index = <2>;
117 interrupt-parent = <&mpic>;
118 interrupts = <16 2>;
119 };
120 dma-channel@180 {
121 compatible = "fsl,mpc8540-dma-channel", "fsl,eloplus-dma-channel";
122 reg = <180 80>;
123 cell-index = <3>;
124 interrupt-parent = <&mpic>;
125 interrupts = <17 2>;
126 };
127 };
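
On parts with a second 85xx/86xx controller, the cell-index rule above
(1 for the controller @ 0xc000) could be applied as in the following
sketch. The chip name, the mpic label and the interrupt numbers are
assumptions. The register offsets are derived by analogy with the first
controller (status register at base + 0x300, channels starting at
base + 0x100) and should be verified against the reference manual. Only
the first channel is shown.

	dma@c300 {
		#address-cells = <1>;
		#size-cells = <1>;
		compatible = "fsl,mpc8572-dma", "fsl,eloplus-dma";	/* assumed chip */
		reg = <0xc300 0x4>;
		ranges = <0x0 0xc100 0x200>;
		cell-index = <1>;		/* second controller, @ 0xc000 */
		dma-channel@0 {
			compatible = "fsl,mpc8572-dma-channel",
				     "fsl,eloplus-dma-channel";
			reg = <0x0 0x80>;
			cell-index = <0>;
			interrupt-parent = <&mpic>;	/* assumed label */
			interrupts = <76 2>;		/* assumed */
		};
	};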
diff --git a/Documentation/powerpc/dts-bindings/fsl/gtm.txt b/Documentation/powerpc/dts-bindings/fsl/gtm.txt
new file mode 100644
index 000000000000..9a33efded4bc
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/gtm.txt
@@ -0,0 +1,31 @@
1* Freescale General-purpose Timers Module
2
3Required properties:
4 - compatible : should be
5 "fsl,<chip>-gtm", "fsl,gtm" for SOC GTMs
6 "fsl,<chip>-qe-gtm", "fsl,qe-gtm", "fsl,gtm" for QE GTMs
7 "fsl,<chip>-cpm2-gtm", "fsl,cpm2-gtm", "fsl,gtm" for CPM2 GTMs
8 - reg : should contain gtm registers location and length (0x40).
9 - interrupts : should contain four interrupts.
10 - interrupt-parent : interrupt source phandle.
11 - clock-frequency : specifies the frequency driving the timer.
12
13Example:
14
15timer@500 {
16 compatible = "fsl,mpc8360-gtm", "fsl,gtm";
17 reg = <0x500 0x40>;
18 interrupts = <90 8 78 8 84 8 72 8>;
19 interrupt-parent = <&ipic>;
20 /* filled by u-boot */
21 clock-frequency = <0>;
22};
23
24timer@440 {
25 compatible = "fsl,mpc8360-qe-gtm", "fsl,qe-gtm", "fsl,gtm";
26 reg = <0x440 0x40>;
27 interrupts = <12 13 14 15>;
28 interrupt-parent = <&qeic>;
29 /* filled by u-boot */
30 clock-frequency = <0>;
31};
diff --git a/Documentation/powerpc/dts-bindings/fsl/guts.txt b/Documentation/powerpc/dts-bindings/fsl/guts.txt
new file mode 100644
index 000000000000..9e7a2417dac5
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/guts.txt
@@ -0,0 +1,25 @@
1* Global Utilities Block
2
3The global utilities block controls power management, I/O device
4enabling, power-on-reset configuration monitoring, general-purpose
5I/O signal configuration, alternate function selection for multiplexed
6signals, and clock control.
7
8Required properties:
9
10 - compatible : Should define the compatible device type for
11 global-utilities.
12 - reg : Offset and length of the register set for the device.
13
14Recommended properties:
15
16 - fsl,has-rstcr : Indicates that the global utilities register set
17 contains a functioning "reset control register" (i.e. the board
18 is wired to reset upon setting the HRESET_REQ bit in this register).
19
20Example:
21 global-utilities@e0000 { /* global utilities block */
22 compatible = "fsl,mpc8548-guts";
23 reg = <e0000 1000>;
24 fsl,has-rstcr;
25 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/i2c.txt b/Documentation/powerpc/dts-bindings/fsl/i2c.txt
new file mode 100644
index 000000000000..d0ab33e21fe6
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/i2c.txt
@@ -0,0 +1,32 @@
1* I2C
2
3Required properties :
4
5 - device_type : Should be "i2c"
6 - reg : Offset and length of the register set for the device
7
8Recommended properties :
9
10 - compatible : Should be "fsl-i2c" for parts compatible with
11 Freescale I2C specifications.
12 - interrupts : <a b> where a is the interrupt number and b is a
13 field that represents an encoding of the sense and level
14 information for the interrupt. This should be encoded based on
15 the information in section 2) depending on the type of interrupt
16 controller you have.
17 - interrupt-parent : the phandle for the interrupt controller that
18 services interrupts for this device.
19 - dfsrr : boolean; if defined, indicates that this I2C device has
20 a digital filter sampling rate register
21 - fsl5200-clocking : boolean; if defined, indicates that this device
22 uses the FSL 5200 clocking mechanism.
23
24Example :
25 i2c@3000 {
26 interrupt-parent = <40000>;
27 interrupts = <1b 3>;
28 reg = <3000 18>;
29 device_type = "i2c";
30 compatible = "fsl-i2c";
31 dfsrr;
32 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/lbc.txt b/Documentation/powerpc/dts-bindings/fsl/lbc.txt
new file mode 100644
index 000000000000..3300fec501c5
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/lbc.txt
@@ -0,0 +1,35 @@
1* Chipselect/Local Bus
2
3Properties:
4- name : Should be localbus
5- #address-cells : Should be either two or three. The first cell is the
6 chipselect number, and the remaining cells are the
7 offset into the chipselect.
8- #size-cells : Either one or two, depending on how large each chipselect
9 can be.
10- ranges : Each range corresponds to a single chipselect, and covers
11 the entire access window as configured.
12
13Example:
14 localbus@f0010100 {
15 compatible = "fsl,mpc8272-localbus",
16 "fsl,pq2-localbus";
17 #address-cells = <2>;
18 #size-cells = <1>;
19 reg = <f0010100 40>;
20
21 ranges = <0 0 fe000000 02000000
22 1 0 f4500000 00008000>;
23
24 flash@0,0 {
25 compatible = "jedec-flash";
26 reg = <0 0 2000000>;
27 bank-width = <4>;
28 device-width = <1>;
29 };
30
31 board-control@1,0 {
32 reg = <1 0 20>;
33 compatible = "fsl,mpc8272ads-bcsr";
34 };
35 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/msi-pic.txt b/Documentation/powerpc/dts-bindings/fsl/msi-pic.txt
new file mode 100644
index 000000000000..b26b91992c55
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/msi-pic.txt
@@ -0,0 +1,36 @@
1* Freescale MSI interrupt controller
2
3Required properties:
4- compatible : compatible list, contains 2 entries,
5 first is "fsl,CHIP-msi", where CHIP is the processor (mpc8610, mpc8572,
6 etc.) and the second is "fsl,mpic-msi" or "fsl,ipic-msi" depending on
7 the parent type.
8- reg : should contain the address and the length of the shared message
9 interrupt register set.
10- msi-available-ranges: use <start count> pairs to define which of the
11 256 MSI interrupts can be used. This property is optional; without
12 it, all 256 MSI interrupts can be used.
13- interrupts : each interrupt listed here serves one group of 32 MSIs
14 and is routed to the host interrupt controller. The interrupts should
15 be set as edge sensitive.
16- interrupt-parent: the phandle for the interrupt controller
17 that services interrupts for this device. For 83xx CPUs, the interrupts
18 are routed to the IPIC, and for 85xx/86xx CPUs the interrupts are routed
19 to the MPIC.
20
21Example:
22 msi@41600 {
23 compatible = "fsl,mpc8610-msi", "fsl,mpic-msi";
24 reg = <0x41600 0x80>;
25 msi-available-ranges = <0 0x100>;
26 interrupts = <
27 0xe0 0
28 0xe1 0
29 0xe2 0
30 0xe3 0
31 0xe4 0
32 0xe5 0
33 0xe6 0
34 0xe7 0>;
35 interrupt-parent = <&mpic>;
36 };
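
Since each interrupts entry covers 32 MSIs, the eight entries above
account for all 256 (8 x 32). A configuration that restricts the usable
MSIs might be sketched as below. The split chosen here (only the first
128 MSIs available) is a hypothetical policy, not a recommendation, and
the interrupt list is left unchanged on the assumption that the hardware
routing itself does not change.

	msi@41600 {
		compatible = "fsl,mpc8610-msi", "fsl,mpic-msi";
		reg = <0x41600 0x80>;
		/* <start count>: MSIs 0..127 usable, 128..255 left alone */
		msi-available-ranges = <0 0x80>;
		interrupts = <
			0xe0 0
			0xe1 0
			0xe2 0
			0xe3 0
			0xe4 0
			0xe5 0
			0xe6 0
			0xe7 0>;
		interrupt-parent = <&mpic>;
	};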
diff --git a/Documentation/powerpc/dts-bindings/fsl/sata.txt b/Documentation/powerpc/dts-bindings/fsl/sata.txt
new file mode 100644
index 000000000000..b46bcf46c3d8
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/sata.txt
@@ -0,0 +1,29 @@
1* Freescale 8xxx/3.0 Gb/s SATA nodes
2
3SATA nodes are defined to describe on-chip Serial ATA controllers.
4Each SATA port should have its own node.
5
6Required properties:
7- compatible : compatible list, contains 2 entries, first is
8 "fsl,CHIP-sata", where CHIP is the processor
9 (mpc8315, mpc8379, etc.) and the second is
10 "fsl,pq-sata"
11- interrupts : <interrupt mapping for SATA IRQ>
12- cell-index : controller index.
13 1 for controller @ 0x18000
14 2 for controller @ 0x19000
15 3 for controller @ 0x1a000
16 4 for controller @ 0x1b000
17
18Optional properties:
19- interrupt-parent : optional, if needed for interrupt mapping
20- reg : <registers mapping>
21
22Example:
23 sata@18000 {
24 compatible = "fsl,mpc8379-sata", "fsl,pq-sata";
25 reg = <0x18000 0x1000>;
26 cell-index = <1>;
27 interrupts = <2c 8>;
28 interrupt-parent = < &ipic >;
29 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/sec.txt b/Documentation/powerpc/dts-bindings/fsl/sec.txt
new file mode 100644
index 000000000000..2b6f2d45c45a
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/sec.txt
@@ -0,0 +1,68 @@
1Freescale SoC SEC Security Engines
2
3Required properties:
4
5- compatible : Should contain entries for this and backward compatible
6 SEC versions, high to low, e.g., "fsl,sec2.1", "fsl,sec2.0"
7- reg : Offset and length of the register set for the device
8- interrupts : the SEC's interrupt number
9- fsl,num-channels : An integer representing the number of channels
10 available.
11- fsl,channel-fifo-len : An integer representing the number of
12 descriptor pointers each channel fetch fifo can hold.
13- fsl,exec-units-mask : The bitmask representing what execution units
14 (EUs) are available. It's a single 32-bit cell. EU information
15 should be encoded following the SEC's Descriptor Header Dword
16 EU_SEL0 field documentation, i.e. as follows:
17
18 bit 0 = reserved - should be 0
19 bit 1 = set if SEC has the ARC4 EU (AFEU)
20 bit 2 = set if SEC has the DES/3DES EU (DEU)
21 bit 3 = set if SEC has the message digest EU (MDEU/MDEU-A)
22 bit 4 = set if SEC has the random number generator EU (RNG)
23 bit 5 = set if SEC has the public key EU (PKEU)
24 bit 6 = set if SEC has the AES EU (AESU)
25 bit 7 = set if SEC has the Kasumi EU (KEU)
26 bit 8 = set if SEC has the CRC EU (CRCU)
27 bit 11 = set if SEC has the message digest EU extended alg set (MDEU-B)
28
29Remaining bits are reserved for future SEC EUs.
30
31- fsl,descriptor-types-mask : The bitmask representing what descriptors
32 are available. It's a single 32-bit cell. Descriptor type information
33 should be encoded following the SEC's Descriptor Header Dword DESC_TYPE
34 field documentation, i.e. as follows:
35
36 bit 0 = set if SEC supports the aesu_ctr_nonsnoop desc. type
37 bit 1 = set if SEC supports the ipsec_esp descriptor type
38 bit 2 = set if SEC supports the common_nonsnoop desc. type
39 bit 3 = set if SEC supports the 802.11i AES ccmp desc. type
40 bit 4 = set if SEC supports the hmac_snoop_no_afeu desc. type
41 bit 5 = set if SEC supports the srtp descriptor type
42 bit 6 = set if SEC supports the non_hmac_snoop_no_afeu desc. type
43 bit 7 = set if SEC supports the pkeu_assemble descriptor type
44 bit 8 = set if SEC supports the aesu_key_expand_output desc. type
45 bit 9 = set if SEC supports the pkeu_ptmul descriptor type
46 bit 10 = set if SEC supports the common_nonsnoop_afeu desc. type
47 bit 11 = set if SEC supports the pkeu_ptadd_dbl descriptor type
48
49 ..and so on and so forth.
50
51Optional properties:
52
53- interrupt-parent : the phandle for the interrupt controller that
54 services interrupts for this device.
55
56Example:
57
58 /* MPC8548E */
59 crypto@30000 {
60 compatible = "fsl,sec2.1", "fsl,sec2.0";
61 reg = <0x30000 0x10000>;
62 interrupts = <29 2>;
63 interrupt-parent = <&mpic>;
64 fsl,num-channels = <4>;
65 fsl,channel-fifo-len = <24>;
66 fsl,exec-units-mask = <0xfe>;
67 fsl,descriptor-types-mask = <0x12b0ebf>;
68 };
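
The fsl,exec-units-mask encoding above can be checked with a short script. This is only a sketch: it assumes that "bit 0" in the table means the least significant bit of the 32-bit cell (which is consistent with the example value 0xfe leaving the reserved bit clear), and the helper name decode_exec_units is made up for illustration.

```python
# Bit positions copied from the fsl,exec-units-mask table above.
# Assumption: bit 0 is the least significant bit of the cell.
EXEC_UNITS = {
    1: "AFEU",
    2: "DEU",
    3: "MDEU/MDEU-A",
    4: "RNG",
    5: "PKEU",
    6: "AESU",
    7: "KEU",
    8: "CRCU",
    11: "MDEU-B",
}


def decode_exec_units(mask):
    """Return the names of the execution units whose bits are set in mask."""
    return [name for bit, name in sorted(EXEC_UNITS.items())
            if mask & (1 << bit)]


if __name__ == "__main__":
    # Value taken from the MPC8548E example node above.
    print(decode_exec_units(0xfe))
```

Under that assumption, the example value 0xfe corresponds to bits 1 through 7, i.e. every EU from AFEU to KEU but neither CRCU nor MDEU-B.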
diff --git a/Documentation/powerpc/dts-bindings/fsl/spi.txt b/Documentation/powerpc/dts-bindings/fsl/spi.txt
new file mode 100644
index 000000000000..e7d9a344c4f4
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/spi.txt
@@ -0,0 +1,24 @@
1* SPI (Serial Peripheral Interface)
2
3Required properties:
4- cell-index : SPI controller index.
5- compatible : should be "fsl,spi".
6- mode : the SPI operation mode, it can be "cpu" or "cpu-qe".
7- reg : Offset and length of the register set for the device
8- interrupts : <a b> where a is the interrupt number and b is a
9 field that represents an encoding of the sense and level
10 information for the interrupt. This should be encoded based on
11 the information in section 2) depending on the type of interrupt
12 controller you have.
13- interrupt-parent : the phandle for the interrupt controller that
14 services interrupts for this device.
15
16Example:
17 spi@4c0 {
18 cell-index = <0>;
19 compatible = "fsl,spi";
20 reg = <4c0 40>;
21 interrupts = <82 0>;
22 interrupt-parent = <700>;
23 mode = "cpu";
24 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/ssi.txt b/Documentation/powerpc/dts-bindings/fsl/ssi.txt
new file mode 100644
index 000000000000..d100555d488a
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/ssi.txt
@@ -0,0 +1,38 @@
1Freescale Synchronous Serial Interface
2
3The SSI is a serial device that communicates with audio codecs. It can
4be programmed in AC97, I2S, left-justified, or right-justified modes.
5
6Required properties:
7- compatible : compatible list, containing "fsl,ssi"
8- cell-index : the SSI, <0> = SSI1, <1> = SSI2, and so on
9- reg : offset and length of the register set for the device
10- interrupts : <a b> where a is the interrupt number and b is a
11 field that represents an encoding of the sense and
12 level information for the interrupt. This should be
13 encoded based on the information in section 2)
14 depending on the type of interrupt controller you
15 have.
16- interrupt-parent : the phandle for the interrupt controller that
17 services interrupts for this device.
18- fsl,mode : the operating mode for the SSI interface
19 "i2s-slave" - I2S mode, SSI is clock slave
20 "i2s-master" - I2S mode, SSI is clock master
21 "lj-slave" - left-justified mode, SSI is clock slave
22 "lj-master" - l.j. mode, SSI is clock master
23 "rj-slave" - right-justified mode, SSI is clock slave
24 "rj-master" - r.j., SSI is clock master
25 "ac97-slave" - AC97 mode, SSI is clock slave
26 "ac97-master" - AC97 mode, SSI is clock master
27
28Optional properties:
29- codec-handle : phandle to a 'codec' node that defines an audio
30 codec connected to this SSI. This node is typically
31 a child of an I2C or other control node.
32
33Child 'codec' node required properties:
34- compatible : compatible list, contains the name of the codec
35
36Child 'codec' node optional properties:
37- clock-frequency : The frequency of the input clock, which typically
38 comes from an on-board dedicated oscillator.
diff --git a/Documentation/powerpc/dts-bindings/fsl/tsec.txt b/Documentation/powerpc/dts-bindings/fsl/tsec.txt
new file mode 100644
index 000000000000..583ef6b56c43
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/tsec.txt
@@ -0,0 +1,69 @@
1* MDIO IO device
2
3The MDIO is a bus to which the PHY devices are connected. For each
4device that exists on this bus, a child node should be created. See
5the definition of the PHY node below for an example of how to define
6a PHY.
7
8Required properties:
9 - reg : Offset and length of the register set for the device
10 - compatible : Should define the compatible device type for the
11 mdio. Currently, this is most likely to be "fsl,gianfar-mdio"
12
13Example:
14
15 mdio@24520 {
16 reg = <24520 20>;
17 compatible = "fsl,gianfar-mdio";
18
19 ethernet-phy@0 {
20 ......
21 };
22 };
23
24
25* Gianfar-compatible ethernet nodes
26
27Required properties:
28
29 - device_type : Should be "network"
30 - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC"
31 - compatible : Should be "gianfar"
32 - reg : Offset and length of the register set for the device
33 - mac-address : List of bytes representing the ethernet address of
34 this controller
35 - interrupts : <a b> where a is the interrupt number and b is a
36 field that represents an encoding of the sense and level
37 information for the interrupt. This should be encoded based on
38 the information in section 2) depending on the type of interrupt
39 controller you have.
40 - interrupt-parent : the phandle for the interrupt controller that
41 services interrupts for this device.
42 - phy-handle : The phandle for the PHY connected to this ethernet
43 controller.
44 - fixed-link : <a b c d e> where a is emulated phy id - choose any,
45 but unique to the all specified fixed-links, b is duplex - 0 half,
46 1 full, c is link speed - d#10/d#100/d#1000, d is pause - 0 no
47 pause, 1 pause, e is asym_pause - 0 no asym_pause, 1 asym_pause.
48
49Recommended properties:
50
51 - phy-connection-type : a string naming the controller/PHY interface type,
52 i.e., "mii" (default), "rmii", "gmii", "rgmii", "rgmii-id", "sgmii",
53 "tbi", or "rtbi". This property is only really needed if the connection
54 is of type "rgmii-id", as all other connection types are detected by
55 hardware.
56
57
58Example:
59 ethernet@24000 {
60 #size-cells = <0>;
61 device_type = "network";
62 model = "TSEC";
63 compatible = "gianfar";
64 reg = <24000 1000>;
65 mac-address = [ 00 E0 0C 00 73 00 ];
66 interrupts = <d 3 e 3 12 3>;
67 interrupt-parent = <40000>;
68 phy-handle = <2452000>;
69 };
diff --git a/Documentation/powerpc/dts-bindings/fsl/usb.txt b/Documentation/powerpc/dts-bindings/fsl/usb.txt
new file mode 100644
index 000000000000..b00152402694
--- /dev/null
+++ b/Documentation/powerpc/dts-bindings/fsl/usb.txt
@@ -0,0 +1,59 @@
1Freescale SOC USB controllers
2
3The device node for a USB controller that is part of a Freescale
4SOC is as described in the document "Open Firmware Recommended
5Practice : Universal Serial Bus" with the following modifications
6and additions :
7
8Required properties :
9 - compatible : Should be "fsl-usb2-mph" for multi port host USB
10 controllers, or "fsl-usb2-dr" for dual role USB controllers
11 - phy_type : For multi port host USB controllers, should be one of
12 "ulpi", or "serial". For dual role USB controllers, should be
13 one of "ulpi", "utmi", "utmi_wide", or "serial".
14 - reg : Offset and length of the register set for the device
15 - port0 : boolean; if defined, indicates port0 is connected for
16 fsl-usb2-mph compatible controllers. Either this property or
17 "port1" (or both) must be defined for "fsl-usb2-mph" compatible
18 controllers.
19 - port1 : boolean; if defined, indicates port1 is connected for
20 fsl-usb2-mph compatible controllers. Either this property or
21 "port0" (or both) must be defined for "fsl-usb2-mph" compatible
22 controllers.
23 - dr_mode : indicates the working mode for "fsl-usb2-dr" compatible
24 controllers. Can be "host", "peripheral", or "otg". Default to
25 "host" if not defined for backward compatibility.
26
27Recommended properties :
28 - interrupts : <a b> where a is the interrupt number and b is a
29 field that represents an encoding of the sense and level
30 information for the interrupt. This should be encoded based on
31 the information in section 2) depending on the type of interrupt
32 controller you have.
33 - interrupt-parent : the phandle for the interrupt controller that
34 services interrupts for this device.
35
36Example multi port host USB controller device node :
37 usb@22000 {
38 compatible = "fsl-usb2-mph";
39 reg = <22000 1000>;
40 #address-cells = <1>;
41 #size-cells = <0>;
42 interrupt-parent = <700>;
43 interrupts = <27 1>;
44 phy_type = "ulpi";
45 port0;
46 port1;
47 };
48
49Example dual role USB controller device node :
50 usb@23000 {
51 compatible = "fsl-usb2-dr";
52 reg = <23000 1000>;
53 #address-cells = <1>;
54 #size-cells = <0>;
55 interrupt-parent = <700>;
56 interrupts = <26 1>;
57 dr_mode = "otg";
58 phy_type = "ulpi";
59 };
diff --git a/Documentation/scheduler/sched-domains.txt b/Documentation/scheduler/sched-domains.txt
index a9e990ab980f..373ceacc367e 100644
--- a/Documentation/scheduler/sched-domains.txt
+++ b/Documentation/scheduler/sched-domains.txt
@@ -61,10 +61,7 @@ builder by #define'ing ARCH_HASH_SCHED_DOMAIN, and exporting your
61arch_init_sched_domains function. This function will attach domains to all 61arch_init_sched_domains function. This function will attach domains to all
62CPUs using cpu_attach_domain. 62CPUs using cpu_attach_domain.
63 63
64Implementors should change the line 64The sched-domains debugging infrastructure can be enabled by enabling
65#undef SCHED_DOMAIN_DEBUG 65CONFIG_SCHED_DEBUG. This enables an error checking parse of the sched domains
66to
67#define SCHED_DOMAIN_DEBUG
68in kernel/sched.c as this enables an error checking parse of the sched domains
69which should catch most possible errors (described above). It also prints out 66which should catch most possible errors (described above). It also prints out
70the domain structure in a visual format. 67the domain structure in a visual format.
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 14f901f639ee..3ef339f491e0 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -51,9 +51,9 @@ needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s =
510.00015s. So this group can be scheduled with a period of 0.005s and a run time 510.00015s. So this group can be scheduled with a period of 0.005s and a run time
52of 0.00015s. 52of 0.00015s.
53 53
54The remaining CPU time will be used for user input and other tass. Because 54The remaining CPU time will be used for user input and other tasks. Because
55realtime tasks have explicitly allocated the CPU time they need to perform 55realtime tasks have explicitly allocated the CPU time they need to perform
56their tasks, buffer underruns in the graphocs or audio can be eliminated. 56their tasks, buffer underruns in the graphics or audio can be eliminated.
57 57
58NOTE: the above example is not fully implemented as of yet (2.6.25). We still 58NOTE: the above example is not fully implemented as of yet (2.6.25). We still
59lack an EDF scheduler to make non-uniform periods usable. 59lack an EDF scheduler to make non-uniform periods usable.
diff --git a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt
index d16011a8618e..709ca991a451 100644
--- a/Documentation/scsi/aacraid.txt
+++ b/Documentation/scsi/aacraid.txt
@@ -56,19 +56,33 @@ Supported Cards/Chipsets
56 9005:0285:9005:02d1 Adaptec 5405 (Voodoo40) 56 9005:0285:9005:02d1 Adaptec 5405 (Voodoo40)
57 9005:0285:15d9:02d2 SMC AOC-USAS-S8i-LP 57 9005:0285:15d9:02d2 SMC AOC-USAS-S8i-LP
58 9005:0285:15d9:02d3 SMC AOC-USAS-S8iR-LP 58 9005:0285:15d9:02d3 SMC AOC-USAS-S8iR-LP
59 9005:0285:9005:02d4 Adaptec 2045 (Voodoo04 Lite) 59 9005:0285:9005:02d4 Adaptec ASR-2045 (Voodoo04 Lite)
60 9005:0285:9005:02d5 Adaptec 2405 (Voodoo40 Lite) 60 9005:0285:9005:02d5 Adaptec ASR-2405 (Voodoo40 Lite)
61 9005:0285:9005:02d6 Adaptec 2445 (Voodoo44 Lite) 61 9005:0285:9005:02d6 Adaptec ASR-2445 (Voodoo44 Lite)
62 9005:0285:9005:02d7 Adaptec 2805 (Voodoo80 Lite) 62 9005:0285:9005:02d7 Adaptec ASR-2805 (Voodoo80 Lite)
63 9005:0285:9005:02d8 Adaptec 5405G (Voodoo40 PM)
64 9005:0285:9005:02d9 Adaptec 5445G (Voodoo44 PM)
65 9005:0285:9005:02da Adaptec 5805G (Voodoo80 PM)
66 9005:0285:9005:02db Adaptec 5085G (Voodoo08 PM)
67 9005:0285:9005:02dc Adaptec 51245G (Voodoo124 PM)
68 9005:0285:9005:02dd Adaptec 51645G (Voodoo164 PM)
69 9005:0285:9005:02de Adaptec 52445G (Voodoo244 PM)
70 9005:0285:9005:02df Adaptec ASR-2045G (Voodoo04 Lite PM)
71 9005:0285:9005:02e0 Adaptec ASR-2405G (Voodoo40 Lite PM)
72 9005:0285:9005:02e1 Adaptec ASR-2445G (Voodoo44 Lite PM)
73 9005:0285:9005:02e2 Adaptec ASR-2805G (Voodoo80 Lite PM)
63 1011:0046:9005:0364 Adaptec 5400S (Mustang) 74 1011:0046:9005:0364 Adaptec 5400S (Mustang)
75 1011:0046:9005:0365 Adaptec 5400S (Mustang)
64 9005:0287:9005:0800 Adaptec Themisto (Jupiter) 76 9005:0287:9005:0800 Adaptec Themisto (Jupiter)
65 9005:0200:9005:0200 Adaptec Themisto (Jupiter) 77 9005:0200:9005:0200 Adaptec Themisto (Jupiter)
66 9005:0286:9005:0800 Adaptec Callisto (Jupiter) 78 9005:0286:9005:0800 Adaptec Callisto (Jupiter)
67 1011:0046:9005:1364 Dell PERC 2/QC (Quad Channel, Mustang) 79 1011:0046:9005:1364 Dell PERC 2/QC (Quad Channel, Mustang)
80 1011:0046:9005:1365 Dell PERC 2/QC (Quad Channel, Mustang)
68 1028:0001:1028:0001 Dell PERC 2/Si (Iguana) 81 1028:0001:1028:0001 Dell PERC 2/Si (Iguana)
69 1028:0003:1028:0003 Dell PERC 3/Si (SlimFast) 82 1028:0003:1028:0003 Dell PERC 3/Si (SlimFast)
70 1028:0002:1028:0002 Dell PERC 3/Di (Opal) 83 1028:0002:1028:0002 Dell PERC 3/Di (Opal)
71 1028:0004:1028:0004 Dell PERC 3/DiF (Iguana) 84 1028:0004:1028:0004 Dell PERC 3/SiF (Iguana)
85 1028:0004:1028:00d0 Dell PERC 3/DiF (Iguana)
72 1028:0002:1028:00d1 Dell PERC 3/DiV (Viper) 86 1028:0002:1028:00d1 Dell PERC 3/DiV (Viper)
73 1028:0002:1028:00d9 Dell PERC 3/DiL (Lexus) 87 1028:0002:1028:00d9 Dell PERC 3/DiL (Lexus)
74 1028:000a:1028:0106 Dell PERC 3/DiJ (Jaguar) 88 1028:000a:1028:0106 Dell PERC 3/DiJ (Jaguar)
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index 0bbee38acd26..72aff61e7315 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -753,8 +753,11 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
753 753
754 [Multiple options for each card instance] 754 [Multiple options for each card instance]
755 model - force the model name 755 model - force the model name
756 position_fix - Fix DMA pointer (0 = auto, 1 = none, 2 = POSBUF, 3 = FIFO size) 756 position_fix - Fix DMA pointer (0 = auto, 1 = use LPIB, 2 = POSBUF)
757 probe_mask - Bitmask to probe codecs (default = -1, meaning all slots) 757 probe_mask - Bitmask to probe codecs (default = -1, meaning all slots)
758 bdl_pos_adj - Specifies the DMA IRQ timing delay in samples.
759 Passing -1 makes the driver choose the appropriate
760 value based on the controller chip.
758 761
759 [Single (global) options] 762 [Single (global) options]
760 single_cmd - Use single immediate commands to communicate with 763 single_cmd - Use single immediate commands to communicate with
@@ -845,7 +848,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
845 ALC269 848 ALC269
846 basic Basic preset 849 basic Basic preset
847 850
848 ALC662 851 ALC662/663
849 3stack-dig 3-stack (2-channel) with SPDIF 852 3stack-dig 3-stack (2-channel) with SPDIF
850 3stack-6ch 3-stack (6-channel) 853 3stack-6ch 3-stack (6-channel)
851 3stack-6ch-dig 3-stack (6-channel) with SPDIF 854 3stack-6ch-dig 3-stack (6-channel) with SPDIF
@@ -853,6 +856,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
853 lenovo-101e Lenovo laptop 856 lenovo-101e Lenovo laptop
854 eeepc-p701 ASUS Eeepc P701 857 eeepc-p701 ASUS Eeepc P701
855 eeepc-ep20 ASUS Eeepc EP20 858 eeepc-ep20 ASUS Eeepc EP20
859 m51va ASUS M51VA
860 g71v ASUS G71V
861 h13 ASUS H13
862 g50v ASUS G50V
856 auto auto-config reading BIOS (default) 863 auto auto-config reading BIOS (default)
857 864
858 ALC882/885 865 ALC882/885
@@ -1091,7 +1098,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
1091 This occurs when the access to non-existing or non-working codec slot 1098 This occurs when the access to non-existing or non-working codec slot
1092 (likely a modem one) causes a stall of the communication via HD-audio 1099 (likely a modem one) causes a stall of the communication via HD-audio
1093 bus. You can see which codec slots are probed by enabling 1100 bus. You can see which codec slots are probed by enabling
1094 CONFIG_SND_DEBUG_DETECT, or simply from the file name of the codec 1101 CONFIG_SND_DEBUG_VERBOSE, or simply from the file name of the codec
1095 proc files. Then limit the slots to probe by probe_mask option. 1102 proc files. Then limit the slots to probe by probe_mask option.
1096 For example, probe_mask=1 means to probe only the first slot, and 1103 For example, probe_mask=1 means to probe only the first slot, and
1097 probe_mask=4 means only the third slot. 1104 probe_mask=4 means only the third slot.
@@ -2267,6 +2274,10 @@ case above again, the first two slots are already reserved. If any
2267other driver (e.g. snd-usb-audio) is loaded before snd-interwave or 2274other driver (e.g. snd-usb-audio) is loaded before snd-interwave or
2268snd-ens1371, it will be assigned to the third or later slot. 2275snd-ens1371, it will be assigned to the third or later slot.
2269 2276
2277When a module name is given with '!', the slot will be given for any
2278modules but that name. For example, "slots=!snd-pcsp" will reserve
2279the first slot for any modules but snd-pcsp.
2280
2270 2281
2271ALSA PCM devices to OSS devices mapping 2282ALSA PCM devices to OSS devices mapping
2272======================================= 2283=======================================
diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
index b03df4d4795c..e13c4e67029f 100644
--- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
+++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
@@ -6127,8 +6127,8 @@ struct _snd_pcm_runtime {
6127 6127
6128 <para> 6128 <para>
6129 <function>snd_printdd()</function> is compiled in only when 6129 <function>snd_printdd()</function> is compiled in only when
6130 <constant>CONFIG_SND_DEBUG_DETECT</constant> is set. Please note 6130 <constant>CONFIG_SND_DEBUG_VERBOSE</constant> is set. Please note
6131 that <constant>DEBUG_DETECT</constant> is not set as default 6131 that <constant>CONFIG_SND_DEBUG_VERBOSE</constant> is not set as default
6132 even if you configure the alsa-driver with 6132 even if you configure the alsa-driver with
6133 <option>--with-debug=full</option> option. You need to give 6133 <option>--with-debug=full</option> option. You need to give
6134 explicitly <option>--with-debug=detect</option> option instead. 6134 explicitly <option>--with-debug=detect</option> option instead.
diff --git a/Documentation/tracers/mmiotrace.txt b/Documentation/tracers/mmiotrace.txt
new file mode 100644
index 000000000000..a4afb560a45b
--- /dev/null
+++ b/Documentation/tracers/mmiotrace.txt
@@ -0,0 +1,164 @@
1 In-kernel memory-mapped I/O tracing
2
3
4Home page and links to optional user space tools:
5
6 http://nouveau.freedesktop.org/wiki/MmioTrace
7
8MMIO tracing was originally developed by Intel around 2003 for their Fault
9Injection Test Harness. In Dec 2006 - Jan 2007, using the code from Intel,
10Jeff Muizelaar created a tool for tracing MMIO accesses with the Nouveau
11project in mind. Since then many people have contributed.
12
13Mmiotrace was built for reverse engineering any memory-mapped IO device with
14the Nouveau project as the first real user. Only x86 and x86_64 architectures
15are supported.
16
17Out-of-tree mmiotrace was originally modified for mainline inclusion and
18ftrace framework by Pekka Paalanen <pq@iki.fi>.
19
20
21Preparation
22-----------
23
24The mmiotrace feature is compiled in with the CONFIG_MMIOTRACE option. Tracing is
25disabled by default, so it is safe to have this set to yes. SMP systems are
26supported, but tracing is unreliable and may miss events if more than one CPU
27is on-line, therefore mmiotrace takes all but one CPU off-line during run-time
28activation. You can re-enable CPUs by hand, but you have been warned, there
29is no way to automatically detect if you are losing events due to CPUs racing.
30
31
32Usage Quick Reference
33---------------------
34
35$ mount -t debugfs debugfs /debug
36$ echo mmiotrace > /debug/tracing/current_tracer
37$ cat /debug/tracing/trace_pipe > mydump.txt &
38Start X or whatever.
39$ echo "X is up" > /debug/tracing/marker
40$ echo none > /debug/tracing/current_tracer
41Check for lost events.
42
43
44Usage
45-----
46
47Make sure debugfs is mounted at /debug. If not, mount it (requires root privileges):
48$ mount -t debugfs debugfs /debug
49
50Check that the driver you are about to trace is not loaded.
51
52Activate mmiotrace (requires root privileges):
53$ echo mmiotrace > /debug/tracing/current_tracer
54
55Start storing the trace:
56$ cat /debug/tracing/trace_pipe > mydump.txt &
57The 'cat' process should stay running (sleeping) in the background.
58
59Load the driver you want to trace and use it. Mmiotrace will only catch MMIO
60accesses to areas that are ioremapped while mmiotrace is active.
61
62[Unimplemented feature:]
63During tracing you can place comments (markers) into the trace by
64$ echo "X is up" > /debug/tracing/marker
65This makes it easier to see which part of the (huge) trace corresponds to
66which action. It is recommended to place descriptive markers about what you
67do.
68
69Shut down mmiotrace (requires root privileges):
70$ echo none > /debug/tracing/current_tracer
71The 'cat' process exits. If it does not, kill it by issuing 'fg' command and
72pressing ctrl+c.
73
74Check that mmiotrace did not lose events due to a buffer filling up. Either
75$ grep -i lost mydump.txt
76which tells you exactly how many events were lost, or use
77$ dmesg
78to view your kernel log and look for "mmiotrace has lost events" warning. If
79events were lost, the trace is incomplete. You should enlarge the buffers and
80try again. Buffers are enlarged by first seeing how large the current buffers
81are:
82$ cat /debug/tracing/trace_entries
83gives you a number. Approximately double this number and write it back, for
84instance:
85$ echo 128000 > /debug/tracing/trace_entries
86Then start again from the top.
87
88If you are doing a trace for a driver project, e.g. Nouveau, you should also
89do the following before sending your results:
90$ lspci -vvv > lspci.txt
91$ dmesg > dmesg.txt
92$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
93and then send the .tar.gz file. The trace compresses considerably. Replace
94"pciid" and "nick" with the PCI ID or model name of your piece of hardware
95under investigation and your nickname.
96
97
98How Mmiotrace Works
99-------------------
100
101Access to hardware IO-memory is gained by mapping addresses from PCI bus by
102calling one of the ioremap_*() functions. Mmiotrace is hooked into the
103__ioremap() function and gets called whenever a mapping is created. Mapping is
104an event that is recorded into the trace log. Note that ISA range mappings
105are not caught, since the mapping always exists and is returned directly.
106
107MMIO accesses are recorded via page faults. Just before __ioremap() returns,
108the mapped pages are marked as not present. Any access to the pages causes a
109fault. The page fault handler calls mmiotrace to handle the fault. Mmiotrace
110marks the page present, sets TF flag to achieve single stepping and exits the
111fault handler. The instruction that faulted is executed and debug trap is
112entered. Here mmiotrace again marks the page as not present. The instruction
113is decoded to get the type of operation (read/write), data width and the value
114read or written. These are stored to the trace log.
115
116Setting the page present in the page fault handler has a race condition on SMP
117machines. During the single stepping other CPUs may run freely on that page
118and events can be missed without notice. Re-enabling other CPUs during
119tracing is discouraged.
120
121
122Trace Log Format
123----------------
124
125The raw log is text and easily filtered with e.g. grep and awk. One record is
126one line in the log. A record starts with a keyword, followed by keyword
127dependent arguments. Arguments are separated by a space, or continue until the
128end of line. The format for version 20070824 is as follows:
129
130Explanation Keyword Space separated arguments
131---------------------------------------------------------------------------
132
133read event R width, timestamp, map id, physical, value, PC, PID
134write event W width, timestamp, map id, physical, value, PC, PID
135ioremap event MAP timestamp, map id, physical, virtual, length, PC, PID
136iounmap event UNMAP timestamp, map id, PC, PID
137marker MARK timestamp, text
138version VERSION the string "20070824"
139info for reader LSPCI one line from lspci -v
140PCI address map PCIDEV space separated /proc/bus/pci/devices data
141unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID
142
143Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual
144is a kernel virtual address. Width is the data width in bytes and value is the
145data value. Map id is an arbitrary id number identifying the mapping that was
146used in an operation. PC is the program counter and PID is process id. PC is
147zero if it is not recorded. PID is always zero as tracing MMIO accesses
148originating in user space memory is not yet supported.
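
As an illustration of the record layout in the table above, a log line can be split into named fields roughly as follows. This is a sketch only: the function name parse_record and the sample lines are invented for the example and are not taken from a real trace, and all values are kept as strings without interpretation.

```python
# Field names per keyword, following the format table for version 20070824.
FIELDS = {
    "R": ["width", "timestamp", "map_id", "physical", "value", "pc", "pid"],
    "W": ["width", "timestamp", "map_id", "physical", "value", "pc", "pid"],
    "MAP": ["timestamp", "map_id", "physical", "virtual", "length", "pc", "pid"],
    "UNMAP": ["timestamp", "map_id", "pc", "pid"],
    "MARK": ["timestamp", "text"],
    "UNKNOWN": ["timestamp", "map_id", "physical", "data", "pc", "pid"],
}


def parse_record(line):
    """Split one trace log line into a dict of named string fields."""
    keyword, _, rest = line.strip().partition(" ")
    names = FIELDS.get(keyword)
    if names is None:
        # VERSION, LSPCI, PCIDEV and anything unrecognised: keep the raw text.
        return {"keyword": keyword, "text": rest}
    # The last field absorbs the remainder of the line (needed for MARK text).
    values = rest.split(" ", len(names) - 1)
    return {"keyword": keyword, **dict(zip(names, values))}


if __name__ == "__main__":
    # Hypothetical sample lines, made up to match the table above.
    print(parse_record("W 4 12.345678 1 0xfb73ce44 0x1 0x0 0"))
    print(parse_record("MARK 13.000000 X is up"))
```

Limiting the split to the number of declared fields keeps marker text containing spaces intact.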
149
150For instance, the following awk filter will pass all 32-bit writes that target
151physical addresses in the range [0xfb73ce40, 0xfb800000[
152
153$ awk '/W 4 / { adr=strtonum($5); if (adr >= 0xfb73ce40 &&
154adr < 0xfb800000) print; }'
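
The same filter can be sketched in Python for readers without gawk. As with strtonum() in the awk version, this assumes the physical address field carries a 0x prefix; the function name and the sample lines in the usage note are illustrative only. Unlike the awk pattern, which matches "W 4 " anywhere in the line, this version requires the keyword and width to be the first two fields.

```python
def is_32bit_write_in_range(line, lo=0xfb73ce40, hi=0xfb800000):
    """True for 32-bit write records whose physical address is in [lo, hi)."""
    fields = line.split()
    # Write records are: W width timestamp map_id physical value PC PID
    if len(fields) < 5 or fields[0] != "W" or fields[1] != "4":
        return False
    # Base 0 lets int() honour the 0x prefix, like strtonum() in gawk.
    addr = int(fields[4], 0)
    return lo <= addr < hi


if __name__ == "__main__":
    import sys

    # Usage (assumed invocation): python filter.py < mydump.txt
    for line in sys.stdin:
        if is_32bit_write_in_range(line):
            sys.stdout.write(line)
```

The upper bound is exclusive, matching the half-open range given above.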
155
156
157Tools for Developers
158--------------------
159
160The user space tools include utilities for:
161- replacing numeric addresses and values with hardware register names
162- replaying MMIO logs, i.e., re-executing the recorded writes
163
164