aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/00-INDEX2
-rw-r--r--Documentation/ABI/stable/sysfs-class-ubi212
-rw-r--r--Documentation/DocBook/Makefile5
-rw-r--r--Documentation/DocBook/kernel-api.tmpl56
-rw-r--r--Documentation/HOWTO30
-rw-r--r--Documentation/arm/Samsung-S3C24XX/NAND.txt30
-rw-r--r--Documentation/arm/Samsung-S3C24XX/Overview.txt2
-rw-r--r--Documentation/device-mapper/dm-crypt.txt52
-rw-r--r--Documentation/fb/gxfb.txt52
-rw-r--r--Documentation/fb/intelfb.txt2
-rw-r--r--Documentation/fb/lxfb.txt52
-rw-r--r--Documentation/fb/metronomefb.txt16
-rw-r--r--Documentation/fb/modedb.txt4
-rw-r--r--Documentation/feature-removal-schedule.txt9
-rw-r--r--Documentation/filesystems/Locking3
-rw-r--r--Documentation/filesystems/nfs-rdma.txt14
-rw-r--r--Documentation/filesystems/seq_file.txt19
-rw-r--r--Documentation/filesystems/tmpfs.txt12
-rw-r--r--Documentation/filesystems/vfat.txt15
-rw-r--r--Documentation/gpio.txt10
-rw-r--r--Documentation/i386/boot.txt26
-rw-r--r--Documentation/ia64/kvm.txt82
-rw-r--r--Documentation/ide/ide-tape.txt211
-rw-r--r--Documentation/ide/ide.txt107
-rw-r--r--Documentation/ioctl-number.txt2
-rw-r--r--Documentation/kbuild/modules.txt9
-rw-r--r--Documentation/kernel-parameters.txt4
-rw-r--r--Documentation/kprobes.txt51
-rw-r--r--Documentation/leds-class.txt12
-rw-r--r--Documentation/md.txt6
-rw-r--r--Documentation/mips/AU1xxx_IDE.README46
-rw-r--r--Documentation/networking/phy.txt38
-rw-r--r--Documentation/powerpc/booting-without-of.txt44
-rw-r--r--Documentation/powerpc/kvm_440.txt41
-rw-r--r--Documentation/s390/kvm.txt125
-rw-r--r--Documentation/smart-config.txt98
-rw-r--r--Documentation/spi/spidev168
-rw-r--r--Documentation/spi/spidev_fdx.c158
-rw-r--r--Documentation/usb/anchors.txt50
-rw-r--r--Documentation/usb/callbacks.txt132
-rw-r--r--Documentation/usb/persist.txt43
-rw-r--r--Documentation/usb/usb-serial.txt7
-rw-r--r--Documentation/vm/numa_memory_policy.txt281
43 files changed, 1652 insertions, 686 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index a82a113b4a4b..1977fab38656 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -329,8 +329,6 @@ sgi-visws.txt
329 - short blurb on the SGI Visual Workstations. 329 - short blurb on the SGI Visual Workstations.
330sh/ 330sh/
331 - directory with info on porting Linux to a new architecture. 331 - directory with info on porting Linux to a new architecture.
332smart-config.txt
333 - description of the Smart Config makefile feature.
334sound/ 332sound/
335 - directory with info on sound card support. 333 - directory with info on sound card support.
336sparc/ 334sparc/
diff --git a/Documentation/ABI/stable/sysfs-class-ubi b/Documentation/ABI/stable/sysfs-class-ubi
new file mode 100644
index 000000000000..18d471d9faea
--- /dev/null
+++ b/Documentation/ABI/stable/sysfs-class-ubi
@@ -0,0 +1,212 @@
1What: /sys/class/ubi/
2Date: July 2006
3KernelVersion: 2.6.22
4Contact: Artem Bityutskiy <dedekind@infradead.org>
5Description:
6 The ubi/ class sub-directory belongs to the UBI subsystem and
7 provides general UBI information, per-UBI device information
8 and per-UBI volume information.
9
10What: /sys/class/ubi/version
11Date: July 2006
12KernelVersion: 2.6.22
13Contact: Artem Bityutskiy <dedekind@infradead.org>
14Description:
15 This file contains version of the latest supported UBI on-media
16 format. Currently it is 1, and there is no plan to change this.
17 However, if in the future UBI needs on-flash format changes
18 which cannot be done in a compatible manner, a new format
19 version will be added. So this is a mechanism for possible
20 future backward-compatible (but forward-incompatible)
21 improvements.
22
23What: /sys/class/ubiX/
24Date: July 2006
25KernelVersion: 2.6.22
26Contact: Artem Bityutskiy <dedekind@infradead.org>
27Description:
28 The /sys/class/ubi0, /sys/class/ubi1, etc directories describe
29 UBI devices (UBI device 0, 1, etc). They contain general UBI
30 device information and per UBI volume information (each UBI
31 device may have many UBI volumes)
32
33What: /sys/class/ubi/ubiX/avail_eraseblocks
34Date: July 2006
35KernelVersion: 2.6.22
36Contact: Artem Bityutskiy <dedekind@infradead.org>
37Description:
38 Amount of available logical eraseblock. For example, one may
39 create a new UBI volume which has this amount of logical
40 eraseblocks.
41
42What: /sys/class/ubi/ubiX/bad_peb_count
43Date: July 2006
44KernelVersion: 2.6.22
45Contact: Artem Bityutskiy <dedekind@infradead.org>
46Description:
47 Count of bad physical eraseblocks on the underlying MTD device.
48
49What: /sys/class/ubi/ubiX/bgt_enabled
50Date: July 2006
51KernelVersion: 2.6.22
52Contact: Artem Bityutskiy <dedekind@infradead.org>
53Description:
54 Contains ASCII "0\n" if the UBI background thread is disabled,
55 and ASCII "1\n" if it is enabled.
56
57What: /sys/class/ubi/ubiX/dev
58Date: July 2006
59KernelVersion: 2.6.22
60Contact: Artem Bityutskiy <dedekind@infradead.org>
61Description:
62 Major and minor numbers of the character device corresponding
63 to this UBI device (in <major>:<minor> format).
64
65What: /sys/class/ubi/ubiX/eraseblock_size
66Date: July 2006
67KernelVersion: 2.6.22
68Contact: Artem Bityutskiy <dedekind@infradead.org>
69Description:
70 Maximum logical eraseblock size this UBI device may provide. UBI
71 volumes may have smaller logical eraseblock size because of their
72 alignment.
73
74What: /sys/class/ubi/ubiX/max_ec
75Date: July 2006
76KernelVersion: 2.6.22
77Contact: Artem Bityutskiy <dedekind@infradead.org>
78Description:
79 Maximum physical eraseblock erase counter value.
80
81What: /sys/class/ubi/ubiX/max_vol_count
82Date: July 2006
83KernelVersion: 2.6.22
84Contact: Artem Bityutskiy <dedekind@infradead.org>
85Description:
86 Maximum number of volumes which this UBI device may have.
87
88What: /sys/class/ubi/ubiX/min_io_size
89Date: July 2006
90KernelVersion: 2.6.22
91Contact: Artem Bityutskiy <dedekind@infradead.org>
92Description:
93 Minimum input/output unit size. All the I/O may only be done
94 in fractions of the contained number.
95
96What: /sys/class/ubi/ubiX/mtd_num
97Date: January 2008
98KernelVersion: 2.6.25
99Contact: Artem Bityutskiy <dedekind@infradead.org>
100Description:
101 Number of the underlying MTD device.
102
103What: /sys/class/ubi/ubiX/reserved_for_bad
104Date: July 2006
105KernelVersion: 2.6.22
106Contact: Artem Bityutskiy <dedekind@infradead.org>
107Description:
108 Number of physical eraseblocks reserved for bad block handling.
109
110What: /sys/class/ubi/ubiX/total_eraseblocks
111Date: July 2006
112KernelVersion: 2.6.22
113Contact: Artem Bityutskiy <dedekind@infradead.org>
114Description:
115 Total number of good (not marked as bad) physical eraseblocks on
116 the underlying MTD device.
117
118What: /sys/class/ubi/ubiX/volumes_count
119Date: July 2006
120KernelVersion: 2.6.22
121Contact: Artem Bityutskiy <dedekind@infradead.org>
122Description:
123 Count of volumes on this UBI device.
124
125What: /sys/class/ubi/ubiX/ubiX_Y/
126Date: July 2006
127KernelVersion: 2.6.22
128Contact: Artem Bityutskiy <dedekind@infradead.org>
129Description:
130 The /sys/class/ubi/ubiX/ubiX_0/, /sys/class/ubi/ubiX/ubiX_1/,
131 etc directories describe UBI volumes on UBI device X (volumes
132 0, 1, etc).
133
134What: /sys/class/ubi/ubiX/ubiX_Y/alignment
135Date: July 2006
136KernelVersion: 2.6.22
137Contact: Artem Bityutskiy <dedekind@infradead.org>
138Description:
139 Volume alignment - the value the logical eraseblock size of
140 this volume has to be aligned on. For example, 2048 means that
141 logical eraseblock size is multiple of 2048. In other words,
142 volume logical eraseblock size is UBI device logical eraseblock
143 size aligned to the alignment value.
144
145What: /sys/class/ubi/ubiX/ubiX_Y/corrupted
146Date: July 2006
147KernelVersion: 2.6.22
148Contact: Artem Bityutskiy <dedekind@infradead.org>
149Description:
150 Contains ASCII "0\n" if the UBI volume is OK, and ASCII "1\n"
151 if it is corrupted (e.g., due to an interrupted volume update).
152
153What: /sys/class/ubi/ubiX/ubiX_Y/data_bytes
154Date: July 2006
155KernelVersion: 2.6.22
156Contact: Artem Bityutskiy <dedekind@infradead.org>
157Description:
158 The amount of data this volume contains. This value makes sense
159 only for static volumes, and for dynamic volume it equivalent
160 to the total volume size in bytes.
161
162What: /sys/class/ubi/ubiX/ubiX_Y/dev
163Date: July 2006
164KernelVersion: 2.6.22
165Contact: Artem Bityutskiy <dedekind@infradead.org>
166Description:
167 Major and minor numbers of the character device corresponding
168 to this UBI volume (in <major>:<minor> format).
169
170What: /sys/class/ubi/ubiX/ubiX_Y/name
171Date: July 2006
172KernelVersion: 2.6.22
173Contact: Artem Bityutskiy <dedekind@infradead.org>
174Description:
175 Volume name.
176
177What: /sys/class/ubi/ubiX/ubiX_Y/reserved_ebs
178Date: July 2006
179KernelVersion: 2.6.22
180Contact: Artem Bityutskiy <dedekind@infradead.org>
181Description:
182 Count of physical eraseblock reserved for this volume.
183 Equivalent to the volume size in logical eraseblocks.
184
185What: /sys/class/ubi/ubiX/ubiX_Y/type
186Date: July 2006
187KernelVersion: 2.6.22
188Contact: Artem Bityutskiy <dedekind@infradead.org>
189Description:
190 Volume type. Contains ASCII "dynamic\n" for dynamic volumes and
191 "static\n" for static volumes.
192
193What: /sys/class/ubi/ubiX/ubiX_Y/upd_marker
194Date: July 2006
195KernelVersion: 2.6.22
196Contact: Artem Bityutskiy <dedekind@infradead.org>
197Description:
198 Contains ASCII "0\n" if the update marker is not set for this
199 volume, and "1\n" if it is set. The update marker is set when
200 volume update starts, and cleaned when it ends. So the presence
201 of the update marker indicates that the volume is being updated
202 at the moment of the update was interrupted. The later may be
203 checked using the "corrupted" sysfs file.
204
205What: /sys/class/ubi/ubiX/ubiX_Y/usable_eb_size
206Date: July 2006
207KernelVersion: 2.6.22
208Contact: Artem Bityutskiy <dedekind@infradead.org>
209Description:
210 Logical eraseblock size of this volume. Equivalent to logical
211 eraseblock size of the device aligned on the volume alignment
212 value.
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index b2b6366bba51..83966e94cc32 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -187,8 +187,11 @@ quiet_cmd_fig2png = FIG2PNG $@
187 187
188### 188###
189# Rule to convert a .c file to inline XML documentation 189# Rule to convert a .c file to inline XML documentation
190 gen_xml = :
191 quiet_gen_xml = echo ' GEN $@'
192silent_gen_xml = :
190%.xml: %.c 193%.xml: %.c
191 @echo ' GEN $@' 194 @$($(quiet)gen_xml)
192 @( \ 195 @( \
193 echo "<programlisting>"; \ 196 echo "<programlisting>"; \
194 expand --tabs=8 < $< | \ 197 expand --tabs=8 < $< | \
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index 488dd4a4945b..b7b1482f6e04 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -119,7 +119,7 @@ X!Ilib/string.c
119!Elib/string.c 119!Elib/string.c
120 </sect1> 120 </sect1>
121 <sect1><title>Bit Operations</title> 121 <sect1><title>Bit Operations</title>
122!Iinclude/asm-x86/bitops_32.h 122!Iinclude/asm-x86/bitops.h
123 </sect1> 123 </sect1>
124 </chapter> 124 </chapter>
125 125
@@ -645,4 +645,58 @@ X!Idrivers/video/console/fonts.c
645!Edrivers/i2c/i2c-core.c 645!Edrivers/i2c/i2c-core.c
646 </chapter> 646 </chapter>
647 647
648 <chapter id="clk">
649 <title>Clock Framework</title>
650
651 <para>
652 The clock framework defines programming interfaces to support
653 software management of the system clock tree.
654 This framework is widely used with System-On-Chip (SOC) platforms
655 to support power management and various devices which may need
656 custom clock rates.
657 Note that these "clocks" don't relate to timekeeping or real
658 time clocks (RTCs), each of which have separate frameworks.
659 These <structname>struct clk</structname> instances may be used
660 to manage for example a 96 MHz signal that is used to shift bits
661 into and out of peripherals or busses, or otherwise trigger
662 synchronous state machine transitions in system hardware.
663 </para>
664
665 <para>
666 Power management is supported by explicit software clock gating:
667 unused clocks are disabled, so the system doesn't waste power
668 changing the state of transistors that aren't in active use.
669 On some systems this may be backed by hardware clock gating,
670 where clocks are gated without being disabled in software.
671 Sections of chips that are powered but not clocked may be able
672 to retain their last state.
673 This low power state is often called a <emphasis>retention
674 mode</emphasis>.
675 This mode still incurs leakage currents, especially with finer
676 circuit geometries, but for CMOS circuits power is mostly used
677 by clocked state changes.
678 </para>
679
680 <para>
681 Power-aware drivers only enable their clocks when the device
682 they manage is in active use. Also, system sleep states often
683 differ according to which clock domains are active: while a
684 "standby" state may allow wakeup from several active domains, a
685 "mem" (suspend-to-RAM) state may require a more wholesale shutdown
686 of clocks derived from higher speed PLLs and oscillators, limiting
687 the number of possible wakeup event sources. A driver's suspend
688 method may need to be aware of system-specific clock constraints
689 on the target sleep state.
690 </para>
691
692 <para>
693 Some platforms support programmable clock generators. These
694 can be used by external chips of various kinds, such as other
695 CPUs, multimedia codecs, and devices with strict requirements
696 for interface clocking.
697 </para>
698
699!Iinclude/linux/clk.h
700 </chapter>
701
648</book> 702</book>
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 54835610b3d6..0291ade44c17 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -249,9 +249,11 @@ process is as follows:
249 release a new -rc kernel every week. 249 release a new -rc kernel every week.
250 - Process continues until the kernel is considered "ready", the 250 - Process continues until the kernel is considered "ready", the
251 process should last around 6 weeks. 251 process should last around 6 weeks.
252 - A list of known regressions present in each -rc release is 252 - Known regressions in each release are periodically posted to the
253 tracked at the following URI: 253 linux-kernel mailing list. The goal is to reduce the length of
254 http://kernelnewbies.org/known_regressions 254 that list to zero before declaring the kernel to be "ready," but, in
255 the real world, a small number of regressions often remain at
256 release time.
255 257
256It is worth mentioning what Andrew Morton wrote on the linux-kernel 258It is worth mentioning what Andrew Morton wrote on the linux-kernel
257mailing list about kernel releases: 259mailing list about kernel releases:
@@ -261,7 +263,7 @@ mailing list about kernel releases:
261 263
2622.6.x.y -stable kernel tree 2642.6.x.y -stable kernel tree
263--------------------------- 265---------------------------
264Kernels with 4 digit versions are -stable kernels. They contain 266Kernels with 4-part versions are -stable kernels. They contain
265relatively small and critical fixes for security problems or significant 267relatively small and critical fixes for security problems or significant
266regressions discovered in a given 2.6.x kernel. 268regressions discovered in a given 2.6.x kernel.
267 269
@@ -273,7 +275,10 @@ If no 2.6.x.y kernel is available, then the highest numbered 2.6.x
273kernel is the current stable kernel. 275kernel is the current stable kernel.
274 276
2752.6.x.y are maintained by the "stable" team <stable@kernel.org>, and are 2772.6.x.y are maintained by the "stable" team <stable@kernel.org>, and are
276released almost every other week. 278released as needs dictate. The normal release period is approximately
279two weeks, but it can be longer if there are no pressing problems. A
280security-related problem, instead, can cause a release to happen almost
281instantly.
277 282
278The file Documentation/stable_kernel_rules.txt in the kernel tree 283The file Documentation/stable_kernel_rules.txt in the kernel tree
279documents what kinds of changes are acceptable for the -stable tree, and 284documents what kinds of changes are acceptable for the -stable tree, and
@@ -298,7 +303,9 @@ a while Andrew or the subsystem maintainer pushes it on to Linus for
298inclusion in mainline. 303inclusion in mainline.
299 304
300It is heavily encouraged that all new patches get tested in the -mm tree 305It is heavily encouraged that all new patches get tested in the -mm tree
301before they are sent to Linus for inclusion in the main kernel tree. 306before they are sent to Linus for inclusion in the main kernel tree. Code
307which does not make an appearance in -mm before the opening of the merge
308window will prove hard to merge into the mainline.
302 309
303These kernels are not appropriate for use on systems that are supposed 310These kernels are not appropriate for use on systems that are supposed
304to be stable and they are more risky to run than any of the other 311to be stable and they are more risky to run than any of the other
@@ -354,11 +361,12 @@ Here is a list of some of the different kernel trees available:
354 - SCSI, James Bottomley <James.Bottomley@SteelEye.com> 361 - SCSI, James Bottomley <James.Bottomley@SteelEye.com>
355 git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git 362 git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git
356 363
364 - x86, Ingo Molnar <mingo@elte.hu>
365 git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git
366
357 quilt trees: 367 quilt trees:
358 - USB, PCI, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> 368 - USB, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de>
359 kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ 369 kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
360 - x86-64, partly i386, Andi Kleen <ak@suse.de>
361 ftp.firstfloor.org:/pub/ak/x86_64/quilt/
362 370
363 Other kernel trees can be found listed at http://git.kernel.org/ and in 371 Other kernel trees can be found listed at http://git.kernel.org/ and in
364 the MAINTAINERS file. 372 the MAINTAINERS file.
@@ -392,8 +400,8 @@ If you want to be advised of the future bug reports, you can subscribe to the
392bugme-new mailing list (only new bug reports are mailed here) or to the 400bugme-new mailing list (only new bug reports are mailed here) or to the
393bugme-janitor mailing list (every change in the bugzilla is mailed here) 401bugme-janitor mailing list (every change in the bugzilla is mailed here)
394 402
395 http://lists.osdl.org/mailman/listinfo/bugme-new 403 http://lists.linux-foundation.org/mailman/listinfo/bugme-new
396 http://lists.osdl.org/mailman/listinfo/bugme-janitors 404 http://lists.linux-foundation.org/mailman/listinfo/bugme-janitors
397 405
398 406
399 407
diff --git a/Documentation/arm/Samsung-S3C24XX/NAND.txt b/Documentation/arm/Samsung-S3C24XX/NAND.txt
new file mode 100644
index 000000000000..bc478a3409b8
--- /dev/null
+++ b/Documentation/arm/Samsung-S3C24XX/NAND.txt
@@ -0,0 +1,30 @@
1 S3C24XX NAND Support
2 ====================
3
4Introduction
5------------
6
7Small Page NAND
8---------------
9
10The driver uses a 512 byte (1 page) ECC code for this setup. The
11ECC code is not directly compatible with the default kernel ECC
12code, so the driver enforces its own OOB layout and ECC parameters
13
14Large Page NAND
15---------------
16
17The driver is capable of handling NAND flash with a 2KiB page
18size, with support for hardware ECC generation and correction.
19
20Unlike the 512byte page mode, the driver generates ECC data for
21each 256 byte block in an 2KiB page. This means that more than
22one error in a page can be rectified. It also means that the
23OOB layout remains the default kernel layout for these flashes.
24
25
26Document Author
27---------------
28
29Ben Dooks, Copyright 2007 Simtec Electronics
30
diff --git a/Documentation/arm/Samsung-S3C24XX/Overview.txt b/Documentation/arm/Samsung-S3C24XX/Overview.txt
index c31b76fa66c4..d04e1e30c47f 100644
--- a/Documentation/arm/Samsung-S3C24XX/Overview.txt
+++ b/Documentation/arm/Samsung-S3C24XX/Overview.txt
@@ -156,6 +156,8 @@ NAND
156 controller. If there are any problems the latest linux-mtd 156 controller. If there are any problems the latest linux-mtd
157 code can be found from http://www.linux-mtd.infradead.org/ 157 code can be found from http://www.linux-mtd.infradead.org/
158 158
159 For more information see Documentation/arm/Samsung-S3C24XX/NAND.txt
160
159 161
160Serial 162Serial
161------ 163------
diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt
new file mode 100644
index 000000000000..6680cab2c705
--- /dev/null
+++ b/Documentation/device-mapper/dm-crypt.txt
@@ -0,0 +1,52 @@
1dm-crypt
2=========
3
4Device-Mapper's "crypt" target provides transparent encryption of block devices
5using the kernel crypto API.
6
7Parameters: <cipher> <key> <iv_offset> <device path> <offset>
8
9<cipher>
10 Encryption cipher and an optional IV generation mode.
11 (In format cipher-chainmode-ivopts:ivmode).
12 Examples:
13 des
14 aes-cbc-essiv:sha256
15 twofish-ecb
16
17 /proc/crypto contains supported crypto modes
18
19<key>
20 Key used for encryption. It is encoded as a hexadecimal number.
21 You can only use key sizes that are valid for the selected cipher.
22
23<iv_offset>
24 The IV offset is a sector count that is added to the sector number
25 before creating the IV.
26
27<device path>
28 This is the device that is going to be used as backend and contains the
29 encrypted data. You can specify it as a path like /dev/xxx or a device
30 number <major>:<minor>.
31
32<offset>
33 Starting sector within the device where the encrypted data begins.
34
35Example scripts
36===============
37LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
38encryption with dm-crypt using the 'cryptsetup' utility, see
39http://luks.endorphin.org/
40
41[[
42#!/bin/sh
43# Create a crypt device using dmsetup
44dmsetup create crypt1 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
45]]
46
47[[
48#!/bin/sh
49# Create a crypt device using cryptsetup and LUKS header with default cipher
50cryptsetup luksFormat $1
51cryptsetup luksOpen $1 crypt1
52]]
diff --git a/Documentation/fb/gxfb.txt b/Documentation/fb/gxfb.txt
new file mode 100644
index 000000000000..2f640903bbb2
--- /dev/null
+++ b/Documentation/fb/gxfb.txt
@@ -0,0 +1,52 @@
1[This file is cloned from VesaFB/aty128fb]
2
3What is gxfb?
4=================
5
6This is a graphics framebuffer driver for AMD Geode GX2 based processors.
7
8Advantages:
9
10 * No need to use AMD's VSA code (or other VESA emulation layer) in the
11 BIOS.
12 * It provides a nice large console (128 cols + 48 lines with 1024x768)
13 without using tiny, unreadable fonts.
14 * You can run XF68_FBDev on top of /dev/fb0
15 * Most important: boot logo :-)
16
17Disadvantages:
18
19 * graphic mode is slower than text mode...
20
21
22How to use it?
23==============
24
25Switching modes is done using gxfb.mode_option=<resolution>... boot
26parameter or using `fbset' program.
27
28See Documentation/fb/modedb.txt for more information on modedb
29resolutions.
30
31
32X11
33===
34
35XF68_FBDev should generally work fine, but it is non-accelerated.
36
37
38Configuration
39=============
40
41You can pass kernel command line options to gxfb with gxfb.<option>.
42For example, gxfb.mode_option=800x600@75.
43Accepted options:
44
45mode_option - specify the video mode. Of the form
46 <x>x<y>[-<bpp>][@<refresh>]
47vram - size of video ram (normally auto-detected)
48vt_switch - enable vt switching during suspend/resume. The vt
49 switch is slow, but harmless.
50
51--
52Andres Salomon <dilinger@debian.org>
diff --git a/Documentation/fb/intelfb.txt b/Documentation/fb/intelfb.txt
index da5ee74219e8..27a3160650a4 100644
--- a/Documentation/fb/intelfb.txt
+++ b/Documentation/fb/intelfb.txt
@@ -14,6 +14,8 @@ graphics devices. These would include:
14 Intel 915GM 14 Intel 915GM
15 Intel 945G 15 Intel 945G
16 Intel 945GM 16 Intel 945GM
17 Intel 965G
18 Intel 965GM
17 19
18B. List of available options 20B. List of available options
19 21
diff --git a/Documentation/fb/lxfb.txt b/Documentation/fb/lxfb.txt
new file mode 100644
index 000000000000..38b3ca6f6ca7
--- /dev/null
+++ b/Documentation/fb/lxfb.txt
@@ -0,0 +1,52 @@
1[This file is cloned from VesaFB/aty128fb]
2
3What is lxfb?
4=================
5
6This is a graphics framebuffer driver for AMD Geode LX based processors.
7
8Advantages:
9
10 * No need to use AMD's VSA code (or other VESA emulation layer) in the
11 BIOS.
12 * It provides a nice large console (128 cols + 48 lines with 1024x768)
13 without using tiny, unreadable fonts.
14 * You can run XF68_FBDev on top of /dev/fb0
15 * Most important: boot logo :-)
16
17Disadvantages:
18
19 * graphic mode is slower than text mode...
20
21
22How to use it?
23==============
24
25Switching modes is done using lxfb.mode_option=<resolution>... boot
26parameter or using `fbset' program.
27
28See Documentation/fb/modedb.txt for more information on modedb
29resolutions.
30
31
32X11
33===
34
35XF68_FBDev should generally work fine, but it is non-accelerated.
36
37
38Configuration
39=============
40
41You can pass kernel command line options to lxfb with lxfb.<option>.
42For example, lxfb.mode_option=800x600@75.
43Accepted options:
44
45mode_option - specify the video mode. Of the form
46 <x>x<y>[-<bpp>][@<refresh>]
47vram - size of video ram (normally auto-detected)
48vt_switch - enable vt switching during suspend/resume. The vt
49 switch is slow, but harmless.
50
51--
52Andres Salomon <dilinger@debian.org>
diff --git a/Documentation/fb/metronomefb.txt b/Documentation/fb/metronomefb.txt
index b9a2e7b7e838..237ca412582d 100644
--- a/Documentation/fb/metronomefb.txt
+++ b/Documentation/fb/metronomefb.txt
@@ -1,7 +1,7 @@
1 Metronomefb 1 Metronomefb
2 ----------- 2 -----------
3Maintained by Jaya Kumar <jayakumar.lkml.gmail.com> 3Maintained by Jaya Kumar <jayakumar.lkml.gmail.com>
4Last revised: Nov 20, 2007 4Last revised: Mar 10, 2008
5 5
6Metronomefb is a driver for the Metronome display controller. The controller 6Metronomefb is a driver for the Metronome display controller. The controller
7is from E-Ink Corporation. It is intended to be used to drive the E-Ink 7is from E-Ink Corporation. It is intended to be used to drive the E-Ink
@@ -11,20 +11,18 @@ display media here http://www.e-ink.com/products/matrix/metronome.html .
11Metronome is interfaced to the host CPU through the AMLCD interface. The 11Metronome is interfaced to the host CPU through the AMLCD interface. The
12host CPU generates the control information and the image in a framebuffer 12host CPU generates the control information and the image in a framebuffer
13which is then delivered to the AMLCD interface by a host specific method. 13which is then delivered to the AMLCD interface by a host specific method.
14Currently, that's implemented for the PXA's LCDC controller. The display and 14The display and error status are each pulled through individual GPIOs.
15error status are each pulled through individual GPIOs.
16 15
17Metronomefb was written for the PXA255/gumstix/lyre combination and 16Metronomefb is platform independent and depends on a board specific driver
18therefore currently has board set specific code in it. If other boards based on 17to do all physical IO work. Currently, an example is implemented for the
19other architectures are available, then the host specific code can be separated 18PXA board used in the AM-200 EPD devkit. This example is am200epd.c
20and abstracted out.
21 19
22Metronomefb requires waveform information which is delivered via the AMLCD 20Metronomefb requires waveform information which is delivered via the AMLCD
23interface to the metronome controller. The waveform information is expected to 21interface to the metronome controller. The waveform information is expected to
24be delivered from userspace via the firmware class interface. The waveform file 22be delivered from userspace via the firmware class interface. The waveform file
25can be compressed as long as your udev or hotplug script is aware of the need 23can be compressed as long as your udev or hotplug script is aware of the need
26to uncompress it before delivering it. metronomefb will ask for waveform.wbf 24to uncompress it before delivering it. metronomefb will ask for metronome.wbf
27which would typically go into /lib/firmware/waveform.wbf depending on your 25which would typically go into /lib/firmware/metronome.wbf depending on your
28udev/hotplug setup. I have only tested with a single waveform file which was 26udev/hotplug setup. I have only tested with a single waveform file which was
29originally labeled 23P01201_60_WT0107_MTC. I do not know what it stands for. 27originally labeled 23P01201_60_WT0107_MTC. I do not know what it stands for.
30Caution should be exercised when manipulating the waveform as there may be 28Caution should be exercised when manipulating the waveform as there may be
diff --git a/Documentation/fb/modedb.txt b/Documentation/fb/modedb.txt
index 4fcdb4cf4cca..ec4dee75a354 100644
--- a/Documentation/fb/modedb.txt
+++ b/Documentation/fb/modedb.txt
@@ -125,8 +125,12 @@ There may be more modes.
125 amifb - Amiga chipset frame buffer 125 amifb - Amiga chipset frame buffer
126 aty128fb - ATI Rage128 / Pro frame buffer 126 aty128fb - ATI Rage128 / Pro frame buffer
127 atyfb - ATI Mach64 frame buffer 127 atyfb - ATI Mach64 frame buffer
128 pm2fb - Permedia 2/2V frame buffer
129 pm3fb - Permedia 3 frame buffer
130 sstfb - Voodoo 1/2 (SST1) chipset frame buffer
128 tdfxfb - 3D Fx frame buffer 131 tdfxfb - 3D Fx frame buffer
129 tridentfb - Trident (Cyber)blade chipset frame buffer 132 tridentfb - Trident (Cyber)blade chipset frame buffer
133 vt8623fb - VIA 8623 frame buffer
130 134
131BTW, only a few drivers use this at the moment. Others are to follow 135BTW, only a few drivers use this at the moment. Others are to follow
132(feel free to send patches). 136(feel free to send patches).
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 448729fcaeb1..599fe55bf297 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -128,15 +128,6 @@ Who: Arjan van de Ven <arjan@linux.intel.com>
128 128
129--------------------------- 129---------------------------
130 130
131What: vm_ops.nopage
132When: Soon, provided in-kernel callers have been converted
133Why: This interface is replaced by vm_ops.fault, but it has been around
134 forever, is used by a lot of drivers, and doesn't cost much to
135 maintain.
136Who: Nick Piggin <npiggin@suse.de>
137
138---------------------------
139
140What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment 131What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment
141When: October 2008 132When: October 2008
142Why: The stacking of class devices makes these values misleading and 133Why: The stacking of class devices makes these values misleading and
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 42d4b30b1045..c2992bc54f2f 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -511,7 +511,6 @@ prototypes:
511 void (*open)(struct vm_area_struct*); 511 void (*open)(struct vm_area_struct*);
512 void (*close)(struct vm_area_struct*); 512 void (*close)(struct vm_area_struct*);
513 int (*fault)(struct vm_area_struct*, struct vm_fault *); 513 int (*fault)(struct vm_area_struct*, struct vm_fault *);
514 struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *);
515 int (*page_mkwrite)(struct vm_area_struct *, struct page *); 514 int (*page_mkwrite)(struct vm_area_struct *, struct page *);
516 515
517locking rules: 516locking rules:
@@ -519,7 +518,6 @@ locking rules:
519open: no yes 518open: no yes
520close: no yes 519close: no yes
521fault: no yes 520fault: no yes
522nopage: no yes
523page_mkwrite: no yes no 521page_mkwrite: no yes no
524 522
525 ->page_mkwrite() is called when a previously read-only page is 523 ->page_mkwrite() is called when a previously read-only page is
@@ -537,4 +535,3 @@ NULL.
537 535
538ipc/shm.c::shm_delete() - may need BKL. 536ipc/shm.c::shm_delete() - may need BKL.
539->read() and ->write() in many drivers are (probably) missing BKL. 537->read() and ->write() in many drivers are (probably) missing BKL.
540drivers/sgi/char/graphics.c::sgi_graphics_nopage() - may need BKL.
diff --git a/Documentation/filesystems/nfs-rdma.txt b/Documentation/filesystems/nfs-rdma.txt
index 1ae34879574b..d0ec45ae4e7d 100644
--- a/Documentation/filesystems/nfs-rdma.txt
+++ b/Documentation/filesystems/nfs-rdma.txt
@@ -5,7 +5,7 @@
5################################################################################ 5################################################################################
6 6
7 Author: NetApp and Open Grid Computing 7 Author: NetApp and Open Grid Computing
8 Date: February 25, 2008 8 Date: April 15, 2008
9 9
10Table of Contents 10Table of Contents
11~~~~~~~~~~~~~~~~~ 11~~~~~~~~~~~~~~~~~
@@ -197,12 +197,16 @@ NFS/RDMA Setup
197 - On the server system, configure the /etc/exports file and 197 - On the server system, configure the /etc/exports file and
198 start the NFS/RDMA server. 198 start the NFS/RDMA server.
199 199
200 Exports entries with the following format have been tested: 200 Exports entries with the following formats have been tested:
201 201
202 /vol0 10.97.103.47(rw,async) 192.168.0.47(rw,async,insecure,no_root_squash) 202 /vol0 192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
203 /vol0 192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)
203 204
204 Here the first IP address is the client's Ethernet address and the second 205 The IP address(es) is(are) the client's IPoIB address for an InfiniBand HCA or the
205 IP address is the clients IPoIB address. 206 cleint's iWARP address(es) for an RNIC.
207
208 NOTE: The "insecure" option must be used because the NFS/RDMA client does not
209 use a reserved port.
206 210
207 Each time a machine boots: 211 Each time a machine boots:
208 212
diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt
index 7fb8e6dc62bf..b843743aa0b5 100644
--- a/Documentation/filesystems/seq_file.txt
+++ b/Documentation/filesystems/seq_file.txt
@@ -122,8 +122,7 @@ stop() is the place to free it.
122 } 122 }
123 123
124Finally, the show() function should format the object currently pointed to 124Finally, the show() function should format the object currently pointed to
125by the iterator for output. It should return zero, or an error code if 125by the iterator for output. The example module's show() function is:
126something goes wrong. The example module's show() function is:
127 126
128 static int ct_seq_show(struct seq_file *s, void *v) 127 static int ct_seq_show(struct seq_file *s, void *v)
129 { 128 {
@@ -132,6 +131,12 @@ something goes wrong. The example module's show() function is:
132 return 0; 131 return 0;
133 } 132 }
134 133
134If all is well, the show() function should return zero. A negative error
135code in the usual manner indicates that something went wrong; it will be
136passed back to user space. This function can also return SEQ_SKIP, which
137causes the current item to be skipped; if the show() function has already
138generated output before returning SEQ_SKIP, that output will be dropped.
139
135We will look at seq_printf() in a moment. But first, the definition of the 140We will look at seq_printf() in a moment. But first, the definition of the
136seq_file iterator is finished by creating a seq_operations structure with 141seq_file iterator is finished by creating a seq_operations structure with
137the four functions we have just defined: 142the four functions we have just defined:
@@ -182,12 +187,18 @@ The first two output a single character and a string, just like one would
182expect. seq_escape() is like seq_puts(), except that any character in s 187expect. seq_escape() is like seq_puts(), except that any character in s
183which is in the string esc will be represented in octal form in the output. 188which is in the string esc will be represented in octal form in the output.
184 189
185There is also a function for printing filenames: 190There is also a pair of functions for printing filenames:
186 191
187 int seq_path(struct seq_file *m, struct path *path, char *esc); 192 int seq_path(struct seq_file *m, struct path *path, char *esc);
193 int seq_path_root(struct seq_file *m, struct path *path,
194 struct path *root, char *esc)
188 195
189Here, path indicates the file of interest, and esc is a set of characters 196Here, path indicates the file of interest, and esc is a set of characters
190which should be escaped in the output. 197which should be escaped in the output. A call to seq_path() will output
198the path relative to the current process's filesystem root. If a different
199root is desired, it can be used with seq_path_root(). Note that, if it
200turns out that path cannot be reached from root, the value of root will be
201changed in seq_file_root() to a root which *does* work.
191 202
192 203
193Making it all work 204Making it all work
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
index 145e44086358..222437efd75a 100644
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -92,6 +92,18 @@ NodeList format is a comma-separated list of decimal numbers and ranges,
92a range being two hyphen-separated decimal numbers, the smallest and 92a range being two hyphen-separated decimal numbers, the smallest and
93largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15 93largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15
94 94
95NUMA memory allocation policies have optional flags that can be used in
96conjunction with their modes. These optional flags can be specified
97when tmpfs is mounted by appending them to the mode before the NodeList.
98See Documentation/vm/numa_memory_policy.txt for a list of all available
99memory allocation policy mode flags.
100
101 =static is equivalent to MPOL_F_STATIC_NODES
102 =relative is equivalent to MPOL_F_RELATIVE_NODES
103
104For example, mpol=bind=static:NodeList, is the equivalent of an
105allocation policy of MPOL_BIND | MPOL_F_STATIC_NODES.
106
95Note that trying to mount a tmpfs with an mpol option will fail if the 107Note that trying to mount a tmpfs with an mpol option will fail if the
96running kernel does not support NUMA; and will fail if its nodelist 108running kernel does not support NUMA; and will fail if its nodelist
97specifies a node which is not online. If your system relies on that 109specifies a node which is not online. If your system relies on that
diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
index fcc123ffa252..2d5e1e582e13 100644
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -17,6 +17,21 @@ dmask=### -- The permission mask for the directory.
17fmask=### -- The permission mask for files. 17fmask=### -- The permission mask for files.
18 The default is the umask of current process. 18 The default is the umask of current process.
19 19
20allow_utime=### -- This option controls the permission check of mtime/atime.
21
22 20 - If current process is in group of file's group ID,
23 you can change timestamp.
24 2 - Other users can change timestamp.
25
26 The default is set from `dmask' option. (If the directory is
27 writable, utime(2) is also allowed. I.e. ~dmask & 022)
28
29 Normally utime(2) checks current process is owner of
30 the file, or it has CAP_FOWNER capability. But FAT
31 filesystem doesn't have uid/gid on disk, so normal
32 check is too unflexible. With this option you can
33 relax it.
34
20codepage=### -- Sets the codepage number for converting to shortname 35codepage=### -- Sets the codepage number for converting to shortname
21 characters on FAT filesystem. 36 characters on FAT filesystem.
22 By default, FAT_DEFAULT_CODEPAGE setting is used. 37 By default, FAT_DEFAULT_CODEPAGE setting is used.
diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt
index 54630095aa3c..c35ca9e40d4c 100644
--- a/Documentation/gpio.txt
+++ b/Documentation/gpio.txt
@@ -107,6 +107,16 @@ type of GPIO controller, and on one particular board 80-95 with an FPGA.
107The numbers need not be contiguous; either of those platforms could also 107The numbers need not be contiguous; either of those platforms could also
108use numbers 2000-2063 to identify GPIOs in a bank of I2C GPIO expanders. 108use numbers 2000-2063 to identify GPIOs in a bank of I2C GPIO expanders.
109 109
110If you want to initialize a structure with an invalid GPIO number, use
111some negative number (perhaps "-EINVAL"); that will never be valid. To
112test if a number could reference a GPIO, you may use this predicate:
113
114 int gpio_is_valid(int number);
115
116A number that's not valid will be rejected by calls which may request
117or free GPIOs (see below). Other numbers may also be rejected; for
118example, a number might be valid but unused on a given board.
119
110Whether a platform supports multiple GPIO controllers is currently a 120Whether a platform supports multiple GPIO controllers is currently a
111platform-specific implementation issue. 121platform-specific implementation issue.
112 122
diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index 2eb16100bb3f..0fac3465f2e3 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -42,6 +42,8 @@ Protocol 2.05: (Kernel 2.6.20) Make protected mode kernel relocatable.
42Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of 42Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of
43 the boot command line 43 the boot command line
44 44
45Protocol 2.09: (kernel 2.6.26) Added a field of 64-bit physical
46 pointer to single linked list of struct setup_data.
45 47
46**** MEMORY LAYOUT 48**** MEMORY LAYOUT
47 49
@@ -172,6 +174,8 @@ Offset Proto Name Meaning
1720240/8 2.07+ hardware_subarch_data Subarchitecture-specific data 1740240/8 2.07+ hardware_subarch_data Subarchitecture-specific data
1730248/4 2.08+ payload_offset Offset of kernel payload 1750248/4 2.08+ payload_offset Offset of kernel payload
174024C/4 2.08+ payload_length Length of kernel payload 176024C/4 2.08+ payload_length Length of kernel payload
1770250/8 2.09+ setup_data 64-bit physical pointer to linked list
178 of struct setup_data
175 179
176(1) For backwards compatibility, if the setup_sects field contains 0, the 180(1) For backwards compatibility, if the setup_sects field contains 0, the
177 real value is 4. 181 real value is 4.
@@ -572,6 +576,28 @@ command line is entered using the following protocol:
572 covered by setup_move_size, so you may need to adjust this 576 covered by setup_move_size, so you may need to adjust this
573 field. 577 field.
574 578
579Field name: setup_data
580Type: write (obligatory)
581Offset/size: 0x250/8
582Protocol: 2.09+
583
584 The 64-bit physical pointer to NULL terminated single linked list of
585 struct setup_data. This is used to define a more extensible boot
586 parameters passing mechanism. The definition of struct setup_data is
587 as follow:
588
589 struct setup_data {
590 u64 next;
591 u32 type;
592 u32 len;
593 u8 data[0];
594 };
595
596 Where, the next is a 64-bit physical pointer to the next node of
597 linked list, the next field of the last node is 0; the type is used
598 to identify the contents of data; the len is the length of data
599 field; the data holds the real payload.
600
575 601
576**** MEMORY LAYOUT OF THE REAL-MODE CODE 602**** MEMORY LAYOUT OF THE REAL-MODE CODE
577 603
diff --git a/Documentation/ia64/kvm.txt b/Documentation/ia64/kvm.txt
new file mode 100644
index 000000000000..bec9d815da33
--- /dev/null
+++ b/Documentation/ia64/kvm.txt
@@ -0,0 +1,82 @@
1Currently, kvm module in EXPERIMENTAL stage on IA64. This means that
2interfaces are not stable enough to use. So, plase had better don't run
3critical applications in virtual machine. We will try our best to make it
4strong in future versions!
5 Guide: How to boot up guests on kvm/ia64
6
7This guide is to describe how to enable kvm support for IA-64 systems.
8
91. Get the kvm source from git.kernel.org.
10 Userspace source:
11 git clone git://git.kernel.org/pub/scm/virt/kvm/kvm-userspace.git
12 Kernel Source:
13 git clone git://git.kernel.org/pub/scm/linux/kernel/git/xiantao/kvm-ia64.git
14
152. Compile the source code.
16 2.1 Compile userspace code:
17 (1)cd ./kvm-userspace
18 (2)./configure
19 (3)cd kernel
20 (4)make sync LINUX= $kernel_dir (kernel_dir is the directory of kernel source.)
21 (5)cd ..
22 (6)make qemu
23 (7)cd qemu; make install
24
25 2.2 Compile kernel source code:
26 (1) cd ./$kernel_dir
27 (2) Make menuconfig
28 (3) Enter into virtualization option, and choose kvm.
29 (4) make
30 (5) Once (4) done, make modules_install
31 (6) Make initrd, and use new kernel to reboot up host machine.
32 (7) Once (6) done, cd $kernel_dir/arch/ia64/kvm
33 (8) insmod kvm.ko; insmod kvm-intel.ko
34
35Note: For step 2, please make sure that host page size == TARGET_PAGE_SIZE of qemu, otherwise, may fail.
36
373. Get Guest Firmware named as Flash.fd, and put it under right place:
38 (1) If you have the guest firmware (binary) released by Intel Corp for Xen, use it directly.
39
40 (2) If you have no firmware at hand, Please download its source from
41 hg clone http://xenbits.xensource.com/ext/efi-vfirmware.hg
42 you can get the firmware's binary in the directory of efi-vfirmware.hg/binaries.
43
44 (3) Rename the firware you owned to Flash.fd, and copy it to /usr/local/share/qemu
45
464. Boot up Linux or Windows guests:
47 4.1 Create or install a image for guest boot. If you have xen experience, it should be easy.
48
49 4.2 Boot up guests use the following command.
50 /usr/local/bin/qemu-system-ia64 -smp xx -m 512 -hda $your_image
51 (xx is the number of virtual processors for the guest, now the maximum value is 4)
52
535. Known possibile issue on some platforms with old Firmware.
54
55If meet strange host crashe issues, try to solve it through either of the following ways:
56
57(1): Upgrade your Firmware to the latest one.
58
59(2): Applying the below patch to kernel source.
60diff --git a/arch/ia64/kernel/pal.S b/arch/ia64/kernel/pal.S
61index 0b53344..f02b0f7 100644
62--- a/arch/ia64/kernel/pal.S
63+++ b/arch/ia64/kernel/pal.S
64@@ -84,7 +84,8 @@ GLOBAL_ENTRY(ia64_pal_call_static)
65 mov ar.pfs = loc1
66 mov rp = loc0
67 ;;
68- srlz.d // seralize restoration of psr.l
69+ srlz.i // seralize restoration of psr.l
70+ ;;
71 br.ret.sptk.many b0
72 END(ia64_pal_call_static)
73
746. Bug report:
75 If you found any issues when use kvm/ia64, Please post the bug info to kvm-ia64-devel mailing list.
76 https://lists.sourceforge.net/lists/listinfo/kvm-ia64-devel/
77
78Thanks for your interest! Let's work together, and make kvm/ia64 stronger and stronger!
79
80
81 Xiantao Zhang <xiantao.zhang@intel.com>
82 2008.3.10
diff --git a/Documentation/ide/ide-tape.txt b/Documentation/ide/ide-tape.txt
index 658f271a373f..3f348a0b21d8 100644
--- a/Documentation/ide/ide-tape.txt
+++ b/Documentation/ide/ide-tape.txt
@@ -1,146 +1,65 @@
1/* 1IDE ATAPI streaming tape driver.
2 * IDE ATAPI streaming tape driver. 2
3 * 3This driver is a part of the Linux ide driver.
4 * This driver is a part of the Linux ide driver. 4
5 * 5The driver, in co-operation with ide.c, basically traverses the
6 * The driver, in co-operation with ide.c, basically traverses the 6request-list for the block device interface. The character device
7 * request-list for the block device interface. The character device 7interface, on the other hand, creates new requests, adds them
8 * interface, on the other hand, creates new requests, adds them 8to the request-list of the block device, and waits for their completion.
9 * to the request-list of the block device, and waits for their completion. 9
10 * 10The block device major and minor numbers are determined from the
11 * Pipelined operation mode is now supported on both reads and writes. 11tape's relative position in the ide interfaces, as explained in ide.c.
12 * 12
13 * The block device major and minor numbers are determined from the 13The character device interface consists of the following devices:
14 * tape's relative position in the ide interfaces, as explained in ide.c. 14
15 * 15ht0 major 37, minor 0 first IDE tape, rewind on close.
16 * The character device interface consists of the following devices: 16ht1 major 37, minor 1 second IDE tape, rewind on close.
17 * 17...
18 * ht0 major 37, minor 0 first IDE tape, rewind on close. 18nht0 major 37, minor 128 first IDE tape, no rewind on close.
19 * ht1 major 37, minor 1 second IDE tape, rewind on close. 19nht1 major 37, minor 129 second IDE tape, no rewind on close.
20 * ... 20...
21 * nht0 major 37, minor 128 first IDE tape, no rewind on close. 21
22 * nht1 major 37, minor 129 second IDE tape, no rewind on close. 22The general magnetic tape commands compatible interface, as defined by
23 * ... 23include/linux/mtio.h, is accessible through the character device.
24 * 24
25 * The general magnetic tape commands compatible interface, as defined by 25General ide driver configuration options, such as the interrupt-unmask
26 * include/linux/mtio.h, is accessible through the character device. 26flag, can be configured by issuing an ioctl to the block device interface,
27 * 27as any other ide device.
28 * General ide driver configuration options, such as the interrupt-unmask 28
29 * flag, can be configured by issuing an ioctl to the block device interface, 29Our own ide-tape ioctl's can be issued to either the block device or
30 * as any other ide device. 30the character device interface.
31 * 31
32 * Our own ide-tape ioctl's can be issued to either the block device or 32Maximal throughput with minimal bus load will usually be achieved in the
33 * the character device interface. 33following scenario:
34 * 34
35 * Maximal throughput with minimal bus load will usually be achieved in the 35 1. ide-tape is operating in the pipelined operation mode.
36 * following scenario: 36 2. No buffering is performed by the user backup program.
37 * 37
38 * 1. ide-tape is operating in the pipelined operation mode. 38Testing was done with a 2 GB CONNER CTMA 4000 IDE ATAPI Streaming Tape Drive.
39 * 2. No buffering is performed by the user backup program. 39
40 * 40Here are some words from the first releases of hd.c, which are quoted
41 * Testing was done with a 2 GB CONNER CTMA 4000 IDE ATAPI Streaming Tape Drive. 41in ide.c and apply here as well:
42 * 42
43 * Here are some words from the first releases of hd.c, which are quoted 43| Special care is recommended. Have Fun!
44 * in ide.c and apply here as well: 44
45 * 45Possible improvements:
46 * | Special care is recommended. Have Fun! 46
47 * 471. Support for the ATAPI overlap protocol.
48 * 48
49 * An overview of the pipelined operation mode. 49In order to maximize bus throughput, we currently use the DSC
50 * 50overlap method which enables ide.c to service requests from the
51 * In the pipelined write mode, we will usually just add requests to our 51other device while the tape is busy executing a command. The
52 * pipeline and return immediately, before we even start to service them. The 52DSC overlap method involves polling the tape's status register
53 * user program will then have enough time to prepare the next request while 53for the DSC bit, and servicing the other device while the tape
54 * we are still busy servicing previous requests. In the pipelined read mode, 54isn't ready.
55 * the situation is similar - we add read-ahead requests into the pipeline, 55
56 * before the user even requested them. 56In the current QIC development standard (December 1995),
57 * 57it is recommended that new tape drives will *in addition*
58 * The pipeline can be viewed as a "safety net" which will be activated when 58implement the ATAPI overlap protocol, which is used for the
59 * the system load is high and prevents the user backup program from keeping up 59same purpose - efficient use of the IDE bus, but is interrupt
60 * with the current tape speed. At this point, the pipeline will get 60driven and thus has much less CPU overhead.
61 * shorter and shorter but the tape will still be streaming at the same speed. 61
62 * Assuming we have enough pipeline stages, the system load will hopefully 62ATAPI overlap is likely to be supported in most new ATAPI
63 * decrease before the pipeline is completely empty, and the backup program 63devices, including new ATAPI cdroms, and thus provides us
64 * will be able to "catch up" and refill the pipeline again. 64a method by which we can achieve higher throughput when
65 * 65sharing a (fast) ATA-2 disk with any (slow) new ATAPI device.
66 * When using the pipelined mode, it would be best to disable any type of
67 * buffering done by the user program, as ide-tape already provides all the
68 * benefits in the kernel, where it can be done in a more efficient way.
69 * As we will usually not block the user program on a request, the most
70 * efficient user code will then be a simple read-write-read-... cycle.
71 * Any additional logic will usually just slow down the backup process.
72 *
73 * Using the pipelined mode, I get a constant over 400 KBps throughput,
74 * which seems to be the maximum throughput supported by my tape.
75 *
76 * However, there are some downfalls:
77 *
78 * 1. We use memory (for data buffers) in proportional to the number
79 * of pipeline stages (each stage is about 26 KB with my tape).
80 * 2. In the pipelined write mode, we cheat and postpone error codes
81 * to the user task. In read mode, the actual tape position
82 * will be a bit further than the last requested block.
83 *
84 * Concerning (1):
85 *
86 * 1. We allocate stages dynamically only when we need them. When
87 * we don't need them, we don't consume additional memory. In
88 * case we can't allocate stages, we just manage without them
89 * (at the expense of decreased throughput) so when Linux is
90 * tight in memory, we will not pose additional difficulties.
91 *
92 * 2. The maximum number of stages (which is, in fact, the maximum
93 * amount of memory) which we allocate is limited by the compile
94 * time parameter IDETAPE_MAX_PIPELINE_STAGES.
95 *
96 * 3. The maximum number of stages is a controlled parameter - We
97 * don't start from the user defined maximum number of stages
98 * but from the lower IDETAPE_MIN_PIPELINE_STAGES (again, we
99 * will not even allocate this amount of stages if the user
100 * program can't handle the speed). We then implement a feedback
101 * loop which checks if the pipeline is empty, and if it is, we
102 * increase the maximum number of stages as necessary until we
103 * reach the optimum value which just manages to keep the tape
104 * busy with minimum allocated memory or until we reach
105 * IDETAPE_MAX_PIPELINE_STAGES.
106 *
107 * Concerning (2):
108 *
109 * In pipelined write mode, ide-tape can not return accurate error codes
110 * to the user program since we usually just add the request to the
111 * pipeline without waiting for it to be serviced. In case an error
112 * occurs, I will report it on the next user request.
113 *
114 * In the pipelined read mode, subsequent read requests or forward
115 * filemark spacing will perform correctly, as we preserve all blocks
116 * and filemarks which we encountered during our excess read-ahead.
117 *
118 * For accurate tape positioning and error reporting, disabling
119 * pipelined mode might be the best option.
120 *
121 * You can enable/disable/tune the pipelined operation mode by adjusting
122 * the compile time parameters below.
123 *
124 *
125 * Possible improvements.
126 *
127 * 1. Support for the ATAPI overlap protocol.
128 *
129 * In order to maximize bus throughput, we currently use the DSC
130 * overlap method which enables ide.c to service requests from the
131 * other device while the tape is busy executing a command. The
132 * DSC overlap method involves polling the tape's status register
133 * for the DSC bit, and servicing the other device while the tape
134 * isn't ready.
135 *
136 * In the current QIC development standard (December 1995),
137 * it is recommended that new tape drives will *in addition*
138 * implement the ATAPI overlap protocol, which is used for the
139 * same purpose - efficient use of the IDE bus, but is interrupt
140 * driven and thus has much less CPU overhead.
141 *
142 * ATAPI overlap is likely to be supported in most new ATAPI
143 * devices, including new ATAPI cdroms, and thus provides us
144 * a method by which we can achieve higher throughput when
145 * sharing a (fast) ATA-2 disk with any (slow) new ATAPI device.
146 */
diff --git a/Documentation/ide/ide.txt b/Documentation/ide/ide.txt
index 486c699f4aea..0c78f4b1d9d9 100644
--- a/Documentation/ide/ide.txt
+++ b/Documentation/ide/ide.txt
@@ -82,27 +82,26 @@ Drives are normally found by auto-probing and/or examining the CMOS/BIOS data.
82For really weird situations, the apparent (fdisk) geometry can also be specified 82For really weird situations, the apparent (fdisk) geometry can also be specified
83on the kernel "command line" using LILO. The format of such lines is: 83on the kernel "command line" using LILO. The format of such lines is:
84 84
85 hdx=cyls,heads,sects 85 ide_core.chs=[interface_number.device_number]:cyls,heads,sects
86or hdx=cdrom 86or ide_core.cdrom=[interface_number.device_number]
87 87
88where hdx can be any of hda through hdh, Three values are required 88For example:
89(cyls,heads,sects). For example:
90 89
91 hdc=1050,32,64 hdd=cdrom 90 ide_core.chs=1.0:1050,32,64 ide_core.cdrom=1.1
92 91
93either {hda,hdb} or {hdc,hdd}. The results of successful auto-probing may 92The results of successful auto-probing may override the physical geometry/irq
94override the physical geometry/irq specified, though the "original" geometry 93specified, though the "original" geometry may be retained as the "logical"
95may be retained as the "logical" geometry for partitioning purposes (fdisk). 94geometry for partitioning purposes (fdisk).
96 95
97If the auto-probing during boot time confuses a drive (ie. the drive works 96If the auto-probing during boot time confuses a drive (ie. the drive works
98with hd.c but not with ide.c), then an command line option may be specified 97with hd.c but not with ide.c), then an command line option may be specified
99for each drive for which you'd like the drive to skip the hardware 98for each drive for which you'd like the drive to skip the hardware
100probe/identification sequence. For example: 99probe/identification sequence. For example:
101 100
102 hdb=noprobe 101 ide_core.noprobe=0.1
103or 102or
104 hdc=768,16,32 103 ide_core.chs=1.0:768,16,32
105 hdc=noprobe 104 ide_core.noprobe=1.0
106 105
107Note that when only one IDE device is attached to an interface, it should be 106Note that when only one IDE device is attached to an interface, it should be
108jumpered as "single" or "master", *not* "slave". Many folks have had 107jumpered as "single" or "master", *not* "slave". Many folks have had
@@ -118,9 +117,9 @@ If for some reason your cdrom drive is *not* found at boot time, you can force
118the probe to look harder by supplying a kernel command line parameter 117the probe to look harder by supplying a kernel command line parameter
119via LILO, such as: 118via LILO, such as:
120 119
121 hdc=cdrom /* hdc = "master" on second interface */ 120 ide_core.cdrom=1.0 /* "master" on second interface (hdc) */
122or 121or
123 hdd=cdrom /* hdd = "slave" on second interface */ 122 ide_core.cdrom=1.1 /* "slave" on second interface (hdd) */
124 123
125For example, a GW2000 system might have a hard drive on the primary 124For example, a GW2000 system might have a hard drive on the primary
126interface (/dev/hda) and an IDE cdrom drive on the secondary interface 125interface (/dev/hda) and an IDE cdrom drive on the secondary interface
@@ -174,9 +173,7 @@ to /etc/modprobe.conf.
174 173
175When ide.c is used as a module, you can pass command line parameters to the 174When ide.c is used as a module, you can pass command line parameters to the
176driver using the "options=" keyword to insmod, while replacing any ',' with 175driver using the "options=" keyword to insmod, while replacing any ',' with
177';'. For example: 176';'.
178
179 insmod ide.o options="hda=nodma hdb=nodma"
180 177
181 178
182================================================================================ 179================================================================================
@@ -184,57 +181,6 @@ driver using the "options=" keyword to insmod, while replacing any ',' with
184Summary of ide driver parameters for kernel command line 181Summary of ide driver parameters for kernel command line
185-------------------------------------------------------- 182--------------------------------------------------------
186 183
187 "hdx=" is recognized for all "x" from "a" to "u", such as "hdc".
188
189 "idex=" is recognized for all "x" from "0" to "9", such as "ide1".
190
191 "hdx=noprobe" : drive may be present, but do not probe for it
192
193 "hdx=none" : drive is NOT present, ignore cmos and do not probe
194
195 "hdx=nowerr" : ignore the WRERR_STAT bit on this drive
196
197 "hdx=cdrom" : drive is present, and is a cdrom drive
198
199 "hdx=cyl,head,sect" : disk drive is present, with specified geometry
200
201 "hdx=autotune" : driver will attempt to tune interface speed
202 to the fastest PIO mode supported,
203 if possible for this drive only.
204 Not fully supported by all chipset types,
205 and quite likely to cause trouble with
206 older/odd IDE drives.
207
208 "hdx=nodma" : disallow DMA
209
210 "idebus=xx" : inform IDE driver of VESA/PCI bus speed in MHz,
211 where "xx" is between 20 and 66 inclusive,
212 used when tuning chipset PIO modes.
213 For PCI bus, 25 is correct for a P75 system,
214 30 is correct for P90,P120,P180 systems,
215 and 33 is used for P100,P133,P166 systems.
216 If in doubt, use idebus=33 for PCI.
217 As for VLB, it is safest to not specify it.
218 Bigger values are safer than smaller ones.
219
220 "idex=serialize" : do not overlap operations on idex. Please note
221 that you will have to specify this option for
222 both the respective primary and secondary channel
223 to take effect.
224
225 "idex=reset" : reset interface after probe
226
227 "idex=ata66" : informs the interface that it has an 80c cable
228 for chipsets that are ATA-66 capable, but the
229 ability to bit test for detection is currently
230 unknown.
231
232 "ide=doubler" : probe/support IDE doublers on Amiga
233
234There may be more options than shown -- use the source, Luke!
235
236Everything else is rejected with a "BAD OPTION" message.
237
238For legacy IDE VLB host drivers (ali14xx/dtc2278/ht6560b/qd65xx/umc8672) 184For legacy IDE VLB host drivers (ali14xx/dtc2278/ht6560b/qd65xx/umc8672)
239you need to explicitly enable probing by using "probe" kernel parameter, 185you need to explicitly enable probing by using "probe" kernel parameter,
240i.e. to enable probing for ALI M14xx chipsets (ali14xx host driver) use: 186i.e. to enable probing for ALI M14xx chipsets (ali14xx host driver) use:
@@ -251,6 +197,33 @@ are detected automatically).
251You also need to use "probe" kernel parameter for ide-4drives driver 197You also need to use "probe" kernel parameter for ide-4drives driver
252(support for IDE generic chipset with four drives on one port). 198(support for IDE generic chipset with four drives on one port).
253 199
200To enable support for IDE doublers on Amiga use "doubler" kernel parameter
201for gayle host driver (i.e. "gayle.doubler" if the driver is built-in).
202
203To force ignoring cable detection (this should be needed only if you're using
204short 40-wires cable which cannot be automatically detected - if this is not
205a case please report it as a bug instead) use "ignore_cable" kernel parameter:
206
207* "ide_core.ignore_cable=[interface_number]" boot option if IDE is built-in
208 (i.e. "ide_core.ignore_cable=1" to force ignoring cable for "ide1")
209
210* "ignore_cable=[interface_number]" module parameter (for ide_core module)
211 if IDE is compiled as module
212
213Other kernel parameters for ide_core are:
214
215* "nodma=[interface_number.device_number]" to disallow DMA for a device
216
217* "noflush=[interface_number.device_number]" to disable flush requests
218
219* "noprobe=[interface_number.device_number]" to skip probing
220
221* "nowerr=[interface_number.device_number]" to ignore the WRERR_STAT bit
222
223* "cdrom=[interface_number.device_number]" to force device as a CD-ROM
224
225* "chs=[interface_number.device_number]" to force device as a disk (using CHS)
226
254================================================================================ 227================================================================================
255 228
256Some Terminology 229Some Terminology
diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt
index c18363bd8d11..240ce7a56c40 100644
--- a/Documentation/ioctl-number.txt
+++ b/Documentation/ioctl-number.txt
@@ -183,6 +183,8 @@ Code Seq# Include File Comments
1830xAC 00-1F linux/raw.h 1830xAC 00-1F linux/raw.h
1840xAD 00 Netfilter device in development: 1840xAD 00 Netfilter device in development:
185 <mailto:rusty@rustcorp.com.au> 185 <mailto:rusty@rustcorp.com.au>
1860xAE all linux/kvm.h Kernel-based Virtual Machine
187 <mailto:kvm-devel@lists.sourceforge.net>
1860xB0 all RATIO devices in development: 1880xB0 all RATIO devices in development:
187 <mailto:vgo@ratio.de> 189 <mailto:vgo@ratio.de>
1880xB1 00-1F PPPoX <mailto:mostrows@styx.uwaterloo.ca> 1900xB1 00-1F PPPoX <mailto:mostrows@styx.uwaterloo.ca>
diff --git a/Documentation/kbuild/modules.txt b/Documentation/kbuild/modules.txt
index 1d247d59ad56..1821c077b435 100644
--- a/Documentation/kbuild/modules.txt
+++ b/Documentation/kbuild/modules.txt
@@ -486,7 +486,7 @@ Module.symvers contains a list of all exported symbols from a kernel build.
486 Sometimes, an external module uses exported symbols from another 486 Sometimes, an external module uses exported symbols from another
487 external module. Kbuild needs to have full knowledge on all symbols 487 external module. Kbuild needs to have full knowledge on all symbols
488 to avoid spitting out warnings about undefined symbols. 488 to avoid spitting out warnings about undefined symbols.
489 Two solutions exist to let kbuild know all symbols of more than 489 Three solutions exist to let kbuild know all symbols of more than
490 one external module. 490 one external module.
491 The method with a top-level kbuild file is recommended but may be 491 The method with a top-level kbuild file is recommended but may be
492 impractical in certain situations. 492 impractical in certain situations.
@@ -523,6 +523,13 @@ Module.symvers contains a list of all exported symbols from a kernel build.
523 containing the sum of all symbols defined and not part of the 523 containing the sum of all symbols defined and not part of the
524 kernel. 524 kernel.
525 525
526 Use make variable KBUILD_EXTRA_SYMBOLS in the Makefile
527 If it is impractical to copy Module.symvers from another
528 module, you can assign a space separated list of files to
529 KBUILD_EXTRA_SYMBOLS in your Makfile. These files will be
530 loaded by modpost during the initialisation of its symbol
531 tables.
532
526=== 8. Tips & Tricks 533=== 8. Tips & Tricks
527 534
528--- 8.1 Testing for CONFIG_FOO_BAR 535--- 8.1 Testing for CONFIG_FOO_BAR
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index bf6303ec0bde..e5f3d918316f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -772,10 +772,6 @@ and is between 256 and 4096 characters. It is defined in the file
772 Format: ide=nodma or ide=doubler 772 Format: ide=nodma or ide=doubler
773 See Documentation/ide/ide.txt. 773 See Documentation/ide/ide.txt.
774 774
775 ide?= [HW] (E)IDE subsystem
776 Format: ide?=ata66 or chipset specific parameters.
777 See Documentation/ide/ide.txt.
778
779 idebus= [HW] (E)IDE subsystem - VLB/PCI bus speed 775 idebus= [HW] (E)IDE subsystem - VLB/PCI bus speed
780 See Documentation/ide/ide.txt. 776 See Documentation/ide/ide.txt.
781 777
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index be89f393274f..6877e7187113 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -37,6 +37,11 @@ registration function such as register_kprobe() specifies where
37the probe is to be inserted and what handler is to be called when 37the probe is to be inserted and what handler is to be called when
38the probe is hit. 38the probe is hit.
39 39
40There are also register_/unregister_*probes() functions for batch
41registration/unregistration of a group of *probes. These functions
42can speed up unregistration process when you have to unregister
43a lot of probes at once.
44
40The next three subsections explain how the different types of 45The next three subsections explain how the different types of
41probes work. They explain certain things that you'll need to 46probes work. They explain certain things that you'll need to
42know in order to make the best use of Kprobes -- e.g., the 47know in order to make the best use of Kprobes -- e.g., the
@@ -190,10 +195,11 @@ code mapping.
1904. API Reference 1954. API Reference
191 196
192The Kprobes API includes a "register" function and an "unregister" 197The Kprobes API includes a "register" function and an "unregister"
193function for each type of probe. Here are terse, mini-man-page 198function for each type of probe. The API also includes "register_*probes"
194specifications for these functions and the associated probe handlers 199and "unregister_*probes" functions for (un)registering arrays of probes.
195that you'll write. See the files in the samples/kprobes/ sub-directory 200Here are terse, mini-man-page specifications for these functions and
196for examples. 201the associated probe handlers that you'll write. See the files in the
202samples/kprobes/ sub-directory for examples.
197 203
1984.1 register_kprobe 2044.1 register_kprobe
199 205
@@ -319,6 +325,43 @@ void unregister_kretprobe(struct kretprobe *rp);
319Removes the specified probe. The unregister function can be called 325Removes the specified probe. The unregister function can be called
320at any time after the probe has been registered. 326at any time after the probe has been registered.
321 327
328NOTE:
329If the functions find an incorrect probe (ex. an unregistered probe),
330they clear the addr field of the probe.
331
3324.5 register_*probes
333
334#include <linux/kprobes.h>
335int register_kprobes(struct kprobe **kps, int num);
336int register_kretprobes(struct kretprobe **rps, int num);
337int register_jprobes(struct jprobe **jps, int num);
338
339Registers each of the num probes in the specified array. If any
340error occurs during registration, all probes in the array, up to
341the bad probe, are safely unregistered before the register_*probes
342function returns.
343- kps/rps/jps: an array of pointers to *probe data structures
344- num: the number of the array entries.
345
346NOTE:
347You have to allocate(or define) an array of pointers and set all
348of the array entries before using these functions.
349
3504.6 unregister_*probes
351
352#include <linux/kprobes.h>
353void unregister_kprobes(struct kprobe **kps, int num);
354void unregister_kretprobes(struct kretprobe **rps, int num);
355void unregister_jprobes(struct jprobe **jps, int num);
356
357Removes each of the num probes in the specified array at once.
358
359NOTE:
360If the functions find some incorrect probes (ex. unregistered
361probes) in the specified array, they clear the addr field of those
362incorrect probes. However, other probes in the array are
363unregistered correctly.
364
3225. Kprobes Features and Limitations 3655. Kprobes Features and Limitations
323 366
324Kprobes allows multiple probes at the same address. Currently, 367Kprobes allows multiple probes at the same address. Currently,
diff --git a/Documentation/leds-class.txt b/Documentation/leds-class.txt
index 56757c751d6f..18860ad9935a 100644
--- a/Documentation/leds-class.txt
+++ b/Documentation/leds-class.txt
@@ -19,6 +19,12 @@ optimises away.
19 19
20Complex triggers whilst available to all LEDs have LED specific 20Complex triggers whilst available to all LEDs have LED specific
21parameters and work on a per LED basis. The timer trigger is an example. 21parameters and work on a per LED basis. The timer trigger is an example.
22The timer trigger will periodically change the LED brightness between
23LED_OFF and the current brightness setting. The "on" and "off" time can
24be specified via /sys/class/leds/<device>/delay_{on,off} in milliseconds.
25You can change the brightness value of a LED independently of the timer
26trigger. However, if you set the brightness value to LED_OFF it will
27also disable the timer trigger.
22 28
23You can change triggers in a similar manner to the way an IO scheduler 29You can change triggers in a similar manner to the way an IO scheduler
24is chosen (via /sys/class/leds/<device>/trigger). Trigger specific 30is chosen (via /sys/class/leds/<device>/trigger). Trigger specific
@@ -63,9 +69,9 @@ value if it is called with *delay_on==0 && *delay_off==0 parameters. In
63this case the driver should give back the chosen value through delay_on 69this case the driver should give back the chosen value through delay_on
64and delay_off parameters to the leds subsystem. 70and delay_off parameters to the leds subsystem.
65 71
66Any call to the brightness_set() callback function should cancel the 72Setting the brightness to zero with brightness_set() callback function
67previously programmed hardware blinking function so setting the brightness 73should completely turn off the LED and cancel the previously programmed
68to 0 can also cancel the blinking of the LED. 74hardware blinking function, if any.
69 75
70 76
71Known Issues 77Known Issues
diff --git a/Documentation/md.txt b/Documentation/md.txt
index 396cdd982c26..a8b430627473 100644
--- a/Documentation/md.txt
+++ b/Documentation/md.txt
@@ -450,3 +450,9 @@ These currently include
450 there are upper and lower limits (32768, 16). Default is 128. 450 there are upper and lower limits (32768, 16). Default is 128.
451 strip_cache_active (currently raid5 only) 451 strip_cache_active (currently raid5 only)
452 number of active entries in the stripe cache 452 number of active entries in the stripe cache
453 preread_bypass_threshold (currently raid5 only)
454 number of times a stripe requiring preread will be bypassed by
455 a stripe that does not require preread. For fairness defaults
456 to 1. Setting this to 0 disables bypass accounting and
457 requires preread stripes to wait until all full-width stripe-
458 writes are complete. Valid values are 0 to stripe_cache_size.
diff --git a/Documentation/mips/AU1xxx_IDE.README b/Documentation/mips/AU1xxx_IDE.README
index 5c8334123f4f..25a6ed1aaa5b 100644
--- a/Documentation/mips/AU1xxx_IDE.README
+++ b/Documentation/mips/AU1xxx_IDE.README
@@ -46,8 +46,6 @@ Two files are introduced:
46 46
47 a) 'include/asm-mips/mach-au1x00/au1xxx_ide.h' 47 a) 'include/asm-mips/mach-au1x00/au1xxx_ide.h'
48 containes : struct _auide_hwif 48 containes : struct _auide_hwif
49 struct drive_list_entry dma_white_list
50 struct drive_list_entry dma_black_list
51 timing parameters for PIO mode 0/1/2/3/4 49 timing parameters for PIO mode 0/1/2/3/4
52 timing parameters for MWDMA 0/1/2 50 timing parameters for MWDMA 0/1/2
53 51
@@ -63,12 +61,6 @@ Four configs variables are introduced:
63 CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ - maximum transfer size 61 CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ - maximum transfer size
64 per descriptor 62 per descriptor
65 63
66If MWDMA is enabled and the connected hard disc is not on the white list, the
67kernel switches to a "safe mwdma mode" at boot time. In this mode the IDE
68performance is substantial slower then in full speed mwdma. In this case
69please add your hard disc to the white list (follow instruction from 'ADD NEW
70HARD DISC TO WHITE OR BLACK LIST' section).
71
72 64
73SUPPORTED IDE MODES 65SUPPORTED IDE MODES
74------------------- 66-------------------
@@ -120,44 +112,6 @@ CONFIG_IDEDMA_AUTO=y
120Also undefine 'IDE_AU1XXX_BURSTMODE' in 'drivers/ide/mips/au1xxx-ide.c' to 112Also undefine 'IDE_AU1XXX_BURSTMODE' in 'drivers/ide/mips/au1xxx-ide.c' to
121disable the burst support on DBDMA controller. 113disable the burst support on DBDMA controller.
122 114
123ADD NEW HARD DISC TO WHITE OR BLACK LIST
124----------------------------------------
125
126Step 1 : detect the model name of your hard disc
127
128 a) connect your hard disc to the AU1XXX
129
130 b) boot your kernel and get the hard disc model.
131
132 Example boot log:
133
134 --snipped--
135 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
136 ide: Assuming 50MHz system bus speed for PIO modes; override with idebus=xx
137 Au1xxx IDE(builtin) configured for MWDMA2
138 Probing IDE interface ide0...
139 hda: Maxtor 6E040L0, ATA DISK drive
140 ide0 at 0xac800000-0xac800007,0xac8001c0 on irq 64
141 hda: max request size: 64KiB
142 hda: 80293248 sectors (41110 MB) w/2048KiB Cache, CHS=65535/16/63, (U)DMA
143 --snipped--
144
145 In this example 'Maxtor 6E040L0'.
146
147Step 2 : edit 'include/asm-mips/mach-au1x00/au1xxx_ide.h'
148
149 Add your hard disc to the dma_white_list or dma_black_list structur.
150
151Step 3 : Recompile the kernel
152
153 Enable MWDMA support in the kernel configuration. Recompile the kernel and
154 reboot.
155
156Step 4 : Tests
157
158 If you have add a hard disc to the white list, please run some stress tests
159 for verification.
160
161 115
162ACKNOWLEDGMENTS 116ACKNOWLEDGMENTS
163--------------- 117---------------
diff --git a/Documentation/networking/phy.txt b/Documentation/networking/phy.txt
index 0bc95eab1512..8df6a7b0e66c 100644
--- a/Documentation/networking/phy.txt
+++ b/Documentation/networking/phy.txt
@@ -1,7 +1,7 @@
1 1
2------- 2-------
3PHY Abstraction Layer 3PHY Abstraction Layer
4(Updated 2006-11-30) 4(Updated 2008-04-08)
5 5
6Purpose 6Purpose
7 7
@@ -291,3 +291,39 @@ Writing a PHY driver
291 Feel free to look at the Marvell, Cicada, and Davicom drivers in 291 Feel free to look at the Marvell, Cicada, and Davicom drivers in
292 drivers/net/phy/ for examples (the lxt and qsemi drivers have 292 drivers/net/phy/ for examples (the lxt and qsemi drivers have
293 not been tested as of this writing) 293 not been tested as of this writing)
294
295Board Fixups
296
297 Sometimes the specific interaction between the platform and the PHY requires
298 special handling. For instance, to change where the PHY's clock input is,
299 or to add a delay to account for latency issues in the data path. In order
300 to support such contingencies, the PHY Layer allows platform code to register
301 fixups to be run when the PHY is brought up (or subsequently reset).
302
303 When the PHY Layer brings up a PHY it checks to see if there are any fixups
304 registered for it, matching based on UID (contained in the PHY device's phy_id
305 field) and the bus identifier (contained in phydev->dev.bus_id). Both must
306 match, however two constants, PHY_ANY_ID and PHY_ANY_UID, are provided as
307 wildcards for the bus ID and UID, respectively.
308
309 When a match is found, the PHY layer will invoke the run function associated
310 with the fixup. This function is passed a pointer to the phy_device of
311 interest. It should therefore only operate on that PHY.
312
313 The platform code can either register the fixup using phy_register_fixup():
314
315 int phy_register_fixup(const char *phy_id,
316 u32 phy_uid, u32 phy_uid_mask,
317 int (*run)(struct phy_device *));
318
319 Or using one of the two stubs, phy_register_fixup_for_uid() and
320 phy_register_fixup_for_id():
321
322 int phy_register_fixup_for_uid(u32 phy_uid, u32 phy_uid_mask,
323 int (*run)(struct phy_device *));
324 int phy_register_fixup_for_id(const char *phy_id,
325 int (*run)(struct phy_device *));
326
327 The stubs set one of the two matching criteria, and set the other one to
328 match anything.
329
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index 4cc780024e6c..1d2a772506cf 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -2601,6 +2601,17 @@ platforms are moved over to use the flattened-device-tree model.
2601 differ between different families. May be 2601 differ between different families. May be
2602 'virtex2p', 'virtex4', or 'virtex5'. 2602 'virtex2p', 'virtex4', or 'virtex5'.
2603 2603
2604 vi) Xilinx Uart 16550
2605
2606 Xilinx UART 16550 devices are very similar to the NS16550 but with
2607 different register spacing and an offset from the base address.
2608
2609 Requred properties:
2610 - clock-frequency : Frequency of the clock input
2611 - reg-offset : A value of 3 is required
2612 - reg-shift : A value of 2 is required
2613
2614
2604 p) Freescale Synchronous Serial Interface 2615 p) Freescale Synchronous Serial Interface
2605 2616
2606 The SSI is a serial device that communicates with audio codecs. It can 2617 The SSI is a serial device that communicates with audio codecs. It can
@@ -2825,6 +2836,39 @@ platforms are moved over to use the flattened-device-tree model.
2825 big-endian; 2836 big-endian;
2826 }; 2837 };
2827 2838
2839 r) Freescale Display Interface Unit
2840
2841 The Freescale DIU is a LCD controller, with proper hardware, it can also
2842 drive DVI monitors.
2843
2844 Required properties:
2845 - compatible : should be "fsl-diu".
2846 - reg : should contain at least address and length of the DIU register
2847 set.
2848 - Interrupts : one DIU interrupt should be describe here.
2849
2850 Example (MPC8610HPCD)
2851 display@2c000 {
2852 compatible = "fsl,diu";
2853 reg = <0x2c000 100>;
2854 interrupts = <72 2>;
2855 interrupt-parent = <&mpic>;
2856 };
2857
2858 s) Freescale on board FPGA
2859
2860 This is the memory-mapped registers for on board FPGA.
2861
2862 Required properities:
2863 - compatible : should be "fsl,fpga-pixis".
2864 - reg : should contain the address and the lenght of the FPPGA register
2865 set.
2866
2867 Example (MPC8610HPCD)
2868 board-control@e8000000 {
2869 compatible = "fsl,fpga-pixis";
2870 reg = <0xe8000000 32>;
2871 };
2828 2872
2829VII - Marvell Discovery mv64[345]6x System Controller chips 2873VII - Marvell Discovery mv64[345]6x System Controller chips
2830=========================================================== 2874===========================================================
diff --git a/Documentation/powerpc/kvm_440.txt b/Documentation/powerpc/kvm_440.txt
new file mode 100644
index 000000000000..c02a003fa03a
--- /dev/null
+++ b/Documentation/powerpc/kvm_440.txt
@@ -0,0 +1,41 @@
1Hollis Blanchard <hollisb@us.ibm.com>
215 Apr 2008
3
4Various notes on the implementation of KVM for PowerPC 440:
5
6To enforce isolation, host userspace, guest kernel, and guest userspace all
7run at user privilege level. Only the host kernel runs in supervisor mode.
8Executing privileged instructions in the guest traps into KVM (in the host
9kernel), where we decode and emulate them. Through this technique, unmodified
10440 Linux kernels can be run (slowly) as guests. Future performance work will
11focus on reducing the overhead and frequency of these traps.
12
13The usual code flow is started from userspace invoking an "run" ioctl, which
14causes KVM to switch into guest context. We use IVPR to hijack the host
15interrupt vectors while running the guest, which allows us to direct all
16interrupts to kvmppc_handle_interrupt(). At this point, we could either
17- handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or
18- let the host interrupt handler run (e.g. when the decrementer fires), or
19- return to host userspace (e.g. when the guest performs device MMIO)
20
21Address spaces: We take advantage of the fact that Linux doesn't use the AS=1
22address space (in host or guest), which gives us virtual address space to use
23for guest mappings. While the guest is running, the host kernel remains mapped
24in AS=0, but the guest can only use AS=1 mappings.
25
26TLB entries: The TLB entries covering the host linear mapping remain
27present while running the guest. This reduces the overhead of lightweight
28exits, which are handled by KVM running in the host kernel. We keep three
29copies of the TLB:
30 - guest TLB: contents of the TLB as the guest sees it
31 - shadow TLB: the TLB that is actually in hardware while guest is running
32 - host TLB: to restore TLB state when context switching guest -> host
33When a TLB miss occurs because a mapping was not present in the shadow TLB,
34but was present in the guest TLB, KVM handles the fault without invoking the
35guest. Large guest pages are backed by multiple 4KB shadow pages through this
36mechanism.
37
38IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network
39and block IO, so those drivers must be enabled in the guest. It's possible
40that some qemu device emulation (e.g. e1000 or rtl8139) may also work with
41little effort.
diff --git a/Documentation/s390/kvm.txt b/Documentation/s390/kvm.txt
new file mode 100644
index 000000000000..6f5ceb0f09fc
--- /dev/null
+++ b/Documentation/s390/kvm.txt
@@ -0,0 +1,125 @@
1*** BIG FAT WARNING ***
2The kvm module is currently in EXPERIMENTAL state for s390. This means that
3the interface to the module is not yet considered to remain stable. Thus, be
4prepared that we keep breaking your userspace application and guest
5compatibility over and over again until we feel happy with the result. Make sure
6your guest kernel, your host kernel, and your userspace launcher are in a
7consistent state.
8
9This Documentation describes the unique ioctl calls to /dev/kvm, the resulting
10kvm-vm file descriptors, and the kvm-vcpu file descriptors that differ from x86.
11
121. ioctl calls to /dev/kvm
13KVM does support the following ioctls on s390 that are common with other
14architectures and do behave the same:
15KVM_GET_API_VERSION
16KVM_CREATE_VM (*) see note
17KVM_CHECK_EXTENSION
18KVM_GET_VCPU_MMAP_SIZE
19
20Notes:
21* KVM_CREATE_VM may fail on s390, if the calling process has multiple
22threads and has not called KVM_S390_ENABLE_SIE before.
23
24In addition, on s390 the following architecture specific ioctls are supported:
25ioctl: KVM_S390_ENABLE_SIE
26args: none
27see also: include/linux/kvm.h
28This call causes the kernel to switch on PGSTE in the user page table. This
29operation is needed in order to run a virtual machine, and it requires the
30calling process to be single-threaded. Note that the first call to KVM_CREATE_VM
31will implicitly try to switch on PGSTE if the user process has not called
32KVM_S390_ENABLE_SIE before. User processes that want to launch multiple threads
33before creating a virtual machine have to call KVM_S390_ENABLE_SIE, or will
34observe an error calling KVM_CREATE_VM. Switching on PGSTE is a one-time
35operation, is not reversible, and will persist over the entire lifetime of
36the calling process. It does not have any user-visible effect other than a small
37performance penalty.
38
392. ioctl calls to the kvm-vm file descriptor
40KVM does support the following ioctls on s390 that are common with other
41architectures and do behave the same:
42KVM_CREATE_VCPU
43KVM_SET_USER_MEMORY_REGION (*) see note
44KVM_GET_DIRTY_LOG (**) see note
45
46Notes:
47* kvm does only allow exactly one memory slot on s390, which has to start
48 at guest absolute address zero and at a user address that is aligned on any
49 page boundary. This hardware "limitation" allows us to have a few unique
50 optimizations. The memory slot doesn't have to be filled
51 with memory actually, it may contain sparse holes. That said, with different
52 user memory layout this does still allow a large flexibility when
53 doing the guest memory setup.
54** KVM_GET_DIRTY_LOG doesn't work properly yet. The user will receive an empty
55log. This ioctl call is only needed for guest migration, and we intend to
56implement this one in the future.
57
58In addition, on s390 the following architecture specific ioctls for the kvm-vm
59file descriptor are supported:
60ioctl: KVM_S390_INTERRUPT
61args: struct kvm_s390_interrupt *
62see also: include/linux/kvm.h
63This ioctl is used to submit a floating interrupt for a virtual machine.
64Floating interrupts may be delivered to any virtual cpu in the configuration.
65Only some interrupt types defined in include/linux/kvm.h make sense when
66submitted as floating interrupts. The following interrupts are not considered
67to be useful as floating interrupts, and a call to inject them will result in
68-EINVAL error code: program interrupts and interprocessor signals. Valid
69floating interrupts are:
70KVM_S390_INT_VIRTIO
71KVM_S390_INT_SERVICE
72
733. ioctl calls to the kvm-vcpu file descriptor
74KVM does support the following ioctls on s390 that are common with other
75architectures and do behave the same:
76KVM_RUN
77KVM_GET_REGS
78KVM_SET_REGS
79KVM_GET_SREGS
80KVM_SET_SREGS
81KVM_GET_FPU
82KVM_SET_FPU
83
84In addition, on s390 the following architecture specific ioctls for the
85kvm-vcpu file descriptor are supported:
86ioctl: KVM_S390_INTERRUPT
87args: struct kvm_s390_interrupt *
88see also: include/linux/kvm.h
89This ioctl is used to submit an interrupt for a specific virtual cpu.
90Only some interrupt types defined in include/linux/kvm.h make sense when
91submitted for a specific cpu. The following interrupts are not considered
92to be useful, and a call to inject them will result in -EINVAL error code:
93service processor calls and virtio interrupts. Valid interrupt types are:
94KVM_S390_PROGRAM_INT
95KVM_S390_SIGP_STOP
96KVM_S390_RESTART
97KVM_S390_SIGP_SET_PREFIX
98KVM_S390_INT_EMERGENCY
99
100ioctl: KVM_S390_STORE_STATUS
101args: unsigned long
102see also: include/linux/kvm.h
103This ioctl stores the state of the cpu at the guest real address given as
104argument, unless one of the following values defined in include/linux/kvm.h
105is given as arguement:
106KVM_S390_STORE_STATUS_NOADDR - the CPU stores its status to the save area in
107absolute lowcore as defined by the principles of operation
108KVM_S390_STORE_STATUS_PREFIXED - the CPU stores its status to the save area in
109its prefix page just like the dump tool that comes with zipl. This is useful
110to create a system dump for use with lkcdutils or crash.
111
112ioctl: KVM_S390_SET_INITIAL_PSW
113args: struct kvm_s390_psw *
114see also: include/linux/kvm.h
115This ioctl can be used to set the processor status word (psw) of a stopped cpu
116prior to running it with KVM_RUN. Note that this call is not required to modify
117the psw during sie intercepts that fall back to userspace because struct kvm_run
118does contain the psw, and this value is evaluated during reentry of KVM_RUN
119after the intercept exit was recognized.
120
121ioctl: KVM_S390_INITIAL_RESET
122args: none
123see also: include/linux/kvm.h
124This ioctl can be used to perform an initial cpu reset as defined by the
125principles of operation. The target cpu has to be in stopped state.
diff --git a/Documentation/smart-config.txt b/Documentation/smart-config.txt
deleted file mode 100644
index 8467447b5a87..000000000000
--- a/Documentation/smart-config.txt
+++ /dev/null
@@ -1,98 +0,0 @@
1Smart CONFIG_* Dependencies
21 August 1999
3
4Michael Chastain <mec@shout.net>
5Werner Almesberger <almesber@lrc.di.epfl.ch>
6Martin von Loewis <martin@mira.isdn.cs.tu-berlin.de>
7
8Here is the problem:
9
10 Suppose that drivers/net/foo.c has the following lines:
11
12 #include <linux/config.h>
13
14 ...
15
16 #ifdef CONFIG_FOO_AUTOFROB
17 /* Code for auto-frobbing */
18 #else
19 /* Manual frobbing only */
20 #endif
21
22 ...
23
24 #ifdef CONFIG_FOO_MODEL_TWO
25 /* Code for model two */
26 #endif
27
28 Now suppose the user (the person building kernels) reconfigures the
29 kernel to change some unrelated setting. This will regenerate the
30 file include/linux/autoconf.h, which will cause include/linux/config.h
31 to be out of date, which will cause drivers/net/foo.c to be recompiled.
32
33 Most kernel sources, perhaps 80% of them, have at least one CONFIG_*
34 dependency somewhere. So changing _any_ CONFIG_* setting requires
35 almost _all_ of the kernel to be recompiled.
36
37Here is the solution:
38
39 We've made the dependency generator, mkdep.c, smarter. Instead of
40 generating this dependency:
41
42 drivers/net/foo.c: include/linux/config.h
43
44 It now generates these dependencies:
45
46 drivers/net/foo.c: \
47 include/config/foo/autofrob.h \
48 include/config/foo/model/two.h
49
50 So drivers/net/foo.c depends only on the CONFIG_* lines that
51 it actually uses.
52
53 A new program, split-include.c, runs at the beginning of
54 compilation (make bzImage or make zImage). split-include reads
55 include/linux/autoconf.h and updates the include/config/ tree,
56 writing one file per option. It updates only the files for options
57 that have changed.
58
59Flag Dependencies
60
61 Martin Von Loewis contributed another feature to this patch:
62 'flag dependencies'. The idea is that a .o file depends on
63 the compilation flags used to build it. The file foo.o has
64 its flags stored in .flags.foo.o.
65
66 Suppose the user changes the foo driver from resident to modular.
67 'make' will notice that the current foo.o was not compiled with
68 -DMODULE and will recompile foo.c.
69
70 All .o files made from C source have flag dependencies. So do .o
71 files made with ld, and .a files made with ar. However, .o files
72 made from assembly source do not have flag dependencies (nobody
73 needs this yet, but it would be good to fix).
74
75Per-source-file Flags
76
77 Flag dependencies also work with per-source-file flags.
78 You can specify compilation flags for individual source files
79 like this:
80
81 CFLAGS_foo.o = -DSPECIAL_FOO_DEFINE
82
83 This helps clean up drivers/net/Makefile, drivers/scsi/Makefile,
84 and several other Makefiles.
85
86Credit
87
88 Werner Almesberger had the original idea and wrote the first
89 version of this patch.
90
91 Michael Chastain picked it up and continued development. He is
92 now the principal author and maintainer. Please report any bugs
93 to him.
94
95 Martin von Loewis wrote flag dependencies, with some modifications
96 by Michael Chastain.
97
98 Thanks to all of the beta testers.
diff --git a/Documentation/spi/spidev b/Documentation/spi/spidev
index 5c8e1b988a08..ed2da5e5b28a 100644
--- a/Documentation/spi/spidev
+++ b/Documentation/spi/spidev
@@ -126,8 +126,8 @@ NOTES:
126FULL DUPLEX CHARACTER DEVICE API 126FULL DUPLEX CHARACTER DEVICE API
127================================ 127================================
128 128
129See the sample program below for one example showing the use of the full 129See the spidev_fdx.c sample program for one example showing the use of the
130duplex programming interface. (Although it doesn't perform a full duplex 130full duplex programming interface. (Although it doesn't perform a full duplex
131transfer.) The model is the same as that used in the kernel spi_sync() 131transfer.) The model is the same as that used in the kernel spi_sync()
132request; the individual transfers offer the same capabilities as are 132request; the individual transfers offer the same capabilities as are
133available to kernel drivers (except that it's not asynchronous). 133available to kernel drivers (except that it's not asynchronous).
@@ -141,167 +141,3 @@ and bitrate for each transfer segment.)
141 141
142To make a full duplex request, provide both rx_buf and tx_buf for the 142To make a full duplex request, provide both rx_buf and tx_buf for the
143same transfer. It's even OK if those are the same buffer. 143same transfer. It's even OK if those are the same buffer.
144
145
146SAMPLE PROGRAM
147==============
148
149-------------------------------- CUT HERE
150#include <stdio.h>
151#include <unistd.h>
152#include <stdlib.h>
153#include <fcntl.h>
154#include <string.h>
155
156#include <sys/ioctl.h>
157#include <sys/types.h>
158#include <sys/stat.h>
159
160#include <linux/types.h>
161#include <linux/spi/spidev.h>
162
163
164static int verbose;
165
166static void do_read(int fd, int len)
167{
168 unsigned char buf[32], *bp;
169 int status;
170
171 /* read at least 2 bytes, no more than 32 */
172 if (len < 2)
173 len = 2;
174 else if (len > sizeof(buf))
175 len = sizeof(buf);
176 memset(buf, 0, sizeof buf);
177
178 status = read(fd, buf, len);
179 if (status < 0) {
180 perror("read");
181 return;
182 }
183 if (status != len) {
184 fprintf(stderr, "short read\n");
185 return;
186 }
187
188 printf("read(%2d, %2d): %02x %02x,", len, status,
189 buf[0], buf[1]);
190 status -= 2;
191 bp = buf + 2;
192 while (status-- > 0)
193 printf(" %02x", *bp++);
194 printf("\n");
195}
196
197static void do_msg(int fd, int len)
198{
199 struct spi_ioc_transfer xfer[2];
200 unsigned char buf[32], *bp;
201 int status;
202
203 memset(xfer, 0, sizeof xfer);
204 memset(buf, 0, sizeof buf);
205
206 if (len > sizeof buf)
207 len = sizeof buf;
208
209 buf[0] = 0xaa;
210 xfer[0].tx_buf = (__u64) buf;
211 xfer[0].len = 1;
212
213 xfer[1].rx_buf = (__u64) buf;
214 xfer[1].len = len;
215
216 status = ioctl(fd, SPI_IOC_MESSAGE(2), xfer);
217 if (status < 0) {
218 perror("SPI_IOC_MESSAGE");
219 return;
220 }
221
222 printf("response(%2d, %2d): ", len, status);
223 for (bp = buf; len; len--)
224 printf(" %02x", *bp++);
225 printf("\n");
226}
227
228static void dumpstat(const char *name, int fd)
229{
230 __u8 mode, lsb, bits;
231 __u32 speed;
232
233 if (ioctl(fd, SPI_IOC_RD_MODE, &mode) < 0) {
234 perror("SPI rd_mode");
235 return;
236 }
237 if (ioctl(fd, SPI_IOC_RD_LSB_FIRST, &lsb) < 0) {
238 perror("SPI rd_lsb_fist");
239 return;
240 }
241 if (ioctl(fd, SPI_IOC_RD_BITS_PER_WORD, &bits) < 0) {
242 perror("SPI bits_per_word");
243 return;
244 }
245 if (ioctl(fd, SPI_IOC_RD_MAX_SPEED_HZ, &speed) < 0) {
246 perror("SPI max_speed_hz");
247 return;
248 }
249
250 printf("%s: spi mode %d, %d bits %sper word, %d Hz max\n",
251 name, mode, bits, lsb ? "(lsb first) " : "", speed);
252}
253
254int main(int argc, char **argv)
255{
256 int c;
257 int readcount = 0;
258 int msglen = 0;
259 int fd;
260 const char *name;
261
262 while ((c = getopt(argc, argv, "hm:r:v")) != EOF) {
263 switch (c) {
264 case 'm':
265 msglen = atoi(optarg);
266 if (msglen < 0)
267 goto usage;
268 continue;
269 case 'r':
270 readcount = atoi(optarg);
271 if (readcount < 0)
272 goto usage;
273 continue;
274 case 'v':
275 verbose++;
276 continue;
277 case 'h':
278 case '?':
279usage:
280 fprintf(stderr,
281 "usage: %s [-h] [-m N] [-r N] /dev/spidevB.D\n",
282 argv[0]);
283 return 1;
284 }
285 }
286
287 if ((optind + 1) != argc)
288 goto usage;
289 name = argv[optind];
290
291 fd = open(name, O_RDWR);
292 if (fd < 0) {
293 perror("open");
294 return 1;
295 }
296
297 dumpstat(name, fd);
298
299 if (msglen)
300 do_msg(fd, msglen);
301
302 if (readcount)
303 do_read(fd, readcount);
304
305 close(fd);
306 return 0;
307}
diff --git a/Documentation/spi/spidev_fdx.c b/Documentation/spi/spidev_fdx.c
new file mode 100644
index 000000000000..fc354f760384
--- /dev/null
+++ b/Documentation/spi/spidev_fdx.c
@@ -0,0 +1,158 @@
1#include <stdio.h>
2#include <unistd.h>
3#include <stdlib.h>
4#include <fcntl.h>
5#include <string.h>
6
7#include <sys/ioctl.h>
8#include <sys/types.h>
9#include <sys/stat.h>
10
11#include <linux/types.h>
12#include <linux/spi/spidev.h>
13
14
15static int verbose;
16
17static void do_read(int fd, int len)
18{
19 unsigned char buf[32], *bp;
20 int status;
21
22 /* read at least 2 bytes, no more than 32 */
23 if (len < 2)
24 len = 2;
25 else if (len > sizeof(buf))
26 len = sizeof(buf);
27 memset(buf, 0, sizeof buf);
28
29 status = read(fd, buf, len);
30 if (status < 0) {
31 perror("read");
32 return;
33 }
34 if (status != len) {
35 fprintf(stderr, "short read\n");
36 return;
37 }
38
39 printf("read(%2d, %2d): %02x %02x,", len, status,
40 buf[0], buf[1]);
41 status -= 2;
42 bp = buf + 2;
43 while (status-- > 0)
44 printf(" %02x", *bp++);
45 printf("\n");
46}
47
48static void do_msg(int fd, int len)
49{
50 struct spi_ioc_transfer xfer[2];
51 unsigned char buf[32], *bp;
52 int status;
53
54 memset(xfer, 0, sizeof xfer);
55 memset(buf, 0, sizeof buf);
56
57 if (len > sizeof buf)
58 len = sizeof buf;
59
60 buf[0] = 0xaa;
61 xfer[0].tx_buf = (__u64) buf;
62 xfer[0].len = 1;
63
64 xfer[1].rx_buf = (__u64) buf;
65 xfer[1].len = len;
66
67 status = ioctl(fd, SPI_IOC_MESSAGE(2), xfer);
68 if (status < 0) {
69 perror("SPI_IOC_MESSAGE");
70 return;
71 }
72
73 printf("response(%2d, %2d): ", len, status);
74 for (bp = buf; len; len--)
75 printf(" %02x", *bp++);
76 printf("\n");
77}
78
79static void dumpstat(const char *name, int fd)
80{
81 __u8 mode, lsb, bits;
82 __u32 speed;
83
84 if (ioctl(fd, SPI_IOC_RD_MODE, &mode) < 0) {
85 perror("SPI rd_mode");
86 return;
87 }
88 if (ioctl(fd, SPI_IOC_RD_LSB_FIRST, &lsb) < 0) {
89 perror("SPI rd_lsb_fist");
90 return;
91 }
92 if (ioctl(fd, SPI_IOC_RD_BITS_PER_WORD, &bits) < 0) {
93 perror("SPI bits_per_word");
94 return;
95 }
96 if (ioctl(fd, SPI_IOC_RD_MAX_SPEED_HZ, &speed) < 0) {
97 perror("SPI max_speed_hz");
98 return;
99 }
100
101 printf("%s: spi mode %d, %d bits %sper word, %d Hz max\n",
102 name, mode, bits, lsb ? "(lsb first) " : "", speed);
103}
104
105int main(int argc, char **argv)
106{
107 int c;
108 int readcount = 0;
109 int msglen = 0;
110 int fd;
111 const char *name;
112
113 while ((c = getopt(argc, argv, "hm:r:v")) != EOF) {
114 switch (c) {
115 case 'm':
116 msglen = atoi(optarg);
117 if (msglen < 0)
118 goto usage;
119 continue;
120 case 'r':
121 readcount = atoi(optarg);
122 if (readcount < 0)
123 goto usage;
124 continue;
125 case 'v':
126 verbose++;
127 continue;
128 case 'h':
129 case '?':
130usage:
131 fprintf(stderr,
132 "usage: %s [-h] [-m N] [-r N] /dev/spidevB.D\n",
133 argv[0]);
134 return 1;
135 }
136 }
137
138 if ((optind + 1) != argc)
139 goto usage;
140 name = argv[optind];
141
142 fd = open(name, O_RDWR);
143 if (fd < 0) {
144 perror("open");
145 return 1;
146 }
147
148 dumpstat(name, fd);
149
150 if (msglen)
151 do_msg(fd, msglen);
152
153 if (readcount)
154 do_read(fd, readcount);
155
156 close(fd);
157 return 0;
158}
diff --git a/Documentation/usb/anchors.txt b/Documentation/usb/anchors.txt
new file mode 100644
index 000000000000..7304bcf5a306
--- /dev/null
+++ b/Documentation/usb/anchors.txt
@@ -0,0 +1,50 @@
1What is anchor?
2===============
3
4A USB driver needs to support some callbacks requiring
5a driver to cease all IO to an interface. To do so, a
6driver has to keep track of the URBs it has submitted
7to know they've all completed or to call usb_kill_urb
8for them. The anchor is a data structure takes care of
9keeping track of URBs and provides methods to deal with
10multiple URBs.
11
12Allocation and Initialisation
13=============================
14
15There's no API to allocate an anchor. It is simply declared
16as struct usb_anchor. init_usb_anchor() must be called to
17initialise the data structure.
18
19Deallocation
20============
21
22Once it has no more URBs associated with it, the anchor can be
23freed with normal memory management operations.
24
25Association and disassociation of URBs with anchors
26===================================================
27
28An association of URBs to an anchor is made by an explicit
29call to usb_anchor_urb(). The association is maintained until
30an URB is finished by (successfull) completion. Thus disassociation
31is automatic. A function is provided to forcibly finish (kill)
32all URBs associated with an anchor.
33Furthermore, disassociation can be made with usb_unanchor_urb()
34
35Operations on multitudes of URBs
36================================
37
38usb_kill_anchored_urbs()
39------------------------
40
41This function kills all URBs associated with an anchor. The URBs
42are called in the reverse temporal order they were submitted.
43This way no data can be reordered.
44
45usb_wait_anchor_empty_timeout()
46-------------------------------
47
48This function waits for all URBs associated with an anchor to finish
49or a timeout, whichever comes first. Its return value will tell you
50whether the timeout was reached.
diff --git a/Documentation/usb/callbacks.txt b/Documentation/usb/callbacks.txt
new file mode 100644
index 000000000000..7c812411945b
--- /dev/null
+++ b/Documentation/usb/callbacks.txt
@@ -0,0 +1,132 @@
1What callbacks will usbcore do?
2===============================
3
4Usbcore will call into a driver through callbacks defined in the driver
5structure and through the completion handler of URBs a driver submits.
6Only the former are in the scope of this document. These two kinds of
7callbacks are completely independent of each other. Information on the
8completion callback can be found in Documentation/usb/URB.txt.
9
10The callbacks defined in the driver structure are:
11
121. Hotplugging callbacks:
13
14 * @probe: Called to see if the driver is willing to manage a particular
15 * interface on a device.
16 * @disconnect: Called when the interface is no longer accessible, usually
17 * because its device has been (or is being) disconnected or the
18 * driver module is being unloaded.
19
202. Odd backdoor through usbfs:
21
22 * @ioctl: Used for drivers that want to talk to userspace through
23 * the "usbfs" filesystem. This lets devices provide ways to
24 * expose information to user space regardless of where they
25 * do (or don't) show up otherwise in the filesystem.
26
273. Power management (PM) callbacks:
28
29 * @suspend: Called when the device is going to be suspended.
30 * @resume: Called when the device is being resumed.
31 * @reset_resume: Called when the suspended device has been reset instead
32 * of being resumed.
33
344. Device level operations:
35
36 * @pre_reset: Called when the device is about to be reset.
37 * @post_reset: Called after the device has been reset
38
39The ioctl interface (2) should be used only if you have a very good
40reason. Sysfs is preferred these days. The PM callbacks are covered
41separately in Documentation/usb/power-management.txt.
42
43Calling conventions
44===================
45
46All callbacks are mutually exclusive. There's no need for locking
47against other USB callbacks. All callbacks are called from a task
48context. You may sleep. However, it is important that all sleeps have a
49small fixed upper limit in time. In particular you must not call out to
50user space and await results.
51
52Hotplugging callbacks
53=====================
54
55These callbacks are intended to associate and disassociate a driver with
56an interface. A driver's bond to an interface is exclusive.
57
58The probe() callback
59--------------------
60
61int (*probe) (struct usb_interface *intf,
62 const struct usb_device_id *id);
63
64Accept or decline an interface. If you accept the device return 0,
65otherwise -ENODEV or -ENXIO. Other error codes should be used only if a
66genuine error occurred during initialisation which prevented a driver
67from accepting a device that would else have been accepted.
68You are strongly encouraged to use usbcore'sfacility,
69usb_set_intfdata(), to associate a data structure with an interface, so
70that you know which internal state and identity you associate with a
71particular interface. The device will not be suspended and you may do IO
72to the interface you are called for and endpoint 0 of the device. Device
73initialisation that doesn't take too long is a good idea here.
74
75The disconnect() callback
76-------------------------
77
78void (*disconnect) (struct usb_interface *intf);
79
80This callback is a signal to break any connection with an interface.
81You are not allowed any IO to a device after returning from this
82callback. You also may not do any other operation that may interfere
83with another driver bound the interface, eg. a power management
84operation.
85If you are called due to a physical disconnection, all your URBs will be
86killed by usbcore. Note that in this case disconnect will be called some
87time after the physical disconnection. Thus your driver must be prepared
88to deal with failing IO even prior to the callback.
89
90Device level callbacks
91======================
92
93pre_reset
94---------
95
96int (*pre_reset)(struct usb_interface *intf);
97
98Another driver or user space is triggering a reset on the device which
99contains the interface passed as an argument. Cease IO and save any
100device state you need to restore.
101
102If you need to allocate memory here, use GFP_NOIO or GFP_ATOMIC, if you
103are in atomic context.
104
105post_reset
106----------
107
108int (*post_reset)(struct usb_interface *intf);
109
110The reset has completed. Restore any saved device state and begin
111using the device again.
112
113If you need to allocate memory here, use GFP_NOIO or GFP_ATOMIC, if you
114are in atomic context.
115
116Call sequences
117==============
118
119No callbacks other than probe will be invoked for an interface
120that isn't bound to your driver.
121
122Probe will never be called for an interface bound to a driver.
123Hence following a successful probe, disconnect will be called
124before there is another probe for the same interface.
125
126Once your driver is bound to an interface, disconnect can be
127called at any time except in between pre_reset and post_reset.
128pre_reset is always followed by post_reset, even if the reset
129failed or the device has been unplugged.
130
131suspend is always followed by one of: resume, reset_resume, or
132disconnect.
diff --git a/Documentation/usb/persist.txt b/Documentation/usb/persist.txt
index df54d645cbb5..d56cb1a11550 100644
--- a/Documentation/usb/persist.txt
+++ b/Documentation/usb/persist.txt
@@ -2,7 +2,7 @@
2 2
3 Alan Stern <stern@rowland.harvard.edu> 3 Alan Stern <stern@rowland.harvard.edu>
4 4
5 September 2, 2006 (Updated May 29, 2007) 5 September 2, 2006 (Updated February 25, 2008)
6 6
7 7
8 What is the problem? 8 What is the problem?
@@ -65,9 +65,10 @@ much better.)
65 65
66 What is the solution? 66 What is the solution?
67 67
68Setting CONFIG_USB_PERSIST will cause the kernel to work around these 68The kernel includes a feature called USB-persist. It tries to work
69issues. It enables a mode in which the core USB device data 69around these issues by allowing the core USB device data structures to
70structures are allowed to persist across a power-session disruption. 70persist across a power-session disruption.
71
71It works like this. If the kernel sees that a USB host controller is 72It works like this. If the kernel sees that a USB host controller is
72not in the expected state during resume (i.e., if the controller was 73not in the expected state during resume (i.e., if the controller was
73reset or otherwise had lost power) then it applies a persistence check 74reset or otherwise had lost power) then it applies a persistence check
@@ -80,28 +81,30 @@ re-enumeration shows that the device now attached to that port has the
80same descriptors as before, including the Vendor and Product IDs, then 81same descriptors as before, including the Vendor and Product IDs, then
81the kernel continues to use the same device structure. In effect, the 82the kernel continues to use the same device structure. In effect, the
82kernel treats the device as though it had merely been reset instead of 83kernel treats the device as though it had merely been reset instead of
83unplugged. 84unplugged. The same thing happens if the host controller is in the
85expected state but a USB device was unplugged and then replugged.
84 86
85If no device is now attached to the port, or if the descriptors are 87If no device is now attached to the port, or if the descriptors are
86different from what the kernel remembers, then the treatment is what 88different from what the kernel remembers, then the treatment is what
87you would expect. The kernel destroys the old device structure and 89you would expect. The kernel destroys the old device structure and
88behaves as though the old device had been unplugged and a new device 90behaves as though the old device had been unplugged and a new device
89plugged in, just as it would without the CONFIG_USB_PERSIST option. 91plugged in.
90 92
91The end result is that the USB device remains available and usable. 93The end result is that the USB device remains available and usable.
92Filesystem mounts and memory mappings are unaffected, and the world is 94Filesystem mounts and memory mappings are unaffected, and the world is
93now a good and happy place. 95now a good and happy place.
94 96
95Note that even when CONFIG_USB_PERSIST is set, the "persist" feature 97Note that the "USB-persist" feature will be applied only to those
96will be applied only to those devices for which it is enabled. You 98devices for which it is enabled. You can enable the feature by doing
97can enable the feature by doing (as root): 99(as root):
98 100
99 echo 1 >/sys/bus/usb/devices/.../power/persist 101 echo 1 >/sys/bus/usb/devices/.../power/persist
100 102
101where the "..." should be filled in the with the device's ID. Disable 103where the "..." should be filled in the with the device's ID. Disable
102the feature by writing 0 instead of 1. For hubs the feature is 104the feature by writing 0 instead of 1. For hubs the feature is
103automatically and permanently enabled, so you only have to worry about 105automatically and permanently enabled and the power/persist file
104setting it for devices where it really matters. 106doesn't even exist, so you only have to worry about setting it for
107devices where it really matters.
105 108
106 109
107 Is this the best solution? 110 Is this the best solution?
@@ -112,19 +115,19 @@ centralized Logical Volume Manager. Such a solution would allow you
112to plug in a USB flash device, create a persistent volume associated 115to plug in a USB flash device, create a persistent volume associated
113with it, unplug the flash device, plug it back in later, and still 116with it, unplug the flash device, plug it back in later, and still
114have the same persistent volume associated with the device. As such 117have the same persistent volume associated with the device. As such
115it would be more far-reaching than CONFIG_USB_PERSIST. 118it would be more far-reaching than USB-persist.
116 119
117On the other hand, writing a persistent volume manager would be a big 120On the other hand, writing a persistent volume manager would be a big
118job and using it would require significant input from the user. This 121job and using it would require significant input from the user. This
119solution is much quicker and easier -- and it exists now, a giant 122solution is much quicker and easier -- and it exists now, a giant
120point in its favor! 123point in its favor!
121 124
122Furthermore, the USB_PERSIST option applies to _all_ USB devices, not 125Furthermore, the USB-persist feature applies to _all_ USB devices, not
123just mass-storage devices. It might turn out to be equally useful for 126just mass-storage devices. It might turn out to be equally useful for
124other device types, such as network interfaces. 127other device types, such as network interfaces.
125 128
126 129
127 WARNING: Using CONFIG_USB_PERSIST can be dangerous!! 130 WARNING: USB-persist can be dangerous!!
128 131
129When recovering an interrupted power session the kernel does its best 132When recovering an interrupted power session the kernel does its best
130to make sure the USB device hasn't been changed; that is, the same 133to make sure the USB device hasn't been changed; that is, the same
@@ -133,10 +136,10 @@ aren't guaranteed to be 100% accurate.
133 136
134If you replace one USB device with another of the same type (same 137If you replace one USB device with another of the same type (same
135manufacturer, same IDs, and so on) there's an excellent chance the 138manufacturer, same IDs, and so on) there's an excellent chance the
136kernel won't detect the change. Serial numbers and other strings are 139kernel won't detect the change. The serial number string and other
137not compared. In many cases it wouldn't help if they were, because 140descriptors are compared with the kernel's stored values, but this
138manufacturers frequently omit serial numbers entirely in their 141might not help since manufacturers frequently omit serial numbers
139devices. 142entirely in their devices.
140 143
141Furthermore it's quite possible to leave a USB device exactly the same 144Furthermore it's quite possible to leave a USB device exactly the same
142while changing its media. If you replace the flash memory card in a 145while changing its media. If you replace the flash memory card in a
@@ -152,5 +155,5 @@ but yourself.
152YOU HAVE BEEN WARNED! USE AT YOUR OWN RISK! 155YOU HAVE BEEN WARNED! USE AT YOUR OWN RISK!
153 156
154That having been said, most of the time there shouldn't be any trouble 157That having been said, most of the time there shouldn't be any trouble
155at all. The "persist" feature can be extremely useful. Make the most 158at all. The USB-persist feature can be extremely useful. Make the
156of it. 159most of it.
diff --git a/Documentation/usb/usb-serial.txt b/Documentation/usb/usb-serial.txt
index 8b077e43eee7..ff2c1ff57ba2 100644
--- a/Documentation/usb/usb-serial.txt
+++ b/Documentation/usb/usb-serial.txt
@@ -192,12 +192,9 @@ Keyspan USA-series Serial Adapters
192 192
193FTDI Single Port Serial Driver 193FTDI Single Port Serial Driver
194 194
195 This is a single port DB-25 serial adapter. More information about this 195 This is a single port DB-25 serial adapter.
196 device and the Linux driver can be found at:
197 http://reality.sgi.com/bryder_wellington/ftdi_sio/
198 196
199 For any questions or problems with this driver, please contact Bill Ryder 197 For any questions or problems with this driver, please contact Bill Ryder.
200 at bryder@sgi.com
201 198
202 199
203ZyXEL omni.net lcd plus ISDN TA 200ZyXEL omni.net lcd plus ISDN TA
diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt
index dd4986497996..bad16d3f6a47 100644
--- a/Documentation/vm/numa_memory_policy.txt
+++ b/Documentation/vm/numa_memory_policy.txt
@@ -135,77 +135,58 @@ most general to most specific:
135 135
136Components of Memory Policies 136Components of Memory Policies
137 137
138 A Linux memory policy is a tuple consisting of a "mode" and an optional set 138 A Linux memory policy consists of a "mode", optional mode flags, and an
139 of nodes. The mode determine the behavior of the policy, while the 139 optional set of nodes. The mode determines the behavior of the policy,
140 optional set of nodes can be viewed as the arguments to the behavior. 140 the optional mode flags determine the behavior of the mode, and the
141 optional set of nodes can be viewed as the arguments to the policy
142 behavior.
141 143
142 Internally, memory policies are implemented by a reference counted 144 Internally, memory policies are implemented by a reference counted
143 structure, struct mempolicy. Details of this structure will be discussed 145 structure, struct mempolicy. Details of this structure will be discussed
144 in context, below, as required to explain the behavior. 146 in context, below, as required to explain the behavior.
145 147
146 Note: in some functions AND in the struct mempolicy itself, the mode
147 is called "policy". However, to avoid confusion with the policy tuple,
148 this document will continue to use the term "mode".
149
150 Linux memory policy supports the following 4 behavioral modes: 148 Linux memory policy supports the following 4 behavioral modes:
151 149
152 Default Mode--MPOL_DEFAULT: The behavior specified by this mode is 150 Default Mode--MPOL_DEFAULT: This mode is only used in the memory
153 context or scope dependent. 151 policy APIs. Internally, MPOL_DEFAULT is converted to the NULL
154 152 memory policy in all policy scopes. Any existing non-default policy
155 As mentioned in the Policy Scope section above, during normal 153 will simply be removed when MPOL_DEFAULT is specified. As a result,
156 system operation, the System Default Policy is hard coded to 154 MPOL_DEFAULT means "fall back to the next most specific policy scope."
157 contain the Default mode.
158
159 In this context, default mode means "local" allocation--that is
160 attempt to allocate the page from the node associated with the cpu
161 where the fault occurs. If the "local" node has no memory, or the
162 node's memory can be exhausted [no free pages available], local
163 allocation will "fallback to"--attempt to allocate pages from--
164 "nearby" nodes, in order of increasing "distance".
165 155
166 Implementation detail -- subject to change: "Fallback" uses 156 For example, a NULL or default task policy will fall back to the
167 a per node list of sibling nodes--called zonelists--built at 157 system default policy. A NULL or default vma policy will fall
168 boot time, or when nodes or memory are added or removed from 158 back to the task policy.
169 the system [memory hotplug]. These per node zonelist are
170 constructed with nodes in order of increasing distance based
171 on information provided by the platform firmware.
172 159
173 When a task/process policy or a shared policy contains the Default 160 When specified in one of the memory policy APIs, the Default mode
174 mode, this also means "local allocation", as described above. 161 does not use the optional set of nodes.
175 162
176 In the context of a VMA, Default mode means "fall back to task 163 It is an error for the set of nodes specified for this policy to
177 policy"--which may or may not specify Default mode. Thus, Default 164 be non-empty.
178 mode can not be counted on to mean local allocation when used
179 on a non-shared region of the address space. However, see
180 MPOL_PREFERRED below.
181
182 The Default mode does not use the optional set of nodes.
183 165
184 MPOL_BIND: This mode specifies that memory must come from the 166 MPOL_BIND: This mode specifies that memory must come from the
185 set of nodes specified by the policy. 167 set of nodes specified by the policy. Memory will be allocated from
186 168 the node in the set with sufficient free memory that is closest to
187 The memory policy APIs do not specify an order in which the nodes 169 the node where the allocation takes place.
188 will be searched. However, unlike "local allocation", the Bind
189 policy does not consider the distance between the nodes. Rather,
190 allocations will fallback to the nodes specified by the policy in
191 order of numeric node id. Like everything in Linux, this is subject
192 to change.
193 170
194 MPOL_PREFERRED: This mode specifies that the allocation should be 171 MPOL_PREFERRED: This mode specifies that the allocation should be
195 attempted from the single node specified in the policy. If that 172 attempted from the single node specified in the policy. If that
196 allocation fails, the kernel will search other nodes, exactly as 173 allocation fails, the kernel will search other nodes, in order of
197 it would for a local allocation that started at the preferred node 174 increasing distance from the preferred node based on information
198 in increasing distance from the preferred node. "Local" allocation 175 provided by the platform firmware.
199 policy can be viewed as a Preferred policy that starts at the node
200 containing the cpu where the allocation takes place. 176 containing the cpu where the allocation takes place.
201 177
202 Internally, the Preferred policy uses a single node--the 178 Internally, the Preferred policy uses a single node--the
203 preferred_node member of struct mempolicy. A "distinguished 179 preferred_node member of struct mempolicy. When the internal
204 value of this preferred_node, currently '-1', is interpreted 180 mode flag MPOL_F_LOCAL is set, the preferred_node is ignored and
205 as "the node containing the cpu where the allocation takes 181 the policy is interpreted as local allocation. "Local" allocation
206 place"--local allocation. This is the way to specify 182 policy can be viewed as a Preferred policy that starts at the node
207 local allocation for a specific range of addresses--i.e. for 183 containing the cpu where the allocation takes place.
208 VMA policies. 184
185 It is possible for the user to specify that local allocation is
186 always preferred by passing an empty nodemask with this mode.
187 If an empty nodemask is passed, the policy cannot use the
188 MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags described
189 below.
209 190
210 MPOL_INTERLEAVED: This mode specifies that page allocations be 191 MPOL_INTERLEAVED: This mode specifies that page allocations be
211 interleaved, on a page granularity, across the nodes specified in 192 interleaved, on a page granularity, across the nodes specified in
@@ -231,6 +212,154 @@ Components of Memory Policies
231 the temporary interleaved system default policy works in this 212 the temporary interleaved system default policy works in this
232 mode. 213 mode.
233 214
215 Linux memory policy supports the following optional mode flags:
216
217 MPOL_F_STATIC_NODES: This flag specifies that the nodemask passed by
218 the user should not be remapped if the task or VMA's set of allowed
219 nodes changes after the memory policy has been defined.
220
221 Without this flag, anytime a mempolicy is rebound because of a
222 change in the set of allowed nodes, the node (Preferred) or
223 nodemask (Bind, Interleave) is remapped to the new set of
224 allowed nodes. This may result in nodes being used that were
225 previously undesired.
226
227 With this flag, if the user-specified nodes overlap with the
228 nodes allowed by the task's cpuset, then the memory policy is
229 applied to their intersection. If the two sets of nodes do not
230 overlap, the Default policy is used.
231
232 For example, consider a task that is attached to a cpuset with
233 mems 1-3 that sets an Interleave policy over the same set. If
234 the cpuset's mems change to 3-5, the Interleave will now occur
235 over nodes 3, 4, and 5. With this flag, however, since only node
236 3 is allowed from the user's nodemask, the "interleave" only
237 occurs over that node. If no nodes from the user's nodemask are
238 now allowed, the Default behavior is used.
239
240 MPOL_F_STATIC_NODES cannot be combined with the
241 MPOL_F_RELATIVE_NODES flag. It also cannot be used for
242 MPOL_PREFERRED policies that were created with an empty nodemask
243 (local allocation).
244
245 MPOL_F_RELATIVE_NODES: This flag specifies that the nodemask passed
246 by the user will be mapped relative to the set of the task or VMA's
247 set of allowed nodes. The kernel stores the user-passed nodemask,
248 and if the allowed nodes changes, then that original nodemask will
249 be remapped relative to the new set of allowed nodes.
250
251 Without this flag (and without MPOL_F_STATIC_NODES), anytime a
252 mempolicy is rebound because of a change in the set of allowed
253 nodes, the node (Preferred) or nodemask (Bind, Interleave) is
254 remapped to the new set of allowed nodes. That remap may not
255 preserve the relative nature of the user's passed nodemask to its
256 set of allowed nodes upon successive rebinds: a nodemask of
257 1,3,5 may be remapped to 7-9 and then to 1-3 if the set of
258 allowed nodes is restored to its original state.
259
260 With this flag, the remap is done so that the node numbers from
261 the user's passed nodemask are relative to the set of allowed
262 nodes. In other words, if nodes 0, 2, and 4 are set in the user's
263 nodemask, the policy will be effected over the first (and in the
264 Bind or Interleave case, the third and fifth) nodes in the set of
265 allowed nodes. The nodemask passed by the user represents nodes
266 relative to task or VMA's set of allowed nodes.
267
268 If the user's nodemask includes nodes that are outside the range
269 of the new set of allowed nodes (for example, node 5 is set in
270 the user's nodemask when the set of allowed nodes is only 0-3),
271 then the remap wraps around to the beginning of the nodemask and,
272 if not already set, sets the node in the mempolicy nodemask.
273
274 For example, consider a task that is attached to a cpuset with
275 mems 2-5 that sets an Interleave policy over the same set with
276 MPOL_F_RELATIVE_NODES. If the cpuset's mems change to 3-7, the
277 interleave now occurs over nodes 3,5-6. If the cpuset's mems
278 then change to 0,2-3,5, then the interleave occurs over nodes
279 0,3,5.
280
281 Thanks to the consistent remapping, applications preparing
282 nodemasks to specify memory policies using this flag should
283 disregard their current, actual cpuset imposed memory placement
284 and prepare the nodemask as if they were always located on
285 memory nodes 0 to N-1, where N is the number of memory nodes the
286 policy is intended to manage. Let the kernel then remap to the
287 set of memory nodes allowed by the task's cpuset, as that may
288 change over time.
289
290 MPOL_F_RELATIVE_NODES cannot be combined with the
291 MPOL_F_STATIC_NODES flag. It also cannot be used for
292 MPOL_PREFERRED policies that were created with an empty nodemask
293 (local allocation).
294
295MEMORY POLICY REFERENCE COUNTING
296
297To resolve use/free races, struct mempolicy contains an atomic reference
298count field. Internal interfaces, mpol_get()/mpol_put() increment and
299decrement this reference count, respectively. mpol_put() will only free
300the structure back to the mempolicy kmem cache when the reference count
301goes to zero.
302
303When a new memory policy is allocated, it's reference count is initialized
304to '1', representing the reference held by the task that is installing the
305new policy. When a pointer to a memory policy structure is stored in another
306structure, another reference is added, as the task's reference will be dropped
307on completion of the policy installation.
308
309During run-time "usage" of the policy, we attempt to minimize atomic operations
310on the reference count, as this can lead to cache lines bouncing between cpus
311and NUMA nodes. "Usage" here means one of the following:
312
3131) querying of the policy, either by the task itself [using the get_mempolicy()
314 API discussed below] or by another task using the /proc/<pid>/numa_maps
315 interface.
316
3172) examination of the policy to determine the policy mode and associated node
318 or node lists, if any, for page allocation. This is considered a "hot
319 path". Note that for MPOL_BIND, the "usage" extends across the entire
320 allocation process, which may sleep during page reclaimation, because the
321 BIND policy nodemask is used, by reference, to filter ineligible nodes.
322
323We can avoid taking an extra reference during the usages listed above as
324follows:
325
3261) we never need to get/free the system default policy as this is never
327 changed nor freed, once the system is up and running.
328
3292) for querying the policy, we do not need to take an extra reference on the
330 target task's task policy nor vma policies because we always acquire the
331 task's mm's mmap_sem for read during the query. The set_mempolicy() and
332 mbind() APIs [see below] always acquire the mmap_sem for write when
333 installing or replacing task or vma policies. Thus, there is no possibility
334 of a task or thread freeing a policy while another task or thread is
335 querying it.
336
3373) Page allocation usage of task or vma policy occurs in the fault path where
338 we hold them mmap_sem for read. Again, because replacing the task or vma
339 policy requires that the mmap_sem be held for write, the policy can't be
340 freed out from under us while we're using it for page allocation.
341
3424) Shared policies require special consideration. One task can replace a
343 shared memory policy while another task, with a distinct mmap_sem, is
344 querying or allocating a page based on the policy. To resolve this
345 potential race, the shared policy infrastructure adds an extra reference
346 to the shared policy during lookup while holding a spin lock on the shared
347 policy management structure. This requires that we drop this extra
348 reference when we're finished "using" the policy. We must drop the
349 extra reference on shared policies in the same query/allocation paths
350 used for non-shared policies. For this reason, shared policies are marked
351 as such, and the extra reference is dropped "conditionally"--i.e., only
352 for shared policies.
353
354 Because of this extra reference counting, and because we must lookup
355 shared policies in a tree structure under spinlock, shared policies are
356 more expensive to use in the page allocation path. This is expecially
357 true for shared policies on shared memory regions shared by tasks running
358 on different NUMA nodes. This extra overhead can be avoided by always
359 falling back to task or system default policy for shared memory regions,
360 or by prefaulting the entire shared memory region into memory and locking
361 it down. However, this might not be appropriate for all applications.
362
234MEMORY POLICY APIs 363MEMORY POLICY APIs
235 364
236Linux supports 3 system calls for controlling memory policy. These APIS 365Linux supports 3 system calls for controlling memory policy. These APIS
@@ -251,7 +380,9 @@ Set [Task] Memory Policy:
251 Set's the calling task's "task/process memory policy" to mode 380 Set's the calling task's "task/process memory policy" to mode
252 specified by the 'mode' argument and the set of nodes defined 381 specified by the 'mode' argument and the set of nodes defined
253 by 'nmask'. 'nmask' points to a bit mask of node ids containing 382 by 'nmask'. 'nmask' points to a bit mask of node ids containing
254 at least 'maxnode' ids. 383 at least 'maxnode' ids. Optional mode flags may be passed by
384 combining the 'mode' argument with the flag (for example:
385 MPOL_INTERLEAVE | MPOL_F_STATIC_NODES).
255 386
256 See the set_mempolicy(2) man page for more details 387 See the set_mempolicy(2) man page for more details
257 388
@@ -303,29 +434,19 @@ MEMORY POLICIES AND CPUSETS
303Memory policies work within cpusets as described above. For memory policies 434Memory policies work within cpusets as described above. For memory policies
304that require a node or set of nodes, the nodes are restricted to the set of 435that require a node or set of nodes, the nodes are restricted to the set of
305nodes whose memories are allowed by the cpuset constraints. If the nodemask 436nodes whose memories are allowed by the cpuset constraints. If the nodemask
306specified for the policy contains nodes that are not allowed by the cpuset, or 437specified for the policy contains nodes that are not allowed by the cpuset and
307the intersection of the set of nodes specified for the policy and the set of 438MPOL_F_RELATIVE_NODES is not used, the intersection of the set of nodes
308nodes with memory is the empty set, the policy is considered invalid 439specified for the policy and the set of nodes with memory is used. If the
309and cannot be installed. 440result is the empty set, the policy is considered invalid and cannot be
310 441installed. If MPOL_F_RELATIVE_NODES is used, the policy's nodes are mapped
311The interaction of memory policies and cpusets can be problematic for a 442onto and folded into the task's set of allowed nodes as previously described.
312couple of reasons: 443
313 444The interaction of memory policies and cpusets can be problematic when tasks
3141) the memory policy APIs take physical node id's as arguments. As mentioned 445in two cpusets share access to a memory region, such as shared memory segments
315 above, it is illegal to specify nodes that are not allowed in the cpuset. 446created by shmget() of mmap() with the MAP_ANONYMOUS and MAP_SHARED flags, and
316 The application must query the allowed nodes using the get_mempolicy() 447any of the tasks install shared policy on the region, only nodes whose
317 API with the MPOL_F_MEMS_ALLOWED flag to determine the allowed nodes and 448memories are allowed in both cpusets may be used in the policies. Obtaining
318 restrict itself to those nodes. However, the resources available to a 449this information requires "stepping outside" the memory policy APIs to use the
319 cpuset can be changed by the system administrator, or a workload manager 450cpuset information and requires that one know in what cpusets other task might
320 application, at any time. So, a task may still get errors attempting to 451be attaching to the shared region. Furthermore, if the cpusets' allowed
321 specify policy nodes, and must query the allowed memories again. 452memory sets are disjoint, "local" allocation is the only valid policy.
322
3232) when tasks in two cpusets share access to a memory region, such as shared
324 memory segments created by shmget() of mmap() with the MAP_ANONYMOUS and
325 MAP_SHARED flags, and any of the tasks install shared policy on the region,
326 only nodes whose memories are allowed in both cpusets may be used in the
327 policies. Obtaining this information requires "stepping outside" the
328 memory policy APIs to use the cpuset information and requires that one
329 know in what cpusets other task might be attaching to the shared region.
330 Furthermore, if the cpusets' allowed memory sets are disjoint, "local"
331 allocation is the only valid policy.