diff options
Diffstat (limited to 'Documentation')
148 files changed, 5279 insertions, 1987 deletions
diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node new file mode 100644 index 000000000000..49b82cad7003 --- /dev/null +++ b/Documentation/ABI/stable/sysfs-devices-node | |||
@@ -0,0 +1,7 @@ | |||
1 | What: /sys/devices/system/node/nodeX | ||
2 | Date: October 2002 | ||
3 | Contact: Linux Memory Management list <linux-mm@kvack.org> | ||
4 | Description: | ||
5 | When CONFIG_NUMA is enabled, this is a directory containing | ||
6 | information on node X such as what CPUs are local to the | ||
7 | node. | ||
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block index d2f90334bb93..4873c759d535 100644 --- a/Documentation/ABI/testing/sysfs-block +++ b/Documentation/ABI/testing/sysfs-block | |||
@@ -128,3 +128,17 @@ Description: | |||
128 | preferred request size for workloads where sustained | 128 | preferred request size for workloads where sustained |
129 | throughput is desired. If no optimal I/O size is | 129 | throughput is desired. If no optimal I/O size is |
130 | reported this file contains 0. | 130 | reported this file contains 0. |
131 | |||
132 | What: /sys/block/<disk>/queue/nomerges | ||
133 | Date: January 2010 | ||
134 | Contact: | ||
135 | Description: | ||
136 | Standard I/O elevator operations include attempts to | ||
137 | merge contiguous I/Os. For known random I/O loads these | ||
138 | attempts will always fail and result in extra cycles | ||
139 | being spent in the kernel. This allows one to turn off | ||
140 | this behavior on one of two ways: When set to 1, complex | ||
141 | merge checks are disabled, but the simple one-shot merges | ||
142 | with the previous I/O request are enabled. When set to 2, | ||
143 | all merge tries are disabled. The default value is 0 - | ||
144 | which enables all types of merge tries. | ||
diff --git a/Documentation/ABI/testing/sysfs-bus-usb b/Documentation/ABI/testing/sysfs-bus-usb index a07c0f366f91..a986e9bbba3d 100644 --- a/Documentation/ABI/testing/sysfs-bus-usb +++ b/Documentation/ABI/testing/sysfs-bus-usb | |||
@@ -159,3 +159,14 @@ Description: | |||
159 | device. This is useful to ensure auto probing won't | 159 | device. This is useful to ensure auto probing won't |
160 | match the driver to the device. For example: | 160 | match the driver to the device. For example: |
161 | # echo "046d c315" > /sys/bus/usb/drivers/foo/remove_id | 161 | # echo "046d c315" > /sys/bus/usb/drivers/foo/remove_id |
162 | |||
163 | What: /sys/bus/usb/device/.../avoid_reset | ||
164 | Date: December 2009 | ||
165 | Contact: Oliver Neukum <oliver@neukum.org> | ||
166 | Description: | ||
167 | Writing 1 to this file tells the kernel that this | ||
168 | device will morph into another mode when it is reset. | ||
169 | Drivers will not use reset for error handling for | ||
170 | such devices. | ||
171 | Users: | ||
172 | usb_modeswitch | ||
diff --git a/Documentation/ABI/testing/sysfs-devices-power b/Documentation/ABI/testing/sysfs-devices-power new file mode 100644 index 000000000000..6123c523bfd7 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-power | |||
@@ -0,0 +1,79 @@ | |||
1 | What: /sys/devices/.../power/ | ||
2 | Date: January 2009 | ||
3 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
4 | Description: | ||
5 | The /sys/devices/.../power directory contains attributes | ||
6 | allowing the user space to check and modify some power | ||
7 | management related properties of given device. | ||
8 | |||
9 | What: /sys/devices/.../power/wakeup | ||
10 | Date: January 2009 | ||
11 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
12 | Description: | ||
13 | The /sys/devices/.../power/wakeup attribute allows the user | ||
14 | space to check if the device is enabled to wake up the system | ||
15 | from sleep states, such as the memory sleep state (suspend to | ||
16 | RAM) and hibernation (suspend to disk), and to enable or disable | ||
17 | it to do that as desired. | ||
18 | |||
19 | Some devices support "wakeup" events, which are hardware signals | ||
20 | used to activate the system from a sleep state. Such devices | ||
21 | have one of the following two values for the sysfs power/wakeup | ||
22 | file: | ||
23 | |||
24 | + "enabled\n" to issue the events; | ||
25 | + "disabled\n" not to do so; | ||
26 | |||
27 | In that cases the user space can change the setting represented | ||
28 | by the contents of this file by writing either "enabled", or | ||
29 | "disabled" to it. | ||
30 | |||
31 | For the devices that are not capable of generating system wakeup | ||
32 | events this file contains "\n". In that cases the user space | ||
33 | cannot modify the contents of this file and the device cannot be | ||
34 | enabled to wake up the system. | ||
35 | |||
36 | What: /sys/devices/.../power/control | ||
37 | Date: January 2009 | ||
38 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
39 | Description: | ||
40 | The /sys/devices/.../power/control attribute allows the user | ||
41 | space to control the run-time power management of the device. | ||
42 | |||
43 | All devices have one of the following two values for the | ||
44 | power/control file: | ||
45 | |||
46 | + "auto\n" to allow the device to be power managed at run time; | ||
47 | + "on\n" to prevent the device from being power managed; | ||
48 | |||
49 | The default for all devices is "auto", which means that they may | ||
50 | be subject to automatic power management, depending on their | ||
51 | drivers. Changing this attribute to "on" prevents the driver | ||
52 | from power managing the device at run time. Doing that while | ||
53 | the device is suspended causes it to be woken up. | ||
54 | |||
55 | What: /sys/devices/.../power/async | ||
56 | Date: January 2009 | ||
57 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
58 | Description: | ||
59 | The /sys/devices/.../async attribute allows the user space to | ||
60 | enable or diasble the device's suspend and resume callbacks to | ||
61 | be executed asynchronously (ie. in separate threads, in parallel | ||
62 | with the main suspend/resume thread) during system-wide power | ||
63 | transitions (eg. suspend to RAM, hibernation). | ||
64 | |||
65 | All devices have one of the following two values for the | ||
66 | power/async file: | ||
67 | |||
68 | + "enabled\n" to permit the asynchronous suspend/resume; | ||
69 | + "disabled\n" to forbid it; | ||
70 | |||
71 | The value of this attribute may be changed by writing either | ||
72 | "enabled", or "disabled" to it. | ||
73 | |||
74 | It generally is unsafe to permit the asynchronous suspend/resume | ||
75 | of a device unless it is certain that all of the PM dependencies | ||
76 | of the device are known to the PM core. However, for some | ||
77 | devices this attribute is set to "enabled" by bus type code or | ||
78 | device drivers and in that cases it should be safe to leave the | ||
79 | default value. | ||
diff --git a/Documentation/ABI/testing/sysfs-platform-asus-laptop b/Documentation/ABI/testing/sysfs-platform-asus-laptop index a1cb660c50cf..1d775390e856 100644 --- a/Documentation/ABI/testing/sysfs-platform-asus-laptop +++ b/Documentation/ABI/testing/sysfs-platform-asus-laptop | |||
@@ -1,4 +1,4 @@ | |||
1 | What: /sys/devices/platform/asus-laptop/display | 1 | What: /sys/devices/platform/asus_laptop/display |
2 | Date: January 2007 | 2 | Date: January 2007 |
3 | KernelVersion: 2.6.20 | 3 | KernelVersion: 2.6.20 |
4 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 4 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
@@ -13,7 +13,7 @@ Description: | |||
13 | Ex: - 0 (0000b) means no display | 13 | Ex: - 0 (0000b) means no display |
14 | - 3 (0011b) CRT+LCD. | 14 | - 3 (0011b) CRT+LCD. |
15 | 15 | ||
16 | What: /sys/devices/platform/asus-laptop/gps | 16 | What: /sys/devices/platform/asus_laptop/gps |
17 | Date: January 2007 | 17 | Date: January 2007 |
18 | KernelVersion: 2.6.20 | 18 | KernelVersion: 2.6.20 |
19 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 19 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
@@ -21,7 +21,7 @@ Description: | |||
21 | Control the gps device. 1 means on, 0 means off. | 21 | Control the gps device. 1 means on, 0 means off. |
22 | Users: Lapsus | 22 | Users: Lapsus |
23 | 23 | ||
24 | What: /sys/devices/platform/asus-laptop/ledd | 24 | What: /sys/devices/platform/asus_laptop/ledd |
25 | Date: January 2007 | 25 | Date: January 2007 |
26 | KernelVersion: 2.6.20 | 26 | KernelVersion: 2.6.20 |
27 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 27 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
@@ -29,11 +29,11 @@ Description: | |||
29 | Some models like the W1N have a LED display that can be | 29 | Some models like the W1N have a LED display that can be |
30 | used to display several informations. | 30 | used to display several informations. |
31 | To control the LED display, use the following : | 31 | To control the LED display, use the following : |
32 | echo 0x0T000DDD > /sys/devices/platform/asus-laptop/ | 32 | echo 0x0T000DDD > /sys/devices/platform/asus_laptop/ |
33 | where T control the 3 letters display, and DDD the 3 digits display. | 33 | where T control the 3 letters display, and DDD the 3 digits display. |
34 | The DDD table can be found in Documentation/laptops/asus-laptop.txt | 34 | The DDD table can be found in Documentation/laptops/asus-laptop.txt |
35 | 35 | ||
36 | What: /sys/devices/platform/asus-laptop/bluetooth | 36 | What: /sys/devices/platform/asus_laptop/bluetooth |
37 | Date: January 2007 | 37 | Date: January 2007 |
38 | KernelVersion: 2.6.20 | 38 | KernelVersion: 2.6.20 |
39 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 39 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
@@ -42,7 +42,7 @@ Description: | |||
42 | This may control the led, the device or both. | 42 | This may control the led, the device or both. |
43 | Users: Lapsus | 43 | Users: Lapsus |
44 | 44 | ||
45 | What: /sys/devices/platform/asus-laptop/wlan | 45 | What: /sys/devices/platform/asus_laptop/wlan |
46 | Date: January 2007 | 46 | Date: January 2007 |
47 | KernelVersion: 2.6.20 | 47 | KernelVersion: 2.6.20 |
48 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 48 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
diff --git a/Documentation/ABI/testing/sysfs-platform-eeepc-laptop b/Documentation/ABI/testing/sysfs-platform-eeepc-laptop index 7445dfb321b5..5b026c69587a 100644 --- a/Documentation/ABI/testing/sysfs-platform-eeepc-laptop +++ b/Documentation/ABI/testing/sysfs-platform-eeepc-laptop | |||
@@ -1,4 +1,4 @@ | |||
1 | What: /sys/devices/platform/eeepc-laptop/disp | 1 | What: /sys/devices/platform/eeepc/disp |
2 | Date: May 2008 | 2 | Date: May 2008 |
3 | KernelVersion: 2.6.26 | 3 | KernelVersion: 2.6.26 |
4 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 4 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
@@ -9,21 +9,21 @@ Description: | |||
9 | - 3 = LCD+CRT | 9 | - 3 = LCD+CRT |
10 | If you run X11, you should use xrandr instead. | 10 | If you run X11, you should use xrandr instead. |
11 | 11 | ||
12 | What: /sys/devices/platform/eeepc-laptop/camera | 12 | What: /sys/devices/platform/eeepc/camera |
13 | Date: May 2008 | 13 | Date: May 2008 |
14 | KernelVersion: 2.6.26 | 14 | KernelVersion: 2.6.26 |
15 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 15 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
16 | Description: | 16 | Description: |
17 | Control the camera. 1 means on, 0 means off. | 17 | Control the camera. 1 means on, 0 means off. |
18 | 18 | ||
19 | What: /sys/devices/platform/eeepc-laptop/cardr | 19 | What: /sys/devices/platform/eeepc/cardr |
20 | Date: May 2008 | 20 | Date: May 2008 |
21 | KernelVersion: 2.6.26 | 21 | KernelVersion: 2.6.26 |
22 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 22 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
23 | Description: | 23 | Description: |
24 | Control the card reader. 1 means on, 0 means off. | 24 | Control the card reader. 1 means on, 0 means off. |
25 | 25 | ||
26 | What: /sys/devices/platform/eeepc-laptop/cpufv | 26 | What: /sys/devices/platform/eeepc/cpufv |
27 | Date: Jun 2009 | 27 | Date: Jun 2009 |
28 | KernelVersion: 2.6.31 | 28 | KernelVersion: 2.6.31 |
29 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 29 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
@@ -42,7 +42,7 @@ Description: | |||
42 | `------------ Availables modes | 42 | `------------ Availables modes |
43 | For example, 0x301 means: mode 1 selected, 3 available modes. | 43 | For example, 0x301 means: mode 1 selected, 3 available modes. |
44 | 44 | ||
45 | What: /sys/devices/platform/eeepc-laptop/available_cpufv | 45 | What: /sys/devices/platform/eeepc/available_cpufv |
46 | Date: Jun 2009 | 46 | Date: Jun 2009 |
47 | KernelVersion: 2.6.31 | 47 | KernelVersion: 2.6.31 |
48 | Contact: "Corentin Chary" <corentincj@iksaif.net> | 48 | Contact: "Corentin Chary" <corentincj@iksaif.net> |
diff --git a/Documentation/ABI/testing/sysfs-power b/Documentation/ABI/testing/sysfs-power index dcff4d0623ad..d6a801f45b48 100644 --- a/Documentation/ABI/testing/sysfs-power +++ b/Documentation/ABI/testing/sysfs-power | |||
@@ -101,3 +101,16 @@ Description: | |||
101 | 101 | ||
102 | CAUTION: Using it will cause your machine's real-time (CMOS) | 102 | CAUTION: Using it will cause your machine's real-time (CMOS) |
103 | clock to be set to a random invalid time after a resume. | 103 | clock to be set to a random invalid time after a resume. |
104 | |||
105 | What: /sys/power/pm_async | ||
106 | Date: January 2009 | ||
107 | Contact: Rafael J. Wysocki <rjw@sisk.pl> | ||
108 | Description: | ||
109 | The /sys/power/pm_async file controls the switch allowing the | ||
110 | user space to enable or disable asynchronous suspend and resume | ||
111 | of devices. If enabled, this feature will cause some device | ||
112 | drivers' suspend and resume callbacks to be executed in parallel | ||
113 | with each other and with the main suspend thread. It is enabled | ||
114 | if this file contains "1", which is the default. It may be | ||
115 | disabled by writing "0" to this file, in which case all devices | ||
116 | will be suspended and resumed synchronously. | ||
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt index 5aceb88b3f8b..05e2ae236865 100644 --- a/Documentation/DMA-API.txt +++ b/Documentation/DMA-API.txt | |||
@@ -4,20 +4,18 @@ | |||
4 | James E.J. Bottomley <James.Bottomley@HansenPartnership.com> | 4 | James E.J. Bottomley <James.Bottomley@HansenPartnership.com> |
5 | 5 | ||
6 | This document describes the DMA API. For a more gentle introduction | 6 | This document describes the DMA API. For a more gentle introduction |
7 | phrased in terms of the pci_ equivalents (and actual examples) see | 7 | of the API (and actual examples) see |
8 | Documentation/PCI/PCI-DMA-mapping.txt. | 8 | Documentation/DMA-API-HOWTO.txt. |
9 | 9 | ||
10 | This API is split into two pieces. Part I describes the API and the | 10 | This API is split into two pieces. Part I describes the API. Part II |
11 | corresponding pci_ API. Part II describes the extensions to the API | 11 | describes the extensions to the API for supporting non-consistent |
12 | for supporting non-consistent memory machines. Unless you know that | 12 | memory machines. Unless you know that your driver absolutely has to |
13 | your driver absolutely has to support non-consistent platforms (this | 13 | support non-consistent platforms (this is usually only legacy |
14 | is usually only legacy platforms) you should only use the API | 14 | platforms) you should only use the API described in part I. |
15 | described in part I. | ||
16 | 15 | ||
17 | Part I - pci_ and dma_ Equivalent API | 16 | Part I - dma_ API |
18 | ------------------------------------- | 17 | ------------------------------------- |
19 | 18 | ||
20 | To get the pci_ API, you must #include <linux/pci.h> | ||
21 | To get the dma_ API, you must #include <linux/dma-mapping.h> | 19 | To get the dma_ API, you must #include <linux/dma-mapping.h> |
22 | 20 | ||
23 | 21 | ||
@@ -27,9 +25,6 @@ Part Ia - Using large dma-coherent buffers | |||
27 | void * | 25 | void * |
28 | dma_alloc_coherent(struct device *dev, size_t size, | 26 | dma_alloc_coherent(struct device *dev, size_t size, |
29 | dma_addr_t *dma_handle, gfp_t flag) | 27 | dma_addr_t *dma_handle, gfp_t flag) |
30 | void * | ||
31 | pci_alloc_consistent(struct pci_dev *dev, size_t size, | ||
32 | dma_addr_t *dma_handle) | ||
33 | 28 | ||
34 | Consistent memory is memory for which a write by either the device or | 29 | Consistent memory is memory for which a write by either the device or |
35 | the processor can immediately be read by the processor or device | 30 | the processor can immediately be read by the processor or device |
@@ -53,15 +48,11 @@ The simplest way to do that is to use the dma_pool calls (see below). | |||
53 | The flag parameter (dma_alloc_coherent only) allows the caller to | 48 | The flag parameter (dma_alloc_coherent only) allows the caller to |
54 | specify the GFP_ flags (see kmalloc) for the allocation (the | 49 | specify the GFP_ flags (see kmalloc) for the allocation (the |
55 | implementation may choose to ignore flags that affect the location of | 50 | implementation may choose to ignore flags that affect the location of |
56 | the returned memory, like GFP_DMA). For pci_alloc_consistent, you | 51 | the returned memory, like GFP_DMA). |
57 | must assume GFP_ATOMIC behaviour. | ||
58 | 52 | ||
59 | void | 53 | void |
60 | dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, | 54 | dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, |
61 | dma_addr_t dma_handle) | 55 | dma_addr_t dma_handle) |
62 | void | ||
63 | pci_free_consistent(struct pci_dev *dev, size_t size, void *cpu_addr, | ||
64 | dma_addr_t dma_handle) | ||
65 | 56 | ||
66 | Free the region of consistent memory you previously allocated. dev, | 57 | Free the region of consistent memory you previously allocated. dev, |
67 | size and dma_handle must all be the same as those passed into the | 58 | size and dma_handle must all be the same as those passed into the |
@@ -89,10 +80,6 @@ for alignment, like queue heads needing to be aligned on N-byte boundaries. | |||
89 | dma_pool_create(const char *name, struct device *dev, | 80 | dma_pool_create(const char *name, struct device *dev, |
90 | size_t size, size_t align, size_t alloc); | 81 | size_t size, size_t align, size_t alloc); |
91 | 82 | ||
92 | struct pci_pool * | ||
93 | pci_pool_create(const char *name, struct pci_device *dev, | ||
94 | size_t size, size_t align, size_t alloc); | ||
95 | |||
96 | The pool create() routines initialize a pool of dma-coherent buffers | 83 | The pool create() routines initialize a pool of dma-coherent buffers |
97 | for use with a given device. It must be called in a context which | 84 | for use with a given device. It must be called in a context which |
98 | can sleep. | 85 | can sleep. |
@@ -108,9 +95,6 @@ from this pool must not cross 4KByte boundaries. | |||
108 | void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags, | 95 | void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags, |
109 | dma_addr_t *dma_handle); | 96 | dma_addr_t *dma_handle); |
110 | 97 | ||
111 | void *pci_pool_alloc(struct pci_pool *pool, gfp_t gfp_flags, | ||
112 | dma_addr_t *dma_handle); | ||
113 | |||
114 | This allocates memory from the pool; the returned memory will meet the size | 98 | This allocates memory from the pool; the returned memory will meet the size |
115 | and alignment requirements specified at creation time. Pass GFP_ATOMIC to | 99 | and alignment requirements specified at creation time. Pass GFP_ATOMIC to |
116 | prevent blocking, or if it's permitted (not in_interrupt, not holding SMP locks), | 100 | prevent blocking, or if it's permitted (not in_interrupt, not holding SMP locks), |
@@ -122,9 +106,6 @@ pool's device. | |||
122 | void dma_pool_free(struct dma_pool *pool, void *vaddr, | 106 | void dma_pool_free(struct dma_pool *pool, void *vaddr, |
123 | dma_addr_t addr); | 107 | dma_addr_t addr); |
124 | 108 | ||
125 | void pci_pool_free(struct pci_pool *pool, void *vaddr, | ||
126 | dma_addr_t addr); | ||
127 | |||
128 | This puts memory back into the pool. The pool is what was passed to | 109 | This puts memory back into the pool. The pool is what was passed to |
129 | the pool allocation routine; the cpu (vaddr) and dma addresses are what | 110 | the pool allocation routine; the cpu (vaddr) and dma addresses are what |
130 | were returned when that routine allocated the memory being freed. | 111 | were returned when that routine allocated the memory being freed. |
@@ -132,8 +113,6 @@ were returned when that routine allocated the memory being freed. | |||
132 | 113 | ||
133 | void dma_pool_destroy(struct dma_pool *pool); | 114 | void dma_pool_destroy(struct dma_pool *pool); |
134 | 115 | ||
135 | void pci_pool_destroy(struct pci_pool *pool); | ||
136 | |||
137 | The pool destroy() routines free the resources of the pool. They must be | 116 | The pool destroy() routines free the resources of the pool. They must be |
138 | called in a context which can sleep. Make sure you've freed all allocated | 117 | called in a context which can sleep. Make sure you've freed all allocated |
139 | memory back to the pool before you destroy it. | 118 | memory back to the pool before you destroy it. |
@@ -144,8 +123,6 @@ Part Ic - DMA addressing limitations | |||
144 | 123 | ||
145 | int | 124 | int |
146 | dma_supported(struct device *dev, u64 mask) | 125 | dma_supported(struct device *dev, u64 mask) |
147 | int | ||
148 | pci_dma_supported(struct pci_dev *hwdev, u64 mask) | ||
149 | 126 | ||
150 | Checks to see if the device can support DMA to the memory described by | 127 | Checks to see if the device can support DMA to the memory described by |
151 | mask. | 128 | mask. |
@@ -159,8 +136,14 @@ driver writers. | |||
159 | 136 | ||
160 | int | 137 | int |
161 | dma_set_mask(struct device *dev, u64 mask) | 138 | dma_set_mask(struct device *dev, u64 mask) |
139 | |||
140 | Checks to see if the mask is possible and updates the device | ||
141 | parameters if it is. | ||
142 | |||
143 | Returns: 0 if successful and a negative error if not. | ||
144 | |||
162 | int | 145 | int |
163 | pci_set_dma_mask(struct pci_device *dev, u64 mask) | 146 | dma_set_coherent_mask(struct device *dev, u64 mask) |
164 | 147 | ||
165 | Checks to see if the mask is possible and updates the device | 148 | Checks to see if the mask is possible and updates the device |
166 | parameters if it is. | 149 | parameters if it is. |
@@ -187,9 +170,6 @@ Part Id - Streaming DMA mappings | |||
187 | dma_addr_t | 170 | dma_addr_t |
188 | dma_map_single(struct device *dev, void *cpu_addr, size_t size, | 171 | dma_map_single(struct device *dev, void *cpu_addr, size_t size, |
189 | enum dma_data_direction direction) | 172 | enum dma_data_direction direction) |
190 | dma_addr_t | ||
191 | pci_map_single(struct pci_dev *hwdev, void *cpu_addr, size_t size, | ||
192 | int direction) | ||
193 | 173 | ||
194 | Maps a piece of processor virtual memory so it can be accessed by the | 174 | Maps a piece of processor virtual memory so it can be accessed by the |
195 | device and returns the physical handle of the memory. | 175 | device and returns the physical handle of the memory. |
@@ -198,14 +178,10 @@ The direction for both api's may be converted freely by casting. | |||
198 | However the dma_ API uses a strongly typed enumerator for its | 178 | However the dma_ API uses a strongly typed enumerator for its |
199 | direction: | 179 | direction: |
200 | 180 | ||
201 | DMA_NONE = PCI_DMA_NONE no direction (used for | 181 | DMA_NONE no direction (used for debugging) |
202 | debugging) | 182 | DMA_TO_DEVICE data is going from the memory to the device |
203 | DMA_TO_DEVICE = PCI_DMA_TODEVICE data is going from the | 183 | DMA_FROM_DEVICE data is coming from the device to the memory |
204 | memory to the device | 184 | DMA_BIDIRECTIONAL direction isn't known |
205 | DMA_FROM_DEVICE = PCI_DMA_FROMDEVICE data is coming from | ||
206 | the device to the | ||
207 | memory | ||
208 | DMA_BIDIRECTIONAL = PCI_DMA_BIDIRECTIONAL direction isn't known | ||
209 | 185 | ||
210 | Notes: Not all memory regions in a machine can be mapped by this | 186 | Notes: Not all memory regions in a machine can be mapped by this |
211 | API. Further, regions that appear to be physically contiguous in | 187 | API. Further, regions that appear to be physically contiguous in |
@@ -268,9 +244,6 @@ cache lines are updated with data that the device may have changed). | |||
268 | void | 244 | void |
269 | dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size, | 245 | dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size, |
270 | enum dma_data_direction direction) | 246 | enum dma_data_direction direction) |
271 | void | ||
272 | pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, | ||
273 | size_t size, int direction) | ||
274 | 247 | ||
275 | Unmaps the region previously mapped. All the parameters passed in | 248 | Unmaps the region previously mapped. All the parameters passed in |
276 | must be identical to those passed in (and returned) by the mapping | 249 | must be identical to those passed in (and returned) by the mapping |
@@ -280,15 +253,9 @@ dma_addr_t | |||
280 | dma_map_page(struct device *dev, struct page *page, | 253 | dma_map_page(struct device *dev, struct page *page, |
281 | unsigned long offset, size_t size, | 254 | unsigned long offset, size_t size, |
282 | enum dma_data_direction direction) | 255 | enum dma_data_direction direction) |
283 | dma_addr_t | ||
284 | pci_map_page(struct pci_dev *hwdev, struct page *page, | ||
285 | unsigned long offset, size_t size, int direction) | ||
286 | void | 256 | void |
287 | dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, | 257 | dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, |
288 | enum dma_data_direction direction) | 258 | enum dma_data_direction direction) |
289 | void | ||
290 | pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address, | ||
291 | size_t size, int direction) | ||
292 | 259 | ||
293 | API for mapping and unmapping for pages. All the notes and warnings | 260 | API for mapping and unmapping for pages. All the notes and warnings |
294 | for the other mapping APIs apply here. Also, although the <offset> | 261 | for the other mapping APIs apply here. Also, although the <offset> |
@@ -299,9 +266,6 @@ cache width is. | |||
299 | int | 266 | int |
300 | dma_mapping_error(struct device *dev, dma_addr_t dma_addr) | 267 | dma_mapping_error(struct device *dev, dma_addr_t dma_addr) |
301 | 268 | ||
302 | int | ||
303 | pci_dma_mapping_error(struct pci_dev *hwdev, dma_addr_t dma_addr) | ||
304 | |||
305 | In some circumstances dma_map_single and dma_map_page will fail to create | 269 | In some circumstances dma_map_single and dma_map_page will fail to create |
306 | a mapping. A driver can check for these errors by testing the returned | 270 | a mapping. A driver can check for these errors by testing the returned |
307 | dma address with dma_mapping_error(). A non-zero return value means the mapping | 271 | dma address with dma_mapping_error(). A non-zero return value means the mapping |
@@ -311,9 +275,6 @@ reduce current DMA mapping usage or delay and try again later). | |||
311 | int | 275 | int |
312 | dma_map_sg(struct device *dev, struct scatterlist *sg, | 276 | dma_map_sg(struct device *dev, struct scatterlist *sg, |
313 | int nents, enum dma_data_direction direction) | 277 | int nents, enum dma_data_direction direction) |
314 | int | ||
315 | pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, | ||
316 | int nents, int direction) | ||
317 | 278 | ||
318 | Returns: the number of physical segments mapped (this may be shorter | 279 | Returns: the number of physical segments mapped (this may be shorter |
319 | than <nents> passed in if some elements of the scatter/gather list are | 280 | than <nents> passed in if some elements of the scatter/gather list are |
@@ -353,9 +314,6 @@ accessed sg->address and sg->length as shown above. | |||
353 | void | 314 | void |
354 | dma_unmap_sg(struct device *dev, struct scatterlist *sg, | 315 | dma_unmap_sg(struct device *dev, struct scatterlist *sg, |
355 | int nhwentries, enum dma_data_direction direction) | 316 | int nhwentries, enum dma_data_direction direction) |
356 | void | ||
357 | pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, | ||
358 | int nents, int direction) | ||
359 | 317 | ||
360 | Unmap the previously mapped scatter/gather list. All the parameters | 318 | Unmap the previously mapped scatter/gather list. All the parameters |
361 | must be the same as those and passed in to the scatter/gather mapping | 319 | must be the same as those and passed in to the scatter/gather mapping |
@@ -365,21 +323,23 @@ Note: <nents> must be the number you passed in, *not* the number of | |||
365 | physical entries returned. | 323 | physical entries returned. |
366 | 324 | ||
367 | void | 325 | void |
368 | dma_sync_single(struct device *dev, dma_addr_t dma_handle, size_t size, | 326 | dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size, |
369 | enum dma_data_direction direction) | 327 | enum dma_data_direction direction) |
370 | void | 328 | void |
371 | pci_dma_sync_single(struct pci_dev *hwdev, dma_addr_t dma_handle, | 329 | dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, size_t size, |
372 | size_t size, int direction) | 330 | enum dma_data_direction direction) |
373 | void | 331 | void |
374 | dma_sync_sg(struct device *dev, struct scatterlist *sg, int nelems, | 332 | dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nelems, |
375 | enum dma_data_direction direction) | 333 | enum dma_data_direction direction) |
376 | void | 334 | void |
377 | pci_dma_sync_sg(struct pci_dev *hwdev, struct scatterlist *sg, | 335 | dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nelems, |
378 | int nelems, int direction) | 336 | enum dma_data_direction direction) |
379 | 337 | ||
380 | Synchronise a single contiguous or scatter/gather mapping. All the | 338 | Synchronise a single contiguous or scatter/gather mapping for the cpu |
381 | parameters must be the same as those passed into the single mapping | 339 | and device. With the sync_sg API, all the parameters must be the same |
382 | API. | 340 | as those passed into the single mapping API. With the sync_single API, |
341 | you can use dma_handle and size parameters that aren't identical to | ||
342 | those passed into the single mapping API to do a partial sync. | ||
383 | 343 | ||
384 | Notes: You must do this: | 344 | Notes: You must do this: |
385 | 345 | ||
@@ -461,9 +421,9 @@ void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr, | |||
461 | Part II - Advanced dma_ usage | 421 | Part II - Advanced dma_ usage |
462 | ----------------------------- | 422 | ----------------------------- |
463 | 423 | ||
464 | Warning: These pieces of the DMA API have no PCI equivalent. They | 424 | Warning: These pieces of the DMA API should not be used in the |
465 | should also not be used in the majority of cases, since they cater for | 425 | majority of cases, since they cater for unlikely corner cases that |
466 | unlikely corner cases that don't belong in usual drivers. | 426 | don't belong in usual drivers. |
467 | 427 | ||
468 | If you don't understand how cache line coherency works between a | 428 | If you don't understand how cache line coherency works between a |
469 | processor and an I/O device, you should not be using this part of the | 429 | processor and an I/O device, you should not be using this part of the |
@@ -514,16 +474,6 @@ into the width returned by this call. It will also always be a power | |||
514 | of two for easy alignment. | 474 | of two for easy alignment. |
515 | 475 | ||
516 | void | 476 | void |
517 | dma_sync_single_range(struct device *dev, dma_addr_t dma_handle, | ||
518 | unsigned long offset, size_t size, | ||
519 | enum dma_data_direction direction) | ||
520 | |||
521 | Does a partial sync, starting at offset and continuing for size. You | ||
522 | must be careful to observe the cache alignment and width when doing | ||
523 | anything like this. You must also be extra careful about accessing | ||
524 | memory you intend to sync partially. | ||
525 | |||
526 | void | ||
527 | dma_cache_sync(struct device *dev, void *vaddr, size_t size, | 477 | dma_cache_sync(struct device *dev, void *vaddr, size_t size, |
528 | enum dma_data_direction direction) | 478 | enum dma_data_direction direction) |
529 | 479 | ||
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index f9a6e2c75f12..1b2dd4fc3db2 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl | |||
@@ -45,7 +45,7 @@ | |||
45 | </sect1> | 45 | </sect1> |
46 | 46 | ||
47 | <sect1><title>Atomic and pointer manipulation</title> | 47 | <sect1><title>Atomic and pointer manipulation</title> |
48 | !Iarch/x86/include/asm/atomic_32.h | 48 | !Iarch/x86/include/asm/atomic.h |
49 | !Iarch/x86/include/asm/unaligned.h | 49 | !Iarch/x86/include/asm/unaligned.h |
50 | </sect1> | 50 | </sect1> |
51 | 51 | ||
diff --git a/Documentation/DocBook/deviceiobook.tmpl b/Documentation/DocBook/deviceiobook.tmpl index 3ed88126ab8f..c1ed6a49e598 100644 --- a/Documentation/DocBook/deviceiobook.tmpl +++ b/Documentation/DocBook/deviceiobook.tmpl | |||
@@ -316,7 +316,7 @@ CPU B: spin_unlock_irqrestore(&dev_lock, flags) | |||
316 | 316 | ||
317 | <chapter id="pubfunctions"> | 317 | <chapter id="pubfunctions"> |
318 | <title>Public Functions Provided</title> | 318 | <title>Public Functions Provided</title> |
319 | !Iarch/x86/include/asm/io_32.h | 319 | !Iarch/x86/include/asm/io.h |
320 | !Elib/iomap.c | 320 | !Elib/iomap.c |
321 | </chapter> | 321 | </chapter> |
322 | 322 | ||
diff --git a/Documentation/DocBook/mac80211.tmpl b/Documentation/DocBook/mac80211.tmpl index f3f37f141dbd..affb15a344a1 100644 --- a/Documentation/DocBook/mac80211.tmpl +++ b/Documentation/DocBook/mac80211.tmpl | |||
@@ -144,7 +144,7 @@ usage should require reading the full document. | |||
144 | this though and the recommendation to allow only a single | 144 | this though and the recommendation to allow only a single |
145 | interface in STA mode at first! | 145 | interface in STA mode at first! |
146 | </para> | 146 | </para> |
147 | !Finclude/net/mac80211.h ieee80211_if_init_conf | 147 | !Finclude/net/mac80211.h ieee80211_vif |
148 | </chapter> | 148 | </chapter> |
149 | 149 | ||
150 | <chapter id="rx-tx"> | 150 | <chapter id="rx-tx"> |
@@ -234,7 +234,6 @@ usage should require reading the full document. | |||
234 | <title>Multiple queues and QoS support</title> | 234 | <title>Multiple queues and QoS support</title> |
235 | <para>TBD</para> | 235 | <para>TBD</para> |
236 | !Finclude/net/mac80211.h ieee80211_tx_queue_params | 236 | !Finclude/net/mac80211.h ieee80211_tx_queue_params |
237 | !Finclude/net/mac80211.h ieee80211_tx_queue_stats | ||
238 | </chapter> | 237 | </chapter> |
239 | 238 | ||
240 | <chapter id="AP"> | 239 | <chapter id="AP"> |
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl index 5e7d84b48505..133cd6c3f3c1 100644 --- a/Documentation/DocBook/mtdnand.tmpl +++ b/Documentation/DocBook/mtdnand.tmpl | |||
@@ -488,7 +488,7 @@ static void board_select_chip (struct mtd_info *mtd, int chip) | |||
488 | The ECC bytes must be placed immidiately after the data | 488 | The ECC bytes must be placed immidiately after the data |
489 | bytes in order to make the syndrome generator work. This | 489 | bytes in order to make the syndrome generator work. This |
490 | is contrary to the usual layout used by software ECC. The | 490 | is contrary to the usual layout used by software ECC. The |
491 | seperation of data and out of band area is not longer | 491 | separation of data and out of band area is not longer |
492 | possible. The nand driver code handles this layout and | 492 | possible. The nand driver code handles this layout and |
493 | the remaining free bytes in the oob area are managed by | 493 | the remaining free bytes in the oob area are managed by |
494 | the autoplacement code. Provide a matching oob-layout | 494 | the autoplacement code. Provide a matching oob-layout |
@@ -560,7 +560,7 @@ static void board_select_chip (struct mtd_info *mtd, int chip) | |||
560 | bad blocks. They have factory marked good blocks. The marker pattern | 560 | bad blocks. They have factory marked good blocks. The marker pattern |
561 | is erased when the block is erased to be reused. So in case of | 561 | is erased when the block is erased to be reused. So in case of |
562 | powerloss before writing the pattern back to the chip this block | 562 | powerloss before writing the pattern back to the chip this block |
563 | would be lost and added to the bad blocks. Therefor we scan the | 563 | would be lost and added to the bad blocks. Therefore we scan the |
564 | chip(s) when we detect them the first time for good blocks and | 564 | chip(s) when we detect them the first time for good blocks and |
565 | store this information in a bad block table before erasing any | 565 | store this information in a bad block table before erasing any |
566 | of the blocks. | 566 | of the blocks. |
@@ -1094,7 +1094,7 @@ in this page</entry> | |||
1094 | manufacturers specifications. This applies similar to the spare area. | 1094 | manufacturers specifications. This applies similar to the spare area. |
1095 | </para> | 1095 | </para> |
1096 | <para> | 1096 | <para> |
1097 | Therefor NAND aware filesystems must either write in page size chunks | 1097 | Therefore NAND aware filesystems must either write in page size chunks |
1098 | or hold a writebuffer to collect smaller writes until they sum up to | 1098 | or hold a writebuffer to collect smaller writes until they sum up to |
1099 | pagesize. Available NAND aware filesystems: JFFS2, YAFFS. | 1099 | pagesize. Available NAND aware filesystems: JFFS2, YAFFS. |
1100 | </para> | 1100 | </para> |
diff --git a/Documentation/DocBook/v4l/common.xml b/Documentation/DocBook/v4l/common.xml index c65f0ac9b6ee..cea23e1c4fc6 100644 --- a/Documentation/DocBook/v4l/common.xml +++ b/Documentation/DocBook/v4l/common.xml | |||
@@ -1170,7 +1170,7 @@ frames per second. If less than this number of frames is to be | |||
1170 | captured or output, applications can request frame skipping or | 1170 | captured or output, applications can request frame skipping or |
1171 | duplicating on the driver side. This is especially useful when using | 1171 | duplicating on the driver side. This is especially useful when using |
1172 | the &func-read; or &func-write;, which are not augmented by timestamps | 1172 | the &func-read; or &func-write;, which are not augmented by timestamps |
1173 | or sequence counters, and to avoid unneccessary data copying.</para> | 1173 | or sequence counters, and to avoid unnecessary data copying.</para> |
1174 | 1174 | ||
1175 | <para>Finally these ioctls can be used to determine the number of | 1175 | <para>Finally these ioctls can be used to determine the number of |
1176 | buffers used internally by a driver in read/write mode. For | 1176 | buffers used internally by a driver in read/write mode. For |
diff --git a/Documentation/DocBook/v4l/io.xml b/Documentation/DocBook/v4l/io.xml index f92f24323b2a..e870330cbf77 100644 --- a/Documentation/DocBook/v4l/io.xml +++ b/Documentation/DocBook/v4l/io.xml | |||
@@ -589,7 +589,8 @@ number of a video input as in &v4l2-input; field | |||
589 | <entry></entry> | 589 | <entry></entry> |
590 | <entry>A place holder for future extensions and custom | 590 | <entry>A place holder for future extensions and custom |
591 | (driver defined) buffer types | 591 | (driver defined) buffer types |
592 | <constant>V4L2_BUF_TYPE_PRIVATE</constant> and higher.</entry> | 592 | <constant>V4L2_BUF_TYPE_PRIVATE</constant> and higher. Applications |
593 | should set this to 0.</entry> | ||
593 | </row> | 594 | </row> |
594 | </tbody> | 595 | </tbody> |
595 | </tgroup> | 596 | </tgroup> |
diff --git a/Documentation/DocBook/v4l/vidioc-g-parm.xml b/Documentation/DocBook/v4l/vidioc-g-parm.xml index 78332d365ce9..392aa9e5571e 100644 --- a/Documentation/DocBook/v4l/vidioc-g-parm.xml +++ b/Documentation/DocBook/v4l/vidioc-g-parm.xml | |||
@@ -55,7 +55,7 @@ captured or output, applications can request frame skipping or | |||
55 | duplicating on the driver side. This is especially useful when using | 55 | duplicating on the driver side. This is especially useful when using |
56 | the <function>read()</function> or <function>write()</function>, which | 56 | the <function>read()</function> or <function>write()</function>, which |
57 | are not augmented by timestamps or sequence counters, and to avoid | 57 | are not augmented by timestamps or sequence counters, and to avoid |
58 | unneccessary data copying.</para> | 58 | unnecessary data copying.</para> |
59 | 59 | ||
60 | <para>Further these ioctls can be used to determine the number of | 60 | <para>Further these ioctls can be used to determine the number of |
61 | buffers used internally by a driver in read/write mode. For | 61 | buffers used internally by a driver in read/write mode. For |
diff --git a/Documentation/DocBook/v4l/vidioc-qbuf.xml b/Documentation/DocBook/v4l/vidioc-qbuf.xml index 187081778154..b843bd7b3897 100644 --- a/Documentation/DocBook/v4l/vidioc-qbuf.xml +++ b/Documentation/DocBook/v4l/vidioc-qbuf.xml | |||
@@ -54,12 +54,10 @@ to enqueue an empty (capturing) or filled (output) buffer in the | |||
54 | driver's incoming queue. The semantics depend on the selected I/O | 54 | driver's incoming queue. The semantics depend on the selected I/O |
55 | method.</para> | 55 | method.</para> |
56 | 56 | ||
57 | <para>To enqueue a <link linkend="mmap">memory mapped</link> | 57 | <para>To enqueue a buffer applications set the <structfield>type</structfield> |
58 | buffer applications set the <structfield>type</structfield> field of a | 58 | field of a &v4l2-buffer; to the same buffer type as was previously used |
59 | &v4l2-buffer; to the same buffer type as previously &v4l2-format; | 59 | with &v4l2-format; <structfield>type</structfield> and &v4l2-requestbuffers; |
60 | <structfield>type</structfield> and &v4l2-requestbuffers; | 60 | <structfield>type</structfield>. Applications must also set the |
61 | <structfield>type</structfield>, the <structfield>memory</structfield> | ||
62 | field to <constant>V4L2_MEMORY_MMAP</constant> and the | ||
63 | <structfield>index</structfield> field. Valid index numbers range from | 61 | <structfield>index</structfield> field. Valid index numbers range from |
64 | zero to the number of buffers allocated with &VIDIOC-REQBUFS; | 62 | zero to the number of buffers allocated with &VIDIOC-REQBUFS; |
65 | (&v4l2-requestbuffers; <structfield>count</structfield>) minus one. The | 63 | (&v4l2-requestbuffers; <structfield>count</structfield>) minus one. The |
@@ -70,8 +68,19 @@ intended for output (<structfield>type</structfield> is | |||
70 | <constant>V4L2_BUF_TYPE_VBI_OUTPUT</constant>) applications must also | 68 | <constant>V4L2_BUF_TYPE_VBI_OUTPUT</constant>) applications must also |
71 | initialize the <structfield>bytesused</structfield>, | 69 | initialize the <structfield>bytesused</structfield>, |
72 | <structfield>field</structfield> and | 70 | <structfield>field</structfield> and |
73 | <structfield>timestamp</structfield> fields. See <xref | 71 | <structfield>timestamp</structfield> fields, see <xref |
74 | linkend="buffer" /> for details. When | 72 | linkend="buffer" /> for details. |
73 | Applications must also set <structfield>flags</structfield> to 0. If a driver | ||
74 | supports capturing from specific video inputs and you want to specify a video | ||
75 | input, then <structfield>flags</structfield> should be set to | ||
76 | <constant>V4L2_BUF_FLAG_INPUT</constant> and the field | ||
77 | <structfield>input</structfield> must be initialized to the desired input. | ||
78 | The <structfield>reserved</structfield> field must be set to 0. | ||
79 | </para> | ||
80 | |||
81 | <para>To enqueue a <link linkend="mmap">memory mapped</link> | ||
82 | buffer applications set the <structfield>memory</structfield> | ||
83 | field to <constant>V4L2_MEMORY_MMAP</constant>. When | ||
75 | <constant>VIDIOC_QBUF</constant> is called with a pointer to this | 84 | <constant>VIDIOC_QBUF</constant> is called with a pointer to this |
76 | structure the driver sets the | 85 | structure the driver sets the |
77 | <constant>V4L2_BUF_FLAG_MAPPED</constant> and | 86 | <constant>V4L2_BUF_FLAG_MAPPED</constant> and |
@@ -81,14 +90,10 @@ structure the driver sets the | |||
81 | &EINVAL;.</para> | 90 | &EINVAL;.</para> |
82 | 91 | ||
83 | <para>To enqueue a <link linkend="userp">user pointer</link> | 92 | <para>To enqueue a <link linkend="userp">user pointer</link> |
84 | buffer applications set the <structfield>type</structfield> field of a | 93 | buffer applications set the <structfield>memory</structfield> |
85 | &v4l2-buffer; to the same buffer type as previously &v4l2-format; | 94 | field to <constant>V4L2_MEMORY_USERPTR</constant>, the |
86 | <structfield>type</structfield> and &v4l2-requestbuffers; | ||
87 | <structfield>type</structfield>, the <structfield>memory</structfield> | ||
88 | field to <constant>V4L2_MEMORY_USERPTR</constant> and the | ||
89 | <structfield>m.userptr</structfield> field to the address of the | 95 | <structfield>m.userptr</structfield> field to the address of the |
90 | buffer and <structfield>length</structfield> to its size. When the | 96 | buffer and <structfield>length</structfield> to its size. |
91 | buffer is intended for output additional fields must be set as above. | ||
92 | When <constant>VIDIOC_QBUF</constant> is called with a pointer to this | 97 | When <constant>VIDIOC_QBUF</constant> is called with a pointer to this |
93 | structure the driver sets the <constant>V4L2_BUF_FLAG_QUEUED</constant> | 98 | structure the driver sets the <constant>V4L2_BUF_FLAG_QUEUED</constant> |
94 | flag and clears the <constant>V4L2_BUF_FLAG_MAPPED</constant> and | 99 | flag and clears the <constant>V4L2_BUF_FLAG_MAPPED</constant> and |
@@ -96,13 +101,14 @@ flag and clears the <constant>V4L2_BUF_FLAG_MAPPED</constant> and | |||
96 | <structfield>flags</structfield> field, or it returns an error code. | 101 | <structfield>flags</structfield> field, or it returns an error code. |
97 | This ioctl locks the memory pages of the buffer in physical memory, | 102 | This ioctl locks the memory pages of the buffer in physical memory, |
98 | they cannot be swapped out to disk. Buffers remain locked until | 103 | they cannot be swapped out to disk. Buffers remain locked until |
99 | dequeued, until the &VIDIOC-STREAMOFF; or &VIDIOC-REQBUFS; ioctl are | 104 | dequeued, until the &VIDIOC-STREAMOFF; or &VIDIOC-REQBUFS; ioctl is |
100 | called, or until the device is closed.</para> | 105 | called, or until the device is closed.</para> |
101 | 106 | ||
102 | <para>Applications call the <constant>VIDIOC_DQBUF</constant> | 107 | <para>Applications call the <constant>VIDIOC_DQBUF</constant> |
103 | ioctl to dequeue a filled (capturing) or displayed (output) buffer | 108 | ioctl to dequeue a filled (capturing) or displayed (output) buffer |
104 | from the driver's outgoing queue. They just set the | 109 | from the driver's outgoing queue. They just set the |
105 | <structfield>type</structfield> and <structfield>memory</structfield> | 110 | <structfield>type</structfield>, <structfield>memory</structfield> |
111 | and <structfield>reserved</structfield> | ||
106 | fields of a &v4l2-buffer; as above, when <constant>VIDIOC_DQBUF</constant> | 112 | fields of a &v4l2-buffer; as above, when <constant>VIDIOC_DQBUF</constant> |
107 | is called with a pointer to this structure the driver fills the | 113 | is called with a pointer to this structure the driver fills the |
108 | remaining fields or returns an error code.</para> | 114 | remaining fields or returns an error code.</para> |
diff --git a/Documentation/DocBook/v4l/vidioc-querybuf.xml b/Documentation/DocBook/v4l/vidioc-querybuf.xml index d834993e6191..e649805a4908 100644 --- a/Documentation/DocBook/v4l/vidioc-querybuf.xml +++ b/Documentation/DocBook/v4l/vidioc-querybuf.xml | |||
@@ -54,12 +54,13 @@ buffer at any time after buffers have been allocated with the | |||
54 | &VIDIOC-REQBUFS; ioctl.</para> | 54 | &VIDIOC-REQBUFS; ioctl.</para> |
55 | 55 | ||
56 | <para>Applications set the <structfield>type</structfield> field | 56 | <para>Applications set the <structfield>type</structfield> field |
57 | of a &v4l2-buffer; to the same buffer type as previously | 57 | of a &v4l2-buffer; to the same buffer type as was previously used with |
58 | &v4l2-format; <structfield>type</structfield> and &v4l2-requestbuffers; | 58 | &v4l2-format; <structfield>type</structfield> and &v4l2-requestbuffers; |
59 | <structfield>type</structfield>, and the <structfield>index</structfield> | 59 | <structfield>type</structfield>, and the <structfield>index</structfield> |
60 | field. Valid index numbers range from zero | 60 | field. Valid index numbers range from zero |
61 | to the number of buffers allocated with &VIDIOC-REQBUFS; | 61 | to the number of buffers allocated with &VIDIOC-REQBUFS; |
62 | (&v4l2-requestbuffers; <structfield>count</structfield>) minus one. | 62 | (&v4l2-requestbuffers; <structfield>count</structfield>) minus one. |
63 | The <structfield>reserved</structfield> field should to set to 0. | ||
63 | After calling <constant>VIDIOC_QUERYBUF</constant> with a pointer to | 64 | After calling <constant>VIDIOC_QUERYBUF</constant> with a pointer to |
64 | this structure drivers return an error code or fill the rest of | 65 | this structure drivers return an error code or fill the rest of |
65 | the structure.</para> | 66 | the structure.</para> |
@@ -68,8 +69,8 @@ the structure.</para> | |||
68 | <constant>V4L2_BUF_FLAG_MAPPED</constant>, | 69 | <constant>V4L2_BUF_FLAG_MAPPED</constant>, |
69 | <constant>V4L2_BUF_FLAG_QUEUED</constant> and | 70 | <constant>V4L2_BUF_FLAG_QUEUED</constant> and |
70 | <constant>V4L2_BUF_FLAG_DONE</constant> flags will be valid. The | 71 | <constant>V4L2_BUF_FLAG_DONE</constant> flags will be valid. The |
71 | <structfield>memory</structfield> field will be set to | 72 | <structfield>memory</structfield> field will be set to the current |
72 | <constant>V4L2_MEMORY_MMAP</constant>, the <structfield>m.offset</structfield> | 73 | I/O method, the <structfield>m.offset</structfield> |
73 | contains the offset of the buffer from the start of the device memory, | 74 | contains the offset of the buffer from the start of the device memory, |
74 | the <structfield>length</structfield> field its size. The driver may | 75 | the <structfield>length</structfield> field its size. The driver may |
75 | or may not set the remaining fields and flags, they are meaningless in | 76 | or may not set the remaining fields and flags, they are meaningless in |
diff --git a/Documentation/DocBook/v4l/vidioc-reqbufs.xml b/Documentation/DocBook/v4l/vidioc-reqbufs.xml index bab38084454f..1c0816372074 100644 --- a/Documentation/DocBook/v4l/vidioc-reqbufs.xml +++ b/Documentation/DocBook/v4l/vidioc-reqbufs.xml | |||
@@ -54,23 +54,23 @@ I/O. Memory mapped buffers are located in device memory and must be | |||
54 | allocated with this ioctl before they can be mapped into the | 54 | allocated with this ioctl before they can be mapped into the |
55 | application's address space. User buffers are allocated by | 55 | application's address space. User buffers are allocated by |
56 | applications themselves, and this ioctl is merely used to switch the | 56 | applications themselves, and this ioctl is merely used to switch the |
57 | driver into user pointer I/O mode.</para> | 57 | driver into user pointer I/O mode and to setup some internal structures.</para> |
58 | 58 | ||
59 | <para>To allocate device buffers applications initialize three | 59 | <para>To allocate device buffers applications initialize all |
60 | fields of a <structname>v4l2_requestbuffers</structname> structure. | 60 | fields of the <structname>v4l2_requestbuffers</structname> structure. |
61 | They set the <structfield>type</structfield> field to the respective | 61 | They set the <structfield>type</structfield> field to the respective |
62 | stream or buffer type, the <structfield>count</structfield> field to | 62 | stream or buffer type, the <structfield>count</structfield> field to |
63 | the desired number of buffers, and <structfield>memory</structfield> | 63 | the desired number of buffers, <structfield>memory</structfield> |
64 | must be set to <constant>V4L2_MEMORY_MMAP</constant>. When the ioctl | 64 | must be set to the requested I/O method and the reserved array |
65 | is called with a pointer to this structure the driver attempts to | 65 | must be zeroed. When the ioctl |
66 | allocate the requested number of buffers and stores the actual number | 66 | is called with a pointer to this structure the driver will attempt to allocate |
67 | the requested number of buffers and it stores the actual number | ||
67 | allocated in the <structfield>count</structfield> field. It can be | 68 | allocated in the <structfield>count</structfield> field. It can be |
68 | smaller than the number requested, even zero, when the driver runs out | 69 | smaller than the number requested, even zero, when the driver runs out |
69 | of free memory. A larger number is possible when the driver requires | 70 | of free memory. A larger number is also possible when the driver requires |
70 | more buffers to function correctly.<footnote> | 71 | more buffers to function correctly. For example video output requires at least two buffers, |
71 | <para>For example video output requires at least two buffers, | ||
72 | one displayed and one filled by the application.</para> | 72 | one displayed and one filled by the application.</para> |
73 | </footnote> When memory mapping I/O is not supported the ioctl | 73 | <para>When the I/O method is not supported the ioctl |
74 | returns an &EINVAL;.</para> | 74 | returns an &EINVAL;.</para> |
75 | 75 | ||
76 | <para>Applications can call <constant>VIDIOC_REQBUFS</constant> | 76 | <para>Applications can call <constant>VIDIOC_REQBUFS</constant> |
@@ -81,14 +81,6 @@ in progress, an implicit &VIDIOC-STREAMOFF;. <!-- mhs: I see no | |||
81 | reason why munmap()ping one or even all buffers must imply | 81 | reason why munmap()ping one or even all buffers must imply |
82 | streamoff.--></para> | 82 | streamoff.--></para> |
83 | 83 | ||
84 | <para>To negotiate user pointer I/O, applications initialize only | ||
85 | the <structfield>type</structfield> field and set | ||
86 | <structfield>memory</structfield> to | ||
87 | <constant>V4L2_MEMORY_USERPTR</constant>. When the ioctl is called | ||
88 | with a pointer to this structure the driver prepares for user pointer | ||
89 | I/O, when this I/O method is not supported the ioctl returns an | ||
90 | &EINVAL;.</para> | ||
91 | |||
92 | <table pgwide="1" frame="none" id="v4l2-requestbuffers"> | 84 | <table pgwide="1" frame="none" id="v4l2-requestbuffers"> |
93 | <title>struct <structname>v4l2_requestbuffers</structname></title> | 85 | <title>struct <structname>v4l2_requestbuffers</structname></title> |
94 | <tgroup cols="3"> | 86 | <tgroup cols="3"> |
@@ -97,9 +89,7 @@ I/O, when this I/O method is not supported the ioctl returns an | |||
97 | <row> | 89 | <row> |
98 | <entry>__u32</entry> | 90 | <entry>__u32</entry> |
99 | <entry><structfield>count</structfield></entry> | 91 | <entry><structfield>count</structfield></entry> |
100 | <entry>The number of buffers requested or granted. This | 92 | <entry>The number of buffers requested or granted.</entry> |
101 | field is only used when <structfield>memory</structfield> is set to | ||
102 | <constant>V4L2_MEMORY_MMAP</constant>.</entry> | ||
103 | </row> | 93 | </row> |
104 | <row> | 94 | <row> |
105 | <entry>&v4l2-buf-type;</entry> | 95 | <entry>&v4l2-buf-type;</entry> |
@@ -120,7 +110,7 @@ as the &v4l2-format; <structfield>type</structfield> field. See <xref | |||
120 | <entry><structfield>reserved</structfield>[2]</entry> | 110 | <entry><structfield>reserved</structfield>[2]</entry> |
121 | <entry>A place holder for future extensions and custom | 111 | <entry>A place holder for future extensions and custom |
122 | (driver defined) buffer types <constant>V4L2_BUF_TYPE_PRIVATE</constant> and | 112 | (driver defined) buffer types <constant>V4L2_BUF_TYPE_PRIVATE</constant> and |
123 | higher.</entry> | 113 | higher. This array should be zeroed by applications.</entry> |
124 | </row> | 114 | </row> |
125 | </tbody> | 115 | </tbody> |
126 | </tgroup> | 116 | </tgroup> |
diff --git a/Documentation/HOWTO b/Documentation/HOWTO index 8495fc970391..f5395af88a41 100644 --- a/Documentation/HOWTO +++ b/Documentation/HOWTO | |||
@@ -221,8 +221,8 @@ branches. These different branches are: | |||
221 | - main 2.6.x kernel tree | 221 | - main 2.6.x kernel tree |
222 | - 2.6.x.y -stable kernel tree | 222 | - 2.6.x.y -stable kernel tree |
223 | - 2.6.x -git kernel patches | 223 | - 2.6.x -git kernel patches |
224 | - 2.6.x -mm kernel patches | ||
225 | - subsystem specific kernel trees and patches | 224 | - subsystem specific kernel trees and patches |
225 | - the 2.6.x -next kernel tree for integration tests | ||
226 | 226 | ||
227 | 2.6.x kernel tree | 227 | 2.6.x kernel tree |
228 | ----------------- | 228 | ----------------- |
@@ -232,7 +232,7 @@ process is as follows: | |||
232 | - As soon as a new kernel is released a two weeks window is open, | 232 | - As soon as a new kernel is released a two weeks window is open, |
233 | during this period of time maintainers can submit big diffs to | 233 | during this period of time maintainers can submit big diffs to |
234 | Linus, usually the patches that have already been included in the | 234 | Linus, usually the patches that have already been included in the |
235 | -mm kernel for a few weeks. The preferred way to submit big changes | 235 | -next kernel for a few weeks. The preferred way to submit big changes |
236 | is using git (the kernel's source management tool, more information | 236 | is using git (the kernel's source management tool, more information |
237 | can be found at http://git.or.cz/) but plain patches are also just | 237 | can be found at http://git.or.cz/) but plain patches are also just |
238 | fine. | 238 | fine. |
@@ -293,84 +293,43 @@ daily and represent the current state of Linus' tree. They are more | |||
293 | experimental than -rc kernels since they are generated automatically | 293 | experimental than -rc kernels since they are generated automatically |
294 | without even a cursory glance to see if they are sane. | 294 | without even a cursory glance to see if they are sane. |
295 | 295 | ||
296 | 2.6.x -mm kernel patches | ||
297 | ------------------------ | ||
298 | These are experimental kernel patches released by Andrew Morton. Andrew | ||
299 | takes all of the different subsystem kernel trees and patches and mushes | ||
300 | them together, along with a lot of patches that have been plucked from | ||
301 | the linux-kernel mailing list. This tree serves as a proving ground for | ||
302 | new features and patches. Once a patch has proved its worth in -mm for | ||
303 | a while Andrew or the subsystem maintainer pushes it on to Linus for | ||
304 | inclusion in mainline. | ||
305 | |||
306 | It is heavily encouraged that all new patches get tested in the -mm tree | ||
307 | before they are sent to Linus for inclusion in the main kernel tree. Code | ||
308 | which does not make an appearance in -mm before the opening of the merge | ||
309 | window will prove hard to merge into the mainline. | ||
310 | |||
311 | These kernels are not appropriate for use on systems that are supposed | ||
312 | to be stable and they are more risky to run than any of the other | ||
313 | branches. | ||
314 | |||
315 | If you wish to help out with the kernel development process, please test | ||
316 | and use these kernel releases and provide feedback to the linux-kernel | ||
317 | mailing list if you have any problems, and if everything works properly. | ||
318 | |||
319 | In addition to all the other experimental patches, these kernels usually | ||
320 | also contain any changes in the mainline -git kernels available at the | ||
321 | time of release. | ||
322 | |||
323 | The -mm kernels are not released on a fixed schedule, but usually a few | ||
324 | -mm kernels are released in between each -rc kernel (1 to 3 is common). | ||
325 | |||
326 | Subsystem Specific kernel trees and patches | 296 | Subsystem Specific kernel trees and patches |
327 | ------------------------------------------- | 297 | ------------------------------------------- |
328 | A number of the different kernel subsystem developers expose their | 298 | The maintainers of the various kernel subsystems --- and also many |
329 | development trees so that others can see what is happening in the | 299 | kernel subsystem developers --- expose their current state of |
330 | different areas of the kernel. These trees are pulled into the -mm | 300 | development in source repositories. That way, others can see what is |
331 | kernel releases as described above. | 301 | happening in the different areas of the kernel. In areas where |
332 | 302 | development is rapid, a developer may be asked to base his submissions | |
333 | Here is a list of some of the different kernel trees available: | 303 | onto such a subsystem kernel tree so that conflicts between the |
334 | git trees: | 304 | submission and other already ongoing work are avoided. |
335 | - Kbuild development tree, Sam Ravnborg <sam@ravnborg.org> | 305 | |
336 | git.kernel.org:/pub/scm/linux/kernel/git/sam/kbuild.git | 306 | Most of these repositories are git trees, but there are also other SCMs |
337 | 307 | in use, or patch queues being published as quilt series. Addresses of | |
338 | - ACPI development tree, Len Brown <len.brown@intel.com> | 308 | these subsystem repositories are listed in the MAINTAINERS file. Many |
339 | git.kernel.org:/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git | 309 | of them can be browsed at http://git.kernel.org/. |
340 | 310 | ||
341 | - Block development tree, Jens Axboe <jens.axboe@oracle.com> | 311 | Before a proposed patch is committed to such a subsystem tree, it is |
342 | git.kernel.org:/pub/scm/linux/kernel/git/axboe/linux-2.6-block.git | 312 | subject to review which primarily happens on mailing lists (see the |
343 | 313 | respective section below). For several kernel subsystems, this review | |
344 | - DRM development tree, Dave Airlie <airlied@linux.ie> | 314 | process is tracked with the tool patchwork. Patchwork offers a web |
345 | git.kernel.org:/pub/scm/linux/kernel/git/airlied/drm-2.6.git | 315 | interface which shows patch postings, any comments on a patch or |
346 | 316 | revisions to it, and maintainers can mark patches as under review, | |
347 | - ia64 development tree, Tony Luck <tony.luck@intel.com> | 317 | accepted, or rejected. Most of these patchwork sites are listed at |
348 | git.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6.git | 318 | http://patchwork.kernel.org/ or http://patchwork.ozlabs.org/. |
349 | 319 | ||
350 | - infiniband, Roland Dreier <rolandd@cisco.com> | 320 | 2.6.x -next kernel tree for integration tests |
351 | git.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband.git | 321 | --------------------------------------------- |
352 | 322 | Before updates from subsystem trees are merged into the mainline 2.6.x | |
353 | - libata, Jeff Garzik <jgarzik@pobox.com> | 323 | tree, they need to be integration-tested. For this purpose, a special |
354 | git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev.git | 324 | testing repository exists into which virtually all subsystem trees are |
355 | 325 | pulled on an almost daily basis: | |
356 | - network drivers, Jeff Garzik <jgarzik@pobox.com> | 326 | http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git |
357 | git.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git | 327 | http://linux.f-seidel.de/linux-next/pmwiki/ |
358 | 328 | ||
359 | - pcmcia, Dominik Brodowski <linux@dominikbrodowski.net> | 329 | This way, the -next kernel gives a summary outlook onto what will be |
360 | git.kernel.org:/pub/scm/linux/kernel/git/brodo/pcmcia-2.6.git | 330 | expected to go into the mainline kernel at the next merge period. |
361 | 331 | Adventurous testers are very welcome to runtime-test the -next kernel. | |
362 | - SCSI, James Bottomley <James.Bottomley@hansenpartnership.com> | ||
363 | git.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6.git | ||
364 | |||
365 | - x86, Ingo Molnar <mingo@elte.hu> | ||
366 | git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86.git | ||
367 | |||
368 | quilt trees: | ||
369 | - USB, Driver Core, and I2C, Greg Kroah-Hartman <gregkh@suse.de> | ||
370 | kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/ | ||
371 | 332 | ||
372 | Other kernel trees can be found listed at http://git.kernel.org/ and in | ||
373 | the MAINTAINERS file. | ||
374 | 333 | ||
375 | Bug Reporting | 334 | Bug Reporting |
376 | ------------- | 335 | ------------- |
diff --git a/Documentation/IPMI.txt b/Documentation/IPMI.txt index bc38283379f0..69dd29ed824e 100644 --- a/Documentation/IPMI.txt +++ b/Documentation/IPMI.txt | |||
@@ -365,6 +365,7 @@ You can change this at module load time (for a module) with: | |||
365 | regshifts=<shift1>,<shift2>,... | 365 | regshifts=<shift1>,<shift2>,... |
366 | slave_addrs=<addr1>,<addr2>,... | 366 | slave_addrs=<addr1>,<addr2>,... |
367 | force_kipmid=<enable1>,<enable2>,... | 367 | force_kipmid=<enable1>,<enable2>,... |
368 | kipmid_max_busy_us=<ustime1>,<ustime2>,... | ||
368 | unload_when_empty=[0|1] | 369 | unload_when_empty=[0|1] |
369 | 370 | ||
370 | Each of these except si_trydefaults is a list, the first item for the | 371 | Each of these except si_trydefaults is a list, the first item for the |
@@ -433,6 +434,7 @@ kernel command line as: | |||
433 | ipmi_si.regshifts=<shift1>,<shift2>,... | 434 | ipmi_si.regshifts=<shift1>,<shift2>,... |
434 | ipmi_si.slave_addrs=<addr1>,<addr2>,... | 435 | ipmi_si.slave_addrs=<addr1>,<addr2>,... |
435 | ipmi_si.force_kipmid=<enable1>,<enable2>,... | 436 | ipmi_si.force_kipmid=<enable1>,<enable2>,... |
437 | ipmi_si.kipmid_max_busy_us=<ustime1>,<ustime2>,... | ||
436 | 438 | ||
437 | It works the same as the module parameters of the same names. | 439 | It works the same as the module parameters of the same names. |
438 | 440 | ||
@@ -450,6 +452,16 @@ force this thread on or off. If you force it off and don't have | |||
450 | interrupts, the driver will run VERY slowly. Don't blame me, | 452 | interrupts, the driver will run VERY slowly. Don't blame me, |
451 | these interfaces suck. | 453 | these interfaces suck. |
452 | 454 | ||
455 | Unfortunately, this thread can use a lot of CPU depending on the | ||
456 | interface's performance. This can waste a lot of CPU and cause | ||
457 | various issues with detecting idle CPU and using extra power. To | ||
458 | avoid this, the kipmid_max_busy_us sets the maximum amount of time, in | ||
459 | microseconds, that kipmid will spin before sleeping for a tick. This | ||
460 | value sets a balance between performance and CPU waste and needs to be | ||
461 | tuned to your needs. Maybe, someday, auto-tuning will be added, but | ||
462 | that's not a simple thing and even the auto-tuning would need to be | ||
463 | tuned to the user's desired performance. | ||
464 | |||
453 | The driver supports a hot add and remove of interfaces. This way, | 465 | The driver supports a hot add and remove of interfaces. This way, |
454 | interfaces can be added or removed after the kernel is up and running. | 466 | interfaces can be added or removed after the kernel is up and running. |
455 | This is done using /sys/modules/ipmi_si/parameters/hotmod, which is a | 467 | This is done using /sys/modules/ipmi_si/parameters/hotmod, which is a |
diff --git a/Documentation/Makefile b/Documentation/Makefile index 94b945733534..6fc7ea1d1f9d 100644 --- a/Documentation/Makefile +++ b/Documentation/Makefile | |||
@@ -1,3 +1,3 @@ | |||
1 | obj-m := DocBook/ accounting/ auxdisplay/ connector/ \ | 1 | obj-m := DocBook/ accounting/ auxdisplay/ connector/ \ |
2 | filesystems/configfs/ ia64/ networking/ \ | 2 | filesystems/ filesystems/configfs/ ia64/ laptops/ networking/ \ |
3 | pcmcia/ spi/ video4linux/ vm/ watchdog/src/ | 3 | pcmcia/ spi/ timers/ video4linux/ vm/ watchdog/src/ |
diff --git a/Documentation/PCI/PCI-DMA-mapping.txt b/Documentation/PCI/PCI-DMA-mapping.txt index ecad88d9fe59..52618ab069ad 100644 --- a/Documentation/PCI/PCI-DMA-mapping.txt +++ b/Documentation/PCI/PCI-DMA-mapping.txt | |||
@@ -1,12 +1,12 @@ | |||
1 | Dynamic DMA mapping | 1 | Dynamic DMA mapping Guide |
2 | =================== | 2 | ========================= |
3 | 3 | ||
4 | David S. Miller <davem@redhat.com> | 4 | David S. Miller <davem@redhat.com> |
5 | Richard Henderson <rth@cygnus.com> | 5 | Richard Henderson <rth@cygnus.com> |
6 | Jakub Jelinek <jakub@redhat.com> | 6 | Jakub Jelinek <jakub@redhat.com> |
7 | 7 | ||
8 | This document describes the DMA mapping system in terms of the pci_ | 8 | This is a guide to device driver writers on how to use the DMA API |
9 | API. For a similar API that works for generic devices, see | 9 | with example pseudo-code. For a concise description of the API, see |
10 | DMA-API.txt. | 10 | DMA-API.txt. |
11 | 11 | ||
12 | Most of the 64bit platforms have special hardware that translates bus | 12 | Most of the 64bit platforms have special hardware that translates bus |
@@ -26,12 +26,15 @@ mapped only for the time they are actually used and unmapped after the DMA | |||
26 | transfer. | 26 | transfer. |
27 | 27 | ||
28 | The following API will work of course even on platforms where no such | 28 | The following API will work of course even on platforms where no such |
29 | hardware exists, see e.g. arch/x86/include/asm/pci.h for how it is implemented on | 29 | hardware exists. |
30 | top of the virt_to_bus interface. | 30 | |
31 | Note that the DMA API works with any bus independent of the underlying | ||
32 | microprocessor architecture. You should use the DMA API rather than | ||
33 | the bus specific DMA API (e.g. pci_dma_*). | ||
31 | 34 | ||
32 | First of all, you should make sure | 35 | First of all, you should make sure |
33 | 36 | ||
34 | #include <linux/pci.h> | 37 | #include <linux/dma-mapping.h> |
35 | 38 | ||
36 | is in your driver. This file will obtain for you the definition of the | 39 | is in your driver. This file will obtain for you the definition of the |
37 | dma_addr_t (which can hold any valid DMA address for the platform) | 40 | dma_addr_t (which can hold any valid DMA address for the platform) |
@@ -78,44 +81,43 @@ for you to DMA from/to. | |||
78 | DMA addressing limitations | 81 | DMA addressing limitations |
79 | 82 | ||
80 | Does your device have any DMA addressing limitations? For example, is | 83 | Does your device have any DMA addressing limitations? For example, is |
81 | your device only capable of driving the low order 24-bits of address | 84 | your device only capable of driving the low order 24-bits of address? |
82 | on the PCI bus for SAC DMA transfers? If so, you need to inform the | 85 | If so, you need to inform the kernel of this fact. |
83 | PCI layer of this fact. | ||
84 | 86 | ||
85 | By default, the kernel assumes that your device can address the full | 87 | By default, the kernel assumes that your device can address the full |
86 | 32-bits in a SAC cycle. For a 64-bit DAC capable device, this needs | 88 | 32-bits. For a 64-bit capable device, this needs to be increased. |
87 | to be increased. And for a device with limitations, as discussed in | 89 | And for a device with limitations, as discussed in the previous |
88 | the previous paragraph, it needs to be decreased. | 90 | paragraph, it needs to be decreased. |
89 | 91 | ||
90 | pci_alloc_consistent() by default will return 32-bit DMA addresses. | 92 | Special note about PCI: PCI-X specification requires PCI-X devices to |
91 | PCI-X specification requires PCI-X devices to support 64-bit | 93 | support 64-bit addressing (DAC) for all transactions. And at least |
92 | addressing (DAC) for all transactions. And at least one platform (SGI | 94 | one platform (SGI SN2) requires 64-bit consistent allocations to |
93 | SN2) requires 64-bit consistent allocations to operate correctly when | 95 | operate correctly when the IO bus is in PCI-X mode. |
94 | the IO bus is in PCI-X mode. Therefore, like with pci_set_dma_mask(), | 96 | |
95 | it's good practice to call pci_set_consistent_dma_mask() to set the | 97 | For correct operation, you must interrogate the kernel in your device |
96 | appropriate mask even if your device only supports 32-bit DMA | 98 | probe routine to see if the DMA controller on the machine can properly |
97 | (default) and especially if it's a PCI-X device. | 99 | support the DMA addressing limitation your device has. It is good |
98 | 100 | style to do this even if your device holds the default setting, | |
99 | For correct operation, you must interrogate the PCI layer in your | ||
100 | device probe routine to see if the PCI controller on the machine can | ||
101 | properly support the DMA addressing limitation your device has. It is | ||
102 | good style to do this even if your device holds the default setting, | ||
103 | because this shows that you did think about these issues wrt. your | 101 | because this shows that you did think about these issues wrt. your |
104 | device. | 102 | device. |
105 | 103 | ||
106 | The query is performed via a call to pci_set_dma_mask(): | 104 | The query is performed via a call to dma_set_mask(): |
107 | 105 | ||
108 | int pci_set_dma_mask(struct pci_dev *pdev, u64 device_mask); | 106 | int dma_set_mask(struct device *dev, u64 mask); |
109 | 107 | ||
110 | The query for consistent allocations is performed via a call to | 108 | The query for consistent allocations is performed via a call to |
111 | pci_set_consistent_dma_mask(): | 109 | dma_set_coherent_mask(): |
112 | 110 | ||
113 | int pci_set_consistent_dma_mask(struct pci_dev *pdev, u64 device_mask); | 111 | int dma_set_coherent_mask(struct device *dev, u64 mask); |
114 | 112 | ||
115 | Here, pdev is a pointer to the PCI device struct of your device, and | 113 | Here, dev is a pointer to the device struct of your device, and mask |
116 | device_mask is a bit mask describing which bits of a PCI address your | 114 | is a bit mask describing which bits of an address your device |
117 | device supports. It returns zero if your card can perform DMA | 115 | supports. It returns zero if your card can perform DMA properly on |
118 | properly on the machine given the address mask you provided. | 116 | the machine given the address mask you provided. In general, the |
117 | device struct of your device is embedded in the bus specific device | ||
118 | struct of your device. For example, a pointer to the device struct of | ||
119 | your PCI device is pdev->dev (pdev is a pointer to the PCI device | ||
120 | struct of your device). | ||
119 | 121 | ||
120 | If it returns non-zero, your device cannot perform DMA properly on | 122 | If it returns non-zero, your device cannot perform DMA properly on |
121 | this platform, and attempting to do so will result in undefined | 123 | this platform, and attempting to do so will result in undefined |
@@ -133,31 +135,30 @@ of your driver reports that performance is bad or that the device is not | |||
133 | even detected, you can ask them for the kernel messages to find out | 135 | even detected, you can ask them for the kernel messages to find out |
134 | exactly why. | 136 | exactly why. |
135 | 137 | ||
136 | The standard 32-bit addressing PCI device would do something like | 138 | The standard 32-bit addressing device would do something like this: |
137 | this: | ||
138 | 139 | ||
139 | if (pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { | 140 | if (dma_set_mask(dev, DMA_BIT_MASK(32))) { |
140 | printk(KERN_WARNING | 141 | printk(KERN_WARNING |
141 | "mydev: No suitable DMA available.\n"); | 142 | "mydev: No suitable DMA available.\n"); |
142 | goto ignore_this_device; | 143 | goto ignore_this_device; |
143 | } | 144 | } |
144 | 145 | ||
145 | Another common scenario is a 64-bit capable device. The approach | 146 | Another common scenario is a 64-bit capable device. The approach here |
146 | here is to try for 64-bit DAC addressing, but back down to a | 147 | is to try for 64-bit addressing, but back down to a 32-bit mask that |
147 | 32-bit mask should that fail. The PCI platform code may fail the | 148 | should not fail. The kernel may fail the 64-bit mask not because the |
148 | 64-bit mask not because the platform is not capable of 64-bit | 149 | platform is not capable of 64-bit addressing. Rather, it may fail in |
149 | addressing. Rather, it may fail in this case simply because | 150 | this case simply because 32-bit addressing is done more efficiently |
150 | 32-bit SAC addressing is done more efficiently than DAC addressing. | 151 | than 64-bit addressing. For example, Sparc64 PCI SAC addressing is |
151 | Sparc64 is one platform which behaves in this way. | 152 | more efficient than DAC addressing. |
152 | 153 | ||
153 | Here is how you would handle a 64-bit capable device which can drive | 154 | Here is how you would handle a 64-bit capable device which can drive |
154 | all 64-bits when accessing streaming DMA: | 155 | all 64-bits when accessing streaming DMA: |
155 | 156 | ||
156 | int using_dac; | 157 | int using_dac; |
157 | 158 | ||
158 | if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) { | 159 | if (!dma_set_mask(dev, DMA_BIT_MASK(64))) { |
159 | using_dac = 1; | 160 | using_dac = 1; |
160 | } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { | 161 | } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) { |
161 | using_dac = 0; | 162 | using_dac = 0; |
162 | } else { | 163 | } else { |
163 | printk(KERN_WARNING | 164 | printk(KERN_WARNING |
@@ -170,36 +171,36 @@ the case would look like this: | |||
170 | 171 | ||
171 | int using_dac, consistent_using_dac; | 172 | int using_dac, consistent_using_dac; |
172 | 173 | ||
173 | if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) { | 174 | if (!dma_set_mask(dev, DMA_BIT_MASK(64))) { |
174 | using_dac = 1; | 175 | using_dac = 1; |
175 | consistent_using_dac = 1; | 176 | consistent_using_dac = 1; |
176 | pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64)); | 177 | dma_set_coherent_mask(dev, DMA_BIT_MASK(64)); |
177 | } else if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(32))) { | 178 | } else if (!dma_set_mask(dev, DMA_BIT_MASK(32))) { |
178 | using_dac = 0; | 179 | using_dac = 0; |
179 | consistent_using_dac = 0; | 180 | consistent_using_dac = 0; |
180 | pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)); | 181 | dma_set_coherent_mask(dev, DMA_BIT_MASK(32)); |
181 | } else { | 182 | } else { |
182 | printk(KERN_WARNING | 183 | printk(KERN_WARNING |
183 | "mydev: No suitable DMA available.\n"); | 184 | "mydev: No suitable DMA available.\n"); |
184 | goto ignore_this_device; | 185 | goto ignore_this_device; |
185 | } | 186 | } |
186 | 187 | ||
187 | pci_set_consistent_dma_mask() will always be able to set the same or a | 188 | dma_set_coherent_mask() will always be able to set the same or a |
188 | smaller mask as pci_set_dma_mask(). However for the rare case that a | 189 | smaller mask as dma_set_mask(). However for the rare case that a |
189 | device driver only uses consistent allocations, one would have to | 190 | device driver only uses consistent allocations, one would have to |
190 | check the return value from pci_set_consistent_dma_mask(). | 191 | check the return value from dma_set_coherent_mask(). |
191 | 192 | ||
192 | Finally, if your device can only drive the low 24-bits of | 193 | Finally, if your device can only drive the low 24-bits of |
193 | address during PCI bus mastering you might do something like: | 194 | address you might do something like: |
194 | 195 | ||
195 | if (pci_set_dma_mask(pdev, DMA_BIT_MASK(24))) { | 196 | if (dma_set_mask(dev, DMA_BIT_MASK(24))) { |
196 | printk(KERN_WARNING | 197 | printk(KERN_WARNING |
197 | "mydev: 24-bit DMA addressing not available.\n"); | 198 | "mydev: 24-bit DMA addressing not available.\n"); |
198 | goto ignore_this_device; | 199 | goto ignore_this_device; |
199 | } | 200 | } |
200 | 201 | ||
201 | When pci_set_dma_mask() is successful, and returns zero, the PCI layer | 202 | When dma_set_mask() is successful, and returns zero, the kernel saves |
202 | saves away this mask you have provided. The PCI layer will use this | 203 | away this mask you have provided. The kernel will use this |
203 | information later when you make DMA mappings. | 204 | information later when you make DMA mappings. |
204 | 205 | ||
205 | There is a case which we are aware of at this time, which is worth | 206 | There is a case which we are aware of at this time, which is worth |
@@ -208,7 +209,7 @@ functions (for example a sound card provides playback and record | |||
208 | functions) and the various different functions have _different_ | 209 | functions) and the various different functions have _different_ |
209 | DMA addressing limitations, you may wish to probe each mask and | 210 | DMA addressing limitations, you may wish to probe each mask and |
210 | only provide the functionality which the machine can handle. It | 211 | only provide the functionality which the machine can handle. It |
211 | is important that the last call to pci_set_dma_mask() be for the | 212 | is important that the last call to dma_set_mask() be for the |
212 | most specific mask. | 213 | most specific mask. |
213 | 214 | ||
214 | Here is pseudo-code showing how this might be done: | 215 | Here is pseudo-code showing how this might be done: |
@@ -217,17 +218,17 @@ Here is pseudo-code showing how this might be done: | |||
217 | #define RECORD_ADDRESS_BITS DMA_BIT_MASK(24) | 218 | #define RECORD_ADDRESS_BITS DMA_BIT_MASK(24) |
218 | 219 | ||
219 | struct my_sound_card *card; | 220 | struct my_sound_card *card; |
220 | struct pci_dev *pdev; | 221 | struct device *dev; |
221 | 222 | ||
222 | ... | 223 | ... |
223 | if (!pci_set_dma_mask(pdev, PLAYBACK_ADDRESS_BITS)) { | 224 | if (!dma_set_mask(dev, PLAYBACK_ADDRESS_BITS)) { |
224 | card->playback_enabled = 1; | 225 | card->playback_enabled = 1; |
225 | } else { | 226 | } else { |
226 | card->playback_enabled = 0; | 227 | card->playback_enabled = 0; |
227 | printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n", | 228 | printk(KERN_WARNING "%s: Playback disabled due to DMA limitations.\n", |
228 | card->name); | 229 | card->name); |
229 | } | 230 | } |
230 | if (!pci_set_dma_mask(pdev, RECORD_ADDRESS_BITS)) { | 231 | if (!dma_set_mask(dev, RECORD_ADDRESS_BITS)) { |
231 | card->record_enabled = 1; | 232 | card->record_enabled = 1; |
232 | } else { | 233 | } else { |
233 | card->record_enabled = 0; | 234 | card->record_enabled = 0; |
@@ -252,8 +253,8 @@ There are two types of DMA mappings: | |||
252 | Think of "consistent" as "synchronous" or "coherent". | 253 | Think of "consistent" as "synchronous" or "coherent". |
253 | 254 | ||
254 | The current default is to return consistent memory in the low 32 | 255 | The current default is to return consistent memory in the low 32 |
255 | bits of the PCI bus space. However, for future compatibility you | 256 | bits of the bus space. However, for future compatibility you should |
256 | should set the consistent mask even if this default is fine for your | 257 | set the consistent mask even if this default is fine for your |
257 | driver. | 258 | driver. |
258 | 259 | ||
259 | Good examples of what to use consistent mappings for are: | 260 | Good examples of what to use consistent mappings for are: |
@@ -285,9 +286,9 @@ There are two types of DMA mappings: | |||
285 | found in PCI bridges (such as by reading a register's value | 286 | found in PCI bridges (such as by reading a register's value |
286 | after writing it). | 287 | after writing it). |
287 | 288 | ||
288 | - Streaming DMA mappings which are usually mapped for one DMA transfer, | 289 | - Streaming DMA mappings which are usually mapped for one DMA |
289 | unmapped right after it (unless you use pci_dma_sync_* below) and for which | 290 | transfer, unmapped right after it (unless you use dma_sync_* below) |
290 | hardware can optimize for sequential accesses. | 291 | and for which hardware can optimize for sequential accesses. |
291 | 292 | ||
292 | This of "streaming" as "asynchronous" or "outside the coherency | 293 | This of "streaming" as "asynchronous" or "outside the coherency |
293 | domain". | 294 | domain". |
@@ -302,8 +303,8 @@ There are two types of DMA mappings: | |||
302 | optimizations the hardware allows. To this end, when using | 303 | optimizations the hardware allows. To this end, when using |
303 | such mappings you must be explicit about what you want to happen. | 304 | such mappings you must be explicit about what you want to happen. |
304 | 305 | ||
305 | Neither type of DMA mapping has alignment restrictions that come | 306 | Neither type of DMA mapping has alignment restrictions that come from |
306 | from PCI, although some devices may have such restrictions. | 307 | the underlying bus, although some devices may have such restrictions. |
307 | Also, systems with caches that aren't DMA-coherent will work better | 308 | Also, systems with caches that aren't DMA-coherent will work better |
308 | when the underlying buffers don't share cache lines with other data. | 309 | when the underlying buffers don't share cache lines with other data. |
309 | 310 | ||
@@ -315,33 +316,27 @@ you should do: | |||
315 | 316 | ||
316 | dma_addr_t dma_handle; | 317 | dma_addr_t dma_handle; |
317 | 318 | ||
318 | cpu_addr = pci_alloc_consistent(pdev, size, &dma_handle); | 319 | cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp); |
319 | |||
320 | where pdev is a struct pci_dev *. This may be called in interrupt context. | ||
321 | You should use dma_alloc_coherent (see DMA-API.txt) for buses | ||
322 | where devices don't have struct pci_dev (like ISA, EISA). | ||
323 | 320 | ||
324 | This argument is needed because the DMA translations may be bus | 321 | where device is a struct device *. This may be called in interrupt |
325 | specific (and often is private to the bus which the device is attached | 322 | context with the GFP_ATOMIC flag. |
326 | to). | ||
327 | 323 | ||
328 | Size is the length of the region you want to allocate, in bytes. | 324 | Size is the length of the region you want to allocate, in bytes. |
329 | 325 | ||
330 | This routine will allocate RAM for that region, so it acts similarly to | 326 | This routine will allocate RAM for that region, so it acts similarly to |
331 | __get_free_pages (but takes size instead of a page order). If your | 327 | __get_free_pages (but takes size instead of a page order). If your |
332 | driver needs regions sized smaller than a page, you may prefer using | 328 | driver needs regions sized smaller than a page, you may prefer using |
333 | the pci_pool interface, described below. | 329 | the dma_pool interface, described below. |
334 | 330 | ||
335 | The consistent DMA mapping interfaces, for non-NULL pdev, will by | 331 | The consistent DMA mapping interfaces, for non-NULL dev, will by |
336 | default return a DMA address which is SAC (Single Address Cycle) | 332 | default return a DMA address which is 32-bit addressable. Even if the |
337 | addressable. Even if the device indicates (via PCI dma mask) that it | 333 | device indicates (via DMA mask) that it may address the upper 32-bits, |
338 | may address the upper 32-bits and thus perform DAC cycles, consistent | 334 | consistent allocation will only return > 32-bit addresses for DMA if |
339 | allocation will only return > 32-bit PCI addresses for DMA if the | 335 | the consistent DMA mask has been explicitly changed via |
340 | consistent dma mask has been explicitly changed via | 336 | dma_set_coherent_mask(). This is true of the dma_pool interface as |
341 | pci_set_consistent_dma_mask(). This is true of the pci_pool interface | 337 | well. |
342 | as well. | 338 | |
343 | 339 | dma_alloc_coherent returns two values: the virtual address which you | |
344 | pci_alloc_consistent returns two values: the virtual address which you | ||
345 | can use to access it from the CPU and dma_handle which you pass to the | 340 | can use to access it from the CPU and dma_handle which you pass to the |
346 | card. | 341 | card. |
347 | 342 | ||
@@ -354,54 +349,54 @@ buffer you receive will not cross a 64K boundary. | |||
354 | 349 | ||
355 | To unmap and free such a DMA region, you call: | 350 | To unmap and free such a DMA region, you call: |
356 | 351 | ||
357 | pci_free_consistent(pdev, size, cpu_addr, dma_handle); | 352 | dma_free_coherent(dev, size, cpu_addr, dma_handle); |
358 | 353 | ||
359 | where pdev, size are the same as in the above call and cpu_addr and | 354 | where dev, size are the same as in the above call and cpu_addr and |
360 | dma_handle are the values pci_alloc_consistent returned to you. | 355 | dma_handle are the values dma_alloc_coherent returned to you. |
361 | This function may not be called in interrupt context. | 356 | This function may not be called in interrupt context. |
362 | 357 | ||
363 | If your driver needs lots of smaller memory regions, you can write | 358 | If your driver needs lots of smaller memory regions, you can write |
364 | custom code to subdivide pages returned by pci_alloc_consistent, | 359 | custom code to subdivide pages returned by dma_alloc_coherent, |
365 | or you can use the pci_pool API to do that. A pci_pool is like | 360 | or you can use the dma_pool API to do that. A dma_pool is like |
366 | a kmem_cache, but it uses pci_alloc_consistent not __get_free_pages. | 361 | a kmem_cache, but it uses dma_alloc_coherent not __get_free_pages. |
367 | Also, it understands common hardware constraints for alignment, | 362 | Also, it understands common hardware constraints for alignment, |
368 | like queue heads needing to be aligned on N byte boundaries. | 363 | like queue heads needing to be aligned on N byte boundaries. |
369 | 364 | ||
370 | Create a pci_pool like this: | 365 | Create a dma_pool like this: |
371 | 366 | ||
372 | struct pci_pool *pool; | 367 | struct dma_pool *pool; |
373 | 368 | ||
374 | pool = pci_pool_create(name, pdev, size, align, alloc); | 369 | pool = dma_pool_create(name, dev, size, align, alloc); |
375 | 370 | ||
376 | The "name" is for diagnostics (like a kmem_cache name); pdev and size | 371 | The "name" is for diagnostics (like a kmem_cache name); dev and size |
377 | are as above. The device's hardware alignment requirement for this | 372 | are as above. The device's hardware alignment requirement for this |
378 | type of data is "align" (which is expressed in bytes, and must be a | 373 | type of data is "align" (which is expressed in bytes, and must be a |
379 | power of two). If your device has no boundary crossing restrictions, | 374 | power of two). If your device has no boundary crossing restrictions, |
380 | pass 0 for alloc; passing 4096 says memory allocated from this pool | 375 | pass 0 for alloc; passing 4096 says memory allocated from this pool |
381 | must not cross 4KByte boundaries (but at that time it may be better to | 376 | must not cross 4KByte boundaries (but at that time it may be better to |
382 | go for pci_alloc_consistent directly instead). | 377 | go for dma_alloc_coherent directly instead). |
383 | 378 | ||
384 | Allocate memory from a pci pool like this: | 379 | Allocate memory from a dma pool like this: |
385 | 380 | ||
386 | cpu_addr = pci_pool_alloc(pool, flags, &dma_handle); | 381 | cpu_addr = dma_pool_alloc(pool, flags, &dma_handle); |
387 | 382 | ||
388 | flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor | 383 | flags are SLAB_KERNEL if blocking is permitted (not in_interrupt nor |
389 | holding SMP locks), SLAB_ATOMIC otherwise. Like pci_alloc_consistent, | 384 | holding SMP locks), SLAB_ATOMIC otherwise. Like dma_alloc_coherent, |
390 | this returns two values, cpu_addr and dma_handle. | 385 | this returns two values, cpu_addr and dma_handle. |
391 | 386 | ||
392 | Free memory that was allocated from a pci_pool like this: | 387 | Free memory that was allocated from a dma_pool like this: |
393 | 388 | ||
394 | pci_pool_free(pool, cpu_addr, dma_handle); | 389 | dma_pool_free(pool, cpu_addr, dma_handle); |
395 | 390 | ||
396 | where pool is what you passed to pci_pool_alloc, and cpu_addr and | 391 | where pool is what you passed to dma_pool_alloc, and cpu_addr and |
397 | dma_handle are the values pci_pool_alloc returned. This function | 392 | dma_handle are the values dma_pool_alloc returned. This function |
398 | may be called in interrupt context. | 393 | may be called in interrupt context. |
399 | 394 | ||
400 | Destroy a pci_pool by calling: | 395 | Destroy a dma_pool by calling: |
401 | 396 | ||
402 | pci_pool_destroy(pool); | 397 | dma_pool_destroy(pool); |
403 | 398 | ||
404 | Make sure you've called pci_pool_free for all memory allocated | 399 | Make sure you've called dma_pool_free for all memory allocated |
405 | from a pool before you destroy the pool. This function may not | 400 | from a pool before you destroy the pool. This function may not |
406 | be called in interrupt context. | 401 | be called in interrupt context. |
407 | 402 | ||
@@ -411,15 +406,15 @@ The interfaces described in subsequent portions of this document | |||
411 | take a DMA direction argument, which is an integer and takes on | 406 | take a DMA direction argument, which is an integer and takes on |
412 | one of the following values: | 407 | one of the following values: |
413 | 408 | ||
414 | PCI_DMA_BIDIRECTIONAL | 409 | DMA_BIDIRECTIONAL |
415 | PCI_DMA_TODEVICE | 410 | DMA_TO_DEVICE |
416 | PCI_DMA_FROMDEVICE | 411 | DMA_FROM_DEVICE |
417 | PCI_DMA_NONE | 412 | DMA_NONE |
418 | 413 | ||
419 | One should provide the exact DMA direction if you know it. | 414 | One should provide the exact DMA direction if you know it. |
420 | 415 | ||
421 | PCI_DMA_TODEVICE means "from main memory to the PCI device" | 416 | DMA_TO_DEVICE means "from main memory to the device" |
422 | PCI_DMA_FROMDEVICE means "from the PCI device to main memory" | 417 | DMA_FROM_DEVICE means "from the device to main memory" |
423 | It is the direction in which the data moves during the DMA | 418 | It is the direction in which the data moves during the DMA |
424 | transfer. | 419 | transfer. |
425 | 420 | ||
@@ -427,12 +422,12 @@ You are _strongly_ encouraged to specify this as precisely | |||
427 | as you possibly can. | 422 | as you possibly can. |
428 | 423 | ||
429 | If you absolutely cannot know the direction of the DMA transfer, | 424 | If you absolutely cannot know the direction of the DMA transfer, |
430 | specify PCI_DMA_BIDIRECTIONAL. It means that the DMA can go in | 425 | specify DMA_BIDIRECTIONAL. It means that the DMA can go in |
431 | either direction. The platform guarantees that you may legally | 426 | either direction. The platform guarantees that you may legally |
432 | specify this, and that it will work, but this may be at the | 427 | specify this, and that it will work, but this may be at the |
433 | cost of performance for example. | 428 | cost of performance for example. |
434 | 429 | ||
435 | The value PCI_DMA_NONE is to be used for debugging. One can | 430 | The value DMA_NONE is to be used for debugging. One can |
436 | hold this in a data structure before you come to know the | 431 | hold this in a data structure before you come to know the |
437 | precise direction, and this will help catch cases where your | 432 | precise direction, and this will help catch cases where your |
438 | direction tracking logic has failed to set things up properly. | 433 | direction tracking logic has failed to set things up properly. |
@@ -442,21 +437,21 @@ potential platform-specific optimizations of such) is for debugging. | |||
442 | Some platforms actually have a write permission boolean which DMA | 437 | Some platforms actually have a write permission boolean which DMA |
443 | mappings can be marked with, much like page protections in the user | 438 | mappings can be marked with, much like page protections in the user |
444 | program address space. Such platforms can and do report errors in the | 439 | program address space. Such platforms can and do report errors in the |
445 | kernel logs when the PCI controller hardware detects violation of the | 440 | kernel logs when the DMA controller hardware detects violation of the |
446 | permission setting. | 441 | permission setting. |
447 | 442 | ||
448 | Only streaming mappings specify a direction, consistent mappings | 443 | Only streaming mappings specify a direction, consistent mappings |
449 | implicitly have a direction attribute setting of | 444 | implicitly have a direction attribute setting of |
450 | PCI_DMA_BIDIRECTIONAL. | 445 | DMA_BIDIRECTIONAL. |
451 | 446 | ||
452 | The SCSI subsystem tells you the direction to use in the | 447 | The SCSI subsystem tells you the direction to use in the |
453 | 'sc_data_direction' member of the SCSI command your driver is | 448 | 'sc_data_direction' member of the SCSI command your driver is |
454 | working on. | 449 | working on. |
455 | 450 | ||
456 | For Networking drivers, it's a rather simple affair. For transmit | 451 | For Networking drivers, it's a rather simple affair. For transmit |
457 | packets, map/unmap them with the PCI_DMA_TODEVICE direction | 452 | packets, map/unmap them with the DMA_TO_DEVICE direction |
458 | specifier. For receive packets, just the opposite, map/unmap them | 453 | specifier. For receive packets, just the opposite, map/unmap them |
459 | with the PCI_DMA_FROMDEVICE direction specifier. | 454 | with the DMA_FROM_DEVICE direction specifier. |
460 | 455 | ||
461 | Using Streaming DMA mappings | 456 | Using Streaming DMA mappings |
462 | 457 | ||
@@ -467,43 +462,43 @@ scatterlist. | |||
467 | 462 | ||
468 | To map a single region, you do: | 463 | To map a single region, you do: |
469 | 464 | ||
470 | struct pci_dev *pdev = mydev->pdev; | 465 | struct device *dev = &my_dev->dev; |
471 | dma_addr_t dma_handle; | 466 | dma_addr_t dma_handle; |
472 | void *addr = buffer->ptr; | 467 | void *addr = buffer->ptr; |
473 | size_t size = buffer->len; | 468 | size_t size = buffer->len; |
474 | 469 | ||
475 | dma_handle = pci_map_single(pdev, addr, size, direction); | 470 | dma_handle = dma_map_single(dev, addr, size, direction); |
476 | 471 | ||
477 | and to unmap it: | 472 | and to unmap it: |
478 | 473 | ||
479 | pci_unmap_single(pdev, dma_handle, size, direction); | 474 | dma_unmap_single(dev, dma_handle, size, direction); |
480 | 475 | ||
481 | You should call pci_unmap_single when the DMA activity is finished, e.g. | 476 | You should call dma_unmap_single when the DMA activity is finished, e.g. |
482 | from the interrupt which told you that the DMA transfer is done. | 477 | from the interrupt which told you that the DMA transfer is done. |
483 | 478 | ||
484 | Using cpu pointers like this for single mappings has a disadvantage, | 479 | Using cpu pointers like this for single mappings has a disadvantage, |
485 | you cannot reference HIGHMEM memory in this way. Thus, there is a | 480 | you cannot reference HIGHMEM memory in this way. Thus, there is a |
486 | map/unmap interface pair akin to pci_{map,unmap}_single. These | 481 | map/unmap interface pair akin to dma_{map,unmap}_single. These |
487 | interfaces deal with page/offset pairs instead of cpu pointers. | 482 | interfaces deal with page/offset pairs instead of cpu pointers. |
488 | Specifically: | 483 | Specifically: |
489 | 484 | ||
490 | struct pci_dev *pdev = mydev->pdev; | 485 | struct device *dev = &my_dev->dev; |
491 | dma_addr_t dma_handle; | 486 | dma_addr_t dma_handle; |
492 | struct page *page = buffer->page; | 487 | struct page *page = buffer->page; |
493 | unsigned long offset = buffer->offset; | 488 | unsigned long offset = buffer->offset; |
494 | size_t size = buffer->len; | 489 | size_t size = buffer->len; |
495 | 490 | ||
496 | dma_handle = pci_map_page(pdev, page, offset, size, direction); | 491 | dma_handle = dma_map_page(dev, page, offset, size, direction); |
497 | 492 | ||
498 | ... | 493 | ... |
499 | 494 | ||
500 | pci_unmap_page(pdev, dma_handle, size, direction); | 495 | dma_unmap_page(dev, dma_handle, size, direction); |
501 | 496 | ||
502 | Here, "offset" means byte offset within the given page. | 497 | Here, "offset" means byte offset within the given page. |
503 | 498 | ||
504 | With scatterlists, you map a region gathered from several regions by: | 499 | With scatterlists, you map a region gathered from several regions by: |
505 | 500 | ||
506 | int i, count = pci_map_sg(pdev, sglist, nents, direction); | 501 | int i, count = dma_map_sg(dev, sglist, nents, direction); |
507 | struct scatterlist *sg; | 502 | struct scatterlist *sg; |
508 | 503 | ||
509 | for_each_sg(sglist, sg, count, i) { | 504 | for_each_sg(sglist, sg, count, i) { |
@@ -527,16 +522,16 @@ accessed sg->address and sg->length as shown above. | |||
527 | 522 | ||
528 | To unmap a scatterlist, just call: | 523 | To unmap a scatterlist, just call: |
529 | 524 | ||
530 | pci_unmap_sg(pdev, sglist, nents, direction); | 525 | dma_unmap_sg(dev, sglist, nents, direction); |
531 | 526 | ||
532 | Again, make sure DMA activity has already finished. | 527 | Again, make sure DMA activity has already finished. |
533 | 528 | ||
534 | PLEASE NOTE: The 'nents' argument to the pci_unmap_sg call must be | 529 | PLEASE NOTE: The 'nents' argument to the dma_unmap_sg call must be |
535 | the _same_ one you passed into the pci_map_sg call, | 530 | the _same_ one you passed into the dma_map_sg call, |
536 | it should _NOT_ be the 'count' value _returned_ from the | 531 | it should _NOT_ be the 'count' value _returned_ from the |
537 | pci_map_sg call. | 532 | dma_map_sg call. |
538 | 533 | ||
539 | Every pci_map_{single,sg} call should have its pci_unmap_{single,sg} | 534 | Every dma_map_{single,sg} call should have its dma_unmap_{single,sg} |
540 | counterpart, because the bus address space is a shared resource (although | 535 | counterpart, because the bus address space is a shared resource (although |
541 | in some ports the mapping is per each BUS so less devices contend for the | 536 | in some ports the mapping is per each BUS so less devices contend for the |
542 | same bus address space) and you could render the machine unusable by eating | 537 | same bus address space) and you could render the machine unusable by eating |
@@ -547,14 +542,14 @@ the data in between the DMA transfers, the buffer needs to be synced | |||
547 | properly in order for the cpu and device to see the most uptodate and | 542 | properly in order for the cpu and device to see the most uptodate and |
548 | correct copy of the DMA buffer. | 543 | correct copy of the DMA buffer. |
549 | 544 | ||
550 | So, firstly, just map it with pci_map_{single,sg}, and after each DMA | 545 | So, firstly, just map it with dma_map_{single,sg}, and after each DMA |
551 | transfer call either: | 546 | transfer call either: |
552 | 547 | ||
553 | pci_dma_sync_single_for_cpu(pdev, dma_handle, size, direction); | 548 | dma_sync_single_for_cpu(dev, dma_handle, size, direction); |
554 | 549 | ||
555 | or: | 550 | or: |
556 | 551 | ||
557 | pci_dma_sync_sg_for_cpu(pdev, sglist, nents, direction); | 552 | dma_sync_sg_for_cpu(dev, sglist, nents, direction); |
558 | 553 | ||
559 | as appropriate. | 554 | as appropriate. |
560 | 555 | ||
@@ -562,27 +557,27 @@ Then, if you wish to let the device get at the DMA area again, | |||
562 | finish accessing the data with the cpu, and then before actually | 557 | finish accessing the data with the cpu, and then before actually |
563 | giving the buffer to the hardware call either: | 558 | giving the buffer to the hardware call either: |
564 | 559 | ||
565 | pci_dma_sync_single_for_device(pdev, dma_handle, size, direction); | 560 | dma_sync_single_for_device(dev, dma_handle, size, direction); |
566 | 561 | ||
567 | or: | 562 | or: |
568 | 563 | ||
569 | pci_dma_sync_sg_for_device(dev, sglist, nents, direction); | 564 | dma_sync_sg_for_device(dev, sglist, nents, direction); |
570 | 565 | ||
571 | as appropriate. | 566 | as appropriate. |
572 | 567 | ||
573 | After the last DMA transfer call one of the DMA unmap routines | 568 | After the last DMA transfer call one of the DMA unmap routines |
574 | pci_unmap_{single,sg}. If you don't touch the data from the first pci_map_* | 569 | dma_unmap_{single,sg}. If you don't touch the data from the first dma_map_* |
575 | call till pci_unmap_*, then you don't have to call the pci_dma_sync_* | 570 | call till dma_unmap_*, then you don't have to call the dma_sync_* |
576 | routines at all. | 571 | routines at all. |
577 | 572 | ||
578 | Here is pseudo code which shows a situation in which you would need | 573 | Here is pseudo code which shows a situation in which you would need |
579 | to use the pci_dma_sync_*() interfaces. | 574 | to use the dma_sync_*() interfaces. |
580 | 575 | ||
581 | my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len) | 576 | my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len) |
582 | { | 577 | { |
583 | dma_addr_t mapping; | 578 | dma_addr_t mapping; |
584 | 579 | ||
585 | mapping = pci_map_single(cp->pdev, buffer, len, PCI_DMA_FROMDEVICE); | 580 | mapping = dma_map_single(cp->dev, buffer, len, DMA_FROM_DEVICE); |
586 | 581 | ||
587 | cp->rx_buf = buffer; | 582 | cp->rx_buf = buffer; |
588 | cp->rx_len = len; | 583 | cp->rx_len = len; |
@@ -606,25 +601,25 @@ to use the pci_dma_sync_*() interfaces. | |||
606 | * the DMA transfer with the CPU first | 601 | * the DMA transfer with the CPU first |
607 | * so that we see updated contents. | 602 | * so that we see updated contents. |
608 | */ | 603 | */ |
609 | pci_dma_sync_single_for_cpu(cp->pdev, cp->rx_dma, | 604 | dma_sync_single_for_cpu(&cp->dev, cp->rx_dma, |
610 | cp->rx_len, | 605 | cp->rx_len, |
611 | PCI_DMA_FROMDEVICE); | 606 | DMA_FROM_DEVICE); |
612 | 607 | ||
613 | /* Now it is safe to examine the buffer. */ | 608 | /* Now it is safe to examine the buffer. */ |
614 | hp = (struct my_card_header *) cp->rx_buf; | 609 | hp = (struct my_card_header *) cp->rx_buf; |
615 | if (header_is_ok(hp)) { | 610 | if (header_is_ok(hp)) { |
616 | pci_unmap_single(cp->pdev, cp->rx_dma, cp->rx_len, | 611 | dma_unmap_single(&cp->dev, cp->rx_dma, cp->rx_len, |
617 | PCI_DMA_FROMDEVICE); | 612 | DMA_FROM_DEVICE); |
618 | pass_to_upper_layers(cp->rx_buf); | 613 | pass_to_upper_layers(cp->rx_buf); |
619 | make_and_setup_new_rx_buf(cp); | 614 | make_and_setup_new_rx_buf(cp); |
620 | } else { | 615 | } else { |
621 | /* Just sync the buffer and give it back | 616 | /* Just sync the buffer and give it back |
622 | * to the card. | 617 | * to the card. |
623 | */ | 618 | */ |
624 | pci_dma_sync_single_for_device(cp->pdev, | 619 | dma_sync_single_for_device(&cp->dev, |
625 | cp->rx_dma, | 620 | cp->rx_dma, |
626 | cp->rx_len, | 621 | cp->rx_len, |
627 | PCI_DMA_FROMDEVICE); | 622 | DMA_FROM_DEVICE); |
628 | give_rx_buf_to_card(cp); | 623 | give_rx_buf_to_card(cp); |
629 | } | 624 | } |
630 | } | 625 | } |
@@ -634,19 +629,19 @@ Drivers converted fully to this interface should not use virt_to_bus any | |||
634 | longer, nor should they use bus_to_virt. Some drivers have to be changed a | 629 | longer, nor should they use bus_to_virt. Some drivers have to be changed a |
635 | little bit, because there is no longer an equivalent to bus_to_virt in the | 630 | little bit, because there is no longer an equivalent to bus_to_virt in the |
636 | dynamic DMA mapping scheme - you have to always store the DMA addresses | 631 | dynamic DMA mapping scheme - you have to always store the DMA addresses |
637 | returned by the pci_alloc_consistent, pci_pool_alloc, and pci_map_single | 632 | returned by the dma_alloc_coherent, dma_pool_alloc, and dma_map_single |
638 | calls (pci_map_sg stores them in the scatterlist itself if the platform | 633 | calls (dma_map_sg stores them in the scatterlist itself if the platform |
639 | supports dynamic DMA mapping in hardware) in your driver structures and/or | 634 | supports dynamic DMA mapping in hardware) in your driver structures and/or |
640 | in the card registers. | 635 | in the card registers. |
641 | 636 | ||
642 | All PCI drivers should be using these interfaces with no exceptions. | 637 | All drivers should be using these interfaces with no exceptions. It |
643 | It is planned to completely remove virt_to_bus() and bus_to_virt() as | 638 | is planned to completely remove virt_to_bus() and bus_to_virt() as |
644 | they are entirely deprecated. Some ports already do not provide these | 639 | they are entirely deprecated. Some ports already do not provide these |
645 | as it is impossible to correctly support them. | 640 | as it is impossible to correctly support them. |
646 | 641 | ||
647 | Optimizing Unmap State Space Consumption | 642 | Optimizing Unmap State Space Consumption |
648 | 643 | ||
649 | On many platforms, pci_unmap_{single,page}() is simply a nop. | 644 | On many platforms, dma_unmap_{single,page}() is simply a nop. |
650 | Therefore, keeping track of the mapping address and length is a waste | 645 | Therefore, keeping track of the mapping address and length is a waste |
651 | of space. Instead of filling your drivers up with ifdefs and the like | 646 | of space. Instead of filling your drivers up with ifdefs and the like |
652 | to "work around" this (which would defeat the whole purpose of a | 647 | to "work around" this (which would defeat the whole purpose of a |
@@ -655,7 +650,7 @@ portable API) the following facilities are provided. | |||
655 | Actually, instead of describing the macros one by one, we'll | 650 | Actually, instead of describing the macros one by one, we'll |
656 | transform some example code. | 651 | transform some example code. |
657 | 652 | ||
658 | 1) Use DECLARE_PCI_UNMAP_{ADDR,LEN} in state saving structures. | 653 | 1) Use DEFINE_DMA_UNMAP_{ADDR,LEN} in state saving structures. |
659 | Example, before: | 654 | Example, before: |
660 | 655 | ||
661 | struct ring_state { | 656 | struct ring_state { |
@@ -668,14 +663,11 @@ transform some example code. | |||
668 | 663 | ||
669 | struct ring_state { | 664 | struct ring_state { |
670 | struct sk_buff *skb; | 665 | struct sk_buff *skb; |
671 | DECLARE_PCI_UNMAP_ADDR(mapping) | 666 | DEFINE_DMA_UNMAP_ADDR(mapping); |
672 | DECLARE_PCI_UNMAP_LEN(len) | 667 | DEFINE_DMA_UNMAP_LEN(len); |
673 | }; | 668 | }; |
674 | 669 | ||
675 | NOTE: DO NOT put a semicolon at the end of the DECLARE_*() | 670 | 2) Use dma_unmap_{addr,len}_set to set these values. |
676 | macro. | ||
677 | |||
678 | 2) Use pci_unmap_{addr,len}_set to set these values. | ||
679 | Example, before: | 671 | Example, before: |
680 | 672 | ||
681 | ringp->mapping = FOO; | 673 | ringp->mapping = FOO; |
@@ -683,21 +675,21 @@ transform some example code. | |||
683 | 675 | ||
684 | after: | 676 | after: |
685 | 677 | ||
686 | pci_unmap_addr_set(ringp, mapping, FOO); | 678 | dma_unmap_addr_set(ringp, mapping, FOO); |
687 | pci_unmap_len_set(ringp, len, BAR); | 679 | dma_unmap_len_set(ringp, len, BAR); |
688 | 680 | ||
689 | 3) Use pci_unmap_{addr,len} to access these values. | 681 | 3) Use dma_unmap_{addr,len} to access these values. |
690 | Example, before: | 682 | Example, before: |
691 | 683 | ||
692 | pci_unmap_single(pdev, ringp->mapping, ringp->len, | 684 | dma_unmap_single(dev, ringp->mapping, ringp->len, |
693 | PCI_DMA_FROMDEVICE); | 685 | DMA_FROM_DEVICE); |
694 | 686 | ||
695 | after: | 687 | after: |
696 | 688 | ||
697 | pci_unmap_single(pdev, | 689 | dma_unmap_single(dev, |
698 | pci_unmap_addr(ringp, mapping), | 690 | dma_unmap_addr(ringp, mapping), |
699 | pci_unmap_len(ringp, len), | 691 | dma_unmap_len(ringp, len), |
700 | PCI_DMA_FROMDEVICE); | 692 | DMA_FROM_DEVICE); |
701 | 693 | ||
702 | It really should be self-explanatory. We treat the ADDR and LEN | 694 | It really should be self-explanatory. We treat the ADDR and LEN |
703 | separately, because it is possible for an implementation to only | 695 | separately, because it is possible for an implementation to only |
@@ -732,15 +724,15 @@ to "Closing". | |||
732 | DMA address space is limited on some architectures and an allocation | 724 | DMA address space is limited on some architectures and an allocation |
733 | failure can be determined by: | 725 | failure can be determined by: |
734 | 726 | ||
735 | - checking if pci_alloc_consistent returns NULL or pci_map_sg returns 0 | 727 | - checking if dma_alloc_coherent returns NULL or dma_map_sg returns 0 |
736 | 728 | ||
737 | - checking the returned dma_addr_t of pci_map_single and pci_map_page | 729 | - checking the returned dma_addr_t of dma_map_single and dma_map_page |
738 | by using pci_dma_mapping_error(): | 730 | by using dma_mapping_error(): |
739 | 731 | ||
740 | dma_addr_t dma_handle; | 732 | dma_addr_t dma_handle; |
741 | 733 | ||
742 | dma_handle = pci_map_single(pdev, addr, size, direction); | 734 | dma_handle = dma_map_single(dev, addr, size, direction); |
743 | if (pci_dma_mapping_error(pdev, dma_handle)) { | 735 | if (dma_mapping_error(dev, dma_handle)) { |
744 | /* | 736 | /* |
745 | * reduce current DMA mapping usage, | 737 | * reduce current DMA mapping usage, |
746 | * delay and try again later or | 738 | * delay and try again later or |
diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX index 9bb62f7b89c3..71b6f500ddb9 100644 --- a/Documentation/RCU/00-INDEX +++ b/Documentation/RCU/00-INDEX | |||
@@ -6,16 +6,22 @@ checklist.txt | |||
6 | - Review Checklist for RCU Patches | 6 | - Review Checklist for RCU Patches |
7 | listRCU.txt | 7 | listRCU.txt |
8 | - Using RCU to Protect Read-Mostly Linked Lists | 8 | - Using RCU to Protect Read-Mostly Linked Lists |
9 | lockdep.txt | ||
10 | - RCU and lockdep checking | ||
9 | NMI-RCU.txt | 11 | NMI-RCU.txt |
10 | - Using RCU to Protect Dynamic NMI Handlers | 12 | - Using RCU to Protect Dynamic NMI Handlers |
13 | rcubarrier.txt | ||
14 | - RCU and Unloadable Modules | ||
15 | rculist_nulls.txt | ||
16 | - RCU list primitives for use with SLAB_DESTROY_BY_RCU | ||
11 | rcuref.txt | 17 | rcuref.txt |
12 | - Reference-count design for elements of lists/arrays protected by RCU | 18 | - Reference-count design for elements of lists/arrays protected by RCU |
13 | rcu.txt | 19 | rcu.txt |
14 | - RCU Concepts | 20 | - RCU Concepts |
15 | rcubarrier.txt | ||
16 | - Unloading modules that use RCU callbacks | ||
17 | RTFP.txt | 21 | RTFP.txt |
18 | - List of RCU papers (bibliography) going back to 1980. | 22 | - List of RCU papers (bibliography) going back to 1980. |
23 | stallwarn.txt | ||
24 | - RCU CPU stall warnings (CONFIG_RCU_CPU_STALL_DETECTOR) | ||
19 | torture.txt | 25 | torture.txt |
20 | - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST) | 26 | - RCU Torture Test Operation (CONFIG_RCU_TORTURE_TEST) |
21 | trace.txt | 27 | trace.txt |
diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt index d2b85237c76e..5aea459e3dd6 100644 --- a/Documentation/RCU/RTFP.txt +++ b/Documentation/RCU/RTFP.txt | |||
@@ -25,10 +25,10 @@ to be referencing the data structure. However, this mechanism was not | |||
25 | optimized for modern computer systems, which is not surprising given | 25 | optimized for modern computer systems, which is not surprising given |
26 | that these overheads were not so expensive in the mid-80s. Nonetheless, | 26 | that these overheads were not so expensive in the mid-80s. Nonetheless, |
27 | passive serialization appears to be the first deferred-destruction | 27 | passive serialization appears to be the first deferred-destruction |
28 | mechanism to be used in production. Furthermore, the relevant patent has | 28 | mechanism to be used in production. Furthermore, the relevant patent |
29 | lapsed, so this approach may be used in non-GPL software, if desired. | 29 | has lapsed, so this approach may be used in non-GPL software, if desired. |
30 | (In contrast, use of RCU is permitted only in software licensed under | 30 | (In contrast, implementation of RCU is permitted only in software licensed |
31 | GPL. Sorry!!!) | 31 | under either GPL or LGPL. Sorry!!!) |
32 | 32 | ||
33 | In 1990, Pugh [Pugh90] noted that explicitly tracking which threads | 33 | In 1990, Pugh [Pugh90] noted that explicitly tracking which threads |
34 | were reading a given data structure permitted deferred free to operate | 34 | were reading a given data structure permitted deferred free to operate |
@@ -150,6 +150,18 @@ preemptible RCU [PaulEMcKenney2007PreemptibleRCU], and the three-part | |||
150 | LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally, | 150 | LWN "What is RCU?" series [PaulEMcKenney2007WhatIsRCUFundamentally, |
151 | PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI]. | 151 | PaulEMcKenney2008WhatIsRCUUsage, and PaulEMcKenney2008WhatIsRCUAPI]. |
152 | 152 | ||
153 | 2008 saw a journal paper on real-time RCU [DinakarGuniguntala2008IBMSysJ], | ||
154 | a history of how Linux changed RCU more than RCU changed Linux | ||
155 | [PaulEMcKenney2008RCUOSR], and a design overview of hierarchical RCU | ||
156 | [PaulEMcKenney2008HierarchicalRCU]. | ||
157 | |||
158 | 2009 introduced user-level RCU algorithms [PaulEMcKenney2009MaliciousURCU], | ||
159 | which Mathieu Desnoyers is now maintaining [MathieuDesnoyers2009URCU] | ||
160 | [MathieuDesnoyersPhD]. TINY_RCU [PaulEMcKenney2009BloatWatchRCU] made | ||
161 | its appearance, as did expedited RCU [PaulEMcKenney2009expeditedRCU]. | ||
162 | The problem of resizeable RCU-protected hash tables may now be on a path | ||
163 | to a solution [JoshTriplett2009RPHash]. | ||
164 | |||
153 | Bibtex Entries | 165 | Bibtex Entries |
154 | 166 | ||
155 | @article{Kung80 | 167 | @article{Kung80 |
@@ -730,6 +742,11 @@ Revised: | |||
730 | " | 742 | " |
731 | } | 743 | } |
732 | 744 | ||
745 | # | ||
746 | # "What is RCU?" LWN series. | ||
747 | # | ||
748 | ######################################################################## | ||
749 | |||
733 | @article{DinakarGuniguntala2008IBMSysJ | 750 | @article{DinakarGuniguntala2008IBMSysJ |
734 | ,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole" | 751 | ,author="D. Guniguntala and P. E. McKenney and J. Triplett and J. Walpole" |
735 | ,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}" | 752 | ,title="The read-copy-update mechanism for supporting real-time applications on shared-memory multiprocessor systems with {Linux}" |
@@ -820,3 +837,39 @@ Revised: | |||
820 | Uniprocessor assumptions allow simplified RCU implementation. | 837 | Uniprocessor assumptions allow simplified RCU implementation. |
821 | " | 838 | " |
822 | } | 839 | } |
840 | |||
841 | @unpublished{PaulEMcKenney2009expeditedRCU | ||
842 | ,Author="Paul E. McKenney" | ||
843 | ,Title="[{PATCH} -tip 0/3] expedited 'big hammer' {RCU} grace periods" | ||
844 | ,month="June" | ||
845 | ,day="25" | ||
846 | ,year="2009" | ||
847 | ,note="Available: | ||
848 | \url{http://lkml.org/lkml/2009/6/25/306} | ||
849 | [Viewed August 16, 2009]" | ||
850 | ,annotation=" | ||
851 | First posting of expedited RCU to be accepted into -tip. | ||
852 | " | ||
853 | } | ||
854 | |||
855 | @unpublished{JoshTriplett2009RPHash | ||
856 | ,Author="Josh Triplett" | ||
857 | ,Title="Scalable concurrent hash tables via relativistic programming" | ||
858 | ,month="September" | ||
859 | ,year="2009" | ||
860 | ,note="Linux Plumbers Conference presentation" | ||
861 | ,annotation=" | ||
862 | RP fun with hash tables. | ||
863 | " | ||
864 | } | ||
865 | |||
866 | @phdthesis{MathieuDesnoyersPhD | ||
867 | , title = "Low-Impact Operating System Tracing" | ||
868 | , author = "Mathieu Desnoyers" | ||
869 | , school = "Ecole Polytechnique de Montr\'{e}al" | ||
870 | , month = "December" | ||
871 | , year = 2009 | ||
872 | ,note="Available: | ||
873 | \url{http://www.lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf} | ||
874 | [Viewed December 9, 2009]" | ||
875 | } | ||
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt index 51525a30e8b4..cbc180f90194 100644 --- a/Documentation/RCU/checklist.txt +++ b/Documentation/RCU/checklist.txt | |||
@@ -8,13 +8,12 @@ would cause. This list is based on experiences reviewing such patches | |||
8 | over a rather long period of time, but improvements are always welcome! | 8 | over a rather long period of time, but improvements are always welcome! |
9 | 9 | ||
10 | 0. Is RCU being applied to a read-mostly situation? If the data | 10 | 0. Is RCU being applied to a read-mostly situation? If the data |
11 | structure is updated more than about 10% of the time, then | 11 | structure is updated more than about 10% of the time, then you |
12 | you should strongly consider some other approach, unless | 12 | should strongly consider some other approach, unless detailed |
13 | detailed performance measurements show that RCU is nonetheless | 13 | performance measurements show that RCU is nonetheless the right |
14 | the right tool for the job. Yes, you might think of RCU | 14 | tool for the job. Yes, RCU does reduce read-side overhead by |
15 | as simply cutting overhead off of the readers and imposing it | 15 | increasing write-side overhead, which is exactly why normal uses |
16 | on the writers. That is exactly why normal uses of RCU will | 16 | of RCU will do much more reading than updating. |
17 | do much more reading than updating. | ||
18 | 17 | ||
19 | Another exception is where performance is not an issue, and RCU | 18 | Another exception is where performance is not an issue, and RCU |
20 | provides a simpler implementation. An example of this situation | 19 | provides a simpler implementation. An example of this situation |
@@ -35,13 +34,13 @@ over a rather long period of time, but improvements are always welcome! | |||
35 | 34 | ||
36 | If you choose #b, be prepared to describe how you have handled | 35 | If you choose #b, be prepared to describe how you have handled |
37 | memory barriers on weakly ordered machines (pretty much all of | 36 | memory barriers on weakly ordered machines (pretty much all of |
38 | them -- even x86 allows reads to be reordered), and be prepared | 37 | them -- even x86 allows later loads to be reordered to precede |
39 | to explain why this added complexity is worthwhile. If you | 38 | earlier stores), and be prepared to explain why this added |
40 | choose #c, be prepared to explain how this single task does not | 39 | complexity is worthwhile. If you choose #c, be prepared to |
41 | become a major bottleneck on big multiprocessor machines (for | 40 | explain how this single task does not become a major bottleneck on |
42 | example, if the task is updating information relating to itself | 41 | big multiprocessor machines (for example, if the task is updating |
43 | that other tasks can read, there by definition can be no | 42 | information relating to itself that other tasks can read, there |
44 | bottleneck). | 43 | by definition can be no bottleneck). |
45 | 44 | ||
46 | 2. Do the RCU read-side critical sections make proper use of | 45 | 2. Do the RCU read-side critical sections make proper use of |
47 | rcu_read_lock() and friends? These primitives are needed | 46 | rcu_read_lock() and friends? These primitives are needed |
@@ -51,8 +50,10 @@ over a rather long period of time, but improvements are always welcome! | |||
51 | actuarial risk of your kernel. | 50 | actuarial risk of your kernel. |
52 | 51 | ||
53 | As a rough rule of thumb, any dereference of an RCU-protected | 52 | As a rough rule of thumb, any dereference of an RCU-protected |
54 | pointer must be covered by rcu_read_lock() or rcu_read_lock_bh() | 53 | pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(), |
55 | or by the appropriate update-side lock. | 54 | rcu_read_lock_sched(), or by the appropriate update-side lock. |
55 | Disabling of preemption can serve as rcu_read_lock_sched(), but | ||
56 | is less readable. | ||
56 | 57 | ||
57 | 3. Does the update code tolerate concurrent accesses? | 58 | 3. Does the update code tolerate concurrent accesses? |
58 | 59 | ||
@@ -62,25 +63,27 @@ over a rather long period of time, but improvements are always welcome! | |||
62 | of ways to handle this concurrency, depending on the situation: | 63 | of ways to handle this concurrency, depending on the situation: |
63 | 64 | ||
64 | a. Use the RCU variants of the list and hlist update | 65 | a. Use the RCU variants of the list and hlist update |
65 | primitives to add, remove, and replace elements on an | 66 | primitives to add, remove, and replace elements on |
66 | RCU-protected list. Alternatively, use the RCU-protected | 67 | an RCU-protected list. Alternatively, use the other |
67 | trees that have been added to the Linux kernel. | 68 | RCU-protected data structures that have been added to |
69 | the Linux kernel. | ||
68 | 70 | ||
69 | This is almost always the best approach. | 71 | This is almost always the best approach. |
70 | 72 | ||
71 | b. Proceed as in (a) above, but also maintain per-element | 73 | b. Proceed as in (a) above, but also maintain per-element |
72 | locks (that are acquired by both readers and writers) | 74 | locks (that are acquired by both readers and writers) |
73 | that guard per-element state. Of course, fields that | 75 | that guard per-element state. Of course, fields that |
74 | the readers refrain from accessing can be guarded by the | 76 | the readers refrain from accessing can be guarded by |
75 | update-side lock. | 77 | some other lock acquired only by updaters, if desired. |
76 | 78 | ||
77 | This works quite well, also. | 79 | This works quite well, also. |
78 | 80 | ||
79 | c. Make updates appear atomic to readers. For example, | 81 | c. Make updates appear atomic to readers. For example, |
80 | pointer updates to properly aligned fields will appear | 82 | pointer updates to properly aligned fields will |
81 | atomic, as will individual atomic primitives. Operations | 83 | appear atomic, as will individual atomic primitives. |
82 | performed under a lock and sequences of multiple atomic | 84 | Sequences of perations performed under a lock will -not- |
83 | primitives will -not- appear to be atomic. | 85 | appear to be atomic to RCU readers, nor will sequences |
86 | of multiple atomic primitives. | ||
84 | 87 | ||
85 | This can work, but is starting to get a bit tricky. | 88 | This can work, but is starting to get a bit tricky. |
86 | 89 | ||
@@ -98,9 +101,9 @@ over a rather long period of time, but improvements are always welcome! | |||
98 | a new structure containing updated values. | 101 | a new structure containing updated values. |
99 | 102 | ||
100 | 4. Weakly ordered CPUs pose special challenges. Almost all CPUs | 103 | 4. Weakly ordered CPUs pose special challenges. Almost all CPUs |
101 | are weakly ordered -- even i386 CPUs allow reads to be reordered. | 104 | are weakly ordered -- even x86 CPUs allow later loads to be |
102 | RCU code must take all of the following measures to prevent | 105 | reordered to precede earlier stores. RCU code must take all of |
103 | memory-corruption problems: | 106 | the following measures to prevent memory-corruption problems: |
104 | 107 | ||
105 | a. Readers must maintain proper ordering of their memory | 108 | a. Readers must maintain proper ordering of their memory |
106 | accesses. The rcu_dereference() primitive ensures that | 109 | accesses. The rcu_dereference() primitive ensures that |
@@ -113,14 +116,25 @@ over a rather long period of time, but improvements are always welcome! | |||
113 | The rcu_dereference() primitive is also an excellent | 116 | The rcu_dereference() primitive is also an excellent |
114 | documentation aid, letting the person reading the code | 117 | documentation aid, letting the person reading the code |
115 | know exactly which pointers are protected by RCU. | 118 | know exactly which pointers are protected by RCU. |
116 | 119 | Please note that compilers can also reorder code, and | |
117 | The rcu_dereference() primitive is used by the various | 120 | they are becoming increasingly aggressive about doing |
118 | "_rcu()" list-traversal primitives, such as the | 121 | just that. The rcu_dereference() primitive therefore |
119 | list_for_each_entry_rcu(). Note that it is perfectly | 122 | also prevents destructive compiler optimizations. |
120 | legal (if redundant) for update-side code to use | 123 | |
121 | rcu_dereference() and the "_rcu()" list-traversal | 124 | The rcu_dereference() primitive is used by the |
122 | primitives. This is particularly useful in code | 125 | various "_rcu()" list-traversal primitives, such |
123 | that is common to readers and updaters. | 126 | as the list_for_each_entry_rcu(). Note that it is |
127 | perfectly legal (if redundant) for update-side code to | ||
128 | use rcu_dereference() and the "_rcu()" list-traversal | ||
129 | primitives. This is particularly useful in code that | ||
130 | is common to readers and updaters. However, lockdep | ||
131 | will complain if you access rcu_dereference() outside | ||
132 | of an RCU read-side critical section. See lockdep.txt | ||
133 | to learn what to do about this. | ||
134 | |||
135 | Of course, neither rcu_dereference() nor the "_rcu()" | ||
136 | list-traversal primitives can substitute for a good | ||
137 | concurrency design coordinating among multiple updaters. | ||
124 | 138 | ||
125 | b. If the list macros are being used, the list_add_tail_rcu() | 139 | b. If the list macros are being used, the list_add_tail_rcu() |
126 | and list_add_rcu() primitives must be used in order | 140 | and list_add_rcu() primitives must be used in order |
@@ -135,11 +149,14 @@ over a rather long period of time, but improvements are always welcome! | |||
135 | readers. Similarly, if the hlist macros are being used, | 149 | readers. Similarly, if the hlist macros are being used, |
136 | the hlist_del_rcu() primitive is required. | 150 | the hlist_del_rcu() primitive is required. |
137 | 151 | ||
138 | The list_replace_rcu() primitive may be used to | 152 | The list_replace_rcu() and hlist_replace_rcu() primitives |
139 | replace an old structure with a new one in an | 153 | may be used to replace an old structure with a new one |
140 | RCU-protected list. | 154 | in their respective types of RCU-protected lists. |
155 | |||
156 | d. Rules similar to (4b) and (4c) apply to the "hlist_nulls" | ||
157 | type of RCU-protected linked lists. | ||
141 | 158 | ||
142 | d. Updates must ensure that initialization of a given | 159 | e. Updates must ensure that initialization of a given |
143 | structure happens before pointers to that structure are | 160 | structure happens before pointers to that structure are |
144 | publicized. Use the rcu_assign_pointer() primitive | 161 | publicized. Use the rcu_assign_pointer() primitive |
145 | when publicizing a pointer to a structure that can | 162 | when publicizing a pointer to a structure that can |
@@ -151,16 +168,31 @@ over a rather long period of time, but improvements are always welcome! | |||
151 | it cannot block. | 168 | it cannot block. |
152 | 169 | ||
153 | 6. Since synchronize_rcu() can block, it cannot be called from | 170 | 6. Since synchronize_rcu() can block, it cannot be called from |
154 | any sort of irq context. Ditto for synchronize_sched() and | 171 | any sort of irq context. The same rule applies for |
155 | synchronize_srcu(). | 172 | synchronize_rcu_bh(), synchronize_sched(), synchronize_srcu(), |
156 | 173 | synchronize_rcu_expedited(), synchronize_rcu_bh_expedited(), | |
157 | 7. If the updater uses call_rcu(), then the corresponding readers | 174 | synchronize_sched_expedite(), and synchronize_srcu_expedited(). |
158 | must use rcu_read_lock() and rcu_read_unlock(). If the updater | 175 | |
159 | uses call_rcu_bh(), then the corresponding readers must use | 176 | The expedited forms of these primitives have the same semantics |
160 | rcu_read_lock_bh() and rcu_read_unlock_bh(). If the updater | 177 | as the non-expedited forms, but expediting is both expensive |
161 | uses call_rcu_sched(), then the corresponding readers must | 178 | and unfriendly to real-time workloads. Use of the expedited |
162 | disable preemption. Mixing things up will result in confusion | 179 | primitives should be restricted to rare configuration-change |
163 | and broken kernels. | 180 | operations that would not normally be undertaken while a real-time |
181 | workload is running. | ||
182 | |||
183 | 7. If the updater uses call_rcu() or synchronize_rcu(), then the | ||
184 | corresponding readers must use rcu_read_lock() and | ||
185 | rcu_read_unlock(). If the updater uses call_rcu_bh() or | ||
186 | synchronize_rcu_bh(), then the corresponding readers must | ||
187 | use rcu_read_lock_bh() and rcu_read_unlock_bh(). If the | ||
188 | updater uses call_rcu_sched() or synchronize_sched(), then | ||
189 | the corresponding readers must disable preemption, possibly | ||
190 | by calling rcu_read_lock_sched() and rcu_read_unlock_sched(). | ||
191 | If the updater uses synchronize_srcu(), the the corresponding | ||
192 | readers must use srcu_read_lock() and srcu_read_unlock(), | ||
193 | and with the same srcu_struct. The rules for the expedited | ||
194 | primitives are the same as for their non-expedited counterparts. | ||
195 | Mixing things up will result in confusion and broken kernels. | ||
164 | 196 | ||
165 | One exception to this rule: rcu_read_lock() and rcu_read_unlock() | 197 | One exception to this rule: rcu_read_lock() and rcu_read_unlock() |
166 | may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh() | 198 | may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh() |
@@ -212,6 +244,8 @@ over a rather long period of time, but improvements are always welcome! | |||
212 | e. Periodically invoke synchronize_rcu(), permitting a limited | 244 | e. Periodically invoke synchronize_rcu(), permitting a limited |
213 | number of updates per grace period. | 245 | number of updates per grace period. |
214 | 246 | ||
247 | The same cautions apply to call_rcu_bh() and call_rcu_sched(). | ||
248 | |||
215 | 9. All RCU list-traversal primitives, which include | 249 | 9. All RCU list-traversal primitives, which include |
216 | rcu_dereference(), list_for_each_entry_rcu(), | 250 | rcu_dereference(), list_for_each_entry_rcu(), |
217 | list_for_each_continue_rcu(), and list_for_each_safe_rcu(), | 251 | list_for_each_continue_rcu(), and list_for_each_safe_rcu(), |
@@ -219,7 +253,9 @@ over a rather long period of time, but improvements are always welcome! | |||
219 | must be protected by appropriate update-side locks. RCU | 253 | must be protected by appropriate update-side locks. RCU |
220 | read-side critical sections are delimited by rcu_read_lock() | 254 | read-side critical sections are delimited by rcu_read_lock() |
221 | and rcu_read_unlock(), or by similar primitives such as | 255 | and rcu_read_unlock(), or by similar primitives such as |
222 | rcu_read_lock_bh() and rcu_read_unlock_bh(). | 256 | rcu_read_lock_bh() and rcu_read_unlock_bh(), in which case |
257 | the matching rcu_dereference() primitive must be used in order | ||
258 | to keep lockdep happy, in this case, rcu_dereference_bh(). | ||
223 | 259 | ||
224 | The reason that it is permissible to use RCU list-traversal | 260 | The reason that it is permissible to use RCU list-traversal |
225 | primitives when the update-side lock is held is that doing so | 261 | primitives when the update-side lock is held is that doing so |
@@ -229,7 +265,8 @@ over a rather long period of time, but improvements are always welcome! | |||
229 | 10. Conversely, if you are in an RCU read-side critical section, | 265 | 10. Conversely, if you are in an RCU read-side critical section, |
230 | and you don't hold the appropriate update-side lock, you -must- | 266 | and you don't hold the appropriate update-side lock, you -must- |
231 | use the "_rcu()" variants of the list macros. Failing to do so | 267 | use the "_rcu()" variants of the list macros. Failing to do so |
232 | will break Alpha and confuse people reading your code. | 268 | will break Alpha, cause aggressive compilers to generate bad code, |
269 | and confuse people trying to read your code. | ||
233 | 270 | ||
234 | 11. Note that synchronize_rcu() -only- guarantees to wait until | 271 | 11. Note that synchronize_rcu() -only- guarantees to wait until |
235 | all currently executing rcu_read_lock()-protected RCU read-side | 272 | all currently executing rcu_read_lock()-protected RCU read-side |
@@ -239,15 +276,21 @@ over a rather long period of time, but improvements are always welcome! | |||
239 | rcu_read_lock()-protected read-side critical sections, do -not- | 276 | rcu_read_lock()-protected read-side critical sections, do -not- |
240 | use synchronize_rcu(). | 277 | use synchronize_rcu(). |
241 | 278 | ||
242 | If you want to wait for some of these other things, you might | 279 | Similarly, disabling preemption is not an acceptable substitute |
243 | instead need to use synchronize_irq() or synchronize_sched(). | 280 | for rcu_read_lock(). Code that attempts to use preemption |
281 | disabling where it should be using rcu_read_lock() will break | ||
282 | in real-time kernel builds. | ||
283 | |||
284 | If you want to wait for interrupt handlers, NMI handlers, and | ||
285 | code under the influence of preempt_disable(), you instead | ||
286 | need to use synchronize_irq() or synchronize_sched(). | ||
244 | 287 | ||
245 | 12. Any lock acquired by an RCU callback must be acquired elsewhere | 288 | 12. Any lock acquired by an RCU callback must be acquired elsewhere |
246 | with softirq disabled, e.g., via spin_lock_irqsave(), | 289 | with softirq disabled, e.g., via spin_lock_irqsave(), |
247 | spin_lock_bh(), etc. Failing to disable irq on a given | 290 | spin_lock_bh(), etc. Failing to disable irq on a given |
248 | acquisition of that lock will result in deadlock as soon as the | 291 | acquisition of that lock will result in deadlock as soon as |
249 | RCU callback happens to interrupt that acquisition's critical | 292 | the RCU softirq handler happens to run your RCU callback while |
250 | section. | 293 | interrupting that acquisition's critical section. |
251 | 294 | ||
252 | 13. RCU callbacks can be and are executed in parallel. In many cases, | 295 | 13. RCU callbacks can be and are executed in parallel. In many cases, |
253 | the callback code simply wrappers around kfree(), so that this | 296 | the callback code simply wrappers around kfree(), so that this |
@@ -265,29 +308,30 @@ over a rather long period of time, but improvements are always welcome! | |||
265 | not the case, a self-spawning RCU callback would prevent the | 308 | not the case, a self-spawning RCU callback would prevent the |
266 | victim CPU from ever going offline.) | 309 | victim CPU from ever going offline.) |
267 | 310 | ||
268 | 14. SRCU (srcu_read_lock(), srcu_read_unlock(), and synchronize_srcu()) | 311 | 14. SRCU (srcu_read_lock(), srcu_read_unlock(), srcu_dereference(), |
269 | may only be invoked from process context. Unlike other forms of | 312 | synchronize_srcu(), and synchronize_srcu_expedited()) may only |
270 | RCU, it -is- permissible to block in an SRCU read-side critical | 313 | be invoked from process context. Unlike other forms of RCU, it |
271 | section (demarked by srcu_read_lock() and srcu_read_unlock()), | 314 | -is- permissible to block in an SRCU read-side critical section |
272 | hence the "SRCU": "sleepable RCU". Please note that if you | 315 | (demarked by srcu_read_lock() and srcu_read_unlock()), hence the |
273 | don't need to sleep in read-side critical sections, you should | 316 | "SRCU": "sleepable RCU". Please note that if you don't need |
274 | be using RCU rather than SRCU, because RCU is almost always | 317 | to sleep in read-side critical sections, you should be using |
275 | faster and easier to use than is SRCU. | 318 | RCU rather than SRCU, because RCU is almost always faster and |
319 | easier to use than is SRCU. | ||
276 | 320 | ||
277 | Also unlike other forms of RCU, explicit initialization | 321 | Also unlike other forms of RCU, explicit initialization |
278 | and cleanup is required via init_srcu_struct() and | 322 | and cleanup is required via init_srcu_struct() and |
279 | cleanup_srcu_struct(). These are passed a "struct srcu_struct" | 323 | cleanup_srcu_struct(). These are passed a "struct srcu_struct" |
280 | that defines the scope of a given SRCU domain. Once initialized, | 324 | that defines the scope of a given SRCU domain. Once initialized, |
281 | the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock() | 325 | the srcu_struct is passed to srcu_read_lock(), srcu_read_unlock() |
282 | and synchronize_srcu(). A given synchronize_srcu() waits only | 326 | synchronize_srcu(), and synchronize_srcu_expedited(). A given |
283 | for SRCU read-side critical sections governed by srcu_read_lock() | 327 | synchronize_srcu() waits only for SRCU read-side critical |
284 | and srcu_read_unlock() calls that have been passd the same | 328 | sections governed by srcu_read_lock() and srcu_read_unlock() |
285 | srcu_struct. This property is what makes sleeping read-side | 329 | calls that have been passed the same srcu_struct. This property |
286 | critical sections tolerable -- a given subsystem delays only | 330 | is what makes sleeping read-side critical sections tolerable -- |
287 | its own updates, not those of other subsystems using SRCU. | 331 | a given subsystem delays only its own updates, not those of other |
288 | Therefore, SRCU is less prone to OOM the system than RCU would | 332 | subsystems using SRCU. Therefore, SRCU is less prone to OOM the |
289 | be if RCU's read-side critical sections were permitted to | 333 | system than RCU would be if RCU's read-side critical sections |
290 | sleep. | 334 | were permitted to sleep. |
291 | 335 | ||
292 | The ability to sleep in read-side critical sections does not | 336 | The ability to sleep in read-side critical sections does not |
293 | come for free. First, corresponding srcu_read_lock() and | 337 | come for free. First, corresponding srcu_read_lock() and |
@@ -311,12 +355,12 @@ over a rather long period of time, but improvements are always welcome! | |||
311 | destructive operation, and -only- -then- invoke call_rcu(), | 355 | destructive operation, and -only- -then- invoke call_rcu(), |
312 | synchronize_rcu(), or friends. | 356 | synchronize_rcu(), or friends. |
313 | 357 | ||
314 | Because these primitives only wait for pre-existing readers, | 358 | Because these primitives only wait for pre-existing readers, it |
315 | it is the caller's responsibility to guarantee safety to | 359 | is the caller's responsibility to guarantee that any subsequent |
316 | any subsequent readers. | 360 | readers will execute safely. |
317 | 361 | ||
318 | 16. The various RCU read-side primitives do -not- contain memory | 362 | 16. The various RCU read-side primitives do -not- necessarily contain |
319 | barriers. The CPU (and in some cases, the compiler) is free | 363 | memory barriers. You should therefore plan for the CPU |
320 | to reorder code into and out of RCU read-side critical sections. | 364 | and the compiler to freely reorder code into and out of RCU |
321 | It is the responsibility of the RCU update-side primitives to | 365 | read-side critical sections. It is the responsibility of the |
322 | deal with this. | 366 | RCU update-side primitives to deal with this. |
diff --git a/Documentation/RCU/lockdep.txt b/Documentation/RCU/lockdep.txt new file mode 100644 index 000000000000..fe24b58627bd --- /dev/null +++ b/Documentation/RCU/lockdep.txt | |||
@@ -0,0 +1,67 @@ | |||
1 | RCU and lockdep checking | ||
2 | |||
3 | All flavors of RCU have lockdep checking available, so that lockdep is | ||
4 | aware of when each task enters and leaves any flavor of RCU read-side | ||
5 | critical section. Each flavor of RCU is tracked separately (but note | ||
6 | that this is not the case in 2.6.32 and earlier). This allows lockdep's | ||
7 | tracking to include RCU state, which can sometimes help when debugging | ||
8 | deadlocks and the like. | ||
9 | |||
10 | In addition, RCU provides the following primitives that check lockdep's | ||
11 | state: | ||
12 | |||
13 | rcu_read_lock_held() for normal RCU. | ||
14 | rcu_read_lock_bh_held() for RCU-bh. | ||
15 | rcu_read_lock_sched_held() for RCU-sched. | ||
16 | srcu_read_lock_held() for SRCU. | ||
17 | |||
18 | These functions are conservative, and will therefore return 1 if they | ||
19 | aren't certain (for example, if CONFIG_DEBUG_LOCK_ALLOC is not set). | ||
20 | This prevents things like WARN_ON(!rcu_read_lock_held()) from giving false | ||
21 | positives when lockdep is disabled. | ||
22 | |||
23 | In addition, a separate kernel config parameter CONFIG_PROVE_RCU enables | ||
24 | checking of rcu_dereference() primitives: | ||
25 | |||
26 | rcu_dereference(p): | ||
27 | Check for RCU read-side critical section. | ||
28 | rcu_dereference_bh(p): | ||
29 | Check for RCU-bh read-side critical section. | ||
30 | rcu_dereference_sched(p): | ||
31 | Check for RCU-sched read-side critical section. | ||
32 | srcu_dereference(p, sp): | ||
33 | Check for SRCU read-side critical section. | ||
34 | rcu_dereference_check(p, c): | ||
35 | Use explicit check expression "c". | ||
36 | rcu_dereference_raw(p) | ||
37 | Don't check. (Use sparingly, if at all.) | ||
38 | |||
39 | The rcu_dereference_check() check expression can be any boolean | ||
40 | expression, but would normally include one of the rcu_read_lock_held() | ||
41 | family of functions and a lockdep expression. However, any boolean | ||
42 | expression can be used. For a moderately ornate example, consider | ||
43 | the following: | ||
44 | |||
45 | file = rcu_dereference_check(fdt->fd[fd], | ||
46 | rcu_read_lock_held() || | ||
47 | lockdep_is_held(&files->file_lock) || | ||
48 | atomic_read(&files->count) == 1); | ||
49 | |||
50 | This expression picks up the pointer "fdt->fd[fd]" in an RCU-safe manner, | ||
51 | and, if CONFIG_PROVE_RCU is configured, verifies that this expression | ||
52 | is used in: | ||
53 | |||
54 | 1. An RCU read-side critical section, or | ||
55 | 2. with files->file_lock held, or | ||
56 | 3. on an unshared files_struct. | ||
57 | |||
58 | In case (1), the pointer is picked up in an RCU-safe manner for vanilla | ||
59 | RCU read-side critical sections, in case (2) the ->file_lock prevents | ||
60 | any change from taking place, and finally, in case (3) the current task | ||
61 | is the only task accessing the file_struct, again preventing any change | ||
62 | from taking place. | ||
63 | |||
64 | There are currently only "universal" versions of the rcu_assign_pointer() | ||
65 | and RCU list-/tree-traversal primitives, which do not (yet) check for | ||
66 | being in an RCU read-side critical section. In the future, separate | ||
67 | versions of these primitives might be created. | ||
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt index 2a23523ce471..31852705b586 100644 --- a/Documentation/RCU/rcu.txt +++ b/Documentation/RCU/rcu.txt | |||
@@ -75,6 +75,8 @@ o I hear that RCU is patented? What is with that? | |||
75 | search for the string "Patent" in RTFP.txt to find them. | 75 | search for the string "Patent" in RTFP.txt to find them. |
76 | Of these, one was allowed to lapse by the assignee, and the | 76 | Of these, one was allowed to lapse by the assignee, and the |
77 | others have been contributed to the Linux kernel under GPL. | 77 | others have been contributed to the Linux kernel under GPL. |
78 | There are now also LGPL implementations of user-level RCU | ||
79 | available (http://lttng.org/?q=node/18). | ||
78 | 80 | ||
79 | o I hear that RCU needs work in order to support realtime kernels? | 81 | o I hear that RCU needs work in order to support realtime kernels? |
80 | 82 | ||
@@ -91,48 +93,4 @@ o Where can I find more information on RCU? | |||
91 | 93 | ||
92 | o What are all these files in this directory? | 94 | o What are all these files in this directory? |
93 | 95 | ||
94 | 96 | See 00-INDEX for the list. | |
95 | NMI-RCU.txt | ||
96 | |||
97 | Describes how to use RCU to implement dynamic | ||
98 | NMI handlers, which can be revectored on the fly, | ||
99 | without rebooting. | ||
100 | |||
101 | RTFP.txt | ||
102 | |||
103 | List of RCU-related publications and web sites. | ||
104 | |||
105 | UP.txt | ||
106 | |||
107 | Discussion of RCU usage in UP kernels. | ||
108 | |||
109 | arrayRCU.txt | ||
110 | |||
111 | Describes how to use RCU to protect arrays, with | ||
112 | resizeable arrays whose elements reference other | ||
113 | data structures being of the most interest. | ||
114 | |||
115 | checklist.txt | ||
116 | |||
117 | Lists things to check for when inspecting code that | ||
118 | uses RCU. | ||
119 | |||
120 | listRCU.txt | ||
121 | |||
122 | Describes how to use RCU to protect linked lists. | ||
123 | This is the simplest and most common use of RCU | ||
124 | in the Linux kernel. | ||
125 | |||
126 | rcu.txt | ||
127 | |||
128 | You are reading it! | ||
129 | |||
130 | rcuref.txt | ||
131 | |||
132 | Describes how to combine use of reference counts | ||
133 | with RCU. | ||
134 | |||
135 | whatisRCU.txt | ||
136 | |||
137 | Overview of how the RCU implementation works. Along | ||
138 | the way, presents a conceptual view of RCU. | ||
diff --git a/Documentation/RCU/stallwarn.txt b/Documentation/RCU/stallwarn.txt new file mode 100644 index 000000000000..1423d2570d78 --- /dev/null +++ b/Documentation/RCU/stallwarn.txt | |||
@@ -0,0 +1,58 @@ | |||
1 | Using RCU's CPU Stall Detector | ||
2 | |||
3 | The CONFIG_RCU_CPU_STALL_DETECTOR kernel config parameter enables | ||
4 | RCU's CPU stall detector, which detects conditions that unduly delay | ||
5 | RCU grace periods. The stall detector's idea of what constitutes | ||
6 | "unduly delayed" is controlled by a pair of C preprocessor macros: | ||
7 | |||
8 | RCU_SECONDS_TILL_STALL_CHECK | ||
9 | |||
10 | This macro defines the period of time that RCU will wait from | ||
11 | the beginning of a grace period until it issues an RCU CPU | ||
12 | stall warning. It is normally ten seconds. | ||
13 | |||
14 | RCU_SECONDS_TILL_STALL_RECHECK | ||
15 | |||
16 | This macro defines the period of time that RCU will wait after | ||
17 | issuing a stall warning until it issues another stall warning. | ||
18 | It is normally set to thirty seconds. | ||
19 | |||
20 | RCU_STALL_RAT_DELAY | ||
21 | |||
22 | The CPU stall detector tries to make the offending CPU rat on itself, | ||
23 | as this often gives better-quality stack traces. However, if | ||
24 | the offending CPU does not detect its own stall in the number | ||
25 | of jiffies specified by RCU_STALL_RAT_DELAY, then other CPUs will | ||
26 | complain. This is normally set to two jiffies. | ||
27 | |||
28 | The following problems can result in an RCU CPU stall warning: | ||
29 | |||
30 | o A CPU looping in an RCU read-side critical section. | ||
31 | |||
32 | o A CPU looping with interrupts disabled. | ||
33 | |||
34 | o A CPU looping with preemption disabled. | ||
35 | |||
36 | o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel | ||
37 | without invoking schedule(). | ||
38 | |||
39 | o A bug in the RCU implementation. | ||
40 | |||
41 | o A hardware failure. This is quite unlikely, but has occurred | ||
42 | at least once in a former life. A CPU failed in a running system, | ||
43 | becoming unresponsive, but not causing an immediate crash. | ||
44 | This resulted in a series of RCU CPU stall warnings, eventually | ||
45 | leading the realization that the CPU had failed. | ||
46 | |||
47 | The RCU, RCU-sched, and RCU-bh implementations have CPU stall warning. | ||
48 | SRCU does not do so directly, but its calls to synchronize_sched() will | ||
49 | result in RCU-sched detecting any CPU stalls that might be occurring. | ||
50 | |||
51 | To diagnose the cause of the stall, inspect the stack traces. The offending | ||
52 | function will usually be near the top of the stack. If you have a series | ||
53 | of stall warnings from a single extended stall, comparing the stack traces | ||
54 | can often help determine where the stall is occurring, which will usually | ||
55 | be in the function nearest the top of the stack that stays the same from | ||
56 | trace to trace. | ||
57 | |||
58 | RCU bugs can often be debugged with the help of CONFIG_RCU_TRACE. | ||
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index 9dba3bb90e60..0e50bc2aa1e2 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt | |||
@@ -30,6 +30,18 @@ MODULE PARAMETERS | |||
30 | 30 | ||
31 | This module has the following parameters: | 31 | This module has the following parameters: |
32 | 32 | ||
33 | fqs_duration Duration (in microseconds) of artificially induced bursts | ||
34 | of force_quiescent_state() invocations. In RCU | ||
35 | implementations having force_quiescent_state(), these | ||
36 | bursts help force races between forcing a given grace | ||
37 | period and that grace period ending on its own. | ||
38 | |||
39 | fqs_holdoff Holdoff time (in microseconds) between consecutive calls | ||
40 | to force_quiescent_state() within a burst. | ||
41 | |||
42 | fqs_stutter Wait time (in seconds) between consecutive bursts | ||
43 | of calls to force_quiescent_state(). | ||
44 | |||
33 | irqreaders Says to invoke RCU readers from irq level. This is currently | 45 | irqreaders Says to invoke RCU readers from irq level. This is currently |
34 | done via timers. Defaults to "1" for variants of RCU that | 46 | done via timers. Defaults to "1" for variants of RCU that |
35 | permit this. (Or, more accurately, variants of RCU that do | 47 | permit this. (Or, more accurately, variants of RCU that do |
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index d542ca243b80..1dc00ee97163 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt | |||
@@ -323,14 +323,17 @@ used as follows: | |||
323 | Defer Protect | 323 | Defer Protect |
324 | 324 | ||
325 | a. synchronize_rcu() rcu_read_lock() / rcu_read_unlock() | 325 | a. synchronize_rcu() rcu_read_lock() / rcu_read_unlock() |
326 | call_rcu() | 326 | call_rcu() rcu_dereference() |
327 | 327 | ||
328 | b. call_rcu_bh() rcu_read_lock_bh() / rcu_read_unlock_bh() | 328 | b. call_rcu_bh() rcu_read_lock_bh() / rcu_read_unlock_bh() |
329 | rcu_dereference_bh() | ||
329 | 330 | ||
330 | c. synchronize_sched() preempt_disable() / preempt_enable() | 331 | c. synchronize_sched() rcu_read_lock_sched() / rcu_read_unlock_sched() |
332 | preempt_disable() / preempt_enable() | ||
331 | local_irq_save() / local_irq_restore() | 333 | local_irq_save() / local_irq_restore() |
332 | hardirq enter / hardirq exit | 334 | hardirq enter / hardirq exit |
333 | NMI enter / NMI exit | 335 | NMI enter / NMI exit |
336 | rcu_dereference_sched() | ||
334 | 337 | ||
335 | These three mechanisms are used as follows: | 338 | These three mechanisms are used as follows: |
336 | 339 | ||
@@ -780,9 +783,8 @@ Linux-kernel source code, but it helps to have a full list of the | |||
780 | APIs, since there does not appear to be a way to categorize them | 783 | APIs, since there does not appear to be a way to categorize them |
781 | in docbook. Here is the list, by category. | 784 | in docbook. Here is the list, by category. |
782 | 785 | ||
783 | RCU pointer/list traversal: | 786 | RCU list traversal: |
784 | 787 | ||
785 | rcu_dereference | ||
786 | list_for_each_entry_rcu | 788 | list_for_each_entry_rcu |
787 | hlist_for_each_entry_rcu | 789 | hlist_for_each_entry_rcu |
788 | hlist_nulls_for_each_entry_rcu | 790 | hlist_nulls_for_each_entry_rcu |
@@ -808,7 +810,7 @@ RCU: Critical sections Grace period Barrier | |||
808 | 810 | ||
809 | rcu_read_lock synchronize_net rcu_barrier | 811 | rcu_read_lock synchronize_net rcu_barrier |
810 | rcu_read_unlock synchronize_rcu | 812 | rcu_read_unlock synchronize_rcu |
811 | synchronize_rcu_expedited | 813 | rcu_dereference synchronize_rcu_expedited |
812 | call_rcu | 814 | call_rcu |
813 | 815 | ||
814 | 816 | ||
@@ -816,7 +818,7 @@ bh: Critical sections Grace period Barrier | |||
816 | 818 | ||
817 | rcu_read_lock_bh call_rcu_bh rcu_barrier_bh | 819 | rcu_read_lock_bh call_rcu_bh rcu_barrier_bh |
818 | rcu_read_unlock_bh synchronize_rcu_bh | 820 | rcu_read_unlock_bh synchronize_rcu_bh |
819 | synchronize_rcu_bh_expedited | 821 | rcu_dereference_bh synchronize_rcu_bh_expedited |
820 | 822 | ||
821 | 823 | ||
822 | sched: Critical sections Grace period Barrier | 824 | sched: Critical sections Grace period Barrier |
@@ -825,12 +827,14 @@ sched: Critical sections Grace period Barrier | |||
825 | rcu_read_unlock_sched call_rcu_sched | 827 | rcu_read_unlock_sched call_rcu_sched |
826 | [preempt_disable] synchronize_sched_expedited | 828 | [preempt_disable] synchronize_sched_expedited |
827 | [and friends] | 829 | [and friends] |
830 | rcu_dereference_sched | ||
828 | 831 | ||
829 | 832 | ||
830 | SRCU: Critical sections Grace period Barrier | 833 | SRCU: Critical sections Grace period Barrier |
831 | 834 | ||
832 | srcu_read_lock synchronize_srcu N/A | 835 | srcu_read_lock synchronize_srcu N/A |
833 | srcu_read_unlock synchronize_srcu_expedited | 836 | srcu_read_unlock synchronize_srcu_expedited |
837 | srcu_dereference | ||
834 | 838 | ||
835 | SRCU: Initialization/cleanup | 839 | SRCU: Initialization/cleanup |
836 | init_srcu_struct | 840 | init_srcu_struct |
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist index 1053a56be3b1..8916ca48bc95 100644 --- a/Documentation/SubmitChecklist +++ b/Documentation/SubmitChecklist | |||
@@ -9,10 +9,14 @@ Documentation/SubmittingPatches and elsewhere regarding submitting Linux | |||
9 | kernel patches. | 9 | kernel patches. |
10 | 10 | ||
11 | 11 | ||
12 | 1: Builds cleanly with applicable or modified CONFIG options =y, =m, and | 12 | 1: If you use a facility then #include the file that defines/declares |
13 | that facility. Don't depend on other header files pulling in ones | ||
14 | that you use. | ||
15 | |||
16 | 2: Builds cleanly with applicable or modified CONFIG options =y, =m, and | ||
13 | =n. No gcc warnings/errors, no linker warnings/errors. | 17 | =n. No gcc warnings/errors, no linker warnings/errors. |
14 | 18 | ||
15 | 2: Passes allnoconfig, allmodconfig | 19 | 2b: Passes allnoconfig, allmodconfig |
16 | 20 | ||
17 | 3: Builds on multiple CPU architectures by using local cross-compile tools | 21 | 3: Builds on multiple CPU architectures by using local cross-compile tools |
18 | or some other build farm. | 22 | or some other build farm. |
diff --git a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt index 76b3a11e90be..fa968aa99d67 100644 --- a/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt +++ b/Documentation/arm/Samsung-S3C24XX/CPUfreq.txt | |||
@@ -14,8 +14,8 @@ Introduction | |||
14 | how the clocks are arranged. The first implementation used as single | 14 | how the clocks are arranged. The first implementation used as single |
15 | PLL to feed the ARM, memory and peripherals via a series of dividers | 15 | PLL to feed the ARM, memory and peripherals via a series of dividers |
16 | and muxes and this is the implementation that is documented here. A | 16 | and muxes and this is the implementation that is documented here. A |
17 | newer version where there is a seperate PLL and clock divider for the | 17 | newer version where there is a separate PLL and clock divider for the |
18 | ARM core is available as a seperate driver. | 18 | ARM core is available as a separate driver. |
19 | 19 | ||
20 | 20 | ||
21 | Layout | 21 | Layout |
diff --git a/Documentation/arm/Samsung/Overview.txt b/Documentation/arm/Samsung/Overview.txt new file mode 100644 index 000000000000..7cced1fea9c3 --- /dev/null +++ b/Documentation/arm/Samsung/Overview.txt | |||
@@ -0,0 +1,86 @@ | |||
1 | Samsung ARM Linux Overview | ||
2 | ========================== | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | The Samsung range of ARM SoCs spans many similar devices, from the initial | ||
8 | ARM9 through to the newest ARM cores. This document shows an overview of | ||
9 | the current kernel support, how to use it and where to find the code | ||
10 | that supports this. | ||
11 | |||
12 | The currently supported SoCs are: | ||
13 | |||
14 | - S3C24XX: See Documentation/arm/Samsung-S3C24XX/Overview.txt for full list | ||
15 | - S3C64XX: S3C6400 and S3C6410 | ||
16 | - S5PC6440 | ||
17 | |||
18 | S5PC100 and S5PC110 support is currently being merged | ||
19 | |||
20 | |||
21 | S3C24XX Systems | ||
22 | --------------- | ||
23 | |||
24 | There is still documentation in Documnetation/arm/Samsung-S3C24XX/ which | ||
25 | deals with the architecture and drivers specific to these devices. | ||
26 | |||
27 | See Documentation/arm/Samsung-S3C24XX/Overview.txt for more information | ||
28 | on the implementation details and specific support. | ||
29 | |||
30 | |||
31 | Configuration | ||
32 | ------------- | ||
33 | |||
34 | A number of configurations are supplied, as there is no current way of | ||
35 | unifying all the SoCs into one kernel. | ||
36 | |||
37 | s5p6440_defconfig - S5P6440 specific default configuration | ||
38 | s5pc100_defconfig - S5PC100 specific default configuration | ||
39 | |||
40 | |||
41 | Layout | ||
42 | ------ | ||
43 | |||
44 | The directory layout is currently being restructured, and consists of | ||
45 | several platform directories and then the machine specific directories | ||
46 | of the CPUs being built for. | ||
47 | |||
48 | plat-samsung provides the base for all the implementations, and is the | ||
49 | last in the line of include directories that are processed for the build | ||
50 | specific information. It contains the base clock, GPIO and device definitions | ||
51 | to get the system running. | ||
52 | |||
53 | plat-s3c is the s3c24xx/s3c64xx platform directory, although it is currently | ||
54 | involved in other builds this will be phased out once the relevant code is | ||
55 | moved elsewhere. | ||
56 | |||
57 | plat-s3c24xx is for s3c24xx specific builds, see the S3C24XX docs. | ||
58 | |||
59 | plat-s3c64xx is for the s3c64xx specific bits, see the S3C24XX docs. | ||
60 | |||
61 | plat-s5p is for s5p specific builds, more to be added. | ||
62 | |||
63 | |||
64 | [ to finish ] | ||
65 | |||
66 | |||
67 | Port Contributors | ||
68 | ----------------- | ||
69 | |||
70 | Ben Dooks (BJD) | ||
71 | Vincent Sanders | ||
72 | Herbert Potzl | ||
73 | Arnaud Patard (RTP) | ||
74 | Roc Wu | ||
75 | Klaus Fetscher | ||
76 | Dimitry Andric | ||
77 | Shannon Holland | ||
78 | Guillaume Gourat (NexVision) | ||
79 | Christer Weinigel (wingel) (Acer N30) | ||
80 | Lucas Correia Villa Real (S3C2400 port) | ||
81 | |||
82 | |||
83 | Document Author | ||
84 | --------------- | ||
85 | |||
86 | Copyright 2009-2010 Ben Dooks <ben-linux@fluff.org> | ||
diff --git a/Documentation/arm/Samsung/clksrc-change-registers.awk b/Documentation/arm/Samsung/clksrc-change-registers.awk new file mode 100755 index 000000000000..0c50220851fb --- /dev/null +++ b/Documentation/arm/Samsung/clksrc-change-registers.awk | |||
@@ -0,0 +1,167 @@ | |||
1 | #!/usr/bin/awk -f | ||
2 | # | ||
3 | # Copyright 2010 Ben Dooks <ben-linux@fluff.org> | ||
4 | # | ||
5 | # Released under GPLv2 | ||
6 | |||
7 | # example usage | ||
8 | # ./clksrc-change-registers.awk arch/arm/plat-s5pc1xx/include/plat/regs-clock.h < src > dst | ||
9 | |||
10 | function extract_value(s) | ||
11 | { | ||
12 | eqat = index(s, "=") | ||
13 | comat = index(s, ",") | ||
14 | return substr(s, eqat+2, (comat-eqat)-2) | ||
15 | } | ||
16 | |||
17 | function remove_brackets(b) | ||
18 | { | ||
19 | return substr(b, 2, length(b)-2) | ||
20 | } | ||
21 | |||
22 | function splitdefine(l, p) | ||
23 | { | ||
24 | r = split(l, tp) | ||
25 | |||
26 | p[0] = tp[2] | ||
27 | p[1] = remove_brackets(tp[3]) | ||
28 | } | ||
29 | |||
30 | function find_length(f) | ||
31 | { | ||
32 | if (0) | ||
33 | printf "find_length " f "\n" > "/dev/stderr" | ||
34 | |||
35 | if (f ~ /0x1/) | ||
36 | return 1 | ||
37 | else if (f ~ /0x3/) | ||
38 | return 2 | ||
39 | else if (f ~ /0x7/) | ||
40 | return 3 | ||
41 | else if (f ~ /0xf/) | ||
42 | return 4 | ||
43 | |||
44 | printf "unknown legnth " f "\n" > "/dev/stderr" | ||
45 | exit | ||
46 | } | ||
47 | |||
48 | function find_shift(s) | ||
49 | { | ||
50 | id = index(s, "<") | ||
51 | if (id <= 0) { | ||
52 | printf "cannot find shift " s "\n" > "/dev/stderr" | ||
53 | exit | ||
54 | } | ||
55 | |||
56 | return substr(s, id+2) | ||
57 | } | ||
58 | |||
59 | |||
60 | BEGIN { | ||
61 | if (ARGC < 2) { | ||
62 | print "too few arguments" > "/dev/stderr" | ||
63 | exit | ||
64 | } | ||
65 | |||
66 | # read the header file and find the mask values that we will need | ||
67 | # to replace and create an associative array of values | ||
68 | |||
69 | while (getline line < ARGV[1] > 0) { | ||
70 | if (line ~ /\#define.*_MASK/ && | ||
71 | !(line ~ /S5PC100_EPLL_MASK/) && | ||
72 | !(line ~ /USB_SIG_MASK/)) { | ||
73 | splitdefine(line, fields) | ||
74 | name = fields[0] | ||
75 | if (0) | ||
76 | printf "MASK " line "\n" > "/dev/stderr" | ||
77 | dmask[name,0] = find_length(fields[1]) | ||
78 | dmask[name,1] = find_shift(fields[1]) | ||
79 | if (0) | ||
80 | printf "=> '" name "' LENGTH=" dmask[name,0] " SHIFT=" dmask[name,1] "\n" > "/dev/stderr" | ||
81 | } else { | ||
82 | } | ||
83 | } | ||
84 | |||
85 | delete ARGV[1] | ||
86 | } | ||
87 | |||
88 | /clksrc_clk.*=.*{/ { | ||
89 | shift="" | ||
90 | mask="" | ||
91 | divshift="" | ||
92 | reg_div="" | ||
93 | reg_src="" | ||
94 | indent=1 | ||
95 | |||
96 | print $0 | ||
97 | |||
98 | for(; indent >= 1;) { | ||
99 | if ((getline line) <= 0) { | ||
100 | printf "unexpected end of file" > "/dev/stderr" | ||
101 | exit 1; | ||
102 | } | ||
103 | |||
104 | if (line ~ /\.shift/) { | ||
105 | shift = extract_value(line) | ||
106 | } else if (line ~ /\.mask/) { | ||
107 | mask = extract_value(line) | ||
108 | } else if (line ~ /\.reg_divider/) { | ||
109 | reg_div = extract_value(line) | ||
110 | } else if (line ~ /\.reg_source/) { | ||
111 | reg_src = extract_value(line) | ||
112 | } else if (line ~ /\.divider_shift/) { | ||
113 | divshift = extract_value(line) | ||
114 | } else if (line ~ /{/) { | ||
115 | indent++ | ||
116 | print line | ||
117 | } else if (line ~ /}/) { | ||
118 | indent-- | ||
119 | |||
120 | if (indent == 0) { | ||
121 | if (0) { | ||
122 | printf "shift '" shift "' ='" dmask[shift,0] "'\n" > "/dev/stderr" | ||
123 | printf "mask '" mask "'\n" > "/dev/stderr" | ||
124 | printf "dshft '" divshift "'\n" > "/dev/stderr" | ||
125 | printf "rdiv '" reg_div "'\n" > "/dev/stderr" | ||
126 | printf "rsrc '" reg_src "'\n" > "/dev/stderr" | ||
127 | } | ||
128 | |||
129 | generated = mask | ||
130 | sub(reg_src, reg_div, generated) | ||
131 | |||
132 | if (0) { | ||
133 | printf "/* rsrc " reg_src " */\n" | ||
134 | printf "/* rdiv " reg_div " */\n" | ||
135 | printf "/* shift " shift " */\n" | ||
136 | printf "/* mask " mask " */\n" | ||
137 | printf "/* generated " generated " */\n" | ||
138 | } | ||
139 | |||
140 | if (reg_div != "") { | ||
141 | printf "\t.reg_div = { " | ||
142 | printf ".reg = " reg_div ", " | ||
143 | printf ".shift = " dmask[generated,1] ", " | ||
144 | printf ".size = " dmask[generated,0] ", " | ||
145 | printf "},\n" | ||
146 | } | ||
147 | |||
148 | printf "\t.reg_src = { " | ||
149 | printf ".reg = " reg_src ", " | ||
150 | printf ".shift = " dmask[mask,1] ", " | ||
151 | printf ".size = " dmask[mask,0] ", " | ||
152 | |||
153 | printf "},\n" | ||
154 | |||
155 | } | ||
156 | |||
157 | print line | ||
158 | } else { | ||
159 | print line | ||
160 | } | ||
161 | |||
162 | if (0) | ||
163 | printf indent ":" line "\n" > "/dev/stderr" | ||
164 | } | ||
165 | } | ||
166 | |||
167 | // && ! /clksrc_clk.*=.*{/ { print $0 } | ||
diff --git a/Documentation/arm/memory.txt b/Documentation/arm/memory.txt index 9d58c7c5eddd..eb0fae18ffb1 100644 --- a/Documentation/arm/memory.txt +++ b/Documentation/arm/memory.txt | |||
@@ -59,7 +59,11 @@ PAGE_OFFSET high_memory-1 Kernel direct-mapped RAM region. | |||
59 | This maps the platforms RAM, and typically | 59 | This maps the platforms RAM, and typically |
60 | maps all platform RAM in a 1:1 relationship. | 60 | maps all platform RAM in a 1:1 relationship. |
61 | 61 | ||
62 | TASK_SIZE PAGE_OFFSET-1 Kernel module space | 62 | PKMAP_BASE PAGE_OFFSET-1 Permanent kernel mappings |
63 | One way of mapping HIGHMEM pages into kernel | ||
64 | space. | ||
65 | |||
66 | MODULES_VADDR MODULES_END-1 Kernel module space | ||
63 | Kernel modules inserted via insmod are | 67 | Kernel modules inserted via insmod are |
64 | placed here using dynamic mappings. | 68 | placed here using dynamic mappings. |
65 | 69 | ||
diff --git a/Documentation/block/queue-sysfs.txt b/Documentation/block/queue-sysfs.txt index e164403f60e1..f65274081c8d 100644 --- a/Documentation/block/queue-sysfs.txt +++ b/Documentation/block/queue-sysfs.txt | |||
@@ -25,11 +25,11 @@ size allowed by the hardware. | |||
25 | 25 | ||
26 | nomerges (RW) | 26 | nomerges (RW) |
27 | ------------- | 27 | ------------- |
28 | This enables the user to disable the lookup logic involved with IO merging | 28 | This enables the user to disable the lookup logic involved with IO |
29 | requests in the block layer. Merging may still occur through a direct | 29 | merging requests in the block layer. By default (0) all merges are |
30 | 1-hit cache, since that comes for (almost) free. The IO scheduler will not | 30 | enabled. When set to 1 only simple one-hit merges will be tried. When |
31 | waste cycles doing tree/hash lookups for merges if nomerges is 1. Defaults | 31 | set to 2 no merge algorithms will be tried (including one-hit or more |
32 | to 0, enabling all merges. | 32 | complex tree/hash lookups). |
33 | 33 | ||
34 | nr_requests (RW) | 34 | nr_requests (RW) |
35 | ---------------- | 35 | ---------------- |
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt index da42ab414c48..2b5f823abd03 100644 --- a/Documentation/cachetlb.txt +++ b/Documentation/cachetlb.txt | |||
@@ -88,12 +88,12 @@ changes occur: | |||
88 | This is used primarily during fault processing. | 88 | This is used primarily during fault processing. |
89 | 89 | ||
90 | 5) void update_mmu_cache(struct vm_area_struct *vma, | 90 | 5) void update_mmu_cache(struct vm_area_struct *vma, |
91 | unsigned long address, pte_t pte) | 91 | unsigned long address, pte_t *ptep) |
92 | 92 | ||
93 | At the end of every page fault, this routine is invoked to | 93 | At the end of every page fault, this routine is invoked to |
94 | tell the architecture specific code that a translation | 94 | tell the architecture specific code that a translation |
95 | described by "pte" now exists at virtual address "address" | 95 | now exists at virtual address "address" for address space |
96 | for address space "vma->vm_mm", in the software page tables. | 96 | "vma->vm_mm", in the software page tables. |
97 | 97 | ||
98 | A port may use this information in any way it so chooses. | 98 | A port may use this information in any way it so chooses. |
99 | For example, it could use this event to pre-load TLB | 99 | For example, it could use this event to pre-load TLB |
@@ -377,3 +377,27 @@ maps this page at its virtual address. | |||
377 | All the functionality of flush_icache_page can be implemented in | 377 | All the functionality of flush_icache_page can be implemented in |
378 | flush_dcache_page and update_mmu_cache. In 2.7 the hope is to | 378 | flush_dcache_page and update_mmu_cache. In 2.7 the hope is to |
379 | remove this interface completely. | 379 | remove this interface completely. |
380 | |||
381 | The final category of APIs is for I/O to deliberately aliased address | ||
382 | ranges inside the kernel. Such aliases are set up by use of the | ||
383 | vmap/vmalloc API. Since kernel I/O goes via physical pages, the I/O | ||
384 | subsystem assumes that the user mapping and kernel offset mapping are | ||
385 | the only aliases. This isn't true for vmap aliases, so anything in | ||
386 | the kernel trying to do I/O to vmap areas must manually manage | ||
387 | coherency. It must do this by flushing the vmap range before doing | ||
388 | I/O and invalidating it after the I/O returns. | ||
389 | |||
390 | void flush_kernel_vmap_range(void *vaddr, int size) | ||
391 | flushes the kernel cache for a given virtual address range in | ||
392 | the vmap area. This is to make sure that any data the kernel | ||
393 | modified in the vmap range is made visible to the physical | ||
394 | page. The design is to make this area safe to perform I/O on. | ||
395 | Note that this API does *not* also flush the offset map alias | ||
396 | of the area. | ||
397 | |||
398 | void invalidate_kernel_vmap_range(void *vaddr, int size) invalidates | ||
399 | the cache for a given virtual address range in the vmap area | ||
400 | which prevents the processor from making the cache stale by | ||
401 | speculatively reading data while the I/O was occurring to the | ||
402 | physical pages. This is only necessary for data reads into the | ||
403 | vmap area. | ||
diff --git a/Documentation/cdrom/ide-cd b/Documentation/cdrom/ide-cd index 2c558cd6c1ef..f4dc9de2694e 100644 --- a/Documentation/cdrom/ide-cd +++ b/Documentation/cdrom/ide-cd | |||
@@ -159,42 +159,7 @@ two arguments: the CDROM device, and the slot number to which you wish | |||
159 | to change. If the slot number is -1, the drive is unloaded. | 159 | to change. If the slot number is -1, the drive is unloaded. |
160 | 160 | ||
161 | 161 | ||
162 | 4. Compilation options | 162 | 4. Common problems |
163 | ---------------------- | ||
164 | |||
165 | There are a few additional options which can be set when compiling the | ||
166 | driver. Most people should not need to mess with any of these; they | ||
167 | are listed here simply for completeness. A compilation option can be | ||
168 | enabled by adding a line of the form `#define <option> 1' to the top | ||
169 | of ide-cd.c. All these options are disabled by default. | ||
170 | |||
171 | VERBOSE_IDE_CD_ERRORS | ||
172 | If this is set, ATAPI error codes will be translated into textual | ||
173 | descriptions. In addition, a dump is made of the command which | ||
174 | provoked the error. This is off by default to save the memory used | ||
175 | by the (somewhat long) table of error descriptions. | ||
176 | |||
177 | STANDARD_ATAPI | ||
178 | If this is set, the code needed to deal with certain drives which do | ||
179 | not properly implement the ATAPI spec will be disabled. If you know | ||
180 | your drive implements ATAPI properly, you can turn this on to get a | ||
181 | slightly smaller kernel. | ||
182 | |||
183 | NO_DOOR_LOCKING | ||
184 | If this is set, the driver will never attempt to lock the door of | ||
185 | the drive. | ||
186 | |||
187 | CDROM_NBLOCKS_BUFFER | ||
188 | This sets the size of the buffer to be used for a CDROMREADAUDIO | ||
189 | ioctl. The default is 8. | ||
190 | |||
191 | TEST | ||
192 | This currently enables an additional ioctl which enables a user-mode | ||
193 | program to execute an arbitrary packet command. See the source for | ||
194 | details. This should be left off unless you know what you're doing. | ||
195 | |||
196 | |||
197 | 5. Common problems | ||
198 | ------------------ | 163 | ------------------ |
199 | 164 | ||
200 | This section discusses some common problems encountered when trying to | 165 | This section discusses some common problems encountered when trying to |
@@ -371,7 +336,7 @@ f. Data corruption. | |||
371 | expense of low system performance. | 336 | expense of low system performance. |
372 | 337 | ||
373 | 338 | ||
374 | 6. cdchange.c | 339 | 5. cdchange.c |
375 | ------------- | 340 | ------------- |
376 | 341 | ||
377 | /* | 342 | /* |
diff --git a/Documentation/cgroups/cgroup_event_listener.c b/Documentation/cgroups/cgroup_event_listener.c new file mode 100644 index 000000000000..8c2bfc4a6358 --- /dev/null +++ b/Documentation/cgroups/cgroup_event_listener.c | |||
@@ -0,0 +1,110 @@ | |||
1 | /* | ||
2 | * cgroup_event_listener.c - Simple listener of cgroup events | ||
3 | * | ||
4 | * Copyright (C) Kirill A. Shutemov <kirill@shutemov.name> | ||
5 | */ | ||
6 | |||
7 | #include <assert.h> | ||
8 | #include <errno.h> | ||
9 | #include <fcntl.h> | ||
10 | #include <libgen.h> | ||
11 | #include <limits.h> | ||
12 | #include <stdio.h> | ||
13 | #include <string.h> | ||
14 | #include <unistd.h> | ||
15 | |||
16 | #include <sys/eventfd.h> | ||
17 | |||
18 | #define USAGE_STR "Usage: cgroup_event_listener <path-to-control-file> <args>\n" | ||
19 | |||
20 | int main(int argc, char **argv) | ||
21 | { | ||
22 | int efd = -1; | ||
23 | int cfd = -1; | ||
24 | int event_control = -1; | ||
25 | char event_control_path[PATH_MAX]; | ||
26 | char line[LINE_MAX]; | ||
27 | int ret; | ||
28 | |||
29 | if (argc != 3) { | ||
30 | fputs(USAGE_STR, stderr); | ||
31 | return 1; | ||
32 | } | ||
33 | |||
34 | cfd = open(argv[1], O_RDONLY); | ||
35 | if (cfd == -1) { | ||
36 | fprintf(stderr, "Cannot open %s: %s\n", argv[1], | ||
37 | strerror(errno)); | ||
38 | goto out; | ||
39 | } | ||
40 | |||
41 | ret = snprintf(event_control_path, PATH_MAX, "%s/cgroup.event_control", | ||
42 | dirname(argv[1])); | ||
43 | if (ret >= PATH_MAX) { | ||
44 | fputs("Path to cgroup.event_control is too long\n", stderr); | ||
45 | goto out; | ||
46 | } | ||
47 | |||
48 | event_control = open(event_control_path, O_WRONLY); | ||
49 | if (event_control == -1) { | ||
50 | fprintf(stderr, "Cannot open %s: %s\n", event_control_path, | ||
51 | strerror(errno)); | ||
52 | goto out; | ||
53 | } | ||
54 | |||
55 | efd = eventfd(0, 0); | ||
56 | if (efd == -1) { | ||
57 | perror("eventfd() failed"); | ||
58 | goto out; | ||
59 | } | ||
60 | |||
61 | ret = snprintf(line, LINE_MAX, "%d %d %s", efd, cfd, argv[2]); | ||
62 | if (ret >= LINE_MAX) { | ||
63 | fputs("Arguments string is too long\n", stderr); | ||
64 | goto out; | ||
65 | } | ||
66 | |||
67 | ret = write(event_control, line, strlen(line) + 1); | ||
68 | if (ret == -1) { | ||
69 | perror("Cannot write to cgroup.event_control"); | ||
70 | goto out; | ||
71 | } | ||
72 | |||
73 | while (1) { | ||
74 | uint64_t result; | ||
75 | |||
76 | ret = read(efd, &result, sizeof(result)); | ||
77 | if (ret == -1) { | ||
78 | if (errno == EINTR) | ||
79 | continue; | ||
80 | perror("Cannot read from eventfd"); | ||
81 | break; | ||
82 | } | ||
83 | assert(ret == sizeof(result)); | ||
84 | |||
85 | ret = access(event_control_path, W_OK); | ||
86 | if ((ret == -1) && (errno == ENOENT)) { | ||
87 | puts("The cgroup seems to have removed."); | ||
88 | ret = 0; | ||
89 | break; | ||
90 | } | ||
91 | |||
92 | if (ret == -1) { | ||
93 | perror("cgroup.event_control " | ||
94 | "is not accessable any more"); | ||
95 | break; | ||
96 | } | ||
97 | |||
98 | printf("%s %s: crossed\n", argv[1], argv[2]); | ||
99 | } | ||
100 | |||
101 | out: | ||
102 | if (efd >= 0) | ||
103 | close(efd); | ||
104 | if (event_control >= 0) | ||
105 | close(event_control); | ||
106 | if (cfd >= 0) | ||
107 | close(cfd); | ||
108 | |||
109 | return (ret != 0); | ||
110 | } | ||
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt index 0b33bfe7dde9..fd588ff0e296 100644 --- a/Documentation/cgroups/cgroups.txt +++ b/Documentation/cgroups/cgroups.txt | |||
@@ -22,6 +22,8 @@ CONTENTS: | |||
22 | 2. Usage Examples and Syntax | 22 | 2. Usage Examples and Syntax |
23 | 2.1 Basic Usage | 23 | 2.1 Basic Usage |
24 | 2.2 Attaching processes | 24 | 2.2 Attaching processes |
25 | 2.3 Mounting hierarchies by name | ||
26 | 2.4 Notification API | ||
25 | 3. Kernel API | 27 | 3. Kernel API |
26 | 3.1 Overview | 28 | 3.1 Overview |
27 | 3.2 Synchronization | 29 | 3.2 Synchronization |
@@ -434,6 +436,25 @@ you give a subsystem a name. | |||
434 | The name of the subsystem appears as part of the hierarchy description | 436 | The name of the subsystem appears as part of the hierarchy description |
435 | in /proc/mounts and /proc/<pid>/cgroups. | 437 | in /proc/mounts and /proc/<pid>/cgroups. |
436 | 438 | ||
439 | 2.4 Notification API | ||
440 | -------------------- | ||
441 | |||
442 | There is mechanism which allows to get notifications about changing | ||
443 | status of a cgroup. | ||
444 | |||
445 | To register new notification handler you need: | ||
446 | - create a file descriptor for event notification using eventfd(2); | ||
447 | - open a control file to be monitored (e.g. memory.usage_in_bytes); | ||
448 | - write "<event_fd> <control_fd> <args>" to cgroup.event_control. | ||
449 | Interpretation of args is defined by control file implementation; | ||
450 | |||
451 | eventfd will be woken up by control file implementation or when the | ||
452 | cgroup is removed. | ||
453 | |||
454 | To unregister notification handler just close eventfd. | ||
455 | |||
456 | NOTE: Support of notifications should be implemented for the control | ||
457 | file. See documentation for the subsystem. | ||
437 | 458 | ||
438 | 3. Kernel API | 459 | 3. Kernel API |
439 | ============= | 460 | ============= |
@@ -488,6 +509,11 @@ Each subsystem should: | |||
488 | - add an entry in linux/cgroup_subsys.h | 509 | - add an entry in linux/cgroup_subsys.h |
489 | - define a cgroup_subsys object called <name>_subsys | 510 | - define a cgroup_subsys object called <name>_subsys |
490 | 511 | ||
512 | If a subsystem can be compiled as a module, it should also have in its | ||
513 | module initcall a call to cgroup_load_subsys(), and in its exitcall a | ||
514 | call to cgroup_unload_subsys(). It should also set its_subsys.module = | ||
515 | THIS_MODULE in its .c file. | ||
516 | |||
491 | Each subsystem may export the following methods. The only mandatory | 517 | Each subsystem may export the following methods. The only mandatory |
492 | methods are create/destroy. Any others that are null are presumed to | 518 | methods are create/destroy. Any others that are null are presumed to |
493 | be successful no-ops. | 519 | be successful no-ops. |
@@ -536,10 +562,21 @@ returns an error, this will abort the attach operation. If a NULL | |||
536 | task is passed, then a successful result indicates that *any* | 562 | task is passed, then a successful result indicates that *any* |
537 | unspecified task can be moved into the cgroup. Note that this isn't | 563 | unspecified task can be moved into the cgroup. Note that this isn't |
538 | called on a fork. If this method returns 0 (success) then this should | 564 | called on a fork. If this method returns 0 (success) then this should |
539 | remain valid while the caller holds cgroup_mutex. If threadgroup is | 565 | remain valid while the caller holds cgroup_mutex and it is ensured that either |
566 | attach() or cancel_attach() will be called in future. If threadgroup is | ||
540 | true, then a successful result indicates that all threads in the given | 567 | true, then a successful result indicates that all threads in the given |
541 | thread's threadgroup can be moved together. | 568 | thread's threadgroup can be moved together. |
542 | 569 | ||
570 | void cancel_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, | ||
571 | struct task_struct *task, bool threadgroup) | ||
572 | (cgroup_mutex held by caller) | ||
573 | |||
574 | Called when a task attach operation has failed after can_attach() has succeeded. | ||
575 | A subsystem whose can_attach() has some side-effects should provide this | ||
576 | function, so that the subsytem can implement a rollback. If not, not necessary. | ||
577 | This will be called only about subsystems whose can_attach() operation have | ||
578 | succeeded. | ||
579 | |||
543 | void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, | 580 | void attach(struct cgroup_subsys *ss, struct cgroup *cgrp, |
544 | struct cgroup *old_cgrp, struct task_struct *task, | 581 | struct cgroup *old_cgrp, struct task_struct *task, |
545 | bool threadgroup) | 582 | bool threadgroup) |
diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt index 1d7e9784439a..4160df82b3f5 100644 --- a/Documentation/cgroups/cpusets.txt +++ b/Documentation/cgroups/cpusets.txt | |||
@@ -168,20 +168,20 @@ Each cpuset is represented by a directory in the cgroup file system | |||
168 | containing (on top of the standard cgroup files) the following | 168 | containing (on top of the standard cgroup files) the following |
169 | files describing that cpuset: | 169 | files describing that cpuset: |
170 | 170 | ||
171 | - cpus: list of CPUs in that cpuset | 171 | - cpuset.cpus: list of CPUs in that cpuset |
172 | - mems: list of Memory Nodes in that cpuset | 172 | - cpuset.mems: list of Memory Nodes in that cpuset |
173 | - memory_migrate flag: if set, move pages to cpusets nodes | 173 | - cpuset.memory_migrate flag: if set, move pages to cpusets nodes |
174 | - cpu_exclusive flag: is cpu placement exclusive? | 174 | - cpuset.cpu_exclusive flag: is cpu placement exclusive? |
175 | - mem_exclusive flag: is memory placement exclusive? | 175 | - cpuset.mem_exclusive flag: is memory placement exclusive? |
176 | - mem_hardwall flag: is memory allocation hardwalled | 176 | - cpuset.mem_hardwall flag: is memory allocation hardwalled |
177 | - memory_pressure: measure of how much paging pressure in cpuset | 177 | - cpuset.memory_pressure: measure of how much paging pressure in cpuset |
178 | - memory_spread_page flag: if set, spread page cache evenly on allowed nodes | 178 | - cpuset.memory_spread_page flag: if set, spread page cache evenly on allowed nodes |
179 | - memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes | 179 | - cpuset.memory_spread_slab flag: if set, spread slab cache evenly on allowed nodes |
180 | - sched_load_balance flag: if set, load balance within CPUs on that cpuset | 180 | - cpuset.sched_load_balance flag: if set, load balance within CPUs on that cpuset |
181 | - sched_relax_domain_level: the searching range when migrating tasks | 181 | - cpuset.sched_relax_domain_level: the searching range when migrating tasks |
182 | 182 | ||
183 | In addition, the root cpuset only has the following file: | 183 | In addition, the root cpuset only has the following file: |
184 | - memory_pressure_enabled flag: compute memory_pressure? | 184 | - cpuset.memory_pressure_enabled flag: compute memory_pressure? |
185 | 185 | ||
186 | New cpusets are created using the mkdir system call or shell | 186 | New cpusets are created using the mkdir system call or shell |
187 | command. The properties of a cpuset, such as its flags, allowed | 187 | command. The properties of a cpuset, such as its flags, allowed |
@@ -229,7 +229,7 @@ If a cpuset is cpu or mem exclusive, no other cpuset, other than | |||
229 | a direct ancestor or descendant, may share any of the same CPUs or | 229 | a direct ancestor or descendant, may share any of the same CPUs or |
230 | Memory Nodes. | 230 | Memory Nodes. |
231 | 231 | ||
232 | A cpuset that is mem_exclusive *or* mem_hardwall is "hardwalled", | 232 | A cpuset that is cpuset.mem_exclusive *or* cpuset.mem_hardwall is "hardwalled", |
233 | i.e. it restricts kernel allocations for page, buffer and other data | 233 | i.e. it restricts kernel allocations for page, buffer and other data |
234 | commonly shared by the kernel across multiple users. All cpusets, | 234 | commonly shared by the kernel across multiple users. All cpusets, |
235 | whether hardwalled or not, restrict allocations of memory for user | 235 | whether hardwalled or not, restrict allocations of memory for user |
@@ -304,15 +304,15 @@ times 1000. | |||
304 | --------------------------- | 304 | --------------------------- |
305 | There are two boolean flag files per cpuset that control where the | 305 | There are two boolean flag files per cpuset that control where the |
306 | kernel allocates pages for the file system buffers and related in | 306 | kernel allocates pages for the file system buffers and related in |
307 | kernel data structures. They are called 'memory_spread_page' and | 307 | kernel data structures. They are called 'cpuset.memory_spread_page' and |
308 | 'memory_spread_slab'. | 308 | 'cpuset.memory_spread_slab'. |
309 | 309 | ||
310 | If the per-cpuset boolean flag file 'memory_spread_page' is set, then | 310 | If the per-cpuset boolean flag file 'cpuset.memory_spread_page' is set, then |
311 | the kernel will spread the file system buffers (page cache) evenly | 311 | the kernel will spread the file system buffers (page cache) evenly |
312 | over all the nodes that the faulting task is allowed to use, instead | 312 | over all the nodes that the faulting task is allowed to use, instead |
313 | of preferring to put those pages on the node where the task is running. | 313 | of preferring to put those pages on the node where the task is running. |
314 | 314 | ||
315 | If the per-cpuset boolean flag file 'memory_spread_slab' is set, | 315 | If the per-cpuset boolean flag file 'cpuset.memory_spread_slab' is set, |
316 | then the kernel will spread some file system related slab caches, | 316 | then the kernel will spread some file system related slab caches, |
317 | such as for inodes and dentries evenly over all the nodes that the | 317 | such as for inodes and dentries evenly over all the nodes that the |
318 | faulting task is allowed to use, instead of preferring to put those | 318 | faulting task is allowed to use, instead of preferring to put those |
@@ -337,21 +337,21 @@ their containing tasks memory spread settings. If memory spreading | |||
337 | is turned off, then the currently specified NUMA mempolicy once again | 337 | is turned off, then the currently specified NUMA mempolicy once again |
338 | applies to memory page allocations. | 338 | applies to memory page allocations. |
339 | 339 | ||
340 | Both 'memory_spread_page' and 'memory_spread_slab' are boolean flag | 340 | Both 'cpuset.memory_spread_page' and 'cpuset.memory_spread_slab' are boolean flag |
341 | files. By default they contain "0", meaning that the feature is off | 341 | files. By default they contain "0", meaning that the feature is off |
342 | for that cpuset. If a "1" is written to that file, then that turns | 342 | for that cpuset. If a "1" is written to that file, then that turns |
343 | the named feature on. | 343 | the named feature on. |
344 | 344 | ||
345 | The implementation is simple. | 345 | The implementation is simple. |
346 | 346 | ||
347 | Setting the flag 'memory_spread_page' turns on a per-process flag | 347 | Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag |
348 | PF_SPREAD_PAGE for each task that is in that cpuset or subsequently | 348 | PF_SPREAD_PAGE for each task that is in that cpuset or subsequently |
349 | joins that cpuset. The page allocation calls for the page cache | 349 | joins that cpuset. The page allocation calls for the page cache |
350 | is modified to perform an inline check for this PF_SPREAD_PAGE task | 350 | is modified to perform an inline check for this PF_SPREAD_PAGE task |
351 | flag, and if set, a call to a new routine cpuset_mem_spread_node() | 351 | flag, and if set, a call to a new routine cpuset_mem_spread_node() |
352 | returns the node to prefer for the allocation. | 352 | returns the node to prefer for the allocation. |
353 | 353 | ||
354 | Similarly, setting 'memory_spread_slab' turns on the flag | 354 | Similarly, setting 'cpuset.memory_spread_slab' turns on the flag |
355 | PF_SPREAD_SLAB, and appropriately marked slab caches will allocate | 355 | PF_SPREAD_SLAB, and appropriately marked slab caches will allocate |
356 | pages from the node returned by cpuset_mem_spread_node(). | 356 | pages from the node returned by cpuset_mem_spread_node(). |
357 | 357 | ||
@@ -404,24 +404,24 @@ the following two situations: | |||
404 | system overhead on those CPUs, including avoiding task load | 404 | system overhead on those CPUs, including avoiding task load |
405 | balancing if that is not needed. | 405 | balancing if that is not needed. |
406 | 406 | ||
407 | When the per-cpuset flag "sched_load_balance" is enabled (the default | 407 | When the per-cpuset flag "cpuset.sched_load_balance" is enabled (the default |
408 | setting), it requests that all the CPUs in that cpusets allowed 'cpus' | 408 | setting), it requests that all the CPUs in that cpusets allowed 'cpuset.cpus' |
409 | be contained in a single sched domain, ensuring that load balancing | 409 | be contained in a single sched domain, ensuring that load balancing |
410 | can move a task (not otherwised pinned, as by sched_setaffinity) | 410 | can move a task (not otherwised pinned, as by sched_setaffinity) |
411 | from any CPU in that cpuset to any other. | 411 | from any CPU in that cpuset to any other. |
412 | 412 | ||
413 | When the per-cpuset flag "sched_load_balance" is disabled, then the | 413 | When the per-cpuset flag "cpuset.sched_load_balance" is disabled, then the |
414 | scheduler will avoid load balancing across the CPUs in that cpuset, | 414 | scheduler will avoid load balancing across the CPUs in that cpuset, |
415 | --except-- in so far as is necessary because some overlapping cpuset | 415 | --except-- in so far as is necessary because some overlapping cpuset |
416 | has "sched_load_balance" enabled. | 416 | has "sched_load_balance" enabled. |
417 | 417 | ||
418 | So, for example, if the top cpuset has the flag "sched_load_balance" | 418 | So, for example, if the top cpuset has the flag "cpuset.sched_load_balance" |
419 | enabled, then the scheduler will have one sched domain covering all | 419 | enabled, then the scheduler will have one sched domain covering all |
420 | CPUs, and the setting of the "sched_load_balance" flag in any other | 420 | CPUs, and the setting of the "cpuset.sched_load_balance" flag in any other |
421 | cpusets won't matter, as we're already fully load balancing. | 421 | cpusets won't matter, as we're already fully load balancing. |
422 | 422 | ||
423 | Therefore in the above two situations, the top cpuset flag | 423 | Therefore in the above two situations, the top cpuset flag |
424 | "sched_load_balance" should be disabled, and only some of the smaller, | 424 | "cpuset.sched_load_balance" should be disabled, and only some of the smaller, |
425 | child cpusets have this flag enabled. | 425 | child cpusets have this flag enabled. |
426 | 426 | ||
427 | When doing this, you don't usually want to leave any unpinned tasks in | 427 | When doing this, you don't usually want to leave any unpinned tasks in |
@@ -433,7 +433,7 @@ scheduler might not consider the possibility of load balancing that | |||
433 | task to that underused CPU. | 433 | task to that underused CPU. |
434 | 434 | ||
435 | Of course, tasks pinned to a particular CPU can be left in a cpuset | 435 | Of course, tasks pinned to a particular CPU can be left in a cpuset |
436 | that disables "sched_load_balance" as those tasks aren't going anywhere | 436 | that disables "cpuset.sched_load_balance" as those tasks aren't going anywhere |
437 | else anyway. | 437 | else anyway. |
438 | 438 | ||
439 | There is an impedance mismatch here, between cpusets and sched domains. | 439 | There is an impedance mismatch here, between cpusets and sched domains. |
@@ -443,19 +443,19 @@ overlap and each CPU is in at most one sched domain. | |||
443 | It is necessary for sched domains to be flat because load balancing | 443 | It is necessary for sched domains to be flat because load balancing |
444 | across partially overlapping sets of CPUs would risk unstable dynamics | 444 | across partially overlapping sets of CPUs would risk unstable dynamics |
445 | that would be beyond our understanding. So if each of two partially | 445 | that would be beyond our understanding. So if each of two partially |
446 | overlapping cpusets enables the flag 'sched_load_balance', then we | 446 | overlapping cpusets enables the flag 'cpuset.sched_load_balance', then we |
447 | form a single sched domain that is a superset of both. We won't move | 447 | form a single sched domain that is a superset of both. We won't move |
448 | a task to a CPU outside it cpuset, but the scheduler load balancing | 448 | a task to a CPU outside it cpuset, but the scheduler load balancing |
449 | code might waste some compute cycles considering that possibility. | 449 | code might waste some compute cycles considering that possibility. |
450 | 450 | ||
451 | This mismatch is why there is not a simple one-to-one relation | 451 | This mismatch is why there is not a simple one-to-one relation |
452 | between which cpusets have the flag "sched_load_balance" enabled, | 452 | between which cpusets have the flag "cpuset.sched_load_balance" enabled, |
453 | and the sched domain configuration. If a cpuset enables the flag, it | 453 | and the sched domain configuration. If a cpuset enables the flag, it |
454 | will get balancing across all its CPUs, but if it disables the flag, | 454 | will get balancing across all its CPUs, but if it disables the flag, |
455 | it will only be assured of no load balancing if no other overlapping | 455 | it will only be assured of no load balancing if no other overlapping |
456 | cpuset enables the flag. | 456 | cpuset enables the flag. |
457 | 457 | ||
458 | If two cpusets have partially overlapping 'cpus' allowed, and only | 458 | If two cpusets have partially overlapping 'cpuset.cpus' allowed, and only |
459 | one of them has this flag enabled, then the other may find its | 459 | one of them has this flag enabled, then the other may find its |
460 | tasks only partially load balanced, just on the overlapping CPUs. | 460 | tasks only partially load balanced, just on the overlapping CPUs. |
461 | This is just the general case of the top_cpuset example given a few | 461 | This is just the general case of the top_cpuset example given a few |
@@ -468,23 +468,23 @@ load balancing to the other CPUs. | |||
468 | 1.7.1 sched_load_balance implementation details. | 468 | 1.7.1 sched_load_balance implementation details. |
469 | ------------------------------------------------ | 469 | ------------------------------------------------ |
470 | 470 | ||
471 | The per-cpuset flag 'sched_load_balance' defaults to enabled (contrary | 471 | The per-cpuset flag 'cpuset.sched_load_balance' defaults to enabled (contrary |
472 | to most cpuset flags.) When enabled for a cpuset, the kernel will | 472 | to most cpuset flags.) When enabled for a cpuset, the kernel will |
473 | ensure that it can load balance across all the CPUs in that cpuset | 473 | ensure that it can load balance across all the CPUs in that cpuset |
474 | (makes sure that all the CPUs in the cpus_allowed of that cpuset are | 474 | (makes sure that all the CPUs in the cpus_allowed of that cpuset are |
475 | in the same sched domain.) | 475 | in the same sched domain.) |
476 | 476 | ||
477 | If two overlapping cpusets both have 'sched_load_balance' enabled, | 477 | If two overlapping cpusets both have 'cpuset.sched_load_balance' enabled, |
478 | then they will be (must be) both in the same sched domain. | 478 | then they will be (must be) both in the same sched domain. |
479 | 479 | ||
480 | If, as is the default, the top cpuset has 'sched_load_balance' enabled, | 480 | If, as is the default, the top cpuset has 'cpuset.sched_load_balance' enabled, |
481 | then by the above that means there is a single sched domain covering | 481 | then by the above that means there is a single sched domain covering |
482 | the whole system, regardless of any other cpuset settings. | 482 | the whole system, regardless of any other cpuset settings. |
483 | 483 | ||
484 | The kernel commits to user space that it will avoid load balancing | 484 | The kernel commits to user space that it will avoid load balancing |
485 | where it can. It will pick as fine a granularity partition of sched | 485 | where it can. It will pick as fine a granularity partition of sched |
486 | domains as it can while still providing load balancing for any set | 486 | domains as it can while still providing load balancing for any set |
487 | of CPUs allowed to a cpuset having 'sched_load_balance' enabled. | 487 | of CPUs allowed to a cpuset having 'cpuset.sched_load_balance' enabled. |
488 | 488 | ||
489 | The internal kernel cpuset to scheduler interface passes from the | 489 | The internal kernel cpuset to scheduler interface passes from the |
490 | cpuset code to the scheduler code a partition of the load balanced | 490 | cpuset code to the scheduler code a partition of the load balanced |
@@ -495,9 +495,9 @@ all the CPUs that must be load balanced. | |||
495 | The cpuset code builds a new such partition and passes it to the | 495 | The cpuset code builds a new such partition and passes it to the |
496 | scheduler sched domain setup code, to have the sched domains rebuilt | 496 | scheduler sched domain setup code, to have the sched domains rebuilt |
497 | as necessary, whenever: | 497 | as necessary, whenever: |
498 | - the 'sched_load_balance' flag of a cpuset with non-empty CPUs changes, | 498 | - the 'cpuset.sched_load_balance' flag of a cpuset with non-empty CPUs changes, |
499 | - or CPUs come or go from a cpuset with this flag enabled, | 499 | - or CPUs come or go from a cpuset with this flag enabled, |
500 | - or 'sched_relax_domain_level' value of a cpuset with non-empty CPUs | 500 | - or 'cpuset.sched_relax_domain_level' value of a cpuset with non-empty CPUs |
501 | and with this flag enabled changes, | 501 | and with this flag enabled changes, |
502 | - or a cpuset with non-empty CPUs and with this flag enabled is removed, | 502 | - or a cpuset with non-empty CPUs and with this flag enabled is removed, |
503 | - or a cpu is offlined/onlined. | 503 | - or a cpu is offlined/onlined. |
@@ -542,7 +542,7 @@ As the result, task B on CPU X need to wait task A or wait load balance | |||
542 | on the next tick. For some applications in special situation, waiting | 542 | on the next tick. For some applications in special situation, waiting |
543 | 1 tick may be too long. | 543 | 1 tick may be too long. |
544 | 544 | ||
545 | The 'sched_relax_domain_level' file allows you to request changing | 545 | The 'cpuset.sched_relax_domain_level' file allows you to request changing |
546 | this searching range as you like. This file takes int value which | 546 | this searching range as you like. This file takes int value which |
547 | indicates size of searching range in levels ideally as follows, | 547 | indicates size of searching range in levels ideally as follows, |
548 | otherwise initial value -1 that indicates the cpuset has no request. | 548 | otherwise initial value -1 that indicates the cpuset has no request. |
@@ -559,8 +559,8 @@ The system default is architecture dependent. The system default | |||
559 | can be changed using the relax_domain_level= boot parameter. | 559 | can be changed using the relax_domain_level= boot parameter. |
560 | 560 | ||
561 | This file is per-cpuset and affect the sched domain where the cpuset | 561 | This file is per-cpuset and affect the sched domain where the cpuset |
562 | belongs to. Therefore if the flag 'sched_load_balance' of a cpuset | 562 | belongs to. Therefore if the flag 'cpuset.sched_load_balance' of a cpuset |
563 | is disabled, then 'sched_relax_domain_level' have no effect since | 563 | is disabled, then 'cpuset.sched_relax_domain_level' have no effect since |
564 | there is no sched domain belonging the cpuset. | 564 | there is no sched domain belonging the cpuset. |
565 | 565 | ||
566 | If multiple cpusets are overlapping and hence they form a single sched | 566 | If multiple cpusets are overlapping and hence they form a single sched |
@@ -607,9 +607,9 @@ from one cpuset to another, then the kernel will adjust the tasks | |||
607 | memory placement, as above, the next time that the kernel attempts | 607 | memory placement, as above, the next time that the kernel attempts |
608 | to allocate a page of memory for that task. | 608 | to allocate a page of memory for that task. |
609 | 609 | ||
610 | If a cpuset has its 'cpus' modified, then each task in that cpuset | 610 | If a cpuset has its 'cpuset.cpus' modified, then each task in that cpuset |
611 | will have its allowed CPU placement changed immediately. Similarly, | 611 | will have its allowed CPU placement changed immediately. Similarly, |
612 | if a tasks pid is written to another cpusets 'tasks' file, then its | 612 | if a tasks pid is written to another cpusets 'cpuset.tasks' file, then its |
613 | allowed CPU placement is changed immediately. If such a task had been | 613 | allowed CPU placement is changed immediately. If such a task had been |
614 | bound to some subset of its cpuset using the sched_setaffinity() call, | 614 | bound to some subset of its cpuset using the sched_setaffinity() call, |
615 | the task will be allowed to run on any CPU allowed in its new cpuset, | 615 | the task will be allowed to run on any CPU allowed in its new cpuset, |
@@ -622,8 +622,8 @@ and the processor placement is updated immediately. | |||
622 | Normally, once a page is allocated (given a physical page | 622 | Normally, once a page is allocated (given a physical page |
623 | of main memory) then that page stays on whatever node it | 623 | of main memory) then that page stays on whatever node it |
624 | was allocated, so long as it remains allocated, even if the | 624 | was allocated, so long as it remains allocated, even if the |
625 | cpusets memory placement policy 'mems' subsequently changes. | 625 | cpusets memory placement policy 'cpuset.mems' subsequently changes. |
626 | If the cpuset flag file 'memory_migrate' is set true, then when | 626 | If the cpuset flag file 'cpuset.memory_migrate' is set true, then when |
627 | tasks are attached to that cpuset, any pages that task had | 627 | tasks are attached to that cpuset, any pages that task had |
628 | allocated to it on nodes in its previous cpuset are migrated | 628 | allocated to it on nodes in its previous cpuset are migrated |
629 | to the tasks new cpuset. The relative placement of the page within | 629 | to the tasks new cpuset. The relative placement of the page within |
@@ -631,12 +631,12 @@ the cpuset is preserved during these migration operations if possible. | |||
631 | For example if the page was on the second valid node of the prior cpuset | 631 | For example if the page was on the second valid node of the prior cpuset |
632 | then the page will be placed on the second valid node of the new cpuset. | 632 | then the page will be placed on the second valid node of the new cpuset. |
633 | 633 | ||
634 | Also if 'memory_migrate' is set true, then if that cpusets | 634 | Also if 'cpuset.memory_migrate' is set true, then if that cpusets |
635 | 'mems' file is modified, pages allocated to tasks in that | 635 | 'cpuset.mems' file is modified, pages allocated to tasks in that |
636 | cpuset, that were on nodes in the previous setting of 'mems', | 636 | cpuset, that were on nodes in the previous setting of 'cpuset.mems', |
637 | will be moved to nodes in the new setting of 'mems.' | 637 | will be moved to nodes in the new setting of 'mems.' |
638 | Pages that were not in the tasks prior cpuset, or in the cpusets | 638 | Pages that were not in the tasks prior cpuset, or in the cpusets |
639 | prior 'mems' setting, will not be moved. | 639 | prior 'cpuset.mems' setting, will not be moved. |
640 | 640 | ||
641 | There is an exception to the above. If hotplug functionality is used | 641 | There is an exception to the above. If hotplug functionality is used |
642 | to remove all the CPUs that are currently assigned to a cpuset, | 642 | to remove all the CPUs that are currently assigned to a cpuset, |
@@ -678,8 +678,8 @@ and then start a subshell 'sh' in that cpuset: | |||
678 | cd /dev/cpuset | 678 | cd /dev/cpuset |
679 | mkdir Charlie | 679 | mkdir Charlie |
680 | cd Charlie | 680 | cd Charlie |
681 | /bin/echo 2-3 > cpus | 681 | /bin/echo 2-3 > cpuset.cpus |
682 | /bin/echo 1 > mems | 682 | /bin/echo 1 > cpuset.mems |
683 | /bin/echo $$ > tasks | 683 | /bin/echo $$ > tasks |
684 | sh | 684 | sh |
685 | # The subshell 'sh' is now running in cpuset Charlie | 685 | # The subshell 'sh' is now running in cpuset Charlie |
@@ -725,10 +725,13 @@ Now you want to do something with this cpuset. | |||
725 | 725 | ||
726 | In this directory you can find several files: | 726 | In this directory you can find several files: |
727 | # ls | 727 | # ls |
728 | cpu_exclusive memory_migrate mems tasks | 728 | cpuset.cpu_exclusive cpuset.memory_spread_slab |
729 | cpus memory_pressure notify_on_release | 729 | cpuset.cpus cpuset.mems |
730 | mem_exclusive memory_spread_page sched_load_balance | 730 | cpuset.mem_exclusive cpuset.sched_load_balance |
731 | mem_hardwall memory_spread_slab sched_relax_domain_level | 731 | cpuset.mem_hardwall cpuset.sched_relax_domain_level |
732 | cpuset.memory_migrate notify_on_release | ||
733 | cpuset.memory_pressure tasks | ||
734 | cpuset.memory_spread_page | ||
732 | 735 | ||
733 | Reading them will give you information about the state of this cpuset: | 736 | Reading them will give you information about the state of this cpuset: |
734 | the CPUs and Memory Nodes it can use, the processes that are using | 737 | the CPUs and Memory Nodes it can use, the processes that are using |
@@ -736,13 +739,13 @@ it, its properties. By writing to these files you can manipulate | |||
736 | the cpuset. | 739 | the cpuset. |
737 | 740 | ||
738 | Set some flags: | 741 | Set some flags: |
739 | # /bin/echo 1 > cpu_exclusive | 742 | # /bin/echo 1 > cpuset.cpu_exclusive |
740 | 743 | ||
741 | Add some cpus: | 744 | Add some cpus: |
742 | # /bin/echo 0-7 > cpus | 745 | # /bin/echo 0-7 > cpuset.cpus |
743 | 746 | ||
744 | Add some mems: | 747 | Add some mems: |
745 | # /bin/echo 0-7 > mems | 748 | # /bin/echo 0-7 > cpuset.mems |
746 | 749 | ||
747 | Now attach your shell to this cpuset: | 750 | Now attach your shell to this cpuset: |
748 | # /bin/echo $$ > tasks | 751 | # /bin/echo $$ > tasks |
@@ -774,28 +777,28 @@ echo "/sbin/cpuset_release_agent" > /dev/cpuset/release_agent | |||
774 | This is the syntax to use when writing in the cpus or mems files | 777 | This is the syntax to use when writing in the cpus or mems files |
775 | in cpuset directories: | 778 | in cpuset directories: |
776 | 779 | ||
777 | # /bin/echo 1-4 > cpus -> set cpus list to cpus 1,2,3,4 | 780 | # /bin/echo 1-4 > cpuset.cpus -> set cpus list to cpus 1,2,3,4 |
778 | # /bin/echo 1,2,3,4 > cpus -> set cpus list to cpus 1,2,3,4 | 781 | # /bin/echo 1,2,3,4 > cpuset.cpus -> set cpus list to cpus 1,2,3,4 |
779 | 782 | ||
780 | To add a CPU to a cpuset, write the new list of CPUs including the | 783 | To add a CPU to a cpuset, write the new list of CPUs including the |
781 | CPU to be added. To add 6 to the above cpuset: | 784 | CPU to be added. To add 6 to the above cpuset: |
782 | 785 | ||
783 | # /bin/echo 1-4,6 > cpus -> set cpus list to cpus 1,2,3,4,6 | 786 | # /bin/echo 1-4,6 > cpuset.cpus -> set cpus list to cpus 1,2,3,4,6 |
784 | 787 | ||
785 | Similarly to remove a CPU from a cpuset, write the new list of CPUs | 788 | Similarly to remove a CPU from a cpuset, write the new list of CPUs |
786 | without the CPU to be removed. | 789 | without the CPU to be removed. |
787 | 790 | ||
788 | To remove all the CPUs: | 791 | To remove all the CPUs: |
789 | 792 | ||
790 | # /bin/echo "" > cpus -> clear cpus list | 793 | # /bin/echo "" > cpuset.cpus -> clear cpus list |
791 | 794 | ||
792 | 2.3 Setting flags | 795 | 2.3 Setting flags |
793 | ----------------- | 796 | ----------------- |
794 | 797 | ||
795 | The syntax is very simple: | 798 | The syntax is very simple: |
796 | 799 | ||
797 | # /bin/echo 1 > cpu_exclusive -> set flag 'cpu_exclusive' | 800 | # /bin/echo 1 > cpuset.cpu_exclusive -> set flag 'cpuset.cpu_exclusive' |
798 | # /bin/echo 0 > cpu_exclusive -> unset flag 'cpu_exclusive' | 801 | # /bin/echo 0 > cpuset.cpu_exclusive -> unset flag 'cpuset.cpu_exclusive' |
799 | 802 | ||
800 | 2.4 Attaching processes | 803 | 2.4 Attaching processes |
801 | ----------------------- | 804 | ----------------------- |
diff --git a/Documentation/cgroups/memcg_test.txt b/Documentation/cgroups/memcg_test.txt index 72db89ed0609..f7f68b2ac199 100644 --- a/Documentation/cgroups/memcg_test.txt +++ b/Documentation/cgroups/memcg_test.txt | |||
@@ -1,6 +1,6 @@ | |||
1 | Memory Resource Controller(Memcg) Implementation Memo. | 1 | Memory Resource Controller(Memcg) Implementation Memo. |
2 | Last Updated: 2009/1/20 | 2 | Last Updated: 2010/2 |
3 | Base Kernel Version: based on 2.6.29-rc2. | 3 | Base Kernel Version: based on 2.6.33-rc7-mm(candidate for 34). |
4 | 4 | ||
5 | Because VM is getting complex (one of reasons is memcg...), memcg's behavior | 5 | Because VM is getting complex (one of reasons is memcg...), memcg's behavior |
6 | is complex. This is a document for memcg's internal behavior. | 6 | is complex. This is a document for memcg's internal behavior. |
@@ -337,7 +337,7 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. | |||
337 | race and lock dependency with other cgroup subsystems. | 337 | race and lock dependency with other cgroup subsystems. |
338 | 338 | ||
339 | example) | 339 | example) |
340 | # mount -t cgroup none /cgroup -t cpuset,memory,cpu,devices | 340 | # mount -t cgroup none /cgroup -o cpuset,memory,cpu,devices |
341 | 341 | ||
342 | and do task move, mkdir, rmdir etc...under this. | 342 | and do task move, mkdir, rmdir etc...under this. |
343 | 343 | ||
@@ -348,7 +348,7 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. | |||
348 | 348 | ||
349 | For example, test like following is good. | 349 | For example, test like following is good. |
350 | (Shell-A) | 350 | (Shell-A) |
351 | # mount -t cgroup none /cgroup -t memory | 351 | # mount -t cgroup none /cgroup -o memory |
352 | # mkdir /cgroup/test | 352 | # mkdir /cgroup/test |
353 | # echo 40M > /cgroup/test/memory.limit_in_bytes | 353 | # echo 40M > /cgroup/test/memory.limit_in_bytes |
354 | # echo 0 > /cgroup/test/tasks | 354 | # echo 0 > /cgroup/test/tasks |
@@ -378,3 +378,42 @@ Under below explanation, we assume CONFIG_MEM_RES_CTRL_SWAP=y. | |||
378 | #echo 50M > memory.limit_in_bytes | 378 | #echo 50M > memory.limit_in_bytes |
379 | #echo 50M > memory.memsw.limit_in_bytes | 379 | #echo 50M > memory.memsw.limit_in_bytes |
380 | run 51M of malloc | 380 | run 51M of malloc |
381 | |||
382 | 9.9 Move charges at task migration | ||
383 | Charges associated with a task can be moved along with task migration. | ||
384 | |||
385 | (Shell-A) | ||
386 | #mkdir /cgroup/A | ||
387 | #echo $$ >/cgroup/A/tasks | ||
388 | run some programs which uses some amount of memory in /cgroup/A. | ||
389 | |||
390 | (Shell-B) | ||
391 | #mkdir /cgroup/B | ||
392 | #echo 1 >/cgroup/B/memory.move_charge_at_immigrate | ||
393 | #echo "pid of the program running in group A" >/cgroup/B/tasks | ||
394 | |||
395 | You can see charges have been moved by reading *.usage_in_bytes or | ||
396 | memory.stat of both A and B. | ||
397 | See 8.2 of Documentation/cgroups/memory.txt to see what value should be | ||
398 | written to move_charge_at_immigrate. | ||
399 | |||
400 | 9.10 Memory thresholds | ||
401 | Memory controler implements memory thresholds using cgroups notification | ||
402 | API. You can use Documentation/cgroups/cgroup_event_listener.c to test | ||
403 | it. | ||
404 | |||
405 | (Shell-A) Create cgroup and run event listener | ||
406 | # mkdir /cgroup/A | ||
407 | # ./cgroup_event_listener /cgroup/A/memory.usage_in_bytes 5M | ||
408 | |||
409 | (Shell-B) Add task to cgroup and try to allocate and free memory | ||
410 | # echo $$ >/cgroup/A/tasks | ||
411 | # a="$(dd if=/dev/zero bs=1M count=10)" | ||
412 | # a= | ||
413 | |||
414 | You will see message from cgroup_event_listener every time you cross | ||
415 | the thresholds. | ||
416 | |||
417 | Use /cgroup/A/memory.memsw.usage_in_bytes to test memsw thresholds. | ||
418 | |||
419 | It's good idea to test root cgroup as well. | ||
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt index b871f2552b45..f8bc802d70b9 100644 --- a/Documentation/cgroups/memory.txt +++ b/Documentation/cgroups/memory.txt | |||
@@ -182,6 +182,8 @@ list. | |||
182 | NOTE: Reclaim does not work for the root cgroup, since we cannot set any | 182 | NOTE: Reclaim does not work for the root cgroup, since we cannot set any |
183 | limits on the root cgroup. | 183 | limits on the root cgroup. |
184 | 184 | ||
185 | Note2: When panic_on_oom is set to "2", the whole system will panic. | ||
186 | |||
185 | 2. Locking | 187 | 2. Locking |
186 | 188 | ||
187 | The memory controller uses the following hierarchy | 189 | The memory controller uses the following hierarchy |
@@ -262,10 +264,12 @@ some of the pages cached in the cgroup (page cache pages). | |||
262 | 4.2 Task migration | 264 | 4.2 Task migration |
263 | 265 | ||
264 | When a task migrates from one cgroup to another, it's charge is not | 266 | When a task migrates from one cgroup to another, it's charge is not |
265 | carried forward. The pages allocated from the original cgroup still | 267 | carried forward by default. The pages allocated from the original cgroup still |
266 | remain charged to it, the charge is dropped when the page is freed or | 268 | remain charged to it, the charge is dropped when the page is freed or |
267 | reclaimed. | 269 | reclaimed. |
268 | 270 | ||
271 | Note: You can move charges of a task along with task migration. See 8. | ||
272 | |||
269 | 4.3 Removing a cgroup | 273 | 4.3 Removing a cgroup |
270 | 274 | ||
271 | A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a | 275 | A cgroup can be removed by rmdir, but as discussed in sections 4.1 and 4.2, a |
@@ -377,7 +381,8 @@ The feature can be disabled by | |||
377 | NOTE1: Enabling/disabling will fail if the cgroup already has other | 381 | NOTE1: Enabling/disabling will fail if the cgroup already has other |
378 | cgroups created below it. | 382 | cgroups created below it. |
379 | 383 | ||
380 | NOTE2: This feature can be enabled/disabled per subtree. | 384 | NOTE2: When panic_on_oom is set to "2", the whole system will panic in |
385 | case of an oom event in any cgroup. | ||
381 | 386 | ||
382 | 7. Soft limits | 387 | 7. Soft limits |
383 | 388 | ||
@@ -414,7 +419,76 @@ NOTE1: Soft limits take effect over a long period of time, since they involve | |||
414 | NOTE2: It is recommended to set the soft limit always below the hard limit, | 419 | NOTE2: It is recommended to set the soft limit always below the hard limit, |
415 | otherwise the hard limit will take precedence. | 420 | otherwise the hard limit will take precedence. |
416 | 421 | ||
417 | 8. TODO | 422 | 8. Move charges at task migration |
423 | |||
424 | Users can move charges associated with a task along with task migration, that | ||
425 | is, uncharge task's pages from the old cgroup and charge them to the new cgroup. | ||
426 | This feature is not supported in !CONFIG_MMU environments because of lack of | ||
427 | page tables. | ||
428 | |||
429 | 8.1 Interface | ||
430 | |||
431 | This feature is disabled by default. It can be enabled(and disabled again) by | ||
432 | writing to memory.move_charge_at_immigrate of the destination cgroup. | ||
433 | |||
434 | If you want to enable it: | ||
435 | |||
436 | # echo (some positive value) > memory.move_charge_at_immigrate | ||
437 | |||
438 | Note: Each bits of move_charge_at_immigrate has its own meaning about what type | ||
439 | of charges should be moved. See 8.2 for details. | ||
440 | Note: Charges are moved only when you move mm->owner, IOW, a leader of a thread | ||
441 | group. | ||
442 | Note: If we cannot find enough space for the task in the destination cgroup, we | ||
443 | try to make space by reclaiming memory. Task migration may fail if we | ||
444 | cannot make enough space. | ||
445 | Note: It can take several seconds if you move charges in giga bytes order. | ||
446 | |||
447 | And if you want disable it again: | ||
448 | |||
449 | # echo 0 > memory.move_charge_at_immigrate | ||
450 | |||
451 | 8.2 Type of charges which can be move | ||
452 | |||
453 | Each bits of move_charge_at_immigrate has its own meaning about what type of | ||
454 | charges should be moved. | ||
455 | |||
456 | bit | what type of charges would be moved ? | ||
457 | -----+------------------------------------------------------------------------ | ||
458 | 0 | A charge of an anonymous page(or swap of it) used by the target task. | ||
459 | | Those pages and swaps must be used only by the target task. You must | ||
460 | | enable Swap Extension(see 2.4) to enable move of swap charges. | ||
461 | |||
462 | Note: Those pages and swaps must be charged to the old cgroup. | ||
463 | Note: More type of pages(e.g. file cache, shmem,) will be supported by other | ||
464 | bits in future. | ||
465 | |||
466 | 8.3 TODO | ||
467 | |||
468 | - Add support for other types of pages(e.g. file cache, shmem, etc.). | ||
469 | - Implement madvise(2) to let users decide the vma to be moved or not to be | ||
470 | moved. | ||
471 | - All of moving charge operations are done under cgroup_mutex. It's not good | ||
472 | behavior to hold the mutex too long, so we may need some trick. | ||
473 | |||
474 | 9. Memory thresholds | ||
475 | |||
476 | Memory controler implements memory thresholds using cgroups notification | ||
477 | API (see cgroups.txt). It allows to register multiple memory and memsw | ||
478 | thresholds and gets notifications when it crosses. | ||
479 | |||
480 | To register a threshold application need: | ||
481 | - create an eventfd using eventfd(2); | ||
482 | - open memory.usage_in_bytes or memory.memsw.usage_in_bytes; | ||
483 | - write string like "<event_fd> <memory.usage_in_bytes> <threshold>" to | ||
484 | cgroup.event_control. | ||
485 | |||
486 | Application will be notified through eventfd when memory usage crosses | ||
487 | threshold in any direction. | ||
488 | |||
489 | It's applicable for root and non-root cgroup. | ||
490 | |||
491 | 10. TODO | ||
418 | 492 | ||
419 | 1. Add support for accounting huge pages (as a separate controller) | 493 | 1. Add support for accounting huge pages (as a separate controller) |
420 | 2. Make per-cgroup scanner reclaim not-shared pages first | 494 | 2. Make per-cgroup scanner reclaim not-shared pages first |
diff --git a/Documentation/console/console.txt b/Documentation/console/console.txt index 877a1b26cc3d..926cf1b5e63e 100644 --- a/Documentation/console/console.txt +++ b/Documentation/console/console.txt | |||
@@ -74,7 +74,7 @@ driver takes over the consoles vacated by the driver. Binding, on the other | |||
74 | hand, will bind the driver to the consoles that are currently occupied by a | 74 | hand, will bind the driver to the consoles that are currently occupied by a |
75 | system driver. | 75 | system driver. |
76 | 76 | ||
77 | NOTE1: Binding and binding must be selected in Kconfig. It's under: | 77 | NOTE1: Binding and unbinding must be selected in Kconfig. It's under: |
78 | 78 | ||
79 | Device Drivers -> Character devices -> Support for binding and unbinding | 79 | Device Drivers -> Character devices -> Support for binding and unbinding |
80 | console drivers | 80 | console drivers |
diff --git a/Documentation/cpu-freq/pcc-cpufreq.txt b/Documentation/cpu-freq/pcc-cpufreq.txt new file mode 100644 index 000000000000..9e3c3b33514c --- /dev/null +++ b/Documentation/cpu-freq/pcc-cpufreq.txt | |||
@@ -0,0 +1,207 @@ | |||
1 | /* | ||
2 | * pcc-cpufreq.txt - PCC interface documentation | ||
3 | * | ||
4 | * Copyright (C) 2009 Red Hat, Matthew Garrett <mjg@redhat.com> | ||
5 | * Copyright (C) 2009 Hewlett-Packard Development Company, L.P. | ||
6 | * Nagananda Chumbalkar <nagananda.chumbalkar@hp.com> | ||
7 | * | ||
8 | * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
9 | * | ||
10 | * This program is free software; you can redistribute it and/or modify | ||
11 | * it under the terms of the GNU General Public License as published by | ||
12 | * the Free Software Foundation; version 2 of the License. | ||
13 | * | ||
14 | * This program is distributed in the hope that it will be useful, but | ||
15 | * WITHOUT ANY WARRANTY; without even the implied warranty of | ||
16 | * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or NON | ||
17 | * INFRINGEMENT. See the GNU General Public License for more details. | ||
18 | * | ||
19 | * You should have received a copy of the GNU General Public License along | ||
20 | * with this program; if not, write to the Free Software Foundation, Inc., | ||
21 | * 675 Mass Ave, Cambridge, MA 02139, USA. | ||
22 | * | ||
23 | * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
24 | */ | ||
25 | |||
26 | |||
27 | Processor Clocking Control Driver | ||
28 | --------------------------------- | ||
29 | |||
30 | Contents: | ||
31 | --------- | ||
32 | 1. Introduction | ||
33 | 1.1 PCC interface | ||
34 | 1.1.1 Get Average Frequency | ||
35 | 1.1.2 Set Desired Frequency | ||
36 | 1.2 Platforms affected | ||
37 | 2. Driver and /sys details | ||
38 | 2.1 scaling_available_frequencies | ||
39 | 2.2 cpuinfo_transition_latency | ||
40 | 2.3 cpuinfo_cur_freq | ||
41 | 2.4 related_cpus | ||
42 | 3. Caveats | ||
43 | |||
44 | 1. Introduction: | ||
45 | ---------------- | ||
46 | Processor Clocking Control (PCC) is an interface between the platform | ||
47 | firmware and OSPM. It is a mechanism for coordinating processor | ||
48 | performance (ie: frequency) between the platform firmware and the OS. | ||
49 | |||
50 | The PCC driver (pcc-cpufreq) allows OSPM to take advantage of the PCC | ||
51 | interface. | ||
52 | |||
53 | OS utilizes the PCC interface to inform platform firmware what frequency the | ||
54 | OS wants for a logical processor. The platform firmware attempts to achieve | ||
55 | the requested frequency. If the request for the target frequency could not be | ||
56 | satisfied by platform firmware, then it usually means that power budget | ||
57 | conditions are in place, and "power capping" is taking place. | ||
58 | |||
59 | 1.1 PCC interface: | ||
60 | ------------------ | ||
61 | The complete PCC specification is available here: | ||
62 | http://www.acpica.org/download/Processor-Clocking-Control-v1p0.pdf | ||
63 | |||
64 | PCC relies on a shared memory region that provides a channel for communication | ||
65 | between the OS and platform firmware. PCC also implements a "doorbell" that | ||
66 | is used by the OS to inform the platform firmware that a command has been | ||
67 | sent. | ||
68 | |||
69 | The ACPI PCCH() method is used to discover the location of the PCC shared | ||
70 | memory region. The shared memory region header contains the "command" and | ||
71 | "status" interface. PCCH() also contains details on how to access the platform | ||
72 | doorbell. | ||
73 | |||
74 | The following commands are supported by the PCC interface: | ||
75 | * Get Average Frequency | ||
76 | * Set Desired Frequency | ||
77 | |||
78 | The ACPI PCCP() method is implemented for each logical processor and is | ||
79 | used to discover the offsets for the input and output buffers in the shared | ||
80 | memory region. | ||
81 | |||
82 | When PCC mode is enabled, the platform will not expose processor performance | ||
83 | or throttle states (_PSS, _TSS and related ACPI objects) to OSPM. Therefore, | ||
84 | the native P-state driver (such as acpi-cpufreq for Intel, powernow-k8 for | ||
85 | AMD) will not load. | ||
86 | |||
87 | However, OSPM remains in control of policy. The governor (eg: "ondemand") | ||
88 | computes the required performance for each processor based on server workload. | ||
89 | The PCC driver fills in the command interface, and the input buffer and | ||
90 | communicates the request to the platform firmware. The platform firmware is | ||
91 | responsible for delivering the requested performance. | ||
92 | |||
93 | Each PCC command is "global" in scope and can affect all the logical CPUs in | ||
94 | the system. Therefore, PCC is capable of performing "group" updates. With PCC | ||
95 | the OS is capable of getting/setting the frequency of all the logical CPUs in | ||
96 | the system with a single call to the BIOS. | ||
97 | |||
98 | 1.1.1 Get Average Frequency: | ||
99 | ---------------------------- | ||
100 | This command is used by the OSPM to query the running frequency of the | ||
101 | processor since the last time this command was completed. The output buffer | ||
102 | indicates the average unhalted frequency of the logical processor expressed as | ||
103 | a percentage of the nominal (ie: maximum) CPU frequency. The output buffer | ||
104 | also signifies if the CPU frequency is limited by a power budget condition. | ||
105 | |||
106 | 1.1.2 Set Desired Frequency: | ||
107 | ---------------------------- | ||
108 | This command is used by the OSPM to communicate to the platform firmware the | ||
109 | desired frequency for a logical processor. The output buffer is currently | ||
110 | ignored by OSPM. The next invocation of "Get Average Frequency" will inform | ||
111 | OSPM if the desired frequency was achieved or not. | ||
112 | |||
113 | 1.2 Platforms affected: | ||
114 | ----------------------- | ||
115 | The PCC driver will load on any system where the platform firmware: | ||
116 | * supports the PCC interface, and the associated PCCH() and PCCP() methods | ||
117 | * assumes responsibility for managing the hardware clocking controls in order | ||
118 | to deliver the requested processor performance | ||
119 | |||
120 | Currently, certain HP ProLiant platforms implement the PCC interface. On those | ||
121 | platforms PCC is the "default" choice. | ||
122 | |||
123 | However, it is possible to disable this interface via a BIOS setting. In | ||
124 | such an instance, as is also the case on platforms where the PCC interface | ||
125 | is not implemented, the PCC driver will fail to load silently. | ||
126 | |||
127 | 2. Driver and /sys details: | ||
128 | --------------------------- | ||
129 | When the driver loads, it merely prints the lowest and the highest CPU | ||
130 | frequencies supported by the platform firmware. | ||
131 | |||
132 | The PCC driver loads with a message such as: | ||
133 | pcc-cpufreq: (v1.00.00) driver loaded with frequency limits: 1600 MHz, 2933 | ||
134 | MHz | ||
135 | |||
136 | This means that the OPSM can request the CPU to run at any frequency in | ||
137 | between the limits (1600 MHz, and 2933 MHz) specified in the message. | ||
138 | |||
139 | Internally, there is no need for the driver to convert the "target" frequency | ||
140 | to a corresponding P-state. | ||
141 | |||
142 | The VERSION number for the driver will be of the format v.xy.ab. | ||
143 | eg: 1.00.02 | ||
144 | ----- -- | ||
145 | | | | ||
146 | | -- this will increase with bug fixes/enhancements to the driver | ||
147 | |-- this is the version of the PCC specification the driver adheres to | ||
148 | |||
149 | |||
150 | The following is a brief discussion on some of the fields exported via the | ||
151 | /sys filesystem and how their values are affected by the PCC driver: | ||
152 | |||
153 | 2.1 scaling_available_frequencies: | ||
154 | ---------------------------------- | ||
155 | scaling_available_frequencies is not created in /sys. No intermediate | ||
156 | frequencies need to be listed because the BIOS will try to achieve any | ||
157 | frequency, within limits, requested by the governor. A frequency does not have | ||
158 | to be strictly associated with a P-state. | ||
159 | |||
160 | 2.2 cpuinfo_transition_latency: | ||
161 | ------------------------------- | ||
162 | The cpuinfo_transition_latency field is 0. The PCC specification does | ||
163 | not include a field to expose this value currently. | ||
164 | |||
165 | 2.3 cpuinfo_cur_freq: | ||
166 | --------------------- | ||
167 | A) Often cpuinfo_cur_freq will show a value different than what is declared | ||
168 | in the scaling_available_frequencies or scaling_cur_freq, or scaling_max_freq. | ||
169 | This is due to "turbo boost" available on recent Intel processors. If certain | ||
170 | conditions are met the BIOS can achieve a slightly higher speed than requested | ||
171 | by OSPM. An example: | ||
172 | |||
173 | scaling_cur_freq : 2933000 | ||
174 | cpuinfo_cur_freq : 3196000 | ||
175 | |||
176 | B) There is a round-off error associated with the cpuinfo_cur_freq value. | ||
177 | Since the driver obtains the current frequency as a "percentage" (%) of the | ||
178 | nominal frequency from the BIOS, sometimes, the values displayed by | ||
179 | scaling_cur_freq and cpuinfo_cur_freq may not match. An example: | ||
180 | |||
181 | scaling_cur_freq : 1600000 | ||
182 | cpuinfo_cur_freq : 1583000 | ||
183 | |||
184 | In this example, the nominal frequency is 2933 MHz. The driver obtains the | ||
185 | current frequency, cpuinfo_cur_freq, as 54% of the nominal frequency: | ||
186 | |||
187 | 54% of 2933 MHz = 1583 MHz | ||
188 | |||
189 | Nominal frequency is the maximum frequency of the processor, and it usually | ||
190 | corresponds to the frequency of the P0 P-state. | ||
191 | |||
192 | 2.4 related_cpus: | ||
193 | ----------------- | ||
194 | The related_cpus field is identical to affected_cpus. | ||
195 | |||
196 | affected_cpus : 4 | ||
197 | related_cpus : 4 | ||
198 | |||
199 | Currently, the PCC driver does not evaluate _PSD. The platforms that support | ||
200 | PCC do not implement SW_ALL. So OSPM doesn't need to perform any coordination | ||
201 | to ensure that the same frequency is requested of all dependent CPUs. | ||
202 | |||
203 | 3. Caveats: | ||
204 | ----------- | ||
205 | The "cpufreq_stats" module in its present form cannot be loaded and | ||
206 | expected to work with the PCC driver. Since the "cpufreq_stats" module | ||
207 | provides information wrt each P-state, it is not applicable to the PCC driver. | ||
diff --git a/Documentation/device-mapper/snapshot.txt b/Documentation/device-mapper/snapshot.txt index e3a77b215135..0d5bc46dc167 100644 --- a/Documentation/device-mapper/snapshot.txt +++ b/Documentation/device-mapper/snapshot.txt | |||
@@ -122,3 +122,47 @@ volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16 | |||
122 | brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real | 122 | brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real |
123 | brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow | 123 | brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow |
124 | brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base | 124 | brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base |
125 | |||
126 | |||
127 | How to determine when a merging is complete | ||
128 | =========================================== | ||
129 | The snapshot-merge and snapshot status lines end with: | ||
130 | <sectors_allocated>/<total_sectors> <metadata_sectors> | ||
131 | |||
132 | Both <sectors_allocated> and <total_sectors> include both data and metadata. | ||
133 | During merging, the number of sectors allocated gets smaller and | ||
134 | smaller. Merging has finished when the number of sectors holding data | ||
135 | is zero, in other words <sectors_allocated> == <metadata_sectors>. | ||
136 | |||
137 | Here is a practical example (using a hybrid of lvm and dmsetup commands): | ||
138 | |||
139 | # lvs | ||
140 | LV VG Attr LSize Origin Snap% Move Log Copy% Convert | ||
141 | base volumeGroup owi-a- 4.00g | ||
142 | snap volumeGroup swi-a- 1.00g base 18.97 | ||
143 | |||
144 | # dmsetup status volumeGroup-snap | ||
145 | 0 8388608 snapshot 397896/2097152 1560 | ||
146 | ^^^^ metadata sectors | ||
147 | |||
148 | # lvconvert --merge -b volumeGroup/snap | ||
149 | Merging of volume snap started. | ||
150 | |||
151 | # lvs volumeGroup/snap | ||
152 | LV VG Attr LSize Origin Snap% Move Log Copy% Convert | ||
153 | base volumeGroup Owi-a- 4.00g 17.23 | ||
154 | |||
155 | # dmsetup status volumeGroup-base | ||
156 | 0 8388608 snapshot-merge 281688/2097152 1104 | ||
157 | |||
158 | # dmsetup status volumeGroup-base | ||
159 | 0 8388608 snapshot-merge 180480/2097152 712 | ||
160 | |||
161 | # dmsetup status volumeGroup-base | ||
162 | 0 8388608 snapshot-merge 16/2097152 16 | ||
163 | |||
164 | Merging has finished. | ||
165 | |||
166 | # lvs | ||
167 | LV VG Attr LSize Origin Snap% Move Log Copy% Convert | ||
168 | base volumeGroup owi-a- 4.00g | ||
diff --git a/Documentation/dontdiff b/Documentation/dontdiff index 3ad6acead949..d9bcffd59433 100644 --- a/Documentation/dontdiff +++ b/Documentation/dontdiff | |||
@@ -69,7 +69,6 @@ av_permissions.h | |||
69 | bbootsect | 69 | bbootsect |
70 | bin2c | 70 | bin2c |
71 | binkernel.spec | 71 | binkernel.spec |
72 | binoffset | ||
73 | bootsect | 72 | bootsect |
74 | bounds.h | 73 | bounds.h |
75 | bsetup | 74 | bsetup |
diff --git a/Documentation/driver-model/platform.txt b/Documentation/driver-model/platform.txt index 2e2c2ea90ceb..41f41632ee55 100644 --- a/Documentation/driver-model/platform.txt +++ b/Documentation/driver-model/platform.txt | |||
@@ -192,7 +192,7 @@ command line. This will execute all matching early_param() callbacks. | |||
192 | User specified early platform devices will be registered at this point. | 192 | User specified early platform devices will be registered at this point. |
193 | For the early serial console case the user can specify port on the | 193 | For the early serial console case the user can specify port on the |
194 | kernel command line as "earlyprintk=serial.0" where "earlyprintk" is | 194 | kernel command line as "earlyprintk=serial.0" where "earlyprintk" is |
195 | the class string, "serial" is the name of the platfrom driver and | 195 | the class string, "serial" is the name of the platform driver and |
196 | 0 is the platform device id. If the id is -1 then the dot and the | 196 | 0 is the platform device id. If the id is -1 then the dot and the |
197 | id can be omitted. | 197 | id can be omitted. |
198 | 198 | ||
diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware index 14b7b5a3bcb9..239cbdbf4d12 100644 --- a/Documentation/dvb/get_dvb_firmware +++ b/Documentation/dvb/get_dvb_firmware | |||
@@ -26,7 +26,7 @@ use IO::Handle; | |||
26 | "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004", | 26 | "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004", |
27 | "or51211", "or51132_qam", "or51132_vsb", "bluebird", | 27 | "or51211", "or51132_qam", "or51132_vsb", "bluebird", |
28 | "opera1", "cx231xx", "cx18", "cx23885", "pvrusb2", "mpc718", | 28 | "opera1", "cx231xx", "cx18", "cx23885", "pvrusb2", "mpc718", |
29 | "af9015"); | 29 | "af9015", "ngene"); |
30 | 30 | ||
31 | # Check args | 31 | # Check args |
32 | syntax() if (scalar(@ARGV) != 1); | 32 | syntax() if (scalar(@ARGV) != 1); |
@@ -39,7 +39,7 @@ for ($i=0; $i < scalar(@components); $i++) { | |||
39 | die $@ if $@; | 39 | die $@ if $@; |
40 | print STDERR <<EOF; | 40 | print STDERR <<EOF; |
41 | Firmware(s) $outfile extracted successfully. | 41 | Firmware(s) $outfile extracted successfully. |
42 | Now copy it(they) to either /usr/lib/hotplug/firmware or /lib/firmware | 42 | Now copy it(them) to either /usr/lib/hotplug/firmware or /lib/firmware |
43 | (depending on configuration of firmware hotplug). | 43 | (depending on configuration of firmware hotplug). |
44 | EOF | 44 | EOF |
45 | exit(0); | 45 | exit(0); |
@@ -549,6 +549,24 @@ sub af9015 { | |||
549 | close INFILE; | 549 | close INFILE; |
550 | } | 550 | } |
551 | 551 | ||
552 | sub ngene { | ||
553 | my $url = "http://www.digitaldevices.de/download/"; | ||
554 | my $file1 = "ngene_15.fw"; | ||
555 | my $hash1 = "d798d5a757121174f0dbc5f2833c0c85"; | ||
556 | my $file2 = "ngene_17.fw"; | ||
557 | my $hash2 = "26b687136e127b8ac24b81e0eeafc20b"; | ||
558 | |||
559 | checkstandard(); | ||
560 | |||
561 | wgetfile($file1, $url . $file1); | ||
562 | verify($file1, $hash1); | ||
563 | |||
564 | wgetfile($file2, $url . $file2); | ||
565 | verify($file2, $hash2); | ||
566 | |||
567 | "$file1, $file2"; | ||
568 | } | ||
569 | |||
552 | # --------------------------------------------------------------- | 570 | # --------------------------------------------------------------- |
553 | # Utilities | 571 | # Utilities |
554 | 572 | ||
@@ -667,6 +685,7 @@ sub delzero{ | |||
667 | sub syntax() { | 685 | sub syntax() { |
668 | print STDERR "syntax: get_dvb_firmware <component>\n"; | 686 | print STDERR "syntax: get_dvb_firmware <component>\n"; |
669 | print STDERR "Supported components:\n"; | 687 | print STDERR "Supported components:\n"; |
688 | @components = sort @components; | ||
670 | for($i=0; $i < scalar(@components); $i++) { | 689 | for($i=0; $i < scalar(@components); $i++) { |
671 | print STDERR "\t" . $components[$i] . "\n"; | 690 | print STDERR "\t" . $components[$i] . "\n"; |
672 | } | 691 | } |
diff --git a/Documentation/eisa.txt b/Documentation/eisa.txt index 60e361ba08c0..f297fc1202ae 100644 --- a/Documentation/eisa.txt +++ b/Documentation/eisa.txt | |||
@@ -171,7 +171,7 @@ device. | |||
171 | virtual_root.force_probe : | 171 | virtual_root.force_probe : |
172 | 172 | ||
173 | Force the probing code to probe EISA slots even when it cannot find an | 173 | Force the probing code to probe EISA slots even when it cannot find an |
174 | EISA compliant mainboard (nothing appears on slot 0). Defaultd to 0 | 174 | EISA compliant mainboard (nothing appears on slot 0). Defaults to 0 |
175 | (don't force), and set to 1 (force probing) when either | 175 | (don't force), and set to 1 (force probing) when either |
176 | CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set. | 176 | CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set. |
177 | 177 | ||
diff --git a/Documentation/email-clients.txt b/Documentation/email-clients.txt index a618efab7b15..945ff3fda433 100644 --- a/Documentation/email-clients.txt +++ b/Documentation/email-clients.txt | |||
@@ -216,26 +216,14 @@ Works. Use "Insert file..." or external editor. | |||
216 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 216 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
217 | Gmail (Web GUI) | 217 | Gmail (Web GUI) |
218 | 218 | ||
219 | If you just have to use Gmail to send patches, it CAN be made to work. It | 219 | Does not work for sending patches. |
220 | requires a bit of external help, though. | 220 | |
221 | 221 | Gmail web client converts tabs to spaces automatically. | |
222 | The first problem is that Gmail converts tabs to spaces. This will | 222 | |
223 | totally break your patches. To prevent this, you have to use a different | 223 | At the same time it wraps lines every 78 chars with CRLF style line breaks |
224 | editor. There is a firefox extension called "ViewSourceWith" | 224 | although tab2space problem can be solved with external editor. |
225 | (https://addons.mozilla.org/en-US/firefox/addon/394) which allows you to | 225 | |
226 | edit any text box in the editor of your choice. Configure it to launch | 226 | Another problem is that Gmail will base64-encode any message that has a |
227 | your favorite editor. When you want to send a patch, use this technique. | 227 | non-ASCII character. That includes things like European names. |
228 | Once you have crafted your messsage + patch, save and exit the editor, | ||
229 | which should reload the Gmail edit box. GMAIL WILL PRESERVE THE TABS. | ||
230 | Hoorah. Apparently you can cut-n-paste literal tabs, but Gmail will | ||
231 | convert those to spaces upon sending! | ||
232 | |||
233 | The second problem is that Gmail converts tabs to spaces on replies. If | ||
234 | you reply to a patch, don't expect to be able to apply it as a patch. | ||
235 | |||
236 | The last problem is that Gmail will base64-encode any message that has a | ||
237 | non-ASCII character. That includes things like European names. Be aware. | ||
238 | |||
239 | Gmail is not convenient for lkml patches, but CAN be made to work. | ||
240 | 228 | ||
241 | ### | 229 | ### |
diff --git a/Documentation/fault-injection/provoke-crashes.txt b/Documentation/fault-injection/provoke-crashes.txt new file mode 100644 index 000000000000..7a9d3d81525b --- /dev/null +++ b/Documentation/fault-injection/provoke-crashes.txt | |||
@@ -0,0 +1,38 @@ | |||
1 | The lkdtm module provides an interface to crash or injure the kernel at | ||
2 | predefined crashpoints to evaluate the reliability of crash dumps obtained | ||
3 | using different dumping solutions. The module uses KPROBEs to instrument | ||
4 | crashing points, but can also crash the kernel directly without KRPOBE | ||
5 | support. | ||
6 | |||
7 | |||
8 | You can provide the way either through module arguments when inserting | ||
9 | the module, or through a debugfs interface. | ||
10 | |||
11 | Usage: insmod lkdtm.ko [recur_count={>0}] cpoint_name=<> cpoint_type=<> | ||
12 | [cpoint_count={>0}] | ||
13 | |||
14 | recur_count : Recursion level for the stack overflow test. Default is 10. | ||
15 | |||
16 | cpoint_name : Crash point where the kernel is to be crashed. It can be | ||
17 | one of INT_HARDWARE_ENTRY, INT_HW_IRQ_EN, INT_TASKLET_ENTRY, | ||
18 | FS_DEVRW, MEM_SWAPOUT, TIMERADD, SCSI_DISPATCH_CMD, | ||
19 | IDE_CORE_CP, DIRECT | ||
20 | |||
21 | cpoint_type : Indicates the action to be taken on hitting the crash point. | ||
22 | It can be one of PANIC, BUG, EXCEPTION, LOOP, OVERFLOW, | ||
23 | CORRUPT_STACK, UNALIGNED_LOAD_STORE_WRITE, OVERWRITE_ALLOCATION, | ||
24 | WRITE_AFTER_FREE, | ||
25 | |||
26 | cpoint_count : Indicates the number of times the crash point is to be hit | ||
27 | to trigger an action. The default is 10. | ||
28 | |||
29 | You can also induce failures by mounting debugfs and writing the type to | ||
30 | <mountpoint>/provoke-crash/<crashpoint>. E.g., | ||
31 | |||
32 | mount -t debugfs debugfs /mnt | ||
33 | echo EXCEPTION > /mnt/provoke-crash/INT_HARDWARE_ENTRY | ||
34 | |||
35 | |||
36 | A special file is `DIRECT' which will induce the crash directly without | ||
37 | KPROBE instrumentation. This mode is the only one available when the module | ||
38 | is built on a kernel without KPROBEs support. | ||
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 0a46833c1b76..ed511af0f79a 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -6,21 +6,6 @@ be removed from this file. | |||
6 | 6 | ||
7 | --------------------------- | 7 | --------------------------- |
8 | 8 | ||
9 | What: USER_SCHED | ||
10 | When: 2.6.34 | ||
11 | |||
12 | Why: USER_SCHED was implemented as a proof of concept for group scheduling. | ||
13 | The effect of USER_SCHED can already be achieved from userspace with | ||
14 | the help of libcgroup. The removal of USER_SCHED will also simplify | ||
15 | the scheduler code with the removal of one major ifdef. There are also | ||
16 | issues USER_SCHED has with USER_NS. A decision was taken not to fix | ||
17 | those and instead remove USER_SCHED. Also new group scheduling | ||
18 | features will not be implemented for USER_SCHED. | ||
19 | |||
20 | Who: Dhaval Giani <dhaval@linux.vnet.ibm.com> | ||
21 | |||
22 | --------------------------- | ||
23 | |||
24 | What: PRISM54 | 9 | What: PRISM54 |
25 | When: 2.6.34 | 10 | When: 2.6.34 |
26 | 11 | ||
@@ -64,6 +49,17 @@ Who: Robin Getz <rgetz@blackfin.uclinux.org> & Matt Mackall <mpm@selenic.com> | |||
64 | 49 | ||
65 | --------------------------- | 50 | --------------------------- |
66 | 51 | ||
52 | What: Deprecated snapshot ioctls | ||
53 | When: 2.6.36 | ||
54 | |||
55 | Why: The ioctls in kernel/power/user.c were marked as deprecated long time | ||
56 | ago. Now they notify users about that so that they need to replace | ||
57 | their userspace. After some more time, remove them completely. | ||
58 | |||
59 | Who: Jiri Slaby <jirislaby@gmail.com> | ||
60 | |||
61 | --------------------------- | ||
62 | |||
67 | What: The ieee80211_regdom module parameter | 63 | What: The ieee80211_regdom module parameter |
68 | When: March 2010 / desktop catchup | 64 | When: March 2010 / desktop catchup |
69 | 65 | ||
@@ -88,27 +84,6 @@ Who: Luis R. Rodriguez <lrodriguez@atheros.com> | |||
88 | 84 | ||
89 | --------------------------- | 85 | --------------------------- |
90 | 86 | ||
91 | What: CONFIG_WIRELESS_OLD_REGULATORY - old static regulatory information | ||
92 | When: March 2010 / desktop catchup | ||
93 | |||
94 | Why: The old regulatory infrastructure has been replaced with a new one | ||
95 | which does not require statically defined regulatory domains. We do | ||
96 | not want to keep static regulatory domains in the kernel due to the | ||
97 | the dynamic nature of regulatory law and localization. We kept around | ||
98 | the old static definitions for the regulatory domains of: | ||
99 | |||
100 | * US | ||
101 | * JP | ||
102 | * EU | ||
103 | |||
104 | and used by default the US when CONFIG_WIRELESS_OLD_REGULATORY was | ||
105 | set. We will remove this option once the standard Linux desktop catches | ||
106 | up with the new userspace APIs we have implemented. | ||
107 | |||
108 | Who: Luis R. Rodriguez <lrodriguez@atheros.com> | ||
109 | |||
110 | --------------------------- | ||
111 | |||
112 | What: dev->power.power_state | 87 | What: dev->power.power_state |
113 | When: July 2007 | 88 | When: July 2007 |
114 | Why: Broken design for runtime control over driver power states, confusing | 89 | Why: Broken design for runtime control over driver power states, confusing |
@@ -142,19 +117,25 @@ Who: Mauro Carvalho Chehab <mchehab@infradead.org> | |||
142 | --------------------------- | 117 | --------------------------- |
143 | 118 | ||
144 | What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) | 119 | What: PCMCIA control ioctl (needed for pcmcia-cs [cardmgr, cardctl]) |
145 | When: November 2005 | 120 | When: 2.6.35/2.6.36 |
146 | Files: drivers/pcmcia/: pcmcia_ioctl.c | 121 | Files: drivers/pcmcia/: pcmcia_ioctl.c |
147 | Why: With the 16-bit PCMCIA subsystem now behaving (almost) like a | 122 | Why: With the 16-bit PCMCIA subsystem now behaving (almost) like a |
148 | normal hotpluggable bus, and with it using the default kernel | 123 | normal hotpluggable bus, and with it using the default kernel |
149 | infrastructure (hotplug, driver core, sysfs) keeping the PCMCIA | 124 | infrastructure (hotplug, driver core, sysfs) keeping the PCMCIA |
150 | control ioctl needed by cardmgr and cardctl from pcmcia-cs is | 125 | control ioctl needed by cardmgr and cardctl from pcmcia-cs is |
151 | unnecessary, and makes further cleanups and integration of the | 126 | unnecessary and potentially harmful (it does not provide for |
127 | proper locking), and makes further cleanups and integration of the | ||
152 | PCMCIA subsystem into the Linux kernel device driver model more | 128 | PCMCIA subsystem into the Linux kernel device driver model more |
153 | difficult. The features provided by cardmgr and cardctl are either | 129 | difficult. The features provided by cardmgr and cardctl are either |
154 | handled by the kernel itself now or are available in the new | 130 | handled by the kernel itself now or are available in the new |
155 | pcmciautils package available at | 131 | pcmciautils package available at |
156 | http://kernel.org/pub/linux/utils/kernel/pcmcia/ | 132 | http://kernel.org/pub/linux/utils/kernel/pcmcia/ |
157 | Who: Dominik Brodowski <linux@brodo.de> | 133 | |
134 | For all architectures except ARM, the associated config symbol | ||
135 | has been removed from kernel 2.6.34; for ARM, it will be likely | ||
136 | be removed from kernel 2.6.35. The actual code will then likely | ||
137 | be removed from kernel 2.6.36. | ||
138 | Who: Dominik Brodowski <linux@dominikbrodowski.net> | ||
158 | 139 | ||
159 | --------------------------- | 140 | --------------------------- |
160 | 141 | ||
@@ -468,12 +449,6 @@ Who: Alok N Kataria <akataria@vmware.com> | |||
468 | 449 | ||
469 | ---------------------------- | 450 | ---------------------------- |
470 | 451 | ||
471 | What: adt7473 hardware monitoring driver | ||
472 | When: February 2010 | ||
473 | Why: Obsoleted by the adt7475 driver. | ||
474 | Who: Jean Delvare <khali@linux-fr.org> | ||
475 | |||
476 | --------------------------- | ||
477 | What: Support for lcd_switch and display_get in asus-laptop driver | 452 | What: Support for lcd_switch and display_get in asus-laptop driver |
478 | When: March 2010 | 453 | When: March 2010 |
479 | Why: These two features use non-standard interfaces. There are the | 454 | Why: These two features use non-standard interfaces. There are the |
@@ -542,3 +517,75 @@ Why: Duplicate functionality with the gspca_zc3xx driver, zc0301 only | |||
542 | sensors) wich are also supported by the gspca_zc3xx driver | 517 | sensors) wich are also supported by the gspca_zc3xx driver |
543 | (which supports 53 USB-ID's in total) | 518 | (which supports 53 USB-ID's in total) |
544 | Who: Hans de Goede <hdegoede@redhat.com> | 519 | Who: Hans de Goede <hdegoede@redhat.com> |
520 | |||
521 | ---------------------------- | ||
522 | |||
523 | What: corgikbd, spitzkbd, tosakbd driver | ||
524 | When: 2.6.35 | ||
525 | Files: drivers/input/keyboard/{corgi,spitz,tosa}kbd.c | ||
526 | Why: We now have a generic GPIO based matrix keyboard driver that | ||
527 | are fully capable of handling all the keys on these devices. | ||
528 | The original drivers manipulate the GPIO registers directly | ||
529 | and so are difficult to maintain. | ||
530 | Who: Eric Miao <eric.y.miao@gmail.com> | ||
531 | |||
532 | ---------------------------- | ||
533 | |||
534 | What: corgi_ssp and corgi_ts driver | ||
535 | When: 2.6.35 | ||
536 | Files: arch/arm/mach-pxa/corgi_ssp.c, drivers/input/touchscreen/corgi_ts.c | ||
537 | Why: The corgi touchscreen is now deprecated in favour of the generic | ||
538 | ads7846.c driver. The noise reduction technique used in corgi_ts.c, | ||
539 | that's to wait till vsync before ADC sampling, is also integrated into | ||
540 | ads7846 driver now. Provided that the original driver is not generic | ||
541 | and is difficult to maintain, it will be removed later. | ||
542 | Who: Eric Miao <eric.y.miao@gmail.com> | ||
543 | |||
544 | ---------------------------- | ||
545 | |||
546 | What: capifs | ||
547 | When: February 2011 | ||
548 | Files: drivers/isdn/capi/capifs.* | ||
549 | Why: udev fully replaces this special file system that only contains CAPI | ||
550 | NCCI TTY device nodes. User space (pppdcapiplugin) works without | ||
551 | noticing the difference. | ||
552 | Who: Jan Kiszka <jan.kiszka@web.de> | ||
553 | |||
554 | ---------------------------- | ||
555 | |||
556 | What: KVM memory aliases support | ||
557 | When: July 2010 | ||
558 | Why: Memory aliasing support is used for speeding up guest vga access | ||
559 | through the vga windows. | ||
560 | |||
561 | Modern userspace no longer uses this feature, so it's just bitrotted | ||
562 | code and can be removed with no impact. | ||
563 | Who: Avi Kivity <avi@redhat.com> | ||
564 | |||
565 | ---------------------------- | ||
566 | |||
567 | What: KVM kernel-allocated memory slots | ||
568 | When: July 2010 | ||
569 | Why: Since 2.6.25, kvm supports user-allocated memory slots, which are | ||
570 | much more flexible than kernel-allocated slots. All current userspace | ||
571 | supports the newer interface and this code can be removed with no | ||
572 | impact. | ||
573 | Who: Avi Kivity <avi@redhat.com> | ||
574 | |||
575 | ---------------------------- | ||
576 | |||
577 | What: KVM paravirt mmu host support | ||
578 | When: January 2011 | ||
579 | Why: The paravirt mmu host support is slower than non-paravirt mmu, both | ||
580 | on newer and older hardware. It is already not exposed to the guest, | ||
581 | and kept only for live migration purposes. | ||
582 | Who: Avi Kivity <avi@redhat.com> | ||
583 | |||
584 | ---------------------------- | ||
585 | |||
586 | What: "acpi=ht" boot option | ||
587 | When: 2.6.35 | ||
588 | Why: Useful in 2003, implementation is a hack. | ||
589 | Generally invoked by accident today. | ||
590 | Seen as doing more harm than good. | ||
591 | Who: Len Brown <len.brown@intel.com> | ||
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 875d49696b6e..3bae418c6ad3 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX | |||
@@ -32,6 +32,8 @@ dlmfs.txt | |||
32 | - info on the userspace interface to the OCFS2 DLM. | 32 | - info on the userspace interface to the OCFS2 DLM. |
33 | dnotify.txt | 33 | dnotify.txt |
34 | - info about directory notification in Linux. | 34 | - info about directory notification in Linux. |
35 | dnotify_test.c | ||
36 | - example program for dnotify | ||
35 | ecryptfs.txt | 37 | ecryptfs.txt |
36 | - docs on eCryptfs: stacked cryptographic filesystem for Linux. | 38 | - docs on eCryptfs: stacked cryptographic filesystem for Linux. |
37 | exofs.txt | 39 | exofs.txt |
@@ -62,6 +64,8 @@ jfs.txt | |||
62 | - info and mount options for the JFS filesystem. | 64 | - info and mount options for the JFS filesystem. |
63 | locks.txt | 65 | locks.txt |
64 | - info on file locking implementations, flock() vs. fcntl(), etc. | 66 | - info on file locking implementations, flock() vs. fcntl(), etc. |
67 | logfs.txt | ||
68 | - info on the LogFS flash filesystem. | ||
65 | mandatory-locking.txt | 69 | mandatory-locking.txt |
66 | - info on the Linux implementation of Sys V mandatory file locking. | 70 | - info on the Linux implementation of Sys V mandatory file locking. |
67 | ncpfs.txt | 71 | ncpfs.txt |
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 18b9d0ca0630..06bbbed71206 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking | |||
@@ -460,13 +460,6 @@ in sys_read() and friends. | |||
460 | 460 | ||
461 | --------------------------- dquot_operations ------------------------------- | 461 | --------------------------- dquot_operations ------------------------------- |
462 | prototypes: | 462 | prototypes: |
463 | int (*initialize) (struct inode *, int); | ||
464 | int (*drop) (struct inode *); | ||
465 | int (*alloc_space) (struct inode *, qsize_t, int); | ||
466 | int (*alloc_inode) (const struct inode *, unsigned long); | ||
467 | int (*free_space) (struct inode *, qsize_t); | ||
468 | int (*free_inode) (const struct inode *, unsigned long); | ||
469 | int (*transfer) (struct inode *, struct iattr *); | ||
470 | int (*write_dquot) (struct dquot *); | 463 | int (*write_dquot) (struct dquot *); |
471 | int (*acquire_dquot) (struct dquot *); | 464 | int (*acquire_dquot) (struct dquot *); |
472 | int (*release_dquot) (struct dquot *); | 465 | int (*release_dquot) (struct dquot *); |
@@ -479,13 +472,6 @@ a proper locking wrt the filesystem and call the generic quota operations. | |||
479 | What filesystem should expect from the generic quota functions: | 472 | What filesystem should expect from the generic quota functions: |
480 | 473 | ||
481 | FS recursion Held locks when called | 474 | FS recursion Held locks when called |
482 | initialize: yes maybe dqonoff_sem | ||
483 | drop: yes - | ||
484 | alloc_space: ->mark_dirty() - | ||
485 | alloc_inode: ->mark_dirty() - | ||
486 | free_space: ->mark_dirty() - | ||
487 | free_inode: ->mark_dirty() - | ||
488 | transfer: yes - | ||
489 | write_dquot: yes dqonoff_sem or dqptr_sem | 475 | write_dquot: yes dqonoff_sem or dqptr_sem |
490 | acquire_dquot: yes dqonoff_sem or dqptr_sem | 476 | acquire_dquot: yes dqonoff_sem or dqptr_sem |
491 | release_dquot: yes dqonoff_sem or dqptr_sem | 477 | release_dquot: yes dqonoff_sem or dqptr_sem |
@@ -495,10 +481,6 @@ write_info: yes dqonoff_sem | |||
495 | FS recursion means calling ->quota_read() and ->quota_write() from superblock | 481 | FS recursion means calling ->quota_read() and ->quota_write() from superblock |
496 | operations. | 482 | operations. |
497 | 483 | ||
498 | ->alloc_space(), ->alloc_inode(), ->free_space(), ->free_inode() are called | ||
499 | only directly by the filesystem and do not call any fs functions only | ||
500 | the ->mark_dirty() operation. | ||
501 | |||
502 | More details about quota locking can be found in fs/dquot.c. | 484 | More details about quota locking can be found in fs/dquot.c. |
503 | 485 | ||
504 | --------------------------- vm_operations_struct ----------------------------- | 486 | --------------------------- vm_operations_struct ----------------------------- |
diff --git a/Documentation/filesystems/Makefile b/Documentation/filesystems/Makefile new file mode 100644 index 000000000000..a5dd114da14f --- /dev/null +++ b/Documentation/filesystems/Makefile | |||
@@ -0,0 +1,8 @@ | |||
1 | # kbuild trick to avoid linker error. Can be omitted if a module is built. | ||
2 | obj- := dummy.o | ||
3 | |||
4 | # List of programs to build | ||
5 | hostprogs-y := dnotify_test | ||
6 | |||
7 | # Tell kbuild to always build the programs | ||
8 | always := $(hostprogs-y) | ||
diff --git a/Documentation/filesystems/dentry-locking.txt b/Documentation/filesystems/dentry-locking.txt index 4c0c575a4012..79334ed5daa7 100644 --- a/Documentation/filesystems/dentry-locking.txt +++ b/Documentation/filesystems/dentry-locking.txt | |||
@@ -62,7 +62,8 @@ changes are : | |||
62 | 2. Insertion of a dentry into the hash table is done using | 62 | 2. Insertion of a dentry into the hash table is done using |
63 | hlist_add_head_rcu() which take care of ordering the writes - the | 63 | hlist_add_head_rcu() which take care of ordering the writes - the |
64 | writes to the dentry must be visible before the dentry is | 64 | writes to the dentry must be visible before the dentry is |
65 | inserted. This works in conjunction with hlist_for_each_rcu() while | 65 | inserted. This works in conjunction with hlist_for_each_rcu(), |
66 | which has since been replaced by hlist_for_each_entry_rcu(), while | ||
66 | walking the hash chain. The only requirement is that all | 67 | walking the hash chain. The only requirement is that all |
67 | initialization to the dentry must be done before | 68 | initialization to the dentry must be done before |
68 | hlist_add_head_rcu() since we don't have dcache_lock protection | 69 | hlist_add_head_rcu() since we don't have dcache_lock protection |
diff --git a/Documentation/filesystems/dnotify.txt b/Documentation/filesystems/dnotify.txt index 9f5d338ddbb8..6baf88f46859 100644 --- a/Documentation/filesystems/dnotify.txt +++ b/Documentation/filesystems/dnotify.txt | |||
@@ -62,38 +62,9 @@ disabled, fcntl(fd, F_NOTIFY, ...) will return -EINVAL. | |||
62 | 62 | ||
63 | Example | 63 | Example |
64 | ------- | 64 | ------- |
65 | See Documentation/filesystems/dnotify_test.c for an example. | ||
65 | 66 | ||
66 | #define _GNU_SOURCE /* needed to get the defines */ | 67 | NOTE |
67 | #include <fcntl.h> /* in glibc 2.2 this has the needed | 68 | ---- |
68 | values defined */ | 69 | Beginning with Linux 2.6.13, dnotify has been replaced by inotify. |
69 | #include <signal.h> | 70 | See Documentation/filesystems/inotify.txt for more information on it. |
70 | #include <stdio.h> | ||
71 | #include <unistd.h> | ||
72 | |||
73 | static volatile int event_fd; | ||
74 | |||
75 | static void handler(int sig, siginfo_t *si, void *data) | ||
76 | { | ||
77 | event_fd = si->si_fd; | ||
78 | } | ||
79 | |||
80 | int main(void) | ||
81 | { | ||
82 | struct sigaction act; | ||
83 | int fd; | ||
84 | |||
85 | act.sa_sigaction = handler; | ||
86 | sigemptyset(&act.sa_mask); | ||
87 | act.sa_flags = SA_SIGINFO; | ||
88 | sigaction(SIGRTMIN + 1, &act, NULL); | ||
89 | |||
90 | fd = open(".", O_RDONLY); | ||
91 | fcntl(fd, F_SETSIG, SIGRTMIN + 1); | ||
92 | fcntl(fd, F_NOTIFY, DN_MODIFY|DN_CREATE|DN_MULTISHOT); | ||
93 | /* we will now be notified if any of the files | ||
94 | in "." is modified or new files are created */ | ||
95 | while (1) { | ||
96 | pause(); | ||
97 | printf("Got event on fd=%d\n", event_fd); | ||
98 | } | ||
99 | } | ||
diff --git a/Documentation/filesystems/dnotify_test.c b/Documentation/filesystems/dnotify_test.c new file mode 100644 index 000000000000..8b37b4a1e18d --- /dev/null +++ b/Documentation/filesystems/dnotify_test.c | |||
@@ -0,0 +1,34 @@ | |||
1 | #define _GNU_SOURCE /* needed to get the defines */ | ||
2 | #include <fcntl.h> /* in glibc 2.2 this has the needed | ||
3 | values defined */ | ||
4 | #include <signal.h> | ||
5 | #include <stdio.h> | ||
6 | #include <unistd.h> | ||
7 | |||
8 | static volatile int event_fd; | ||
9 | |||
10 | static void handler(int sig, siginfo_t *si, void *data) | ||
11 | { | ||
12 | event_fd = si->si_fd; | ||
13 | } | ||
14 | |||
15 | int main(void) | ||
16 | { | ||
17 | struct sigaction act; | ||
18 | int fd; | ||
19 | |||
20 | act.sa_sigaction = handler; | ||
21 | sigemptyset(&act.sa_mask); | ||
22 | act.sa_flags = SA_SIGINFO; | ||
23 | sigaction(SIGRTMIN + 1, &act, NULL); | ||
24 | |||
25 | fd = open(".", O_RDONLY); | ||
26 | fcntl(fd, F_SETSIG, SIGRTMIN + 1); | ||
27 | fcntl(fd, F_NOTIFY, DN_MODIFY|DN_CREATE|DN_MULTISHOT); | ||
28 | /* we will now be notified if any of the files | ||
29 | in "." is modified or new files are created */ | ||
30 | while (1) { | ||
31 | pause(); | ||
32 | printf("Got event on fd=%d\n", event_fd); | ||
33 | } | ||
34 | } | ||
diff --git a/Documentation/filesystems/logfs.txt b/Documentation/filesystems/logfs.txt new file mode 100644 index 000000000000..e64c94ba401a --- /dev/null +++ b/Documentation/filesystems/logfs.txt | |||
@@ -0,0 +1,241 @@ | |||
1 | |||
2 | The LogFS Flash Filesystem | ||
3 | ========================== | ||
4 | |||
5 | Specification | ||
6 | ============= | ||
7 | |||
8 | Superblocks | ||
9 | ----------- | ||
10 | |||
11 | Two superblocks exist at the beginning and end of the filesystem. | ||
12 | Each superblock is 256 Bytes large, with another 3840 Bytes reserved | ||
13 | for future purposes, making a total of 4096 Bytes. | ||
14 | |||
15 | Superblock locations may differ for MTD and block devices. On MTD the | ||
16 | first non-bad block contains a superblock in the first 4096 Bytes and | ||
17 | the last non-bad block contains a superblock in the last 4096 Bytes. | ||
18 | On block devices, the first 4096 Bytes of the device contain the first | ||
19 | superblock and the last aligned 4096 Byte-block contains the second | ||
20 | superblock. | ||
21 | |||
22 | For the most part, the superblocks can be considered read-only. They | ||
23 | are written only to correct errors detected within the superblocks, | ||
24 | move the journal and change the filesystem parameters through tunefs. | ||
25 | As a result, the superblock does not contain any fields that require | ||
26 | constant updates, like the amount of free space, etc. | ||
27 | |||
28 | Segments | ||
29 | -------- | ||
30 | |||
31 | The space in the device is split up into equal-sized segments. | ||
32 | Segments are the primary write unit of LogFS. Within each segments, | ||
33 | writes happen from front (low addresses) to back (high addresses. If | ||
34 | only a partial segment has been written, the segment number, the | ||
35 | current position within and optionally a write buffer are stored in | ||
36 | the journal. | ||
37 | |||
38 | Segments are erased as a whole. Therefore Garbage Collection may be | ||
39 | required to completely free a segment before doing so. | ||
40 | |||
41 | Journal | ||
42 | -------- | ||
43 | |||
44 | The journal contains all global information about the filesystem that | ||
45 | is subject to frequent change. At mount time, it has to be scanned | ||
46 | for the most recent commit entry, which contains a list of pointers to | ||
47 | all currently valid entries. | ||
48 | |||
49 | Object Store | ||
50 | ------------ | ||
51 | |||
52 | All space except for the superblocks and journal is part of the object | ||
53 | store. Each segment contains a segment header and a number of | ||
54 | objects, each consisting of the object header and the payload. | ||
55 | Objects are either inodes, directory entries (dentries), file data | ||
56 | blocks or indirect blocks. | ||
57 | |||
58 | Levels | ||
59 | ------ | ||
60 | |||
61 | Garbage collection (GC) may fail if all data is written | ||
62 | indiscriminately. One requirement of GC is that data is seperated | ||
63 | roughly according to the distance between the tree root and the data. | ||
64 | Effectively that means all file data is on level 0, indirect blocks | ||
65 | are on levels 1, 2, 3 4 or 5 for 1x, 2x, 3x, 4x or 5x indirect blocks, | ||
66 | respectively. Inode file data is on level 6 for the inodes and 7-11 | ||
67 | for indirect blocks. | ||
68 | |||
69 | Each segment contains objects of a single level only. As a result, | ||
70 | each level requires its own seperate segment to be open for writing. | ||
71 | |||
72 | Inode File | ||
73 | ---------- | ||
74 | |||
75 | All inodes are stored in a special file, the inode file. Single | ||
76 | exception is the inode file's inode (master inode) which for obvious | ||
77 | reasons is stored in the journal instead. Instead of data blocks, the | ||
78 | leaf nodes of the inode files are inodes. | ||
79 | |||
80 | Aliases | ||
81 | ------- | ||
82 | |||
83 | Writes in LogFS are done by means of a wandering tree. A naïve | ||
84 | implementation would require that for each write or a block, all | ||
85 | parent blocks are written as well, since the block pointers have | ||
86 | changed. Such an implementation would not be very efficient. | ||
87 | |||
88 | In LogFS, the block pointer changes are cached in the journal by means | ||
89 | of alias entries. Each alias consists of its logical address - inode | ||
90 | number, block index, level and child number (index into block) - and | ||
91 | the changed data. Any 8-byte word can be changes in this manner. | ||
92 | |||
93 | Currently aliases are used for block pointers, file size, file used | ||
94 | bytes and the height of an inodes indirect tree. | ||
95 | |||
96 | Segment Aliases | ||
97 | --------------- | ||
98 | |||
99 | Related to regular aliases, these are used to handle bad blocks. | ||
100 | Initially, bad blocks are handled by moving the affected segment | ||
101 | content to a spare segment and noting this move in the journal with a | ||
102 | segment alias, a simple (to, from) tupel. GC will later empty this | ||
103 | segment and the alias can be removed again. This is used on MTD only. | ||
104 | |||
105 | Vim | ||
106 | --- | ||
107 | |||
108 | By cleverly predicting the life time of data, it is possible to | ||
109 | seperate long-living data from short-living data and thereby reduce | ||
110 | the GC overhead later. Each type of distinc life expectency (vim) can | ||
111 | have a seperate segment open for writing. Each (level, vim) tupel can | ||
112 | be open just once. If an open segment with unknown vim is encountered | ||
113 | at mount time, it is closed and ignored henceforth. | ||
114 | |||
115 | Indirect Tree | ||
116 | ------------- | ||
117 | |||
118 | Inodes in LogFS are similar to FFS-style filesystems with direct and | ||
119 | indirect block pointers. One difference is that LogFS uses a single | ||
120 | indirect pointer that can be either a 1x, 2x, etc. indirect pointer. | ||
121 | A height field in the inode defines the height of the indirect tree | ||
122 | and thereby the indirection of the pointer. | ||
123 | |||
124 | Another difference is the addressing of indirect blocks. In LogFS, | ||
125 | the first 16 pointers in the first indirect block are left empty, | ||
126 | corresponding to the 16 direct pointers in the inode. In ext2 (maybe | ||
127 | others as well) the first pointer in the first indirect block | ||
128 | corresponds to logical block 12, skipping the 12 direct pointers. | ||
129 | So where ext2 is using arithmetic to better utilize space, LogFS keeps | ||
130 | arithmetic simple and uses compression to save space. | ||
131 | |||
132 | Compression | ||
133 | ----------- | ||
134 | |||
135 | Both file data and metadata can be compressed. Compression for file | ||
136 | data can be enabled with chattr +c and disabled with chattr -c. Doing | ||
137 | so has no effect on existing data, but new data will be stored | ||
138 | accordingly. New inodes will inherit the compression flag of the | ||
139 | parent directory. | ||
140 | |||
141 | Metadata is always compressed. However, the space accounting ignores | ||
142 | this and charges for the uncompressed size. Failing to do so could | ||
143 | result in GC failures when, after moving some data, indirect blocks | ||
144 | compress worse than previously. Even on a 100% full medium, GC may | ||
145 | not consume any extra space, so the compression gains are lost space | ||
146 | to the user. | ||
147 | |||
148 | However, they are not lost space to the filesystem internals. By | ||
149 | cheating the user for those bytes, the filesystem gained some slack | ||
150 | space and GC will run less often and faster. | ||
151 | |||
152 | Garbage Collection and Wear Leveling | ||
153 | ------------------------------------ | ||
154 | |||
155 | Garbage collection is invoked whenever the number of free segments | ||
156 | falls below a threshold. The best (known) candidate is picked based | ||
157 | on the least amount of valid data contained in the segment. All | ||
158 | remaining valid data is copied elsewhere, thereby invalidating it. | ||
159 | |||
160 | The GC code also checks for aliases and writes then back if their | ||
161 | number gets too large. | ||
162 | |||
163 | Wear leveling is done by occasionally picking a suboptimal segment for | ||
164 | garbage collection. If a stale segments erase count is significantly | ||
165 | lower than the active segments' erase counts, it will be picked. Wear | ||
166 | leveling is rate limited, so it will never monopolize the device for | ||
167 | more than one segment worth at a time. | ||
168 | |||
169 | Values for "occasionally", "significantly lower" are compile time | ||
170 | constants. | ||
171 | |||
172 | Hashed directories | ||
173 | ------------------ | ||
174 | |||
175 | To satisfy efficient lookup(), directory entries are hashed and | ||
176 | located based on the hash. In order to both support large directories | ||
177 | and not be overly inefficient for small directories, several hash | ||
178 | tables of increasing size are used. For each table, the hash value | ||
179 | modulo the table size gives the table index. | ||
180 | |||
181 | Tables sizes are chosen to limit the number of indirect blocks with a | ||
182 | fully populated table to 0, 1, 2 or 3 respectively. So the first | ||
183 | table contains 16 entries, the second 512-16, etc. | ||
184 | |||
185 | The last table is special in several ways. First its size depends on | ||
186 | the effective 32bit limit on telldir/seekdir cookies. Since logfs | ||
187 | uses the upper half of the address space for indirect blocks, the size | ||
188 | is limited to 2^31. Secondly the table contains hash buckets with 16 | ||
189 | entries each. | ||
190 | |||
191 | Using single-entry buckets would result in birthday "attacks". At | ||
192 | just 2^16 used entries, hash collisions would be likely (P >= 0.5). | ||
193 | My math skills are insufficient to do the combinatorics for the 17x | ||
194 | collisions necessary to overflow a bucket, but testing showed that in | ||
195 | 10,000 runs the lowest directory fill before a bucket overflow was | ||
196 | 188,057,130 entries with an average of 315,149,915 entries. So for | ||
197 | directory sizes of up to a million, bucket overflows should be | ||
198 | virtually impossible under normal circumstances. | ||
199 | |||
200 | With carefully chosen filenames, it is obviously possible to cause an | ||
201 | overflow with just 21 entries (4 higher tables + 16 entries + 1). So | ||
202 | there may be a security concern if a malicious user has write access | ||
203 | to a directory. | ||
204 | |||
205 | Open For Discussion | ||
206 | =================== | ||
207 | |||
208 | Device Address Space | ||
209 | -------------------- | ||
210 | |||
211 | A device address space is used for caching. Both block devices and | ||
212 | MTD provide functions to either read a single page or write a segment. | ||
213 | Partial segments may be written for data integrity, but where possible | ||
214 | complete segments are written for performance on simple block device | ||
215 | flash media. | ||
216 | |||
217 | Meta Inodes | ||
218 | ----------- | ||
219 | |||
220 | Inodes are stored in the inode file, which is just a regular file for | ||
221 | most purposes. At umount time, however, the inode file needs to | ||
222 | remain open until all dirty inodes are written. So | ||
223 | generic_shutdown_super() may not close this inode, but shouldn't | ||
224 | complain about remaining inodes due to the inode file either. Same | ||
225 | goes for mapping inode of the device address space. | ||
226 | |||
227 | Currently logfs uses a hack that essentially copies part of fs/inode.c | ||
228 | code over. A general solution would be preferred. | ||
229 | |||
230 | Indirect block mapping | ||
231 | ---------------------- | ||
232 | |||
233 | With compression, the block device (or mapping inode) cannot be used | ||
234 | to cache indirect blocks. Some other place is required. Currently | ||
235 | logfs uses the top half of each inode's address space. The low 8TB | ||
236 | (on 32bit) are filled with file data, the high 8TB are used for | ||
237 | indirect blocks. | ||
238 | |||
239 | One problem is that 16TB files created on 64bit systems actually have | ||
240 | data in the top 8TB. But files >16TB would cause problems anyway, so | ||
241 | only the limit has changed. | ||
diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt index 1bd0d0c05171..6a53a84afc72 100644 --- a/Documentation/filesystems/nfs/nfs41-server.txt +++ b/Documentation/filesystems/nfs/nfs41-server.txt | |||
@@ -17,8 +17,7 @@ kernels must turn 4.1 on or off *before* turning support for version 4 | |||
17 | on or off; rpc.nfsd does this correctly.) | 17 | on or off; rpc.nfsd does this correctly.) |
18 | 18 | ||
19 | The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based | 19 | The NFSv4 minorversion 1 (NFSv4.1) implementation in nfsd is based |
20 | on the latest NFSv4.1 Internet Draft: | 20 | on RFC 5661. |
21 | http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-29 | ||
22 | 21 | ||
23 | From the many new features in NFSv4.1 the current implementation | 22 | From the many new features in NFSv4.1 the current implementation |
24 | focuses on the mandatory-to-implement NFSv4.1 Sessions, providing | 23 | focuses on the mandatory-to-implement NFSv4.1 Sessions, providing |
@@ -44,7 +43,7 @@ interoperability problems with future clients. Known issues: | |||
44 | trunking, but this is a mandatory feature, and its use is | 43 | trunking, but this is a mandatory feature, and its use is |
45 | recommended to clients in a number of places. (E.g. to ensure | 44 | recommended to clients in a number of places. (E.g. to ensure |
46 | timely renewal in case an existing connection's retry timeouts | 45 | timely renewal in case an existing connection's retry timeouts |
47 | have gotten too long; see section 8.3 of the draft.) | 46 | have gotten too long; see section 8.3 of the RFC.) |
48 | Therefore, lack of this feature may cause future clients to | 47 | Therefore, lack of this feature may cause future clients to |
49 | fail. | 48 | fail. |
50 | - Incomplete backchannel support: incomplete backchannel gss | 49 | - Incomplete backchannel support: incomplete backchannel gss |
diff --git a/Documentation/filesystems/nilfs2.txt b/Documentation/filesystems/nilfs2.txt index 839efd8a8a8c..cf6d0d85ca82 100644 --- a/Documentation/filesystems/nilfs2.txt +++ b/Documentation/filesystems/nilfs2.txt | |||
@@ -74,6 +74,9 @@ norecovery Disable recovery of the filesystem on mount. | |||
74 | This disables every write access on the device for | 74 | This disables every write access on the device for |
75 | read-only mounts or snapshots. This option will fail | 75 | read-only mounts or snapshots. This option will fail |
76 | for r/w mounts on an unclean volume. | 76 | for r/w mounts on an unclean volume. |
77 | discard Issue discard/TRIM commands to the underlying block | ||
78 | device when blocks are freed. This is useful for SSD | ||
79 | devices and sparse/thinly-provisioned LUNs. | ||
77 | 80 | ||
78 | NILFS2 usage | 81 | NILFS2 usage |
79 | ============ | 82 | ============ |
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 0d07513a67a6..a4f30faa4f1f 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt | |||
@@ -164,6 +164,7 @@ read the file /proc/PID/status: | |||
164 | VmExe: 68 kB | 164 | VmExe: 68 kB |
165 | VmLib: 1412 kB | 165 | VmLib: 1412 kB |
166 | VmPTE: 20 kb | 166 | VmPTE: 20 kb |
167 | VmSwap: 0 kB | ||
167 | Threads: 1 | 168 | Threads: 1 |
168 | SigQ: 0/28578 | 169 | SigQ: 0/28578 |
169 | SigPnd: 0000000000000000 | 170 | SigPnd: 0000000000000000 |
@@ -188,7 +189,13 @@ memory usage. Its seven fields are explained in Table 1-3. The stat file | |||
188 | contains details information about the process itself. Its fields are | 189 | contains details information about the process itself. Its fields are |
189 | explained in Table 1-4. | 190 | explained in Table 1-4. |
190 | 191 | ||
191 | Table 1-2: Contents of the statm files (as of 2.6.30-rc7) | 192 | (for SMP CONFIG users) |
193 | For making accounting scalable, RSS related information are handled in | ||
194 | asynchronous manner and the vaule may not be very precise. To see a precise | ||
195 | snapshot of a moment, you can see /proc/<pid>/smaps file and scan page table. | ||
196 | It's slow but very precise. | ||
197 | |||
198 | Table 1-2: Contents of the status files (as of 2.6.30-rc7) | ||
192 | .............................................................................. | 199 | .............................................................................. |
193 | Field Content | 200 | Field Content |
194 | Name filename of the executable | 201 | Name filename of the executable |
@@ -213,6 +220,7 @@ Table 1-2: Contents of the statm files (as of 2.6.30-rc7) | |||
213 | VmExe size of text segment | 220 | VmExe size of text segment |
214 | VmLib size of shared library code | 221 | VmLib size of shared library code |
215 | VmPTE size of page table entries | 222 | VmPTE size of page table entries |
223 | VmSwap size of swap usage (the number of referred swapents) | ||
216 | Threads number of threads | 224 | Threads number of threads |
217 | SigQ number of signals queued/max. number for queue | 225 | SigQ number of signals queued/max. number for queue |
218 | SigPnd bitmap of pending signals for the thread | 226 | SigPnd bitmap of pending signals for the thread |
@@ -430,6 +438,7 @@ Table 1-5: Kernel info in /proc | |||
430 | modules List of loaded modules | 438 | modules List of loaded modules |
431 | mounts Mounted filesystems | 439 | mounts Mounted filesystems |
432 | net Networking info (see text) | 440 | net Networking info (see text) |
441 | pagetypeinfo Additional page allocator information (see text) (2.5) | ||
433 | partitions Table of partitions known to the system | 442 | partitions Table of partitions known to the system |
434 | pci Deprecated info of PCI bus (new way -> /proc/bus/pci/, | 443 | pci Deprecated info of PCI bus (new way -> /proc/bus/pci/, |
435 | decoupled by lspci (2.4) | 444 | decoupled by lspci (2.4) |
@@ -584,7 +593,7 @@ Node 0, zone DMA 0 4 5 4 4 3 ... | |||
584 | Node 0, zone Normal 1 0 0 1 101 8 ... | 593 | Node 0, zone Normal 1 0 0 1 101 8 ... |
585 | Node 0, zone HighMem 2 0 0 1 1 0 ... | 594 | Node 0, zone HighMem 2 0 0 1 1 0 ... |
586 | 595 | ||
587 | Memory fragmentation is a problem under some workloads, and buddyinfo is a | 596 | External fragmentation is a problem under some workloads, and buddyinfo is a |
588 | useful tool for helping diagnose these problems. Buddyinfo will give you a | 597 | useful tool for helping diagnose these problems. Buddyinfo will give you a |
589 | clue as to how big an area you can safely allocate, or why a previous | 598 | clue as to how big an area you can safely allocate, or why a previous |
590 | allocation failed. | 599 | allocation failed. |
@@ -594,6 +603,48 @@ available. In this case, there are 0 chunks of 2^0*PAGE_SIZE available in | |||
594 | ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE | 603 | ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE |
595 | available in ZONE_NORMAL, etc... | 604 | available in ZONE_NORMAL, etc... |
596 | 605 | ||
606 | More information relevant to external fragmentation can be found in | ||
607 | pagetypeinfo. | ||
608 | |||
609 | > cat /proc/pagetypeinfo | ||
610 | Page block order: 9 | ||
611 | Pages per block: 512 | ||
612 | |||
613 | Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10 | ||
614 | Node 0, zone DMA, type Unmovable 0 0 0 1 1 1 1 1 1 1 0 | ||
615 | Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0 | ||
616 | Node 0, zone DMA, type Movable 1 1 2 1 2 1 1 0 1 0 2 | ||
617 | Node 0, zone DMA, type Reserve 0 0 0 0 0 0 0 0 0 1 0 | ||
618 | Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0 | ||
619 | Node 0, zone DMA32, type Unmovable 103 54 77 1 1 1 11 8 7 1 9 | ||
620 | Node 0, zone DMA32, type Reclaimable 0 0 2 1 0 0 0 0 1 0 0 | ||
621 | Node 0, zone DMA32, type Movable 169 152 113 91 77 54 39 13 6 1 452 | ||
622 | Node 0, zone DMA32, type Reserve 1 2 2 2 2 0 1 1 1 1 0 | ||
623 | Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0 | ||
624 | |||
625 | Number of blocks type Unmovable Reclaimable Movable Reserve Isolate | ||
626 | Node 0, zone DMA 2 0 5 1 0 | ||
627 | Node 0, zone DMA32 41 6 967 2 0 | ||
628 | |||
629 | Fragmentation avoidance in the kernel works by grouping pages of different | ||
630 | migrate types into the same contiguous regions of memory called page blocks. | ||
631 | A page block is typically the size of the default hugepage size e.g. 2MB on | ||
632 | X86-64. By keeping pages grouped based on their ability to move, the kernel | ||
633 | can reclaim pages within a page block to satisfy a high-order allocation. | ||
634 | |||
635 | The pagetypinfo begins with information on the size of a page block. It | ||
636 | then gives the same type of information as buddyinfo except broken down | ||
637 | by migrate-type and finishes with details on how many page blocks of each | ||
638 | type exist. | ||
639 | |||
640 | If min_free_kbytes has been tuned correctly (recommendations made by hugeadm | ||
641 | from libhugetlbfs http://sourceforge.net/projects/libhugetlbfs/), one can | ||
642 | make an estimate of the likely number of huge pages that can be allocated | ||
643 | at a given point in time. All the "Movable" blocks should be allocatable | ||
644 | unless memory has been mlock()'d. Some of the Reclaimable blocks should | ||
645 | also be allocatable although a lot of filesystem metadata may have to be | ||
646 | reclaimed to achieve this. | ||
647 | |||
597 | .............................................................................. | 648 | .............................................................................. |
598 | 649 | ||
599 | meminfo: | 650 | meminfo: |
diff --git a/Documentation/filesystems/sharedsubtree.txt b/Documentation/filesystems/sharedsubtree.txt index 23a181074f94..fc0e39af43c3 100644 --- a/Documentation/filesystems/sharedsubtree.txt +++ b/Documentation/filesystems/sharedsubtree.txt | |||
@@ -837,6 +837,9 @@ replicas continue to be exactly same. | |||
837 | individual lists does not affect propagation or the way propagation | 837 | individual lists does not affect propagation or the way propagation |
838 | tree is modified by operations. | 838 | tree is modified by operations. |
839 | 839 | ||
840 | All vfsmounts in a peer group have the same ->mnt_master. If it is | ||
841 | non-NULL, they form a contiguous (ordered) segment of slave list. | ||
842 | |||
840 | A example propagation tree looks as shown in the figure below. | 843 | A example propagation tree looks as shown in the figure below. |
841 | [ NOTE: Though it looks like a forest, if we consider all the shared | 844 | [ NOTE: Though it looks like a forest, if we consider all the shared |
842 | mounts as a conceptual entity called 'pnode', it becomes a tree] | 845 | mounts as a conceptual entity called 'pnode', it becomes a tree] |
@@ -874,8 +877,19 @@ replicas continue to be exactly same. | |||
874 | 877 | ||
875 | NOTE: The propagation tree is orthogonal to the mount tree. | 878 | NOTE: The propagation tree is orthogonal to the mount tree. |
876 | 879 | ||
880 | 8B Locking: | ||
881 | |||
882 | ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected | ||
883 | by namespace_sem (exclusive for modifications, shared for reading). | ||
884 | |||
885 | Normally we have ->mnt_flags modifications serialized by vfsmount_lock. | ||
886 | There are two exceptions: do_add_mount() and clone_mnt(). | ||
887 | The former modifies a vfsmount that has not been visible in any shared | ||
888 | data structures yet. | ||
889 | The latter holds namespace_sem and the only references to vfsmount | ||
890 | are in lists that can't be traversed without namespace_sem. | ||
877 | 891 | ||
878 | 8B Algorithm: | 892 | 8C Algorithm: |
879 | 893 | ||
880 | The crux of the implementation resides in rbind/move operation. | 894 | The crux of the implementation resides in rbind/move operation. |
881 | 895 | ||
diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index 1866c27eec69..c2c6e9b39bbe 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt | |||
@@ -253,6 +253,70 @@ pin setup (e.g. controlling which pin the GPIO uses, pullup/pulldown). | |||
253 | Also note that it's your responsibility to have stopped using a GPIO | 253 | Also note that it's your responsibility to have stopped using a GPIO |
254 | before you free it. | 254 | before you free it. |
255 | 255 | ||
256 | Considering in most cases GPIOs are actually configured right after they | ||
257 | are claimed, three additional calls are defined: | ||
258 | |||
259 | /* request a single GPIO, with initial configuration specified by | ||
260 | * 'flags', identical to gpio_request() wrt other arguments and | ||
261 | * return value | ||
262 | */ | ||
263 | int gpio_request_one(unsigned gpio, unsigned long flags, const char *label); | ||
264 | |||
265 | /* request multiple GPIOs in a single call | ||
266 | */ | ||
267 | int gpio_request_array(struct gpio *array, size_t num); | ||
268 | |||
269 | /* release multiple GPIOs in a single call | ||
270 | */ | ||
271 | void gpio_free_array(struct gpio *array, size_t num); | ||
272 | |||
273 | where 'flags' is currently defined to specify the following properties: | ||
274 | |||
275 | * GPIOF_DIR_IN - to configure direction as input | ||
276 | * GPIOF_DIR_OUT - to configure direction as output | ||
277 | |||
278 | * GPIOF_INIT_LOW - as output, set initial level to LOW | ||
279 | * GPIOF_INIT_HIGH - as output, set initial level to HIGH | ||
280 | |||
281 | since GPIOF_INIT_* are only valid when configured as output, so group valid | ||
282 | combinations as: | ||
283 | |||
284 | * GPIOF_IN - configure as input | ||
285 | * GPIOF_OUT_INIT_LOW - configured as output, initial level LOW | ||
286 | * GPIOF_OUT_INIT_HIGH - configured as output, initial level HIGH | ||
287 | |||
288 | In the future, these flags can be extended to support more properties such | ||
289 | as open-drain status. | ||
290 | |||
291 | Further more, to ease the claim/release of multiple GPIOs, 'struct gpio' is | ||
292 | introduced to encapsulate all three fields as: | ||
293 | |||
294 | struct gpio { | ||
295 | unsigned gpio; | ||
296 | unsigned long flags; | ||
297 | const char *label; | ||
298 | }; | ||
299 | |||
300 | A typical example of usage: | ||
301 | |||
302 | static struct gpio leds_gpios[] = { | ||
303 | { 32, GPIOF_OUT_INIT_HIGH, "Power LED" }, /* default to ON */ | ||
304 | { 33, GPIOF_OUT_INIT_LOW, "Green LED" }, /* default to OFF */ | ||
305 | { 34, GPIOF_OUT_INIT_LOW, "Red LED" }, /* default to OFF */ | ||
306 | { 35, GPIOF_OUT_INIT_LOW, "Blue LED" }, /* default to OFF */ | ||
307 | { ... }, | ||
308 | }; | ||
309 | |||
310 | err = gpio_request_one(31, GPIOF_IN, "Reset Button"); | ||
311 | if (err) | ||
312 | ... | ||
313 | |||
314 | err = gpio_request_array(leds_gpios, ARRAY_SIZE(leds_gpios)); | ||
315 | if (err) | ||
316 | ... | ||
317 | |||
318 | gpio_free_array(leds_gpios, ARRAY_SIZE(leds_gpios)); | ||
319 | |||
256 | 320 | ||
257 | GPIOs mapped to IRQs | 321 | GPIOs mapped to IRQs |
258 | -------------------- | 322 | -------------------- |
diff --git a/Documentation/hwmon/abituguru b/Documentation/hwmon/abituguru index 87ffa0f5ec70..5eb3b9d5f0d5 100644 --- a/Documentation/hwmon/abituguru +++ b/Documentation/hwmon/abituguru | |||
@@ -30,7 +30,7 @@ Supported chips: | |||
30 | bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1 | 30 | bank1_types=1,1,0,0,0,0,0,2,0,0,0,0,2,0,0,1 |
31 | You may also need to specify the fan_sensors option for these boards | 31 | You may also need to specify the fan_sensors option for these boards |
32 | fan_sensors=5 | 32 | fan_sensors=5 |
33 | 2) There is a seperate abituguru3 driver for these motherboards, | 33 | 2) There is a separate abituguru3 driver for these motherboards, |
34 | the abituguru (without the 3 !) driver will not work on these | 34 | the abituguru (without the 3 !) driver will not work on these |
35 | motherboards (and visa versa)! | 35 | motherboards (and visa versa)! |
36 | 36 | ||
diff --git a/Documentation/hwmon/adt7411 b/Documentation/hwmon/adt7411 new file mode 100644 index 000000000000..1632960f9745 --- /dev/null +++ b/Documentation/hwmon/adt7411 | |||
@@ -0,0 +1,42 @@ | |||
1 | Kernel driver adt7411 | ||
2 | ===================== | ||
3 | |||
4 | Supported chips: | ||
5 | * Analog Devices ADT7411 | ||
6 | Prefix: 'adt7411' | ||
7 | Addresses scanned: 0x48, 0x4a, 0x4b | ||
8 | Datasheet: Publicly available at the Analog Devices website | ||
9 | |||
10 | Author: Wolfram Sang (based on adt7470 by Darrick J. Wong) | ||
11 | |||
12 | Description | ||
13 | ----------- | ||
14 | |||
15 | This driver implements support for the Analog Devices ADT7411 chip. There may | ||
16 | be other chips that implement this interface. | ||
17 | |||
18 | The ADT7411 can use an I2C/SMBus compatible 2-wire interface or an | ||
19 | SPI-compatible 4-wire interface. It provides a 10-bit analog to digital | ||
20 | converter which measures 1 temperature, vdd and 8 input voltages. It has an | ||
21 | internal temperature sensor, but an external one can also be connected (one | ||
22 | loses 2 inputs then). There are high- and low-limit registers for all inputs. | ||
23 | |||
24 | Check the datasheet for details. | ||
25 | |||
26 | sysfs-Interface | ||
27 | --------------- | ||
28 | |||
29 | in0_input - vdd voltage input | ||
30 | in[1-8]_input - analog 1-8 input | ||
31 | temp1_input - temperature input | ||
32 | |||
33 | Besides standard interfaces, this driver adds (0 = off, 1 = on): | ||
34 | |||
35 | adc_ref_vdd - Use vdd as reference instead of 2.25 V | ||
36 | fast_sampling - Sample at 22.5 kHz instead of 1.4 kHz, but drop filters | ||
37 | no_average - Turn off averaging over 16 samples | ||
38 | |||
39 | Notes | ||
40 | ----- | ||
41 | |||
42 | SPI, external temperature sensor and limit registers are not supported yet. | ||
diff --git a/Documentation/hwmon/adt7473 b/Documentation/hwmon/adt7473 deleted file mode 100644 index 446612bd1fb9..000000000000 --- a/Documentation/hwmon/adt7473 +++ /dev/null | |||
@@ -1,74 +0,0 @@ | |||
1 | Kernel driver adt7473 | ||
2 | ====================== | ||
3 | |||
4 | Supported chips: | ||
5 | * Analog Devices ADT7473 | ||
6 | Prefix: 'adt7473' | ||
7 | Addresses scanned: I2C 0x2C, 0x2D, 0x2E | ||
8 | Datasheet: Publicly available at the Analog Devices website | ||
9 | |||
10 | Author: Darrick J. Wong | ||
11 | |||
12 | This driver is depreacted, please use the adt7475 driver instead. | ||
13 | |||
14 | Description | ||
15 | ----------- | ||
16 | |||
17 | This driver implements support for the Analog Devices ADT7473 chip family. | ||
18 | |||
19 | The ADT7473 uses the 2-wire interface compatible with the SMBUS 2.0 | ||
20 | specification. Using an analog to digital converter it measures three (3) | ||
21 | temperatures and two (2) voltages. It has four (4) 16-bit counters for | ||
22 | measuring fan speed. There are three (3) PWM outputs that can be used | ||
23 | to control fan speed. | ||
24 | |||
25 | A sophisticated control system for the PWM outputs is designed into the | ||
26 | ADT7473 that allows fan speed to be adjusted automatically based on any of the | ||
27 | three temperature sensors. Each PWM output is individually adjustable and | ||
28 | programmable. Once configured, the ADT7473 will adjust the PWM outputs in | ||
29 | response to the measured temperatures without further host intervention. | ||
30 | This feature can also be disabled for manual control of the PWM's. | ||
31 | |||
32 | Each of the measured inputs (voltage, temperature, fan speed) has | ||
33 | corresponding high/low limit values. The ADT7473 will signal an ALARM if | ||
34 | any measured value exceeds either limit. | ||
35 | |||
36 | The ADT7473 samples all inputs continuously. The driver will not read | ||
37 | the registers more often than once every other second. Further, | ||
38 | configuration data is only read once per minute. | ||
39 | |||
40 | Special Features | ||
41 | ---------------- | ||
42 | |||
43 | The ADT7473 have a 10-bit ADC and can therefore measure temperatures | ||
44 | with 0.25 degC resolution. Temperature readings can be configured either | ||
45 | for twos complement format or "Offset 64" format, wherein 63 is subtracted | ||
46 | from the raw value to get the temperature value. | ||
47 | |||
48 | The Analog Devices datasheet is very detailed and describes a procedure for | ||
49 | determining an optimal configuration for the automatic PWM control. | ||
50 | |||
51 | Configuration Notes | ||
52 | ------------------- | ||
53 | |||
54 | Besides standard interfaces driver adds the following: | ||
55 | |||
56 | * PWM Control | ||
57 | |||
58 | * pwm#_auto_point1_pwm and temp#_auto_point1_temp and | ||
59 | * pwm#_auto_point2_pwm and temp#_auto_point2_temp - | ||
60 | |||
61 | point1: Set the pwm speed at a lower temperature bound. | ||
62 | point2: Set the pwm speed at a higher temperature bound. | ||
63 | |||
64 | The ADT7473 will scale the pwm between the lower and higher pwm speed when | ||
65 | the temperature is between the two temperature boundaries. PWM values range | ||
66 | from 0 (off) to 255 (full speed). Fan speed will be set to maximum when the | ||
67 | temperature sensor associated with the PWM control exceeds temp#_max. | ||
68 | |||
69 | Notes | ||
70 | ----- | ||
71 | |||
72 | The NVIDIA binary driver presents an ADT7473 chip via an on-card i2c bus. | ||
73 | Unfortunately, they fail to set the i2c adapter class, so this driver may | ||
74 | fail to find the chip until the nvidia driver is patched. | ||
diff --git a/Documentation/hwmon/asc7621 b/Documentation/hwmon/asc7621 new file mode 100644 index 000000000000..7287be7e1f21 --- /dev/null +++ b/Documentation/hwmon/asc7621 | |||
@@ -0,0 +1,296 @@ | |||
1 | Kernel driver asc7621 | ||
2 | ================== | ||
3 | |||
4 | Supported chips: | ||
5 | Andigilog aSC7621 and aSC7621a | ||
6 | Prefix: 'asc7621' | ||
7 | Addresses scanned: I2C 0x2c, 0x2d, 0x2e | ||
8 | Datasheet: http://www.fairview5.com/linux/asc7621/asc7621.pdf | ||
9 | |||
10 | Author: | ||
11 | George Joseph | ||
12 | |||
13 | Description provided by Dave Pivin @ Andigilog: | ||
14 | |||
15 | Andigilog has both the PECI and pre-PECI versions of the Heceta-6, as | ||
16 | Intel calls them. Heceta-6e has high frequency PWM and Heceta-6p has | ||
17 | added PECI and a 4th thermal zone. The Andigilog aSC7611 is the | ||
18 | Heceta-6e part and aSC7621 is the Heceta-6p part. They are both in | ||
19 | volume production, shipping to Intel and their subs. | ||
20 | |||
21 | We have enhanced both parts relative to the governing Intel | ||
22 | specification. First enhancement is temperature reading resolution. We | ||
23 | have used registers below 20h for vendor-specific functions in addition | ||
24 | to those in the Intel-specified vendor range. | ||
25 | |||
26 | Our conversion process produces a result that is reported as two bytes. | ||
27 | The fan speed control uses this finer value to produce a "step-less" fan | ||
28 | PWM output. These two bytes are "read-locked" to guarantee that once a | ||
29 | high or low byte is read, the other byte is locked-in until after the | ||
30 | next read of any register. So to get an atomic reading, read high or low | ||
31 | byte, then the very next read should be the opposite byte. Our data | ||
32 | sheet says 10-bits of resolution, although you may find the lower bits | ||
33 | are active, they are not necessarily reliable or useful externally. We | ||
34 | chose not to mask them. | ||
35 | |||
36 | We employ significant filtering that is user tunable as described in the | ||
37 | data sheet. Our temperature reports and fan PWM outputs are very smooth | ||
38 | when compared to the competition, in addition to the higher resolution | ||
39 | temperature reports. The smoother PWM output does not require user | ||
40 | intervention. | ||
41 | |||
42 | We offer GPIO features on the former VID pins. These are open-drain | ||
43 | outputs or inputs and may be used as general purpose I/O or as alarm | ||
44 | outputs that are based on temperature limits. These are in 19h and 1Ah. | ||
45 | |||
46 | We offer flexible mapping of temperature readings to thermal zones. Any | ||
47 | temperature may be mapped to any zone, which has a default assignment | ||
48 | that follows Intel's specs. | ||
49 | |||
50 | Since there is a fan to zone assignment that allows for the "hotter" of | ||
51 | a set of zones to control the PWM of an individual fan, but there is no | ||
52 | indication to the user, we have added an indicator that shows which zone | ||
53 | is currently controlling the PWM for a given fan. This is in register | ||
54 | 00h. | ||
55 | |||
56 | Both remote diode temperature readings may be given an offset value such | ||
57 | that the reported reading as well as the temperature used to determine | ||
58 | PWM may be offset for system calibration purposes. | ||
59 | |||
60 | PECI Extended configuration allows for having more than two domains per | ||
61 | PECI address and also provides an enabling function for each PECI | ||
62 | address. One could use our flexible zone assignment to have a zone | ||
63 | assigned to up to 4 PECI addresses. This is not possible in the default | ||
64 | Intel configuration. This would be useful in multi-CPU systems with | ||
65 | individual fans on each that would benefit from individual fan control. | ||
66 | This is in register 0Eh. | ||
67 | |||
68 | The tachometer measurement system is flexible and able to adapt to many | ||
69 | fan types. We can also support pulse-stretched PWM so that 3-wire fans | ||
70 | may be used. These characteristics are in registers 04h to 07h. | ||
71 | |||
72 | Finally, we have added a tach disable function that turns off the tach | ||
73 | measurement system for individual tachs in order to save power. That is | ||
74 | in register 75h. | ||
75 | |||
76 | -- | ||
77 | aSC7621 Product Description | ||
78 | |||
79 | The aSC7621 has a two wire digital interface compatible with SMBus 2.0. | ||
80 | Using a 10-bit ADC, the aSC7621 measures the temperature of two remote diode | ||
81 | connected transistors as well as its own die. Support for Platform | ||
82 | Environmental Control Interface (PECI) is included. | ||
83 | |||
84 | Using temperature information from these four zones, an automatic fan speed | ||
85 | control algorithm is employed to minimize acoustic impact while achieving | ||
86 | recommended CPU temperature under varying operational loads. | ||
87 | |||
88 | To set fan speed, the aSC7621 has three independent pulse width modulation | ||
89 | (PWM) outputs that are controlled by one, or a combination of three, | ||
90 | temperature zones. Both high- and low-frequency PWM ranges are supported. | ||
91 | |||
92 | The aSC7621 also includes a digital filter that can be invoked to smooth | ||
93 | temperature readings for better control of fan speed and minimum acoustic | ||
94 | impact. | ||
95 | |||
96 | The aSC7621 has tachometer inputs to measure fan speed on up to four fans. | ||
97 | Limit and status registers for all measured values are included to alert | ||
98 | the system host that any measurements are outside of programmed limits | ||
99 | via status registers. | ||
100 | |||
101 | System voltages of VCCP, 2.5V, 3.3V, 5.0V, and 12V motherboard power are | ||
102 | monitored efficiently with internal scaling resistors. | ||
103 | |||
104 | Features | ||
105 | - Supports PECI interface and monitors internal and remote thermal diodes | ||
106 | - 2-wire, SMBus 2.0 compliant, serial interface | ||
107 | - 10-bit ADC | ||
108 | - Monitors VCCP, 2.5V, 3.3V, 5.0V, and 12V motherboard/processor supplies | ||
109 | - Programmable autonomous fan control based on temperature readings | ||
110 | - Noise filtering of temperature reading for fan speed control | ||
111 | - 0.25C digital temperature sensor resolution | ||
112 | - 3 PWM fan speed control outputs for 2-, 3- or 4-wire fans and up to 4 fan | ||
113 | tachometer inputs | ||
114 | - Enhanced measured temperature to Temperature Zone assignment. | ||
115 | - Provides high and low PWM frequency ranges | ||
116 | - 3 GPIO pins for custom use | ||
117 | - 24-Lead QSOP package | ||
118 | |||
119 | Configuration Notes | ||
120 | =================== | ||
121 | |||
122 | Except where noted below, the sysfs entries created by this driver follow | ||
123 | the standards defined in "sysfs-interface". | ||
124 | |||
125 | temp1_source | ||
126 | 0 (default) peci_legacy = 0, Remote 1 Temperature | ||
127 | peci_legacy = 1, PECI Processor Temperature 0 | ||
128 | 1 Remote 1 Temperature | ||
129 | 2 Remote 2 Temperature | ||
130 | 3 Internal Temperature | ||
131 | 4 PECI Processor Temperature 0 | ||
132 | 5 PECI Processor Temperature 1 | ||
133 | 6 PECI Processor Temperature 2 | ||
134 | 7 PECI Processor Temperature 3 | ||
135 | |||
136 | temp2_source | ||
137 | 0 (default) Internal Temperature | ||
138 | 1 Remote 1 Temperature | ||
139 | 2 Remote 2 Temperature | ||
140 | 3 Internal Temperature | ||
141 | 4 PECI Processor Temperature 0 | ||
142 | 5 PECI Processor Temperature 1 | ||
143 | 6 PECI Processor Temperature 2 | ||
144 | 7 PECI Processor Temperature 3 | ||
145 | |||
146 | temp3_source | ||
147 | 0 (default) Remote 2 Temperature | ||
148 | 1 Remote 1 Temperature | ||
149 | 2 Remote 2 Temperature | ||
150 | 3 Internal Temperature | ||
151 | 4 PECI Processor Temperature 0 | ||
152 | 5 PECI Processor Temperature 1 | ||
153 | 6 PECI Processor Temperature 2 | ||
154 | 7 PECI Processor Temperature 3 | ||
155 | |||
156 | temp4_source | ||
157 | 0 (default) peci_legacy = 0, PECI Processor Temperature 0 | ||
158 | peci_legacy = 1, Remote 1 Temperature | ||
159 | 1 Remote 1 Temperature | ||
160 | 2 Remote 2 Temperature | ||
161 | 3 Internal Temperature | ||
162 | 4 PECI Processor Temperature 0 | ||
163 | 5 PECI Processor Temperature 1 | ||
164 | 6 PECI Processor Temperature 2 | ||
165 | 7 PECI Processor Temperature 3 | ||
166 | |||
167 | temp[1-4]_smoothing_enable | ||
168 | temp[1-4]_smoothing_time | ||
169 | Smooths spikes in temp readings caused by noise. | ||
170 | Valid values in milliseconds are: | ||
171 | 35000 | ||
172 | 17600 | ||
173 | 11800 | ||
174 | 7000 | ||
175 | 4400 | ||
176 | 3000 | ||
177 | 1600 | ||
178 | 800 | ||
179 | |||
180 | temp[1-4]_crit | ||
181 | When the corresponding zone temperature reaches this value, | ||
182 | ALL pwm outputs will got to 100%. | ||
183 | |||
184 | temp[5-8]_input | ||
185 | temp[5-8]_enable | ||
186 | The aSC7621 can also read temperatures provided by the processor | ||
187 | via the PECI bus. Usually these are "core" temps and are relative | ||
188 | to the point where the automatic thermal control circuit starts | ||
189 | throttling. This means that these are usually negative numbers. | ||
190 | |||
191 | pwm[1-3]_enable | ||
192 | 0 Fan off. | ||
193 | 1 Fan on manual control. | ||
194 | 2 Fan on automatic control and will run at the minimum pwm | ||
195 | if the temperature for the zone is below the minimum. | ||
196 | 3 Fan on automatic control but will be off if the temperature | ||
197 | for the zone is below the minimum. | ||
198 | 4-254 Ignored. | ||
199 | 255 Fan on full. | ||
200 | |||
201 | pwm[1-3]_auto_channels | ||
202 | Bitmap as described in sysctl-interface with the following | ||
203 | exceptions... | ||
204 | Only the following combination of zones (and their corresponding masks) | ||
205 | are valid: | ||
206 | 1 | ||
207 | 2 | ||
208 | 3 | ||
209 | 2,3 | ||
210 | 1,2,3 | ||
211 | 4 | ||
212 | 1,2,3,4 | ||
213 | |||
214 | Special values: | ||
215 | 0 Disabled. | ||
216 | 16 Fan on manual control. | ||
217 | 31 Fan on full. | ||
218 | |||
219 | |||
220 | pwm[1-3]_invert | ||
221 | When set, inverts the meaning of pwm[1-3]. | ||
222 | i.e. when pwm = 0, the fan will be on full and | ||
223 | when pwm = 255 the fan will be off. | ||
224 | |||
225 | pwm[1-3]_freq | ||
226 | PWM frequency in Hz | ||
227 | Valid values in Hz are: | ||
228 | |||
229 | 10 | ||
230 | 15 | ||
231 | 23 | ||
232 | 30 (default) | ||
233 | 38 | ||
234 | 47 | ||
235 | 62 | ||
236 | 94 | ||
237 | 23000 | ||
238 | 24000 | ||
239 | 25000 | ||
240 | 26000 | ||
241 | 27000 | ||
242 | 28000 | ||
243 | 29000 | ||
244 | 30000 | ||
245 | |||
246 | Setting any other value will be ignored. | ||
247 | |||
248 | peci_enable | ||
249 | Enables or disables PECI | ||
250 | |||
251 | peci_avg | ||
252 | Input filter average time. | ||
253 | |||
254 | 0 0 Sec. (no Smoothing) (default) | ||
255 | 1 0.25 Sec. | ||
256 | 2 0.5 Sec. | ||
257 | 3 1.0 Sec. | ||
258 | 4 2.0 Sec. | ||
259 | 5 4.0 Sec. | ||
260 | 6 8.0 Sec. | ||
261 | 7 0.0 Sec. | ||
262 | |||
263 | peci_legacy | ||
264 | |||
265 | 0 Standard Mode (default) | ||
266 | Remote Diode 1 reading is associated with | ||
267 | Temperature Zone 1, PECI is associated with | ||
268 | Zone 4 | ||
269 | |||
270 | 1 Legacy Mode | ||
271 | PECI is associated with Temperature Zone 1, | ||
272 | Remote Diode 1 is associated with Zone 4 | ||
273 | |||
274 | peci_diode | ||
275 | Diode filter | ||
276 | |||
277 | 0 0.25 Sec. | ||
278 | 1 1.1 Sec. | ||
279 | 2 2.4 Sec. (default) | ||
280 | 3 3.4 Sec. | ||
281 | 4 5.0 Sec. | ||
282 | 5 6.8 Sec. | ||
283 | 6 10.2 Sec. | ||
284 | 7 16.4 Sec. | ||
285 | |||
286 | peci_4domain | ||
287 | Four domain enable | ||
288 | |||
289 | 0 1 or 2 Domains for enabled processors (default) | ||
290 | 1 3 or 4 Domains for enabled processors | ||
291 | |||
292 | peci_domain | ||
293 | Domain | ||
294 | |||
295 | 0 Processor contains a single domain (0) (default) | ||
296 | 1 Processor contains two domains (0,1) | ||
diff --git a/Documentation/hwmon/it87 b/Documentation/hwmon/it87 index f9ba96c0ac4a..8d08bf0d38ed 100644 --- a/Documentation/hwmon/it87 +++ b/Documentation/hwmon/it87 | |||
@@ -5,31 +5,23 @@ Supported chips: | |||
5 | * IT8705F | 5 | * IT8705F |
6 | Prefix: 'it87' | 6 | Prefix: 'it87' |
7 | Addresses scanned: from Super I/O config space (8 I/O ports) | 7 | Addresses scanned: from Super I/O config space (8 I/O ports) |
8 | Datasheet: Publicly available at the ITE website | 8 | Datasheet: Once publicly available at the ITE website, but no longer |
9 | http://www.ite.com.tw/product_info/file/pc/IT8705F_V.0.4.1.pdf | ||
10 | * IT8712F | 9 | * IT8712F |
11 | Prefix: 'it8712' | 10 | Prefix: 'it8712' |
12 | Addresses scanned: from Super I/O config space (8 I/O ports) | 11 | Addresses scanned: from Super I/O config space (8 I/O ports) |
13 | Datasheet: Publicly available at the ITE website | 12 | Datasheet: Once publicly available at the ITE website, but no longer |
14 | http://www.ite.com.tw/product_info/file/pc/IT8712F_V0.9.1.pdf | ||
15 | http://www.ite.com.tw/product_info/file/pc/Errata%20V0.1%20for%20IT8712F%20V0.9.1.pdf | ||
16 | http://www.ite.com.tw/product_info/file/pc/IT8712F_V0.9.3.pdf | ||
17 | * IT8716F/IT8726F | 13 | * IT8716F/IT8726F |
18 | Prefix: 'it8716' | 14 | Prefix: 'it8716' |
19 | Addresses scanned: from Super I/O config space (8 I/O ports) | 15 | Addresses scanned: from Super I/O config space (8 I/O ports) |
20 | Datasheet: Publicly available at the ITE website | 16 | Datasheet: Once publicly available at the ITE website, but no longer |
21 | http://www.ite.com.tw/product_info/file/pc/IT8716F_V0.3.ZIP | ||
22 | http://www.ite.com.tw/product_info/file/pc/IT8726F_V0.3.pdf | ||
23 | * IT8718F | 17 | * IT8718F |
24 | Prefix: 'it8718' | 18 | Prefix: 'it8718' |
25 | Addresses scanned: from Super I/O config space (8 I/O ports) | 19 | Addresses scanned: from Super I/O config space (8 I/O ports) |
26 | Datasheet: Publicly available at the ITE website | 20 | Datasheet: Once publicly available at the ITE website, but no longer |
27 | http://www.ite.com.tw/product_info/file/pc/IT8718F_V0.2.zip | ||
28 | http://www.ite.com.tw/product_info/file/pc/IT8718F_V0%203_(for%20C%20version).zip | ||
29 | * IT8720F | 21 | * IT8720F |
30 | Prefix: 'it8720' | 22 | Prefix: 'it8720' |
31 | Addresses scanned: from Super I/O config space (8 I/O ports) | 23 | Addresses scanned: from Super I/O config space (8 I/O ports) |
32 | Datasheet: Not yet publicly available. | 24 | Datasheet: Not publicly available |
33 | * SiS950 [clone of IT8705F] | 25 | * SiS950 [clone of IT8705F] |
34 | Prefix: 'it87' | 26 | Prefix: 'it87' |
35 | Addresses scanned: from Super I/O config space (8 I/O ports) | 27 | Addresses scanned: from Super I/O config space (8 I/O ports) |
@@ -136,6 +128,10 @@ registers are read whenever any data is read (unless it is less than 1.5 | |||
136 | seconds since the last update). This means that you can easily miss | 128 | seconds since the last update). This means that you can easily miss |
137 | once-only alarms. | 129 | once-only alarms. |
138 | 130 | ||
131 | Out-of-limit readings can also result in beeping, if the chip is properly | ||
132 | wired and configured. Beeping can be enabled or disabled per sensor type | ||
133 | (temperatures, voltages and fans.) | ||
134 | |||
139 | The IT87xx only updates its values each 1.5 seconds; reading it more often | 135 | The IT87xx only updates its values each 1.5 seconds; reading it more often |
140 | will do no harm, but will return 'old' values. | 136 | will do no harm, but will return 'old' values. |
141 | 137 | ||
@@ -150,11 +146,38 @@ Fan speed control | |||
150 | ----------------- | 146 | ----------------- |
151 | 147 | ||
152 | The fan speed control features are limited to manual PWM mode. Automatic | 148 | The fan speed control features are limited to manual PWM mode. Automatic |
153 | "Smart Guardian" mode control handling is not implemented. However | 149 | "Smart Guardian" mode control handling is only implemented for older chips |
154 | if you want to go for "manual mode" just write 1 to pwmN_enable. | 150 | (see below.) However if you want to go for "manual mode" just write 1 to |
151 | pwmN_enable. | ||
155 | 152 | ||
156 | If you are only able to control the fan speed with very small PWM values, | 153 | If you are only able to control the fan speed with very small PWM values, |
157 | try lowering the PWM base frequency (pwm1_freq). Depending on the fan, | 154 | try lowering the PWM base frequency (pwm1_freq). Depending on the fan, |
158 | it may give you a somewhat greater control range. The same frequency is | 155 | it may give you a somewhat greater control range. The same frequency is |
159 | used to drive all fan outputs, which is why pwm2_freq and pwm3_freq are | 156 | used to drive all fan outputs, which is why pwm2_freq and pwm3_freq are |
160 | read-only. | 157 | read-only. |
158 | |||
159 | |||
160 | Automatic fan speed control (old interface) | ||
161 | ------------------------------------------- | ||
162 | |||
163 | The driver supports the old interface to automatic fan speed control | ||
164 | which is implemented by IT8705F chips up to revision F and IT8712F | ||
165 | chips up to revision G. | ||
166 | |||
167 | This interface implements 4 temperature vs. PWM output trip points. | ||
168 | The PWM output of trip point 4 is always the maximum value (fan running | ||
169 | at full speed) while the PWM output of the other 3 trip points can be | ||
170 | freely chosen. The temperature of all 4 trip points can be freely chosen. | ||
171 | Additionally, trip point 1 has an hysteresis temperature attached, to | ||
172 | prevent fast switching between fan on and off. | ||
173 | |||
174 | The chip automatically computes the PWM output value based on the input | ||
175 | temperature, based on this simple rule: if the temperature value is | ||
176 | between trip point N and trip point N+1 then the PWM output value is | ||
177 | the one of trip point N. The automatic control mode is less flexible | ||
178 | than the manual control mode, but it reacts faster, is more robust and | ||
179 | doesn't use CPU cycles. | ||
180 | |||
181 | Trip points must be set properly before switching to automatic fan speed | ||
182 | control mode. The driver will perform basic integrity checks before | ||
183 | actually switching to automatic control mode. | ||
diff --git a/Documentation/hwmon/lm90 b/Documentation/hwmon/lm90 index 93d8e3d55150..6a03dd4bcc94 100644 --- a/Documentation/hwmon/lm90 +++ b/Documentation/hwmon/lm90 | |||
@@ -84,6 +84,10 @@ Supported chips: | |||
84 | Addresses scanned: I2C 0x4c | 84 | Addresses scanned: I2C 0x4c |
85 | Datasheet: Publicly available at the Maxim website | 85 | Datasheet: Publicly available at the Maxim website |
86 | http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3500 | 86 | http://www.maxim-ic.com/quick_view2.cfm/qv_pk/3500 |
87 | * Winbond/Nuvoton W83L771AWG/ASG | ||
88 | Prefix: 'w83l771' | ||
89 | Addresses scanned: I2C 0x4c | ||
90 | Datasheet: Not publicly available, can be requested from Nuvoton | ||
87 | 91 | ||
88 | 92 | ||
89 | Author: Jean Delvare <khali@linux-fr.org> | 93 | Author: Jean Delvare <khali@linux-fr.org> |
@@ -147,6 +151,12 @@ MAX6680 and MAX6681: | |||
147 | * Selectable address | 151 | * Selectable address |
148 | * Remote sensor type selection | 152 | * Remote sensor type selection |
149 | 153 | ||
154 | W83L771AWG/ASG | ||
155 | * The AWG and ASG variants only differ in package format. | ||
156 | * Filter and alert configuration register at 0xBF | ||
157 | * Diode ideality factor configuration (remote sensor) at 0xE3 | ||
158 | * Moving average (depending on conversion rate) | ||
159 | |||
150 | All temperature values are given in degrees Celsius. Resolution | 160 | All temperature values are given in degrees Celsius. Resolution |
151 | is 1.0 degree for the local temperature, 0.125 degree for the remote | 161 | is 1.0 degree for the local temperature, 0.125 degree for the remote |
152 | temperature, except for the MAX6657, MAX6658 and MAX6659 which have a | 162 | temperature, except for the MAX6657, MAX6658 and MAX6659 which have a |
@@ -163,6 +173,18 @@ The lm90 driver will not update its values more frequently than every | |||
163 | other second; reading them more often will do no harm, but will return | 173 | other second; reading them more often will do no harm, but will return |
164 | 'old' values. | 174 | 'old' values. |
165 | 175 | ||
176 | SMBus Alert Support | ||
177 | ------------------- | ||
178 | |||
179 | This driver has basic support for SMBus alert. When an alert is received, | ||
180 | the status register is read and the faulty temperature channel is logged. | ||
181 | |||
182 | The Analog Devices chips (ADM1032 and ADT7461) do not implement the SMBus | ||
183 | alert protocol properly so additional care is needed: the ALERT output is | ||
184 | disabled when an alert is received, and is re-enabled only when the alarm | ||
185 | is gone. Otherwise the chip would block alerts from other chips in the bus | ||
186 | as long as the alarm is active. | ||
187 | |||
166 | PEC Support | 188 | PEC Support |
167 | ----------- | 189 | ----------- |
168 | 190 | ||
diff --git a/Documentation/i2c/busses/i2c-i801 b/Documentation/i2c/busses/i2c-i801 index 81c0c59a60ea..e1bb5b261693 100644 --- a/Documentation/i2c/busses/i2c-i801 +++ b/Documentation/i2c/busses/i2c-i801 | |||
@@ -15,7 +15,8 @@ Supported adapters: | |||
15 | * Intel 82801I (ICH9) | 15 | * Intel 82801I (ICH9) |
16 | * Intel EP80579 (Tolapai) | 16 | * Intel EP80579 (Tolapai) |
17 | * Intel 82801JI (ICH10) | 17 | * Intel 82801JI (ICH10) |
18 | * Intel PCH | 18 | * Intel 3400/5 Series (PCH) |
19 | * Intel Cougar Point (PCH) | ||
19 | Datasheets: Publicly available at the Intel website | 20 | Datasheets: Publicly available at the Intel website |
20 | 21 | ||
21 | Authors: | 22 | Authors: |
diff --git a/Documentation/i2c/busses/i2c-parport b/Documentation/i2c/busses/i2c-parport index dceaba1ad930..2461c7b53b2c 100644 --- a/Documentation/i2c/busses/i2c-parport +++ b/Documentation/i2c/busses/i2c-parport | |||
@@ -29,6 +29,9 @@ can be easily added when needed. | |||
29 | Earlier kernels defaulted to type=0 (Philips). But now, if the type | 29 | Earlier kernels defaulted to type=0 (Philips). But now, if the type |
30 | parameter is missing, the driver will simply fail to initialize. | 30 | parameter is missing, the driver will simply fail to initialize. |
31 | 31 | ||
32 | SMBus alert support is available on adapters which have this line properly | ||
33 | connected to the parallel port's interrupt pin. | ||
34 | |||
32 | 35 | ||
33 | Building your own adapter | 36 | Building your own adapter |
34 | ------------------------- | 37 | ------------------------- |
diff --git a/Documentation/i2c/busses/i2c-parport-light b/Documentation/i2c/busses/i2c-parport-light index 287436478520..bdc9cbb2e0f2 100644 --- a/Documentation/i2c/busses/i2c-parport-light +++ b/Documentation/i2c/busses/i2c-parport-light | |||
@@ -9,3 +9,14 @@ parport handling is not an option. The drawback is a reduced portability | |||
9 | and the impossibility to daisy-chain other parallel port devices. | 9 | and the impossibility to daisy-chain other parallel port devices. |
10 | 10 | ||
11 | Please see i2c-parport for documentation. | 11 | Please see i2c-parport for documentation. |
12 | |||
13 | Module parameters: | ||
14 | |||
15 | * type: type of adapter (see i2c-parport or modinfo) | ||
16 | |||
17 | * base: base I/O address | ||
18 | Default is 0x378 which is fairly common for parallel ports, at least on PC. | ||
19 | |||
20 | * irq: optional IRQ | ||
21 | This must be passed if you want SMBus alert support, assuming your adapter | ||
22 | actually supports this. | ||
diff --git a/Documentation/i2c/smbus-protocol b/Documentation/i2c/smbus-protocol index 9df47441f0e7..7c19d1a2bea0 100644 --- a/Documentation/i2c/smbus-protocol +++ b/Documentation/i2c/smbus-protocol | |||
@@ -185,6 +185,22 @@ the protocol. All ARP communications use slave address 0x61 and | |||
185 | require PEC checksums. | 185 | require PEC checksums. |
186 | 186 | ||
187 | 187 | ||
188 | SMBus Alert | ||
189 | =========== | ||
190 | |||
191 | SMBus Alert was introduced in Revision 1.0 of the specification. | ||
192 | |||
193 | The SMBus alert protocol allows several SMBus slave devices to share a | ||
194 | single interrupt pin on the SMBus master, while still allowing the master | ||
195 | to know which slave triggered the interrupt. | ||
196 | |||
197 | This is implemented the following way in the Linux kernel: | ||
198 | * I2C bus drivers which support SMBus alert should call | ||
199 | i2c_setup_smbus_alert() to setup SMBus alert support. | ||
200 | * I2C drivers for devices which can trigger SMBus alerts should implement | ||
201 | the optional alert() callback. | ||
202 | |||
203 | |||
188 | I2C Block Transactions | 204 | I2C Block Transactions |
189 | ====================== | 205 | ====================== |
190 | 206 | ||
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index 0a74603eb671..3219ee0dbfef 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients | |||
@@ -318,8 +318,9 @@ Plain I2C communication | |||
318 | These routines read and write some bytes from/to a client. The client | 318 | These routines read and write some bytes from/to a client. The client |
319 | contains the i2c address, so you do not have to include it. The second | 319 | contains the i2c address, so you do not have to include it. The second |
320 | parameter contains the bytes to read/write, the third the number of bytes | 320 | parameter contains the bytes to read/write, the third the number of bytes |
321 | to read/write (must be less than the length of the buffer.) Returned is | 321 | to read/write (must be less than the length of the buffer, also should be |
322 | the actual number of bytes read/written. | 322 | less than 64k since msg.len is u16.) Returned is the actual number of bytes |
323 | read/written. | ||
323 | 324 | ||
324 | int i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msg, | 325 | int i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msg, |
325 | int num); | 326 | int num); |
diff --git a/Documentation/init.txt b/Documentation/init.txt new file mode 100644 index 000000000000..535ad5e82b98 --- /dev/null +++ b/Documentation/init.txt | |||
@@ -0,0 +1,49 @@ | |||
1 | Explaining the dreaded "No init found." boot hang message | ||
2 | ========================================================= | ||
3 | |||
4 | OK, so you've got this pretty unintuitive message (currently located | ||
5 | in init/main.c) and are wondering what the H*** went wrong. | ||
6 | Some high-level reasons for failure (listed roughly in order of execution) | ||
7 | to load the init binary are: | ||
8 | A) Unable to mount root FS | ||
9 | B) init binary doesn't exist on rootfs | ||
10 | C) broken console device | ||
11 | D) binary exists but dependencies not available | ||
12 | E) binary cannot be loaded | ||
13 | |||
14 | Detailed explanations: | ||
15 | 0) Set "debug" kernel parameter (in bootloader config file or CONFIG_CMDLINE) | ||
16 | to get more detailed kernel messages. | ||
17 | A) make sure you have the correct root FS type | ||
18 | (and root= kernel parameter points to the correct partition), | ||
19 | required drivers such as storage hardware (such as SCSI or USB!) | ||
20 | and filesystem (ext3, jffs2 etc.) are builtin (alternatively as modules, | ||
21 | to be pre-loaded by an initrd) | ||
22 | C) Possibly a conflict in console= setup --> initial console unavailable. | ||
23 | E.g. some serial consoles are unreliable due to serial IRQ issues (e.g. | ||
24 | missing interrupt-based configuration). | ||
25 | Try using a different console= device or e.g. netconsole= . | ||
26 | D) e.g. required library dependencies of the init binary such as | ||
27 | /lib/ld-linux.so.2 missing or broken. Use readelf -d <INIT>|grep NEEDED | ||
28 | to find out which libraries are required. | ||
29 | E) make sure the binary's architecture matches your hardware. | ||
30 | E.g. i386 vs. x86_64 mismatch, or trying to load x86 on ARM hardware. | ||
31 | In case you tried loading a non-binary file here (shell script?), | ||
32 | you should make sure that the script specifies an interpreter in its shebang | ||
33 | header line (#!/...) that is fully working (including its library | ||
34 | dependencies). And before tackling scripts, better first test a simple | ||
35 | non-script binary such as /bin/sh and confirm its successful execution. | ||
36 | To find out more, add code to init/main.c to display kernel_execve()s | ||
37 | return values. | ||
38 | |||
39 | Please extend this explanation whenever you find new failure causes | ||
40 | (after all loading the init binary is a CRITICAL and hard transition step | ||
41 | which needs to be made as painless as possible), then submit patch to LKML. | ||
42 | Further TODOs: | ||
43 | - Implement the various run_init_process() invocations via a struct array | ||
44 | which can then store the kernel_execve() result value and on failure | ||
45 | log it all by iterating over _all_ results (very important usability fix). | ||
46 | - try to make the implementation itself more helpful in general, | ||
47 | e.g. by providing additional error messages at affected places. | ||
48 | |||
49 | Andreas Mohr <andi at lisas period de> | ||
diff --git a/Documentation/input/rotary-encoder.txt b/Documentation/input/rotary-encoder.txt index 3a6aec40c0b0..8b4129de1d2d 100644 --- a/Documentation/input/rotary-encoder.txt +++ b/Documentation/input/rotary-encoder.txt | |||
@@ -75,7 +75,7 @@ and the number of steps or will clamp at the maximum and zero depending on | |||
75 | the configuration. | 75 | the configuration. |
76 | 76 | ||
77 | Because GPIO to IRQ mapping is platform specific, this information must | 77 | Because GPIO to IRQ mapping is platform specific, this information must |
78 | be given in seperately to the driver. See the example below. | 78 | be given in separately to the driver. See the example below. |
79 | 79 | ||
80 | ---------<snip>--------- | 80 | ---------<snip>--------- |
81 | 81 | ||
diff --git a/Documentation/input/sentelic.txt b/Documentation/input/sentelic.txt index f7160a2fb6a2..b35affd5c649 100644 --- a/Documentation/input/sentelic.txt +++ b/Documentation/input/sentelic.txt | |||
@@ -1,5 +1,5 @@ | |||
1 | Copyright (C) 2002-2008 Sentelic Corporation. | 1 | Copyright (C) 2002-2010 Sentelic Corporation. |
2 | Last update: Oct-31-2008 | 2 | Last update: Jan-13-2010 |
3 | 3 | ||
4 | ============================================================================== | 4 | ============================================================================== |
5 | * Finger Sensing Pad Intellimouse Mode(scrolling wheel, 4th and 5th buttons) | 5 | * Finger Sensing Pad Intellimouse Mode(scrolling wheel, 4th and 5th buttons) |
@@ -44,7 +44,7 @@ B) MSID 6: Horizontal and Vertical scrolling. | |||
44 | Packet 1 | 44 | Packet 1 |
45 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | 45 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 |
46 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | 46 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| |
47 | 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|l|r|u|d| | 47 | 1 |Y|X|y|x|1|M|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 | | |B|F|r|l|u|d| |
48 | |---------------| |---------------| |---------------| |---------------| | 48 | |---------------| |---------------| |---------------| |---------------| |
49 | 49 | ||
50 | Byte 1: Bit7 => Y overflow | 50 | Byte 1: Bit7 => Y overflow |
@@ -59,15 +59,15 @@ Byte 2: X Movement(9-bit 2's complement integers) | |||
59 | Byte 3: Y Movement(9-bit 2's complement integers) | 59 | Byte 3: Y Movement(9-bit 2's complement integers) |
60 | Byte 4: Bit0 => the Vertical scrolling movement downward. | 60 | Byte 4: Bit0 => the Vertical scrolling movement downward. |
61 | Bit1 => the Vertical scrolling movement upward. | 61 | Bit1 => the Vertical scrolling movement upward. |
62 | Bit2 => the Vertical scrolling movement rightward. | 62 | Bit2 => the Horizontal scrolling movement leftward. |
63 | Bit3 => the Vertical scrolling movement leftward. | 63 | Bit3 => the Horizontal scrolling movement rightward. |
64 | Bit4 => 1 = 4th mouse button is pressed, Forward one page. | 64 | Bit4 => 1 = 4th mouse button is pressed, Forward one page. |
65 | 0 = 4th mouse button is not pressed. | 65 | 0 = 4th mouse button is not pressed. |
66 | Bit5 => 1 = 5th mouse button is pressed, Backward one page. | 66 | Bit5 => 1 = 5th mouse button is pressed, Backward one page. |
67 | 0 = 5th mouse button is not pressed. | 67 | 0 = 5th mouse button is not pressed. |
68 | 68 | ||
69 | C) MSID 7: | 69 | C) MSID 7: |
70 | # FSP uses 2 packets(8 Bytes) data to represent Absolute Position | 70 | # FSP uses 2 packets (8 Bytes) to represent Absolute Position. |
71 | so we have PACKET NUMBER to identify packets. | 71 | so we have PACKET NUMBER to identify packets. |
72 | If PACKET NUMBER is 0, the packet is Packet 1. | 72 | If PACKET NUMBER is 0, the packet is Packet 1. |
73 | If PACKET NUMBER is 1, the packet is Packet 2. | 73 | If PACKET NUMBER is 1, the packet is Packet 2. |
@@ -129,7 +129,7 @@ Byte 3: Message Type => 0x00 (Disabled) | |||
129 | Byte 4: Bit7~Bit0 => Don't Care | 129 | Byte 4: Bit7~Bit0 => Don't Care |
130 | 130 | ||
131 | ============================================================================== | 131 | ============================================================================== |
132 | * Absolute position for STL3888-A0. | 132 | * Absolute position for STL3888-Ax. |
133 | ============================================================================== | 133 | ============================================================================== |
134 | Packet 1 (ABSOLUTE POSITION) | 134 | Packet 1 (ABSOLUTE POSITION) |
135 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | 135 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 |
@@ -179,14 +179,14 @@ Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0]) | |||
179 | Bit5~Bit4 => y2_g | 179 | Bit5~Bit4 => y2_g |
180 | Bit7~Bit6 => x2_g | 180 | Bit7~Bit6 => x2_g |
181 | 181 | ||
182 | Notify Packet for STL3888-A0 | 182 | Notify Packet for STL3888-Ax |
183 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | 183 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 |
184 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | 184 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| |
185 | 1 |1|0|1|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|d|u|0|0|0|0| | 185 | 1 |1|0|1|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|d|u|0|0|0|0| |
186 | |---------------| |---------------| |---------------| |---------------| | 186 | |---------------| |---------------| |---------------| |---------------| |
187 | 187 | ||
188 | Byte 1: Bit7~Bit6 => 00, Normal data packet | 188 | Byte 1: Bit7~Bit6 => 00, Normal data packet |
189 | => 01, Absolute coordination packet | 189 | => 01, Absolute coordinates packet |
190 | => 10, Notify packet | 190 | => 10, Notify packet |
191 | Bit5 => 1 | 191 | Bit5 => 1 |
192 | Bit4 => when in absolute coordinates mode (valid when EN_PKT_GO is 1): | 192 | Bit4 => when in absolute coordinates mode (valid when EN_PKT_GO is 1): |
@@ -205,15 +205,106 @@ Byte 4: Bit7 => scroll right button | |||
205 | Bit6 => scroll left button | 205 | Bit6 => scroll left button |
206 | Bit5 => scroll down button | 206 | Bit5 => scroll down button |
207 | Bit4 => scroll up button | 207 | Bit4 => scroll up button |
208 | * Note that if gesture and additional button (Bit4~Bit7) | 208 | * Note that if gesture and additional buttoni (Bit4~Bit7) |
209 | happen at the same time, the button information will not | 209 | happen at the same time, the button information will not |
210 | be sent. | 210 | be sent. |
211 | Bit3~Bit0 => Reserved | ||
212 | |||
213 | Sample sequence of Multi-finger, Multi-coordinate mode: | ||
214 | |||
215 | notify packet (valid bit == 1), abs pkt 1, abs pkt 2, abs pkt 1, | ||
216 | abs pkt 2, ..., notify packet (valid bit == 0) | ||
217 | |||
218 | ============================================================================== | ||
219 | * Absolute position for STL3888-B0. | ||
220 | ============================================================================== | ||
221 | Packet 1(ABSOLUTE POSITION) | ||
222 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
223 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
224 | 1 |0|1|V|F|1|0|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |r|l|u|d|X|X|Y|Y| | ||
225 | |---------------| |---------------| |---------------| |---------------| | ||
226 | |||
227 | Byte 1: Bit7~Bit6 => 00, Normal data packet | ||
228 | => 01, Absolute coordinates packet | ||
229 | => 10, Notify packet | ||
230 | Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up. | ||
231 | When both fingers are up, the last two reports have zero valid | ||
232 | bit. | ||
233 | Bit4 => finger up/down information. 1: finger down, 0: finger up. | ||
234 | Bit3 => 1 | ||
235 | Bit2 => finger index, 0 is the first finger, 1 is the second finger. | ||
236 | Bit1 => Right Button, 1 is pressed, 0 is not pressed. | ||
237 | Bit0 => Left Button, 1 is pressed, 0 is not pressed. | ||
238 | Byte 2: X coordinate (xpos[9:2]) | ||
239 | Byte 3: Y coordinate (ypos[9:2]) | ||
240 | Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0]) | ||
241 | Bit3~Bit2 => X coordinate (ypos[1:0]) | ||
242 | Bit4 => scroll down button | ||
243 | Bit5 => scroll up button | ||
244 | Bit6 => scroll left button | ||
245 | Bit7 => scroll right button | ||
246 | |||
247 | Packet 2 (ABSOLUTE POSITION) | ||
248 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
249 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
250 | 1 |0|1|V|F|1|1|R|L| 2 |X|X|X|X|X|X|X|X| 3 |Y|Y|Y|Y|Y|Y|Y|Y| 4 |r|l|u|d|X|X|Y|Y| | ||
251 | |---------------| |---------------| |---------------| |---------------| | ||
252 | |||
253 | Byte 1: Bit7~Bit6 => 00, Normal data packet | ||
254 | => 01, Absolute coordination packet | ||
255 | => 10, Notify packet | ||
256 | Bit5 => Valid bit, 0 means that the coordinate is invalid or finger up. | ||
257 | When both fingers are up, the last two reports have zero valid | ||
258 | bit. | ||
259 | Bit4 => finger up/down information. 1: finger down, 0: finger up. | ||
260 | Bit3 => 1 | ||
261 | Bit2 => finger index, 0 is the first finger, 1 is the second finger. | ||
262 | Bit1 => Right Button, 1 is pressed, 0 is not pressed. | ||
263 | Bit0 => Left Button, 1 is pressed, 0 is not pressed. | ||
264 | Byte 2: X coordinate (xpos[9:2]) | ||
265 | Byte 3: Y coordinate (ypos[9:2]) | ||
266 | Byte 4: Bit1~Bit0 => Y coordinate (xpos[1:0]) | ||
267 | Bit3~Bit2 => X coordinate (ypos[1:0]) | ||
268 | Bit4 => scroll down button | ||
269 | Bit5 => scroll up button | ||
270 | Bit6 => scroll left button | ||
271 | Bit7 => scroll right button | ||
272 | |||
273 | Notify Packet for STL3888-B0 | ||
274 | Bit 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 | ||
275 | BYTE |---------------|BYTE |---------------|BYTE|---------------|BYTE|---------------| | ||
276 | 1 |1|0|1|P|1|M|R|L| 2 |C|C|C|C|C|C|C|C| 3 |0|0|F|F|0|0|0|i| 4 |r|l|u|d|0|0|0|0| | ||
277 | |---------------| |---------------| |---------------| |---------------| | ||
278 | |||
279 | Byte 1: Bit7~Bit6 => 00, Normal data packet | ||
280 | => 01, Absolute coordination packet | ||
281 | => 10, Notify packet | ||
282 | Bit5 => 1 | ||
283 | Bit4 => when in absolute coordinate mode (valid when EN_PKT_GO is 1): | ||
284 | 0: left button is generated by the on-pad command | ||
285 | 1: left button is generated by the external button | ||
286 | Bit3 => 1 | ||
287 | Bit2 => Middle Button, 1 is pressed, 0 is not pressed. | ||
288 | Bit1 => Right Button, 1 is pressed, 0 is not pressed. | ||
289 | Bit0 => Left Button, 1 is pressed, 0 is not pressed. | ||
290 | Byte 2: Message Type => 0xB7 (Multi Finger, Multi Coordinate mode) | ||
291 | Byte 3: Bit7~Bit6 => Don't care | ||
292 | Bit5~Bit4 => Number of fingers | ||
293 | Bit3~Bit1 => Reserved | ||
294 | Bit0 => 1: enter gesture mode; 0: leaving gesture mode | ||
295 | Byte 4: Bit7 => scroll right button | ||
296 | Bit6 => scroll left button | ||
297 | Bit5 => scroll up button | ||
298 | Bit4 => scroll down button | ||
299 | * Note that if gesture and additional button(Bit4~Bit7) | ||
300 | happen at the same time, the button information will not | ||
301 | be sent. | ||
211 | Bit3~Bit0 => Reserved | 302 | Bit3~Bit0 => Reserved |
212 | 303 | ||
213 | Sample sequence of Multi-finger, Multi-coordinate mode: | 304 | Sample sequence of Multi-finger, Multi-coordinate mode: |
214 | 305 | ||
215 | notify packet (valid bit == 1), abs pkt 1, abs pkt 2, abs pkt 1, | 306 | notify packet (valid bit == 1), abs pkt 1, abs pkt 2, abs pkt 1, |
216 | abs pkt 2, ..., notify packet(valid bit == 0) | 307 | abs pkt 2, ..., notify packet (valid bit == 0) |
217 | 308 | ||
218 | ============================================================================== | 309 | ============================================================================== |
219 | * FSP Enable/Disable packet | 310 | * FSP Enable/Disable packet |
@@ -409,7 +500,8 @@ offset width default r/w name | |||
409 | 0: read only, 1: read/write enable | 500 | 0: read only, 1: read/write enable |
410 | (Note that following registers does not require clock gating being | 501 | (Note that following registers does not require clock gating being |
411 | enabled prior to write: 05 06 07 08 09 0c 0f 10 11 12 16 17 18 23 2e | 502 | enabled prior to write: 05 06 07 08 09 0c 0f 10 11 12 16 17 18 23 2e |
412 | 40 41 42 43.) | 503 | 40 41 42 43. In addition to that, this bit must be 1 when gesture |
504 | mode is enabled) | ||
413 | 505 | ||
414 | 0x31 RW on-pad command detection | 506 | 0x31 RW on-pad command detection |
415 | bit7 0 RW on-pad command left button down tag | 507 | bit7 0 RW on-pad command left button down tag |
@@ -463,6 +555,10 @@ offset width default r/w name | |||
463 | absolute coordinates; otherwise, host only receives packets with | 555 | absolute coordinates; otherwise, host only receives packets with |
464 | relative coordinate.) | 556 | relative coordinate.) |
465 | 557 | ||
558 | bit7 0 RW EN_PS2_F2: PS/2 gesture mode 2nd | ||
559 | finger packet enable | ||
560 | 0: disable, 1: enable | ||
561 | |||
466 | 0x43 RW on-pad control | 562 | 0x43 RW on-pad control |
467 | bit0 0 RW on-pad control enable | 563 | bit0 0 RW on-pad control enable |
468 | 0: disable, 1: enable | 564 | 0: disable, 1: enable |
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt index 35cf64d4436d..35c9b51d20ea 100644 --- a/Documentation/ioctl/ioctl-number.txt +++ b/Documentation/ioctl/ioctl-number.txt | |||
@@ -139,7 +139,6 @@ Code Seq#(hex) Include File Comments | |||
139 | 'K' all linux/kd.h | 139 | 'K' all linux/kd.h |
140 | 'L' 00-1F linux/loop.h conflict! | 140 | 'L' 00-1F linux/loop.h conflict! |
141 | 'L' 10-1F drivers/scsi/mpt2sas/mpt2sas_ctl.h conflict! | 141 | 'L' 10-1F drivers/scsi/mpt2sas/mpt2sas_ctl.h conflict! |
142 | 'L' 20-2F linux/usb/vstusb.h | ||
143 | 'L' E0-FF linux/ppdd.h encrypted disk device driver | 142 | 'L' E0-FF linux/ppdd.h encrypted disk device driver |
144 | <http://linux01.gwdg.de/~alatham/ppdd.html> | 143 | <http://linux01.gwdg.de/~alatham/ppdd.html> |
145 | 'M' all linux/soundcard.h conflict! | 144 | 'M' all linux/soundcard.h conflict! |
diff --git a/Documentation/isdn/INTERFACE.CAPI b/Documentation/isdn/INTERFACE.CAPI index 5fe8de5cc727..f172091fb7cd 100644 --- a/Documentation/isdn/INTERFACE.CAPI +++ b/Documentation/isdn/INTERFACE.CAPI | |||
@@ -149,10 +149,11 @@ char *(*procinfo)(struct capi_ctr *ctrlr) | |||
149 | pointer to a callback function returning the entry for the device in | 149 | pointer to a callback function returning the entry for the device in |
150 | the CAPI controller info table, /proc/capi/controller | 150 | the CAPI controller info table, /proc/capi/controller |
151 | 151 | ||
152 | read_proc_t *ctr_read_proc | 152 | const struct file_operations *proc_fops |
153 | pointer to the read_proc callback function for the device's proc file | 153 | pointers to callback functions for the device's proc file |
154 | system entry, /proc/capi/controllers/<n>; will be called with a | 154 | system entry, /proc/capi/controllers/<n>; pointer to the device's |
155 | pointer to the device's capi_ctr structure as the last (data) argument | 155 | capi_ctr structure is available from struct proc_dir_entry::data |
156 | which is available from struct inode. | ||
156 | 157 | ||
157 | Note: Callback functions except send_message() are never called in interrupt | 158 | Note: Callback functions except send_message() are never called in interrupt |
158 | context. | 159 | context. |
diff --git a/Documentation/isdn/README.gigaset b/Documentation/isdn/README.gigaset index 794941fc9493..e472df842323 100644 --- a/Documentation/isdn/README.gigaset +++ b/Documentation/isdn/README.gigaset | |||
@@ -292,10 +292,10 @@ GigaSet 307x Device Driver | |||
292 | to /etc/modprobe.d/gigaset, /etc/modprobe.conf.local or a similar file. | 292 | to /etc/modprobe.d/gigaset, /etc/modprobe.conf.local or a similar file. |
293 | 293 | ||
294 | Problem: | 294 | Problem: |
295 | Your isdn script aborts with a message about isdnlog. | 295 | The isdnlog program emits error messages or just doesn't work. |
296 | Solution: | 296 | Solution: |
297 | Try deactivating (or commenting out) isdnlog. This driver does not | 297 | Isdnlog supports only the HiSax driver. Do not attempt to use it with |
298 | support it. | 298 | other drivers such as Gigaset. |
299 | 299 | ||
300 | Problem: | 300 | Problem: |
301 | You have two or more DECT data adapters (M101/M105) and only the | 301 | You have two or more DECT data adapters (M101/M105) and only the |
@@ -321,8 +321,8 @@ GigaSet 307x Device Driver | |||
321 | writing an appropriate value to /sys/module/gigaset/parameters/debug, e.g. | 321 | writing an appropriate value to /sys/module/gigaset/parameters/debug, e.g. |
322 | echo 0 > /sys/module/gigaset/parameters/debug | 322 | echo 0 > /sys/module/gigaset/parameters/debug |
323 | switches off debugging output completely, | 323 | switches off debugging output completely, |
324 | echo 0x10a020 > /sys/module/gigaset/parameters/debug | 324 | echo 0x302020 > /sys/module/gigaset/parameters/debug |
325 | enables the standard set of debugging output messages. These values are | 325 | enables a reasonable set of debugging output messages. These values are |
326 | bit patterns where every bit controls a certain type of debugging output. | 326 | bit patterns where every bit controls a certain type of debugging output. |
327 | See the constants DEBUG_* in the source file gigaset.h for details. | 327 | See the constants DEBUG_* in the source file gigaset.h for details. |
328 | 328 | ||
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 826b6e148316..e4cbca58536c 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -54,6 +54,7 @@ parameter is applicable: | |||
54 | IMA Integrity measurement architecture is enabled. | 54 | IMA Integrity measurement architecture is enabled. |
55 | IOSCHED More than one I/O scheduler is enabled. | 55 | IOSCHED More than one I/O scheduler is enabled. |
56 | IP_PNP IP DHCP, BOOTP, or RARP is enabled. | 56 | IP_PNP IP DHCP, BOOTP, or RARP is enabled. |
57 | IPV6 IPv6 support is enabled. | ||
57 | ISAPNP ISA PnP code is enabled. | 58 | ISAPNP ISA PnP code is enabled. |
58 | ISDN Appropriate ISDN support is enabled. | 59 | ISDN Appropriate ISDN support is enabled. |
59 | JOY Appropriate joystick support is enabled. | 60 | JOY Appropriate joystick support is enabled. |
@@ -199,10 +200,6 @@ and is between 256 and 4096 characters. It is defined in the file | |||
199 | acpi_display_output=video | 200 | acpi_display_output=video |
200 | See above. | 201 | See above. |
201 | 202 | ||
202 | acpi_early_pdc_eval [HW,ACPI] Evaluate processor _PDC methods | ||
203 | early. Needed on some platforms to properly | ||
204 | initialize the EC. | ||
205 | |||
206 | acpi_irq_balance [HW,ACPI] | 203 | acpi_irq_balance [HW,ACPI] |
207 | ACPI will balance active IRQs | 204 | ACPI will balance active IRQs |
208 | default in APIC mode | 205 | default in APIC mode |
@@ -315,6 +312,11 @@ and is between 256 and 4096 characters. It is defined in the file | |||
315 | aic79xx= [HW,SCSI] | 312 | aic79xx= [HW,SCSI] |
316 | See Documentation/scsi/aic79xx.txt. | 313 | See Documentation/scsi/aic79xx.txt. |
317 | 314 | ||
315 | alignment= [KNL,ARM] | ||
316 | Allow the default userspace alignment fault handler | ||
317 | behaviour to be specified. Bit 0 enables warnings, | ||
318 | bit 1 enables fixups, and bit 2 sends a segfault. | ||
319 | |||
318 | amd_iommu= [HW,X86-84] | 320 | amd_iommu= [HW,X86-84] |
319 | Pass parameters to the AMD IOMMU driver in the system. | 321 | Pass parameters to the AMD IOMMU driver in the system. |
320 | Possible values are: | 322 | Possible values are: |
@@ -351,6 +353,9 @@ and is between 256 and 4096 characters. It is defined in the file | |||
351 | Change the amount of debugging information output | 353 | Change the amount of debugging information output |
352 | when initialising the APIC and IO-APIC components. | 354 | when initialising the APIC and IO-APIC components. |
353 | 355 | ||
356 | autoconf= [IPV6] | ||
357 | See Documentation/networking/ipv6.txt. | ||
358 | |||
354 | show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller | 359 | show_lapic= [APIC,X86] Advanced Programmable Interrupt Controller |
355 | Limit apic dumping. The parameter defines the maximal | 360 | Limit apic dumping. The parameter defines the maximal |
356 | number of local apics being dumped. Also it is possible | 361 | number of local apics being dumped. Also it is possible |
@@ -633,6 +638,12 @@ and is between 256 and 4096 characters. It is defined in the file | |||
633 | See drivers/char/README.epca and | 638 | See drivers/char/README.epca and |
634 | Documentation/serial/digiepca.txt. | 639 | Documentation/serial/digiepca.txt. |
635 | 640 | ||
641 | disable= [IPV6] | ||
642 | See Documentation/networking/ipv6.txt. | ||
643 | |||
644 | disable_ipv6= [IPV6] | ||
645 | See Documentation/networking/ipv6.txt. | ||
646 | |||
636 | disable_mtrr_cleanup [X86] | 647 | disable_mtrr_cleanup [X86] |
637 | The kernel tries to adjust MTRR layout from continuous | 648 | The kernel tries to adjust MTRR layout from continuous |
638 | to discrete, to make X server driver able to add WB | 649 | to discrete, to make X server driver able to add WB |
@@ -1733,6 +1744,9 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1733 | nomfgpt [X86-32] Disable Multi-Function General Purpose | 1744 | nomfgpt [X86-32] Disable Multi-Function General Purpose |
1734 | Timer usage (for AMD Geode machines). | 1745 | Timer usage (for AMD Geode machines). |
1735 | 1746 | ||
1747 | nopat [X86] Disable PAT (page attribute table extension of | ||
1748 | pagetables) support. | ||
1749 | |||
1736 | norandmaps Don't use address space randomization. Equivalent to | 1750 | norandmaps Don't use address space randomization. Equivalent to |
1737 | echo 0 > /proc/sys/kernel/randomize_va_space | 1751 | echo 0 > /proc/sys/kernel/randomize_va_space |
1738 | 1752 | ||
@@ -1776,6 +1790,12 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1776 | purges which is reported from either PAL_VM_SUMMARY or | 1790 | purges which is reported from either PAL_VM_SUMMARY or |
1777 | SAL PALO. | 1791 | SAL PALO. |
1778 | 1792 | ||
1793 | nr_cpus= [SMP] Maximum number of processors that an SMP kernel | ||
1794 | could support. nr_cpus=n : n >= 1 limits the kernel to | ||
1795 | supporting 'n' processors. Later in runtime you can not | ||
1796 | use hotplug cpu feature to put more cpu back to online. | ||
1797 | just like you compile the kernel NR_CPUS=n | ||
1798 | |||
1779 | nr_uarts= [SERIAL] maximum number of UARTs to be registered. | 1799 | nr_uarts= [SERIAL] maximum number of UARTs to be registered. |
1780 | 1800 | ||
1781 | numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA. | 1801 | numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA. |
@@ -1943,8 +1963,12 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1943 | IRQ routing is enabled. | 1963 | IRQ routing is enabled. |
1944 | noacpi [X86] Do not use ACPI for IRQ routing | 1964 | noacpi [X86] Do not use ACPI for IRQ routing |
1945 | or for PCI scanning. | 1965 | or for PCI scanning. |
1946 | use_crs [X86] Use _CRS for PCI resource | 1966 | use_crs [X86] Use PCI host bridge window information |
1947 | allocation. | 1967 | from ACPI. On BIOSes from 2008 or later, this |
1968 | is enabled by default. If you need to use this, | ||
1969 | please report a bug. | ||
1970 | nocrs [X86] Ignore PCI host bridge windows from ACPI. | ||
1971 | If you need to use this, please report a bug. | ||
1948 | routeirq Do IRQ routing for all PCI devices. | 1972 | routeirq Do IRQ routing for all PCI devices. |
1949 | This is normally done in pci_enable_device(), | 1973 | This is normally done in pci_enable_device(), |
1950 | so this option is a temporary workaround | 1974 | so this option is a temporary workaround |
@@ -1993,6 +2017,14 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1993 | force Enable ASPM even on devices that claim not to support it. | 2017 | force Enable ASPM even on devices that claim not to support it. |
1994 | WARNING: Forcing ASPM on may cause system lockups. | 2018 | WARNING: Forcing ASPM on may cause system lockups. |
1995 | 2019 | ||
2020 | pcie_pme= [PCIE,PM] Native PCIe PME signaling options: | ||
2021 | off Do not use native PCIe PME signaling. | ||
2022 | force Use native PCIe PME signaling even if the BIOS refuses | ||
2023 | to allow the kernel to control the relevant PCIe config | ||
2024 | registers. | ||
2025 | nomsi Do not use MSI for native PCIe PME signaling (this makes | ||
2026 | all PCIe root ports use INTx for everything). | ||
2027 | |||
1996 | pcmv= [HW,PCMCIA] BadgePAD 4 | 2028 | pcmv= [HW,PCMCIA] BadgePAD 4 |
1997 | 2029 | ||
1998 | pd. [PARIDE] | 2030 | pd. [PARIDE] |
@@ -2698,6 +2730,13 @@ and is between 256 and 4096 characters. It is defined in the file | |||
2698 | medium is write-protected). | 2730 | medium is write-protected). |
2699 | Example: quirks=0419:aaf5:rl,0421:0433:rc | 2731 | Example: quirks=0419:aaf5:rl,0421:0433:rc |
2700 | 2732 | ||
2733 | userpte= | ||
2734 | [X86] Flags controlling user PTE allocations. | ||
2735 | |||
2736 | nohigh = do not allocate PTE pages in | ||
2737 | HIGHMEM regardless of setting | ||
2738 | of CONFIG_HIGHPTE. | ||
2739 | |||
2701 | vdso= [X86,SH] | 2740 | vdso= [X86,SH] |
2702 | vdso=2: enable compat VDSO (default with COMPAT_VDSO) | 2741 | vdso=2: enable compat VDSO (default with COMPAT_VDSO) |
2703 | vdso=1: enable VDSO (default) | 2742 | vdso=1: enable VDSO (default) |
@@ -2791,6 +2830,12 @@ and is between 256 and 4096 characters. It is defined in the file | |||
2791 | default x2apic cluster mode on platforms | 2830 | default x2apic cluster mode on platforms |
2792 | supporting x2apic. | 2831 | supporting x2apic. |
2793 | 2832 | ||
2833 | x86_mrst_timer= [X86-32,APBT] | ||
2834 | Choose timer option for x86 Moorestown MID platform. | ||
2835 | Two valid options are apbt timer only and lapic timer | ||
2836 | plus one apbt timer for broadcast timer. | ||
2837 | x86_mrst_timer=apbt_only | lapic_and_apbt | ||
2838 | |||
2794 | xd= [HW,XT] Original XT pre-IDE (RLL encoded) disks. | 2839 | xd= [HW,XT] Original XT pre-IDE (RLL encoded) disks. |
2795 | xd_geo= See header of drivers/block/xd.c. | 2840 | xd_geo= See header of drivers/block/xd.c. |
2796 | 2841 | ||
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt index c79ab996dada..bdb13817e1e9 100644 --- a/Documentation/kobject.txt +++ b/Documentation/kobject.txt | |||
@@ -266,7 +266,7 @@ kobj_type: | |||
266 | 266 | ||
267 | struct kobj_type { | 267 | struct kobj_type { |
268 | void (*release)(struct kobject *); | 268 | void (*release)(struct kobject *); |
269 | struct sysfs_ops *sysfs_ops; | 269 | const struct sysfs_ops *sysfs_ops; |
270 | struct attribute **default_attrs; | 270 | struct attribute **default_attrs; |
271 | }; | 271 | }; |
272 | 272 | ||
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 053037a1fe6d..2f9115c0ae62 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt | |||
@@ -1,6 +1,7 @@ | |||
1 | Title : Kernel Probes (Kprobes) | 1 | Title : Kernel Probes (Kprobes) |
2 | Authors : Jim Keniston <jkenisto@us.ibm.com> | 2 | Authors : Jim Keniston <jkenisto@us.ibm.com> |
3 | : Prasanna S Panchamukhi <prasanna@in.ibm.com> | 3 | : Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com> |
4 | : Masami Hiramatsu <mhiramat@redhat.com> | ||
4 | 5 | ||
5 | CONTENTS | 6 | CONTENTS |
6 | 7 | ||
@@ -15,6 +16,7 @@ CONTENTS | |||
15 | 9. Jprobes Example | 16 | 9. Jprobes Example |
16 | 10. Kretprobes Example | 17 | 10. Kretprobes Example |
17 | Appendix A: The kprobes debugfs interface | 18 | Appendix A: The kprobes debugfs interface |
19 | Appendix B: The kprobes sysctl interface | ||
18 | 20 | ||
19 | 1. Concepts: Kprobes, Jprobes, Return Probes | 21 | 1. Concepts: Kprobes, Jprobes, Return Probes |
20 | 22 | ||
@@ -42,13 +44,13 @@ registration/unregistration of a group of *probes. These functions | |||
42 | can speed up unregistration process when you have to unregister | 44 | can speed up unregistration process when you have to unregister |
43 | a lot of probes at once. | 45 | a lot of probes at once. |
44 | 46 | ||
45 | The next three subsections explain how the different types of | 47 | The next four subsections explain how the different types of |
46 | probes work. They explain certain things that you'll need to | 48 | probes work and how jump optimization works. They explain certain |
47 | know in order to make the best use of Kprobes -- e.g., the | 49 | things that you'll need to know in order to make the best use of |
48 | difference between a pre_handler and a post_handler, and how | 50 | Kprobes -- e.g., the difference between a pre_handler and |
49 | to use the maxactive and nmissed fields of a kretprobe. But | 51 | a post_handler, and how to use the maxactive and nmissed fields of |
50 | if you're in a hurry to start using Kprobes, you can skip ahead | 52 | a kretprobe. But if you're in a hurry to start using Kprobes, you |
51 | to section 2. | 53 | can skip ahead to section 2. |
52 | 54 | ||
53 | 1.1 How Does a Kprobe Work? | 55 | 1.1 How Does a Kprobe Work? |
54 | 56 | ||
@@ -161,13 +163,125 @@ In case probed function is entered but there is no kretprobe_instance | |||
161 | object available, then in addition to incrementing the nmissed count, | 163 | object available, then in addition to incrementing the nmissed count, |
162 | the user entry_handler invocation is also skipped. | 164 | the user entry_handler invocation is also skipped. |
163 | 165 | ||
166 | 1.4 How Does Jump Optimization Work? | ||
167 | |||
168 | If you configured your kernel with CONFIG_OPTPROBES=y (currently | ||
169 | this option is supported on x86/x86-64, non-preemptive kernel) and | ||
170 | the "debug.kprobes_optimization" kernel parameter is set to 1 (see | ||
171 | sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump | ||
172 | instruction instead of a breakpoint instruction at each probepoint. | ||
173 | |||
174 | 1.4.1 Init a Kprobe | ||
175 | |||
176 | When a probe is registered, before attempting this optimization, | ||
177 | Kprobes inserts an ordinary, breakpoint-based kprobe at the specified | ||
178 | address. So, even if it's not possible to optimize this particular | ||
179 | probepoint, there'll be a probe there. | ||
180 | |||
181 | 1.4.2 Safety Check | ||
182 | |||
183 | Before optimizing a probe, Kprobes performs the following safety checks: | ||
184 | |||
185 | - Kprobes verifies that the region that will be replaced by the jump | ||
186 | instruction (the "optimized region") lies entirely within one function. | ||
187 | (A jump instruction is multiple bytes, and so may overlay multiple | ||
188 | instructions.) | ||
189 | |||
190 | - Kprobes analyzes the entire function and verifies that there is no | ||
191 | jump into the optimized region. Specifically: | ||
192 | - the function contains no indirect jump; | ||
193 | - the function contains no instruction that causes an exception (since | ||
194 | the fixup code triggered by the exception could jump back into the | ||
195 | optimized region -- Kprobes checks the exception tables to verify this); | ||
196 | and | ||
197 | - there is no near jump to the optimized region (other than to the first | ||
198 | byte). | ||
199 | |||
200 | - For each instruction in the optimized region, Kprobes verifies that | ||
201 | the instruction can be executed out of line. | ||
202 | |||
203 | 1.4.3 Preparing Detour Buffer | ||
204 | |||
205 | Next, Kprobes prepares a "detour" buffer, which contains the following | ||
206 | instruction sequence: | ||
207 | - code to push the CPU's registers (emulating a breakpoint trap) | ||
208 | - a call to the trampoline code which calls user's probe handlers. | ||
209 | - code to restore registers | ||
210 | - the instructions from the optimized region | ||
211 | - a jump back to the original execution path. | ||
212 | |||
213 | 1.4.4 Pre-optimization | ||
214 | |||
215 | After preparing the detour buffer, Kprobes verifies that none of the | ||
216 | following situations exist: | ||
217 | - The probe has either a break_handler (i.e., it's a jprobe) or a | ||
218 | post_handler. | ||
219 | - Other instructions in the optimized region are probed. | ||
220 | - The probe is disabled. | ||
221 | In any of the above cases, Kprobes won't start optimizing the probe. | ||
222 | Since these are temporary situations, Kprobes tries to start | ||
223 | optimizing it again if the situation is changed. | ||
224 | |||
225 | If the kprobe can be optimized, Kprobes enqueues the kprobe to an | ||
226 | optimizing list, and kicks the kprobe-optimizer workqueue to optimize | ||
227 | it. If the to-be-optimized probepoint is hit before being optimized, | ||
228 | Kprobes returns control to the original instruction path by setting | ||
229 | the CPU's instruction pointer to the copied code in the detour buffer | ||
230 | -- thus at least avoiding the single-step. | ||
231 | |||
232 | 1.4.5 Optimization | ||
233 | |||
234 | The Kprobe-optimizer doesn't insert the jump instruction immediately; | ||
235 | rather, it calls synchronize_sched() for safety first, because it's | ||
236 | possible for a CPU to be interrupted in the middle of executing the | ||
237 | optimized region(*). As you know, synchronize_sched() can ensure | ||
238 | that all interruptions that were active when synchronize_sched() | ||
239 | was called are done, but only if CONFIG_PREEMPT=n. So, this version | ||
240 | of kprobe optimization supports only kernels with CONFIG_PREEMPT=n.(**) | ||
241 | |||
242 | After that, the Kprobe-optimizer calls stop_machine() to replace | ||
243 | the optimized region with a jump instruction to the detour buffer, | ||
244 | using text_poke_smp(). | ||
245 | |||
246 | 1.4.6 Unoptimization | ||
247 | |||
248 | When an optimized kprobe is unregistered, disabled, or blocked by | ||
249 | another kprobe, it will be unoptimized. If this happens before | ||
250 | the optimization is complete, the kprobe is just dequeued from the | ||
251 | optimized list. If the optimization has been done, the jump is | ||
252 | replaced with the original code (except for an int3 breakpoint in | ||
253 | the first byte) by using text_poke_smp(). | ||
254 | |||
255 | (*)Please imagine that the 2nd instruction is interrupted and then | ||
256 | the optimizer replaces the 2nd instruction with the jump *address* | ||
257 | while the interrupt handler is running. When the interrupt | ||
258 | returns to original address, there is no valid instruction, | ||
259 | and it causes an unexpected result. | ||
260 | |||
261 | (**)This optimization-safety checking may be replaced with the | ||
262 | stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y | ||
263 | kernel. | ||
264 | |||
265 | NOTE for geeks: | ||
266 | The jump optimization changes the kprobe's pre_handler behavior. | ||
267 | Without optimization, the pre_handler can change the kernel's execution | ||
268 | path by changing regs->ip and returning 1. However, when the probe | ||
269 | is optimized, that modification is ignored. Thus, if you want to | ||
270 | tweak the kernel's execution path, you need to suppress optimization, | ||
271 | using one of the following techniques: | ||
272 | - Specify an empty function for the kprobe's post_handler or break_handler. | ||
273 | or | ||
274 | - Config CONFIG_OPTPROBES=n. | ||
275 | or | ||
276 | - Execute 'sysctl -w debug.kprobes_optimization=n' | ||
277 | |||
164 | 2. Architectures Supported | 278 | 2. Architectures Supported |
165 | 279 | ||
166 | Kprobes, jprobes, and return probes are implemented on the following | 280 | Kprobes, jprobes, and return probes are implemented on the following |
167 | architectures: | 281 | architectures: |
168 | 282 | ||
169 | - i386 | 283 | - i386 (Supports jump optimization) |
170 | - x86_64 (AMD-64, EM64T) | 284 | - x86_64 (AMD-64, EM64T) (Supports jump optimization) |
171 | - ppc64 | 285 | - ppc64 |
172 | - ia64 (Does not support probes on instruction slot1.) | 286 | - ia64 (Does not support probes on instruction slot1.) |
173 | - sparc64 (Return probes not yet implemented.) | 287 | - sparc64 (Return probes not yet implemented.) |
@@ -193,6 +307,10 @@ it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO), | |||
193 | so you can use "objdump -d -l vmlinux" to see the source-to-object | 307 | so you can use "objdump -d -l vmlinux" to see the source-to-object |
194 | code mapping. | 308 | code mapping. |
195 | 309 | ||
310 | If you want to reduce probing overhead, set "Kprobes jump optimization | ||
311 | support" (CONFIG_OPTPROBES) to "y". You can find this option under the | ||
312 | "Kprobes" line. | ||
313 | |||
196 | 4. API Reference | 314 | 4. API Reference |
197 | 315 | ||
198 | The Kprobes API includes a "register" function and an "unregister" | 316 | The Kprobes API includes a "register" function and an "unregister" |
@@ -389,7 +507,10 @@ the probe which has been registered. | |||
389 | 507 | ||
390 | Kprobes allows multiple probes at the same address. Currently, | 508 | Kprobes allows multiple probes at the same address. Currently, |
391 | however, there cannot be multiple jprobes on the same function at | 509 | however, there cannot be multiple jprobes on the same function at |
392 | the same time. | 510 | the same time. Also, a probepoint for which there is a jprobe or |
511 | a post_handler cannot be optimized. So if you install a jprobe, | ||
512 | or a kprobe with a post_handler, at an optimized probepoint, the | ||
513 | probepoint will be unoptimized automatically. | ||
393 | 514 | ||
394 | In general, you can install a probe anywhere in the kernel. | 515 | In general, you can install a probe anywhere in the kernel. |
395 | In particular, you can probe interrupt handlers. Known exceptions | 516 | In particular, you can probe interrupt handlers. Known exceptions |
@@ -453,6 +574,38 @@ reason, Kprobes doesn't support return probes (or kprobes or jprobes) | |||
453 | on the x86_64 version of __switch_to(); the registration functions | 574 | on the x86_64 version of __switch_to(); the registration functions |
454 | return -EINVAL. | 575 | return -EINVAL. |
455 | 576 | ||
577 | On x86/x86-64, since the Jump Optimization of Kprobes modifies | ||
578 | instructions widely, there are some limitations to optimization. To | ||
579 | explain it, we introduce some terminology. Imagine a 3-instruction | ||
580 | sequence consisting of a two 2-byte instructions and one 3-byte | ||
581 | instruction. | ||
582 | |||
583 | IA | ||
584 | | | ||
585 | [-2][-1][0][1][2][3][4][5][6][7] | ||
586 | [ins1][ins2][ ins3 ] | ||
587 | [<- DCR ->] | ||
588 | [<- JTPR ->] | ||
589 | |||
590 | ins1: 1st Instruction | ||
591 | ins2: 2nd Instruction | ||
592 | ins3: 3rd Instruction | ||
593 | IA: Insertion Address | ||
594 | JTPR: Jump Target Prohibition Region | ||
595 | DCR: Detoured Code Region | ||
596 | |||
597 | The instructions in DCR are copied to the out-of-line buffer | ||
598 | of the kprobe, because the bytes in DCR are replaced by | ||
599 | a 5-byte jump instruction. So there are several limitations. | ||
600 | |||
601 | a) The instructions in DCR must be relocatable. | ||
602 | b) The instructions in DCR must not include a call instruction. | ||
603 | c) JTPR must not be targeted by any jump or call instruction. | ||
604 | d) DCR must not straddle the border betweeen functions. | ||
605 | |||
606 | Anyway, these limitations are checked by the in-kernel instruction | ||
607 | decoder, so you don't need to worry about that. | ||
608 | |||
456 | 6. Probe Overhead | 609 | 6. Probe Overhead |
457 | 610 | ||
458 | On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0 | 611 | On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0 |
@@ -476,6 +629,19 @@ k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07 | |||
476 | ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU) | 629 | ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU) |
477 | k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 | 630 | k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 |
478 | 631 | ||
632 | 6.1 Optimized Probe Overhead | ||
633 | |||
634 | Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to | ||
635 | process. Here are sample overhead figures (in usec) for x86 architectures. | ||
636 | k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe, | ||
637 | r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe. | ||
638 | |||
639 | i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips | ||
640 | k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33 | ||
641 | |||
642 | x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips | ||
643 | k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30 | ||
644 | |||
479 | 7. TODO | 645 | 7. TODO |
480 | 646 | ||
481 | a. SystemTap (http://sourceware.org/systemtap): Provides a simplified | 647 | a. SystemTap (http://sourceware.org/systemtap): Provides a simplified |
@@ -523,7 +689,8 @@ is also specified. Following columns show probe status. If the probe is on | |||
523 | a virtual address that is no longer valid (module init sections, module | 689 | a virtual address that is no longer valid (module init sections, module |
524 | virtual addresses that correspond to modules that've been unloaded), | 690 | virtual addresses that correspond to modules that've been unloaded), |
525 | such probes are marked with [GONE]. If the probe is temporarily disabled, | 691 | such probes are marked with [GONE]. If the probe is temporarily disabled, |
526 | such probes are marked with [DISABLED]. | 692 | such probes are marked with [DISABLED]. If the probe is optimized, it is |
693 | marked with [OPTIMIZED]. | ||
527 | 694 | ||
528 | /sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly. | 695 | /sys/kernel/debug/kprobes/enabled: Turn kprobes ON/OFF forcibly. |
529 | 696 | ||
@@ -533,3 +700,19 @@ registered probes will be disarmed, till such time a "1" is echoed to this | |||
533 | file. Note that this knob just disarms and arms all kprobes and doesn't | 700 | file. Note that this knob just disarms and arms all kprobes and doesn't |
534 | change each probe's disabling state. This means that disabled kprobes (marked | 701 | change each probe's disabling state. This means that disabled kprobes (marked |
535 | [DISABLED]) will be not enabled if you turn ON all kprobes by this knob. | 702 | [DISABLED]) will be not enabled if you turn ON all kprobes by this knob. |
703 | |||
704 | |||
705 | Appendix B: The kprobes sysctl interface | ||
706 | |||
707 | /proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF. | ||
708 | |||
709 | When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides | ||
710 | a knob to globally and forcibly turn jump optimization (see section | ||
711 | 1.4) ON or OFF. By default, jump optimization is allowed (ON). | ||
712 | If you echo "0" to this file or set "debug.kprobes_optimization" to | ||
713 | 0 via sysctl, all optimized probes will be unoptimized, and any new | ||
714 | probes registered after that will not be optimized. Note that this | ||
715 | knob *changes* the optimized state. This means that optimized probes | ||
716 | (marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be | ||
717 | removed). If the knob is turned on, they will be optimized again. | ||
718 | |||
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt index 2811e452f756..c6416a398163 100644 --- a/Documentation/kvm/api.txt +++ b/Documentation/kvm/api.txt | |||
@@ -23,12 +23,12 @@ of a virtual machine. The ioctls belong to three classes | |||
23 | Only run vcpu ioctls from the same thread that was used to create the | 23 | Only run vcpu ioctls from the same thread that was used to create the |
24 | vcpu. | 24 | vcpu. |
25 | 25 | ||
26 | 2. File descritpors | 26 | 2. File descriptors |
27 | 27 | ||
28 | The kvm API is centered around file descriptors. An initial | 28 | The kvm API is centered around file descriptors. An initial |
29 | open("/dev/kvm") obtains a handle to the kvm subsystem; this handle | 29 | open("/dev/kvm") obtains a handle to the kvm subsystem; this handle |
30 | can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this | 30 | can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this |
31 | handle will create a VM file descripror which can be used to issue VM | 31 | handle will create a VM file descriptor which can be used to issue VM |
32 | ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu | 32 | ioctls. A KVM_CREATE_VCPU ioctl on a VM fd will create a virtual cpu |
33 | and return a file descriptor pointing to it. Finally, ioctls on a vcpu | 33 | and return a file descriptor pointing to it. Finally, ioctls on a vcpu |
34 | fd can be used to control the vcpu, including the important task of | 34 | fd can be used to control the vcpu, including the important task of |
@@ -643,7 +643,7 @@ Type: vm ioctl | |||
643 | Parameters: struct kvm_clock_data (in) | 643 | Parameters: struct kvm_clock_data (in) |
644 | Returns: 0 on success, -1 on error | 644 | Returns: 0 on success, -1 on error |
645 | 645 | ||
646 | Sets the current timestamp of kvmclock to the valued specific in its parameter. | 646 | Sets the current timestamp of kvmclock to the value specified in its parameter. |
647 | In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios | 647 | In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios |
648 | such as migration. | 648 | such as migration. |
649 | 649 | ||
@@ -795,11 +795,11 @@ Unused. | |||
795 | __u64 data_offset; /* relative to kvm_run start */ | 795 | __u64 data_offset; /* relative to kvm_run start */ |
796 | } io; | 796 | } io; |
797 | 797 | ||
798 | If exit_reason is KVM_EXIT_IO_IN or KVM_EXIT_IO_OUT, then the vcpu has | 798 | If exit_reason is KVM_EXIT_IO, then the vcpu has |
799 | executed a port I/O instruction which could not be satisfied by kvm. | 799 | executed a port I/O instruction which could not be satisfied by kvm. |
800 | data_offset describes where the data is located (KVM_EXIT_IO_OUT) or | 800 | data_offset describes where the data is located (KVM_EXIT_IO_OUT) or |
801 | where kvm expects application code to place the data for the next | 801 | where kvm expects application code to place the data for the next |
802 | KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a patcked array. | 802 | KVM_RUN invocation (KVM_EXIT_IO_IN). Data format is a packed array. |
803 | 803 | ||
804 | struct { | 804 | struct { |
805 | struct kvm_debug_exit_arch arch; | 805 | struct kvm_debug_exit_arch arch; |
@@ -815,7 +815,7 @@ Unused. | |||
815 | __u8 is_write; | 815 | __u8 is_write; |
816 | } mmio; | 816 | } mmio; |
817 | 817 | ||
818 | If exit_reason is KVM_EXIT_MMIO or KVM_EXIT_IO_OUT, then the vcpu has | 818 | If exit_reason is KVM_EXIT_MMIO, then the vcpu has |
819 | executed a memory-mapped I/O instruction which could not be satisfied | 819 | executed a memory-mapped I/O instruction which could not be satisfied |
820 | by kvm. The 'data' member contains the written data if 'is_write' is | 820 | by kvm. The 'data' member contains the written data if 'is_write' is |
821 | true, and should be filled by application code otherwise. | 821 | true, and should be filled by application code otherwise. |
diff --git a/Documentation/laptops/00-INDEX b/Documentation/laptops/00-INDEX index ee5692b26dd4..fa688538e757 100644 --- a/Documentation/laptops/00-INDEX +++ b/Documentation/laptops/00-INDEX | |||
@@ -2,6 +2,12 @@ | |||
2 | - This file | 2 | - This file |
3 | acer-wmi.txt | 3 | acer-wmi.txt |
4 | - information on the Acer Laptop WMI Extras driver. | 4 | - information on the Acer Laptop WMI Extras driver. |
5 | asus-laptop.txt | ||
6 | - information on the Asus Laptop Extras driver. | ||
7 | disk-shock-protection.txt | ||
8 | - information on hard disk shock protection. | ||
9 | dslm.c | ||
10 | - Simple Disk Sleep Monitor program | ||
5 | laptop-mode.txt | 11 | laptop-mode.txt |
6 | - how to conserve battery power using laptop-mode. | 12 | - how to conserve battery power using laptop-mode. |
7 | sony-laptop.txt | 13 | sony-laptop.txt |
diff --git a/Documentation/laptops/Makefile b/Documentation/laptops/Makefile new file mode 100644 index 000000000000..5cb144af3c09 --- /dev/null +++ b/Documentation/laptops/Makefile | |||
@@ -0,0 +1,8 @@ | |||
1 | # kbuild trick to avoid linker error. Can be omitted if a module is built. | ||
2 | obj- := dummy.o | ||
3 | |||
4 | # List of programs to build | ||
5 | hostprogs-y := dslm | ||
6 | |||
7 | # Tell kbuild to always build the programs | ||
8 | always := $(hostprogs-y) | ||
diff --git a/Documentation/laptops/dslm.c b/Documentation/laptops/dslm.c new file mode 100644 index 000000000000..72ff290c5fc6 --- /dev/null +++ b/Documentation/laptops/dslm.c | |||
@@ -0,0 +1,166 @@ | |||
1 | /* | ||
2 | * dslm.c | ||
3 | * Simple Disk Sleep Monitor | ||
4 | * by Bartek Kania | ||
5 | * Licenced under the GPL | ||
6 | */ | ||
7 | #include <unistd.h> | ||
8 | #include <stdlib.h> | ||
9 | #include <stdio.h> | ||
10 | #include <fcntl.h> | ||
11 | #include <errno.h> | ||
12 | #include <time.h> | ||
13 | #include <string.h> | ||
14 | #include <signal.h> | ||
15 | #include <sys/ioctl.h> | ||
16 | #include <linux/hdreg.h> | ||
17 | |||
18 | #ifdef DEBUG | ||
19 | #define D(x) x | ||
20 | #else | ||
21 | #define D(x) | ||
22 | #endif | ||
23 | |||
24 | int endit = 0; | ||
25 | |||
26 | /* Check if the disk is in powersave-mode | ||
27 | * Most of the code is stolen from hdparm. | ||
28 | * 1 = active, 0 = standby/sleep, -1 = unknown */ | ||
29 | static int check_powermode(int fd) | ||
30 | { | ||
31 | unsigned char args[4] = {WIN_CHECKPOWERMODE1,0,0,0}; | ||
32 | int state; | ||
33 | |||
34 | if (ioctl(fd, HDIO_DRIVE_CMD, &args) | ||
35 | && (args[0] = WIN_CHECKPOWERMODE2) /* try again with 0x98 */ | ||
36 | && ioctl(fd, HDIO_DRIVE_CMD, &args)) { | ||
37 | if (errno != EIO || args[0] != 0 || args[1] != 0) { | ||
38 | state = -1; /* "unknown"; */ | ||
39 | } else | ||
40 | state = 0; /* "sleeping"; */ | ||
41 | } else { | ||
42 | state = (args[2] == 255) ? 1 : 0; | ||
43 | } | ||
44 | D(printf(" drive state is: %d\n", state)); | ||
45 | |||
46 | return state; | ||
47 | } | ||
48 | |||
49 | static char *state_name(int i) | ||
50 | { | ||
51 | if (i == -1) return "unknown"; | ||
52 | if (i == 0) return "sleeping"; | ||
53 | if (i == 1) return "active"; | ||
54 | |||
55 | return "internal error"; | ||
56 | } | ||
57 | |||
58 | static char *myctime(time_t time) | ||
59 | { | ||
60 | char *ts = ctime(&time); | ||
61 | ts[strlen(ts) - 1] = 0; | ||
62 | |||
63 | return ts; | ||
64 | } | ||
65 | |||
66 | static void measure(int fd) | ||
67 | { | ||
68 | time_t start_time; | ||
69 | int last_state; | ||
70 | time_t last_time; | ||
71 | int curr_state; | ||
72 | time_t curr_time = 0; | ||
73 | time_t time_diff; | ||
74 | time_t active_time = 0; | ||
75 | time_t sleep_time = 0; | ||
76 | time_t unknown_time = 0; | ||
77 | time_t total_time = 0; | ||
78 | int changes = 0; | ||
79 | float tmp; | ||
80 | |||
81 | printf("Starting measurements\n"); | ||
82 | |||
83 | last_state = check_powermode(fd); | ||
84 | start_time = last_time = time(0); | ||
85 | printf(" System is in state %s\n\n", state_name(last_state)); | ||
86 | |||
87 | while(!endit) { | ||
88 | sleep(1); | ||
89 | curr_state = check_powermode(fd); | ||
90 | |||
91 | if (curr_state != last_state || endit) { | ||
92 | changes++; | ||
93 | curr_time = time(0); | ||
94 | time_diff = curr_time - last_time; | ||
95 | |||
96 | if (last_state == 1) active_time += time_diff; | ||
97 | else if (last_state == 0) sleep_time += time_diff; | ||
98 | else unknown_time += time_diff; | ||
99 | |||
100 | last_state = curr_state; | ||
101 | last_time = curr_time; | ||
102 | |||
103 | printf("%s: State-change to %s\n", myctime(curr_time), | ||
104 | state_name(curr_state)); | ||
105 | } | ||
106 | } | ||
107 | changes--; /* Compensate for SIGINT */ | ||
108 | |||
109 | total_time = time(0) - start_time; | ||
110 | printf("\nTotal running time: %lus\n", curr_time - start_time); | ||
111 | printf(" State changed %d times\n", changes); | ||
112 | |||
113 | tmp = (float)sleep_time / (float)total_time * 100; | ||
114 | printf(" Time in sleep state: %lus (%.2f%%)\n", sleep_time, tmp); | ||
115 | tmp = (float)active_time / (float)total_time * 100; | ||
116 | printf(" Time in active state: %lus (%.2f%%)\n", active_time, tmp); | ||
117 | tmp = (float)unknown_time / (float)total_time * 100; | ||
118 | printf(" Time in unknown state: %lus (%.2f%%)\n", unknown_time, tmp); | ||
119 | } | ||
120 | |||
121 | static void ender(int s) | ||
122 | { | ||
123 | endit = 1; | ||
124 | } | ||
125 | |||
126 | static void usage(void) | ||
127 | { | ||
128 | puts("usage: dslm [-w <time>] <disk>"); | ||
129 | exit(0); | ||
130 | } | ||
131 | |||
132 | int main(int argc, char **argv) | ||
133 | { | ||
134 | int fd; | ||
135 | char *disk = 0; | ||
136 | int settle_time = 60; | ||
137 | |||
138 | /* Parse the simple command-line */ | ||
139 | if (argc == 2) | ||
140 | disk = argv[1]; | ||
141 | else if (argc == 4) { | ||
142 | settle_time = atoi(argv[2]); | ||
143 | disk = argv[3]; | ||
144 | } else | ||
145 | usage(); | ||
146 | |||
147 | if (!(fd = open(disk, O_RDONLY|O_NONBLOCK))) { | ||
148 | printf("Can't open %s, because: %s\n", disk, strerror(errno)); | ||
149 | exit(-1); | ||
150 | } | ||
151 | |||
152 | if (settle_time) { | ||
153 | printf("Waiting %d seconds for the system to settle down to " | ||
154 | "'normal'\n", settle_time); | ||
155 | sleep(settle_time); | ||
156 | } else | ||
157 | puts("Not waiting for system to settle down"); | ||
158 | |||
159 | signal(SIGINT, ender); | ||
160 | |||
161 | measure(fd); | ||
162 | |||
163 | close(fd); | ||
164 | |||
165 | return 0; | ||
166 | } | ||
diff --git a/Documentation/laptops/laptop-mode.txt b/Documentation/laptops/laptop-mode.txt index eeedee11c8c2..2c3c35093023 100644 --- a/Documentation/laptops/laptop-mode.txt +++ b/Documentation/laptops/laptop-mode.txt | |||
@@ -779,172 +779,4 @@ Monitoring tool | |||
779 | --------------- | 779 | --------------- |
780 | 780 | ||
781 | Bartek Kania submitted this, it can be used to measure how much time your disk | 781 | Bartek Kania submitted this, it can be used to measure how much time your disk |
782 | spends spun up/down. | 782 | spends spun up/down. See Documentation/laptops/dslm.c |
783 | |||
784 | ---------------------------dslm.c BEGIN----------------------------------------- | ||
785 | /* | ||
786 | * Simple Disk Sleep Monitor | ||
787 | * by Bartek Kania | ||
788 | * Licenced under the GPL | ||
789 | */ | ||
790 | #include <unistd.h> | ||
791 | #include <stdlib.h> | ||
792 | #include <stdio.h> | ||
793 | #include <fcntl.h> | ||
794 | #include <errno.h> | ||
795 | #include <time.h> | ||
796 | #include <string.h> | ||
797 | #include <signal.h> | ||
798 | #include <sys/ioctl.h> | ||
799 | #include <linux/hdreg.h> | ||
800 | |||
801 | #ifdef DEBUG | ||
802 | #define D(x) x | ||
803 | #else | ||
804 | #define D(x) | ||
805 | #endif | ||
806 | |||
807 | int endit = 0; | ||
808 | |||
809 | /* Check if the disk is in powersave-mode | ||
810 | * Most of the code is stolen from hdparm. | ||
811 | * 1 = active, 0 = standby/sleep, -1 = unknown */ | ||
812 | int check_powermode(int fd) | ||
813 | { | ||
814 | unsigned char args[4] = {WIN_CHECKPOWERMODE1,0,0,0}; | ||
815 | int state; | ||
816 | |||
817 | if (ioctl(fd, HDIO_DRIVE_CMD, &args) | ||
818 | && (args[0] = WIN_CHECKPOWERMODE2) /* try again with 0x98 */ | ||
819 | && ioctl(fd, HDIO_DRIVE_CMD, &args)) { | ||
820 | if (errno != EIO || args[0] != 0 || args[1] != 0) { | ||
821 | state = -1; /* "unknown"; */ | ||
822 | } else | ||
823 | state = 0; /* "sleeping"; */ | ||
824 | } else { | ||
825 | state = (args[2] == 255) ? 1 : 0; | ||
826 | } | ||
827 | D(printf(" drive state is: %d\n", state)); | ||
828 | |||
829 | return state; | ||
830 | } | ||
831 | |||
832 | char *state_name(int i) | ||
833 | { | ||
834 | if (i == -1) return "unknown"; | ||
835 | if (i == 0) return "sleeping"; | ||
836 | if (i == 1) return "active"; | ||
837 | |||
838 | return "internal error"; | ||
839 | } | ||
840 | |||
841 | char *myctime(time_t time) | ||
842 | { | ||
843 | char *ts = ctime(&time); | ||
844 | ts[strlen(ts) - 1] = 0; | ||
845 | |||
846 | return ts; | ||
847 | } | ||
848 | |||
849 | void measure(int fd) | ||
850 | { | ||
851 | time_t start_time; | ||
852 | int last_state; | ||
853 | time_t last_time; | ||
854 | int curr_state; | ||
855 | time_t curr_time = 0; | ||
856 | time_t time_diff; | ||
857 | time_t active_time = 0; | ||
858 | time_t sleep_time = 0; | ||
859 | time_t unknown_time = 0; | ||
860 | time_t total_time = 0; | ||
861 | int changes = 0; | ||
862 | float tmp; | ||
863 | |||
864 | printf("Starting measurements\n"); | ||
865 | |||
866 | last_state = check_powermode(fd); | ||
867 | start_time = last_time = time(0); | ||
868 | printf(" System is in state %s\n\n", state_name(last_state)); | ||
869 | |||
870 | while(!endit) { | ||
871 | sleep(1); | ||
872 | curr_state = check_powermode(fd); | ||
873 | |||
874 | if (curr_state != last_state || endit) { | ||
875 | changes++; | ||
876 | curr_time = time(0); | ||
877 | time_diff = curr_time - last_time; | ||
878 | |||
879 | if (last_state == 1) active_time += time_diff; | ||
880 | else if (last_state == 0) sleep_time += time_diff; | ||
881 | else unknown_time += time_diff; | ||
882 | |||
883 | last_state = curr_state; | ||
884 | last_time = curr_time; | ||
885 | |||
886 | printf("%s: State-change to %s\n", myctime(curr_time), | ||
887 | state_name(curr_state)); | ||
888 | } | ||
889 | } | ||
890 | changes--; /* Compensate for SIGINT */ | ||
891 | |||
892 | total_time = time(0) - start_time; | ||
893 | printf("\nTotal running time: %lus\n", curr_time - start_time); | ||
894 | printf(" State changed %d times\n", changes); | ||
895 | |||
896 | tmp = (float)sleep_time / (float)total_time * 100; | ||
897 | printf(" Time in sleep state: %lus (%.2f%%)\n", sleep_time, tmp); | ||
898 | tmp = (float)active_time / (float)total_time * 100; | ||
899 | printf(" Time in active state: %lus (%.2f%%)\n", active_time, tmp); | ||
900 | tmp = (float)unknown_time / (float)total_time * 100; | ||
901 | printf(" Time in unknown state: %lus (%.2f%%)\n", unknown_time, tmp); | ||
902 | } | ||
903 | |||
904 | void ender(int s) | ||
905 | { | ||
906 | endit = 1; | ||
907 | } | ||
908 | |||
909 | void usage() | ||
910 | { | ||
911 | puts("usage: dslm [-w <time>] <disk>"); | ||
912 | exit(0); | ||
913 | } | ||
914 | |||
915 | int main(int argc, char **argv) | ||
916 | { | ||
917 | int fd; | ||
918 | char *disk = 0; | ||
919 | int settle_time = 60; | ||
920 | |||
921 | /* Parse the simple command-line */ | ||
922 | if (argc == 2) | ||
923 | disk = argv[1]; | ||
924 | else if (argc == 4) { | ||
925 | settle_time = atoi(argv[2]); | ||
926 | disk = argv[3]; | ||
927 | } else | ||
928 | usage(); | ||
929 | |||
930 | if (!(fd = open(disk, O_RDONLY|O_NONBLOCK))) { | ||
931 | printf("Can't open %s, because: %s\n", disk, strerror(errno)); | ||
932 | exit(-1); | ||
933 | } | ||
934 | |||
935 | if (settle_time) { | ||
936 | printf("Waiting %d seconds for the system to settle down to " | ||
937 | "'normal'\n", settle_time); | ||
938 | sleep(settle_time); | ||
939 | } else | ||
940 | puts("Not waiting for system to settle down"); | ||
941 | |||
942 | signal(SIGINT, ender); | ||
943 | |||
944 | measure(fd); | ||
945 | |||
946 | close(fd); | ||
947 | |||
948 | return 0; | ||
949 | } | ||
950 | ---------------------------dslm.c END------------------------------------------- | ||
diff --git a/Documentation/laptops/thinkpad-acpi.txt b/Documentation/laptops/thinkpad-acpi.txt index 75afa1229fd7..39c0a09d0105 100644 --- a/Documentation/laptops/thinkpad-acpi.txt +++ b/Documentation/laptops/thinkpad-acpi.txt | |||
@@ -650,6 +650,10 @@ LCD, CRT or DVI (if available). The following commands are available: | |||
650 | echo expand_toggle > /proc/acpi/ibm/video | 650 | echo expand_toggle > /proc/acpi/ibm/video |
651 | echo video_switch > /proc/acpi/ibm/video | 651 | echo video_switch > /proc/acpi/ibm/video |
652 | 652 | ||
653 | NOTE: Access to this feature is restricted to processes owning the | ||
654 | CAP_SYS_ADMIN capability for safety reasons, as it can interact badly | ||
655 | enough with some versions of X.org to crash it. | ||
656 | |||
653 | Each video output device can be enabled or disabled individually. | 657 | Each video output device can be enabled or disabled individually. |
654 | Reading /proc/acpi/ibm/video shows the status of each device. | 658 | Reading /proc/acpi/ibm/video shows the status of each device. |
655 | 659 | ||
diff --git a/Documentation/lguest/lguest.c b/Documentation/lguest/lguest.c index 42208511b5c0..3119f5db75bd 100644 --- a/Documentation/lguest/lguest.c +++ b/Documentation/lguest/lguest.c | |||
@@ -34,7 +34,6 @@ | |||
34 | #include <sys/uio.h> | 34 | #include <sys/uio.h> |
35 | #include <termios.h> | 35 | #include <termios.h> |
36 | #include <getopt.h> | 36 | #include <getopt.h> |
37 | #include <zlib.h> | ||
38 | #include <assert.h> | 37 | #include <assert.h> |
39 | #include <sched.h> | 38 | #include <sched.h> |
40 | #include <limits.h> | 39 | #include <limits.h> |
diff --git a/Documentation/networking/00-INDEX b/Documentation/networking/00-INDEX index 50189bf07d53..fe5c099b8fc8 100644 --- a/Documentation/networking/00-INDEX +++ b/Documentation/networking/00-INDEX | |||
@@ -32,6 +32,8 @@ cs89x0.txt | |||
32 | - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver | 32 | - the Crystal LAN (CS8900/20-based) Ethernet ISA adapter driver |
33 | cxacru.txt | 33 | cxacru.txt |
34 | - Conexant AccessRunner USB ADSL Modem | 34 | - Conexant AccessRunner USB ADSL Modem |
35 | cxacru-cf.py | ||
36 | - Conexant AccessRunner USB ADSL Modem configuration file parser | ||
35 | de4x5.txt | 37 | de4x5.txt |
36 | - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver | 38 | - the Digital EtherWORKS DE4?? and DE5?? PCI Ethernet driver |
37 | decnet.txt | 39 | decnet.txt |
diff --git a/Documentation/networking/cxacru-cf.py b/Documentation/networking/cxacru-cf.py new file mode 100644 index 000000000000..b41d298398c8 --- /dev/null +++ b/Documentation/networking/cxacru-cf.py | |||
@@ -0,0 +1,48 @@ | |||
1 | #!/usr/bin/env python | ||
2 | # Copyright 2009 Simon Arlott | ||
3 | # | ||
4 | # This program is free software; you can redistribute it and/or modify it | ||
5 | # under the terms of the GNU General Public License as published by the Free | ||
6 | # Software Foundation; either version 2 of the License, or (at your option) | ||
7 | # any later version. | ||
8 | # | ||
9 | # This program is distributed in the hope that it will be useful, but WITHOUT | ||
10 | # ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or | ||
11 | # FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for | ||
12 | # more details. | ||
13 | # | ||
14 | # You should have received a copy of the GNU General Public License along with | ||
15 | # this program; if not, write to the Free Software Foundation, Inc., 59 | ||
16 | # Temple Place - Suite 330, Boston, MA 02111-1307, USA. | ||
17 | # | ||
18 | # Usage: cxacru-cf.py < cxacru-cf.bin | ||
19 | # Output: values string suitable for the sysfs adsl_config attribute | ||
20 | # | ||
21 | # Warning: cxacru-cf.bin with MD5 hash cdbac2689969d5ed5d4850f117702110 | ||
22 | # contains mis-aligned values which will stop the modem from being able | ||
23 | # to make a connection. If the first and last two bytes are removed then | ||
24 | # the values become valid, but the modulation will be forced to ANSI | ||
25 | # T1.413 only which may not be appropriate. | ||
26 | # | ||
27 | # The original binary format is a packed list of le32 values. | ||
28 | |||
29 | import sys | ||
30 | import struct | ||
31 | |||
32 | i = 0 | ||
33 | while True: | ||
34 | buf = sys.stdin.read(4) | ||
35 | |||
36 | if len(buf) == 0: | ||
37 | break | ||
38 | elif len(buf) != 4: | ||
39 | sys.stdout.write("\n") | ||
40 | sys.stderr.write("Error: read {0} not 4 bytes\n".format(len(buf))) | ||
41 | sys.exit(1) | ||
42 | |||
43 | if i > 0: | ||
44 | sys.stdout.write(" ") | ||
45 | sys.stdout.write("{0:x}={1}".format(i, struct.unpack("<I", buf)[0])) | ||
46 | i += 1 | ||
47 | |||
48 | sys.stdout.write("\n") | ||
diff --git a/Documentation/networking/cxacru.txt b/Documentation/networking/cxacru.txt index b074681a963e..2cce04457b4d 100644 --- a/Documentation/networking/cxacru.txt +++ b/Documentation/networking/cxacru.txt | |||
@@ -4,6 +4,12 @@ While it is capable of managing/maintaining the ADSL connection without the | |||
4 | module loaded, the device will sometimes stop responding after unloading the | 4 | module loaded, the device will sometimes stop responding after unloading the |
5 | driver and it is necessary to unplug/remove power to the device to fix this. | 5 | driver and it is necessary to unplug/remove power to the device to fix this. |
6 | 6 | ||
7 | Note: support for cxacru-cf.bin has been removed. It was not loaded correctly | ||
8 | so it had no effect on the device configuration. Fixing it could have stopped | ||
9 | existing devices working when an invalid configuration is supplied. | ||
10 | |||
11 | There is a script cxacru-cf.py to convert an existing file to the sysfs form. | ||
12 | |||
7 | Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/ | 13 | Detected devices will appear as ATM devices named "cxacru". In /sys/class/atm/ |
8 | these are directories named cxacruN where N is the device number. A symlink | 14 | these are directories named cxacruN where N is the device number. A symlink |
9 | named device points to the USB interface device's directory which contains | 15 | named device points to the USB interface device's directory which contains |
@@ -15,6 +21,15 @@ several sysfs attribute files for retrieving device statistics: | |||
15 | * adsl_headend_environment | 21 | * adsl_headend_environment |
16 | Information about the remote headend. | 22 | Information about the remote headend. |
17 | 23 | ||
24 | * adsl_config | ||
25 | Configuration writing interface. | ||
26 | Write parameters in hexadecimal format <index>=<value>, | ||
27 | separated by whitespace, e.g.: | ||
28 | "1=0 a=5" | ||
29 | Up to 7 parameters at a time will be sent and the modem will restart | ||
30 | the ADSL connection when any value is set. These are logged for future | ||
31 | reference. | ||
32 | |||
18 | * downstream_attenuation (dB) | 33 | * downstream_attenuation (dB) |
19 | * downstream_bits_per_frame | 34 | * downstream_bits_per_frame |
20 | * downstream_rate (kbps) | 35 | * downstream_rate (kbps) |
@@ -61,6 +76,7 @@ several sysfs attribute files for retrieving device statistics: | |||
61 | * mac_address | 76 | * mac_address |
62 | 77 | ||
63 | * modulation | 78 | * modulation |
79 | "" (when not connected) | ||
64 | "ANSI T1.413" | 80 | "ANSI T1.413" |
65 | "ITU-T G.992.1 (G.DMT)" | 81 | "ITU-T G.992.1 (G.DMT)" |
66 | "ITU-T G.992.2 (G.LITE)" | 82 | "ITU-T G.992.2 (G.LITE)" |
diff --git a/Documentation/networking/dccp.txt b/Documentation/networking/dccp.txt index b132e4a3cf0f..a62fdf7a6bff 100644 --- a/Documentation/networking/dccp.txt +++ b/Documentation/networking/dccp.txt | |||
@@ -58,8 +58,10 @@ DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet | |||
58 | size (application payload size) in bytes, see RFC 4340, section 14. | 58 | size (application payload size) in bytes, see RFC 4340, section 14. |
59 | 59 | ||
60 | DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs | 60 | DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs |
61 | supported by the endpoint (see include/linux/dccp.h for symbolic constants). | 61 | supported by the endpoint. The option value is an array of type uint8_t whose |
62 | The caller needs to provide a sufficiently large (> 2) array of type uint8_t. | 62 | size is passed as option length. The minimum array size is 4 elements, the |
63 | value returned in the optlen argument always reflects the true number of | ||
64 | built-in CCIDs. | ||
63 | 65 | ||
64 | DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same | 66 | DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same |
65 | time, combining the operation of the next two socket options. This option is | 67 | time, combining the operation of the next two socket options. This option is |
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index 006b39dec87d..8b72c88ba213 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt | |||
@@ -487,6 +487,30 @@ tcp_dma_copybreak - INTEGER | |||
487 | and CONFIG_NET_DMA is enabled. | 487 | and CONFIG_NET_DMA is enabled. |
488 | Default: 4096 | 488 | Default: 4096 |
489 | 489 | ||
490 | tcp_thin_linear_timeouts - BOOLEAN | ||
491 | Enable dynamic triggering of linear timeouts for thin streams. | ||
492 | If set, a check is performed upon retransmission by timeout to | ||
493 | determine if the stream is thin (less than 4 packets in flight). | ||
494 | As long as the stream is found to be thin, up to 6 linear | ||
495 | timeouts may be performed before exponential backoff mode is | ||
496 | initiated. This improves retransmission latency for | ||
497 | non-aggressive thin streams, often found to be time-dependent. | ||
498 | For more information on thin streams, see | ||
499 | Documentation/networking/tcp-thin.txt | ||
500 | Default: 0 | ||
501 | |||
502 | tcp_thin_dupack - BOOLEAN | ||
503 | Enable dynamic triggering of retransmissions after one dupACK | ||
504 | for thin streams. If set, a check is performed upon reception | ||
505 | of a dupACK to determine if the stream is thin (less than 4 | ||
506 | packets in flight). As long as the stream is found to be thin, | ||
507 | data is retransmitted on the first received dupACK. This | ||
508 | improves retransmission latency for non-aggressive thin | ||
509 | streams, often found to be time-dependent. | ||
510 | For more information on thin streams, see | ||
511 | Documentation/networking/tcp-thin.txt | ||
512 | Default: 0 | ||
513 | |||
490 | UDP variables: | 514 | UDP variables: |
491 | 515 | ||
492 | udp_mem - vector of 3 INTEGERs: min, pressure, max | 516 | udp_mem - vector of 3 INTEGERs: min, pressure, max |
@@ -692,6 +716,25 @@ proxy_arp - BOOLEAN | |||
692 | conf/{all,interface}/proxy_arp is set to TRUE, | 716 | conf/{all,interface}/proxy_arp is set to TRUE, |
693 | it will be disabled otherwise | 717 | it will be disabled otherwise |
694 | 718 | ||
719 | proxy_arp_pvlan - BOOLEAN | ||
720 | Private VLAN proxy arp. | ||
721 | Basically allow proxy arp replies back to the same interface | ||
722 | (from which the ARP request/solicitation was received). | ||
723 | |||
724 | This is done to support (ethernet) switch features, like RFC | ||
725 | 3069, where the individual ports are NOT allowed to | ||
726 | communicate with each other, but they are allowed to talk to | ||
727 | the upstream router. As described in RFC 3069, it is possible | ||
728 | to allow these hosts to communicate through the upstream | ||
729 | router by proxy_arp'ing. Don't need to be used together with | ||
730 | proxy_arp. | ||
731 | |||
732 | This technology is known by different names: | ||
733 | In RFC 3069 it is called VLAN Aggregation. | ||
734 | Cisco and Allied Telesyn call it Private VLAN. | ||
735 | Hewlett-Packard call it Source-Port filtering or port-isolation. | ||
736 | Ericsson call it MAC-Forced Forwarding (RFC Draft). | ||
737 | |||
695 | shared_media - BOOLEAN | 738 | shared_media - BOOLEAN |
696 | Send(router) or accept(host) RFC1620 shared media redirects. | 739 | Send(router) or accept(host) RFC1620 shared media redirects. |
697 | Overrides ip_secure_redirects. | 740 | Overrides ip_secure_redirects. |
@@ -833,9 +876,18 @@ arp_notify - BOOLEAN | |||
833 | or hardware address changes. | 876 | or hardware address changes. |
834 | 877 | ||
835 | arp_accept - BOOLEAN | 878 | arp_accept - BOOLEAN |
836 | Define behavior when gratuitous arp replies are received: | 879 | Define behavior for gratuitous ARP frames who's IP is not |
837 | 0 - drop gratuitous arp frames | 880 | already present in the ARP table: |
838 | 1 - accept gratuitous arp frames | 881 | 0 - don't create new entries in the ARP table |
882 | 1 - create new entries in the ARP table | ||
883 | |||
884 | Both replies and requests type gratuitous arp will trigger the | ||
885 | ARP table to be updated, if this setting is on. | ||
886 | |||
887 | If the ARP table already contains the IP address of the | ||
888 | gratuitous arp frame, the arp table will be updated regardless | ||
889 | if this setting is on or off. | ||
890 | |||
839 | 891 | ||
840 | app_solicit - INTEGER | 892 | app_solicit - INTEGER |
841 | The maximum number of probes to send to the user space ARP daemon | 893 | The maximum number of probes to send to the user space ARP daemon |
@@ -1074,10 +1126,10 @@ regen_max_retry - INTEGER | |||
1074 | Default: 5 | 1126 | Default: 5 |
1075 | 1127 | ||
1076 | max_addresses - INTEGER | 1128 | max_addresses - INTEGER |
1077 | Number of maximum addresses per interface. 0 disables limitation. | 1129 | Maximum number of autoconfigured addresses per interface. Setting |
1078 | It is recommended not set too large value (or 0) because it would | 1130 | to zero disables the limitation. It is not recommended to set this |
1079 | be too easy way to crash kernel to allow to create too much of | 1131 | value too large (or to zero) because it would be an easy way to |
1080 | autoconfigured addresses. | 1132 | crash the kernel by allowing too many addresses to be created. |
1081 | Default: 16 | 1133 | Default: 16 |
1082 | 1134 | ||
1083 | disable_ipv6 - BOOLEAN | 1135 | disable_ipv6 - BOOLEAN |
diff --git a/Documentation/networking/ixgbevf.txt b/Documentation/networking/ixgbevf.txt new file mode 100755 index 000000000000..19015de6725f --- /dev/null +++ b/Documentation/networking/ixgbevf.txt | |||
@@ -0,0 +1,90 @@ | |||
1 | Linux* Base Driver for Intel(R) Network Connection | ||
2 | ================================================== | ||
3 | |||
4 | November 24, 2009 | ||
5 | |||
6 | Contents | ||
7 | ======== | ||
8 | |||
9 | - In This Release | ||
10 | - Identifying Your Adapter | ||
11 | - Known Issues/Troubleshooting | ||
12 | - Support | ||
13 | |||
14 | In This Release | ||
15 | =============== | ||
16 | |||
17 | This file describes the ixgbevf Linux* Base Driver for Intel Network | ||
18 | Connection. | ||
19 | |||
20 | The ixgbevf driver supports 82599-based virtual function devices that can only | ||
21 | be activated on kernels with CONFIG_PCI_IOV enabled. | ||
22 | |||
23 | The ixgbevf driver supports virtual functions generated by the ixgbe driver | ||
24 | with a max_vfs value of 1 or greater. | ||
25 | |||
26 | The guest OS loading the ixgbevf driver must support MSI-X interrupts. | ||
27 | |||
28 | VLANs: There is a limit of a total of 32 shared VLANs to 1 or more VFs. | ||
29 | |||
30 | Identifying Your Adapter | ||
31 | ======================== | ||
32 | |||
33 | For more information on how to identify your adapter, go to the Adapter & | ||
34 | Driver ID Guide at: | ||
35 | |||
36 | http://support.intel.com/support/network/sb/CS-008441.htm | ||
37 | |||
38 | Known Issues/Troubleshooting | ||
39 | ============================ | ||
40 | |||
41 | Unloading Physical Function (PF) Driver Causes System Reboots When VM is | ||
42 | Running and VF is Loaded on the VM | ||
43 | ------------------------------------------------------------------------ | ||
44 | Do not unload the PF driver (ixgbe) while VFs are assigned to guests. | ||
45 | |||
46 | Support | ||
47 | ======= | ||
48 | |||
49 | For general information, go to the Intel support website at: | ||
50 | |||
51 | http://support.intel.com | ||
52 | |||
53 | or the Intel Wired Networking project hosted by Sourceforge at: | ||
54 | |||
55 | http://sourceforge.net/projects/e1000 | ||
56 | |||
57 | If an issue is identified with the released source code on the supported | ||
58 | kernel with a supported adapter, email the specific information related | ||
59 | to the issue to e1000-devel@lists.sf.net | ||
60 | |||
61 | License | ||
62 | ======= | ||
63 | |||
64 | Intel 10 Gigabit Linux driver. | ||
65 | Copyright(c) 1999 - 2009 Intel Corporation. | ||
66 | |||
67 | This program is free software; you can redistribute it and/or modify it | ||
68 | under the terms and conditions of the GNU General Public License, | ||
69 | version 2, as published by the Free Software Foundation. | ||
70 | |||
71 | This program is distributed in the hope it will be useful, but WITHOUT | ||
72 | ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or | ||
73 | FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for | ||
74 | more details. | ||
75 | |||
76 | You should have received a copy of the GNU General Public License along with | ||
77 | this program; if not, write to the Free Software Foundation, Inc., | ||
78 | 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA. | ||
79 | |||
80 | The full GNU General Public License is included in this distribution in | ||
81 | the file called "COPYING". | ||
82 | |||
83 | Trademarks | ||
84 | ========== | ||
85 | |||
86 | Intel, Itanium, and Pentium are trademarks or registered trademarks of | ||
87 | Intel Corporation or its subsidiaries in the United States and other | ||
88 | countries. | ||
89 | |||
90 | * Other names and brands may be claimed as the property of others. | ||
diff --git a/Documentation/networking/packet_mmap.txt b/Documentation/networking/packet_mmap.txt index a22fd85e3796..09ab0d290326 100644 --- a/Documentation/networking/packet_mmap.txt +++ b/Documentation/networking/packet_mmap.txt | |||
@@ -2,7 +2,7 @@ | |||
2 | + ABSTRACT | 2 | + ABSTRACT |
3 | -------------------------------------------------------------------------------- | 3 | -------------------------------------------------------------------------------- |
4 | 4 | ||
5 | This file documents the CONFIG_PACKET_MMAP option available with the PACKET | 5 | This file documents the mmap() facility available with the PACKET |
6 | socket interface on 2.4 and 2.6 kernels. This type of sockets is used for | 6 | socket interface on 2.4 and 2.6 kernels. This type of sockets is used for |
7 | capture network traffic with utilities like tcpdump or any other that needs | 7 | capture network traffic with utilities like tcpdump or any other that needs |
8 | raw access to network interface. | 8 | raw access to network interface. |
@@ -44,7 +44,7 @@ enabled. For transmission, check the MTU (Maximum Transmission Unit) used and | |||
44 | supported by devices of your network. | 44 | supported by devices of your network. |
45 | 45 | ||
46 | -------------------------------------------------------------------------------- | 46 | -------------------------------------------------------------------------------- |
47 | + How to use CONFIG_PACKET_MMAP to improve capture process | 47 | + How to use mmap() to improve capture process |
48 | -------------------------------------------------------------------------------- | 48 | -------------------------------------------------------------------------------- |
49 | 49 | ||
50 | From the user standpoint, you should use the higher level libpcap library, which | 50 | From the user standpoint, you should use the higher level libpcap library, which |
@@ -64,7 +64,7 @@ the low level details or want to improve libpcap by including PACKET_MMAP | |||
64 | support. | 64 | support. |
65 | 65 | ||
66 | -------------------------------------------------------------------------------- | 66 | -------------------------------------------------------------------------------- |
67 | + How to use CONFIG_PACKET_MMAP directly to improve capture process | 67 | + How to use mmap() directly to improve capture process |
68 | -------------------------------------------------------------------------------- | 68 | -------------------------------------------------------------------------------- |
69 | 69 | ||
70 | From the system calls stand point, the use of PACKET_MMAP involves | 70 | From the system calls stand point, the use of PACKET_MMAP involves |
@@ -105,7 +105,7 @@ also the mapping of the circular buffer in the user process and | |||
105 | the use of this buffer. | 105 | the use of this buffer. |
106 | 106 | ||
107 | -------------------------------------------------------------------------------- | 107 | -------------------------------------------------------------------------------- |
108 | + How to use CONFIG_PACKET_MMAP directly to improve transmission process | 108 | + How to use mmap() directly to improve transmission process |
109 | -------------------------------------------------------------------------------- | 109 | -------------------------------------------------------------------------------- |
110 | Transmission process is similar to capture as shown below. | 110 | Transmission process is similar to capture as shown below. |
111 | 111 | ||
diff --git a/Documentation/networking/regulatory.txt b/Documentation/networking/regulatory.txt index ee31369e9e5b..9551622d0a7b 100644 --- a/Documentation/networking/regulatory.txt +++ b/Documentation/networking/regulatory.txt | |||
@@ -188,3 +188,27 @@ Then in some part of your code after your wiphy has been registered: | |||
188 | &mydriver_jp_regdom.reg_rules[i], | 188 | &mydriver_jp_regdom.reg_rules[i], |
189 | sizeof(struct ieee80211_reg_rule)); | 189 | sizeof(struct ieee80211_reg_rule)); |
190 | regulatory_struct_hint(rd); | 190 | regulatory_struct_hint(rd); |
191 | |||
192 | Statically compiled regulatory database | ||
193 | --------------------------------------- | ||
194 | |||
195 | In most situations the userland solution using CRDA as described | ||
196 | above is the preferred solution. However in some cases a set of | ||
197 | rules built into the kernel itself may be desirable. To account | ||
198 | for this situation, a configuration option has been provided | ||
199 | (i.e. CONFIG_CFG80211_INTERNAL_REGDB). With this option enabled, | ||
200 | the wireless database information contained in net/wireless/db.txt is | ||
201 | used to generate a data structure encoded in net/wireless/regdb.c. | ||
202 | That option also enables code in net/wireless/reg.c which queries | ||
203 | the data in regdb.c as an alternative to using CRDA. | ||
204 | |||
205 | The file net/wireless/db.txt should be kept up-to-date with the db.txt | ||
206 | file available in the git repository here: | ||
207 | |||
208 | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-regdb.git | ||
209 | |||
210 | Again, most users in most situations should be using the CRDA package | ||
211 | provided with their distribution, and in most other situations users | ||
212 | should be building and using CRDA on their own rather than using | ||
213 | this option. If you are not absolutely sure that you should be using | ||
214 | CONFIG_CFG80211_INTERNAL_REGDB then _DO_NOT_USE_IT_. | ||
diff --git a/Documentation/networking/skfp.txt b/Documentation/networking/skfp.txt index abfddf81e34a..203ec66c9fb4 100644 --- a/Documentation/networking/skfp.txt +++ b/Documentation/networking/skfp.txt | |||
@@ -68,7 +68,7 @@ Compaq adapters (not tested): | |||
68 | ======================= | 68 | ======================= |
69 | 69 | ||
70 | From v2.01 on, the driver is integrated in the linux kernel sources. | 70 | From v2.01 on, the driver is integrated in the linux kernel sources. |
71 | Therefor, the installation is the same as for any other adapter | 71 | Therefore, the installation is the same as for any other adapter |
72 | supported by the kernel. | 72 | supported by the kernel. |
73 | Refer to the manual of your distribution about the installation | 73 | Refer to the manual of your distribution about the installation |
74 | of network adapters. | 74 | of network adapters. |
diff --git a/Documentation/networking/tcp-thin.txt b/Documentation/networking/tcp-thin.txt new file mode 100644 index 000000000000..151e229980f1 --- /dev/null +++ b/Documentation/networking/tcp-thin.txt | |||
@@ -0,0 +1,47 @@ | |||
1 | Thin-streams and TCP | ||
2 | ==================== | ||
3 | A wide range of Internet-based services that use reliable transport | ||
4 | protocols display what we call thin-stream properties. This means | ||
5 | that the application sends data with such a low rate that the | ||
6 | retransmission mechanisms of the transport protocol are not fully | ||
7 | effective. In time-dependent scenarios (like online games, control | ||
8 | systems, stock trading etc.) where the user experience depends | ||
9 | on the data delivery latency, packet loss can be devastating for | ||
10 | the service quality. Extreme latencies are caused by TCP's | ||
11 | dependency on the arrival of new data from the application to trigger | ||
12 | retransmissions effectively through fast retransmit instead of | ||
13 | waiting for long timeouts. | ||
14 | |||
15 | After analysing a large number of time-dependent interactive | ||
16 | applications, we have seen that they often produce thin streams | ||
17 | and also stay with this traffic pattern throughout its entire | ||
18 | lifespan. The combination of time-dependency and the fact that the | ||
19 | streams provoke high latencies when using TCP is unfortunate. | ||
20 | |||
21 | In order to reduce application-layer latency when packets are lost, | ||
22 | a set of mechanisms has been made, which address these latency issues | ||
23 | for thin streams. In short, if the kernel detects a thin stream, | ||
24 | the retransmission mechanisms are modified in the following manner: | ||
25 | |||
26 | 1) If the stream is thin, fast retransmit on the first dupACK. | ||
27 | 2) If the stream is thin, do not apply exponential backoff. | ||
28 | |||
29 | These enhancements are applied only if the stream is detected as | ||
30 | thin. This is accomplished by defining a threshold for the number | ||
31 | of packets in flight. If there are less than 4 packets in flight, | ||
32 | fast retransmissions can not be triggered, and the stream is prone | ||
33 | to experience high retransmission latencies. | ||
34 | |||
35 | Since these mechanisms are targeted at time-dependent applications, | ||
36 | they must be specifically activated by the application using the | ||
37 | TCP_THIN_LINEAR_TIMEOUTS and TCP_THIN_DUPACK IOCTLS or the | ||
38 | tcp_thin_linear_timeouts and tcp_thin_dupack sysctls. Both | ||
39 | modifications are turned off by default. | ||
40 | |||
41 | References | ||
42 | ========== | ||
43 | More information on the modifications, as well as a wide range of | ||
44 | experimental data can be found here: | ||
45 | "Improving latency for interactive, thin-stream applications over | ||
46 | reliable transport" | ||
47 | http://simula.no/research/nd/publications/Simula.nd.477/simula_pdf_file | ||
diff --git a/Documentation/networking/timestamping/timestamping.c b/Documentation/networking/timestamping/timestamping.c index a7936fe8444a..bab619a48214 100644 --- a/Documentation/networking/timestamping/timestamping.c +++ b/Documentation/networking/timestamping/timestamping.c | |||
@@ -370,7 +370,7 @@ int main(int argc, char **argv) | |||
370 | } | 370 | } |
371 | 371 | ||
372 | sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP); | 372 | sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP); |
373 | if (socket < 0) | 373 | if (sock < 0) |
374 | bail("socket"); | 374 | bail("socket"); |
375 | 375 | ||
376 | memset(&device, 0, sizeof(device)); | 376 | memset(&device, 0, sizeof(device)); |
diff --git a/Documentation/pcmcia/locking.txt b/Documentation/pcmcia/locking.txt new file mode 100644 index 000000000000..68f622bc4064 --- /dev/null +++ b/Documentation/pcmcia/locking.txt | |||
@@ -0,0 +1,118 @@ | |||
1 | This file explains the locking and exclusion scheme used in the PCCARD | ||
2 | and PCMCIA subsystems. | ||
3 | |||
4 | |||
5 | A) Overview, Locking Hierarchy: | ||
6 | =============================== | ||
7 | |||
8 | pcmcia_socket_list_rwsem - protects only the list of sockets | ||
9 | - skt_mutex - serializes card insert / ejection | ||
10 | - ops_mutex - serializes socket operation | ||
11 | |||
12 | |||
13 | B) Exclusion | ||
14 | ============ | ||
15 | |||
16 | The following functions and callbacks to struct pcmcia_socket must | ||
17 | be called with "skt_mutex" held: | ||
18 | |||
19 | socket_detect_change() | ||
20 | send_event() | ||
21 | socket_reset() | ||
22 | socket_shutdown() | ||
23 | socket_setup() | ||
24 | socket_remove() | ||
25 | socket_insert() | ||
26 | socket_early_resume() | ||
27 | socket_late_resume() | ||
28 | socket_resume() | ||
29 | socket_suspend() | ||
30 | |||
31 | struct pcmcia_callback *callback | ||
32 | |||
33 | The following functions and callbacks to struct pcmcia_socket must | ||
34 | be called with "ops_mutex" held: | ||
35 | |||
36 | socket_reset() | ||
37 | socket_setup() | ||
38 | |||
39 | struct pccard_operations *ops | ||
40 | struct pccard_resource_ops *resource_ops; | ||
41 | |||
42 | Note that send_event() and struct pcmcia_callback *callback must not be | ||
43 | called with "ops_mutex" held. | ||
44 | |||
45 | |||
46 | C) Protection | ||
47 | ============= | ||
48 | |||
49 | 1. Global Data: | ||
50 | --------------- | ||
51 | struct list_head pcmcia_socket_list; | ||
52 | |||
53 | protected by pcmcia_socket_list_rwsem; | ||
54 | |||
55 | |||
56 | 2. Per-Socket Data: | ||
57 | ------------------- | ||
58 | The resource_ops and their data are protected by ops_mutex. | ||
59 | |||
60 | The "main" struct pcmcia_socket is protected as follows (read-only fields | ||
61 | or single-use fields not mentioned): | ||
62 | |||
63 | - by pcmcia_socket_list_rwsem: | ||
64 | struct list_head socket_list; | ||
65 | |||
66 | - by thread_lock: | ||
67 | unsigned int thread_events; | ||
68 | |||
69 | - by skt_mutex: | ||
70 | u_int suspended_state; | ||
71 | void (*tune_bridge); | ||
72 | struct pcmcia_callback *callback; | ||
73 | int resume_status; | ||
74 | |||
75 | - by ops_mutex: | ||
76 | socket_state_t socket; | ||
77 | u_int state; | ||
78 | u_short lock_count; | ||
79 | pccard_mem_map cis_mem; | ||
80 | void __iomem *cis_virt; | ||
81 | struct { } irq; | ||
82 | io_window_t io[]; | ||
83 | pccard_mem_map win[]; | ||
84 | struct list_head cis_cache; | ||
85 | size_t fake_cis_len; | ||
86 | u8 *fake_cis; | ||
87 | u_int irq_mask; | ||
88 | void (*zoom_video); | ||
89 | int (*power_hook); | ||
90 | u8 resource...; | ||
91 | struct list_head devices_list; | ||
92 | u8 device_count; | ||
93 | struct pcmcia_state; | ||
94 | |||
95 | |||
96 | 3. Per PCMCIA-device Data: | ||
97 | -------------------------- | ||
98 | |||
99 | The "main" struct pcmcia_devie is protected as follows (read-only fields | ||
100 | or single-use fields not mentioned): | ||
101 | |||
102 | |||
103 | - by pcmcia_socket->ops_mutex: | ||
104 | struct list_head socket_device_list; | ||
105 | struct config_t *function_config; | ||
106 | u16 _irq:1; | ||
107 | u16 _io:1; | ||
108 | u16 _win:4; | ||
109 | u16 _locked:1; | ||
110 | u16 allow_func_id_match:1; | ||
111 | u16 suspended:1; | ||
112 | u16 _removed:1; | ||
113 | |||
114 | - by the PCMCIA driver: | ||
115 | io_req_t io; | ||
116 | irq_req_t irq; | ||
117 | config_req_t conf; | ||
118 | window_handle_t win; | ||
diff --git a/Documentation/pnp.txt b/Documentation/pnp.txt index a327db67782a..763e4659bf18 100644 --- a/Documentation/pnp.txt +++ b/Documentation/pnp.txt | |||
@@ -57,7 +57,7 @@ PC standard floppy disk controller | |||
57 | # cat resources | 57 | # cat resources |
58 | DISABLED | 58 | DISABLED |
59 | 59 | ||
60 | - Notice the string "DISABLED". THis means the device is not active. | 60 | - Notice the string "DISABLED". This means the device is not active. |
61 | 61 | ||
62 | 3.) check the device's possible configurations (optional) | 62 | 3.) check the device's possible configurations (optional) |
63 | # cat options | 63 | # cat options |
@@ -139,7 +139,7 @@ Plug and Play but it is planned to be in the near future. | |||
139 | 139 | ||
140 | Requirements for a Linux PnP protocol: | 140 | Requirements for a Linux PnP protocol: |
141 | 1.) the protocol must use EISA IDs | 141 | 1.) the protocol must use EISA IDs |
142 | 2.) the protocol must inform the PnP Layer of a devices current configuration | 142 | 2.) the protocol must inform the PnP Layer of a device's current configuration |
143 | - the ability to set resources is optional but preferred. | 143 | - the ability to set resources is optional but preferred. |
144 | 144 | ||
145 | The following are PnP protocol related functions: | 145 | The following are PnP protocol related functions: |
@@ -158,7 +158,7 @@ pnp_remove_device | |||
158 | - automatically will free mem used by the device and related structures | 158 | - automatically will free mem used by the device and related structures |
159 | 159 | ||
160 | pnp_add_id | 160 | pnp_add_id |
161 | - adds a EISA ID to the list of supported IDs for the specified device | 161 | - adds an EISA ID to the list of supported IDs for the specified device |
162 | 162 | ||
163 | For more information consult the source of a protocol such as | 163 | For more information consult the source of a protocol such as |
164 | /drivers/pnp/pnpbios/core.c. | 164 | /drivers/pnp/pnpbios/core.c. |
@@ -167,7 +167,7 @@ For more information consult the source of a protocol such as | |||
167 | 167 | ||
168 | Linux Plug and Play Drivers | 168 | Linux Plug and Play Drivers |
169 | --------------------------- | 169 | --------------------------- |
170 | This section contains information for linux PnP driver developers. | 170 | This section contains information for Linux PnP driver developers. |
171 | 171 | ||
172 | The New Way | 172 | The New Way |
173 | ........... | 173 | ........... |
@@ -235,11 +235,10 @@ static int __init serial8250_pnp_init(void) | |||
235 | The Old Way | 235 | The Old Way |
236 | ........... | 236 | ........... |
237 | 237 | ||
238 | a series of compatibility functions have been created to make it easy to convert | 238 | A series of compatibility functions have been created to make it easy to convert |
239 | |||
240 | ISAPNP drivers. They should serve as a temporary solution only. | 239 | ISAPNP drivers. They should serve as a temporary solution only. |
241 | 240 | ||
242 | they are as follows: | 241 | They are as follows: |
243 | 242 | ||
244 | struct pnp_card *pnp_find_card(unsigned short vendor, | 243 | struct pnp_card *pnp_find_card(unsigned short vendor, |
245 | unsigned short device, | 244 | unsigned short device, |
diff --git a/Documentation/power/runtime_pm.txt b/Documentation/power/runtime_pm.txt index 356fd86f4ea8..55b859b3bc72 100644 --- a/Documentation/power/runtime_pm.txt +++ b/Documentation/power/runtime_pm.txt | |||
@@ -224,6 +224,12 @@ defined in include/linux/pm.h: | |||
224 | RPM_SUSPENDED, which means that each device is initially regarded by the | 224 | RPM_SUSPENDED, which means that each device is initially regarded by the |
225 | PM core as 'suspended', regardless of its real hardware status | 225 | PM core as 'suspended', regardless of its real hardware status |
226 | 226 | ||
227 | unsigned int runtime_auto; | ||
228 | - if set, indicates that the user space has allowed the device driver to | ||
229 | power manage the device at run time via the /sys/devices/.../power/control | ||
230 | interface; it may only be modified with the help of the pm_runtime_allow() | ||
231 | and pm_runtime_forbid() helper functions | ||
232 | |||
227 | All of the above fields are members of the 'power' member of 'struct device'. | 233 | All of the above fields are members of the 'power' member of 'struct device'. |
228 | 234 | ||
229 | 4. Run-time PM Device Helper Functions | 235 | 4. Run-time PM Device Helper Functions |
@@ -250,7 +256,7 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h: | |||
250 | to suspend the device again in future | 256 | to suspend the device again in future |
251 | 257 | ||
252 | int pm_runtime_resume(struct device *dev); | 258 | int pm_runtime_resume(struct device *dev); |
253 | - execute the subsystem-leve resume callback for the device; returns 0 on | 259 | - execute the subsystem-level resume callback for the device; returns 0 on |
254 | success, 1 if the device's run-time PM status was already 'active' or | 260 | success, 1 if the device's run-time PM status was already 'active' or |
255 | error code on failure, where -EAGAIN means it may be safe to attempt to | 261 | error code on failure, where -EAGAIN means it may be safe to attempt to |
256 | resume the device again in future, but 'power.runtime_error' should be | 262 | resume the device again in future, but 'power.runtime_error' should be |
@@ -329,6 +335,20 @@ drivers/base/power/runtime.c and include/linux/pm_runtime.h: | |||
329 | 'power.runtime_error' is set or 'power.disable_depth' is greater than | 335 | 'power.runtime_error' is set or 'power.disable_depth' is greater than |
330 | zero) | 336 | zero) |
331 | 337 | ||
338 | bool pm_runtime_suspended(struct device *dev); | ||
339 | - return true if the device's runtime PM status is 'suspended', or false | ||
340 | otherwise | ||
341 | |||
342 | void pm_runtime_allow(struct device *dev); | ||
343 | - set the power.runtime_auto flag for the device and decrease its usage | ||
344 | counter (used by the /sys/devices/.../power/control interface to | ||
345 | effectively allow the device to be power managed at run time) | ||
346 | |||
347 | void pm_runtime_forbid(struct device *dev); | ||
348 | - unset the power.runtime_auto flag for the device and increase its usage | ||
349 | counter (used by the /sys/devices/.../power/control interface to | ||
350 | effectively prevent the device from being power managed at run time) | ||
351 | |||
332 | It is safe to execute the following helper functions from interrupt context: | 352 | It is safe to execute the following helper functions from interrupt context: |
333 | 353 | ||
334 | pm_request_idle() | 354 | pm_request_idle() |
@@ -382,6 +402,18 @@ may be desirable to suspend the device as soon as ->probe() or ->remove() has | |||
382 | finished, so the PM core uses pm_runtime_idle_sync() to invoke the | 402 | finished, so the PM core uses pm_runtime_idle_sync() to invoke the |
383 | subsystem-level idle callback for the device at that time. | 403 | subsystem-level idle callback for the device at that time. |
384 | 404 | ||
405 | The user space can effectively disallow the driver of the device to power manage | ||
406 | it at run time by changing the value of its /sys/devices/.../power/control | ||
407 | attribute to "on", which causes pm_runtime_forbid() to be called. In principle, | ||
408 | this mechanism may also be used by the driver to effectively turn off the | ||
409 | run-time power management of the device until the user space turns it on. | ||
410 | Namely, during the initialization the driver can make sure that the run-time PM | ||
411 | status of the device is 'active' and call pm_runtime_forbid(). It should be | ||
412 | noted, however, that if the user space has already intentionally changed the | ||
413 | value of /sys/devices/.../power/control to "auto" to allow the driver to power | ||
414 | manage the device at run time, the driver may confuse it by using | ||
415 | pm_runtime_forbid() this way. | ||
416 | |||
385 | 6. Run-time PM and System Sleep | 417 | 6. Run-time PM and System Sleep |
386 | 418 | ||
387 | Run-time PM and system sleep (i.e., system suspend and hibernation, also known | 419 | Run-time PM and system sleep (i.e., system suspend and hibernation, also known |
@@ -431,3 +463,64 @@ The PM core always increments the run-time usage counter before calling the | |||
431 | ->prepare() callback and decrements it after calling the ->complete() callback. | 463 | ->prepare() callback and decrements it after calling the ->complete() callback. |
432 | Hence disabling run-time PM temporarily like this will not cause any run-time | 464 | Hence disabling run-time PM temporarily like this will not cause any run-time |
433 | suspend callbacks to be lost. | 465 | suspend callbacks to be lost. |
466 | |||
467 | 7. Generic subsystem callbacks | ||
468 | |||
469 | Subsystems may wish to conserve code space by using the set of generic power | ||
470 | management callbacks provided by the PM core, defined in | ||
471 | driver/base/power/generic_ops.c: | ||
472 | |||
473 | int pm_generic_runtime_idle(struct device *dev); | ||
474 | - invoke the ->runtime_idle() callback provided by the driver of this | ||
475 | device, if defined, and call pm_runtime_suspend() for this device if the | ||
476 | return value is 0 or the callback is not defined | ||
477 | |||
478 | int pm_generic_runtime_suspend(struct device *dev); | ||
479 | - invoke the ->runtime_suspend() callback provided by the driver of this | ||
480 | device and return its result, or return -EINVAL if not defined | ||
481 | |||
482 | int pm_generic_runtime_resume(struct device *dev); | ||
483 | - invoke the ->runtime_resume() callback provided by the driver of this | ||
484 | device and return its result, or return -EINVAL if not defined | ||
485 | |||
486 | int pm_generic_suspend(struct device *dev); | ||
487 | - if the device has not been suspended at run time, invoke the ->suspend() | ||
488 | callback provided by its driver and return its result, or return 0 if not | ||
489 | defined | ||
490 | |||
491 | int pm_generic_resume(struct device *dev); | ||
492 | - invoke the ->resume() callback provided by the driver of this device and, | ||
493 | if successful, change the device's runtime PM status to 'active' | ||
494 | |||
495 | int pm_generic_freeze(struct device *dev); | ||
496 | - if the device has not been suspended at run time, invoke the ->freeze() | ||
497 | callback provided by its driver and return its result, or return 0 if not | ||
498 | defined | ||
499 | |||
500 | int pm_generic_thaw(struct device *dev); | ||
501 | - if the device has not been suspended at run time, invoke the ->thaw() | ||
502 | callback provided by its driver and return its result, or return 0 if not | ||
503 | defined | ||
504 | |||
505 | int pm_generic_poweroff(struct device *dev); | ||
506 | - if the device has not been suspended at run time, invoke the ->poweroff() | ||
507 | callback provided by its driver and return its result, or return 0 if not | ||
508 | defined | ||
509 | |||
510 | int pm_generic_restore(struct device *dev); | ||
511 | - invoke the ->restore() callback provided by the driver of this device and, | ||
512 | if successful, change the device's runtime PM status to 'active' | ||
513 | |||
514 | These functions can be assigned to the ->runtime_idle(), ->runtime_suspend(), | ||
515 | ->runtime_resume(), ->suspend(), ->resume(), ->freeze(), ->thaw(), ->poweroff(), | ||
516 | or ->restore() callback pointers in the subsystem-level dev_pm_ops structures. | ||
517 | |||
518 | If a subsystem wishes to use all of them at the same time, it can simply assign | ||
519 | the GENERIC_SUBSYS_PM_OPS macro, defined in include/linux/pm.h, to its | ||
520 | dev_pm_ops structure pointer. | ||
521 | |||
522 | Device drivers that wish to use the same function as a system suspend, freeze, | ||
523 | poweroff and run-time suspend callback, and similarly for system resume, thaw, | ||
524 | restore, and run-time resume, can achieve this with the help of the | ||
525 | UNIVERSAL_DEV_PM_OPS macro defined in include/linux/pm.h (possibly setting its | ||
526 | last argument to NULL). | ||
diff --git a/Documentation/powerpc/dts-bindings/fsl/can.txt b/Documentation/powerpc/dts-bindings/fsl/can.txt new file mode 100644 index 000000000000..2fa4fcd38fd6 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/fsl/can.txt | |||
@@ -0,0 +1,53 @@ | |||
1 | CAN Device Tree Bindings | ||
2 | ------------------------ | ||
3 | |||
4 | (c) 2006-2009 Secret Lab Technologies Ltd | ||
5 | Grant Likely <grant.likely@secretlab.ca> | ||
6 | |||
7 | fsl,mpc5200-mscan nodes | ||
8 | ----------------------- | ||
9 | In addition to the required compatible-, reg- and interrupt-properties, you can | ||
10 | also specify which clock source shall be used for the controller: | ||
11 | |||
12 | - fsl,mscan-clock-source : a string describing the clock source. Valid values | ||
13 | are: "ip" for ip bus clock | ||
14 | "ref" for reference clock (XTAL) | ||
15 | "ref" is default in case this property is not | ||
16 | present. | ||
17 | |||
18 | fsl,mpc5121-mscan nodes | ||
19 | ----------------------- | ||
20 | In addition to the required compatible-, reg- and interrupt-properties, you can | ||
21 | also specify which clock source and divider shall be used for the controller: | ||
22 | |||
23 | - fsl,mscan-clock-source : a string describing the clock source. Valid values | ||
24 | are: "ip" for ip bus clock | ||
25 | "ref" for reference clock | ||
26 | "sys" for system clock | ||
27 | If this property is not present, an optimal CAN | ||
28 | clock source and frequency based on the system | ||
29 | clock will be selected. If this is not possible, | ||
30 | the reference clock will be used. | ||
31 | |||
32 | - fsl,mscan-clock-divider: for the reference and system clock, an additional | ||
33 | clock divider can be specified. By default, a | ||
34 | value of 1 is used. | ||
35 | |||
36 | Note that the MPC5121 Rev. 1 processor is not supported. | ||
37 | |||
38 | Examples: | ||
39 | can@1300 { | ||
40 | compatible = "fsl,mpc5121-mscan"; | ||
41 | interrupts = <12 0x8>; | ||
42 | interrupt-parent = <&ipic>; | ||
43 | reg = <0x1300 0x80>; | ||
44 | }; | ||
45 | |||
46 | can@1380 { | ||
47 | compatible = "fsl,mpc5121-mscan"; | ||
48 | interrupts = <13 0x8>; | ||
49 | interrupt-parent = <&ipic>; | ||
50 | reg = <0x1380 0x80>; | ||
51 | fsl,mscan-clock-source = "ref"; | ||
52 | fsl,mscan-clock-divider = <3>; | ||
53 | }; | ||
diff --git a/Documentation/powerpc/dts-bindings/fsl/dma.txt b/Documentation/powerpc/dts-bindings/fsl/dma.txt index 0732cdd05ba1..2a4b4bce6110 100644 --- a/Documentation/powerpc/dts-bindings/fsl/dma.txt +++ b/Documentation/powerpc/dts-bindings/fsl/dma.txt | |||
@@ -44,21 +44,29 @@ Example: | |||
44 | compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; | 44 | compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; |
45 | cell-index = <0>; | 45 | cell-index = <0>; |
46 | reg = <0 0x80>; | 46 | reg = <0 0x80>; |
47 | interrupt-parent = <&ipic>; | ||
48 | interrupts = <71 8>; | ||
47 | }; | 49 | }; |
48 | dma-channel@80 { | 50 | dma-channel@80 { |
49 | compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; | 51 | compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; |
50 | cell-index = <1>; | 52 | cell-index = <1>; |
51 | reg = <0x80 0x80>; | 53 | reg = <0x80 0x80>; |
54 | interrupt-parent = <&ipic>; | ||
55 | interrupts = <71 8>; | ||
52 | }; | 56 | }; |
53 | dma-channel@100 { | 57 | dma-channel@100 { |
54 | compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; | 58 | compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; |
55 | cell-index = <2>; | 59 | cell-index = <2>; |
56 | reg = <0x100 0x80>; | 60 | reg = <0x100 0x80>; |
61 | interrupt-parent = <&ipic>; | ||
62 | interrupts = <71 8>; | ||
57 | }; | 63 | }; |
58 | dma-channel@180 { | 64 | dma-channel@180 { |
59 | compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; | 65 | compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; |
60 | cell-index = <3>; | 66 | cell-index = <3>; |
61 | reg = <0x180 0x80>; | 67 | reg = <0x180 0x80>; |
68 | interrupt-parent = <&ipic>; | ||
69 | interrupts = <71 8>; | ||
62 | }; | 70 | }; |
63 | }; | 71 | }; |
64 | 72 | ||
diff --git a/Documentation/powerpc/dts-bindings/fsl/i2c.txt b/Documentation/powerpc/dts-bindings/fsl/i2c.txt index b6d2e21474f9..50da20310585 100644 --- a/Documentation/powerpc/dts-bindings/fsl/i2c.txt +++ b/Documentation/powerpc/dts-bindings/fsl/i2c.txt | |||
@@ -2,15 +2,14 @@ | |||
2 | 2 | ||
3 | Required properties : | 3 | Required properties : |
4 | 4 | ||
5 | - device_type : Should be "i2c" | ||
6 | - reg : Offset and length of the register set for the device | 5 | - reg : Offset and length of the register set for the device |
6 | - compatible : should be "fsl,CHIP-i2c" where CHIP is the name of a | ||
7 | compatible processor, e.g. mpc8313, mpc8543, mpc8544, mpc5121, | ||
8 | mpc5200 or mpc5200b. For the mpc5121, an additional node | ||
9 | "fsl,mpc5121-i2c-ctrl" is required as shown in the example below. | ||
7 | 10 | ||
8 | Recommended properties : | 11 | Recommended properties : |
9 | 12 | ||
10 | - compatible : compatibility list with 2 entries, the first should | ||
11 | be "fsl,CHIP-i2c" where CHIP is the name of a compatible processor, | ||
12 | e.g. mpc8313, mpc8543, mpc8544, mpc5200 or mpc5200b. The second one | ||
13 | should be "fsl-i2c". | ||
14 | - interrupts : <a b> where a is the interrupt number and b is a | 13 | - interrupts : <a b> where a is the interrupt number and b is a |
15 | field that represents an encoding of the sense and level | 14 | field that represents an encoding of the sense and level |
16 | information for the interrupt. This should be encoded based on | 15 | information for the interrupt. This should be encoded based on |
@@ -24,25 +23,40 @@ Recommended properties : | |||
24 | 23 | ||
25 | Examples : | 24 | Examples : |
26 | 25 | ||
26 | /* MPC5121 based board */ | ||
27 | i2c@1740 { | ||
28 | #address-cells = <1>; | ||
29 | #size-cells = <0>; | ||
30 | compatible = "fsl,mpc5121-i2c", "fsl-i2c"; | ||
31 | reg = <0x1740 0x20>; | ||
32 | interrupts = <11 0x8>; | ||
33 | interrupt-parent = <&ipic>; | ||
34 | clock-frequency = <100000>; | ||
35 | }; | ||
36 | |||
37 | i2ccontrol@1760 { | ||
38 | compatible = "fsl,mpc5121-i2c-ctrl"; | ||
39 | reg = <0x1760 0x8>; | ||
40 | }; | ||
41 | |||
42 | /* MPC5200B based board */ | ||
27 | i2c@3d00 { | 43 | i2c@3d00 { |
28 | #address-cells = <1>; | 44 | #address-cells = <1>; |
29 | #size-cells = <0>; | 45 | #size-cells = <0>; |
30 | compatible = "fsl,mpc5200b-i2c","fsl,mpc5200-i2c","fsl-i2c"; | 46 | compatible = "fsl,mpc5200b-i2c","fsl,mpc5200-i2c","fsl-i2c"; |
31 | cell-index = <0>; | ||
32 | reg = <0x3d00 0x40>; | 47 | reg = <0x3d00 0x40>; |
33 | interrupts = <2 15 0>; | 48 | interrupts = <2 15 0>; |
34 | interrupt-parent = <&mpc5200_pic>; | 49 | interrupt-parent = <&mpc5200_pic>; |
35 | fsl,preserve-clocking; | 50 | fsl,preserve-clocking; |
36 | }; | 51 | }; |
37 | 52 | ||
53 | /* MPC8544 base board */ | ||
38 | i2c@3100 { | 54 | i2c@3100 { |
39 | #address-cells = <1>; | 55 | #address-cells = <1>; |
40 | #size-cells = <0>; | 56 | #size-cells = <0>; |
41 | cell-index = <1>; | ||
42 | compatible = "fsl,mpc8544-i2c", "fsl-i2c"; | 57 | compatible = "fsl,mpc8544-i2c", "fsl-i2c"; |
43 | reg = <0x3100 0x100>; | 58 | reg = <0x3100 0x100>; |
44 | interrupts = <43 2>; | 59 | interrupts = <43 2>; |
45 | interrupt-parent = <&mpic>; | 60 | interrupt-parent = <&mpic>; |
46 | clock-frequency = <400000>; | 61 | clock-frequency = <400000>; |
47 | }; | 62 | }; |
48 | |||
diff --git a/Documentation/powerpc/dts-bindings/fsl/mpc5121-psc.txt b/Documentation/powerpc/dts-bindings/fsl/mpc5121-psc.txt new file mode 100644 index 000000000000..8832e8798912 --- /dev/null +++ b/Documentation/powerpc/dts-bindings/fsl/mpc5121-psc.txt | |||
@@ -0,0 +1,70 @@ | |||
1 | MPC5121 PSC Device Tree Bindings | ||
2 | |||
3 | PSC in UART mode | ||
4 | ---------------- | ||
5 | |||
6 | For PSC in UART mode the needed PSC serial devices | ||
7 | are specified by fsl,mpc5121-psc-uart nodes in the | ||
8 | fsl,mpc5121-immr SoC node. Additionally the PSC FIFO | ||
9 | Controller node fsl,mpc5121-psc-fifo is requered there: | ||
10 | |||
11 | fsl,mpc5121-psc-uart nodes | ||
12 | -------------------------- | ||
13 | |||
14 | Required properties : | ||
15 | - compatible : Should contain "fsl,mpc5121-psc-uart" and "fsl,mpc5121-psc" | ||
16 | - cell-index : Index of the PSC in hardware | ||
17 | - reg : Offset and length of the register set for the PSC device | ||
18 | - interrupts : <a b> where a is the interrupt number of the | ||
19 | PSC FIFO Controller and b is a field that represents an | ||
20 | encoding of the sense and level information for the interrupt. | ||
21 | - interrupt-parent : the phandle for the interrupt controller that | ||
22 | services interrupts for this device. | ||
23 | |||
24 | Recommended properties : | ||
25 | - fsl,rx-fifo-size : the size of the RX fifo slice (a multiple of 4) | ||
26 | - fsl,tx-fifo-size : the size of the TX fifo slice (a multiple of 4) | ||
27 | |||
28 | |||
29 | fsl,mpc5121-psc-fifo node | ||
30 | ------------------------- | ||
31 | |||
32 | Required properties : | ||
33 | - compatible : Should be "fsl,mpc5121-psc-fifo" | ||
34 | - reg : Offset and length of the register set for the PSC | ||
35 | FIFO Controller | ||
36 | - interrupts : <a b> where a is the interrupt number of the | ||
37 | PSC FIFO Controller and b is a field that represents an | ||
38 | encoding of the sense and level information for the interrupt. | ||
39 | - interrupt-parent : the phandle for the interrupt controller that | ||
40 | services interrupts for this device. | ||
41 | |||
42 | |||
43 | Example for a board using PSC0 and PSC1 devices in serial mode: | ||
44 | |||
45 | serial@11000 { | ||
46 | compatible = "fsl,mpc5121-psc-uart", "fsl,mpc5121-psc"; | ||
47 | cell-index = <0>; | ||
48 | reg = <0x11000 0x100>; | ||
49 | interrupts = <40 0x8>; | ||
50 | interrupt-parent = < &ipic >; | ||
51 | fsl,rx-fifo-size = <16>; | ||
52 | fsl,tx-fifo-size = <16>; | ||
53 | }; | ||
54 | |||
55 | serial@11100 { | ||
56 | compatible = "fsl,mpc5121-psc-uart", "fsl,mpc5121-psc"; | ||
57 | cell-index = <1>; | ||
58 | reg = <0x11100 0x100>; | ||
59 | interrupts = <40 0x8>; | ||
60 | interrupt-parent = < &ipic >; | ||
61 | fsl,rx-fifo-size = <16>; | ||
62 | fsl,tx-fifo-size = <16>; | ||
63 | }; | ||
64 | |||
65 | pscfifo@11f00 { | ||
66 | compatible = "fsl,mpc5121-psc-fifo"; | ||
67 | reg = <0x11f00 0x100>; | ||
68 | interrupts = <40 0x8>; | ||
69 | interrupt-parent = < &ipic >; | ||
70 | }; | ||
diff --git a/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt b/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt index 5c6602dbfdc2..4ccb2cd5df94 100644 --- a/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt +++ b/Documentation/powerpc/dts-bindings/fsl/mpc5200.txt | |||
@@ -195,11 +195,4 @@ External interrupts: | |||
195 | 195 | ||
196 | fsl,mpc5200-mscan nodes | 196 | fsl,mpc5200-mscan nodes |
197 | ----------------------- | 197 | ----------------------- |
198 | In addition to the required compatible-, reg- and interrupt-properites, you can | 198 | See file can.txt in this directory. |
199 | also specify which clock source shall be used for the controller: | ||
200 | |||
201 | - fsl,mscan-clock-source- a string describing the clock source. Valid values | ||
202 | are: "ip" for ip bus clock | ||
203 | "ref" for reference clock (XTAL) | ||
204 | "ref" is default in case this property is not | ||
205 | present. | ||
diff --git a/Documentation/powerpc/dts-bindings/fsl/spi.txt b/Documentation/powerpc/dts-bindings/fsl/spi.txt index e7d9a344c4f4..80510c018eea 100644 --- a/Documentation/powerpc/dts-bindings/fsl/spi.txt +++ b/Documentation/powerpc/dts-bindings/fsl/spi.txt | |||
@@ -13,6 +13,11 @@ Required properties: | |||
13 | - interrupt-parent : the phandle for the interrupt controller that | 13 | - interrupt-parent : the phandle for the interrupt controller that |
14 | services interrupts for this device. | 14 | services interrupts for this device. |
15 | 15 | ||
16 | Optional properties: | ||
17 | - gpios : specifies the gpio pins to be used for chipselects. | ||
18 | The gpios will be referred to as reg = <index> in the SPI child nodes. | ||
19 | If unspecified, a single SPI device without a chip select can be used. | ||
20 | |||
16 | Example: | 21 | Example: |
17 | spi@4c0 { | 22 | spi@4c0 { |
18 | cell-index = <0>; | 23 | cell-index = <0>; |
@@ -21,4 +26,6 @@ Example: | |||
21 | interrupts = <82 0>; | 26 | interrupts = <82 0>; |
22 | interrupt-parent = <700>; | 27 | interrupt-parent = <700>; |
23 | mode = "cpu"; | 28 | mode = "cpu"; |
29 | gpios = <&gpio 18 1 // device reg=<0> | ||
30 | &gpio 19 1>; // device reg=<1> | ||
24 | }; | 31 | }; |
diff --git a/Documentation/powerpc/ptrace.txt b/Documentation/powerpc/ptrace.txt new file mode 100644 index 000000000000..f4a5499b7bc6 --- /dev/null +++ b/Documentation/powerpc/ptrace.txt | |||
@@ -0,0 +1,134 @@ | |||
1 | GDB intends to support the following hardware debug features of BookE | ||
2 | processors: | ||
3 | |||
4 | 4 hardware breakpoints (IAC) | ||
5 | 2 hardware watchpoints (read, write and read-write) (DAC) | ||
6 | 2 value conditions for the hardware watchpoints (DVC) | ||
7 | |||
8 | For that, we need to extend ptrace so that GDB can query and set these | ||
9 | resources. Since we're extending, we're trying to create an interface | ||
10 | that's extendable and that covers both BookE and server processors, so | ||
11 | that GDB doesn't need to special-case each of them. We added the | ||
12 | following 3 new ptrace requests. | ||
13 | |||
14 | 1. PTRACE_PPC_GETHWDEBUGINFO | ||
15 | |||
16 | Query for GDB to discover the hardware debug features. The main info to | ||
17 | be returned here is the minimum alignment for the hardware watchpoints. | ||
18 | BookE processors don't have restrictions here, but server processors have | ||
19 | an 8-byte alignment restriction for hardware watchpoints. We'd like to avoid | ||
20 | adding special cases to GDB based on what it sees in AUXV. | ||
21 | |||
22 | Since we're at it, we added other useful info that the kernel can return to | ||
23 | GDB: this query will return the number of hardware breakpoints, hardware | ||
24 | watchpoints and whether it supports a range of addresses and a condition. | ||
25 | The query will fill the following structure provided by the requesting process: | ||
26 | |||
27 | struct ppc_debug_info { | ||
28 | unit32_t version; | ||
29 | unit32_t num_instruction_bps; | ||
30 | unit32_t num_data_bps; | ||
31 | unit32_t num_condition_regs; | ||
32 | unit32_t data_bp_alignment; | ||
33 | unit32_t sizeof_condition; /* size of the DVC register */ | ||
34 | uint64_t features; /* bitmask of the individual flags */ | ||
35 | }; | ||
36 | |||
37 | features will have bits indicating whether there is support for: | ||
38 | |||
39 | #define PPC_DEBUG_FEATURE_INSN_BP_RANGE 0x1 | ||
40 | #define PPC_DEBUG_FEATURE_INSN_BP_MASK 0x2 | ||
41 | #define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x4 | ||
42 | #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x8 | ||
43 | |||
44 | 2. PTRACE_SETHWDEBUG | ||
45 | |||
46 | Sets a hardware breakpoint or watchpoint, according to the provided structure: | ||
47 | |||
48 | struct ppc_hw_breakpoint { | ||
49 | uint32_t version; | ||
50 | #define PPC_BREAKPOINT_TRIGGER_EXECUTE 0x1 | ||
51 | #define PPC_BREAKPOINT_TRIGGER_READ 0x2 | ||
52 | #define PPC_BREAKPOINT_TRIGGER_WRITE 0x4 | ||
53 | uint32_t trigger_type; /* only some combinations allowed */ | ||
54 | #define PPC_BREAKPOINT_MODE_EXACT 0x0 | ||
55 | #define PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE 0x1 | ||
56 | #define PPC_BREAKPOINT_MODE_RANGE_EXCLUSIVE 0x2 | ||
57 | #define PPC_BREAKPOINT_MODE_MASK 0x3 | ||
58 | uint32_t addr_mode; /* address match mode */ | ||
59 | |||
60 | #define PPC_BREAKPOINT_CONDITION_MODE 0x3 | ||
61 | #define PPC_BREAKPOINT_CONDITION_NONE 0x0 | ||
62 | #define PPC_BREAKPOINT_CONDITION_AND 0x1 | ||
63 | #define PPC_BREAKPOINT_CONDITION_EXACT 0x1 /* different name for the same thing as above */ | ||
64 | #define PPC_BREAKPOINT_CONDITION_OR 0x2 | ||
65 | #define PPC_BREAKPOINT_CONDITION_AND_OR 0x3 | ||
66 | #define PPC_BREAKPOINT_CONDITION_BE_ALL 0x00ff0000 /* byte enable bits */ | ||
67 | #define PPC_BREAKPOINT_CONDITION_BE(n) (1<<((n)+16)) | ||
68 | uint32_t condition_mode; /* break/watchpoint condition flags */ | ||
69 | |||
70 | uint64_t addr; | ||
71 | uint64_t addr2; | ||
72 | uint64_t condition_value; | ||
73 | }; | ||
74 | |||
75 | A request specifies one event, not necessarily just one register to be set. | ||
76 | For instance, if the request is for a watchpoint with a condition, both the | ||
77 | DAC and DVC registers will be set in the same request. | ||
78 | |||
79 | With this GDB can ask for all kinds of hardware breakpoints and watchpoints | ||
80 | that the BookE supports. COMEFROM breakpoints available in server processors | ||
81 | are not contemplated, but that is out of the scope of this work. | ||
82 | |||
83 | ptrace will return an integer (handle) uniquely identifying the breakpoint or | ||
84 | watchpoint just created. This integer will be used in the PTRACE_DELHWDEBUG | ||
85 | request to ask for its removal. Return -ENOSPC if the requested breakpoint | ||
86 | can't be allocated on the registers. | ||
87 | |||
88 | Some examples of using the structure to: | ||
89 | |||
90 | - set a breakpoint in the first breakpoint register | ||
91 | |||
92 | p.version = PPC_DEBUG_CURRENT_VERSION; | ||
93 | p.trigger_type = PPC_BREAKPOINT_TRIGGER_EXECUTE; | ||
94 | p.addr_mode = PPC_BREAKPOINT_MODE_EXACT; | ||
95 | p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE; | ||
96 | p.addr = (uint64_t) address; | ||
97 | p.addr2 = 0; | ||
98 | p.condition_value = 0; | ||
99 | |||
100 | - set a watchpoint which triggers on reads in the second watchpoint register | ||
101 | |||
102 | p.version = PPC_DEBUG_CURRENT_VERSION; | ||
103 | p.trigger_type = PPC_BREAKPOINT_TRIGGER_READ; | ||
104 | p.addr_mode = PPC_BREAKPOINT_MODE_EXACT; | ||
105 | p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE; | ||
106 | p.addr = (uint64_t) address; | ||
107 | p.addr2 = 0; | ||
108 | p.condition_value = 0; | ||
109 | |||
110 | - set a watchpoint which triggers only with a specific value | ||
111 | |||
112 | p.version = PPC_DEBUG_CURRENT_VERSION; | ||
113 | p.trigger_type = PPC_BREAKPOINT_TRIGGER_READ; | ||
114 | p.addr_mode = PPC_BREAKPOINT_MODE_EXACT; | ||
115 | p.condition_mode = PPC_BREAKPOINT_CONDITION_AND | PPC_BREAKPOINT_CONDITION_BE_ALL; | ||
116 | p.addr = (uint64_t) address; | ||
117 | p.addr2 = 0; | ||
118 | p.condition_value = (uint64_t) condition; | ||
119 | |||
120 | - set a ranged hardware breakpoint | ||
121 | |||
122 | p.version = PPC_DEBUG_CURRENT_VERSION; | ||
123 | p.trigger_type = PPC_BREAKPOINT_TRIGGER_EXECUTE; | ||
124 | p.addr_mode = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE; | ||
125 | p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE; | ||
126 | p.addr = (uint64_t) begin_range; | ||
127 | p.addr2 = (uint64_t) end_range; | ||
128 | p.condition_value = 0; | ||
129 | |||
130 | 3. PTRACE_DELHWDEBUG | ||
131 | |||
132 | Takes an integer which identifies an existing breakpoint or watchpoint | ||
133 | (i.e., the value returned from PTRACE_SETHWDEBUG), and deletes the | ||
134 | corresponding breakpoint or watchpoint.. | ||
diff --git a/Documentation/s390/CommonIO b/Documentation/s390/CommonIO index 339207d11d95..d378cba66456 100644 --- a/Documentation/s390/CommonIO +++ b/Documentation/s390/CommonIO | |||
@@ -87,6 +87,12 @@ Command line parameters | |||
87 | compatibility, by the device number in hexadecimal (0xabcd or abcd). Device | 87 | compatibility, by the device number in hexadecimal (0xabcd or abcd). Device |
88 | numbers given as 0xabcd will be interpreted as 0.0.abcd. | 88 | numbers given as 0xabcd will be interpreted as 0.0.abcd. |
89 | 89 | ||
90 | * /proc/cio_settle | ||
91 | |||
92 | A write request to this file is blocked until all queued cio actions are | ||
93 | handled. This will allow userspace to wait for pending work affecting | ||
94 | device availability after changing cio_ignore or the hardware configuration. | ||
95 | |||
90 | * For some of the information present in the /proc filesystem in 2.4 (namely, | 96 | * For some of the information present in the /proc filesystem in 2.4 (namely, |
91 | /proc/subchannels and /proc/chpids), see driver-model.txt. | 97 | /proc/subchannels and /proc/chpids), see driver-model.txt. |
92 | Information formerly in /proc/irq_count is now in /proc/interrupts. | 98 | Information formerly in /proc/irq_count is now in /proc/interrupts. |
diff --git a/Documentation/s390/driver-model.txt b/Documentation/s390/driver-model.txt index bde473df748d..ed265cf54cde 100644 --- a/Documentation/s390/driver-model.txt +++ b/Documentation/s390/driver-model.txt | |||
@@ -223,8 +223,8 @@ touched by the driver - it should use the ccwgroup device's driver_data for its | |||
223 | private data. | 223 | private data. |
224 | 224 | ||
225 | To implement a ccwgroup driver, please refer to include/asm/ccwgroup.h. Keep in | 225 | To implement a ccwgroup driver, please refer to include/asm/ccwgroup.h. Keep in |
226 | mind that most drivers will need to implement both a ccwgroup and a ccw driver | 226 | mind that most drivers will need to implement both a ccwgroup and a ccw |
227 | (unless you have a meta ccw driver, like cu3088 for lcs and ctc). | 227 | driver. |
228 | 228 | ||
229 | 229 | ||
230 | 2. Channel paths | 230 | 2. Channel paths |
diff --git a/Documentation/s390/kvm.txt b/Documentation/s390/kvm.txt index 6f5ceb0f09fc..85f3280d7ef6 100644 --- a/Documentation/s390/kvm.txt +++ b/Documentation/s390/kvm.txt | |||
@@ -102,7 +102,7 @@ args: unsigned long | |||
102 | see also: include/linux/kvm.h | 102 | see also: include/linux/kvm.h |
103 | This ioctl stores the state of the cpu at the guest real address given as | 103 | This ioctl stores the state of the cpu at the guest real address given as |
104 | argument, unless one of the following values defined in include/linux/kvm.h | 104 | argument, unless one of the following values defined in include/linux/kvm.h |
105 | is given as arguement: | 105 | is given as argument: |
106 | KVM_S390_STORE_STATUS_NOADDR - the CPU stores its status to the save area in | 106 | KVM_S390_STORE_STATUS_NOADDR - the CPU stores its status to the save area in |
107 | absolute lowcore as defined by the principles of operation | 107 | absolute lowcore as defined by the principles of operation |
108 | KVM_S390_STORE_STATUS_PREFIXED - the CPU stores its status to the save area in | 108 | KVM_S390_STORE_STATUS_PREFIXED - the CPU stores its status to the save area in |
diff --git a/Documentation/scsi/ChangeLog.lpfc b/Documentation/scsi/ChangeLog.lpfc index ff19a52fe004..2ffc1148eb95 100644 --- a/Documentation/scsi/ChangeLog.lpfc +++ b/Documentation/scsi/ChangeLog.lpfc | |||
@@ -989,8 +989,8 @@ Changes from 20040709 to 20040716 | |||
989 | * Remove redundant port_cmp != 2 check in if | 989 | * Remove redundant port_cmp != 2 check in if |
990 | (!port_cmp) { .... if (port_cmp != 2).... } | 990 | (!port_cmp) { .... if (port_cmp != 2).... } |
991 | * Clock changes: removed struct clk_data and timerList. | 991 | * Clock changes: removed struct clk_data and timerList. |
992 | * Clock changes: seperate nodev_tmo and els_retry_delay into 2 | 992 | * Clock changes: separate nodev_tmo and els_retry_delay into 2 |
993 | seperate timers and convert to 1 argument changed | 993 | separate timers and convert to 1 argument changed |
994 | LPFC_NODE_FARP_PEND_t to struct lpfc_node_farp_pend convert | 994 | LPFC_NODE_FARP_PEND_t to struct lpfc_node_farp_pend convert |
995 | ipfarp_tmo to 1 argument convert target struct tmofunc and | 995 | ipfarp_tmo to 1 argument convert target struct tmofunc and |
996 | rtplunfunc to 1 argument * cr_count, cr_delay and | 996 | rtplunfunc to 1 argument * cr_count, cr_delay and |
@@ -1514,7 +1514,7 @@ Changes from 20040402 to 20040409 | |||
1514 | * Remove unused elxclock declaration in elx_sli.h. | 1514 | * Remove unused elxclock declaration in elx_sli.h. |
1515 | * Since everywhere IOCB_ENTRY is used, the return value is cast, | 1515 | * Since everywhere IOCB_ENTRY is used, the return value is cast, |
1516 | move the cast into the macro. | 1516 | move the cast into the macro. |
1517 | * Split ioctls out into seperate files | 1517 | * Split ioctls out into separate files |
1518 | 1518 | ||
1519 | Changes from 20040326 to 20040402 | 1519 | Changes from 20040326 to 20040402 |
1520 | 1520 | ||
@@ -1534,7 +1534,7 @@ Changes from 20040326 to 20040402 | |||
1534 | * Unused variable cleanup | 1534 | * Unused variable cleanup |
1535 | * Use Linux list macros for DMABUF_t | 1535 | * Use Linux list macros for DMABUF_t |
1536 | * Break up ioctls into 3 sections, dfc, util, hbaapi | 1536 | * Break up ioctls into 3 sections, dfc, util, hbaapi |
1537 | rearranged code so this could be easily seperated into a | 1537 | rearranged code so this could be easily separated into a |
1538 | differnet module later All 3 are currently turned on by | 1538 | differnet module later All 3 are currently turned on by |
1539 | defines in lpfc_ioctl.c LPFC_DFC_IOCTL, LPFC_UTIL_IOCTL, | 1539 | defines in lpfc_ioctl.c LPFC_DFC_IOCTL, LPFC_UTIL_IOCTL, |
1540 | LPFC_HBAAPI_IOCTL | 1540 | LPFC_HBAAPI_IOCTL |
@@ -1551,7 +1551,7 @@ Changes from 20040326 to 20040402 | |||
1551 | started by lpfc_online(). lpfc_offline() only stopped | 1551 | started by lpfc_online(). lpfc_offline() only stopped |
1552 | els_timeout routine. It now stops all timeout routines | 1552 | els_timeout routine. It now stops all timeout routines |
1553 | associated with that hba. | 1553 | associated with that hba. |
1554 | * Replace seperate next and prev pointers in struct | 1554 | * Replace separate next and prev pointers in struct |
1555 | lpfc_bindlist with list_head type. In elxHBA_t, replace | 1555 | lpfc_bindlist with list_head type. In elxHBA_t, replace |
1556 | fc_nlpbind_start and _end with fc_nlpbind_list and use | 1556 | fc_nlpbind_start and _end with fc_nlpbind_list and use |
1557 | list_head macros to access it. | 1557 | list_head macros to access it. |
diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas index 17ffa0607712..30023568805e 100644 --- a/Documentation/scsi/ChangeLog.megaraid_sas +++ b/Documentation/scsi/ChangeLog.megaraid_sas | |||
@@ -1,3 +1,19 @@ | |||
1 | 1 Release Date : Thur. Oct 29, 2009 09:12:45 PST 2009 - | ||
2 | (emaild-id:megaraidlinux@lsi.com) | ||
3 | Bo Yang | ||
4 | |||
5 | 2 Current Version : 00.00.04.17.1-rc1 | ||
6 | 3 Older Version : 00.00.04.12 | ||
7 | |||
8 | 1. Add the pad_0 in mfi frame structure to 0 to fix the | ||
9 | context value larger than 32bit value issue. | ||
10 | |||
11 | 2. Add the logic drive list to the driver. Driver will | ||
12 | keep the logic drive list internal after driver load. | ||
13 | |||
14 | 3. driver fixed the device update issue after get the AEN | ||
15 | PD delete/ADD, LD add/delete from FW. | ||
16 | |||
1 | 1 Release Date : Tues. July 28, 2009 10:12:45 PST 2009 - | 17 | 1 Release Date : Tues. July 28, 2009 10:12:45 PST 2009 - |
2 | (emaild-id:megaraidlinux@lsi.com) | 18 | (emaild-id:megaraidlinux@lsi.com) |
3 | Bo Yang | 19 | Bo Yang |
diff --git a/Documentation/serial/tty.txt b/Documentation/serial/tty.txt index 5e5349a4fcd2..7c900507279f 100644 --- a/Documentation/serial/tty.txt +++ b/Documentation/serial/tty.txt | |||
@@ -105,6 +105,10 @@ write_wakeup() - May be called at any point between open and close. | |||
105 | is permitted to call the driver write method from | 105 | is permitted to call the driver write method from |
106 | this function. In such a situation defer it. | 106 | this function. In such a situation defer it. |
107 | 107 | ||
108 | dcd_change() - Report to the tty line the current DCD pin status | ||
109 | changes and the relative timestamp. The timestamp | ||
110 | can be NULL. | ||
111 | |||
108 | 112 | ||
109 | Driver Access | 113 | Driver Access |
110 | 114 | ||
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 8923597bd2bd..bfcbbf88c44d 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt | |||
@@ -482,6 +482,9 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
482 | 482 | ||
483 | reference_rate - reference sample rate, 44100 or 48000 (default) | 483 | reference_rate - reference sample rate, 44100 or 48000 (default) |
484 | multiple - multiple to ref. sample rate, 1 or 2 (default) | 484 | multiple - multiple to ref. sample rate, 1 or 2 (default) |
485 | subsystem - override the PCI SSID for probing; the value | ||
486 | consists of SSVID << 16 | SSDID. The default is | ||
487 | zero, which means no override. | ||
485 | 488 | ||
486 | This module supports multiple cards. | 489 | This module supports multiple cards. |
487 | 490 | ||
@@ -1123,6 +1126,21 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1123 | 1126 | ||
1124 | This module supports multiple cards, autoprobe and ISA PnP. | 1127 | This module supports multiple cards, autoprobe and ISA PnP. |
1125 | 1128 | ||
1129 | Module snd-jazz16 | ||
1130 | ------------------- | ||
1131 | |||
1132 | Module for Media Vision Jazz16 chipset. The chipset consists of 3 chips: | ||
1133 | MVD1216 + MVA416 + MVA514. | ||
1134 | |||
1135 | port - port # for SB DSP chip (0x210,0x220,0x230,0x240,0x250,0x260) | ||
1136 | irq - IRQ # for SB DSP chip (3,5,7,9,10,15) | ||
1137 | dma8 - DMA # for SB DSP chip (1,3) | ||
1138 | dma16 - DMA # for SB DSP chip (5,7) | ||
1139 | mpu_port - MPU-401 port # (0x300,0x310,0x320,0x330) | ||
1140 | mpu_irq - MPU-401 irq # (2,3,5,7) | ||
1141 | |||
1142 | This module supports multiple cards. | ||
1143 | |||
1126 | Module snd-korg1212 | 1144 | Module snd-korg1212 |
1127 | ------------------- | 1145 | ------------------- |
1128 | 1146 | ||
@@ -1791,6 +1809,13 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1791 | 1809 | ||
1792 | The power-management is supported. | 1810 | The power-management is supported. |
1793 | 1811 | ||
1812 | Module snd-ua101 | ||
1813 | ---------------- | ||
1814 | |||
1815 | Module for the Edirol UA-101/UA-1000 audio/MIDI interfaces. | ||
1816 | |||
1817 | This module supports multiple devices, autoprobe and hotplugging. | ||
1818 | |||
1794 | Module snd-usb-audio | 1819 | Module snd-usb-audio |
1795 | -------------------- | 1820 | -------------------- |
1796 | 1821 | ||
@@ -1923,7 +1948,7 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1923 | ------------------- | 1948 | ------------------- |
1924 | 1949 | ||
1925 | Module for sound cards based on the Asus AV100/AV200 chips, | 1950 | Module for sound cards based on the Asus AV100/AV200 chips, |
1926 | i.e., Xonar D1, DX, D2, D2X, HDAV1.3 (Deluxe), Essence ST | 1951 | i.e., Xonar D1, DX, D2, D2X, DS, HDAV1.3 (Deluxe), Essence ST |
1927 | (Deluxe) and Essence STX. | 1952 | (Deluxe) and Essence STX. |
1928 | 1953 | ||
1929 | This module supports autoprobe and multiple cards. | 1954 | This module supports autoprobe and multiple cards. |
diff --git a/Documentation/sound/alsa/HD-Audio-Models.txt b/Documentation/sound/alsa/HD-Audio-Models.txt index e72cee9e2a71..1d38b0dfba95 100644 --- a/Documentation/sound/alsa/HD-Audio-Models.txt +++ b/Documentation/sound/alsa/HD-Audio-Models.txt | |||
@@ -124,6 +124,8 @@ ALC882/883/885/888/889 | |||
124 | asus-a7m ASUS A7M | 124 | asus-a7m ASUS A7M |
125 | macpro MacPro support | 125 | macpro MacPro support |
126 | mb5 Macbook 5,1 | 126 | mb5 Macbook 5,1 |
127 | macmini3 Macmini 3,1 | ||
128 | mba21 Macbook Air 2,1 | ||
127 | mbp3 Macbook Pro rev3 | 129 | mbp3 Macbook Pro rev3 |
128 | imac24 iMac 24'' with jack detection | 130 | imac24 iMac 24'' with jack detection |
129 | imac91 iMac 9,1 | 131 | imac91 iMac 9,1 |
@@ -279,13 +281,16 @@ Conexant 5051 | |||
279 | laptop Basic Laptop config (default) | 281 | laptop Basic Laptop config (default) |
280 | hp HP Spartan laptop | 282 | hp HP Spartan laptop |
281 | hp-dv6736 HP dv6736 | 283 | hp-dv6736 HP dv6736 |
284 | hp-f700 HP Compaq Presario F700 | ||
282 | lenovo-x200 Lenovo X200 laptop | 285 | lenovo-x200 Lenovo X200 laptop |
286 | toshiba Toshiba Satellite M300 | ||
283 | 287 | ||
284 | Conexant 5066 | 288 | Conexant 5066 |
285 | ============= | 289 | ============= |
286 | laptop Basic Laptop config (default) | 290 | laptop Basic Laptop config (default) |
287 | dell-laptop Dell laptops | 291 | dell-laptop Dell laptops |
288 | olpc-xo-1_5 OLPC XO 1.5 | 292 | olpc-xo-1_5 OLPC XO 1.5 |
293 | ideapad Lenovo IdeaPad U150 | ||
289 | 294 | ||
290 | STAC9200 | 295 | STAC9200 |
291 | ======== | 296 | ======== |
diff --git a/Documentation/sound/alsa/HD-Audio.txt b/Documentation/sound/alsa/HD-Audio.txt index 6325bec06a72..f4dd3bf99d12 100644 --- a/Documentation/sound/alsa/HD-Audio.txt +++ b/Documentation/sound/alsa/HD-Audio.txt | |||
@@ -452,6 +452,33 @@ Similarly, the lines after `[verb]` are parsed as `init_verbs` | |||
452 | sysfs entries, and the lines after `[hint]` are parsed as `hints` | 452 | sysfs entries, and the lines after `[hint]` are parsed as `hints` |
453 | sysfs entries, respectively. | 453 | sysfs entries, respectively. |
454 | 454 | ||
455 | Another example to override the codec vendor id from 0x12345678 to | ||
456 | 0xdeadbeef is like below: | ||
457 | ------------------------------------------------------------------------ | ||
458 | [codec] | ||
459 | 0x12345678 0xabcd1234 2 | ||
460 | |||
461 | [vendor_id] | ||
462 | 0xdeadbeef | ||
463 | ------------------------------------------------------------------------ | ||
464 | |||
465 | In the similar way, you can override the codec subsystem_id via | ||
466 | `[subsystem_id]`, the revision id via `[revision_id]` line. | ||
467 | Also, the codec chip name can be rewritten via `[chip_name]` line. | ||
468 | ------------------------------------------------------------------------ | ||
469 | [codec] | ||
470 | 0x12345678 0xabcd1234 2 | ||
471 | |||
472 | [subsystem_id] | ||
473 | 0xffff1111 | ||
474 | |||
475 | [revision_id] | ||
476 | 0x10 | ||
477 | |||
478 | [chip_name] | ||
479 | My-own NEWS-0002 | ||
480 | ------------------------------------------------------------------------ | ||
481 | |||
455 | The hd-audio driver reads the file via request_firmware(). Thus, | 482 | The hd-audio driver reads the file via request_firmware(). Thus, |
456 | a patch file has to be located on the appropriate firmware path, | 483 | a patch file has to be located on the appropriate firmware path, |
457 | typically, /lib/firmware. For example, when you pass the option | 484 | typically, /lib/firmware. For example, when you pass the option |
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index fc5790d36cd9..6c7d18c53f84 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt | |||
@@ -573,11 +573,14 @@ Because other nodes' memory may be free. This means system total status | |||
573 | may be not fatal yet. | 573 | may be not fatal yet. |
574 | 574 | ||
575 | If this is set to 2, the kernel panics compulsorily even on the | 575 | If this is set to 2, the kernel panics compulsorily even on the |
576 | above-mentioned. | 576 | above-mentioned. Even oom happens under memory cgroup, the whole |
577 | system panics. | ||
577 | 578 | ||
578 | The default value is 0. | 579 | The default value is 0. |
579 | 1 and 2 are for failover of clustering. Please select either | 580 | 1 and 2 are for failover of clustering. Please select either |
580 | according to your policy of failover. | 581 | according to your policy of failover. |
582 | panic_on_oom=2+kdump gives you very strong tool to investigate | ||
583 | why oom happens. You can get snapshot. | ||
581 | 584 | ||
582 | ============================================================= | 585 | ============================================================= |
583 | 586 | ||
diff --git a/Documentation/timers/00-INDEX b/Documentation/timers/00-INDEX index 397dc35e1323..a9248da5cdbc 100644 --- a/Documentation/timers/00-INDEX +++ b/Documentation/timers/00-INDEX | |||
@@ -4,6 +4,8 @@ highres.txt | |||
4 | - High resolution timers and dynamic ticks design notes | 4 | - High resolution timers and dynamic ticks design notes |
5 | hpet.txt | 5 | hpet.txt |
6 | - High Precision Event Timer Driver for Linux | 6 | - High Precision Event Timer Driver for Linux |
7 | hpet_example.c | ||
8 | - sample hpet timer test program | ||
7 | hrtimers.txt | 9 | hrtimers.txt |
8 | - subsystem for high-resolution kernel timers | 10 | - subsystem for high-resolution kernel timers |
9 | timer_stats.txt | 11 | timer_stats.txt |
diff --git a/Documentation/timers/Makefile b/Documentation/timers/Makefile new file mode 100644 index 000000000000..c85625f4ab25 --- /dev/null +++ b/Documentation/timers/Makefile | |||
@@ -0,0 +1,8 @@ | |||
1 | # kbuild trick to avoid linker error. Can be omitted if a module is built. | ||
2 | obj- := dummy.o | ||
3 | |||
4 | # List of programs to build | ||
5 | hostprogs-y := hpet_example | ||
6 | |||
7 | # Tell kbuild to always build the programs | ||
8 | always := $(hostprogs-y) | ||
diff --git a/Documentation/timers/hpet.txt b/Documentation/timers/hpet.txt index 16d25e6b5a00..767392ffd31e 100644 --- a/Documentation/timers/hpet.txt +++ b/Documentation/timers/hpet.txt | |||
@@ -26,274 +26,5 @@ initialization. An example of this initialization can be found in | |||
26 | arch/x86/kernel/hpet.c. | 26 | arch/x86/kernel/hpet.c. |
27 | 27 | ||
28 | The driver provides a userspace API which resembles the API found in the | 28 | The driver provides a userspace API which resembles the API found in the |
29 | RTC driver framework. An example user space program is provided below. | 29 | RTC driver framework. An example user space program is provided in |
30 | 30 | file:Documentation/timers/hpet_example.c | |
31 | #include <stdio.h> | ||
32 | #include <stdlib.h> | ||
33 | #include <unistd.h> | ||
34 | #include <fcntl.h> | ||
35 | #include <string.h> | ||
36 | #include <memory.h> | ||
37 | #include <malloc.h> | ||
38 | #include <time.h> | ||
39 | #include <ctype.h> | ||
40 | #include <sys/types.h> | ||
41 | #include <sys/wait.h> | ||
42 | #include <signal.h> | ||
43 | #include <fcntl.h> | ||
44 | #include <errno.h> | ||
45 | #include <sys/time.h> | ||
46 | #include <linux/hpet.h> | ||
47 | |||
48 | |||
49 | extern void hpet_open_close(int, const char **); | ||
50 | extern void hpet_info(int, const char **); | ||
51 | extern void hpet_poll(int, const char **); | ||
52 | extern void hpet_fasync(int, const char **); | ||
53 | extern void hpet_read(int, const char **); | ||
54 | |||
55 | #include <sys/poll.h> | ||
56 | #include <sys/ioctl.h> | ||
57 | #include <signal.h> | ||
58 | |||
59 | struct hpet_command { | ||
60 | char *command; | ||
61 | void (*func)(int argc, const char ** argv); | ||
62 | } hpet_command[] = { | ||
63 | { | ||
64 | "open-close", | ||
65 | hpet_open_close | ||
66 | }, | ||
67 | { | ||
68 | "info", | ||
69 | hpet_info | ||
70 | }, | ||
71 | { | ||
72 | "poll", | ||
73 | hpet_poll | ||
74 | }, | ||
75 | { | ||
76 | "fasync", | ||
77 | hpet_fasync | ||
78 | }, | ||
79 | }; | ||
80 | |||
81 | int | ||
82 | main(int argc, const char ** argv) | ||
83 | { | ||
84 | int i; | ||
85 | |||
86 | argc--; | ||
87 | argv++; | ||
88 | |||
89 | if (!argc) { | ||
90 | fprintf(stderr, "-hpet: requires command\n"); | ||
91 | return -1; | ||
92 | } | ||
93 | |||
94 | |||
95 | for (i = 0; i < (sizeof (hpet_command) / sizeof (hpet_command[0])); i++) | ||
96 | if (!strcmp(argv[0], hpet_command[i].command)) { | ||
97 | argc--; | ||
98 | argv++; | ||
99 | fprintf(stderr, "-hpet: executing %s\n", | ||
100 | hpet_command[i].command); | ||
101 | hpet_command[i].func(argc, argv); | ||
102 | return 0; | ||
103 | } | ||
104 | |||
105 | fprintf(stderr, "do_hpet: command %s not implemented\n", argv[0]); | ||
106 | |||
107 | return -1; | ||
108 | } | ||
109 | |||
110 | void | ||
111 | hpet_open_close(int argc, const char **argv) | ||
112 | { | ||
113 | int fd; | ||
114 | |||
115 | if (argc != 1) { | ||
116 | fprintf(stderr, "hpet_open_close: device-name\n"); | ||
117 | return; | ||
118 | } | ||
119 | |||
120 | fd = open(argv[0], O_RDONLY); | ||
121 | if (fd < 0) | ||
122 | fprintf(stderr, "hpet_open_close: open failed\n"); | ||
123 | else | ||
124 | close(fd); | ||
125 | |||
126 | return; | ||
127 | } | ||
128 | |||
129 | void | ||
130 | hpet_info(int argc, const char **argv) | ||
131 | { | ||
132 | } | ||
133 | |||
134 | void | ||
135 | hpet_poll(int argc, const char **argv) | ||
136 | { | ||
137 | unsigned long freq; | ||
138 | int iterations, i, fd; | ||
139 | struct pollfd pfd; | ||
140 | struct hpet_info info; | ||
141 | struct timeval stv, etv; | ||
142 | struct timezone tz; | ||
143 | long usec; | ||
144 | |||
145 | if (argc != 3) { | ||
146 | fprintf(stderr, "hpet_poll: device-name freq iterations\n"); | ||
147 | return; | ||
148 | } | ||
149 | |||
150 | freq = atoi(argv[1]); | ||
151 | iterations = atoi(argv[2]); | ||
152 | |||
153 | fd = open(argv[0], O_RDONLY); | ||
154 | |||
155 | if (fd < 0) { | ||
156 | fprintf(stderr, "hpet_poll: open of %s failed\n", argv[0]); | ||
157 | return; | ||
158 | } | ||
159 | |||
160 | if (ioctl(fd, HPET_IRQFREQ, freq) < 0) { | ||
161 | fprintf(stderr, "hpet_poll: HPET_IRQFREQ failed\n"); | ||
162 | goto out; | ||
163 | } | ||
164 | |||
165 | if (ioctl(fd, HPET_INFO, &info) < 0) { | ||
166 | fprintf(stderr, "hpet_poll: failed to get info\n"); | ||
167 | goto out; | ||
168 | } | ||
169 | |||
170 | fprintf(stderr, "hpet_poll: info.hi_flags 0x%lx\n", info.hi_flags); | ||
171 | |||
172 | if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) { | ||
173 | fprintf(stderr, "hpet_poll: HPET_EPI failed\n"); | ||
174 | goto out; | ||
175 | } | ||
176 | |||
177 | if (ioctl(fd, HPET_IE_ON, 0) < 0) { | ||
178 | fprintf(stderr, "hpet_poll, HPET_IE_ON failed\n"); | ||
179 | goto out; | ||
180 | } | ||
181 | |||
182 | pfd.fd = fd; | ||
183 | pfd.events = POLLIN; | ||
184 | |||
185 | for (i = 0; i < iterations; i++) { | ||
186 | pfd.revents = 0; | ||
187 | gettimeofday(&stv, &tz); | ||
188 | if (poll(&pfd, 1, -1) < 0) | ||
189 | fprintf(stderr, "hpet_poll: poll failed\n"); | ||
190 | else { | ||
191 | long data; | ||
192 | |||
193 | gettimeofday(&etv, &tz); | ||
194 | usec = stv.tv_sec * 1000000 + stv.tv_usec; | ||
195 | usec = (etv.tv_sec * 1000000 + etv.tv_usec) - usec; | ||
196 | |||
197 | fprintf(stderr, | ||
198 | "hpet_poll: expired time = 0x%lx\n", usec); | ||
199 | |||
200 | fprintf(stderr, "hpet_poll: revents = 0x%x\n", | ||
201 | pfd.revents); | ||
202 | |||
203 | if (read(fd, &data, sizeof(data)) != sizeof(data)) { | ||
204 | fprintf(stderr, "hpet_poll: read failed\n"); | ||
205 | } | ||
206 | else | ||
207 | fprintf(stderr, "hpet_poll: data 0x%lx\n", | ||
208 | data); | ||
209 | } | ||
210 | } | ||
211 | |||
212 | out: | ||
213 | close(fd); | ||
214 | return; | ||
215 | } | ||
216 | |||
217 | static int hpet_sigio_count; | ||
218 | |||
219 | static void | ||
220 | hpet_sigio(int val) | ||
221 | { | ||
222 | fprintf(stderr, "hpet_sigio: called\n"); | ||
223 | hpet_sigio_count++; | ||
224 | } | ||
225 | |||
226 | void | ||
227 | hpet_fasync(int argc, const char **argv) | ||
228 | { | ||
229 | unsigned long freq; | ||
230 | int iterations, i, fd, value; | ||
231 | sig_t oldsig; | ||
232 | struct hpet_info info; | ||
233 | |||
234 | hpet_sigio_count = 0; | ||
235 | fd = -1; | ||
236 | |||
237 | if ((oldsig = signal(SIGIO, hpet_sigio)) == SIG_ERR) { | ||
238 | fprintf(stderr, "hpet_fasync: failed to set signal handler\n"); | ||
239 | return; | ||
240 | } | ||
241 | |||
242 | if (argc != 3) { | ||
243 | fprintf(stderr, "hpet_fasync: device-name freq iterations\n"); | ||
244 | goto out; | ||
245 | } | ||
246 | |||
247 | fd = open(argv[0], O_RDONLY); | ||
248 | |||
249 | if (fd < 0) { | ||
250 | fprintf(stderr, "hpet_fasync: failed to open %s\n", argv[0]); | ||
251 | return; | ||
252 | } | ||
253 | |||
254 | |||
255 | if ((fcntl(fd, F_SETOWN, getpid()) == 1) || | ||
256 | ((value = fcntl(fd, F_GETFL)) == 1) || | ||
257 | (fcntl(fd, F_SETFL, value | O_ASYNC) == 1)) { | ||
258 | fprintf(stderr, "hpet_fasync: fcntl failed\n"); | ||
259 | goto out; | ||
260 | } | ||
261 | |||
262 | freq = atoi(argv[1]); | ||
263 | iterations = atoi(argv[2]); | ||
264 | |||
265 | if (ioctl(fd, HPET_IRQFREQ, freq) < 0) { | ||
266 | fprintf(stderr, "hpet_fasync: HPET_IRQFREQ failed\n"); | ||
267 | goto out; | ||
268 | } | ||
269 | |||
270 | if (ioctl(fd, HPET_INFO, &info) < 0) { | ||
271 | fprintf(stderr, "hpet_fasync: failed to get info\n"); | ||
272 | goto out; | ||
273 | } | ||
274 | |||
275 | fprintf(stderr, "hpet_fasync: info.hi_flags 0x%lx\n", info.hi_flags); | ||
276 | |||
277 | if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) { | ||
278 | fprintf(stderr, "hpet_fasync: HPET_EPI failed\n"); | ||
279 | goto out; | ||
280 | } | ||
281 | |||
282 | if (ioctl(fd, HPET_IE_ON, 0) < 0) { | ||
283 | fprintf(stderr, "hpet_fasync, HPET_IE_ON failed\n"); | ||
284 | goto out; | ||
285 | } | ||
286 | |||
287 | for (i = 0; i < iterations; i++) { | ||
288 | (void) pause(); | ||
289 | fprintf(stderr, "hpet_fasync: count = %d\n", hpet_sigio_count); | ||
290 | } | ||
291 | |||
292 | out: | ||
293 | signal(SIGIO, oldsig); | ||
294 | |||
295 | if (fd >= 0) | ||
296 | close(fd); | ||
297 | |||
298 | return; | ||
299 | } | ||
diff --git a/Documentation/timers/hpet_example.c b/Documentation/timers/hpet_example.c new file mode 100644 index 000000000000..f9ce2d9fdfd5 --- /dev/null +++ b/Documentation/timers/hpet_example.c | |||
@@ -0,0 +1,269 @@ | |||
1 | #include <stdio.h> | ||
2 | #include <stdlib.h> | ||
3 | #include <unistd.h> | ||
4 | #include <fcntl.h> | ||
5 | #include <string.h> | ||
6 | #include <memory.h> | ||
7 | #include <malloc.h> | ||
8 | #include <time.h> | ||
9 | #include <ctype.h> | ||
10 | #include <sys/types.h> | ||
11 | #include <sys/wait.h> | ||
12 | #include <signal.h> | ||
13 | #include <fcntl.h> | ||
14 | #include <errno.h> | ||
15 | #include <sys/time.h> | ||
16 | #include <linux/hpet.h> | ||
17 | |||
18 | |||
19 | extern void hpet_open_close(int, const char **); | ||
20 | extern void hpet_info(int, const char **); | ||
21 | extern void hpet_poll(int, const char **); | ||
22 | extern void hpet_fasync(int, const char **); | ||
23 | extern void hpet_read(int, const char **); | ||
24 | |||
25 | #include <sys/poll.h> | ||
26 | #include <sys/ioctl.h> | ||
27 | #include <signal.h> | ||
28 | |||
29 | struct hpet_command { | ||
30 | char *command; | ||
31 | void (*func)(int argc, const char ** argv); | ||
32 | } hpet_command[] = { | ||
33 | { | ||
34 | "open-close", | ||
35 | hpet_open_close | ||
36 | }, | ||
37 | { | ||
38 | "info", | ||
39 | hpet_info | ||
40 | }, | ||
41 | { | ||
42 | "poll", | ||
43 | hpet_poll | ||
44 | }, | ||
45 | { | ||
46 | "fasync", | ||
47 | hpet_fasync | ||
48 | }, | ||
49 | }; | ||
50 | |||
51 | int | ||
52 | main(int argc, const char ** argv) | ||
53 | { | ||
54 | int i; | ||
55 | |||
56 | argc--; | ||
57 | argv++; | ||
58 | |||
59 | if (!argc) { | ||
60 | fprintf(stderr, "-hpet: requires command\n"); | ||
61 | return -1; | ||
62 | } | ||
63 | |||
64 | |||
65 | for (i = 0; i < (sizeof (hpet_command) / sizeof (hpet_command[0])); i++) | ||
66 | if (!strcmp(argv[0], hpet_command[i].command)) { | ||
67 | argc--; | ||
68 | argv++; | ||
69 | fprintf(stderr, "-hpet: executing %s\n", | ||
70 | hpet_command[i].command); | ||
71 | hpet_command[i].func(argc, argv); | ||
72 | return 0; | ||
73 | } | ||
74 | |||
75 | fprintf(stderr, "do_hpet: command %s not implemented\n", argv[0]); | ||
76 | |||
77 | return -1; | ||
78 | } | ||
79 | |||
80 | void | ||
81 | hpet_open_close(int argc, const char **argv) | ||
82 | { | ||
83 | int fd; | ||
84 | |||
85 | if (argc != 1) { | ||
86 | fprintf(stderr, "hpet_open_close: device-name\n"); | ||
87 | return; | ||
88 | } | ||
89 | |||
90 | fd = open(argv[0], O_RDONLY); | ||
91 | if (fd < 0) | ||
92 | fprintf(stderr, "hpet_open_close: open failed\n"); | ||
93 | else | ||
94 | close(fd); | ||
95 | |||
96 | return; | ||
97 | } | ||
98 | |||
99 | void | ||
100 | hpet_info(int argc, const char **argv) | ||
101 | { | ||
102 | } | ||
103 | |||
104 | void | ||
105 | hpet_poll(int argc, const char **argv) | ||
106 | { | ||
107 | unsigned long freq; | ||
108 | int iterations, i, fd; | ||
109 | struct pollfd pfd; | ||
110 | struct hpet_info info; | ||
111 | struct timeval stv, etv; | ||
112 | struct timezone tz; | ||
113 | long usec; | ||
114 | |||
115 | if (argc != 3) { | ||
116 | fprintf(stderr, "hpet_poll: device-name freq iterations\n"); | ||
117 | return; | ||
118 | } | ||
119 | |||
120 | freq = atoi(argv[1]); | ||
121 | iterations = atoi(argv[2]); | ||
122 | |||
123 | fd = open(argv[0], O_RDONLY); | ||
124 | |||
125 | if (fd < 0) { | ||
126 | fprintf(stderr, "hpet_poll: open of %s failed\n", argv[0]); | ||
127 | return; | ||
128 | } | ||
129 | |||
130 | if (ioctl(fd, HPET_IRQFREQ, freq) < 0) { | ||
131 | fprintf(stderr, "hpet_poll: HPET_IRQFREQ failed\n"); | ||
132 | goto out; | ||
133 | } | ||
134 | |||
135 | if (ioctl(fd, HPET_INFO, &info) < 0) { | ||
136 | fprintf(stderr, "hpet_poll: failed to get info\n"); | ||
137 | goto out; | ||
138 | } | ||
139 | |||
140 | fprintf(stderr, "hpet_poll: info.hi_flags 0x%lx\n", info.hi_flags); | ||
141 | |||
142 | if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) { | ||
143 | fprintf(stderr, "hpet_poll: HPET_EPI failed\n"); | ||
144 | goto out; | ||
145 | } | ||
146 | |||
147 | if (ioctl(fd, HPET_IE_ON, 0) < 0) { | ||
148 | fprintf(stderr, "hpet_poll, HPET_IE_ON failed\n"); | ||
149 | goto out; | ||
150 | } | ||
151 | |||
152 | pfd.fd = fd; | ||
153 | pfd.events = POLLIN; | ||
154 | |||
155 | for (i = 0; i < iterations; i++) { | ||
156 | pfd.revents = 0; | ||
157 | gettimeofday(&stv, &tz); | ||
158 | if (poll(&pfd, 1, -1) < 0) | ||
159 | fprintf(stderr, "hpet_poll: poll failed\n"); | ||
160 | else { | ||
161 | long data; | ||
162 | |||
163 | gettimeofday(&etv, &tz); | ||
164 | usec = stv.tv_sec * 1000000 + stv.tv_usec; | ||
165 | usec = (etv.tv_sec * 1000000 + etv.tv_usec) - usec; | ||
166 | |||
167 | fprintf(stderr, | ||
168 | "hpet_poll: expired time = 0x%lx\n", usec); | ||
169 | |||
170 | fprintf(stderr, "hpet_poll: revents = 0x%x\n", | ||
171 | pfd.revents); | ||
172 | |||
173 | if (read(fd, &data, sizeof(data)) != sizeof(data)) { | ||
174 | fprintf(stderr, "hpet_poll: read failed\n"); | ||
175 | } | ||
176 | else | ||
177 | fprintf(stderr, "hpet_poll: data 0x%lx\n", | ||
178 | data); | ||
179 | } | ||
180 | } | ||
181 | |||
182 | out: | ||
183 | close(fd); | ||
184 | return; | ||
185 | } | ||
186 | |||
187 | static int hpet_sigio_count; | ||
188 | |||
189 | static void | ||
190 | hpet_sigio(int val) | ||
191 | { | ||
192 | fprintf(stderr, "hpet_sigio: called\n"); | ||
193 | hpet_sigio_count++; | ||
194 | } | ||
195 | |||
196 | void | ||
197 | hpet_fasync(int argc, const char **argv) | ||
198 | { | ||
199 | unsigned long freq; | ||
200 | int iterations, i, fd, value; | ||
201 | sig_t oldsig; | ||
202 | struct hpet_info info; | ||
203 | |||
204 | hpet_sigio_count = 0; | ||
205 | fd = -1; | ||
206 | |||
207 | if ((oldsig = signal(SIGIO, hpet_sigio)) == SIG_ERR) { | ||
208 | fprintf(stderr, "hpet_fasync: failed to set signal handler\n"); | ||
209 | return; | ||
210 | } | ||
211 | |||
212 | if (argc != 3) { | ||
213 | fprintf(stderr, "hpet_fasync: device-name freq iterations\n"); | ||
214 | goto out; | ||
215 | } | ||
216 | |||
217 | fd = open(argv[0], O_RDONLY); | ||
218 | |||
219 | if (fd < 0) { | ||
220 | fprintf(stderr, "hpet_fasync: failed to open %s\n", argv[0]); | ||
221 | return; | ||
222 | } | ||
223 | |||
224 | |||
225 | if ((fcntl(fd, F_SETOWN, getpid()) == 1) || | ||
226 | ((value = fcntl(fd, F_GETFL)) == 1) || | ||
227 | (fcntl(fd, F_SETFL, value | O_ASYNC) == 1)) { | ||
228 | fprintf(stderr, "hpet_fasync: fcntl failed\n"); | ||
229 | goto out; | ||
230 | } | ||
231 | |||
232 | freq = atoi(argv[1]); | ||
233 | iterations = atoi(argv[2]); | ||
234 | |||
235 | if (ioctl(fd, HPET_IRQFREQ, freq) < 0) { | ||
236 | fprintf(stderr, "hpet_fasync: HPET_IRQFREQ failed\n"); | ||
237 | goto out; | ||
238 | } | ||
239 | |||
240 | if (ioctl(fd, HPET_INFO, &info) < 0) { | ||
241 | fprintf(stderr, "hpet_fasync: failed to get info\n"); | ||
242 | goto out; | ||
243 | } | ||
244 | |||
245 | fprintf(stderr, "hpet_fasync: info.hi_flags 0x%lx\n", info.hi_flags); | ||
246 | |||
247 | if (info.hi_flags && (ioctl(fd, HPET_EPI, 0) < 0)) { | ||
248 | fprintf(stderr, "hpet_fasync: HPET_EPI failed\n"); | ||
249 | goto out; | ||
250 | } | ||
251 | |||
252 | if (ioctl(fd, HPET_IE_ON, 0) < 0) { | ||
253 | fprintf(stderr, "hpet_fasync, HPET_IE_ON failed\n"); | ||
254 | goto out; | ||
255 | } | ||
256 | |||
257 | for (i = 0; i < iterations; i++) { | ||
258 | (void) pause(); | ||
259 | fprintf(stderr, "hpet_fasync: count = %d\n", hpet_sigio_count); | ||
260 | } | ||
261 | |||
262 | out: | ||
263 | signal(SIGIO, oldsig); | ||
264 | |||
265 | if (fd >= 0) | ||
266 | close(fd); | ||
267 | |||
268 | return; | ||
269 | } | ||
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt index 6a5a579126b0..f1f81afee8a0 100644 --- a/Documentation/trace/ftrace-design.txt +++ b/Documentation/trace/ftrace-design.txt | |||
@@ -238,11 +238,10 @@ HAVE_SYSCALL_TRACEPOINTS | |||
238 | 238 | ||
239 | You need very few things to get the syscalls tracing in an arch. | 239 | You need very few things to get the syscalls tracing in an arch. |
240 | 240 | ||
241 | - Support HAVE_ARCH_TRACEHOOK (see arch/Kconfig). | ||
241 | - Have a NR_syscalls variable in <asm/unistd.h> that provides the number | 242 | - Have a NR_syscalls variable in <asm/unistd.h> that provides the number |
242 | of syscalls supported by the arch. | 243 | of syscalls supported by the arch. |
243 | - Implement arch_syscall_addr() that resolves a syscall address from a | 244 | - Support the TIF_SYSCALL_TRACEPOINT thread flags. |
244 | syscall number. | ||
245 | - Support the TIF_SYSCALL_TRACEPOINT thread flags | ||
246 | - Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace | 245 | - Put the trace_sys_enter() and trace_sys_exit() tracepoints calls from ptrace |
247 | in the ptrace syscalls tracing path. | 246 | in the ptrace syscalls tracing path. |
248 | - Tag this arch as HAVE_SYSCALL_TRACEPOINTS. | 247 | - Tag this arch as HAVE_SYSCALL_TRACEPOINTS. |
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt index bab3040da548..03485bfbd797 100644 --- a/Documentation/trace/ftrace.txt +++ b/Documentation/trace/ftrace.txt | |||
@@ -1588,7 +1588,7 @@ module author does not need to worry about it. | |||
1588 | 1588 | ||
1589 | When tracing is enabled, kstop_machine is called to prevent | 1589 | When tracing is enabled, kstop_machine is called to prevent |
1590 | races with the CPUS executing code being modified (which can | 1590 | races with the CPUS executing code being modified (which can |
1591 | cause the CPU to do undesireable things), and the nops are | 1591 | cause the CPU to do undesirable things), and the nops are |
1592 | patched back to calls. But this time, they do not call mcount | 1592 | patched back to calls. But this time, they do not call mcount |
1593 | (which is just a function stub). They now call into the ftrace | 1593 | (which is just a function stub). They now call into the ftrace |
1594 | infrastructure. | 1594 | infrastructure. |
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt index 47aabeebbdf6..a9100b28eb84 100644 --- a/Documentation/trace/kprobetrace.txt +++ b/Documentation/trace/kprobetrace.txt | |||
@@ -24,6 +24,7 @@ Synopsis of kprobe_events | |||
24 | ------------------------- | 24 | ------------------------- |
25 | p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe | 25 | p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe |
26 | r[:[GRP/]EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe | 26 | r[:[GRP/]EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe |
27 | -:[GRP/]EVENT : Clear a probe | ||
27 | 28 | ||
28 | GRP : Group name. If omitted, use "kprobes" for it. | 29 | GRP : Group name. If omitted, use "kprobes" for it. |
29 | EVENT : Event name. If omitted, the event name is generated | 30 | EVENT : Event name. If omitted, the event name is generated |
@@ -37,15 +38,12 @@ Synopsis of kprobe_events | |||
37 | @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) | 38 | @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) |
38 | $stackN : Fetch Nth entry of stack (N >= 0) | 39 | $stackN : Fetch Nth entry of stack (N >= 0) |
39 | $stack : Fetch stack address. | 40 | $stack : Fetch stack address. |
40 | $argN : Fetch function argument. (N >= 0)(*) | 41 | $retval : Fetch return value.(*) |
41 | $retval : Fetch return value.(**) | 42 | +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**) |
42 | +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(***) | ||
43 | NAME=FETCHARG: Set NAME as the argument name of FETCHARG. | 43 | NAME=FETCHARG: Set NAME as the argument name of FETCHARG. |
44 | 44 | ||
45 | (*) aN may not correct on asmlinkaged functions and at the middle of | 45 | (*) only for return probe. |
46 | function body. | 46 | (**) this is useful for fetching a field of data structures. |
47 | (**) only for return probe. | ||
48 | (***) this is useful for fetching a field of data structures. | ||
49 | 47 | ||
50 | 48 | ||
51 | Per-Probe Event Filtering | 49 | Per-Probe Event Filtering |
@@ -82,13 +80,16 @@ Usage examples | |||
82 | To add a probe as a new event, write a new definition to kprobe_events | 80 | To add a probe as a new event, write a new definition to kprobe_events |
83 | as below. | 81 | as below. |
84 | 82 | ||
85 | echo p:myprobe do_sys_open dfd=$arg0 filename=$arg1 flags=$arg2 mode=$arg3 > /sys/kernel/debug/tracing/kprobe_events | 83 | echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events |
86 | 84 | ||
87 | This sets a kprobe on the top of do_sys_open() function with recording | 85 | This sets a kprobe on the top of do_sys_open() function with recording |
88 | 1st to 4th arguments as "myprobe" event. As this example shows, users can | 86 | 1st to 4th arguments as "myprobe" event. Note, which register/stack entry is |
89 | choose more familiar names for each arguments. | 87 | assigned to each function argument depends on arch-specific ABI. If you unsure |
88 | the ABI, please try to use probe subcommand of perf-tools (you can find it | ||
89 | under tools/perf/). | ||
90 | As this example shows, users can choose more familiar names for each arguments. | ||
90 | 91 | ||
91 | echo r:myretprobe do_sys_open $retval >> /sys/kernel/debug/tracing/kprobe_events | 92 | echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events |
92 | 93 | ||
93 | This sets a kretprobe on the return point of do_sys_open() function with | 94 | This sets a kretprobe on the return point of do_sys_open() function with |
94 | recording return value as "myretprobe" event. | 95 | recording return value as "myretprobe" event. |
@@ -97,23 +98,24 @@ recording return value as "myretprobe" event. | |||
97 | 98 | ||
98 | cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format | 99 | cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format |
99 | name: myprobe | 100 | name: myprobe |
100 | ID: 75 | 101 | ID: 780 |
101 | format: | 102 | format: |
102 | field:unsigned short common_type; offset:0; size:2; | 103 | field:unsigned short common_type; offset:0; size:2; signed:0; |
103 | field:unsigned char common_flags; offset:2; size:1; | 104 | field:unsigned char common_flags; offset:2; size:1; signed:0; |
104 | field:unsigned char common_preempt_count; offset:3; size:1; | 105 | field:unsigned char common_preempt_count; offset:3; size:1;signed:0; |
105 | field:int common_pid; offset:4; size:4; | 106 | field:int common_pid; offset:4; size:4; signed:1; |
106 | field:int common_tgid; offset:8; size:4; | 107 | field:int common_lock_depth; offset:8; size:4; signed:1; |
107 | 108 | ||
108 | field: unsigned long ip; offset:16;tsize:8; | 109 | field:unsigned long __probe_ip; offset:12; size:4; signed:0; |
109 | field: int nargs; offset:24;tsize:4; | 110 | field:int __probe_nargs; offset:16; size:4; signed:1; |
110 | field: unsigned long dfd; offset:32;tsize:8; | 111 | field:unsigned long dfd; offset:20; size:4; signed:0; |
111 | field: unsigned long filename; offset:40;tsize:8; | 112 | field:unsigned long filename; offset:24; size:4; signed:0; |
112 | field: unsigned long flags; offset:48;tsize:8; | 113 | field:unsigned long flags; offset:28; size:4; signed:0; |
113 | field: unsigned long mode; offset:56;tsize:8; | 114 | field:unsigned long mode; offset:32; size:4; signed:0; |
114 | 115 | ||
115 | print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->ip, REC->dfd, REC->filename, REC->flags, REC->mode | ||
116 | 116 | ||
117 | print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip, | ||
118 | REC->dfd, REC->filename, REC->flags, REC->mode | ||
117 | 119 | ||
118 | You can see that the event has 4 arguments as in the expressions you specified. | 120 | You can see that the event has 4 arguments as in the expressions you specified. |
119 | 121 | ||
@@ -121,6 +123,12 @@ print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->ip, REC->dfd, R | |||
121 | 123 | ||
122 | This clears all probe points. | 124 | This clears all probe points. |
123 | 125 | ||
126 | Or, | ||
127 | |||
128 | echo -:myprobe >> kprobe_events | ||
129 | |||
130 | This clears probe points selectively. | ||
131 | |||
124 | Right after definition, each event is disabled by default. For tracing these | 132 | Right after definition, each event is disabled by default. For tracing these |
125 | events, you need to enable it. | 133 | events, you need to enable it. |
126 | 134 | ||
@@ -146,4 +154,3 @@ events, you need to enable it. | |||
146 | returns from SYMBOL(e.g. "sys_open+0x1b/0x1d <- do_sys_open" means kernel | 154 | returns from SYMBOL(e.g. "sys_open+0x1b/0x1d <- do_sys_open" means kernel |
147 | returns from do_sys_open to sys_open+0x1b). | 155 | returns from do_sys_open to sys_open+0x1b). |
148 | 156 | ||
149 | |||
diff --git a/Documentation/usb/error-codes.txt b/Documentation/usb/error-codes.txt index 9cf83e8c27b8..d83703ea74b2 100644 --- a/Documentation/usb/error-codes.txt +++ b/Documentation/usb/error-codes.txt | |||
@@ -41,8 +41,8 @@ USB-specific: | |||
41 | 41 | ||
42 | -EFBIG Host controller driver can't schedule that many ISO frames. | 42 | -EFBIG Host controller driver can't schedule that many ISO frames. |
43 | 43 | ||
44 | -EPIPE Specified endpoint is stalled. For non-control endpoints, | 44 | -EPIPE The pipe type specified in the URB doesn't match the |
45 | reset this status with usb_clear_halt(). | 45 | endpoint's actual type. |
46 | 46 | ||
47 | -EMSGSIZE (a) endpoint maxpacket size is zero; it is not usable | 47 | -EMSGSIZE (a) endpoint maxpacket size is zero; it is not usable |
48 | in the current interface altsetting. | 48 | in the current interface altsetting. |
@@ -60,6 +60,8 @@ USB-specific: | |||
60 | 60 | ||
61 | -EHOSTUNREACH URB was rejected because the device is suspended. | 61 | -EHOSTUNREACH URB was rejected because the device is suspended. |
62 | 62 | ||
63 | -ENOEXEC A control URB doesn't contain a Setup packet. | ||
64 | |||
63 | 65 | ||
64 | ************************************************************************** | 66 | ************************************************************************** |
65 | * Error codes returned by in urb->status * | 67 | * Error codes returned by in urb->status * |
diff --git a/Documentation/usb/power-management.txt b/Documentation/usb/power-management.txt index 3bf6818c8cf5..2790ad48cfc2 100644 --- a/Documentation/usb/power-management.txt +++ b/Documentation/usb/power-management.txt | |||
@@ -2,7 +2,7 @@ | |||
2 | 2 | ||
3 | Alan Stern <stern@rowland.harvard.edu> | 3 | Alan Stern <stern@rowland.harvard.edu> |
4 | 4 | ||
5 | November 10, 2009 | 5 | December 11, 2009 |
6 | 6 | ||
7 | 7 | ||
8 | 8 | ||
@@ -29,9 +29,9 @@ covered to some extent (see Documentation/power/*.txt for more | |||
29 | information about system PM). | 29 | information about system PM). |
30 | 30 | ||
31 | Note: Dynamic PM support for USB is present only if the kernel was | 31 | Note: Dynamic PM support for USB is present only if the kernel was |
32 | built with CONFIG_USB_SUSPEND enabled. System PM support is present | 32 | built with CONFIG_USB_SUSPEND enabled (which depends on |
33 | only if the kernel was built with CONFIG_SUSPEND or CONFIG_HIBERNATION | 33 | CONFIG_PM_RUNTIME). System PM support is present only if the kernel |
34 | enabled. | 34 | was built with CONFIG_SUSPEND or CONFIG_HIBERNATION enabled. |
35 | 35 | ||
36 | 36 | ||
37 | What is Remote Wakeup? | 37 | What is Remote Wakeup? |
@@ -229,6 +229,11 @@ necessary operations by hand or add them to a udev script. You can | |||
229 | also change the idle-delay time; 2 seconds is not the best choice for | 229 | also change the idle-delay time; 2 seconds is not the best choice for |
230 | every device. | 230 | every device. |
231 | 231 | ||
232 | If a driver knows that its device has proper suspend/resume support, | ||
233 | it can enable autosuspend all by itself. For example, the video | ||
234 | driver for a laptop's webcam might do this, since these devices are | ||
235 | rarely used and so should normally be autosuspended. | ||
236 | |||
232 | Sometimes it turns out that even when a device does work okay with | 237 | Sometimes it turns out that even when a device does work okay with |
233 | autosuspend there are still problems. For example, there are | 238 | autosuspend there are still problems. For example, there are |
234 | experimental patches adding autosuspend support to the usbhid driver, | 239 | experimental patches adding autosuspend support to the usbhid driver, |
@@ -321,69 +326,81 @@ driver does so by calling these six functions: | |||
321 | void usb_autopm_get_interface_no_resume(struct usb_interface *intf); | 326 | void usb_autopm_get_interface_no_resume(struct usb_interface *intf); |
322 | void usb_autopm_put_interface_no_suspend(struct usb_interface *intf); | 327 | void usb_autopm_put_interface_no_suspend(struct usb_interface *intf); |
323 | 328 | ||
324 | The functions work by maintaining a counter in the usb_interface | 329 | The functions work by maintaining a usage counter in the |
325 | structure. When intf->pm_usage_count is > 0 then the interface is | 330 | usb_interface's embedded device structure. When the counter is > 0 |
326 | deemed to be busy, and the kernel will not autosuspend the interface's | 331 | then the interface is deemed to be busy, and the kernel will not |
327 | device. When intf->pm_usage_count is <= 0 then the interface is | 332 | autosuspend the interface's device. When the usage counter is = 0 |
328 | considered to be idle, and the kernel may autosuspend the device. | 333 | then the interface is considered to be idle, and the kernel may |
334 | autosuspend the device. | ||
329 | 335 | ||
330 | (There is a similar pm_usage_count field in struct usb_device, | 336 | (There is a similar usage counter field in struct usb_device, |
331 | associated with the device itself rather than any of its interfaces. | 337 | associated with the device itself rather than any of its interfaces. |
332 | This field is used only by the USB core.) | 338 | This counter is used only by the USB core.) |
333 | 339 | ||
334 | Drivers must not modify intf->pm_usage_count directly; its value | 340 | Drivers need not be concerned about balancing changes to the usage |
335 | should be changed only be using the functions listed above. Drivers | 341 | counter; the USB core will undo any remaining "get"s when a driver |
336 | are responsible for insuring that the overall change to pm_usage_count | 342 | is unbound from its interface. As a corollary, drivers must not call |
337 | during their lifetime balances out to 0 (it may be necessary for the | 343 | any of the usb_autopm_* functions after their diconnect() routine has |
338 | disconnect method to call usb_autopm_put_interface() one or more times | 344 | returned. |
339 | to fulfill this requirement). The first two routines use the PM mutex | 345 | |
340 | in struct usb_device for mutual exclusion; drivers using the async | 346 | Drivers using the async routines are responsible for their own |
341 | routines are responsible for their own synchronization and mutual | 347 | synchronization and mutual exclusion. |
342 | exclusion. | 348 | |
343 | 349 | usb_autopm_get_interface() increments the usage counter and | |
344 | usb_autopm_get_interface() increments pm_usage_count and | 350 | does an autoresume if the device is suspended. If the |
345 | attempts an autoresume if the new value is > 0 and the | 351 | autoresume fails, the counter is decremented back. |
346 | device is suspended. | 352 | |
347 | 353 | usb_autopm_put_interface() decrements the usage counter and | |
348 | usb_autopm_put_interface() decrements pm_usage_count and | 354 | attempts an autosuspend if the new value is = 0. |
349 | attempts an autosuspend if the new value is <= 0 and the | ||
350 | device isn't suspended. | ||
351 | 355 | ||
352 | usb_autopm_get_interface_async() and | 356 | usb_autopm_get_interface_async() and |
353 | usb_autopm_put_interface_async() do almost the same things as | 357 | usb_autopm_put_interface_async() do almost the same things as |
354 | their non-async counterparts. The differences are: they do | 358 | their non-async counterparts. The big difference is that they |
355 | not acquire the PM mutex, and they use a workqueue to do their | 359 | use a workqueue to do the resume or suspend part of their |
356 | jobs. As a result they can be called in an atomic context, | 360 | jobs. As a result they can be called in an atomic context, |
357 | such as an URB's completion handler, but when they return the | 361 | such as an URB's completion handler, but when they return the |
358 | device will not generally not yet be in the desired state. | 362 | device will generally not yet be in the desired state. |
359 | 363 | ||
360 | usb_autopm_get_interface_no_resume() and | 364 | usb_autopm_get_interface_no_resume() and |
361 | usb_autopm_put_interface_no_suspend() merely increment or | 365 | usb_autopm_put_interface_no_suspend() merely increment or |
362 | decrement the pm_usage_count value; they do not attempt to | 366 | decrement the usage counter; they do not attempt to carry out |
363 | carry out an autoresume or an autosuspend. Hence they can be | 367 | an autoresume or an autosuspend. Hence they can be called in |
364 | called in an atomic context. | 368 | an atomic context. |
365 | 369 | ||
366 | The conventional usage pattern is that a driver calls | 370 | The simplest usage pattern is that a driver calls |
367 | usb_autopm_get_interface() in its open routine and | 371 | usb_autopm_get_interface() in its open routine and |
368 | usb_autopm_put_interface() in its close or release routine. But | 372 | usb_autopm_put_interface() in its close or release routine. But other |
369 | other patterns are possible. | 373 | patterns are possible. |
370 | 374 | ||
371 | The autosuspend attempts mentioned above will often fail for one | 375 | The autosuspend attempts mentioned above will often fail for one |
372 | reason or another. For example, the power/level attribute might be | 376 | reason or another. For example, the power/level attribute might be |
373 | set to "on", or another interface in the same device might not be | 377 | set to "on", or another interface in the same device might not be |
374 | idle. This is perfectly normal. If the reason for failure was that | 378 | idle. This is perfectly normal. If the reason for failure was that |
375 | the device hasn't been idle for long enough, a delayed workqueue | 379 | the device hasn't been idle for long enough, a timer is scheduled to |
376 | routine is automatically set up to carry out the operation when the | 380 | carry out the operation automatically when the autosuspend idle-delay |
377 | autosuspend idle-delay has expired. | 381 | has expired. |
378 | 382 | ||
379 | Autoresume attempts also can fail, although failure would mean that | 383 | Autoresume attempts also can fail, although failure would mean that |
380 | the device is no longer present or operating properly. Unlike | 384 | the device is no longer present or operating properly. Unlike |
381 | autosuspend, there's no delay for an autoresume. | 385 | autosuspend, there's no idle-delay for an autoresume. |
382 | 386 | ||
383 | 387 | ||
384 | Other parts of the driver interface | 388 | Other parts of the driver interface |
385 | ----------------------------------- | 389 | ----------------------------------- |
386 | 390 | ||
391 | Drivers can enable autosuspend for their devices by calling | ||
392 | |||
393 | usb_enable_autosuspend(struct usb_device *udev); | ||
394 | |||
395 | in their probe() routine, if they know that the device is capable of | ||
396 | suspending and resuming correctly. This is exactly equivalent to | ||
397 | writing "auto" to the device's power/level attribute. Likewise, | ||
398 | drivers can disable autosuspend by calling | ||
399 | |||
400 | usb_disable_autosuspend(struct usb_device *udev); | ||
401 | |||
402 | This is exactly the same as writing "on" to the power/level attribute. | ||
403 | |||
387 | Sometimes a driver needs to make sure that remote wakeup is enabled | 404 | Sometimes a driver needs to make sure that remote wakeup is enabled |
388 | during autosuspend. For example, there's not much point | 405 | during autosuspend. For example, there's not much point |
389 | autosuspending a keyboard if the user can't cause the keyboard to do a | 406 | autosuspending a keyboard if the user can't cause the keyboard to do a |
@@ -395,26 +412,27 @@ though, setting this flag won't cause the kernel to autoresume it. | |||
395 | Normally a driver would set this flag in its probe method, at which | 412 | Normally a driver would set this flag in its probe method, at which |
396 | time the device is guaranteed not to be autosuspended.) | 413 | time the device is guaranteed not to be autosuspended.) |
397 | 414 | ||
398 | The synchronous usb_autopm_* routines have to run in a sleepable | 415 | If a driver does its I/O asynchronously in interrupt context, it |
399 | process context; they must not be called from an interrupt handler or | 416 | should call usb_autopm_get_interface_async() before starting output and |
400 | while holding a spinlock. In fact, the entire autosuspend mechanism | 417 | usb_autopm_put_interface_async() when the output queue drains. When |
401 | is not well geared toward interrupt-driven operation. However there | 418 | it receives an input event, it should call |
402 | is one thing a driver can do in an interrupt handler: | ||
403 | 419 | ||
404 | usb_mark_last_busy(struct usb_device *udev); | 420 | usb_mark_last_busy(struct usb_device *udev); |
405 | 421 | ||
406 | This sets udev->last_busy to the current time. udev->last_busy is the | 422 | in the event handler. This sets udev->last_busy to the current time. |
407 | field used for idle-delay calculations; updating it will cause any | 423 | udev->last_busy is the field used for idle-delay calculations; |
408 | pending autosuspend to be moved back. The usb_autopm_* routines will | 424 | updating it will cause any pending autosuspend to be moved back. Most |
409 | also set the last_busy field to the current time. | 425 | of the usb_autopm_* routines will also set the last_busy field to the |
410 | 426 | current time. | |
411 | Calling urb_mark_last_busy() from within an URB completion handler is | 427 | |
412 | subject to races: The kernel may have just finished deciding the | 428 | Asynchronous operation is always subject to races. For example, a |
413 | device has been idle for long enough but not yet gotten around to | 429 | driver may call one of the usb_autopm_*_interface_async() routines at |
414 | calling the driver's suspend method. The driver would have to be | 430 | a time when the core has just finished deciding the device has been |
415 | responsible for synchronizing its suspend method with its URB | 431 | idle for long enough but not yet gotten around to calling the driver's |
416 | completion handler and causing the autosuspend to fail with -EBUSY if | 432 | suspend method. The suspend method must be responsible for |
417 | an URB had completed too recently. | 433 | synchronizing with the output request routine and the URB completion |
434 | handler; it should cause autosuspends to fail with -EBUSY if the | ||
435 | driver needs to use the device. | ||
418 | 436 | ||
419 | External suspend calls should never be allowed to fail in this way, | 437 | External suspend calls should never be allowed to fail in this way, |
420 | only autosuspend calls. The driver can tell them apart by checking | 438 | only autosuspend calls. The driver can tell them apart by checking |
@@ -422,75 +440,23 @@ the PM_EVENT_AUTO bit in the message.event argument to the suspend | |||
422 | method; this bit will be set for internal PM events (autosuspend) and | 440 | method; this bit will be set for internal PM events (autosuspend) and |
423 | clear for external PM events. | 441 | clear for external PM events. |
424 | 442 | ||
425 | Many of the ingredients in the autosuspend framework are oriented | ||
426 | towards interfaces: The usb_interface structure contains the | ||
427 | pm_usage_cnt field, and the usb_autopm_* routines take an interface | ||
428 | pointer as their argument. But somewhat confusingly, a few of the | ||
429 | pieces (i.e., usb_mark_last_busy()) use the usb_device structure | ||
430 | instead. Drivers need to keep this straight; they can call | ||
431 | interface_to_usbdev() to find the device structure for a given | ||
432 | interface. | ||
433 | |||
434 | 443 | ||
435 | Locking requirements | 444 | Mutual exclusion |
436 | -------------------- | 445 | ---------------- |
437 | 446 | ||
438 | All three suspend/resume methods are always called while holding the | 447 | For external events -- but not necessarily for autosuspend or |
439 | usb_device's PM mutex. For external events -- but not necessarily for | 448 | autoresume -- the device semaphore (udev->dev.sem) will be held when a |
440 | autosuspend or autoresume -- the device semaphore (udev->dev.sem) will | 449 | suspend or resume method is called. This implies that external |
441 | also be held. This implies that external suspend/resume events are | 450 | suspend/resume events are mutually exclusive with calls to probe, |
442 | mutually exclusive with calls to probe, disconnect, pre_reset, and | 451 | disconnect, pre_reset, and post_reset; the USB core guarantees that |
443 | post_reset; the USB core guarantees that this is true of internal | 452 | this is true of autosuspend/autoresume events as well. |
444 | suspend/resume events as well. | ||
445 | 453 | ||
446 | If a driver wants to block all suspend/resume calls during some | 454 | If a driver wants to block all suspend/resume calls during some |
447 | critical section, it can simply acquire udev->pm_mutex. Note that | 455 | critical section, the best way is to lock the device and call |
448 | calls to resume may be triggered indirectly. Block IO due to memory | 456 | usb_autopm_get_interface() (and do the reverse at the end of the |
449 | allocations can make the vm subsystem resume a device. Thus while | 457 | critical section). Holding the device semaphore will block all |
450 | holding this lock you must not allocate memory with GFP_KERNEL or | 458 | external PM calls, and the usb_autopm_get_interface() will prevent any |
451 | GFP_NOFS. | 459 | internal PM calls, even if it fails. (Exercise: Why?) |
452 | |||
453 | Alternatively, if the critical section might call some of the | ||
454 | usb_autopm_* routines, the driver can avoid deadlock by doing: | ||
455 | |||
456 | down(&udev->dev.sem); | ||
457 | rc = usb_autopm_get_interface(intf); | ||
458 | |||
459 | and at the end of the critical section: | ||
460 | |||
461 | if (!rc) | ||
462 | usb_autopm_put_interface(intf); | ||
463 | up(&udev->dev.sem); | ||
464 | |||
465 | Holding the device semaphore will block all external PM calls, and the | ||
466 | usb_autopm_get_interface() will prevent any internal PM calls, even if | ||
467 | it fails. (Exercise: Why?) | ||
468 | |||
469 | The rules for locking order are: | ||
470 | |||
471 | Never acquire any device semaphore while holding any PM mutex. | ||
472 | |||
473 | Never acquire udev->pm_mutex while holding the PM mutex for | ||
474 | a device that isn't a descendant of udev. | ||
475 | |||
476 | In other words, PM mutexes should only be acquired going up the device | ||
477 | tree, and they should be acquired only after locking all the device | ||
478 | semaphores you need to hold. These rules don't matter to drivers very | ||
479 | much; they usually affect just the USB core. | ||
480 | |||
481 | Still, drivers do need to be careful. For example, many drivers use a | ||
482 | private mutex to synchronize their normal I/O activities with their | ||
483 | disconnect method. Now if the driver supports autosuspend then it | ||
484 | must call usb_autopm_put_interface() from somewhere -- maybe from its | ||
485 | close method. It should make the call while holding the private mutex, | ||
486 | since a driver shouldn't call any of the usb_autopm_* functions for an | ||
487 | interface from which it has been unbound. | ||
488 | |||
489 | But the usb_autpm_* routines always acquire the device's PM mutex, and | ||
490 | consequently the locking order has to be: private mutex first, PM | ||
491 | mutex second. Since the suspend method is always called with the PM | ||
492 | mutex held, it mustn't try to acquire the private mutex. It has to | ||
493 | synchronize with the driver's I/O activities in some other way. | ||
494 | 460 | ||
495 | 461 | ||
496 | Interaction between dynamic PM and system PM | 462 | Interaction between dynamic PM and system PM |
@@ -499,22 +465,11 @@ synchronize with the driver's I/O activities in some other way. | |||
499 | Dynamic power management and system power management can interact in | 465 | Dynamic power management and system power management can interact in |
500 | a couple of ways. | 466 | a couple of ways. |
501 | 467 | ||
502 | Firstly, a device may already be manually suspended or autosuspended | 468 | Firstly, a device may already be autosuspended when a system suspend |
503 | when a system suspend occurs. Since system suspends are supposed to | 469 | occurs. Since system suspends are supposed to be as transparent as |
504 | be as transparent as possible, the device should remain suspended | 470 | possible, the device should remain suspended following the system |
505 | following the system resume. The 2.6.23 kernel obeys this principle | 471 | resume. But this theory may not work out well in practice; over time |
506 | for manually suspended devices but not for autosuspended devices; they | 472 | the kernel's behavior in this regard has changed. |
507 | do get resumed when the system wakes up. (Presumably they will be | ||
508 | autosuspended again after their idle-delay time expires.) In later | ||
509 | kernels this behavior will be fixed. | ||
510 | |||
511 | (There is an exception. If a device would undergo a reset-resume | ||
512 | instead of a normal resume, and the device is enabled for remote | ||
513 | wakeup, then the reset-resume takes place even if the device was | ||
514 | already suspended when the system suspend began. The justification is | ||
515 | that a reset-resume is a kind of remote-wakeup event. Or to put it | ||
516 | another way, a device which needs a reset won't be able to generate | ||
517 | normal remote-wakeup signals, so it ought to be resumed immediately.) | ||
518 | 473 | ||
519 | Secondly, a dynamic power-management event may occur as a system | 474 | Secondly, a dynamic power-management event may occur as a system |
520 | suspend is underway. The window for this is short, since system | 475 | suspend is underway. The window for this is short, since system |
diff --git a/Documentation/video4linux/CARDLIST.cx23885 b/Documentation/video4linux/CARDLIST.cx23885 index 7539e8fa1ffd..16ca030e1185 100644 --- a/Documentation/video4linux/CARDLIST.cx23885 +++ b/Documentation/video4linux/CARDLIST.cx23885 | |||
@@ -26,3 +26,4 @@ | |||
26 | 25 -> Compro VideoMate E800 [1858:e800] | 26 | 25 -> Compro VideoMate E800 [1858:e800] |
27 | 26 -> Hauppauge WinTV-HVR1290 [0070:8551] | 27 | 26 -> Hauppauge WinTV-HVR1290 [0070:8551] |
28 | 27 -> Mygica X8558 PRO DMB-TH [14f1:8578] | 28 | 27 -> Mygica X8558 PRO DMB-TH [14f1:8578] |
29 | 28 -> LEADTEK WinFast PxTV1200 [107d:6f22] | ||
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134 index fce1e7eb0474..b4a767060ed7 100644 --- a/Documentation/video4linux/CARDLIST.saa7134 +++ b/Documentation/video4linux/CARDLIST.saa7134 | |||
@@ -174,3 +174,4 @@ | |||
174 | 173 -> Zolid Hybrid TV Tuner PCI [1131:2004] | 174 | 173 -> Zolid Hybrid TV Tuner PCI [1131:2004] |
175 | 174 -> Asus Europa Hybrid OEM [1043:4847] | 175 | 174 -> Asus Europa Hybrid OEM [1043:4847] |
176 | 175 -> Leadtek Winfast DTV1000S [107d:6655] | 176 | 175 -> Leadtek Winfast DTV1000S [107d:6655] |
177 | 176 -> Beholder BeholdTV 505 RDS [0000:5051] | ||
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner index e0d298fe8830..9b2e0dd6017e 100644 --- a/Documentation/video4linux/CARDLIST.tuner +++ b/Documentation/video4linux/CARDLIST.tuner | |||
@@ -81,3 +81,4 @@ tuner=80 - Philips FQ1216LME MK3 PAL/SECAM w/active loopthrough | |||
81 | tuner=81 - Partsnic (Daewoo) PTI-5NF05 | 81 | tuner=81 - Partsnic (Daewoo) PTI-5NF05 |
82 | tuner=82 - Philips CU1216L | 82 | tuner=82 - Philips CU1216L |
83 | tuner=83 - NXP TDA18271 | 83 | tuner=83 - NXP TDA18271 |
84 | tuner=84 - Sony BTF-Pxn01Z | ||
diff --git a/Documentation/video4linux/README.tlg2300 b/Documentation/video4linux/README.tlg2300 new file mode 100644 index 000000000000..416ccb93d8c9 --- /dev/null +++ b/Documentation/video4linux/README.tlg2300 | |||
@@ -0,0 +1,47 @@ | |||
1 | tlg2300 release notes | ||
2 | ==================== | ||
3 | |||
4 | This is a v4l2/dvb device driver for the tlg2300 chip. | ||
5 | |||
6 | |||
7 | current status | ||
8 | ============== | ||
9 | |||
10 | video | ||
11 | - support mmap and read().(no overlay) | ||
12 | |||
13 | audio | ||
14 | - The driver will register a ALSA card for the audio input. | ||
15 | |||
16 | vbi | ||
17 | - Works for almost TV norms. | ||
18 | |||
19 | dvb-t | ||
20 | - works for DVB-T | ||
21 | |||
22 | FM | ||
23 | - Works for radio. | ||
24 | |||
25 | --------------------------------------------------------------------------- | ||
26 | TESTED APPLICATIONS: | ||
27 | |||
28 | -VLC1.0.4 test the video and dvb. The GUI is friendly to use. | ||
29 | |||
30 | -Mplayer test the video. | ||
31 | |||
32 | -Mplayer test the FM. The mplayer should be compiled with --enable-radio and | ||
33 | --enable-radio-capture. | ||
34 | The command runs as this(The alsa audio registers to card 1): | ||
35 | #mplayer radio://103.7/capture/ -radio adevice=hw=1,0:arate=48000 \ | ||
36 | -rawaudio rate=48000:channels=2 | ||
37 | |||
38 | --------------------------------------------------------------------------- | ||
39 | KNOWN PROBLEMS: | ||
40 | about preemphasis: | ||
41 | You can set the preemphasis for radio by the following command: | ||
42 | #v4l2-ctl -d /dev/radio0 --set-ctrl=pre_emphasis_settings=1 | ||
43 | |||
44 | "pre_emphasis_settings=1" means that you select the 50us. If you want | ||
45 | to select the 75us, please use "pre_emphasis_settings=2" | ||
46 | |||
47 | |||
diff --git a/Documentation/video4linux/gspca.txt b/Documentation/video4linux/gspca.txt index 1800a62cf135..181b9e6fd984 100644 --- a/Documentation/video4linux/gspca.txt +++ b/Documentation/video4linux/gspca.txt | |||
@@ -42,6 +42,7 @@ ov519 041e:4064 Creative Live! VISTA VF0420 | |||
42 | ov519 041e:4067 Creative Live! Cam Video IM (VF0350) | 42 | ov519 041e:4067 Creative Live! Cam Video IM (VF0350) |
43 | ov519 041e:4068 Creative Live! VISTA VF0470 | 43 | ov519 041e:4068 Creative Live! VISTA VF0470 |
44 | spca561 0458:7004 Genius VideoCAM Express V2 | 44 | spca561 0458:7004 Genius VideoCAM Express V2 |
45 | sn9c2028 0458:7005 Genius Smart 300, version 2 | ||
45 | sunplus 0458:7006 Genius Dsc 1.3 Smart | 46 | sunplus 0458:7006 Genius Dsc 1.3 Smart |
46 | zc3xx 0458:7007 Genius VideoCam V2 | 47 | zc3xx 0458:7007 Genius VideoCam V2 |
47 | zc3xx 0458:700c Genius VideoCam V3 | 48 | zc3xx 0458:700c Genius VideoCam V3 |
@@ -109,6 +110,7 @@ sunplus 04a5:3003 Benq DC 1300 | |||
109 | sunplus 04a5:3008 Benq DC 1500 | 110 | sunplus 04a5:3008 Benq DC 1500 |
110 | sunplus 04a5:300a Benq DC 3410 | 111 | sunplus 04a5:300a Benq DC 3410 |
111 | spca500 04a5:300c Benq DC 1016 | 112 | spca500 04a5:300c Benq DC 1016 |
113 | benq 04a5:3035 Benq DC E300 | ||
112 | finepix 04cb:0104 Fujifilm FinePix 4800 | 114 | finepix 04cb:0104 Fujifilm FinePix 4800 |
113 | finepix 04cb:0109 Fujifilm FinePix A202 | 115 | finepix 04cb:0109 Fujifilm FinePix A202 |
114 | finepix 04cb:010b Fujifilm FinePix A203 | 116 | finepix 04cb:010b Fujifilm FinePix A203 |
@@ -142,6 +144,7 @@ sunplus 04fc:5360 Sunplus Generic | |||
142 | spca500 04fc:7333 PalmPixDC85 | 144 | spca500 04fc:7333 PalmPixDC85 |
143 | sunplus 04fc:ffff Pure DigitalDakota | 145 | sunplus 04fc:ffff Pure DigitalDakota |
144 | spca501 0506:00df 3Com HomeConnect Lite | 146 | spca501 0506:00df 3Com HomeConnect Lite |
147 | sunplus 052b:1507 Megapixel 5 Pretec DC-1007 | ||
145 | sunplus 052b:1513 Megapix V4 | 148 | sunplus 052b:1513 Megapix V4 |
146 | sunplus 052b:1803 MegaImage VI | 149 | sunplus 052b:1803 MegaImage VI |
147 | tv8532 0545:808b Veo Stingray | 150 | tv8532 0545:808b Veo Stingray |
@@ -151,6 +154,7 @@ sunplus 0546:3191 Polaroid Ion 80 | |||
151 | sunplus 0546:3273 Polaroid PDC2030 | 154 | sunplus 0546:3273 Polaroid PDC2030 |
152 | ov519 054c:0154 Sonny toy4 | 155 | ov519 054c:0154 Sonny toy4 |
153 | ov519 054c:0155 Sonny toy5 | 156 | ov519 054c:0155 Sonny toy5 |
157 | cpia1 0553:0002 CPIA CPiA (version1) based cameras | ||
154 | zc3xx 055f:c005 Mustek Wcam300A | 158 | zc3xx 055f:c005 Mustek Wcam300A |
155 | spca500 055f:c200 Mustek Gsmart 300 | 159 | spca500 055f:c200 Mustek Gsmart 300 |
156 | sunplus 055f:c211 Kowa Bs888e Microcamera | 160 | sunplus 055f:c211 Kowa Bs888e Microcamera |
@@ -188,8 +192,7 @@ spca500 06bd:0404 Agfa CL20 | |||
188 | spca500 06be:0800 Optimedia | 192 | spca500 06be:0800 Optimedia |
189 | sunplus 06d6:0031 Trust 610 LCD PowerC@m Zoom | 193 | sunplus 06d6:0031 Trust 610 LCD PowerC@m Zoom |
190 | spca506 06e1:a190 ADS Instant VCD | 194 | spca506 06e1:a190 ADS Instant VCD |
191 | ov534 06f8:3002 Hercules Blog Webcam | 195 | ov534_9 06f8:3003 Hercules Dualpix HD Weblog |
192 | ov534 06f8:3003 Hercules Dualpix HD Weblog | ||
193 | sonixj 06f8:3004 Hercules Classic Silver | 196 | sonixj 06f8:3004 Hercules Classic Silver |
194 | sonixj 06f8:3008 Hercules Deluxe Optical Glass | 197 | sonixj 06f8:3008 Hercules Deluxe Optical Glass |
195 | pac7302 06f8:3009 Hercules Classic Link | 198 | pac7302 06f8:3009 Hercules Classic Link |
@@ -204,6 +207,7 @@ sunplus 0733:2221 Mercury Digital Pro 3.1p | |||
204 | sunplus 0733:3261 Concord 3045 spca536a | 207 | sunplus 0733:3261 Concord 3045 spca536a |
205 | sunplus 0733:3281 Cyberpix S550V | 208 | sunplus 0733:3281 Cyberpix S550V |
206 | spca506 0734:043b 3DeMon USB Capture aka | 209 | spca506 0734:043b 3DeMon USB Capture aka |
210 | cpia1 0813:0001 QX3 camera | ||
207 | ov519 0813:0002 Dual Mode USB Camera Plus | 211 | ov519 0813:0002 Dual Mode USB Camera Plus |
208 | spca500 084d:0003 D-Link DSC-350 | 212 | spca500 084d:0003 D-Link DSC-350 |
209 | spca500 08ca:0103 Aiptek PocketDV | 213 | spca500 08ca:0103 Aiptek PocketDV |
@@ -225,7 +229,8 @@ sunplus 08ca:2050 Medion MD 41437 | |||
225 | sunplus 08ca:2060 Aiptek PocketDV5300 | 229 | sunplus 08ca:2060 Aiptek PocketDV5300 |
226 | tv8532 0923:010f ICM532 cams | 230 | tv8532 0923:010f ICM532 cams |
227 | mars 093a:050f Mars-Semi Pc-Camera | 231 | mars 093a:050f Mars-Semi Pc-Camera |
228 | mr97310a 093a:010f Sakar Digital no. 77379 | 232 | mr97310a 093a:010e All known CIF cams with this ID |
233 | mr97310a 093a:010f All known VGA cams with this ID | ||
229 | pac207 093a:2460 Qtec Webcam 100 | 234 | pac207 093a:2460 Qtec Webcam 100 |
230 | pac207 093a:2461 HP Webcam | 235 | pac207 093a:2461 HP Webcam |
231 | pac207 093a:2463 Philips SPC 220 NC | 236 | pac207 093a:2463 Philips SPC 220 NC |
@@ -302,6 +307,7 @@ sonixj 0c45:613b Surfer SN-206 | |||
302 | sonixj 0c45:613c Sonix Pccam168 | 307 | sonixj 0c45:613c Sonix Pccam168 |
303 | sonixj 0c45:6143 Sonix Pccam168 | 308 | sonixj 0c45:6143 Sonix Pccam168 |
304 | sonixj 0c45:6148 Digitus DA-70811/ZSMC USB PC Camera ZS211/Microdia | 309 | sonixj 0c45:6148 Digitus DA-70811/ZSMC USB PC Camera ZS211/Microdia |
310 | sonixj 0c45:614a Frontech E-Ccam (JIL-2225) | ||
305 | sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001) | 311 | sn9c20x 0c45:6240 PC Camera (SN9C201 + MT9M001) |
306 | sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111) | 312 | sn9c20x 0c45:6242 PC Camera (SN9C201 + MT9M111) |
307 | sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655) | 313 | sn9c20x 0c45:6248 PC Camera (SN9C201 + OV9655) |
@@ -324,6 +330,10 @@ sn9c20x 0c45:62b0 PC Camera (SN9C202 + MT9V011/MT9V111/MT9V112) | |||
324 | sn9c20x 0c45:62b3 PC Camera (SN9C202 + OV9655) | 330 | sn9c20x 0c45:62b3 PC Camera (SN9C202 + OV9655) |
325 | sn9c20x 0c45:62bb PC Camera (SN9C202 + OV7660) | 331 | sn9c20x 0c45:62bb PC Camera (SN9C202 + OV7660) |
326 | sn9c20x 0c45:62bc PC Camera (SN9C202 + HV7131R) | 332 | sn9c20x 0c45:62bc PC Camera (SN9C202 + HV7131R) |
333 | sn9c2028 0c45:8001 Wild Planet Digital Spy Camera | ||
334 | sn9c2028 0c45:8003 Sakar #11199, #6637x, #67480 keychain cams | ||
335 | sn9c2028 0c45:8008 Mini-Shotz ms-350 | ||
336 | sn9c2028 0c45:800a Vivitar Vivicam 3350B | ||
327 | sunplus 0d64:0303 Sunplus FashionCam DXG | 337 | sunplus 0d64:0303 Sunplus FashionCam DXG |
328 | ov519 0e96:c001 TRUST 380 USB2 SPACEC@M | 338 | ov519 0e96:c001 TRUST 380 USB2 SPACEC@M |
329 | etoms 102c:6151 Qcam Sangha CIF | 339 | etoms 102c:6151 Qcam Sangha CIF |
@@ -341,10 +351,11 @@ spca501 1776:501c Arowana 300K CMOS Camera | |||
341 | t613 17a1:0128 TASCORP JPEG Webcam, NGS Cyclops | 351 | t613 17a1:0128 TASCORP JPEG Webcam, NGS Cyclops |
342 | vc032x 17ef:4802 Lenovo Vc0323+MI1310_SOC | 352 | vc032x 17ef:4802 Lenovo Vc0323+MI1310_SOC |
343 | pac207 2001:f115 D-Link DSB-C120 | 353 | pac207 2001:f115 D-Link DSB-C120 |
344 | sq905c 2770:9050 sq905c | 354 | sq905c 2770:9050 Disney pix micro (CIF) |
345 | sq905c 2770:905c DualCamera | 355 | sq905c 2770:9052 Disney pix micro 2 (VGA) |
346 | sq905 2770:9120 Argus Digital Camera DC1512 | 356 | sq905c 2770:905c All 11 known cameras with this ID |
347 | sq905c 2770:913d sq905c | 357 | sq905 2770:9120 All 24 known cameras with this ID |
358 | sq905c 2770:913d All 4 known cameras with this ID | ||
348 | spca500 2899:012c Toptro Industrial | 359 | spca500 2899:012c Toptro Industrial |
349 | ov519 8020:ef04 ov519 | 360 | ov519 8020:ef04 ov519 |
350 | spca508 8086:0110 Intel Easy PC Camera | 361 | spca508 8086:0110 Intel Easy PC Camera |
diff --git a/Documentation/video4linux/v4l2-framework.txt b/Documentation/video4linux/v4l2-framework.txt index 74d677c8b036..5155700c206b 100644 --- a/Documentation/video4linux/v4l2-framework.txt +++ b/Documentation/video4linux/v4l2-framework.txt | |||
@@ -599,99 +599,13 @@ video_device::minor fields. | |||
599 | video buffer helper functions | 599 | video buffer helper functions |
600 | ----------------------------- | 600 | ----------------------------- |
601 | 601 | ||
602 | The v4l2 core API provides a standard method for dealing with video | 602 | The v4l2 core API provides a set of standard methods (called "videobuf") |
603 | buffers. Those methods allow a driver to implement read(), mmap() and | 603 | for dealing with video buffers. Those methods allow a driver to implement |
604 | overlay() on a consistent way. | 604 | read(), mmap() and overlay() in a consistent way. There are currently |
605 | 605 | methods for using video buffers on devices that supports DMA with | |
606 | There are currently methods for using video buffers on devices that | 606 | scatter/gather method (videobuf-dma-sg), DMA with linear access |
607 | supports DMA with scatter/gather method (videobuf-dma-sg), DMA with | 607 | (videobuf-dma-contig), and vmalloced buffers, mostly used on USB drivers |
608 | linear access (videobuf-dma-contig), and vmalloced buffers, mostly | 608 | (videobuf-vmalloc). |
609 | used on USB drivers (videobuf-vmalloc). | 609 | |
610 | 610 | Please see Documentation/video4linux/videobuf for more information on how | |
611 | Any driver using videobuf should provide operations (callbacks) for | 611 | to use the videobuf layer. |
612 | four handlers: | ||
613 | |||
614 | ops->buf_setup - calculates the size of the video buffers and avoid they | ||
615 | to waste more than some maximum limit of RAM; | ||
616 | ops->buf_prepare - fills the video buffer structs and calls | ||
617 | videobuf_iolock() to alloc and prepare mmaped memory; | ||
618 | ops->buf_queue - advices the driver that another buffer were | ||
619 | requested (by read() or by QBUF); | ||
620 | ops->buf_release - frees any buffer that were allocated. | ||
621 | |||
622 | In order to use it, the driver need to have a code (generally called at | ||
623 | interrupt context) that will properly handle the buffer request lists, | ||
624 | announcing that a new buffer were filled. | ||
625 | |||
626 | The irq handling code should handle the videobuf task lists, in order | ||
627 | to advice videobuf that a new frame were filled, in order to honor to a | ||
628 | request. The code is generally like this one: | ||
629 | if (list_empty(&dma_q->active)) | ||
630 | return; | ||
631 | |||
632 | buf = list_entry(dma_q->active.next, struct vbuffer, vb.queue); | ||
633 | |||
634 | if (!waitqueue_active(&buf->vb.done)) | ||
635 | return; | ||
636 | |||
637 | /* Some logic to handle the buf may be needed here */ | ||
638 | |||
639 | list_del(&buf->vb.queue); | ||
640 | do_gettimeofday(&buf->vb.ts); | ||
641 | wake_up(&buf->vb.done); | ||
642 | |||
643 | Those are the videobuffer functions used on drivers, implemented on | ||
644 | videobuf-core: | ||
645 | |||
646 | - Videobuf init functions | ||
647 | videobuf_queue_sg_init() | ||
648 | Initializes the videobuf infrastructure. This function should be | ||
649 | called before any other videobuf function on drivers that uses DMA | ||
650 | Scatter/Gather buffers. | ||
651 | |||
652 | videobuf_queue_dma_contig_init | ||
653 | Initializes the videobuf infrastructure. This function should be | ||
654 | called before any other videobuf function on drivers that need DMA | ||
655 | contiguous buffers. | ||
656 | |||
657 | videobuf_queue_vmalloc_init() | ||
658 | Initializes the videobuf infrastructure. This function should be | ||
659 | called before any other videobuf function on USB (and other drivers) | ||
660 | that need a vmalloced type of videobuf. | ||
661 | |||
662 | - videobuf_iolock() | ||
663 | Prepares the videobuf memory for the proper method (read, mmap, overlay). | ||
664 | |||
665 | - videobuf_queue_is_busy() | ||
666 | Checks if a videobuf is streaming. | ||
667 | |||
668 | - videobuf_queue_cancel() | ||
669 | Stops video handling. | ||
670 | |||
671 | - videobuf_mmap_free() | ||
672 | frees mmap buffers. | ||
673 | |||
674 | - videobuf_stop() | ||
675 | Stops video handling, ends mmap and frees mmap and other buffers. | ||
676 | |||
677 | - V4L2 api functions. Those functions correspond to VIDIOC_foo ioctls: | ||
678 | videobuf_reqbufs(), videobuf_querybuf(), videobuf_qbuf(), | ||
679 | videobuf_dqbuf(), videobuf_streamon(), videobuf_streamoff(). | ||
680 | |||
681 | - V4L1 api function (corresponds to VIDIOCMBUF ioctl): | ||
682 | videobuf_cgmbuf() | ||
683 | This function is used to provide backward compatibility with V4L1 | ||
684 | API. | ||
685 | |||
686 | - Some help functions for read()/poll() operations: | ||
687 | videobuf_read_stream() | ||
688 | For continuous stream read() | ||
689 | videobuf_read_one() | ||
690 | For snapshot read() | ||
691 | videobuf_poll_stream() | ||
692 | polling help function | ||
693 | |||
694 | The better way to understand it is to take a look at vivi driver. One | ||
695 | of the main reasons for vivi is to be a videobuf usage example. the | ||
696 | vivi_thread_tick() does the task that the IRQ callback would do on PCI | ||
697 | drivers (or the irq callback on USB). | ||
diff --git a/Documentation/video4linux/videobuf b/Documentation/video4linux/videobuf new file mode 100644 index 000000000000..17a1f9abf260 --- /dev/null +++ b/Documentation/video4linux/videobuf | |||
@@ -0,0 +1,360 @@ | |||
1 | An introduction to the videobuf layer | ||
2 | Jonathan Corbet <corbet@lwn.net> | ||
3 | Current as of 2.6.33 | ||
4 | |||
5 | The videobuf layer functions as a sort of glue layer between a V4L2 driver | ||
6 | and user space. It handles the allocation and management of buffers for | ||
7 | the storage of video frames. There is a set of functions which can be used | ||
8 | to implement many of the standard POSIX I/O system calls, including read(), | ||
9 | poll(), and, happily, mmap(). Another set of functions can be used to | ||
10 | implement the bulk of the V4L2 ioctl() calls related to streaming I/O, | ||
11 | including buffer allocation, queueing and dequeueing, and streaming | ||
12 | control. Using videobuf imposes a few design decisions on the driver | ||
13 | author, but the payback comes in the form of reduced code in the driver and | ||
14 | a consistent implementation of the V4L2 user-space API. | ||
15 | |||
16 | Buffer types | ||
17 | |||
18 | Not all video devices use the same kind of buffers. In fact, there are (at | ||
19 | least) three common variations: | ||
20 | |||
21 | - Buffers which are scattered in both the physical and (kernel) virtual | ||
22 | address spaces. (Almost) all user-space buffers are like this, but it | ||
23 | makes great sense to allocate kernel-space buffers this way as well when | ||
24 | it is possible. Unfortunately, it is not always possible; working with | ||
25 | this kind of buffer normally requires hardware which can do | ||
26 | scatter/gather DMA operations. | ||
27 | |||
28 | - Buffers which are physically scattered, but which are virtually | ||
29 | contiguous; buffers allocated with vmalloc(), in other words. These | ||
30 | buffers are just as hard to use for DMA operations, but they can be | ||
31 | useful in situations where DMA is not available but virtually-contiguous | ||
32 | buffers are convenient. | ||
33 | |||
34 | - Buffers which are physically contiguous. Allocation of this kind of | ||
35 | buffer can be unreliable on fragmented systems, but simpler DMA | ||
36 | controllers cannot deal with anything else. | ||
37 | |||
38 | Videobuf can work with all three types of buffers, but the driver author | ||
39 | must pick one at the outset and design the driver around that decision. | ||
40 | |||
41 | [It's worth noting that there's a fourth kind of buffer: "overlay" buffers | ||
42 | which are located within the system's video memory. The overlay | ||
43 | functionality is considered to be deprecated for most use, but it still | ||
44 | shows up occasionally in system-on-chip drivers where the performance | ||
45 | benefits merit the use of this technique. Overlay buffers can be handled | ||
46 | as a form of scattered buffer, but there are very few implementations in | ||
47 | the kernel and a description of this technique is currently beyond the | ||
48 | scope of this document.] | ||
49 | |||
50 | Data structures, callbacks, and initialization | ||
51 | |||
52 | Depending on which type of buffers are being used, the driver should | ||
53 | include one of the following files: | ||
54 | |||
55 | <media/videobuf-dma-sg.h> /* Physically scattered */ | ||
56 | <media/videobuf-vmalloc.h> /* vmalloc() buffers */ | ||
57 | <media/videobuf-dma-contig.h> /* Physically contiguous */ | ||
58 | |||
59 | The driver's data structure describing a V4L2 device should include a | ||
60 | struct videobuf_queue instance for the management of the buffer queue, | ||
61 | along with a list_head for the queue of available buffers. There will also | ||
62 | need to be an interrupt-safe spinlock which is used to protect (at least) | ||
63 | the queue. | ||
64 | |||
65 | The next step is to write four simple callbacks to help videobuf deal with | ||
66 | the management of buffers: | ||
67 | |||
68 | struct videobuf_queue_ops { | ||
69 | int (*buf_setup)(struct videobuf_queue *q, | ||
70 | unsigned int *count, unsigned int *size); | ||
71 | int (*buf_prepare)(struct videobuf_queue *q, | ||
72 | struct videobuf_buffer *vb, | ||
73 | enum v4l2_field field); | ||
74 | void (*buf_queue)(struct videobuf_queue *q, | ||
75 | struct videobuf_buffer *vb); | ||
76 | void (*buf_release)(struct videobuf_queue *q, | ||
77 | struct videobuf_buffer *vb); | ||
78 | }; | ||
79 | |||
80 | buf_setup() is called early in the I/O process, when streaming is being | ||
81 | initiated; its purpose is to tell videobuf about the I/O stream. The count | ||
82 | parameter will be a suggested number of buffers to use; the driver should | ||
83 | check it for rationality and adjust it if need be. As a practical rule, a | ||
84 | minimum of two buffers are needed for proper streaming, and there is | ||
85 | usually a maximum (which cannot exceed 32) which makes sense for each | ||
86 | device. The size parameter should be set to the expected (maximum) size | ||
87 | for each frame of data. | ||
88 | |||
89 | Each buffer (in the form of a struct videobuf_buffer pointer) will be | ||
90 | passed to buf_prepare(), which should set the buffer's size, width, height, | ||
91 | and field fields properly. If the buffer's state field is | ||
92 | VIDEOBUF_NEEDS_INIT, the driver should pass it to: | ||
93 | |||
94 | int videobuf_iolock(struct videobuf_queue* q, struct videobuf_buffer *vb, | ||
95 | struct v4l2_framebuffer *fbuf); | ||
96 | |||
97 | Among other things, this call will usually allocate memory for the buffer. | ||
98 | Finally, the buf_prepare() function should set the buffer's state to | ||
99 | VIDEOBUF_PREPARED. | ||
100 | |||
101 | When a buffer is queued for I/O, it is passed to buf_queue(), which should | ||
102 | put it onto the driver's list of available buffers and set its state to | ||
103 | VIDEOBUF_QUEUED. Note that this function is called with the queue spinlock | ||
104 | held; if it tries to acquire it as well things will come to a screeching | ||
105 | halt. Yes, this is the voice of experience. Note also that videobuf may | ||
106 | wait on the first buffer in the queue; placing other buffers in front of it | ||
107 | could again gum up the works. So use list_add_tail() to enqueue buffers. | ||
108 | |||
109 | Finally, buf_release() is called when a buffer is no longer intended to be | ||
110 | used. The driver should ensure that there is no I/O active on the buffer, | ||
111 | then pass it to the appropriate free routine(s): | ||
112 | |||
113 | /* Scatter/gather drivers */ | ||
114 | int videobuf_dma_unmap(struct videobuf_queue *q, | ||
115 | struct videobuf_dmabuf *dma); | ||
116 | int videobuf_dma_free(struct videobuf_dmabuf *dma); | ||
117 | |||
118 | /* vmalloc drivers */ | ||
119 | void videobuf_vmalloc_free (struct videobuf_buffer *buf); | ||
120 | |||
121 | /* Contiguous drivers */ | ||
122 | void videobuf_dma_contig_free(struct videobuf_queue *q, | ||
123 | struct videobuf_buffer *buf); | ||
124 | |||
125 | One way to ensure that a buffer is no longer under I/O is to pass it to: | ||
126 | |||
127 | int videobuf_waiton(struct videobuf_buffer *vb, int non_blocking, int intr); | ||
128 | |||
129 | Here, vb is the buffer, non_blocking indicates whether non-blocking I/O | ||
130 | should be used (it should be zero in the buf_release() case), and intr | ||
131 | controls whether an interruptible wait is used. | ||
132 | |||
133 | File operations | ||
134 | |||
135 | At this point, much of the work is done; much of the rest is slipping | ||
136 | videobuf calls into the implementation of the other driver callbacks. The | ||
137 | first step is in the open() function, which must initialize the | ||
138 | videobuf queue. The function to use depends on the type of buffer used: | ||
139 | |||
140 | void videobuf_queue_sg_init(struct videobuf_queue *q, | ||
141 | struct videobuf_queue_ops *ops, | ||
142 | struct device *dev, | ||
143 | spinlock_t *irqlock, | ||
144 | enum v4l2_buf_type type, | ||
145 | enum v4l2_field field, | ||
146 | unsigned int msize, | ||
147 | void *priv); | ||
148 | |||
149 | void videobuf_queue_vmalloc_init(struct videobuf_queue *q, | ||
150 | struct videobuf_queue_ops *ops, | ||
151 | struct device *dev, | ||
152 | spinlock_t *irqlock, | ||
153 | enum v4l2_buf_type type, | ||
154 | enum v4l2_field field, | ||
155 | unsigned int msize, | ||
156 | void *priv); | ||
157 | |||
158 | void videobuf_queue_dma_contig_init(struct videobuf_queue *q, | ||
159 | struct videobuf_queue_ops *ops, | ||
160 | struct device *dev, | ||
161 | spinlock_t *irqlock, | ||
162 | enum v4l2_buf_type type, | ||
163 | enum v4l2_field field, | ||
164 | unsigned int msize, | ||
165 | void *priv); | ||
166 | |||
167 | In each case, the parameters are the same: q is the queue structure for the | ||
168 | device, ops is the set of callbacks as described above, dev is the device | ||
169 | structure for this video device, irqlock is an interrupt-safe spinlock to | ||
170 | protect access to the data structures, type is the buffer type used by the | ||
171 | device (cameras will use V4L2_BUF_TYPE_VIDEO_CAPTURE, for example), field | ||
172 | describes which field is being captured (often V4L2_FIELD_NONE for | ||
173 | progressive devices), msize is the size of any containing structure used | ||
174 | around struct videobuf_buffer, and priv is a private data pointer which | ||
175 | shows up in the priv_data field of struct videobuf_queue. Note that these | ||
176 | are void functions which, evidently, are immune to failure. | ||
177 | |||
178 | V4L2 capture drivers can be written to support either of two APIs: the | ||
179 | read() system call and the rather more complicated streaming mechanism. As | ||
180 | a general rule, it is necessary to support both to ensure that all | ||
181 | applications have a chance of working with the device. Videobuf makes it | ||
182 | easy to do that with the same code. To implement read(), the driver need | ||
183 | only make a call to one of: | ||
184 | |||
185 | ssize_t videobuf_read_one(struct videobuf_queue *q, | ||
186 | char __user *data, size_t count, | ||
187 | loff_t *ppos, int nonblocking); | ||
188 | |||
189 | ssize_t videobuf_read_stream(struct videobuf_queue *q, | ||
190 | char __user *data, size_t count, | ||
191 | loff_t *ppos, int vbihack, int nonblocking); | ||
192 | |||
193 | Either one of these functions will read frame data into data, returning the | ||
194 | amount actually read; the difference is that videobuf_read_one() will only | ||
195 | read a single frame, while videobuf_read_stream() will read multiple frames | ||
196 | if they are needed to satisfy the count requested by the application. A | ||
197 | typical driver read() implementation will start the capture engine, call | ||
198 | one of the above functions, then stop the engine before returning (though a | ||
199 | smarter implementation might leave the engine running for a little while in | ||
200 | anticipation of another read() call happening in the near future). | ||
201 | |||
202 | The poll() function can usually be implemented with a direct call to: | ||
203 | |||
204 | unsigned int videobuf_poll_stream(struct file *file, | ||
205 | struct videobuf_queue *q, | ||
206 | poll_table *wait); | ||
207 | |||
208 | Note that the actual wait queue eventually used will be the one associated | ||
209 | with the first available buffer. | ||
210 | |||
211 | When streaming I/O is done to kernel-space buffers, the driver must support | ||
212 | the mmap() system call to enable user space to access the data. In many | ||
213 | V4L2 drivers, the often-complex mmap() implementation simplifies to a | ||
214 | single call to: | ||
215 | |||
216 | int videobuf_mmap_mapper(struct videobuf_queue *q, | ||
217 | struct vm_area_struct *vma); | ||
218 | |||
219 | Everything else is handled by the videobuf code. | ||
220 | |||
221 | The release() function requires two separate videobuf calls: | ||
222 | |||
223 | void videobuf_stop(struct videobuf_queue *q); | ||
224 | int videobuf_mmap_free(struct videobuf_queue *q); | ||
225 | |||
226 | The call to videobuf_stop() terminates any I/O in progress - though it is | ||
227 | still up to the driver to stop the capture engine. The call to | ||
228 | videobuf_mmap_free() will ensure that all buffers have been unmapped; if | ||
229 | so, they will all be passed to the buf_release() callback. If buffers | ||
230 | remain mapped, videobuf_mmap_free() returns an error code instead. The | ||
231 | purpose is clearly to cause the closing of the file descriptor to fail if | ||
232 | buffers are still mapped, but every driver in the 2.6.32 kernel cheerfully | ||
233 | ignores its return value. | ||
234 | |||
235 | ioctl() operations | ||
236 | |||
237 | The V4L2 API includes a very long list of driver callbacks to respond to | ||
238 | the many ioctl() commands made available to user space. A number of these | ||
239 | - those associated with streaming I/O - turn almost directly into videobuf | ||
240 | calls. The relevant helper functions are: | ||
241 | |||
242 | int videobuf_reqbufs(struct videobuf_queue *q, | ||
243 | struct v4l2_requestbuffers *req); | ||
244 | int videobuf_querybuf(struct videobuf_queue *q, struct v4l2_buffer *b); | ||
245 | int videobuf_qbuf(struct videobuf_queue *q, struct v4l2_buffer *b); | ||
246 | int videobuf_dqbuf(struct videobuf_queue *q, struct v4l2_buffer *b, | ||
247 | int nonblocking); | ||
248 | int videobuf_streamon(struct videobuf_queue *q); | ||
249 | int videobuf_streamoff(struct videobuf_queue *q); | ||
250 | int videobuf_cgmbuf(struct videobuf_queue *q, struct video_mbuf *mbuf, | ||
251 | int count); | ||
252 | |||
253 | So, for example, a VIDIOC_REQBUFS call turns into a call to the driver's | ||
254 | vidioc_reqbufs() callback which, in turn, usually only needs to locate the | ||
255 | proper struct videobuf_queue pointer and pass it to videobuf_reqbufs(). | ||
256 | These support functions can replace a great deal of buffer management | ||
257 | boilerplate in a lot of V4L2 drivers. | ||
258 | |||
259 | The vidioc_streamon() and vidioc_streamoff() functions will be a bit more | ||
260 | complex, of course, since they will also need to deal with starting and | ||
261 | stopping the capture engine. videobuf_cgmbuf(), called from the driver's | ||
262 | vidiocgmbuf() function, only exists if the V4L1 compatibility module has | ||
263 | been selected with CONFIG_VIDEO_V4L1_COMPAT, so its use must be surrounded | ||
264 | with #ifdef directives. | ||
265 | |||
266 | Buffer allocation | ||
267 | |||
268 | Thus far, we have talked about buffers, but have not looked at how they are | ||
269 | allocated. The scatter/gather case is the most complex on this front. For | ||
270 | allocation, the driver can leave buffer allocation entirely up to the | ||
271 | videobuf layer; in this case, buffers will be allocated as anonymous | ||
272 | user-space pages and will be very scattered indeed. If the application is | ||
273 | using user-space buffers, no allocation is needed; the videobuf layer will | ||
274 | take care of calling get_user_pages() and filling in the scatterlist array. | ||
275 | |||
276 | If the driver needs to do its own memory allocation, it should be done in | ||
277 | the vidioc_reqbufs() function, *after* calling videobuf_reqbufs(). The | ||
278 | first step is a call to: | ||
279 | |||
280 | struct videobuf_dmabuf *videobuf_to_dma(struct videobuf_buffer *buf); | ||
281 | |||
282 | The returned videobuf_dmabuf structure (defined in | ||
283 | <media/videobuf-dma-sg.h>) includes a couple of relevant fields: | ||
284 | |||
285 | struct scatterlist *sglist; | ||
286 | int sglen; | ||
287 | |||
288 | The driver must allocate an appropriately-sized scatterlist array and | ||
289 | populate it with pointers to the pieces of the allocated buffer; sglen | ||
290 | should be set to the length of the array. | ||
291 | |||
292 | Drivers using the vmalloc() method need not (and cannot) concern themselves | ||
293 | with buffer allocation at all; videobuf will handle those details. The | ||
294 | same is normally true of contiguous-DMA drivers as well; videobuf will | ||
295 | allocate the buffers (with dma_alloc_coherent()) when it sees fit. That | ||
296 | means that these drivers may be trying to do high-order allocations at any | ||
297 | time, an operation which is not always guaranteed to work. Some drivers | ||
298 | play tricks by allocating DMA space at system boot time; videobuf does not | ||
299 | currently play well with those drivers. | ||
300 | |||
301 | As of 2.6.31, contiguous-DMA drivers can work with a user-supplied buffer, | ||
302 | as long as that buffer is physically contiguous. Normal user-space | ||
303 | allocations will not meet that criterion, but buffers obtained from other | ||
304 | kernel drivers, or those contained within huge pages, will work with these | ||
305 | drivers. | ||
306 | |||
307 | Filling the buffers | ||
308 | |||
309 | The final part of a videobuf implementation has no direct callback - it's | ||
310 | the portion of the code which actually puts frame data into the buffers, | ||
311 | usually in response to interrupts from the device. For all types of | ||
312 | drivers, this process works approximately as follows: | ||
313 | |||
314 | - Obtain the next available buffer and make sure that somebody is actually | ||
315 | waiting for it. | ||
316 | |||
317 | - Get a pointer to the memory and put video data there. | ||
318 | |||
319 | - Mark the buffer as done and wake up the process waiting for it. | ||
320 | |||
321 | Step (1) above is done by looking at the driver-managed list_head structure | ||
322 | - the one which is filled in the buf_queue() callback. Because starting | ||
323 | the engine and enqueueing buffers are done in separate steps, it's possible | ||
324 | for the engine to be running without any buffers available - in the | ||
325 | vmalloc() case especially. So the driver should be prepared for the list | ||
326 | to be empty. It is equally possible that nobody is yet interested in the | ||
327 | buffer; the driver should not remove it from the list or fill it until a | ||
328 | process is waiting on it. That test can be done by examining the buffer's | ||
329 | done field (a wait_queue_head_t structure) with waitqueue_active(). | ||
330 | |||
331 | A buffer's state should be set to VIDEOBUF_ACTIVE before being mapped for | ||
332 | DMA; that ensures that the videobuf layer will not try to do anything with | ||
333 | it while the device is transferring data. | ||
334 | |||
335 | For scatter/gather drivers, the needed memory pointers will be found in the | ||
336 | scatterlist structure described above. Drivers using the vmalloc() method | ||
337 | can get a memory pointer with: | ||
338 | |||
339 | void *videobuf_to_vmalloc(struct videobuf_buffer *buf); | ||
340 | |||
341 | For contiguous DMA drivers, the function to use is: | ||
342 | |||
343 | dma_addr_t videobuf_to_dma_contig(struct videobuf_buffer *buf); | ||
344 | |||
345 | The contiguous DMA API goes out of its way to hide the kernel-space address | ||
346 | of the DMA buffer from drivers. | ||
347 | |||
348 | The final step is to set the size field of the relevant videobuf_buffer | ||
349 | structure to the actual size of the captured image, set state to | ||
350 | VIDEOBUF_DONE, then call wake_up() on the done queue. At this point, the | ||
351 | buffer is owned by the videobuf layer and the driver should not touch it | ||
352 | again. | ||
353 | |||
354 | Developers who are interested in more information can go into the relevant | ||
355 | header files; there are a few low-level functions declared there which have | ||
356 | not been talked about here. Also worthwhile is the vivi driver | ||
357 | (drivers/media/video/vivi.c), which is maintained as an example of how V4L2 | ||
358 | drivers should be written. Vivi only uses the vmalloc() API, but it's good | ||
359 | enough to get started with. Note also that all of these calls are exported | ||
360 | GPL-only, so they will not be available to non-GPL kernel modules. | ||
diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX index e57d6a9dd32b..dca82d7c83d8 100644 --- a/Documentation/vm/00-INDEX +++ b/Documentation/vm/00-INDEX | |||
@@ -4,23 +4,35 @@ active_mm.txt | |||
4 | - An explanation from Linus about tsk->active_mm vs tsk->mm. | 4 | - An explanation from Linus about tsk->active_mm vs tsk->mm. |
5 | balance | 5 | balance |
6 | - various information on memory balancing. | 6 | - various information on memory balancing. |
7 | hugepage-mmap.c | ||
8 | - Example app using huge page memory with the mmap system call. | ||
9 | hugepage-shm.c | ||
10 | - Example app using huge page memory with Sys V shared memory system calls. | ||
7 | hugetlbpage.txt | 11 | hugetlbpage.txt |
8 | - a brief summary of hugetlbpage support in the Linux kernel. | 12 | - a brief summary of hugetlbpage support in the Linux kernel. |
13 | hwpoison.txt | ||
14 | - explains what hwpoison is | ||
9 | ksm.txt | 15 | ksm.txt |
10 | - how to use the Kernel Samepage Merging feature. | 16 | - how to use the Kernel Samepage Merging feature. |
11 | locking | 17 | locking |
12 | - info on how locking and synchronization is done in the Linux vm code. | 18 | - info on how locking and synchronization is done in the Linux vm code. |
19 | map_hugetlb.c | ||
20 | - an example program that uses the MAP_HUGETLB mmap flag. | ||
13 | numa | 21 | numa |
14 | - information about NUMA specific code in the Linux vm. | 22 | - information about NUMA specific code in the Linux vm. |
15 | numa_memory_policy.txt | 23 | numa_memory_policy.txt |
16 | - documentation of concepts and APIs of the 2.6 memory policy support. | 24 | - documentation of concepts and APIs of the 2.6 memory policy support. |
17 | overcommit-accounting | 25 | overcommit-accounting |
18 | - description of the Linux kernels overcommit handling modes. | 26 | - description of the Linux kernels overcommit handling modes. |
27 | page-types.c | ||
28 | - Tool for querying page flags | ||
19 | page_migration | 29 | page_migration |
20 | - description of page migration in NUMA systems. | 30 | - description of page migration in NUMA systems. |
31 | pagemap.txt | ||
32 | - pagemap, from the userspace perspective | ||
21 | slabinfo.c | 33 | slabinfo.c |
22 | - source code for a tool to get reports about slabs. | 34 | - source code for a tool to get reports about slabs. |
23 | slub.txt | 35 | slub.txt |
24 | - a short users guide for SLUB. | 36 | - a short users guide for SLUB. |
25 | map_hugetlb.c | 37 | unevictable-lru.txt |
26 | - an example program that uses the MAP_HUGETLB mmap flag. | 38 | - Unevictable LRU infrastructure |
diff --git a/Documentation/vm/Makefile b/Documentation/vm/Makefile index 5bd269b3731a..9dcff328b964 100644 --- a/Documentation/vm/Makefile +++ b/Documentation/vm/Makefile | |||
@@ -2,7 +2,7 @@ | |||
2 | obj- := dummy.o | 2 | obj- := dummy.o |
3 | 3 | ||
4 | # List of programs to build | 4 | # List of programs to build |
5 | hostprogs-y := slabinfo page-types | 5 | hostprogs-y := slabinfo page-types hugepage-mmap hugepage-shm map_hugetlb |
6 | 6 | ||
7 | # Tell kbuild to always build the programs | 7 | # Tell kbuild to always build the programs |
8 | always := $(hostprogs-y) | 8 | always := $(hostprogs-y) |
diff --git a/Documentation/vm/hugepage-mmap.c b/Documentation/vm/hugepage-mmap.c new file mode 100644 index 000000000000..db0dd9a33d54 --- /dev/null +++ b/Documentation/vm/hugepage-mmap.c | |||
@@ -0,0 +1,91 @@ | |||
1 | /* | ||
2 | * hugepage-mmap: | ||
3 | * | ||
4 | * Example of using huge page memory in a user application using the mmap | ||
5 | * system call. Before running this application, make sure that the | ||
6 | * administrator has mounted the hugetlbfs filesystem (on some directory | ||
7 | * like /mnt) using the command mount -t hugetlbfs nodev /mnt. In this | ||
8 | * example, the app is requesting memory of size 256MB that is backed by | ||
9 | * huge pages. | ||
10 | * | ||
11 | * For the ia64 architecture, the Linux kernel reserves Region number 4 for | ||
12 | * huge pages. That means that if one requires a fixed address, a huge page | ||
13 | * aligned address starting with 0x800000... will be required. If a fixed | ||
14 | * address is not required, the kernel will select an address in the proper | ||
15 | * range. | ||
16 | * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. | ||
17 | */ | ||
18 | |||
19 | #include <stdlib.h> | ||
20 | #include <stdio.h> | ||
21 | #include <unistd.h> | ||
22 | #include <sys/mman.h> | ||
23 | #include <fcntl.h> | ||
24 | |||
25 | #define FILE_NAME "/mnt/hugepagefile" | ||
26 | #define LENGTH (256UL*1024*1024) | ||
27 | #define PROTECTION (PROT_READ | PROT_WRITE) | ||
28 | |||
29 | /* Only ia64 requires this */ | ||
30 | #ifdef __ia64__ | ||
31 | #define ADDR (void *)(0x8000000000000000UL) | ||
32 | #define FLAGS (MAP_SHARED | MAP_FIXED) | ||
33 | #else | ||
34 | #define ADDR (void *)(0x0UL) | ||
35 | #define FLAGS (MAP_SHARED) | ||
36 | #endif | ||
37 | |||
38 | static void check_bytes(char *addr) | ||
39 | { | ||
40 | printf("First hex is %x\n", *((unsigned int *)addr)); | ||
41 | } | ||
42 | |||
43 | static void write_bytes(char *addr) | ||
44 | { | ||
45 | unsigned long i; | ||
46 | |||
47 | for (i = 0; i < LENGTH; i++) | ||
48 | *(addr + i) = (char)i; | ||
49 | } | ||
50 | |||
51 | static void read_bytes(char *addr) | ||
52 | { | ||
53 | unsigned long i; | ||
54 | |||
55 | check_bytes(addr); | ||
56 | for (i = 0; i < LENGTH; i++) | ||
57 | if (*(addr + i) != (char)i) { | ||
58 | printf("Mismatch at %lu\n", i); | ||
59 | break; | ||
60 | } | ||
61 | } | ||
62 | |||
63 | int main(void) | ||
64 | { | ||
65 | void *addr; | ||
66 | int fd; | ||
67 | |||
68 | fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755); | ||
69 | if (fd < 0) { | ||
70 | perror("Open failed"); | ||
71 | exit(1); | ||
72 | } | ||
73 | |||
74 | addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0); | ||
75 | if (addr == MAP_FAILED) { | ||
76 | perror("mmap"); | ||
77 | unlink(FILE_NAME); | ||
78 | exit(1); | ||
79 | } | ||
80 | |||
81 | printf("Returned address is %p\n", addr); | ||
82 | check_bytes(addr); | ||
83 | write_bytes(addr); | ||
84 | read_bytes(addr); | ||
85 | |||
86 | munmap(addr, LENGTH); | ||
87 | close(fd); | ||
88 | unlink(FILE_NAME); | ||
89 | |||
90 | return 0; | ||
91 | } | ||
diff --git a/Documentation/vm/hugepage-shm.c b/Documentation/vm/hugepage-shm.c new file mode 100644 index 000000000000..07956d8592c9 --- /dev/null +++ b/Documentation/vm/hugepage-shm.c | |||
@@ -0,0 +1,98 @@ | |||
1 | /* | ||
2 | * hugepage-shm: | ||
3 | * | ||
4 | * Example of using huge page memory in a user application using Sys V shared | ||
5 | * memory system calls. In this example the app is requesting 256MB of | ||
6 | * memory that is backed by huge pages. The application uses the flag | ||
7 | * SHM_HUGETLB in the shmget system call to inform the kernel that it is | ||
8 | * requesting huge pages. | ||
9 | * | ||
10 | * For the ia64 architecture, the Linux kernel reserves Region number 4 for | ||
11 | * huge pages. That means that if one requires a fixed address, a huge page | ||
12 | * aligned address starting with 0x800000... will be required. If a fixed | ||
13 | * address is not required, the kernel will select an address in the proper | ||
14 | * range. | ||
15 | * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. | ||
16 | * | ||
17 | * Note: The default shared memory limit is quite low on many kernels, | ||
18 | * you may need to increase it via: | ||
19 | * | ||
20 | * echo 268435456 > /proc/sys/kernel/shmmax | ||
21 | * | ||
22 | * This will increase the maximum size per shared memory segment to 256MB. | ||
23 | * The other limit that you will hit eventually is shmall which is the | ||
24 | * total amount of shared memory in pages. To set it to 16GB on a system | ||
25 | * with a 4kB pagesize do: | ||
26 | * | ||
27 | * echo 4194304 > /proc/sys/kernel/shmall | ||
28 | */ | ||
29 | |||
30 | #include <stdlib.h> | ||
31 | #include <stdio.h> | ||
32 | #include <sys/types.h> | ||
33 | #include <sys/ipc.h> | ||
34 | #include <sys/shm.h> | ||
35 | #include <sys/mman.h> | ||
36 | |||
37 | #ifndef SHM_HUGETLB | ||
38 | #define SHM_HUGETLB 04000 | ||
39 | #endif | ||
40 | |||
41 | #define LENGTH (256UL*1024*1024) | ||
42 | |||
43 | #define dprintf(x) printf(x) | ||
44 | |||
45 | /* Only ia64 requires this */ | ||
46 | #ifdef __ia64__ | ||
47 | #define ADDR (void *)(0x8000000000000000UL) | ||
48 | #define SHMAT_FLAGS (SHM_RND) | ||
49 | #else | ||
50 | #define ADDR (void *)(0x0UL) | ||
51 | #define SHMAT_FLAGS (0) | ||
52 | #endif | ||
53 | |||
54 | int main(void) | ||
55 | { | ||
56 | int shmid; | ||
57 | unsigned long i; | ||
58 | char *shmaddr; | ||
59 | |||
60 | if ((shmid = shmget(2, LENGTH, | ||
61 | SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) { | ||
62 | perror("shmget"); | ||
63 | exit(1); | ||
64 | } | ||
65 | printf("shmid: 0x%x\n", shmid); | ||
66 | |||
67 | shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS); | ||
68 | if (shmaddr == (char *)-1) { | ||
69 | perror("Shared memory attach failure"); | ||
70 | shmctl(shmid, IPC_RMID, NULL); | ||
71 | exit(2); | ||
72 | } | ||
73 | printf("shmaddr: %p\n", shmaddr); | ||
74 | |||
75 | dprintf("Starting the writes:\n"); | ||
76 | for (i = 0; i < LENGTH; i++) { | ||
77 | shmaddr[i] = (char)(i); | ||
78 | if (!(i % (1024 * 1024))) | ||
79 | dprintf("."); | ||
80 | } | ||
81 | dprintf("\n"); | ||
82 | |||
83 | dprintf("Starting the Check..."); | ||
84 | for (i = 0; i < LENGTH; i++) | ||
85 | if (shmaddr[i] != (char)i) | ||
86 | printf("\nIndex %lu mismatched\n", i); | ||
87 | dprintf("Done.\n"); | ||
88 | |||
89 | if (shmdt((const void *)shmaddr) != 0) { | ||
90 | perror("Detach failure"); | ||
91 | shmctl(shmid, IPC_RMID, NULL); | ||
92 | exit(3); | ||
93 | } | ||
94 | |||
95 | shmctl(shmid, IPC_RMID, NULL); | ||
96 | |||
97 | return 0; | ||
98 | } | ||
diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt index bc31636973e3..457634c1e03e 100644 --- a/Documentation/vm/hugetlbpage.txt +++ b/Documentation/vm/hugetlbpage.txt | |||
@@ -299,176 +299,11 @@ map_hugetlb.c. | |||
299 | ******************************************************************* | 299 | ******************************************************************* |
300 | 300 | ||
301 | /* | 301 | /* |
302 | * Example of using huge page memory in a user application using Sys V shared | 302 | * hugepage-shm: see Documentation/vm/hugepage-shm.c |
303 | * memory system calls. In this example the app is requesting 256MB of | ||
304 | * memory that is backed by huge pages. The application uses the flag | ||
305 | * SHM_HUGETLB in the shmget system call to inform the kernel that it is | ||
306 | * requesting huge pages. | ||
307 | * | ||
308 | * For the ia64 architecture, the Linux kernel reserves Region number 4 for | ||
309 | * huge pages. That means that if one requires a fixed address, a huge page | ||
310 | * aligned address starting with 0x800000... will be required. If a fixed | ||
311 | * address is not required, the kernel will select an address in the proper | ||
312 | * range. | ||
313 | * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. | ||
314 | * | ||
315 | * Note: The default shared memory limit is quite low on many kernels, | ||
316 | * you may need to increase it via: | ||
317 | * | ||
318 | * echo 268435456 > /proc/sys/kernel/shmmax | ||
319 | * | ||
320 | * This will increase the maximum size per shared memory segment to 256MB. | ||
321 | * The other limit that you will hit eventually is shmall which is the | ||
322 | * total amount of shared memory in pages. To set it to 16GB on a system | ||
323 | * with a 4kB pagesize do: | ||
324 | * | ||
325 | * echo 4194304 > /proc/sys/kernel/shmall | ||
326 | */ | 303 | */ |
327 | #include <stdlib.h> | ||
328 | #include <stdio.h> | ||
329 | #include <sys/types.h> | ||
330 | #include <sys/ipc.h> | ||
331 | #include <sys/shm.h> | ||
332 | #include <sys/mman.h> | ||
333 | |||
334 | #ifndef SHM_HUGETLB | ||
335 | #define SHM_HUGETLB 04000 | ||
336 | #endif | ||
337 | |||
338 | #define LENGTH (256UL*1024*1024) | ||
339 | |||
340 | #define dprintf(x) printf(x) | ||
341 | |||
342 | #define ADDR (void *)(0x0UL) /* let kernel choose address */ | ||
343 | #define SHMAT_FLAGS (0) | ||
344 | |||
345 | int main(void) | ||
346 | { | ||
347 | int shmid; | ||
348 | unsigned long i; | ||
349 | char *shmaddr; | ||
350 | |||
351 | if ((shmid = shmget(2, LENGTH, | ||
352 | SHM_HUGETLB | IPC_CREAT | SHM_R | SHM_W)) < 0) { | ||
353 | perror("shmget"); | ||
354 | exit(1); | ||
355 | } | ||
356 | printf("shmid: 0x%x\n", shmid); | ||
357 | |||
358 | shmaddr = shmat(shmid, ADDR, SHMAT_FLAGS); | ||
359 | if (shmaddr == (char *)-1) { | ||
360 | perror("Shared memory attach failure"); | ||
361 | shmctl(shmid, IPC_RMID, NULL); | ||
362 | exit(2); | ||
363 | } | ||
364 | printf("shmaddr: %p\n", shmaddr); | ||
365 | |||
366 | dprintf("Starting the writes:\n"); | ||
367 | for (i = 0; i < LENGTH; i++) { | ||
368 | shmaddr[i] = (char)(i); | ||
369 | if (!(i % (1024 * 1024))) | ||
370 | dprintf("."); | ||
371 | } | ||
372 | dprintf("\n"); | ||
373 | |||
374 | dprintf("Starting the Check..."); | ||
375 | for (i = 0; i < LENGTH; i++) | ||
376 | if (shmaddr[i] != (char)i) | ||
377 | printf("\nIndex %lu mismatched\n", i); | ||
378 | dprintf("Done.\n"); | ||
379 | |||
380 | if (shmdt((const void *)shmaddr) != 0) { | ||
381 | perror("Detach failure"); | ||
382 | shmctl(shmid, IPC_RMID, NULL); | ||
383 | exit(3); | ||
384 | } | ||
385 | |||
386 | shmctl(shmid, IPC_RMID, NULL); | ||
387 | |||
388 | return 0; | ||
389 | } | ||
390 | 304 | ||
391 | ******************************************************************* | 305 | ******************************************************************* |
392 | 306 | ||
393 | /* | 307 | /* |
394 | * Example of using huge page memory in a user application using the mmap | 308 | * hugepage-mmap: see Documentation/vm/hugepage-mmap.c |
395 | * system call. Before running this application, make sure that the | ||
396 | * administrator has mounted the hugetlbfs filesystem (on some directory | ||
397 | * like /mnt) using the command mount -t hugetlbfs nodev /mnt. In this | ||
398 | * example, the app is requesting memory of size 256MB that is backed by | ||
399 | * huge pages. | ||
400 | * | ||
401 | * For the ia64 architecture, the Linux kernel reserves Region number 4 for | ||
402 | * huge pages. That means that if one requires a fixed address, a huge page | ||
403 | * aligned address starting with 0x800000... will be required. If a fixed | ||
404 | * address is not required, the kernel will select an address in the proper | ||
405 | * range. | ||
406 | * Other architectures, such as ppc64, i386 or x86_64 are not so constrained. | ||
407 | */ | 309 | */ |
408 | #include <stdlib.h> | ||
409 | #include <stdio.h> | ||
410 | #include <unistd.h> | ||
411 | #include <sys/mman.h> | ||
412 | #include <fcntl.h> | ||
413 | |||
414 | #define FILE_NAME "/mnt/hugepagefile" | ||
415 | #define LENGTH (256UL*1024*1024) | ||
416 | #define PROTECTION (PROT_READ | PROT_WRITE) | ||
417 | |||
418 | #define ADDR (void *)(0x0UL) /* let kernel choose address */ | ||
419 | #define FLAGS (MAP_SHARED) | ||
420 | |||
421 | void check_bytes(char *addr) | ||
422 | { | ||
423 | printf("First hex is %x\n", *((unsigned int *)addr)); | ||
424 | } | ||
425 | |||
426 | void write_bytes(char *addr) | ||
427 | { | ||
428 | unsigned long i; | ||
429 | |||
430 | for (i = 0; i < LENGTH; i++) | ||
431 | *(addr + i) = (char)i; | ||
432 | } | ||
433 | |||
434 | void read_bytes(char *addr) | ||
435 | { | ||
436 | unsigned long i; | ||
437 | |||
438 | check_bytes(addr); | ||
439 | for (i = 0; i < LENGTH; i++) | ||
440 | if (*(addr + i) != (char)i) { | ||
441 | printf("Mismatch at %lu\n", i); | ||
442 | break; | ||
443 | } | ||
444 | } | ||
445 | |||
446 | int main(void) | ||
447 | { | ||
448 | void *addr; | ||
449 | int fd; | ||
450 | |||
451 | fd = open(FILE_NAME, O_CREAT | O_RDWR, 0755); | ||
452 | if (fd < 0) { | ||
453 | perror("Open failed"); | ||
454 | exit(1); | ||
455 | } | ||
456 | |||
457 | addr = mmap(ADDR, LENGTH, PROTECTION, FLAGS, fd, 0); | ||
458 | if (addr == MAP_FAILED) { | ||
459 | perror("mmap"); | ||
460 | unlink(FILE_NAME); | ||
461 | exit(1); | ||
462 | } | ||
463 | |||
464 | printf("Returned address is %p\n", addr); | ||
465 | check_bytes(addr); | ||
466 | write_bytes(addr); | ||
467 | read_bytes(addr); | ||
468 | |||
469 | munmap(addr, LENGTH); | ||
470 | close(fd); | ||
471 | unlink(FILE_NAME); | ||
472 | |||
473 | return 0; | ||
474 | } | ||
diff --git a/Documentation/vm/map_hugetlb.c b/Documentation/vm/map_hugetlb.c index e2bdae37f499..9969c7d9f985 100644 --- a/Documentation/vm/map_hugetlb.c +++ b/Documentation/vm/map_hugetlb.c | |||
@@ -31,12 +31,12 @@ | |||
31 | #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) | 31 | #define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) |
32 | #endif | 32 | #endif |
33 | 33 | ||
34 | void check_bytes(char *addr) | 34 | static void check_bytes(char *addr) |
35 | { | 35 | { |
36 | printf("First hex is %x\n", *((unsigned int *)addr)); | 36 | printf("First hex is %x\n", *((unsigned int *)addr)); |
37 | } | 37 | } |
38 | 38 | ||
39 | void write_bytes(char *addr) | 39 | static void write_bytes(char *addr) |
40 | { | 40 | { |
41 | unsigned long i; | 41 | unsigned long i; |
42 | 42 | ||
@@ -44,7 +44,7 @@ void write_bytes(char *addr) | |||
44 | *(addr + i) = (char)i; | 44 | *(addr + i) = (char)i; |
45 | } | 45 | } |
46 | 46 | ||
47 | void read_bytes(char *addr) | 47 | static void read_bytes(char *addr) |
48 | { | 48 | { |
49 | unsigned long i; | 49 | unsigned long i; |
50 | 50 | ||
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt index b37300edf27c..07375e73981a 100644 --- a/Documentation/vm/slub.txt +++ b/Documentation/vm/slub.txt | |||
@@ -41,6 +41,7 @@ Possible debug options are | |||
41 | P Poisoning (object and padding) | 41 | P Poisoning (object and padding) |
42 | U User tracking (free and alloc) | 42 | U User tracking (free and alloc) |
43 | T Trace (please only use on single slabs) | 43 | T Trace (please only use on single slabs) |
44 | A Toggle failslab filter mark for the cache | ||
44 | O Switch debugging off for caches that would have | 45 | O Switch debugging off for caches that would have |
45 | caused higher minimum slab orders | 46 | caused higher minimum slab orders |
46 | - Switch all debugging off (useful if the kernel is | 47 | - Switch all debugging off (useful if the kernel is |
diff --git a/Documentation/voyager.txt b/Documentation/voyager.txt deleted file mode 100644 index 2749af552cdf..000000000000 --- a/Documentation/voyager.txt +++ /dev/null | |||
@@ -1,95 +0,0 @@ | |||
1 | Running Linux on the Voyager Architecture | ||
2 | ========================================= | ||
3 | |||
4 | For full details and current project status, see | ||
5 | |||
6 | http://www.hansenpartnership.com/voyager | ||
7 | |||
8 | The voyager architecture was designed by NCR in the mid 80s to be a | ||
9 | fully SMP capable RAS computing architecture built around intel's 486 | ||
10 | chip set. The voyager came in three levels of architectural | ||
11 | sophistication: 3,4 and 5 --- 1 and 2 never made it out of prototype. | ||
12 | The linux patches support only the Level 5 voyager architecture (any | ||
13 | machine class 3435 and above). | ||
14 | |||
15 | The Voyager Architecture | ||
16 | ------------------------ | ||
17 | |||
18 | Voyager machines consist of a Baseboard with a 386 diagnostic | ||
19 | processor, a Power Supply Interface (PSI) a Primary and possibly | ||
20 | Secondary Microchannel bus and between 2 and 20 voyager slots. The | ||
21 | voyager slots can be populated with memory and cpu cards (up to 4GB | ||
22 | memory and from 1 486 to 32 Pentium Pro processors). Internally, the | ||
23 | voyager has a dual arbitrated system bus and a configuration and test | ||
24 | bus (CAT). The voyager bus speed is 40MHz. Therefore (since all | ||
25 | voyager cards are dual ported for each system bus) the maximum | ||
26 | transfer rate is 320Mb/s but only if you have your slot configuration | ||
27 | tuned (only memory cards can communicate with both busses at once, CPU | ||
28 | cards utilise them one at a time). | ||
29 | |||
30 | Voyager SMP | ||
31 | ----------- | ||
32 | |||
33 | Since voyager was the first intel based SMP system, it is slightly | ||
34 | more primitive than the Intel IO-APIC approach to SMP. Voyager allows | ||
35 | arbitrary interrupt routing (including processor affinity routing) of | ||
36 | all 16 PC type interrupts. However it does this by using a modified | ||
37 | 5259 master/slave chip set instead of an APIC bus. Additionally, | ||
38 | voyager supports Cross Processor Interrupts (CPI) equivalent to the | ||
39 | APIC IPIs. There are two routed voyager interrupt lines provided to | ||
40 | each slot. | ||
41 | |||
42 | Processor Cards | ||
43 | --------------- | ||
44 | |||
45 | These come in single, dyadic and quad configurations (the quads are | ||
46 | problematic--see later). The maximum configuration is 8 quad cards | ||
47 | for 32 way SMP. | ||
48 | |||
49 | Quad Processors | ||
50 | --------------- | ||
51 | |||
52 | Because voyager only supplies two interrupt lines to each Processor | ||
53 | card, the Quad processors have to be configured (and Bootstrapped) in | ||
54 | as a pair of Master/Slave processors. | ||
55 | |||
56 | In fact, most Quad cards only accept one VIC interrupt line, so they | ||
57 | have one interrupt handling processor (called the VIC extended | ||
58 | processor) and three non-interrupt handling processors. | ||
59 | |||
60 | Current Status | ||
61 | -------------- | ||
62 | |||
63 | The System will boot on Mono, Dyad and Quad cards. There was | ||
64 | originally a Quad boot problem which has been fixed by proper gdt | ||
65 | alignment in the initial boot loader. If you still cannot get your | ||
66 | voyager system to boot, email me at: | ||
67 | |||
68 | <J.E.J.Bottomley@HansenPartnership.com> | ||
69 | |||
70 | |||
71 | The Quad cards now support using the separate Quad CPI vectors instead | ||
72 | of going through the VIC mailbox system. | ||
73 | |||
74 | The Level 4 architecture (3430 and 3360 Machines) should also work | ||
75 | fine. | ||
76 | |||
77 | Dump Switch | ||
78 | ----------- | ||
79 | |||
80 | The voyager dump switch sends out a broadcast NMI which the voyager | ||
81 | code intercepts and does a task dump. | ||
82 | |||
83 | Power Switch | ||
84 | ------------ | ||
85 | |||
86 | The front panel power switch is intercepted by the kernel and should | ||
87 | cause a system shutdown and power off. | ||
88 | |||
89 | A Note About Mixed CPU Systems | ||
90 | ------------------------------ | ||
91 | |||
92 | Linux isn't designed to handle mixed CPU systems very well. In order | ||
93 | to get everything going you *must* make sure that your lowest | ||
94 | capability CPU is used for booting. Also, mixing CPU classes | ||
95 | (e.g. 486 and 586) is really not going to work very well at all. | ||
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt index 29a6ff8bc7d3..7fbbaf85f5b7 100644 --- a/Documentation/x86/x86_64/boot-options.txt +++ b/Documentation/x86/x86_64/boot-options.txt | |||
@@ -166,19 +166,13 @@ NUMA | |||
166 | 166 | ||
167 | numa=noacpi Don't parse the SRAT table for NUMA setup | 167 | numa=noacpi Don't parse the SRAT table for NUMA setup |
168 | 168 | ||
169 | numa=fake=CMDLINE | 169 | numa=fake=<size>[MG] |
170 | If a number, fakes CMDLINE nodes and ignores NUMA setup of the | 170 | If given as a memory unit, fills all system RAM with nodes of |
171 | actual machine. Otherwise, system memory is configured | 171 | size interleaved over physical nodes. |
172 | depending on the sizes and coefficients listed. For example: | 172 | |
173 | numa=fake=2*512,1024,4*256,*128 | 173 | numa=fake=<N> |
174 | gives two 512M nodes, a 1024M node, four 256M nodes, and the | 174 | If given as an integer, fills all system RAM with N fake nodes |
175 | rest split into 128M chunks. If the last character of CMDLINE | 175 | interleaved over physical nodes. |
176 | is a *, the remaining memory is divided up equally among its | ||
177 | coefficient: | ||
178 | numa=fake=2*512,2* | ||
179 | gives two 512M nodes and the rest split into two nodes. | ||
180 | Otherwise, the remaining system RAM is allocated to an | ||
181 | additional node. | ||
182 | 176 | ||
183 | ACPI | 177 | ACPI |
184 | 178 | ||