author Olof Johansson <olof@lixom.net> 2014-09-24 01:11:05 -0400
committer Olof Johansson <olof@lixom.net> 2014-09-24 01:11:25 -0400
commit 8adc36bcd374dc7381d15e654215dd1f548ccbef
tree afc86512891f75b04efa0273694a977a77529a86 /Documentation
parent 96bdd9aeb2cbc5eaae586f4d43badd072611fcb1
parent d27704d1ec2f9ba06247b402c58a6f2febecef78
Merge tag 'dt-for-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into next/dt

Merge "omap dts changes for v3.18 merge window" from Tony Lindgren:

Changes for .dts files for omaps for v3.18 merge window:

- Updates for gta04 to add gta04a3 model
- Add support for Tehnexion TAO3530 boards
- Regulator names for beaglebone
- Pinctrl related updates for omap5, dra7 and am437
- Model name fix for sbc-t54
- Enable mailbox for various omaps

* tag 'dt-for-v3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: (291 commits)
  ARM: dts: OMAP2+: Add sub mailboxes device node information
  ARM: dts: dra7-evm: Mark uart1 rxd as wakeup capable
  ARM: dts: OMAP5 / DRA7: switch over to interrupts-extended property for UART
  ARM: dts: AM437x: switch to compatible pinctrl
  ARM: dts: DRA7: switch to compatible pinctrl
  ARM: dts: OMAP5: switch to compatible pinctrl
  ARM: dts: am335x-boneblack: Add names for remaining regulators
  ARM: dts: sbc-t54: fix model property
  ARM: dts: omap5.dtsi: add DSS RFBI node
  ARM: dts: omap3: Add HEAD acoustics omap3-ha.dts and omap3-ha-lcd.dts (TAO3530 based)
  ARM: dts: omap3: Add Technexion Thunder support (TAO3530 SOM based)
  ARM: dts: omap3: Add Technexion TAO3530 SOM omap3-tao3530.dtsi
  ARM: OMAP2+: tao3530: Add pdata-quirk for the mmc2 internal clock
  ARM: OMAP2+: board-generic: add support for AM57xx family
  ARM: dts: dra72-evm: Add tps65917 PMIC node
  ARM: dts: dra72-evm: Enable I2C1 node
  Linux 3.17-rc3
  unicore32: Fix build error
  vexpress/spc: fix a build warning on array bounds
  spi: sh-msiof: Fix transmit-only DMA transfers
  ...

Signed-off-by: Olof Johansson <olof@lixom.net>

Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/devicetree/bindings/arm/omap/omap.txt | 12
-rw-r--r-- Documentation/devicetree/bindings/mfd/tc3589x.txt | 107
-rw-r--r-- Documentation/devicetree/bindings/mtd/gpmc-nand.txt | 2
-rw-r--r-- Documentation/devicetree/bindings/pinctrl/qcom,apq8064-pinctrl.txt | 2
-rw-r--r-- Documentation/dma-buf-sharing.txt | 14
-rw-r--r-- Documentation/kdump/kdump.txt | 36
-rw-r--r-- Documentation/this_cpu_ops.txt | 213
7 files changed, 333 insertions, 53 deletions
diff --git a/Documentation/devicetree/bindings/arm/omap/omap.txt b/Documentation/devicetree/bindings/arm/omap/omap.txt
index 0edc90305dfe..ddd9bcdf889c 100644
--- a/Documentation/devicetree/bindings/arm/omap/omap.txt
+++ b/Documentation/devicetree/bindings/arm/omap/omap.txt
@@ -85,6 +85,18 @@ SoCs:
 - DRA722
   compatible = "ti,dra722", "ti,dra72", "ti,dra7"
 
+- AM5728
+  compatible = "ti,am5728", "ti,dra742", "ti,dra74", "ti,dra7"
+
+- AM5726
+  compatible = "ti,am5726", "ti,dra742", "ti,dra74", "ti,dra7"
+
+- AM5718
+  compatible = "ti,am5718", "ti,dra722", "ti,dra72", "ti,dra7"
+
+- AM5716
+  compatible = "ti,am5716", "ti,dra722", "ti,dra72", "ti,dra7"
+
 - AM4372
   compatible = "ti,am4372", "ti,am43"
 
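
Each compatible list in omap.txt runs from the most specific string to the most generic one, so an AM5728 board can still bind to code that only knows "ti,dra7". The following is a minimal userspace sketch of that fallback idea, not the kernel's actual matching code; the function name first_compatible_match and the driver tables used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the index of the first (most specific)
 * entry in the node's compatible list that also appears in the driver's
 * table, or -1 if the two lists share no string.
 */
int first_compatible_match(const char *const *node_compat, size_t n_node,
                           const char *const *drv_compat, size_t n_drv)
{
    assert(node_compat != NULL && drv_compat != NULL);

    for (size_t i = 0; i < n_node; i++)
        for (size_t j = 0; j < n_drv; j++)
            if (strcmp(node_compat[i], drv_compat[j]) == 0)
                return (int)i;
    return -1;
}

/* The AM5728 compatible list from the binding text. */
const char *const am5728_compat[] = {
    "ti,am5728", "ti,dra742", "ti,dra74", "ti,dra7",
};
```

A driver table that only lists "ti,dra7" matches at index 3, the generic fallback, while one that also lists "ti,dra74" matches earlier at index 2.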
diff --git a/Documentation/devicetree/bindings/mfd/tc3589x.txt b/Documentation/devicetree/bindings/mfd/tc3589x.txt
new file mode 100644
index 000000000000..6fcedba46ae9
--- /dev/null
+++ b/Documentation/devicetree/bindings/mfd/tc3589x.txt
@@ -0,0 +1,107 @@
+* Toshiba TC3589x multi-purpose expander
+
+The Toshiba TC3589x series are I2C-based MFD devices which may expose the
+following built-in devices: gpio, keypad, rotator (vibrator), PWM (for
+e.g. LEDs or vibrators) The included models are:
+
+- TC35890
+- TC35892
+- TC35893
+- TC35894
+- TC35895
+- TC35896
+
+Required properties:
+ - compatible : must be "toshiba,tc35890", "toshiba,tc35892", "toshiba,tc35893",
+   "toshiba,tc35894", "toshiba,tc35895" or "toshiba,tc35896"
+ - reg : I2C address of the device
+ - interrupt-parent : specifies which IRQ controller we're connected to
+ - interrupts : the interrupt on the parent the controller is connected to
+ - interrupt-controller : marks the device node as an interrupt controller
+ - #interrupt-cells : should be <1>, the first cell is the IRQ offset on this
+   TC3589x interrupt controller.
+
+Optional nodes:
+
+- GPIO
+  This GPIO module inside the TC3589x has 24 (TC35890, TC35892) or 20
+  (other models) GPIO lines.
+ - compatible : must be "toshiba,tc3589x-gpio"
+ - interrupts : interrupt on the parent, which must be the tc3589x MFD device
+ - interrupt-controller : marks the device node as an interrupt controller
+ - #interrupt-cells : should be <2>, the first cell is the IRQ offset on this
+   TC3589x GPIO interrupt controller, the second cell is the interrupt flags
+   in accordance with <dt-bindings/interrupt-controller/irq.h>. The following
+   flags are valid:
+   - IRQ_TYPE_LEVEL_LOW
+   - IRQ_TYPE_LEVEL_HIGH
+   - IRQ_TYPE_EDGE_RISING
+   - IRQ_TYPE_EDGE_FALLING
+   - IRQ_TYPE_EDGE_BOTH
+ - gpio-controller : marks the device node as a GPIO controller
+ - #gpio-cells : should be <2>, the first cell is the GPIO offset on this
+   GPIO controller, the second cell is the flags.
+
+- Keypad
+  This keypad is the same on all variants, supporting up to 96 different
+  keys. The linux-specific properties are modeled on those already existing
+  in other input drivers.
+ - compatible : must be "toshiba,tc3589x-keypad"
+ - debounce-delay-ms : debounce interval in milliseconds
+ - keypad,num-rows : number of rows in the matrix, see
+   bindings/input/matrix-keymap.txt
+ - keypad,num-columns : number of columns in the matrix, see
+   bindings/input/matrix-keymap.txt
+ - linux,keymap: the definition can be found in
+   bindings/input/matrix-keymap.txt
+ - linux,no-autorepeat: do no enable autorepeat feature.
+ - linux,wakeup: use any event on keypad as wakeup event.
+
+Example:
+
+tc35893@44 {
+    compatible = "toshiba,tc35893";
+    reg = <0x44>;
+    interrupt-parent = <&gpio6>;
+    interrupts = <26 IRQ_TYPE_EDGE_RISING>;
+
+    interrupt-controller;
+    #interrupt-cells = <1>;
+
+    tc3589x_gpio {
+        compatible = "toshiba,tc3589x-gpio";
+        interrupts = <0>;
+
+        interrupt-controller;
+        #interrupt-cells = <2>;
+        gpio-controller;
+        #gpio-cells = <2>;
+    };
+    tc3589x_keypad {
+        compatible = "toshiba,tc3589x-keypad";
+        interrupts = <6>;
+        debounce-delay-ms = <4>;
+        keypad,num-columns = <8>;
+        keypad,num-rows = <8>;
+        linux,no-autorepeat;
+        linux,wakeup;
+        linux,keymap = <0x0301006b
+                        0x04010066
+                        0x06040072
+                        0x040200d7
+                        0x0303006a
+                        0x0205000e
+                        0x0607008b
+                        0x0500001c
+                        0x0403000b
+                        0x03040034
+                        0x05020067
+                        0x0305006c
+                        0x040500e7
+                        0x0005009e
+                        0x06020073
+                        0x01030039
+                        0x07060069
+                        0x050500d9>;
+    };
+};
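
The linux,keymap cells in the example follow the packing described in bindings/input/matrix-keymap.txt: row in the top byte, column in the next byte, and the Linux key code in the low 16 bits. The sketch below shows that packing in plain C; the names struct keymap_entry, keymap_pack and keymap_unpack are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative type: one decoded linux,keymap cell. */
struct keymap_entry {
    unsigned int row;
    unsigned int col;
    unsigned int keycode;
};

/* Pack row/column/key code as row << 24 | column << 16 | key-code. */
uint32_t keymap_pack(unsigned int row, unsigned int col, unsigned int keycode)
{
    assert(row <= 0xff && col <= 0xff && keycode <= 0xffff);
    return ((uint32_t)row << 24) | ((uint32_t)col << 16) | (uint32_t)keycode;
}

/* Split a 32-bit keymap cell back into its three fields. */
struct keymap_entry keymap_unpack(uint32_t cell)
{
    struct keymap_entry e;

    e.row = (cell >> 24) & 0xff;
    e.col = (cell >> 16) & 0xff;
    e.keycode = cell & 0xffff;
    return e;
}
```

For instance, the first cell of the example, 0x0301006b, decodes to row 3, column 1, key code 0x6b, which fits inside the 8x8 matrix declared by keypad,num-rows and keypad,num-columns.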
diff --git a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
index 65f4f7c43136..ee654e95d8ad 100644
--- a/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
+++ b/Documentation/devicetree/bindings/mtd/gpmc-nand.txt
@@ -22,7 +22,7 @@ Optional properties:
                 width of 8 is assumed.
 
  - ti,nand-ecc-opt: A string setting the ECC layout to use. One of:
-                "sw"            <deprecated> use "ham1" instead
+                "sw"            1-bit Hamming ecc code via software
                 "hw"            <deprecated> use "ham1" instead
                 "hw-romcode"    <deprecated> use "ham1" instead
                 "ham1"          1-bit Hamming ecc code
diff --git a/Documentation/devicetree/bindings/pinctrl/qcom,apq8064-pinctrl.txt b/Documentation/devicetree/bindings/pinctrl/qcom,apq8064-pinctrl.txt
index 0211c6d8a522..92fae82f35f2 100644
--- a/Documentation/devicetree/bindings/pinctrl/qcom,apq8064-pinctrl.txt
+++ b/Documentation/devicetree/bindings/pinctrl/qcom,apq8064-pinctrl.txt
@@ -62,7 +62,7 @@ Example:
                 #gpio-cells = <2>;
                 interrupt-controller;
                 #interrupt-cells = <2>;
-                interrupts = <0 32 0x4>;
+                interrupts = <0 16 0x4>;
 
                 pinctrl-names = "default";
                 pinctrl-0 = <&gsbi5_uart_default>;
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 67a4087d53f9..bb9753b635a3 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -56,10 +56,10 @@ The dma_buf buffer sharing API usage contains the following steps:
                                      size_t size, int flags,
                                      const char *exp_name)
 
-   If this succeeds, dma_buf_export allocates a dma_buf structure, and returns a
-   pointer to the same. It also associates an anonymous file with this buffer,
-   so it can be exported. On failure to allocate the dma_buf object, it returns
-   NULL.
+   If this succeeds, dma_buf_export_named allocates a dma_buf structure, and
+   returns a pointer to the same. It also associates an anonymous file with this
+   buffer, so it can be exported. On failure to allocate the dma_buf object,
+   it returns NULL.
 
    'exp_name' is the name of exporter - to facilitate information while
    debugging.
@@ -76,7 +76,7 @@ The dma_buf buffer sharing API usage contains the following steps:
    drivers and/or processes.
 
    Interface:
-      int dma_buf_fd(struct dma_buf *dmabuf)
+      int dma_buf_fd(struct dma_buf *dmabuf, int flags)
 
    This API installs an fd for the anonymous file associated with this buffer;
    returns either 'fd', or error.
@@ -157,7 +157,9 @@ to request use of buffer for allocation.
    "dma_buf->ops->" indirection from the users of this interface.
 
    In struct dma_buf_ops, unmap_dma_buf is defined as
-      void (*unmap_dma_buf)(struct dma_buf_attachment *, struct sg_table *);
+      void (*unmap_dma_buf)(struct dma_buf_attachment *,
+                            struct sg_table *,
+                            enum dma_data_direction);
 
    unmap_dma_buf signifies the end-of-DMA for the attachment provided. Like
    map_dma_buf, this API also must be implemented by the exporter.
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 88d5a863712a..6c0b9f27e465 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
 a remote system.
 
 Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-and s390x architectures.
+s390x and arm architectures.
 
 When the system kernel boots, it reserves a small section of memory for
 the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -112,7 +112,7 @@ There are two possible methods of using Kdump.
 2) Or use the system kernel binary itself as dump-capture kernel and there is
    no need to build a separate dump-capture kernel. This is possible
    only with the architectures which support a relocatable kernel. As
-   of today, i386, x86_64, ppc64 and ia64 architectures support relocatable
+   of today, i386, x86_64, ppc64, ia64 and arm architectures support relocatable
    kernel.
 
 Building a relocatable kernel is advantageous from the point of view that
@@ -241,6 +241,13 @@ Dump-capture kernel config options (Arch Dependent, ia64)
   kernel will be aligned to 64Mb, so if the start address is not then
   any space below the alignment point will be wasted.
 
+Dump-capture kernel config options (Arch Dependent, arm)
+----------------------------------------------------------
+
+- To use a relocatable kernel,
+  Enable "AUTO_ZRELADDR" support under "Boot" options:
+
+    AUTO_ZRELADDR=y
 
 Extended crashkernel syntax
 ===========================
@@ -256,6 +263,10 @@ The syntax is:
     crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
     range=start-[end]
 
+Please note, on arm, the offset is required.
+    crashkernel=<range1>:<size1>[,<range2>:<size2>,...]@offset
+    range=start-[end]
+
     'start' is inclusive and 'end' is exclusive.
 
 For example:
@@ -296,6 +307,12 @@ Boot into System Kernel
    on the memory consumption of the kdump system. In general this is not
    dependent on the memory size of the production system.
 
+   On arm, use "crashkernel=Y@X". Note that the start address of the kernel
+   will be aligned to 128MiB (0x08000000), so if the start address is not then
+   any space below the alignment point may be overwritten by the dump-capture kernel,
+   which means it is possible that the vmcore is not that precise as expected.
+
+
 Load the Dump-capture Kernel
 ============================
 
@@ -315,7 +332,8 @@ For ia64:
         - Use vmlinux or vmlinuz.gz
 For s390x:
         - Use image or bzImage
-
+For arm:
+        - Use zImage
 
 If you are using a uncompressed vmlinux image then use following command
 to load dump-capture kernel.
@@ -331,6 +349,15 @@ to load dump-capture kernel.
       --initrd=<initrd-for-dump-capture-kernel> \
       --append="root=<root-dev> <arch-specific-options>"
 
+If you are using a compressed zImage, then use following command
+to load dump-capture kernel.
+
+   kexec --type zImage -p <dump-capture-kernel-bzImage> \
+      --initrd=<initrd-for-dump-capture-kernel> \
+      --dtb=<dtb-for-dump-capture-kernel> \
+      --append="root=<root-dev> <arch-specific-options>"
+
+
 Please note, that --args-linux does not need to be specified for ia64.
 It is planned to make this a no-op on that architecture, but for now
 it should be omitted
@@ -347,6 +374,9 @@ For ppc64:
 For s390x:
         "1 maxcpus=1 cgroup_disable=memory"
 
+For arm:
+        "1 maxcpus=1 reset_devices"
+
 Notes on loading the dump-capture kernel:
 
 * By default, the ELF headers are stored in ELF64 format to support
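
The extended crashkernel syntax touched by this patch picks a reservation size from a list of RAM ranges ('start' inclusive, 'end' exclusive, open-ended when 'end' is omitted) and, on arm, requires an @offset whose start address is aligned to 128MiB. The following is a simplified userspace sketch of that selection logic, assuming K/M/G size suffixes; it is not the kernel's parser, and crashkernel_size and crashkernel_offset_aligned are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a number with an optional K/M/G suffix and advance *s past it. */
static uint64_t parse_size(const char **s)
{
    char *end;
    uint64_t v = strtoull(*s, &end, 0);

    switch (*end) {
    case 'G': case 'g': v <<= 10; /* fall through */
    case 'M': case 'm': v <<= 10; /* fall through */
    case 'K': case 'k': v <<= 10; end++; break;
    default: break;
    }
    *s = end;
    return v;
}

/*
 * Hypothetical helper: given the text after "crashkernel=", return the
 * size of the first range containing system_ram, or 0 if no range
 * matches or the string is malformed. *offset receives the value after
 * '@', or 0 when no offset is given (which the text says is not allowed
 * on arm).
 */
uint64_t crashkernel_size(const char *arg, uint64_t system_ram,
                          uint64_t *offset)
{
    const char *p = arg;
    uint64_t chosen = 0;

    assert(arg != NULL && offset != NULL);
    *offset = 0;

    while (*p != '\0' && *p != '@') {
        uint64_t start = parse_size(&p);
        uint64_t end = UINT64_MAX;      /* open-ended range: "start-" */
        uint64_t size;

        if (*p != '-')
            return 0;
        p++;
        if (*p != ':')
            end = parse_size(&p);
        if (*p != ':')
            return 0;
        p++;
        size = parse_size(&p);

        /* 'start' is inclusive, 'end' is exclusive; first match wins. */
        if (chosen == 0 && system_ram >= start && system_ram < end)
            chosen = size;

        if (*p == ',')
            p++;
        else if (*p != '@' && *p != '\0')
            return 0;
    }
    if (*p == '@') {
        p++;
        *offset = parse_size(&p);
    }
    return chosen;
}

/* On arm the start address is expected on a 128MiB (0x08000000) boundary. */
int crashkernel_offset_aligned(uint64_t offset)
{
    return (offset % 0x08000000ULL) == 0;
}
```

With the argument "512M-2G:64M,2G-:128M@0x30000000", a machine with 1GiB of RAM falls in the first range and gets 64MiB at an offset that passes the 128MiB alignment check.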
diff --git a/Documentation/this_cpu_ops.txt b/Documentation/this_cpu_ops.txt
index 1a4ce7e3e05f..0ec995712176 100644
--- a/Documentation/this_cpu_ops.txt
+++ b/Documentation/this_cpu_ops.txt
@@ -2,26 +2,26 @@ this_cpu operations
 -------------------
 
 this_cpu operations are a way of optimizing access to per cpu
-variables associated with the *currently* executing processor through
-the use of segment registers (or a dedicated register where the cpu
-permanently stored the beginning of the per cpu area for a specific
-processor).
+variables associated with the *currently* executing processor. This is
+done through the use of segment registers (or a dedicated register where
+the cpu permanently stored the beginning of the per cpu area for a
+specific processor).
 
-The this_cpu operations add a per cpu variable offset to the processor
-specific percpu base and encode that operation in the instruction
+this_cpu operations add a per cpu variable offset to the processor
+specific per cpu base and encode that operation in the instruction
 operating on the per cpu variable.
 
-This means there are no atomicity issues between the calculation of
+This means that there are no atomicity issues between the calculation of
 the offset and the operation on the data. Therefore it is not
-necessary to disable preempt or interrupts to ensure that the
+necessary to disable preemption or interrupts to ensure that the
 processor is not changed between the calculation of the address and
 the operation on the data.
 
 Read-modify-write operations are of particular interest. Frequently
 processors have special lower latency instructions that can operate
-without the typical synchronization overhead but still provide some
-sort of relaxed atomicity guarantee. The x86 for example can execute
-RMV (Read Modify Write) instructions like inc/dec/cmpxchg without the
+without the typical synchronization overhead, but still provide some
+sort of relaxed atomicity guarantees. The x86, for example, can execute
+RMW (Read Modify Write) instructions like inc/dec/cmpxchg without the
 lock prefix and the associated latency penalty.
 
 Access to the variable without the lock prefix is not synchronized but
@@ -30,6 +30,38 @@ data specific to the currently executing processor. Only the current
 processor should be accessing that variable and therefore there are no
 concurrency issues with other processors in the system.
 
+Please note that accesses by remote processors to a per cpu area are
+exceptional situations and may impact performance and/or correctness
+(remote write operations) of local RMW operations via this_cpu_*.
+
+The main use of the this_cpu operations has been to optimize counter
+operations.
+
+The following this_cpu() operations with implied preemption protection
+are defined. These operations can be used without worrying about
+preemption and interrupts.
+
+    this_cpu_add()
+    this_cpu_read(pcp)
+    this_cpu_write(pcp, val)
+    this_cpu_add(pcp, val)
+    this_cpu_and(pcp, val)
+    this_cpu_or(pcp, val)
+    this_cpu_add_return(pcp, val)
+    this_cpu_xchg(pcp, nval)
+    this_cpu_cmpxchg(pcp, oval, nval)
+    this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
+    this_cpu_sub(pcp, val)
+    this_cpu_inc(pcp)
+    this_cpu_dec(pcp)
+    this_cpu_sub_return(pcp, val)
+    this_cpu_inc_return(pcp)
+    this_cpu_dec_return(pcp)
+
+
+Inner working of this_cpu operations
+------------------------------------
+
 On x86 the fs: or the gs: segment registers contain the base of the
 per cpu area. It is then possible to simply use the segment override
 to relocate a per cpu relative address to the proper per cpu area for
@@ -48,22 +80,21 @@ results in a single instruction
     mov ax, gs:[x]
 
 instead of a sequence of calculation of the address and then a fetch
-from that address which occurs with the percpu operations. Before
+from that address which occurs with the per cpu operations. Before
 this_cpu_ops such sequence also required preempt disable/enable to
 prevent the kernel from moving the thread to a different processor
 while the calculation is performed.
 
-The main use of the this_cpu operations has been to optimize counter
-operations.
+Consider the following this_cpu operation:
 
     this_cpu_inc(x)
 
-results in the following single instruction (no lock prefix!)
+The above results in the following single instruction (no lock prefix!)
 
     inc gs:[x]
 
 instead of the following operations required if there is no segment
-register.
+register:
 
     int *y;
     int cpu;
@@ -73,10 +104,10 @@ register.
     (*y)++;
     put_cpu();
 
-Note that these operations can only be used on percpu data that is
+Note that these operations can only be used on per cpu data that is
 reserved for a specific processor. Without disabling preemption in the
 surrounding code this_cpu_inc() will only guarantee that one of the
-percpu counters is correctly incremented. However, there is no
+per cpu counters is correctly incremented. However, there is no
 guarantee that the OS will not move the process directly before or
 after the this_cpu instruction is executed. In general this means that
 the value of the individual counters for each processor are
@@ -86,9 +117,9 @@ that is of interest.
 Per cpu variables are used for performance reasons. Bouncing cache
 lines can be avoided if multiple processors concurrently go through
 the same code paths. Since each processor has its own per cpu
-variables no concurrent cacheline updates take place. The price that
+variables no concurrent cache line updates take place. The price that
 has to be paid for this optimization is the need to add up the per cpu
-counters when the value of the counter is needed.
+counters when the value of a counter is needed.
 
 
 Special operations:
@@ -100,33 +131,39 @@ Takes the offset of a per cpu variable (&x !) and returns the address
 of the per cpu variable that belongs to the currently executing
 processor. this_cpu_ptr avoids multiple steps that the common
 get_cpu/put_cpu sequence requires. No processor number is
-available. Instead the offset of the local per cpu area is simply
-added to the percpu offset.
+available. Instead, the offset of the local per cpu area is simply
+added to the per cpu offset.
 
+Note that this operation is usually used in a code segment when
+preemption has been disabled. The pointer is then used to
+access local per cpu data in a critical section. When preemption
+is re-enabled this pointer is usually no longer useful since it may
+no longer point to per cpu data of the current processor.
 
 
 Per cpu variables and offsets
 -----------------------------
 
-Per cpu variables have *offsets* to the beginning of the percpu
+Per cpu variables have *offsets* to the beginning of the per cpu
 area. They do not have addresses although they look like that in the
 code. Offsets cannot be directly dereferenced. The offset must be
-added to a base pointer of a percpu area of a processor in order to
+added to a base pointer of a per cpu area of a processor in order to
 form a valid address.
 
 Therefore the use of x or &x outside of the context of per cpu
 operations is invalid and will generally be treated like a NULL
 pointer dereference.
 
-In the context of per cpu operations
+    DEFINE_PER_CPU(int, x);
 
-    x is a per cpu variable. Most this_cpu operations take a cpu
-    variable.
+In the context of per cpu operations the above implies that x is a per
+cpu variable. Most this_cpu operations take a cpu variable.
 
-    &x is the *offset* a per cpu variable. this_cpu_ptr() takes
-    the offset of a per cpu variable which makes this look a bit
-    strange.
+    int __percpu *p = &x;
 
+&x and hence p is the *offset* of a per cpu variable. this_cpu_ptr()
+takes the offset of a per cpu variable which makes this look a bit
+strange.
 
 
 Operations on a field of a per cpu structure
@@ -152,7 +189,7 @@ If we have an offset to struct s:
 
     struct s __percpu *ps = &p;
 
-    z = this_cpu_dec(ps->m);
+    this_cpu_dec(ps->m);
 
     z = this_cpu_inc_return(ps->n);
 
@@ -172,29 +209,52 @@ if we do not make use of this_cpu ops later to manipulate fields:
 Variants of this_cpu ops
 -------------------------
 
-this_cpu ops are interrupt safe. Some architecture do not support
+this_cpu ops are interrupt safe. Some architectures do not support
 these per cpu local operations. In that case the operation must be
 replaced by code that disables interrupts, then does the operations
-that are guaranteed to be atomic and then reenable interrupts. Doing
+that are guaranteed to be atomic and then re-enable interrupts. Doing
 so is expensive. If there are other reasons why the scheduler cannot
 change the processor we are executing on then there is no reason to
-disable interrupts. For that purpose the __this_cpu operations are
-provided. For example.
+disable interrupts. For that purpose the following __this_cpu operations
+are provided.
 
-    __this_cpu_inc(x);
-
-Will increment x and will not fallback to code that disables
+These operations have no guarantee against concurrent interrupts or
+preemption. If a per cpu variable is not used in an interrupt context
+and the scheduler cannot preempt, then they are safe. If any interrupts
+still occur while an operation is in progress and if the interrupt too
+modifies the variable, then RMW actions can not be guaranteed to be
+safe.
+
+    __this_cpu_add()
+    __this_cpu_read(pcp)
+    __this_cpu_write(pcp, val)
+    __this_cpu_add(pcp, val)
+    __this_cpu_and(pcp, val)
+    __this_cpu_or(pcp, val)
+    __this_cpu_add_return(pcp, val)
+    __this_cpu_xchg(pcp, nval)
+    __this_cpu_cmpxchg(pcp, oval, nval)
+    __this_cpu_cmpxchg_double(pcp1, pcp2, oval1, oval2, nval1, nval2)
+    __this_cpu_sub(pcp, val)
+    __this_cpu_inc(pcp)
+    __this_cpu_dec(pcp)
+    __this_cpu_sub_return(pcp, val)
+    __this_cpu_inc_return(pcp)
+    __this_cpu_dec_return(pcp)
+
+
+Will increment x and will not fall-back to code that disables
 interrupts on platforms that cannot accomplish atomicity through
 address relocation and a Read-Modify-Write operation in the same
 instruction.
 
 
-
 &this_cpu_ptr(pp)->n vs this_cpu_ptr(&pp->n)
 --------------------------------------------
 
 The first operation takes the offset and forms an address and then
-adds the offset of the n field.
+adds the offset of the n field. This may result in two add
+instructions emitted by the compiler.
 
 The second one first adds the two offsets and then does the
 relocation. IMHO the second form looks cleaner and has an easier time
@@ -202,4 +262,73 @@ with (). The second form also is consistent with the way
 this_cpu_read() and friends are used.
 
 
-Christoph Lameter, April 3rd, 2013
+Remote access to per cpu data
+------------------------------
+
+Per cpu data structures are designed to be used by one cpu exclusively.
+If you use the variables as intended, this_cpu_ops() are guaranteed to
+be "atomic" as no other CPU has access to these data structures.
+
+There are special cases where you might need to access per cpu data
+structures remotely. It is usually safe to do a remote read access
+and that is frequently done to summarize counters. Remote write access
+something which could be problematic because this_cpu ops do not
+have lock semantics. A remote write may interfere with a this_cpu
+RMW operation.
+
+Remote write accesses to percpu data structures are highly discouraged
+unless absolutely necessary. Please consider using an IPI to wake up
+the remote CPU and perform the update to its per cpu area.
+
+To access per-cpu data structure remotely, typically the per_cpu_ptr()
+function is used:
+
+
+    DEFINE_PER_CPU(struct data, datap);
+
+    struct data *p = per_cpu_ptr(&datap, cpu);
+
+This makes it explicit that we are getting ready to access a percpu
+area remotely.
+
+You can also do the following to convert the datap offset to an address
+
+    struct data *p = this_cpu_ptr(&datap);
+
+but, passing of pointers calculated via this_cpu_ptr to other cpus is
+unusual and should be avoided.
+
+Remote access are typically only for reading the status of another cpus
+per cpu data. Write accesses can cause unique problems due to the
+relaxed synchronization requirements for this_cpu operations.
+
+One example that illustrates some concerns with write operations is
+the following scenario that occurs because two per cpu variables
+share a cache-line but the relaxed synchronization is applied to
+only one process updating the cache-line.
+
+Consider the following example
+
+
+    struct test {
+        atomic_t a;
+        int b;
+    };
+
+    DEFINE_PER_CPU(struct test, onecacheline);
+
+There is some concern about what would happen if the field 'a' is updated
+remotely from one processor and the local processor would use this_cpu ops
+to update field b. Care should be taken that such simultaneous accesses to
+data within the same cache line are avoided. Also costly synchronization
+may be necessary. IPIs are generally recommended in such scenarios instead
+of a remote write to the per cpu area of another processor.
+
+Even in cases where the remote writes are rare, please bear in
+mind that a remote write will evict the cache line from the processor
+that most likely will access it. If the processor wakes up and finds a
+missing local cache line of a per cpu area, its performance and hence
+the wake up times will be affected.
+
+Christoph Lameter, August 4th, 2014
+Pranith Kumar, Aug 2nd, 2014
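
The this_cpu_ops.txt text describes a per cpu variable as an offset that only becomes an address once it is added to a processor's per cpu base, and it notes that counter values are obtained by remote reads that add up every processor's copy. The following is a single-threaded userspace model of those two ideas only; it does not use the kernel API and says nothing about atomicity. NR_CPUS, struct percpu_area and the model_* functions are all invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4   /* hypothetical CPU count for this model */

/* One copy of this struct per CPU plays the role of a per cpu area. */
struct percpu_area {
    long events;
    long errors;
};

static struct percpu_area areas[NR_CPUS];

/* Model of per_cpu_ptr(): base of the CPU's area plus the variable offset. */
void *model_per_cpu_ptr(size_t offset, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return (char *)&areas[cpu] + offset;
}

/* Model of a local increment: only the owning CPU writes its own copy. */
void model_cpu_inc(size_t offset, int this_cpu)
{
    long *p = model_per_cpu_ptr(offset, this_cpu);

    (*p)++;
}

/* Remote reads: the counter value is the sum over all per cpu copies. */
long model_counter_sum(size_t offset)
{
    long sum = 0;

    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += *(long *)model_per_cpu_ptr(offset, cpu);
    return sum;
}
```

Here offsetof(struct percpu_area, events) stands in for &x: it is meaningless as a pointer on its own and only yields a usable address after a base is added, which is the same reason the text warns against dereferencing a per cpu variable directly.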