perf: Fix perf ring buffer memory ordering

The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.

Reported-by: Victor Kaplansky <victork@il.ibm.com>
Tested-by: Victor Kaplansky <victork@il.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
author: Peter Zijlstra <peterz@infradead.org> 2013-10-28 08:55:29 -0400
committer: Ingo Molnar <mingo@kernel.org> 2013-10-29 07:01:19 -0400
commit: bf378d341e4873ed928dc3c636252e6895a21f50
tree: df69751e469725f2e23da8404884b5540d1bcddc /kernel
parent: cd65718712469ad844467250e8fad20a5838baae
1 files changed, 27 insertions, 4 deletions
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index cd55144270b5..9c2ddfbf4525 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -87,10 +87,31 @@ again:
                goto out;
        /*
-         * Publish the known good head. Rely on the full barrier implied
-         * by atomic_dec_and_test() order the rb->head read and this
-         * write.
+         * Since the mmap() consumer (userspace) can run on a different CPU:
+         *
+         *   kernel                             user
+         *
+         *   READ ->data_tail                   READ ->data_head
+         *   smp_mb()   (A)                     smp_rmb()       (C)
+         *   WRITE $data                        READ $data
+         *   smp_wmb()  (B)                     smp_mb()        (D)
+         *   STORE ->data_head                  WRITE ->data_tail
+         *
+         * Where A pairs with D, and B pairs with C.
+         *
+         * I don't think A needs to be a full barrier because we won't in fact
+         * write data until we see the store from userspace. So we simply don't
+         * issue the data WRITE until we observe it. Be conservative for now.
+         *
+         * OTOH, D needs to be a full barrier since it separates the data READ
+         * from the tail WRITE.
+         *
+         * For B a WMB is sufficient since it separates two WRITEs, and for C
+         * an RMB is sufficient since it separates two READs.
+         *
+         * See perf_output_begin().
         */
+        smp_wmb();
        rb->user_page->data_head = head;
        /*
@@ -154,9 +175,11 @@ int perf_output_begin(struct perf_output_handle *handle,
                 * Userspace could choose to issue a mb() before updating the
                 * tail pointer. So that all reads will be completed before the
                 * write is issued.
+                 *
+                 * See perf_output_put_handle().
                 */
                tail = ACCESS_ONCE(rb->user_page->data_tail);
-                smp_rmb();
+                smp_mb();
                offset = head = local_read(&rb->head);
                head += size;
                if (unlikely(!perf_output_space(rb, tail, offset, head)))
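
The barrier pairing described in the new comment can be sketched outside the kernel. The following is a minimal userspace illustration, not the kernel's or perf tool's actual code: it assumes C11 fences as stand-ins for the kernel primitives (a seq_cst fence for smp_mb(), a release fence for smp_wmb(), an acquire fence for smp_rmb()), and the names demo_rb, demo_produce, demo_consume and RB_SIZE are hypothetical. Unlike the kernel, which tracks its write position in rb->head, this sketch reads data_head back directly because the producer is its only writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RB_SIZE 16u /* hypothetical buffer size; must be a power of two */
static_assert((RB_SIZE & (RB_SIZE - 1)) == 0, "RB_SIZE must be a power of two");

struct demo_rb {
	_Atomic uint64_t data_head;  /* advanced only by the producer ("kernel") */
	_Atomic uint64_t data_tail;  /* advanced only by the consumer ("user") */
	unsigned char data[RB_SIZE];
};

/* Producer side, following the "kernel" column. Returns 0, or -1 if full. */
static int demo_produce(struct demo_rb *rb, const void *buf, uint64_t len)
{
	const unsigned char *src = buf;
	/* READ ->data_tail */
	uint64_t tail = atomic_load_explicit(&rb->data_tail, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* (A), stand-in for smp_mb() */
	uint64_t head = atomic_load_explicit(&rb->data_head, memory_order_relaxed);

	if (head - tail + len > RB_SIZE)
		return -1;				/* not enough free space */

	for (uint64_t i = 0; i < len; i++)		/* WRITE $data */
		rb->data[(head + i) & (RB_SIZE - 1)] = src[i];

	atomic_thread_fence(memory_order_release);	/* (B), stand-in for smp_wmb() */
	/* STORE ->data_head */
	atomic_store_explicit(&rb->data_head, head + len, memory_order_relaxed);
	return 0;
}

/* Consumer side, following the "user" column. Returns bytes copied. */
static uint64_t demo_consume(struct demo_rb *rb, void *buf, uint64_t max)
{
	unsigned char *dst = buf;
	/* READ ->data_head */
	uint64_t head = atomic_load_explicit(&rb->data_head, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* (C), stand-in for smp_rmb() */
	uint64_t tail = atomic_load_explicit(&rb->data_tail, memory_order_relaxed);
	uint64_t n = head - tail;

	if (n > max)
		n = max;
	for (uint64_t i = 0; i < n; i++)		/* READ $data */
		dst[i] = rb->data[(tail + i) & (RB_SIZE - 1)];

	atomic_thread_fence(memory_order_seq_cst);	/* (D), stand-in for smp_mb() */
	/* WRITE ->data_tail */
	atomic_store_explicit(&rb->data_tail, tail + n, memory_order_relaxed);
	return n;
}
```

In this sketch fence (B) keeps the data writes ahead of the head update and pairs with (C), while (D) keeps the consumer's data reads ahead of the tail update and pairs with (A), so the producer cannot overwrite bytes the consumer is still reading.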