author    Milton Miller <miltonm@bga.com>    2011-03-15 15:27:16 -0400
committer Linus Torvalds <torvalds@linux-foundation.org>    2011-03-17 19:58:10 -0400
commit    45a5791920ae643eafc02e2eedef1a58e341b736 (patch)
tree      104873926bde0203fd0688b62fee2058f3dd271e /kernel
parent    e6cd1e07a185d5f9b0aa75e020df02d3c1c44940 (diff)
call_function_many: add missing ordering
Paul McKenney's review pointed out two problems with the barriers in the 2.6.38 update to the smp call function many code.

First, a barrier that would force the func and info members of data to be visible before their consumption in the interrupt handler was missing. This can be solved by adding a smp_wmb between setting the func and info members and setting the cpumask; this will pair with the existing and required smp_rmb ordering the cpumask read before the read of refs. This placement avoids the need for a second smp_rmb in the interrupt handler, which would be executed on each of the N cpus executing the call request. (I was thinking this barrier was present but it was not.)

Second, the previous write to refs (establishing the zero that the interrupt handler was testing from all cpus) was performed by a third party cpu. This would invoke transitivity which, as a recent or concurrent addition to memory-barriers.txt now explicitly states, would require a full smp_mb().

However, we know the cpumask will only be set by one cpu (the data owner) and any previous iteration of the mask would have been cleared by the reading cpu. By redundantly writing refs to 0 on the owning cpu before the smp_wmb, the write to refs will follow the same path as the writes that set the cpumask, which in turn allows us to keep the barrier in the interrupt handler a smp_rmb instead of promoting it to a smp_mb (which would be executed by N cpus for each of the possible M elements on the list).

I moved and expanded the comment about our (ab)use of the rcu list primitives for the concurrent walk earlier into this function. I considered moving the first two paragraphs to the queue list head and lock, but felt it would have been too disconnected from the code.

Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@kernel.org (2.6.32 and later)
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
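The write/read barrier pairing the message describes can be modeled in userspace with C11 atomics, where release/acquire fences stand in for smp_wmb()/smp_rmb(). This is only an illustrative sketch, not the kernel code: the names call_data, publish, consume, and incr are hypothetical, and a single atomic_int models one reader's cpumask bit.

```c
/* Userspace model of the ordering the patch establishes; C11 fences
 * stand in for the kernel's smp_wmb()/smp_rmb().  All names here
 * (call_data, publish, consume, incr) are illustrative only. */
#include <stdatomic.h>

struct call_data {
	void (*func)(void *);	/* models data->csd.func */
	void *info;		/* models data->csd.info */
	atomic_int cpumask_bit;	/* models one reader's bit in data->cpumask */
	atomic_int refs;	/* models data->refs */
};

/* sample callback for the model */
static void incr(void *p)
{
	++*(int *)p;
}

/* Owner cpu: publish refs=0, func and info, then the mask, then refs. */
static void publish(struct call_data *d, void (*func)(void *), void *info)
{
	/* redundant 1st-party write of 0, replacing last round's
	 * 3rd-party write, so no transitivity argument is needed */
	atomic_store_explicit(&d->refs, 0, memory_order_relaxed);
	d->func = func;
	d->info = info;
	/* the new smp_wmb(): order refs==0, func and info before the mask */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->cpumask_bit, 1, memory_order_relaxed);
	/* the wmb() in list_add_rcu(): order the mask before the count */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->refs, 1, memory_order_relaxed);
}

/* Other cpu's handler: mask bit, rmb, refs, then the call. */
static int consume(struct call_data *d)
{
	if (!atomic_load_explicit(&d->cpumask_bit, memory_order_relaxed))
		return 0;	/* entry not addressed to this cpu */
	/* the existing smp_rmb(): order the mask read before later reads */
	atomic_thread_fence(memory_order_acquire);
	if (!atomic_load_explicit(&d->refs, memory_order_relaxed))
		return 0;	/* owner not done publishing: skip */
	d->func(d->info);	/* func and info are visible by pairing */
	return 1;
}
```

The key point mirrored from the patch: seeing the mask bit alone is not enough; the handler only runs the callback once refs is also non-zero, and the fence pairing guarantees func/info are visible by then.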
Diffstat (limited to 'kernel')
 kernel/smp.c | 46 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 33 insertions(+), 13 deletions(-)
diff --git a/kernel/smp.c b/kernel/smp.c
index aaeee20c5634..7c6ded5effd9 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -483,23 +483,42 @@ void smp_call_function_many(const struct cpumask *mask,
 
 	data = &__get_cpu_var(cfd_data);
 	csd_lock(&data->csd);
-	BUG_ON(atomic_read(&data->refs) || !cpumask_empty(data->cpumask));
 
-	data->csd.func = func;
-	data->csd.info = info;
-	cpumask_and(data->cpumask, mask, cpu_online_mask);
-	cpumask_clear_cpu(this_cpu, data->cpumask);
+	/* This BUG_ON verifies our reuse assertions and can be removed */
+	BUG_ON(atomic_read(&data->refs) || !cpumask_empty(data->cpumask));
 
 	/*
+	 * The global call function queue list add and delete are protected
+	 * by a lock, but the list is traversed without any lock, relying
+	 * on the rcu list add and delete to allow safe concurrent traversal.
 	 * We reuse the call function data without waiting for any grace
 	 * period after some other cpu removes it from the global queue.
-	 * This means a cpu might find our data block as it is writen.
-	 * The interrupt handler waits until it sees refs filled out
-	 * while its cpu mask bit is set; here we may only clear our
-	 * own cpu mask bit, and must wait to set refs until we are sure
-	 * previous writes are complete and we have obtained the lock to
-	 * add the element to the queue.
+	 * This means a cpu might find our data block as it is being
+	 * filled out.
+	 *
+	 * We hold off the interrupt handler on the other cpu by
+	 * ordering our writes to the cpu mask vs our setting of the
+	 * refs counter.  We assert only the cpu owning the data block
+	 * will set a bit in cpumask, and each bit will only be cleared
+	 * by the subject cpu.  Each cpu must first find its bit is
+	 * set and then check that refs is set indicating the element is
+	 * ready to be processed, otherwise it must skip the entry.
+	 *
+	 * On the previous iteration refs was set to 0 by another cpu.
+	 * To avoid the use of transitivity, set the counter to 0 here
+	 * so the wmb will pair with the rmb in the interrupt handler.
 	 */
+	atomic_set(&data->refs, 0);	/* convert 3rd to 1st party write */
+
+	data->csd.func = func;
+	data->csd.info = info;
+
+	/* Ensure 0 refs is visible before mask. Also orders func and info */
+	smp_wmb();
+
+	/* We rely on the "and" being processed before the store */
+	cpumask_and(data->cpumask, mask, cpu_online_mask);
+	cpumask_clear_cpu(this_cpu, data->cpumask);
 
 	raw_spin_lock_irqsave(&call_function.lock, flags);
 	/*
@@ -509,8 +528,9 @@ void smp_call_function_many(const struct cpumask *mask,
 	 */
 	list_add_rcu(&data->csd.list, &call_function.queue);
 	/*
-	 * We rely on the wmb() in list_add_rcu to order the writes
-	 * to func, data, and cpumask before this write to refs.
+	 * We rely on the wmb() in list_add_rcu to complete our writes
+	 * to the cpumask before this write to refs, which indicates
+	 * data is on the list and is ready to be processed.
 	 */
 	atomic_set(&data->refs, cpumask_weight(data->cpumask));
 	raw_spin_unlock_irqrestore(&call_function.lock, flags);
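The read side of this protocol lives in the interrupt handler (not shown in this hunk): each cpu must first find its cpumask bit set and then, after the smp_rmb(), confirm refs is non-zero, otherwise it must skip the entry. That skip rule can be sketched in userspace as follows; the names cfd_model and handler_may_process are hypothetical, and a C11 acquire fence stands in for smp_rmb().

```c
/* Userspace model of the handler-side check described in the new
 * comment: mask bit set AND refs non-zero means "process", mask bit
 * without refs means "owner still publishing, skip the entry".
 * Names (cfd_model, handler_may_process) are illustrative only. */
#include <stdatomic.h>

struct cfd_model {
	atomic_int mask_bit;	/* models this cpu's bit in data->cpumask */
	atomic_int refs;	/* models data->refs */
};

/* Returns 1 if the entry may be processed, 0 if it must be skipped. */
static int handler_may_process(struct cfd_model *d)
{
	if (!atomic_load_explicit(&d->mask_bit, memory_order_relaxed))
		return 0;	/* entry is not addressed to this cpu */
	/* stands in for the smp_rmb() ordering the mask read before refs */
	atomic_thread_fence(memory_order_acquire);
	/* refs == 0: the owner has not finished publishing; skip */
	return atomic_load_explicit(&d->refs, memory_order_relaxed) != 0;
}
```

This is why the patch can keep the handler barrier an smp_rmb: the half-published state (mask visible, refs still 0) is simply skipped rather than requiring a full smp_mb to order the third-party zeroing of refs.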