<feed xmlns='http://www.w3.org/2005/Atom'>
<title>litmus-rt-pandaboard.git/kernel, branch pandaboard-litmus</title>
<subtitle>LITMUS^RT and MC^2 V0 support for the pandaboard.</subtitle>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/'/>
<entry>
<title>Merge remote-tracking branch 'oneiric-ubuntu/pandaboard' into pandaboard-litmus</title>
<updated>2012-09-20T17:33:17+00:00</updated>
<author>
<name>Christopher Kenna</name>
<email>cjk@cs.unc.edu</email>
</author>
<published>2012-09-20T17:33:17+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=5e0a4919ccb230ab449826ef91bdf38a4ed283e5'/>
<id>5e0a4919ccb230ab449826ef91bdf38a4ed283e5</id>
<content type='text'>
Remote branch is from Ubuntu Oneiric (tag Ubuntu-3.0.0-1215.27) with
minor patch to the TWL-RTC code. Trying to pull this into LITMUS^RT
staging (as of 9/20/2012) to get LITMUS^RT running on the PandaBoard.

Conflicts:
	Makefile
	fs/exec.c
	include/linux/fs.h
	kernel/fork.c
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Remote branch is from Ubuntu Oneiric (tag Ubuntu-3.0.0-1215.27) with
minor patch to the TWL-RTC code. Trying to pull this into LITMUS^RT
staging (as of 9/20/2012) to get LITMUS^RT running on the PandaBoard.

Conflicts:
	Makefile
	fs/exec.c
	include/linux/fs.h
	kernel/fork.c
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'v3.0-4460-fixes' into omap-pm-integration</title>
<updated>2012-08-17T08:18:03+00:00</updated>
<author>
<name>Nishanth Menon</name>
<email>nm@ti.com</email>
</author>
<published>2011-07-24T19:57:47+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=37ebde996aa6c080eba4518b5e67d36275d9b5b8'/>
<id>37ebde996aa6c080eba4518b5e67d36275d9b5b8</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()</title>
<updated>2012-08-13T13:10:40+00:00</updated>
<author>
<name>Darren Hart</name>
<email>dvhart@linux.intel.com</email>
</author>
<published>2012-07-20T18:53:31+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=63baf9a34fa8f0c3946d13ef30106b45dc1999cc'/>
<id>63baf9a34fa8f0c3946d13ef30106b45dc1999cc</id>
<content type='text'>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef upstream.

If uaddr == uaddr2, then we have broken the rule of only requeueing
from a non-pi futex to a pi futex with this call. If we attempt this,
as the trinity test suite manages to do, we miss early wakeups as
q.key is equal to key2 (because they are the same uaddr). We will then
attempt to dereference the pi_mutex (which would exist had the futex_q
been properly requeued to a pi futex) and trigger a NULL pointer
dereference.

Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654807774227.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef upstream.

If uaddr == uaddr2, then we have broken the rule of only requeueing
from a non-pi futex to a pi futex with this call. If we attempt this,
as the trinity test suite manages to do, we miss early wakeups as
q.key is equal to key2 (because they are the same uaddr). We will then
attempt to dereference the pi_mutex (which would exist had the futex_q
been properly requeued to a pi futex) and trigger a NULL pointer
dereference.

Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654807774227.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Fix bug in WARN_ON for NULL q.pi_state</title>
<updated>2012-08-13T13:10:40+00:00</updated>
<author>
<name>Darren Hart</name>
<email>dvhart@linux.intel.com</email>
</author>
<published>2012-07-20T18:53:30+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=9340679edbbc64ce01e5f1b757ba928deff7dc37'/>
<id>9340679edbbc64ce01e5f1b757ba928deff7dc37</id>
<content type='text'>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit f27071cb7fe3e1d37a9dbe6c0dfc5395cd40fa43 upstream.

The WARN_ON in futex_wait_requeue_pi() for a NULL q.pi_state was testing
the address (&amp;q.pi_state) of the pointer instead of the value
(q.pi_state) of the pointer. Correct it accordingly.

Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Link: http://lkml.kernel.org/r/1c85d97f6e5f79ec389a4ead3e367363c74bd09a.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit f27071cb7fe3e1d37a9dbe6c0dfc5395cd40fa43 upstream.

The WARN_ON in futex_wait_requeue_pi() for a NULL q.pi_state was testing
the address (&amp;q.pi_state) of the pointer instead of the value
(q.pi_state) of the pointer. Correct it accordingly.

Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Link: http://lkml.kernel.org/r/1c85d97f6e5f79ec389a4ead3e367363c74bd09a.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>futex: Test for pi_mutex on fault in futex_wait_requeue_pi()</title>
<updated>2012-08-13T13:10:40+00:00</updated>
<author>
<name>Darren Hart</name>
<email>dvhart@linux.intel.com</email>
</author>
<published>2012-07-20T18:53:29+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=01b3c280b2f55aee50945b964fb9c7befdb4f1ee'/>
<id>01b3c280b2f55aee50945b964fb9c7befdb4f1ee</id>
<content type='text'>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit b6070a8d9853eda010a549fa9a09eb8d7269b929 upstream.

If fixup_pi_state_owner() faults, pi_mutex may be NULL. Test
for pi_mutex != NULL before testing the owner against current
and possibly unlocking it.

Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Link: http://lkml.kernel.org/r/dc59890338fc413606f04e5c5b131530734dae3d.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit b6070a8d9853eda010a549fa9a09eb8d7269b929 upstream.

If fixup_pi_state_owner() faults, pi_mutex may be NULL. Test
for pi_mutex != NULL before testing the owner against current
and possibly unlocking it.

Signed-off-by: Darren Hart &lt;dvhart@linux.intel.com&gt;
Cc: Dave Jones &lt;davej@redhat.com&gt;
Cc: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Link: http://lkml.kernel.org/r/dc59890338fc413606f04e5c5b131530734dae3d.1342809673.git.dvhart@linux.intel.com
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>workqueue: perform cpu down operations from low priority cpu_notifier()</title>
<updated>2012-08-13T13:10:35+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2012-07-17T19:39:26+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=7d96871a273ffbb6e12f5df3a00fb598cc3afd9c'/>
<id>7d96871a273ffbb6e12f5df3a00fb598cc3afd9c</id>
<content type='text'>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit 6575820221f7a4dd6eadecf7bf83cdd154335eda upstream.

Currently, all workqueue cpu hotplug operations run off
CPU_PRI_WORKQUEUE which is higher than normal notifiers.  This is to
ensure that workqueue is up and running while bringing up a CPU before
other notifiers try to use workqueue on the CPU.

Per-cpu workqueues are supposed to remain working and bound to the CPU
for normal CPU_DOWN_PREPARE notifiers.  This holds mostly true even
with workqueue offlining running with higher priority because
workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
runs the per-cpu workqueue without concurrency management without
explicitly detaching the existing workers.

However, if the trustee needs to create new workers, it creates
unbound workers which may wander off to other CPUs while
CPU_DOWN_PREPARE notifiers are in progress.  Furthermore, if the CPU
down is cancelled, the per-CPU workqueue may end up with workers which
aren't bound to the CPU.

While reliably reproducible with a convoluted artificial test-case
involving scheduling and flushing CPU burning work items from CPU down
notifiers, this isn't very likely to happen in the wild, and, even
when it happens, the effects are likely to be hidden by the following
successful CPU down.

Fix it by using different priorities for up and down notifiers - high
priority for up operations and low priority for down operations.

Workqueue cpu hotplug operations will soon go through further cleanup.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit 6575820221f7a4dd6eadecf7bf83cdd154335eda upstream.

Currently, all workqueue cpu hotplug operations run off
CPU_PRI_WORKQUEUE which is higher than normal notifiers.  This is to
ensure that workqueue is up and running while bringing up a CPU before
other notifiers try to use workqueue on the CPU.

Per-cpu workqueues are supposed to remain working and bound to the CPU
for normal CPU_DOWN_PREPARE notifiers.  This holds mostly true even
with workqueue offlining running with higher priority because
workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
runs the per-cpu workqueue without concurrency management without
explicitly detaching the existing workers.

However, if the trustee needs to create new workers, it creates
unbound workers which may wander off to other CPUs while
CPU_DOWN_PREPARE notifiers are in progress.  Furthermore, if the CPU
down is cancelled, the per-CPU workqueue may end up with workers which
aren't bound to the CPU.

While reliably reproducible with a convoluted artificial test-case
involving scheduling and flushing CPU burning work items from CPU down
notifiers, this isn't very likely to happen in the wild, and, even
when it happens, the effects are likely to be hidden by the following
successful CPU down.

Fix it by using different priorities for up and down notifiers - high
priority for up operations and low priority for down operations.

Workqueue cpu hotplug operations will soon go through further cleanup.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: "Rafael J. Wysocki" &lt;rjw@sisk.pl&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ftrace: Disable function tracing during suspend/resume and hibernation, again</title>
<updated>2012-08-13T13:10:34+00:00</updated>
<author>
<name>Srivatsa S. Bhat</name>
<email>srivatsa.bhat@linux.vnet.ibm.com</email>
</author>
<published>2012-06-16T13:30:45+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=bd2ae50a2a22701606ea0ebe6e5013a64b25b992'/>
<id>bd2ae50a2a22701606ea0ebe6e5013a64b25b992</id>
<content type='text'>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit 443772d408a25af62498793f6f805ce3c559309a upstream.

If function tracing is enabled for some of the low-level suspend/resume
functions, it leads to triple fault during resume from suspend, ultimately
ending up in a reboot instead of a resume (or a total refusal to come out
of suspended state, on some machines).

This issue was explained in more detail in commit f42ac38c59e0a03d (ftrace:
disable tracing for suspend to ram). However, the changes made by that commit
got reverted by commit cbe2f5a6e84eebb (tracing: allow tracing of
suspend/resume &amp; hibernation code again). So, unfortunately since things are
not yet robust enough to allow tracing of low-level suspend/resume functions,
suspend/resume is still broken when ftrace is enabled.

So fix this by disabling function tracing during suspend/resume &amp; hibernation.

Signed-off-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BugLink: http://bugs.launchpad.net/bugs/1034988

commit 443772d408a25af62498793f6f805ce3c559309a upstream.

If function tracing is enabled for some of the low-level suspend/resume
functions, it leads to triple fault during resume from suspend, ultimately
ending up in a reboot instead of a resume (or a total refusal to come out
of suspended state, on some machines).

This issue was explained in more detail in commit f42ac38c59e0a03d (ftrace:
disable tracing for suspend to ram). However, the changes made by that commit
got reverted by commit cbe2f5a6e84eebb (tracing: allow tracing of
suspend/resume &amp; hibernation code again). So, unfortunately since things are
not yet robust enough to allow tracing of low-level suspend/resume functions,
suspend/resume is still broken when ftrace is enabled.

So fix this by disabling function tracing during suspend/resume &amp; hibernation.

Signed-off-by: Srivatsa S. Bhat &lt;srivatsa.bhat@linux.vnet.ibm.com&gt;
Signed-off-by: Rafael J. Wysocki &lt;rjw@sisk.pl&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: mm: reduce large amounts of memory barrier related damage v3</title>
<updated>2012-08-13T13:10:30+00:00</updated>
<author>
<name>Mel Gorman</name>
<email>mgorman@suse.de</email>
</author>
<published>2012-03-21T23:34:11+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=77619a3b6edeaf4864b4260d3d35566f78fe4c00'/>
<id>77619a3b6edeaf4864b4260d3d35566f78fe4c00</id>
<content type='text'>
BugLink: http://bugs.launchpad.net/bugs/1031926

commit cc9a6c8776615f9c194ccf0b63a0aa5628235545 upstream.

Stable note:  Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
	expensive and severely impacted page allocator performance. This
	is part of a series of patches that reduce page allocator overhead.

Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.

[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths.  This was detected while
investigating at large page allocator slowdown introduced some time
after 2.6.32.  The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.

For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.

This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side.  This is much cheaper on some architectures, including x86.  The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.

While updating the nodemask, a check is made to see if a false failure
is a risk.  If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.

In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
actual results were

                             3.3.0-rc3          3.3.0-rc3
                             rc3-vanilla        nobarrier-v2r1
    Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
    Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
    Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
    Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
    Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
    Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
    Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
    Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
    Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
    Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
    Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
    Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
    Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
    Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
    Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
    MMTests Statistics: duration
    Sys Time Running Test (seconds)             135.68    132.17
    User+Sys Time Running Test (seconds)         164.2    160.13
    Total Elapsed Time (seconds)                123.46    120.87

The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected).  The
actual number of page faults is noticeably improved.

For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.

To test the actual bug the commit fixed I opened two terminals.  The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data.  In a second window, the nodemask of the
cpuset was continually randomised in a loop.

Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BugLink: http://bugs.launchpad.net/bugs/1031926

commit cc9a6c8776615f9c194ccf0b63a0aa5628235545 upstream.

Stable note:  Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
	expensive and severely impacted page allocator performance. This
	is part of a series of patches that reduce page allocator overhead.

Commit c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when
changing cpuset's mems") wins a super prize for the largest number of
memory barriers entered into fast paths for one commit.

[get|put]_mems_allowed is incredibly heavy with pairs of full memory
barriers inserted into a number of hot paths.  This was detected while
investigating at large page allocator slowdown introduced some time
after 2.6.32.  The largest portion of this overhead was shown by
oprofile to be at an mfence introduced by this commit into the page
allocator hot path.

For extra style points, the commit introduced the use of yield() in an
implementation of what looks like a spinning mutex.

This patch replaces the full memory barriers on both read and write
sides with a sequence counter with just read barriers on the fast path
side.  This is much cheaper on some architectures, including x86.  The
main bulk of the patch is the retry logic if the nodemask changes in a
manner that can cause a false failure.

While updating the nodemask, a check is made to see if a false failure
is a risk.  If it is, the sequence number gets bumped and parallel
allocators will briefly stall while the nodemask update takes place.

In a page fault test microbenchmark, oprofile samples from
__alloc_pages_nodemask went from 4.53% of all samples to 1.15%.  The
actual results were

                             3.3.0-rc3          3.3.0-rc3
                             rc3-vanilla        nobarrier-v2r1
    Clients   1 UserTime       0.07 (  0.00%)   0.08 (-14.19%)
    Clients   2 UserTime       0.07 (  0.00%)   0.07 (  2.72%)
    Clients   4 UserTime       0.08 (  0.00%)   0.07 (  3.29%)
    Clients   1 SysTime        0.70 (  0.00%)   0.65 (  6.65%)
    Clients   2 SysTime        0.85 (  0.00%)   0.82 (  3.65%)
    Clients   4 SysTime        1.41 (  0.00%)   1.41 (  0.32%)
    Clients   1 WallTime       0.77 (  0.00%)   0.74 (  4.19%)
    Clients   2 WallTime       0.47 (  0.00%)   0.45 (  3.73%)
    Clients   4 WallTime       0.38 (  0.00%)   0.37 (  1.58%)
    Clients   1 Flt/sec/cpu  497620.28 (  0.00%) 520294.53 (  4.56%)
    Clients   2 Flt/sec/cpu  414639.05 (  0.00%) 429882.01 (  3.68%)
    Clients   4 Flt/sec/cpu  257959.16 (  0.00%) 258761.48 (  0.31%)
    Clients   1 Flt/sec      495161.39 (  0.00%) 517292.87 (  4.47%)
    Clients   2 Flt/sec      820325.95 (  0.00%) 850289.77 (  3.65%)
    Clients   4 Flt/sec      1020068.93 (  0.00%) 1022674.06 (  0.26%)
    MMTests Statistics: duration
    Sys Time Running Test (seconds)             135.68    132.17
    User+Sys Time Running Test (seconds)         164.2    160.13
    Total Elapsed Time (seconds)                123.46    120.87

The overall improvement is small but the System CPU time is much
improved and roughly in correlation to what oprofile reported (these
performance figures are without profiling so skew is expected).  The
actual number of page faults is noticeably improved.

For benchmarks like kernel builds, the overall benefit is marginal but
the system CPU time is slightly reduced.

To test the actual bug the commit fixed I opened two terminals.  The
first ran within a cpuset and continually ran a small program that
faulted 100M of anonymous data.  In a second window, the nodemask of the
cpuset was continually randomised in a loop.

Without the commit, the program would fail every so often (usually
within 10 seconds) and obviously with the commit everything worked fine.
With this patch applied, it also worked fine so the fix should be
functionally equivalent.

Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: David Rientjes &lt;rientjes@google.com&gt;
Cc: Peter Zijlstra &lt;a.p.zijlstra@chello.nl&gt;
Cc: Christoph Lameter &lt;cl@linux.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask</title>
<updated>2012-08-13T13:10:30+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-12-20T01:11:52+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=73a2437d720db594b59519517550b2adcda789e1'/>
<id>73a2437d720db594b59519517550b2adcda789e1</id>
<content type='text'>
BugLink: http://bugs.launchpad.net/bugs/1031926

commit b246272ecc5ac68c743b15c9e41a2275f7ce70e2 upstream.

Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
	expensive and severely impacted page allocator performance. This is
	part of a series of patches that reduce page allocator overhead.

Kernels where MAX_NUMNODES &gt; BITS_PER_LONG may temporarily see an empty
nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
new set of allowed cpuset nodes where the two nodemasks, as a result of
the remap, are now disjoint.

c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread.  This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.

This stall is unncessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes.  This was addressed by
89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind.  To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.

This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.

Also moves the check for nodemask intersection inside task_lock() so that
tsk-&gt;mems_allowed cannot change.  This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk-&gt;mempolicy.

Reported-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;paul@paulmenage.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BugLink: http://bugs.launchpad.net/bugs/1031926

commit b246272ecc5ac68c743b15c9e41a2275f7ce70e2 upstream.

Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is extremely
	expensive and severely impacted page allocator performance. This is
	part of a series of patches that reduce page allocator overhead.

Kernels where MAX_NUMNODES &gt; BITS_PER_LONG may temporarily see an empty
nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
new set of allowed cpuset nodes where the two nodemasks, as a result of
the remap, are now disjoint.

c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread.  This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.

This stall is unncessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes.  This was addressed by
89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind.  To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.

This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.

Also moves the check for nodemask intersection inside task_lock() so that
tsk-&gt;mems_allowed cannot change.  This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk-&gt;mempolicy.

Reported-by: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Paul Menage &lt;paul@paulmenage.org&gt;
Cc: Stephen Rothwell &lt;sfr@canb.auug.org.au&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cpusets: avoid looping when storing to mems_allowed if one node remains set</title>
<updated>2012-08-13T13:10:30+00:00</updated>
<author>
<name>David Rientjes</name>
<email>rientjes@google.com</email>
</author>
<published>2011-11-02T20:38:39+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt-pandaboard.git/commit/?id=0893514e3eed73d486095ad5ad225b09345e305e'/>
<id>0893514e3eed73d486095ad5ad225b09345e305e</id>
<content type='text'>
BugLink: http://bugs.launchpad.net/bugs/1031926

commit 89e8a244b97e48f1f30e898b6f32acca477f2a13 upstream.

Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is
	extremely expensive and severely impacted page allocator performance.
	This is part of a series of patches that reduce page allocator
	overhead.

{get,put}_mems_allowed() exist so that general kernel code may locklessly
access a task's set of allowable nodes without having the chance that a
concurrent write will cause the nodemask to be empty on configurations
where MAX_NUMNODES &gt; BITS_PER_LONG.

This could incur a significant delay, however, especially in low memory
conditions because the page allocator is blocking and reclaim requires
get_mems_allowed() itself.  It is not atypical to see writes to
cpuset.mems take over 2 seconds to complete, for example.  In low memory
conditions, this is problematic because it's one of the most imporant
times to change cpuset.mems in the first place!

The only way a task's set of allowable nodes may change is through cpusets
by writing to cpuset.mems and when attaching a task to a generic code is
not reading the nodemask with get_mems_allowed() at the same time, and
then clearing all the old nodes.  This prevents the possibility that a
reader will see an empty nodemask at the same time the writer is storing a
new nodemask.

If at least one node remains unchanged, though, it's possible to simply
set all new nodes and then clear all the old nodes.  Changing a task's
nodemask is protected by cgroup_mutex so it's guaranteed that two threads
are not changing the same task's nodemask at the same time, so the
nodemask is guaranteed to be stored before another thread changes it and
determines whether a node remains set or not.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Paul Menage &lt;paul@paulmenage.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
BugLink: http://bugs.launchpad.net/bugs/1031926

commit 89e8a244b97e48f1f30e898b6f32acca477f2a13 upstream.

Stable note: Not tracked in Bugzilla. [get|put]_mems_allowed() is
	extremely expensive and severely impacted page allocator performance.
	This is part of a series of patches that reduce page allocator
	overhead.

{get,put}_mems_allowed() exist so that general kernel code may locklessly
access a task's set of allowable nodes without having the chance that a
concurrent write will cause the nodemask to be empty on configurations
where MAX_NUMNODES &gt; BITS_PER_LONG.

This could incur a significant delay, however, especially in low memory
conditions because the page allocator is blocking and reclaim requires
get_mems_allowed() itself.  It is not atypical to see writes to
cpuset.mems take over 2 seconds to complete, for example.  In low memory
conditions, this is problematic because it's one of the most imporant
times to change cpuset.mems in the first place!

The only way a task's set of allowable nodes may change is through cpusets
by writing to cpuset.mems and when attaching a task to a generic code is
not reading the nodemask with get_mems_allowed() at the same time, and
then clearing all the old nodes.  This prevents the possibility that a
reader will see an empty nodemask at the same time the writer is storing a
new nodemask.

If at least one node remains unchanged, though, it's possible to simply
set all new nodes and then clear all the old nodes.  Changing a task's
nodemask is protected by cgroup_mutex so it's guaranteed that two threads
are not changing the same task's nodemask at the same time, so the
nodemask is guaranteed to be stored before another thread changes it and
determines whether a node remains set or not.

Signed-off-by: David Rientjes &lt;rientjes@google.com&gt;
Cc: Miao Xie &lt;miaox@cn.fujitsu.com&gt;
Cc: KOSAKI Motohiro &lt;kosaki.motohiro@jp.fujitsu.com&gt;
Cc: Nick Piggin &lt;npiggin@kernel.dk&gt;
Cc: Paul Menage &lt;paul@paulmenage.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Mel Gorman &lt;mgorman@suse.de&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Signed-off-by: Herton Ronaldo Krzesinski &lt;herton.krzesinski@canonical.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
