litmus-rt.git/init, branch wip-2012.3-gpu-preport

Fix CPU spinlock lockups on secondary CPU bringup

2011-06-23T15:59:38+00:00

Secondary CPU bringup typically calls calibrate_delay() during its
initialization.  However, calibrate_delay() modifies a global variable
(loops_per_jiffy) used for udelay() and __delay().

A side effect of 71c696b1 ("calibrate: extract fall-back calculation
into own helper") introduced in the 2.6.39 merge window means that we
end up with a substantial period where loops_per_jiffy is zero.  This
causes the spinlock debugging code to malfunction:

	u64 loops = loops_per_jiffy * HZ;
	for (;;) {
		for (i = 0; i < loops; i++) {
			if (arch_spin_trylock(&lock->raw_lock))
				return;
			__delay(1);
		}
		...
	}

by never calling arch_spin_trylock() - resulting in the CPU locking
up in an infinite loop inside __spin_lock_debug().

Work around this by only writing to loops_per_jiffy only once we have
completed all the calibration decisions.

Tested-by: Santosh Shilimkar 
Signed-off-by: Russell King 
Cc:  (2.6.39-stable)
--
Better solutions (such as omitting the calibration for secondary CPUs,
or arranging for calibrate_delay() to return the LPJ value and leave
it to the caller to decide where to store it) are a possibility, but
would be much more invasive into each architecture.

I think this is the best solution for -rc and stable, but it should be
revisited for the next merge window.

 init/calibrate.c |   14 ++++++++------
 1 files changed, 8 insertions(+), 6 deletions(-)
Signed-off-by: Linus Torvalds

generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts

2011-06-17T08:17:12+00:00

There is a problem that kdump(2nd kernel) sometimes hangs up due
to a pending IPI from 1st kernel. Kernel panic occurs because IPI
comes before call_single_queue is initialized.

To fix the crash, rename init_call_single_data() to call_function_init()
and call it in start_kernel() so that call_single_queue can be
initialized before enabling interrupts.

The details of the crash are:

 (1) 2nd kernel boots up

 (2) A pending IPI from 1st kernel comes when irqs are first enabled
     in start_kernel().

 (3) Kernel tries to handle the interrupt, but call_single_queue
     is not initialized yet at this point. As a result, in the
     generic_smp_call_function_single_interrupt(), NULL pointer
     dereference occurs when list_replace_init() tries to access
     &q->list.next.

Therefore this patch changes the name of init_call_single_data()
to call_function_init() and calls it before local_irq_enable()
in start_kernel().

Signed-off-by: Takao Indoh 
Reviewed-by: WANG Cong 
Acked-by: Neil Horman 
Acked-by: Vivek Goyal 
Acked-by: Peter Zijlstra 
Cc: Milton Miller 
Cc: Jens Axboe 
Cc: Paul E. McKenney 
Cc: kexec@lists.infradead.org
Link: http://lkml.kernel.org/r/D6CBEE2F420741indou.takao@jp.fujitsu.com
Signed-off-by: Ingo Molnar

gcov: disable CONFIG_CONSTRUCTORS when not needed by CONFIG_GCOV_KERNEL

2011-06-16T03:04:01+00:00

CONFIG_CONSTRUCTORS controls support for running constructor functions at
kernel init time.  According to commit b99b87f70c7785ab ("kernel:
constructor support"), gcov (CONFIG_GCOV_KERNEL) needs this.  However,
CONFIG_CONSTRUCTORS currently defaults to y, with no option to disable it,
and CONFIG_GCOV_KERNEL depends on it.  Instead, default it to n and have
CONFIG_GCOV_KERNEL select it, so that the normal case of
CONFIG_GCOV_KERNEL=n will result in CONFIG_CONSTRUCTORS=n.

Observed in the short list of =y values in a minimal kernel configuration.

Signed-off-by: Josh Triplett 
Acked-by: WANG Cong 
Acked-by: Peter Oberparleiter 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

init/calibrate.c: remove annoying printk

2011-06-16T03:04:01+00:00

Remove calibrate_delay_direct()'s KERN_DEBUG printk related to bogomips
calculation as it appears when booting every core on setups with
'ignore_loglevel' which dmesg people scan for possible issues.  As the
message doesn't show very useful information to the widest audience of
kernel boot message gazers, it should be removed.

Introduced by commit d2b463135f84 ("init/calibrate.c: fix for critical
bogoMIPS intermittent calculation failure").

Signed-off-by: Borislav Petkov 
Cc: Andrew Worsley 
Cc: Phil Carmody 
Cc: Peter Zijlstra 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

uts: make default hostname configurable, rather than always using "(none)"

2011-06-16T03:04:00+00:00

The "hostname" tool falls back to setting the hostname to "localhost" if
/etc/hostname does not exist.  Distribution init scripts have the same
fallback.  However, if userspace never calls sethostname, such as when
booting with init=/bin/sh, or otherwise booting a minimal system without
the usual init scripts, the default hostname of "(none)" remains,
unhelpfully appearing in various places such as prompts ("root@(none):~#")
and logs.  Furthermore, "(none)" doesn't typically resolve to anything
useful.

Make the default hostname configurable.  This removes the need for the
standard fallback, provides a useful default for systems that never call
sethostname, and makes minimal systems that much more useful with less
configuration.  Distributions could choose to use "localhost" here to
avoid the fallback, while embedded systems may wish to use a specific
target hostname.

Signed-off-by: Josh Triplett 
Acked-by: Linus Torvalds 
Acked-by: David Miller 
Cc: Serge Hallyn 
Cc: Kel Modderman 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: Fix boot crash in mm_alloc()

2011-05-29T18:32:28+00:00

Thomas Gleixner reports that we now have a boot crash triggered by
CONFIG_CPUMASK_OFFSTACK=y:

    BUG: unable to handle kernel NULL pointer dereference at   (null)
    IP: [] find_next_bit+0x55/0xb0
    Call Trace:
     [] cpumask_any_but+0x2a/0x70
     [] flush_tlb_mm+0x2b/0x80
     [] pud_populate+0x35/0x50
     [] pgd_alloc+0x9a/0xf0
     [] mm_init+0xec/0x120
     [] mm_alloc+0x53/0xd0

which was introduced by commit de03c72cfce5 ("mm: convert
mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
mm_init() vs mm_init_cpumask

Thomas wrote a patch to just fix the ordering of initialization, but I
hate the new double allocation in the fork path, so I ended up instead
doing some more radical surgery to clean it all up.

Reported-by: Thomas Gleixner 
Reported-by: Ingo Molnar 
Cc: KOSAKI Motohiro 
Cc: Andrew Morton 
Signed-off-by: Linus Torvalds

cgroup: remove the ns_cgroup

2011-05-27T00:12:34+00:00

The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

  * cgroup creation is out-of-control
  * cgroup name can conflict when pids are looping
  * it is not possible to have a single process handling a lot of
    namespaces without falling in a exponential creation time
  * we may want to create a namespace without creating a cgroup

  The ns_cgroup was replaced by a compatibility flag 'clone_children',
  where a newly created cgroup will copy the parent cgroup values.
  The userspace has to manually create a cgroup and add a task to
  the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

The 'cgroup_clone' function is removed because it is no longer used.

This is a userspace-visible change.  Commit 45531757b45c ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal.  Since that
time we have heard from XXX users who were affected by this.

Signed-off-by: Daniel Lezcano 
Signed-off-by: Serge E. Hallyn 
Cc: Eric W. Biederman 
Cc: Jamal Hadi Salim 
Reviewed-by: Li Zefan 
Acked-by: Paul Menage 
Acked-by: Matt Helsley 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

printk: allocate kernel log buffer earlier

2011-05-25T15:39:48+00:00

On larger systems, because of the numerous ACPI, Bootmem and EFI messages,
the static log buffer overflows before the larger one specified by the
log_buf_len param is allocated.  Minimize the overflow by allocating the
new log buffer as soon as possible.

On kernels without memblock, a later call to setup_log_buf from
kernel/init.c is the fallback.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix CONFIG_PRINTK=n build]
Signed-off-by: Mike Travis 
Cc: Yinghai Lu 
Cc: "H. Peter Anvin" 
Cc: Jack Steiner 
Cc: Thomas Gleixner 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

init/calibrate.c: fix for critical bogoMIPS intermittent calculation failure

2011-05-25T15:39:46+00:00

A fix to the TSC (Time Stamp Counter) based bogoMIPS calculation used on
secondary CPUs which has two faults:

1: Not handling wrapping of the lower 32 bits of the TSC counter on
   32bit kernel - perhaps TSC is not reset by a warm reset?

2: TSC and Jiffies are no incrementing together properly.  Either
   jiffies increment too quickly or Time Stamp Counter isn't incremented
   in during an SMI but the real time clock is and jiffies are
   incremented.

Case 1 can result in a factor of 16 too large a value which makes udelay()
values too small and can cause mysterious driver errors.  Case 2 appears
to give smaller 10-15% errors after averaging but enough to cause
occasional failures on my own board

I have tested this code on my own branch and attach patch suitable for
current kernel code.  See below for examples of the failures and how the
fix handles these situations now.

I reported this issue earlier here:
     Intermittent problem with BogoMIPs calculation on Intel AP CPUs -
http://marc.info/?l=linux-kernel&m=129947246316875&w=4

I suspect this issue has been seen by others but as it is intermittent and
bogoMIPS for secondary CPUs are no longer printed out it might have been
difficult to identify this as the cause.  Perhaps these unresolved issues,
although quite old, might be relevant as possibly this fault has been
around for a while.  In particular Case 1 may only be relevant to 32bit
kernels on newer HW (most people run 64bit kernels?).  Case 2 is less
dramatic since the earlier fix in this area and also intermittent.

   Re: bogomips discrepancy on Intel Core2 Quad CPU -
http://marc.info/?l=linux-kernel&m=118929277524298&w=4
   slow system and bogus bogomips  -
http://marc.info/?l=linux-kernel&m=116791286716107&w=4
   Re: Re: [RFC-PATCH] clocksource: update lpj if clocksource has -
http://marc.info/?l=linux-kernel&m=128952775819467&w=4

This issue is masked a little by commit feae3203d711db0a ("timers, init:
Limit the number of per cpu calibration bootup messages") which only
prints out the first bogoMIPS value making it much harder to notice other
values differing.  Perhaps it should be changed to only suppress them when
they are similar values?

Here are some outputs showing faults occurring and the new code handling
them properly.  See my earlier message for examples of the original
failure.

    Case 1:   A Time Stamp Counter wrap:
...
Calibrating delay loop (skipped), value calculated using timer
frequency.. 6332.70 BogoMIPS (lpj=31663540)
....
calibrate_delay_direct() timer_rate_max=31666493
timer_rate_min=31666151 pre_start=4170369255 pre_end=4202035539
calibrate_delay_direct() timer_rate_max=2425955274
timer_rate_min=2425954941 pre_start=4265368533 pre_end=2396356387
calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap
around start=4265368581 >=post_end=2396356511
calibrate_delay_direct() timer_rate_max=31666274
timer_rate_min=31665942 pre_start=2440373374 pre_end=2472039515
calibrate_delay_direct() timer_rate_max=31666492
timer_rate_min=31666160 pre_start=2535372139 pre_end=2567038422
calibrate_delay_direct() timer_rate_max=31666455
timer_rate_min=31666207 pre_start=2630371084 pre_end=2662037415
Calibrating delay using timer specific routine.. 6333.28 BogoMIPS (lpj=31666428)
Total of 2 processors activated (12665.99 BogoMIPS).
....

    Case 2:  Some thing (presumably the SMM interrupt?) causing the
very low increase in TSC counter for the DELAY_CALIBRATION_TICKS
increase in jiffies
...
Calibrating delay loop (skipped), value calculated using timer
frequency.. 6333.25 BogoMIPS (lpj=31666270)
...
calibrate_delay_direct() timer_rate_max=31666483
timer_rate_min=31666074 pre_start=4199536526 pre_end=4231202809
calibrate_delay_direct() timer_rate_max=864348 timer_rate_min=864016
pre_start=2405343672 pre_end=2406207897
calibrate_delay_direct() timer_rate_max=31666483
timer_rate_min=31666179 pre_start=2469540464 pre_end=2501206823
calibrate_delay_direct() timer_rate_max=31666511
timer_rate_min=31666122 pre_start=2564539400 pre_end=2596205712
calibrate_delay_direct() timer_rate_max=31666084
timer_rate_min=31665685 pre_start=2659538782 pre_end=2691204657
calibrate_delay_direct() dropping min bogoMips estimate 1 = 864348
Calibrating delay using timer specific routine.. 6333.27 BogoMIPS (lpj=31666390)
Total of 2 processors activated (12666.53 BogoMIPS).
...

After 70 boots I saw 2 variations <1% slip through

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix straggly printk mess]
Signed-off-by: Andrew Worsley 
Reviewed-by: Phil Carmody 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

mm: convert mm->cpu_vm_cpumask into cpumask_var_t

2011-05-25T15:39:21+00:00

cpumask_t is very big struct and cpu_vm_mask is placed wrong position.
It might lead to reduce cache hit ratio.

This patch has two change.
1) Move the place of cpumask into last of mm_struct. Because usually cpumask
   is accessed only front bits when the system has cpu-hotplug capability
2) Convert cpu_vm_mask into cpumask_var_t. It may help to reduce memory
   footprint if cpumask_size() will use nr_cpumask_bits properly in future.

In addition, this patch change the name of cpu_vm_mask with cpu_vm_mask_var.
It may help to detect out of tree cpu_vm_mask users.

This patch has no functional change.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KOSAKI Motohiro 
Cc: David Howells 
Cc: Koichi Yasutake 
Cc: Hugh Dickins 
Cc: Chris Metcalf 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds