From 9b1d82fa1611706fa7ee1505f290160a18caf95d Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Sun, 25 Oct 2009 19:03:50 -0700 Subject: rcu: "Tiny RCU", The Bloatwatch Edition This patch is a version of RCU designed for !SMP provided for a small-footprint RCU implementation. In particular, the implementation of synchronize_rcu() is extremely lightweight and high performance. It passes rcutorture testing in each of the four relevant configurations (combinations of NO_HZ and PREEMPT) on x86. This saves about 1K bytes compared to old Classic RCU (which is no longer in mainline), and more than three kilobytes compared to Hierarchical RCU (updated to 2.6.30): CONFIG_TREE_RCU: text data bss dec filename 183 4 0 187 kernel/rcupdate.o 2783 520 36 3339 kernel/rcutree.o 3526 Total (vs 4565 for v7) CONFIG_TREE_PREEMPT_RCU: text data bss dec filename 263 4 0 267 kernel/rcupdate.o 4594 776 52 5422 kernel/rcutree.o 5689 Total (6155 for v7) CONFIG_TINY_RCU: text data bss dec filename 96 4 0 100 kernel/rcupdate.o 734 24 0 758 kernel/rcutiny.o 858 Total (vs 848 for v7) The above is for x86. Your mileage may vary on other platforms. Further compression is possible, but is being procrastinated. Changes from v7 (http://lkml.org/lkml/2009/10/9/388) o Apply Lai Jiangshan's review comments (aside from might_sleep() in synchronize_sched(), which is covered by SMP builds). o Fix up expedited primitives. Changes from v6 (http://lkml.org/lkml/2009/9/23/293). o Forward ported to put it into the 2.6.33 stream. o Added lockdep support. o Make lightweight rcu_barrier. Changes from v5 (http://lkml.org/lkml/2009/6/23/12). o Ported to latest pre-2.6.32 merge window kernel. - Renamed rcu_qsctr_inc() to rcu_sched_qs(). - Renamed rcu_bh_qsctr_inc() to rcu_bh_qs(). - Provided trivial rcu_cpu_notify(). - Provided trivial exit_rcu(). - Provided trivial rcu_needs_cpu(). - Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h. o Removed the dependence on EMBEDDED, with a view to making TINY_RCU default for !SMP at some time in the future. o Added (trivial) support for expedited grace periods. Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include: o Squeeze the size down a bit further by removing the ->completed field from struct rcu_ctrlblk. o This permits synchronize_rcu() to become the empty function. Previous concerns about rcutorture were unfounded, as rcutorture correctly handles a constant value from rcu_batches_completed() and rcu_batches_completed_bh(). Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include: o Changed rcu_batches_completed(), rcu_batches_completed_bh() rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and rcu_nmi_exit(), to be static inlines, as suggested by David Howells. Doing this saves about 100 bytes from rcutiny.o. (The numbers between v3 and this v4 of the patch are not directly comparable, since they are against different versions of Linux.) Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include: o Fix whitespace issues. o Change short-circuit "||" operator to instead be "+" in order to fix performance bug noted by "kraai" on LWN. (http://lwn.net/Articles/324348/) Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include: o This version depends on EMBEDDED as well as !SMP, as suggested by Ingo. o Updated rcu_needs_cpu() to unconditionally return zero, permitting the CPU to enter dynticks-idle mode at any time. This works because callbacks can be invoked upon entry to dynticks-idle mode. o Paul is now OK with this being included, based on a poll at the Kernel Miniconf at linux.conf.au, where about ten people said that they cared about saving 900 bytes on single-CPU systems. o Applies to both mainline and tip/core/rcu. Signed-off-by: Paul E. McKenney Acked-by: David Howells Acked-by: Josh Triplett Reviewed-by: Lai Jiangshan Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: avi@redhat.com Cc: mtosatti@redhat.com LKML-Reference: <12565226351355-git-send-email-> Signed-off-by: Ingo Molnar --- init/Kconfig | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index 09c5c6431f42..d367bf6bcc34 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -334,6 +334,15 @@ config TREE_PREEMPT_RCU is also required. It also scales down nicely to smaller systems. +config TINY_RCU + bool "UP-only small-memory-footprint RCU" + depends on !SMP + help + This option selects the RCU implementation that is + designed for UP systems from which real-time response + is not required. This option greatly reduces the + memory footprint of RCU. + endchoice config RCU_TRACE -- cgit v1.2.2 From 26a7034b40ba80f82f64fa251a2cbf49f9971c6a Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 5 Nov 2009 05:26:41 -0800 Subject: sysctl: Reduce sys_sysctl to a compatibility wrapper around /proc/sys To simply maintenance and to be able to remove all of the binary sysctl support from various subsystems I have rewritten the binary sysctl code as a compatibility wrapper around proc/sys. The code is built around a hard coded table based on the table in sysctl_check.c that lists all of our current binary sysctls and provides enough information to convert from the sysctl binary input into into ascii and back again. New in this patch is the realization that the only dynamic entries that need to be handled have ifname as the asscii string and ifindex as their ctl_name. When a sys_sysctl is called the code now looks in the translation table converting the binary name to the path under /proc where the value is to be found. Opens that file, and calls into a format conversion wrapper that calls fop->read and then fop->write as appropriate. Since in practice the practically no one uses or tests sys_sysctl rewritting the code to be beautiful is a little silly. The redeeming merit of this work is it allows us to rip out all of the binary sysctl syscall support from everywhere else in the tree. Allowing us to remove a lot of dead (after this patch) and barely maintained code. In addition it becomes much easier to optimize the sysctl implementation for being the backing store of /proc/sys, without having to worry about sys_sysctl. Signed-off-by: Eric W. Biederman --- init/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index f51586406d62..feb3b8f7e7ee 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -754,6 +754,7 @@ config UID16 config SYSCTL_SYSCALL bool "Sysctl syscall support" if EMBEDDED + depends on PROC_SYSCTL default y select SYSCTL ---help--- -- cgit v1.2.2 From 6beb000923882f6204ea2cfcd932e568e900803f Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 9 Nov 2009 15:21:34 +0000 Subject: locking: Make inlining decision Kconfig based commit 892a7c67 (locking: Allow arch-inlined spinlocks) implements the selection of which lock functions are inlined based on defines in arch/.../spinlock.h: #define __always_inline__LOCK_FUNCTION Despite of the name __always_inline__* the lock functions can be built out of line depending on config options. Also if the arch does not set some inline defines the generic code might set them; again depending on config options. This makes it unnecessary hard to figure out when and which lock functions are inlined. Aside of that it makes it way harder and messier for -rt to manipulate the lock functions. Convert the inlining decision to CONFIG switches. Each lock function is inlined depending on CONFIG_INLINE_*. The configs implement the existing dependencies. The architecture code can select ARCH_INLINE_* to signal that it wants the corresponding lock function inlined. ARCH_INLINE_* is necessary as Kconfig ignores "depends on" restrictions when a config element is selected. No functional change. Signed-off-by: Thomas Gleixner LKML-Reference: <20091109151428.504477141@linutronix.de> Acked-by: Heiko Carstens Reviewed-by: Ingo Molnar Acked-by: Peter Zijlstra --- init/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index 9e03ef8b311e..06863dd7bf49 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1210,3 +1210,4 @@ source "block/Kconfig" config PREEMPT_NOTIFIERS bool +source "kernel/Kconfig.locks" -- cgit v1.2.2 From feae3203d711db0a9965300ee6d592257fdaae4f Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Tue, 17 Nov 2009 18:22:13 -0600 Subject: timers, init: Limit the number of per cpu calibration bootup messages Limit the number of per cpu calibration messages by only printing out results for the first cpu to boot. Also, don't print "CPUx is down" as this is expected, and we don't need 4096 reminders... ;-) Signed-off-by: Mike Travis Cc: Heiko Carstens Cc: Roland Dreier Cc: Randy Dunlap Cc: Tejun Heo Cc: Andi Kleen Cc: Greg Kroah-Hartman Cc: Yinghai Lu Cc: David Rientjes Cc: Steven Rostedt Cc: Rusty Russell Cc: Hidetoshi Seto Cc: Jack Steiner Cc: Frederic Weisbecker LKML-Reference: <20091118002219.889552000@alcatraz.americas.sgi.com> Signed-off-by: Ingo Molnar --- init/calibrate.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) (limited to 'init') diff --git a/init/calibrate.c b/init/calibrate.c index a379c9061199..6eb48e53d61c 100644 --- a/init/calibrate.c +++ b/init/calibrate.c @@ -123,23 +123,26 @@ void __cpuinit calibrate_delay(void) { unsigned long ticks, loopbit; int lps_precision = LPS_PREC; + static bool printed; if (preset_lpj) { loops_per_jiffy = preset_lpj; - printk(KERN_INFO - "Calibrating delay loop (skipped) preset value.. "); - } else if ((smp_processor_id() == 0) && lpj_fine) { + if (!printed) + pr_info("Calibrating delay loop (skipped) " + "preset value.. "); + } else if ((!printed) && lpj_fine) { loops_per_jiffy = lpj_fine; - printk(KERN_INFO - "Calibrating delay loop (skipped), " + pr_info("Calibrating delay loop (skipped), " "value calculated using timer frequency.. "); } else if ((loops_per_jiffy = calibrate_delay_direct()) != 0) { - printk(KERN_INFO - "Calibrating delay using timer specific routine.. "); + if (!printed) + pr_info("Calibrating delay using timer " + "specific routine.. "); } else { loops_per_jiffy = (1<<12); - printk(KERN_INFO "Calibrating delay loop... "); + if (!printed) + pr_info("Calibrating delay loop... "); while ((loops_per_jiffy <<= 1) != 0) { /* wait for "start of" clock tick */ ticks = jiffies; @@ -170,7 +173,10 @@ void __cpuinit calibrate_delay(void) loops_per_jiffy &= ~loopbit; } } - printk(KERN_CONT "%lu.%02lu BogoMIPS (lpj=%lu)\n", + if (!printed) + pr_cont("%lu.%02lu BogoMIPS (lpj=%lu)\n", loops_per_jiffy/(500000/HZ), (loops_per_jiffy/(5000/HZ)) % 100, loops_per_jiffy); + + printed = true; } -- cgit v1.2.2 From 9e9868a7372afc60615e8e11db246a91a121654d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= Date: Thu, 3 Dec 2009 19:58:00 +0100 Subject: sysfs: deprecated features are to help old tools not to confuse them MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Uwe Kleine-König Acked-by: Randy Dunlap Cc: Greg KH Cc: Andrew Morton Cc: Kay Sievers Signed-off-by: Linus Torvalds --- init/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index 38899243213d..54c655ce9c04 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -615,7 +615,7 @@ config SYSFS_DEPRECATED bool config SYSFS_DEPRECATED_V2 - bool "enable deprecated sysfs features which may confuse old userspace tools" + bool "enable deprecated sysfs features to support old userspace tools" depends on SYSFS default n select SYSFS_DEPRECATED -- cgit v1.2.2 From 92045954058671fdd0ccf031ca06611ce1d929d1 Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Sun, 18 Oct 2009 00:36:47 +0200 Subject: kbuild: move compile.h to include/generated Signed-off-by: Sam Ravnborg Signed-off-by: Michal Marek --- init/Makefile | 8 ++------ init/version.c | 2 +- 2 files changed, 3 insertions(+), 7 deletions(-) (limited to 'init') diff --git a/init/Makefile b/init/Makefile index 4a243df426f7..0bf677aa0872 100644 --- a/init/Makefile +++ b/init/Makefile @@ -15,12 +15,8 @@ mounts-$(CONFIG_BLK_DEV_RAM) += do_mounts_rd.o mounts-$(CONFIG_BLK_DEV_INITRD) += do_mounts_initrd.o mounts-$(CONFIG_BLK_DEV_MD) += do_mounts_md.o -# files to be removed upon make clean -clean-files := ../include/linux/compile.h - # dependencies on generated files need to be listed explicitly - -$(obj)/version.o: include/linux/compile.h +$(obj)/version.o: include/generated/compile.h # compile.h changes depending on hostname, generation number, etc, # so we regenerate it always. @@ -30,7 +26,7 @@ $(obj)/version.o: include/linux/compile.h chk_compile.h = : quiet_chk_compile.h = echo ' CHK $@' silent_chk_compile.h = : -include/linux/compile.h: FORCE +include/generated/compile.h: FORCE @$($(quiet)chk_compile.h) $(Q)$(CONFIG_SHELL) $(srctree)/scripts/mkcompile_h $@ \ "$(UTS_MACHINE)" "$(CONFIG_SMP)" "$(CONFIG_PREEMPT)" "$(CC) $(KBUILD_CFLAGS)" diff --git a/init/version.c b/init/version.c index 52a8b98642b8..82328aaca1ef 100644 --- a/init/version.c +++ b/init/version.c @@ -6,7 +6,7 @@ * May be freely distributed as part of Linux. */ -#include +#include #include #include #include -- cgit v1.2.2 From 273b281fa22c293963ee3e6eec418f5dda2dbc83 Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Sun, 18 Oct 2009 00:52:28 +0200 Subject: kbuild: move utsrelease.h to include/generated Fix up all users of utsrelease.h Signed-off-by: Sam Ravnborg Signed-off-by: Michal Marek --- init/version.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/version.c b/init/version.c index 82328aaca1ef..adff586401a5 100644 --- a/init/version.c +++ b/init/version.c @@ -10,7 +10,7 @@ #include #include #include -#include +#include #include #ifndef CONFIG_KALLSYMS -- cgit v1.2.2 From ea637639591def87a54cea811cbac796980cb30d Mon Sep 17 00:00:00 2001 From: Jie Zhang Date: Mon, 14 Dec 2009 18:00:02 -0800 Subject: nommu: fix malloc performance by adding uninitialized flag The NOMMU code currently clears all anonymous mmapped memory. While this is what we want in the default case, all memory allocation from userspace under NOMMU has to go through this interface, including malloc() which is allowed to return uninitialized memory. This can easily be a significant performance penalty. So for constrained embedded systems were security is irrelevant, allow people to avoid clearing memory unnecessarily. This also alters the ELF-FDPIC binfmt such that it obtains uninitialised memory for the brk and stack region. Signed-off-by: Jie Zhang Signed-off-by: Robin Getz Signed-off-by: Mike Frysinger Signed-off-by: David Howells Acked-by: Paul Mundt Acked-by: Greg Ungerer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/Kconfig | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index 54c655ce9c04..a23da9f01803 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1079,6 +1079,28 @@ config SLOB endchoice +config MMAP_ALLOW_UNINITIALIZED + bool "Allow mmapped anonymous memory to be uninitialized" + depends on EMBEDDED && !MMU + default n + help + Normally, and according to the Linux spec, anonymous memory obtained + from mmap() has it's contents cleared before it is passed to + userspace. Enabling this config option allows you to request that + mmap() skip that if it is given an MAP_UNINITIALIZED flag, thus + providing a huge performance boost. If this option is not enabled, + then the flag will be ignored. + + This is taken advantage of by uClibc's malloc(), and also by + ELF-FDPIC binfmt's brk and stack allocator. + + Because of the obvious security issues, this option should only be + enabled on embedded devices where you control what is run in + userspace. Since that isn't generally a problem on no-MMU systems, + it is normally safe to say Y here. + + See Documentation/nommu-mmap.txt for more information. + config PROFILING bool "Profiling support (EXPERIMENTAL)" help -- cgit v1.2.2 From 196a15b4ee99f627fbc2c07e58e14aab2065fa80 Mon Sep 17 00:00:00 2001 From: H Hartley Sweeten Date: Mon, 14 Dec 2009 18:00:18 -0800 Subject: init/main.c: fix symbol shadows noise The symbol 'call' is a static symbol used for initcall_debug. This same symbol name is used locally by a couple functions and produces the following sparse warnings: warning: symbol 'call' shadows an earlier one Fix this noise by renaming the local symbols. Signed-off-by: H Hartley Sweeten Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/main.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index 4051d75dd2d6..c3db4a98b369 100644 --- a/init/main.c +++ b/init/main.c @@ -691,10 +691,10 @@ asmlinkage void __init start_kernel(void) static void __init do_ctors(void) { #ifdef CONFIG_CONSTRUCTORS - ctor_fn_t *call = (ctor_fn_t *) __ctors_start; + ctor_fn_t *fn = (ctor_fn_t *) __ctors_start; - for (; call < (ctor_fn_t *) __ctors_end; call++) - (*call)(); + for (; fn < (ctor_fn_t *) __ctors_end; fn++) + (*fn)(); #endif } @@ -755,10 +755,10 @@ extern initcall_t __initcall_start[], __initcall_end[], __early_initcall_end[]; static void __init do_initcalls(void) { - initcall_t *call; + initcall_t *fn; - for (call = __early_initcall_end; call < __initcall_end; call++) - do_one_initcall(*call); + for (fn = __early_initcall_end; fn < __initcall_end; fn++) + do_one_initcall(*fn); /* Make sure there is no pending stuff from the initcall sequence */ flush_scheduled_work(); @@ -785,10 +785,10 @@ static void __init do_basic_setup(void) static void __init do_pre_smp_initcalls(void) { - initcall_t *call; + initcall_t *fn; - for (call = __initcall_start; call < __early_initcall_end; call++) - do_one_initcall(*call); + for (fn = __initcall_start; fn < __early_initcall_end; fn++) + do_one_initcall(*fn); } static void run_init_process(char *init_filename) -- cgit v1.2.2 From 54291362d2a5738e1b0495df2abcb9e6b0563a3f Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Mon, 14 Dec 2009 21:45:19 +0000 Subject: initramfs: add missing decompressor error check The decompressors return error by calling a supplied error function, and/or by returning an error return value. The initramfs code, however, fails to check the exit code returned by the decompressor, and only checks the error status set by calling the error function. This patch adds a return code check and calls the error function. Signed-off-by: Phillip Lougher LKML-Reference: <4b26b1ef.0+ZWxT6886olqcSc%phillip@lougher.demon.co.uk> Signed-off-by: H. Peter Anvin --- init/initramfs.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'init') diff --git a/init/initramfs.c b/init/initramfs.c index 4c00edc59689..b37d34beb90b 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -413,7 +413,7 @@ static unsigned my_inptr; /* index of next byte to be processed in inbuf */ static char * __init unpack_to_rootfs(char *buf, unsigned len) { - int written; + int written, res; decompress_fn decompress; const char *compress_name; static __initdata char msg_buf[64]; @@ -445,10 +445,12 @@ static char * __init unpack_to_rootfs(char *buf, unsigned len) } this_header = 0; decompress = decompress_method(buf, len, &compress_name); - if (decompress) - decompress(buf, len, NULL, flush_buffer, NULL, + if (decompress) { + res = decompress(buf, len, NULL, flush_buffer, NULL, &my_inptr, error); - else if (compress_name) { + if (res) + error("decompressor failed"); + } else if (compress_name) { if (!message) { snprintf(msg_buf, sizeof msg_buf, "compression method %s not configured", -- cgit v1.2.2 From 933b0618d8b2a59c7a0742e43836544e02f1e9bd Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Wed, 16 Dec 2009 18:04:31 +0100 Subject: sched: Mark boot-cpu active before smp_init() A UP machine has 1 active cpu, not having the boot-cpu in the active map when starting the scheduler confuses things. Signed-off-by: Peter Zijlstra Cc: Mike Galbraith LKML-Reference: <20091216170517.423469527@chello.nl> Signed-off-by: Ingo Molnar --- init/main.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index c3db4a98b369..dac44a9356a5 100644 --- a/init/main.c +++ b/init/main.c @@ -369,12 +369,6 @@ static void __init smp_init(void) { unsigned int cpu; - /* - * Set up the current CPU as possible to migrate to. - * The other ones will be done by cpu_up/cpu_down() - */ - set_cpu_active(smp_processor_id(), true); - /* FIXME: This should be done in userspace --RR */ for_each_present_cpu(cpu) { if (num_online_cpus() >= setup_max_cpus) @@ -486,6 +480,7 @@ static void __init boot_cpu_init(void) int cpu = smp_processor_id(); /* Mark the boot cpu "present", "online" etc for SMP and UP case */ set_cpu_online(cpu, true); + set_cpu_active(cpu, true); set_cpu_present(cpu, true); set_cpu_possible(cpu, true); } -- cgit v1.2.2 From 07b139c8c81b97bbe55c68daf0cbeca8b1c609ca Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Mon, 21 Dec 2009 14:27:35 +0800 Subject: perf events: Remove CONFIG_EVENT_PROFILE Quoted from Ingo: | This reminds me - i think we should eliminate CONFIG_EVENT_PROFILE - | it's an unnecessary Kconfig complication. If both PERF_EVENTS and | EVENT_TRACING is enabled we should expose generic tracepoints. | | Nor is it limited to event 'profiling', so it has become a misnomer as | well. Signed-off-by: Li Zefan Cc: Frederic Weisbecker Cc: Steven Rostedt Cc: Peter Zijlstra Cc: Paul Mackerras LKML-Reference: <4B2F1557.2050705@cn.fujitsu.com> Signed-off-by: Ingo Molnar --- init/Kconfig | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index a23da9f01803..06dab27c18d9 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -966,19 +966,6 @@ config PERF_EVENTS Say Y if unsure. -config EVENT_PROFILE - bool "Tracepoint profiling sources" - depends on PERF_EVENTS && EVENT_TRACING - default y - help - Allow the use of tracepoints as software performance events. - - When this is enabled, you can create perf events based on - tracepoints using PERF_TYPE_TRACEPOINT and the tracepoint ID - found in debugfs://tracing/events/*/*/id. (The -e/--events - option to the perf tool can parse and interpret symbolic - tracepoints, in the subsystem:tracepoint_name format.) - config PERF_COUNTERS bool "Kernel performance counters (old config option)" depends on HAVE_PERF_EVENTS -- cgit v1.2.2 From 16295bec6398a3eedc9377e1af6ff4c71b98c300 Mon Sep 17 00:00:00 2001 From: Steffen Klassert Date: Wed, 6 Jan 2010 19:47:10 +1100 Subject: padata: Generic parallelization/serialization interface This patch introduces an interface to process data objects in parallel. The parallelized objects return after serialization in the same order as they were before the parallelization. Signed-off-by: Steffen Klassert Signed-off-by: Herbert Xu --- init/Kconfig | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index a23da9f01803..9fd23bcc1709 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1252,4 +1252,8 @@ source "block/Kconfig" config PREEMPT_NOTIFIERS bool +config PADATA + depends on SMP + bool + source "kernel/Kconfig.locks" -- cgit v1.2.2 From 7dd65feb6c603e13eba501c34c662259ab38e70e Mon Sep 17 00:00:00 2001 From: Albin Tonnerre Date: Fri, 8 Jan 2010 14:42:42 -0800 Subject: lib: add support for LZO-compressed kernels This patch series adds generic support for creating and extracting LZO-compressed kernel images, as well as support for using such images on the x86 and ARM architectures, and support for creating and using LZO-compressed initrd and initramfs images. Russell King said: : Testing on a Cortex A9 model: : - lzo decompressor is 65% of the time gzip takes to decompress a kernel : - lzo kernel is 9% larger than a gzip kernel : : which I'm happy to say confirms your figures when comparing the two. : : However, when comparing your new gzip code to the old gzip code: : - new is 99% of the size of the old code : - new takes 42% of the time to decompress than the old code : : What this means is that for a proper comparison, the results get even better: : - lzo is 7.5% larger than the old gzip'd kernel image : - lzo takes 28% of the time that the old gzip code took : : So the expense seems definitely worth the effort. The only reason I : can think of ever using gzip would be if you needed the additional : compression (eg, because you have limited flash to store the image.) : : I would argue that the default for ARM should therefore be LZO. This patch: The lzo compressor is worse than gzip at compression, but faster at extraction. Here are some figures for an ARM board I'm working on: Uncompressed size: 3.24Mo gzip 1.61Mo 0.72s lzo 1.75Mo 0.48s So for a compression ratio that is still relatively close to gzip, it's much faster to extract, at least in that case. This part contains: - Makefile routine to support lzo compression - Fixes to the existing lzo compressor so that it can be used in compressed kernels - wrapper around the existing lzo1x_decompress, as it only extracts one block at a time, while we need to extract a whole file here - config dialog for kernel compression [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: cleanup] Signed-off-by: Albin Tonnerre Tested-by: Wu Zhangjin Acked-by: "H. Peter Anvin" Cc: Ingo Molnar Cc: Thomas Gleixner Tested-by: Russell King Acked-by: Russell King Cc: Ralf Baechle Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/Kconfig | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index a23da9f01803..d95ca7cd5d45 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -115,10 +115,13 @@ config HAVE_KERNEL_BZIP2 config HAVE_KERNEL_LZMA bool +config HAVE_KERNEL_LZO + bool + choice prompt "Kernel compression mode" default KERNEL_GZIP - depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA + depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_LZO help The linux kernel is a kind of self-extracting executable. Several compression algorithms are available, which differ @@ -141,9 +144,8 @@ config KERNEL_GZIP bool "Gzip" depends on HAVE_KERNEL_GZIP help - The old and tried gzip compression. Its compression ratio is - the poorest among the 3 choices; however its speed (both - compression and decompression) is the fastest. + The old and tried gzip compression. It provides a good balance + between compression ratio and decompression speed. config KERNEL_BZIP2 bool "Bzip2" @@ -164,6 +166,14 @@ config KERNEL_LZMA two. Compression is slowest. The kernel size is about 33% smaller with LZMA in comparison to gzip. +config KERNEL_LZO + bool "LZO" + depends on HAVE_KERNEL_LZO + help + Its compression ratio is the poorest among the 4. The kernel + size is about about 10% bigger than gzip; however its speed + (both compression and decompression) is the fastest. + endchoice config SWAP -- cgit v1.2.2 From 7c9414385ebfdd87cc542d4e7e3bb0dbb2d3ce25 Mon Sep 17 00:00:00 2001 From: Dhaval Giani Date: Wed, 20 Jan 2010 13:26:18 +0100 Subject: sched: Remove USER_SCHED Remove the USER_SCHED feature. It has been scheduled to be removed in 2.6.34 as per http://marc.info/?l=linux-kernel&m=125728479022976&w=2 Signed-off-by: Dhaval Giani Signed-off-by: Peter Zijlstra LKML-Reference: <1263990378.24844.3.camel@localhost> Signed-off-by: Ingo Molnar --- init/Kconfig | 81 ++++++++++++++++++++++-------------------------------------- 1 file changed, 30 insertions(+), 51 deletions(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index a23da9f01803..e9fa3007a6fc 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -435,57 +435,6 @@ config LOG_BUF_SHIFT config HAVE_UNSTABLE_SCHED_CLOCK bool -config GROUP_SCHED - bool "Group CPU scheduler" - depends on EXPERIMENTAL - default n - help - This feature lets CPU scheduler recognize task groups and control CPU - bandwidth allocation to such task groups. - In order to create a group from arbitrary set of processes, use - CONFIG_CGROUPS. (See Control Group support.) - -config FAIR_GROUP_SCHED - bool "Group scheduling for SCHED_OTHER" - depends on GROUP_SCHED - default GROUP_SCHED - -config RT_GROUP_SCHED - bool "Group scheduling for SCHED_RR/FIFO" - depends on EXPERIMENTAL - depends on GROUP_SCHED - default n - help - This feature lets you explicitly allocate real CPU bandwidth - to users or control groups (depending on the "Basis for grouping tasks" - setting below. If enabled, it will also make it impossible to - schedule realtime tasks for non-root users until you allocate - realtime bandwidth for them. - See Documentation/scheduler/sched-rt-group.txt for more information. - -choice - depends on GROUP_SCHED - prompt "Basis for grouping tasks" - default USER_SCHED - -config USER_SCHED - bool "user id" - help - This option will choose userid as the basis for grouping - tasks, thus providing equal CPU bandwidth to each user. - -config CGROUP_SCHED - bool "Control groups" - depends on CGROUPS - help - This option allows you to create arbitrary task groups - using the "cgroup" pseudo filesystem and control - the cpu bandwidth allocated to each such task group. - Refer to Documentation/cgroups/cgroups.txt for more - information on "cgroup" pseudo filesystem. - -endchoice - menuconfig CGROUPS boolean "Control Group support" help @@ -606,6 +555,36 @@ config CGROUP_MEM_RES_CTLR_SWAP Now, memory usage of swap_cgroup is 2 bytes per entry. If swap page size is 4096bytes, 512k per 1Gbytes of swap. +menuconfig CGROUP_SCHED + bool "Group CPU scheduler" + depends on EXPERIMENTAL && CGROUPS + default n + help + This feature lets CPU scheduler recognize task groups and control CPU + bandwidth allocation to such task groups. It uses cgroups to group + tasks. + +if CGROUP_SCHED +config FAIR_GROUP_SCHED + bool "Group scheduling for SCHED_OTHER" + depends on CGROUP_SCHED + default CGROUP_SCHED + +config RT_GROUP_SCHED + bool "Group scheduling for SCHED_RR/FIFO" + depends on EXPERIMENTAL + depends on CGROUP_SCHED + default n + help + This feature lets you explicitly allocate real CPU bandwidth + to users or control groups (depending on the "Basis for grouping tasks" + setting below. If enabled, it will also make it impossible to + schedule realtime tasks for non-root users until you allocate + realtime bandwidth for them. + See Documentation/scheduler/sched-rt-group.txt for more information. + +endif #CGROUP_SCHED + endif # CGROUPS config MM_OWNER -- cgit v1.2.2 From 54bb6552bd9405dc7685653157a4ec260c77a71c Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Wed, 9 Dec 2009 15:29:01 -0500 Subject: ima: initialize ima before inodes can be allocated ima wants to create an inode information struct (iint) when inodes are allocated. This means that at least the part of ima which does this allocation (the allocation is filled with information later) should before any inodes are created. To accomplish this we split the ima initialization routine placing the kmem cache allocator inside a security_initcall() function. Since this makes use of radix trees we also need to make sure that is initialized before security_initcall(). Signed-off-by: Eric Paris Acked-by: Mimi Zohar Signed-off-by: Al Viro --- init/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index dac44a9356a5..4cb47a159f02 100644 --- a/init/main.c +++ b/init/main.c @@ -657,9 +657,9 @@ asmlinkage void __init start_kernel(void) proc_caches_init(); buffer_init(); key_init(); + radix_tree_init(); security_init(); vfs_caches_init(totalram_pages); - radix_tree_init(); signals_init(); /* rootfs populating might need page-writeback */ page_writeback_init(); -- cgit v1.2.2 From 773e3eb7b81e5ba13b5155dfb3bb75b8ce37f8f9 Mon Sep 17 00:00:00 2001 From: Yinghai Lu Date: Wed, 10 Feb 2010 01:20:33 -0800 Subject: init: Move radix_tree_init() early Prepare for using radix trees in early_irq_init(). Signed-off-by: Yinghai Lu LKML-Reference: <1265793639-15071-30-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin --- init/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index 4cb47a159f02..845187822e5c 100644 --- a/init/main.c +++ b/init/main.c @@ -584,6 +584,7 @@ asmlinkage void __init start_kernel(void) local_irq_disable(); } rcu_init(); + radix_tree_init(); /* init some links before init_ISA_irqs() */ early_irq_init(); init_IRQ(); @@ -657,7 +658,6 @@ asmlinkage void __init start_kernel(void) proc_caches_init(); buffer_init(); key_init(); - radix_tree_init(); security_init(); vfs_caches_init(totalram_pages); signals_init(); -- cgit v1.2.2 From 2b633e3fac5efada088b57d31e65401f22bcc18f Mon Sep 17 00:00:00 2001 From: Yinghai Lu Date: Wed, 10 Feb 2010 01:20:37 -0800 Subject: smp: Use nr_cpus= to set nr_cpu_ids early On x86, before prefill_possible_map(), nr_cpu_ids will be NR_CPUS aka CONFIG_NR_CPUS. Add nr_cpus= to set nr_cpu_ids. so we can simulate cpus <=8 are installed on normal config. -v2: accordging to Christoph, acpi_numa_init should use nr_cpu_ids in stead of NR_CPUS. -v3: add doc in kernel-parameters.txt according to Andrew. Signed-off-by: Yinghai Lu LKML-Reference: <1265793639-15071-34-git-send-email-yinghai@kernel.org> Acked-by: Linus Torvalds Signed-off-by: H. Peter Anvin Cc: Tony Luck --- init/main.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'init') diff --git a/init/main.c b/init/main.c index 845187822e5c..05b5283e98fa 100644 --- a/init/main.c +++ b/init/main.c @@ -149,6 +149,20 @@ static int __init nosmp(char *str) early_param("nosmp", nosmp); +/* this is hard limit */ +static int __init nrcpus(char *str) +{ + int nr_cpus; + + get_option(&str, &nr_cpus); + if (nr_cpus > 0 && nr_cpus < nr_cpu_ids) + nr_cpu_ids = nr_cpus; + + return 0; +} + +early_param("nr_cpus", nrcpus); + static int __init maxcpus(char *str) { get_option(&str, &setup_max_cpus); -- cgit v1.2.2 From d11c563dd20ff35da5652c3e1c989d9e10e1d6d0 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 22 Feb 2010 17:04:50 -0800 Subject: sched: Use lockdep-based checking on rcu_dereference() Update the rcu_dereference() usages to take advantage of the new lockdep-based checking. Signed-off-by: Paul E. McKenney Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-6-git-send-email-paulmck@linux.vnet.ibm.com> [ -v2: fix allmodconfig missing symbol export build failure on x86 ] Signed-off-by: Ingo Molnar --- init/main.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'init') diff --git a/init/main.c b/init/main.c index 4cb47a159f02..c75dcd6eef09 100644 --- a/init/main.c +++ b/init/main.c @@ -416,7 +416,9 @@ static noinline void __init_refok rest_init(void) kernel_thread(kernel_init, NULL, CLONE_FS | CLONE_SIGHAND); numa_default_policy(); pid = kernel_thread(kthreadd, NULL, CLONE_FS | CLONE_FILES); + rcu_read_lock(); kthreadd_task = find_task_by_pid_ns(pid, &init_pid_ns); + rcu_read_unlock(); unlock_kernel(); /* -- cgit v1.2.2 From 8bd93a2c5d4cab2ae17d06350daa7dbf546a4634 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 22 Feb 2010 17:04:59 -0800 Subject: rcu: Accelerate grace period if last non-dynticked CPU Currently, rcu_needs_cpu() simply checks whether the current CPU has an outstanding RCU callback, which means that the last CPU to go into dyntick-idle mode might wait a few ticks for the relevant grace periods to complete. However, if all the other CPUs are in dyntick-idle mode, and if this CPU is in a quiescent state (which it is for RCU-bh and RCU-sched any time that we are considering going into dyntick-idle mode), then the grace period is instantly complete. This patch therefore repeatedly invokes the RCU grace-period machinery in order to force any needed grace periods to complete quickly. It does so a limited number of times in order to prevent starvation by an RCU callback function that might pass itself to call_rcu(). However, if any CPU other than the current one is not in dyntick-idle mode, fall back to simply checking (with fix to bug noted by Lai Jiangshan). Also, take advantage of last grace-period forcing, the opportunity to do so noted by Steve Rostedt. And apply simplified #ifdef condition suggested by Frederic Weisbecker. Signed-off-by: Paul E. McKenney Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-15-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar --- init/Kconfig | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index d95ca7cd5d45..42bf914b325a 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -396,6 +396,22 @@ config RCU_FANOUT_EXACT Say N if unsure. +config RCU_FAST_NO_HZ + bool "Accelerate last non-dyntick-idle CPU's grace periods" + depends on TREE_RCU && NO_HZ && SMP + default n + help + This option causes RCU to attempt to accelerate grace periods + in order to allow the final CPU to enter dynticks-idle state + more quickly. On the other hand, this option increases the + overhead of the dynticks-idle checking, particularly on systems + with large numbers of CPUs. + + Say Y if energy efficiency is critically important, particularly + if you have relatively few CPUs. + + Say N if you are unsure. + config TREE_RCU_TRACE def_bool RCU_TRACE && ( TREE_RCU || TREE_PREEMPT_RCU ) select DEBUG_FS -- cgit v1.2.2 From b309a294e5b24692d0f7ea1defa168074cea619e Mon Sep 17 00:00:00 2001 From: Robert Richter Date: Fri, 26 Feb 2010 15:01:23 +0100 Subject: oprofile: remove EXPERIMENTAL from the config option description OProfile is already used for a long time and no longer experimental. Signed-off-by: Robert Richter --- init/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index d95ca7cd5d45..b7c583956107 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1112,7 +1112,7 @@ config MMAP_ALLOW_UNINITIALIZED See Documentation/nommu-mmap.txt for more information. config PROFILING - bool "Profiling support (EXPERIMENTAL)" + bool "Profiling support" help Say Y here to enable the extended profiling support mechanisms used by profilers such as OProfile. -- cgit v1.2.2 From 2bd3a997befc226ab4b504f05c5cbba305f3e0e6 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Tue, 2 Mar 2010 23:53:19 -0800 Subject: init: Open /dev/console from rootfs To avoid potential problems with an empty /dev open /dev/console from rootfs instead of waiting to mount our root filesystem and mounting it there. This effectively guarantees that there will be a device node, and it won't be on a filesystem that we will ever unmount, so there are no issues with leaving /dev/console open and pinning the filesystem. This is actually more effective than automatically mounting devtmpfs on /dev because it removes removes the occasionally problematic assumption that /dev/console exists from the boot code. With this patch I was able to throw busybox on my /boot partition (which has no /dev directory) and boot into userspace without problems. The only possible negative consequence I can think of is that someone out there deliberately used did not use a character device that is major 5 minor 2 for /dev/console. Does anyone know of a situation in which that could make sense? Signed-off-by: Eric W. Biederman Signed-off-by: Al Viro --- init/do_mounts_initrd.c | 4 ---- init/main.c | 11 ++++++----- 2 files changed, 6 insertions(+), 9 deletions(-) (limited to 'init') diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c index 614241b5200c..2b108538d0d9 100644 --- a/init/do_mounts_initrd.c +++ b/init/do_mounts_initrd.c @@ -30,11 +30,7 @@ static int __init do_linuxrc(void * shell) extern char * envp_init[]; sys_close(old_fd);sys_close(root_fd); - sys_close(0);sys_close(1);sys_close(2); sys_setsid(); - (void) sys_open("/dev/console",O_RDWR,0); - (void) sys_dup(0); - (void) sys_dup(0); return kernel_execve(shell, argv, envp_init); } diff --git a/init/main.c b/init/main.c index 4cb47a159f02..106e02d7ffa5 100644 --- a/init/main.c +++ b/init/main.c @@ -806,11 +806,6 @@ static noinline int init_post(void) system_state = SYSTEM_RUNNING; numa_default_policy(); - if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0) - printk(KERN_WARNING "Warning: unable to open an initial console.\n"); - - (void) sys_dup(0); - (void) sys_dup(0); current->signal->flags |= SIGNAL_UNKILLABLE; @@ -873,6 +868,12 @@ static int __init kernel_init(void * unused) do_basic_setup(); + /* Open the /dev/console on the rootfs, this should never fail */ + if (sys_open((const char __user *) "/dev/console", O_RDWR, 0) < 0) + printk(KERN_WARNING "Warning: unable to open an initial console.\n"); + + (void) sys_dup(0); + (void) sys_dup(0); /* * check if there is an early userspace init. If yes, let it do all * the work -- cgit v1.2.2 From 452aa6999e6703ffbddd7f6ea124d3968915f3e3 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 5 Mar 2010 13:42:13 -0800 Subject: mm/pm: force GFP_NOIO during suspend/hibernation and resume There are quite a few GFP_KERNEL memory allocations made during suspend/hibernation and resume that may cause the system to hang, because the I/O operations they depend on cannot be completed due to the underlying devices being suspended. Avoid this problem by clearing the __GFP_IO and __GFP_FS bits in gfp_allowed_mask before suspend/hibernation and restoring the original values of these bits in gfp_allowed_mask durig the subsequent resume. [akpm@linux-foundation.org: fix CONFIG_PM=n linkage] Signed-off-by: Rafael J. Wysocki Reported-by: Maxim Levitsky Cc: Sebastian Ott Cc: Benjamin Herrenschmidt Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index 40aaa020cd68..41d0f10dbbc7 100644 --- a/init/main.c +++ b/init/main.c @@ -618,7 +618,7 @@ asmlinkage void __init start_kernel(void) local_irq_enable(); /* Interrupts are enabled now so all GFP allocations are safe. */ - set_gfp_allowed_mask(__GFP_BITS_MASK); + gfp_allowed_mask = __GFP_BITS_MASK; kmem_cache_init_late(); -- cgit v1.2.2 From 9a85b8d6049cbb0e7961df2069322fbc4192026a Mon Sep 17 00:00:00 2001 From: Andreas Mohr Date: Fri, 5 Mar 2010 13:42:39 -0800 Subject: init/main.c: improve usability in case of init binary failure - new Documentation/init.txt file describing various forms of failure trying to load the init binary after kernel bootup - extend the init/main.c init failure message to direct to Documentation/init.txt Signed-off-by: Andreas Mohr Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index 41d0f10dbbc7..b09a828f38b5 100644 --- a/init/main.c +++ b/init/main.c @@ -847,7 +847,8 @@ static noinline int init_post(void) run_init_process("/bin/init"); run_init_process("/bin/sh"); - panic("No init found. Try passing init= option to kernel."); + panic("No init found. Try passing init= option to kernel. " + "See Linux Documentation/init.txt for guidance."); } static int __init kernel_init(void * unused) -- cgit v1.2.2 From 8aaed5bec2b9177eab1796c8c4f7a4c90804eef6 Mon Sep 17 00:00:00 2001 From: H Hartley Sweeten Date: Fri, 5 Mar 2010 13:42:39 -0800 Subject: init/initramfs.c: fix "symbol shadows an earlier one" noise The symbol 'count' is a local global variable in this file. The function clean_rootfs() should use a different symbol name to prevent "symbol shadows an earlier one" noise. Signed-off-by: H Hartley Sweeten Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/initramfs.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'init') diff --git a/init/initramfs.c b/init/initramfs.c index b37d34beb90b..37d3859b1b32 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -525,7 +525,7 @@ static void __init clean_rootfs(void) int fd; void *buf; struct linux_dirent64 *dirp; - int count; + int num; fd = sys_open("/", O_RDONLY, 0); WARN_ON(fd < 0); @@ -539,9 +539,9 @@ static void __init clean_rootfs(void) } dirp = buf; - count = sys_getdents64(fd, dirp, BUF_SIZE); - while (count > 0) { - while (count > 0) { + num = sys_getdents64(fd, dirp, BUF_SIZE); + while (num > 0) { + while (num > 0) { struct stat st; int ret; @@ -554,12 +554,12 @@ static void __init clean_rootfs(void) sys_unlink(dirp->d_name); } - count -= dirp->d_reclen; + num -= dirp->d_reclen; dirp = (void *)dirp + dirp->d_reclen; } dirp = buf; memset(buf, 0, BUF_SIZE); - count = sys_getdents64(fd, dirp, BUF_SIZE); + num = sys_getdents64(fd, dirp, BUF_SIZE); } sys_close(fd); -- cgit v1.2.2 From 7463e633c5f94792dcff1afefb0d2961318a9d09 Mon Sep 17 00:00:00 2001 From: H Hartley Sweeten Date: Fri, 5 Mar 2010 13:42:46 -0800 Subject: init/main.c: make setup_max_cpus static for !SMP The only in tree external users of the symbol setup_max_cpus are in arch/x86/. The files ./kernel/alternative.c, ./kernel/visws_quirks.c, and ./mm/kmemcheck/kmemcheck.c are all guarded by CONFIG_SMP being defined. For this case the symbol is an unsigned int and declared as an extern in include/linux/smp.h. When CONFIG_SMP is not defined the symbol setup_max_cpus is a constant value that is only used in init/main.c. Make the symbol static for this case. Signed-off-by: H Hartley Sweeten Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index b09a828f38b5..a1ab78ceb4b6 100644 --- a/init/main.c +++ b/init/main.c @@ -174,7 +174,7 @@ static int __init maxcpus(char *str) early_param("maxcpus", maxcpus); #else -const unsigned int setup_max_cpus = NR_CPUS; +static const unsigned int setup_max_cpus = NR_CPUS; #endif /* -- cgit v1.2.2 From 0dea116876eefc9c7ca9c5d74fe665481e499fa3 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Wed, 10 Mar 2010 15:22:20 -0800 Subject: cgroup: implement eventfd-based generic API for notifications This patchset introduces eventfd-based API for notifications in cgroups and implements memory notifications on top of it. It uses statistics in memory controler to track memory usage. Output of time(1) on building kernel on tmpfs: Root cgroup before changes: make -j2 506.37 user 60.93s system 193% cpu 4:52.77 total Non-root cgroup before changes: make -j2 507.14 user 62.66s system 193% cpu 4:54.74 total Root cgroup after changes (0 thresholds): make -j2 507.13 user 62.20s system 193% cpu 4:53.55 total Non-root cgroup after changes (0 thresholds): make -j2 507.70 user 64.20s system 193% cpu 4:55.70 total Root cgroup after changes (1 thresholds, never crossed): make -j2 506.97 user 62.20s system 193% cpu 4:53.90 total Non-root cgroup after changes (1 thresholds, never crossed): make -j2 507.55 user 64.08s system 193% cpu 4:55.63 total This patch: Introduce the write-only file "cgroup.event_control" in every cgroup. To register new notification handler you need: - create an eventfd; - open a control file to be monitored. Callbacks register_event() and unregister_event() must be defined for the control file; - write " " to cgroup.event_control. Interpretation of args is defined by control file implementation; eventfd will be woken up by control file implementation or when the cgroup is removed. To unregister notification handler just close eventfd. If you need notification functionality for a control file you have to implement callbacks register_event() and unregister_event() in the struct cftype. [kamezawa.hiroyu@jp.fujitsu.com: Kconfig fix] Signed-off-by: Kirill A. Shutemov Reviewed-by: KAMEZAWA Hiroyuki Paul Menage Cc: Li Zefan Cc: Balbir Singh Cc: Pavel Emelyanov Cc: Dan Malek Cc: Vladislav Buzov Cc: Daisuke Nishimura Cc: Alexander Shishkin Cc: Davide Libenzi Signed-off-by: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/Kconfig | 1 + 1 file changed, 1 insertion(+) (limited to 'init') diff --git a/init/Kconfig b/init/Kconfig index 089a230e5652..eb77e8ccde1c 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -463,6 +463,7 @@ config HAVE_UNSTABLE_SCHED_CLOCK menuconfig CGROUPS boolean "Control Group support" + depends on EVENTFD help This option adds support for grouping sets of processes together, for use with process control subsystems such as Cpusets, CFS, memory -- cgit v1.2.2 From 5ab116c9349ef52d6fbd2e2917a53f13194b048e Mon Sep 17 00:00:00 2001 From: Miao Xie Date: Tue, 23 Mar 2010 13:35:34 -0700 Subject: cpuset: fix the problem that cpuset_mem_spread_node() returns an offline node cpuset_mem_spread_node() returns an offline node, and causes an oops. This patch fixes it by initializing task->mems_allowed to node_states[N_HIGH_MEMORY], and updating task->mems_allowed when doing memory hotplug. Signed-off-by: Miao Xie Acked-by: David Rientjes Reported-by: Nick Piggin Tested-by: Nick Piggin Cc: Paul Menage Cc: Li Zefan Cc: Ingo Molnar Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'init') diff --git a/init/main.c b/init/main.c index a1ab78ceb4b6..cbead27caefc 100644 --- a/init/main.c +++ b/init/main.c @@ -858,7 +858,7 @@ static int __init kernel_init(void * unused) /* * init can allocate pages on any node */ - set_mems_allowed(node_possible_map); + set_mems_allowed(node_states[N_HIGH_MEMORY]); /* * init can run on any cpu. */ -- cgit v1.2.2 From 5a0e3ad6af8660be21ca98a971cd00f331318c05 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 24 Mar 2010 17:04:11 +0900 Subject: include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo Guess-its-ok-by: Christoph Lameter Cc: Ingo Molnar Cc: Lee Schermerhorn --- init/do_mounts.c | 1 + init/do_mounts_rd.c | 1 + init/main.c | 2 +- 3 files changed, 3 insertions(+), 1 deletion(-) (limited to 'init') diff --git a/init/do_mounts.c b/init/do_mounts.c index bb008d064c1a..02e3ca4fc527 100644 --- a/init/do_mounts.c +++ b/init/do_mounts.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c index 027a402708de..bf3ef667bf36 100644 --- a/init/do_mounts_rd.c +++ b/init/do_mounts_rd.c @@ -7,6 +7,7 @@ #include #include #include +#include #include "do_mounts.h" #include "../fs/squashfs/squashfs_fs.h" diff --git a/init/main.c b/init/main.c index cbead27caefc..5c8540271529 100644 --- a/init/main.c +++ b/init/main.c @@ -25,7 +25,6 @@ #include #include #include -#include #include #include #include @@ -69,6 +68,7 @@ #include #include #include +#include #include #include -- cgit v1.2.2 From df37bd156dcb4f5441beaf5bde444adac974e9a0 Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Fri, 23 Apr 2010 13:18:11 -0400 Subject: initramfs: handle unrecognised decompressor when unpacking The unpack routine fails to handle the decompress_method() returning unrecognised decompressor (compress_name == NULL). This results in the routine looping eventually oopsing on an out of bounds memory access. Note this bug is usually hidden, only triggering on trailing junk after one or more correct compressed blocks. The case of the compressed archive being complete junk is (by accident?) caught by the if (state != Reset) check because state is initialised to Start, but not updated due to the decompressor not having been called. Obviously if the junk is trailing a correctly decompressed buffer, state == Reset from the previous call to the decompressor. Signed-off-by: Phillip Lougher Reported-by: Aaro Koskinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- init/initramfs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'init') diff --git a/init/initramfs.c b/init/initramfs.c index 37d3859b1b32..4b9c20205092 100644 --- a/init/initramfs.c +++ b/init/initramfs.c @@ -457,7 +457,8 @@ static char * __init unpack_to_rootfs(char *buf, unsigned len) compress_name); message = msg_buf; } - } + } else + error("junk in compressed archive"); if (state != Reset) error("junk in compressed archive"); this_header = saved_offset + my_inptr; -- cgit v1.2.2