aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
* [PATCH] Notifier chain update: API changesAlan Stern2006-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] lightweight robust futexes updatesIngo Molnar2006-03-27
| | | | | | | | | | | | | | | | | | - fix: initialize the robust list(s) to NULL in copy_process. - doc update - cleanup: rename _inuser to _inatomic - __user cleanups and other small cleanups Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Andi Kleen <ak@muc.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] lightweight robust futexes: compatIngo Molnar2006-03-27
| | | | | | | | | | | | 32-bit syscall compatibility support. (This patch also moves all futex related compat functionality into kernel/futex_compat.c.) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] lightweight robust futexes: coreIngo Molnar2006-03-27
| | | | | | | | | | | | | Add the core infrastructure for robust futexes: structure definitions, the new syscalls and the do_exit() based cleanup mechanism. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Arjan van de Ven <arjan@infradead.org> Acked-by: Ulrich Drepper <drepper@redhat.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: fix group power for allnodes_domainsSiddha, Suresh B2006-03-27
| | | | | | | | | | | | Current sched groups power calculation for allnodes_domains is wrong. We should really be using cumulative power of the physical packages in that group (similar to the calculation in node_domains) Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: new sched domain for representing multi-coreSiddha, Suresh B2006-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | Add a new sched domain for representing multi-core with shared caches between cores. Consider a dual package system, each package containing two cores and with last level cache shared between cores with in a package. If there are two runnable processes, with this appended patch those two processes will be scheduled on different packages. On such systems, with this patch we have observed 8% perf improvement with specJBB(2 warehouse) benchmark and 35% improvement with CFP2000 rate(with 2 users). This new domain will come into play only on multi-core systems with shared caches. On other systems, this sched domain will be removed by domain degeneration code. This new domain can be also used for implementing power savings policy (see OLS 2005 CMP kernel scheduler paper for more details.. I will post another patch for power savings policy soon) Most of the arch/* file changes are for cpu_coregroup_map() implementation. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Small schedule() optimizationAndreas Mohr2006-03-27
| | | | | | | | | small schedule() microoptimization. Signed-off-by: Andreas Mohr <andi@lisas.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: fix task interactivity calculationMartin Andersson2006-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Is a truncation error in kernel/sched.c triggered when the nice value is negative. The affected code is used in the TASK_INTERACTIVE macro. The code is: #define SCALE(v1,v1_max,v2_max) \ (v1) * (v2_max) / (v1_max) which is used in this way: SCALE(TASK_NICE(p), 40, MAX_BONUS) Comments in the code says: * This part scales the interactivity limit depending on niceness. * * We scale it linearly, offset by the INTERACTIVE_DELTA delta. * Here are a few examples of different nice levels: * * TASK_INTERACTIVE(-20): [1,1,1,1,1,1,1,1,1,0,0] * TASK_INTERACTIVE(-10): [1,1,1,1,1,1,1,0,0,0,0] * TASK_INTERACTIVE( 0): [1,1,1,1,0,0,0,0,0,0,0] * TASK_INTERACTIVE( 10): [1,1,0,0,0,0,0,0,0,0,0] * TASK_INTERACTIVE( 19): [0,0,0,0,0,0,0,0,0,0,0] * * (the X axis represents the possible -5 ... 0 ... +5 dynamic * priority range a task can explore, a value of '1' means the * task is rated interactive.) However, the current code does not scale it linearly and the result differs from the given examples. If the mathematical function "floor" is used when the nice value is negative instead of the truncation one gets when using integer division, the result conforms to the documentation. Output of TASK_INTERACTIVE when using the kernel code: nice dynamic priorities -20 1 1 1 1 1 1 1 1 1 0 0 -19 1 1 1 1 1 1 1 1 0 0 0 -18 1 1 1 1 1 1 1 1 0 0 0 -17 1 1 1 1 1 1 1 1 0 0 0 -16 1 1 1 1 1 1 1 1 0 0 0 -15 1 1 1 1 1 1 1 0 0 0 0 -14 1 1 1 1 1 1 1 0 0 0 0 -13 1 1 1 1 1 1 1 0 0 0 0 -12 1 1 1 1 1 1 1 0 0 0 0 -11 1 1 1 1 1 1 0 0 0 0 0 -10 1 1 1 1 1 1 0 0 0 0 0 -9 1 1 1 1 1 1 0 0 0 0 0 -8 1 1 1 1 1 1 0 0 0 0 0 -7 1 1 1 1 1 0 0 0 0 0 0 -6 1 1 1 1 1 0 0 0 0 0 0 -5 1 1 1 1 1 0 0 0 0 0 0 -4 1 1 1 1 1 0 0 0 0 0 0 -3 1 1 1 1 0 0 0 0 0 0 0 -2 1 1 1 1 0 0 0 0 0 0 0 -1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 2 1 1 1 1 0 0 0 0 0 0 0 3 1 1 1 1 0 0 0 0 0 0 0 4 1 1 1 0 0 0 0 0 0 0 0 5 1 1 1 0 0 0 0 0 0 0 0 6 1 1 1 0 0 0 0 0 0 0 0 7 1 1 1 0 0 0 0 0 0 0 0 8 1 1 0 0 0 0 0 0 0 0 0 9 1 1 0 0 0 0 0 0 0 0 0 10 1 1 0 0 0 0 0 0 0 0 0 11 1 1 0 0 0 0 0 0 0 0 0 12 1 0 0 0 0 0 0 0 0 0 0 13 1 0 0 0 0 0 0 0 0 0 0 14 1 0 0 0 0 0 0 0 0 0 0 15 1 0 0 0 0 0 0 0 0 0 0 16 0 0 0 0 0 0 0 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 18 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 Output of TASK_INTERACTIVE when using "floor" nice dynamic priorities -20 1 1 1 1 1 1 1 1 1 0 0 -19 1 1 1 1 1 1 1 1 1 0 0 -18 1 1 1 1 1 1 1 1 1 0 0 -17 1 1 1 1 1 1 1 1 1 0 0 -16 1 1 1 1 1 1 1 1 0 0 0 -15 1 1 1 1 1 1 1 1 0 0 0 -14 1 1 1 1 1 1 1 1 0 0 0 -13 1 1 1 1 1 1 1 1 0 0 0 -12 1 1 1 1 1 1 1 0 0 0 0 -11 1 1 1 1 1 1 1 0 0 0 0 -10 1 1 1 1 1 1 1 0 0 0 0 -9 1 1 1 1 1 1 1 0 0 0 0 -8 1 1 1 1 1 1 0 0 0 0 0 -7 1 1 1 1 1 1 0 0 0 0 0 -6 1 1 1 1 1 1 0 0 0 0 0 -5 1 1 1 1 1 1 0 0 0 0 0 -4 1 1 1 1 1 0 0 0 0 0 0 -3 1 1 1 1 1 0 0 0 0 0 0 -2 1 1 1 1 1 0 0 0 0 0 0 -1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 2 1 1 1 1 0 0 0 0 0 0 0 3 1 1 1 1 0 0 0 0 0 0 0 4 1 1 1 0 0 0 0 0 0 0 0 5 1 1 1 0 0 0 0 0 0 0 0 6 1 1 1 0 0 0 0 0 0 0 0 7 1 1 1 0 0 0 0 0 0 0 0 8 1 1 0 0 0 0 0 0 0 0 0 9 1 1 0 0 0 0 0 0 0 0 0 10 1 1 0 0 0 0 0 0 0 0 0 11 1 1 0 0 0 0 0 0 0 0 0 12 1 0 0 0 0 0 0 0 0 0 0 13 1 0 0 0 0 0 0 0 0 0 0 14 1 0 0 0 0 0 0 0 0 0 0 15 1 0 0 0 0 0 0 0 0 0 0 16 0 0 0 0 0 0 0 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 18 0 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 0 0 0 0 0 0 Signed-off-by: Martin Andersson <martin.andersson@control.lth.se> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: Con Kolivas <kernel@kolivas.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds2006-03-26
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: drivers/char/ftape/lowlevel/fdc-io.c: Correct a comment Kconfig help: MTD_JEDECPROBE already supports Intel Remove ugly debugging stuff do_mounts.c: Minor ROOT_DEV comment cleanup BUG_ON() Conversion in drivers/s390/block/dasd_devmap.c BUG_ON() Conversion in mm/mempool.c BUG_ON() Conversion in mm/memory.c BUG_ON() Conversion in kernel/fork.c BUG_ON() Conversion in ipc/sem.c BUG_ON() Conversion in fs/ext2/ BUG_ON() Conversion in fs/hfs/ BUG_ON() Conversion in fs/dcache.c BUG_ON() Conversion in fs/buffer.c BUG_ON() Conversion in input/serio/hp_sdc_mlc.c BUG_ON() Conversion in md/dm-table.c BUG_ON() Conversion in md/dm-path-selector.c BUG_ON() Conversion in drivers/isdn BUG_ON() Conversion in drivers/char BUG_ON() Conversion in drivers/mtd/
| * BUG_ON() Conversion in kernel/fork.cEric Sesterhenn2006-03-26
| | | | | | | | | | | | | | | | this changes if() BUG(); constructs to BUG_ON() which is cleaner, contains unlikely() and can better optimized away. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* | [PATCH] kretprobe instance recycled by parent processbibo mao2006-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kretprobe probes the schedule() function, if the probed process exits then schedule() will never return, so some kretprobe instances will never be recycled. In this patch the parent process will recycle retprobe instances of the probed function and there will be no memory leak of kretprobe instances. Signed-off-by: bibo mao <bibo.mao@intel.com> Cc: Masami Hiramatsu <hiramatu@sdl.hitachi.co.jp> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] hrtimers: remove data fieldRoman Zippel2006-03-26
| | | | | | | | | | | | | | | | | | | | | | | | The nanosleep cleanup allows to remove the data field of hrtimer. The callback function can use container_of() to get it's own data. Since the hrtimer structure is anyway embedded in other structures, this adds no overhead. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] hrtimers: remove nsec_t typedefRoman Zippel2006-03-26
| | | | | | | | | | | | | | | | | | | | | | | | nsec_t predates ktime_t and has mostly been superseded by it. In the few places that are left it's better to make it explicit that we're dealing with 64 bit values here. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] hrtimers: remove state fieldRoman Zippel2006-03-26
| | | | | | | | | | | | | | | | | | | | | | Remove the state field and encode this information in the rb_node similiar to normal timer. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] hrtimers: simplify nanosleepRoman Zippel2006-03-26
| | | | | | | | | | | | | | | | | | | | | | | | nanosleep is the only user of the expired state, so let it manage this itself, which makes the hrtimer code a bit simpler. The remaining time is also only calculated if requested. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Ingo Molnar <mingo@elte.hu> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] hrtimers: posix-timer: cleanup common_timer_get()Roman Zippel2006-03-26
| | | | | | | | | | | | | | | | | | Cleanup common_timer_get() a little. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] hrtimers: pass current time to hrtimer_forward()Roman Zippel2006-03-26
| | | | | | | | | | | | | | | | | | | | | | Pass current time to hrtimer_forward(). This allows to use the softirq time in the timer base when the forward function is called from the timer callback. Other places pass current time with a call to timer->base->get_time(). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] hrtimers: optimize softirq runqueuesThomas Gleixner2006-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The hrtimer softirq is called from the timer softirq every tick. Retrieve the current time from xtime and wall_to_monotonic instead of calling base->get_time() for each timer base. Store the time in the base structure and provide a hook once clock source abstractions are in place and to keep the code open for new base clocks. Based on a patch from: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] consolidate sys32/compat_adjtimexStephen Rothwell2006-03-26
| | | | | | | | | | | | | | | | | | | | Create compat_sys_adjtimex and use it an all appropriate places. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Arnd Bergmann <arnd@arndb.de> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swswsup: return correct load_image errorCon Kolivas2006-03-26
| | | | | | | | | | | | | | | | | | | | | | If there's an error in load_image() we should return that without checking snapshot_image_loaded. Signed-off-by: Con Kolivas <kernel@kolivas.org> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] warn if free_irq() is called from IRQ contextIngo Molnar2006-03-26
|/ | | | | | | | | Warn if free_irq() is called in IRQ context - free_irq() can execute /proc VFS work, which must not be done in IRQ context. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge branch 'audit.b3' of ↵Linus Torvalds2006-03-25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: (22 commits) [PATCH] fix audit_init failure path [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end and audit_format [PATCH] sem2mutex: audit_netlink_sem [PATCH] simplify audit_free() locking [PATCH] Fix audit operators [PATCH] promiscuous mode [PATCH] Add tty to syscall audit records [PATCH] add/remove rule update [PATCH] audit string fields interface + consumer [PATCH] SE Linux audit events [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.c [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALL [PATCH] Fix IA64 success/failure indication in syscall auditing. [PATCH] Miscellaneous bug and warning fixes [PATCH] Capture selinux subject/object context information. [PATCH] Exclude messages by message type [PATCH] Collect more inode information during syscall processing. [PATCH] Pass dentry, not just name, in fsnotify creation hooks. [PATCH] Define new range of userspace messages. [PATCH] Filter rule comparators ... Fixed trivial conflict in security/selinux/hooks.c
| * [PATCH] fix audit_init failure pathAmy Griffis2006-03-20
| | | | | | | | | | | | | | | | | | | | Make audit_init() failure path handle situations where the audit_panic() action is not AUDIT_FAIL_PANIC (default is AUDIT_FAIL_PRINTK). Other uses of audit_sock are not reached unless audit's netlink message handler is properly registered. Bug noticed by Peter Staubach. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] EXPORT_SYMBOL patch for audit_log, audit_log_start, audit_log_end ↵lorenzo@gnu.org2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and audit_format Hi, This is a trivial patch that enables the possibility of using some auditing functions within loadable kernel modules (ie. inside a Linux Security Module). _ Make the audit_log_start, audit_log_end, audit_format and audit_log interfaces available to Loadable Kernel Modules, thus making possible the usage of the audit framework inside LSMs, etc. Signed-off-by: <Lorenzo Hernández García-Hierro <lorenzo@gnu.org>> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] sem2mutex: audit_netlink_semIngo Molnar2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] simplify audit_free() lockingIngo Molnar2006-03-20
| | | | | | | | | | | | | | | | | | | | | | Simplify audit_free()'s locking: no need to lock a task that we are tearing down. [the extra locking also caused false positives in the lock validator] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] Fix audit operatorsDustin Kirkland2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Darrel Goeddel initiated a discussion on IRC regarding the possibility of audit_comparator() returning -EINVAL signaling an invalid operator. It is possible when creating the rule to assure that the operator is one of the 6 sane values. Here's a snip from include/linux/audit.h Note that 0 (nonsense) and 7 (all operators) are not valid values for an operator. ... /* These are the supported operators. * 4 2 1 * = > < * ------- * 0 0 0 0 nonsense * 0 0 1 1 < * 0 1 0 2 > * 0 1 1 3 != * 1 0 0 4 = * 1 0 1 5 <= * 1 1 0 6 >= * 1 1 1 7 all operators */ ... Furthermore, prior to adding these extended operators, flagging the AUDIT_NEGATE bit implied !=, and otherwise == was assumed. The following code forces the operator to be != if the AUDIT_NEGATE bit was flipped on. And if no operator was specified, == is assumed. The only invalid condition is if the AUDIT_NEGATE bit is off and all of the AUDIT_EQUAL, AUDIT_LESS_THAN, and AUDIT_GREATER_THAN bits are on--clearly a nonsensical operator. Now that this is handled at rule insertion time, the default -EINVAL return of audit_comparator() is eliminated such that the function can only return 1 or 0. If this is acceptable, let's get this applied to the current tree. :-Dustin -- Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from 9bf0a8e137040f87d1b563336d4194e38fb2ba1a commit)
| * [PATCH] Add tty to syscall audit recordsSteve Grubb2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hi, >From the RBAC specs: FAU_SAR.1.1 The TSF shall provide the set of authorized RBAC administrators with the capability to read the following audit information from the audit records: <snip> (e) The User Session Identifier or Terminal Type A patch adding the tty for all syscalls is included in this email. Please apply. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] add/remove rule updateSteve Grubb2006-03-20
| | | | | | | | | | | | | | | | | | | | Hi, The following patch adds a little more information to the add/remove rule message emitted by the kernel. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] audit string fields interface + consumerAmy Griffis2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updated patch to dynamically allocate audit rule fields in kernel's internal representation. Added unlikely() calls for testing memory allocation result. Amy Griffis wrote: [Wed Jan 11 2006, 02:02:31PM EST] > Modify audit's kernel-userspace interface to allow the specification > of string fields in audit rules. > > Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> (cherry picked from 5ffc4a863f92351b720fe3e9c5cd647accff9e03 commit)
| * [PATCH] Minor cosmetic cleanups to the code moved into auditfilter.cDavid Woodhouse2006-03-20
| | | | | | | | Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| * [PATCH] Fix audit record filtering with !CONFIG_AUDITSYSCALLDavid Woodhouse2006-03-20
| | | | | | | | | | | | | | | | | | | | This fixes the per-user and per-message-type filtering when syscall auditing isn't enabled. [AV: folded followup fix from the same author] Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] Miscellaneous bug and warning fixesDustin Kirkland2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a couple of bugs revealed in new features recently added to -mm1: * fixes warnings due to inconsistent use of const struct inode *inode * fixes bug that prevent a kernel from booting with audit on, and SELinux off due to a missing function in security/dummy.c * fixes a bug that throws spurious audit_panic() messages due to a missing return just before an error_path label * some reasonable house cleaning in audit_ipc_context(), audit_inode_context(), and audit_log_task_context() Signed-off-by: Dustin Kirkland <dustin.kirkland@us.ibm.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| * [PATCH] Capture selinux subject/object context information.Dustin Kirkland2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch extends existing audit records with subject/object context information. Audit records associated with filesystem inodes, ipc, and tasks now contain SELinux label information in the field "subj" if the item is performing the action, or in "obj" if the item is the receiver of an action. These labels are collected via hooks in SELinux and appended to the appropriate record in the audit code. This additional information is required for Common Criteria Labeled Security Protection Profile (LSPP). [AV: fixed kmalloc flags use] [folded leak fixes] [folded cleanup from akpm (kfree(NULL)] [folded audit_inode_context() leak fix] [folded akpm's fix for audit_ipc_perm() definition in case of !CONFIG_AUDIT] Signed-off-by: Dustin Kirkland <dustin.kirkland@us.ibm.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] Exclude messages by message typeDustin Kirkland2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Add a new, 5th filter called "exclude". - And add a new field AUDIT_MSGTYPE. - Define a new function audit_filter_exclude() that takes a message type as input and examines all rules in the filter. It returns '1' if the message is to be excluded, and '0' otherwise. - Call the audit_filter_exclude() function near the top of audit_log_start() just after asserting audit_initialized. If the message type is not to be audited, return NULL very early, before doing a lot of work. [combined with followup fix for bug in original patch, Nov 4, same author] [combined with later renaming AUDIT_FILTER_EXCLUDE->AUDIT_FILTER_TYPE and audit_filter_exclude() -> audit_filter_type()] Signed-off-by: Dustin Kirkland <dustin.kirkland@us.ibm.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] Collect more inode information during syscall processing.Amy Griffis2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch augments the collection of inode info during syscall processing. It represents part of the functionality that was provided by the auditfs patch included in RHEL4. Specifically, it: - Collects information for target inodes created or removed during syscalls. Previous code only collects information for the target inode's parent. - Adds the audit_inode() hook to syscalls that operate on a file descriptor (e.g. fchown), enabling audit to do inode filtering for these calls. - Modifies filtering code to check audit context for either an inode # or a parent inode # matching a given rule. - Modifies logging to provide inode # for both parent and child. - Protect debug info from NULL audit_names.name. [AV: folded a later typo fix from the same author] Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] Pass dentry, not just name, in fsnotify creation hooks.Amy Griffis2006-03-20
| | | | | | | | | | | | | | | | The audit hooks (to be added shortly) will want to see dentry->d_inode too, not just the name. Signed-off-by: Amy Griffis <amy.griffis@hp.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| * [PATCH] Define new range of userspace messages.Steve Grubb2006-03-20
| | | | | | | | | | | | | | | | The attached patch updates various items for the new user space messages. Please apply. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| * [PATCH] Filter rule comparatorsDustin Kirkland2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, audit only supports the "=" and "!=" operators in the -F filter rules. This patch reworks the support for "=" and "!=", and adds support for ">", ">=", "<", and "<=". This turned out to be a pretty clean, and simply process. I ended up using the high order bits of the "field", as suggested by Steve and Amy. This allowed for no changes whatsoever to the netlink communications. See the documentation within the patch in the include/linux/audit.h area, where there is a table that explains the reasoning of the bitmask assignments clearly. The patch adds a new function, audit_comparator(left, op, right). This function will perform the specified comparison (op, which defaults to "==" for backward compatibility) between two values (left and right). If the negate bit is on, it will negate whatever that result was. This value is returned. Signed-off-by: Dustin Kirkland <dustin.kirkland@us.ibm.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| * [PATCH] AUDIT: kerneldoc for kernel/audit*.cRandy Dunlap2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - add kerneldoc for non-static functions; - don't init static data to 0; - limit lines to < 80 columns; - fix long-format style; - delete whitespace at end of some lines; (chrisw: resend and update to current audit-2.6 tree) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Chris Wright <chrisw@osdl.org> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| * [PATCH] make vm86 call audit_syscall_exitJason Baron2006-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hi, The motivation behind the patch below was to address messages in /var/log/messages such as: Jan 31 10:54:15 mets kernel: audit(:0): major=252 name_count=0: freeing multiple contexts (1) Jan 31 10:54:15 mets kernel: audit(:0): major=113 name_count=0: freeing multiple contexts (2) I can reproduce by running 'get-edid' from: http://john.fremlin.de/programs/linux/read-edid/. These messages come about in the log b/c the vm86 calls do not exit via the normal system call exit paths and thus do not call 'audit_syscall_exit'. The next system call will then free the context for itself and for the vm86 context, thus generating the above messages. This patch addresses the issue by simply adding a call to 'audit_syscall_exit' from the vm86 code. Besides fixing the above error messages the patch also now allows vm86 system calls to become auditable. This is useful since strace does not appear to properly record the return values from sys_vm86. I think this patch is also a step in the right direction in terms of cleaning up some core auditing code. If we can correct any other paths that do not properly call the audit exit and entries points, then we can also eliminate the notion of context chaining. I've tested this patch by verifying that the log messages no longer appear, and that the audit records for sys_vm86 appear to be correct. Also, 'read_edid' produces itentical output. thanks, -Jason Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds2006-03-25
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (21 commits) BUG_ON() Conversion in drivers/video/ BUG_ON() Conversion in drivers/parisc/ BUG_ON() Conversion in drivers/block/ BUG_ON() Conversion in sound/sparc/cs4231.c BUG_ON() Conversion in drivers/s390/block/dasd.c BUG_ON() Conversion in lib/swiotlb.c BUG_ON() Conversion in kernel/cpu.c BUG_ON() Conversion in ipc/msg.c BUG_ON() Conversion in block/elevator.c BUG_ON() Conversion in fs/coda/ BUG_ON() Conversion in fs/binfmt_elf_fdpic.c BUG_ON() Conversion in input/serio/hil_mlc.c BUG_ON() Conversion in md/dm-hw-handler.c BUG_ON() Conversion in md/bitmap.c The comment describing how MS_ASYNC works in msync.c is confusing rcu: undeclared variable used in documentation fix typos "wich" -> "which" typo patch for fs/ufs/super.c Fix simple typos tabify drivers/char/Makefile ...
| * | BUG_ON() Conversion in kernel/cpu.cEric Sesterhenn2006-03-24
| | | | | | | | | | | | | | | | | | | | | | | | this changes if() BUG(); constructs to BUG_ON() which is cleaner, contains unlikely() and can better optimized away. Signed-off-by: Eric Sesterhenn <snakebyte@gmx.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
* | | [PATCH] remove pps supportRoman Zippel2006-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes the support for pps. It's completely unused within the kernel and is basically in the way for further cleanups. It should be easier to readd proper support for it after the rest has been converted to NTP4 (where the pps mechanisms are quite different from NTP3 anyway). Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Adrian Bunk <bunk@stusta.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] Add SA_PERCPU_IRQ flag supportDimitri Sivanich2006-03-25
| | | | | | | | | | | | | | | | | | | | | | | | Add support for SA_PERCPU_IRQ (only mmtimer.c uses this at this stage). Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] Use unsigned int types for a faster bsearchEric Dumazet2006-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch avoids arithmetic on 'signed' types that are slower than 'unsigned'. This saves space and cpu cycles. size of kernel/sys.o before the patch (gcc-3.4.5) text data bss dec hex filename 10924 252 4 11180 2bac kernel/sys.o size of kernel/sys.o after the patch text data bss dec hex filename 10903 252 4 11159 2b97 kernel/sys.o I noticed that gcc-4.1.0 (from Fedora Core 5) even uses idiv instruction for (a+b)/2 if a and b are signed. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] Check if cpu can be onlined before calling smp_prepare_cpu()Ashok Raj2006-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Moved check for online cpu out of smp_prepare_cpu() - Moved default declaration of smp_prepare_cpu() to kernel/cpu.c - Removed lock_cpu_hotplug() from smp_prepare_cpu() to around it, since its called from cpu_up() as well now. - Removed clearing from cpu_present_map during cpu_offline as it breaks using cpu_up() directly during a subsequent online operation. Signed-off-by: Ashok Raj <ashok.raj@intel.com> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: "Li, Shaohua" <shaohua.li@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] No need to protect current->group_info in sys_getgroups(), ↵Eric Dumazet2006-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | in_group_p() and in_egroup_p() While doing some benchmarks of an Apache/PHP SMP server, I noticed high oprofile numbers in in_group_p() and _atomic_dec_and_lock(). rank percent 1 4.8911 % __link_path_walk 2 4.8503 % __d_lookup *3 4.2911 % _atomic_dec_and_lock 4 3.9307 % __copy_to_user_ll 5 4.9004 % sysenter_past_esp *6 3.3248 % in_group_p It appears that in_group_p() does an uncessary get_group_info(current->group_info); /* atomic_inc() */ ... /* access current->group_info */ put_group_info(current->group_info); /* _atomic_dec_and_lock */ It is not necessary to do this, because the current task holds a reference on its own group_info, and this reference cannot change during the lookup. This patch deletes the get_group_info()/put_group_info() pair from sys_getgroups(), in_group_p() and in_egroup_p() functions. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: Tim Hockin <thockin@hockin.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] find_task_by_pid() needs tasklist_lockAndrew Morton2006-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A couple of places are forgetting to take it. The kswapd case is probably unimportant. keventd_create_kthread() was racy. The whole thing is a bit flakey: you start a kernel thread, get its pid from kernel_thread() then look up its task_struct. a) It assumes that pid recycling takes a "long" time. b) We get a task_struct but no reference was taken on it. The owner of the kswapd and kthread task_struct*'s must assume that the new thread won't exit unexpectedly. Because if it does, they're left holding dead memory and any attempt to control or stop that task will crash. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] refactor capable() to one implementation, add __capable() helperChris Wright2006-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move capable() to kernel/capability.c and eliminate duplicate implementations. Add __capable() function which can be used to check for capabiilty of any process. Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>