litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	infrastructure to debug (dynamic) objects	Thomas Gleixner	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can see an ever repeating problem pattern with objects of any kind in the kernel: 1) freeing of active objects 2) reinitialization of active objects Both problems can be hard to debug because the crash happens at a point where we have no chance to decode the root cause anymore. One problem spot are kernel timers, where the detection of the problem often happens in interrupt context and usually causes the machine to panic. While working on a timer related bug report I had to hack specialized code into the timer subsystem to get a reasonable hint for the root cause. This debug hack was fine for temporary use, but far from a mergeable solution due to the intrusiveness into the timer code. The code further lacked the ability to detect and report the root cause instantly and keep the system operational. Keeping the system operational is important to get hold of the debug information without special debugging aids like serial consoles and special knowledge of the bug reporter. The problems described above are not restricted to timers, but timers tend to expose it usually in a full system crash. Other objects are less explosive, but the symptoms caused by such mistakes can be even harder to debug. Instead of creating specialized debugging code for the timer subsystem a generic infrastructure is created which allows developers to verify their code and provides an easy to enable debug facility for users in case of trouble. The debugobjects core code keeps track of operations on static and dynamic objects by inserting them into a hashed list and sanity checking them on object operations and provides additional checks whenever kernel memory is freed. The tracked object operations are: - initializing an object - adding an object to a subsystem list - deleting an object from a subsystem list Each operation is sanity checked before the operation is executed and the subsystem specific code can provide a fixup function which allows to prevent the damage of the operation. When the sanity check triggers a warning message and a stack trace is printed. The list of operations can be extended if the need arises. For now it's limited to the requirements of the first user (timers). The core code enqueues the objects into hash buckets. The hash index is generated from the address of the object to simplify the lookup for the check on kfree/vfree. Each bucket has it's own spinlock to avoid contention on a global lock. The debug code can be compiled in without being active. The runtime overhead is minimal and could be optimized by asm alternatives. A kernel command line option enables the debugging code. Thanks to Ingo Molnar for review, suggestions and cleanup patches. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Greg KH <greg@kroah.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	slab: add a flag to prevent debug_free checks on a kmem_cache	Thomas Gleixner	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is a preperatory patch for the debugobjects infrastructure. The flag prevents debug_free checks on kmem_caches. This is necessary to avoid resursive calls into a debug mechanism which uses a kmem_cache itself. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Add macros similar to min/max/min_t/max_t	Harvey Harrison	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Also, change the variable names used in the min/max macros to avoid shadowed variable warnings when min/max min_t/max_t are nested. Small formatting changes to make all the macros have a similar form. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix v4l build] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Cc: Tejun Heo <htejun@gmail.com> Cc: Michael Buesch <mb@bu3sch.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Basic braille screen reader support	Samuel Thibault	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a minimalistic braille screen reader support. This is meant to be used by blind people e.g. on boot failures or when / cannot be mounted etc and thus the userland screen readers can not work. [akpm@linux-foundation.org: fix exports] Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Cc: Jiri Kosina <jikos@jikos.cz> Cc: Dmitry Torokhov <dtor@mail.ru> Acked-by: Alan Cox <alan@redhat.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	reiserfs: use open_bdev_excl	Christoph Hellwig	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use the proper helper to open a blockdevice by name for filesystem use, this makes sure it's properly claimed (also added for open-by-number) and gets rid of the struct file abuse. Tested by mounting a reiserfs filesystem with external journal. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Jeff Mahoney <jeffm@suse.com> Acked-by: Edward Shishkin <edward.shishkin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: Add NR_WRITEBACK_TEMP counter	Miklos Szeredi	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fuse will use temporary buffers to write back dirty data from memory mappings (normal writes are done synchronously). This is needed, because there cannot be any guarantee about the time in which a write will complete. By using temporary buffers, from the MM's point if view the page is written back immediately. If the writeout was due to memory pressure, this effectively migrates data from a full zone to a less full zone. This patch adds a new counter (NR_WRITEBACK_TEMP) for the number of pages used as temporary buffers. [Lee.Schermerhorn@hp.com: add vmstat_text for NR_WRITEBACK_TEMP] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: bdi: export bdi_writeout_inc()	Miklos Szeredi	2008-04-30
\| \| \| \| \| \| \| \| \|	Fuse needs this for writable mmap support. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: bdi: add separate writeback accounting capability	Miklos Szeredi	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new BDI capability flag: BDI_CAP_NO_ACCT_WB. If this flag is set, then don't update the per-bdi writeback stats from test_set_page_writeback() and test_clear_page_writeback(). Misc cleanups: - convert bdi_cap_writeback_dirty() and friends to static inline functions - create a flag that includes all three dirty/writeback related flags, since almst all users will want to have them toghether Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: bdi: move statistics to debugfs	Miklos Szeredi	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move BDI statistics to debugfs: /sys/kernel/debug/bdi/<bdi>/stats Use postcore_initcall() to initialize the sysfs class and debugfs, because debugfs is initialized in core_initcall(). Update descriptions in ABI documentation. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: bdi: allow setting a maximum for the bdi dirty limit	Peter Zijlstra	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add "max_ratio" to /sys/class/bdi. This indicates the maximum percentage of the global dirty threshold allocated to this bdi. [mszeredi@suse.cz] - fix parsing in max_ratio_store(). - export bdi_set_max_ratio() to modules - limit bdi_dirty with bdi->max_ratio - document new sysfs attribute Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: bdi: allow setting a minimum for the bdi dirty limit	Peter Zijlstra	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Under normal circumstances each device is given a part of the total write-back cache that relates to its current avg writeout speed in relation to the other devices. min_ratio - allows one to assign a minimum portion of the write-back cache to a particular device. This is useful in situations where you might want to provide a minimum QoS. (One request for this feature came from flash based storage people who wanted to avoid writing out at all costs - they of course needed some pdflush hacks as well) max_ratio - allows one to assign a maximum portion of the dirty limit to a particular device. This is useful in situations where you want to avoid one device taking all or most of the write-back cache. Eg. an NFS mount that is prone to get stuck, or a FUSE mount which you don't trust to play fair. Add "min_ratio" to /sys/class/bdi. This indicates the minimum percentage of the global dirty threshold allocated to this bdi. [mszeredi@suse.cz] - fix parsing in min_ratio_store() - document new sysfs attribute Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: bdi: export BDI attributes in sysfs	Peter Zijlstra	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object. This allows us to see and set the various BDI specific variables. In particular this properly exposes the read-ahead window for all relevant users and /sys/block/<block>/queue/read_ahead_kb should be deprecated. With patient help from Kay Sievers and Greg KH [mszeredi@suse.cz] - split off NFS and FUSE changes into separate patches - document new sysfs attributes under Documentation/ABI - do bdi_class_init as a core_initcall, otherwise the "default" BDI won't be initialized - remove bdi_init_fmt macro, it's not used very much [akpm@linux-foundation.org: fix ia64 warning] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Kay Sievers <kay.sievers@vrfy.org> Acked-by: Greg KH <greg@kroah.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	pidns: make pid->level and pid_ns->level unsigned	Pavel Emelyanov	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	These values represent the nesting level of a namespace and pids living in it, and it's always non-negative. Turning this from int to unsigned int saves some space in pid.c (11 bytes on x86 and 64 on ia64) by letting the compiler optimize the pid_nr_ns a bit. E.g. on ia64 this removes the sign extension calls, which compiler adds to optimize access to pid->nubers[ns->level]. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	pids: introduce change_pid() helper	Oleg Nesterov	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on Eric W. Biederman's idea. Without tasklist_lock held task_session()/task_pgrp() can return NULL if the caller races with setprgp()/setsid() which does detach_pid() + attach_pid(). This can happen even if task == current. Intoduce the new helper, change_pid(), which should be used instead. This way the caller always sees the special pid != NULL, either old or new. Also change the prototype of attach_pid(), it always returns 0 and nobody check the returned value. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@openvz.org> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Deprecate find_task_by_pid()	Pavel Emelyanov	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are some places that are known to operate on tasks' global pids only: * the rest_init() call (called on boot) * the kgdb's getthread * the create_kthread() (since the kthread is run in init ns) So use the find_task_by_pid_ns(..., &init_pid_ns) there and schedule the find_task_by_pid for removal. [sukadev@us.ibm.com: Fix warning in kernel/pid.c] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	devpts: factor out PTY index allocation	Sukadev Bhattiprolu	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Factor out the code used to allocate/free a pts index into new interfaces, devpts_new_index() and devpts_kill_index(). This localizes the external data structures used in managing the pts indices. [akpm@linux-foundation.org: undo accidental mutex2sem conversion] Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	tty: add throttle/unthrottle helpers	Alan Cox	2008-04-30
\| \| \| \| \| \| \| \| \|	Something Arjan suggested which allows us to clean up the code nicely Signed-off-by: Alan Cox <alan@redhat.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	tty: The big operations rework	Alan Cox	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Operations are now a shared const function block as with most other Linux objects - Introduce wrappers for some optional functions to get consistent behaviour - Wrap put_char which used to be patched by the tty layer - Document which functions are needed/optional - Make put_char report success/fail - Cache the driver->ops pointer in the tty as tty->ops - Remove various surplus lock calls we no longer need - Remove proc_write method as noted by Alexey Dobriyan - Introduce some missing sanity checks where certain driver/ldisc combinations would oops as they didn't check needed methods were present [akpm@linux-foundation.org: fix fs/compat_ioctl.c build] [akpm@linux-foundation.org: fix isicom] [akpm@linux-foundation.org: fix arch/ia64/hp/sim/simserial.c build] [akpm@linux-foundation.org: fix kgdb] Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Cc: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	char: switch gs, cyclades and esp to return int for put_char	Alan Cox	2008-04-30
\| \| \| \| \| \| \|	Signed-off-by: Alan Cox <alan@redhat.com> Cc: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	tty_io: fix remaining pid struct locking	Alan Cox	2008-04-30
\| \| \| \| \| \| \| \| \| \|	This fixes the last couple of pid struct locking failures I know about. [oleg@tv-sign.ru: clean up do_task_stat()] Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	redo locking of tty->pgrp	Alan Cox	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \|	Historically tty->pgrp and friends were pid_t and the code "knew" they were safe. The change to pid structs opened up a few races and the removal of the BKL in places made them quite hittable. We put tty->pgrp under the ctrl_lock for the tty. Signed-off-by: Alan Cox <alan@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	tty: BKL pushdown	Alan Cox	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Push the BKL down into the line disciplines - Switch the tty layer to unlocked_ioctl - Introduce a new ctrl_lock spin lock for the control bits - Eliminate much of the lock_kernel use in n_tty - Prepare to (but don't yet) call the drivers with the lock dropped on the paths that historically held the lock BKL now primarily protects open/close/ldisc change in the tty layer [jirislaby@gmail.com: a couple of fixes] Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ptrace: introduce ptrace_reparented() helper	Oleg Nesterov	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \|	Add another trivial helper for the sake of grep. It also auto-documents the fact that ->parent != real_parent implies ->ptrace. No functional changes. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signals: use HAVE_SET_RESTORE_SIGMASK	Roland McGrath	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change all the #ifdef TIF_RESTORE_SIGMASK conditionals in non-arch code to #ifdef HAVE_SET_RESTORE_SIGMASK. If arch code defines it first, the generic set_restore_sigmask() using TIF_RESTORE_SIGMASK is not defined. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signals: set_restore_sigmask TIF_SIGPENDING	Roland McGrath	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Set TIF_SIGPENDING in set_restore_sigmask. This lets arch code take TIF_RESTORE_SIGMASK out of the set of bits that will be noticed on return to user mode. On some machines those bits are scarce, and we can free this unneeded one up for other uses. It is probably the case that TIF_SIGPENDING is always set anyway everywhere set_restore_sigmask() is used. But this is some cheap paranoia in case there is an arcane case where it might not be. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signals: add set_restore_sigmask	Roland McGrath	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds the set_restore_sigmask() inline in <linux/thread_info.h> and replaces every set_thread_flag(TIF_RESTORE_SIGMASK) with a call to it. No change, but abstracts the details of the flag protocol from all the calls. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Luck, Tony" <tony.luck@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signals: fix /sbin/init protection from unwanted signals	Oleg Nesterov	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The global init has a lot of long standing problems with the unhandled fatal signals. - The "is_global_init(current)" check in get_signal_to_deliver() protects only the main thread. Sub-thread can dequee the fatal signal and shutdown the whole thread group except the main thread. If it dequeues SIGSTOP /sbin/init will be stopped, this is not right too. Note that we can't use is_global_init(->group_leader), this breaks exec and this can't solve other problems we have. - Even if afterwards ignored, the fatal signals sets SIGNAL_GROUP_EXIT on delivery. This breaks exec, has other bad implications, and this is just wrong. Introduce the new SIGNAL_UNKILLABLE flag to fix these problems. It also helps to solve some other problems addressed by the subsequent patches. Currently we use this flag for the global init only, but it could also be used by kthreads and (perhaps) by the sub-namespace inits. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signals: join send_sigqueue() with send_group_sigqueue()	Oleg Nesterov	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	We export send_sigqueue() and send_group_sigqueue() for the only user, posix_timer_event(). This is a bit silly, because both are just trivial helpers on top of do_send_sigqueue() and because the we pass the unused .si_signo parameter. Kill them both, rename do_send_sigqueue() to send_sigqueue(), and export it. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kill_pid_info: don't take now unneeded tasklist_lock	Oleg Nesterov	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \|	Previously handle_stop_signal(SIGCONT) could drop ->siglock. That is why kill_pid_info(SIGCONT) takes tasklist_lock to make sure the target task can't go away after unlock. Not needed now. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	signals: re-assign CLD_CONTINUED notification from the sender to reciever	Oleg Nesterov	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on discussion with Jiri and Roland. In short: currently handle_stop_signal(SIGCONT, p) sends the notification to p->parent, with this patch p itself notifies its parent when it becomes running. handle_stop_signal(SIGCONT) has to drop ->siglock temporary in order to notify the parent with do_notify_parent_cldstop(). This leads to multiple problems: - as Jiri Kosina pointed out, the stopped task can resume without actually seeing SIGCONT which may have a handler. - we race with another sig_kernel_stop() signal which may come in that window. - we race with sig_fatal() signals which may set SIGNAL_GROUP_EXIT in that window. - we can't avoid taking tasklist_lock() while sending SIGCONT. With this patch handle_stop_signal() just sets the new SIGNAL_CLD_CONTINUED flag in p->signal->flags and returns. The notification is sent by the first task which returns from finish_stop() (there should be at least one) or any other signalled thread from get_signal_to_deliver(). This is a user-visible change. Say, currently kill(SIGCONT, stopped_child) can't return without seeing SIGCHLD, with this patch SIGCHLD can be delayed unpredictably. Another difference is that if the child is ptraced by another process, CLD_CONTINUED may be delivered to ->real_parent after ptrace_detach() while currently it always goes to the tracer which doesn't actually need this notification. Hopefully not a problem. The patch asks for the futher obvious cleanups, I'll send them separately. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Cc: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	md: support blocking writes to an array on device failure	Dan Williams	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allows a userspace metadata handler to take action upon detecting a device failure. Based on an original patch by Neil Brown. Changes: -added blocked_wait waitqueue to rdev -don't qualify Blocked with Faulty always let userspace block writes -added md_wait_for_blocked_rdev to wait for the block device to be clear, if userspace misses the notification another one is sent every 5 seconds -set MD_RECOVERY_NEEDED after clearing "blocked" -kill DoBlock flag, just test mddev->external Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Blackfin serial driver: this driver enable SPORTs on Blackfin emulate UART	Bryan Wu	2008-04-30
\| \| \| \| \| \| \|	Signed-off-by: Bryan Wu <bryan.wu@analog.com> Cc: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ext4: move headers out of include/linux	Christoph Hellwig	2008-04-29
\| \| \| \| \| \| \| \| \| \|	Move ext4 headers out of include/linux. This is just the trivial move, there's some more thing that could be done later. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	ext4: remove duplicate include of ext4_fs_i.h header file	Joe Perches	2008-04-29
\| \| \| \| \| \| \|	include/linux/ext4_fs_i.h is included in include/linux/ext_fs.h twice Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	Convert ext4 to use unlocked_ioctl	Andi Kleen	2008-04-29
\| \| \| \| \| \| \| \| \|	I checked ext4_ioctl and it looked largely safe to not be used without BKL. So convert it over to unlocked_ioctl. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
*	ext4: Fix race between migration and mmap write	Aneesh Kumar K.V	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fail migrate if we allocated new blocks via mmap write. If we write to holes in the file via mmap, we end up allocating new blocks. This block allocation happens without taking inode->i_mutex. Since migrate is protected by i_mutex and migrate expects that no new blocks get allocated during migrate, fail migrate if new blocks get allocated. We can't take inode->i_mutex in the mmap write path because that would result in a locking order violation between i_mutex and mmap_sem. Also adding a separate rw_sempahore for protection is really high overhead for a rare operation such as migrate. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
*	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2008-04-29
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: [libata] linux/libata.h: reorganize ata_device struct members a bit ahci: SB600 ahci can't do MSI, blacklist that capability libata: More TSSTcorp pain, keep in sync with legacy IDE pata_via: Fix 6410 misdetect [libata] pata_atiixp: fix PIO timing data misprogramming
\| *	[libata] linux/libata.h: reorganize ata_device struct members a bit	Jeff Garzik	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Put the big stuff at the end, to prepare for upcoming changes (and also hopefully achieve nicer packing of remaining members). Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* \|	Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6	Linus Torvalds	2008-04-29
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6: i2c: Convert most new-style drivers to use module aliasing i2c: Add support for device alias names i2c-amd756-s4882: Fix an error path i2c: Drop unused RTC driver IDs i2c/tps65010: Add missing intialization of client data i2c-sis5595: Minor cleanups in sis5595_access i2c-piix4: Minor cleanups i2c: Spelling fix (successful) i2c-stub: No newline in parameter description
\| * \|	i2c: Convert most new-style drivers to use module aliasing	Jean Delvare	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on earlier work by Jon Smirl and Jochen Friedrich. Update most new-style i2c drivers to use standard module aliasing instead of the old driver_name/type driver matching scheme. I've left the video drivers apart (except for SoC camera drivers) as they're a bit more diffcult to deal with, they'll have their own patch later. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Jon Smirl <jonsmirl@gmail.com> Cc: Jochen Friedrich <jochen@scram.de>
\| * \|	i2c: Add support for device alias names	Jean Delvare	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Based on earlier work by Jon Smirl and Jochen Friedrich. This patch allows new-style i2c chip drivers to have alias names using the official kernel aliasing system and MODULE_DEVICE_TABLE(). At this point, the old i2c driver binding scheme (driver_name/type) is still supported. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Jochen Friedrich <jochen@scram.de> Cc: Jon Smirl <jonsmirl@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org>
\| * \|	i2c: Drop unused RTC driver IDs	Jean Delvare	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The x1208, pcf8563 and isl1208 RTC drivers have been converted to new-style i2c drivers, so they no longer use I2C driver IDs. Signed-off-by: Jean Delvare <khali@linux-fr.org> Cc: Alessandro Zummo <a.zummo@towertech.it>
* \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2008-04-29
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/nes: Formatting cleanup RDMA/nes: Add support for SFP+ PHY RDMA/nes: Use LRO IPoIB: Copy child MTU from parent IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute IB/mthca: Avoid recycling old FMR R_Keys too soon mlx4_core: Avoid recycling old FMR R_Keys too soon IB/ehca: Allocate event queue size depending on max number of CQs and QPs IPoIB: Use separate CQ for UD send completions IB/iser: Count FMR alignment violations per session IB/iser: Move high-volume debug output to higher debug level IB/ehca: handle negative return value from ibmebus_request_irq() properly RDMA/cxgb3: Support peer-2-peer connection setup RDMA/cxgb3: Set the max_mr_size device attribute correctly RDMA/cxgb3: Correctly serialize peer abort path mlx4_core: Add a way to set the "collapsed" CQ flag
\| * \| \|	mlx4_core: Add a way to set the "collapsed" CQ flag	Yevgeny Petrilin	2008-04-29
\| \| \|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extend the mlx4_cq_resize() API with a way to set the "collapsed" flag for the CQ being created. Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* \| \|	Security: Typecast CAP_*_SET macros	David Howells	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cast the CAP_*_SET macros to be of kernel_cap_t type to avoid compiler warnings. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	Security: Make secctx_to_secid() take const secdata	David Howells	2008-04-29
\|/ / \| \| \| \| \| \| \| \| \| \| \| \|	Make secctx_to_secid() take constant secdata. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Improve queue_is_locked()	Jens Axboe	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	spin_is_locked() doesn't work on UP without spinlock debugging. Make it safer and just return 1 on UP, so we don't get false positives. The plan is to kill this debug function during the -rc cycle. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'audit.b50' of ↵	Linus Torvalds	2008-04-29
\|\ \ \| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current * 'audit.b50' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current: [PATCH] new predicate - AUDIT_FILETYPE [patch 2/2] Use find_task_by_vpid in audit code [patch 1/2] audit: let userspace fully control TTY input auditing [PATCH 2/2] audit: fix sparse shadowed variable warnings [PATCH 1/2] audit: move extern declarations to audit.h Audit: MAINTAINERS update Audit: increase the maximum length of the key field Audit: standardize string audit interfaces Audit: stop deadlock from signals under load Audit: save audit_backlog_limit audit messages in case auditd comes back Audit: collect sessionid in netlink messages Audit: end printk with newline
\| *	[PATCH] new predicate - AUDIT_FILETYPE	Al Viro	2008-04-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Argument is S_IF... \| <index>, where index is normally 0 or 1. Triggers if chosen element of ctx->names[] is present and the mode of object in question matches the upper bits of argument. I.e. for things like "is the argument of that chmod a directory", etc. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
\| *	[patch 1/2] audit: let userspace fully control TTY input auditing	Miloslav Trmac	2008-04-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the code that automatically disables TTY input auditing in processes that open TTYs when they have no other TTY open; this heuristic was intended to automatically handle daemons, but it has false positives (e.g. with sshd) that make it impossible to control TTY input auditing from a PAM module. With this patch, TTY input auditing is controlled from user-space only. On the other hand, not even for daemons does it make sense to audit "input" from PTY masters; this data was produced by a program writing to the PTY slave, and does not represent data entered by the user. Signed-off-by: Miloslav Trmac <mitr@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>