litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	Fix race between cat /proc/*/wchan and rmmod et al	Alexey Dobriyan	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kallsyms_lookup() can go iterating over modules list unprotected which is OK for emergency situations (oops), but not OK for regular stuff like /proc/*/wchan. Introduce lookup_symbol_name()/lookup_module_symbol_name() which copy symbol name into caller-supplied buffer or return -ERANGE. All copying is done with module_mutex held, so... Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Fix race between rmmod and cat /proc/kallsyms	Alexey Dobriyan	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	module_get_kallsym() leaks "struct module *" outside of module_mutex which is no-no, because module can dissapear right after mutex unlock. Copy all needed information from inside module_mutex into caller-supplied space. [bunk@stusta.de: is_exported() can now become static] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Simplify module_get_kallsym() by dropping length arg	Alexey Dobriyan	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \|	module_get_kallsym() could in theory truncate module symbol name to fit in buffer, but nobody does this. Always use KSYM_NAME_LEN + 1 bytes for name. Suggested by lg^WRusty. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	PNPACPI sets pnpdev->dev.archdata	David Brownell	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Teach PNPACPI how to hook up its devices to their ACPI nodes, so that pnpdev->dev.archdata points to the parallel acpi device node. Previously this only worked for PCI, leaving a notable hole. Export "acpi_bus_type" so this can work. Remove some extraneous whitespace. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Adam Belay <ambx1@neo.rr.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Kprobes: print details of kretprobe on assertion failure	Ananth N Mavinakayanahalli	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In certain cases like when the real return address can't be found or when the number of tracked calls to a kretprobed function is less than the number of returns, we may not be able to find the correct return address after processing a kretprobe. Currently we just do a BUG_ON, but no information is provided about the actual failing kretprobe. Print out details of the kretprobe before calling BUG(). Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kdump/kexec: calculate note size at compile time	Simon Horman	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently the size of the per-cpu region reserved to save crash notes is set by the per-architecture value MAX_NOTE_BYTES. Which in turn is currently set to 1024 on all supported architectures. While testing ia64 I recently discovered that this value is in fact too small. The particular setup I was using actually needs 1172 bytes. This lead to very tedious failure mode where the tail of one elf note would overwrite the head of another if they ended up being alocated sequentially by kmalloc, which was often the case. It seems to me that a far better approach is to caclculate the size that the area needs to be. This patch does just that. If a simpler stop-gap patch for ia64 to be squeezed into 2.6.21(.X) is needed then this should be as easy as making MAX_NOTE_BYTES larger in arch/asm-ia64/kexec.h. Perhaps 2048 would be a good choice. However, I think that the approach in this patch is a much more robust idea. Acked-by: Vivek Goyal <vgoyal@in.ibm.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	remove artificial software max_loop limit	Ken Chen	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove artificial maximum 256 loop device that can be created due to a legacy device number limit. Searching through lkml archive, there are several instances where users complained about the artificial limit that the loop driver impose. There is no reason to have such limit. This patch rid the limit entirely and make loop device and associated block queue instantiation on demand. With on-demand instantiation, it also gives the benefit of not wasting memory if these devices are not in use (compare to current implementation that always create 8 loop devices), a net improvement in both areas. This version is both tested with creation of large number of loop devices and is compatible with existing losetup/mount user land tools. There are a number of people who worked on this and provided valuable suggestions, in no particular order, by: Jens Axboe Jan Engelhardt Christoph Hellwig Thomas M Signed-off-by: Ken Chen <kenchen@google.com> Cc: Jan Engelhardt <jengelh@linux01.gwdg.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	add touch_all_softlockup_watchdogs()	Jeremy Fitzhardinge	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add touch_all_softlockup_watchdogs() to allow the softlockup watchdog timers on all cpus to be updated. This is used to prevent sysrq-t from generating a spurious watchdog message when generating lots of output. Softlockup watchdogs use sched_clock() as its timebase, which is inherently per-cpu (at least, when it is measuring unstolen time). Because of this, it isn't possible for one CPU to directly update the other CPU's timers, but it is possible to tell the other CPUs to do update themselves appropriately. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Acked-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Move timekeeping code to timekeeping.c	john stultz	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \|	Move the timekeeping code out of kernel/timer.c and into kernel/time/timekeeping.c. I made no cleanups or other changes in transit. [akpm@linux-foundation.org: build fix] Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	time: SMP friendly alignment of struct clocksource	Eric Dumazet	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	struct clocksource is a critical data structure. Most of its fields are read only, some of them are heavily modified at each timer interrupt. It makes sense to separate those fields and make sure they all share one cache line, or at least the minimum for machines with small cache lines. [akpm@linux-foundation.org: build fix] Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	<linux/sysdev.h> needs to include <linux/module.h>	Ralf Baechle	2007-05-08
\| \| \| \| \| \| \| \| \| \| \|	sysdev.h uses THIS_MODULE so should include <linux/module.h>. [akpm@linux-foundation.org: couple of fixes] Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Cc: Andi Kleen <ak@suse.de> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Add a new deferrable delayed work init	Venki Pallipadi	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new deferrable delayed work init. This can be used to schedule work that are 'unimportant' when CPU is idle and can be called later, when CPU eventually comes out of idle. Use this init in cpufreq ondemand governor. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Add support for deferrable timers	Venki Pallipadi	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce a new flag for timers - deferrable: Timers that work normally when system is busy. But, will not cause CPU to come out of idle (just to service this timer), when CPU is idle. Instead, this timer will be serviced when CPU eventually wakes up with a subsequent non-deferrable timer. The main advantage of this is to avoid unnecessary timer interrupts when CPU is idle. If the routine currently called by a timer can wait until next event without any issues, this new timer can be used to setup timer event for that routine. This, with dynticks, allows CPUs to be lazy, allowing them to stay in idle for extended period of time by reducing unnecesary wakeup and thereby reducing the power consumption. This patch: Builds this new timer on top of existing timer infrastructure. It uses last bit in 'base' pointer of timer_list structure to store this deferrable timer flag. __next_timer_interrupt() function skips over these deferrable timers when CPU looks for next timer event for which it has to wake up. This is exported by a new interface init_timer_deferrable() that can be called in place of regular init_timer(). [akpm@linux-foundation.org: Privatise a #define] Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	parport->dev driver model support	David Brownell	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently a parport_driver can't get a handle on the device node for the underlying parport (PNPACPI, PCI, etc). That prevents correct placement of sysfs child nodes, which can affect things like power management. This patch adds a field to "struct parport" pointing to that device node, and updates non-legacy port drivers to initialize that device pointer. That field replaces the analagous PCI-only support in parport_pc. [akpm@linux-foundation.org: fix powerpc build] Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Delete unused header file linux/awe_voice.h	Robert P. J. Day	2007-05-08
\| \| \| \| \| \| \| \| \|	Delete the unused header file include/linux/awe_voice.h, as well as its corresponding Kbuild entry. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	make remove_inode_dquot_ref() static	Adrian Bunk	2007-05-08
\| \| \| \| \| \| \| \| \|	remove_inode_dquot_ref() can now become static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Delete unused header file math-emu/extended.h	Robert P. J. Day	2007-05-08
\| \| \| \| \| \|	Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Remove do_sync_file_range()	Mark Fasheh	2007-05-08
\| \| \| \| \| \| \| \| \| \|	Remove do_sync_file_range() and convert callers to just use do_sync_mapping_range(). Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	move die notifier handling to common code	Christoph Hellwig	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch moves the die notifier handling to common code. Previous various architectures had exactly the same code for it. Note that the new code is compiled unconditionally, this should be understood as an appel to the other architecture maintainer to implement support for it aswell (aka sprinkling a notify_die or two in the proper place) arm had a notifiy_die that did something totally different, I renamed it to arm_notify_die as part of the patch and made it static to the file it's declared and used at. avr32 used to pass slightly less information through this interface and I brought it into line with the other architectures. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix vmalloc_sync_all bustage] [bryan.wu@analog.com: fix vmalloc_sync_all in nommu] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: <linux-arch@vger.kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	tty: introduce no_tty and use it in selinux	Eric W. Biederman	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	While researching the tty layer pid leaks I found a weird case in selinux when we drop a controlling tty because of inadequate permissions we don't do the normal hangup processing. Which is a problem if it happens the session leader has exec'd something that can no longer access the tty. We already have code in the kernel to handle this case in the form of the TIOCNOTTY ioctl. So this patch factors out a helper function that is the essence of that ioctl and calls it from the selinux code. This removes the inconsistency in handling dropping of a controlling tty and who knows it might even make some part of user space happy because it received a SIGHUP it was expecting. In addition since this removes the last user of proc_set_tty outside of tty_io.c proc_set_tty is made static and removed from tty.h Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	enlarge console.name	Andrew Morton	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \|	console.name[] is eight chars, but so is "earlyvga". So when we try to print console->name when using earlyvga it runs off the end of the string. Make it bigger. Diagnosed-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	futex: get_futex_key, get_key_refs and drop_key_refs	Rusty Russell	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	lguest uses the convenient futex infrastructure for inter-domain I/O, so expose get_futex_key, get_key_refs (renamed get_futex_key_refs) and drop_key_refs (renamed drop_futex_key_refs). Also means we need to expose the union that these use. No code changes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	cyclades: remove custom types	Klaus Kudielka	2007-05-08
\| \| \| \| \| \| \|	Switch from private uclong, etc over to standard types. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	fix cyclades.h for x86_64 (and probably others)	Klaus Kudielka	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At least on x86_64 the present cyclades.h is broken due to the wrong size of uclong. This affects, of course, both the kernel and the user-level utilities. The symptom is that cyzload refuses to load the firmware. I also managed to freeze the machine when unloading the module. The patch below fixes this in an architecture-independent way. I have tested it with 2.6.19 and the driver works fine again with a Cyclades-Z on an Athlon 64 X2. [akpm@linux-foundation.org: fix warnings] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Kprobes: Make kprobe.symbol_name const	Ananth N Mavinakayanahalli	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Kprobes doesn't scribble the kprobe.symbol_name field. Its only set by the module when registering the probe. Modules that exercise good hygiene using the "const" qualifier will see warnings... warning: assignment discards qualifiers from pointer target type Make struct kprobe.symbol_name const char * Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	tty: i386/x86_64 arbitary speed support	Alan Cox	2007-05-08
\| \| \| \| \| \| \| \| \| \|	Adds the needed TCGETS2/TCSETS2 ioctl calls, structures, defines and the like. Tested against the test suite and passes. Other platforms should need roughly the same change. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	VFS: delay the dentry name generation on sockets and pipes	Eric Dumazet	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	1) Introduces a new method in 'struct dentry_operations'. This method called d_dname() might be called from d_path() to build a pathname for special filesystems. It is called without locks. Future patches (if we succeed in having one common dentry for all pipes/sockets) may need to change prototype of this method, but we now use : char d_dname(struct dentry dentry, char buffer, int buflen); 2) Adds a dynamic_dname() helper function that eases d_dname() implementations 3) Defines d_dname method for sockets : No more sprintf() at socket creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... 4) Defines d_dname method for pipes : No more sprintf() at pipe creation. This is delayed up to the moment someone does an access to /proc/pid/fd/... A benchmark consisting of 1.000.000 calls to pipe()/close()/close() gives a nice* speedup on my Pentium(M) 1.6 Ghz : 3.090 s instead of 3.450 s Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Christoph Hellwig <hch@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Fix race between proc_get_inode() and remove_proc_entry()	Alexey Dobriyan	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	proc_lookup remove_proc_entry =========== ================= lock_kernel(); spin_lock(&proc_subdir_lock); [find PDE with refcount 0] spin_unlock(&proc_subdir_lock); spin_lock(&proc_subdir_lock); [find PDE with refcount 0] [check refcount and free PDE] spin_unlock(&proc_subdir_lock); proc_get_inode: de_get(de); /* boom */ Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	add filesystem subtype support	Miklos Szeredi	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's a slight problem with filesystem type representation in fuse based filesystems. From the kernel's view, there are just two filesystem types: fuse and fuseblk. From the user's view there are lots of different filesystem types. The user is not even much concerned if the filesystem is fuse based or not. So there's a conflict of interest in how this should be represented in fstab, mtab and /proc/mounts. The current scheme is to encode the real filesystem type in the mount source. So an sshfs mount looks like this: sshfs#user@server:/ /mnt/server fuse rw,nosuid,nodev,... This url-ish syntax works OK for sshfs and similar filesystems. However for block device based filesystems (ntfs-3g, zfs) it doesn't work, since the kernel expects the mount source to be a real device name. A possibly better scheme would be to encode the real type in the type field as "type.subtype". So fuse mounts would look like this: /dev/hda1 /mnt/windows fuseblk.ntfs-3g rw,... user@server:/ /mnt/server fuse.sshfs rw,nosuid,nodev,... This patch adds the necessary code to the kernel so that this can be correctly displayed in /proc/mounts. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	init dma masks in pnp_dev	David Brownell	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PNP now initializes device dma masks, which prevents oopses when generic dma calls are made using pnp device nodes. This assumes PNP only uses ISA DMA, with 24 bit addresses; and that it's safe to init those masks for all devices (rather than finding out which devices have been assigned DMA channels, and handling only those). Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Adam Belay <abelay@novell.com> Cc: Jaroslav Kysela <perex@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge sys_clone()/sys_unshare() nsproxy and namespace handling	Badari Pulavarty	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sys_clone() and sys_unshare() both makes copies of nsproxy and its associated namespaces. But they have different code paths. This patch merges all the nsproxy and its associated namespace copy/clone handling (as much as possible). Posted on container list earlier for feedback. - Create a new nsproxy and its associated namespaces and pass it back to caller to attach it to right process. - Changed all copy__ns() routines to return a new copy of namespace instead of attaching it to task->nsproxy. - Moved the CAP_SYS_ADMIN checks out of copy__ns() routines. - Removed unnessary !ns checks from copy__ns() and added BUG_ON() just incase. - Get rid of all individual unshare__ns() routines and make use of copy_*_ns() instead. [akpm@osdl.org: cleanups, warning fix] [clg@fr.ibm.com: remove dup_namespaces() declaration] [serue@us.ibm.com: fix CONFIG_IPC_NS=n, clone(CLONE_NEWIPC) retval] [akpm@linux-foundation.org: fix build with CONFIG_SYSVIPC=n] Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Serge Hallyn <serue@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <containers@lists.osdl.org> Signed-off-by: Cedric Le Goater <clg@fr.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	IRQ: add __must_check to request_irq	Monakhov Dmitriy	2007-05-08
\| \| \| \| \| \| \| \| \| \| \|	This could help to find buggy drivers where request_irq return value wasn't checked. There's just no reason to ignore errors which can and do occur. Anyone who got warning during compilation have to realise what it is't realy safe code. Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kconfig: centralize the selection of semaphore debugging in lib/Kconfig.debug	Robert P. J. Day	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the Kconfig selection of semaphore debugging from the ALPHA and FRV Kconfig files, and centralize it in lib/Kconfig.debug. There doesn't seem to be much point in letting individual architectures independently define the same Kconfig option when it can just as easily be put in a single Kconfig file and made dependent on a subset of architectures. that way, at least the option shows up in the same relative location in the menu each time. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: David Howells <dhowells@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	reiserfs: shrink superblock if no xattrs	Alexey Dobriyan	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This makes in-core superblock fit into one cacheline here. Before: struct dentry * xattr_root; /* 124 4 / / --- cacheline 1 boundary (128 bytes) --- / struct rw_semaphore xattr_dir_sem; / 128 12 / int j_errno; / 140 4 / }; / size: 144, cachelines: 2 / / sum members: 142, holes: 1, sum holes: 2 / / last cacheline: 16 bytes / After: int j_errno; / 124 4 / / --- cacheline 1 boundary (128 bytes) --- / }; / size: 128, cachelines: 1 / / sum members: 126, holes: 1, sum holes: 2 */ Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: <reiserfs-dev@namesys.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Fix compilation of drivers with -O0	Michal Schmidt	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is sometimes useful to compile individual drivers with optimization disabled for easier debugging. Currently drivers which use htonl() and similar functions don't compile with -O0. This patch fixes it. It also removes obsolete and misleading comments. This header is not for userspace, so we don't have to care about strange programs these comments mention. (akpm: -O0 probably isn't a good idea, but this code looks pretty crufty and unuseful) Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	init/do_mounts.c: proper prepare_namespace() prototype	Adrian Bunk	2007-05-08
\| \| \| \| \| \| \| \|	Add a proper protype for prepare_namespace() in include/linux/init.h. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	use use SEEK_MAX to validate user lseek arguments	Chris Snook	2007-05-08
\| \| \| \| \| \| \| \| \|	Add SEEK_MAX and use it to validate lseek arguments from userspace. Signed-off-by: Chris Snook <csnook@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Fix constant folding and poor optimization in byte swapping code	Trent Piepho	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Constant folding does not work for the swabXX() byte swapping functions, and the C versions optimize poorly. Attempting to initialize a global variable to swab16(0x1234) or put something like "case swab32(42):" in a switch statement will not compile. It can work, swab.h just isn't doing it correctly. This patch fixes that. Contrary to the comment in asm-i386/byteorder.h, gcc does not recognize the "C" version of swab16 and turn it into efficient code. gcc can do this, just not with the current code. The simple function: u16 foo(u16 x) { return swab16(x); } Would compile to: movzwl %ax, %eax movl %eax, %edx shrl $8, %eax sall $8, %edx orl %eax, %edx With this patch, it will compile to: rolw $8, %ax I also attempted to document the maze different macros/inline functions that are used to create the final product. Signed-off-by: Trent Piepho <xyzzy@speakeasy.org> Cc: Francois-Rene Rideau <fare@tunes.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	ipmi: add new IPMI nmi watchdog handling	Corey Minyard	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \|	Convert over to the new NMI handling for getting IPMI watchdog timeouts via an NMI. This add config options to know if there is the ability to receive NMIs and if it has an NMI post processing call. Then it modifies the IPMI watchdog to take advantage of this so that it can know if an NMI comes in. It also adds testing that the IPMI NMI watchdog works. Signed-off-by: Corey Minyard <minyard@acm.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	reduce size of task_struct on 64-bit machines	William Cohen	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This past week I was playing around with that pahole tool (http://oops.ghostprotocols.net:81/acme/dwarves/) and looking at the size of various struct in the kernel. I was surprised by the size of the task_struct on x86_64, approaching 4K. I looked through the fields in task_struct and found that a number of them were declared as "unsigned long" rather than "unsigned int" despite them appearing okay as 32-bit sized fields. On x86_64 "unsigned long" ends up being 8 bytes in size and forces 8 byte alignment. Is there a reason there a reason they are "unsigned long"? The patch below drops the size of the struct from 3808 bytes (60 64-byte cachelines) to 3760 bytes (59 64-byte cachelines). A couple other fields in the task struct take a signficant amount of space: struct thread_struct thread; 688 struct held_lock held_locks[30]; 1680 CONFIG_LOCKDEP is turned on in the .config [akpm@linux-foundation.org: fix printk warnings] Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	simplify the stacktrace code	Christoph Hellwig	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simplify the stacktrace code: - remove the unused task argument to save_stack_trace, it's always current - remove the all_contexts flag, it's alwasy 0 Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Andi Kleen <ak@suse.de> Cc: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Factor outstanding I/O error handling	Guillaume Chazarain	2007-05-08
\| \| \| \| \| \| \| \| \| \|	Cleanup: setting an outstanding error on a mapping was open coded too many times. Factor it out in mapping_set_error(). Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: move common segment checks to separate helper function	Dmitriy Monakhov	2007-05-08
\| \| \| \| \| \| \| \| \| \|	[akpm@linux-foundation.org: cleanup] Signed-off-by: Monakhov Dmitriy <dmonakhov@openvz.org> Cc: Christoph Hellwig <hch@lst.de> Acked-by: Anton Altaparmakov <aia21@cam.ac.uk> Acked-by: David Chinner <dgc@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Increase slab redzone to 64bits	David Woodhouse	2007-05-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There are two problems with the existing redzone implementation. Firstly, it's causing misalignment of structures which contain a 64-bit integer, such as netfilter's 'struct ipt_entry' -- causing netfilter modules to fail to load because of the misalignment. (In particular, the first check in net/ipv4/netfilter/ip_tables.c::check_entry_size_and_hooks()) On ppc32 and sparc32, amongst others, __alignof__(uint64_t) == 8. With slab debugging, we use 32-bit redzones. And allocated slab objects aren't sufficiently aligned to hold a structure containing a uint64_t. By _just_ setting ARCH_KMALLOC_MINALIGN to __alignof__(u64) we'd disable redzone checks on those architectures. By using 64-bit redzones we avoid that loss of debugging, and also fix the other problem while we're at it. When investigating this, I noticed that on 64-bit platforms we're using a 32-bit value of RED_ACTIVE/RED_INACTIVE in the 64-bit memory location set aside for the redzone. Which means that the four bytes immediately before or after the allocated object at 0x00,0x00,0x00,0x00 for LE and BE machines, respectively. Which is probably not the most useful choice of poison value. One way to fix both of those at once is just to switch to 64-bit redzones in all cases. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Christoph Lameter <clameter@engr.sgi.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6	Linus Torvalds	2007-05-07
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] update memory attribute aliasing documentation & test cases [IA64] fail mmaps that span areas with incompatible attributes [IA64] allow WB /sys/.../legacy_mem mmaps [IA64] make ioremap avoid unsupported attributes [IA64] rename ioremap variables to match i386 [IA64] relax per-cpu TLB requirement to DTC [IA64] remove per-cpu ia64_phys_stacked_size_p8 [IA64] Fix example error injection program [IA64] Itanium MC Error Injection Tool: pal_mc_error_inject() interface [IA64] Itanium MC Error Injection Tool: Makefile changes [IA64] Itanium MC Error Injection Tool: Driver sysfs interface [IA64] Itanium MC Error Injection Tool: Doc and sample application [IA64] Itanium MC Error Injection Tool: Kernel configuration
\| *	Pull mem-attribute into release branch	Tony Luck	2007-04-30
\| \|\
\| \| *	[IA64] make ioremap avoid unsupported attributes	Bjorn Helgaas	2007-03-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Example memory map (from HP sx1000 with VGA enabled): 0x00000 - 0x9FFFF supports only WB (cacheable) access 0xA0000 - 0xBFFFF supports only UC (uncacheable) access 0xC0000 - 0xFFFFF supports only WB (cacheable) access pci_read_rom() indirectly uses ioremap(0xC0000) to read the shadow VGA option ROM. ioremap() used to default to a 16MB or 64MB UC kernel identity mapping, which would cause an MCA when reading 0xC0000 since only WB is supported there. X uses reads the option ROM to initialize devices. A smaller test case is: # echo 1 > /sys/bus/pci/devices/0000:aa:03.0/rom # cp /sys/bus/pci/devices/0000:aa:03.0/rom x To avoid this, we can use the same ioremap_page_range() strategy that most architectures use for all ioremaps. These page table mappings come out of the vmalloc area. On ia64, these are in region 5 (0xA... addresses) and typically use 16KB or 64KB mappings instead of 16MB or 64MB mappings. The smaller mappings give more flexibility to use the correct attributes. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
\| * \|	Pull percpu-dtc into release branch	Tony Luck	2007-04-30
\| \|\ \
\| \| * \|	[IA64] relax per-cpu TLB requirement to DTC	Chen, Kenneth W	2007-02-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of pinning per-cpu TLB into a DTR, use DTC. This will free up one TLB entry for application, or even kernel if access pattern to per-cpu data area has high temporal locality. Since per-cpu is mapped at the top of region 7 address, we just need to add special case in alt_dtlb_miss. The physical address of per-cpu data is already conveniently stored in IA64_KR(PER_CPU_DATA). Latency for alt_dtlb_miss is not affected as we can hide all the latency. It was measured that alt_dtlb_miss handler has 23 cycles latency before and after the patch. The performance effect is massive for applications that put lots of tlb pressure on CPU. Workload environment like database online transaction processing or application uses tera-byte of memory would benefit the most. Measurement with industry standard database benchmark shown an upward of 1.6% gain. While smaller workloads like cpu, java also showing small improvement. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
\| \| * \|	[IA64] remove per-cpu ia64_phys_stacked_size_p8	Chen, Kenneth W	2007-02-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's not efficient to use a per-cpu variable just to store how many physical stack register a cpu has. Ever since the incarnation of ia64 up till upcoming Montecito processor, that variable has "glued" to 96. Having a variable in memory means that the kernel is burning an extra cacheline access on every syscall and kernel exit path. Such "static" value is better served with the instruction patching utility exists today. Convert ia64_phys_stacked_size_p8 into dynamic insn patching. This also has a pleasant side effect of eliminating access to per-cpu area while psr.ic=0 in the kernel exit path. (fixable for per-cpu DTC work, but why bother?) There are some concerns with the default value that the instruc- tion encoded in the kernel image. It shouldn't be concerned. The reasons are: (1) cpu_init() is called at CPU initialization. In there, we find out physical stack register size from PAL and patch two instructions in kernel exit code. The code in question can not be executed before the patching is done. (2) current implementation stores zero in ia64_phys_stacked_size_p8, and that's what the current kernel exit path loads the value with. With the new code, it is equivalent that we store reg size 96 in ia64_phys_stacked_size_p8, thus creating a better safety net. Given (1) above can never fail, having (2) is just a bonus. All in all, this patch allow one less memory reference in the kernel exit path, thus reducing syscall and interrupt return latency; and avoid polluting potential useful data in the CPU cache. Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>