aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/kernel.h
Commit message (Collapse)AuthorAge
* lib/hexdumpRandy Dunlap2007-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on ace_dump_mem() from Grant Likely for the Xilinx SystemACE CompactFlash interface. Add print_hex_dump() & hex_dumper() to lib/hexdump.c and linux/kernel.h. This patch adds the functions print_hex_dump() & hex_dumper(). print_hex_dump() can be used to perform a hex + ASCII dump of data to syslog, in an easily viewable format, thus providing a common text hex dump format. hex_dumper() provides a dump-to-memory function. It converts one "line" of output (16 bytes of input) at a time. Example usages: print_hex_dump(KERN_DEBUG, DUMP_PREFIX_ADDRESS, frame->data, frame->len); hex_dumper(frame->data, frame->len, linebuf, sizeof(linebuf)); Example output using %DUMP_PREFIX_OFFSET: 0009ab42: 40414243 44454647 48494a4b 4c4d4e4f-@ABCDEFG HIJKLMNO Example output using %DUMP_PREFIX_ADDRESS: ffffffff88089af0: 70717273 74757677 78797a7b 7c7d7e7f-pqrstuvw xyz{|}~. [akpm@linux-foundation.org: cleanups, add export] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* add upper-32-bits macroAndrew Morton2007-05-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We keep on getting "right shift count >= width of type" warnings when doing things like sector_t s; x = s >> 56; because with CONFIG_LBD=n, s is only 32-bit. Similar problems can occur with dma_addr_t's. So add a simple wrapper function which code can use to avoid this warning. The above example would become x = upper_32_bits(s) >> 24; The first user is in fact AFS. Cc: James Bottomley <James.Bottomley@SteelEye.com> Cc: "Cameron, Steve" <Steve.Cameron@hp.com> Cc: "Miller, Mike (OS Dev)" <Mike.Miller@hp.com> Cc: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ARRAY_SIZE: check for typeRusty Russell2007-05-07
| | | | | | | | | | | | | We can use a gcc extension to ensure that ARRAY_SIZE() is handed an array, not a pointer. This is especially important when code is changed from a fixed array to a pointer. I assume the Intel compiler doesn't support __builtin_types_compatible_p. [jdike@addtoit.com: uml: update UML definition of ARRAY_SIZE] Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add kvasprintf()Jeremy Fitzhardinge2007-04-30
| | | | | | | | | | | | | Add a kvasprintf() function to complement kasprintf(). No in-tree users yet, but I have some coming up. [akpm@linux-foundation.org: EXPORT it] Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Keir Fraser <keir@xensource.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] pid: make session_of_pgrp use struct pid instead of pid_tEric W. Biederman2007-02-12
| | | | | | | | | | | | | | | | | | | | | | | | | To properly implement a pid namespace I need to deal exclusively in terms of struct pid, because pid_t values become ambiguous. To this end session_of_pgrp is transformed to take and return a struct pid pointer. To avoid the need to worry about reference counting I now require my caller to hold the appropriate locks. Leaving callers repsonsible for increasing the reference count if they need access to the result outside of the locks. Since session_of_pgrp currently only has one caller and that caller simply uses only test the result for equality with another process group, the locking change means I don't actually have to acquire the tasklist_lock at all. tiocspgrp is also modified to take and release the lock. The logic there is a little more complicated but nothing I won't need when I convert pgrp of a tty to a struct pid pointer. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] include/linux/kernel.h: Remove labs()Richard Knutsson2007-02-12
| | | | | | | | Remove labs() since it is not used/needed. Signed-off-by: Richard Knutsson <ricknu-0@student.ltu.se> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Extract and use wake_up_klogd()Kirill Korotaev2007-02-11
| | | | | | | | | | | | Remove hack with printing space to wake up klogd. Use explicit wake_up_klogd(). See earlier discussion http://groups.google.com/group/fa.linux.kernel/browse_frm/thread/75f496668409f58d/1a8f28983a51e1ff?lnk=st&q=wake_up_klogd+group%3Afa.linux.kernel&rnum=2#1a8f28983a51e1ff Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Common compat_sys_sysinfoKyle McMartin2007-02-11
| | | | | | | | | | | | | | | | I noticed that almost all architectures implemented exactly the same sys32_sysinfo... except parisc, where a bug was to be found in handling of the uptime. So let's remove a whole whack of code for fun and profit. Cribbed compat_sys_sysinfo from x86_64's implementation, since I figured it would be the best tested. This patch incorporates Arnd's suggestion of not using set_fs/get_fs, but instead extracting out the common code from sys_sysinfo. Cc: Christoph Hellwig <hch@infradead.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Add TAINT_USER and ability to set taint flags from userspaceTheodore Ts'o2007-02-11
| | | | | | | | | | | | | | | | | | | | | Allow taint flags to be set from userspace by writing to /proc/sys/kernel/tainted, and add a new taint flag, TAINT_USER, to be used when userspace has potentially done something dangerous that might compromise the kernel. This will allow support personnel to ask further questions about what may have caused the user taint flag to have been set. For example, they might examine the logs of the realtime JVM to see if the Java program has used the really silly, stupid, dangerous, and completely-non-portable direct access to physical memory feature which MUST be implemented according to the Real-Time Specification for Java (RTSJ). Sigh. What were those silly people at Sun thinking? [akpm@osdl.org: build fix] [bunk@stusta.de: cleanup] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] fix linux banner format stringRoman Zippel2007-01-10
| | | | | | | | | | | | | | Revert previous attempts at messing with the linux banner string and simply use a separate format string for proc. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: Olaf Hering <olaf@aepfle.de> Acked-by: Jean Delvare <khali@linux-fr.org> Cc: Andrey Borzenkov <arvidjaar@mail.ru> Cc: Andrew Morton <akpm@osdl.org> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Make SLES9 "get_kernel_version" work on the kernel binary againLinus Torvalds2006-12-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Andy Whitcroft, at least the SLES9 initrd build process depends on getting the kernel version from the kernel binary. It does that by simply trawling the binary and looking for the signature of the "linux_banner" string (the string "Linux version " to be exact. Which is really broken in itself, but whatever..) That got broken when the string was changed to allow /proc/version to change the UTS release information dynamically, and "get_kernel_version" thus returned "%s" (see commit a2ee8649ba6d71416712e798276bf7c40b64e6e5: "[PATCH] Fix linux banner utsname information"). This just restores "linux_banner" as a static string, which should fix the version finding. And /proc/version simply uses a different string. To avoid wasting even that miniscule amount of memory, the early boot string should really be marked __initdata, but that just causes the same bug in SLES9 to re-appear, since it will then find other occurrences of "Linux version " first. Cc: Andy Whitcroft <apw@shadowen.org> Acked-by: Herbert Poetzl <herbert@13thfloor.at> Cc: Andi Kleen <ak@suse.de> Cc: Andrew Morton <akpm@osdl.org> Cc: Steve Fox <drfickle@us.ibm.com> Acked-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] LOG2: Alter roundup_pow_of_two() so that it can use a ilog2() on a ↵David Howells2006-12-08
| | | | | | | | | | | | | | | | constant Alter roundup_pow_of_two() so that it can make use of ilog2() on a constant to produce a constant value, retaining the ability for an arch to override it in the non-const case. This permits the function to be used to initialise variables. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] LOG2: Implement a general integer log2 facility in the kernelDavid Howells2006-12-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This facility provides three entry points: ilog2() Log base 2 of unsigned long ilog2_u32() Log base 2 of u32 ilog2_u64() Log base 2 of u64 These facilities can either be used inside functions on dynamic data: int do_something(long q) { ...; y = ilog2(x) ...; } Or can be used to statically initialise global variables with constant values: unsigned n = ilog2(27); When performing static initialisation, the compiler will report "error: initializer element is not constant" if asked to take a log of zero or of something not reducible to a constant. They treat negative numbers as unsigned. When not dealing with a constant, they fall back to using fls() which permits them to use arch-specific log calculation instructions - such as BSR on x86/x86_64 or SCAN on FRV - if available. [akpm@osdl.org: MMC fix] Signed-off-by: David Howells <dhowells@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: David Howells <dhowells@redhat.com> Cc: Wojtek Kaniewski <wojtekka@toxygen.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* fix spelling error in include/linux/kernel.hJim Cromie2006-11-29
| | | | Signed-off-by: Adrian Bunk <bunk@stusta.de>
* Fix 'ALIGN()' macro, take 2Linus Torvalds2006-11-26
| | | | | | | | | | | | | | | | | | | | | | | You wouldn't think that doing an ALIGN() macro that aligns something up to a power-of-two boundary would be likely to have bugs, would you? But hey, in the wonderful world of mixing integer types, you have to be careful. This just makes sure that the alignment is interpreted in the same type as the thing to be aligned. Thanks to Roland Dreier, who noticed that the amso1100 driver got broken by the previous fix (that just extended the mask to "unsigned long", but was still broken in "unsigned long long" - it just happened to be the same on 64-bit architectures). See commit 4c8bd7eeee4c8f157fb61fb64b57500990b42e0e for the history of bugs here... Acked-by: Roland Dreier <rdreier@cisco.com> Cc: Andrew Morton <akpm@osdl.org> Cc: David Miller <davem@davemloft.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Add printk_timed_ratelimit()Andrew Morton2006-11-03
| | | | | | | | | | | | | | | | | | printk_ratelimit() has global state which makes it not useful for callers which wish to perform ratelimiting at a particular frequency. Add a printk_timed_ratelimit() which utilises caller-provided state storage to permit more flexibility. This function can in fact be used for things other than printk ratelimiting and is perhaps poorly named. Cc: Ulrich Drepper <drepper@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] pr_debug: check pr_debug() argumentsZach Brown2006-10-03
| | | | | | | | | | | | | | | | | | | | | check pr_debug() arguments When DEBUG isn't defined pr_debug() is defined away as an empty macro. By throwing away the arguments we allow completely incorrect code to build. Instead let's make it an empty inline which checks arguments and mark it so gcc can check the format specification. This results in a seemingly insignificant code size increase. A x86-64 allyesconfig: text data bss dec hex filename 25354768 7191098 4854720 37400586 23ab00a vmlinux.before 25354945 7191138 4854720 37400803 23ab0e3 vmlinux Signed-off-by: Zach Brown <zach.brown@oracle.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Add NUMA_BUILD definition in kernel.h to avoid #ifdef CONFIG_NUMAChristoph Lameter2006-09-27
| | | | | | | | | | | | | | | | | | | | The NUMA_BUILD constant is always available and will be set to 1 on NUMA_BUILDs. That way checks valid only under CONFIG_NUMA can easily be done without #ifdef CONFIG_NUMA F.e. if (NUMA_BUILD && <numa_condition>) { ... } [akpm: not a thing we'd normally do, but CONFIG_NUMA is special: it is causing ifdef explosion in core kernel, so let's see if this is a comfortable way in whcih to control that] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6Linus Torvalds2006-09-26
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits) [PATCH] Don't set calgary iommu as default y [PATCH] i386/x86-64: New Intel feature flags [PATCH] x86: Add a cumulative thermal throttle event counter. [PATCH] i386: Make the jiffies compares use the 64bit safe macros. [PATCH] x86: Refactor thermal throttle processing [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64) [PATCH] Fix unwinder warning in traps.c [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1 [PATCH] x86: Move direct PCI scanning functions out of line [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI [PATCH] Don't leak NT bit into next task [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder [PATCH] Fix some broken white space in ia32_signal.c [PATCH] Initialize argument registers for 32bit signal handlers. [PATCH] Remove all traces of signal number conversion [PATCH] Don't synchronize time reading on single core AMD systems [PATCH] Remove outdated comment in x86-64 mmconfig code [PATCH] Use string instructions for Core2 copy/clear [PATCH] x86: - restore i8259A eoi status on resume [PATCH] i386: Split multi-line printk in oops output. ...
| * [PATCH] x86: Allow users to force a panic on NMIDon Zickus2006-09-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To quote Alan Cox: The default Linux behaviour on an NMI of either memory or unknown is to continue operation. For many environments such as scientific computing it is preferable that the box is taken out and the error dealt with than an uncorrected parity/ECC error get propogated. A small number of systems do generate NMI's for bizarre random reasons such as power management so the default is unchanged. In other respects the new proc/sys entry works like the existing panic controls already in that directory. This is separate to the edac support - EDAC allows supported chipsets to handle ECC errors well, this change allows unsupported cases to at least panic rather than cause problems further down the line. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
* | [PATCH] add DIV_ROUND_UP()Steven Whitehouse2006-09-26
|/ | | | | | | | | | Add the DIV_ROUND_UP() helper macro: divide `n' by `d', rounding up. Stolen from the gfs2 tree(!) because the swsusp patches need it. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [KERNEL] Do not truncate to 'int' in ALIGN() macro.David Miller2006-09-23
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* pr_debug() should not be used in driversPavel Machek2006-08-11
| | | | | | | | pr_debug() should not be used from drivers, add comment saying that. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* [PATCH] FDPIC: Move roundup() into linux/kernel.hDavid Howells2006-07-10
| | | | | | | | | | Move the roundup() macro from binfmt_elf.c into linux/kernel.h as it's generally useful. [akpm@osdl.org: nuke all the other implementations] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: reliable stack trace supportJan Beulich2006-06-26
| | | | | | | | | | | | | These are the generic bits needed to enable reliable stack traces based on Dwarf2-like (.eh_frame) unwind information. Subsequent patches will enable x86-64 and i386 to make use of this. Thanks to Andi Kleen and Ingo Molnar, who pointed out several possibilities for improvement. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Implement kasprintfJeremy Fitzhardinge2006-06-25
| | | | | | | | | | | Implement kasprintf, a kernel version of asprintf. This allocates the memory required for the formatted string, including the trailing '\0'. Returns NULL on allocation failure. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] remove unlikely() in might_sleep_if()Hua Zhong2006-06-23
| | | | | | | | | | | | The likely() profiling tools show that __alloc_page() causes a lot of misses: ! 132 119193 __alloc_pages():mm/page_alloc.c@937 Because most __alloc_page() calls are not atomic. Signed-off-by: Hua Zhong <hzhong@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] writeback: fix range handlingOGAWA Hirofumi2006-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a writeback_control's `start' and `end' fields are used to indicate a one-byte-range starting at file offset zero, the required values of .start=0,.end=0 mean that the ->writepages() implementation has no way of telling that it is being asked to perform a range request. Because we're currently overloading (start == 0 && end == 0) to mean "this is not a write-a-range request". To make all this sane, the patch changes range of writeback_control. So caller does: If it is calling ->writepages() to write pages, it sets range (range_start/end or range_cyclic) always. And if range_cyclic is true, ->writepages() thinks the range is cyclic, otherwise it just uses range_start and range_end. This patch does, - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h -1 is usually ok for range_end (type is long long). But, if someone did, range_end += val; range_end is "val - 1" u64val = range_end >> bits; u64val is "~(0ULL)" or something, they are wrong. So, this adds LLONG_MAX to avoid nasty things, and uses LLONG_MAX for range_end. - All callers of ->writepages() sets range_start/end or range_cyclic. - Fix updates of ->writeback_index. It seems already bit strange. If it starts at 0 and ended by check of nr_to_write, this last index may reduce chance to scan end of file. So, this updates ->writeback_index only if range_cyclic is true or whole-file is scanned. Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Nathan Scott <nathans@sgi.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Steven French <sfrench@us.ibm.com> Cc: "Vladimir V. Saveliev" <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] symbol_put_addr() locks kernelTrent Piepho2006-05-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Even since a previous patch: Fix race between CONFIG_DEBUG_SLABALLOC and modules Sun, 27 Jun 2004 17:55:19 +0000 (17:55 +0000) http://www.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=92b3db26d31cf21b70e3c1eadc56c179506d8fbe The function symbol_put_addr() will deadlock the kernel. symbol_put_addr() would acquire modlist_lock, then while holding the lock call two functions kernel_text_address() and module_text_address() which also try to acquire the same lock. This deadlocks the kernel of course. This patch changes symbol_put_addr() to not acquire the modlist_lock, it doesn't need it since it never looks at the module list directly. Also, it now uses core_kernel_text() instead of kernel_text_address(). The latter has an additional check for addr inside a module, but we don't need to do that since we call module_text_address() (the same function kernel_text_address uses) ourselves. Signed-off-by: Trent Piepho <xyzzy@speakeasy.org> Cc: Zwane Mwaikambo <zwane@fsmlabs.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Johannes Stezenbach <js@linuxtv.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] the scheduled unexport of panic_timeoutAdrian Bunk2006-04-11
| | | | | | | | Implement the scheduled unexport of panic_timeout. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Notifier chain update: API changesAlan Stern2006-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] roundup_pow_of_two() 64-bit fixAndrew Morton2006-03-25
| | | | | | | | | | | | | | | | | | | | fls() takes an integer, so roundup_pow_of_two() is busted for ulongs larger than 2^32-1. Fix this by implementing and using fls_long(). (Why does roundup_pow_of_two() return a long?) (Why is roundup_pow_of_two() __attribute_const__ whereas long_log2() is __attribute_pure__?) (Why does long_log2() suck so much? Because we were missing fls_long()?) Cc: Roland Dreier <rdreier@cisco.com> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Cc: John Hawkes <hawkes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] pause_on_oops command line optionAndrew Morton2006-03-23
| | | | | | | | | | | | | | | | | Attempt to fix the problem wherein people's oops reports scroll off the screen due to repeated oopsing or to oopses on other CPUs. If this happens the user can reboot with the `pause_on_oops=<seconds>' option. It will allow the first oopsing CPU to print an oops record just a single time. Second oopsing attempts, or oopses on other CPUs will cause those CPUs to enter a tight loop until the specified number of seconds have elapsed. The patch implements the infrastructure generically in the expectation that architectures other than x86 will find it useful. Cc: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86_64: Add boot option to disable randomized mappings and cleanupAndi Kleen2006-02-17
| | | | | | | | | | | | | AMD SimNow!'s JIT doesn't like them at all in the guest. For distribution installation it's easiest if it's a boot time option. Also I moved the variable to a more appropiate place and make it independent from sysctl And marked __read_mostly which it is. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*-. [ACPI] merge 3549 4320 4485 4588 4980 5483 5651 acpica asus fops pnpacpi ↵Len Brown2006-01-24
|\ \ | | | | | | | | | | | | | | | branches into release Signed-off-by: Len Brown <len.brown@intel.com>
| | * [ACPI] fix reboot upon suspend-to-diskAlexey Starikovskiy2005-12-15
| |/ | | | | | | | | | | | | | | http://bugzilla.kernel.org/show_bug.cgi?id=4320 Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Len Brown <len.brown@intel.com>
* | [IPV6]: Preserve procfs IPV6 address output formatYOSHIFUJI Hideaki2006-01-17
| | | | | | | | | | | | | | | | Procfs always output IPV6 addresses without the colon characters, and we cannot change that. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [NET]: Use NIP6_FMT in kernel.hJoe Perches2006-01-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are errors and inconsistency in the display of NIP6 strings. ie: net/ipv6/ip6_flowlabel.c There are errors and inconsistency in the display of NIPQUAD strings too. ie: net/netfilter/nf_conntrack_ftp.c This patch: adds NIP6_FMT to kernel.h changes all code to use NIP6_FMT fixes net/ipv6/ip6_flowlabel.c adds NIPQUAD_FMT to kernel.h fixes net/netfilter/nf_conntrack_ftp.c changes a few uses of "%u.%u.%u.%u" to NIPQUAD_FMT for symmetry to NIP6_FMT Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [PATCH] dump_thread() cleanupakpm@osdl.org2006-01-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ) From: Adrian Bunk <bunk@stusta.de> - create one common dump_thread() prototype in kernel.h - dump_thread() is only used in fs/binfmt_aout.c and can therefore be removed on all architectures where CONFIG_BINFMT_AOUT is not available Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] mutex subsystem, add typecheck_fn(type, function)Chuck Ebbert2006-01-09
| | | | | | | | | | | | | | | | | | | | | | | | add typecheck_fn(type, function) to do type-checking of function pointers. Modified-by: Ingo Molnar <mingo@elte.hu> (made it typeof() based, instead of typedef based.) Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | [PATCH] remove gcc-2 checksAndrew Morton2006-01-08
|/ | | | | | | | | | | | | Remove various things which were checking for gcc-1.x and gcc-2.x compilers. From: Adrian Bunk <bunk@stusta.de> Some documentation updates and removes some code paths for gcc < 3.2. Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] __deprecated_for_modules: panic_timeoutAdrian Bunk2005-11-07
| | | | | | | | This looks like something which out-of-tree code could possibly be using. Give panic_timeout the twelve-month treatment. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kernel-docs: fix kernel-doc format problemsRandy Dunlap2005-11-07
| | | | | | | | | | | | | | | Convert to proper kernel-doc format. Some have extra blank lines (not allowed immed. after the function name) or need blank lines (after all parameters). Function summary must be only one line. Colon (":") in a function description does weird things (causes kernel-doc to think that it's a new section head sadly). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] include/linux/kernel.h:BUILD_BUG_ON(): fix a commentNikita Danilov2005-10-30
| | | | | | | | | Fix comment describing BUILD_BUG_ON: BUG_ON is not an assertion (unfortunately). Signed-off-by: Nikita Danilov <nikita@clusterfs.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Make BUILD_BUG_ON fail at compile time.Andi Kleen2005-09-13
| | | | | | | | | | | | | | | Force a compiler error instead of a link error, because they are easier to track down. Idea stolen from code by Jan Beulich <jbeulich@novell.com> If the argument to BUILD_BUG_ON evaluates to non-zero the compiler will do: t.c:6: error: size of array `type name' is negative (surprised that gcc doesn't have an extension for this) Signed-off-by: "Andi Kleen" <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sched: voluntary kernel preemptionIngo Molnar2005-06-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new preemption model: 'Voluntary Kernel Preemption'. The 3 models can be selected from a new menu: (X) No Forced Preemption (Server) ( ) Voluntary Kernel Preemption (Desktop) ( ) Preemptible Kernel (Low-Latency Desktop) we still default to the stock (Server) preemption model. Voluntary preemption works by adding a cond_resched() (reschedule-if-needed) call to every might_sleep() check. It is lighter than CONFIG_PREEMPT - at the cost of not having as tight latencies. It represents a different latency/complexity/overhead tradeoff. It has no runtime impact at all if disabled. Here are size stats that show how the various preemption models impact the kernel's size: text data bss dec hex filename 3618774 547184 179896 4345854 424ffe vmlinux.stock 3626406 547184 179896 4353486 426dce vmlinux.voluntary +0.2% 3748414 548640 179896 4476950 445016 vmlinux.preempt +3.5% voluntary-preempt is +0.2% of .text, preempt is +3.5%. This feature has been tested for many months by lots of people (and it's also included in the RHEL4 distribution and earlier variants were in Fedora as well), and it's intended for users and distributions who dont want to use full-blown CONFIG_PREEMPT for one reason or another. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] clean up kernel messagesMatt Mackall2005-05-01
| | | | | | | | | | | | Arrange for all kernel printks to be no-ops. Only available if CONFIG_EMBEDDED. This patch saves about 375k on my laptop config and nearly 100k on minimal configs. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds2005-04-16
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!