aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/base/memory.c
Commit message (Collapse)AuthorAge
* Revert "memory-hotplug: add 0x prefix to HEX block_size_bytes"Linus Torvalds2010-04-09
| | | | | | | | | | | | | This reverts commit ba168fc37dea145deeb8fa9e7e71c748d2e00d74. It changes user-visible sysfs interfaces, and breaks some existing user space applications which apparently rely on the fact that the output does not contain the "0x" prefix. Requested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* memory hotplug: allow setting of phys_deviceHeiko Carstens2010-03-17
| | | | | | | | | | | | | | | | | | | | | | /sys/devices/system/memory/memoryX/phys_device is supposed to contain the number of the physical device that the corresponding piece of memory belongs to. In case a physical device should be replaced or taken offline for whatever reason it is necessary to set all corresponding memory pieces offline. The current implementation always sets phys_device to '0' and there is no way or hook to change that. Seems like there was a plan to implement that but it wasn't finished for whatever reason. So add a weak function which architectures can override to actually set the phys_device from within add_memory_block(). Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kobject: Constify struct kset_uevent_opsEmese Revfy2010-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | Constify struct kset_uevent_ops. This is part of the ops structure constification effort started by Arjan van de Ven et al. Benefits of this constification: * prevents modification of data that is shared (referenced) by many other structure instances at runtime * detects/prevents accidental (but not intentional) modification attempts on archs that enforce read-only kernel data at runtime * potentially better optimized code as the compiler can assume that the const data cannot be changed * the compiler/linker move const data into .rodata and therefore exclude them from false sharing Signed-off-by: Emese Revfy <re.emese@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* driver-core: Add attribute argument to class_attribute show/storeAndi Kleen2010-03-07
| | | | | | | | | | | | | | | | | | | | Passing the attribute to the low level IO functions allows all kinds of cleanups, by sharing low level IO code without requiring an own function for every piece of data. Also drivers can extend the attributes with own data fields and use that in the low level function. This makes the class attributes the same as sysdev_class attributes and plain attributes. This will allow further cleanups in drivers. Full tree sweep converting all users. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* sysdev: Fix type of sysdev class attribute in memory driverAndi Kleen2010-03-07
| | | | | | | | | | | This attribute is really a sysdev_class attribute, not a plain class attribute. They are identical in layout currently, but this might not always be the case. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Revert "sysdev: fix prototype for memory_sysdev_class show/store functions"Greg Kroah-Hartman2010-01-20
| | | | | | | | | | | | | | | This reverts commit 8ff410daa009c4b44be445ded5b0cec00abc0426 It should not have been sent to Linus's tree yet, as it depends on changes that are queued up in my driver-core for the .34 kernel merge. Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: "Zheng, Shaohui" <shaohui.zheng@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* sysdev: fix prototype for memory_sysdev_class show/store functionsWu Fengguang2010-01-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function prototype mismatches in call stack: [<ffffffff81494268>] print_block_size+0x58/0x60 [<ffffffff81487e3f>] sysdev_class_show+0x1f/0x30 [<ffffffff811d629b>] sysfs_read_file+0xcb/0x1f0 [<ffffffff81176328>] vfs_read+0xc8/0x180 Due to prototype mismatch, print_block_size() will sprintf() into *attribute instead of *buf, hence user space will read the initial zeros from *buf: $ hexdump /sys/devices/system/memory/block_size_bytes 0000000 0000 0000 0000 0000 0000008 After patch: cat /sys/devices/system/memory/block_size_bytes 0x8000000 This complements commits c29af9636 and 4a0b2b4dbe. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: "Zheng, Shaohui" <shaohui.zheng@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory-hotplug: add 0x prefix to HEX block_size_bytesWu Fengguang2010-01-16
| | | | | | | | Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: Add notifier in pageblock isolation for balloon driversRobert Jennings2009-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory balloon drivers can allocate a large amount of memory which is not movable but could be freed to accomodate memory hotplug remove. Prior to calling the memory hotplug notifier chain the memory in the pageblock is isolated. Currently, if the migrate type is not MIGRATE_MOVABLE the isolation will not proceed, causing the memory removal for that page range to fail. Rather than failing pageblock isolation if the migrateteype is not MIGRATE_MOVABLE, this patch checks if all of the pages in the pageblock, and not on the LRU, are owned by a registered balloon driver (or other entity) using a notifier chain. If all of the non-movable pages are owned by a balloon, they can be freed later through the memory notifier chain and the range can still be isolated in set_migratetype_isolate(). Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Ingo Molnar <mingo@elte.hu> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Gerald Schaefer <geralds@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* HWPOISON: Add soft page offline supportAndi Kleen2009-12-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a simpler, gentler variant of memory_failure() for soft page offlining controlled from user space. It doesn't kill anything, just tries to invalidate and if that doesn't work migrate the page away. This is useful for predictive failure analysis, where a page has a high rate of corrected errors, but hasn't gone bad yet. Instead it can be offlined early and avoided. The offlining is controlled from sysfs, including a new generic entry point for hard page offlining for symmetry too. We use the page isolate facility to prevent re-allocation race. Normally this is only used by memory hotplug. To avoid races with memory allocation I am using lock_system_sleep(). This avoids the situation where memory hotplug is about to isolate a page range and then hwpoison undoes that work. This is a big hammer currently, but the simplest solution currently. When the page is not free or LRU we try to free pages from slab and other caches. The slab freeing is currently quite dumb and does not try to focus on the specific slab cache which might own the page. This could be potentially improved later. Thanks to Fengguang Wu and Haicheng Li for some fixes. [Added fix from Andrew Morton to adapt to new migrate_pages prototype] Signed-off-by: Andi Kleen <ak@linux.intel.com>
* mm: show node to memory section relationship with symlinks in sysfsGary Hade2009-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Show node to memory section relationship with symlinks in sysfs Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example: /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change. Immediate: - Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out. - Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly. - Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes. Future: - Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node. Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system. Signed-off-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory_probe: fix wrong sysfs file attributeShaohua Li2008-10-20
| | | | | | | | | | This attribute just has a write operation. [akpm@linux-foundation.org: use S_IWUSR as suggested by Randy] Signed-off-by: Shaohua Li <shaohua.li@intel.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* more sysdev API change fallout - drivers/base/memory.cStephen Rothwell2008-07-28
| | | | | | | | | Noticed because of this warning: drivers/base/memory.c:279: warning: initialization from incompatible pointer type Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use WARN() in drivers/base/Arjan van de Ven2008-07-26
| | | | | | | | | | | Use WARN() instead of a printk+WARN_ON() pair; this way the message becomes part of the warning section for better reporting/collection. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Greg KH <greg@kroah.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memory-hotplug: add sysfs removable attribute for hotplug memory removeBadari Pulavarty2008-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory may be hot-removed on a per-memory-block basis, particularly on POWER where the SPARSEMEM section size often matches the memory-block size. A user-level agent must be able to identify which sections of memory are likely to be removable before attempting the potentially expensive operation. This patch adds a file called "removable" to the memory directory in sysfs to help such an agent. In this patch, a memory block is considered removable if; o It contains only MOVABLE pageblocks o It contains only pageblocks with free pages regardless of pageblock type On the other hand, a memory block starting with a PageReserved() page will never be considered removable. Without this patch, the user-agent is forced to choose a memory block to remove randomly. Sample output of the sysfs files: ./memory/memory0/removable: 0 ./memory/memory1/removable: 0 ./memory/memory2/removable: 0 ./memory/memory3/removable: 0 ./memory/memory4/removable: 0 ./memory/memory5/removable: 0 ./memory/memory6/removable: 0 ./memory/memory7/removable: 1 ./memory/memory8/removable: 0 ./memory/memory9/removable: 0 ./memory/memory10/removable: 0 ./memory/memory11/removable: 0 ./memory/memory12/removable: 0 ./memory/memory13/removable: 0 ./memory/memory14/removable: 0 ./memory/memory15/removable: 0 ./memory/memory16/removable: 0 ./memory/memory17/removable: 1 ./memory/memory18/removable: 1 ./memory/memory19/removable: 1 ./memory/memory20/removable: 1 ./memory/memory21/removable: 1 ./memory/memory22/removable: 1 Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sysdev: Pass the attribute to the low level sysdev show/store functionAndi Kleen2008-07-22
| | | | | | | | | | | | | | | | | | | | | | This allow to dynamically generate attributes and share show/store functions between attributes. Right now most attributes are generated by special macros and lots of duplicated code. With the attribute passed it's instead possible to attach some data to the attribute and then use that in shared low level functions to do different things. I need this for the dynamically generated bank attributes in the x86 machine check code, but it'll allow some further cleanups. I converted all users in tree to the new show/store prototype. It's a single huge patch to avoid unbisectable sections. Runtime tested: x86-32, x86-64 Compiled only: ia64, powerpc Not compile tested/only grep converted: sh, arm, avr32 Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* memory: Introduce exports for memory notifiersHannes Hering2008-05-13
| | | | | | | This patch introduces two exports to allow modules to use memory notifiers. Signed-off-by: Hannes Hering <hering2@de.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* driver core: replace remaining __FUNCTION__ occurrencesHarvey Harrison2008-04-19
| | | | | | | | __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* driver core: register_memory/unregister_memory clean ups and bugfixBadari Pulavarty2008-04-19
| | | | | | | | | | | | | | | register_memory()/unregister_memory() never gets called with "root". unregister_memory() is accessing kobject_name of the object just freed up. Since no one uses the code, lets take the code out. And also, make register_memory() static. Another bug fix - before calling unregister_memory() remove_memory_block() gets a ref on kobject. unregister_memory() need to drop that ref before calling sysdev_unregister(). Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* driver core: memory: semaphore to mutexDaniel Walker2008-04-19
| | | | | | | Signed-off-by: Daniel Walker <dwalker@mvista.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Driver core: change sysdev classes to use dynamic kobject namesKay Sievers2008-01-24
| | | | | | | | | | All kobjects require a dynamically allocated name now. We no longer need to keep track if the name is statically assigned, we can just unconditionally free() all kobject names on cleanup. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* memory hotplug: rearrange memory hotplug notifierYasunori Goto2007-10-22
| | | | | | | | | | | | | | | | | | | | | | Current memory notifier has some defects yet. (Fortunately, nothing uses it.) This patch is to fix and rearrange for them. - Add information of start_pfn, nr_pages, and node id if node status is changes from/to memoryless node for callback functions. Callbacks can't do anything without those information. - Add notification going-online status. It is necessary for creating per node structure before the node's pages are available. - Move GOING_OFFLINE status notification after page isolation. It is good place for return memory like cache for callback, because returned page is not used again. - Make CANCEL events for rollingback when error occurs. - Delete MEM_MAPPING_INVALID notification. It will be not used. - Fix compile error of (un)register_memory_notifier(). Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sparsemem: record when a section has a valid mem_mapAndy Whitcroft2007-10-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | We have flags to indicate whether a section actually has a valid mem_map associated with it. This is never set and we rely solely on the present bit to indicate a section is valid. By definition a section is not valid if it has no mem_map and there is a window during init where the present bit is set but there is no mem_map, during which pfn_valid() will return true incorrectly. Use the existing SECTION_HAS_MEM_MAP flag to indicate the presence of a valid mem_map. Switch valid_section{,_nr} and pfn_valid() to this bit. Add a new present_section{,_nr} and pfn_present() interfaces for those users who care to know that a section is going to be valid. [akpm@linux-foundation.org: coding-syle fixes] Signed-off-by: Andy Whitcroft <apw@shadowen.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* more uevent fallout (drivers/base/memory.c)Al Viro2007-10-14
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Driver core: change add_uevent_var to use a structKay Sievers2007-10-12
| | | | | | | | | | | | | | | | | | This changes the uevent buffer functions to use a struct instead of a long list of parameters. It does no longer require the caller to do the proper buffer termination and size accounting, which is currently wrong in some places. It fixes a known bug where parts of the uevent environment are overwritten because of wrong index calculations. Many thanks to Mathieu Desnoyers for finding bugs and improving the error handling. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* [PATCH] driver/base/memory.c: handle errors properlyAndrew Morton2006-12-07
| | | | | | | | | Do proper error-checking and propagation in drivers/base/memory.c, hence fix __must_check warnings. Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] pgdat allocation for new node add (specify node id)Yasunori Goto2006-06-27
| | | | | | | | | | | | | | | | Change the name of old add_memory() to arch_add_memory. And use node id to get pgdat for the node at NODE_DATA(). Note: Powerpc's old add_memory() is defined as __devinit. However, add_memory() is usually called only after bootup. I suppose it may be redundant. But, I'm not well known about powerpc. So, I keep it. (But, __meminit is better at least.) Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: "Brown, Len" <len.brown@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Notifier chain update: API changesAlan Stern2006-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fix __user annotations in drivers/base/memory.cAl Viro2006-02-07
| | | | | | sysfs store doesn't deal with userland pointers Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] move capable() to capability.hRandy.Dunlap2006-01-11
| | | | | | | | | | | | | - Move capable() from sched.h to capability.h; - Use <linux/capability.h> where capable() is used (in include/, block/, ipc/, kernel/, a few drivers/, mm/, security/, & sound/; many more drivers/ to go) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] memhotplug: register_memory should be globalAndy Whitcroft2006-01-06
| | | | | | | | | | register_memory is global and declared so in linux/memory.h. Update the HOTPLUG specific definition to match. This fixes a compile warning when HOTPLUG is enabled. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] memhotplug: register_ and unregister_memory_notifier should be globalAndy Whitcroft2006-01-06
| | | | | | | | | | Both register_memory_notifer and unregister_memory_notifier are global and declared so in linux/memory.h. Update the HOTPLUG specific definitions to match. This fixes a compile warning when HOTPLUG is enabled. Signed-off-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] driver core: replace "hotplug" by "uevent"Kay Sievers2006-01-04
| | | | | | | | | Leave the overloaded "hotplug" word to susbsystems which are handling real devices. The driver core does not "plug" anything, it just exports the state to userspace and generates events. Signed-off-by: Kay Sievers <kay.sievers@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* [PATCH] drivers/base/memory.c: unexport the static (sic) memory_sysdev_classAdrian Bunk2005-12-15
| | | | | | | | We can't export a static struct to modules. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] memory hotplug: move section_mem_map alloc to sparse.cDave Hansen2005-10-30
| | | | | | | | | | | | | | | This basically keeps up from having to extern __kmalloc_section_memmap(). The vaddr_in_vmalloc_area() helper could go in a vmalloc header, but that header gets hard to work with, because it needs some arch-specific macros. Just stick it in here for now, instead of creating another header. Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Lion Vollnhals <webmaster@schiggl.de> Signed-off-by: Jiri Slaby <xslaby@fi.muni.cz> Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] memory hotplug: sysfs and add/remove functionsDave Hansen2005-10-30
This adds generic memory add/remove and supporting functions for memory hotplug into a new file as well as a memory hotplug kernel config option. Individual architecture patches will follow. For now, disable memory hotplug when swsusp is enabled. There's a lot of churn there right now. We'll fix it up properly once it calms down. Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: Dave Hansen <haveblue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>