aboutsummaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAge
* mem-hotplug: fix potential race while building zonelist for new populated zoneHaicheng Li2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add global mutex zonelists_mutex to fix the possible race: CPU0 CPU1 CPU2 (1) zone->present_pages += online_pages; (2) build_all_zonelists(); (3) alloc_page(); (4) free_page(); (5) build_all_zonelists(); (6) __build_all_zonelists(); (7) zone->pageset = alloc_percpu(); In step (3,4), zone->pageset still points to boot_pageset, so bad things may happen if 2+ nodes are in this state. Even if only 1 node is accessing the boot_pageset, (3) may still consume too much memory to fail the memory allocations in step (7). Besides, atomic operation ensures alloc_percpu() in step (7) will never fail since there is a new fresh memory block added in step(6). [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists] Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <andi.kleen@intel.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mem-hotplug: avoid multiple zones sharing same boot strapping boot_pagesetHaicheng Li2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For each new populated zone of hotadded node, need to update its pagesets with dynamically allocated per_cpu_pageset struct for all possible CPUs: 1) Detach zone->pageset from the shared boot_pageset at end of __build_all_zonelists(). 2) Use mutex to protect zone->pageset when it's still shared in onlined_pages() Otherwises, multiple zones of different nodes would share same boot strapping boot_pageset for same CPU, which will finally cause below kernel panic: ------------[ cut here ]------------ kernel BUG at mm/page_alloc.c:1239! invalid opcode: 0000 [#1] SMP ... Call Trace: [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0 [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0 [<ffffffff81128407>] __page_cache_alloc+0x67/0x70 [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260 [<ffffffff81132751>] ra_submit+0x21/0x30 [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0 [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0 [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670 [<ffffffff81266cfa>] nfs_file_read+0xca/0x130 [<ffffffff8117b20a>] do_sync_read+0xfa/0x140 [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0 [<ffffffff8117c151>] sys_read+0x51/0x80 [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b RIP [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900 RSP <ffff88000d1e78a8> ---[ end trace 4bda28328b9990db ] [akpm@linux-foundation.org: merge fix] Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Andi Kleen <andi.kleen@intel.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: fix NR_SECTION_ROOTS == 0 when using using sparsemem extreme.Marcelo Roberto Jimenez2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Got this while compiling for ARM/SA1100: mm/sparse.c: In function '__section_nr': mm/sparse.c:135: warning: 'root' is used uninitialized in this function This patch follows Russell King's suggestion for a new calculation for NR_SECTION_ROOTS. Thanks also to Sergei Shtylyov for pointing out the existence of the macro DIV_ROUND_UP. Atsushi Nemoto observed: : This fix doesn't just silence the warning - it fixes a real problem. : : Without this fix, mem_section[] might have 0 size so mem_section[0] : will share other variable area. For example, I got: : : c030c700 b __warned.16478 : c030c700 B mem_section : c030c701 b __warned.16483 : : This might cause very strange behavior. Your patch actually fixes it. Signed-off-by: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br> Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* highmem: remove unneeded #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT for ↵Akinobu Mita2010-05-25
| | | | | | | | | | | | | | | | | | | | debug_kmap_atomic() In f4112de6b679d84bd9b9681c7504be7bdfb7c7d5 ("mm: introduce debug_kmap_atomic") I said that debug_kmap_atomic() needs CONFIG_TRACE_IRQFLAGS_SUPPORT. It was wrong. (I thought irqs_disabled() is only available when the architecture has CONFIG_TRACE_IRQFLAGS_SUPPORT) Remove the #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT check to enable kmap_atomic() debugging for the architectures which do not have CONFIG_TRACE_IRQFLAGS_SUPPORT. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* include/linux/gfp.h: fix coding stylematt mooney2010-05-25
| | | | | | | | | | | | | Add parenthesis in a define. This doesn't change functionality. checkpatch errors: 1) white space fixes 2) add spaces after comas Signed-off-by: matt mooney <mfm@muteddisk.com> Cc: Dan Carpenter <error27@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* include/linux/gfp.h: spelling fixesmatt mooney2010-05-25
| | | | | | | | | Fix minor spelling errors in a few comments; no code changes. Signed-off-by: matt mooney <mfm@muteddisk.com> Cc: Dan Carpenter <error27@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpu/mem hotplug: enable CPUs online before local memory onlineminskey guo2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | Enable users to online CPUs even if the CPUs belongs to a numa node which doesn't have onlined local memory. The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either in system boot/init period, or at the time of local memory online. For a numa node without onlined local memory, its zonelists are not initialized at present. As a result, any memory allocation operations executed by CPUs within this node will fail. In fact, an out-of-memory error is triggered when attempt to online CPUs before memory comes to online. This patch tries to create zonelists for such numa nodes, so that the memory allocation for this node can be fallback'ed to other nodes. [akpm@linux-foundation.org: remove unneeded export] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: minskey guo<chaohong.guo@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* vmscan: remove isolate_pages callback scan controlJohannes Weiner2010-05-25
| | | | | | | | | | | | | | | | | | | | For now, we have global isolation vs. memory control group isolation, do not allow the reclaim entry function to set an arbitrary page isolation callback, we do not need that flexibility. And since we already pass around the group descriptor for the memory control group isolation case, just use it to decide which one of the two isolator functions to use. The decisions can be merged into nearby branches, so no extra cost there. In fact, we save the indirect calls. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: compaction: defer compaction using an exponential backoff when ↵Mel Gorman2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | compaction fails The fragmentation index may indicate that a failure is due to external fragmentation but after a compaction run completes, it is still possible for an allocation to fail. There are two obvious reasons as to why o Page migration cannot move all pages so fragmentation remains o A suitable page may exist but watermarks are not met In the event of compaction followed by an allocation failure, this patch defers further compaction in the zone (1 << compact_defer_shift) times. If the next compaction attempt also fails, compact_defer_shift is increased up to a maximum of 6. If compaction succeeds, the defer counters are reset again. The zone that is deferred is the first zone in the zonelist - i.e. the preferred zone. To defer compaction in the other zones, the information would need to be stored in the zonelist or implemented similar to the zonelist_cache. This would impact the fast-paths and is not justified at this time. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: compaction: add a tunable that decides when memory should be compacted ↵Mel Gorman2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | and when it should be reclaimed The kernel applies some heuristics when deciding if memory should be compacted or reclaimed to satisfy a high-order allocation. One of these is based on the fragmentation. If the index is below 500, memory will not be compacted. This choice is arbitrary and not based on data. To help optimise the system and set a sensible default for this value, this patch adds a sysctl extfrag_threshold. The kernel will only compact memory if the fragmentation index is above the extfrag_threshold. [randy.dunlap@oracle.com: Fix build errors when proc fs is not configured] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: compaction: direct compact when a high-order allocation failsMel Gorman2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | Ordinarily when a high-order allocation fails, direct reclaim is entered to free pages to satisfy the allocation. With this patch, it is determined if an allocation failed due to external fragmentation instead of low memory and if so, the calling process will compact until a suitable page is freed. Compaction by moving pages in memory is considerably cheaper than paging out to disk and works where there are locked pages or no swap. If compaction fails to free a page of a suitable size, then reclaim will still occur. Direct compaction returns as soon as possible. As each block is compacted, it is checked if a suitable page has been freed and if so, it returns. [akpm@linux-foundation.org: Fix build errors] [aarcange@redhat.com: fix count_vm_event preempt in memory compaction direct reclaim] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: compaction: add /sys trigger for per-node memory compactionMel Gorman2010-05-25
| | | | | | | | | | | | | | | | | Add a per-node sysfs file called compact. When the file is written to, each zone in that node is compacted. The intention that this would be used by something like a job scheduler in a batch system before a job starts so that the job can allocate the maximum number of hugepages without significant start-up cost. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: compaction: add /proc trigger for memory compactionMel Gorman2010-05-25
| | | | | | | | | | | | | | | | Add a proc file /proc/sys/vm/compact_memory. When an arbitrary value is written to the file, all zones are compacted. The expected user of such a trigger is a job scheduler that prepares the system before the target application runs. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: compaction: memory compaction coreMel Gorman2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | This patch is the core of a mechanism which compacts memory in a zone by relocating movable pages towards the end of the zone. A single compaction run involves a migration scanner and a free scanner. Both scanners operate on pageblock-sized areas in the zone. The migration scanner starts at the bottom of the zone and searches for all movable pages within each area, isolating them onto a private list called migratelist. The free scanner starts at the top of the zone and searches for suitable areas and consumes the free pages within making them available for the migration scanner. The pages isolated for migration are then migrated to the newly isolated free pages. [aarcange@redhat.com: Fix unsafe optimisation] [mel@csn.ul.ie: do not schedule work on other CPUs for compaction] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: move definition for LRU isolation modes to a headerMel Gorman2010-05-25
| | | | | | | | | | | | | | | Currently, vmscan.c defines the isolation modes for __isolate_lru_page(). Memory compaction needs access to these modes for isolating pages for migration. This patch exports them. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <cl@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: migration: avoid race between shift_arg_pages() and rmap_walk() during ↵Mel Gorman2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | migration by not migrating temporary stacks Page migration requires rmap to be able to find all ptes mapping a page at all times, otherwise the migration entry can be instantiated, but it is possible to leave one behind if the second rmap_walk fails to find the page. If this page is later faulted, migration_entry_to_page() will call BUG because the page is locked indicating the page was migrated by the migration PTE not cleaned up. For example kernel BUG at include/linux/swapops.h:105! invalid opcode: 0000 [#1] PREEMPT SMP ... Call Trace: [<ffffffff810e951a>] handle_mm_fault+0x3f8/0x76a [<ffffffff8130c7a2>] do_page_fault+0x44a/0x46e [<ffffffff813099b5>] page_fault+0x25/0x30 [<ffffffff8114de33>] load_elf_binary+0x152a/0x192b [<ffffffff8111329b>] search_binary_handler+0x173/0x313 [<ffffffff81114896>] do_execve+0x219/0x30a [<ffffffff8100a5c6>] sys_execve+0x43/0x5e [<ffffffff8100320a>] stub_execve+0x6a/0xc0 RIP [<ffffffff811094ff>] migration_entry_wait+0xc1/0x129 There is a race between shift_arg_pages and migration that triggers this bug. A temporary stack is setup during exec and later moved. If migration moves a page in the temporary stack and the VMA is then removed before migration completes, the migration PTE may not be found leading to a BUG when the stack is faulted. This patch causes pages within the temporary stack during exec to be skipped by migration. It does this by marking the VMA covering the temporary stack with an otherwise impossible combination of VMA flags. These flags are cleared when the temporary stack is moved to its final location. [kamezawa.hiroyu@jp.fujitsu.com: idea for having migration skip temporary stacks] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Rik van Riel <riel@redhat.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: migration: share the anon_vma ref counts between KSM and page migrationMel Gorman2010-05-25
| | | | | | | | | | | | | | | For clarity of review, KSM and page migration have separate refcounts on the anon_vma. While clear, this is a waste of memory. This patch gets KSM and page migration to share their toys in a spirit of harmony. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Christoph Lameter <cl@linux-foundation.org> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: migration: take a reference to the anon_vma before migratingMel Gorman2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patchset is a memory compaction mechanism that reduces external fragmentation memory by moving GFP_MOVABLE pages to a fewer number of pageblocks. The term "compaction" was chosen as there are is a number of mechanisms that are not mutually exclusive that can be used to defragment memory. For example, lumpy reclaim is a form of defragmentation as was slub "defragmentation" (really a form of targeted reclaim). Hence, this is called "compaction" to distinguish it from other forms of defragmentation. In this implementation, a full compaction run involves two scanners operating within a zone - a migration and a free scanner. The migration scanner starts at the beginning of a zone and finds all movable pages within one pageblock_nr_pages-sized area and isolates them on a migratepages list. The free scanner begins at the end of the zone and searches on a per-area basis for enough free pages to migrate all the pages on the migratepages list. As each area is respectively migrated or exhausted of free pages, the scanners are advanced one area. A compaction run completes within a zone when the two scanners meet. This method is a bit primitive but is easy to understand and greater sophistication would require maintenance of counters on a per-pageblock basis. This would have a big impact on allocator fast-paths to improve compaction which is a poor trade-off. It also does not try relocate virtually contiguous pages to be physically contiguous. However, assuming transparent hugepages were in use, a hypothetical khugepaged might reuse compaction code to isolate free pages, split them and relocate userspace pages for promotion. Memory compaction can be triggered in one of three ways. It may be triggered explicitly by writing any value to /proc/sys/vm/compact_memory and compacting all of memory. It can be triggered on a per-node basis by writing any value to /sys/devices/system/node/nodeN/compact where N is the node ID to be compacted. When a process fails to allocate a high-order page, it may compact memory in an attempt to satisfy the allocation instead of entering direct reclaim. Explicit compaction does not finish until the two scanners meet and direct compaction ends if a suitable page becomes available that would meet watermarks. The series is in 14 patches. The first three are not "core" to the series but are important pre-requisites. Patch 1 reference counts anon_vma for rmap_walk_anon(). Without this patch, it's possible to use anon_vma after free if the caller is not holding a VMA or mmap_sem for the pages in question. While there should be no existing user that causes this problem, it's a requirement for memory compaction to be stable. The patch is at the start of the series for bisection reasons. Patch 2 merges the KSM and migrate counts. It could be merged with patch 1 but would be slightly harder to review. Patch 3 skips over unmapped anon pages during migration as there are no guarantees about the anon_vma existing. There is a window between when a page was isolated and migration started during which anon_vma could disappear. Patch 4 notes that PageSwapCache pages can still be migrated even if they are unmapped. Patch 5 allows CONFIG_MIGRATION to be set without CONFIG_NUMA Patch 6 exports a "unusable free space index" via debugfs. It's a measure of external fragmentation that takes the size of the allocation request into account. It can also be calculated from userspace so can be dropped if requested Patch 7 exports a "fragmentation index" which only has meaning when an allocation request fails. It determines if an allocation failure would be due to a lack of memory or external fragmentation. Patch 8 moves the definition for LRU isolation modes for use by compaction Patch 9 is the compaction mechanism although it's unreachable at this point Patch 10 adds a means of compacting all of memory with a proc trgger Patch 11 adds a means of compacting a specific node with a sysfs trigger Patch 12 adds "direct compaction" before "direct reclaim" if it is determined there is a good chance of success. Patch 13 adds a sysctl that allows tuning of the threshold at which the kernel will compact or direct reclaim Patch 14 temporarily disables compaction if an allocation failure occurs after compaction. Testing of compaction was in three stages. For the test, debugging, preempt, the sleep watchdog and lockdep were all enabled but nothing nasty popped out. min_free_kbytes was tuned as recommended by hugeadm to help fragmentation avoidance and high-order allocations. It was tested on X86, X86-64 and PPC64. Ths first test represents one of the easiest cases that can be faced for lumpy reclaim or memory compaction. 1. Machine freshly booted and configured for hugepage usage with a) hugeadm --create-global-mounts b) hugeadm --pool-pages-max DEFAULT:8G c) hugeadm --set-recommended-min_free_kbytes d) hugeadm --set-recommended-shmmax The min_free_kbytes here is important. Anti-fragmentation works best when pageblocks don't mix. hugeadm knows how to calculate a value that will significantly reduce the worst of external-fragmentation-related events as reported by the mm_page_alloc_extfrag tracepoint. 2. Load up memory a) Start updatedb b) Create in parallel a X files of pagesize*128 in size. Wait until files are created. By parallel, I mean that 4096 instances of dd were launched, one after the other using &. The crude objective being to mix filesystem metadata allocations with the buffer cache. c) Delete every second file so that pageblocks are likely to have holes d) kill updatedb if it's still running At this point, the system is quiet, memory is full but it's full with clean filesystem metadata and clean buffer cache that is unmapped. This is readily migrated or discarded so you'd expect lumpy reclaim to have no significant advantage over compaction but this is at the POC stage. 3. In increments, attempt to allocate 5% of memory as hugepages. Measure how long it took, how successful it was, how many direct reclaims took place and how how many compactions. Note the compaction figures might not fully add up as compactions can take place for orders other than the hugepage size X86 vanilla compaction Final page count 913 916 (attempted 1002) pages reclaimed 68296 9791 X86-64 vanilla compaction Final page count: 901 902 (attempted 1002) Total pages reclaimed: 112599 53234 PPC64 vanilla compaction Final page count: 93 94 (attempted 110) Total pages reclaimed: 103216 61838 There was not a dramatic improvement in success rates but it wouldn't be expected in this case either. What was important is that fewer pages were reclaimed in all cases reducing the amount of IO required to satisfy a huge page allocation. The second tests were all performance related - kernbench, netperf, iozone and sysbench. None showed anything too remarkable. The last test was a high-order allocation stress test. Many kernel compiles are started to fill memory with a pressured mix of unmovable and movable allocations. During this, an attempt is made to allocate 90% of memory as huge pages - one at a time with small delays between attempts to avoid flooding the IO queue. vanilla compaction Percentage of request allocated X86 98 99 Percentage of request allocated X86-64 95 98 Percentage of request allocated PPC64 55 70 This patch: rmap_walk_anon() does not use page_lock_anon_vma() for looking up and locking an anon_vma and it does not appear to have sufficient locking to ensure the anon_vma does not disappear from under it. This patch copies an approach used by KSM to take a reference on the anon_vma while pages are being migrated. This should prevent rmap_walk() running into nasty surprises later because anon_vma has been freed. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpuset,mm: fix no node to alloc memory when changing cpuset's memsMiao Xie2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before applying this patch, cpuset updates task->mems_allowed and mempolicy by setting all new bits in the nodemask first, and clearing all old unallowed bits later. But in the way, the allocator may find that there is no node to alloc memory. The reason is that cpuset rebinds the task's mempolicy, it cleans the nodes which the allocater can alloc pages on, for example: (mpol: mempolicy) task1 task1's mpol task2 alloc page 1 alloc on node0? NO 1 1 change mems from 1 to 0 1 rebind task1's mpol 0-1 set new bits 0 clear disallowed bits alloc on node1? NO 0 ... can't alloc page goto oom This patch fixes this problem by expanding the nodes range first(set newly allowed bits) and shrink it lazily(clear newly disallowed bits). So we use a variable to tell the write-side task that read-side task is reading nodemask, and the write-side task clears newly disallowed nodes after read-side task ends the current memory allocation. [akpm@linux-foundation.org: fix spello] Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Paul Menage <menage@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mempolicy: restructure rebinding-mempolicy functionsMiao Xie2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nick Piggin reported that the allocator may see an empty nodemask when changing cpuset's mems[1]. It happens only on the kernel that do not do atomic nodemask_t stores. (MAX_NUMNODES > BITS_PER_LONG) But I found that there is also a problem on the kernel that can do atomic nodemask_t stores. The problem is that the allocator can't find a node to alloc page when changing cpuset's mems though there is a lot of free memory. The reason is like this: (mpol: mempolicy) task1 task1's mpol task2 alloc page 1 alloc on node0? NO 1 1 change mems from 1 to 0 1 rebind task1's mpol 0-1 set new bits 0 clear disallowed bits alloc on node1? NO 0 ... can't alloc page goto oom I can use the attached program reproduce it by the following step: # mkdir /dev/cpuset # mount -t cpuset cpuset /dev/cpuset # mkdir /dev/cpuset/1 # echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus # echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems # echo $$ > /dev/cpuset/1/tasks # numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> & <nr_tasks> = max(nr_cpus - 1, 1) # killall -s SIGUSR1 cpuset_mem_hog # ./change_mems.sh several hours later, oom will happen though there is a lot of free memory. This patchset fixes this problem by expanding the nodes range first(set newly allowed bits) and shrink it lazily(clear newly disallowed bits). So we use a variable to tell the write-side task that read-side task is reading nodemask, and the write-side task clears newly disallowed nodes after read-side task ends the current memory allocation. This patch: In order to fix no node to alloc memory, when we want to update mempolicy and mems_allowed, we expand the set of nodes first (set all the newly nodes) and shrink the set of nodes lazily(clean disallowed nodes), But the mempolicy's rebind functions may breaks the expanding. So we restructure the mempolicy's rebind functions and split the rebind work to two steps, just like the update of cpuset's mems: The 1st step: expand the set of the mempolicy's nodes. The 2nd step: shrink the set of the mempolicy's nodes. It is used when there is no real lock to protect the mempolicy in the read-side. Otherwise we can do rebind work at once. In order to implement it, we define enum mpol_rebind_step { MPOL_REBIND_ONCE, MPOL_REBIND_STEP1, MPOL_REBIND_STEP2, MPOL_REBIND_NSTEP, }; If the mempolicy needn't be updated by two steps, we can pass MPOL_REBIND_ONCE to the rebind functions. Or we can pass MPOL_REBIND_STEP1 to do the first step of the rebind work and pass MPOL_REBIND_STEP2 to do the second step work. Besides that, it maybe long time between these two step and we have to release the lock that protects mempolicy and mems_allowed. If we hold the lock once again, we must check whether the current mempolicy is under the rebinding (the first step has been done) or not, because the task may alloc a new mempolicy when we don't hold the lock. So we defined the following flag to identify it: #define MPOL_F_REBINDING (1 << 2) The new functions will be used in the next patch. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Paul Menage <menage@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Ravikiran Thirumalai <kiran@scalex86.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove return value of putback_lru_pages()Minchan Kim2010-05-25
| | | | | | | | | | | | | | | | putback_lru_page() never can fail. So it doesn't matter count of "the number of pages put back". In addition, users of this functions don't use return value. Let's remove unnecessary code. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* tmpfs: insert tmpfs cache pages to inactive list at firstKOSAKI Motohiro2010-05-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Shaohua Li reported parallel file copy on tmpfs can lead to OOM killer. This is regression of caused by commit 9ff473b9a7 ("vmscan: evict streaming IO first"). Wow, It is 2 years old patch! Currently, tmpfs file cache is inserted active list at first. This means that the insertion doesn't only increase numbers of pages in anon LRU, but it also reduces anon scanning ratio. Therefore, vmscan will get totally confused. It scans almost only file LRU even though the system has plenty unused tmpfs pages. Historically, lru_cache_add_active_anon() was used for two reasons. 1) Intend to priotize shmem page rather than regular file cache. 2) Intend to avoid reclaim priority inversion of used once pages. But we've lost both motivation because (1) Now we have separate anon and file LRU list. then, to insert active list doesn't help such priotize. (2) In past, one pte access bit will cause page activation. then to insert inactive list with pte access bit mean higher priority than to insert active list. Its priority inversion may lead to uninteded lru chun. but it was already solved by commit 645747462 (vmscan: detect mapped file pages used only once). (Thanks Hannes, you are great!) Thus, now we can use lru_cache_add_anon() instead. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reported-by: Shaohua Li <shaohua.li@intel.com> Reviewed-by: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Rik van Riel <riel@redhat.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6Linus Torvalds2010-05-24
|\ | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6: cmd640: fix kernel oops in test_irq() method pdc202xx_old: ignore "FIFO empty" bit in test_irq() method pdc202xx_old: wire test_irq() method for PDC2026x IDE: pass IRQ flags to the IDE core ide: fix comment typo in ide.h
| * ide: fix comment typo in ide.hSergei Shtylyov2010-04-15
| | | | | | | | | | | | | | | | | | | | | | | | Fix typo in the comment to the 'dma_mode' field of the 'struct ide_drive_s' introduced by the commit 3fccaa192b9501e79a57e02e62b6bf420d2b461e (ide: add drive->dma_mode field). Whilt at it, convert spaces to a tab in the declaration of the neighbouring 'dn' field... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'bkl/ioctl' of ↵Linus Torvalds2010-05-24
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing * 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing: uml: Pushdown the bkl from harddog_kern ioctl sunrpc: Pushdown the bkl from sunrpc cache ioctl sunrpc: Pushdown the bkl from ioctl autofs4: Pushdown the bkl from ioctl uml: Convert to unlocked_ioctls to remove implicit BKL ncpfs: BKL ioctl pushdown coda: Clean-up whitespace problems in pioctl.c coda: BKL ioctl pushdown drivers: Push down BKL into various drivers isdn: Push down BKL into ioctl functions scsi: Push down BKL into ioctl functions dvb: Push down BKL into ioctl functions smbfs: Push down BKL into ioctl function coda/psdev: Remove BKL from ioctl function um/mmapper: Remove BKL usage sn_hwperf: Kill BKL usage hfsplus: Push down BKL into ioctl function
| * | ncpfs: BKL ioctl pushdownJohn Kacur2010-05-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert ncp_ioctl to an unlocked_ioctl and push down the bkl into it. Signed-off-by: John Kacur <jkacur@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Petr Vandrovec <vandrove@vc.cvut.cz> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* | | Merge git://git.infradead.org/battery-2.6Linus Torvalds2010-05-24
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.infradead.org/battery-2.6: ds2760_battery: Document ABI change ds2760_battery: Make charge_now and charge_full writeable power_supply: Add support for writeable properties power_supply: Use attribute groups power_supply: Add test_power driver tosa_battery: Fix build error due to direct driver_data usage wm97xx_battery: Quieten sparse warning (bat_set_pdata not declared) ds2782_battery: Get rid of magic numbers in driver_data ds2782_battery: Add support for ds2786 battery gas gauge pda_power: Add function callbacks for suspend and resume wm831x_power: Use genirq Driver for Zipit Z2 battery chip ds2782_battery: Fix clientdata on removal
| * | | power_supply: Add support for writeable propertiesDaniel Mack2010-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for writeable power supply properties and exposes them as writeable to sysfs. A power supply implementation must implement two new function calls in order to use that feature: int set_property(struct power_supply *psy, enum power_supply_property psp, const union power_supply_propval *val); int property_is_writeable(struct power_supply *psy, enum power_supply_property psp); Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Alexey Starikovskiy <astarikovskiy@suse.de> Cc: Len Brown <len.brown@intel.com> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Cc: Matt Reimer <mreimer@vpop.net> Cc: Evgeniy Polyakov <zbr@ioremap.net> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
| * | | power_supply: Use attribute groupsAnton Vorontsov2010-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a race between power supply device and initial attributes creation, plus makes it possible to implement writable properties. [Daniel Mack - removed superflous return statement and dropped .mode attribute from POWER_SUPPLY_ATTR] Suggested-by: Greg KH <gregkh@suse.de> Suggested-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com> Tested-by: Daniel Mack <daniel@caiaq.de>
| * | | ds2782_battery: Add support for ds2786 battery gas gaugeYulia Vilensky2010-04-26
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Yulia Vilensky <vilensky@compulab.co.il> Signed-off-by: Mike Rapoport <mike@compulab.co.il> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
| * | | Merge branch 'master' of ↵Anton Vorontsov2010-04-26
| |\ \ \ | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
| * | | | pda_power: Add function callbacks for suspend and resumeDaniel Mack2010-04-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add function prototypes for power management events so they can be handled and used by platform implementations. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Dmitry Baryshkov <dbaryshkov@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
| * | | | Driver for Zipit Z2 battery chipMarek Vasut2010-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds driver for Zipit Z2 battery chip called AER915. No details are known about the chip. The chip is available through I2C bus at address 0x55 and it's register 0x02 contains battery voltage. Signed-off-by: Marek Vasut <marek.vasut@gmail.com> Signed-off-by: Anton Vorontsov <cbouatmailru@gmail.com>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6Linus Torvalds2010-05-24
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (25 commits) sh: fix up sh7785lcr_32bit_defconfig. arch/sh/lib/strlen.S: Checkpatch cleanup sh: fix up sh7786 dmaengine build. sh: guard cookie consistency across termination in the DMA driver sh: prevent the DMA driver from unloading, while in use sh: fix Oops in the serial SCI driver sh: allow platforms to specify SD-card supported voltages mmc: let MFD's provide supported Vdd card voltages to tmio_mmc sh: disable SD-card write-protection detection on kfr2r09 mfd: pass platform flags down to the tmio_mmc driver tmio: add a platform flag to disable card write-protection detection sh: Add SDHI DMA support to migor sh: Add SDHI DMA support to kfr2r09 sh: Add SDHI DMA support to ms7724se sh: Add SDHI DMA support to ecovec mmc: add DMA support to tmio_mmc driver, when used on SuperH sh: prepare the SDHI MFD driver to pass DMA configuration to tmio_mmc.c mmc: prepare tmio_mmc for passing of DMA configuration from the MFD cell sh: add DMA slave definitions to sh7724 sh: add DMA slaves for two SDHI controllers to sh7722 ...
| * | | | | sh: allow platforms to specify SD-card supported voltagesGuennadi Liakhovetski2010-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Boards can have different supplied voltages on different SD card slots. This information has to be passed down to the SD/MMC driver. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Ian Molton <ian@mnementh.co.uk> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | | mmc: let MFD's provide supported Vdd card voltages to tmio_mmcGuennadi Liakhovetski2010-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Ian Molton <ian@mnementh.co.uk> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | | mfd: pass platform flags down to the tmio_mmc driverGuennadi Liakhovetski2010-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Ian Molton <ian@mnementh.co.uk> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | | tmio: add a platform flag to disable card write-protection detectionGuennadi Liakhovetski2010-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Write-protection status is not always available, e.g., micro-SD cards do not have a write-protection switch at all. This patch adds a flag to let platforms force tmio_mmc to consider the card writable. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Ian Molton <ian@mnementh.co.uk> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | | sh: prepare the SDHI MFD driver to pass DMA configuration to tmio_mmc.cGuennadi Liakhovetski2010-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass DMA slave IDs from platform down to the tmio_mmc driver, to be used for dmaengine configuration. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | | mmc: prepare tmio_mmc for passing of DMA configuration from the MFD cellGuennadi Liakhovetski2010-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After this patch, if the "dma" pointer in struct tmio_mmc_data is not NULL, it points to a struct, containing two tokens, that have to be passed to the dmaengine driver for channel configuration. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: Samuel Ortiz <sameo@linux.intel.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* | | | | | Revert "ath9k: Group Key fix for VAPs"Linus Torvalds2010-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 03ceedea972a82d343fa5c2528b3952fa9e615d5, since it breaks resume from suspend-to-ram on Rafael's Acer Ferrari One. NetworkManager thinks everything is ok, but it can't connect to the AP to get an IP address after the resume. In fact, it even breaks resume for non-ath9k chipsets: reverting it also fixes Rafael's Toshiba Protege R500 with the iwlagn driver. As Johannes says: "Indeed, this patch needs to be reverted. That mac80211 change is wrong and completely unnecessary." Reported-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Johannes Berg <johannes@sipsolutions.net> Cc: Daniel Yingqiang Ma <yma.cool@gmail.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2010-05-24
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: 9p: Optimize TCREATE by eliminating a redundant fid clone. 9p: cleanup: remove unneeded assignment 9p: Add mksock support fs/9p: Make sure we properly instantiate dentry. 9p: add 9P2000.L rename operation 9p: add 9P2000.L statfs operation 9p: VFS switches for 9p2000.L: VFS switches 9p: VFS switches for 9p2000.L: protocol and client changes
| * | | | | | 9p: add 9P2000.L rename operationSripathi Kodi2010-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I made a V2 of this patch on top of my patches for VFS switches. All the changes were due to change in some offsets. rename - change name of file or directory size[4] Trename tag[2] fid[4] newdirfid[4] name[s] size[4] Rrename tag[2] The rename message is used to change the name of a file, possibly moving it to a new directory. The 9P wstat message can only rename a file within the same directory. Signed-off-by: Jim Garlick <garlick@llnl.gov> Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
| * | | | | | 9p: add 9P2000.L statfs operationSripathi Kodi2010-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I made a V2 of this patch on top of my patches for VFS switches. The change was adding v9fs_statfs pointer to v9fs_super_ops_dotl instead of v9fs_super_ops. statfs - get file system statistics size[4] Tstatfs tag[2] fid[4] size[4] Rstatfs tag[2] type[4] bsize[4] blocks[8] bfree[8] bavail[8] files[8] ffree[8] fsid[8] namelen[4] The statfs message is used to request file system information returned by the statfs(2) system call, which is used by df(1) to report file system and disk space usage. Signed-off-by: Jim Garlick <garlick@llnl.gov> Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
* | | | | | | Merge branch 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds2010-05-24
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'next-devicetree' of git://git.secretlab.ca/git/linux-2.6: of: change of_match_device to work with struct device of: Remove duplicate fields from of_platform_driver drivercore: Add of_match_table to the common device drivers arch/microblaze: Move dma_mask from of_device into pdev_archdata arch/powerpc: Move dma_mask from of_device into pdev_archdata of: eliminate of_device->node and dev_archdata->{of,prom}_node of: Always use 'struct device.of_node' to get device node pointer. i2c/of: Allow device node to be passed via i2c_board_info driver-core: Add device node pointer to struct device of: protect contents of of_platform.h and of_device.h of/flattree: Make unflatten_device_tree() safe to call from any arch of/flattree: make of_fdt.h safe to unconditionally include.
| * \ \ \ \ \ \ Merge remote branch 'origin' into secretlab/next-devicetreeGrant Likely2010-05-22
| |\ \ \ \ \ \ \ | | | |/ / / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merging in current state of Linus' tree to deal with merge conflicts and build failures in vio.c after merge. Conflicts: drivers/i2c/busses/i2c-cpm.c drivers/i2c/busses/i2c-mpc.c drivers/net/gianfar.c Also fixed up one line in arch/powerpc/kernel/vio.c to use the correct node pointer. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
| * | | | | | | of: change of_match_device to work with struct deviceGrant Likely2010-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The of_node pointer is now stored directly in struct device, so of_match_device() should work with any device, not just struct of_device. This patch changes the interface to of_match_device() to accept a struct device instead of struct of_device. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
| * | | | | | | of: Remove duplicate fields from of_platform_driverGrant Likely2010-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .name, .match_table and .owner are duplicated in both of_platform_driver and device_driver. This patch is a removes the extra copies from struct of_platform_driver and converts all users to the device_driver members. This patch is a pretty mechanical change. The usage model doesn't change and if any drivers have been missed, or if anything has been fixed up incorrectly, then it will fail with a compile time error, and the fixup will be trivial. This patch looks big and scary because it touches so many files, but it should be pretty safe. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Sean MacLennan <smaclennan@pikatech.com>
| * | | | | | | drivercore: Add of_match_table to the common device driversGrant Likely2010-05-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OF-style matching can be available to any device, on any type of bus. This patch allows any driver to provide an OF match table when CONFIG_OF is enabled so that drivers can be bound against devices described in the device tree. Signed-off-by: Grant Likely <grant.likely@secretlab.ca> Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | | | | i2c/of: Allow device node to be passed via i2c_board_infoGrant Likely2010-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The struct device_node *of_node pointer is moving out of dev->archdata and into the struct device proper. of_i2c.c needs to set the of_node pointer before the device is registered. Since the i2c subsystem doesn't allow 2 stage allocation and registration of i2c devices, the of_node pointer needs to be passed via the i2c_board_info structure so that it is set prior to registration. This patch adds of_node to struct i2c_board_info (conditional on CONFIG_OF), sets of_node in i2c_new_device(), and modifies of_i2c.c to use the new parameter. The calling of dev_archdata_set_node() from of_i2c will be removed in a subsequent patch when of_node is removed from archdata and all users are converted over. Signed-off-by: Grant Likely <grant.likely@secretlab.ca>