aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
* mm: filter based on a nodemask as well as a gfp_maskMel Gorman2008-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | The MPOL_BIND policy creates a zonelist that is used for allocations controlled by that mempolicy. As the per-node zonelist is already being filtered based on a zone id, this patch adds a version of __alloc_pages() that takes a nodemask for further filtering. This eliminates the need for MPOL_BIND to create a custom zonelist. A positive benefit of this is that allocations using MPOL_BIND now use the local node's distance-ordered zonelist instead of a custom node-id-ordered zonelist. I.e., pages will be allocated from the closest allowed node with available memory. [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments] [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask] [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Hugh Dickins <hugh@veritas.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: have zonelist contains structs with both a zone pointer and zone_idxMel Gorman2008-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Filtering zonelists requires very frequent use of zone_idx(). This is costly as it involves a lookup of another structure and a substraction operation. As the zone_idx is often required, it should be quickly accessible. The node idx could also be stored here if it was found that accessing zone->node is significant which may be the case on workloads where nodemasks are heavily used. This patch introduces a struct zoneref to store a zone pointer and a zone index. The zonelist then consists of an array of these struct zonerefs which are looked up as necessary. Helpers are given for accessing the zone index as well as the node index. [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers] [hugh@veritas.com: mm-have-zonelist: fix memcg ooms] [hugh@veritas.com: just return do_try_to_free_pages] [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Christoph Lameter <clameter@sgi.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* s390: KVM preparation: provide hook to enable pgstes in user pagetableCarsten Otte2008-04-27
| | | | | | | | | | | | | | | | | | | | | | | | The SIE instruction on s390 uses the 2nd half of the page table page to virtualize the storage keys of a guest. This patch offers the s390_enable_sie function, which reorganizes the page tables of a single-threaded process to reserve space in the page table: s390_enable_sie makes sure that the process is single threaded and then uses dup_mm to create a new mm with reorganized page tables. The old mm is freed and the process has now a page status extended field after every page table. Code that wants to exploit pgstes should SELECT CONFIG_PGSTE. This patch has a small common code hit, namely making dup_mm non-static. Edit (Carsten): I've modified Martin's patch, following Jeremy Fitzhardinge's review feedback. Now we do have the prototype for dup_mm in include/linux/sched.h. Following Martin's suggestion, s390_enable_sie() does now call task_lock() to prevent race against ptrace modification of mm_users. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Carsten Otte <cotte@de.ibm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
* Fix uninitialized 'copy' in unshare_filesAl Viro2008-04-26
| | | | | | | | | | | Arrgghhh... Sorry about that, I'd been sure I'd folded that one, but it actually got lost. Please apply - that breaks execve(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2008-04-25
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] sanitize locate_fd() [PATCH] sanitize unshare_files/reset_files_struct [PATCH] sanitize handling of shared descriptor tables in failing execve() [PATCH] close race in unshare_files() [PATCH] restore sane ->umount_begin() API cifs: timeout dfs automounts +little fix.
| * [PATCH] sanitize unshare_files/reset_files_structAl Viro2008-04-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * let unshare_files() give caller the displaced files_struct * don't bother with grabbing reference only to drop it in the caller if it hadn't been shared in the first place * in that form unshare_files() is trivially implemented via unshare_fd(), so we eliminate the duplicate logics in fork.c * reset_files_struct() is not just only called for current; it will break the system if somebody ever calls it for anything else (we can't modify ->files of somebody else). Lose the task_struct * argument. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] sanitize handling of shared descriptor tables in failing execve()Al Viro2008-04-25
| | | | | | | | | | | | | | | | | | | | | | | | | | * unshare_files() can fail; doing it after irreversible actions is wrong and de_thread() is certainly irreversible. * since we do it unconditionally anyway, we might as well do it in do_execve() and save ourselves the PITA in binfmt handlers, etc. * while we are at it, binfmt_som actually leaked files_struct on failure. As a side benefit, unshare_files(), put_files_struct() and reset_files_struct() become unexported. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * [PATCH] close race in unshare_files()Al Viro2008-04-25
| | | | | | | | | | | | updating current->files requires task_lock Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'release' of ↵Linus Torvalds2008-04-25
|\ \ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [PATCH] Build fix for CONFIG_NUMA=y && CONFIG_SMP=n [IA64] fix bootmem regression on Altix
| * | [PATCH] Build fix for CONFIG_NUMA=y && CONFIG_SMP=nMike Travis2008-04-24
| | | | | | | | | | | | | | | | | | | | | Regression caused by 434d53b00d6bb7be0a1d3dcc0d0d5df6c042e164 Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
| * | [IA64] fix bootmem regression on AltixRuss Anderson2008-04-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent change prevents SGI Altix from booting. This patch fixes the problem. The regresson was introduced in commit 434d53b00d6bb7be0a1d3dcc0d0d5df6c042e164 Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2008-04-25
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes: sched: fix share (re)distribution softlockup: fix NOHZ wakeup seqlock: livelock fix
| * | | sched: fix share (re)distributionPeter Zijlstra2008-04-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fix __aggregate_redistribute_shares() related lockup reported by David S. Miller. The problem this code tries to solve is 'accurately' calculating the 'fair' share of the group weight for each cpu. The current code falls back to a global group rebalance in case the sched_domain's span it looks at has no shares, but does have tasks. The reason it gets stuck here, is because its inherently racy - if someone steals the last task after we compute the agg->rq_weight, but before we rebalance, we'll never get out of the loop. We could of course go fix that, but while looking at this issue I found that this 'fallback' wasn't nearly as rare as I'd hoped it to be. In fact its quite common - and given it walks the whole machine, thats very bad. The new approach is simple (why didn't I think of it before?), we set the aggregate shares to the full task group weight, and each larger sched domain that encounters an aggregate shares larger than the weight, clips it (it already re-distributes anyway). This nicely converges to the desired global picture where the sum of all shares equals the task group weight. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | softlockup: fix NOHZ wakeupIngo Molnar2008-04-24
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David Miller reported: |---------------> the following commit: | commit 27ec4407790d075c325e1f4da0a19c56953cce23 | Author: Ingo Molnar <mingo@elte.hu> | Date: Thu Feb 28 21:00:21 2008 +0100 | | sched: make cpu_clock() globally synchronous | | Alexey Zaytsev reported (and bisected) that the introduction of | cpu_clock() in printk made the timestamps jump back and forth. | | Make cpu_clock() more reliable while still keeping it fast when it's | called frequently. | | Signed-off-by: Ingo Molnar <mingo@elte.hu> causes watchdog triggers when a cpu exits NOHZ state when it has been there for >= the soft lockup threshold, for example here are some messages from a 128 cpu Niagara2 box: [ 168.106406] BUG: soft lockup - CPU#11 stuck for 128s! [dd:3239] [ 168.989592] BUG: soft lockup - CPU#21 stuck for 86s! [swapper:0] [ 168.999587] BUG: soft lockup - CPU#29 stuck for 91s! [make:4511] [ 168.999615] BUG: soft lockup - CPU#2 stuck for 85s! [swapper:0] [ 169.020514] BUG: soft lockup - CPU#37 stuck for 91s! [swapper:0] [ 169.020514] BUG: soft lockup - CPU#45 stuck for 91s! [sh:4515] [ 169.020515] BUG: soft lockup - CPU#69 stuck for 92s! [swapper:0] [ 169.020515] BUG: soft lockup - CPU#77 stuck for 92s! [swapper:0] [ 169.020515] BUG: soft lockup - CPU#61 stuck for 92s! [swapper:0] [ 169.112554] BUG: soft lockup - CPU#85 stuck for 92s! [swapper:0] [ 169.112554] BUG: soft lockup - CPU#101 stuck for 92s! [swapper:0] [ 169.112554] BUG: soft lockup - CPU#109 stuck for 92s! [swapper:0] [ 169.112554] BUG: soft lockup - CPU#117 stuck for 92s! [swapper:0] [ 169.171483] BUG: soft lockup - CPU#40 stuck for 80s! [dd:3239] [ 169.331483] BUG: soft lockup - CPU#13 stuck for 86s! [swapper:0] [ 169.351500] BUG: soft lockup - CPU#43 stuck for 101s! [dd:3239] [ 169.531482] BUG: soft lockup - CPU#9 stuck for 129s! [mkdir:4565] [ 169.595754] BUG: soft lockup - CPU#20 stuck for 93s! [swapper:0] [ 169.626787] BUG: soft lockup - CPU#52 stuck for 93s! [swapper:0] [ 169.626787] BUG: soft lockup - CPU#84 stuck for 92s! [swapper:0] [ 169.636812] BUG: soft lockup - CPU#116 stuck for 94s! [swapper:0] It's simple enough to trigger this by doing a 10 minute sleep after a fresh bootup then starting a parallel kernel build. I suspect this might be reintroducing a problem we've had and fixed before, see the thread: http://marc.info/?l=linux-kernel&m=119546414004065&w=2 <---------------| touch the softlockup watchdog when exiting NOHZ state - we are obviously not locked up. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* / | sched: use alloc_bootmem() instead of alloc_bootmem_low()David Miller2008-04-25
|/ / | | | | | | | | | | | | | | There is no guarantee that there is physical ram below 4GB, and in fact many boxes don't have exactly that. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2008-04-23
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: iwlwifi: Fix built-in compilation of iwlcore net: Unexport move_addr_to_{kernel,user} rt2x00: Select LEDS_CLASS. iwlwifi: Select LEDS_CLASS. leds: Do not guard NEW_LEDS with HAS_IOMEM [IPSEC]: Fix catch-22 with algorithm IDs above 31 time: Export set_normalized_timespec. tcp: Make use of before macro in tcp_input.c hamradio: Remove unneeded and deprecated cli()/sti() calls in dmascc.c [NETNS]: Remove empty ->init callback. [DCCP]: Convert do_gettimeofday() to getnstimeofday(). [NETNS]: Don't initialize err variable twice. [NETNS]: The ip6_fib_timer can work with garbage on net namespace stop. [IPV4]: Convert do_gettimeofday() to getnstimeofday(). [IPV4]: Make icmp_sk_init() static. [IPV6]: Make struct ip6_prohibit_entry_template static. tcp: Trivial fix to correct function name in a comment in net/ipv4/tcp.c [NET]: Expose netdevice dev_id through sysfs skbuff: fix missing kernel-doc notation [ROSE]: Fix soft lockup wrt. rose_node_list_lock
| * time: Export set_normalized_timespec.YOSHIFUJI Hideaki2008-04-21
| | | | | | | | | | | | | | | | Sorry I have just realized set_normalized_timespec() (used in timespec_sub()) is not exported, and link will fail because of it... Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-linus' of ↵Linus Torvalds2008-04-22
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: [PATCH] get rid of __exit_files(), __exit_fs() and __put_fs_struct() [PATCH] proc_readfd_common() race fix [PATCH] double-free of inode on alloc_file() failure exit in create_write_pipe() [PATCH] teach seq_file to discard entries [PATCH] umount_tree() will unhash everything itself [PATCH] get rid of more nameidata passing in namespace.c [PATCH] switch a bunch of LSM hooks from nameidata to path [PATCH] lock exclusively in collect_mounts() and drop_collected_mounts() [PATCH] move a bunch of declarations to fs/internal.h
| * | [PATCH] get rid of __exit_files(), __exit_fs() and __put_fs_struct()Al Viro2008-04-22
| | | | | | | | | | | | | | | | | | | | | | | | The only reason to have separated __...() for those was to keep them inlined for local users in exit.c. Since Alexey removed the inline on those, there's no reason whatsoever to keep them around; just collapse with normal variants. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | kernel-doc: fix sched.c missing parameterRandy Dunlap2008-04-22
|/ / | | | | | | | | | | | | | | | | Add missing kernel-doc in kernel/sched.c: Warning(linux-2.6.25-git3//kernel/sched.c:7044): No description found for parameter 'span' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2008-04-21
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial: (24 commits) DOC: A couple corrections and clarifications in USB doc. Generate a slightly more informative error msg for bad HZ fix typo "is" -> "if" in Makefile ext*: spelling fix prefered -> preferred DOCUMENTATION: Use newer DEFINE_SPINLOCK macro in docs. KEYS: Fix the comment to match the file name in rxrpc-type.h. RAID: remove trailing space from printk line DMA engine: typo fixes Remove unused MAX_NODES_SHIFT MAINTAINERS: Clarify access to OCFS2 development mailing list. V4L: Storage class should be before const qualifier (sn9c102) V4L: Storage class should be before const qualifier sonypi: Storage class should be before const qualifier intel_menlow: Storage class should be before const qualifier DVB: Storage class should be before const qualifier arm: Storage class should be before const qualifier ALSA: Storage class should be before const qualifier acpi: Storage class should be before const qualifier firmware_sample_driver.c: fix coding style MAINTAINERS: Add ati_remote2 driver ... Fixed up trivial conflicts in firmware_sample_driver.c
| * | trivial: small cleanupsPavel Machek2008-04-21
| |/ | | | | | | | | | | | | | | | | | | These are small cleanups all over the tree. Trivial style and comment changes to fs/select.c, kernel/signal.c, kernel/stop_machine.c & mm/pdflush.c Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6Linus Torvalds2008-04-21
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/pci-2.6: (42 commits) PCI: Change PCI subsystem MAINTAINER PCI: pci-iommu-iotlb-flushing-speedup PCI: pci_setup_bridge() mustn't be __devinit PCI: pci_bus_size_cardbus() mustn't be __devinit PCI: pci_scan_device() mustn't be __devinit PCI: pci_alloc_child_bus() mustn't be __devinit PCI: replace remaining __FUNCTION__ occurrences PCI: Hotplug: fakephp: Return success, not ENODEV, when bus rescan is triggered PCI: Hotplug: Fix leaks in IBM Hot Plug Controller Driver - ibmphp_init_devno() PCI: clean up resource alignment management PCI: aerdrv_acpi.c: remove unneeded NULL check PCI: Update VIA CX700 quirk PCI: Expose PCI VPD through sysfs PCI: iommu: iotlb flushing PCI: simplify quirk debug output PCI: iova RB tree setup tweak PCI: parisc: use generic pci_enable_resources() PCI: ppc: use generic pci_enable_resources() PCI: powerpc: use generic pci_enable_resources() PCI: ia64: use generic pci_enable_resources() ...
| * | PCI: clean up resource alignment managementIvan Kokshaysky2008-04-21
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Done per Linus' request and suggestions. Linus has explained that better than I'll be able to explain: On Thu, Mar 27, 2008 at 10:12:10AM -0700, Linus Torvalds wrote: > Actually, before we go any further, there might be a less intrusive > alternative: add just a couple of flags to the resource flags field (we > still have something like 8 unused bits on 32-bit), and use those to > implement a generic "resource_alignment()" routine. > > Two flags would do it: > > - IORESOURCE_SIZEALIGN: size indicates alignment (regular PCI device > resources) > > - IORESOURCE_STARTALIGN: start field is alignment (PCI bus resources > during probing) > > and then the case of both flags zero (or both bits set) would actually be > "invalid", and we would also clear the IORESOURCE_STARTALIGN flag when we > actually allocate the resource (so that we don't use the "start" field as > alignment incorrectly when it no longer indicates alignment). > > That wouldn't be totally generic, but it would have the nice property of > automatically at least add sanity checking for that whole "res->start has > the odd meaning of 'alignment' during probing" and remove the need for a > new field, and it would allow us to have a generic "resource_alignment()" > routine that just gets a resource pointer. Besides, I removed IORESOURCE_BUS_HAS_VGA flag which was unused for ages. Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | ptrace: compat_ptrace_request siginfoRoland McGrath2008-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for PTRACE_GETSIGINFO and PTRACE_SETSIGINFO in compat_ptrace_request. It relies on existing arch definitions for copy_siginfo_to_user32 and copy_siginfo_from_user32. On powerpc, this fixes a longstanding regression of 32-bit ptrace calls on 64-bit kernels vs native calls (64-bit calls or 32-bit kernels). This can be seen in a 32-bit call using PTRACE_GETSIGINFO to examine e.g. siginfo_t.si_addr from a signal that sets it. (This was broken as of 2.6.24 and, I presume, many or all prior versions.) Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'master' of ↵Linus Torvalds2008-04-21
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt: hrtimer: optimize the softirq time optimization hrtimer: reduce calls to hrtimer_get_softirq_time() clockevents: fix typo in tick-broadcast.c jiffies: add time_is_after_jiffies and others which compare with jiffies
| * | hrtimer: optimize the softirq time optimizationThomas Gleixner2008-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous optimization did not take the case into account where a clock provides its own softirq_get_time() function. Check for the availablitiy of the clock get time function first and then check if we need to retrieve the time for both clocks via hrtimer_softirq_gettime() to avoid a double evaluation of time in that case as well. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | hrtimer: reduce calls to hrtimer_get_softirq_time()Dimitri Sivanich2008-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems that hrtimer_run_queues() is calling hrtimer_get_softirq_time() more often than it needs to. This can cause frequent contention on systems with large numbers of processors/cores. With this patch, hrtimer_run_queues only calls hrtimer_get_softirq_time() if there is a pending timer in one of the hrtimer bases, and only once. This also combines hrtimer_run_queues() and the inline run_hrtimer_queue() into one function. [ tglx@linutronix.de: coding style ] Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | clockevents: fix typo in tick-broadcast.cGlauber Costa2008-04-21
| |/ | | | | | | | | | | | | | | braodcast -> broadcast Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | Merge branch 'semaphore' of ↵Linus Torvalds2008-04-21
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc * 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: Deprecate the asm/semaphore.h files in feature-removal-schedule. Convert asm/semaphore.h users to linux/semaphore.h security: Remove unnecessary inclusions of asm/semaphore.h lib: Remove unnecessary inclusions of asm/semaphore.h kernel: Remove unnecessary inclusions of asm/semaphore.h include: Remove unnecessary inclusions of asm/semaphore.h fs: Remove unnecessary inclusions of asm/semaphore.h drivers: Remove unnecessary inclusions of asm/semaphore.h net: Remove unnecessary inclusions of asm/semaphore.h arch: Remove unnecessary inclusions of asm/semaphore.h
| * | kernel: Remove unnecessary inclusions of asm/semaphore.hMatthew Wilcox2008-04-18
| |/ | | | | | | | | | | | | None of these files use any of the functionality promised by asm/semaphore.h. Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2008-04-21
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: (62 commits) sched: build fix sched: better rt-group documentation sched: features fix sched: /debug/sched_features sched: add SCHED_FEAT_DEADLINE sched: debug: show a weight tree sched: fair: weight calculations sched: fair-group: de-couple load-balancing from the rb-trees sched: fair-group scheduling vs latency sched: rt-group: optimize dequeue_rt_stack sched: debug: add some debug code to handle the full hierarchy sched: fair-group: SMP-nice for group scheduling sched, cpuset: customize sched domains, core sched, cpuset: customize sched domains, docs sched: prepatory code movement sched: rt: multi level group constraints sched: task_group hierarchy sched: fix the task_group hierarchy for UID grouping sched: allow the group scheduler to have multiple levels sched: mix tasks and groups ...
| * | sched: build fixIngo Molnar2008-04-19
| | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: features fixIngo Molnar2008-04-19
| | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: /debug/sched_featuresPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | provide a text based interface to the scheduler features; this saves the 'user' from setting bits using decimal arithmetic. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: add SCHED_FEAT_DEADLINEIngo Molnar2008-04-19
| | | | | | | | | | | | | | | | | | unused at the moment. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: debug: show a weight treePeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | Print a tree of weights. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fair: weight calculationsPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to level the hierarchy, we need to calculate load based on the root view. That is, each task's load is in the same unit. A / \ B 1 / \ 2 3 To compute 1's load we do: weight(1) -------------- rq_weight(A) To compute 2's load we do: weight(2) weight(B) ------------ * ----------- rq_weight(B) rw_weight(A) This yields load fractions in comparable units. The consequence is that it changes virtual time. We used to have: time_{i} vtime_{i} = ------------ weight_{i} vtime = \Sum vtime_{i} = time / rq_weight. But with the new way of load calculation we get that vtime equals time. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fair-group: de-couple load-balancing from the rb-treesPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | De-couple load-balancing from the rb-trees, so that I can change their organization. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fair-group scheduling vs latencyPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | Currently FAIR_GROUP sched grows the scheduler latency outside of sysctl_sched_latency, invert this so it stays within. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: rt-group: optimize dequeue_rt_stackPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the group hierarchy can have an arbitrary depth the O(n^2) nature of RT task dequeues will really hurt. Optimize this by providing space to store the tree path, so we can walk it the other way. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: debug: add some debug code to handle the full hierarchyPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add some extra debug output so we can get a better overview of the full hierarchy. We print the cgroup path after each cfs_rq, so we can see what group we're looking at. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fair-group: SMP-nice for group schedulingPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement SMP nice support for the full group hierarchy. On each load-balance action, compile a sched_domain wide view of the full task_group tree. We compute the domain wide view when walking down the hierarchy, and readjust the weights when walking back up. After collecting and readjusting the domain wide view, we try to balance the tasks within the task_groups. The current approach is a naively balance each task group until we've moved the targeted amount of load. Inspired by Srivatsa Vaddsgiri's previous code and Abhishek Chandra's H-SMP paper. XXX: there will be some numerical issues due to the limited nature of SCHED_LOAD_SCALE wrt to representing a task_groups influence on the total weight. When the tree is deep enough, or the task weight small enough, we'll run out of bits. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Abhishek Chandra <chandra@cs.umn.edu> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched, cpuset: customize sched domains, coreHidetoshi Seto2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [rebased for sched-devel/latest] - Add a new cpuset file, having levels: sched_relax_domain_level - Modify partition_sched_domains() and build_sched_domains() to take attributes parameter passed from cpuset. - Fill newidle_idx for node domains which currently unused but might be required if sched_relax_domain_level become higher. - We can change the default level by boot option 'relax_domain_level='. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: prepatory code movementPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: rt: multi level group constraintsPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | multi level rt constraints Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: task_group hierarchyPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | Add the full parent<->child relation thing into task_groups as well. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: fix the task_group hierarchy for UID groupingPeter Zijlstra2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | UID grouping doesn't actually have a task_group representing the root of the task_group tree. Add one. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: allow the group scheduler to have multiple levelsDhaval Giani2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the group scheduler multi hierarchy aware. [a.p.zijlstra@chello.nl: rt-parts and assorted fixes] Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: mix tasks and groupsDhaval Giani2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows tasks and groups to exist in the same cfs_rq. With this change the CFS group scheduling follows a 1/(M+N) model from a 1/(1+N) fairness model where M tasks and N groups exist at the cfs_rq level. [a.p.zijlstra@chello.nl: rt bits and assorted fixes] Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>