aboutsummaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAge
* Merge branch 'release' of ↵Linus Torvalds2007-11-09
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] IOSAPIC bogus error cleanup [IA64] Update printing of feature set bits [IA64] Fix IOSAPIC delivery mode setting [IA64] XPC heartbeat timer function must run on CPU 0 [IA64] Clean up /proc/interrupts output [IA64] Disable/re-enable CPE interrupts on Altix [IA64] Clean-up McKinley Errata message [IA64] Add gate.lds to list of files ignored by Git [IA64] Fix section mismatch in contig.c version of per_cpu_init() [IA64] Wrong args to memset in efi_gettimeofday() [IA64] Remove duplicate includes from ia32priv.h [IA64] fix number of bytes zeroed by sys_fw_init() in arch/ia64/hp/sim/boot/fw-emu.c [IA64] Fix perfmon sysctl directory modes
| * [IA64] Update printing of feature set bitsRuss Anderson2007-11-09
| | | | | | | | | | | | | | | | | | | | | | Newer Itanium versions have added additional processor feature set bits. This patch prints all the implemented feature set bits. Some bit descriptions have not been made public. For those bits, a generic "Feature set X bit Y" message is printed. Bits that are not implemented will no longer be printed. Signed-off-by: Russ Anderson <rja@sgi.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds2007-11-09
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: proper prototype for kernel/sched.c:migration_init() sched: avoid large irq-latencies in smp-balancing sched: fix copy_namespace() <-> sched_fork() dependency in do_fork sched: clean up the wakeup preempt check, #2 sched: clean up the wakeup preempt check sched: wakeup preemption fix sched: remove PREEMPT_RESTRICT sched: turn off PREEMPT_RESTRICT KVM: fix !SMP build error x86: make nmi_cpu_busy() always defined x86: make ipi_handler() always defined sched: cleanup, use NSEC_PER_MSEC and NSEC_PER_SEC sched: reintroduce SMP tunings again sched: restore deterministic CPU accounting on powerpc sched: fix delay accounting regression sched: reintroduce the sched_min_granularity tunable sched: documentation: place_entity() comments sched: fix vslice
| * | sched: proper prototype for kernel/sched.c:migration_init()Adrian Bunk2007-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a proper prototype for migration_init() in include/linux/sched.h Since there's no point in always returning 0 to a caller that doesn't check the return value it also changes the function to return void. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: avoid large irq-latencies in smp-balancingPeter Zijlstra2007-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SMP balancing is done with IRQs disabled and can iterate the full rq. When rqs are large this can cause large irq-latencies. Limit the nr of iterations on each run. This fixes a scheduling latency regression reported by the -rt folks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: remove PREEMPT_RESTRICTIngo Molnar2007-11-09
| | | | | | | | | | | | | | | | | | | | | remove PREEMPT_RESTRICT. (this is a separate commit so that any regression related to the removal itself is bisectable) Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | KVM: fix !SMP build errorIngo Molnar2007-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fix a !SMP build error: drivers/kvm/kvm_main.c: In function 'kvm_flush_remote_tlbs': drivers/kvm/kvm_main.c:220: error: implicit declaration of function 'smp_call_function_mask' (and also avoid unused function warning related to up_smp_call_function() not making use of the 'func' parameter.) Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: restore deterministic CPU accounting on powerpcPaul Mackerras2007-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since powerpc started using CONFIG_GENERIC_CLOCKEVENTS, the deterministic CPU accounting (CONFIG_VIRT_CPU_ACCOUNTING) has been broken on powerpc, because we end up counting user time twice: once in timer_interrupt() and once in update_process_times(). This fixes the problem by pulling the code in update_process_times that updates utime and stime into a separate function called account_process_tick. If CONFIG_VIRT_CPU_ACCOUNTING is not defined, there is a version of account_process_tick in kernel/timer.c that simply accounts a whole tick to either utime or stime as before. If CONFIG_VIRT_CPU_ACCOUNTING is defined, then arch code gets to implement account_process_tick. This also lets us simplify the s390 code a bit; it means that the s390 timer interrupt can now call update_process_times even when CONFIG_VIRT_CPU_ACCOUNTING is turned on, and can just implement a suitable account_process_tick(). account_process_tick() now takes the task_struct * as an argument. Tested both with and without CONFIG_VIRT_CPU_ACCOUNTING. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: reintroduce the sched_min_granularity tunablePeter Zijlstra2007-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | we lost the sched_min_granularity tunable to a clever optimization that uses the sched_latency/min_granularity ratio - but the ratio is quite unintuitive to users and can also crash the kernel if the ratio is set to 0. So reintroduce the min_granularity tunable, while keeping the ratio maintained internally. no functionality changed. [ mingo@elte.hu: some fixlets. ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6Linus Torvalds2007-11-09
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (26 commits) sh: remove dead config symbols from SH code sh: Kill off broken snapgear ds1302 code. sh: Add a dummy vga.h. rtc: rtc-sh: Zero out tm value for invalid rtc states. rtc: sh-rtc: Handle rtc_device_register() failure properly. sh: Fix heartbeart on Solution Engine series sh: Remove SCI_NPORTS from sh-sci.h sh: Fix up PAGE_KERNEL_PCC() for nommu. sh: hs7751rvoip: Kill off dead IPR IRQ mappings. sh: hs7751rvoip: irq.c needs linux/interrupt.h. sh: Kill off __{copy,clear}_user_page(). sh: Optimized copy_{to,from}_user_page() for SH-4. sh: Wire up clear_user_highpage(). sh: Kill off the remaining ST40 cruft. superhyway: Handle device_register() retval properly. sh: kgdb sysrq depends on magic sysrq. sh: Add -Werror for clean directories. sh: Fix up kgdb build with modular sh-sci. sh: Export __{s,u}divsi3_i4i on all CPUs. sh: Fix up kgdb-on-NMI branch target. ...
| * | | sh: remove dead config symbols from SH codeJiri Olsa2007-11-08
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jiri Olsa <olsajiri@gmail.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | Merge branch 'page_colouring_despair'Paul Mundt2007-11-08
| |\ \ \
| | * | | sh: Kill off __{copy,clear}_user_page().Paul Mundt2007-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that copy_to_user_page()/copy_from_user_page() are wired up, we can drop the old __copy_xxx() implementations. Now that the page colouring scheme has changed via kmap_coherent(), we can avoid the flush in these specific helpers. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| | * | | sh: Optimized copy_{to,from}_user_page() for SH-4.Paul Mundt2007-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves copy_{to,from}_user_page() out-of-line on SH-4 and converts for the kmap_coherent() API. Based on the MIPS implementation. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| | * | | sh: Wire up clear_user_highpage().Paul Mundt2007-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the kmap_coherent() API in place, this is trivial to implement, and lets us avoid the cache flush in certain cases. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | sh: Add a dummy vga.h.Paul Mundt2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have nothing to do here, but there are continually drivers that fail to build without it. Stub it in. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | | sh: Fix up PAGE_KERNEL_PCC() for nommu.Paul Mundt2007-11-06
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | PAGE_KERNEL_PCC() takes two arguments, which weren't reflected in the nommu case. Fix it up. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | sh: Kill off the remaining ST40 cruft.Paul Mundt2007-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ST40 stuff in-tree hasn't built for some time, and hasn't been updated for over 3 years. ST maintains their own out-of-tree changes and rebases occasionally, and that's ultimately where all of the ST40 users go anyways. In order for the ST40 code to be brought up to date most of the stuff removed in this changeset would have to be rewritten anyways, so there's very little benefit in keeping the remnants around either. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | sh: remove PTRACE_O_TRACESYSGOOD from asm/ptrace.hMike Frysinger2007-11-06
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | The common linux/ptrace.h already defines PTRACE_O_TRACESYSGOOD so there is no need to have arches do it. This also keeps glibc-2.7 from breaking since it has an enum for the PTRACE_O_* flags. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* | | Merge master.kernel.org:/home/rmk/linux-2.6-armLinus Torvalds2007-11-09
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/home/rmk/linux-2.6-arm: [ARM] pxa: fix one-shot timer mode [ARM] 4645/1: Cyberpro: Trivial fix to restore 16bpp mode. [ARM] 4644/2: fix flush_kern_tlb_range() in module space [ARM] Allow watchdog drivers to be selected again [ARM] 4633/1: omap build fix when FB enabled [ARM] 4642/2: netX: default config for netx based boards [ARM] 4641/2: netX: fix kobject_name type [ARM] Fix iop3xx macro
| * | | [ARM] 4644/2: fix flush_kern_tlb_range() in module spaceKevin Hilman2007-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For kernel addresses between TASK_SIZE and PAGE_OFFSET, flush_tlb_kern_range() does not work as would be expected. The TLB invalidate works with a matching ASID, or on entries marked as global. The set_pte_at() macro marks addresses >= PAGE_OFFSET as global, but not addresses from TASK_SIZE to PAGE_OFFSET, which are also kernel addresses. The result is that the entries in this range are not actually invalidated by flush_tlb_kern_range(). This patch instead marks addresses >= TASK_SIZE as global. Signed-off-by: Satoru Fujii <s-fujii@ct.jp.nec.com> Signed-off-by: Kevin Hilman <khilman@mvista.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | [ARM] Fix iop3xx macroRussell King2007-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Macro arguments used in expressions should have parens. This avoids warnings such as: drivers/serial/8250.c:1951: warning: suggest parentheses around arithmetic in operand of | Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Lennert Buytenhek <buytenh@wantstofly.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2007-11-09
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: Add UNPLUG traces to all appropriate places block: fix requeue handling in blk_queue_invalidate_tags() mmc: Fix sg helper copy-and-paste error pktcdvd: fix BUG caused by sysfs module reference semantics change ioprio: allow sys_ioprio_set() value of 0 to reset ioprio setting cfq_idle_class_timer: add paranoid checks for jiffies overflow cfq: fix IOPRIO_CLASS_IDLE delays cfq: fix IOPRIO_CLASS_IDLE accounting
| * | | | Add UNPLUG traces to all appropriate placesAlan D. Brunelle2007-11-09
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | Added blk_unplug interface, allowing all invocations of unplugs to result in a generated blktrace UNPLUG. Signed-off-by: Alan D. Brunelle <Alan.Brunelle@hp.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | | | Merge branch 'merge' of ↵Linus Torvalds2007-11-09
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (37 commits) [POWERPC] EEH: Make sure warning message is printed [POWERPC] Make altivec code in swsusp_32.S depend on CONFIG_ALTIVEC [POWERPC] windfarm: Fix windfarm thread freezer interaction [POWERPC] Fix si_addr value on low level hash failures [POWERPC] Refresh ppc64_defconfig and enable pasemi-related options [POWERPC] pasemi: Update defconfig [POWERPC] iSeries: Fix ref counting in vio setup [POWERPC] ] Fix memset size error [POWERPC] Fix link errors for allyesconfig [POWERPC] iSeries_init_IRQ non-PCI tidy [POWERPC] Change fallocate to match unistd.h on powerpc [POWERPC] EEH: Avoid crash on null device [POWERPC] EEH: Drivers that need reset trump others [POWERPC] EEH: Clean up comments [POWERPC] Fix off-by-one error in setting decrementer on Book E/4xx (v2) [POWERPC] Fix switch_slb handling of 1T ESID values [POWERPC] Fix build failure when CONFIG_VIRT_CPU_ACCOUNTING is not defined [POWERPC] Include udbg.h when using udbg_printf [POWERPC] Fix cache line vs. block size confusion [POWERPC] Fix sysctl table check failure on PowerMac ...
| * \ \ \ Merge branch 'for-2.6.24' of ↵Paul Mackerras2007-11-07
| |\ \ \ \ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/jwboyer/powerpc-4xx into merge
| | * | | | [POWERPC] 4xx: Deal with 44x virtually tagged icacheBenjamin Herrenschmidt2007-11-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 44x family has an interesting "feature" which is a virtually tagged instruction cache (yuck !). So far, we haven't dealt with it properly, which means we've been mostly lucky or people didn't report the problems, unless people have been running custom patches in their distro... This is an attempt at fixing it properly. I chose to do it by setting a global flag whenever we change a PTE that was previously marked executable, and flush the entire instruction cache upon return to user space when that happens. This is a bit heavy handed, but it's hard to do more fine grained flushes as the icbi instruction, on those processor, for some very strange reasons (since the cache is virtually mapped) still requires a valid TLB entry for reading in the target address space, which isn't something I want to deal with. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| | * | | | [POWERPC] 4xx: Fix 4xx flush_tlb_page()Benjamin Herrenschmidt2007-11-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On 4xx CPUs, the current implementation of flush_tlb_page() uses a low level _tlbie() assembly function that only works for the current PID. Thus, invalidations caused by, for example, a COW fault triggered by get_user_pages() from a different context will not work properly, causing among other things, gdb breakpoints to fail. This patch adds a "pid" argument to _tlbie() on 4xx processors, and uses it to flush entries in the right context. FSL BookE also gets the argument but it seems they don't need it (their tlbivax form ignores the PID when invalidating according to the document I have). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
| * | | | | [POWERPC] Change fallocate to match unistd.h on powerpcPatrick Mansfield2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the fallocate system call on powerpc to match its unistd.h. This implies none of these system calls are currently working with the unistd.h sys call values: fallocate signalfd timerfd eventfd sync_file_range2 Signed-off-by: Patrick Mansfield <patmans@us.ibm.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
| * | | | | [POWERPC] Fix off-by-one error in setting decrementer on Book E/4xx (v2)Paul Mackerras2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The decrementer in Book E and 4xx processors interrupts on the transition from 1 to 0, rather than on the 0 to -1 transition as on 64-bit server and 32-bit "classic" (6xx/7xx/7xxx) processors. At the moment we subtract 1 from the count of how many decrementer ticks are required before the next interrupt before putting it into the decrementer, which is correct for server/classic processors, but could possibly cause the interrupt to happen too early on Book E and 4xx if the timebase/decrementer frequency is low. This fixes the problem by making set_dec subtract 1 from the count for server and classic processors, instead of having the callers subtract 1. Since set_dec already had a bunch of ifdefs to handle different processor types, there is no net increase in ugliness. :) Note that calling set_dec(0) may not generate an interrupt on some processors. To make sure that decrementer_set_next_event always calls set_dec with an interval of at least 1 tick, we set min_delta_ns of the decrementer_clockevent to correspond to 2 ticks (2 rather than 1 to compensate for truncations in the conversions between ticks and ns). This also removes a redundant call to set the decrementer to 0x7fffffff - it was already set to that earlier in timer_interrupt. Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | | frv: Remove bogus NO_IRQ = -1 defineAlan Cox2007-11-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old NO_IRQ define some platforms had was long ago declared obsolete and wrong. FRV should therefore not be re-introducing this, especially as IRQs are usually unsigned in the kernel. The "no IRQ" case is defined to be zero and Linus made this rather clear at the time. arch/frv shows no dependancy on this but it might show up driver fixes needing doing I guess Signed-off-by: Alan Cox <alan@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | Merge branch 'master' of ↵Linus Torvalds2007-11-09
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6: [SPARC64]: Use "is_power_of_2" macro for simplicity. [SPARC]: Remove duplicate includes.
| * | | | | | [SPARC64]: Use "is_power_of_2" macro for simplicity.Robert P. J. Day2007-11-07
| | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | [NETLINK]: Fix unicast timeoutsPatrick McHardy2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts by moving the schedule_timeout() call to a new function that doesn't propagate the remaining timeout back to the caller. This means on each retry we start with the full timeout again. ipc/mqueue.c seems to actually want to wait indefinitely so this behaviour is retained. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | [INET]: Remove per bucket rwlock in tcp/dccp ehash table.Eric Dumazet2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As done two years ago on IP route cache table (commit 22c047ccbc68fa8f3fa57f0e8f906479a062c426) , we can avoid using one lock per hash bucket for the huge TCP/DCCP hash tables. On a typical x86_64 platform, this saves about 2MB or 4MB of ram, for litle performance differences. (we hit a different cache line for the rwlock, but then the bucket cache line have a better sharing factor among cpus, since we dirty it less often). For netstat or ss commands that want a full scan of hash table, we perform fewer memory accesses. Using a 'small' table of hashed rwlocks should be more than enough to provide correct SMP concurrency between different buckets, without using too much memory. Sizing of this table depends on num_possible_cpus() and various CONFIG settings. This patch provides some locking abstraction that may ease a future work using a different model for TCP/DCCP table. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | [IPVS]: Synchronize closing of ConnectionsRumen G. Bogdanovski2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the master daemon to sync the connection when it is about to close. This makes the connections on the backup to close or timeout according their state. Before the sync was performed only if the connection is in ESTABLISHED state which always made the connections to timeout in the hard coded 3 minutes. However the Andy Gospodarek's patch ([IPVS]: use proper timeout instead of fixed value) effectively did nothing more than increasing this to 15 minutes (Established state timeout). So this patch makes use of proper timeout since it syncs the connections on status changes to FIN_WAIT (2min timeout) and CLOSE (10sec timeout). However if the backup misses CLOSE hopefully it did not miss FIN_WAIT. Otherwise we will just have to wait for the ESTABLISHED state timeout. As it is without this patch. This way the number of the hanging connections on the backup is kept to minimum. And very few of them will be left to timeout with a long timeout. This is important if we want to make use of the fix for the real server overcommit on master/backup fail-over. Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | [IPVS]: Bind connections on stanby if the destination existsRumen G. Bogdanovski2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the problem with node overload on director fail-over. Given the scenario: 2 nodes each accepting 3 connections at a time and 2 directors, director failover occurs when the nodes are fully loaded (6 connections to the cluster) in this case the new director will assign another 6 connections to the cluster, If the same real servers exist there. The problem turned to be in not binding the inherited connections to the real servers (destinations) on the backup director. Therefore: "ipvsadm -l" reports 0 connections: root@test2:~# ipvsadm -l IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP test2.local:5999 wlc -> node473.local:5999 Route 1000 0 0 -> node484.local:5999 Route 1000 0 0 while "ipvs -lnc" is right root@test2:~# ipvsadm -lnc IPVS connection entries pro expire state source virtual destination TCP 14:56 ESTABLISHED 192.168.0.10:39164 192.168.0.222:5999 192.168.0.51:5999 TCP 14:59 ESTABLISHED 192.168.0.10:39165 192.168.0.222:5999 192.168.0.52:5999 So the patch I am sending fixes the problem by binding the received connections to the appropriate service on the backup director, if it exists, else the connection will be handled the old way. So if the master and the backup directors are synchronized in terms of real services there will be no problem with server over-committing since new connections will not be created on the nonexistent real services on the backup. However if the service is created later on the backup, the binding will be performed when the next connection update is received. With this patch the inherited connections will show as inactive on the backup: root@test2:~# ipvsadm -l IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP test2.local:5999 wlc -> node473.local:5999 Route 1000 0 1 -> node484.local:5999 Route 1000 0 1 rumen@test2:~$ cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP C0A800DE:176F wlc -> C0A80033:176F Route 1000 0 1 -> C0A80032:176F Route 1000 0 1 Regards, Rumen Bogdanovski Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Rumen G. Bogdanovski <rumen@voicecho.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* | | | | | [TTY]: Fix network driver interactions with TCGET/SET calls.Alan Cox2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave Miller noted various cases where line disciplines for things like ppp go poking around in termios themselves in ways that broke with the new termios code. Rather than have them all learning about termios internals provide proper methods for this - tty_mode_ioctl() This handles all the terminal mode handling for speed/carrier etc and none of the methods are ldisc dependant so they can be called by any user - tty_perform_flush() This extracts the flush functionality and enables pppd the ppp layer to share it cleanly. The existing n_tty_ioctl code is refactored in this patch to provide the new functions and to call them itself appropriately. This patch has no (intended) behaviour changes and simply prepares for the other fixes. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | [IPV4]: Compact some ifdefs in the fib code.Pavel Emelyanov2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are places that check for CONFIG_IP_MULTIPLE_TABLES twice in the same file, but the internals of these #ifdefs can be merged. As a side effect - remove one ifdef from inside a function. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | [NET]: Kill proc_net_create()David S. Miller2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are no more users. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | [NET]: Define infrastructure to keep 'inuse' changes in an efficent SMP/NUMA ↵Eric Dumazet2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | way. "struct proto" currently uses an array stats[NR_CPUS] to track change on 'inuse' sockets per protocol. If NR_CPUS is big, this means we use a big memory area for this. Moreover, all this memory area is located on a single node on NUMA machines, increasing memory pressure on the boot node. In this patch, I tried to : - Keep a fast !CONFIG_SMP implementation - Keep a fast CONFIG_SMP implementation for often used protocols (tcp,udp,raw,...) - Introduce a NUMA efficient implementation Some helper macros are defined in include/net/sock.h These macros take into account CONFIG_SMP If a "struct proto" is declared without using DEFINE_PROTO_INUSE / REF_PROTO_INUSE macros, it will automatically use a default implementation, using a dynamically allocated percpu zone. This default implementation will be NUMA efficient, but might use 32/64 bytes per possible cpu because of current alloc_percpu() implementation. However it still should be better than previous implementation based on stats[NR_CPUS] field. When a "struct proto" is changed to use the new macros, we use a single static "int" percpu variable, lowering the memory and cpu costs, still preserving NUMA efficiency. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | [IPV4]: Clean the ip_sockglue.c from some ugly ifdefsPavel Emelyanov2007-11-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The #idfed CONFIG_IP_MROUTE is sometimes places inside the if-s, which looks completely bad. Similar ifdefs inside the functions looks a bit better, but they are also not recommended to be used. Provide an ifdef-ed ip_mroute_opt() helper to cleanup the code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | [NETFILTER]: Sort matches/targets in Kbuild fileJan Engelhardt2007-11-07
|/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sort matches and targets in the Kbuild file. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge branch 'upstream-linus' of ↵Linus Torvalds2007-11-05
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: libata: handle broken cable reporting pata_hpt37x: Fix outstanding bug reports on the HPT374 and 37x cable detect ata_piix: Add additional PCI identifier for 40 wire short cable pata_serverworks: Fix problem with some drive combinations libata: Don't disable dipm with SET FEATURES libata and bogus LBA48 drives
| * | | | libata: handle broken cable reportingAlan Cox2007-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One or two ancient drives predated the cable spec and didn't sent the valid bits for the field. I had hoped to leave this out of libata as a piece of historical annoyance but a recent CD drive shows the same bug so we have to import support for it. Same concept as Bartlomiej's changes old IDE except that as we have centralised blacklists we can avoid keeping another private table of stuff Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
| * | | | libata and bogus LBA48 drivesGeert Uytterhoeven2007-11-04
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A colleague noticed recent versions of Ubuntu no longer detect his 80 GB ST380020ACE drive. This drive is special in that it advertises LBA48 support, but has the lba_capacity_2 field set to zero (cfr. http://lkml.org/lkml/2004/3/30/163). Upon closer look, libata indeed doesn't seem to handle this case yet. Below is an (untested) fix. Signed-off-by: Jeff Garzik <jeff@garzik.org>
* | | | m68knommu: fix pread/pwrite definesGreg Ungerer2007-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix system call defines for system call 180 and 181 to match the underlying system call table function entries. System call 180 calls sys_pread64, and 181 calls sys_pwrite64, so make the definitions match. Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | arm26: remove it againHugh Dickins2007-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A tiny vestige of arm26 has appeared: remove it again. (akpm: someone (tm) needs to remove include/asm-arm26/ too) Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Missing include file in kallsyms.hKamalesh Babulal2007-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Build with randconfig fails with following error with the 2.6.24-rc4-git9 include/linux/kallsyms.h:56: error: `NULL' undeclared (first use in this function) include/linux/kallsyms.h:56: error: (Each undeclared identifier is reported only once include/linux/kallsyms.h:56: error: for each function it appears in.) make[2]: *** [arch/powerpc/platforms/cell/spu_callbacks.o] Error 1 make[1]: *** [arch/powerpc/platforms/cell] Error 2 make: *** [arch/powerpc/platforms] Error 2 Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6Linus Torvalds2007-11-05
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: PCI: Add Kconfig option to disable deprecated pci_find_* API PCI: pciserial_resume_one ignored return value of pci_enable_device PCI Hotplug: cpqhp_pushbutton_thread(): remove a pointless if() check PCI: make pci_match_device() static PCI: Remove 3 incorrect MSI quirks. PCI: Add MSI INTX_DISABLE quirks for ATI SB700/800 SATA and IXP SB400 USB PCI: Add quirk for devices which disable MSI when INTX_DISABLE is set. PCI: Add MSI quirk for ServerWorks HT1000 PCIX bridge. PCI: Revert "PCI: disable MSI by default on systems with Serverworks HT1000 chips"