aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* [Blackfin] arch: fix gdb testing regressionBernd Schmidt2008-05-06
| | | | | | | | | | | | | | When transferring to IRQ5 from an exception, save SYSCFG in memory across the transfer and clear the trace bit. When we get a single step exception, check whether we can safely clear the trace bit in SYSCFG. We can (and should) clear it after the first instruction of the interrupt handler; the first insn saves SYSCFG to the stack in all handlers. Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de> Signed-off-by: Bryan Wu <cooloney@kernel.org>
* [Blackfin] arch: disable single stepping when delivering a signalBernd Schmidt2008-05-06
| | | | | | | | | | When delivering a signal, disable single stepping but call ptrace_notify if it was enabled before. The idea was taken from the x86 port. Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de> Signed-off-by: Bryan Wu <cooloney@kernel.org>
* [Blackfin] arch: Delete unused (copied from m68k) entries in asm-offsets.c.Bernd Schmidt2008-05-06
| | | | | | | | | Fix some really ancient code that was correct only for the m68k port. Delete unused (i.e. copied from m68k) entries in asm-offsets.c. Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de> Signed-off-by: Bryan Wu <cooloney@kernel.org>
* [Blackfin] arch: In the double fault handler, set up the PT_RETI slotBernd Schmidt2008-05-06
| | | | | | | | | In the double fault handler, set up the PT_RETI slot so that we print out the correct return address in the dumping code. Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de> Signed-off-by: Bryan Wu <cooloney@kernel.org>
* [Blackfin] arch: Support for CPU_FREQ and NOHZVitja Makarov2008-05-06
| | | | | Singed-off-by: Vitja Makarov <vitja.makarov@gmail.com>
* [Blackfin] arch: Functional power management support: Add CPU and platform ↵Michael Hennerich2008-05-06
| | | | | | | | voltage scaling support Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org>
* [Blackfin] arch: fix bug - breaking the atomic sections code.Bernd Schmidt2008-05-06
| | | | | | | | | | | | The following cleanup patch: add __user markings to a few userspace system functions mysteriously added a "&" operator that doesn't belong in there, breaking the atomic sections code. Signed-off-by: Bernd Schmidt <bernds_cb1@t-online.de> Signed-off-by: Bryan Wu <cooloney@kernel.org>
* [Blackfin] arch: Equalize include files: Add VR_CTL masksMichael Hennerich2008-05-06
| | | | | | Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org>
* [Blackfin] arch: Cleanup Kconfig, fix comment and make sure we exclude ↵Michael Hennerich2008-05-06
| | | | | | | | CCLK=SCLK for some configurations Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <cooloney@kernel.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2008-05-08
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits) net: Added ASSERT_RTNL() to dev_open() and dev_close(). can: Fix can_send() handling on dev_queue_xmit() failures netns: Fix arbitrary net_device-s corruptions on net_ns stop. netfilter: Kconfig: default DCCP/SCTP conntrack support to the protocol config values netfilter: nf_conntrack_sip: restrict RTP expect flushing on error to last request macvlan: Fix memleak on device removal/crash on module removal net/ipv4: correct RFC 1122 section reference in comment tcp FRTO: SACK variant is errorneously used with NewReno e1000e: don't return half-read eeprom on error ucc_geth: Don't use RX clock as TX clock. cxgb3: Use CAP_SYS_RAWIO for firmware pcnet32: delete non NAPI code from driver. fs_enet: Fix a memory leak in fs_enet_mdio_probe [netdrvr] eexpress: IPv6 fails - multicast problems 3c59x: use netstats in net_device structure 3c980-TX needs EXTRA_PREAMBLE fix warning in drivers/net/appletalk/cops.c e1000e: Add support for BM PHYs on ICH9 uli526x: fix endianness issues in the setup frame uli526x: initialize the hardware prior to requesting interrupts ...
| * net: Added ASSERT_RTNL() to dev_open() and dev_close().Ben Hutchings2008-05-08
| | | | | | | | | | | | | | | | dev_open() and dev_close() must be called holding the RTNL, since they call device functions and netdevice notifiers that are promised the RTNL. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * can: Fix can_send() handling on dev_queue_xmit() failuresOliver Hartkopp2008-05-08
| | | | | | | | | | | | | | | | | | | | | | | | The tx packet counting and the local loopback of CAN frames should only happen in the case that the CAN frame has been enqueued to the netdevice tx queue successfully. Thanks to Andre Naujoks <nautsch@gmail.com> for reporting this issue. Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Urs Thuermann <urs@isnogud.escape.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'upstream-davem' of ↵David S. Miller2008-05-08
| |\ | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6
| | * e1000e: don't return half-read eeprom on errorKok, Auke2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | On a read error, e1000e might have returned uninitialized block of eeprom data back to userspace. The convention is that 0xff is "empty", so mark the entire eeprom as empty in case of an error. Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * ucc_geth: Don't use RX clock as TX clock.Joakim Tjernlund2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 9fb1e350e16164d56990dde036ae9c0a2fd3f634, ucc_geth: use rx-clock-name and tx-clock-name device tree properties Introduced a typo that made the driver use the RX clock as TX clock, causing massive TX errors. Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * cxgb3: Use CAP_SYS_RAWIO for firmwareAlan Cox2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise theoretically at least CAP_NET_ADMIN Reload new firmware Wait.. Firmware patches kernel So it should be CAY_SYS_RAWIO - not that I suspect this is in fact a credible attack vector! Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * pcnet32: delete non NAPI code from driver.Don Fry2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | Delete the non-napi code from the driver and Kconfig. Tested x86_64. Apply at next open opportunity. Signed-off-by: Don Fry <pcnet32@verizon.net> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * fs_enet: Fix a memory leak in fs_enet_mdio_probeScott Wood2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are more memory leaks in the !PPC_CPM_NEW_BINDING case, but that code will disappear soon along with arch/ppc. Reported by Daniel Marjamki <danielm77@spray.se> at http://bugzilla.kernel.org/show_bug.cgi?id=10591 Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * [netdrvr] eexpress: IPv6 fails - multicast problemsBruce Robson2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Taken from http://bugzilla.kernel.org/show_bug.cgi?id=10577 I was unable to access a computer containing an Intel EtherExpress 16 network card using IPv6. I traced this to failure of neighbour discovery. When I used an "ip -6 neigh add" command, on the computer attempting access, to insert a binding between the IPv6 address of the computer with the Intel EtherExpress 16 network card and the card's ethernet address, I was able to access that computer using IPv6. Neighbour discovery requires working multicast. The driver sources file eexpress.c contains an approximately 30 line function eexp_setup_filter used when loading multicast addresses. I found 3 problems in this function 1) It wrote the number of multicast addresses to the card instead of the number of bytes in the multicast addresses. 2) When loading multiple multicast addresses it loaded the first one provided multiple times instead of loading each one once. 3) The setting of pointer 'data' from 'dmi->dmi_addr' occured before the test for the error situation of 'dmi' being NULL. Correcting these problems allows the computer with the Intel EtherExpress 16 network card to found by IPv6 neighbour discovery. p.s. There is some information on the Intel EtherExpress 16 at http://www.intel.com/support/etherexpress/vintage/sb/cs-013500.htm Datasheet for the Intel 82586 ethernet controller used by the card http://www.datasheetcatalog.com/datasheets_pdf/8/2/5/8/82586.shtml Signed-off-by: Bruce Robson <bns_robson@hotmail.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * 3c59x: use netstats in net_device structurePaulius Zaleckas2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | Use net_device_stats from net_device structure instead of local. Signed-off-by: Paulius Zaleckas <paulius.zaleckas@teltonika.lt> Acked-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * 3c980-TX needs EXTRA_PREAMBLEGunnar Larisch2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ethernet card 3c980-TX needs a mdio_sync() to initialize the ethernet properly. This is forced by adding an EXTRA_PREAMBLE to its drv_flags. Without this, the driver did not reconnect after a link loss. Signed-off-by: Gunnar Larisch <Gunnar.Larisch@gmx.de> Acked-by: Steffen Klassert <klassert@mathematik.tu-chemnitz.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * Merge branch 'for-2.6.26' of ↵Jeff Garzik2008-05-06
| | |\ | | | | | | | | | | | | git://git.farnsworth.org/dale/linux-2.6-mv643xx_eth into upstream
| | | * mv643xx_eth: inter-mv643xx SMI port sharingLennert Buytenhek2008-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There exist chips with up to four mv643xx_eth silicon blocks but only one external SMI (MII management) interface -- the SMI logic of the first block is shared by all the blocks. Handle this by allowing a per-port override of which mv643xx_eth_shared's SMI registers (and spinlock) to use. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Acked-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
| | | * mv643xx_eth: shorten shared platform driver nameLennert Buytenhek2008-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the MV643XX_ETH_SHARED_NAME platform driver name to something shorter than 19 characters, so that we can register multiple (otherwise we end up with sysfs conflicts since all instances will map to "mv643xx_eth_shared." as there is a 20-char sysfs file name limit.) Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Acked-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
| | | * mv643xx_eth: configurable t_clkLennert Buytenhek2008-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make t_clk configurable via platform device data (with the current hardcoded value, 133 MHz, being the default), as it varies across different chip families. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Acked-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
| | | * mv643xx_eth: mbus decode window supportLennert Buytenhek2008-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make it possible to pass mbus_dram_target_info to the mv643xx_eth driver via the platform data, and make the mv643xx_eth driver program the window registers based on this data if it is passed in. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Reviewed-by: Tzachi Perelstein <tzachi@marvell.com> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
| | | * mv643xx_eth: get rid of static variables, allow multiple instancesLennert Buytenhek2008-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move mv643xx_eth's static state (ethernet register block base address and MII management interface spinlock) into a struct hanging off the shared platform device. This is necessary to support chips that contain multiple mv643xx_eth silicon blocks. Signed-off-by: Lennert Buytenhek <buytenh@marvell.com> Acked-by: Nicolas Pitre <nico@marvell.com> Signed-off-by: Dale Farnsworth <dale@farnsworth.org>
| | * | fix warning in drivers/net/appletalk/cops.cJeff Garzik2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | drivers/net/appletalk/cops.c: In function ‘cops_reset’: drivers/net/appletalk/cops.c:507: warning: comparison of distinct pointer types lacks a cast by replacing hand-woven msleep() with call to msleep() Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | e1000e: Add support for BM PHYs on ICH9Bruce Allan2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the BM PHY, a new PHY model being used on ICH9-based implementations. This new PHY exposes issues in the ICH9 silicon when receiving jumbo frames large enough to use more than a certain part of the Rx FIFO, and this unfortunately breaks packet split jumbo receives. For this reason we re-introduce (for affected adapters only) the jumbo single-skb receive routine back so that people who do wish to use jumbo frames on these ich9 platforms can do so. Part of this problem has to do with CPU sleep states and to make sure that all the wake up timings are correctly we force them with the recently merged pm_qos infrastructure written by Mark Gross. (See http://lkml.org/lkml/2007/10/4/400). To make code read a bit easier we introduce a _IS_ICH flag so that we don't need to do mac type checks over the code. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | uli526x: fix endianness issues in the setup frameAnton Vorontsov2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes uli526x driver's issues on a PowerPC boards: uli chip is unable to receive the packets. It appears that send_frame_filter prepares the setup frame in the endianness unsafe manner. On a big endian machines we should shift the address nibble by two bytes. Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | uli526x: initialize the hardware prior to requesting interruptsAnton Vorontsov2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The firmware on MPC8610HPCD boards enables ULI ethernet and leaves it in some funky state before booting Linux. For drivers, it's always good idea to (re)initialize the hardware prior to requesting interrupts. This patch fixes the following oops: Oops: Kernel access of bad area, sig: 11 [#1] MPC86xx HPCD NIP: c0172820 LR: c017287c CTR: 00000000 [...] NIP [c0172820] allocate_rx_buffer+0x2c/0xb0 LR [c017287c] allocate_rx_buffer+0x88/0xb0 Call Trace: [df82bdc0] [c017287c] allocate_rx_buffer+0x88/0xb0 (unreliable) [df82bde0] [c0173000] uli526x_interrupt+0xe4/0x49c [df82be20] [c0045418] request_irq+0xf0/0x114 [df82be50] [c01737b0] uli526x_open+0x48/0x160 [df82be70] [c0201184] dev_open+0xb0/0xe8 [df82be80] [c0200104] dev_change_flags+0x90/0x1bc [df82bea0] [c035fab0] ip_auto_config+0x214/0xef4 [df82bf60] [c03421c8] kernel_init+0xc4/0x2ac [df82bff0] [c0010834] kernel_thread+0x44/0x60 Instruction dump: 4e800020 9421ffe0 7c0802a6 bfa10014 7c7e1b78 90010024 80030060 83e30054 2b80002f 419d0078 3fa0c039 48000058 <907f0010> 80630088 2f830000 419e0014 Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | ucc_geth: Fix a bunch of sparse warningsAndy Fleming2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ucc_geth didn't have anything marked as __iomem. It was also inconsistent with its use of in/out accessors (using them sometimes, not using them other times). Cleaning this up cuts the warnings down from hundreds to just over a dozen. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | phylib: Fix some sparse warningsAndy Fleming2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Declared some things static, declared some things in the header. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | gianfar: Fix a locking bug in gianfar's sysfs codeAndy Fleming2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During sparse cleanup, found a locking bug. Some of the sysfs functions were acquiring a lock, and then returning in the event of an error. We rearrange the code so that the lock is released in error conditions, too. Signed-off-by: Andy Fleming <afleming@freescale.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | bonding: fix enslavement error unwindsJay Vosburgh2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of: commit c2edacf80e155ef54ae4774379d461b60896bc2e Author: Jay Vosburgh <fubar@us.ibm.com> Date: Mon Jul 9 10:42:47 2007 -0700 bonding / ipv6: no addrconf for slaves separately from master two steps were rearranged in the enslavement process: netdev_set_master is now before the call to dev_open to open the slave. This patch updates the error cases and unwind process at the end of bond_enslave to match the new order. Without this patch, it is possible for the enslavement to fail, but leave the slave with IFF_SLAVE set in its flags. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | bonding: Deadlock between bonding_store_bonds and bond_destroy_sysfs.Pavel Emelyanov2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sysfs layer has an internal protection, that ensures, that all the process sitting inside ->sore/->show callback exits before the appropriate entry is unregistered (the calltraces are rather big, but I can provide them if required). On the other hand, bonding takes rtnl_lock in a) the bonding_store_bonds, i.e. in ->store callback, b) module exit before calling the sysfs unregister routines. Thus, the classical AB-BA deadlock may occur. To reproduce run # while :; do modprobe bonding; rmmod bonding; done and # while :; do echo '+bond%d' > /sys/class/net/bonding_masters ; done in parallel. The fix is to move the bond_destroy_sysfs out of the rtnl_lock, but _before_ bond_free_all to make sure no bonding devices exist after module unload. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | bonding: fix error unwind in bonding_store_bondsJay Vosburgh2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixed an error unwind in bonding_store_bonds that didn't release the locks it held, and consolidated unwinds into a common block at the end of the function. Bug reported by Pavel Emelyanov <xemul@openvz.org>, who provided a different fix. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| | * | bonding: Do not call free_netdev for already registered device.Pavel Emelyanov2008-05-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the call to bond_create_sysfs_entry in bond_create fails, the proper rollback is to call unregister_netdevice, not free_netdev. Otherwise - kernel BUG at net/core/dev.c:4057! Checked with artificial failures injected into bond_create_sysfs_entry. Pavel's original patch modified by Jay Vosburgh to move code around for clarity (remove goto-hopping within the unwind block). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| * | | netns: Fix arbitrary net_device-s corruptions on net_ns stop.Pavel Emelyanov2008-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a net namespace is destroyed, some devices (those, not killed on ns stop explicitly) are moved back to init_net. The problem, is that this net_ns change has one point of failure - the __dev_alloc_name() may be called if a name collision occurs (and this is easy to trigger). This allocator performs a likely-to-fail GFP_ATOMIC allocation to find a suitable number. Other possible conditions that may cause error (for device being ns local or not registered) are always false in this case. So, when this call fails, the device is unregistered. But this is *not* the right thing to do, since after this the device may be released (and kfree-ed) improperly. E. g. bridges require more actions (sysfs update, timer disarming, etc.), some other devices want to remove their private areas from lists, etc. I. e. arbitrary use-after-free cases may occur. The proposed fix is the following: since the only reason for the dev_change_net_namespace to fail is the name generation, we may give it a unique fall-back name w/o %d-s in it - the dev<ifindex> one, since ifindexes are still unique. So make this change, raise the failure-case printk loglevel to EMERG and replace the unregister_netdevice call with BUG(). [ Use snprintf() -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netfilter: Kconfig: default DCCP/SCTP conntrack support to the protocol ↵Patrick McHardy2008-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | config values When conntrack and DCCP/SCTP protocols are enabled, chances are good that people also want DCCP/SCTP conntrack and NAT support. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netfilter: nf_conntrack_sip: restrict RTP expect flushing on error to last ↵Patrick McHardy2008-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | request Some Inovaphone PBXs exhibit very stange behaviour: when dialing for example "123", the device sends INVITE requests for "1", "12" and "123" back to back. The first requests will elicit error responses from the receiver, causing the SIP helper to flush the RTP expectations even though we might still see a positive response. Note the sequence number of the last INVITE request that contained a media description and only flush the expectations when receiving a negative response for that sequence number. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | macvlan: Fix memleak on device removal/crash on module removalPatrick McHardy2008-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As noticed by Ben Greear, macvlan crashes the kernel when unloading the module. The reason is that it tries to clean up the macvlan_port pointer on the macvlan device itself instead of the underlying device. A non-NULL pointer is taken as indication that the macvlan_handle_frame_hook is valid, when receiving the next packet on the underlying device it tries to call the NULL hook and crashes. Clean up the macvlan_port on the correct device to fix this. Signed-off-by; Patrick McHardy <kaber@trash.net> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net/ipv4: correct RFC 1122 section reference in commentJ.H.M. Dassen (Ray)2008-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC 1122 does not have a section 3.1.2.2. The requirement to silently discard datagrams with a bad checksum is in section 3.2.1.2 instead. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=10611 Signed-off-by: J.H.M. Dassen (Ray) <jdassen@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | tcp FRTO: SACK variant is errorneously used with NewRenoIlpo Järvinen2008-05-08
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note: there's actually another bug in FRTO's SACK variant, which is the causing failure in NewReno case because of the error that's fixed here. I'll fix the SACK case separately (it's a separate bug really, though related, but in order to fix that I need to audit tp->snd_nxt usage a bit). There were two places where SACK variant of FRTO is getting incorrectly used even if SACK wasn't negotiated by the TCP flow. This leads to incorrect setting of frto_highmark with NewReno if a previous recovery was interrupted by another RTO. An eventual fallback to conventional recovery then incorrectly considers one or couple of segments as forward transmissions though they weren't, which then are not LOST marked during fallback making them "non-retransmittable" until the next RTO. In a bad case, those segments are really lost and are the only one left in the window. Thus TCP needs another RTO to continue. The next FRTO, however, could again repeat the same events making the progress of the TCP flow extremely slow. In order for these events to occur at all, FRTO must occur again in FRTOs step 3 while the key segments must be lost as well, which is not too likely in practice. It seems to most frequently with some small devices such as network printers that *seem* to accept TCP segments only in-order. In cases were key segments weren't lost, things get automatically resolved because those wrongly marked segments don't need to be retransmitted in order to continue. I found a reproducer after digging up relevant reports (few reports in total, none at netdev or lkml I know of), some cases seemed to indicate middlebox issues which seems now to be a false assumption some people had made. Bugzilla #10063 _might_ be related. Damon L. Chesser <damon@damtek.com> had a reproducable case and was kind enough to tcpdump it for me. With the tcpdump log it was quite trivial to figure out. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds2008-05-08
|\ \ \ | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc: Fix SA_ONSTACK signal handling.
| * | | sparc: Fix SA_ONSTACK signal handling.David S. Miller2008-05-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to be more liberal about the alignment of the buffer given to us by sigaltstack(). The user should not need to be mindful of all of the alignment constraints we have for the stack frame. This mirrors how we handle this situation in clone() as well. Also, we align the stack even in non-SA_ONSTACK cases so that signals due to bad stack alignment can be delivered properly. This makes such errors easier to debug and recover from. Finally, add the sanity check x86 has to make sure we won't overflow the signal stack. This fixes glibc testcases nptl/tst-cancel20.c and nptl/tst-cancelx20.c Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Revert "PCI: remove default PCI expansion ROM memory allocation"Linus Torvalds2008-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 9f8daccaa05c14e5643bdd4faf5aed9cc8e6f11e, which was reported to break X startup (xf86-video-ati-6.8.0). See http://bugs.freedesktop.org/show_bug.cgi?id=15523 for details. Reported-by: Laurence Withers <l@lwithers.me.uk> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2008-05-08
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-fixes: sched: fix weight calculations semaphore: fix
| * | | | sched: fix weight calculationsMike Galbraith2008-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion between virtual and real time is as follows: dvt = rw/w * dt <=> dt = w/rw * dvt Since we want the fair sleeper granularity to be in real time, we actually need to do: dvt = - rw/w * l This bug could be related to the regression reported by Yanmin Zhang: | Comparing with kernel 2.6.25, sysbench+mysql(oltp, readonly) has lots | of regressions with 2.6.26-rc1: | | 1) 8-core stoakley: 28%; | 2) 16-core tigerton: 20%; | 3) Itanium Montvale: 50%. Reported-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | semaphore: fixIngo Molnar2008-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yanmin Zhang reported: | Comparing with kernel 2.6.25, AIM7 (use tmpfs) has more th | regression under 2.6.26-rc1 on my 8-core stoakley, 16-core tigerton, | and Itanium Montecito. Bisect located the patch below: | | 64ac24e738823161693bf791f87adc802cf529ff is first bad commit | commit 64ac24e738823161693bf791f87adc802cf529ff | Author: Matthew Wilcox <matthew@wil.cx> | Date: Fri Mar 7 21:55:58 2008 -0500 | | Generic semaphore implementation | | After I manually reverted the patch against 2.6.26-rc1 while fixing | lots of conflicts/errors, aim7 regression became less than 2%. i reproduced the AIM7 workload and can confirm Yanmin's findings that -.26-rc1 regresses over .25 - by over 67% here. Looking at the workload i found and fixed what i believe to be the real bug causing the AIM7 regression: it was inefficient wakeup / scheduling / locking behavior of the new generic semaphore code, causing suboptimal performance. The problem comes from the following code. The new semaphore code does this on down(): spin_lock_irqsave(&sem->lock, flags); if (likely(sem->count > 0)) sem->count--; else __down(sem); spin_unlock_irqrestore(&sem->lock, flags); and this on up(): spin_lock_irqsave(&sem->lock, flags); if (likely(list_empty(&sem->wait_list))) sem->count++; else __up(sem); spin_unlock_irqrestore(&sem->lock, flags); where __up() does: list_del(&waiter->list); waiter->up = 1; wake_up_process(waiter->task); and where __down() does this in essence: list_add_tail(&waiter.list, &sem->wait_list); waiter.task = task; waiter.up = 0; for (;;) { [...] spin_unlock_irq(&sem->lock); timeout = schedule_timeout(timeout); spin_lock_irq(&sem->lock); if (waiter.up) return 0; } the fastpath looks good and obvious, but note the following property of the contended path: if there's a task on the ->wait_list, the up() of the current owner will "pass over" ownership to that waiting task, in a wake-one manner, via the waiter->up flag and by removing the waiter from the wait list. That is all and fine in principle, but as implemented in kernel/semaphore.c it also creates a nasty, hidden source of contention! The contention comes from the following property of the new semaphore code: the new owner owns the semaphore exclusively, even if it is not running yet. So if the old owner, even if just a few instructions later, does a down() [lock_kernel()] again, it will be blocked and will have to wait on the new owner to eventually be scheduled (possibly on another CPU)! Or if another task gets to lock_kernel() sooner than the "new owner" scheduled, it will be blocked unnecessarily and for a very long time when there are 2000 tasks running. I.e. the implementation of the new semaphores code does wake-one and lock ownership in a very restrictive way - it does not allow opportunistic re-locking of the lock at all and keeps the scheduler from picking task order intelligently. This kind of scheduling, with 2000 AIM7 processes running, creates awful cross-scheduling between those 2000 tasks, causes reduced parallelism, a throttled runqueue length and a lot of idle time. With increasing number of CPUs it causes an exponentially worse behavior in AIM7, as the chance for a newly woken new-owner task to actually run anytime soon is less and less likely. Note that it takes just a tiny bit of contention for the 'new-semaphore catastrophy' to happen: the wakeup latencies get added to whatever small contention there is, and quickly snowball out of control! I believe Yanmin's findings and numbers support this analysis too. The best fix for this problem is to use the same scheduling logic that the kernel/mutex.c code uses: keep the wake-one behavior (that is OK and wanted because we do not want to over-schedule), but also allow opportunistic locking of the lock even if a wakee is already "in flight". The patch below implements this new logic. With this patch applied the AIM7 regression is largely fixed on my quad testbox: # v2.6.25 vanilla: .................. Tasks Jobs/Min JTI Real CPU Jobs/sec/task 2000 56096.4 91 207.5 789.7 0.4675 2000 55894.4 94 208.2 792.7 0.4658 # v2.6.26-rc1-166-gc0a1811 vanilla: ................................... Tasks Jobs/Min JTI Real CPU Jobs/sec/task 2000 33230.6 83 350.3 784.5 0.2769 2000 31778.1 86 366.3 783.6 0.2648 # v2.6.26-rc1-166-gc0a1811 + semaphore-speedup: ............................................... Tasks Jobs/Min JTI Real CPU Jobs/sec/task 2000 55707.1 92 209.0 795.6 0.4642 2000 55704.4 96 209.0 796.0 0.4642 i.e. a 67% speedup. We are now back to within 1% of the v2.6.25 performance levels and have zero idle time during the test, as expected. Btw., interactivity also improved dramatically with the fix - for example console-switching became almost instantaneous during this workload (which after all is running 2000 tasks at once!), without the patch it was stuck for a minute at times. There's another nice side-effect of this speedup patch, the new generic semaphore code got even smaller: text data bss dec hex filename 1241 0 0 1241 4d9 semaphore.o.before 1207 0 0 1207 4b7 semaphore.o.after (because the waiter.up complication got removed.) Longer-term we should look into using the mutex code for the generic semaphore code as well - but i's not easy due to legacies and it's outside of the scope of v2.6.26 and outside the scope of this patch as well. Bisected-by: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>