Commit message (Author, Age)
* memcg: fix mem_cgroup_shrink_usage() (Daisuke Nishimura, 2009-05-02)
Current mem_cgroup_shrink_usage() has two problems.

1. It doesn't call mem_cgroup_out_of_memory() and doesn't update last_oom_jiffies, so pagefault_out_of_memory() invokes the global OOM killer.
2. Considering the hierarchy, shrinking has to be done from mem_over_limit, not from the memcg to which the page would be charged.

mem_cgroup_try_charge_swapin() does all of these things properly, so we use it and call cancel_charge_swapin() when it succeeds. The name "shrink_usage" is not appropriate for this behavior, so we change it too.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.cn>
Cc: Paul Menage <menage@google.com>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
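Point 2 above can be illustrated with a minimal sketch, assuming a simplified hierarchy model (struct memcg and find_mem_over_limit() are hypothetical names, not the kernel's data structures): the group to reclaim from is the first ancestor whose limit the charge would exceed, which need not be the group the page is charged to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified memory-cgroup node: a charge must fit the
 * limit of the group and of every ancestor. */
struct memcg {
	struct memcg *parent;   /* NULL for the root */
	unsigned long usage;    /* pages currently charged */
	unsigned long limit;    /* maximum pages allowed */
};

/* Walk from @leaf towards the root and return the first group that cannot
 * absorb @nr more pages (the "mem_over_limit" to shrink), or NULL if the
 * charge fits everywhere. */
static struct memcg *find_mem_over_limit(struct memcg *leaf, unsigned long nr)
{
	for (struct memcg *m = leaf; m != NULL; m = m->parent)
		if (m->usage + nr > m->limit)
			return m;
	return NULL;
}
```

With a nearly full parent and a roomy child, reclaiming from the child would free nothing useful; the walk correctly points at the parent.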
* pagemap: require aligned-length, non-null reads of /proc/pid/pagemap (Vitaly Mayatskikh, 2009-05-02)
The intention of commit aae8679b0ebcaa92f99c1c3cb0cd651594a43915 ("pagemap: fix bug in add_to_pagemap, require aligned-length reads of /proc/pid/pagemap") was to force reads of /proc/pid/pagemap to be a multiple of 8 bytes, but it now allows reading 0 bytes, which actually puts some data into the user's buffer. According to POSIX, if count is zero, read() should return zero and have no other results.

Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
Cc: Thomas Tuttle <ttuttle@google.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
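The argument checks described above can be sketched as follows, assuming 8-byte entries; pagemap_read_len() is a hypothetical helper, not the kernel's pagemap_read(). Misaligned positions or lengths are rejected, and a zero-length read returns 0 before anything touches the user buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PM_ENTRY_BYTES 8   /* each pagemap entry is a 64-bit value */

/* Hypothetical prologue for a pagemap-style read(): returns -EINVAL for a
 * misaligned request, 0 for a zero-length read (POSIX: return zero, no
 * other effects), otherwise the number of bytes the read may fill. */
static long pagemap_read_len(size_t count, long long pos)
{
	if (pos % PM_ENTRY_BYTES || count % PM_ENTRY_BYTES)
		return -EINVAL;
	if (count == 0)
		return 0;          /* do not write anything to the buffer */
	return (long)count;
}
```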
* mm: close page_mkwrite races (Nick Piggin, 2009-05-02)
Change page_mkwrite to allow implementations to return with the page locked, and also change its callers (in page fault paths) to hold the lock until the page is marked dirty. This allows the filesystem to have full control of page dirtying events coming from the VM.

Rather than simply hold the page locked over the page_mkwrite call, we call page_mkwrite with the page unlocked and allow callers to return with it locked, so filesystems can avoid LOR (lock order reversal) conditions with the page lock.

The problem with the current scheme is this: a filesystem that wants to associate some metadata with a page as long as the page is dirty will perform this manipulation in its ->page_mkwrite. It currently then must return with the page unlocked and may not hold any other locks (according to the existing page_mkwrite convention). In this window, the VM could write out the page, clearing page-dirty. The filesystem has no good way to detect that a dirty pte is about to be attached, so it will happily write out the page, at which point the filesystem may manipulate the metadata to reflect that the page is no longer dirty.

It is not always possible to perform the required metadata manipulation in ->set_page_dirty, because that function cannot block or fail. The filesystem may need to allocate some data structure, for example.

And the VM cannot mark the pte dirty before page_mkwrite, because page_mkwrite is allowed to fail, so we must not allow any window where the page could be written to if page_mkwrite does fail.

This solution of holding the page locked over the 3 critical operations (page_mkwrite, setting the pte dirty, and finally setting the page dirty) closes out races nicely, preventing page cleaning for writeout from being initiated in that window. This provides the filesystem with a strong synchronisation against the VM here.

- Sage needs this race closed for the ceph filesystem.
- Trond for NFS (http://bugzilla.kernel.org/show_bug.cgi?id=12913).
- I need it for fsblock.
- I suspect other filesystems may need it too (eg. btrfs).
- I have converted buffer.c to the new locking. Even simple block allocation under dirty pages might be susceptible to i_size changing under a partial page at the end of file (we also have a buffer.c-side problem here, but it cannot be fixed properly without this patch).
- Other filesystems (eg. NFS, maybe btrfs) will need to change their page_mkwrite functions themselves.

[ This also moves page_mkwrite another step closer to fault, which should eventually allow page_mkwrite to be moved into ->fault, thus avoiding a filesystem calldown and page lock/unlock cycle in __do_fault. ]

[akpm@linux-foundation.org: fix derefs of NULL ->mapping]
Cc: Sage Weil <sage@newdream.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
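The locking protocol described above can be modelled in userspace as a rough sketch. Everything here is hypothetical: a spinning atomic flag stands in for lock_page()/unlock_page(), and struct page only carries the three pieces of state the commit talks about. The point is that the lock is held from page_mkwrite until both the pte and the page are dirty, and writeback takes the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct page {
	atomic_int locked;   /* 0 = unlocked, 1 = locked */
	bool dirty;          /* page-level dirty bit */
	bool pte_dirty;      /* stands in for the dirty pte */
	bool fs_meta;        /* fs state that must exist while the page is dirty */
};

static void lock_page(struct page *p)
{
	while (atomic_exchange(&p->locked, 1))
		;   /* spin until the previous holder releases it */
}

static void unlock_page(struct page *p)
{
	atomic_store(&p->locked, 0);
}

/* Filesystem hook under the new convention: entered unlocked, returns 0
 * with the page locked (or an error with the page unlocked). */
static int fs_page_mkwrite(struct page *p)
{
	lock_page(p);
	p->fs_meta = true;   /* e.g. reserve blocks backing the dirty page */
	return 0;
}

/* Write fault: page_mkwrite, pte dirty and page dirty all happen under
 * one hold of the lock, so writeback cannot clean the page in between. */
static int write_fault(struct page *p)
{
	int err = fs_page_mkwrite(p);
	if (err)
		return err;      /* the pte never became writable */
	p->pte_dirty = true;
	p->dirty = true;
	unlock_page(p);
	return 0;
}

/* Writeback takes the same lock, so it sees all three updates or none. */
static void writeback(struct page *p)
{
	lock_page(p);
	if (p->dirty) {
		p->dirty = false;
		p->pte_dirty = false;   /* simplification: write-protect the pte */
		p->fs_meta = false;     /* fs drops metadata for the clean page */
	}
	unlock_page(p);
}
```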
* atomic: fix atomic_long_cmpxchg/xchg for 64 bit architectures (Heiko Carstens, 2009-05-02)
On a linux-next allyesconfig build:

kernel/trace/ring_buffer.c:1726: warning: passing argument 1 of 'atomic_cmpxchg' from incompatible pointer type
linux-next/arch/s390/include/asm/atomic.h:112: note: expected 'struct atomic_t *' but argument is of type 'struct atomic64_t *'

atomic_long_cmpxchg and atomic_long_xchg are incorrectly defined for 64-bit architectures. They should be mapped to the atomic64_* variants.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
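A userspace sketch of the intended mapping, assuming C11 atomics as stand-ins for the kernel's atomic_t and atomic64_t (these typedefs are simplified illustrations, not the kernel's definitions): when long is 64 bits wide, atomic_long_t and atomic_long_cmpxchg must resolve to the 64-bit variants, otherwise the pointer types disagree as in the warning above. atomic_long_xchg would be mapped the same way.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types. */
typedef struct { _Atomic int32_t counter; } atomic_t;
typedef struct { _Atomic int64_t counter; } atomic64_t;

/* Both return the value observed before the exchange, like the kernel API. */
static int32_t atomic_cmpxchg(atomic_t *v, int32_t old, int32_t new)
{
	atomic_compare_exchange_strong(&v->counter, &old, new);
	return old;   /* on failure, C11 has written the current value here */
}

static int64_t atomic64_cmpxchg(atomic64_t *v, int64_t old, int64_t new)
{
	atomic_compare_exchange_strong(&v->counter, &old, new);
	return old;
}

#if LONG_MAX > INT_MAX
/* 64-bit long: atomic_long_* must use the atomic64_* variants. */
typedef atomic64_t atomic_long_t;
#define atomic_long_cmpxchg(v, o, n) atomic64_cmpxchg((v), (o), (n))
#else
typedef atomic_t atomic_long_t;
#define atomic_long_cmpxchg(v, o, n) atomic_cmpxchg((v), (o), (n))
#endif
```

Only one of the two helpers is used on a given build, which is the whole point: the choice is made from the width of long, not hard-coded.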
* CRISv10: fix serial driver proc-usage (Jesper Nilsson, 2009-05-02)
drivers/serial/crisv10.c:4428: error: unknown field 'read_proc' specified in initializer

Commit 0f043a81ebe84be3576667f04fdda481609e3816 ("proc tty: remove struct tty_operations::read_proc") removes the read_proc entry from struct tty_operations. Rework the proc handling in the CRISv10 serial driver to use proc_fops instead.

Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* MAINTAINERS: add ptrace entry (Christoph Hellwig, 2009-05-02)
Add Roland and Oleg as formal ptrace maintainers; they've been doing the job for a while. Includes the file patterns requested by Joe.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: fix try_get_mem_cgroup_from_swapcache() (Daisuke Nishimura, 2009-05-02)
This is a bugfix for commit 3c776e64660028236313f0e54f3a9945764422df ("memcg: charge swapcache to proper memcg").

The used bit of a swapcache page is stable under the page lock, but, considering move_account, pc->mem_cgroup is not. We need lock_page_cgroup() anyway.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* MAINTAINERS: Florian has moved (Florian Fainelli, 2009-05-02)
I will finish school soon, so replace my student address with this one.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* autofs4: fix incorrect return in autofs4_mount_busy() (Ian Kent, 2009-05-02)
Fix an obviously incorrect return status in autofs4_mount_busy().

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: fix pageref leak in do_swap_page() (Johannes Weiner, 2009-05-02)
By the time the memory cgroup code is notified about a swapin, we already hold a reference on the fault page.

If the cgroup callback fails, make sure to unlock AND release the page reference which was taken by lookup_swap_cache(), or we leak the reference.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
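The error path described above, as a minimal sketch with a hypothetical page model (struct page, lookup_and_lock() and charge_swapin() are illustrative stand-ins, not the kernel's functions): the lookup takes a reference and the lock, so a failed charge must undo both.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page model: a reference count and a lock bit. */
struct page {
	int refcount;
	bool locked;
};

/* Stand-in for lookup_swap_cache() followed by lock_page(): the caller
 * now owns one reference and the lock. */
static struct page *lookup_and_lock(struct page *p)
{
	p->refcount++;
	p->locked = true;
	return p;
}

static void unlock_page(struct page *p) { p->locked = false; }
static void put_page(struct page *p) { p->refcount--; }

/* Stand-in for the memory cgroup swapin charge; fails on request. */
static int charge_swapin(bool fail) { return fail ? -ENOMEM : 0; }

static int do_swap_page(struct page *cached, bool charge_fails)
{
	struct page *page = lookup_and_lock(cached);

	if (charge_swapin(charge_fails)) {
		unlock_page(page);
		put_page(page);   /* without this, the lookup reference leaks */
		return -ENOMEM;
	}
	/* ... install the pte; the reference now belongs to the mapping ... */
	unlock_page(page);
	return 0;
}
```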
* acpica: validate package element more carefully in acpi_rs_get_pci_routing_table_length (Robert Moore, 2009-05-02)
acpi_rs_get_pci_routing_table_length is not performing sufficient validation on the package returned from _PRT. It assumes a package of packages and fails/faults if this is not the case.

We should validate each subpackage when extracted from the parent package, and not accept objects of the wrong type, since that will just cause the scanning to fail (likely with a kernel oops).

This can only happen with a serious BIOS bug, and is accompanied by a warning like this:

ACPI Warning (nspredef-0949): \_SB_.PCI0.PEG4._PRT: Return Package type mismatch at index 0 - found Integer, expected Package [20090320]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
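The per-element check can be sketched with a hypothetical, simplified object model (struct obj and prt_length() are not ACPICA's types or functions): every element of the _PRT package is checked to be a package before it is used, instead of assuming the type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ACPI object model. */
enum obj_type { OBJ_INTEGER, OBJ_PACKAGE };

struct obj {
	enum obj_type type;
	size_t count;               /* number of elements if OBJ_PACKAGE */
	const struct obj *elements; /* element array if OBJ_PACKAGE */
};

/* Count routing entries in a package of packages. Returns -1 as soon as
 * an element has the wrong type ("found Integer, expected Package"). */
static long prt_length(const struct obj *prt)
{
	if (prt->type != OBJ_PACKAGE)
		return -1;
	for (size_t i = 0; i < prt->count; i++)
		if (prt->elements[i].type != OBJ_PACKAGE)
			return -1;   /* reject instead of dereferencing garbage */
	return (long)prt->count;
}
```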
* Linux 2.6.30-rc4 [tag v2.6.30-rc4] (Linus Torvalds, 2009-04-30)
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 (Linus Torvalds, 2009-04-30)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: Fix min function comparison warning
  ecryptfs: fix printk format warning
| * eCryptfs: Fix min function comparison warning (Tyler Hicks, 2009-04-27)
This warning shows up on 64-bit builds:

fs/ecryptfs/inode.c:693: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
| * ecryptfs: fix printk format warning (Randy Dunlap, 2009-04-27)
fs/ecryptfs/inode.c:670: warning: format '%d' expects type 'int', but argument 3 has type 'size_t'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Cc: Dustin Kirkland <kirkland@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 (Linus Torvalds, 2009-04-30)
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6:
  V4L/DVB (11652): au0828: fix kernel oops regression on USB disconnect.
  V4L/DVB (11626): cx23885: Two fixes for DViCO FusionHDTV DVB-T Dual Express
  V4L/DVB (11612): mx3_camera: Fix compilation with CONFIG_PM
  V4L/DVB (11570): patch: s2255drv: fix race condition on set mode
  V4L/DVB (11568): cx18: Fix the handling of i2c bus registration error
  V4L/DVB (11561a): move media after i2c
  V4L/DVB (11516): drivers/media/video/saa5246a.c: fix use-after-free
  V4L/DVB (11515): drivers/media/video/saa5249.c: fix use-after-free and leak
  V4L/DVB (11494a): cx231xx Kconfig fixes
  V4L/DVB (11494): cx18: Send correct input routing value to external audio multiplexers
| * V4L/DVB (11652): au0828: fix kernel oops regression on USB disconnect. (Devin Heitmueller, 2009-04-29)
A regression was introduced in hg changeset 33810c734a0d, which resulted in a kernel panic whenever the device was disconnected from USB.

The call to v4l2_device_register() was overwriting the pointer stored with usb_set_intfdata(), so when au0828_usb_disconnect() was called, usb_get_intfdata() returned a pointer to the v4l2_device instead of the au0828_dev structure.

Signed-off-by: Devin Heitmueller <dheitmueller@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
| * V4L/DVB (11626): cx23885: Two fixes for DViCO FusionHDTV DVB-T Dual Express (Christopher Pascoe, 2009-04-29)
Two fixes for DViCO FusionHDTV DVB-T Dual Express:
* Reset the correct tuner when reinitializing xc3028.
* Disable the I2C gate control to avoid locking up the I2C bus.

Tested-by: John Knops <jknops@australiaonline.net.au>
Reviewed-by: Steven Toth <stoth@linuxtv.org>
Signed-off-by: Christopher Pascoe <linuxdvb@itee.uq.edu.au>
Signed-off-by: Devin Heitmueller <dheitmueller@linuxtv.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
| * V4L/DVB (11612): mx3_camera: Fix compilation with CONFIG_PM (Sascha Hauer, 2009-04-29)
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
| * V4L/DVB (11570): patch: s2255drv: fix race condition on set mode (Dean Anderson, 2009-04-29)
The set_modeready flag must be set before the command is sent to USB in s2255_write_config.

Signed-off-by: Dean Anderson <dean@sensoray.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
| * V4L/DVB (11568): cx18: Fix the handling of i2c bus registration error (Jean Delvare, 2009-04-29)
* Return actual error values as returned by the i2c subsystem, rather than 0 or 1.
* If the registration of the second bus fails, unregister the first one before exiting; otherwise we are leaking resources.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
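Both points can be sketched as the usual register-then-unwind pattern; register_bus(), unregister_bus(), init_i2c() and the fail_bus knob are hypothetical stand-ins for the driver and the i2c subsystem, not cx18 code. The real error code is propagated, and a failure on the second bus undoes the first registration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical bus registry; register_bus() returns 0 or a negative errno. */
static bool bus_registered[2];
static int fail_bus = -1;   /* index of the bus whose registration fails */

static int register_bus(int i)
{
	if (i == fail_bus)
		return -EBUSY;
	bus_registered[i] = true;
	return 0;
}

static void unregister_bus(int i) { bus_registered[i] = false; }

static int init_i2c(void)
{
	int err = register_bus(0);
	if (err)
		return err;          /* propagate the real error, not 1 */

	err = register_bus(1);
	if (err) {
		unregister_bus(0);   /* undo the first registration: no leak */
		return err;
	}
	return 0;
}
```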
| * V4L/DVB (11561a): move media after i2c (Guennadi Liakhovetski, 2009-04-29)
Currently drivers/media drivers are linked very early: directly after base, block, misc, and mfd, and before ata, scsi, ide, input, firewire, usb, and i2c. This breaks static builds of video4linux drivers that use generic CPU i2c adapter drivers and the v4l2-subdev subsystem, because during video4linux probing the v4l2-subdev core requires a struct i2c_adapter context, which cannot be satisfied before the i2c subsystem is initialised. Moving drivers/media after drivers/i2c fixes this problem.

The best way to trigger action is by submitting a patch :-) So, let's see what comes out of it. On the one hand, I don't see any reason why media has to be linked this early, and nobody was able to give me one yesterday when this problem was discussed on linux-media. On the other hand, maybe it would indeed be better to move i2c all the way up above media, but that would be a much bigger change, I think.

Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
| * V4L/DVB (11516): drivers/media/video/saa5246a.c: fix use-after-free (Dan Carpenter, 2009-04-29)
I lowered the kfree(t) down a couple of lines and removed the superfluous "t->vdev = NULL;".

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
| * V4L/DVB (11515): drivers/media/video/saa5249.c: fix use-after-free and leak (Dan Carpenter, 2009-04-29)
I moved the kfree() down a couple of lines. t->vdev is going to be in freed memory, so there is no point setting it to NULL. I added a kfree(t) on a

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
| * V4L/DVB (11494a): cx231xx Kconfig fixes (Mauro Carvalho Chehab, 2009-04-29)
Selecting the ALSA module breaks if !SND, so just remove the select.

While here, fix the whitespace in the Kconfig.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
| * V4L/DVB (11494): cx18: Send correct input routing value to external audio multiplexers (Andy Walls, 2009-04-29)
A late v4l2_subdev framework change accidentally sent the audio input routing value to the external multiplexer, instead of the muxer input routing value. This change corrects that error.

Signed-off-by: Andy Walls <awalls@radix.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 (Linus Torvalds, 2009-04-29)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (24 commits)
  e100: do not go D3 in shutdown unless system is powering off
  netfilter: revised locking for x_tables
  Bluetooth: Fix connection establishment with low security requirement
  Bluetooth: Add different pairing timeout for Legacy Pairing
  Bluetooth: Ensure that HCI sysfs add/del is preempt safe
  net: Avoid extra wakeups of threads blocked in wait_for_packet()
  net: Fix typo in net_device_ops description.
  ipv4: Limit size of route cache hash table
  Add reference to CAPI 2.0 standard
  Documentation/isdn/INTERFACE.CAPI
  update Documentation/isdn/00-INDEX
  ixgbe: Fix WoL functionality for 82599 KX4 devices
  veth: prevent oops caused by netdev destructor
  xfrm: wrong hash value for temporary SA
  forcedeth: tx timeout fix
  net: Fix LL_MAX_HEADER for CONFIG_TR_MODULE
  mlx4_en: Handle page allocation failure during receive
  mlx4_en: Fix cleanup flow on cq activation
  vlan: update vlan carrier state for admin up/down
  netfilter: xt_recent: fix stack overread in compat code
  ...
| * | e100: do not go D3 in shutdown unless system is powering off (Thadeu Lima de Souza Cascardo, 2009-04-29)
After experimenting with kexec with the latest merges after 2.6.29, I've had some problems when probing e100: it would not read the EEPROM. After some bisecting, I realized it has been like that since forever (at least 2.6.18). The problem is that shutdown does the same thing that suspend does and puts the device in the D3 state. I couldn't find a way to get the device back to a sane state in the probe function.

So, based on some similar patches from Rafael J. Wysocki for e1000, e1000e, and ixgbe, I wrote this one for e100.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
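The decision the patch introduces reduces to one check at shutdown time. This sketch uses hypothetical enums in place of the kernel's system_state and PCI power-state constants: the device is put into D3 only when the machine is really powering off, so a reboot (kexec, for example) can still probe it in D0.

```c
#include <assert.h>

/* Hypothetical stand-ins for the shutdown reason and PCI power states. */
enum sys_state { SYS_RUNNING, SYS_HALT, SYS_POWER_OFF, SYS_RESTART };
enum pci_power { PCI_D0, PCI_D3hot };

/* Power state to leave the NIC in at shutdown: D3 only on power-off. */
static enum pci_power shutdown_power_state(enum sys_state s)
{
	return s == SYS_POWER_OFF ? PCI_D3hot : PCI_D0;
}
```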
| * | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/holtmann/bluetooth-2.6 (David S. Miller, 2009-04-29)
| | * | Bluetooth: Fix connection establishment with low security requirement (Marcel Holtmann, 2009-04-28)
The Bluetooth 2.1 specification introduced four different security modes that can be mapped using Legacy Pairing and Simple Pairing. With the usage of Simple Pairing it is required that all connections (except the ones for SDP) are encrypted. So even the low security requirement mandates an encrypted connection when using Simple Pairing. When using Legacy Pairing (for Bluetooth 2.0 devices and older) this is not required, since it causes interoperability issues.

To support this properly, the low security requirement translates into different host controller transactions depending on whether Simple Pairing is supported. However, in the case of Simple Pairing, the command to switch on encryption after a successful authentication is not triggered for the low security mode. This patch fixes this and actually makes the logic to differentiate between Simple Pairing and Legacy Pairing a lot simpler.

Based on a report by Ville Tervo <ville.tervo@nokia.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
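The rule for the low security requirement can be sketched as a small predicate. The helper name is hypothetical; only the SDP exemption and the Simple Pairing / Legacy Pairing distinction come from the text above (PSM 0x0001 is SDP's fixed PSM).

```c
#include <assert.h>
#include <stdbool.h>

#define L2CAP_PSM_SDP 0x0001   /* SDP's fixed PSM */

/* Hypothetical helper: at the low security requirement, must encryption be
 * switched on after authentication completes? */
static bool low_sec_needs_encryption(bool simple_pairing, unsigned int psm)
{
	if (psm == L2CAP_PSM_SDP)
		return false;          /* SDP connections are exempt */
	return simple_pairing;     /* Legacy Pairing: not required */
}
```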
| | * | Bluetooth: Add different pairing timeout for Legacy Pairing (Marcel Holtmann, 2009-04-28)
The Bluetooth stack uses reference counting for all established ACL links, and if no user (L2CAP connection) is present, the link will be terminated to save power. The problematic part is dedicated pairing when using Legacy Pairing (Bluetooth 2.0 and before). At that point no user is present, and pairing attempts will be disconnected within 10 seconds or less. In previous kernel versions this was not a problem, since the disconnect timeout wasn't triggered on incoming connections the first time. However, this caused issues with broken host stacks that kept the connections around after dedicated pairing. When the support for Simple Pairing got added, the link establishment procedure needed to be changed, and it now causes issues when using Legacy Pairing.

When using Simple Pairing it is possible to do proper reference counting of ACL link users. With Legacy Pairing this is not possible, since the specification is unclear in some areas and too many broken Bluetooth devices have already been deployed. So instead of trying to deal with all the broken devices, a special pairing timeout will be introduced that increases the timeout to 60 seconds when pairing is triggered.

If a broken device now puts the stack into an unforeseen state, the worst that happens is that the disconnect timeout triggers after 120 seconds instead of 4 seconds. This allows successful pairings with legacy and broken devices now.

Based on a report by Johan Hedberg <johan.hedberg@nokia.com>

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| | * | Bluetooth: Ensure that HCI sysfs add/del is preempt safe (Roger Quadros, 2009-04-28)
Use different work_struct variables for add_conn() and del_conn(), and use a single work queue instead of two for adding and deleting connections.

It eliminates the following error on a preemptible kernel:

[ 204.358032] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
[ 204.370697] pgd = c0004000
[ 204.373443] [0000000c] *pgd=00000000
[ 204.378601] Internal error: Oops: 17 [#1] PREEMPT
[ 204.383361] Modules linked in: vfat fat rfcomm sco l2cap sd_mod scsi_mod iphb pvr2d drm omaplfb ps
[ 204.438537] CPU: 0 Not tainted (2.6.28-maemo2 #1)
[ 204.443664] PC is at klist_put+0x2c/0xb4
[ 204.447601] LR is at klist_put+0x18/0xb4
[ 204.451568] pc : [<c0270f08>] lr : [<c0270ef4>] psr: a0000113
[ 204.451568] sp : cf1b3f10 ip : cf1b3f10 fp : cf1b3f2c
[ 204.463104] r10: 00000000 r9 : 00000000 r8 : bf08029c
[ 204.468353] r7 : c7869200 r6 : cfbe2690 r5 : c78692c8 r4 : 00000001
[ 204.474945] r3 : 00000001 r2 : cf1b2000 r1 : 00000001 r0 : 00000000
[ 204.481506] Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 204.488861] Control: 10c5387d Table: 887fc018 DAC: 00000017
[ 204.494628] Process btdelconn (pid: 515, stack limit = 0xcf1b22e0)

Signed-off-by: Roger Quadros <ext-roger.quadros@nokia.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * | | netfilter: revised locking for x_tables (Stephen Hemminger, 2009-04-29)
The x_tables are organized with a table structure and per-cpu copies of the counters and rules. On older kernels there was a reader/writer lock per table, which was a performance bottleneck. In 2.6.30-rc, this was converted to use RCU for the counters/rules, which solved the performance problems for do_table but made replacing rules much slower because of the necessary RCU grace period.

This version uses a per-cpu set of spinlocks and counters to allow table processing to proceed without the cache thrashing of a global reader lock, and keeps the same performance for table updates.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
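The locking scheme can be sketched in userspace as follows, under loud assumptions: a fixed CPU count, an atomic-int spinlock, and one packet counter per CPU standing in for the per-cpu rule and counter copies (do_table() and sum_counters() here are illustrative, not the x_tables code). The fast path takes only its own CPU's lock; the slow path visits each CPU's lock in turn, with no grace period to wait for.

```c
#include <assert.h>
#include <stdatomic.h>

#define NCPUS 4   /* hypothetical fixed CPU count */

typedef struct { atomic_int locked; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
	while (atomic_exchange(&l->locked, 1))
		;   /* busy-wait until released */
}

static void spin_unlock(spinlock_t *l) { atomic_store(&l->locked, 0); }

/* One lock and counter set per CPU; static storage starts zeroed/unlocked. */
static struct { spinlock_t lock; unsigned long pkts; } percpu[NCPUS];

/* Fast path: only this CPU's lock and cache line are touched. */
static void do_table(int cpu)
{
	spin_lock(&percpu[cpu].lock);
	percpu[cpu].pkts++;          /* rule traversal would happen here */
	spin_unlock(&percpu[cpu].lock);
}

/* Slow path (reading counters, replacing rules): take each lock in turn. */
static unsigned long sum_counters(void)
{
	unsigned long total = 0;
	for (int cpu = 0; cpu < NCPUS; cpu++) {
		spin_lock(&percpu[cpu].lock);
		total += percpu[cpu].pkts;
		spin_unlock(&percpu[cpu].lock);
	}
	return total;
}
```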
| * | net: Avoid extra wakeups of threads blocked in wait_for_packet() (Eric Dumazet, 2009-04-28)
In 2.6.25 we added UDP mem accounting.

This unfortunately added a penalty when a frame is transmitted, since at TX completion time we have to call sock_wfree() to perform the necessary memory accounting. This calls sock_def_write_space() and ultimately the scheduler if any thread is waiting on the socket. Threads waiting for an incoming frame were scheduled, then had to sleep again because the event was meaningless.

(All threads waiting on a socket use the same sk_sleep anchor.)

This adds a lot of extra wakeups and increases latencies, as noted by Christoph Lameter, and slows down the softirq handler.

Reference: http://marc.info/?l=linux-netdev&m=124060437012283&w=2

Fortunately, Davide Libenzi recently added the concept of keyed wakeups to the kernel, particularly for sockets (see commit 37e5540b3c9d838eb20f2ca8ea2eb8072271e403, "epoll keyed wakeups: make sockets use keyed wakeups").

Davide's goal was to optimize epoll, but this new wakeup infrastructure can help non-epoll users as well, if they care to set up an appropriate handler.

This patch introduces a new DEFINE_WAIT_FUNC() helper and uses it in wait_for_packet(), so that only a relevant event can wake up a thread blocked in this function.

Trace of function calls from bnx2 TX completion bnx2_poll_work():
__kfree_skb()
skb_release_head_state()
sock_wfree()
sock_def_write_space()
__wake_up_sync_key()
__wake_up_common()
receiver_wake_function(): stops here, since the thread is waiting for INPUT

Reported-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
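The idea behind keyed wakeups can be shown with a miniature, hypothetical wait queue (struct waiter, wake_up_key() and the wake functions are not the kernel API, though the filter mirrors the POLLIN/POLLERR test described for receiver_wake_function()). Every waiter gets a wake function plus the event key; a receiver ignores pure write-space events, and a zero key means "no key supplied" and wakes unconditionally.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical miniature wait queue with per-waiter wake functions. */
struct waiter;
typedef int (*wake_fn)(struct waiter *w, unsigned long key);

struct waiter {
	wake_fn func;
	int woken;   /* how many times this waiter was actually woken */
};

/* Default behaviour: any event wakes the waiter. */
static int default_wake(struct waiter *w, unsigned long key)
{
	(void)key;
	w->woken++;
	return 1;
}

/* Receiver: stay asleep unless the event concerns input or an error. */
static int receiver_wake(struct waiter *w, unsigned long key)
{
	if (key && !(key & (POLLIN | POLLERR)))
		return 0;
	w->woken++;
	return 1;
}

static void wake_up_key(struct waiter **q, size_t n, unsigned long key)
{
	for (size_t i = 0; i < n; i++)
		q[i]->func(q[i], key);
}
```

A TX completion that signals only POLLOUT then wakes the default waiter but leaves the receiver asleep.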
| * | net: Fix typo in net_device_ops description. (Mike Rapoport, 2009-04-27)
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: Limit size of route cache hash table (Anton Blanchard, 2009-04-27)
Right now we have no upper limit on the size of the route cache hash table. On a 128GB POWER6 box it ends up as 32MB:

IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes)

It would be nice to cap this for memory consumption reasons, but a massive hashtable also causes a significant spike when measuring OS jitter.

With a 32MB hashtable and 4 million entries, rt_worker_func is taking 5 ms to complete. On another system with more memory it's taking 14 ms. Even though rt_worker_func does call cond_resched() to limit its impact, in an HPC environment we want to keep all sources of OS jitter to a minimum.

With the patch applied we limit the number of entries to 512k, which can still be overridden by using the rt_entries boot option:

IP route cache hash table entries: 524288 (order: 6, 4194304 bytes)

With this patch rt_worker_func now takes 0.460 ms on the same system.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
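The two boot messages above are consistent with 8-byte buckets and 64 KiB pages (33554432 / 4194304 = 8 bytes per entry; 33554432 / 65536 = 512 = 2^9 pages). A sketch of the capping and the order arithmetic under those assumptions; rt_hash_entries(), order_for() and the override parameter are hypothetical names, not the kernel's.

```c
#include <assert.h>

#define RT_MAX_DEFAULT (512UL * 1024)   /* the 512k cap from the commit */

/* Smallest order such that (page_size << order) >= bytes. */
static unsigned order_for(unsigned long bytes, unsigned long page_size)
{
	unsigned order = 0;
	while ((page_size << order) < bytes)
		order++;
	return order;
}

/* Memory-based estimate, capped unless a boot-time override is given
 * (0 means "no override"; the commit calls the option rt_entries). */
static unsigned long rt_hash_entries(unsigned long estimate, unsigned long override)
{
	if (override)
		return override;
	return estimate < RT_MAX_DEFAULT ? estimate : RT_MAX_DEFAULT;
}
```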
| * | Add reference to CAPI 2.0 standard (Karsten Keil, 2009-04-27)
Move the entry about CAPI 2.0 to the beginning and add a URL.
Incorporate changes suggested by Randy Dunlap, thanks for proofreading.

Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Documentation/isdn/INTERFACE.CAPI (Tilman Schmidt, 2009-04-27)
isdn: document Kernel CAPI driver interface

Create a file Documentation/isdn/INTERFACE.CAPI describing the interface between the kernel CAPI subsystem and ISDN device drivers, analogous to the existing Documentation/isdn/INTERFACE for the old isdn4linux subsystem. Also add kerneldoc comments to the exported functions in drivers/isdn/capi/kcapi.c.

Impact: Documentation
Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | update Documentation/isdn/00-INDEX (Tilman Schmidt, 2009-04-27)
After the merging of mISDN, state which files refer only to the old isdn4linux subsystem. Also add a few missing files.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Karsten Keil <keil@b1-systems.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ixgbe: Fix WoL functionality for 82599 KX4 devices (Waskiewicz Jr, Peter P, 2009-04-27)
The current code writes the PME enabled bit in PCI config space, which is wrong. This was needed for pre-release hardware, and was not removed from the driver.

Also, we need to clear the WUS (wake up status) after we resume. Otherwise we can't wake for the same event again, since it's still asserted in the hardware.

Plus, the multicast lists were being written improperly, causing multicast WoL to fail.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | veth: prevent oops caused by netdev destructor (Stephen Hemminger, 2009-04-27)
From: Stephen Hemminger <shemminger@vyatta.com>

The veth driver will oops if sysfs hooks are open while the module is removed. The net device destructor cannot point to code in a module; basically there are only two possible safe values: NULL (no destructor) or free_netdev (free on last use).

Signed-off-by: David S. Miller <davem@davemloft.net>
| * | xfrm: wrong hash value for temporary SA (Nicolas Dichtel, 2009-04-27)
When the kernel inserts a temporary SA for IKE, it uses the wrong hash value for the dst list. Two hash values were calculated before: one with the source address and one with a wildcard source address.

Bug hinted by Junwei Zhang <junwei.zhang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | forcedeth: tx timeout fix (Ayaz Abdulla, 2009-04-27)
This patch fixes tx_timeout() to properly handle the clean up of the tx ring. It also sets the tx put pointer back to the correct position to be in sync with the HW.

Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Fix LL_MAX_HEADER for CONFIG_TR_MODULE (Adrian Bunk, 2009-04-27)
Unless I'm missing something, this should fix a bug.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | mlx4_en: Handle page allocation failure during receive (Yevgeny Petrilin, 2009-04-27)
If we fail to allocate new fragments for the receive buffer, the packet should be dropped and the existing fragments reused.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
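The drop-and-reuse policy can be sketched with a single hypothetical ring slot (struct rx_slot and rx_complete() are illustrative, not mlx4_en code; the alloc_ok flag simulates allocation success). The filled buffer is only handed up the stack once a replacement exists; otherwise the packet is dropped and the old buffer stays in the ring, so the ring never ends up with an empty slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical receive ring slot holding one buffer. */
struct rx_slot { void *buf; };

/* Returns true and stores the delivered buffer in *delivered if a
 * replacement could be allocated; otherwise drops the packet and keeps
 * the old buffer in the slot for reuse. */
static bool rx_complete(struct rx_slot *slot, bool alloc_ok, void **delivered)
{
	void *fresh = alloc_ok ? malloc(2048) : NULL;

	if (!fresh) {
		*delivered = NULL;     /* drop: old buffer is reused as-is */
		return false;
	}
	*delivered = slot->buf;    /* packet goes up with the old buffer */
	slot->buf = fresh;         /* ring is refilled with the new one */
	return true;
}
```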
| * | mlx4_en: Fix cleanup flow on cq activation (Yevgeny Petrilin, 2009-04-27)
In case of an mlx4_en_activate_cq() failure, the cleanup code would go to rx_err and try to disable unactivated rings.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * | vlan: update vlan carrier state for admin up/down (Jay Vosburgh, 2009-04-25)
Currently, the VLAN event handler does not adjust the VLAN device's carrier state when the real device or the VLAN device is set administratively up or down.

The following patch adds a transfer of operating state from the real device to the VLAN device when the real device is administratively set up or down, and sets the carrier state up or down during init, open and close of the VLAN device.

This permits observers above the VLAN device that care about the carrier state (bonding's link monitor, for example) to receive updates for administrative changes by more closely mimicking the behavior of real devices.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 (David S. Miller, 2009-04-25)
| | * | netfilter: xt_recent: fix stack overread in compat code (Jan Engelhardt, 2009-04-24)
Related-to: commit 325fb5b4d26038cba665dd0d8ee09555321061f0

The compat path suffers from a similar problem. It only uses a __be32 when all of the recent code uses, and expects, an nf_inet_addr everywhere. As a result, addresses stored by xt_recent were filled with whatever other stuff was on the stack following the be32.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
With a minor compile fix from Roman.
Reported-and-tested-by: Roman Hoog Antink <rha@open.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
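The overread and its remedy can be sketched with a simplified union shaped like nf_inet_addr (union inet_addr and addr_from_be32() are hypothetical, reduced stand-ins). Treating a lone 32-bit value as the full 16-byte union reads past it; building a zeroed union and storing only the IPv4 field keeps every byte well-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified shape of union nf_inet_addr: room for an IPv6 address,
 * with IPv4 using the first 32 bits. */
union inet_addr {
	uint32_t ip;
	uint32_t all[4];
};

/* Build a full address from a lone 32-bit IPv4 value instead of
 * reinterpreting the 4-byte variable as a 16-byte union. */
static union inet_addr addr_from_be32(uint32_t be_addr)
{
	union inet_addr a;

	memset(&a, 0, sizeof(a));   /* the 12 bytes after ip are zero, not stack */
	a.ip = be_addr;
	return a;
}
```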
| | * | netfilter: nf_ct_dccp: add missing role attributes for DCCP (Pablo Neira Ayuso, 2009-04-24)
This patch adds the missing role attribute to the DCCP type; otherwise the creation of entries is not of any use.

The attribute added is CTA_PROTOINFO_DCCP_ROLE, which contains the role of the conntrack original tuple.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>