aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* vhost: fix attach to cgroups regressionMichael S. Tsirkin2010-09-06
| | | | | | | | | | | | | | | Since 2.6.36-rc1, non-root users of vhost-net fail to attach if they are in any cgroups. The reason is that when qemu uses vhost, vhost wants to attach its thread to all cgroups that qemu has. But we got the API backwards, so a non-priveledged process (Qemu) tried to control the priveledged one (vhost), which fails. Fix this by switching to the new cgroup_attach_task_all, and running it from the vhost thread. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* cgroups: fix API thinkoMichael S. Tsirkin2010-09-05
| | | | | | | | | | | | | | | | | | cgroup_attach_task_current_cg API that have upstream is backwards: we really need an API to attach to the cgroups from another process A to the current one. In our case (vhost), a priveledged user wants to attach it's task to cgroups from a less priveledged one, the API makes us run it in the other task's context, and this fails. So let's make the API generic and just pass in 'from' and 'to' tasks. Add an inline wrapper for cgroup_attach_task_current_cg to avoid breaking bisect. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com>
* pkt_sched: Fix lockdep warning on est_tree_lock in gen_estimatorJarek Poplawski2010-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a lockdep warning: [ 516.287584] ========================================================= [ 516.288386] [ INFO: possible irq lock inversion dependency detected ] [ 516.288386] 2.6.35b #7 [ 516.288386] --------------------------------------------------------- [ 516.288386] swapper/0 just changed the state of lock: [ 516.288386] (&qdisc_tx_lock){+.-...}, at: [<c12eacda>] est_timer+0x62/0x1b4 [ 516.288386] but this lock took another, SOFTIRQ-unsafe lock in the past: [ 516.288386] (est_tree_lock){+.+...} [ 516.288386] [ 516.288386] and interrupts could create inverse lock ordering between them. ... So, est_tree_lock needs BH protection because it's taken by qdisc_tx_lock, which is used both in BH and process contexts. (Full warning with this patch at netdev, 02 Sep 2010.) Fixes commit: ae638c47dc040b8def16d05dc6acdd527628f231 ("pkt_sched: gen_estimator: add a new lock") Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipvs: avoid oops for passive FTPJulian Anastasov2010-09-02
| | | | | | | | | | | Fix Passive FTP problem in ip_vs_ftp: - Do not oops in nf_nat_set_seq_adjust (adjust_tcp_sequence) when iptable_nat module is not loaded Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* Revert "sky2: don't do GRO on second port"David S. Miller2010-09-02
| | | | | | | | | This reverts commit de6be6c1f77798c4da38301693d33aff1cd76e84. After some discussion with Jarek Poplawski and Eric Dumazet, we've decided that this change is incorrect. Signed-off-by: David S. Miller <davem@davemloft.net>
* gro: fix different skb headroomsEric Dumazet2010-09-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Packets entering GRO might have different headrooms, even for a given flow (because of implementation details in drivers, like copybreak). We cant force drivers to deliver packets with a fixed headroom. 1) fix skb_segment() skb_segment() makes the false assumption headrooms of fragments are same than the head. When CHECKSUM_PARTIAL is used, this can give csum_start errors, and crash later in skb_copy_and_csum_dev() 2) allocate a minimal skb for head of frag_list skb_gro_receive() uses netdev_alloc_skb(headroom + skb_gro_offset(p)) to allocate a fresh skb. This adds NET_SKB_PAD to a padding already provided by netdevice, depending on various things, like copybreak. Use alloc_skb() to allocate an exact padding, to reduce cache line needs: NET_SKB_PAD + NET_IP_ALIGN bugzilla : https://bugzilla.kernel.org/show_bug.cgi?id=16626 Many thanks to Plamen Petrov, testing many debugging patches ! With help of Jarek Poplawski. Reported-by: Plamen Petrov <pvp-lsts@fs.uni-ruse.bg> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bridge: Clear INET control block of SKBs passed into ip_fragment().David S. Miller2010-09-01
| | | | | | | | | | | | | In a similar vain to commit 17762060c25590bfddd68cc1131f28ec720f405f ("bridge: Clear IPCB before possible entry into IP stack") Any time we call into the IP stack we have to make sure the state there is as expected by the ipv4 code. With help from Eric Dumazet and Herbert Xu. Reported-by: Bandan Das <bandan.das@stratus.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* 3c59x: Remove incorrect locking; correct documented lock hierarchyBen Hutchings2010-09-01
| | | | | | | | | | | | | | | | | vortex_ioctl() was grabbing vortex_private::lock around its call to generic_mii_ioctl(). This is no longer necessary since there are more specific locks which the mdio_{read,write}() functions will obtain. Worse, those functions do not save and restore IRQ flags when locking the MII state, so interrupts will be enabled when generic_mii_ioctl() returns. Since there is currently no need for any function to call mdio_{read,write}() while holding another spinlock, do not change them to save and restore IRQ flags but remove the specification of ordering between vortex_private::lock and vortex_private::mii_lock. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* sky2: don't do GRO on second portstephen hemminger2010-09-01
| | | | | | | | | | | | | | | | | | | | | | | There's something very important I forgot to tell you. What? Don't cross the GRO streams. Why? It would be bad. I'm fuzzy on the whole good/bad thing. What do you mean, "bad"? Try to imagine all the Internet as you know it stopping instantaneously and every bit in every packet swapping at the speed of light. Total packet reordering. Right. That's bad. Okay. All right. Important safety tip. Thanks, Hubert The simplest way to stop this is just avoid doing GRO on the second port. Very few Marvell boards support two ports per ring, and GRO is just an optimization. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: minor fix about RPF in help of KconfigNicolas Dichtel2010-09-01
| | | | | Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xfrm_user: avoid a warning with some compilerNicolas Dichtel2010-09-01
| | | | | | | | Attached is a small patch to remove a warning ("warning: ISO C90 forbids mixed declarations and code" with gcc 4.3.2). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched/sch_hfsc.c: initialize parent's cl_cfmin properly in init_vf()Michal Soltys2010-09-01
| | | | | | | | | This patch fixes init_vf() function, so on each new backlog period parent's cl_cfmin is properly updated (including further propgation towards the root), even if the activated leaf has no upperlimit curve defined. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>
* pxa168_eth: fix a mdiobus leakDenis Kirjanov2010-09-01
| | | | | | | | mdiobus resources must be released on exit Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Acked-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net sched: fix kernel leak in act_policeJeff Mahoney2010-09-01
| | | | | | | | | | | | | | | | While reviewing commit 1c40be12f7d8ca1d387510d39787b12e512a7ce8, I audited other users of tc_action_ops->dump for information leaks. That commit covered almost all of them but act_police still had a leak. opt.limit and opt.capab aren't zeroed out before the structure is passed out. This patch uses the C99 initializers to zero everything unused out. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vhost: stop worker only if createdEric Dumazet2010-09-01
| | | | | | | | | Its currently illegal to call kthread_stop(NULL) Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* MAINTAINERS: Add ehea driver as SupportedBreno Leitao2010-09-01
| | | | | | | This change just add the IBM eHEA 10Gb network drivers as supported. Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-09-01
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
| * ath9k_hw: fix parsing of HT40 5 GHz CTLsLuis R. Rodriguez2010-08-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The 5 GHz CTL indexes were not being read for all hardware devices due to the masking out through the CTL_MODE_M mask being one bit too short. Without this the calibrated regulatory maximum values were not being picked up when devices operate on 5 GHz in HT40 mode. The final output power used for Atheros devices is the minimum between the calibrated CTL values and what CRDA provides. Cc: stable@kernel.org [2.6.27+] Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * ath9k_hw: Fix EEPROM uncompress block reading on AR9003Luis R. Rodriguez2010-08-31
| | | | | | | | | | | | | | | | | | | | | | | | | | The EEPROM is compressed on AR9003, upon decompression the wrong upper limit was being used for the block which prevented the 5 GHz CTL indexes from being used, which are stored towards the end of the EEPROM block. This fix allows the actual intended regulatory limits to be used on AR9003 hardware. Cc: stable@kernel.org [2.6.36+] Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * wireless: register wiphy rfkill w/o holding cfg80211_mutexJohn W. Linville2010-08-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise lockdep complains... https://bugzilla.kernel.org/show_bug.cgi?id=17311 [ INFO: possible circular locking dependency detected ] 2.6.36-rc2-git4 #12 ------------------------------------------------------- kworker/0:3/3630 is trying to acquire lock: (rtnl_mutex){+.+.+.}, at: [<ffffffff813396c7>] rtnl_lock+0x12/0x14 but task is already holding lock: (rfkill_global_mutex){+.+.+.}, at: [<ffffffffa014b129>] rfkill_switch_all+0x24/0x49 [rfkill] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (rfkill_global_mutex){+.+.+.}: [<ffffffff81079ad7>] lock_acquire+0x120/0x15b [<ffffffff813ae869>] __mutex_lock_common+0x54/0x52e [<ffffffff813aede9>] mutex_lock_nested+0x34/0x39 [<ffffffffa014b4ab>] rfkill_register+0x2b/0x29c [rfkill] [<ffffffffa0185ba0>] wiphy_register+0x1ae/0x270 [cfg80211] [<ffffffffa0206f01>] ieee80211_register_hw+0x1b4/0x3cf [mac80211] [<ffffffffa0292e98>] iwl_ucode_callback+0x9e9/0xae3 [iwlagn] [<ffffffff812d3e9d>] request_firmware_work_func+0x54/0x6f [<ffffffff81065d15>] kthread+0x8c/0x94 [<ffffffff8100ac24>] kernel_thread_helper+0x4/0x10 -> #1 (cfg80211_mutex){+.+.+.}: [<ffffffff81079ad7>] lock_acquire+0x120/0x15b [<ffffffff813ae869>] __mutex_lock_common+0x54/0x52e [<ffffffff813aede9>] mutex_lock_nested+0x34/0x39 [<ffffffffa018605e>] cfg80211_get_dev_from_ifindex+0x1b/0x7c [cfg80211] [<ffffffffa0189f36>] cfg80211_wext_giwscan+0x58/0x990 [cfg80211] [<ffffffff8139a3ce>] ioctl_standard_iw_point+0x1a8/0x272 [<ffffffff8139a529>] ioctl_standard_call+0x91/0xa7 [<ffffffff8139a687>] T.723+0xbd/0x12c [<ffffffff8139a727>] wext_handle_ioctl+0x31/0x6d [<ffffffff8133014e>] dev_ioctl+0x63d/0x67a [<ffffffff8131afd9>] sock_ioctl+0x48/0x21d [<ffffffff81102abd>] do_vfs_ioctl+0x4ba/0x509 [<ffffffff81102b5d>] sys_ioctl+0x51/0x74 [<ffffffff81009e02>] system_call_fastpath+0x16/0x1b -> #0 (rtnl_mutex){+.+.+.}: [<ffffffff810796b0>] __lock_acquire+0xa93/0xd9a [<ffffffff81079ad7>] lock_acquire+0x120/0x15b [<ffffffff813ae869>] __mutex_lock_common+0x54/0x52e [<ffffffff813aede9>] mutex_lock_nested+0x34/0x39 [<ffffffff813396c7>] rtnl_lock+0x12/0x14 [<ffffffffa0185cb5>] cfg80211_rfkill_set_block+0x1a/0x7b [cfg80211] [<ffffffffa014aed0>] rfkill_set_block+0x80/0xd5 [rfkill] [<ffffffffa014b07e>] __rfkill_switch_all+0x3f/0x6f [rfkill] [<ffffffffa014b13d>] rfkill_switch_all+0x38/0x49 [rfkill] [<ffffffffa014b821>] rfkill_op_handler+0x105/0x136 [rfkill] [<ffffffff81060708>] process_one_work+0x248/0x403 [<ffffffff81062620>] worker_thread+0x139/0x214 [<ffffffff81065d15>] kthread+0x8c/0x94 [<ffffffff8100ac24>] kernel_thread_helper+0x4/0x10 Signed-off-by: John W. Linville <linville@tuxdriver.com> Acked-by: Johannes Berg <johannes@sipsolutions.net>
| * wireless extensions: fix kernel heap content leakJohannes Berg2010-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wireless extensions have an unfortunate, undocumented requirement which requires drivers to always fill iwp->length when returning a successful status. When a driver doesn't do this, it leads to a kernel heap content leak when userspace offers a larger buffer than would have been necessary. Arguably, this is a driver bug, as it should, if it returns 0, fill iwp->length, even if it separately indicated that the buffer contents was not valid. However, we can also at least avoid the memory content leak if the driver doesn't do this by setting the iwp length to max_tokens, which then reflects how big the buffer is that the driver may fill, regardless of how big the userspace buffer is. To illustrate the point, this patch also fixes a corresponding cfg80211 bug (since this requirement isn't documented nor was ever pointed out by anyone during code review, I don't trust all drivers nor all cfg80211 handlers to implement it correctly). Cc: stable@kernel.org [all the way back] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * MAINTAINERS: change broken url for prism54John W. Linville2010-08-30
| | | | | | | | | | Reported-by: Joe Perches <joe@perches.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * mac80211: delete work timerJohannes Berg2010-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new workqueue changes helped me find this bug that's been lingering since the changes to the work processing in mac80211 -- the work timer is never deleted properly. Do that to avoid having it fire after all data structures have been freed. It can't be re-armed because all it will do, if running, is schedule the work, but that gets flushed later and won't have anything to do since all work items are gone by now (by way of interface removal). Cc: stable@kernel.org [2.6.34+] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * p54: fix tx feedback status flag checkChristian Lamparter2010-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael reported that p54* never really entered power save mode, even tough it was enabled. It turned out that upon a power save mode change the firmware will set a special flag onto the last outgoing frame tx status (which in this case is almost always the designated PSM nullfunc frame). This flag confused the driver; It erroneously reported transmission failures to the stack, which then generated the next nullfunc. and so on... Cc: <stable@kernel.org> Reported-by: Michael Buesch <mb@bu3sch.de> Tested-by: Michael Buesch <mb@bu3sch.de> Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * ath5k: check return value of ieee80211_get_tx_rateJohn W. Linville2010-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids a NULL pointer dereference as reported here: https://bugzilla.redhat.com/show_bug.cgi?id=625889 When the WARN condition is hit in ieee80211_get_tx_rate, it will return NULL. So, we need to check the return value and avoid dereferencing it in that case. Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: stable@kernel.org Acked-by: Bob Copeland <me@bobcopeland.com>
| * libertas: if_sdio: fix buffer alignment in struct if_sdio_cardMike Rapoport2010-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit 886275ce41a9751117367fb387ed171049eb6148 (param: lock if_sdio's lbs_helper_name and lbs_fw_name against sysfs changes) introduced new fields into the if_sdio_card structure. It caused missalignment of the if_sdio_card.buffer field and failure at driver load time: ~# modprobe libertas_sdio [ 62.315124] libertas_sdio: Libertas SDIO driver [ 62.319976] libertas_sdio: Copyright Pierre Ossman [ 63.020629] DMA misaligned error with device 48 [ 63.025207] mmci-omap-hs mmci-omap-hs.1: unexpected dma status 800 [ 66.005035] libertas: command 0x0003 timed out [ 66.009826] libertas: Timeout submitting command 0x0003 [ 66.016296] libertas: PREP_CMD: command 0x0003 failed: -110 Adding explicit alignment attribute for the if_sdio_card.buffer field fixes this problem. Signed-off-by: Mike Rapoport <mike@compulab.co.il> Acked-by: Marek Vasut <marek.vasut@gmail.com> Acked-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | netlink: Make NETLINK_USERSOCK work again.David S. Miller2010-08-31
| | | | | | | | | | | | | | | | | | | | Once we started enforcing the a nl_table[] entry exist for a protocol, NETLINK_USERSOCK stopped working. Add a dummy table entry so that it works again. Reported-by: Thomas Voegtle <tv@lio96.de> Tested-by: Thomas Voegtle <tv@lio96.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | irda: Correctly clean up self->ias_obj on irda_bind() failure.David S. Miller2010-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If irda_open_tsap() fails, the irda_bind() code tries to destroy the ->ias_obj object by hand, but does so wrongly. In particular, it fails to a) release the hashbin attached to the object and b) reset the self->ias_obj pointer to NULL. Fix both problems by using irias_delete_object() and explicitly setting self->ias_obj to NULL, just as irda_release() does. Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pcnet_cs: add new_idKen Kawasaki2010-08-28
| | | | | | | | | | | | | | | | pcnet_cs: add new_id: "KENTRONICS KEP-230" 10Base-T PCMCIA card. Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/ipv4: Eliminate kstrdup memory leakJulia Lawall2010-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The string clone is only used as a temporary copy of the argument val within the while loop, and so it should be freed before leaving the function. The call to strsep, however, modifies clone, so a pointer to the front of the string is kept in saved_clone, to make it possible to free it. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression x; expression E; identifier l; statement S; @@ *x= \(kasprintf\|kstrdup\)(...); ... if (x == NULL) S ... when != kfree(x) when != E = x if (...) { <... when != kfree(x) * goto l; ...> * return ...; } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/caif/cfrfml.c: use asm/unaligned.hJeff Mahoney2010-08-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | caif does not build on ia64 starting with 2.6.32-rc1. Using asm/unaligned.h instead of linux/unaligned/le_byteshift.h fixes the issue. include/linux/unaligned/le_byteshift.h:40:50: error: redefinition of 'get_unaligned_le16' include/linux/unaligned/le_byteshift.h:45:50: error: redefinition of 'get_unaligned_le32' include/linux/unaligned/le_byteshift.h:50:50: error: redefinition of 'get_unaligned_le64' include/linux/unaligned/le_byteshift.h:55:51: error: redefinition of 'put_unaligned_le16' include/linux/unaligned/le_byteshift.h:60:51: error: redefinition of 'put_unaligned_le32' include/linux/unaligned/le_byteshift.h:65:51: error: redefinition of 'put_unaligned_le64' include/linux/unaligned/le_struct.h:31:51: note: previous definition of 'put_unaligned_le64' was here Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ax25: missplaced sock_put(sk)Bernard Pidoux F6BVP2010-08-26
| | | | | | | | | | | | | | | | | | This patch moves a missplaced sock_put(sk) after bh_unlock_sock(sk) like in other parts of AX25 driver. Signed-off-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
* | qlge: reset the chip before freeing the buffersBreno Leitao2010-08-26
| | | | | | | | | | | | | | | | | | | | | | Qlge is freeing the buffers before stopping the card DMA, and this can cause some severe error, as a EEH event on PPC. This patch just stop the card and then free the resources. Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | l2tp: test for ethernet header in l2tp_eth_dev_recv()Eric Dumazet2010-08-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | close https://bugzilla.kernel.org/show_bug.cgi?id=16529 Before calling dev_forward_skb(), we should make sure skb head contains at least an ethernet header, even if length included in upper layer said so. Use pskb_may_pull() to make sure this ethernet header is present in skb head. Reported-by: Thomas Heil <heil@terminal-consulting.de> Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: select(writefds) don't hang up when a peer close connectionKOSAKI Motohiro2010-08-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This issue come from ruby language community. Below test program hang up when only run on Linux. % uname -mrsv Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686 % ruby -rsocket -ve ' BasicSocket.do_not_reverse_lookup = true serv = TCPServer.open("127.0.0.1", 0) s1 = TCPSocket.open("127.0.0.1", serv.addr[1]) s2 = serv.accept s2.close s1.write("a") rescue p $! s1.write("a") rescue p $! Thread.new { s1.write("a") }.join' ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux] #<Errno::EPIPE: Broken pipe> [Hang Here] FreeBSD, Solaris, Mac doesn't. because Ruby's write() method call select() internally. and tcp_poll has a bug. SUS defined 'ready for writing' of select() as following. | A descriptor shall be considered ready for writing when a call to an output | function with O_NONBLOCK clear would not block, whether or not the function | would transfer data successfully. That said, EPIPE situation is clearly one of 'ready for writing'. We don't have read-side issue because tcp_poll() already has read side shutdown care. | if (sk->sk_shutdown & RCV_SHUTDOWN) | mask |= POLLIN | POLLRDNORM | POLLRDHUP; So, Let's insert same logic in write side. - reference url http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065 http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: fix three tcp sysctls tuningEric Dumazet2010-08-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As discovered by Anton Blanchard, current code to autotune tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and sysctl_max_syn_backlog makes little sense. The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB machine in Anton's case. (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket)) is much bigger if spinlock debugging is on. Its wrong to select bigger limits in this case (where kernel structures are also bigger) bhash_size max is 65536, and we get this value even for small machines. A better ground is to use size of ehash table, this also makes code shorter and more obvious. Based on a patch from Anton, and another from David. Reported-and-tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Combat per-cpu skew in orphan tests.David S. Miller2010-08-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Anton Blanchard when we use percpu_counter_read_positive() to make our orphan socket limit checks, the check can be off by up to num_cpus_online() * batch (which is 32 by default) which on a 128 cpu machine can be as large as the default orphan limit itself. Fix this by doing the full expensive sum check if the optimized check triggers. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
* | pxa168_eth: silence gcc warningsDan Carpenter2010-08-24
| | | | | | | | | | | | | | | | | | | | | | Casting "pep->tx_desc_dma" to to a struct tx_desc pointer makes gcc complain: drivers/net/pxa168_eth.c:657: warning: cast to pointer from integer of different size Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pxa168_eth: update call to phy_mii_ioctl()Dan Carpenter2010-08-24
| | | | | | | | | | | | | | | | The phy_mii_ioctl() function changed recently. It now takes a struct ifreq pointer directly. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pxa168_eth: fix error handling in propeDan Carpenter2010-08-24
| | | | | | | | | | | | | | | | | | | | | | | | A couple issues here: * Some resources weren't released. * If alloc_etherdev() failed it would have caused a NULL dereference because "pep" would be null when we checked "if (pep->clk)". * Also it's better to propagate the error codes from mdiobus_register() instead of just returning -ENOMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pxa168_eth: remove unneeded null checkDan Carpenter2010-08-24
| | | | | | | | | | | | | | | | | | | | "pep->pd" isn't checked consistently in this function. For example it's dereferenced unconditionally on the next line after the end of the if condition. This function is only called from pxa168_eth_probe() and pep->pd is always non-NULL so I removed the check. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | phylib: Fix race between returning phydev and calling adjust_linkAnton Vorontsov2010-08-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible that phylib will call adjust_link before returning from {,of_}phy_connect(), which may cause the following [very rare, though] oops upon reopening the device: Unable to handle kernel paging request for data at address 0x0000024c Oops: Kernel access of bad area, sig: 11 [#1] PREEMPT SMP NR_CPUS=2 LTT NESTING LEVEL : 0 P1021 RDB Modules linked in: NIP: c0345dac LR: c0345dac CTR: c0345d84 TASK = dffab6b0[30] 'events/0' THREAD: c0d24000 CPU: 0 [...] NIP [c0345dac] adjust_link+0x28/0x19c LR [c0345dac] adjust_link+0x28/0x19c Call Trace: [c0d25f00] [000045e1] 0x45e1 (unreliable) [c0d25f30] [c036c158] phy_state_machine+0x3ac/0x554 [...] Here is why. Drivers store phydev in their private structures, e.g. gianfar driver: static int init_phy(struct net_device *dev) { ... priv->phydev = of_phy_connect(...); ... } So that adjust_link could retrieve it back: static void adjust_link(struct net_device *dev) { ... struct phy_device *phydev = priv->phydev; ... } If the device has been opened before, then phydev->state is set to PHY_HALTED (or undefined if the driver didn't call phy_stop()). Now, phy_connect starts the PHY state machine before returning phydev to the driver: phy_start_machine(phydev, NULL); if (phydev->irq > 0) phy_start_interrupts(phydev); return phydev; The time between 'phy_start_machine()' and 'return phydev' is undefined. The start machine routine delays execution for 1 second, which is enough for most cases. But under heavy load, or if you're unlucky, it is quite possible that PHY state machine will execute before phy_connect() returns, and so adjust_link callback will try to dereference phydev, which is not yet ready. To fix the issue, simply initialize the PHY's state to PHY_READY during phy_attach(). This will ensure that phylib won't call adjust_link before phy_start(). Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | caif-driver: add HAS_DMA dependencyHeiko Carstens2010-08-24
| | | | | | | | | | | | | | | | | | | | | | Fix this error on an s390 allyesconfig build: linux-2.6/drivers/net/caif/caif_spi.c:98: undefined reference to `dma_free_coherent' Cc: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | 3c59x: Fix deadlock between boomerang_interrupt and boomerang_start_txNeil Horman2010-08-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If netconsole is in use, there is a possibility for deadlock in 3c59x between boomerang_interrupt and boomerang_start_xmit. Both routines take the vp->lock, and if netconsole is in use, a pr_* call from the boomerang_interrupt routine will result in the netconsole code attempting to trnasmit an skb, which can try to take the same spin lock, resulting in deadlock. The fix is pretty straightforward. This patch allocats a bit in the 3c59x private structure to indicate that its handling an interrupt. If we get into the transmit routine and that bit is set, we can be sure that we have recursed and will deadlock if we continue, so instead we just return NETDEV_TX_BUSY, so the stack requeues the skb to try again later. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | qlcnic: fix poll implementationYinglin Luan2010-08-23
| | | | | | | | | | | | | | | | Function qlcnic_intr has pointer to qlcnic_host_sds_ring as second parameter not pointer to qlcnic_adapter. Signed-off-by: Yinglin Luan <synmyth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netxen: fix poll implementationYinglin Luan2010-08-23
| | | | | | | | | | | | | | | | Function netxen_intr has pointer to nx_host_sds_ring as second parameter not pointer to netxen_adapter. Signed-off-by: Yinglin Luan <synmyth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bridge: netfilter: fix a memory leakChangli Gao2010-08-23
| | | | | | | | | | | | | | | | | | nf_bridge_alloc() always reset the skb->nf_bridge, so we should always put the old one. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netfilter: fix CONFIG_COMPAT supportFlorian Westphal2010-08-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f3c5c1bfd430858d3a05436f82c51e53104feb6b (netfilter: xtables: make ip_tables reentrant) forgot to also compute the jumpstack size in the compat handlers. Result is that "iptables -I INPUT -j userchain" turns into -j DROP. Reported by Sebastian Roesner on #netfilter, closes http://bugzilla.netfilter.org/show_bug.cgi?id=669. Note: arptables change is compile-tested only. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Tested-by: Mikael Pettersson <mikpe@it.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
* | isdn/avm: fix build when PCMCIA is not enabledRandy Dunlap2010-08-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Why wouldn't kconfig symbol ISDN_DRV_AVMB1_B1PCMCIA also depend on PCMCIA? Fix build for PCMCIA not enabled: ERROR: "b1_free_card" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1ctl_proc_fops" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1_reset_ctr" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1_load_firmware" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1_send_message" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1_release_appl" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1_register_appl" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1_getrevision" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1_detect" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1_interrupt" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! ERROR: "b1_alloc_card" [drivers/isdn/hardware/avm/b1pcmcia.ko] undefined! Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Carsten Paeth <calle@calle.de> Cc: Karsten Keil <isdn@linux-pingi.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | header: fix broken headers for user spaceChangli Gao2010-08-23
| | | | | | | | | | | | | | | | | | | | __packed is only defined in kernel space, so we should use __attribute__((packed)) for the code shared between kernel and user space. Two __attribute() annotations are replaced with __attribute__() too. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>