litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	net: fix sk_forward_alloc corruption	Eric Dumazet	2009-10-30
\| \| \| \| \| \| \| \| \| \| \| \|	On UDP sockets, we must call skb_free_datagram() with socket locked, or risk sk_forward_alloc corruption. This requirement is not respected in SUNRPC. Add a convenient helper, skb_free_datagram_locked() and use it in SUNRPC Reported-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Fix RPF to work with policy routing	jamal	2009-10-30
\| \| \| \| \| \| \| \|	Policy routing is not looked up by mark on reverse path filtering. This fixes it. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Fix 'Re: PACKET_TX_RING: packet size is too long'	Gabor Gombas	2009-10-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently PACKET_TX_RING forces certain amount of every frame to remain unused. This probably originates from an early version of the PACKET_TX_RING patch that in fact used the extra space when the (since removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current code does not make any use of this extra space. This patch removes the extra space reservation and lets userspace make use of the full frame size. Signed-off-by: Gabor Gombas <gombasg@sztaki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
*	AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)	Neil Horman	2009-10-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Augment raw_send_hdrinc to correct for incorrect ip header length values A series of oopses was reported to me recently. Apparently when using AF_RAW sockets to send data to peers that were reachable via ipsec encapsulation, people could panic or BUG halt their systems. I've tracked the problem down to user space sending an invalid ip header over an AF_RAW socket with IP_HDRINCL set to 1. Basically what happens is that userspace sends down an ip frame that includes only the header (no data), but sets the ip header ihl value to a large number, one that is larger than the total amount of data passed to the sendmsg call. In raw_send_hdrincl, we allocate an skb based on the size of the data in the msghdr that was passed in, but assume the data is all valid. Later during ipsec encapsulation, xfrm4_tranport_output moves the entire frame back in the skbuff to provide headroom for the ipsec headers. During this operation, the skb->transport_header is repointed to a spot computed by skb->network_header + the ip header length (ihl). Since so little data was passed in relative to the value of ihl provided by the raw socket, we point transport header to an unknown location, resulting in various crashes. This fix for this is pretty straightforward, simply validate the value of of iph->ihl when sending over a raw socket. If (iph->ihl*4U) > user data buffer size, drop the frame and return -EINVAL. I just confirmed this fixes the reported crashes. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	mac80211: fix for incorrect sequence number on hostapd injected frames	Björn Smedman	2009-10-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When hostapd injects a frame, e.g. an authentication or association response, mac80211 looks for a suitable access point virtual interface to associate the frame with based on its source address. This makes it possible e.g. to correctly assign sequence numbers to the frames. A small typo in the ethernet address comparison statement caused a failure to find a suitable ap interface. Sequence numbers on such frames where therefore left unassigned causing some clients (especially windows-based 11b/g clients) to reject them and fail to authenticate or associate with the access point. This patch fixes the typo in the address comparison statement. Signed-off-by: Björn Smedman <bjorn.smedman@venatech.se> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
*	mac80211: trivial: fix spelling in mesh_hwmp	Andrey Yurovsky	2009-10-27
\| \| \| \| \| \| \| \|	Fix a typo in the description of hwmp_route_info_get(), no function changes. Signed-off-by: Andrey Yurovsky <andrey@cozybit.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
*	cfg80211: sme: deauthenticate on assoc failure	Johannes Berg	2009-10-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the in-kernel SME gets an association failure from the AP we don't deauthenticate, and thus get into a very confused state which will lead to warnings later on. Fix this by actually deauthenticating when the AP indicates an association failure. (Brought to you by the hacking session at Kernel Summit 2009 in Tokyo, Japan. -- JWL) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
*	mac80211: keep auth state when assoc fails	Johannes Berg	2009-10-27
\| \| \| \| \| \| \| \| \| \| \| \| \|	When association fails, we should stay authenticated, which in mac80211 is represented by the existence of the mlme work struct, so we cannot free that, instead we need to just set it to idle. (Brought to you by the hacking session at Kernel Summit 2009 in Tokyo, Japan. -- JWL) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
*	mac80211: fix ibss joining	Reinette Chatre	2009-10-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recent commit "mac80211: fix logic error ibss merge bssid check" fixed joining of ibss cell when static bssid is provided. In this case ifibss->bssid is set before the cell is joined and comparing that address to a bss should thus always succeed. Unfortunately this change broke the other case of joining a ibss cell without providing a static bssid where the value of ifibss->bssid is not set before the cell is joined. Since ifibss->bssid may be set before or after joining the cell we do not learn anything by comparing it to a known bss. Remove this check. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
*	pktgen: Dont leak kernel memory	Eric Dumazet	2009-10-24
\| \| \| \| \| \| \| \| \| \| \| \|	While playing with pktgen, I realized IP ID was not filled and a random value was taken, possibly leaking 2 bytes of kernel memory. We can use an increasing ID, this can help diagnostics anyway. Also clear packet payload, instead of leaking kernel memory. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: use WARN() for the WARN_ON in commit b6b39e8f3fbbb	Arjan van de Ven	2009-10-23
\| \| \| \| \| \| \| \| \| \| \| \|	Commit b6b39e8f3fbbb (tcp: Try to catch MSG_PEEK bug) added a printk() to the WARN_ON() that's in tcp.c. This patch changes this combination to WARN(); the advantage of WARN() is that the printk message shows up inside the message, so that kerneloops.org will collect the message. In addition, this gets rid of an extra if() statement. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: Try to catch MSG_PEEK bug	Herbert Xu	2009-10-20
\| \| \| \| \| \| \| \| \|	This patch tries to print out more information when we hit the MSG_PEEK bug in tcp_recvmsg. It's been around since at least 2005 and it's about time that we finally fix it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Fix IP_MULTICAST_IF	Eric Dumazet	2009-10-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls. This function should be called only with RTNL or dev_base_lock held, or reader could see a corrupt hash chain and eventually enter an endless loop. Fix is to call dev_get_by_index()/dev_put(). If this happens to be performance critical, we could define a new dev_exist_by_index() function to avoid touching dev refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bluetooth: static lock key fix	Dave Young	2009-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When shutdown ppp connection, lockdep waring about non-static key will happen, it is caused by the lock is not initialized properly at that time. Fix with tuning the lock/skb_queue_head init order [ 94.339261] INFO: trying to register non-static key. [ 94.342509] the code is fine but needs lockdep annotation. [ 94.342509] turning off the locking correctness validator. [ 94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2 [ 94.342509] Call Trace: [ 94.342509] [<c0248fbe>] register_lock_class+0x58/0x241 [ 94.342509] [<c024b5df>] ? __lock_acquire+0xb57/0xb73 [ 94.342509] [<c024ab34>] __lock_acquire+0xac/0xb73 [ 94.342509] [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de [ 94.342509] [<c024b662>] lock_acquire+0x67/0x84 [ 94.342509] [<c04cd1eb>] ? skb_dequeue+0x15/0x41 [ 94.342509] [<c054a857>] _spin_lock_irqsave+0x2f/0x3f [ 94.342509] [<c04cd1eb>] ? skb_dequeue+0x15/0x41 [ 94.342509] [<c04cd1eb>] skb_dequeue+0x15/0x41 [ 94.342509] [<c054a648>] ? _read_unlock+0x1d/0x20 [ 94.342509] [<c04cd641>] skb_queue_purge+0x14/0x1b [ 94.342509] [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap] [ 94.342509] [<c024b5df>] ? __lock_acquire+0xb57/0xb73 [ 94.342509] [<c0249c04>] ? mark_lock+0x1e/0x1c7 [ 94.342509] [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth] [ 94.342509] [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap] [ 94.342509] [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth] [ 94.342509] [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap] [ 94.342509] [<c02302c4>] tasklet_action+0x69/0xc1 [ 94.342509] [<c022fbef>] __do_softirq+0x94/0x11e [ 94.342509] [<c022fcaf>] do_softirq+0x36/0x5a [ 94.342509] [<c022fe14>] irq_exit+0x35/0x68 [ 94.342509] [<c0204ced>] do_IRQ+0x72/0x89 [ 94.342509] [<c02038ee>] common_interrupt+0x2e/0x34 [ 94.342509] [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d [ 94.342509] [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238 [ 94.342509] [<c049d238>] cpuidle_idle_call+0x5c/0x94 [ 94.342509] [<c02023f8>] cpu_idle+0x4e/0x6f [ 94.342509] [<c0534153>] rest_init+0x53/0x55 [ 94.342509] [<c0781894>] start_kernel+0x2f0/0x2f5 [ 94.342509] [<c0781091>] i386_start_kernel+0x91/0x96 Reported-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Tested-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	bluetooth: scheduling while atomic bug fix	Dave Young	2009-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Due to driver core changes dev_set_drvdata will call kzalloc which should be in might_sleep context, but hci_conn_add will be called in atomic context Like dev_set_name move dev_set_drvdata to work queue function. oops as following: Oct 2 17:41:59 darkstar kernel: [ 438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546 Oct 2 17:41:59 darkstar kernel: [ 438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool Oct 2 17:41:59 darkstar kernel: [ 438.001348] 2 locks held by sdptool/2133: Oct 2 17:41:59 darkstar kernel: [ 438.001350] #0: (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [<faa1d2f5>] lock_sock+0xa/0xc [l2cap] Oct 2 17:41:59 darkstar kernel: [ 438.001360] #1: (&hdev->lock){+.-.+.}, at: [<faa20e16>] l2cap_sock_connect+0x103/0x26b [l2cap] Oct 2 17:41:59 darkstar kernel: [ 438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2 Oct 2 17:41:59 darkstar kernel: [ 438.001373] Call Trace: Oct 2 17:41:59 darkstar kernel: [ 438.001381] [<c022433f>] __might_sleep+0xde/0xe5 Oct 2 17:41:59 darkstar kernel: [ 438.001386] [<c0298843>] __kmalloc+0x4a/0x15a Oct 2 17:41:59 darkstar kernel: [ 438.001392] [<c03f0065>] ? kzalloc+0xb/0xd Oct 2 17:41:59 darkstar kernel: [ 438.001396] [<c03f0065>] kzalloc+0xb/0xd Oct 2 17:41:59 darkstar kernel: [ 438.001400] [<c03f04ff>] device_private_init+0x15/0x3d Oct 2 17:41:59 darkstar kernel: [ 438.001405] [<c03f24c5>] dev_set_drvdata+0x18/0x26 Oct 2 17:41:59 darkstar kernel: [ 438.001414] [<fa51fff7>] hci_conn_init_sysfs+0x40/0xd9 [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001422] [<fa51cdc0>] ? hci_conn_add+0x128/0x186 [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001429] [<fa51ce0f>] hci_conn_add+0x177/0x186 [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001437] [<fa51cf8a>] hci_connect+0x3c/0xfb [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001442] [<faa20e87>] l2cap_sock_connect+0x174/0x26b [l2cap] Oct 2 17:41:59 darkstar kernel: [ 438.001448] [<c04c8df5>] sys_connect+0x60/0x7a Oct 2 17:41:59 darkstar kernel: [ 438.001453] [<c024b703>] ? lock_release_non_nested+0x84/0x1de Oct 2 17:41:59 darkstar kernel: [ 438.001458] [<c028804b>] ? might_fault+0x47/0x81 Oct 2 17:41:59 darkstar kernel: [ 438.001462] [<c028804b>] ? might_fault+0x47/0x81 Oct 2 17:41:59 darkstar kernel: [ 438.001468] [<c033361f>] ? __copy_from_user_ll+0x11/0xce Oct 2 17:41:59 darkstar kernel: [ 438.001472] [<c04c9419>] sys_socketcall+0x82/0x17b Oct 2 17:41:59 darkstar kernel: [ 438.001477] [<c020329d>] syscall_call+0x7/0xb Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: fix TCP_DEFER_ACCEPT retrans calculation	Julian Anastasov	2009-10-19
\| \| \| \| \| \| \| \| \| \| \| \|	Fix TCP_DEFER_ACCEPT conversion between seconds and retransmission to match the TCP SYN-ACK retransmission periods because the time is converted to such retransmissions. The old algorithm selects one more retransmission in some cases. Allow up to 255 retransmissions. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT	Julian Anastasov	2009-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change SYN-ACK retransmitting code for the TCP_DEFER_ACCEPT users to not retransmit SYN-ACKs during the deferring period if ACK from client was received. The goal is to reduce traffic during the deferring period. When the period is finished we continue with sending SYN-ACKs (at least one) but this time any traffic from client will change the request to established socket allowing application to terminate it properly. Also, do not drop acked request if sending of SYN-ACK fails. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	tcp: accept socket after TCP_DEFER_ACCEPT period	Julian Anastasov	2009-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Willy Tarreau and many other folks in recent years were concerned what happens when the TCP_DEFER_ACCEPT period expires for clients which sent ACK packet. They prefer clients that actively resend ACK on our SYN-ACK retransmissions to be converted from open requests to sockets and queued to the listener for accepting after the deferring period is finished. Then application server can decide to wait longer for data or to properly terminate the connection with FIN if read() returns EAGAIN which is an indication for accepting after the deferring period. This change still can have side effects for applications that expect always to see data on the accepted socket. Others can be prepared to work in both modes (with or without TCP_DEFER_ACCEPT period) and their data processing can ignore the read=EAGAIN notification and to allocate resources for clients which proved to have no data to send during the deferring period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as flag (not as a timeout) to wait for data will notice clients that didn't send data for 3 seconds but that still resend ACKs. Thanks to Willy Tarreau for the initial idea and to Eric Dumazet for the review and testing the change. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Revert "tcp: fix tcp_defer_accept to consider the timeout"	David S. Miller	2009-10-19
\| \| \| \| \| \| \| \| \|	This reverts commit 6d01a026b7d3009a418326bdcf313503a314f1ea. Julian Anastasov, Willy Tarreau and Eric Dumazet have come up with a more correct way to deal with this. Signed-off-by: David S. Miller <davem@davemloft.net>
*	AF_UNIX: Fix deadlock on connecting to shutdown socket	Tomoki Sekiyama	2009-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I found a deadlock bug in UNIX domain socket, which makes able to DoS attack against the local machine by non-root users. How to reproduce: 1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstruct namespace(), and shutdown(2) it. 2. Repeat connect(2)ing to the listening socket from the other sockets until the connection backlog is full-filled. 3. connect(2) takes the CPU forever. If every core is taken, the system hangs. PoC code: (Run as many times as cores on SMP machines.) int main(void) { int ret; int csd; int lsd; struct sockaddr_un sun; / make an abstruct name address () / memset(&sun, 0, sizeof(sun)); sun.sun_family = PF_UNIX; sprintf(&sun.sun_path[1], "%d", getpid()); /* create the listening socket and shutdown / lsd = socket(AF_UNIX, SOCK_STREAM, 0); bind(lsd, (struct sockaddr )&sun, sizeof(sun)); listen(lsd, 1); shutdown(lsd, SHUT_RDWR); /* connect loop / alarm(15); / forcely exit the loop after 15 sec / for (;;) { csd = socket(AF_UNIX, SOCK_STREAM, 0); ret = connect(csd, (struct sockaddr )&sun, sizeof(sun)); if (-1 == ret) { perror("connect()"); break; } puts("Connection OK"); } return 0; } (*) Make sun_path[0] = 0 to use the abstruct namespace. If a file-based socket is used, the system doesn't deadlock because of context switches in the file system layer. Why this happens: Error checks between unix_socket_connect() and unix_wait_for_peer() are inconsistent. The former calls the latter to wait until the backlog is processed. Despite the latter returns without doing anything when the socket is shutdown, the former doesn't check the shutdown state and just retries calling the latter forever. Patch: The patch below adds shutdown check into unix_socket_connect(), so connect(2) to the shutdown socket will return -ECONREFUSED. Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2009-10-13
\|\ \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
\| *	mac80211: document ieee80211_rx() context requirement	Johannes Berg	2009-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ieee80211_rx() must be called with softirqs disabled since the networking stack requires this for netif_rx() and some code in mac80211 can assume that it can not be processing its own tasklet and this call at the same time. It may be possible to remove this requirement after a careful audit of mac80211 and doing any needed locking improvements in it along with disabling softirqs around netif_rx(). An alternative might be to push all packet processing to process context in mac80211, instead of to the tasklet, and add other synchronisation. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	mac80211: fix ibss race	Johannes Berg	2009-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a scan completes, we call ieee80211_sta_find_ibss(), which is also called from other places. When the scan was done in software, there's no problem as both run from the single-threaded mac80211 workqueue and are thus serialised against each other, but with hardware scan the completion can be in a different context and race against callers of this function from the workqueue (e.g. due to beacon RX). So instead of calling ieee80211_sta_find_ibss() directly, just arm the timer and have it fire, scheduling the work, which will invoke ieee80211_sta_find_ibss() (if that is appropriate in the current state). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	mac80211: fix logic error ibss merge bssid check	Felix Fietkau	2009-10-12
\| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Felix Fietkau <nbd@openwrt.org> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* \|	udp: Fix udp_poll() and ioctl()	Eric Dumazet	2009-10-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	udp_poll() can in some circumstances drop frames with incorrect checksums. Problem is we now have to lock the socket while dropping frames, or risk sk_forward corruption. This bug is present since commit 95766fff6b9a78d1 ([UDP]: Add memory accounting.) While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	tcp: fix tcp_defer_accept to consider the timeout	Willy Tarreau	2009-10-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I was trying to use TCP_DEFER_ACCEPT and noticed that if the client does not talk, the connection is never accepted and remains in SYN_RECV state until the retransmits expire, where it finally is deleted. This is bad when some firewall such as netfilter sits between the client and the server because the firewall sees the connection in ESTABLISHED state while the server will finally silently drop it without sending an RST. This behaviour contradicts the man page which says it should wait only for some time : TCP_DEFER_ACCEPT (since Linux 2.4) Allows a listener to be awakened only when data arrives on the socket. Takes an integer value (seconds), this can bound the maximum number of attempts TCP will make to complete the connection. This option should not be used in code intended to be portable. Also, looking at ipv4/tcp.c, a retransmit counter is correctly computed : case TCP_DEFER_ACCEPT: icsk->icsk_accept_queue.rskq_defer_accept = 0; if (val > 0) { /* Translate value in seconds to number of * retransmits */ while (icsk->icsk_accept_queue.rskq_defer_accept < 32 && val > ((TCP_TIMEOUT_INIT / HZ) << icsk->icsk_accept_queue.rskq_defer_accept)) icsk->icsk_accept_queue.rskq_defer_accept++; icsk->icsk_accept_queue.rskq_defer_accept++; } break; ==> rskq_defer_accept is used as a counter of retransmits. But in tcp_minisocks.c, this counter is only checked. And in fact, I have found no location which updates it. So I think that what was intended was to decrease it in tcp_minisocks whenever it is checked, which the trivial patch below does. Signed-off-by: Willy Tarreau <w@1wt.eu> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	pkt_sched: pedit use proper struct	jamal	2009-10-12
\|/ \| \| \| \| \| \| \| \| \| \|	This probably deserves to go into -stable. Pedit will reject a policy that is large because it uses the wrong structure in the policy validation. This fixes it. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2009-10-08
\|\ \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
\| *	cfg80211: fix netns error unwinding bug	Johannes Berg	2009-10-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The error unwinding code in set_netns has a bug that will make it run into a BUG_ON if passed a bad wiphy index, fix by not trying to unlock a wiphy that doesn't exist. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	mac80211: use kfree_skb() to free struct sk_buff pointers	Roel Kluin	2009-10-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	kfree_skb() should be used to free struct sk_buff pointers. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	mac80211: fix vlan and optimise RX	Johannes Berg	2009-10-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When receiving data frames, we can send them only to the interface they belong to based on transmitting station (this doesn't work for probe requests). Also, don't try to handle other frames for AP_VLAN at all since those interface should only receive data. Additionally, the transmit side must check that the station we're sending a frame to is actually on the interface we're transmitting on, and not transmit packets to functions that live on other interfaces, so validate that as well. Another bug fix is needed in sta_info.c where in the VLAN case when adding/removing stations we overwrite the sdata variable we still need. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>
* \|	netlink: fix typo in initialization	Jiri Pirko	2009-10-08
\|/ \| \| \| \| \| \| \| \|	Commit 9ef1d4c7c7aca1cd436612b6ca785b726ffb8ed8 ("[NETLINK]: Missing initializations in dumped data") introduced a typo in initialization. This patch fixes this. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	ipv4: arp_notify address list bug	Stephen Hemminger	2009-10-07
\| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a bug with arp_notify. If arp_notify is enabled, kernel will crash if address is changed and no IP address is assigned. http://bugzilla.kernel.org/show_bug.cgi?id=14330 Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	wext: let get_wireless_stats() sleep	Johannes Berg	2009-10-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A number of drivers (recently including cfg80211-based ones) assume that all wireless handlers, including statistics, can sleep and they often also implicitly assume that the rtnl is held around their invocation. This is almost always true now except when reading from sysfs: BUG: sleeping function called from invalid context at kernel/mutex.c:280 in_atomic(): 1, irqs_disabled(): 0, pid: 10450, name: head 2 locks held by head/10450: #0: (&buffer->mutex){+.+.+.}, at: [<c10ceb99>] sysfs_read_file+0x24/0xf4 #1: (dev_base_lock){++.?..}, at: [<c12844ee>] wireless_show+0x1a/0x4c Pid: 10450, comm: head Not tainted 2.6.32-rc3 #1 Call Trace: [<c102301c>] __might_sleep+0xf0/0xf7 [<c1324355>] mutex_lock_nested+0x1a/0x33 [<f8cea53b>] wdev_lock+0xd/0xf [cfg80211] [<f8cea58f>] cfg80211_wireless_stats+0x45/0x12d [cfg80211] [<c13118d6>] get_wireless_stats+0x16/0x1c [<c12844fe>] wireless_show+0x2a/0x4c Fix this by using the rtnl instead of dev_base_lock. Reported-by: Miles Lane <miles.lane@gmail.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	pktgen: restore nanosec delays	Eric Dumazet	2009-10-05
\| \| \| \| \| \| \| \|	Commit fd29cf72 (pktgen: convert to use ktime_t) inadvertantly converted "delay" parameter from nanosec to microsec. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	pktgen: Fix multiqueue handling	Eric Dumazet	2009-10-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is not currently possible to instruct pktgen to use one selected tx queue. When Robert added multiqueue support in commit 45b270f8, he added an interval (queue_map_min, queue_map_max), and his code doesnt take into account the case of min = max, to select one tx queue exactly. I suspect a high performance setup on a eight txqueue device wants to use exactly eight cpus, and assign one tx queue to each sender. This patchs makes pktgen select the right tx queue, not the first one. Also updates Documentation to reflect Robert changes. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: splice() from tcp to pipe should take into account O_NONBLOCK	Eric Dumazet	2009-10-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tcp_splice_read() doesnt take into account socket's O_NONBLOCK flag Before this patch : splice(socket,0,pipe,0,1281024,SPLICE_F_MOVE); causes a random endless block (if pipe is full) and splice(socket,0,pipe,0,1281024,SPLICE_F_MOVE \| SPLICE_F_NONBLOCK); will return 0 immediately if the TCP buffer is empty. User application has no way to instruct splice() that socket should be in blocking mode but pipe in nonblock more. Many projects cannot use splice(tcp -> pipe) because of this flaw. http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d (splice: add SPLICE_F_NONBLOCK flag ) It doesn't make the splice itself necessarily nonblocking (because the actual file descriptors that are spliced from/to may block unless they have the O_NONBLOCK flag set), but it makes the splice pipe operations nonblocking. Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only This patch instruct tcp_splice_read() to use the underlying file O_NONBLOCK flag, as other socket operations do. Users will then call : splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE \| SPLICE_F_NONBLOCK ); to block on data coming from socket (if file is in blocking mode), and not block on pipe output (to avoid deadlock) First version of this patch was submitted by Octavian Purdila Reported-by: Volker Lendecke <vl@samba.org> Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: Use sk_mark for routing lookup in more places	Atis Elsts	2009-10-01
\| \| \| \| \| \| \| \| \| \| \|	This patch against v2.6.31 adds support for route lookup using sk_mark in some more places. The benefits from this patch are the following. First, SO_MARK option now has effect on UDP sockets too. Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing lookup correctly if TCP sockets with SO_MARK were used. Signed-off-by: Atis Elsts <atis@mikrotik.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
*	IPv4 TCP fails to send window scale option when window scale is zero	Ori Finkelman	2009-10-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Acknowledge TCP window scale support by inserting the proper option in SYN/ACK and SYN headers even if our window scale is zero. This fixes the following observed behavior: 1. Client sends a SYN with TCP window scaling option and non zero window scale value to a Linux box. 2. Linux box notes large receive window from client. 3. Linux decides on a zero value of window scale for its part. 4. Due to compare against requested window scale size option, Linux does not to send windows scale TCP option header on SYN/ACK at all. With the following result: Client box thinks TCP window scaling is not supported, since SYN/ACK had no TCP window scale option, while Linux thinks that TCP window scaling is supported (and scale might be non zero), since SYN had TCP window scale option and we have a mismatched idea between the client and server regarding window sizes. Probably it also fixes up the following bug (not observed in practice): 1. Linux box opens TCP connection to some server. 2. Linux decides on zero value of window scale. 3. Due to compare against computed window scale size option, Linux does not to set windows scale TCP option header on SYN. With the expected result that the server OS does not use window scale option due to not receiving such an option in the SYN headers, leading to suboptimal performance. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Signed-off-by: Ori Finkelman <ori@comsleep.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net/ipv4/tcp.c: fix min() type mismatch warning	Andrew Morton	2009-10-01
\| \| \| \| \| \| \| \|	net/ipv4/tcp.c: In function 'do_tcp_setsockopt': net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2009-10-01
\|\ \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
\| *	mac80211: Fix [re]association power saving issue on AP side	Igor Perminov	2009-09-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Consider the following step-by step: 1. A STA authenticates and associates with the AP and exchanges traffic. 2. The STA reports to the AP that it is going to PS state. 3. Some time later the STA device goes to the stand-by mode (not only its wi-fi card, but the device itself) and drops the association state without sending a disassociation frame. 4. The STA device wakes up and begins authentication with an Auth frame as it hasn't been authenticated/associated previously. At the step 4 the AP "remembers" the STA and considers it is still in the PS state, so the AP buffers frames, which it has to send to the STA. But the STA isn't actually in the PS state and so it neither checks TIM bits nor reports to the AP that it isn't power saving. Because of that authentication/[re]association fails. To fix authentication/[re]association stage of this issue, Auth, Assoc Resp and Reassoc Resp frames are transmitted disregarding of STA's power saving state. N.B. This patch doesn't fix further data frame exchange after authentication/[re]association. A patch in hostapd is required to fix that. Signed-off-by: Igor Perminov <igor.perminov@inbox.ru> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* \|	pktgen: Fix delay handling	Eric Dumazet	2009-10-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After last pktgen changes, delay handling is wrong. pktgen actually sends packets at full line speed. Fix is to update pkt_dev->next_tx even if spin() returns early, so that next spin() calls have a chance to see a positive delay. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ax25: Fix possible oops in ax25_make_new	Jarek Poplawski	2009-09-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ax25_make_new, if kmemdup of digipeat returns an error, there would be an oops in sk_free while calling sk_destruct, because sk_protinfo is NULL at the moment; move sk->sk_destruct initialization after this. BTW of reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: restore tx timestamping for accelerated vlans	Eric Dumazet	2009-09-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since commit 9b22ea560957de1484e6b3e8538f7eef202e3596 ( net: fix packet socket delivery in rx irq handler ) We lost rx timestamping of packets received on accelerated vlans. Effect is that tcpdump on real dev can show strange timings, since it gets rx timestamps too late (ie at skb dequeueing time, not at skb queueing time) 14:47:26.986871 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 1 14:47:26.986786 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 1 14:47:27.986888 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 2 14:47:27.986781 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 2 14:47:28.986896 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 3 14:47:28.986780 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 3 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Phonet: fix mutex imbalance	Rémi Denis-Courmont	2009-09-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> port_mutex was unlocked twice. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	sit: fix off-by-one in ipip6_tunnel_get_prl	Sascha Hlusiak	2009-09-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When requesting all prl entries (kprl.addr == INADDR_ANY) and there are more prl entries than there is space passed from userspace, the existing code would always copy cmax+1 entries, which is more than can be handled. This patch makes the kernel copy only exactly cmax entries. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Acked-By: Fred L. Templin <Fred.L.Templin@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: Fix sock_wfree() race	Eric Dumazet	2009-09-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) opens a window in sock_wfree() where another cpu might free the socket we are working on. A fix is to call sk->sk_write_space(sk) while still holding a reference on sk. Reported-by: Jike Song <albcamus@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: Make setsockopt() optlen be unsigned.	David S. Miller	2009-09-30
\|/ \| \| \| \| \| \| \| \| \| \| \|	This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge branch 'master' of ↵	David S. Miller	2009-09-28
\|\ \| \| \| \| \| \|	ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6