aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAge
* Merge branch 'master' of ↵David S. Miller2009-12-03
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
| * netfilter: net/ipv[46]/netfilter: Move && and || to end of previous lineJoe Perches2009-11-23
| | | | | | | | | | | | | | Compile tested only. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: xtables: fix conntrack match v1 ipt-save outputFlorian Westphal2009-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d6d3f08b0fd998b647a05540cedd11a067b72867 (netfilter: xtables: conntrack match revision 2) does break the v1 conntrack match iptables-save output in a subtle way. Problem is as follows: up = kmalloc(sizeof(*up), GFP_KERNEL); [..] /* * The strategy here is to minimize the overhead of v1 matching, * by prebuilding a v2 struct and putting the pointer into the * v1 dataspace. */ memcpy(up, info, offsetof(typeof(*info), state_mask)); [..] *(void **)info = up; As the v2 struct pointer is saved in the match data space, it clobbers the first structure member (->origsrc_addr). Because the _v1 match function grabs this pointer and does not actually look at the v1 origsrc, run time functionality does not break. But iptables -nvL (or iptables-save) cannot know that v1 origsrc_addr has been overloaded in this way: $ iptables -p tcp -A OUTPUT -m conntrack --ctorigsrc 10.0.0.1 -j ACCEPT $ iptables-save -A OUTPUT -p tcp -m conntrack --ctorigsrc 128.173.134.206 -j ACCEPT (128.173... is the address to the v2 match structure). To fix this, we take advantage of the fact that the v1 and v2 structures are identical with exception of the last two structure members (u8 in v1, u16 in v2). We extract them as early as possible and prevent the v2 matching function from looking at those two members directly. Previously reported by Michel Messerschmidt via Ben Hutchings, also see Debian Bug tracker #556587. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: nf_ct_tcp: improve out-of-sync situation in TCP trackingPablo Neira Ayuso2009-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Without this patch, if we receive a SYN packet from the client while the firewall is out-of-sync, we let it go through. Then, if we see the SYN/ACK reply coming from the server, we destroy the conntrack entry and drop the packet to trigger a new retransmission. Then, the retransmision from the client is used to start a new clean session. This patch improves the current handling. Basically, if we see an unexpected SYN packet, we annotate the TCP options. Then, if we see the reply SYN/ACK, this means that the firewall was indeed out-of-sync. Therefore, we set a clean new session from the existing entry based on the annotated values. This patch adds two new 8-bits fields that fit in a 16-bits gap of the ip_ct_tcp structure. This patch is particularly useful for conntrackd since the asynchronous nature of the state-synchronization allows to have backup nodes that are not perfect copies of the master. This helps to improve the recovery under some worst-case scenarios. I have tested this by creating lots of conntrack entries in wrong state: for ((i=1024;i<65535;i++)); do conntrack -I -p tcp -s 192.168.2.101 -d 192.168.2.2 --sport $i --dport 80 -t 800 --state ESTABLISHED -u ASSURED,SEEN_REPLY; done Then, I make some TCP connections: $ echo GET / | nc 192.168.2.2 80 The events show the result: [UPDATE] tcp 6 60 SYN_RECV src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] [UPDATE] tcp 6 120 FIN_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] [UPDATE] tcp 6 30 LAST_ACK src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] [UPDATE] tcp 6 120 TIME_WAIT src=192.168.2.101 dst=192.168.2.2 sport=33220 dport=80 src=192.168.2.2 dst=192.168.2.101 sport=80 dport=33220 [ASSURED] and tcpdump shows no retransmissions: 20:47:57.271951 IP 192.168.2.101.33221 > 192.168.2.2.www: S 435402517:435402517(0) win 5840 <mss 1460,sackOK,timestamp 4294961827 0,nop,wscale 6> 20:47:57.273538 IP 192.168.2.2.www > 192.168.2.101.33221: S 3509927945:3509927945(0) ack 435402518 win 5792 <mss 1460,sackOK,timestamp 235681024 4294961827,nop,wscale 4> 20:47:57.273608 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024> 20:47:57.273693 IP 192.168.2.101.33221 > 192.168.2.2.www: P 435402518:435402524(6) ack 3509927946 win 92 <nop,nop,timestamp 4294961827 235681024> 20:47:57.275492 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402524 win 362 <nop,nop,timestamp 235681024 4294961827> 20:47:57.276492 IP 192.168.2.2.www > 192.168.2.101.33221: P 3509927946:3509928082(136) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827> 20:47:57.276515 IP 192.168.2.101.33221 > 192.168.2.2.www: . ack 3509928082 win 108 <nop,nop,timestamp 4294961828 235681025> 20:47:57.276521 IP 192.168.2.2.www > 192.168.2.101.33221: F 3509928082:3509928082(0) ack 435402524 win 362 <nop,nop,timestamp 235681025 4294961827> 20:47:57.277369 IP 192.168.2.101.33221 > 192.168.2.2.www: F 435402524:435402524(0) ack 3509928083 win 108 <nop,nop,timestamp 4294961828 235681025> 20:47:57.279491 IP 192.168.2.2.www > 192.168.2.101.33221: . ack 435402525 win 362 <nop,nop,timestamp 235681025 4294961828> I also added a rule to log invalid packets, with no occurrences :-) . Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: remove unneccessary checks from netlink notifiersPatrick McHardy2009-11-06
| | | | | | | | | | | | | | The NETLINK_URELEASE notifier is only invoked for bound sockets, so there is no need to check ->pid again. Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: nf_nat_helper: tidy up adjust_tcp_sequenceHannes Eder2009-11-05
| | | | | | | | | | | | | | | | | | | | | | The variable 'other_way' gets initialized but is not read afterwards, so remove it. Pass the right arguments to a pr_debug call. While being at tidy up a bit and it fix this checkpatch warning: WARNING: suspect code indent for conditional statements Signed-off-by: Hannes Eder <heder@google.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: nf_conntrack: avoid additional compare.Changli Gao2009-11-05
| | | | | | | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: remove synchronize_net() calls in ip_queue/ip6_queueEric Dumazet2009-11-04
| | | | | | | | | | | | | | | | nf_unregister_queue_handlers() already does a synchronize_rcu() call, we dont need to do it again in callers. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: xt_socket: make module available for INPUT chainJan Engelhardt2009-10-29
| | | | | | | | | | | | | | | | | | | | | | This should make it possible to test for the existence of local sockets in the INPUT path. References: http://marc.info/?l=netfilter-devel&m=125380481517129&w=2 Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Balazs Scheidler <bazsi@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | net: Batch inet_twsk_purgeEric W. Biederman2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function walks the whole hashtable so there is no point in passing it a network namespace. Instead I purge all timewait sockets from dead network namespaces that I find. If the namespace is one of the once I am trying to purge I am guaranteed no new timewait sockets can be formed so this will get them all. If the namespace is one I am not acting for it might form a few more but I will call inet_twsk_purge again and shortly to get rid of them. In any even if the network namespace is dead timewait sockets are useless. Move the calls of inet_twsk_purge into batch_exit routines so that if I am killing a bunch of namespaces at once I will just call inet_twsk_purge once and save a lot of redundant unnecessary work. My simple 4k network namespace exit test the cleanup time dropped from roughly 8.2s to 1.6s. While the time spent running inet_twsk_purge fell to about 2ms. 1ms for ipv4 and 1ms for ipv6. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Use rcu lookups in inet_twsk_purge.Eric W. Biederman2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While we are looking up entries to free there is no reason to take the lock in inet_twsk_purge. We have to drop locks and restart occassionally anyway so adding a few more in case we get on the wrong list because of a timewait move is no big deal. At the same time not taking the lock for long periods of time is much more polite to the rest of the users of the hash table. In my test configuration of killing 4k network namespaces this change causes 4k back to back runs of inet_twsk_purge on an empty hash table to go from roughly 20.7s to 3.3s, and the total time to destroy 4k network namespaces goes from roughly 44s to 3.3s. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Allow fib_rule_unregister to batchEric W. Biederman2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Refactor the code so fib_rules_register always takes a template instead of the actual fib_rules_ops structure that will be used. This is required for network namespace support so 2 out of the 3 callers already do this, it allows the error handling to be made common, and it allows fib_rules_unregister to free the template for hte caller. Modify fib_rules_unregister to use call_rcu instead of syncrhonize_rcu to allw multiple namespaces to be cleaned up in the same rcu grace period. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netns: Add an explicit rcu_barrier to unregister_pernet_{device|subsys}Eric W. Biederman2009-12-03
| | | | | | | | | | | | | | | | | | This allows namespace exit methods to batch work that comes requires an rcu barrier using call_rcu without having to treat the unregister_pernet_operations cases specially. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Allow xfrm_user_net_exit to batch efficiently.Eric W. Biederman2009-12-03
| | | | | | | | | | | | | | | | | | | | | | xfrm.nlsk is provided by the xfrm_user module and is access via rcu from other parts of the xfrm code. Add xfrm.nlsk_stash a copy of xfrm.nlsk that will never be set to NULL. This allows the synchronize_net and netlink_kernel_release to be deferred until a whole batch of xfrm.nlsk sockets have been set to NULL. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Move network device exit batchingEric W. Biederman2009-12-03
| | | | | | | | | | | | | | | | Move network device exit batching from a special case in net_namespace.c to using common mechanisms in dev.c Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Add support for batching network namespace cleanupsEric W. Biederman2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Add exit_list to struct net to support building lists of network namespaces to cleanup. - Add exit_batch to pernet_operations to allow running operations only once during a network namespace exit. Instead of once per network namespace. - Factor opt ops_exit_list and ops_exit_free so the logic with cleanup up a network namespace does not need to be duplicated. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4 05/05: add sysctl to accept packets with local source addressesPatrick McHardy2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8ec1e0ebe26087bfc5c0394ada5feb5758014fc8 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:16:35 2009 +0100 ipv4: add sysctl to accept packets with local source addresses Change fib_validate_source() to accept packets with a local source address when the "accept_local" sysctl is set for the incoming inet device. Combined with the previous patches, this allows to communicate between multiple local interfaces over the wire. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net 04/05: fib_rules: allow to delete local rulePatrick McHardy2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d124356ce314fff22a047ea334379d5105b2d834 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:16:35 2009 +0100 net: fib_rules: allow to delete local rule Allow to delete the local rule and recreate it with a higher priority. This can be used to force packets with a local destination out on the wire instead of routing them to loopback. Additionally this patch allows to recreate rules with a priority of 0. Combined with the previous patch to allow oif classification, a socket can be bound to the desired interface and packets routed to the wire like this: # move local rule to lower priority ip rule add pref 1000 lookup local ip rule del pref 0 # route packets of sockets bound to eth0 to the wire independant # of the destination address ip rule add pref 100 oif eth0 lookup 100 ip route add default dev eth0 table 100 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net 03/05: fib_rules: add oif classificationPatrick McHardy2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 68144d350f4f6c348659c825cde6a82b34c27a91 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:05:25 2009 +0100 net: fib_rules: add oif classification Support routing table lookup based on the flow's oif. This is useful to classify packets originating from sockets bound to interfaces differently. The route cache already includes the oif and needs no changes. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net 02/05: fib_rules: rename ifindex/ifname/FRA_IFNAME to ↵Patrick McHardy2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | iifindex/iifname/FRA_IIFNAME commit 229e77eec406ad68662f18e49fda8b5d366768c5 Author: Patrick McHardy <kaber@trash.net> Date: Thu Dec 3 12:05:23 2009 +0100 net: fib_rules: rename ifindex/ifname/FRA_IFNAME to iifindex/iifname/FRA_IIFNAME The next patch will add oif classification, rename interface related members and attributes to reflect that they're used for iif classification. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: sysctl_tcp_cookie_size needs to be exported to modules.David S. Miller2009-12-03
| | | | | | | | | | | | | | | | | | Otherwise: ERROR: "sysctl_tcp_cookie_size" [net/ipv6/ipv6.ko] undefined! make[1]: *** [__modpost] Error 1 Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Fix warning on 64-bit.David S. Miller2009-12-03
| | | | | | | | | | | | | | net/ipv4/tcp_output.c: In function ‘tcp_make_synack’: net/ipv4/tcp_output.c:2488: warning: cast from pointer to integer of different size Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Teach vlans to cleanup as a pernet subsystemEric W. Biederman2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Take advantage of the fact that an explicit rtnl_kill_links is unnecessary (and skipping it improves batching), as network namespace exit calls dellink on all remaining virtual devices, and rtnl_link_unregister calls dellink on all outstanding devices in that network namespace. To do this we need to leave the vlan proc directories in place until after network device exit time, which is done by using register_pernet_subsys instead of register_pernet_device. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | TCPCT part 1g: Responder Cookie => InitiatorWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Parse incoming TCP_COOKIE option(s). Calculate <SYN,ACK> TCP_COOKIE option. Send optional <SYN,ACK> data. This is a significantly revised implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 Requires: TCPCT part 1a: add request_values parameter for sending SYNACK TCPCT part 1b: generate Responder Cookie secret TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS TCPCT part 1d: define TCP cookie option, extend existing struct's TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONS TCPCT part 1f: Initiator Cookie => Responder Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
* | TCPCT part 1f: Initiator Cookie => ResponderWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calculate and format <SYN> TCP_COOKIE option. This is a significantly revised implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 Requires: TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS TCPCT part 1d: define TCP cookie option, extend existing struct's Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
* | TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONSWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide per socket control of the TCP cookie option and SYN/SYNACK data. This is a straightforward re-implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 The principle difference is using a TCP option to carry the cookie nonce, instead of a user configured offset in the data. Allocations have been rearranged to avoid requiring GFP_ATOMIC. Requires: net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS TCPCT part 1d: define TCP cookie option, extend existing struct's Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
* | TCPCT part 1d: define TCP cookie option, extend existing struct'sWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Data structures are carefully composed to require minimal additions. For example, the struct tcp_options_received cookie_plus variable fits between existing 16-bit and 8-bit variables, requiring no additional space (taking alignment into consideration). There are no additions to tcp_request_sock, and only 1 pointer in tcp_sock. This is a significantly revised implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 The principle difference is using a TCP option to carry the cookie nonce, instead of a user configured offset in the data. This is more flexible and less subject to user configuration error. Such a cookie option has been suggested for many years, and is also useful without SYN data, allowing several related concepts to use the same extension option. "Re: SYN floods (was: does history repeat itself?)", September 9, 1996. http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html "Re: what a new TCP header might look like", May 12, 1998. ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail These functions will also be used in subsequent patches that implement additional features. Requires: TCPCT part 1a: add request_values parameter for sending SYNACK TCPCT part 1b: generate Responder Cookie secret TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
* | TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONSWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define sysctl (tcp_cookie_size) to turn on and off the cookie option default globally, instead of a compiled configuration option. Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant data values, retrieving variable cookie values, and other facilities. Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h, near its corresponding struct tcp_options_received (prior to changes). This is a straightforward re-implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 These functions will also be used in subsequent patches that implement additional features. Requires: net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
* | TCPCT part 1b: generate Responder Cookie secretWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define (missing) hash message size for SHA1. Define hashing size constants specific to TCP cookies. Add new function: tcp_cookie_generator(). Maintain global secret values for tcp_cookie_generator(). This is a significantly revised implementation of earlier (15-year-old) Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform. Linux RCU technique appears to be well-suited to this application, though neither of the circular queue items are freed. These functions will also be used in subsequent patches that implement additional features. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | TCPCT part 1a: add request_values parameter for sending SYNACKWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add optional function parameters associated with sending SYNACK. These parameters are not needed after sending SYNACK, and are not used for retransmission. Avoids extending struct tcp_request_sock, and avoids allocating kernel memory. Also affects DCCP as it uses common struct request_sock_ops, but this parameter is currently reserved for future use. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | skbuff: remove skb_dma_map/unmapAlexander Duyck2009-12-02
| | | | | | | | | | | | | | | | | | | | | | The two functions skb_dma_map/unmap are unsafe to use as they cause problems when packets are cloned and sent to multiple devices while a HW IOMMU is enabled. Due to this it is best to remove the code so it is not used by any other network driver maintainters. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: compat_sys_recvmmsg user timespec arg can be NULLJean-Mickael Guerin2009-12-02
| | | | | | | | | | | | | | | | | | | | | | We must test if user timespec is non-NULL before copying from userpace, same as sys_recvmmsg(). Commiter note: changed it so that we have just one branch. Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: compat_mmsghdr must be used in sys_recvmmsgJean-Mickael Guerin2009-12-02
| | | | | | | | | | | | | | | | | | | | | | Both to traverse the entries and to set the msg_len field. Commiter note: folded two patches and avoided one branch repeating the compat test. Signed-off-by: Jean-Mickael Guerin <jean-mickael.guerin@6wind.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: fix sctp_setsockopt_autoclose compile warningAndrei Pelinescu-Onciul2009-12-02
| | | | | | | | | | | | | | | | | | | | | | Fix the following warning, when building on 64 bits: net/sctp/socket.c:2091: warning: large integer implicitly truncated to unsigned type Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2009-12-02
|\ \ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/ht.c
| * | ip_fragment: also adjust skb->truesize for packets not owned by a socketPatrick McHardy2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a large packet gets reassembled by ip_defrag(), the head skb accounts for all the fragments in skb->truesize. If this packet is refragmented again, skb->truesize is not re-adjusted to reflect only the head size since its not owned by a socket. If the head fragment then gets recycled and reused for another received fragment, it might exceed the defragmentation limits due to its large truesize value. skb_recycle_check() explicitly checks for linear skbs, so any recycled skb should reflect its true size in skb->truesize. Change ip_fragment() to also adjust the truesize value of skbs not owned by a socket. Reported-and-tested-by: Ben Menchaca <ben@bigfootnetworks.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller2009-12-01
| |\ \
| | * \ Merge branch 'security' of ↵Linus Torvalds2009-11-30
| | |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 * 'security' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6: mac80211: fix spurious delBA handling mac80211: fix two remote exploits
| | | * | mac80211: fix spurious delBA handlingJohannes Berg2009-11-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lennert Buytenhek noticed that delBA handling in mac80211 was broken and has remotely triggerable problems, some of which are due to some code shuffling I did that ended up changing the order in which things were done -- this was commit d75636ef9c1af224f1097941879d5a8db7cd04e5 Author: Johannes Berg <johannes@sipsolutions.net> Date: Tue Feb 10 21:25:53 2009 +0100 mac80211: RX aggregation: clean up stop session and other parts were already present in the original commit d92684e66091c0f0101819619b315b4bb8b5bcc5 Author: Ron Rindjunsky <ron.rindjunsky@intel.com> Date: Mon Jan 28 14:07:22 2008 +0200 mac80211: A-MPDU Tx add delBA from recipient support The first problem is that I moved a BUG_ON before various checks -- thereby making it possible to hit. As the comment indicates, the BUG_ON can be removed since the ampdu_action callback must already exist when the state is != IDLE. The second problem isn't easily exploitable but there's a race condition due to unconditionally setting the state to OPERATIONAL when a delBA frame is received, even when no aggregation session was ever initiated. All the drivers accept stopping the session even then, but that opens a race window where crashes could happen before the driver accepts it. Right now, a WARN_ON may happen with non-HT drivers, while the race opens only for HT drivers. For this case, there are two things necessary to fix it: 1) don't process spurious delBA frames, and be more careful about the session state; don't drop the lock 2) HT drivers need to be prepared to handle a session stop even before the session was really started -- this is true for all drivers (that support aggregation) but iwlwifi which can be fixed easily. The other HT drivers (ath9k and ar9170) are behaving properly already. Reported-by: Lennert Buytenhek <buytenh@marvell.com> Cc: stable@kernel.org Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | | * | mac80211: fix two remote exploitsJohannes Berg2009-11-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lennert Buytenhek noticed a remotely triggerable problem in mac80211, which is due to some code shuffling I did that ended up changing the order in which things were done -- this was in commit d75636ef9c1af224f1097941879d5a8db7cd04e5 Author: Johannes Berg <johannes@sipsolutions.net> Date: Tue Feb 10 21:25:53 2009 +0100 mac80211: RX aggregation: clean up stop session The problem is that the BUG_ON moved before the various checks, and as such can be triggered. As the comment indicates, the BUG_ON can be removed since the ampdu_action callback must already exist when the state is OPERATIONAL. A similar code path leads to a WARN_ON in ieee80211_stop_tx_ba_session, which can also be removed. Cc: stable@kernel.org [2.6.29+] Cc: Lennert Buytenhek <buytenh@marvell.com> Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-11-30
| | |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits) b44: Fix wedge when using netconsole. wan: cosa: drop chan->wsem on error path ep93xx-eth: check for zero MAC address on probe, not on device open NET: smc91x: Fix irq flags smsc9420: prevent BUG() if ethtool is called with interface down r8169: restore mac addr in rtl8169_remove_one and rtl_shutdown ipv4: additional update of dev_net(dev) to struct *net in ip_fragment.c, NULL ptr OOPS e100: Use pci pool to work around GFP_ATOMIC order 5 memory allocation failure sctp: on T3_RTX retransmit all the in-flight chunks pktgen: Fix netdevice unregister macvlan: fix gso_max_size setting rfkill: fix miscdev ops ath9k: set ps_default as false hso: fix soft-lockup hso: fix debug routines pktgen: Fix device name compares stmmac: do not fail when the timer cannot be used. stmmac: fixed a compilation error when use the external timer netfilter: xt_limit: fix invalid return code in limit_mt_check() Au1x00: fix crash when trying register_netdev() ...
| | * \ \ \ Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2009-11-19
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: SUNRPC: Address buffer overrun in rpc_uaddr2sockaddr() NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT
| | | * | | | SUNRPC: Address buffer overrun in rpc_uaddr2sockaddr()Chuck Lever2009-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The size of buf[] must account for the string termination needed for the first strict_strtoul() call. Introduced in commit a02d6926. Fábio Olivé Leite points out that strict_strtoul() requires _either_ '\n\0' _or_ '\0' termination, so use the simpler '\0' here instead. See http://bugzilla.kernel.org/show_bug.cgi?id=14546 . Reported-by: argp@census-labs.com Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Fábio Olivé Leite <fleite@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | | | | ipsec: can not add camellia cipher algorithm when using "ip xfrm state" commandLi Yewang2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | can not add camellia cipher algorithm when using "ip xfrm state" command. Signed-off-by: Li Yewang <lyw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | tcp: tcp_disconnect() should clear window_clampEric Dumazet2009-11-30
| | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS can reuse its TCP socket after calling tcp_disconnect(). We noticed window scaling was not negotiated in SYN packet of next connection request. Fix is to clear tp->window_clamp in tcp_disconnect(). Reported-by: Krzysztof Oledzki <ole@ans.pl> Tested-by: Krzysztof Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ipv4: additional update of dev_net(dev) to struct *net in ip_fragment.c, ↵David Ford2009-11-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NULL ptr OOPS ipv4 ip_frag_reasm(), fully replace 'dev_net(dev)' with 'net', defined previously patched into 2.6.29. Between 2.6.28.10 and 2.6.29, net/ipv4/ip_fragment.c was patched, changing from dev_net(dev) to container_of(...). Unfortunately the goto section (out_fail) on oversized packets inside ip_frag_reasm() didn't get touched up as well. Oversized IP packets cause a NULL pointer dereference and immediate hang. I discovered this running openvasd and my previous email on this is titled: NULL pointer dereference at 2.6.32-rc8:net/ipv4/ip_fragment.c:566 Signed-off-by: David Ford <david@blue-labs.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: Simplify ipip6 aka sit pernet operations.Eric W. Biederman2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: Simplify ip6_tunnel pernet operations.Eric W. Biederman2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: Simplify ipip pernet operations.Eric W. Biederman2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: Simplify ip_gre pernet operations.Eric W. Biederman2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>