path: root/include/net
Commit message (Author, Age)
...
* | | | ipv6: use standard lists for FIB walks (Alexey Dobriyan, 2010-02-18)
| | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | netfilter: nf_defrag_ipv4: fix compilation error with NF_CONNTRACK=n (Patrick McHardy, 2010-02-18)
| | | | As reported by Randy Dunlap <randy.dunlap@oracle.com>, compilation of nf_defrag_ipv4 fails with:
| | | |
| | | |   include/net/netfilter/nf_conntrack.h:94: error: field 'ct_general' has incomplete type
| | | |   include/net/netfilter/nf_conntrack.h:178: error: 'const struct sk_buff' has no member named 'nfct'
| | | |   include/net/netfilter/nf_conntrack.h:185: error: implicit declaration of function 'nf_conntrack_put'
| | | |   include/net/netfilter/nf_conntrack.h:294: error: 'const struct sk_buff' has no member named 'nfct'
| | | |   net/ipv4/netfilter/nf_defrag_ipv4.c:45: error: 'struct sk_buff' has no member named 'nfct'
| | | |   net/ipv4/netfilter/nf_defrag_ipv4.c:46: error: 'struct sk_buff' has no member named 'nfct'
| | | |
| | | | net/nf_conntrack.h must not be included with NF_CONNTRACK=n, add a few #ifdefs.
| | | |
| | | | Long term the header file should be fixed to be usable even with NF_CONNTRACK=n.
| | | |
| | | | Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
| | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | | ipvs: SCTP Trasport Loadbalancing Support (Venkata Mohan Reddy, 2010-02-18)
| | | | Enhance IPVS to load balance SCTP transport protocol packets. This is done based on the SCTP RFC 4960. All possible control chunks have been taken care of.
| | | |
| | | | The state machine used in this code looks somewhat lengthy. I tried to make the state machine easy to understand.
| | | |
| | | | Signed-off-by: Venkata Mohan Reddy Koppula <mohanreddykv@gmail.com>
| | | | Signed-off-by: Simon Horman <horms@verge.net.au>
| | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | | IPv6: convert mc_lock to spinlock (Stephen Hemminger, 2010-02-17)
| | | | Only used for writing, so convert to spinlock.
| | | |
| | | | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | ipv6.h: reassembly: replace calculated magic number with multiplication (Joe Perches, 2010-02-17)
| | | | On Tue, 2010-02-16 at 16:47 +0100, Patrick McHardy wrote:
| | | | > Joe Perches wrote:
| | | | > >> @@ -246,6 +246,8 @@ extern int ipv6_opt_accepted(struct sock *sk, struct sk_buff *skb);
| | | | > >> int ip6_frag_nqueues(struct net *net);
| | | | > >> int ip6_frag_mem(struct net *net);
| | | | > >>
| | | | > >> +#define IPV6_FRAG_HIGH_THRESH 262144 /* == 256*1024 */
| | | | > >> +#define IPV6_FRAG_LOW_THRESH 196608 /* == 192*1024 */
| | | | > >> #define IPV6_FRAG_TIMEOUT (60*HZ) /* 60 seconds */
| | | | > >
| | | | > > 196608 isn't a number I want to remember.
| | | | > > Is this better as:
| | | | > >
| | | | > > #define IPV6_FRAG_HIGH_THRESH (256 * 1024) /* 262144 */
| | | | > > #define IPV6_FRAG_LOW_THRESH (192 * 1024) /* 196608 */
| | | | >
| | | | > Please send a patch, I'll apply it once these patches are in Dave's
| | | | > tree.
| | | |
| | | | Signed-off-by: Joe Perches <joe@perches.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | percpu: add __percpu sparse annotations to net (Tejun Heo, 2010-02-17)
| | | | Add __percpu sparse annotations to net.
| | | |
| | | | These annotations are to make sparse consider percpu variables to be in a different address space and warn if accessed without going through percpu accessors. This patch doesn't affect normal builds.
| | | |
| | | | The macro and type tricks around snmp stats make things a bit interesting. DEFINE/DECLARE_SNMP_STAT() macros mark the target field as __percpu and SNMP_UPD_PO_STATS() macro is updated accordingly. All snmp_mib_*() users which used to cast the argument to (void **) are updated to cast it to (void __percpu **).
| | | |
| | | | Signed-off-by: Tejun Heo <tj@kernel.org>
| | | | Acked-by: David S. Miller <davem@davemloft.net>
| | | | Cc: Patrick McHardy <kaber@trash.net>
| | | | Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
| | | | Cc: Vlad Yasevich <vladislav.yasevich@hp.com>
| | | | Cc: netdev@vger.kernel.org
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net neigh: Decouple per interface neighbour table controls from binary sysctls (Eric W. Biederman, 2010-02-16)
| | | | Stop computing the number of neighbour table settings we have by counting the number of binary sysctls. This behaviour was silly and meant that we could not add another neighbour table setting without also adding another binary sysctl.
| | | |
| | | | Don't pass the binary sysctl path for neighbour table entries into neigh_sysctl_register. These parameters are no longer used and so are just dead code.
| | | |
| | | | Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 (David S. Miller, 2010-02-16)
|\ \ \ \
| * | | | netfilter: nf_conntrack: add support for "conntrack zones" (Patrick McHardy, 2010-02-15)
| | | | | Normally, each connection needs a unique identity. Conntrack zones allow a numerical zone to be specified using the CT target; connections in different zones can use the same identity.
| | | | |
| | | | | Example:
| | | | |
| | | | |   iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
| | | | |   iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1
| | | | |
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
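The effect of a zone on connection identity can be pictured with a small sketch. This is not the nf_conntrack implementation: `tuple_sketch`, `conn_key`, `tuple_equal`, `key_equal` and `zone_demo` are hypothetical names, and the tuple fields are a simplified assumption. It only shows that once the zone is part of the lookup key, the same tuple in two zones no longer collides.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a conntrack tuple; the field set is an assumption. */
struct tuple_sketch {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* Hypothetical lookup key: the tuple plus the zone it was seen in. */
struct conn_key {
    uint16_t            zone;
    struct tuple_sketch tuple;
};

static inline int tuple_equal(const struct tuple_sketch *a,
                              const struct tuple_sketch *b)
{
    return a->src == b->src && a->dst == b->dst &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

/* Two keys match only if both the zone and the tuple match. */
static inline int key_equal(const struct conn_key *a, const struct conn_key *b)
{
    return a->zone == b->zone && tuple_equal(&a->tuple, &b->tuple);
}

/* Usage sketch: one tuple seen in zones 0 and 1 gives two distinct keys. */
static inline void zone_demo(void)
{
    struct tuple_sketch t = { 0x0a000001u, 0x0a000002u, 1234, 80, 6 };
    struct conn_key a, b;

    a.zone = 0; a.tuple = t;
    b.zone = 1; b.tuple = t;
    assert(!key_equal(&a, &b));
}
```

Comparing the zone first keeps the common mismatch case cheap; whether the real lookup does this is not stated in the log.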
| * | | | netfilter: nf_conntrack: pass template to l4proto ->error() handler (Patrick McHardy, 2010-02-15)
| | | | | The error handlers might need the template to get the conntrack zone introduced in the next patches to perform a conntrack lookup.
| | | | |
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | netfilter: nf_conntrack: elegantly simplify nf_ct_exp_net() (Alexey Dobriyan, 2010-02-12)
| | | | | Remove #ifdef at nf_ct_exp_net() by using nf_ct_net().
| | | | |
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | netfilter: nf_conntrack_sip: add T.38 FAX support (Patrick McHardy, 2010-02-11)
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | netfilter: nf_nat: support mangling a single TCP packet multiple times (Patrick McHardy, 2010-02-11)
| | | | | nf_nat_mangle_tcp_packet() can currently only handle a single mangling per window because it only maintains two sequence adjustment positions: the one before the last adjustment and the one after.
| | | | |
| | | | | This patch makes sequence number adjustment tracking in nf_nat_mangle_tcp_packet() optional and allows a helper to manually update the offsets after the packet has been fully handled.
| | | | |
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | netfilter: nf_conntrack: show helper and class in /proc/net/nf_conntrack_expect (Patrick McHardy, 2010-02-11)
| | | | | Make the output a bit more informative by showing the helper an expectation belongs to and the expectation class.
| | | | |
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | Merge branch 'master' of /repos/git/net-next-2.6 (Patrick McHardy, 2010-02-10)
| |\ \ \ \
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | netfilter: xtables: add CT target (Patrick McHardy, 2010-02-03)
| | | | | | Add a new target for the raw table, which can be used to specify conntrack parameters for specific connections, for instance the conntrack helper.
| | | | | |
| | | | | | The target attaches a "template" connection tracking entry to the skb, which is used by the conntrack core when initializing a new conntrack.
| | | | | |
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | netfilter: nf_conntrack: support conntrack templates (Patrick McHardy, 2010-02-03)
| | | | | | Support initializing selected parameters of new conntrack entries from a "conntrack template", which is a specially marked conntrack entry attached to the skb.
| | | | | |
| | | | | | Currently the helper and the event delivery masks can be initialized this way.
| | | | | |
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | netfilter: ctnetlink: support selective event delivery (Patrick McHardy, 2010-02-03)
| | | | | | Add two masks for conntrack and expectation events to struct nf_conntrack_ecache and use them to filter events. Their default value is "all events" when the event sysctl is on and "no events" when it is off. A following patch will add specific initializations. Expectation events depend on the ecache struct of their master conntrack.
| | | | | |
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
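A per-entry event mask of this kind can be sketched as a plain bit test. The code below is an illustration under assumptions, not the ctnetlink code: `ecache_sketch`, `default_mask` and `should_deliver` are hypothetical names, and the numeric values of the event bits are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative event numbers; the real values are not given in this log. */
enum ev_sketch { EV_NEW = 0, EV_REPLY = 1, EV_ASSURED = 2, EV_DESTROY = 3 };

/* Hypothetical stand-in for the per-conntrack event cache. */
struct ecache_sketch {
    uint16_t ctmask;   /* one bit per conntrack event that may be delivered */
};

/* Default described above: all events if the sysctl is on, none if it is off. */
static inline uint16_t default_mask(int events_enabled)
{
    return events_enabled ? (uint16_t)0xffff : (uint16_t)0;
}

/* An event is delivered only when its bit is set in the mask. */
static inline int should_deliver(const struct ecache_sketch *e, enum ev_sketch ev)
{
    assert((unsigned int)ev < 16);   /* the mask has 16 bits */
    return (e->ctmask & (1u << ev)) != 0;
}
```

A listener interested only in established connections could, under these assumptions, set just the `EV_ASSURED` and `EV_DESTROY` bits and never see events for half-open entries.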
| * | | | | netfilter: nf_conntrack: split up IPCT_STATUS event (Patrick McHardy, 2010-02-03)
| | | | | | Split up the IPCT_STATUS event into an IPCT_REPLY event, which is generated when the IPS_SEEN_REPLY bit is set, and an IPCT_ASSURED event, which is generated when the IPS_ASSURED bit is set.
| | | | | |
| | | | | | In combination with a following patch to support selective event delivery, this can be used for "sparse" conntrack replication: start replicating the conntrack entry after it reached the ASSURED state and that way it's SYN-flood resistant.
| | | | | |
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | netfilter: ctnetlink: only assign helpers for matching protocols (Patrick McHardy, 2010-02-03)
| | | | | | Make sure not to assign a helper for a different network or transport layer protocol to a connection.
| | | | | |
| | | | | | Additionally change expectation deletion by helper to compare the name directly - there might be multiple helper registrations using the same name, currently one of them is chosen in an unpredictable manner and only those expectations are removed.
| | | | | |
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | IPv6: reassembly: replace magic number with macro definitions (Shan Wei, 2010-01-20)
| | | | | | Use macro to define high/low thresh value, refer to IPV6_FRAG_TIMEOUT.
| | | | | |
| | | | | | Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
| | | | | | Acked-by: David S. Miller <davem@davemloft.net>
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | netfilter: netns: #ifdef ->iptable_security, ->ip6table_security (Alexey Dobriyan, 2010-01-18)
| | | | | | 'security' tables depend on SECURITY, so ifdef them.
| | | | | |
| | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | netfilter: nfnetlink: netns support (Alexey Dobriyan, 2010-01-13)
| | | | | | Make nfnl socket per-netns.
| | | | | |
| | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | IPVS: Allow boot time change of hash size (Catalin(ux) M. BOIE, 2010-01-04)
| | | | | | I was very frustrated about the fact that I have to recompile the kernel to change the hash size. So, I created this patch.
| | | | | |
| | | | | | If IPVS is built-in you can append ip_vs.conn_tab_bits=?? to kernel command line, or, if you built IPVS as modules, you can add options ip_vs conn_tab_bits=??.
| | | | | |
| | | | | | To keep everything backward compatible, you still can select the size at compile time, and that will be used as default.
| | | | | |
| | | | | | It has been about a year since this patch was originally posted and subsequently dropped on the basis of insufficient test data.
| | | | | |
| | | | | | Mark Bergsma has provided the following test results which seem to strongly support the need for larger hash table sizes:
| | | | | |
| | | | | | We do however run into the same problem with the default setting (2^12 = 4096 entries), as most of our LVS balancers handle around a million connections/SLAB entries at any point in time (around 100-150 kpps load). With only 4096 hash table entries this implies that each entry consists of a linked list of 256 connections *on average*.
| | | | | |
| | | | | | To provide some statistics, I did an oprofile run on a 2.6.31 kernel, with both the default 4096 table size, and the same kernel recompiled with IP_VS_CONN_TAB_BITS set to 18 (2^18 = 262144 entries). I built a quick test setup with a part of Wikimedia/Wikipedia's live traffic mirrored by the switch to the test host.
| | | | | |
| | | | | | With the default setting, at ~120 kpps packet load we saw a typical %si CPU usage of around 30-35%, and oprofile reported a hot spot in ip_vs_conn_in_get:
| | | | | |
| | | | | |   samples  %        image name  app name  symbol name
| | | | | |   1719761  42.3741  ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
| | | | | |   302577   7.4554   bnx2        bnx2      /bnx2
| | | | | |   181984   4.4840   vmlinux     vmlinux   __ticket_spin_lock
| | | | | |   128636   3.1695   vmlinux     vmlinux   ip_route_input
| | | | | |   74345    1.8318   ip_vs.ko    ip_vs.ko  ip_vs_conn_out_get
| | | | | |   68482    1.6874   vmlinux     vmlinux   mwait_idle
| | | | | |
| | | | | | After loading the recompiled kernel with 2^18 entries, %si CPU usage dropped in half to around 12-18%, and oprofile looks much healthier, with only 7% spent in ip_vs_conn_in_get:
| | | | | |
| | | | | |   samples  %        image name  app name  symbol name
| | | | | |   265641   14.4616  bnx2        bnx2      /bnx2
| | | | | |   143251   7.7986   vmlinux     vmlinux   __ticket_spin_lock
| | | | | |   140661   7.6576   ip_vs.ko    ip_vs.ko  ip_vs_conn_in_get
| | | | | |   94364    5.1372   vmlinux     vmlinux   mwait_idle
| | | | | |   86267    4.6964   vmlinux     vmlinux   ip_route_input
| | | | | |
| | | | | | [ horms@verge.net.au: trivial up-port and minor style fixes ]
| | | | | | Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
| | | | | | Cc: Mark Bergsma <mark@wikimedia.org>
| | | | | | Signed-off-by: Simon Horman <horms@verge.net.au>
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
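The chain-length figures quoted in this message follow from simple division, which a short sketch can confirm. `table_size` and `avg_chain_len` are hypothetical helpers, not IPVS functions, and "around a million connections" is taken here as 2^20 = 1048576, which is the value that reproduces the quoted average of 256.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of hash buckets for a given conn_tab_bits. */
static inline uint64_t table_size(unsigned int bits)
{
    assert(bits < 64);               /* keep the shift well defined */
    return (uint64_t)1 << bits;
}

/* Hypothetical helper: average connections per bucket, rounded down. */
static inline uint64_t avg_chain_len(uint64_t connections, unsigned int bits)
{
    return connections / table_size(bits);
}
```

Under that assumption, 12 bits gives 4096 buckets with about 256 connections each, while 18 bits gives 262144 buckets with about 4 each, which is consistent with the drop in time spent in ip_vs_conn_in_get.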
* | | | | | net: Fix first line of kernel-doc for a few functions (Ben Hutchings, 2010-02-15)
| | | | | | The function name must be followed by a space, hyphen, space, and a short description.
| | | | | |
| | | | | | Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
| | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Merge branch 'master' of ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 (David S. Miller, 2010-02-14)
|\ \ \ \ \ \
| | |_|/ / /
| |/| | | |
| * | | | | mac80211: Retry null data frame for power save. (Vivek Natarajan, 2010-02-09)
| | | | | | Even if the null data frame is not acked by the AP, mac80211 goes into power save. This might lead to loss of frames from the AP. Prevent this by restarting dynamic_ps_timer when ack is not received for null data frames.
| | | | | |
| | | | | | Cc: Johannes Berg <johannes@sipsolutions.net>
| | | | | | Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com>
| | | | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | | mac80211: remove get_tx_stats() driver op (Kalle Valo, 2010-02-08)
| | | | | | get_tx_stats() driver operation is not currently used anywhere in mac80211 and there are no plans to use it in the not-so-near future. So it can go without anyone missing it.
| | | | | |
| | | | | | Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
| | | | | | Acked-by: Johannes Berg <johannes@sipsolutions.net>
| | | | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | | mac80211: allow station add/remove to sleep (Johannes Berg, 2010-02-08)
| | | | | | Many drivers would like to sleep during station addition and removal, and currently have a high complexity there from not being able to.
| | | | | |
| | | | | | This introduces two new callbacks sta_add() and sta_remove() that drivers can implement instead of using sta_notify() and that can sleep, and the new sta_add() callback is also allowed to fail.
| | | | | |
| | | | | | The reason we didn't do this previously is that the IBSS code wants to insert stations from the RX path, which is a tasklet, so cannot sleep. This patch will keep the station allocation in that path, but moves adding the station to the driver out of line. Since the addition can now fail, we can have IBSS peer structs the driver rejected -- in that case we still talk to the station but never tell the driver about it in the control.sta pointer. If there will ever be a driver that has a low limit on the number of stations and that cannot talk to any stations that are not known to it, we need to come up with a new strategy of handling larger IBSSs, maybe quicker expiry or rejecting peers.
| | | | | |
| | | | | | Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| | | | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | | | wireless: update radiotap parser (Johannes Berg, 2010-02-08)
| | | | | | Upstream radiotap has adopted the namespace proposal David Young made and I then took care of, for which I had adapted the radiotap parser as a library outside the kernel.
| | | | | |
| | | | | | This brings the in-kernel parser up to speed.
| | | | | |
| | | | | | Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| | | | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | | | | | xfrm: use proper kernel types (jamal, 2010-02-12)
| | | | | | kernel side should use uxx instead of __uxx types
| | | | | |
| | | | | | Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
| | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | ipv6: fib: fix crash when changing large fib while dumping it (Patrick McHardy, 2010-02-12)
| | | | | | When the fib size exceeds what can be dumped in a single skb, the dump is suspended and resumed once the last skb has been received by userspace. When the fib is changed while the dump is suspended, the walker might contain stale pointers, causing a crash when the dump is resumed.
| | | | | |
| | | | | |   BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
| | | | | |   IP: [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
| | | | | |   PGD 5347a067 PUD 65c7067 PMD 0
| | | | | |   Oops: 0000 [#1] PREEMPT SMP
| | | | | |   ...
| | | | | |   RIP: 0010:[<ffffffffa01bce04>] [<ffffffffa01bce04>] fib6_walk_continue+0xbb/0x124 [ipv6]
| | | | | |   ...
| | | | | |   Call Trace:
| | | | | |    [<ffffffff8104aca3>] ? mutex_spin_on_owner+0x59/0x71
| | | | | |    [<ffffffffa01bd105>] inet6_dump_fib+0x11b/0x1b9 [ipv6]
| | | | | |    [<ffffffff81371af4>] netlink_dump+0x5b/0x19e
| | | | | |    [<ffffffff8134f288>] ? consume_skb+0x28/0x2a
| | | | | |    [<ffffffff81373b69>] netlink_recvmsg+0x1ab/0x2c6
| | | | | |    [<ffffffff81372781>] ? netlink_unicast+0xfa/0x151
| | | | | |    [<ffffffff813483e0>] __sock_recvmsg+0x6d/0x79
| | | | | |    [<ffffffff81348a53>] sock_recvmsg+0xca/0xe3
| | | | | |    [<ffffffff81066d4b>] ? autoremove_wake_function+0x0/0x38
| | | | | |    [<ffffffff811ed1f8>] ? radix_tree_lookup_slot+0xe/0x10
| | | | | |    [<ffffffff810b3ed7>] ? find_get_page+0x90/0xa5
| | | | | |    [<ffffffff810b5dc5>] ? filemap_fault+0x201/0x34f
| | | | | |    [<ffffffff810ef152>] ? fget_light+0x2f/0xac
| | | | | |    [<ffffffff813519e7>] ? verify_iovec+0x4f/0x94
| | | | | |    [<ffffffff81349a65>] sys_recvmsg+0x14d/0x223
| | | | | |
| | | | | | Store the serial number when beginning to walk the fib and reload pointers when continuing to walk after a change occurred. Similar to other dumping functions, this might cause unrelated entries to be missed when entries are deleted.
| | | | | |
| | | | | | Tested-by: Ben Greear <greearb@candelatech.com>
| | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
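The idea of checking a serial number before trusting a saved pointer can be sketched on a plain sorted list. This is not the fib6 walker: `table`, `walker`, `relocate` and `walk_resume` are hypothetical names, and relocating by the last dumped key is an assumption made for the example, since the message only says that pointers are reloaded.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int          key;
    struct node *next;
};

struct table {
    struct node *first;
    unsigned int sernum;    /* bumped on every insertion or deletion */
};

struct walker {
    struct node *pos;       /* next node to visit; stale after a change */
    int          last_key;  /* key of the last node already dumped */
    unsigned int sernum;    /* table serial seen when the walk was suspended */
};

/* Find the first node after last_key; assumes keys are sorted ascending. */
static inline struct node *relocate(const struct table *t, int last_key)
{
    struct node *n = t->first;

    while (n && n->key <= last_key)
        n = n->next;
    return n;
}

/* Resume a suspended walk, reloading the position if the table changed. */
static inline struct node *walk_resume(const struct table *t, struct walker *w)
{
    assert(t && w);
    if (w->sernum != t->sernum) {
        w->pos = relocate(t, w->last_key);
        w->sernum = t->sernum;
    }
    return w->pos;
}
```

If the node the walker pointed at was deleted while the dump was suspended, the resume step lands on the next surviving node instead of dereferencing freed memory. Entries inserted before the resume point are skipped, which matches the caveat in the message about entries being missed.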
* | | | | | net: add a wrapper sk_entry() (Li Zefan, 2010-02-10)
| |_|/ / /
|/| | | |
| | | | | Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (David S. Miller, 2010-02-09)
|\ \ \ \ \
| | |_|/ /
| |/| | |
| * | | | netfilter: nf_conntrack: fix hash resizing with namespaces (Patrick McHardy, 2010-02-08)
| | | | | As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash size is global and not per namespace, but modifiable at runtime through /sys/module/nf_conntrack/hashsize. Changing the hash size will only resize the hash in the current namespace however, so other namespaces will use an invalid hash size. This can cause crashes when enlarging the hashsize, or false negative lookups when shrinking it.
| | | | |
| | | | | Move the hash size into the per-namespace data and only use the global hash size to initialize the per-namespace value when instantiating a new namespace. Additionally restrict hash resizing to init_net for now as other namespaces are not handled currently.
| | | | |
| | | | | Cc: stable@kernel.org
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
| | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | netfilter: nf_conntrack: per netns nf_conntrack_cachep (Eric Dumazet, 2010-02-08)
| | |_|/
| |/| |
| | | | nf_conntrack_cachep is currently shared by all netns instances, but because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
| | | |
| | | | If we use a shared slab cache, one object can instantly move from one hash table (netns ONE) to another one (netns TWO), and a concurrent reader (doing a lookup in netns ONE, 'finding' an object of netns TWO) can be fooled without notice, because no RCU grace period has to be observed between object freeing and its reuse.
| | | |
| | | | We don't have this problem with UDP/TCP slab caches because TCP/UDP hashtables are global to the machine (and each object has a pointer to its netns).
| | | |
| | | | If we use per netns conntrack hash tables, we also *must* use per netns conntrack slab caches, to guarantee an object can not escape from one namespace to another one.
| | | |
| | | | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| | | | [Patrick: added unique slab name allocation]
| | | | Cc: stable@kernel.org
| | | | Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 (David S. Miller, 2010-02-04)
|\ \ \ \
| | |/ /
| |/| |
| * | | mac80211: fix monitor mode tx radiotap header handling (Felix Fietkau, 2010-02-01)
| | | | When an injected frame gets buffered for a powersave STA or filtered and retransmitted, mac80211 attempts to parse the radiotap header again, which doesn't work because it's gone at that point. This patch adds a new flag for checking the availability of a radiotap header, so that it only attempts to parse it once, reusing the tx info on the next call to ieee80211_tx(). This fixes severe issues with rekeying in AP mode.
| | | |
| | | | Signed-off-by: Felix Fietkau <nbd@openwrt.org>
| | | | Cc: stable@kernel.org
| | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | cfg80211: add regulatory hint disconnect support (Luis R. Rodriguez, 2010-02-01)
| | | | This adds a new regulatory hint to be used when we know all devices have been disconnected and are idle. This can happen when we suspend, for instance. When we disconnect we can no longer assume the same regulatory rules learned from a country IE or beacon hints are applicable so restore regulatory settings to an initial state.
| | | |
| | | | Since driver hints are cached on the wiphy that called the hint, those hints are not reproduced onto cfg80211 as the wiphy will respect its own wiphy->regd regardless.
| | | |
| | | | Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
| | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | mac80211: wait for beacon before enabling powersave (Johannes Berg, 2010-01-26)
| | | | Because DTIM information is required for powersave but is only conveyed in beacons, wait for a beacon before enabling powersave, and change the way the information is conveyed to the driver accordingly.
| | | |
| | | | mwl8k doesn't currently seem to implement PS but requires the DTIM period in a different way; after talking to Lennert we agreed to just have mwl8k do the parsing itself in the finalize_join work.
| | | |
| | | | Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| | | | Acked-by: Lennert Buytenhek <buytenh@marvell.com>
| | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | cfg80211: export cfg80211_find_ie (Johannes Berg, 2010-01-26)
| | | | This new function (previously a static function called just "find_ie") can be used to find a specific IE in a buffer of IEs.
| | | |
| | | | Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | mac80211: fix update_tkip_key() documentation about the context (Kalle Valo, 2010-01-25)
| | | | Johannes noticed that I had incorrectly documented the context of the update_tkip_key() driver operation. It must be atomic because all RX code is run inside an RCU critical section.
| | | |
| | | | Reported-by: Johannes Berg <johannes@sipsolutions.net>
| | | | Signed-off-by: Kalle Valo <kalle.valo@iki.fi>
| | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | cfg80211: export multiple MAC addresses in sysfs (Johannes Berg, 2010-01-22)
| | | | If a device has multiple MAC addresses, userspace will need to know about that. Similarly, if it allows the MAC addresses to vary by a bitmask.
| | | |
| | | | If a driver exports multiple addresses, it is assumed that it will be able to deal with that many different addresses, which need not necessarily match the ones programmed into the device; if a mask is set then the device should deal with addresses within that mask based on an arbitrary "base address".
| | | |
| | | | To test it all and show how it is used, add support to hwsim even though it can't actually deal with addresses different from the default.
| | | |
| | | | Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | mac80211: pass vif and station to update_tkip_key (Johannes Berg, 2010-01-22)
| | | | When a TKIP key is updated, we should pass the station pointer instead of just the address, since drivers can use that to store their own data. We also need to pass the virtual interface pointer.
| | | |
| | | | Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | mac80211: re-enable re-transmission of filtered frames (Johannes Berg, 2010-01-19)
| | | | In an earlier commit,
| | | |
| | | |   mac80211: disable software retry for now
| | | |
| | | |   Pavel Roskin reported a problem that seems to be due to software retry of already transmitted frames. It turns out that we've never done that correctly, but due to some recent changes it now crashes in the TX code. I've added a comment in the patch that explains the problem better and also points to possible solutions -- which I can't implement right now.
| | | |
| | | | I disabled software retry of failed/filtered frames because it was broken. With the work of the previous patches, it now becomes fairly easy to re-enable it by adding a flag indicating that the frame shouldn't be modified, but still running it through the transmit handlers to populate the control information.
| | | |
| | | | Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
| | | | Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | | | sched: add head drop fifo queue (Hagen Paul Pfeifer, 2010-01-29)
| | | | This adds an additional queuing strategy, called pfifo_head_drop, to remove the oldest skb in the case of an overflow within the queue - the head element - instead of the last skb (tail). Removing the oldest skb in congested situations is useful for sensor network environments where newer packets reflect the superior information.
| | | |
| | | | Reviewed-by: Florian Westphal <fw@strlen.de>
| | | | Acked-by: Patrick McHardy <kaber@trash.net>
| | | | Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
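The difference between tail drop and head drop can be shown on a small ring buffer. The sketch below is not the pfifo_head_drop qdisc: `fifo_sketch`, `QLEN` and the three functions are hypothetical, and integer packet ids stand in for skbs.

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 4   /* illustrative queue limit, not a kernel default */

struct fifo_sketch {
    int    pkt[QLEN];  /* packet ids stand in for skbs */
    size_t head;       /* index of the oldest queued packet */
    size_t len;        /* number of packets currently queued */
};

/* Tail drop: a full queue rejects the new packet. Returns 1 if it was dropped. */
static inline int enqueue_tail_drop(struct fifo_sketch *q, int id)
{
    if (q->len == QLEN)
        return 1;
    q->pkt[(q->head + q->len) % QLEN] = id;
    q->len++;
    return 0;
}

/* Head drop: a full queue discards its oldest packet to make room.
 * Returns the id that was dropped, or -1 if nothing was dropped. */
static inline int enqueue_head_drop(struct fifo_sketch *q, int id)
{
    int dropped = -1;

    if (q->len == QLEN) {
        dropped = q->pkt[q->head];
        q->head = (q->head + 1) % QLEN;
        q->len--;
    }
    q->pkt[(q->head + q->len) % QLEN] = id;
    q->len++;
    return dropped;
}

/* Remove and return the oldest packet; the queue must not be empty. */
static inline int dequeue(struct fifo_sketch *q)
{
    int id;

    assert(q->len > 0);
    id = q->pkt[q->head];
    q->head = (q->head + 1) % QLEN;
    q->len--;
    return id;
}
```

With packets 1 to 4 queued and packet 5 arriving, the tail-drop queue still hands out packet 1 first and never sees packet 5, while the head-drop queue discards packet 1 and hands out packet 2 first, so the newest reading always gets through.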
* | | | netns xfrm: xfrm6_tunnel in netns (Alexey Dobriyan, 2010-01-28)
| | | | I'm not sure about rcu stuff near kmem cache destruction:
| | | | * checks for non-empty hashes look bogus, they're done _before_ rcu_barrier()
| | | | * unregistering netns ops is done before kmem_cache destroy (as it should), and unregistering involves rcu barriers by itself
| | | |
| | | | So it looks like nothing should be done.
| | | |
| | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (David S. Miller, 2010-01-28)
|\ \ \ \
| | |/ /
| |/| |
| * | | netns xfrm: deal with dst entries in netns (Alexey Dobriyan, 2010-01-25)
| | | | GC is non-existent in netns, so after you hit GC threshold, no new dst entries will be created until someone triggers cleanup in init_net.
| | | |
| | | | Make xfrm4_dst_ops and xfrm6_dst_ops per-netns. This is not done in a generic way, because it would waste (AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.
| | | |
| | | | Reorder GC threshold initialization so it'd be done before registering XFRM policies.
| | | |
| | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netns xfrm: fix "ip xfrm state|policy count" misreport (Alexey Dobriyan, 2010-01-24)
| | | | "ip xfrm state|policy count" reports the SA/SP count from init_net, not from the netns of the caller process.
| | | |
| | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| | | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>