aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAge
...
| * | tcp: Use SKB queue and list helpers instead of doing it by-hand.David S. Miller2009-05-29
| | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netfilter: nf_ct_icmp: keep the ICMP ct entries longerJan Kasprzak2009-06-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current conntrack code kills the ICMP conntrack entry as soon as the first reply is received. This is incorrect, as we then see only the first ICMP echo reply out of several possible duplicates as ESTABLISHED, while the rest will be INVALID. Also this unnecessarily increases the conntrackd traffic on H-A firewalls. Make all the ICMP conntrack entries (including the replied ones) last for the default of nf_conntrack_icmp{,v6}_timeout seconds. Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | netfilter: ipt_MASQUERADE: remove redundant rwlockFlorian Westphal2009-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lock "protects" an assignment and a comparision of an integer. When the caller of device_cmp() evaluates the result, nat->masq_index may already have been changed (regardless if the lock is there or not). So, the lock either has to be held during nf_ct_iterate_cleanup(), or can be removed. This does the latter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | netfilter: x_tables: added hook number into match extension parameter structure.Evgeniy Polyakov2009-06-04
| | | | | | | | | | | | | | | Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | netfilter: conntrack: simplify event caching systemPablo Neira Ayuso2009-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch simplifies the conntrack event caching system by removing several events: * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted since the have no clients. * IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter days. * IPCT_REFRESH which is not of any use since we always include the timeout in the messages. After this patch, the existing events are: * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify addition and deletion of entries. * IPCT_STATUS, that notes that the status bits have changes, eg. IPS_SEEN_REPLY and IPS_ASSURED. * IPCT_PROTOINFO, that reports that internal protocol information has changed, eg. the TCP, DCCP and SCTP protocol state. * IPCT_HELPER, that a helper has been assigned or unassigned to this entry. * IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this covers the case when a mark is set to zero. * IPCT_NATSEQADJ, to report that there's updates in the NAT sequence adjustment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | Merge branch 'master' of git://dev.medozas.de/linuxPatrick McHardy2009-06-02
|\ \ \ | |/ / |/| |
| * | netfilter: xtables: consolidate comefrom debug cast accessJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: remove another level of indentJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: remove some gotoJan Engelhardt2009-05-08
| | | | | | | | | | | | | | | | | | Combining two ifs, and goto is easily gone. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: reduce indent level by oneJan Engelhardt2009-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | Cosmetic only. Transformation applied: -if (foo) { long block; } else { short block; } +if (!foo) { short block; continue; } long block; Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: consolidate open-coded logicJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: fix const inconsistencyJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: remove redundant castsJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: use NFPROTO_ in standard targetsJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: queue: use NFPROTO_ for queue callsitesJan Engelhardt2009-05-08
| | | | | | | | | | | | | | | | | | af is an nfproto. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: use NFPROTO_ for xt_proto_init callsitesJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
* | | tcp: Do not check flush when comparing options for GROHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to repeatedly check flush when comparing TCP options for GRO as it will be false 99% of the time where it matters. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Use 32-bit loads for ID and length in GROHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch optimises the IPv4 GRO code by using 32-bit loads (instead of 16-bit ones) on the ID and length checks in the receive function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | gro: Avoid unnecessary comparison after skb_gro_headerHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the overwhelming majority of cases, skb_gro_header's return value cannot be NULL. Yet we must check it because of its current form. This patch splits it up into multiple functions in order to avoid this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: Optimise len/mss comparisonHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of checking len > mss || len == 0, we can accomplish both by checking (len - 1) > mss using the unsigned wraparound. At nearly a million times a second, this might just help. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: Remove unnecessary window comparisons for GROHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | The window has already been checked as part of the flag word so there is no need to check it explicitly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: Optimise GRO port comparisonsHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | Instead of doing two 16-bit operations for the source/destination ports, we can do one 32-bit operation to take care both. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'master' of ↵David S. Miller2009-05-25
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath5k/phy.c drivers/net/wireless/iwlwifi/iwl-agn.c drivers/net/wireless/iwlwifi/iwl3945-base.c
| * | ipv4: Fix oops with FIB_TRIERobert Olsson2009-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems we can fix this by disabling preemption while we re-balance the trie. This is with the CONFIG_CLASSIC_RCU. It's been stress-tested at high loads continuesly taking a full BGP table up/down via iproute -batch. Note. fib_trie is not updated for CONFIG_PREEMPT_RCU Reported-by: Andrei Popa Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: fix rtable leak in net/ipv4/route.cEric Dumazet2009-05-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexander V. Lukyanov found a regression in 2.6.29 and made a complete analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339 Quoted here because its a perfect one : begin_of_quotation 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the patch has at least one critical flaw, and another problem. rt_intern_hash calculates rthi pointer, which is later used for new entry insertion. The same loop calculates cand pointer which is used to clean the list. If the pointers are the same, rtable leak occurs, as first the cand is removed then the new entry is appended to it. This leak leads to unregister_netdevice problem (usage count > 0). Another problem of the patch is that it tries to insert the entries in certain order, to facilitate counting of entries distinct by all but QoS parameters. Unfortunately, referencing an existing rtable entry moves it to list beginning, to speed up further lookups, so the carefully built order is destroyed. For the first problem the simplest patch it to set rthi=0 when rthi==cand, but it will also destroy the ordering. end_of_quotation Problematic commit is 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b (net: implement emergency route cache rebulds when gc_elasticity is exceeded) Trying to keep dst_entries ordered is too complex and breaks the fact that order should depend on the frequency of use for garbage collection. A possible fix is to make rt_intern_hash() simpler, and only makes rt_check_expire() a litle bit smarter, being able to cope with an arbitrary entries order. The added loop is running on cache hot data, while cpu is prefetching next object, so should be unnoticied. Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: fix length computation in rt_check_expire()Eric Dumazet2009-05-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rt_check_expire() computes average and standard deviation of chain lengths, but not correclty reset length to 0 at beginning of each chain. This probably gives overflows for sum2 (and sum) on loaded machines instead of meaningful results. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: make default for INET_LRO consistent with help textFrans Pop2009-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e81963b1 ("ipv4: Make INET_LRO a bool instead of tristate.") changed this config from tristate to bool. Add default so that it is consistent with the help text. Signed-off-by: Frans Pop <elendil@planet.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Remove unused parameter from fill method in fib_rules_ops.Rami Rosen2009-05-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The netlink message header (struct nlmsghdr) is an unused parameter in fill method of fib_rules_ops struct. This patch removes this parameter from this method and fixes the places where this method is called. (include/net/fib_rules.h) Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: teach ipconfig about the MTU option in DHCPChris Friesen2009-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DHCP spec allows the server to specify the MTU. This can be useful for netbooting with UDP-based NFS-root on a network using jumbo frames. This patch allows the kernel IP autoconfiguration to handle this option correctly. It would be possible to use initramfs and add a script to set the MTU, but that seems like a complicated solution if no initramfs is otherwise necessary, and would bloat the kernel image more than this code would. This patch was originally submitted to LKML in 2003 by Hans-Peter Jansen. Signed-off-by: Chris Friesen <cfriesen@nortel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Fix devinet_sysctl_forwardEric W. Biederman2009-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysctls are unregistered with the rntl_lock held making it unsafe to unconditionally grab the the rtnl_lock. Instead we need to call rtnl_trylock and restart the system call if we can not grab it. Otherwise we could deadlock at unregistration time. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'master' of ↵David S. Miller2009-05-19
|\| | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/scsi/fcoe/fcoe.c
| * | tcp: fix MSG_PEEK race checkIlpo Järvinen2009-05-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 518a09ef11 (tcp: Fix recvmsg MSG_PEEK influence of blocking behavior) lets the loop run longer than the race check did previously expect, so we need to be more careful with this check and consider the work we have been doing. I tried my best to deal with urg hole madness too which happens here: if (!sock_flag(sk, SOCK_URGINLINE)) { ++*seq; ... by using additional offset by one but I certainly have very little interest in testing that part. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Frans Pop <elendil@planet.nl> Tested-by: Ian Zimmermann <itz@buug.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipconfig: handle case of delayed DHCP serverChris Friesen2009-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a DHCP server is delayed, it's possible for the client to receive the DHCPOFFER after it has already sent out a new DHCPDISCOVER message from a second interface. The client then sends out a DHCPREQUEST from the second interface, but the server doesn't recognize the device and rejects the request. This patch simply tracks the current device being configured and throws away the OFFER if it is not intended for the current device. A more sophisticated approach would be to put the OFFER information into the struct ic_device rather than storing it globally. Signed-off-by: Chris Friesen <cfriesen@nortel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: Make INET_LRO a bool instead of tristate.David S. Miller2009-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This code is used as a library by several device drivers, which select INET_LRO. If some are modules and some are statically built into the kernel, we get build failures if INET_LRO is modular. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: cleanup: remove unnecessary include.Rami Rosen2009-05-18
| | | | | | | | | | | | | | | | | | | | | | | | There is no need for net/icmp.h header in net/ipv4/fib_frontend.c. This patch removes the #include net/icmp.h from it. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: cleanup - remove two unused parameters from fib_semantic_match().Rami Rosen2009-05-18
| | | | | | | | | | | | | | | Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: remove an unused parameter from configure method of fib_rules_ops.Rami Rosen2009-05-17
| | | | | | | | | | | | | | | Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'master' of ↵David S. Miller2009-05-08
|\| | | |/ |/| | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: include/net/tcp.h
| * tcp: Fix tcp_prequeue() to get correct rto_min valueSatoru SATOH2009-05-04
| | | | | | | | | | | | | | | | | | tcp_prequeue() refers to the constant value (TCP_RTO_MIN) regardless of the actual value might be tuned. The following patches fix this and make tcp_prequeue get the actual value returns from tcp_rto_min(). Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Make inet_twsk_put similar to sock_putArnaldo Carvalho de Melo2009-05-06
| | | | | | | | | | | | | | | | | | | | By separating the freeing code from the refcounting decrementing. Probably reducing icache pressure when we still have reference counts to go. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp:fix the code indentShan Wei2009-05-05
| | | | | | | | | | Signed-off-by: Shan Wei<shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: extend ECN sysctl to allow server-side only ECNIlpo Järvinen2009-05-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should be very safe compared with full enabled, so I see no reason why it shouldn't be done right away. As ECN can only be negotiated if the SYN sending party is also supporting it, somebody in the loop probably knows what he/she is doing. If SYN does not ask for ECN, the server side SYN-ACK is identical to what it is without ECN. Thus it's quite safe. The chosen value is safe w.r.t to existing configs which choose to currently set manually either 0 or 1 but silently upgrades those who have not explicitly requested ECN off. Whether to just enable both sides comes up time to time but unless that gets done now we can at least make the servers aware of ECN already. As there are some known problems to occur if ECN is enabled, it's currently questionable whether there's any real gain from enabling clients as servers mostly won't support it anyway (so we'd hit just the negative sides). After enabling the servers and getting that deployed, the client end enable really has some potential gain too. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2009-04-29
|\| | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/isdn/00-INDEX drivers/net/wireless/iwlwifi/iwl-scan.c drivers/net/wireless/rndis_wlan.c net/mac80211/main.c
| * netfilter: revised locking for x_tablesStephen Hemminger2009-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The x_tables are organized with a table structure and a per-cpu copies of the counters and rules. On older kernels there was a reader/writer lock per table which was a performance bottleneck. In 2.6.30-rc, this was converted to use RCU and the counters/rules which solved the performance problems for do_table but made replacing rules much slower because of the necessary RCU grace period. This version uses a per-cpu set of spinlocks and counters to allow to table processing to proceed without the cache thrashing of a global reader lock and keeps the same performance for table updates. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: Limit size of route cache hash tableAnton Blanchard2009-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now we have no upper limit on the size of the route cache hash table. On a 128GB POWER6 box it ends up as 32MB: IP route cache hash table entries: 4194304 (order: 9, 33554432 bytes) It would be nice to cap this for memory consumption reasons, but a massive hashtable also causes a significant spike when measuring OS jitter. With a 32MB hashtable and 4 million entries, rt_worker_func is taking 5 ms to complete. On another system with more memory it's taking 14 ms. Even though rt_worker_func does call cond_sched() to limit its impact, in an HPC environment we want to keep all sources of OS jitter to a minimum. With the patch applied we limit the number of entries to 512k which can still be overriden by using the rt_entries boot option: IP route cache hash table entries: 524288 (order: 6, 4194304 bytes) With this patch rt_worker_func now takes 0.460 ms on the same system. Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet_diag: Remove dup assignmentsArnaldo Carvalho de Melo2009-04-28
| | | | | | | | | | | | | | These are later assigned to other values without being used meanwhile. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | gro: Fix COMPLETE checksum handlingHerbert Xu2009-04-27
| | | | | | | | | | | | | | | | | | On a brand new GRO skb, we cannot call ip_hdr since the header may lie in the non-linear area. This patch adds the helper skb_gro_network_header to handle this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | snmp: add missing counters for RFC 4293Neil Horman2009-04-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and OutMcastOctets: http://tools.ietf.org/html/rfc4293 But it seems we don't track those in any way that easy to separate from other protocols. This patch adds those missing counters to the stats file. Tested successfully by me With help from Eric Dumazet. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2009-04-21
|\| | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/core/dev.c
| * tcp: fix mid-wq adjustment helperIlpo Järvinen2009-04-20
| | | | | | | | | | | | | | | | | | | | Just noticed while doing some new work that the recent mid-wq adjustment logic will misbehave when FACK is not in use (happens either due sysctl'ed off or auto-detected reordering) because I forgot the relevant TCPCB tagbit. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>