aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAge
* [NETNS]: Add namespace parameter to ip_route_output_key.Denis V. Lunev2008-01-28
| | | | | | | Needed to propagate it down to the ip_route_output_flow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_route_output_flow.Denis V. Lunev2008-01-28
| | | | | | | Needed to propagate it down to the __ip_route_output_key. Signed_off_by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to __ip_route_output_key.Denis V. Lunev2008-01-28
| | | | | | | | This is only required to propagate it down to the ip_route_output_slow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_route_output_slow.Denis V. Lunev2008-01-28
| | | | | | | | This function needs a net namespace to lookup devices, fib tables, etc. in, so pass it there. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_dev_find.Denis V. Lunev2008-01-28
| | | | | | | | in_dev_find() need a namespace to pass it to fib_get_table(), so add an argument. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add netns parameter to fib_select_default.Denis V. Lunev2008-01-28
| | | | | | | | | Currently fib_select_default calls fib_get_table() with the init_net. Prepare it to provide a correct namespace to lookup default route. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Consolidate fib_select_default.Denis V. Lunev2008-01-28
| | | | | | | | | The difference in the implementation of the fib_select_default when CONFIG_IP_MULTIPLE_TABLES is (not) defined looks negligible. Consolidate it and place into fib_frontend.c. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: avoid rescan on dumpStephen Hemminger2008-01-28
| | | | | | | | | | | | | | | This converts dumping (and flushing) of large route tables form O(N^2) to O(N). If the route dump took multiple pages then the dump routine gets called again. The old code kept track of location by counter, the new code instead uses the last key. This is a really big win ( 0.3 sec vs 12 sec) for big route tables. One side effect is that if the table changes during the dump, then the last key will not be found, and we will return -EBUSY. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: avoid extra search on deleteStephen Hemminger2008-01-28
| | | | | | | Get rid of extra search that made route deletion O(n). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: dump table in sorted orderStephen Hemminger2008-01-28
| | | | | | | | | It is easier with TRIE to dump the data traversal rather than interating over every possible prefix. This saves some time and makes the dump come out in sorted order. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: iterator recodeStephen Hemminger2008-01-28
| | | | | | | | | Remove the complex loop structure of nextleaf() and replace it with a simpler tree walker. This improves the performance and is much cleaner. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: dump message multiple part flagStephen Hemminger2008-01-28
| | | | | | | Match fib_hash, and set NLM_F_MULTI to handle multiple part messages. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: use hash listStephen Hemminger2008-01-28
| | | | | | | | The code to dump can use the existing hash chain rather than doing repeated lookup. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: compute size when neededStephen Hemminger2008-01-28
| | | | | | | Compute the number of prefixes when needed, rather than doing bookeeping. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: style cleanupStephen Hemminger2008-01-28
| | | | | | | | | | | Style cleanups: * make check_leaf return -1 or plen, rather than by reference * Get rid of #ifdef that is always set * split out embedded function calls in if statements. * checkpatch warnings Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib_trie: put leaf nodes in a slab cacheStephen Hemminger2008-01-28
| | | | | | | | This improves locality for operations that touch all the leaves. Save space since these entries don't need to be hardware cache aligned. Signed-off-by: Stephen Hemminger <stephen.hemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [DST]: shrinks sizeof(struct rtable) by 64 bytes on x86_64Eric Dumazet2008-01-28
| | | | | | | | | | | | | | | | | | | On x86_64, sizeof(struct rtable) is 0x148, which is rounded up to 0x180 bytes by SLAB allocator. We can reduce this to exactly 0x140 bytes, without alignment overhead, and store 12 struct rtable per PAGE instead of 10. rate_tokens is currently defined as an "unsigned long", while its content should not exceed 6*HZ. It can safely be converted to an unsigned int. Moving tclassid right after rate_tokens to fill the 4 bytes hole permits to save 8 bytes on 'struct dst_entry', which finally permits to save 8 bytes on 'struct rtable' Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the pernet subsystem for fragments.Pavel Emelyanov2008-01-28
| | | | | | | | | | On namespace start we mainly prepare the ctl variables. When the namespace is stopped we have to kill all the fragments that point to this namespace. The inet_frags_exit_net() handles it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the LRU list per namespace.Pavel Emelyanov2008-01-28
| | | | | | | | | | | | | The inet_frags.lru_list is used for evicting only, so we have to make it per-namespace, to evict only those fragments, who's namespace exceeded its high threshold, but not the whole hash. Besides, this helps to avoid long loops in evictor. The spinlock is not per-namespace because it protects the hash table as well, which is global. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Isolate the secret interval from namespaces.Pavel Emelyanov2008-01-28
| | | | | | | | | | | | Since we have one hashtable to lookup the fragment, having different secret_interval-s for hash rebuild doesn't make sense, so move this one to inet_frags. The inet_frags_ctl becomes empty after this, so remove it. The appropriate ctl table is kept read-only in namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make thresholds work in namespaces.Pavel Emelyanov2008-01-28
| | | | | | | | | | | This is the same as with the timeout variable. Currently, after exceeding the high threshold _all_ the fragments are evicted, but it will be fixed in later patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the net.ipv4.ipfrag_timeout work in namespaces.Pavel Emelyanov2008-01-28
| | | | | | | | | | | Move it to the netns_frags, adjust the usage and make the appropriate ctl table writable. Now fragment, that live in different namespaces can live for different times. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Duplicate sysctl tables for new namespaces.Pavel Emelyanov2008-01-28
| | | | | | | | | | | | Each namespace has to have own tables to tune their different parameters, so duplicate the tables and register them. All the tables in sub-namespaces are temporarily made read-only. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the mem counter per-namespace.Pavel Emelyanov2008-01-28
| | | | | | | | This is also simple, but introduces more changes, since then mem counter is altered in more places. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the nqueues counter per-namespace.Pavel Emelyanov2008-01-28
| | | | | | | | This is simple - just move the variable from struct inet_frags to struct netns_frags and adjust the usage appropriately. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Make the inet_frag_queue lookup work in namespaces.Pavel Emelyanov2008-01-28
| | | | | | | | | | | | | | | | Since fragment management code is consolidated, we cannot have the pointer from inet_frag_queue to struct net, since we must know what king of fragment this is. So, I introduce the netns_frags structure. This one is currently empty, but will be eventually filled with per-namespace attributes. Each inet_frag_queue is tagged with this one. The conntrack_reasm is not "netns-izated", so it has one static netns_frags instance to keep working in init namespace. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][FRAGS]: Move ctl tables around.Pavel Emelyanov2008-01-28
| | | | | | | | | | | | | | | | This is a preparation for sysctl netns-ization. Move the ctl tables to the files, where the tuning variables reside. Plus make the helpers to register the tables. This will simplify the later patches and will keep similar things closer to each other. ipv4, ipv6 and conntrack_reasm are patched differently, but the result is all the tables are in appropriate files. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] UDP,UDPLITE: Sparse: {__udp4_lib,udp,udplite}_err() are of void.YOSHIFUJI Hideaki2008-01-28
| | | | | | | | Fix following sparse warnings: | net/ipv4/udp.c:421:2: warning: returning void-valued expression | net/ipv4/udplite.c:38:2: warning: returning void-valued expression Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [NETNS]: Pass correct namespace in ip_rt_get_source.Denis V. Lunev2008-01-28
| | | | | | | | | ip_rt_get_source is the infamous place for which dst_ifdown kludges have been implemented. This means that rt->u.dst.dev can be safely dereferrenced obtain nd_net. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Pass correct namespace in ip_route_input_slow.Denis V. Lunev2008-01-28
| | | | | | | | The packet on the input path always has a referrence to an input network device it is passed from. Extract network namespace from it. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Pass correct namespace in context fib_check_nh.Denis V. Lunev2008-01-28
| | | | | | | | | Correct network namespace is already used in fib_check_nh. Re-work its usage for better readability and pass into fib_lookup & inetdev_by_index. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Pass correct namespace in fib_validate_source.Denis V. Lunev2008-01-28
| | | | | | | | | Correct network namespace is available inside fib_validate_source. It can be obtained from the device passed in. The device is not NULL as in_device is obtained from it just above. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add netns parameter to inetdev_by_index.Denis V. Lunev2008-01-28
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add netns parameter to fib_lookup.Denis V. Lunev2008-01-28
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: ipmr sparse warningsStephen Hemminger2008-01-28
| | | | | | | Get rid of some of the sparse warnings. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: igmp sparse warningsStephen Hemminger2008-01-28
| | | | | | | | Partial sparse warning fix. The other conditional locking is too much for sparse to handle. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Enable use of 240/4 address space.Jan Engelhardt2008-01-28
| | | | | | | | | | This short patch modifies the IPv4 networking to enable use of the 240.0.0.0/4 (aka "class-E") address space as propsed in the internet draft draft-fuller-240space-00.txt. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Process FIB rule action in the context of the namespace.Denis V. Lunev2008-01-28
| | | | | | | | | Save namespace context on the fib rule at the rule creation time and call routing lookup in the correct namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: FIB rules API cleanup.Denis V. Lunev2008-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove struct net from fib_rules_register(unregister)/notify_change paths and diet code size a bit. add/remove: 0/0 grow/shrink: 10/12 up/down: 35/-100 (-65) function old new delta notify_rule_change 273 280 +7 trie_show_stats 471 475 +4 fn_trie_delete 473 477 +4 fib_rules_unregister 144 148 +4 fib4_rule_compare 119 123 +4 resize 2842 2845 +3 fn_trie_select_default 515 518 +3 inet_sk_rebuild_header 836 838 +2 fib_trie_seq_show 764 766 +2 __devinet_sysctl_register 276 278 +2 fn_trie_lookup 1124 1123 -1 ip_fib_check_default 133 131 -2 devinet_conf_sysctl 223 221 -2 snmp_fold_field 126 123 -3 fn_trie_insert 2091 2086 -5 inet_create 876 870 -6 fib4_rules_init 197 191 -6 fib_sync_down 452 444 -8 inet_gso_send_check 334 325 -9 fib_create_info 3003 2991 -12 fib_nl_delrule 568 553 -15 fib_nl_newrule 883 852 -31 Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [FIB]: Add netns to fib_rules_ops.Denis V. Lunev2008-01-28
| | | | | | | | | The backward link from FIB rules operations to the network namespace will allow to simplify the API a bit. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Namespace stop vs 'ip r l' race.Denis V. Lunev2008-01-28
| | | | | | | | | | | | | | | | | | | | During network namespace stop process kernel side netlink sockets belonging to a namespace should be closed. They should not prevent namespace to stop, so they do not increment namespace usage counter. Though this counter will be put during last sock_put. The raplacement of the correct netns for init_ns solves the problem only partial as socket to be stoped until proper stop is a valid netlink kernel socket and can be looked up by the user processes. This is not a problem until it resides in initial namespace (no processes inside this net), but this is not true for init_net. So, hold the referrence for a socket, remove it from lookup tables and only after that change namespace and perform a last put. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Consolidate kernel netlink socket destruction.Denis V. Lunev2008-01-28
| | | | | | | | | | Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Memory leak on network namespace stop.Denis V. Lunev2008-01-28
| | | | | | | | | | Network namespace allocates 2 kernel netlink sockets, fibnl & rtnl. These sockets should be disposed properly, i.e. by sock_release. Plain sock_put is not enough. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS][DST] dst: pass the dst_ops as parameter to the gc functionsDaniel Lezcano2008-01-28
| | | | | | | | | | | | | | The garbage collection function receive the dst_ops structure as parameter. This is useful for the next incoming patchset because it will need the dst_ops (there will be several instances) and the network namespace pointer (contained in the dst_ops). The protocols which do not take care of the namespaces will not be impacted by this change (expect for the function signature), they do just ignore the parameter. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] FIB_HASH: Reduce memory needs and speedup lookupsEric Dumazet2008-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, sizeof(struct fib_alias) is 24 or 48 bytes on 32/64 bits arches. Because of SLAB_HWCACHE_ALIGN requirement, these are rounded to 32 and 64 bytes respectively. This patch moves rcu to the end of fib_alias, and conditionally defines it only for CONFIG_IP_FIB_TRIE. We also remove SLAB_HWCACHE_ALIGN requirement for fib_alias and fib_node objects because it is not necessary. (BTW SLUB currently denies it for objects smaller than cache_line_size() / 2, but not SLAB) Finally, sizeof(fib_alias) go back to 16 and 32 bytes. Then, we can embed one fib_alias on each fib_node, to favor locality. Most of the time access to the fib_alias will be free because one cache line contains both the list head (fn_alias) and (one of) the list element. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [FIB]: Fix rcu_dereference() abuses in fib_trie.cEric Dumazet2008-01-28
| | | | | | | | | | | | | | | | | | | | | | node_parent() and tnode_get_child() currently use rcu_dereference(). These functions are called from both - readers only paths (where rcu_dereference() is needed), and - writer path (where rcu_dereference() is not needed) To make explicit where rcu_dereference() is really needed, I introduced new node_parent_rcu() and tnode_get_child_rcu() functions which use rcu_dereference(), while node_parent() and tnode_get_child() dont use it. Then I changed calling sites where rcu_dereference() was really needed to call the _rcu() variants. This should have no impact but for alpha architecture, and may help future sparse checks. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_conntrack: make print_conntrack function optional for l4protosPatrick McHardy2008-01-28
| | | | | | | Allows to remove five empty implementations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_conntrack: remove print_conntrack function from l3protosPatrick McHardy2008-01-28
| | | | | | | Its unused and unlikely to ever be used. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: kill nf_sysctl.cPatrick McHardy2008-01-28
| | | | | | | | | Since there now is generic support for shared sysctl paths, the only remains are the net/netfilter and net/ipv4/netfilter paths. Move them to net/netfilter/core.c and net/ipv4/netfilter.c and kill nf_sysctl.c. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ipt_REJECT: properly handle IP optionsDenys Vlasenko2008-01-28
| | | | | | | | | The current TCP RST construction reuses the old packet and can't deal with IP options as a consequence of that. Construct the RST from scratch instead. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>