path: root/net/ipv4/ipmr.c
Commit message (author, date)
* ipmr: use goto to common label instead of opencoding (Ilpo Järvinen, 2009-02-07)
| Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: ipmr: enable namespace support in ipv4 multicast routing code (Benjamin Thery, 2009-01-22)
| This last patch makes the appropriate changes to use and propagate the
| network namespace where needed in IPv4 multicast routing code.
|
| This consists mainly in replacing all the remaining init_net occurrences
| with the current netns pointer retrieved from sockets, net devices or
| mfc_caches, depending on the routines' contexts.
|
| Some routines receive a new 'struct net' parameter to propagate the
| current netns:
| * vif_add/vif_delete
| * ipmr_new_tunnel
| * mroute_clean_tables
| * ipmr_cache_find
| * ipmr_cache_report
| * ipmr_cache_unresolved
| * ipmr_mfc_add/ipmr_mfc_delete
| * ipmr_get_route
| * rt_fill_info (in route.c)
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: ipmr: declare ipmr /proc/net entries per-namespace (Benjamin Thery, 2009-01-22)
| Declare IPv4 multicast forwarding /proc/net entries per-namespace:
|   /proc/net/ip_mr_vif
|   /proc/net/ip_mr_cache
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: ipmr: declare reg_vif_num per-namespace (Benjamin Thery, 2009-01-22)
| Preliminary work to make IPv4 multicast routing netns-aware.
|
| Declare variable 'reg_vif_num' per-namespace, move into struct netns_ipv4.
|
| At the moment, this variable is only referenced in init_net.
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: ipmr: declare mroute_do_assert and mroute_do_pim per-namespace (Benjamin Thery, 2009-01-22)
| Preliminary work to make IPv4 multicast routing netns-aware.
|
| Declare IPv4 multicast routing variables 'mroute_do_assert' and
| 'mroute_do_pim' per-namespace in struct netns_ipv4.
|
| At the moment, these variables are only referenced in init_net.
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: ipmr: declare counter cache_resolve_queue_len per-namespace (Benjamin Thery, 2009-01-22)
| Preliminary work to make IPv4 multicast routing netns-aware.
|
| Declare variable cache_resolve_queue_len per-namespace: move it into
| struct netns_ipv4.
|
| This variable counts the number of unresolved cache entries queued in the
| list mfc_unres_queue. This list is kept global to all netns as the number
| of entries per namespace is limited to 10 (hardcoded in routine
| ipmr_cache_unresolved).
| Entries belonging to different namespaces in mfc_unres_queue will be
| identified by matching the mfc_net member introduced previously in
| struct mfc_cache.
|
| Keeping this list global to all netns also allows us to keep a single
| timer (ipmr_expire_timer) to handle their expiration.
| In some places cache_resolve_queue_len value was tested for arming
| or deleting the timer. These tests were equivalent to testing
| mfc_unres_queue value instead and are replaced in this patch.
|
| At the moment, cache_resolve_queue_len is only referenced in init_net.
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
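The design described in this commit message (one queue shared by all namespaces, a per-namespace counter capped at 10, and a timer armed on queue emptiness) can be sketched in plain C. This is a simplified illustration, not the kernel code: `struct ns`, `struct unres_entry`, `unres_enqueue`, `unres_count` and `timer_needed` are hypothetical names standing in for `struct net`, `struct mfc_cache` and the ipmr routines.

```c
#include <stddef.h>

#define MAX_UNRES_PER_NS 10 /* the limit quoted in the commit message */

/* Hypothetical stand-in for the per-namespace state (struct netns_ipv4). */
struct ns {
    int cache_resolve_queue_len;
};

/* Hypothetical stand-in for an unresolved struct mfc_cache entry. */
struct unres_entry {
    struct ns *owner;           /* plays the role of mfc_net */
    struct unres_entry *next;
};

/* One queue shared by every namespace, like mfc_unres_queue. */
static struct unres_entry *unres_queue;

/* Queue an entry unless its namespace already holds the maximum.
 * Returns 0 on success, -1 when the per-namespace limit is reached. */
static int unres_enqueue(struct unres_entry *e, struct ns *owner)
{
    if (owner->cache_resolve_queue_len >= MAX_UNRES_PER_NS)
        return -1;
    e->owner = owner;
    e->next = unres_queue;
    unres_queue = e;
    owner->cache_resolve_queue_len++;
    return 0;
}

/* Count the queued entries of one namespace by matching the owner field. */
static int unres_count(const struct ns *owner)
{
    int n = 0;

    for (const struct unres_entry *e = unres_queue; e; e = e->next)
        if (e->owner == owner)
            n++;
    return n;
}

/* The single expiry timer only depends on whether the shared queue is empty. */
static int timer_needed(void)
{
    return unres_queue != NULL;
}
```

Because the timer decision looks at the shared queue rather than at any one counter, it stays correct no matter which namespace owns the queued entries.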
* netns: ipmr: dynamically allocate mfc_cache_array (Benjamin Thery, 2009-01-22)
| Preliminary work to make IPv4 multicast routing netns-aware.
|
| Dynamically allocate IPv4 multicast forwarding cache, mfc_cache_array,
| and move it to struct netns_ipv4.
|
| At the moment, mfc_cache_array is only referenced in init_net.
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: ipmr: store netns in struct mfc_cache (Benjamin Thery, 2009-01-22)
| This patch stores into struct mfc_cache the network namespace each
| mfc_cache belongs to. The new member is mfc_net.
|
| mfc_net is assigned at cache allocation and doesn't change during
| the rest of the cache entry life.
|
| A new net parameter is added to ipmr_cache_alloc/ipmr_cache_alloc_unres.
|
| This will help to retrieve the current netns around the IPv4 multicast
| routing code.
|
| At the moment, all mfc_cache are allocated in init_net.
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: ipmr: dynamically allocate vif_table (Benjamin Thery, 2009-01-22)
| Preliminary work to make IPv4 multicast routing netns-aware.
|
| Dynamically allocate interface table vif_table and move it to
| struct netns_ipv4, and update MIF_EXISTS() macro.
|
| At the moment, vif_table is only referenced in init_net.
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: ipmr: allocate mroute_socket per-namespace. (Benjamin Thery, 2009-01-22)
| Preliminary work to make IPv4 multicast routing netns-aware.
|
| Make IPv4 multicast routing mroute_socket per-namespace and
| move it into struct netns_ipv4.
|
| At the moment, mroute_socket is only referenced in init_net.
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* ipmr: merge common code (Ilpo Järvinen, 2008-12-16)
| Also removes redundant skb->len < x check which can't be true once
| pskb_may_pull(skb, x) succeeded.
|
| $ diff-funcs pim_rcv ipmr.c ipmr.c pim_rcv_v1
| --- ipmr.c:pim_rcv()
| +++ ipmr.c:pim_rcv_v1()
| @@ -1,22 +1,27 @@
| -static int pim_rcv(struct sk_buff * skb)
| +int pim_rcv_v1(struct sk_buff * skb)
|  {
| -        struct pimreghdr *pim;
| +        struct igmphdr *pim;
|          struct iphdr *encap;
|          struct net_device *reg_dev = NULL;
|
|          if (!pskb_may_pull(skb, sizeof(*pim) + sizeof(*encap)))
|                  goto drop;
|
| -        pim = (struct pimreghdr *)skb_transport_header(skb);
| -        if (pim->type != ((PIM_VERSION<<4)|(PIM_REGISTER)) ||
| -            (pim->flags&PIM_NULL_REGISTER) ||
| -            (ip_compute_csum((void *)pim, sizeof(*pim)) != 0 &&
| -             csum_fold(skb_checksum(skb, 0, skb->len, 0))))
| +        pim = igmp_hdr(skb);
| +
| +        if (!mroute_do_pim ||
| +            skb->len < sizeof(*pim) + sizeof(*encap) ||
| +            pim->group != PIM_V1_VERSION || pim->code != PIM_V1_REGISTER)
|                  goto drop;
|
| -        /* check if the inner packet is destined to mcast group */
|          encap = (struct iphdr *)(skb_transport_header(skb) +
| -                                 sizeof(struct pimreghdr));
| +                                 sizeof(struct igmphdr));
| +        /*
| +           Check that:
| +           a. packet is really destinted to a multicast group
| +           b. packet is not a NULL-REGISTER
| +           c. packet is not truncated
| +         */
|          if (!ipv4_is_multicast(encap->daddr) ||
|              encap->tot_len == 0 ||
|              ntohs(encap->tot_len) + sizeof(*pim) > skb->len)
| @@ -40,9 +45,9 @@
|          skb->ip_summed = 0;
|          skb->pkt_type = PACKET_HOST;
|          dst_release(skb->dst);
| +        skb->dst = NULL;
|          reg_dev->stats.rx_bytes += skb->len;
|          reg_dev->stats.rx_packets++;
| -        skb->dst = NULL;
|          nf_reset(skb);
|          netif_rx(skb);
|          dev_put(reg_dev);
|
| $ codiff net/ipv4/ipmr.o.old net/ipv4/ipmr.o.new
| net/ipv4/ipmr.c:
|   pim_rcv_v1 | -283
|   pim_rcv    | -284
|  2 functions changed, 567 bytes removed
|
| net/ipv4/ipmr.c:
|   __pim_rcv | +307
|  1 function changed, 307 bytes added
|
| net/ipv4/ipmr.o.new:
|  3 functions changed, 307 bytes added, 567 bytes removed, diff: -260
|
| (Tested on x86_64).
|
| It seems that pimlen arg could be left out as well and
| eq-sizedness of structs trapped with BUILD_BUG_ON but
| I don't think that's more than a cosmetic flaw since there
| aren't that many args anyway.
|
| Compile tested.
|
| Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
| Signed-off-by: David S. Miller <davem@davemloft.net>
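The closing remark of this commit message suggests trapping the equal size of the two header structs at build time so that the length argument could be dropped. A minimal userspace sketch of that idea follows, using C11 `_Static_assert` as a stand-in for the kernel's `BUILD_BUG_ON`. The structs `hdr_a` and `hdr_b` and the helper `inner_packet` are hypothetical and only mimic two 8-byte headers; they are not the real `pimreghdr` or `igmphdr` definitions.

```c
#include <stdint.h>

/* Hypothetical 8-byte headers standing in for the two header types. */
struct hdr_a { uint8_t type; uint8_t reserved; uint16_t csum; uint32_t flags; };
struct hdr_b { uint8_t type; uint8_t code;     uint16_t csum; uint32_t group; };

/* Build fails here if the sizes ever diverge (the BUILD_BUG_ON idea). */
_Static_assert(sizeof(struct hdr_a) == sizeof(struct hdr_b),
               "header structs must have the same size");

/* With the sizes pinned at build time, a shared helper needs no length
 * argument: either header type gives the same offset to the inner packet. */
static const uint8_t *inner_packet(const uint8_t *transport_header)
{
    return transport_header + sizeof(struct hdr_a);
}
```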
* net: /proc/net/ip_mr_cache, display Iif as a signed short (Benjamin Thery, 2008-12-04)
| Today, iproute2 fails to show multicast forwarding unresolved cache
| entries while scanning /proc/net/ip_mr_cache. Indeed, it expects to see
| -1 in 'Iif' column to identify unresolved entries but the kernel
| outputs 65535.
|
| It's a signed/unsigned issue: 'Iif', the source interface, is retrieved
| from member mfc_parent in struct mfc_cache. mfc_parent is a vifi_t:
| unsigned short, but is displayed in ipmr_mfc_seq_show() as "%-3d",
| signed integer.
|
| In unresolved entries, the 65535 value (0xFFFF) comes from this define:
| #define ALL_VIFS ((vifi_t)(-1))
|
| That may explain why the guy who added support for this in iproute2
| thought a -1 should be expected.
|
| I don't know if this must be fixed in kernel or in iproute2. Who is
| right? What is the correct API? How was it designed originally?
|
| I let you decide if it should go in the kernel or be fixed in iproute2.
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
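The signed/unsigned mismatch described above is easy to reproduce in userspace. The sketch below reuses the `vifi_t` typedef and the `ALL_VIFS` define quoted in the commit message; the two formatting helpers are hypothetical, and the `(short)` cast with `%-3hd` is one plausible way to get -1, not necessarily the exact change made in the kernel.

```c
#include <stdio.h>

typedef unsigned short vifi_t;
#define ALL_VIFS ((vifi_t)(-1))

/* Old behaviour: the unsigned short is promoted to int, so 0xFFFF is
 * printed as 65535. */
static int format_iif_unsigned(char *buf, size_t len, vifi_t parent)
{
    return snprintf(buf, len, "%-3d", parent);
}

/* Signed display: convert to short first so 0xFFFF is printed as -1.
 * The conversion is implementation-defined in C, but yields -1 on the
 * usual two's-complement targets. */
static int format_iif_signed(char *buf, size_t len, vifi_t parent)
{
    return snprintf(buf, len, "%-3hd", (short)parent);
}
```

Calling both helpers with `ALL_VIFS` gives "65535" from the first and "-1 " from the second, which is the marker the commit message says iproute2 expects for unresolved entries.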
* net: fix /proc/net/ip_mr_cache display - V2 (Benjamin Thery, 2008-12-04)
| /proc/net/ip_mr_cache and /proc/net/ip6_mr_cache display garbage when
| showing unresolved mfc_cache entries.
|
| [root@qemu tests]# cat /proc/net/ip_mr_cache
| Group    Origin   Iif     Pkts    Bytes    Wrong Oifs
| 014C00EF 010014AC 1         10    10050        0  2:1    3:1
| 024C00EF 010014AC 65535    514        2 -559067475
|
| The first line is correct. It is a resolved cache entry, 10 packets used
| it...
| The second line represents an unresolved entry, and the columns Pkts(4th),
| Bytes(5th) and Wrong(6th) just show garbage.
|
| In struct mfc_cache, there's a union to store data for resolved and
| unresolved cases. And what ipmr_mfc_seq_show() is printing in these
| columns for the unresolved entries is some bytes from mfc_cache.mfc_un.res.
|
| Bad.
| (eg. In our case -559067475 is in fact 0xdead4ead which is the spinlock
| magic from mfc_cache.mfc_un.unres.unresolved.lock.magic).
|
| This patch replaces the garbage data written in these columns for the
| unresolved entries by '0' (zeros) which is more correct.
| This change doesn't break the ABI.
|
| Also, mfc->mfc_un.res.pkt, mfc->mfc_un.res.bytes, mfc->mfc_un.res.wrong_if
| are unsigned long.
|
| It applies on top of net-next-2.6.
|
| The patch for net-2.6 is slightly different because of the NIP6_FMT to
| %pI6 conversion that was made in the seq_printf.
|
| Changelog:
| ==========
| V2:
| * Instead of breaking the ABI by suppressing the columns that have no
|   meaning for unresolved entries, fill them with 0 values.
|
| Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
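The garbage came from reading the "resolved" member of a union while the entry was in its "unresolved" state. The following sketch shows the pattern and the fix of printing zeros instead. `struct cache_entry` is a hypothetical, much-simplified stand-in for `struct mfc_cache`, and `format_counters` is an illustrative helper, not the kernel's `ipmr_mfc_seq_show()`.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct mfc_cache. */
struct cache_entry {
    int resolved;                       /* which union member is valid */
    union {
        struct { unsigned long pkt, bytes, wrong_if; } res;
        struct { unsigned long expires; unsigned int lock_magic; } unres;
    } un;
};

/* Format the Pkts/Bytes/Wrong columns. For unresolved entries the res
 * member holds unrelated bytes, so fixed zeros are printed instead. */
static int format_counters(char *buf, size_t len, const struct cache_entry *c)
{
    if (c->resolved)
        return snprintf(buf, len, "%8lu %8lu %8lu",
                        c->un.res.pkt, c->un.res.bytes, c->un.res.wrong_if);
    return snprintf(buf, len, "%8lu %8lu %8lu", 0ul, 0ul, 0ul);
}
```

Keeping the three columns, filled with zeros, preserves the layout that existing parsers rely on, which is the ABI argument made in the V2 changelog.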
* ipmr: convert ipmr virtual interface to net_device_ops (Stephen Hemminger, 2008-11-20)
| Convert to new network device ops interface.
|
| Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (David S. Miller, 2008-11-20)
|\
| | Conflicts:
| |   drivers/net/ixgbe/ixgbe_main.c
| |   include/net/mac80211.h
| |   net/phonet/af_phonet.c
| * net: fix ip_mr_init() error path (Benjamin Thery, 2008-11-19)
| | Similarly to IPv6 ip6_mr_init() (fixed last week), the order of cleanup
| | operations in the error/exit section of ip_mr_init() is completely
| | inverted. It should be the other way around.
| | Also a del_timer() is missing in the error path.
| |
| | I should have guessed last week that this same error existed in ipmr.c
| | too, as ip6mr.c is largely inspired by ipmr.c.
| |
| | Signed-off-by: Benjamin Thery <benjamin.thery@bull.net>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
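The rule behind this fix is that an error path must undo the setup steps in the reverse of the order in which they succeeded. A small userspace sketch of the goto-based unwinding idiom follows. All names (`subsystem_init`, `step`, `undo`, the trace buffer) are hypothetical; the real `ip_mr_init()` deals with a slab cache, a timer, a notifier and proc entries rather than these placeholders.

```c
#include <stddef.h>

/* Trace buffer so the order of operations can be inspected afterwards. */
static char trace[16];
static size_t trace_len;

static void note(char c)
{
    if (trace_len < sizeof(trace) - 1)
        trace[trace_len++] = c;
    trace[trace_len] = '\0';
}

/* A setup step that records its tag, or fails when asked to. */
static int step(char tag, int fail)
{
    if (fail)
        return -1;
    note(tag);
    return 0;
}

/* The matching teardown, recorded in lower case. */
static void undo(char tag)
{
    note(tag);
}

/* fail_at selects which step fails (0 means none). */
static int subsystem_init(int fail_at)
{
    int err;

    err = step('A', fail_at == 1);      /* e.g. create a cache */
    if (err)
        goto out;
    err = step('B', fail_at == 2);      /* e.g. set up a timer */
    if (err)
        goto undo_a;
    err = step('C', fail_at == 3);      /* e.g. register proc entries */
    if (err)
        goto undo_b;
    return 0;

    /* Labels are listed in reverse setup order, so falling through
     * releases the most recently acquired resource first. */
undo_b:
    undo('b');
undo_a:
    undo('a');
out:
    return err;
}
```

With `fail_at == 3` the trace reads A, B, b, a: the two completed steps are rolled back newest first.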
* | ip: convert to net_device_ops for ioctl (Stephen Hemminger, 2008-11-20)
| | Convert to net_device_ops function table pointer for ioctl.
| |
| | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: clean up net/ipv4/ipmr.c (Jianjun Kong, 2008-11-03)
|/
| Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rationalise email address: Network Specific Parts (Alan Cox, 2008-10-13)
| Clean up the various different email addresses of mine listed in the code
| to a single current and valid address. As Dave says his network merges
| for 2.6.28 are now done, this seems a good point to send them in where
| they won't risk disrupting real changes.
|
| Signed-off-by: Alan Cox <alan@redhat.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: Use net_eq() to compare net-namespaces for optimization. (YOSHIFUJI Hideaki, 2008-07-20)
| Without CONFIG_NET_NS, namespace is always &init_net.
| Compiler will be able to omit namespace comparisons with this patch.
|
| Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
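The optimization works because the comparison helper collapses to a constant when namespaces are compiled out. The sketch below illustrates the idea outside the kernel: `struct net` and `init_net` are reduced to placeholders, and `handle_event` is a hypothetical caller added only to show the effect.

```c
/* Placeholder for the kernel's struct net. */
struct net {
    int id;
};

static struct net init_net;

#ifdef CONFIG_NET_NS
/* With namespaces enabled the pointers really have to be compared. */
static inline int net_eq(const struct net *a, const struct net *b)
{
    return a == b;
}
#else
/* Without namespaces only init_net exists, so the answer is a constant
 * and the compiler can drop any branch that depends on it. */
static inline int net_eq(const struct net *a, const struct net *b)
{
    (void)a;
    (void)b;
    return 1;
}
#endif

/* Hypothetical caller: ignore events from foreign namespaces. */
static int handle_event(const struct net *dev_net)
{
    if (!net_eq(dev_net, &init_net))
        return 0;       /* dead code when CONFIG_NET_NS is not defined */
    return 1;
}
```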
* mib: add net to IP_INC_STATS_BH (Pavel Emelyanov, 2008-07-16)
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Fix ipmr unregister device oops (Wang Chen, 2008-07-14)
| An oops happens during device unregister.
|
| The following oops happened when I added two tunnels, which use the same
| device, and then deleted one tunnel.
| Obviously deleting tunnel "A" causes a device unregister, which sends a
| notification, and after receiving the notification, ipmr does the
| unregister again for tunnel "B", which also uses the same device. That is
| wrong.
| After receiving the notification, ipmr only needs to decrease the
| reference count and must not do a duplicated unregister.
| Fortunately, the IPv6 side doesn't add tunnels in ip6mr, so it's clean.
|
| This patch fixes:
| - unregister device oops
| - using after dev_put()
|
| Here is the oops:
| ===
| Jul 11 15:39:29 wangchen kernel: ------------[ cut here ]------------
| Jul 11 15:39:29 wangchen kernel: kernel BUG at net/core/dev.c:3651!
| Jul 11 15:39:29 wangchen kernel: invalid opcode: 0000 [#1]
| Jul 11 15:39:29 wangchen kernel: Modules linked in: ipip tunnel4 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device af_packet binfmt_misc button battery ac loop dm_mod usbhid ff_memless pcmcia firmware_class ohci1394 8139too mii ieee1394 yenta_socket rsrc_nonstatic pcmcia_core ide_cd_mod cdrom snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm i2c_i801 snd_timer snd i2c_core soundcore snd_page_alloc rng_core shpchp ehci_hcd uhci_hcd pci_hotplug intel_agp agpgart usbcore ext3 jbd ata_piix ahci libata dock edd fan thermal processor thermal_sys piix sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table]
| Jul 11 15:39:29 wangchen kernel:
| Jul 11 15:39:29 wangchen kernel: Pid: 4102, comm: mroute Not tainted (2.6.26-rc9-default #69)
| Jul 11 15:39:29 wangchen kernel: EIP: 0060:[<c024636b>] EFLAGS: 00010202 CPU: 0
| Jul 11 15:39:29 wangchen kernel: EIP is at rollback_registered+0x61/0xe3
| Jul 11 15:39:29 wangchen kernel: EAX: 00000001 EBX: ecba6000 ECX: 00000000 EDX: ffffffff
| Jul 11 15:39:29 wangchen kernel: ESI: 00000001 EDI: ecba6000 EBP: c03de2e8 ESP: ed8e7c3c
| Jul 11 15:39:29 wangchen kernel: DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
| Jul 11 15:39:29 wangchen kernel: Process mroute (pid: 4102, ti=ed8e6000 task=ed41e830 task.ti=ed8e6000)
| Jul 11 15:39:29 wangchen kernel: Stack: ecba6000 c024641c 00000028 c0284e1a 00000001 c03de2e8 ecba6000 eecff360
| Jul 11 15:39:29 wangchen kernel: c0284e4c c03536f4 fffffff8 00000000 c029a819 ecba6000 00000006 ecba6000
| Jul 11 15:39:29 wangchen kernel: 00000000 ecba6000 c03de2c0 c012841b ffffffff 00000000 c024639f ecba6000
| Jul 11 15:39:29 wangchen kernel: Call Trace:
| Jul 11 15:39:29 wangchen kernel: [<c024641c>] unregister_netdevice+0x2f/0x51
| Jul 11 15:39:29 wangchen kernel: [<c0284e1a>] vif_delete+0xaf/0xc3
| Jul 11 15:39:29 wangchen kernel: [<c0284e4c>] ipmr_device_event+0x1e/0x30
| Jul 11 15:39:29 wangchen kernel: [<c029a819>] notifier_call_chain+0x2a/0x47
| Jul 11 15:39:29 wangchen kernel: [<c012841b>] raw_notifier_call_chain+0x9/0xc
| Jul 11 15:39:29 wangchen kernel: [<c024639f>] rollback_registered+0x95/0xe3
| Jul 11 15:39:29 wangchen kernel: [<c024641c>] unregister_netdevice+0x2f/0x51
| Jul 11 15:39:29 wangchen kernel: [<c0284e1a>] vif_delete+0xaf/0xc3
| Jul 11 15:39:29 wangchen kernel: [<c0285eee>] ip_mroute_setsockopt+0x47a/0x801
| Jul 11 15:39:29 wangchen kernel: [<eea5a70c>] do_get_write_access+0x2df/0x313 [jbd]
| Jul 11 15:39:29 wangchen kernel: [<c01727c4>] __find_get_block_slow+0xda/0xe4
| Jul 11 15:39:29 wangchen kernel: [<c0172a7f>] __find_get_block+0xf8/0x122
| Jul 11 15:39:29 wangchen kernel: [<c0172a7f>] __find_get_block+0xf8/0x122
| Jul 11 15:39:29 wangchen kernel: [<eea5d563>] journal_cancel_revoke+0xda/0x110 [jbd]
| Jul 11 15:39:29 wangchen kernel: [<c0263501>] ip_setsockopt+0xa9/0x9ee
| Jul 11 15:39:29 wangchen kernel: [<eea5d563>] journal_cancel_revoke+0xda/0x110 [jbd]
| Jul 11 15:39:29 wangchen kernel: [<eea5a70c>] do_get_write_access+0x2df/0x313 [jbd]
| Jul 11 15:39:29 wangchen kernel: [<eea69287>] __ext3_get_inode_loc+0xcf/0x271 [ext3]
| Jul 11 15:39:29 wangchen kernel: [<eea743c7>] __ext3_journal_dirty_metadata+0x13/0x32 [ext3]
| Jul 11 15:39:29 wangchen kernel: [<c0116434>] __wake_up+0xf/0x15
| Jul 11 15:39:29 wangchen kernel: [<eea5a424>] journal_stop+0x1bd/0x1c6 [jbd]
| Jul 11 15:39:29 wangchen kernel: [<eea703a7>] __ext3_journal_stop+0x19/0x34 [ext3]
| Jul 11 15:39:29 wangchen kernel: [<c014291e>] get_page_from_freelist+0x94/0x369
| Jul 11 15:39:29 wangchen kernel: [<c01408f2>] filemap_fault+0x1ac/0x2fe
| Jul 11 15:39:29 wangchen kernel: [<c01a605e>] security_sk_alloc+0xd/0xf
| Jul 11 15:39:29 wangchen kernel: [<c023edea>] sk_prot_alloc+0x36/0x78
| Jul 11 15:39:29 wangchen kernel: [<c0240037>] sk_alloc+0x3a/0x40
| Jul 11 15:39:29 wangchen kernel: [<c0276062>] raw_hash_sk+0x46/0x4e
| Jul 11 15:39:29 wangchen kernel: [<c0166aff>] d_alloc+0x1b/0x157
| Jul 11 15:39:29 wangchen kernel: [<c023e4d1>] sock_common_setsockopt+0x12/0x16
| Jul 11 15:39:29 wangchen kernel: [<c023cb1e>] sys_setsockopt+0x6f/0x8e
| Jul 11 15:39:29 wangchen kernel: [<c023e105>] sys_socketcall+0x15c/0x19e
| Jul 11 15:39:29 wangchen kernel: [<c0103611>] sysenter_past_esp+0x6a/0x99
| Jul 11 15:39:29 wangchen kernel: [<c0290000>] unix_poll+0x69/0x78
| Jul 11 15:39:29 wangchen kernel: =======================
| Jul 11 15:39:29 wangchen kernel: Code: 83 e0 01 00 00 85 c0 75 1f 53 53 68 12 81 31 c0 e8 3c 30 ed ff ba 3f 0e 00 00 b8 b9 7f 31 c0 83 c4 0c 5b e9 f5 26 ed ff 48 74 04 <0f> 0b eb fe 89 d8 e8 21 ff ff ff 89 d8 e8 62 ea ff ff c7 83 e0
| Jul 11 15:39:29 wangchen kernel: EIP: [<c024636b>] rollback_registered+0x61/0xe3 SS:ESP 0068:ed8e7c3c
| Jul 11 15:39:29 wangchen kernel: ---[ end trace c311acf85d169786 ]---
| ===
|
| Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
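The call trace shows `vif_delete` re-entering `unregister_netdevice` from inside the notifier chain. The toy model below illustrates the rule stated in the commit message: when the release is triggered by a notification, only drop the reference and never unregister again. Everything here (`struct device`, `struct vif`, `vif_release`, the `notify` flag, the two-slot table) is a hypothetical simplification, not the kernel implementation.

```c
#include <stddef.h>

#define NVIFS 2

/* Toy device: counts how often it was unregistered. */
struct device {
    int registered;
    int refcnt;
    int unregister_calls;
};

/* Toy virtual interface holding one reference on a device. */
struct vif {
    struct device *dev;
};

static struct vif vifs[NVIFS];

static void vif_release(struct vif *v, int notify);

/* Unregistering notifies every vif that still uses the device. */
static void device_unregister(struct device *d)
{
    d->unregister_calls++;
    d->registered = 0;
    for (int i = 0; i < NVIFS; i++)
        if (vifs[i].dev == d)
            vif_release(&vifs[i], 1);
}

/* notify != 0 means the device is already being unregistered by someone
 * else, so only the reference is dropped. */
static void vif_release(struct vif *v, int notify)
{
    struct device *d = v->dev;

    if (!d)
        return;
    v->dev = NULL;              /* detach first so the notifier skips us */
    if (!notify)
        device_unregister(d);
    d->refcnt--;                /* last use of d in this function */
}
```

Releasing the first of two vifs that share a device unregisters it exactly once and drops both references; without the `notify` check the second vif would trigger a nested unregister, which is the situation behind the BUG in the oops.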
* ipv4: Check return of dev_set_allmulti (Wang Chen, 2008-07-14)
| allmulti might overflow.
| Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
| dev_set_promiscuity/allmulti return an error number if an overflow happened.
|
| Here, we check the positive increment for allmulti to get the error return.
|
| PS: For unwinding tunnel creation, we let ipip->ioctl() handle it.
|
| Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
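The pattern described here has two halves: a counter update that reports wrap-around, and a caller that checks the result of a positive increment. A userspace sketch under stated assumptions follows: `struct device`, `set_allmulti` and `vif_attach` are hypothetical names, and the overflow test is a simplified version of the idea rather than the code from the referenced commit.

```c
#include <errno.h>
#include <limits.h>

/* Toy device with a reference-style allmulti counter. */
struct device {
    unsigned int allmulti;
    int attached;
};

/* Adjust the counter; refuse an increment that would wrap it to zero. */
static int set_allmulti(struct device *dev, int inc)
{
    unsigned int old = dev->allmulti;

    dev->allmulti += inc;
    if (dev->allmulti == 0 && inc > 0) {
        dev->allmulti = old;    /* roll back, the counter overflowed */
        return -EOVERFLOW;
    }
    return 0;
}

/* Caller side: only continue when the positive increment succeeded. */
static int vif_attach(struct device *dev)
{
    int err = set_allmulti(dev, 1);

    if (err)
        return err;
    dev->attached = 1;
    return 0;
}
```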
* ipv4: Do cleanup for ip_mr_init (Wang Chen, 2008-07-03)
| Same as ip6_mr_init(), make ip_mr_init() return an errno if it fails.
| But do not do error handling in inet_init(), just print a msg.
|
| Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
| Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* net: remove CVS keywords (Adrian Bunk, 2008-06-12)
| This patch removes CVS keywords that weren't updated for a long time
| from comments.
|
| Signed-off-by: Adrian Bunk <bunk@kernel.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* ipmr: Use on-device stats instead of private ones. (Pavel Emelyanov, 2008-05-21)
| These devices use the private area of appropriate size for
| statistics. Turning them to use on-device ones makes them
| "privless" and thus really small wrt the kmalloc cache they
| are allocated from.
|
| Besides, code looks nicer, because of absence of multi-braced
| type casts and dereferences.
|
| [ Fix build failures -DaveM ]
|
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* ipmr: Ipip tunnel uses on-device stats. (Pavel Emelyanov, 2008-05-21)
| The ipmr uses ipip tunnels for its purposes and updates the
| tunnels' stats, but the ipip driver is already switched to
| use on-device ones.
|
| Actually, this is a part of the patch #4 from this set.
|
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. (YOSHIFUJI Hideaki, 2008-03-25)
| Introduce per-sock inlines: sock_net(), sock_net_set()
| and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
| Without CONFIG_NET_NS, no namespace other than &init_net exists.
| Let's explicitly define them to help compiler optimizations.
|
| Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. (YOSHIFUJI Hideaki, 2008-03-25)
| Introduce per-net_device inlines: dev_net(), dev_net_set().
| Without CONFIG_NET_NS, no namespace other than &init_net exists.
| Let's explicitly define them to help compiler optimizations.
|
| Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts (Eric Dumazet, 2008-03-05)
| (Anonymous) unions can help us to avoid ugly casts.
|
| A common cast is the (struct rtable *)skb->dst one.
|
| Defining a union like:
| union {
|      struct dst_entry *dst;
|      struct rtable *rtable;
| };
| permits using skb->rtable in place.
|
| Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
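The anonymous-union trick can be tried outside the kernel with toy types. In the sketch below, `struct packet` is a hypothetical stand-in for `struct sk_buff`, and `struct dst_entry` and `struct rtable` are reduced to the minimum needed; as in the kernel, the `dst_entry` is assumed to be the first member of `rtable` so that both pointers refer to the same address. Anonymous unions require C11 or a compiler extension.

```c
/* Toy versions of the two types; only the layout relationship matters. */
struct dst_entry {
    int refcnt;
};

struct rtable {
    struct dst_entry dst_hdr;   /* first member, so the addresses coincide */
    unsigned int rt_flags;
};

/* Hypothetical stand-in for struct sk_buff with the aliasing union. */
struct packet {
    union {
        struct dst_entry *dst;
        struct rtable *rtable;
    };
};

/* Before: every user has to spell out the cast. */
static unsigned int flags_with_cast(const struct packet *p)
{
    return ((struct rtable *)p->dst)->rt_flags;
}

/* After: the union member gives the typed pointer directly. */
static unsigned int flags_with_union(const struct packet *p)
{
    return p->rtable->rt_flags;
}
```

Both accessors return the same value; the union only moves the type conversion from each call site into the struct definition.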
* [NETNS]: Add namespace parameter to ip_route_output_key. (Denis V. Lunev, 2008-01-28)
| Needed to propagate it down to the ip_route_output_flow.
|
| Signed-off-by: Denis V. Lunev <den@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_dev_find. (Denis V. Lunev, 2008-01-28)
| ip_dev_find() needs a namespace to pass it to fib_get_table(), so add
| an argument.
|
| Signed-off-by: Denis V. Lunev <den@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: ipmr sparse warnings (Stephen Hemminger, 2008-01-28)
| Get rid of some of the sparse warnings.
|
| Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] net/ipv4: Use ipv4_is_<type> (Joe Perches, 2008-01-28)
| Signed-off-by: Joe Perches <joe@perches.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Switch users of ipv4_devconf(_all) to use the pernet one (Pavel Emelyanov, 2008-01-28)
| These are scattered over the code, but almost all the
| "critical" places already have the proper struct net
| at hand except for snmp proc showing function and routing
| rtnl handler.
|
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make rtnetlink infrastructure network namespace aware (v3) (Denis V. Lunev, 2008-01-28)
| After this patch none of the netlink callbacks support anything
| except the initial network namespace but the rtnetlink infrastructure
| now handles multiple network namespaces.
|
| Changes from v2:
| - IPv6 addrlabel processing
|
| Changes from v1:
| - no need for special rtnl_unlock handling
| - fixed IPv6 ndisc
|
| Signed-off-by: Denis V. Lunev <den@openvz.org>
| Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Introduce NF_INET_ hook values (Patrick McHardy, 2008-01-28)
| The IPv4 and IPv6 hook values are identical, yet some code tries to figure
| out the "correct" value by looking at the address family. Introduce NF_INET_*
| values for both IPv4 and IPv6. The old values are kept in a
| #ifndef __KERNEL__ section for userspace compatibility.
|
| Signed-off-by: Patrick McHardy <kaber@trash.net>
| Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Convert init_timer into setup_timer (Pavel Emelyanov, 2008-01-28)
| A great deal of code in the kernel initializes timer->function
| and timer->data together with calling init_timer(timer). There
| is already a helper for this. Use it for networking code.
|
| The patch is HUGE, but makes the code 130 lines shorter
| (98 insertions(+), 228 deletions(-)).
|
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
| Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
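The conversion replaces three separate statements with one helper call. The sketch below models that with a toy timer type; `struct timer`, its `initialized` flag and the `on_expire` callback are hypothetical, and the helper bodies only mirror the shape of `init_timer()` and `setup_timer()` as described in the commit message, not their kernel implementations.

```c
/* Toy timer: a callback, its argument, and an "initialized" marker. */
struct timer {
    void (*function)(unsigned long);
    unsigned long data;
    int initialized;
};

/* Stand-in for the basic initializer. */
static void init_timer(struct timer *t)
{
    t->initialized = 1;
}

/* Helper that bundles what callers used to open-code:
 *     init_timer(t); t->function = fn; t->data = data;  */
static void setup_timer(struct timer *t, void (*fn)(unsigned long),
                        unsigned long data)
{
    t->function = fn;
    t->data = data;
    init_timer(t);
}

/* Example callback that records the argument it was called with. */
static unsigned long expired_arg;

static void on_expire(unsigned long data)
{
    expired_arg = data;
}
```

A caller then writes `setup_timer(&t, on_expire, 7);` in place of the three-line sequence, which is where the line savings quoted in the commit message come from.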
* [NET]: Make core networking code use seq_open_private (Pavel Emelyanov, 2007-10-10)
| This concerns the ipv4 and ipv6 code mostly, but also the netlink
| and unix sockets.
|
| The netlink code is an example of how to use the __seq_open_private()
| call - it saves the net namespace on this private.
|
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make the device list and device lookups per namespace. (Eric W. Biederman, 2007-10-10)
| This patch makes most of the generic device layer network
| namespace safe. This patch makes dev_base_head a
| network namespace variable, and then it picks up
| a few associated variables. The functions:
|   dev_getbyhwaddr
|   dev_getfirsthwbytype
|   dev_get_by_flags
|   dev_get_by_name
|   __dev_get_by_name
|   dev_get_by_index
|   __dev_get_by_index
|   dev_ioctl
|   dev_ethtool
|   dev_load
|   wireless_process_ioctl
|
| were modified to take a network namespace argument, and
| deal with it.
|
| vlan_ioctl_set and brioctl_set were modified so their
| hooks will receive a network namespace argument.
|
| So basically anything in the core of the network stack that was
| affected by the change of dev_base was modified to handle
| multiple network namespaces. The rest of the network stack was
| simply modified to explicitly use &init_net the initial network
| namespace. This can be fixed when those components of the network
| stack are modified to handle multiple network namespaces.
|
| For now the ifindex generator is left global.
|
| Fundamentally ifindex numbers are per namespace, or else
| we will have corner case problems with migration when
| we get that far.
|
| At the same time there are assumptions in the network stack
| that the ifindex of a network device won't change. Making
| the ifindex number global seems a good compromise until
| the network stack can cope with ifindex changes when
| you change namespaces, and the like.
|
| Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make device event notification network namespace safe (Eric W. Biederman, 2007-10-10)
| Every user of the network device notifiers is either a protocol
| stack or a pseudo device. If a protocol stack that does not have
| support for multiple network namespaces receives an event for a
| device that is not in the initial network namespace it quite possibly
| can get confused and do the wrong thing.
|
| To avoid problems until all of the protocol stacks are converted
| this patch modifies all netdev event handlers to ignore events on
| devices that are not in the initial network namespace.
|
| As the rest of the code is made network namespace aware these
| checks can be removed.
|
| Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make /proc/net per network namespace (Eric W. Biederman, 2007-10-10)
| This patch makes /proc/net per network namespace. It modifies the global
| variables proc_net and proc_net_stat to be per network namespace.
| The proc_net file helpers are modified to take a network namespace argument,
| and all of their callers are fixed to pass &init_net for that argument.
| This ensures that all of the /proc/net files are only visible and
| usable in the initial network namespace until the code behind them
| has been updated to handle multiple network namespaces.
|
| Making /proc/net per namespace is necessary as at least some files
| in /proc/net depend upon the set of network devices which is per
| network namespace, and even more files in /proc/net have contents
| that are relevant to a single network namespace.
|
| Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* mm: Remove slab destructors from kmem_cache_create(). (Paul Mundt, 2007-07-19)
| Slab destructors were no longer supported after Christoph's
| c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
| BUGs for both slab and slub, and slob never supported them
| either.
|
| This rips out support for the dtor pointer from kmem_cache_create()
| completely and fixes up every single callsite in the kernel (there were
| about 224, not including the slab allocator definitions themselves,
| or the documentation references).
|
| Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* [IPV4]: Restore old behaviour of default config values (Herbert Xu, 2007-06-07)
| Previously inet devices were only constructed when addresses are added
| (or rarely in ipmr). Therefore the default config values they get are
| the ones at the time of these operations.
|
| Now that we're creating inet devices earlier, this changes the
| behaviour of default config values in an incompatible way (see bug
| #8519).
|
| This patch creates a compromise by setting the default values at the
| same point as before but only for those that have not been explicitly
| set by the user since the inet device's creation.
|
| Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Convert IPv4 devconf to an array (Herbert Xu, 2007-06-07)
| This patch converts the ipv4_devconf config members (everything except
| sysctl) to an array. This allows easier manipulation which will be
| needed later on to provide better management of default config values.
|
| Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Introduce skb_copy_to_linear_data{_offset} (Arnaldo Carvalho de Melo, 2007-04-26)
| To clearly state the intent of copying to linear sk_buffs, _offset being an
| overly long variant but interesting for the sake of saving some bytes.
|
| Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
* [NETLINK]: Use nlmsg_trim() where appropriate (Arnaldo Carvalho de Melo, 2007-04-26)
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Convert skb->tail to sk_buff_data_t (Arnaldo Carvalho de Melo, 2007-04-26)
| So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes
| on 64bit architectures, allowing us to combine the 4 bytes hole left by the
| layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4
| 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN...
| :-)
|
| Many calculations that previously required that skb->{transport,network,
| mac}_header be first converted to a pointer now can be done directly, being
| meaningful as offsets or pointers.
|
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
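Storing the tail as an offset from the buffer head instead of as a pointer is what shrinks the field on 64-bit machines. The sketch below shows the representation with a toy buffer type; `struct buf_off`, `tail_pointer` and `buf_put` are hypothetical names chosen for the illustration, not the kernel's `sk_buff` API.

```c
#include <stdint.h>

/* Toy buffer whose tail is a 32-bit offset from head rather than a
 * pointer (4 bytes instead of 8 on typical 64-bit targets). */
struct buf_off {
    unsigned char *head;
    uint32_t tail;              /* offset from head, in bytes */
};

/* Convert the stored offset back into a pointer when one is needed. */
static unsigned char *tail_pointer(const struct buf_off *b)
{
    return b->head + b->tail;
}

/* Appending data becomes plain integer arithmetic on the offset.
 * Returns a pointer to the start of the newly reserved area. */
static unsigned char *buf_put(struct buf_off *b, uint32_t len)
{
    unsigned char *old_tail = tail_pointer(b);

    b->tail += len;
    return old_tail;
}
```

Differences such as "tail minus some header position" can then be computed directly on the offsets without first converting them to pointers, which is the convenience the second paragraph of the commit message refers to.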
* [SK_BUFF]: unions of just one member don't get anything done, kill them (Arnaldo Carvalho de Melo, 2007-04-26)
| Renaming skb->h to skb->transport_header, skb->nh to skb->network_header and
| skb->mac to skb->mac_header, to match the names of the associated helpers
| (skb[_[re]set]_{transport,network,mac}_header).
|
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Some more layer header conversions (Arnaldo Carvalho de Melo, 2007-04-26)
| Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>