aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv6
Commit message (Collapse)AuthorAge
...
* netfilter: netns nf_conntrack: pass conntrack to nf_conntrack_event_cache() ↵Alexey Dobriyan2008-10-08
| | | | | | | | | not skb This is cleaner, we already know conntrack to which event is relevant. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: netns nf_conntrack: pass netns pointer to L4 protocol's ->error hookAlexey Dobriyan2008-10-08
| | | | | | | | Again, it's deducible from skb, but we're going to use it for nf_conntrack_checksum and statistics, so just pass it from upper layer. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: netns nf_conntrack: pass netns pointer to nf_conntrack_in()Alexey Dobriyan2008-10-08
| | | | | | | | | It's deducible from skb->dev or skb->dst->dev, but we know netns at the moment of call, so pass it down and use for finding and creating conntracks. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: netns: fix {ip,6}_route_me_harder() in netnsAlexey Dobriyan2008-10-08
| | | | | | | | | | | Take netns from skb->dst->dev. It should be safe because, they are called from LOCAL_OUT hook where dst is valid (though, I'm not exactly sure about IPVS and queueing packets to userspace). [Patrick: its safe everywhere since they already expect skb->dst to be set] Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: netns nf_conntrack: per-netns conntrack hashAlexey Dobriyan2008-10-08
| | | | | | | | | | | | | * make per-netns conntrack hash Other solution is to add ->ct_net pointer to tuplehashes and still has one hash, I tried that it's ugly and requires more code deep down in protocol modules et al. * propagate netns pointer to where needed, e. g. to conntrack iterators. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: netns: ip6t_REJECT in netns for realAlexey Dobriyan2008-10-08
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: netns: ip6table_mangle in netns for realAlexey Dobriyan2008-10-08
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: netns: ip6table_raw in netns for realAlexey Dobriyan2008-10-08
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: netns: remove nf_*_net() wrappersAlexey Dobriyan2008-10-08
| | | | | | | | | Now that dev_net() exists, the usefullness of them is even less. Also they're a big problem in resolving circular header dependencies necessary for NOTRACK-in-netns patch. See below. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: x_tables: use NFPROTO_* in extensionsJan Engelhardt2008-10-08
| | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: Use unsigned types for hooknum and pf varsJan Engelhardt2008-10-08
| | | | | | | and (try to) consistently use u_int8_t for the L3 family. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
* netns: make uplitev6 mib per/namespaceDenis V. Lunev2008-10-07
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: make udpv6 mib per/namespaceDenis V. Lunev2008-10-07
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: add stub functions for per/namespace mibs allocationDenis V. Lunev2008-10-07
| | | | | | | | The content of init_ipv6_mibs/cleanup_ipv6_mibs will be moved to new calls one by one next. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: allow per device ipv6 snmp statistics in non-initial namespaceDenis V. Lunev2008-10-07
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: register global ipv6 mibs statistics in each namespaceDenis V. Lunev2008-10-07
| | | | | | | Unused net variable will become used very soon. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: separate seq_ops for global & per/device ipv6 statisticsDenis V. Lunev2008-10-07
| | | | | | | | | | | | | | | idev has been stored on seq->private. NULL has been stored for global statistics. The situation is changed with net namespace. We need to store pointer to struct net and the only place is seq->private. So, we'll have for /proc/net/dev_snmp6/* and for /proc/net/snmp6 pointers of two different types stored in the same field. This effectively requires to separate seq_ops of these files. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: consolidate ipv6 sock_stat code at the beginning of net/ipv6/proc.cDenis V. Lunev2008-10-07
| | | | | | | | | Simple, comsolidate sockstat6 staff in one place, at the beginning of the file. Right now sockstat6_seq_open/sockstat6_seq_fops looks like an intrusion in the middle of snmp6 code. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: register /proc/net/dev_snmp6/* in each nsDenis V. Lunev2008-10-07
| | | | | | | Do the same for /proc/net/snmp6. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: move /proc/net/dev_snmp6 to struct netDenis V. Lunev2008-10-07
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: initialize ip6_route sysctl vars in ip6_route_net_init()Peter Zijlstra2008-10-07
| | | | | | | | | This makes that ip6_route_net_init() does all of the route init code. There used to be a race between ip6_route_net_init() and ip6_net_init() and someone relying on the combined result was left out cold. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: clean up ip6_route_net_init() error handlingPeter Zijlstra2008-10-07
| | | | | | | ip6_route_net_init() error handling looked less than solid, fix 'er up. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Don't lookup the socket if there's a socket attached to the skbKOVACS Krisztian2008-10-07
| | | | | | | | Use the socket cached in the skb if it's present. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: Add udplib_lookup_skb() helpersKOVACS Krisztian2008-10-07
| | | | | | | | | | To be able to use the cached socket reference in the skb during input processing we add a new set of lookup functions that receive the skb on their argument list. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet_hashtables: Add inet_lookup_skb helpersArnaldo Carvalho de Melo2008-10-07
| | | | | | | | | | To be able to use the cached socket reference in the skb during input processing we add a new set of lookup functions that receive the skb on their argument list. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Make inet_sock.h independent of route.hKOVACS Krisztian2008-10-01
| | | | | | | | | inet_iif() in inet_sock.h requires route.h. Since users of inet_iif() usually require other route.h functionality anyway this patch moves inet_iif() to route.h. Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2008-10-01
|\ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath9k/core.c drivers/net/wireless/ath9k/main.c net/core/dev.c
| * XFRM,IPv6: initialize ip6_dst_blackhole_ops.kmem_cachepArnaud Ebalard2008-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip6_dst_blackhole_ops.kmem_cachep is not expected to be NULL (i.e. to be initialized) when dst_alloc() is called from ip6_dst_blackhole(). Otherwise, it results in the following (xfrm_larval_drop is now set to 1 by default): [ 78.697642] Unable to handle kernel paging request for data at address 0x0000004c [ 78.703449] Faulting instruction address: 0xc0097f54 [ 78.786896] Oops: Kernel access of bad area, sig: 11 [#1] [ 78.792791] PowerMac [ 78.798383] Modules linked in: btusb usbhid bluetooth b43 mac80211 cfg80211 ehci_hcd ohci_hcd sungem sungem_phy usbcore ssb [ 78.804263] NIP: c0097f54 LR: c0334a28 CTR: c002d430 [ 78.809997] REGS: eef19ad0 TRAP: 0300 Not tainted (2.6.27-rc5) [ 78.815743] MSR: 00001032 <ME,IR,DR> CR: 22242482 XER: 20000000 [ 78.821550] DAR: 0000004c, DSISR: 40000000 [ 78.827278] TASK = eef0df40[3035] 'mip6d' THREAD: eef18000 [ 78.827408] GPR00: 00001032 eef19b80 eef0df40 00000000 00008020 eef19c30 00000001 00000000 [ 78.833249] GPR08: eee5101c c05a5c10 ef9ad500 00000000 24242422 1005787c 00000000 1004f960 [ 78.839151] GPR16: 00000000 10024e90 10050040 48030018 0fe44150 00000000 00000000 eef19c30 [ 78.845046] GPR24: eef19e44 00000000 eef19bf8 efb37c14 eef19bf8 00008020 00009032 c0596064 [ 78.856671] NIP [c0097f54] kmem_cache_alloc+0x20/0x94 [ 78.862581] LR [c0334a28] dst_alloc+0x40/0xc4 [ 78.868451] Call Trace: [ 78.874252] [eef19b80] [c03c1810] ip6_dst_lookup_tail+0x1c8/0x1dc (unreliable) [ 78.880222] [eef19ba0] [c0334a28] dst_alloc+0x40/0xc4 [ 78.886164] [eef19bb0] [c03cd698] ip6_dst_blackhole+0x28/0x1cc [ 78.892090] [eef19be0] [c03d9be8] rawv6_sendmsg+0x75c/0xc88 [ 78.897999] [eef19cb0] [c038bca4] inet_sendmsg+0x4c/0x78 [ 78.903907] [eef19cd0] [c03207c8] sock_sendmsg+0xac/0xe4 [ 78.909734] [eef19db0] [c03209e4] sys_sendmsg+0x1e4/0x2a0 [ 78.915540] [eef19f00] [c03220a8] sys_socketcall+0xfc/0x210 [ 78.921406] [eef19f40] [c0014b3c] ret_from_syscall+0x0/0x38 [ 78.927295] --- Exception: c01 at 0xfe2d730 [ 78.927297] LR = 0xfe2d71c [ 78.939019] Instruction dump: [ 78.944835] 91640018 9144001c 900a0000 4bffff44 9421ffe0 7c0802a6 bf810010 7c9d2378 [ 78.950694] 90010024 7fc000a6 57c0045e 7c000124 <83e3004c> 8383005c 2f9f0000 419e0050 [ 78.956464] ---[ end trace 05fa1ed7972487a1 ]--- As commented by Benjamin Thery, the bug was introduced by f2fc6a54585a1be6669613a31fbaba2ecbadcd36, while adding network namespaces support to ipv6 routes. Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Acked-by: Benjamin Thery <benjamin.thery@bull.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: NULL pointer dereferrence in tcp_v6_send_ackDenis V. Lunev2008-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following actions are possible: tcp_v6_rcv skb->dev = NULL; tcp_v6_do_rcv tcp_v6_hnd_req tcp_check_req req->rsk_ops->send_ack == tcp_v6_send_ack So, skb->dev can be NULL in tcp_v6_send_ack. We must obtain namespace from dst entry. Thanks to Vitaliy Gusev <vgusev@openvz.org> for initial problem finding in IPv4 code. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: ip6t_{hbh,dst}: Rejects not-strict mode on rule insertionYasuyuki Kozakai2008-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | The current code ignores rules for internal options in HBH/DST options header in packet processing if 'Not strict' mode is specified (which is not implemented). Clearly it is not expected by user. Kernel should reject HBH/DST rule insertion with 'Not strict' mode in the first place. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: almost identical frag hashing funcs combinedIlpo Järvinen2008-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | $ diff-funcs ip6qhashfn reassembly.c netfilter/nf_conntrack_reasm.c --- reassembly.c:ip6qhashfn() +++ netfilter/nf_conntrack_reasm.c:ip6qhashfn() @@ -1,5 +1,5 @@ -static unsigned int ip6qhashfn(__be32 id, struct in6_addr *saddr, - struct in6_addr *daddr) +static unsigned int ip6qhashfn(__be32 id, const struct in6_addr *saddr, + const struct in6_addr *daddr) { u32 a, b, c; @@ -9,7 +9,7 @@ a += JHASH_GOLDEN_RATIO; b += JHASH_GOLDEN_RATIO; - c += ip6_frags.rnd; + c += nf_frags.rnd; __jhash_mix(a, b, c); a += (__force u32)saddr->s6_addr32[3]; And codiff xx.o.old xx.o.new: net/ipv6/netfilter/nf_conntrack_reasm.c: ip6qhashfn | -512 nf_hashfn | +6 nf_ct_frag6_gather | +36 3 functions changed, 42 bytes added, 512 bytes removed, diff: -470 net/ipv6/reassembly.c: ip6qhashfn | -512 ip6_hashfn | +7 ipv6_frag_rcv | +89 3 functions changed, 96 bytes added, 512 bytes removed, diff: -416 net/ipv6/reassembly.c: inet6_hash_frag | +510 1 function changed, 510 bytes added, diff: +510 Total: -376 Compile tested. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Use hton[sl]() instead of __constant_hton[sl]() where applicableArnaldo Carvalho de Melo2008-09-21
| | | | | | | | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp_ipv6: fix use of uninitialized memoryVegard Nossum2008-09-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inet6_rsk() is called on a struct request_sock * before we have checked whether the socket is an ipv6 socket or a ipv6- mapped ipv4 socket. The access that triggers this is the inet_rsk(rsk)->inet6_rsk_offset dereference in inet6_rsk(). This is arguably not a critical error as the inet6_rsk_offset is only used to compute a pointer which is never really used (in the code path in question) anyway. But it might be a latent error, so let's fix it. Spotted by kmemcheck. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: On interface down/unregister, purge icmp routes too.David S. Miller2008-09-11
|/ | | | | | | | | | | | | | | | | Johannes Berg reported that occaisionally, bringing an interface down or unregistering it would hang for up to 30 seconds. Using debugging output he provided it became clear that ICMP6 routes were the culprit. The problem is that ICMP6 routes live in their own world totally separate from normal ipv6 routes. So there are all kinds of special cases throughout the ipv6 code to handle this. While we should really try to unify all of this stuff somehow, for the time being let's fix this by purging the ICMP6 routes that match the device in question during rt6_ifdown(). Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Fix OOPS in ip6_dst_lookup_tail().Neil Horman2008-09-09
| | | | | | | | | | | | | | | | | | This fixes kernel bugzilla 11469: "TUN with 1024 neighbours: ip6_dst_lookup_tail NULL crash" dst->neighbour is not necessarily hooked up at this point in the processing path, so blindly dereferencing it is the wrong thing to do. This NULL check exists in other similar paths and this case was just an oversight. Also fix the completely wrong and confusing indentation here while we're at it. Based upon a patch by Evgeniy Polyakov. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns : fix kernel panic in timewait socket destructionDaniel Lezcano2008-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | How to reproduce ? - create a network namespace - use tcp protocol and get timewait socket - exit the network namespace - after a moment (when the timewait socket is destroyed), the kernel panics. # BUG: unable to handle kernel NULL pointer dereference at 0000000000000007 IP: [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 PGD 119985067 PUD 11c5c0067 PMD 0 Oops: 0000 [1] SMP CPU 1 Modules linked in: ipv6 button battery ac loop dm_mod tg3 libphy ext3 jbd edd fan thermal processor thermal_sys sg sata_svw libata dock serverworks sd_mod scsi_mod ide_disk ide_core [last unloaded: freq_table] Pid: 0, comm: swapper Not tainted 2.6.27-rc2 #3 RIP: 0010:[<ffffffff821e394d>] [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 RSP: 0018:ffff88011ff7fed0 EFLAGS: 00010246 RAX: ffffffffffffffff RBX: ffffffff82339420 RCX: ffff88011ff7ff30 RDX: 0000000000000001 RSI: ffff88011a4d03c0 RDI: ffff88011ac2fc00 RBP: ffffffff823392e0 R08: 0000000000000000 R09: ffff88002802a200 R10: ffff8800a5c4b000 R11: ffffffff823e4080 R12: ffff88011ac2fc00 R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000000 FS: 0000000041cbd940(0000) GS:ffff8800bff839c0(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000007 CR3: 00000000bd87c000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff8800bff9e000, task ffff88011ff76690) Stack: ffffffff823392e0 0000000000000100 ffffffff821e3a3a 0000000000000008 0000000000000000 ffffffff821e3a61 ffff8800bff7c000 ffffffff8203c7e7 ffff88011ff7ff10 ffff88011ff7ff10 0000000000000021 ffffffff82351108 Call Trace: <IRQ> [<ffffffff821e3a3a>] ? inet_twdr_hangman+0x0/0x9e [<ffffffff821e3a61>] ? inet_twdr_hangman+0x27/0x9e [<ffffffff8203c7e7>] ? run_timer_softirq+0x12c/0x193 [<ffffffff820390d1>] ? __do_softirq+0x5e/0xcd [<ffffffff8200d08c>] ? call_softirq+0x1c/0x28 [<ffffffff8200e611>] ? do_softirq+0x2c/0x68 [<ffffffff8201a055>] ? smp_apic_timer_interrupt+0x8e/0xa9 [<ffffffff8200cad6>] ? apic_timer_interrupt+0x66/0x70 <EOI> [<ffffffff82011f4c>] ? default_idle+0x27/0x3b [<ffffffff8200abbd>] ? cpu_idle+0x5f/0x7d Code: e8 01 00 00 4c 89 e7 41 ff c5 e8 8d fd ff ff 49 8b 44 24 38 4c 89 e7 65 8b 14 25 24 00 00 00 89 d2 48 8b 80 e8 00 00 00 48 f7 d0 <48> 8b 04 d0 48 ff 40 58 e8 fc fc ff ff 48 89 df e8 c0 5f 04 00 RIP [<ffffffff821e394d>] inet_twdr_do_twkill_work+0x6e/0xb8 RSP <ffff88011ff7fed0> CR2: 0000000000000007 This patch provides a function to purge all timewait sockets related to a network namespace. The timewait sockets life cycle is not tied with the network namespace, that means the timewait sockets stay alive while the network namespace dies. The timewait sockets are for avoiding to receive a duplicate packet from the network, if the network namespace is freed, the network stack is removed, so no chance to receive any packets from the outside world. Furthermore, having a pending destruction timer on these sockets with a network namespace freed is not safe and will lead to an oops if the timer callback which try to access data belonging to the namespace like for example in: inet_twdr_do_twkill_work -> NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_TIMEWAITED); Purging the timewait sockets at the network namespace destruction will: 1) speed up memory freeing for the namespace 2) fix kernel panic on asynchronous timewait destruction Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Denis V. Lunev <den@openvz.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: When we droped a packet, we should return NET_RX_DROP instead of 0Yang Hongyang2008-08-29
| | | | | Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: sysctl fixesAl Viro2008-08-25
| | | | | | | | Braino: net.ipv6 in ipv6 skeleton has no business in rotable class Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: protocol for address routesStephen Hemminger2008-08-23
| | | | | | | | | | | | | | | | | | | | This fixes a problem spotted with zebra, but not sure if it is necessary a kernel problem. With IPV6 when an address is added to an interface, Zebra creates a duplicate RIB entry, one as a connected route, and other as a kernel route. When an address is added to an interface the RTN_NEWADDR message causes Zebra to create a connected route. In IPV4 when an address is added to an interface a RTN_NEWROUTE message is set to user space with the protocol RTPROT_KERNEL. Zebra ignores these messages, because it already has the connected route. The problem is that route created in IPV6 has route protocol == RTPROT_BOOT. Was this a design decision or a bug? This fixes it. Same patch applies to both net-2.6 and stable. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* icmp: icmp_sk() should not use smp_processor_id() in preemptible codeDenis V. Lunev2008-08-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass namespace into icmp_xmit_lock, obtain socket inside and return it as a result for caller. Thanks Alexey Dobryan for this report: Steps to reproduce: CONFIG_PREEMPT=y CONFIG_DEBUG_PREEMPT=y tracepath <something> BUG: using smp_processor_id() in preemptible [00000000] code: tracepath/3205 caller is icmp_sk+0x15/0x30 Pid: 3205, comm: tracepath Not tainted 2.6.27-rc4 #1 Call Trace: [<ffffffff8031af14>] debug_smp_processor_id+0xe4/0xf0 [<ffffffff80409405>] icmp_sk+0x15/0x30 [<ffffffff8040a17b>] icmp_send+0x4b/0x3f0 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8023a475>] ? local_bh_enable_ip+0x95/0x110 [<ffffffff804285b9>] ? _spin_unlock_bh+0x39/0x40 [<ffffffff8025a26c>] ? mark_held_locks+0x4c/0x90 [<ffffffff8025a4ad>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff8025a415>] ? trace_hardirqs_on_caller+0xd5/0x160 [<ffffffff803e91b4>] ip_fragment+0x8d4/0x900 [<ffffffff803e7030>] ? ip_finish_output2+0x0/0x290 [<ffffffff803e91e0>] ? ip_finish_output+0x0/0x60 [<ffffffff803e6650>] ? dst_output+0x0/0x10 [<ffffffff803e922c>] ip_finish_output+0x4c/0x60 [<ffffffff803e92e3>] ip_output+0xa3/0xf0 [<ffffffff803e68d0>] ip_local_out+0x20/0x30 [<ffffffff803e753f>] ip_push_pending_frames+0x27f/0x400 [<ffffffff80406313>] udp_push_pending_frames+0x233/0x3d0 [<ffffffff804067d1>] udp_sendmsg+0x321/0x6f0 [<ffffffff8040d155>] inet_sendmsg+0x45/0x80 [<ffffffff803b967f>] sock_sendmsg+0xdf/0x110 [<ffffffff8024a100>] ? autoremove_wake_function+0x0/0x40 [<ffffffff80257ce5>] ? validate_chain+0x415/0x1010 [<ffffffff8027dc10>] ? __do_fault+0x140/0x450 [<ffffffff802597d0>] ? __lock_acquire+0x260/0x590 [<ffffffff803b9e55>] ? sockfd_lookup_light+0x45/0x80 [<ffffffff803ba50a>] sys_sendto+0xea/0x120 [<ffffffff80428e42>] ? _spin_unlock_irqrestore+0x42/0x80 [<ffffffff803134bc>] ? __up_read+0x4c/0xb0 [<ffffffff8024e0c6>] ? up_read+0x26/0x30 [<ffffffff8020b8bb>] system_call_fastpath+0x16/0x1b icmp6_sk() is similar. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Fix the return interface index when get it while no message is received.Yang Hongyang2008-08-18
| | | | | | | | | | | | | | | | | | When get receiving interface index while no message is received, the bounded device's index of the socket should be returned. RFC 3542: Issuing getsockopt() for the above options will return the sticky option value i.e., the value set with setsockopt(). If no sticky option value has been set getsockopt() will return the following values: - For the IPV6_PKTINFO option, it will return an in6_pktinfo structure with ipi6_addr being in6addr_any and ipi6_ifindex being zero. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: Add network namespace argument to rt6_fill_node() and ↵Brian Haley2008-08-14
| | | | | | | | | | | ipv6_dev_get_saddr() ipv6_dev_get_saddr() blindly de-references dst_dev to get the network namespace, but some callers might pass NULL. Change callers to pass a namespace pointer instead. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Fix OOPS, ip -f inet6 route get fec0::1, linux-2.6.26, ↵Brian Haley2008-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip6_route_output, rt6_fill_node+0x175 Alexey Dobriyan wrote: > On Thu, Aug 07, 2008 at 07:00:56PM +0200, John Gumb wrote: >> Scenario: no ipv6 default route set. > >> # ip -f inet6 route get fec0::1 >> >> BUG: unable to handle kernel NULL pointer dereference at 00000000 >> IP: [<c0369b85>] rt6_fill_node+0x175/0x3b0 >> EIP is at rt6_fill_node+0x175/0x3b0 > > 0xffffffff80424dd3 is in rt6_fill_node (net/ipv6/route.c:2191). > 2186 } else > 2187 #endif > 2188 NLA_PUT_U32(skb, RTA_IIF, iif); > 2189 } else if (dst) { > 2190 struct in6_addr saddr_buf; > 2191 ====> if (ipv6_dev_get_saddr(ip6_dst_idev(&rt->u.dst)->dev, > ^^^^^^^^^^^^^^^^^^^^^^^^ > NULL > > 2192 dst, 0, &saddr_buf) == 0) > 2193 NLA_PUT(skb, RTA_PREFSRC, 16, &saddr_buf); > 2194 } The commit that changed this can't be reverted easily, but the patch below works for me. Fix NULL de-reference in rt6_fill_node() when there's no IPv6 input device present in the dst entry. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* udp: Drop socket lock for encapsulated packetsHerbert Xu2008-08-09
| | | | | | | | | | | | | | The socket lock is there to protect the normal UDP receive path. Encapsulation UDP sockets don't need that protection. In fact the locking is deadly for them as they may contain another UDP packet within, possibly with the same addresses. Also the nested bit was copied from TCP. TCP needs it because of accept(2) spawning sockets. This simply doesn't apply to UDP so I've removed it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Fix kernel panic when calling tcp_v(4/6)_md5_do_lookupGui Jianfeng2008-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the following packet flow happen, kernel will panic. MathineA MathineB SYN ----------------------> SYN+ACK <---------------------- ACK(bad seq) ----------------------> When a bad seq ACK is received, tcp_v4_md5_do_lookup(skb->sk, ip_hdr(skb)->daddr)) is finally called by tcp_v4_reqsk_send_ack(), but the first parameter(skb->sk) is NULL at that moment, so kernel panic happens. This patch fixes this bug. OOPS output is as following: [ 302.812793] IP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 [ 302.817075] Oops: 0000 [#1] SMP [ 302.819815] Modules linked in: ipv6 loop dm_multipath rtc_cmos rtc_core rtc_lib pcspkr pcnet32 mii i2c_piix4 parport_pc i2c_core parport ac button ata_piix libata dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod crc_t10dif ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan] [ 302.849946] [ 302.851198] Pid: 0, comm: swapper Not tainted (2.6.27-rc1-guijf #5) [ 302.855184] EIP: 0060:[<c05cfaa6>] EFLAGS: 00010296 CPU: 0 [ 302.858296] EIP is at tcp_v4_md5_do_lookup+0x12/0x42 [ 302.861027] EAX: 0000001e EBX: 00000000 ECX: 00000046 EDX: 00000046 [ 302.864867] ESI: ceb69e00 EDI: 1467a8c0 EBP: cf75f180 ESP: c0792e54 [ 302.868333] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [ 302.871287] Process swapper (pid: 0, ti=c0792000 task=c0712340 task.ti=c0746000) [ 302.875592] Stack: c06f413a 00000000 cf75f180 ceb69e00 00000000 c05d0d86 000016d0 ceac5400 [ 302.883275] c05d28f8 000016d0 ceb69e00 ceb69e20 681bf6e3 00001000 00000000 0a67a8c0 [ 302.890971] ceac5400 c04250a3 c06f413a c0792eb0 c0792edc cf59a620 cf59a620 cf59a634 [ 302.900140] Call Trace: [ 302.902392] [<c05d0d86>] tcp_v4_reqsk_send_ack+0x17/0x35 [ 302.907060] [<c05d28f8>] tcp_check_req+0x156/0x372 [ 302.910082] [<c04250a3>] printk+0x14/0x18 [ 302.912868] [<c05d0aa1>] tcp_v4_do_rcv+0x1d3/0x2bf [ 302.917423] [<c05d26be>] tcp_v4_rcv+0x563/0x5b9 [ 302.920453] [<c05bb20f>] ip_local_deliver_finish+0xe8/0x183 [ 302.923865] [<c05bb10a>] ip_rcv_finish+0x286/0x2a3 [ 302.928569] [<c059e438>] dev_alloc_skb+0x11/0x25 [ 302.931563] [<c05a211f>] netif_receive_skb+0x2d6/0x33a [ 302.934914] [<d0917941>] pcnet32_poll+0x333/0x680 [pcnet32] [ 302.938735] [<c05a3b48>] net_rx_action+0x5c/0xfe [ 302.941792] [<c042856b>] __do_softirq+0x5d/0xc1 [ 302.944788] [<c042850e>] __do_softirq+0x0/0xc1 [ 302.948999] [<c040564b>] do_softirq+0x55/0x88 [ 302.951870] [<c04501b1>] handle_fasteoi_irq+0x0/0xa4 [ 302.954986] [<c04284da>] irq_exit+0x35/0x69 [ 302.959081] [<c0405717>] do_IRQ+0x99/0xae [ 302.961896] [<c040422b>] common_interrupt+0x23/0x28 [ 302.966279] [<c040819d>] default_idle+0x2a/0x3d [ 302.969212] [<c0402552>] cpu_idle+0xb2/0xd2 [ 302.972169] ======================= [ 302.974274] Code: fc ff 84 d2 0f 84 df fd ff ff e9 34 fe ff ff 83 c4 0c 5b 5e 5f 5d c3 90 90 57 89 d7 56 53 89 c3 50 68 3a 41 6f c0 e8 e9 55 e5 ff <8b> 93 9c 04 00 00 58 85 d2 59 74 1e 8b 72 10 31 db 31 c9 85 f6 [ 303.011610] EIP: [<c05cfaa6>] tcp_v4_md5_do_lookup+0x12/0x42 SS:ESP 0068:c0792e54 [ 303.018360] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipsec: Interfamily IPSec BEET, ipv4-inner ipv6-outerJoakim Koskela2008-08-06
| | | | | | | | | | | | | Here's a revised version, based on Herbert's comments, of a fix for the ipv4-inner, ipv6-outer interfamily ipsec beet mode. It fixes the network header adjustment during interfamily, as well as makes sure that we reserve enough room for the new ipv6 header if we might have something else as the inner family. Also, the ipv4 pseudo header construction was added. Signed-off-by: Joakim Koskela <jookos@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: replace dst_metric() with dst_mtu() in net/ipv6/route.c.Rami Rosen2008-08-06
| | | | | | | This patch replaces dst_metric() with dst_mtu() in net/ipv6/route.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Do not drop packet if skb->local_df is set to trueWei Yongjun2008-08-04
| | | | | | | | | | | | | The old code will drop IPv6 packet if ipfragok is not set, since ipfragok is obsoleted, will be instead by used skb->local_df, so this check must be changed to skb->local_df. This patch fix this problem and not drop packet if skb->local_df is set to true. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Fix the return value of Set Hop-by-Hop options header with NULL data ↵Yang Hongyang2008-08-03
| | | | | | | | | | | | | | pointer When Set Hop-by-Hop options header with NULL data pointer and optlen is not zero use setsockopt(), the kernel successfully return 0 instead of return error EINVAL or EFAULT. This patch fix the problem. Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: syncookies: free reqsk on xfrm_lookup errorFlorian Westphal2008-08-03
| | | | | | | | cookie_v6_check() did not call reqsk_free() if xfrm_lookup() fails, leaking the request sock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>