path: root/net/ipv4/ip_output.c
Commit message (Author, Age)
* ip: Fix ip_dev_loopback_xmit() (Eric Dumazet, 2010-04-15)
| Eric Paris got the following trace with a linux-next kernel:
|
|   [ 14.203970] BUG: using smp_processor_id() in preemptible [00000000] code: avahi-daemon/2093
|   [ 14.204025] caller is netif_rx+0xfa/0x110
|   [ 14.204035] Call Trace:
|   [ 14.204064] [<ffffffff81278fe5>] debug_smp_processor_id+0x105/0x110
|   [ 14.204070] [<ffffffff8142163a>] netif_rx+0xfa/0x110
|   [ 14.204090] [<ffffffff8145b631>] ip_dev_loopback_xmit+0x71/0xa0
|   [ 14.204095] [<ffffffff8145b892>] ip_mc_output+0x192/0x2c0
|   [ 14.204099] [<ffffffff8145d610>] ip_local_out+0x20/0x30
|   [ 14.204105] [<ffffffff8145d8ad>] ip_push_pending_frames+0x28d/0x3d0
|   [ 14.204119] [<ffffffff8147f1cc>] udp_push_pending_frames+0x14c/0x400
|   [ 14.204125] [<ffffffff814803fc>] udp_sendmsg+0x39c/0x790
|   [ 14.204137] [<ffffffff814891d5>] inet_sendmsg+0x45/0x80
|   [ 14.204149] [<ffffffff8140af91>] sock_sendmsg+0xf1/0x110
|   [ 14.204189] [<ffffffff8140dc6c>] sys_sendmsg+0x20c/0x380
|   [ 14.204233] [<ffffffff8100ad82>] system_call_fastpath+0x16/0x1b
|
| While the current linux-2.6 kernel doesn't emit this warning, the bug is latent and might cause unexpected failures.
|
| ip_dev_loopback_xmit() runs in process context with preemption enabled, so it must call netif_rx_ni() instead of netif_rx() to make sure that pending software interrupts are processed.
|
| Same change for ip6_dev_loopback_xmit().
|
| Reported-by: Eric Paris <eparis@redhat.com>
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h (Tejun Heo, 2010-03-30)
| percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h, which in turn includes gfp.h, making everything defined by the two files universally available and complicating inclusion dependencies.
|
| The percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion.
|
|   http://userweb.kernel.org/~tj/misc/slabh-sweep.py
|
| The script does the following.
|
| * Scan files for gfp and slab usages and update includes such that only the necessary includes are there, i.e. if only gfp is used, gfp.h; if slab is used, slab.h.
|
| * When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surroundings. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered (alphabetical, Christmas tree, rev-Xmas-tree) or at the end if there doesn't seem to be any matching order.
|
| * If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file.
|
| The conversion was done in the following steps.
|
| 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files.
|
| 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition, while adding it to the implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files.
|
| 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind.
|
| 4. Several build tests were done and a couple of problems were fixed, e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs, requiring slab.h to be added manually.
|
| 5. The script was run on all .h files but without automatically editing them, as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary.
|
| 6. percpu.h was updated not to include slab.h.
|
| 7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64, which failed due to missing writeq).
|
|    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
|    * powerpc and powerpc64 SMP allmodconfig
|    * sparc and sparc64 SMP allmodconfig
|    * ia64 SMP allmodconfig
|    * s390 SMP allmodconfig
|    * alpha SMP allmodconfig
|    * um on x86_64 SMP allmodconfig
|
| 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as a bisection point.
|
| Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers, which should be easily discoverable on most builds of the specific arch.
|
| Signed-off-by: Tejun Heo <tj@kernel.org>
| Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
| Cc: Ingo Molnar <mingo@redhat.com>
| Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* ip: fix mc_loop checks for tunnels with multicast outer addresses (Octavian Purdila, 2010-01-06)
| When we have L3 tunnels with different inner/outer families (i.e. IPv4/IPv6) which use a multicast address as the outer tunnel destination address, multicast packets will be looped back to the sending socket even if IP*_MULTICAST_LOOP is set to disabled.
|
| The mc_loop flag is present in the family-specific part of the socket (e.g. the IPv4 or IPv6 specific part). setsockopt sets the inner family mc_loop flag. When the packet is pushed through the L3 tunnel it will eventually be processed by the outer family, which, if different, will check the flag in a different part of the socket than the one where it was set.
|
| Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
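The mismatch described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `struct mock_sock`, `enum family` and both helper functions are hypothetical stand-ins invented for illustration, not the kernel's data structures or the code of the actual fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for per-family socket state. */
enum family { FAM_INET = 2, FAM_INET6 = 10 };

struct mock_sock {
    enum family family;   /* family the socket was created with */
    bool mc_loop_v4;      /* models the IPv4-specific mc_loop flag */
    bool mc_loop_v6;      /* models the IPv6-specific mc_loop flag */
};

/* Problematic pattern: the output path reads the flag of the packet's
 * (outer) family, which setsockopt() never touched. */
bool mc_loop_by_packet_family(const struct mock_sock *sk, enum family pkt)
{
    return pkt == FAM_INET ? sk->mc_loop_v4 : sk->mc_loop_v6;
}

/* Safer pattern: always consult the flag that belongs to the socket's
 * own family, whatever family the outer header has. */
bool mc_loop_by_socket_family(const struct mock_sock *sk)
{
    switch (sk->family) {
    case FAM_INET:
        return sk->mc_loop_v4;
    case FAM_INET6:
        return sk->mc_loop_v6;
    }
    return true; /* unknown family: keep the default of looping back */
}
```

With an IPv6 socket that disabled looping and an IPv4 outer header, the first helper still reports "loop back" while the second honours the socket's setting.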
* Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (David S. Miller, 2009-12-02)
|\
| | Conflicts:
| |     net/mac80211/ht.c
| * ip_fragment: also adjust skb->truesize for packets not owned by a socket (Patrick McHardy, 2009-12-01)
| | When a large packet gets reassembled by ip_defrag(), the head skb accounts for all the fragments in skb->truesize. If this packet is refragmented again, skb->truesize is not re-adjusted to reflect only the head size since it's not owned by a socket. If the head fragment then gets recycled and reused for another received fragment, it might exceed the defragmentation limits due to its large truesize value.
| |
| | skb_recycle_check() explicitly checks for linear skbs, so any recycled skb should reflect its true size in skb->truesize. Change ip_fragment() to also adjust the truesize value of skbs not owned by a socket.
| |
| | Reported-and-tested-by: Ben Menchaca <ben@bigfootnetworks.com>
| | Signed-off-by: Patrick McHardy <kaber@trash.net>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/ipv4: Move && and || to end of previous line (Joe Perches, 2009-11-23)
| | On Sun, 2009-11-22 at 16:31 -0800, David Miller wrote:
| | > It should be of the form:
| | >     if (x &&
| | >         y)
| | >
| | > or:
| | >     if (x && y)
| | >
| | > Fix patches, rather than complaints, for existing cases where things
| | > do not follow this pattern are certainly welcome.
| |
| | Also collapsed some multiple tabs to single space.
| |
| | Signed-off-by: Joe Perches <joe@perches.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: rename some inet_sock fields (Eric Dumazet, 2009-10-18)
|/
| In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch.
|
| The goal is to transfer fields used at lookup time into the first read-mostly cache line (inside struct sock_common) and move sk_refcnt to a separate cache line (only written by the rx path).
|
| This patch adds the inet_ prefix to the daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes.
|
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Use sk_mark for routing lookup in more places (Atis Elsts, 2009-10-01)
| This patch against v2.6.31 adds support for route lookup using sk_mark in some more places. The benefits from this patch are the following. First, the SO_MARK option now has effect on UDP sockets too. Second, ip_queue_xmit() and inet_sk_rebuild_header() could fail to do routing lookup correctly if TCP sockets with SO_MARK were used.
|
| Signed-off-by: Atis Elsts <atis@mikrotik.com>
| Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 (Linus Torvalds, 2009-09-14)
|\
| | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
| |   netxen: update copyright
| |   netxen: fix tx timeout recovery
| |   netxen: fix file firmware leak
| |   netxen: improve pci memory access
| |   netxen: change firmware write size
| |   tg3: Fix return ring size breakage
| |   netxen: build fix for INET=n
| |   cdc-phonet: autoconfigure Phonet address
| |   Phonet: back-end for autoconfigured addresses
| |   Phonet: fix netlink address dump error handling
| |   ipv6: Add IFA_F_DADFAILED flag
| |   net: Add DEVTYPE support for Ethernet based devices
| |   mv643xx_eth.c: remove unused txq_set_wrr()
| |   ucc_geth: Fix hangs after switching from full to half duplex
| |   ucc_geth: Rearrange some code to avoid forward declarations
| |   phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
| |   drivers/net/phy: introduce missing kfree
| |   drivers/net/wan: introduce missing kfree
| |   net: force bridge module(s) to be GPL
| |   Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
| |   ...
| |
| | Fixed up trivial conflicts:
| |
| |  - arch/x86/include/asm/socket.h
| |
| |    converted to <asm-generic/socket.h> in the x86 tree. The generic header has the same new #define's, so that works out fine.
| |
| |  - drivers/net/tun.c
| |
| |    fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that switched over to using 'tun->socket.sk' instead of the redundantly available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks to the TUN driver") which added a new 'tun->sk' use.
| |
| |    Noted in 'next' by Stephen Rothwell.
| * ip: Report qdisc packet drops (Eric Dumazet, 2009-09-02)
| | Christoph Lameter pointed out that packet drops at qdisc level were not accounted in SNMP counters. Only if the application sets IP_RECVERR are drops reported to the user (-ENOBUFS errors) and SNMP counters updated.
| |
| | IP_RECVERR is used to enable extended reliable error message passing, but this is not needed to update system-wide SNMP stats.
| |
| | This patch changes things a bit to allow SNMP counters to be updated, regardless of IP_RECVERR being set or not on the socket.
| |
| | Example after a UDP tx flood:
| |
| |   # netstat -s
| |   ...
| |   IP:
| |       1487048 outgoing packets dropped
| |   ...
| |   Udp:
| |   ...
| |       SndbufErrors: 1487048
| |
| | send() syscalls do, however, still return an OK status, to not break applications.
| |
| | Note: the send() manual page explicitly says for the -ENOBUFS error:
| |
| |   "The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient congestion. (Normally, this does not occur in Linux. Packets are just silently dropped when a device queue overflows.)"
| |
| | This is not true for IP_RECVERR enabled sockets: a send() syscall that hits a qdisc drop returns an ENOBUFS error.
| |
| | Many thanks to Christoph, David, and last but not least, Alexey!
| |
| | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
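The reporting rule in this message (always count the drop, surface -ENOBUFS only to IP_RECVERR sockets) can be sketched as a small userspace model. Everything named here is an assumption made for illustration: `MOCK_XMIT_DROP`, `struct mock_stats` and `report_xmit_result()` are hypothetical and do not reproduce the kernel's actual code paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MOCK_XMIT_DROP 1   /* hypothetical positive "dropped by qdisc" code */

/* Hypothetical stand-in for the system-wide SNMP counters. */
struct mock_stats {
    unsigned long out_discards;
};

/* Returns the value a send() caller would see for one transmit attempt. */
int report_xmit_result(int xmit_code, bool recverr, struct mock_stats *st)
{
    if (xmit_code <= 0)
        return xmit_code;           /* success (0) or a hard error (< 0) */

    st->out_discards++;             /* counted whether or not IP_RECVERR is set */
    return recverr ? -ENOBUFS : 0;  /* only IP_RECVERR sockets see the error */
}
```

The design point is that accounting and error delivery are decoupled: the counter moves on every drop, while the return value depends on the socket option.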
* | ipv4: make ip_append_data() handle NULL routing table (Julien TINNES, 2009-08-27)
|/
| Add a check in ip_append_data() for NULL *rtp to prevent future bugs in callers from being exploitable.
|
| Signed-off-by: Julien Tinnes <julien@cr0.org>
| Signed-off-by: Tavis Ormandy <taviso@sdf.lonestar.org>
| Acked-by: David S. Miller <davem@davemloft.net>
| Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* net: ip_push_pending_frames() fix (Eric Dumazet, 2009-07-11)
| After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) we do not take any more references on sk->sk_refcnt on outgoing packets.
|
| I forgot to delete two __sock_put() from ip_push_pending_frames() and ip6_push_pending_frames().
|
| Reported-by: Emil S Tantilov <emils.tantilov@gmail.com>
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Tested-by: Emil S Tantilov <emils.tantilov@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: No more expensive sock_hold()/sock_put() on each tx (Eric Dumazet, 2009-06-11)
| One of the problems with sock memory accounting is that it uses a pair of sock_hold()/sock_put() for each transmitted packet.
|
| This slows down bidirectional flows because the receive path also needs to take a refcount on the socket and might use a different cpu than the transmit path or transmit completion path. So these two atomic operations also trigger cache line bounces.
|
| We can see this in tx or tx/rx workloads (media gateways for example), where sock_wfree() can be in the top five functions in profiles.
|
| We use this sock_hold()/sock_put() so that sock freeing is delayed until all tx packets are completed.
|
| As we also update sk_wmem_alloc, we could offset sk_wmem_alloc by one unit at init time, until sk_free() is called. Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc) to decrement the initial offset and atomically check if any packets are in flight.
|
| skb_set_owner_w() doesn't call sock_hold() anymore.
|
| sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc reached 0 to perform the final freeing.
|
| The drawback is that a skb->truesize error could lead to unfreeable sockets, or even worse, prematurely calling __sk_free() on a live socket.
|
| Nice speedups on SMP. tbench for example goes from 2691 MB/s to 2711 MB/s on my 8 cpu dev machine, even if tbench was not really hitting the sk_refcnt contention point.
|
| 5 % speedup on a UDP transmit workload (depends on number of flows), lowering TX completion cpu usage.
|
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
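The "offset by one" scheme above can be illustrated with C11 atomics. This is a minimal userspace sketch, assuming a hypothetical `struct mock_sock` and `mock_*` helpers that only mirror the idea in the commit message; the `freed` flag stands in for the final free, and none of this is the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct mock_sock {
    atomic_int wmem_alloc;  /* bytes in flight, plus 1 while the owner holds the socket */
    bool freed;             /* set where the final free would run */
};

void mock_sock_init(struct mock_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);   /* the one-unit offset taken at init time */
    sk->freed = false;
}

/* Charge a packet to the socket: no separate reference count is taken. */
void mock_set_owner_w(struct mock_sock *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* TX completion: uncharge, and free if the counter just reached zero. */
void mock_wfree(struct mock_sock *sk, int truesize)
{
    /* atomic_fetch_sub returns the previous value */
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        sk->freed = true;
}

/* Owner drops the socket: remove the initial offset and test for zero. */
void mock_sk_free(struct mock_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        sk->freed = true;
}
```

Whichever of `mock_sk_free()` or the last `mock_wfree()` brings the counter to zero performs the free, so a single atomic counter covers both memory accounting and lifetime. The sketch also makes the stated drawback visible: an uncharge with the wrong `truesize` would either never reach zero or reach it too early.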
* ipv4: Use frag list abstraction interfaces. (David S. Miller, 2009-06-09)
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: skb->dst accessors (Eric Dumazet, 2009-06-03)
| Define three accessors to get/set the dst attached to a skb:
|
|   struct dst_entry *skb_dst(const struct sk_buff *skb)
|
|   void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
|
|   void skb_dst_drop(struct sk_buff *skb)
|
| This one should replace occurrences of:
|
|   dst_release(skb->dst)
|   skb->dst = NULL;
|
| Delete the skb->dst field.
|
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
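The three accessors named above can be sketched over mock types to show the intended call pattern. The `mock_` structures and the integer reference count are assumptions made for illustration; only the accessor names and the release-then-clear behaviour of the drop helper come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct dst_entry and struct sk_buff. */
struct mock_dst { int refcnt; };
struct mock_skb { struct mock_dst *_dst; };  /* field is private to the accessors */

void mock_dst_release(struct mock_dst *dst)
{
    if (dst)
        dst->refcnt--;
}

struct mock_dst *mock_skb_dst(const struct mock_skb *skb)
{
    return skb->_dst;
}

void mock_skb_dst_set(struct mock_skb *skb, struct mock_dst *dst)
{
    skb->_dst = dst;
}

/* Replaces the open-coded pair: release the dst, then clear the pointer. */
void mock_skb_dst_drop(struct mock_skb *skb)
{
    mock_dst_release(skb->_dst);
    skb->_dst = NULL;
}
```

Routing every access through helpers like these is what makes it possible to later change how the pointer is stored without touching the callers.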
* net: skb->rtable accessor (Eric Dumazet, 2009-06-03)
| Define the skb_rtable(const struct sk_buff *skb) accessor to get the rtable from a skb.
|
| Delete the skb->rtable field.
|
| Setting rtable is not allowed; just set dst instead, as rtable is an alias.
|
| Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* snmp: add missing counters for RFC 4293 (Neil Horman, 2009-04-27)
| The IP MIB (RFC 4293) defines stats for InOctets, OutOctets, InMcastOctets and OutMcastOctets:
|
|   http://tools.ietf.org/html/rfc4293
|
| But it seems we don't track those in any way that is easy to separate from other protocols. This patch adds those missing counters to the stats file. Tested successfully by me.
|
| With help from Eric Dumazet.
|
| Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* ip: support for TX timestamps on UDP and RAW sockets (Patrick Ohly, 2009-02-16)
| Instructions for time stamping outgoing packets are taken from the socket layer and later copied into the new skb.
|
| Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid a pair of dst_hold()/dst_release() in ip_push_pending_frames() (Eric Dumazet, 2008-11-24)
| We can reduce pressure on the dst entry refcount that slows down the UDP transmit path on SMP machines. This pressure is visible on RTP servers when delivering content to media gateways, especially big ones, handling thousands of streams.
|
| Several cpus send UDP frames to the same destination, hence use the same dst entry.
|
| This patch makes ip_push_pending_frames() steal the refcount its callers had to take when filling inet->cork.dst.
|
| This doesn't avoid all refcounting, but still gives speedups on SMP, on the UDP/RAW transmit path.
|
| Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: avoid a pair of dst_hold()/dst_release() in ip_append_data() (Eric Dumazet, 2008-11-24)
| We can reduce pressure on the dst entry refcount that slows down the UDP transmit path on SMP machines. This pressure is visible on RTP servers when delivering content to media gateways, especially big ones, handling thousands of streams.
|
| Several cpus send UDP frames to the same destination, hence use the same dst entry.
|
| This patch makes ip_append_data() eventually steal the refcount its callers had to take on the dst entry.
|
| This doesn't avoid all refcounting, but still gives speedups on SMP, on the UDP/RAW transmit path.
|
| Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c (Jianjun Kong, 2008-11-03)
| Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: Make Netfilter's ip_route_me_harder() non-local address compatible (KOVACS Krisztian, 2008-10-01)
| Netfilter's ip_route_me_harder() tries to re-route packets either generated or re-routed by Netfilter. This patch changes ip_route_me_harder() to handle packets from non-locally-bound sockets with IP_TRANSPARENT set as local and to set the appropriate flowi flags when re-doing the routing lookup.
|
| Signed-off-by: KOVACS Krisztian <hidden@sch.bme.hu>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: convert BUG_TRAP to generic WARN_ON (Ilpo Järvinen, 2008-07-26)
| Removes a legacy reinvent-the-wheel type thing. The generic machinery integrates much better with automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuitively, BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON(), though some might actually be promoted to BUG_ON(), but I left that to the future.
|
| I could make at least one BUILD_BUG_ON conversion.
|
| Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* mib: add net to IP_INC_STATS (Pavel Emelyanov, 2008-07-16)
| All the callers already have either the net itself, or the place where to get it from.
|
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* icmp: add struct net argument to icmp_out_count (Pavel Emelyanov, 2008-07-15)
| This routine deals with ICMP statistics, but doesn't have a struct net at hand, so add one.
|
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove CVS keywords (Adrian Bunk, 2008-06-12)
| This patch removes CVS keywords that weren't updated for a long time from comments.
|
| Signed-off-by: Adrian Bunk <bunk@kernel.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPv4] UFO: prevent generation of chained skb destined to UFO device (Kostya B, 2008-04-30)
| Problem: ip_append_data() could wrongly generate a chained skb for devices which support UFO. When sk_write_queue is not empty (e.g. MSG_MORE), __instead__ of appending data into the next nr_frag of the queued skb, a new chained skb is created.
|
| I would normally assume a UFO device should get data in nr_frags and not in frag_list. Later, udp4_hwcsum_outgoing() resets csum to NONE and skb_gso_segment() oopses.
|
| Proposal:
| 1. Even if length is less than mtu, employ ip_ufo_append_data() and append data to the __existing__ skb in the sk_write_queue.
|
| 2. ip_ufo_append_data() is fixed due to a wrong manipulation of peek-ing and later enqueue-ing of the same skb. Now, enqueuing is always performed, because on error the further ip_flush_pending_frames() would release the queued skb.
|
| Signed-off-by: Kostya B <bkostya@hotmail.com>
| Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. (YOSHIFUJI Hideaki, 2008-03-25)
| Introduce per-sock inlines: sock_net(), sock_net_set() and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set(). Without CONFIG_NET_NS, no namespace other than &init_net exists. Let's explicitly define them to help compiler optimizations.
|
| Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [IPV4,IPV6]: Share cork.rt between IPv4 and IPv6. (YOSHIFUJI Hideaki, 2008-03-24)
| Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* [NETNS]: Process IP layer in the context of the correct namespace. (Denis V. Lunev, 2008-03-24)
| Replace all the rest of the init_net with a proper net on the IP layer.
|
| Signed-off-by: Denis V. Lunev <den@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Add 'rtable' field in struct sk_buff to alias 'dst' and avoid casts (Eric Dumazet, 2008-03-05)
| (Anonymous) unions can help us to avoid ugly casts.
|
| A common cast is the (struct rtable *)skb->dst one.
|
| Defining a union like:
|
|   union {
|       struct dst_entry *dst;
|       struct rtable *rtable;
|   };
|
| permits using skb->rtable in its place.
|
| Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
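A self-contained sketch of the anonymous-union aliasing described above, using C11 syntax. The `mock_` types and the `rt_gateway` value are placeholders chosen for illustration; the sketch also assumes the dst entry is the first member of the route structure, which is what makes the two pointer views refer to the same object.

```c
#include <assert.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct mock_dst_entry { int obsolete; };

struct mock_rtable {
    struct mock_dst_entry dst;   /* first member: same address as the rtable */
    unsigned int rt_gateway;
};

struct mock_skb {
    union {                      /* anonymous union: two views of one pointer */
        struct mock_dst_entry *dst;
        struct mock_rtable    *rtable;
    };
};

/* Old style: explicit cast at every use site. */
unsigned int gateway_with_cast(const struct mock_skb *skb)
{
    return ((struct mock_rtable *)skb->dst)->rt_gateway;
}

/* New style: the union member already has the right type. */
unsigned int gateway_with_union(const struct mock_skb *skb)
{
    return skb->rtable->rt_gateway;
}
```

Both helpers read the same pointer; the union only moves the type conversion from each call site into the structure definition.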
* [NET]: Introducing socket mark socket option. (Laszlo Attila Toth, 2008-01-31)
| A userspace program may wish to set the mark for each packet it sends without using the netfilter MARK target. Changing the mark can be used for mark-based routing without netfilter or for packet filtering.
|
| It requires the CAP_NET_ADMIN capability.
|
| Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
| Acked-by: Patrick McHardy <kaber@trash.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Prevent out-of-sync truesize on ip_fragment slow path (Herbert Xu, 2008-01-31)
| When ip_fragment has to hit the slow path the value of skb->truesize may go out of sync because we would have updated it without changing the packet length. This violates the constraints on truesize.
|
| This patch postpones the update of skb->truesize to prevent this.
|
| Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace for ICMP replying code. (Denis V. Lunev, 2008-01-28)
| All needed API is done; the namespace is available when required from the device on the DST entry from the incoming packet. So, just replace init_net with the proper namespace.
|
| Other protocols will follow.
|
| Signed-off-by: Denis V. Lunev <den@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_route_output_key. (Denis V. Lunev, 2008-01-28)
| Needed to propagate it down to the ip_route_output_flow.
|
| Signed-off-by: Denis V. Lunev <den@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_route_output_flow. (Denis V. Lunev, 2008-01-28)
| Needed to propagate it down to the __ip_route_output_key.
|
| Signed-off-by: Denis V. Lunev <den@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Remove obsolete comment (Ilpo Järvinen, 2008-01-28)
| It seems that ip_build_xmit is no longer used here and ip_append_data is used instead.
|
| Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Introduce NF_INET_ hook values (Patrick McHardy, 2008-01-28)
| The IPv4 and IPv6 hook values are identical, yet some code tries to figure out the "correct" value by looking at the address family. Introduce NF_INET_* values for both IPv4 and IPv6. The old values are kept in a #ifndef __KERNEL__ section for userspace compatibility.
|
| Signed-off-by: Patrick McHardy <kaber@trash.net>
| Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Add ip_local_out (Herbert Xu, 2008-01-28)
| Most callers of the LOCAL_OUT chain will set the IP packet length and header checksum before doing so. They also share the same output function dst_output.
|
| This patch creates a new function called ip_local_out which does all of that and converts the appropriate users over to it.
|
| Apart from removing duplicate code, it will also help in merging the IPsec output path once the same thing is done for IPv6.
|
| Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| Signed-off-by: David S. Miller <davem@davemloft.net>
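The "set the packet length, then the header checksum" step that this commit factors out can be sketched in portable C. `finish_ipv4_header()` and `ipv4_header_checksum()` are hypothetical helper names that operate on a raw byte buffer; the checksum is the standard Internet one's-complement sum over 16-bit words, not a copy of the kernel routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over 16-bit big-endian words of the header. */
uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    while (sum >> 16)                     /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Shared "finish the header" step: total length first, then checksum. */
void finish_ipv4_header(uint8_t *hdr, size_t hdr_len, uint16_t total_len)
{
    hdr[2] = (uint8_t)(total_len >> 8);   /* bytes 2-3: total length */
    hdr[3] = (uint8_t)(total_len & 0xff);
    hdr[10] = 0;                          /* bytes 10-11: checksum, zeroed first */
    hdr[11] = 0;

    uint16_t csum = ipv4_header_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

Doing both updates in one helper means a caller cannot change the length and forget to refresh the checksum, which is the duplication the commit removes from its callers. Re-running the checksum over a finished header yields zero, a convenient self-check.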
* [INET]: Fix truesize setting in ip_append_data (Herbert Xu, 2008-01-23)
| As it is, ip_append_data only counts page fragments to the skb that allocated them. As such it means that the first skb gets hit with a 4K charge even though it might have only used a fraction of it, while all subsequent skbs that use the same page get away with no charge at all.
|
| This bug was exposed by the UDP accounting patch.
|
| [ The wmem_alloc bumping needs to be moved with the truesize, noticed by Takahiro Yasui. -DaveM ]
|
| Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Add missing skb->truesize increment in ip_append_page(). (David S. Miller, 2008-01-23)
| And as noted by Takahiro Yasui, we thus need to bump the sk->sk_wmem_alloc at this spot as well.
|
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Consolidate the ip cork destruction in ip_output.c (Pavel Emelyanov, 2007-11-07)
| ip_push_pending_frames and ip_flush_pending_frames do the same things to flush the sock's cork. Move this into a separate function and save ~80 bytes from the .text.
|
| Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Treat the sign of the result of skb_headroom() consistently (Chuck Lever, 2007-10-24)
| In some places, the result of skb_headroom() is compared to an unsigned integer, and in others, the result is compared to a signed integer. Make the comparisons consistent and correct.
|
| Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
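Why mixing signed and unsigned operands in such comparisons matters can be shown with a small sketch. `mock_headroom()` and both predicates are hypothetical, and the explicit cast only spells out the conversion C would apply implicitly; this illustrates the general hazard rather than the specific call sites the commit changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in that returns an unsigned headroom value. */
unsigned int mock_headroom(void)
{
    return 16;
}

/* Pitfall: a negative `needed` is converted to a huge unsigned value,
 * so the comparison claims there is not enough headroom. */
bool needs_realloc_mixed(int needed)
{
    return mock_headroom() < (unsigned int)needed;
}

/* Consistent form: handle "nothing needed" explicitly, then compare
 * two values that are both known to be non-negative. */
bool needs_realloc_consistent(int needed)
{
    if (needed <= 0)
        return false;
    return mock_headroom() < (unsigned int)needed;
}
```

For non-negative inputs both predicates agree; they differ only when the signed side goes negative, which is exactly the case an inconsistent comparison silently gets wrong.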
* [IPV4]: Uninline netfilter okfns (Patrick McHardy, 2007-10-15)
| Now that we don't pass double skb pointers to nf_hook_slow anymore, gcc can generate tail calls for some of the netfilter hook okfn invocations, so there is no need to inline the functions anymore. This caused huge code bloat since we ended up with one inlined version and one out-of-line version since we pass the address to nf_hook_slow.
|
| Before:
|      text    data     bss      dec     hex filename
|   8997385 1016524  524652 10538561  a0ce41 vmlinux
|
| After:
|      text    data     bss      dec     hex filename
|   8994009 1016524  524652 10535185  a0c111 vmlinux
|   -------------------------------------------------------
|     -3376
|
| All cases have been verified to generate tail-calls with and without netfilter. The okfns in ipmr and xfrm4_input still remain inline because gcc can't generate tail-calls for them.
|
| Signed-off-by: Patrick McHardy <kaber@trash.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Move hardware header operations out of netdevice. (Stephen Hemminger, 2007-10-10)
| Since hardware header operations are part of the protocol class, not the device instance, make them into a separate object and save memory.
|
| Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Add ICMPMsgStats MIB (RFC 4293) (David L Stevens, 2007-10-10)
| Background: RFC 4293 deprecates existing individual, named ICMP type counters to be replaced with the ICMPMsgStatsTable. This table includes entries for both IPv4 and IPv6, and requires counting of all ICMP types, whether or not the machine implements the type.
|
| These patches "remove" (but not really) the existing counters, and replace them with the ICMPMsgStats tables for v4 and v6. It includes the named counters in the /proc places they were, but gets the values for them from the new tables. It also counts packets generated from raw socket output (e.g., OutEchoes, MLD queries, RA's from radvd, etc).
|
| Changes:
| 1) create icmpmsg_statistics mib
| 2) create icmpv6msg_statistics mib
| 3) modify existing counters to use these
| 4) modify /proc/net/snmp to add "IcmpMsg" with all ICMP types listed by number for easy SNMP parsing
| 5) modify /proc/net/snmp printing for "Icmp" to get the named data from new counters.
|
| Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Clean up duplicate includes in net/ipv4/ (Jesper Juhl, 2007-08-14)
| This patch cleans up duplicate includes in net/ipv4/
|
| Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: x_tables: add TRACE target (Jozsef Kadlecsik, 2007-07-11)
| The TRACE target can be used to follow IP and IPv6 packets through the ruleset.
|
| Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
| Signed-off-by: Patrick McHardy <kaber@trash.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: IPV6 checksum offloading in network devices (Stephen Hemminger, 2007-07-11)
| The existing model for checksum offload does not correctly handle devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag implies the device can do any arbitrary protocol.
|
| This patch:
| * adds NETIF_F_IPV6_CSUM for those devices
| * fixes bnx2 and tg3 devices that need it
| * adds NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
| * fixes assumptions about NETIF_F_ALL_CSUM in nat
| * adjusts bridge union of checksumming computation
|
| Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack (Patrick McHardy, 2007-06-07)
| A time_wait socket inherits sk_bound_dev_if from the original socket, but it is not used when sending ACK packets using ip_send_reply.
|
| Fix by passing the oif to ip_send_reply in struct ip_reply_arg and use it for output routing.
|
| Signed-off-by: Patrick McHardy <kaber@trash.net>
| Signed-off-by: David S. Miller <davem@davemloft.net>