aboutsummaryrefslogtreecommitdiffstats
path: root/net/packet
Commit message (Collapse)AuthorAge
* net: Use skb_checksum_start_offset()Michał Mirosław2010-12-16
| | | | | | | | | Replace skb->csum_start - skb_headroom(skb) with skb_checksum_start_offset(). Note for usb/smsc95xx: skb->data - skb->head == skb_headroom(skb). Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* af_packet: use swap() instead of the open coded macro XC()Changli Gao2010-12-10
| | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* af_packet: fix freeing pg_vec twice on error pathChangli Gao2010-12-08
| | | | | | | | | | | | It is introduced in: commit 0e3125c755445664f00ad036e4fc2cd32fd52877 Author: Neil Horman <nhorman@tuxdriver.com> Date: Tue Nov 16 10:26:47 2010 -0800 packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* af_packet: eliminate pgv_to_page on some archesChangli Gao2010-12-08
| | | | | | | | Some arches don't need flush_dcache_page(), and don't implement it, so we can eliminate pgv_to_page() calls on those arches. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* filter: constify sk_run_filter()Eric Dumazet2010-12-08
| | | | | | | | | | sk_run_filter() doesnt write on skb, change its prototype to reflect this. Fix two af_packet comments. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* af_packet: remove pgv.flagsChangli Gao2010-12-06
| | | | | | | | As we can check if an address is vmalloc address with is_vmalloc_addr(), we remove pgv.flags. Then we may get more pg_vecs. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* af_packet: use vmalloc_to_page() instead for the addresss returned by vmalloc()Changli Gao2010-12-06
| | | | | | | | | | | | | | | | | | | | | | | The following commit causes the pgv->buffer may point to the memory returned by vmalloc(). And we can't use virt_to_page() for the vmalloc address. This patch introduces a new inline function pgv_to_page(), which calls vmalloc_to_page() for the vmalloc address, and virt_to_page() for the __get_free_pages address. We used to increase page pointer to get the next page at the next page address, after Neil's patch, it is wrong, as the physical address may be not continuous. This patch also fixes this issue. commit 0e3125c755445664f00ad036e4fc2cd32fd52877 Author: Neil Horman <nhorman@tuxdriver.com> Date: Tue Nov 16 10:26:47 2010 -0800 packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* packet: use vzalloc()Eric Dumazet2010-11-21
| | | | | | | | | | alloc_one_pg_vec_page() is supposed to return zeroed memory, so use vzalloc() instead of vmalloc() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* filter: optimize sk_run_filterEric Dumazet2010-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove pc variable to avoid arithmetic to compute fentry at each filter instruction. Jumps directly manipulate fentry pointer. As the last instruction of filter[] is guaranteed to be a RETURN, and all jumps are before the last instruction, we dont need to check filter bounds (number of instructions in filter array) at each iteration, so we remove it from sk_run_filter() params. On x86_32 remove f_k var introduced in commit 57fe93b374a6b871 (filter: make sure filters dont read uninitialized memory) Note : We could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to avoid too many ifdefs in this code. This helps compiler to use cpu registers to hold fentry and A accumulator. On x86_32, this saves 401 bytes, and more important, sk_run_filter() runs much faster because less register pressure (One less conditional branch per BPF instruction) # size net/core/filter.o net/core/filter_pre.o text data bss dec hex filename 2948 0 0 2948 b84 net/core/filter.o 3349 0 0 3349 d15 net/core/filter_pre.o on x86_64 : # size net/core/filter.o net/core/filter_pre.o text data bss dec hex filename 5173 0 0 5173 1435 net/core/filter.o 5224 0 0 5224 1468 net/core/filter_pre.o Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* packet: Enhance AF_PACKET implementation to not require high order ↵Neil Horman2010-11-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | contiguous memory allocation (v4) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Version 4 of this patch. Change notes: 1) Removed extra memset. Didn't think kcalloc added a GFP_ZERO the way kzalloc did :) Summary: It was shown to me recently that systems under high load were driven very deep into swap when tcpdump was run. The reason this happened was because the AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space application to specify how many entries an AF_PACKET socket will have and how large each entry will be. It seems the default setting for tcpdump is to set the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5 allocation. Thats difficult under good circumstances, and horrid under memory pressure. I thought it would be good to make that a bit more usable. I was going to do a simple conversion of the ring buffer from contigous pages to iovecs, but unfortunately, the metadata which AF_PACKET places in these buffers can easily span a page boundary, and given that these buffers get mapped into user space, and the data layout doesn't easily allow for a change to padding between frames to avoid that, a simple iovec change is just going to break user space ABI consistency. So I've done this, I've added a three tiered mechanism to the af_packet set_ring socket option. It attempts to allocate memory in the following order: 1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without digging into swap 2) Using vmalloc 3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as needed to get the memory The effect is that we don't disturb the system as much when we're under load, while still being able to conduct tcpdumps effectively. Tested successfully by me. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com> Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Fix header size check for GSO case in recvmsg (af_packet)Mariusz Kozlowski2010-11-12
| | | | | | | Parameter 'len' is size_t type so it will never get negative. Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: packet: fix information leak to userlandVasiliy Kulikov2010-11-10
| | | | | | | | | | | | | packet_getname_spkt() doesn't initialize all members of sa_data field of sockaddr struct if strlen(dev->name) < 13. This structure is then copied to userland. It leads to leaking of contents of kernel stack memory. We have to fully fill sa_data with strncpy() instead of strlcpy(). The same with packet_getname(): it doesn't initialize sll_pkttype field of sockaddr_ll. Set it to zero. Signed-off-by: Vasiliy Kulikov <segooon@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: simplify flags for tx timestampingOliver Hartkopp2010-08-19
| | | | | | | | | | | | | This patch removes the abstraction introduced by the union skb_shared_tx in the shared skb data. The access of the different union elements at several places led to some confusion about accessing the shared tx_flags e.g. in skb_orphan_try(). http://marc.info/?l=linux-netdev&m=128084897415886&w=2 Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* packet_mmap: expose hw packet timestamps to network packet capture utilitiesScott McMillan2010-06-02
| | | | | | | | | | | | | | | | | | | This patch adds a setting, PACKET_TIMESTAMP, to specify the packet timestamp source that is exported to capture utilities like tcpdump by packet_mmap. PACKET_TIMESTAMP accepts the same integer bit field as SO_TIMESTAMPING. However, only the SOF_TIMESTAMPING_SYS_HARDWARE and SOF_TIMESTAMPING_RAW_HARDWARE values are currently recognized by PACKET_TIMESTAMP. SOF_TIMESTAMPING_SYS_HARDWARE takes precedence over SOF_TIMESTAMPING_RAW_HARDWARE if both bits are set. If PACKET_TIMESTAMP is not set, a software timestamp generated inside the networking stack is used (the behavior before this setting was added). Signed-off-by: Scott McMillan <scott.a.mcmillan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-04-21
|\ | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-6000.c net/core/dev.c
| * packet : remove init_net restrictionDaniel Lezcano2010-04-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The af_packet protocol is used by Perl to do ioctls as reported by Stephane Riviere: "Net::RawIP relies on SIOCGIFADDR et SIOCGIFHWADDR to get the IP and MAC addresses of the network interface." But in a new network namespace these ioctl fail because it is disabled for a namespace different from the init_net_ns. These two lines should not be there as af_inet and af_packet are namespace aware since a long time now. I suppose we forget to remove these lines because we sent the af_packet first, before af_inet was supported. Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Reported-by: Stephane Riviere <stephane.riviere@regis-dgac.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | packet: support for TX time stamps on RAW socketsRichard Cochran2010-04-13
| | | | | | | | | | | | | | | | | | | | | | Enable the SO_TIMESTAMPING socket infrastructure for raw packet sockets. We introduce PACKET_TX_TIMESTAMP for the control message cmsg_type. Similar support for UDP and CAN sockets was added in commit 51f31cabe3ce5345b51e4a4f82138b38c4d5dc91 Signed-off-by: Richard Cochran <richard.cochran@omicron.at> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2010-04-11
|\| | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/stmmac/stmmac_main.c drivers/net/wireless/wl12xx/wl1271_cmd.c drivers/net/wireless/wl12xx/wl1271_main.c drivers/net/wireless/wl12xx/wl1271_spi.c net/core/ethtool.c net/mac80211/scan.c
| * include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* | net: convert multicast list to list_headJiri Pirko2010-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | Converts the list and the core manipulating with it to be the same as uc_list. +uses two functions for adding/removing mc address (normal and "global" variant) instead of a function parameter. +removes dev_mcast.c completely. +exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for manipulation with lists on a sandbox (used in bonding and 80211 drivers) Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: move address list functions to a separate fileJiri Pirko2010-04-03
|/ | | | | | | +little renaming of unicast functions to be smooth with multicast ones Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* af_packet: move strict addr_len check right before dev_[mc/unicast]_[add/del]Jiri Pirko2010-03-03
| | | | | | | | | | | | My previous patch 914c8ad2d18b62ad1420f518c0cab0b0b90ab308 incorrectly changed the length check in packet_mc_add to be more strict. The problem is that userspace is not filling this field (and it stays zeroed) in case of setting PACKET_MR_PROMISC or PACKET_MR_ALLMULTI. So move the strict check to the point in path where the addr_len must be set correctly. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reported-by: Pavel Roskin <proski@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller2010-02-28
|\ | | | | | | | | Conflicts: drivers/firmware/iscsi_ibft.c
| * net: Add checking to rcu_dereference() primitivesPaul E. McKenney2010-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update rcu_dereference() primitives to use new lockdep-based checking. The rcu_dereference() in __in6_dev_get() may be protected either by rcu_read_lock() or RTNL, per Eric Dumazet. The rcu_dereference() in __sk_free() is protected by the fact that it is never reached if an update could change it. Check for this by using rcu_dereference_check() to verify that the struct sock's ->sk_wmem_alloc counter is zero. Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com LKML-Reference: <1266887105-1528-5-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | af_packet: do not accept mc address smaller then dev->addr_len in ↵Jiri Pirko2010-02-26
| | | | | | | | | | | | | | | | | | | | packet_mc_add() There is no point of accepting an address of smaller length than dev->addr_len here. Therefore change this for stonger check. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | packet: convert socket list to RCU (v3)stephen hemminger2010-02-22
| | | | | | | | | | | | | | | | | | | | | | Convert AF_PACKET to use RCU, eliminating one more reader/writer lock. There is no need for a real sk_del_node_init_rcu(), because sk_del_node_init is doing the equivalent thing to hlst_del_init_rcu already; but added some comments to try and make that obvious. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: packet: use seq_hlist_foo() helpersLi Zefan2010-02-10
| | | | | | | | | | | | | | Simplify seq_file code. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | packet: Kill CONFIG_PACKET_MMAP.David S. Miller2010-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | Early on this was an experimental facility that few people other than Alexey Kuznetsov played with. Now it's a pretty fundamental thing and as people add more features to AF_PACKET sockets this config options creates ifdef spaghetti. So kill it off. Signed-off-by: David S. Miller <davem@davemloft.net>
* | packet: Add GSO/csum offload support.Sridhar Samudrala2010-02-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds GSO/checksum offload to af_packet sockets using virtio_net_hdr. Based on Rusty's patch to add this support to tun. It allows GSO/checksum offload to be enabled when using raw socket backend with virtio_net. Adds PACKET_VNET_HDR socket option to prepend virtio_net_hdr in the receive path and process/skip virtio_net_hdr in the send path. This option is only allowed with SOCK_RAW sockets attached to ethernet type devices. v2 updates ---------- Michael's Comments - Perform length check in packet_snd() when GSO is off even when vnet_hdr is present. - Check for SKB_GSO_FCOE type and return -EINVAL - don't allow tx/rx ring when vnet_hdr is enabled. Herbert's Comments - Removed ethernet specific code. - protocol value is assumed to be passed in by the caller. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2010-01-23
|\| | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * af_packet: Don't use skb after dev_queue_xmit()Jarek Poplawski2010-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | tpacket_snd() can change and kfree an skb after dev_queue_xmit(), which is illegal. With debugging by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Michael Breuer <mbreuer@majjas.com> With help from: David S. Miller <davem@davemloft.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Tested-by: Michael Breuer<mbreuer@majjas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: spread __net_init, __net_exitAlexey Dobriyan2010-01-17
|/ | | | | | | | | | | __net_init/__net_exit are apparently not going away, so use them to full extent. In some cases __net_init was removed, because it was called from __net_exit code. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* packet: dont call sleeping functions while holding rcu_read_lock()Eric Dumazet2009-12-16
| | | | | | | | | | | | | | | commit 654d1f8a019dfa06d (packet: less dev_put() calls) introduced a problem, calling potentially sleeping functions from a rcu_read_lock() protected section. Fix this by releasing lock before the sock_wmalloc()/memcpy_fromiovec() calls. After skb allocation and copy from user space, we redo device lookup and appropriate tests. Reported-and-tested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Move && and || to end of previous lineJoe Perches2009-11-29
| | | | | | | | | | | Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use net_eq to compare netsOctavian Purdila2009-11-25
| | | | | | | | | | | | | | | | | | | | | | | Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: netlink_getname, packet_getname -- use DECLARE_SOCKADDR guardCyrill Gorcunov2009-11-10
| | | | | | | | Use guard DECLARE_SOCKADDR in a few more places which allow us to catch if the structure copied back is too big. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: pass kern to net_proto_family create functionEric Paris2009-11-06
| | | | | | | | | | | The generic __sock_create function has a kern argument which allows the security system to make decisions based on if a socket is being created by the kernel or by userspace. This patch passes that flag to the net_proto_family specific create function, so it can do the same thing. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* packet: less dev_put() callsEric Dumazet2009-11-02
| | | | | | | | | | | | | - packet_sendmsg_spkt() can use dev_get_by_name_rcu() to avoid touching device refcount. - packet_getname_spkt() & packet_getname() can use dev_get_by_index_rcu() to avoid touching device refcount too. tpacket_snd() & packet_snd() can not use RCU yet because they can sleep when allocating skb. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2009-10-30
|\ | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * net: Fix 'Re: PACKET_TX_RING: packet size is too long'Gabor Gombas2009-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently PACKET_TX_RING forces certain amount of every frame to remain unused. This probably originates from an early version of the PACKET_TX_RING patch that in fact used the extra space when the (since removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current code does not make any use of this extra space. This patch removes the extra space reservation and lets userspace make use of the full frame size. Signed-off-by: Gabor Gombas <gombasg@sztaki.hu> Signed-off-by: David S. Miller <davem@davemloft.net>
* | vlan: allow null VLAN ID to be usedEric Dumazet2009-10-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently use a 16 bit field (vlan_tci) to store VLAN ID/PRIO on a skb. Null value is used as a special value, meaning vlan tagging not enabled. This forbids use of null vlan ID. As pointed by David, some drivers use the 3 high order bits (PRIO) As VLAN ID is 12 bits, we can use the remaining bit (CFI) as a flag, and allow null VLAN ID. In case future code really wants to use VLAN_CFI_MASK, we'll have to use a bit outside of vlan_tci. #define VLAN_PRIO_MASK 0xe000 /* Priority Code Point */ #define VLAN_PRIO_SHIFT 13 #define VLAN_CFI_MASK 0x1000 /* Canonical Format Indicator */ #define VLAN_TAG_PRESENT VLAN_CFI_MASK #define VLAN_VID_MASK 0x0fff /* VLAN Identifier */ Reported-by: Gertjan Hofman <gertjan_hofman@yahoo.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | af_packet: mc_drop/flush_mclist changesEric Dumazet2009-10-20
| | | | | | | | | | | | | | We hold RTNL, we can use __dev_get_by_index() instead of dev_get_by_index() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | af_packet: Avoid cache line dirtyingEric Dumazet2009-10-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While doing multiple captures, I found af_packet was dirtying cache line containing its prot_hook. This slow down machines where several cpus are necessary to handle capture traffic, as each prot_hook is traversed for each packet coming in or out the host. This patches moves "struct packet_type prot_hook" to the end of packet_sock, and uses a ____cacheline_aligned_in_smp to make sure this remains shared by all cpus. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Generalize socket rx gap / receive queue overflow cmsgNeil Horman2009-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a new socket level option to report number of queue overflows Recently I augmented the AF_PACKET protocol to report the number of frames lost on the socket receive queue between any two enqueued frames. This value was exported via a SOL_PACKET level cmsg. AFter I completed that work it was requested that this feature be generalized so that any datagram oriented socket could make use of this option. As such I've created this patch, It creates a new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue overflowed between any two given frames. It also augments the AF_PACKET protocol to take advantage of this new feature (as it previously did not touch sk->sk_drops, which this patch uses to record the overflow count). Tested successfully by me. Notes: 1) Unlike my previous patch, this patch simply records the sk_drops value, which is not a number of drops between packets, but rather a total number of drops. Deltas must be computed in user space. 2) While this patch currently works with datagram oriented protocols, it will also be accepted by non-datagram oriented protocols. I'm not sure if thats agreeable to everyone, but my argument in favor of doing so is that, for those protocols which aren't applicable to this option, sk_drops will always be zero, and reporting no drops on a receive queue that isn't used for those non-participating protocols seems reasonable to me. This also saves us having to code in a per-protocol opt in mechanism. 3) This applies cleanly to net-next assuming that commit 977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Revert "af_packet: add interframe drop cmsg (v6)"David S. Miller2009-10-12
| | | | | | | | | | | | | | | | This reverts commit 977750076d98c7ff6cbda51858bb5a5894a9d9ab. Neil is reimplementing this generically, outside of AF_PACKET. Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: mark net_proto_ops as constStephen Hemminger2009-10-07
| | | | | | | | | | | | | | All usages of structure net_proto_ops should be declared const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Use sk_mark for routing lookup in more placesEric Dumazet2009-10-07
| | | | | | | | | | | | | | | | | | | | | | | | Here is a followup on this area, thanks. [RFC] af_packet: fill skb->mark at xmit skb->mark may be used by classifiers, so fill it in case user set a SO_MARK option on socket. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | af_packet: add interframe drop cmsg (v6)Neil Horman2009-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add Ancilliary data to better represent loss information I've had a few requests recently to provide more detail regarding frame loss during an AF_PACKET packet capture session. Specifically the requestors want to see where in a packet sequence frames were lost, i.e. they want to see that 40 frames were lost between frames 302 and 303 in a packet capture file. In order to do this we need: 1) The kernel to export this data to user space 2) The applications to make use of it This patch addresses item (1). It does this by doing the following: A) Anytime we drop a frame for which we would increment po->stats.tp_drops, we also no increment a stats called po->stats.tp_gap. B) Every time we successfully enqueue a frame to sk_receive_queue, we record the value of po->stats.tp_gap in skb->mark. skb->cb would nominally be the place to record this, but since all the space there is used up, we're overloading skb->mark. Its safe to do since any enqueued packet is guaranteed to be unshared at this point, and skb->mark isn't used for anything else in the rx path to the application. After we record tp_gap in the skb, we zero po->stats.tp_gap. This allows us to keep a counter of the number of frames lost between any two enqueued packets C) When the application goes to dequeue a frame from the packet socket, we look at skb->mark for that frame. If it is non-zero, we add a cmsg chunk to the msghdr of level SOL_PACKET and type PACKET_GAPDATA. Its a 32 bit integer that represents the number of frames lost between this packet and the last previous frame received. Note there is a chance that if there is frame loss after a receive, and then the socket is closed, some gap data might be lost. This is covered by the use of the PACKET_AUXDATA socket option, which gives total loss data. With a bit of math, the final gap can be determined that way. I've tested this patch myself, and it works well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> include/linux/if_packet.h | 2 ++ net/packet/af_packet.c | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-09-30
|\| | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: ax25: Fix possible oops in ax25_make_new net: restore tx timestamping for accelerated vlans Phonet: fix mutex imbalance sit: fix off-by-one in ipip6_tunnel_get_prl net: Fix sock_wfree() race net: Make setsockopt() optlen be unsigned.
| * net: Make setsockopt() optlen be unsigned.David S. Miller2009-09-30
| | | | | | | | | | | | | | | | | | | | | | | | This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>