aboutsummaryrefslogtreecommitdiffstats
path: root/include
Commit message (Collapse)AuthorAge
* ipv6 mcast: Introduce include/net/mld.h for MLD definitions.YOSHIFUJI Hideaki2010-04-23
| | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
* remove DCB_PROTO_VERSION as we don't do netlink versioningScott Feldman2010-04-22
| | | | | | | remove DCB_PROTO_VERSION as we don't do netlink versioning Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* X25: Add if_x25.h and x25 to device identifiersAndrew Hendry2010-04-22
| | | | | | | | | | | | | | | | V2 Feedback from John Hughes. - Add header for userspace implementations such as xot/xoe to use - Use explicit values for interface stability - No changes to driver patches V1 - Use identifiers instead of magic numbers for X25 layer 3 to device interface. - Also fixed checkpatch notes on updated code. [ Add new user header to include/linux/Kbuild -DaveM ] Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* dst: rcu check refinementEric Dumazet2010-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __sk_dst_get() might be called from softirq, with socket lock held. [ 159.026180] include/net/sock.h:1200 invoked rcu_dereference_check() without protection! [ 159.026261] [ 159.026261] other info that might help us debug this: [ 159.026263] [ 159.026425] [ 159.026426] rcu_scheduler_active = 1, debug_locks = 0 [ 159.026552] 2 locks held by swapper/0: [ 159.026609] #0: (&icsk->icsk_retransmit_timer){+.-...}, at: [<ffffffff8104fc15>] run_timer_softirq+0x105/0x350 [ 159.026839] #1: (slock-AF_INET){+.-...}, at: [<ffffffff81392b8f>] tcp_write_timer+0x2f/0x1e0 [ 159.027063] [ 159.027064] stack backtrace: [ 159.027172] Pid: 0, comm: swapper Not tainted 2.6.34-rc5-03707-gde498c8-dirty #36 [ 159.027252] Call Trace: [ 159.027306] <IRQ> [<ffffffff810718ef>] lockdep_rcu_dereference +0xaf/0xc0 [ 159.027411] [<ffffffff8138e4f7>] tcp_current_mss+0xa7/0xb0 [ 159.027537] [<ffffffff8138fa49>] tcp_write_wakeup+0x89/0x190 [ 159.027600] [<ffffffff81391936>] tcp_send_probe0+0x16/0x100 [ 159.027726] [<ffffffff81392cd9>] tcp_write_timer+0x179/0x1e0 [ 159.027790] [<ffffffff8104fca1>] run_timer_softirq+0x191/0x350 [ 159.027980] [<ffffffff810477ed>] __do_softirq+0xcd/0x200 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Socket filter ancilliary data access for skb->dev->typePaul LeoNerd Evans2010-04-22
| | | | | | | | | | | | | | | | Add an SKF_AD_HATYPE field to the packet ancilliary data area, giving access to skb->dev->type, as reported in the sll_hatype field. When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all interfaces, there doesn't appear to be a way for the filter program to actually find out the underlying hardware type the packet was captured on. This patch adds such ability. This patch also handles the case where skb->dev can be NULL, such as on netlink sockets. Signed-off-by: Paul Evans <leonerd@leonerd.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: fix outsegs stat for TSO segmentsTom Herbert2010-04-22
| | | | | | | | | Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter. Without doing this, the counter can be off by orders of magnitude from the actual number of segments sent. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPv6: Generic TTL Security Mechanism (final version)Stephen Hemminger2010-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds IPv6 support for RFC5082 Generalized TTL Security Mechanism. Not to users of mapped address; the IPV6 and IPV4 socket options are seperate. The server does have to deal with both IPv4 and IPv6 socket options and the client has to handle the different for each family. On client: int ttl = 255; getaddrinfo(argv[1], argv[2], &hint, &result); for (rp = result; rp != NULL; rp = rp->ai_next) { s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol); if (s < 0) continue; if (rp->ai_family == AF_INET) { setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)); } else if (rp->ai_family == AF_INET6) { setsockopt(s, IPPROTO_IPV6, IPV6_UNICAST_HOPS, &ttl, sizeof(ttl))) } if (connect(s, rp->ai_addr, rp->ai_addrlen) == 0) { ... On server: int minttl = 255 - maxhops; getaddrinfo(NULL, port, &hints, &result); for (rp = result; rp != NULL; rp = rp->ai_next) { s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol); if (s < 0) continue; if (rp->ai_family == AF_INET6) setsockopt(s, IPPROTO_IPV6, IPV6_MINHOPCOUNT, &minttl, sizeof(minttl)); setsockopt(s, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl)); if (bind(s, rp->ai_addr, rp->ai_addrlen) == 0) break ... Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ks8842: Add platform data for setting mac addressRichard Röjfors2010-04-21
| | | | | | | | | | | | | This patch adds platform data to the ks8842 driver. Via the platform data a MAC address, to be used by the controller, can be passed. To ensure this MAC address is used, the MAC address is written after each hardware reset. Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* fasync: RCU and fine grained lockingEric Dumazet2010-04-21
| | | | | | | | | | | | | | | | | | | | | kill_fasync() uses a central rwlock, candidate for RCU conversion, to avoid cache line ping pongs on SMP. fasync_remove_entry() and fasync_add_entry() can disable IRQS on a short section instead during whole list scan. Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and fasync_{remove|add}_entry(). This spinlock is IRQ safe, so sock_fasync() doesnt need its own implementation and can use fasync_helper(), to reduce code size and complexity. We can remove __kill_fasync() direct use in net/socket.c, and rename it to kill_fasync_rcu(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-04-21
|\ | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-6000.c net/core/dev.c
| * Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2010-04-19
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Make RCU lockdep check the lockdep_recursion variable rcu: Update docs for rcu_access_pointer and rcu_dereference_protected rcu: Better explain the condition parameter of rcu_dereference_check() rcu: Add rcu_access_pointer and rcu_dereference_protected
| | * rcu: Make RCU lockdep check the lockdep_recursion variablePaul E. McKenney2010-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lockdep facility temporarily disables lockdep checking by incrementing the current->lockdep_recursion variable. Such disabling happens in NMIs and in other situations where lockdep might expect to recurse on itself. This patch therefore checks current->lockdep_recursion, disabling RCU lockdep splats when this variable is non-zero. In addition, this patch removes the "likely()", as suggested by Lai Jiangshan. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: David Miller <davem@davemloft.net> Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <20100415195039.GA22623@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * rcu: Better explain the condition parameter of rcu_dereference_check()David Howells2010-04-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Better explain the condition parameter of rcu_dereference_check() that describes the conditions under which the dereference is permitted to take place (and incorporate Yong Zhang's suggestion). This condition is only checked under lockdep proving. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: eric.dumazet@gmail.com LKML-Reference: <1270852752-25278-2-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * rcu: Add rcu_access_pointer and rcu_dereference_protectedPaul E. McKenney2010-04-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds variants of rcu_dereference() that handle situations where the RCU-protected data structure cannot change, perhaps due to our holding the update-side lock, or where the RCU-protected pointer is only to be fetched, not dereferenced. These are needed due to some performance concerns with using rcu_dereference() where it is not required, aside from the need for lockdep/sparse checking. The new rcu_access_pointer() primitive is for the case where the pointer is be fetch and not dereferenced. This primitive may be used without protection, RCU or otherwise, due to the fact that it uses ACCESS_ONCE(). The new rcu_dereference_protected() primitive is for the case where updates are prevented, for example, due to holding the update-side lock. This primitive does neither ACCESS_ONCE() nor smp_read_barrier_depends(), so can only be used when updates are somehow prevented. Suggested-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <1270852752-25278-1-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | Merge branch 'drm-linus' of ↵Linus Torvalds2010-04-19
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 * 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: drm/radeon/kms: add FireMV 2400 PCI ID. drm/radeon/kms: allow R500 regs VAP_ALT_NUM_VERTICES and VAP_INDEX_OFFSET drivers/gpu/radeon: Add MSPOS regs to safe list. drm/radeon/kms: disable the tv encoder when tv/cv is not in use drm/radeon/kms: adjust pll settings for tv drm/radeon/kms: fix tv dac conflict resolver drm/radeon/kms/evergreen: don't enable hdmi audio stuff drm/radeon/kms/atom: fix dual-link DVI on DCE3.2/4.0 drm/radeon/kms: fix rs600 tlb flush drm/radeon/kms: print GPU family and device id when loading drm/radeon/kms: fix calculation of mipmapped 3D texture sizes drm/radeon/kms: only change mode when coherent value changes. drm/radeon/kms: more atom parser fixes (v2)
| | * | drm/radeon/kms: add FireMV 2400 PCI ID.Dave Airlie2010-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an M24/X600 chip. From RH# 581927 cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | Merge branch 'for-linus' of ↵Linus Torvalds2010-04-15
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: firewire: cdev: change license of exported header files to MIT license firewire: cdev: comment fixlet firewire: cdev: iso packet documentation firewire: cdev: fix information leak firewire: cdev: require quadlet-aligned headers for transmit packets firewire: cdev: disallow receive packets without header
| | * | | firewire: cdev: change license of exported header files to MIT licenseStefan Richter2010-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Among else, this allows projects like libdc1394 to carry copies of the ABI related header files without them or distributors having to worry about effects on the project's overall license terms. Switch to MIT license as suggested by Kristian. Also update the year in the copyright statement according to source history. Cc: Jay Fenlason <fenlason@redhat.com> Acked-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>
| | * | | firewire: cdev: comment fixletStefan Richter2010-04-10
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| | * | | firewire: cdev: iso packet documentationClemens Ladisch2010-04-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the missing documentation for iso packets. Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
| * | | | Merge branch 'for-linus' of ↵Linus Torvalds2010-04-15
| |\ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: wacom - switch mode upon system resume Revert "Input: wacom - merge out and in prox events" Input: matrix_keypad - allow platform to disable key autorepeat Input: ALPS - add signature for HP Pavilion dm3 laptops Input: i8042 - spelling fix Input: sparse-keymap - implement safer freeing of the keymap Input: update the status of the Multitouch X driver project Input: clarify the no-finger event in multitouch protocol Input: bcm5974 - retract efi-broken suspend_resume Input: sparse-keymap - free the right keymap on error
| | * | | Input: matrix_keypad - allow platform to disable key autorepeatH Hartley Sweeten2010-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In an embedded system the matrix_keypad driver might be used to interface with an external control panel and not an actual keyboard. On the control panel some of the keys could be used to turn on/off various functions. If key autorepeat is enabled this causes the function to quickly toggle between the on and off states and makes operation difficult. Add an option in the platform-specific data to disable the key autorepeat. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * | | | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6Linus Torvalds2010-04-13
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFSv4: fix delegated locking NFS: Ensure that the WRITE and COMMIT RPC calls are always uninterruptible NFS: Fix a race with the new commit code NFS: Ensure that writeback_single_inode() calls write_inode() when syncing NFS: Fix the mode calculation in nfs_find_open_context NFSv4: Fall back to ordinary lookup if nfs4_atomic_open() returns EISDIR
| | * | | | NFSv4: fix delegated lockingTrond Myklebust2010-04-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arnaud Giersch reports that NFSv4 locking is broken when we hold a delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4: Don't allow posix locking against servers that don't support it). According to Arnaud, the lock succeeds the first time he opens the file (since we cannot do a delegated open) but then fails after we start using delegated opens. The following patch fixes it by ensuring that locking behaviour is governed by a per-filesystem capability flag that is initially set, but gets cleared if the server ever returns an OPEN without the NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set. Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
* | | | | | net: Remove two unnecessary exports (skbuff).Rami Rosen2010-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to export skb_under_panic() and skb_over_panic() in skbuff.c, since these methods are used only in skbuff.c ; this patch removes these two exports. It also marks these functions as 'static' and removeS the extern declarations of them from include/linux/skbuff.h Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: sk_sleep() helperEric Dumazet2010-04-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | rps: cleanupsEric Dumazet2010-04-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct softnet_data holds many queues, so consistent use "sd" name instead of "queue" is better. Adds a rps_ipi_queued() helper to cleanup enqueue_to_backlog() Adds a _and_irq_disable suffix to net_rps_action() name, as David suggested. incr_input_queue_head() becomes input_queue_head_incr() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | rps: shortcut net_rps_action()Eric Dumazet2010-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | net_rps_action() is a bit expensive on NR_CPUS=64..4096 kernels, even if RPS is not active. Tom Herbert used two bitmasks to hold information needed to send IPI, but a single LIFO list seems more appropriate. Move all RPS logic into net_rps_action() to cleanup net_rx_action() code (remove two ifdefs) Move rps_remote_softirq_cpus into softnet_data to share its first cache line, filling an existing hole. In a future patch, we could call net_rps_action() from process_backlog() to make sure we send IPI before handling this cpu backlog. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | rfs: Receive Flow SteeringTom Herbert2010-04-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements receive flow steering (RFS). RFS steers received packets for layer 3 and 4 processing to the CPU where the application for the corresponding flow is running. RFS is an extension of Receive Packet Steering (RPS). The basic idea of RFS is that when an application calls recvmsg (or sendmsg) the application's running CPU is stored in a hash table that is indexed by the connection's rxhash which is stored in the socket structure. The rxhash is passed in skb's received on the connection from netif_receive_skb. For each received packet, the associated rxhash is used to look up the CPU in the hash table, if a valid CPU is set then the packet is steered to that CPU using the RPS mechanisms. The convolution of the simple approach is that it would potentially allow OOO packets. If threads are thrashing around CPUs or multiple threads are trying to read from the same sockets, a quickly changing CPU value in the hash table could cause rampant OOO packets-- we consider this a non-starter. To avoid OOO packets, this solution implements two types of hash tables: rps_sock_flow_table and rps_dev_flow_table. rps_sock_table is a global hash table. Each entry is just a CPU number and it is populated in recvmsg and sendmsg as described above. This table contains the "desired" CPUs for flows. rps_dev_flow_table is specific to each device queue. Each entry contains a CPU and a tail queue counter. The CPU is the "current" CPU for a matching flow. The tail queue counter holds the value of a tail queue counter for the associated CPU's backlog queue at the time of last enqueue for a flow matching the entry. Each backlog queue has a queue head counter which is incremented on dequeue, and so a queue tail counter is computed as queue head count + queue length. When a packet is enqueued on a backlog queue, the current value of the queue tail counter is saved in the hash entry of the rps_dev_flow_table. And now the trick: when selecting the CPU for RPS (get_rps_cpu) the rps_sock_flow table and the rps_dev_flow table for the RX queue are consulted. When the desired CPU for the flow (found in the rps_sock_flow table) does not match the current CPU (found in the rps_dev_flow table), the current CPU is changed to the desired CPU if one of the following is true: - The current CPU is unset (equal to RPS_NO_CPU) - Current CPU is offline - The current CPU's queue head counter >= queue tail counter in the rps_dev_flow table. This checks if the queue tail has advanced beyond the last packet that was enqueued using this table entry. This guarantees that all packets queued using this entry have been dequeued, thus preserving in order delivery. Making each queue have its own rps_dev_flow table has two advantages: 1) the tail queue counters will be written on each receive, so keeping the table local to interrupting CPU s good for locality. 2) this allows lockless access to the table-- the CPU number and queue tail counter need to be accessed together under mutual exclusion from netif_receive_skb, we assume that this is only called from device napi_poll which is non-reentrant. This patch implements RFS for TCP and connected UDP sockets. It should be usable for other flow oriented protocols. There are two configuration parameters for RFS. The "rps_flow_entries" kernel init parameter sets the number of entries in the rps_sock_flow_table, the per rxqueue sysfs entry "rps_flow_cnt" contains the number of entries in the rps_dev_flow table for the rxqueue. Both are rounded to power of two. The obvious benefit of RFS (over just RPS) is that it achieves CPU locality between the receive processing for a flow and the applications processing; this can result in increased performance (higher pps, lower latency). The benefits of RFS are dependent on cache hierarchy, application load, and other factors. On simple benchmarks, we don't necessarily see improvement and sometimes see degradation. However, for more complex benchmarks and for applications where cache pressure is much higher this technique seems to perform very well. Below are some benchmark results which show the potential benfit of this patch. The netperf test has 500 instances of netperf TCP_RR test with 1 byte req. and resp. The RPC test is an request/response test similar in structure to netperf RR test ith 100 threads on each host, but does more work in userspace that netperf. e1000e on 8 core Intel No RFS or RPS 104K tps at 30% CPU No RFS (best RPS config): 290K tps at 63% CPU RFS 303K tps at 61% CPU RPC test tps CPU% 50/90/99% usec latency Latency StdDev No RFS/RPS 103K 48% 757/900/3185 4472.35 RPS only: 174K 73% 415/993/2468 491.66 RFS 223K 73% 379/651/1382 315.61 Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | net: replace ipfragok with skb->local_dfShan Wei2010-04-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Herbert Xu said: we should be able to simply replace ipfragok with skb->local_df. commit f88037(sctp: Drop ipfargok in sctp_xmit function) has droped ipfragok and set local_df value properly. The patch kills the ipfragok parameter of .queue_xmit(). Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Merge branch 'for-davem' of ↵David S. Miller2010-04-15
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
| * \ \ \ \ \ Merge branch 'master' of ↵John W. Linville2010-04-15
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wireless/ath/ath5k/phy.c drivers/net/wireless/wl12xx/wl1271_main.c
| | * | | | | | mac80211: Moved mesh action codes to a more visible locationJavier Cardona2010-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Grouped mesh action codes together with the other action codes in ieee80211.h. Signed-off-by: Javier Cardona <javier@cozybit.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | | | | Merge branch 'master' of ↵John W. Linville2010-04-08
| | |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 into merge Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wireless/ath/ath5k/phy.c drivers/net/wireless/iwlwifi/iwl-4965.c drivers/net/wireless/iwlwifi/iwl-agn.c drivers/net/wireless/iwlwifi/iwl-core.c drivers/net/wireless/iwlwifi/iwl-core.h drivers/net/wireless/iwlwifi/iwl-tx.c
| | * | | | | | | mac80211: clean up/fix aggregation codeJohannes Berg2010-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The aggregation code has a number of quirks, like inventing an unneeded WLAN_BACK_TIMER value and leaking memory under certain circumstances during station destruction. Fix these issues by using the regular aggregation session teardown code and blocking new aggregation sessions, all before the station is really destructed. As a side effect, this gets rid of the long code block to destroy aggregation safely. Additionally, rename tid_state_rx which can only have the values IDLE and OPERATIONAL to tid_active_rx to make it easier to understand that there is no bitwise stuff going on on the RX side -- the TX side remains because it needs to keep track of the driver and peer states. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | | | | | cfg80211: Add local-state-change-only auth/deauth/disassocJouni Malinen2010-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cfg80211 is quite strict on allowing authentication and association commands only in certain states. In order to meet these requirements, user space applications may need to clear authentication or association state in some cases. Currently, this can be done with deauth/disassoc command, but that ends up sending out Deauthentication or Disassociation frame unnecessarily. Add a new nl80211 attribute to allow this sending of the frame be skipped, but with all other deauth/disassoc operations being completed. Similar state change is also needed for IEEE 802.11r FT protocol in the FT-over-DS case which does not use Authentication frame exchange in a transition to another BSS. For this to work with cfg80211, an authentication entry needs to be created for the target BSS without sending out an Authentication frame. The nl80211 authentication command can be used for this purpose, too, with the new attribute to indicate that the command is only for changing local state. This enables wpa_supplicant to complete FT-over-DS transition successfully. Signed-off-by: Jouni Malinen <j@w1.fi> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | | | | | libertas/sdio: 8686: set ECSI bit for 1-bit transfersDaniel Mack2010-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When operating in 1-bit mode, SDAT1 is used as dedicated interrupt line. However, the 8686 will only drive this line when the ECSI bit is set in the CCCR_IF register. Thanks to Alagu Sankar for pointing me in the right direction. Signed-off-by: Daniel Mack <daniel@caiaq.de> Cc: Alagu Sankar <alagusankar@embwise.com> Cc: Volker Ernst <volker.ernst@txtr.com> Cc: Dan Williams <dcbw@redhat.com> Cc: John W. Linville <linville@tuxdriver.com> Cc: Holger Schurig <hs4233@mail.mn-solutions.de> Cc: Bing Zhao <bzhao@marvell.com> Cc: libertas-dev@lists.infradead.org Cc: linux-wireless@vger.kernel.org Cc: linux-mmc@vger.kernel.org Acked-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | | | | | include/net/iw_handler.h: Use SIOCIWFIRST not SIOCSIWCOMMIT in commentJoe Perches2010-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to match use in IW_IOCTL_IDX macro Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | | | | | mac80211: explicitly disable/enable QoSStanislaw Gruszka2010-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add interface to disable/enable QoS (aka WMM or WME). Currently drivers enable it explicitly when ->conf_tx method is called, and newer disable. Disabling is needed for some APs, which do not support QoS, such we should send QoS frames to them. Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | | | | | | mac80211: support paged rx SKBsZhu Yi2010-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mac80211 drivers can now pass paged SKBs to mac80211 via ieee80211_rx{_irqsafe}. The implementation currently use skb_linearize() in a few places i.e. management frame handling, software decryption, defragmentation and A-MSDU process. We will optimize them one by one later. Signed-off-by: Zhu Yi <yi.zhu@intel.com> Cc: Kalle Valo <kalle.valo@iki.fi> Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | | | | | | | | net: CONFIG_SMP should be CONFIG_RPSChangli Gao2010-04-15
|/ / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | stmmac: new descriptor field for the driver's platformGiuseppe CAVALLARO2010-04-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new enh_desc is used for selecting the enhanced descriptors structure. There are several scenarios; some chips (mac10/100 or gmac) want to use the enhanced descriptors; others want the normal ones. For example, on ST platforms: MAC10/100 uses the normal desc structure and the GMAC uses the enhanced one. It can be useful to get this information from the platform. This could also be decided at run-time looking at the chip's ID number; but it could happen that chips with the same ID want to use different descriptor structure. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv4: ipmr: support multiple tablesPatrick McHardy2010-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for multiple independant multicast routing instances, named "tables". Userspace multicast routing daemons can bind to a specific table instance by issuing a setsockopt call using a new option MRT_TABLE. The table number is stored in the raw socket data and affects all following ipmr setsockopt(), getsockopt() and ioctl() calls. By default, a single table (RT_TABLE_DEFAULT) is created with a default routing rule pointing to it. Newly created pimreg devices have the table number appended ("pimregX"), with the exception of devices created in the default table, which are named just "pimreg" for compatibility reasons. Packets are directed to a specific table instance using routing rules, similar to how regular routing rules work. Currently iif, oif and mark are supported as keys, source and destination addresses could be supported additionally. Example usage: - bind pimd/xorp/... to a specific table: uint32_t table = 123; setsockopt(fd, IPPROTO_IP, MRT_TABLE, &table, sizeof(table)); - create routing rules directing packets to the new table: # ip mrule add iif eth0 lookup 123 # ip mrule add oif eth0 lookup 123 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv4: ipmr: move mroute data into seperate structurePatrick McHardy2010-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv4: ipmr: convert struct mfc_cache to struct list_headPatrick McHardy2010-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv4: ipmr: remove net pointer from struct mfc_cachePatrick McHardy2010-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that cache entries in unres_queue don't need to be distinguished by their network namespace pointer anymore, we can remove it from struct mfc_cache add pass the namespace as function argument to the functions that need it. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv4: ipmr: move unres_queue and timer to per-namespace dataPatrick McHardy2010-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The unres_queue is currently shared between all namespaces. Following patches will additionally allow to create multiple multicast routing tables in each namespace. Having a single shared queue for all these users seems to excessive, move the queue and the cleanup timer to the per-namespace data to unshare it. As a side-effect, this fixes a bug in the seq file iteration functions: the first entry returned is always from the current namespace, entries returned after that may belong to any namespace. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | ipv4: raw: move struct raw_sock and raw_sk() to include/net/raw.hPatrick McHardy2010-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A following patch will use struct raw_sock to store state for ipmr, so having the definitions in icmp.h doesn't fit very well anymore. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: fib_rules: decouple address families from real address familiesPatrick McHardy2010-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Decouple the address family values used for fib_rules from the real address families in socket.h. This allows to use fib_rules for code that is not a real address family without increasing AF_MAX/NPROTO. Values up to 127 are reserved for real address families and map directly to the corresponding AF value, values starting from 128 are for other uses. rtnetlink is changed to invoke the AF_UNSPEC dumpit/doit handlers for these families. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | net: fib_rules: consolidate IPv4 and DECnet ->default_pref() functions.Patrick McHardy2010-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both functions are equivalent, consolidate them since a following patch needs a third implementation for multicast routing. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>