aboutsummaryrefslogtreecommitdiffstats
path: root/net/netlink
Commit message (Collapse)AuthorAge
* netlink: Adding inode field to /proc/net/netlinkMasatake YAMATO2010-02-28
| | | | | | | | | | | The Inode field in /proc/net/{tcp,udp,packet,raw,...} is useful to know the types of file descriptors associated to a process. Actually lsof utility uses the field. Unfortunately, unlike /proc/net/{tcp,udp,packet,raw,...}, /proc/net/netlink doesn't have the field. This patch adds the field to /proc/net/netlink. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2010-02-03
|\ | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * netlink: fix for too early rmmodAlexey Dobriyan2010-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Netlink code does module autoload if protocol userspace is asking for is not ready. However, module can dissapear right after it was autoloaded. Example: modprobe/rmmod stress-testing and xfrm_user.ko providing NETLINK_XFRM. netlink_create() in such situation _will_ create userspace socket and _will_not_ pin module. Now if module was removed and we're going to call ->netlink_rcv into nothing: BUG: unable to handle kernel paging request at ffffffffa02f842a ^^^^^^^^^^^^^^^^ modules are loaded near these addresses here IP: [<ffffffffa02f842a>] 0xffffffffa02f842a PGD 161f067 PUD 1623063 PMD baa12067 PTE 0 Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC last sysfs file: /sys/devices/pci0000:00/0000:00:1f.2/host0/target0:0:0/0:0:0:0/block/sda/uevent CPU 1 Pid: 11515, comm: ip Not tainted 2.6.33-rc5-netns-00594-gaaa5728-dirty #6 P5E/P5E RIP: 0010:[<ffffffffa02f842a>] [<ffffffffa02f842a>] 0xffffffffa02f842a RSP: 0018:ffff8800baa3db48 EFLAGS: 00010292 RAX: ffff8800baa3dfd8 RBX: ffff8800be353640 RCX: 0000000000000000 RDX: ffffffff81959380 RSI: ffff8800bab7f130 RDI: 0000000000000001 RBP: ffff8800baa3db58 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000011 R13: ffff8800be353640 R14: ffff8800bcdec240 R15: ffff8800bd488010 FS: 00007f93749656f0(0000) GS:ffff880002300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: ffffffffa02f842a CR3: 00000000ba82b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process ip (pid: 11515, threadinfo ffff8800baa3c000, task ffff8800bab7eb30) Stack: ffffffff813637c0 ffff8800bd488000 ffff8800baa3dba8 ffffffff8136397d <0> 0000000000000000 ffffffff81344adc 7fffffffffffffff 0000000000000000 <0> ffff8800baa3ded8 ffff8800be353640 ffff8800bcdec240 0000000000000000 Call Trace: [<ffffffff813637c0>] ? netlink_unicast+0x100/0x2d0 [<ffffffff8136397d>] netlink_unicast+0x2bd/0x2d0 netlink_unicast_kernel: nlk->netlink_rcv(skb); [<ffffffff81344adc>] ? memcpy_fromiovec+0x6c/0x90 [<ffffffff81364263>] netlink_sendmsg+0x1d3/0x2d0 [<ffffffff8133975b>] sock_sendmsg+0xbb/0xf0 [<ffffffff8106cdeb>] ? __lock_acquire+0x27b/0xa60 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0 [<ffffffff8106db22>] ? __lock_release+0x82/0x170 [<ffffffff810a190e>] ? might_fault+0xbe/0xd0 [<ffffffff810a18c3>] ? might_fault+0x73/0xd0 [<ffffffff81344c77>] ? verify_iovec+0x47/0xd0 [<ffffffff8133a509>] sys_sendmsg+0x1a9/0x360 [<ffffffff813c2be5>] ? _raw_spin_unlock_irqrestore+0x65/0x70 [<ffffffff8106aced>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff813c2bc2>] ? _raw_spin_unlock_irqrestore+0x42/0x70 [<ffffffff81197004>] ? __up_read+0x84/0xb0 [<ffffffff8106ac95>] ? trace_hardirqs_on_caller+0x145/0x190 [<ffffffff813c207f>] ? trace_hardirqs_on_thunk+0x3a/0x3f [<ffffffff8100262b>] system_call_fastpath+0x16/0x1b Code: Bad RIP value. RIP [<ffffffffa02f842a>] 0xffffffffa02f842a RSP <ffff8800baa3db48> CR2: ffffffffa02f842a If module was quickly removed after autoloading, return -E. Return -EPROTONOSUPPORT if module was quickly removed after autoloading. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | genetlink: optimize ctrl_dumpfamily()Samir Bellabes2010-01-13
|/ | | | | | | | | | there is a unnecessary test which can be replaced by a good initialization in the 'for' statement Noticed by Serge E. Hallyn <serue@us.ibm.com> Signed-off-by: Samir Bellabes <sam@synack.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use net_eq to compare netsOctavian Purdila2009-11-25
| | | | | | | | | | | | | | | | | | | | | | | Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: remove subscriptions check on notifierJohannes Berg2009-11-17
| | | | | | | | | | | | | | The netlink URELEASE notifier doesn't notify for sockets that have been used to receive multicast but it should be called for such sockets as well since they might _also_ be used for sending and not solely for receiving multicast. We will need that for nl80211 (generic netlink sockets) in the future. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: netlink_getname, packet_getname -- use DECLARE_SOCKADDR guardCyrill Gorcunov2009-11-10
| | | | | | | | Use guard DECLARE_SOCKADDR in a few more places which allow us to catch if the structure copied back is too big. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: pass kern to net_proto_family create functionEric Paris2009-11-06
| | | | | | | | | | | The generic __sock_create function has a kern argument which allows the security system to make decisions based on if a socket is being created by the kernel or by userspace. This patch passes that flag to the net_proto_family specific create function, so it can do the same thing. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: Optimize and one bug fix in genl_generate_id()Krishna Kumar2009-10-18
| | | | | | | | | | | | | | | | | | | 1. GENL_MIN_ID is a valid id -> no need to start at GENL_MIN_ID + 1. 2. Avoid going through the ids two times: If we start at GENL_MIN_ID+1 (*or bigger*) and all ids are over!, the code iterates through the list twice (*or lesser*). 3. Simplify code - no need to start at idx=0 which gets reset to GENL_MIN_ID. Patch on net-next-2.6. Reboot test shows that first id passed to genl_register_family was 16, next two were GENL_ID_GENERATE and genl_generate_id returned 17 & 18 (user level testing of same code shows expected values across entire range of MIN/MAX). Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: Optimize genl_register_family()Krishna Kumar2009-10-18
| | | | | | | | | | genl_register_family() doesn't need to call genl_family_find_byid when GENL_ID_GENERATE is passed during register. Patch on net-next-2.6, compile and reboot testing only. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: mark net_proto_ops as constStephen Hemminger2009-10-07
| | | | | | | All usages of structure net_proto_ops should be declared const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Make setsockopt() optlen be unsigned.David S. Miller2009-09-30
| | | | | | | | | | | | This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix nlmsg len size for skb when error bit is set.John Fastabend2009-09-26
| | | | | | | | | | | Currently, the nlmsg->len field is not set correctly in netlink_ack() for ack messages that include the nlmsg of the error frame. This corrects the length field passed to __nlmsg_put to use the correct payload size. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: fix netns vs. netlink table locking (2)Johannes Berg2009-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to commit d136f1bd366fdb7e747ca7e0218171e7a00a98a5, there's a bug when unregistering a generic netlink family, which is caught by the might_sleep() added in that commit: BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183 in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod 2 locks held by rmmod/1510: #0: (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120 Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444 Call Trace: [<ffffffff81044ff9>] __might_sleep+0x119/0x150 [<ffffffff81380501>] netlink_table_grab+0x21/0x100 [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60 [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120 [<ffffffff81382866>] genl_unregister_family+0x56/0x130 [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211] [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211] Fix in the same way by grabbing the netlink table lock before doing rcu_read_lock(). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* mm: replace various uses of num_physpages by totalram_pagesJan Beulich2009-09-22
| | | | | | | | | | | | | | | | | | | | | | | Sizing of memory allocations shouldn't depend on the number of physical pages found in a system, as that generally includes (perhaps a huge amount of) non-RAM pages. The amount of what actually is usable as storage should instead be used as a basis here. Some of the calculations (i.e. those not intending to use high memory) should likely even use (totalram_pages - totalhigh_pages). Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Dave Airlie <airlied@linux.ie> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* genetlink: fix netns vs. netlink table lockingJohannes Berg2009-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since my commits introducing netns awareness into genetlink we can get this problem: BUG: scheduling while atomic: modprobe/1178/0x00000002 2 locks held by modprobe/1178: #0: (genl_mutex){+.+.+.}, at: [<ffffffff8135ee1a>] genl_register_mc_grou #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8135eeb5>] genl_register_mc_g Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty # Call Trace: [<ffffffff8103e285>] __schedule_bug+0x85/0x90 [<ffffffff81403138>] schedule+0x108/0x588 [<ffffffff8135b131>] netlink_table_grab+0xa1/0xf0 [<ffffffff8135c3a7>] netlink_change_ngroups+0x47/0x100 [<ffffffff8135ef0f>] genl_register_mc_group+0x12f/0x290 because I overlooked that netlink_table_grab() will schedule, thinking it was just the rwlock. However, in the contention case, that isn't actually true. Fix this by letting the code grab the netlink table lock first and then the RCU for netns protection. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2009-09-10
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
| * netlink: constify nlmsghdr argumentsPatrick McHardy2009-08-25
| | | | | | | | | | | | | | | | Consitfy nlmsghdr arguments to a couple of functions as preparation for the next patch, which will constify the netlink message data in all nfnetlink users. Signed-off-by: Patrick McHardy <kaber@trash.net>
* | netlink: silence compiler warningBrian Haley2009-09-04
|/ | | | | | | | | | | | CC net/netlink/genetlink.o net/netlink/genetlink.c: In function ‘genl_register_mc_group’: net/netlink/genetlink.c:139: warning: ‘err’ may be used uninitialized in this function From following the code 'err' is initialized, but set it to zero to silence the warning. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/compat/wext: send different messages to compat tasksJohannes Berg2009-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wireless extensions have the unfortunate problem that events are multicast netlink messages, and are not independent of pointer size. Thus, currently 32-bit tasks on 64-bit platforms cannot properly receive events and fail with all kinds of strange problems, for instance wpa_supplicant never notices disassociations, due to the way the 64-bit event looks (to a 32-bit process), the fact that the address is all zeroes is lost, it thinks instead it is 00:00:00:00:01:00. The same problem existed with the ioctls, until David Miller fixed those some time ago in an heroic effort. A different problem caused by this is that we cannot send the ASSOCREQIE/ASSOCRESPIE events because sending them causes a 32-bit wpa_supplicant on a 64-bit system to overwrite its internal information, which is worse than it not getting the information at all -- so we currently resort to sending a custom string event that it then parses. This, however, has a severe size limitation we are frequently hitting with modern access points; this limitation would can be lifted after this patch by sending the correct binary, not custom, event. A similar problem apparently happens for some other netlink users on x86_64 with 32-bit tasks due to the alignment for 64-bit quantities. In order to fix these problems, I have implemented a way to send compat messages to tasks. When sending an event, we send the non-compat event data together with a compat event data in skb_shinfo(main_skb)->frag_list. Then, when the event is read from the socket, the netlink code makes sure to pass out only the skb that is compatible with the task. This approach was suggested by David Miller, my original approach required always sending two skbs but that had various small problems. To determine whether compat is needed or not, I have used the MSG_CMSG_COMPAT flag, and adjusted the call path for recv and recvfrom to include it, even if those calls do not have a cmsg parameter. I have not solved one small part of the problem, and I don't think it is necessary to: if a 32-bit application uses read() rather than any form of recvmsg() it will still get the wrong (64-bit) event. However, neither do applications actually do this, nor would it be a regression. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: make netns awareJohannes Berg2009-07-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes generic netlink network namespace aware. No generic netlink families except for the controller family are made namespace aware, they need to be checked one by one and then set the family->netnsok member to true. A new function genlmsg_multicast_netns() is introduced to allow sending a multicast message in a given namespace, for example when it applies to an object that lives in that namespace, a new function genlmsg_multicast_allns() to send a message to all network namespaces (for objects that do not have an associated netns). The function genlmsg_multicast() is changed to multicast the message in just init_net, which is currently correct for all generic netlink families since they only work in init_net right now. Some will later want to work in all net namespaces because they do not care about the netns at all -- those will have to be converted to use one of the new functions genlmsg_multicast_allns() or genlmsg_multicast_netns() whenever they are made netns aware in some way. After this patch families can easily decide whether or not they should be available in all net namespaces. Many genl families us it for objects not related to networking and should therefore be available in all namespaces, but that will have to be done on a per family basis. Note that this doesn't touch on the checkpoint/restart problem where network namespaces could be used, genl families and multicast groups are numbered globally and I see no easy way of changing that, especially since it must be possible to multicast to all network namespaces for those families that do not care about netns. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: use call_rcu for netlink_change_ngroupsJohannes Berg2009-07-12
| | | | | | | | | | | | | | | For the network namespace work in generic netlink I need to be able to call this function under rcu_read_lock(), otherwise the locking becomes a nightmare and more locks would be needed. Instead, just embed a struct rcu_head (actually a struct listeners_rcu_head that also carries the pointer to the memory block) into the listeners memory so we can use call_rcu() instead of synchronising and then freeing. No rcu_barrier() is needed since this code cannot be modular. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: remove unused exportsJohannes Berg2009-07-12
| | | | | | | | | I added those myself in commits b4ff4f04 and 84659eb5, but I see no reason now why they should be exported, only generic netlink uses them which cannot be modular. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: correct off-by-one write allocations reportsEric Dumazet2009-06-18
| | | | | | | | | | | | | commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. We need to take into account this offset when reporting sk_wmem_alloc to user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: Introduce genl_register_family_with_ops()Michał Mirosław2009-05-21
| | | | | | | | | This introduces genl_register_family_with_ops() that registers a genetlink family along with operations from a table. This is used to kill copy'n'paste occurrences in following patches. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of /home/davem/src/GIT/linux-2.6/David S. Miller2009-03-26
|\ | | | | | | | | Conflicts: drivers/net/wimax/i2400m/usb-notif.c
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds2009-03-26
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (29 commits) crypto: sha512-s390 - Add missing block size hwrng: timeriomem - Breaks an allyesconfig build on s390: nlattr: Fix build error with NET off crypto: testmgr - add zlib test crypto: zlib - New zlib crypto module, using pcomp crypto: testmgr - Add support for the pcomp interface crypto: compress - Add pcomp interface netlink: Move netlink attribute parsing support to lib crypto: Fix dead links hwrng: timeriomem - New driver crypto: chainiv - Use kcrypto_wq instead of keventd_wq crypto: cryptd - Per-CPU thread implementation based on kcrypto_wq crypto: api - Use dedicated workqueue for crypto subsystem crypto: testmgr - Test skciphers with no IVs crypto: aead - Avoid infinite loop when nivaead fails selftest crypto: skcipher - Avoid infinite loop when cipher fails selftest crypto: api - Fix crypto_alloc_tfm/create_create_tfm return convention crypto: api - crypto_alg_mod_lookup either tested or untested crypto: amcc - Add crypt4xx driver crypto: ansi_cprng - Add maintainer ...
| | * netlink: Move netlink attribute parsing support to libGeert Uytterhoeven2009-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Netlink attribute parsing may be used even if CONFIG_NET is not set. Move it from net/netlink to lib and control its inclusion based on the new config symbol CONFIG_NLATTR, which is selected by CONFIG_NET. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | netlink: add NETLINK_NO_ENOBUFS socket flagPablo Neira Ayuso2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the NETLINK_NO_ENOBUFS socket flag. This flag can be used by unicast and broadcast listeners to avoid receiving ENOBUFS errors. Generally speaking, ENOBUFS errors are useful to notify two things to the listener: a) You may increase the receiver buffer size via setsockopt(). b) You have lost messages, you may be out of sync. In some cases, ignoring ENOBUFS errors can be useful. For example: a) nfnetlink_queue: this subsystem does not have any sort of resync method and you can decide to ignore ENOBUFS once you have set a given buffer size. b) ctnetlink: you can use this together with the socket flag NETLINK_BROADCAST_SEND_ERROR to stop getting ENOBUFS errors as you do not need to resync (packets whose event are not delivered are drop to provide reliable logging and state-synchronization). Moreover, the use of NETLINK_NO_ENOBUFS also reduces a "go up, go down" effect in terms of performance which is due to the netlink congestion control when the listener cannot back off. The effect is the following: 1) throughput rate goes up and netlink messages are inserted in the receiver buffer. 2) Then, netlink buffer fills and overruns (set on nlk->state bit 0). 3) While the listener empties the receiver buffer, netlink keeps dropping messages. Thus, throughput goes dramatically down. 4) Then, once the listener has emptied the buffer (nlk->state bit 0 is set off), goto step 1. This effect is easy to trigger with netlink broadcast under heavy load, and it is more noticeable when using a big receiver buffer. You can find some results in [1] that show this problem. [1] http://1984.lsi.us.es/linux/netlink/ This patch also includes the use of sk_drop to account the number of netlink messages drop due to overrun. This value is shown in /proc/net/netlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'master' of ↵David S. Miller2009-03-24
|\ \ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
| * | | nefilter: nfnetlink: add nfnetlink_set_err and use it in ctnetlinkPablo Neira Ayuso2009-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds nfnetlink_set_err() to propagate the error to netlink broadcast listener in case of memory allocation errors in the message building. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | | Merge branch 'master' of ↵David S. Miller2009-03-05
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tokenring/tmspci.c drivers/net/ucc_geth_mii.c
| * | | netlink: invert error code in netlink_set_err()Pablo Neira Ayuso2009-03-04
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The callers of netlink_set_err() currently pass a negative value as parameter for the error code. However, sk->sk_err wants a positive error value. Without this patch, skb_recv_datagram() called by netlink_recvmsg() may return a positive value to report an error. Another choice to fix this is to change callers to pass a positive error value, but this seems a bit inconsistent and error prone to me. Indeed, the callers of netlink_set_err() assumed that the (usual) negative value for error codes was fine before this patch :). This patch also includes some documentation in docbook format for netlink_set_err() to avoid this sort of confusion. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netlink: remove some pointless conditionals before kfree_skb()Wei Yongjun2009-02-27
| | | | | | | | | | | | | | | | | | | | | Remove some pointless conditionals before kfree_skb(). Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netlink: change nlmsg_notify() return value logicPablo Neira Ayuso2009-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the return value of nlmsg_notify() as follows: If NETLINK_BROADCAST_ERROR is set by any of the listeners and an error in the delivery happened, return the broadcast error; else if there are no listeners apart from the socket that requested a change with the echo flag, return the result of the unicast notification. Thus, with this patch, the unicast notification is handled in the same way of a broadcast listener that has set the NETLINK_BROADCAST_ERROR socket flag. This patch is useful in case that the caller of nlmsg_notify() wants to know the result of the delivery of a netlink notification (including the broadcast delivery) and take any action in case that the delivery failed. For example, ctnetlink can drop packets if the event delivery failed to provide reliable logging and state-synchronization at the cost of dropping packets. This patch also modifies the rtnetlink code to ignore the return value of rtnl_notify() in all callers. The function rtnl_notify() (before this patch) returned the error of the unicast notification which makes rtnl_set_sk_err() reports errors to all listeners. This is not of any help since the origin of the change (the socket that requested the echoing) notices the ENOBUFS error if the notification fails and should resync itself. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netlink: add NETLINK_BROADCAST_ERROR socket optionPablo Neira Ayuso2009-02-20
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds NETLINK_BROADCAST_ERROR which is a netlink socket option that the listener can set to make netlink_broadcast() return errors in the delivery to the caller. This option is useful if the caller of netlink_broadcast() do something with the result of the message delivery, like in ctnetlink where it drops a network packet if the event delivery failed, this is used to enable reliable logging and state-synchronization. If this socket option is not set, netlink_broadcast() only reports ESRCH errors and silently ignore ENOBUFS errors, which is what most netlink_broadcast() callers should do. This socket option is based on a suggestion from Patrick McHardy. Patrick McHardy can exchange this patch for a beer from me ;). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netlink: change return-value logic of netlink_broadcast()Pablo Neira Ayuso2009-02-06
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, netlink_broadcast() reports errors to the caller if no messages at all were delivered: 1) If, at least, one message has been delivered correctly, returns 0. 2) Otherwise, if no messages at all were delivered due to skb_clone() failure, return -ENOBUFS. 3) Otherwise, if there are no listeners, return -ESRCH. With this patch, the caller knows if the delivery of any of the messages to the listeners have failed: 1) If it fails to deliver any message (for whatever reason), return -ENOBUFS. 2) Otherwise, if all messages were delivered OK, returns 0. 3) Otherwise, if no listeners, return -ESRCH. In the current ctnetlink code and in Netfilter in general, we can add reliable logging and connection tracking event delivery by dropping the packets whose events were not successfully delivered over Netlink. Of course, this option would be settable via /proc as this approach reduces performance (in terms of filtered connections per seconds by a stateful firewall) but providing reliable logging and event delivery (for conntrackd) in return. This patch also changes some clients of netlink_broadcast() that may report ENOBUFS errors via printk. This error handling is not of any help. Instead, the userspace daemons that are listening to those netlink messages should resync themselves with the kernel-side if they hit ENOBUFS. BTW, netlink_broadcast() clients include those that call cn_netlink_send(), nlmsg_multicast() and genlmsg_multicast() since they internally call netlink_broadcast() and return its error value. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: export genl_unregister_mc_group()Inaky Perez-Gonzalez2009-01-07
| | | | | | | | | | | | | | | | | | | | Add an EXPORT_SYMBOL() to genl_unregister_mc_group(), to allow unregistering groups on the run. EXPORT_SYMBOL_GPL() is not used as the rest of the functions exported by this module (eg: genl_register_mc_group) are also not _GPL(). Cleanup is currently done when unregistering a family, but there is no way to unregister a single multicast group due to that function not being exported. Seems to be a mistake as it is documented as for external consumption. This is needed by the WiMAX stack to be able to cleanup unused mc groups. Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* netlink: allow empty nested attributesPatrick McHardy2008-11-28
| | | | | | | | | | | | | validate_nla() currently doesn't allow empty nested attributes. This makes userspace code unnecessarily complicated when starting and ending the nested attribute is done by generic upper level code and the inner attributes are dumped by a module. Add a special case to accept empty nested attributes. When the nested attribute is non empty, the same checks as before are performed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Make sure BHs are disabled in sock_prot_inuse_add()Eric Dumazet2008-11-24
| | | | | | | | | There is still a call to sock_prot_inuse_add() in af_netlink while in a preemptable section. Add explicit BH disable around this call. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Make sure BHs are disabled in sock_prot_inuse_add()David S. Miller2008-11-23
| | | | | | | | | | The rule of calling sock_prot_inuse_add() is that BHs must be disabled. Some new calls were added where this was not true and this tiggers warnings as reported by Ilpo. Fix this by adding explicit BH disabling around those call sites. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: af_netlink should update its inuse counterEric Dumazet2008-11-23
| | | | | | | | | In order to have relevant information for NETLINK protocol, in /proc/net/protocols, we should use sock_prot_inuse_add() to update a (percpu and pernamespace) counter of inuse sockets. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: constify struct nlattr * arg to parsing functionsPatrick McHardy2008-10-28
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely)Johannes Berg2008-10-16
| | | | | | | | | | | Some code here depends on CONFIG_KMOD to not try to load protocol modules or similar, replace by CONFIG_MODULES where more than just request_module depends on CONFIG_KMOD and and also use try_then_request_module in ebtables. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Rationalise email address: Network Specific PartsAlan Cox2008-10-13
| | | | | | | | | | Clean up the various different email addresses of mine listed in the code to a single current and valid address. As Dave says his network merges for 2.6.28 are now done this seems a good point to send them in where they won't risk disrupting real changes. Signed-off-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: convert BUG_TRAP to generic WARN_ONIlpo Järvinen2008-07-26
| | | | | | | | | | | | | | Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2008-07-06
|\ | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wan/hdlc_fr.c drivers/net/wireless/iwlwifi/iwl-4965.c drivers/net/wireless/iwlwifi/iwl3945-base.c
| * netlink: Unneeded local variableWang Chen2008-07-01
| | | | | | | | | | | | | | We already have a variable, which has the same capability. Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2008-06-28
|\| | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl4965-base.c
| * netlink: Fix some doc comments in net/netlink/attr.cJulius Volz2008-06-27
| | | | | | | | | | | | | | | | Fix some doc comments to match function and attribute names in net/netlink/attr.c. Signed-off-by: Julius Volz <juliusv@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>