aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/netlink.h
Commit message (Collapse)AuthorAge
* net: Make userland include of netlink.h more sane.David S. Miller2011-08-08
| | | | | | | | | | | | | | | | | | | | | | | | Currently userland will barf when including linux/netlink.h unless it precisely includes sys/socket.h first. The issue is where the definition of "sa_family_t" comes from. We've been back and forth on how to fix this issue in the past, see: http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621 http://thread.gmane.org/gmane.linux.network/143380 Ben Hutchings suggested we take a hint from how we handle the sockaddr_storage type. First we define a "__kernel_sa_family_t" to linux/socket.h that is always defined. Then if __KERNEL__ is defined, we also define "sa_family_t" as equal to "__kernel_sa_family_t". Then in places like linux/netlink.h we use __kernel_sa_family_t in user visible datastructures. Reported-by: Michel Machado <michel@digirati.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵John W. Linville2011-06-24
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: drivers/net/wireless/rtlwifi/pci.c include/linux/netlink.h
| * netlink: advertise incomplete dumpsJohannes Berg2011-06-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the following situation: * a dump that would show 8 entries, four in the first round, and four in the second * between the first and second rounds, 6 entries are removed * now the second round will not show any entry, and even if there is a sequence/generation counter the application will not know To solve this problem, add a new flag NLM_F_DUMP_INTR to the netlink header that indicates the dump wasn't consistent, this flag can also be set on the MSG_DONE message that terminates the dump, and as such above situation can be detected. To achieve this, add a sequence counter to the netlink callback struct. Of course, netlink code still needs to use this new functionality. The correct way to do that is to always set cb->seq when a dumpit callback is invoked and call nl_dump_check_consistent() for each new message. The core code will also call this function for the final MSG_DONE message. To make it usable with generic netlink, a new function genlmsg_nlhdr() is needed to obtain the netlink header from the genetlink user header. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | rtnetlink: Compute and store minimum ifinfo dump sizeGreg Rose2011-06-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The message size allocated for rtnl ifinfo dumps was limited to a single page. This is not enough for additional interface info available with devices that support SR-IOV and caused a bug in which VF info would not be displayed if more than approximately 40 VFs were created per interface. Implement a new function pointer for the rtnl_register service that will calculate the amount of data required for the ifinfo dump and allocate enough data to satisfy the request. Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | RDMA: Add netlink infrastructureRoland Dreier2011-05-20
|/ | | | | | | | | | | | Add basic RDMA netlink infrastructure that allows for registration of RDMA clients for which data is to be exported and supplies message construction callbacks. Signed-off-by: Nir Muchtar <nirm@voltaire.com> [ Reorganize a few things, add CONFIG_NET dependency. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
* netlink: kill eff_cap from struct netlink_skb_parmsPatrick McHardy2011-03-03
| | | | | | | | | | | | Netlink message processing in the kernel is synchronous these days, capabilities can be checked directly in security_netlink_recv() from the current process. Signed-off-by: Patrick McHardy <kaber@trash.net> Reviewed-by: James Morris <jmorris@namei.org> [chrisw: update to include pohmelfs and uvesafb] Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parmsPatrick McHardy2011-03-03
| | | | | | | | Netlink message processing in the kernel is synchronous these days, the session information can be collected when needed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: fix gcc -Wconversion compilation warningDmitry V. Levin2010-12-17
| | | | | | | | | | | | $ cat << EOF | gcc -Wconversion -xc -S -o/dev/null - unsigned f(void) {return NLMSG_HDRLEN;} EOF <stdin>: In function 'f': <stdin>:3:26: warning: negative integer implicitly converted to unsigned type Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Move "struct net" declaration inside the __KERNEL__ macro guardOllie Wild2010-09-22
| | | | | | | | | | | | This patch reduces namespace pollution by moving the "struct net" declaration out of the userspace-facing portion of linux/netlink.h. It has no impact on the kernel. (This came up because we have several C++ applications which use "net" as a namespace name.) Signed-off-by: Ollie Wild <aaw@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: Implment netlink_broadcast_filteredEric W. Biederman2010-05-21
| | | | | | | | | | | | | | | | | When netlink sockets are used to convey data that is in a namespace we need a way to select a subset of the listening sockets to deliver the packet to. For the network namespace we have been doing this by only transmitting packets in the correct network namespace. For data belonging to other namespaces netlink_bradcast_filtered provides a mechanism that allows us to examine the destination socket and to decide if we should transmit the specified packet to it. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()Pablo Neira Ayuso2010-03-20
| | | | | | | | | | | | Currently, ENOBUFS errors are reported to the socket via netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However, that should not happen. This fixes this problem and it changes the prototype of netlink_set_err() to return the number of sockets that have set the NETLINK_RECV_NO_ENOBUFS socket option. This return value is used in the next patch in these bugfix series. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cleanup include/linuxEric Dumazet2009-11-04
| | | | | | | | | | | | | | | This cleanup patch puts struct/union/enum opening braces, in first line to ease grep games. struct something { becomes : struct something { Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: fix netns vs. netlink table locking (2)Johannes Berg2009-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to commit d136f1bd366fdb7e747ca7e0218171e7a00a98a5, there's a bug when unregistering a generic netlink family, which is caught by the might_sleep() added in that commit: BUG: sleeping function called from invalid context at net/netlink/af_netlink.c:183 in_atomic(): 1, irqs_disabled(): 0, pid: 1510, name: rmmod 2 locks held by rmmod/1510: #0: (genl_mutex){+.+.+.}, at: [<ffffffff8138283b>] genl_unregister_family+0x2b/0x130 #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8138270c>] __genl_unregister_mc_group+0x1c/0x120 Pid: 1510, comm: rmmod Not tainted 2.6.31-wl #444 Call Trace: [<ffffffff81044ff9>] __might_sleep+0x119/0x150 [<ffffffff81380501>] netlink_table_grab+0x21/0x100 [<ffffffff813813a3>] netlink_clear_multicast_users+0x23/0x60 [<ffffffff81382761>] __genl_unregister_mc_group+0x71/0x120 [<ffffffff81382866>] genl_unregister_family+0x56/0x130 [<ffffffffa0007d85>] nl80211_exit+0x15/0x20 [cfg80211] [<ffffffffa000005a>] cfg80211_exit+0x1a/0x40 [cfg80211] Fix in the same way by grabbing the netlink table lock before doing rcu_read_lock(). Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: fix netns vs. netlink table lockingJohannes Berg2009-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since my commits introducing netns awareness into genetlink we can get this problem: BUG: scheduling while atomic: modprobe/1178/0x00000002 2 locks held by modprobe/1178: #0: (genl_mutex){+.+.+.}, at: [<ffffffff8135ee1a>] genl_register_mc_grou #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8135eeb5>] genl_register_mc_g Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty # Call Trace: [<ffffffff8103e285>] __schedule_bug+0x85/0x90 [<ffffffff81403138>] schedule+0x108/0x588 [<ffffffff8135b131>] netlink_table_grab+0xa1/0xf0 [<ffffffff8135c3a7>] netlink_change_ngroups+0x47/0x100 [<ffffffff8135ef0f>] genl_register_mc_group+0x12f/0x290 because I overlooked that netlink_table_grab() will schedule, thinking it was just the rwlock. However, in the contention case, that isn't actually true. Fix this by letting the code grab the netlink table lock first and then the RCU for netns protection. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: constify nlmsghdr argumentsPatrick McHardy2009-08-25
| | | | | | | | Consitfy nlmsghdr arguments to a couple of functions as preparation for the next patch, which will constify the netlink message data in all nfnetlink users. Signed-off-by: Patrick McHardy <kaber@trash.net>
* netlink: add NETLINK_NO_ENOBUFS socket flagPablo Neira Ayuso2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the NETLINK_NO_ENOBUFS socket flag. This flag can be used by unicast and broadcast listeners to avoid receiving ENOBUFS errors. Generally speaking, ENOBUFS errors are useful to notify two things to the listener: a) You may increase the receiver buffer size via setsockopt(). b) You have lost messages, you may be out of sync. In some cases, ignoring ENOBUFS errors can be useful. For example: a) nfnetlink_queue: this subsystem does not have any sort of resync method and you can decide to ignore ENOBUFS once you have set a given buffer size. b) ctnetlink: you can use this together with the socket flag NETLINK_BROADCAST_SEND_ERROR to stop getting ENOBUFS errors as you do not need to resync (packets whose event are not delivered are drop to provide reliable logging and state-synchronization). Moreover, the use of NETLINK_NO_ENOBUFS also reduces a "go up, go down" effect in terms of performance which is due to the netlink congestion control when the listener cannot back off. The effect is the following: 1) throughput rate goes up and netlink messages are inserted in the receiver buffer. 2) Then, netlink buffer fills and overruns (set on nlk->state bit 0). 3) While the listener empties the receiver buffer, netlink keeps dropping messages. Thus, throughput goes dramatically down. 4) Then, once the listener has emptied the buffer (nlk->state bit 0 is set off), goto step 1. This effect is easy to trigger with netlink broadcast under heavy load, and it is more noticeable when using a big receiver buffer. You can find some results in [1] that show this problem. [1] http://1984.lsi.us.es/linux/netlink/ This patch also includes the use of sk_drop to account the number of netlink messages drop due to overrun. This value is shown in /proc/net/netlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: add NETLINK_BROADCAST_ERROR socket optionPablo Neira Ayuso2009-02-20
| | | | | | | | | | | | | | | | | | | | This patch adds NETLINK_BROADCAST_ERROR which is a netlink socket option that the listener can set to make netlink_broadcast() return errors in the delivery to the caller. This option is useful if the caller of netlink_broadcast() do something with the result of the message delivery, like in ctnetlink where it drops a network packet if the event delivery failed, this is used to enable reliable logging and state-synchronization. If this socket option is not set, netlink_broadcast() only reports ESRCH errors and silently ignore ENOBUFS errors, which is what most netlink_broadcast() callers should do. This socket option is based on a suggestion from Patrick McHardy. Patrick McHardy can exchange this patch for a beer from me ;). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: avoid memset of 0 bytes sparse warningPatrick McHardy2008-11-20
| | | | | | | | | | | A netlink attribute padding of zero triggers this sparse warning: include/linux/netlink.h:245:8: warning: memset with byte count of 0 Avoid the memset when the size parameter is constant and requires no padding. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* netlink: Remove nonblock parameter from netlink_attachskbDenis V. Lunev2008-06-05
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Audit: collect sessionid in netlink messagesEric Paris2008-04-28
| | | | | | | | | | Previously I added sessionid output to all audit messages where it was available but we still didn't know the sessionid of the sender of netlink messages. This patch adds that information to netlink messages so we can audit who sent netlink messages. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [IPV4] fib_trie: rescan if key is lost during dumpStephen Hemminger2008-01-31
| | | | | | | | | | Normally during a dump the key of the last dumped entry is used for continuation, but since lock is dropped it might be lost. In that case fallback to the old counter based N^2 behaviour. This means the dump will end up skipping some routes which matches what FIB_HASH does. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Consolidate kernel netlink socket destruction.Denis V. Lunev2008-01-28
| | | | | | | | | | Create a specific helper for netlink kernel socket disposal. This just let the code look better and provides a ground for proper disposal inside a namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Mark attribute construction exception unlikelyPatrick McHardy2008-01-28
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Fix unicast timeoutsPatrick McHardy2007-11-07
| | | | | | | | | | | | | Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts by moving the schedule_timeout() call to a new function that doesn't propagate the remaining timeout back to the caller. This means on each retry we start with the full timeout again. ipc/mqueue.c seems to actually want to wait indefinitely so this behaviour is retained. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: make netlink user -> kernel interface synchroniousDenis V. Lunev2007-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch make processing netlink user -> kernel messages synchronious. This change was inspired by the talk with Alexey Kuznetsov about current netlink messages processing. He says that he was badly wrong when introduced asynchronious user -> kernel communication. The call netlink_unicast is the only path to send message to the kernel netlink socket. But, unfortunately, it is also used to send data to the user. Before this change the user message has been attached to the socket queue and sk->sk_data_ready was called. The process has been blocked until all pending messages were processed. The bad thing is that this processing may occur in the arbitrary process context. This patch changes nlk->data_ready callback to get 1 skb and force packet processing right in the netlink_unicast. Kernel -> user path in netlink_unicast remains untouched. EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock drop, but the process remains in the cycle until the message will be fully processed. So, there is no need to use this kludges now. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: cleanup 3rd argument in netlink_sendskbDenis V. Lunev2007-10-11
| | | | | | | | | netlink_sendskb does not use third argument. Clean it and save a couple of bytes. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Introduce nested and byteorder flag to netlink attributeThomas Graf2007-10-10
| | | | | | | | | | | This change allows the generic attribute interface to be used within the netfilter subsystem where this flag was initially introduced. The byte-order flag is yet unused, it's intended use is to allow automatic byte order convertions for all atomic types. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Support multiple network namespaces with netlinkEric W. Biederman2007-10-10
| | | | | | | | | | | | | | | | | | | | | Each netlink socket will live in exactly one network namespace, this includes the controlling kernel sockets. This patch updates all of the existing netlink protocols to only support the initial network namespace. Request by clients in other namespaces will get -ECONREFUSED. As they would if the kernel did not have the support for that netlink protocol compiled in. As each netlink protocol is updated to be multiple network namespace safe it can register multiple kernel sockets to acquire a presence in the rest of the network namespaces. The implementation in af_netlink is a simple filter implementation at hash table insertion and hash table look up time. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLIKN]: Allow removing multicast groups.Johannes Berg2007-07-18
| | | | | | | | | | Allow kicking listeners out of a multicast group when necessary (for example if that group is going to be removed.) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: allocate group bitmaps dynamicallyJohannes Berg2007-07-18
| | | | | | | | | | | Allow changing the number of groups for a netlink family after it has been created, use RCU to protect the listeners bitmap keeping netlink_has_listeners() lock-free. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Patrick McHardy <kaber@trash.net> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Remove references to process IDHerbert Xu2007-05-05
| | | | | | | | | | | | | People treating the *_pid fields in netlink as a process ID has caused endless confusion over the years. The fact that our own netlink.h does this only adds to the confusion. So here is a patch to change the comments to refer to it as the port ID which hopefully will make it clear what the purpose of the fields really is. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Switch cb_lock spinlock to mutex and allow to override itPatrick McHardy2007-04-26
| | | | | | | | | | Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Remove NLMSG_{NEW_ANSWER,CANCEL,END}Arnaldo Carvalho de Melo2007-04-26
| | | | | | Not used anywhere and defined inside __KERNEL__, Thomas acked this on irc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* [NETLINK]: Introduce nlmsg_hdr() helperArnaldo Carvalho de Melo2007-04-26
| | | | | | | | | For the common "(struct nlmsghdr *)skb->data" sequence, so that we reduce the number of direct accesses to skb->data and for consistency with all the other cast skb member helpers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Convert skb->tail to sk_buff_data_tArnaldo Carvalho de Melo2007-04-26
| | | | | | | | | | | | | | | So that it is also an offset from skb->head, reduces its size from 8 to 4 bytes on 64bit architectures, allowing us to combine the 4 bytes hole left by the layer headers conversion, reducing struct sk_buff size to 256 bytes, i.e. 4 64byte cachelines, and since the sk_buff slab cache is SLAB_HWCACHE_ALIGN... :-) Many calculations that previously required that skb->{transport,network, mac}_header be first converted to a pointer now can be done directly, being meaningful as offsets or pointers. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Limit NLMSG_GOODSIZE to 8K.David S. Miller2007-04-26
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] eCryptfs: Public key transport mechanismMichael Halcrow2007-02-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the transport code for public key functionality in eCryptfs. It manages encryption/decryption request queues with a transport mechanism. Currently, netlink is the only implemented transport. Each inode has a unique File Encryption Key (FEK). Under passphrase, a File Encryption Key Encryption Key (FEKEK) is generated from a salt/passphrase combo on mount. This FEKEK encrypts each FEK and writes it into the header of each file using the packet format specified in RFC 2440. This is all symmetric key encryption, so it can all be done via the kernel crypto API. These new patches introduce public key encryption of the FEK. There is no asymmetric key encryption support in the kernel crypto API, so eCryptfs pushes the FEK encryption and decryption out to a userspace daemon. After considering our requirements and determining the complexity of using various transport mechanisms, we settled on netlink for this communication. eCryptfs stores authentication tokens into the kernel keyring. These tokens correlate with individual keys. For passphrase mode of operation, the authentication token contains the symmetric FEKEK. For public key, the authentication token contains a PKI type and an opaque data blob managed by individual PKI modules in userspace. Each user who opens a file under an eCryptfs partition mounted in public key mode must be running a daemon. That daemon has the user's credentials and has access to all of the keys to which the user should have access. The daemon, when started, initializes the pluggable PKI modules available on the system and registers itself with the eCryptfs kernel module. Userspace utilities register public key authentication tokens into the user session keyring. These authentication tokens correlate key signatures with PKI modules and PKI blobs. The PKI blobs contain PKI-specific information necessary for the PKI module to carry out asymmetric key encryption and decryption. When the eCryptfs module parses the header of an existing file and finds a Tag 1 (Public Key) packet (see RFC 2440), it reads in the public key identifier (signature). The asymmetrically encrypted FEK is in the Tag 1 packet; eCryptfs puts together a decrypt request packet containing the signature and the encrypted FEK, then it passes it to the daemon registered for the current->euid via a netlink unicast to the PID of the daemon, which was registered at the time the daemon was started by the user. The daemon actually just makes calls to libecryptfs, which implements request packet parsing and manages PKI modules. libecryptfs grabs the public key authentication token for the given signature from the user session keyring. This auth tok tells libecryptfs which PKI module should receive the request. libecryptfs then makes a decrypt() call to the PKI module, and it passes along the PKI block from the auth tok. The PKI uses the blob to figure out how it should decrypt the data passed to it; it performs the decryption and passes the decrypted data back to libecryptfs. libecryptfs then puts together a reply packet with the decrypted FEK and passes that back to the eCryptfs module. The eCryptfs module manages these request callouts to userspace code via message context structs. The module maintains an array of message context structs and places the elements of the array on two lists: a free and an allocated list. When eCryptfs wants to make a request, it moves a msg ctx from the free list to the allocated list, sets its state to pending, and fires off the message to the user's registered daemon. When eCryptfs receives a netlink message (via the callback), it correlates the msg ctx struct in the alloc list with the data in the message itself. The msg->index contains the offset of the array of msg ctx structs. It verifies that the registered daemon PID is the same as the PID of the process that sent the message. It also validates a sequence number between the received packet and the msg ctx. Then, it copies the contents of the message (the reply packet) into the msg ctx struct, sets the state in the msg ctx to done, and wakes up the process that was sleeping while waiting for the reply. The sleeping process was whatever was performing the sys_open(). This process originally called ecryptfs_send_message(); it is now in ecryptfs_wait_for_response(). When it wakes up and sees that the msg ctx state was set to done, it returns a pointer to the message contents (the reply packet) and returns. If all went well, this packet contains the decrypted FEK, which is then copied into the crypt_stat struct, and life continues as normal. The case for creation of a new file is very similar, only instead of a decrypt request, eCryptfs sends out an encrypt request. > - We have a great clod of key mangement code in-kernel. Why is that > not suitable (or growable) for public key management? eCryptfs uses Howells' keyring to store persistent key data and PKI state information. It defers public key cryptographic transformations to userspace code. The userspace data manipulation request really is orthogonal to key management in and of itself. What eCryptfs basically needs is a secure way to communicate with a particular daemon for a particular task doing a syscall, based on the UID. Nothing running under another UID should be able to access that channel of communication. > - Is it appropriate that new infrastructure for public key > management be private to a particular fs? The messaging.c file contains a lot of code that, perhaps, could be extracted into a separate kernel service. In essence, this would be a sort of request/reply mechanism that would involve a userspace daemon. I am not aware of anything that does quite what eCryptfs does, so I was not aware of any existing tools to do just what we wanted. > What happens if one of these daemons exits without sending a quit > message? There is a stale uid<->pid association in the hash table for that user. When the user registers a new daemon, eCryptfs cleans up the old association and generates a new one. See ecryptfs_process_helo(). > - _why_ does it use netlink? Netlink provides the transport mechanism that would minimize the complexity of the implementation, given that we can have multiple daemons (one per user). I explored the possibility of using relayfs, but that would involve having to introduce control channels and a protocol for creating and tearing down channels for the daemons. We do not have to worry about any of that with netlink. Signed-off-by: Michael Halcrow <mhalcrow@us.ibm.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [NETLINK]: Remove unused dst_pid field in netlink_skb_parmsThomas Graf2006-12-03
| | | | | | | | The destination PID is passed directly to netlink_unicast() respectively netlink_multicast(). Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Do precise netlink message allocations where possibleThomas Graf2006-12-03
| | | | | | | | | | | | | Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SCSI] SCSI and FC Transport: add netlink support for posting of transport ↵James Smart2006-09-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | events This patch formally adds support for the posting of FC events via netlink. It is a followup to the original RFC at: http://marc.theaimsgroup.com/?l=linux-scsi&m=114530667923464&w=2 and the initial posting at: http://marc.theaimsgroup.com/?l=linux-scsi&m=115507374832500&w=2 The patch has been updated to optimize the send path, per the discussions in the initial posting. Per discussions at the Storage Summit and at OLS, we are to use netlink for async events from transports. Also per discussions, to avoid a netlink protocol per transport, I've create a single NETLINK_SCSITRANSPORT protocol, which can then be used by all transports. This patch: - Creates new files scsi_netlink.c and scsi_netlink.h, which contains the single and shared definitions for the SCSI Transport. It is tied into the base SCSI subsystem intialization. Contains a single interface routine, scsi_send_transport_event(), for a transport to send an event (via multicast to a protocol specific group). - Creates a new scsi_netlink_fc.h file, which contains the FC netlink event messages - Adds 3 new routines to the fc transport: fc_get_event_number() - to get a FC event # fc_host_post_event() - to send a simple FC event (32 bits of data) fc_host_post_vendor_event() - to send a Vendor unique event, with arbitrary amounts of data. Note: the separation of event number allows for a LLD to send a standard event, followed by vendor-specific data for the event. Note: This patch assumes 2 prior fc transport patches have been installed: http://marc.theaimsgroup.com/?l=linux-scsi&m=115555807316329&w=2 http://marc.theaimsgroup.com/?l=linux-scsi&m=115581614930261&w=2 Sorry - next time I'll do something like making these individual patches of the same posting when I know they'll be posted closely together. Signed-off-by: James Smart <James.Smart@emulex.com> Tidy up configuration not to make SCSI always select NET Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
* [PATCH] w1: netlink: Mark netlink group 1 as unused.Evgeniy Polyakov2006-06-22
| | | | | | | netlink_w1 was moved to connector. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* [PATCH] Reworked patch for labels on user space messagesSteve Grubb2006-05-01
| | | | | | | | | | | | | The below patch should be applied after the inode and ipc sid patches. This patch is a reworking of Tim's patch that has been updated to match the inode and ipc patches since its similar. [updated: > Stephen Smalley also wanted to change a variable from isec to tsec in the > user sid patch. ] Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [NETLINK]: Add netlink_has_listeners for avoiding unneccessary event message ↵Patrick McHardy2006-03-20
| | | | | | | | | | | | | | | | generation Keep a bitmask of multicast groups with subscribed listeners to let netlink users check for listeners before generating multicast messages. Queries don't perform any locking, which may result in false positives, it is guaranteed however that any new subscriptions are visible before bind() or setsockopt() return. Signed-off-by: Patrick McHardy <kaber@trash.net> ACKed-by: Jamal Hadi Salim<hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Fix a severe bugAlexey Kuznetsov2006-02-09
| | | | | | | | | | | | | | | | | | | | | | netlink overrun was broken while improvement of netlink. Destination socket is used in the place where it was meant to be source socket, so that now overrun is never sent to user netlink sockets, when it should be, and it even can be set on kernel socket, which results in complete deadlock of rtnetlink. Suggested fix is to restore status quo passing source socket as additional argument to netlink_attachskb(). A little explanation: overrun is set on a socket, when it failed to receive some message and sender of this messages does not or even have no way to handle this error. This happens in two cases: 1. when kernel sends something. Kernel never retransmits and cannot wait for buffer space. 2. when user sends a broadcast and the message was not delivered to some recipients. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Type-safe netlink messages/attributes interfaceThomas Graf2005-11-09
| | | | | | | | | | | | | | Introduces a new type-safe interface for netlink message and attributes handling. The interface is fully binary compatible with the old interface towards userspace. Besides type safety, this interface features attribute validation capabilities, simplified message contstruction, and documentation. The resulting netlink code should be smaller, less error prone and easier to understand. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] gfp flags annotations - part 1Al Viro2005-10-08
| | | | | | | | | | | | - added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [NETLINK]: Reserve a slot for NETLINK_GENERIC.David S. Miller2005-09-14
| | | | | | As requested by Jamal. Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Add netlink connector.Evgeniy Polyakov2005-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel connector - new userspace <-> kernel space easy to use communication module which implements easy to use bidirectional message bus using netlink as it's backend. Connector was created to eliminate complex skb handling both in send and receive message bus direction. Connector driver adds possibility to connect various agents using as one of it's backends netlink based network. One must register callback and identifier. When driver receives special netlink message with appropriate identifier, appropriate callback will be called. From the userspace point of view it's quite straightforward: socket(); bind(); send(); recv(); But if kernelspace want to use full power of such connections, driver writer must create special sockets, must know about struct sk_buff handling... Connector allows any kernelspace agents to use netlink based networking for inter-process communication in a significantly easier way: int cn_add_callback(struct cb_id *id, char *name, void (*callback) (void *)); void cn_netlink_send(struct cn_msg *msg, u32 __groups, int gfp_mask); struct cb_id { __u32 idx; __u32 val; }; idx and val are unique identifiers which must be registered in connector.h for in-kernel usage. void (*callback) (void *) - is a callback function which will be called when message with above idx.val will be received by connector core. Using connector completely hides low-level transport layer from it's users. Connector uses new netlink ability to have many groups in one socket. [ Incorporating many cleanups and fixes by myself and Andrew Morton -DaveM ] Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Fix sparse warningsArnaldo Carvalho de Melo2005-08-29
| | | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Add "groups" argument to netlink_kernel_createPatrick McHardy2005-08-29
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>