aboutsummaryrefslogtreecommitdiffstats
path: root/include/net/af_unix.h
Commit message (Collapse)AuthorAge
* net: Fix soft lockups/OOM issues w/ unix garbage collectordann frazier2008-11-26
| | | | | | | | | | | | | | | | | | This is an implementation of David Miller's suggested fix in: https://bugzilla.redhat.com/show_bug.cgi?id=470201 It has been updated to use wait_event() instead of wait_event_interruptible(). Paraphrasing the description from the above report, it makes sendmsg() block while UNIX garbage collection is in progress. This avoids a situation where child processes continue to queue new FDs over a AF_UNIX socket to a parent which is in the exit path and running garbage collection on these FDs. This contention can result in soft lockups and oom-killing of unrelated processes. Signed-off-by: dann frazier <dannf@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: unix: fix inflight counting bug in garbage collectorMiklos Szeredi2008-11-09
| | | | | | | | | | | | | | | | | | | | | | | Previously I assumed that the receive queues of candidates don't change during the GC. This is only half true, nothing can be received from the queues (see comment in unix_gc()), but buffers could be added through the other half of the socket pair, which may still have file descriptors referring to it. This can result in inc_inflight_move_tail() erronously increasing the "inflight" counter for a unix socket for which dec_inflight() wasn't previously called. This in turn can trigger the "BUG_ON(total_refs < inflight_refs)" in a later garbage collection run. Fix this by only manipulating the "inflight" counter for sockets which are candidates themselves. Duplicating the file references in unix_attach_fds() is also needed to prevent a socket becoming a candidate for GC while the skb that contains it is not yet queued. Reported-by: Andrea Bittau <a.bittau@cs.ucl.ac.uk> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] f_count may wrap aroundAl Viro2008-07-26
| | | | | | | make it atomic_long_t; while we are at it, get rid of useless checks in affs, hfs and hpfs - ->open() always has it equal to 1, ->release() - to 0. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [AF_UNIX]: Remove unused declaration of sysctl_unix_max_dgram_qlen.Denis V. Lunev2008-01-28
| | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UNIX]: Extend unix_sysctl_(un)register prototypesPavel Emelyanov2008-01-28
| | | | | | | | | | | | | Add the struct net * argument to both of them to use in the future. Also make the register one return an error code. It is useless right now, but will make the future patches much simpler. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_UNIX]: Make unix_tot_inflight counter non-atomicPavel Emelyanov2007-11-11
| | | | | | | | This counter is _always_ modified under the unix_gc_lock spinlock, so its atomicity can be provided w/o additional efforts. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_UNIX]: Make code static.Adrian Bunk2007-07-31
| | | | | | | | | The following code can now become static: - struct unix_socket_table - unix_table_lock Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_UNIX]: Rewrite garbage collector, fixes race.Miklos Szeredi2007-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | Throw out the old mark & sweep garbage collector and put in a refcounting cycle detecting one. The old one had a race with recvmsg, that resulted in false positives and hence data loss. The old algorithm operated on all unix sockets in the system, so any additional locking would have meant performance problems for all users of these. The new algorithm instead only operates on "in flight" sockets, which are very rare, and the additional locking for these doesn't negatively impact the vast majority of users. In fact it's probable, that there weren't *any* heavy senders of sockets over sockets, otherwise the above race would have been discovered long ago. The patch works OK with the app that exposed the race with the old code. The garbage collection has also been verified to work in a few simple cases. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_UNIX]: Make socket locking much less confusing.David S. Miller2007-06-03
| | | | | | | | | | | The unix_state_*() locking macros imply that there is some rwlock kind of thing going on, but the implementation is actually a spinlock which makes the code more confusing than it needs to be. So use plain unix_state_lock and unix_state_unlock. Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patchCatherine Zhang2006-08-02
| | | | | | | | | | | | | | | | | | | | From: Catherine Zhang <cxzhang@watson.ibm.com> This patch implements a cleaner fix for the memory leak problem of the original unix datagram getpeersec patch. Instead of creating a security context each time a unix datagram is sent, we only create the security context when the receiver requests it. This new design requires modification of the current unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely, secid_to_secctx and release_secctx. The former retrieves the security context and the latter releases it. A hook is required for releasing the security context because it is up to the security module to decide how that's done. In the case of Selinux, it's a simple kfree operation. Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] lockdep: annotate af_unix lockingIngo Molnar2006-07-03
| | | | | | | | | | | | Teach special (recursive) locking code to the lock validator. Also splits af_unix's sk_receive_queue.lock class from the other networking skb-queue locks. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [AF_UNIX]: Datagram getpeersecCatherine Zhang2006-06-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements an API whereby an application can determine the label of its peer's Unix datagram sockets via the auxiliary data mechanism of recvmsg. Patch purpose: This patch enables a security-aware application to retrieve the security context of the peer of a Unix datagram socket. The application can then use this security context to determine the security context for processing on behalf of the peer who sent the packet. Patch design and implementation: The design and implementation is very similar to the UDP case for INET sockets. Basically we build upon the existing Unix domain socket API for retrieving user credentials. Linux offers the API for obtaining user credentials via ancillary messages (i.e., out of band/control messages that are bundled together with a normal message). To retrieve the security context, the application first indicates to the kernel such desire by setting the SO_PASSSEC option via getsockopt. Then the application retrieves the security context using the auxiliary data mechanism. An example server application for Unix datagram socket should look like this: toggle = 1; toggle_len = sizeof(toggle); setsockopt(sockfd, SOL_SOCKET, SO_PASSSEC, &toggle, &toggle_len); recvmsg(sockfd, &msg_hdr, 0); if (msg_hdr.msg_controllen > sizeof(struct cmsghdr)) { cmsg_hdr = CMSG_FIRSTHDR(&msg_hdr); if (cmsg_hdr->cmsg_len <= CMSG_LEN(sizeof(scontext)) && cmsg_hdr->cmsg_level == SOL_SOCKET && cmsg_hdr->cmsg_type == SCM_SECURITY) { memcpy(&scontext, CMSG_DATA(cmsg_hdr), sizeof(scontext)); } } sock_setsockopt is enhanced with a new socket option SOCK_PASSSEC to allow a server socket to receive security context of the peer. Testing: We have tested the patch by setting up Unix datagram client and server applications. We verified that the server can retrieve the security context using the auxiliary data mechanism of recvmsg. Signed-off-by: Catherine Zhang <cxzhang@watson.ibm.com> Acked-by: Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Don't include linux/config.h from anywhere else in include/David Woodhouse2006-04-26
| | | | Signed-off-by: David Woodhouse <dwmw2@infradead.org>
* [NET]: sem2mutex part 2Ingo Molnar2006-03-21
| | | | | | | | | | | Semaphore to mutex conversion. The conversion was generated via scripts, and the result was validated automatically via a script as well. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_UNIX]: Convert to use a spinlock instead of rwlockBenjamin LaHaise2006-01-03
| | | | | | | | | | From: Benjamin LaHaise <bcrl@kvack.org> In af_unix, a rwlock is used to protect internal state. At least on my P4 with HT it is faster to use a spinlock due to the simpler memory barrier used to unlock. This patch raises bw_unix to ~690K/s. Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_UNIX]: Use spinlock for unix_table_lockDavid S. Miller2006-01-03
| | | | | | | | This lock is actually taken mostly as a writer, so using a rwlock actually just makes performance worse especially on chips like the Intel P4. Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Fix sparse warningsArnaldo Carvalho de Melo2005-08-29
| | | | | | | | | | | Of this type, mostly: CHECK net/ipv6/netfilter.c net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static? net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static? Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds2005-04-16
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!