aboutsummaryrefslogtreecommitdiffstats
path: root/net/rxrpc
Commit message (Collapse)AuthorAge
* Add a dummy printk function for the maintenance of unused printksDavid Howells2010-08-12
| | | | | | | | Add a dummy printk function for the maintenance of unused printks through gcc format checking, and also so that side-effect checking is maintained too. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* RxRPC: Fix a potential deadlock between the call resend_timer and state_lockDavid Howells2010-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RxRPC can potentially deadlock as rxrpc_resend_time_expired() wants to get call->state_lock so that it can alter the state of an RxRPC call. However, its caller (call_timer_fn()) has an apparent lock on the timer struct. The problem is that rxrpc_resend_time_expired() isn't permitted to lock call->state_lock as this could cause a deadlock against rxrpc_send_abort() as that takes state_lock and then attempts to delete the resend timer by calling del_timer_sync(). The deadlock can occur because del_timer_sync() will sit there forever waiting for rxrpc_resend_time_expired() to return, but the latter may then wait for call->state_lock, which rxrpc_send_abort() holds around del_timer_sync()... This leads to a warning appearing in the kernel log that looks something like the attached. It should be sufficient to simply dispense with the locks. It doesn't matter if we set the resend timer expired event bit and queue the event processor whilst we're changing state to one where the resend timer is irrelevant as the event can just be ignored by the processor thereafter. ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.35-rc3-cachefs+ #115 ------------------------------------------------------- swapper/0 is trying to acquire lock: (&call->state_lock){++--..}, at: [<ffffffffa00200d4>] rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc] but task is already holding lock: (&call->resend_timer){+.-...}, at: [<ffffffff8103b675>] run_timer_softirq+0x182/0x2a5 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&call->resend_timer){+.-...}: [<ffffffff810560bc>] __lock_acquire+0x889/0x8fa [<ffffffff81056184>] lock_acquire+0x57/0x6d [<ffffffff8103bb9c>] del_timer_sync+0x3c/0x86 [<ffffffffa002bb7a>] rxrpc_send_abort+0x50/0x97 [af_rxrpc] [<ffffffffa002bdd9>] rxrpc_kernel_abort_call+0xa1/0xdd [af_rxrpc] [<ffffffffa0061588>] afs_deliver_to_call+0x129/0x368 [kafs] [<ffffffffa006181b>] afs_process_async_call+0x54/0xff [kafs] [<ffffffff8104261d>] worker_thread+0x1ef/0x2e2 [<ffffffff81045f47>] kthread+0x7a/0x82 [<ffffffff81002cd4>] kernel_thread_helper+0x4/0x10 -> #0 (&call->state_lock){++--..}: [<ffffffff81055237>] validate_chain+0x727/0xd23 [<ffffffff810560bc>] __lock_acquire+0x889/0x8fa [<ffffffff81056184>] lock_acquire+0x57/0x6d [<ffffffff813e6b69>] _raw_read_lock_bh+0x34/0x43 [<ffffffffa00200d4>] rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc] [<ffffffff8103b6e6>] run_timer_softirq+0x1f3/0x2a5 [<ffffffff81036828>] __do_softirq+0xa2/0x13e [<ffffffff81002dcc>] call_softirq+0x1c/0x28 [<ffffffff810049f0>] do_softirq+0x38/0x80 [<ffffffff810361a2>] irq_exit+0x45/0x47 [<ffffffff81018fb3>] smp_apic_timer_interrupt+0x88/0x96 [<ffffffff81002893>] apic_timer_interrupt+0x13/0x20 [<ffffffff810011ac>] cpu_idle+0x4d/0x83 [<ffffffff813e06f3>] start_secondary+0x1bd/0x1c1 other info that might help us debug this: 1 lock held by swapper/0: #0: (&call->resend_timer){+.-...}, at: [<ffffffff8103b675>] run_timer_softirq+0x182/0x2a5 stack backtrace: Pid: 0, comm: swapper Not tainted 2.6.35-rc3-cachefs+ #115 Call Trace: <IRQ> [<ffffffff81054414>] print_circular_bug+0xae/0xbd [<ffffffff81055237>] validate_chain+0x727/0xd23 [<ffffffff810560bc>] __lock_acquire+0x889/0x8fa [<ffffffff810539a7>] ? mark_lock+0x42f/0x51f [<ffffffff81056184>] lock_acquire+0x57/0x6d [<ffffffffa00200d4>] ? rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc] [<ffffffff813e6b69>] _raw_read_lock_bh+0x34/0x43 [<ffffffffa00200d4>] ? rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc] [<ffffffffa00200d4>] rxrpc_resend_time_expired+0x56/0x96 [af_rxrpc] [<ffffffff8103b6e6>] run_timer_softirq+0x1f3/0x2a5 [<ffffffff8103b675>] ? run_timer_softirq+0x182/0x2a5 [<ffffffffa002007e>] ? rxrpc_resend_time_expired+0x0/0x96 [af_rxrpc] [<ffffffff810367ef>] ? __do_softirq+0x69/0x13e [<ffffffff81036828>] __do_softirq+0xa2/0x13e [<ffffffff81002dcc>] call_softirq+0x1c/0x28 [<ffffffff810049f0>] do_softirq+0x38/0x80 [<ffffffff810361a2>] irq_exit+0x45/0x47 [<ffffffff81018fb3>] smp_apic_timer_interrupt+0x88/0x96 [<ffffffff81002893>] apic_timer_interrupt+0x13/0x20 <EOI> [<ffffffff81049de1>] ? __atomic_notifier_call_chain+0x0/0x86 [<ffffffff8100955b>] ? mwait_idle+0x6e/0x78 [<ffffffff81009552>] ? mwait_idle+0x65/0x78 [<ffffffff810011ac>] cpu_idle+0x4d/0x83 [<ffffffff813e06f3>] start_secondary+0x1bd/0x1c1 Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-next: remove useless union keywordChangli Gao2010-06-11
| | | | | | | | | | remove useless union keyword in rtable, rt6_info and dn_route. Since there is only one member in a union, the union keyword isn't useful. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sock_def_readable() and friends RCU conversionEric Dumazet2010-05-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use sk_sleep()Eric Dumazet2010-04-26
| | | | | | | | Commit aa395145 (net: sk_sleep() helper) missed three files in the conversion. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sk_sleep() helperEric Dumazet2010-04-20
| | | | | | | | | | | | | | | | | Define a new function to return the waitqueue of a "struct sock". static inline wait_queue_head_t *sk_sleep(struct sock *sk) { return sk->sk_sleep; } Change all read occurrences of sk_sleep by a call to this function. Needed for a future RCU conversion. sk_sleep wont be a field directly available. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* rxrpc: Check allocation failure.Tetsuo Handa2010-03-22
| | | | | | | | alloc_skb() can return NULL. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* net: use net_eq to compare netsOctavian Purdila2009-11-25
| | | | | | | | | | | | | | | | | | | | | | | Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: pass kern to net_proto_family create functionEric Paris2009-11-06
| | | | | | | | | | | The generic __sock_create function has a kern argument which allows the security system to make decisions based on if a socket is being created by the kernel or by userspace. This patch passes that flag to the net_proto_family specific create function, so it can do the same thing. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Generalize socket rx gap / receive queue overflow cmsgNeil Horman2009-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a new socket level option to report number of queue overflows Recently I augmented the AF_PACKET protocol to report the number of frames lost on the socket receive queue between any two enqueued frames. This value was exported via a SOL_PACKET level cmsg. AFter I completed that work it was requested that this feature be generalized so that any datagram oriented socket could make use of this option. As such I've created this patch, It creates a new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a SOL_SOCKET level cmsg that reports the nubmer of times the sk_receive_queue overflowed between any two given frames. It also augments the AF_PACKET protocol to take advantage of this new feature (as it previously did not touch sk->sk_drops, which this patch uses to record the overflow count). Tested successfully by me. Notes: 1) Unlike my previous patch, this patch simply records the sk_drops value, which is not a number of drops between packets, but rather a total number of drops. Deltas must be computed in user space. 2) While this patch currently works with datagram oriented protocols, it will also be accepted by non-datagram oriented protocols. I'm not sure if thats agreeable to everyone, but my argument in favor of doing so is that, for those protocols which aren't applicable to this option, sk_drops will always be zero, and reporting no drops on a receive queue that isn't used for those non-participating protocols seems reasonable to me. This also saves us having to code in a per-protocol opt in mechanism. 3) This applies cleanly to net-next assuming that commit 977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: mark net_proto_ops as constStephen Hemminger2009-10-07
| | | | | | | All usages of structure net_proto_ops should be declared const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Make setsockopt() optlen be unsigned.David S. Miller2009-09-30
| | | | | | | | | | | | This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
* trivial: fix typo "to to" in multiple filesAnand Gadiyar2009-09-21
| | | | | Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* RxRPC: Use uX/sX rather than uintX_t/intX_t typesDavid Howells2009-09-16
| | | | | | | Use uX rather than uintX_t types for consistency. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* RxRPC: Parse security index 5 keys (Kerberos 5)David Howells2009-09-15
| | | | | | | Parse RxRPC security index 5 type keys (Kerberos 5 tokens). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* RxRPC: Allow RxRPC keys to be readDavid Howells2009-09-15
| | | | | | | | Allow RxRPC keys to be read. This is to allow pioctl() to be implemented in userspace. RxRPC keys are read out in XDR format. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* RxRPC: Allow key payloads to be passed in XDR formDavid Howells2009-09-15
| | | | | | | | | | Allow add_key() and KEYCTL_INSTANTIATE to accept key payloads in XDR form as described by openafs-1.4.10/src/auth/afs_token.xg. This provides a way of passing kaserver, Kerberos 4, Kerberos 5 and GSSAPI keys from userspace, and allows for future expansion. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* RxRPC: Declare the security index constants symbolicallyDavid Howells2009-09-15
| | | | | | | | Declare the security index constants symbolically rather than just referring to them numerically. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: mark read-only arrays as constJan Engelhardt2009-08-05
| | | | | | | | String literals are constant, and usually, we can also tag the array of pointers const too, moving it to the .rodata section. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: adding memory barrier to the poll and receive callbacksJiri Olsa2009-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding memory barrier after the poll_wait function, paired with receive callbacks. Adding fuctions sock_poll_wait and sk_has_sleeper to wrap the memory barrier. Without the memory barrier, following race can happen. The race fires, when following code paths meet, and the tp->rcv_nxt and __add_wait_queue updates stay in CPU caches. CPU1 CPU2 sys_select receive packet ... ... __add_wait_queue update tp->rcv_nxt ... ... tp->rcv_nxt check sock_def_readable ... { schedule ... if (sk->sk_sleep && waitqueue_active(sk->sk_sleep)) wake_up_interruptible(sk->sk_sleep) ... } If there was no cache the code would work ok, since the wait_queue and rcv_nxt are opposit to each other. Meaning that once tp->rcv_nxt is updated by CPU2, the CPU1 either already passed the tp->rcv_nxt check and sleeps, or will get the new value for tp->rcv_nxt and will return with new data mask. In both cases the process (CPU1) is being added to the wait queue, so the waitqueue_active (CPU2) call cannot miss and will wake up CPU1. The bad case is when the __add_wait_queue changes done by CPU1 stay in its cache, and so does the tp->rcv_nxt update on CPU2 side. The CPU1 will then endup calling schedule and sleep forever if there are no more data on the socket. Calls to poll_wait in following modules were ommited: net/bluetooth/af_bluetooth.c net/irda/af_irda.c net/irda/irnet/irnet_ppp.c net/mac80211/rc80211_pid_debugfs.c net/phonet/socket.c net/rds/af_rds.c net/rfkill/core.c net/sunrpc/cache.c net/sunrpc/rpc_pipe.c net/tipc/socket.c Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* RxRPC: Don't attempt to reuse aborted connectionsDavid Howells2009-06-17
| | | | | | | | | | | | Connections that have seen a connection-level abort should not be reused as the far end will just abort them again; instead a new connection should be made. Connection-level aborts occur due to such things as authentication failures. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* RxRPC: Error handling for rxrpc_alloc_connection()Dan Carpenter2009-05-21
| | | | | | | | | rxrpc_alloc_connection() doesn't return an error code on failure, it just returns NULL. IS_ERR(NULL) is false. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* RxRPC: Fix a potential NULL dereferenceDavid Howells2009-02-07
| | | | | | | | | | | | Fix a potential NULL dereference bug during error handling in rxrpc_kernel_begin_call(), whereby rxrpc_put_transport() may be handed a NULL pointer. This was found with a code checker (http://repo.or.cz/w/smatch.git/). Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2008-12-28
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits) net: Allow dependancies of FDDI & Tokenring to be modular. igb: Fix build warning when DCA is disabled. net: Fix warning fallout from recent NAPI interface changes. gro: Fix potential use after free sfc: If AN is enabled, always read speed/duplex from the AN advertising bits sfc: When disabling the NIC, close the device rather than unregistering it sfc: SFT9001: Add cable diagnostics sfc: Add support for multiple PHY self-tests sfc: Merge top-level functions for self-tests sfc: Clean up PHY mode management in loopback self-test sfc: Fix unreliable link detection in some loopback modes sfc: Generate unique names for per-NIC workqueues 802.3ad: use standard ethhdr instead of ad_header 802.3ad: generalize out mac address initializer 802.3ad: initialize ports LACPDU from const initializer 802.3ad: remove typedef around ad_system 802.3ad: turn ports is_individual into a bool 802.3ad: turn ports is_enabled into a bool 802.3ad: make ntt bool ixgbe: Fix set_ringparam in ixgbe to use the same memory pools. ... Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due to the conversion to %pI (in this networking merge) and the addition of doing IPv6 addresses (from the earlier merge of CIFS).
| * net: Make staticRoel Kluin2008-12-10
| | | | | | | | | | | | | | Sparse asked whether these could be static. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: replace NIPQUAD() in net/*/Harvey Harrison2008-10-31
| | | | | | | | | | | | | | | | Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u can be replaced with %pI4 Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | CRED: Inaugurate COW credentialsDavid Howells2008-11-13
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Inaugurate copy-on-write credentials management. This uses RCU to manage the credentials pointer in the task_struct with respect to accesses by other tasks. A process may only modify its own credentials, and so does not need locking to access or modify its own credentials. A mutex (cred_replace_mutex) is added to the task_struct to control the effect of PTRACE_ATTACHED on credential calculations, particularly with respect to execve(). With this patch, the contents of an active credentials struct may not be changed directly; rather a new set of credentials must be prepared, modified and committed using something like the following sequence of events: struct cred *new = prepare_creds(); int ret = blah(new); if (ret < 0) { abort_creds(new); return ret; } return commit_creds(new); There are some exceptions to this rule: the keyrings pointed to by the active credentials may be instantiated - keyrings violate the COW rule as managing COW keyrings is tricky, given that it is possible for a task to directly alter the keys in a keyring in use by another task. To help enforce this, various pointers to sets of credentials, such as those in the task_struct, are declared const. The purpose of this is compile-time discouragement of altering credentials through those pointers. Once a set of credentials has been made public through one of these pointers, it may not be modified, except under special circumstances: (1) Its reference count may incremented and decremented. (2) The keyrings to which it points may be modified, but not replaced. The only safe way to modify anything else is to create a replacement and commit using the functions described in Documentation/credentials.txt (which will be added by a later patch). This patch and the preceding patches have been tested with the LTP SELinux testsuite. This patch makes several logical sets of alteration: (1) execve(). This now prepares and commits credentials in various places in the security code rather than altering the current creds directly. (2) Temporary credential overrides. do_coredump() and sys_faccessat() now prepare their own credentials and temporarily override the ones currently on the acting thread, whilst preventing interference from other threads by holding cred_replace_mutex on the thread being dumped. This will be replaced in a future patch by something that hands down the credentials directly to the functions being called, rather than altering the task's objective credentials. (3) LSM interface. A number of functions have been changed, added or removed: (*) security_capset_check(), ->capset_check() (*) security_capset_set(), ->capset_set() Removed in favour of security_capset(). (*) security_capset(), ->capset() New. This is passed a pointer to the new creds, a pointer to the old creds and the proposed capability sets. It should fill in the new creds or return an error. All pointers, barring the pointer to the new creds, are now const. (*) security_bprm_apply_creds(), ->bprm_apply_creds() Changed; now returns a value, which will cause the process to be killed if it's an error. (*) security_task_alloc(), ->task_alloc_security() Removed in favour of security_prepare_creds(). (*) security_cred_free(), ->cred_free() New. Free security data attached to cred->security. (*) security_prepare_creds(), ->cred_prepare() New. Duplicate any security data attached to cred->security. (*) security_commit_creds(), ->cred_commit() New. Apply any security effects for the upcoming installation of new security by commit_creds(). (*) security_task_post_setuid(), ->task_post_setuid() Removed in favour of security_task_fix_setuid(). (*) security_task_fix_setuid(), ->task_fix_setuid() Fix up the proposed new credentials for setuid(). This is used by cap_set_fix_setuid() to implicitly adjust capabilities in line with setuid() changes. Changes are made to the new credentials, rather than the task itself as in security_task_post_setuid(). (*) security_task_reparent_to_init(), ->task_reparent_to_init() Removed. Instead the task being reparented to init is referred directly to init's credentials. NOTE! This results in the loss of some state: SELinux's osid no longer records the sid of the thread that forked it. (*) security_key_alloc(), ->key_alloc() (*) security_key_permission(), ->key_permission() Changed. These now take cred pointers rather than task pointers to refer to the security context. (4) sys_capset(). This has been simplified and uses less locking. The LSM functions it calls have been merged. (5) reparent_to_kthreadd(). This gives the current thread the same credentials as init by simply using commit_thread() to point that way. (6) __sigqueue_alloc() and switch_uid() __sigqueue_alloc() can't stop the target task from changing its creds beneath it, so this function gets a reference to the currently applicable user_struct which it then passes into the sigqueue struct it returns if successful. switch_uid() is now called from commit_creds(), and possibly should be folded into that. commit_creds() should take care of protecting __sigqueue_alloc(). (7) [sg]et[ug]id() and co and [sg]et_current_groups. The set functions now all use prepare_creds(), commit_creds() and abort_creds() to build and check a new set of credentials before applying it. security_task_set[ug]id() is called inside the prepared section. This guarantees that nothing else will affect the creds until we've finished. The calling of set_dumpable() has been moved into commit_creds(). Much of the functionality of set_user() has been moved into commit_creds(). The get functions all simply access the data directly. (8) security_task_prctl() and cap_task_prctl(). security_task_prctl() has been modified to return -ENOSYS if it doesn't want to handle a function, or otherwise return the return value directly rather than through an argument. Additionally, cap_task_prctl() now prepares a new set of credentials, even if it doesn't end up using it. (9) Keyrings. A number of changes have been made to the keyrings code: (a) switch_uid_keyring(), copy_keys(), exit_keys() and suid_keys() have all been dropped and built in to the credentials functions directly. They may want separating out again later. (b) key_alloc() and search_process_keyrings() now take a cred pointer rather than a task pointer to specify the security context. (c) copy_creds() gives a new thread within the same thread group a new thread keyring if its parent had one, otherwise it discards the thread keyring. (d) The authorisation key now points directly to the credentials to extend the search into rather pointing to the task that carries them. (e) Installing thread, process or session keyrings causes a new set of credentials to be created, even though it's not strictly necessary for process or session keyrings (they're shared). (10) Usermode helper. The usermode helper code now carries a cred struct pointer in its subprocess_info struct instead of a new session keyring pointer. This set of credentials is derived from init_cred and installed on the new process after it has been cloned. call_usermodehelper_setup() allocates the new credentials and call_usermodehelper_freeinfo() discards them if they haven't been used. A special cred function (prepare_usermodeinfo_creds()) is provided specifically for call_usermodehelper_setup() to call. call_usermodehelper_setkeys() adjusts the credentials to sport the supplied keyring as the new session keyring. (11) SELinux. SELinux has a number of changes, in addition to those to support the LSM interface changes mentioned above: (a) selinux_setprocattr() no longer does its check for whether the current ptracer can access processes with the new SID inside the lock that covers getting the ptracer's SID. Whilst this lock ensures that the check is done with the ptracer pinned, the result is only valid until the lock is released, so there's no point doing it inside the lock. (12) is_single_threaded(). This function has been extracted from selinux_setprocattr() and put into a file of its own in the lib/ directory as join_session_keyring() now wants to use it too. The code in SELinux just checked to see whether a task shared mm_structs with other tasks (CLONE_VM), but that isn't good enough. We really want to know if they're part of the same thread group (CLONE_THREAD). (13) nfsd. The NFS server daemon now has to use the COW credentials to set the credentials it is going to use. It really needs to pass the credentials down to the functions it calls, but it can't do that until other patches in this series have been applied. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: James Morris <jmorris@namei.org>
* net/rxrpc: Use an IS_ERR test rather than a NULL testJulien Brunel2008-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of error, the function rxrpc_get_transport returns an ERR pointer, but never returns a NULL pointer. So after a call to this function, a NULL test should be replaced by an IS_ERR test. A simplified version of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @correct_null_test@ expression x,E; statement S1, S2; @@ x = rxrpc_get_transport(...) <... when != x = E if ( ( - x@p2 != NULL + ! IS_ERR ( x ) | - x@p2 == NULL + IS_ERR( x ) ) ) S1 else S2 ...> ? x = E; // </smpl> Signed-off-by: Julien Brunel <brunel@diku.dk> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: convert BUG_TRAP to generic WARN_ONIlpo Järvinen2008-07-26
| | | | | | | | | | | | | | Removes legacy reinvent-the-wheel type thing. The generic machinery integrates much better to automated debugging aids such as kerneloops.org (and others), and is unambiguous due to better naming. Non-intuively BUG_TRAP() is actually equal to WARN_ON() rather than BUG_ON() though some might actually be promoted to BUG_ON() but I left that to future. I could make at least one BUILD_BUG_ON conversion. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* MIB: add struct net to UDP_INC_STATS_BHPavel Emelyanov2008-07-06
| | | | | | | | | | | | Two special cases here - one is rxrpc - I put init_net there explicitly, since we haven't touched this part yet. The second place is in __udp4_lib_rcv - we already have a struct net there, but I have to move its initialization above to make it ready at the "drop" label. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add missing braces to multi-statement if()sIlpo Järvinen2008-05-02
| | | | | | | | One finds all kinds of crazy things with some shell pipelining. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* RxRPC: Fix a regression in the RXKAD security moduleDavid Howells2008-04-24
| | | | | | | | | | | | | | | | Fix a regression in the RXKAD security module introduced in: commit 91e916cffec7c0153c5cbaa447151862a7a9a047 Author: Al Viro <viro@ftp.linux.org.uk> Date: Sat Mar 29 03:08:38 2008 +0000 net/rxrpc trivial annotations A variable was declared as a 16-bit type rather than a 32-bit type. Signed-off-by: David Howells <dhowells@redhat.com> Acked-with-apologies-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'master' of ↵David S. Miller2008-04-18
|\ | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
| * AFS: Do not describe debug parameters with their valuePaul Bolle2008-04-16
| | | | | | | | | | | | | | | | Describe debug parameters with their names (and not their values). Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * rxrpc: remove smp_processor_id() from debug macroSven Schnelle2008-04-03
| | | | | | | | | | | | Signed-off-by: Sven Schnelle <svens@stackframe.org> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'master' of ↵David S. Miller2008-04-03
|\| | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
| * net/rxrpc trivial annotationsAl Viro2008-03-30
| | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'master' of ↵David S. Miller2008-03-18
|\| | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/rt2x00/rt2x00dev.c net/8021q/vlan_dev.c
| * RxRPC: fix rxrpc_recvmsg()'s returning of msg_nameDavid Howells2008-03-05
| | | | | | | | | | | | | | | | | | | | Fix rxrpc_recvmsg() to return msg_name correctly. We shouldn't overwrite the *msg struct, but should rather write into msg->msg_name (there's a '&' unary operator that shouldn't be there). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: replace remaining __FUNCTION__ occurrencesHarvey Harrison2008-03-05
| | | | | | | | | | | | | | __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [NETFILTER/RXRPC]: Don't use seq_release_private where inappropriate.Pavel Emelyanov2008-02-29
|/ | | | | | | | | | Some netfilter code and rxrpc one use seq_open() to open a proc file, but seq_release_private to release one. This is harmless, but ambiguous. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/rxrpc: Use BUG_ONJulia Lawall2008-02-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | if (...) BUG(); should be replaced with BUG_ON(...) when the test has no side-effects to allow a definition of BUG_ON that drops the code completely. The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @ disable unlikely @ expression E,f; @@ ( if (<... f(...) ...>) { BUG(); } | - if (unlikely(E)) { BUG(); } + BUG_ON(E); ) @@ expression E,f; @@ ( if (<... f(...) ...>) { BUG(); } | - if (E) { BUG(); } + BUG_ON(E); ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
* Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)David Howells2008-02-07
| | | | | | | | | | Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using: perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.*)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]*PTR_ERR' fs crypto net security` Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [AF_RXRPC]: constify function pointer tablesJan Engelhardt2008-01-31
| | | | | Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETNS]: Add namespace parameter to ip_route_output_key.Denis V. Lunev2008-01-28
| | | | | | | Needed to propagate it down to the ip_route_output_flow. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [RXRPC]: Use cpu_to_be32() where appropriate.YOSHIFUJI Hideaki2008-01-28
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [UDP]: Restore missing inDatagrams incrementsHerbert Xu2008-01-28
| | | | | | | | | | | The previous move of the the UDP inDatagrams counter caused the counting of encapsulated packets, SUNRPC data (as opposed to call) packets and RXRPC packets to go missing. This patch restores all of these. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Name magic constants in sock_wake_async()Pavel Emelyanov2008-01-28
| | | | | | | | | | | | | | | | | | The sock_wake_async() performs a bit different actions depending on "how" argument. Unfortunately this argument ony has numerical magic values. I propose to give names to their constants to help people reading this function callers understand what's going on without looking into this function all the time. I suppose this is 2.6.25 material, but if it's not (or the naming seems poor/bad/awful), I can rework it against the current net-2.6 tree. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_RXRPC]: Add a missing gotoDavid Howells2007-12-07
| | | | | | | | Add a missing goto to error handling in the RXKAD security module for AF_RXRPC. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>