litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	SELinux: implement the new sb_remount LSM hook	Eric Paris	2011-03-03
\| \| \| \| \| \| \| \| \| \|	For SELinux we do not allow security information to change during a remount operation. Thus this hook simply strips the security module options from the data and verifies that those are the same options as exist on the current superblock. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: James Morris <jmorris@namei.org>
*	LSM: Pass -o remount options to the LSM	Eric Paris	2011-03-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The VFS mount code passes the mount options to the LSM. The LSM will remove options it understands from the data and the VFS will then pass the remaining options onto the underlying filesystem. This is how options like the SELinux context= work. The problem comes in that -o remount never calls into LSM code. So if you include an LSM specific option it will get passed to the filesystem and will cause the remount to fail. An example of where this is a problem is the 'seclabel' option. The SELinux LSM hook will print this word in /proc/mounts if the filesystem is being labeled using xattrs. If you pass this word on mount it will be silently stripped and ignored. But if you pass this word on remount the LSM never gets called and it will be passed to the FS. The FS doesn't know what seclabel means and thus should fail the mount. For example an ext3 fs mounted over loop # mount -o loop /tmp/fs /mnt/tmp # cat /proc/mounts \| grep /mnt/tmp /dev/loop0 /mnt/tmp ext3 rw,seclabel,relatime,errors=continue,barrier=0,data=ordered 0 0 # mount -o remount /mnt/tmp mount: /mnt/tmp not mounted already, or bad option # dmesg EXT3-fs (loop0): error: unrecognized mount option "seclabel" or missing value This patch passes the remount mount options to an new LSM hook. Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: James Morris <jmorris@namei.org>
*	SELinux: Compute SID for the newly created socket	Harry Ciao	2011-03-03
\| \| \| \| \| \| \| \| \| \| \| \|	The security context for the newly created socket shares the same user, role and MLS attribute as its creator but may have a different type, which could be specified by a type_transition rule in the relevant policy package. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> [fix call to security_transition_sid to include qstr, Eric Paris] Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
*	SELinux: Socket retains creator role and MLS attribute	Harry Ciao	2011-03-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The socket SID would be computed on creation and no longer inherit its creator's SID by default. Socket may have a different type but needs to retain the creator's role and MLS attribute in order not to break labeled networking and network access control. The kernel value for a class would be used to determine if the class if one of socket classes. If security_compute_sid is called from userspace the policy value for a class would be mapped to the relevant kernel value first. Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
*	SELinux: Auto-generate security_is_socket_class	Harry Ciao	2011-03-03
\| \| \| \| \| \| \| \| \| \| \|	The security_is_socket_class() is auto-generated by genheaders based on classmap.h to reduce maintenance effort when a new class is defined in SELinux kernel. The name for any socket class should be suffixed by "socket" and doesn't contain more than one substr of "socket". Signed-off-by: Harry Ciao <qingtao.cao@windriver.com> Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
*	Revert "selinux: simplify ioctl checking"	Eric Paris	2011-02-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 242631c49d4cf39642741d6627750151b058233b. Conflicts: security/selinux/hooks.c SELinux used to recognize certain individual ioctls and check permissions based on the knowledge of the individual ioctl. In commit 242631c49d4cf396 the SELinux code stopped trying to understand individual ioctls and to instead looked at the ioctl access bits to determine in we should check read or write for that operation. This same suggestion was made to SMACK (and I believe copied into TOMOYO). But this suggestion is total rubbish. The ioctl access bits are actually the access requirements for the structure being passed into the ioctl, and are completely unrelated to the operation of the ioctl or the object the ioctl is being performed upon. Take FS_IOC_FIEMAP as an example. FS_IOC_FIEMAP is defined as: FS_IOC_FIEMAP _IOWR('f', 11, struct fiemap) So it has access bits R and W. What this really means is that the kernel is going to both read and write to the struct fiemap. It has nothing at all to do with the operations that this ioctl might perform on the file itself! Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
*	selinux: drop unused packet flow permissions	Eric Paris	2011-02-25
\| \| \| \| \| \| \| \| \|	These permissions are not used and can be dropped in the kernel definitions. Suggested-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
*	selinux: Fix packet forwarding checks on postrouting	Steffen Klassert	2011-02-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The IPSKB_FORWARDED and IP6SKB_FORWARDED flags are used only in the multicast forwarding case to indicate that a packet looped back after forward. So these flags are not a good indicator for packet forwarding. A better indicator is the incoming interface. If we have no socket context, but an incoming interface and we see the packet in the ip postroute hook, the packet is going to be forwarded. With this patch we use the incoming interface as an indicator on packet forwarding. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Eric Paris <eparis@redhat.com>
*	selinux: Fix wrong checks for selinux_policycap_netpeer	Steffen Klassert	2011-02-25
\| \| \| \| \| \| \| \| \| \| \| \| \|	selinux_sock_rcv_skb_compat and selinux_ip_postroute_compat are just called if selinux_policycap_netpeer is not set. However in these functions we check if selinux_policycap_netpeer is set. This leads to some dead code and to the fact that selinux_xfrm_postroute_last is never executed. This patch removes the dead code and the checks for selinux_policycap_netpeer in the compatibility functions. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Eric Paris <eparis@redhat.com>
*	selinux: Fix check for xfrm selinux context algorithm	Steffen Klassert	2011-02-25
\| \| \| \| \| \| \| \| \| \|	selinux_xfrm_sec_ctx_alloc accidentally checks the xfrm domain of interpretation against the selinux context algorithm. This patch fixes this by checking ctx_alg against the selinux context algorithm. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Eric Paris <eparis@redhat.com>
*	security: remove unused security_sysctl hook	Lucian Adrian Grijincu	2011-02-01
\| \| \| \| \| \| \| \| \|	The only user for this hook was selinux. sysctl routes every call through /proc/sys/. Selinux and other security modules use the file system checks for sysctl too, so no need for this hook any more. Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
*	security/selinux: fix /proc/sys/ labeling	Lucian Adrian Grijincu	2011-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes an old (2007) selinux regression: filesystem labeling for /proc/sys returned -r--r--r-- unknown /proc/sys/fs/file-nr instead of -r--r--r-- system_u:object_r:sysctl_fs_t:s0 /proc/sys/fs/file-nr Events that lead to breaking of /proc/sys/ selinux labeling: 1) sysctl was reimplemented to route all calls through /proc/sys/ commit 77b14db502cb85a031fe8fde6c85d52f3e0acb63 [PATCH] sysctl: reimplement the sysctl proc support 2) proc_dir_entry was removed from ctl_table: commit 3fbfa98112fc3962c416452a0baf2214381030e6 [PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables 3) selinux still walked the proc_dir_entry tree to apply labeling. Because ctl_tables don't have a proc_dir_entry, we did not label /proc/sys/ inodes any more. To achieve this the /proc/sys/ inodes were marked private and private inodes were ignored by selinux. commit bbaca6c2e7ef0f663bc31be4dad7cf530f6c4962 [PATCH] selinux: enhance selinux to always ignore private inodes commit 86a71dbd3e81e8870d0f0e56b87875f57e58222b [PATCH] sysctl: hide the sysctl proc inodes from selinux Access control checks have been done by means of a special sysctl hook that was called for read/write accesses to any /proc/sys/ entry. We don't have to do this because, instead of walking the proc_dir_entry tree we can walk the dentry tree (as done in this patch). With this patch: * we don't mark /proc/sys/ inodes as private * we don't need the sysclt security hook * we walk the dentry tree to find the path to the inode. We have to strip the PID in /proc/PID/ entries that have a proc_dir_entry because selinux does not know how to label paths like '/1/net/rpc/nfsd.fh' (and defaults to 'proc_t' labeling). Selinux does know of '/net/rpc/nfsd.fh' (and applies the 'sysctl_rpc_t' label). PID stripping from the path was done implicitly in the previous code because the proc_dir_entry tree had the root in '/net' in the example from above. The dentry tree has the root in '/1'. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
*	SELinux: Use dentry name in new object labeling	Eric Paris	2011-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently SELinux has rules which label new objects according to 3 criteria. The label of the process creating the object, the label of the parent directory, and the type of object (reg, dir, char, block, etc.) This patch adds a 4th criteria, the dentry name, thus we can distinguish between creating a file in an etc_t directory called shadow and one called motd. There is no file globbing, regex parsing, or anything mystical. Either the policy exactly (strcmp) matches the dentry name of the object or it doesn't. This patch has no changes from today if policy does not implement the new rules. Signed-off-by: Eric Paris <eparis@redhat.com>
*	fs/vfs/security: pass last path component to LSM on inode creation	Eric Paris	2011-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SELinux would like to implement a new labeling behavior of newly created inodes. We currently label new inodes based on the parent and the creating process. This new behavior would also take into account the name of the new object when deciding the new label. This is not the (supposed) full path, just the last component of the path. This is very useful because creating /etc/shadow is different than creating /etc/passwd but the kernel hooks are unable to differentiate these operations. We currently require that userspace realize it is doing some difficult operation like that and than userspace jumps through SELinux hoops to get things set up correctly. This patch does not implement new behavior, that is obviously contained in a seperate SELinux patch, but it does pass the needed name down to the correct LSM hook. If no such name exists it is fine to pass NULL. Signed-off-by: Eric Paris <eparis@redhat.com>
*	CacheFiles: Add calls to path-based security hooks	David Howells	2011-01-23
\| \| \| \| \| \| \| \| \|	Add calls to path-based security hooks into CacheFiles as, unlike inode-based security, these aren't implicit in the vfs_mkdir() and similar calls. Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
*	security:selinux: kill unused MAX_AVTAB_HASH_MASK and ebitmap_startbit	Shan Wei	2011-01-23
\| \| \| \| \| \| \|	Kill unused MAX_AVTAB_HASH_MASK and ebitmap_startbit. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: James Morris <jmorris@namei.org>
*	Subject: [PATCH] Smack: mmap controls for library containment	Casey Schaufler	2011-01-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the embedded world there are often situations where libraries are updated from a variety of sources, for a variety of reasons, and with any number of security characteristics. These differences might include privilege required for a given library provided interface to function properly, as occurs from time to time in graphics libraries. There are also cases where it is important to limit use of libraries based on the provider of the library and the security aware application may make choices based on that criteria. These issues are addressed by providing an additional Smack label that may optionally be assigned to an object, the SMACK64MMAP attribute. An mmap operation is allowed if there is no such attribute. If there is a SMACK64MMAP attribute the mmap is permitted only if a subject with that label has all of the access permitted a subject with the current task label. Security aware applications may from time to time wish to reduce their "privilege" to avoid accidental use of privilege. One case where this arises is the environment in which multiple sources provide libraries to perform the same functions. An application may know that it should eschew services made available from a particular vendor, or of a particular version. In support of this a secondary list of Smack rules has been added that is local to the task. This list is consulted only in the case where the global list has approved access. It can only further restrict access. Unlike the global last, if no entry is found on the local list access is granted. An application can add entries to its own list by writing to /smack/load-self. The changes appear large as they involve refactoring the list handling to accomodate there being more than one rule list. Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
*	Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into next	James Morris	2011-01-09
\|\
\| *	SELinux: define permissions for DCB netlink messages	Eric Paris	2010-12-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 2f90b865 added two new netlink message types to the netlink route socket. SELinux has hooks to define if netlink messages are allowed to be sent or received, but it did not know about these two new message types. By default we allow such actions so noone likely noticed. This patch adds the proper definitions and thus proper permissions enforcement. Signed-off-by: Eric Paris <eparis@redhat.com>
\| *	selinux: cache sidtab_context_to_sid results	Eric Paris	2010-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sidtab_context_to_sid takes up a large share of time when creating large numbers of new inodes (~30-40% in oprofile runs). This patch implements a cache of 3 entries which is checked before we do a full context_to_sid lookup. On one system this showed over a x3 improvement in the number of inodes that could be created per second and around a 20% improvement on another system. Any time we look up the same context string sucessivly (imagine ls -lZ) we should hit this cache hot. A cache miss should have a relatively minor affect on performance next to doing the full table search. All operations on the cache are done COMPLETELY lockless. We know that all struct sidtab_node objects created will never be deleted until a new policy is loaded thus we never have to worry about a pointer being dereferenced. Since we also know that pointer assignment is atomic we know that the cache will always have valid pointers. Given this information we implement a FIFO cache in an array of 3 pointers. Every result (whether a cache hit or table lookup) will be places in the 0 spot of the cache and the rest of the entries moved down one spot. The 3rd entry will be lost. Races are possible and are even likely to happen. Lets assume that 4 tasks are hitting sidtab_context_to_sid. The first task checks against the first entry in the cache and it is a miss. Now lets assume a second task updates the cache with a new entry. This will push the first entry back to the second spot. Now the first task might check against the second entry (which it already checked) and will miss again. Now say some third task updates the cache and push the second entry to the third spot. The first task my check the third entry (for the third time!) and again have a miss. At which point it will just do a full table lookup. No big deal! Signed-off-by: Eric Paris <eparis@redhat.com>
\| *	SELinux: do not compute transition labels on mountpoint labeled filesystems	Eric Paris	2010-12-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	selinux_inode_init_security computes transitions sids even for filesystems that use mount point labeling. It shouldn't do that. It should just use the mount point label always and no matter what. This causes 2 problems. 1) it makes file creation slower than it needs to be since we calculate the transition sid and 2) it allows files to be created with a different label than the mount point! # id -Z staff_u:sysadm_r:sysadm_t:s0-s0:c0.c1023 # sesearch --type --class file --source sysadm_t --target tmp_t Found 1 semantic te rules: type_transition sysadm_t tmp_t : file user_tmp_t; # mount -o loop,context="system_u:object_r:tmp_t:s0" /tmp/fs /mnt/tmp # ls -lZ /mnt/tmp drwx------. root root system_u:object_r:tmp_t:s0 lost+found # touch /mnt/tmp/file1 # ls -lZ /mnt/tmp -rw-r--r--. root root staff_u:object_r:user_tmp_t:s0 file1 drwx------. root root system_u:object_r:tmp_t:s0 lost+found Whoops, we have a mount point labeled filesystem tmp_t with a user_tmp_t labeled file! Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Reviewed-by: James Morris <jmorris@namei.org>
\| *	SELinux: merge policydb_index_classes and policydb_index_others	Eric Paris	2010-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We duplicate functionality in policydb_index_classes() and policydb_index_others(). This patch merges those functions just to make it clear there is nothing special happening here. Signed-off-by: Eric Paris <eparis@redhat.com>
\| *	selinux: convert part of the sym_val_to_name array to use flex_array	Eric Paris	2010-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The sym_val_to_name type array can be quite large as it grows linearly with the number of types. With known policies having over 5k types these allocations are growing large enough that they are likely to fail. Convert those to flex_array so no allocation is larger than PAGE_SIZE Signed-off-by: Eric Paris <eparis@redhat.com>
\| *	selinux: convert type_val_to_struct to flex_array	Eric Paris	2010-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In rawhide type_val_to_struct will allocate 26848 bytes, an order 3 allocations. While this hasn't been seen to fail it isn't outside the realm of possibiliy on systems with severe memory fragmentation. Convert to flex_array so no allocation will ever be bigger than PAGE_SIZE. Signed-off-by: Eric Paris <eparis@redhat.com>
\| *	SELinux: do not set automatic i_ino in selinuxfs	Eric Paris	2010-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	selinuxfs carefully uses i_ino to figure out what the inode refers to. The VFS used to generically set this value and we would reset it to something useable. After 85fe4025c616 each filesystem sets this value to a default if needed. Since selinuxfs doesn't use the default value and it can only lead to problems (I'd rather have 2 inodes with i_ino == 0 than one pointing to the wrong data) lets just stop setting a default. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: James Morris <jmorris@namei.org>
\| *	selinux: rework security_netlbl_secattr_to_sid	Eric Paris	2010-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	security_netlbl_secattr_to_sid is difficult to follow, especially the return codes. Try to make the function obvious. Signed-off-by: Eric Paris <eparis@redhat.com>
\| *	SELinux: standardize return code handling in selinuxfs.c	Eric Paris	2010-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	selinuxfs.c has lots of different standards on how to handle return paths on error. For the most part transition to rc=errno if (failure) goto out; [...] out: cleanup() return rc; Instead of doing cleanup mid function, or having multiple returns or other options. This doesn't do that for every function, but most of the complex functions which have cleanup routines on error. Signed-off-by: Eric Paris <eparis@redhat.com>
\| *	SELinux: standardize return code handling in selinuxfs.c	Eric Paris	2010-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	selinuxfs.c has lots of different standards on how to handle return paths on error. For the most part transition to rc=errno if (failure) goto out; [...] out: cleanup() return rc; Instead of doing cleanup mid function, or having multiple returns or other options. This doesn't do that for every function, but most of the complex functions which have cleanup routines on error. Signed-off-by: Eric Paris <eparis@redhat.com>
\| *	SELinux: standardize return code handling in policydb.c	Eric Paris	2010-11-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	policydb.c has lots of different standards on how to handle return paths on error. For the most part transition to rc=errno if (failure) goto out; [...] out: cleanup() return rc; Instead of doing cleanup mid function, or having multiple returns or other options. This doesn't do that for every function, but most of the complex functions which have cleanup routines on error. Signed-off-by: Eric Paris <eparis@redhat.com>
* \|	Merge branch 'master' into next	James Morris	2011-01-09
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: security/smack/smack_lsm.c Verified and added fix by Stephen Rothwell <sfr@canb.auug.org.au> Ok'd by Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: James Morris <jmorris@namei.org>
\| * \	Merge branch 'vfs-scale-working' of ↵	Linus Torvalds	2011-01-07
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits) fs: scale mntget/mntput fs: rename vfsmount counter helpers fs: implement faster dentry memcmp fs: prefetch inode data in dcache lookup fs: improve scalability of pseudo filesystems fs: dcache per-inode inode alias locking fs: dcache per-bucket dcache hash locking bit_spinlock: add required includes kernel: add bl_list xfs: provide simple rcu-walk ACL implementation btrfs: provide simple rcu-walk ACL implementation ext2,3,4: provide simple rcu-walk ACL implementation fs: provide simple rcu-walk generic_check_acl implementation fs: provide rcu-walk aware permission i_ops fs: rcu-walk aware d_revalidate method fs: cache optimise dentry and inode for rcu-walk fs: dcache reduce branches in lookup path fs: dcache remove d_mounted fs: fs_struct use seqlock fs: rcu-walk for path lookup ...
\| \| * \|	fs: rcu-walk for path lookup	Nick Piggin	2011-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Perform common cases of path lookups without any stores or locking in the ancestor dentry elements. This is called rcu-walk, as opposed to the current algorithm which is a refcount based walk, or ref-walk. This results in far fewer atomic operations on every path element, significantly improving path lookup performance. It also avoids cacheline bouncing on common dentries, significantly improving scalability. The overall design is like this: * LOOKUP_RCU is set in nd->flags, which distinguishes rcu-walk from ref-walk. * Take the RCU lock for the entire path walk, starting with the acquiring of the starting path (eg. root/cwd/fd-path). So now dentry refcounts are not required for dentry persistence. * synchronize_rcu is called when unregistering a filesystem, so we can access d_ops and i_ops during rcu-walk. * Similarly take the vfsmount lock for the entire path walk. So now mnt refcounts are not required for persistence. Also we are free to perform mount lookups, and to assume dentry mount points and mount roots are stable up and down the path. * Have a per-dentry seqlock to protect the dentry name, parent, and inode, so we can load this tuple atomically, and also check whether any of its members have changed. * Dentry lookups (based on parent, candidate string tuple) recheck the parent sequence after the child is found in case anything changed in the parent during the path walk. * inode is also RCU protected so we can load d_inode and use the inode for limited things. * i_mode, i_uid, i_gid can be tested for exec permissions during path walk. * i_op can be loaded. When we reach the destination dentry, we lock it, recheck lookup sequence, and increment its refcount and mountpoint refcount. RCU and vfsmount locks are dropped. This is termed "dropping rcu-walk". If the dentry refcount does not match, we can not drop rcu-walk gracefully at the current point in the lokup, so instead return -ECHILD (for want of a better errno). This signals the path walking code to re-do the entire lookup with a ref-walk. Aside from the final dentry, there are other situations that may be encounted where we cannot continue rcu-walk. In that case, we drop rcu-walk (ie. take a reference on the last good dentry) and continue with a ref-walk. Again, if we can drop rcu-walk gracefully, we return -ECHILD and do the whole lookup using ref-walk. But it is very important that we can continue with ref-walk for most cases, particularly to avoid the overhead of double lookups, and to gain the scalability advantages on common path elements (like cwd and root). The cases where rcu-walk cannot continue are: * NULL dentry (ie. any uncached path element) * parent with d_inode->i_op->permission or ACLs * dentries with d_revalidate * Following links In future patches, permission checks and d_revalidate become rcu-walk aware. It may be possible eventually to make following links rcu-walk aware. Uncached path elements will always require dropping to ref-walk mode, at the very least because i_mutex needs to be grabbed, and objects allocated. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
\| \| * \|	fs: dcache rationalise dget variants	Nick Piggin	2011-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dget_locked was a shortcut to avoid the lazy lru manipulation when we already held dcache_lock (lru manipulation was relatively cheap at that point). However, how that the lru lock is an innermost one, we never hold it at any caller, so the lock cost can now be avoided. We already have well working lazy dcache LRU, so it should be fine to defer LRU manipulations to scan time. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
\| \| * \|	fs: dcache remove dcache_lock	Nick Piggin	2011-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dcache_lock no longer protects anything. remove it. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
\| \| * \|	fs: dcache scale subdirs	Nick Piggin	2011-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). Note: if we change the locking rule in future so that ->d_child protection is provided only with ->d_parent->d_lock, it may allow us to reduce some locking. But it would be an exception to an otherwise regular locking scheme, so we'd have to see some good results. Probably not worthwhile. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
\| \| * \|	fs: dcache scale d_unhashed	Nick Piggin	2011-01-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Protect d_unhashed(dentry) condition with d_lock. This means keeping DCACHE_UNHASHED bit in synch with hash manipulations. Signed-off-by: Nick Piggin <npiggin@kernel.dk>
\| * \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6	Linus Torvalds	2011-01-06
\| \|\ \ \ \| \| \|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1436 commits) cassini: Use local-mac-address prom property for Cassini MAC address net: remove the duplicate #ifdef __KERNEL__ net: bridge: check the length of skb after nf_bridge_maybe_copy_header() netconsole: clarify stopping message netconsole: don't announce stopping if nothing happened cnic: Fix the type field in SPQ messages netfilter: fix export secctx error handling netfilter: fix the race when initializing nf_ct_expect_hash_rnd ipv4: IP defragmentation must be ECN aware net: r6040: Return proper error for r6040_init_one dcb: use after free in dcb_flushapp() dcb: unlock on error in dcbnl_ieee_get() net: ixp4xx_eth: Return proper error for eth_init_one include/linux/if_ether.h: Add #define ETH_P_LINK_CTL for HPNA and wlan local tunnel net: add POLLPRI to sock_def_readable() af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks. net_sched: pfifo_head_drop problem mac80211: remove stray extern mac80211: implement off-channel TX using hw r-o-c offload mac80211: implement hardware offload for remain-on-channel ...
\| \| * \|	af_unix: Avoid socket->sk NULL OOPS in stream connect security hooks.	David S. Miller	2011-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	unix_release() can asynchornously set socket->sk to NULL, and it does so without holding the unix_state_lock() on "other" during stream connects. However, the reverse mapping, sk->sk_socket, is only transitioned to NULL under the unix_state_lock(). Therefore make the security hooks follow the reverse mapping instead of the forward mapping. Reported-by: Jeremy Fitzhardinge <jeremy@goop.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \|	Merge branch 'master' of ↵	David S. Miller	2010-12-27
\| \| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/fib_frontend.c
\| \| * \| \|	SELinux: indicate fatal error in compat netfilter code	Eric Paris	2010-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SELinux ip postroute code indicates when policy rejected a packet and passes the error back up the stack. The compat code does not. This patch sends the same kind of error back up the stack in the compat code. Based-on-patch-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \| \|	SELinux: Only return netlink error when we know the return is fatal	Eric Paris	2010-11-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the SELinux netlink code returns a fatal error when the error might actually be transient. This patch just silently drops packets on potentially transient errors but continues to return a permanant error indicator when the denial was because of policy. Based-on-comments-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Eric Paris <eparis@redhat.com> Reviewed-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| \| * \| \|	SELinux: return -ECONNREFUSED from ip_postroute to signal fatal error	Eric Paris	2010-11-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The SELinux netfilter hooks just return NF_DROP if they drop a packet. We want to signal that a drop in this hook is a permanant fatal error and is not transient. If we do this the error will be passed back up the stack in some places and applications will get a faster interaction that something went wrong. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \|	ima: fix add LSM rule bug	Mimi Zohar	2011-01-03
\| \| \|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If security_filter_rule_init() doesn't return a rule, then not everything is as fine as the return code implies. This bug only occurs when the LSM (eg. SELinux) is disabled at runtime. Adding an empty LSM rule causes ima_match_rules() to always succeed, ignoring any remaining rules. default IMA TCB policy: # PROC_SUPER_MAGIC dont_measure fsmagic=0x9fa0 # SYSFS_MAGIC dont_measure fsmagic=0x62656572 # DEBUGFS_MAGIC dont_measure fsmagic=0x64626720 # TMPFS_MAGIC dont_measure fsmagic=0x01021994 # SECURITYFS_MAGIC dont_measure fsmagic=0x73636673 < LSM specific rule > dont_measure obj_type=var_log_t measure func=BPRM_CHECK measure func=FILE_MMAP mask=MAY_EXEC measure func=FILE_CHECK mask=MAY_READ uid=0 Thus without the patch, with the boot parameters 'tcb selinux=0', adding the above 'dont_measure obj_type=var_log_t' rule to the default IMA TCB measurement policy, would result in nothing being measured. The patch prevents the default TCB policy from being replaced. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Cc: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Cc: David Safford <safford@watson.ibm.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
\| * \| \|	KEYS: Don't call up_write() if __key_link_begin() returns an error	David Howells	2010-12-23
\| \| \|/ \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In construct_alloc_key(), up_write() is called in the error path if __key_link_begin() fails, but this is incorrect as __key_link_begin() only returns with the nominated keyring locked if it returns successfully. Without this patch, you might see the following in dmesg: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- mount.cifs/5769 is trying to release lock (&key->sem) at: [<ffffffff81201159>] request_key_and_link+0x263/0x3fc but there are no more locks to release! other info that might help us debug this: 3 locks held by mount.cifs/5769: #0: (&type->s_umount_key#41/1){+.+.+.}, at: [<ffffffff81131321>] sget+0x278/0x3e7 #1: (&ret_buf->session_mutex){+.+.+.}, at: [<ffffffffa0258e59>] cifs_get_smb_ses+0x35a/0x443 [cifs] #2: (root_key_user.cons_lock){+.+.+.}, at: [<ffffffff81201000>] request_key_and_link+0x10a/0x3fc stack backtrace: Pid: 5769, comm: mount.cifs Not tainted 2.6.37-rc6+ #1 Call Trace: [<ffffffff81201159>] ? request_key_and_link+0x263/0x3fc [<ffffffff81081601>] print_unlock_inbalance_bug+0xca/0xd5 [<ffffffff81083248>] lock_release_non_nested+0xc1/0x263 [<ffffffff81201159>] ? request_key_and_link+0x263/0x3fc [<ffffffff81201159>] ? request_key_and_link+0x263/0x3fc [<ffffffff81083567>] lock_release+0x17d/0x1a4 [<ffffffff81073f45>] up_write+0x23/0x3b [<ffffffff81201159>] request_key_and_link+0x263/0x3fc [<ffffffffa026fe9e>] ? cifs_get_spnego_key+0x61/0x21f [cifs] [<ffffffff812013c5>] request_key+0x41/0x74 [<ffffffffa027003d>] cifs_get_spnego_key+0x200/0x21f [cifs] [<ffffffffa026e296>] CIFS_SessSetup+0x55d/0x1273 [cifs] [<ffffffffa02589e1>] cifs_setup_session+0x90/0x1ae [cifs] [<ffffffffa0258e7e>] cifs_get_smb_ses+0x37f/0x443 [cifs] [<ffffffffa025a9e3>] cifs_mount+0x1aa1/0x23f3 [cifs] [<ffffffff8111fd94>] ? alloc_debug_processing+0xdb/0x120 [<ffffffffa027002c>] ? cifs_get_spnego_key+0x1ef/0x21f [cifs] [<ffffffffa024cc71>] cifs_do_mount+0x165/0x2b3 [cifs] [<ffffffff81130e72>] vfs_kern_mount+0xaf/0x1dc [<ffffffff81131007>] do_kern_mount+0x4d/0xef [<ffffffff811483b9>] do_mount+0x6f4/0x733 [<ffffffff8114861f>] sys_mount+0x88/0xc2 [<ffffffff8100ac42>] system_call_fastpath+0x16/0x1b Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-and-Tested-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	encrypted-keys: style and other cleanup	Mimi Zohar	2010-12-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cleanup based on David Howells suggestions: - use static const char arrays instead of #define - rename init_sdesc to alloc_sdesc - convert 'unsigned int' definitions to 'size_t' - revert remaining 'const unsigned int' definitions to 'unsigned int' Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* \| \|	encrypted-keys: verify datablob size before converting to binary	Mimi Zohar	2010-12-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Verify the hex ascii datablob length is correct before converting the IV, encrypted data, and HMAC to binary. Reported-by: David Howells <dhowells@redhat.com> Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* \| \|	trusted-keys: kzalloc and other cleanup	Mimi Zohar	2010-12-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cleanup based on David Howells suggestions: - replace kzalloc, where possible, with kmalloc - revert 'const unsigned int' definitions to 'unsigned int' Signed-off-by: David Safford <safford@watson.ibm.com> Acked-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* \| \|	trusted-keys: additional TSS return code and other error handling	Mimi Zohar	2010-12-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously not all TSS return codes were tested, as they were all eventually caught by the TPM. Now all returns are tested and handled immediately. This patch also fixes memory leaks in error and non-error paths. Signed-off-by: David Safford <safford@watson.ibm.com> Acked-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: David Howells <dhowells@redhat.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: James Morris <jmorris@namei.org>
* \| \|	Smack: Transmute labels on specified directories	Jarkko Sakkinen	2010-12-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In a situation where Smack access rules allow processes with multiple labels to write to a directory it is easy to get into a situation where the directory gets cluttered with files that the owner can't deal with because while they could be written to the directory a process at the label of the directory can't write them. This is generally the desired behavior, but when it isn't it is a real issue. This patch introduces a new attribute SMACK64TRANSMUTE that instructs Smack to create the file with the label of the directory under certain circumstances. A new access mode, "t" for transmute, is made available to Smack access rules, which are expanded from "rwxa" to "rwxat". If a file is created in a directory marked as transmutable and if access was granted to perform the operation by a rule that included the transmute mode, then the file gets the Smack label of the directory instead of the Smack label of the creating process. Note that this is equivalent to creating an empty file at the label of the directory and then having the other process write to it. The transmute scheme requires that both the access rule allows transmutation and that the directory be explicitly marked. Signed-off-by: Jarkko Sakkinen <ext-jarkko.2.sakkinen@nokia.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
* \| \|	This patch adds a new security attribute to Smack called	Casey Schaufler	2010-12-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SMACK64EXEC. It defines label that is used while task is running. Exception: in smack_task_wait() child task is checked for write access to parent task using label inherited from the task that forked it. Fixed issues from previous submit: - SMACK64EXEC was not read when SMACK64 was not set. - inode security blob was not updated after setting SMACK64EXEC - inode security blob was not updated when removing SMACK64EXEC