From abffc0207f12563f17bbde96e4cc0d9f3d7e2a53 Mon Sep 17 00:00:00 2001 From: Andrea Righi Date: Wed, 27 Oct 2010 15:33:31 -0700 Subject: doc: clarify the behaviour of dirty_ratio/dirty_bytes When dirty_ratio or dirty_bytes is written the other parameter is disabled and set to 0 (in dirty_bytes_handler() / dirty_ratio_handler()). We do the same for dirty_background_ratio and dirty_background_bytes. However, in the sysctl documentation, we say that the counterpart becomes a function of the old value, that is not correct. Clarify the documentation reporting the actual behaviour. Reviewed-by: Greg Thelen Acked-by: David Rientjes Signed-off-by: Andrea Righi Cc: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/vm.txt | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index b606c2c4dd37..30289fab86eb 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -80,8 +80,10 @@ dirty_background_bytes Contains the amount of dirty memory at which the pdflush background writeback daemon will start writeback. -If dirty_background_bytes is written, dirty_background_ratio becomes a function -of its value (dirty_background_bytes / the amount of dirtyable system memory). +Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only +one of them may be specified at a time. When one sysctl is written it is +immediately taken into account to evaluate the dirty memory limits and the +other appears as 0 when read. ============================================================== @@ -97,8 +99,10 @@ dirty_bytes Contains the amount of dirty memory at which a process generating disk writes will itself start writeback. -If dirty_bytes is written, dirty_ratio becomes a function of its value -(dirty_bytes / the amount of dirtyable system memory). +Note: dirty_bytes is the counterpart of dirty_ratio. Only one of them may be +specified at a time. When one sysctl is written it is immediately taken into +account to evaluate the dirty memory limits and the other appears as 0 when +read. Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any value lower than this limit will be ignored and the old configuration will be -- cgit v1.2.2 From eaf06b241b091357e72b76863ba16e89610d31bd Mon Sep 17 00:00:00 2001 From: Dan Rosenberg Date: Thu, 11 Nov 2010 14:05:18 -0800 Subject: Restrict unprivileged access to kernel syslog The kernel syslog contains debugging information that is often useful during exploitation of other vulnerabilities, such as kernel heap addresses. Rather than futilely attempt to sanitize hundreds (or thousands) of printk statements and simultaneously cripple useful debugging functionality, it is far simpler to create an option that prevents unprivileged users from reading the syslog. This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the dmesg_restrict sysctl. When set to "0", the default, no restrictions are enforced. When set to "1", only users with CAP_SYS_ADMIN can read the kernel syslog via dmesg(8) or other mechanisms. [akpm@linux-foundation.org: explain the config option in kernel.txt] Signed-off-by: Dan Rosenberg Acked-by: Ingo Molnar Acked-by: Eugene Teo Acked-by: Kees Cook Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/kernel.txt | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 3894eaa23486..209e1584c3dc 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -28,6 +28,7 @@ show up in /proc/sys/kernel: - core_uses_pid - ctrl-alt-del - dentry-state +- dmesg_restrict - domainname - hostname - hotplug @@ -213,6 +214,19 @@ to decide what to do with it. ============================================================== +dmesg_restrict: + +This toggle indicates whether unprivileged users are prevented from using +dmesg(8) to view messages from the kernel's log buffer. When +dmesg_restrict is set to (0) there are no restrictions. When +dmesg_restrict is set set to (1), users must have CAP_SYS_ADMIN to use +dmesg(8). + +The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the default +value of dmesg_restrict. + +============================================================== + domainname & hostname: These files can be used to set the NIS/YP domainname and the -- cgit v1.2.2 From 38ef4c2e437d11b5922723504b62824e96761459 Mon Sep 17 00:00:00 2001 From: "Serge E. Hallyn" Date: Wed, 8 Dec 2010 15:19:01 +0000 Subject: syslog: check cap_syslog when dmesg_restrict Eric Paris pointed out that it doesn't make sense to require both CAP_SYS_ADMIN and CAP_SYSLOG for certain syslog actions. So require CAP_SYSLOG, not CAP_SYS_ADMIN, when dmesg_restrict is set. (I'm also consolidating the now common error path) Signed-off-by: Serge E. Hallyn Acked-by: Eric Paris Acked-by: Kees Cook Signed-off-by: James Morris --- Documentation/sysctl/kernel.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 209e1584c3dc..574067194f38 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -219,7 +219,7 @@ dmesg_restrict: This toggle indicates whether unprivileged users are prevented from using dmesg(8) to view messages from the kernel's log buffer. When dmesg_restrict is set to (0) there are no restrictions. When -dmesg_restrict is set set to (1), users must have CAP_SYS_ADMIN to use +dmesg_restrict is set set to (1), users must have CAP_SYSLOG to use dmesg(8). The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the default -- cgit v1.2.2 From 455cd5ab305c90ffc422dd2e0fb634730942b257 Mon Sep 17 00:00:00 2001 From: Dan Rosenberg Date: Wed, 12 Jan 2011 16:59:41 -0800 Subject: kptr_restrict for hiding kernel pointers from unprivileged users Add the %pK printk format specifier and the /proc/sys/kernel/kptr_restrict sysctl. The %pK format specifier is designed to hide exposed kernel pointers, specifically via /proc interfaces. Exposing these pointers provides an easy target for kernel write vulnerabilities, since they reveal the locations of writable structures containing easily triggerable function pointers. The behavior of %pK depends on the kptr_restrict sysctl. If kptr_restrict is set to 0, no deviation from the standard %p behavior occurs. If kptr_restrict is set to 1, the default, if the current user (intended to be a reader via seq_printf(), etc.) does not have CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are printed as 0's. If kptr_restrict is set to 2, kernel pointers using %pK are printed as 0's regardless of privileges. Replacing with 0's was chosen over the default "(null)", which cannot be parsed by userland %p, which expects "(nil)". [akpm@linux-foundation.org: check for IRQ context when !kptr_restrict, save an indent level, s/WARN/WARN_ONCE/] [akpm@linux-foundation.org: coding-style fixup] [randy.dunlap@oracle.com: fix kernel/sysctl.c warning] Signed-off-by: Dan Rosenberg Signed-off-by: Randy Dunlap Cc: James Morris Cc: Eric Dumazet Cc: Thomas Graf Cc: Eugene Teo Cc: Kees Cook Cc: Ingo Molnar Cc: David S. Miller Cc: Peter Zijlstra Cc: Eric Paris Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/kernel.txt | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 574067194f38..11d5ceda5bb0 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -34,6 +34,7 @@ show up in /proc/sys/kernel: - hotplug - java-appletviewer [ binfmt_java, obsolete ] - java-interpreter [ binfmt_java, obsolete ] +- kptr_restrict - kstack_depth_to_print [ X86 only ] - l2cr [ PPC only ] - modprobe ==> Documentation/debugging-modules.txt @@ -261,6 +262,19 @@ This flag controls the L2 cache of G3 processor boards. If ============================================================== +kptr_restrict: + +This toggle indicates whether restrictions are placed on +exposing kernel addresses via /proc and other interfaces. When +kptr_restrict is set to (0), there are no restrictions. When +kptr_restrict is set to (1), the default, kernel pointers +printed using the %pK format specifier will be replaced with 0's +unless the user has CAP_SYSLOG. When kptr_restrict is set to +(2), kernel pointers printed using %pK will be replaced with 0's +regardless of privileges. + +============================================================== + kstack_depth_to_print: (X86 only) Controls the number of words to print when dumping the raw -- cgit v1.2.2 From e020e742e5dbd8c44d31706995dc13ddc732e274 Mon Sep 17 00:00:00 2001 From: Jovi Zhang Date: Wed, 12 Jan 2011 17:00:45 -0800 Subject: sysctl: remove obsolete comments ctl_unnumbered.txt have been removed in Documentation directory so just also remove this invalid comments [akpm@linux-foundation.org: fix Documentation/sysctl/00-INDEX, per Dave] Signed-off-by: Jovi Zhang Cc: Dave Young Acked-by: WANG Cong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/00-INDEX | 2 -- 1 file changed, 2 deletions(-) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/00-INDEX b/Documentation/sysctl/00-INDEX index 1286f455992f..8cf5d493fd03 100644 --- a/Documentation/sysctl/00-INDEX +++ b/Documentation/sysctl/00-INDEX @@ -4,8 +4,6 @@ README - general information about /proc/sys/ sysctl files. abi.txt - documentation for /proc/sys/abi/*. -ctl_unnumbered.txt - - explanation of why one should not add new binary sysctl numbers. fs.txt - documentation for /proc/sys/fs/*. kernel.txt -- cgit v1.2.2 From 87889e158f59bbe8d40e88cf9de76e7d7f266498 Mon Sep 17 00:00:00 2001 From: Paul Bolle Date: Sun, 6 Feb 2011 21:00:41 +0100 Subject: Documentation: default_message_level is a typo It's default_message_loglevel, not default_message_level. Signed-off-by: Paul Bolle Signed-off-by: Jiri Kosina --- Documentation/sysctl/kernel.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 11d5ceda5bb0..36f007514db3 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -367,7 +367,7 @@ the different loglevels. - console_loglevel: messages with a higher priority than this will be printed to the console -- default_message_level: messages without an explicit priority +- default_message_loglevel: messages without an explicit priority will be printed with this priority - minimum_console_loglevel: minimum (highest) value to which console_loglevel can be set -- cgit v1.2.2 From ca3b78aa1672162f93de90cbf5051edea298a290 Mon Sep 17 00:00:00 2001 From: Federica Teodori Date: Tue, 15 Mar 2011 16:12:05 -0700 Subject: Documentation: file handles are now freed Since file handles are freed, a little amendment to the documentation Signed-off-by: Federica Teodori Acked-by: Rik van Riel Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/sysctl/fs.txt | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 62682500878a..4af0614147ef 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -88,20 +88,19 @@ you might want to raise the limit. file-max & file-nr: -The kernel allocates file handles dynamically, but as yet it -doesn't free them again. - The value in file-max denotes the maximum number of file- handles that the Linux kernel will allocate. When you get lots of error messages about running out of file handles, you might want to increase this limit. -Historically, the three values in file-nr denoted the number of -allocated file handles, the number of allocated but unused file -handles, and the maximum number of file handles. Linux 2.6 always -reports 0 as the number of free file handles -- this is not an -error, it just means that the number of allocated file handles -exactly matches the number of used file handles. +Historically,the kernel was able to allocate file handles +dynamically, but not to free them again. The three values in +file-nr denote the number of allocated file handles, the number +of allocated but unused file handles, and the maximum number of +file handles. Linux 2.6 always reports 0 as the number of free +file handles -- this is not an error, it just means that the +number of allocated file handles exactly matches the number of +used file handles. Attempts to allocate more file descriptors than file-max are reported with printk, look for "VFS: file-max limit -- cgit v1.2.2 From 5a3016a61530ea171c1b8ab23d7f651de919e39f Mon Sep 17 00:00:00 2001 From: Paul Bolle Date: Wed, 6 Apr 2011 11:09:55 +0200 Subject: doc: fix 3 typos in sysctl/vm.txt Signed-off-by: Paul Bolle Signed-off-by: Jiri Kosina --- Documentation/sysctl/vm.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 30289fab86eb..96f0ee825bed 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt @@ -481,10 +481,10 @@ the DMA zone. Type(A) is called as "Node" order. Type (B) is "Zone" order. "Node order" orders the zonelists by node, then by zone within each node. -Specify "[Nn]ode" for zone order +Specify "[Nn]ode" for node order "Zone Order" orders the zonelists by zone type, then by node within each -zone. Specify "[Zz]one"for zode order. +zone. Specify "[Zz]one" for zone order. Specify "[Dd]efault" to request automatic configuration. Autoconfiguration will select "node" order in following case. -- cgit v1.2.2 From 0a14842f5a3c0e88a1e59fac5c3025db39721f74 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 20 Apr 2011 09:27:32 +0000 Subject: net: filter: Just In Time compiler for x86-64 In order to speedup packet filtering, here is an implementation of a JIT compiler for x86_64 It is disabled by default, and must be enabled by the admin. echo 1 >/proc/sys/net/core/bpf_jit_enable It uses module_alloc() and module_free() to get memory in the 2GB text kernel range since we call helpers functions from the generated code. EAX : BPF A accumulator EBX : BPF X accumulator RDI : pointer to skb (first argument given to JIT function) RBP : frame pointer (even if CONFIG_FRAME_POINTER=n) r9d : skb->len - skb->data_len (headlen) r8 : skb->data To get a trace of generated code, use : echo 2 >/proc/sys/net/core/bpf_jit_enable Example of generated code : # tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24 flen=18 proglen=147 pass=3 image=ffffffffa00b5000 JIT code: ffffffffa00b5000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 60 JIT code: ffffffffa00b5010: 44 2b 4f 64 4c 8b 87 b8 00 00 00 be 0c 00 00 00 JIT code: ffffffffa00b5020: e8 24 7b f7 e0 3d 00 08 00 00 75 28 be 1a 00 00 JIT code: ffffffffa00b5030: 00 e8 fe 7a f7 e0 24 00 3d 00 14 a8 c0 74 49 be JIT code: ffffffffa00b5040: 1e 00 00 00 e8 eb 7a f7 e0 24 00 3d 00 14 a8 c0 JIT code: ffffffffa00b5050: 74 36 eb 3b 3d 06 08 00 00 74 07 3d 35 80 00 00 JIT code: ffffffffa00b5060: 75 2d be 1c 00 00 00 e8 c8 7a f7 e0 24 00 3d 00 JIT code: ffffffffa00b5070: 14 a8 c0 74 13 be 26 00 00 00 e8 b5 7a f7 e0 24 JIT code: ffffffffa00b5080: 00 3d 00 14 a8 c0 75 07 b8 ff ff 00 00 eb 02 31 JIT code: ffffffffa00b5090: c0 c9 c3 BPF program is 144 bytes long, so native program is almost same size ;) (000) ldh [12] (001) jeq #0x800 jt 2 jf 8 (002) ld [26] (003) and #0xffffff00 (004) jeq #0xc0a81400 jt 16 jf 5 (005) ld [30] (006) and #0xffffff00 (007) jeq #0xc0a81400 jt 16 jf 17 (008) jeq #0x806 jt 10 jf 9 (009) jeq #0x8035 jt 10 jf 17 (010) ld [28] (011) and #0xffffff00 (012) jeq #0xc0a81400 jt 16 jf 13 (013) ld [38] (014) and #0xffffff00 (015) jeq #0xc0a81400 jt 16 jf 17 (016) ret #65535 (017) ret #0 Signed-off-by: Eric Dumazet Cc: Arnaldo Carvalho de Melo Cc: Ben Hutchings Cc: Hagen Paul Pfeifer Signed-off-by: David S. Miller --- Documentation/sysctl/net.txt | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt index cbd05ffc606b..3201a7097e4d 100644 --- a/Documentation/sysctl/net.txt +++ b/Documentation/sysctl/net.txt @@ -32,6 +32,17 @@ Table : Subdirectories in /proc/sys/net 1. /proc/sys/net/core - Network core options ------------------------------------------------------- +bpf_jit_enable +-------------- + +This enables Berkeley Packet Filter Just in Time compiler. +Currently supported on x86_64 architecture, bpf_jit provides a framework +to speed packet filtering, the one used by tcpdump/libpcap for example. +Values : + 0 - disable the JIT (default value) + 1 - enable the JIT + 2 - enable the JIT and ask the compiler to emit traces on kernel log. + rmem_default ------------ -- cgit v1.2.2 From 52307a9e1d8910e205f6be2c4dd35900f7b11282 Mon Sep 17 00:00:00 2001 From: Lucian Adrian Grijincu Date: Mon, 23 May 2011 11:57:33 -0700 Subject: Documentation: update epoll sysctl text max_user_instances was removed in this commit: commit 9df04e1f25effde823a600e755b51475d438f56b Author: Davide Libenzi Date: Thu Jan 29 14:25:26 2009 -0800 epoll: drop max_user_instances and rely only on max_user_watches but the documentation entry was not removed. Cc: Davide Libenzi Signed-off-by: Lucian Adrian Grijincu Signed-off-by: Randy Dunlap Signed-off-by: Linus Torvalds --- Documentation/sysctl/fs.txt | 7 ------- 1 file changed, 7 deletions(-) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt index 4af0614147ef..88fd7f5c8dcd 100644 --- a/Documentation/sysctl/fs.txt +++ b/Documentation/sysctl/fs.txt @@ -231,13 +231,6 @@ its creation). This directory contains configuration options for the epoll(7) interface. -max_user_instances ------------------- - -This is the maximum number of epoll file descriptors that a single user can -have open at a given time. The default value is 128, and should be enough -for normal users. - max_user_watches ---------------- -- cgit v1.2.2 From 57cc083ad9e1bfeeb4a0ee831e7bb008c8865bf0 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Thu, 26 May 2011 16:25:46 -0700 Subject: coredump: add support for exe_file in core name Now, exe_file is not proc FS dependent, so we can use it to name core file. So we add %E pattern for core file name cration which extract path from mm_struct->exe_file. Then it converts slashes to exclamation marks and pastes the result to the core file name itself. This is useful for environments where binary names are longer than 16 character (the current->comm limitation). Also where there are binaries with same name but in a different path. Further in case the binery itself changes its current->comm after exec. So by doing (s/$/#/ -- # is treated as git comment): $ sysctl kernel.core_pattern='core.%p.%e.%E' $ ln /bin/cat cat45678901234567890 $ ./cat45678901234567890 ^Z $ rm cat45678901234567890 $ fg ^\Quit (core dumped) $ ls core* we now get: core.2434.cat456789012345.!root!cat45678901234567890 (deleted) Signed-off-by: Jiri Slaby Cc: Al Viro Cc: Alan Cox Reviewed-by: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/sysctl/kernel.txt | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'Documentation/sysctl') diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt index 36f007514db3..5e7cb39ad195 100644 --- a/Documentation/sysctl/kernel.txt +++ b/Documentation/sysctl/kernel.txt @@ -161,7 +161,8 @@ core_pattern is used to specify a core dumpfile pattern name. %s signal number %t UNIX time of dump %h hostname - %e executable filename + %e executable filename (may be shortened) + %E executable path % both are dropped . If the first character of the pattern is a '|', the kernel will treat the rest of the pattern as a command to run. The core dump will be -- cgit v1.2.2