From f4a3e0bceb57466c31757f25e4e0ed108d1299ec Mon Sep 17 00:00:00 2001 From: "Dr. Werner Fink" Date: Wed, 22 Sep 2010 12:45:40 +0200 Subject: tty: Add a new file /proc/tty/consoles Add a new file /proc/tty/consoles to be able to determine the registered system console lines. If the reading process holds /dev/console open at the regular standard input stream the active device will be marked by an asterisk. Show possible operations and also decode the used flags of the listed console lines. Signed-off-by: Werner Fink Cc: Alan Cox Signed-off-by: Greg Kroah-Hartman --- Documentation/filesystems/proc.txt | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index a6aca8740883..98223a676940 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -1075,6 +1075,7 @@ Table 1-11: Files in /proc/tty drivers list of drivers and their usage ldiscs registered line disciplines driver/serial usage statistic and status of single tty lines + consoles registered system console lines .............................................................................. 
To see which tty's are currently in use, you can simply look into the file @@ -1093,6 +1094,37 @@ To see which tty's are currently in use, you can simply look into the file /dev/tty /dev/tty 5 0 system:/dev/tty unknown /dev/tty 4 1-63 console +To see which character device lines are currently used for the system console +/dev/console, you may simply look into the file /proc/tty/consoles: + + > cat /proc/tty/consoles + tty0 -WU (ECp) 4:7 + ttyS0 -W- (Ep) 4:64 + +The columns are: + + device name of the device + operations R = can do read operations + W = can do write operations + U = can do unblank + flags E = it is enabled + C = it is prefered console + B = it is primary boot console + p = it is used for printk buffer + b = it is not a TTY but a Braille device + a = it is safe to use when cpu is offline + * = it is standard input of the reading process + major:minor major and minor number of the device separated by a colon + +If the reading process holds /dev/console open at the regular standard input +stream the active device will be marked by an asterisk: + + > cat /proc/tty/consoles < /dev/console + tty0 -WU (ECp*) 4:7 + ttyS0 -W- (Ep) 4:64 + > tty + /dev/pts/3 + 1.8 Miscellaneous kernel statistics in /proc/stat ------------------------------------------------- -- cgit v1.2.2 From 6c2754c28f2388a276fe21edde826f2113c8f60e Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 23 Oct 2010 08:14:12 -0700 Subject: Revert "tty: Add a new file /proc/tty/consoles" This reverts commit f4a3e0bceb57466c31757f25e4e0ed108d1299ec. Jiri Sladby points out that the tty structure we're using may already be gone, and Al Viro doesn't hold back in complaining about the random loading of 'filp->private_data' which doesn't have to be a pointer at all, nor does checking the magic field for TTY_MAGIC prove anything. Belated review by Al: "a) global variable depending on stdin of the last opener? Affecting output of read(2)? Really? 
b) iterator is broken; list should be locked in ->start(), unlocked in ->stop() and *NOT* unlocked/relocked in ->next() c) ->show() ought to do nothing in case of ->device == NULL, instead of skipping those in ->next()/->start() d) regardless of the merits of the bright idea about asterisk at that line in output *and* regardless of (a), the implementation is not only atrociously ugly, it's actually very likely to be a roothole. Verifying that Cthulhu knows what number happens to be address of a tty_struct by blindly dereferencing memory at that address... Ouch. Please revert that crap." And Christoph pipes in and NAK's the approach of walking fd tables etc too. So it's pretty unanimous. Noticed-by: Jri Slaby Requested-by: Al Viro Cc: Greg Kroah-Hartman Cc: Werner Fink Cc: Alan Cox Cc: Christoph Hellwig Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 32 -------------------------------- 1 file changed, 32 deletions(-) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 98223a676940..a6aca8740883 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -1075,7 +1075,6 @@ Table 1-11: Files in /proc/tty drivers list of drivers and their usage ldiscs registered line disciplines driver/serial usage statistic and status of single tty lines - consoles registered system console lines .............................................................................. 
To see which tty's are currently in use, you can simply look into the file @@ -1094,37 +1093,6 @@ To see which tty's are currently in use, you can simply look into the file /dev/tty /dev/tty 5 0 system:/dev/tty unknown /dev/tty 4 1-63 console -To see which character device lines are currently used for the system console -/dev/console, you may simply look into the file /proc/tty/consoles: - - > cat /proc/tty/consoles - tty0 -WU (ECp) 4:7 - ttyS0 -W- (Ep) 4:64 - -The columns are: - - device name of the device - operations R = can do read operations - W = can do write operations - U = can do unblank - flags E = it is enabled - C = it is prefered console - B = it is primary boot console - p = it is used for printk buffer - b = it is not a TTY but a Braille device - a = it is safe to use when cpu is offline - * = it is standard input of the reading process - major:minor major and minor number of the device separated by a colon - -If the reading process holds /dev/console open at the regular standard input -stream the active device will be marked by an asterisk: - - > cat /proc/tty/consoles < /dev/console - tty0 -WU (ECp*) 4:7 - ttyS0 -W- (Ep) 4:64 - > tty - /dev/pts/3 - 1.8 Miscellaneous kernel statistics in /proc/stat ------------------------------------------------- -- cgit v1.2.2 From 0f4d208f1975f16f269134cee5f44c1f048581da Mon Sep 17 00:00:00 2001 From: Matt Mackall Date: Tue, 26 Oct 2010 14:21:22 -0700 Subject: Documentation/filesystems/proc.txt: improve smaps field documentation Signed-off-by: Matt Mackall Cc: Nikanth Karthikesan Cc: Balbir Singh Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index a6aca8740883..a563b74c7aef 100644 --- a/Documentation/filesystems/proc.txt +++ 
b/Documentation/filesystems/proc.txt @@ -374,13 +374,13 @@ Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB -The first of these lines shows the same information as is displayed for the -mapping in /proc/PID/maps. The remaining lines show the size of the mapping, -the amount of the mapping that is currently resident in RAM, the "proportional -set size” (divide each shared page by the number of processes sharing it), the -number of clean and dirty shared pages in the mapping, and the number of clean -and dirty private pages in the mapping. The "Referenced" indicates the amount -of memory currently marked as referenced or accessed. +The first of these lines shows the same information as is displayed for the +mapping in /proc/PID/maps. The remaining lines show the size of the mapping +(size), the amount of the mapping that is currently resident in RAM (RSS), the +process' proportional share of this mapping (PSS), the number of clean and +dirty shared pages in the mapping, and the number of clean and dirty private +pages in the mapping. The "Referenced" indicates the amount of memory +currently marked as referenced or accessed. This file is only present if the CONFIG_MMU kernel configuration option is enabled. -- cgit v1.2.2 From b40d4f84becd69275451baee7f0801c85eb58437 Mon Sep 17 00:00:00 2001 From: Nikanth Karthikesan Date: Wed, 27 Oct 2010 15:34:10 -0700 Subject: /proc/pid/smaps: export amount of anonymous memory in a mapping Export the number of anonymous pages in a mapping via smaps. Even the private pages in a mapping backed by a file, would be marked as anonymous, when they are modified. Export this information to user-space via smaps. Exporting this count will help gdb to make a better decision on which areas need to be dumped in its coredump; and should be useful to others studying the memory usage of a process. 
Signed-off-by: Nikanth Karthikesan Acked-by: Hugh Dickins Reviewed-by: KOSAKI Motohiro Cc: Matt Mackall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index a563b74c7aef..976de6e19dd8 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -370,6 +370,7 @@ Shared_Dirty: 0 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 892 kB +Anonymous: 0 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB @@ -378,9 +379,15 @@ The first of these lines shows the same information as is displayed for the mapping in /proc/PID/maps. The remaining lines show the size of the mapping (size), the amount of the mapping that is currently resident in RAM (RSS), the process' proportional share of this mapping (PSS), the number of clean and -dirty shared pages in the mapping, and the number of clean and dirty private -pages in the mapping. The "Referenced" indicates the amount of memory -currently marked as referenced or accessed. +dirty private pages in the mapping. Note that even a page which is part of a +MAP_SHARED mapping, but has only a single pte mapped, i.e. is currently used +by only one process, is accounted as private and not as shared. "Referenced" +indicates the amount of memory currently marked as referenced or accessed. +"Anonymous" shows the amount of memory that does not belong to any file. Even +a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE +and a page is modified, the file page is replaced by a private anonymous copy. +"Swap" shows how much would-be-anonymous memory is also used, but out on +swap. This file is only present if the CONFIG_MMU kernel configuration option is enabled. 
-- cgit v1.2.2 From 03f890f8c2f5c9008d3d8f6d85267717ced4bd79 Mon Sep 17 00:00:00 2001 From: Nikanth Karthikesan Date: Wed, 27 Oct 2010 15:34:11 -0700 Subject: /proc/pid/pagemap: document in Documentation/filesystems/proc.txt Document /proc/pid/pagemap in Documentation/filesystems/proc.txt Signed-off-by: Nikanth Karthikesan Cc: Richard Guenther Cc: Balbir Singh Cc: KOSAKI Motohiro Acked-by: Matt Mackall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 976de6e19dd8..e73df2722ff3 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -136,6 +136,7 @@ Table 1-1: Process specific entries in /proc statm Process memory status information status Process status in human readable form wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan + pagemap Page table stack Report full stack trace, enable via CONFIG_STACKTRACE smaps a extension based on maps, showing the memory consumption of each mapping @@ -404,6 +405,9 @@ To clear the bits for the file mapped pages associated with the process > echo 3 > /proc/PID/clear_refs Any other value written to /proc/PID/clear_refs will have no effect. +The /proc/pid/pagemap gives the PFN, which can be used to find the pageflags +using /proc/kpageflags and number of times a page is mapped using +/proc/kpagecount. For detailed explanation, see Documentation/vm/pagemap.txt. 1.2 Kernel data --------------- -- cgit v1.2.2 From 23308ba54dcdb54481163bfb07dd8aeca76a7a2e Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Thu, 4 Nov 2010 16:20:24 +0100 Subject: console: add /proc/consoles It allows users to see what consoles are currently known to the system and with what flags. 
It is based on Werner's patch, the part about traversing fds was removed, the code was moved to kernel/printk.c, where consoles are handled and it makes more sense to me. Signed-off-by: Jiri Slaby [cleanups] Signed-off-by: "Dr. Werner Fink" Cc: Al Viro Cc: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman --- Documentation/filesystems/proc.txt | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index e73df2722ff3..9471225212c4 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -1181,6 +1181,30 @@ Table 1-12: Files in /proc/fs/ext4/ mb_groups details of multiblock allocator buddy cache of free blocks .............................................................................. +2.0 /proc/consoles +------------------ +Shows registered system console lines. + +To see which character device lines are currently used for the system console +/dev/console, you may simply look into the file /proc/consoles: + + > cat /proc/consoles + tty0 -WU (ECp) 4:7 + ttyS0 -W- (Ep) 4:64 + +The columns are: + + device name of the device + operations R = can do read operations + W = can do write operations + U = can do unblank + flags E = it is enabled + C = it is prefered console + B = it is primary boot console + p = it is used for printk buffer + b = it is not a TTY but a Braille device + a = it is safe to use when cpu is offline + major:minor major and minor number of the device separated by a colon ------------------------------------------------------------------------------ Summary -- cgit v1.2.2 From 2d90508f638241a2e7422d884767398296ebe720 Mon Sep 17 00:00:00 2001 From: Nikanth Karthikesan Date: Thu, 13 Jan 2011 15:45:53 -0800 Subject: mm: smaps: export mlock information Currently there is no way to find whether a process has locked its pages in memory or not. 
And which of the memory regions are locked in memory. Add a new field "Locked" to export this information via the smaps file. Signed-off-by: Nikanth Karthikesan Acked-by: Balbir Singh Acked-by: Wu Fengguang Cc: Matt Mackall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 9471225212c4..ef757fca470b 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -375,6 +375,7 @@ Anonymous: 0 kB Swap: 0 kB KernelPageSize: 4 kB MMUPageSize: 4 kB +Locked: 374 kB The first of these lines shows the same information as is displayed for the mapping in /proc/PID/maps. The remaining lines show the size of the mapping @@ -670,6 +671,8 @@ varies by architecture and compile options. The following is from a > cat /proc/meminfo +The "Locked" indicates whether the mapping is locked in memory or not. + MemTotal: 16344972 kB MemFree: 13634064 kB -- cgit v1.2.2 From dabb16f639820267b3850d804571c70bd93d4e07 Mon Sep 17 00:00:00 2001 From: Mandeep Singh Baines Date: Thu, 13 Jan 2011 15:46:05 -0800 Subject: oom: allow a non-CAP_SYS_RESOURCE proces to oom_score_adj down We'd like to be able to oom_score_adj a process up/down as it enters/leaves the foreground. Currently, it is not possible to oom_adj down without CAP_SYS_RESOURCE. This patch allows a task to decrease its oom_score_adj back to the value that a CAP_SYS_RESOURCE thread set it to or its inherited value at fork. Assuming the thread that has forked it has oom_score_adj of 0, each process could decrease it back from 0 upon activation unless a CAP_SYS_RESOURCE thread elevated it to something higher. 
Alternative considered: * a setuid binary * a daemon with CAP_SYS_RESOURCE Since you don't wan't all processes to be able to reduce their oom_adj, a setuid or daemon implementation would be complex. The alternatives also have much higher overhead. This patch updated from original patch based on feedback from David Rientjes. Signed-off-by: Mandeep Singh Baines Acked-by: David Rientjes Cc: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Cc: Rik van Riel Cc: Ying Han Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index ef757fca470b..23cae6548d3a 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -1323,6 +1323,10 @@ scaled linearly with /proc/<pid>/oom_score_adj. Writing to /proc/<pid>/oom_score_adj or /proc/<pid>/oom_adj will change the other with its scaled value. +The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last +value set by a CAP_SYS_RESOURCE process. To reduce the value any lower +requires CAP_SYS_RESOURCE. + NOTICE: /proc/<pid>/oom_adj is deprecated and will be removed, please see Documentation/feature-removal-schedule.txt. -- cgit v1.2.2 From 25985edcedea6396277003854657b5f3cb31a628 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Wed, 30 Mar 2011 22:57:33 -0300 Subject: Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi --- Documentation/filesystems/proc.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 23cae6548d3a..b0b814d75ca1 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -543,7 +543,7 @@ just those considered 'most important'. 
The new vectors are: their statistics are used by kernel developers and interested users to determine the occurrence of interrupts of the given type. -The above IRQ vectors are displayed only when relevent. For example, +The above IRQ vectors are displayed only when relevant. For example, the threshold vector does not exist on x86_64 platforms. Others are suppressed when the system is a uniprocessor. As of this writing, only i386 and x86_64 platforms support the new IRQ vector displays. @@ -1202,7 +1202,7 @@ The columns are: W = can do write operations U = can do unblank flags E = it is enabled - C = it is prefered console + C = it is preferred console B = it is primary boot console p = it is used for printk buffer b = it is not a TTY but a Braille device @@ -1331,7 +1331,7 @@ NOTICE: /proc/<pid>/oom_adj is deprecated and will be removed, please see Documentation/feature-removal-schedule.txt. Caveat: when a parent task is selected, the oom killer will sacrifice any first -generation children with seperate address spaces instead, if possible. This +generation children with separate address spaces instead, if possible. This avoids servers and important system daemons from being killed and loses the minimal amount of work. -- cgit v1.2.2 From a26ac2455ffcf3be5c6ef92bc6df7182700f2114 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 12 Jan 2011 14:10:23 -0800 Subject: rcu: move TREE_RCU from softirq to kthread If RCU priority boosting is to be meaningful, callback invocation must be boosted in addition to preempted RCU readers. Otherwise, in presence of CPU real-time threads, the grace period ends, but the callbacks don't get invoked. If the callbacks don't get invoked, the associated memory doesn't get freed, so the system is still subject to OOM. But it is not reasonable to priority-boost RCU_SOFTIRQ, so this commit moves the callback invocations to a kthread, which can be boosted easily. 
Also add comments and properly synchronized all accesses to rcu_cpu_kthread_task, as suggested by Lai Jiangshan. Signed-off-by: Paul E. McKenney Signed-off-by: Paul E. McKenney Reviewed-by: Josh Triplett --- Documentation/filesystems/proc.txt | 1 - 1 file changed, 1 deletion(-) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index b0b814d75ca1..60740e8ecb37 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -836,7 +836,6 @@ Provides counts of softirq handlers serviced since boot time, for each cpu. TASKLET: 0 0 0 290 SCHED: 27035 26983 26971 26746 HRTIMER: 0 0 0 0 - RCU: 1678 1769 2178 2250 1.3 IDE devices in /proc/ide -- cgit v1.2.2 From 4b060420a596095869a6d7849caa798d23839cd1 Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Tue, 24 May 2011 17:13:12 -0700 Subject: bitmap, irq: add smp_affinity_list interface to /proc/irq Manually adjusting the smp_affinity for IRQ's becomes unwieldy when the cpu count is large. Setting smp affinity to cpus 256 to 263 would be: echo 000000ff,00000000,00000000,00000000,00000000,00000000,00000000,00000000 > smp_affinity instead of: echo 256-263 > smp_affinity_list Think about what it looks like for cpus around say, 4088 to 4095. We already have many alternate "list" interfaces: /sys/devices/system/cpu/cpuX/indexY/shared_cpu_list /sys/devices/system/cpu/cpuX/topology/thread_siblings_list /sys/devices/system/cpu/cpuX/topology/core_siblings_list /sys/devices/system/node/nodeX/cpulist /sys/devices/pci***/***/local_cpulist Add a companion interface, smp_affinity_list to use cpu lists instead of cpu maps. This conforms to other companion interfaces where both a map and a list interface exists. This required adding a bitmap_parselist_user() function in a manner similar to the bitmap_parse_user() function. 
[akpm@linux-foundation.org: make __bitmap_parselist() static] Signed-off-by: Mike Travis Cc: Thomas Gleixner Cc: Jack Steiner Cc: Lee Schermerhorn Cc: Andy Shevchenko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 60740e8ecb37..f48178024067 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -574,6 +574,12 @@ The contents of each smp_affinity file is the same by default: > cat /proc/irq/0/smp_affinity ffffffff +There is an alternate interface, smp_affinity_list which allows specifying +a cpu range instead of a bitmask: + + > cat /proc/irq/0/smp_affinity_list + 1024-1031 + The default_smp_affinity mask applies to all non-active IRQs, which are the IRQs which have not yet been allocated/activated, and hence which lack a /proc/irq/[0-9]* directory. @@ -583,12 +589,13 @@ reports itself as being attached. This hardware locality information does not include information about any possible driver locality preference. prof_cpu_mask specifies which CPUs are to be profiled by the system wide -profiler. Default value is ffffffff (all cpus). +profiler. Default value is ffffffff (all cpus if there are only 32 of them). The way IRQs are routed is handled by the IO-APIC, and it's Round Robin between all the CPUs which are allowed to handle it. As usual the kernel has more info than you and does a better job than you, so the defaults are the -best choice for almost everyone. +best choice for almost everyone. [Note this applies only to those IO-APIC's +that support "Round Robin" interrupt distribution.] There are three more important subdirectories in /proc: net, scsi, and sys. 
The general rule is that the contents, or even the existence of these -- cgit v1.2.2 From 09223371deac67d08ca0b70bd18787920284c967 Mon Sep 17 00:00:00 2001 From: Shaohua Li Date: Tue, 14 Jun 2011 13:26:25 +0800 Subject: rcu: Use softirq to address performance regression Commit a26ac2455ffcf3(rcu: move TREE_RCU from softirq to kthread) introduced performance regression. In an AIM7 test, this commit degraded performance by about 40%. The commit runs rcu callbacks in a kthread instead of softirq. We observed high rate of context switch which is caused by this. Out test system has 64 CPUs and HZ is 1000, so we saw more than 64k context switch per second which is caused by RCU's per-CPU kthread. A trace showed that most of the time the RCU per-CPU kthread doesn't actually handle any callbacks, but instead just does a very small amount of work handling grace periods. This means that RCU's per-CPU kthreads are making the scheduler do quite a bit of work in order to allow a very small amount of RCU-related processing to be done. Alex Shi's analysis determined that this slowdown is due to lock contention within the scheduler. Unfortunately, as Peter Zijlstra points out, the scheduler's real-time semantics require global action, which means that this contention is inherent in real-time scheduling. (Yes, perhaps someone will come up with a workaround -- otherwise, -rt is not going to do well on large SMP systems -- but this patch will work around this issue in the meantime. And "the meantime" might well be forever.) This patch therefore re-introduces softirq processing to RCU, but only for core RCU work. RCU callbacks are still executed in kthread context, so that only a small amount of RCU work runs in softirq context in the common case. This should minimize ksoftirqd execution, allowing us to skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels. Signed-off-by: Shaohua Li Tested-by: "Alex,Shi" Signed-off-by: Paul E. 
McKenney --- Documentation/filesystems/proc.txt | 1 + 1 file changed, 1 insertion(+) (limited to 'Documentation/filesystems/proc.txt') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index f48178024067..db3b1aba32a3 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -843,6 +843,7 @@ Provides counts of softirq handlers serviced since boot time, for each cpu. TASKLET: 0 0 0 290 SCHED: 27035 26983 26971 26746 HRTIMER: 0 0 0 0 + RCU: 1678 1769 2178 2250 1.3 IDE devices in /proc/ide -- cgit v1.2.2
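Note on the softirq table touched by the last patch: the documented excerpt lists one row per softirq type and one count per CPU, and the RCU row is the one this patch restores. The following is only a minimal sketch, in Python, of reading such a table and totalling one row. The helper name parse_softirqs, the SAMPLE text and its CPU header row are illustrative assumptions; the counts are the ones shown in the hunk above.

```python
def parse_softirqs(text):
    """Map each softirq name to its list of per-CPU counts."""
    counts = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            # No colon: the CPU header row or a blank line, so skip it.
            continue
        counts[name.strip()] = [int(field) for field in rest.split()]
    return counts


# Illustrative sample. The header row is an assumption about the layout;
# the four data rows are copied from the documentation excerpt in the patch.
SAMPLE = """\
                CPU0       CPU1       CPU2       CPU3
     TASKLET:          0          0          0        290
       SCHED:      27035      26983      26971      26746
     HRTIMER:          0          0          0          0
         RCU:       1678       1769       2178       2250
"""

if __name__ == "__main__":
    table = parse_softirqs(SAMPLE)
    # Total RCU softirq invocations across all CPUs in the sample.
    print(sum(table["RCU"]))
```

On a running system the same helper could presumably be fed the contents of /proc/softirqs instead of SAMPLE, provided the file follows the layout assumed here.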