author Linus Torvalds <torvalds@linux-foundation.org> 2012-07-31 22:25:39 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2012-07-31 22:25:39 -0400
commit ac694dbdbc403c00e2c14d10bc7b8412cc378259
tree e37328cfbeaf43716dd5914cad9179e57e84df76 /Documentation
parent a40a1d3d0a2fd613fdec6d89d3c053268ced76ed
parent 437ea90cc3afdca5229b41c6b1d38c4842756cb9
Merge branch 'akpm' (Andrew's patch-bomb)
Merge Andrew's second set of patches:
 - MM
 - a few random fixes
 - a couple of RTC leftovers

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
  rtc/rtc-88pm80x: remove unneed devm_kfree
  rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
  mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
  tmpfs: distribute interleave better across nodes
  mm: remove redundant initialization
  mm: warn if pg_data_t isn't initialized with zero
  mips: zero out pg_data_t when it's allocated
  memcg: gix memory accounting scalability in shrink_page_list
  mm/sparse: remove index_init_lock
  mm/sparse: more checks on mem_section number
  mm/sparse: optimize sparse_index_alloc
  memcg: add mem_cgroup_from_css() helper
  memcg: further prevent OOM with too many dirty pages
  memcg: prevent OOM with too many dirty pages
  mm: mmu_notifier: fix freed page still mapped in secondary MMU
  mm: memcg: only check anon swapin page charges for swap cache
  mm: memcg: only check swap cache pages for repeated charging
  mm: memcg: split swapin charge function into private and public part
  mm: memcg: remove needless !mm fixup to init_mm when charging
  mm: memcg: remove unneeded shmem charge type
  ...
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads   5
-rw-r--r-- Documentation/cgroups/hugetlb.txt                          45
-rw-r--r-- Documentation/cgroups/memory.txt                           12
-rw-r--r-- Documentation/feature-removal-schedule.txt                  8
-rw-r--r-- Documentation/filesystems/Locking                          13
-rw-r--r-- Documentation/filesystems/vfs.txt                          12
-rw-r--r-- Documentation/sysctl/vm.txt                                30
7 files changed, 104 insertions, 21 deletions
diff --git a/Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads b/Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads
new file mode 100644
index 000000000000..b0b0eeb20fe3
--- /dev/null
+++ b/Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads
@@ -0,0 +1,5 @@
+What: /proc/sys/vm/nr_pdflush_threads
+Date: June 2012
+Contact: Wanpeng Li <liwp@linux.vnet.ibm.com>
+Description: Since pdflush is replaced by per-BDI flusher, the interface of old pdflush
+ exported in /proc/sys/vm/ should be removed.
diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
new file mode 100644
index 000000000000..a9faaca1f029
--- /dev/null
+++ b/Documentation/cgroups/hugetlb.txt
@@ -0,0 +1,45 @@
+HugeTLB Controller
+-------------------
+
+The HugeTLB controller allows to limit the HugeTLB usage per control group and
+enforces the controller limit during page fault. Since HugeTLB doesn't
+support page reclaim, enforcing the limit at page fault time implies that,
+the application will get SIGBUS signal if it tries to access HugeTLB pages
+beyond its limit. This requires the application to know beforehand how much
+HugeTLB pages it would require for its use.
+
+HugeTLB controller can be created by first mounting the cgroup filesystem.
+
+# mount -t cgroup -o hugetlb none /sys/fs/cgroup
+
+With the above step, the initial or the parent HugeTLB group becomes
+visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
+the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
+
+New groups can be created under the parent group /sys/fs/cgroup.
+
+# cd /sys/fs/cgroup
+# mkdir g1
+# echo $$ > g1/tasks
+
+The above steps create a new group g1 and move the current shell
+process (bash) into it.
+
+Brief summary of control files
+
+ hugetlb.<hugepagesize>.limit_in_bytes # set/show limit of "hugepagesize" hugetlb usage
+ hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes # show current res_counter usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt # show the number of allocation failure due to HugeTLB limit
+
+For a system supporting two hugepage size (16M and 16G) the control
+files include:
+
+hugetlb.16GB.limit_in_bytes
+hugetlb.16GB.max_usage_in_bytes
+hugetlb.16GB.usage_in_bytes
+hugetlb.16GB.failcnt
+hugetlb.16MB.limit_in_bytes
+hugetlb.16MB.max_usage_in_bytes
+hugetlb.16MB.usage_in_bytes
+hugetlb.16MB.failcnt
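
Note on the hugetlb.txt hunk above: a minimal sketch of how the listed control files might be driven from a shell. The helper names are hypothetical and not part of the patch. The cgroup directory is passed as a parameter so the helpers can also be tried against a scratch directory, and the page size label (16MB here) is assumed to match one of the file names the running system actually exposes.

```shell
# Hypothetical helper: write a byte limit into a group's
# hugetlb.<hugepagesize>.limit_in_bytes control file.
#   $1 = cgroup directory (for example /sys/fs/cgroup/g1)
#   $2 = huge page size label as it appears in the file name (for example 16MB)
#   $3 = limit in bytes
set_hugetlb_limit() {
    echo "$3" > "$1/hugetlb.$2.limit_in_bytes"
}

# Hypothetical helper: print the allocation-failure counter for one
# huge page size in the given group.
#   $1 = cgroup directory
#   $2 = huge page size label
hugetlb_failcnt() {
    cat "$1/hugetlb.$2.failcnt"
}
```

With the g1 group from the text, limiting 16MB pages to 64 MiB and then checking for failed allocations could look like this:

# set_hugetlb_limit /sys/fs/cgroup/g1 16MB $((64 * 1024 * 1024))
# hugetlb_failcnt /sys/fs/cgroup/g1 16MB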
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index dd88540bb995..4372e6b8a353 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -73,6 +73,8 @@ Brief summary of control files.
 
  memory.kmem.tcp.limit_in_bytes # set/show hard limit for tcp buf memory
  memory.kmem.tcp.usage_in_bytes # show current tcp buf memory allocation
+ memory.kmem.tcp.failcnt # show the number of tcp buf memory usage hits limits
+ memory.kmem.tcp.max_usage_in_bytes # show max tcp buf memory usage recorded
 
 1. History
 
@@ -187,12 +189,12 @@ the cgroup that brought it in -- this will happen on memory pressure).
 But see section 8.2: when moving a task to another cgroup, its pages may
 be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
 
-Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
+Exception: If CONFIG_CGROUP_CGROUP_MEMCG_SWAP is not used.
 When you do swapoff and make swapped-out pages of shmem(tmpfs) to
 be backed into memory in force, charges for pages are accounted against the
 caller of swapoff rather than the users of shmem.
 
-2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
+2.4 Swap Extension (CONFIG_MEMCG_SWAP)
 
 Swap Extension allows you to record charge for swap. A swapped-in page is
 charged back to original page allocator if possible.
@@ -259,7 +261,7 @@ When oom event notifier is registered, event will be delivered.
  per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
  zone->lru_lock, it has no lock of its own.
 
-2.7 Kernel Memory Extension (CONFIG_CGROUP_MEM_RES_CTLR_KMEM)
+2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
 
 With the Kernel memory extension, the Memory Controller is able to limit
 the amount of kernel memory used by the system. Kernel memory is fundamentally
@@ -286,8 +288,8 @@ per cgroup, instead of globally.
 
 a. Enable CONFIG_CGROUPS
 b. Enable CONFIG_RESOURCE_COUNTERS
-c. Enable CONFIG_CGROUP_MEM_RES_CTLR
-d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
+c. Enable CONFIG_MEMCG
+d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
 
 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
 # mount -t tmpfs none /sys/fs/cgroup
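
Note on the memory.txt hunk above: a minimal sketch of checking a kernel configuration file for the four options in steps a. to d., using the renamed CONFIG_MEMCG names. The helper name is hypothetical, and where a system keeps its kernel configuration (if it ships one at all) is an assumption the caller has to supply.

```shell
# Hypothetical helper: report which of the options named in steps a. to d.
# are set to "y" in a kernel configuration file.
#   $1 = path to a .config-style file (location varies between systems)
check_memcg_config() {
    for opt in CONFIG_CGROUPS CONFIG_RESOURCE_COUNTERS CONFIG_MEMCG CONFIG_MEMCG_SWAP; do
        # Anchor both ends so CONFIG_MEMCG does not match CONFIG_MEMCG_SWAP.
        if grep -q "^${opt}=y\$" "$1"; then
            echo "$opt: enabled"
        else
            echo "$opt: not set"
        fi
    done
}
```

On kernels built before the rename, the same check would need the older CONFIG_CGROUP_MEM_RES_CTLR names shown on the removed lines.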
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 24fec7603e5e..72ed15075f79 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -13,6 +13,14 @@ Who: Jim Cromie <jim.cromie@gmail.com>, Jason Baron <jbaron@redhat.com>
 
 ---------------------------
 
+What: /proc/sys/vm/nr_pdflush_threads
+When: 2012
+Why: Since pdflush is deprecated, the interface exported in /proc/sys/vm/
+ should be removed.
+Who: Wanpeng Li <liwp@linux.vnet.ibm.com>
+
+---------------------------
+
 What: CONFIG_APM_CPU_IDLE, and its ability to call APM BIOS in idle
 When: 2012
 Why: This optional sub-feature of APM is of dubious reliability,
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index e0cce2a5f820..2db1900d7538 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -206,6 +206,8 @@ prototypes:
  int (*launder_page)(struct page *);
  int (*is_partially_uptodate)(struct page *, read_descriptor_t *, unsigned long);
  int (*error_remove_page)(struct address_space *, struct page *);
+ int (*swap_activate)(struct file *);
+ int (*swap_deactivate)(struct file *);
 
 locking rules:
  All except set_page_dirty and freepage may block
@@ -229,6 +231,8 @@ migratepage: yes (both)
 launder_page: yes
 is_partially_uptodate: yes
 error_remove_page: yes
+swap_activate: no
+swap_deactivate: no
 
  ->write_begin(), ->write_end(), ->sync_page() and ->readpage()
 may be called from the request handler (/dev/loop).
@@ -330,6 +334,15 @@ cleaned, or an error value if not. Note that in order to prevent the page
 getting mapped back in and redirtied, it needs to be kept locked
 across the entire operation.
 
+ ->swap_activate will be called with a non-zero argument on
+files backing (non block device backed) swapfiles. A return value
+of zero indicates success, in which case this file can be used for
+backing swapspace. The swapspace operations will be proxied to the
+address space operations.
+
+ ->swap_deactivate() will be called in the sys_swapoff()
+path after ->swap_activate() returned success.
+
 ----------------------- file_lock_operations ------------------------------
 prototypes:
  void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index aa754e01464e..065aa2dc0835 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -592,6 +592,8 @@ struct address_space_operations {
  int (*migratepage) (struct page *, struct page *);
  int (*launder_page) (struct page *);
  int (*error_remove_page) (struct mapping *mapping, struct page *page);
+ int (*swap_activate)(struct file *);
+ int (*swap_deactivate)(struct file *);
 };
 
  writepage: called by the VM to write a dirty page to backing store.
@@ -760,6 +762,16 @@ struct address_space_operations {
  Setting this implies you deal with pages going away under you,
  unless you have them locked or reference counts increased.
 
+ swap_activate: Called when swapon is used on a file to allocate
+ space if necessary and pin the block lookup information in
+ memory. A return value of zero indicates success,
+ in which case this file can be used to back swapspace. The
+ swapspace operations will be proxied to this address space's
+ ->swap_{out,in} methods.
+
+ swap_deactivate: Called during swapoff on files where swap_activate
+ was successful.
+
 
 The File Object
 ===============
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 96f0ee825bed..dcc2a94ae34e 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -42,7 +42,6 @@ Currently, these files are in /proc/sys/vm:
 - mmap_min_addr
 - nr_hugepages
 - nr_overcommit_hugepages
-- nr_pdflush_threads
 - nr_trim_pages (only if CONFIG_MMU=n)
 - numa_zonelist_order
 - oom_dump_tasks
@@ -426,16 +425,6 @@ See Documentation/vm/hugetlbpage.txt
 
 ==============================================================
 
-nr_pdflush_threads
-
-The current number of pdflush threads. This value is read-only.
-The value changes according to the number of dirty pages in the system.
-
-When necessary, additional pdflush threads are created, one per second, up to
-nr_pdflush_threads_max.
-
-==============================================================
-
 nr_trim_pages
 
 This is available only on NOMMU kernels.
@@ -502,9 +491,10 @@ oom_dump_tasks
 
 Enables a system-wide task dump (excluding kernel threads) to be
 produced when the kernel performs an OOM-killing and includes such
-information as pid, uid, tgid, vm size, rss, cpu, oom_adj score, and
-name. This is helpful to determine why the OOM killer was invoked
-and to identify the rogue task that caused it.
+information as pid, uid, tgid, vm size, rss, nr_ptes, swapents,
+oom_score_adj score, and name. This is helpful to determine why the
+OOM killer was invoked, to identify the rogue task that caused it,
+and to determine why the OOM killer chose the task it did to kill.
 
 If this is set to zero, this information is suppressed. On very
 large systems with thousands of tasks it may not be feasible to dump
@@ -574,16 +564,24 @@ of physical RAM. See above.
 
 page-cluster
 
-page-cluster controls the number of pages which are written to swap in
-a single attempt. The swap I/O size.
+page-cluster controls the number of pages up to which consecutive pages
+are read in from swap in a single attempt. This is the swap counterpart
+to page cache readahead.
+The mentioned consecutivity is not in terms of virtual/physical addresses,
+but consecutive on swap space - that means they were swapped out together.
 
 It is a logarithmic value - setting it to zero means "1 page", setting
 it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+Zero disables swap readahead completely.
 
 The default value is three (eight pages at a time). There may be some
 small benefits in tuning this to a different value if your workload is
 swap-intensive.
 
+Lower values mean lower latencies for initial faults, but at the same time
+extra faults and I/O delays for following faults if they would have been part of
+that consecutive pages readahead would have brought in.
+
 =============================================================
 
 panic_on_oom
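
Note on the page-cluster hunk above: since the value is logarithmic, the number of pages read from swap in one attempt is 2 raised to the page-cluster value. A minimal sketch of that arithmetic follows; the helper name is hypothetical and not part of the patch.

```shell
# Hypothetical helper: convert a page-cluster value into the maximum number
# of consecutive swap pages read in a single attempt (2^page-cluster).
#   $1 = page-cluster value, a non-negative integer
page_cluster_pages() {
    echo $((1 << $1))
}
```

Feeding it the current setting shows the readahead window in pages:

# page_cluster_pages "$(cat /proc/sys/vm/page-cluster)"

A result of 1 corresponds to page-cluster 0, which the new text describes as disabling swap readahead.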