author		Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 22:25:39 -0400
committer	Linus Torvalds <torvalds@linux-foundation.org>	2012-07-31 22:25:39 -0400
commit		ac694dbdbc403c00e2c14d10bc7b8412cc378259
tree		e37328cfbeaf43716dd5914cad9179e57e84df76 /Documentation
parent		a40a1d3d0a2fd613fdec6d89d3c053268ced76ed
parent		437ea90cc3afdca5229b41c6b1d38c4842756cb9
Merge branch 'akpm' (Andrew's patch-bomb)
Merge Andrew's second set of patches:
- MM
- a few random fixes
- a couple of RTC leftovers
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (120 commits)
rtc/rtc-88pm80x: remove unneeded devm_kfree
rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
tmpfs: distribute interleave better across nodes
mm: remove redundant initialization
mm: warn if pg_data_t isn't initialized with zero
mips: zero out pg_data_t when it's allocated
memcg: fix memory accounting scalability in shrink_page_list
mm/sparse: remove index_init_lock
mm/sparse: more checks on mem_section number
mm/sparse: optimize sparse_index_alloc
memcg: add mem_cgroup_from_css() helper
memcg: further prevent OOM with too many dirty pages
memcg: prevent OOM with too many dirty pages
mm: mmu_notifier: fix freed page still mapped in secondary MMU
mm: memcg: only check anon swapin page charges for swap cache
mm: memcg: only check swap cache pages for repeated charging
mm: memcg: split swapin charge function into private and public part
mm: memcg: remove needless !mm fixup to init_mm when charging
mm: memcg: remove unneeded shmem charge type
...
Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads |  5
-rw-r--r--  Documentation/cgroups/hugetlb.txt                         | 45
-rw-r--r--  Documentation/cgroups/memory.txt                          | 12
-rw-r--r--  Documentation/feature-removal-schedule.txt                |  8
-rw-r--r--  Documentation/filesystems/Locking                         | 13
-rw-r--r--  Documentation/filesystems/vfs.txt                         | 12
-rw-r--r--  Documentation/sysctl/vm.txt                               | 30
7 files changed, 104 insertions, 21 deletions
diff --git a/Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads b/Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads
new file mode 100644
index 000000000000..b0b0eeb20fe3
--- /dev/null
+++ b/Documentation/ABI/obsolete/proc-sys-vm-nr_pdflush_threads
@@ -0,0 +1,5 @@
+What:		/proc/sys/vm/nr_pdflush_threads
+Date:		June 2012
+Contact:	Wanpeng Li <liwp@linux.vnet.ibm.com>
+Description:	Since pdflush was replaced by per-BDI flushers, the old pdflush
+		interface exported in /proc/sys/vm/ should be removed.
diff --git a/Documentation/cgroups/hugetlb.txt b/Documentation/cgroups/hugetlb.txt
new file mode 100644
index 000000000000..a9faaca1f029
--- /dev/null
+++ b/Documentation/cgroups/hugetlb.txt
@@ -0,0 +1,45 @@
+HugeTLB Controller
+-------------------
+
+The HugeTLB controller allows limiting HugeTLB usage per control group and
+enforces the controller limit during page fault. Since HugeTLB doesn't
+support page reclaim, enforcing the limit at page fault time implies that
+the application will get a SIGBUS signal if it tries to access HugeTLB pages
+beyond its limit. This requires the application to know beforehand how many
+HugeTLB pages it would require for its use.
+
+A HugeTLB controller hierarchy is created by first mounting the cgroup
+filesystem.
+
+# mount -t cgroup -o hugetlb none /sys/fs/cgroup
+
+With the above step, the initial or the parent HugeTLB group becomes
+visible at /sys/fs/cgroup. At bootup, this group includes all the tasks in
+the system. /sys/fs/cgroup/tasks lists the tasks in this cgroup.
+
+New groups can be created under the parent group /sys/fs/cgroup.
+
+# cd /sys/fs/cgroup
+# mkdir g1
+# echo $$ > g1/tasks
+
+The above steps create a new group g1 and move the current shell
+process (bash) into it.
+
+Brief summary of control files
+
+ hugetlb.<hugepagesize>.limit_in_bytes     # set/show limit of "hugepagesize" hugetlb usage
+ hugetlb.<hugepagesize>.max_usage_in_bytes # show max "hugepagesize" hugetlb usage recorded
+ hugetlb.<hugepagesize>.usage_in_bytes     # show current res_counter usage for "hugepagesize" hugetlb
+ hugetlb.<hugepagesize>.failcnt            # show the number of allocation failures due to the HugeTLB limit
+
+For a system supporting two hugepage sizes (16M and 16G) the control
+files include:
+
+hugetlb.16GB.limit_in_bytes
+hugetlb.16GB.max_usage_in_bytes
+hugetlb.16GB.usage_in_bytes
+hugetlb.16GB.failcnt
+hugetlb.16MB.limit_in_bytes
+hugetlb.16MB.max_usage_in_bytes
+hugetlb.16MB.usage_in_bytes
+hugetlb.16MB.failcnt
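The control files above are ordinary text files, so they can be driven from
any language. As an illustration (not part of the patch), here is a minimal
C sketch; it assumes the hugetlb controller is mounted at /sys/fs/cgroup, a
group named g1, and a 16MB hugepage size as in the example listing, so the
paths are hypothetical for any other setup:

  /*
   * Illustrative only.  Sets a 32MB limit on 16MB hugepages for cgroup g1
   * and reads back the allocation failure counter.
   */
  #include <stdio.h>

  int main(void)
  {
      const char *limit = "/sys/fs/cgroup/g1/hugetlb.16MB.limit_in_bytes";
      const char *failcnt = "/sys/fs/cgroup/g1/hugetlb.16MB.failcnt";
      unsigned long long fails;
      FILE *f;

      f = fopen(limit, "w");
      if (!f) { perror(limit); return 1; }
      fprintf(f, "%llu\n", 32ULL << 20);    /* two 16MB pages */
      fclose(f);

      f = fopen(failcnt, "r");
      if (!f) { perror(failcnt); return 1; }
      if (fscanf(f, "%llu", &fails) == 1)
          printf("hugetlb allocation failures: %llu\n", fails);
      fclose(f);
      return 0;
  }

Once the limit is exceeded, further hugepage faults in g1 raise SIGBUS and
the failcnt file counts the refused allocations.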
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index dd88540bb995..4372e6b8a353 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -73,6 +73,8 @@ Brief summary of control files.
 
 memory.kmem.tcp.limit_in_bytes  # set/show hard limit for tcp buf memory
 memory.kmem.tcp.usage_in_bytes  # show current tcp buf memory allocation
+memory.kmem.tcp.failcnt  # show the number of times tcp buf memory usage hit the limit
+memory.kmem.tcp.max_usage_in_bytes  # show max tcp buf memory usage recorded
 
 1. History
 
@@ -187,12 +189,12 @@ the cgroup that brought it in -- this will happen on memory pressure).
 But see section 8.2: when moving a task to another cgroup, its pages may
 be recharged to the new cgroup, if move_charge_at_immigrate has been chosen.
 
-Exception: If CONFIG_CGROUP_CGROUP_MEM_RES_CTLR_SWAP is not used.
+Exception: If CONFIG_CGROUP_CGROUP_MEMCG_SWAP is not used.
 When you do swapoff and make swapped-out pages of shmem(tmpfs) to
 be backed into memory in force, charges for pages are accounted against the
 caller of swapoff rather than the users of shmem.
 
-2.4 Swap Extension (CONFIG_CGROUP_MEM_RES_CTLR_SWAP)
+2.4 Swap Extension (CONFIG_MEMCG_SWAP)
 
 Swap Extension allows you to record charge for swap. A swapped-in page is
 charged back to original page allocator if possible.
@@ -259,7 +261,7 @@ When oom event notifier is registered, event will be delivered.
 per-zone-per-cgroup LRU (cgroup's private LRU) is just guarded by
 zone->lru_lock, it has no lock of its own.
 
-2.7 Kernel Memory Extension (CONFIG_CGROUP_MEM_RES_CTLR_KMEM)
+2.7 Kernel Memory Extension (CONFIG_MEMCG_KMEM)
 
 With the Kernel memory extension, the Memory Controller is able to limit
 the amount of kernel memory used by the system. Kernel memory is fundamentally
@@ -286,8 +288,8 @@ per cgroup, instead of globally.
 
 a. Enable CONFIG_CGROUPS
 b. Enable CONFIG_RESOURCE_COUNTERS
-c. Enable CONFIG_CGROUP_MEM_RES_CTLR
-d. Enable CONFIG_CGROUP_MEM_RES_CTLR_SWAP (to use swap extension)
+c. Enable CONFIG_MEMCG
+d. Enable CONFIG_MEMCG_SWAP (to use swap extension)
 
 1. Prepare the cgroups (see cgroups.txt, Why are cgroups needed?)
 # mount -t tmpfs none /sys/fs/cgroup
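The two memory.kmem.tcp.* files added in the first hunk read like the other
res_counter files. A small C sketch, again illustrative rather than part of
the patch, assuming the memory controller is mounted at
/sys/fs/cgroup/memory:

  /*
   * Illustrative only.  Dumps the tcp buf accounting files added by this
   * patch for the root memory cgroup.
   */
  #include <stdio.h>

  static unsigned long long read_ull(const char *path)
  {
      unsigned long long v = 0;
      FILE *f = fopen(path, "r");

      if (!f || fscanf(f, "%llu", &v) != 1)
          perror(path);
      if (f)
          fclose(f);
      return v;
  }

  int main(void)
  {
      const char *d = "/sys/fs/cgroup/memory/memory.kmem.tcp.";
      char p[128];

      snprintf(p, sizeof(p), "%susage_in_bytes", d);
      printf("tcp buf usage:     %llu\n", read_ull(p));
      snprintf(p, sizeof(p), "%smax_usage_in_bytes", d);
      printf("tcp buf max usage: %llu\n", read_ull(p));
      snprintf(p, sizeof(p), "%sfailcnt", d);
      printf("tcp limit hits:    %llu\n", read_ull(p));
      return 0;
  }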
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 24fec7603e5e..72ed15075f79 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -13,6 +13,14 @@ Who:	Jim Cromie <jim.cromie@gmail.com>, Jason Baron <jbaron@redhat.com>
 
 ---------------------------
 
+What:	/proc/sys/vm/nr_pdflush_threads
+When:	2012
+Why:	Since pdflush is deprecated, the interface exported in /proc/sys/vm/
+	should be removed.
+Who:	Wanpeng Li <liwp@linux.vnet.ibm.com>
+
+---------------------------
+
 What:	CONFIG_APM_CPU_IDLE, and its ability to call APM BIOS in idle
 When:	2012
 Why:	This optional sub-feature of APM is of dubious reliability,
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index e0cce2a5f820..2db1900d7538 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -206,6 +206,8 @@ prototypes:
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, read_descriptor_t *, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
+	int (*swap_activate)(struct file *);
+	int (*swap_deactivate)(struct file *);
 
 locking rules:
 	All except set_page_dirty and freepage may block
@@ -229,6 +231,8 @@ migratepage:	yes (both)
 launder_page:		yes
 is_partially_uptodate:	yes
 error_remove_page:	yes
+swap_activate:		no
+swap_deactivate:	no
 
 	->write_begin(), ->write_end(), ->sync_page() and ->readpage()
 may be called from the request handler (/dev/loop).
@@ -330,6 +334,15 @@ cleaned, or an error value if not. Note that in order to prevent the page
 getting mapped back in and redirtied, it needs to be kept locked
 across the entire operation.
 
+	->swap_activate will be called with a non-zero argument on
+files backing (non block device backed) swapfiles. A return value
+of zero indicates success, in which case this file can be used for
+backing swapspace. The swapspace operations will be proxied to the
+address space operations.
+
+	->swap_deactivate() will be called in the sys_swapoff()
+path after ->swap_activate() returned success.
+
 ----------------------- file_lock_operations ------------------------------
 prototypes:
 	void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index aa754e01464e..065aa2dc0835 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -592,6 +592,8 @@ struct address_space_operations {
 	int (*migratepage) (struct page *, struct page *);
 	int (*launder_page) (struct page *);
 	int (*error_remove_page) (struct mapping *mapping, struct page *page);
+	int (*swap_activate)(struct file *);
+	int (*swap_deactivate)(struct file *);
 };
 
 writepage: called by the VM to write a dirty page to backing store.
@@ -760,6 +762,16 @@ struct address_space_operations {
 	Setting this implies you deal with pages going away under you,
 	unless you have them locked or reference counts increased.
 
+  swap_activate: Called when swapon is used on a file to allocate
+	space if necessary and pin the block lookup information in
+	memory. A return value of zero indicates success,
+	in which case this file can be used to back swapspace. The
+	swapspace operations will be proxied to this address space's
+	->swap_{out,in} methods.
+
+  swap_deactivate: Called during swapoff on files where swap_activate
+	was successful.
+
 
   The File Object
   ===============
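To make the new hooks concrete, here is a sketch of how a filesystem might
wire them up. It is not taken from the patch: the example_* names and the
empty callback bodies are hypothetical, and only the two prototypes shown
above are real. swap_activate is where the filesystem would allocate backing
space and pin its block lookup information so swap I/O never has to call
back into the allocator.

  /* Illustrative sketch only; follows the prototypes documented above. */
  #include <linux/fs.h>

  static int example_swap_activate(struct file *file)
  {
      /*
       * Allocate space if necessary and pin the block lookup
       * information in memory; returning 0 lets this file back
       * swapspace.
       */
      return 0;
  }

  static int example_swap_deactivate(struct file *file)
  {
      /* Undo swap_activate's pinning; called from sys_swapoff(). */
      return 0;
  }

  static const struct address_space_operations example_aops = {
      /* ... readpage, writepage, and friends go here ... */
      .swap_activate   = example_swap_activate,
      .swap_deactivate = example_swap_deactivate,
  };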
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 96f0ee825bed..dcc2a94ae34e 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -42,7 +42,6 @@ Currently, these files are in /proc/sys/vm:
 - mmap_min_addr
 - nr_hugepages
 - nr_overcommit_hugepages
-- nr_pdflush_threads
 - nr_trim_pages (only if CONFIG_MMU=n)
 - numa_zonelist_order
 - oom_dump_tasks
@@ -426,16 +425,6 @@ See Documentation/vm/hugetlbpage.txt
 
 ==============================================================
 
-nr_pdflush_threads
-
-The current number of pdflush threads. This value is read-only.
-The value changes according to the number of dirty pages in the system.
-
-When necessary, additional pdflush threads are created, one per second, up to
-nr_pdflush_threads_max.
-
-==============================================================
-
 nr_trim_pages
 
 This is available only on NOMMU kernels.
@@ -502,9 +491,10 @@ oom_dump_tasks
 
 Enables a system-wide task dump (excluding kernel threads) to be
 produced when the kernel performs an OOM-killing and includes such
-information as pid, uid, tgid, vm size, rss, cpu, oom_adj score, and
-name. This is helpful to determine why the OOM killer was invoked
-and to identify the rogue task that caused it.
+information as pid, uid, tgid, vm size, rss, nr_ptes, swapents,
+oom_score_adj score, and name. This is helpful to determine why the
+OOM killer was invoked, to identify the rogue task that caused it,
+and to determine why the OOM killer chose the task it did to kill.
 
 If this is set to zero, this information is suppressed. On very
 large systems with thousands of tasks it may not be feasible to dump
@@ -574,16 +564,24 @@ of physical RAM. See above.
 
 page-cluster
 
-page-cluster controls the number of pages which are written to swap in
-a single attempt. The swap I/O size.
+page-cluster controls the maximum number of consecutive pages read in
+from swap in a single attempt. This is the swap counterpart to page
+cache readahead.
+Consecutive here means consecutive in swap space, not in virtual or
+physical addresses - i.e. pages that were swapped out together.
 
 It is a logarithmic value - setting it to zero means "1 page", setting
 it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+Zero disables swap readahead completely.
 
 The default value is three (eight pages at a time). There may be some
 small benefits in tuning this to a different value if your workload is
 swap-intensive.
 
+Lower values mean lower latencies for initial faults, but at the same time
+extra faults and I/O delays for subsequent faults on pages that the
+readahead window would have brought in.
+
 =============================================================
 
 panic_on_oom
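As a closing illustration (not part of the patch), the logarithmic encoding
described in the page-cluster hunk maps directly to a power of two. This
small C sketch reads the live sysctl and prints the resulting swap readahead
window:

  /*
   * Illustrative only: decode /proc/sys/vm/page-cluster as described
   * above -- the value is log2 of the swap-in window, so the default of
   * 3 means eight consecutive swap pages per attempt, and 0 means a
   * single page (readahead disabled).
   */
  #include <stdio.h>

  int main(void)
  {
      unsigned int page_cluster;
      FILE *f = fopen("/proc/sys/vm/page-cluster", "r");

      if (!f || fscanf(f, "%u", &page_cluster) != 1) {
          perror("/proc/sys/vm/page-cluster");
          return 1;
      }
      fclose(f);

      printf("page-cluster = %u -> %u page(s) per swap-in attempt\n",
             page_cluster, 1u << page_cluster);
      return 0;
  }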