author | Hugh Dickins <hugh.dickins@tiscali.co.uk> | 2009-09-21 20:02:24 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2009-09-22 10:17:33 -0400 |
commit | 7701c9c0f54feb682d0cefa2ae1f4a1e00e0ba09 (patch) | |
tree | 50f98c7bf7071f1559c529b0c963a9dc86c54e00 | |
parent | 2ffd8679c8e4ec226718bff58b50b226dd477015 (diff) |
ksm: add some documentation
Add Documentation/vm/ksm.txt: how to use the Kernel Samepage Merging feature
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-rw-r--r-- | Documentation/vm/00-INDEX | 2 | ||||
-rw-r--r-- | Documentation/vm/ksm.txt | 89 | ||||
-rw-r--r-- | mm/Kconfig | 1 |
3 files changed, 92 insertions, 0 deletions
diff --git a/Documentation/vm/00-INDEX b/Documentation/vm/00-INDEX
index 2f77ced35df7..f80a44944874 100644
--- a/Documentation/vm/00-INDEX
+++ b/Documentation/vm/00-INDEX
@@ -6,6 +6,8 @@ balance | |||
6 | - various information on memory balancing. | 6 | - various information on memory balancing. |
7 | hugetlbpage.txt | 7 | hugetlbpage.txt |
8 | - a brief summary of hugetlbpage support in the Linux kernel. | 8 | - a brief summary of hugetlbpage support in the Linux kernel. |
9 | ksm.txt | ||
10 | - how to use the Kernel Samepage Merging feature. | ||
9 | locking | 11 | locking |
10 | - info on how locking and synchronization is done in the Linux vm code. | 12 | - info on how locking and synchronization is done in the Linux vm code. |
11 | numa | 13 | numa |
diff --git a/Documentation/vm/ksm.txt b/Documentation/vm/ksm.txt
new file mode 100644
index 000000000000..72a22f65960e
--- /dev/null
+++ b/Documentation/vm/ksm.txt
@@ -0,0 +1,89 @@ | |||
1 | How to use the Kernel Samepage Merging feature | ||
2 | ---------------------------------------------- | ||
3 | |||
4 | KSM is a memory-saving de-duplication feature, enabled by CONFIG_KSM=y, | ||
5 | added to the Linux kernel in 2.6.32. See mm/ksm.c for its implementation, | ||
6 | and http://lwn.net/Articles/306704/ and http://lwn.net/Articles/330589/ | ||
7 | |||
8 | The KSM daemon ksmd periodically scans those areas of user memory which | ||
9 | have been registered with it, looking for pages of identical content which | ||
10 | can be replaced by a single write-protected page (which is automatically | ||
11 | copied if a process later wants to update its content). | ||
12 | |||
13 | KSM was originally developed for use with KVM (where it was known as | ||
14 | Kernel Shared Memory), to fit more virtual machines into physical memory, | ||
15 | by sharing the data common between them. But it can be useful to any | ||
16 | application which generates many instances of the same data. | ||
17 | |||
18 | KSM only merges anonymous (private) pages, never pagecache (file) pages. | ||
19 | KSM's merged pages are at present locked into kernel memory for as long | ||
20 | as they are shared: so cannot be swapped out like the user pages they | ||
21 | replace (but swapping KSM pages should follow soon in a later release). | ||
22 | |||
23 | KSM only operates on those areas of address space which an application | ||
24 | has advised to be likely candidates for merging, by using the madvise(2) | ||
25 | system call: int madvise(addr, length, MADV_MERGEABLE). | ||
26 | |||
27 | The app may call int madvise(addr, length, MADV_UNMERGEABLE) to cancel | ||
28 | that advice and restore unshared pages: whereupon KSM unmerges whatever | ||
29 | it merged in that range. Note: this unmerging call may suddenly require | ||
30 | more memory than is available - possibly failing with EAGAIN, but more | ||
31 | probably arousing the Out-Of-Memory killer. | ||
32 | |||
33 | If KSM is not configured into the running kernel, madvise MADV_MERGEABLE | ||
34 | and MADV_UNMERGEABLE simply fail with EINVAL. If the running kernel was | ||
35 | built with CONFIG_KSM=y, those calls will normally succeed: even if the | ||
36 | KSM daemon is not currently running, MADV_MERGEABLE still registers | ||
37 | the range for whenever the KSM daemon is started; even if the range | ||
38 | cannot contain any pages which KSM could actually merge; even if | ||
39 | MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE. | ||
40 | |||
41 | Like other madvise calls, they are intended for use on mapped areas of | ||
42 | the user address space: they will report ENOMEM if the specified range | ||
43 | includes unmapped gaps (though working on the intervening mapped areas), | ||
44 | and might fail with EAGAIN if not enough memory for internal structures. | ||
45 | |||
46 | Applications should be considerate in their use of MADV_MERGEABLE, | ||
47 | restricting its use to areas likely to benefit. KSM's scans may use | ||
48 | a lot of processing power, and its kernel-resident pages are a limited | ||
49 | resource. Some installations will disable KSM for these reasons. | ||
50 | |||
51 | The KSM daemon is controlled by sysfs files in /sys/kernel/mm/ksm/, | ||
52 | readable by all but writable only by root: | ||
53 | |||
54 | max_kernel_pages - set to maximum number of kernel pages that KSM may use | ||
55 | e.g. "echo 2000 > /sys/kernel/mm/ksm/max_kernel_pages" | ||
56 | Value 0 imposes no limit on the kernel pages KSM may use; | ||
57 | but note that any process using MADV_MERGEABLE can cause | ||
58 | KSM to allocate these pages, unswappable until it exits. | ||
59 | Default: 2000 (chosen for demonstration purposes) | ||
60 | |||
61 | pages_to_scan - how many present pages to scan before ksmd goes to sleep | ||
62 | e.g. "echo 200 > /sys/kernel/mm/ksm/pages_to_scan" | ||
63 | Default: 200 (chosen for demonstration purposes) | ||
64 | |||
65 | sleep_millisecs - how many milliseconds ksmd should sleep before next scan | ||
66 | e.g. "echo 20 > /sys/kernel/mm/ksm/sleep_millisecs" | ||
67 | Default: 20 (chosen for demonstration purposes) | ||
68 | |||
69 | run - set 0 to stop ksmd from running but keep merged pages, | ||
70 | set 1 to run ksmd e.g. "echo 1 > /sys/kernel/mm/ksm/run", | ||
71 | set 2 to stop ksmd and unmerge all pages currently merged, | ||
72 | but leave mergeable areas registered for next run | ||
73 | Default: 1 (for immediate use by apps which register) | ||
74 | |||
75 | The effectiveness of KSM and MADV_MERGEABLE is shown in /sys/kernel/mm/ksm/: | ||
76 | |||
77 | pages_shared - how many shared unswappable kernel pages KSM is using | ||
78 | pages_sharing - how many more sites are sharing them i.e. how much saved | ||
79 | pages_unshared - how many pages unique but repeatedly checked for merging | ||
80 | pages_volatile - how many pages changing too fast to be placed in a tree | ||
81 | full_scans - how many times all mergeable areas have been scanned | ||
82 | |||
83 | A high ratio of pages_sharing to pages_shared indicates good sharing, but | ||
84 | a high ratio of pages_unshared to pages_sharing indicates wasted effort. | ||
85 | pages_volatile embraces several different kinds of activity, but a high | ||
86 | proportion there would also indicate poor use of madvise MADV_MERGEABLE. | ||
87 | |||
88 | Izik Eidus, | ||
89 | Hugh Dickins, 30 July 2009 | ||
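The madvise(2) usage that ksm.txt describes can be sketched in C roughly as follows. This is a minimal illustration and not part of the patch: it assumes a Linux system whose libc headers define MADV_MERGEABLE and MADV_UNMERGEABLE, and the helper names map_mergeable() and cancel_mergeable() are hypothetical.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS and MADV_* in glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, not from ksm.txt: map `length` bytes of anonymous
 * private memory and advise the kernel that it is a candidate for merging.
 * Returns the mapping, or NULL if mmap() fails. *advice_errno is set to 0
 * when madvise() succeeds, else to its errno (ksm.txt says EINVAL is
 * returned when KSM is not configured into the running kernel).
 */
static void *map_mergeable(size_t length, int *advice_errno)
{
    void *addr;

    assert(length > 0 && advice_errno != NULL);
    *advice_errno = 0;

    addr = mmap(NULL, length, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)
        return NULL;

    if (madvise(addr, length, MADV_MERGEABLE) != 0)
        *advice_errno = errno;
    return addr;
}

/*
 * Hypothetical helper: cancel the advice so that KSM unmerges whatever it
 * merged in the range. Returns 0 on success or the errno from madvise();
 * per ksm.txt this may be EAGAIN if unmerging needs more memory than is
 * available.
 */
static int cancel_mergeable(void *addr, size_t length)
{
    return madvise(addr, length, MADV_UNMERGEABLE) == 0 ? 0 : errno;
}
```

A caller would typically treat a non-zero *advice_errno as non-fatal: the mapping is still usable, it simply will not be scanned by ksmd.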
diff --git a/mm/Kconfig b/mm/Kconfig
index c0b6afa178a1..71eb0b4cce8d 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -224,6 +224,7 @@ config KSM | |||
224 | the many instances by a single resident page with that content, so | 224 | the many instances by a single resident page with that content, so |
225 | saving memory until one or another app needs to modify the content. | 225 | saving memory until one or another app needs to modify the content. |
226 | Recommended for use with KVM, or with other duplicative applications. | 226 | Recommended for use with KVM, or with other duplicative applications. |
227 | See Documentation/vm/ksm.txt for more information. | ||
227 | 228 | ||
228 | config DEFAULT_MMAP_MIN_ADDR | 229 | config DEFAULT_MMAP_MIN_ADDR |
229 | int "Low address space to protect from user allocation" | 230 | int "Low address space to protect from user allocation" |
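The pages_sharing to pages_shared ratio that ksm.txt uses as an indicator of good sharing could be computed from the sysfs counters along these lines. This is a sketch under stated assumptions, not code from the patch: the helper names ksm_read_counter() and ksm_sharing_ratio() are hypothetical, and the directory is passed in as a parameter so the caller decides where the counters live.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one unsigned decimal counter from the file
 * `name` inside directory `dir`. Returns 0 on success, or -1 if the path
 * is too long, the file cannot be opened, or no number can be parsed.
 */
static int ksm_read_counter(const char *dir, const char *name,
                            unsigned long *value)
{
    char path[256];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%lu", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: ratio of pages_sharing to pages_shared.
 * Returns -1.0 if either counter is unavailable or pages_shared is 0
 * (nothing merged yet, so the ratio is undefined).
 */
static double ksm_sharing_ratio(const char *dir)
{
    unsigned long shared, sharing;

    if (ksm_read_counter(dir, "pages_shared", &shared) != 0 ||
        ksm_read_counter(dir, "pages_sharing", &sharing) != 0 ||
        shared == 0)
        return -1.0;
    return (double)sharing / (double)shared;
}
```

On a kernel built with CONFIG_KSM=y, calling ksm_sharing_ratio("/sys/kernel/mm/ksm") would read the files listed in ksm.txt. A negative result means there is no ratio to report and should not be read as poor sharing.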