aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/vm
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/vm')
-rw-r--r--Documentation/vm/cleancache.txt43
-rw-r--r--Documentation/vm/page-types.c2
-rw-r--r--Documentation/vm/pagemap.txt4
-rw-r--r--Documentation/vm/unevictable-lru.txt8
4 files changed, 32 insertions, 25 deletions
diff --git a/Documentation/vm/cleancache.txt b/Documentation/vm/cleancache.txt
index 36c367c73084..142fbb0f325a 100644
--- a/Documentation/vm/cleancache.txt
+++ b/Documentation/vm/cleancache.txt
@@ -46,10 +46,11 @@ a negative return value indicates failure. A "put_page" will copy a
46the pool id, a file key, and a page index into the file. (The combination 46the pool id, a file key, and a page index into the file. (The combination
47of a pool id, a file key, and an index is sometimes called a "handle".) 47of a pool id, a file key, and an index is sometimes called a "handle".)
48A "get_page" will copy the page, if found, from cleancache into kernel memory. 48A "get_page" will copy the page, if found, from cleancache into kernel memory.
49A "flush_page" will ensure the page no longer is present in cleancache; 49An "invalidate_page" will ensure the page no longer is present in cleancache;
50a "flush_inode" will flush all pages associated with the specified file; 50an "invalidate_inode" will invalidate all pages associated with the specified
51and, when a filesystem is unmounted, a "flush_fs" will flush all pages in 51file; and, when a filesystem is unmounted, an "invalidate_fs" will invalidate
52all files specified by the given pool id and also surrender the pool id. 52all pages in all files specified by the given pool id and also surrender
53the pool id.
53 54
54An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache 55An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache
55to treat the pool as shared using a 128-bit UUID as a key. On systems 56to treat the pool as shared using a 128-bit UUID as a key. On systems
@@ -62,12 +63,12 @@ of the kernel (e.g. by "tools" that control cleancache). Or a
62cleancache implementation can simply disable shared_init by always 63cleancache implementation can simply disable shared_init by always
63returning a negative value. 64returning a negative value.
64 65
65If a get_page is successful on a non-shared pool, the page is flushed (thus 66If a get_page is successful on a non-shared pool, the page is invalidated
66making cleancache an "exclusive" cache). On a shared pool, the page 67(thus making cleancache an "exclusive" cache). On a shared pool, the page
67is NOT flushed on a successful get_page so that it remains accessible to 68is NOT invalidated on a successful get_page so that it remains accessible to
68other sharers. The kernel is responsible for ensuring coherency between 69other sharers. The kernel is responsible for ensuring coherency between
69cleancache (shared or not), the page cache, and the filesystem, using 70cleancache (shared or not), the page cache, and the filesystem, using
70cleancache flush operations as required. 71cleancache invalidate operations as required.
71 72
72Note that cleancache must enforce put-put-get coherency and get-get 73Note that cleancache must enforce put-put-get coherency and get-get
73coherency. For the former, if two puts are made to the same handle but 74coherency. For the former, if two puts are made to the same handle but
@@ -77,22 +78,22 @@ if a get for a given handle fails, subsequent gets for that handle will
77never succeed unless preceded by a successful put with that handle. 78never succeed unless preceded by a successful put with that handle.
78 79
79Last, cleancache provides no SMP serialization guarantees; if two 80Last, cleancache provides no SMP serialization guarantees; if two
80different Linux threads are simultaneously putting and flushing a page 81different Linux threads are simultaneously putting and invalidating a page
81with the same handle, the results are indeterminate. Callers must 82with the same handle, the results are indeterminate. Callers must
82lock the page to ensure serial behavior. 83lock the page to ensure serial behavior.
83 84
84CLEANCACHE PERFORMANCE METRICS 85CLEANCACHE PERFORMANCE METRICS
85 86
86Cleancache monitoring is done by sysfs files in the 87If properly configured, monitoring of cleancache is done via debugfs in
87/sys/kernel/mm/cleancache directory. The effectiveness of cleancache 88the /sys/kernel/debug/mm/cleancache directory. The effectiveness of cleancache
88can be measured (across all filesystems) with: 89can be measured (across all filesystems) with:
89 90
90succ_gets - number of gets that were successful 91succ_gets - number of gets that were successful
91failed_gets - number of gets that failed 92failed_gets - number of gets that failed
92puts - number of puts attempted (all "succeed") 93puts - number of puts attempted (all "succeed")
93flushes - number of flushes attempted 94invalidates - number of invalidates attempted
94 95
95A backend implementatation may provide additional metrics. 96A backend implementation may provide additional metrics.
96 97
97FAQ 98FAQ
98 99
@@ -143,7 +144,7 @@ systems.
143 144
144The core hooks for cleancache in VFS are in most cases a single line 145The core hooks for cleancache in VFS are in most cases a single line
145and the minimum set are placed precisely where needed to maintain 146and the minimum set are placed precisely where needed to maintain
146coherency (via cleancache_flush operations) between cleancache, 147coherency (via cleancache_invalidate operations) between cleancache,
147the page cache, and disk. All hooks compile into nothingness if 148the page cache, and disk. All hooks compile into nothingness if
148cleancache is config'ed off and turn into a function-pointer- 149cleancache is config'ed off and turn into a function-pointer-
149compare-to-NULL if config'ed on but no backend claims the ops 150compare-to-NULL if config'ed on but no backend claims the ops
@@ -184,15 +185,15 @@ or for real kernel-addressable RAM, it makes perfect sense for
184transcendent memory. 185transcendent memory.
185 186
1864) Why is non-shared cleancache "exclusive"? And where is the 1874) Why is non-shared cleancache "exclusive"? And where is the
187 page "flushed" after a "get"? (Minchan Kim) 188 page "invalidated" after a "get"? (Minchan Kim)
188 189
189The main reason is to free up space in transcendent memory and 190The main reason is to free up space in transcendent memory and
190to avoid unnecessary cleancache_flush calls. If you want inclusive, 191to avoid unnecessary cleancache_invalidate calls. If you want inclusive,
191the page can be "put" immediately following the "get". If 192the page can be "put" immediately following the "get". If
192put-after-get for inclusive becomes common, the interface could 193put-after-get for inclusive becomes common, the interface could
193be easily extended to add a "get_no_flush" call. 194be easily extended to add a "get_no_invalidate" call.
194 195
195The flush is done by the cleancache backend implementation. 196The invalidate is done by the cleancache backend implementation.
196 197
1975) What's the performance impact? 1985) What's the performance impact?
198 199
@@ -222,7 +223,7 @@ Some points for a filesystem to consider:
222 as tmpfs should not enable cleancache) 223 as tmpfs should not enable cleancache)
223- To ensure coherency/correctness, the FS must ensure that all 224- To ensure coherency/correctness, the FS must ensure that all
224 file removal or truncation operations either go through VFS or 225 file removal or truncation operations either go through VFS or
225 add hooks to do the equivalent cleancache "flush" operations 226 add hooks to do the equivalent cleancache "invalidate" operations
226- To ensure coherency/correctness, either inode numbers must 227- To ensure coherency/correctness, either inode numbers must
227 be unique across the lifetime of the on-disk file OR the 228 be unique across the lifetime of the on-disk file OR the
228 FS must provide an "encode_fh" function. 229 FS must provide an "encode_fh" function.
@@ -243,11 +244,11 @@ If cleancache would use the inode virtual address instead of
243inode/filehandle, the pool id could be eliminated. But, this 244inode/filehandle, the pool id could be eliminated. But, this
244won't work because cleancache retains pagecache data pages 245won't work because cleancache retains pagecache data pages
245persistently even when the inode has been pruned from the 246persistently even when the inode has been pruned from the
246inode unused list, and only flushes the data page if the file 247inode unused list, and only invalidates the data page if the file
247gets removed/truncated. So if cleancache used the inode kva, 248gets removed/truncated. So if cleancache used the inode kva,
248there would be potential coherency issues if/when the inode 249there would be potential coherency issues if/when the inode
249kva is reused for a different file. Alternately, if cleancache 250kva is reused for a different file. Alternately, if cleancache
250flushed the pages when the inode kva was freed, much of the value 251invalidated the pages when the inode kva was freed, much of the value
251of cleancache would be lost because the cache of pages in cleanache 252of cleancache would be lost because the cache of pages in cleanache
252is potentially much larger than the kernel pagecache and is most 253is potentially much larger than the kernel pagecache and is most
253useful if the pages survive inode cache removal. 254useful if the pages survive inode cache removal.
diff --git a/Documentation/vm/page-types.c b/Documentation/vm/page-types.c
index 7445caa26d05..0b13f02d4059 100644
--- a/Documentation/vm/page-types.c
+++ b/Documentation/vm/page-types.c
@@ -98,6 +98,7 @@
98#define KPF_HWPOISON 19 98#define KPF_HWPOISON 19
99#define KPF_NOPAGE 20 99#define KPF_NOPAGE 20
100#define KPF_KSM 21 100#define KPF_KSM 21
101#define KPF_THP 22
101 102
102/* [32-] kernel hacking assistances */ 103/* [32-] kernel hacking assistances */
103#define KPF_RESERVED 32 104#define KPF_RESERVED 32
@@ -147,6 +148,7 @@ static const char *page_flag_names[] = {
147 [KPF_HWPOISON] = "X:hwpoison", 148 [KPF_HWPOISON] = "X:hwpoison",
148 [KPF_NOPAGE] = "n:nopage", 149 [KPF_NOPAGE] = "n:nopage",
149 [KPF_KSM] = "x:ksm", 150 [KPF_KSM] = "x:ksm",
151 [KPF_THP] = "t:thp",
150 152
151 [KPF_RESERVED] = "r:reserved", 153 [KPF_RESERVED] = "r:reserved",
152 [KPF_MLOCKED] = "m:mlocked", 154 [KPF_MLOCKED] = "m:mlocked",
diff --git a/Documentation/vm/pagemap.txt b/Documentation/vm/pagemap.txt
index df09b9650a81..4600cbe3d6be 100644
--- a/Documentation/vm/pagemap.txt
+++ b/Documentation/vm/pagemap.txt
@@ -60,6 +60,7 @@ There are three components to pagemap:
60 19. HWPOISON 60 19. HWPOISON
61 20. NOPAGE 61 20. NOPAGE
62 21. KSM 62 21. KSM
63 22. THP
63 64
64Short descriptions to the page flags: 65Short descriptions to the page flags:
65 66
@@ -97,6 +98,9 @@ Short descriptions to the page flags:
9721. KSM 9821. KSM
98 identical memory pages dynamically shared between one or more processes 99 identical memory pages dynamically shared between one or more processes
99 100
10122. THP
102 contiguous pages which construct transparent hugepages
103
100 [IO related page flags] 104 [IO related page flags]
101 1. ERROR IO error occurred 105 1. ERROR IO error occurred
102 3. UPTODATE page has up-to-date data 106 3. UPTODATE page has up-to-date data
diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt
index 97bae3c576c2..fa206cccf89f 100644
--- a/Documentation/vm/unevictable-lru.txt
+++ b/Documentation/vm/unevictable-lru.txt
@@ -538,7 +538,7 @@ different reverse map mechanisms.
538 process because mlocked pages are migratable. However, for reclaim, if 538 process because mlocked pages are migratable. However, for reclaim, if
539 the page is mapped into a VM_LOCKED VMA, the scan stops. 539 the page is mapped into a VM_LOCKED VMA, the scan stops.
540 540
541 try_to_unmap_anon() attempts to acquire in read mode the mmap semphore of 541 try_to_unmap_anon() attempts to acquire in read mode the mmap semaphore of
542 the mm_struct to which the VMA belongs. If this is successful, it will 542 the mm_struct to which the VMA belongs. If this is successful, it will
543 mlock the page via mlock_vma_page() - we wouldn't have gotten to 543 mlock the page via mlock_vma_page() - we wouldn't have gotten to
544 try_to_unmap_anon() if the page were already mlocked - and will return 544 try_to_unmap_anon() if the page were already mlocked - and will return
@@ -619,11 +619,11 @@ all PTEs from the page. For this purpose, the unevictable/mlock infrastructure
619introduced a variant of try_to_unmap() called try_to_munlock(). 619introduced a variant of try_to_unmap() called try_to_munlock().
620 620
621try_to_munlock() calls the same functions as try_to_unmap() for anonymous and 621try_to_munlock() calls the same functions as try_to_unmap() for anonymous and
622mapped file pages with an additional argument specifing unlock versus unmap 622mapped file pages with an additional argument specifying unlock versus unmap
623processing. Again, these functions walk the respective reverse maps looking 623processing. Again, these functions walk the respective reverse maps looking
624for VM_LOCKED VMAs. When such a VMA is found for anonymous pages and file 624for VM_LOCKED VMAs. When such a VMA is found for anonymous pages and file
625pages mapped in linear VMAs, as in the try_to_unmap() case, the functions 625pages mapped in linear VMAs, as in the try_to_unmap() case, the functions
626attempt to acquire the associated mmap semphore, mlock the page via 626attempt to acquire the associated mmap semaphore, mlock the page via
627mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the 627mlock_vma_page() and return SWAP_MLOCK. This effectively undoes the
628pre-clearing of the page's PG_mlocked done by munlock_vma_page. 628pre-clearing of the page's PG_mlocked done by munlock_vma_page.
629 629
@@ -641,7 +641,7 @@ with it - the usual fallback position.
641Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's 641Note that try_to_munlock()'s reverse map walk must visit every VMA in a page's
642reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA. 642reverse map to determine that a page is NOT mapped into any VM_LOCKED VMA.
643However, the scan can terminate when it encounters a VM_LOCKED VMA and can 643However, the scan can terminate when it encounters a VM_LOCKED VMA and can
644successfully acquire the VMA's mmap semphore for read and mlock the page. 644successfully acquire the VMA's mmap semaphore for read and mlock the page.
645Although try_to_munlock() might be called a great many times when munlocking a 645Although try_to_munlock() might be called a great many times when munlocking a
646large region or tearing down a large address space that has been mlocked via 646large region or tearing down a large address space that has been mlocked via
647mlockall(), overall this is a fairly rare event. 647mlockall(), overall this is a fairly rare event.