author: Linus Torvalds <torvalds@linux-foundation.org> 2012-03-22 22:52:47 -0400
committer: Linus Torvalds <torvalds@linux-foundation.org> 2012-03-22 22:52:47 -0400
commit: aab008db8063364dc3c8ccf4981c21124866b395
tree: 72914203f4decb023efdaabd0301a62d742dfa8c /Documentation
parent: 4f5b1affdda3e0c48cac674182f52004137b0ffc
parent: 16c0cfa425b8e1488f7a1873bd112a7a099325f0

Merge tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm

Pull cleancache changes from Konrad Rzeszutek Wilk:
 "This has some patches for the cleancache API that should have been
  submitted a _long_ time ago. They are basically cleanups:

   - rename of flush to invalidate
   - moving reporting of statistics into debugfs
   - use __read_mostly as necessary.

  Oh, and also the MAINTAINERS file change. The files (except the
  MAINTAINERS file) have been in #linux-next for months now. The late
  addition of MAINTAINERS file is a brain-fart on my side - didn't realize
  I needed that just until I was typing this up - and I based that patch
  on v3.3 - so the tree is on top of v3.3."

* tag 'stable/for-linus-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/mm:
  MAINTAINERS: Adding cleancache API to the list.
  mm: cleancache: Use __read_mostly as appropiate.
  mm: cleancache: report statistics via debugfs instead of sysfs.
  mm: zcache/tmem/cleancache: s/flush/invalidate/
  mm: cleancache: s/flush/invalidate/

Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/ABI/testing/sysfs-kernel-mm-cleancache | 11
-rw-r--r-- Documentation/vm/cleancache.txt | 41
2 files changed, 21 insertions, 31 deletions
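
The commit message says statistics reporting moves from sysfs to debugfs, and the patched cleancache.txt names the directory /sys/kernel/debug/mm/cleancache and the counters succ_gets, failed_gets, puts, and invalidates. A minimal sketch of reading them follows. It assumes each counter file holds a single decimal integer, that debugfs is mounted at /sys/kernel/debug, and that the caller may read it. The helper names and the hit-ratio calculation are illustrative and are not part of the patch.

```python
from pathlib import Path

# Counter names as listed in the patched Documentation/vm/cleancache.txt.
COUNTERS = ("succ_gets", "failed_gets", "puts", "invalidates")


def read_cleancache_counters(directory="/sys/kernel/debug/mm/cleancache"):
    """Return {counter name: int or None} for the cleancache counters.

    Assumption: every file contains one decimal integer. A counter that is
    missing, unreadable, or not an integer is reported as None.
    """
    counters = {}
    for name in COUNTERS:
        try:
            counters[name] = int((Path(directory) / name).read_text().strip())
        except (OSError, ValueError):
            counters[name] = None
    return counters


def get_hit_ratio(counters):
    """Fraction of gets that succeeded, or None if it cannot be computed.

    This ratio is an illustrative derived metric, not something the kernel
    exports itself.
    """
    succ = counters.get("succ_gets")
    failed = counters.get("failed_gets")
    if succ is None or failed is None or succ + failed == 0:
        return None
    return succ / (succ + failed)
```

Calling read_cleancache_counters() with no argument targets the debugfs directory named in the documentation. Passing another directory makes the helper easy to try against saved copies of the counter files.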
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-cleancache b/Documentation/ABI/testing/sysfs-kernel-mm-cleancache
deleted file mode 100644
index 662ae646ea12..000000000000
--- a/Documentation/ABI/testing/sysfs-kernel-mm-cleancache
+++ /dev/null
@@ -1,11 +0,0 @@
-What: /sys/kernel/mm/cleancache/
-Date: April 2011
-Contact: Dan Magenheimer <dan.magenheimer@oracle.com>
-Description:
-    /sys/kernel/mm/cleancache/ contains a number of files which
-    record a count of various cleancache operations
-    (sum across all filesystems):
-        succ_gets
-        failed_gets
-        puts
-        flushes
diff --git a/Documentation/vm/cleancache.txt b/Documentation/vm/cleancache.txt
index d5c615af10ba..142fbb0f325a 100644
--- a/Documentation/vm/cleancache.txt
+++ b/Documentation/vm/cleancache.txt
@@ -46,10 +46,11 @@ a negative return value indicates failure. A "put_page" will copy a
 the pool id, a file key, and a page index into the file. (The combination
 of a pool id, a file key, and an index is sometimes called a "handle".)
 A "get_page" will copy the page, if found, from cleancache into kernel memory.
-A "flush_page" will ensure the page no longer is present in cleancache;
-a "flush_inode" will flush all pages associated with the specified file;
-and, when a filesystem is unmounted, a "flush_fs" will flush all pages in
-all files specified by the given pool id and also surrender the pool id.
+An "invalidate_page" will ensure the page no longer is present in cleancache;
+an "invalidate_inode" will invalidate all pages associated with the specified
+file; and, when a filesystem is unmounted, an "invalidate_fs" will invalidate
+all pages in all files specified by the given pool id and also surrender
+the pool id.
 
 An "init_shared_fs", like init_fs, obtains a pool id but tells cleancache
 to treat the pool as shared using a 128-bit UUID as a key. On systems
@@ -62,12 +63,12 @@ of the kernel (e.g. by "tools" that control cleancache). Or a
 cleancache implementation can simply disable shared_init by always
 returning a negative value.
 
-If a get_page is successful on a non-shared pool, the page is flushed (thus
-making cleancache an "exclusive" cache). On a shared pool, the page
-is NOT flushed on a successful get_page so that it remains accessible to
+If a get_page is successful on a non-shared pool, the page is invalidated
+(thus making cleancache an "exclusive" cache). On a shared pool, the page
+is NOT invalidated on a successful get_page so that it remains accessible to
 other sharers. The kernel is responsible for ensuring coherency between
 cleancache (shared or not), the page cache, and the filesystem, using
-cleancache flush operations as required.
+cleancache invalidate operations as required.
 
 Note that cleancache must enforce put-put-get coherency and get-get
 coherency. For the former, if two puts are made to the same handle but
@@ -77,20 +78,20 @@ if a get for a given handle fails, subsequent gets for that handle will
 never succeed unless preceded by a successful put with that handle.
 
 Last, cleancache provides no SMP serialization guarantees; if two
-different Linux threads are simultaneously putting and flushing a page
+different Linux threads are simultaneously putting and invalidating a page
 with the same handle, the results are indeterminate. Callers must
 lock the page to ensure serial behavior.
 
 CLEANCACHE PERFORMANCE METRICS
 
-Cleancache monitoring is done by sysfs files in the
-/sys/kernel/mm/cleancache directory. The effectiveness of cleancache
+If properly configured, monitoring of cleancache is done via debugfs in
+the /sys/kernel/debug/mm/cleancache directory. The effectiveness of cleancache
 can be measured (across all filesystems) with:
 
 succ_gets - number of gets that were successful
 failed_gets - number of gets that failed
 puts - number of puts attempted (all "succeed")
-flushes - number of flushes attempted
+invalidates - number of invalidates attempted
 
 A backend implementation may provide additional metrics.
 
@@ -143,7 +144,7 @@ systems.
 
 The core hooks for cleancache in VFS are in most cases a single line
 and the minimum set are placed precisely where needed to maintain
-coherency (via cleancache_flush operations) between cleancache,
+coherency (via cleancache_invalidate operations) between cleancache,
 the page cache, and disk. All hooks compile into nothingness if
 cleancache is config'ed off and turn into a function-pointer-
 compare-to-NULL if config'ed on but no backend claims the ops
@@ -184,15 +185,15 @@ or for real kernel-addressable RAM, it makes perfect sense for
 transcendent memory.
 
 4) Why is non-shared cleancache "exclusive"? And where is the
-   page "flushed" after a "get"? (Minchan Kim)
+   page "invalidated" after a "get"? (Minchan Kim)
 
 The main reason is to free up space in transcendent memory and
-to avoid unnecessary cleancache_flush calls. If you want inclusive,
+to avoid unnecessary cleancache_invalidate calls. If you want inclusive,
 the page can be "put" immediately following the "get". If
 put-after-get for inclusive becomes common, the interface could
-be easily extended to add a "get_no_flush" call.
+be easily extended to add a "get_no_invalidate" call.
 
-The flush is done by the cleancache backend implementation.
+The invalidate is done by the cleancache backend implementation.
 
 5) What's the performance impact?
 
@@ -222,7 +223,7 @@ Some points for a filesystem to consider:
   as tmpfs should not enable cleancache)
 - To ensure coherency/correctness, the FS must ensure that all
   file removal or truncation operations either go through VFS or
-  add hooks to do the equivalent cleancache "flush" operations
+  add hooks to do the equivalent cleancache "invalidate" operations
 - To ensure coherency/correctness, either inode numbers must
   be unique across the lifetime of the on-disk file OR the
   FS must provide an "encode_fh" function.
@@ -243,11 +244,11 @@ If cleancache would use the inode virtual address instead of
 inode/filehandle, the pool id could be eliminated. But, this
 won't work because cleancache retains pagecache data pages
 persistently even when the inode has been pruned from the
-inode unused list, and only flushes the data page if the file
+inode unused list, and only invalidates the data page if the file
 gets removed/truncated. So if cleancache used the inode kva,
 there would be potential coherency issues if/when the inode
 kva is reused for a different file. Alternately, if cleancache
-flushed the pages when the inode kva was freed, much of the value
+invalidated the pages when the inode kva was freed, much of the value
 of cleancache would be lost because the cache of pages in cleanache
 is potentially much larger than the kernel pagecache and is most
 useful if the pages survive inode cache removal.
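
The renamed operations in the patched text (put_page, get_page, invalidate_page, invalidate_inode, invalidate_fs) and the exclusive-versus-shared rule can be illustrated with a toy in-memory model. This is only a sketch of the documented semantics, not the kernel's C interface. The class name, the `shared` flag standing in for init_shared_fs and its UUID, and the use of bytes for file keys and page data are all assumptions made for illustration. A real backend may also drop pages at any time, which this model never does.

```python
from typing import Dict, Optional, Tuple

# A "handle" is the combination of pool id, file key, and page index.
Handle = Tuple[int, bytes, int]


class ToyCleancache:
    """Hypothetical in-memory model of the documented cleancache semantics."""

    def __init__(self) -> None:
        self._pages: Dict[Handle, bytes] = {}
        self._shared: Dict[int, bool] = {}  # live pool id -> is the pool shared?
        self._next_pool = 0

    def init_fs(self, shared: bool = False) -> int:
        """Obtain a pool id; shared=True stands in for init_shared_fs."""
        pool_id = self._next_pool
        self._next_pool += 1
        self._shared[pool_id] = shared
        return pool_id

    def put_page(self, pool_id: int, key: bytes, index: int, data: bytes) -> None:
        """Copy a page into the cache; a second put to a handle replaces the first."""
        if pool_id in self._shared:  # ignore puts to a surrendered pool
            self._pages[(pool_id, key, index)] = bytes(data)

    def get_page(self, pool_id: int, key: bytes, index: int) -> Optional[bytes]:
        """Return the page if found; on a non-shared pool the get is exclusive."""
        handle = (pool_id, key, index)
        data = self._pages.get(handle)
        if data is not None and not self._shared.get(pool_id, False):
            del self._pages[handle]  # invalidated by a successful get
        return data

    def invalidate_page(self, pool_id: int, key: bytes, index: int) -> None:
        """Ensure the page is no longer present."""
        self._pages.pop((pool_id, key, index), None)

    def invalidate_inode(self, pool_id: int, key: bytes) -> None:
        """Invalidate all pages associated with one file."""
        for handle in [h for h in self._pages if h[0] == pool_id and h[1] == key]:
            del self._pages[handle]

    def invalidate_fs(self, pool_id: int) -> None:
        """Invalidate every page in the pool and surrender the pool id."""
        for handle in [h for h in self._pages if h[0] == pool_id]:
            del self._pages[handle]
        self._shared.pop(pool_id, None)
```

In this model a get that follows two puts to the same handle returns the data of the second put, and once a get on a handle fails, later gets keep failing until another put to that handle succeeds. The model is single-threaded, so it says nothing about the SMP case, where the text requires callers to lock the page.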