Diffstat (limited to 'Documentation/vm/frontswap.txt')
-rw-r--r-- | Documentation/vm/frontswap.txt | 50 |
1 files changed, 25 insertions, 25 deletions
diff --git a/Documentation/vm/frontswap.txt b/Documentation/vm/frontswap.txt
index a9f731af0fac..37067cf455f4 100644
--- a/Documentation/vm/frontswap.txt
+++ b/Documentation/vm/frontswap.txt
@@ -21,21 +21,21 @@ frontswap_ops funcs appropriately and the functions it provides must
21 | conform to certain policies as follows: | 21 | conform to certain policies as follows: |
22 | 22 | ||
23 | An "init" prepares the device to receive frontswap pages associated | 23 | An "init" prepares the device to receive frontswap pages associated |
24 | with the specified swap device number (aka "type"). A "put_page" will | 24 | with the specified swap device number (aka "type"). A "store" will |
25 | copy the page to transcendent memory and associate it with the type and | 25 | copy the page to transcendent memory and associate it with the type and |
26 | offset associated with the page. A "get_page" will copy the page, if found, | 26 | offset associated with the page. A "load" will copy the page, if found, |
27 | from transcendent memory into kernel memory, but will NOT remove the page | 27 | from transcendent memory into kernel memory, but will NOT remove the page |
28 | from from transcendent memory. An "invalidate_page" will remove the page | 28 | from from transcendent memory. An "invalidate_page" will remove the page |
29 | from transcendent memory and an "invalidate_area" will remove ALL pages | 29 | from transcendent memory and an "invalidate_area" will remove ALL pages |
30 | associated with the swap type (e.g., like swapoff) and notify the "device" | 30 | associated with the swap type (e.g., like swapoff) and notify the "device" |
31 | to refuse further puts with that swap type. | 31 | to refuse further stores with that swap type. |
32 | 32 | ||
33 | Once a page is successfully put, a matching get on the page will normally | 33 | Once a page is successfully stored, a matching load on the page will normally |
34 | succeed. So when the kernel finds itself in a situation where it needs | 34 | succeed. So when the kernel finds itself in a situation where it needs |
35 | to swap out a page, it first attempts to use frontswap. If the put returns | 35 | to swap out a page, it first attempts to use frontswap. If the store returns |
36 | success, the data has been successfully saved to transcendent memory and | 36 | success, the data has been successfully saved to transcendent memory and |
37 | a disk write and, if the data is later read back, a disk read are avoided. | 37 | a disk write and, if the data is later read back, a disk read are avoided. |
38 | If a put returns failure, transcendent memory has rejected the data, and the | 38 | If a store returns failure, transcendent memory has rejected the data, and the |
39 | page can be written to swap as usual. | 39 | page can be written to swap as usual. |
40 | 40 | ||
41 | If a backend chooses, frontswap can be configured as a "writethrough | 41 | If a backend chooses, frontswap can be configured as a "writethrough |
@@ -44,18 +44,18 @@ in swap device writes is lost (and also a non-trivial performance advantage)
44 | in order to allow the backend to arbitrarily "reclaim" space used to | 44 | in order to allow the backend to arbitrarily "reclaim" space used to |
45 | store frontswap pages to more completely manage its memory usage. | 45 | store frontswap pages to more completely manage its memory usage. |
46 | 46 | ||
47 | Note that if a page is put and the page already exists in transcendent memory | 47 | Note that if a page is stored and the page already exists in transcendent memory |
48 | (a "duplicate" put), either the put succeeds and the data is overwritten, | 48 | (a "duplicate" store), either the store succeeds and the data is overwritten, |
49 | or the put fails AND the page is invalidated. This ensures stale data may | 49 | or the store fails AND the page is invalidated. This ensures stale data may |
50 | never be obtained from frontswap. | 50 | never be obtained from frontswap. |
51 | 51 | ||
52 | If properly configured, monitoring of frontswap is done via debugfs in | 52 | If properly configured, monitoring of frontswap is done via debugfs in |
53 | the /sys/kernel/debug/frontswap directory. The effectiveness of | 53 | the /sys/kernel/debug/frontswap directory. The effectiveness of |
54 | frontswap can be measured (across all swap devices) with: | 54 | frontswap can be measured (across all swap devices) with: |
55 | 55 | ||
56 | failed_puts - how many put attempts have failed | 56 | failed_stores - how many store attempts have failed |
57 | gets - how many gets were attempted (all should succeed) | 57 | loads - how many loads were attempted (all should succeed) |
58 | succ_puts - how many put attempts have succeeded | 58 | succ_stores - how many store attempts have succeeded |
59 | invalidates - how many invalidates were attempted | 59 | invalidates - how many invalidates were attempted |
60 | 60 | ||
61 | A backend implementation may provide additional metrics. | 61 | A backend implementation may provide additional metrics. |
@@ -125,7 +125,7 @@ nothingness and the only overhead is a few extra bytes per swapon'ed
125 | swap device. If CONFIG_FRONTSWAP is enabled but no frontswap "backend" | 125 | swap device. If CONFIG_FRONTSWAP is enabled but no frontswap "backend" |
126 | registers, there is one extra global variable compared to zero for | 126 | registers, there is one extra global variable compared to zero for |
127 | every swap page read or written. If CONFIG_FRONTSWAP is enabled | 127 | every swap page read or written. If CONFIG_FRONTSWAP is enabled |
128 | AND a frontswap backend registers AND the backend fails every "put" | 128 | AND a frontswap backend registers AND the backend fails every "store" |
129 | request (i.e. provides no memory despite claiming it might), | 129 | request (i.e. provides no memory despite claiming it might), |
130 | CPU overhead is still negligible -- and since every frontswap fail | 130 | CPU overhead is still negligible -- and since every frontswap fail |
131 | precedes a swap page write-to-disk, the system is highly likely | 131 | precedes a swap page write-to-disk, the system is highly likely |
@@ -159,13 +159,13 @@ entirely dynamic and random.
159 | 159 | ||
160 | Whenever a swap-device is swapon'd frontswap_init() is called, | 160 | Whenever a swap-device is swapon'd frontswap_init() is called, |
161 | passing the swap device number (aka "type") as a parameter. | 161 | passing the swap device number (aka "type") as a parameter. |
162 | This notifies frontswap to expect attempts to "put" swap pages | 162 | This notifies frontswap to expect attempts to "store" swap pages |
163 | associated with that number. | 163 | associated with that number. |
164 | 164 | ||
165 | Whenever the swap subsystem is readying a page to write to a swap | 165 | Whenever the swap subsystem is readying a page to write to a swap |
166 | device (c.f swap_writepage()), frontswap_put_page is called. Frontswap | 166 | device (c.f swap_writepage()), frontswap_store is called. Frontswap |
167 | consults with the frontswap backend and if the backend says it does NOT | 167 | consults with the frontswap backend and if the backend says it does NOT |
168 | have room, frontswap_put_page returns -1 and the kernel swaps the page | 168 | have room, frontswap_store returns -1 and the kernel swaps the page |
169 | to the swap device as normal. Note that the response from the frontswap | 169 | to the swap device as normal. Note that the response from the frontswap |
170 | backend is unpredictable to the kernel; it may choose to never accept a | 170 | backend is unpredictable to the kernel; it may choose to never accept a |
171 | page, it could accept every ninth page, or it might accept every | 171 | page, it could accept every ninth page, or it might accept every |
@@ -177,7 +177,7 @@ corresponding to the page offset on the swap device to which it would
177 | otherwise have written the data. | 177 | otherwise have written the data. |
178 | 178 | ||
179 | When the swap subsystem needs to swap-in a page (swap_readpage()), | 179 | When the swap subsystem needs to swap-in a page (swap_readpage()), |
180 | it first calls frontswap_get_page() which checks the frontswap_map to | 180 | it first calls frontswap_load() which checks the frontswap_map to |
181 | see if the page was earlier accepted by the frontswap backend. If | 181 | see if the page was earlier accepted by the frontswap backend. If |
182 | it was, the page of data is filled from the frontswap backend and | 182 | it was, the page of data is filled from the frontswap backend and |
183 | the swap-in is complete. If not, the normal swap-in code is | 183 | the swap-in is complete. If not, the normal swap-in code is |
@@ -185,7 +185,7 @@ executed to obtain the page of data from the real swap device.
185 | 185 | ||
186 | So every time the frontswap backend accepts a page, a swap device read | 186 | So every time the frontswap backend accepts a page, a swap device read |
187 | and (potentially) a swap device write are replaced by a "frontswap backend | 187 | and (potentially) a swap device write are replaced by a "frontswap backend |
188 | put" and (possibly) a "frontswap backend get", which are presumably much | 188 | store" and (possibly) a "frontswap backend loads", which are presumably much |
189 | faster. | 189 | faster. |
190 | 190 | ||
191 | 4) Can't frontswap be configured as a "special" swap device that is | 191 | 4) Can't frontswap be configured as a "special" swap device that is |
@@ -215,8 +215,8 @@ that are inappropriate for a RAM-oriented device including delaying
215 | the write of some pages for a significant amount of time. Synchrony is | 215 | the write of some pages for a significant amount of time. Synchrony is |
216 | required to ensure the dynamicity of the backend and to avoid thorny race | 216 | required to ensure the dynamicity of the backend and to avoid thorny race |
217 | conditions that would unnecessarily and greatly complicate frontswap | 217 | conditions that would unnecessarily and greatly complicate frontswap |
218 | and/or the block I/O subsystem. That said, only the initial "put" | 218 | and/or the block I/O subsystem. That said, only the initial "store" |
219 | and "get" operations need be synchronous. A separate asynchronous thread | 219 | and "load" operations need be synchronous. A separate asynchronous thread |
220 | is free to manipulate the pages stored by frontswap. For example, | 220 | is free to manipulate the pages stored by frontswap. For example, |
221 | the "remotification" thread in RAMster uses standard asynchronous | 221 | the "remotification" thread in RAMster uses standard asynchronous |
222 | kernel sockets to move compressed frontswap pages to a remote machine. | 222 | kernel sockets to move compressed frontswap pages to a remote machine. |
@@ -229,7 +229,7 @@ choose to accept pages only until host-swapping might be imminent,
229 | then force guests to do their own swapping. | 229 | then force guests to do their own swapping. |
230 | 230 | ||
231 | There is a downside to the transcendent memory specifications for | 231 | There is a downside to the transcendent memory specifications for |
232 | frontswap: Since any "put" might fail, there must always be a real | 232 | frontswap: Since any "store" might fail, there must always be a real |
233 | slot on a real swap device to swap the page. Thus frontswap must be | 233 | slot on a real swap device to swap the page. Thus frontswap must be |
234 | implemented as a "shadow" to every swapon'd device with the potential | 234 | implemented as a "shadow" to every swapon'd device with the potential |
235 | capability of holding every page that the swap device might have held | 235 | capability of holding every page that the swap device might have held |
@@ -240,16 +240,16 @@ installation, frontswap is useless. Swapless portable devices
240 | can still use frontswap but a backend for such devices must configure | 240 | can still use frontswap but a backend for such devices must configure |
241 | some kind of "ghost" swap device and ensure that it is never used. | 241 | some kind of "ghost" swap device and ensure that it is never used. |
242 | 242 | ||
243 | 5) Why this weird definition about "duplicate puts"? If a page | 243 | 5) Why this weird definition about "duplicate stores"? If a page |
244 | has been previously successfully put, can't it always be | 244 | has been previously successfully stored, can't it always be |
245 | successfully overwritten? | 245 | successfully overwritten? |
246 | 246 | ||
247 | Nearly always it can, but no, sometimes it cannot. Consider an example | 247 | Nearly always it can, but no, sometimes it cannot. Consider an example |
248 | where data is compressed and the original 4K page has been compressed | 248 | where data is compressed and the original 4K page has been compressed |
249 | to 1K. Now an attempt is made to overwrite the page with data that | 249 | to 1K. Now an attempt is made to overwrite the page with data that |
250 | is non-compressible and so would take the entire 4K. But the backend | 250 | is non-compressible and so would take the entire 4K. But the backend |
251 | has no more space. In this case, the put must be rejected. Whenever | 251 | has no more space. In this case, the store must be rejected. Whenever |
252 | frontswap rejects a put that would overwrite, it also must invalidate | 252 | frontswap rejects a store that would overwrite, it also must invalidate |
253 | the old data and ensure that it is no longer accessible. Since the | 253 | the old data and ensure that it is no longer accessible. Since the |
254 | swap subsystem then writes the new data to the read swap device, | 254 | swap subsystem then writes the new data to the read swap device, |
255 | this is the correct course of action to ensure coherency. | 255 | this is the correct course of action to ensure coherency. |
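
Illustration (not part of the patch): the renamed operations and the "duplicate store" rule described in the text above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions. Every toy_* name, the byte budget, and the "cost" parameter (standing in for a compressed size) are hypothetical and are not the kernel's frontswap_ops interface. The part of "invalidate_area" that tells the device to refuse further stores for that type is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096
#define TOY_MAX_PAGES 8                      /* hypothetical slot count */
#define TOY_BUDGET    (TOY_PAGE_SIZE + 1024) /* hypothetical byte budget */

struct toy_entry {
	int used;
	unsigned type;          /* swap device number */
	unsigned long offset;   /* page offset on that device */
	size_t cost;            /* simulated compressed size in bytes */
	unsigned char data[TOY_PAGE_SIZE];
};

struct toy_entry toy_pool[TOY_MAX_PAGES];
size_t toy_used_bytes;

struct toy_entry *toy_find(unsigned type, unsigned long offset)
{
	for (int i = 0; i < TOY_MAX_PAGES; i++)
		if (toy_pool[i].used && toy_pool[i].type == type &&
		    toy_pool[i].offset == offset)
			return &toy_pool[i];
	return NULL;
}

/* "invalidate_page": drop one page if it is present. */
void toy_invalidate_page(unsigned type, unsigned long offset)
{
	struct toy_entry *e = toy_find(type, offset);

	if (e) {
		toy_used_bytes -= e->cost;
		e->used = 0;
	}
}

/* "invalidate_area": drop ALL pages of one swap type. */
void toy_invalidate_area(unsigned type)
{
	for (int i = 0; i < TOY_MAX_PAGES; i++)
		if (toy_pool[i].used && toy_pool[i].type == type) {
			toy_used_bytes -= toy_pool[i].cost;
			toy_pool[i].used = 0;
		}
}

/* "store": returns 0 on success, -1 if the backend rejects the page. */
int toy_store(unsigned type, unsigned long offset,
	      const unsigned char *page, size_t cost)
{
	assert(page != NULL && cost <= TOY_PAGE_SIZE);

	/*
	 * Duplicate store: drop the old copy first, so that a rejection
	 * below can never leave stale data behind.
	 */
	toy_invalidate_page(type, offset);

	if (toy_used_bytes + cost > TOY_BUDGET)
		return -1;

	for (int i = 0; i < TOY_MAX_PAGES; i++) {
		if (toy_pool[i].used)
			continue;
		toy_pool[i].used = 1;
		toy_pool[i].type = type;
		toy_pool[i].offset = offset;
		toy_pool[i].cost = cost;
		memcpy(toy_pool[i].data, page, TOY_PAGE_SIZE);
		toy_used_bytes += cost;
		return 0;
	}
	return -1; /* no free slot */
}

/* "load": copies the page out if found, but does NOT remove it. */
int toy_load(unsigned type, unsigned long offset, unsigned char *page)
{
	struct toy_entry *e = toy_find(type, offset);

	if (!e)
		return -1;
	memcpy(page, e->data, TOY_PAGE_SIZE);
	return 0;
}
```

With this budget, a page stored at a cost of 1K next to a 4K page can be overwritten by another 1K store, but an attempt to overwrite it with 4K of incompressible data is rejected, and the old copy is gone afterwards. That mirrors the 4K/1K example in the answer to question 5: the caller then writes the new data to the real swap device, and a later load from the model cannot return the stale page.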