Diffstat (limited to 'Documentation/filesystems/proc.txt')
-rw-r--r-- | Documentation/filesystems/proc.txt | 95 |
1 files changed, 78 insertions, 17 deletions
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 0b1b0c008613..e2799b5fafea 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1315,13 +1315,28 @@ for writeout by the pdflush daemons. It is expressed in 100'ths of a second.
1315 | Data which has been dirty in-memory for longer than this interval will be | 1315 | Data which has been dirty in-memory for longer than this interval will be |
1316 | written out next time a pdflush daemon wakes up. | 1316 | written out next time a pdflush daemon wakes up. |
1317 | 1317 | ||
1318 | highmem_is_dirtyable | ||
1319 | -------------------- | ||
1320 | |||
1321 | Only present if CONFIG_HIGHMEM is set. | ||
1322 | |||
1323 | This defaults to 0 (false), meaning that the ratios set above are calculated | ||
1324 | as a percentage of lowmem only. This protects against excessive scanning | ||
1325 | in page reclaim, swapping and general VM distress. | ||
1326 | |||
1327 | Setting this to 1 can be useful on 32 bit machines where you want to make | ||
1328 | random changes within an MMAPed file that is larger than your available | ||
1329 | lowmem without causing large quantities of random IO. It is safe if the | ||
1330 | behavior of all programs running on the machine is known and memory will | ||
1331 | not be otherwise stressed. | ||
1332 | |||
1318 | legacy_va_layout | 1333 | legacy_va_layout |
1319 | ---------------- | 1334 | ---------------- |
1320 | 1335 | ||
1321 | If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel | 1336 | If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel |
1322 | will use the legacy (2.4) layout for all processes. | 1337 | will use the legacy (2.4) layout for all processes. |
1323 | 1338 | ||
1324 | lower_zone_protection | 1339 | lowmem_reserve_ratio |
1325 | --------------------- | 1340 | --------------------- |
1326 | 1341 | ||
1327 | For some specialised workloads on highmem machines it is dangerous for | 1342 | For some specialised workloads on highmem machines it is dangerous for |
@@ -1341,25 +1356,71 @@ captured into pinned user memory.
1341 | mechanism will also defend that region from allocations which could use | 1356 | mechanism will also defend that region from allocations which could use |
1342 | highmem or lowmem). | 1357 | highmem or lowmem). |
1343 | 1358 | ||
1344 | The `lower_zone_protection' tunable determines how aggressive the kernel is | 1359 | The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is |
1345 | in defending these lower zones. The default value is zero - no | 1360 | in defending these lower zones. |
1346 | protection at all. | ||
1347 | 1361 | ||
1348 | If you have a machine which uses highmem or ISA DMA and your | 1362 | If you have a machine which uses highmem or ISA DMA and your |
1349 | applications are using mlock(), or if you are running with no swap then | 1363 | applications are using mlock(), or if you are running with no swap then |
1350 | you probably should increase the lower_zone_protection setting. | 1364 | you probably should change the lowmem_reserve_ratio setting. |
1351 | 1365 | ||
1352 | The units of this tunable are fairly vague. It is approximately equal | 1366 | The lowmem_reserve_ratio is an array. You can see its values by reading this file. |
1353 | to "megabytes," so setting lower_zone_protection=100 will protect around 100 | 1367 | - |
1354 | megabytes of the lowmem zone from user allocations. It will also make | 1368 | % cat /proc/sys/vm/lowmem_reserve_ratio |
1355 | those 100 megabytes unavailable for use by applications and by | 1369 | 256 256 32 |
1356 | pagecache, so there is a cost. | 1370 | - |
1357 | 1371 | Note: the number of elements is one fewer than the number of zones, because the |
1358 | The effects of this tunable may be observed by monitoring | 1372 | highest zone's value is not needed for the following calculation. |
1359 | /proc/meminfo:LowFree. Write a single huge file and observe the point | 1373 | |
1360 | at which LowFree ceases to fall. | 1374 | But these values are not used directly. The kernel calculates the number of |
1361 | 1375 | protection pages for each zone from them. These are shown as an array of |
1362 | A reasonable value for lower_zone_protection is 100. | 1376 | protection pages in /proc/zoneinfo, as follows (this example is from an x86-64 box). |
1377 | Each zone has an array of protection pages like this. | ||
1378 | |||
1379 | - | ||
1380 | Node 0, zone DMA | ||
1381 | pages free 1355 | ||
1382 | min 3 | ||
1383 | low 3 | ||
1384 | high 4 | ||
1385 | : | ||
1386 | : | ||
1387 | numa_other 0 | ||
1388 | protection: (0, 2004, 2004, 2004) | ||
1389 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
1390 | pagesets | ||
1391 | cpu: 0 pcp: 0 | ||
1392 | : | ||
1393 | - | ||
1394 | These protection values are added to the score used to judge whether this zone | ||
1395 | should be used for page allocation or should be reclaimed. | ||
1396 | |||
1397 | In this example, if normal pages (index=2) are requested from this DMA zone and | ||
1398 | pages_high is used as the watermark, the kernel judges that this zone should not | ||
1399 | be used because pages_free (1355) is smaller than watermark + protection[2] | ||
1400 | (4 + 2004 = 2008). If this protection value were 0, this zone would be used for | ||
1401 | the normal page request. If the request is for the DMA zone (index=0), | ||
1402 | protection[0] (=0) is used. | ||
1403 | |||
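The check described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the example, not the kernel's allocator code: the function name `zone_usable` and its parameters are hypothetical, and the exact comparison the kernel performs is simplified here to "free pages must exceed watermark + protection".

```python
def zone_usable(pages_free, watermark, protection, request_index):
    """Return True if the zone may serve a request of the given zone index.

    Simplified model (an assumption for illustration): the zone is usable
    only when its free pages exceed watermark + protection[request_index].
    """
    return pages_free > watermark + protection[request_index]


# Values taken from the /proc/zoneinfo example for the DMA zone.
dma_protection = (0, 2004, 2004, 2004)
pages_free = 1355
pages_high = 4

# Request for normal pages (index=2): 1355 is not above 4 + 2004 = 2008.
print(zone_usable(pages_free, pages_high, dma_protection, 2))  # False

# Request for DMA pages (index=0): protection[0] is 0, so 1355 > 4.
print(zone_usable(pages_free, pages_high, dma_protection, 0))  # True
```
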
1404 | zone[i]'s protection[j] is calculated by the following expression. | ||
1405 | |||
1406 | (i < j): | ||
1407 | zone[i]->protection[j] | ||
1408 | = (total sum of present_pages from zone[i+1] to zone[j] on the node) | ||
1409 | / lowmem_reserve_ratio[i]; | ||
1410 | (i = j): | ||
1411 | (should not be protected; = 0) | ||
1412 | (i > j): | ||
1413 | (not necessary, but appears as 0) | ||
1414 | |||
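As a rough illustration of the expression above, the following Python sketch builds the whole protection table for one node. The function name `protection_table`, the sample `present_pages` figures, and the use of integer division are assumptions made for this example; they are not taken from the patch.

```python
def protection_table(present_pages, ratios):
    """Compute protection[i][j] for every pair of zones on one node.

    present_pages: pages present in each zone, lowest zone first.
    ratios: lowmem_reserve_ratio values, one fewer than the zone count.
    Integer division is assumed here for illustration.
    """
    n = len(present_pages)
    table = []
    for i in range(n):
        row = []
        for j in range(n):
            if i < j:
                # Sum of present pages in zone[i+1] .. zone[j], inclusive.
                higher = sum(present_pages[i + 1:j + 1])
                row.append(higher // ratios[i])
            else:
                # i == j: not protected; i > j: not needed, shown as 0.
                row.append(0)
        table.append(row)
    return table


# Hypothetical three-zone node with the default ratio of 256 for the
# two lower zones.
for row in protection_table([4096, 262144, 786432], [256, 256]):
    print(row)
# [0, 1024, 4096]
# [0, 0, 3072]
# [0, 0, 0]
```

The highest zone never has a zone above it, so its ratio is never read, which matches the note that the array has one fewer element than there are zones.
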
1415 | The default values of lowmem_reserve_ratio[i] are | ||
1416 | 256 (if zone[i] means DMA or DMA32 zone) | ||
1417 | 32 (others). | ||
1418 | As the above expression shows, they are the reciprocals of the ratios. | ||
1419 | 256 means 1/256. The number of protection pages becomes about "0.39%" of the total | ||
1420 | present pages of higher zones on the node. | ||
1421 | |||
1422 | If you would like to protect more pages, smaller values are effective. | ||
1423 | The minimum value is 1 (1/1 -> 100%). | ||
1363 | 1424 | ||
1364 | page-cluster | 1425 | page-cluster |
1365 | ------------ | 1426 | ------------ |