diff options
author | Peter W Morreale <pmorreale@novell.com> | 2009-01-15 16:50:42 -0500 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2009-01-15 19:39:35 -0500 |
commit | db0fb1848a645b0b1b033765f3a5244e7afd2e3c (patch) | |
tree | cf6b63e52fad2fa626e2a08251815f07626682dd /Documentation/filesystems | |
parent | b5db0e38653bfada34a92f360b4111566ede3842 (diff) |
Update of Documentation: vm.txt and proc.txt
Update Documentation/sysctl/vm.txt and Documentation/filesystems/proc.txt.
More specifically, the section on /proc/sys/vm in
Documentation/filesystems/proc.txt was removed and a link to
Documentation/sysctl/vm.txt added.
Most of the verbiage from proc.txt was simply moved in vm.txt, with new
addtional text for "swappiness" and "stat_interval".
Signed-off-by: Peter W Morreale <pmorreale@novell.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/proc.txt | 288 |
1 files changed, 2 insertions, 286 deletions
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index d105eb45282a..bbebc3a43ac0 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt | |||
@@ -1371,292 +1371,8 @@ auto_msgmni default value is 1. | |||
1371 | 2.4 /proc/sys/vm - The virtual memory subsystem | 1371 | 2.4 /proc/sys/vm - The virtual memory subsystem |
1372 | ----------------------------------------------- | 1372 | ----------------------------------------------- |
1373 | 1373 | ||
1374 | The files in this directory can be used to tune the operation of the virtual | 1374 | Please see: Documentation/sysctls/vm.txt for a description of these |
1375 | memory (VM) subsystem of the Linux kernel. | 1375 | entries. |
1376 | |||
1377 | vfs_cache_pressure | ||
1378 | ------------------ | ||
1379 | |||
1380 | Controls the tendency of the kernel to reclaim the memory which is used for | ||
1381 | caching of directory and inode objects. | ||
1382 | |||
1383 | At the default value of vfs_cache_pressure=100 the kernel will attempt to | ||
1384 | reclaim dentries and inodes at a "fair" rate with respect to pagecache and | ||
1385 | swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer | ||
1386 | to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100 | ||
1387 | causes the kernel to prefer to reclaim dentries and inodes. | ||
1388 | |||
1389 | dirty_background_bytes | ||
1390 | ---------------------- | ||
1391 | |||
1392 | Contains the amount of dirty memory at which the pdflush background writeback | ||
1393 | daemon will start writeback. | ||
1394 | |||
1395 | If dirty_background_bytes is written, dirty_background_ratio becomes a function | ||
1396 | of its value (dirty_background_bytes / the amount of dirtyable system memory). | ||
1397 | |||
1398 | dirty_background_ratio | ||
1399 | ---------------------- | ||
1400 | |||
1401 | Contains, as a percentage of the dirtyable system memory (free pages + mapped | ||
1402 | pages + file cache, not including locked pages and HugePages), the number of | ||
1403 | pages at which the pdflush background writeback daemon will start writing out | ||
1404 | dirty data. | ||
1405 | |||
1406 | If dirty_background_ratio is written, dirty_background_bytes becomes a function | ||
1407 | of its value (dirty_background_ratio * the amount of dirtyable system memory). | ||
1408 | |||
1409 | dirty_bytes | ||
1410 | ----------- | ||
1411 | |||
1412 | Contains the amount of dirty memory at which a process generating disk writes | ||
1413 | will itself start writeback. | ||
1414 | |||
1415 | If dirty_bytes is written, dirty_ratio becomes a function of its value | ||
1416 | (dirty_bytes / the amount of dirtyable system memory). | ||
1417 | |||
1418 | dirty_ratio | ||
1419 | ----------- | ||
1420 | |||
1421 | Contains, as a percentage of the dirtyable system memory (free pages + mapped | ||
1422 | pages + file cache, not including locked pages and HugePages), the number of | ||
1423 | pages at which a process which is generating disk writes will itself start | ||
1424 | writing out dirty data. | ||
1425 | |||
1426 | If dirty_ratio is written, dirty_bytes becomes a function of its value | ||
1427 | (dirty_ratio * the amount of dirtyable system memory). | ||
1428 | |||
1429 | dirty_writeback_centisecs | ||
1430 | ------------------------- | ||
1431 | |||
1432 | The pdflush writeback daemons will periodically wake up and write `old' data | ||
1433 | out to disk. This tunable expresses the interval between those wakeups, in | ||
1434 | 100'ths of a second. | ||
1435 | |||
1436 | Setting this to zero disables periodic writeback altogether. | ||
1437 | |||
1438 | dirty_expire_centisecs | ||
1439 | ---------------------- | ||
1440 | |||
1441 | This tunable is used to define when dirty data is old enough to be eligible | ||
1442 | for writeout by the pdflush daemons. It is expressed in 100'ths of a second. | ||
1443 | Data which has been dirty in-memory for longer than this interval will be | ||
1444 | written out next time a pdflush daemon wakes up. | ||
1445 | |||
1446 | highmem_is_dirtyable | ||
1447 | -------------------- | ||
1448 | |||
1449 | Only present if CONFIG_HIGHMEM is set. | ||
1450 | |||
1451 | This defaults to 0 (false), meaning that the ratios set above are calculated | ||
1452 | as a percentage of lowmem only. This protects against excessive scanning | ||
1453 | in page reclaim, swapping and general VM distress. | ||
1454 | |||
1455 | Setting this to 1 can be useful on 32 bit machines where you want to make | ||
1456 | random changes within an MMAPed file that is larger than your available | ||
1457 | lowmem without causing large quantities of random IO. Is is safe if the | ||
1458 | behavior of all programs running on the machine is known and memory will | ||
1459 | not be otherwise stressed. | ||
1460 | |||
1461 | legacy_va_layout | ||
1462 | ---------------- | ||
1463 | |||
1464 | If non-zero, this sysctl disables the new 32-bit mmap mmap layout - the kernel | ||
1465 | will use the legacy (2.4) layout for all processes. | ||
1466 | |||
1467 | lowmem_reserve_ratio | ||
1468 | --------------------- | ||
1469 | |||
1470 | For some specialised workloads on highmem machines it is dangerous for | ||
1471 | the kernel to allow process memory to be allocated from the "lowmem" | ||
1472 | zone. This is because that memory could then be pinned via the mlock() | ||
1473 | system call, or by unavailability of swapspace. | ||
1474 | |||
1475 | And on large highmem machines this lack of reclaimable lowmem memory | ||
1476 | can be fatal. | ||
1477 | |||
1478 | So the Linux page allocator has a mechanism which prevents allocations | ||
1479 | which _could_ use highmem from using too much lowmem. This means that | ||
1480 | a certain amount of lowmem is defended from the possibility of being | ||
1481 | captured into pinned user memory. | ||
1482 | |||
1483 | (The same argument applies to the old 16 megabyte ISA DMA region. This | ||
1484 | mechanism will also defend that region from allocations which could use | ||
1485 | highmem or lowmem). | ||
1486 | |||
1487 | The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is | ||
1488 | in defending these lower zones. | ||
1489 | |||
1490 | If you have a machine which uses highmem or ISA DMA and your | ||
1491 | applications are using mlock(), or if you are running with no swap then | ||
1492 | you probably should change the lowmem_reserve_ratio setting. | ||
1493 | |||
1494 | The lowmem_reserve_ratio is an array. You can see them by reading this file. | ||
1495 | - | ||
1496 | % cat /proc/sys/vm/lowmem_reserve_ratio | ||
1497 | 256 256 32 | ||
1498 | - | ||
1499 | Note: # of this elements is one fewer than number of zones. Because the highest | ||
1500 | zone's value is not necessary for following calculation. | ||
1501 | |||
1502 | But, these values are not used directly. The kernel calculates # of protection | ||
1503 | pages for each zones from them. These are shown as array of protection pages | ||
1504 | in /proc/zoneinfo like followings. (This is an example of x86-64 box). | ||
1505 | Each zone has an array of protection pages like this. | ||
1506 | |||
1507 | - | ||
1508 | Node 0, zone DMA | ||
1509 | pages free 1355 | ||
1510 | min 3 | ||
1511 | low 3 | ||
1512 | high 4 | ||
1513 | : | ||
1514 | : | ||
1515 | numa_other 0 | ||
1516 | protection: (0, 2004, 2004, 2004) | ||
1517 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
1518 | pagesets | ||
1519 | cpu: 0 pcp: 0 | ||
1520 | : | ||
1521 | - | ||
1522 | These protections are added to score to judge whether this zone should be used | ||
1523 | for page allocation or should be reclaimed. | ||
1524 | |||
1525 | In this example, if normal pages (index=2) are required to this DMA zone and | ||
1526 | pages_high is used for watermark, the kernel judges this zone should not be | ||
1527 | used because pages_free(1355) is smaller than watermark + protection[2] | ||
1528 | (4 + 2004 = 2008). If this protection value is 0, this zone would be used for | ||
1529 | normal page requirement. If requirement is DMA zone(index=0), protection[0] | ||
1530 | (=0) is used. | ||
1531 | |||
1532 | zone[i]'s protection[j] is calculated by following expression. | ||
1533 | |||
1534 | (i < j): | ||
1535 | zone[i]->protection[j] | ||
1536 | = (total sums of present_pages from zone[i+1] to zone[j] on the node) | ||
1537 | / lowmem_reserve_ratio[i]; | ||
1538 | (i = j): | ||
1539 | (should not be protected. = 0; | ||
1540 | (i > j): | ||
1541 | (not necessary, but looks 0) | ||
1542 | |||
1543 | The default values of lowmem_reserve_ratio[i] are | ||
1544 | 256 (if zone[i] means DMA or DMA32 zone) | ||
1545 | 32 (others). | ||
1546 | As above expression, they are reciprocal number of ratio. | ||
1547 | 256 means 1/256. # of protection pages becomes about "0.39%" of total present | ||
1548 | pages of higher zones on the node. | ||
1549 | |||
1550 | If you would like to protect more pages, smaller values are effective. | ||
1551 | The minimum value is 1 (1/1 -> 100%). | ||
1552 | |||
1553 | page-cluster | ||
1554 | ------------ | ||
1555 | |||
1556 | page-cluster controls the number of pages which are written to swap in | ||
1557 | a single attempt. The swap I/O size. | ||
1558 | |||
1559 | It is a logarithmic value - setting it to zero means "1 page", setting | ||
1560 | it to 1 means "2 pages", setting it to 2 means "4 pages", etc. | ||
1561 | |||
1562 | The default value is three (eight pages at a time). There may be some | ||
1563 | small benefits in tuning this to a different value if your workload is | ||
1564 | swap-intensive. | ||
1565 | |||
1566 | overcommit_memory | ||
1567 | ----------------- | ||
1568 | |||
1569 | Controls overcommit of system memory, possibly allowing processes | ||
1570 | to allocate (but not use) more memory than is actually available. | ||
1571 | |||
1572 | |||
1573 | 0 - Heuristic overcommit handling. Obvious overcommits of | ||
1574 | address space are refused. Used for a typical system. It | ||
1575 | ensures a seriously wild allocation fails while allowing | ||
1576 | overcommit to reduce swap usage. root is allowed to | ||
1577 | allocate slightly more memory in this mode. This is the | ||
1578 | default. | ||
1579 | |||
1580 | 1 - Always overcommit. Appropriate for some scientific | ||
1581 | applications. | ||
1582 | |||
1583 | 2 - Don't overcommit. The total address space commit | ||
1584 | for the system is not permitted to exceed swap plus a | ||
1585 | configurable percentage (default is 50) of physical RAM. | ||
1586 | Depending on the percentage you use, in most situations | ||
1587 | this means a process will not be killed while attempting | ||
1588 | to use already-allocated memory but will receive errors | ||
1589 | on memory allocation as appropriate. | ||
1590 | |||
1591 | overcommit_ratio | ||
1592 | ---------------- | ||
1593 | |||
1594 | Percentage of physical memory size to include in overcommit calculations | ||
1595 | (see above.) | ||
1596 | |||
1597 | Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100) | ||
1598 | |||
1599 | swapspace = total size of all swap areas | ||
1600 | physmem = size of physical memory in system | ||
1601 | |||
1602 | nr_hugepages and hugetlb_shm_group | ||
1603 | ---------------------------------- | ||
1604 | |||
1605 | nr_hugepages configures number of hugetlb page reserved for the system. | ||
1606 | |||
1607 | hugetlb_shm_group contains group id that is allowed to create SysV shared | ||
1608 | memory segment using hugetlb page. | ||
1609 | |||
1610 | hugepages_treat_as_movable | ||
1611 | -------------------------- | ||
1612 | |||
1613 | This parameter is only useful when kernelcore= is specified at boot time to | ||
1614 | create ZONE_MOVABLE for pages that may be reclaimed or migrated. Huge pages | ||
1615 | are not movable so are not normally allocated from ZONE_MOVABLE. A non-zero | ||
1616 | value written to hugepages_treat_as_movable allows huge pages to be allocated | ||
1617 | from ZONE_MOVABLE. | ||
1618 | |||
1619 | Once enabled, the ZONE_MOVABLE is treated as an area of memory the huge | ||
1620 | pages pool can easily grow or shrink within. Assuming that applications are | ||
1621 | not running that mlock() a lot of memory, it is likely the huge pages pool | ||
1622 | can grow to the size of ZONE_MOVABLE by repeatedly entering the desired value | ||
1623 | into nr_hugepages and triggering page reclaim. | ||
1624 | |||
1625 | laptop_mode | ||
1626 | ----------- | ||
1627 | |||
1628 | laptop_mode is a knob that controls "laptop mode". All the things that are | ||
1629 | controlled by this knob are discussed in Documentation/laptops/laptop-mode.txt. | ||
1630 | |||
1631 | block_dump | ||
1632 | ---------- | ||
1633 | |||
1634 | block_dump enables block I/O debugging when set to a nonzero value. More | ||
1635 | information on block I/O debugging is in Documentation/laptops/laptop-mode.txt. | ||
1636 | |||
1637 | swap_token_timeout | ||
1638 | ------------------ | ||
1639 | |||
1640 | This file contains valid hold time of swap out protection token. The Linux | ||
1641 | VM has token based thrashing control mechanism and uses the token to prevent | ||
1642 | unnecessary page faults in thrashing situation. The unit of the value is | ||
1643 | second. The value would be useful to tune thrashing behavior. | ||
1644 | |||
1645 | drop_caches | ||
1646 | ----------- | ||
1647 | |||
1648 | Writing to this will cause the kernel to drop clean caches, dentries and | ||
1649 | inodes from memory, causing that memory to become free. | ||
1650 | |||
1651 | To free pagecache: | ||
1652 | echo 1 > /proc/sys/vm/drop_caches | ||
1653 | To free dentries and inodes: | ||
1654 | echo 2 > /proc/sys/vm/drop_caches | ||
1655 | To free pagecache, dentries and inodes: | ||
1656 | echo 3 > /proc/sys/vm/drop_caches | ||
1657 | |||
1658 | As this is a non-destructive operation and dirty objects are not freeable, the | ||
1659 | user should run `sync' first. | ||
1660 | 1376 | ||
1661 | 1377 | ||
1662 | 2.5 /proc/sys/dev - Device specific parameters | 1378 | 2.5 /proc/sys/dev - Device specific parameters |