author | Linus Torvalds <torvalds@linux-foundation.org> | 2015-04-15 19:39:15 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2015-04-15 19:39:15 -0400 |
commit | eea3a00264cf243a28e4331566ce67b86059339d (patch) | |
tree | 487f16389e0dfa32e9caa7604d1274a7dcda8f04 /Documentation | |
parent | e7c82412433a8039616c7314533a0a1c025d99bf (diff) | |
parent | e693d73c20ffdb06840c9378f367bad849ac0d5d (diff) |
Merge branch 'akpm' (patches from Andrew)
Merge second patchbomb from Andrew Morton:
- the rest of MM
- various misc bits
- add ability to run /sbin/reboot at reboot time
- printk/vsprintf changes
- fiddle with seq_printf() return value
* akpm: (114 commits)
parisc: remove use of seq_printf return value
lru_cache: remove use of seq_printf return value
tracing: remove use of seq_printf return value
cgroup: remove use of seq_printf return value
proc: remove use of seq_printf return value
s390: remove use of seq_printf return value
cris fasttimer: remove use of seq_printf return value
cris: remove use of seq_printf return value
openrisc: remove use of seq_printf return value
ARM: plat-pxa: remove use of seq_printf return value
nios2: cpuinfo: remove use of seq_printf return value
microblaze: mb: remove use of seq_printf return value
ipc: remove use of seq_printf return value
rtc: remove use of seq_printf return value
power: wakeup: remove use of seq_printf return value
x86: mtrr: if: remove use of seq_printf return value
linux/bitmap.h: improve BITMAP_{LAST,FIRST}_WORD_MASK
MAINTAINERS: CREDITS: remove Stefano Brivio from B43
.mailmap: add Ricardo Ribalda
CREDITS: add Ricardo Ribalda Delgado
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/obsolete/sysfs-block-zram | 119 |
-rw-r--r-- | Documentation/ABI/testing/sysfs-block-zram | 25 |
-rw-r--r-- | Documentation/blockdev/zram.txt | 87 |
-rw-r--r-- | Documentation/filesystems/Locking | 8 |
-rw-r--r-- | Documentation/printk-formats.txt | 49 |
-rw-r--r-- | Documentation/sysctl/vm.txt | 11 |
-rw-r--r-- | Documentation/vm/hugetlbpage.txt | 55 |
-rw-r--r-- | Documentation/vm/unevictable-lru.txt | 12 |
-rw-r--r-- | Documentation/vm/zsmalloc.txt | 70 |
9 files changed, 393 insertions, 43 deletions
diff --git a/Documentation/ABI/obsolete/sysfs-block-zram b/Documentation/ABI/obsolete/sysfs-block-zram
new file mode 100644
index 000000000000..720ea92cfb2e
--- /dev/null
+++ b/Documentation/ABI/obsolete/sysfs-block-zram
@@ -0,0 +1,119 @@
1 | What: /sys/block/zram<id>/num_reads | ||
2 | Date: August 2015 | ||
3 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
4 | Description: | ||
5 | The num_reads file is read-only and specifies the number of | ||
6 | reads (failed or successful) done on this device. | ||
7 | Now accessible via zram<id>/stat node. | ||
8 | |||
9 | What: /sys/block/zram<id>/num_writes | ||
10 | Date: August 2015 | ||
11 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
12 | Description: | ||
13 | The num_writes file is read-only and specifies the number of | ||
14 | writes (failed or successful) done on this device. | ||
15 | Now accessible via zram<id>/stat node. | ||
16 | |||
17 | What: /sys/block/zram<id>/invalid_io | ||
18 | Date: August 2015 | ||
19 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
20 | Description: | ||
21 | The invalid_io file is read-only and specifies the number of | ||
22 | non-page-size-aligned I/O requests issued to this device. | ||
23 | Now accessible via zram<id>/io_stat node. | ||
24 | |||
25 | What: /sys/block/zram<id>/failed_reads | ||
26 | Date: August 2015 | ||
27 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
28 | Description: | ||
29 | The failed_reads file is read-only and specifies the number of | ||
30 | failed reads that happened on this device. | ||
31 | Now accessible via zram<id>/io_stat node. | ||
32 | |||
33 | What: /sys/block/zram<id>/failed_writes | ||
34 | Date: August 2015 | ||
35 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
36 | Description: | ||
37 | The failed_writes file is read-only and specifies the number of | ||
38 | failed writes that happened on this device. | ||
39 | Now accessible via zram<id>/io_stat node. | ||
40 | |||
41 | What: /sys/block/zram<id>/notify_free | ||
42 | Date: August 2015 | ||
43 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
44 | Description: | ||
45 | The notify_free file is read-only. Depending on device usage | ||
46 | scenario it may account a) the number of pages freed because | ||
47 | of swap slot free notifications or b) the number of pages freed | ||
48 | because of REQ_DISCARD requests sent by bio. The former ones | ||
49 | are sent to a swap block device when a swap slot is freed, which | ||
50 | implies that this disk is being used as a swap disk. The latter | ||
51 | ones are sent by filesystem mounted with discard option, | ||
52 | whenever some data blocks are getting discarded. | ||
53 | Now accessible via zram<id>/io_stat node. | ||
54 | |||
55 | What: /sys/block/zram<id>/zero_pages | ||
56 | Date: August 2015 | ||
57 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
58 | Description: | ||
59 | The zero_pages file is read-only and specifies the number of | ||
60 | zero-filled pages written to this disk. No memory is allocated for | ||
61 | such pages. | ||
62 | Now accessible via zram<id>/mm_stat node. | ||
63 | |||
64 | What: /sys/block/zram<id>/orig_data_size | ||
65 | Date: August 2015 | ||
66 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
67 | Description: | ||
68 | The orig_data_size file is read-only and specifies uncompressed | ||
69 | size of data stored in this disk. This excludes zero-filled | ||
70 | pages (zero_pages) since no memory is allocated for them. | ||
71 | Unit: bytes | ||
72 | Now accessible via zram<id>/mm_stat node. | ||
73 | |||
74 | What: /sys/block/zram<id>/compr_data_size | ||
75 | Date: August 2015 | ||
76 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
77 | Description: | ||
78 | The compr_data_size file is read-only and specifies compressed | ||
79 | size of data stored in this disk. So, compression ratio can be | ||
80 | calculated using orig_data_size and this statistic. | ||
81 | Unit: bytes | ||
82 | Now accessible via zram<id>/mm_stat node. | ||
83 | |||
84 | What: /sys/block/zram<id>/mem_used_total | ||
85 | Date: August 2015 | ||
86 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
87 | Description: | ||
88 | The mem_used_total file is read-only and specifies the amount | ||
89 | of memory, including allocator fragmentation and metadata | ||
90 | overhead, allocated for this disk. So, allocator space | ||
91 | efficiency can be calculated using compr_data_size and this | ||
92 | statistic. | ||
93 | Unit: bytes | ||
94 | Now accessible via zram<id>/mm_stat node. | ||
95 | |||
96 | What: /sys/block/zram<id>/mem_used_max | ||
97 | Date: August 2015 | ||
98 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
99 | Description: | ||
100 | The mem_used_max file is read/write and specifies the maximum | ||
101 | amount of memory zram has consumed to store compressed data. | ||
102 | To reset the value, write "0"; any other value is rejected | ||
103 | with -EINVAL. | ||
104 | Unit: bytes | ||
105 | Downgraded to write-only node: so it's possible to set new | ||
106 | value only; its current value is stored in zram<id>/mm_stat | ||
107 | node. | ||
108 | |||
109 | What: /sys/block/zram<id>/mem_limit | ||
110 | Date: August 2015 | ||
111 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
112 | Description: | ||
113 | The mem_limit file is read/write and specifies the maximum | ||
114 | amount of memory ZRAM can use to store the compressed data. | ||
115 | The limit could be changed in run time and "0" means disable | ||
116 | the limit. No limit is the initial state. Unit: bytes | ||
117 | Downgraded to write-only node: so it's possible to set new | ||
118 | value only; its current value is stored in zram<id>/mm_stat | ||
119 | node. | ||
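The descriptions above mention two derived figures: the compression ratio (from orig_data_size and compr_data_size) and the allocator space efficiency (from compr_data_size and mem_used_total). A minimal sketch of both calculations follows. The helper names are hypothetical and the sample numbers in the example function are made up for illustration, not measured on a real device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: uncompressed bytes per compressed byte.
 * Returns 0.0 when nothing has been stored yet. */
double zram_compression_ratio(uint64_t orig_data_size,
                              uint64_t compr_data_size)
{
    if (compr_data_size == 0)
        return 0.0;
    return (double)orig_data_size / (double)compr_data_size;
}

/* Hypothetical helper: fraction of the allocated memory that holds
 * compressed data. The remainder is allocator fragmentation and
 * metadata overhead. Returns 0.0 when nothing is allocated. */
double zram_allocator_efficiency(uint64_t compr_data_size,
                                 uint64_t mem_used_total)
{
    if (mem_used_total == 0)
        return 0.0;
    return (double)compr_data_size / (double)mem_used_total;
}

/* Usage example with invented values (all in bytes). */
void zram_ratio_example(void)
{
    assert(zram_compression_ratio(4096, 1024) == 4.0);
    assert(zram_allocator_efficiency(1024, 2048) == 0.5);
}
```

Both helpers guard against a zero divisor, which is what a freshly initialized device with no stored data would report.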
diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
index a6148eaf91e5..2e69e83bf510 100644
--- a/Documentation/ABI/testing/sysfs-block-zram
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -141,3 +141,28 @@ Description:
141 | amount of memory ZRAM can use to store the compressed data. The | 141 | amount of memory ZRAM can use to store the compressed data. The |
142 | limit could be changed in run time and "0" means disable the | 142 | limit could be changed in run time and "0" means disable the |
143 | limit. No limit is the initial state. Unit: bytes | 143 | limit. No limit is the initial state. Unit: bytes |
144 | |||
145 | What: /sys/block/zram<id>/compact | ||
146 | Date: August 2015 | ||
147 | Contact: Minchan Kim <minchan@kernel.org> | ||
148 | Description: | ||
149 | The compact file is write-only and triggers compaction for | ||
150 | the allocator zram uses. The allocator moves some objects so | ||
151 | that it can free fragmented space. | ||
152 | |||
153 | What: /sys/block/zram<id>/io_stat | ||
154 | Date: August 2015 | ||
155 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
156 | Description: | ||
157 | The io_stat file is read-only and accumulates device's I/O | ||
158 | statistics not accounted by block layer. For example, | ||
159 | failed_reads, failed_writes, etc. File format is similar to | ||
160 | block layer statistics file format. | ||
161 | |||
162 | What: /sys/block/zram<id>/mm_stat | ||
163 | Date: August 2015 | ||
164 | Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> | ||
165 | Description: | ||
166 | The mm_stat file is read-only and represents device's mm | ||
167 | statistics (orig_data_size, compr_data_size, etc.) in a format | ||
168 | similar to block layer statistics file format. | ||
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 7fcf9c6592ec..48a183e29988 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -98,20 +98,79 @@ size of the disk when not in use so a huge zram is wasteful.
98 | mount /dev/zram1 /tmp | 98 | mount /dev/zram1 /tmp |
99 | 99 | ||
100 | 7) Stats: | 100 | 7) Stats: |
101 | Per-device statistics are exported as various nodes under | 101 | Per-device statistics are exported as various nodes under /sys/block/zram<id>/ |
102 | /sys/block/zram<id>/ | 102 | |
103 | disksize | 103 | A brief description of exported device attributes. For more details please |
104 | num_reads | 104 | read Documentation/ABI/testing/sysfs-block-zram. |
105 | num_writes | 105 | |
106 | failed_reads | 106 | Name access description |
107 | failed_writes | 107 | ---- ------ ----------- |
108 | invalid_io | 108 | disksize RW show and set the device's disk size |
109 | notify_free | 109 | initstate RO shows the initialization state of the device |
110 | zero_pages | 110 | reset WO trigger device reset |
111 | orig_data_size | 111 | num_reads RO the number of reads |
112 | compr_data_size | 112 | failed_reads RO the number of failed reads |
113 | mem_used_total | 113 | num_writes RO the number of writes |
114 | mem_used_max | 114 | failed_writes RO the number of failed writes |
115 | invalid_io RO the number of non-page-size-aligned I/O requests | ||
116 | max_comp_streams RW the number of possible concurrent compress operations | ||
117 | comp_algorithm RW show and change the compression algorithm | ||
118 | notify_free RO the number of notifications to free pages (either | ||
119 | slot free notifications or REQ_DISCARD requests) | ||
120 | zero_pages RO the number of zero filled pages written to this disk | ||
121 | orig_data_size RO uncompressed size of data stored in this disk | ||
122 | compr_data_size RO compressed size of data stored in this disk | ||
123 | mem_used_total RO the amount of memory allocated for this disk | ||
124 | mem_used_max RW the maximum amount of memory zram has consumed to | ||
125 | store compressed data | ||
126 | mem_limit RW the maximum amount of memory ZRAM can use to store | ||
127 | the compressed data | ||
128 | num_migrated RO the number of objects migrated by compaction | ||
129 | |||
130 | |||
131 | WARNING | ||
132 | ======= | ||
133 | per-stat sysfs attributes are considered to be deprecated. | ||
134 | The basic strategy is: | ||
135 | -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11) | ||
136 | -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11) | ||
137 | |||
138 | The list of deprecated attributes can be found here: | ||
139 | Documentation/ABI/obsolete/sysfs-block-zram | ||
140 | |||
141 | Basically, every attribute that has its own read accessible sysfs node | ||
142 | (e.g. num_reads) *AND* is accessible via one of the stat files (zram<id>/stat | ||
143 | or zram<id>/io_stat or zram<id>/mm_stat) is considered to be deprecated. | ||
144 | |||
145 | User space is advised to use the following files to read the device statistics. | ||
146 | |||
147 | File /sys/block/zram<id>/stat | ||
148 | |||
149 | Represents block layer statistics. Read Documentation/block/stat.txt for | ||
150 | details. | ||
151 | |||
152 | File /sys/block/zram<id>/io_stat | ||
153 | |||
154 | The io_stat file represents device's I/O statistics not accounted by block | ||
155 | layer and, thus, not available in zram<id>/stat file. It consists of a | ||
156 | single line of text and contains the following stats separated by | ||
157 | whitespace: | ||
158 | failed_reads | ||
159 | failed_writes | ||
160 | invalid_io | ||
161 | notify_free | ||
162 | |||
163 | File /sys/block/zram<id>/mm_stat | ||
164 | |||
165 | The mm_stat file represents device's mm statistics. It consists of a single | ||
166 | line of text and contains the following stats separated by whitespace: | ||
167 | orig_data_size | ||
168 | compr_data_size | ||
169 | mem_used_total | ||
170 | mem_limit | ||
171 | mem_used_max | ||
172 | zero_pages | ||
173 | num_migrated | ||
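Since mm_stat is a single line of whitespace-separated numbers, user space can read it with one sscanf() call. The sketch below assumes the seven fields appear in exactly the order listed above and that each fits in 64 bits; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Field order follows the mm_stat list in the text above. */
struct zram_mm_stat {
    uint64_t orig_data_size;
    uint64_t compr_data_size;
    uint64_t mem_used_total;
    uint64_t mem_limit;
    uint64_t mem_used_max;
    uint64_t zero_pages;
    uint64_t num_migrated;
};

/* Parse one whitespace-separated mm_stat line.
 * Returns 0 on success, -1 if fewer than seven fields were found. */
int zram_parse_mm_stat(const char *line, struct zram_mm_stat *st)
{
    int n;

    assert(line != NULL && st != NULL);
    n = sscanf(line,
               "%" SCNu64 " %" SCNu64 " %" SCNu64 " %" SCNu64
               " %" SCNu64 " %" SCNu64 " %" SCNu64,
               &st->orig_data_size, &st->compr_data_size,
               &st->mem_used_total, &st->mem_limit,
               &st->mem_used_max, &st->zero_pages,
               &st->num_migrated);
    return n == 7 ? 0 : -1;
}
```

A caller would typically fgets() one line from /sys/block/zram0/mm_stat (zram0 is only an example device) and pass the buffer to zram_parse_mm_stat(). Parsing from a string rather than a file keeps the helper usable on machines without a zram device.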
115 | 174 | ||
116 | 8) Deactivate: | 175 | 8) Deactivate: |
117 | swapoff /dev/zram0 | 176 | swapoff /dev/zram0 |
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index c3cd6279e92e..7c3f187d48bf 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -523,6 +523,7 @@ prototypes:
523 | void (*close)(struct vm_area_struct*); | 523 | void (*close)(struct vm_area_struct*); |
524 | int (*fault)(struct vm_area_struct*, struct vm_fault *); | 524 | int (*fault)(struct vm_area_struct*, struct vm_fault *); |
525 | int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *); | 525 | int (*page_mkwrite)(struct vm_area_struct *, struct vm_fault *); |
526 | int (*pfn_mkwrite)(struct vm_area_struct *, struct vm_fault *); | ||
526 | int (*access)(struct vm_area_struct *, unsigned long, void*, int, int); | 527 | int (*access)(struct vm_area_struct *, unsigned long, void*, int, int); |
527 | 528 | ||
528 | locking rules: | 529 | locking rules: |
@@ -532,6 +533,7 @@ close: yes | |||
532 | fault: yes can return with page locked | 533 | fault: yes can return with page locked |
533 | map_pages: yes | 534 | map_pages: yes |
534 | page_mkwrite: yes can return with page locked | 535 | page_mkwrite: yes can return with page locked |
536 | pfn_mkwrite: yes | ||
535 | access: yes | 537 | access: yes |
536 | 538 | ||
537 | ->fault() is called when a previously not present pte is about | 539 | ->fault() is called when a previously not present pte is about |
@@ -558,6 +560,12 @@ the page has been truncated, the filesystem should not look up a new page | |||
558 | like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which | 560 | like the ->fault() handler, but simply return with VM_FAULT_NOPAGE, which |
559 | will cause the VM to retry the fault. | 561 | will cause the VM to retry the fault. |
560 | 562 | ||
563 | ->pfn_mkwrite() is the same as page_mkwrite but is called when the pte is | |
564 | VM_PFNMAP or VM_MIXEDMAP with a page-less entry. The expected return is | |
565 | VM_FAULT_NOPAGE or one of the VM_FAULT_ERROR types. The default behavior | |
566 | after this call is to make the pte read-write, unless pfn_mkwrite returns | ||
567 | an error. | ||
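The return-value rule described above can be modelled in user space as follows. This is only an illustration of the decision, not kernel code: the MODEL_* constants are stand-ins whose values are not claimed to match the kernel's, and every function name is hypothetical.

```c
#include <assert.h>

/* Stand-in flag values for this user-space model only. They are NOT
 * claimed to match the kernel's VM_FAULT_* definitions. */
enum {
    MODEL_VM_FAULT_OOM    = 1 << 0,
    MODEL_VM_FAULT_SIGBUS = 1 << 1,
    MODEL_VM_FAULT_NOPAGE = 1 << 8,
    MODEL_VM_FAULT_ERROR  = MODEL_VM_FAULT_OOM | MODEL_VM_FAULT_SIGBUS
};

/* Hypothetical handler: refuse the write when the backing store is
 * read-only, otherwise report success with NOPAGE. */
int model_pfn_mkwrite(int backing_is_readonly)
{
    return backing_is_readonly ? MODEL_VM_FAULT_SIGBUS
                               : MODEL_VM_FAULT_NOPAGE;
}

/* Returns 1 if the caller would go on to make the pte read-write
 * after the handler returned ret, 0 if the error is passed up. */
int model_pte_made_writable(int ret)
{
    return (ret & MODEL_VM_FAULT_ERROR) ? 0 : 1;
}

/* Usage example: the asserts document the expected outcomes. */
void model_pfn_mkwrite_example(void)
{
    assert(model_pte_made_writable(model_pfn_mkwrite(0)) == 1);
    assert(model_pte_made_writable(model_pfn_mkwrite(1)) == 0);
}
```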
568 | |||
561 | ->access() is called when get_user_pages() fails in | 569 | ->access() is called when get_user_pages() fails in |
562 | access_process_vm(), typically used to debug a process through | 570 | access_process_vm(), typically used to debug a process through |
563 | /proc/pid/mem or ptrace. This function is needed only for | 571 | /proc/pid/mem or ptrace. This function is needed only for |
diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt
index 5a615c14f75d..cb6a596072bb 100644
--- a/Documentation/printk-formats.txt
+++ b/Documentation/printk-formats.txt
@@ -8,6 +8,21 @@ If variable is of Type, use printk format specifier:
8 | unsigned long long %llu or %llx | 8 | unsigned long long %llu or %llx |
9 | size_t %zu or %zx | 9 | size_t %zu or %zx |
10 | ssize_t %zd or %zx | 10 | ssize_t %zd or %zx |
11 | s32 %d or %x | ||
12 | u32 %u or %x | ||
13 | s64 %lld or %llx | ||
14 | u64 %llu or %llx | ||
15 | |||
16 | If <type> is dependent on a config option for its size (e.g., sector_t, | ||
17 | blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a | ||
18 | format specifier of its largest possible type and explicitly cast to it. | ||
19 | Example: | ||
20 | |||
21 | printk("test: sector number/total blocks: %llu/%llu\n", | ||
22 | (unsigned long long)sector, (unsigned long long)blockcount); | ||
23 | |||
24 | Reminder: sizeof() result is of type size_t. | ||
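The cast-to-widest-type rule can be tried outside the kernel. In the sketch below snprintf() stands in for printk(), and demo_sector_t and demo_blkcnt_t are hypothetical stand-ins for types whose width depends on the configuration; none of these names come from the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins: the real sector_t and blkcnt_t may be
 * 32 or 64 bits wide, so their exact type is not assumed here. */
typedef unsigned long demo_sector_t;
typedef unsigned long demo_blkcnt_t;

/* Format into buf using the widest possible type plus explicit
 * casts, as the text recommends. Returns what snprintf() returns. */
int demo_format_sectors(char *buf, size_t len,
                        demo_sector_t sector, demo_blkcnt_t blockcount)
{
    return snprintf(buf, len,
                    "test: sector number/total blocks: %llu/%llu\n",
                    (unsigned long long)sector,
                    (unsigned long long)blockcount);
}

/* Usage example: sizeof() yields a size_t, which is what the len
 * parameter expects. */
void demo_format_sectors_example(void)
{
    char buf[64];

    demo_format_sectors(buf, sizeof(buf), 5, 10);
    assert(strcmp(buf, "test: sector number/total blocks: 5/10\n") == 0);
}
```

Because the cast is explicit, the format string stays correct whether the stand-in types are later changed to 32-bit or 64-bit integers.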
25 | |||
11 | 26 | ||
12 | Raw pointer value SHOULD be printed with %p. The kernel supports | 27 | Raw pointer value SHOULD be printed with %p. The kernel supports |
13 | the following extended format specifiers for pointer types: | 28 | the following extended format specifiers for pointer types: |
@@ -54,6 +69,7 @@ Struct Resources: | |||
54 | 69 | ||
55 | For printing struct resources. The 'R' and 'r' specifiers result in a | 70 | For printing struct resources. The 'R' and 'r' specifiers result in a |
56 | printed resource with ('R') or without ('r') a decoded flags member. | 71 | printed resource with ('R') or without ('r') a decoded flags member. |
72 | Passed by reference. | ||
57 | 73 | ||
58 | Physical addresses types phys_addr_t: | 74 | Physical addresses types phys_addr_t: |
59 | 75 | ||
@@ -132,6 +148,8 @@ MAC/FDDI addresses: | |||
132 | specifier to use reversed byte order suitable for visual interpretation | 148 | specifier to use reversed byte order suitable for visual interpretation |
133 | of Bluetooth addresses which are in the little endian order. | 149 | of Bluetooth addresses which are in the little endian order. |
134 | 150 | ||
151 | Passed by reference. | ||
152 | |||
135 | IPv4 addresses: | 153 | IPv4 addresses: |
136 | 154 | ||
137 | %pI4 1.2.3.4 | 155 | %pI4 1.2.3.4 |
@@ -146,6 +164,8 @@ IPv4 addresses: | |||
146 | host, network, big or little endian order addresses respectively. Where | 164 | host, network, big or little endian order addresses respectively. Where |
147 | no specifier is provided the default network/big endian order is used. | 165 | no specifier is provided the default network/big endian order is used. |
148 | 166 | ||
167 | Passed by reference. | ||
168 | |||
149 | IPv6 addresses: | 169 | IPv6 addresses: |
150 | 170 | ||
151 | %pI6 0001:0002:0003:0004:0005:0006:0007:0008 | 171 | %pI6 0001:0002:0003:0004:0005:0006:0007:0008 |
@@ -160,6 +180,8 @@ IPv6 addresses: | |||
160 | print a compressed IPv6 address as described by | 180 | print a compressed IPv6 address as described by |
161 | http://tools.ietf.org/html/rfc5952 | 181 | http://tools.ietf.org/html/rfc5952 |
162 | 182 | ||
183 | Passed by reference. | ||
184 | |||
163 | IPv4/IPv6 addresses (generic, with port, flowinfo, scope): | 185 | IPv4/IPv6 addresses (generic, with port, flowinfo, scope): |
164 | 186 | ||
165 | %pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008 | 187 | %pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008 |
@@ -186,6 +208,8 @@ IPv4/IPv6 addresses (generic, with port, flowinfo, scope): | |||
186 | specifiers can be used as well and are ignored in case of an IPv6 | 208 | specifiers can be used as well and are ignored in case of an IPv6 |
187 | address. | 209 | address. |
188 | 210 | ||
211 | Passed by reference. | ||
212 | |||
189 | Further examples: | 213 | Further examples: |
190 | 214 | ||
191 | %pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789 | 215 | %pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789 |
@@ -207,6 +231,8 @@ UUID/GUID addresses: | |||
207 | Where no additional specifiers are used the default little endian | 231 | Where no additional specifiers are used the default little endian |
208 | order with lower case hex characters will be printed. | 232 | order with lower case hex characters will be printed. |
209 | 233 | ||
234 | Passed by reference. | ||
235 | |||
210 | dentry names: | 236 | dentry names: |
211 | %pd{,2,3,4} | 237 | %pd{,2,3,4} |
212 | %pD{,2,3,4} | 238 | %pD{,2,3,4} |
@@ -216,6 +242,8 @@ dentry names: | |||
216 | equivalent of %s dentry->d_name.name we used to use, %pd<n> prints | 242 | equivalent of %s dentry->d_name.name we used to use, %pd<n> prints |
217 | n last components. %pD does the same thing for struct file. | 243 | n last components. %pD does the same thing for struct file. |
218 | 244 | ||
245 | Passed by reference. | ||
246 | |||
219 | struct va_format: | 247 | struct va_format: |
220 | 248 | ||
221 | %pV | 249 | %pV |
@@ -231,23 +259,20 @@ struct va_format: | |||
231 | Do not use this feature without some mechanism to verify the | 259 | Do not use this feature without some mechanism to verify the |
232 | correctness of the format string and va_list arguments. | 260 | correctness of the format string and va_list arguments. |
233 | 261 | ||
234 | u64 SHOULD be printed with %llu/%llx: | 262 | Passed by reference. |
235 | |||
236 | printk("%llu", u64_var); | ||
237 | 263 | ||
238 | s64 SHOULD be printed with %lld/%llx: | 264 | struct clk: |
239 | 265 | ||
240 | printk("%lld", s64_var); | 266 | %pC pll1 |
267 | %pCn pll1 | ||
268 | %pCr 1560000000 | ||
241 | 269 | ||
242 | If <type> is dependent on a config option for its size (e.g., sector_t, | 270 | For printing struct clk structures. '%pC' and '%pCn' print the name |
243 | blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a | 271 | (Common Clock Framework) or address (legacy clock framework) of the |
244 | format specifier of its largest possible type and explicitly cast to it. | 272 | structure; '%pCr' prints the current clock rate. |
245 | Example: | ||
246 | 273 | ||
247 | printk("test: sector number/total blocks: %llu/%llu\n", | 274 | Passed by reference. |
248 | (unsigned long long)sector, (unsigned long long)blockcount); | ||
249 | 275 | ||
250 | Reminder: sizeof() result is of type size_t. | ||
251 | 276 | ||
252 | Thank you for your cooperation and attention. | 277 | Thank you for your cooperation and attention. |
253 | 278 | ||
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 902b4574acfb..9832ec52f859 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -21,6 +21,7 @@ Currently, these files are in /proc/sys/vm:
21 | - admin_reserve_kbytes | 21 | - admin_reserve_kbytes |
22 | - block_dump | 22 | - block_dump |
23 | - compact_memory | 23 | - compact_memory |
24 | - compact_unevictable_allowed | ||
24 | - dirty_background_bytes | 25 | - dirty_background_bytes |
25 | - dirty_background_ratio | 26 | - dirty_background_ratio |
26 | - dirty_bytes | 27 | - dirty_bytes |
@@ -106,6 +107,16 @@ huge pages although processes will also directly compact memory as required. | |||
106 | 107 | ||
107 | ============================================================== | 108 | ============================================================== |
108 | 109 | ||
110 | compact_unevictable_allowed | ||
111 | |||
112 | Available only when CONFIG_COMPACTION is set. When set to 1, compaction is | ||
113 | allowed to examine the unevictable lru (mlocked pages) for pages to compact. | ||
114 | This should be used on systems where stalls for minor page faults are an | ||
115 | acceptable trade for large contiguous free memory. Set to 0 to prevent | ||
116 | compaction from moving pages that are unevictable. Default value is 1. | ||
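A minimal sketch of changing this setting from a C program follows. The function name is hypothetical, and rejecting values other than 0 and 1 is a choice made for this example because the text defines only those two values.

```c
#include <assert.h>
#include <stdio.h>

/* Write 0 or 1 to the sysctl file at path.
 * Returns 0 on success, -1 on a bad value or an I/O error. */
int set_compact_unevictable_allowed(const char *path, int allowed)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    if (allowed != 0 && allowed != 1)
        return -1;      /* the text only defines 0 and 1 */

    f = fopen(path, "w");
    if (!f)
        return -1;      /* e.g. no permission or file not present */

    ok = fprintf(f, "%d\n", allowed) > 0;
    if (fclose(f) != 0)
        ok = 0;         /* the write may only fail at flush time */
    return ok ? 0 : -1;
}
```

To stop compaction from moving unevictable pages, a privileged caller would use set_compact_unevictable_allowed("/proc/sys/vm/compact_unevictable_allowed", 0). Taking the path as a parameter lets the helper be exercised against an ordinary file as well.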
117 | |||
118 | ============================================================== | ||
119 | |||
109 | dirty_background_bytes | 120 | dirty_background_bytes |
110 | 121 | ||
111 | Contains the amount of dirty memory at which the background kernel | 122 | Contains the amount of dirty memory at which the background kernel |
diff --git a/Documentation/vm/hugetlbpage.txt b/Documentation/vm/hugetlbpage.txt
index f2d3a100fe38..030977fb8d2d 100644
--- a/Documentation/vm/hugetlbpage.txt
+++ b/Documentation/vm/hugetlbpage.txt
@@ -267,21 +267,34 @@ call, then it is required that system administrator mount a file system of
267 | type hugetlbfs: | 267 | type hugetlbfs: |
268 | 268 | ||
269 | mount -t hugetlbfs \ | 269 | mount -t hugetlbfs \ |
270 | -o uid=<value>,gid=<value>,mode=<value>,size=<value>,nr_inodes=<value> \ | 270 | -o uid=<value>,gid=<value>,mode=<value>,pagesize=<value>,size=<value>,\ |
271 | none /mnt/huge | 271 | min_size=<value>,nr_inodes=<value> none /mnt/huge |
272 | 272 | ||
273 | This command mounts a (pseudo) filesystem of type hugetlbfs on the directory | 273 | This command mounts a (pseudo) filesystem of type hugetlbfs on the directory |
274 | /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid | 274 | /mnt/huge. Any files created on /mnt/huge uses huge pages. The uid and gid |
275 | options sets the owner and group of the root of the file system. By default | 275 | options sets the owner and group of the root of the file system. By default |
276 | the uid and gid of the current process are taken. The mode option sets the | 276 | the uid and gid of the current process are taken. The mode option sets the |
277 | mode of root of file system to value & 01777. This value is given in octal. | 277 | mode of root of file system to value & 01777. This value is given in octal. |
278 | By default the value 0755 is picked. The size option sets the maximum value of | 278 | By default the value 0755 is picked. If the platform supports multiple huge |
279 | memory (huge pages) allowed for that filesystem (/mnt/huge). The size is | 279 | page sizes, the pagesize option can be used to specify the huge page size and |
280 | rounded down to HPAGE_SIZE. The option nr_inodes sets the maximum number of | 280 | associated pool. pagesize is specified in bytes. If pagesize is not specified |
281 | inodes that /mnt/huge can use. If the size or nr_inodes option is not | 281 | the platform's default huge page size and associated pool will be used. The |
282 | provided on command line then no limits are set. For size and nr_inodes | 282 | size option sets the maximum value of memory (huge pages) allowed for that |
283 | options, you can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For | 283 | filesystem (/mnt/huge). The size option can be specified in bytes, or as a |
284 | example, size=2K has the same meaning as size=2048. | 284 | percentage of the specified huge page pool (nr_hugepages). The size is |
285 | rounded down to HPAGE_SIZE boundary. The min_size option sets the minimum | ||
286 | value of memory (huge pages) allowed for the filesystem. min_size can be | ||
287 | specified in the same way as size, either bytes or a percentage of the | ||
288 | huge page pool. At mount time, the number of huge pages specified by | ||
289 | min_size are reserved for use by the filesystem. If there are not enough | ||
290 | free huge pages available, the mount will fail. As huge pages are allocated | ||
291 | to the filesystem and freed, the reserve count is adjusted so that the sum | ||
292 | of allocated and reserved huge pages is always at least min_size. The option | ||
293 | nr_inodes sets the maximum number of inodes that /mnt/huge can use. If the | ||
294 | size, min_size or nr_inodes option is not provided on command line then | ||
295 | no limits are set. For pagesize, size, min_size and nr_inodes options, you | ||
296 | can use [G|g]/[M|m]/[K|k] to represent giga/mega/kilo. For example, size=2K | ||
297 | has the same meaning as size=2048. | ||
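The suffix handling and the rounding described above can be sketched in user space as follows. This is an illustrative re-implementation, not the kernel's parser: both function names are hypothetical, the percentage form of size and min_size is not handled, overflow is not checked, and the 2 MiB huge page size in the example is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse "<number>[K|k|M|m|G|g]" into bytes.
 * Returns 0 on success, -1 on malformed input. */
int parse_size_option(const char *s, unsigned long long *bytes)
{
    char *end;
    unsigned long long v = strtoull(s, &end, 10);

    if (end == s)
        return -1;                      /* no digits at all */
    switch (*end) {
    case 'G': case 'g': v <<= 30; end++; break;
    case 'M': case 'm': v <<= 20; end++; break;
    case 'K': case 'k': v <<= 10; end++; break;
    default: break;
    }
    if (*end != '\0')
        return -1;                      /* trailing garbage */
    *bytes = v;
    return 0;
}

/* Round down to a huge page boundary.
 * hpage_size must be a power of two. */
unsigned long long round_down_hpage(unsigned long long bytes,
                                    unsigned long long hpage_size)
{
    return bytes & ~(hpage_size - 1);
}

/* Usage example: size=2K means the same as size=2048. */
void parse_size_option_example(void)
{
    unsigned long long a, b;

    assert(parse_size_option("2K", &a) == 0);
    assert(parse_size_option("2048", &b) == 0);
    assert(a == b);
    /* 5 MiB rounded down to an assumed 2 MiB huge page is 4 MiB. */
    assert(round_down_hpage(5ULL << 20, 2ULL << 20) == 4ULL << 20);
}
```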
285 | 298 | ||
286 | While read system calls are supported on files that reside on hugetlb | 299 | While read system calls are supported on files that reside on hugetlb |
287 | file systems, write system calls are not. | 300 | file systems, write system calls are not. |
@@ -289,15 +302,23 @@ file systems, write system calls are not. | |||
289 | Regular chown, chgrp, and chmod commands (with right permissions) could be | 302 | Regular chown, chgrp, and chmod commands (with right permissions) could be |
290 | used to change the file attributes on hugetlbfs. | 303 | used to change the file attributes on hugetlbfs. |
291 | 304 | ||
292 | Also, it is important to note that no such mount command is required if the | 305 | Also, it is important to note that no such mount command is required if |
293 | applications are going to use only shmat/shmget system calls or mmap with | 306 | applications are going to use only shmat/shmget system calls or mmap with |
294 | MAP_HUGETLB. Users who wish to use hugetlb page via shared memory segment | 307 | MAP_HUGETLB. For an example of how to use mmap with MAP_HUGETLB see map_hugetlb |
295 | should be a member of a supplementary group and system admin needs to | 308 | below. |
296 | configure that gid into /proc/sys/vm/hugetlb_shm_group. It is possible for | 309 | |
297 | same or different applications to use any combination of mmaps and shm* | 310 | Users who wish to use hugetlb memory via shared memory segment should be a |
298 | calls, though the mount of filesystem will be required for using mmap calls | 311 | member of a supplementary group and system admin needs to configure that gid |
299 | without MAP_HUGETLB. For an example of how to use mmap with MAP_HUGETLB see | 312 | into /proc/sys/vm/hugetlb_shm_group. It is possible for same or different |
300 | map_hugetlb.c. | 313 | applications to use any combination of mmaps and shm* calls, though the mount of |
314 | filesystem will be required for using mmap calls without MAP_HUGETLB. | ||
315 | |||
316 | Syscalls that operate on memory backed by hugetlb pages only have their lengths | ||
317 | aligned to the native page size of the processor; they will normally fail with | ||
318 | errno set to EINVAL or exclude hugetlb pages that extend beyond the length if | ||
319 | not hugepage aligned. For example, munmap(2) will fail if memory is backed by | ||
320 | a hugetlb page and the length is smaller than the hugepage size. | ||
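One way for a caller to avoid the alignment failure described above is to round lengths up to a multiple of the huge page size before passing them to calls such as munmap(2). The helpers below are a sketch with hypothetical names; the 2 MiB huge page size in the example is an assumption, and real code should query the size in use on the running system.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if len is a multiple of hpage_size, else 0.
 * hpage_size must be a power of two. */
int is_hugepage_aligned(size_t len, size_t hpage_size)
{
    return (len & (hpage_size - 1)) == 0;
}

/* Round len up to the next multiple of hpage_size.
 * hpage_size must be a power of two. */
size_t hugepage_align_up(size_t len, size_t hpage_size)
{
    return (len + hpage_size - 1) & ~(hpage_size - 1);
}

/* Usage example with an assumed 2 MiB huge page. */
void hugepage_align_example(void)
{
    const size_t hpage = (size_t)2 << 20;

    assert(!is_hugepage_aligned(4096, hpage));
    assert(hugepage_align_up(4096, hpage) == hpage);
    assert(hugepage_align_up(hpage, hpage) == hpage);
}
```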
321 | |||
301 | 322 | ||
302 | Examples | 323 | Examples |
303 | ======== | 324 | ======== |
diff --git a/Documentation/vm/unevictable-lru.txt b/Documentation/vm/unevictable-lru.txt
index 86cb4624fc5a..3be0bfc4738d 100644
--- a/Documentation/vm/unevictable-lru.txt
+++ b/Documentation/vm/unevictable-lru.txt
@@ -22,6 +22,7 @@ CONTENTS
22 | - Filtering special vmas. | 22 | - Filtering special vmas. |
23 | - munlock()/munlockall() system call handling. | 23 | - munlock()/munlockall() system call handling. |
24 | - Migrating mlocked pages. | 24 | - Migrating mlocked pages. |
25 | - Compacting mlocked pages. | ||
25 | - mmap(MAP_LOCKED) system call handling. | 26 | - mmap(MAP_LOCKED) system call handling. |
26 | - munmap()/exit()/exec() system call handling. | 27 | - munmap()/exit()/exec() system call handling. |
27 | - try_to_unmap(). | 28 | - try_to_unmap(). |
@@ -450,6 +451,17 @@ list because of a race between munlock and migration, page migration uses the | |||
450 | putback_lru_page() function to add migrated pages back to the LRU. | 451 | putback_lru_page() function to add migrated pages back to the LRU. |
451 | 452 | ||
452 | 453 | ||
454 | COMPACTING MLOCKED PAGES | ||
455 | ------------------------ | ||
456 | |||
457 | The unevictable LRU can be scanned for compactable regions and the default | ||
458 | behavior is to do so. /proc/sys/vm/compact_unevictable_allowed controls | ||
459 | this behavior (see Documentation/sysctl/vm.txt). Once scanning of the | ||
460 | unevictable LRU is enabled, the work of compaction is mostly handled by | ||
461 | the page migration code and the same work flow as described in MIGRATING | ||
462 | MLOCKED PAGES will apply. | ||
463 | |||
464 | |||
453 | mmap(MAP_LOCKED) SYSTEM CALL HANDLING | 465 | mmap(MAP_LOCKED) SYSTEM CALL HANDLING |
454 | ------------------------------------- | 466 | ------------------------------------- |
455 | 467 | ||
diff --git a/Documentation/vm/zsmalloc.txt b/Documentation/vm/zsmalloc.txt new file mode 100644 index 000000000000..64ed63c4f69d --- /dev/null +++ b/Documentation/vm/zsmalloc.txt | |||
@@ -0,0 +1,70 @@ | |||
1 | zsmalloc | ||
2 | -------- | ||
3 | |||
4 | This allocator is designed for use with zram. Thus, the allocator is | ||
5 | supposed to work well under low memory conditions. In particular, it | ||
6 | never attempts higher order page allocation which is very likely to | ||
7 | fail under memory pressure. On the other hand, if we just use single | ||
8 | (0-order) pages, it would suffer from very high fragmentation -- | ||
9 | any object of size PAGE_SIZE/2 or larger would occupy an entire page. | ||
10 | This was one of the major issues with its predecessor (xvmalloc). | ||
11 | |||
12 | To overcome these issues, zsmalloc allocates a bunch of 0-order pages | ||
13 | and links them together using various 'struct page' fields. These linked | ||
14 | pages act as a single higher-order page i.e. an object can span 0-order | ||
15 | page boundaries. The code refers to these linked pages as a single entity | ||
16 | called zspage. | ||
17 | |||
18 | For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE | ||
19 | since this satisfies the requirements of all its current users (in the | ||
20 | worst case, page is incompressible and is thus stored "as-is" i.e. in | ||
21 | uncompressed form). For allocation requests larger than this size, failure | ||
22 | is returned (see zs_malloc). | ||
23 | |||
24 | Additionally, zs_malloc() does not return a dereferenceable pointer. | ||
25 | Instead, it returns an opaque handle (unsigned long) which encodes actual | ||
26 | location of the allocated object. The reason for this indirection is that | ||
27 | zsmalloc does not keep zspages permanently mapped since that would cause | ||
28 | issues on 32-bit systems where the VA region for kernel space mappings | ||
29 | is very small. So, before using the allocated memory, the object has to | ||
30 | be mapped using zs_map_object() to get a usable pointer and subsequently | ||
31 | unmapped using zs_unmap_object(). | ||
32 | |||
33 | stat | ||
34 | ---- | ||
35 | |||
36 | With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via | ||
37 | /sys/kernel/debug/zsmalloc/<user name>. Here is a sample of stat output: | ||
38 | |||
39 | # cat /sys/kernel/debug/zsmalloc/zram0/classes | ||
40 | |||
41 | class size almost_full almost_empty obj_allocated obj_used pages_used pages_per_zspage | ||
42 | .. | ||
43 | .. | ||
44 | 9 176 0 1 186 129 8 4 | ||
45 | 10 192 1 0 2880 2872 135 3 | ||
46 | 11 208 0 1 819 795 42 2 | ||
47 | 12 224 0 1 219 159 12 4 | ||
48 | .. | ||
49 | .. | ||
50 | |||
51 | |||
52 | class: index | ||
53 | size: object size zspage stores | ||
54 | almost_empty: the number of ZS_ALMOST_EMPTY zspages (see below) | ||
55 | almost_full: the number of ZS_ALMOST_FULL zspages (see below) | ||
56 | obj_allocated: the number of objects allocated | ||
57 | obj_used: the number of objects allocated to the user | ||
58 | pages_used: the number of pages allocated for the class | ||
59 | pages_per_zspage: the number of 0-order pages to make a zspage | ||
60 | |||
61 | We assign a zspage to ZS_ALMOST_EMPTY fullness group when: | ||
62 | n <= N / f, where | ||
63 | n = number of allocated objects | ||
64 | N = total number of objects zspage can store | ||
65 | f = fullness_threshold_frac (i.e., 4 at the moment) | ||
66 | |||
67 | Similarly, we assign zspage to: | ||
68 | ZS_ALMOST_FULL when n > N / f | ||
69 | ZS_EMPTY when n == 0 | ||
70 | ZS_FULL when n == N | ||
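
The fullness rules quoted above can be sketched as a small standalone C function. This is only an illustration of the thresholds as the zsmalloc.txt text states them, not the kernel's implementation: the function name classify_fullness() and its parameters are hypothetical, the enum merely reuses the group names from the text, and integer division for N / f is an assumption. The exact boundary cases (n == 0 and n == N) are checked first, since the text lists them separately from the threshold comparison.

```c
/* Fullness groups, named as in the zsmalloc.txt text above. */
enum fullness_group {
	ZS_EMPTY,
	ZS_ALMOST_EMPTY,
	ZS_ALMOST_FULL,
	ZS_FULL,
};

/*
 * Hypothetical helper, not a kernel API.
 *   n: number of allocated objects in the zspage
 *   N: total number of objects the zspage can store
 *   f: fullness threshold fraction (4 according to the text)
 * Assumes f > 0 and n <= N; N / f uses integer division.
 */
static enum fullness_group classify_fullness(unsigned int n, unsigned int N,
					     unsigned int f)
{
	if (n == 0)
		return ZS_EMPTY;
	if (n == N)
		return ZS_FULL;
	if (n <= N / f)
		return ZS_ALMOST_EMPTY;
	return ZS_ALMOST_FULL;
}
```

With N = 8 and f = 4 the threshold N / f is 2, so under these assumptions a zspage holding 1 or 2 objects lands in ZS_ALMOST_EMPTY and one holding 3 to 7 objects lands in ZS_ALMOST_FULL.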