Diffstat (limited to 'Documentation/block/queue-sysfs.rst')
| -rw-r--r-- | Documentation/block/queue-sysfs.rst | 254 |
1 files changed, 254 insertions, 0 deletions
diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
new file mode 100644
index 000000000000..6a8513af9201
--- /dev/null
+++ b/Documentation/block/queue-sysfs.rst
| @@ -0,0 +1,254 @@ | |||
| 1 | ================= | ||
| 2 | Queue sysfs files | ||
| 3 | ================= | ||
| 4 | |||
| 5 | This text file will detail the queue files that are located in the sysfs tree | ||
| 6 | for each block device. Note that stacked devices typically do not export | ||
| 7 | any settings, since their queue merely functions as a remapping target. | ||
| 8 | These files are the ones found in the /sys/block/xxx/queue/ directory. | ||
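
As an illustration of the directory layout described above, the following is a minimal sketch of reading one queue attribute from user space. The helper name read_queue_attr and its sysfs_root parameter are assumptions made for this example (the parameter exists only so the function can be pointed at a test directory); they are not part of any kernel interface.

```python
from pathlib import Path


def read_queue_attr(device, attr, sysfs_root="/sys/block"):
    """Return the stripped contents of <sysfs_root>/<device>/queue/<attr>.

    `device` is a block device name such as "sda"; `attr` is one of the
    file names documented in this text, for example "rotational".
    """
    path = Path(sysfs_root) / device / "queue" / attr
    return path.read_text().strip()
```

For example, read_queue_attr("sda", "logical_block_size") would return the attribute as a string, assuming a device named sda exists on the system.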
| 9 | |||
| 10 | Files denoted with a RO postfix are readonly and the RW postfix means | ||
| 11 | read-write. | ||
| 12 | |||
| 13 | add_random (RW) | ||
| 14 | --------------- | ||
| 15 | This file allows turning off the disk's entropy contribution. The default | ||
| 16 | value of this file is '1' (on). | ||
| 17 | |||
| 18 | chunk_sectors (RO) | ||
| 19 | ------------------ | ||
| 20 | This has a different meaning depending on the type of the block device. | ||
| 21 | For a RAID device (dm-raid), chunk_sectors indicates the size in 512B sectors | ||
| 22 | of the RAID volume stripe segment. For a zoned block device, either host-aware | ||
| 23 | or host-managed, chunk_sectors indicates the size in 512B sectors of the zones | ||
| 24 | of the device, with the possible exception of the last zone of the device which | ||
| 25 | may be smaller. | ||
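
Because chunk_sectors is expressed in 512B sectors, converting it to bytes is a single multiplication. The sketch below shows that arithmetic; the function name is hypothetical and the sample value is chosen only for illustration.

```python
SECTOR_SIZE = 512  # chunk_sectors is always reported in 512-byte sectors


def chunk_sectors_to_bytes(chunk_sectors):
    """Convert a chunk_sectors value to a size in bytes."""
    return int(chunk_sectors) * SECTOR_SIZE
```

With an illustrative value of 524288 sectors, the result is 268435456 bytes, that is, 256 MiB.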
| 26 | |||
| 27 | dax (RO) | ||
| 28 | -------- | ||
| 29 | This file indicates whether the device supports Direct Access (DAX), | ||
| 30 | used by CPU-addressable storage to bypass the pagecache. It shows '1' | ||
| 31 | if true, '0' if not. | ||
| 32 | |||
| 33 | discard_granularity (RO) | ||
| 34 | ------------------------ | ||
| 35 | This shows the size of internal allocation of the device in bytes, if | ||
| 36 | reported by the device. A value of '0' means the device does not support | ||
| 37 | the discard functionality. | ||
| 38 | |||
| 39 | discard_max_hw_bytes (RO) | ||
| 40 | ------------------------- | ||
| 41 | Devices that support discard functionality may have internal limits on | ||
| 42 | the number of bytes that can be trimmed or unmapped in a single operation. | ||
| 43 | The discard_max_hw_bytes parameter is set by the device driver to the maximum | ||
| 44 | number of bytes that can be discarded in a single operation. Discard | ||
| 45 | requests issued to the device must not exceed this limit. A discard_max_hw_bytes | ||
| 46 | value of 0 means that the device does not support discard functionality. | ||
| 47 | |||
| 48 | discard_max_bytes (RW) | ||
| 49 | ---------------------- | ||
| 50 | While discard_max_hw_bytes is the hardware limit for the device, this | ||
| 51 | setting is the software limit. Some devices exhibit large latencies when | ||
| 52 | large discards are issued; setting this value lower will make Linux issue | ||
| 53 | smaller discards and potentially help reduce latencies induced by large | ||
| 54 | discard operations. | ||
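
To make the effect of a lower software limit concrete, the following sketch splits one large discard range into pieces no larger than discard_max_bytes. It is a simplified model written for this text, not the block layer's actual splitting code: the function name is hypothetical, and alignment to discard_granularity is deliberately ignored.

```python
def split_discard(start, length, discard_max_bytes):
    """Split the byte range [start, start + length) into discard-sized pieces.

    Returns a list of (offset, size) tuples, each at most
    discard_max_bytes long. Granularity and alignment are ignored here.
    """
    if discard_max_bytes <= 0:
        raise ValueError("discard limit must be positive")
    pieces = []
    offset = start
    end = start + length
    while offset < end:
        size = min(discard_max_bytes, end - offset)
        pieces.append((offset, size))
        offset += size
    return pieces
```

Under this model, a smaller limit produces more, shorter operations, which is the trade-off described above.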
| 55 | |||
| 56 | discard_zeroes_data (RO) | ||
| 57 | ------------------------ | ||
| 58 | Obsolete. Always zero. | ||
| 59 | |||
| 60 | fua (RO) | ||
| 61 | -------- | ||
| 62 | Whether or not the block driver supports the FUA flag for write requests. | ||
| 63 | FUA stands for Force Unit Access. If the FUA flag is set, write requests | ||
| 64 | must bypass the volatile cache of the storage device. | ||
| 65 | |||
| 66 | hw_sector_size (RO) | ||
| 67 | ------------------- | ||
| 68 | This is the hardware sector size of the device, in bytes. | ||
| 69 | |||
| 70 | io_poll (RW) | ||
| 71 | ------------ | ||
| 72 | When read, this file shows whether polling is enabled (1) or disabled | ||
| 73 | (0). Writing '0' to this file will disable polling for this device. | ||
| 74 | Writing any non-zero value will enable this feature. | ||
| 75 | |||
| 76 | io_poll_delay (RW) | ||
| 77 | ------------------ | ||
| 78 | If polling is enabled, this controls what kind of polling will be | ||
| 79 | performed. It defaults to -1, which is classic polling. In this mode, | ||
| 80 | the CPU will repeatedly ask for completions without giving up any time. | ||
| 81 | If set to 0, a hybrid polling mode is used, where the kernel will attempt | ||
| 82 | to make an educated guess at when the IO will complete. Based on this | ||
| 83 | guess, the kernel will put the process issuing IO to sleep for an amount | ||
| 84 | of time, before entering a classic poll loop. This mode might be a | ||
| 85 | little slower than pure classic polling, but it will be more efficient. | ||
| 86 | If set to a value larger than 0, the kernel will put the process issuing | ||
| 87 | IO to sleep for this number of microseconds before entering classic | ||
| 88 | polling. | ||
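
The three cases above can be summarized as a small lookup. The sketch below only restates the description in code form; the function name and the returned strings are made up for this example.

```python
def describe_io_poll_delay(value):
    """Map an io_poll_delay value to the polling mode described above."""
    value = int(value)
    if value == -1:
        return "classic polling"
    if value == 0:
        return "hybrid polling, sleep time estimated by the kernel"
    if value > 0:
        return f"hybrid polling, fixed sleep of {value} microseconds"
    raise ValueError(f"unexpected io_poll_delay value: {value}")
```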
| 89 | |||
| 90 | io_timeout (RW) | ||
| 91 | --------------- | ||
| 92 | io_timeout is the request timeout in milliseconds. If a request does not | ||
| 93 | complete in this time then the block driver timeout handler is invoked. | ||
| 94 | That timeout handler can decide to retry the request, to fail it or to start | ||
| 95 | a device recovery strategy. | ||
| 96 | |||
| 97 | iostats (RW) | ||
| 98 | ------------- | ||
| 99 | This file is used to control (on/off) the iostats accounting of the | ||
| 100 | disk. | ||
| 101 | |||
| 102 | logical_block_size (RO) | ||
| 103 | ----------------------- | ||
| 104 | This is the logical block size of the device, in bytes. | ||
| 105 | |||
| 106 | max_discard_segments (RO) | ||
| 107 | ------------------------- | ||
| 108 | The maximum number of DMA scatter/gather entries in a discard request. | ||
| 109 | |||
| 110 | max_hw_sectors_kb (RO) | ||
| 111 | ---------------------- | ||
| 112 | This is the maximum number of kilobytes supported in a single data transfer. | ||
| 113 | |||
| 114 | max_integrity_segments (RO) | ||
| 115 | --------------------------- | ||
| 116 | Maximum number of elements in a DMA scatter/gather list with integrity | ||
| 117 | data that will be submitted by the block layer core to the associated | ||
| 118 | block driver. | ||
| 119 | |||
| 120 | max_sectors_kb (RW) | ||
| 121 | ------------------- | ||
| 122 | This is the maximum number of kilobytes that the block layer will allow | ||
| 123 | for a filesystem request. Must be smaller than or equal to the maximum | ||
| 124 | size allowed by the hardware. | ||
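
The constraint above amounts to a single comparison against max_hw_sectors_kb. A minimal sketch, with a hypothetical function name, that a script could use before writing a new value:

```python
def fits_hardware_limit(requested_kb, max_hw_sectors_kb):
    """Return True if requested_kb does not exceed the hardware limit."""
    return int(requested_kb) <= int(max_hw_sectors_kb)
```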
| 125 | |||
| 126 | max_segments (RO) | ||
| 127 | ----------------- | ||
| 128 | Maximum number of elements in a DMA scatter/gather list that is submitted | ||
| 129 | to the associated block driver. | ||
| 130 | |||
| 131 | max_segment_size (RO) | ||
| 132 | --------------------- | ||
| 133 | Maximum size in bytes of a single element in a DMA scatter/gather list. | ||
| 134 | |||
| 135 | minimum_io_size (RO) | ||
| 136 | -------------------- | ||
| 137 | This is the smallest preferred IO size reported by the device. | ||
| 138 | |||
| 139 | nomerges (RW) | ||
| 140 | ------------- | ||
| 141 | This enables the user to disable the lookup logic involved with merging IO | ||
| 142 | requests in the block layer. By default (0) all merges are | ||
| 143 | enabled. When set to 1 only simple one-hit merges will be tried. When | ||
| 144 | set to 2 no merge algorithms will be tried (including one-hit or more | ||
| 145 | complex tree/hash lookups). | ||
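
The three nomerges settings can be written as a lookup table. This sketch only restates the description above; the names and description strings are chosen for this example.

```python
NOMERGES_MODES = {
    0: "all merges enabled",
    1: "only simple one-hit merges",
    2: "no merges",
}


def describe_nomerges(value):
    """Return a short description of a nomerges setting."""
    try:
        return NOMERGES_MODES[int(value)]
    except KeyError:
        raise ValueError(f"unexpected nomerges value: {value!r}") from None
```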
| 146 | |||
| 147 | nr_requests (RW) | ||
| 148 | ---------------- | ||
| 149 | This controls how many requests may be allocated in the block layer for | ||
| 150 | read or write requests. Note that the total allocated number may be twice | ||
| 151 | this amount, since it applies only to reads or writes (not the accumulated | ||
| 152 | sum). | ||
| 153 | |||
| 154 | To avoid priority inversion through request starvation, a request | ||
| 155 | queue maintains a separate request pool for each cgroup when | ||
| 156 | CONFIG_BLK_CGROUP is enabled, and this parameter applies to each such | ||
| 157 | per-block-cgroup request pool. In other words, if there are N block cgroups, | ||
| 158 | each request queue may have up to N request pools, each independently | ||
| 159 | regulated by nr_requests. | ||
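
Taken together, the two paragraphs above imply a rough upper bound on allocated requests: nr_requests applies separately to reads and to writes, and to each request pool. The sketch below is a back-of-the-envelope calculation derived only from that description, not a statement about actual kernel allocation behavior; the function name is hypothetical.

```python
def max_allocated_requests(nr_requests, request_pools=1):
    """Rough upper bound on allocated requests for one request queue.

    nr_requests limits reads and writes separately (hence the factor 2),
    and each per-cgroup request pool is regulated independently.
    """
    return 2 * nr_requests * request_pools
```

For example, with nr_requests set to 128 and three block cgroups, the bound is 2 * 128 * 3 = 768 requests.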
| 160 | |||
| 161 | nr_zones (RO) | ||
| 162 | ------------- | ||
| 163 | For zoned block devices (zoned attribute indicating "host-managed" or | ||
| 164 | "host-aware"), this indicates the total number of zones of the device. | ||
| 165 | This is always 0 for regular block devices. | ||
| 166 | |||
| 167 | optimal_io_size (RO) | ||
| 168 | -------------------- | ||
| 169 | This is the optimal IO size reported by the device. | ||
| 170 | |||
| 171 | physical_block_size (RO) | ||
| 172 | ------------------------ | ||
| 173 | This is the physical block size of the device, in bytes. | ||
| 174 | |||
| 175 | read_ahead_kb (RW) | ||
| 176 | ------------------ | ||
| 177 | Maximum number of kilobytes to read-ahead for filesystems on this block | ||
| 178 | device. | ||
| 179 | |||
| 180 | rotational (RW) | ||
| 181 | --------------- | ||
| 182 | This file is used to state whether the device is of rotational type or | ||
| 183 | non-rotational type. | ||
| 184 | |||
| 185 | rq_affinity (RW) | ||
| 186 | ---------------- | ||
| 187 | If this option is '1', the block layer will migrate request completions to the | ||
| 188 | CPU "group" that originally submitted the request. For some workloads this | ||
| 189 | provides a significant reduction in CPU cycles due to caching effects. | ||
| 190 | |||
| 191 | For storage configurations that need to maximize distribution of completion | ||
| 192 | processing, setting this option to '2' forces the completion to run on the | ||
| 193 | requesting CPU (bypassing the "group" aggregation logic). | ||
| 194 | |||
| 195 | scheduler (RW) | ||
| 196 | -------------- | ||
| 197 | When read, this file will display the current and available IO schedulers | ||
| 198 | for this block device. The currently active IO scheduler will be enclosed | ||
| 199 | in [] brackets. Writing an IO scheduler name to this file will switch | ||
| 200 | control of this block device to that new IO scheduler. Note that writing | ||
| 201 | an IO scheduler name to this file will attempt to load that IO scheduler | ||
| 202 | module, if it isn't already present in the system. | ||
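
Since the active scheduler is marked with [] brackets, the contents of this file can be parsed with a few lines of code. The following is a minimal sketch; the function name is hypothetical, and the scheduler names used in the usage note are only sample input, since the actual list depends on the kernel configuration.

```python
def parse_scheduler(text):
    """Parse the contents of the scheduler file.

    Returns (active, available): the name found inside [] brackets
    (or None if no name is bracketed) and the list of all names.
    """
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            active = name
        available.append(name)
    return active, available
```

Given the sample string "mq-deadline [kyber] bfq none", this returns "kyber" as the active scheduler together with all four names.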
| 203 | |||
| 204 | write_cache (RW) | ||
| 205 | ---------------- | ||
| 206 | When read, this file will display whether the device has write back | ||
| 207 | caching enabled or not. It will return "write back" for the former | ||
| 208 | case, and "write through" for the latter. Writing to this file can | ||
| 209 | change the kernel's view of the device, but it doesn't alter the | ||
| 210 | device state. This means that it might not be safe to toggle the | ||
| 211 | setting from "write back" to "write through", since that will also | ||
| 212 | eliminate cache flushes issued by the kernel. | ||
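
A script that inspects this file only needs to distinguish the two strings named above. The sketch below does so and rejects anything else; the function name is made up for this example.

```python
def kernel_assumes_write_back(text):
    """Return True for "write back" and False for "write through"."""
    value = text.strip()
    if value == "write back":
        return True
    if value == "write through":
        return False
    raise ValueError(f"unexpected write_cache value: {value!r}")
```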
| 213 | |||
| 214 | write_same_max_bytes (RO) | ||
| 215 | ------------------------- | ||
| 216 | This is the number of bytes the device can write in a single write-same | ||
| 217 | command. A value of '0' means write-same is not supported by this | ||
| 218 | device. | ||
| 219 | |||
| 220 | wbt_lat_usec (RW) | ||
| 221 | ----------------- | ||
| 222 | If the device is registered for writeback throttling, then this file shows | ||
| 223 | the target minimum read latency. If this latency is exceeded in a given | ||
| 224 | window of time (see wb_window_usec), then the writeback throttling will start | ||
| 225 | scaling back writes. Writing a value of '0' to this file disables the | ||
| 226 | feature. Writing a value of '-1' to this file resets the value to the | ||
| 227 | default setting. | ||
| 228 | |||
| 229 | throttle_sample_time (RW) | ||
| 230 | ------------------------- | ||
| 231 | This is the time window over which blk-throttle samples data, in milliseconds. | ||
| 232 | blk-throttle makes decisions based on the samples. A shorter window gives cgroups | ||
| 233 | smoother throughput, but at higher CPU overhead. This exists only when | ||
| 234 | CONFIG_BLK_DEV_THROTTLING_LOW is enabled. | ||
| 235 | |||
| 236 | write_zeroes_max_bytes (RO) | ||
| 237 | --------------------------- | ||
| 238 | For block drivers that support REQ_OP_WRITE_ZEROES, the maximum number of | ||
| 239 | bytes that can be zeroed at once. The value 0 means that REQ_OP_WRITE_ZEROES | ||
| 240 | is not supported. | ||
| 241 | |||
| 242 | zoned (RO) | ||
| 243 | ---------- | ||
| 244 | This indicates if the device is a zoned block device and the zone model of the | ||
| 245 | device if it is indeed zoned. The possible values indicated by zoned are | ||
| 246 | "none" for regular block devices and "host-aware" or "host-managed" for zoned | ||
| 247 | block devices. The characteristics of host-aware and host-managed zoned block | ||
| 248 | devices are described in the ZBC (Zoned Block Commands) and ZAC | ||
| 249 | (Zoned Device ATA Command Set) standards. These standards also define the | ||
| 250 | "drive-managed" zone model. However, since drive-managed zoned block devices | ||
| 251 | do not support zone commands, they will be treated as regular block devices | ||
| 252 | and zoned will report "none". | ||
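
The possible values of this file map directly onto a yes/no answer about whether the kernel treats the device as zoned. A minimal sketch, with a hypothetical function name, based only on the three values listed above:

```python
ZONED_MODELS = ("host-aware", "host-managed")


def is_zoned(value):
    """Return True if the zoned attribute names a zoned model."""
    model = value.strip()
    if model == "none":
        # Regular devices and drive-managed devices both report "none".
        return False
    if model in ZONED_MODELS:
        return True
    raise ValueError(f"unexpected zoned value: {model!r}")
```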
| 253 | |||
| 254 | Jens Axboe <jens.axboe@oracle.com>, February 2009 | ||
