| field | value | date |
|---|---|---|
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-10-10 20:04:23 -0400 |
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-10-10 20:04:23 -0400 |
| commit | ce40be7a820bb393ac4ac69865f018d2f4038cf0 | |
| tree | b1fe5a93346eb06f22b1c303d63ec5456d7212ab /Documentation | |
| parent | ba0a5a36f60e4c1152af3a2ae2813251974405bf | |
| parent | 02f3939e1a9357b7c370a4a69717cf9c02452737 | |

Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block

Pull block IO update from Jens Axboe:
 "Core block IO bits for 3.7. Not a huge round this time, it contains:

   - First series from Kent cleaning up and generalizing bio allocation
     and freeing.

   - WRITE_SAME support from Martin.

   - Mikulas patches to prevent O_DIRECT crashes when someone changes
     the block size of a device.

   - Make bio_split() work on data-less bio's (like trim/discards).

   - A few other minor fixups."

Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
Morton. It is due to the VM no longer using a prio-tree (see commit
6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree").

So make set_blocksize() use mapping_mapped() instead of open-coding the
internal VM knowledge that has changed.

* 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
block: makes bio_split support bio without data
scatterlist: refactor the sg_nents
scatterlist: add sg_nents
fs: fix include/percpu-rwsem.h export error
percpu-rw-semaphore: fix documentation typos
fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
blockdev: turn a rw semaphore into a percpu rw semaphore
Fix a crash when block device is read and block size is changed at the same time
block: fix request_queue->flags initialization
block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
block: ioctl to zero block ranges
block: Make blkdev_issue_zeroout use WRITE SAME
block: Implement support for WRITE SAME
block: Consolidate command flag and queue limit checks for merges
block: Clean up special command handling logic
block/blk-tag.c: Remove useless kfree
block: remove the duplicated setting for congestion_threshold
block: reject invalid queue attribute values
block: Add bio_clone_bioset(), bio_clone_kmalloc()
block: Consolidate bio_alloc_bioset(), bio_kmalloc()
...
Diffstat (limited to 'Documentation')
| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | Documentation/ABI/testing/sysfs-block | 14 |
| -rw-r--r-- | Documentation/block/biodoc.txt | 5 |
| -rw-r--r-- | Documentation/percpu-rw-semaphore.txt | 27 |

3 files changed, 41 insertions, 5 deletions

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index c1eb41cb9876..279da08f7541 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -206,3 +206,17 @@ Description:
 when a discarded area is read the discard_zeroes_data
 parameter will be set to one. Otherwise it will be 0 and
 the result of reading a discarded area is undefined.
+
+What: /sys/block/<disk>/queue/write_same_max_bytes
+Date: January 2012
+Contact: Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+Some devices support a write same operation in which a
+single data block can be written to a range of several
+contiguous blocks on storage. This can be used to wipe
+areas on disk or to initialize drives in a RAID
+configuration. write_same_max_bytes indicates how many
+bytes can be written in a single write same command. If
+write_same_max_bytes is 0, write same is not supported
+by the device.
+
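
The attribute documented in the hunk above can be queried from user space. The following is a minimal sketch, not part of the commit: the helper names (`parse_write_same_max_bytes`, `read_write_same_max_bytes`, `write_same_supported`) are hypothetical, and it assumes the attribute holds a single decimal byte count followed by a newline, as sysfs attributes usually do.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a write_same_max_bytes attribute (assumed to be a
 * decimal byte count, normally followed by a newline).
 * Returns 0 on success, -1 on malformed input. */
static int parse_write_same_max_bytes(const char *text, unsigned long long *out)
{
    char *end;
    unsigned long long value;

    assert(text != NULL && out != NULL);
    if (text[0] < '0' || text[0] > '9')
        return -1;              /* reject signs, whitespace, empty input */
    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || (*end != '\0' && *end != '\n'))
        return -1;              /* overflow or trailing garbage */
    *out = value;
    return 0;
}

/* Read /sys/block/<disk>/queue/write_same_max_bytes for one disk.
 * Returns 0 on success, -1 if the file is missing, unreadable or malformed. */
static int read_write_same_max_bytes(const char *disk, unsigned long long *out)
{
    char path[256];
    char buf[64];
    FILE *f;
    int n;

    assert(disk != NULL);
    n = snprintf(path, sizeof(path),
                 "/sys/block/%s/queue/write_same_max_bytes", disk);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* disk name too long for the buffer */
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_write_same_max_bytes(buf, out);
}

/* Per the description above, 0 means write same is not supported. */
static int write_same_supported(unsigned long long max_bytes)
{
    return max_bytes != 0;
}
```

A caller would pass a disk name such as `sda` (an example name, not taken from the commit) to `read_write_same_max_bytes()` and then check the result with `write_same_supported()` before relying on the operation.
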

diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index e418dc0a7086..8df5e8e6dceb 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -465,7 +465,6 @@ struct bio {
 bio_end_io_t *bi_end_io; /* bi_end_io (bio) */
 atomic_t bi_cnt; /* pin count: free when it hits zero */
 void *bi_private;
-bio_destructor_t *bi_destructor; /* bi_destructor (bio) */
 };
 
 With this multipage bio design:
@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
 so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
 given size from these slabs.
 
-The bi_destructor() routine takes into account the possibility of the bio
-having originated from a different source (see later discussions on
-n/w to block transfers and kvec_cb)
-
 The bio_get() routine may be used to hold an extra reference on a bio prior
 to i/o submission, if the bio fields are likely to be accessed after the
 i/o is issued (since the bio may otherwise get freed in case i/o completion

diff --git a/Documentation/percpu-rw-semaphore.txt b/Documentation/percpu-rw-semaphore.txt
new file mode 100644
index 000000000000..7d3c82431909
--- /dev/null
+++ b/Documentation/percpu-rw-semaphore.txt
@@ -0,0 +1,27 @@
+Percpu rw semaphores
+--------------------
+
+Percpu rw semaphores is a new read-write semaphore design that is
+optimized for locking for reading.
+
+The problem with traditional read-write semaphores is that when multiple
+cores take the lock for reading, the cache line containing the semaphore
+is bouncing between L1 caches of the cores, causing performance
+degradation.
+
+Locking for reading is very fast, it uses RCU and it avoids any atomic
+instruction in the lock and unlock path. On the other hand, locking for
+writing is very expensive, it calls synchronize_rcu() that can take
+hundreds of milliseconds.
+
+The lock is declared with "struct percpu_rw_semaphore" type.
+The lock is initialized percpu_init_rwsem, it returns 0 on success and
+-ENOMEM on allocation failure.
+The lock must be freed with percpu_free_rwsem to avoid memory leak.
+
+The lock is locked for read with percpu_down_read, percpu_up_read and
+for write with percpu_down_write, percpu_up_write.
+
+The idea of using RCU for optimized rw-lock was introduced by
+Eric Dumazet <eric.dumazet@gmail.com>.
+The code was written by Mikulas Patocka <mpatocka@redhat.com>
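
The calling pattern described in the new percpu-rw-semaphore.txt can be sketched as follows. This is an illustration, not code from the commit: `struct demo_dev` and the `demo_*` helpers are hypothetical. In a kernel build the lock type and functions are assumed to come from `linux/percpu-rwsem.h`; the `#else` branch supplies single-threaded user-space stand-ins that only check that calls are balanced, so the sketch compiles and runs outside the kernel. The stand-ins do not reproduce the per-CPU/RCU implementation.

```c
#ifdef __KERNEL__
#include <linux/percpu-rwsem.h>
#else
/* Hypothetical single-threaded stand-ins (NOT the kernel implementation):
 * they mirror the call names from the text and assert balanced usage. */
#include <assert.h>
struct percpu_rw_semaphore { int readers; int writer; };
static int percpu_init_rwsem(struct percpu_rw_semaphore *s)
{ s->readers = 0; s->writer = 0; return 0; }
static void percpu_free_rwsem(struct percpu_rw_semaphore *s)
{ assert(s->readers == 0 && !s->writer); }
static void percpu_down_read(struct percpu_rw_semaphore *s)
{ assert(!s->writer); s->readers++; }
static void percpu_up_read(struct percpu_rw_semaphore *s)
{ assert(s->readers > 0); s->readers--; }
static void percpu_down_write(struct percpu_rw_semaphore *s)
{ assert(s->readers == 0 && !s->writer); s->writer = 1; }
static void percpu_up_write(struct percpu_rw_semaphore *s)
{ assert(s->writer); s->writer = 0; }
#endif

/* Hypothetical read-mostly object: block_size is read often, changed rarely. */
struct demo_dev {
    struct percpu_rw_semaphore sem;
    unsigned int block_size;
};

/* Initialize the lock; the text says this returns 0 or -ENOMEM. */
static int demo_init(struct demo_dev *d, unsigned int block_size)
{
    int err = percpu_init_rwsem(&d->sem);

    if (err)
        return err;
    d->block_size = block_size;
    return 0;
}

/* Cheap path: take the lock for reading around the access. */
static unsigned int demo_get_block_size(struct demo_dev *d)
{
    unsigned int value;

    percpu_down_read(&d->sem);
    value = d->block_size;
    percpu_up_read(&d->sem);
    return value;
}

/* Expensive path: take the lock for writing around the update. */
static void demo_set_block_size(struct demo_dev *d, unsigned int block_size)
{
    percpu_down_write(&d->sem);
    d->block_size = block_size;
    percpu_up_write(&d->sem);
}

/* Free the lock, which the text requires to avoid a memory leak. */
static void demo_destroy(struct demo_dev *d)
{
    percpu_free_rwsem(&d->sem);
}
```

The split into a frequent getter and a rare setter follows the trade-off stated in the text: readers should dominate, because each write-side acquisition is described as very expensive.
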
