author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-10-10 20:04:23 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-10-10 20:04:23 -0400 |
commit | ce40be7a820bb393ac4ac69865f018d2f4038cf0 (patch) | |
tree | b1fe5a93346eb06f22b1c303d63ec5456d7212ab /Documentation | |
parent | ba0a5a36f60e4c1152af3a2ae2813251974405bf (diff) | |
parent | 02f3939e1a9357b7c370a4a69717cf9c02452737 (diff) |
Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block
Pull block IO update from Jens Axboe:
"Core block IO bits for 3.7. Not a huge round this time, it contains:
- First series from Kent cleaning up and generalizing bio allocation
and freeing.
- WRITE_SAME support from Martin.
- Mikulas patches to prevent O_DIRECT crashes when someone changes
the block size of a device.
- Make bio_split() work on data-less bio's (like trim/discards).
- A few other minor fixups."
Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
Morton. It is due to the VM no longer using a prio-tree (see commit
6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree").
So make set_blocksize() use mapping_mapped() instead of open-coding the
internal VM knowledge that has changed.
* 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
block: makes bio_split support bio without data
scatterlist: refactor the sg_nents
scatterlist: add sg_nents
fs: fix include/percpu-rwsem.h export error
percpu-rw-semaphore: fix documentation typos
fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
blockdev: turn a rw semaphore into a percpu rw semaphore
Fix a crash when block device is read and block size is changed at the same time
block: fix request_queue->flags initialization
block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
block: ioctl to zero block ranges
block: Make blkdev_issue_zeroout use WRITE SAME
block: Implement support for WRITE SAME
block: Consolidate command flag and queue limit checks for merges
block: Clean up special command handling logic
block/blk-tag.c: Remove useless kfree
block: remove the duplicated setting for congestion_threshold
block: reject invalid queue attribute values
block: Add bio_clone_bioset(), bio_clone_kmalloc()
block: Consolidate bio_alloc_bioset(), bio_kmalloc()
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-block | 14 |
-rw-r--r-- | Documentation/block/biodoc.txt | 5 |
-rw-r--r-- | Documentation/percpu-rw-semaphore.txt | 27 |
3 files changed, 41 insertions, 5 deletions
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index c1eb41cb9876..279da08f7541 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -206,3 +206,17 @@ Description:
206 | when a discarded area is read the discard_zeroes_data | 206 | when a discarded area is read the discard_zeroes_data |
207 | parameter will be set to one. Otherwise it will be 0 and | 207 | parameter will be set to one. Otherwise it will be 0 and |
208 | the result of reading a discarded area is undefined. | 208 | the result of reading a discarded area is undefined. |
209 | |||
210 | What: /sys/block/<disk>/queue/write_same_max_bytes | ||
211 | Date: January 2012 | ||
212 | Contact: Martin K. Petersen <martin.petersen@oracle.com> | ||
213 | Description: | ||
214 | Some devices support a write same operation in which a | ||
215 | single data block can be written to a range of several | ||
216 | contiguous blocks on storage. This can be used to wipe | ||
217 | areas on disk or to initialize drives in a RAID | ||
218 | configuration. write_same_max_bytes indicates how many | ||
219 | bytes can be written in a single write same command. If | ||
220 | write_same_max_bytes is 0, write same is not supported | ||
221 | by the device. | ||
222 | |||
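The write_same_max_bytes attribute described above can be queried from user space like any other sysfs file. The following is a minimal sketch, not part of the commit: the function names (parse_max_bytes, read_write_same_max_bytes, write_same_supported) are hypothetical, and it assumes the attribute contains a single decimal byte count.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a write_same_max_bytes attribute.
 * ASSUMPTION: the attribute holds one decimal byte count.
 * Returns 0 on success, -1 if the text does not start with a valid number. */
int parse_max_bytes(const char *text, unsigned long long *out)
{
    char *end;
    unsigned long long value;

    assert(text != NULL && out != NULL);
    if (text[0] < '0' || text[0] > '9')
        return -1;

    errno = 0;
    value = strtoull(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;

    *out = value;
    return 0;
}

/* Read /sys/block/<disk>/queue/write_same_max_bytes for the named disk.
 * Returns 0 on success, -1 if the file is missing, unreadable or malformed. */
int read_write_same_max_bytes(const char *disk, unsigned long long *out)
{
    char path[256];
    char buf[64];
    FILE *f;
    int n;

    assert(disk != NULL && out != NULL);
    n = snprintf(path, sizeof(path),
                 "/sys/block/%s/queue/write_same_max_bytes", disk);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);

    return parse_max_bytes(buf, out);
}

/* Per the ABI text: a value of 0 means write same is not supported. */
int write_same_supported(unsigned long long max_bytes)
{
    return max_bytes != 0;
}
```

A caller would pass a disk name such as "sda" (an illustrative name) and treat both a read failure and a value of 0 as "do not issue write same".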
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index e418dc0a7086..8df5e8e6dceb 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -465,7 +465,6 @@ struct bio {
465 | bio_end_io_t *bi_end_io; /* bi_end_io (bio) */ | 465 | bio_end_io_t *bi_end_io; /* bi_end_io (bio) */ |
466 | atomic_t bi_cnt; /* pin count: free when it hits zero */ | 466 | atomic_t bi_cnt; /* pin count: free when it hits zero */ |
467 | void *bi_private; | 467 | void *bi_private; |
468 | bio_destructor_t *bi_destructor; /* bi_destructor (bio) */ | ||
469 | }; | 468 | }; |
470 | 469 | ||
471 | With this multipage bio design: | 470 | With this multipage bio design: |
@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
647 | so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the | 646 | so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the |
648 | given size from these slabs. | 647 | given size from these slabs. |
649 | 648 | ||
650 | The bi_destructor() routine takes into account the possibility of the bio | ||
651 | having originated from a different source (see later discussions on | ||
652 | n/w to block transfers and kvec_cb) | ||
653 | |||
654 | The bio_get() routine may be used to hold an extra reference on a bio prior | 649 | The bio_get() routine may be used to hold an extra reference on a bio prior |
655 | to i/o submission, if the bio fields are likely to be accessed after the | 650 | to i/o submission, if the bio fields are likely to be accessed after the |
656 | i/o is issued (since the bio may otherwise get freed in case i/o completion | 651 | i/o is issued (since the bio may otherwise get freed in case i/o completion |
diff --git a/Documentation/percpu-rw-semaphore.txt b/Documentation/percpu-rw-semaphore.txt
new file mode 100644
index 000000000000..7d3c82431909
--- /dev/null
+++ b/Documentation/percpu-rw-semaphore.txt
@@ -0,0 +1,27 @@
1 | Percpu rw semaphores | ||
2 | -------------------- | ||
3 | |||
4 | Percpu rw semaphores is a new read-write semaphore design that is | ||
5 | optimized for locking for reading. | ||
6 | |||
7 | The problem with traditional read-write semaphores is that when multiple | ||
8 | cores take the lock for reading, the cache line containing the semaphore | ||
9 | is bouncing between L1 caches of the cores, causing performance | ||
10 | degradation. | ||
11 | |||
12 | Locking for reading is very fast, it uses RCU and it avoids any atomic | ||
13 | instruction in the lock and unlock path. On the other hand, locking for | ||
14 | writing is very expensive, it calls synchronize_rcu() that can take | ||
15 | hundreds of milliseconds. | ||
16 | |||
17 | The lock is declared with "struct percpu_rw_semaphore" type. | ||
18 | The lock is initialized percpu_init_rwsem, it returns 0 on success and | ||
19 | -ENOMEM on allocation failure. | ||
20 | The lock must be freed with percpu_free_rwsem to avoid memory leak. | ||
21 | |||
22 | The lock is locked for read with percpu_down_read, percpu_up_read and | ||
23 | for write with percpu_down_write, percpu_up_write. | ||
24 | |||
25 | The idea of using RCU for optimized rw-lock was introduced by | ||
26 | Eric Dumazet <eric.dumazet@gmail.com>. | ||
27 | The code was written by Mikulas Patocka <mpatocka@redhat.com> | ||
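The call sequence described in the new percpu-rw-semaphore.txt (initialize and check for -ENOMEM, take the lock for read on the frequent path, take it for write on the rare path, free it at the end) might look roughly like the sketch below. The kernel API cannot be compiled in user space, so the first half of the block provides single-threaded stand-ins with the same names; they are assumptions made for illustration and are not the kernel implementation. The device_state structure and its helpers are likewise hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* ---- User-space stand-ins (ASSUMPTION: not the kernel implementation) ----
 * They only track lock state so that the call pattern below can be compiled
 * and checked in a single thread. */
struct percpu_rw_semaphore {
    int *state;   /* stands in for the allocation made at init time */
    int readers;  /* number of read holders */
    int writer;   /* 1 while held for write */
};

int percpu_init_rwsem(struct percpu_rw_semaphore *s)
{
    s->state = malloc(sizeof(*s->state));
    if (s->state == NULL)
        return -ENOMEM;  /* same error the text documents */
    s->readers = 0;
    s->writer = 0;
    return 0;
}

void percpu_free_rwsem(struct percpu_rw_semaphore *s)
{
    free(s->state);
    s->state = NULL;
}

void percpu_down_read(struct percpu_rw_semaphore *s)
{
    assert(!s->writer);
    s->readers++;
}

void percpu_up_read(struct percpu_rw_semaphore *s)
{
    assert(s->readers > 0);
    s->readers--;
}

void percpu_down_write(struct percpu_rw_semaphore *s)
{
    assert(!s->writer && s->readers == 0);
    s->writer = 1;
}

void percpu_up_write(struct percpu_rw_semaphore *s)
{
    assert(s->writer);
    s->writer = 0;
}

/* ---- Call pattern described in the text (hypothetical example) ---- */
struct device_state {
    struct percpu_rw_semaphore lock;
    unsigned int block_size;  /* read often, changed rarely */
};

int device_state_init(struct device_state *d, unsigned int block_size)
{
    int err = percpu_init_rwsem(&d->lock);

    if (err)
        return err;  /* -ENOMEM on allocation failure */
    d->block_size = block_size;
    return 0;
}

unsigned int device_state_get_block_size(struct device_state *d)
{
    unsigned int bs;

    percpu_down_read(&d->lock);  /* frequent path: cheap in the kernel */
    bs = d->block_size;
    percpu_up_read(&d->lock);
    return bs;
}

void device_state_set_block_size(struct device_state *d, unsigned int bs)
{
    percpu_down_write(&d->lock);  /* rare path: expensive in the kernel */
    d->block_size = bs;
    percpu_up_write(&d->lock);
}

void device_state_destroy(struct device_state *d)
{
    percpu_free_rwsem(&d->lock);  /* required to avoid a memory leak */
}
```

The split into a cheap read path and an expensive write path mirrors the trade-off the text states: the design only pays off when writers are rare, as with a block size that is read on every I/O but changed almost never.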