Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block

Pull block IO update from Jens Axboe:
 "Core block IO bits for 3.7.  Not a huge round this time, it contains:

   - First series from Kent cleaning up and generalizing bio allocation
     and freeing.

   - WRITE_SAME support from Martin.

   - Mikulas patches to prevent O_DIRECT crashes when someone changes
     the block size of a device.

   - Make bio_split() work on data-less bio's (like trim/discards).

   - A few other minor fixups."

Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
Morton.  It is due to the VM no longer using a prio-tree (see commit
6b2dbba8b6ac: "mm: replace vma prio_tree with an interval tree").

So make set_blocksize() use mapping_mapped() instead of open-coding the
internal VM knowledge that has changed.

* 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
  block: makes bio_split support bio without data
  scatterlist: refactor the sg_nents
  scatterlist: add sg_nents
  fs: fix include/percpu-rwsem.h export error
  percpu-rw-semaphore: fix documentation typos
  fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
  blockdev: turn a rw semaphore into a percpu rw semaphore
  Fix a crash when block device is read and block size is changed at the same time
  block: fix request_queue->flags initialization
  block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
  block: ioctl to zero block ranges
  block: Make blkdev_issue_zeroout use WRITE SAME
  block: Implement support for WRITE SAME
  block: Consolidate command flag and queue limit checks for merges
  block: Clean up special command handling logic
  block/blk-tag.c: Remove useless kfree
  block: remove the duplicated setting for congestion_threshold
  block: reject invalid queue attribute values
  block: Add bio_clone_bioset(), bio_clone_kmalloc()
  block: Consolidate bio_alloc_bioset(), bio_kmalloc()
  ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2012-10-10 20:04:23 -0400
committer: Linus Torvalds <torvalds@linux-foundation.org> 2012-10-10 20:04:23 -0400
commit: ce40be7a820bb393ac4ac69865f018d2f4038cf0
tree: b1fe5a93346eb06f22b1c303d63ec5456d7212ab /Documentation
parent: ba0a5a36f60e4c1152af3a2ae2813251974405bf
parent: 02f3939e1a9357b7c370a4a69717cf9c02452737
3 files changed, 41 insertions, 5 deletions
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index c1eb41cb9876..279da08f7541 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -206,3 +206,17 @@ Description:
                when a discarded area is read the discard_zeroes_data
                parameter will be set to one. Otherwise it will be 0 and
                the result of reading a discarded area is undefined.
+
+What:           /sys/block/<disk>/queue/write_same_max_bytes
+Date:           January 2012
+Contact:        Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+                Some devices support a write same operation in which a
+                single data block can be written to a range of several
+                contiguous blocks on storage. This can be used to wipe
+                areas on disk or to initialize drives in a RAID
+                configuration. write_same_max_bytes indicates how many
+                bytes can be written in a single write same command. If
+                write_same_max_bytes is 0, write same is not supported
+                by the device.
+
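As an illustration of how userspace might consume this attribute, the following C sketch reads write_same_max_bytes for a disk and applies the rule from the description above: a value of 0 means write same is not supported. The helper names are hypothetical and not part of the patch. The sketch assumes the attribute holds a decimal byte count followed by a newline, as numeric sysfs attributes conventionally do.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse the text of a write_same_max_bytes attribute. The format (a
 * decimal byte count, optionally followed by a newline) is an assumption.
 * Returns 0 and stores the value in *out on success, -1 otherwise.
 */
static int parse_write_same_max_bytes(const char *text, unsigned long long *out)
{
	unsigned long long v;
	char *end;

	/* Reject signs and leading whitespace, which strtoull() would accept. */
	if (text[0] < '0' || text[0] > '9')
		return -1;

	errno = 0;
	v = strtoull(text, &end, 10);
	if (errno != 0)
		return -1;
	if (*end != '\n' && *end != '\0')
		return -1;

	*out = v;
	return 0;
}

/* Read /sys/block/<disk>/queue/write_same_max_bytes for the named disk. */
static int read_write_same_max_bytes(const char *disk, unsigned long long *out)
{
	char path[256];
	char buf[64];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path),
		     "/sys/block/%s/queue/write_same_max_bytes", disk);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;	/* no such disk, or the attribute is absent */

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	return parse_write_same_max_bytes(buf, out);
}

/* Per the ABI description: 0 means the device does not support write same. */
static int write_same_supported(unsigned long long max_bytes)
{
	return max_bytes != 0;
}
```

A caller would pass a disk name such as "sda" (an example name only) to read_write_same_max_bytes(), treat a failed read as "attribute not available", and otherwise hand the value to write_same_supported().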
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt
index e418dc0a7086..8df5e8e6dceb 100644
--- a/Documentation/block/biodoc.txt
+++ b/Documentation/block/biodoc.txt
@@ -465,7 +465,6 @@ struct bio {
       bio_end_io_t     *bi_end_io;  /* bi_end_io (bio) */
       atomic_t         bi_cnt;      /* pin count: free when it hits zero */
       void             *bi_private;
-       bio_destructor_t *bi_destructor; /* bi_destructor (bio) */
 };

 With this multipage bio design:
@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
 so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
 given size from these slabs.

-The bi_destructor() routine takes into account the possibility of the bio
-having originated from a different source (see later discussions on
-n/w to block transfers and kvec_cb)
-
 The bio_get() routine may be used to hold an extra reference on a bio prior
 to i/o submission, if the bio fields are likely to be accessed after the
 i/o is issued (since the bio may otherwise get freed in case i/o completion
diff --git a/Documentation/percpu-rw-semaphore.txt b/Documentation/percpu-rw-semaphore.txt
new file mode 100644
index 000000000000..7d3c82431909
--- /dev/null
+++ b/Documentation/percpu-rw-semaphore.txt
@@ -0,0 +1,27 @@
+Percpu rw semaphores
+--------------------
+
+Percpu rw semaphores is a new read-write semaphore design that is
+optimized for locking for reading.
+
+The problem with traditional read-write semaphores is that when multiple
+cores take the lock for reading, the cache line containing the semaphore
+is bouncing between L1 caches of the cores, causing performance
+degradation.
+
+Locking for reading is very fast, it uses RCU and it avoids any atomic
+instruction in the lock and unlock path. On the other hand, locking for
+writing is very expensive, it calls synchronize_rcu() that can take
+hundreds of milliseconds.
+
+The lock is declared with "struct percpu_rw_semaphore" type.
+The lock is initialized percpu_init_rwsem, it returns 0 on success and
+-ENOMEM on allocation failure.
+The lock must be freed with percpu_free_rwsem to avoid memory leak.
+
+The lock is locked for read with percpu_down_read, percpu_up_read and
+for write with percpu_down_write, percpu_up_write.
+
+The idea of using RCU for optimized rw-lock was introduced by
+Eric Dumazet <eric.dumazet@gmail.com>.
+The code was written by Mikulas Patocka <mpatocka@redhat.com>
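The calling pattern described in the new percpu-rw-semaphore.txt can be sketched in C as follows. This is a userspace illustration only. In the kernel the type and the percpu_* functions are provided by the kernel headers; here they are replaced by stand-in stubs that merely track lock state, so the example compiles and runs on its own but provides no real mutual exclusion and none of the RCU-based performance behaviour. The demo_dev structure and its helpers are hypothetical and do not appear in the patch.

```c
#include <errno.h>

/*
 * Stand-in definitions (assumptions for illustration). They only record
 * how the lock is held, so the usage below can be exercised in userspace.
 */
struct percpu_rw_semaphore {
	int readers;	/* number of read-side holders */
	int writer;	/* 1 while held for writing */
};

static int percpu_init_rwsem(struct percpu_rw_semaphore *p)
{
	/* The real function can fail with -ENOMEM; this stub cannot. */
	p->readers = 0;
	p->writer = 0;
	return 0;
}

static void percpu_free_rwsem(struct percpu_rw_semaphore *p)
{
	(void)p;	/* the real function releases per-CPU memory */
}

static void percpu_down_read(struct percpu_rw_semaphore *p)  { p->readers++; }
static void percpu_up_read(struct percpu_rw_semaphore *p)    { p->readers--; }
static void percpu_down_write(struct percpu_rw_semaphore *p) { p->writer = 1; }
static void percpu_up_write(struct percpu_rw_semaphore *p)   { p->writer = 0; }

/* Hypothetical object: a value that is read often and changed rarely. */
struct demo_dev {
	struct percpu_rw_semaphore sem;
	unsigned int block_size;
};

static int demo_dev_init(struct demo_dev *d, unsigned int block_size)
{
	int ret = percpu_init_rwsem(&d->sem);

	if (ret)
		return ret;	/* propagate -ENOMEM from initialization */
	d->block_size = block_size;
	return 0;
}

static unsigned int demo_dev_get_block_size(struct demo_dev *d)
{
	unsigned int bs;

	percpu_down_read(&d->sem);	/* fast path: taken frequently */
	bs = d->block_size;
	percpu_up_read(&d->sem);
	return bs;
}

static void demo_dev_set_block_size(struct demo_dev *d, unsigned int bs)
{
	percpu_down_write(&d->sem);	/* slow path: should be rare */
	d->block_size = bs;
	percpu_up_write(&d->sem);
}

static void demo_dev_destroy(struct demo_dev *d)
{
	percpu_free_rwsem(&d->sem);	/* required to avoid a memory leak */
}
```

The split between a cheap getter and an expensive setter reflects the trade-off stated in the documentation: this lock suits data that is read on hot paths and written only occasionally.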