mmc: documentation of mmc non-blocking request usage and design.

Documentation about the background and the design of mmc non-blocking.
Host driver guidelines to minimize request preparation overhead.

Signed-off-by: Per Forlin <per.forlin@linaro.org>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Chris Ball <cjb@laptop.org>
author: Per Forlin <per.forlin@linaro.org> 2011-07-10 15:21:59 -0400
committer: Chris Ball <cjb@laptop.org> 2011-07-21 10:34:52 -0400
commit: 7937e878f91ccc32c09177f44cfdc45183d78605
tree: ea343c892288164d9ae8e963e25e832a20d70fcc /Documentation/mmc
parent: 101ed47e01516adeffeb4769df77b9207e6ba48a
2 files changed, 89 insertions, 0 deletions
diff --git a/Documentation/mmc/00-INDEX b/Documentation/mmc/00-INDEX
index 93dd7a714075..a9ba6720ffdf 100644
--- a/Documentation/mmc/00-INDEX
+++ b/Documentation/mmc/00-INDEX
@@ -4,3 +4,5 @@ mmc-dev-attrs.txt
        - info on SD and MMC device attributes
 mmc-dev-parts.txt
        - info on SD and MMC device partitions
+mmc-async-req.txt
+        - info on mmc asynchronous requests
diff --git a/Documentation/mmc/mmc-async-req.txt b/Documentation/mmc/mmc-async-req.txt
new file mode 100644
index 000000000000..ae1907b10e4a
--- /dev/null
+++ b/Documentation/mmc/mmc-async-req.txt
@@ -0,0 +1,87 @@
+Rationale
+=========
+
+How significant is the cache maintenance overhead?
+It depends. Fast eMMC and multiple cache levels with speculative cache
+pre-fetch make the cache overhead relatively significant. If the DMA
+preparations for the next request are done in parallel with the current
+transfer, the DMA preparation overhead does not affect MMC performance.
+The intention of non-blocking (asynchronous) MMC requests is to minimize the
+time between when an MMC request ends and another MMC request begins.
+Using mmc_wait_for_req(), the MMC controller is idle while dma_map_sg() and
+dma_unmap_sg() are processing. Using non-blocking MMC requests makes it
+possible to prepare the caches for the next job in parallel with an active
+MMC request.
+
+MMC block driver
+================
+
+The mmc_blk_issue_rw_rq() in the MMC block driver is made non-blocking.
+The increase in throughput is proportional to the time it takes to
+prepare a request (the major part of the preparation is dma_map_sg() and
+dma_unmap_sg()) and depends on how fast the memory is. The faster the MMC/SD
+is, the more significant the request preparation time becomes. Roughly, the
+expected performance gain is 5% for large writes and 10% for large reads on
+an L2 cache platform. In power save mode, when clocks run at a lower
+frequency, the DMA preparation may cost even more. As long as these slower
+preparations run in parallel with the transfer, performance won't be affected.
+
+Details on measurements from IOZone and mmc_test
+================================================
+
+https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req
+
+MMC core API extension
+======================
+
+There is one new public function, mmc_start_req().
+It starts a new MMC command request for a host. The function isn't
+truly non-blocking. If there is an ongoing async request, it waits
+for completion of that request, starts the new one, and returns. It
+doesn't wait for the new request to complete. If there is no ongoing
+request, it starts the new request and returns immediately.
+
+MMC host extensions
+===================
+
+There are two optional members in the mmc_host_ops -- pre_req() and
+post_req() -- that the host driver may implement in order to move work
+to before and after the actual mmc_host_ops.request() function is called.
+In the DMA case, pre_req() may do dma_map_sg() and prepare the DMA
+descriptor, and post_req() runs dma_unmap_sg().
+
+Optimize for the first request
+==============================
+
+The first request in a series of requests can't be prepared in parallel
+with the previous transfer, since there is no previous request.
+The argument is_first_req in pre_req() indicates that there is no previous
+request. The host driver may optimize for this scenario to minimize
+the performance loss. A way to optimize for this is to split the current
+request into two chunks, prepare the first chunk and start the request,
+and finally prepare the second chunk and start the transfer.
+
+Pseudocode to handle the is_first_req scenario with minimal prepare overhead:
+
+if (is_first_req && req->size > threshold) {
+    /* start MMC transfer for the complete transfer size */
+    mmc_start_command(MMC_CMD_TRANSFER_FULL_SIZE);
+    /*
+     * Begin to prepare DMA while cmd is being processed by MMC.
+     * The first chunk of the request should take the same time
+     * to prepare as the "MMC process command time".
+     * If the prepare time exceeds the MMC cmd time, the transfer
+     * is delayed; guesstimate max 4k as the first chunk size.
+     */
+    prepare_1st_chunk_for_dma(req);
+    /* flush pending desc to the DMAC (dmaengine.h) */
+    dma_issue_pending(req->dma_desc);
+
+    prepare_2nd_chunk_for_dma(req);
+    /*
+     * The second issue_pending should be called before MMC runs out
+     * of the first chunk. If the MMC runs out of the first data chunk
+     * before this call, the transfer is delayed.
+     */
+    dma_issue_pending(req->dma_desc);
+}
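
The overlap that the Rationale and "MMC core API extension" sections of the
patch describe can be illustrated with a small user-space timing model. This
is only a sketch, not kernel code: struct req_cost, blocking_total_us() and
nonblocking_total_us() are hypothetical names, the costs are made-up inputs
rather than measurements, and the model assumes that the next request is
prepared before waiting for the ongoing one and that the previous request is
finished (unmapped) right after the new transfer has started.

```c
#include <assert.h>

/* Hypothetical per-request costs in microseconds (illustrative only). */
struct req_cost {
    long prepare_us;   /* e.g. dma_map_sg() and descriptor setup */
    long transfer_us;  /* time the MMC controller is busy */
    long finish_us;    /* e.g. dma_unmap_sg() */
};

/* Blocking model: prepare, transfer and finish run back to back. */
long blocking_total_us(struct req_cost c, int nreq)
{
    assert(nreq >= 0);
    return (long)nreq * (c.prepare_us + c.transfer_us + c.finish_us);
}

/*
 * Non-blocking model (assumed ordering): prepare the new request, wait
 * for the ongoing transfer if there is one, start the new transfer, then
 * finish the previous request while the new transfer is running.
 */
long nonblocking_total_us(struct req_cost c, int nreq)
{
    long cpu = 0;         /* current time on the CPU timeline */
    long busy_until = 0;  /* completion time of the ongoing transfer */
    int ongoing = 0;

    assert(nreq >= 0);
    for (int i = 0; i < nreq; i++) {
        cpu += c.prepare_us;               /* prepare the new request */
        if (ongoing && cpu < busy_until)
            cpu = busy_until;              /* wait for the ongoing one */
        busy_until = cpu + c.transfer_us;  /* start the new transfer */
        if (ongoing)
            cpu += c.finish_us;            /* finish the previous one */
        ongoing = 1;
    }
    if (ongoing) {
        if (cpu < busy_until)
            cpu = busy_until;              /* wait for the last transfer */
        cpu += c.finish_us;                /* finish the last request */
    }
    return cpu;
}
```

With the illustrative values prepare = 50, transfer = 1000, finish = 50 and
100 requests, the model gives 110000 us for the blocking case and 100100 us
for the non-blocking case: only the first prepare and the last finish remain
outside a transfer. A single request gains nothing in this model, which is
the situation the is_first_req optimization targets.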