[PATCH] dma doc updates

This updates the DMA API documentation to address a few issues:

 - The dma_map_sg() call results are used like pci_map_sg() results: using
   sg_dma_address() and sg_dma_len().  That's not wholly obvious to folk
   reading _only_ the "new" DMA-API.txt writeup.

 - Buffers allocated by dma_alloc_coherent() may not be completely free of
   coherency concerns ... some CPUs also have write buffers that may need to
   be flushed.

 - Cacheline coherence issues are now mentioned as being among issues which
   affect dma buffers, and complicate/prevent using of static and
   (especially) stack based buffers with the DMA calls.

I don't think many drivers currently need to worry about flushing write
buffers, but I did hit it with one SOC using external SDRAM for DMA
descriptors: without explicit writebuffer flushing, the on-chip DMA
controller accessed descriptors before the CPU completed the writes.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
author: David Brownell <david-b@pacbell.net> 2006-04-01 13:21:52 -0500
committer: Greg Kroah-Hartman <gregkh@suse.de> 2006-04-14 15:25:26 -0400
commit: 21440d313358043b0ce5e43b00ff3c9b35a8616c (patch)
tree: 32f3ed659a76ad6e4a6061b57346178cf3fa6256 /Documentation
parent: 2d1e1c754d641bb8a32f0ce909dcff32906830ef (diff)
2 files changed, 53 insertions, 18 deletions
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 1af0f2d5022..2ffb0d62f0f 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -33,7 +33,9 @@ pci_alloc_consistent(struct pci_dev *dev, size_t size,
 Consistent memory is memory for which a write by either the device or
 the processor can immediately be read by the processor or device
-without having to worry about caching effects.
+without having to worry about caching effects.  (You may however need
+to make sure to flush the processor's write buffers before telling
+devices to read that memory.)
 This routine allocates a region of <size> bytes of consistent memory.
 it also returns a <dma_handle> which may be cast to an unsigned
@@ -304,12 +306,12 @@ dma address with dma_mapping_error(). A non zero return value means the mapping
 could not be created and the driver should take appropriate action (eg
 reduce current DMA mapping usage or delay and try again later).
-int
-dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
-           enum dma_data_direction direction)
-int
-pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
-           int nents, int direction)
+        int
+        dma_map_sg(struct device *dev, struct scatterlist *sg,
+                int nents, enum dma_data_direction direction)
+        int
+        pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+                int nents, int direction)
 Maps a scatter gather list from the block layer.
@@ -327,12 +329,33 @@ critical that the driver do something, in the case of a block driver
 aborting the request or even oopsing is better than doing nothing and
 corrupting the filesystem.
-void
-dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nhwentries,
-             enum dma_data_direction direction)
-void
-pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
-             int nents, int direction)
+With scatterlists, you use the resulting mapping like this:
+        int i, count = dma_map_sg(dev, sglist, nents, direction);
+        struct scatterlist *sg;
+        for (i = 0, sg = sglist; i < count; i++, sg++) {
+                hw_address[i] = sg_dma_address(sg);
+                hw_len[i] = sg_dma_len(sg);
+        }
+where nents is the number of entries in the sglist.
+The implementation is free to merge several consecutive sglist entries
+into one (e.g. with an IOMMU, or if several pages just happen to be
+physically contiguous) and returns the actual number of sg entries it
+mapped them to. On failure 0, is returned.
+Then you should loop count times (note: this can be less than nents times)
+and use sg_dma_address() and sg_dma_len() macros where you previously
+accessed sg->address and sg->length as shown above.
+        void
+        dma_unmap_sg(struct device *dev, struct scatterlist *sg,
+                int nhwentries, enum dma_data_direction direction)
+        void
+        pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg,
+                int nents, int direction)
 unmap the previously mapped scatter/gather list.  All the parameters
 must be the same as those and passed in to the scatter/gather mapping
diff --git a/Documentation/DMA-mapping.txt b/Documentation/DMA-mapping.txt
index 10bf4deb96a..7c717699032 100644
--- a/Documentation/DMA-mapping.txt
+++ b/Documentation/DMA-mapping.txt
@@ -58,11 +58,15 @@ translating each of those pages back to a kernel address using
 something like __va().  [ EDIT: Update this when we integrate
 Gerd Knorr's generic code which does this. ]
-This rule also means that you may not use kernel image addresses
-(ie. items in the kernel's data/text/bss segment, or your driver's)
-nor may you use kernel stack addresses for DMA.  Both of these items
-might be mapped somewhere entirely different than the rest of physical
-memory.
+This rule also means that you may use neither kernel image addresses
+(items in data/text/bss segments), nor module image addresses, nor
+stack addresses for DMA.  These could all be mapped somewhere entirely
+different than the rest of physical memory.  Even if those classes of
+memory could physically work with DMA, you'd need to ensure the I/O
+buffers were cacheline-aligned.  Without that, you'd see cacheline
+sharing problems (data corruption) on CPUs with DMA-incoherent caches.
+(The CPU could write to one word, DMA would write to a different one
+in the same cache line, and one of them could be overwritten.)
 Also, this means that you cannot take the return of a kmap()
 call and DMA to/from that.  This is similar to vmalloc().
@@ -284,6 +288,11 @@ There are two types of DMA mappings:
             in order to get correct behavior on all platforms.
+             Also, on some platforms your driver may need to flush CPU write
+             buffers in much the same way as it needs to flush write buffers
+             found in PCI bridges (such as by reading a register's value
+             after writing it).
 - Streaming DMA mappings which are usually mapped for one DMA transfer,
  unmapped right after it (unless you use pci_dma_sync_* below) and for which
  hardware can optimize for sequential accesses.
@@ -303,6 +312,9 @@ There are two types of DMA mappings:
 Neither type of DMA mapping has alignment restrictions that come
 from PCI, although some devices may have such restrictions.
+Also, systems with caches that aren't DMA-coherent will work better
+when the underlying buffers don't share cache lines with other data.
                 Using Consistent DMA mappings.
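
The scatterlist usage this patch adds to DMA-API.txt (map, then loop over the
returned count rather than nents) can be tried outside the kernel with a small
mock. The sketch below is not kernel code: struct mock_sg, mock_dma_map_sg(),
the mock_sg_dma_*() macros and program_descriptors() are hypothetical
stand-ins, and the mock assumes bus addresses equal physical addresses. It only
illustrates why the loop bound must be the value returned by the mapping call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct scatterlist. */
struct mock_sg {
    uint64_t phys;        /* CPU-side address of the segment */
    uint32_t length;      /* CPU-side length of the segment */
    uint64_t dma_address; /* filled in by the mock mapping call */
    uint32_t dma_length;  /* filled in by the mock mapping call */
};

/* Mock equivalents of sg_dma_address() and sg_dma_len(). */
#define mock_sg_dma_address(sg) ((sg)->dma_address)
#define mock_sg_dma_len(sg)     ((sg)->dma_length)

/*
 * Mock mapping call (assumption: identity bus mapping). Neighbouring
 * entries that are contiguous are merged, so the returned count may be
 * smaller than nents. Returns 0 on failure, as the patch text describes.
 */
static int mock_dma_map_sg(struct mock_sg *sglist, int nents)
{
    int i, count = 0;

    if (sglist == NULL || nents <= 0)
        return 0;

    for (i = 0; i < nents; i++) {
        struct mock_sg *prev = count ? &sglist[count - 1] : NULL;

        if (prev && prev->dma_address + prev->dma_length == sglist[i].phys) {
            /* Contiguous with the previous mapped entry: merge. */
            prev->dma_length += sglist[i].length;
        } else {
            sglist[count].dma_address = sglist[i].phys;
            sglist[count].dma_length = sglist[i].length;
            count++;
        }
    }
    return count;
}

/*
 * Same loop shape as in the patch: iterate count times, not nents times,
 * and read the DMA-side fields through the accessor macros.
 * Returns the number of descriptors written, or -1 if mapping failed.
 */
static int program_descriptors(struct mock_sg *sglist, int nents,
                               uint64_t *hw_address, uint32_t *hw_len)
{
    int i, count;
    struct mock_sg *sg;

    assert(hw_address != NULL && hw_len != NULL);

    count = mock_dma_map_sg(sglist, nents);
    if (count == 0)
        return -1;

    for (i = 0, sg = sglist; i < count; i++, sg++) {
        hw_address[i] = mock_sg_dma_address(sg);
        hw_len[i] = mock_sg_dma_len(sg);
    }
    return count;
}
```

With three input entries of which the first two are contiguous, the mock
returns a count of two, so a driver that looped nents times would program one
stale descriptor.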