author    Maxime Ripard <maxime.ripard@free-electrons.com>  2014-10-28 16:55:50 -0400
committer Vinod Koul <vinod.koul@intel.com>  2014-11-06 00:45:58 -0500
commit    c4d2ae967c1821b424a7d818c8297db8e61fc267 (patch)
tree      0d8d8f0c1dfbaa02eeff9e86ca853c2f56b14a21 /Documentation/dmaengine
parent    f36d2e6752bad5323fd0dc2c717cc200d83a09d1 (diff)
Documentation: dmaengine: Add documentation for the DMA controller API

The dmaengine is neither trivial nor properly documented at the moment,
which means a lot of trial-and-error development. That is not good for
such a central piece of the system.

This is an attempt at writing that documentation.
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
[fixed some minor typos]
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Diffstat (limited to 'Documentation/dmaengine')
-rw-r--r--  Documentation/dmaengine/provider.txt  366
1 file changed, 366 insertions, 0 deletions
diff --git a/Documentation/dmaengine/provider.txt b/Documentation/dmaengine/provider.txt
new file mode 100644
index 000000000000..766658ccf235
--- /dev/null
+++ b/Documentation/dmaengine/provider.txt
@@ -0,0 +1,366 @@
DMAengine controller documentation
==================================

Hardware Introduction
+++++++++++++++++++++

Most of the Slave DMA controllers have the same general principles of
operation.

They have a given number of channels to use for the DMA transfers, and
a given number of request lines.

Requests and channels are pretty much orthogonal. Channels can be used
to serve several requests. To simplify, channels are the entities that
will be doing the copy, and requests define which endpoints are
involved.

The request lines actually correspond to physical lines going from the
DMA-eligible devices to the controller itself. Whenever the device
wants to start a transfer, it asserts a DMA request (DRQ) by driving
that request line.

A very simple DMA controller would only take into account a single
parameter: the transfer size. At each clock cycle, it would transfer a
byte of data from one buffer to another, until the transfer size has
been reached.

That wouldn't work well in the real world, since slave devices might
require a specific number of bits to be transferred in a single
cycle. For example, we may want to transfer as much data as the
physical bus allows to maximize performance when doing a simple
memory copy operation, but our audio device could have a narrower FIFO
that requires data to be written exactly 16 or 24 bits at a time. This
is why most if not all of the DMA controllers can adjust this, using a
parameter called the transfer width.

Moreover, some DMA controllers, whenever the RAM is used as a source
or destination, can group the reads or writes in memory into a buffer,
so instead of having a lot of small memory accesses, which is not
really efficient, you'll get several bigger transfers. This is done
using a parameter called the burst size, which defines how many single
reads/writes the controller is allowed to perform without splitting
the transfer into smaller sub-transfers.

Our theoretical DMA controller would then only be able to do transfers
that involve a single contiguous block of data. However, some of the
transfers we usually have are not, and we may want to copy data from
non-contiguous buffers to a contiguous buffer, an operation called
scatter-gather.

DMAEngine, at least for mem2dev transfers, requires support for
scatter-gather. So we're left with two cases here: either we have a
quite simple DMA controller that doesn't support it, and we'll have to
implement it in software, or we have a more advanced DMA controller
that implements scatter-gather in hardware.

The latter are usually programmed using a collection of chunks to
transfer, and whenever the transfer is started, the controller will go
over that collection, doing whatever we programmed there.

This collection is usually either a table or a linked list. You will
then push either the address of the table and its number of elements,
or the first item of the list to one channel of the DMA controller,
and whenever a DRQ is asserted, the controller will go through the
collection to know where to fetch the data from.

Either way, the format of this collection is completely dependent on
your hardware. Each DMA controller will require a different structure,
but all of them will require, for every chunk, at least the source and
destination addresses, whether it should increment these addresses or
not, and the three parameters we saw earlier: the burst size, the
transfer width and the transfer size.

One last thing: usually, slave devices won't issue a DRQ by default,
and you have to enable this in your slave device driver first
whenever you want to use DMA.

These were just the general memory-to-memory (also called mem2mem) or
memory-to-device (mem2dev) kinds of transfers. Most devices also
support other kinds of transfers or memory operations that dmaengine
supports; they will be detailed later in this document.

DMA Support in Linux
++++++++++++++++++++

Historically, DMA controller drivers have been implemented using the
async TX API, to offload operations such as memory copy, XOR,
cryptography, etc., basically any memory to memory operation.

Over time, the need for memory to device transfers arose, and
dmaengine was extended. Nowadays, the async TX API is written as a
layer on top of dmaengine, and acts as a client. Still, dmaengine
accommodates that API in some cases, and made some design choices to
ensure that it stayed compatible.

For more information on the Async TX API, please refer to the
relevant documentation file, Documentation/crypto/async-tx-api.txt.

DMAEngine Registration
++++++++++++++++++++++

struct dma_device Initialization
--------------------------------

Just like any other kernel framework, the whole DMAEngine registration
relies on the driver filling a structure and registering against the
framework. In our case, that structure is dma_device.

The first thing you need to do in your driver is to allocate this
structure. Any of the usual memory allocators will do, but you'll also
need to initialize a few fields in there:

  * channels: should be initialized as a list, using the
    INIT_LIST_HEAD macro for example

  * dev: should hold the pointer to the struct device associated
    with your current driver instance.

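As a sketch, the allocation and those two fields might look like this
in a probe function. This is a kernel-style fragment, not a complete
driver; my_dma_probe and the choice of devm_kzalloc are illustrative
assumptions:

```c
static int my_dma_probe(struct platform_device *pdev)
{
	struct dma_device *dd;

	/* Any usual allocator will do; devm_kzalloc ties the
	 * allocation's lifetime to the device. */
	dd = devm_kzalloc(&pdev->dev, sizeof(*dd), GFP_KERNEL);
	if (!dd)
		return -ENOMEM;

	/* The two fields mentioned above. */
	INIT_LIST_HEAD(&dd->channels);
	dd->dev = &pdev->dev;

	/* ... capabilities and callbacks are set up next ... */
	return 0;
}
```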
Supported transaction types
---------------------------

The next thing you need is to set which transaction types your device
(and driver) supports.

Our dma_device structure has a field called cap_mask that holds the
various types of transactions supported, and you need to modify this
mask using the dma_cap_set function, with the flags for the various
transaction types you support as an argument.

All those capabilities are defined in the dma_transaction_type enum,
in include/linux/dmaengine.h
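For instance, a controller that supports slave, cyclic and plain memcpy
transfers would set its mask like this (dd being the driver's struct
dma_device; the exact set of flags is just an example):

```c
dma_cap_set(DMA_SLAVE,  dd->cap_mask);
dma_cap_set(DMA_CYCLIC, dd->cap_mask);
dma_cap_set(DMA_MEMCPY, dd->cap_mask);
```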

Currently, the types available are:
  * DMA_MEMCPY
    - The device is able to do memory to memory copies

  * DMA_XOR
    - The device is able to perform XOR operations on memory areas
    - Used to accelerate XOR intensive tasks, such as RAID5

  * DMA_XOR_VAL
    - The device is able to perform parity check using the XOR
      algorithm against a memory buffer.

  * DMA_PQ
    - The device is able to perform RAID6 P+Q computations, P being a
      simple XOR, and Q being a Reed-Solomon algorithm.

  * DMA_PQ_VAL
    - The device is able to perform parity check using the RAID6 P+Q
      algorithm against a memory buffer.

  * DMA_INTERRUPT
    - The device is able to trigger a dummy transfer that will
      generate periodic interrupts
    - Used by the client drivers to register a callback that will be
      called on a regular basis through the DMA controller interrupt

  * DMA_SG
    - The device supports memory to memory scatter-gather
      transfers.
    - Even though a plain memcpy can look like a particular case of a
      scatter-gather transfer, with a single chunk to transfer, it's a
      distinct transaction type in the mem2mem transfers case

  * DMA_PRIVATE
    - The device only supports slave transfers, and as such isn't
      available for async transfers.

  * DMA_ASYNC_TX
    - Must not be set by the device, and will be set by the framework
      if needed
    - /* TODO: What is it about? */

  * DMA_SLAVE
    - The device can handle device to memory transfers, including
      scatter-gather transfers.
    - While in the mem2mem case we had two distinct types to deal
      with a single chunk to copy or a collection of them, here, we
      just have a single transaction type that is supposed to handle
      both.
    - If you want to transfer a single contiguous memory buffer,
      simply build a scatter list with only one item.

  * DMA_CYCLIC
    - The device can handle cyclic transfers.
    - A cyclic transfer is a transfer where the chunk collection will
      loop over itself, with the last item pointing to the first.
    - It's usually used for audio transfers, where you want to operate
      on a single ring buffer that you will fill with your audio data.

  * DMA_INTERLEAVE
    - The device supports interleaved transfers.
    - These transfers can transfer data from a non-contiguous buffer
      to a non-contiguous buffer, as opposed to DMA_SLAVE, which can
      transfer data from a non-contiguous data set to a contiguous
      destination buffer.
    - It's usually used for 2D content transfers, in which case you
      want to transfer a portion of uncompressed data directly to the
      display.

These various types will also affect how the source and destination
addresses change over time.

Addresses pointing to RAM are typically incremented (or decremented)
after each transfer. In case of a ring buffer, they may loop
(DMA_CYCLIC). Addresses pointing to a device's register (e.g. a FIFO)
are typically fixed.

Device operations
-----------------

Our dma_device structure also requires a few function pointers in
order to implement the actual logic, now that we have described the
operations we are able to perform.

The functions that we have to fill in there, and hence have to
implement, obviously depend on the transaction types you reported as
supported.

  * device_alloc_chan_resources
  * device_free_chan_resources
    - These functions will be called whenever a driver calls
      dma_request_channel or dma_release_channel for the first/last
      time on the channel associated with that driver.
    - They are in charge of allocating/freeing all the needed
      resources in order for that channel to be useful for your
      driver.
    - These functions can sleep.

  * device_prep_dma_*
    - These functions match the capabilities you registered
      previously.
    - These functions all take the buffer or the scatterlist relevant
      for the transfer being prepared, and should create a hardware
      descriptor or a list of hardware descriptors from it
    - These functions can be called from an interrupt context
    - Any allocation you might do should be using the GFP_NOWAIT
      flag, in order not to potentially sleep, but without depleting
      the emergency pool either.
    - Drivers should try to pre-allocate any memory they might need
      during the transfer setup at probe time to avoid putting too
      much pressure on the nowait allocator.

    - They should return a unique instance of the
      dma_async_tx_descriptor structure, that further represents this
      particular transfer.

    - This structure can be initialized using the function
      dma_async_tx_descriptor_init.
    - You'll also need to set two fields in this structure:
      + flags:
        TODO: Can it be modified by the driver itself, or
        should it always be the flags passed in the arguments

      + tx_submit: A pointer to a function you have to implement,
        that is supposed to push the current transaction descriptor
        to a pending queue, waiting for issue_pending to be called.

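A rough shape for such a callback, using device_prep_slave_sg as the
example. This is a kernel-style sketch, not a working driver:
my_prep_slave_sg, my_desc and my_tx_submit are made-up names, and the
scatterlist-to-hardware-descriptor translation is elided:

```c
static struct dma_async_tx_descriptor *
my_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
		 unsigned int sg_len, enum dma_transfer_direction dir,
		 unsigned long flags, void *context)
{
	struct my_desc *desc;

	/* May run in interrupt context: GFP_NOWAIT, never GFP_KERNEL */
	desc = kzalloc(sizeof(*desc), GFP_NOWAIT);
	if (!desc)
		return NULL;

	/* ... translate each scatterlist entry into a hardware
	 * descriptor chunk here ... */

	dma_async_tx_descriptor_init(&desc->tx, chan);
	desc->tx.flags = flags;
	desc->tx.tx_submit = my_tx_submit; /* queues the descriptor,
					    * waiting for issue_pending */

	return &desc->tx;
}
```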
  * device_issue_pending
    - Takes the first transaction descriptor in the pending queue,
      and starts the transfer. Whenever that transfer is done, it
      should move to the next transaction in the list.
    - This function can be called in an interrupt context

  * device_tx_status
    - Should report the number of bytes left to transfer on the
      given channel
    - Should only care about the transaction descriptor passed as
      argument, not the currently active one on a given channel
    - The tx_state argument might be NULL
    - Should use dma_set_residue to report it
    - In the case of a cyclic transfer, it should only take into
      account the current period.
    - This function can be called in an interrupt context.

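One common shape for this callback uses the dma_cookie_status and
dma_set_residue helpers; this is a sketch, with my_chan_residue being
a made-up driver helper that would read the remaining byte count for
that descriptor from hardware or software state:

```c
static enum dma_status my_tx_status(struct dma_chan *chan,
				    dma_cookie_t cookie,
				    struct dma_tx_state *txstate)
{
	enum dma_status ret;

	ret = dma_cookie_status(chan, cookie, txstate);
	if (ret == DMA_COMPLETE || !txstate)
		return ret;

	/* Hypothetical helper: bytes left for this descriptor. */
	dma_set_residue(txstate, my_chan_residue(chan, cookie));

	return ret;
}
```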
  * device_control
    - Used by client drivers to control and configure the channel
      they have a handle on.
    - Called with a command and an argument
      + The command is one of the values listed by the enum
        dma_ctrl_cmd. The valid commands are:
        + DMA_PAUSE
          + Pauses a transfer on the channel
          + This command should operate synchronously on the channel,
            pausing right away the work of the given channel
        + DMA_RESUME
          + Restarts a transfer on the channel
          + This command should operate synchronously on the channel,
            resuming right away the work of the given channel
        + DMA_TERMINATE_ALL
          + Aborts all the pending and ongoing transfers on the
            channel
          + This command should operate synchronously on the channel,
            terminating right away all the work on the given channel
        + DMA_SLAVE_CONFIG
          + Reconfigures the channel with the passed configuration
          + This command should NOT apply synchronously, nor to any
            currently queued transfers, but only to subsequent ones
          + In this case, the function will receive a
            dma_slave_config structure pointer as an argument, that
            will detail which configuration to use.
          + Even though that structure contains a direction field,
            this field is deprecated in favor of the direction
            argument given to the prep_* functions
        + FSLDMA_EXTERNAL_START
          + TODO: Why does that even exist?
      + The argument is an opaque unsigned long. This actually is a
        pointer to a struct dma_slave_config that should be used only
        in the DMA_SLAVE_CONFIG case.

  * device_slave_caps
    - Called through the framework by client drivers in order to have
      an idea of the properties of the channel allocated to them.
    - Such properties are the buswidth, available directions, etc.
    - Required for every generic layer doing DMA transfers, such as
      ASoC.

Misc notes (stuff that should be documented, but don't really know
where to put them)
------------------------------------------------------------------

  * dma_run_dependencies
    - Should be called at the end of an async TX transfer, and can be
      ignored in the slave transfers case.
    - Makes sure that dependent operations are run before marking it
      as complete.

  * dma_cookie_t
    - It's a DMA transaction ID that will increment over time.
    - Not really relevant any more since the introduction of virt-dma,
      which abstracts it away.

  * DMA_CTRL_ACK
    - Undocumented feature
    - No one really has an idea of what it's about, besides being
      related to reusing the DMA transaction descriptors or having
      additional transactions added to it in the async-tx API
    - Useless in the case of the slave API

General Design Notes
--------------------

Most of the DMAEngine drivers you'll see are based on a similar design
that handles the end-of-transfer interrupts in the handler, but defers
most work to a tasklet, including the start of a new transfer whenever
the previous transfer ended.

This is a rather inefficient design though, because the inter-transfer
latency will be not only the interrupt latency, but also the
scheduling latency of the tasklet, which will leave the channel idle
in between, slowing down the global transfer rate.

You should avoid this kind of practice, and instead of electing a new
transfer in your tasklet, move that part to the interrupt handler in
order to have a shorter idle window (which we can't really avoid
anyway).

Glossary
--------

Burst:    A number of consecutive read or write operations that
          can be queued to buffers before being flushed to memory.
Chunk:    A contiguous collection of bursts
Transfer: A collection of chunks (whether contiguous or not)
