 81 files changed, 6263 insertions(+), 4731 deletions(-)
diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 4ed388356898..f0cc3f772265 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -1,22 +1,24 @@
+=========================
 Dynamic DMA mapping Guide
 =========================
 
-David S. Miller <davem@redhat.com>
-Richard Henderson <rth@cygnus.com>
-Jakub Jelinek <jakub@redhat.com>
+:Author: David S. Miller <davem@redhat.com>
+:Author: Richard Henderson <rth@cygnus.com>
+:Author: Jakub Jelinek <jakub@redhat.com>
 
 This is a guide to device driver writers on how to use the DMA API
 with example pseudo-code.  For a concise description of the API, see
 DMA-API.txt.
 
 CPU and DMA addresses
+=====================
 
 There are several kinds of addresses involved in the DMA API, and it's
 important to understand the differences.
 
 The kernel normally uses virtual addresses.  Any address returned by
 kmalloc(), vmalloc(), and similar interfaces is a virtual address and can
-be stored in a "void *".
+be stored in a ``void *``.
 
 The virtual memory system (TLB, page tables, etc.) translates virtual
 addresses to CPU physical addresses, which are stored as "phys_addr_t" or
@@ -37,7 +39,7 @@ be restricted to a subset of that space.  For example, even if a system
 supports 64-bit addresses for main memory and PCI BARs, it may use an IOMMU
 so devices only need to use 32-bit DMA addresses.
 
-Here's a picture and some examples:
+Here's a picture and some examples::
 
                CPU                  CPU                  Bus
              Virtual              Physical             Address
@@ -98,15 +100,16 @@ microprocessor architecture.  You should use the DMA API rather than the
 bus-specific DMA API, i.e., use the dma_map_*() interfaces rather than the
 pci_map_*() interfaces.
 
-First of all, you should make sure
+First of all, you should make sure::
 
 	#include <linux/dma-mapping.h>
 
 is in your driver, which provides the definition of dma_addr_t.  This type
 can hold any valid DMA address for the platform and should be used
 everywhere you hold a DMA address returned from the DMA mapping functions.
 
 What memory is DMA'able?
+========================
 
 The first piece of information you must know is what kernel memory can
 be used with the DMA mapping facilities.  There has been an unwritten
@@ -143,7 +146,8 @@ What about block I/O and networking buffers?  The block I/O and
 networking subsystems make sure that the buffers they use are valid
 for you to DMA from/to.
 
 DMA addressing limitations
+==========================
 
 Does your device have any DMA addressing limitations?  For example, is
 your device only capable of driving the low order 24-bits of address?
@@ -166,7 +170,7 @@ style to do this even if your device holds the default setting,
 because this shows that you did think about these issues wrt. your
 device.
 
-The query is performed via a call to dma_set_mask_and_coherent():
+The query is performed via a call to dma_set_mask_and_coherent()::
 
 	int dma_set_mask_and_coherent(struct device *dev, u64 mask);
 
@@ -175,12 +179,12 @@ If you have some special requirements, then the following two separate
 queries can be used instead:
 
 The query for streaming mappings is performed via a call to
-dma_set_mask():
+dma_set_mask()::
 
 	int dma_set_mask(struct device *dev, u64 mask);
 
 The query for consistent allocations is performed via a call
-to dma_set_coherent_mask():
+to dma_set_coherent_mask()::
 
 	int dma_set_coherent_mask(struct device *dev, u64 mask);
 
@@ -209,7 +213,7 @@ of your driver reports that performance is bad or that the device is not
 even detected, you can ask them for the kernel messages to find out
 exactly why.
 
-The standard 32-bit addressing device would do something like this:
+The standard 32-bit addressing device would do something like this::
 
 	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
 		dev_warn(dev, "mydev: No suitable DMA available\n");
@@ -225,7 +229,7 @@ than 64-bit addressing.  For example, Sparc64 PCI SAC addressing is
 more efficient than DAC addressing.
 
 Here is how you would handle a 64-bit capable device which can drive
-all 64-bits when accessing streaming DMA:
+all 64-bits when accessing streaming DMA::
 
 	int using_dac;
 
@@ -239,7 +243,7 @@ all 64-bits when accessing streaming DMA:
 	}
 
 If a card is capable of using 64-bit consistent allocations as well,
-the case would look like this:
+the case would look like this::
 
 	int using_dac, consistent_using_dac;
 
@@ -260,7 +264,7 @@ uses consistent allocations, one would have to check the return value from
 dma_set_coherent_mask().
 
 Finally, if your device can only drive the low 24-bits of
-address you might do something like:
+address you might do something like::
 
 	if (dma_set_mask(dev, DMA_BIT_MASK(24))) {
 		dev_warn(dev, "mydev: 24-bit DMA addressing not available\n");
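As an aside to the mask-setting hunks above: DMA_BIT_MASK() is plain bit arithmetic, so its behavior can be checked in userspace. The macro below mirrors the kernel's definition from <linux/dma-mapping.h>; addr_fits_mask() is an illustrative helper, not a kernel interface:

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's DMA_BIT_MASK(); the n == 64 case is special so
 * we never shift a 64-bit value by 64, which would be undefined
 * behaviour in C. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* 1 if every set bit of a DMA address fits under the given mask
 * (illustrative helper, not part of the kernel API). */
static int addr_fits_mask(uint64_t addr, uint64_t mask)
{
	return (addr & ~mask) == 0;
}
```

So a device limited to DMA_BIT_MASK(24) can only reach the first 16MB of bus address space, which is exactly the ISA limitation the text mentions.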
@@ -280,7 +284,7 @@ only provide the functionality which the machine can handle.  It
 is important that the last call to dma_set_mask() be for the
 most specific mask.
 
-Here is pseudo-code showing how this might be done:
+Here is pseudo-code showing how this might be done::
 
 	#define PLAYBACK_ADDRESS_BITS	DMA_BIT_MASK(32)
 	#define RECORD_ADDRESS_BITS	DMA_BIT_MASK(24)
@@ -308,7 +312,8 @@ A sound card was used as an example here because this genre of PCI
 devices seems to be littered with ISA chips given a PCI front end,
 and thus retaining the 16MB DMA addressing limitations of ISA.
 
 Types of DMA mappings
+=====================
 
 There are two types of DMA mappings:
 
@@ -336,12 +341,14 @@ There are two types of DMA mappings:
              to memory is immediately visible to the device, and vice
              versa.  Consistent mappings guarantee this.
 
-             IMPORTANT: Consistent DMA memory does not preclude the usage of
-                        proper memory barriers.  The CPU may reorder stores to
+             .. important::
+
+             Consistent DMA memory does not preclude the usage of
+             proper memory barriers.  The CPU may reorder stores to
              consistent memory just as it may normal memory.  Example:
              if it is important for the device to see the first word
              of a descriptor updated before the second, you must do
-             something like:
+             something like::
 
                 desc->word0 = address;
                 wmb();
@@ -377,16 +384,17 @@ Also, systems with caches that aren't DMA-coherent will work better
 when the underlying buffers don't share cache lines with other data.
 
 
-Using Consistent DMA mappings.
+Using Consistent DMA mappings
+=============================
 
 To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
-you should do:
+you should do::
 
 	dma_addr_t dma_handle;
 
 	cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp);
 
-where device is a struct device *.  This may be called in interrupt
+where device is a ``struct device *``.  This may be called in interrupt
 context with the GFP_ATOMIC flag.
 
 Size is the length of the region you want to allocate, in bytes.
@@ -415,7 +423,7 @@ exists (for example) to guarantee that if you allocate a chunk
 which is smaller than or equal to 64 kilobytes, the extent of the
 buffer you receive will not cross a 64K boundary.
 
-To unmap and free such a DMA region, you call:
+To unmap and free such a DMA region, you call::
 
 	dma_free_coherent(dev, size, cpu_addr, dma_handle);
 
@@ -430,7 +438,7 @@ a kmem_cache, but it uses dma_alloc_coherent(), not __get_free_pages().
 Also, it understands common hardware constraints for alignment,
 like queue heads needing to be aligned on N byte boundaries.
 
-Create a dma_pool like this:
+Create a dma_pool like this::
 
 	struct dma_pool *pool;
 
@@ -444,7 +452,7 @@ pass 0 for boundary; passing 4096 says memory allocated from this pool
 must not cross 4KByte boundaries (but at that time it may be better to
 use dma_alloc_coherent() directly instead).
 
-Allocate memory from a DMA pool like this:
+Allocate memory from a DMA pool like this::
 
 	cpu_addr = dma_pool_alloc(pool, flags, &dma_handle);
 
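The boundary argument discussed in the dma_pool hunks is a pure address-arithmetic constraint: a block must not straddle a power-of-two-aligned window. A userspace sketch of the check (crosses_boundary() is an illustrative name, not part of the dma_pool API):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the rule dma_pool_create()'s 'boundary' argument enforces:
 * a block of 'size' bytes starting at 'addr' must lie entirely inside
 * one 'boundary'-aligned power-of-two window (e.g. 4096).  A block
 * crosses the boundary when its first and last bytes fall in
 * different windows.
 */
static int crosses_boundary(uint64_t addr, uint64_t size, uint64_t boundary)
{
	if (boundary == 0)	/* 0 means "no boundary constraint" */
		return 0;
	return (addr & ~(boundary - 1)) !=
	       ((addr + size - 1) & ~(boundary - 1));
}
```

This is why passing 4096 as boundary with 4096-byte blocks forces every block to be page-aligned, at which point dma_alloc_coherent() may indeed be the simpler choice.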
@@ -452,7 +460,7 @@ flags are GFP_KERNEL if blocking is permitted (not in_interrupt nor
 holding SMP locks), GFP_ATOMIC otherwise.  Like dma_alloc_coherent(),
 this returns two values, cpu_addr and dma_handle.
 
-Free memory that was allocated from a dma_pool like this:
+Free memory that was allocated from a dma_pool like this::
 
 	dma_pool_free(pool, cpu_addr, dma_handle);
 
@@ -460,7 +468,7 @@ where pool is what you passed to dma_pool_alloc(), and cpu_addr and
 dma_handle are the values dma_pool_alloc() returned.  This function
 may be called in interrupt context.
 
-Destroy a dma_pool by calling:
+Destroy a dma_pool by calling::
 
 	dma_pool_destroy(pool);
 
@@ -468,11 +476,12 @@ Make sure you've called dma_pool_free() for all memory allocated
 from a pool before you destroy the pool.  This function may not
 be called in interrupt context.
 
 DMA Direction
+=============
 
 The interfaces described in subsequent portions of this document
 take a DMA direction argument, which is an integer and takes on
-one of the following values:
+one of the following values::
 
 	DMA_BIDIRECTIONAL
 	DMA_TO_DEVICE
@@ -521,14 +530,15 @@ packets, map/unmap them with the DMA_TO_DEVICE direction
 specifier.  For receive packets, just the opposite, map/unmap them
 with the DMA_FROM_DEVICE direction specifier.
 
 Using Streaming DMA mappings
+============================
 
 The streaming DMA mapping routines can be called from interrupt
 context.  There are two versions of each map/unmap, one which will
 map/unmap a single memory region, and one which will map/unmap a
 scatterlist.
 
-To map a single region, you do:
+To map a single region, you do::
 
 	struct device *dev = &my_dev->dev;
 	dma_addr_t dma_handle;
@@ -545,7 +555,7 @@ To map a single region, you do:
 		goto map_error_handling;
 	}
 
-and to unmap it:
+and to unmap it::
 
 	dma_unmap_single(dev, dma_handle, size, direction);
 
@@ -563,7 +573,7 @@ Using CPU pointers like this for single mappings has a disadvantage:
 you cannot reference HIGHMEM memory in this way.  Thus, there is a
 map/unmap interface pair akin to dma_{map,unmap}_single().  These
 interfaces deal with page/offset pairs instead of CPU pointers.
-Specifically:
+Specifically::
 
 	struct device *dev = &my_dev->dev;
 	dma_addr_t dma_handle;
@@ -593,7 +603,7 @@ error as outlined under the dma_map_single() discussion.
 You should call dma_unmap_page() when the DMA activity is finished, e.g.,
 from the interrupt which told you that the DMA transfer is done.
 
-With scatterlists, you map a region gathered from several regions by:
+With scatterlists, you map a region gathered from several regions by::
 
 	int i, count = dma_map_sg(dev, sglist, nents, direction);
 	struct scatterlist *sg;
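The reason dma_map_sg() may return a count smaller than nents (an IOMMU, or plain physical adjacency, lets neighbouring segments merge into one DMA region) can be sketched without kernel code. struct seg and coalesce() below are illustrative stand-ins, not the kernel's scatterlist types:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flattened scatterlist entry -- just enough state to
 * illustrate coalescing, not the kernel's struct scatterlist. */
struct seg {
	uint64_t addr;
	uint64_t len;
};

/*
 * Merge entries whose bus addresses happen to be contiguous, in place,
 * and return the new count.  This models why the value returned by
 * dma_map_sg() can be less than the nents passed in.
 */
static int coalesce(struct seg *sg, int nents)
{
	int count = 0;

	for (int i = 0; i < nents; i++) {
		if (count && sg[count - 1].addr + sg[count - 1].len == sg[i].addr)
			sg[count - 1].len += sg[i].len;	/* extend previous */
		else
			sg[count++] = sg[i];		/* start a new entry */
	}
	return count;
}
```

This is exactly why the driver's completion loop must iterate over the returned count, not the original nents.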
@@ -617,16 +627,18 @@ Then you should loop count times (note: this can be less than nents times)
 and use sg_dma_address() and sg_dma_len() macros where you previously
 accessed sg->address and sg->length as shown above.
 
-To unmap a scatterlist, just call:
+To unmap a scatterlist, just call::
 
 	dma_unmap_sg(dev, sglist, nents, direction);
 
 Again, make sure DMA activity has already finished.
 
-PLEASE NOTE:  The 'nents' argument to the dma_unmap_sg call must be
-	      the _same_ one you passed into the dma_map_sg call,
-	      it should _NOT_ be the 'count' value _returned_ from the
-	      dma_map_sg call.
+.. note::
+
+	The 'nents' argument to the dma_unmap_sg call must be
+	the _same_ one you passed into the dma_map_sg call,
+	it should _NOT_ be the 'count' value _returned_ from the
+	dma_map_sg call.
 
 Every dma_map_{single,sg}() call should have its dma_unmap_{single,sg}()
 counterpart, because the DMA address space is a shared resource and
@@ -638,11 +650,11 @@ properly in order for the CPU and device to see the most up-to-date and
 correct copy of the DMA buffer.
 
 So, firstly, just map it with dma_map_{single,sg}(), and after each DMA
-transfer call either:
+transfer call either::
 
 	dma_sync_single_for_cpu(dev, dma_handle, size, direction);
 
-or:
+or::
 
 	dma_sync_sg_for_cpu(dev, sglist, nents, direction);
 
@@ -650,17 +662,19 @@ as appropriate.
 
 Then, if you wish to let the device get at the DMA area again,
 finish accessing the data with the CPU, and then before actually
-giving the buffer to the hardware call either:
+giving the buffer to the hardware call either::
 
 	dma_sync_single_for_device(dev, dma_handle, size, direction);
 
-or:
+or::
 
 	dma_sync_sg_for_device(dev, sglist, nents, direction);
 
 as appropriate.
 
-PLEASE NOTE:  The 'nents' argument to dma_sync_sg_for_cpu() and
+.. note::
+
+	      The 'nents' argument to dma_sync_sg_for_cpu() and
 	      dma_sync_sg_for_device() must be the same passed to
 	      dma_map_sg().  It is _NOT_ the count returned by
 	      dma_map_sg().
@@ -671,7 +685,7 @@ dma_map_*() call till dma_unmap_*(), then you don't have to call the
 dma_sync_*() routines at all.
 
 Here is pseudo code which shows a situation in which you would need
-to use the dma_sync_*() interfaces.
+to use the dma_sync_*() interfaces::
 
 	my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
 	{
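The sync rules in the hunks above amount to a small ownership protocol: the device owns a streaming buffer from map/sync_for_device until sync_for_cpu/unmap, and the CPU may only touch the data while it owns the buffer. A toy userspace model of that rule follows; the kernel keeps no such explicit state variable, and every name here is illustrative:

```c
#include <assert.h>

/* Toy model of streaming-DMA buffer ownership.  Not kernel API:
 * just an encoding of the rule the surrounding text describes. */
enum owner { OWNER_CPU, OWNER_DEVICE };

struct buf_state {
	enum owner owner;
};

/* dma_map_{single,sg}() hands the buffer to the device... */
static void model_map(struct buf_state *b)             { b->owner = OWNER_DEVICE; }
/* ...dma_sync_*_for_cpu() hands it back to the CPU... */
static void model_sync_for_cpu(struct buf_state *b)    { b->owner = OWNER_CPU; }
/* ...and dma_sync_*_for_device() gives it to the device again. */
static void model_sync_for_device(struct buf_state *b) { b->owner = OWNER_DEVICE; }

/* The CPU may only touch the data while it owns the buffer. */
static int cpu_may_access(const struct buf_state *b)
{
	return b->owner == OWNER_CPU;
}
```

Drivers that transfer in one direction between map and unmap never change ownership mid-flight, which is why they can skip the dma_sync_*() calls entirely, as the text notes.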
@@ -747,7 +761,8 @@ is planned to completely remove virt_to_bus() and bus_to_virt() as
 they are entirely deprecated.  Some ports already do not provide these
 as it is impossible to correctly support them.
 
 Handling Errors
+===============
 
 DMA address space is limited on some architectures and an allocation
 failure can be determined by:
@@ -755,7 +770,7 @@ failure can be determined by:
 - checking if dma_alloc_coherent() returns NULL or dma_map_sg returns 0
 
 - checking the dma_addr_t returned from dma_map_single() and dma_map_page()
-  by using dma_mapping_error():
+  by using dma_mapping_error()::
 
 	dma_addr_t dma_handle;
 
@@ -773,7 +788,8 @@ of a multiple page mapping attempt.  These example are applicable to
 of a multiple page mapping attempt.  These example are applicable to
 dma_map_page() as well.
 
-Example 1:
+Example 1::
+
 	dma_addr_t dma_handle1;
 	dma_addr_t dma_handle2;
 
@@ -802,8 +818,12 @@ Example 1:
 	dma_unmap_single(dma_handle1);
 map_error_handling1:
 
-Example 2: (if buffers are allocated in a loop, unmap all mapped buffers when
-	    mapping error is detected in the middle)
+Example 2::
+
+	/*
+	 * if buffers are allocated in a loop, unmap all mapped buffers when
+	 * mapping error is detected in the middle
+	 */
 
 	dma_addr_t dma_addr;
 	dma_addr_t array[DMA_BUFFERS];
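Example 2's unwind-on-failure loop can be exercised in userspace with stub map/unmap functions. Everything below (fake_map(), the forced failure at index 3, the live_mappings counter) is illustrative scaffolding, not kernel API:

```c
#include <assert.h>
#include <stdint.h>

#define DMA_BUFFERS		8
#define FAKE_MAPPING_ERROR	UINT64_MAX

/* Stubs standing in for dma_map_single()/dma_unmap_single(); the
 * fourth mapping is forced to fail so the unwind path runs. */
static int live_mappings;

static uint64_t fake_map(int i)
{
	if (i == 3)
		return FAKE_MAPPING_ERROR;
	live_mappings++;
	return 0x1000u * (uint64_t)(i + 1);
}

static void fake_unmap(uint64_t handle)
{
	(void)handle;		/* a real unmap would use the handle */
	live_mappings--;
}

/* Mirrors Example 2: on failure, unmap everything mapped so far. */
static int map_all(uint64_t *array)
{
	int i, save_index;

	for (i = 0; i < DMA_BUFFERS; i++) {
		array[i] = fake_map(i);
		if (array[i] == FAKE_MAPPING_ERROR)
			goto unwind;
	}
	return 0;

unwind:
	for (save_index = 0; save_index < i; save_index++)
		fake_unmap(array[save_index]);
	return -1;
}
```

The invariant worth testing is the one the documentation is really after: after a failed batch, no mappings remain outstanding.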
@@ -846,7 +866,8 @@ SCSI drivers must return SCSI_MLQUEUE_HOST_BUSY if the DMA mapping
 fails in the queuecommand hook.  This means that the SCSI subsystem
 passes the command to the driver again later.
 
 Optimizing Unmap State Space Consumption
+========================================
 
 On many platforms, dma_unmap_{single,page}() is simply a nop.
 Therefore, keeping track of the mapping address and length is a waste
@@ -858,7 +879,7 @@ Actually, instead of describing the macros one by one, we'll
 transform some example code.
 
 1) Use DEFINE_DMA_UNMAP_{ADDR,LEN} in state saving structures.
-   Example, before:
+   Example, before::
 
 	struct ring_state {
 		struct sk_buff *skb;
@@ -866,7 +887,7 @@ transform some example code.
 		__u32 len;
 	};
 
-   after:
+   after::
 
 	struct ring_state {
 		struct sk_buff *skb;
@@ -875,23 +896,23 @@ transform some example code.
 	};
 
 2) Use dma_unmap_{addr,len}_set() to set these values.
-   Example, before:
+   Example, before::
 
 	ringp->mapping = FOO;
 	ringp->len = BAR;
 
-   after:
+   after::
 
 	dma_unmap_addr_set(ringp, mapping, FOO);
 	dma_unmap_len_set(ringp, len, BAR);
 
 3) Use dma_unmap_{addr,len}() to access these values.
-   Example, before:
+   Example, before::
 
 	dma_unmap_single(dev, ringp->mapping, ringp->len,
 			 DMA_FROM_DEVICE);
 
-   after:
+   after::
 
 	dma_unmap_single(dev,
 			 dma_unmap_addr(ringp, mapping),
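The space saving behind these macros comes from their expanding to nothing when the platform's unmap is a nop (CONFIG_NEED_DMA_MAP_STATE unset): the tracking fields simply vanish from the struct. The two structs below show the two expansions side by side in plain userspace C; the names are illustrative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch of the idea behind DEFINE_DMA_UNMAP_{ADDR,LEN}: the same
 * source line yields a field, or nothing, depending on whether the
 * platform needs unmap state.  uint64_t/uint32_t stand in for
 * dma_addr_t/__u32 here.
 */
struct ring_state_tracked {	/* as if CONFIG_NEED_DMA_MAP_STATE=y */
	void *skb;
	uint64_t mapping;	/* DEFINE_DMA_UNMAP_ADDR(mapping); */
	uint32_t len;		/* DEFINE_DMA_UNMAP_LEN(len); */
};

struct ring_state_nop {		/* nop-unmap platform: fields compiled away */
	void *skb;
};
```

Because drivers only ever touch the fields through dma_unmap_{addr,len}() and the _set() variants, the same driver source compiles against either layout, and a ring with thousands of entries shrinks accordingly on nop-unmap platforms.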
@@ -902,7 +923,8 @@ It really should be self-explanatory.  We treat the ADDR and LEN
 separately, because it is possible for an implementation to only
 need the address in order to perform the unmap operation.
 
 Platform Issues
+===============
 
 If you are just writing drivers for Linux and do not maintain
 an architecture port for the kernel, you can safely skip down
@@ -928,12 +950,13 @@ to "Closing".
    alignment constraints (e.g. the alignment constraints about 64-bit
    objects).
 
 Closing
+=======
 
 This document, and the API itself, would not be in its current
 form without the feedback and suggestions from numerous individuals.
 We would like to specifically mention, in no particular order, the
-following people:
+following people::
 
 	Russell King <rmk@arm.linux.org.uk>
 	Leo Dagum <dagum@barrel.engr.sgi.com>
diff --git a/Documentation/DMA-API.txt b/Documentation/DMA-API.txt
index 71200dfa0922..45b29326d719 100644
--- a/Documentation/DMA-API.txt
+++ b/Documentation/DMA-API.txt
@@ -1,7 +1,8 @@
+============================================
 Dynamic DMA mapping using the generic device
 ============================================
 
-James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
+:Author: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
 
 This document describes the DMA API.  For a more gentle introduction
 of the API (and actual examples), see Documentation/DMA-API-HOWTO.txt.
@@ -12,10 +13,10 @@ machines.  Unless you know that your driver absolutely has to support
 non-consistent platforms (this is usually only legacy platforms) you
 should only use the API described in part I.
 
-Part I - dma_ API
--------------------------------------
+Part I - dma_API
+----------------
 
-To get the dma_ API, you must #include <linux/dma-mapping.h>.  This
+To get the dma_API, you must #include <linux/dma-mapping.h>.  This
 provides dma_addr_t and the interfaces described below.
 
 A dma_addr_t can hold any valid DMA address for the platform.  It can be
@@ -26,9 +27,11 @@ address space and the DMA address space. | |||
26 | Part Ia - Using large DMA-coherent buffers | 27 | Part Ia - Using large DMA-coherent buffers |
27 | ------------------------------------------ | 28 | ------------------------------------------ |
28 | 29 | ||
29 | void * | 30 | :: |
30 | dma_alloc_coherent(struct device *dev, size_t size, | 31 | |
31 | dma_addr_t *dma_handle, gfp_t flag) | 32 | void * |
33 | dma_alloc_coherent(struct device *dev, size_t size, | ||
34 | dma_addr_t *dma_handle, gfp_t flag) | ||
32 | 35 | ||
33 | Consistent memory is memory for which a write by either the device or | 36 | Consistent memory is memory for which a write by either the device or |
34 | the processor can immediately be read by the processor or device | 37 | the processor can immediately be read by the processor or device |
@@ -51,20 +54,24 @@ consolidate your requests for consistent memory as much as possible. | |||
51 | The simplest way to do that is to use the dma_pool calls (see below). | 54 | The simplest way to do that is to use the dma_pool calls (see below). |
52 | 55 | ||
53 | The flag parameter (dma_alloc_coherent() only) allows the caller to | 56 | The flag parameter (dma_alloc_coherent() only) allows the caller to |
54 | specify the GFP_ flags (see kmalloc()) for the allocation (the | 57 | specify the ``GFP_`` flags (see kmalloc()) for the allocation (the |
55 | implementation may choose to ignore flags that affect the location of | 58 | implementation may choose to ignore flags that affect the location of |
56 | the returned memory, like GFP_DMA). | 59 | the returned memory, like GFP_DMA). |
57 | 60 | ||
58 | void * | 61 | :: |
59 | dma_zalloc_coherent(struct device *dev, size_t size, | 62 | |
60 | dma_addr_t *dma_handle, gfp_t flag) | 63 | void * |
64 | dma_zalloc_coherent(struct device *dev, size_t size, | ||
65 | dma_addr_t *dma_handle, gfp_t flag) | ||
61 | 66 | ||
62 | Wraps dma_alloc_coherent() and also zeroes the returned memory if the | 67 | Wraps dma_alloc_coherent() and also zeroes the returned memory if the |
63 | allocation attempt succeeded. | 68 | allocation attempt succeeded. |
64 | 69 | ||
65 | void | 70 | :: |
66 | dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, | 71 | |
67 | dma_addr_t dma_handle) | 72 | void |
73 | dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, | ||
74 | dma_addr_t dma_handle) | ||
68 | 75 | ||
69 | Free a region of consistent memory you previously allocated. dev, | 76 | Free a region of consistent memory you previously allocated. dev, |
70 | size and dma_handle must all be the same as those passed into | 77 | size and dma_handle must all be the same as those passed into |
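The coherent alloc/free pairing described above can be sketched as follows (hedged: ``RING_BYTES`` and the surrounding driver context are hypothetical, not part of the API)::

	/* Sketch only: dev, size, cpu_addr and dma_handle must match
	 * between the alloc and free calls, as the text requires. */
	dma_addr_t ring_dma;
	void *ring;

	ring = dma_alloc_coherent(dev, RING_BYTES, &ring_dma, GFP_KERNEL);
	if (!ring)
		return -ENOMEM;
	/* ... program ring_dma into the device, access ring from the CPU ... */
	dma_free_coherent(dev, RING_BYTES, ring, ring_dma);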
@@ -78,7 +85,7 @@ may only be called with IRQs enabled. | |||
78 | Part Ib - Using small DMA-coherent buffers | 85 | Part Ib - Using small DMA-coherent buffers |
79 | ------------------------------------------ | 86 | ------------------------------------------ |
80 | 87 | ||
81 | To get this part of the dma_ API, you must #include <linux/dmapool.h> | 88 | To get this part of the dma_API, you must #include <linux/dmapool.h> |
82 | 89 | ||
83 | Many drivers need lots of small DMA-coherent memory regions for DMA | 90 | Many drivers need lots of small DMA-coherent memory regions for DMA |
84 | descriptors or I/O buffers. Rather than allocating in units of a page | 91 | descriptors or I/O buffers. Rather than allocating in units of a page |
@@ -88,6 +95,8 @@ not __get_free_pages(). Also, they understand common hardware constraints | |||
88 | for alignment, like queue heads needing to be aligned on N-byte boundaries. | 95 | for alignment, like queue heads needing to be aligned on N-byte boundaries. |
89 | 96 | ||
90 | 97 | ||
98 | :: | ||
99 | |||
91 | struct dma_pool * | 100 | struct dma_pool * |
92 | dma_pool_create(const char *name, struct device *dev, | 101 | dma_pool_create(const char *name, struct device *dev, |
93 | size_t size, size_t align, size_t alloc); | 102 | size_t size, size_t align, size_t alloc); |
@@ -103,16 +112,21 @@ in bytes, and must be a power of two). If your device has no boundary | |||
103 | crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated | 112 | crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated |
104 | from this pool must not cross 4KByte boundaries. | 113 | from this pool must not cross 4KByte boundaries. |
105 | 114 | ||
115 | :: | ||
106 | 116 | ||
107 | void *dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags, | 117 | void * |
108 | dma_addr_t *handle) | 118 | dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags, |
119 | dma_addr_t *handle) | ||
109 | 120 | ||
110 | Wraps dma_pool_alloc() and also zeroes the returned memory if the | 121 | Wraps dma_pool_alloc() and also zeroes the returned memory if the |
111 | allocation attempt succeeded. | 122 | allocation attempt succeeded. |
112 | 123 | ||
113 | 124 | ||
114 | void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags, | 125 | :: |
115 | dma_addr_t *dma_handle); | 126 | |
127 | void * | ||
128 | dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags, | ||
129 | dma_addr_t *dma_handle); | ||
116 | 130 | ||
117 | This allocates memory from the pool; the returned memory will meet the | 131 | This allocates memory from the pool; the returned memory will meet the |
118 | size and alignment requirements specified at creation time. Pass | 132 | size and alignment requirements specified at creation time. Pass |
@@ -122,16 +136,20 @@ blocking. Like dma_alloc_coherent(), this returns two values: an | |||
122 | address usable by the CPU, and the DMA address usable by the pool's | 136 | address usable by the CPU, and the DMA address usable by the pool's |
123 | device. | 137 | device. |
124 | 138 | ||
139 | :: | ||
125 | 140 | ||
126 | void dma_pool_free(struct dma_pool *pool, void *vaddr, | 141 | void |
127 | dma_addr_t addr); | 142 | dma_pool_free(struct dma_pool *pool, void *vaddr, |
143 | dma_addr_t addr); | ||
128 | 144 | ||
129 | This puts memory back into the pool. The pool is what was passed to | 145 | This puts memory back into the pool. The pool is what was passed to |
130 | dma_pool_alloc(); the CPU (vaddr) and DMA addresses are what | 146 | dma_pool_alloc(); the CPU (vaddr) and DMA addresses are what |
131 | were returned when that routine allocated the memory being freed. | 147 | were returned when that routine allocated the memory being freed. |
132 | 148 | ||
149 | :: | ||
133 | 150 | ||
134 | void dma_pool_destroy(struct dma_pool *pool); | 151 | void |
152 | dma_pool_destroy(struct dma_pool *pool); | ||
135 | 153 | ||
136 | dma_pool_destroy() frees the resources of the pool. It must be | 154 | dma_pool_destroy() frees the resources of the pool. It must be |
137 | called in a context which can sleep. Make sure you've freed all allocated | 155 | called in a context which can sleep. Make sure you've freed all allocated |
@@ -141,32 +159,40 @@ memory back to the pool before you destroy it. | |||
141 | Part Ic - DMA addressing limitations | 159 | Part Ic - DMA addressing limitations |
142 | ------------------------------------ | 160 | ------------------------------------ |
143 | 161 | ||
144 | int | 162 | :: |
145 | dma_set_mask_and_coherent(struct device *dev, u64 mask) | 163 | |
164 | int | ||
165 | dma_set_mask_and_coherent(struct device *dev, u64 mask) | ||
146 | 166 | ||
147 | Checks to see if the mask is possible and updates the device | 167 | Checks to see if the mask is possible and updates the device |
148 | streaming and coherent DMA mask parameters if it is. | 168 | streaming and coherent DMA mask parameters if it is. |
149 | 169 | ||
150 | Returns: 0 if successful and a negative error if not. | 170 | Returns: 0 if successful and a negative error if not. |
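In practice this is usually called at probe time with a fallback, along these lines (a common pattern, sketched under the assumption that the device prefers 64-bit addressing)::

	if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64)) &&
	    dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)))
		return -EIO;	/* no usable DMA addressing */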
151 | 171 | ||
152 | int | 172 | :: |
153 | dma_set_mask(struct device *dev, u64 mask) | 173 | |
174 | int | ||
175 | dma_set_mask(struct device *dev, u64 mask) | ||
154 | 176 | ||
155 | Checks to see if the mask is possible and updates the device | 177 | Checks to see if the mask is possible and updates the device |
156 | parameters if it is. | 178 | parameters if it is. |
157 | 179 | ||
158 | Returns: 0 if successful and a negative error if not. | 180 | Returns: 0 if successful and a negative error if not. |
159 | 181 | ||
160 | int | 182 | :: |
161 | dma_set_coherent_mask(struct device *dev, u64 mask) | 183 | |
184 | int | ||
185 | dma_set_coherent_mask(struct device *dev, u64 mask) | ||
162 | 186 | ||
163 | Checks to see if the mask is possible and updates the device | 187 | Checks to see if the mask is possible and updates the device |
164 | parameters if it is. | 188 | parameters if it is. |
165 | 189 | ||
166 | Returns: 0 if successful and a negative error if not. | 190 | Returns: 0 if successful and a negative error if not. |
167 | 191 | ||
168 | u64 | 192 | :: |
169 | dma_get_required_mask(struct device *dev) | 193 | |
194 | u64 | ||
195 | dma_get_required_mask(struct device *dev) | ||
170 | 196 | ||
171 | This API returns the mask that the platform requires to | 197 | This API returns the mask that the platform requires to |
172 | operate efficiently. Usually this means the returned mask | 198 | operate efficiently. Usually this means the returned mask |
@@ -182,94 +208,107 @@ call to set the mask to the value returned. | |||
182 | Part Id - Streaming DMA mappings | 208 | Part Id - Streaming DMA mappings |
183 | -------------------------------- | 209 | -------------------------------- |
184 | 210 | ||
185 | dma_addr_t | 211 | :: |
186 | dma_map_single(struct device *dev, void *cpu_addr, size_t size, | 212 | |
187 | enum dma_data_direction direction) | 213 | dma_addr_t |
214 | dma_map_single(struct device *dev, void *cpu_addr, size_t size, | ||
215 | enum dma_data_direction direction) | ||
188 | 216 | ||
189 | Maps a piece of processor virtual memory so it can be accessed by the | 217 | Maps a piece of processor virtual memory so it can be accessed by the |
190 | device and returns the DMA address of the memory. | 218 | device and returns the DMA address of the memory. |
191 | 219 | ||
192 | The direction for both APIs may be converted freely by casting. | 220 | The direction for both APIs may be converted freely by casting. |
193 | However the dma_ API uses a strongly typed enumerator for its | 221 | However the dma_API uses a strongly typed enumerator for its |
194 | direction: | 222 | direction: |
195 | 223 | ||
224 | ======================= ============================================= | ||
196 | DMA_NONE no direction (used for debugging) | 225 | DMA_NONE no direction (used for debugging) |
197 | DMA_TO_DEVICE data is going from the memory to the device | 226 | DMA_TO_DEVICE data is going from the memory to the device |
198 | DMA_FROM_DEVICE data is coming from the device to the memory | 227 | DMA_FROM_DEVICE data is coming from the device to the memory |
199 | DMA_BIDIRECTIONAL direction isn't known | 228 | DMA_BIDIRECTIONAL direction isn't known |
229 | ======================= ============================================= | ||
230 | |||
231 | .. note:: | ||
232 | |||
233 | Not all memory regions in a machine can be mapped by this API. | ||
234 | Further, contiguous kernel virtual space may not be contiguous as | ||
235 | physical memory. Since this API does not provide any scatter/gather | ||
236 | capability, it will fail if the user tries to map a non-physically | ||
237 | contiguous piece of memory. For this reason, memory to be mapped by | ||
238 | this API should be obtained from sources which guarantee it to be | ||
239 | physically contiguous (like kmalloc). | ||
240 | |||
241 | Further, the DMA address of the memory must be within the | ||
242 | dma_mask of the device (the dma_mask is a bit mask of the | ||
243 | addressable region for the device, i.e., if the DMA address of | ||
244 | the memory ANDed with the dma_mask is still equal to the DMA | ||
245 | address, then the device can perform DMA to the memory). To | ||
246 | ensure that the memory allocated by kmalloc is within the dma_mask, | ||
247 | the driver may specify various platform-dependent flags to restrict | ||
248 | the DMA address range of the allocation (e.g., on x86, GFP_DMA | ||
249 | guarantees to be within the first 16MB of available DMA addresses, | ||
250 | as required by ISA devices). | ||
251 | |||
252 | Note also that the above constraints on physical contiguity and | ||
253 | dma_mask may not apply if the platform has an IOMMU (a device which | ||
254 | maps an I/O DMA address to a physical memory address). However, to be | ||
255 | portable, device driver writers may *not* assume that such an IOMMU | ||
256 | exists. | ||
257 | |||
258 | .. warning:: | ||
259 | |||
260 | Memory coherency operates at a granularity called the cache | ||
261 | line width. In order for memory mapped by this API to operate | ||
262 | correctly, the mapped region must begin exactly on a cache line | ||
263 | boundary and end exactly on one (to prevent two separately mapped | ||
264 | regions from sharing a single cache line). Since the cache line size | ||
265 | may not be known at compile time, the API will not enforce this | ||
266 | requirement. Therefore, it is recommended that driver writers who | ||
267 | don't take special care to determine the cache line size at run time | ||
268 | only map virtual regions that begin and end on page boundaries (which | ||
269 | are guaranteed also to be cache line boundaries). | ||
270 | |||
271 | DMA_TO_DEVICE synchronisation must be done after the last modification | ||
272 | of the memory region by the software and before it is handed off to | ||
273 | the device. Once this primitive is used, memory covered by this | ||
274 | primitive should be treated as read-only by the device. If the device | ||
275 | may write to it at any point, it should be DMA_BIDIRECTIONAL (see | ||
276 | below). | ||
277 | |||
278 | DMA_FROM_DEVICE synchronisation must be done before the driver | ||
279 | accesses data that may be changed by the device. This memory should | ||
280 | be treated as read-only by the driver. If the driver needs to write | ||
281 | to it at any point, it should be DMA_BIDIRECTIONAL (see below). | ||
282 | |||
283 | DMA_BIDIRECTIONAL requires special handling: it means that the driver | ||
284 | isn't sure if the memory was modified before being handed off to the | ||
285 | device and also isn't sure if the device will also modify it. Thus, | ||
286 | you must always sync bidirectional memory twice: once before the | ||
287 | memory is handed off to the device (to make sure all memory changes | ||
288 | are flushed from the processor) and once before the data may be | ||
289 | accessed after being used by the device (to make sure any processor | ||
290 | cache lines are updated with data that the device may have changed). | ||
291 | |||
292 | :: | ||
200 | 293 | ||
201 | Notes: Not all memory regions in a machine can be mapped by this API. | 294 | void |
202 | Further, contiguous kernel virtual space may not be contiguous as | 295 | dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size, |
203 | physical memory. Since this API does not provide any scatter/gather | 296 | enum dma_data_direction direction) |
204 | capability, it will fail if the user tries to map a non-physically | ||
205 | contiguous piece of memory. For this reason, memory to be mapped by | ||
206 | this API should be obtained from sources which guarantee it to be | ||
207 | physically contiguous (like kmalloc). | ||
208 | |||
209 | Further, the DMA address of the memory must be within the | ||
210 | dma_mask of the device (the dma_mask is a bit mask of the | ||
211 | addressable region for the device, i.e., if the DMA address of | ||
212 | the memory ANDed with the dma_mask is still equal to the DMA | ||
213 | address, then the device can perform DMA to the memory). To | ||
214 | ensure that the memory allocated by kmalloc is within the dma_mask, | ||
215 | the driver may specify various platform-dependent flags to restrict | ||
216 | the DMA address range of the allocation (e.g., on x86, GFP_DMA | ||
217 | guarantees to be within the first 16MB of available DMA addresses, | ||
218 | as required by ISA devices). | ||
219 | |||
220 | Note also that the above constraints on physical contiguity and | ||
221 | dma_mask may not apply if the platform has an IOMMU (a device which | ||
222 | maps an I/O DMA address to a physical memory address). However, to be | ||
223 | portable, device driver writers may *not* assume that such an IOMMU | ||
224 | exists. | ||
225 | |||
226 | Warnings: Memory coherency operates at a granularity called the cache | ||
227 | line width. In order for memory mapped by this API to operate | ||
228 | correctly, the mapped region must begin exactly on a cache line | ||
229 | boundary and end exactly on one (to prevent two separately mapped | ||
230 | regions from sharing a single cache line). Since the cache line size | ||
231 | may not be known at compile time, the API will not enforce this | ||
232 | requirement. Therefore, it is recommended that driver writers who | ||
233 | don't take special care to determine the cache line size at run time | ||
234 | only map virtual regions that begin and end on page boundaries (which | ||
235 | are guaranteed also to be cache line boundaries). | ||
236 | |||
237 | DMA_TO_DEVICE synchronisation must be done after the last modification | ||
238 | of the memory region by the software and before it is handed off to | ||
239 | the device. Once this primitive is used, memory covered by this | ||
240 | primitive should be treated as read-only by the device. If the device | ||
241 | may write to it at any point, it should be DMA_BIDIRECTIONAL (see | ||
242 | below). | ||
243 | |||
244 | DMA_FROM_DEVICE synchronisation must be done before the driver | ||
245 | accesses data that may be changed by the device. This memory should | ||
246 | be treated as read-only by the driver. If the driver needs to write | ||
247 | to it at any point, it should be DMA_BIDIRECTIONAL (see below). | ||
248 | |||
249 | DMA_BIDIRECTIONAL requires special handling: it means that the driver | ||
250 | isn't sure if the memory was modified before being handed off to the | ||
251 | device and also isn't sure if the device will also modify it. Thus, | ||
252 | you must always sync bidirectional memory twice: once before the | ||
253 | memory is handed off to the device (to make sure all memory changes | ||
254 | are flushed from the processor) and once before the data may be | ||
255 | accessed after being used by the device (to make sure any processor | ||
256 | cache lines are updated with data that the device may have changed). | ||
257 | |||
258 | void | ||
259 | dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size, | ||
260 | enum dma_data_direction direction) | ||
261 | 297 | ||
262 | Unmaps the region previously mapped. All the parameters passed in | 298 | Unmaps the region previously mapped. All the parameters passed in |
263 | must be identical to those passed in (and returned) by the mapping | 299 | must be identical to those passed in (and returned) by the mapping |
264 | API. | 300 | API. |
265 | 301 | ||
266 | dma_addr_t | 302 | :: |
267 | dma_map_page(struct device *dev, struct page *page, | 303 | |
268 | unsigned long offset, size_t size, | 304 | dma_addr_t |
269 | enum dma_data_direction direction) | 305 | dma_map_page(struct device *dev, struct page *page, |
270 | void | 306 | unsigned long offset, size_t size, |
271 | dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, | 307 | enum dma_data_direction direction) |
272 | enum dma_data_direction direction) | 308 | |
309 | void | ||
310 | dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size, | ||
311 | enum dma_data_direction direction) | ||
273 | 312 | ||
274 | API for mapping and unmapping for pages. All the notes and warnings | 313 | API for mapping and unmapping for pages. All the notes and warnings |
275 | for the other mapping APIs apply here. Also, although the <offset> | 314 | for the other mapping APIs apply here. Also, although the <offset> |
@@ -277,20 +316,24 @@ and <size> parameters are provided to do partial page mapping, it is | |||
277 | recommended that you never use these unless you really know what the | 316 | recommended that you never use these unless you really know what the |
278 | cache width is. | 317 | cache width is. |
279 | 318 | ||
280 | dma_addr_t | 319 | :: |
281 | dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, | ||
282 | enum dma_data_direction dir, unsigned long attrs) | ||
283 | 320 | ||
284 | void | 321 | dma_addr_t |
285 | dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size, | 322 | dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size, |
286 | enum dma_data_direction dir, unsigned long attrs) | 323 | enum dma_data_direction dir, unsigned long attrs) |
324 | |||
325 | void | ||
326 | dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size, | ||
327 | enum dma_data_direction dir, unsigned long attrs) | ||
287 | 328 | ||
288 | API for mapping and unmapping for MMIO resources. All the notes and | 329 | API for mapping and unmapping for MMIO resources. All the notes and |
289 | warnings for the other mapping APIs apply here. The API should only be | 330 | warnings for the other mapping APIs apply here. The API should only be |
290 | used to map device MMIO resources, mapping of RAM is not permitted. | 331 | used to map device MMIO resources, mapping of RAM is not permitted. |
291 | 332 | ||
292 | int | 333 | :: |
293 | dma_mapping_error(struct device *dev, dma_addr_t dma_addr) | 334 | |
335 | int | ||
336 | dma_mapping_error(struct device *dev, dma_addr_t dma_addr) | ||
294 | 337 | ||
295 | In some circumstances dma_map_single(), dma_map_page() and dma_map_resource() | 338 | In some circumstances dma_map_single(), dma_map_page() and dma_map_resource() |
296 | will fail to create a mapping. A driver can check for these errors by testing | 339 | will fail to create a mapping. A driver can check for these errors by testing |
@@ -298,9 +341,11 @@ the returned DMA address with dma_mapping_error(). A non-zero return value | |||
298 | means the mapping could not be created and the driver should take appropriate | 341 | means the mapping could not be created and the driver should take appropriate |
299 | action (e.g. reduce current DMA mapping usage or delay and try again later). | 342 | action (e.g. reduce current DMA mapping usage or delay and try again later). |
300 | 343 | ||
344 | :: | ||
345 | |||
301 | int | 346 | int |
302 | dma_map_sg(struct device *dev, struct scatterlist *sg, | 347 | dma_map_sg(struct device *dev, struct scatterlist *sg, |
303 | int nents, enum dma_data_direction direction) | 348 | int nents, enum dma_data_direction direction) |
304 | 349 | ||
305 | Returns: the number of DMA address segments mapped (this may be shorter | 350 | Returns: the number of DMA address segments mapped (this may be shorter |
306 | than <nents> passed in if some elements of the scatter/gather list are | 351 | than <nents> passed in if some elements of the scatter/gather list are |
@@ -316,7 +361,7 @@ critical that the driver do something, in the case of a block driver | |||
316 | aborting the request or even oopsing is better than doing nothing and | 361 | aborting the request or even oopsing is better than doing nothing and |
317 | corrupting the filesystem. | 362 | corrupting the filesystem. |
318 | 363 | ||
319 | With scatterlists, you use the resulting mapping like this: | 364 | With scatterlists, you use the resulting mapping like this:: |
320 | 365 | ||
321 | int i, count = dma_map_sg(dev, sglist, nents, direction); | 366 | int i, count = dma_map_sg(dev, sglist, nents, direction); |
322 | struct scatterlist *sg; | 367 | struct scatterlist *sg; |
@@ -337,9 +382,11 @@ Then you should loop count times (note: this can be less than nents times) | |||
337 | and use sg_dma_address() and sg_dma_len() macros where you previously | 382 | and use sg_dma_address() and sg_dma_len() macros where you previously |
338 | accessed sg->address and sg->length as shown above. | 383 | accessed sg->address and sg->length as shown above. |
339 | 384 | ||
385 | :: | ||
386 | |||
340 | void | 387 | void |
341 | dma_unmap_sg(struct device *dev, struct scatterlist *sg, | 388 | dma_unmap_sg(struct device *dev, struct scatterlist *sg, |
342 | int nents, enum dma_data_direction direction) | 389 | int nents, enum dma_data_direction direction) |
343 | 390 | ||
344 | Unmap the previously mapped scatter/gather list. All the parameters | 391 | Unmap the previously mapped scatter/gather list. All the parameters |
345 | must be the same as those passed in to the scatter/gather mapping | 392 | must be the same as those passed in to the scatter/gather mapping |
@@ -348,18 +395,27 @@ API. | |||
348 | Note: <nents> must be the number you passed in, *not* the number of | 395 | Note: <nents> must be the number you passed in, *not* the number of |
349 | DMA address entries returned. | 396 | DMA address entries returned. |
350 | 397 | ||
351 | void | 398 | :: |
352 | dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size, | 399 | |
353 | enum dma_data_direction direction) | 400 | void |
354 | void | 401 | dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, |
355 | dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, size_t size, | 402 | size_t size, |
356 | enum dma_data_direction direction) | 403 | enum dma_data_direction direction) |
357 | void | 404 | |
358 | dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nents, | 405 | void |
359 | enum dma_data_direction direction) | 406 | dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, |
360 | void | 407 | size_t size, |
361 | dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nents, | 408 | enum dma_data_direction direction) |
362 | enum dma_data_direction direction) | 409 | |
410 | void | ||
411 | dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, | ||
412 | int nents, | ||
413 | enum dma_data_direction direction) | ||
414 | |||
415 | void | ||
416 | dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, | ||
417 | int nents, | ||
418 | enum dma_data_direction direction) | ||
363 | 419 | ||
364 | Synchronise a single contiguous or scatter/gather mapping for the CPU | 420 | Synchronise a single contiguous or scatter/gather mapping for the CPU |
365 | and device. With the sync_sg API, all the parameters must be the same | 421 | and device. With the sync_sg API, all the parameters must be the same |
@@ -367,36 +423,41 @@ as those passed into the single mapping API. With the sync_single API, | |||
367 | you can use dma_handle and size parameters that aren't identical to | 423 | you can use dma_handle and size parameters that aren't identical to |
368 | those passed into the single mapping API to do a partial sync. | 424 | those passed into the single mapping API to do a partial sync. |
369 | 425 | ||
370 | Notes: You must do this: | ||
371 | 426 | ||
372 | - Before reading values that have been written by DMA from the device | 427 | .. note:: |
373 | (use the DMA_FROM_DEVICE direction) | 428 | |
374 | - After writing values that will be written to the device using DMA | 429 | You must do this: |
375 | (use the DMA_TO_DEVICE direction) | 430 | |
376 | - before *and* after handing memory to the device if the memory is | 431 | - Before reading values that have been written by DMA from the device |
377 | DMA_BIDIRECTIONAL | 432 | (use the DMA_FROM_DEVICE direction) |
433 | - After writing values that will be written to the device using DMA | ||
434 | (use the DMA_TO_DEVICE direction) | ||
435 | - before *and* after handing memory to the device if the memory is | ||
436 | DMA_BIDIRECTIONAL | ||
378 | 437 | ||
379 | See also dma_map_single(). | 438 | See also dma_map_single(). |
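The bidirectional double-sync rule above can be sketched as follows (assuming an already-mapped DMA_BIDIRECTIONAL buffer at ``dma`` of length ``len``)::

	/* flush CPU writes before handing the buffer to the device */
	dma_sync_single_for_device(dev, dma, len, DMA_BIDIRECTIONAL);
	/* ... start the device and wait for it to finish ... */
	/* discard stale CPU cache lines before reading device results */
	dma_sync_single_for_cpu(dev, dma, len, DMA_BIDIRECTIONAL);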
380 | 439 | ||
381 | dma_addr_t | 440 | :: |
382 | dma_map_single_attrs(struct device *dev, void *cpu_addr, size_t size, | 441 | |
383 | enum dma_data_direction dir, | 442 | dma_addr_t |
384 | unsigned long attrs) | 443 | dma_map_single_attrs(struct device *dev, void *cpu_addr, size_t size, |
444 | enum dma_data_direction dir, | ||
445 | unsigned long attrs) | ||
385 | 446 | ||
386 | void | 447 | void |
387 | dma_unmap_single_attrs(struct device *dev, dma_addr_t dma_addr, | 448 | dma_unmap_single_attrs(struct device *dev, dma_addr_t dma_addr, |
388 | size_t size, enum dma_data_direction dir, | 449 | size_t size, enum dma_data_direction dir, |
389 | unsigned long attrs) | 450 | unsigned long attrs) |
390 | 451 | ||
391 | int | 452 | int |
392 | dma_map_sg_attrs(struct device *dev, struct scatterlist *sgl, | 453 | dma_map_sg_attrs(struct device *dev, struct scatterlist *sgl, |
393 | int nents, enum dma_data_direction dir, | 454 | int nents, enum dma_data_direction dir, |
394 | unsigned long attrs) | 455 | unsigned long attrs) |
395 | 456 | ||
396 | void | 457 | void |
397 | dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sgl, | 458 | dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sgl, |
398 | int nents, enum dma_data_direction dir, | 459 | int nents, enum dma_data_direction dir, |
399 | unsigned long attrs) | 460 | unsigned long attrs) |
400 | 461 | ||
401 | The four functions above are just like the counterpart functions | 462 | The four functions above are just like the counterpart functions |
402 | without the _attrs suffixes, except that they pass an optional | 463 | without the _attrs suffixes, except that they pass an optional |
@@ -410,37 +471,38 @@ is identical to those of the corresponding function | |||
410 | without the _attrs suffix. As a result dma_map_single_attrs() | 471 | without the _attrs suffix. As a result dma_map_single_attrs() |
411 | can generally replace dma_map_single(), etc. | 472 | can generally replace dma_map_single(), etc. |
412 | 473 | ||
413 | As an example of the use of the *_attrs functions, here's how | 474 | As an example of the use of the ``*_attrs`` functions, here's how |
414 | you could pass an attribute DMA_ATTR_FOO when mapping memory | 475 | you could pass an attribute DMA_ATTR_FOO when mapping memory |
415 | for DMA: | 476 | for DMA:: |
416 | 477 | ||
417 | #include <linux/dma-mapping.h> | 478 | #include <linux/dma-mapping.h> |
418 | /* DMA_ATTR_FOO should be defined in linux/dma-mapping.h and | 479 | /* DMA_ATTR_FOO should be defined in linux/dma-mapping.h and |
419 | * documented in Documentation/DMA-attributes.txt */ | 480 | * documented in Documentation/DMA-attributes.txt */ |
420 | ... | 481 | ... |
421 | 482 | ||
422 | unsigned long attr; | 483 | unsigned long attr; |
423 | attr |= DMA_ATTR_FOO; | 484 | attr |= DMA_ATTR_FOO; |
424 | .... | 485 | .... |
425 | n = dma_map_sg_attrs(dev, sg, nents, DMA_TO_DEVICE, attr); | 486 | n = dma_map_sg_attrs(dev, sg, nents, DMA_TO_DEVICE, attr); |
426 | .... | 487 | .... |
427 | 488 | ||
428 | Architectures that care about DMA_ATTR_FOO would check for its | 489 | Architectures that care about DMA_ATTR_FOO would check for its |
429 | presence in their implementations of the mapping and unmapping | 490 | presence in their implementations of the mapping and unmapping |
430 | routines, e.g.: | 491 | routines, e.g.:: |
431 | 492 | ||
432 | void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr, | 493 | void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr, |
433 | size_t size, enum dma_data_direction dir, | 494 | size_t size, enum dma_data_direction dir, |
434 | unsigned long attrs) | 495 | unsigned long attrs) |
435 | { | 496 | { |
436 | .... | 497 | .... |
437 | if (attrs & DMA_ATTR_FOO) | 498 | if (attrs & DMA_ATTR_FOO) |
438 | /* twizzle the frobnozzle */ | 499 | /* twizzle the frobnozzle */ |
439 | .... | 500 | .... |
501 | } | ||
440 | 502 | ||
441 | 503 | ||
442 | Part II - Advanced dma_ usage | 504 | Part II - Advanced dma usage |
443 | ----------------------------- | 505 | ---------------------------- |
444 | 506 | ||
445 | Warning: These pieces of the DMA API should not be used in the | 507 | Warning: These pieces of the DMA API should not be used in the |
446 | majority of cases, since they cater for unlikely corner cases that | 508 | majority of cases, since they cater for unlikely corner cases that |
@@ -450,9 +512,11 @@ If you don't understand how cache line coherency works between a | |||
450 | processor and an I/O device, you should not be using this part of the | 512 | processor and an I/O device, you should not be using this part of the |
451 | API at all. | 513 | API at all. |
452 | 514 | ||
453 | void * | 515 | :: |
454 | dma_alloc_noncoherent(struct device *dev, size_t size, | 516 | |
455 | dma_addr_t *dma_handle, gfp_t flag) | 517 | void * |
518 | dma_alloc_noncoherent(struct device *dev, size_t size, | ||
519 | dma_addr_t *dma_handle, gfp_t flag) | ||
456 | 520 | ||
457 | Identical to dma_alloc_coherent() except that the platform will | 521 | Identical to dma_alloc_coherent() except that the platform will |
458 | choose to return either consistent or non-consistent memory as it sees | 522 | choose to return either consistent or non-consistent memory as it sees |
@@ -468,39 +532,49 @@ only use this API if you positively know your driver will be | |||
468 | required to work on one of the rare (usually non-PCI) architectures | 532 | required to work on one of the rare (usually non-PCI) architectures |
469 | that simply cannot make consistent memory. | 533 | that simply cannot make consistent memory. |
470 | 534 | ||
471 | void | 535 | :: |
472 | dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr, | 536 | |
473 | dma_addr_t dma_handle) | 537 | void |
538 | dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr, | ||
539 | dma_addr_t dma_handle) | ||
474 | 540 | ||
475 | Free memory allocated by the nonconsistent API. All parameters must | 541 | Free memory allocated by the nonconsistent API. All parameters must |
476 | be identical to those passed in (and returned by | 542 | be identical to those passed in (and returned by |
477 | dma_alloc_noncoherent()). | 543 | dma_alloc_noncoherent()). |
478 | 544 | ||
479 | int | 545 | :: |
480 | dma_get_cache_alignment(void) | 546 | |
547 | int | ||
548 | dma_get_cache_alignment(void) | ||
481 | 549 | ||
482 | Returns the processor cache alignment. This is the absolute minimum | 550 | Returns the processor cache alignment. This is the absolute minimum |
483 | alignment *and* width that you must observe when either mapping | 551 | alignment *and* width that you must observe when either mapping |
484 | memory or doing partial flushes. | 552 | memory or doing partial flushes. |
485 | 553 | ||
486 | Notes: This API may return a number *larger* than the actual cache | 554 | .. note:: |
487 | line, but it will guarantee that one or more cache lines fit exactly | ||
488 | into the width returned by this call. It will also always be a power | ||
489 | of two for easy alignment. | ||
490 | 555 | ||
491 | void | 556 | This API may return a number *larger* than the actual cache |
492 | dma_cache_sync(struct device *dev, void *vaddr, size_t size, | 557 | line, but it will guarantee that one or more cache lines fit exactly |
493 | enum dma_data_direction direction) | 558 | into the width returned by this call. It will also always be a power |
559 | of two for easy alignment. | ||
560 | |||
561 | :: | ||
562 | |||
563 | void | ||
564 | dma_cache_sync(struct device *dev, void *vaddr, size_t size, | ||
565 | enum dma_data_direction direction) | ||
494 | 566 | ||
495 | Do a partial sync of memory that was allocated by | 567 | Do a partial sync of memory that was allocated by |
496 | dma_alloc_noncoherent(), starting at virtual address vaddr and | 568 | dma_alloc_noncoherent(), starting at virtual address vaddr and |
497 | continuing on for size. Again, you *must* observe the cache line | 569 | continuing on for size. Again, you *must* observe the cache line |
498 | boundaries when doing this. | 570 | boundaries when doing this. |
499 | 571 | ||
500 | int | 572 | :: |
501 | dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr, | 573 | |
502 | dma_addr_t device_addr, size_t size, int | 574 | int |
503 | flags) | 575 | dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr, |
576 | dma_addr_t device_addr, size_t size, int | ||
577 | flags) | ||
504 | 578 | ||
505 | Declare region of memory to be handed out by dma_alloc_coherent() when | 579 | Declare region of memory to be handed out by dma_alloc_coherent() when |
506 | it's asked for coherent memory for this device. | 580 | it's asked for coherent memory for this device. |
@@ -516,21 +590,21 @@ size is the size of the area (must be multiples of PAGE_SIZE). | |||
516 | 590 | ||
517 | flags can be ORed together and are: | 591 | flags can be ORed together and are: |
518 | 592 | ||
519 | DMA_MEMORY_MAP - request that the memory returned from | 593 | - DMA_MEMORY_MAP - request that the memory returned from |
520 | dma_alloc_coherent() be directly writable. | 594 | dma_alloc_coherent() be directly writable. |
521 | 595 | ||
522 | DMA_MEMORY_IO - request that the memory returned from | 596 | - DMA_MEMORY_IO - request that the memory returned from |
523 | dma_alloc_coherent() be addressable using read()/write()/memcpy_toio() etc. | 597 | dma_alloc_coherent() be addressable using read()/write()/memcpy_toio() etc. |
524 | 598 | ||
525 | One or both of these flags must be present. | 599 | One or both of these flags must be present. |
526 | 600 | ||
527 | DMA_MEMORY_INCLUDES_CHILDREN - make the declared memory be allocated by | 601 | - DMA_MEMORY_INCLUDES_CHILDREN - make the declared memory be allocated by |
528 | dma_alloc_coherent of any child devices of this one (for memory residing | 602 | dma_alloc_coherent of any child devices of this one (for memory residing |
529 | on a bridge). | 603 | on a bridge). |
530 | 604 | ||
531 | DMA_MEMORY_EXCLUSIVE - only allocate memory from the declared regions. | 605 | - DMA_MEMORY_EXCLUSIVE - only allocate memory from the declared regions. |
532 | Do not allow dma_alloc_coherent() to fall back to system memory when | 606 | Do not allow dma_alloc_coherent() to fall back to system memory when |
533 | it's out of memory in the declared region. | 607 | it's out of memory in the declared region. |
534 | 608 | ||
535 | The return value will be either DMA_MEMORY_MAP or DMA_MEMORY_IO and | 609 | The return value will be either DMA_MEMORY_MAP or DMA_MEMORY_IO and |
536 | must correspond to a passed in flag (i.e. no returning DMA_MEMORY_IO | 610 | must correspond to a passed in flag (i.e. no returning DMA_MEMORY_IO |
@@ -543,15 +617,17 @@ must be accessed using the correct bus functions. If your driver | |||
543 | isn't prepared to handle this contingency, it should not specify | 617 | isn't prepared to handle this contingency, it should not specify |
544 | DMA_MEMORY_IO in the input flags. | 618 | DMA_MEMORY_IO in the input flags. |
545 | 619 | ||
546 | As a simplification for the platforms, only *one* such region of | 620 | As a simplification for the platforms, only **one** such region of |
547 | memory may be declared per device. | 621 | memory may be declared per device. |
548 | 622 | ||
549 | For reasons of efficiency, most platforms choose to track the declared | 623 | For reasons of efficiency, most platforms choose to track the declared |
550 | region only at the granularity of a page. For smaller allocations, | 624 | region only at the granularity of a page. For smaller allocations, |
551 | you should use the dma_pool() API. | 625 | you should use the dma_pool() API. |
552 | 626 | ||
553 | void | 627 | :: |
554 | dma_release_declared_memory(struct device *dev) | 628 | |
629 | void | ||
630 | dma_release_declared_memory(struct device *dev) | ||
555 | 631 | ||
556 | Remove the memory region previously declared from the system. This | 632 | Remove the memory region previously declared from the system. This |
557 | API performs *no* in-use checking for this region and will return | 633 | API performs *no* in-use checking for this region and will return |
@@ -559,9 +635,11 @@ unconditionally having removed all the required structures. It is the | |||
559 | driver's job to ensure that no parts of this memory region are | 635 | driver's job to ensure that no parts of this memory region are |
560 | currently in use. | 636 | currently in use. |
561 | 637 | ||
562 | void * | 638 | :: |
563 | dma_mark_declared_memory_occupied(struct device *dev, | 639 | |
564 | dma_addr_t device_addr, size_t size) | 640 | void * |
641 | dma_mark_declared_memory_occupied(struct device *dev, | ||
642 | dma_addr_t device_addr, size_t size) | ||
565 | 643 | ||
566 | This is used to occupy specific regions of the declared space | 644 | This is used to occupy specific regions of the declared space |
567 | (dma_alloc_coherent() will hand out the first free region it finds). | 645 | (dma_alloc_coherent() will hand out the first free region it finds). |
@@ -592,38 +670,37 @@ option has a performance impact. Do not enable it in production kernels. | |||
592 | If you boot the resulting kernel, it will contain code which does some bookkeeping | 670 | If you boot the resulting kernel, it will contain code which does some bookkeeping |
593 | about what DMA memory was allocated for which device. If this code detects an | 671 | about what DMA memory was allocated for which device. If this code detects an |
594 | error it prints a warning message with some details into your kernel log. An | 672 | error it prints a warning message with some details into your kernel log. An |
595 | example warning message may look like this: | 673 | example warning message may look like this:: |
596 | 674 | ||
597 | ------------[ cut here ]------------ | 675 | WARNING: at /data2/repos/linux-2.6-iommu/lib/dma-debug.c:448 |
598 | WARNING: at /data2/repos/linux-2.6-iommu/lib/dma-debug.c:448 | 676 | check_unmap+0x203/0x490() |
599 | check_unmap+0x203/0x490() | 677 | Hardware name: |
600 | Hardware name: | 678 | forcedeth 0000:00:08.0: DMA-API: device driver frees DMA memory with wrong |
601 | forcedeth 0000:00:08.0: DMA-API: device driver frees DMA memory with wrong | 679 | function [device address=0x00000000640444be] [size=66 bytes] [mapped as |
602 | function [device address=0x00000000640444be] [size=66 bytes] [mapped as | 680 | single] [unmapped as page] |
603 | single] [unmapped as page] | 681 | Modules linked in: nfsd exportfs bridge stp llc r8169 |
604 | Modules linked in: nfsd exportfs bridge stp llc r8169 | 682 | Pid: 0, comm: swapper Tainted: G W 2.6.28-dmatest-09289-g8bb99c0 #1 |
605 | Pid: 0, comm: swapper Tainted: G W 2.6.28-dmatest-09289-g8bb99c0 #1 | 683 | Call Trace: |
606 | Call Trace: | 684 | <IRQ> [<ffffffff80240b22>] warn_slowpath+0xf2/0x130 |
607 | <IRQ> [<ffffffff80240b22>] warn_slowpath+0xf2/0x130 | 685 | [<ffffffff80647b70>] _spin_unlock+0x10/0x30 |
608 | [<ffffffff80647b70>] _spin_unlock+0x10/0x30 | 686 | [<ffffffff80537e75>] usb_hcd_link_urb_to_ep+0x75/0xc0 |
609 | [<ffffffff80537e75>] usb_hcd_link_urb_to_ep+0x75/0xc0 | 687 | [<ffffffff80647c22>] _spin_unlock_irqrestore+0x12/0x40 |
610 | [<ffffffff80647c22>] _spin_unlock_irqrestore+0x12/0x40 | 688 | [<ffffffff8055347f>] ohci_urb_enqueue+0x19f/0x7c0 |
611 | [<ffffffff8055347f>] ohci_urb_enqueue+0x19f/0x7c0 | 689 | [<ffffffff80252f96>] queue_work+0x56/0x60 |
612 | [<ffffffff80252f96>] queue_work+0x56/0x60 | 690 | [<ffffffff80237e10>] enqueue_task_fair+0x20/0x50 |
613 | [<ffffffff80237e10>] enqueue_task_fair+0x20/0x50 | 691 | [<ffffffff80539279>] usb_hcd_submit_urb+0x379/0xbc0 |
614 | [<ffffffff80539279>] usb_hcd_submit_urb+0x379/0xbc0 | 692 | [<ffffffff803b78c3>] cpumask_next_and+0x23/0x40 |
615 | [<ffffffff803b78c3>] cpumask_next_and+0x23/0x40 | 693 | [<ffffffff80235177>] find_busiest_group+0x207/0x8a0 |
616 | [<ffffffff80235177>] find_busiest_group+0x207/0x8a0 | 694 | [<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50 |
617 | [<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50 | 695 | [<ffffffff803c7ea3>] check_unmap+0x203/0x490 |
618 | [<ffffffff803c7ea3>] check_unmap+0x203/0x490 | 696 | [<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50 |
619 | [<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50 | 697 | [<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0 |
620 | [<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0 | 698 | [<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0 |
621 | [<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0 | 699 | [<ffffffff8026df84>] handle_IRQ_event+0x34/0x70 |
622 | [<ffffffff8026df84>] handle_IRQ_event+0x34/0x70 | 700 | [<ffffffff8026ffe9>] handle_edge_irq+0xc9/0x150 |
623 | [<ffffffff8026ffe9>] handle_edge_irq+0xc9/0x150 | 701 | [<ffffffff8020e3ab>] do_IRQ+0xcb/0x1c0 |
624 | [<ffffffff8020e3ab>] do_IRQ+0xcb/0x1c0 | 702 | [<ffffffff8020c093>] ret_from_intr+0x0/0xa |
625 | [<ffffffff8020c093>] ret_from_intr+0x0/0xa | 703 | <EOI> <4>---[ end trace f6435a98e2a38c0e ]--- |
626 | <EOI> <4>---[ end trace f6435a98e2a38c0e ]--- | ||
627 | 704 | ||
628 | The driver developer can find the driver and the device including a stacktrace | 705 | The driver developer can find the driver and the device including a stacktrace |
629 | of the DMA-API call which caused this warning. | 706 | of the DMA-API call which caused this warning. |
@@ -637,43 +714,42 @@ details. | |||
637 | The debugfs directory for the DMA-API debugging code is called dma-api/. In | 714 | The debugfs directory for the DMA-API debugging code is called dma-api/. In |
638 | this directory the following files can currently be found: | 715 | this directory the following files can currently be found: |
639 | 716 | ||
640 | dma-api/all_errors This file contains a numeric value. If this | 717 | =============================== =============================================== |
718 | dma-api/all_errors This file contains a numeric value. If this | ||
641 | value is not equal to zero the debugging code | 719 | value is not equal to zero the debugging code |
642 | will print a warning for every error it finds | 720 | will print a warning for every error it finds |
643 | into the kernel log. Be careful with this | 721 | into the kernel log. Be careful with this |
644 | option, as it can easily flood your logs. | 722 | option, as it can easily flood your logs. |
645 | 723 | ||
646 | dma-api/disabled This read-only file contains the character 'Y' | 724 | dma-api/disabled This read-only file contains the character 'Y' |
647 | if the debugging code is disabled. This can | 725 | if the debugging code is disabled. This can |
648 | happen when it runs out of memory or if it was | 726 | happen when it runs out of memory or if it was |
649 | disabled at boot time. | 727 | disabled at boot time. |
650 | 728 | ||
651 | dma-api/error_count This file is read-only and shows the total | 729 | dma-api/error_count This file is read-only and shows the total |
652 | numbers of errors found. | 730 | numbers of errors found. |
653 | 731 | ||
654 | dma-api/num_errors The number in this file shows how many | 732 | dma-api/num_errors The number in this file shows how many |
655 | warnings will be printed to the kernel log | 733 | warnings will be printed to the kernel log |
656 | before it stops. This number is initialized to | 734 | before it stops. This number is initialized to |
657 | one at system boot and can be set by writing into | 735 | one at system boot and can be set by writing into |
658 | this file. | 736 | this file. |
659 | 737 | ||
660 | dma-api/min_free_entries | 738 | dma-api/min_free_entries This read-only file can be read to get the |
661 | This read-only file can be read to get the | ||
662 | minimum number of free dma_debug_entries the | 739 | minimum number of free dma_debug_entries the |
663 | allocator has ever seen. If this value goes | 740 | allocator has ever seen. If this value goes |
664 | down to zero the code will disable itself | 741 | down to zero the code will disable itself |
665 | because it is no longer reliable. | 742 | because it is no longer reliable. |
666 | 743 | ||
667 | dma-api/num_free_entries | 744 | dma-api/num_free_entries The current number of free dma_debug_entries |
668 | The current number of free dma_debug_entries | ||
669 | in the allocator. | 745 | in the allocator. |
670 | 746 | ||
671 | dma-api/driver-filter | 747 | dma-api/driver-filter You can write a name of a driver into this file |
672 | You can write a name of a driver into this file | ||
673 | to limit the debug output to requests from that | 748 | to limit the debug output to requests from that |
674 | particular driver. Write an empty string to | 749 | particular driver. Write an empty string to |
675 | that file to disable the filter and see | 750 | that file to disable the filter and see |
676 | all errors again. | 751 | all errors again. |
752 | =============================== =============================================== | ||
677 | 753 | ||
678 | If you have this code compiled into your kernel it will be enabled by default. | 754 | If you have this code compiled into your kernel it will be enabled by default. |
679 | If you want to boot without the bookkeeping anyway you can provide | 755 | If you want to boot without the bookkeeping anyway you can provide |
@@ -692,7 +768,10 @@ of preallocated entries is defined per architecture. If it is too low for you | |||
693 | boot with 'dma_debug_entries=<your_desired_number>' to override the | 768 | boot with 'dma_debug_entries=<your_desired_number>' to override the |
693 | architectural default. | 769 | architectural default. |
694 | 770 | ||
695 | void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr); | 771 | :: |
772 | |||
773 | void | ||
774 | debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr); | ||
696 | 775 | ||
697 | The dma-debug interface provides debug_dma_mapping_error() to debug drivers that fail | 776 | The dma-debug interface provides debug_dma_mapping_error() to debug drivers that fail |
698 | to check DMA mapping errors on addresses returned by dma_map_single() and | 777 | to check DMA mapping errors on addresses returned by dma_map_single() and |
@@ -702,4 +781,3 @@ the driver. When driver does unmap, debug_dma_unmap() checks the flag and if | |||
702 | this flag is still set, prints a warning message that includes the call trace that | 781 | this flag is still set, prints a warning message that includes the call trace that |
703 | leads up to the unmap. This interface can be called from dma_mapping_error() | 782 | leads up to the unmap. This interface can be called from dma_mapping_error() |
704 | routines to enable DMA mapping error check debugging. | 783 | routines to enable DMA mapping error check debugging. |
705 | |||
diff --git a/Documentation/DMA-ISA-LPC.txt b/Documentation/DMA-ISA-LPC.txt index 7a065ac4a9d1..8c2b8be6e45b 100644 --- a/Documentation/DMA-ISA-LPC.txt +++ b/Documentation/DMA-ISA-LPC.txt | |||
@@ -1,19 +1,20 @@ | |||
1 | DMA with ISA and LPC devices | 1 | ============================ |
2 | ============================ | 2 | DMA with ISA and LPC devices |
3 | ============================ | ||
3 | 4 | ||
4 | Pierre Ossman <drzeus@drzeus.cx> | 5 | :Author: Pierre Ossman <drzeus@drzeus.cx> |
5 | 6 | ||
6 | This document describes how to do DMA transfers using the old ISA DMA | 7 | This document describes how to do DMA transfers using the old ISA DMA |
7 | controller. Even though ISA is more or less dead today, the LPC bus | 8 | controller. Even though ISA is more or less dead today, the LPC bus |
8 | uses the same DMA system so it will be around for quite some time. | 9 | uses the same DMA system so it will be around for quite some time. |
9 | 10 | ||
10 | Part I - Headers and dependencies | 11 | Headers and dependencies |
11 | --------------------------------- | 12 | ------------------------ |
12 | 13 | ||
13 | To do ISA style DMA you need to include two headers: | 14 | To do ISA style DMA you need to include two headers:: |
14 | 15 | ||
15 | #include <linux/dma-mapping.h> | 16 | #include <linux/dma-mapping.h> |
16 | #include <asm/dma.h> | 17 | #include <asm/dma.h> |
17 | 18 | ||
18 | The first is the generic DMA API used to convert virtual addresses to | 19 | The first is the generic DMA API used to convert virtual addresses to |
19 | bus addresses (see Documentation/DMA-API.txt for details). | 20 | bus addresses (see Documentation/DMA-API.txt for details). |
@@ -23,8 +24,8 @@ this is not present on all platforms make sure you construct your | |||
23 | Kconfig to be dependent on ISA_DMA_API (not ISA) so that nobody tries | 24 | Kconfig to be dependent on ISA_DMA_API (not ISA) so that nobody tries |
24 | to build your driver on unsupported platforms. | 25 | to build your driver on unsupported platforms. |
25 | 26 | ||
26 | Part II - Buffer allocation | 27 | Buffer allocation |
27 | --------------------------- | 28 | ----------------- |
28 | 29 | ||
29 | The ISA DMA controller has some very strict requirements on which | 30 | The ISA DMA controller has some very strict requirements on which |
30 | memory it can access so extra care must be taken when allocating | 31 | memory it can access so extra care must be taken when allocating |
@@ -47,8 +48,8 @@ __GFP_RETRY_MAYFAIL and __GFP_NOWARN to make the allocator try a bit harder. | |||
47 | (This scarcity also means that you should allocate the buffer as | 48 | (This scarcity also means that you should allocate the buffer as |
48 | early as possible and not release it until the driver is unloaded.) | 49 | early as possible and not release it until the driver is unloaded.) |
49 | 50 | ||
50 | Part III - Address translation | 51 | Address translation |
51 | ------------------------------ | 52 | ------------------- |
52 | 53 | ||
53 | To translate the virtual address to a bus address, use the normal DMA | 54 | To translate the virtual address to a bus address, use the normal DMA |
54 | API. Do _not_ use isa_virt_to_phys() even though it does the same | 55 | API. Do _not_ use isa_virt_to_phys() even though it does the same |
@@ -61,8 +62,8 @@ Note: x86_64 had a broken DMA API when it came to ISA but has since | |||
61 | been fixed. If your arch has problems then fix the DMA API instead of | 62 | been fixed. If your arch has problems then fix the DMA API instead of |
62 | reverting to the ISA functions. | 63 | reverting to the ISA functions. |
63 | 64 | ||
64 | Part IV - Channels | 65 | Channels |
65 | ------------------ | 66 | -------- |
66 | 67 | ||
67 | A normal ISA DMA controller has 8 channels. The lower four are for | 68 | A normal ISA DMA controller has 8 channels. The lower four are for |
68 | 8-bit transfers and the upper four are for 16-bit transfers. | 69 | 8-bit transfers and the upper four are for 16-bit transfers. |
@@ -80,8 +81,8 @@ The ability to use 16-bit or 8-bit transfers is _not_ up to you as a | |||
80 | driver author but depends on what the hardware supports. Check your | 81 | driver author but depends on what the hardware supports. Check your |
81 | specs or test different channels. | 82 | specs or test different channels. |
82 | 83 | ||
83 | Part V - Transfer data | 84 | Transfer data |
84 | ---------------------- | 85 | ------------- |
85 | 86 | ||
86 | Now for the good stuff, the actual DMA transfer. :) | 87 | Now for the good stuff, the actual DMA transfer. :) |
87 | 88 | ||
@@ -112,37 +113,37 @@ Once the DMA transfer is finished (or timed out) you should disable | |||
112 | the channel again. You should also check get_dma_residue() to make | 113 | the channel again. You should also check get_dma_residue() to make |
113 | sure that all data has been transferred. | 114 | sure that all data has been transferred. |
114 | 115 | ||
115 | Example: | 116 | Example:: |
116 | 117 | ||
117 | int flags, residue; | 118 | int flags, residue; |
118 | 119 | ||
119 | flags = claim_dma_lock(); | 120 | flags = claim_dma_lock(); |
120 | 121 | ||
121 | clear_dma_ff(channel); | 122 | clear_dma_ff(channel); |
122 | 123 | ||
123 | set_dma_mode(channel, DMA_MODE_WRITE); | 124 | set_dma_mode(channel, DMA_MODE_WRITE); |
124 | set_dma_addr(channel, phys_addr); | 125 | set_dma_addr(channel, phys_addr); |
125 | set_dma_count(channel, num_bytes); | 126 | set_dma_count(channel, num_bytes); |
126 | 127 | ||
127 | enable_dma(channel); | 128 | enable_dma(channel); |
128 | 129 | ||
129 | release_dma_lock(flags); | 130 | release_dma_lock(flags); |
130 | 131 | ||
131 | while (!device_done()); | 132 | while (!device_done()); |
132 | 133 | ||
133 | flags = claim_dma_lock(); | 134 | flags = claim_dma_lock(); |
134 | 135 | ||
135 | disable_dma(channel); | 136 | disable_dma(channel); |
136 | 137 | ||
137 | residue = get_dma_residue(channel); | 138 | residue = get_dma_residue(channel); |
138 | if (residue != 0) | 139 | if (residue != 0) |
139 | printk(KERN_ERR "driver: Incomplete DMA transfer!" | 140 | printk(KERN_ERR "driver: Incomplete DMA transfer!" |
140 | " %d bytes left!\n", residue); | 141 | " %d bytes left!\n", residue); |
141 | 142 | ||
142 | release_dma_lock(flags); | 143 | release_dma_lock(flags); |
143 | 144 | ||
144 | Part VI - Suspend/resume | 145 | Suspend/resume |
145 | ------------------------ | 146 | -------------- |
146 | 147 | ||
147 | It is the driver's responsibility to make sure that the machine isn't | 148 | It is the driver's responsibility to make sure that the machine isn't |
148 | suspended while a DMA transfer is in progress. Also, all DMA settings | 149 | suspended while a DMA transfer is in progress. Also, all DMA settings |
diff --git a/Documentation/DMA-attributes.txt b/Documentation/DMA-attributes.txt index 44c6bc496eee..8f8d97f65d73 100644 --- a/Documentation/DMA-attributes.txt +++ b/Documentation/DMA-attributes.txt | |||
@@ -1,5 +1,6 @@ | |||
1 | DMA attributes | 1 | ============== |
2 | ============== | 2 | DMA attributes |
3 | ============== | ||
3 | 4 | ||
4 | This document describes the semantics of the DMA attributes that are | 5 | This document describes the semantics of the DMA attributes that are |
5 | defined in linux/dma-mapping.h. | 6 | defined in linux/dma-mapping.h. |
@@ -108,6 +109,7 @@ This is a hint to the DMA-mapping subsystem that it's probably not worth | |||
108 | the time to try to allocate memory to in a way that gives better TLB | 109 | the time to try to allocate memory to in a way that gives better TLB |
109 | efficiency (AKA it's not worth trying to build the mapping out of larger | 110 | efficiency (AKA it's not worth trying to build the mapping out of larger |
110 | pages). You might want to specify this if: | 111 | pages). You might want to specify this if: |
112 | |||
111 | - You know that the accesses to this memory won't thrash the TLB. | 113 | - You know that the accesses to this memory won't thrash the TLB. |
112 | You might know that the accesses are likely to be sequential or | 114 | You might know that the accesses are likely to be sequential or |
113 | that they aren't sequential but it's unlikely you'll ping-pong | 115 | that they aren't sequential but it's unlikely you'll ping-pong |
@@ -121,11 +123,12 @@ pages). You might want to specify this if: | |||
121 | the mapping to have a short lifetime then it may be worth it to | 123 | the mapping to have a short lifetime then it may be worth it to |
122 | optimize allocation (avoid coming up with large pages) instead of | 124 | optimize allocation (avoid coming up with large pages) instead of |
123 | getting the slight performance win of larger pages. | 125 | getting the slight performance win of larger pages. |
126 | |||
124 | Setting this hint doesn't guarantee that you won't get huge pages, but it | 127 | Setting this hint doesn't guarantee that you won't get huge pages, but it |
125 | means that we won't try quite as hard to get them. | 128 | means that we won't try quite as hard to get them. |
126 | 129 | ||
127 | NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM, | 130 | .. note:: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM, |
128 | though ARM64 patches will likely be posted soon. | 131 | though ARM64 patches will likely be posted soon. |
129 | 132 | ||
130 | DMA_ATTR_NO_WARN | 133 | DMA_ATTR_NO_WARN |
131 | ---------------- | 134 | ---------------- |
@@ -142,10 +145,10 @@ problem at all, depending on the implementation of the retry mechanism. | |||
142 | So, this provides a way for drivers to avoid those error messages on calls | 145 | So, this provides a way for drivers to avoid those error messages on calls |
143 | where allocation failures are not a problem, and shouldn't bother the logs. | 146 | where allocation failures are not a problem, and shouldn't bother the logs. |
144 | 147 | ||
145 | NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC. | 148 | .. note:: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC. |
146 | 149 | ||
147 | DMA_ATTR_PRIVILEGED | 150 | DMA_ATTR_PRIVILEGED |
148 | ------------------------------ | 151 | ------------------- |
149 | 152 | ||
150 | Some advanced peripherals such as remote processors and GPUs perform | 153 | Some advanced peripherals such as remote processors and GPUs perform |
151 | accesses to DMA buffers in both privileged "supervisor" and unprivileged | 154 | accesses to DMA buffers in both privileged "supervisor" and unprivileged |
diff --git a/Documentation/IPMI.txt b/Documentation/IPMI.txt index 6962cab997ef..aa77a25a0940 100644 --- a/Documentation/IPMI.txt +++ b/Documentation/IPMI.txt | |||
@@ -1,9 +1,8 @@ | |||
1 | ===================== | ||
2 | The Linux IPMI Driver | ||
3 | ===================== | ||
1 | 4 | ||
2 | The Linux IPMI Driver | 5 | :Author: Corey Minyard <minyard@mvista.com> / <minyard@acm.org> |
3 | --------------------- | ||
4 | Corey Minyard | ||
5 | <minyard@mvista.com> | ||
6 | <minyard@acm.org> | ||
7 | 6 | ||
8 | The Intelligent Platform Management Interface, or IPMI, is a | 7 | The Intelligent Platform Management Interface, or IPMI, is a |
9 | standard for controlling intelligent devices that monitor a system. | 8 | standard for controlling intelligent devices that monitor a system. |
@@ -141,7 +140,7 @@ Addressing | |||
141 | ---------- | 140 | ---------- |
142 | 141 | ||
143 | The IPMI addressing works much like IP addressing: you have an overlay | 142 |
144 | to handle the different address types. The overlay is: | 143 | to handle the different address types. The overlay is:: |
145 | 144 | ||
146 | struct ipmi_addr | 145 | struct ipmi_addr |
147 | { | 146 | { |
@@ -153,7 +152,7 @@ to handle the different address types. The overlay is: | |||
153 | The addr_type determines what the address really is. The driver | 152 | The addr_type determines what the address really is. The driver |
154 | currently understands two different types of addresses. | 153 | currently understands two different types of addresses. |
155 | 154 | ||
156 | "System Interface" addresses are defined as: | 155 | "System Interface" addresses are defined as:: |
157 | 156 | ||
158 | struct ipmi_system_interface_addr | 157 | struct ipmi_system_interface_addr |
159 | { | 158 | { |
@@ -166,7 +165,7 @@ straight to the BMC on the current card. The channel must be | |||
166 | IPMI_BMC_CHANNEL. | 165 | IPMI_BMC_CHANNEL. |
167 | 166 | ||
168 | Messages that are destined to go out on the IPMB bus use the | 167 | Messages that are destined to go out on the IPMB bus use the |
169 | IPMI_IPMB_ADDR_TYPE address type. The format is | 168 | IPMI_IPMB_ADDR_TYPE address type. The format is:: |
170 | 169 | ||
171 | struct ipmi_ipmb_addr | 170 | struct ipmi_ipmb_addr |
172 | { | 171 | { |
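The overlay pattern these hunks document can be sketched in plain userspace C: a generic `struct ipmi_addr` carries a discriminating `addr_type`, and specific address layouts are read through a cast. The struct shapes and constant values below are copied here for illustration and declared locally, so treat them as a sketch rather than the authoritative kernel headers:

```c
#include <assert.h>
#include <string.h>

/* Local mirrors of the kernel definitions -- illustrative only. */
#define IPMI_MAX_ADDR_SIZE 32
#define IPMI_SYSTEM_INTERFACE_ADDR_TYPE 0x0c
#define IPMI_IPMB_ADDR_TYPE 0x01
#define IPMI_BMC_CHANNEL 0xf

/* Generic overlay: addr_type says how to interpret the rest. */
struct ipmi_addr {
	int addr_type;
	short channel;
	char data[IPMI_MAX_ADDR_SIZE];
};

/* One specific layout, read through a cast of the overlay. */
struct ipmi_system_interface_addr {
	int addr_type;
	short channel;
	unsigned char lun;
};

/* Dispatch on the discriminator, as the driver does internally. */
static const char *classify(const struct ipmi_addr *addr)
{
	switch (addr->addr_type) {
	case IPMI_SYSTEM_INTERFACE_ADDR_TYPE:
		return "local BMC";
	case IPMI_IPMB_ADDR_TYPE:
		return "IPMB bus";
	default:
		return "unknown";
	}
}
```

A system-interface address built as `struct ipmi_system_interface_addr` with channel `IPMI_BMC_CHANNEL` classifies as "local BMC" when viewed through the overlay.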
@@ -184,16 +183,16 @@ spec. | |||
184 | Messages | 183 | Messages |
185 | -------- | 184 | -------- |
186 | 185 | ||
187 | Messages are defined as: | 186 | Messages are defined as:: |
188 | 187 | ||
189 | struct ipmi_msg | 188 | struct ipmi_msg |
190 | { | 189 | { |
191 | unsigned char netfn; | 190 | unsigned char netfn; |
192 | unsigned char lun; | 191 | unsigned char lun; |
193 | unsigned char cmd; | 192 | unsigned char cmd; |
194 | unsigned char *data; | 193 | unsigned char *data; |
195 | int data_len; | 194 | int data_len; |
196 | }; | 195 | }; |
197 | 196 | ||
198 | The driver takes care of adding/stripping the header information. The | 197 | The driver takes care of adding/stripping the header information. The |
199 | data portion is just the data to be sent (do NOT put addressing info | 198 | data portion is just the data to be sent (do NOT put addressing info |
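The rule the text states -- only payload bytes go in `data`, since the driver adds and strips the header and addressing travels separately -- can be sketched with the `struct ipmi_msg` shape shown in the hunk. The netfn/cmd pair used in the usage note below is the well-known App / Get Device ID pair, but verify it against the IPMI spec before relying on it:

```c
#include <assert.h>
#include <stddef.h>

/* Shape taken from the hunk above. */
struct ipmi_msg {
	unsigned char netfn;
	unsigned char lun;
	unsigned char cmd;
	unsigned char *data;
	int data_len;
};

/* Compose a request: only payload bytes go in data; the driver adds
 * and strips the header, and addressing is carried separately. */
static void build_msg(struct ipmi_msg *msg, unsigned char netfn,
		      unsigned char cmd, unsigned char *payload, int len)
{
	msg->netfn = netfn;
	msg->lun = 0;
	msg->cmd = cmd;
	msg->data = payload;	/* may be NULL when len == 0 */
	msg->data_len = len;
}
```

For example, `build_msg(&msg, 0x06, 0x01, NULL, 0)` would compose a zero-payload Get Device ID-style request (values assumed from the spec, not from this document).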
@@ -208,7 +207,7 @@ block of data, even when receiving messages. Otherwise the driver | |||
208 | will have no place to put the message. | 207 | will have no place to put the message. |
209 | 208 | ||
210 | Messages coming up from the message handler in kernelland will come in | 209 | Messages coming up from the message handler in kernelland will come in |
211 | as: | 210 | as:: |
212 | 211 | ||
213 | struct ipmi_recv_msg | 212 | struct ipmi_recv_msg |
214 | { | 213 | { |
@@ -246,6 +245,7 @@ and the user should not have to care what type of SMI is below them. | |||
246 | 245 | ||
247 | 246 | ||
248 | Watching For Interfaces | 247 | Watching For Interfaces |
248 | ^^^^^^^^^^^^^^^^^^^^^^^ | ||
249 | 249 | ||
250 | When your code comes up, the IPMI driver may or may not have detected | 250 | When your code comes up, the IPMI driver may or may not have detected |
251 | if IPMI devices exist. So you might have to defer your setup until | 251 | if IPMI devices exist. So you might have to defer your setup until |
@@ -256,6 +256,7 @@ and tell you when they come and go. | |||
256 | 256 | ||
257 | 257 | ||
258 | Creating the User | 258 | Creating the User |
259 | ^^^^^^^^^^^^^^^^^ | ||
259 | 260 | ||
260 | To use the message handler, you must first create a user using | 261 | To use the message handler, you must first create a user using |
261 | ipmi_create_user. The interface number specifies which SMI you want | 262 | ipmi_create_user. The interface number specifies which SMI you want |
@@ -272,6 +273,7 @@ closing the device automatically destroys the user. | |||
272 | 273 | ||
273 | 274 | ||
274 | Messaging | 275 | Messaging |
276 | ^^^^^^^^^ | ||
275 | 277 | ||
276 | To send a message from kernel-land, the ipmi_request_settime() call does | 278 | To send a message from kernel-land, the ipmi_request_settime() call does |
277 | pretty much all message handling. Most of the parameters are | 279 | pretty much all message handling. Most of the parameters are |
@@ -321,6 +323,7 @@ though, since it is tricky to manage your own buffers. | |||
321 | 323 | ||
322 | 324 | ||
323 | Events and Incoming Commands | 325 | Events and Incoming Commands |
326 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
324 | 327 | ||
325 | The driver takes care of polling for IPMI events and receiving | 328 | The driver takes care of polling for IPMI events and receiving |
326 | commands (commands are messages that are not responses, they are | 329 | commands (commands are messages that are not responses, they are |
@@ -367,7 +370,7 @@ in the system. It discovers interfaces through a host of different | |||
367 | methods, depending on the system. | 370 | methods, depending on the system. |
368 | 371 | ||
369 | You can specify up to four interfaces on the module load line and | 372 | You can specify up to four interfaces on the module load line and |
370 | control some module parameters: | 373 | control some module parameters:: |
371 | 374 | ||
372 | modprobe ipmi_si.o type=<type1>,<type2>.... | 375 | modprobe ipmi_si.o type=<type1>,<type2>.... |
373 | ports=<port1>,<port2>... addrs=<addr1>,<addr2>... | 376 | ports=<port1>,<port2>... addrs=<addr1>,<addr2>... |
@@ -437,7 +440,7 @@ default is one. Setting to 0 is useful with the hotmod, but is | |||
437 | obviously only useful for modules. | 440 | obviously only useful for modules. |
438 | 441 | ||
439 | When compiled into the kernel, the parameters can be specified on the | 442 | When compiled into the kernel, the parameters can be specified on the |
440 | kernel command line as: | 443 | kernel command line as:: |
441 | 444 | ||
442 | ipmi_si.type=<type1>,<type2>... | 445 | ipmi_si.type=<type1>,<type2>... |
443 | ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>... | 446 | ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>... |
@@ -474,16 +477,22 @@ The driver supports a hot add and remove of interfaces. This way, | |||
474 | interfaces can be added or removed after the kernel is up and running. | 477 | interfaces can be added or removed after the kernel is up and running. |
475 | This is done using /sys/modules/ipmi_si/parameters/hotmod, which is a | 478 | This is done using /sys/modules/ipmi_si/parameters/hotmod, which is a |
476 | write-only parameter. You write a string to this interface. The string | 479 | write-only parameter. You write a string to this interface. The string |
477 | has the format: | 480 | has the format:: |
481 | |||
478 | <op1>[:op2[:op3...]] | 482 | <op1>[:op2[:op3...]] |
479 | The "op"s are: | 483 | |
484 | The "op"s are:: | ||
485 | |||
480 | add|remove,kcs|bt|smic,mem|i/o,<address>[,<opt1>[,<opt2>[,...]]] | 486 | add|remove,kcs|bt|smic,mem|i/o,<address>[,<opt1>[,<opt2>[,...]]] |
481 | You can specify more than one interface on the line. The "opt"s are: | 487 | |
488 | You can specify more than one interface on the line. The "opt"s are:: | ||
489 | |||
482 | rsp=<regspacing> | 490 | rsp=<regspacing> |
483 | rsi=<regsize> | 491 | rsi=<regsize> |
484 | rsh=<regshift> | 492 | rsh=<regshift> |
485 | irq=<irq> | 493 | irq=<irq> |
486 | ipmb=<ipmb slave addr> | 494 | ipmb=<ipmb slave addr> |
495 | |||
487 | and these have the same meanings as discussed above. Note that you | 496 | and these have the same meanings as discussed above. Note that you |
488 | can also use this on the kernel command line for a more compact format | 497 | can also use this on the kernel command line for a more compact format |
489 | for specifying an interface. Note that when removing an interface, | 498 | for specifying an interface. Note that when removing an interface, |
@@ -496,7 +505,7 @@ The SMBus Driver (SSIF) | |||
496 | The SMBus driver allows up to 4 SMBus devices to be configured in the | 505 | The SMBus driver allows up to 4 SMBus devices to be configured in the |
497 | system. By default, the driver will only register with something it | 506 | system. By default, the driver will only register with something it |
498 | finds in DMI or ACPI tables. You can change this | 507 | finds in DMI or ACPI tables. You can change this |
499 | at module load time (for a module) with: | 508 | at module load time (for a module) with:: |
500 | 509 | ||
501 | modprobe ipmi_ssif.o | 510 | modprobe ipmi_ssif.o |
502 | addr=<i2caddr1>[,<i2caddr2>[,...]] | 511 | addr=<i2caddr1>[,<i2caddr2>[,...]] |
@@ -535,7 +544,7 @@ the smb_addr parameter unless you have DMI or ACPI data to tell the | |||
535 | driver what to use. | 544 | driver what to use. |
536 | 545 | ||
537 | When compiled into the kernel, the addresses can be specified on the | 546 | When compiled into the kernel, the addresses can be specified on the |
538 | kernel command line as: | 547 | kernel command line as:: |
539 | 548 | ||
540 | ipmb_ssif.addr=<i2caddr1>[,<i2caddr2>[...]] | 549 | ipmb_ssif.addr=<i2caddr1>[,<i2caddr2>[...]] |
541 | ipmi_ssif.adapter=<adapter1>[,<adapter2>[...]] | 550 | ipmi_ssif.adapter=<adapter1>[,<adapter2>[...]] |
@@ -565,9 +574,9 @@ Some users need more detailed information about a device, like where | |||
565 | the address came from or the raw base device for the IPMI interface. | 574 | the address came from or the raw base device for the IPMI interface. |
566 | You can use the IPMI smi_watcher to catch the IPMI interfaces as they | 575 | You can use the IPMI smi_watcher to catch the IPMI interfaces as they |
567 | come or go, and to grab the information, you can use the function | 576 | come or go, and to grab the information, you can use the function |
568 | ipmi_get_smi_info(), which returns the following structure: | 577 | ipmi_get_smi_info(), which returns the following structure:: |
569 | 578 | ||
570 | struct ipmi_smi_info { | 579 | struct ipmi_smi_info { |
571 | enum ipmi_addr_src addr_src; | 580 | enum ipmi_addr_src addr_src; |
572 | struct device *dev; | 581 | struct device *dev; |
573 | union { | 582 | union { |
@@ -575,7 +584,7 @@ struct ipmi_smi_info { | |||
575 | void *acpi_handle; | 584 | void *acpi_handle; |
576 | } acpi_info; | 585 | } acpi_info; |
577 | } addr_info; | 586 | } addr_info; |
578 | }; | 587 | }; |
579 | 588 | ||
580 | Currently special info is returned only for SI_ACPI address | 589 | Currently special info is returned only for SI_ACPI address |
581 | sources. Others may be added as necessary. | 590 | sources. Others may be added as necessary. |
@@ -590,7 +599,7 @@ Watchdog | |||
590 | 599 | ||
591 | A watchdog timer is provided that implements the Linux-standard | 600 | A watchdog timer is provided that implements the Linux-standard |
592 | watchdog timer interface. It has three module parameters that can be | 601 | watchdog timer interface. It has three module parameters that can be |
593 | used to control it: | 602 | used to control it:: |
594 | 603 | ||
595 | modprobe ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type> | 604 | modprobe ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type> |
596 | preaction=<preaction type> preop=<preop type> start_now=x | 605 | preaction=<preaction type> preop=<preop type> start_now=x |
@@ -635,7 +644,7 @@ watchdog device is closed. The default value of nowayout is true | |||
635 | if the CONFIG_WATCHDOG_NOWAYOUT option is enabled, or false if not. | 644 | if the CONFIG_WATCHDOG_NOWAYOUT option is enabled, or false if not. |
636 | 645 | ||
637 | When compiled into the kernel, the kernel command line is available | 646 | When compiled into the kernel, the kernel command line is available |
638 | for configuring the watchdog: | 647 | for configuring the watchdog:: |
639 | 648 | ||
640 | ipmi_watchdog.timeout=<t> ipmi_watchdog.pretimeout=<t> | 649 | ipmi_watchdog.timeout=<t> ipmi_watchdog.pretimeout=<t> |
641 | ipmi_watchdog.action=<action type> | 650 | ipmi_watchdog.action=<action type> |
@@ -675,6 +684,7 @@ also get a bunch of OEM events holding the panic string. | |||
675 | 684 | ||
676 | 685 | ||
677 | The field settings of the events are: | 686 | The field settings of the events are: |
687 | |||
678 | * Generator ID: 0x21 (kernel) | 688 | * Generator ID: 0x21 (kernel) |
679 | * EvM Rev: 0x03 (this event is formatted in IPMI 1.0 format) | 689 | * EvM Rev: 0x03 (this event is formatted in IPMI 1.0 format) |
680 | * Sensor Type: 0x20 (OS critical stop sensor) | 690 | * Sensor Type: 0x20 (OS critical stop sensor) |
@@ -683,18 +693,20 @@ The field settings of the events are: | |||
683 | * Event Data 1: 0xa1 (Runtime stop in OEM bytes 2 and 3) | 693 | * Event Data 1: 0xa1 (Runtime stop in OEM bytes 2 and 3) |
684 | * Event data 2: second byte of panic string | 694 | * Event data 2: second byte of panic string |
685 | * Event data 3: third byte of panic string | 695 | * Event data 3: third byte of panic string |
696 | |||
686 | See the IPMI spec for the details of the event layout. This event is | 697 | See the IPMI spec for the details of the event layout. This event is |
687 | always sent to the local management controller. It will handle routing | 698 | always sent to the local management controller. It will handle routing |
688 | the message to the right place. | 699 | the message to the right place. |
689 | 700 | ||
690 | Other OEM events have the following format: | 701 | Other OEM events have the following format: |
691 | Record ID (bytes 0-1): Set by the SEL. | 702 | |
692 | Record type (byte 2): 0xf0 (OEM non-timestamped) | 703 | * Record ID (bytes 0-1): Set by the SEL. |
693 | byte 3: The slave address of the card saving the panic | 704 | * Record type (byte 2): 0xf0 (OEM non-timestamped) |
694 | byte 4: A sequence number (starting at zero) | 705 | * byte 3: The slave address of the card saving the panic |
695 | The rest of the bytes (11 bytes) are the panic string. If the panic string | 706 | * byte 4: A sequence number (starting at zero) |
696 | is longer than 11 bytes, multiple messages will be sent with increasing | 707 | The rest of the bytes (11 bytes) are the panic string. If the panic string |
697 | sequence numbers. | 708 | is longer than 11 bytes, multiple messages will be sent with increasing |
709 | sequence numbers. | ||
698 | 710 | ||
699 | Because you cannot send OEM events using the standard interface, this | 711 | Because you cannot send OEM events using the standard interface, this |
700 | function will attempt to find an SEL and add the events there. It | 712 | function will attempt to find an SEL and add the events there. It |
diff --git a/Documentation/IRQ-affinity.txt b/Documentation/IRQ-affinity.txt index 01a675175a36..29da5000836a 100644 --- a/Documentation/IRQ-affinity.txt +++ b/Documentation/IRQ-affinity.txt | |||
@@ -1,8 +1,11 @@ | |||
1 | ================ | ||
2 | SMP IRQ affinity | ||
3 | ================ | ||
4 | |||
1 | ChangeLog: | 5 | ChangeLog: |
2 | Started by Ingo Molnar <mingo@redhat.com> | 6 | - Started by Ingo Molnar <mingo@redhat.com> |
3 | Update by Max Krasnyansky <maxk@qualcomm.com> | 7 | - Update by Max Krasnyansky <maxk@qualcomm.com> |
4 | 8 | ||
5 | SMP IRQ affinity | ||
6 | 9 | ||
7 | /proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify | 10 | /proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify |
8 | which target CPUs are permitted for a given IRQ source. It's a bitmask | 11 | which target CPUs are permitted for a given IRQ source. It's a bitmask |
@@ -16,50 +19,52 @@ will be set to the default mask. It can then be changed as described above. | |||
16 | Default mask is 0xffffffff. | 19 | Default mask is 0xffffffff. |
17 | 20 | ||
18 | Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting | 21 | Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting |
19 | it to CPU4-7 (this is an 8-CPU SMP box): | 22 | it to CPU4-7 (this is an 8-CPU SMP box):: |
20 | 23 | ||
21 | [root@moon 44]# cd /proc/irq/44 | 24 | [root@moon 44]# cd /proc/irq/44 |
22 | [root@moon 44]# cat smp_affinity | 25 | [root@moon 44]# cat smp_affinity |
23 | ffffffff | 26 | ffffffff |
24 | 27 | ||
25 | [root@moon 44]# echo 0f > smp_affinity | 28 | [root@moon 44]# echo 0f > smp_affinity |
26 | [root@moon 44]# cat smp_affinity | 29 | [root@moon 44]# cat smp_affinity |
27 | 0000000f | 30 | 0000000f |
28 | [root@moon 44]# ping -f h | 31 | [root@moon 44]# ping -f h |
29 | PING hell (195.4.7.3): 56 data bytes | 32 | PING hell (195.4.7.3): 56 data bytes |
30 | ... | 33 | ... |
31 | --- hell ping statistics --- | 34 | --- hell ping statistics --- |
32 | 6029 packets transmitted, 6027 packets received, 0% packet loss | 35 | 6029 packets transmitted, 6027 packets received, 0% packet loss |
33 | round-trip min/avg/max = 0.1/0.1/0.4 ms | 36 | round-trip min/avg/max = 0.1/0.1/0.4 ms |
34 | [root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:' | 37 | [root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:' |
35 | CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 | 38 | CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 |
36 | 44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1 | 39 | 44: 1068 1785 1785 1783 0 0 0 0 IO-APIC-level eth1 |
37 | 40 | ||
38 | As can be seen from the line above IRQ44 was delivered only to the first four | 41 | As can be seen from the line above IRQ44 was delivered only to the first four |
39 | processors (0-3). | 42 | processors (0-3). |
40 | Now let's restrict that IRQ to CPU(4-7). | 43 | Now let's restrict that IRQ to CPU(4-7). |
41 | 44 | ||
42 | [root@moon 44]# echo f0 > smp_affinity | 45 | :: |
43 | [root@moon 44]# cat smp_affinity | 46 | |
44 | 000000f0 | 47 | [root@moon 44]# echo f0 > smp_affinity |
45 | [root@moon 44]# ping -f h | 48 | [root@moon 44]# cat smp_affinity |
46 | PING hell (195.4.7.3): 56 data bytes | 49 | 000000f0 |
47 | .. | 50 | [root@moon 44]# ping -f h |
48 | --- hell ping statistics --- | 51 | PING hell (195.4.7.3): 56 data bytes |
49 | 2779 packets transmitted, 2777 packets received, 0% packet loss | 52 | .. |
50 | round-trip min/avg/max = 0.1/0.5/585.4 ms | 53 | --- hell ping statistics --- |
51 | [root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:' | 54 | 2779 packets transmitted, 2777 packets received, 0% packet loss |
52 | CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 | 55 | round-trip min/avg/max = 0.1/0.5/585.4 ms |
53 | 44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1 | 56 | [root@moon 44]# cat /proc/interrupts | grep 'CPU\|44:' |
57 | CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 | ||
58 | 44: 1068 1785 1785 1783 1784 1069 1070 1069 IO-APIC-level eth1 | ||
54 | 59 | ||
55 | This time around IRQ44 was delivered only to the last four processors. | 60 | This time around IRQ44 was delivered only to the last four processors. |
56 | i.e. counters for the CPU0-3 did not change. | 61 | i.e. counters for the CPU0-3 did not change. |
57 | 62 | ||
58 | Here is an example of limiting that same irq (44) to cpus 1024 to 1031: | 63 | Here is an example of limiting that same irq (44) to cpus 1024 to 1031:: |
59 | 64 | ||
60 | [root@moon 44]# echo 1024-1031 > smp_affinity_list | 65 | [root@moon 44]# echo 1024-1031 > smp_affinity_list |
61 | [root@moon 44]# cat smp_affinity_list | 66 | [root@moon 44]# cat smp_affinity_list |
62 | 1024-1031 | 67 | 1024-1031 |
63 | 68 | ||
64 | Note that to do this with a bitmask would require 32 bitmasks of zero | 69 | Note that to do this with a bitmask would require 32 bitmasks of zero |
65 | to follow the pertinent one. | 70 | to follow the pertinent one. |
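The bitmask arithmetic behind these `smp_affinity` writes can be checked with a small helper: each CPU n contributes bit n, so CPUs 0-3 give 0x0f and CPUs 4-7 give 0xf0. This is a sketch valid only while the mask fits in one 32-bit word; systems with more than 32 CPUs use the comma-separated multi-word format, which this helper does not handle:

```c
#include <assert.h>
#include <stdint.h>

/* Build the smp_affinity word for an inclusive CPU range.
 * Only valid while the range fits in a single 32-bit mask word. */
static uint32_t affinity_mask(unsigned int first_cpu, unsigned int last_cpu)
{
	uint32_t mask = 0;

	for (unsigned int cpu = first_cpu; cpu <= last_cpu; cpu++)
		mask |= UINT32_C(1) << cpu;	/* CPU n contributes bit n */
	return mask;
}
```

`affinity_mask(0, 3)` yields 0x0f and `affinity_mask(4, 7)` yields 0xf0, matching the `echo 0f` and `echo f0` writes shown in the example session.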
diff --git a/Documentation/IRQ-domain.txt b/Documentation/IRQ-domain.txt index 1f246eb25ca5..4a1cd7645d85 100644 --- a/Documentation/IRQ-domain.txt +++ b/Documentation/IRQ-domain.txt | |||
@@ -1,4 +1,6 @@ | |||
1 | irq_domain interrupt number mapping library | 1 | =============================================== |
2 | The irq_domain interrupt number mapping library | ||
3 | =============================================== | ||
2 | 4 | ||
3 | The current design of the Linux kernel uses a single large number | 5 | The current design of the Linux kernel uses a single large number |
4 | space where each separate IRQ source is assigned a different number. | 6 | space where each separate IRQ source is assigned a different number. |
@@ -36,7 +38,9 @@ irq_domain also implements translation from an abstract irq_fwspec | |||
36 | structure to hwirq numbers (Device Tree and ACPI GSI so far), and can | 38 | structure to hwirq numbers (Device Tree and ACPI GSI so far), and can |
37 | be easily extended to support other IRQ topology data sources. | 39 | be easily extended to support other IRQ topology data sources. |
38 | 40 | ||
39 | === irq_domain usage === | 41 | irq_domain usage |
42 | ================ | ||
43 | |||
40 | An interrupt controller driver creates and registers an irq_domain by | 44 | An interrupt controller driver creates and registers an irq_domain by |
41 | calling one of the irq_domain_add_*() functions (each mapping method | 45 | calling one of the irq_domain_add_*() functions (each mapping method |
42 | has a different allocator function, more on that later). The function | 46 | has a different allocator function, more on that later). The function |
@@ -62,15 +66,21 @@ If the driver has the Linux IRQ number or the irq_data pointer, and | |||
62 | needs to know the associated hwirq number (such as in the irq_chip | 66 | needs to know the associated hwirq number (such as in the irq_chip |
63 | callbacks) then it can be directly obtained from irq_data->hwirq. | 67 | callbacks) then it can be directly obtained from irq_data->hwirq. |
64 | 68 | ||
65 | === Types of irq_domain mappings === | 69 | Types of irq_domain mappings |
70 | ============================ | ||
71 | |||
66 | There are several mechanisms available for reverse mapping from hwirq | 72 | There are several mechanisms available for reverse mapping from hwirq |
67 | to Linux irq, and each mechanism uses a different allocation function. | 73 | to Linux irq, and each mechanism uses a different allocation function. |
68 | Which reverse map type should be used depends on the use case. Each | 74 | Which reverse map type should be used depends on the use case. Each |
69 | of the reverse map types are described below: | 75 | of the reverse map types are described below: |
70 | 76 | ||
71 | ==== Linear ==== | 77 | Linear |
72 | irq_domain_add_linear() | 78 | ------ |
73 | irq_domain_create_linear() | 79 | |
80 | :: | ||
81 | |||
82 | irq_domain_add_linear() | ||
83 | irq_domain_create_linear() | ||
74 | 84 | ||
75 | The linear reverse map maintains a fixed size table indexed by the | 85 | The linear reverse map maintains a fixed size table indexed by the |
76 | hwirq number. When a hwirq is mapped, an irq_desc is allocated for | 86 | hwirq number. When a hwirq is mapped, an irq_desc is allocated for |
@@ -89,9 +99,13 @@ accepts a more general abstraction 'struct fwnode_handle'. | |||
89 | 99 | ||
90 | The majority of drivers should use the linear map. | 100 | The majority of drivers should use the linear map. |
91 | 101 | ||
92 | ==== Tree ==== | 102 | Tree |
93 | irq_domain_add_tree() | 103 | ---- |
94 | irq_domain_create_tree() | 104 | |
105 | :: | ||
106 | |||
107 | irq_domain_add_tree() | ||
108 | irq_domain_create_tree() | ||
95 | 109 | ||
96 | The irq_domain maintains a radix tree map from hwirq numbers to Linux | 110 | The irq_domain maintains a radix tree map from hwirq numbers to Linux |
97 | IRQs. When an hwirq is mapped, an irq_desc is allocated and the | 111 | IRQs. When an hwirq is mapped, an irq_desc is allocated and the |
@@ -109,8 +123,12 @@ accepts a more general abstraction 'struct fwnode_handle'. | |||
109 | 123 | ||
110 | Very few drivers should need this mapping. | 124 | Very few drivers should need this mapping. |
111 | 125 | ||
112 | ==== No Map ===- | 126 | No Map |
113 | irq_domain_add_nomap() | 127 | ------ |
128 | |||
129 | :: | ||
130 | |||
131 | irq_domain_add_nomap() | ||
114 | 132 | ||
115 | The No Map mapping is to be used when the hwirq number is | 133 | The No Map mapping is to be used when the hwirq number is |
116 | programmable in the hardware. In this case it is best to program the | 134 | programmable in the hardware. In this case it is best to program the |
@@ -121,10 +139,14 @@ Linux IRQ number into the hardware. | |||
121 | 139 | ||
122 | Most drivers cannot use this mapping. | 140 | Most drivers cannot use this mapping. |
123 | 141 | ||
124 | ==== Legacy ==== | 142 | Legacy |
125 | irq_domain_add_simple() | 143 | ------ |
126 | irq_domain_add_legacy() | 144 | |
127 | irq_domain_add_legacy_isa() | 145 | :: |
146 | |||
147 | irq_domain_add_simple() | ||
148 | irq_domain_add_legacy() | ||
149 | irq_domain_add_legacy_isa() | ||
128 | 150 | ||
129 | The Legacy mapping is a special case for drivers that already have a | 151 | The Legacy mapping is a special case for drivers that already have a |
130 | range of irq_descs allocated for the hwirqs. It is used when the | 152 | range of irq_descs allocated for the hwirqs. It is used when the |
@@ -163,14 +185,17 @@ that the driver using the simple domain call irq_create_mapping() | |||
163 | before any irq_find_mapping() since the latter will actually work | 185 | before any irq_find_mapping() since the latter will actually work |
164 | for the static IRQ assignment case. | 186 | for the static IRQ assignment case. |
165 | 187 | ||
166 | ==== Hierarchy IRQ domain ==== | 188 | Hierarchy IRQ domain |
189 | -------------------- | ||
190 | |||
167 | On some architectures, there may be multiple interrupt controllers | 191 | On some architectures, there may be multiple interrupt controllers |
168 | involved in delivering an interrupt from the device to the target CPU. | 192 | involved in delivering an interrupt from the device to the target CPU. |
169 | Let's look at a typical interrupt delivery path on x86 platforms: | 193 | Let's look at a typical interrupt delivery path on x86 platforms:: |
170 | 194 | ||
171 | Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU | 195 | Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU |
172 | 196 | ||
173 | There are three interrupt controllers involved: | 197 | There are three interrupt controllers involved: |
198 | |||
174 | 1) IOAPIC controller | 199 | 1) IOAPIC controller |
175 | 2) Interrupt remapping controller | 200 | 2) Interrupt remapping controller |
176 | 3) Local APIC controller | 201 | 3) Local APIC controller |
@@ -180,7 +205,8 @@ hardware architecture, an irq_domain data structure is built for each | |||
180 | interrupt controller and those irq_domains are organized into hierarchy. | 205 | interrupt controller and those irq_domains are organized into hierarchy. |
181 | When building irq_domain hierarchy, the irq_domain near to the device is | 206 | When building irq_domain hierarchy, the irq_domain near to the device is |
182 | child and the irq_domain near to CPU is parent. So a hierarchy structure | 207 | child and the irq_domain near to CPU is parent. So a hierarchy structure |
183 | as below will be built for the example above. | 208 | as below will be built for the example above:: |
209 | |||
184 | CPU Vector irq_domain (root irq_domain to manage CPU vectors) | 210 | CPU Vector irq_domain (root irq_domain to manage CPU vectors) |
185 | ^ | 211 | ^ |
186 | | | 212 | | |
@@ -190,6 +216,7 @@ as below will be built for the example above. | |||
190 | IOAPIC irq_domain (manage IOAPIC delivery entries/pins) | 216 | IOAPIC irq_domain (manage IOAPIC delivery entries/pins) |
191 | 217 | ||
192 | There are four major interfaces to use hierarchy irq_domain: | 218 | There are four major interfaces to use hierarchy irq_domain: |
219 | |||
193 | 1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt | 220 | 1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt |
194 | controller related resources to deliver these interrupts. | 221 | controller related resources to deliver these interrupts. |
195 | 2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller | 222 | 2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller |
@@ -199,7 +226,8 @@ There are four major interfaces to use hierarchy irq_domain: | |||
199 | 4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware | 226 | 4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware |
200 | to stop delivering the interrupt. | 227 | to stop delivering the interrupt. |
201 | 228 | ||
202 | Following changes are needed to support hierarchy irq_domain. | 229 | Following changes are needed to support hierarchy irq_domain: |
230 | |||
203 | 1) a new field 'parent' is added to struct irq_domain; it's used to | 231 | 1) a new field 'parent' is added to struct irq_domain; it's used to |
204 | maintain irq_domain hierarchy information. | 232 | maintain irq_domain hierarchy information. |
205 | 2) a new field 'parent_data' is added to struct irq_data; it's used to | 233 | 2) a new field 'parent_data' is added to struct irq_data; it's used to |
@@ -223,6 +251,7 @@ software architecture. | |||
223 | 251 | ||
224 | For an interrupt controller driver to support hierarchy irq_domain, it | 252 | For an interrupt controller driver to support hierarchy irq_domain, it |
225 | needs to: | 253 | needs to: |
254 | |||
226 | 1) Implement irq_domain_ops.alloc and irq_domain_ops.free | 255 | 1) Implement irq_domain_ops.alloc and irq_domain_ops.free |
227 | 2) Optionally implement irq_domain_ops.activate and | 256 | 2) Optionally implement irq_domain_ops.activate and |
228 | irq_domain_ops.deactivate. | 257 | irq_domain_ops.deactivate. |
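The linear reverse map this file describes is, at its core, a fixed-size table indexed by hwirq number, populated on first mapping and consulted by lookups. A userspace sketch of that idea follows; the names and the simplistic virq allocator are illustrative, not the kernel's actual `irq_domain` implementation:

```c
#include <assert.h>

#define NR_HWIRQS 32

/* Linear reverse map: hwirq indexes directly into a fixed table of
 * Linux IRQ numbers; 0 means "not mapped yet". */
struct linear_domain {
	unsigned int revmap[NR_HWIRQS];
	unsigned int next_virq;	/* simplistic virq allocator */
};

/* Mirror of irq_create_mapping(): allocate on first use. */
static unsigned int create_mapping(struct linear_domain *d, unsigned int hwirq)
{
	if (hwirq >= NR_HWIRQS)
		return 0;			/* out of range */
	if (!d->revmap[hwirq])
		d->revmap[hwirq] = d->next_virq++;
	return d->revmap[hwirq];
}

/* Mirror of irq_find_mapping(): pure table lookup, no allocation. */
static unsigned int find_mapping(const struct linear_domain *d,
				 unsigned int hwirq)
{
	return hwirq < NR_HWIRQS ? d->revmap[hwirq] : 0;
}
```

This also illustrates the caveat the text raises for simple domains: `find_mapping()` returns 0 until `create_mapping()` has run for that hwirq, so mappings must be created before they are looked up.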
diff --git a/Documentation/IRQ.txt b/Documentation/IRQ.txt index 1011e7175021..4273806a606b 100644 --- a/Documentation/IRQ.txt +++ b/Documentation/IRQ.txt | |||
@@ -1,4 +1,6 @@ | |||
1 | =============== | ||
1 | What is an IRQ? | 2 | What is an IRQ? |
3 | =============== | ||
2 | 4 | ||
3 | An IRQ is an interrupt request from a device. | 5 | An IRQ is an interrupt request from a device. |
4 | Currently they can come in over a pin, or over a packet. | 6 | Currently they can come in over a pin, or over a packet. |
diff --git a/Documentation/Intel-IOMMU.txt b/Documentation/Intel-IOMMU.txt index 49585b6e1ea2..9dae6b47e398 100644 --- a/Documentation/Intel-IOMMU.txt +++ b/Documentation/Intel-IOMMU.txt | |||
@@ -1,3 +1,4 @@ | |||
1 | =================== | ||
1 | Linux IOMMU Support | 2 | Linux IOMMU Support |
2 | =================== | 3 | =================== |
3 | 4 | ||
@@ -9,11 +10,11 @@ This guide gives a quick cheat sheet for some basic understanding. | |||
9 | 10 | ||
10 | Some Keywords | 11 | Some Keywords |
11 | 12 | ||
12 | DMAR - DMA remapping | 13 | - DMAR - DMA remapping |
13 | DRHD - DMA Remapping Hardware Unit Definition | 14 | - DRHD - DMA Remapping Hardware Unit Definition |
14 | RMRR - Reserved memory Region Reporting Structure | 15 | - RMRR - Reserved memory Region Reporting Structure |
15 | ZLR - Zero length reads from PCI devices | 16 | - ZLR - Zero length reads from PCI devices |
16 | IOVA - IO Virtual address. | 17 | - IOVA - IO Virtual address. |
17 | 18 | ||
18 | Basic stuff | 19 | Basic stuff |
19 | ----------- | 20 | ----------- |
@@ -33,7 +34,7 @@ devices that need to access these regions. OS is expected to setup | |||
33 | unity mappings for these regions for these devices to access these regions. | 34 | unity mappings for these regions for these devices to access these regions. |
34 | 35 | ||
35 | How is IOVA generated? | 36 | How is IOVA generated? |
36 | --------------------- | 37 | ---------------------- |
37 | 38 | ||
38 | Well-behaved drivers call pci_map_*() before sending commands to a device | 39 | Well-behaved drivers call pci_map_*() before sending commands to a device |
39 | that needs to perform DMA. Once DMA is completed and mapping is no longer | 40 | that needs to perform DMA. Once DMA is completed and mapping is no longer |
@@ -82,14 +83,14 @@ in ACPI. | |||
82 | ACPI: DMAR (v001 A M I OEMDMAR 0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0 | 83 | ACPI: DMAR (v001 A M I OEMDMAR 0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0 |
83 | 84 | ||
84 | When DMAR is being processed and initialized by ACPI, prints DMAR locations | 85 | When DMAR is being processed and initialized by ACPI, prints DMAR locations |
85 | and any RMRR's processed. | 86 | and any RMRR's processed:: |
86 | 87 | ||
87 | ACPI DMAR:Host address width 36 | 88 | ACPI DMAR:Host address width 36 |
88 | ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000 | 89 | ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000 |
89 | ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000 | 90 | ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed91000 |
90 | ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000 | 91 | ACPI DMAR:DRHD (flags: 0x00000001)base: 0x00000000fed93000 |
91 | ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff | 92 | ACPI DMAR:RMRR base: 0x00000000000ed000 end: 0x00000000000effff |
92 | ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff | 93 | ACPI DMAR:RMRR base: 0x000000007f600000 end: 0x000000007fffffff |
93 | 94 | ||
94 | When DMAR is enabled for use, you will notice.. | 95 | When DMAR is enabled for use, you will notice.. |
95 | 96 | ||
@@ -98,10 +99,12 @@ PCI-DMA: Using DMAR IOMMU | |||
98 | Fault reporting | 99 | Fault reporting |
99 | --------------- | 100 | --------------- |
100 | 101 | ||
101 | DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000 | 102 | :: |
102 | DMAR:[fault reason 05] PTE Write access is not set | 103 | |
103 | DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000 | 104 | DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000 |
104 | DMAR:[fault reason 05] PTE Write access is not set | 105 | DMAR:[fault reason 05] PTE Write access is not set |
106 | DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000 | ||
107 | DMAR:[fault reason 05] PTE Write access is not set | ||
105 | 108 | ||
106 | TBD | 109 | TBD |
107 | ---- | 110 | ---- |
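
The DMAR fault messages quoted above follow a fixed format. Purely as an illustration (operating on the sample lines from this document, not on live dmesg output), the faulting device and address can be extracted with a one-line sed script:

```shell
# Extract the device BDF and fault address from DMAR fault lines.
# Input is the sample output quoted above, not live dmesg.
printf '%s\n' \
  'DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000' \
  'DMAR:[fault reason 05] PTE Write access is not set' |
sed -n 's/^DMAR:\[DMA Write\] Request device \[\(.*\)\] fault addr \(.*\)$/\1 \2/p'
# prints: 00:02.0 6df084000
```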
diff --git a/Documentation/SAK.txt b/Documentation/SAK.txt index 74be14679ed8..260e1d3687bd 100644 --- a/Documentation/SAK.txt +++ b/Documentation/SAK.txt | |||
@@ -1,5 +1,9 @@ | |||
1 | Linux 2.4.2 Secure Attention Key (SAK) handling | 1 | ========================================= |
2 | 18 March 2001, Andrew Morton | 2 | Linux Secure Attention Key (SAK) handling |
3 | ========================================= | ||
4 | |||
5 | :Date: 18 March 2001 | ||
6 | :Author: Andrew Morton | ||
3 | 7 | ||
4 | An operating system's Secure Attention Key is a security tool which is | 8 | An operating system's Secure Attention Key is a security tool which is |
5 | provided as protection against trojan password capturing programs. It | 9 | provided as protection against trojan password capturing programs. It |
@@ -13,7 +17,7 @@ this sequence. It is only available if the kernel was compiled with | |||
13 | sysrq support. | 17 | sysrq support. |
14 | 18 | ||
15 | The proper way of generating a SAK is to define the key sequence using | 19 | The proper way of generating a SAK is to define the key sequence using |
16 | `loadkeys'. This will work whether or not sysrq support is compiled | 20 | ``loadkeys``. This will work whether or not sysrq support is compiled |
17 | into the kernel. | 21 | into the kernel. |
18 | 22 | ||
19 | SAK works correctly when the keyboard is in raw mode. This means that | 23 | SAK works correctly when the keyboard is in raw mode. This means that |
@@ -25,64 +29,63 @@ What key sequence should you use? Well, CTRL-ALT-DEL is used to reboot | |||
25 | the machine. CTRL-ALT-BACKSPACE is magical to the X server. We'll | 29 | the machine. CTRL-ALT-BACKSPACE is magical to the X server. We'll |
26 | choose CTRL-ALT-PAUSE. | 30 | choose CTRL-ALT-PAUSE. |
27 | 31 | ||
28 | In your rc.sysinit (or rc.local) file, add the command | 32 | In your rc.sysinit (or rc.local) file, add the command:: |
29 | 33 | ||
30 | echo "control alt keycode 101 = SAK" | /bin/loadkeys | 34 | echo "control alt keycode 101 = SAK" | /bin/loadkeys |
31 | 35 | ||
32 | And that's it! Only the superuser may reprogram the SAK key. | 36 | And that's it! Only the superuser may reprogram the SAK key. |
33 | 37 | ||
34 | 38 | ||
35 | NOTES | 39 | .. note:: |
36 | ===== | ||
37 | 40 | ||
38 | 1: Linux SAK is said not to be a "true SAK" as is required by | 41 | 1. Linux SAK is said not to be a "true SAK" as is required by |
39 | systems which implement C2 level security. This author does not | 42 | systems which implement C2 level security. This author does not |
40 | know why. | 43 | know why. |
41 | 44 | ||
42 | 45 | ||
43 | 2: On the PC keyboard, SAK kills all applications which have | 46 | 2. On the PC keyboard, SAK kills all applications which have |
44 | /dev/console opened. | 47 | /dev/console opened. |
45 | 48 | ||
46 | Unfortunately this includes a number of things which you don't | 49 | Unfortunately this includes a number of things which you don't |
47 | actually want killed. This is because these applications are | 50 | actually want killed. This is because these applications are |
48 | incorrectly holding /dev/console open. Be sure to complain to your | 51 | incorrectly holding /dev/console open. Be sure to complain to your |
49 | Linux distributor about this! | 52 | Linux distributor about this! |
50 | 53 | ||
51 | You can identify processes which will be killed by SAK with the | 54 | You can identify processes which will be killed by SAK with the |
52 | command | 55 | command:: |
53 | 56 | ||
54 | # ls -l /proc/[0-9]*/fd/* | grep console | 57 | # ls -l /proc/[0-9]*/fd/* | grep console |
55 | l-wx------ 1 root root 64 Mar 18 00:46 /proc/579/fd/0 -> /dev/console | 58 | l-wx------ 1 root root 64 Mar 18 00:46 /proc/579/fd/0 -> /dev/console |
56 | 59 | ||
57 | Then: | 60 | Then:: |
58 | 61 | ||
59 | # ps aux|grep 579 | 62 | # ps aux|grep 579 |
60 | root 579 0.0 0.1 1088 436 ? S 00:43 0:00 gpm -t ps/2 | 63 | root 579 0.0 0.1 1088 436 ? S 00:43 0:00 gpm -t ps/2 |
61 | 64 | ||
62 | So `gpm' will be killed by SAK. This is a bug in gpm. It should | 65 | So ``gpm`` will be killed by SAK. This is a bug in gpm. It should |
63 | be closing standard input. You can work around this by finding the | 66 | be closing standard input. You can work around this by finding the |
64 | initscript which launches gpm and changing it thusly: | 67 | initscript which launches gpm and changing it thusly: |
65 | 68 | ||
66 | Old: | 69 | Old:: |
67 | 70 | ||
68 | daemon gpm | 71 | daemon gpm |
69 | 72 | ||
70 | New: | 73 | New:: |
71 | 74 | ||
72 | daemon gpm < /dev/null | 75 | daemon gpm < /dev/null |
73 | 76 | ||
74 | Vixie cron also seems to have this problem, and needs the same treatment. | 77 | Vixie cron also seems to have this problem, and needs the same treatment. |
75 | 78 | ||
76 | Also, one prominent Linux distribution has the following three | 79 | Also, one prominent Linux distribution has the following three |
77 | lines in its rc.sysinit and rc scripts: | 80 | lines in its rc.sysinit and rc scripts:: |
78 | 81 | ||
79 | exec 3<&0 | 82 | exec 3<&0 |
80 | exec 4>&1 | 83 | exec 4>&1 |
81 | exec 5>&2 | 84 | exec 5>&2 |
82 | 85 | ||
83 | These commands cause *all* daemons which are launched by the | 86 | These commands cause **all** daemons which are launched by the |
84 | initscripts to have file descriptors 3, 4 and 5 attached to | 87 | initscripts to have file descriptors 3, 4 and 5 attached to |
85 | /dev/console. So SAK kills them all. A workaround is to simply | 88 | /dev/console. So SAK kills them all. A workaround is to simply |
86 | delete these lines, but this may cause system management | 89 | delete these lines, but this may cause system management |
87 | applications to malfunction - test everything well. | 90 | applications to malfunction - test everything well. |
88 | 91 | ||
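
The `daemon gpm < /dev/null` workaround above works because the daemon's standard input no longer refers to /dev/console. A quick way to see the effect of such a redirection (assuming a Linux system with /proc mounted):

```shell
# With stdin redirected from /dev/null, the child process no longer
# holds the console open on fd 0; /proc/self/fd/0 shows where stdin
# actually points.
sh -c 'readlink /proc/self/fd/0' < /dev/null
# prints: /dev/null
```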
diff --git a/Documentation/SM501.txt b/Documentation/SM501.txt index 561826f82093..882507453ba4 100644 --- a/Documentation/SM501.txt +++ b/Documentation/SM501.txt | |||
@@ -1,7 +1,10 @@ | |||
1 | SM501 Driver | 1 | .. include:: <isonum.txt> |
2 | ============ | ||
3 | 2 | ||
4 | Copyright 2006, 2007 Simtec Electronics | 3 | ============ |
4 | SM501 Driver | ||
5 | ============ | ||
6 | |||
7 | :Copyright: |copy| 2006, 2007 Simtec Electronics | ||
5 | 8 | ||
6 | The Silicon Motion SM501 multimedia companion chip is a multifunction device | 9 | The Silicon Motion SM501 multimedia companion chip is a multifunction device |
7 | which may provide numerous interfaces including USB host controller, USB gadget, | 10 | which may provide numerous interfaces including USB host controller, USB gadget, |
diff --git a/Documentation/bcache.txt b/Documentation/bcache.txt index a9259b562d5c..c0ce64d75bbf 100644 --- a/Documentation/bcache.txt +++ b/Documentation/bcache.txt | |||
@@ -1,10 +1,15 @@ | |||
1 | ============================ | ||
2 | A block layer cache (bcache) | ||
3 | ============================ | ||
4 | |||
1 | Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be | 5 | Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be |
2 | nice if you could use them as cache... Hence bcache. | 6 | nice if you could use them as cache... Hence bcache. |
3 | 7 | ||
4 | Wiki and git repositories are at: | 8 | Wiki and git repositories are at: |
5 | http://bcache.evilpiepirate.org | 9 | |
6 | http://evilpiepirate.org/git/linux-bcache.git | 10 | - http://bcache.evilpiepirate.org |
7 | http://evilpiepirate.org/git/bcache-tools.git | 11 | - http://evilpiepirate.org/git/linux-bcache.git |
12 | - http://evilpiepirate.org/git/bcache-tools.git | ||
8 | 13 | ||
9 | It's designed around the performance characteristics of SSDs - it only allocates | 14 | It's designed around the performance characteristics of SSDs - it only allocates |
10 | in erase block sized buckets, and it uses a hybrid btree/log to track cached | 15 | in erase block sized buckets, and it uses a hybrid btree/log to track cached |
@@ -37,17 +42,19 @@ to be flushed. | |||
37 | 42 | ||
38 | Getting started: | 43 | Getting started: |
39 | You'll need make-bcache from the bcache-tools repository. Both the cache device | 44 | You'll need make-bcache from the bcache-tools repository. Both the cache device |
40 | and backing device must be formatted before use. | 45 | and backing device must be formatted before use:: |
46 | |||
41 | make-bcache -B /dev/sdb | 47 | make-bcache -B /dev/sdb |
42 | make-bcache -C /dev/sdc | 48 | make-bcache -C /dev/sdc |
43 | 49 | ||
44 | make-bcache has the ability to format multiple devices at the same time - if | 50 | make-bcache has the ability to format multiple devices at the same time - if |
45 | you format your backing devices and cache device at the same time, you won't | 51 | you format your backing devices and cache device at the same time, you won't |
46 | have to manually attach: | 52 | have to manually attach:: |
53 | |||
47 | make-bcache -B /dev/sda /dev/sdb -C /dev/sdc | 54 | make-bcache -B /dev/sda /dev/sdb -C /dev/sdc |
48 | 55 | ||
49 | bcache-tools now ships udev rules, and bcache devices are known to the kernel | 56 | bcache-tools now ships udev rules, and bcache devices are known to the kernel |
50 | immediately. Without udev, you can manually register devices like this: | 57 | immediately. Without udev, you can manually register devices like this:: |
51 | 58 | ||
52 | echo /dev/sdb > /sys/fs/bcache/register | 59 | echo /dev/sdb > /sys/fs/bcache/register |
53 | echo /dev/sdc > /sys/fs/bcache/register | 60 | echo /dev/sdc > /sys/fs/bcache/register |
@@ -60,16 +67,16 @@ slow devices as bcache backing devices without a cache, and you can choose to ad | |||
60 | a caching device later. | 67 | a caching device later. |
61 | See 'ATTACHING' section below. | 68 | See 'ATTACHING' section below. |
62 | 69 | ||
63 | The devices show up as: | 70 | The devices show up as:: |
64 | 71 | ||
65 | /dev/bcache<N> | 72 | /dev/bcache<N> |
66 | 73 | ||
67 | As well as (with udev): | 74 | As well as (with udev):: |
68 | 75 | ||
69 | /dev/bcache/by-uuid/<uuid> | 76 | /dev/bcache/by-uuid/<uuid> |
70 | /dev/bcache/by-label/<label> | 77 | /dev/bcache/by-label/<label> |
71 | 78 | ||
72 | To get started: | 79 | To get started:: |
73 | 80 | ||
74 | mkfs.ext4 /dev/bcache0 | 81 | mkfs.ext4 /dev/bcache0 |
75 | mount /dev/bcache0 /mnt | 82 | mount /dev/bcache0 /mnt |
@@ -81,13 +88,13 @@ Cache devices are managed as sets; multiple caches per set isn't supported yet | |||
81 | but will allow for mirroring of metadata and dirty data in the future. Your new | 88 | but will allow for mirroring of metadata and dirty data in the future. Your new |
82 | cache set shows up as /sys/fs/bcache/<UUID> | 89 | cache set shows up as /sys/fs/bcache/<UUID> |
83 | 90 | ||
84 | ATTACHING | 91 | Attaching |
85 | --------- | 92 | --------- |
86 | 93 | ||
87 | After your cache device and backing device are registered, the backing device | 94 | After your cache device and backing device are registered, the backing device |
88 | must be attached to your cache set to enable caching. Attaching a backing | 95 | must be attached to your cache set to enable caching. Attaching a backing |
89 | device to a cache set is done thusly, with the UUID of the cache set in | 96 | device to a cache set is done thusly, with the UUID of the cache set in |
90 | /sys/fs/bcache: | 97 | /sys/fs/bcache:: |
91 | 98 | ||
92 | echo <CSET-UUID> > /sys/block/bcache0/bcache/attach | 99 | echo <CSET-UUID> > /sys/block/bcache0/bcache/attach |
93 | 100 | ||
@@ -97,7 +104,7 @@ your bcache devices. If a backing device has data in a cache somewhere, the | |||
97 | important if you have writeback caching turned on. | 104 | important if you have writeback caching turned on. |
98 | 105 | ||
99 | If you're booting up and your cache device is gone and never coming back, you | 106 | If you're booting up and your cache device is gone and never coming back, you |
100 | can force run the backing device: | 107 | can force run the backing device:: |
101 | 108 | ||
102 | echo 1 > /sys/block/sdb/bcache/running | 109 | echo 1 > /sys/block/sdb/bcache/running |
103 | 110 | ||
@@ -110,7 +117,7 @@ but all the cached data will be invalidated. If there was dirty data in the | |||
110 | cache, don't expect the filesystem to be recoverable - you will have massive | 117 | cache, don't expect the filesystem to be recoverable - you will have massive |
111 | filesystem corruption, though ext4's fsck does work miracles. | 118 | filesystem corruption, though ext4's fsck does work miracles. |
112 | 119 | ||
113 | ERROR HANDLING | 120 | Error Handling |
114 | -------------- | 121 | -------------- |
115 | 122 | ||
116 | Bcache tries to transparently handle IO errors to/from the cache device without | 123 | Bcache tries to transparently handle IO errors to/from the cache device without |
@@ -134,25 +141,27 @@ the backing devices to passthrough mode. | |||
134 | read some of the dirty data, though. | 141 | read some of the dirty data, though. |
135 | 142 | ||
136 | 143 | ||
137 | HOWTO/COOKBOOK | 144 | Howto/cookbook |
138 | -------------- | 145 | -------------- |
139 | 146 | ||
140 | A) Starting a bcache with a missing caching device | 147 | A) Starting a bcache with a missing caching device |
141 | 148 | ||
142 | If registering the backing device doesn't help, it's already there; you just need | 149 | If registering the backing device doesn't help, it's already there; you just need |
143 | to force it to run without the cache: | 150 | to force it to run without the cache:: |
151 | |||
144 | host:~# echo /dev/sdb1 > /sys/fs/bcache/register | 152 | host:~# echo /dev/sdb1 > /sys/fs/bcache/register |
145 | [ 119.844831] bcache: register_bcache() error opening /dev/sdb1: device already registered | 153 | [ 119.844831] bcache: register_bcache() error opening /dev/sdb1: device already registered |
146 | 154 | ||
147 | Next, you try to register your caching device if it's present. However | 155 | Next, you try to register your caching device if it's present. However |
148 | if it's absent, or registration fails for some reason, you can still | 156 | if it's absent, or registration fails for some reason, you can still |
149 | start your bcache without its cache, like so: | 157 | start your bcache without its cache, like so:: |
158 | |||
150 | host:/sys/block/sdb/sdb1/bcache# echo 1 > running | 159 | host:/sys/block/sdb/sdb1/bcache# echo 1 > running |
151 | 160 | ||
152 | Note that this may cause data loss if you were running in writeback mode. | 161 | Note that this may cause data loss if you were running in writeback mode. |
153 | 162 | ||
154 | 163 | ||
155 | B) Bcache does not find its cache | 164 | B) Bcache does not find its cache:: |
156 | 165 | ||
157 | host:/sys/block/md5/bcache# echo 0226553a-37cf-41d5-b3ce-8b1e944543a8 > attach | 166 | host:/sys/block/md5/bcache# echo 0226553a-37cf-41d5-b3ce-8b1e944543a8 > attach |
158 | [ 1933.455082] bcache: bch_cached_dev_attach() Couldn't find uuid for md5 in set | 167 | [ 1933.455082] bcache: bch_cached_dev_attach() Couldn't find uuid for md5 in set |
@@ -160,7 +169,8 @@ B) Bcache does not find its cache | |||
160 | [ 1933.478179] : cache set not found | 169 | [ 1933.478179] : cache set not found |
161 | 170 | ||
162 | In this case, the caching device was simply not registered at boot | 171 | In this case, the caching device was simply not registered at boot |
163 | or disappeared and came back, and needs to be (re-)registered: | 172 | or disappeared and came back, and needs to be (re-)registered:: |
173 | |||
164 | host:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register | 174 | host:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register |
165 | 175 | ||
166 | 176 | ||
@@ -180,7 +190,8 @@ device is still available at an 8KiB offset. So either via a loopdev | |||
180 | of the backing device created with --offset 8K, or any value defined by | 190 | of the backing device created with --offset 8K, or any value defined by |
181 | --data-offset when you originally formatted bcache with `make-bcache`. | 191 | --data-offset when you originally formatted bcache with `make-bcache`. |
182 | 192 | ||
183 | For example: | 193 | For example:: |
194 | |||
184 | losetup -o 8192 /dev/loop0 /dev/your_bcache_backing_dev | 195 | losetup -o 8192 /dev/loop0 /dev/your_bcache_backing_dev |
185 | 196 | ||
186 | This should present your unmodified backing device data in /dev/loop0 | 197 | This should present your unmodified backing device data in /dev/loop0 |
@@ -191,33 +202,38 @@ cache device without losing data. | |||
191 | 202 | ||
192 | E) Wiping a cache device | 203 | E) Wiping a cache device |
193 | 204 | ||
194 | host:~# wipefs -a /dev/sdh2 | 205 | :: |
195 | 16 bytes were erased at offset 0x1018 (bcache) | 206 | |
196 | they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81 | 207 | host:~# wipefs -a /dev/sdh2 |
208 | 16 bytes were erased at offset 0x1018 (bcache) | ||
209 | they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81 | ||
210 | |||
211 | After you boot back with bcache enabled, you recreate the cache and attach it:: | ||
197 | 212 | ||
198 | After you boot back with bcache enabled, you recreate the cache and attach it: | 213 | host:~# make-bcache -C /dev/sdh2 |
199 | host:~# make-bcache -C /dev/sdh2 | 214 | UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045 |
200 | UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045 | 215 | Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1 |
201 | Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1 | 216 | version: 0 |
202 | version: 0 | 217 | nbuckets: 106874 |
203 | nbuckets: 106874 | 218 | block_size: 1 |
204 | block_size: 1 | 219 | bucket_size: 1024 |
205 | bucket_size: 1024 | 220 | nr_in_set: 1 |
206 | nr_in_set: 1 | 221 | nr_this_dev: 0 |
207 | nr_this_dev: 0 | 222 | first_bucket: 1 |
208 | first_bucket: 1 | 223 | [ 650.511912] bcache: run_cache_set() invalidating existing data |
209 | [ 650.511912] bcache: run_cache_set() invalidating existing data | 224 | [ 650.549228] bcache: register_cache() registered cache device sdh2 |
210 | [ 650.549228] bcache: register_cache() registered cache device sdh2 | ||
211 | 225 | ||
212 | start backing device with missing cache: | 226 | start backing device with missing cache:: |
213 | host:/sys/block/md5/bcache# echo 1 > running | ||
214 | 227 | ||
215 | attach new cache: | 228 | host:/sys/block/md5/bcache# echo 1 > running |
216 | host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach | ||
217 | [ 865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1 | ||
218 | 229 | ||
230 | attach new cache:: | ||
219 | 231 | ||
220 | F) Remove or replace a caching device | 232 | host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach |
233 | [ 865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1 | ||
234 | |||
235 | |||
236 | F) Remove or replace a caching device:: | ||
221 | 237 | ||
222 | host:/sys/block/sda/sda7/bcache# echo 1 > detach | 238 | host:/sys/block/sda/sda7/bcache# echo 1 > detach |
223 | [ 695.872542] bcache: cached_dev_detach_finish() Caching disabled for sda7 | 239 | [ 695.872542] bcache: cached_dev_detach_finish() Caching disabled for sda7 |
@@ -226,13 +242,15 @@ F) Remove or replace a caching device | |||
226 | wipefs: error: /dev/nvme0n1p4: probing initialization failed: Device or resource busy | 242 | wipefs: error: /dev/nvme0n1p4: probing initialization failed: Device or resource busy |
227 | Ooops, it's disabled, but not unregistered, so it's still protected | 243 | Ooops, it's disabled, but not unregistered, so it's still protected |
228 | 244 | ||
229 | We need to go and unregister it: | 245 | We need to go and unregister it:: |
246 | |||
230 | host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# ls -l cache0 | 247 | host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# ls -l cache0 |
231 | lrwxrwxrwx 1 root root 0 Feb 25 18:33 cache0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:70:00.0/nvme/nvme0/nvme0n1/nvme0n1p4/bcache/ | 248 | lrwxrwxrwx 1 root root 0 Feb 25 18:33 cache0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:70:00.0/nvme/nvme0/nvme0n1/nvme0n1p4/bcache/ |
232 | host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# echo 1 > stop | 249 | host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# echo 1 > stop |
233 | kernel: [ 917.041908] bcache: cache_set_free() Cache set b7ba27a1-2398-4649-8ae3-0959f57ba128 unregistered | 250 | kernel: [ 917.041908] bcache: cache_set_free() Cache set b7ba27a1-2398-4649-8ae3-0959f57ba128 unregistered |
234 | 251 | ||
235 | Now we can wipe it: | 252 | Now we can wipe it:: |
253 | |||
236 | host:~# wipefs -a /dev/nvme0n1p4 | 254 | host:~# wipefs -a /dev/nvme0n1p4 |
237 | /dev/nvme0n1p4: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81 | 255 | /dev/nvme0n1p4: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81 |
238 | 256 | ||
@@ -252,40 +270,44 @@ if there are any active backing or caching devices left on it: | |||
252 | 270 | ||
253 | 1) Is it present in /dev/bcache* ? (there are times when it won't be) | 271 | 1) Is it present in /dev/bcache* ? (there are times when it won't be) |
254 | 272 | ||
255 | If so, it's easy: | 273 | If so, it's easy:: |
274 | |||
256 | host:/sys/block/bcache0/bcache# echo 1 > stop | 275 | host:/sys/block/bcache0/bcache# echo 1 > stop |
257 | 276 | ||
258 | 2) But if your backing device is gone, this won't work: | 277 | 2) But if your backing device is gone, this won't work:: |
278 | |||
259 | host:/sys/block/bcache0# cd bcache | 279 | host:/sys/block/bcache0# cd bcache |
260 | bash: cd: bcache: No such file or directory | 280 | bash: cd: bcache: No such file or directory |
261 | 281 | ||
262 | In this case, you may have to unregister the dmcrypt block device that | 282 | In this case, you may have to unregister the dmcrypt block device that |
263 | references this bcache to free it up: | 283 | references this bcache to free it up:: |
284 | |||
264 | host:~# dmsetup remove oldds1 | 285 | host:~# dmsetup remove oldds1 |
265 | bcache: bcache_device_free() bcache0 stopped | 286 | bcache: bcache_device_free() bcache0 stopped |
266 | bcache: cache_set_free() Cache set 5bc072a8-ab17-446d-9744-e247949913c1 unregistered | 287 | bcache: cache_set_free() Cache set 5bc072a8-ab17-446d-9744-e247949913c1 unregistered |
267 | 288 | ||
268 | This causes the backing bcache to be removed from /sys/fs/bcache and | 289 | This causes the backing bcache to be removed from /sys/fs/bcache and |
269 | then it can be reused. This would be true of any block device stacking | 290 | then it can be reused. This would be true of any block device stacking |
270 | where bcache is a lower device. | 291 | where bcache is a lower device. |
292 | |||
293 | 3) In other cases, you can also look in /sys/fs/bcache/:: | ||
271 | 294 | ||
272 | 3) In other cases, you can also look in /sys/fs/bcache/: | 295 | host:/sys/fs/bcache# ls -l */{cache?,bdev?} |
296 | lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/ | ||
297 | lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/cache0 -> ../../../devices/virtual/block/dm-4/bcache/ | ||
298 | lrwxrwxrwx 1 root root 0 Mar 5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/ | ||
273 | 299 | ||
274 | host:/sys/fs/bcache# ls -l */{cache?,bdev?} | 300 | The device names will show which UUID is relevant, cd in that directory |
275 | lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/ | 301 | and stop the cache:: |
276 | lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/cache0 -> ../../../devices/virtual/block/dm-4/bcache/ | ||
277 | lrwxrwxrwx 1 root root 0 Mar 5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/ | ||
278 | 302 | ||
279 | The device names will show which UUID is relevant, cd in that directory | ||
280 | and stop the cache: | ||
281 | host:/sys/fs/bcache/5bc072a8-ab17-446d-9744-e247949913c1# echo 1 > stop | 303 | host:/sys/fs/bcache/5bc072a8-ab17-446d-9744-e247949913c1# echo 1 > stop |
282 | 304 | ||
283 | This will free up bcache references and let you reuse the partition for | 305 | This will free up bcache references and let you reuse the partition for |
284 | other purposes. | 306 | other purposes. |
285 | 307 | ||
286 | 308 | ||
287 | 309 | ||
288 | TROUBLESHOOTING PERFORMANCE | 310 | Troubleshooting performance |
289 | --------------------------- | 311 | --------------------------- |
290 | 312 | ||
291 | Bcache has a bunch of config options and tunables. The defaults are intended to | 313 | Bcache has a bunch of config options and tunables. The defaults are intended to |
@@ -301,11 +323,13 @@ want for getting the best possible numbers when benchmarking. | |||
301 | raid stripe size to get the disk multiples that you would like. | 323 | raid stripe size to get the disk multiples that you would like. |
302 | 324 | ||
303 | For example: If you have a 64k stripe size, then the following offset | 325 | For example: If you have a 64k stripe size, then the following offset |
304 | would provide alignment for many common RAID5 data spindle counts: | 326 | would provide alignment for many common RAID5 data spindle counts:: |
327 | |||
305 | 64k * 2*2*2*3*3*5*7 bytes = 161280k | 328 | 64k * 2*2*2*3*3*5*7 bytes = 161280k |
306 | 329 | ||
307 | That space is wasted, but for only 157.5MB you can grow your RAID 5 | 330 | That space is wasted, but for only 157.5MB you can grow your RAID 5 |
308 | volume to the following data-spindle counts without re-aligning: | 331 | volume to the following data-spindle counts without re-aligning:: |
332 | |||
309 | 3,4,5,6,7,8,9,10,12,14,15,18,20,21 ... | 333 | 3,4,5,6,7,8,9,10,12,14,15,18,20,21 ... |
310 | 334 | ||
311 | - Bad write performance | 335 | - Bad write performance |
@@ -313,9 +337,9 @@ want for getting the best possible numbers when benchmarking. | |||
313 | If write performance is not what you expected, you probably wanted to be | 337 | If write performance is not what you expected, you probably wanted to be |
314 | running in writeback mode, which isn't the default (not due to a lack of | 338 | running in writeback mode, which isn't the default (not due to a lack of |
315 | maturity, but simply because in writeback mode you'll lose data if something | 339 | maturity, but simply because in writeback mode you'll lose data if something |
316 | happens to your SSD) | 340 | happens to your SSD):: |
317 | 341 | ||
318 | # echo writeback > /sys/block/bcache0/bcache/cache_mode | 342 | # echo writeback > /sys/block/bcache0/bcache/cache_mode |
319 | 343 | ||
320 | - Bad performance, or traffic not going to the SSD that you'd expect | 344 | - Bad performance, or traffic not going to the SSD that you'd expect |
321 | 345 | ||
@@ -325,13 +349,13 @@ want for getting the best possible numbers when benchmarking. | |||
325 | accessed data out of your cache. | 349 | accessed data out of your cache. |
326 | 350 | ||
327 | But if you want to benchmark reads from cache, and you start out with fio | 351 | But if you want to benchmark reads from cache, and you start out with fio |
328 | writing an 8 gigabyte test file - so you want to disable that. | 352 | writing an 8 gigabyte test file - so you want to disable that:: |
329 | 353 | ||
330 | # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff | 354 | # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff |
331 | 355 | ||
332 | To set it back to the default (4 mb), do | 356 | To set it back to the default (4 mb), do:: |
333 | 357 | ||
334 | # echo 4M > /sys/block/bcache0/bcache/sequential_cutoff | 358 | # echo 4M > /sys/block/bcache0/bcache/sequential_cutoff |
335 | 359 | ||
336 | - Traffic's still going to the spindle/still getting cache misses | 360 | - Traffic's still going to the spindle/still getting cache misses |
337 | 361 | ||
@@ -344,10 +368,10 @@ want for getting the best possible numbers when benchmarking. | |||
344 | throttles traffic if the latency exceeds a threshold (it does this by | 368 | throttles traffic if the latency exceeds a threshold (it does this by |
345 | cranking down the sequential bypass). | 369 | cranking down the sequential bypass). |
346 | 370 | ||
347 | You can disable this if you need to by setting the thresholds to 0: | 371 | You can disable this if you need to by setting the thresholds to 0:: |
348 | 372 | ||
349 | # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us | 373 | # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us |
350 | # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us | 374 | # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us |
351 | 375 | ||
352 | The default is 2000 us (2 milliseconds) for reads, and 20000 for writes. | 376 | The default is 2000 us (2 milliseconds) for reads, and 20000 for writes. |
353 | 377 | ||
@@ -369,7 +393,7 @@ want for getting the best possible numbers when benchmarking. | |||
369 | a fix for the issue there). | 393 | a fix for the issue there). |
370 | 394 | ||
371 | 395 | ||
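
The stripe-alignment arithmetic in the troubleshooting list above (64k * 2*2*2*3*3*5*7 = 161280k) is easy to sanity-check: the data-spindle counts listed there are exactly the divisors of 2520 in that range. A small sketch:

```shell
# 64k * (2*2*2*3*3*5*7) = 161280k; any spindle count that divides 2520
# keeps that offset aligned to a whole number of stripes.
factor=$((2*2*2*3*3*5*7))          # 2520
echo "offset: $((64 * factor))k"   # prints: offset: 161280k
n=3
while [ $n -le 21 ]; do
  [ $((factor % n)) -eq 0 ] && printf '%s ' $n
  n=$((n+1))
done
echo                               # 3 4 5 6 7 8 9 10 12 14 15 18 20 21
```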
372 | SYSFS - BACKING DEVICE | 396 | Sysfs - backing device |
373 | ---------------------- | 397 | ---------------------- |
374 | 398 | ||
375 | Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and | 399 | Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and |
@@ -454,7 +478,8 @@ writeback_running | |||
454 | still be added to the cache until it is mostly full; only meant for | 478 | still be added to the cache until it is mostly full; only meant for |
455 | benchmarking. Defaults to on. | 479 | benchmarking. Defaults to on. |
456 | 480 | ||
457 | SYSFS - BACKING DEVICE STATS: | 481 | Sysfs - backing device stats |
482 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
458 | 483 | ||
459 | There are directories with these numbers for a running total, as well as | 484 | There are directories with these numbers for a running total, as well as |
460 | versions that decay over the past day, hour and 5 minutes; they're also | 485 | versions that decay over the past day, hour and 5 minutes; they're also |
@@ -463,14 +488,11 @@ aggregated in the cache set directory as well. | |||
463 | bypassed | 488 | bypassed |
464 | Amount of IO (both reads and writes) that has bypassed the cache | 489 | Amount of IO (both reads and writes) that has bypassed the cache |
465 | 490 | ||
466 | cache_hits | 491 | cache_hits, cache_misses, cache_hit_ratio |
467 | cache_misses | ||
468 | cache_hit_ratio | ||
469 | Hits and misses are counted per individual IO as bcache sees them; a | 492 | Hits and misses are counted per individual IO as bcache sees them; a |
470 | partial hit is counted as a miss. | 493 | partial hit is counted as a miss. |
471 | 494 | ||
472 | cache_bypass_hits | 495 | cache_bypass_hits, cache_bypass_misses |
473 | cache_bypass_misses | ||
474 | Hits and misses for IO that is intended to skip the cache are still counted, | 496 | Hits and misses for IO that is intended to skip the cache are still counted, |
475 | but broken out here. | 497 | but broken out here. |
476 | 498 | ||
@@ -482,7 +504,8 @@ cache_miss_collisions | |||
482 | cache_readaheads | 504 | cache_readaheads |
483 | Count of times readahead occurred. | 505 | Count of times readahead occurred. |
484 | 506 | ||
485 | SYSFS - CACHE SET: | 507 | Sysfs - cache set |
508 | ~~~~~~~~~~~~~~~~~ | ||
486 | 509 | ||
487 | Available at /sys/fs/bcache/<cset-uuid> | 510 | Available at /sys/fs/bcache/<cset-uuid> |
488 | 511 | ||
@@ -520,8 +543,7 @@ flash_vol_create | |||
520 | Echoing a size to this file (in human readable units, k/M/G) creates a thinly | 543 | Echoing a size to this file (in human readable units, k/M/G) creates a thinly |
521 | provisioned volume backed by the cache set. | 544 | provisioned volume backed by the cache set. |
522 | 545 | ||
523 | io_error_halflife | 546 | io_error_halflife, io_error_limit |
524 | io_error_limit | ||
525 | These determine how many errors we accept before disabling the cache. | 547 | These determine how many errors we accept before disabling the cache. |
526 | Each error is decayed by the half-life (in # ios). If the decaying count | 548 | Each error is decayed by the half-life (in # ios). If the decaying count |
527 | reaches io_error_limit, dirty data is written out and the cache is disabled. | 549 | reaches io_error_limit, dirty data is written out and the cache is disabled. |
@@ -545,7 +567,8 @@ unregister | |||
545 | Detaches all backing devices and closes the cache devices; if dirty data is | 567 | Detaches all backing devices and closes the cache devices; if dirty data is |
546 | present it will disable writeback caching and wait for it to be flushed. | 568 | present it will disable writeback caching and wait for it to be flushed. |
547 | 569 | ||
548 | SYSFS - CACHE SET INTERNAL: | 570 | Sysfs - cache set internal |
571 | ~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
549 | 572 | ||
550 | This directory also exposes timings for a number of internal operations, with | 573 | This directory also exposes timings for a number of internal operations, with |
551 | separate files for average duration, average frequency, last occurrence and max | 574 | separate files for average duration, average frequency, last occurrence and max |
@@ -574,7 +597,8 @@ cache_read_races | |||
574 | trigger_gc | 597 | trigger_gc |
575 | Writing to this file forces garbage collection to run. | 598 | Writing to this file forces garbage collection to run. |
576 | 599 | ||
577 | SYSFS - CACHE DEVICE: | 600 | Sysfs - cache device |
601 | ~~~~~~~~~~~~~~~~~~~~ | ||
578 | 602 | ||
579 | Available at /sys/block/<cdev>/bcache | 603 | Available at /sys/block/<cdev>/bcache |
580 | 604 | ||
diff --git a/Documentation/bt8xxgpio.txt b/Documentation/bt8xxgpio.txt index d8297e4ebd26..a845feb074de 100644 --- a/Documentation/bt8xxgpio.txt +++ b/Documentation/bt8xxgpio.txt | |||
@@ -1,12 +1,8 @@ | |||
1 | =============================================================== | 1 | =================================================================== |
2 | == BT8XXGPIO driver == | 2 | A driver for a selfmade cheap BT8xx based PCI GPIO-card (bt8xxgpio) |
3 | == == | 3 | =================================================================== |
4 | == A driver for a selfmade cheap BT8xx based PCI GPIO-card == | ||
5 | == == | ||
6 | == For advanced documentation, see == | ||
7 | == http://www.bu3sch.de/btgpio.php == | ||
8 | =============================================================== | ||
9 | 4 | ||
5 | For advanced documentation, see http://www.bu3sch.de/btgpio.php | ||
10 | 6 | ||
11 | A generic digital 24-port PCI GPIO card can be built out of an ordinary | 7 | A generic digital 24-port PCI GPIO card can be built out of an ordinary |
12 | Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The | 8 | Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The |
@@ -17,9 +13,8 @@ The bt8xx chip does have 24 digital GPIO ports. | |||
17 | These ports are accessible via 24 pins on the SMD chip package. | 13 | These ports are accessible via 24 pins on the SMD chip package. |
18 | 14 | ||
19 | 15 | ||
20 | ============================================== | 16 | How to physically access the GPIO pins |
21 | == How to physically access the GPIO pins == | 17 | ====================================== |
22 | ============================================== | ||
23 | 18 | ||
24 | There are several ways to access these pins. One might unsolder the whole chip | 19 | There are several ways to access these pins. One might unsolder the whole chip |
25 | and put it on a custom PCI board, or one might only unsolder each individual | 20 | and put it on a custom PCI board, or one might only unsolder each individual |
@@ -27,7 +22,7 @@ GPIO pin and solder that to some tiny wire. As the chip package really is tiny | |||
27 | there are some advanced soldering skills needed in any case. | 22 | there are some advanced soldering skills needed in any case. |
28 | 23 | ||
29 | The physical pinouts are drawn in the following ASCII art. | 24 | The physical pinouts are drawn in the following ASCII art. |
30 | The GPIO pins are marked with G00-G23 | 25 | The GPIO pins are marked with G00-G23:: |
31 | 26 | ||
32 | G G G G G G G G G G G G G G G G G G | 27 | G G G G G G G G G G G G G G G G G G |
33 | 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 | 28 | 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 |
diff --git a/Documentation/btmrvl.txt b/Documentation/btmrvl.txt index 34916a46c099..ec57740ead0c 100644 --- a/Documentation/btmrvl.txt +++ b/Documentation/btmrvl.txt | |||
@@ -1,18 +1,16 @@ | |||
1 | ======================================================================= | 1 | ============= |
2 | README for btmrvl driver | 2 | btmrvl driver |
3 | ======================================================================= | 3 | ============= |
4 | |||
5 | 4 | ||
6 | All commands are used via the debugfs interface. | 5 | All commands are used via the debugfs interface. |
7 | 6 | ||
8 | ===================== | 7 | Set/get driver configurations |
9 | Set/get driver configurations: | 8 | ============================= |
10 | 9 | ||
11 | Path: /debug/btmrvl/config/ | 10 | Path: /debug/btmrvl/config/ |
12 | 11 | ||
13 | gpiogap=[n] | 12 | gpiogap=[n], hscfgcmd |
14 | hscfgcmd | 13 | These commands are used to configure the host sleep parameters:: |
15 | These commands are used to configure the host sleep parameters. | ||
16 | bit 8:0 -- Gap | 14 | bit 8:0 -- Gap |
17 | bit 16:8 -- GPIO | 15 | bit 16:8 -- GPIO |
18 | 16 | ||
@@ -23,7 +21,8 @@ hscfgcmd | |||
23 | where Gap is the gap in milliseconds between wakeup signal and | 21 | where Gap is the gap in milliseconds between wakeup signal and |
24 | wakeup event, or 0xff for special host sleep setting. | 22 | wakeup event, or 0xff for special host sleep setting. |
25 | 23 | ||
26 | Usage: | 24 | Usage:: |
25 | |||
27 | # Use SDIO interface to wake up the host and set GAP to 0x80: | 26 | # Use SDIO interface to wake up the host and set GAP to 0x80: |
28 | echo 0xff80 > /debug/btmrvl/config/gpiogap | 27 | echo 0xff80 > /debug/btmrvl/config/gpiogap |
29 | echo 1 > /debug/btmrvl/config/hscfgcmd | 28 | echo 1 > /debug/btmrvl/config/hscfgcmd |
@@ -32,15 +31,16 @@ hscfgcmd | |||
32 | echo 0x03ff > /debug/btmrvl/config/gpiogap | 31 | echo 0x03ff > /debug/btmrvl/config/gpiogap |
33 | echo 1 > /debug/btmrvl/config/hscfgcmd | 32 | echo 1 > /debug/btmrvl/config/hscfgcmd |
34 | 33 | ||
35 | psmode=[n] | 34 | psmode=[n], pscmd |
36 | pscmd | ||
37 | These commands are used to enable/disable auto sleep mode | 35 | These commands are used to enable/disable auto sleep mode |
38 | 36 | ||
39 | where the option is: | 37 | where the option is:: |
38 | |||
40 | 1 -- Enable auto sleep mode | 39 | 1 -- Enable auto sleep mode |
41 | 0 -- Disable auto sleep mode | 40 | 0 -- Disable auto sleep mode |
42 | 41 | ||
43 | Usage: | 42 | Usage:: |
43 | |||
44 | # Enable auto sleep mode | 44 | # Enable auto sleep mode |
45 | echo 1 > /debug/btmrvl/config/psmode | 45 | echo 1 > /debug/btmrvl/config/psmode |
46 | echo 1 > /debug/btmrvl/config/pscmd | 46 | echo 1 > /debug/btmrvl/config/pscmd |
@@ -50,15 +50,16 @@ pscmd | |||
50 | echo 1 > /debug/btmrvl/config/pscmd | 50 | echo 1 > /debug/btmrvl/config/pscmd |
51 | 51 | ||
52 | 52 | ||
53 | hsmode=[n] | 53 | hsmode=[n], hscmd |
54 | hscmd | ||
55 | These commands are used to enable host sleep or wake up firmware | 54 | These commands are used to enable host sleep or wake up firmware |
56 | 55 | ||
57 | where the option is: | 56 | where the option is:: |
57 | |||
58 | 1 -- Enable host sleep | 58 | 1 -- Enable host sleep |
59 | 0 -- Wake up firmware | 59 | 0 -- Wake up firmware |
60 | 60 | ||
61 | Usage: | 61 | Usage:: |
62 | |||
62 | # Enable host sleep | 63 | # Enable host sleep |
63 | echo 1 > /debug/btmrvl/config/hsmode | 64 | echo 1 > /debug/btmrvl/config/hsmode |
64 | echo 1 > /debug/btmrvl/config/hscmd | 65 | echo 1 > /debug/btmrvl/config/hscmd |
@@ -68,12 +69,13 @@ hscmd | |||
68 | echo 1 > /debug/btmrvl/config/hscmd | 69 | echo 1 > /debug/btmrvl/config/hscmd |
69 | 70 | ||
70 | 71 | ||
71 | ====================== | 72 | Get driver status |
72 | Get driver status: | 73 | ================= |
73 | 74 | ||
74 | Path: /debug/btmrvl/status/ | 75 | Path: /debug/btmrvl/status/ |
75 | 76 | ||
76 | Usage: | 77 | Usage:: |
78 | |||
77 | cat /debug/btmrvl/status/<args> | 79 | cat /debug/btmrvl/status/<args> |
78 | 80 | ||
79 | where the args are: | 81 | where the args are: |
@@ -90,14 +92,17 @@ hsstate | |||
90 | txdnldrdy | 92 | txdnldrdy |
91 | This command displays the value of Tx download ready flag. | 93 | This command displays the value of Tx download ready flag. |
92 | 94 | ||
93 | 95 | Issuing a raw hci command | |
94 | ===================== | 96 | ========================= |
95 | 97 | ||
96 | Use hcitool to issue a raw HCI command; refer to the hcitool manual. | 98 | Use hcitool to issue a raw HCI command; refer to the hcitool manual. |
97 | 99 | ||
98 | Usage: Hcitool cmd <ogf> <ocf> [Parameters] | 100 | Usage:: |
101 | |||
102 | Hcitool cmd <ogf> <ocf> [Parameters] | ||
103 | |||
104 | Interface Control Command:: | ||
99 | 105 | ||
100 | Interface Control Command | ||
101 | hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00 --Enable All interface | 106 | hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00 --Enable All interface |
102 | hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01 --Enable Wlan interface | 107 | hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01 --Enable Wlan interface |
103 | hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02 --Enable BT interface | 108 | hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02 --Enable BT interface |
@@ -105,13 +110,13 @@ Use hcitool to issue raw hci command, refer to hcitool manual | |||
105 | hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01 --Disable Wlan interface | 110 | hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01 --Disable Wlan interface |
106 | hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02 --Disable BT interface | 111 | hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02 --Disable BT interface |
107 | 112 | ||
108 | ======================================================================= | 113 | SD8688 firmware |
109 | 114 | =============== | |
110 | 115 | ||
111 | SD8688 firmware: | 116 | Images: |
112 | 117 | ||
113 | /lib/firmware/sd8688_helper.bin | 118 | - /lib/firmware/sd8688_helper.bin |
114 | /lib/firmware/sd8688.bin | 119 | - /lib/firmware/sd8688.bin |
115 | 120 | ||
116 | 121 | ||
117 | The images can be downloaded from: | 122 | The images can be downloaded from: |
diff --git a/Documentation/bus-virt-phys-mapping.txt b/Documentation/bus-virt-phys-mapping.txt index 2bc55ff3b4d1..4bb07c2f3e7d 100644 --- a/Documentation/bus-virt-phys-mapping.txt +++ b/Documentation/bus-virt-phys-mapping.txt | |||
@@ -1,17 +1,27 @@ | |||
1 | [ NOTE: The virt_to_bus() and bus_to_virt() functions have been | 1 | ========================================================== |
2 | How to access I/O mapped memory from within device drivers | ||
3 | ========================================================== | ||
4 | |||
5 | :Author: Linus | ||
6 | |||
7 | .. warning:: | ||
8 | |||
9 | The virt_to_bus() and bus_to_virt() functions have been | ||
2 | superseded by the functionality provided by the PCI DMA interface | 10 | superseded by the functionality provided by the PCI DMA interface |
3 | (see Documentation/DMA-API-HOWTO.txt). They continue | 11 | (see Documentation/DMA-API-HOWTO.txt). They continue |
4 | to be documented below for historical purposes, but new code | 12 | to be documented below for historical purposes, but new code |
5 | must not use them. --davidm 00/12/12 ] | 13 | must not use them. --davidm 00/12/12 |
6 | 14 | ||
7 | [ This is a mail message in response to a query on IO mapping, thus the | 15 | :: |
8 | strange format for a "document" ] | 16 | |
17 | [ This is a mail message in response to a query on IO mapping, thus the | ||
18 | strange format for a "document" ] | ||
9 | 19 | ||
10 | The AHA-1542 is a bus-master device, and your patch makes the driver give the | 20 | The AHA-1542 is a bus-master device, and your patch makes the driver give the |
11 | controller the physical address of the buffers, which is correct on x86 | 21 | controller the physical address of the buffers, which is correct on x86 |
12 | (because all bus master devices see the physical memory mappings directly). | 22 | (because all bus master devices see the physical memory mappings directly). |
13 | 23 | ||
14 | However, on many setups, there are actually _three_ different ways of looking | 24 | However, on many setups, there are actually **three** different ways of looking |
15 | at memory addresses, and in this case we actually want the third, the | 25 | at memory addresses, and in this case we actually want the third, the |
16 | so-called "bus address". | 26 | so-called "bus address". |
17 | 27 | ||
@@ -38,7 +48,7 @@ because the memory and the devices share the same address space, and that is | |||
38 | not generally necessarily true on other PCI/ISA setups. | 48 | not generally necessarily true on other PCI/ISA setups. |
39 | 49 | ||
40 | Now, just as an example, on the PReP (PowerPC Reference Platform), the | 50 | Now, just as an example, on the PReP (PowerPC Reference Platform), the |
41 | CPU sees a memory map something like this (this is from memory): | 51 | CPU sees a memory map something like this (this is from memory):: |
42 | 52 | ||
43 | 0-2 GB "real memory" | 53 | 0-2 GB "real memory" |
44 | 2 GB-3 GB "system IO" (inb/out and similar accesses on x86) | 54 | 2 GB-3 GB "system IO" (inb/out and similar accesses on x86) |
@@ -52,7 +62,7 @@ So when the CPU wants any bus master to write to physical memory 0, it | |||
52 | has to give the master address 0x80000000 as the memory address. | 62 | has to give the master address 0x80000000 as the memory address. |
53 | 63 | ||
54 | So, for example, depending on how the kernel is actually mapped on the | 64 | So, for example, depending on how the kernel is actually mapped on the |
55 | PPC, you can end up with a setup like this: | 65 | PPC, you can end up with a setup like this:: |
56 | 66 | ||
57 | physical address: 0 | 67 | physical address: 0 |
58 | virtual address: 0xC0000000 | 68 | virtual address: 0xC0000000 |
@@ -61,7 +71,7 @@ PPC, you can end up with a setup like this: | |||
61 | where all the addresses actually point to the same thing. It's just seen | 71 | where all the addresses actually point to the same thing. It's just seen |
62 | through different translations.. | 72 | through different translations.. |
63 | 73 | ||
64 | Similarly, on the Alpha, the normal translation is | 74 | Similarly, on the Alpha, the normal translation is:: |
65 | 75 | ||
66 | physical address: 0 | 76 | physical address: 0 |
67 | virtual address: 0xfffffc0000000000 | 77 | virtual address: 0xfffffc0000000000 |
@@ -70,7 +80,7 @@ Similarly, on the Alpha, the normal translation is | |||
70 | (but there are also Alphas where the physical address and the bus address | 80 | (but there are also Alphas where the physical address and the bus address |
71 | are the same). | 81 | are the same). |
72 | 82 | ||
73 | Anyway, the way to look up all these translations, you do | 83 | Anyway, the way to look up all these translations, you do:: |
74 | 84 | ||
75 | #include <asm/io.h> | 85 | #include <asm/io.h> |
76 | 86 | ||
@@ -81,8 +91,8 @@ Anyway, the way to look up all these translations, you do | |||
81 | 91 | ||
82 | Now, when do you need these? | 92 | Now, when do you need these? |
83 | 93 | ||
84 | You want the _virtual_ address when you are actually going to access that | 94 | You want the **virtual** address when you are actually going to access that |
85 | pointer from the kernel. So you can have something like this: | 95 | pointer from the kernel. So you can have something like this:: |
86 | 96 | ||
87 | /* | 97 | /* |
88 | * this is the hardware "mailbox" we use to communicate with | 98 | * this is the hardware "mailbox" we use to communicate with |
@@ -104,7 +114,7 @@ pointer from the kernel. So you can have something like this: | |||
104 | ... | 114 | ... |
105 | 115 | ||
106 | on the other hand, you want the bus address when you have a buffer that | 116 | on the other hand, you want the bus address when you have a buffer that |
107 | you want to give to the controller: | 117 | you want to give to the controller:: |
108 | 118 | ||
109 | /* ask the controller to read the sense status into "sense_buffer" */ | 119 | /* ask the controller to read the sense status into "sense_buffer" */ |
110 | mbox.bufstart = virt_to_bus(&sense_buffer); | 120 | mbox.bufstart = virt_to_bus(&sense_buffer); |
@@ -112,7 +122,7 @@ you want to give to the controller: | |||
112 | mbox.status = 0; | 122 | mbox.status = 0; |
113 | notify_controller(&mbox); | 123 | notify_controller(&mbox); |
114 | 124 | ||
115 | And you generally _never_ want to use the physical address, because you can't | 125 | And you generally **never** want to use the physical address, because you can't |
116 | use that from the CPU (the CPU only uses translated virtual addresses), and | 126 | use that from the CPU (the CPU only uses translated virtual addresses), and |
117 | you can't use it from the bus master. | 127 | you can't use it from the bus master. |
118 | 128 | ||
@@ -124,8 +134,10 @@ be remapped as measured in units of pages, a.k.a. the pfn (the memory | |||
124 | management layer doesn't know about devices outside the CPU, so it | 134 | management layer doesn't know about devices outside the CPU, so it |
125 | shouldn't need to know about "bus addresses" etc). | 135 | shouldn't need to know about "bus addresses" etc). |
126 | 136 | ||
127 | NOTE NOTE NOTE! The above is only one part of the whole equation. The above | 137 | .. note:: |
128 | only talks about "real memory", that is, CPU memory (RAM). | 138 | |
139 | The above is only one part of the whole equation. The above | ||
140 | only talks about "real memory", that is, CPU memory (RAM). | ||
129 | 141 | ||
130 | There is a completely different type of memory too, and that's the "shared | 142 | There is a completely different type of memory too, and that's the "shared |
131 | memory" on the PCI or ISA bus. That's generally not RAM (although in the case | 143 | memory" on the PCI or ISA bus. That's generally not RAM (although in the case |
@@ -137,20 +149,22 @@ whatever, and there is only one way to access it: the readb/writeb and | |||
137 | related functions. You should never take the address of such memory, because | 149 | related functions. You should never take the address of such memory, because |
138 | there is really nothing you can do with such an address: it's not | 150 | there is really nothing you can do with such an address: it's not |
139 | conceptually in the same memory space as "real memory" at all, so you cannot | 151 | conceptually in the same memory space as "real memory" at all, so you cannot |
140 | just dereference a pointer. (Sadly, on x86 it _is_ in the same memory space, | 152 | just dereference a pointer. (Sadly, on x86 it **is** in the same memory space, |
141 | so on x86 it actually works to just dereference a pointer, but it's not | 153 | so on x86 it actually works to just dereference a pointer, but it's not |
142 | portable). | 154 | portable). |
143 | 155 | ||
144 | For such memory, you can do things like | 156 | For such memory, you can do things like: |
157 | |||
158 | - reading:: | ||
145 | 159 | ||
146 | - reading: | ||
147 | /* | 160 | /* |
148 | * read first 32 bits from ISA memory at 0xC0000, aka | 161 | * read first 32 bits from ISA memory at 0xC0000, aka |
149 | * C000:0000 in DOS terms | 162 | * C000:0000 in DOS terms |
150 | */ | 163 | */ |
151 | unsigned int signature = isa_readl(0xC0000); | 164 | unsigned int signature = isa_readl(0xC0000); |
152 | 165 | ||
153 | - remapping and writing: | 166 | - remapping and writing:: |
167 | |||
154 | /* | 168 | /* |
155 | * remap framebuffer PCI memory area at 0xFC000000, | 169 | * remap framebuffer PCI memory area at 0xFC000000, |
156 | * size 1MB, so that we can access it: We can directly | 170 | * size 1MB, so that we can access it: We can directly |
@@ -165,7 +179,8 @@ For such memory, you can do things like | |||
165 | /* unmap when we unload the driver */ | 179 | /* unmap when we unload the driver */ |
166 | iounmap(baseptr); | 180 | iounmap(baseptr); |
167 | 181 | ||
168 | - copying and clearing: | 182 | - copying and clearing:: |
183 | |||
169 | /* get the 6-byte Ethernet address at ISA address E000:0040 */ | 184 | /* get the 6-byte Ethernet address at ISA address E000:0040 */ |
170 | memcpy_fromio(kernel_buffer, 0xE0040, 6); | 185 | memcpy_fromio(kernel_buffer, 0xE0040, 6); |
171 | /* write a packet to the driver */ | 186 | /* write a packet to the driver */ |
@@ -181,10 +196,10 @@ happy that your driver works ;) | |||
181 | Note that kernel versions 2.0.x (and earlier) mistakenly called the | 196 | Note that kernel versions 2.0.x (and earlier) mistakenly called the |
182 | ioremap() function "vremap()". ioremap() is the proper name, but I | 197 | ioremap() function "vremap()". ioremap() is the proper name, but I |
183 | didn't think straight when I wrote it originally. People who have to | 198 | didn't think straight when I wrote it originally. People who have to |
184 | support both can do something like: | 199 | support both can do something like:: |
185 | 200 | ||
186 | /* support old naming silliness */ | 201 | /* support old naming silliness */ |
187 | #if LINUX_VERSION_CODE < 0x020100 | 202 | #if LINUX_VERSION_CODE < 0x020100 |
188 | #define ioremap vremap | 203 | #define ioremap vremap |
189 | #define iounmap vfree | 204 | #define iounmap vfree |
190 | #endif | 205 | #endif |
@@ -196,13 +211,10 @@ And the above sounds worse than it really is. Most real drivers really | |||
196 | don't do all that complex things (or rather: the complexity is not so | 211 | don't do all that complex things (or rather: the complexity is not so |
197 | much in the actual IO accesses as in error handling and timeouts etc). | 212 | much in the actual IO accesses as in error handling and timeouts etc). |
198 | It's generally not hard to fix drivers, and in many cases the code | 213 | It's generally not hard to fix drivers, and in many cases the code |
199 | actually looks better afterwards: | 214 | actually looks better afterwards:: |
200 | 215 | ||
201 | unsigned long signature = *(unsigned int *) 0xC0000; | 216 | unsigned long signature = *(unsigned int *) 0xC0000; |
202 | vs | 217 | vs |
203 | unsigned long signature = readl(0xC0000); | 218 | unsigned long signature = readl(0xC0000); |
204 | 219 | ||
205 | I think the second version actually is more readable, no? | 220 | I think the second version actually is more readable, no? |
206 | |||
207 | Linus | ||
208 | |||
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt index 3f9f808b5119..6eb9d3f090cd 100644 --- a/Documentation/cachetlb.txt +++ b/Documentation/cachetlb.txt | |||
@@ -1,7 +1,8 @@ | |||
1 | Cache and TLB Flushing | 1 | ================================== |
2 | Under Linux | 2 | Cache and TLB Flushing Under Linux |
3 | ================================== | ||
3 | 4 | ||
4 | David S. Miller <davem@redhat.com> | 5 | :Author: David S. Miller <davem@redhat.com> |
5 | 6 | ||
6 | This document describes the cache/tlb flushing interfaces called | 7 | This document describes the cache/tlb flushing interfaces called |
7 | by the Linux VM subsystem. It enumerates over each interface, | 8 | by the Linux VM subsystem. It enumerates over each interface, |
@@ -28,7 +29,7 @@ Therefore when software page table changes occur, the kernel will | |||
28 | invoke one of the following flush methods _after_ the page table | 29 | invoke one of the following flush methods _after_ the page table |
29 | changes occur: | 30 | changes occur: |
30 | 31 | ||
31 | 1) void flush_tlb_all(void) | 32 | 1) ``void flush_tlb_all(void)`` |
32 | 33 | ||
33 | The most severe flush of all. After this interface runs, | 34 | The most severe flush of all. After this interface runs, |
34 | any previous page table modification whatsoever will be | 35 | any previous page table modification whatsoever will be |
@@ -37,7 +38,7 @@ changes occur: | |||
37 | This is usually invoked when the kernel page tables are | 38 | This is usually invoked when the kernel page tables are |
38 | changed, since such translations are "global" in nature. | 39 | changed, since such translations are "global" in nature. |
39 | 40 | ||
40 | 2) void flush_tlb_mm(struct mm_struct *mm) | 41 | 2) ``void flush_tlb_mm(struct mm_struct *mm)`` |
41 | 42 | ||
42 | This interface flushes an entire user address space from | 43 | This interface flushes an entire user address space from |
43 | the TLB. After running, this interface must make sure that | 44 | the TLB. After running, this interface must make sure that |
@@ -49,8 +50,8 @@ changes occur: | |||
49 | page table operations such as what happens during | 50 | page table operations such as what happens during |
50 | fork, and exec. | 51 | fork, and exec. |
51 | 52 | ||
52 | 3) void flush_tlb_range(struct vm_area_struct *vma, | 53 | 3) ``void flush_tlb_range(struct vm_area_struct *vma, |
53 | unsigned long start, unsigned long end) | 54 | unsigned long start, unsigned long end)`` |
54 | 55 | ||
55 | Here we are flushing a specific range of (user) virtual | 56 | Here we are flushing a specific range of (user) virtual |
56 | address translations from the TLB. After running, this | 57 | address translations from the TLB. After running, this |
@@ -69,7 +70,7 @@ changes occur: | |||
69 | call flush_tlb_page (see below) for each entry which may be | 70 | call flush_tlb_page (see below) for each entry which may be |
70 | modified. | 71 | modified. |
71 | 72 | ||
72 | 4) void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr) | 73 | 4) ``void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)`` |
73 | 74 | ||
74 | This time we need to remove the PAGE_SIZE sized translation | 75 | This time we need to remove the PAGE_SIZE sized translation |
75 | from the TLB. The 'vma' is the backing structure used by | 76 | from the TLB. The 'vma' is the backing structure used by |
@@ -87,8 +88,8 @@ changes occur: | |||
87 | 88 | ||
88 | This is used primarily during fault processing. | 89 | This is used primarily during fault processing. |
89 | 90 | ||
90 | 5) void update_mmu_cache(struct vm_area_struct *vma, | 91 | 5) ``void update_mmu_cache(struct vm_area_struct *vma, |
91 | unsigned long address, pte_t *ptep) | 92 | unsigned long address, pte_t *ptep)`` |
92 | 93 | ||
93 | At the end of every page fault, this routine is invoked to | 94 | At the end of every page fault, this routine is invoked to |
94 | tell the architecture specific code that a translation | 95 | tell the architecture specific code that a translation |
@@ -100,7 +101,7 @@ changes occur: | |||
100 | translations for software managed TLB configurations. | 101 | translations for software managed TLB configurations. |
101 | The sparc64 port currently does this. | 102 | The sparc64 port currently does this. |
102 | 103 | ||
103 | 6) void tlb_migrate_finish(struct mm_struct *mm) | 104 | 6) ``void tlb_migrate_finish(struct mm_struct *mm)`` |
104 | 105 | ||
105 | This interface is called at the end of an explicit | 106 | This interface is called at the end of an explicit |
106 | process migration. This interface provides a hook | 107 | process migration. This interface provides a hook |
@@ -112,7 +113,7 @@ changes occur: | |||
112 | 113 | ||
113 | Next, we have the cache flushing interfaces. In general, when Linux | 114 | Next, we have the cache flushing interfaces. In general, when Linux |
114 | is changing an existing virtual-->physical mapping to a new value, | 115 | is changing an existing virtual-->physical mapping to a new value, |
115 | the sequence will be in one of the following forms: | 116 | the sequence will be in one of the following forms:: |
116 | 117 | ||
117 | 1) flush_cache_mm(mm); | 118 | 1) flush_cache_mm(mm); |
118 | change_all_page_tables_of(mm); | 119 | change_all_page_tables_of(mm); |
@@ -143,7 +144,7 @@ and have no dependency on translation information. | |||
143 | 144 | ||
144 | Here are the routines, one by one: | 145 | Here are the routines, one by one: |
145 | 146 | ||
146 | 1) void flush_cache_mm(struct mm_struct *mm) | 147 | 1) ``void flush_cache_mm(struct mm_struct *mm)`` |
147 | 148 | ||
148 | This interface flushes an entire user address space from | 149 | This interface flushes an entire user address space from |
149 | the caches. That is, after running, there will be no cache | 150 | the caches. That is, after running, there will be no cache |
@@ -152,7 +153,7 @@ Here are the routines, one by one: | |||
152 | This interface is used to handle whole address space | 153 | This interface is used to handle whole address space |
153 | page table operations such as what happens during exit and exec. | 154 | page table operations such as what happens during exit and exec. |
154 | 155 | ||
155 | 2) void flush_cache_dup_mm(struct mm_struct *mm) | 156 | 2) ``void flush_cache_dup_mm(struct mm_struct *mm)`` |
156 | 157 | ||
157 | This interface flushes an entire user address space from | 158 | This interface flushes an entire user address space from |
158 | the caches. That is, after running, there will be no cache | 159 | the caches. That is, after running, there will be no cache |
@@ -164,8 +165,8 @@ Here are the routines, one by one: | |||
164 | This option is separate from flush_cache_mm to allow some | 165 | This option is separate from flush_cache_mm to allow some |
165 | optimizations for VIPT caches. | 166 | optimizations for VIPT caches. |
166 | 167 | ||
167 | 3) void flush_cache_range(struct vm_area_struct *vma, | 168 | 3) ``void flush_cache_range(struct vm_area_struct *vma, |
168 | unsigned long start, unsigned long end) | 169 | unsigned long start, unsigned long end)`` |
169 | 170 | ||
170 | Here we are flushing a specific range of (user) virtual | 171 | Here we are flushing a specific range of (user) virtual |
171 | addresses from the cache. After running, there will be no | 172 | addresses from the cache. After running, there will be no |
@@ -181,7 +182,7 @@ Here are the routines, one by one: | |||
181 | call flush_cache_page (see below) for each entry which may be | 182 | call flush_cache_page (see below) for each entry which may be |
182 | modified. | 183 | modified. |
183 | 184 | ||
184 | 4) void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn) | 185 | 4) ``void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)`` |
185 | 186 | ||
186 | This time we need to remove a PAGE_SIZE sized range | 187 | This time we need to remove a PAGE_SIZE sized range |
187 | from the cache. The 'vma' is the backing structure used by | 188 | from the cache. The 'vma' is the backing structure used by |
@@ -202,7 +203,7 @@ Here are the routines, one by one: | |||
202 | 203 | ||
203 | This is used primarily during fault processing. | 204 | This is used primarily during fault processing. |
204 | 205 | ||
205 | 5) void flush_cache_kmaps(void) | 206 | 5) ``void flush_cache_kmaps(void)`` |
206 | 207 | ||
207 | This routine need only be implemented if the platform utilizes | 208 | This routine need only be implemented if the platform utilizes |
208 | highmem. It will be called right before all of the kmaps | 209 | highmem. It will be called right before all of the kmaps |
@@ -214,8 +215,8 @@ Here are the routines, one by one: | |||
214 | 215 | ||
215 | This routine should be implemented in asm/highmem.h | 216 | This routine should be implemented in asm/highmem.h |
216 | 217 | ||
217 | 6) void flush_cache_vmap(unsigned long start, unsigned long end) | 218 | 6) ``void flush_cache_vmap(unsigned long start, unsigned long end)`` |
218 | void flush_cache_vunmap(unsigned long start, unsigned long end) | 219 | ``void flush_cache_vunmap(unsigned long start, unsigned long end)`` |
219 | 220 | ||
220 | Here in these two interfaces we are flushing a specific range | 221 | Here in these two interfaces we are flushing a specific range |
221 | of (kernel) virtual addresses from the cache. After running, | 222 | of (kernel) virtual addresses from the cache. After running, |
@@ -243,8 +244,10 @@ size). This setting will force the SYSv IPC layer to only allow user | |||
243 | processes to mmap shared memory at address which are a multiple of | 244 | processes to mmap shared memory at address which are a multiple of |
244 | this value. | 245 | this value. |
245 | 246 | ||
246 | NOTE: This does not fix shared mmaps, check out the sparc64 port for | 247 | .. note:: |
247 | one way to solve this (in particular SPARC_FLAG_MMAPSHARED). | 248 | |
249 | This does not fix shared mmaps, check out the sparc64 port for | ||
250 | one way to solve this (in particular SPARC_FLAG_MMAPSHARED). | ||
248 | 251 | ||
249 | Next, you have to solve the D-cache aliasing issue for all | 252 | Next, you have to solve the D-cache aliasing issue for all |
250 | other cases. Please keep in mind the fact that, for a given page | 253 | other cases. Please keep in mind the fact that, for a given page |
@@ -255,8 +258,8 @@ physical page into its address space, by implication the D-cache | |||
255 | aliasing problem has the potential to exist since the kernel already | 258 | aliasing problem has the potential to exist since the kernel already |
256 | maps this page at its virtual address. | 259 | maps this page at its virtual address. |
257 | 260 | ||
258 | void copy_user_page(void *to, void *from, unsigned long addr, struct page *page) | 261 | ``void copy_user_page(void *to, void *from, unsigned long addr, struct page *page)`` |
259 | void clear_user_page(void *to, unsigned long addr, struct page *page) | 262 | ``void clear_user_page(void *to, unsigned long addr, struct page *page)`` |
260 | 263 | ||
261 | These two routines store data in user anonymous or COW | 264 | These two routines store data in user anonymous or COW |
262 | pages. They allow a port to efficiently avoid D-cache alias | 265 | pages. They allow a port to efficiently avoid D-cache alias |
@@ -276,14 +279,16 @@ maps this page at its virtual address. | |||
276 | If D-cache aliasing is not an issue, these two routines may | 279 | If D-cache aliasing is not an issue, these two routines may |
277 | simply call memcpy/memset directly and do nothing more. | 280 | simply call memcpy/memset directly and do nothing more. |
278 | 281 | ||
279 | void flush_dcache_page(struct page *page) | 282 | ``void flush_dcache_page(struct page *page)`` |
280 | 283 | ||
281 | Any time the kernel writes to a page cache page, _OR_ | 284 | Any time the kernel writes to a page cache page, _OR_ |
282 | the kernel is about to read from a page cache page and | 285 | the kernel is about to read from a page cache page and |
283 | user space shared/writable mappings of this page potentially | 286 | user space shared/writable mappings of this page potentially |
284 | exist, this routine is called. | 287 | exist, this routine is called. |
285 | 288 | ||
286 | NOTE: This routine need only be called for page cache pages | 289 | .. note:: |
290 | |||
291 | This routine need only be called for page cache pages | ||
287 | which can potentially ever be mapped into the address | 292 | which can potentially ever be mapped into the address |
288 | space of a user process. So for example, VFS layer code | 293 | space of a user process. So for example, VFS layer code |
289 | handling vfs symlinks in the page cache need not call | 294 | handling vfs symlinks in the page cache need not call |
@@ -322,18 +327,19 @@ maps this page at its virtual address. | |||
322 | made of this flag bit, and if set the flush is done and the flag | 327 | made of this flag bit, and if set the flush is done and the flag |
323 | bit is cleared. | 328 | bit is cleared. |
324 | 329 | ||
325 | IMPORTANT NOTE: It is often important, if you defer the flush, | 330 | .. important:: |
331 | |||
332 | It is often important, if you defer the flush, | ||
326 | that the actual flush occurs on the same CPU | 333 | that the actual flush occurs on the same CPU |
327 | as the CPU that stored into the page to make it | 334 | as the CPU that stored into the page to make it |
328 | dirty. Again, see sparc64 for examples of how | 335 | dirty. Again, see sparc64 for examples of how |
329 | to deal with this. | 336 | to deal with this. |
330 | 337 | ||
331 | void copy_to_user_page(struct vm_area_struct *vma, struct page *page, | 338 | ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page, |
332 | unsigned long user_vaddr, | 339 | unsigned long user_vaddr, void *dst, void *src, int len)`` |
333 | void *dst, void *src, int len) | 340 | ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page, |
334 | void copy_from_user_page(struct vm_area_struct *vma, struct page *page, | 341 | unsigned long user_vaddr, void *dst, void *src, int len)`` |
335 | unsigned long user_vaddr, | 342 | |
336 | void *dst, void *src, int len) | ||
337 | When the kernel needs to copy arbitrary data in and out | 343 | When the kernel needs to copy arbitrary data in and out |
338 | of arbitrary user pages (e.g. for ptrace()) it will use | 344 | of arbitrary user pages (e.g. for ptrace()) it will use |
339 | these two routines. | 345 | these two routines. |
@@ -344,8 +350,9 @@ maps this page at its virtual address. | |||
344 | likely that you will need to flush the instruction cache | 350 | likely that you will need to flush the instruction cache |
345 | for copy_to_user_page(). | 351 | for copy_to_user_page(). |
346 | 352 | ||
347 | void flush_anon_page(struct vm_area_struct *vma, struct page *page, | 353 | ``void flush_anon_page(struct vm_area_struct *vma, struct page *page, |
348 | unsigned long vmaddr) | 354 | unsigned long vmaddr)`` |
355 | |||
349 | When the kernel needs to access the contents of an anonymous | 356 | When the kernel needs to access the contents of an anonymous |
350 | page, it calls this function (currently only | 357 | page, it calls this function (currently only |
351 | get_user_pages()). Note: flush_dcache_page() deliberately | 358 | get_user_pages()). Note: flush_dcache_page() deliberately |
@@ -354,7 +361,8 @@ maps this page at its virtual address. | |||
354 | architectures). For incoherent architectures, it should flush | 361 | architectures). For incoherent architectures, it should flush |
355 | the cache of the page at vmaddr. | 362 | the cache of the page at vmaddr. |
356 | 363 | ||
357 | void flush_kernel_dcache_page(struct page *page) | 364 | ``void flush_kernel_dcache_page(struct page *page)`` |
365 | |||
358 | When the kernel needs to modify a user page it has obtained | 366 | When the kernel needs to modify a user page it has obtained |
359 | with kmap, it calls this function after all modifications are | 367 | with kmap, it calls this function after all modifications are |
360 | complete (but before kunmapping it) to bring the underlying | 368 | complete (but before kunmapping it) to bring the underlying |
@@ -366,14 +374,16 @@ maps this page at its virtual address. | |||
366 | the kernel cache for page (using page_address(page)). | 374 | the kernel cache for page (using page_address(page)). |
367 | 375 | ||
368 | 376 | ||
369 | void flush_icache_range(unsigned long start, unsigned long end) | 377 | ``void flush_icache_range(unsigned long start, unsigned long end)`` |
378 | |||
370 | When the kernel stores into addresses that it will execute | 379 | When the kernel stores into addresses that it will execute |
371 | out of (e.g. when loading modules), this function is called. | 380 | out of (e.g. when loading modules), this function is called. |
372 | 381 | ||
373 | If the icache does not snoop stores then this routine will need | 382 | If the icache does not snoop stores then this routine will need |
374 | to flush it. | 383 | to flush it. |
375 | 384 | ||
376 | void flush_icache_page(struct vm_area_struct *vma, struct page *page) | 385 | ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)`` |
386 | |||
377 | All the functionality of flush_icache_page can be implemented in | 387 | All the functionality of flush_icache_page can be implemented in |
378 | flush_dcache_page and update_mmu_cache. In the future, the hope | 388 | flush_dcache_page and update_mmu_cache. In the future, the hope |
379 | is to remove this interface completely. | 389 | is to remove this interface completely. |
@@ -387,7 +397,8 @@ the kernel trying to do I/O to vmap areas must manually manage | |||
387 | coherency. It must do this by flushing the vmap range before doing | 397 | coherency. It must do this by flushing the vmap range before doing |
388 | I/O and invalidating it after the I/O returns. | 398 | I/O and invalidating it after the I/O returns. |
389 | 399 | ||
390 | void flush_kernel_vmap_range(void *vaddr, int size) | 400 | ``void flush_kernel_vmap_range(void *vaddr, int size)`` |
401 | |||
391 | flushes the kernel cache for a given virtual address range in | 402 | flushes the kernel cache for a given virtual address range in |
392 | the vmap area. This is to make sure that any data the kernel | 403 | the vmap area. This is to make sure that any data the kernel |
393 | modified in the vmap range is made visible to the physical | 404 | modified in the vmap range is made visible to the physical |
@@ -395,7 +406,8 @@ I/O and invalidating it after the I/O returns. | |||
395 | Note that this API does *not* also flush the offset map alias | 406 | Note that this API does *not* also flush the offset map alias |
396 | of the area. | 407 | of the area. |
397 | 408 | ||
398 | void invalidate_kernel_vmap_range(void *vaddr, int size) invalidates | 409 | ``void invalidate_kernel_vmap_range(void *vaddr, int size)`` |
410 | |||
399 | the cache for a given virtual address range in the vmap area | 411 | invalidates the cache for a given virtual address range in the vmap area |
400 | which prevents the processor from making the cache stale by | 412 | which prevents the processor from making the cache stale by |
401 | speculatively reading data while the I/O was occurring to the | 413 | speculatively reading data while the I/O was occurring to the |
diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt index e6101976e0f1..bde177103567 100644 --- a/Documentation/cgroup-v2.txt +++ b/Documentation/cgroup-v2.txt | |||
@@ -1,7 +1,9 @@ | |||
1 | 1 | ================ | |
2 | Control Group v2 | 2 | Control Group v2 |
3 | ================ | ||
3 | 4 | ||
4 | October, 2015 Tejun Heo <tj@kernel.org> | 5 | :Date: October, 2015 |
6 | :Author: Tejun Heo <tj@kernel.org> | ||
5 | 7 | ||
6 | This is the authoritative documentation on the design, interface and | 8 | This is the authoritative documentation on the design, interface and |
7 | conventions of cgroup v2. It describes all userland-visible aspects | 9 | conventions of cgroup v2. It describes all userland-visible aspects |
@@ -9,70 +11,72 @@ of cgroup including core and specific controller behaviors. All | |||
9 | future changes must be reflected in this document. Documentation for | 11 | future changes must be reflected in this document. Documentation for |
10 | v1 is available under Documentation/cgroup-v1/. | 12 | v1 is available under Documentation/cgroup-v1/. |
11 | 13 | ||
12 | CONTENTS | 14 | .. CONTENTS |
13 | 15 | ||
14 | 1. Introduction | 16 | 1. Introduction |
15 | 1-1. Terminology | 17 | 1-1. Terminology |
16 | 1-2. What is cgroup? | 18 | 1-2. What is cgroup? |
17 | 2. Basic Operations | 19 | 2. Basic Operations |
18 | 2-1. Mounting | 20 | 2-1. Mounting |
19 | 2-2. Organizing Processes | 21 | 2-2. Organizing Processes |
20 | 2-3. [Un]populated Notification | 22 | 2-3. [Un]populated Notification |
21 | 2-4. Controlling Controllers | 23 | 2-4. Controlling Controllers |
22 | 2-4-1. Enabling and Disabling | 24 | 2-4-1. Enabling and Disabling |
23 | 2-4-2. Top-down Constraint | 25 | 2-4-2. Top-down Constraint |
24 | 2-4-3. No Internal Process Constraint | 26 | 2-4-3. No Internal Process Constraint |
25 | 2-5. Delegation | 27 | 2-5. Delegation |
26 | 2-5-1. Model of Delegation | 28 | 2-5-1. Model of Delegation |
27 | 2-5-2. Delegation Containment | 29 | 2-5-2. Delegation Containment |
28 | 2-6. Guidelines | 30 | 2-6. Guidelines |
29 | 2-6-1. Organize Once and Control | 31 | 2-6-1. Organize Once and Control |
30 | 2-6-2. Avoid Name Collisions | 32 | 2-6-2. Avoid Name Collisions |
31 | 3. Resource Distribution Models | 33 | 3. Resource Distribution Models |
32 | 3-1. Weights | 34 | 3-1. Weights |
33 | 3-2. Limits | 35 | 3-2. Limits |
34 | 3-3. Protections | 36 | 3-3. Protections |
35 | 3-4. Allocations | 37 | 3-4. Allocations |
36 | 4. Interface Files | 38 | 4. Interface Files |
37 | 4-1. Format | 39 | 4-1. Format |
38 | 4-2. Conventions | 40 | 4-2. Conventions |
39 | 4-3. Core Interface Files | 41 | 4-3. Core Interface Files |
40 | 5. Controllers | 42 | 5. Controllers |
41 | 5-1. CPU | 43 | 5-1. CPU |
42 | 5-1-1. CPU Interface Files | 44 | 5-1-1. CPU Interface Files |
43 | 5-2. Memory | 45 | 5-2. Memory |
44 | 5-2-1. Memory Interface Files | 46 | 5-2-1. Memory Interface Files |
45 | 5-2-2. Usage Guidelines | 47 | 5-2-2. Usage Guidelines |
46 | 5-2-3. Memory Ownership | 48 | 5-2-3. Memory Ownership |
47 | 5-3. IO | 49 | 5-3. IO |
48 | 5-3-1. IO Interface Files | 50 | 5-3-1. IO Interface Files |
49 | 5-3-2. Writeback | 51 | 5-3-2. Writeback |
50 | 5-4. PID | 52 | 5-4. PID |
51 | 5-4-1. PID Interface Files | 53 | 5-4-1. PID Interface Files |
52 | 5-5. RDMA | 54 | 5-5. RDMA |
53 | 5-5-1. RDMA Interface Files | 55 | 5-5-1. RDMA Interface Files |
54 | 5-6. Misc | 56 | 5-6. Misc |
55 | 5-6-1. perf_event | 57 | 5-6-1. perf_event |
56 | 6. Namespace | 58 | 6. Namespace |
57 | 6-1. Basics | 59 | 6-1. Basics |
58 | 6-2. The Root and Views | 60 | 6-2. The Root and Views |
59 | 6-3. Migration and setns(2) | 61 | 6-3. Migration and setns(2) |
60 | 6-4. Interaction with Other Namespaces | 62 | 6-4. Interaction with Other Namespaces |
61 | P. Information on Kernel Programming | 63 | P. Information on Kernel Programming |
62 | P-1. Filesystem Support for Writeback | 64 | P-1. Filesystem Support for Writeback |
63 | D. Deprecated v1 Core Features | 65 | D. Deprecated v1 Core Features |
64 | R. Issues with v1 and Rationales for v2 | 66 | R. Issues with v1 and Rationales for v2 |
65 | R-1. Multiple Hierarchies | 67 | R-1. Multiple Hierarchies |
66 | R-2. Thread Granularity | 68 | R-2. Thread Granularity |
67 | R-3. Competition Between Inner Nodes and Threads | 69 | R-3. Competition Between Inner Nodes and Threads |
68 | R-4. Other Interface Issues | 70 | R-4. Other Interface Issues |
69 | R-5. Controller Issues and Remedies | 71 | R-5. Controller Issues and Remedies |
70 | R-5-1. Memory | 72 | R-5-1. Memory |
71 | 73 | ||
72 | 74 | ||
73 | 1. Introduction | 75 | Introduction |
74 | 76 | ============ | |
75 | 1-1. Terminology | 77 | |
78 | Terminology | ||
79 | ----------- | ||
76 | 80 | ||
77 | "cgroup" stands for "control group" and is never capitalized. The | 81 | "cgroup" stands for "control group" and is never capitalized. The |
78 | singular form is used to designate the whole feature and also as a | 82 | singular form is used to designate the whole feature and also as a |
@@ -80,7 +84,8 @@ qualifier as in "cgroup controllers". When explicitly referring to | |||
80 | multiple individual control groups, the plural form "cgroups" is used. | 84 | multiple individual control groups, the plural form "cgroups" is used. |
81 | 85 | ||
82 | 86 | ||
83 | 1-2. What is cgroup? | 87 | What is cgroup? |
88 | --------------- | ||
84 | 89 | ||
85 | cgroup is a mechanism to organize processes hierarchically and | 90 | cgroup is a mechanism to organize processes hierarchically and |
86 | distribute system resources along the hierarchy in a controlled and | 91 | distribute system resources along the hierarchy in a controlled and |
@@ -110,12 +115,14 @@ restrictions set closer to the root in the hierarchy can not be | |||
110 | overridden from further away. | 115 | overridden from further away. |
111 | 116 | ||
112 | 117 | ||
113 | 2. Basic Operations | 118 | Basic Operations |
119 | ================ | ||
114 | 120 | ||
115 | 2-1. Mounting | 121 | Mounting |
122 | -------- | ||
116 | 123 | ||
117 | Unlike v1, cgroup v2 has only a single hierarchy. The cgroup v2 | 124 | Unlike v1, cgroup v2 has only a single hierarchy. The cgroup v2 |
118 | hierarchy can be mounted with the following mount command. | 125 | hierarchy can be mounted with the following mount command:: |
119 | 126 | ||
120 | # mount -t cgroup2 none $MOUNT_POINT | 127 | # mount -t cgroup2 none $MOUNT_POINT |
121 | 128 | ||
@@ -160,10 +167,11 @@ cgroup v2 currently supports the following mount options. | |||
160 | Delegation section for details. | 167 | Delegation section for details. |
161 | 168 | ||
162 | 169 | ||
163 | 2-2. Organizing Processes | 170 | Organizing Processes |
171 | -------------------- | ||
164 | 172 | ||
165 | Initially, only the root cgroup exists to which all processes belong. | 173 | Initially, only the root cgroup exists to which all processes belong. |
166 | A child cgroup can be created by creating a sub-directory. | 174 | A child cgroup can be created by creating a sub-directory:: |
167 | 175 | ||
168 | # mkdir $CGROUP_NAME | 176 | # mkdir $CGROUP_NAME |
169 | 177 | ||
@@ -190,28 +198,29 @@ moved to another cgroup. | |||
190 | A cgroup which doesn't have any children or live processes can be | 198 | A cgroup which doesn't have any children or live processes can be |
191 | destroyed by removing the directory. Note that a cgroup which doesn't | 199 | destroyed by removing the directory. Note that a cgroup which doesn't |
192 | have any children and is associated only with zombie processes is | 200 | have any children and is associated only with zombie processes is |
193 | considered empty and can be removed. | 201 | considered empty and can be removed:: |
194 | 202 | ||
195 | # rmdir $CGROUP_NAME | 203 | # rmdir $CGROUP_NAME |
196 | 204 | ||
197 | "/proc/$PID/cgroup" lists a process's cgroup membership. If legacy | 205 | "/proc/$PID/cgroup" lists a process's cgroup membership. If legacy |
198 | cgroup is in use in the system, this file may contain multiple lines, | 206 | cgroup is in use in the system, this file may contain multiple lines, |
199 | one for each hierarchy. The entry for cgroup v2 is always in the | 207 | one for each hierarchy. The entry for cgroup v2 is always in the |
200 | format "0::$PATH". | 208 | format "0::$PATH":: |
201 | 209 | ||
202 | # cat /proc/842/cgroup | 210 | # cat /proc/842/cgroup |
203 | ... | 211 | ... |
204 | 0::/test-cgroup/test-cgroup-nested | 212 | 0::/test-cgroup/test-cgroup-nested |
205 | 213 | ||
206 | If the process becomes a zombie and the cgroup it was associated with | 214 | If the process becomes a zombie and the cgroup it was associated with |
207 | is removed subsequently, " (deleted)" is appended to the path. | 215 | is removed subsequently, " (deleted)" is appended to the path:: |
208 | 216 | ||
209 | # cat /proc/842/cgroup | 217 | # cat /proc/842/cgroup |
210 | ... | 218 | ... |
211 | 0::/test-cgroup/test-cgroup-nested (deleted) | 219 | 0::/test-cgroup/test-cgroup-nested (deleted) |
212 | 220 | ||
213 | 221 | ||
214 | 2-3. [Un]populated Notification | 222 | [Un]populated Notification |
223 | -------------------------- | ||
215 | 224 | ||
216 | Each non-root cgroup has a "cgroup.events" file which contains | 225 | Each non-root cgroup has a "cgroup.events" file which contains |
217 | a "populated" field indicating whether the cgroup's sub-hierarchy has | 226 | a "populated" field indicating whether the cgroup's sub-hierarchy has |
@@ -222,7 +231,7 @@ example, to start a clean-up operation after all processes of a given | |||
222 | sub-hierarchy have exited. The populated state updates and | 231 | sub-hierarchy have exited. The populated state updates and |
223 | notifications are recursive. Consider the following sub-hierarchy | 232 | notifications are recursive. Consider the following sub-hierarchy |
224 | where the numbers in the parentheses represent the numbers of processes | 233 | where the numbers in the parentheses represent the numbers of processes |
225 | in each cgroup. | 234 | in each cgroup:: |
226 | 235 | ||
227 | A(4) - B(0) - C(1) | 236 | A(4) - B(0) - C(1) |
228 | \ D(0) | 237 | \ D(0) |
@@ -233,18 +242,20 @@ file modified events will be generated on the "cgroup.events" files of | |||
233 | both cgroups. | 242 | both cgroups. |
234 | 243 | ||
235 | 244 | ||
236 | 2-4. Controlling Controllers | 245 | Controlling Controllers |
246 | ----------------------- | ||
237 | 247 | ||
238 | 2-4-1. Enabling and Disabling | 248 | Enabling and Disabling |
249 | ~~~~~~~~~~~~~~~~~~~~~~ | ||
239 | 250 | ||
240 | Each cgroup has a "cgroup.controllers" file which lists all | 251 | Each cgroup has a "cgroup.controllers" file which lists all |
241 | controllers available for the cgroup to enable. | 252 | controllers available for the cgroup to enable:: |
242 | 253 | ||
243 | # cat cgroup.controllers | 254 | # cat cgroup.controllers |
244 | cpu io memory | 255 | cpu io memory |
245 | 256 | ||
246 | No controller is enabled by default. Controllers can be enabled and | 257 | No controller is enabled by default. Controllers can be enabled and |
247 | disabled by writing to the "cgroup.subtree_control" file. | 258 | disabled by writing to the "cgroup.subtree_control" file:: |
248 | 259 | ||
249 | # echo "+cpu +memory -io" > cgroup.subtree_control | 260 | # echo "+cpu +memory -io" > cgroup.subtree_control |
250 | 261 | ||
@@ -256,7 +267,7 @@ are specified, the last one is effective. | |||
256 | Enabling a controller in a cgroup indicates that the distribution of | 267 | Enabling a controller in a cgroup indicates that the distribution of |
257 | the target resource across its immediate children will be controlled. | 268 | the target resource across its immediate children will be controlled. |
258 | Consider the following sub-hierarchy. The enabled controllers are | 269 | Consider the following sub-hierarchy. The enabled controllers are |
259 | listed in parentheses. | 270 | listed in parentheses:: |
260 | 271 | ||
261 | A(cpu,memory) - B(memory) - C() | 272 | A(cpu,memory) - B(memory) - C() |
262 | \ D() | 273 | \ D() |
@@ -276,7 +287,8 @@ controller interface files - anything which doesn't start with | |||
276 | "cgroup." are owned by the parent rather than the cgroup itself. | 287 | "cgroup." are owned by the parent rather than the cgroup itself. |
277 | 288 | ||
278 | 289 | ||
279 | 2-4-2. Top-down Constraint | 290 | Top-down Constraint |
291 | ~~~~~~~~~~~~~~~~~~~ | ||
280 | 292 | ||
281 | Resources are distributed top-down and a cgroup can further distribute | 293 | Resources are distributed top-down and a cgroup can further distribute |
282 | a resource only if the resource has been distributed to it from the | 294 | a resource only if the resource has been distributed to it from the |
@@ -287,7 +299,8 @@ the parent has the controller enabled and a controller can't be | |||
287 | disabled if one or more children have it enabled. | 299 | disabled if one or more children have it enabled. |
288 | 300 | ||
289 | 301 | ||
290 | 2-4-3. No Internal Process Constraint | 302 | No Internal Process Constraint |
303 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
291 | 304 | ||
292 | Non-root cgroups can only distribute resources to their children when | 305 | Non-root cgroups can only distribute resources to their children when |
293 | they don't have any processes of their own. In other words, only | 306 | they don't have any processes of their own. In other words, only |
@@ -314,9 +327,11 @@ children before enabling controllers in its "cgroup.subtree_control" | |||
314 | file. | 327 | file. |
315 | 328 | ||
316 | 329 | ||
317 | 2-5. Delegation | 330 | Delegation |
331 | ---------- | ||
318 | 332 | ||
319 | 2-5-1. Model of Delegation | 333 | Model of Delegation |
334 | ~~~~~~~~~~~~~~~~~~~ | ||
320 | 335 | ||
321 | A cgroup can be delegated in two ways. First, to a less privileged | 336 | A cgroup can be delegated in two ways. First, to a less privileged |
322 | user by granting write access of the directory and its "cgroup.procs" | 337 | user by granting write access of the directory and its "cgroup.procs" |
@@ -345,7 +360,8 @@ cgroups in or nesting depth of a delegated sub-hierarchy; however, | |||
345 | this may be limited explicitly in the future. | 360 | this may be limited explicitly in the future. |
346 | 361 | ||
347 | 362 | ||
348 | 2-5-2. Delegation Containment | 363 | Delegation Containment |
364 | ~~~~~~~~~~~~~~~~~~~~~~ | ||
349 | 365 | ||
350 | A delegated sub-hierarchy is contained in the sense that processes | 366 | A delegated sub-hierarchy is contained in the sense that processes |
351 | can't be moved into or out of the sub-hierarchy by the delegatee. | 367 | can't be moved into or out of the sub-hierarchy by the delegatee. |
@@ -366,7 +382,7 @@ in from or push out to outside the sub-hierarchy. | |||
366 | 382 | ||
367 | As an example, let's assume cgroups C0 and C1 have been delegated to | 383 | As an example, let's assume cgroups C0 and C1 have been delegated to |
368 | user U0 who created C00, C01 under C0 and C10 under C1 as follows and | 384 | user U0 who created C00, C01 under C0 and C10 under C1 as follows and |
369 | all processes under C0 and C1 belong to U0. | 385 | all processes under C0 and C1 belong to U0:: |
370 | 386 | ||
371 | ~~~~~~~~~~~~~ - C0 - C00 | 387 | ~~~~~~~~~~~~~ - C0 - C00 |
372 | ~ cgroup ~ \ C01 | 388 | ~ cgroup ~ \ C01 |
@@ -386,9 +402,11 @@ namespace of the process which is attempting the migration. If either | |||
386 | is not reachable, the migration is rejected with -ENOENT. | 402 | is not reachable, the migration is rejected with -ENOENT. |
387 | 403 | ||
388 | 404 | ||
389 | 2-6. Guidelines | 405 | Guidelines |
406 | ---------- | ||
390 | 407 | ||
391 | 2-6-1. Organize Once and Control | 408 | Organize Once and Control |
409 | ~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
392 | 410 | ||
393 | Migrating a process across cgroups is a relatively expensive operation | 411 | Migrating a process across cgroups is a relatively expensive operation |
394 | and stateful resources such as memory are not moved together with the | 412 | and stateful resources such as memory are not moved together with the |
@@ -404,7 +422,8 @@ distribution can be made by changing controller configuration through | |||
404 | the interface files. | 422 | the interface files. |
405 | 423 | ||
406 | 424 | ||
407 | 2-6-2. Avoid Name Collisions | 425 | Avoid Name Collisions |
426 | ~~~~~~~~~~~~~~~~~~~~~ | ||
408 | 427 | ||
409 | Interface files for a cgroup and its children cgroups occupy the same | 428 | Interface files for a cgroup and its children cgroups occupy the same |
410 | directory and it is possible to create children cgroups which collide | 429 | directory and it is possible to create children cgroups which collide |
@@ -422,14 +441,16 @@ cgroup doesn't do anything to prevent name collisions and it's the | |||
422 | user's responsibility to avoid them. | 441 | user's responsibility to avoid them. |
423 | 442 | ||
424 | 443 | ||
425 | 3. Resource Distribution Models | 444 | Resource Distribution Models |
445 | ============================ | ||
426 | 446 | ||
427 | cgroup controllers implement several resource distribution schemes | 447 | cgroup controllers implement several resource distribution schemes |
428 | depending on the resource type and expected use cases. This section | 448 | depending on the resource type and expected use cases. This section |
429 | describes major schemes in use along with their expected behaviors. | 449 | describes major schemes in use along with their expected behaviors. |
430 | 450 | ||
431 | 451 | ||
432 | 3-1. Weights | 452 | Weights |
453 | ------- | ||
433 | 454 | ||
434 | A parent's resource is distributed by adding up the weights of all | 455 | A parent's resource is distributed by adding up the weights of all |
435 | active children and giving each the fraction matching the ratio of its | 456 | active children and giving each the fraction matching the ratio of its |
@@ -450,7 +471,8 @@ process migrations. | |||
450 | and is an example of this type. | 471 | and is an example of this type. |
451 | 472 | ||
452 | 473 | ||
453 | 3-2. Limits | 474 | Limits |
475 | ------ | ||
454 | 476 | ||
455 | A child can only consume up to the configured amount of the resource. | 477 | A child can only consume up to the configured amount of the resource. |
456 | Limits can be over-committed - the sum of the limits of children can | 478 | Limits can be over-committed - the sum of the limits of children can |
@@ -466,7 +488,8 @@ process migrations. | |||
466 | on an IO device and is an example of this type. | 488 | on an IO device and is an example of this type. |
467 | 489 | ||
468 | 490 | ||
469 | 3-3. Protections | 491 | Protections |
492 | ----------- | ||
470 | 493 | ||
471 | A cgroup is protected to be allocated up to the configured amount of | 494 | A cgroup is protected to be allocated up to the configured amount of |
472 | the resource if the usages of all its ancestors are under their | 495 | the resource if the usages of all its ancestors are under their |
@@ -486,7 +509,8 @@ process migrations. | |||
486 | example of this type. | 509 | example of this type. |
487 | 510 | ||
488 | 511 | ||
489 | 3-4. Allocations | 512 | Allocations |
513 | ----------- | ||
490 | 514 | ||
491 | A cgroup is exclusively allocated a certain amount of a finite | 515 | A cgroup is exclusively allocated a certain amount of a finite |
492 | resource. Allocations can't be over-committed - the sum of the | 516 | resource. Allocations can't be over-committed - the sum of the |
@@ -505,12 +529,14 @@ may be rejected. | |||
505 | type. | 529 | type. |
506 | 530 | ||
507 | 531 | ||
508 | 4. Interface Files | 532 | Interface Files |
533 | =============== | ||
509 | 534 | ||
510 | 4-1. Format | 535 | Format |
536 | ------ | ||
511 | 537 | ||
512 | All interface files should be in one of the following formats whenever | 538 | All interface files should be in one of the following formats whenever |
513 | possible. | 539 | possible:: |
514 | 540 | ||
515 | New-line separated values | 541 | New-line separated values |
516 | (when only one value can be written at once) | 542 | (when only one value can be written at once) |
@@ -545,7 +571,8 @@ can be written at a time. For nested keyed files, the sub key pairs | |||
545 | may be specified in any order and not all pairs have to be specified. | 571 | may be specified in any order and not all pairs have to be specified. |
546 | 572 | ||
547 | 573 | ||
548 | 4-2. Conventions | 574 | Conventions |
575 | ----------- | ||
549 | 576 | ||
550 | - Settings for a single feature should be contained in a single file. | 577 | - Settings for a single feature should be contained in a single file. |
551 | 578 | ||
@@ -581,25 +608,25 @@ may be specified in any order and not all pairs have to be specified. | |||
581 | with "default" as the value must not appear when read. | 608 | with "default" as the value must not appear when read. |
582 | 609 | ||
583 | For example, a setting which is keyed by major:minor device numbers | 610 | For example, a setting which is keyed by major:minor device numbers |
584 | with integer values may look like the following. | 611 | with integer values may look like the following:: |
585 | 612 | ||
586 | # cat cgroup-example-interface-file | 613 | # cat cgroup-example-interface-file |
587 | default 150 | 614 | default 150 |
588 | 8:0 300 | 615 | 8:0 300 |
589 | 616 | ||
590 | The default value can be updated by | 617 | The default value can be updated by:: |
591 | 618 | ||
592 | # echo 125 > cgroup-example-interface-file | 619 | # echo 125 > cgroup-example-interface-file |
593 | 620 | ||
594 | or | 621 | or:: |
595 | 622 | ||
596 | # echo "default 125" > cgroup-example-interface-file | 623 | # echo "default 125" > cgroup-example-interface-file |
597 | 624 | ||
598 | An override can be set by | 625 | An override can be set by:: |
599 | 626 | ||
600 | # echo "8:16 170" > cgroup-example-interface-file | 627 | # echo "8:16 170" > cgroup-example-interface-file |
601 | 628 | ||
602 | and cleared by | 629 | and cleared by:: |
603 | 630 | ||
604 | # echo "8:0 default" > cgroup-example-interface-file | 631 | # echo "8:0 default" > cgroup-example-interface-file |
605 | # cat cgroup-example-interface-file | 632 | # cat cgroup-example-interface-file |
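The default/override semantics shown above can be exercised in plain shell; a minimal sketch, using a scratch file populated with the example contents in place of a real cgroup interface file (the `lookup` helper is hypothetical, not part of any cgroup API):

```shell
# Stand-in for the hypothetical cgroup-example-interface-file above.
f=$(mktemp)
printf 'default 150\n8:0 300\n' > "$f"

# Look up the effective value for a key, falling back to the "default"
# entry when no per-key override is present.
lookup() {
    v=$(awk -v k="$1" '$1 == k { print $2 }' "$2")
    if [ -n "$v" ]; then
        echo "$v"
    else
        awk '$1 == "default" { print $2 }' "$2"
    fi
}

v_override=$(lookup 8:0 "$f")    # per-key override present
v_default=$(lookup 8:16 "$f")    # no override, default applies
echo "$v_override $v_default"    # 300 150
```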
@@ -612,12 +639,12 @@ may be specified in any order and not all pairs have to be specified. | |||
612 | generated on the file. | 639 | generated on the file. |
613 | 640 | ||
614 | 641 | ||
615 | 4-3. Core Interface Files | 642 | Core Interface Files |
643 | -------------------- | ||
616 | 644 | ||
617 | All cgroup core files are prefixed with "cgroup." | 645 | All cgroup core files are prefixed with "cgroup." |
618 | 646 | ||
619 | cgroup.procs | 647 | cgroup.procs |
620 | |||
621 | A read-write new-line separated values file which exists on | 648 | A read-write new-line separated values file which exists on |
622 | all cgroups. | 649 | all cgroups. |
623 | 650 | ||
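As a sketch of the write/read cycle, the following uses a scratch directory as a stand-in for a real cgroup (an actual hierarchy needs a cgroup2 mount and suitable permissions, and the kernel creates cgroup.procs itself; the path here is invented for illustration):

```shell
# Scratch directory standing in for a cgroup such as .../myjob.
CG=$(mktemp -d)
: > "$CG/cgroup.procs"        # created by the kernel in a real cgroup

# Writing a PID migrates that entire process into the cgroup.
echo $$ >> "$CG/cgroup.procs"

# Reading back gives new-line separated PIDs of member processes.
procs=$(cat "$CG/cgroup.procs")
echo "$procs"
```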
@@ -643,7 +670,6 @@ All cgroup core files are prefixed with "cgroup." | |||
643 | should be granted along with the containing directory. | 670 | should be granted along with the containing directory. |
644 | 671 | ||
645 | cgroup.controllers | 672 | cgroup.controllers |
646 | |||
647 | A read-only space separated values file which exists on all | 673 | A read-only space separated values file which exists on all |
648 | cgroups. | 674 | cgroups. |
649 | 675 | ||
@@ -651,7 +677,6 @@ All cgroup core files are prefixed with "cgroup." | |||
651 | the cgroup. The controllers are not ordered. | 677 | the cgroup. The controllers are not ordered. |
652 | 678 | ||
653 | cgroup.subtree_control | 679 | cgroup.subtree_control |
654 | |||
655 | A read-write space separated values file which exists on all | 680 | A read-write space separated values file which exists on all |
656 | cgroups. Starts out empty. | 681 | cgroups. Starts out empty. |
657 | 682 | ||
@@ -667,23 +692,25 @@ All cgroup core files are prefixed with "cgroup." | |||
667 | operations are specified, either all succeed or all fail. | 692 | operations are specified, either all succeed or all fail. |
668 | 693 | ||
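Writes to this file are space separated controller names prefixed with "+" (enable) or "-" (disable). A userspace sketch of how one such write updates the enabled set (`apply` is a hypothetical helper; in the kernel the whole write is applied atomically, all-or-nothing, as noted above):

```shell
# Apply a "+ctrl"/"-ctrl" write string to a current controller set.
apply() {
    set_=" $1 "
    for tok in $2; do
        name=${tok#?}                        # strip the +/- prefix
        case $tok in
            +*) case "$set_" in
                    *" $name "*) ;;          # already enabled
                    *) set_="$set_$name " ;;
                esac ;;
            -*) set_=$(printf '%s' "$set_" | sed "s/ $name / /") ;;
        esac
    done
    echo $set_                               # unquoted: normalizes spacing
}

result=$(apply "memory" "+io +pids -memory")
echo "$result"    # io pids
```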
669 | cgroup.events | 694 | cgroup.events |
670 | |||
671 | A read-only flat-keyed file which exists on non-root cgroups. | 695 | A read-only flat-keyed file which exists on non-root cgroups. |
672 | The following entries are defined. Unless specified | 696 | The following entries are defined. Unless specified |
673 | otherwise, a value change in this file generates a file | 697 | otherwise, a value change in this file generates a file |
674 | modified event. | 698 | modified event. |
675 | 699 | ||
676 | populated | 700 | populated |
677 | |||
678 | 1 if the cgroup or its descendants contain any live | 701 | 1 if the cgroup or its descendants contain any live |
679 | processes; otherwise, 0. | 702 | processes; otherwise, 0. |
680 | 703 | ||
681 | 704 | ||
682 | 5. Controllers | 705 | Controllers |
706 | =========== | ||
683 | 707 | ||
684 | 5-1. CPU | 708 | CPU |
709 | --- | ||
685 | 710 | ||
686 | [NOTE: The interface for the cpu controller hasn't been merged yet] | 711 | .. note:: |
712 | |||
713 | The interface for the cpu controller hasn't been merged yet | ||
687 | 714 | ||
688 | The "cpu" controller regulates distribution of CPU cycles. This | 715 | The "cpu" controller regulates distribution of CPU cycles. This |
689 | controller implements weight and absolute bandwidth limit models for | 716 | controller implements weight and absolute bandwidth limit models for |
@@ -691,36 +718,34 @@ normal scheduling policy and absolute bandwidth allocation model for | |||
691 | realtime scheduling policy. | 718 | realtime scheduling policy. |
692 | 719 | ||
693 | 720 | ||
694 | 5-1-1. CPU Interface Files | 721 | CPU Interface Files |
722 | ~~~~~~~~~~~~~~~~~~~ | ||
695 | 723 | ||
696 | All time durations are in microseconds. | 724 | All time durations are in microseconds. |
697 | 725 | ||
698 | cpu.stat | 726 | cpu.stat |
699 | |||
700 | A read-only flat-keyed file which exists on non-root cgroups. | 727 | A read-only flat-keyed file which exists on non-root cgroups. |
701 | 728 | ||
702 | It reports the following six stats. | 729 | It reports the following six stats: |
703 | 730 | ||
704 | usage_usec | 731 | - usage_usec |
705 | user_usec | 732 | - user_usec |
706 | system_usec | 733 | - system_usec |
707 | nr_periods | 734 | - nr_periods |
708 | nr_throttled | 735 | - nr_throttled |
709 | throttled_usec | 736 | - throttled_usec |
710 | 737 | ||
711 | cpu.weight | 738 | cpu.weight |
712 | |||
713 | A read-write single value file which exists on non-root | 739 | A read-write single value file which exists on non-root |
714 | cgroups. The default is "100". | 740 | cgroups. The default is "100". |
715 | 741 | ||
716 | The weight in the range [1, 10000]. | 742 | The weight in the range [1, 10000]. |
717 | 743 | ||
718 | cpu.max | 744 | cpu.max |
719 | |||
720 | A read-write two value file which exists on non-root cgroups. | 745 | A read-write two value file which exists on non-root cgroups. |
721 | The default is "max 100000". | 746 | The default is "max 100000". |
722 | 747 | ||
723 | The maximum bandwidth limit. It's in the following format. | 748 | The maximum bandwidth limit. It's in the following format:: |
724 | 749 | ||
725 | $MAX $PERIOD | 750 | $MAX $PERIOD |
726 | 751 | ||
@@ -729,9 +754,10 @@ All time durations are in microseconds. | |||
729 | one number is written, $MAX is updated. | 754 | one number is written, $MAX is updated. |
730 | 755 | ||
731 | cpu.rt.max | 756 | cpu.rt.max |
757 | .. note:: | ||
732 | 758 | ||
733 | [NOTE: The semantics of this file are still under discussion and the | 759 | The semantics of this file are still under discussion and the |
734 | interface hasn't been merged yet] | 760 | interface hasn't been merged yet |
735 | 761 | ||
736 | A read-write two value file which exists on all cgroups. | 762 | A read-write two value file which exists on all cgroups. |
737 | The default is "0 100000". | 763 | The default is "0 100000". |
@@ -739,7 +765,7 @@ All time durations are in microseconds. | |||
739 | The maximum realtime runtime allocation. Over-committing | 765 | The maximum realtime runtime allocation. Over-committing |
740 | configurations are disallowed and process migrations are | 766 | configurations are disallowed and process migrations are |
741 | rejected if not enough bandwidth is available. It's in the | 767 | rejected if not enough bandwidth is available. It's in the |
742 | following format. | 768 | following format:: |
743 | 769 | ||
744 | $MAX $PERIOD | 770 | $MAX $PERIOD |
745 | 771 | ||
@@ -748,7 +774,8 @@ All time durations are in microseconds. | |||
748 | updated. | 774 | updated. |
749 | 775 | ||
750 | 776 | ||
751 | 5-2. Memory | 777 | Memory |
778 | ------ | ||
752 | 779 | ||
753 | The "memory" controller regulates distribution of memory. Memory is | 780 | The "memory" controller regulates distribution of memory. Memory is |
754 | stateful and implements both limit and protection models. Due to the | 781 | stateful and implements both limit and protection models. Due to the |
@@ -770,14 +797,14 @@ following types of memory usages are tracked. | |||
770 | The above list may expand in the future for better coverage. | 797 | The above list may expand in the future for better coverage. |
771 | 798 | ||
772 | 799 | ||
773 | 5-2-1. Memory Interface Files | 800 | Memory Interface Files |
801 | ~~~~~~~~~~~~~~~~~~~~~~ | ||
774 | 802 | ||
775 | All memory amounts are in bytes. If a value which is not aligned to | 803 | All memory amounts are in bytes. If a value which is not aligned to |
776 | PAGE_SIZE is written, the value may be rounded up to the closest | 804 | PAGE_SIZE is written, the value may be rounded up to the closest |
777 | PAGE_SIZE multiple when read back. | 805 | PAGE_SIZE multiple when read back. |
778 | 806 | ||
779 | memory.current | 807 | memory.current |
780 | |||
781 | A read-only single value file which exists on non-root | 808 | A read-only single value file which exists on non-root |
782 | cgroups. | 809 | cgroups. |
783 | 810 | ||
@@ -785,7 +812,6 @@ PAGE_SIZE multiple when read back. | |||
785 | and its descendants. | 812 | and its descendants. |
786 | 813 | ||
787 | memory.low | 814 | memory.low |
788 | |||
789 | A read-write single value file which exists on non-root | 815 | A read-write single value file which exists on non-root |
790 | cgroups. The default is "0". | 816 | cgroups. The default is "0". |
791 | 817 | ||
@@ -798,7 +824,6 @@ PAGE_SIZE multiple when read back. | |||
798 | protection is discouraged. | 824 | protection is discouraged. |
799 | 825 | ||
800 | memory.high | 826 | memory.high |
801 | |||
802 | A read-write single value file which exists on non-root | 827 | A read-write single value file which exists on non-root |
803 | cgroups. The default is "max". | 828 | cgroups. The default is "max". |
804 | 829 | ||
@@ -811,7 +836,6 @@ PAGE_SIZE multiple when read back. | |||
811 | under extreme conditions the limit may be breached. | 836 | under extreme conditions the limit may be breached. |
812 | 837 | ||
813 | memory.max | 838 | memory.max |
814 | |||
815 | A read-write single value file which exists on non-root | 839 | A read-write single value file which exists on non-root |
816 | cgroups. The default is "max". | 840 | cgroups. The default is "max". |
817 | 841 | ||
@@ -826,21 +850,18 @@ PAGE_SIZE multiple when read back. | |||
826 | utility is limited to providing the final safety net. | 850 | utility is limited to providing the final safety net. |
827 | 851 | ||
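A sketch of configuring the boundaries described above, writing byte values into a scratch directory that stands in for a real cgroup (a real file may round values up to a PAGE_SIZE multiple on read-back, which this stand-in does not model):

```shell
CG=$(mktemp -d)                   # stands in for a cgroup directory

# memory.high: throttling boundary; memory.max: hard limit / OOM backstop.
echo $((512 * 1024 * 1024))  > "$CG/memory.high"   # 512 MiB
echo $((1024 * 1024 * 1024)) > "$CG/memory.max"    # 1 GiB

high=$(cat "$CG/memory.high")
max=$(cat "$CG/memory.max")
echo "$high $max"    # 536870912 1073741824
```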
828 | memory.events | 852 | memory.events |
829 | |||
830 | A read-only flat-keyed file which exists on non-root cgroups. | 853 | A read-only flat-keyed file which exists on non-root cgroups. |
831 | The following entries are defined. Unless specified | 854 | The following entries are defined. Unless specified |
832 | otherwise, a value change in this file generates a file | 855 | otherwise, a value change in this file generates a file |
833 | modified event. | 856 | modified event. |
834 | 857 | ||
835 | low | 858 | low |
836 | |||
837 | The number of times the cgroup is reclaimed due to | 859 | The number of times the cgroup is reclaimed due to |
838 | high memory pressure even though its usage is under | 860 | high memory pressure even though its usage is under |
839 | the low boundary. This usually indicates that the low | 861 | the low boundary. This usually indicates that the low |
840 | boundary is over-committed. | 862 | boundary is over-committed. |
841 | 863 | ||
842 | high | 864 | high |
843 | |||
844 | The number of times processes of the cgroup are | 865 | The number of times processes of the cgroup are |
845 | throttled and routed to perform direct memory reclaim | 866 | throttled and routed to perform direct memory reclaim |
846 | because the high memory boundary was exceeded. For a | 867 | because the high memory boundary was exceeded. For a |
@@ -849,13 +870,11 @@ PAGE_SIZE multiple when read back. | |||
849 | occurrences are expected. | 870 | occurrences are expected. |
850 | 871 | ||
851 | max | 872 | max |
852 | |||
853 | The number of times the cgroup's memory usage was | 873 | The number of times the cgroup's memory usage was |
854 | about to go over the max boundary. If direct reclaim | 874 | about to go over the max boundary. If direct reclaim |
855 | fails to bring it down, the cgroup goes to OOM state. | 875 | fails to bring it down, the cgroup goes to OOM state. |
856 | 876 | ||
857 | oom | 877 | oom |
858 | |||
859 | The number of times the cgroup's memory usage | 878 | The number of times the cgroup's memory usage |
860 | reached the limit and allocation was about to fail. | 879 | reached the limit and allocation was about to fail. |
861 | 880 | ||
@@ -864,16 +883,14 @@ PAGE_SIZE multiple when read back. | |||
864 | 883 | ||
865 | The failed allocation may in turn be returned to | 884 | The failed allocation may in turn be returned to |
866 | userspace as -ENOMEM or silently ignored in cases like | 885 | userspace as -ENOMEM or silently ignored in cases like |
867 | disk readahead. For now OOM in memory cgroup kills | 886 | disk readahead. For now OOM in memory cgroup kills |
868 | tasks iff shortage has happened inside page fault. | 887 | tasks iff shortage has happened inside page fault. |
869 | 888 | ||
870 | oom_kill | 889 | oom_kill |
871 | |||
872 | The number of processes belonging to this cgroup | 890 | The number of processes belonging to this cgroup |
873 | killed by any kind of OOM killer. | 891 | killed by any kind of OOM killer. |
874 | 892 | ||
875 | memory.stat | 893 | memory.stat |
876 | |||
877 | A read-only flat-keyed file which exists on non-root cgroups. | 894 | A read-only flat-keyed file which exists on non-root cgroups. |
878 | 895 | ||
879 | This breaks down the cgroup's memory footprint into different | 896 | This breaks down the cgroup's memory footprint into different |
@@ -887,73 +904,55 @@ PAGE_SIZE multiple when read back. | |||
887 | fixed position; use the keys to look up specific values! | 904 | fixed position; use the keys to look up specific values! |
888 | 905 | ||
889 | anon | 906 | anon |
890 | |||
891 | Amount of memory used in anonymous mappings such as | 907 | Amount of memory used in anonymous mappings such as |
892 | brk(), sbrk(), and mmap(MAP_ANONYMOUS) | 908 | brk(), sbrk(), and mmap(MAP_ANONYMOUS) |
893 | 909 | ||
894 | file | 910 | file |
895 | |||
896 | Amount of memory used to cache filesystem data, | 911 | Amount of memory used to cache filesystem data, |
897 | including tmpfs and shared memory. | 912 | including tmpfs and shared memory. |
898 | 913 | ||
899 | kernel_stack | 914 | kernel_stack |
900 | |||
901 | Amount of memory allocated to kernel stacks. | 915 | Amount of memory allocated to kernel stacks. |
902 | 916 | ||
903 | slab | 917 | slab |
904 | |||
905 | Amount of memory used for storing in-kernel data | 918 | Amount of memory used for storing in-kernel data |
906 | structures. | 919 | structures. |
907 | 920 | ||
908 | sock | 921 | sock |
909 | |||
910 | Amount of memory used in network transmission buffers | 922 | Amount of memory used in network transmission buffers |
911 | 923 | ||
912 | shmem | 924 | shmem |
913 | |||
914 | Amount of cached filesystem data that is swap-backed, | 925 | Amount of cached filesystem data that is swap-backed, |
915 | such as tmpfs, shm segments, shared anonymous mmap()s | 926 | such as tmpfs, shm segments, shared anonymous mmap()s |
916 | 927 | ||
917 | file_mapped | 928 | file_mapped |
918 | |||
919 | Amount of cached filesystem data mapped with mmap() | 929 | Amount of cached filesystem data mapped with mmap() |
920 | 930 | ||
921 | file_dirty | 931 | file_dirty |
922 | |||
923 | Amount of cached filesystem data that was modified but | 932 | Amount of cached filesystem data that was modified but |
924 | not yet written back to disk | 933 | not yet written back to disk |
925 | 934 | ||
926 | file_writeback | 935 | file_writeback |
927 | |||
928 | Amount of cached filesystem data that was modified and | 936 | Amount of cached filesystem data that was modified and |
929 | is currently being written back to disk | 937 | is currently being written back to disk |
930 | 938 | ||
931 | inactive_anon | 939 | inactive_anon, active_anon, inactive_file, active_file, unevictable |
932 | active_anon | ||
933 | inactive_file | ||
934 | active_file | ||
935 | unevictable | ||
936 | |||
937 | Amount of memory, swap-backed and filesystem-backed, | 940 | Amount of memory, swap-backed and filesystem-backed, |
938 | on the internal memory management lists used by the | 941 | on the internal memory management lists used by the |
939 | page reclaim algorithm | 942 | page reclaim algorithm |
940 | 943 | ||
941 | slab_reclaimable | 944 | slab_reclaimable |
942 | |||
943 | Part of "slab" that might be reclaimed, such as | 945 | Part of "slab" that might be reclaimed, such as |
944 | dentries and inodes. | 946 | dentries and inodes. |
945 | 947 | ||
946 | slab_unreclaimable | 948 | slab_unreclaimable |
947 | |||
948 | Part of "slab" that cannot be reclaimed on memory | 949 | Part of "slab" that cannot be reclaimed on memory |
949 | pressure. | 950 | pressure. |
950 | 951 | ||
951 | pgfault | 952 | pgfault |
952 | |||
953 | Total number of page faults incurred | 953 | Total number of page faults incurred |
954 | 954 | ||
955 | pgmajfault | 955 | pgmajfault |
956 | |||
957 | Number of major page faults incurred | 956 | Number of major page faults incurred |
958 | 957 | ||
959 | workingset_refault | 958 | workingset_refault |
@@ -997,7 +996,6 @@ PAGE_SIZE multiple when read back. | |||
997 | Amount of reclaimed lazyfree pages | 996 | Amount of reclaimed lazyfree pages |
998 | 997 | ||
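Since memory.stat entries are not guaranteed a fixed position, consumers should select them by key rather than by line number; a minimal sketch over sample data (the values are illustrative, not from a real system):

```shell
stat=$(mktemp)
printf 'anon 8192\nfile 4096\nslab 1024\n' > "$stat"   # illustrative values

# Select by key, never by position.
anon=$(awk '$1 == "anon" { print $2 }' "$stat")
echo "$anon"    # 8192
```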
999 | memory.swap.current | 998 | memory.swap.current |
1000 | |||
1001 | A read-only single value file which exists on non-root | 999 | A read-only single value file which exists on non-root |
1002 | cgroups. | 1000 | cgroups. |
1003 | 1001 | ||
@@ -1005,7 +1003,6 @@ PAGE_SIZE multiple when read back. | |||
1005 | and its descendants. | 1003 | and its descendants. |
1006 | 1004 | ||
1007 | memory.swap.max | 1005 | memory.swap.max |
1008 | |||
1009 | A read-write single value file which exists on non-root | 1006 | A read-write single value file which exists on non-root |
1010 | cgroups. The default is "max". | 1007 | cgroups. The default is "max". |
1011 | 1008 | ||
@@ -1013,7 +1010,8 @@ PAGE_SIZE multiple when read back. | |||
1013 | limit, anonymous memory of the cgroup will not be swapped out. | 1010 | limit, anonymous memory of the cgroup will not be swapped out. |
1014 | 1011 | ||
1015 | 1012 | ||
1016 | 5-2-2. Usage Guidelines | 1013 | Usage Guidelines |
1014 | ~~~~~~~~~~~~~~~~ | ||
1017 | 1015 | ||
1018 | "memory.high" is the main mechanism to control memory usage. | 1016 | "memory.high" is the main mechanism to control memory usage. |
1019 | Over-committing on high limit (sum of high limits > available memory) | 1017 | Over-committing on high limit (sum of high limits > available memory) |
@@ -1036,7 +1034,8 @@ memory; unfortunately, memory pressure monitoring mechanism isn't | |||
1036 | implemented yet. | 1034 | implemented yet. |
1037 | 1035 | ||
1038 | 1036 | ||
1039 | 5-2-3. Memory Ownership | 1037 | Memory Ownership |
1038 | ~~~~~~~~~~~~~~~~ | ||
1040 | 1039 | ||
1041 | A memory area is charged to the cgroup which instantiated it and stays | 1040 | A memory area is charged to the cgroup which instantiated it and stays |
1042 | charged to the cgroup until the area is released. Migrating a process | 1041 | charged to the cgroup until the area is released. Migrating a process |
@@ -1054,7 +1053,8 @@ POSIX_FADV_DONTNEED to relinquish the ownership of memory areas | |||
1054 | belonging to the affected files to ensure correct memory ownership. | 1053 | belonging to the affected files to ensure correct memory ownership. |
1055 | 1054 | ||
1056 | 1055 | ||
1057 | 5-3. IO | 1056 | IO |
1057 | -- | ||
1058 | 1058 | ||
1059 | The "io" controller regulates the distribution of IO resources. This | 1059 | The "io" controller regulates the distribution of IO resources. This |
1060 | controller implements both weight based and absolute bandwidth or IOPS | 1060 | controller implements both weight based and absolute bandwidth or IOPS |
@@ -1063,28 +1063,29 @@ only if cfq-iosched is in use and neither scheme is available for | |||
1063 | blk-mq devices. | 1063 | blk-mq devices. |
1064 | 1064 | ||
1065 | 1065 | ||
1066 | 5-3-1. IO Interface Files | 1066 | IO Interface Files |
1067 | ~~~~~~~~~~~~~~~~~~ | ||
1067 | 1068 | ||
1068 | io.stat | 1069 | io.stat |
1069 | |||
1070 | A read-only nested-keyed file which exists on non-root | 1070 | A read-only nested-keyed file which exists on non-root |
1071 | cgroups. | 1071 | cgroups. |
1072 | 1072 | ||
1073 | Lines are keyed by $MAJ:$MIN device numbers and not ordered. | 1073 | Lines are keyed by $MAJ:$MIN device numbers and not ordered. |
1074 | The following nested keys are defined. | 1074 | The following nested keys are defined. |
1075 | 1075 | ||
1076 | ====== =================== | ||
1076 | rbytes Bytes read | 1077 | rbytes Bytes read |
1077 | wbytes Bytes written | 1078 | wbytes Bytes written |
1078 | rios Number of read IOs | 1079 | rios Number of read IOs |
1079 | wios Number of write IOs | 1080 | wios Number of write IOs |
1081 | ====== =================== | ||
1080 | 1082 | ||
1081 | An example read output follows. | 1083 | An example read output follows:: |
1082 | 1084 | ||
1083 | 8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 | 1085 | 8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 |
1084 | 8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 | 1086 | 8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252 |
1085 | 1087 | ||
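Nested-keyed lines like these can be parsed by device key and sub-key; a sketch using the example output above as sample data:

```shell
io=$(mktemp)
cat > "$io" <<'EOF'
8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353
8:0 rbytes=90430464 wbytes=299008000 rios=8950 wios=1252
EOF

# Extract one sub-key (wbytes) for one device (8:0).
wbytes=$(awk '$1 == "8:0" {
    for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "wbytes") print kv[2]
    }
}' "$io")
echo "$wbytes"    # 299008000
```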
1086 | io.weight | 1088 | io.weight |
1087 | |||
1088 | A read-write flat-keyed file which exists on non-root cgroups. | 1089 | A read-write flat-keyed file which exists on non-root cgroups. |
1089 | The default is "default 100". | 1090 | The default is "default 100". |
1090 | 1091 | ||
@@ -1098,14 +1099,13 @@ blk-mq devices. | |||
1098 | $WEIGHT" or simply "$WEIGHT". Overrides can be set by writing | 1099 | $WEIGHT" or simply "$WEIGHT". Overrides can be set by writing |
1099 | "$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default". | 1100 | "$MAJ:$MIN $WEIGHT" and unset by writing "$MAJ:$MIN default". |
1100 | 1101 | ||
1101 | An example read output follows. | 1102 | An example read output follows:: |
1102 | 1103 | ||
1103 | default 100 | 1104 | default 100 |
1104 | 8:16 200 | 1105 | 8:16 200 |
1105 | 8:0 50 | 1106 | 8:0 50 |
1106 | 1107 | ||
1107 | io.max | 1108 | io.max |
1108 | |||
1109 | A read-write nested-keyed file which exists on non-root | 1109 | A read-write nested-keyed file which exists on non-root |
1110 | cgroups. | 1110 | cgroups. |
1111 | 1111 | ||
@@ -1113,10 +1113,12 @@ blk-mq devices. | |||
1113 | device numbers and not ordered. The following nested keys are | 1113 | device numbers and not ordered. The following nested keys are |
1114 | defined. | 1114 | defined. |
1115 | 1115 | ||
1116 | ===== ================================== | ||
1116 | rbps Max read bytes per second | 1117 | rbps Max read bytes per second |
1117 | wbps Max write bytes per second | 1118 | wbps Max write bytes per second |
1118 | riops Max read IO operations per second | 1119 | riops Max read IO operations per second |
1119 | wiops Max write IO operations per second | 1120 | wiops Max write IO operations per second |
1121 | ===== ================================== | ||
1120 | 1122 | ||
1121 | When writing, any number of nested key-value pairs can be | 1123 | When writing, any number of nested key-value pairs can be |
1122 | specified in any order. "max" can be specified as the value | 1124 | specified in any order. "max" can be specified as the value |
@@ -1126,24 +1128,25 @@ blk-mq devices. | |||
1126 | BPS and IOPS are measured in each IO direction and IOs are | 1128 | BPS and IOPS are measured in each IO direction and IOs are |
1127 | delayed if limit is reached. Temporary bursts are allowed. | 1129 | delayed if limit is reached. Temporary bursts are allowed. |
1128 | 1130 | ||
1129 | Setting read limit at 2M BPS and write at 120 IOPS for 8:16. | 1131 | Setting read limit at 2M BPS and write at 120 IOPS for 8:16:: |
1130 | 1132 | ||
1131 | echo "8:16 rbps=2097152 wiops=120" > io.max | 1133 | echo "8:16 rbps=2097152 wiops=120" > io.max |
1132 | 1134 | ||
1133 | Reading returns the following. | 1135 | Reading returns the following:: |
1134 | 1136 | ||
1135 | 8:16 rbps=2097152 wbps=max riops=max wiops=120 | 1137 | 8:16 rbps=2097152 wbps=max riops=max wiops=120 |
1136 | 1138 | ||
1137 | Write IOPS limit can be removed by writing the following. | 1139 | Write IOPS limit can be removed by writing the following:: |
1138 | 1140 | ||
1139 | echo "8:16 wiops=max" > io.max | 1141 | echo "8:16 wiops=max" > io.max |
1140 | 1142 | ||
1141 | Reading now returns the following. | 1143 | Reading now returns the following:: |
1142 | 1144 | ||
1143 | 8:16 rbps=2097152 wbps=max riops=max wiops=max | 1145 | 8:16 rbps=2097152 wbps=max riops=max wiops=max |
1144 | 1146 | ||
1145 | 1147 | ||
1146 | 5-3-2. Writeback | 1148 | Writeback |
1149 | ~~~~~~~~~ | ||
1147 | 1150 | ||
1148 | Page cache is dirtied through buffered writes and shared mmaps and | 1151 | Page cache is dirtied through buffered writes and shared mmaps and |
1149 | written asynchronously to the backing filesystem by the writeback | 1152 | written asynchronously to the backing filesystem by the writeback |
@@ -1191,22 +1194,19 @@ patterns. | |||
1191 | The sysctl knobs which affect writeback behavior are applied to cgroup | 1194 | The sysctl knobs which affect writeback behavior are applied to cgroup |
1192 | writeback as follows. | 1195 | writeback as follows. |
1193 | 1196 | ||
1194 | vm.dirty_background_ratio | 1197 | vm.dirty_background_ratio, vm.dirty_ratio |
1195 | vm.dirty_ratio | ||
1196 | |||
1197 | These ratios apply the same to cgroup writeback with the | 1198 | These ratios apply the same to cgroup writeback with the |
1198 | amount of available memory capped by limits imposed by the | 1199 | amount of available memory capped by limits imposed by the |
1199 | memory controller and system-wide clean memory. | 1200 | memory controller and system-wide clean memory. |
1200 | 1201 | ||
1201 | vm.dirty_background_bytes | 1202 | vm.dirty_background_bytes, vm.dirty_bytes |
1202 | vm.dirty_bytes | ||
1203 | |||
1204 | For cgroup writeback, this is calculated as a ratio against | 1203 | For cgroup writeback, this is calculated as a ratio against |
1205 | total available memory and applied the same way as | 1204 | total available memory and applied the same way as |
1206 | vm.dirty[_background]_ratio. | 1205 | vm.dirty[_background]_ratio. |
1207 | 1206 | ||
1208 | 1207 | ||
1209 | 5-4. PID | 1208 | PID |
1209 | --- | ||
1210 | 1210 | ||
1211 | The process number controller is used to allow a cgroup to stop any | 1211 | The process number controller is used to allow a cgroup to stop any |
1212 | new tasks from being fork()'d or clone()'d after a specified limit is | 1212 | new tasks from being fork()'d or clone()'d after a specified limit is |
@@ -1221,17 +1221,16 @@ Note that PIDs used in this controller refer to TIDs, process IDs as | |||
1221 | used by the kernel. | 1221 | used by the kernel. |
1222 | 1222 | ||
1223 | 1223 | ||
1224 | 5-4-1. PID Interface Files | 1224 | PID Interface Files |
1225 | ~~~~~~~~~~~~~~~~~~~ | ||
1225 | 1226 | ||
1226 | pids.max | 1227 | pids.max |
1227 | |||
1228 | A read-write single value file which exists on non-root | 1228 | A read-write single value file which exists on non-root |
1229 | cgroups. The default is "max". | 1229 | cgroups. The default is "max". |
1230 | 1230 | ||
1231 | Hard limit of number of processes. | 1231 | Hard limit of number of processes. |
1232 | 1232 | ||
1233 | pids.current | 1233 | pids.current |
1234 | |||
1235 | A read-only single value file which exists on all cgroups. | 1234 | A read-only single value file which exists on all cgroups. |
1236 | 1235 | ||
1237 | The number of processes currently in the cgroup and its | 1236 | The number of processes currently in the cgroup and its |
@@ -1246,12 +1245,14 @@ through fork() or clone(). These will return -EAGAIN if the creation | |||
1246 | of a new process would cause a cgroup policy to be violated. | 1245 | of a new process would cause a cgroup policy to be violated. |
1247 | 1246 | ||
1248 | 1247 | ||
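The pair of files can be interpreted together: pids.current against pids.max gives the remaining fork headroom, where a pids.max of "max" means no limit. A sketch with sample values standing in for real reads (`pid_headroom` is a hypothetical helper):

```shell
# Remaining processes that may be created before -EAGAIN, given
# pids.current ($1) and pids.max ($2).
pid_headroom() {
    if [ "$2" = "max" ]; then
        echo unlimited
    else
        echo $(($2 - $1))
    fi
}

h1=$(pid_headroom 42 max)    # unlimited
h2=$(pid_headroom 42 100)    # 58
echo "$h1 $h2"
```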
1249 | 5-5. RDMA | 1248 | RDMA |
1249 | ---- | ||
1250 | 1250 | ||
1251 | The "rdma" controller regulates the distribution and accounting of | 1251 | The "rdma" controller regulates the distribution and accounting of |
1252 | RDMA resources. | 1252 | RDMA resources. |
1253 | 1253 | ||
1254 | 5-5-1. RDMA Interface Files | 1254 | RDMA Interface Files |
1255 | ~~~~~~~~~~~~~~~~~~~~ | ||
1255 | 1256 | ||
1256 | rdma.max | 1257 | rdma.max |
1257 | A read-write nested-keyed file that exists for all the cgroups | 1258 | A read-write nested-keyed file that exists for all the cgroups |
@@ -1264,10 +1265,12 @@ of RDMA resources. | |||
1264 | 1265 | ||
1265 | The following nested keys are defined. | 1266 | The following nested keys are defined. |
1266 | 1267 | ||
1268 | ========== ============================= | ||
1267 | hca_handle Maximum number of HCA Handles | 1269 | hca_handle Maximum number of HCA Handles |
1268 | hca_object Maximum number of HCA Objects | 1270 | hca_object Maximum number of HCA Objects |
1271 | ========== ============================= | ||
1269 | 1272 | ||
1270 | An example for mlx4 and ocrdma device follows. | 1273 | An example for mlx4 and ocrdma device follows:: |
1271 | 1274 | ||
1272 | mlx4_0 hca_handle=2 hca_object=2000 | 1275 | mlx4_0 hca_handle=2 hca_object=2000 |
1273 | ocrdma1 hca_handle=3 hca_object=max | 1276 | ocrdma1 hca_handle=3 hca_object=max |
@@ -1276,15 +1279,17 @@ of RDMA resources. | |||
1276 | A read-only file that describes current resource usage. | 1279 | A read-only file that describes current resource usage. |
1277 | It exists for all cgroups except root. | 1280 | It exists for all cgroups except root. |
1278 | 1281 | ||
1279 | An example for mlx4 and ocrdma device follows. | 1282 | An example for mlx4 and ocrdma device follows:: |
1280 | 1283 | ||
1281 | mlx4_0 hca_handle=1 hca_object=20 | 1284 | mlx4_0 hca_handle=1 hca_object=20 |
1282 | ocrdma1 hca_handle=1 hca_object=23 | 1285 | ocrdma1 hca_handle=1 hca_object=23 |
1283 | 1286 | ||
1284 | 1287 | ||
1285 | 5-6. Misc | 1288 | Misc |
1289 | ---- | ||
1286 | 1290 | ||
1287 | 5-6-1. perf_event | 1291 | perf_event |
1292 | ~~~~~~~~~~ | ||
1288 | 1293 | ||
1289 | perf_event controller, if not mounted on a legacy hierarchy, is | 1294 | perf_event controller, if not mounted on a legacy hierarchy, is |
1290 | automatically enabled on the v2 hierarchy so that perf events can | 1295 | automatically enabled on the v2 hierarchy so that perf events can |
@@ -1292,9 +1297,11 @@ always be filtered by cgroup v2 path. The controller can still be | |||
1292 | moved to a legacy hierarchy after v2 hierarchy is populated. | 1297 | moved to a legacy hierarchy after v2 hierarchy is populated. |
1293 | 1298 | ||
1294 | 1299 | ||
1295 | 6. Namespace | 1300 | Namespace |
1301 | ========= | ||
1296 | 1302 | ||
1297 | 6-1. Basics | 1303 | Basics |
1304 | ------ | ||
1298 | 1305 | ||
1299 | cgroup namespace provides a mechanism to virtualize the view of the | 1306 | cgroup namespace provides a mechanism to virtualize the view of the |
1300 | "/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone | 1307 | "/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone |
@@ -1308,7 +1315,7 @@ Without cgroup namespace, the "/proc/$PID/cgroup" file shows the | |||
1308 | complete path of the cgroup of a process. In a container setup where | 1315 | complete path of the cgroup of a process. In a container setup where |
1309 | a set of cgroups and namespaces are intended to isolate processes, the | 1316 | a set of cgroups and namespaces are intended to isolate processes, the |
1310 | "/proc/$PID/cgroup" file may leak system-level information | 1317 | "/proc/$PID/cgroup" file may leak system-level information |
1311 | to the isolated processes. For example: | 1318 | to the isolated processes. For example:: |
1312 | 1319 | ||
1313 | # cat /proc/self/cgroup | 1320 | # cat /proc/self/cgroup |
1314 | 0::/batchjobs/container_id1 | 1321 | 0::/batchjobs/container_id1 |
@@ -1316,14 +1323,14 @@ to the isolated processes. For Example: | |||
1316 | The path '/batchjobs/container_id1' can be considered system data | 1323 | The path '/batchjobs/container_id1' can be considered system data |
1317 | that is undesirable to expose to the isolated processes. cgroup namespace | 1324 | that is undesirable to expose to the isolated processes. cgroup namespace |
1318 | can be used to restrict visibility of this path. For example, before | 1325 | can be used to restrict visibility of this path. For example, before |
1319 | creating a cgroup namespace, one would see: | 1326 | creating a cgroup namespace, one would see:: |
1320 | 1327 | ||
1321 | # ls -l /proc/self/ns/cgroup | 1328 | # ls -l /proc/self/ns/cgroup |
1322 | lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835] | 1329 | lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835] |
1323 | # cat /proc/self/cgroup | 1330 | # cat /proc/self/cgroup |
1324 | 0::/batchjobs/container_id1 | 1331 | 0::/batchjobs/container_id1 |
1325 | 1332 | ||
1326 | After unsharing a new namespace, the view changes. | 1333 | After unsharing a new namespace, the view changes:: |
1327 | 1334 | ||
1328 | # ls -l /proc/self/ns/cgroup | 1335 | # ls -l /proc/self/ns/cgroup |
1329 | lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183] | 1336 | lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183] |
@@ -1341,7 +1348,8 @@ namespace is destroyed. The cgroupns root and the actual cgroups | |||
1341 | remain. | 1348 | remain. |
1342 | 1349 | ||
1343 | 1350 | ||
1344 | 6-2. The Root and Views | 1351 | The Root and Views |
1352 | ------------------ | ||
1345 | 1353 | ||
1346 | The 'cgroupns root' for a cgroup namespace is the cgroup in which the | 1354 | The 'cgroupns root' for a cgroup namespace is the cgroup in which the |
1347 | process calling unshare(2) is running. For example, if a process in | 1355 | process calling unshare(2) is running. For example, if a process in |
@@ -1350,7 +1358,7 @@ process calling unshare(2) is running. For example, if a process in | |||
1350 | init_cgroup_ns, this is the real root ('/') cgroup. | 1358 | init_cgroup_ns, this is the real root ('/') cgroup. |
1351 | 1359 | ||
1352 | The cgroupns root cgroup does not change even if the namespace creator | 1360 | The cgroupns root cgroup does not change even if the namespace creator |
1353 | process later moves to a different cgroup. | 1361 | process later moves to a different cgroup:: |
1354 | 1362 | ||
1355 | # ~/unshare -c # unshare cgroupns in some cgroup | 1363 | # ~/unshare -c # unshare cgroupns in some cgroup |
1356 | # cat /proc/self/cgroup | 1364 | # cat /proc/self/cgroup |
@@ -1364,7 +1372,7 @@ Each process gets its namespace-specific view of "/proc/$PID/cgroup" | |||
1364 | 1372 | ||
1365 | Processes running inside the cgroup namespace will be able to see | 1373 | Processes running inside the cgroup namespace will be able to see |
1366 | cgroup paths (in /proc/self/cgroup) only inside their root cgroup. | 1374 | cgroup paths (in /proc/self/cgroup) only inside their root cgroup. |
1367 | From within an unshared cgroupns: | 1375 | From within an unshared cgroupns:: |
1368 | 1376 | ||
1369 | # sleep 100000 & | 1377 | # sleep 100000 & |
1370 | [1] 7353 | 1378 | [1] 7353 |
@@ -1373,7 +1381,7 @@ From within an unshared cgroupns: | |||
1373 | 0::/sub_cgrp_1 | 1381 | 0::/sub_cgrp_1 |
1374 | 1382 | ||
1375 | From the initial cgroup namespace, the real cgroup path will be | 1383 | From the initial cgroup namespace, the real cgroup path will be |
1376 | visible: | 1384 | visible:: |
1377 | 1385 | ||
1378 | $ cat /proc/7353/cgroup | 1386 | $ cat /proc/7353/cgroup |
1379 | 0::/batchjobs/container_id1/sub_cgrp_1 | 1387 | 0::/batchjobs/container_id1/sub_cgrp_1 |
@@ -1381,7 +1389,7 @@ visible: | |||
1381 | From a sibling cgroup namespace (that is, a namespace rooted at a | 1389 | From a sibling cgroup namespace (that is, a namespace rooted at a |
1382 | different cgroup), the cgroup path relative to its own cgroup | 1390 | different cgroup), the cgroup path relative to its own cgroup |
1383 | namespace root will be shown. For instance, if PID 7353's cgroup | 1391 | namespace root will be shown. For instance, if PID 7353's cgroup |
1384 | namespace root is at '/batchjobs/container_id2', then it will see | 1392 | namespace root is at '/batchjobs/container_id2', then it will see:: |
1385 | 1393 | ||
1386 | # cat /proc/7353/cgroup | 1394 | # cat /proc/7353/cgroup |
1387 | 0::/../container_id2/sub_cgrp_1 | 1395 | 0::/../container_id2/sub_cgrp_1 |
@@ -1390,13 +1398,14 @@ Note that the relative path always starts with '/' to indicate that | |||
1390 | it's relative to the cgroup namespace root of the caller. | 1398 | it's relative to the cgroup namespace root of the caller. |
1391 | 1399 | ||
1392 | 1400 | ||
1393 | 6-3. Migration and setns(2) | 1401 | Migration and setns(2) |
1402 | ---------------------- | ||
1394 | 1403 | ||
1395 | Processes inside a cgroup namespace can move into and out of the | 1404 | Processes inside a cgroup namespace can move into and out of the |
1396 | namespace root if they have proper access to external cgroups. For | 1405 | namespace root if they have proper access to external cgroups. For |
1397 | example, from inside a namespace with cgroupns root at | 1406 | example, from inside a namespace with cgroupns root at |
1398 | /batchjobs/container_id1, and assuming that the global hierarchy is | 1407 | /batchjobs/container_id1, and assuming that the global hierarchy is |
1399 | still accessible inside cgroupns: | 1408 | still accessible inside cgroupns:: |
1400 | 1409 | ||
1401 | # cat /proc/7353/cgroup | 1410 | # cat /proc/7353/cgroup |
1402 | 0::/sub_cgrp_1 | 1411 | 0::/sub_cgrp_1 |
@@ -1418,10 +1427,11 @@ namespace. It is expected that someone moves the attaching | |||
1418 | process under the target cgroup namespace root. | 1427 | process under the target cgroup namespace root. |
1419 | 1428 | ||
1420 | 1429 | ||
1421 | 6-4. Interaction with Other Namespaces | 1430 | Interaction with Other Namespaces |
1431 | --------------------------------- | ||
1422 | 1432 | ||
1423 | A namespace-specific cgroup hierarchy can be mounted by a process | 1433 | A namespace-specific cgroup hierarchy can be mounted by a process |
1424 | running inside a non-init cgroup namespace. | 1434 | running inside a non-init cgroup namespace:: |
1425 | 1435 | ||
1426 | # mount -t cgroup2 none $MOUNT_POINT | 1436 | # mount -t cgroup2 none $MOUNT_POINT |
1427 | 1437 | ||
@@ -1434,27 +1444,27 @@ the view of cgroup hierarchy by namespace-private cgroupfs mount | |||
1434 | provides a properly isolated cgroup view inside the container. | 1444 | provides a properly isolated cgroup view inside the container. |
1435 | 1445 | ||
1436 | 1446 | ||
1437 | P. Information on Kernel Programming | 1447 | Information on Kernel Programming |
1448 | ================================= | ||
1438 | 1449 | ||
1439 | This section contains kernel programming information in the areas | 1450 | This section contains kernel programming information in the areas |
1440 | where interacting with cgroup is necessary. cgroup core and | 1451 | where interacting with cgroup is necessary. cgroup core and |
1441 | controllers are not covered. | 1452 | controllers are not covered. |
1442 | 1453 | ||
1443 | 1454 | ||
1444 | P-1. Filesystem Support for Writeback | 1455 | Filesystem Support for Writeback |
1456 | -------------------------------- | ||
1445 | 1457 | ||
1446 | A filesystem can support cgroup writeback by updating | 1458 | A filesystem can support cgroup writeback by updating |
1447 | address_space_operations->writepage[s]() to annotate bios using the | 1459 | address_space_operations->writepage[s]() to annotate bios using the |
1448 | following two functions. | 1460 | following two functions. |
1449 | 1461 | ||
1450 | wbc_init_bio(@wbc, @bio) | 1462 | wbc_init_bio(@wbc, @bio) |
1451 | |||
1452 | Should be called for each bio carrying writeback data and | 1463 | Should be called for each bio carrying writeback data and |
1453 | associates the bio with the inode's owner cgroup. Can be | 1464 | associates the bio with the inode's owner cgroup. Can be |
1454 | called anytime between bio allocation and submission. | 1465 | called anytime between bio allocation and submission. |
1455 | 1466 | ||
1456 | wbc_account_io(@wbc, @page, @bytes) | 1467 | wbc_account_io(@wbc, @page, @bytes) |
1457 | |||
1458 | Should be called for each data segment being written out. | 1468 | Should be called for each data segment being written out. |
1459 | While this function doesn't care exactly when it's called | 1469 | While this function doesn't care exactly when it's called |
1460 | during the writeback session, it's the easiest and most | 1470 | during the writeback session, it's the easiest and most |
@@ -1475,7 +1485,8 @@ cases by skipping wbc_init_bio() or using bio_associate_blkcg() | |||
1475 | directly. | 1485 | directly. |
1476 | 1486 | ||
1477 | 1487 | ||
1478 | D. Deprecated v1 Core Features | 1488 | Deprecated v1 Core Features |
1489 | =========================== | ||
1479 | 1490 | ||
1480 | - Multiple hierarchies including named ones are not supported. | 1491 | - Multiple hierarchies including named ones are not supported. |
1481 | 1492 | ||
@@ -1489,9 +1500,11 @@ D. Deprecated v1 Core Features | |||
1489 | at the root instead. | 1500 | at the root instead. |
1490 | 1501 | ||
1491 | 1502 | ||
1492 | R. Issues with v1 and Rationales for v2 | 1503 | Issues with v1 and Rationales for v2 |
1504 | ==================================== | ||
1493 | 1505 | ||
1494 | R-1. Multiple Hierarchies | 1506 | Multiple Hierarchies |
1507 | -------------------- | ||
1495 | 1508 | ||
1496 | cgroup v1 allowed an arbitrary number of hierarchies and each | 1509 | cgroup v1 allowed an arbitrary number of hierarchies and each |
1497 | hierarchy could host any number of controllers. While this seemed to | 1510 | hierarchy could host any number of controllers. While this seemed to |
@@ -1543,7 +1556,8 @@ how memory is distributed beyond a certain level while still wanting | |||
1543 | to control how CPU cycles are distributed. | 1556 | to control how CPU cycles are distributed. |
1544 | 1557 | ||
1545 | 1558 | ||
1546 | R-2. Thread Granularity | 1559 | Thread Granularity |
1560 | ------------------ | ||
1547 | 1561 | ||
1548 | cgroup v1 allowed threads of a process to belong to different cgroups. | 1562 | cgroup v1 allowed threads of a process to belong to different cgroups. |
1549 | This didn't make sense for some controllers and those controllers | 1563 | This didn't make sense for some controllers and those controllers |
@@ -1586,7 +1600,8 @@ misbehaving and poorly abstracted interfaces and kernel exposing and | |||
1586 | locked into constructs inadvertently. | 1600 | locked into constructs inadvertently. |
1587 | 1601 | ||
1588 | 1602 | ||
1589 | R-3. Competition Between Inner Nodes and Threads | 1603 | Competition Between Inner Nodes and Threads |
1604 | ------------------------------------------- | ||
1590 | 1605 | ||
1591 | cgroup v1 allowed threads to be in any cgroup, which created an | 1606 | cgroup v1 allowed threads to be in any cgroup, which created an |
1592 | interesting problem where threads belonging to a parent cgroup and its | 1607 | interesting problem where threads belonging to a parent cgroup and its |
@@ -1605,7 +1620,7 @@ simply weren't available for threads. | |||
1605 | 1620 | ||
1606 | The io controller implicitly created a hidden leaf node for each | 1621 | The io controller implicitly created a hidden leaf node for each |
1607 | cgroup to host the threads. The hidden leaf had its own copies of all | 1622 | cgroup to host the threads. The hidden leaf had its own copies of all |
1608 | the knobs with "leaf_" prefixed. While this allowed equivalent | 1623 | the knobs with ``leaf_`` prefixed. While this allowed equivalent |
1609 | control over internal threads, it came with serious drawbacks. It | 1624 | control over internal threads, it came with serious drawbacks. It |
1610 | always added an extra layer of nesting which wouldn't be necessary | 1625 | always added an extra layer of nesting which wouldn't be necessary |
1611 | otherwise, made the interface messy and significantly complicated the | 1626 | otherwise, made the interface messy and significantly complicated the |
@@ -1626,7 +1641,8 @@ This clearly is a problem which needs to be addressed from cgroup core | |||
1626 | in a uniform way. | 1641 | in a uniform way. |
1627 | 1642 | ||
1628 | 1643 | ||
1629 | R-4. Other Interface Issues | 1644 | Other Interface Issues |
1645 | ---------------------- | ||
1630 | 1646 | ||
1631 | cgroup v1 grew without oversight and developed a large number of | 1647 | cgroup v1 grew without oversight and developed a large number of |
1632 | idiosyncrasies and inconsistencies. One issue on the cgroup core side | 1648 | idiosyncrasies and inconsistencies. One issue on the cgroup core side |
@@ -1654,9 +1670,11 @@ cgroup v2 establishes common conventions where appropriate and updates | |||
1654 | controllers so that they expose minimal and consistent interfaces. | 1670 | controllers so that they expose minimal and consistent interfaces. |
1655 | 1671 | ||
1656 | 1672 | ||
1657 | R-5. Controller Issues and Remedies | 1673 | Controller Issues and Remedies |
1674 | ------------------------------ | ||
1658 | 1675 | ||
1659 | R-5-1. Memory | 1676 | Memory |
1677 | ~~~~~~ | ||
1660 | 1678 | ||
1661 | The original lower boundary, the soft limit, is defined as a limit | 1679 | The original lower boundary, the soft limit, is defined as a limit |
1662 | that is unset by default. As a result, the set of cgroups that | 1680 | that is unset by default. As a result, the set of cgroups that |
diff --git a/Documentation/circular-buffers.txt b/Documentation/circular-buffers.txt index 4a824d232472..d4628174b7c5 100644 --- a/Documentation/circular-buffers.txt +++ b/Documentation/circular-buffers.txt | |||
@@ -1,9 +1,9 @@ | |||
1 | ================ | 1 | ================ |
2 | CIRCULAR BUFFERS | 2 | Circular Buffers |
3 | ================ | 3 | ================ |
4 | 4 | ||
5 | By: David Howells <dhowells@redhat.com> | 5 | :Author: David Howells <dhowells@redhat.com> |
6 | Paul E. McKenney <paulmck@linux.vnet.ibm.com> | 6 | :Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> |
7 | 7 | ||
8 | 8 | ||
9 | Linux provides a number of features that can be used to implement circular | 9 | Linux provides a number of features that can be used to implement circular |
@@ -20,7 +20,7 @@ producer and just one consumer. It is possible to handle multiple producers by | |||
20 | serialising them, and to handle multiple consumers by serialising them. | 20 | serialising them, and to handle multiple consumers by serialising them. |
21 | 21 | ||
22 | 22 | ||
23 | Contents: | 23 | .. Contents: |
24 | 24 | ||
25 | (*) What is a circular buffer? | 25 | (*) What is a circular buffer? |
26 | 26 | ||
@@ -31,8 +31,8 @@ Contents: | |||
31 | - The consumer. | 31 | - The consumer. |
32 | 32 | ||
33 | 33 | ||
34 | ========================== | 34 | |
35 | WHAT IS A CIRCULAR BUFFER? | 35 | What is a circular buffer? |
36 | ========================== | 36 | ========================== |
37 | 37 | ||
38 | First of all, what is a circular buffer? A circular buffer is a buffer of | 38 | First of all, what is a circular buffer? A circular buffer is a buffer of |
@@ -60,9 +60,7 @@ buffer, provided that neither index overtakes the other. The implementer must | |||
60 | be careful, however, as a region more than one unit in size may wrap the end of | 60 | be careful, however, as a region more than one unit in size may wrap the end of |
61 | the buffer and be broken into two segments. | 61 | the buffer and be broken into two segments. |
62 | 62 | ||
63 | 63 | Measuring power-of-2 buffers | |
64 | ============================ | ||
65 | MEASURING POWER-OF-2 BUFFERS | ||
66 | ============================ | 64 | ============================ |
67 | 65 | ||
68 | Calculation of the occupancy or the remaining capacity of an arbitrarily sized | 66 | Calculation of the occupancy or the remaining capacity of an arbitrarily sized |
@@ -71,13 +69,13 @@ modulus (divide) instruction. However, if the buffer is of a power-of-2 size, | |||
71 | then a much quicker bitwise-AND instruction can be used instead. | 69 | then a much quicker bitwise-AND instruction can be used instead. |
72 | 70 | ||
73 | Linux provides a set of macros for handling power-of-2 circular buffers. These | 71 | Linux provides a set of macros for handling power-of-2 circular buffers. These |
74 | can be made use of by: | 72 | can be made use of by:: |
75 | 73 | ||
76 | #include <linux/circ_buf.h> | 74 | #include <linux/circ_buf.h> |
77 | 75 | ||
78 | The macros are: | 76 | The macros are: |
79 | 77 | ||
80 | (*) Measure the remaining capacity of a buffer: | 78 | (#) Measure the remaining capacity of a buffer:: |
81 | 79 | ||
82 | CIRC_SPACE(head_index, tail_index, buffer_size); | 80 | CIRC_SPACE(head_index, tail_index, buffer_size); |
83 | 81 | ||
@@ -85,7 +83,7 @@ The macros are: | |||
85 | can be inserted. | 83 | can be inserted. |
86 | 84 | ||
87 | 85 | ||
88 | (*) Measure the maximum consecutive immediate space in a buffer: | 86 | (#) Measure the maximum consecutive immediate space in a buffer:: |
89 | 87 | ||
90 | CIRC_SPACE_TO_END(head_index, tail_index, buffer_size); | 88 | CIRC_SPACE_TO_END(head_index, tail_index, buffer_size); |
91 | 89 | ||
@@ -94,14 +92,14 @@ The macros are: | |||
94 | beginning of the buffer. | 92 | beginning of the buffer. |
95 | 93 | ||
96 | 94 | ||
97 | (*) Measure the occupancy of a buffer: | 95 | (#) Measure the occupancy of a buffer:: |
98 | 96 | ||
99 | CIRC_CNT(head_index, tail_index, buffer_size); | 97 | CIRC_CNT(head_index, tail_index, buffer_size); |
100 | 98 | ||
101 | This returns the number of items currently occupying a buffer[2]. | 99 | This returns the number of items currently occupying a buffer[2]. |
102 | 100 | ||
103 | 101 | ||
104 | (*) Measure the non-wrapping occupancy of a buffer: | 102 | (#) Measure the non-wrapping occupancy of a buffer:: |
105 | 103 | ||
106 | CIRC_CNT_TO_END(head_index, tail_index, buffer_size); | 104 | CIRC_CNT_TO_END(head_index, tail_index, buffer_size); |
107 | 105 | ||
@@ -112,7 +110,7 @@ The macros are: | |||
112 | Each of these macros will nominally return a value between 0 and buffer_size-1, | 110 | Each of these macros will nominally return a value between 0 and buffer_size-1, |
113 | however: | 111 | however: |
114 | 112 | ||
115 | [1] CIRC_SPACE*() are intended to be used in the producer. To the producer | 113 | (1) CIRC_SPACE*() are intended to be used in the producer. To the producer |
116 | they will return a lower bound as the producer controls the head index, | 114 | they will return a lower bound as the producer controls the head index, |
117 | but the consumer may still be depleting the buffer on another CPU and | 115 | but the consumer may still be depleting the buffer on another CPU and |
118 | moving the tail index. | 116 | moving the tail index. |
@@ -120,7 +118,7 @@ however: | |||
120 | To the consumer it will show an upper bound as the producer may be busy | 118 | To the consumer it will show an upper bound as the producer may be busy |
121 | depleting the space. | 119 | depleting the space. |
122 | 120 | ||
123 | [2] CIRC_CNT*() are intended to be used in the consumer. To the consumer they | 121 | (2) CIRC_CNT*() are intended to be used in the consumer. To the consumer they |
124 | will return a lower bound as the consumer controls the tail index, but the | 122 | will return a lower bound as the consumer controls the tail index, but the |
125 | producer may still be filling the buffer on another CPU and moving the | 123 | producer may still be filling the buffer on another CPU and moving the |
126 | head index. | 124 | head index. |
@@ -128,14 +126,12 @@ however: | |||
128 | To the producer it will show an upper bound as the consumer may be busy | 126 | To the producer it will show an upper bound as the consumer may be busy |
129 | emptying the buffer. | 127 | emptying the buffer. |
130 | 128 | ||
131 | [3] To a third party, the order in which the writes to the indices by the | 129 | (3) To a third party, the order in which the writes to the indices by the |
132 | producer and consumer become visible cannot be guaranteed as they are | 130 | producer and consumer become visible cannot be guaranteed as they are |
133 | independent and may be made on different CPUs - so the result in such a | 131 | independent and may be made on different CPUs - so the result in such a |
134 | situation will merely be a guess, and may even be negative. | 132 | situation will merely be a guess, and may even be negative. |
135 | 133 | ||
136 | 134 | Using memory barriers with circular buffers | |
137 | =========================================== | ||
138 | USING MEMORY BARRIERS WITH CIRCULAR BUFFERS | ||
139 | =========================================== | 135 | =========================================== |
140 | 136 | ||
141 | By using memory barriers in conjunction with circular buffers, you can avoid | 137 | By using memory barriers in conjunction with circular buffers, you can avoid |
@@ -152,10 +148,10 @@ time, and only one thing should be emptying a buffer at any one time, but the | |||
152 | two sides can operate simultaneously. | 148 | two sides can operate simultaneously. |
153 | 149 | ||
154 | 150 | ||
155 | THE PRODUCER | 151 | The producer |
156 | ------------ | 152 | ------------ |
157 | 153 | ||
158 | The producer will look something like this: | 154 | The producer will look something like this:: |
159 | 155 | ||
160 | spin_lock(&producer_lock); | 156 | spin_lock(&producer_lock); |
161 | 157 | ||
@@ -193,10 +189,10 @@ ordering between the read of the index indicating that the consumer has | |||
193 | vacated a given element and the write by the producer to that same element. | 189 | vacated a given element and the write by the producer to that same element. |
194 | 190 | ||
195 | 191 | ||
196 | THE CONSUMER | 192 | The Consumer |
197 | ------------ | 193 | ------------ |
198 | 194 | ||
199 | The consumer will look something like this: | 195 | The consumer will look something like this:: |
200 | 196 | ||
201 | spin_lock(&consumer_lock); | 197 | spin_lock(&consumer_lock); |
202 | 198 | ||
@@ -235,8 +231,7 @@ prevents the compiler from tearing the store, and enforces ordering | |||
235 | against previous accesses. | 231 | against previous accesses. |
236 | 232 | ||
237 | 233 | ||
238 | =============== | 234 | Further reading |
239 | FURTHER READING | ||
240 | =============== | 235 | =============== |
241 | 236 | ||
242 | See also Documentation/memory-barriers.txt for a description of Linux's memory | 237 | See also Documentation/memory-barriers.txt for a description of Linux's memory |
diff --git a/Documentation/clk.txt b/Documentation/clk.txt index 22f026aa2f34..be909ed45970 100644 --- a/Documentation/clk.txt +++ b/Documentation/clk.txt | |||
@@ -1,12 +1,16 @@ | |||
1 | The Common Clk Framework | 1 | ======================== |
2 | Mike Turquette <mturquette@ti.com> | 2 | The Common Clk Framework |
3 | ======================== | ||
4 | |||
5 | :Author: Mike Turquette <mturquette@ti.com> | ||
3 | 6 | ||
4 | This document endeavours to explain the common clk framework details, | 7 | This document endeavours to explain the common clk framework details, |
5 | and how to port a platform over to this framework. It is not yet a | 8 | and how to port a platform over to this framework. It is not yet a |
6 | detailed explanation of the clock api in include/linux/clk.h, but | 9 | detailed explanation of the clock api in include/linux/clk.h, but |
7 | perhaps someday it will include that information. | 10 | perhaps someday it will include that information. |
8 | 11 | ||
9 | Part 1 - introduction and interface split | 12 | Introduction and interface split |
13 | ================================ | ||
10 | 14 | ||
11 | The common clk framework is an interface to control the clock nodes | 15 | The common clk framework is an interface to control the clock nodes |
12 | available on various devices today. This may come in the form of clock | 16 | available on various devices today. This may come in the form of clock |
@@ -35,10 +39,11 @@ is defined in struct clk_foo and pointed to within struct clk_core. This | |||
35 | allows for easy navigation between the two discrete halves of the common | 39 | allows for easy navigation between the two discrete halves of the common |
36 | clock interface. | 40 | clock interface. |
37 | 41 | ||
38 | Part 2 - common data structures and api | 42 | Common data structures and api |
43 | ============================== | ||
39 | 44 | ||
40 | Below is the common struct clk_core definition from | 45 | Below is the common struct clk_core definition from |
41 | drivers/clk/clk.c, modified for brevity: | 46 | drivers/clk/clk.c, modified for brevity:: |
42 | 47 | ||
43 | struct clk_core { | 48 | struct clk_core { |
44 | const char *name; | 49 | const char *name; |
@@ -59,7 +64,7 @@ struct clk. That api is documented in include/linux/clk.h. | |||
59 | 64 | ||
60 | Platforms and devices utilizing the common struct clk_core use the struct | 65 | Platforms and devices utilizing the common struct clk_core use the struct |
61 | clk_ops pointer in struct clk_core to perform the hardware-specific parts of | 66 | clk_ops pointer in struct clk_core to perform the hardware-specific parts of |
62 | the operations defined in clk-provider.h: | 67 | the operations defined in clk-provider.h:: |
63 | 68 | ||
64 | struct clk_ops { | 69 | struct clk_ops { |
65 | int (*prepare)(struct clk_hw *hw); | 70 | int (*prepare)(struct clk_hw *hw); |
@@ -95,19 +100,20 @@ the operations defined in clk-provider.h: | |||
95 | struct dentry *dentry); | 100 | struct dentry *dentry); |
96 | }; | 101 | }; |
97 | 102 | ||
98 | Part 3 - hardware clk implementations | 103 | Hardware clk implementations |
104 | ============================ | ||
99 | 105 | ||
100 | The strength of the common struct clk_core comes from its .ops and .hw pointers | 106 | The strength of the common struct clk_core comes from its .ops and .hw pointers |
101 | which abstract the details of struct clk from the hardware-specific bits, and | 107 | which abstract the details of struct clk from the hardware-specific bits, and |
102 | vice versa. To illustrate consider the simple gateable clk implementation in | 108 | vice versa. To illustrate consider the simple gateable clk implementation in |
103 | drivers/clk/clk-gate.c: | 109 | drivers/clk/clk-gate.c:: |
104 | 110 | ||
105 | struct clk_gate { | 111 | struct clk_gate { |
106 | struct clk_hw hw; | 112 | struct clk_hw hw; |
107 | void __iomem *reg; | 113 | void __iomem *reg; |
108 | u8 bit_idx; | 114 | u8 bit_idx; |
109 | ... | 115 | ... |
110 | }; | 116 | }; |
111 | 117 | ||
112 | struct clk_gate contains struct clk_hw hw as well as hardware-specific | 118 | struct clk_gate contains struct clk_hw hw as well as hardware-specific |
113 | knowledge about which register and bit controls this clk's gating. | 119 | knowledge about which register and bit controls this clk's gating. |
@@ -115,7 +121,7 @@ Nothing about clock topology or accounting, such as enable_count or | |||
115 | notifier_count, is needed here. That is all handled by the common | 121 | notifier_count, is needed here. That is all handled by the common |
116 | framework code and struct clk_core. | 122 | framework code and struct clk_core. |
117 | 123 | ||
118 | Let's walk through enabling this clk from driver code: | 124 | Let's walk through enabling this clk from driver code:: |
119 | 125 | ||
120 | struct clk *clk; | 126 | struct clk *clk; |
121 | clk = clk_get(NULL, "my_gateable_clk"); | 127 | clk = clk_get(NULL, "my_gateable_clk"); |
@@ -123,70 +129,71 @@ Let's walk through enabling this clk from driver code: | |||
123 | clk_prepare(clk); | 129 | clk_prepare(clk); |
124 | clk_enable(clk); | 130 | clk_enable(clk); |
125 | 131 | ||
126 | The call graph for clk_enable is very simple: | 132 | The call graph for clk_enable is very simple:: |
127 | 133 | ||
128 | clk_enable(clk); | 134 | clk_enable(clk); |
129 | clk->ops->enable(clk->hw); | 135 | clk->ops->enable(clk->hw); |
130 | [resolves to...] | 136 | [resolves to...] |
131 | clk_gate_enable(hw); | 137 | clk_gate_enable(hw); |
132 | [resolves struct clk gate with to_clk_gate(hw)] | 138 | [resolves struct clk gate with to_clk_gate(hw)] |
133 | clk_gate_set_bit(gate); | 139 | clk_gate_set_bit(gate); |
134 | 140 | ||
135 | And the definition of clk_gate_set_bit: | 141 | And the definition of clk_gate_set_bit:: |
136 | 142 | ||
137 | static void clk_gate_set_bit(struct clk_gate *gate) | 143 | static void clk_gate_set_bit(struct clk_gate *gate) |
138 | { | 144 | { |
139 | u32 reg; | 145 | u32 reg; |
140 | 146 | ||
141 | reg = __raw_readl(gate->reg); | 147 | reg = __raw_readl(gate->reg); |
142 | reg |= BIT(gate->bit_idx); | 148 | reg |= BIT(gate->bit_idx); |
143 | writel(reg, gate->reg); | 149 | writel(reg, gate->reg); |
144 | } | 150 | } |
145 | 151 | ||
146 | Note that to_clk_gate is defined as: | 152 | Note that to_clk_gate is defined as:: |
147 | 153 | ||
148 | #define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw) | 154 | #define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw) |
149 | 155 | ||
150 | This pattern of abstraction is used for every clock hardware | 156 | This pattern of abstraction is used for every clock hardware |
151 | representation. | 157 | representation. |
152 | 158 | ||
153 | Part 4 - supporting your own clk hardware | 159 | Supporting your own clk hardware |
160 | ================================ | ||
154 | 161 | ||
155 | When implementing support for a new type of clock it is only necessary to | 162 | When implementing support for a new type of clock it is only necessary to |
156 | include the following header: | 163 | include the following header:: |
157 | 164 | ||
158 | #include <linux/clk-provider.h> | 165 | #include <linux/clk-provider.h> |
159 | 166 | ||
160 | To construct a clk hardware structure for your platform you must define | 167 | To construct a clk hardware structure for your platform you must define |
161 | the following: | 168 | the following:: |
162 | 169 | ||
163 | struct clk_foo { | 170 | struct clk_foo { |
164 | struct clk_hw hw; | 171 | struct clk_hw hw; |
165 | ... hardware specific data goes here ... | 172 | ... hardware specific data goes here ... |
166 | }; | 173 | }; |
167 | 174 | ||
168 | To take advantage of your data you'll need to support valid operations | 175 | To take advantage of your data you'll need to support valid operations |
169 | for your clk: | 176 | for your clk:: |
170 | 177 | ||
171 | struct clk_ops clk_foo_ops = { | 178 | struct clk_ops clk_foo_ops = { |
172 | .enable = &clk_foo_enable, | 179 | .enable = &clk_foo_enable, |
173 | .disable = &clk_foo_disable, | 180 | .disable = &clk_foo_disable, |
174 | }; | 181 | }; |
175 | 182 | ||
176 | Implement the above functions using container_of: | 183 | Implement the above functions using container_of:: |
177 | 184 | ||
178 | #define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw) | 185 | #define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw) |
179 | 186 | ||
180 | int clk_foo_enable(struct clk_hw *hw) | 187 | int clk_foo_enable(struct clk_hw *hw) |
181 | { | 188 | { |
182 | struct clk_foo *foo; | 189 | struct clk_foo *foo; |
183 | 190 | ||
184 | foo = to_clk_foo(hw); | 191 | foo = to_clk_foo(hw); |
185 | 192 | ||
186 | ... perform magic on foo ... | 193 | ... perform magic on foo ... |
187 | 194 | ||
188 | return 0; | 195 | return 0; |
189 | }; | 196 | }; |
190 | 197 | ||
191 | Below is a matrix detailing which clk_ops are mandatory based upon the | 198 | Below is a matrix detailing which clk_ops are mandatory based upon the |
192 | hardware capabilities of that clock. A cell marked as "y" means | 199 | hardware capabilities of that clock. A cell marked as "y" means |
@@ -194,41 +201,56 @@ mandatory, a cell marked as "n" implies that either including that | |||
194 | callback is invalid or otherwise unnecessary. Empty cells are either | 201 | callback is invalid or otherwise unnecessary. Empty cells are either |
195 | optional or must be evaluated on a case-by-case basis. | 202 | optional or must be evaluated on a case-by-case basis. |
196 | 203 | ||
197 | clock hardware characteristics | 204 | .. table:: clock hardware characteristics |
198 | ----------------------------------------------------------- | 205 | |
199 | | gate | change rate | single parent | multiplexer | root | | 206 | +----------------+------+-------------+---------------+-------------+------+ |
200 | |------|-------------|---------------|-------------|------| | 207 | | | gate | change rate | single parent | multiplexer | root | |
201 | .prepare | | | | | | | 208 | +================+======+=============+===============+=============+======+ |
202 | .unprepare | | | | | | | 209 | |.prepare | | | | | | |
203 | | | | | | | | 210 | +----------------+------+-------------+---------------+-------------+------+ |
204 | .enable | y | | | | | | 211 | |.unprepare | | | | | | |
205 | .disable | y | | | | | | 212 | +----------------+------+-------------+---------------+-------------+------+ |
206 | .is_enabled | y | | | | | | 213 | +----------------+------+-------------+---------------+-------------+------+ |
207 | | | | | | | | 214 | |.enable | y | | | | | |
208 | .recalc_rate | | y | | | | | 215 | +----------------+------+-------------+---------------+-------------+------+ |
209 | .round_rate | | y [1] | | | | | 216 | |.disable | y | | | | | |
210 | .determine_rate | | y [1] | | | | | 217 | +----------------+------+-------------+---------------+-------------+------+ |
211 | .set_rate | | y | | | | | 218 | |.is_enabled | y | | | | | |
212 | | | | | | | | 219 | +----------------+------+-------------+---------------+-------------+------+ |
213 | .set_parent | | | n | y | n | | 220 | +----------------+------+-------------+---------------+-------------+------+ |
214 | .get_parent | | | n | y | n | | 221 | |.recalc_rate | | y | | | | |
215 | | | | | | | | 222 | +----------------+------+-------------+---------------+-------------+------+ |
216 | .recalc_accuracy| | | | | | | 223 | |.round_rate | | y [1]_ | | | | |
217 | | | | | | | | 224 | +----------------+------+-------------+---------------+-------------+------+ |
218 | .init | | | | | | | 225 | |.determine_rate | | y [1]_ | | | | |
219 | ----------------------------------------------------------- | 226 | +----------------+------+-------------+---------------+-------------+------+ |
220 | [1] either one of round_rate or determine_rate is required. | 227 | |.set_rate | | y | | | | |
228 | +----------------+------+-------------+---------------+-------------+------+ | ||
229 | +----------------+------+-------------+---------------+-------------+------+ | ||
230 | |.set_parent | | | n | y | n | | ||
231 | +----------------+------+-------------+---------------+-------------+------+ | ||
232 | |.get_parent | | | n | y | n | | ||
233 | +----------------+------+-------------+---------------+-------------+------+ | ||
234 | +----------------+------+-------------+---------------+-------------+------+ | ||
235 | |.recalc_accuracy| | | | | | | ||
236 | +----------------+------+-------------+---------------+-------------+------+ | ||
237 | +----------------+------+-------------+---------------+-------------+------+ | ||
238 | |.init | | | | | | | ||
239 | +----------------+------+-------------+---------------+-------------+------+ | ||
240 | |||
241 | .. [1] either one of round_rate or determine_rate is required. | ||
221 | 242 | ||
222 | Finally, register your clock at run-time with a hardware-specific | 243 | Finally, register your clock at run-time with a hardware-specific |
223 | registration function. This function simply populates struct clk_foo's | 244 | registration function. This function simply populates struct clk_foo's |
224 | data and then passes the common struct clk parameters to the framework | 245 | data and then passes the common struct clk parameters to the framework |
225 | with a call to: | 246 | with a call to:: |
226 | 247 | ||
227 | clk_register(...) | 248 | clk_register(...) |
228 | 249 | ||
229 | See the basic clock types in drivers/clk/clk-*.c for examples. | 250 | See the basic clock types in ``drivers/clk/clk-*.c`` for examples. |
230 | 251 | ||
231 | Part 5 - Disabling clock gating of unused clocks | 252 | Disabling clock gating of unused clocks |
253 | ======================================= | ||
232 | 254 | ||
233 | Sometimes during development it can be useful to be able to bypass the | 255 | Sometimes during development it can be useful to be able to bypass the |
234 | default disabling of unused clocks. For example, if drivers aren't enabling | 256 | default disabling of unused clocks. For example, if drivers aren't enabling |
@@ -239,7 +261,8 @@ are sorted out. | |||
239 | To bypass this disabling, include "clk_ignore_unused" in the bootargs to the | 261 | To bypass this disabling, include "clk_ignore_unused" in the bootargs to the |
240 | kernel. | 262 | kernel. |
241 | 263 | ||
242 | Part 6 - Locking | 264 | Locking |
265 | ======= | ||
243 | 266 | ||
244 | The common clock framework uses two global locks, the prepare lock and the | 267 | The common clock framework uses two global locks, the prepare lock and the |
245 | enable lock. | 268 | enable lock. |
diff --git a/Documentation/cpu-load.txt b/Documentation/cpu-load.txt index 287224e57cfc..2d01ce43d2a2 100644 --- a/Documentation/cpu-load.txt +++ b/Documentation/cpu-load.txt | |||
@@ -1,9 +1,10 @@ | |||
1 | ======== | ||
1 | CPU load | 2 | CPU load |
2 | -------- | 3 | ======== |
3 | 4 | ||
4 | Linux exports various bits of information via `/proc/stat' and | 5 | Linux exports various bits of information via ``/proc/stat`` and |
5 | `/proc/uptime' that userland tools, such as top(1), use to calculate | 6 | ``/proc/uptime`` that userland tools, such as top(1), use to calculate |
6 | the average time the system spent in a particular state, for example: | 7 | the average time the system spent in a particular state, for example:: |
7 | 8 | ||
8 | $ iostat | 9 | $ iostat |
9 | Linux 2.6.18.3-exp (linmac) 02/20/2007 | 10 | Linux 2.6.18.3-exp (linmac) 02/20/2007 |
@@ -17,7 +18,7 @@ Here the system thinks that over the default sampling period the | |||
17 | system spent 10.01% of the time doing work in user space, 2.92% in the | 18 | system spent 10.01% of the time doing work in user space, 2.92% in the |
18 | kernel, and was overall 81.63% of the time idle. | 19 | kernel, and was overall 81.63% of the time idle. |
19 | 20 | ||
20 | In most cases the `/proc/stat' information reflects the reality quite | 21 | In most cases the ``/proc/stat`` information reflects the reality quite |
21 | closely; however, due to the nature of how/when the kernel collects | 22 | closely; however, due to the nature of how/when the kernel collects |
22 | this data, sometimes it cannot be trusted at all. | 23 | this data, sometimes it cannot be trusted at all. |
23 | 24 | ||
@@ -33,78 +34,78 @@ Example | |||
33 | ------- | 34 | ------- |
34 | 35 | ||
35 | If we imagine the system with one task that periodically burns cycles | 36 | If we imagine the system with one task that periodically burns cycles |
36 | in the following manner: | 37 | in the following manner:: |
37 | 38 | ||
38 | time line between two timer interrupts | 39 | time line between two timer interrupts |
39 | |--------------------------------------| | 40 | |--------------------------------------| |
40 | ^ ^ | 41 | ^ ^ |
41 | |_ something begins working | | 42 | |_ something begins working | |
42 | |_ something goes to sleep | 43 | |_ something goes to sleep |
43 | (only to be awakened quite soon) | 44 | (only to be awakened quite soon) |
44 | 45 | ||
45 | In the above situation the system will be 0% loaded according to the | 46 | In the above situation the system will be 0% loaded according to the |
46 | `/proc/stat' (since the timer interrupt will always happen when the | 47 | ``/proc/stat`` (since the timer interrupt will always happen when the |
47 | system is executing the idle handler), but in reality the load is | 48 | system is executing the idle handler), but in reality the load is |
48 | closer to 99%. | 49 | closer to 99%. |
49 | 50 | ||
50 | One can imagine many more situations where this behavior of the kernel | 51 | One can imagine many more situations where this behavior of the kernel |
51 | will lead to quite erratic information inside `/proc/stat'. | 52 | will lead to quite erratic information inside ``/proc/stat``:: |
52 | 53 | ||
53 | 54 | ||
54 | /* gcc -o hog smallhog.c */ | 55 | /* gcc -o hog smallhog.c */ |
55 | #include <time.h> | 56 | #include <time.h> |
56 | #include <limits.h> | 57 | #include <limits.h> |
57 | #include <signal.h> | 58 | #include <signal.h> |
58 | #include <sys/time.h> | 59 | #include <sys/time.h> |
59 | #define HIST 10 | 60 | #define HIST 10 |
60 | 61 | ||
61 | static volatile sig_atomic_t stop; | 62 | static volatile sig_atomic_t stop; |
62 | 63 | ||
63 | static void sighandler (int signr) | 64 | static void sighandler (int signr) |
64 | { | 65 | { |
65 | (void) signr; | 66 | (void) signr; |
66 | stop = 1; | 67 | stop = 1; |
67 | } | 68 | } |
68 | static unsigned long hog (unsigned long niters) | 69 | static unsigned long hog (unsigned long niters) |
69 | { | 70 | { |
70 | stop = 0; | 71 | stop = 0; |
71 | while (!stop && --niters); | 72 | while (!stop && --niters); |
72 | return niters; | 73 | return niters; |
73 | } | 74 | } |
74 | int main (void) | 75 | int main (void) |
75 | { | 76 | { |
76 | int i; | 77 | int i; |
77 | struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 }, | 78 | struct itimerval it = { .it_interval = { .tv_sec = 0, .tv_usec = 1 }, |
78 | .it_value = { .tv_sec = 0, .tv_usec = 1 } }; | 79 | .it_value = { .tv_sec = 0, .tv_usec = 1 } }; |
79 | sigset_t set; | 80 | sigset_t set; |
80 | unsigned long v[HIST]; | 81 | unsigned long v[HIST]; |
81 | double tmp = 0.0; | 82 | double tmp = 0.0; |
82 | unsigned long n; | 83 | unsigned long n; |
83 | signal (SIGALRM, &sighandler); | 84 | signal (SIGALRM, &sighandler); |
84 | setitimer (ITIMER_REAL, &it, NULL); | 85 | setitimer (ITIMER_REAL, &it, NULL); |
85 | 86 | ||
86 | hog (ULONG_MAX); | 87 | hog (ULONG_MAX); |
87 | for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX); | 88 | for (i = 0; i < HIST; ++i) v[i] = ULONG_MAX - hog (ULONG_MAX); |
88 | for (i = 0; i < HIST; ++i) tmp += v[i]; | 89 | for (i = 0; i < HIST; ++i) tmp += v[i]; |
89 | tmp /= HIST; | 90 | tmp /= HIST; |
90 | n = tmp - (tmp / 3.0); | 91 | n = tmp - (tmp / 3.0); |
91 | 92 | ||
92 | sigemptyset (&set); | 93 | sigemptyset (&set); |
93 | sigaddset (&set, SIGALRM); | 94 | sigaddset (&set, SIGALRM); |
94 | 95 | ||
95 | for (;;) { | 96 | for (;;) { |
96 | hog (n); | 97 | hog (n); |
97 | sigwait (&set, &i); | 98 | sigwait (&set, &i); |
98 | } | 99 | } |
99 | return 0; | 100 | return 0; |
100 | } | 101 | } |
101 | 102 | ||
102 | 103 | ||
103 | References | 104 | References |
104 | ---------- | 105 | ---------- |
105 | 106 | ||
106 | http://lkml.org/lkml/2007/2/12/6 | 107 | - http://lkml.org/lkml/2007/2/12/6 |
107 | Documentation/filesystems/proc.txt (1.8) | 108 | - Documentation/filesystems/proc.txt (1.8) |
108 | 109 | ||
109 | 110 | ||
110 | Thanks | 111 | Thanks |
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt index 127c9d8c2174..c6e7e9196a8b 100644 --- a/Documentation/cputopology.txt +++ b/Documentation/cputopology.txt | |||
@@ -1,3 +1,6 @@ | |||
1 | =========================================== | ||
2 | How CPU topology info is exported via sysfs | ||
3 | =========================================== | ||
1 | 4 | ||
2 | Export CPU topology info via sysfs. Items (attributes) are similar | 5 | Export CPU topology info via sysfs. Items (attributes) are similar |
3 | to /proc/cpuinfo output of some architectures: | 6 | to /proc/cpuinfo output of some architectures: |
@@ -75,24 +78,26 @@ CONFIG_SCHED_BOOK and CONFIG_DRAWER are currently only used on s390, where | |||
75 | they reflect the cpu and cache hierarchy. | 78 | they reflect the cpu and cache hierarchy. |
76 | 79 | ||
77 | For an architecture to support this feature, it must define some of | 80 | For an architecture to support this feature, it must define some of |
78 | these macros in include/asm-XXX/topology.h: | 81 | these macros in include/asm-XXX/topology.h:: |
79 | #define topology_physical_package_id(cpu) | 82 | |
80 | #define topology_core_id(cpu) | 83 | #define topology_physical_package_id(cpu) |
81 | #define topology_book_id(cpu) | 84 | #define topology_core_id(cpu) |
82 | #define topology_drawer_id(cpu) | 85 | #define topology_book_id(cpu) |
83 | #define topology_sibling_cpumask(cpu) | 86 | #define topology_drawer_id(cpu) |
84 | #define topology_core_cpumask(cpu) | 87 | #define topology_sibling_cpumask(cpu) |
85 | #define topology_book_cpumask(cpu) | 88 | #define topology_core_cpumask(cpu) |
86 | #define topology_drawer_cpumask(cpu) | 89 | #define topology_book_cpumask(cpu) |
87 | 90 | #define topology_drawer_cpumask(cpu) | |
88 | The type of **_id macros is int. | 91 | |
89 | The type of **_cpumask macros is (const) struct cpumask *. The latter | 92 | The type of ``**_id macros`` is int. |
90 | correspond with appropriate **_siblings sysfs attributes (except for | 93 | The type of ``**_cpumask macros`` is ``(const) struct cpumask *``. The latter |
94 | correspond with appropriate ``**_siblings`` sysfs attributes (except for | ||
91 | topology_sibling_cpumask() which corresponds with thread_siblings). | 95 | topology_sibling_cpumask() which corresponds with thread_siblings). |
92 | 96 | ||
93 | To be consistent on all architectures, include/linux/topology.h | 97 | To be consistent on all architectures, include/linux/topology.h |
94 | provides default definitions for any of the above macros that are | 98 | provides default definitions for any of the above macros that are |
95 | not defined by include/asm-XXX/topology.h: | 99 | not defined by include/asm-XXX/topology.h: |
100 | |||
96 | 1) physical_package_id: -1 | 101 | 1) physical_package_id: -1 |
97 | 2) core_id: 0 | 102 | 2) core_id: 0 |
98 | 3) sibling_cpumask: just the given CPU | 103 | 3) sibling_cpumask: just the given CPU |
@@ -107,6 +112,7 @@ Additionally, CPU topology information is provided under | |||
107 | /sys/devices/system/cpu and includes these files. The internal | 112 | /sys/devices/system/cpu and includes these files. The internal |
108 | source for the output is in brackets ("[]"). | 113 | source for the output is in brackets ("[]"). |
109 | 114 | ||
115 | =========== ========================================================== | ||
110 | kernel_max: the maximum CPU index allowed by the kernel configuration. | 116 | kernel_max: the maximum CPU index allowed by the kernel configuration. |
111 | [NR_CPUS-1] | 117 | [NR_CPUS-1] |
112 | 118 | ||
@@ -122,6 +128,7 @@ source for the output is in brackets ("[]"). | |||
122 | 128 | ||
123 | present: CPUs that have been identified as being present in the | 129 | present: CPUs that have been identified as being present in the |
124 | system. [cpu_present_mask] | 130 | system. [cpu_present_mask] |
131 | =========== ========================================================== | ||
125 | 132 | ||
126 | The format for the above output is compatible with cpulist_parse() | 133 | The format for the above output is compatible with cpulist_parse() |
127 | [see <linux/cpumask.h>]. Some examples follow. | 134 | [see <linux/cpumask.h>]. Some examples follow. |
@@ -129,7 +136,7 @@ The format for the above output is compatible with cpulist_parse() | |||
129 | In this example, there are 64 CPUs in the system but cpus 32-63 exceed | 136 | In this example, there are 64 CPUs in the system but cpus 32-63 exceed |
130 | the kernel max which is limited to 0..31 by the NR_CPUS config option | 137 | the kernel max which is limited to 0..31 by the NR_CPUS config option |
131 | being 32. Note also that CPUs 2 and 4-31 are not online but could be | 138 | being 32. Note also that CPUs 2 and 4-31 are not online but could be |
132 | brought online as they are both present and possible. | 139 | brought online as they are both present and possible:: |
133 | 140 | ||
134 | kernel_max: 31 | 141 | kernel_max: 31 |
135 | offline: 2,4-31,32-63 | 142 | offline: 2,4-31,32-63 |
@@ -140,7 +147,7 @@ brought online as they are both present and possible. | |||
140 | In this example, the NR_CPUS config option is 128, but the kernel was | 147 | In this example, the NR_CPUS config option is 128, but the kernel was |
141 | started with possible_cpus=144. There are 4 CPUs in the system and cpu2 | 148 | started with possible_cpus=144. There are 4 CPUs in the system and cpu2 |
142 | was manually taken offline (and is the only CPU that can be brought | 149 | was manually taken offline (and is the only CPU that can be brought |
143 | online.) | 150 | online.):: |
144 | 151 | ||
145 | kernel_max: 127 | 152 | kernel_max: 127 |
146 | offline: 2,4-127,128-143 | 153 | offline: 2,4-127,128-143 |
diff --git a/Documentation/crc32.txt b/Documentation/crc32.txt index a08a7dd9d625..8a6860f33b4e 100644 --- a/Documentation/crc32.txt +++ b/Documentation/crc32.txt | |||
@@ -1,4 +1,6 @@ | |||
1 | A brief CRC tutorial. | 1 | ================================= |
2 | brief tutorial on CRC computation | ||
3 | ================================= | ||
2 | 4 | ||
3 | A CRC is a long-division remainder. You add the CRC to the message, | 5 | A CRC is a long-division remainder. You add the CRC to the message, |
4 | and the whole thing (message+CRC) is a multiple of the given | 6 | and the whole thing (message+CRC) is a multiple of the given |
@@ -8,7 +10,8 @@ remainder computed on the message+CRC is 0. This latter approach | |||
8 | is used by a lot of hardware implementations, and is why so many | 10 | is used by a lot of hardware implementations, and is why so many |
9 | protocols put the end-of-frame flag after the CRC. | 11 | protocols put the end-of-frame flag after the CRC. |
10 | 12 | ||
11 | It's actually the same long division you learned in school, except that | 13 | It's actually the same long division you learned in school, except that: |
14 | |||
12 | - We're working in binary, so the digits are only 0 and 1, and | 15 | - We're working in binary, so the digits are only 0 and 1, and |
13 | - When dividing polynomials, there are no carries. Rather than add and | 16 | - When dividing polynomials, there are no carries. Rather than add and |
14 | subtract, we just xor. Thus, we tend to get a bit sloppy about | 17 | subtract, we just xor. Thus, we tend to get a bit sloppy about |
@@ -40,11 +43,12 @@ throw the quotient bit away, but subtract the appropriate multiple of | |||
40 | the polynomial from the remainder and we're back to where we started, | 43 | the polynomial from the remainder and we're back to where we started, |
41 | ready to process the next bit. | 44 | ready to process the next bit. |
42 | 45 | ||
43 | A big-endian CRC written this way would be coded like: | 46 | A big-endian CRC written this way would be coded like:: |
44 | for (i = 0; i < input_bits; i++) { | 47 | |
45 | multiple = remainder & 0x80000000 ? CRCPOLY : 0; | 48 | for (i = 0; i < input_bits; i++) { |
46 | remainder = (remainder << 1 | next_input_bit()) ^ multiple; | 49 | multiple = remainder & 0x80000000 ? CRCPOLY : 0; |
47 | } | 50 | remainder = (remainder << 1 | next_input_bit()) ^ multiple; |
51 | } | ||
48 | 52 | ||
49 | Notice how, to get at bit 32 of the shifted remainder, we look | 53 | Notice how, to get at bit 32 of the shifted remainder, we look |
50 | at bit 31 of the remainder *before* shifting it. | 54 | at bit 31 of the remainder *before* shifting it. |
@@ -54,25 +58,26 @@ the remainder don't actually affect any decision-making until | |||
54 | 32 bits later. Thus, the first 32 cycles of this are pretty boring. | 58 | 32 bits later. Thus, the first 32 cycles of this are pretty boring. |
55 | Also, to add the CRC to a message, we need a 32-bit-long hole for it at | 59 | Also, to add the CRC to a message, we need a 32-bit-long hole for it at |
56 | the end, so we have to add 32 extra cycles shifting in zeros at the | 60 | the end, so we have to add 32 extra cycles shifting in zeros at the |
57 | end of every message, | 61 | end of every message. |
58 | 62 | ||
59 | These details lead to a standard trick: rearrange merging in the | 63 | These details lead to a standard trick: rearrange merging in the |
60 | next_input_bit() until the moment it's needed. Then the first 32 cycles | 64 | next_input_bit() until the moment it's needed. Then the first 32 cycles |
61 | can be precomputed, and merging in the final 32 zero bits to make room | 65 | can be precomputed, and merging in the final 32 zero bits to make room |
62 | for the CRC can be skipped entirely. This changes the code to: | 66 | for the CRC can be skipped entirely. This changes the code to:: |
63 | 67 | ||
64 | for (i = 0; i < input_bits; i++) { | 68 | for (i = 0; i < input_bits; i++) { |
65 | remainder ^= next_input_bit() << 31; | 69 | remainder ^= next_input_bit() << 31; |
66 | multiple = (remainder & 0x80000000) ? CRCPOLY : 0; | 70 | multiple = (remainder & 0x80000000) ? CRCPOLY : 0; |
67 | remainder = (remainder << 1) ^ multiple; | 71 | remainder = (remainder << 1) ^ multiple; |
68 | } | 72 | } |
69 | 73 | ||
70 | With this optimization, the little-endian code is particularly simple: | 74 | With this optimization, the little-endian code is particularly simple:: |
71 | for (i = 0; i < input_bits; i++) { | 75 | |
72 | remainder ^= next_input_bit(); | 76 | for (i = 0; i < input_bits; i++) { |
73 | multiple = (remainder & 1) ? CRCPOLY : 0; | 77 | remainder ^= next_input_bit(); |
74 | remainder = (remainder >> 1) ^ multiple; | 78 | multiple = (remainder & 1) ? CRCPOLY : 0; |
75 | } | 79 | remainder = (remainder >> 1) ^ multiple; |
80 | } | ||
76 | 81 | ||
77 | The most significant coefficient of the remainder polynomial is stored | 82 | The most significant coefficient of the remainder polynomial is stored |
78 | in the least significant bit of the binary "remainder" variable. | 83 | in the least significant bit of the binary "remainder" variable. |
@@ -81,23 +86,25 @@ be bit-reversed) and next_input_bit(). | |||
81 | 86 | ||
82 | As long as next_input_bit is returning the bits in a sensible order, we don't | 87 | As long as next_input_bit is returning the bits in a sensible order, we don't |
83 | *have* to wait until the last possible moment to merge in additional bits. | 88 | *have* to wait until the last possible moment to merge in additional bits. |
84 | We can do it 8 bits at a time rather than 1 bit at a time: | 89 | We can do it 8 bits at a time rather than 1 bit at a time:: |
85 | for (i = 0; i < input_bytes; i++) { | 90 | |
86 | remainder ^= next_input_byte() << 24; | 91 | for (i = 0; i < input_bytes; i++) { |
87 | for (j = 0; j < 8; j++) { | 92 | remainder ^= next_input_byte() << 24; |
88 | multiple = (remainder & 0x80000000) ? CRCPOLY : 0; | 93 | for (j = 0; j < 8; j++) { |
89 | remainder = (remainder << 1) ^ multiple; | 94 | multiple = (remainder & 0x80000000) ? CRCPOLY : 0; |
95 | remainder = (remainder << 1) ^ multiple; | ||
96 | } | ||
90 | } | 97 | } |
91 | } | ||
92 | 98 | ||
93 | Or in little-endian: | 99 | Or in little-endian:: |
94 | for (i = 0; i < input_bytes; i++) { | 100 | |
95 | remainder ^= next_input_byte(); | 101 | for (i = 0; i < input_bytes; i++) { |
96 | for (j = 0; j < 8; j++) { | 102 | remainder ^= next_input_byte(); |
97 | multiple = (remainder & 1) ? CRCPOLY : 0; | 103 | for (j = 0; j < 8; j++) { |
98 | remainder = (remainder >> 1) ^ multiple; | 104 | multiple = (remainder & 1) ? CRCPOLY : 0; |
105 | remainder = (remainder >> 1) ^ multiple; | ||
106 | } | ||
99 | } | 107 | } |
100 | } | ||
101 | 108 | ||
102 | If the input is a multiple of 32 bits, you can even XOR in a 32-bit | 109 | If the input is a multiple of 32 bits, you can even XOR in a 32-bit |
103 | word at a time and increase the inner loop count to 32. | 110 | word at a time and increase the inner loop count to 32. |
diff --git a/Documentation/dcdbas.txt b/Documentation/dcdbas.txt index e1c52e2dc361..309cc57a7c1c 100644 --- a/Documentation/dcdbas.txt +++ b/Documentation/dcdbas.txt | |||
@@ -1,4 +1,9 @@ | |||
1 | =================================== | ||
2 | Dell Systems Management Base Driver | ||
3 | =================================== | ||
4 | |||
1 | Overview | 5 | Overview |
6 | ======== | ||
2 | 7 | ||
3 | The Dell Systems Management Base Driver provides a sysfs interface for | 8 | The Dell Systems Management Base Driver provides a sysfs interface for |
4 | systems management software such as Dell OpenManage to perform system | 9 | systems management software such as Dell OpenManage to perform system |
@@ -17,6 +22,7 @@ more information about the libsmbios project. | |||
17 | 22 | ||
18 | 23 | ||
19 | System Management Interrupt | 24 | System Management Interrupt |
25 | =========================== | ||
20 | 26 | ||
21 | On some Dell systems, systems management software must access certain | 27 | On some Dell systems, systems management software must access certain |
22 | management information via a system management interrupt (SMI). The SMI data | 28 | management information via a system management interrupt (SMI). The SMI data |
@@ -24,12 +30,12 @@ buffer must reside in 32-bit address space, and the physical address of the | |||
24 | buffer is required for the SMI. The driver maintains the memory required for | 30 | buffer is required for the SMI. The driver maintains the memory required for |
25 | the SMI and provides a way for the application to generate the SMI. | 31 | the SMI and provides a way for the application to generate the SMI. |
26 | The driver creates the following sysfs entries for systems management | 32 | The driver creates the following sysfs entries for systems management |
27 | software to perform these system management interrupts: | 33 | software to perform these system management interrupts:: |
28 | 34 | ||
29 | /sys/devices/platform/dcdbas/smi_data | 35 | /sys/devices/platform/dcdbas/smi_data |
30 | /sys/devices/platform/dcdbas/smi_data_buf_phys_addr | 36 | /sys/devices/platform/dcdbas/smi_data_buf_phys_addr |
31 | /sys/devices/platform/dcdbas/smi_data_buf_size | 37 | /sys/devices/platform/dcdbas/smi_data_buf_size |
32 | /sys/devices/platform/dcdbas/smi_request | 38 | /sys/devices/platform/dcdbas/smi_request |
33 | 39 | ||
34 | Systems management software must perform the following steps to execute | 40 | Systems management software must perform the following steps to execute |
35 | a SMI using this driver: | 41 | a SMI using this driver: |
@@ -43,6 +49,7 @@ a SMI using this driver: | |||
43 | 49 | ||
44 | 50 | ||
45 | Host Control Action | 51 | Host Control Action |
52 | =================== | ||
46 | 53 | ||
47 | Dell OpenManage supports a host control feature that allows the administrator | 54 | Dell OpenManage supports a host control feature that allows the administrator |
48 | to perform a power cycle or power off of the system after the OS has finished | 55 | to perform a power cycle or power off of the system after the OS has finished |
@@ -69,12 +76,14 @@ power off host control action using this driver: | |||
69 | 76 | ||
70 | 77 | ||
71 | Host Control SMI Type | 78 | Host Control SMI Type |
79 | ===================== | ||
72 | 80 | ||
73 | The following table shows the value to write to host_control_smi_type to | 81 | The following table shows the value to write to host_control_smi_type to |
74 | perform a power cycle or power off host control action: | 82 | perform a power cycle or power off host control action: |
75 | 83 | ||
84 | =================== ===================== | ||
76 | PowerEdge System Host Control SMI Type | 85 | PowerEdge System Host Control SMI Type |
77 | ---------------- --------------------- | 86 | =================== ===================== |
78 | 300 HC_SMITYPE_TYPE1 | 87 | 300 HC_SMITYPE_TYPE1 |
79 | 1300 HC_SMITYPE_TYPE1 | 88 | 1300 HC_SMITYPE_TYPE1 |
80 | 1400 HC_SMITYPE_TYPE2 | 89 | 1400 HC_SMITYPE_TYPE2 |
@@ -87,5 +96,4 @@ PowerEdge System Host Control SMI Type | |||
87 | 1655MC HC_SMITYPE_TYPE2 | 96 | 1655MC HC_SMITYPE_TYPE2 |
88 | 700 HC_SMITYPE_TYPE3 | 97 | 700 HC_SMITYPE_TYPE3 |
89 | 750 HC_SMITYPE_TYPE3 | 98 | 750 HC_SMITYPE_TYPE3 |
90 | 99 | =================== ===================== | |
91 | |||
diff --git a/Documentation/debugging-via-ohci1394.txt b/Documentation/debugging-via-ohci1394.txt index 9ff026d22b75..981ad4f89fd3 100644 --- a/Documentation/debugging-via-ohci1394.txt +++ b/Documentation/debugging-via-ohci1394.txt | |||
@@ -1,6 +1,6 @@ | |||
1 | 1 | =========================================================================== | |
2 | Using physical DMA provided by OHCI-1394 FireWire controllers for debugging | 2 | Using physical DMA provided by OHCI-1394 FireWire controllers for debugging |
3 | --------------------------------------------------------------------------- | 3 | =========================================================================== |
4 | 4 | ||
5 | Introduction | 5 | Introduction |
6 | ------------ | 6 | ------------ |
@@ -91,10 +91,10 @@ Step-by-step instructions for using firescope with early OHCI initialization: | |||
91 | 1) Verify that your hardware is supported: | 91 | 1) Verify that your hardware is supported: |
92 | 92 | ||
93 | Load the firewire-ohci module and check your kernel logs. | 93 | Load the firewire-ohci module and check your kernel logs. |
94 | You should see a line similar to | 94 | You should see a line similar to:: |
95 | 95 | ||
96 | firewire_ohci 0000:15:00.1: added OHCI v1.0 device as card 2, 4 IR + 4 IT | 96 | firewire_ohci 0000:15:00.1: added OHCI v1.0 device as card 2, 4 IR + 4 IT |
97 | ... contexts, quirks 0x11 | 97 | ... contexts, quirks 0x11 |
98 | 98 | ||
99 | when loading the driver. If you have no supported controller, many PCI, | 99 | when loading the driver. If you have no supported controller, many PCI, |
100 | CardBus and even some Express cards which are fully compliant with OHCI-1394 | 100 | CardBus and even some Express cards which are fully compliant with OHCI-1394 |
@@ -113,9 +113,9 @@ Step-by-step instructions for using firescope with early OHCI initialization: | |||
113 | stable connection and has matching connectors (there are small 4-pin and | 113 | stable connection and has matching connectors (there are small 4-pin and |
114 | large 6-pin FireWire ports) will do. | 114 | large 6-pin FireWire ports) will do. |
115 | 115 | ||
116 | If a driver is running on both machines you should see a line like | 116 | If a driver is running on both machines you should see a line like:: |
117 | 117 | ||
118 | firewire_core 0000:15:00.1: created device fw1: GUID 00061b0020105917, S400 | 118 | firewire_core 0000:15:00.1: created device fw1: GUID 00061b0020105917, S400 |
119 | 119 | ||
120 | on both machines in the kernel log when the cable is plugged in | 120 | on both machines in the kernel log when the cable is plugged in |
121 | and connects the two machines. | 121 | and connects the two machines. |
@@ -123,7 +123,7 @@ Step-by-step instructions for using firescope with early OHCI initialization: | |||
123 | 3) Test physical DMA using firescope: | 123 | 3) Test physical DMA using firescope: |
124 | 124 | ||
125 | On the debug host, make sure that /dev/fw* is accessible, | 125 | On the debug host, make sure that /dev/fw* is accessible, |
126 | then start firescope: | 126 | then start firescope:: |
127 | 127 | ||
128 | $ firescope | 128 | $ firescope |
129 | Port 0 (/dev/fw1) opened, 2 nodes detected | 129 | Port 0 (/dev/fw1) opened, 2 nodes detected |
@@ -163,7 +163,7 @@ Step-by-step instructions for using firescope with early OHCI initialization: | |||
163 | host loaded, reboot the debugged machine, booting the kernel which has | 163 | host loaded, reboot the debugged machine, booting the kernel which has |
164 | CONFIG_PROVIDE_OHCI1394_DMA_INIT enabled, with the option ohci1394_dma=early. | 164 | CONFIG_PROVIDE_OHCI1394_DMA_INIT enabled, with the option ohci1394_dma=early. |
165 | 165 | ||
166 | Then, on the debugging host, run firescope, for example by using -A: | 166 | Then, on the debugging host, run firescope, for example by using -A:: |
167 | 167 | ||
168 | firescope -A System.map-of-debug-target-kernel | 168 | firescope -A System.map-of-debug-target-kernel |
169 | 169 | ||
@@ -178,6 +178,7 @@ Step-by-step instructions for using firescope with early OHCI initialization: | |||
178 | 178 | ||
179 | Notes | 179 | Notes |
180 | ----- | 180 | ----- |
181 | |||
181 | Documentation and specifications: http://halobates.de/firewire/ | 182 | Documentation and specifications: http://halobates.de/firewire/ |
182 | 183 | ||
183 | FireWire is a trademark of Apple Inc. - for more information please refer to: | 184 | FireWire is a trademark of Apple Inc. - for more information please refer to: |
diff --git a/Documentation/dell_rbu.txt b/Documentation/dell_rbu.txt index d262e22bddec..0fdb6aa2704c 100644 --- a/Documentation/dell_rbu.txt +++ b/Documentation/dell_rbu.txt | |||
@@ -1,18 +1,30 @@ | |||
1 | Purpose: | 1 | ============================================================= |
2 | Demonstrate the usage of the new open sourced rbu (Remote BIOS Update) driver | 2 | Usage of the new open sourced rbu (Remote BIOS Update) driver |
3 | ============================================================= | ||
4 | |||
5 | Purpose | ||
6 | ======= | ||
7 | |||
8 | Document demonstrating the use of the Dell Remote BIOS Update driver | ||
3 | for updating BIOS images on Dell servers and desktops. | 9 | for updating BIOS images on Dell servers and desktops. |
4 | 10 | ||
5 | Scope: | 11 | Scope |
12 | ===== | ||
13 | |||
6 | This document discusses the functionality of the rbu driver only. | 14 | This document discusses the functionality of the rbu driver only. |
7 | It does not cover the support needed from applications to enable the BIOS to | 15 | It does not cover the support needed from applications to enable the BIOS to |
8 | update itself with the image downloaded into memory. | 16 | update itself with the image downloaded into memory. |
9 | 17 | ||
10 | Overview: | 18 | Overview |
19 | ======== | ||
20 | |||
11 | This driver works with Dell OpenManage or Dell Update Packages for updating | 21 | This driver works with Dell OpenManage or Dell Update Packages for updating |
12 | the BIOS on Dell servers (starting from servers sold since 1999), desktops | 22 | the BIOS on Dell servers (starting from servers sold since 1999), desktops |
13 | and notebooks (starting from those sold in 2005). | 23 | and notebooks (starting from those sold in 2005). |
24 | |||
14 | Please go to http://support.dell.com to register; there you can find info on | 25 | Please go to http://support.dell.com to register; there you can find info on |
15 | OpenManage and Dell Update packages (DUP). | 26 | OpenManage and Dell Update packages (DUP). |
27 | |||
16 | Libsmbios can also be used to update BIOS on Dell systems; go to | 28 | Libsmbios can also be used to update BIOS on Dell systems; go to |
17 | http://linux.dell.com/libsmbios/ for details. | 29 | http://linux.dell.com/libsmbios/ for details. |
18 | 30 | ||
@@ -22,6 +34,7 @@ of physical pages having the BIOS image. In case of packetized the app | |||
22 | using the driver breaks the image in to packets of fixed sizes and the driver | 34 | using the driver breaks the image in to packets of fixed sizes and the driver |
23 | would place each packet in contiguous physical memory. The driver also | 35 | would place each packet in contiguous physical memory. The driver also |
24 | maintains a linked list of packets for reading them back. | 36 | maintains a linked list of packets for reading them back. |
37 | |||
25 | If the dell_rbu driver is unloaded all the allocated memory is freed. | 38 | If the dell_rbu driver is unloaded all the allocated memory is freed. |
26 | 39 | ||
27 | The rbu driver needs to have an application (as mentioned above) which will | 40 | The rbu driver needs to have an application (as mentioned above) which will |
@@ -30,28 +43,33 @@ inform the BIOS to enable the update in the next system reboot. | |||
30 | The user should not unload the rbu driver after downloading the BIOS image | 43 | The user should not unload the rbu driver after downloading the BIOS image |
31 | or updating. | 44 | or updating. |
32 | 45 | ||
33 | The driver load creates the following directories under the /sys file system. | 46 | The driver load creates the following directories under the /sys file system:: |
34 | /sys/class/firmware/dell_rbu/loading | 47 | |
35 | /sys/class/firmware/dell_rbu/data | 48 | /sys/class/firmware/dell_rbu/loading |
36 | /sys/devices/platform/dell_rbu/image_type | 49 | /sys/class/firmware/dell_rbu/data |
37 | /sys/devices/platform/dell_rbu/data | 50 | /sys/devices/platform/dell_rbu/image_type |
38 | /sys/devices/platform/dell_rbu/packet_size | 51 | /sys/devices/platform/dell_rbu/data |
52 | /sys/devices/platform/dell_rbu/packet_size | ||
39 | 53 | ||
40 | The driver supports two update mechanisms: monolithic and packetized. | 54 | The driver supports two update mechanisms: monolithic and packetized. |
41 | These update mechanisms depend upon the BIOS currently running on the system. | 55 | These update mechanisms depend upon the BIOS currently running on the system. |
42 | Most of the Dell systems support a monolithic update where the BIOS image is | 56 | Most of the Dell systems support a monolithic update where the BIOS image is |
43 | copied to a single contiguous block of physical memory. | 57 | copied to a single contiguous block of physical memory. |
58 | |||
44 | In case of the packet mechanism the memory can be broken into smaller chunks | 59 | In case of the packet mechanism the memory can be broken into smaller chunks |
45 | of contiguous memory and the BIOS image is scattered in these packets. | 60 | of contiguous memory and the BIOS image is scattered in these packets. |
46 | 61 | ||
47 | By default the driver uses monolithic memory for the update type. This can be | 62 | By default the driver uses monolithic memory for the update type. This can be |
48 | changed to packets during the driver load time by specifying the load | 63 | changed to packets during the driver load time by specifying the load |
49 | parameter image_type=packet. This can also be changed later as below | 64 | parameter image_type=packet. This can also be changed later as below:: |
50 | echo packet > /sys/devices/platform/dell_rbu/image_type | 65 | |
66 | echo packet > /sys/devices/platform/dell_rbu/image_type | ||
51 | 67 | ||
52 | In packet update mode the packet size has to be given before any packets can | 68 | In packet update mode the packet size has to be given before any packets can |
53 | be downloaded. It is done as below | 69 | be downloaded. It is done as below:: |
54 | echo XXXX > /sys/devices/platform/dell_rbu/packet_size | 70 | |
71 | echo XXXX > /sys/devices/platform/dell_rbu/packet_size | ||
72 | |||
55 | In the packet update mechanism, the user needs to create a new file having | 73 | In the packet update mechanism, the user needs to create a new file having |
56 | packets of data arranged back to back. It can be done as follows. | 74 | packets of data arranged back to back. It can be done as follows. |
57 | The user creates a packet header, gets a chunk of the BIOS image and | 75 | The user creates a packet header, gets a chunk of the BIOS image and |
@@ -60,41 +78,54 @@ added together should match the specified packet_size. This makes one | |||
60 | packet, the user needs to create more such packets out of the entire BIOS | 78 | packet, the user needs to create more such packets out of the entire BIOS |
61 | image file and then arrange all these packets back to back in to one single | 79 | image file and then arrange all these packets back to back in to one single |
62 | file. | 80 | file. |
81 | |||
63 | This file is then copied to /sys/class/firmware/dell_rbu/data. | 82 | This file is then copied to /sys/class/firmware/dell_rbu/data. |
64 | Once this file gets to the driver, the driver extracts packet_size data from | 83 | Once this file gets to the driver, the driver extracts packet_size data from |
65 | the file and spreads it across the physical memory in contiguous packet_sized | 84 | the file and spreads it across the physical memory in contiguous packet_sized |
66 | space. | 85 | space. |
86 | |||
67 | This method makes sure that all the packets get to the driver in a single operation. | 87 | This method makes sure that all the packets get to the driver in a single operation. |
68 | 88 | ||
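The packet-assembly steps above can be sketched in shell. Everything below is illustrative only: the packet size, the dummy 16-byte header, and the file names are assumptions for the demonstration, since the real packet header format comes from the Dell update tooling, not from this driver.

```shell
# Illustrative sketch: build one back-to-back packet file from a BIOS
# image.  PKT_SIZE, the placeholder header and the file names are all
# assumptions for this demo, not part of the driver interface.
PKT_SIZE=4096
HDR_SIZE=16

head -c 10000 /dev/zero > bios_image.hdr   # stand-in image for the demo
rm -f packets.bin

# Split the image into chunks that, together with a header, fill one packet.
split -b $((PKT_SIZE - HDR_SIZE)) bios_image.hdr chunk_
for c in chunk_*; do
    printf 'DUMMYHDR00000000' >> packets.bin   # 16-byte placeholder header
    cat "$c" >> packets.bin
    # Pad a short final chunk so every packet is exactly PKT_SIZE bytes.
    rem=$(( $(wc -c < packets.bin) % PKT_SIZE ))
    if [ "$rem" -ne 0 ]; then
        head -c $((PKT_SIZE - rem)) /dev/zero >> packets.bin
    fi
done
rm -f chunk_*
echo "built $(wc -c < packets.bin) bytes of packets"
```

The resulting packets.bin is one single file of back-to-back, fixed-size packets, ready to be written to the data file in a single operation as described above.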
69 | In monolithic update the user simply gets the BIOS image (.hdr file) and copies it | 89 | In monolithic update the user simply gets the BIOS image (.hdr file) and copies it |
70 | to the data file as is without any change to the BIOS image itself. | 90 | to the data file as is without any change to the BIOS image itself. |
71 | 91 | ||
72 | Do the steps below to download the BIOS image. | 92 | Do the steps below to download the BIOS image. |
93 | |||
73 | 1) echo 1 > /sys/class/firmware/dell_rbu/loading | 94 | 1) echo 1 > /sys/class/firmware/dell_rbu/loading |
74 | 2) cp bios_image.hdr /sys/class/firmware/dell_rbu/data | 95 | 2) cp bios_image.hdr /sys/class/firmware/dell_rbu/data |
75 | 3) echo 0 > /sys/class/firmware/dell_rbu/loading | 96 | 3) echo 0 > /sys/class/firmware/dell_rbu/loading |
76 | 97 | ||
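Wrapped in a script, the three steps look like the sketch below. The RBU_SYS override and the dry-run fallback directory are conveniences invented for this example so it can be exercised without the driver loaded; on a real system the commands target /sys/class/firmware/dell_rbu directly and need root.

```shell
# Sketch of the monolithic download sequence.  The dry-run fallback
# directory and the IMG variable are assumptions for this example only.
RBU_SYS=${RBU_SYS:-/sys/class/firmware/dell_rbu}
IMG=${IMG:-bios_image.hdr}

if [ ! -d "$RBU_SYS" ]; then            # driver not loaded: dry-run mode
    RBU_SYS=rbu_dry_run
    mkdir -p "$RBU_SYS"
fi
if [ ! -f "$IMG" ]; then
    head -c 1024 /dev/zero > "$IMG"     # stand-in image for the dry run
fi

echo 1 > "$RBU_SYS/loading"             # step 1: open the transfer
cp "$IMG" "$RBU_SYS/data"               # step 2: copy the .hdr image as is
echo 0 > "$RBU_SYS/loading"             # step 3: close the transfer
```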
77 | The /sys/class/firmware/dell_rbu/ entries will remain until the following is | 98 | The /sys/class/firmware/dell_rbu/ entries will remain until the following is |
78 | done. | 99 | done. |
79 | echo -1 > /sys/class/firmware/dell_rbu/loading | 100 | |
101 | :: | ||
102 | |||
103 | echo -1 > /sys/class/firmware/dell_rbu/loading | ||
104 | |||
80 | Until this step is completed the driver cannot be unloaded. | 105 | Until this step is completed the driver cannot be unloaded. |
106 | |||
81 | Also echoing either mono, packet or init into image_type will free up the | 107 | Also echoing either mono, packet or init into image_type will free up the |
82 | memory allocated by the driver. | 108 | memory allocated by the driver. |
83 | 109 | ||
84 | If a user by accident executes steps 1 and 3 above without executing step 2; | 110 | If a user by accident executes steps 1 and 3 above without executing step 2; |
85 | it will make the /sys/class/firmware/dell_rbu/ entries disappear. | 111 | it will make the /sys/class/firmware/dell_rbu/ entries disappear. |
86 | The entries can be recreated by doing the following | 112 | |
87 | echo init > /sys/devices/platform/dell_rbu/image_type | 113 | The entries can be recreated by doing the following:: |
88 | NOTE: echoing init in image_type does not change it original value. | 114 | |
115 | echo init > /sys/devices/platform/dell_rbu/image_type | ||
116 | |||
117 | .. note:: echoing init in image_type does not change its original value. | ||
89 | 118 | ||
90 | Also the driver provides a read-only file, /sys/devices/platform/dell_rbu/data, to | 119 | Also the driver provides a read-only file, /sys/devices/platform/dell_rbu/data, to |
91 | read back the image downloaded. | 120 | read back the image downloaded. |
92 | 121 | ||
93 | NOTE: | 122 | .. note:: |
94 | This driver requires a patch for firmware_class.c which has the modified | 123 | |
95 | request_firmware_nowait function. | 124 | This driver requires a patch for firmware_class.c which has the modified |
96 | Also after updating the BIOS image a user mode application needs to execute | 125 | request_firmware_nowait function. |
97 | code which sends the BIOS update request to the BIOS. So on the next reboot | 126 | |
98 | the BIOS knows about the new image downloaded and it updates itself. | 127 | Also after updating the BIOS image a user mode application needs to execute |
99 | Also don't unload the rbu driver if the image has to be updated. | 128 | code which sends the BIOS update request to the BIOS. So on the next reboot |
129 | the BIOS knows about the new image downloaded and it updates itself. | ||
130 | Also don't unload the rbu driver if the image has to be updated. | ||
100 | 131 | ||
diff --git a/Documentation/digsig.txt b/Documentation/digsig.txt index 3f682889068b..f6a8902d3ef7 100644 --- a/Documentation/digsig.txt +++ b/Documentation/digsig.txt | |||
@@ -1,13 +1,20 @@ | |||
1 | ================================== | ||
1 | Digital Signature Verification API | 2 | Digital Signature Verification API |
3 | ================================== | ||
2 | 4 | ||
3 | CONTENTS | 5 | :Author: Dmitry Kasatkin |
6 | :Date: 06.10.2011 | ||
4 | 7 | ||
5 | 1. Introduction | ||
6 | 2. API | ||
7 | 3. User-space utilities | ||
8 | 8 | ||
9 | .. CONTENTS | ||
9 | 10 | ||
10 | 1. Introduction | 11 | 1. Introduction |
12 | 2. API | ||
13 | 3. User-space utilities | ||
14 | |||
15 | |||
16 | Introduction | ||
17 | ============ | ||
11 | 18 | ||
12 | The digital signature verification API provides a method to verify digital signatures. | 19 | The digital signature verification API provides a method to verify digital signatures. |
13 | Currently digital signatures are used by the IMA/EVM integrity protection subsystem. | 20 | Currently digital signatures are used by the IMA/EVM integrity protection subsystem. |
@@ -17,25 +24,25 @@ GnuPG multi-precision integers (MPI) library. The kernel port provides | |||
17 | memory allocation errors handling, has been refactored according to kernel | 24 | memory allocation errors handling, has been refactored according to kernel |
18 | coding style, and checkpatch.pl reported errors and warnings have been fixed. | 25 | coding style, and checkpatch.pl reported errors and warnings have been fixed. |
19 | 26 | ||
20 | Public key and signature consist of header and MPIs. | 27 | Public key and signature consist of header and MPIs:: |
21 | 28 | ||
22 | struct pubkey_hdr { | 29 | struct pubkey_hdr { |
23 | uint8_t version; /* key format version */ | 30 | uint8_t version; /* key format version */ |
24 | time_t timestamp; /* key made, always 0 for now */ | 31 | time_t timestamp; /* key made, always 0 for now */ |
25 | uint8_t algo; | 32 | uint8_t algo; |
26 | uint8_t nmpi; | 33 | uint8_t nmpi; |
27 | char mpi[0]; | 34 | char mpi[0]; |
28 | } __packed; | 35 | } __packed; |
29 | 36 | ||
30 | struct signature_hdr { | 37 | struct signature_hdr { |
31 | uint8_t version; /* signature format version */ | 38 | uint8_t version; /* signature format version */ |
32 | time_t timestamp; /* signature made */ | 39 | time_t timestamp; /* signature made */ |
33 | uint8_t algo; | 40 | uint8_t algo; |
34 | uint8_t hash; | 41 | uint8_t hash; |
35 | uint8_t keyid[8]; | 42 | uint8_t keyid[8]; |
36 | uint8_t nmpi; | 43 | uint8_t nmpi; |
37 | char mpi[0]; | 44 | char mpi[0]; |
38 | } __packed; | 45 | } __packed; |
39 | 46 | ||
40 | keyid equals SHA1[12-19] over the total key content. | 47 | keyid equals SHA1[12-19] over the total key content. |
41 | Signature header is used as an input to generate a signature. | 48 | Signature header is used as an input to generate a signature. |
@@ -43,31 +50,33 @@ Such an approach ensures that the key or signature header cannot be changed. | |||
43 | It protects the timestamp from being changed and can be used for rollback | 50 | It protects the timestamp from being changed and can be used for rollback |
44 | protection. | 51 | protection. |
45 | 52 | ||
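As a concrete illustration of the keyid derivation above: SHA1[12-19] denotes bytes 12 through 19 (0-indexed) of the 20-byte SHA-1 digest of the key content. The zero-filled key.bin below is only a stand-in; a real key would be the full pubkey_hdr plus its MPIs.

```shell
# Illustrative sketch: derive the 8-byte keyid from a key blob.
# key.bin is a stand-in for the real pubkey_hdr + MPI content.
head -c 64 /dev/zero > key.bin

digest=$(sha1sum key.bin | awk '{print $1}')   # 40 hex chars = 20 bytes
keyid=$(printf '%s' "$digest" | cut -c25-40)   # bytes 12..19 -> hex chars 25..40
echo "keyid: $keyid"
```

The 16 hex characters printed are what appears as the key name once the key is loaded into the kernel keyring (compare the 5D2B05FC633EE3E8 example later in this document).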
46 | 2. API | 53 | API |
54 | === | ||
47 | 55 | ||
48 | API currently includes only 1 function: | 56 | API currently includes only 1 function:: |
49 | 57 | ||
50 | digsig_verify() - digital signature verification with public key | 58 | digsig_verify() - digital signature verification with public key |
51 | 59 | ||
52 | 60 | ||
53 | /** | 61 | /** |
54 | * digsig_verify() - digital signature verification with public key | 62 | * digsig_verify() - digital signature verification with public key |
55 | * @keyring: keyring to search key in | 63 | * @keyring: keyring to search key in |
56 | * @sig: digital signature | 64 | * @sig: digital signature |
57 | * @siglen: length of the signature | 65 | * @siglen: length of the signature |
58 | * @data: data | 66 | * @data: data |
59 | * @datalen: length of the data | 67 | * @datalen: length of the data |
60 | * @return: 0 on success, -EINVAL otherwise | 68 | * @return: 0 on success, -EINVAL otherwise |
61 | * | 69 | * |
62 | * Verifies data integrity against digital signature. | 70 | * Verifies data integrity against digital signature. |
63 | * Currently only RSA is supported. | 71 | * Currently only RSA is supported. |
64 | * Normally hash of the content is used as a data for this function. | 72 | * Normally hash of the content is used as a data for this function. |
65 | * | 73 | * |
66 | */ | 74 | */ |
67 | int digsig_verify(struct key *keyring, const char *sig, int siglen, | 75 | int digsig_verify(struct key *keyring, const char *sig, int siglen, |
68 | const char *data, int datalen); | 76 | const char *data, int datalen); |
69 | 77 | ||
70 | 3. User-space utilities | 78 | User-space utilities |
79 | ==================== | ||
71 | 80 | ||
72 | The signing and key management utilities evm-utils provide functionality | 81 | The signing and key management utilities evm-utils provide functionality |
73 | to generate signatures, to load keys into the kernel keyring. | 82 | to generate signatures, to load keys into the kernel keyring. |
@@ -75,22 +84,18 @@ Keys can be in PEM or converted to the kernel format. | |||
75 | When the key is added to the kernel keyring, the keyid defines the name | 84 | When the key is added to the kernel keyring, the keyid defines the name |
76 | of the key: 5D2B05FC633EE3E8 in the example below. | 85 | of the key: 5D2B05FC633EE3E8 in the example below. |
77 | 86 | ||
78 | Here is example output of the keyctl utility. | 87 | Here is example output of the keyctl utility:: |
79 | 88 | ||
80 | $ keyctl show | 89 | $ keyctl show |
81 | Session Keyring | 90 | Session Keyring |
82 | -3 --alswrv 0 0 keyring: _ses | 91 | -3 --alswrv 0 0 keyring: _ses |
83 | 603976250 --alswrv 0 -1 \_ keyring: _uid.0 | 92 | 603976250 --alswrv 0 -1 \_ keyring: _uid.0 |
84 | 817777377 --alswrv 0 0 \_ user: kmk | 93 | 817777377 --alswrv 0 0 \_ user: kmk |
85 | 891974900 --alswrv 0 0 \_ encrypted: evm-key | 94 | 891974900 --alswrv 0 0 \_ encrypted: evm-key |
86 | 170323636 --alswrv 0 0 \_ keyring: _module | 95 | 170323636 --alswrv 0 0 \_ keyring: _module |
87 | 548221616 --alswrv 0 0 \_ keyring: _ima | 96 | 548221616 --alswrv 0 0 \_ keyring: _ima |
88 | 128198054 --alswrv 0 0 \_ keyring: _evm | 97 | 128198054 --alswrv 0 0 \_ keyring: _evm |
89 | 98 | ||
90 | $ keyctl list 128198054 | 99 | $ keyctl list 128198054 |
91 | 1 key in keyring: | 100 | 1 key in keyring: |
92 | 620789745: --alswrv 0 0 user: 5D2B05FC633EE3E8 | 101 | 620789745: --alswrv 0 0 user: 5D2B05FC633EE3E8 |
93 | |||
94 | |||
95 | Dmitry Kasatkin | ||
96 | 06.10.2011 | ||
diff --git a/Documentation/efi-stub.txt b/Documentation/efi-stub.txt index e15746988261..41df801f9a50 100644 --- a/Documentation/efi-stub.txt +++ b/Documentation/efi-stub.txt | |||
@@ -1,5 +1,6 @@ | |||
1 | The EFI Boot Stub | 1 | ================= |
2 | --------------------------- | 2 | The EFI Boot Stub |
3 | ================= | ||
3 | 4 | ||
4 | On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade | 5 | On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade |
5 | as a PE/COFF image, thereby convincing EFI firmware loaders to load | 6 | as a PE/COFF image, thereby convincing EFI firmware loaders to load |
@@ -25,7 +26,8 @@ a certain sense it *IS* the boot loader. | |||
25 | The EFI boot stub is enabled with the CONFIG_EFI_STUB kernel option. | 26 | The EFI boot stub is enabled with the CONFIG_EFI_STUB kernel option. |
26 | 27 | ||
27 | 28 | ||
28 | **** How to install bzImage.efi | 29 | How to install bzImage.efi |
30 | -------------------------- | ||
29 | 31 | ||
30 | The bzImage located in arch/x86/boot/bzImage must be copied to the EFI | 32 | The bzImage located in arch/x86/boot/bzImage must be copied to the EFI |
31 | System Partition (ESP) and renamed with the extension ".efi". Without | 33 | System Partition (ESP) and renamed with the extension ".efi". Without |
@@ -37,14 +39,16 @@ may not need to be renamed. Similarly for arm64, arch/arm64/boot/Image | |||
37 | should be copied but not necessarily renamed. | 39 | should be copied but not necessarily renamed. |
38 | 40 | ||
39 | 41 | ||
40 | **** Passing kernel parameters from the EFI shell | 42 | Passing kernel parameters from the EFI shell |
43 | -------------------------------------------- | ||
41 | 44 | ||
42 | Arguments to the kernel can be passed after bzImage.efi, e.g. | 45 | Arguments to the kernel can be passed after bzImage.efi, e.g.:: |
43 | 46 | ||
44 | fs0:> bzImage.efi console=ttyS0 root=/dev/sda4 | 47 | fs0:> bzImage.efi console=ttyS0 root=/dev/sda4 |
45 | 48 | ||
46 | 49 | ||
47 | **** The "initrd=" option | 50 | The "initrd=" option |
51 | -------------------- | ||
48 | 52 | ||
49 | Like most boot loaders, the EFI stub allows the user to specify | 53 | Like most boot loaders, the EFI stub allows the user to specify |
50 | multiple initrd files using the "initrd=" option. This is the only EFI | 54 | multiple initrd files using the "initrd=" option. This is the only EFI |
@@ -54,9 +58,9 @@ kernel when it boots. | |||
54 | The path to the initrd file must be an absolute path from the | 58 | The path to the initrd file must be an absolute path from the |
55 | beginning of the ESP, relative path names do not work. Also, the path | 59 | beginning of the ESP, relative path names do not work. Also, the path |
56 | is an EFI-style path and directory elements must be separated with | 60 | is an EFI-style path and directory elements must be separated with |
57 | backslashes (\). For example, given the following directory layout, | 61 | backslashes (\). For example, given the following directory layout:: |
58 | 62 | ||
59 | fs0:> | 63 | fs0:> |
60 | Kernels\ | 64 | Kernels\ |
61 | bzImage.efi | 65 | bzImage.efi |
62 | initrd-large.img | 66 | initrd-large.img |
@@ -66,7 +70,7 @@ fs0:> | |||
66 | initrd-medium.img | 70 | initrd-medium.img |
67 | 71 | ||
68 | to boot with the initrd-large.img file if the current working | 72 | to boot with the initrd-large.img file if the current working |
69 | directory is fs0:\Kernels, the following command must be used, | 73 | directory is fs0:\Kernels, the following command must be used:: |
70 | 74 | ||
71 | fs0:\Kernels> bzImage.efi initrd=\Kernels\initrd-large.img | 75 | fs0:\Kernels> bzImage.efi initrd=\Kernels\initrd-large.img |
72 | 76 | ||
@@ -76,7 +80,8 @@ which understands relative paths, whereas the rest of the command line | |||
76 | is passed to bzImage.efi. | 80 | is passed to bzImage.efi. |
77 | 81 | ||
78 | 82 | ||
79 | **** The "dtb=" option | 83 | The "dtb=" option |
84 | ----------------- | ||
80 | 85 | ||
81 | For the ARM and arm64 architectures, we also need to be able to provide a | 86 | For the ARM and arm64 architectures, we also need to be able to provide a |
82 | device tree to the kernel. This is done with the "dtb=" command line option, | 87 | device tree to the kernel. This is done with the "dtb=" command line option, |
diff --git a/Documentation/eisa.txt b/Documentation/eisa.txt index a55e4910924e..2806e5544e43 100644 --- a/Documentation/eisa.txt +++ b/Documentation/eisa.txt | |||
@@ -1,4 +1,8 @@ | |||
1 | EISA bus support (Marc Zyngier <maz@wild-wind.fr.eu.org>) | 1 | ================ |
2 | EISA bus support | ||
3 | ================ | ||
4 | |||
5 | :Author: Marc Zyngier <maz@wild-wind.fr.eu.org> | ||
2 | 6 | ||
3 | This document groups random notes about porting EISA drivers to the | 7 | This document groups random notes about porting EISA drivers to the |
4 | new EISA/sysfs API. | 8 | new EISA/sysfs API. |
@@ -14,168 +18,189 @@ detection code is generally also used to probe ISA cards). Moreover, | |||
14 | most EISA drivers are among the oldest Linux drivers so, as you can | 18 | most EISA drivers are among the oldest Linux drivers so, as you can |
15 | imagine, some dust has settled here over the years. | 19 | imagine, some dust has settled here over the years. |
16 | 20 | ||
17 | The EISA infrastructure is made up of three parts : | 21 | The EISA infrastructure is made up of three parts: |
18 | 22 | ||
19 | - The bus code implements most of the generic code. It is shared | 23 | - The bus code implements most of the generic code. It is shared |
20 | among all the architectures that the EISA code runs on. It | 24 | among all the architectures that the EISA code runs on. It |
21 | implements bus probing (detecting EISA cards available on the bus), | 25 | implements bus probing (detecting EISA cards available on the bus), |
22 | allocates I/O resources, allows fancy naming through sysfs, and | 26 | allocates I/O resources, allows fancy naming through sysfs, and |
23 | offers interfaces for drivers to register. | 27 | offers interfaces for drivers to register. |
24 | 28 | ||
25 | - The bus root driver implements the glue between the bus hardware | 29 | - The bus root driver implements the glue between the bus hardware |
26 | and the generic bus code. It is responsible for discovering the | 30 | and the generic bus code. It is responsible for discovering the |
27 | device implementing the bus, and setting it up to be later probed | 31 | device implementing the bus, and setting it up to be later probed |
28 | by the bus code. This can go from something as simple as reserving | 32 | by the bus code. This can go from something as simple as reserving |
29 | an I/O region on x86, to the rather more complex, like the hppa | 33 | an I/O region on x86, to the rather more complex, like the hppa |
30 | EISA code. This is the part to implement in order to have EISA | 34 | EISA code. This is the part to implement in order to have EISA |
31 | running on a "new" platform. | 35 | running on a "new" platform. |
32 | 36 | ||
33 | - The driver offers the bus a list of devices that it manages, and | 37 | - The driver offers the bus a list of devices that it manages, and |
34 | implements the necessary callbacks to probe and release devices | 38 | implements the necessary callbacks to probe and release devices |
35 | whenever told to. | 39 | whenever told to. |
36 | 40 | ||
37 | Every function/structure below lives in <linux/eisa.h>, which depends | 41 | Every function/structure below lives in <linux/eisa.h>, which depends |
38 | heavily on <linux/device.h>. | 42 | heavily on <linux/device.h>. |
39 | 43 | ||
40 | ** Bus root driver : | 44 | Bus root driver |
45 | =============== | ||
46 | |||
47 | :: | ||
41 | 48 | ||
42 | int eisa_root_register (struct eisa_root_device *root); | 49 | int eisa_root_register (struct eisa_root_device *root); |
43 | 50 | ||
44 | The eisa_root_register function is used to declare a device as the | 51 | The eisa_root_register function is used to declare a device as the |
45 | root of an EISA bus. The eisa_root_device structure holds a reference | 52 | root of an EISA bus. The eisa_root_device structure holds a reference |
46 | to this device, as well as some parameters for probing purposes. | 53 | to this device, as well as some parameters for probing purposes:: |
47 | 54 | ||
48 | struct eisa_root_device { | 55 | struct eisa_root_device { |
49 | struct device *dev; /* Pointer to bridge device */ | 56 | struct device *dev; /* Pointer to bridge device */ |
50 | struct resource *res; | 57 | struct resource *res; |
51 | unsigned long bus_base_addr; | 58 | unsigned long bus_base_addr; |
52 | int slots; /* Max slot number */ | 59 | int slots; /* Max slot number */ |
53 | int force_probe; /* Probe even when no slot 0 */ | 60 | int force_probe; /* Probe even when no slot 0 */ |
54 | u64 dma_mask; /* from bridge device */ | 61 | u64 dma_mask; /* from bridge device */ |
55 | int bus_nr; /* Set by eisa_root_register */ | 62 | int bus_nr; /* Set by eisa_root_register */ |
56 | struct resource eisa_root_res; /* ditto */ | 63 | struct resource eisa_root_res; /* ditto */ |
57 | }; | 64 | }; |
58 | 65 | ||
59 | node : used for eisa_root_register internal purpose | 66 | ============= ====================================================== |
60 | dev : pointer to the root device | 67 | node used for eisa_root_register internal purpose |
61 | res : root device I/O resource | 68 | dev pointer to the root device |
62 | bus_base_addr : slot 0 address on this bus | 69 | res root device I/O resource |
63 | slots : max slot number to probe | 70 | bus_base_addr slot 0 address on this bus |
64 | force_probe : Probe even when slot 0 is empty (no EISA mainboard) | 71 | slots max slot number to probe |
65 | dma_mask : Default DMA mask. Usually the bridge device dma_mask. | 72 | force_probe Probe even when slot 0 is empty (no EISA mainboard) |
66 | bus_nr : unique bus id, set by eisa_root_register | 73 | dma_mask Default DMA mask. Usually the bridge device dma_mask. |
67 | 74 | bus_nr unique bus id, set by eisa_root_register | |
68 | ** Driver : | 75 | ============= ====================================================== |
69 | 76 | ||
70 | int eisa_driver_register (struct eisa_driver *edrv); | 77 | Driver |
71 | void eisa_driver_unregister (struct eisa_driver *edrv); | 78 | ====== |
79 | |||
80 | :: | ||
81 | |||
82 | int eisa_driver_register (struct eisa_driver *edrv); | ||
83 | void eisa_driver_unregister (struct eisa_driver *edrv); | ||
72 | 84 | ||
73 | Clear enough? | 85 | Clear enough? |
74 | 86 | ||
75 | struct eisa_device_id { | 87 | :: |
76 | char sig[EISA_SIG_LEN]; | 88 | |
77 | unsigned long driver_data; | 89 | struct eisa_device_id { |
78 | }; | 90 | char sig[EISA_SIG_LEN]; |
79 | 91 | unsigned long driver_data; | |
80 | struct eisa_driver { | 92 | }; |
81 | const struct eisa_device_id *id_table; | 93 | |
82 | struct device_driver driver; | 94 | struct eisa_driver { |
83 | }; | 95 | const struct eisa_device_id *id_table; |
84 | 96 | struct device_driver driver; | |
85 | id_table : an array of NULL terminated EISA id strings, | 97 | }; |
86 | followed by an empty string. Each string can | 98 | |
87 | optionally be paired with a driver-dependent value | 99 | =============== ==================================================== |
88 | (driver_data). | 100 | id_table an array of NULL-terminated EISA id strings, |
89 | 101 | followed by an empty string. Each string can | |
90 | driver : a generic driver, such as described in | 102 | optionally be paired with a driver-dependent value |
91 | Documentation/driver-model/driver.txt. Only .name, | 103 | (driver_data). |
92 | .probe and .remove members are mandatory. | 104 | |
93 | 105 | driver a generic driver, such as described in | |
94 | An example is the 3c59x driver : | 106 | Documentation/driver-model/driver.txt. Only .name, |
95 | 107 | .probe and .remove members are mandatory. | |
96 | static struct eisa_device_id vortex_eisa_ids[] = { | 108 | =============== ==================================================== |
97 | { "TCM5920", EISA_3C592_OFFSET }, | 109 | |
98 | { "TCM5970", EISA_3C597_OFFSET }, | 110 | An example is the 3c59x driver:: |
99 | { "" } | 111 | |
100 | }; | 112 | static struct eisa_device_id vortex_eisa_ids[] = { |
101 | 113 | { "TCM5920", EISA_3C592_OFFSET }, | |
102 | static struct eisa_driver vortex_eisa_driver = { | 114 | { "TCM5970", EISA_3C597_OFFSET }, |
103 | .id_table = vortex_eisa_ids, | 115 | { "" } |
104 | .driver = { | 116 | }; |
105 | .name = "3c59x", | 117 | |
106 | .probe = vortex_eisa_probe, | 118 | static struct eisa_driver vortex_eisa_driver = { |
107 | .remove = vortex_eisa_remove | 119 | .id_table = vortex_eisa_ids, |
108 | } | 120 | .driver = { |
109 | }; | 121 | .name = "3c59x", |
110 | 122 | .probe = vortex_eisa_probe, | |
111 | ** Device : | 123 | .remove = vortex_eisa_remove |
124 | } | ||
125 | }; | ||
126 | |||
127 | Device | ||
128 | ====== | ||
112 | 129 | ||
113 | The sysfs framework calls .probe and .remove functions upon device | 130 | The sysfs framework calls .probe and .remove functions upon device |
114 | discovery and removal (note that the .remove function is only called | 131 | discovery and removal (note that the .remove function is only called |
115 | when the driver is built as a module). | 132 | when the driver is built as a module). |
116 | 133 | ||
117 | Both functions are passed a pointer to a 'struct device', which is | 134 | Both functions are passed a pointer to a 'struct device', which is |
118 | encapsulated in a 'struct eisa_device' described as follows : | 135 | encapsulated in a 'struct eisa_device' described as follows:: |
119 | 136 | ||
120 | struct eisa_device { | 137 | struct eisa_device { |
121 | struct eisa_device_id id; | 138 | struct eisa_device_id id; |
122 | int slot; | 139 | int slot; |
123 | int state; | 140 | int state; |
124 | unsigned long base_addr; | 141 | unsigned long base_addr; |
125 | struct resource res[EISA_MAX_RESOURCES]; | 142 | struct resource res[EISA_MAX_RESOURCES]; |
126 | u64 dma_mask; | 143 | u64 dma_mask; |
127 | struct device dev; /* generic device */ | 144 | struct device dev; /* generic device */ |
128 | }; | 145 | }; |
129 | 146 | ||
130 | id : EISA id, as read from device. id.driver_data is set from the | 147 | ======== ============================================================ |
131 | matching driver EISA id. | 148 | id EISA id, as read from device. id.driver_data is set from the |
132 | slot : slot number which the device was detected on | 149 | matching driver EISA id. |
133 | state : set of flags indicating the state of the device. Current | 150 | slot slot number which the device was detected on |
134 | flags are EISA_CONFIG_ENABLED and EISA_CONFIG_FORCED. | 151 | state set of flags indicating the state of the device. Current |
135 | res : set of four 256 bytes I/O regions allocated to this device | 152 | flags are EISA_CONFIG_ENABLED and EISA_CONFIG_FORCED. |
136 | dma_mask: DMA mask set from the parent device. | 153 | res set of four 256-byte I/O regions allocated to this device |
137 | dev : generic device (see Documentation/driver-model/device.txt) | 154 | dma_mask DMA mask set from the parent device. |
155 | dev generic device (see Documentation/driver-model/device.txt) | ||
156 | ======== ============================================================ | ||
138 | 157 | ||
139 | You can get the 'struct eisa_device' from 'struct device' using the | 158 | You can get the 'struct eisa_device' from 'struct device' using the |
140 | 'to_eisa_device' macro. | 159 | 'to_eisa_device' macro. |
141 | 160 | ||
142 | ** Misc stuff : | 161 | Misc stuff |
162 | ========== | ||
163 | |||
164 | :: | ||
143 | 165 | ||
144 | void eisa_set_drvdata (struct eisa_device *edev, void *data); | 166 | void eisa_set_drvdata (struct eisa_device *edev, void *data); |
145 | 167 | ||
146 | Stores data into the device's driver_data area. | 168 | Stores data into the device's driver_data area. |
147 | 169 | ||
148 | void *eisa_get_drvdata (struct eisa_device *edev): | 170 | :: |
171 | |||
172 | void *eisa_get_drvdata (struct eisa_device *edev); | ||
149 | 173 | ||
150 | Gets the pointer previously stored into the device's driver_data area. | 174 | Gets the pointer previously stored into the device's driver_data area. |
151 | 175 | ||
152 | int eisa_get_region_index (void *addr); | 176 | :: |
177 | |||
178 | int eisa_get_region_index (void *addr); | ||
153 | 179 | ||
154 | Returns the region number (0 <= x < EISA_MAX_RESOURCES) of a given | 180 | Returns the region number (0 <= x < EISA_MAX_RESOURCES) of a given |
155 | address. | 181 | address. |
156 | 182 | ||
157 | ** Kernel parameters : | 183 | Kernel parameters |
184 | ================= | ||
158 | 185 | ||
159 | eisa_bus.enable_dev : | 186 | eisa_bus.enable_dev |
187 | A comma-separated list of slots to be enabled, even if the firmware | ||
188 | set the card as disabled. The driver must be able to properly | ||
189 | initialize the device in such conditions. | ||
160 | 190 | ||
161 | A comma-separated list of slots to be enabled, even if the firmware | 191 | eisa_bus.disable_dev |
162 | set the card as disabled. The driver must be able to properly | 192 | A comma-separated list of slots to be disabled, even if the firmware |
163 | initialize the device in such conditions. | 193 | set the card as enabled. The driver won't be called to handle this |
194 | device. | ||
164 | 195 | ||
165 | eisa_bus.disable_dev : | 196 | virtual_root.force_probe |
197 | Force the probing code to probe EISA slots even when it cannot find an | ||
198 | EISA compliant mainboard (nothing appears on slot 0). Defaults to 0 | ||
199 | (don't force), and set to 1 (force probing) when either | ||
200 | CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set. | ||
166 | 201 | ||
167 | A comma-separated list of slots to be enabled, even if the firmware | 202 | Random notes |
168 | set the card as enabled. The driver won't be called to handle this | 203 | ============ |
169 | device. | ||
170 | |||
171 | virtual_root.force_probe : | ||
172 | |||
173 | Force the probing code to probe EISA slots even when it cannot find an | ||
174 | EISA compliant mainboard (nothing appears on slot 0). Defaults to 0 | ||
175 | (don't force), and set to 1 (force probing) when either | ||
176 | CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set. | ||
177 | |||
178 | ** Random notes : | ||
179 | 204 | ||
180 | Converting an EISA driver to the new API mostly involves *deleting* | 205 | Converting an EISA driver to the new API mostly involves *deleting* |
181 | code (since probing is now in the core EISA code). Unfortunately, most | 206 | code (since probing is now in the core EISA code). Unfortunately, most |
@@ -194,9 +219,11 @@ routine. | |||
194 | For example, switching your favorite EISA SCSI card to the "hotplug" | 219 | For example, switching your favorite EISA SCSI card to the "hotplug" |
195 | model is "the right thing"(tm). | 220 | model is "the right thing"(tm). |
196 | 221 | ||
197 | ** Thanks : | 222 | Thanks |
223 | ====== | ||
224 | |||
225 | I'd like to thank the following people for their help: | ||
198 | 226 | ||
199 | I'd like to thank the following people for their help : | ||
200 | - Xavier Benigni for lending me a wonderful Alpha Jensen, | 227 | - Xavier Benigni for lending me a wonderful Alpha Jensen, |
201 | - James Bottomley, Jeff Garzik for getting this stuff into the kernel, | 228 | - James Bottomley, Jeff Garzik for getting this stuff into the kernel, |
202 | - Andries Brouwer for contributing numerous EISA ids, | 229 | - Andries Brouwer for contributing numerous EISA ids, |
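Putting the converted pieces together, a probe/remove pair for the 3c59x-style driver might look like the following kernel-style pseudocode sketch. It is not compilable as-is: `init_one_card()` and `release_one_card()` are hypothetical driver helpers standing in for the real hardware setup, and the `eisa_*` symbols come from the kernel's EISA core.

```
static int vortex_eisa_probe(struct device *device)
{
	struct eisa_device *edev = to_eisa_device(device);
	void *priv;

	/* edev->id.driver_data was filled in from the matching
	 * entry in vortex_eisa_ids[] by the EISA core */
	/* init_one_card() is a hypothetical driver helper */
	priv = init_one_card(edev->base_addr, edev->id.driver_data);
	if (!priv)
		return -ENODEV;

	eisa_set_drvdata(edev, priv);
	return 0;
}

static int vortex_eisa_remove(struct device *device)
{
	struct eisa_device *edev = to_eisa_device(device);

	/* release_one_card() is a hypothetical driver helper */
	release_one_card(eisa_get_drvdata(edev));
	return 0;
}
```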
diff --git a/Documentation/flexible-arrays.txt b/Documentation/flexible-arrays.txt index df904aec9904..a0f2989dd804 100644 --- a/Documentation/flexible-arrays.txt +++ b/Documentation/flexible-arrays.txt | |||
@@ -1,6 +1,9 @@ | |||
1 | =================================== | ||
1 | Using flexible arrays in the kernel | 2 | Using flexible arrays in the kernel |
2 | Last updated for 2.6.32 | 3 | =================================== |
3 | Jonathan Corbet <corbet@lwn.net> | 4 | |
5 | :Updated: Last updated for 2.6.32 | ||
6 | :Author: Jonathan Corbet <corbet@lwn.net> | ||
4 | 7 | ||
5 | Large contiguous memory allocations can be unreliable in the Linux kernel. | 8 | Large contiguous memory allocations can be unreliable in the Linux kernel. |
6 | Kernel programmers will sometimes respond to this problem by allocating | 9 | Kernel programmers will sometimes respond to this problem by allocating |
@@ -26,7 +29,7 @@ operation. It's also worth noting that flexible arrays do no internal | |||
26 | locking at all; if concurrent access to an array is possible, then the | 29 | locking at all; if concurrent access to an array is possible, then the |
27 | caller must arrange for appropriate mutual exclusion. | 30 | caller must arrange for appropriate mutual exclusion. |
28 | 31 | ||
29 | The creation of a flexible array is done with: | 32 | The creation of a flexible array is done with:: |
30 | 33 | ||
31 | #include <linux/flex_array.h> | 34 | #include <linux/flex_array.h> |
32 | 35 | ||
@@ -40,14 +43,14 @@ argument is passed directly to the internal memory allocation calls. With | |||
40 | the current code, using flags to ask for high memory is likely to lead to | 43 | the current code, using flags to ask for high memory is likely to lead to |
41 | notably unpleasant side effects. | 44 | notably unpleasant side effects. |
42 | 45 | ||
43 | It is also possible to define flexible arrays at compile time with: | 46 | It is also possible to define flexible arrays at compile time with:: |
44 | 47 | ||
45 | DEFINE_FLEX_ARRAY(name, element_size, total); | 48 | DEFINE_FLEX_ARRAY(name, element_size, total); |
46 | 49 | ||
47 | This macro will result in a definition of an array with the given name; the | 50 | This macro will result in a definition of an array with the given name; the |
48 | element size and total will be checked for validity at compile time. | 51 | element size and total will be checked for validity at compile time. |
49 | 52 | ||
50 | Storing data into a flexible array is accomplished with a call to: | 53 | Storing data into a flexible array is accomplished with a call to:: |
51 | 54 | ||
52 | int flex_array_put(struct flex_array *array, unsigned int element_nr, | 55 | int flex_array_put(struct flex_array *array, unsigned int element_nr, |
53 | void *src, gfp_t flags); | 56 | void *src, gfp_t flags); |
@@ -63,7 +66,7 @@ running in some sort of atomic context; in this situation, sleeping in the | |||
63 | memory allocator would be a bad thing. That can be avoided by using | 66 | memory allocator would be a bad thing. That can be avoided by using |
64 | GFP_ATOMIC for the flags value, but, often, there is a better way. The | 67 | GFP_ATOMIC for the flags value, but, often, there is a better way. The |
65 | trick is to ensure that any needed memory allocations are done before | 68 | trick is to ensure that any needed memory allocations are done before |
66 | entering atomic context, using: | 69 | entering atomic context, using:: |
67 | 70 | ||
68 | int flex_array_prealloc(struct flex_array *array, unsigned int start, | 71 | int flex_array_prealloc(struct flex_array *array, unsigned int start, |
69 | unsigned int nr_elements, gfp_t flags); | 72 | unsigned int nr_elements, gfp_t flags); |
@@ -73,7 +76,7 @@ defined by start and nr_elements has been allocated. Thereafter, a | |||
73 | flex_array_put() call on an element in that range is guaranteed not to | 76 | flex_array_put() call on an element in that range is guaranteed not to |
74 | block. | 77 | block. |
75 | 78 | ||
76 | Getting data back out of the array is done with: | 79 | Getting data back out of the array is done with:: |
77 | 80 | ||
78 | void *flex_array_get(struct flex_array *fa, unsigned int element_nr); | 81 | void *flex_array_get(struct flex_array *fa, unsigned int element_nr); |
79 | 82 | ||
@@ -89,7 +92,7 @@ involving that number probably result from use of unstored array entries. | |||
89 | Note that, if array elements are allocated with __GFP_ZERO, they will be | 92 | Note that, if array elements are allocated with __GFP_ZERO, they will be |
90 | initialized to zero and this poisoning will not happen. | 93 | initialized to zero and this poisoning will not happen. |
91 | 94 | ||
92 | Individual elements in the array can be cleared with: | 95 | Individual elements in the array can be cleared with:: |
93 | 96 | ||
94 | int flex_array_clear(struct flex_array *array, unsigned int element_nr); | 97 | int flex_array_clear(struct flex_array *array, unsigned int element_nr); |
95 | 98 | ||
@@ -97,7 +100,7 @@ This function will set the given element to FLEX_ARRAY_FREE and return | |||
97 | zero. If storage for the indicated element is not allocated for the array, | 100 | zero. If storage for the indicated element is not allocated for the array, |
98 | flex_array_clear() will return -EINVAL instead. Note that clearing an | 101 | flex_array_clear() will return -EINVAL instead. Note that clearing an |
99 | element does not release the storage associated with it; to reduce the | 102 | element does not release the storage associated with it; to reduce the |
100 | allocated size of an array, call: | 103 | allocated size of an array, call:: |
101 | 104 | ||
102 | int flex_array_shrink(struct flex_array *array); | 105 | int flex_array_shrink(struct flex_array *array); |
103 | 106 | ||
@@ -106,12 +109,12 @@ This function works by scanning the array for pages containing nothing but | |||
106 | FLEX_ARRAY_FREE bytes, so (1) it can be expensive, and (2) it will not work | 109 | FLEX_ARRAY_FREE bytes, so (1) it can be expensive, and (2) it will not work |
107 | if the array's pages are allocated with __GFP_ZERO. | 110 | if the array's pages are allocated with __GFP_ZERO. |
108 | 111 | ||
109 | It is possible to remove all elements of an array with a call to: | 112 | It is possible to remove all elements of an array with a call to:: |
110 | 113 | ||
111 | void flex_array_free_parts(struct flex_array *array); | 114 | void flex_array_free_parts(struct flex_array *array); |
112 | 115 | ||
113 | This call frees all elements, but leaves the array itself in place. | 116 | This call frees all elements, but leaves the array itself in place. |
114 | Freeing the entire array is done with: | 117 | Freeing the entire array is done with:: |
115 | 118 | ||
116 | void flex_array_free(struct flex_array *array); | 119 | void flex_array_free(struct flex_array *array); |
117 | 120 | ||
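The core idea above (individually allocated fixed-size "parts" instead of one large contiguous buffer) can be illustrated with a small userspace sketch. This is NOT the kernel `flex_array` API; the names and the 4096-byte part size are assumptions for illustration only, and preallocation, poisoning, and shrinking are omitted.

```c
#include <stdlib.h>
#include <string.h>

#define PART_SIZE 4096  /* assumed page-sized part, as the kernel uses */

struct flex_sketch {
	size_t element_size;
	size_t total;      /* maximum number of elements */
	size_t per_part;   /* elements that fit in one part */
	size_t nr_parts;
	void **parts;      /* part pointers, allocated lazily on put */
};

static struct flex_sketch *flex_sketch_alloc(size_t element_size, size_t total)
{
	struct flex_sketch *fa = calloc(1, sizeof(*fa));
	if (!fa)
		return NULL;
	fa->element_size = element_size;
	fa->total = total;
	fa->per_part = PART_SIZE / element_size;
	fa->nr_parts = (total + fa->per_part - 1) / fa->per_part;
	fa->parts = calloc(fa->nr_parts, sizeof(void *));
	if (!fa->parts) {
		free(fa);
		return NULL;
	}
	return fa;
}

static int flex_sketch_put(struct flex_sketch *fa, size_t nr, const void *src)
{
	if (nr >= fa->total)
		return -1;
	size_t p = nr / fa->per_part;
	if (!fa->parts[p]) {       /* first store into this part allocates it */
		fa->parts[p] = calloc(1, PART_SIZE);
		if (!fa->parts[p])
			return -1;
	}
	memcpy((char *)fa->parts[p] + (nr % fa->per_part) * fa->element_size,
	       src, fa->element_size);
	return 0;
}

static void *flex_sketch_get(struct flex_sketch *fa, size_t nr)
{
	if (nr >= fa->total)
		return NULL;
	size_t p = nr / fa->per_part;
	if (!fa->parts[p])         /* nothing was ever stored here */
		return NULL;
	return (char *)fa->parts[p] +
	       (nr % fa->per_part) * fa->element_size;
}
```

Because only the small part-pointer table and the touched parts are ever allocated, a sparse 10000-element array costs a few pages rather than one large contiguous allocation.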
diff --git a/Documentation/futex-requeue-pi.txt b/Documentation/futex-requeue-pi.txt index 77b36f59d16b..14ab5787b9a7 100644 --- a/Documentation/futex-requeue-pi.txt +++ b/Documentation/futex-requeue-pi.txt | |||
@@ -1,5 +1,6 @@ | |||
1 | ================ | ||
1 | Futex Requeue PI | 2 | Futex Requeue PI |
2 | ---------------- | 3 | ================ |
3 | 4 | ||
4 | Requeueing of tasks from a non-PI futex to a PI futex requires | 5 | Requeueing of tasks from a non-PI futex to a PI futex requires |
5 | special handling in order to ensure the underlying rt_mutex is never | 6 | special handling in order to ensure the underlying rt_mutex is never |
@@ -20,28 +21,28 @@ implementation would wake the highest-priority waiter, and leave the | |||
20 | rest to the natural wakeup inherent in unlocking the mutex | 21 | rest to the natural wakeup inherent in unlocking the mutex |
21 | associated with the condvar. | 22 | associated with the condvar. |
22 | 23 | ||
23 | Consider the simplified glibc calls: | 24 | Consider the simplified glibc calls:: |
24 | 25 | ||
25 | /* caller must lock mutex */ | 26 | /* caller must lock mutex */ |
26 | pthread_cond_wait(cond, mutex) | 27 | pthread_cond_wait(cond, mutex) |
27 | { | 28 | { |
28 | lock(cond->__data.__lock); | 29 | lock(cond->__data.__lock); |
29 | unlock(mutex); | 30 | unlock(mutex); |
30 | do { | 31 | do { |
31 | unlock(cond->__data.__lock); | 32 | unlock(cond->__data.__lock); |
32 | futex_wait(cond->__data.__futex); | 33 | futex_wait(cond->__data.__futex); |
33 | lock(cond->__data.__lock); | 34 | lock(cond->__data.__lock); |
34 | } while(...) | 35 | } while(...) |
35 | unlock(cond->__data.__lock); | 36 | unlock(cond->__data.__lock); |
36 | lock(mutex); | 37 | lock(mutex); |
37 | } | 38 | } |
38 | 39 | ||
39 | pthread_cond_broadcast(cond) | 40 | pthread_cond_broadcast(cond) |
40 | { | 41 | { |
41 | lock(cond->__data.__lock); | 42 | lock(cond->__data.__lock); |
42 | unlock(cond->__data.__lock); | 43 | unlock(cond->__data.__lock); |
43 | futex_requeue(cond->data.__futex, cond->mutex); | 44 | futex_requeue(cond->data.__futex, cond->mutex); |
44 | } | 45 | } |
45 | 46 | ||
46 | Once pthread_cond_broadcast() requeues the tasks, the cond->mutex | 47 | Once pthread_cond_broadcast() requeues the tasks, the cond->mutex |
47 | has waiters. Note that pthread_cond_wait() attempts to lock the | 48 | has waiters. Note that pthread_cond_wait() attempts to lock the |
@@ -53,29 +54,29 @@ In order to support PI-aware pthread_condvar's, the kernel needs to | |||
53 | be able to requeue tasks to PI futexes. This support implies that | 54 | be able to requeue tasks to PI futexes. This support implies that |
54 | upon a successful futex_wait system call, the caller would return to | 55 | upon a successful futex_wait system call, the caller would return to |
55 | user space already holding the PI futex. The glibc implementation | 56 | user space already holding the PI futex. The glibc implementation |
56 | would be modified as follows: | 57 | would be modified as follows:: |
57 | 58 | ||
58 | 59 | ||
59 | /* caller must lock mutex */ | 60 | /* caller must lock mutex */ |
60 | pthread_cond_wait_pi(cond, mutex) | 61 | pthread_cond_wait_pi(cond, mutex) |
61 | { | 62 | { |
62 | lock(cond->__data.__lock); | 63 | lock(cond->__data.__lock); |
63 | unlock(mutex); | 64 | unlock(mutex); |
64 | do { | 65 | do { |
65 | unlock(cond->__data.__lock); | 66 | unlock(cond->__data.__lock); |
66 | futex_wait_requeue_pi(cond->__data.__futex); | 67 | futex_wait_requeue_pi(cond->__data.__futex); |
67 | lock(cond->__data.__lock); | 68 | lock(cond->__data.__lock); |
68 | } while(...) | 69 | } while(...) |
69 | unlock(cond->__data.__lock); | 70 | unlock(cond->__data.__lock); |
70 | /* the kernel acquired the mutex for us */ | 71 | /* the kernel acquired the mutex for us */ |
71 | } | 72 | } |
72 | 73 | ||
73 | pthread_cond_broadcast_pi(cond) | 74 | pthread_cond_broadcast_pi(cond) |
74 | { | 75 | { |
75 | lock(cond->__data.__lock); | 76 | lock(cond->__data.__lock); |
76 | unlock(cond->__data.__lock); | 77 | unlock(cond->__data.__lock); |
77 | futex_requeue_pi(cond->data.__futex, cond->mutex); | 78 | futex_requeue_pi(cond->data.__futex, cond->mutex); |
78 | } | 79 | } |
79 | 80 | ||
80 | The actual glibc implementation will likely test for PI and make the | 81 | The actual glibc implementation will likely test for PI and make the |
81 | necessary changes inside the existing calls rather than creating new | 82 | necessary changes inside the existing calls rather than creating new |
diff --git a/Documentation/gcc-plugins.txt b/Documentation/gcc-plugins.txt index 433eaefb4aa1..8502f24396fb 100644 --- a/Documentation/gcc-plugins.txt +++ b/Documentation/gcc-plugins.txt | |||
@@ -1,14 +1,15 @@ | |||
1 | ========================= | ||
1 | GCC plugin infrastructure | 2 | GCC plugin infrastructure |
2 | ========================= | 3 | ========================= |
3 | 4 | ||
4 | 5 | ||
5 | 1. Introduction | 6 | Introduction |
6 | =============== | 7 | ============ |
7 | 8 | ||
8 | GCC plugins are loadable modules that provide extra features to the | 9 | GCC plugins are loadable modules that provide extra features to the |
9 | compiler [1]. They are useful for runtime instrumentation and static analysis. | 10 | compiler [1]_. They are useful for runtime instrumentation and static analysis. |
10 | We can analyse, change and add further code during compilation via | 11 | We can analyse, change and add further code during compilation via |
11 | callbacks [2], GIMPLE [3], IPA [4] and RTL passes [5]. | 12 | callbacks [2]_, GIMPLE [3]_, IPA [4]_ and RTL passes [5]_. |
12 | 13 | ||
13 | The GCC plugin infrastructure of the kernel supports all gcc versions from | 14 | The GCC plugin infrastructure of the kernel supports all gcc versions from |
14 | 4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a | 15 | 4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a |
@@ -21,56 +22,61 @@ and versions 4.8+ can only be compiled by a C++ compiler. | |||
21 | Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and | 22 | Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and |
22 | powerpc architectures. | 23 | powerpc architectures. |
23 | 24 | ||
24 | This infrastructure was ported from grsecurity [6] and PaX [7]. | 25 | This infrastructure was ported from grsecurity [6]_ and PaX [7]_. |
25 | 26 | ||
26 | -- | 27 | -- |
27 | [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html | ||
28 | [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API | ||
29 | [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html | ||
30 | [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html | ||
31 | [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html | ||
32 | [6] https://grsecurity.net/ | ||
33 | [7] https://pax.grsecurity.net/ | ||
34 | 28 | ||
29 | .. [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html | ||
30 | .. [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API | ||
31 | .. [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html | ||
32 | .. [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html | ||
33 | .. [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html | ||
34 | .. [6] https://grsecurity.net/ | ||
35 | .. [7] https://pax.grsecurity.net/ | ||
36 | |||
37 | |||
38 | Files | ||
39 | ===== | ||
35 | 40 | ||
36 | 2. Files | 41 | **$(src)/scripts/gcc-plugins** |
37 | ======== | ||
38 | 42 | ||
39 | $(src)/scripts/gcc-plugins | ||
40 | This is the directory of the GCC plugins. | 43 | This is the directory of the GCC plugins. |
41 | 44 | ||
42 | $(src)/scripts/gcc-plugins/gcc-common.h | 45 | **$(src)/scripts/gcc-plugins/gcc-common.h** |
46 | |||
43 | This is a compatibility header for GCC plugins. | 47 | This is a compatibility header for GCC plugins. |
44 | It should always be included instead of individual gcc headers. | 48 | It should always be included instead of individual gcc headers. |
45 | 49 | ||
46 | $(src)/scripts/gcc-plugin.sh | 50 | **$(src)/scripts/gcc-plugin.sh** |
51 | |||
47 | This script checks the availability of the included headers in | 52 | This script checks the availability of the included headers in |
48 | gcc-common.h and chooses the proper host compiler to build the plugins | 53 | gcc-common.h and chooses the proper host compiler to build the plugins |
49 | (gcc-4.7 can be built by either gcc or g++). | 54 | (gcc-4.7 can be built by either gcc or g++). |
50 | 55 | ||
51 | $(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h | 56 | **$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h, |
52 | $(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h | 57 | $(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h, |
53 | $(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h | 58 | $(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h, |
54 | $(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h | 59 | $(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h** |
60 | |||
55 | These headers automatically generate the registration structures for | 61 | These headers automatically generate the registration structures for |
56 | GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions | 62 | GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions |
57 | from 4.5 to 6.0. | 63 | from 4.5 to 6.0. |
58 | They should be preferred to creating the structures by hand. | 64 | They should be preferred to creating the structures by hand. |
59 | 65 | ||
60 | 66 | ||
61 | 3. Usage | 67 | Usage |
62 | ======== | 68 | ===== |
63 | 69 | ||
64 | You must install the gcc plugin headers for your gcc version, | 70 | You must install the gcc plugin headers for your gcc version, |
65 | e.g., on Ubuntu for gcc-4.9: | 71 | e.g., on Ubuntu for gcc-4.9:: |
66 | 72 | ||
67 | apt-get install gcc-4.9-plugin-dev | 73 | apt-get install gcc-4.9-plugin-dev |
68 | 74 | ||
69 | Enable a GCC plugin based feature in the kernel config: | 75 | Enable a GCC plugin based feature in the kernel config:: |
70 | 76 | ||
71 | CONFIG_GCC_PLUGIN_CYC_COMPLEXITY=y | 77 | CONFIG_GCC_PLUGIN_CYC_COMPLEXITY=y |
72 | 78 | ||
73 | To compile only the plugin(s): | 79 | To compile only the plugin(s):: |
74 | 80 | ||
75 | make gcc-plugins | 81 | make gcc-plugins |
76 | 82 | ||
diff --git a/Documentation/highuid.txt b/Documentation/highuid.txt index 6bad6f1d1cac..6ee70465c0ea 100644 --- a/Documentation/highuid.txt +++ b/Documentation/highuid.txt | |||
@@ -1,4 +1,9 @@ | |||
1 | Notes on the change from 16-bit UIDs to 32-bit UIDs: | 1 | =================================================== |
2 | Notes on the change from 16-bit UIDs to 32-bit UIDs | ||
3 | =================================================== | ||
4 | |||
5 | :Author: Chris Wing <wingc@umich.edu> | ||
6 | :Last updated: January 11, 2000 | ||
2 | 7 | ||
3 | - kernel code MUST take into account __kernel_uid_t and __kernel_uid32_t | 8 | - kernel code MUST take into account __kernel_uid_t and __kernel_uid32_t |
4 | when communicating between user and kernel space in an ioctl or data | 9 | when communicating between user and kernel space in an ioctl or data |
@@ -28,30 +33,34 @@ What's left to be done for 32-bit UIDs on all Linux architectures: | |||
28 | uses the 32-bit UID system calls properly otherwise. | 33 | uses the 32-bit UID system calls properly otherwise. |
29 | 34 | ||
30 | This affects at least: | 35 | This affects at least: |
31 | iBCS on Intel | ||
32 | 36 | ||
33 | sparc32 emulation on sparc64 | 37 | - iBCS on Intel |
34 | (need to support whatever new 32-bit UID system calls are added to | 38 | |
35 | sparc32) | 39 | - sparc32 emulation on sparc64 |
40 | (need to support whatever new 32-bit UID system calls are added to | ||
41 | sparc32) | ||
36 | 42 | ||
37 | - Validate that all filesystems behave properly. | 43 | - Validate that all filesystems behave properly. |
38 | 44 | ||
39 | At present, 32-bit UIDs _should_ work for: | 45 | At present, 32-bit UIDs _should_ work for: |
40 | ext2 | 46 | |
41 | ufs | 47 | - ext2 |
42 | isofs | 48 | - ufs |
43 | nfs | 49 | - isofs |
44 | coda | 50 | - nfs |
45 | udf | 51 | - coda |
52 | - udf | ||
46 | 53 | ||
47 | Ioctl() fixups have been made for: | 54 | Ioctl() fixups have been made for: |
48 | ncpfs | 55 | |
49 | smbfs | 56 | - ncpfs |
57 | - smbfs | ||
50 | 58 | ||
51 | Filesystems with simple fixups to prevent 16-bit UID wraparound: | 59 | Filesystems with simple fixups to prevent 16-bit UID wraparound: |
52 | minix | 60 | |
53 | sysv | 61 | - minix |
54 | qnx4 | 62 | - sysv |
63 | - qnx4 | ||
55 | 64 | ||
56 | Other filesystems have not been checked yet. | 65 | Other filesystems have not been checked yet. |
57 | 66 | ||
@@ -69,9 +78,3 @@ What's left to be done for 32-bit UIDs on all Linux architectures: | |||
69 | - make sure that the UID mapping feature of AX25 networking works properly | 78 | - make sure that the UID mapping feature of AX25 networking works properly |
70 | (it should be safe because it has always used a 32-bit integer to | 79 | (it should be safe because it has always used a 32-bit integer to |
71 | communicate between user and kernel) | 80 | communicate between user and kernel) |
72 | |||
73 | |||
74 | Chris Wing | ||
75 | wingc@umich.edu | ||
76 | |||
77 | last updated: January 11, 2000 | ||
diff --git a/Documentation/hw_random.txt b/Documentation/hw_random.txt index fce1634907d0..121de96e395e 100644 --- a/Documentation/hw_random.txt +++ b/Documentation/hw_random.txt | |||
@@ -1,90 +1,105 @@ | |||
1 | Introduction: | 1 | ========================================================== |
2 | 2 | Linux support for random number generator in i8xx chipsets | |
3 | The hw_random framework is software that makes use of a | 3 | ========================================================== |
4 | special hardware feature on your CPU or motherboard, | 4 | |
5 | a Random Number Generator (RNG). The software has two parts: | 5 | Introduction |
6 | a core providing the /dev/hwrng character device and its | 6 | ============ |
7 | sysfs support, plus a hardware-specific driver that plugs | 7 | |
8 | into that core. | 8 | The hw_random framework is software that makes use of a |
9 | 9 | special hardware feature on your CPU or motherboard, | |
10 | To make the most effective use of these mechanisms, you | 10 | a Random Number Generator (RNG). The software has two parts: |
11 | should download the support software as well. Download the | 11 | a core providing the /dev/hwrng character device and its |
12 | latest version of the "rng-tools" package from the | 12 | sysfs support, plus a hardware-specific driver that plugs |
13 | hw_random driver's official Web site: | 13 | into that core. |
14 | 14 | ||
15 | http://sourceforge.net/projects/gkernel/ | 15 | To make the most effective use of these mechanisms, you |
16 | 16 | should download the support software as well. Download the | |
17 | Those tools use /dev/hwrng to fill the kernel entropy pool, | 17 | latest version of the "rng-tools" package from the |
18 | which is used internally and exported by the /dev/urandom and | 18 | hw_random driver's official Web site: |
19 | /dev/random special files. | 19 | |
20 | 20 | http://sourceforge.net/projects/gkernel/ | |
21 | Theory of operation: | 21 | |
22 | 22 | Those tools use /dev/hwrng to fill the kernel entropy pool, | |
23 | CHARACTER DEVICE. Using the standard open() | 23 | which is used internally and exported by the /dev/urandom and |
24 | and read() system calls, you can read random data from | 24 | /dev/random special files. |
25 | the hardware RNG device. This data is NOT CHECKED by any | 25 | |
26 | fitness tests, and could potentially be bogus (if the | 26 | Theory of operation |
27 | hardware is faulty or has been tampered with). Data is only | 27 | =================== |
28 | output if the hardware "has-data" flag is set, but nevertheless | 28 | |
29 | a security-conscious person would run fitness tests on the | 29 | CHARACTER DEVICE. Using the standard open() |
30 | data before assuming it is truly random. | 30 | and read() system calls, you can read random data from |
31 | 31 | the hardware RNG device. This data is NOT CHECKED by any | |
32 | The rng-tools package uses such tests in "rngd", and lets you | 32 | fitness tests, and could potentially be bogus (if the |
33 | run them by hand with a "rngtest" utility. | 33 | hardware is faulty or has been tampered with). Data is only |
34 | 34 | output if the hardware "has-data" flag is set, but nevertheless | |
35 | /dev/hwrng is char device major 10, minor 183. | 35 | a security-conscious person would run fitness tests on the |
36 | 36 | data before assuming it is truly random. | |
37 | CLASS DEVICE. There is a /sys/class/misc/hw_random node with | 37 | |
38 | two unique attributes, "rng_available" and "rng_current". The | 38 | The rng-tools package uses such tests in "rngd", and lets you |
39 | "rng_available" attribute lists the hardware-specific drivers | 39 | run them by hand with a "rngtest" utility. |
40 | available, while "rng_current" lists the one which is currently | 40 | |
41 | connected to /dev/hwrng. If your system has more than one | 41 | /dev/hwrng is char device major 10, minor 183. |
42 | RNG available, you may change the one used by writing a name from | 42 | |
43 | the list in "rng_available" into "rng_current". | 43 | CLASS DEVICE. There is a /sys/class/misc/hw_random node with |
44 | two unique attributes, "rng_available" and "rng_current". The | ||
45 | "rng_available" attribute lists the hardware-specific drivers | ||
46 | available, while "rng_current" lists the one which is currently | ||
47 | connected to /dev/hwrng. If your system has more than one | ||
48 | RNG available, you may change the one used by writing a name from | ||
49 | the list in "rng_available" into "rng_current". | ||
44 | 50 | ||
45 | ========================================================================== | 51 | ========================================================================== |
46 | 52 | ||
47 | Hardware driver for Intel/AMD/VIA Random Number Generators (RNG) | ||
48 | Copyright 2000,2001 Jeff Garzik <jgarzik@pobox.com> | ||
49 | Copyright 2000,2001 Philipp Rumpf <prumpf@mandrakesoft.com> | ||
50 | 53 | ||
54 | Hardware driver for Intel/AMD/VIA Random Number Generators (RNG) | ||
55 | - Copyright 2000,2001 Jeff Garzik <jgarzik@pobox.com> | ||
56 | - Copyright 2000,2001 Philipp Rumpf <prumpf@mandrakesoft.com> | ||
51 | 57 | ||
52 | About the Intel RNG hardware, from the firmware hub datasheet: | ||
53 | 58 | ||
54 | The Firmware Hub integrates a Random Number Generator (RNG) | 59 | About the Intel RNG hardware, from the firmware hub datasheet |
55 | using thermal noise generated from inherently random quantum | 60 | ============================================================= |
56 | mechanical properties of silicon. When not generating new random | ||
57 | bits the RNG circuitry will enter a low power state. Intel will | ||
58 | provide a binary software driver to give third party software | ||
59 | access to our RNG for use as a security feature. At this time, | ||
60 | the RNG is only to be used with a system in an OS-present state. | ||
61 | 61 | ||
62 | Intel RNG Driver notes: | 62 | The Firmware Hub integrates a Random Number Generator (RNG) |
63 | using thermal noise generated from inherently random quantum | ||
64 | mechanical properties of silicon. When not generating new random | ||
65 | bits the RNG circuitry will enter a low power state. Intel will | ||
66 | provide a binary software driver to give third party software | ||
67 | access to our RNG for use as a security feature. At this time, | ||
68 | the RNG is only to be used with a system in an OS-present state. | ||
63 | 69 | ||
64 | * FIXME: support poll(2) | 70 | Intel RNG Driver notes |
71 | ====================== | ||
65 | 72 | ||
66 | NOTE: request_mem_region was removed, for three reasons: | 73 | FIXME: support poll(2) |
67 | 1) Only one RNG is supported by this driver, 2) The location | 74 | |
68 | used by the RNG is a fixed location in MMIO-addressable memory, | 75 | .. note:: |
76 | |||
77 | request_mem_region was removed, for three reasons: | ||
78 | |||
79 | 1) Only one RNG is supported by this driver; | ||
80 | 2) The location used by the RNG is a fixed location in | ||
81 | MMIO-addressable memory; | ||
69 | 3) users with properly working BIOS e820 handling will always | 82 | 3) users with properly working BIOS e820 handling will always |
70 | have the region in which the RNG is located reserved, so | 83 | have the region in which the RNG is located reserved, so |
71 | request_mem_region calls always fail for proper setups. | 84 | request_mem_region calls always fail for proper setups. |
72 | However, for people who use mem=XX, BIOS e820 information is | 85 | However, for people who use mem=XX, BIOS e820 information is |
73 | -not- in /proc/iomem, and request_mem_region(RNG_ADDR) can | 86 | **not** in /proc/iomem, and request_mem_region(RNG_ADDR) can |
74 | succeed. | 87 | succeed. |
75 | 88 | ||
76 | Driver details: | 89 | Driver details |
90 | ============== | ||
77 | 91 | ||
78 | Based on: | 92 | Based on: |
79 | Intel 82802AB/82802AC Firmware Hub (FWH) Datasheet | 93 | Intel 82802AB/82802AC Firmware Hub (FWH) Datasheet |
80 | May 1999 Order Number: 290658-002 R | 94 | May 1999 Order Number: 290658-002 R |
81 | 95 | ||
82 | Intel 82802 Firmware Hub: Random Number Generator | 96 | Intel 82802 Firmware Hub: |
97 | Random Number Generator | ||
83 | Programmer's Reference Manual | 98 | Programmer's Reference Manual |
84 | December 1999 Order Number: 298029-001 R | 99 | December 1999 Order Number: 298029-001 R |
85 | 100 | ||
86 | Intel 82802 Firmware HUB Random Number Generator Driver | 101 | Intel 82802 Firmware HUB Random Number Generator Driver |
87 | Copyright (c) 2000 Matt Sottek <msottek@quiknet.com> | 102 | Copyright (c) 2000 Matt Sottek <msottek@quiknet.com> |
88 | 103 | ||
89 | Special thanks to Matt Sottek. I did the "guts", he | 104 | Special thanks to Matt Sottek. I did the "guts", he |
90 | did the "brains" and all the testing. | 105 | did the "brains" and all the testing. |
diff --git a/Documentation/hwspinlock.txt b/Documentation/hwspinlock.txt index 61c1ee98e59f..ed640a278185 100644 --- a/Documentation/hwspinlock.txt +++ b/Documentation/hwspinlock.txt | |||
@@ -1,6 +1,9 @@ | |||
1 | =========================== | ||
1 | Hardware Spinlock Framework | 2 | Hardware Spinlock Framework |
3 | =========================== | ||
2 | 4 | ||
3 | 1. Introduction | 5 | Introduction |
6 | ============ | ||
4 | 7 | ||
5 | Hardware spinlock modules provide hardware assistance for synchronization | 8 | Hardware spinlock modules provide hardware assistance for synchronization |
6 | and mutual exclusion between heterogeneous processors and those not operating | 9 | and mutual exclusion between heterogeneous processors and those not operating |
@@ -32,286 +35,370 @@ structure). | |||
32 | A common hwspinlock interface makes it possible to have generic, platform- | 35 | A common hwspinlock interface makes it possible to have generic, platform- |
33 | independent, drivers. | 36 | independent, drivers. |
34 | 37 | ||
35 | 2. User API | 38 | User API |
39 | ======== | ||
40 | |||
41 | :: | ||
36 | 42 | ||
37 | struct hwspinlock *hwspin_lock_request(void); | 43 | struct hwspinlock *hwspin_lock_request(void); |
38 | - dynamically assign an hwspinlock and return its address, or NULL | 44 | |
39 | in case an unused hwspinlock isn't available. Users of this | 45 | Dynamically assign an hwspinlock and return its address, or NULL |
40 | API will usually want to communicate the lock's id to the remote core | 46 | in case an unused hwspinlock isn't available. Users of this |
41 | before it can be used to achieve synchronization. | 47 | API will usually want to communicate the lock's id to the remote core |
42 | Should be called from a process context (might sleep). | 48 | before it can be used to achieve synchronization. |
49 | |||
50 | Should be called from a process context (might sleep). | ||
51 | |||
52 | :: | ||
43 | 53 | ||
44 | struct hwspinlock *hwspin_lock_request_specific(unsigned int id); | 54 | struct hwspinlock *hwspin_lock_request_specific(unsigned int id); |
45 | - assign a specific hwspinlock id and return its address, or NULL | 55 | |
46 | if that hwspinlock is already in use. Usually board code will | 56 | Assign a specific hwspinlock id and return its address, or NULL |
47 | be calling this function in order to reserve specific hwspinlock | 57 | if that hwspinlock is already in use. Usually board code will |
48 | ids for predefined purposes. | 58 | be calling this function in order to reserve specific hwspinlock |
49 | Should be called from a process context (might sleep). | 59 | ids for predefined purposes. |
60 | |||
61 | Should be called from a process context (might sleep). | ||
62 | |||
63 | :: | ||
50 | 64 | ||
51 | int of_hwspin_lock_get_id(struct device_node *np, int index); | 65 | int of_hwspin_lock_get_id(struct device_node *np, int index); |
52 | - retrieve the global lock id for an OF phandle-based specific lock. | 66 | |
53 | This function provides a means for DT users of a hwspinlock module | 67 | Retrieve the global lock id for an OF phandle-based specific lock. |
54 | to get the global lock id of a specific hwspinlock, so that it can | 68 | This function provides a means for DT users of a hwspinlock module |
55 | be requested using the normal hwspin_lock_request_specific() API. | 69 | to get the global lock id of a specific hwspinlock, so that it can |
56 | The function returns a lock id number on success, -EPROBE_DEFER if | 70 | be requested using the normal hwspin_lock_request_specific() API. |
57 | the hwspinlock device is not yet registered with the core, or other | 71 | |
58 | error values. | 72 | The function returns a lock id number on success, -EPROBE_DEFER if |
59 | Should be called from a process context (might sleep). | 73 | the hwspinlock device is not yet registered with the core, or other |
74 | error values. | ||
75 | |||
76 | Should be called from a process context (might sleep). | ||
77 | |||
78 | :: | ||
60 | 79 | ||
61 | int hwspin_lock_free(struct hwspinlock *hwlock); | 80 | int hwspin_lock_free(struct hwspinlock *hwlock); |
62 | - free a previously-assigned hwspinlock; returns 0 on success, or an | 81 | |
63 | appropriate error code on failure (e.g. -EINVAL if the hwspinlock | 82 | Free a previously-assigned hwspinlock; returns 0 on success, or an |
64 | is already free). | 83 | appropriate error code on failure (e.g. -EINVAL if the hwspinlock |
65 | Should be called from a process context (might sleep). | 84 | is already free). |
85 | |||
86 | Should be called from a process context (might sleep). | ||
87 | |||
88 | :: | ||
66 | 89 | ||
67 | int hwspin_lock_timeout(struct hwspinlock *hwlock, unsigned int timeout); | 90 | int hwspin_lock_timeout(struct hwspinlock *hwlock, unsigned int timeout); |
68 | - lock a previously-assigned hwspinlock with a timeout limit (specified in | 91 | |
69 | msecs). If the hwspinlock is already taken, the function will busy loop | 92 | Lock a previously-assigned hwspinlock with a timeout limit (specified in |
70 | waiting for it to be released, but give up when the timeout elapses. | 93 | msecs). If the hwspinlock is already taken, the function will busy loop |
71 | Upon a successful return from this function, preemption is disabled so | 94 | waiting for it to be released, but give up when the timeout elapses. |
72 | the caller must not sleep, and is advised to release the hwspinlock as | 95 | Upon a successful return from this function, preemption is disabled so |
73 | soon as possible, in order to minimize remote cores polling on the | 96 | the caller must not sleep, and is advised to release the hwspinlock as |
74 | hardware interconnect. | 97 | soon as possible, in order to minimize remote cores polling on the |
75 | Returns 0 when successful and an appropriate error code otherwise (most | 98 | hardware interconnect. |
76 | notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs). | 99 | |
77 | The function will never sleep. | 100 | Returns 0 when successful and an appropriate error code otherwise (most |
101 | notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs). | ||
102 | The function will never sleep. | ||
103 | |||
104 | :: | ||
78 | 105 | ||
79 | int hwspin_lock_timeout_irq(struct hwspinlock *hwlock, unsigned int timeout); | 106 | int hwspin_lock_timeout_irq(struct hwspinlock *hwlock, unsigned int timeout); |
80 | - lock a previously-assigned hwspinlock with a timeout limit (specified in | 107 | |
81 | msecs). If the hwspinlock is already taken, the function will busy loop | 108 | Lock a previously-assigned hwspinlock with a timeout limit (specified in |
82 | waiting for it to be released, but give up when the timeout elapses. | 109 | msecs). If the hwspinlock is already taken, the function will busy loop |
83 | Upon a successful return from this function, preemption and the local | 110 | waiting for it to be released, but give up when the timeout elapses. |
84 | interrupts are disabled, so the caller must not sleep, and is advised to | 111 | Upon a successful return from this function, preemption and the local |
85 | release the hwspinlock as soon as possible. | 112 | interrupts are disabled, so the caller must not sleep, and is advised to |
86 | Returns 0 when successful and an appropriate error code otherwise (most | 113 | release the hwspinlock as soon as possible. |
87 | notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs). | 114 | |
88 | The function will never sleep. | 115 | Returns 0 when successful and an appropriate error code otherwise (most |
116 | notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs). | ||
117 | The function will never sleep. | ||
118 | |||
119 | :: | ||
89 | 120 | ||
90 | int hwspin_lock_timeout_irqsave(struct hwspinlock *hwlock, unsigned int to, | 121 | int hwspin_lock_timeout_irqsave(struct hwspinlock *hwlock, unsigned int to, |
91 | unsigned long *flags); | 122 | unsigned long *flags); |
92 | - lock a previously-assigned hwspinlock with a timeout limit (specified in | 123 | |
93 | msecs). If the hwspinlock is already taken, the function will busy loop | 124 | Lock a previously-assigned hwspinlock with a timeout limit (specified in |
94 | waiting for it to be released, but give up when the timeout elapses. | 125 | msecs). If the hwspinlock is already taken, the function will busy loop |
95 | Upon a successful return from this function, preemption is disabled, | 126 | waiting for it to be released, but give up when the timeout elapses. |
96 | local interrupts are disabled and their previous state is saved at the | 127 | Upon a successful return from this function, preemption is disabled, |
97 | given flags placeholder. The caller must not sleep, and is advised to | 128 | local interrupts are disabled and their previous state is saved at the |
98 | release the hwspinlock as soon as possible. | 129 | given flags placeholder. The caller must not sleep, and is advised to |
99 | Returns 0 when successful and an appropriate error code otherwise (most | 130 | release the hwspinlock as soon as possible. |
100 | notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs). | 131 | |
101 | The function will never sleep. | 132 | Returns 0 when successful and an appropriate error code otherwise (most |
133 | notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs). | ||
134 | |||
135 | The function will never sleep. | ||
136 | |||
137 | :: | ||
102 | 138 | ||
103 | int hwspin_trylock(struct hwspinlock *hwlock); | 139 | int hwspin_trylock(struct hwspinlock *hwlock); |
104 | - attempt to lock a previously-assigned hwspinlock, but immediately fail if | 140 | |
105 | it is already taken. | 141 | |
106 | Upon a successful return from this function, preemption is disabled so | 142 | Attempt to lock a previously-assigned hwspinlock, but immediately fail if |
107 | caller must not sleep, and is advised to release the hwspinlock as soon as | 143 | it is already taken. |
108 | possible, in order to minimize remote cores polling on the hardware | 144 | |
109 | interconnect. | 145 | Upon a successful return from this function, preemption is disabled so |
110 | Returns 0 on success and an appropriate error code otherwise (most | 146 | caller must not sleep, and is advised to release the hwspinlock as soon as |
111 | notably -EBUSY if the hwspinlock was already taken). | 147 | possible, in order to minimize remote cores polling on the hardware |
112 | The function will never sleep. | 148 | interconnect. |
149 | |||
150 | Returns 0 on success and an appropriate error code otherwise (most | ||
151 | notably -EBUSY if the hwspinlock was already taken). | ||
152 | The function will never sleep. | ||
153 | |||
154 | :: | ||
113 | 155 | ||
114 | int hwspin_trylock_irq(struct hwspinlock *hwlock); | 156 | int hwspin_trylock_irq(struct hwspinlock *hwlock); |
115 | - attempt to lock a previously-assigned hwspinlock, but immediately fail if | 157 | |
116 | it is already taken. | 158 | |
117 | Upon a successful return from this function, preemption and the local | 159 | Attempt to lock a previously-assigned hwspinlock, but immediately fail if |
118 | interrupts are disabled so caller must not sleep, and is advised to | 160 | it is already taken. |
119 | release the hwspinlock as soon as possible. | 161 | |
120 | Returns 0 on success and an appropriate error code otherwise (most | 162 | Upon a successful return from this function, preemption and the local |
121 | notably -EBUSY if the hwspinlock was already taken). | 163 | interrupts are disabled so caller must not sleep, and is advised to |
122 | The function will never sleep. | 164 | release the hwspinlock as soon as possible. |
165 | |||
166 | Returns 0 on success and an appropriate error code otherwise (most | ||
167 | notably -EBUSY if the hwspinlock was already taken). | ||
168 | |||
169 | The function will never sleep. | ||
170 | |||
171 | :: | ||
123 | 172 | ||
124 | int hwspin_trylock_irqsave(struct hwspinlock *hwlock, unsigned long *flags); | 173 | int hwspin_trylock_irqsave(struct hwspinlock *hwlock, unsigned long *flags); |
125 | - attempt to lock a previously-assigned hwspinlock, but immediately fail if | 174 | |
126 | it is already taken. | 175 | Attempt to lock a previously-assigned hwspinlock, but immediately fail if |
127 | Upon a successful return from this function, preemption is disabled, | 176 | it is already taken. |
128 | the local interrupts are disabled and their previous state is saved | 177 | |
129 | at the given flags placeholder. The caller must not sleep, and is advised | 178 | Upon a successful return from this function, preemption is disabled, |
130 | to release the hwspinlock as soon as possible. | 179 | the local interrupts are disabled and their previous state is saved |
131 | Returns 0 on success and an appropriate error code otherwise (most | 180 | at the given flags placeholder. The caller must not sleep, and is advised |
132 | notably -EBUSY if the hwspinlock was already taken). | 181 | to release the hwspinlock as soon as possible. |
133 | The function will never sleep. | 182 | |
183 | Returns 0 on success and an appropriate error code otherwise (most | ||
184 | notably -EBUSY if the hwspinlock was already taken). | ||
185 | The function will never sleep. | ||
186 | |||
187 | :: | ||
134 | 188 | ||
135 | void hwspin_unlock(struct hwspinlock *hwlock); | 189 | void hwspin_unlock(struct hwspinlock *hwlock); |
136 | - unlock a previously-locked hwspinlock. Always succeeds, and can be called | 190 | |
137 | from any context (the function never sleeps). Note: code should _never_ | 191 | Unlock a previously-locked hwspinlock. Always succeeds, and can be called |
138 | unlock an hwspinlock which is already unlocked (there is no protection | 192 | from any context (the function never sleeps). |
139 | against this). | 193 | |
194 | .. note:: | ||
195 | |||
196 | code should **never** unlock an hwspinlock which is already unlocked | ||
197 | (there is no protection against this). | ||
198 | |||
199 | :: | ||
140 | 200 | ||
141 | void hwspin_unlock_irq(struct hwspinlock *hwlock); | 201 | void hwspin_unlock_irq(struct hwspinlock *hwlock); |
142 | - unlock a previously-locked hwspinlock and enable local interrupts. | 202 | |
143 | The caller should _never_ unlock an hwspinlock which is already unlocked. | 203 | Unlock a previously-locked hwspinlock and enable local interrupts. |
144 | Doing so is considered a bug (there is no protection against this). | 204 | The caller should **never** unlock an hwspinlock which is already unlocked. |
145 | Upon a successful return from this function, preemption and local | 205 | |
146 | interrupts are enabled. This function will never sleep. | 206 | Doing so is considered a bug (there is no protection against this). |
207 | Upon a successful return from this function, preemption and local | ||
208 | interrupts are enabled. This function will never sleep. | ||
209 | |||
210 | :: | ||
147 | 211 | ||
148 | void | 212 | void |
149 | hwspin_unlock_irqrestore(struct hwspinlock *hwlock, unsigned long *flags); | 213 | hwspin_unlock_irqrestore(struct hwspinlock *hwlock, unsigned long *flags); |
150 | - unlock a previously-locked hwspinlock. | 214 | |
151 | The caller should _never_ unlock an hwspinlock which is already unlocked. | 215 | Unlock a previously-locked hwspinlock. |
152 | Doing so is considered a bug (there is no protection against this). | 216 | |
153 | Upon a successful return from this function, preemption is reenabled, | 217 | The caller should **never** unlock an hwspinlock which is already unlocked. |
154 | and the state of the local interrupts is restored to the state saved at | 218 | Doing so is considered a bug (there is no protection against this). |
155 | the given flags. This function will never sleep. | 219 | Upon a successful return from this function, preemption is reenabled, |
220 | and the state of the local interrupts is restored to the state saved at | ||
221 | the given flags. This function will never sleep. | ||
222 | |||
223 | :: | ||
156 | 224 | ||
157 | int hwspin_lock_get_id(struct hwspinlock *hwlock); | 225 | int hwspin_lock_get_id(struct hwspinlock *hwlock); |
158 | - retrieve id number of a given hwspinlock. This is needed when an | ||
159 | hwspinlock is dynamically assigned: before it can be used to achieve | ||
160 | mutual exclusion with a remote cpu, the id number should be communicated | ||
161 | to the remote task with which we want to synchronize. | ||
162 | Returns the hwspinlock id number, or -EINVAL if hwlock is null. | ||
163 | |||
164 | 3. Typical usage | ||
165 | |||
166 | #include <linux/hwspinlock.h> | ||
167 | #include <linux/err.h> | ||
168 | |||
169 | int hwspinlock_example1(void) | ||
170 | { | ||
171 | struct hwspinlock *hwlock; | ||
172 | int ret; | ||
173 | |||
174 | /* dynamically assign a hwspinlock */ | ||
175 | hwlock = hwspin_lock_request(); | ||
176 | if (!hwlock) | ||
177 | ... | ||
178 | |||
179 | id = hwspin_lock_get_id(hwlock); | ||
180 | /* probably need to communicate id to a remote processor now */ | ||
181 | |||
182 | /* take the lock, spin for 1 sec if it's already taken */ | ||
183 | ret = hwspin_lock_timeout(hwlock, 1000); | ||
184 | if (ret) | ||
185 | ... | ||
186 | |||
187 | /* | ||
188 | * we took the lock, do our thing now, but do NOT sleep | ||
189 | */ | ||
190 | |||
191 | /* release the lock */ | ||
192 | hwspin_unlock(hwlock); | ||
193 | |||
194 | /* free the lock */ | ||
195 | ret = hwspin_lock_free(hwlock); | ||
196 | if (ret) | ||
197 | ... | ||
198 | |||
199 | return ret; | ||
200 | } | ||
201 | |||
202 | int hwspinlock_example2(void) | ||
203 | { | ||
204 | struct hwspinlock *hwlock; | ||
205 | int ret; | ||
206 | |||
207 | /* | ||
208 | * assign a specific hwspinlock id - this should be called early | ||
209 | * by board init code. | ||
210 | */ | ||
211 | hwlock = hwspin_lock_request_specific(PREDEFINED_LOCK_ID); | ||
212 | if (!hwlock) | ||
213 | ... | ||
214 | |||
215 | /* try to take it, but don't spin on it */ | ||
216 | ret = hwspin_trylock(hwlock); | ||
217 | if (!ret) { | ||
218 | pr_info("lock is already taken\n"); | ||
219 | return -EBUSY; | ||
220 | } | ||
221 | 226 | ||
222 | /* | 227 | Retrieve id number of a given hwspinlock. This is needed when an |
223 | * we took the lock, do our thing now, but do NOT sleep | 228 | hwspinlock is dynamically assigned: before it can be used to achieve |
224 | */ | 229 | mutual exclusion with a remote cpu, the id number should be communicated |
230 | to the remote task with which we want to synchronize. | ||
231 | |||
232 | Returns the hwspinlock id number, or -EINVAL if hwlock is null. | ||
233 | |||
234 | Typical usage | ||
235 | ============= | ||
225 | 236 | ||
226 | /* release the lock */ | 237 | :: |
227 | hwspin_unlock(hwlock); | ||
228 | 238 | ||
229 | /* free the lock */ | 239 | #include <linux/hwspinlock.h> |
230 | ret = hwspin_lock_free(hwlock); | 240 | #include <linux/err.h> |
231 | if (ret) | ||
232 | ... | ||
233 | 241 | ||
234 | return ret; | 242 | int hwspinlock_example1(void) |
235 | } | 243 | { |
244 | struct hwspinlock *hwlock; | ||
245 | int ret; | ||
236 | 246 | ||
247 | /* dynamically assign a hwspinlock */ | ||
248 | hwlock = hwspin_lock_request(); | ||
249 | if (!hwlock) | ||
250 | ... | ||
237 | 251 | ||
238 | 4. API for implementors | 252 | id = hwspin_lock_get_id(hwlock); |
253 | /* probably need to communicate id to a remote processor now */ | ||
254 | |||
255 | /* take the lock, spin for 1 sec if it's already taken */ | ||
256 | ret = hwspin_lock_timeout(hwlock, 1000); | ||
257 | if (ret) | ||
258 | ... | ||
259 | |||
260 | /* | ||
261 | * we took the lock, do our thing now, but do NOT sleep | ||
262 | */ | ||
263 | |||
264 | /* release the lock */ | ||
265 | hwspin_unlock(hwlock); | ||
266 | |||
267 | /* free the lock */ | ||
268 | ret = hwspin_lock_free(hwlock); | ||
269 | if (ret) | ||
270 | ... | ||
271 | |||
272 | return ret; | ||
273 | } | ||
274 | |||
275 | int hwspinlock_example2(void) | ||
276 | { | ||
277 | struct hwspinlock *hwlock; | ||
278 | int ret; | ||
279 | |||
280 | /* | ||
281 | * assign a specific hwspinlock id - this should be called early | ||
282 | * by board init code. | ||
283 | */ | ||
284 | hwlock = hwspin_lock_request_specific(PREDEFINED_LOCK_ID); | ||
285 | if (!hwlock) | ||
286 | ... | ||
287 | |||
288 | /* try to take it, but don't spin on it */ | ||
289 | ret = hwspin_trylock(hwlock); | ||
290 | if (!ret) { | ||
291 | pr_info("lock is already taken\n"); | ||
292 | return -EBUSY; | ||
293 | } | ||
294 | |||
295 | /* | ||
296 | * we took the lock, do our thing now, but do NOT sleep | ||
297 | */ | ||
298 | |||
299 | /* release the lock */ | ||
300 | hwspin_unlock(hwlock); | ||
301 | |||
302 | /* free the lock */ | ||
303 | ret = hwspin_lock_free(hwlock); | ||
304 | if (ret) | ||
305 | ... | ||
306 | |||
307 | return ret; | ||
308 | } | ||
309 | |||
310 | |||
311 | API for implementors | ||
312 | ==================== | ||
313 | |||
314 | :: | ||
239 | 315 | ||
240 | int hwspin_lock_register(struct hwspinlock_device *bank, struct device *dev, | 316 | int hwspin_lock_register(struct hwspinlock_device *bank, struct device *dev, |
241 | const struct hwspinlock_ops *ops, int base_id, int num_locks); | 317 | const struct hwspinlock_ops *ops, int base_id, int num_locks); |
242 | - to be called from the underlying platform-specific implementation, in | 318 | |
243 | order to register a new hwspinlock device (which is usually a bank of | 319 | To be called from the underlying platform-specific implementation, in |
244 | numerous locks). Should be called from a process context (this function | 320 | order to register a new hwspinlock device (which is usually a bank of |
245 | might sleep). | 321 | numerous locks). Should be called from a process context (this function |
246 | Returns 0 on success, or appropriate error code on failure. | 322 | might sleep). |
323 | |||
324 | Returns 0 on success, or appropriate error code on failure. | ||
325 | |||
326 | :: | ||
247 | 327 | ||
248 | int hwspin_lock_unregister(struct hwspinlock_device *bank); | 328 | int hwspin_lock_unregister(struct hwspinlock_device *bank); |
249 | - to be called from the underlying vendor-specific implementation, in order | ||
250 | to unregister an hwspinlock device (which is usually a bank of numerous | ||
251 | locks). | ||
252 | Should be called from a process context (this function might sleep). | ||
253 | Returns the address of hwspinlock on success, or NULL on error (e.g. | ||
254 | if the hwspinlock is still in use). | ||
255 | 329 | ||
256 | 5. Important structs | 330 | To be called from the underlying vendor-specific implementation, in order |
331 | to unregister an hwspinlock device (which is usually a bank of numerous | ||
332 | locks). | ||
333 | |||
334 | Should be called from a process context (this function might sleep). | ||
335 | |||
336 | Returns the address of hwspinlock on success, or NULL on error (e.g. | ||
337 | if the hwspinlock is still in use). | ||
338 | |||
339 | Important structs | ||
340 | ================= | ||
257 | 341 | ||
258 | struct hwspinlock_device is a device which usually contains a bank | 342 | struct hwspinlock_device is a device which usually contains a bank |
259 | of hardware locks. It is registered by the underlying hwspinlock | 343 | of hardware locks. It is registered by the underlying hwspinlock |
260 | implementation using the hwspin_lock_register() API. | 344 | implementation using the hwspin_lock_register() API. |
261 | 345 | ||
262 | /** | 346 | :: |
263 | * struct hwspinlock_device - a device which usually spans numerous hwspinlocks | 347 | |
264 | * @dev: underlying device, will be used to invoke runtime PM api | 348 | /** |
265 | * @ops: platform-specific hwspinlock handlers | 349 | * struct hwspinlock_device - a device which usually spans numerous hwspinlocks |
266 | * @base_id: id index of the first lock in this device | 350 | * @dev: underlying device, will be used to invoke runtime PM api |
267 | * @num_locks: number of locks in this device | 351 | * @ops: platform-specific hwspinlock handlers |
268 | * @lock: dynamically allocated array of 'struct hwspinlock' | 352 | * @base_id: id index of the first lock in this device |
269 | */ | 353 | * @num_locks: number of locks in this device |
270 | struct hwspinlock_device { | 354 | * @lock: dynamically allocated array of 'struct hwspinlock' |
271 | struct device *dev; | 355 | */ |
272 | const struct hwspinlock_ops *ops; | 356 | struct hwspinlock_device { |
273 | int base_id; | 357 | struct device *dev; |
274 | int num_locks; | 358 | const struct hwspinlock_ops *ops; |
275 | struct hwspinlock lock[0]; | 359 | int base_id; |
276 | }; | 360 | int num_locks; |
361 | struct hwspinlock lock[0]; | ||
362 | }; | ||
277 | 363 | ||
278 | struct hwspinlock_device contains an array of hwspinlock structs, each | 364 | struct hwspinlock_device contains an array of hwspinlock structs, each |
279 | of which represents a single hardware lock: | 365 | of which represents a single hardware lock:: |
280 | 366 | ||
281 | /** | 367 | /** |
282 | * struct hwspinlock - this struct represents a single hwspinlock instance | 368 | * struct hwspinlock - this struct represents a single hwspinlock instance |
283 | * @bank: the hwspinlock_device structure which owns this lock | 369 | * @bank: the hwspinlock_device structure which owns this lock |
284 | * @lock: initialized and used by hwspinlock core | 370 | * @lock: initialized and used by hwspinlock core |
285 | * @priv: private data, owned by the underlying platform-specific hwspinlock drv | 371 | * @priv: private data, owned by the underlying platform-specific hwspinlock drv |
286 | */ | 372 | */ |
287 | struct hwspinlock { | 373 | struct hwspinlock { |
288 | struct hwspinlock_device *bank; | 374 | struct hwspinlock_device *bank; |
289 | spinlock_t lock; | 375 | spinlock_t lock; |
290 | void *priv; | 376 | void *priv; |
291 | }; | 377 | }; |
292 | 378 | ||
293 | When registering a bank of locks, the hwspinlock driver only needs to | 379 | When registering a bank of locks, the hwspinlock driver only needs to |
294 | set the priv members of the locks. The rest of the members are set and | 380 | set the priv members of the locks. The rest of the members are set and |
295 | initialized by the hwspinlock core itself. | 381 | initialized by the hwspinlock core itself. |
296 | 382 | ||
297 | 6. Implementation callbacks | 383 | Implementation callbacks |
384 | ======================== | ||
298 | 385 | ||
299 | There are three possible callbacks defined in 'struct hwspinlock_ops': | 386 | There are three possible callbacks defined in 'struct hwspinlock_ops':: |
300 | 387 | ||
301 | struct hwspinlock_ops { | 388 | struct hwspinlock_ops { |
302 | int (*trylock)(struct hwspinlock *lock); | 389 | int (*trylock)(struct hwspinlock *lock); |
303 | void (*unlock)(struct hwspinlock *lock); | 390 | void (*unlock)(struct hwspinlock *lock); |
304 | void (*relax)(struct hwspinlock *lock); | 391 | void (*relax)(struct hwspinlock *lock); |
305 | }; | 392 | }; |
306 | 393 | ||
307 | The first two callbacks are mandatory: | 394 | The first two callbacks are mandatory: |
308 | 395 | ||
309 | The ->trylock() callback should make a single attempt to take the lock, and | 396 | The ->trylock() callback should make a single attempt to take the lock, and |
310 | return 0 on failure and 1 on success. This callback may _not_ sleep. | 397 | return 0 on failure and 1 on success. This callback may **not** sleep. |
311 | 398 | ||
312 | The ->unlock() callback releases the lock. It always succeeds, and it, too, | 399 | The ->unlock() callback releases the lock. It always succeeds, and it, too, |
313 | may _not_ sleep. | 400 | may **not** sleep. |
314 | 401 | ||
315 | The ->relax() callback is optional. It is called by hwspinlock core while | 402 | The ->relax() callback is optional. It is called by hwspinlock core while |
316 | spinning on a lock, and can be used by the underlying implementation to force | 403 | spinning on a lock, and can be used by the underlying implementation to force |
317 | a delay between two successive invocations of ->trylock(). It may _not_ sleep. | 404 | a delay between two successive invocations of ->trylock(). It may **not** sleep. |
diff --git a/Documentation/intel_txt.txt b/Documentation/intel_txt.txt index 91d89c540709..d83c1a2122c9 100644 --- a/Documentation/intel_txt.txt +++ b/Documentation/intel_txt.txt | |||
@@ -1,4 +1,5 @@ | |||
1 | Intel(R) TXT Overview: | 1 | ===================== |
2 | Intel(R) TXT Overview | ||
2 | ===================== | 3 | ===================== |
3 | 4 | ||
4 | Intel's technology for safer computing, Intel(R) Trusted Execution | 5 | Intel's technology for safer computing, Intel(R) Trusted Execution |
@@ -8,9 +9,10 @@ provide the building blocks for creating trusted platforms. | |||
8 | Intel TXT was formerly known by the code name LaGrande Technology (LT). | 9 | Intel TXT was formerly known by the code name LaGrande Technology (LT). |
9 | 10 | ||
10 | Intel TXT in Brief: | 11 | Intel TXT in Brief: |
11 | o Provides dynamic root of trust for measurement (DRTM) | 12 | |
12 | o Data protection in case of improper shutdown | 13 | - Provides dynamic root of trust for measurement (DRTM) |
13 | o Measurement and verification of launched environment | 14 | - Data protection in case of improper shutdown |
15 | - Measurement and verification of launched environment | ||
14 | 16 | ||
15 | Intel TXT is part of the vPro(TM) brand and is also available on some | 17 | Intel TXT is part of the vPro(TM) brand and is also available on some |
16 | non-vPro systems. It is currently available on desktop systems | 18 | non-vPro systems. It is currently available on desktop systems |
@@ -24,16 +26,21 @@ which has been updated for the new released platforms. | |||
24 | 26 | ||
25 | Intel TXT has been presented at various events over the past few | 27 | Intel TXT has been presented at various events over the past few |
26 | years, some of which are: | 28 | years, some of which are: |
27 | LinuxTAG 2008: | 29 | |
30 | - LinuxTAG 2008: | ||
28 | http://www.linuxtag.org/2008/en/conf/events/vp-donnerstag.html | 31 | http://www.linuxtag.org/2008/en/conf/events/vp-donnerstag.html |
29 | TRUST2008: | 32 | |
33 | - TRUST2008: | ||
30 | http://www.trust-conference.eu/downloads/Keynote-Speakers/ | 34 | http://www.trust-conference.eu/downloads/Keynote-Speakers/ |
31 | 3_David-Grawrock_The-Front-Door-of-Trusted-Computing.pdf | 35 | 3_David-Grawrock_The-Front-Door-of-Trusted-Computing.pdf |
32 | IDF, Shanghai: | 36 | |
37 | - IDF, Shanghai: | ||
33 | http://www.prcidf.com.cn/index_en.html | 38 | http://www.prcidf.com.cn/index_en.html |
34 | IDFs 2006, 2007 (I'm not sure if/where they are online) | ||
35 | 39 | ||
36 | Trusted Boot Project Overview: | 40 | - IDFs 2006, 2007 |
41 | (I'm not sure if/where they are online) | ||
42 | |||
43 | Trusted Boot Project Overview | ||
37 | ============================= | 44 | ============================= |
38 | 45 | ||
39 | Trusted Boot (tboot) is an open source, pre-kernel/VMM module that | 46 | Trusted Boot (tboot) is an open source, pre-kernel/VMM module that |
@@ -87,11 +94,12 @@ Intel-provided firmware). | |||
87 | How Does it Work? | 94 | How Does it Work? |
88 | ================= | 95 | ================= |
89 | 96 | ||
90 | o Tboot is an executable that is launched by the bootloader as | 97 | - Tboot is an executable that is launched by the bootloader as |
91 | the "kernel" (the binary the bootloader executes). | 98 | the "kernel" (the binary the bootloader executes). |
92 | o It performs all of the work necessary to determine if the | 99 | - It performs all of the work necessary to determine if the |
93 | platform supports Intel TXT and, if so, executes the GETSEC[SENTER] | 100 | platform supports Intel TXT and, if so, executes the GETSEC[SENTER] |
94 | processor instruction that initiates the dynamic root of trust. | 101 | processor instruction that initiates the dynamic root of trust. |
102 | |||
95 | - If tboot determines that the system does not support Intel TXT | 103 | - If tboot determines that the system does not support Intel TXT |
96 | or is not configured correctly (e.g. the SINIT AC Module was | 104 | or is not configured correctly (e.g. the SINIT AC Module was |
97 | incorrect), it will directly launch the kernel with no changes | 105 | incorrect), it will directly launch the kernel with no changes |
@@ -99,12 +107,14 @@ o It performs all of the work necessary to determine if the | |||
99 | - Tboot will output various information about its progress to the | 107 | - Tboot will output various information about its progress to the |
100 | terminal, serial port, and/or an in-memory log; the output | 108 | terminal, serial port, and/or an in-memory log; the output |
101 | locations can be configured with a command line switch. | 109 | locations can be configured with a command line switch. |
102 | o The GETSEC[SENTER] instruction will return control to tboot and | 110 | |
111 | - The GETSEC[SENTER] instruction will return control to tboot and | ||
103 | tboot then verifies certain aspects of the environment (e.g. TPM NV | 112 | tboot then verifies certain aspects of the environment (e.g. TPM NV |
104 | lock, e820 table does not have invalid entries, etc.). | 113 | lock, e820 table does not have invalid entries, etc.). |
105 | o It will wake the APs from the special sleep state the GETSEC[SENTER] | 114 | - It will wake the APs from the special sleep state the GETSEC[SENTER] |
106 | instruction had put them in and place them into a wait-for-SIPI | 115 | instruction had put them in and place them into a wait-for-SIPI |
107 | state. | 116 | state. |
117 | |||
108 | - Because the processors will not respond to an INIT or SIPI when | 118 | - Because the processors will not respond to an INIT or SIPI when |
109 | in the TXT environment, it is necessary to create a small VT-x | 119 | in the TXT environment, it is necessary to create a small VT-x |
110 | guest for the APs. When they run in this guest, they will | 120 | guest for the APs. When they run in this guest, they will |
@@ -112,8 +122,10 @@ o It will wake the APs from the special sleep state the GETSEC[SENTER] | |||
112 | VMEXITs, and then disable VT and jump to the SIPI vector. This | 122 | VMEXITs, and then disable VT and jump to the SIPI vector. This |
113 | approach seemed like a better choice than having to insert | 123 | approach seemed like a better choice than having to insert |
114 | special code into the kernel's MP wakeup sequence. | 124 | special code into the kernel's MP wakeup sequence. |
115 | o Tboot then applies an (optional) user-defined launch policy to | 125 | |
126 | - Tboot then applies an (optional) user-defined launch policy to | ||
116 | verify the kernel and initrd. | 127 | verify the kernel and initrd. |
128 | |||
117 | - This policy is rooted in TPM NV and is described in the tboot | 129 | - This policy is rooted in TPM NV and is described in the tboot |
118 | project. The tboot project also contains code for tools to | 130 | project. The tboot project also contains code for tools to |
119 | create and provision the policy. | 131 | create and provision the policy. |
@@ -121,30 +133,34 @@ o Tboot then applies an (optional) user-defined launch policy to | |||
121 | then any kernel will be launched. | 133 | then any kernel will be launched. |
122 | - Policy action is flexible and can include halting on failures | 134 | - Policy action is flexible and can include halting on failures |
123 | or simply logging them and continuing. | 135 | or simply logging them and continuing. |
124 | o Tboot adjusts the e820 table provided by the bootloader to reserve | 136 | |
137 | - Tboot adjusts the e820 table provided by the bootloader to reserve | ||
125 | its own location in memory as well as to reserve certain other | 138 | its own location in memory as well as to reserve certain other |
126 | TXT-related regions. | 139 | TXT-related regions. |
127 | o As part of its launch, tboot DMA protects all of RAM (using the | 140 | - As part of its launch, tboot DMA protects all of RAM (using the |
128 | VT-d PMRs). Thus, the kernel must be booted with 'intel_iommu=on' | 141 | VT-d PMRs). Thus, the kernel must be booted with 'intel_iommu=on' |
129 | in order to remove this blanket protection and use VT-d's | 142 | in order to remove this blanket protection and use VT-d's |
130 | page-level protection. | 143 | page-level protection. |
131 | o Tboot will populate a shared page with some data about itself and | 144 | - Tboot will populate a shared page with some data about itself and |
132 | pass this to the Linux kernel as it transfers control. | 145 | pass this to the Linux kernel as it transfers control. |
146 | |||
133 | - The location of the shared page is passed via the boot_params | 147 | - The location of the shared page is passed via the boot_params |
134 | struct as a physical address. | 148 | struct as a physical address. |
135 | o The kernel will look for the tboot shared page address and, if it | 149 | |
150 | - The kernel will look for the tboot shared page address and, if it | ||
136 | exists, map it. | 151 | exists, map it. |
137 | o As one of the checks/protections provided by TXT, it makes a copy | 152 | - As one of the checks/protections provided by TXT, it makes a copy |
138 | of the VT-d DMARs in a DMA-protected region of memory and verifies | 153 | of the VT-d DMARs in a DMA-protected region of memory and verifies |
139 | them for correctness. The VT-d code will detect if the kernel was | 154 | them for correctness. The VT-d code will detect if the kernel was |
140 | launched with tboot and use this copy instead of the one in the | 155 | launched with tboot and use this copy instead of the one in the |
141 | ACPI table. | 156 | ACPI table. |
142 | o At this point, tboot and TXT are out of the picture until a | 157 | - At this point, tboot and TXT are out of the picture until a |
143 | shutdown (S<n>) | 158 | shutdown (S<n>) |
144 | o In order to put a system into any of the sleep states after a TXT | 159 | - In order to put a system into any of the sleep states after a TXT |
145 | launch, TXT must first be exited. This is to prevent attacks that | 160 | launch, TXT must first be exited. This is to prevent attacks that |
146 | attempt to crash the system to gain control on reboot and steal | 161 | attempt to crash the system to gain control on reboot and steal |
147 | data left in memory. | 162 | data left in memory. |
163 | |||
148 | - The kernel will perform all of its sleep preparation and | 164 | - The kernel will perform all of its sleep preparation and |
149 | populate the shared page with the ACPI data needed to put the | 165 | populate the shared page with the ACPI data needed to put the |
150 | platform in the desired sleep state. | 166 | platform in the desired sleep state. |
@@ -172,7 +188,7 @@ o In order to put a system into any of the sleep states after a TXT | |||
172 | That's pretty much it for TXT support. | 188 | That's pretty much it for TXT support. |
173 | 189 | ||
174 | 190 | ||
175 | Configuring the System: | 191 | Configuring the System |
176 | ====================== | 192 | ====================== |
177 | 193 | ||
178 | This code works with 32bit, 32bit PAE, and 64bit (x86_64) kernels. | 194 | This code works with 32bit, 32bit PAE, and 64bit (x86_64) kernels. |
@@ -181,7 +197,8 @@ In BIOS, the user must enable: TPM, TXT, VT-x, VT-d. Not all BIOSes | |||
181 | allow these to be individually enabled/disabled and the screens in | 197 | allow these to be individually enabled/disabled and the screens in |
182 | which to find them are BIOS-specific. | 198 | which to find them are BIOS-specific. |
183 | 199 | ||
184 | grub.conf needs to be modified as follows: | 200 | grub.conf needs to be modified as follows:: |
201 | |||
185 | title Linux 2.6.29-tip w/ tboot | 202 | title Linux 2.6.29-tip w/ tboot |
186 | root (hd0,0) | 203 | root (hd0,0) |
187 | kernel /tboot.gz logging=serial,vga,memory | 204 | kernel /tboot.gz logging=serial,vga,memory |
diff --git a/Documentation/io-mapping.txt b/Documentation/io-mapping.txt index 5ca78426f54c..a966239f04e4 100644 --- a/Documentation/io-mapping.txt +++ b/Documentation/io-mapping.txt | |||
@@ -1,66 +1,81 @@ | |||
1 | ======================== | ||
2 | The io_mapping functions | ||
3 | ======================== | ||
4 | |||
5 | API | ||
6 | === | ||
7 | |||
1 | The io_mapping functions in linux/io-mapping.h provide an abstraction for | 8 | The io_mapping functions in linux/io-mapping.h provide an abstraction for |
2 | efficiently mapping small regions of an I/O device to the CPU. The initial | 9 | efficiently mapping small regions of an I/O device to the CPU. The initial |
3 | usage is to support the large graphics aperture on 32-bit processors where | 10 | usage is to support the large graphics aperture on 32-bit processors where |
4 | ioremap_wc cannot be used to statically map the entire aperture to the CPU | 11 | ioremap_wc cannot be used to statically map the entire aperture to the CPU |
5 | as it would consume too much of the kernel address space. | 12 | as it would consume too much of the kernel address space. |
6 | 13 | ||
7 | A mapping object is created during driver initialization using | 14 | A mapping object is created during driver initialization using:: |
8 | 15 | ||
9 | struct io_mapping *io_mapping_create_wc(unsigned long base, | 16 | struct io_mapping *io_mapping_create_wc(unsigned long base, |
10 | unsigned long size) | 17 | unsigned long size) |
11 | 18 | ||
12 | 'base' is the bus address of the region to be made | 19 | 'base' is the bus address of the region to be made |
13 | mappable, while 'size' indicates how large a mapping region to | 20 | mappable, while 'size' indicates how large a mapping region to |
14 | enable. Both are in bytes. | 21 | enable. Both are in bytes. |
15 | 22 | ||
16 | This _wc variant provides a mapping which may only be used | 23 | This _wc variant provides a mapping which may only be used |
17 | with the io_mapping_map_atomic_wc or io_mapping_map_wc. | 24 | with the io_mapping_map_atomic_wc or io_mapping_map_wc. |
18 | 25 | ||
19 | With this mapping object, individual pages can be mapped either atomically | 26 | With this mapping object, individual pages can be mapped either atomically |
20 | or not, depending on the necessary scheduling environment. Of course, atomic | 27 | or not, depending on the necessary scheduling environment. Of course, atomic |
21 | maps are more efficient: | 28 | maps are more efficient:: |
22 | 29 | ||
23 | void *io_mapping_map_atomic_wc(struct io_mapping *mapping, | 30 | void *io_mapping_map_atomic_wc(struct io_mapping *mapping, |
24 | unsigned long offset) | 31 | unsigned long offset) |
25 | 32 | ||
26 | 'offset' is the offset within the defined mapping region. | 33 | 'offset' is the offset within the defined mapping region. |
27 | Accessing addresses beyond the region specified in the | 34 | Accessing addresses beyond the region specified in the |
28 | creation function yields undefined results. Using an offset | 35 | creation function yields undefined results. Using an offset |
29 | which is not page aligned yields an undefined result. The | 36 | which is not page aligned yields an undefined result. The |
30 | return value points to a single page in CPU address space. | 37 | return value points to a single page in CPU address space. |
38 | |||
39 | This _wc variant returns a write-combining map to the | ||
40 | page and may only be used with mappings created by | ||
41 | io_mapping_create_wc | ||
31 | 42 | ||
32 | This _wc variant returns a write-combining map to the | 43 | Note that the task may not sleep while holding this page |
33 | page and may only be used with mappings created by | 44 | mapped. |
34 | io_mapping_create_wc | ||
35 | 45 | ||
36 | Note that the task may not sleep while holding this page | 46 | :: |
37 | mapped. | ||
38 | 47 | ||
39 | void io_mapping_unmap_atomic(void *vaddr) | 48 | void io_mapping_unmap_atomic(void *vaddr) |
40 | 49 | ||
41 | 'vaddr' must be the value returned by the last | 50 | 'vaddr' must be the value returned by the last |
42 | io_mapping_map_atomic_wc call. This unmaps the specified | 51 | io_mapping_map_atomic_wc call. This unmaps the specified |
43 | page and allows the task to sleep once again. | 52 | page and allows the task to sleep once again. |
44 | 53 | ||
45 | If you need to sleep while holding the lock, you can use the non-atomic | 54 | If you need to sleep while holding the lock, you can use the non-atomic |
46 | variant, although it may be significantly slower. | 55 | variant, although it may be significantly slower. |
47 | 56 | ||
57 | :: | ||
58 | |||
48 | void *io_mapping_map_wc(struct io_mapping *mapping, | 59 | void *io_mapping_map_wc(struct io_mapping *mapping, |
49 | unsigned long offset) | 60 | unsigned long offset) |
50 | 61 | ||
51 | This works like io_mapping_map_atomic_wc except it allows | 62 | This works like io_mapping_map_atomic_wc except it allows |
52 | the task to sleep while holding the page mapped. | 63 | the task to sleep while holding the page mapped. |
64 | |||
65 | |||
66 | :: | ||
53 | 67 | ||
54 | void io_mapping_unmap(void *vaddr) | 68 | void io_mapping_unmap(void *vaddr) |
55 | 69 | ||
56 | This works like io_mapping_unmap_atomic, except it is used | 70 | This works like io_mapping_unmap_atomic, except it is used |
57 | for pages mapped with io_mapping_map_wc. | 71 | for pages mapped with io_mapping_map_wc. |
58 | 72 | ||
59 | At driver close time, the io_mapping object must be freed: | 73 | At driver close time, the io_mapping object must be freed:: |
60 | 74 | ||
61 | void io_mapping_free(struct io_mapping *mapping) | 75 | void io_mapping_free(struct io_mapping *mapping) |
62 | 76 | ||
63 | Current Implementation: | 77 | Current Implementation |
78 | ====================== | ||
64 | 79 | ||
65 | The initial implementation of these functions uses existing mapping | 80 | The initial implementation of these functions uses existing mapping |
66 | mechanisms and so provides only an abstraction layer and no new | 81 | mechanisms and so provides only an abstraction layer and no new |
diff --git a/Documentation/io_ordering.txt b/Documentation/io_ordering.txt index 9faae6f26d32..2ab303ce9a0d 100644 --- a/Documentation/io_ordering.txt +++ b/Documentation/io_ordering.txt | |||
@@ -1,3 +1,7 @@ | |||
1 | ============================================== | ||
2 | Ordering I/O writes to memory-mapped addresses | ||
3 | ============================================== | ||
4 | |||
1 | On some platforms, so-called memory-mapped I/O is weakly ordered. On such | 5 | On some platforms, so-called memory-mapped I/O is weakly ordered. On such |
2 | platforms, driver writers are responsible for ensuring that I/O writes to | 6 | platforms, driver writers are responsible for ensuring that I/O writes to |
3 | memory-mapped addresses on their device arrive in the order intended. This is | 7 | memory-mapped addresses on their device arrive in the order intended. This is |
@@ -8,39 +12,39 @@ critical section of code protected by spinlocks. This would ensure that | |||
8 | subsequent writes to I/O space arrived only after all prior writes (much like a | 12 | subsequent writes to I/O space arrived only after all prior writes (much like a |
9 | memory barrier op, mb(), only with respect to I/O). | 13 | memory barrier op, mb(), only with respect to I/O). |
10 | 14 | ||
11 | A more concrete example from a hypothetical device driver: | 15 | A more concrete example from a hypothetical device driver:: |
12 | 16 | ||
13 | ... | 17 | ... |
14 | CPU A: spin_lock_irqsave(&dev_lock, flags) | 18 | CPU A: spin_lock_irqsave(&dev_lock, flags) |
15 | CPU A: val = readl(my_status); | 19 | CPU A: val = readl(my_status); |
16 | CPU A: ... | 20 | CPU A: ... |
17 | CPU A: writel(newval, ring_ptr); | 21 | CPU A: writel(newval, ring_ptr); |
18 | CPU A: spin_unlock_irqrestore(&dev_lock, flags) | 22 | CPU A: spin_unlock_irqrestore(&dev_lock, flags) |
19 | ... | 23 | ... |
20 | CPU B: spin_lock_irqsave(&dev_lock, flags) | 24 | CPU B: spin_lock_irqsave(&dev_lock, flags) |
21 | CPU B: val = readl(my_status); | 25 | CPU B: val = readl(my_status); |
22 | CPU B: ... | 26 | CPU B: ... |
23 | CPU B: writel(newval2, ring_ptr); | 27 | CPU B: writel(newval2, ring_ptr); |
24 | CPU B: spin_unlock_irqrestore(&dev_lock, flags) | 28 | CPU B: spin_unlock_irqrestore(&dev_lock, flags) |
25 | ... | 29 | ... |
26 | 30 | ||
27 | In the case above, the device may receive newval2 before it receives newval, | 31 | In the case above, the device may receive newval2 before it receives newval, |
28 | which could cause problems. Fixing it is easy enough though: | 32 | which could cause problems. Fixing it is easy enough though:: |
29 | 33 | ||
30 | ... | 34 | ... |
31 | CPU A: spin_lock_irqsave(&dev_lock, flags) | 35 | CPU A: spin_lock_irqsave(&dev_lock, flags) |
32 | CPU A: val = readl(my_status); | 36 | CPU A: val = readl(my_status); |
33 | CPU A: ... | 37 | CPU A: ... |
34 | CPU A: writel(newval, ring_ptr); | 38 | CPU A: writel(newval, ring_ptr); |
35 | CPU A: (void)readl(safe_register); /* maybe a config register? */ | 39 | CPU A: (void)readl(safe_register); /* maybe a config register? */ |
36 | CPU A: spin_unlock_irqrestore(&dev_lock, flags) | 40 | CPU A: spin_unlock_irqrestore(&dev_lock, flags) |
37 | ... | 41 | ... |
38 | CPU B: spin_lock_irqsave(&dev_lock, flags) | 42 | CPU B: spin_lock_irqsave(&dev_lock, flags) |
39 | CPU B: val = readl(my_status); | 43 | CPU B: val = readl(my_status); |
40 | CPU B: ... | 44 | CPU B: ... |
41 | CPU B: writel(newval2, ring_ptr); | 45 | CPU B: writel(newval2, ring_ptr); |
42 | CPU B: (void)readl(safe_register); /* maybe a config register? */ | 46 | CPU B: (void)readl(safe_register); /* maybe a config register? */ |
43 | CPU B: spin_unlock_irqrestore(&dev_lock, flags) | 47 | CPU B: spin_unlock_irqrestore(&dev_lock, flags) |
44 | 48 | ||
45 | Here, the reads from safe_register will cause the I/O chipset to flush any | 49 | Here, the reads from safe_register will cause the I/O chipset to flush any |
46 | pending writes before actually posting the read to the chipset, preventing | 50 | pending writes before actually posting the read to the chipset, preventing |
diff --git a/Documentation/iostats.txt b/Documentation/iostats.txt index 65f694f2d1c9..04d394a2e06c 100644 --- a/Documentation/iostats.txt +++ b/Documentation/iostats.txt | |||
@@ -1,49 +1,50 @@ | |||
1 | ===================== | ||
1 | I/O statistics fields | 2 | I/O statistics fields |
2 | --------------- | 3 | ===================== |
3 | 4 | ||
4 | Since 2.4.20 (and some versions before, with patches), and 2.5.45, | 5 | Since 2.4.20 (and some versions before, with patches), and 2.5.45, |
5 | more extensive disk statistics have been introduced to help measure disk | 6 | more extensive disk statistics have been introduced to help measure disk |
6 | activity. Tools such as sar and iostat typically interpret these and do | 7 | activity. Tools such as ``sar`` and ``iostat`` typically interpret these and do |
7 | the work for you, but in case you are interested in creating your own | 8 | the work for you, but in case you are interested in creating your own |
8 | tools, the fields are explained here. | 9 | tools, the fields are explained here. |
9 | 10 | ||
10 | In 2.4 now, the information is found as additional fields in | 11 | In 2.4 now, the information is found as additional fields in |
11 | /proc/partitions. In 2.6, the same information is found in two | 12 | ``/proc/partitions``. In 2.6 and later, the same information is found in two |
12 | places: one is in the file /proc/diskstats, and the other is within | 13 | places: one is in the file ``/proc/diskstats``, and the other is within |
13 | the sysfs file system, which must be mounted in order to obtain | 14 | the sysfs file system, which must be mounted in order to obtain |
14 | the information. Throughout this document we'll assume that sysfs | 15 | the information. Throughout this document we'll assume that sysfs |
15 | is mounted on /sys, although of course it may be mounted anywhere. | 16 | is mounted on ``/sys``, although of course it may be mounted anywhere. |
16 | Both /proc/diskstats and sysfs use the same source for the information | 17 | Both ``/proc/diskstats`` and sysfs use the same source for the information |
17 | and so should not differ. | 18 | and so should not differ. |
18 | 19 | ||
19 | Here are examples of these different formats: | 20 | Here are examples of these different formats:: |
20 | 21 | ||
21 | 2.4: | 22 | 2.4: |
22 | 3 0 39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 | 23 | 3 0 39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 |
23 | 3 1 9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030 | 24 | 3 1 9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030 |
24 | 25 | ||
26 | 2.6+ sysfs: | ||
27 | 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 | ||
28 | 35486 38030 38030 38030 | ||
25 | 29 | ||
26 | 2.6 sysfs: | 30 | 2.6+ diskstats: |
27 | 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 | 31 | 3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 |
28 | 35486 38030 38030 38030 | 32 | 3 1 hda1 35486 38030 38030 38030 |
29 | 33 | ||
30 | 2.6 diskstats: | 34 | On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have |
31 | 3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160 | 35 | a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``. |
32 | 3 1 hda1 35486 38030 38030 38030 | ||
33 | 36 | ||
34 | On 2.4 you might execute "grep 'hda ' /proc/partitions". On 2.6, you have | ||
35 | a choice of "cat /sys/block/hda/stat" or "grep 'hda ' /proc/diskstats". | ||
36 | The advantage of one over the other is that the sysfs choice works well | 37 | The advantage of one over the other is that the sysfs choice works well |
37 | if you are watching a known, small set of disks. /proc/diskstats may | 38 | if you are watching a known, small set of disks. ``/proc/diskstats`` may |
38 | be a better choice if you are watching a large number of disks because | 39 | be a better choice if you are watching a large number of disks because |
39 | you'll avoid the overhead of 50, 100, or 500 or more opens/closes with | 40 | you'll avoid the overhead of 50, 100, or 500 or more opens/closes with |
40 | each snapshot of your disk statistics. | 41 | each snapshot of your disk statistics. |
41 | 42 | ||
42 | In 2.4, the statistics fields are those after the device name. In | 43 | In 2.4, the statistics fields are those after the device name. In |
43 | the above example, the first field of statistics would be 446216. | 44 | the above example, the first field of statistics would be 446216. |
44 | By contrast, in 2.6 if you look at /sys/block/hda/stat, you'll | 45 | By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll |
45 | find just the eleven fields, beginning with 446216. If you look at | 46 | find just the eleven fields, beginning with 446216. If you look at |
46 | /proc/diskstats, the eleven fields will be preceded by the major and | 47 | ``/proc/diskstats``, the eleven fields will be preceded by the major and |
47 | minor device numbers, and device name. Each of these formats provides | 48 | minor device numbers, and device name. Each of these formats provides |
48 | eleven fields of statistics, each meaning exactly the same things. | 49 | eleven fields of statistics, each meaning exactly the same things. |
49 | All fields except field 9 are cumulative since boot. Field 9 should | 50 | All fields except field 9 are cumulative since boot. Field 9 should |
@@ -59,30 +60,40 @@ system-wide stats you'll have to find all the devices and sum them all up. | |||
59 | 60 | ||
60 | Field 1 -- # of reads completed | 61 | Field 1 -- # of reads completed |
61 | This is the total number of reads completed successfully. | 62 | This is the total number of reads completed successfully. |
63 | |||
62 | Field 2 -- # of reads merged, field 6 -- # of writes merged | 64 | Field 2 -- # of reads merged, field 6 -- # of writes merged |
63 | Reads and writes which are adjacent to each other may be merged for | 65 | Reads and writes which are adjacent to each other may be merged for |
64 | efficiency. Thus two 4K reads may become one 8K read before it is | 66 | efficiency. Thus two 4K reads may become one 8K read before it is |
65 | ultimately handed to the disk, and so it will be counted (and queued) | 67 | ultimately handed to the disk, and so it will be counted (and queued) |
66 | as only one I/O. This field lets you know how often this was done. | 68 | as only one I/O. This field lets you know how often this was done. |
69 | |||
67 | Field 3 -- # of sectors read | 70 | Field 3 -- # of sectors read |
68 | This is the total number of sectors read successfully. | 71 | This is the total number of sectors read successfully. |
72 | |||
69 | Field 4 -- # of milliseconds spent reading | 73 | Field 4 -- # of milliseconds spent reading |
70 | This is the total number of milliseconds spent by all reads (as | 74 | This is the total number of milliseconds spent by all reads (as |
71 | measured from __make_request() to end_that_request_last()). | 75 | measured from __make_request() to end_that_request_last()). |
76 | |||
72 | Field 5 -- # of writes completed | 77 | Field 5 -- # of writes completed |
73 | This is the total number of writes completed successfully. | 78 | This is the total number of writes completed successfully. |
79 | |||
74 | Field 6 -- # of writes merged | 80 | Field 6 -- # of writes merged |
75 | See the description of field 2. | 81 | See the description of field 2. |
82 | |||
76 | Field 7 -- # of sectors written | 83 | Field 7 -- # of sectors written |
77 | This is the total number of sectors written successfully. | 84 | This is the total number of sectors written successfully. |
85 | |||
78 | Field 8 -- # of milliseconds spent writing | 86 | Field 8 -- # of milliseconds spent writing |
79 | This is the total number of milliseconds spent by all writes (as | 87 | This is the total number of milliseconds spent by all writes (as |
80 | measured from __make_request() to end_that_request_last()). | 88 | measured from __make_request() to end_that_request_last()). |
89 | |||
81 | Field 9 -- # of I/Os currently in progress | 90 | Field 9 -- # of I/Os currently in progress |
82 | The only field that should go to zero. Incremented as requests are | 91 | The only field that should go to zero. Incremented as requests are |
83 | given to appropriate struct request_queue and decremented as they finish. | 92 | given to appropriate struct request_queue and decremented as they finish. |
93 | |||
84 | Field 10 -- # of milliseconds spent doing I/Os | 94 | Field 10 -- # of milliseconds spent doing I/Os |
85 | This field increases so long as field 9 is nonzero. | 95 | This field increases so long as field 9 is nonzero. |
96 | |||
86 | Field 11 -- weighted # of milliseconds spent doing I/Os | 97 | Field 11 -- weighted # of milliseconds spent doing I/Os |
87 | This field is incremented at each I/O start, I/O completion, I/O | 98 | This field is incremented at each I/O start, I/O completion, I/O |
88 | merge, or read of these stats by the number of I/Os in progress | 99 | merge, or read of these stats by the number of I/Os in progress |
@@ -97,7 +108,7 @@ introduced when changes collide, so (for instance) adding up all the | |||
97 | read I/Os issued per partition should equal those made to the disks ... | 108 | read I/Os issued per partition should equal those made to the disks ... |
98 | but due to the lack of locking it may only be very close. | 109 | but due to the lack of locking it may only be very close. |
99 | 110 | ||
100 | In 2.6, there are counters for each CPU, which make the lack of locking | 111 | In 2.6+, there are counters for each CPU, which make the lack of locking |
101 | almost a non-issue. When the statistics are read, the per-CPU counters | 112 | almost a non-issue. When the statistics are read, the per-CPU counters |
102 | are summed (possibly overflowing the unsigned long variable they are | 113 | are summed (possibly overflowing the unsigned long variable they are |
103 | summed to) and the result given to the user. There is no convenient | 114 | summed to) and the result given to the user. There is no convenient |
@@ -106,22 +117,25 @@ user interface for accessing the per-CPU counters themselves. | |||
106 | Disks vs Partitions | 117 | Disks vs Partitions |
107 | ------------------- | 118 | ------------------- |
108 | 119 | ||
109 | There were significant changes between 2.4 and 2.6 in the I/O subsystem. | 120 | There were significant changes between 2.4 and 2.6+ in the I/O subsystem. |
110 | As a result, some statistic information disappeared. The translation from | 121 | As a result, some statistic information disappeared. The translation from |
111 | a disk address relative to a partition to the disk address relative to | 122 | a disk address relative to a partition to the disk address relative to |
112 | the host disk happens much earlier. All merges and timings now happen | 123 | the host disk happens much earlier. All merges and timings now happen |
113 | at the disk level rather than at both the disk and partition level as | 124 | at the disk level rather than at both the disk and partition level as |
114 | in 2.4. Consequently, you'll see a different statistics output on 2.6 for | 125 | in 2.4. Consequently, you'll see a different statistics output on 2.6+ for |
115 | partitions from that for disks. There are only *four* fields available | 126 | partitions from that for disks. There are only *four* fields available |
116 | for partitions on 2.6 machines. This is reflected in the examples above. | 127 | for partitions on 2.6+ machines. This is reflected in the examples above. |
117 | 128 | ||
118 | Field 1 -- # of reads issued | 129 | Field 1 -- # of reads issued |
119 | This is the total number of reads issued to this partition. | 130 | This is the total number of reads issued to this partition. |
131 | |||
120 | Field 2 -- # of sectors read | 132 | Field 2 -- # of sectors read |
121 | This is the total number of sectors requested to be read from this | 133 | This is the total number of sectors requested to be read from this |
122 | partition. | 134 | partition. |
135 | |||
123 | Field 3 -- # of writes issued | 136 | Field 3 -- # of writes issued |
124 | This is the total number of writes issued to this partition. | 137 | This is the total number of writes issued to this partition. |
138 | |||
125 | Field 4 -- # of sectors written | 139 | Field 4 -- # of sectors written |
126 | This is the total number of sectors requested to be written to | 140 | This is the total number of sectors requested to be written to |
127 | this partition. | 141 | this partition. |
@@ -149,16 +163,16 @@ to some (probably insignificant) inaccuracy. | |||
149 | Additional notes | 163 | Additional notes |
150 | ---------------- | 164 | ---------------- |
151 | 165 | ||
152 | In 2.6, sysfs is not mounted by default. If your distribution of | 166 | In 2.6+, sysfs is not mounted by default. If your distribution of |
153 | Linux hasn't added it already, here's the line you'll want to add to | 167 | Linux hasn't added it already, here's the line you'll want to add to |
154 | your /etc/fstab: | 168 | your ``/etc/fstab``:: |
155 | 169 | ||
156 | none /sys sysfs defaults 0 0 | 170 | none /sys sysfs defaults 0 0 |
157 | 171 | ||
158 | 172 | ||
159 | In 2.6, all disk statistics were removed from /proc/stat. In 2.4, they | 173 | In 2.6+, all disk statistics were removed from ``/proc/stat``. In 2.4, they |
160 | appear in both /proc/partitions and /proc/stat, although the ones in | 174 | appear in both ``/proc/partitions`` and ``/proc/stat``, although the ones in |
161 | /proc/stat take a very different format from those in /proc/partitions | 175 | ``/proc/stat`` take a very different format from those in ``/proc/partitions`` |
162 | (see proc(5), if your system has it.) | 176 | (see proc(5), if your system has it.) |
163 | 177 | ||
164 | -- ricklind@us.ibm.com | 178 | -- ricklind@us.ibm.com |
diff --git a/Documentation/irqflags-tracing.txt b/Documentation/irqflags-tracing.txt index f6da05670e16..bdd208259fb3 100644 --- a/Documentation/irqflags-tracing.txt +++ b/Documentation/irqflags-tracing.txt | |||
@@ -1,8 +1,10 @@ | |||
1 | ======================= | ||
1 | IRQ-flags state tracing | 2 | IRQ-flags state tracing |
3 | ======================= | ||
2 | 4 | ||
3 | started by Ingo Molnar <mingo@redhat.com> | 5 | :Author: started by Ingo Molnar <mingo@redhat.com> |
4 | 6 | ||
5 | the "irq-flags tracing" feature "traces" hardirq and softirq state, in | 7 | The "irq-flags tracing" feature "traces" hardirq and softirq state, in |
6 | that it gives interested subsystems an opportunity to be notified of | 8 | that it gives interested subsystems an opportunity to be notified of |
7 | every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that | 9 | every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that |
8 | happens in the kernel. | 10 | happens in the kernel. |
@@ -14,7 +16,7 @@ CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these | |||
14 | are locking APIs that are not used in IRQ context. (the one exception | 16 | are locking APIs that are not used in IRQ context. (the one exception |
15 | for rwsems is worked around) | 17 | for rwsems is worked around) |
16 | 18 | ||
17 | architecture support for this is certainly not in the "trivial" | 19 | Architecture support for this is certainly not in the "trivial" |
18 | category, because lots of low-level assembly code deals with irq-flags | 20 | category, because lots of low-level assembly code deals with irq-flags |
19 | state changes. But an architecture can be irq-flags-tracing enabled in a | 21 | state changes. But an architecture can be irq-flags-tracing enabled in a |
20 | rather straightforward and risk-free manner. | 22 | rather straightforward and risk-free manner. |
@@ -41,7 +43,7 @@ irq-flags-tracing support: | |||
41 | excluded from the irq-tracing [and lock validation] mechanism via | 43 | excluded from the irq-tracing [and lock validation] mechanism via |
42 | lockdep_off()/lockdep_on(). | 44 | lockdep_off()/lockdep_on(). |
43 | 45 | ||
44 | in general there is no risk from having an incomplete irq-flags-tracing | 46 | In general there is no risk from having an incomplete irq-flags-tracing |
45 | implementation in an architecture: lockdep will detect that and will | 47 | implementation in an architecture: lockdep will detect that and will |
46 | turn itself off. I.e. the lock validator will still be reliable. There | 48 | turn itself off. I.e. the lock validator will still be reliable. There |
47 | should be no crashes due to irq-tracing bugs. (except if the assembly | 49 | should be no crashes due to irq-tracing bugs. (except if the assembly |
diff --git a/Documentation/isa.txt b/Documentation/isa.txt index f232c26a40be..def4a7b690b5 100644 --- a/Documentation/isa.txt +++ b/Documentation/isa.txt | |||
@@ -1,5 +1,6 @@ | |||
1 | =========== | ||
1 | ISA Drivers | 2 | ISA Drivers |
2 | ----------- | 3 | =========== |
3 | 4 | ||
4 | The following text is adapted from the commit message of the initial | 5 | The following text is adapted from the commit message of the initial |
5 | commit of the ISA bus driver authored by Rene Herman. | 6 | commit of the ISA bus driver authored by Rene Herman. |
@@ -23,17 +24,17 @@ that all device creation has been made internal as well. | |||
23 | 24 | ||
24 | The usage model this provides is nice, and has been acked from the ALSA | 25 | The usage model this provides is nice, and has been acked from the ALSA |
25 | side by Takashi Iwai and Jaroslav Kysela. The ALSA driver module_init's | 26 | side by Takashi Iwai and Jaroslav Kysela. The ALSA driver module_init's |
26 | now (for oldisa-only drivers) become: | 27 | now (for oldisa-only drivers) become:: |
27 | 28 | ||
28 | static int __init alsa_card_foo_init(void) | 29 | static int __init alsa_card_foo_init(void) |
29 | { | 30 | { |
30 | return isa_register_driver(&snd_foo_isa_driver, SNDRV_CARDS); | 31 | return isa_register_driver(&snd_foo_isa_driver, SNDRV_CARDS); |
31 | } | 32 | } |
32 | 33 | ||
33 | static void __exit alsa_card_foo_exit(void) | 34 | static void __exit alsa_card_foo_exit(void) |
34 | { | 35 | { |
35 | isa_unregister_driver(&snd_foo_isa_driver); | 36 | isa_unregister_driver(&snd_foo_isa_driver); |
36 | } | 37 | } |
37 | 38 | ||
38 | Quite like the other bus models therefore. This removes a lot of | 39 | Quite like the other bus models therefore. This removes a lot of |
39 | duplicated init code from the ALSA ISA drivers. | 40 | duplicated init code from the ALSA ISA drivers. |
@@ -47,11 +48,11 @@ parameter, indicating how many devices to create and call our methods | |||
47 | with. | 48 | with. |
48 | 49 | ||
49 | The platform_driver callbacks are called with a platform_device param; | 50 | The platform_driver callbacks are called with a platform_device param; |
50 | the isa_driver callbacks are being called with a "struct device *dev, | 51 | the isa_driver callbacks are being called with a ``struct device *dev, |
51 | unsigned int id" pair directly -- with the device creation completely | 52 | unsigned int id`` pair directly -- with the device creation completely |
52 | internal to the bus it's much cleaner to not leak isa_dev's by passing | 53 | internal to the bus it's much cleaner to not leak isa_dev's by passing |
53 | them in at all. The id is the only thing we ever want other than the | 54 | them in at all. The id is the only thing we ever want other than the |
54 | struct device * anyway, and it makes for nicer code in the callbacks as | 55 | struct device anyway, and it makes for nicer code in the callbacks as |
55 | well. | 56 | well. |
56 | 57 | ||
57 | With this additional .match() callback ISA drivers have all options. If | 58 | With this additional .match() callback ISA drivers have all options. If |
@@ -75,20 +76,20 @@ This exports only two functions; isa_{,un}register_driver(). | |||
75 | 76 | ||
76 | isa_register_driver() registers the struct device_driver, and then | 77 | isa_register_driver() registers the struct device_driver, and then |
77 | loops over the passed in ndev creating devices and registering them. | 78 | loops over the passed in ndev creating devices and registering them. |
78 | This causes the bus match method to be called for them, which is: | 79 | This causes the bus match method to be called for them, which is:: |
79 | 80 | ||
80 | int isa_bus_match(struct device *dev, struct device_driver *driver) | 81 | int isa_bus_match(struct device *dev, struct device_driver *driver) |
81 | { | 82 | { |
82 | struct isa_driver *isa_driver = to_isa_driver(driver); | 83 | struct isa_driver *isa_driver = to_isa_driver(driver); |
83 | 84 | ||
84 | if (dev->platform_data == isa_driver) { | 85 | if (dev->platform_data == isa_driver) { |
85 | if (!isa_driver->match || | 86 | if (!isa_driver->match || |
86 | isa_driver->match(dev, to_isa_dev(dev)->id)) | 87 | isa_driver->match(dev, to_isa_dev(dev)->id)) |
87 | return 1; | 88 | return 1; |
88 | dev->platform_data = NULL; | 89 | dev->platform_data = NULL; |
89 | } | 90 | } |
90 | return 0; | 91 | return 0; |
91 | } | 92 | } |
92 | 93 | ||
93 | The first thing this does is check if this device is in fact one of this | 94 | The first thing this does is check if this device is in fact one of this |
94 | driver's devices by seeing if the device's platform_data pointer is set | 95 | driver's devices by seeing if the device's platform_data pointer is set |
@@ -102,7 +103,7 @@ well. | |||
102 | Then, if the driver did not provide a .match, it matches. If it did, | 103 | Then, if the driver did not provide a .match, it matches. If it did, |
103 | the driver match() method is called to determine a match. | 104 | the driver match() method is called to determine a match. |
104 | 105 | ||
105 | If it did _not_ match, dev->platform_data is reset to indicate this to | 106 | If it did **not** match, dev->platform_data is reset to indicate this to |
106 | isa_register_driver which can then unregister the device again. | 107 | isa_register_driver which can then unregister the device again. |
107 | 108 | ||
108 | If during all this, there's any error, or no devices matched at all | 109 | If during all this, there's any error, or no devices matched at all |
diff --git a/Documentation/isapnp.txt b/Documentation/isapnp.txt index 400d1b5b523d..8d0840ac847b 100644 --- a/Documentation/isapnp.txt +++ b/Documentation/isapnp.txt | |||
@@ -1,3 +1,4 @@ | |||
1 | ========================================================== | ||
1 | ISA Plug & Play support by Jaroslav Kysela <perex@suse.cz> | 2 | ISA Plug & Play support by Jaroslav Kysela <perex@suse.cz> |
2 | ========================================================== | 3 | ========================================================== |
3 | 4 | ||
diff --git a/Documentation/kernel-per-CPU-kthreads.txt b/Documentation/kernel-per-CPU-kthreads.txt index 2cb7dc5c0e0d..0f00f9c164ac 100644 --- a/Documentation/kernel-per-CPU-kthreads.txt +++ b/Documentation/kernel-per-CPU-kthreads.txt | |||
@@ -1,27 +1,29 @@ | |||
1 | REDUCING OS JITTER DUE TO PER-CPU KTHREADS | 1 | ========================================== |
2 | Reducing OS jitter due to per-cpu kthreads | ||
3 | ========================================== | ||
2 | 4 | ||
3 | This document lists per-CPU kthreads in the Linux kernel and presents | 5 | This document lists per-CPU kthreads in the Linux kernel and presents |
4 | options to control their OS jitter. Note that non-per-CPU kthreads are | 6 | options to control their OS jitter. Note that non-per-CPU kthreads are |
5 | not listed here. To reduce OS jitter from non-per-CPU kthreads, bind | 7 | not listed here. To reduce OS jitter from non-per-CPU kthreads, bind |
6 | them to a "housekeeping" CPU dedicated to such work. | 8 | them to a "housekeeping" CPU dedicated to such work. |
7 | 9 | ||
10 | References | ||
11 | ========== | ||
8 | 12 | ||
9 | REFERENCES | 13 | - Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs. |
10 | 14 | ||
11 | o Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs. | 15 | - Documentation/cgroup-v1: Using cgroups to bind tasks to sets of CPUs. |
12 | 16 | ||
13 | o Documentation/cgroup-v1: Using cgroups to bind tasks to sets of CPUs. | 17 | - man taskset: Using the taskset command to bind tasks to sets |
14 | |||
15 | o man taskset: Using the taskset command to bind tasks to sets | ||
16 | of CPUs. | 18 | of CPUs. |
17 | 19 | ||
18 | o man sched_setaffinity: Using the sched_setaffinity() system | 20 | - man sched_setaffinity: Using the sched_setaffinity() system |
19 | call to bind tasks to sets of CPUs. | 21 | call to bind tasks to sets of CPUs. |
20 | 22 | ||
21 | o /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state, | 23 | - /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state, |
22 | writing "0" to offline and "1" to online. | 24 | writing "0" to offline and "1" to online. |
23 | 25 | ||
24 | o In order to locate kernel-generated OS jitter on CPU N: | 26 | - In order to locate kernel-generated OS jitter on CPU N: |
25 | 27 | ||
26 | cd /sys/kernel/debug/tracing | 28 | cd /sys/kernel/debug/tracing |
27 | echo 1 > max_graph_depth # Increase the "1" for more detail | 29 | echo 1 > max_graph_depth # Increase the "1" for more detail |
@@ -29,12 +31,17 @@ o In order to locate kernel-generated OS jitter on CPU N: | |||
29 | # run workload | 31 | # run workload |
30 | cat per_cpu/cpuN/trace | 32 | cat per_cpu/cpuN/trace |
31 | 33 | ||
34 | kthreads | ||
35 | ======== | ||
36 | |||
37 | Name: | ||
38 | ehca_comp/%u | ||
32 | 39 | ||
33 | KTHREADS | 40 | Purpose: |
41 | Periodically process Infiniband-related work. | ||
34 | 42 | ||
35 | Name: ehca_comp/%u | ||
36 | Purpose: Periodically process Infiniband-related work. | ||
37 | To reduce its OS jitter, do any of the following: | 43 | To reduce its OS jitter, do any of the following: |
44 | |||
38 | 1. Don't use eHCA Infiniband hardware, instead choosing hardware | 45 | 1. Don't use eHCA Infiniband hardware, instead choosing hardware |
39 | that does not require per-CPU kthreads. This will prevent these | 46 | that does not require per-CPU kthreads. This will prevent these |
40 | kthreads from being created in the first place. (This will | 47 | kthreads from being created in the first place. (This will |
@@ -46,26 +53,45 @@ To reduce its OS jitter, do any of the following: | |||
46 | provisioned only on selected CPUs. | 53 | provisioned only on selected CPUs. |
47 | 54 | ||
48 | 55 | ||
49 | Name: irq/%d-%s | 56 | Name: |
50 | Purpose: Handle threaded interrupts. | 57 | irq/%d-%s |
58 | |||
59 | Purpose: | ||
60 | Handle threaded interrupts. | ||
61 | |||
51 | To reduce its OS jitter, do the following: | 62 | To reduce its OS jitter, do the following: |
63 | |||
52 | 1. Use irq affinity to force the irq threads to execute on | 64 | 1. Use irq affinity to force the irq threads to execute on |
53 | some other CPU. | 65 | some other CPU. |
54 | 66 | ||
55 | Name: kcmtpd_ctr_%d | 67 | Name: |
56 | Purpose: Handle Bluetooth work. | 68 | kcmtpd_ctr_%d |
69 | |||
70 | Purpose: | ||
71 | Handle Bluetooth work. | ||
72 | |||
57 | To reduce its OS jitter, do one of the following: | 73 | To reduce its OS jitter, do one of the following: |
74 | |||
58 | 1. Don't use Bluetooth, in which case these kthreads won't be | 75 | 1. Don't use Bluetooth, in which case these kthreads won't be |
59 | created in the first place. | 76 | created in the first place. |
60 | 2. Use irq affinity to force Bluetooth-related interrupts to | 77 | 2. Use irq affinity to force Bluetooth-related interrupts to |
61 | occur on some other CPU and furthermore initiate all | 78 | occur on some other CPU and furthermore initiate all |
62 | Bluetooth activity on some other CPU. | 79 | Bluetooth activity on some other CPU. |
63 | 80 | ||
64 | Name: ksoftirqd/%u | 81 | Name: |
65 | Purpose: Execute softirq handlers when threaded or when under heavy load. | 82 | ksoftirqd/%u |
83 | |||
84 | Purpose: | ||
85 | Execute softirq handlers when threaded or when under heavy load. | ||
86 | |||
66 | To reduce its OS jitter, each softirq vector must be handled | 87 | To reduce its OS jitter, each softirq vector must be handled |
67 | separately as follows: | 88 | separately as follows: |
68 | TIMER_SOFTIRQ: Do all of the following: | 89 | |
90 | TIMER_SOFTIRQ | ||
91 | ------------- | ||
92 | |||
93 | Do all of the following: | ||
94 | |||
69 | 1. To the extent possible, keep the CPU out of the kernel when it | 95 | 1. To the extent possible, keep the CPU out of the kernel when it |
70 | is non-idle, for example, by avoiding system calls and by forcing | 96 | is non-idle, for example, by avoiding system calls and by forcing |
71 | both kernel threads and interrupts to execute elsewhere. | 97 | both kernel threads and interrupts to execute elsewhere. |
@@ -76,34 +102,59 @@ TIMER_SOFTIRQ: Do all of the following: | |||
76 | first one back online. Once you have onlined the CPUs in question, | 102 | first one back online. Once you have onlined the CPUs in question, |
77 | do not offline any other CPUs, because doing so could force the | 103 | do not offline any other CPUs, because doing so could force the |
78 | timer back onto one of the CPUs in question. | 104 | timer back onto one of the CPUs in question. |
79 | NET_TX_SOFTIRQ and NET_RX_SOFTIRQ: Do all of the following: | 105 | |
106 | NET_TX_SOFTIRQ and NET_RX_SOFTIRQ | ||
107 | --------------------------------- | ||
108 | |||
109 | Do all of the following: | ||
110 | |||
80 | 1. Force networking interrupts onto other CPUs. | 111 | 1. Force networking interrupts onto other CPUs. |
81 | 2. Initiate any network I/O on other CPUs. | 112 | 2. Initiate any network I/O on other CPUs. |
82 | 3. Once your application has started, prevent CPU-hotplug operations | 113 | 3. Once your application has started, prevent CPU-hotplug operations |
83 | from being initiated from tasks that might run on the CPU to | 114 | from being initiated from tasks that might run on the CPU to |
84 | be de-jittered. (It is OK to force this CPU offline and then | 115 | be de-jittered. (It is OK to force this CPU offline and then |
85 | bring it back online before you start your application.) | 116 | bring it back online before you start your application.) |
86 | BLOCK_SOFTIRQ: Do all of the following: | 117 | |
118 | BLOCK_SOFTIRQ | ||
119 | ------------- | ||
120 | |||
121 | Do all of the following: | ||
122 | |||
87 | 1. Force block-device interrupts onto some other CPU. | 123 | 1. Force block-device interrupts onto some other CPU. |
88 | 2. Initiate any block I/O on other CPUs. | 124 | 2. Initiate any block I/O on other CPUs. |
89 | 3. Once your application has started, prevent CPU-hotplug operations | 125 | 3. Once your application has started, prevent CPU-hotplug operations |
90 | from being initiated from tasks that might run on the CPU to | 126 | from being initiated from tasks that might run on the CPU to |
91 | be de-jittered. (It is OK to force this CPU offline and then | 127 | be de-jittered. (It is OK to force this CPU offline and then |
92 | bring it back online before you start your application.) | 128 | bring it back online before you start your application.) |
93 | IRQ_POLL_SOFTIRQ: Do all of the following: | 129 | |
130 | IRQ_POLL_SOFTIRQ | ||
131 | ---------------- | ||
132 | |||
133 | Do all of the following: | ||
134 | |||
94 | 1. Force block-device interrupts onto some other CPU. | 135 | 1. Force block-device interrupts onto some other CPU. |
95 | 2. Initiate any block I/O and block-I/O polling on other CPUs. | 136 | 2. Initiate any block I/O and block-I/O polling on other CPUs. |
96 | 3. Once your application has started, prevent CPU-hotplug operations | 137 | 3. Once your application has started, prevent CPU-hotplug operations |
97 | from being initiated from tasks that might run on the CPU to | 138 | from being initiated from tasks that might run on the CPU to |
98 | be de-jittered. (It is OK to force this CPU offline and then | 139 | be de-jittered. (It is OK to force this CPU offline and then |
99 | bring it back online before you start your application.) | 140 | bring it back online before you start your application.) |
100 | TASKLET_SOFTIRQ: Do one or more of the following: | 141 | |
142 | TASKLET_SOFTIRQ | ||
143 | --------------- | ||
144 | |||
145 | Do one or more of the following: | ||
146 | |||
101 | 1. Avoid use of drivers that use tasklets. (Such drivers will contain | 147 | 1. Avoid use of drivers that use tasklets. (Such drivers will contain |
102 | calls to things like tasklet_schedule().) | 148 | calls to things like tasklet_schedule().) |
103 | 2. Convert all drivers that you must use from tasklets to workqueues. | 149 | 2. Convert all drivers that you must use from tasklets to workqueues. |
104 | 3. Force interrupts for drivers using tasklets onto other CPUs, | 150 | 3. Force interrupts for drivers using tasklets onto other CPUs, |
105 | and also do I/O involving these drivers on other CPUs. | 151 | and also do I/O involving these drivers on other CPUs. |
106 | SCHED_SOFTIRQ: Do all of the following: | 152 | |
153 | SCHED_SOFTIRQ | ||
154 | ------------- | ||
155 | |||
156 | Do all of the following: | ||
157 | |||
107 | 1. Avoid sending scheduler IPIs to the CPU to be de-jittered, | 158 | 1. Avoid sending scheduler IPIs to the CPU to be de-jittered, |
108 | for example, ensure that at most one runnable kthread is present | 159 | for example, ensure that at most one runnable kthread is present |
109 | on that CPU. If a thread that expects to run on the de-jittered | 160 | on that CPU. If a thread that expects to run on the de-jittered |
@@ -120,7 +171,12 @@ SCHED_SOFTIRQ: Do all of the following: | |||
120 | forcing both kernel threads and interrupts to execute elsewhere. | 171 | forcing both kernel threads and interrupts to execute elsewhere. |
121 | This further reduces the number of scheduler-clock interrupts | 172 | This further reduces the number of scheduler-clock interrupts |
122 | received by the de-jittered CPU. | 173 | received by the de-jittered CPU. |
123 | HRTIMER_SOFTIRQ: Do all of the following: | 174 | |
175 | HRTIMER_SOFTIRQ | ||
176 | --------------- | ||
177 | |||
178 | Do all of the following: | ||
179 | |||
124 | 1. To the extent possible, keep the CPU out of the kernel when it | 180 | 1. To the extent possible, keep the CPU out of the kernel when it |
125 | is non-idle. For example, avoid system calls and force both | 181 | is non-idle. For example, avoid system calls and force both |
126 | kernel threads and interrupts to execute elsewhere. | 182 | kernel threads and interrupts to execute elsewhere. |
@@ -131,9 +187,15 @@ HRTIMER_SOFTIRQ: Do all of the following: | |||
131 | back online. Once you have onlined the CPUs in question, do not | 187 | back online. Once you have onlined the CPUs in question, do not |
132 | offline any other CPUs, because doing so could force the timer | 188 | offline any other CPUs, because doing so could force the timer |
133 | back onto one of the CPUs in question. | 189 | back onto one of the CPUs in question. |
134 | RCU_SOFTIRQ: Do at least one of the following: | 190 | |
191 | RCU_SOFTIRQ | ||
192 | ----------- | ||
193 | |||
194 | Do at least one of the following: | ||
195 | |||
135 | 1. Offload callbacks and keep the CPU in either dyntick-idle or | 196 | 1. Offload callbacks and keep the CPU in either dyntick-idle or |
136 | adaptive-ticks state by doing all of the following: | 197 | adaptive-ticks state by doing all of the following: |
198 | |||
137 | a. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be | 199 | a. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be |
138 | de-jittered is marked as an adaptive-ticks CPU using the | 200 | de-jittered is marked as an adaptive-ticks CPU using the |
139 | "nohz_full=" boot parameter. Bind the rcuo kthreads to | 201 | "nohz_full=" boot parameter. Bind the rcuo kthreads to |
@@ -142,8 +204,10 @@ RCU_SOFTIRQ: Do at least one of the following: | |||
142 | when it is non-idle, for example, by avoiding system | 204 | when it is non-idle, for example, by avoiding system |
143 | calls and by forcing both kernel threads and interrupts | 205 | calls and by forcing both kernel threads and interrupts |
144 | to execute elsewhere. | 206 | to execute elsewhere. |
207 | |||
145 | 2. Enable RCU to do its processing remotely via dyntick-idle by | 208 | 2. Enable RCU to do its processing remotely via dyntick-idle by |
146 | doing all of the following: | 209 | doing all of the following: |
210 | |||
147 | a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y. | 211 | a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y. |
148 | b. Ensure that the CPU goes idle frequently, allowing other | 212 | b. Ensure that the CPU goes idle frequently, allowing other |
149 | CPUs to detect that it has passed through an RCU quiescent | 213 | CPUs to detect that it has passed through an RCU quiescent |
@@ -155,15 +219,20 @@ RCU_SOFTIRQ: Do at least one of the following: | |||
155 | calls and by forcing both kernel threads and interrupts | 219 | calls and by forcing both kernel threads and interrupts |
156 | to execute elsewhere. | 220 | to execute elsewhere. |
157 | 221 | ||
158 | Name: kworker/%u:%d%s (cpu, id, priority) | 222 | Name: |
159 | Purpose: Execute workqueue requests | 223 | kworker/%u:%d%s (cpu, id, priority) |
224 | |||
225 | Purpose: | ||
226 | Execute workqueue requests | ||
227 | |||
160 | To reduce its OS jitter, do any of the following: | 228 | To reduce its OS jitter, do any of the following: |
229 | |||
161 | 1. Run your workload at a real-time priority, which will allow | 230 | 1. Run your workload at a real-time priority, which will allow |
162 | preempting the kworker daemons. | 231 | preempting the kworker daemons. |
163 | 2. A given workqueue can be made visible in the sysfs filesystem | 232 | 2. A given workqueue can be made visible in the sysfs filesystem |
164 | by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). | 233 | by passing the WQ_SYSFS to that workqueue's alloc_workqueue(). |
165 | Such a workqueue can be confined to a given subset of the | 234 | Such a workqueue can be confined to a given subset of the |
166 | CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs | 235 | CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs |
167 | files. The set of WQ_SYSFS workqueues can be displayed using | 236 | files. The set of WQ_SYSFS workqueues can be displayed using |
168 | "ls /sys/devices/virtual/workqueue". That said, the workqueues | 237 | "ls /sys/devices/virtual/workqueue". That said, the workqueues |
169 | maintainer would like to caution people against indiscriminately | 238 | maintainer would like to caution people against indiscriminately |
@@ -173,6 +242,7 @@ To reduce its OS jitter, do any of the following: | |||
173 | to remove it, even if its addition was a mistake. | 242 | to remove it, even if its addition was a mistake. |
174 | 3. Do any of the following needed to avoid jitter that your | 243 | 3. Do any of the following needed to avoid jitter that your |
175 | application cannot tolerate: | 244 | application cannot tolerate: |
245 | |||
176 | a. Build your kernel with CONFIG_SLUB=y rather than | 246 | a. Build your kernel with CONFIG_SLUB=y rather than |
177 | CONFIG_SLAB=y, thus avoiding the slab allocator's periodic | 247 | CONFIG_SLAB=y, thus avoiding the slab allocator's periodic |
178 | use of each CPU's workqueues to run its cache_reap() | 248 | use of each CPU's workqueues to run its cache_reap() |
@@ -186,6 +256,7 @@ To reduce its OS jitter, do any of the following: | |||
186 | be able to build your kernel with CONFIG_CPU_FREQ=n to | 256 | be able to build your kernel with CONFIG_CPU_FREQ=n to |
187 | avoid the CPU-frequency governor periodically running | 257 | avoid the CPU-frequency governor periodically running |
188 | on each CPU, including cs_dbs_timer() and od_dbs_timer(). | 258 | on each CPU, including cs_dbs_timer() and od_dbs_timer(). |
259 | |||
189 | WARNING: Please check your CPU specifications to | 260 | WARNING: Please check your CPU specifications to |
190 | make sure that this is safe on your particular system. | 261 | make sure that this is safe on your particular system. |
191 | d. As of v3.18, Christoph Lameter's on-demand vmstat workers | 262 | d. As of v3.18, Christoph Lameter's on-demand vmstat workers |
@@ -222,9 +293,14 @@ To reduce its OS jitter, do any of the following: | |||
222 | CONFIG_PMAC_RACKMETER=n to disable the CPU-meter, | 293 | CONFIG_PMAC_RACKMETER=n to disable the CPU-meter, |
223 | avoiding OS jitter from rackmeter_do_timer(). | 294 | avoiding OS jitter from rackmeter_do_timer(). |
224 | 295 | ||
225 | Name: rcuc/%u | 296 | Name: |
226 | Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels. | 297 | rcuc/%u |
298 | |||
299 | Purpose: | ||
300 | Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels. | ||
301 | |||
227 | To reduce its OS jitter, do at least one of the following: | 302 | To reduce its OS jitter, do at least one of the following: |
303 | |||
228 | 1. Build the kernel with CONFIG_PREEMPT=n. This prevents these | 304 | 1. Build the kernel with CONFIG_PREEMPT=n. This prevents these |
229 | kthreads from being created in the first place, and also obviates | 305 | kthreads from being created in the first place, and also obviates |
230 | the need for RCU priority boosting. This approach is feasible | 306 | the need for RCU priority boosting. This approach is feasible |
@@ -244,9 +320,14 @@ To reduce its OS jitter, do at least one of the following: | |||
244 | CPU, again preventing the rcuc/%u kthreads from having any work | 320 | CPU, again preventing the rcuc/%u kthreads from having any work |
245 | to do. | 321 | to do. |
246 | 322 | ||
247 | Name: rcuob/%d, rcuop/%d, and rcuos/%d | 323 | Name: |
248 | Purpose: Offload RCU callbacks from the corresponding CPU. | 324 | rcuob/%d, rcuop/%d, and rcuos/%d |
325 | |||
326 | Purpose: | ||
327 | Offload RCU callbacks from the corresponding CPU. | ||
328 | |||
249 | To reduce its OS jitter, do at least one of the following: | 329 | To reduce its OS jitter, do at least one of the following: |
330 | |||
250 | 1. Use affinity, cgroups, or other mechanism to force these kthreads | 331 | 1. Use affinity, cgroups, or other mechanism to force these kthreads |
251 | to execute on some other CPU. | 332 | to execute on some other CPU. |
252 | 2. Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these | 333 | 2. Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these |
@@ -254,9 +335,14 @@ To reduce its OS jitter, do at least one of the following: | |||
254 | note that this will not eliminate OS jitter, but will instead | 335 | note that this will not eliminate OS jitter, but will instead |
255 | shift it to RCU_SOFTIRQ. | 336 | shift it to RCU_SOFTIRQ. |
256 | 337 | ||
257 | Name: watchdog/%u | 338 | Name: |
258 | Purpose: Detect software lockups on each CPU. | 339 | watchdog/%u |
340 | |||
341 | Purpose: | ||
342 | Detect software lockups on each CPU. | ||
343 | |||
259 | To reduce its OS jitter, do at least one of the following: | 344 | To reduce its OS jitter, do at least one of the following: |
345 | |||
260 | 1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these | 346 | 1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these |
261 | kthreads from being created in the first place. | 347 | kthreads from being created in the first place. |
262 | 2. Boot with "nosoftlockup=0", which will also prevent these kthreads | 348 | 2. Boot with "nosoftlockup=0", which will also prevent these kthreads |
diff --git a/Documentation/kobject.txt b/Documentation/kobject.txt index 1be59a3a521c..fc9485d79061 100644 --- a/Documentation/kobject.txt +++ b/Documentation/kobject.txt | |||
@@ -1,13 +1,13 @@ | |||
1 | ===================================================================== | ||
1 | Everything you never wanted to know about kobjects, ksets, and ktypes | 2 | Everything you never wanted to know about kobjects, ksets, and ktypes |
3 | ===================================================================== | ||
2 | 4 | ||
3 | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 5 | :Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
6 | :Last updated: December 19, 2007 | ||
4 | 7 | ||
5 | Based on an original article by Jon Corbet for lwn.net written October 1, | 8 | Based on an original article by Jon Corbet for lwn.net written October 1, |
6 | 2003 and located at http://lwn.net/Articles/51437/ | 9 | 2003 and located at http://lwn.net/Articles/51437/ |
7 | 10 | ||
8 | Last updated December 19, 2007 | ||
9 | |||
10 | |||
11 | Part of the difficulty in understanding the driver model - and the kobject | 11 | Part of the difficulty in understanding the driver model - and the kobject |
12 | abstraction upon which it is built - is that there is no obvious starting | 12 | abstraction upon which it is built - is that there is no obvious starting |
13 | place. Dealing with kobjects requires understanding a few different types, | 13 | place. Dealing with kobjects requires understanding a few different types, |
@@ -47,6 +47,7 @@ approach will be taken, so we'll go back to kobjects. | |||
47 | 47 | ||
48 | 48 | ||
49 | Embedding kobjects | 49 | Embedding kobjects |
50 | ================== | ||
50 | 51 | ||
51 | It is rare for kernel code to create a standalone kobject, with one major | 52 | It is rare for kernel code to create a standalone kobject, with one major |
52 | exception explained below. Instead, kobjects are used to control access to | 53 | exception explained below. Instead, kobjects are used to control access to |
@@ -65,7 +66,7 @@ their own, but are invariably found embedded in the larger objects of | |||
65 | interest.) | 66 | interest.) |
66 | 67 | ||
67 | So, for example, the UIO code in drivers/uio/uio.c has a structure that | 68 | So, for example, the UIO code in drivers/uio/uio.c has a structure that |
68 | defines the memory region associated with a uio device: | 69 | defines the memory region associated with a uio device:: |
69 | 70 | ||
70 | struct uio_map { | 71 | struct uio_map { |
71 | struct kobject kobj; | 72 | struct kobject kobj; |
@@ -77,7 +78,7 @@ just a matter of using the kobj member. Code that works with kobjects will | |||
77 | often have the opposite problem, however: given a struct kobject pointer, | 78 | often have the opposite problem, however: given a struct kobject pointer, |
78 | what is the pointer to the containing structure? You must avoid tricks | 79 | what is the pointer to the containing structure? You must avoid tricks |
79 | (such as assuming that the kobject is at the beginning of the structure) | 80 | (such as assuming that the kobject is at the beginning of the structure) |
80 | and, instead, use the container_of() macro, found in <linux/kernel.h>: | 81 | and, instead, use the container_of() macro, found in <linux/kernel.h>:: |
81 | 82 | ||
82 | container_of(pointer, type, member) | 83 | container_of(pointer, type, member) |
83 | 84 | ||
@@ -90,13 +91,13 @@ where: | |||
90 | The return value from container_of() is a pointer to the corresponding | 91 | The return value from container_of() is a pointer to the corresponding |
91 | container type. So, for example, a pointer "kp" to a struct kobject | 92 | container type. So, for example, a pointer "kp" to a struct kobject |
92 | embedded *within* a struct uio_map could be converted to a pointer to the | 93 | embedded *within* a struct uio_map could be converted to a pointer to the |
93 | *containing* uio_map structure with: | 94 | *containing* uio_map structure with:: |
94 | 95 | ||
95 | struct uio_map *u_map = container_of(kp, struct uio_map, kobj); | 96 | struct uio_map *u_map = container_of(kp, struct uio_map, kobj); |
96 | 97 | ||
97 | For convenience, programmers often define a simple macro for "back-casting" | 98 | For convenience, programmers often define a simple macro for "back-casting" |
98 | kobject pointers to the containing type. Exactly this happens in the | 99 | kobject pointers to the containing type. Exactly this happens in the |
99 | earlier drivers/uio/uio.c, as you can see here: | 100 | earlier drivers/uio/uio.c, as you can see here:: |
100 | 101 | ||
101 | struct uio_map { | 102 | struct uio_map { |
102 | struct kobject kobj; | 103 | struct kobject kobj; |
@@ -106,23 +107,25 @@ earlier drivers/uio/uio.c, as you can see here: | |||
106 | #define to_map(map) container_of(map, struct uio_map, kobj) | 107 | #define to_map(map) container_of(map, struct uio_map, kobj) |
107 | 108 | ||
108 | where the macro argument "map" is a pointer to the struct kobject in | 109 | where the macro argument "map" is a pointer to the struct kobject in |
109 | question. That macro is subsequently invoked with: | 110 | question. That macro is subsequently invoked with:: |
110 | 111 | ||
111 | struct uio_map *map = to_map(kobj); | 112 | struct uio_map *map = to_map(kobj); |
112 | 113 | ||
113 | 114 | ||
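The container_of()/to_map() back-cast described above can be exercised outside the kernel. The following standalone sketch re-creates the macro with offsetof() and mirrors the uio_map example; note that `struct kobject` here is a minimal stand-in and the field values are invented for illustration, not kernel definitions:

```c
/* Standalone illustration of the container_of() back-cast pattern.
 * The names mirror the uio_map example above, but struct kobject is
 * a minimal stand-in -- this is not kernel code. */
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct kobject {
	const char *name;	/* stand-in for the real struct kobject */
};

struct uio_map {
	long pad;		/* kobj is deliberately NOT the first member */
	struct kobject kobj;
	unsigned long addr;
};

/* The conventional "back-casting" helper macro. */
#define to_map(map) container_of(map, struct uio_map, kobj)
```

Given only a `struct kobject *` (as a sysfs callback would receive), to_map() recovers the containing uio_map even though kobj is not the first member, which is exactly why the trick of assuming the kobject sits at offset zero must be avoided.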
114 | Initialization of kobjects | 115 | Initialization of kobjects |
116 | ========================== | ||
115 | 117 | ||
116 | Code which creates a kobject must, of course, initialize that object. Some | 118 | Code which creates a kobject must, of course, initialize that object. Some |
117 | of the internal fields are setup with a (mandatory) call to kobject_init(): | 119 | of the internal fields are setup with a (mandatory) call to kobject_init():: |
118 | 120 | ||
119 | void kobject_init(struct kobject *kobj, struct kobj_type *ktype); | 121 | void kobject_init(struct kobject *kobj, struct kobj_type *ktype); |
120 | 122 | ||
121 | The ktype is required for a kobject to be created properly, as every kobject | 123 | The ktype is required for a kobject to be created properly, as every kobject |
122 | must have an associated kobj_type. After calling kobject_init(), to | 124 | must have an associated kobj_type. After calling kobject_init(), to |
123 | register the kobject with sysfs, the function kobject_add() must be called: | 125 | register the kobject with sysfs, the function kobject_add() must be called:: |
124 | 126 | ||
125 | int kobject_add(struct kobject *kobj, struct kobject *parent, const char *fmt, ...); | 127 | int kobject_add(struct kobject *kobj, struct kobject *parent, |
128 | const char *fmt, ...); | ||
126 | 129 | ||
127 | This sets up the parent of the kobject and the name for the kobject | 130 | This sets up the parent of the kobject and the name for the kobject |
128 | properly. If the kobject is to be associated with a specific kset, | 131 | properly. If the kobject is to be associated with a specific kset, |
@@ -133,7 +136,7 @@ kset itself. | |||
133 | 136 | ||
134 | As the name of the kobject is set when it is added to the kernel, the name | 137 | As the name of the kobject is set when it is added to the kernel, the name |
135 | of the kobject should never be manipulated directly. If you must change | 138 | of the kobject should never be manipulated directly. If you must change |
136 | the name of the kobject, call kobject_rename(): | 139 | the name of the kobject, call kobject_rename():: |
137 | 140 | ||
138 | int kobject_rename(struct kobject *kobj, const char *new_name); | 141 | int kobject_rename(struct kobject *kobj, const char *new_name); |
139 | 142 | ||
@@ -146,12 +149,12 @@ is being removed. If your code needs to call this function, it is | |||
146 | incorrect and needs to be fixed. | 149 | incorrect and needs to be fixed. |
147 | 150 | ||
148 | To properly access the name of the kobject, use the function | 151 | To properly access the name of the kobject, use the function |
149 | kobject_name(): | 152 | kobject_name():: |
150 | 153 | ||
151 | const char *kobject_name(const struct kobject * kobj); | 154 | const char *kobject_name(const struct kobject * kobj); |
152 | 155 | ||
153 | There is a helper function to both initialize and add the kobject to the | 156 | There is a helper function to both initialize and add the kobject to the |
154 | kernel at the same time, called surprisingly enough kobject_init_and_add(): | 157 | kernel at the same time, called surprisingly enough kobject_init_and_add():: |
155 | 158 | ||
156 | int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype, | 159 | int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype, |
157 | struct kobject *parent, const char *fmt, ...); | 160 | struct kobject *parent, const char *fmt, ...); |
@@ -161,10 +164,11 @@ kobject_add() functions described above. | |||
161 | 164 | ||
162 | 165 | ||
163 | Uevents | 166 | Uevents |
167 | ======= | ||
164 | 168 | ||
165 | After a kobject has been registered with the kobject core, you need to | 169 | After a kobject has been registered with the kobject core, you need to |
166 | announce to the world that it has been created. This can be done with a | 170 | announce to the world that it has been created. This can be done with a |
167 | call to kobject_uevent(): | 171 | call to kobject_uevent():: |
168 | 172 | ||
169 | int kobject_uevent(struct kobject *kobj, enum kobject_action action); | 173 | int kobject_uevent(struct kobject *kobj, enum kobject_action action); |
170 | 174 | ||
@@ -180,11 +184,12 @@ hand. | |||
180 | 184 | ||
181 | 185 | ||
182 | Reference counts | 186 | Reference counts |
187 | ================ | ||
183 | 188 | ||
184 | One of the key functions of a kobject is to serve as a reference counter | 189 | One of the key functions of a kobject is to serve as a reference counter |
185 | for the object in which it is embedded. As long as references to the object | 190 | for the object in which it is embedded. As long as references to the object |
186 | exist, the object (and the code which supports it) must continue to exist. | 191 | exist, the object (and the code which supports it) must continue to exist. |
187 | The low-level functions for manipulating a kobject's reference counts are: | 192 | The low-level functions for manipulating a kobject's reference counts are:: |
188 | 193 | ||
189 | struct kobject *kobject_get(struct kobject *kobj); | 194 | struct kobject *kobject_get(struct kobject *kobj); |
190 | void kobject_put(struct kobject *kobj); | 195 | void kobject_put(struct kobject *kobj); |
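The get/put discipline above can be sketched in plain C. This is a hypothetical userspace analogy (my_obj, my_get(), my_put(), and the released flag are invented for illustration, not kernel API) in which the object is freed only from its release callback, once the last reference is dropped:

```c
/* Userspace analogy of kobject reference counting: the last my_put()
 * invokes the release callback, which is the only place the object
 * may be freed.  All names here are illustrative, not kernel API. */
#include <stdlib.h>

struct my_obj {
	int refcount;
	void (*release)(struct my_obj *obj);
};

static int released;	/* lets callers observe the callback firing */

static void my_release(struct my_obj *obj)
{
	released = 1;
	free(obj);		/* object destroyed here, nowhere else */
}

static struct my_obj *my_get(struct my_obj *obj)
{
	obj->refcount++;
	return obj;
}

static void my_put(struct my_obj *obj)
{
	if (--obj->refcount == 0)
		obj->release(obj);
}

static struct my_obj *my_obj_create(void)
{
	struct my_obj *obj = calloc(1, sizeof(*obj));

	obj->refcount = 1;	/* creator holds the initial reference */
	obj->release = my_release;
	return obj;
}
```

As with real kobjects, any code that still holds a pointer must have taken its own reference first; letting the count reach zero while another user can still dereference the object would be a use-after-free.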
@@ -209,21 +214,24 @@ file Documentation/kref.txt in the Linux kernel source tree. | |||
209 | 214 | ||
210 | 215 | ||
211 | Creating "simple" kobjects | 216 | Creating "simple" kobjects |
217 | ========================== | ||
212 | 218 | ||
213 | Sometimes all that a developer wants is a way to create a simple directory | 219 | Sometimes all that a developer wants is a way to create a simple directory |
214 | in the sysfs hierarchy, and not have to mess with the whole complication of | 220 | in the sysfs hierarchy, and not have to mess with the whole complication of |
215 | ksets, show and store functions, and other details. This is the one | 221 | ksets, show and store functions, and other details. This is the one |
216 | exception where a single kobject should be created. To create such an | 222 | exception where a single kobject should be created. To create such an |
217 | entry, use the function: | 223 | entry, use the function:: |
218 | 224 | ||
219 | struct kobject *kobject_create_and_add(char *name, struct kobject *parent); | 225 | struct kobject *kobject_create_and_add(char *name, struct kobject *parent); |
220 | 226 | ||
221 | This function will create a kobject and place it in sysfs in the location | 227 | This function will create a kobject and place it in sysfs in the location |
222 | underneath the specified parent kobject. To create simple attributes | 228 | underneath the specified parent kobject. To create simple attributes |
223 | associated with this kobject, use: | 229 | associated with this kobject, use:: |
224 | 230 | ||
225 | int sysfs_create_file(struct kobject *kobj, struct attribute *attr); | 231 | int sysfs_create_file(struct kobject *kobj, struct attribute *attr); |
226 | or | 232 | |
233 | or:: | ||
234 | |||
227 | int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp); | 235 | int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp); |
228 | 236 | ||
229 | Both types of attributes used here, with a kobject that has been created | 237 | Both types of attributes used here, with a kobject that has been created |
@@ -236,6 +244,7 @@ implementation of a simple kobject and attributes. | |||
236 | 244 | ||
237 | 245 | ||
238 | ktypes and release methods | 246 | ktypes and release methods |
247 | ========================== | ||
239 | 248 | ||
240 | One important thing still missing from the discussion is what happens to a | 249 | One important thing still missing from the discussion is what happens to a |
241 | kobject when its reference count reaches zero. The code which created the | 250 | kobject when its reference count reaches zero. The code which created the |
@@ -257,7 +266,7 @@ is good practice to always use kobject_put() after kobject_init() to avoid | |||
257 | errors creeping in. | 266 | errors creeping in. |
258 | 267 | ||
259 | This notification is done through a kobject's release() method. Usually | 268 | This notification is done through a kobject's release() method. Usually |
260 | such a method has a form like: | 269 | such a method has a form like:: |
261 | 270 | ||
262 | void my_object_release(struct kobject *kobj) | 271 | void my_object_release(struct kobject *kobj) |
263 | { | 272 | { |
@@ -281,7 +290,7 @@ leak in the kobject core, which makes people unhappy. | |||
281 | 290 | ||
282 | Interestingly, the release() method is not stored in the kobject itself; | 291 | Interestingly, the release() method is not stored in the kobject itself; |
283 | instead, it is associated with the ktype. So let us introduce struct | 292 | instead, it is associated with the ktype. So let us introduce struct |
284 | kobj_type: | 293 | kobj_type:: |
285 | 294 | ||
286 | struct kobj_type { | 295 | struct kobj_type { |
287 | void (*release)(struct kobject *kobj); | 296 | void (*release)(struct kobject *kobj); |
@@ -306,6 +315,7 @@ automatically created for any kobject that is registered with this ktype. | |||
306 | 315 | ||
307 | 316 | ||
308 | ksets | 317 | ksets |
318 | ===== | ||
309 | 319 | ||
310 | A kset is merely a collection of kobjects that want to be associated with | 320 | A kset is merely a collection of kobjects that want to be associated with |
311 | each other. There is no restriction that they be of the same ktype, but be | 321 | each other. There is no restriction that they be of the same ktype, but be |
@@ -335,13 +345,16 @@ kobject) in their parent. | |||
335 | 345 | ||
336 | As a kset contains a kobject within it, it should always be dynamically | 346 | As a kset contains a kobject within it, it should always be dynamically |
337 | created and never declared statically or on the stack. To create a new | 347 | created and never declared statically or on the stack. To create a new |
338 | kset use: | 348 | kset use:: |
349 | |||
339 | struct kset *kset_create_and_add(const char *name, | 350 | struct kset *kset_create_and_add(const char *name, |
340 | struct kset_uevent_ops *u, | 351 | struct kset_uevent_ops *u, |
341 | struct kobject *parent); | 352 | struct kobject *parent); |
342 | 353 | ||
343 | When you are finished with the kset, call: | 354 | When you are finished with the kset, call:: |
355 | |||
344 | void kset_unregister(struct kset *kset); | 356 | void kset_unregister(struct kset *kset); |
357 | |||
345 | to destroy it. This removes the kset from sysfs and decrements its reference | 358 | to destroy it. This removes the kset from sysfs and decrements its reference |
346 | count. When the reference count goes to zero, the kset will be released. | 359 | count. When the reference count goes to zero, the kset will be released. |
347 | Because other references to the kset may still exist, the release may happen | 360 | Because other references to the kset may still exist, the release may happen |
@@ -351,14 +364,14 @@ An example of using a kset can be seen in the | |||
351 | samples/kobject/kset-example.c file in the kernel tree. | 364 | samples/kobject/kset-example.c file in the kernel tree. |
352 | 365 | ||
353 | If a kset wishes to control the uevent operations of the kobjects | 366 | If a kset wishes to control the uevent operations of the kobjects |
354 | associated with it, it can use the struct kset_uevent_ops to handle it: | 367 | associated with it, it can use the struct kset_uevent_ops to handle it:: |
355 | 368 | ||
356 | struct kset_uevent_ops { | 369 | struct kset_uevent_ops { |
357 | int (*filter)(struct kset *kset, struct kobject *kobj); | 370 | int (*filter)(struct kset *kset, struct kobject *kobj); |
358 | const char *(*name)(struct kset *kset, struct kobject *kobj); | 371 | const char *(*name)(struct kset *kset, struct kobject *kobj); |
359 | int (*uevent)(struct kset *kset, struct kobject *kobj, | 372 | int (*uevent)(struct kset *kset, struct kobject *kobj, |
360 | struct kobj_uevent_env *env); | 373 | struct kobj_uevent_env *env); |
361 | }; | 374 | }; |
362 | 375 | ||
363 | 376 | ||
364 | The filter function allows a kset to prevent a uevent from being emitted to | 377 | The filter function allows a kset to prevent a uevent from being emitted to |
@@ -386,6 +399,7 @@ added below the parent kobject. | |||
386 | 399 | ||
387 | 400 | ||
388 | Kobject removal | 401 | Kobject removal |
402 | =============== | ||
389 | 403 | ||
390 | After a kobject has been registered with the kobject core successfully, it | 404 | After a kobject has been registered with the kobject core successfully, it |
391 | must be cleaned up when the code is finished with it. To do that, call | 405 | must be cleaned up when the code is finished with it. To do that, call |
@@ -409,6 +423,7 @@ called, and the objects in the former circle release each other. | |||
409 | 423 | ||
410 | 424 | ||
411 | Example code to copy from | 425 | Example code to copy from |
426 | ========================= | ||
412 | 427 | ||
413 | For a more complete example of using ksets and kobjects properly, see the | 428 | For a more complete example of using ksets and kobjects properly, see the |
414 | example programs samples/kobject/{kobject-example.c,kset-example.c}, | 429 | example programs samples/kobject/{kobject-example.c,kset-example.c}, |
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 1f6d45abfe42..2335715bf471 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt | |||
@@ -1,30 +1,36 @@ | |||
1 | Title : Kernel Probes (Kprobes) | 1 | ======================= |
2 | Authors : Jim Keniston <jkenisto@us.ibm.com> | 2 | Kernel Probes (Kprobes) |
3 | : Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com> | 3 | ======================= |
4 | : Masami Hiramatsu <mhiramat@redhat.com> | 4 | |
5 | 5 | :Author: Jim Keniston <jkenisto@us.ibm.com> | |
6 | CONTENTS | 6 | :Author: Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com> |
7 | 7 | :Author: Masami Hiramatsu <mhiramat@redhat.com> | |
8 | 1. Concepts: Kprobes, Jprobes, Return Probes | 8 | |
9 | 2. Architectures Supported | 9 | .. CONTENTS |
10 | 3. Configuring Kprobes | 10 | |
11 | 4. API Reference | 11 | 1. Concepts: Kprobes, Jprobes, Return Probes |
12 | 5. Kprobes Features and Limitations | 12 | 2. Architectures Supported |
13 | 6. Probe Overhead | 13 | 3. Configuring Kprobes |
14 | 7. TODO | 14 | 4. API Reference |
15 | 8. Kprobes Example | 15 | 5. Kprobes Features and Limitations |
16 | 9. Jprobes Example | 16 | 6. Probe Overhead |
17 | 10. Kretprobes Example | 17 | 7. TODO |
18 | Appendix A: The kprobes debugfs interface | 18 | 8. Kprobes Example |
19 | Appendix B: The kprobes sysctl interface | 19 | 9. Jprobes Example |
20 | 20 | 10. Kretprobes Example | |
21 | 1. Concepts: Kprobes, Jprobes, Return Probes | 21 | Appendix A: The kprobes debugfs interface |
22 | Appendix B: The kprobes sysctl interface | ||
23 | |||
24 | Concepts: Kprobes, Jprobes, Return Probes | ||
25 | ========================================= | ||
22 | 26 | ||
23 | Kprobes enables you to dynamically break into any kernel routine and | 27 | Kprobes enables you to dynamically break into any kernel routine and |
24 | collect debugging and performance information non-disruptively. You | 28 | collect debugging and performance information non-disruptively. You |
25 | can trap at almost any kernel code address(*), specifying a handler | 29 | can trap at almost any kernel code address [1]_, specifying a handler |
26 | routine to be invoked when the breakpoint is hit. | 30 | routine to be invoked when the breakpoint is hit. |
27 | (*: some parts of the kernel code can not be trapped, see 1.5 Blacklist) | 31 | |
27 | (*: some parts of the kernel code can not be trapped, see 1.5 Blacklist) | 32 | .. [1] some parts of the kernel code cannot be trapped, see |
33 | :ref:`kprobes_blacklist` | ||
28 | 34 | ||
29 | There are currently three types of probes: kprobes, jprobes, and | 35 | There are currently three types of probes: kprobes, jprobes, and |
30 | kretprobes (also called return probes). A kprobe can be inserted | 36 | kretprobes (also called return probes). A kprobe can be inserted |
@@ -40,8 +46,8 @@ registration function such as register_kprobe() specifies where | |||
40 | the probe is to be inserted and what handler is to be called when | 46 | the probe is to be inserted and what handler is to be called when |
41 | the probe is hit. | 47 | the probe is hit. |
42 | 48 | ||
43 | There are also register_/unregister_*probes() functions for batch | 49 | There are also ``register_/unregister_*probes()`` functions for batch |
44 | registration/unregistration of a group of *probes. These functions | 50 | registration/unregistration of a group of ``*probes``. These functions |
45 | can speed up unregistration process when you have to unregister | 51 | can speed up unregistration process when you have to unregister |
46 | a lot of probes at once. | 52 | a lot of probes at once. |
47 | 53 | ||
@@ -51,9 +57,10 @@ things that you'll need to know in order to make the best use of | |||
51 | Kprobes -- e.g., the difference between a pre_handler and | 57 | Kprobes -- e.g., the difference between a pre_handler and |
52 | a post_handler, and how to use the maxactive and nmissed fields of | 58 | a post_handler, and how to use the maxactive and nmissed fields of |
53 | a kretprobe. But if you're in a hurry to start using Kprobes, you | 59 | a kretprobe. But if you're in a hurry to start using Kprobes, you |
54 | can skip ahead to section 2. | 60 | can skip ahead to :ref:`kprobes_archs_supported`. |
55 | 61 | ||
56 | 1.1 How Does a Kprobe Work? | 62 | How Does a Kprobe Work? |
63 | ----------------------- | ||
57 | 64 | ||
58 | When a kprobe is registered, Kprobes makes a copy of the probed | 65 | When a kprobe is registered, Kprobes makes a copy of the probed |
59 | instruction and replaces the first byte(s) of the probed instruction | 66 | instruction and replaces the first byte(s) of the probed instruction |
@@ -75,7 +82,8 @@ After the instruction is single-stepped, Kprobes executes the | |||
75 | "post_handler," if any, that is associated with the kprobe. | 82 | "post_handler," if any, that is associated with the kprobe. |
76 | Execution then continues with the instruction following the probepoint. | 83 | Execution then continues with the instruction following the probepoint. |
77 | 84 | ||
78 | 1.2 How Does a Jprobe Work? | 85 | How Does a Jprobe Work? |
86 | ----------------------- | ||
79 | 87 | ||
80 | A jprobe is implemented using a kprobe that is placed on a function's | 88 | A jprobe is implemented using a kprobe that is placed on a function's |
81 | entry point. It employs a simple mirroring principle to allow | 89 | entry point. It employs a simple mirroring principle to allow |
@@ -113,9 +121,11 @@ more than eight function arguments, an argument of more than sixteen | |||
113 | bytes, or more than 64 bytes of argument data, depending on | 121 | bytes, or more than 64 bytes of argument data, depending on |
114 | architecture). | 122 | architecture). |
115 | 123 | ||
116 | 1.3 Return Probes | 124 | Return Probes |
125 | ------------- | ||
117 | 126 | ||
118 | 1.3.1 How Does a Return Probe Work? | 127 | How Does a Return Probe Work? |
128 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
119 | 129 | ||
120 | When you call register_kretprobe(), Kprobes establishes a kprobe at | 130 | When you call register_kretprobe(), Kprobes establishes a kprobe at |
121 | the entry to the function. When the probed function is called and this | 131 | the entry to the function. When the probed function is called and this |
@@ -150,7 +160,8 @@ zero when the return probe is registered, and is incremented every | |||
150 | time the probed function is entered but there is no kretprobe_instance | 160 | time the probed function is entered but there is no kretprobe_instance |
151 | object available for establishing the return probe. | 161 | object available for establishing the return probe. |
152 | 162 | ||
153 | 1.3.2 Kretprobe entry-handler | 163 | Kretprobe entry-handler |
164 | ^^^^^^^^^^^^^^^^^^^^^^^ | ||
154 | 165 | ||
155 | Kretprobes also provides an optional user-specified handler which runs | 166 | Kretprobes also provides an optional user-specified handler which runs |
156 | on function entry. This handler is specified by setting the entry_handler | 167 | on function entry. This handler is specified by setting the entry_handler |
@@ -174,7 +185,10 @@ In case probed function is entered but there is no kretprobe_instance | |||
174 | object available, then in addition to incrementing the nmissed count, | 185 | object available, then in addition to incrementing the nmissed count, |
175 | the user entry_handler invocation is also skipped. | 186 | the user entry_handler invocation is also skipped. |
176 | 187 | ||
177 | 1.4 How Does Jump Optimization Work? | 188 | .. _kprobes_jump_optimization: |
189 | |||
190 | How Does Jump Optimization Work? | ||
191 | -------------------------------- | ||
178 | 192 | ||
179 | If your kernel is built with CONFIG_OPTPROBES=y (currently this flag | 193 | If your kernel is built with CONFIG_OPTPROBES=y (currently this flag |
180 | is automatically set 'y' on x86/x86-64, non-preemptive kernel) and | 194 | is automatically set 'y' on x86/x86-64, non-preemptive kernel) and |
@@ -182,53 +196,60 @@ the "debug.kprobes_optimization" kernel parameter is set to 1 (see | |||
182 | sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump | 196 | sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump |
183 | instruction instead of a breakpoint instruction at each probepoint. | 197 | instruction instead of a breakpoint instruction at each probepoint. |
184 | 198 | ||
185 | 1.4.1 Init a Kprobe | 199 | Init a Kprobe |
200 | ^^^^^^^^^^^^^ | ||
186 | 201 | ||
187 | When a probe is registered, before attempting this optimization, | 202 | When a probe is registered, before attempting this optimization, |
188 | Kprobes inserts an ordinary, breakpoint-based kprobe at the specified | 203 | Kprobes inserts an ordinary, breakpoint-based kprobe at the specified |
189 | address. So, even if it's not possible to optimize this particular | 204 | address. So, even if it's not possible to optimize this particular |
190 | probepoint, there'll be a probe there. | 205 | probepoint, there'll be a probe there. |
191 | 206 | ||
192 | 1.4.2 Safety Check | 207 | Safety Check |
208 | ^^^^^^^^^^^^ | ||
193 | 209 | ||
194 | Before optimizing a probe, Kprobes performs the following safety checks: | 210 | Before optimizing a probe, Kprobes performs the following safety checks: |
195 | 211 | ||
196 | - Kprobes verifies that the region that will be replaced by the jump | 212 | - Kprobes verifies that the region that will be replaced by the jump |
197 | instruction (the "optimized region") lies entirely within one function. | 213 | instruction (the "optimized region") lies entirely within one function. |
198 | (A jump instruction is multiple bytes, and so may overlay multiple | 214 | (A jump instruction is multiple bytes, and so may overlay multiple |
199 | instructions.) | 215 | instructions.) |
200 | 216 | ||
201 | - Kprobes analyzes the entire function and verifies that there is no | 217 | - Kprobes analyzes the entire function and verifies that there is no |
202 | jump into the optimized region. Specifically: | 218 | jump into the optimized region. Specifically: |
219 | |||
203 | - the function contains no indirect jump; | 220 | - the function contains no indirect jump; |
204 | - the function contains no instruction that causes an exception (since | 221 | - the function contains no instruction that causes an exception (since |
205 | the fixup code triggered by the exception could jump back into the | 222 | the fixup code triggered by the exception could jump back into the |
206 | optimized region -- Kprobes checks the exception tables to verify this); | 223 | optimized region -- Kprobes checks the exception tables to verify this); |
207 | and | ||
208 | - there is no near jump to the optimized region (other than to the first | 224 | - there is no near jump to the optimized region (other than to the first |
209 | byte). | 225 | byte). |
210 | 226 | ||
211 | - For each instruction in the optimized region, Kprobes verifies that | 227 | - For each instruction in the optimized region, Kprobes verifies that |
212 | the instruction can be executed out of line. | 228 | the instruction can be executed out of line. |
213 | 229 | ||
214 | 1.4.3 Preparing Detour Buffer | 230 | Preparing Detour Buffer |
231 | ^^^^^^^^^^^^^^^^^^^^^^^ | ||
215 | 232 | ||
216 | Next, Kprobes prepares a "detour" buffer, which contains the following | 233 | Next, Kprobes prepares a "detour" buffer, which contains the following |
217 | instruction sequence: | 234 | instruction sequence: |
235 | |||
218 | - code to push the CPU's registers (emulating a breakpoint trap) | 236 | - code to push the CPU's registers (emulating a breakpoint trap) |
219 | - a call to the trampoline code which calls user's probe handlers. | 237 | - a call to the trampoline code which calls user's probe handlers. |
220 | - code to restore registers | 238 | - code to restore registers |
221 | - the instructions from the optimized region | 239 | - the instructions from the optimized region |
222 | - a jump back to the original execution path. | 240 | - a jump back to the original execution path. |
223 | 241 | ||
224 | 1.4.4 Pre-optimization | 242 | Pre-optimization |
243 | ^^^^^^^^^^^^^^^^ | ||
225 | 244 | ||
226 | After preparing the detour buffer, Kprobes verifies that none of the | 245 | After preparing the detour buffer, Kprobes verifies that none of the |
227 | following situations exist: | 246 | following situations exist: |
247 | |||
228 | - The probe has either a break_handler (i.e., it's a jprobe) or a | 248 | - The probe has either a break_handler (i.e., it's a jprobe) or a |
229 | post_handler. | 249 | post_handler. |
230 | - Other instructions in the optimized region are probed. | 250 | - Other instructions in the optimized region are probed. |
231 | - The probe is disabled. | 251 | - The probe is disabled. |
252 | |||
232 | In any of the above cases, Kprobes won't start optimizing the probe. | 253 | In any of the above cases, Kprobes won't start optimizing the probe. |
233 | Since these are temporary situations, Kprobes tries to start | 254 | Since these are temporary situations, Kprobes tries to start |
234 | optimizing it again if the situation changes. | 255 | optimizing it again if the situation changes. |
@@ -240,21 +261,23 @@ Kprobes returns control to the original instruction path by setting | |||
240 | the CPU's instruction pointer to the copied code in the detour buffer | 261 | the CPU's instruction pointer to the copied code in the detour buffer |
241 | -- thus at least avoiding the single-step. | 262 | -- thus at least avoiding the single-step. |
242 | 263 | ||
243 | 1.4.5 Optimization | 264 | Optimization |
265 | ^^^^^^^^^^^^ | ||
244 | 266 | ||
245 | The Kprobe-optimizer doesn't insert the jump instruction immediately; | 267 | The Kprobe-optimizer doesn't insert the jump instruction immediately; |
246 | rather, it calls synchronize_sched() for safety first, because it's | 268 | rather, it calls synchronize_sched() for safety first, because it's |
247 | possible for a CPU to be interrupted in the middle of executing the | 269 | possible for a CPU to be interrupted in the middle of executing the |
248 | optimized region(*). As you know, synchronize_sched() can ensure | 270 | optimized region [3]_. As you know, synchronize_sched() can ensure |
249 | that all interruptions that were active when synchronize_sched() | 271 | that all interruptions that were active when synchronize_sched() |
250 | was called are done, but only if CONFIG_PREEMPT=n. So, this version | 272 | was called are done, but only if CONFIG_PREEMPT=n. So, this version |
251 | of kprobe optimization supports only kernels with CONFIG_PREEMPT=n.(**) | 273 | of kprobe optimization supports only kernels with CONFIG_PREEMPT=n [4]_. |
252 | 274 | ||
253 | After that, the Kprobe-optimizer calls stop_machine() to replace | 275 | After that, the Kprobe-optimizer calls stop_machine() to replace |
254 | the optimized region with a jump instruction to the detour buffer, | 276 | the optimized region with a jump instruction to the detour buffer, |
255 | using text_poke_smp(). | 277 | using text_poke_smp(). |
256 | 278 | ||
257 | 1.4.6 Unoptimization | 279 | Unoptimization |
280 | ^^^^^^^^^^^^^^ | ||
258 | 281 | ||
259 | When an optimized kprobe is unregistered, disabled, or blocked by | 282 | When an optimized kprobe is unregistered, disabled, or blocked by |
260 | another kprobe, it will be unoptimized. If this happens before | 283 | another kprobe, it will be unoptimized. If this happens before |
@@ -263,15 +286,15 @@ optimized list. If the optimization has been done, the jump is | |||
263 | replaced with the original code (except for an int3 breakpoint in | 286 | replaced with the original code (except for an int3 breakpoint in |
264 | the first byte) by using text_poke_smp(). | 287 | the first byte) by using text_poke_smp(). |
265 | 288 | ||
266 | (*)Please imagine that the 2nd instruction is interrupted and then | 289 | .. [3] Please imagine that the 2nd instruction is interrupted and then |
267 | the optimizer replaces the 2nd instruction with the jump *address* | 290 | the optimizer replaces the 2nd instruction with the jump *address* |
268 | while the interrupt handler is running. When the interrupt | 291 | while the interrupt handler is running. When the interrupt |
269 | returns to the original address, there is no valid instruction, | 292 | returns to the original address, there is no valid instruction, |
270 | and it causes an unexpected result. | 293 | and it causes an unexpected result. |
271 | 294 | ||
272 | (**)This optimization-safety checking may be replaced with the | 295 | .. [4] This optimization-safety checking may be replaced with the |
273 | stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y | 296 | stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y |
274 | kernel. | 297 | kernel. |
275 | 298 | ||
276 | NOTE for geeks: | 299 | NOTE for geeks: |
277 | The jump optimization changes the kprobe's pre_handler behavior. | 300 | The jump optimization changes the kprobe's pre_handler behavior. |
@@ -280,11 +303,17 @@ path by changing regs->ip and returning 1. However, when the probe | |||
280 | is optimized, that modification is ignored. Thus, if you want to | 303 | is optimized, that modification is ignored. Thus, if you want to |
281 | tweak the kernel's execution path, you need to suppress optimization, | 304 | tweak the kernel's execution path, you need to suppress optimization, |
282 | using one of the following techniques: | 305 | using one of the following techniques: |
306 | |||
283 | - Specify an empty function for the kprobe's post_handler or break_handler. | 307 | - Specify an empty function for the kprobe's post_handler or break_handler. |
284 | or | 308 | |
309 | or | ||
310 | |||
285 | - Execute 'sysctl -w debug.kprobes_optimization=n' | 311 | - Execute 'sysctl -w debug.kprobes_optimization=n' |
286 | 312 | ||
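As a sketch of the first technique above (all symbol names here are hypothetical, not from the kernel tree): an empty post_handler pins the probe to the breakpoint-based, unoptimized path, so a pre_handler that redirects execution via regs->ip keeps working.

```c
#include <linux/kprobes.h>

static void my_replacement(void);	/* hypothetical redirect target */

static int redirect_pre(struct kprobe *p, struct pt_regs *regs)
{
	regs->ip = (unsigned long)my_replacement;	/* change execution path */
	return 1;	/* 1: skip the probed instruction */
}

static void dummy_post(struct kprobe *p, struct pt_regs *regs,
		       unsigned long flags)
{
	/* intentionally empty: its mere presence suppresses jump optimization */
}

static struct kprobe kp = {
	.symbol_name	= "some_function",	/* illustrative */
	.pre_handler	= redirect_pre,
	.post_handler	= dummy_post,
};
```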
287 | 1.5 Blacklist | 313 | .. _kprobes_blacklist: |
314 | |||
315 | Blacklist | ||
316 | --------- | ||
288 | 317 | ||
289 | Kprobes can probe most of the kernel except itself. This means | 318 | Kprobes can probe most of the kernel except itself. This means |
290 | that there are some functions that kprobes cannot probe. Probing | 319 |
@@ -297,7 +326,10 @@ to specify a blacklisted function. | |||
297 | Kprobes checks the given probe address against the blacklist and | 326 | Kprobes checks the given probe address against the blacklist and |
298 | rejects registering it if the given address is in the blacklist. | 327 |
299 | 328 | ||
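A hedged sketch of how a function lands on the blacklist (the function name is hypothetical): code that kprobes itself depends on marks itself with the NOKPROBE_SYMBOL() macro, and register_kprobe() on such an address then fails.

```c
#include <linux/kprobes.h>

static int fragile_helper(int x)
{
	return x + 1;	/* imagine code that must never hit a breakpoint trap */
}
/* adds fragile_helper to the kprobes blacklist; registering a probe
 * on its address is rejected */
NOKPROBE_SYMBOL(fragile_helper);
```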
300 | 2. Architectures Supported | 329 | .. _kprobes_archs_supported: |
330 | |||
331 | Architectures Supported | ||
332 | ======================= | ||
301 | 333 | ||
302 | Kprobes, jprobes, and return probes are implemented on the following | 334 | Kprobes, jprobes, and return probes are implemented on the following |
303 | architectures: | 335 | architectures: |
@@ -312,7 +344,8 @@ architectures: | |||
312 | - mips | 344 | - mips |
313 | - s390 | 345 | - s390 |
314 | 346 | ||
315 | 3. Configuring Kprobes | 347 | Configuring Kprobes |
348 | =================== | ||
316 | 349 | ||
317 | When configuring the kernel using make menuconfig/xconfig/oldconfig, | 350 | When configuring the kernel using make menuconfig/xconfig/oldconfig, |
318 | ensure that CONFIG_KPROBES is set to "y". Under "General setup", look | 351 | ensure that CONFIG_KPROBES is set to "y". Under "General setup", look |
@@ -331,7 +364,8 @@ it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO), | |||
331 | so you can use "objdump -d -l vmlinux" to see the source-to-object | 364 | so you can use "objdump -d -l vmlinux" to see the source-to-object |
332 | code mapping. | 365 | code mapping. |
333 | 366 | ||
334 | 4. API Reference | 367 | API Reference |
368 | ============= | ||
335 | 369 | ||
336 | The Kprobes API includes a "register" function and an "unregister" | 370 | The Kprobes API includes a "register" function and an "unregister" |
337 | function for each type of probe. The API also includes "register_*probes" | 371 | function for each type of probe. The API also includes "register_*probes" |
@@ -340,10 +374,13 @@ Here are terse, mini-man-page specifications for these functions and | |||
340 | the associated probe handlers that you'll write. See the files in the | 374 | the associated probe handlers that you'll write. See the files in the |
341 | samples/kprobes/ sub-directory for examples. | 375 | samples/kprobes/ sub-directory for examples. |
342 | 376 | ||
343 | 4.1 register_kprobe | 377 | register_kprobe |
378 | --------------- | ||
379 | |||
380 | :: | ||
344 | 381 | ||
345 | #include <linux/kprobes.h> | 382 | #include <linux/kprobes.h> |
346 | int register_kprobe(struct kprobe *kp); | 383 | int register_kprobe(struct kprobe *kp); |
347 | 384 | ||
348 | Sets a breakpoint at the address kp->addr. When the breakpoint is | 385 | Sets a breakpoint at the address kp->addr. When the breakpoint is |
349 | hit, Kprobes calls kp->pre_handler. After the probed instruction | 386 | hit, Kprobes calls kp->pre_handler. After the probed instruction |
@@ -354,61 +391,68 @@ kp->fault_handler. Any or all handlers can be NULL. If kp->flags | |||
354 | is set KPROBE_FLAG_DISABLED, that kp will be registered but disabled, | 391 | is set KPROBE_FLAG_DISABLED, that kp will be registered but disabled, |
355 | so, its handlers aren't hit until calling enable_kprobe(kp). | 392 | so, its handlers aren't hit until calling enable_kprobe(kp). |
356 | 393 | ||
357 | NOTE: | 394 | .. note:: |
358 | 1. With the introduction of the "symbol_name" field to struct kprobe, | 395 | |
359 | the probepoint address resolution will now be taken care of by the kernel. | 396 | 1. With the introduction of the "symbol_name" field to struct kprobe, |
360 | The following will now work: | 397 | the probepoint address resolution will now be taken care of by the kernel. |
398 | The following will now work:: | ||
361 | 399 | ||
362 | kp.symbol_name = "symbol_name"; | 400 | kp.symbol_name = "symbol_name"; |
363 | 401 | ||
364 | (64-bit powerpc intricacies such as function descriptors are handled | 402 | (64-bit powerpc intricacies such as function descriptors are handled |
365 | transparently) | 403 | transparently) |
366 | 404 | ||
367 | 2. Use the "offset" field of struct kprobe if the offset into the symbol | 405 | 2. Use the "offset" field of struct kprobe if the offset into the symbol |
368 | to install a probepoint is known. This field is used to calculate the | 406 | to install a probepoint is known. This field is used to calculate the |
369 | probepoint. | 407 | probepoint. |
370 | 408 | ||
371 | 3. Specify either the kprobe "symbol_name" OR the "addr". If both are | 409 | 3. Specify either the kprobe "symbol_name" OR the "addr". If both are |
372 | specified, kprobe registration will fail with -EINVAL. | 410 | specified, kprobe registration will fail with -EINVAL. |
373 | 411 | ||
374 | 4. With CISC architectures (such as i386 and x86_64), the kprobes code | 412 | 4. With CISC architectures (such as i386 and x86_64), the kprobes code |
375 | does not validate if the kprobe.addr is at an instruction boundary. | 413 | does not validate if the kprobe.addr is at an instruction boundary. |
376 | Use "offset" with caution. | 414 | Use "offset" with caution. |
377 | 415 | ||
378 | register_kprobe() returns 0 on success, or a negative errno otherwise. | 416 | register_kprobe() returns 0 on success, or a negative errno otherwise. |
379 | 417 | ||
380 | User's pre-handler (kp->pre_handler): | 418 | User's pre-handler (kp->pre_handler):: |
381 | #include <linux/kprobes.h> | 419 | |
382 | #include <linux/ptrace.h> | 420 | #include <linux/kprobes.h> |
383 | int pre_handler(struct kprobe *p, struct pt_regs *regs); | 421 | #include <linux/ptrace.h> |
422 | int pre_handler(struct kprobe *p, struct pt_regs *regs); | ||
384 | 423 | ||
385 | Called with p pointing to the kprobe associated with the breakpoint, | 424 | Called with p pointing to the kprobe associated with the breakpoint, |
386 | and regs pointing to the struct containing the registers saved when | 425 | and regs pointing to the struct containing the registers saved when |
387 | the breakpoint was hit. Return 0 here unless you're a Kprobes geek. | 426 | the breakpoint was hit. Return 0 here unless you're a Kprobes geek. |
388 | 427 | ||
389 | User's post-handler (kp->post_handler): | 428 | User's post-handler (kp->post_handler):: |
390 | #include <linux/kprobes.h> | 429 | |
391 | #include <linux/ptrace.h> | 430 | #include <linux/kprobes.h> |
392 | void post_handler(struct kprobe *p, struct pt_regs *regs, | 431 | #include <linux/ptrace.h> |
393 | unsigned long flags); | 432 | void post_handler(struct kprobe *p, struct pt_regs *regs, |
433 | unsigned long flags); | ||
394 | 434 | ||
395 | p and regs are as described for the pre_handler. flags always seems | 435 | p and regs are as described for the pre_handler. flags always seems |
396 | to be zero. | 436 | to be zero. |
397 | 437 | ||
398 | User's fault-handler (kp->fault_handler): | 438 | User's fault-handler (kp->fault_handler):: |
399 | #include <linux/kprobes.h> | 439 | |
400 | #include <linux/ptrace.h> | 440 | #include <linux/kprobes.h> |
401 | int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr); | 441 | #include <linux/ptrace.h> |
442 | int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr); | ||
402 | 443 | ||
403 | p and regs are as described for the pre_handler. trapnr is the | 444 | p and regs are as described for the pre_handler. trapnr is the |
404 | architecture-specific trap number associated with the fault (e.g., | 445 | architecture-specific trap number associated with the fault (e.g., |
405 | on i386, 13 for a general protection fault or 14 for a page fault). | 446 | on i386, 13 for a general protection fault or 14 for a page fault). |
406 | Returns 1 if it successfully handled the exception. | 447 | Returns 1 if it successfully handled the exception. |
407 | 448 | ||
408 | 4.2 register_jprobe | 449 | register_jprobe |
450 | --------------- | ||
409 | 451 | ||
410 | #include <linux/kprobes.h> | 452 | :: |
411 | int register_jprobe(struct jprobe *jp) | 453 | |
454 | #include <linux/kprobes.h> | ||
455 | int register_jprobe(struct jprobe *jp) | ||
412 | 456 | ||
413 | Sets a breakpoint at the address jp->kp.addr, which must be the address | 457 | Sets a breakpoint at the address jp->kp.addr, which must be the address |
414 | of the first instruction of a function. When the breakpoint is hit, | 458 | of the first instruction of a function. When the breakpoint is hit, |
@@ -423,10 +467,13 @@ declaration must match. | |||
423 | 467 | ||
424 | register_jprobe() returns 0 on success, or a negative errno otherwise. | 468 | register_jprobe() returns 0 on success, or a negative errno otherwise. |
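A sketch of the entry-handler mirroring described above, in the spirit of samples/kprobes/jprobe_example.c (the probed symbol and its exact signature are assumptions and vary by kernel version): the handler's declaration must match the probed function, and it must end with jprobe_return().

```c
#include <linux/kprobes.h>

static long my_entry(unsigned long clone_flags, unsigned long stack_start,
		     unsigned long stack_size, int __user *parent_tidptr,
		     int __user *child_tidptr)
{
	pr_info("jprobe: clone_flags = 0x%lx\n", clone_flags);
	jprobe_return();	/* mandatory: hands control back to Kprobes */
	return 0;		/* never reached */
}

static struct jprobe my_jprobe = {
	.entry		= my_entry,
	.kp.symbol_name	= "_do_fork",	/* illustrative target */
};

/* in module init: register_jprobe(&my_jprobe);
 * in module exit: unregister_jprobe(&my_jprobe); */
```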
425 | 469 | ||
426 | 4.3 register_kretprobe | 470 | register_kretprobe |
471 | ------------------ | ||
472 | |||
473 | :: | ||
427 | 474 | ||
428 | #include <linux/kprobes.h> | 475 | #include <linux/kprobes.h> |
429 | int register_kretprobe(struct kretprobe *rp); | 476 | int register_kretprobe(struct kretprobe *rp); |
430 | 477 | ||
431 | Establishes a return probe for the function whose address is | 478 | Establishes a return probe for the function whose address is |
432 | rp->kp.addr. When that function returns, Kprobes calls rp->handler. | 479 | rp->kp.addr. When that function returns, Kprobes calls rp->handler. |
@@ -436,14 +483,17 @@ register_kretprobe(); see "How Does a Return Probe Work?" for details. | |||
436 | register_kretprobe() returns 0 on success, or a negative errno | 483 | register_kretprobe() returns 0 on success, or a negative errno |
437 | otherwise. | 484 | otherwise. |
438 | 485 | ||
439 | User's return-probe handler (rp->handler): | 486 | User's return-probe handler (rp->handler):: |
440 | #include <linux/kprobes.h> | 487 | |
441 | #include <linux/ptrace.h> | 488 | #include <linux/kprobes.h> |
442 | int kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs); | 489 | #include <linux/ptrace.h> |
490 | int kretprobe_handler(struct kretprobe_instance *ri, | ||
491 | struct pt_regs *regs); | ||
443 | 492 | ||
444 | regs is as described for kprobe.pre_handler. ri points to the | 493 | regs is as described for kprobe.pre_handler. ri points to the |
445 | kretprobe_instance object, of which the following fields may be | 494 | kretprobe_instance object, of which the following fields may be |
446 | of interest: | 495 | of interest: |
496 | |||
447 | - ret_addr: the return address | 497 | - ret_addr: the return address |
448 | - rp: points to the corresponding kretprobe object | 498 | - rp: points to the corresponding kretprobe object |
449 | - task: points to the corresponding task struct | 499 | - task: points to the corresponding task struct |
@@ -456,74 +506,94 @@ the architecture's ABI. | |||
456 | 506 | ||
457 | The handler's return value is currently ignored. | 507 | The handler's return value is currently ignored. |
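The fields above can be put to use as in this sketch, modeled loosely on samples/kprobes/kretprobe_example.c (the probed symbol is illustrative): the handler reads the return value through the architecture-neutral regs_return_value() accessor.

```c
#include <linux/kprobes.h>
#include <linux/ptrace.h>

static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	/* ri->rp points back to the kretprobe; regs_return_value()
	 * extracts the return value per the architecture's ABI */
	pr_info("%s returned %lu\n", ri->rp->kp.symbol_name,
		regs_return_value(regs));
	return 0;	/* return value is currently ignored */
}

static struct kretprobe my_kretprobe = {
	.handler	= ret_handler,
	.kp.symbol_name	= "some_function",	/* illustrative */
	.maxactive	= 20,	/* concurrent instances; misses bump nmissed */
};

/* in module init: register_kretprobe(&my_kretprobe);
 * in module exit: unregister_kretprobe(&my_kretprobe); */
```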
458 | 508 | ||
459 | 4.4 unregister_*probe | 509 | unregister_*probe |
510 | ------------------ | ||
511 | |||
512 | :: | ||
460 | 513 | ||
461 | #include <linux/kprobes.h> | 514 | #include <linux/kprobes.h> |
462 | void unregister_kprobe(struct kprobe *kp); | 515 | void unregister_kprobe(struct kprobe *kp); |
463 | void unregister_jprobe(struct jprobe *jp); | 516 | void unregister_jprobe(struct jprobe *jp); |
464 | void unregister_kretprobe(struct kretprobe *rp); | 517 | void unregister_kretprobe(struct kretprobe *rp); |
465 | 518 | ||
466 | Removes the specified probe. The unregister function can be called | 519 | Removes the specified probe. The unregister function can be called |
467 | at any time after the probe has been registered. | 520 | at any time after the probe has been registered. |
468 | 521 | ||
469 | NOTE: | 522 | .. note:: |
470 | If the functions find an incorrect probe (e.g., an unregistered probe), | 523 | |
471 | they clear the addr field of the probe. | 524 | If the functions find an incorrect probe (e.g., an unregistered probe), |
525 | they clear the addr field of the probe. | ||
526 | |||
527 | register_*probes | ||
528 | ---------------- | ||
472 | 529 | ||
473 | 4.5 register_*probes | 530 | :: |
474 | 531 | ||
475 | #include <linux/kprobes.h> | 532 | #include <linux/kprobes.h> |
476 | int register_kprobes(struct kprobe **kps, int num); | 533 | int register_kprobes(struct kprobe **kps, int num); |
477 | int register_kretprobes(struct kretprobe **rps, int num); | 534 | int register_kretprobes(struct kretprobe **rps, int num); |
478 | int register_jprobes(struct jprobe **jps, int num); | 535 | int register_jprobes(struct jprobe **jps, int num); |
479 | 536 | ||
480 | Registers each of the num probes in the specified array. If any | 537 | Registers each of the num probes in the specified array. If any |
481 | error occurs during registration, all probes in the array, up to | 538 | error occurs during registration, all probes in the array, up to |
482 | the bad probe, are safely unregistered before the register_*probes | 539 | the bad probe, are safely unregistered before the register_*probes |
483 | function returns. | 540 | function returns. |
484 | - kps/rps/jps: an array of pointers to *probe data structures | 541 | |
542 | - kps/rps/jps: an array of pointers to ``*probe`` data structures | ||
485 | - num: the number of the array entries. | 543 | - num: the number of the array entries. |
486 | 544 | ||
487 | NOTE: | 545 | .. note:: |
488 | You have to allocate (or define) an array of pointers and set all | 546 | |
489 | of the array entries before using these functions. | 547 | You have to allocate (or define) an array of pointers and set all |
548 | of the array entries before using these functions. | ||
490 | 549 | ||
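The array-of-pointers convention described in the note can be sketched as follows (symbol names are hypothetical); on a registration error, register_kprobes() itself rolls back the probes registered so far.

```c
#include <linux/kprobes.h>

static struct kprobe kp1 = { .symbol_name = "func_a" };	/* illustrative */
static struct kprobe kp2 = { .symbol_name = "func_b" };	/* illustrative */
static struct kprobe *my_kps[2];

static int __init batch_init(void)
{
	/* every array entry must be set before the call */
	my_kps[0] = &kp1;
	my_kps[1] = &kp2;
	return register_kprobes(my_kps, 2);
}

static void __exit batch_exit(void)
{
	unregister_kprobes(my_kps, 2);
}
```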
491 | 4.6 unregister_*probes | 550 | unregister_*probes |
551 | ------------------ | ||
492 | 552 | ||
493 | #include <linux/kprobes.h> | 553 | :: |
494 | void unregister_kprobes(struct kprobe **kps, int num); | 554 | |
495 | void unregister_kretprobes(struct kretprobe **rps, int num); | 555 | #include <linux/kprobes.h> |
496 | void unregister_jprobes(struct jprobe **jps, int num); | 556 | void unregister_kprobes(struct kprobe **kps, int num); |
557 | void unregister_kretprobes(struct kretprobe **rps, int num); | ||
558 | void unregister_jprobes(struct jprobe **jps, int num); | ||
497 | 559 | ||
498 | Removes each of the num probes in the specified array at once. | 560 | Removes each of the num probes in the specified array at once. |
499 | 561 | ||
500 | NOTE: | 562 | .. note:: |
501 | If the functions find some incorrect probes (e.g., unregistered | 563 | |
502 | probes) in the specified array, they clear the addr field of those | 564 | If the functions find some incorrect probes (e.g., unregistered |
503 | incorrect probes. However, other probes in the array are | 565 | probes) in the specified array, they clear the addr field of those |
504 | unregistered correctly. | 566 | incorrect probes. However, other probes in the array are |
567 | unregistered correctly. | ||
505 | 568 | ||
506 | 4.7 disable_*probe | 569 | disable_*probe |
570 | -------------- | ||
507 | 571 | ||
508 | #include <linux/kprobes.h> | 572 | :: |
509 | int disable_kprobe(struct kprobe *kp); | ||
510 | int disable_kretprobe(struct kretprobe *rp); | ||
511 | int disable_jprobe(struct jprobe *jp); | ||
512 | 573 | ||
513 | Temporarily disables the specified *probe. You can enable it again by using | 574 | #include <linux/kprobes.h> |
575 | int disable_kprobe(struct kprobe *kp); | ||
576 | int disable_kretprobe(struct kretprobe *rp); | ||
577 | int disable_jprobe(struct jprobe *jp); | ||
578 | |||
579 | Temporarily disables the specified ``*probe``. You can enable it again by using | ||
514 | enable_*probe(). You must specify the probe which has been registered. | 580 | enable_*probe(). You must specify the probe which has been registered. |
515 | 581 | ||
516 | 4.8 enable_*probe | 582 | enable_*probe |
583 | ------------- | ||
584 | |||
585 | :: | ||
517 | 586 | ||
518 | #include <linux/kprobes.h> | 587 | #include <linux/kprobes.h> |
519 | int enable_kprobe(struct kprobe *kp); | 588 | int enable_kprobe(struct kprobe *kp); |
520 | int enable_kretprobe(struct kretprobe *rp); | 589 | int enable_kretprobe(struct kretprobe *rp); |
521 | int enable_jprobe(struct jprobe *jp); | 590 | int enable_jprobe(struct jprobe *jp); |
522 | 591 | ||
523 | Enables *probe which has been disabled by disable_*probe(). You must specify | 592 | Enables ``*probe`` which has been disabled by disable_*probe(). You must specify |
524 | the probe which has been registered. | 593 | the probe which has been registered. |
525 | 594 | ||
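The disable/enable pair above can be used to pause a probe without tearing it down, as in this sketch (the probed symbol is hypothetical, and the probe must already be registered):

```c
#include <linux/kprobes.h>

static struct kprobe kp = { .symbol_name = "some_function" };	/* illustrative */

/* valid only after register_kprobe(&kp) has succeeded */
static void pause_probe(void)
{
	disable_kprobe(&kp);	/* handlers stop running; probe stays registered */
}

static void resume_probe(void)
{
	enable_kprobe(&kp);	/* handlers run again */
}
```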
526 | 5. Kprobes Features and Limitations | 595 | Kprobes Features and Limitations |
596 | ================================ | ||
527 | 597 | ||
528 | Kprobes allows multiple probes at the same address. Currently, | 598 | Kprobes allows multiple probes at the same address. Currently, |
529 | however, there cannot be multiple jprobes on the same function at | 599 | however, there cannot be multiple jprobes on the same function at |
@@ -538,7 +608,7 @@ are discussed in this section. | |||
538 | 608 | ||
539 | The register_*probe functions will return -EINVAL if you attempt | 609 | The register_*probe functions will return -EINVAL if you attempt |
540 | to install a probe in the code that implements Kprobes (mostly | 610 | to install a probe in the code that implements Kprobes (mostly |
541 | kernel/kprobes.c and arch/*/kernel/kprobes.c, but also functions such | 611 | kernel/kprobes.c and ``arch/*/kernel/kprobes.c``, but also functions such |
542 | as do_page_fault and notifier_call_chain). | 612 | as do_page_fault and notifier_call_chain). |
543 | 613 | ||
544 | If you install a probe in an inline-able function, Kprobes makes | 614 | If you install a probe in an inline-able function, Kprobes makes |
@@ -602,19 +672,21 @@ explain it, we introduce some terminology. Imagine a 3-instruction | |||
602 | sequence consisting of a two 2-byte instructions and one 3-byte | 672 | sequence consisting of a two 2-byte instructions and one 3-byte |
603 | instruction. | 673 | instruction. |
604 | 674 | ||
605 | IA | 675 | :: |
606 | | | ||
607 | [-2][-1][0][1][2][3][4][5][6][7] | ||
608 | [ins1][ins2][ ins3 ] | ||
609 | [<- DCR ->] | ||
610 | [<- JTPR ->] | ||
611 | 676 | ||
612 | ins1: 1st Instruction | 677 | IA |
613 | ins2: 2nd Instruction | 678 | | |
614 | ins3: 3rd Instruction | 679 | [-2][-1][0][1][2][3][4][5][6][7] |
615 | IA: Insertion Address | 680 | [ins1][ins2][ ins3 ] |
616 | JTPR: Jump Target Prohibition Region | 681 | [<- DCR ->] |
617 | DCR: Detoured Code Region | 682 | [<- JTPR ->] |
683 | |||
684 | ins1: 1st Instruction | ||
685 | ins2: 2nd Instruction | ||
686 | ins3: 3rd Instruction | ||
687 | IA: Insertion Address | ||
688 | JTPR: Jump Target Prohibition Region | ||
689 | DCR: Detoured Code Region | ||
618 | 690 | ||
619 | The instructions in DCR are copied to the out-of-line buffer | 691 | The instructions in DCR are copied to the out-of-line buffer |
620 | of the kprobe, because the bytes in DCR are replaced by | 692 | of the kprobe, because the bytes in DCR are replaced by |
@@ -628,7 +700,8 @@ d) DCR must not straddle the border between functions. | |||
628 | Anyway, these limitations are checked by the in-kernel instruction | 700 | Anyway, these limitations are checked by the in-kernel instruction |
629 | decoder, so you don't need to worry about that. | 701 | decoder, so you don't need to worry about that. |
630 | 702 | ||
631 | 6. Probe Overhead | 703 | Probe Overhead |
704 | ============== | ||
632 | 705 | ||
633 | On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0 | 706 | On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0 |
634 | microseconds to process. Specifically, a benchmark that hits the same | 707 | microseconds to process. Specifically, a benchmark that hits the same |
@@ -638,70 +711,80 @@ return-probe hit typically takes 50-75% longer than a kprobe hit. | |||
638 | When you have a return probe set on a function, adding a kprobe at | 711 | When you have a return probe set on a function, adding a kprobe at |
639 | the entry to that function adds essentially no overhead. | 712 | the entry to that function adds essentially no overhead. |
640 | 713 | ||
641 | Here are sample overhead figures (in usec) for different architectures. | 714 | Here are sample overhead figures (in usec) for different architectures:: |
642 | k = kprobe; j = jprobe; r = return probe; kr = kprobe + return probe | 715 | |
643 | on same function; jr = jprobe + return probe on same function | 716 | k = kprobe; j = jprobe; r = return probe; kr = kprobe + return probe |
717 | on same function; jr = jprobe + return probe on same function |
644 | 718 | ||
645 | i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips | 719 | i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips |
646 | k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40 | 720 | k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40 |
647 | 721 | ||
648 | x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips | 722 | x86_64: AMD Opteron 246, 1994 MHz, 3971.48 bogomips |
649 | k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07 | 723 | k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07 |
650 | 724 | ||
651 | ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU) | 725 | ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU) |
652 | k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 | 726 | k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99 |
653 | 727 | ||
654 | 6.1 Optimized Probe Overhead | 728 | Optimized Probe Overhead |
729 | ------------------------ | ||
655 | 730 | ||
656 | Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to | 731 | Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to |
657 | process. Here are sample overhead figures (in usec) for x86 architectures. | 732 | process. Here are sample overhead figures (in usec) for x86 architectures:: |
658 | k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe, | ||
659 | r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe. | ||
660 | 733 | ||
661 | i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips | 734 | k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe, |
662 | k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33 | 735 | r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe. |
663 | 736 | ||
664 | x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips | 737 | i386: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips |
665 | k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30 | 738 | k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33 |
666 | 739 | ||
667 | 7. TODO | 740 | x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips |
741 | k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30 | ||
742 | |||
743 | TODO | ||
744 | ==== | ||
668 | 745 | ||
669 | a. SystemTap (http://sourceware.org/systemtap): Provides a simplified | 746 | a. SystemTap (http://sourceware.org/systemtap): Provides a simplified |
670 | programming interface for probe-based instrumentation. Try it out. | 747 | programming interface for probe-based instrumentation. Try it out. |
671 | b. Kernel return probes for sparc64. | 748 | b. Kernel return probes for sparc64. |
672 | c. Support for other architectures. | 749 | c. Support for other architectures. |
673 | d. User-space probes. | 750 | d. User-space probes. |
674 | e. Watchpoint probes (which fire on data references). | 751 | e. Watchpoint probes (which fire on data references). |
675 | 752 | ||
676 | 8. Kprobes Example | 753 | Kprobes Example |
754 | =============== | ||
677 | 755 | ||
678 | See samples/kprobes/kprobe_example.c | 756 | See samples/kprobes/kprobe_example.c |
679 | 757 | ||
680 | 9. Jprobes Example | 758 | Jprobes Example |
759 | =============== | ||
681 | 760 | ||
682 | See samples/kprobes/jprobe_example.c | 761 | See samples/kprobes/jprobe_example.c |
683 | 762 | ||
684 | 10. Kretprobes Example | 763 | Kretprobes Example |
764 | ================== | ||
685 | 765 | ||
686 | See samples/kprobes/kretprobe_example.c | 766 | See samples/kprobes/kretprobe_example.c |
687 | 767 | ||
688 | For additional information on Kprobes, refer to the following URLs: | 768 | For additional information on Kprobes, refer to the following URLs: |
689 | http://www-106.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe | 769 | |
690 | http://www.redhat.com/magazine/005mar05/features/kprobes/ | 770 | - http://www-106.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe |
691 | http://www-users.cs.umn.edu/~boutcher/kprobes/ | 771 | - http://www.redhat.com/magazine/005mar05/features/kprobes/ |
692 | http://www.linuxsymposium.org/2006/linuxsymposium_procv2.pdf (pages 101-115) | 772 | - http://www-users.cs.umn.edu/~boutcher/kprobes/ |
773 | - http://www.linuxsymposium.org/2006/linuxsymposium_procv2.pdf (pages 101-115) | ||
693 | 774 | ||
694 | 775 | ||
695 | Appendix A: The kprobes debugfs interface | 776 | The kprobes debugfs interface |
777 | ============================= | ||
778 | |||
696 | 779 | ||
697 | With recent kernels (> 2.6.20) the list of registered kprobes is visible | 780 | With recent kernels (> 2.6.20) the list of registered kprobes is visible |
698 | under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is mounted at /sys/kernel/debug). | 781 | under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is mounted at /sys/kernel/debug). |
699 | 782 | ||
700 | /sys/kernel/debug/kprobes/list: Lists all registered probes on the system | 783 | /sys/kernel/debug/kprobes/list: Lists all registered probes on the system:: |
701 | 784 | ||
702 | c015d71a k vfs_read+0x0 | 785 | c015d71a k vfs_read+0x0 |
703 | c011a316 j do_fork+0x0 | 786 | c011a316 j do_fork+0x0 |
704 | c03dedc5 r tcp_v4_rcv+0x0 | 787 | c03dedc5 r tcp_v4_rcv+0x0 |
705 | 788 | ||
706 | The first column provides the kernel address where the probe is inserted. | 789 | The first column provides the kernel address where the probe is inserted. |
707 | The second column identifies the type of probe (k - kprobe, r - kretprobe | 790 | The second column identifies the type of probe (k - kprobe, r - kretprobe |
@@ -725,17 +808,19 @@ change each probe's disabling state. This means that disabled kprobes (marked | |||
725 | [DISABLED]) will not be enabled if you turn ON all kprobes by this knob. | 808 | [DISABLED]) will not be enabled if you turn ON all kprobes by this knob. |
726 | 809 | ||
727 | 810 | ||
728 | Appendix B: The kprobes sysctl interface | 811 | The kprobes sysctl interface |
812 | ============================ | ||
729 | 813 | ||
730 | /proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF. | 814 | /proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF. |
731 | 815 | ||
732 | When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides | 816 | When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides |
733 | a knob to globally and forcibly turn jump optimization (see section | 817 | a knob to globally and forcibly turn jump optimization (see section |
734 | 1.4) ON or OFF. By default, jump optimization is allowed (ON). | 818 | :ref:`kprobes_jump_optimization`) ON or OFF. By default, jump optimization |
735 | If you echo "0" to this file or set "debug.kprobes_optimization" to | 819 | is allowed (ON). If you echo "0" to this file or set |
736 | 0 via sysctl, all optimized probes will be unoptimized, and any new | 820 | "debug.kprobes_optimization" to 0 via sysctl, all optimized probes will be |
737 | probes registered after that will not be optimized. Note that this | 821 | unoptimized, and any new probes registered after that will not be optimized. |
738 | knob *changes* the optimized state. This means that optimized probes | 822 | |
739 | (marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be | 823 | Note that this knob *changes* the optimized state. This means that optimized |
824 | probes (marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be | ||
740 | removed). If the knob is turned on, they will be optimized again. | 825 | removed). If the knob is turned on, they will be optimized again. |
741 | 826 | ||
diff --git a/Documentation/kref.txt b/Documentation/kref.txt index d26a27ca964d..3af384156d7e 100644 --- a/Documentation/kref.txt +++ b/Documentation/kref.txt | |||
@@ -1,24 +1,42 @@ | |||
1 | =================================================== | ||
2 | Adding reference counters (krefs) to kernel objects | ||
3 | =================================================== | ||
4 | |||
5 | :Author: Corey Minyard <minyard@acm.org> | ||
6 | :Author: Thomas Hellstrom <thellstrom@vmware.com> | ||
7 | |||
8 | A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and | ||
9 | presentation on krefs, which can be found at: | ||
10 | |||
11 | - http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf | ||
12 | - http://www.kroah.com/linux/talks/ols_2004_kref_talk/ | ||
13 | |||
14 | Introduction | ||
15 | ============ | ||
1 | 16 | ||
2 | krefs allow you to add reference counters to your objects. If you | 17 | krefs allow you to add reference counters to your objects. If you |
3 | have objects that are used in multiple places and passed around, and | 18 | have objects that are used in multiple places and passed around, and |
4 | you don't have refcounts, your code is almost certainly broken. If | 19 | you don't have refcounts, your code is almost certainly broken. If |
5 | you want refcounts, krefs are the way to go. | 20 | you want refcounts, krefs are the way to go. |
6 | 21 | ||
7 | To use a kref, add one to your data structures like: | 22 | To use a kref, add one to your data structures like:: |
8 | 23 | ||
9 | struct my_data | 24 | struct my_data |
10 | { | 25 | { |
11 | . | 26 | . |
12 | . | 27 | . |
13 | struct kref refcount; | 28 | struct kref refcount; |
14 | . | 29 | . |
15 | . | 30 | . |
16 | }; | 31 | }; |
17 | 32 | ||
18 | The kref can occur anywhere within the data structure. | 33 | The kref can occur anywhere within the data structure. |
19 | 34 | ||
35 | Initialization | ||
36 | ============== | ||
37 | |||
20 | You must initialize the kref after you allocate it. To do this, call | 38 | You must initialize the kref after you allocate it. To do this, call |
21 | kref_init as so: | 39 | kref_init as so:: |
22 | 40 | ||
23 | struct my_data *data; | 41 | struct my_data *data; |
24 | 42 | ||
@@ -29,18 +47,25 @@ kref_init as so: | |||
29 | 47 | ||
30 | This sets the refcount in the kref to 1. | 48 | This sets the refcount in the kref to 1. |
31 | 49 | ||
50 | Kref rules | ||
51 | ========== | ||
52 | |||
32 | Once you have an initialized kref, you must follow the following | 53 | Once you have an initialized kref, you must follow the following |
33 | rules: | 54 | rules: |
34 | 55 | ||
35 | 1) If you make a non-temporary copy of a pointer, especially if | 56 | 1) If you make a non-temporary copy of a pointer, especially if |
36 | it can be passed to another thread of execution, you must | 57 | it can be passed to another thread of execution, you must |
37 | increment the refcount with kref_get() before passing it off: | 58 | increment the refcount with kref_get() before passing it off:: |
59 | |||
38 | kref_get(&data->refcount); | 60 | kref_get(&data->refcount); |
61 | |||
39 | If you already have a valid pointer to a kref-ed structure (the | 62 | If you already have a valid pointer to a kref-ed structure (the |
40 | refcount cannot go to zero) you may do this without a lock. | 63 | refcount cannot go to zero) you may do this without a lock. |
41 | 64 | ||
42 | 2) When you are done with a pointer, you must call kref_put(): | 65 | 2) When you are done with a pointer, you must call kref_put():: |
66 | |||
43 | kref_put(&data->refcount, data_release); | 67 | kref_put(&data->refcount, data_release); |
68 | |||
44 | If this is the last reference to the pointer, the release | 69 | If this is the last reference to the pointer, the release |
45 | routine will be called. If the code never tries to get | 70 | routine will be called. If the code never tries to get |
46 | a valid pointer to a kref-ed structure without already | 71 | a valid pointer to a kref-ed structure without already |
@@ -53,25 +78,25 @@ rules: | |||
53 | structure must remain valid during the kref_get(). | 78 | structure must remain valid during the kref_get(). |
54 | 79 | ||
55 | For example, if you allocate some data and then pass it to another | 80 | For example, if you allocate some data and then pass it to another |
56 | thread to process: | 81 | thread to process:: |
57 | 82 | ||
58 | void data_release(struct kref *ref) | 83 | void data_release(struct kref *ref) |
59 | { | 84 | { |
60 | struct my_data *data = container_of(ref, struct my_data, refcount); | 85 | struct my_data *data = container_of(ref, struct my_data, refcount); |
61 | kfree(data); | 86 | kfree(data); |
62 | } | 87 | } |
63 | 88 | ||
64 | void more_data_handling(void *cb_data) | 89 | void more_data_handling(void *cb_data) |
65 | { | 90 | { |
66 | struct my_data *data = cb_data; | 91 | struct my_data *data = cb_data; |
67 | . | 92 | . |
68 | . do stuff with data here | 93 | . do stuff with data here |
69 | . | 94 | . |
70 | kref_put(&data->refcount, data_release); | 95 | kref_put(&data->refcount, data_release); |
71 | } | 96 | } |
72 | 97 | ||
73 | int my_data_handler(void) | 98 | int my_data_handler(void) |
74 | { | 99 | { |
75 | int rv = 0; | 100 | int rv = 0; |
76 | struct my_data *data; | 101 | struct my_data *data; |
77 | struct task_struct *task; | 102 | struct task_struct *task; |
@@ -91,10 +116,10 @@ int my_data_handler(void) | |||
91 | . | 116 | . |
92 | . do stuff with data here | 117 | . do stuff with data here |
93 | . | 118 | . |
94 | out: | 119 | out: |
95 | kref_put(&data->refcount, data_release); | 120 | kref_put(&data->refcount, data_release); |
96 | return rv; | 121 | return rv; |
97 | } | 122 | } |
98 | 123 | ||
99 | This way, it doesn't matter what order the two threads handle the | 124 | This way, it doesn't matter what order the two threads handle the |
100 | data, the kref_put() handles knowing when the data is not referenced | 125 | data, the kref_put() handles knowing when the data is not referenced |
@@ -104,7 +129,7 @@ put needs no lock because nothing tries to get the data without | |||
104 | already holding a pointer. | 129 | already holding a pointer. |
105 | 130 | ||
106 | Note that the "before" in rule 1 is very important. You should never | 131 | Note that the "before" in rule 1 is very important. You should never |
107 | do something like: | 132 | do something like:: |
108 | 133 | ||
109 | task = kthread_run(more_data_handling, data, "more_data_handling"); | 134 | task = kthread_run(more_data_handling, data, "more_data_handling"); |
110 | if (task == ERR_PTR(-ENOMEM)) { | 135 | if (task == ERR_PTR(-ENOMEM)) { |
@@ -124,14 +149,14 @@ bad style. Don't do it. | |||
124 | There are some situations where you can optimize the gets and puts. | 149 | There are some situations where you can optimize the gets and puts. |
125 | For instance, if you are done with an object and enqueuing it for | 150 | For instance, if you are done with an object and enqueuing it for |
126 | something else or passing it off to something else, there is no reason | 151 | something else or passing it off to something else, there is no reason |
127 | to do a get then a put: | 152 | to do a get then a put:: |
128 | 153 | ||
129 | /* Silly extra get and put */ | 154 | /* Silly extra get and put */ |
130 | kref_get(&obj->ref); | 155 | kref_get(&obj->ref); |
131 | enqueue(obj); | 156 | enqueue(obj); |
132 | kref_put(&obj->ref, obj_cleanup); | 157 | kref_put(&obj->ref, obj_cleanup); |
133 | 158 | ||
134 | Just do the enqueue. A comment about this is always welcome: | 159 | Just do the enqueue. A comment about this is always welcome:: |
135 | 160 | ||
136 | enqueue(obj); | 161 | enqueue(obj); |
137 | /* We are done with obj, so we pass our refcount off | 162 | /* We are done with obj, so we pass our refcount off |
@@ -142,109 +167,99 @@ instance, you have a list of items that are each kref-ed, and you wish | |||
142 | to get the first one. You can't just pull the first item off the list | 167 | to get the first one. You can't just pull the first item off the list |
143 | and kref_get() it. That violates rule 3 because you are not already | 168 | and kref_get() it. That violates rule 3 because you are not already |
144 | holding a valid pointer. You must add a mutex (or some other lock). | 169 | holding a valid pointer. You must add a mutex (or some other lock). |
145 | For instance: | 170 | For instance:: |
146 | 171 | ||
147 | static DEFINE_MUTEX(mutex); | 172 | static DEFINE_MUTEX(mutex); |
148 | static LIST_HEAD(q); | 173 | static LIST_HEAD(q); |
149 | struct my_data | 174 | struct my_data |
150 | { | 175 | { |
151 | struct kref refcount; | 176 | struct kref refcount; |
152 | struct list_head link; | 177 | struct list_head link; |
153 | }; | 178 | }; |
154 | 179 | ||
155 | static struct my_data *get_entry() | 180 | static struct my_data *get_entry() |
156 | { | 181 | { |
157 | struct my_data *entry = NULL; | 182 | struct my_data *entry = NULL; |
158 | mutex_lock(&mutex); | 183 | mutex_lock(&mutex); |
159 | if (!list_empty(&q)) { | 184 | if (!list_empty(&q)) { |
160 | entry = container_of(q.next, struct my_data, link); | 185 | entry = container_of(q.next, struct my_data, link); |
161 | kref_get(&entry->refcount); | 186 | kref_get(&entry->refcount); |
187 | } | ||
188 | mutex_unlock(&mutex); | ||
189 | return entry; | ||
162 | } | 190 | } |
163 | mutex_unlock(&mutex); | ||
164 | return entry; | ||
165 | } | ||
166 | 191 | ||
167 | static void release_entry(struct kref *ref) | 192 | static void release_entry(struct kref *ref) |
168 | { | 193 | { |
169 | struct my_data *entry = container_of(ref, struct my_data, refcount); | 194 | struct my_data *entry = container_of(ref, struct my_data, refcount); |
170 | 195 | ||
171 | list_del(&entry->link); | 196 | list_del(&entry->link); |
172 | kfree(entry); | 197 | kfree(entry); |
173 | } | 198 | } |
174 | 199 | ||
175 | static void put_entry(struct my_data *entry) | 200 | static void put_entry(struct my_data *entry) |
176 | { | 201 | { |
177 | mutex_lock(&mutex); | 202 | mutex_lock(&mutex); |
178 | kref_put(&entry->refcount, release_entry); | 203 | kref_put(&entry->refcount, release_entry); |
179 | mutex_unlock(&mutex); | 204 | mutex_unlock(&mutex); |
180 | } | 205 | } |
181 | 206 | ||
182 | The kref_put() return value is useful if you do not want to hold the | 207 | The kref_put() return value is useful if you do not want to hold the |
183 | lock during the whole release operation. Say you didn't want to call | 208 | lock during the whole release operation. Say you didn't want to call |
184 | kfree() with the lock held in the example above (since it is kind of | 209 | kfree() with the lock held in the example above (since it is kind of |
185 | pointless to do so). You could use kref_put() as follows: | 210 | pointless to do so). You could use kref_put() as follows:: |
186 | 211 | ||
187 | static void release_entry(struct kref *ref) | 212 | static void release_entry(struct kref *ref) |
188 | { | 213 | { |
189 | /* All work is done after the return from kref_put(). */ | 214 | /* All work is done after the return from kref_put(). */ |
190 | } | 215 | } |
191 | 216 | ||
192 | static void put_entry(struct my_data *entry) | 217 | static void put_entry(struct my_data *entry) |
193 | { | 218 | { |
194 | mutex_lock(&mutex); | 219 | mutex_lock(&mutex); |
195 | if (kref_put(&entry->refcount, release_entry)) { | 220 | if (kref_put(&entry->refcount, release_entry)) { |
196 | list_del(&entry->link); | 221 | list_del(&entry->link); |
197 | mutex_unlock(&mutex); | 222 | mutex_unlock(&mutex); |
198 | kfree(entry); | 223 | kfree(entry); |
199 | } else | 224 | } else |
200 | mutex_unlock(&mutex); | 225 | mutex_unlock(&mutex); |
201 | } | 226 | } |
202 | 227 | ||
203 | This is really more useful if you have to call other routines as part | 228 | This is really more useful if you have to call other routines as part |
204 | of the free operations that could take a long time or might claim the | 229 | of the free operations that could take a long time or might claim the |
205 | same lock. Note that doing everything in the release routine is still | 230 | same lock. Note that doing everything in the release routine is still |
206 | preferred as it is a little neater. | 231 | preferred as it is a little neater. |
207 | 232 | ||
208 | |||
209 | Corey Minyard <minyard@acm.org> | ||
210 | |||
211 | A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and | ||
212 | presentation on krefs, which can be found at: | ||
213 | http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf | ||
214 | and: | ||
215 | http://www.kroah.com/linux/talks/ols_2004_kref_talk/ | ||
216 | |||
217 | |||
218 | The above example could also be optimized using kref_get_unless_zero() in | 233 | The above example could also be optimized using kref_get_unless_zero() in |
219 | the following way: | 234 | the following way:: |
220 | 235 | ||
221 | static struct my_data *get_entry() | 236 | static struct my_data *get_entry() |
222 | { | 237 | { |
223 | struct my_data *entry = NULL; | 238 | struct my_data *entry = NULL; |
224 | mutex_lock(&mutex); | 239 | mutex_lock(&mutex); |
225 | if (!list_empty(&q)) { | 240 | if (!list_empty(&q)) { |
226 | entry = container_of(q.next, struct my_data, link); | 241 | entry = container_of(q.next, struct my_data, link); |
227 | if (!kref_get_unless_zero(&entry->refcount)) | 242 | if (!kref_get_unless_zero(&entry->refcount)) |
228 | entry = NULL; | 243 | entry = NULL; |
244 | } | ||
245 | mutex_unlock(&mutex); | ||
246 | return entry; | ||
229 | } | 247 | } |
230 | mutex_unlock(&mutex); | ||
231 | return entry; | ||
232 | } | ||
233 | 248 | ||
234 | static void release_entry(struct kref *ref) | 249 | static void release_entry(struct kref *ref) |
235 | { | 250 | { |
236 | struct my_data *entry = container_of(ref, struct my_data, refcount); | 251 | struct my_data *entry = container_of(ref, struct my_data, refcount); |
237 | 252 | ||
238 | mutex_lock(&mutex); | 253 | mutex_lock(&mutex); |
239 | list_del(&entry->link); | 254 | list_del(&entry->link); |
240 | mutex_unlock(&mutex); | 255 | mutex_unlock(&mutex); |
241 | kfree(entry); | 256 | kfree(entry); |
242 | } | 257 | } |
243 | 258 | ||
244 | static void put_entry(struct my_data *entry) | 259 | static void put_entry(struct my_data *entry) |
245 | { | 260 | { |
246 | kref_put(&entry->refcount, release_entry); | 261 | kref_put(&entry->refcount, release_entry); |
247 | } | 262 | } |
248 | 263 | ||
249 | Which is useful to remove the mutex lock around kref_put() in put_entry(), but | 264 | Which is useful to remove the mutex lock around kref_put() in put_entry(), but |
250 | it's important that kref_get_unless_zero is enclosed in the same critical | 265 | it's important that kref_get_unless_zero is enclosed in the same critical |
@@ -254,51 +269,51 @@ Note that it is illegal to use kref_get_unless_zero without checking its | |||
254 | return value. If you are sure (by already having a valid pointer) that | 269 | return value. If you are sure (by already having a valid pointer) that |
255 | kref_get_unless_zero() will return true, then use kref_get() instead. | 270 | kref_get_unless_zero() will return true, then use kref_get() instead. |
256 | 271 | ||
257 | The function kref_get_unless_zero also makes it possible to use rcu | 272 | Krefs and RCU |
258 | locking for lookups in the above example: | 273 | ============= |
259 | 274 | ||
260 | struct my_data | 275 | The function kref_get_unless_zero also makes it possible to use rcu |
261 | { | 276 | locking for lookups in the above example:: |
262 | struct rcu_head rhead; | 277 | |
263 | . | 278 | struct my_data |
264 | struct kref refcount; | 279 | { |
265 | . | 280 | struct rcu_head rhead; |
266 | . | 281 | . |
267 | }; | 282 | struct kref refcount; |
268 | 283 | . | |
269 | static struct my_data *get_entry_rcu() | 284 | . |
270 | { | 285 | }; |
271 | struct my_data *entry = NULL; | 286 | |
272 | rcu_read_lock(); | 287 | static struct my_data *get_entry_rcu() |
273 | if (!list_empty(&q)) { | 288 | { |
274 | entry = container_of(q.next, struct my_data, link); | 289 | struct my_data *entry = NULL; |
275 | if (!kref_get_unless_zero(&entry->refcount)) | 290 | rcu_read_lock(); |
276 | entry = NULL; | 291 | if (!list_empty(&q)) { |
292 | entry = container_of(q.next, struct my_data, link); | ||
293 | if (!kref_get_unless_zero(&entry->refcount)) | ||
294 | entry = NULL; | ||
295 | } | ||
296 | rcu_read_unlock(); | ||
297 | return entry; | ||
277 | } | 298 | } |
278 | rcu_read_unlock(); | ||
279 | return entry; | ||
280 | } | ||
281 | 299 | ||
282 | static void release_entry_rcu(struct kref *ref) | 300 | static void release_entry_rcu(struct kref *ref) |
283 | { | 301 | { |
284 | struct my_data *entry = container_of(ref, struct my_data, refcount); | 302 | struct my_data *entry = container_of(ref, struct my_data, refcount); |
285 | 303 | ||
286 | mutex_lock(&mutex); | 304 | mutex_lock(&mutex); |
287 | list_del_rcu(&entry->link); | 305 | list_del_rcu(&entry->link); |
288 | mutex_unlock(&mutex); | 306 | mutex_unlock(&mutex); |
289 | kfree_rcu(entry, rhead); | 307 | kfree_rcu(entry, rhead); |
290 | } | 308 | } |
291 | 309 | ||
292 | static void put_entry(struct my_data *entry) | 310 | static void put_entry(struct my_data *entry) |
293 | { | 311 | { |
294 | kref_put(&entry->refcount, release_entry_rcu); | 312 | kref_put(&entry->refcount, release_entry_rcu); |
295 | } | 313 | } |
296 | 314 | ||
297 | But note that the struct kref member needs to remain in valid memory for a | 315 | But note that the struct kref member needs to remain in valid memory for a |
298 | rcu grace period after release_entry_rcu was called. That can be accomplished | 316 | rcu grace period after release_entry_rcu was called. That can be accomplished |
299 | by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu() | 317 | by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu() |
300 | before using kfree, but note that synchronize_rcu() may sleep for a | 318 | before using kfree, but note that synchronize_rcu() may sleep for a |
301 | substantial amount of time. | 319 | substantial amount of time. |
302 | |||
303 | |||
304 | Thomas Hellstrom <thellstrom@vmware.com> | ||
diff --git a/Documentation/ldm.txt b/Documentation/ldm.txt index 4f80edd14d0a..12c571368e73 100644 --- a/Documentation/ldm.txt +++ b/Documentation/ldm.txt | |||
@@ -1,9 +1,9 @@ | |||
1 | ========================================== | ||
2 | LDM - Logical Disk Manager (Dynamic Disks) | ||
3 | ========================================== | ||
1 | 4 | ||
2 | LDM - Logical Disk Manager (Dynamic Disks) | 5 | :Author: Originally Written by FlatCap - Richard Russon <ldm@flatcap.org>. |
3 | ------------------------------------------ | 6 | :Last Updated: Anton Altaparmakov on 30 March 2007 for Windows Vista. |
4 | |||
5 | Originally Written by FlatCap - Richard Russon <ldm@flatcap.org>. | ||
6 | Last Updated by Anton Altaparmakov on 30 March 2007 for Windows Vista. | ||
7 | 7 | ||
8 | Overview | 8 | Overview |
9 | -------- | 9 | -------- |
@@ -37,24 +37,36 @@ Example | |||
37 | ------- | 37 | ------- |
38 | 38 | ||
39 | Below we have a 50MiB disk, divided into seven partitions. | 39 | Below we have a 50MiB disk, divided into seven partitions. |
40 | N.B. The missing 1MiB at the end of the disk is where the LDM database is | 40 | |
41 | stored. | 41 | .. note:: |
42 | 42 | ||
43 | Device | Offset Bytes Sectors MiB | Size Bytes Sectors MiB | 43 | The missing 1MiB at the end of the disk is where the LDM database is |
44 | -------+----------------------------+--------------------------- | 44 | stored. |
45 | hda | 0 0 0 | 52428800 102400 50 | 45 | |
46 | hda1 | 51380224 100352 49 | 1048576 2048 1 | 46 | +-------++--------------+---------+-----++--------------+---------+----+ |
47 | hda2 | 16384 32 0 | 6979584 13632 6 | 47 | |Device || Offset Bytes | Sectors | MiB || Size Bytes | Sectors | MiB| |
48 | hda3 | 6995968 13664 6 | 10485760 20480 10 | 48 | +=======++==============+=========+=====++==============+=========+====+ |
49 | hda4 | 17481728 34144 16 | 4194304 8192 4 | 49 | |hda || 0 | 0 | 0 || 52428800 | 102400 | 50| |
50 | hda5 | 21676032 42336 20 | 5242880 10240 5 | 50 | +-------++--------------+---------+-----++--------------+---------+----+ |
51 | hda6 | 26918912 52576 25 | 10485760 20480 10 | 51 | |hda1 || 51380224 | 100352 | 49 || 1048576 | 2048 | 1| |
52 | hda7 | 37404672 73056 35 | 13959168 27264 13 | 52 | +-------++--------------+---------+-----++--------------+---------+----+ |
53 | |hda2 || 16384 | 32 | 0 || 6979584 | 13632 | 6| | ||
54 | +-------++--------------+---------+-----++--------------+---------+----+ | ||
55 | |hda3 || 6995968 | 13664 | 6 || 10485760 | 20480 | 10| | ||
56 | +-------++--------------+---------+-----++--------------+---------+----+ | ||
57 | |hda4 || 17481728 | 34144 | 16 || 4194304 | 8192 | 4| | ||
58 | +-------++--------------+---------+-----++--------------+---------+----+ | ||
59 | |hda5 || 21676032 | 42336 | 20 || 5242880 | 10240 | 5| | ||
60 | +-------++--------------+---------+-----++--------------+---------+----+ | ||
61 | |hda6 || 26918912 | 52576 | 25 || 10485760 | 20480 | 10| | ||
62 | +-------++--------------+---------+-----++--------------+---------+----+ | ||
63 | |hda7 || 37404672 | 73056 | 35 || 13959168 | 27264 | 13| | ||
64 | +-------++--------------+---------+-----++--------------+---------+----+ | ||
53 | 65 | ||
54 | The LDM Database may not store the partitions in the order that they appear on | 66 | The LDM Database may not store the partitions in the order that they appear on |
55 | disk, but the driver will sort them. | 67 | disk, but the driver will sort them. |
56 | 68 | ||
57 | When Linux boots, you will see something like: | 69 | When Linux boots, you will see something like:: |
58 | 70 | ||
59 | hda: 102400 sectors w/32KiB Cache, CHS=50/64/32 | 71 | hda: 102400 sectors w/32KiB Cache, CHS=50/64/32 |
60 | hda: [LDM] hda1 hda2 hda3 hda4 hda5 hda6 hda7 | 72 | hda: [LDM] hda1 hda2 hda3 hda4 hda5 hda6 hda7 |
@@ -65,13 +77,13 @@ Compiling LDM Support | |||
65 | 77 | ||
66 | To enable LDM, choose the following two options: | 78 | To enable LDM, choose the following two options: |
67 | 79 | ||
68 | "Advanced partition selection" CONFIG_PARTITION_ADVANCED | 80 | - "Advanced partition selection" CONFIG_PARTITION_ADVANCED |
69 | "Windows Logical Disk Manager (Dynamic Disk) support" CONFIG_LDM_PARTITION | 81 | - "Windows Logical Disk Manager (Dynamic Disk) support" CONFIG_LDM_PARTITION |
70 | 82 | ||
71 | If you believe the driver isn't working as it should, you can enable the extra | 83 | If you believe the driver isn't working as it should, you can enable the extra |
72 | debugging code. This will produce a LOT of output. The option is: | 84 | debugging code. This will produce a LOT of output. The option is: |
73 | 85 | ||
74 | "Windows LDM extra logging" CONFIG_LDM_DEBUG | 86 | - "Windows LDM extra logging" CONFIG_LDM_DEBUG |
75 | 87 | ||
76 | N.B. The partition code cannot be compiled as a module. | 88 | N.B. The partition code cannot be compiled as a module. |
77 | 89 | ||
diff --git a/Documentation/lockup-watchdogs.txt b/Documentation/lockup-watchdogs.txt index c8b8378513d6..290840c160af 100644 --- a/Documentation/lockup-watchdogs.txt +++ b/Documentation/lockup-watchdogs.txt | |||
@@ -30,7 +30,8 @@ timeout is set through the confusingly named "kernel.panic" sysctl), | |||
30 | to cause the system to reboot automatically after a specified amount | 30 | to cause the system to reboot automatically after a specified amount |
31 | of time. | 31 | of time. |
32 | 32 | ||
33 | === Implementation === | 33 | Implementation |
34 | ============== | ||
34 | 35 | ||
35 | The soft and hard lockup detectors are built on top of the hrtimer and | 36 | The soft and hard lockup detectors are built on top of the hrtimer and |
36 | perf subsystems, respectively. A direct consequence of this is that, | 37 | perf subsystems, respectively. A direct consequence of this is that, |
diff --git a/Documentation/lzo.txt b/Documentation/lzo.txt index 285c54f66779..6fa6a93d0949 100644 --- a/Documentation/lzo.txt +++ b/Documentation/lzo.txt | |||
@@ -1,8 +1,9 @@ | |||
1 | 1 | =========================================================== | |
2 | LZO stream format as understood by Linux's LZO decompressor | 2 | LZO stream format as understood by Linux's LZO decompressor |
3 | =========================================================== | 3 | =========================================================== |
4 | 4 | ||
5 | Introduction | 5 | Introduction |
6 | ============ | ||
6 | 7 | ||
7 | This is not a specification. No specification seems to be publicly available | 8 | This is not a specification. No specification seems to be publicly available |
8 | for the LZO stream format. This document describes what input format the LZO | 9 | for the LZO stream format. This document describes what input format the LZO |
@@ -14,12 +15,13 @@ Introduction | |||
14 | for future bug reports. | 15 | for future bug reports. |
15 | 16 | ||
16 | Description | 17 | Description |
18 | =========== | ||
17 | 19 | ||
18 | The stream is composed of a series of instructions, operands, and data. The | 20 | The stream is composed of a series of instructions, operands, and data. The |
19 | instructions consist of a few bits representing an opcode, and bits forming | 21 | instructions consist of a few bits representing an opcode, and bits forming |
20 | the operands for the instruction, whose size and position depend on the | 22 | the operands for the instruction, whose size and position depend on the |
21 | opcode and on the number of literals copied by previous instruction. The | 23 | opcode and on the number of literals copied by previous instruction. The |
22 | operands are used to indicate : | 24 | operands are used to indicate: |
23 | 25 | ||
24 | - a distance when copying data from the dictionary (past output buffer) | 26 | - a distance when copying data from the dictionary (past output buffer) |
25 | - a length (number of bytes to copy from dictionary) | 27 | - a length (number of bytes to copy from dictionary) |
@@ -38,7 +40,7 @@ Description | |||
38 | of bits in the operand. If the number of bits isn't enough to represent the | 40 | of bits in the operand. If the number of bits isn't enough to represent the |
39 | length, up to 255 may be added in increments by consuming more bytes with a | 41 | length, up to 255 may be added in increments by consuming more bytes with a |
40 | rate of at most 255 per extra byte (thus the compression ratio cannot exceed | 42 | rate of at most 255 per extra byte (thus the compression ratio cannot exceed |
41 | around 255:1). The variable length encoding using #bits is always the same : | 43 | around 255:1). The variable length encoding using #bits is always the same:: |
42 | 44 | ||
43 | length = byte & ((1 << #bits) - 1) | 45 | length = byte & ((1 << #bits) - 1) |
44 | if (!length) { | 46 | if (!length) { |
@@ -67,15 +69,19 @@ Description | |||
67 | instruction may encode this distance (0001HLLL), it takes one LE16 operand | 69 | instruction may encode this distance (0001HLLL), it takes one LE16 operand |
68 | for the distance, thus requiring 3 bytes. | 70 | for the distance, thus requiring 3 bytes. |
69 | 71 | ||
70 | IMPORTANT NOTE : in the code some length checks are missing because certain | 72 | .. important:: |
71 | instructions are called under the assumption that a certain number of bytes | 73 | |
72 | follow because it has already been guaranteed before parsing the instructions. | 74 | In the code some length checks are missing because certain instructions |
73 | They just have to "refill" this credit if they consume extra bytes. This is | 75 | are called under the assumption that a certain number of bytes follow |
74 | an implementation design choice independent on the algorithm or encoding. | 76 | because it has already been guaranteed before parsing the instructions. |
77 | They just have to "refill" this credit if they consume extra bytes. This | ||
78 | is an implementation design choice independent of the algorithm or | ||
79 | encoding. | ||
75 | 80 | ||
76 | Byte sequences | 81 | Byte sequences |
82 | ============== | ||
77 | 83 | ||
78 | First byte encoding : | 84 | First byte encoding:: |
79 | 85 | ||
80 | 0..17 : follow regular instruction encoding, see below. It is worth | 86 | 0..17 : follow regular instruction encoding, see below. It is worth |
81 | noting that codes 16 and 17 will represent a block copy from | 87 | noting that codes 16 and 17 will represent a block copy from |
@@ -91,7 +97,7 @@ Byte sequences | |||
91 | state = 4 [ don't copy extra literals ] | 97 | state = 4 [ don't copy extra literals ] |
92 | skip byte | 98 | skip byte |
93 | 99 | ||
94 | Instruction encoding : | 100 | Instruction encoding:: |
95 | 101 | ||
96 | 0 0 0 0 X X X X (0..15) | 102 | 0 0 0 0 X X X X (0..15) |
97 | Depends on the number of literals copied by the last instruction. | 103 | Depends on the number of literals copied by the last instruction. |
@@ -156,6 +162,7 @@ Byte sequences | |||
156 | distance = (H << 3) + D + 1 | 162 | distance = (H << 3) + D + 1 |
157 | 163 | ||
158 | Authors | 164 | Authors |
165 | ======= | ||
159 | 166 | ||
160 | This document was written by Willy Tarreau <w@1wt.eu> on 2014/07/19 during an | 167 | This document was written by Willy Tarreau <w@1wt.eu> on 2014/07/19 during an |
161 | analysis of the decompression code available in Linux 3.16-rc5. The code is | 168 | analysis of the decompression code available in Linux 3.16-rc5. The code is |
diff --git a/Documentation/mailbox.txt b/Documentation/mailbox.txt index 7ed371c85204..0ed95009cc30 100644 --- a/Documentation/mailbox.txt +++ b/Documentation/mailbox.txt | |||
@@ -1,7 +1,10 @@ | |||
1 | The Common Mailbox Framework | 1 | ============================ |
2 | Jassi Brar <jaswinder.singh@linaro.org> | 2 | The Common Mailbox Framework |
3 | ============================ | ||
3 | 4 | ||
4 | This document aims to help developers write client and controller | 5 | :Author: Jassi Brar <jaswinder.singh@linaro.org> |
6 | |||
7 | This document aims to help developers write client and controller | ||
5 | drivers for the API. But before we start, let us note that the | 8 | drivers for the API. But before we start, let us note that the |
6 | client (especially) and controller drivers are likely going to be | 9 | client (especially) and controller drivers are likely going to be |
7 | very platform specific because the remote firmware is likely to be | 10 | very platform specific because the remote firmware is likely to be |
@@ -13,14 +16,17 @@ similar copies of code written for each platform. Having said that, | |||
13 | nothing prevents the remote f/w from also being Linux based and using the | 16 | nothing prevents the remote f/w from also being Linux based and using the |
14 | same API there. However, none of that helps us locally because we only | 17 | same API there. However, none of that helps us locally because we only |
15 | ever deal at the client's protocol level. | 18 | ever deal at the client's protocol level. |
16 | Some of the choices made during implementation are the result of this | 19 | |
20 | Some of the choices made during implementation are the result of this | ||
17 | peculiarity of the "common" framework. | 21 | peculiarity of the "common" framework. |
18 | 22 | ||
19 | 23 | ||
20 | 24 | ||
21 | Part 1 - Controller Driver (See include/linux/mailbox_controller.h) | 25 | Controller Driver (See include/linux/mailbox_controller.h) |
26 | ========================================================== | ||
27 | |||
22 | 28 | ||
23 | Allocate mbox_controller and the array of mbox_chan. | 29 | Allocate mbox_controller and the array of mbox_chan. |
24 | Populate mbox_chan_ops; all ops except peek_data() are mandatory. | 30 | Populate mbox_chan_ops; all ops except peek_data() are mandatory. |
25 | The controller driver might know a message has been consumed | 31 | The controller driver might know a message has been consumed |
26 | by the remote by getting an IRQ or polling some hardware flag | 32 | by the remote by getting an IRQ or polling some hardware flag |
@@ -30,91 +36,94 @@ the controller driver should set via 'txdone_irq' or 'txdone_poll' | |||
30 | or neither. | 36 | or neither. |
31 | 37 | ||
32 | 38 | ||
33 | Part 2 - Client Driver (See include/linux/mailbox_client.h) | 39 | Client Driver (See include/linux/mailbox_client.h) |
40 | ================================================== | ||
34 | 41 | ||
35 | The client might want to operate in blocking mode (synchronously | 42 | |
43 | The client might want to operate in blocking mode (synchronously | ||
36 | send a message through before returning) or non-blocking/async mode (submit | 44 | send a message through before returning) or non-blocking/async mode (submit |
37 | a message and a callback function to the API and return immediately). | 45 | a message and a callback function to the API and return immediately). |
38 | 46 | ||
39 | 47 | :: | |
40 | struct demo_client { | 48 | |
41 | struct mbox_client cl; | 49 | struct demo_client { |
42 | struct mbox_chan *mbox; | 50 | struct mbox_client cl; |
43 | struct completion c; | 51 | struct mbox_chan *mbox; |
44 | bool async; | 52 | struct completion c; |
45 | /* ... */ | 53 | bool async; |
46 | }; | 54 | /* ... */ |
47 | 55 | }; | |
48 | /* | 56 | |
49 | * This is the handler for data received from remote. The behaviour is purely | 57 | /* |
50 | * dependent upon the protocol. This is just an example. | 58 | * This is the handler for data received from remote. The behaviour is purely |
51 | */ | 59 | * dependent upon the protocol. This is just an example. |
52 | static void message_from_remote(struct mbox_client *cl, void *mssg) | 60 | */ |
53 | { | 61 | static void message_from_remote(struct mbox_client *cl, void *mssg) |
54 | struct demo_client *dc = container_of(cl, struct demo_client, cl); | 62 | { |
55 | if (dc->async) { | 63 | struct demo_client *dc = container_of(cl, struct demo_client, cl); |
56 | if (is_an_ack(mssg)) { | 64 | if (dc->async) { |
57 | /* An ACK to our last sample sent */ | 65 | if (is_an_ack(mssg)) { |
58 | return; /* Or do something else here */ | 66 | /* An ACK to our last sample sent */ |
59 | } else { /* A new message from remote */ | 67 | return; /* Or do something else here */ |
60 | queue_req(mssg); | 68 | } else { /* A new message from remote */ |
69 | queue_req(mssg); | ||
70 | } | ||
71 | } else { | ||
72 | /* Remote f/w sends only ACK packets on this channel */ | ||
73 | return; | ||
61 | } | 74 | } |
62 | } else { | ||
63 | /* Remote f/w sends only ACK packets on this channel */ | ||
64 | return; | ||
65 | } | 75 | } |
66 | } | 76 | |
67 | 77 | static void sample_sent(struct mbox_client *cl, void *mssg, int r) | |
68 | static void sample_sent(struct mbox_client *cl, void *mssg, int r) | 78 | { |
69 | { | 79 | struct demo_client *dc = container_of(cl, struct demo_client, cl); |
70 | struct demo_client *dc = container_of(cl, struct demo_client, cl); | 80 | complete(&dc->c); |
71 | complete(&dc->c); | 81 | } |
72 | } | 82 | |
73 | 83 | static void client_demo(struct platform_device *pdev) | |
74 | static void client_demo(struct platform_device *pdev) | 84 | { |
75 | { | 85 | struct demo_client *dc_sync, *dc_async; |
76 | struct demo_client *dc_sync, *dc_async; | 86 | /* The controller already knows async_pkt and sync_pkt */ |
77 | /* The controller already knows async_pkt and sync_pkt */ | 87 | struct async_pkt ap; |
78 | struct async_pkt ap; | 88 | struct sync_pkt sp; |
79 | struct sync_pkt sp; | 89 | |
80 | 90 | dc_sync = kzalloc(sizeof(*dc_sync), GFP_KERNEL); | |
81 | dc_sync = kzalloc(sizeof(*dc_sync), GFP_KERNEL); | 91 | dc_async = kzalloc(sizeof(*dc_async), GFP_KERNEL); |
82 | dc_async = kzalloc(sizeof(*dc_async), GFP_KERNEL); | 92 | |
83 | 93 | /* Populate non-blocking mode client */ | |
84 | /* Populate non-blocking mode client */ | 94 | dc_async->cl.dev = &pdev->dev; |
85 | dc_async->cl.dev = &pdev->dev; | 95 | dc_async->cl.rx_callback = message_from_remote; |
86 | dc_async->cl.rx_callback = message_from_remote; | 96 | dc_async->cl.tx_done = sample_sent; |
87 | dc_async->cl.tx_done = sample_sent; | 97 | dc_async->cl.tx_block = false; |
88 | dc_async->cl.tx_block = false; | 98 | dc_async->cl.tx_tout = 0; /* doesn't matter here */ |
89 | dc_async->cl.tx_tout = 0; /* doesn't matter here */ | 99 | dc_async->cl.knows_txdone = false; /* depending upon protocol */ |
90 | dc_async->cl.knows_txdone = false; /* depending upon protocol */ | 100 | dc_async->async = true; |
91 | dc_async->async = true; | 101 | init_completion(&dc_async->c); |
92 | init_completion(&dc_async->c); | 102 | |
93 | 103 | /* Populate blocking mode client */ | |
94 | /* Populate blocking mode client */ | 104 | dc_sync->cl.dev = &pdev->dev; |
95 | dc_sync->cl.dev = &pdev->dev; | 105 | dc_sync->cl.rx_callback = message_from_remote; |
96 | dc_sync->cl.rx_callback = message_from_remote; | 106 | dc_sync->cl.tx_done = NULL; /* operate in blocking mode */ |
97 | dc_sync->cl.tx_done = NULL; /* operate in blocking mode */ | 107 | dc_sync->cl.tx_block = true; |
98 | dc_sync->cl.tx_block = true; | 108 | dc_sync->cl.tx_tout = 500; /* by half a second */ |
99 | dc_sync->cl.tx_tout = 500; /* by half a second */ | 109 | dc_sync->cl.knows_txdone = false; /* depending upon protocol */ |
100 | dc_sync->cl.knows_txdone = false; /* depending upon protocol */ | 110 | dc_sync->async = false; |
101 | dc_sync->async = false; | 111 | |
102 | 112 | /* Async mailbox is listed second in 'mboxes' property */ |
103 | /* ASync mailbox is listed second in 'mboxes' property */ | 113 | dc_async->mbox = mbox_request_channel(&dc_async->cl, 1); |
104 | dc_async->mbox = mbox_request_channel(&dc_async->cl, 1); | 114 | /* Populate data packet */ |
105 | /* Populate data packet */ | 115 | /* ap.xxx = 123; etc */ |
106 | /* ap.xxx = 123; etc */ | 116 | /* Send async message to remote */ |
107 | /* Send async message to remote */ | 117 | mbox_send_message(dc_async->mbox, &ap); |
108 | mbox_send_message(dc_async->mbox, &ap); | 118 | |
109 | 119 | /* Sync mailbox is listed first in 'mboxes' property */ | |
110 | /* Sync mailbox is listed first in 'mboxes' property */ | 120 | dc_sync->mbox = mbox_request_channel(&dc_sync->cl, 0); |
111 | dc_sync->mbox = mbox_request_channel(&dc_sync->cl, 0); | 121 | /* Populate data packet */ |
112 | /* Populate data packet */ | 122 | /* sp.abc = 123; etc */ |
113 | /* sp.abc = 123; etc */ | 123 | /* Send message to remote in blocking mode */ |
114 | /* Send message to remote in blocking mode */ | 124 | mbox_send_message(dc_sync->mbox, &sp); |
115 | mbox_send_message(dc_sync->mbox, &sp); | 125 | /* At this point 'sp' has been sent */ |
116 | /* At this point 'sp' has been sent */ | 126 | |
117 | 127 | /* Now wait for async chan to be done */ | |
118 | /* Now wait for async chan to be done */ | 128 | wait_for_completion(&dc_async->c); |
119 | wait_for_completion(&dc_async->c); | 129 | } |
120 | } | ||
diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt index 5c628e19d6cd..7f49ebf3ddb2 100644 --- a/Documentation/memory-hotplug.txt +++ b/Documentation/memory-hotplug.txt | |||
@@ -2,43 +2,48 @@ | |||
2 | Memory Hotplug | 2 | Memory Hotplug |
3 | ============== | 3 | ============== |
4 | 4 | ||
5 | Created: Jul 28 2007 | 5 | :Created: Jul 28 2007 |
6 | Add description of notifier of memory hotplug Oct 11 2007 | 6 | :Updated: Oct 11 2007 (added description of the memory hotplug notifier) |
7 | 7 | ||
8 | This document is about memory hotplug, including how to use it and its | 8 | This document is about memory hotplug, including how to use it and its |
9 | current status. Because Memory Hotplug is still under development, the | 9 | current status. Because Memory Hotplug is still under development, the |
10 | contents of this text will change often. | 10 | contents of this text will change often. |
11 | 11 | ||
12 | 1. Introduction | 12 | .. CONTENTS |
13 | 1.1 purpose of memory hotplug | ||
14 | 1.2. Phases of memory hotplug | ||
15 | 1.3. Unit of Memory online/offline operation | ||
16 | 2. Kernel Configuration | ||
17 | 3. sysfs files for memory hotplug | ||
18 | 4. Physical memory hot-add phase | ||
19 | 4.1 Hardware(Firmware) Support | ||
20 | 4.2 Notify memory hot-add event by hand | ||
21 | 5. Logical Memory hot-add phase | ||
22 | 5.1. State of memory | ||
23 | 5.2. How to online memory | ||
24 | 6. Logical memory remove | ||
25 | 6.1 Memory offline and ZONE_MOVABLE | ||
26 | 6.2. How to offline memory | ||
27 | 7. Physical memory remove | ||
28 | 8. Memory hotplug event notifier | ||
29 | 9. Future Work List | ||
30 | |||
31 | Note(1): x86_64's has special implementation for memory hotplug. | ||
32 | This text does not describe it. | ||
33 | Note(2): This text assumes that sysfs is mounted at /sys. | ||
34 | 13 | ||
14 | 1. Introduction | ||
15 | 1.1 purpose of memory hotplug | ||
16 | 1.2. Phases of memory hotplug | ||
17 | 1.3. Unit of Memory online/offline operation | ||
18 | 2. Kernel Configuration | ||
19 | 3. sysfs files for memory hotplug | ||
20 | 4. Physical memory hot-add phase | ||
21 | 4.1 Hardware(Firmware) Support | ||
22 | 4.2 Notify memory hot-add event by hand | ||
23 | 5. Logical Memory hot-add phase | ||
24 | 5.1. State of memory | ||
25 | 5.2. How to online memory | ||
26 | 6. Logical memory remove | ||
27 | 6.1 Memory offline and ZONE_MOVABLE | ||
28 | 6.2. How to offline memory | ||
29 | 7. Physical memory remove | ||
30 | 8. Memory hotplug event notifier | ||
31 | 9. Future Work List | ||
35 | 32 | ||
36 | --------------- | ||
37 | 1. Introduction | ||
38 | --------------- | ||
39 | 33 | ||
40 | 1.1 purpose of memory hotplug | 34 | .. note:: |
41 | ------------ | 35 | |
36 | (1) x86_64 has a special implementation for memory hotplug. | ||
37 | This text does not describe it. | ||
38 | (2) This text assumes that sysfs is mounted at /sys. | ||
39 | |||
40 | |||
41 | Introduction | ||
42 | ============ | ||
43 | |||
44 | purpose of memory hotplug | ||
45 | ------------------------- | ||
46 | |||
42 | Memory Hotplug allows users to increase/decrease the amount of memory. | 47 | Memory Hotplug allows users to increase/decrease the amount of memory. |
43 | Generally, there are two purposes. | 48 | Generally, there are two purposes. |
44 | 49 | ||
@@ -53,9 +58,11 @@ hardware which supports memory power management. | |||
53 | Linux memory hotplug is designed for both purposes. | 58 | Linux memory hotplug is designed for both purposes. |
54 | 59 | ||
55 | 60 | ||
56 | 1.2. Phases of memory hotplug | 61 | Phases of memory hotplug |
57 | --------------- | 62 | ------------------------ |
58 | There are 2 phases in Memory Hotplug. | 63 | |
64 | There are 2 phases in Memory Hotplug: | ||
65 | |||
59 | 1) Physical Memory Hotplug phase | 66 | 1) Physical Memory Hotplug phase |
60 | 2) Logical Memory Hotplug phase. | 67 | 2) Logical Memory Hotplug phase. |
61 | 68 | ||
@@ -70,7 +77,7 @@ management tables, and makes sysfs files for new memory's operation. | |||
70 | If firmware supports notification of connection of new memory to the OS, | 77 | If firmware supports notification of connection of new memory to the OS, |
71 | this phase is triggered automatically. ACPI can notify this event. If not, | 78 | this phase is triggered automatically. ACPI can notify this event. If not, |
72 | the "probe" operation by the system administrator is used instead. | 79 | the "probe" operation by the system administrator is used instead. |
73 | (see Section 4.). | 80 | (see :ref:`memory_hotplug_physical_mem`). |
74 | 81 | ||
75 | Logical Memory Hotplug phase is to change memory state into | 82 | Logical Memory Hotplug phase is to change memory state into |
76 | available/unavailable for users. Amount of memory from user's view is | 83 | available/unavailable for users. Amount of memory from user's view is |
@@ -83,11 +90,12 @@ Logical Memory Hotplug phase is triggered by write of sysfs file by system | |||
83 | administrator. For the hot-add case, it must be executed after Physical Hotplug | 90 | administrator. For the hot-add case, it must be executed after Physical Hotplug |
84 | phase by hand. | 91 | phase by hand. |
85 | (However, if you write udev hotplug scripts for memory hotplug, these | 92 | (However, if you write udev hotplug scripts for memory hotplug, these |
86 | phases can be executed in a seamless way.) | 93 | phases can be executed in a seamless way.) |
94 | |||
87 | 95 | ||
96 | Unit of Memory online/offline operation | ||
97 | --------------------------------------- | ||
88 | 98 | ||
89 | 1.3. Unit of Memory online/offline operation | ||
90 | ------------ | ||
91 | Memory hotplug uses SPARSEMEM memory model which allows memory to be divided | 99 | Memory hotplug uses SPARSEMEM memory model which allows memory to be divided |
92 | into chunks of the same size. These chunks are called "sections". The size of | 100 | into chunks of the same size. These chunks are called "sections". The size of |
93 | a memory section is architecture dependent. For example, power uses 16MiB, ia64 | 101 | a memory section is architecture dependent. For example, power uses 16MiB, ia64 |
@@ -97,46 +105,50 @@ Memory sections are combined into chunks referred to as "memory blocks". The | |||
97 | size of a memory block is architecture dependent and represents the logical | 105 | size of a memory block is architecture dependent and represents the logical |
98 | unit upon which memory online/offline operations are to be performed. The | 106 | unit upon which memory online/offline operations are to be performed. The |
99 | default size of a memory block is the same as memory section size unless an | 107 | default size of a memory block is the same as memory section size unless an |
100 | architecture specifies otherwise. (see Section 3.) | 108 | architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.) |
101 | 109 | ||
102 | To determine the size (in bytes) of a memory block, please read this file: | 110 | To determine the size (in bytes) of a memory block, please read this file: |
103 | 111 | ||
104 | /sys/devices/system/memory/block_size_bytes | 112 | /sys/devices/system/memory/block_size_bytes |
105 | 113 | ||
106 | 114 | ||
107 | ----------------------- | 115 | Kernel Configuration |
108 | 2. Kernel Configuration | 116 | ==================== |
109 | ----------------------- | 117 | |
110 | To use the memory hotplug feature, the kernel must be compiled with the | 118 | To use the memory hotplug feature, the kernel must be compiled with the |
111 | following config options. | 119 | following config options. |
112 | 120 | ||
113 | - For all memory hotplug | 121 | - For all memory hotplug: |
114 | Memory model -> Sparse Memory (CONFIG_SPARSEMEM) | 122 | - Memory model -> Sparse Memory (CONFIG_SPARSEMEM) |
115 | Allow for memory hot-add (CONFIG_MEMORY_HOTPLUG) | 123 | - Allow for memory hot-add (CONFIG_MEMORY_HOTPLUG) |
116 | 124 | ||
117 | - To enable memory removal, the following are also necessary | 125 | - To enable memory removal, the following are also necessary: |
118 | Allow for memory hot remove (CONFIG_MEMORY_HOTREMOVE) | 126 | - Allow for memory hot remove (CONFIG_MEMORY_HOTREMOVE) |
119 | Page Migration (CONFIG_MIGRATION) | 127 | - Page Migration (CONFIG_MIGRATION) |
120 | 128 | ||
121 | - For ACPI memory hotplug, the following are also necessary | 129 | - For ACPI memory hotplug, the following are also necessary: |
122 | Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY) | 130 | - Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY) |
123 | This option can be a kernel module. | 131 | - This option can be a kernel module. |
124 | 132 | ||
125 | - As a related configuration, if your box has a feature of NUMA-node hotplug | 133 | - As a related configuration, if your box has a feature of NUMA-node hotplug |
126 | via ACPI, then this option is necessary too. | 134 | via ACPI, then this option is necessary too. |
127 | ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu) | ||
128 | (CONFIG_ACPI_CONTAINER). | ||
129 | This option can be kernel module too. | ||
130 | 135 | ||
136 | - ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu) | ||
137 | (CONFIG_ACPI_CONTAINER). | ||
138 | |||
139 | This option can be a kernel module too. | ||
140 | |||
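Collected in one place, the options above correspond to a kernel config fragment along these lines (shown with ``=m`` where the text allows a module; an illustration, not a complete configuration):

```none
# For all memory hotplug
CONFIG_SPARSEMEM=y
CONFIG_MEMORY_HOTPLUG=y
# To enable memory removal
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_MIGRATION=y
# For ACPI memory hotplug (may be built in with =y)
CONFIG_ACPI_HOTPLUG_MEMORY=m
# For NUMA-node hotplug via ACPI (may be built in with =y)
CONFIG_ACPI_CONTAINER=m
```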
141 | |||
142 | .. _memory_hotplug_sysfs_files: | ||
143 | |||
144 | sysfs files for memory hotplug | ||
145 | ============================== | ||
131 | 146 | ||
132 | -------------------------------- | ||
133 | 3 sysfs files for memory hotplug | ||
134 | -------------------------------- | ||
135 | All memory blocks have their device information in sysfs. Each memory block | 147 | All memory blocks have their device information in sysfs. Each memory block |
136 | is described under /sys/devices/system/memory as | 148 | is described under /sys/devices/system/memory as: |
137 | 149 | ||
138 | /sys/devices/system/memory/memoryXXX | 150 | /sys/devices/system/memory/memoryXXX |
139 | (XXX is the memory block id.) | 151 | (XXX is the memory block id.) |
140 | 152 | ||
141 | For the memory block covered by the sysfs directory, it is expected that all | 153 | For the memory block covered by the sysfs directory, it is expected that all |
142 | memory sections in this range are present and no memory holes exist in the | 154 | memory sections in this range are present and no memory holes exist in the |
@@ -145,43 +157,53 @@ the existence of one should not affect the hotplug capabilities of the memory | |||
145 | block. | 157 | block. |
146 | 158 | ||
147 | For example, assume 1GiB memory block size. A device for a memory starting at | 159 | For example, assume 1GiB memory block size. A device for a memory starting at |
148 | 0x100000000 is /sys/devices/system/memory/memory4 | 160 | 0x100000000 is /sys/devices/system/memory/memory4:: |
149 | (0x100000000 / 1Gib = 4) | 161 | |
162 | (0x100000000 / 1GiB = 4) | ||
163 | |||
150 | This device covers address range [0x100000000 ... 0x140000000) | 164 | This device covers address range [0x100000000 ... 0x140000000) |
151 | 165 | ||
152 | Under each memory block, you can see 5 files: | 166 | Under each memory block, you can see 5 files: |
153 | 167 | ||
154 | /sys/devices/system/memory/memoryXXX/phys_index | 168 | - /sys/devices/system/memory/memoryXXX/phys_index |
155 | /sys/devices/system/memory/memoryXXX/phys_device | 169 | - /sys/devices/system/memory/memoryXXX/phys_device |
156 | /sys/devices/system/memory/memoryXXX/state | 170 | - /sys/devices/system/memory/memoryXXX/state |
157 | /sys/devices/system/memory/memoryXXX/removable | 171 | - /sys/devices/system/memory/memoryXXX/removable |
158 | /sys/devices/system/memory/memoryXXX/valid_zones | 172 | - /sys/devices/system/memory/memoryXXX/valid_zones |
173 | |||
174 | =================== ============================================================ | ||
175 | ``phys_index`` read-only and contains memory block id, same as XXX. | ||
176 | ``state`` read-write | ||
177 | |||
178 | - at read: contains online/offline state of memory. | ||
179 | - at write: user can specify "online_kernel", | ||
159 | 180 | ||
160 | 'phys_index' : read-only and contains memory block id, same as XXX. | ||
161 | 'state' : read-write | ||
162 | at read: contains online/offline state of memory. | ||
163 | at write: user can specify "online_kernel", | ||
164 | "online_movable", "online", "offline" command | 181 | "online_movable", "online", "offline" command |
165 | which will be performed on all sections in the block. | 182 | which will be performed on all sections in the block. |
166 | 'phys_device' : read-only: designed to show the name of physical memory | 183 | ``phys_device`` read-only: designed to show the name of physical memory |
167 | device. This is not well implemented now. | 184 | device. This is not well implemented now. |
168 | 'removable' : read-only: contains an integer value indicating | 185 | ``removable`` read-only: contains an integer value indicating |
169 | whether the memory block is removable or not | 186 | whether the memory block is removable or not |
170 | removable. A value of 1 indicates that the memory | 187 | removable. A value of 1 indicates that the memory |
171 | block is removable and a value of 0 indicates that | 188 | block is removable and a value of 0 indicates that |
172 | it is not removable. A memory block is removable only if | 189 | it is not removable. A memory block is removable only if |
173 | every section in the block is removable. | 190 | every section in the block is removable. |
174 | 'valid_zones' : read-only: designed to show which zones this memory block | 191 | ``valid_zones`` read-only: designed to show which zones this memory block |
175 | can be onlined to. | 192 | can be onlined to. |
176 | The first column shows it's default zone. | 193 | |
194 | The first column shows its default zone. | ||
195 | |||
177 | "memory6/valid_zones: Normal Movable" shows this memoryblock | 196 | "memory6/valid_zones: Normal Movable" shows this memoryblock |
178 | can be onlined to ZONE_NORMAL by default and to ZONE_MOVABLE | 197 | can be onlined to ZONE_NORMAL by default and to ZONE_MOVABLE |
179 | by online_movable. | 198 | by online_movable. |
199 | |||
180 | "memory7/valid_zones: Movable Normal" shows this memoryblock | 200 | "memory7/valid_zones: Movable Normal" shows this memoryblock |
181 | can be onlined to ZONE_MOVABLE by default and to ZONE_NORMAL | 201 | can be onlined to ZONE_MOVABLE by default and to ZONE_NORMAL |
182 | by online_kernel. | 202 | by online_kernel. |
203 | =================== ============================================================ | ||
204 | |||
205 | .. note:: | ||
183 | 206 | ||
184 | NOTE: | ||
185 | These directories/files appear after the physical memory hotplug phase. | 207 | These directories/files appear after the physical memory hotplug phase. |
186 | 208 | ||
187 | If CONFIG_NUMA is enabled, the memoryXXX/ directories can also be accessed | 209 | If CONFIG_NUMA is enabled, the memoryXXX/ directories can also be accessed |
@@ -193,13 +215,14 @@ For example: | |||
193 | A backlink will also be created: | 215 | A backlink will also be created: |
194 | /sys/devices/system/memory/memory9/node0 -> ../../node/node0 | 216 | /sys/devices/system/memory/memory9/node0 -> ../../node/node0 |
195 | 217 | ||
218 | .. _memory_hotplug_physical_mem: | ||
219 | |||
220 | Physical memory hot-add phase | ||
221 | ============================= | ||
196 | 222 | ||
197 | -------------------------------- | 223 | Hardware(Firmware) Support |
198 | 4. Physical memory hot-add phase | 224 | -------------------------- |
199 | -------------------------------- | ||
200 | 225 | ||
201 | 4.1 Hardware(Firmware) Support | ||
202 | ------------ | ||
203 | On the x86_64/ia64 platforms, memory hotplug by ACPI is supported. | 226 | On the x86_64/ia64 platforms, memory hotplug by ACPI is supported. |
204 | 227 | ||
205 | In general, the firmware (ACPI) which supports memory hotplug defines | 228 | In general, the firmware (ACPI) which supports memory hotplug defines |
@@ -209,7 +232,8 @@ script. This will be done automatically. | |||
209 | 232 | ||
210 | But scripts for memory hotplug are not contained in the generic udev package | 233 | But scripts for memory hotplug are not contained in the generic udev package |
211 | (for now). You may have to write them yourself or online/offline memory by hand. | 234 | (for now). You may have to write them yourself or online/offline memory by hand. |
212 | Please see "How to online memory", "How to offline memory" in this text. | 235 | Please see :ref:`memory_hotplug_how_to_online_memory` and |
236 | :ref:`memory_hotplug_how_to_offline_memory`. | ||
213 | 237 | ||
214 | If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004", | 238 | If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004", |
215 | "PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI handler | 239 | "PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI handler |
@@ -217,8 +241,9 @@ calls hotplug code for all of objects which are defined in it. | |||
217 | If memory device is found, memory hotplug code will be called. | 241 | If memory device is found, memory hotplug code will be called. |
218 | 242 | ||
219 | 243 | ||
220 | 4.2 Notify memory hot-add event by hand | 244 | Notify memory hot-add event by hand |
221 | ------------ | 245 | ----------------------------------- |
246 | |||
222 | On some architectures, the firmware may not notify the kernel of a memory | 247 | On some architectures, the firmware may not notify the kernel of a memory |
223 | hotplug event. Therefore, the memory "probe" interface is supported to | 248 | hotplug event. Therefore, the memory "probe" interface is supported to |
224 | explicitly notify the kernel. This interface depends on | 249 | explicitly notify the kernel. This interface depends on |
@@ -229,45 +254,48 @@ notification. | |||
229 | Probe interface is located at | 254 | Probe interface is located at |
230 | /sys/devices/system/memory/probe | 255 | /sys/devices/system/memory/probe |
231 | 256 | ||
232 | You can tell the physical address of new memory to the kernel by | 257 | You can tell the physical address of new memory to the kernel by:: |
233 | 258 | ||
234 | % echo start_address_of_new_memory > /sys/devices/system/memory/probe | 259 | % echo start_address_of_new_memory > /sys/devices/system/memory/probe |
235 | 260 | ||
236 | Then, [start_address_of_new_memory, start_address_of_new_memory + | 261 | Then, [start_address_of_new_memory, start_address_of_new_memory + |
237 | memory_block_size] memory range is hot-added. In this case, hotplug script is | 262 | memory_block_size] memory range is hot-added. In this case, hotplug script is |
238 | not called (in current implementation). You'll have to online memory by | 263 | not called (in current implementation). You'll have to online memory by |
239 | yourself. Please see "How to online memory" in this text. | 264 | yourself. Please see :ref:`memory_hotplug_how_to_online_memory`. |
240 | 265 | ||
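The probe write above can be wrapped in a small helper. This is a sketch only: the `probe_memory` name and the overridable sysfs root are assumptions for illustration, and the real interface additionally requires a kernel with probe support and root privileges:

```shell
# Hypothetical helper: hot-add memory starting at a physical address.
# $1: start address of the new memory, e.g. 0x100000000
# $2: optional sysfs root override (useful for dry-run testing)
probe_memory() {
    root="${2:-/sys/devices/system/memory}"
    echo "$1" > "$root/probe"
}
```

Memory added this way still has to be onlined separately, as described below.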
241 | 266 | ||
242 | ------------------------------ | 267 | Logical Memory hot-add phase |
243 | 5. Logical Memory hot-add phase | 268 | ============================ |
244 | ------------------------------ | ||
245 | 269 | ||
246 | 5.1. State of memory | 270 | State of memory |
247 | ------------ | 271 | --------------- |
248 | To see (online/offline) state of a memory block, read 'state' file. | 272 | |
273 | To see (online/offline) state of a memory block, read 'state' file:: | ||
274 | |||
275 | % cat /sys/device/system/memory/memoryXXX/state | ||
249 | 276 | ||
250 | % cat /sys/device/system/memory/memoryXXX/state | ||
251 | 277 | ||
278 | - If the memory block is online, you'll read "online". | ||
279 | - If the memory block is offline, you'll read "offline". | ||
252 | 280 | ||
253 | If the memory block is online, you'll read "online". | ||
254 | If the memory block is offline, you'll read "offline". | ||
255 | 281 | ||
282 | .. _memory_hotplug_how_to_online_memory: | ||
283 | |||
284 | How to online memory | ||
285 | -------------------- | ||
256 | 286 | ||
257 | 5.2. How to online memory | ||
258 | ------------ | ||
259 | When the memory is hot-added, the kernel decides whether or not to "online" | 287 | When the memory is hot-added, the kernel decides whether or not to "online" |
260 | it according to the policy which can be read from "auto_online_blocks" file: | 288 | it according to the policy which can be read from "auto_online_blocks" file:: |
261 | 289 | ||
262 | % cat /sys/devices/system/memory/auto_online_blocks | 290 | % cat /sys/devices/system/memory/auto_online_blocks |
263 | 291 | ||
264 | The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config | 292 | The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config |
265 | option. If it is disabled the default is "offline" which means the newly added | 293 | option. If it is disabled the default is "offline" which means the newly added |
266 | memory is not in a ready-to-use state and you have to "online" the newly added | 294 | memory is not in a ready-to-use state and you have to "online" the newly added |
267 | memory blocks manually. Automatic onlining can be requested by writing "online" | 295 | memory blocks manually. Automatic onlining can be requested by writing "online" |
268 | to "auto_online_blocks" file: | 296 | to "auto_online_blocks" file:: |
269 | 297 | ||
270 | % echo online > /sys/devices/system/memory/auto_online_blocks | 298 | % echo online > /sys/devices/system/memory/auto_online_blocks |
271 | 299 | ||
272 | This sets a global policy and impacts all memory blocks that will subsequently | 300 | This sets a global policy and impacts all memory blocks that will subsequently |
273 | be hotplugged. Currently offline blocks keep their state. It is possible, under | 301 | be hotplugged. Currently offline blocks keep their state. It is possible, under |
@@ -277,24 +305,26 @@ online. User space tools can check their "state" files | |||
277 | 305 | ||
278 | If the automatic onlining wasn't requested, failed, or some memory block was | 306 | If the automatic onlining wasn't requested, failed, or some memory block was |
279 | offlined it is possible to change the individual block's state by writing to the | 307 | offlined it is possible to change the individual block's state by writing to the |
280 | "state" file: | 308 | "state" file:: |
281 | 309 | ||
282 | % echo online > /sys/devices/system/memory/memoryXXX/state | 310 | % echo online > /sys/devices/system/memory/memoryXXX/state |
283 | 311 | ||
284 | This onlining will not change the ZONE type of the target memory block. | 312 | This onlining will not change the ZONE type of the target memory block. |
285 | If the memory block doesn't belong to any zone an appropriate kernel zone | 313 | If the memory block doesn't belong to any zone an appropriate kernel zone |
286 | (usually ZONE_NORMAL) will be used unless movable_node kernel command line | 314 | (usually ZONE_NORMAL) will be used unless movable_node kernel command line |
287 | option is specified when ZONE_MOVABLE will be used. | 315 | option is specified when ZONE_MOVABLE will be used. |
288 | 316 | ||
289 | You can explicitly request to associate it with ZONE_MOVABLE by | 317 | You can explicitly request to associate it with ZONE_MOVABLE by:: |
318 | |||
319 | % echo online_movable > /sys/devices/system/memory/memoryXXX/state | ||
290 | 320 | ||
291 | % echo online_movable > /sys/devices/system/memory/memoryXXX/state | 321 | .. note:: current limit: this memory block must be adjacent to ZONE_MOVABLE |
292 | (NOTE: current limit: this memory block must be adjacent to ZONE_MOVABLE) | ||
293 | 322 | ||
294 | Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by: | 323 | Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:: |
295 | 324 | ||
296 | % echo online_kernel > /sys/devices/system/memory/memoryXXX/state | 325 | % echo online_kernel > /sys/devices/system/memory/memoryXXX/state |
297 | (NOTE: current limit: this memory block must be adjacent to ZONE_NORMAL) | 326 | |
327 | .. note:: current limit: this memory block must be adjacent to ZONE_NORMAL | ||
298 | 328 | ||
299 | An explicit zone onlining can fail (e.g. when the range is already within | 329 | An explicit zone onlining can fail (e.g. when the range is already within |
300 | an existing and incompatible zone). | 330 | an existing and incompatible zone). |
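The per-block `state` writes shown above lend themselves to scripting. The sketch below is illustrative: the `online_all_blocks` helper name and the overridable sysfs root are assumptions, and in practice the loop must run as root:

```shell
# Hypothetical helper: bring every offline memory block online.
# $1: optional sysfs root override (useful for testing).
online_all_blocks() {
    root="${1:-/sys/devices/system/memory}"
    for state in "$root"/memory*/state; do
        [ -f "$state" ] || continue
        if [ "$(cat "$state")" = "offline" ]; then
            # The write may fail (e.g. -EBUSY); report and keep going.
            echo online > "$state" || \
                echo "failed to online ${state%/state}" >&2
        fi
    done
}

# online_all_blocks        # run as root against the real sysfs
```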
@@ -306,12 +336,12 @@ This may be changed in future. | |||
306 | 336 | ||
307 | 337 | ||
308 | 338 | ||
309 | ------------------------ | 339 | Logical memory remove |
310 | 6. Logical memory remove | 340 | ===================== |
311 | ------------------------ | 341 | |
342 | Memory offline and ZONE_MOVABLE | ||
343 | ------------------------------- | ||
312 | 344 | ||
313 | 6.1 Memory offline and ZONE_MOVABLE | ||
314 | ------------ | ||
315 | Memory offlining is more complicated than memory online. Because memory offline | 345 | Memory offlining is more complicated than memory online. Because memory offline |
316 | has to make the whole memory block be unused, memory offline can fail if | 346 | has to make the whole memory block be unused, memory offline can fail if |
317 | the memory block includes memory which cannot be freed. | 347 | the memory block includes memory which cannot be freed. |
@@ -336,24 +366,27 @@ Assume the system has "TOTAL" amount of memory at boot time, this boot option | |||
336 | creates ZONE_MOVABLE as following. | 366 | creates ZONE_MOVABLE as following. |
337 | 367 | ||
338 | 1) When kernelcore=YYYY boot option is used, | 368 | 1) When kernelcore=YYYY boot option is used, |
339 | Size of memory not for movable pages (not for offline) is YYYY. | 369 | Size of memory not for movable pages (not for offline) is YYYY. |
340 | Size of memory for movable pages (for offline) is TOTAL-YYYY. | 370 | Size of memory for movable pages (for offline) is TOTAL-YYYY. |
341 | 371 | ||
342 | 2) When movablecore=ZZZZ boot option is used, | 372 | 2) When movablecore=ZZZZ boot option is used, |
343 | Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ. | 373 | Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ. |
344 | Size of memory for movable pages (for offline) is ZZZZ. | 374 | Size of memory for movable pages (for offline) is ZZZZ. |
375 | |||
376 | .. note:: | ||
345 | 377 | ||
378 | Unfortunately, there is no information to show which memory block belongs | ||
379 | to ZONE_MOVABLE. This is TBD. | ||
346 | 380 | ||
347 | Note: Unfortunately, there is no information to show which memory block belongs | 381 | .. _memory_hotplug_how_to_offline_memory: |
348 | to ZONE_MOVABLE. This is TBD. | ||
349 | 382 | ||
383 | How to offline memory | ||
384 | --------------------- | ||
350 | 385 | ||
351 | 6.2. How to offline memory | ||
352 | ------------ | ||
353 | You can offline a memory block by using the same sysfs interface that was used | 386 | You can offline a memory block by using the same sysfs interface that was used |
354 | in memory onlining. | 387 | in memory onlining:: |
355 | 388 | ||
356 | % echo offline > /sys/devices/system/memory/memoryXXX/state | 389 | % echo offline > /sys/devices/system/memory/memoryXXX/state |
357 | 390 | ||
358 | If offline succeeds, the state of the memory block is changed to be "offline". | 391 | If offline succeeds, the state of the memory block is changed to be "offline". |
359 | If it fails, some error code (like -EBUSY) will be returned by the kernel. | 392 | If it fails, some error code (like -EBUSY) will be returned by the kernel. |
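Since the write can fail, scripts should check the resulting state rather than assume success. A sketch (the `offline_block` helper name and the verification read are illustrative assumptions):

```shell
# Hypothetical helper: offline one memory block and verify the result.
# $1: block directory, e.g. /sys/devices/system/memory/memory32
offline_block() {
    if echo offline 2>/dev/null > "$1/state" &&
       [ "$(cat "$1/state" 2>/dev/null)" = "offline" ]; then
        echo "offlined $1"
    else
        # The kernel may have returned e.g. -EBUSY for unmovable pages.
        echo "offline of $1 failed" >&2
        return 1
    fi
}
```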
@@ -367,22 +400,22 @@ able to offline it (or not). (For example, a page is referred to by some kernel | |||
367 | internal call and released soon.) | 400 | internal call and released soon.) |
368 | 401 | ||
369 | Consideration: | 402 | Consideration: |
370 | Memory hotplug's design direction is to make the possibility of memory offlining | 403 | Memory hotplug's design direction is to make the possibility of memory |
371 | higher and to guarantee unplugging memory under any situation. But it needs | 404 | offlining higher and to guarantee unplugging memory under any situation. But |
372 | more work. Returning -EBUSY in some situations may be good because the user | 405 | it needs more work. Returning -EBUSY in some situations may be good because |
373 | can decide whether to retry. Currently, memory offlining code | 406 | the user can decide whether to retry. Currently, memory |
374 | does some amount of retry with a 120 second timeout. | 407 | offlining code does some amount of retry with a 120 second timeout. |
408 | |||
409 | Physical memory remove | ||
410 | ====================== | ||
375 | 411 | ||
376 | ------------------------- | ||
377 | 7. Physical memory remove | ||
378 | ------------------------- | ||
379 | More implementation is still needed: | 412 | More implementation is still needed: |
380 | - Notifying firmware from the OS that removal work has completed. | 413 | - Notifying firmware from the OS that removal work has completed. |
381 | - Guarding against removal of memory that is not yet offlined. | 414 | - Guarding against removal of memory that is not yet offlined. |
382 | 415 | ||
383 | -------------------------------- | 416 | Memory hotplug event notifier |
384 | 8. Memory hotplug event notifier | 417 | ============================= |
385 | -------------------------------- | 418 | |
386 | Hotplugging events are sent to a notification queue. | 419 | Hotplugging events are sent to a notification queue. |
387 | 420 | ||
388 | There are six types of notification defined in include/linux/memory.h: | 421 | There are six types of notification defined in include/linux/memory.h: |
@@ -412,14 +445,14 @@ MEM_CANCEL_OFFLINE | |||
412 | MEM_OFFLINE | 445 | MEM_OFFLINE |
413 | Generated after offlining memory is complete. | 446 | Generated after offlining memory is complete. |
414 | 447 | ||
415 | A callback routine can be registered by calling | 448 | A callback routine can be registered by calling:: |
416 | 449 | ||
417 | hotplug_memory_notifier(callback_func, priority) | 450 | hotplug_memory_notifier(callback_func, priority) |
418 | 451 | ||
419 | Callback functions with higher values of priority are called before callback | 452 | Callback functions with higher values of priority are called before callback |
420 | functions with lower values. | 453 | functions with lower values. |
421 | 454 | ||
422 | A callback function must have the following prototype: | 455 | A callback function must have the following prototype:: |
423 | 456 | ||
424 | int callback_func( | 457 | int callback_func( |
425 | struct notifier_block *self, unsigned long action, void *arg); | 458 | struct notifier_block *self, unsigned long action, void *arg); |
@@ -427,27 +460,28 @@ A callback function must have the following prototype: | |||
427 | The first argument of the callback function (self) is a pointer to the block | 460 | The first argument of the callback function (self) is a pointer to the block |
428 | of the notifier chain that points to the callback function itself. | 461 | of the notifier chain that points to the callback function itself. |
429 | The second argument (action) is one of the event types described above. | 462 | The second argument (action) is one of the event types described above. |
430 | The third argument (arg) passes a pointer of struct memory_notify. | 463 | The third argument (arg) passes a pointer of struct memory_notify:: |
431 | 464 | ||
432 | struct memory_notify { | 465 | struct memory_notify { |
433 | unsigned long start_pfn; | 466 | unsigned long start_pfn; |
434 | unsigned long nr_pages; | 467 | unsigned long nr_pages; |
435 | int status_change_nid_normal; | 468 | int status_change_nid_normal; |
436 | int status_change_nid_high; | 469 | int status_change_nid_high; |
437 | int status_change_nid; | 470 | int status_change_nid; |
438 | } | 471 | } |
439 | 472 | ||
440 | start_pfn is start_pfn of online/offline memory. | 473 | - start_pfn is start_pfn of online/offline memory. |
441 | nr_pages is # of pages of online/offline memory. | 474 | - nr_pages is # of pages of online/offline memory. |
442 | status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask | 475 | - status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask |
443 | is (will be) set/clear, if this is -1, then nodemask status is not changed. | 476 | is (will be) set/clear, if this is -1, then nodemask status is not changed. |
444 | status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask | 477 | - status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask |
445 | is (will be) set/clear, if this is -1, then nodemask status is not changed. | 478 | is (will be) set/clear, if this is -1, then nodemask status is not changed. |
446 | status_change_nid is set node id when N_MEMORY of nodemask is (will be) | 479 | - status_change_nid is set node id when N_MEMORY of nodemask is (will be) |
447 | set/clear. It means a new(memoryless) node gets new memory by online and a | 480 | set/clear. It means a new(memoryless) node gets new memory by online and a |
448 | node loses all memory. If this is -1, then nodemask status is not changed. | 481 | node loses all memory. If this is -1, then nodemask status is not changed. |
449 | If status_change_nid* >= 0, callback should create/discard structures for the | 482 | |
450 | node if necessary. | 483 | If status_change_nid* >= 0, callback should create/discard structures for the |
484 | node if necessary. | ||
451 | 485 | ||
452 | The callback routine shall return one of the values | 486 | The callback routine shall return one of the values |
453 | NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP | 487 | NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP |
@@ -461,9 +495,9 @@ further processing of the notification queue. | |||
461 | 495 | ||
462 | NOTIFY_STOP stops further processing of the notification queue. | 496 | NOTIFY_STOP stops further processing of the notification queue. |
463 | 497 | ||
464 | -------------- | 498 | Future Work |
465 | 9. Future Work | 499 | =========== |
466 | -------------- | 500 | |
467 | - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like | 501 | - allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like |
468 | sysctl or new control file. | 502 | sysctl or new control file. |
469 | - showing memory block and physical device relationship. | 503 | - showing memory block and physical device relationship. |
@@ -471,4 +505,3 @@ NOTIFY_STOP stops further processing of the notification queue. | |||
471 | - support HugeTLB page migration and offlining. | 505 | - support HugeTLB page migration and offlining. |
472 | - memmap removing at memory offline. | 506 | - memmap removing at memory offline. |
473 | - physical remove memory. | 507 | - physical remove memory. |
474 | |||
diff --git a/Documentation/men-chameleon-bus.txt b/Documentation/men-chameleon-bus.txt index 30ded732027e..1b1f048aa748 100644 --- a/Documentation/men-chameleon-bus.txt +++ b/Documentation/men-chameleon-bus.txt | |||
@@ -1,163 +1,175 @@ | |||
1 | MEN Chameleon Bus | ||
2 | ================= | ||
3 | |||
4 | Table of Contents | ||
5 | ================= | 1 | ================= |
6 | 1 Introduction | 2 | MEN Chameleon Bus |
7 | 1.1 Scope of this Document | 3 | ================= |
8 | 1.2 Limitations of the current implementation | 4 | |
9 | 2 Architecture | 5 | .. Table of Contents |
10 | 2.1 MEN Chameleon Bus | 6 | ================= |
11 | 2.2 Carrier Devices | 7 | 1 Introduction |
12 | 2.3 Parser | 8 | 1.1 Scope of this Document |
13 | 3 Resource handling | 9 | 1.2 Limitations of the current implementation |
14 | 3.1 Memory Resources | 10 | 2 Architecture |
15 | 3.2 IRQs | 11 | 2.1 MEN Chameleon Bus |
16 | 4 Writing an MCB driver | 12 | 2.2 Carrier Devices |
17 | 4.1 The driver structure | 13 | 2.3 Parser |
18 | 4.2 Probing and attaching | 14 | 3 Resource handling |
19 | 4.3 Initializing the driver | 15 | 3.1 Memory Resources |
20 | 16 | 3.2 IRQs | |
21 | 17 | 4 Writing an MCB driver | |
22 | 1 Introduction | 18 | 4.1 The driver structure |
23 | =============== | 19 | 4.2 Probing and attaching |
24 | This document describes the architecture and implementation of the MEN | 20 | 4.3 Initializing the driver |
25 | Chameleon Bus (called MCB throughout this document). | 21 | |
26 | 22 | ||
27 | 1.1 Scope of this Document | 23 | Introduction |
28 | --------------------------- | 24 | ============ |
29 | This document is intended to be a short overview of the current | 25 | |
30 | implementation and does by no means describe the complete possibilities of MCB | 26 | This document describes the architecture and implementation of the MEN |
31 | based devices. | 27 | Chameleon Bus (called MCB throughout this document). |
32 | 28 | ||
33 | 1.2 Limitations of the current implementation | 29 | Scope of this Document |
34 | ---------------------------------------------- | ||
35 | The current implementation is limited to PCI and PCIe based carrier devices | ||
36 | that only use a single memory resource and share the PCI legacy IRQ. Not | ||
37 | implemented are: | ||
38 | - Multi-resource MCB devices like the VME Controller or M-Module carrier. | ||
39 | - MCB devices that need another MCB device, like SRAM for a DMA Controller's | ||
40 | buffer descriptors or a video controller's video memory. | ||
41 | - A per-carrier IRQ domain for carrier devices that have one (or more) IRQs | ||
42 | per MCB device like PCIe based carriers with MSI or MSI-X support. | ||
43 | |||
44 | 2 Architecture | ||
45 | =============== | ||
46 | MCB is divided into 3 functional blocks: | ||
47 | - The MEN Chameleon Bus itself, | ||
48 | - drivers for MCB Carrier Devices and | ||
49 | - the parser for the Chameleon table. | ||
50 | |||
51 | 2.1 MEN Chameleon Bus | ||
52 | ---------------------- | 30 | ---------------------- |
53 | The MEN Chameleon Bus is an artificial bus system that attaches to a so | 31 | |
54 | called Chameleon FPGA device found on some hardware produced by MEN Mikro | 32 | This document is intended to be a short overview of the current |
55 | Elektronik GmbH. These devices are multi-function devices implemented in a | 33 | implementation and does by no means describe the complete possibilities of MCB |
56 | single FPGA and usually attached via some sort of PCI or PCIe link. Each | 34 | based devices. |
57 | FPGA contains a header section describing the content of the FPGA. The | 35 | |
58 | header lists the device id, PCI BAR, offset from the beginning of the PCI | 36 | Limitations of the current implementation |
59 | BAR, size in the FPGA, interrupt number and some other properties currently | 37 | ----------------------------------------- |
60 | not handled by the MCB implementation. | 38 | |
61 | 39 | The current implementation is limited to PCI and PCIe based carrier devices | |
62 | 2.2 Carrier Devices | 40 | that only use a single memory resource and share the PCI legacy IRQ. Not |
41 | implemented are: | ||
42 | |||
43 | - Multi-resource MCB devices like the VME Controller or M-Module carrier. | ||
44 | - MCB devices that need another MCB device, like SRAM for a DMA Controller's | ||
45 | buffer descriptors or a video controller's video memory. | ||
46 | - A per-carrier IRQ domain for carrier devices that have one (or more) IRQs | ||
47 | per MCB device like PCIe based carriers with MSI or MSI-X support. | ||
48 | |||
49 | Architecture | ||
50 | ============ | ||
51 | |||
52 | MCB is divided into 3 functional blocks: | ||
53 | |||
54 | - The MEN Chameleon Bus itself, | ||
55 | - drivers for MCB Carrier Devices and | ||
56 | - the parser for the Chameleon table. | ||
57 | |||
58 | MEN Chameleon Bus | ||
59 | ----------------- | ||
60 | |||
61 | The MEN Chameleon Bus is an artificial bus system that attaches to a so | ||
62 | called Chameleon FPGA device found on some hardware produced by MEN Mikro | ||
63 | Elektronik GmbH. These devices are multi-function devices implemented in a | ||
64 | single FPGA and usually attached via some sort of PCI or PCIe link. Each | ||
65 | FPGA contains a header section describing the content of the FPGA. The | ||
66 | header lists the device id, PCI BAR, offset from the beginning of the PCI | ||
67 | BAR, size in the FPGA, interrupt number and some other properties currently | ||
68 | not handled by the MCB implementation. | ||
69 | |||
70 | Carrier Devices | ||
71 | --------------- | ||
72 | |||
73 | A carrier device is just an abstraction for the real world physical bus the | ||
74 | Chameleon FPGA is attached to. Some IP Core drivers may need to interact with | ||
75 | properties of the carrier device (like querying the IRQ number of a PCI | ||
76 | device). To provide abstraction from the real hardware bus, an MCB carrier | ||
77 | device provides callback methods to translate the driver's MCB function calls | ||
78 | to hardware related function calls. For example a carrier device may | ||
79 | implement the get_irq() method which can be translated into a hardware bus | ||
80 | query for the IRQ number the device should use. | ||
81 | |||
82 | Parser | ||
83 | ------ | ||
84 | |||
85 | The parser reads the first 512 bytes of a Chameleon device and parses the | ||
86 | Chameleon table. Currently the parser only supports the Chameleon v2 variant | ||
87 | of the Chameleon table but can easily be adapted to support an older or | ||
88 | possible future variant. While parsing the table's entries new MCB devices | ||
89 | are allocated and their resources are assigned according to the resource | ||
90 | assignment in the Chameleon table. After resource assignment is finished, the | ||
91 | MCB devices are registered at the MCB and thus at the driver core of the | ||
92 | Linux kernel. | ||
93 | |||
94 | Resource handling | ||
95 | ================= | ||
96 | |||
97 | The current implementation assigns exactly one memory and one IRQ resource | ||
98 | per MCB device. But this is likely going to change in the future. | ||
99 | |||
100 | Memory Resources | ||
101 | ---------------- | ||
102 | |||
103 | Each MCB device has exactly one memory resource, which can be requested from | ||
104 | the MCB bus. This memory resource is the physical address of the MCB device | ||
105 | inside the carrier and is intended to be passed to ioremap() and friends. It | ||
106 | is already requested from the kernel by calling request_mem_region(). | ||
107 | |||
108 | IRQs | ||
109 | ---- | ||
110 | |||
111 | Each MCB device has exactly one IRQ resource, which can be requested from the | ||
112 | MCB bus. If a carrier device driver implements the ->get_irq() callback | ||
113 | method, the IRQ number assigned by the carrier device will be returned, | ||
114 | otherwise the IRQ number inside the Chameleon table will be returned. This | ||
115 | number is suitable to be passed to request_irq(). | ||
116 | |||
117 | Writing an MCB driver | ||
118 | ===================== | ||
119 | |||
120 | The driver structure | ||
63 | -------------------- | 121 | -------------------- |
64 | A carrier device is just an abstraction for the real world physical bus the | 122 | |
65 | Chameleon FPGA is attached to. Some IP Core drivers may need to interact with | 123 | Each MCB driver has a structure to identify the device driver as well as |
66 | properties of the carrier device (like querying the IRQ number of a PCI | 124 | device ids which identify the IP Core inside the FPGA. The driver structure |
67 | device). To provide abstraction from the real hardware bus, an MCB carrier | 125 | also contains callback methods which get executed on driver probe and |
68 | device provides callback methods to translate the driver's MCB function calls | 126 | removal from the system:: |
69 | to hardware related function calls. For example a carrier device may | 127 | |
70 | implement the get_irq() method which can be translated into a hardware bus | 128 | static const struct mcb_device_id foo_ids[] = { |
71 | query for the IRQ number the device should use. | 129 | { .device = 0x123 }, |
72 | 130 | { } | |
73 | 2.3 Parser | 131 | }; |
74 | ----------- | 132 | MODULE_DEVICE_TABLE(mcb, foo_ids); |
75 | The parser reads the first 512 bytes of a Chameleon device and parses the | 133 | |
76 | Chameleon table. Currently the parser only supports the Chameleon v2 variant | 134 | static struct mcb_driver foo_driver = { |
77 | of the Chameleon table but can easily be adapted to support an older or | 135 | .driver = { |
78 | possible future variant. While parsing the table's entries new MCB devices | 136 | .name = "foo-bar", |
79 | are allocated and their resources are assigned according to the resource | 137 | .owner = THIS_MODULE, |
80 | assignment in the Chameleon table. After resource assignment is finished, the | 138 | }, |
81 | MCB devices are registered at the MCB and thus at the driver core of the | 139 | .probe = foo_probe, |
82 | Linux kernel. | 140 | .remove = foo_remove, |
83 | 141 | .id_table = foo_ids, | |
84 | 3 Resource handling | 142 | }; |
85 | ==================== | 143 | |
86 | The current implementation assigns exactly one memory and one IRQ resource | 144 | Probing and attaching |
87 | per MCB device. But this is likely going to change in the future. | ||
88 | |||
89 | 3.1 Memory Resources | ||
90 | --------------------- | 145 | --------------------- |
91 | Each MCB device has exactly one memory resource, which can be requested from | 146 | |
92 | the MCB bus. This memory resource is the physical address of the MCB device | 147 | When a driver is loaded and the MCB devices it services are found, the MCB |
93 | inside the carrier and is intended to be passed to ioremap() and friends. It | 148 | core will call the driver's probe callback method. When the driver is removed |
94 | is already requested from the kernel by calling request_mem_region(). | 149 | from the system, the MCB core will call the driver's remove callback method:: |
95 | 150 | ||
96 | 3.2 IRQs | 151 | static int foo_probe(struct mcb_device *mdev, const struct mcb_device_id *id); |
97 | --------- | 152 | static void foo_remove(struct mcb_device *mdev); |
98 | Each MCB device has exactly one IRQ resource, which can be requested from the | 153 | |
99 | MCB bus. If a carrier device driver implements the ->get_irq() callback | 154 | Initializing the driver |
100 | method, the IRQ number assigned by the carrier device will be returned, | 155 | ----------------------- |
101 | otherwise the IRQ number inside the Chameleon table will be returned. This | 156 | |
102 | number is suitable to be passed to request_irq(). | 157 | When the kernel is booted or your foo driver module is inserted, you have to |
103 | 158 | perform driver initialization. Usually it is enough to register your driver | |
104 | 4 Writing an MCB driver | 159 | module at the MCB core:: |
105 | ======================= | 160 | |
106 | 161 | static int __init foo_init(void) | |
107 | 4.1 The driver structure | 162 | { |
108 | ------------------------- | 163 | return mcb_register_driver(&foo_driver); |
109 | Each MCB driver has a structure to identify the device driver as well as | 164 | } |
110 | device ids which identify the IP Core inside the FPGA. The driver structure | 165 | module_init(foo_init); |
111 | also contains callback methods which get executed on driver probe and | 166 | |
112 | removal from the system. | 167 | static void __exit foo_exit(void) |
113 | 168 | { | |
114 | 169 | mcb_unregister_driver(&foo_driver); | |
115 | static const struct mcb_device_id foo_ids[] = { | 170 | } |
116 | { .device = 0x123 }, | 171 | module_exit(foo_exit); |
117 | { } | 172 | |
118 | }; | 173 | The module_mcb_driver() macro can be used to reduce the above code:: |
119 | MODULE_DEVICE_TABLE(mcb, foo_ids); | 174 | |
120 | 175 | module_mcb_driver(foo_driver); | |
121 | static struct mcb_driver foo_driver = { | ||
122 | .driver = { | ||
123 | .name = "foo-bar", | ||
124 | .owner = THIS_MODULE, | ||
125 | }, | ||
126 | .probe = foo_probe, | ||
127 | .remove = foo_remove, | ||
128 | .id_table = foo_ids, | ||
129 | }; | ||
130 | |||
131 | 4.2 Probing and attaching | ||
132 | -------------------------- | ||
133 | When a driver is loaded and the MCB devices it services are found, the MCB | ||
134 | core will call the driver's probe callback method. When the driver is removed | ||
135 | from the system, the MCB core will call the driver's remove callback method. | ||
136 | |||
137 | |||
138 | static int foo_probe(struct mcb_device *mdev, const struct mcb_device_id *id); | ||
139 | static void foo_remove(struct mcb_device *mdev); | ||
140 | |||
141 | 4.3 Initializing the driver | ||
142 | ---------------------------- | ||
143 | When the kernel is booted or your foo driver module is inserted, you have to | ||
144 | perform driver initialization. Usually it is enough to register your driver | ||
145 | module at the MCB core. | ||
146 | |||
147 | |||
148 | static int __init foo_init(void) | ||
149 | { | ||
150 | return mcb_register_driver(&foo_driver); | ||
151 | } | ||
152 | module_init(foo_init); | ||
153 | |||
154 | static void __exit foo_exit(void) | ||
155 | { | ||
156 | mcb_unregister_driver(&foo_driver); | ||
157 | } | ||
158 | module_exit(foo_exit); | ||
159 | |||
160 | The module_mcb_driver() macro can be used to reduce the above code. | ||
161 | |||
162 | |||
163 | module_mcb_driver(foo_driver); | ||
diff --git a/Documentation/nommu-mmap.txt b/Documentation/nommu-mmap.txt index ae57b9ea0d41..69556f0d494b 100644 --- a/Documentation/nommu-mmap.txt +++ b/Documentation/nommu-mmap.txt | |||
@@ -1,6 +1,6 @@ | |||
1 | ============================= | 1 | ============================= |
2 | NO-MMU MEMORY MAPPING SUPPORT | 2 | No-MMU memory mapping support |
3 | ============================= | 3 | ============================= |
4 | 4 | ||
5 | The kernel has limited support for memory mapping under no-MMU conditions, such | 5 | The kernel has limited support for memory mapping under no-MMU conditions, such |
6 | as those used in uClinux environments. From the userspace point of view, memory | 6 | as those used in uClinux environments. From the userspace point of view, memory |
@@ -16,7 +16,7 @@ the CLONE_VM flag. | |||
16 | The behaviour is similar between the MMU and no-MMU cases, but not identical; | 16 | The behaviour is similar between the MMU and no-MMU cases, but not identical; |
17 | and it's also much more restricted in the latter case: | 17 | and it's also much more restricted in the latter case: |
18 | 18 | ||
19 | (*) Anonymous mapping, MAP_PRIVATE | 19 | (#) Anonymous mapping, MAP_PRIVATE |
20 | 20 | ||
21 | In the MMU case: VM regions backed by arbitrary pages; copy-on-write | 21 | In the MMU case: VM regions backed by arbitrary pages; copy-on-write |
22 | across fork. | 22 | across fork. |
@@ -24,14 +24,14 @@ and it's also much more restricted in the latter case: | |||
24 | In the no-MMU case: VM regions backed by arbitrary contiguous runs of | 24 | In the no-MMU case: VM regions backed by arbitrary contiguous runs of |
25 | pages. | 25 | pages. |
26 | 26 | ||
27 | (*) Anonymous mapping, MAP_SHARED | 27 | (#) Anonymous mapping, MAP_SHARED |
28 | 28 | ||
29 | These behave very much like private mappings, except that they're | 29 | These behave very much like private mappings, except that they're |
30 | shared across fork() or clone() without CLONE_VM in the MMU case. Since | 30 | shared across fork() or clone() without CLONE_VM in the MMU case. Since |
31 | the no-MMU case doesn't support these, behaviour is identical to | 31 | the no-MMU case doesn't support these, behaviour is identical to |
32 | MAP_PRIVATE there. | 32 | MAP_PRIVATE there. |
33 | 33 | ||
34 | (*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE | 34 | (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE |
35 | 35 | ||
36 | In the MMU case: VM regions backed by pages read from file; changes to | 36 | In the MMU case: VM regions backed by pages read from file; changes to |
37 | the underlying file are reflected in the mapping; copied across fork. | 37 | the underlying file are reflected in the mapping; copied across fork. |
@@ -56,7 +56,7 @@ and it's also much more restricted in the latter case: | |||
56 | are visible in other processes (no MMU protection), but should not | 56 | are visible in other processes (no MMU protection), but should not |
57 | happen. | 57 | happen. |
58 | 58 | ||
59 | (*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE | 59 | (#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE |
60 | 60 | ||
61 | In the MMU case: like the non-PROT_WRITE case, except that the pages in | 61 | In the MMU case: like the non-PROT_WRITE case, except that the pages in |
62 | question get copied before the write actually happens. From that point | 62 | question get copied before the write actually happens. From that point |
@@ -66,7 +66,7 @@ and it's also much more restricted in the latter case: | |||
66 | In the no-MMU case: works much like the non-PROT_WRITE case, except | 66 | In the no-MMU case: works much like the non-PROT_WRITE case, except |
67 | that a copy is always taken and never shared. | 67 | that a copy is always taken and never shared. |
68 | 68 | ||
69 | (*) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE | 69 | (#) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE |
70 | 70 | ||
71 | In the MMU case: VM regions backed by pages read from file; changes to | 71 | In the MMU case: VM regions backed by pages read from file; changes to |
72 | pages written back to file; writes to file reflected into pages backing | 72 | pages written back to file; writes to file reflected into pages backing |
@@ -74,7 +74,7 @@ and it's also much more restricted in the latter case: | |||
74 | 74 | ||
75 | In the no-MMU case: not supported. | 75 | In the no-MMU case: not supported. |
76 | 76 | ||
77 | (*) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE | 77 | (#) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE |
78 | 78 | ||
79 | In the MMU case: As for ordinary regular files. | 79 | In the MMU case: As for ordinary regular files. |
80 | 80 | ||
@@ -85,7 +85,7 @@ and it's also much more restricted in the latter case: | |||
85 | as for the MMU case. If the filesystem does not provide any such | 85 | as for the MMU case. If the filesystem does not provide any such |
86 | support, then the mapping request will be denied. | 86 | support, then the mapping request will be denied. |
87 | 87 | ||
88 | (*) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE | 88 | (#) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE |
89 | 89 | ||
90 | In the MMU case: As for ordinary regular files. | 90 | In the MMU case: As for ordinary regular files. |
91 | 91 | ||
@@ -94,7 +94,7 @@ and it's also much more restricted in the latter case: | |||
94 | truncate being called. The ramdisk driver could do this if it allocated | 94 | truncate being called. The ramdisk driver could do this if it allocated |
95 | all its memory as a contiguous array upfront. | 95 | all its memory as a contiguous array upfront. |
96 | 96 | ||
97 | (*) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE | 97 | (#) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE |
98 | 98 | ||
99 | In the MMU case: As for ordinary regular files. | 99 | In the MMU case: As for ordinary regular files. |
100 | 100 | ||
@@ -105,21 +105,20 @@ and it's also much more restricted in the latter case: | |||
105 | provide any such support, then the mapping request will be denied. | 105 | provide any such support, then the mapping request will be denied. |
106 | 106 | ||
107 | 107 | ||
108 | ============================ | 108 | Further notes on no-MMU MMAP |
109 | FURTHER NOTES ON NO-MMU MMAP | ||
110 | ============================ | 109 | ============================ |
111 | 110 | ||
112 | (*) A request for a private mapping of a file may return a buffer that is not | 111 | (#) A request for a private mapping of a file may return a buffer that is not |
113 | page-aligned. This is because XIP may take place, and the data may not be | 112 | page-aligned. This is because XIP may take place, and the data may not be |
114 | page aligned in the backing store. | 113 | page aligned in the backing store. |
115 | 114 | ||
116 | (*) A request for an anonymous mapping will always be page aligned. If | 115 | (#) A request for an anonymous mapping will always be page aligned. If |
117 | possible, the size of the request should be a power of two; otherwise some | 116 | possible, the size of the request should be a power of two; otherwise some |
118 | of the space may be wasted, as the kernel must allocate a power-of-2 | 117 | of the space may be wasted, as the kernel must allocate a power-of-2 |
119 | granule but will only discard the excess if appropriately configured, as | 118 | granule but will only discard the excess if appropriately configured, as |
120 | this has an effect on fragmentation. | 119 | this has an effect on fragmentation. |
121 | 120 | ||
122 | (*) The memory allocated by a request for an anonymous mapping will normally | 121 | (#) The memory allocated by a request for an anonymous mapping will normally |
123 | be cleared by the kernel before being returned in accordance with the | 122 | be cleared by the kernel before being returned in accordance with the |
124 | Linux man pages (ver 2.22 or later). | 123 | Linux man pages (ver 2.22 or later). |
125 | 124 | ||
@@ -145,24 +144,23 @@ FURTHER NOTES ON NO-MMU MMAP | |||
145 | uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this | 144 | uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this |
146 | to allocate the brk and stack region. | 145 | to allocate the brk and stack region. |
147 | 146 | ||
148 | (*) A list of all the private copy and anonymous mappings on the system is | 147 | (#) A list of all the private copy and anonymous mappings on the system is |
149 | visible through /proc/maps in no-MMU mode. | 148 | visible through /proc/maps in no-MMU mode. |
150 | 149 | ||
151 | (*) A list of all the mappings in use by a process is visible through | 150 | (#) A list of all the mappings in use by a process is visible through |
152 | /proc/<pid>/maps in no-MMU mode. | 151 | /proc/<pid>/maps in no-MMU mode. |
153 | 152 | ||
154 | (*) Supplying MAP_FIXED or requesting a particular mapping address will | 153 | (#) Supplying MAP_FIXED or requesting a particular mapping address will |
155 | result in an error. | 154 | result in an error. |
156 | 155 | ||
157 | (*) Files mapped privately usually have to have a read method provided by the | 156 | (#) Files mapped privately usually have to have a read method provided by the |
158 | driver or filesystem so that the contents can be read into the memory | 157 | driver or filesystem so that the contents can be read into the memory |
159 | allocated if mmap() chooses not to map the backing device directly. An | 158 | allocated if mmap() chooses not to map the backing device directly. An |
160 | error will result if they don't. This is most likely to be encountered | 159 | error will result if they don't. This is most likely to be encountered |
161 | with character device files, pipes, fifos and sockets. | 160 | with character device files, pipes, fifos and sockets. |
162 | 161 | ||
163 | 162 | ||
164 | ========================== | 163 | Interprocess shared memory |
165 | INTERPROCESS SHARED MEMORY | ||
166 | ========================== | 164 | ========================== |
167 | 165 | ||
168 | Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU | 166 | Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU |
@@ -170,8 +168,7 @@ mode. The former through the usual mechanism, the latter through files created | |||
170 | on ramfs or tmpfs mounts. | 168 | on ramfs or tmpfs mounts. |
171 | 169 | ||
172 | 170 | ||
173 | ======= | 171 | Futexes |
174 | FUTEXES | ||
175 | ======= | 172 | ======= |
176 | 173 | ||
177 | Futexes are supported in NOMMU mode if the arch supports them. An error will | 174 | Futexes are supported in NOMMU mode if the arch supports them. An error will |
@@ -180,12 +177,11 @@ mappings made by a process or if the mapping in which the address lies does not | |||
180 | support futexes (such as an I/O chardev mapping). | 177 | support futexes (such as an I/O chardev mapping). |
181 | 178 | ||
182 | 179 | ||
183 | ============= | 180 | No-MMU mremap |
184 | NO-MMU MREMAP | ||
185 | ============= | 181 | ============= |
186 | 182 | ||
187 | The mremap() function is partially supported. It may change the size of a | 183 | The mremap() function is partially supported. It may change the size of a |
188 | mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size | 184 | mapping, and may move it [#]_ if MREMAP_MAYMOVE is specified and if the new size |
189 | of the mapping exceeds the size of the slab object currently occupied by the | 185 | of the mapping exceeds the size of the slab object currently occupied by the |
190 | memory to which the mapping refers, or if a smaller slab object could be used. | 186 | memory to which the mapping refers, or if a smaller slab object could be used. |
191 | 187 | ||
@@ -200,11 +196,10 @@ a previously mapped object. It may not be used to create holes in existing | |||
200 | mappings, move parts of existing mappings or resize parts of mappings. It must | 196 | mappings, move parts of existing mappings or resize parts of mappings. It must |
201 | act on a complete mapping. | 197 | act on a complete mapping. |
202 | 198 | ||
203 | [*] Not currently supported. | 199 | .. [#] Not currently supported. |
204 | 200 | ||
205 | 201 | ||
206 | ============================================ | 202 | Providing shareable character device support |
207 | PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT | ||
208 | ============================================ | 203 | ============================================ |
209 | 204 | ||
210 | To provide shareable character device support, a driver must provide a | 205 | To provide shareable character device support, a driver must provide a |
@@ -235,7 +230,7 @@ direct the call to the device-specific driver. Under such circumstances, the | |||
235 | mapping request will be rejected if NOMMU_MAP_COPY is not specified, and a | 230 | mapping request will be rejected if NOMMU_MAP_COPY is not specified, and a |
236 | copy mapped otherwise. | 231 | copy mapped otherwise. |
237 | 232 | ||
238 | IMPORTANT NOTE: | 233 | .. important:: |
239 | 234 | ||
240 | Some types of device may present a different appearance to anyone | 235 | Some types of device may present a different appearance to anyone |
241 | looking at them in certain modes. Flash chips can be like this; for | 236 | looking at them in certain modes. Flash chips can be like this; for |
@@ -249,8 +244,7 @@ IMPORTANT NOTE: | |||
249 | circumstances! | 244 | circumstances! |
250 | 245 | ||
251 | 246 | ||
252 | ============================================== | 247 | Providing shareable memory-backed file support |
253 | PROVIDING SHAREABLE MEMORY-BACKED FILE SUPPORT | ||
254 | ============================================== | 248 | ============================================== |
255 | 249 | ||
256 | Provision of shared mappings on memory backed files is similar to the provision | 250 | Provision of shared mappings on memory backed files is similar to the provision |
@@ -267,8 +261,7 @@ Memory backed devices are indicated by the mapping's backing device info having | |||
267 | the memory_backed flag set. | 261 | the memory_backed flag set. |
268 | 262 | ||
269 | 263 | ||
270 | ======================================== | 264 | Providing shareable block device support |
271 | PROVIDING SHAREABLE BLOCK DEVICE SUPPORT | ||
272 | ======================================== | 265 | ======================================== |
273 | 266 | ||
274 | Provision of shared mappings on block device files is exactly the same as for | 267 | Provision of shared mappings on block device files is exactly the same as for |
@@ -276,8 +269,7 @@ character devices. If there isn't a real device underneath, then the driver | |||
276 | should allocate sufficient contiguous memory to honour any supported mapping. | 269 | should allocate sufficient contiguous memory to honour any supported mapping. |
277 | 270 | ||
278 | 271 | ||
279 | ================================= | 272 | Adjusting page trimming behaviour |
280 | ADJUSTING PAGE TRIMMING BEHAVIOUR | ||
281 | ================================= | 273 | ================================= |
282 | 274 | ||
283 | NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages | 275 | NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages |
@@ -288,4 +280,4 @@ allocator. In order to retain finer-grained control over fragmentation, this | |||
288 | behaviour can either be disabled completely, or bumped up to a higher page | 280 | behaviour can either be disabled completely, or bumped up to a higher page |
289 | watermark where trimming begins. | 281 | watermark where trimming begins. |
290 | 282 | ||
291 | Page trimming behaviour is configurable via the sysctl `vm.nr_trim_pages'. | 283 | Page trimming behaviour is configurable via the sysctl ``vm.nr_trim_pages``. |
diff --git a/Documentation/ntb.txt b/Documentation/ntb.txt index a5af4f0159f3..a043854d28df 100644 --- a/Documentation/ntb.txt +++ b/Documentation/ntb.txt | |||
@@ -1,4 +1,6 @@ | |||
1 | # NTB Drivers | 1 | =========== |
2 | NTB Drivers | ||
3 | =========== | ||
2 | 4 | ||
3 | NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects | 5 | NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects |
4 | the separate memory systems of two or more computers to the same PCI-Express | 6 | the separate memory systems of two or more computers to the same PCI-Express |
@@ -12,7 +14,8 @@ special status bits to make sure the information isn't rewritten by another | |||
12 | peer. Doorbell registers provide a way for peers to send interrupt events. | 14 | peer. Doorbell registers provide a way for peers to send interrupt events. |
13 | Memory windows allow translated read and write access to the peer memory. | 15 | Memory windows allow translated read and write access to the peer memory. |
14 | 16 | ||
15 | ## NTB Core Driver (ntb) | 17 | NTB Core Driver (ntb) |
18 | ===================== | ||
16 | 19 | ||
17 | The NTB core driver defines an api wrapping the common feature set, and allows | 20 | The NTB core driver defines an api wrapping the common feature set, and allows |
18 | clients interested in NTB features to discover the NTB devices supported by | 21 | clients interested in NTB features to discover the NTB devices supported by |
@@ -20,7 +23,8 @@ hardware drivers. The term "client" is used here to mean an upper layer | |||
20 | component making use of the NTB api. The term "driver," or "hardware driver," | 23 | component making use of the NTB api. The term "driver," or "hardware driver," |
21 | is used here to mean a driver for a specific vendor and model of NTB hardware. | 24 | is used here to mean a driver for a specific vendor and model of NTB hardware. |
22 | 25 | ||
23 | ## NTB Client Drivers | 26 | NTB Client Drivers |
27 | ================== | ||
24 | 28 | ||
25 | NTB client drivers should register with the NTB core driver. After | 29 | NTB client drivers should register with the NTB core driver. After |
26 | registering, the client probe and remove functions will be called appropriately | 30 | registering, the client probe and remove functions will be called appropriately |
@@ -28,7 +32,8 @@ as ntb hardware, or hardware drivers, are inserted and removed. The | |||
28 | registration uses the Linux Device framework, so it should feel familiar to | 32 | registration uses the Linux Device framework, so it should feel familiar to |
29 | anyone who has written a pci driver. | 33 | anyone who has written a pci driver. |
30 | 34 | ||
31 | ### NTB Typical client driver implementation | 35 | NTB Typical client driver implementation |
36 | ---------------------------------------- | ||
32 | 37 | ||
33 | The primary purpose of NTB is to share a piece of memory between at least two | 38 | The primary purpose of NTB is to share a piece of memory between at least two |
34 | systems. So the NTB device features like Scratchpad/Message registers are | 39 | systems. So the NTB device features like Scratchpad/Message registers are |
@@ -109,7 +114,8 @@ follows: | |||
109 | It is also worth noting that the method ntb_mw_count(pidx) should return the | 114 | It is also worth noting that the method ntb_mw_count(pidx) should return the |
110 | same value as ntb_peer_mw_count() on the peer with port index pidx. | 115 | same value as ntb_peer_mw_count() on the peer with port index pidx. |
111 | 116 | ||
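In the spirit of the pseudo-code used elsewhere in these documents, a minimal client registration might look like the following sketch. The foo names are illustrative, and the probe/remove prototypes are assumptions based on the in-tree clients; see include/linux/ntb.h for the authoritative ops signatures:

```c
static int foo_probe(struct ntb_client *client, struct ntb_dev *ntb)
{
	/* discover memory windows, scratchpads and doorbells here */
	return 0;
}

static void foo_remove(struct ntb_client *client, struct ntb_dev *ntb)
{
	/* tear down anything set up in probe */
}

static struct ntb_client foo_client = {
	.ops = {
		.probe	= foo_probe,
		.remove	= foo_remove,
	},
};

/* register with the NTB core, as with any Linux device driver */
ntb_register_client(&foo_client);
/* ... and on module exit ... */
ntb_unregister_client(&foo_client);
```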
112 | ### NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev) | 117 | NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev) |
118 | ------------------------------------------------------------------ | ||
113 | 119 | ||
114 | The primary client for NTB is the Transport client, used in tandem with NTB | 120 | The primary client for NTB is the Transport client, used in tandem with NTB |
115 | Netdev. These drivers function together to create a logical link to the peer, | 121 | Netdev. These drivers function together to create a logical link to the peer, |
@@ -120,7 +126,8 @@ Transport queue pair. Network data is copied between socket buffers and the | |||
120 | Transport queue pair buffer. The Transport client may be used for other things | 126 | Transport queue pair buffer. The Transport client may be used for other things |
121 | besides Netdev; however, no other applications have yet been written. | 127 | besides Netdev; however, no other applications have yet been written. |
122 | 128 | ||
123 | ### NTB Ping Pong Test Client (ntb\_pingpong) | 129 | NTB Ping Pong Test Client (ntb\_pingpong) |
130 | ----------------------------------------- | ||
124 | 131 | ||
125 | The Ping Pong test client serves as a demonstration to exercise the doorbell | 132 | The Ping Pong test client serves as a demonstration to exercise the doorbell |
126 | and scratchpad registers of NTB hardware, and as an example simple NTB client. | 133 | and scratchpad registers of NTB hardware, and as an example simple NTB client. |
@@ -147,7 +154,8 @@ Module Parameters: | |||
147 | * dyndbg - It is suggested to specify dyndbg=+p when loading this module, and | 154 | * dyndbg - It is suggested to specify dyndbg=+p when loading this module, and |
148 | then to observe debugging output on the console. | 155 | then to observe debugging output on the console. |
149 | 156 | ||
150 | ### NTB Tool Test Client (ntb\_tool) | 157 | NTB Tool Test Client (ntb\_tool) |
158 | -------------------------------- | ||
151 | 159 | ||
152 | The Tool test client serves primarily for debugging ntb hardware and drivers. | 160 | The Tool test client serves primarily for debugging ntb hardware and drivers. |
153 | The Tool provides access through debugfs for reading, setting, and clearing the | 161 | The Tool provides access through debugfs for reading, setting, and clearing the |
@@ -157,48 +165,60 @@ The Tool does not currently have any module parameters. | |||
157 | 165 | ||
158 | Debugfs Files: | 166 | Debugfs Files: |
159 | 167 | ||
160 | * *debugfs*/ntb\_tool/*hw*/ - A directory in debugfs will be created for each | 168 | * *debugfs*/ntb\_tool/*hw*/ |
169 | A directory in debugfs will be created for each | ||
161 | NTB device probed by the tool. This directory is shortened to *hw* | 170 | NTB device probed by the tool. This directory is shortened to *hw* |
162 | below. | 171 | below. |
163 | * *hw*/db - This file is used to read, set, and clear the local doorbell. Not | 172 | * *hw*/db |
173 | This file is used to read, set, and clear the local doorbell. Not | ||
164 | all operations may be supported by all hardware. To read the doorbell, | 174 | all operations may be supported by all hardware. To read the doorbell, |
165 | read the file. To set the doorbell, write `s` followed by the bits to | 175 | read the file. To set the doorbell, write `s` followed by the bits to |
166 | set (eg: `echo 's 0x0101' > db`). To clear the doorbell, write `c` | 176 | set (eg: `echo 's 0x0101' > db`). To clear the doorbell, write `c` |
167 | followed by the bits to clear. | 177 | followed by the bits to clear. |
168 | * *hw*/mask - This file is used to read, set, and clear the local doorbell mask. | 178 | * *hw*/mask |
179 | This file is used to read, set, and clear the local doorbell mask. | ||
169 | See *db* for details. | 180 | See *db* for details. |
170 | * *hw*/peer\_db - This file is used to read, set, and clear the peer doorbell. | 181 | * *hw*/peer\_db |
182 | This file is used to read, set, and clear the peer doorbell. | ||
171 | See *db* for details. | 183 | See *db* for details. |
172 | * *hw*/peer\_mask - This file is used to read, set, and clear the peer doorbell | 184 | * *hw*/peer\_mask |
185 | This file is used to read, set, and clear the peer doorbell | ||
173 | mask. See *db* for details. | 186 | mask. See *db* for details. |
174 | * *hw*/spad - This file is used to read and write local scratchpads. To read | 187 | * *hw*/spad |
188 | This file is used to read and write local scratchpads. To read | ||
175 | the values of all scratchpads, read the file. To write values, write a | 189 | the values of all scratchpads, read the file. To write values, write a |
176 | series of pairs of scratchpad number and value | 190 | series of pairs of scratchpad number and value |
177 | (eg: `echo '4 0x123 7 0xabc' > spad` | 191 | (eg: `echo '4 0x123 7 0xabc' > spad` |
178 | # to set scratchpads `4` and `7` to `0x123` and `0xabc`, respectively). | 192 | # to set scratchpads `4` and `7` to `0x123` and `0xabc`, respectively). |
179 | * *hw*/peer\_spad - This file is used to read and write peer scratchpads. See | 193 | * *hw*/peer\_spad |
194 | This file is used to read and write peer scratchpads. See | ||
180 | *spad* for details. | 195 | *spad* for details. |
181 | 196 | ||
182 | ## NTB Hardware Drivers | 197 | NTB Hardware Drivers |
198 | ==================== | ||
183 | 199 | ||
184 | NTB hardware drivers should register devices with the NTB core driver. After | 200 | NTB hardware drivers should register devices with the NTB core driver. After |
185 | registering, clients' probe and remove functions will be called. | 201 | registering, clients' probe and remove functions will be called. |
186 | 202 | ||
187 | ### NTB Intel Hardware Driver (ntb\_hw\_intel) | 203 | NTB Intel Hardware Driver (ntb\_hw\_intel) |
204 | ------------------------------------------ | ||
188 | 205 | ||
189 | The Intel hardware driver supports NTB on Xeon and Atom CPUs. | 206 | The Intel hardware driver supports NTB on Xeon and Atom CPUs. |
190 | 207 | ||
191 | Module Parameters: | 208 | Module Parameters: |
192 | 209 | ||
193 | * b2b\_mw\_idx - If the peer ntb is to be accessed via a memory window, then use | 210 | * b2b\_mw\_idx |
211 | If the peer ntb is to be accessed via a memory window, then use | ||
194 | this memory window to access the peer ntb. A value of zero or positive | 212 | this memory window to access the peer ntb. A value of zero or positive |
195 | starts from the first mw idx, and a negative value starts from the last | 213 | starts from the first mw idx, and a negative value starts from the last |
196 | mw idx. Both sides MUST set the same value here! The default value is | 214 | mw idx. Both sides MUST set the same value here! The default value is |
197 | `-1`. | 215 | `-1`. |
198 | * b2b\_mw\_share - If the peer ntb is to be accessed via a memory window, and if | 216 | * b2b\_mw\_share |
217 | If the peer ntb is to be accessed via a memory window, and if | ||
199 | the memory window is large enough, still allow the client to use the | 218 | the memory window is large enough, still allow the client to use the |
200 | second half of the memory window for address translation to the peer. | 219 | second half of the memory window for address translation to the peer. |
201 | * xeon\_b2b\_usd\_bar2\_addr64 - If using B2B topology on Xeon hardware, use | 220 | * xeon\_b2b\_usd\_bar2\_addr64 |
221 | If using B2B topology on Xeon hardware, use | ||
202 | this 64 bit address on the bus between the NTB devices for the window | 222 | this 64 bit address on the bus between the NTB devices for the window |
203 | at BAR2, on the upstream side of the link. | 223 | at BAR2, on the upstream side of the link. |
204 | * xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_bar2\_addr64*. | 224 | * xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_bar2\_addr64*. |
diff --git a/Documentation/numastat.txt b/Documentation/numastat.txt index 520327790d54..aaf1667489f8 100644 --- a/Documentation/numastat.txt +++ b/Documentation/numastat.txt | |||
@@ -1,10 +1,12 @@ | |||
1 | 1 | =============================== | |
2 | Numa policy hit/miss statistics | 2 | Numa policy hit/miss statistics |
3 | =============================== | ||
3 | 4 | ||
4 | /sys/devices/system/node/node*/numastat | 5 | /sys/devices/system/node/node*/numastat |
5 | 6 | ||
6 | All units are pages. Hugepages have separate counters. | 7 | All units are pages. Hugepages have separate counters. |
7 | 8 | ||
9 | =============== ============================================================ | ||
8 | numa_hit A process wanted to allocate memory from this node, | 10 | numa_hit A process wanted to allocate memory from this node, |
9 | and succeeded. | 11 | and succeeded. |
10 | 12 | ||
@@ -20,6 +22,7 @@ other_node A process ran on this node and got memory from another node. | |||
20 | 22 | ||
21 | interleave_hit Interleaving wanted to allocate from this node | 23 | interleave_hit Interleaving wanted to allocate from this node |
22 | and succeeded. | 24 | and succeeded. |
25 | =============== ============================================================ | ||
23 | 26 | ||
24 | For easier reading you can use the numastat utility from the numactl package | 27 | For easier reading you can use the numastat utility from the numactl package |
25 | (http://oss.sgi.com/projects/libnuma/). Note that it only works | 28 | (http://oss.sgi.com/projects/libnuma/). Note that it only works |
diff --git a/Documentation/padata.txt b/Documentation/padata.txt index 7ddfe216a0aa..b103d0c82000 100644 --- a/Documentation/padata.txt +++ b/Documentation/padata.txt | |||
@@ -1,5 +1,8 @@ | |||
1 | ======================================= | ||
1 | The padata parallel execution mechanism | 2 | The padata parallel execution mechanism |
2 | Last updated for 2.6.36 | 3 | ======================================= |
4 | |||
5 | :Last updated: for 2.6.36 | ||
3 | 6 | ||
4 | Padata is a mechanism by which the kernel can farm work out to be done in | 7 | Padata is a mechanism by which the kernel can farm work out to be done in |
5 | parallel on multiple CPUs while retaining the ordering of tasks. It was | 8 | parallel on multiple CPUs while retaining the ordering of tasks. It was |
@@ -9,7 +12,7 @@ those packets. The crypto developers made a point of writing padata in a | |||
9 | sufficiently general fashion that it could be put to other uses as well. | 12 | sufficiently general fashion that it could be put to other uses as well. |
10 | 13 | ||
11 | The first step in using padata is to set up a padata_instance structure for | 14 | The first step in using padata is to set up a padata_instance structure for |
12 | overall control of how tasks are to be run: | 15 | overall control of how tasks are to be run:: |
13 | 16 | ||
14 | #include <linux/padata.h> | 17 | #include <linux/padata.h> |
15 | 18 | ||
@@ -24,7 +27,7 @@ The workqueue wq is where the work will actually be done; it should be | |||
24 | a multithreaded queue, naturally. | 27 | a multithreaded queue, naturally. |
25 | 28 | ||
26 | To allocate a padata instance with the cpu_possible_mask for both | 29 | To allocate a padata instance with the cpu_possible_mask for both |
27 | cpumasks this helper function can be used: | 30 | cpumasks this helper function can be used:: |
28 | 31 | ||
29 | struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq); | 32 | struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq); |
30 | 33 | ||
@@ -36,7 +39,7 @@ it is legal to supply a cpumask to padata that contains offline CPUs. | |||
36 | Once an offline CPU in the user supplied cpumask comes online, padata | 39 | Once an offline CPU in the user supplied cpumask comes online, padata |
37 | is going to use it. | 40 | is going to use it. |
38 | 41 | ||
39 | There are functions for enabling and disabling the instance: | 42 | There are functions for enabling and disabling the instance:: |
40 | 43 | ||
41 | int padata_start(struct padata_instance *pinst); | 44 | int padata_start(struct padata_instance *pinst); |
42 | void padata_stop(struct padata_instance *pinst); | 45 | void padata_stop(struct padata_instance *pinst); |
@@ -48,7 +51,7 @@ padata cpumask contains no active CPU (flag not set). | |||
48 | padata_stop clears the flag and blocks until the padata instance | 51 | padata_stop clears the flag and blocks until the padata instance |
49 | is unused. | 52 | is unused. |
50 | 53 | ||
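Putting the allocation and start/stop calls together, a user's init and exit paths might look like this pseudo-code sketch; the workqueue name and error handling are illustrative:

```c
static struct workqueue_struct *wq;
static struct padata_instance *pinst;

static int __init my_init(void)
{
	/* padata wants a multithreaded workqueue */
	wq = alloc_workqueue("my_padata", WQ_UNBOUND, 0);
	if (!wq)
		return -ENOMEM;

	pinst = padata_alloc_possible(wq);
	if (!pinst) {
		destroy_workqueue(wq);
		return -ENOMEM;
	}
	return padata_start(pinst);
}

static void __exit my_exit(void)
{
	padata_stop(pinst);	/* blocks until the instance is unused */
	padata_free(pinst);
	destroy_workqueue(wq);
}
```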
51 | The list of CPUs to be used can be adjusted with these functions: | 54 | The list of CPUs to be used can be adjusted with these functions:: |
52 | 55 | ||
53 | int padata_set_cpumasks(struct padata_instance *pinst, | 56 | int padata_set_cpumasks(struct padata_instance *pinst, |
54 | cpumask_var_t pcpumask, | 57 | cpumask_var_t pcpumask, |
@@ -71,12 +74,12 @@ padata_add_cpu/padata_remove_cpu are used. cpu specifies the CPU to add or | |||
71 | remove and mask is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL. | 74 | remove and mask is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL. |
72 | 75 | ||
73 | If a user is interested in padata cpumask changes, they can register with | 76 | If a user is interested in padata cpumask changes, they can register with |
74 | the padata cpumask change notifier: | 77 | the padata cpumask change notifier:: |
75 | 78 | ||
76 | int padata_register_cpumask_notifier(struct padata_instance *pinst, | 79 | int padata_register_cpumask_notifier(struct padata_instance *pinst, |
77 | struct notifier_block *nblock); | 80 | struct notifier_block *nblock); |
78 | 81 | ||
79 | To unregister from that notifier: | 82 | To unregister from that notifier:: |
80 | 83 | ||
81 | int padata_unregister_cpumask_notifier(struct padata_instance *pinst, | 84 | int padata_unregister_cpumask_notifier(struct padata_instance *pinst, |
82 | struct notifier_block *nblock); | 85 | struct notifier_block *nblock); |
@@ -84,7 +87,7 @@ To unregister from that notifier: | |||
84 | The padata cpumask change notifier notifies about changes of the usable | 87 | The padata cpumask change notifier notifies about changes of the usable |
85 | cpumasks, i.e. the subset of active CPUs in the user supplied cpumask. | 88 | cpumasks, i.e. the subset of active CPUs in the user supplied cpumask. |
86 | 89 | ||
87 | Padata calls the notifier chain with: | 90 | Padata calls the notifier chain with:: |
88 | 91 | ||
89 | blocking_notifier_call_chain(&pinst->cpumask_change_notifier, | 92 | blocking_notifier_call_chain(&pinst->cpumask_change_notifier, |
90 | notification_mask, | 93 | notification_mask, |
@@ -95,7 +98,7 @@ is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL and cpumask is a pointer | |||
95 | to a struct padata_cpumask that contains the new cpumask information. | 98 | to a struct padata_cpumask that contains the new cpumask information. |
96 | 99 | ||
97 | Actually submitting work to the padata instance requires the creation of a | 100 | Actually submitting work to the padata instance requires the creation of a |
98 | padata_priv structure: | 101 | padata_priv structure:: |
99 | 102 | ||
100 | struct padata_priv { | 103 | struct padata_priv { |
101 | /* Other stuff here... */ | 104 | /* Other stuff here... */ |
@@ -110,7 +113,7 @@ parallel() and serial() functions should be provided. Those functions will | |||
110 | be called in the process of getting the work done as we will see | 113 | be called in the process of getting the work done as we will see |
111 | momentarily. | 114 | momentarily. |
112 | 115 | ||
113 | The submission of work is done with: | 116 | The submission of work is done with:: |
114 | 117 | ||
115 | int padata_do_parallel(struct padata_instance *pinst, | 118 | int padata_do_parallel(struct padata_instance *pinst, |
116 | struct padata_priv *padata, int cb_cpu); | 119 | struct padata_priv *padata, int cb_cpu); |
@@ -138,7 +141,7 @@ need not be completed during this call, but, if parallel() leaves work | |||
138 | outstanding, it should be prepared to be called again with a new job before | 141 | outstanding, it should be prepared to be called again with a new job before |
139 | the previous one completes. When a task does complete, parallel() (or | 142 | the previous one completes. When a task does complete, parallel() (or |
140 | whatever function actually finishes the job) should inform padata of the | 143 | whatever function actually finishes the job) should inform padata of the |
141 | fact with a call to: | 144 | fact with a call to:: |
142 | 145 | ||
143 | void padata_do_serial(struct padata_priv *padata); | 146 | void padata_do_serial(struct padata_priv *padata); |
144 | 147 | ||
@@ -151,7 +154,7 @@ pains to ensure that tasks are completed in the order in which they were | |||
151 | submitted. | 154 | submitted. |
152 | 155 | ||
153 | The one remaining function in the padata API should be called to clean up | 156 | The one remaining function in the padata API should be called to clean up |
154 | when a padata instance is no longer needed: | 157 | when a padata instance is no longer needed:: |
155 | 158 | ||
156 | void padata_free(struct padata_instance *pinst); | 159 | void padata_free(struct padata_instance *pinst); |
157 | 160 | ||
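The padata workflow described above (embed a padata_priv, provide parallel() and serial(), submit with padata_do_parallel(), and complete with padata_do_serial()) can be sketched end to end. This is an illustrative outline only; the embedding structure and helper names (`my_request`, `my_submit`, etc.) are hypothetical, and a real caller must pick a cb_cpu that is in the instance's serial cpumask:

```c
#include <linux/padata.h>

struct my_request {
	struct padata_priv padata;	/* must be embedded, as shown above */
	/* request-specific data ... */
};

static void my_parallel(struct padata_priv *padata)
{
	struct my_request *req =
		container_of(padata, struct my_request, padata);

	/* ... do the parallelizable part of the job on req ... */

	/* Inform padata that this job is done; serialization follows. */
	padata_do_serial(padata);
}

static void my_serial(struct padata_priv *padata)
{
	/* Runs on the chosen callback CPU, in submission order. */
}

static int my_submit(struct padata_instance *pinst,
		     struct my_request *req, int cb_cpu)
{
	req->padata.parallel = my_parallel;
	req->padata.serial = my_serial;
	/* cb_cpu must be an active CPU in the serial cpumask. */
	return padata_do_parallel(pinst, &req->padata, cb_cpu);
}
```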
diff --git a/Documentation/parport-lowlevel.txt b/Documentation/parport-lowlevel.txt index 120eb20dbb09..0633d70ffda7 100644 --- a/Documentation/parport-lowlevel.txt +++ b/Documentation/parport-lowlevel.txt | |||
@@ -1,11 +1,12 @@ | |||
1 | =============================== | ||
1 | PARPORT interface documentation | 2 | PARPORT interface documentation |
2 | ------------------------------- | 3 | =============================== |
3 | 4 | ||
4 | Time-stamp: <2000-02-24 13:30:20 twaugh> | 5 | :Time-stamp: <2000-02-24 13:30:20 twaugh> |
5 | 6 | ||
6 | Described here are the following functions: | 7 | Described here are the following functions: |
7 | 8 | ||
8 | Global functions: | 9 | Global functions:: |
9 | parport_register_driver | 10 | parport_register_driver |
10 | parport_unregister_driver | 11 | parport_unregister_driver |
11 | parport_enumerate | 12 | parport_enumerate |
@@ -31,7 +32,8 @@ Global functions: | |||
31 | parport_set_timeout | 32 | parport_set_timeout |
32 | 33 | ||
33 | Port functions (can be overridden by low-level drivers): | 34 | Port functions (can be overridden by low-level drivers): |
34 | SPP: | 35 | |
36 | SPP:: | ||
35 | port->ops->read_data | 37 | port->ops->read_data |
36 | port->ops->write_data | 38 | port->ops->write_data |
37 | port->ops->read_status | 39 | port->ops->read_status |
@@ -43,23 +45,23 @@ Port functions (can be overridden by low-level drivers): | |||
43 | port->ops->data_forward | 45 | port->ops->data_forward |
44 | port->ops->data_reverse | 46 | port->ops->data_reverse |
45 | 47 | ||
46 | EPP: | 48 | EPP:: |
47 | port->ops->epp_write_data | 49 | port->ops->epp_write_data |
48 | port->ops->epp_read_data | 50 | port->ops->epp_read_data |
49 | port->ops->epp_write_addr | 51 | port->ops->epp_write_addr |
50 | port->ops->epp_read_addr | 52 | port->ops->epp_read_addr |
51 | 53 | ||
52 | ECP: | 54 | ECP:: |
53 | port->ops->ecp_write_data | 55 | port->ops->ecp_write_data |
54 | port->ops->ecp_read_data | 56 | port->ops->ecp_read_data |
55 | port->ops->ecp_write_addr | 57 | port->ops->ecp_write_addr |
56 | 58 | ||
57 | Other: | 59 | Other:: |
58 | port->ops->nibble_read_data | 60 | port->ops->nibble_read_data |
59 | port->ops->byte_read_data | 61 | port->ops->byte_read_data |
60 | port->ops->compat_write_data | 62 | port->ops->compat_write_data |
61 | 63 | ||
62 | The parport subsystem comprises 'parport' (the core port-sharing | 64 | The parport subsystem comprises ``parport`` (the core port-sharing |
63 | code), and a variety of low-level drivers that actually do the port | 65 | code), and a variety of low-level drivers that actually do the port |
64 | accesses. Each low-level driver handles a particular style of port | 66 | accesses. Each low-level driver handles a particular style of port |
65 | (PC, Amiga, and so on). | 67 | (PC, Amiga, and so on). |
@@ -70,14 +72,14 @@ into global functions and port functions. | |||
70 | The global functions are mostly for communicating between the device | 72 | The global functions are mostly for communicating between the device |
71 | driver and the parport subsystem: acquiring a list of available ports, | 73 | driver and the parport subsystem: acquiring a list of available ports, |
72 | claiming a port for exclusive use, and so on. They also include | 74 | claiming a port for exclusive use, and so on. They also include |
73 | 'generic' functions for doing standard things that will work on any | 75 | ``generic`` functions for doing standard things that will work on any |
74 | IEEE 1284-capable architecture. | 76 | IEEE 1284-capable architecture. |
75 | 77 | ||
76 | The port functions are provided by the low-level drivers, although the | 78 | The port functions are provided by the low-level drivers, although the |
77 | core parport module provides generic 'defaults' for some routines. | 79 | core parport module provides generic ``defaults`` for some routines. |
78 | The port functions can be split into three groups: SPP, EPP, and ECP. | 80 | The port functions can be split into three groups: SPP, EPP, and ECP. |
79 | 81 | ||
80 | SPP (Standard Parallel Port) functions modify so-called 'SPP' | 82 | SPP (Standard Parallel Port) functions modify so-called ``SPP`` |
81 | registers: data, status, and control. The hardware may not actually | 83 | registers: data, status, and control. The hardware may not actually |
82 | have registers exactly like that, but the PC does and this interface is | 84 | have registers exactly like that, but the PC does and this interface is |
83 | modelled after common PC implementations. Other low-level drivers may | 85 | modelled after common PC implementations. Other low-level drivers may |
@@ -95,58 +97,63 @@ to cope with peripherals that only tenuously support IEEE 1284, a | |||
95 | low-level driver specific function is provided, for altering 'fudge | 97 | low-level driver specific function is provided, for altering 'fudge |
96 | factors'. | 98 | factors'. |
97 | 99 | ||
98 | GLOBAL FUNCTIONS | 100 | Global functions |
99 | ---------------- | 101 | ================ |
100 | 102 | ||
101 | parport_register_driver - register a device driver with parport | 103 | parport_register_driver - register a device driver with parport |
102 | ----------------------- | 104 | --------------------------------------------------------------- |
103 | 105 | ||
104 | SYNOPSIS | 106 | SYNOPSIS |
107 | ^^^^^^^^ | ||
108 | |||
109 | :: | ||
105 | 110 | ||
106 | #include <linux/parport.h> | 111 | #include <linux/parport.h> |
107 | 112 | ||
108 | struct parport_driver { | 113 | struct parport_driver { |
109 | const char *name; | 114 | const char *name; |
110 | void (*attach) (struct parport *); | 115 | void (*attach) (struct parport *); |
111 | void (*detach) (struct parport *); | 116 | void (*detach) (struct parport *); |
112 | struct parport_driver *next; | 117 | struct parport_driver *next; |
113 | }; | 118 | }; |
114 | int parport_register_driver (struct parport_driver *driver); | 119 | int parport_register_driver (struct parport_driver *driver); |
115 | 120 | ||
116 | DESCRIPTION | 121 | DESCRIPTION |
122 | ^^^^^^^^^^^ | ||
117 | 123 | ||
118 | In order to be notified about parallel ports when they are detected, | 124 | In order to be notified about parallel ports when they are detected, |
119 | parport_register_driver should be called. Your driver will | 125 | parport_register_driver should be called. Your driver will |
120 | immediately be notified of all ports that have already been detected, | 126 | immediately be notified of all ports that have already been detected, |
121 | and of each new port as low-level drivers are loaded. | 127 | and of each new port as low-level drivers are loaded. |
122 | 128 | ||
123 | A 'struct parport_driver' contains the textual name of your driver, | 129 | A ``struct parport_driver`` contains the textual name of your driver, |
124 | a pointer to a function to handle new ports, and a pointer to a | 130 | a pointer to a function to handle new ports, and a pointer to a |
125 | function to handle ports going away due to a low-level driver | 131 | function to handle ports going away due to a low-level driver |
126 | unloading. Ports will only be detached if they are not being used | 132 | unloading. Ports will only be detached if they are not being used |
127 | (i.e. there are no devices registered on them). | 133 | (i.e. there are no devices registered on them). |
128 | 134 | ||
129 | The visible parts of the 'struct parport *' argument given to | 135 | The visible parts of the ``struct parport *`` argument given to |
130 | attach/detach are: | 136 | attach/detach are:: |
131 | 137 | ||
132 | struct parport | 138 | struct parport |
133 | { | 139 | { |
134 | struct parport *next; /* next parport in list */ | 140 | struct parport *next; /* next parport in list */ |
135 | const char *name; /* port's name */ | 141 | const char *name; /* port's name */ |
136 | unsigned int modes; /* bitfield of hardware modes */ | 142 | unsigned int modes; /* bitfield of hardware modes */ |
137 | struct parport_device_info probe_info; | 143 | struct parport_device_info probe_info; |
138 | /* IEEE1284 info */ | 144 | /* IEEE1284 info */ |
139 | int number; /* parport index */ | 145 | int number; /* parport index */ |
140 | struct parport_operations *ops; | 146 | struct parport_operations *ops; |
141 | ... | 147 | ... |
142 | }; | 148 | }; |
143 | 149 | ||
144 | There are other members of the structure, but they should not be | 150 | There are other members of the structure, but they should not be |
145 | touched. | 151 | touched. |
146 | 152 | ||
147 | The 'modes' member summarises the capabilities of the underlying | 153 | The ``modes`` member summarises the capabilities of the underlying |
148 | hardware. It consists of flags which may be bitwise-ored together: | 154 | hardware. It consists of flags which may be bitwise-ored together: |
149 | 155 | ||
156 | ============================= =============================================== | ||
150 | PARPORT_MODE_PCSPP IBM PC registers are available, | 157 | PARPORT_MODE_PCSPP IBM PC registers are available, |
151 | i.e. functions that act on data, | 158 | i.e. functions that act on data, |
152 | control and status registers are | 159 | control and status registers are |
@@ -169,297 +176,351 @@ hardware. It consists of flags which may be bitwise-ored together: | |||
169 | GFP_DMA flag with kmalloc) to the | 176 | GFP_DMA flag with kmalloc) to the |
170 | low-level driver in order to take | 177 | low-level driver in order to take |
171 | advantage of it. | 178 | advantage of it. |
179 | ============================= =============================================== | ||
172 | 180 | ||
173 | There may be other flags in 'modes' as well. | 181 | There may be other flags in ``modes`` as well. |
174 | 182 | ||
175 | The contents of 'modes' are advisory only. For example, if the | 183 | The contents of ``modes`` are advisory only. For example, if the |
176 | hardware is capable of DMA, and PARPORT_MODE_DMA is in 'modes', it | 184 | hardware is capable of DMA, and PARPORT_MODE_DMA is in ``modes``, it |
177 | doesn't necessarily mean that DMA will always be used when possible. | 185 | doesn't necessarily mean that DMA will always be used when possible. |
178 | Similarly, hardware that is capable of assisting ECP transfers won't | 186 | Similarly, hardware that is capable of assisting ECP transfers won't |
179 | necessarily be used. | 187 | necessarily be used. |
180 | 188 | ||
181 | RETURN VALUE | 189 | RETURN VALUE |
190 | ^^^^^^^^^^^^ | ||
182 | 191 | ||
183 | Zero on success, otherwise an error code. | 192 | Zero on success, otherwise an error code. |
184 | 193 | ||
185 | ERRORS | 194 | ERRORS |
195 | ^^^^^^ | ||
186 | 196 | ||
187 | None. (Can it fail? Why return int?) | 197 | None. (Can it fail? Why return int?) |
188 | 198 | ||
189 | EXAMPLE | 199 | EXAMPLE |
200 | ^^^^^^^ | ||
190 | 201 | ||
191 | static void lp_attach (struct parport *port) | 202 | :: |
192 | { | ||
193 | ... | ||
194 | private = kmalloc (...); | ||
195 | dev[count++] = parport_register_device (...); | ||
196 | ... | ||
197 | } | ||
198 | 203 | ||
199 | static void lp_detach (struct parport *port) | 204 | static void lp_attach (struct parport *port) |
200 | { | 205 | { |
201 | ... | 206 | ... |
202 | } | 207 | private = kmalloc (...); |
208 | dev[count++] = parport_register_device (...); | ||
209 | ... | ||
210 | } | ||
203 | 211 | ||
204 | static struct parport_driver lp_driver = { | 212 | static void lp_detach (struct parport *port) |
205 | "lp", | 213 | { |
206 | lp_attach, | 214 | ... |
207 | lp_detach, | 215 | } |
208 | NULL /* always put NULL here */ | ||
209 | }; | ||
210 | 216 | ||
211 | int lp_init (void) | 217 | static struct parport_driver lp_driver = { |
212 | { | 218 | "lp", |
213 | ... | 219 | lp_attach, |
214 | if (parport_register_driver (&lp_driver)) { | 220 | lp_detach, |
215 | /* Failed; nothing we can do. */ | 221 | NULL /* always put NULL here */ |
216 | return -EIO; | 222 | }; |
223 | |||
224 | int lp_init (void) | ||
225 | { | ||
226 | ... | ||
227 | if (parport_register_driver (&lp_driver)) { | ||
228 | /* Failed; nothing we can do. */ | ||
229 | return -EIO; | ||
230 | } | ||
231 | ... | ||
217 | } | 232 | } |
218 | ... | 233 | |
219 | } | ||
220 | 234 | ||
221 | SEE ALSO | 235 | SEE ALSO |
236 | ^^^^^^^^ | ||
222 | 237 | ||
223 | parport_unregister_driver, parport_register_device, parport_enumerate | 238 | parport_unregister_driver, parport_register_device, parport_enumerate |
224 | 239 | ||
240 | |||
241 | |||
225 | parport_unregister_driver - tell parport to forget about this driver | 242 | parport_unregister_driver - tell parport to forget about this driver |
226 | ------------------------- | 243 | -------------------------------------------------------------------- |
227 | 244 | ||
228 | SYNOPSIS | 245 | SYNOPSIS |
246 | ^^^^^^^^ | ||
229 | 247 | ||
230 | #include <linux/parport.h> | 248 | :: |
231 | 249 | ||
232 | struct parport_driver { | 250 | #include <linux/parport.h> |
233 | const char *name; | 251 | |
234 | void (*attach) (struct parport *); | 252 | struct parport_driver { |
235 | void (*detach) (struct parport *); | 253 | const char *name; |
236 | struct parport_driver *next; | 254 | void (*attach) (struct parport *); |
237 | }; | 255 | void (*detach) (struct parport *); |
238 | void parport_unregister_driver (struct parport_driver *driver); | 256 | struct parport_driver *next; |
257 | }; | ||
258 | void parport_unregister_driver (struct parport_driver *driver); | ||
239 | 259 | ||
240 | DESCRIPTION | 260 | DESCRIPTION |
261 | ^^^^^^^^^^^ | ||
241 | 262 | ||
242 | This tells parport not to notify the device driver of new ports or of | 263 | This tells parport not to notify the device driver of new ports or of |
243 | ports going away. Registered devices belonging to that driver are NOT | 264 | ports going away. Registered devices belonging to that driver are NOT |
244 | unregistered: parport_unregister_device must be used for each one. | 265 | unregistered: parport_unregister_device must be used for each one. |
245 | 266 | ||
246 | EXAMPLE | 267 | EXAMPLE |
268 | ^^^^^^^ | ||
247 | 269 | ||
248 | void cleanup_module (void) | 270 | :: |
249 | { | ||
250 | ... | ||
251 | /* Stop notifications. */ | ||
252 | parport_unregister_driver (&lp_driver); | ||
253 | 271 | ||
254 | /* Unregister devices. */ | 272 | void cleanup_module (void) |
255 | for (i = 0; i < NUM_DEVS; i++) | 273 | { |
256 | parport_unregister_device (dev[i]); | 274 | ... |
257 | ... | 275 | /* Stop notifications. */ |
258 | } | 276 | parport_unregister_driver (&lp_driver); |
277 | |||
278 | /* Unregister devices. */ | ||
279 | for (i = 0; i < NUM_DEVS; i++) | ||
280 | parport_unregister_device (dev[i]); | ||
281 | ... | ||
282 | } | ||
259 | 283 | ||
260 | SEE ALSO | 284 | SEE ALSO |
285 | ^^^^^^^^ | ||
261 | 286 | ||
262 | parport_register_driver, parport_enumerate | 287 | parport_register_driver, parport_enumerate |
263 | 288 | ||
289 | |||
290 | |||
264 | parport_enumerate - retrieve a list of parallel ports (DEPRECATED) | 291 | parport_enumerate - retrieve a list of parallel ports (DEPRECATED) |
265 | ----------------- | 292 | ------------------------------------------------------------------ |
266 | 293 | ||
267 | SYNOPSIS | 294 | SYNOPSIS |
295 | ^^^^^^^^ | ||
268 | 296 | ||
269 | #include <linux/parport.h> | 297 | :: |
270 | 298 | ||
271 | struct parport *parport_enumerate (void); | 299 | #include <linux/parport.h> |
300 | |||
301 | struct parport *parport_enumerate (void); | ||
272 | 302 | ||
273 | DESCRIPTION | 303 | DESCRIPTION |
304 | ^^^^^^^^^^^ | ||
274 | 305 | ||
275 | Retrieve the first of a list of valid parallel ports for this machine. | 306 | Retrieve the first of a list of valid parallel ports for this machine. |
276 | Successive parallel ports can be found using the 'struct parport | 307 | Successive parallel ports can be found using the ``struct parport |
277 | *next' element of the 'struct parport *' that is returned. If 'next' | 308 | *next`` element of the ``struct parport *`` that is returned. If ``next`` |
278 | is NULL, there are no more parallel ports in the list. The number of | 309 | is NULL, there are no more parallel ports in the list. The number of |
279 | ports in the list will not exceed PARPORT_MAX. | 310 | ports in the list will not exceed PARPORT_MAX. |
280 | 311 | ||
281 | RETURN VALUE | 312 | RETURN VALUE |
313 | ^^^^^^^^^^^^ | ||
282 | 314 | ||
283 | A 'struct parport *' describing a valid parallel port for the machine, | 315 | A ``struct parport *`` describing a valid parallel port for the machine, |
284 | or NULL if there are none. | 316 | or NULL if there are none. |
285 | 317 | ||
286 | ERRORS | 318 | ERRORS |
319 | ^^^^^^ | ||
287 | 320 | ||
288 | This function can return NULL to indicate that there are no parallel | 321 | This function can return NULL to indicate that there are no parallel |
289 | ports to use. | 322 | ports to use. |
290 | 323 | ||
291 | EXAMPLE | 324 | EXAMPLE |
325 | ^^^^^^^ | ||
326 | |||
327 | :: | ||
292 | 328 | ||
293 | int detect_device (void) | 329 | int detect_device (void) |
294 | { | 330 | { |
295 | struct parport *port; | 331 | struct parport *port; |
332 | |||
333 | for (port = parport_enumerate (); | ||
334 | port != NULL; | ||
335 | port = port->next) { | ||
336 | /* Try to detect a device on the port... */ | ||
337 | ... | ||
338 | } | ||
339 | } | ||
296 | 340 | ||
297 | for (port = parport_enumerate (); | ||
298 | port != NULL; | ||
299 | port = port->next) { | ||
300 | /* Try to detect a device on the port... */ | ||
301 | ... | 341 | ... |
302 | } | ||
303 | } | 342 | } |
304 | 343 | ||
305 | ... | ||
306 | } | ||
307 | |||
308 | NOTES | 344 | NOTES |
345 | ^^^^^ | ||
309 | 346 | ||
310 | parport_enumerate is deprecated; parport_register_driver should be | 347 | parport_enumerate is deprecated; parport_register_driver should be |
311 | used instead. | 348 | used instead. |
312 | 349 | ||
313 | SEE ALSO | 350 | SEE ALSO |
351 | ^^^^^^^^ | ||
314 | 352 | ||
315 | parport_register_driver, parport_unregister_driver | 353 | parport_register_driver, parport_unregister_driver |
316 | 354 | ||
355 | |||
356 | |||
317 | parport_register_device - register to use a port | 357 | parport_register_device - register to use a port |
318 | ----------------------- | 358 | ------------------------------------------------ |
319 | 359 | ||
320 | SYNOPSIS | 360 | SYNOPSIS |
361 | ^^^^^^^^ | ||
321 | 362 | ||
322 | #include <linux/parport.h> | 363 | :: |
323 | 364 | ||
324 | typedef int (*preempt_func) (void *handle); | 365 | #include <linux/parport.h> |
325 | typedef void (*wakeup_func) (void *handle); | ||
326 | typedef int (*irq_func) (int irq, void *handle, struct pt_regs *); | ||
327 | 366 | ||
328 | struct pardevice *parport_register_device(struct parport *port, | 367 | typedef int (*preempt_func) (void *handle); |
329 | const char *name, | 368 | typedef void (*wakeup_func) (void *handle); |
330 | preempt_func preempt, | 369 | typedef int (*irq_func) (int irq, void *handle, struct pt_regs *); |
331 | wakeup_func wakeup, | 370 | |
332 | irq_func irq, | 371 | struct pardevice *parport_register_device(struct parport *port, |
333 | int flags, | 372 | const char *name, |
334 | void *handle); | 373 | preempt_func preempt, |
374 | wakeup_func wakeup, | ||
375 | irq_func irq, | ||
376 | int flags, | ||
377 | void *handle); | ||
335 | 378 | ||
336 | DESCRIPTION | 379 | DESCRIPTION |
380 | ^^^^^^^^^^^ | ||
337 | 381 | ||
338 | Use this function to register your device driver on a parallel port | 382 | Use this function to register your device driver on a parallel port |
339 | ('port'). Once you have done that, you will be able to use | 383 | (``port``). Once you have done that, you will be able to use |
340 | parport_claim and parport_release in order to use the port. | 384 | parport_claim and parport_release in order to use the port. |
341 | 385 | ||
342 | The ('name') argument is the name of the device that appears in /proc | 386 | The (``name``) argument is the name of the device that appears in /proc |
343 | filesystem. The string must be valid for the whole lifetime of the | 387 | filesystem. The string must be valid for the whole lifetime of the |
344 | device (until parport_unregister_device is called). | 388 | device (until parport_unregister_device is called). |
345 | 389 | ||
346 | This function will register three callbacks into your driver: | 390 | This function will register three callbacks into your driver: |
347 | 'preempt', 'wakeup' and 'irq'. Each of these may be NULL in order to | 391 | ``preempt``, ``wakeup`` and ``irq``. Each of these may be NULL in order to |
348 | indicate that you do not want a callback. | 392 | indicate that you do not want a callback. |
349 | 393 | ||
350 | When the 'preempt' function is called, it is because another driver | 394 | When the ``preempt`` function is called, it is because another driver |
351 | wishes to use the parallel port. The 'preempt' function should return | 395 | wishes to use the parallel port. The ``preempt`` function should return |
352 | non-zero if the parallel port cannot be released yet -- if zero is | 396 | non-zero if the parallel port cannot be released yet -- if zero is |
353 | returned, the port is lost to another driver and the port must be | 397 | returned, the port is lost to another driver and the port must be |
354 | re-claimed before use. | 398 | re-claimed before use. |
355 | 399 | ||
356 | The 'wakeup' function is called once another driver has released the | 400 | The ``wakeup`` function is called once another driver has released the |
357 | port and no other driver has yet claimed it. You can claim the | 401 | port and no other driver has yet claimed it. You can claim the |
358 | parallel port from within the 'wakeup' function (in which case the | 402 | parallel port from within the ``wakeup`` function (in which case the |
359 | claim is guaranteed to succeed), or choose not to if you don't need it | 403 | claim is guaranteed to succeed), or choose not to if you don't need it |
360 | now. | 404 | now. |
361 | 405 | ||
362 | If an interrupt occurs on the parallel port your driver has claimed, | 406 | If an interrupt occurs on the parallel port your driver has claimed, |
363 | the 'irq' function will be called. (Write something about shared | 407 | the ``irq`` function will be called. (Write something about shared |
364 | interrupts here.) | 408 | interrupts here.) |
365 | 409 | ||
366 | The 'handle' is a pointer to driver-specific data, and is passed to | 410 | The ``handle`` is a pointer to driver-specific data, and is passed to |
367 | the callback functions. | 411 | the callback functions. |
368 | 412 | ||
369 | 'flags' may be a bitwise combination of the following flags: | 413 | ``flags`` may be a bitwise combination of the following flags: |
370 | 414 | ||
415 | ===================== ================================================= | ||
371 | Flag Meaning | 416 | Flag Meaning |
417 | ===================== ================================================= | ||
372 | PARPORT_DEV_EXCL The device cannot share the parallel port at all. | 418 | PARPORT_DEV_EXCL The device cannot share the parallel port at all. |
373 | Use this only when absolutely necessary. | 419 | Use this only when absolutely necessary. |
420 | ===================== ================================================= | ||
374 | 421 | ||
375 | The typedefs are not actually defined -- they are only shown in order | 422 | The typedefs are not actually defined -- they are only shown in order |
376 | to make the function prototype more readable. | 423 | to make the function prototype more readable. |
377 | 424 | ||
378 | The visible parts of the returned 'struct pardevice' are: | 425 | The visible parts of the returned ``struct pardevice`` are:: |
379 | 426 | ||
380 | struct pardevice { | 427 | struct pardevice { |
381 | struct parport *port; /* Associated port */ | 428 | struct parport *port; /* Associated port */ |
382 | void *private; /* Device driver's 'handle' */ | 429 | void *private; /* Device driver's 'handle' */ |
383 | ... | 430 | ... |
384 | }; | 431 | }; |
385 | 432 | ||
386 | RETURN VALUE | 433 | RETURN VALUE |
434 | ^^^^^^^^^^^^ | ||
387 | 435 | ||
388 | A 'struct pardevice *': a handle to the registered parallel port | 436 | A ``struct pardevice *``: a handle to the registered parallel port |
389 | device that can be used for parport_claim, parport_release, etc. | 437 | device that can be used for parport_claim, parport_release, etc. |
390 | 438 | ||
391 | ERRORS | 439 | ERRORS |
440 | ^^^^^^ | ||
392 | 441 | ||
393 | A return value of NULL indicates that there was a problem registering | 442 | A return value of NULL indicates that there was a problem registering |
394 | a device on that port. | 443 | a device on that port. |
395 | 444 | ||
396 | EXAMPLE | 445 | EXAMPLE |
446 | ^^^^^^^ | ||
447 | |||
448 | :: | ||
449 | |||
450 | static int preempt (void *handle) | ||
451 | { | ||
452 | if (busy_right_now) | ||
453 | return 1; | ||
454 | |||
455 | must_reclaim_port = 1; | ||
456 | return 0; | ||
457 | } | ||
458 | |||
459 | static void wakeup (void *handle) | ||
460 | { | ||
461 | struct toaster *private = handle; | ||
462 | struct pardevice *dev = private->dev; | ||
463 | if (!dev) return; /* avoid races */ | ||
464 | |||
465 | if (want_port) | ||
466 | parport_claim (dev); | ||
467 | } | ||
468 | |||
469 | static int toaster_detect (struct toaster *private, struct parport *port) | ||
470 | { | ||
471 | private->dev = parport_register_device (port, "toaster", preempt, | ||
472 | wakeup, NULL, 0, | ||
473 | private); | ||
474 | if (!private->dev) | ||
475 | /* Couldn't register with parport. */ | ||
476 | return -EIO; | ||
397 | 477 | ||
398 | static int preempt (void *handle) | ||
399 | { | ||
400 | if (busy_right_now) | ||
401 | return 1; | ||
402 | |||
403 | must_reclaim_port = 1; | ||
404 | return 0; | ||
405 | } | ||
406 | |||
407 | static void wakeup (void *handle) | ||
408 | { | ||
409 | struct toaster *private = handle; | ||
410 | struct pardevice *dev = private->dev; | ||
411 | if (!dev) return; /* avoid races */ | ||
412 | |||
413 | if (want_port) | ||
414 | parport_claim (dev); | ||
415 | } | ||
416 | |||
417 | static int toaster_detect (struct toaster *private, struct parport *port) | ||
418 | { | ||
419 | private->dev = parport_register_device (port, "toaster", preempt, | ||
420 | wakeup, NULL, 0, | ||
421 | private); | ||
422 | if (!private->dev) | ||
423 | /* Couldn't register with parport. */ | ||
424 | return -EIO; | ||
425 | |||
426 | must_reclaim_port = 0; | ||
427 | busy_right_now = 1; | ||
428 | parport_claim_or_block (private->dev); | ||
429 | ... | ||
430 | /* Don't need the port while the toaster warms up. */ | ||
431 | busy_right_now = 0; | ||
432 | ... | ||
433 | busy_right_now = 1; | ||
434 | if (must_reclaim_port) { | ||
435 | parport_claim_or_block (private->dev); | ||
436 | must_reclaim_port = 0; | 478 | must_reclaim_port = 0; |
479 | busy_right_now = 1; | ||
480 | parport_claim_or_block (private->dev); | ||
481 | ... | ||
482 | /* Don't need the port while the toaster warms up. */ | ||
483 | busy_right_now = 0; | ||
484 | ... | ||
485 | busy_right_now = 1; | ||
486 | if (must_reclaim_port) { | ||
487 | parport_claim_or_block (private->dev); | ||
488 | must_reclaim_port = 0; | ||
489 | } | ||
490 | ... | ||
437 | } | 491 | } |
438 | ... | ||
439 | } | ||
440 | 492 | ||
441 | SEE ALSO | 493 | SEE ALSO |
494 | ^^^^^^^^ | ||
442 | 495 | ||
443 | parport_unregister_device, parport_claim | 496 | parport_unregister_device, parport_claim |
497 | |||
498 | |||
444 | 499 | ||
445 | parport_unregister_device - finish using a port | 500 | parport_unregister_device - finish using a port |
446 | ------------------------- | 501 | ----------------------------------------------- |
447 | 502 | ||
448 | SYNOPSIS | 503 | SYNOPSIS |
449 | 504 | ||
450 | #include <linux/parport.h> | 505 | :: |
506 | |||
507 | #include <linux/parport.h> | ||
451 | 508 | ||
452 | void parport_unregister_device (struct pardevice *dev); | 509 | void parport_unregister_device (struct pardevice *dev); |
453 | 510 | ||
454 | DESCRIPTION | 511 | DESCRIPTION |
512 | ^^^^^^^^^^^ | ||
455 | 513 | ||
456 | This function is the opposite of parport_register_device. After using | 514 | This function is the opposite of parport_register_device. After using |
457 | parport_unregister_device, 'dev' is no longer a valid device handle. | 515 | parport_unregister_device, ``dev`` is no longer a valid device handle. |
458 | 516 | ||
459 | You should not unregister a device that is currently claimed, although | 517 | You should not unregister a device that is currently claimed, although |
460 | if you do it will be released automatically. | 518 | if you do it will be released automatically. |
461 | 519 | ||
462 | EXAMPLE | 520 | EXAMPLE |
521 | ^^^^^^^ | ||
522 | |||
523 | :: | ||
463 | 524 | ||
464 | ... | 525 | ... |
465 | kfree (dev->private); /* before we lose the pointer */ | 526 | kfree (dev->private); /* before we lose the pointer */ |
@@ -467,460 +528,602 @@ EXAMPLE | |||
467 | ... | 528 | ... |
468 | 529 | ||
469 | SEE ALSO | 530 | SEE ALSO |
531 | ^^^^^^^^ | ||
532 | |||
470 | 533 | ||
471 | parport_unregister_driver | 534 | parport_unregister_driver |
472 | 535 | ||
473 | parport_claim, parport_claim_or_block - claim the parallel port for a device | 536 | parport_claim, parport_claim_or_block - claim the parallel port for a device |
474 | ------------------------------------- | 537 | ---------------------------------------------------------------------------- |
475 | 538 | ||
476 | SYNOPSIS | 539 | SYNOPSIS |
540 | ^^^^^^^^ | ||
541 | |||
542 | :: | ||
477 | 543 | ||
478 | #include <linux/parport.h> | 544 | #include <linux/parport.h> |
479 | 545 | ||
480 | int parport_claim (struct pardevice *dev); | 546 | int parport_claim (struct pardevice *dev); |
481 | int parport_claim_or_block (struct pardevice *dev); | 547 | int parport_claim_or_block (struct pardevice *dev); |
482 | 548 | ||
483 | DESCRIPTION | 549 | DESCRIPTION |
550 | ^^^^^^^^^^^ | ||
484 | 551 | ||
485 | These functions attempt to gain control of the parallel port on which | 552 | These functions attempt to gain control of the parallel port on which |
486 | 'dev' is registered. 'parport_claim' does not block, but | 553 | ``dev`` is registered. ``parport_claim`` does not block, but |
487 | 'parport_claim_or_block' may do. (Put something here about blocking | 554 | ``parport_claim_or_block`` may do. (Put something here about blocking |
488 | interruptibly or non-interruptibly.) | 555 | interruptibly or non-interruptibly.) |
489 | 556 | ||
490 | You should not try to claim a port that you have already claimed. | 557 | You should not try to claim a port that you have already claimed. |
491 | 558 | ||
492 | RETURN VALUE | 559 | RETURN VALUE |
560 | ^^^^^^^^^^^^ | ||
493 | 561 | ||
494 | A return value of zero indicates that the port was successfully | 562 | A return value of zero indicates that the port was successfully |
495 | claimed, and the caller now has possession of the parallel port. | 563 | claimed, and the caller now has possession of the parallel port. |
496 | 564 | ||
497 | If 'parport_claim_or_block' blocks before returning successfully, the | 565 | If ``parport_claim_or_block`` blocks before returning successfully, the |
498 | return value is positive. | 566 | return value is positive. |
499 | 567 | ||
500 | ERRORS | 568 | ERRORS |
569 | ^^^^^^ | ||
501 | 570 | ||
571 | ========== ========================================================== | ||
502 | -EAGAIN The port is unavailable at the moment, but another attempt | 572 | -EAGAIN The port is unavailable at the moment, but another attempt |
503 | to claim it may succeed. | 573 | to claim it may succeed. |
574 | ========== ========================================================== | ||
504 | 575 | ||
505 | SEE ALSO | 576 | SEE ALSO |
577 | ^^^^^^^^ | ||
578 | |||
506 | 579 | ||
507 | parport_release | 580 | parport_release |
508 | 581 | ||
509 | parport_release - release the parallel port | 582 | parport_release - release the parallel port |
510 | --------------- | 583 | ------------------------------------------- |
511 | 584 | ||
512 | SYNOPSIS | 585 | SYNOPSIS |
586 | ^^^^^^^^ | ||
587 | |||
588 | :: | ||
513 | 589 | ||
514 | #include <linux/parport.h> | 590 | #include <linux/parport.h> |
515 | 591 | ||
516 | void parport_release (struct pardevice *dev); | 592 | void parport_release (struct pardevice *dev); |
517 | 593 | ||
518 | DESCRIPTION | 594 | DESCRIPTION |
595 | ^^^^^^^^^^^ | ||
519 | 596 | ||
520 | Once a parallel port device has been claimed, it can be released using | 597 | Once a parallel port device has been claimed, it can be released using |
521 | 'parport_release'. It cannot fail, but you should not release a | 598 | ``parport_release``. It cannot fail, but you should not release a |
522 | device that you do not have possession of. | 599 | device that you do not have possession of. |
523 | 600 | ||
524 | EXAMPLE | 601 | EXAMPLE |
602 | ^^^^^^^ | ||
525 | 603 | ||
526 | static size_t write (struct pardevice *dev, const void *buf, | 604 | :: |
527 | size_t len) | 605 | |
528 | { | 606 | static size_t write (struct pardevice *dev, const void *buf, |
529 | ... | 607 | size_t len) |
530 | written = dev->port->ops->write_ecp_data (dev->port, buf, | 608 | { |
531 | len); | 609 | ... |
532 | parport_release (dev); | 610 | written = dev->port->ops->write_ecp_data (dev->port, buf, |
533 | ... | 611 | len); |
534 | } | 612 | parport_release (dev); |
613 | ... | ||
614 | } | ||
535 | 615 | ||
536 | 616 | ||
537 | SEE ALSO | 617 | SEE ALSO |
618 | ^^^^^^^^ | ||
538 | 619 | ||
539 | change_mode, parport_claim, parport_claim_or_block, parport_yield | 620 | change_mode, parport_claim, parport_claim_or_block, parport_yield |
540 | 621 | ||
622 | |||
623 | |||
541 | parport_yield, parport_yield_blocking - temporarily release a parallel port | 624 | parport_yield, parport_yield_blocking - temporarily release a parallel port |
542 | ------------------------------------- | 625 | --------------------------------------------------------------------------- |
543 | 626 | ||
544 | SYNOPSIS | 627 | SYNOPSIS |
628 | ^^^^^^^^ | ||
629 | |||
630 | :: | ||
545 | 631 | ||
546 | #include <linux/parport.h> | 632 | #include <linux/parport.h> |
547 | 633 | ||
548 | int parport_yield (struct pardevice *dev) | 634 | int parport_yield (struct pardevice *dev) |
549 | int parport_yield_blocking (struct pardevice *dev); | 635 | int parport_yield_blocking (struct pardevice *dev); |
550 | 636 | ||
551 | DESCRIPTION | 637 | DESCRIPTION |
638 | ^^^^^^^^^^^ | ||
552 | 639 | ||
553 | When a driver has control of a parallel port, it may allow another | 640 | When a driver has control of a parallel port, it may allow another |
554 | driver to temporarily 'borrow' it. 'parport_yield' does not block; | 641 | driver to temporarily ``borrow`` it. ``parport_yield`` does not block; |
555 | 'parport_yield_blocking' may do. | 642 | ``parport_yield_blocking`` may do. |
556 | 643 | ||
557 | RETURN VALUE | 644 | RETURN VALUE |
645 | ^^^^^^^^^^^^ | ||
558 | 646 | ||
559 | A return value of zero indicates that the caller still owns the port | 647 | A return value of zero indicates that the caller still owns the port |
560 | and the call did not block. | 648 | and the call did not block. |
561 | 649 | ||
562 | A positive return value from 'parport_yield_blocking' indicates that | 650 | A positive return value from ``parport_yield_blocking`` indicates that |
563 | the caller still owns the port and the call blocked. | 651 | the caller still owns the port and the call blocked. |
564 | 652 | ||
565 | A return value of -EAGAIN indicates that the caller no longer owns the | 653 | A return value of -EAGAIN indicates that the caller no longer owns the |
566 | port, and it must be re-claimed before use. | 654 | port, and it must be re-claimed before use. |
567 | 655 | ||
568 | ERRORS | 656 | ERRORS |
657 | ^^^^^^ | ||
569 | 658 | ||
659 | ========= ========================================================== | ||
570 | -EAGAIN Ownership of the parallel port was given away. | 660 | -EAGAIN Ownership of the parallel port was given away. |
661 | ========= ========================================================== | ||
571 | 662 | ||
572 | SEE ALSO | 663 | SEE ALSO |
664 | ^^^^^^^^ | ||
573 | 665 | ||
574 | parport_release | 666 | parport_release |
667 | |||
668 | |||
575 | 669 | ||
576 | parport_wait_peripheral - wait for status lines, up to 35ms | 670 | parport_wait_peripheral - wait for status lines, up to 35ms |
577 | ----------------------- | 671 | ----------------------------------------------------------- |
578 | 672 | ||
579 | SYNOPSIS | 673 | SYNOPSIS |
674 | ^^^^^^^^ | ||
675 | |||
676 | :: | ||
580 | 677 | ||
581 | #include <linux/parport.h> | 678 | #include <linux/parport.h> |
582 | 679 | ||
583 | int parport_wait_peripheral (struct parport *port, | 680 | int parport_wait_peripheral (struct parport *port, |
584 | unsigned char mask, | 681 | unsigned char mask, |
585 | unsigned char val); | 682 | unsigned char val); |
586 | 683 | ||
587 | DESCRIPTION | 684 | DESCRIPTION |
685 | ^^^^^^^^^^^ | ||
588 | 686 | ||
589 | Wait for the status lines in mask to match the values in val. | 687 | Wait for the status lines in mask to match the values in val. |
590 | 688 | ||
591 | RETURN VALUE | 689 | RETURN VALUE |
690 | ^^^^^^^^^^^^ | ||
592 | 691 | ||
692 | ======== ========================================================== | ||
593 | -EINTR a signal is pending | 693 | -EINTR a signal is pending |
594 | 0 the status lines in mask have values in val | 694 | 0 the status lines in mask have values in val |
595 | 1 timed out while waiting (35ms elapsed) | 695 | 1 timed out while waiting (35ms elapsed) |
696 | ======== ========================================================== | ||
596 | 697 | ||
597 | SEE ALSO | 698 | SEE ALSO |
699 | ^^^^^^^^ | ||
598 | 700 | ||
599 | parport_poll_peripheral | 701 | parport_poll_peripheral |
702 | |||
703 | |||
600 | 704 | ||
601 | parport_poll_peripheral - wait for status lines, in usec | 705 | parport_poll_peripheral - wait for status lines, in usec |
602 | ----------------------- | 706 | -------------------------------------------------------- |
603 | 707 | ||
604 | SYNOPSIS | 708 | SYNOPSIS |
709 | ^^^^^^^^ | ||
710 | |||
711 | :: | ||
605 | 712 | ||
606 | #include <linux/parport.h> | 713 | #include <linux/parport.h> |
607 | 714 | ||
608 | int parport_poll_peripheral (struct parport *port, | 715 | int parport_poll_peripheral (struct parport *port, |
609 | unsigned char mask, | 716 | unsigned char mask, |
610 | unsigned char val, | 717 | unsigned char val, |
611 | int usec); | 718 | int usec); |
612 | 719 | ||
613 | DESCRIPTION | 720 | DESCRIPTION |
721 | ^^^^^^^^^^^ | ||
614 | 722 | ||
615 | Wait for the status lines in mask to match the values in val. | 723 | Wait for the status lines in mask to match the values in val. |
616 | 724 | ||
617 | RETURN VALUE | 725 | RETURN VALUE |
726 | ^^^^^^^^^^^^ | ||
618 | 727 | ||
728 | ======== ========================================================== | ||
619 | -EINTR a signal is pending | 729 | -EINTR a signal is pending |
620 | 0 the status lines in mask have values in val | 730 | 0 the status lines in mask have values in val |
621 | 1 timed out while waiting (usec microseconds have elapsed) | 731 | 1 timed out while waiting (usec microseconds have elapsed) |
732 | ======== ========================================================== | ||
622 | 733 | ||
623 | SEE ALSO | 734 | SEE ALSO |
735 | ^^^^^^^^ | ||
624 | 736 | ||
625 | parport_wait_peripheral | 737 | parport_wait_peripheral |
626 | 738 | ||
739 | |||
740 | |||
627 | parport_wait_event - wait for an event on a port | 741 | parport_wait_event - wait for an event on a port |
628 | ------------------ | 742 | ------------------------------------------------ |
629 | 743 | ||
630 | SYNOPSIS | 744 | SYNOPSIS |
745 | ^^^^^^^^ | ||
631 | 746 | ||
632 | #include <linux/parport.h> | 747 | :: |
633 | 748 | ||
634 | int parport_wait_event (struct parport *port, signed long timeout) | 749 | #include <linux/parport.h> |
750 | |||
751 | int parport_wait_event (struct parport *port, signed long timeout) | ||
635 | 752 | ||
636 | DESCRIPTION | 753 | DESCRIPTION |
754 | ^^^^^^^^^^^ | ||
637 | 755 | ||
638 | Wait for an event (e.g. interrupt) on a port. The timeout is in | 756 | Wait for an event (e.g. interrupt) on a port. The timeout is in |
639 | jiffies. | 757 | jiffies. |
640 | 758 | ||
641 | RETURN VALUE | 759 | RETURN VALUE |
760 | ^^^^^^^^^^^^ | ||
642 | 761 | ||
762 | ======= ========================================================== | ||
643 | 0 success | 763 | 0 success |
644 | <0 error (exit as soon as possible) | 764 | <0 error (exit as soon as possible) |
645 | >0 timed out | 765 | >0 timed out |
646 | 766 | ======= ========================================================== | |
767 | |||
647 | parport_negotiate - perform IEEE 1284 negotiation | 768 | parport_negotiate - perform IEEE 1284 negotiation |
648 | ----------------- | 769 | ------------------------------------------------- |
649 | 770 | ||
650 | SYNOPSIS | 771 | SYNOPSIS |
772 | ^^^^^^^^ | ||
773 | |||
774 | :: | ||
651 | 775 | ||
652 | #include <linux/parport.h> | 776 | #include <linux/parport.h> |
653 | 777 | ||
654 | int parport_negotiate (struct parport *, int mode); | 778 | int parport_negotiate (struct parport *, int mode); |
655 | 779 | ||
656 | DESCRIPTION | 780 | DESCRIPTION |
781 | ^^^^^^^^^^^ | ||
657 | 782 | ||
658 | Perform IEEE 1284 negotiation. | 783 | Perform IEEE 1284 negotiation. |
659 | 784 | ||
660 | RETURN VALUE | 785 | RETURN VALUE |
786 | ^^^^^^^^^^^^ | ||
661 | 787 | ||
788 | ======= ========================================================== | ||
662 | 0 handshake OK; IEEE 1284 peripheral and mode available | 789 | 0 handshake OK; IEEE 1284 peripheral and mode available |
663 | -1 handshake failed; peripheral not compliant (or none present) | 790 | -1 handshake failed; peripheral not compliant (or none present) |
664 | 1 handshake OK; IEEE 1284 peripheral present but mode not | 791 | 1 handshake OK; IEEE 1284 peripheral present but mode not |
665 | available | 792 | available |
793 | ======= ========================================================== | ||
666 | 794 | ||
667 | SEE ALSO | 795 | SEE ALSO |
796 | ^^^^^^^^ | ||
668 | 797 | ||
669 | parport_read, parport_write | 798 | parport_read, parport_write |
670 | 799 | ||
800 | |||
801 | |||
671 | parport_read - read data from device | 802 | parport_read - read data from device |
672 | ------------ | 803 | ------------------------------------ |
673 | 804 | ||
674 | SYNOPSIS | 805 | SYNOPSIS |
806 | ^^^^^^^^ | ||
807 | |||
808 | :: | ||
675 | 809 | ||
676 | #include <linux/parport.h> | 810 | #include <linux/parport.h> |
677 | 811 | ||
678 | ssize_t parport_read (struct parport *, void *buf, size_t len); | 812 | ssize_t parport_read (struct parport *, void *buf, size_t len); |
679 | 813 | ||
680 | DESCRIPTION | 814 | DESCRIPTION |
815 | ^^^^^^^^^^^ | ||
681 | 816 | ||
682 | Read data from device in current IEEE 1284 transfer mode. This only | 817 | Read data from device in current IEEE 1284 transfer mode. This only |
683 | works for modes that support reverse data transfer. | 818 | works for modes that support reverse data transfer. |
684 | 819 | ||
685 | RETURN VALUE | 820 | RETURN VALUE |
821 | ^^^^^^^^^^^^ | ||
686 | 822 | ||
687 | If negative, an error code; otherwise the number of bytes transferred. | 823 | If negative, an error code; otherwise the number of bytes transferred. |
688 | 824 | ||
689 | SEE ALSO | 825 | SEE ALSO |
826 | ^^^^^^^^ | ||
690 | 827 | ||
691 | parport_write, parport_negotiate | 828 | parport_write, parport_negotiate |
692 | 829 | ||
830 | |||
831 | |||
693 | parport_write - write data to device | 832 | parport_write - write data to device |
694 | ------------- | 833 | ------------------------------------ |
695 | 834 | ||
696 | SYNOPSIS | 835 | SYNOPSIS |
836 | ^^^^^^^^ | ||
837 | |||
838 | :: | ||
697 | 839 | ||
698 | #include <linux/parport.h> | 840 | #include <linux/parport.h> |
699 | 841 | ||
700 | ssize_t parport_write (struct parport *, const void *buf, size_t len); | 842 | ssize_t parport_write (struct parport *, const void *buf, size_t len); |
701 | 843 | ||
702 | DESCRIPTION | 844 | DESCRIPTION |
845 | ^^^^^^^^^^^ | ||
703 | 846 | ||
704 | Write data to device in current IEEE 1284 transfer mode. This only | 847 | Write data to device in current IEEE 1284 transfer mode. This only |
705 | works for modes that support forward data transfer. | 848 | works for modes that support forward data transfer. |
706 | 849 | ||
707 | RETURN VALUE | 850 | RETURN VALUE |
851 | ^^^^^^^^^^^^ | ||
708 | 852 | ||
709 | If negative, an error code; otherwise the number of bytes transferred. | 853 | If negative, an error code; otherwise the number of bytes transferred. |
710 | 854 | ||
711 | SEE ALSO | 855 | SEE ALSO |
856 | ^^^^^^^^ | ||
712 | 857 | ||
713 | parport_read, parport_negotiate | 858 | parport_read, parport_negotiate |
859 | |||
860 | |||
714 | 861 | ||
715 | parport_open - register device for particular device number | 862 | parport_open - register device for particular device number |
716 | ------------ | 863 | ----------------------------------------------------------- |
717 | 864 | ||
718 | SYNOPSIS | 865 | SYNOPSIS |
866 | ^^^^^^^^ | ||
719 | 867 | ||
720 | #include <linux/parport.h> | 868 | :: |
721 | 869 | ||
722 | struct pardevice *parport_open (int devnum, const char *name, | 870 | #include <linux/parport.h> |
723 | int (*pf) (void *), | 871 | |
724 | void (*kf) (void *), | 872 | struct pardevice *parport_open (int devnum, const char *name, |
725 | void (*irqf) (int, void *, | 873 | int (*pf) (void *), |
726 | struct pt_regs *), | 874 | void (*kf) (void *), |
727 | int flags, void *handle); | 875 | void (*irqf) (int, void *, |
876 | struct pt_regs *), | ||
877 | int flags, void *handle); | ||
728 | 878 | ||
729 | DESCRIPTION | 879 | DESCRIPTION |
880 | ^^^^^^^^^^^ | ||
730 | 881 | ||
731 | This is like parport_register_device but takes a device number instead | 882 | This is like parport_register_device but takes a device number instead |
732 | of a pointer to a struct parport. | 883 | of a pointer to a struct parport. |
733 | 884 | ||
734 | RETURN VALUE | 885 | RETURN VALUE |
886 | ^^^^^^^^^^^^ | ||
735 | 887 | ||
736 | See parport_register_device. If no device is associated with devnum, | 888 | See parport_register_device. If no device is associated with devnum, |
737 | NULL is returned. | 889 | NULL is returned. |
738 | 890 | ||
739 | SEE ALSO | 891 | SEE ALSO |
892 | ^^^^^^^^ | ||
740 | 893 | ||
741 | parport_register_device | 894 | parport_register_device |
742 | 895 | ||
896 | |||
897 | |||
743 | parport_close - unregister device for particular device number | 898 | parport_close - unregister device for particular device number |
744 | ------------- | 899 | -------------------------------------------------------------- |
745 | 900 | ||
746 | SYNOPSIS | 901 | SYNOPSIS |
902 | ^^^^^^^^ | ||
903 | |||
904 | :: | ||
747 | 905 | ||
748 | #include <linux/parport.h> | 906 | #include <linux/parport.h> |
749 | 907 | ||
750 | void parport_close (struct pardevice *dev); | 908 | void parport_close (struct pardevice *dev); |
751 | 909 | ||
752 | DESCRIPTION | 910 | DESCRIPTION |
911 | ^^^^^^^^^^^ | ||
753 | 912 | ||
754 | This is the equivalent of parport_unregister_device for parport_open. | 913 | This is the equivalent of parport_unregister_device for parport_open. |
755 | 914 | ||
756 | SEE ALSO | 915 | SEE ALSO |
916 | ^^^^^^^^ | ||
757 | 917 | ||
758 | parport_unregister_device, parport_open | 918 | parport_unregister_device, parport_open |
759 | 919 | ||
920 | |||
921 | |||
760 | parport_device_id - obtain IEEE 1284 Device ID | 922 | parport_device_id - obtain IEEE 1284 Device ID |
761 | ----------------- | 923 | ---------------------------------------------- |
762 | 924 | ||
763 | SYNOPSIS | 925 | SYNOPSIS |
926 | ^^^^^^^^ | ||
927 | |||
928 | :: | ||
764 | 929 | ||
765 | #include <linux/parport.h> | 930 | #include <linux/parport.h> |
766 | 931 | ||
767 | ssize_t parport_device_id (int devnum, char *buffer, size_t len); | 932 | ssize_t parport_device_id (int devnum, char *buffer, size_t len); |
768 | 933 | ||
769 | DESCRIPTION | 934 | DESCRIPTION |
935 | ^^^^^^^^^^^ | ||
770 | 936 | ||
771 | Obtains the IEEE 1284 Device ID associated with a given device. | 937 | Obtains the IEEE 1284 Device ID associated with a given device. |
772 | 938 | ||
773 | RETURN VALUE | 939 | RETURN VALUE |
940 | ^^^^^^^^^^^^ | ||
774 | 941 | ||
775 | If negative, an error code; otherwise, the number of bytes of buffer | 942 | If negative, an error code; otherwise, the number of bytes of buffer |
776 | that contain the device ID. The format of the device ID is as | 943 | that contain the device ID. The format of the device ID is as |
777 | follows: | 944 | follows:: |
778 | 945 | ||
779 | [length][ID] | 946 | [length][ID] |
780 | 947 | ||
781 | The first two bytes indicate the inclusive length of the entire Device | 948 | The first two bytes indicate the inclusive length of the entire Device |
782 | ID, and are in big-endian order. The ID is a sequence of pairs of the | 949 | ID, and are in big-endian order. The ID is a sequence of pairs of the |
783 | form: | 950 | form:: |
784 | 951 | ||
785 | key:value; | 952 | key:value; |
786 | 953 | ||
787 | NOTES | 954 | NOTES |
955 | ^^^^^ | ||
788 | 956 | ||
789 | Many devices have ill-formed IEEE 1284 Device IDs. | 957 | Many devices have ill-formed IEEE 1284 Device IDs. |
790 | 958 | ||
791 | SEE ALSO | 959 | SEE ALSO |
960 | ^^^^^^^^ | ||
792 | 961 | ||
793 | parport_find_class, parport_find_device | 962 | parport_find_class, parport_find_device |
794 | 963 | ||
964 | |||
965 | |||
795 | parport_device_coords - convert device number to device coordinates | 966 | parport_device_coords - convert device number to device coordinates |
796 | ------------------ | 967 | ------------------------------------------------------------------- |
797 | 968 | ||
798 | SYNOPSIS | 969 | SYNOPSIS |
970 | ^^^^^^^^ | ||
971 | |||
972 | :: | ||
799 | 973 | ||
800 | #include <linux/parport.h> | 974 | #include <linux/parport.h> |
801 | 975 | ||
802 | int parport_device_coords (int devnum, int *parport, int *mux, | 976 | int parport_device_coords (int devnum, int *parport, int *mux, |
803 | int *daisy); | 977 | int *daisy); |
804 | 978 | ||
805 | DESCRIPTION | 979 | DESCRIPTION |
980 | ^^^^^^^^^^^ | ||
806 | 981 | ||
807 | Convert between device number (zero-based) and device coordinates | 982 | Convert between device number (zero-based) and device coordinates |
808 | (port, multiplexor, daisy chain address). | 983 | (port, multiplexor, daisy chain address). |
809 | 984 | ||
810 | RETURN VALUE | 985 | RETURN VALUE |
986 | ^^^^^^^^^^^^ | ||
811 | 987 | ||
812 | Zero on success, in which case the coordinates are (*parport, *mux, | 988 | Zero on success, in which case the coordinates are (``*parport``, ``*mux``, |
813 | *daisy). | 989 | ``*daisy``). |
814 | 990 | ||
815 | SEE ALSO | 991 | SEE ALSO |
992 | ^^^^^^^^ | ||
816 | 993 | ||
817 | parport_open, parport_device_id | 994 | parport_open, parport_device_id |
818 | 995 | ||
996 | |||
997 | |||
819 | parport_find_class - find a device by its class | 998 | parport_find_class - find a device by its class |
820 | ------------------ | 999 | ----------------------------------------------- |
821 | 1000 | ||
822 | SYNOPSIS | 1001 | SYNOPSIS |
823 | 1002 | ^^^^^^^^ | |
824 | #include <linux/parport.h> | 1003 | |
825 | 1004 | :: | |
826 | typedef enum { | 1005 | |
827 | PARPORT_CLASS_LEGACY = 0, /* Non-IEEE1284 device */ | 1006 | #include <linux/parport.h> |
828 | PARPORT_CLASS_PRINTER, | 1007 | |
829 | PARPORT_CLASS_MODEM, | 1008 | typedef enum { |
830 | PARPORT_CLASS_NET, | 1009 | PARPORT_CLASS_LEGACY = 0, /* Non-IEEE1284 device */ |
831 | PARPORT_CLASS_HDC, /* Hard disk controller */ | 1010 | PARPORT_CLASS_PRINTER, |
832 | PARPORT_CLASS_PCMCIA, | 1011 | PARPORT_CLASS_MODEM, |
833 | PARPORT_CLASS_MEDIA, /* Multimedia device */ | 1012 | PARPORT_CLASS_NET, |
834 | PARPORT_CLASS_FDC, /* Floppy disk controller */ | 1013 | PARPORT_CLASS_HDC, /* Hard disk controller */ |
835 | PARPORT_CLASS_PORTS, | 1014 | PARPORT_CLASS_PCMCIA, |
836 | PARPORT_CLASS_SCANNER, | 1015 | PARPORT_CLASS_MEDIA, /* Multimedia device */ |
837 | PARPORT_CLASS_DIGCAM, | 1016 | PARPORT_CLASS_FDC, /* Floppy disk controller */ |
838 | PARPORT_CLASS_OTHER, /* Anything else */ | 1017 | PARPORT_CLASS_PORTS, |
839 | PARPORT_CLASS_UNSPEC, /* No CLS field in ID */ | 1018 | PARPORT_CLASS_SCANNER, |
840 | PARPORT_CLASS_SCSIADAPTER | 1019 | PARPORT_CLASS_DIGCAM, |
841 | } parport_device_class; | 1020 | PARPORT_CLASS_OTHER, /* Anything else */ |
842 | 1021 | PARPORT_CLASS_UNSPEC, /* No CLS field in ID */ | |
843 | int parport_find_class (parport_device_class cls, int from); | 1022 | PARPORT_CLASS_SCSIADAPTER |
1023 | } parport_device_class; | ||
1024 | |||
1025 | int parport_find_class (parport_device_class cls, int from); | ||
844 | 1026 | ||
845 | DESCRIPTION | 1027 | DESCRIPTION |
1028 | ^^^^^^^^^^^ | ||
846 | 1029 | ||
847 | Find a device by class. The search starts from device number from+1. | 1030 | Find a device by class. The search starts from device number from+1. |
848 | 1031 | ||
849 | RETURN VALUE | 1032 | RETURN VALUE |
1033 | ^^^^^^^^^^^^ | ||
850 | 1034 | ||
851 | The device number of the next device in that class, or -1 if no such | 1035 | The device number of the next device in that class, or -1 if no such |
852 | device exists. | 1036 | device exists. |
853 | 1037 | ||
854 | NOTES | 1038 | NOTES |
1039 | ^^^^^ | ||
855 | 1040 | ||
856 | Example usage: | 1041 | Example usage:: |
857 | 1042 | ||
858 | int devnum = -1; | 1043 | int devnum = -1; |
859 | while ((devnum = parport_find_class (PARPORT_CLASS_DIGCAM, devnum)) != -1) { | 1044 | while ((devnum = parport_find_class (PARPORT_CLASS_DIGCAM, devnum)) != -1) { |
860 | struct pardevice *dev = parport_open (devnum, ...); | 1045 | struct pardevice *dev = parport_open (devnum, ...); |
861 | ... | 1046 | ... |
862 | } | 1047 | } |
863 | 1048 | ||
864 | SEE ALSO | 1049 | SEE ALSO |
1050 | ^^^^^^^^ | ||
865 | 1051 | ||
866 | parport_find_device, parport_open, parport_device_id | 1052 | parport_find_device, parport_open, parport_device_id |
867 | 1053 | ||
1054 | |||
1055 | |||
868 | parport_find_device - find a device by vendor and model | 1056 | parport_find_device - find a device by vendor and model |
869 | ------------------ | 1057 | ------------------------------------------------------- |
870 | 1058 | ||
871 | SYNOPSIS | 1059 | SYNOPSIS |
1060 | ^^^^^^^^ | ||
872 | 1061 | ||
873 | #include <linux/parport.h> | 1062 | :: |
874 | 1063 | ||
875 | int parport_find_device (const char *mfg, const char *mdl, int from); | 1064 | #include <linux/parport.h> |
1065 | |||
1066 | int parport_find_device (const char *mfg, const char *mdl, int from); | ||
876 | 1067 | ||
877 | DESCRIPTION | 1068 | DESCRIPTION |
1069 | ^^^^^^^^^^^ | ||
878 | 1070 | ||
879 | Find a device by vendor and model. The search starts from device | 1071 | Find a device by vendor and model. The search starts from device |
880 | number from+1. | 1072 | number from+1. |
881 | 1073 | ||
882 | RETURN VALUE | 1074 | RETURN VALUE |
1075 | ^^^^^^^^^^^^ | ||
883 | 1076 | ||
884 | The device number of the next device matching the specifications, or | 1077 | The device number of the next device matching the specifications, or |
885 | -1 if no such device exists. | 1078 | -1 if no such device exists. |
886 | 1079 | ||
887 | NOTES | 1080 | NOTES |
1081 | ^^^^^ | ||
888 | 1082 | ||
889 | Example usage: | 1083 | Example usage:: |
890 | 1084 | ||
891 | int devnum = -1; | 1085 | int devnum = -1; |
892 | while ((devnum = parport_find_device ("IOMEGA", "ZIP+", devnum)) != -1) { | 1086 | while ((devnum = parport_find_device ("IOMEGA", "ZIP+", devnum)) != -1) { |
893 | struct pardevice *dev = parport_open (devnum, ...); | 1087 | struct pardevice *dev = parport_open (devnum, ...); |
894 | ... | 1088 | ... |
895 | } | 1089 | } |
896 | 1090 | ||
897 | SEE ALSO | 1091 | SEE ALSO |
1092 | ^^^^^^^^ | ||
898 | 1093 | ||
899 | parport_find_class, parport_open, parport_device_id | 1094 | parport_find_class, parport_open, parport_device_id |
1095 | |||
1096 | |||
900 | 1097 | ||
901 | parport_set_timeout - set the inactivity timeout | 1098 | parport_set_timeout - set the inactivity timeout |
902 | ------------------- | 1099 | ------------------------------------------------ |
903 | 1100 | ||
904 | SYNOPSIS | 1101 | SYNOPSIS |
1102 | ^^^^^^^^ | ||
1103 | |||
1104 | :: | ||
905 | 1105 | ||
906 | #include <linux/parport.h> | 1106 | #include <linux/parport.h> |
907 | 1107 | ||
908 | long parport_set_timeout (struct pardevice *dev, long inactivity); | 1108 | long parport_set_timeout (struct pardevice *dev, long inactivity); |
909 | 1109 | ||
910 | DESCRIPTION | 1110 | DESCRIPTION |
1111 | ^^^^^^^^^^^ | ||
911 | 1112 | ||
912 | Set the inactivity timeout, in jiffies, for a registered device. The | 1113 | Set the inactivity timeout, in jiffies, for a registered device. The |
913 | previous timeout is returned. | 1114 | previous timeout is returned. |
914 | 1115 | ||
915 | RETURN VALUE | 1116 | RETURN VALUE |
1117 | ^^^^^^^^^^^^ | ||
916 | 1118 | ||
917 | The previous timeout, in jiffies. | 1119 | The previous timeout, in jiffies. |
918 | 1120 | ||
919 | NOTES | 1121 | NOTES |
1122 | ^^^^^ | ||
920 | 1123 | ||
921 | Some of the port->ops functions for a parport may take time, owing to | 1124 | Some of the port->ops functions for a parport may take time, owing to |
922 | delays at the peripheral. After the peripheral has not responded for | 1125 | delays at the peripheral. After the peripheral has not responded for |
923 | 'inactivity' jiffies, a timeout will occur and the blocking function | 1126 | ``inactivity`` jiffies, a timeout will occur and the blocking function |
924 | will return. | 1127 | will return. |
925 | 1128 | ||
926 | A timeout of 0 jiffies is a special case: the function must do as much | 1129 | A timeout of 0 jiffies is a special case: the function must do as much |
@@ -932,29 +1135,37 @@ Once set for a registered device, the timeout will remain at the set | |||
932 | value until set again. | 1135 | value until set again. |
933 | 1136 | ||
934 | SEE ALSO | 1137 | SEE ALSO |
1138 | ^^^^^^^^ | ||
935 | 1139 | ||
936 | port->ops->xxx_read/write_yyy | 1140 | port->ops->xxx_read/write_yyy |
937 | 1141 | ||
1142 | |||
1143 | |||
1144 | |||
938 | PORT FUNCTIONS | 1145 | PORT FUNCTIONS |
939 | -------------- | 1146 | ============== |
940 | 1147 | ||
941 | The functions in the port->ops structure (struct parport_operations) | 1148 | The functions in the port->ops structure (struct parport_operations) |
942 | are provided by the low-level driver responsible for that port. | 1149 | are provided by the low-level driver responsible for that port. |
943 | 1150 | ||
944 | port->ops->read_data - read the data register | 1151 | port->ops->read_data - read the data register |
945 | -------------------- | 1152 | --------------------------------------------- |
946 | 1153 | ||
947 | SYNOPSIS | 1154 | SYNOPSIS |
1155 | ^^^^^^^^ | ||
948 | 1156 | ||
949 | #include <linux/parport.h> | 1157 | :: |
950 | 1158 | ||
951 | struct parport_operations { | 1159 | #include <linux/parport.h> |
952 | ... | 1160 | |
953 | unsigned char (*read_data) (struct parport *port); | 1161 | struct parport_operations { |
954 | ... | 1162 | ... |
955 | }; | 1163 | unsigned char (*read_data) (struct parport *port); |
1164 | ... | ||
1165 | }; | ||
956 | 1166 | ||
957 | DESCRIPTION | 1167 | DESCRIPTION |
1168 | ^^^^^^^^^^^ | ||
958 | 1169 | ||
959 | If port->modes contains the PARPORT_MODE_TRISTATE flag and the | 1170 | If port->modes contains the PARPORT_MODE_TRISTATE flag and the |
960 | PARPORT_CONTROL_DIRECTION bit in the control register is set, this | 1171 | PARPORT_CONTROL_DIRECTION bit in the control register is set, this |
@@ -964,45 +1175,59 @@ not set, the return value _may_ be the last value written to the data | |||
964 | register. Otherwise the return value is undefined. | 1175 | register. Otherwise the return value is undefined. |
965 | 1176 | ||
966 | SEE ALSO | 1177 | SEE ALSO |
1178 | ^^^^^^^^ | ||
967 | 1179 | ||
968 | write_data, read_status, write_control | 1180 | write_data, read_status, write_control |
1181 | |||
1182 | |||
969 | 1183 | ||
970 | port->ops->write_data - write the data register | 1184 | port->ops->write_data - write the data register |
971 | --------------------- | 1185 | ----------------------------------------------- |
972 | 1186 | ||
973 | SYNOPSIS | 1187 | SYNOPSIS |
1188 | ^^^^^^^^ | ||
974 | 1189 | ||
975 | #include <linux/parport.h> | 1190 | :: |
976 | 1191 | ||
977 | struct parport_operations { | 1192 | #include <linux/parport.h> |
978 | ... | 1193 | |
979 | void (*write_data) (struct parport *port, unsigned char d); | 1194 | struct parport_operations { |
980 | ... | 1195 | ... |
981 | }; | 1196 | void (*write_data) (struct parport *port, unsigned char d); |
1197 | ... | ||
1198 | }; | ||
982 | 1199 | ||
983 | DESCRIPTION | 1200 | DESCRIPTION |
1201 | ^^^^^^^^^^^ | ||
984 | 1202 | ||
985 | Writes to the data register. May have side-effects (a STROBE pulse, | 1203 | Writes to the data register. May have side-effects (a STROBE pulse, |
986 | for instance). | 1204 | for instance). |
987 | 1205 | ||
988 | SEE ALSO | 1206 | SEE ALSO |
1207 | ^^^^^^^^ | ||
989 | 1208 | ||
990 | read_data, read_status, write_control | 1209 | read_data, read_status, write_control |
1210 | |||
1211 | |||
991 | 1212 | ||
992 | port->ops->read_status - read the status register | 1213 | port->ops->read_status - read the status register |
993 | ---------------------- | 1214 | ------------------------------------------------- |
994 | 1215 | ||
995 | SYNOPSIS | 1216 | SYNOPSIS |
1217 | ^^^^^^^^ | ||
996 | 1218 | ||
997 | #include <linux/parport.h> | 1219 | :: |
998 | 1220 | ||
999 | struct parport_operations { | 1221 | #include <linux/parport.h> |
1000 | ... | 1222 | |
1001 | unsigned char (*read_status) (struct parport *port); | 1223 | struct parport_operations { |
1002 | ... | 1224 | ... |
1003 | }; | 1225 | unsigned char (*read_status) (struct parport *port); |
1226 | ... | ||
1227 | }; | ||
1004 | 1228 | ||
1005 | DESCRIPTION | 1229 | DESCRIPTION |
1230 | ^^^^^^^^^^^ | ||
1006 | 1231 | ||
1007 | Reads from the status register. This is a bitmask: | 1232 | Reads from the status register. This is a bitmask: |
1008 | 1233 | ||
@@ -1015,76 +1240,98 @@ Reads from the status register. This is a bitmask: | |||
1015 | There may be other bits set. | 1240 | There may be other bits set. |
1016 | 1241 | ||
1017 | SEE ALSO | 1242 | SEE ALSO |
1243 | ^^^^^^^^ | ||
1018 | 1244 | ||
1019 | read_data, write_data, write_control | 1245 | read_data, write_data, write_control |
1246 | |||
1247 | |||
1020 | 1248 | ||
1021 | port->ops->read_control - read the control register | 1249 | port->ops->read_control - read the control register |
1022 | ----------------------- | 1250 | --------------------------------------------------- |
1023 | 1251 | ||
1024 | SYNOPSIS | 1252 | SYNOPSIS |
1253 | ^^^^^^^^ | ||
1025 | 1254 | ||
1026 | #include <linux/parport.h> | 1255 | :: |
1027 | 1256 | ||
1028 | struct parport_operations { | 1257 | #include <linux/parport.h> |
1029 | ... | 1258 | |
1030 | unsigned char (*read_control) (struct parport *port); | 1259 | struct parport_operations { |
1031 | ... | 1260 | ... |
1032 | }; | 1261 | unsigned char (*read_control) (struct parport *port); |
1262 | ... | ||
1263 | }; | ||
1033 | 1264 | ||
1034 | DESCRIPTION | 1265 | DESCRIPTION |
1266 | ^^^^^^^^^^^ | ||
1035 | 1267 | ||
1036 | Returns the last value written to the control register (either from | 1268 | Returns the last value written to the control register (either from |
1037 | write_control or frob_control). No port access is performed. | 1269 | write_control or frob_control). No port access is performed. |
1038 | 1270 | ||
1039 | SEE ALSO | 1271 | SEE ALSO |
1272 | ^^^^^^^^ | ||
1040 | 1273 | ||
1041 | read_data, write_data, read_status, write_control | 1274 | read_data, write_data, read_status, write_control |
1275 | |||
1276 | |||
1042 | 1277 | ||
1043 | port->ops->write_control - write the control register | 1278 | port->ops->write_control - write the control register |
1044 | ------------------------ | 1279 | ----------------------------------------------------- |
1045 | 1280 | ||
1046 | SYNOPSIS | 1281 | SYNOPSIS |
1282 | ^^^^^^^^ | ||
1047 | 1283 | ||
1048 | #include <linux/parport.h> | 1284 | :: |
1049 | 1285 | ||
1050 | struct parport_operations { | 1286 | #include <linux/parport.h> |
1051 | ... | 1287 | |
1052 | void (*write_control) (struct parport *port, unsigned char s); | 1288 | struct parport_operations { |
1053 | ... | 1289 | ... |
1054 | }; | 1290 | void (*write_control) (struct parport *port, unsigned char s); |
1291 | ... | ||
1292 | }; | ||
1055 | 1293 | ||
1056 | DESCRIPTION | 1294 | DESCRIPTION |
1295 | ^^^^^^^^^^^ | ||
1057 | 1296 | ||
1058 | Writes to the control register. This is a bitmask: | 1297 | Writes to the control register. This is a bitmask:: |
1059 | _______ | 1298 | |
1060 | - PARPORT_CONTROL_STROBE (nStrobe) | 1299 | _______ |
1061 | _______ | 1300 | - PARPORT_CONTROL_STROBE (nStrobe) |
1062 | - PARPORT_CONTROL_AUTOFD (nAutoFd) | 1301 | _______ |
1063 | _____ | 1302 | - PARPORT_CONTROL_AUTOFD (nAutoFd) |
1064 | - PARPORT_CONTROL_INIT (nInit) | 1303 | _____ |
1065 | _________ | 1304 | - PARPORT_CONTROL_INIT (nInit) |
1066 | - PARPORT_CONTROL_SELECT (nSelectIn) | 1305 | _________ |
1306 | - PARPORT_CONTROL_SELECT (nSelectIn) | ||
1067 | 1307 | ||
1068 | SEE ALSO | 1308 | SEE ALSO |
1309 | ^^^^^^^^ | ||
1069 | 1310 | ||
1070 | read_data, write_data, read_status, frob_control | 1311 | read_data, write_data, read_status, frob_control |
1312 | |||
1313 | |||
1071 | 1314 | ||
1072 | port->ops->frob_control - write control register bits | 1315 | port->ops->frob_control - write control register bits |
1073 | ----------------------- | 1316 | ----------------------------------------------------- |
1074 | 1317 | ||
1075 | SYNOPSIS | 1318 | SYNOPSIS |
1319 | ^^^^^^^^ | ||
1076 | 1320 | ||
1077 | #include <linux/parport.h> | 1321 | :: |
1078 | 1322 | ||
1079 | struct parport_operations { | 1323 | #include <linux/parport.h> |
1080 | ... | 1324 | |
1081 | unsigned char (*frob_control) (struct parport *port, | 1325 | struct parport_operations { |
1082 | unsigned char mask, | 1326 | ... |
1083 | unsigned char val); | 1327 | unsigned char (*frob_control) (struct parport *port, |
1084 | ... | 1328 | unsigned char mask, |
1085 | }; | 1329 | unsigned char val); |
1330 | ... | ||
1331 | }; | ||
1086 | 1332 | ||
1087 | DESCRIPTION | 1333 | DESCRIPTION |
1334 | ^^^^^^^^^^^ | ||
1088 | 1335 | ||
1089 | This is equivalent to reading from the control register, masking out | 1336 | This is equivalent to reading from the control register, masking out |
1090 | the bits in mask, exclusive-or'ing with the bits in val, and writing | 1337 | the bits in mask, exclusive-or'ing with the bits in val, and writing |
@@ -1095,23 +1342,30 @@ of its contents is maintained, so frob_control is in fact only one | |||
1095 | port access. | 1342 | port access. |
1096 | 1343 | ||
1097 | SEE ALSO | 1344 | SEE ALSO |
1345 | ^^^^^^^^ | ||
1098 | 1346 | ||
1099 | read_data, write_data, read_status, write_control | 1347 | read_data, write_data, read_status, write_control |
1348 | |||
1349 | |||
1100 | 1350 | ||
1101 | port->ops->enable_irq - enable interrupt generation | 1351 | port->ops->enable_irq - enable interrupt generation |
1102 | --------------------- | 1352 | --------------------------------------------------- |
1103 | 1353 | ||
1104 | SYNOPSIS | 1354 | SYNOPSIS |
1355 | ^^^^^^^^ | ||
1105 | 1356 | ||
1106 | #include <linux/parport.h> | 1357 | :: |
1107 | 1358 | ||
1108 | struct parport_operations { | 1359 | #include <linux/parport.h> |
1109 | ... | 1360 | |
1110 | void (*enable_irq) (struct parport *port); | 1361 | struct parport_operations { |
1111 | ... | 1362 | ... |
1112 | }; | 1363 | void (*enable_irq) (struct parport *port); |
1364 | ... | ||
1365 | }; | ||
1113 | 1366 | ||
1114 | DESCRIPTION | 1367 | DESCRIPTION |
1368 | ^^^^^^^^^^^ | ||
1115 | 1369 | ||
1116 | The parallel port hardware is instructed to generate interrupts at | 1370 | The parallel port hardware is instructed to generate interrupts at |
1117 | appropriate moments, although those moments are | 1371 | appropriate moments, although those moments are |
@@ -1119,353 +1373,460 @@ architecture-specific. For the PC architecture, interrupts are | |||
1119 | commonly generated on the rising edge of nAck. | 1373 | commonly generated on the rising edge of nAck. |
1120 | 1374 | ||
1121 | SEE ALSO | 1375 | SEE ALSO |
1376 | ^^^^^^^^ | ||
1122 | 1377 | ||
1123 | disable_irq | 1378 | disable_irq |
1379 | |||
1380 | |||
1124 | 1381 | ||
1125 | port->ops->disable_irq - disable interrupt generation | 1382 | port->ops->disable_irq - disable interrupt generation |
1126 | ---------------------- | 1383 | ----------------------------------------------------- |
1127 | 1384 | ||
1128 | SYNOPSIS | 1385 | SYNOPSIS |
1386 | ^^^^^^^^ | ||
1129 | 1387 | ||
1130 | #include <linux/parport.h> | 1388 | :: |
1131 | 1389 | ||
1132 | struct parport_operations { | 1390 | #include <linux/parport.h> |
1133 | ... | 1391 | |
1134 | void (*disable_irq) (struct parport *port); | 1392 | struct parport_operations { |
1135 | ... | 1393 | ... |
1136 | }; | 1394 | void (*disable_irq) (struct parport *port); |
1395 | ... | ||
1396 | }; | ||
1137 | 1397 | ||
1138 | DESCRIPTION | 1398 | DESCRIPTION |
1399 | ^^^^^^^^^^^ | ||
1139 | 1400 | ||
1140 | The parallel port hardware is instructed not to generate interrupts. | 1401 | The parallel port hardware is instructed not to generate interrupts. |
1141 | The interrupt itself is not masked. | 1402 | The interrupt itself is not masked. |
1142 | 1403 | ||
1143 | SEE ALSO | 1404 | SEE ALSO |
1405 | ^^^^^^^^ | ||
1144 | 1406 | ||
1145 | enable_irq | 1407 | enable_irq |
1146 | 1408 | ||
1409 | |||
1410 | |||
1147 | port->ops->data_forward - enable data drivers | 1411 | port->ops->data_forward - enable data drivers |
1148 | ----------------------- | 1412 | --------------------------------------------- |
1149 | 1413 | ||
1150 | SYNOPSIS | 1414 | SYNOPSIS |
1415 | ^^^^^^^^ | ||
1151 | 1416 | ||
1152 | #include <linux/parport.h> | 1417 | :: |
1153 | 1418 | ||
1154 | struct parport_operations { | 1419 | #include <linux/parport.h> |
1155 | ... | 1420 | |
1156 | void (*data_forward) (struct parport *port); | 1421 | struct parport_operations { |
1157 | ... | 1422 | ... |
1158 | }; | 1423 | void (*data_forward) (struct parport *port); |
1424 | ... | ||
1425 | }; | ||
1159 | 1426 | ||
1160 | DESCRIPTION | 1427 | DESCRIPTION |
1428 | ^^^^^^^^^^^ | ||
1161 | 1429 | ||
1162 | Enables the data line drivers, for 8-bit host-to-peripheral | 1430 | Enables the data line drivers, for 8-bit host-to-peripheral |
1163 | communications. | 1431 | communications. |
1164 | 1432 | ||
1165 | SEE ALSO | 1433 | SEE ALSO |
1434 | ^^^^^^^^ | ||
1166 | 1435 | ||
1167 | data_reverse | 1436 | data_reverse |
1437 | |||
1438 | |||
1168 | 1439 | ||
1169 | port->ops->data_reverse - tristate the buffer | 1440 | port->ops->data_reverse - tristate the buffer |
1170 | ----------------------- | 1441 | --------------------------------------------- |
1171 | 1442 | ||
1172 | SYNOPSIS | 1443 | SYNOPSIS |
1444 | ^^^^^^^^ | ||
1173 | 1445 | ||
1174 | #include <linux/parport.h> | 1446 | :: |
1175 | 1447 | ||
1176 | struct parport_operations { | 1448 | #include <linux/parport.h> |
1177 | ... | 1449 | |
1178 | void (*data_reverse) (struct parport *port); | 1450 | struct parport_operations { |
1179 | ... | 1451 | ... |
1180 | }; | 1452 | void (*data_reverse) (struct parport *port); |
1453 | ... | ||
1454 | }; | ||
1181 | 1455 | ||
1182 | DESCRIPTION | 1456 | DESCRIPTION |
1457 | ^^^^^^^^^^^ | ||
1183 | 1458 | ||
1184 | Places the data bus in a high impedance state, if port->modes has the | 1459 | Places the data bus in a high impedance state, if port->modes has the |
1185 | PARPORT_MODE_TRISTATE bit set. | 1460 | PARPORT_MODE_TRISTATE bit set. |
1186 | 1461 | ||
1187 | SEE ALSO | 1462 | SEE ALSO |
1463 | ^^^^^^^^ | ||
1188 | 1464 | ||
1189 | data_forward | 1465 | data_forward |
1190 | 1466 | ||
1467 | |||
1468 | |||
1191 | port->ops->epp_write_data - write EPP data | 1469 | port->ops->epp_write_data - write EPP data |
1192 | ------------------------- | 1470 | ------------------------------------------ |
1193 | 1471 | ||
1194 | SYNOPSIS | 1472 | SYNOPSIS |
1473 | ^^^^^^^^ | ||
1195 | 1474 | ||
1196 | #include <linux/parport.h> | 1475 | :: |
1197 | 1476 | ||
1198 | struct parport_operations { | 1477 | #include <linux/parport.h> |
1199 | ... | 1478 | |
1200 | size_t (*epp_write_data) (struct parport *port, const void *buf, | 1479 | struct parport_operations { |
1201 | size_t len, int flags); | 1480 | ... |
1202 | ... | 1481 | size_t (*epp_write_data) (struct parport *port, const void *buf, |
1203 | }; | 1482 | size_t len, int flags); |
1483 | ... | ||
1484 | }; | ||
1204 | 1485 | ||
1205 | DESCRIPTION | 1486 | DESCRIPTION |
1487 | ^^^^^^^^^^^ | ||
1206 | 1488 | ||
1207 | Writes data in EPP mode, and returns the number of bytes written. | 1489 | Writes data in EPP mode, and returns the number of bytes written. |
1208 | 1490 | ||
1209 | The 'flags' parameter may be one or more of the following, | 1491 | The ``flags`` parameter may be one or more of the following, |
1210 | bitwise-or'ed together: | 1492 | bitwise-or'ed together: |
1211 | 1493 | ||
1494 | ======================= ================================================= | ||
1212 | PARPORT_EPP_FAST Use fast transfers. Some chips provide 16-bit and | 1495 | PARPORT_EPP_FAST Use fast transfers. Some chips provide 16-bit and |
1213 | 32-bit registers. However, if a transfer | 1496 | 32-bit registers. However, if a transfer |
1214 | times out, the return value may be unreliable. | 1497 | times out, the return value may be unreliable. |
1498 | ======================= ================================================= | ||
1215 | 1499 | ||
1216 | SEE ALSO | 1500 | SEE ALSO |
1501 | ^^^^^^^^ | ||
1217 | 1502 | ||
1218 | epp_read_data, epp_write_addr, epp_read_addr | 1503 | epp_read_data, epp_write_addr, epp_read_addr |
1504 | |||
1505 | |||
1219 | 1506 | ||
1220 | port->ops->epp_read_data - read EPP data | 1507 | port->ops->epp_read_data - read EPP data |
1221 | ------------------------ | 1508 | ---------------------------------------- |
1222 | 1509 | ||
1223 | SYNOPSIS | 1510 | SYNOPSIS |
1511 | ^^^^^^^^ | ||
1224 | 1512 | ||
1225 | #include <linux/parport.h> | 1513 | :: |
1226 | 1514 | ||
1227 | struct parport_operations { | 1515 | #include <linux/parport.h> |
1228 | ... | 1516 | |
1229 | size_t (*epp_read_data) (struct parport *port, void *buf, | 1517 | struct parport_operations { |
1230 | size_t len, int flags); | 1518 | ... |
1231 | ... | 1519 | size_t (*epp_read_data) (struct parport *port, void *buf, |
1232 | }; | 1520 | size_t len, int flags); |
1521 | ... | ||
1522 | }; | ||
1233 | 1523 | ||
1234 | DESCRIPTION | 1524 | DESCRIPTION |
1525 | ^^^^^^^^^^^ | ||
1235 | 1526 | ||
1236 | Reads data in EPP mode, and returns the number of bytes read. | 1527 | Reads data in EPP mode, and returns the number of bytes read. |
1237 | 1528 | ||
1238 | The 'flags' parameter may be one or more of the following, | 1529 | The ``flags`` parameter may be one or more of the following, |
1239 | bitwise-or'ed together: | 1530 | bitwise-or'ed together: |
1240 | 1531 | ||
1532 | ======================= ================================================= | ||
1241 | PARPORT_EPP_FAST Use fast transfers. Some chips provide 16-bit and | 1533 | PARPORT_EPP_FAST Use fast transfers. Some chips provide 16-bit and |
1242 | 32-bit registers. However, if a transfer | 1534 | 32-bit registers. However, if a transfer |
1243 | times out, the return value may be unreliable. | 1535 | times out, the return value may be unreliable. |
1536 | ======================= ================================================= | ||
1244 | 1537 | ||
1245 | SEE ALSO | 1538 | SEE ALSO |
1539 | ^^^^^^^^ | ||
1246 | 1540 | ||
1247 | epp_write_data, epp_write_addr, epp_read_addr | 1541 | epp_write_data, epp_write_addr, epp_read_addr |
1248 | 1542 | ||
1543 | |||
1544 | |||
1249 | port->ops->epp_write_addr - write EPP address | 1545 | port->ops->epp_write_addr - write EPP address |
1250 | ------------------------- | 1546 | --------------------------------------------- |
1251 | 1547 | ||
1252 | SYNOPSIS | 1548 | SYNOPSIS |
1549 | ^^^^^^^^ | ||
1253 | 1550 | ||
1254 | #include <linux/parport.h> | 1551 | :: |
1255 | 1552 | ||
1256 | struct parport_operations { | 1553 | #include <linux/parport.h> |
1257 | ... | 1554 | |
1258 | size_t (*epp_write_addr) (struct parport *port, | 1555 | struct parport_operations { |
1259 | const void *buf, size_t len, int flags); | 1556 | ... |
1260 | ... | 1557 | size_t (*epp_write_addr) (struct parport *port, |
1261 | }; | 1558 | const void *buf, size_t len, int flags); |
1559 | ... | ||
1560 | }; | ||
1262 | 1561 | ||
1263 | DESCRIPTION | 1562 | DESCRIPTION |
1563 | ^^^^^^^^^^^ | ||
1264 | 1564 | ||
1265 | Writes EPP addresses (8 bits each), and returns the number written. | 1565 | Writes EPP addresses (8 bits each), and returns the number written. |
1266 | 1566 | ||
1267 | The 'flags' parameter may be one or more of the following, | 1567 | The ``flags`` parameter may be one or more of the following, |
1268 | bitwise-or'ed together: | 1568 | bitwise-or'ed together: |
1269 | 1569 | ||
1570 | ======================= ================================================= | ||
1270 | PARPORT_EPP_FAST Use fast transfers. Some chips provide 16-bit and | 1571 | PARPORT_EPP_FAST Use fast transfers. Some chips provide 16-bit and |
1271 | 32-bit registers. However, if a transfer | 1572 | 32-bit registers. However, if a transfer |
1272 | times out, the return value may be unreliable. | 1573 | times out, the return value may be unreliable. |
1574 | ======================= ================================================= | ||
1273 | 1575 | ||
1274 | (Does PARPORT_EPP_FAST make sense for this function?) | 1576 | (Does PARPORT_EPP_FAST make sense for this function?) |
1275 | 1577 | ||
1276 | SEE ALSO | 1578 | SEE ALSO |
1579 | ^^^^^^^^ | ||
1277 | 1580 | ||
1278 | epp_write_data, epp_read_data, epp_read_addr | 1581 | epp_write_data, epp_read_data, epp_read_addr |
1582 | |||
1583 | |||
1279 | 1584 | ||
1280 | port->ops->epp_read_addr - read EPP address | 1585 | port->ops->epp_read_addr - read EPP address |
1281 | ------------------------ | 1586 | ------------------------------------------- |
1282 | 1587 | ||
1283 | SYNOPSIS | 1588 | SYNOPSIS |
1589 | ^^^^^^^^ | ||
1284 | 1590 | ||
1285 | #include <linux/parport.h> | 1591 | :: |
1286 | 1592 | ||
1287 | struct parport_operations { | 1593 | #include <linux/parport.h> |
1288 | ... | 1594 | |
1289 | size_t (*epp_read_addr) (struct parport *port, void *buf, | 1595 | struct parport_operations { |
1290 | size_t len, int flags); | 1596 | ... |
1291 | ... | 1597 | size_t (*epp_read_addr) (struct parport *port, void *buf, |
1292 | }; | 1598 | size_t len, int flags); |
1599 | ... | ||
1600 | }; | ||
1293 | 1601 | ||
1294 | DESCRIPTION | 1602 | DESCRIPTION |
1603 | ^^^^^^^^^^^ | ||
1295 | 1604 | ||
1296 | Reads EPP addresses (8 bits each), and returns the number read. | 1605 | Reads EPP addresses (8 bits each), and returns the number read. |
1297 | 1606 | ||
1298 | The 'flags' parameter may be one or more of the following, | 1607 | The ``flags`` parameter may be one or more of the following, |
1299 | bitwise-or'ed together: | 1608 | bitwise-or'ed together: |
1300 | 1609 | ||
1610 | ======================= ================================================= | ||
1301 | PARPORT_EPP_FAST Use fast transfers. Some chips provide 16-bit and | 1611 | PARPORT_EPP_FAST Use fast transfers. Some chips provide 16-bit and |
1302 | 32-bit registers. However, if a transfer | 1612 | 32-bit registers. However, if a transfer |
1303 | times out, the return value may be unreliable. | 1613 | times out, the return value may be unreliable. |
1614 | ======================= ================================================= | ||
1304 | 1615 | ||
1305 | (Does PARPORT_EPP_FAST make sense for this function?) | 1616 | (Does PARPORT_EPP_FAST make sense for this function?) |
1306 | 1617 | ||
1307 | SEE ALSO | 1618 | SEE ALSO |
1619 | ^^^^^^^^ | ||
1308 | 1620 | ||
1309 | epp_write_data, epp_read_data, epp_write_addr | 1621 | epp_write_data, epp_read_data, epp_write_addr |
1622 | |||
1623 | |||
1310 | 1624 | ||
1311 | port->ops->ecp_write_data - write a block of ECP data | 1625 | port->ops->ecp_write_data - write a block of ECP data |
1312 | ------------------------- | 1626 | ----------------------------------------------------- |
1313 | 1627 | ||
1314 | SYNOPSIS | 1628 | SYNOPSIS |
1629 | ^^^^^^^^ | ||
1315 | 1630 | ||
1316 | #include <linux/parport.h> | 1631 | :: |
1317 | 1632 | ||
1318 | struct parport_operations { | 1633 | #include <linux/parport.h> |
1319 | ... | 1634 | |
1320 | size_t (*ecp_write_data) (struct parport *port, | 1635 | struct parport_operations { |
1321 | const void *buf, size_t len, int flags); | 1636 | ... |
1322 | ... | 1637 | size_t (*ecp_write_data) (struct parport *port, |
1323 | }; | 1638 | const void *buf, size_t len, int flags); |
1639 | ... | ||
1640 | }; | ||
1324 | 1641 | ||
1325 | DESCRIPTION | 1642 | DESCRIPTION |
1643 | ^^^^^^^^^^^ | ||
1326 | 1644 | ||
1327 | Writes a block of ECP data. The 'flags' parameter is ignored. | 1645 | Writes a block of ECP data. The ``flags`` parameter is ignored. |
1328 | 1646 | ||
1329 | RETURN VALUE | 1647 | RETURN VALUE |
1648 | ^^^^^^^^^^^^ | ||
1330 | 1649 | ||
1331 | The number of bytes written. | 1650 | The number of bytes written. |
1332 | 1651 | ||
1333 | SEE ALSO | 1652 | SEE ALSO |
1653 | ^^^^^^^^ | ||
1334 | 1654 | ||
1335 | ecp_read_data, ecp_write_addr | 1655 | ecp_read_data, ecp_write_addr |
1336 | 1656 | ||
1657 | |||
1658 | |||
1337 | port->ops->ecp_read_data - read a block of ECP data | 1659 | port->ops->ecp_read_data - read a block of ECP data |
1338 | ------------------------ | 1660 | --------------------------------------------------- |
1339 | 1661 | ||
1340 | SYNOPSIS | 1662 | SYNOPSIS |
1663 | ^^^^^^^^ | ||
1341 | 1664 | ||
1342 | #include <linux/parport.h> | 1665 | :: |
1343 | 1666 | ||
1344 | struct parport_operations { | 1667 | #include <linux/parport.h> |
1345 | ... | 1668 | |
1346 | size_t (*ecp_read_data) (struct parport *port, | 1669 | struct parport_operations { |
1347 | void *buf, size_t len, int flags); | 1670 | ... |
1348 | ... | 1671 | size_t (*ecp_read_data) (struct parport *port, |
1349 | }; | 1672 | void *buf, size_t len, int flags); |
1673 | ... | ||
1674 | }; | ||
1350 | 1675 | ||
1351 | DESCRIPTION | 1676 | DESCRIPTION |
1677 | ^^^^^^^^^^^ | ||
1352 | 1678 | ||
1353 | Reads a block of ECP data. The 'flags' parameter is ignored. | 1679 | Reads a block of ECP data. The ``flags`` parameter is ignored. |
1354 | 1680 | ||
1355 | RETURN VALUE | 1681 | RETURN VALUE |
1682 | ^^^^^^^^^^^^ | ||
1356 | 1683 | ||
1357 | The number of bytes read. NB. There may be more unread data in a | 1684 | The number of bytes read. NB. There may be more unread data in a |
1358 | FIFO. Is there a way of stunning the FIFO to prevent this? | 1685 | FIFO. Is there a way of stunning the FIFO to prevent this? |
1359 | 1686 | ||
1360 | SEE ALSO | 1687 | SEE ALSO |
1688 | ^^^^^^^^ | ||
1361 | 1689 | ||
1362 | ecp_write_data, ecp_write_addr | 1690 | ecp_write_data, ecp_write_addr |
1363 | 1691 | ||
1692 | |||
1693 | |||
1364 | port->ops->ecp_write_addr - write a block of ECP addresses | 1694 | port->ops->ecp_write_addr - write a block of ECP addresses |
1365 | ------------------------- | 1695 | ---------------------------------------------------------- |
1366 | 1696 | ||
1367 | SYNOPSIS | 1697 | SYNOPSIS |
1698 | ^^^^^^^^ | ||
1368 | 1699 | ||
1369 | #include <linux/parport.h> | 1700 | :: |
1370 | 1701 | ||
1371 | struct parport_operations { | 1702 | #include <linux/parport.h> |
1372 | ... | 1703 | |
1373 | size_t (*ecp_write_addr) (struct parport *port, | 1704 | struct parport_operations { |
1374 | const void *buf, size_t len, int flags); | 1705 | ... |
1375 | ... | 1706 | size_t (*ecp_write_addr) (struct parport *port, |
1376 | }; | 1707 | const void *buf, size_t len, int flags); |
1708 | ... | ||
1709 | }; | ||
1377 | 1710 | ||
1378 | DESCRIPTION | 1711 | DESCRIPTION |
1712 | ^^^^^^^^^^^ | ||
1379 | 1713 | ||
1380 | Writes a block of ECP addresses. The 'flags' parameter is ignored. | 1714 | Writes a block of ECP addresses. The ``flags`` parameter is ignored. |
1381 | 1715 | ||
1382 | RETURN VALUE | 1716 | RETURN VALUE |
1717 | ^^^^^^^^^^^^ | ||
1383 | 1718 | ||
1384 | The number of bytes written. | 1719 | The number of bytes written. |
1385 | 1720 | ||
1386 | NOTES | 1721 | NOTES |
1722 | ^^^^^ | ||
1387 | 1723 | ||
1388 | This may use a FIFO, and if so shall not return until the FIFO is empty. | 1724 | This may use a FIFO, and if so shall not return until the FIFO is empty. |
1389 | 1725 | ||
1390 | SEE ALSO | 1726 | SEE ALSO |
1727 | ^^^^^^^^ | ||
1391 | 1728 | ||
1392 | ecp_read_data, ecp_write_data | 1729 | ecp_read_data, ecp_write_data |
1393 | 1730 | ||
1731 | |||
1732 | |||
1394 | port->ops->nibble_read_data - read a block of data in nibble mode | 1733 | port->ops->nibble_read_data - read a block of data in nibble mode |
1395 | --------------------------- | 1734 | ----------------------------------------------------------------- |
1396 | 1735 | ||
1397 | SYNOPSIS | 1736 | SYNOPSIS |
1737 | ^^^^^^^^ | ||
1398 | 1738 | ||
1399 | #include <linux/parport.h> | 1739 | :: |
1400 | 1740 | ||
1401 | struct parport_operations { | 1741 | #include <linux/parport.h> |
1402 | ... | 1742 | |
1403 | size_t (*nibble_read_data) (struct parport *port, | 1743 | struct parport_operations { |
1404 | void *buf, size_t len, int flags); | 1744 | ... |
1405 | ... | 1745 | size_t (*nibble_read_data) (struct parport *port, |
1406 | }; | 1746 | void *buf, size_t len, int flags); |
1747 | ... | ||
1748 | }; | ||
1407 | 1749 | ||
1408 | DESCRIPTION | 1750 | DESCRIPTION |
1751 | ^^^^^^^^^^^ | ||
1409 | 1752 | ||
1410 | Reads a block of data in nibble mode. The 'flags' parameter is ignored. | 1753 | Reads a block of data in nibble mode. The ``flags`` parameter is ignored. |
1411 | 1754 | ||
1412 | RETURN VALUE | 1755 | RETURN VALUE |
1756 | ^^^^^^^^^^^^ | ||
1413 | 1757 | ||
1414 | The number of whole bytes read. | 1758 | The number of whole bytes read. |
1415 | 1759 | ||
1416 | SEE ALSO | 1760 | SEE ALSO |
1761 | ^^^^^^^^ | ||
1417 | 1762 | ||
1418 | byte_read_data, compat_write_data | 1763 | byte_read_data, compat_write_data |
1764 | |||
1765 | |||
1419 | 1766 | ||
1420 | port->ops->byte_read_data - read a block of data in byte mode | 1767 | port->ops->byte_read_data - read a block of data in byte mode |
1421 | ------------------------- | 1768 | ------------------------------------------------------------- |
1422 | 1769 | ||
1423 | SYNOPSIS | 1770 | SYNOPSIS |
1771 | ^^^^^^^^ | ||
1424 | 1772 | ||
1425 | #include <linux/parport.h> | 1773 | :: |
1426 | 1774 | ||
1427 | struct parport_operations { | 1775 | #include <linux/parport.h> |
1428 | ... | 1776 | |
1429 | size_t (*byte_read_data) (struct parport *port, | 1777 | struct parport_operations { |
1430 | void *buf, size_t len, int flags); | 1778 | ... |
1431 | ... | 1779 | size_t (*byte_read_data) (struct parport *port, |
1432 | }; | 1780 | void *buf, size_t len, int flags); |
1781 | ... | ||
1782 | }; | ||
1433 | 1783 | ||
1434 | DESCRIPTION | 1784 | DESCRIPTION |
1785 | ^^^^^^^^^^^ | ||
1435 | 1786 | ||
1436 | Reads a block of data in byte mode. The 'flags' parameter is ignored. | 1787 | Reads a block of data in byte mode. The ``flags`` parameter is ignored. |
1437 | 1788 | ||
1438 | RETURN VALUE | 1789 | RETURN VALUE |
1790 | ^^^^^^^^^^^^ | ||
1439 | 1791 | ||
1440 | The number of bytes read. | 1792 | The number of bytes read. |
1441 | 1793 | ||
1442 | SEE ALSO | 1794 | SEE ALSO |
1795 | ^^^^^^^^ | ||
1443 | 1796 | ||
1444 | nibble_read_data, compat_write_data | 1797 | nibble_read_data, compat_write_data |
1798 | |||
1799 | |||
1445 | 1800 | ||
1446 | port->ops->compat_write_data - write a block of data in compatibility mode | 1801 | port->ops->compat_write_data - write a block of data in compatibility mode |
1447 | ---------------------------- | 1802 | -------------------------------------------------------------------------- |
1448 | 1803 | ||
1449 | SYNOPSIS | 1804 | SYNOPSIS |
1805 | ^^^^^^^^ | ||
1450 | 1806 | ||
1451 | #include <linux/parport.h> | 1807 | :: |
1452 | 1808 | ||
1453 | struct parport_operations { | 1809 | #include <linux/parport.h> |
1454 | ... | 1810 | |
1455 | size_t (*compat_write_data) (struct parport *port, | 1811 | struct parport_operations { |
1456 | const void *buf, size_t len, int flags); | 1812 | ... |
1457 | ... | 1813 | size_t (*compat_write_data) (struct parport *port, |
1458 | }; | 1814 | const void *buf, size_t len, int flags); |
1815 | ... | ||
1816 | }; | ||
1459 | 1817 | ||
1460 | DESCRIPTION | 1818 | DESCRIPTION |
1819 | ^^^^^^^^^^^ | ||
1461 | 1820 | ||
1462 | Writes a block of data in compatibility mode. The 'flags' parameter | 1821 | Writes a block of data in compatibility mode. The ``flags`` parameter |
1463 | is ignored. | 1822 | is ignored. |
1464 | 1823 | ||
1465 | RETURN VALUE | 1824 | RETURN VALUE |
1825 | ^^^^^^^^^^^^ | ||
1466 | 1826 | ||
1467 | The number of bytes written. | 1827 | The number of bytes written. |
1468 | 1828 | ||
1469 | SEE ALSO | 1829 | SEE ALSO |
1830 | ^^^^^^^^ | ||
1470 | 1831 | ||
1471 | nibble_read_data, byte_read_data | 1832 | nibble_read_data, byte_read_data |
diff --git a/Documentation/percpu-rw-semaphore.txt b/Documentation/percpu-rw-semaphore.txt index 7d3c82431909..247de6410855 100644 --- a/Documentation/percpu-rw-semaphore.txt +++ b/Documentation/percpu-rw-semaphore.txt | |||
@@ -1,5 +1,6 @@ | |||
1 | ==================== | ||
1 | Percpu rw semaphores | 2 | Percpu rw semaphores |
2 | -------------------- | 3 | ==================== |
3 | 4 | ||
4 | Percpu rw semaphores is a new read-write semaphore design that is | 5 | Percpu rw semaphores is a new read-write semaphore design that is |
5 | optimized for locking for reading. | 6 | optimized for locking for reading. |
diff --git a/Documentation/phy.txt b/Documentation/phy.txt index 383cdd863f08..457c3e0f86d6 100644 --- a/Documentation/phy.txt +++ b/Documentation/phy.txt | |||
@@ -1,10 +1,14 @@ | |||
1 | PHY SUBSYSTEM | 1 | ============= |
2 | Kishon Vijay Abraham I <kishon@ti.com> | 2 | PHY subsystem |
3 | ============= | ||
4 | |||
5 | :Author: Kishon Vijay Abraham I <kishon@ti.com> | ||
3 | 6 | ||
4 | This document explains the Generic PHY Framework along with the APIs provided, | 7 | This document explains the Generic PHY Framework along with the APIs provided, |
5 | and how-to-use. | 8 | and how-to-use. |
6 | 9 | ||
7 | 1. Introduction | 10 | Introduction |
11 | ============ | ||
8 | 12 | ||
9 | *PHY* is the abbreviation for physical layer. It is used to connect a device | 13 | *PHY* is the abbreviation for physical layer. It is used to connect a device |
10 | to the physical medium e.g., the USB controller has a PHY to provide functions | 14 | to the physical medium e.g., the USB controller has a PHY to provide functions |
@@ -21,7 +25,8 @@ better code maintainability. | |||
21 | This framework will be of use only to devices that use external PHY (PHY | 25 | This framework will be of use only to devices that use external PHY (PHY |
22 | functionality is not embedded within the controller). | 26 | functionality is not embedded within the controller). |
23 | 27 | ||
24 | 2. Registering/Unregistering the PHY provider | 28 | Registering/Unregistering the PHY provider |
29 | ========================================== | ||
25 | 30 | ||
26 | PHY provider refers to an entity that implements one or more PHY instances. | 31 | PHY provider refers to an entity that implements one or more PHY instances. |
27 | For the simple case where the PHY provider implements only a single instance of | 32 | For the simple case where the PHY provider implements only a single instance of |
@@ -30,11 +35,14 @@ of_phy_simple_xlate. If the PHY provider implements multiple instances, it | |||
30 | should provide its own implementation of of_xlate. of_xlate is used only for | 35 | should provide its own implementation of of_xlate. of_xlate is used only for |
31 | dt boot case. | 36 | dt boot case. |
32 | 37 | ||
33 | #define of_phy_provider_register(dev, xlate) \ | 38 | :: |
34 | __of_phy_provider_register((dev), NULL, THIS_MODULE, (xlate)) | 39 | |
40 | #define of_phy_provider_register(dev, xlate) \ | ||
41 | __of_phy_provider_register((dev), NULL, THIS_MODULE, (xlate)) | ||
35 | 42 | ||
36 | #define devm_of_phy_provider_register(dev, xlate) \ | 43 | #define devm_of_phy_provider_register(dev, xlate) \ |
37 | __devm_of_phy_provider_register((dev), NULL, THIS_MODULE, (xlate)) | 44 | __devm_of_phy_provider_register((dev), NULL, THIS_MODULE, |
45 | (xlate)) | ||
38 | 46 | ||
39 | of_phy_provider_register and devm_of_phy_provider_register macros can be used to | 47 | of_phy_provider_register and devm_of_phy_provider_register macros can be used to |
40 | register the phy_provider and they take the device and of_xlate as | 48 | register the phy_provider and they take the device and of_xlate as |
@@ -47,28 +55,35 @@ nodes within extra levels for context and extensibility, in which case the low | |||
47 | level of_phy_provider_register_full() and devm_of_phy_provider_register_full() | 55 | level of_phy_provider_register_full() and devm_of_phy_provider_register_full() |
48 | macros can be used to override the node containing the children. | 56 | macros can be used to override the node containing the children. |
49 | 57 | ||
50 | #define of_phy_provider_register_full(dev, children, xlate) \ | 58 | :: |
51 | __of_phy_provider_register(dev, children, THIS_MODULE, xlate) | 59 | |
60 | #define of_phy_provider_register_full(dev, children, xlate) \ | ||
61 | __of_phy_provider_register(dev, children, THIS_MODULE, xlate) | ||
52 | 62 | ||
53 | #define devm_of_phy_provider_register_full(dev, children, xlate) \ | 63 | #define devm_of_phy_provider_register_full(dev, children, xlate) \ |
54 | __devm_of_phy_provider_register_full(dev, children, THIS_MODULE, xlate) | 64 | __devm_of_phy_provider_register_full(dev, children, |
65 | THIS_MODULE, xlate) | ||
55 | 66 | ||
56 | void devm_of_phy_provider_unregister(struct device *dev, | 67 | void devm_of_phy_provider_unregister(struct device *dev, |
57 | struct phy_provider *phy_provider); | 68 | struct phy_provider *phy_provider); |
58 | void of_phy_provider_unregister(struct phy_provider *phy_provider); | 69 | void of_phy_provider_unregister(struct phy_provider *phy_provider); |
59 | 70 | ||
60 | devm_of_phy_provider_unregister and of_phy_provider_unregister can be used to | 71 | devm_of_phy_provider_unregister and of_phy_provider_unregister can be used to |
61 | unregister the PHY provider. | 72 | unregister the PHY provider. |
62 | 73 | ||
63 | 3. Creating the PHY | 74 | Creating the PHY |
75 | ================ | ||
64 | 76 | ||
65 | The PHY driver should create the PHY in order for other peripheral controllers | 77 | The PHY driver should create the PHY in order for other peripheral controllers |
66 | to make use of it. The PHY framework provides 2 APIs to create the PHY. | 78 | to make use of it. The PHY framework provides 2 APIs to create the PHY. |
67 | 79 | ||
68 | struct phy *phy_create(struct device *dev, struct device_node *node, | 80 | :: |
69 | const struct phy_ops *ops); | 81 | |
70 | struct phy *devm_phy_create(struct device *dev, struct device_node *node, | 82 | struct phy *phy_create(struct device *dev, struct device_node *node, |
71 | const struct phy_ops *ops); | 83 | const struct phy_ops *ops); |
84 | struct phy *devm_phy_create(struct device *dev, | ||
85 | struct device_node *node, | ||
86 | const struct phy_ops *ops); | ||
72 | 87 | ||
73 | The PHY drivers can use one of the above 2 APIs to create the PHY by passing | 88 | The PHY drivers can use one of the above 2 APIs to create the PHY by passing |
74 | the device pointer and phy ops. | 89 | the device pointer and phy ops. |
@@ -84,12 +99,16 @@ phy_ops to get back the private data. | |||
84 | Before the controller can make use of the PHY, it has to get a reference to | 99 | Before the controller can make use of the PHY, it has to get a reference to |
85 | it. This framework provides the following APIs to get a reference to the PHY. | 100 | it. This framework provides the following APIs to get a reference to the PHY. |
86 | 101 | ||
87 | struct phy *phy_get(struct device *dev, const char *string); | 102 | :: |
88 | struct phy *phy_optional_get(struct device *dev, const char *string); | 103 | |
89 | struct phy *devm_phy_get(struct device *dev, const char *string); | 104 | struct phy *phy_get(struct device *dev, const char *string); |
90 | struct phy *devm_phy_optional_get(struct device *dev, const char *string); | 105 | struct phy *phy_optional_get(struct device *dev, const char *string); |
91 | struct phy *devm_of_phy_get_by_index(struct device *dev, struct device_node *np, | 106 | struct phy *devm_phy_get(struct device *dev, const char *string); |
92 | int index); | 107 | struct phy *devm_phy_optional_get(struct device *dev, |
108 | const char *string); | ||
109 | struct phy *devm_of_phy_get_by_index(struct device *dev, | ||
110 | struct device_node *np, | ||
111 | int index); | ||
93 | 112 | ||
94 | phy_get, phy_optional_get, devm_phy_get and devm_phy_optional_get can | 113 | phy_get, phy_optional_get, devm_phy_get and devm_phy_optional_get can |
95 | be used to get the PHY. In the case of dt boot, the string arguments | 114 | be used to get the PHY. In the case of dt boot, the string arguments |
@@ -111,30 +130,35 @@ the phy_init() and phy_exit() calls, and phy_power_on() and | |||
111 | phy_power_off() calls are all NOP when applied to a NULL phy. The NULL | 130 | phy_power_off() calls are all NOP when applied to a NULL phy. The NULL |
112 | phy is useful in devices for handling optional phy devices. | 131 | phy is useful in devices for handling optional phy devices. |
113 | 132 | ||
114 | 5. Releasing a reference to the PHY | 133 | Releasing a reference to the PHY |
134 | ================================ | ||
115 | 135 | ||
116 | When the controller no longer needs the PHY, it has to release the reference | 136 | When the controller no longer needs the PHY, it has to release the reference |
117 | to the PHY it has obtained using the APIs mentioned in the above section. The | 137 | to the PHY it has obtained using the APIs mentioned in the above section. The |
118 | PHY framework provides 2 APIs to release a reference to the PHY. | 138 | PHY framework provides 2 APIs to release a reference to the PHY. |
119 | 139 | ||
120 | void phy_put(struct phy *phy); | 140 | :: |
121 | void devm_phy_put(struct device *dev, struct phy *phy); | 141 | |
142 | void phy_put(struct phy *phy); | ||
143 | void devm_phy_put(struct device *dev, struct phy *phy); | ||
122 | 144 | ||
123 | Both these APIs are used to release a reference to the PHY and devm_phy_put | 145 | Both these APIs are used to release a reference to the PHY and devm_phy_put |
124 | destroys the devres associated with this PHY. | 146 | destroys the devres associated with this PHY. |
125 | 147 | ||
126 | 6. Destroying the PHY | 148 | Destroying the PHY |
149 | ================== | ||
127 | 150 | ||
128 | When the driver that created the PHY is unloaded, it should destroy the PHY it | 151 | When the driver that created the PHY is unloaded, it should destroy the PHY it |
129 | created using one of the following 2 APIs. | 152 | created using one of the following 2 APIs:: |
130 | 153 | ||
131 | void phy_destroy(struct phy *phy); | 154 | void phy_destroy(struct phy *phy); |
132 | void devm_phy_destroy(struct device *dev, struct phy *phy); | 155 | void devm_phy_destroy(struct device *dev, struct phy *phy); |
133 | 156 | ||
134 | Both these APIs destroy the PHY and devm_phy_destroy destroys the devres | 157 | Both these APIs destroy the PHY and devm_phy_destroy destroys the devres |
135 | associated with this PHY. | 158 | associated with this PHY. |
136 | 159 | ||
137 | 7. PM Runtime | 160 | PM Runtime |
161 | ========== | ||
138 | 162 | ||
139 | This subsystem is pm runtime enabled. So while creating the PHY, | 163 | This subsystem is pm runtime enabled. So while creating the PHY, |
140 | pm_runtime_enable of the phy device created by this subsystem is called and | 164 | pm_runtime_enable of the phy device created by this subsystem is called and |
@@ -150,7 +174,8 @@ There are exported APIs like phy_pm_runtime_get, phy_pm_runtime_get_sync, | |||
150 | phy_pm_runtime_put, phy_pm_runtime_put_sync, phy_pm_runtime_allow and | 174 | phy_pm_runtime_put, phy_pm_runtime_put_sync, phy_pm_runtime_allow and |
151 | phy_pm_runtime_forbid for performing PM operations. | 175 | phy_pm_runtime_forbid for performing PM operations. |
152 | 176 | ||
153 | 8. PHY Mappings | 177 | PHY Mappings |
178 | ============ | ||
154 | 179 | ||
155 | In order to get a reference to a PHY without help from DeviceTree, the framework | 180 | In order to get a reference to a PHY without help from DeviceTree, the framework |
156 | offers lookups which can be compared to clkdev that allow clk structures to be | 181 | offers lookups which can be compared to clkdev that allow clk structures to be |
@@ -158,12 +183,15 @@ bound to devices. A lookup can be made during runtime when a handle to | |||
158 | the struct phy already exists. | 183 | the struct phy already exists. |
159 | 184 | ||
160 | The framework offers the following API for registering and unregistering the | 185 | The framework offers the following API for registering and unregistering the |
161 | lookups. | 186 | lookups:: |
162 | 187 | ||
163 | int phy_create_lookup(struct phy *phy, const char *con_id, const char *dev_id); | 188 | int phy_create_lookup(struct phy *phy, const char *con_id, |
164 | void phy_remove_lookup(struct phy *phy, const char *con_id, const char *dev_id); | 189 | const char *dev_id); |
190 | void phy_remove_lookup(struct phy *phy, const char *con_id, | ||
191 | const char *dev_id); | ||
165 | 192 | ||
166 | 9. DeviceTree Binding | 193 | DeviceTree Binding |
194 | ================== | ||
167 | 195 | ||
168 | The documentation for PHY dt binding can be found at | 196 | The documentation for PHY dt binding can be found at |
169 | Documentation/devicetree/bindings/phy/phy-bindings.txt | 197 | Documentation/devicetree/bindings/phy/phy-bindings.txt |
diff --git a/Documentation/pi-futex.txt b/Documentation/pi-futex.txt index 9a5bc8651c29..aafddbee7377 100644 --- a/Documentation/pi-futex.txt +++ b/Documentation/pi-futex.txt | |||
@@ -1,5 +1,6 @@ | |||
1 | ====================== | ||
1 | Lightweight PI-futexes | 2 | Lightweight PI-futexes |
2 | ---------------------- | 3 | ====================== |
3 | 4 | ||
4 | We are calling them lightweight for 3 reasons: | 5 | We are calling them lightweight for 3 reasons: |
5 | 6 | ||
@@ -25,8 +26,8 @@ determinism and well-bound latencies. Even in the worst-case, PI will | |||
25 | improve the statistical distribution of locking related application | 26 | improve the statistical distribution of locking related application |
26 | delays. | 27 | delays. |
27 | 28 | ||
28 | The longer reply: | 29 | The longer reply |
29 | ----------------- | 30 | ---------------- |
30 | 31 | ||
31 | Firstly, sharing locks between multiple tasks is a common programming | 32 | Firstly, sharing locks between multiple tasks is a common programming |
32 | technique that often cannot be replaced with lockless algorithms. As we | 33 | technique that often cannot be replaced with lockless algorithms. As we |
@@ -71,8 +72,8 @@ deterministic execution of the high-prio task: any medium-priority task | |||
71 | could preempt the low-prio task while it holds the shared lock and | 72 | could preempt the low-prio task while it holds the shared lock and |
72 | executes the critical section, and could delay it indefinitely. | 73 | executes the critical section, and could delay it indefinitely. |
73 | 74 | ||
74 | Implementation: | 75 | Implementation |
75 | --------------- | 76 | -------------- |
76 | 77 | ||
77 | As mentioned before, the userspace fastpath of PI-enabled pthread | 78 | As mentioned before, the userspace fastpath of PI-enabled pthread |
78 | mutexes involves no kernel work at all - they behave quite similarly to | 79 | mutexes involves no kernel work at all - they behave quite similarly to |
@@ -83,8 +84,8 @@ entering the kernel. | |||
83 | 84 | ||
84 | To handle the slowpath, we have added two new futex ops: | 85 | To handle the slowpath, we have added two new futex ops: |
85 | 86 | ||
86 | FUTEX_LOCK_PI | 87 | - FUTEX_LOCK_PI |
87 | FUTEX_UNLOCK_PI | 88 | - FUTEX_UNLOCK_PI |
88 | 89 | ||
89 | If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to | 90 | If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to |
90 | TID fails], then FUTEX_LOCK_PI is called. The kernel does all the | 91 | TID fails], then FUTEX_LOCK_PI is called. The kernel does all the |
diff --git a/Documentation/pnp.txt b/Documentation/pnp.txt index 763e4659bf18..bab2d10631f0 100644 --- a/Documentation/pnp.txt +++ b/Documentation/pnp.txt | |||
@@ -1,98 +1,118 @@ | |||
1 | ================================= | ||
1 | Linux Plug and Play Documentation | 2 | Linux Plug and Play Documentation |
2 | by Adam Belay <ambx1@neo.rr.com> | 3 | ================================= |
3 | last updated: Oct. 16, 2002 | ||
4 | --------------------------------------------------------------------------------------- | ||
5 | 4 | ||
5 | :Author: Adam Belay <ambx1@neo.rr.com> | ||
6 | :Last updated: Oct. 16, 2002 | ||
6 | 7 | ||
7 | 8 | ||
8 | Overview | 9 | Overview |
9 | -------- | 10 | -------- |
10 | Plug and Play provides a means of detecting and setting resources for legacy or | 11 | |
12 | Plug and Play provides a means of detecting and setting resources for legacy or | ||
11 | otherwise unconfigurable devices. The Linux Plug and Play Layer provides these | 13 | otherwise unconfigurable devices. The Linux Plug and Play Layer provides these |
12 | services to compatible drivers. | 14 | services to compatible drivers. |
13 | 15 | ||
14 | 16 | ||
15 | |||
16 | The User Interface | 17 | The User Interface |
17 | ------------------ | 18 | ------------------ |
18 | The Linux Plug and Play user interface provides a means to activate PnP devices | 19 | |
20 | The Linux Plug and Play user interface provides a means to activate PnP devices | ||
19 | for legacy and user level drivers that do not support Linux Plug and Play. The | 21 | for legacy and user level drivers that do not support Linux Plug and Play. The |
20 | user interface is integrated into sysfs. | 22 | user interface is integrated into sysfs. |
21 | 23 | ||
22 | In addition to the standard sysfs file the following are created in each | 24 | In addition to the standard sysfs file the following are created in each |
23 | device's directory: | 25 | device's directory: |
24 | id - displays a list of supported EISA IDs | 26 | - id - displays a list of supported EISA IDs |
25 | options - displays possible resource configurations | 27 | - options - displays possible resource configurations |
26 | resources - displays currently allocated resources and allows resource changes | 28 | - resources - displays currently allocated resources and allows resource changes |
27 | 29 | ||
28 | -activating a device | 30 | activating a device |
31 | ^^^^^^^^^^^^^^^^^^^ | ||
29 | 32 | ||
30 | #echo "auto" > resources | 33 | :: |
34 | |||
35 | # echo "auto" > resources | ||
31 | 36 | ||
32 | this will invoke the automatic resource config system to activate the device | 37 | this will invoke the automatic resource config system to activate the device |
33 | 38 | ||
34 | -manually activating a device | 39 | manually activating a device |
40 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
41 | |||
42 | :: | ||
43 | |||
44 | # echo "manual <depnum> <mode>" > resources | ||
35 | 45 | ||
36 | #echo "manual <depnum> <mode>" > resources | 46 | <depnum> - the configuration number |
37 | <depnum> - the configuration number | 47 | <mode> - static or dynamic |
38 | <mode> - static or dynamic | 48 | static = for next boot |
39 | static = for next boot | 49 | dynamic = now |
40 | dynamic = now | ||
41 | 50 | ||
42 | -disabling a device | 51 | disabling a device |
52 | ^^^^^^^^^^^^^^^^^^ | ||
43 | 53 | ||
44 | #echo "disable" > resources | 54 | :: |
55 | |||
56 | # echo "disable" > resources | ||
45 | 57 | ||
46 | 58 | ||
47 | EXAMPLE: | 59 | EXAMPLE: |
48 | 60 | ||
49 | Suppose you need to activate the floppy disk controller. | 61 | Suppose you need to activate the floppy disk controller. |
50 | 1.) change to the proper directory, in my case it is | 62 | |
51 | /driver/bus/pnp/devices/00:0f | 63 | 1. change to the proper directory, in my case it is |
52 | # cd /driver/bus/pnp/devices/00:0f | 64 | /driver/bus/pnp/devices/00:0f:: |
53 | # cat name | 65 | |
54 | PC standard floppy disk controller | 66 | # cd /driver/bus/pnp/devices/00:0f |
55 | 67 | # cat name | |
56 | 2.) check if the device is already active | 68 | PC standard floppy disk controller |
57 | # cat resources | 69 | |
58 | DISABLED | 70 | 2. check if the device is already active:: |
59 | 71 | ||
60 | - Notice the string "DISABLED". This means the device is not active. | 72 | # cat resources |
61 | 73 | DISABLED | |
62 | 3.) check the device's possible configurations (optional) | 74 | |
63 | # cat options | 75 | - Notice the string "DISABLED". This means the device is not active. |
64 | Dependent: 01 - Priority acceptable | 76 | |
65 | port 0x3f0-0x3f0, align 0x7, size 0x6, 16-bit address decoding | 77 | 3. check the device's possible configurations (optional):: |
66 | port 0x3f7-0x3f7, align 0x0, size 0x1, 16-bit address decoding | 78 | |
67 | irq 6 | 79 | # cat options |
68 | dma 2 8-bit compatible | 80 | Dependent: 01 - Priority acceptable |
69 | Dependent: 02 - Priority acceptable | 81 | port 0x3f0-0x3f0, align 0x7, size 0x6, 16-bit address decoding |
70 | port 0x370-0x370, align 0x7, size 0x6, 16-bit address decoding | 82 | port 0x3f7-0x3f7, align 0x0, size 0x1, 16-bit address decoding |
71 | port 0x377-0x377, align 0x0, size 0x1, 16-bit address decoding | 83 | irq 6 |
72 | irq 6 | 84 | dma 2 8-bit compatible |
73 | dma 2 8-bit compatible | 85 | Dependent: 02 - Priority acceptable |
74 | 86 | port 0x370-0x370, align 0x7, size 0x6, 16-bit address decoding | |
75 | 4.) now activate the device | 87 | port 0x377-0x377, align 0x0, size 0x1, 16-bit address decoding |
76 | # echo "auto" > resources | 88 | irq 6 |
77 | 89 | dma 2 8-bit compatible | |
78 | 5.) finally check if the device is active | 90 | |
79 | # cat resources | 91 | 4. now activate the device:: |
80 | io 0x3f0-0x3f5 | 92 | |
81 | io 0x3f7-0x3f7 | 93 | # echo "auto" > resources |
82 | irq 6 | 94 | |
83 | dma 2 | 95 | 5. finally check if the device is active:: |
84 | 96 | ||
85 | also there are a series of kernel parameters: | 97 | # cat resources |
86 | pnp_reserve_irq=irq1[,irq2] .... | 98 | io 0x3f0-0x3f5 |
87 | pnp_reserve_dma=dma1[,dma2] .... | 99 | io 0x3f7-0x3f7 |
88 | pnp_reserve_io=io1,size1[,io2,size2] .... | 100 | irq 6 |
89 | pnp_reserve_mem=mem1,size1[,mem2,size2] .... | 101 | dma 2 |
102 | |||
103 | also there are a series of kernel parameters:: | ||
104 | |||
105 | pnp_reserve_irq=irq1[,irq2] .... | ||
106 | pnp_reserve_dma=dma1[,dma2] .... | ||
107 | pnp_reserve_io=io1,size1[,io2,size2] .... | ||
108 | pnp_reserve_mem=mem1,size1[,mem2,size2] .... | ||
90 | 109 | ||
91 | 110 | ||
92 | 111 | ||
93 | The Unified Plug and Play Layer | 112 | The Unified Plug and Play Layer |
94 | ------------------------------- | 113 | ------------------------------- |
95 | All Plug and Play drivers, protocols, and services meet at a central location | 114 | |
115 | All Plug and Play drivers, protocols, and services meet at a central location | ||
96 | called the Plug and Play Layer. This layer is responsible for the exchange of | 116 | called the Plug and Play Layer. This layer is responsible for the exchange of |
97 | information between PnP drivers and PnP protocols. Thus it automatically | 117 | information between PnP drivers and PnP protocols. Thus it automatically |
98 | forwards commands to the proper protocol. This makes writing PnP drivers | 118 | forwards commands to the proper protocol. This makes writing PnP drivers |
@@ -101,64 +121,73 @@ significantly easier. | |||
101 | The following functions are available from the Plug and Play Layer: | 121 | The following functions are available from the Plug and Play Layer: |
102 | 122 | ||
103 | pnp_get_protocol | 123 | pnp_get_protocol |
104 | - increments the number of uses by one | 124 | increments the number of uses by one |
105 | 125 | ||
106 | pnp_put_protocol | 126 | pnp_put_protocol |
107 | - decrements the number of uses by one | 127 | decrements the number of uses by one |
108 | 128 | ||
109 | pnp_register_protocol | 129 | pnp_register_protocol |
110 | - use this to register a new PnP protocol | 130 | use this to register a new PnP protocol |
111 | 131 | ||
112 | pnp_unregister_protocol | 132 | pnp_unregister_protocol |
113 | - use this function to remove a PnP protocol from the Plug and Play Layer | 133 | use this function to remove a PnP protocol from the Plug and Play Layer |
114 | 134 | ||
115 | pnp_register_driver | 135 | pnp_register_driver |
116 | - adds a PnP driver to the Plug and Play Layer | 136 | adds a PnP driver to the Plug and Play Layer |
117 | - this includes driver model integration | 137 | |
118 | - returns zero for success or a negative error number for failure; count | 138 | this includes driver model integration |
139 | returns zero for success or a negative error number for failure; count | ||
119 | calls to the .add() method if you need to know how many devices bind to | 140 | calls to the .add() method if you need to know how many devices bind to |
120 | the driver | 141 | the driver |
121 | 142 | ||
122 | pnp_unregister_driver | 143 | pnp_unregister_driver |
123 | - removes a PnP driver from the Plug and Play Layer | 144 | removes a PnP driver from the Plug and Play Layer |
124 | 145 | ||
125 | 146 | ||
126 | 147 | ||
127 | Plug and Play Protocols | 148 | Plug and Play Protocols |
128 | ----------------------- | 149 | ----------------------- |
129 | This section contains information for PnP protocol developers. | 150 | |
151 | This section contains information for PnP protocol developers. | ||
130 | 152 | ||
131 | The following Protocols are currently available in the computing world: | 153 | The following Protocols are currently available in the computing world: |
132 | - PNPBIOS: used for system devices such as serial and parallel ports. | 154 | |
133 | - ISAPNP: provides PnP support for the ISA bus | 155 | - PNPBIOS: |
134 | - ACPI: among its many uses, ACPI provides information about system level | 156 | used for system devices such as serial and parallel ports. |
135 | devices. | 157 | - ISAPNP: |
158 | provides PnP support for the ISA bus | ||
159 | - ACPI: | ||
160 | among its many uses, ACPI provides information about system level | ||
161 | devices. | ||
162 | |||
136 | It is meant to replace the PNPBIOS. It is not currently supported by Linux | 163 | It is meant to replace the PNPBIOS. It is not currently supported by Linux |
137 | Plug and Play but it is planned to be in the near future. | 164 | Plug and Play but it is planned to be in the near future. |
138 | 165 | ||
139 | 166 | ||
140 | Requirements for a Linux PnP protocol: | 167 | Requirements for a Linux PnP protocol: |
141 | 1.) the protocol must use EISA IDs | 168 | 1. the protocol must use EISA IDs |
142 | 2.) the protocol must inform the PnP Layer of a device's current configuration | 169 | 2. the protocol must inform the PnP Layer of a device's current configuration |
170 | |||
143 | - the ability to set resources is optional but preferred. | 171 | - the ability to set resources is optional but preferred. |
144 | 172 | ||
145 | The following are PnP protocol related functions: | 173 | The following are PnP protocol related functions: |
146 | 174 | ||
147 | pnp_add_device | 175 | pnp_add_device |
148 | - use this function to add a PnP device to the PnP layer | 176 | use this function to add a PnP device to the PnP layer |
149 | - only call this function when all wanted values are set in the pnp_dev | 177 | |
150 | structure | 178 | only call this function when all wanted values are set in the pnp_dev |
179 | structure | ||
151 | 180 | ||
152 | pnp_init_device | 181 | pnp_init_device |
153 | - call this to initialize the PnP structure | 182 | call this to initialize the PnP structure |
154 | 183 | ||
155 | pnp_remove_device | 184 | pnp_remove_device |
156 | - call this to remove a device from the Plug and Play Layer. | 185 | call this to remove a device from the Plug and Play Layer. |
157 | - it will fail if the device is still in use. | 186 | it will fail if the device is still in use. |
158 | - automatically frees memory used by the device and related structures | 187 | automatically frees memory used by the device and related structures |
159 | 188 | ||
160 | pnp_add_id | 189 | pnp_add_id |
161 | - adds an EISA ID to the list of supported IDs for the specified device | 190 | adds an EISA ID to the list of supported IDs for the specified device |
162 | 191 | ||
163 | For more information consult the source of a protocol such as | 192 | For more information consult the source of a protocol such as |
164 | /drivers/pnp/pnpbios/core.c. | 193 | /drivers/pnp/pnpbios/core.c. |
@@ -167,85 +196,97 @@ For more information consult the source of a protocol such as | |||
167 | 196 | ||
168 | Linux Plug and Play Drivers | 197 | Linux Plug and Play Drivers |
169 | --------------------------- | 198 | --------------------------- |
170 | This section contains information for Linux PnP driver developers. | 199 | |
200 | This section contains information for Linux PnP driver developers. | ||
171 | 201 | ||
172 | The New Way | 202 | The New Way |
173 | ........... | 203 | ^^^^^^^^^^^ |
174 | 1.) first make a list of supported EISA IDS | 204 | |
175 | ex: | 205 | 1. first make a list of supported EISA IDS |
176 | static const struct pnp_id pnp_dev_table[] = { | 206 | |
177 | /* Standard LPT Printer Port */ | 207 | ex:: |
178 | {.id = "PNP0400", .driver_data = 0}, | 208 | |
179 | /* ECP Printer Port */ | 209 | static const struct pnp_id pnp_dev_table[] = { |
180 | {.id = "PNP0401", .driver_data = 0}, | 210 | /* Standard LPT Printer Port */ |
181 | {.id = ""} | 211 | {.id = "PNP0400", .driver_data = 0}, |
182 | }; | 212 | /* ECP Printer Port */ |
183 | 213 | {.id = "PNP0401", .driver_data = 0}, | |
184 | Please note that the character 'X' can be used as a wild card in the function | 214 | {.id = ""} |
185 | portion (last four characters). | 215 | }; |
186 | ex: | 216 | |
217 | Please note that the character 'X' can be used as a wild card in the function | ||
218 | portion (last four characters). | ||
219 | |||
220 | ex:: | ||
221 | |||
187 | /* Unknown PnP modems */ | 222 | /* Unknown PnP modems */ |
188 | { "PNPCXXX", UNKNOWN_DEV }, | 223 | { "PNPCXXX", UNKNOWN_DEV }, |
189 | 224 | ||
190 | Supported PnP card IDs can optionally be defined. | 225 | Supported PnP card IDs can optionally be defined. |
191 | ex: | 226 | ex:: |
192 | static const struct pnp_id pnp_card_table[] = { | 227 | |
193 | { "ANYDEVS", 0 }, | 228 | static const struct pnp_id pnp_card_table[] = { |
194 | { "", 0 } | 229 | { "ANYDEVS", 0 }, |
195 | }; | 230 | { "", 0 } |
196 | 231 | }; | |
197 | 2.) Optionally define probe and remove functions. It may make sense not to | 232 | |
198 | define these functions if the driver already has a reliable method of detecting | 233 | 2. Optionally define probe and remove functions. It may make sense not to |
199 | the resources, such as the parport_pc driver. | 234 | define these functions if the driver already has a reliable method of detecting |
200 | ex: | 235 | the resources, such as the parport_pc driver. |
201 | static int | 236 | |
202 | serial_pnp_probe(struct pnp_dev * dev, const struct pnp_id *card_id, const | 237 | ex:: |
203 | struct pnp_id *dev_id) | 238 | |
204 | { | 239 | static int |
205 | . . . | 240 | serial_pnp_probe(struct pnp_dev * dev, const struct pnp_id *card_id, const |
206 | 241 | struct pnp_id *dev_id) | |
207 | ex: | 242 | { |
208 | static void serial_pnp_remove(struct pnp_dev * dev) | 243 | . . . |
209 | { | 244 | |
210 | . . . | 245 | ex:: |
211 | 246 | ||
212 | consult /drivers/serial/8250_pnp.c for more information. | 247 | static void serial_pnp_remove(struct pnp_dev * dev) |
213 | 248 | { | |
214 | 3.) create a driver structure | 249 | . . . |
215 | ex: | 250 | |
216 | 251 | consult /drivers/serial/8250_pnp.c for more information. | |
217 | static struct pnp_driver serial_pnp_driver = { | 252 | |
218 | .name = "serial", | 253 | 3. create a driver structure |
219 | .card_id_table = pnp_card_table, | 254 | |
220 | .id_table = pnp_dev_table, | 255 | ex:: |
221 | .probe = serial_pnp_probe, | 256 | |
222 | .remove = serial_pnp_remove, | 257 | static struct pnp_driver serial_pnp_driver = { |
223 | }; | 258 | .name = "serial", |
224 | 259 | .card_id_table = pnp_card_table, | |
225 | * name and id_table cannot be NULL. | 260 | .id_table = pnp_dev_table, |
226 | 261 | .probe = serial_pnp_probe, | |
227 | 4.) register the driver | 262 | .remove = serial_pnp_remove, |
228 | ex: | 263 | }; |
229 | 264 | ||
230 | static int __init serial8250_pnp_init(void) | 265 | * name and id_table cannot be NULL. |
231 | { | 266 | |
232 | return pnp_register_driver(&serial_pnp_driver); | 267 | 4. register the driver |
233 | } | 268 | |
269 | ex:: | ||
270 | |||
271 | static int __init serial8250_pnp_init(void) | ||
272 | { | ||
273 | return pnp_register_driver(&serial_pnp_driver); | ||
274 | } | ||
234 | 275 | ||
235 | The Old Way | 276 | The Old Way |
236 | ........... | 277 | ^^^^^^^^^^^ |
237 | 278 | ||
238 | A series of compatibility functions have been created to make it easy to convert | 279 | A series of compatibility functions have been created to make it easy to convert |
239 | ISAPNP drivers. They should serve as a temporary solution only. | 280 | ISAPNP drivers. They should serve as a temporary solution only. |
240 | 281 | ||
241 | They are as follows: | 282 | They are as follows:: |
242 | 283 | ||
243 | struct pnp_card *pnp_find_card(unsigned short vendor, | 284 | struct pnp_card *pnp_find_card(unsigned short vendor, |
244 | unsigned short device, | 285 | unsigned short device, |
245 | struct pnp_card *from) | 286 | struct pnp_card *from) |
246 | 287 | ||
247 | struct pnp_dev *pnp_find_dev(struct pnp_card *card, | 288 | struct pnp_dev *pnp_find_dev(struct pnp_card *card, |
248 | unsigned short vendor, | 289 | unsigned short vendor, |
249 | unsigned short function, | 290 | unsigned short function, |
250 | struct pnp_dev *from) | 291 | struct pnp_dev *from) |
251 | 292 | ||
diff --git a/Documentation/preempt-locking.txt b/Documentation/preempt-locking.txt index e89ce6624af2..c945062be66c 100644 --- a/Documentation/preempt-locking.txt +++ b/Documentation/preempt-locking.txt | |||
@@ -1,10 +1,13 @@ | |||
1 | Proper Locking Under a Preemptible Kernel: | 1 | =========================================================================== |
2 | Keeping Kernel Code Preempt-Safe | 2 | Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe |
3 | Robert Love <rml@tech9.net> | 3 | =========================================================================== |
4 | Last Updated: 28 Aug 2002 | ||
5 | 4 | ||
5 | :Author: Robert Love <rml@tech9.net> | ||
6 | :Last Updated: 28 Aug 2002 | ||
6 | 7 | ||
7 | INTRODUCTION | 8 | |
9 | Introduction | ||
10 | ============ | ||
8 | 11 | ||
9 | 12 | ||
10 | A preemptible kernel creates new locking issues. The issues are the same as | 13 | A preemptible kernel creates new locking issues. The issues are the same as |
@@ -17,9 +20,10 @@ requires protecting these situations. | |||
17 | 20 | ||
18 | 21 | ||
19 | RULE #1: Per-CPU data structures need explicit protection | 22 | RULE #1: Per-CPU data structures need explicit protection |
23 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
20 | 24 | ||
21 | 25 | ||
22 | Two similar problems arise. An example code snippet: | 26 | Two similar problems arise. An example code snippet:: |
23 | 27 | ||
24 | struct this_needs_locking tux[NR_CPUS]; | 28 | struct this_needs_locking tux[NR_CPUS]; |
25 | tux[smp_processor_id()] = some_value; | 29 | tux[smp_processor_id()] = some_value; |
@@ -35,6 +39,7 @@ You can also use put_cpu() and get_cpu(), which will disable preemption. | |||
35 | 39 | ||
36 | 40 | ||
37 | RULE #2: CPU state must be protected. | 41 | RULE #2: CPU state must be protected. |
42 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
38 | 43 | ||
39 | 44 | ||
40 | Under preemption, the state of the CPU must be protected. This is arch- | 45 | Under preemption, the state of the CPU must be protected. This is arch- |
@@ -52,6 +57,7 @@ However, fpu__restore() must be called with preemption disabled. | |||
52 | 57 | ||
53 | 58 | ||
54 | RULE #3: Lock acquire and release must be performed by same task | 59 | RULE #3: Lock acquire and release must be performed by same task |
60 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
55 | 61 | ||
56 | 62 | ||
57 | A lock acquired in one task must be released by the same task. This | 63 | A lock acquired in one task must be released by the same task. This |
@@ -61,17 +67,20 @@ like this, acquire and release the task in the same code path and | |||
61 | have the caller wait on an event by the other task. | 67 | have the caller wait on an event by the other task. |
62 | 68 | ||
63 | 69 | ||
64 | SOLUTION | 70 | Solution |
71 | ======== | ||
65 | 72 | ||
66 | 73 | ||
67 | Data protection under preemption is achieved by disabling preemption for the | 74 | Data protection under preemption is achieved by disabling preemption for the |
68 | duration of the critical region. | 75 | duration of the critical region. |
69 | 76 | ||
70 | preempt_enable() decrement the preempt counter | 77 | :: |
71 | preempt_disable() increment the preempt counter | 78 | |
72 | preempt_enable_no_resched() decrement, but do not immediately preempt | 79 | preempt_enable() decrement the preempt counter |
73 | preempt_check_resched() if needed, reschedule | 80 | preempt_disable() increment the preempt counter |
74 | preempt_count() return the preempt counter | 81 | preempt_enable_no_resched() decrement, but do not immediately preempt |
82 | preempt_check_resched() if needed, reschedule | ||
83 | preempt_count() return the preempt counter | ||
75 | 84 | ||
76 | The functions are nestable. In other words, you can call preempt_disable | 85 | The functions are nestable. In other words, you can call preempt_disable |
77 | n-times in a code path, and preemption will not be reenabled until the n-th | 86 | n-times in a code path, and preemption will not be reenabled until the n-th |
@@ -89,7 +98,7 @@ So use this implicit preemption-disabling property only if you know that the | |||
89 | affected codepath does not do any of this. Best policy is to use this only for | 98 | affected codepath does not do any of this. Best policy is to use this only for |
90 | small, atomic code that you wrote and which calls no complex functions. | 99 | small, atomic code that you wrote and which calls no complex functions. |
91 | 100 | ||
92 | Example: | 101 | Example:: |
93 | 102 | ||
94 | cpucache_t *cc; /* this is per-CPU */ | 103 | cpucache_t *cc; /* this is per-CPU */ |
95 | preempt_disable(); | 104 | preempt_disable(); |
@@ -102,7 +111,7 @@ Example: | |||
102 | return 0; | 111 | return 0; |
103 | 112 | ||
104 | Notice how the preemption statements must encompass every reference of the | 113 | Notice how the preemption statements must encompass every reference of the |
105 | critical variables. Another example: | 114 | critical variables. Another example:: |
106 | 115 | ||
107 | int buf[NR_CPUS]; | 116 | int buf[NR_CPUS]; |
108 | set_cpu_val(buf); | 117 | set_cpu_val(buf); |
@@ -114,7 +123,8 @@ This code is not preempt-safe, but see how easily we can fix it by simply | |||
114 | moving the spin_lock up two lines. | 123 | moving the spin_lock up two lines. |
115 | 124 | ||
116 | 125 | ||
117 | PREVENTING PREEMPTION USING INTERRUPT DISABLING | 126 | Preventing preemption using interrupt disabling |
127 | =============================================== | ||
118 | 128 | ||
119 | 129 | ||
120 | It is possible to prevent a preemption event using local_irq_disable and | 130 | It is possible to prevent a preemption event using local_irq_disable and |
diff --git a/Documentation/printk-formats.txt b/Documentation/printk-formats.txt index 619cdffa5d44..65ea5915178b 100644 --- a/Documentation/printk-formats.txt +++ b/Documentation/printk-formats.txt | |||
@@ -1,5 +1,18 @@ | |||
1 | If variable is of Type, use printk format specifier: | 1 | ========================================= |
2 | --------------------------------------------------------- | 2 | How to get printk format specifiers right |
3 | ========================================= | ||
4 | |||
5 | :Author: Randy Dunlap <rdunlap@infradead.org> | ||
6 | :Author: Andrew Murray <amurray@mpc-data.co.uk> | ||
7 | |||
8 | |||
9 | Integer types | ||
10 | ============= | ||
11 | |||
12 | :: | ||
13 | |||
14 | If variable is of Type, use printk format specifier: | ||
15 | ------------------------------------------------------------ | ||
3 | int %d or %x | 16 | int %d or %x |
4 | unsigned int %u or %x | 17 | unsigned int %u or %x |
5 | long %ld or %lx | 18 | long %ld or %lx |
@@ -13,25 +26,29 @@ If variable is of Type, use printk format specifier: | |||
13 | s64 %lld or %llx | 26 | s64 %lld or %llx |
14 | u64 %llu or %llx | 27 | u64 %llu or %llx |
15 | 28 | ||
16 | If <type> is dependent on a config option for its size (e.g., sector_t, | 29 | If <type> is dependent on a config option for its size (e.g., ``sector_t``, |
17 | blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a | 30 | ``blkcnt_t``) or is architecture-dependent for its size (e.g., ``tcflag_t``), |
18 | format specifier of its largest possible type and explicitly cast to it. | 31 | use a format specifier of its largest possible type and explicitly cast to it. |
19 | Example: | 32 | |
33 | Example:: | ||
20 | 34 | ||
21 | printk("test: sector number/total blocks: %llu/%llu\n", | 35 | printk("test: sector number/total blocks: %llu/%llu\n", |
22 | (unsigned long long)sector, (unsigned long long)blockcount); | 36 | (unsigned long long)sector, (unsigned long long)blockcount); |
23 | 37 | ||
24 | Reminder: sizeof() result is of type size_t. | 38 | Reminder: ``sizeof()`` result is of type ``size_t``. |
25 | 39 | ||
26 | The kernel's printf does not support %n. For obvious reasons, floating | 40 | The kernel's printf does not support ``%n``. For obvious reasons, floating |
27 | point formats (%e, %f, %g, %a) are also not recognized. Use of any | 41 | point formats (``%e, %f, %g, %a``) are also not recognized. Use of any |
28 | unsupported specifier or length qualifier results in a WARN and early | 42 | unsupported specifier or length qualifier results in a WARN and early |
29 | return from vsnprintf. | 43 | return from vsnprintf. |
30 | 44 | ||
31 | Raw pointer value SHOULD be printed with %p. The kernel supports | 45 | Raw pointer value SHOULD be printed with %p. The kernel supports |
32 | the following extended format specifiers for pointer types: | 46 | the following extended format specifiers for pointer types: |
33 | 47 | ||
34 | Symbols/Function Pointers: | 48 | Symbols/Function Pointers |
49 | ========================= | ||
50 | |||
51 | :: | ||
35 | 52 | ||
36 | %pF versatile_init+0x0/0x110 | 53 | %pF versatile_init+0x0/0x110 |
37 | %pf versatile_init | 54 | %pf versatile_init |
@@ -41,99 +58,122 @@ Symbols/Function Pointers: | |||
41 | %ps versatile_init | 58 | %ps versatile_init |
42 | %pB prev_fn_of_versatile_init+0x88/0x88 | 59 | %pB prev_fn_of_versatile_init+0x88/0x88 |
43 | 60 | ||
44 | For printing symbols and function pointers. The 'S' and 's' specifiers | 61 | For printing symbols and function pointers. The ``S`` and ``s`` specifiers |
45 | result in the symbol name with ('S') or without ('s') offsets. Where | 62 | result in the symbol name with (``S``) or without (``s``) offsets. Where |
46 | this is used on a kernel without KALLSYMS - the symbol address is | 63 | this is used on a kernel without KALLSYMS - the symbol address is |
47 | printed instead. | 64 | printed instead. |
65 | |||
66 | The ``B`` specifier results in the symbol name with offsets and should be | ||
67 | used when printing stack backtraces. The specifier takes into | ||
68 | consideration the effect of compiler optimisations which may occur | ||
69 | when tail-calls are used and marked with the noreturn GCC attribute. | ||
48 | 70 | ||
49 | The 'B' specifier results in the symbol name with offsets and should be | 71 | On ia64, ppc64 and parisc64 architectures function pointers are |
50 | used when printing stack backtraces. The specifier takes into | 72 | actually function descriptors which must first be resolved. The ``F`` and |
51 | consideration the effect of compiler optimisations which may occur | 73 | ``f`` specifiers perform this resolution and then provide the same |
52 | when tail-call's are used and marked with the noreturn GCC attribute. | 74 | functionality as the ``S`` and ``s`` specifiers. |
53 | 75 | ||
54 | On ia64, ppc64 and parisc64 architectures function pointers are | 76 | Kernel Pointers |
55 | actually function descriptors which must first be resolved. The 'F' and | 77 | =============== |
56 | 'f' specifiers perform this resolution and then provide the same | ||
57 | functionality as the 'S' and 's' specifiers. | ||
58 | 78 | ||
59 | Kernel Pointers: | 79 | :: |
60 | 80 | ||
61 | %pK 0x01234567 or 0x0123456789abcdef | 81 | %pK 0x01234567 or 0x0123456789abcdef |
62 | 82 | ||
63 | For printing kernel pointers which should be hidden from unprivileged | 83 | For printing kernel pointers which should be hidden from unprivileged |
64 | users. The behaviour of %pK depends on the kptr_restrict sysctl - see | 84 | users. The behaviour of ``%pK`` depends on the ``kptr_restrict`` sysctl - see
65 | Documentation/sysctl/kernel.txt for more details. | 85 | Documentation/sysctl/kernel.txt for more details. |
86 | |||
87 | Struct Resources | ||
88 | ================ | ||
66 | 89 | ||
67 | Struct Resources: | 90 | :: |
68 | 91 | ||
69 | %pr [mem 0x60000000-0x6fffffff flags 0x2200] or | 92 | %pr [mem 0x60000000-0x6fffffff flags 0x2200] or |
70 | [mem 0x0000000060000000-0x000000006fffffff flags 0x2200] | 93 | [mem 0x0000000060000000-0x000000006fffffff flags 0x2200] |
71 | %pR [mem 0x60000000-0x6fffffff pref] or | 94 | %pR [mem 0x60000000-0x6fffffff pref] or |
72 | [mem 0x0000000060000000-0x000000006fffffff pref] | 95 | [mem 0x0000000060000000-0x000000006fffffff pref] |
73 | 96 | ||
74 | For printing struct resources. The 'R' and 'r' specifiers result in a | 97 | For printing struct resources. The ``R`` and ``r`` specifiers result in a |
75 | printed resource with ('R') or without ('r') a decoded flags member. | 98 | printed resource with (``R``) or without (``r``) a decoded flags member. |
76 | Passed by reference. | 99 | Passed by reference. |
100 | |||
101 | Physical address types ``phys_addr_t`` | ||
102 | ======================================== | ||
77 | 103 | ||
78 | Physical addresses types phys_addr_t: | 104 | :: |
79 | 105 | ||
80 | %pa[p] 0x01234567 or 0x0123456789abcdef | 106 | %pa[p] 0x01234567 or 0x0123456789abcdef |
81 | 107 | ||
82 | For printing a phys_addr_t type (and its derivatives, such as | 108 | For printing a ``phys_addr_t`` type (and its derivatives, such as |
83 | resource_size_t) which can vary based on build options, regardless of | 109 | ``resource_size_t``) which can vary based on build options, regardless of |
84 | the width of the CPU data path. Passed by reference. | 110 | the width of the CPU data path. Passed by reference. |
85 | 111 | ||
86 | DMA addresses types dma_addr_t: | 112 | DMA address types ``dma_addr_t``
113 | ================================== | ||
114 | |||
115 | :: | ||
87 | 116 | ||
88 | %pad 0x01234567 or 0x0123456789abcdef | 117 | %pad 0x01234567 or 0x0123456789abcdef |
89 | 118 | ||
90 | For printing a dma_addr_t type which can vary based on build options, | 119 | For printing a ``dma_addr_t`` type which can vary based on build options, |
91 | regardless of the width of the CPU data path. Passed by reference. | 120 | regardless of the width of the CPU data path. Passed by reference. |
121 | |||
122 | Raw buffer as an escaped string | ||
123 | =============================== | ||
92 | 124 | ||
93 | Raw buffer as an escaped string: | 125 | :: |
94 | 126 | ||
95 | %*pE[achnops] | 127 | %*pE[achnops] |
96 | 128 | ||
97 | For printing a raw buffer as an escaped string. For the following buffer | 129 | For printing a raw buffer as an escaped string. For the following buffer::
98 | 130 | ||
99 | 1b 62 20 5c 43 07 22 90 0d 5d | 131 | 1b 62 20 5c 43 07 22 90 0d 5d |
100 | 132 | ||
101 | a few examples show how the conversion would be done (the result string | 133 | a few examples show how the conversion would be done (the result string
102 | without surrounding quotes): | 134 | without surrounding quotes):: |
103 | 135 | ||
104 | %*pE "\eb \C\a"\220\r]" | 136 | %*pE "\eb \C\a"\220\r]" |
105 | %*pEhp "\x1bb \C\x07"\x90\x0d]" | 137 | %*pEhp "\x1bb \C\x07"\x90\x0d]" |
106 | %*pEa "\e\142\040\\\103\a\042\220\r\135" | 138 | %*pEa "\e\142\040\\\103\a\042\220\r\135" |
107 | 139 | ||
108 | The conversion rules are applied according to an optional combination | 140 | The conversion rules are applied according to an optional combination |
109 | of flags (see string_escape_mem() kernel documentation for the | 141 | of flags (see :c:func:`string_escape_mem` kernel documentation for the |
110 | details): | 142 | details): |
111 | a - ESCAPE_ANY | 143 | |
112 | c - ESCAPE_SPECIAL | 144 | - ``a`` - ESCAPE_ANY |
113 | h - ESCAPE_HEX | 145 | - ``c`` - ESCAPE_SPECIAL |
114 | n - ESCAPE_NULL | 146 | - ``h`` - ESCAPE_HEX |
115 | o - ESCAPE_OCTAL | 147 | - ``n`` - ESCAPE_NULL |
116 | p - ESCAPE_NP | 148 | - ``o`` - ESCAPE_OCTAL |
117 | s - ESCAPE_SPACE | 149 | - ``p`` - ESCAPE_NP |
118 | By default ESCAPE_ANY_NP is used. | 150 | - ``s`` - ESCAPE_SPACE |
119 | 151 | ||
120 | ESCAPE_ANY_NP is the sane choice for many cases, in particularly for | 152 | By default ESCAPE_ANY_NP is used. |
121 | printing SSIDs. | ||
122 | 153 | ||
123 | If field width is omitted the 1 byte only will be escaped. | 154 | ESCAPE_ANY_NP is the sane choice for many cases, in particular for
155 | printing SSIDs. | ||
124 | 156 | ||
125 | Raw buffer as a hex string: | 157 | If the field width is omitted, only 1 byte will be escaped.
158 | |||
159 | Raw buffer as a hex string | ||
160 | ========================== | ||
161 | |||
162 | :: | ||
126 | 163 | ||
127 | %*ph 00 01 02 ... 3f | 164 | %*ph 00 01 02 ... 3f |
128 | %*phC 00:01:02: ... :3f | 165 | %*phC 00:01:02: ... :3f |
129 | %*phD 00-01-02- ... -3f | 166 | %*phD 00-01-02- ... -3f |
130 | %*phN 000102 ... 3f | 167 | %*phN 000102 ... 3f |
131 | 168 | ||
132 | For printing small buffers (up to 64 bytes long) as a hex string with | 169 | For printing small buffers (up to 64 bytes long) as a hex string with
133 | a certain separator. For larger buffers consider using | 170 | a certain separator. For larger buffers consider using
134 | print_hex_dump(). | 171 | :c:func:`print_hex_dump`.
172 | |||
173 | MAC/FDDI addresses | ||
174 | ================== | ||
135 | 175 | ||
136 | MAC/FDDI addresses: | 176 | :: |
137 | 177 | ||
138 | %pM 00:01:02:03:04:05 | 178 | %pM 00:01:02:03:04:05 |
139 | %pMR 05:04:03:02:01:00 | 179 | %pMR 05:04:03:02:01:00 |
@@ -141,53 +181,62 @@ MAC/FDDI addresses: | |||
141 | %pm 000102030405 | 181 | %pm 000102030405 |
142 | %pmR 050403020100 | 182 | %pmR 050403020100 |
143 | 183 | ||
144 | For printing 6-byte MAC/FDDI addresses in hex notation. The 'M' and 'm' | 184 | For printing 6-byte MAC/FDDI addresses in hex notation. The ``M`` and ``m`` |
145 | specifiers result in a printed address with ('M') or without ('m') byte | 185 | specifiers result in a printed address with (``M``) or without (``m``) byte |
146 | separators. The default byte separator is the colon (':'). | 186 | separators. The default byte separator is the colon (``:``). |
147 | 187 | ||
148 | Where FDDI addresses are concerned the 'F' specifier can be used after | 188 | Where FDDI addresses are concerned the ``F`` specifier can be used after |
149 | the 'M' specifier to use dash ('-') separators instead of the default | 189 | the ``M`` specifier to use dash (``-``) separators instead of the default |
150 | separator. | 190 | separator. |
151 | 191 | ||
152 | For Bluetooth addresses the 'R' specifier shall be used after the 'M' | 192 | For Bluetooth addresses the ``R`` specifier shall be used after the ``M`` |
153 | specifier to use reversed byte order suitable for visual interpretation | 193 | specifier to use reversed byte order suitable for visual interpretation |
154 | of Bluetooth addresses which are in the little endian order. | 194 | of Bluetooth addresses which are in the little endian order. |
155 | 195 | ||
156 | Passed by reference. | 196 | Passed by reference. |
197 | |||
198 | IPv4 addresses | ||
199 | ============== | ||
157 | 200 | ||
158 | IPv4 addresses: | 201 | :: |
159 | 202 | ||
160 | %pI4 1.2.3.4 | 203 | %pI4 1.2.3.4 |
161 | %pi4 001.002.003.004 | 204 | %pi4 001.002.003.004 |
162 | %p[Ii]4[hnbl] | 205 | %p[Ii]4[hnbl] |
163 | 206 | ||
164 | For printing IPv4 dot-separated decimal addresses. The 'I4' and 'i4' | 207 | For printing IPv4 dot-separated decimal addresses. The ``I4`` and ``i4`` |
165 | specifiers result in a printed address with ('i4') or without ('I4') | 208 | specifiers result in a printed address with (``i4``) or without (``I4``) |
166 | leading zeros. | 209 | leading zeros. |
167 | 210 | ||
168 | The additional 'h', 'n', 'b', and 'l' specifiers are used to specify | 211 | The additional ``h``, ``n``, ``b``, and ``l`` specifiers are used to specify |
169 | host, network, big or little endian order addresses respectively. Where | 212 | host, network, big or little endian order addresses respectively. Where |
170 | no specifier is provided the default network/big endian order is used. | 213 | no specifier is provided the default network/big endian order is used. |
171 | 214 | ||
172 | Passed by reference. | 215 | Passed by reference. |
173 | 216 | ||
174 | IPv6 addresses: | 217 | IPv6 addresses |
218 | ============== | ||
219 | |||
220 | :: | ||
175 | 221 | ||
176 | %pI6 0001:0002:0003:0004:0005:0006:0007:0008 | 222 | %pI6 0001:0002:0003:0004:0005:0006:0007:0008 |
177 | %pi6 00010002000300040005000600070008 | 223 | %pi6 00010002000300040005000600070008 |
178 | %pI6c 1:2:3:4:5:6:7:8 | 224 | %pI6c 1:2:3:4:5:6:7:8 |
179 | 225 | ||
180 | For printing IPv6 network-order 16-bit hex addresses. The 'I6' and 'i6' | 226 | For printing IPv6 network-order 16-bit hex addresses. The ``I6`` and ``i6`` |
181 | specifiers result in a printed address with ('I6') or without ('i6') | 227 | specifiers result in a printed address with (``I6``) or without (``i6``) |
182 | colon-separators. Leading zeros are always used. | 228 | colon-separators. Leading zeros are always used. |
183 | 229 | ||
184 | The additional 'c' specifier can be used with the 'I' specifier to | 230 | The additional ``c`` specifier can be used with the ``I`` specifier to |
185 | print a compressed IPv6 address as described by | 231 | print a compressed IPv6 address as described by |
186 | http://tools.ietf.org/html/rfc5952 | 232 | http://tools.ietf.org/html/rfc5952 |
187 | 233 | ||
188 | Passed by reference. | 234 | Passed by reference. |
189 | 235 | ||
190 | IPv4/IPv6 addresses (generic, with port, flowinfo, scope): | 236 | IPv4/IPv6 addresses (generic, with port, flowinfo, scope) |
237 | ========================================================= | ||
238 | |||
239 | :: | ||
191 | 240 | ||
192 | %pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008 | 241 | %pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008 |
193 | %piS 001.002.003.004 or 00010002000300040005000600070008 | 242 | %piS 001.002.003.004 or 00010002000300040005000600070008 |
@@ -195,87 +244,103 @@ IPv4/IPv6 addresses (generic, with port, flowinfo, scope): | |||
195 | %pISpc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345 | 244 | %pISpc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345 |
196 | %p[Ii]S[pfschnbl] | 245 | %p[Ii]S[pfschnbl] |
197 | 246 | ||
198 | For printing an IP address without the need to distinguish whether it's | 247 | For printing an IP address without the need to distinguish whether it's
199 | of type AF_INET or AF_INET6, a pointer to a valid 'struct sockaddr', | 248 | of type AF_INET or AF_INET6, a pointer to a valid ``struct sockaddr``, |
200 | specified through 'IS' or 'iS', can be passed to this format specifier. | 249 | specified through ``IS`` or ``iS``, can be passed to this format specifier. |
201 | 250 | ||
202 | The additional 'p', 'f', and 's' specifiers are used to specify port | 251 | The additional ``p``, ``f``, and ``s`` specifiers are used to specify port |
203 | (IPv4, IPv6), flowinfo (IPv6) and scope (IPv6). Ports have a ':' prefix, | 252 | (IPv4, IPv6), flowinfo (IPv6) and scope (IPv6). Ports have a ``:`` prefix, |
204 | flowinfo a '/' and scope a '%', each followed by the actual value. | 253 | flowinfo a ``/`` and scope a ``%``, each followed by the actual value. |
205 | 254 | ||
206 | In case of an IPv6 address the compressed IPv6 address as described by | 255 | In case of an IPv6 address the compressed IPv6 address as described by |
207 | http://tools.ietf.org/html/rfc5952 is being used if the additional | 256 | http://tools.ietf.org/html/rfc5952 is being used if the additional |
208 | specifier 'c' is given. The IPv6 address is surrounded by '[', ']' in | 257 | specifier ``c`` is given. The IPv6 address is surrounded by ``[``, ``]`` in |
209 | case of additional specifiers 'p', 'f' or 's' as suggested by | 258 | case of additional specifiers ``p``, ``f`` or ``s`` as suggested by |
210 | https://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-07 | 259 | https://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-07 |
211 | 260 | ||
212 | In case of IPv4 addresses, the additional 'h', 'n', 'b', and 'l' | 261 | In case of IPv4 addresses, the additional ``h``, ``n``, ``b``, and ``l`` |
213 | specifiers can be used as well and are ignored in case of an IPv6 | 262 | specifiers can be used as well and are ignored in case of an IPv6 |
214 | address. | 263 | address. |
215 | 264 | ||
216 | Passed by reference. | 265 | Passed by reference. |
217 | 266 | ||
218 | Further examples: | 267 | Further examples:: |
219 | 268 | ||
220 | %pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789 | 269 | %pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789 |
221 | %pISsc 1.2.3.4 or [1:2:3:4:5:6:7:8]%1234567890 | 270 | %pISsc 1.2.3.4 or [1:2:3:4:5:6:7:8]%1234567890 |
222 | %pISpfc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345/123456789 | 271 | %pISpfc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345/123456789 |
223 | 272 | ||
224 | UUID/GUID addresses: | 273 | UUID/GUID addresses |
274 | =================== | ||
275 | |||
276 | :: | ||
225 | 277 | ||
226 | %pUb 00010203-0405-0607-0809-0a0b0c0d0e0f | 278 | %pUb 00010203-0405-0607-0809-0a0b0c0d0e0f |
227 | %pUB 00010203-0405-0607-0809-0A0B0C0D0E0F | 279 | %pUB 00010203-0405-0607-0809-0A0B0C0D0E0F |
228 | %pUl 03020100-0504-0706-0809-0a0b0c0d0e0f | 280 | %pUl 03020100-0504-0706-0809-0a0b0c0d0e0f
229 | %pUL 03020100-0504-0706-0809-0A0B0C0D0E0F | 281 | %pUL 03020100-0504-0706-0809-0A0B0C0D0E0F
230 | 282 | ||
231 | For printing 16-byte UUID/GUIDs addresses. The additional 'l', 'L', | 283 | For printing 16-byte UUID/GUID addresses. The additional ``l``, ``L``,
232 | 'b' and 'B' specifiers are used to specify a little endian order in | 284 | ``b`` and ``B`` specifiers are used to specify a little endian order in
233 | lower ('l') or upper case ('L') hex characters - and big endian order | 285 | lower (``l``) or upper case (``L``) hex characters - and big endian order
234 | in lower ('b') or upper case ('B') hex characters. | 286 | in lower (``b``) or upper case (``B``) hex characters.
235 | 287 | ||
236 | Where no additional specifiers are used the default big endian | 288 | Where no additional specifiers are used the default big endian |
237 | order with lower case hex characters will be printed. | 289 | order with lower case hex characters will be printed. |
238 | 290 | ||
239 | Passed by reference. | 291 | Passed by reference. |
292 | |||
293 | dentry names | ||
294 | ============ | ||
240 | 295 | ||
241 | dentry names: | 296 | :: |
242 | 297 | ||
243 | %pd{,2,3,4} | 298 | %pd{,2,3,4} |
244 | %pD{,2,3,4} | 299 | %pD{,2,3,4} |
245 | 300 | ||
246 | For printing dentry name; if we race with d_move(), the name might be | 301 | For printing dentry name; if we race with :c:func:`d_move`, the name might be |
247 | a mix of old and new ones, but it won't oops. %pd dentry is a safer | 302 | a mix of old and new ones, but it won't oops. ``%pd`` dentry is a safer |
248 | equivalent of %s dentry->d_name.name we used to use, %pd<n> prints | 303 | equivalent of ``%s`` ``dentry->d_name.name`` we used to use, ``%pd<n>`` prints |
249 | n last components. %pD does the same thing for struct file. | 304 | ``n`` last components. ``%pD`` does the same thing for struct file. |
250 | 305 | ||
251 | Passed by reference. | 306 | Passed by reference. |
252 | 307 | ||
253 | block_device names: | 308 | block_device names |
309 | ================== | ||
310 | |||
311 | :: | ||
254 | 312 | ||
255 | %pg sda, sda1 or loop0p1 | 313 | %pg sda, sda1 or loop0p1 |
256 | 314 | ||
257 | For printing name of block_device pointers. | 315 | For printing name of block_device pointers. |
316 | |||
317 | struct va_format | ||
318 | ================ | ||
258 | 319 | ||
259 | struct va_format: | 320 | :: |
260 | 321 | ||
261 | %pV | 322 | %pV |
262 | 323 | ||
263 | For printing struct va_format structures. These contain a format string | 324 | For printing struct va_format structures. These contain a format string |
264 | and va_list as follows: | 325 | and va_list as follows:: |
265 | 326 | ||
266 | struct va_format { | 327 | struct va_format { |
267 | const char *fmt; | 328 | const char *fmt; |
268 | va_list *va; | 329 | va_list *va; |
269 | }; | 330 | }; |
270 | 331 | ||
271 | Implements a "recursive vsnprintf". | 332 | Implements a "recursive vsnprintf". |
272 | 333 | ||
273 | Do not use this feature without some mechanism to verify the | 334 | Do not use this feature without some mechanism to verify the |
274 | correctness of the format string and va_list arguments. | 335 | correctness of the format string and va_list arguments. |
275 | 336 | ||
276 | Passed by reference. | 337 | Passed by reference. |
338 | |||
339 | kobjects | ||
340 | ======== | ||
341 | |||
342 | :: | ||
277 | 343 | ||
278 | kobjects: | ||
279 | %pO | 344 | %pO |
280 | 345 | ||
281 | Base specifier for kobject based structs. Must be followed with | 346 | Base specifier for kobject based structs. Must be followed with |
@@ -311,61 +376,70 @@ kobjects: | |||
311 | 376 | ||
312 | Passed by reference. | 377 | Passed by reference. |
313 | 378 | ||
314 | struct clk: | 379 | |
380 | struct clk | ||
381 | ========== | ||
382 | |||
383 | :: | ||
315 | 384 | ||
316 | %pC pll1 | 385 | %pC pll1 |
317 | %pCn pll1 | 386 | %pCn pll1 |
318 | %pCr 1560000000 | 387 | %pCr 1560000000 |
319 | 388 | ||
320 | For printing struct clk structures. '%pC' and '%pCn' print the name | 389 | For printing struct clk structures. ``%pC`` and ``%pCn`` print the name |
321 | (Common Clock Framework) or address (legacy clock framework) of the | 390 | (Common Clock Framework) or address (legacy clock framework) of the |
322 | structure; '%pCr' prints the current clock rate. | 391 | structure; ``%pCr`` prints the current clock rate. |
323 | 392 | ||
324 | Passed by reference. | 393 | Passed by reference. |
325 | 394 | ||
326 | bitmap and its derivatives such as cpumask and nodemask: | 395 | bitmap and its derivatives such as cpumask and nodemask |
396 | ======================================================= | ||
397 | |||
398 | :: | ||
327 | 399 | ||
328 | %*pb 0779 | 400 | %*pb 0779 |
329 | %*pbl 0,3-6,8-10 | 401 | %*pbl 0,3-6,8-10 |
330 | 402 | ||
331 | For printing bitmap and its derivatives such as cpumask and nodemask, | 403 | For printing bitmap and its derivatives such as cpumask and nodemask, |
332 | %*pb outputs the bitmap with field width as the number of bits and %*pbl | 404 | ``%*pb`` outputs the bitmap with field width as the number of bits and ``%*pbl``
333 | outputs the bitmap as a range list with field width as the number of bits. | 405 | outputs the bitmap as a range list with field width as the number of bits.
334 | 406 | ||
335 | Passed by reference. | 407 | Passed by reference. |
408 | |||
409 | Flags bitfields such as page flags, gfp_flags | ||
410 | ============================================= | ||
336 | 411 | ||
337 | Flags bitfields such as page flags, gfp_flags: | 412 | :: |
338 | 413 | ||
339 | %pGp referenced|uptodate|lru|active|private | 414 | %pGp referenced|uptodate|lru|active|private |
340 | %pGg GFP_USER|GFP_DMA32|GFP_NOWARN | 415 | %pGg GFP_USER|GFP_DMA32|GFP_NOWARN |
341 | %pGv read|exec|mayread|maywrite|mayexec|denywrite | 416 | %pGv read|exec|mayread|maywrite|mayexec|denywrite |
342 | 417 | ||
343 | For printing flags bitfields as a collection of symbolic constants that | 418 | For printing flags bitfields as a collection of symbolic constants that |
344 | would construct the value. The type of flags is given by the third | 419 | would construct the value. The type of flags is given by the third |
345 | character. Currently supported are [p]age flags, [v]ma_flags (both | 420 | character. Currently supported are [p]age flags, [v]ma_flags (both |
346 | expect unsigned long *) and [g]fp_flags (expects gfp_t *). The flag | 421 | expect ``unsigned long *``) and [g]fp_flags (expects ``gfp_t *``). The flag |
347 | names and print order depend on the particular type. | 422 | names and print order depend on the particular type.
348 | 423 | ||
349 | Note that this format should not be used directly in TP_printk() part | 424 | Note that this format should not be used directly in :c:func:`TP_printk` part
350 | of a tracepoint. Instead, use the show_*_flags() functions from | 425 | of a tracepoint. Instead, use the ``show_*_flags()`` functions from |
351 | <trace/events/mmflags.h>. | 426 | <trace/events/mmflags.h>. |
352 | 427 | ||
353 | Passed by reference. | 428 | Passed by reference. |
429 | |||
430 | Network device features | ||
431 | ======================= | ||
354 | 432 | ||
355 | Network device features: | 433 | :: |
356 | 434 | ||
357 | %pNF 0x000000000000c000 | 435 | %pNF 0x000000000000c000 |
358 | 436 | ||
359 | For printing netdev_features_t. | 437 | For printing netdev_features_t. |
360 | 438 | ||
361 | Passed by reference. | 439 | Passed by reference. |
362 | 440 | ||
363 | If you add other %p extensions, please extend lib/test_printf.c with | 441 | If you add other ``%p`` extensions, please extend lib/test_printf.c with |
364 | one or more test cases, if at all feasible. | 442 | one or more test cases, if at all feasible. |
365 | 443 | ||
366 | 444 | ||
367 | Thank you for your cooperation and attention. | 445 | Thank you for your cooperation and attention. |
368 | |||
369 | |||
370 | By Randy Dunlap <rdunlap@infradead.org> and | ||
371 | Andrew Murray <amurray@mpc-data.co.uk> | ||
diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt index b9d9cc57be18..b8a8c70b0188 100644 --- a/Documentation/rbtree.txt +++ b/Documentation/rbtree.txt | |||
@@ -1,7 +1,10 @@ | |||
1 | ================================= | ||
1 | Red-black Trees (rbtree) in Linux | 2 | Red-black Trees (rbtree) in Linux |
2 | January 18, 2007 | 3 | ================================= |
3 | Rob Landley <rob@landley.net> | 4 | |
4 | ============================= | 5 | |
6 | :Date: January 18, 2007 | ||
7 | :Author: Rob Landley <rob@landley.net> | ||
5 | 8 | ||
6 | What are red-black trees, and what are they for? | 9 | What are red-black trees, and what are they for? |
7 | ------------------------------------------------ | 10 | ------------------------------------------------ |
@@ -56,7 +59,7 @@ user of the rbtree code. | |||
56 | Creating a new rbtree | 59 | Creating a new rbtree |
57 | --------------------- | 60 | --------------------- |
58 | 61 | ||
59 | Data nodes in an rbtree tree are structures containing a struct rb_node member: | 62 | Data nodes in an rbtree tree are structures containing a struct rb_node member:: |
60 | 63 | ||
61 | struct mytype { | 64 | struct mytype { |
62 | struct rb_node node; | 65 | struct rb_node node; |
@@ -78,7 +81,7 @@ Searching for a value in an rbtree | |||
78 | Writing a search function for your tree is fairly straightforward: start at the | 81 | Writing a search function for your tree is fairly straightforward: start at the |
79 | root, compare each value, and follow the left or right branch as necessary. | 82 | root, compare each value, and follow the left or right branch as necessary. |
80 | 83 | ||
81 | Example: | 84 | Example:: |
82 | 85 | ||
83 | struct mytype *my_search(struct rb_root *root, char *string) | 86 | struct mytype *my_search(struct rb_root *root, char *string) |
84 | { | 87 | { |
@@ -110,7 +113,7 @@ The search for insertion differs from the previous search by finding the | |||
110 | location of the pointer on which to graft the new node. The new node also | 113 | location of the pointer on which to graft the new node. The new node also |
111 | needs a link to its parent node for rebalancing purposes. | 114 | needs a link to its parent node for rebalancing purposes. |
112 | 115 | ||
113 | Example: | 116 | Example:: |
114 | 117 | ||
115 | int my_insert(struct rb_root *root, struct mytype *data) | 118 | int my_insert(struct rb_root *root, struct mytype *data) |
116 | { | 119 | { |
@@ -140,11 +143,11 @@ Example: | |||
140 | Removing or replacing existing data in an rbtree | 143 | Removing or replacing existing data in an rbtree |
141 | ------------------------------------------------ | 144 | ------------------------------------------------ |
142 | 145 | ||
143 | To remove an existing node from a tree, call: | 146 | To remove an existing node from a tree, call:: |
144 | 147 | ||
145 | void rb_erase(struct rb_node *victim, struct rb_root *tree); | 148 | void rb_erase(struct rb_node *victim, struct rb_root *tree); |
146 | 149 | ||
147 | Example: | 150 | Example:: |
148 | 151 | ||
149 | struct mytype *data = mysearch(&mytree, "walrus"); | 152 | struct mytype *data = mysearch(&mytree, "walrus"); |
150 | 153 | ||
@@ -153,7 +156,7 @@ Example: | |||
153 | myfree(data); | 156 | myfree(data); |
154 | } | 157 | } |
155 | 158 | ||
156 | To replace an existing node in a tree with a new one with the same key, call: | 159 | To replace an existing node in a tree with a new one with the same key, call:: |
157 | 160 | ||
158 | void rb_replace_node(struct rb_node *old, struct rb_node *new, | 161 | void rb_replace_node(struct rb_node *old, struct rb_node *new, |
159 | struct rb_root *tree); | 162 | struct rb_root *tree); |
@@ -166,7 +169,7 @@ Iterating through the elements stored in an rbtree (in sort order) | |||
166 | 169 | ||
167 | Four functions are provided for iterating through an rbtree's contents in | 170 | Four functions are provided for iterating through an rbtree's contents in |
168 | sorted order. These work on arbitrary trees, and should not need to be | 171 | sorted order. These work on arbitrary trees, and should not need to be |
169 | modified or wrapped (except for locking purposes): | 172 | modified or wrapped (except for locking purposes):: |
170 | 173 | ||
171 | struct rb_node *rb_first(struct rb_root *tree); | 174 | struct rb_node *rb_first(struct rb_root *tree); |
172 | struct rb_node *rb_last(struct rb_root *tree); | 175 | struct rb_node *rb_last(struct rb_root *tree); |
@@ -184,7 +187,7 @@ which the containing data structure may be accessed with the container_of() | |||
184 | macro, and individual members may be accessed directly via | 187 | macro, and individual members may be accessed directly via |
185 | rb_entry(node, type, member). | 188 | rb_entry(node, type, member). |
186 | 189 | ||
187 | Example: | 190 | Example:: |
188 | 191 | ||
189 | struct rb_node *node; | 192 | struct rb_node *node; |
190 | for (node = rb_first(&mytree); node; node = rb_next(node)) | 193 | for (node = rb_first(&mytree); node; node = rb_next(node)) |
@@ -241,7 +244,8 @@ user should have a single rb_erase_augmented() call site in order to limit | |||
241 | compiled code size. | 244 | compiled code size. |
242 | 245 | ||
243 | 246 | ||
244 | Sample usage: | 247 | Sample usage |
248 | ^^^^^^^^^^^^ | ||
245 | 249 | ||
246 | Interval tree is an example of an augmented rb tree. Reference - | 250 | Interval tree is an example of an augmented rb tree. Reference - |
247 | "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein. | 251 | "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein. |
@@ -259,12 +263,12 @@ This "extra information" stored in each node is the maximum hi | |||
259 | information can be maintained at each node just by looking at the node | 263 | information can be maintained at each node just by looking at the node |
260 | and its immediate children. And this will be used in O(log n) lookup | 264 | and its immediate children. And this will be used in O(log n) lookup |
261 | for lowest match (lowest start address among all possible matches) | 265 | for lowest match (lowest start address among all possible matches) |
262 | with something like: | 266 | with something like:: |
263 | 267 | ||
264 | struct interval_tree_node * | 268 | struct interval_tree_node * |
265 | interval_tree_first_match(struct rb_root *root, | 269 | interval_tree_first_match(struct rb_root *root, |
266 | unsigned long start, unsigned long last) | 270 | unsigned long start, unsigned long last) |
267 | { | 271 | { |
268 | struct interval_tree_node *node; | 272 | struct interval_tree_node *node; |
269 | 273 | ||
270 | if (!root->rb_node) | 274 | if (!root->rb_node) |
@@ -301,13 +305,13 @@ interval_tree_first_match(struct rb_root *root, | |||
301 | } | 305 | } |
302 | return NULL; /* No match */ | 306 | return NULL; /* No match */ |
303 | } | 307 | } |
304 | } | 308 | } |
305 | 309 | ||
306 | Insertion/removal are defined using the following augmented callbacks: | 310 | Insertion/removal are defined using the following augmented callbacks:: |
307 | 311 | ||
308 | static inline unsigned long | 312 | static inline unsigned long |
309 | compute_subtree_last(struct interval_tree_node *node) | 313 | compute_subtree_last(struct interval_tree_node *node) |
310 | { | 314 | { |
311 | unsigned long max = node->last, subtree_last; | 315 | unsigned long max = node->last, subtree_last; |
312 | if (node->rb.rb_left) { | 316 | if (node->rb.rb_left) { |
313 | subtree_last = rb_entry(node->rb.rb_left, | 317 | subtree_last = rb_entry(node->rb.rb_left, |
@@ -322,10 +326,10 @@ compute_subtree_last(struct interval_tree_node *node) | |||
322 | max = subtree_last; | 326 | max = subtree_last; |
323 | } | 327 | } |
324 | return max; | 328 | return max; |
325 | } | 329 | } |
326 | 330 | ||
327 | static void augment_propagate(struct rb_node *rb, struct rb_node *stop) | 331 | static void augment_propagate(struct rb_node *rb, struct rb_node *stop) |
328 | { | 332 | { |
329 | while (rb != stop) { | 333 | while (rb != stop) { |
330 | struct interval_tree_node *node = | 334 | struct interval_tree_node *node = |
331 | rb_entry(rb, struct interval_tree_node, rb); | 335 | rb_entry(rb, struct interval_tree_node, rb); |
@@ -335,20 +339,20 @@ static void augment_propagate(struct rb_node *rb, struct rb_node *stop) | |||
335 | node->__subtree_last = subtree_last; | 339 | node->__subtree_last = subtree_last; |
336 | rb = rb_parent(&node->rb); | 340 | rb = rb_parent(&node->rb); |
337 | } | 341 | } |
338 | } | 342 | } |
339 | 343 | ||
340 | static void augment_copy(struct rb_node *rb_old, struct rb_node *rb_new) | 344 | static void augment_copy(struct rb_node *rb_old, struct rb_node *rb_new) |
341 | { | 345 | { |
342 | struct interval_tree_node *old = | 346 | struct interval_tree_node *old = |
343 | rb_entry(rb_old, struct interval_tree_node, rb); | 347 | rb_entry(rb_old, struct interval_tree_node, rb); |
344 | struct interval_tree_node *new = | 348 | struct interval_tree_node *new = |
345 | rb_entry(rb_new, struct interval_tree_node, rb); | 349 | rb_entry(rb_new, struct interval_tree_node, rb); |
346 | 350 | ||
347 | new->__subtree_last = old->__subtree_last; | 351 | new->__subtree_last = old->__subtree_last; |
348 | } | 352 | } |
349 | 353 | ||
350 | static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new) | 354 | static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new) |
351 | { | 355 | { |
352 | struct interval_tree_node *old = | 356 | struct interval_tree_node *old = |
353 | rb_entry(rb_old, struct interval_tree_node, rb); | 357 | rb_entry(rb_old, struct interval_tree_node, rb); |
354 | struct interval_tree_node *new = | 358 | struct interval_tree_node *new = |
@@ -356,15 +360,15 @@ static void augment_rotate(struct rb_node *rb_old, struct rb_node *rb_new) | |||
356 | 360 | ||
357 | new->__subtree_last = old->__subtree_last; | 361 | new->__subtree_last = old->__subtree_last; |
358 | old->__subtree_last = compute_subtree_last(old); | 362 | old->__subtree_last = compute_subtree_last(old); |
359 | } | 363 | } |
360 | 364 | ||
361 | static const struct rb_augment_callbacks augment_callbacks = { | 365 | static const struct rb_augment_callbacks augment_callbacks = { |
362 | augment_propagate, augment_copy, augment_rotate | 366 | augment_propagate, augment_copy, augment_rotate |
363 | }; | 367 | }; |
364 | 368 | ||
365 | void interval_tree_insert(struct interval_tree_node *node, | 369 | void interval_tree_insert(struct interval_tree_node *node, |
366 | struct rb_root *root) | 370 | struct rb_root *root) |
367 | { | 371 | { |
368 | struct rb_node **link = &root->rb_node, *rb_parent = NULL; | 372 | struct rb_node **link = &root->rb_node, *rb_parent = NULL; |
369 | unsigned long start = node->start, last = node->last; | 373 | unsigned long start = node->start, last = node->last; |
370 | struct interval_tree_node *parent; | 374 | struct interval_tree_node *parent; |
@@ -383,10 +387,10 @@ void interval_tree_insert(struct interval_tree_node *node, | |||
383 | node->__subtree_last = last; | 387 | node->__subtree_last = last; |
384 | rb_link_node(&node->rb, rb_parent, link); | 388 | rb_link_node(&node->rb, rb_parent, link); |
385 | rb_insert_augmented(&node->rb, root, &augment_callbacks); | 389 | rb_insert_augmented(&node->rb, root, &augment_callbacks); |
386 | } | 390 | } |
387 | 391 | ||
388 | void interval_tree_remove(struct interval_tree_node *node, | 392 | void interval_tree_remove(struct interval_tree_node *node, |
389 | struct rb_root *root) | 393 | struct rb_root *root) |
390 | { | 394 | { |
391 | rb_erase_augmented(&node->rb, root, &augment_callbacks); | 395 | rb_erase_augmented(&node->rb, root, &augment_callbacks); |
392 | } | 396 | } |
diff --git a/Documentation/remoteproc.txt b/Documentation/remoteproc.txt index f07597482351..77fb03acdbb4 100644 --- a/Documentation/remoteproc.txt +++ b/Documentation/remoteproc.txt | |||
@@ -1,6 +1,9 @@ | |||
1 | ========================== | ||
1 | Remote Processor Framework | 2 | Remote Processor Framework |
3 | ========================== | ||
2 | 4 | ||
3 | 1. Introduction | 5 | Introduction |
6 | ============ | ||
4 | 7 | ||
5 | Modern SoCs typically have heterogeneous remote processor devices in asymmetric | 8 | Modern SoCs typically have heterogeneous remote processor devices in asymmetric |
6 | multiprocessing (AMP) configurations, which may be running different instances | 9 | multiprocessing (AMP) configurations, which may be running different instances |
@@ -26,44 +29,62 @@ remoteproc will add those devices. This makes it possible to reuse the | |||
26 | existing virtio drivers with remote processor backends at a minimal development | 29 | existing virtio drivers with remote processor backends at a minimal development |
27 | cost. | 30 | cost. |
28 | 31 | ||
29 | 2. User API | 32 | User API |
33 | ======== | ||
34 | |||
35 | :: | ||
30 | 36 | ||
31 | int rproc_boot(struct rproc *rproc) | 37 | int rproc_boot(struct rproc *rproc) |
32 | - Boot a remote processor (i.e. load its firmware, power it on, ...). | 38 | |
33 | If the remote processor is already powered on, this function immediately | 39 | Boot a remote processor (i.e. load its firmware, power it on, ...). |
34 | returns (successfully). | 40 | |
35 | Returns 0 on success, and an appropriate error value otherwise. | 41 | If the remote processor is already powered on, this function immediately |
36 | Note: to use this function you should already have a valid rproc | 42 | returns (successfully). |
37 | handle. There are several ways to achieve that cleanly (devres, pdata, | 43 | |
38 | the way remoteproc_rpmsg.c does this, or, if this becomes prevalent, we | 44 | Returns 0 on success, and an appropriate error value otherwise. |
39 | might also consider using dev_archdata for this). | 45 | Note: to use this function you should already have a valid rproc |
46 | handle. There are several ways to achieve that cleanly (devres, pdata, | ||
47 | the way remoteproc_rpmsg.c does this, or, if this becomes prevalent, we | ||
48 | might also consider using dev_archdata for this). | ||
49 | |||
50 | :: | ||
40 | 51 | ||
41 | void rproc_shutdown(struct rproc *rproc) | 52 | void rproc_shutdown(struct rproc *rproc) |
42 | - Power off a remote processor (previously booted with rproc_boot()). | 53 | |
43 | In case @rproc is still being used by an additional user(s), then | 54 | Power off a remote processor (previously booted with rproc_boot()). |
44 | this function will just decrement the power refcount and exit, | 55 | In case @rproc is still being used by an additional user(s), then |
45 | without really powering off the device. | 56 | this function will just decrement the power refcount and exit, |
46 | Every call to rproc_boot() must (eventually) be accompanied by a call | 57 | without really powering off the device. |
47 | to rproc_shutdown(). Calling rproc_shutdown() redundantly is a bug. | 58 | |
48 | Notes: | 59 | Every call to rproc_boot() must (eventually) be accompanied by a call |
49 | - we're not decrementing the rproc's refcount, only the power refcount. | 60 | to rproc_shutdown(). Calling rproc_shutdown() redundantly is a bug. |
50 | which means that the @rproc handle stays valid even after | 61 | |
51 | rproc_shutdown() returns, and users can still use it with a subsequent | 62 | .. note:: |
52 | rproc_boot(), if needed. | 63 | |
64 | we're not decrementing the rproc's refcount, only the power refcount, ||
65 | which means that the @rproc handle stays valid even after | ||
66 | rproc_shutdown() returns, and users can still use it with a subsequent | ||
67 | rproc_boot(), if needed. | ||
68 | |||
69 | :: | ||
53 | 70 | ||
54 | struct rproc *rproc_get_by_phandle(phandle phandle) | 71 | struct rproc *rproc_get_by_phandle(phandle phandle) |
55 | - Find an rproc handle using a device tree phandle. Returns the rproc | ||
56 | handle on success, and NULL on failure. This function increments | ||
57 | the remote processor's refcount, so always use rproc_put() to | ||
58 | decrement it back once rproc isn't needed anymore. | ||
59 | 72 | ||
60 | 3. Typical usage | 73 | Find an rproc handle using a device tree phandle. Returns the rproc |
74 | handle on success, and NULL on failure. This function increments | ||
75 | the remote processor's refcount, so always use rproc_put() to | ||
76 | decrement it back once rproc isn't needed anymore. | ||
77 | |||
78 | Typical usage | ||
79 | ============= | ||
61 | 80 | ||
62 | #include <linux/remoteproc.h> | 81 | :: |
63 | 82 | ||
64 | /* in case we were given a valid 'rproc' handle */ | 83 | #include <linux/remoteproc.h> |
65 | int dummy_rproc_example(struct rproc *my_rproc) | 84 | |
66 | { | 85 | /* in case we were given a valid 'rproc' handle */ |
86 | int dummy_rproc_example(struct rproc *my_rproc) | ||
87 | { | ||
67 | int ret; | 88 | int ret; |
68 | 89 | ||
69 | /* let's power on and boot our remote processor */ | 90 | /* let's power on and boot our remote processor */ |
@@ -80,84 +101,111 @@ int dummy_rproc_example(struct rproc *my_rproc) | |||
80 | 101 | ||
81 | /* let's shut it down now */ | 102 | /* let's shut it down now */ |
82 | rproc_shutdown(my_rproc); | 103 | rproc_shutdown(my_rproc); |
83 | } | 104 | } |
105 | |||
106 | API for implementors | ||
107 | ==================== | ||
84 | 108 | ||
85 | 4. API for implementors | 109 | :: |
86 | 110 | ||
87 | struct rproc *rproc_alloc(struct device *dev, const char *name, | 111 | struct rproc *rproc_alloc(struct device *dev, const char *name, |
88 | const struct rproc_ops *ops, | 112 | const struct rproc_ops *ops, |
89 | const char *firmware, int len) | 113 | const char *firmware, int len) |
90 | - Allocate a new remote processor handle, but don't register | 114 | |
91 | it yet. Required parameters are the underlying device, the | 115 | Allocate a new remote processor handle, but don't register |
92 | name of this remote processor, platform-specific ops handlers, | 116 | it yet. Required parameters are the underlying device, the |
93 | the name of the firmware to boot this rproc with, and the | 117 | name of this remote processor, platform-specific ops handlers, |
94 | length of private data needed by the allocating rproc driver (in bytes). | 118 | the name of the firmware to boot this rproc with, and the |
95 | 119 | length of private data needed by the allocating rproc driver (in bytes). | |
96 | This function should be used by rproc implementations during | 120 | |
97 | initialization of the remote processor. | 121 | This function should be used by rproc implementations during |
98 | After creating an rproc handle using this function, and when ready, | 122 | initialization of the remote processor. |
99 | implementations should then call rproc_add() to complete | 123 | |
100 | the registration of the remote processor. | 124 | After creating an rproc handle using this function, and when ready, |
101 | On success, the new rproc is returned, and on failure, NULL. | 125 | implementations should then call rproc_add() to complete |
102 | 126 | the registration of the remote processor. | |
103 | Note: _never_ directly deallocate @rproc, even if it was not registered | 127 | |
104 | yet. Instead, when you need to unroll rproc_alloc(), use rproc_free(). | 128 | On success, the new rproc is returned, and on failure, NULL. |
129 | |||
130 | .. note:: | ||
131 | |||
132 | **never** directly deallocate @rproc, even if it was not registered | ||
133 | yet. Instead, when you need to unroll rproc_alloc(), use rproc_free(). | ||
134 | |||
135 | :: | ||
105 | 136 | ||
106 | void rproc_free(struct rproc *rproc) | 137 | void rproc_free(struct rproc *rproc) |
107 | - Free an rproc handle that was allocated by rproc_alloc. | 138 | |
108 | This function essentially unrolls rproc_alloc(), by decrementing the | 139 | Free an rproc handle that was allocated by rproc_alloc. |
109 | rproc's refcount. It doesn't directly free rproc; that would happen | 140 | |
110 | only if there are no other references to rproc and its refcount now | 141 | This function essentially unrolls rproc_alloc(), by decrementing the |
111 | dropped to zero. | 142 | rproc's refcount. It doesn't directly free rproc; that would happen |
143 | only if there are no other references to rproc and its refcount now | ||
144 | dropped to zero. | ||
145 | |||
146 | :: | ||
112 | 147 | ||
113 | int rproc_add(struct rproc *rproc) | 148 | int rproc_add(struct rproc *rproc) |
114 | - Register @rproc with the remoteproc framework, after it has been | 149 | |
115 | allocated with rproc_alloc(). | 150 | Register @rproc with the remoteproc framework, after it has been |
116 | This is called by the platform-specific rproc implementation, whenever | 151 | allocated with rproc_alloc(). |
117 | a new remote processor device is probed. | 152 | |
118 | Returns 0 on success and an appropriate error code otherwise. | 153 | This is called by the platform-specific rproc implementation, whenever |
119 | Note: this function initiates an asynchronous firmware loading | 154 | a new remote processor device is probed. |
120 | context, which will look for virtio devices supported by the rproc's | 155 | |
121 | firmware. | 156 | Returns 0 on success and an appropriate error code otherwise. |
122 | If found, those virtio devices will be created and added, so as a result | 157 | Note: this function initiates an asynchronous firmware loading |
123 | of registering this remote processor, additional virtio drivers might get | 158 | context, which will look for virtio devices supported by the rproc's |
124 | probed. | 159 | firmware. |
160 | |||
161 | If found, those virtio devices will be created and added, so as a result | ||
162 | of registering this remote processor, additional virtio drivers might get | ||
163 | probed. | ||
164 | |||
165 | :: | ||
125 | 166 | ||
126 | int rproc_del(struct rproc *rproc) | 167 | int rproc_del(struct rproc *rproc) |
127 | - Unroll rproc_add(). | ||
128 | This function should be called when the platform specific rproc | ||
129 | implementation decides to remove the rproc device. it should | ||
130 | _only_ be called if a previous invocation of rproc_add() | ||
131 | has completed successfully. | ||
132 | 168 | ||
133 | After rproc_del() returns, @rproc is still valid, and its | 169 | Unroll rproc_add(). |
134 | last refcount should be decremented by calling rproc_free(). | 170 | |
171 | This function should be called when the platform-specific rproc ||
172 | implementation decides to remove the rproc device. It should ||
173 | **only** be called if a previous invocation of rproc_add() ||
174 | has completed successfully. | ||
135 | 175 | ||
136 | Returns 0 on success and -EINVAL if @rproc isn't valid. | 176 | After rproc_del() returns, @rproc is still valid, and its |
177 | last refcount should be decremented by calling rproc_free(). | ||
178 | |||
179 | Returns 0 on success and -EINVAL if @rproc isn't valid. | ||
180 | |||
181 | :: | ||
137 | 182 | ||
138 | void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type) | 183 | void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type) |
139 | - Report a crash in a remoteproc | ||
140 | This function must be called every time a crash is detected by the | ||
141 | platform specific rproc implementation. This should not be called from a | ||
142 | non-remoteproc driver. This function can be called from atomic/interrupt | ||
143 | context. | ||
144 | 184 | ||
145 | 5. Implementation callbacks | 185 | Report a crash in a remoteproc. |
186 | |||
187 | This function must be called every time a crash is detected by the | ||
188 | platform specific rproc implementation. This should not be called from a | ||
189 | non-remoteproc driver. This function can be called from atomic/interrupt | ||
190 | context. | ||
191 | |||
192 | Implementation callbacks | ||
193 | ======================== | ||
146 | 194 | ||
147 | These callbacks should be provided by platform-specific remoteproc | 195 | These callbacks should be provided by platform-specific remoteproc |
148 | drivers: | 196 | drivers:: |
149 | 197 | ||
150 | /** | 198 | /** |
151 | * struct rproc_ops - platform-specific device handlers | 199 | * struct rproc_ops - platform-specific device handlers |
152 | * @start: power on the device and boot it | 200 | * @start: power on the device and boot it |
153 | * @stop: power off the device | 201 | * @stop: power off the device |
154 | * @kick: kick a virtqueue (virtqueue id given as a parameter) | 202 | * @kick: kick a virtqueue (virtqueue id given as a parameter) |
155 | */ | 203 | */ |
156 | struct rproc_ops { | 204 | struct rproc_ops { |
157 | int (*start)(struct rproc *rproc); | 205 | int (*start)(struct rproc *rproc); |
158 | int (*stop)(struct rproc *rproc); | 206 | int (*stop)(struct rproc *rproc); |
159 | void (*kick)(struct rproc *rproc, int vqid); | 207 | void (*kick)(struct rproc *rproc, int vqid); |
160 | }; | 208 | }; |
161 | 209 | ||
162 | Every remoteproc implementation should at least provide the ->start and ->stop | 210 | Every remoteproc implementation should at least provide the ->start and ->stop |
163 | handlers. If rpmsg/virtio functionality is also desired, then the ->kick handler | 211 | handlers. If rpmsg/virtio functionality is also desired, then the ->kick handler |
@@ -179,7 +227,8 @@ the exact virtqueue index to look in is optional: it is easy (and not | |||
179 | too expensive) to go through the existing virtqueues and look for new buffers | 227 | too expensive) to go through the existing virtqueues and look for new buffers |
180 | in the used rings. | 228 | in the used rings. |
181 | 229 | ||
182 | 6. Binary Firmware Structure | 230 | Binary Firmware Structure |
231 | ========================= | ||
183 | 232 | ||
184 | At this point remoteproc only supports ELF32 firmware binaries. However, | 233 | At this point remoteproc only supports ELF32 firmware binaries. However, |
185 | it is quite expected that other platforms/devices which we'd want to | 234 | it is quite expected that other platforms/devices which we'd want to |
@@ -207,43 +256,43 @@ resource entries that publish the existence of supported features | |||
207 | or configurations by the remote processor, such as trace buffers and | 256 | or configurations by the remote processor, such as trace buffers and |
208 | supported virtio devices (and their configurations). | 257 | supported virtio devices (and their configurations). |
209 | 258 | ||
210 | The resource table begins with this header: | 259 | The resource table begins with this header:: |
211 | 260 | ||
212 | /** | 261 | /** |
213 | * struct resource_table - firmware resource table header | 262 | * struct resource_table - firmware resource table header |
214 | * @ver: version number | 263 | * @ver: version number |
215 | * @num: number of resource entries | 264 | * @num: number of resource entries |
216 | * @reserved: reserved (must be zero) | 265 | * @reserved: reserved (must be zero) |
217 | * @offset: array of offsets pointing at the various resource entries | 266 | * @offset: array of offsets pointing at the various resource entries |
218 | * | 267 | * |
219 | * The header of the resource table, as expressed by this structure, | 268 | * The header of the resource table, as expressed by this structure, |
220 | * contains a version number (should we need to change this format in the | 269 | * contains a version number (should we need to change this format in the |
221 | * future), the number of available resource entries, and their offsets | 270 | * future), the number of available resource entries, and their offsets |
222 | * in the table. | 271 | * in the table. |
223 | */ | 272 | */ |
224 | struct resource_table { | 273 | struct resource_table { |
225 | u32 ver; | 274 | u32 ver; |
226 | u32 num; | 275 | u32 num; |
227 | u32 reserved[2]; | 276 | u32 reserved[2]; |
228 | u32 offset[0]; | 277 | u32 offset[0]; |
229 | } __packed; | 278 | } __packed; |
230 | 279 | ||
231 | Immediately following this header are the resource entries themselves, | 280 | Immediately following this header are the resource entries themselves, |
232 | each of which begins with the following resource entry header: | 281 | each of which begins with the following resource entry header:: |
233 | 282 | ||
234 | /** | 283 | /** |
235 | * struct fw_rsc_hdr - firmware resource entry header | 284 | * struct fw_rsc_hdr - firmware resource entry header |
236 | * @type: resource type | 285 | * @type: resource type |
237 | * @data: resource data | 286 | * @data: resource data |
238 | * | 287 | * |
239 | * Every resource entry begins with a 'struct fw_rsc_hdr' header providing | 288 | * Every resource entry begins with a 'struct fw_rsc_hdr' header providing |
240 | * its @type. The content of the entry itself will immediately follow | 289 | * its @type. The content of the entry itself will immediately follow |
241 | * this header, and it should be parsed according to the resource type. | 290 | * this header, and it should be parsed according to the resource type. |
242 | */ | 291 | */ |
243 | struct fw_rsc_hdr { | 292 | struct fw_rsc_hdr { |
244 | u32 type; | 293 | u32 type; |
245 | u8 data[0]; | 294 | u8 data[0]; |
246 | } __packed; | 295 | } __packed; |
247 | 296 | ||
248 | Some resource entries are mere announcements, where the host is informed | 297 | Some resource entries are mere announcements, where the host is informed |
249 | of specific remoteproc configuration. Other entries require the host to | 298 | of specific remoteproc configuration. Other entries require the host to |
@@ -252,32 +301,32 @@ is expected, where the firmware requests a resource, and once allocated, | |||
252 | the host should provide back its details (e.g. address of an allocated | 301 | the host should provide back its details (e.g. address of an allocated |
253 | memory region). | 302 | memory region). |
254 | 303 | ||
255 | Here are the various resource types that are currently supported: | 304 | Here are the various resource types that are currently supported:: |
256 | 305 | ||
257 | /** | 306 | /** |
258 | * enum fw_resource_type - types of resource entries | 307 | * enum fw_resource_type - types of resource entries |
259 | * | 308 | * |
260 | * @RSC_CARVEOUT: request for allocation of a physically contiguous | 309 | * @RSC_CARVEOUT: request for allocation of a physically contiguous |
261 | * memory region. | 310 | * memory region. |
262 | * @RSC_DEVMEM: request to iommu_map a memory-based peripheral. | 311 | * @RSC_DEVMEM: request to iommu_map a memory-based peripheral. |
263 | * @RSC_TRACE: announces the availability of a trace buffer into which | 312 | * @RSC_TRACE: announces the availability of a trace buffer into which |
264 | * the remote processor will be writing logs. | 313 | * the remote processor will be writing logs. |
265 | * @RSC_VDEV: declare support for a virtio device, and serve as its | 314 | * @RSC_VDEV: declare support for a virtio device, and serve as its |
266 | * virtio header. | 315 | * virtio header. |
267 | * @RSC_LAST: just keep this one at the end | 316 | * @RSC_LAST: just keep this one at the end |
268 | * | 317 | * |
269 | * Please note that these values are used as indices to the rproc_handle_rsc | 318 | * Please note that these values are used as indices to the rproc_handle_rsc |
270 | * lookup table, so please keep them sane. Moreover, @RSC_LAST is used to | 319 | * lookup table, so please keep them sane. Moreover, @RSC_LAST is used to |
271 | * check the validity of an index before the lookup table is accessed, so | 320 | * check the validity of an index before the lookup table is accessed, so |
272 | * please update it as needed. | 321 | * please update it as needed. |
273 | */ | 322 | */ |
274 | enum fw_resource_type { | 323 | enum fw_resource_type { |
275 | RSC_CARVEOUT = 0, | 324 | RSC_CARVEOUT = 0, |
276 | RSC_DEVMEM = 1, | 325 | RSC_DEVMEM = 1, |
277 | RSC_TRACE = 2, | 326 | RSC_TRACE = 2, |
278 | RSC_VDEV = 3, | 327 | RSC_VDEV = 3, |
279 | RSC_LAST = 4, | 328 | RSC_LAST = 4, |
280 | }; | 329 | }; |
281 | 330 | ||
282 | For more details regarding a specific resource type, please see its | 331 | For more details regarding a specific resource type, please see its |
283 | dedicated structure in include/linux/remoteproc.h. | 332 | dedicated structure in include/linux/remoteproc.h. |
@@ -286,7 +335,8 @@ We also expect that platform-specific resource entries will show up | |||
286 | at some point. When that happens, we could easily add a new RSC_PLATFORM | 335 | at some point. When that happens, we could easily add a new RSC_PLATFORM |
287 | type, and hand those resources to the platform-specific rproc driver to handle. | 336 | type, and hand those resources to the platform-specific rproc driver to handle. |
288 | 337 | ||
289 | 7. Virtio and remoteproc | 338 | Virtio and remoteproc |
339 | ===================== | ||
290 | 340 | ||
291 | The firmware should provide remoteproc information about virtio devices | 341 | The firmware should provide remoteproc information about virtio devices |
292 | that it supports, and their configurations: a RSC_VDEV resource entry | 342 | that it supports, and their configurations: a RSC_VDEV resource entry |
diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt index 8c174063b3f0..a289285d2412 100644 --- a/Documentation/rfkill.txt +++ b/Documentation/rfkill.txt | |||
@@ -1,13 +1,13 @@ | |||
1 | =============================== | ||
1 | rfkill - RF kill switch support | 2 | rfkill - RF kill switch support |
2 | =============================== | 3 | =============================== |
3 | 4 | ||
4 | 1. Introduction | ||
5 | 2. Implementation details | ||
6 | 3. Kernel API | ||
7 | 4. Userspace support | ||
8 | 5 | ||
6 | .. contents:: | ||
7 | :depth: 2 | ||
9 | 8 | ||
10 | 1. Introduction | 9 | Introduction |
10 | ============ | ||
11 | 11 | ||
12 | The rfkill subsystem provides a generic interface to disabling any radio | 12 | The rfkill subsystem provides a generic interface to disabling any radio |
13 | transmitter in the system. When a transmitter is blocked, it shall not | 13 | transmitter in the system. When a transmitter is blocked, it shall not |
@@ -21,17 +21,24 @@ aircraft. | |||
21 | The rfkill subsystem has a concept of "hard" and "soft" block, which | 21 | The rfkill subsystem has a concept of "hard" and "soft" block, which |
22 | differ little in their meaning (block == transmitters off) but rather in | 22 | differ little in their meaning (block == transmitters off) but rather in |
23 | whether they can be changed or not: | 23 | whether they can be changed or not: |
24 | - hard block: read-only radio block that cannot be overridden by software | 24 | |
25 | - soft block: writable radio block (need not be readable) that is set by | 25 | - hard block |
26 | the system software. | 26 | read-only radio block that cannot be overridden by software |
27 | |||
28 | - soft block | ||
29 | writable radio block (need not be readable) that is set by | ||
30 | the system software. | ||
27 | 31 | ||
28 | The rfkill subsystem has two parameters, rfkill.default_state and | 32 | The rfkill subsystem has two parameters, rfkill.default_state and |
29 | rfkill.master_switch_mode, which are documented in admin-guide/kernel-parameters.rst. | 33 | rfkill.master_switch_mode, which are documented in |
34 | admin-guide/kernel-parameters.rst. | ||
30 | 35 | ||
31 | 36 | ||
32 | 2. Implementation details | 37 | Implementation details |
38 | ====================== | ||
33 | 39 | ||
34 | The rfkill subsystem is composed of three main components: | 40 | The rfkill subsystem is composed of three main components: |
41 | |||
35 | * the rfkill core, | 42 | * the rfkill core, |
36 | * the deprecated rfkill-input module (an input layer handler, being | 43 | * the deprecated rfkill-input module (an input layer handler, being |
37 | replaced by userspace policy code) and | 44 | replaced by userspace policy code) and |
@@ -55,7 +62,8 @@ use the return value of rfkill_set_hw_state() unless the hardware actually | |||
55 | keeps track of soft and hard block separately. | 62 | keeps track of soft and hard block separately. |
56 | 63 | ||
57 | 64 | ||
58 | 3. Kernel API | 65 | Kernel API |
66 | ========== | ||
59 | 67 | ||
60 | 68 | ||
61 | Drivers for radio transmitters normally implement an rfkill driver. | 69 | Drivers for radio transmitters normally implement an rfkill driver. |
@@ -69,7 +77,7 @@ For some platforms, it is possible that the hardware state changes during | |||
69 | suspend/hibernation, in which case it will be necessary to update the rfkill | 77 | suspend/hibernation, in which case it will be necessary to update the rfkill |
70 | core with the current state at resume time. | 78 | core with the current state at resume time.
71 | 79 | ||
72 | To create an rfkill driver, the driver's Kconfig needs to have | 80 | To create an rfkill driver, the driver's Kconfig needs to have::
73 | 81 | ||
74 | depends on RFKILL || !RFKILL | 82 | depends on RFKILL || !RFKILL |
75 | 83 | ||
@@ -87,7 +95,8 @@ RFKill provides per-switch LED triggers, which can be used to drive LEDs | |||
87 | according to the switch state (LED_FULL when blocked, LED_OFF otherwise). | 95 | according to the switch state (LED_FULL when blocked, LED_OFF otherwise). |
88 | 96 | ||
89 | 97 | ||
90 | 5. Userspace support | 98 | Userspace support |
99 | ================= | ||
91 | 100 | ||
92 | The recommended userspace interface to use is /dev/rfkill, which is a misc | 101 | The recommended userspace interface to use is /dev/rfkill, which is a misc |
93 | character device that allows userspace to obtain and set the state of rfkill | 102 | character device that allows userspace to obtain and set the state of rfkill |
@@ -112,11 +121,11 @@ rfkill core framework. | |||
112 | Additionally, each rfkill device is registered in sysfs and emits uevents. | 121 | Additionally, each rfkill device is registered in sysfs and emits uevents. |
113 | 122 | ||
114 | rfkill devices issue uevents (with an action of "change"), with the following | 123 | rfkill devices issue uevents (with an action of "change"), with the following |
115 | environment variables set: | 124 | environment variables set:: |
116 | 125 | ||
117 | RFKILL_NAME | 126 | RFKILL_NAME |
118 | RFKILL_STATE | 127 | RFKILL_STATE |
119 | RFKILL_TYPE | 128 | RFKILL_TYPE |
120 | 129 | ||
121 | The contents of these variables correspond to the "name", "state" and | 130 | The contents of these variables correspond to the "name", "state" and
122 | "type" sysfs files explained above. | 131 | "type" sysfs files explained above. |
diff --git a/Documentation/robust-futex-ABI.txt b/Documentation/robust-futex-ABI.txt index 16eb314f56cc..8a5d34abf726 100644 --- a/Documentation/robust-futex-ABI.txt +++ b/Documentation/robust-futex-ABI.txt | |||
@@ -1,7 +1,9 @@ | |||
1 | Started by Paul Jackson <pj@sgi.com> | 1 | ==================== |
2 | |||
3 | The robust futex ABI | 2 | The robust futex ABI |
4 | -------------------- | 3 | ==================== |
4 | |||
5 | :Author: Started by Paul Jackson <pj@sgi.com> | ||
6 | |||
5 | 7 | ||
6 | Robust_futexes provide a mechanism that is used in addition to normal | 8 | Robust_futexes provide a mechanism that is used in addition to normal |
7 | futexes, for kernel assist of cleanup of held locks on task exit. | 9 | futexes, for kernel assist of cleanup of held locks on task exit. |
@@ -32,7 +34,7 @@ probably causing deadlock or other such failure of the other threads | |||
32 | waiting on the same locks. | 34 | waiting on the same locks. |
33 | 35 | ||
34 | A thread that anticipates possibly using robust_futexes should first | 36 | A thread that anticipates possibly using robust_futexes should first |
35 | issue the system call: | 37 | issue the system call:: |
36 | 38 | ||
37 | asmlinkage long | 39 | asmlinkage long |
38 | sys_set_robust_list(struct robust_list_head __user *head, size_t len); | 40 | sys_set_robust_list(struct robust_list_head __user *head, size_t len); |
@@ -91,7 +93,7 @@ that lock using the futex mechanism. | |||
91 | When a thread has invoked the above system call to indicate it | 93 | When a thread has invoked the above system call to indicate it |
92 | anticipates using robust_futexes, the kernel stores the passed in 'head' | 94 | anticipates using robust_futexes, the kernel stores the passed in 'head' |
93 | pointer for that task. The task may retrieve that value later on by | 95 | pointer for that task. The task may retrieve that value later on by |
94 | using the system call: | 96 | using the system call:: |
95 | 97 | ||
96 | asmlinkage long | 98 | asmlinkage long |
97 | sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr, | 99 | sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr, |
@@ -135,6 +137,7 @@ manipulating this list), the user code must observe the following | |||
135 | protocol on 'lock entry' insertion and removal: | 137 | protocol on 'lock entry' insertion and removal: |
136 | 138 | ||
137 | On insertion: | 139 | On insertion: |
140 | |||
138 | 1) set the 'list_op_pending' word to the address of the 'lock entry' | 141 | 1) set the 'list_op_pending' word to the address of the 'lock entry' |
139 | to be inserted, | 142 | to be inserted, |
140 | 2) acquire the futex lock, | 143 | 2) acquire the futex lock, |
@@ -143,6 +146,7 @@ On insertion: | |||
143 | 4) clear the 'list_op_pending' word. | 146 | 4) clear the 'list_op_pending' word. |
144 | 147 | ||
145 | On removal: | 148 | On removal: |
149 | |||
146 | 1) set the 'list_op_pending' word to the address of the 'lock entry' | 150 | 1) set the 'list_op_pending' word to the address of the 'lock entry' |
147 | to be removed, | 151 | to be removed, |
148 | 2) remove the lock entry for this lock from the 'head' list, | 152 | 2) remove the lock entry for this lock from the 'head' list, |
diff --git a/Documentation/robust-futexes.txt b/Documentation/robust-futexes.txt index 61c22d608759..6c42c75103eb 100644 --- a/Documentation/robust-futexes.txt +++ b/Documentation/robust-futexes.txt | |||
@@ -1,4 +1,8 @@ | |||
1 | Started by: Ingo Molnar <mingo@redhat.com> | 1 | ======================================== |
2 | A description of what robust futexes are | ||
3 | ======================================== | ||
4 | |||
5 | :Started by: Ingo Molnar <mingo@redhat.com> | ||
2 | 6 | ||
3 | Background | 7 | Background |
4 | ---------- | 8 | ---------- |
@@ -163,7 +167,7 @@ Implementation details | |||
163 | ---------------------- | 167 | ---------------------- |
164 | 168 | ||
165 | The patch adds two new syscalls: one to register the userspace list, and | 169 | The patch adds two new syscalls: one to register the userspace list, and |
166 | one to query the registered list pointer: | 170 | one to query the registered list pointer:: |
167 | 171 | ||
168 | asmlinkage long | 172 | asmlinkage long |
169 | sys_set_robust_list(struct robust_list_head __user *head, | 173 | sys_set_robust_list(struct robust_list_head __user *head, |
@@ -185,7 +189,7 @@ straightforward. The kernel doesn't have any internal distinction between | |||
185 | robust and normal futexes. | 189 | robust and normal futexes. |
186 | 190 | ||
187 | If a futex is found to be held at exit time, the kernel sets the | 191 | If a futex is found to be held at exit time, the kernel sets the |
188 | following bit of the futex word: | 192 | following bit of the futex word:: |
189 | 193 | ||
190 | #define FUTEX_OWNER_DIED 0x40000000 | 194 | #define FUTEX_OWNER_DIED 0x40000000 |
191 | 195 | ||
@@ -193,7 +197,7 @@ and wakes up the next futex waiter (if any). User-space does the rest of | |||
193 | the cleanup. | 197 | the cleanup. |
194 | 198 | ||
195 | Otherwise, robust futexes are acquired by glibc by putting the TID into | 199 | Otherwise, robust futexes are acquired by glibc by putting the TID into |
196 | the futex field atomically. Waiters set the FUTEX_WAITERS bit: | 200 | the futex field atomically. Waiters set the FUTEX_WAITERS bit:: |
197 | 201 | ||
198 | #define FUTEX_WAITERS 0x80000000 | 202 | #define FUTEX_WAITERS 0x80000000 |
199 | 203 | ||
diff --git a/Documentation/rpmsg.txt b/Documentation/rpmsg.txt index a95e36a43288..24b7a9e1a5f9 100644 --- a/Documentation/rpmsg.txt +++ b/Documentation/rpmsg.txt | |||
@@ -1,10 +1,15 @@ | |||
1 | ============================================ | ||
1 | Remote Processor Messaging (rpmsg) Framework | 2 | Remote Processor Messaging (rpmsg) Framework |
3 | ============================================ | ||
2 | 4 | ||
3 | Note: this document describes the rpmsg bus and how to write rpmsg drivers. | 5 | .. note:: |
4 | To learn how to add rpmsg support for new platforms, check out remoteproc.txt | ||
5 | (also a resident of Documentation/). | ||
6 | 6 | ||
7 | 1. Introduction | 7 | This document describes the rpmsg bus and how to write rpmsg drivers. |
8 | To learn how to add rpmsg support for new platforms, check out remoteproc.txt | ||
9 | (also a resident of Documentation/). | ||
10 | |||
11 | Introduction | ||
12 | ============ | ||
8 | 13 | ||
9 | Modern SoCs typically employ heterogeneous remote processor devices in | 14 | Modern SoCs typically employ heterogeneous remote processor devices in |
10 | asymmetric multiprocessing (AMP) configurations, which may be running | 15 | asymmetric multiprocessing (AMP) configurations, which may be running |
@@ -58,170 +63,222 @@ to their destination address (this is done by invoking the driver's rx handler | |||
58 | with the payload of the inbound message). | 63 | with the payload of the inbound message). |
59 | 64 | ||
60 | 65 | ||
61 | 2. User API | 66 | User API |
67 | ======== | ||
68 | |||
69 | :: | ||
62 | 70 | ||
63 | int rpmsg_send(struct rpmsg_channel *rpdev, void *data, int len); | 71 | int rpmsg_send(struct rpmsg_channel *rpdev, void *data, int len); |
64 | - sends a message across to the remote processor on a given channel. | 72 | |
65 | The caller should specify the channel, the data it wants to send, | 73 | sends a message across to the remote processor on a given channel. |
66 | and its length (in bytes). The message will be sent on the specified | 74 | The caller should specify the channel, the data it wants to send, |
67 | channel, i.e. its source and destination address fields will be | 75 | and its length (in bytes). The message will be sent on the specified |
68 | set to the channel's src and dst addresses. | 76 | channel, i.e. its source and destination address fields will be |
69 | 77 | set to the channel's src and dst addresses. | |
70 | In case there are no TX buffers available, the function will block until | 78 | |
71 | one becomes available (i.e. until the remote processor consumes | 79 | In case there are no TX buffers available, the function will block until |
72 | a tx buffer and puts it back on virtio's used descriptor ring), | 80 | one becomes available (i.e. until the remote processor consumes |
73 | or a timeout of 15 seconds elapses. When the latter happens, | 81 | a tx buffer and puts it back on virtio's used descriptor ring), |
74 | -ERESTARTSYS is returned. | 82 | or a timeout of 15 seconds elapses. When the latter happens, |
75 | The function can only be called from a process context (for now). | 83 | -ERESTARTSYS is returned. |
76 | Returns 0 on success and an appropriate error value on failure. | 84 | |
85 | The function can only be called from a process context (for now). | ||
86 | Returns 0 on success and an appropriate error value on failure. | ||
87 | |||
88 | :: | ||
77 | 89 | ||
78 | int rpmsg_sendto(struct rpmsg_channel *rpdev, void *data, int len, u32 dst); | 90 | int rpmsg_sendto(struct rpmsg_channel *rpdev, void *data, int len, u32 dst); |
79 | - sends a message across to the remote processor on a given channel, | 91 | |
80 | to a destination address provided by the caller. | 92 | sends a message across to the remote processor on a given channel, |
81 | The caller should specify the channel, the data it wants to send, | 93 | to a destination address provided by the caller. |
82 | its length (in bytes), and an explicit destination address. | 94 | |
83 | The message will then be sent to the remote processor to which the | 95 | The caller should specify the channel, the data it wants to send, |
84 | channel belongs, using the channel's src address, and the user-provided | 96 | its length (in bytes), and an explicit destination address. |
85 | dst address (thus the channel's dst address will be ignored). | 97 | |
86 | 98 | The message will then be sent to the remote processor to which the | |
87 | In case there are no TX buffers available, the function will block until | 99 | channel belongs, using the channel's src address, and the user-provided |
88 | one becomes available (i.e. until the remote processor consumes | 100 | dst address (thus the channel's dst address will be ignored). |
89 | a tx buffer and puts it back on virtio's used descriptor ring), | 101 | |
90 | or a timeout of 15 seconds elapses. When the latter happens, | 102 | In case there are no TX buffers available, the function will block until |
91 | -ERESTARTSYS is returned. | 103 | one becomes available (i.e. until the remote processor consumes |
92 | The function can only be called from a process context (for now). | 104 | a tx buffer and puts it back on virtio's used descriptor ring), |
93 | Returns 0 on success and an appropriate error value on failure. | 105 | or a timeout of 15 seconds elapses. When the latter happens, |
106 | -ERESTARTSYS is returned. | ||
107 | |||
108 | The function can only be called from a process context (for now). | ||
109 | Returns 0 on success and an appropriate error value on failure. | ||
110 | |||
111 | :: | ||
94 | 112 | ||
95 | int rpmsg_send_offchannel(struct rpmsg_channel *rpdev, u32 src, u32 dst, | 113 | int rpmsg_send_offchannel(struct rpmsg_channel *rpdev, u32 src, u32 dst, |
96 | void *data, int len); | 114 | void *data, int len); |
97 | - sends a message across to the remote processor, using the src and dst | 115 | |
98 | addresses provided by the user. | 116 | |
99 | The caller should specify the channel, the data it wants to send, | 117 | sends a message across to the remote processor, using the src and dst |
100 | its length (in bytes), and explicit source and destination addresses. | 118 | addresses provided by the user. |
101 | The message will then be sent to the remote processor to which the | 119 | |
102 | channel belongs, but the channel's src and dst addresses will be | 120 | The caller should specify the channel, the data it wants to send, |
103 | ignored (and the user-provided addresses will be used instead). | 121 | its length (in bytes), and explicit source and destination addresses. |
104 | 122 | The message will then be sent to the remote processor to which the | |
105 | In case there are no TX buffers available, the function will block until | 123 | channel belongs, but the channel's src and dst addresses will be |
106 | one becomes available (i.e. until the remote processor consumes | 124 | ignored (and the user-provided addresses will be used instead). |
107 | a tx buffer and puts it back on virtio's used descriptor ring), | 125 | |
108 | or a timeout of 15 seconds elapses. When the latter happens, | 126 | In case there are no TX buffers available, the function will block until |
109 | -ERESTARTSYS is returned. | 127 | one becomes available (i.e. until the remote processor consumes |
110 | The function can only be called from a process context (for now). | 128 | a tx buffer and puts it back on virtio's used descriptor ring), |
111 | Returns 0 on success and an appropriate error value on failure. | 129 | or a timeout of 15 seconds elapses. When the latter happens, |
130 | -ERESTARTSYS is returned. | ||
131 | |||
132 | The function can only be called from a process context (for now). | ||
133 | Returns 0 on success and an appropriate error value on failure. | ||
134 | |||
135 | :: | ||
112 | 136 | ||
113 | int rpmsg_trysend(struct rpmsg_channel *rpdev, void *data, int len); | 137 | int rpmsg_trysend(struct rpmsg_channel *rpdev, void *data, int len); |
114 | - sends a message across to the remote processor on a given channel. | ||
115 | The caller should specify the channel, the data it wants to send, | ||
116 | and its length (in bytes). The message will be sent on the specified | ||
117 | channel, i.e. its source and destination address fields will be | ||
118 | set to the channel's src and dst addresses. | ||
119 | 138 | ||
120 | In case there are no TX buffers available, the function will immediately | 139 | sends a message across to the remote processor on a given channel. |
121 | return -ENOMEM without waiting until one becomes available. | 140 | The caller should specify the channel, the data it wants to send, |
122 | The function can only be called from a process context (for now). | 141 | and its length (in bytes). The message will be sent on the specified |
123 | Returns 0 on success and an appropriate error value on failure. | 142 | channel, i.e. its source and destination address fields will be |
143 | set to the channel's src and dst addresses. | ||
144 | |||
145 | In case there are no TX buffers available, the function will immediately | ||
146 | return -ENOMEM without waiting until one becomes available. | ||
147 | |||
148 | The function can only be called from a process context (for now). | ||
149 | Returns 0 on success and an appropriate error value on failure. | ||
150 | |||
151 | :: | ||
124 | 152 | ||
125 | int rpmsg_trysendto(struct rpmsg_channel *rpdev, void *data, int len, u32 dst) | 153 | int rpmsg_trysendto(struct rpmsg_channel *rpdev, void *data, int len, u32 dst) |
126 | - sends a message across to the remote processor on a given channel, | 154 | |
127 | to a destination address provided by the user. | 155 | |
128 | The user should specify the channel, the data it wants to send, | 156 | sends a message across to the remote processor on a given channel, |
129 | its length (in bytes), and an explicit destination address. | 157 | to a destination address provided by the user. |
130 | The message will then be sent to the remote processor to which the | 158 | |
131 | channel belongs, using the channel's src address, and the user-provided | 159 | The user should specify the channel, the data it wants to send, |
132 | dst address (thus the channel's dst address will be ignored). | 160 | its length (in bytes), and an explicit destination address. |
133 | 161 | ||
134 | In case there are no TX buffers available, the function will immediately | 162 | The message will then be sent to the remote processor to which the |
135 | return -ENOMEM without waiting until one becomes available. | 163 | channel belongs, using the channel's src address, and the user-provided |
136 | The function can only be called from a process context (for now). | 164 | dst address (thus the channel's dst address will be ignored). |
137 | Returns 0 on success and an appropriate error value on failure. | 165 | |
166 | In case there are no TX buffers available, the function will immediately | ||
167 | return -ENOMEM without waiting until one becomes available. | ||
168 | |||
169 | The function can only be called from a process context (for now). | ||
170 | Returns 0 on success and an appropriate error value on failure. | ||
171 | |||
172 | :: | ||
138 | 173 | ||
139 | int rpmsg_trysend_offchannel(struct rpmsg_channel *rpdev, u32 src, u32 dst, | 174 | int rpmsg_trysend_offchannel(struct rpmsg_channel *rpdev, u32 src, u32 dst, |
140 | void *data, int len); | 175 | void *data, int len); |
141 | - sends a message across to the remote processor, using source and | 176 | |
142 | destination addresses provided by the user. | 177 | |
143 | The user should specify the channel, the data it wants to send, | 178 | sends a message across to the remote processor, using source and |
144 | its length (in bytes), and explicit source and destination addresses. | 179 | destination addresses provided by the user. |
145 | The message will then be sent to the remote processor to which the | 180 | |
146 | channel belongs, but the channel's src and dst addresses will be | 181 | The user should specify the channel, the data it wants to send, |
147 | ignored (and the user-provided addresses will be used instead). | 182 | its length (in bytes), and explicit source and destination addresses. |
148 | 183 | The message will then be sent to the remote processor to which the | |
149 | In case there are no TX buffers available, the function will immediately | 184 | channel belongs, but the channel's src and dst addresses will be |
150 | return -ENOMEM without waiting until one becomes available. | 185 | ignored (and the user-provided addresses will be used instead). |
151 | The function can only be called from a process context (for now). | 186 | |
152 | Returns 0 on success and an appropriate error value on failure. | 187 | In case there are no TX buffers available, the function will immediately |
188 | return -ENOMEM without waiting until one becomes available. | ||
189 | |||
190 | The function can only be called from a process context (for now). | ||
191 | Returns 0 on success and an appropriate error value on failure. | ||
192 | |||
193 | :: | ||
153 | 194 | ||
154 | struct rpmsg_endpoint *rpmsg_create_ept(struct rpmsg_channel *rpdev, | 195 | struct rpmsg_endpoint *rpmsg_create_ept(struct rpmsg_channel *rpdev, |
155 | void (*cb)(struct rpmsg_channel *, void *, int, void *, u32), | 196 | void (*cb)(struct rpmsg_channel *, void *, int, void *, u32), |
156 | void *priv, u32 addr); | 197 | void *priv, u32 addr); |
157 | - every rpmsg address in the system is bound to an rx callback (so when | 198 | |
158 | inbound messages arrive, they are dispatched by the rpmsg bus using the | 199 | every rpmsg address in the system is bound to an rx callback (so when |
159 | appropriate callback handler) by means of an rpmsg_endpoint struct. | 200 | inbound messages arrive, they are dispatched by the rpmsg bus using the |
160 | 201 | appropriate callback handler) by means of an rpmsg_endpoint struct. | |
161 | This function allows drivers to create such an endpoint, and by that, | 202 | |
162 | bind a callback, and possibly some private data too, to an rpmsg address | 203 | This function allows drivers to create such an endpoint, and by that, |
163 | (either one that is known in advance, or one that will be dynamically | 204 | bind a callback, and possibly some private data too, to an rpmsg address |
164 | assigned for them). | 205 | (either one that is known in advance, or one that will be dynamically |
165 | 206 | assigned for them). | |
166 | Simple rpmsg drivers need not call rpmsg_create_ept, because an endpoint | 207 | |
167 | is already created for them when they are probed by the rpmsg bus | 208 | Simple rpmsg drivers need not call rpmsg_create_ept, because an endpoint |
168 | (using the rx callback they provide when they registered to the rpmsg bus). | 209 | is already created for them when they are probed by the rpmsg bus |
169 | 210 | (using the rx callback they provide when they registered to the rpmsg bus). | |
170 | So things should just work for simple drivers: they already have an | 211 | |
171 | endpoint, their rx callback is bound to their rpmsg address, and when | 212 | So things should just work for simple drivers: they already have an |
172 | relevant inbound messages arrive (i.e. messages whose dst address | 213 | endpoint, their rx callback is bound to their rpmsg address, and when
173 | equals the src address of their rpmsg channel), the driver's handler | 214 | relevant inbound messages arrive (i.e. messages whose dst address
174 | is invoked to process it. | 215 | equals the src address of their rpmsg channel), the driver's handler
175 | 216 | is invoked to process it. | |
176 | That said, more complicated drivers might need to allocate | 217 |
177 | additional rpmsg addresses, and bind them to different rx callbacks. | 218 | That said, more complicated drivers might need to allocate
178 | To accomplish that, those drivers need to call this function. | 219 | additional rpmsg addresses, and bind them to different rx callbacks. |
179 | Drivers should provide their channel (so the new endpoint would bind | 220 | To accomplish that, those drivers need to call this function. |
180 | to the same remote processor their channel belongs to), an rx callback | 221 | Drivers should provide their channel (so the new endpoint would bind |
181 | function, an optional private data (which is provided back when the | 222 | to the same remote processor their channel belongs to), an rx callback |
182 | rx callback is invoked), and an address they want to bind with the | 223 | function, an optional private data (which is provided back when the |
183 | callback. If addr is RPMSG_ADDR_ANY, then rpmsg_create_ept will | 224 | rx callback is invoked), and an address they want to bind with the |
184 | dynamically assign them an available rpmsg address (drivers should have | 225 | callback. If addr is RPMSG_ADDR_ANY, then rpmsg_create_ept will |
185 | a very good reason why not to always use RPMSG_ADDR_ANY here). | 226 | dynamically assign them an available rpmsg address (drivers should have |
186 | 227 | a very good reason why not to always use RPMSG_ADDR_ANY here). | |
187 | Returns a pointer to the endpoint on success, or NULL on error. | 228 | |
229 | Returns a pointer to the endpoint on success, or NULL on error. | ||
230 | |||
231 | :: | ||
188 | 232 | ||
189 | void rpmsg_destroy_ept(struct rpmsg_endpoint *ept); | 233 | void rpmsg_destroy_ept(struct rpmsg_endpoint *ept); |
190 | - destroys an existing rpmsg endpoint. user should provide a pointer | 234 | |
191 | to an rpmsg endpoint that was previously created with rpmsg_create_ept(). | 235 | |
236 | destroys an existing rpmsg endpoint. user should provide a pointer | ||
237 | to an rpmsg endpoint that was previously created with rpmsg_create_ept(). | ||
238 | |||
239 | :: | ||
192 | 240 | ||
193 | int register_rpmsg_driver(struct rpmsg_driver *rpdrv); | 241 | int register_rpmsg_driver(struct rpmsg_driver *rpdrv); |
194 | - registers an rpmsg driver with the rpmsg bus. user should provide | 242 | |
195 | a pointer to an rpmsg_driver struct, which contains the driver's | 243 | |
196 | ->probe() and ->remove() functions, an rx callback, and an id_table | 244 | registers an rpmsg driver with the rpmsg bus. user should provide |
197 | specifying the names of the channels this driver is interested to | 245 | a pointer to an rpmsg_driver struct, which contains the driver's |
198 | be probed with. | 246 | ->probe() and ->remove() functions, an rx callback, and an id_table |
247 | specifying the names of the channels this driver is interested to | ||
248 | be probed with. | ||
249 | |||
250 | :: | ||
199 | 251 | ||
200 | void unregister_rpmsg_driver(struct rpmsg_driver *rpdrv); | 252 | void unregister_rpmsg_driver(struct rpmsg_driver *rpdrv); |
201 | - unregisters an rpmsg driver from the rpmsg bus. user should provide | ||
202 | a pointer to a previously-registered rpmsg_driver struct. | ||
203 | Returns 0 on success, and an appropriate error value on failure. | ||
204 | 253 | ||
205 | 254 | ||
206 | 3. Typical usage | 255 | unregisters an rpmsg driver from the rpmsg bus. user should provide |
256 | a pointer to a previously-registered rpmsg_driver struct. | ||
257 | Returns 0 on success, and an appropriate error value on failure. | ||
258 | |||
259 | |||
260 | Typical usage | ||
261 | ============= | ||
207 | 262 | ||
208 | The following is a simple rpmsg driver that sends a "hello!" message | 263 | The following is a simple rpmsg driver that sends a "hello!" message |
209 | on probe(), and whenever it receives an incoming message, it dumps its | 264 | on probe(), and whenever it receives an incoming message, it dumps its |
210 | content to the console. | 265 | content to the console. |
211 | 266 | ||
212 | #include <linux/kernel.h> | 267 | :: |
213 | #include <linux/module.h> | 268 | |
214 | #include <linux/rpmsg.h> | 269 | #include <linux/kernel.h> |
270 | #include <linux/module.h> | ||
271 | #include <linux/rpmsg.h> | ||
215 | 272 | ||
216 | static void rpmsg_sample_cb(struct rpmsg_channel *rpdev, void *data, int len, | 273 | static void rpmsg_sample_cb(struct rpmsg_channel *rpdev, void *data, int len, |
217 | void *priv, u32 src) | 274 | void *priv, u32 src) |
218 | { | 275 | { |
219 | print_hex_dump(KERN_INFO, "incoming message:", DUMP_PREFIX_NONE, | 276 | print_hex_dump(KERN_INFO, "incoming message:", DUMP_PREFIX_NONE, |
220 | 16, 1, data, len, true); | 277 | 16, 1, data, len, true); |
221 | } | 278 | } |
222 | 279 | ||
223 | static int rpmsg_sample_probe(struct rpmsg_channel *rpdev) | 280 | static int rpmsg_sample_probe(struct rpmsg_channel *rpdev) |
224 | { | 281 | { |
225 | int err; | 282 | int err; |
226 | 283 | ||
227 | dev_info(&rpdev->dev, "chnl: 0x%x -> 0x%x\n", rpdev->src, rpdev->dst); | 284 | dev_info(&rpdev->dev, "chnl: 0x%x -> 0x%x\n", rpdev->src, rpdev->dst); |
@@ -234,32 +291,35 @@ static int rpmsg_sample_probe(struct rpmsg_channel *rpdev) | |||
234 | } | 291 | } |
235 | 292 | ||
236 | return 0; | 293 | return 0; |
237 | } | 294 | } |
238 | 295 | ||
239 | static void rpmsg_sample_remove(struct rpmsg_channel *rpdev) | 296 | static void rpmsg_sample_remove(struct rpmsg_channel *rpdev) |
240 | { | 297 | { |
241 | dev_info(&rpdev->dev, "rpmsg sample client driver is removed\n"); | 298 | dev_info(&rpdev->dev, "rpmsg sample client driver is removed\n"); |
242 | } | 299 | } |
243 | 300 | ||
244 | static struct rpmsg_device_id rpmsg_driver_sample_id_table[] = { | 301 | static struct rpmsg_device_id rpmsg_driver_sample_id_table[] = { |
245 | { .name = "rpmsg-client-sample" }, | 302 | { .name = "rpmsg-client-sample" }, |
246 | { }, | 303 | { }, |
247 | }; | 304 | }; |
248 | MODULE_DEVICE_TABLE(rpmsg, rpmsg_driver_sample_id_table); | 305 | MODULE_DEVICE_TABLE(rpmsg, rpmsg_driver_sample_id_table); |
249 | 306 | ||
250 | static struct rpmsg_driver rpmsg_sample_client = { | 307 | static struct rpmsg_driver rpmsg_sample_client = { |
251 | .drv.name = KBUILD_MODNAME, | 308 | .drv.name = KBUILD_MODNAME, |
252 | .id_table = rpmsg_driver_sample_id_table, | 309 | .id_table = rpmsg_driver_sample_id_table, |
253 | .probe = rpmsg_sample_probe, | 310 | .probe = rpmsg_sample_probe, |
254 | .callback = rpmsg_sample_cb, | 311 | .callback = rpmsg_sample_cb, |
255 | .remove = rpmsg_sample_remove, | 312 | .remove = rpmsg_sample_remove, |
256 | }; | 313 | }; |
257 | module_rpmsg_driver(rpmsg_sample_client); | 314 | module_rpmsg_driver(rpmsg_sample_client); |
315 | |||
316 | .. note:: | ||
258 | 317 | ||
259 | Note: a similar sample which can be built and loaded can be found | 318 | a similar sample which can be built and loaded can be found |
260 | in samples/rpmsg/. | 319 | in samples/rpmsg/. |
261 | 320 | ||
262 | 4. Allocations of rpmsg channels: | 321 | Allocations of rpmsg channels |
322 | ============================= | ||
263 | 323 | ||
264 | At this point we only support dynamic allocations of rpmsg channels. | 324 | At this point we only support dynamic allocations of rpmsg channels. |
265 | 325 | ||
diff --git a/Documentation/sgi-ioc4.txt b/Documentation/sgi-ioc4.txt index 876c96ae38db..72709222d3c0 100644 --- a/Documentation/sgi-ioc4.txt +++ b/Documentation/sgi-ioc4.txt | |||
@@ -1,3 +1,7 @@ | |||
1 | ==================================== | ||
2 | SGI IOC4 PCI (multi function) device | ||
3 | ==================================== | ||
4 | |||
1 | The SGI IOC4 PCI device is a bit of a strange beast, so some notes on | 5 | The SGI IOC4 PCI device is a bit of a strange beast, so some notes on |
2 | it are in order. | 6 | it are in order. |
3 | 7 | ||
diff --git a/Documentation/siphash.txt b/Documentation/siphash.txt index 908d348ff777..9965821ab333 100644 --- a/Documentation/siphash.txt +++ b/Documentation/siphash.txt | |||
@@ -1,6 +1,8 @@ | |||
1 | SipHash - a short input PRF | 1 | =========================== |
2 | ----------------------------------------------- | 2 | SipHash - a short input PRF |
3 | Written by Jason A. Donenfeld <jason@zx2c4.com> | 3 | =========================== |
4 | |||
5 | :Author: Written by Jason A. Donenfeld <jason@zx2c4.com> | ||
4 | 6 | ||
5 | SipHash is a cryptographically secure PRF -- a keyed hash function -- that | 7 | SipHash is a cryptographically secure PRF -- a keyed hash function -- that |
6 | performs very well for short inputs, hence the name. It was designed by | 8 | performs very well for short inputs, hence the name. It was designed by |
@@ -13,58 +15,61 @@ an input buffer or several input integers. It spits out an integer that is | |||
13 | indistinguishable from random. You may then use that integer as part of secure | 15 | indistinguishable from random. You may then use that integer as part of secure |
14 | sequence numbers, secure cookies, or mask it off for use in a hash table. | 16 | sequence numbers, secure cookies, or mask it off for use in a hash table. |
15 | 17 | ||
16 | 1. Generating a key | 18 | Generating a key |
19 | ================ | ||
17 | 20 | ||
18 | Keys should always be generated from a cryptographically secure source of | 21 | Keys should always be generated from a cryptographically secure source of |
19 | random numbers, either using get_random_bytes or get_random_once: | 22 | random numbers, either using get_random_bytes or get_random_once:: |
20 | 23 | ||
21 | siphash_key_t key; | 24 | siphash_key_t key; |
22 | get_random_bytes(&key, sizeof(key)); | 25 | get_random_bytes(&key, sizeof(key)); |
23 | 26 | ||
24 | If you're not deriving your key from here, you're doing it wrong. | 27 | If you're not deriving your key from here, you're doing it wrong. |
25 | 28 | ||
26 | 2. Using the functions | 29 | Using the functions |
30 | =================== | ||
27 | 31 | ||
28 | There are two variants of the function, one that takes a list of integers, and | 32 | There are two variants of the function, one that takes a list of integers, and |
29 | one that takes a buffer: | 33 | one that takes a buffer:: |
30 | 34 | ||
31 | u64 siphash(const void *data, size_t len, const siphash_key_t *key); | 35 | u64 siphash(const void *data, size_t len, const siphash_key_t *key); |
32 | 36 | ||
33 | And: | 37 | And:: |
34 | 38 | ||
35 | u64 siphash_1u64(u64, const siphash_key_t *key); | 39 | u64 siphash_1u64(u64, const siphash_key_t *key); |
36 | u64 siphash_2u64(u64, u64, const siphash_key_t *key); | 40 | u64 siphash_2u64(u64, u64, const siphash_key_t *key); |
37 | u64 siphash_3u64(u64, u64, u64, const siphash_key_t *key); | 41 | u64 siphash_3u64(u64, u64, u64, const siphash_key_t *key); |
38 | u64 siphash_4u64(u64, u64, u64, u64, const siphash_key_t *key); | 42 | u64 siphash_4u64(u64, u64, u64, u64, const siphash_key_t *key); |
39 | u64 siphash_1u32(u32, const siphash_key_t *key); | 43 | u64 siphash_1u32(u32, const siphash_key_t *key); |
40 | u64 siphash_2u32(u32, u32, const siphash_key_t *key); | 44 | u64 siphash_2u32(u32, u32, const siphash_key_t *key); |
41 | u64 siphash_3u32(u32, u32, u32, const siphash_key_t *key); | 45 | u64 siphash_3u32(u32, u32, u32, const siphash_key_t *key); |
42 | u64 siphash_4u32(u32, u32, u32, u32, const siphash_key_t *key); | 46 | u64 siphash_4u32(u32, u32, u32, u32, const siphash_key_t *key); |
43 | 47 | ||
44 | If you pass the generic siphash function something of a constant length, it | 48 | If you pass the generic siphash function something of a constant length, it |
45 | will constant fold at compile-time and automatically choose one of the | 49 | will constant fold at compile-time and automatically choose one of the |
46 | optimized functions. | 50 | optimized functions. |
47 | 51 | ||
48 | 3. Hashtable key function usage: | 52 | Hashtable key function usage:: |
49 | 53 | ||
50 | struct some_hashtable { | 54 | struct some_hashtable { |
51 | DECLARE_HASHTABLE(hashtable, 8); | 55 | DECLARE_HASHTABLE(hashtable, 8); |
52 | siphash_key_t key; | 56 | siphash_key_t key; |
53 | }; | 57 | }; |
54 | 58 | ||
55 | void init_hashtable(struct some_hashtable *table) | 59 | void init_hashtable(struct some_hashtable *table) |
56 | { | 60 | { |
57 | get_random_bytes(&table->key, sizeof(table->key)); | 61 | get_random_bytes(&table->key, sizeof(table->key)); |
58 | } | 62 | } |
59 | 63 | ||
60 | static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input) | 64 | static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input) |
61 | { | 65 | { |
62 | return &table->hashtable[siphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)]; | 66 | return &table->hashtable[siphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)]; |
63 | } | 67 | } |
64 | 68 | ||
65 | You may then iterate like usual over the returned hash bucket. | 69 | You may then iterate like usual over the returned hash bucket. |
66 | 70 | ||
67 | 4. Security | 71 | Security |
72 | ======== | ||
68 | 73 | ||
69 | SipHash has a very high security margin, with its 128-bit key. So long as the | 74 | SipHash has a very high security margin, with its 128-bit key. So long as the |
70 | key is kept secret, it is impossible for an attacker to guess the outputs of | 75 | key is kept secret, it is impossible for an attacker to guess the outputs of |
@@ -73,7 +78,8 @@ is significant. | |||
73 | 78 | ||
74 | Linux implements the "2-4" variant of SipHash. | 79 | Linux implements the "2-4" variant of SipHash. |
75 | 80 | ||
76 | 5. Struct-passing Pitfalls | 81 | Struct-passing Pitfalls |
82 | ======================= | ||
77 | 83 | ||
78 | Oftentimes the XuY functions will not be large enough, and instead you'll | 84 | Oftentimes the XuY functions will not be large enough, and instead you'll |
79 | want to pass a pre-filled struct to siphash. When doing this, it's important | 85 | want to pass a pre-filled struct to siphash. When doing this, it's important |
@@ -81,30 +87,32 @@ to always ensure the struct has no padding holes. The easiest way to do this | |||
81 | is to simply arrange the members of the struct in descending order of size, | 87 | is to simply arrange the members of the struct in descending order of size, |
82 | and to use offsetofend() instead of sizeof() for getting the size. For | 88 | and to use offsetofend() instead of sizeof() for getting the size. For |
83 | performance reasons, if possible, it's probably a good thing to align the | 89 | performance reasons, if possible, it's probably a good thing to align the |
84 | struct to the right boundary. Here's an example: | 90 | struct to the right boundary. Here's an example:: |
85 | 91 | ||
86 | const struct { | 92 | const struct { |
87 | struct in6_addr saddr; | 93 | struct in6_addr saddr; |
88 | u32 counter; | 94 | u32 counter; |
89 | u16 dport; | 95 | u16 dport; |
90 | } __aligned(SIPHASH_ALIGNMENT) combined = { | 96 | } __aligned(SIPHASH_ALIGNMENT) combined = { |
91 | .saddr = *(struct in6_addr *)saddr, | 97 | .saddr = *(struct in6_addr *)saddr, |
92 | .counter = counter, | 98 | .counter = counter, |
93 | .dport = dport | 99 | .dport = dport |
94 | }; | 100 | }; |
95 | u64 h = siphash(&combined, offsetofend(typeof(combined), dport), &secret); | 101 | u64 h = siphash(&combined, offsetofend(typeof(combined), dport), &secret); |
96 | 102 | ||
97 | 6. Resources | 103 | Resources |
104 | ========= | ||
98 | 105 | ||
99 | Read the SipHash paper if you're interested in learning more: | 106 | Read the SipHash paper if you're interested in learning more: |
100 | https://131002.net/siphash/siphash.pdf | 107 | https://131002.net/siphash/siphash.pdf |
101 | 108 | ||
109 | ------------------------------------------------------------------------------- | ||
102 | 110 | ||
103 | ~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~ | 111 | =============================================== |
104 | |||
105 | HalfSipHash - SipHash's insecure younger cousin | 112 | HalfSipHash - SipHash's insecure younger cousin |
106 | ----------------------------------------------- | 113 | =============================================== |
107 | Written by Jason A. Donenfeld <jason@zx2c4.com> | 114 | |
115 | :Author: Written by Jason A. Donenfeld <jason@zx2c4.com> | ||
108 | 116 | ||
109 | On the off-chance that SipHash is not fast enough for your needs, you might be | 117 | On the off-chance that SipHash is not fast enough for your needs, you might be |
110 | able to justify using HalfSipHash, a terrifying but potentially useful | 118 | able to justify using HalfSipHash, a terrifying but potentially useful |
@@ -120,7 +128,8 @@ then when you can be absolutely certain that the outputs will never be | |||
120 | transmitted out of the kernel. This is only remotely useful over `jhash` as a | 128 | transmitted out of the kernel. This is only remotely useful over `jhash` as a |
121 | means of mitigating hashtable flooding denial of service attacks. | 129 | means of mitigating hashtable flooding denial of service attacks. |
122 | 130 | ||
123 | 1. Generating a key | 131 | Generating a key |
132 | ================ | ||
124 | 133 | ||
125 | Keys should always be generated from a cryptographically secure source of | 134 | Keys should always be generated from a cryptographically secure source of |
126 | random numbers, either using get_random_bytes or get_random_once: | 135 | random numbers, either using get_random_bytes or get_random_once: |
@@ -130,44 +139,49 @@ get_random_bytes(&key, sizeof(key)); | |||
130 | 139 | ||
131 | If you're not deriving your key from here, you're doing it wrong. | 140 | If you're not deriving your key from here, you're doing it wrong. |
132 | 141 | ||
133 | 2. Using the functions | 142 | Using the functions |
143 | =================== | ||
134 | 144 | ||
135 | There are two variants of the function, one that takes a list of integers, and | 145 | There are two variants of the function, one that takes a list of integers, and |
136 | one that takes a buffer: | 146 | one that takes a buffer:: |
137 | 147 | ||
138 | u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key); | 148 | u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key); |
139 | 149 | ||
140 | And: | 150 | And:: |
141 | 151 | ||
142 | u32 hsiphash_1u32(u32, const hsiphash_key_t *key); | 152 | u32 hsiphash_1u32(u32, const hsiphash_key_t *key); |
143 | u32 hsiphash_2u32(u32, u32, const hsiphash_key_t *key); | 153 | u32 hsiphash_2u32(u32, u32, const hsiphash_key_t *key); |
144 | u32 hsiphash_3u32(u32, u32, u32, const hsiphash_key_t *key); | 154 | u32 hsiphash_3u32(u32, u32, u32, const hsiphash_key_t *key); |
145 | u32 hsiphash_4u32(u32, u32, u32, u32, const hsiphash_key_t *key); | 155 | u32 hsiphash_4u32(u32, u32, u32, u32, const hsiphash_key_t *key); |
146 | 156 | ||
147 | If you pass the generic hsiphash function something of a constant length, it | 157 | If you pass the generic hsiphash function something of a constant length, it |
148 | will constant fold at compile-time and automatically choose one of the | 158 | will constant fold at compile-time and automatically choose one of the |
149 | optimized functions. | 159 | optimized functions. |
150 | 160 | ||
151 | 3. Hashtable key function usage: | 161 | Hashtable key function usage |
162 | ============================ | ||
163 | |||
164 | :: | ||
152 | 165 | ||
153 | struct some_hashtable { | 166 | struct some_hashtable { |
154 | DECLARE_HASHTABLE(hashtable, 8); | 167 | DECLARE_HASHTABLE(hashtable, 8); |
155 | hsiphash_key_t key; | 168 | hsiphash_key_t key; |
156 | }; | 169 | }; |
157 | 170 | ||
158 | void init_hashtable(struct some_hashtable *table) | 171 | void init_hashtable(struct some_hashtable *table) |
159 | { | 172 | { |
160 | get_random_bytes(&table->key, sizeof(table->key)); | 173 | get_random_bytes(&table->key, sizeof(table->key)); |
161 | } | 174 | } |
162 | 175 | ||
163 | static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input) | 176 | static inline hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input) |
164 | { | 177 | { |
165 | return &table->hashtable[hsiphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)]; | 178 | return &table->hashtable[hsiphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)]; |
166 | } | 179 | } |
167 | 180 | ||
168 | You may then iterate like usual over the returned hash bucket. | 181 | You may then iterate like usual over the returned hash bucket. |
169 | 182 | ||
170 | 4. Performance | 183 | Performance |
184 | =========== | ||
171 | 185 | ||
172 | HalfSipHash is roughly 3 times slower than JenkinsHash. For many replacements, | 186 | HalfSipHash is roughly 3 times slower than JenkinsHash. For many replacements, |
173 | this will not be a problem, as the hashtable lookup isn't the bottleneck. And | 187 | this will not be a problem, as the hashtable lookup isn't the bottleneck. And |
diff --git a/Documentation/smsc_ece1099.txt b/Documentation/smsc_ece1099.txt index 6b492e82b43d..079277421eaf 100644 --- a/Documentation/smsc_ece1099.txt +++ b/Documentation/smsc_ece1099.txt | |||
@@ -1,3 +1,7 @@ | |||
1 | ================================================= | ||
2 | Msc Keyboard Scan Expansion/GPIO Expansion device | ||
3 | ================================================= | ||
4 | |||
1 | What is smsc-ece1099? | 5 | What is smsc-ece1099? |
2 | ---------------------- | 6 | ---------------------- |
3 | 7 | ||
diff --git a/Documentation/static-keys.txt b/Documentation/static-keys.txt index ef419fd0897f..b83dfa1c0602 100644 --- a/Documentation/static-keys.txt +++ b/Documentation/static-keys.txt | |||
@@ -1,30 +1,34 @@ | |||
1 | Static Keys | 1 | =========== |
2 | ----------- | 2 | Static Keys |
3 | =========== | ||
3 | 4 | ||
4 | DEPRECATED API: | 5 | .. warning:: |
5 | 6 | ||
6 | The use of 'struct static_key' directly, is now DEPRECATED. In addition | 7 | DEPRECATED API: |
7 | static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following: | ||
8 | 8 | ||
9 | struct static_key false = STATIC_KEY_INIT_FALSE; | 9 | The use of 'struct static_key' directly, is now DEPRECATED. In addition |
10 | struct static_key true = STATIC_KEY_INIT_TRUE; | 10 | static_key_{true,false}() is also DEPRECATED. IE DO NOT use the following:: |
11 | static_key_true() | ||
12 | static_key_false() | ||
13 | 11 | ||
14 | The updated API replacements are: | 12 | struct static_key false = STATIC_KEY_INIT_FALSE; |
13 | struct static_key true = STATIC_KEY_INIT_TRUE; | ||
14 | static_key_true() | ||
15 | static_key_false() | ||
15 | 16 | ||
16 | DEFINE_STATIC_KEY_TRUE(key); | 17 | The updated API replacements are:: |
17 | DEFINE_STATIC_KEY_FALSE(key); | ||
18 | DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); | ||
19 | DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); | ||
20 | static_branch_likely() | ||
21 | static_branch_unlikely() | ||
22 | 18 | ||
23 | 0) Abstract | 19 | DEFINE_STATIC_KEY_TRUE(key); |
20 | DEFINE_STATIC_KEY_FALSE(key); | ||
21 | DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); | ||
22 | DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); | ||
23 | static_branch_likely() | ||
24 | static_branch_unlikely() | ||
25 | |||
26 | Abstract | ||
27 | ======== | ||
24 | 28 | ||
25 | Static keys allow the inclusion of seldom used features in | 29 | Static keys allow the inclusion of seldom used features in |
26 | performance-sensitive fast-path kernel code, via a GCC feature and a code | 30 | performance-sensitive fast-path kernel code, via a GCC feature and a code |
27 | patching technique. A quick example: | 31 | patching technique. A quick example:: |
28 | 32 | ||
29 | DEFINE_STATIC_KEY_FALSE(key); | 33 | DEFINE_STATIC_KEY_FALSE(key); |
30 | 34 | ||
@@ -45,7 +49,8 @@ The static_branch_unlikely() branch will be generated into the code with as litt | |||
45 | impact to the likely code path as possible. | 49 | impact to the likely code path as possible. |
46 | 50 | ||
47 | 51 | ||
48 | 1) Motivation | 52 | Motivation |
53 | ========== | ||
49 | 54 | ||
50 | 55 | ||
51 | Currently, tracepoints are implemented using a conditional branch. The | 56 | Currently, tracepoints are implemented using a conditional branch. The |
@@ -60,7 +65,8 @@ possible. Although tracepoints are the original motivation for this work, other | |||
60 | kernel code paths should be able to make use of the static keys facility. | 65 | kernel code paths should be able to make use of the static keys facility. |
61 | 66 | ||
62 | 67 | ||
63 | 2) Solution | 68 | Solution |
69 | ======== | ||
64 | 70 | ||
65 | 71 | ||
66 | gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label: | 72 | gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label: |
@@ -71,7 +77,7 @@ Using the 'asm goto', we can create branches that are either taken or not taken | |||
71 | by default, without the need to check memory. Then, at run-time, we can patch | 77 | by default, without the need to check memory. Then, at run-time, we can patch |
72 | the branch site to change the branch direction. | 78 | the branch site to change the branch direction. |
73 | 79 | ||
74 | For example, if we have a simple branch that is disabled by default: | 80 | For example, if we have a simple branch that is disabled by default:: |
75 | 81 | ||
76 | if (static_branch_unlikely(&key)) | 82 | if (static_branch_unlikely(&key)) |
77 | printk("I am the true branch\n"); | 83 | printk("I am the true branch\n"); |
@@ -87,14 +93,15 @@ optimization. | |||
87 | This lowlevel patching mechanism is called 'jump label patching', and it gives | 93 | This lowlevel patching mechanism is called 'jump label patching', and it gives |
88 | the basis for the static keys facility. | 94 | the basis for the static keys facility. |
89 | 95 | ||
90 | 3) Static key label API, usage and examples: | 96 | Static key label API, usage and examples |
97 | ======================================== | ||
91 | 98 | ||
92 | 99 | ||
93 | In order to make use of this optimization you must first define a key: | 100 | In order to make use of this optimization you must first define a key:: |
94 | 101 | ||
95 | DEFINE_STATIC_KEY_TRUE(key); | 102 | DEFINE_STATIC_KEY_TRUE(key); |
96 | 103 | ||
97 | or: | 104 | or:: |
98 | 105 | ||
99 | DEFINE_STATIC_KEY_FALSE(key); | 106 | DEFINE_STATIC_KEY_FALSE(key); |
100 | 107 | ||
@@ -102,14 +109,14 @@ or: | |||
102 | The key must be global, that is, it can't be allocated on the stack or dynamically | 109 | The key must be global, that is, it can't be allocated on the stack or dynamically |
103 | allocated at run-time. | 110 | allocated at run-time. |
104 | 111 | ||
105 | The key is then used in code as: | 112 | The key is then used in code as:: |
106 | 113 | ||
107 | if (static_branch_unlikely(&key)) | 114 | if (static_branch_unlikely(&key)) |
108 | do unlikely code | 115 | do unlikely code |
109 | else | 116 | else |
110 | do likely code | 117 | do likely code |
111 | 118 | ||
112 | Or: | 119 | Or:: |
113 | 120 | ||
114 | if (static_branch_likely(&key)) | 121 | if (static_branch_likely(&key)) |
115 | do likely code | 122 | do likely code |
@@ -120,15 +127,15 @@ Keys defined via DEFINE_STATIC_KEY_TRUE(), or DEFINE_STATIC_KEY_FALSE, may | |||
120 | be used in either static_branch_likely() or static_branch_unlikely() | 127 | be used in either static_branch_likely() or static_branch_unlikely() |
121 | statements. | 128 | statements. |
122 | 129 | ||
123 | Branch(es) can be set true via: | 130 | Branch(es) can be set true via:: |
124 | 131 | ||
125 | static_branch_enable(&key); | 132 | static_branch_enable(&key); |
126 | 133 | ||
127 | or false via: | 134 | or false via:: |
128 | 135 | ||
129 | static_branch_disable(&key); | 136 | static_branch_disable(&key); |
130 | 137 | ||
131 | The branch(es) can then be switched via reference counts: | 138 | The branch(es) can then be switched via reference counts:: |
132 | 139 | ||
133 | static_branch_inc(&key); | 140 | static_branch_inc(&key); |
134 | ... | 141 | ... |
@@ -142,11 +149,11 @@ static_branch_inc(), will change the branch back to true. Likewise, if the | |||
142 | key is initialized false, a 'static_branch_inc()', will change the branch to | 149 | key is initialized false, a 'static_branch_inc()', will change the branch to |
143 | true. And then a 'static_branch_dec()', will again make the branch false. | 150 | true. And then a 'static_branch_dec()', will again make the branch false. |
144 | 151 | ||
145 | Where an array of keys is required, it can be defined as: | 152 | Where an array of keys is required, it can be defined as:: |
146 | 153 | ||
147 | DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); | 154 | DEFINE_STATIC_KEY_ARRAY_TRUE(keys, count); |
148 | 155 | ||
149 | or: | 156 | or:: |
150 | 157 | ||
151 | DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); | 158 | DEFINE_STATIC_KEY_ARRAY_FALSE(keys, count); |
152 | 159 | ||
@@ -159,96 +166,98 @@ simply fall back to a traditional, load, test, and jump sequence. Also, the | |||
159 | struct jump_entry table must be at least 4-byte aligned because the | 166 | struct jump_entry table must be at least 4-byte aligned because the |
160 | static_key->entry field makes use of the two least significant bits. | 167 | static_key->entry field makes use of the two least significant bits. |
161 | 168 | ||
162 | * select HAVE_ARCH_JUMP_LABEL, see: arch/x86/Kconfig | 169 | * ``select HAVE_ARCH_JUMP_LABEL``, |
163 | 170 | see: arch/x86/Kconfig | |
164 | * #define JUMP_LABEL_NOP_SIZE, see: arch/x86/include/asm/jump_label.h | ||
165 | 171 | ||
166 | * __always_inline bool arch_static_branch(struct static_key *key, bool branch), see: | 172 | * ``#define JUMP_LABEL_NOP_SIZE``, |
167 | arch/x86/include/asm/jump_label.h | 173 | see: arch/x86/include/asm/jump_label.h |
168 | 174 | ||
169 | * __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch), | 175 | * ``__always_inline bool arch_static_branch(struct static_key *key, bool branch)``, |
170 | see: arch/x86/include/asm/jump_label.h | 176 | see: arch/x86/include/asm/jump_label.h |
171 | 177 | ||
172 | * void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type), | 178 | * ``__always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)``, |
173 | see: arch/x86/kernel/jump_label.c | 179 | see: arch/x86/include/asm/jump_label.h |
174 | 180 | ||
175 | * __init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type), | 181 | * ``void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type)``, |
176 | see: arch/x86/kernel/jump_label.c | 182 | see: arch/x86/kernel/jump_label.c |
177 | 183 | ||
184 | * ``__init_or_module void arch_jump_label_transform_static(struct jump_entry *entry, enum jump_label_type type)``, | ||
185 | see: arch/x86/kernel/jump_label.c | ||
178 | 186 | ||
179 | * struct jump_entry, see: arch/x86/include/asm/jump_label.h | 187 | * ``struct jump_entry``, |
188 | see: arch/x86/include/asm/jump_label.h | ||
180 | 189 | ||
181 | 190 | ||
182 | 5) Static keys / jump label analysis, results (x86_64): | 191 | 5) Static keys / jump label analysis, results (x86_64): |
183 | 192 | ||
184 | 193 | ||
185 | As an example, let's add the following branch to 'getppid()', such that the | 194 | As an example, let's add the following branch to 'getppid()', such that the |
186 | system call now looks like: | 195 | system call now looks like:: |
187 | 196 | ||
188 | SYSCALL_DEFINE0(getppid) | 197 | SYSCALL_DEFINE0(getppid) |
189 | { | 198 | { |
190 | int pid; | 199 | int pid; |
191 | 200 | ||
192 | + if (static_branch_unlikely(&key)) | 201 | + if (static_branch_unlikely(&key)) |
193 | + printk("I am the true branch\n"); | 202 | + printk("I am the true branch\n"); |
194 | 203 | ||
195 | rcu_read_lock(); | 204 | rcu_read_lock(); |
196 | pid = task_tgid_vnr(rcu_dereference(current->real_parent)); | 205 | pid = task_tgid_vnr(rcu_dereference(current->real_parent)); |
197 | rcu_read_unlock(); | 206 | rcu_read_unlock(); |
198 | 207 | ||
199 | return pid; | 208 | return pid; |
200 | } | 209 | } |
201 | 210 | ||
202 | The resulting instructions with jump labels generated by GCC is: | 211 | The resulting instructions with jump labels generated by GCC is:: |
203 | 212 | ||
204 | ffffffff81044290 <sys_getppid>: | 213 | ffffffff81044290 <sys_getppid>: |
205 | ffffffff81044290: 55 push %rbp | 214 | ffffffff81044290: 55 push %rbp |
206 | ffffffff81044291: 48 89 e5 mov %rsp,%rbp | 215 | ffffffff81044291: 48 89 e5 mov %rsp,%rbp |
207 | ffffffff81044294: e9 00 00 00 00 jmpq ffffffff81044299 <sys_getppid+0x9> | 216 | ffffffff81044294: e9 00 00 00 00 jmpq ffffffff81044299 <sys_getppid+0x9> |
208 | ffffffff81044299: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax | 217 | ffffffff81044299: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax |
209 | ffffffff810442a0: 00 00 | 218 | ffffffff810442a0: 00 00 |
210 | ffffffff810442a2: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax | 219 | ffffffff810442a2: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax |
211 | ffffffff810442a9: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax | 220 | ffffffff810442a9: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax |
212 | ffffffff810442b0: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi | 221 | ffffffff810442b0: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi |
213 | ffffffff810442b7: e8 f4 d9 00 00 callq ffffffff81051cb0 <pid_vnr> | 222 | ffffffff810442b7: e8 f4 d9 00 00 callq ffffffff81051cb0 <pid_vnr> |
214 | ffffffff810442bc: 5d pop %rbp | 223 | ffffffff810442bc: 5d pop %rbp |
215 | ffffffff810442bd: 48 98 cltq | 224 | ffffffff810442bd: 48 98 cltq |
216 | ffffffff810442bf: c3 retq | 225 | ffffffff810442bf: c3 retq |
217 | ffffffff810442c0: 48 c7 c7 e3 54 98 81 mov $0xffffffff819854e3,%rdi | 226 | ffffffff810442c0: 48 c7 c7 e3 54 98 81 mov $0xffffffff819854e3,%rdi |
218 | ffffffff810442c7: 31 c0 xor %eax,%eax | 227 | ffffffff810442c7: 31 c0 xor %eax,%eax |
219 | ffffffff810442c9: e8 71 13 6d 00 callq ffffffff8171563f <printk> | 228 | ffffffff810442c9: e8 71 13 6d 00 callq ffffffff8171563f <printk> |
220 | ffffffff810442ce: eb c9 jmp ffffffff81044299 <sys_getppid+0x9> | 229 | ffffffff810442ce: eb c9 jmp ffffffff81044299 <sys_getppid+0x9> |
221 | 230 | ||
222 | Without the jump label optimization it looks like: | 231 | Without the jump label optimization it looks like:: |
223 | 232 | ||
224 | ffffffff810441f0 <sys_getppid>: | 233 | ffffffff810441f0 <sys_getppid>: |
225 | ffffffff810441f0: 8b 05 8a 52 d8 00 mov 0xd8528a(%rip),%eax # ffffffff81dc9480 <key> | 234 | ffffffff810441f0: 8b 05 8a 52 d8 00 mov 0xd8528a(%rip),%eax # ffffffff81dc9480 <key> |
226 | ffffffff810441f6: 55 push %rbp | 235 | ffffffff810441f6: 55 push %rbp |
227 | ffffffff810441f7: 48 89 e5 mov %rsp,%rbp | 236 | ffffffff810441f7: 48 89 e5 mov %rsp,%rbp |
228 | ffffffff810441fa: 85 c0 test %eax,%eax | 237 | ffffffff810441fa: 85 c0 test %eax,%eax |
229 | ffffffff810441fc: 75 27 jne ffffffff81044225 <sys_getppid+0x35> | 238 | ffffffff810441fc: 75 27 jne ffffffff81044225 <sys_getppid+0x35> |
230 | ffffffff810441fe: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax | 239 | ffffffff810441fe: 65 48 8b 04 25 c0 b6 mov %gs:0xb6c0,%rax |
231 | ffffffff81044205: 00 00 | 240 | ffffffff81044205: 00 00 |
232 | ffffffff81044207: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax | 241 | ffffffff81044207: 48 8b 80 80 02 00 00 mov 0x280(%rax),%rax |
233 | ffffffff8104420e: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax | 242 | ffffffff8104420e: 48 8b 80 b0 02 00 00 mov 0x2b0(%rax),%rax |
234 | ffffffff81044215: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi | 243 | ffffffff81044215: 48 8b b8 e8 02 00 00 mov 0x2e8(%rax),%rdi |
235 | ffffffff8104421c: e8 2f da 00 00 callq ffffffff81051c50 <pid_vnr> | 244 | ffffffff8104421c: e8 2f da 00 00 callq ffffffff81051c50 <pid_vnr> |
236 | ffffffff81044221: 5d pop %rbp | 245 | ffffffff81044221: 5d pop %rbp |
237 | ffffffff81044222: 48 98 cltq | 246 | ffffffff81044222: 48 98 cltq |
238 | ffffffff81044224: c3 retq | 247 | ffffffff81044224: c3 retq |
239 | ffffffff81044225: 48 c7 c7 13 53 98 81 mov $0xffffffff81985313,%rdi | 248 | ffffffff81044225: 48 c7 c7 13 53 98 81 mov $0xffffffff81985313,%rdi |
240 | ffffffff8104422c: 31 c0 xor %eax,%eax | 249 | ffffffff8104422c: 31 c0 xor %eax,%eax |
241 | ffffffff8104422e: e8 60 0f 6d 00 callq ffffffff81715193 <printk> | 250 | ffffffff8104422e: e8 60 0f 6d 00 callq ffffffff81715193 <printk> |
242 | ffffffff81044233: eb c9 jmp ffffffff810441fe <sys_getppid+0xe> | 251 | ffffffff81044233: eb c9 jmp ffffffff810441fe <sys_getppid+0xe> |
243 | ffffffff81044235: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1) | 252 | ffffffff81044235: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1) |
244 | ffffffff8104423c: 00 00 00 00 | 253 | ffffffff8104423c: 00 00 00 00 |
245 | 254 | ||
246 | Thus, the disabled jump label case adds a 'mov', 'test' and 'jne' instruction, | 255 | Thus, the disabled jump label case adds a 'mov', 'test' and 'jne' instruction, |
247 | whereas the jump label case just has a 'no-op' or 'jmp 0'. (The jmp 0 is patched | 256 | whereas the jump label case just has a 'no-op' or 'jmp 0'. (The jmp 0 is patched |
248 | to a 5-byte atomic no-op instruction at boot-time.) Thus, the disabled jump | 257 | to a 5-byte atomic no-op instruction at boot-time.) Thus, the disabled jump |
249 | label case adds: | 258 | label case adds:: |
250 | 259 | ||
251 | 6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5-byte jump 0) = 5 additional bytes. | 260 | 6 (mov) + 2 (test) + 2 (jne) = 10 - 5 (5-byte jump 0) = 5 additional bytes. |
252 | 261 | ||
253 | If we then include the padding bytes, the jump label code saves 16 total bytes | 262 | If we then include the padding bytes, the jump label code saves 16 total bytes |
254 | of instruction memory for this small function. In this case the non-jump label | 263 | of instruction memory for this small function. In this case the non-jump label |
@@ -262,7 +271,7 @@ Since there are a number of static key API uses in the scheduler paths, | |||
262 | 'pipe-test' (also known as 'perf bench sched pipe') can be used to show the | 271 | 'pipe-test' (also known as 'perf bench sched pipe') can be used to show the |
263 | performance improvement. Testing done on 3.3.0-rc2: | 272 | performance improvement. Testing done on 3.3.0-rc2: |
264 | 273 | ||
265 | jump label disabled: | 274 | jump label disabled:: |
266 | 275 | ||
267 | Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): | 276 | Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): |
268 | 277 | ||
@@ -279,7 +288,7 @@ jump label disabled: | |||
279 | 288 | ||
280 | 1.601607384 seconds time elapsed ( +- 0.07% ) | 289 | 1.601607384 seconds time elapsed ( +- 0.07% ) |
281 | 290 | ||
282 | jump label enabled: | 291 | jump label enabled:: |
283 | 292 | ||
284 | Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): | 293 | Performance counter stats for 'bash -c /tmp/pipe-test' (50 runs): |
285 | 294 | ||
diff --git a/Documentation/svga.txt b/Documentation/svga.txt index cd66ec836e4f..119f1515b1ac 100644 --- a/Documentation/svga.txt +++ b/Documentation/svga.txt | |||
@@ -1,24 +1,31 @@ | |||
1 | Video Mode Selection Support 2.13 | 1 | .. include:: <isonum.txt> |
2 | (c) 1995--1999 Martin Mares, <mj@ucw.cz> | ||
3 | -------------------------------------------------------------------------------- | ||
4 | 2 | ||
5 | 1. Intro | 3 | ================================= |
6 | ~~~~~~~~ | 4 | Video Mode Selection Support 2.13 |
7 | This small document describes the "Video Mode Selection" feature which | 5 | ================================= |
6 | |||
7 | :Copyright: |copy| 1995--1999 Martin Mares, <mj@ucw.cz> | ||
8 | |||
9 | Intro | ||
10 | ~~~~~ | ||
11 | |||
12 | This small document describes the "Video Mode Selection" feature which | ||
8 | allows the use of various special video modes supported by the video BIOS. | 13 | allows the use of various special video modes supported by the video BIOS. |
9 | Because it uses the BIOS, the selection is limited to boot time (before the | 14 | Because it uses the BIOS, the selection is limited to boot time (before the |
10 | kernel decompression starts) and works only on 80X86 machines. | 15 | kernel decompression starts) and works only on 80X86 machines. |
11 | 16 | ||
12 | ** Short intro for the impatient: Just use vga=ask for the first time, | 17 | .. note:: |
13 | ** enter `scan' on the video mode prompt, pick the mode you want to use, | ||
14 | ** remember its mode ID (the four-digit hexadecimal number) and then | ||
15 | ** set the vga parameter to this number (converted to decimal first). | ||
16 | 18 | ||
17 | The video mode to be used is selected by a kernel parameter which can be | 19 | Short intro for the impatient: Just use vga=ask for the first time, |
20 | enter ``scan`` on the video mode prompt, pick the mode you want to use, | ||
21 | remember its mode ID (the four-digit hexadecimal number) and then | ||
22 | set the vga parameter to this number (converted to decimal first). | ||
23 | |||
24 | The video mode to be used is selected by a kernel parameter which can be | ||
18 | specified in the kernel Makefile (the SVGA_MODE=... line) or by the "vga=..." | 25 | specified in the kernel Makefile (the SVGA_MODE=... line) or by the "vga=..." |
19 | option of LILO (or some other boot loader you use) or by the "vidmode" utility | 26 | option of LILO (or some other boot loader you use) or by the "vidmode" utility |
20 | (present in standard Linux utility packages). You can use the following values | 27 | (present in standard Linux utility packages). You can use the following values |
21 | of this parameter: | 28 | of this parameter:: |
22 | 29 | ||
23 | NORMAL_VGA - Standard 80x25 mode available on all display adapters. | 30 | NORMAL_VGA - Standard 80x25 mode available on all display adapters. |
24 | 31 | ||
@@ -37,77 +44,79 @@ of this parameter: | |||
37 | for exact meaning of the ID). Warning: rdev and LILO don't support | 44 | for exact meaning of the ID). Warning: rdev and LILO don't support |
38 | hexadecimal numbers -- you have to convert it to decimal manually. | 45 | hexadecimal numbers -- you have to convert it to decimal manually. |
39 | 46 | ||
40 | 2. Menu | 47 | Menu |
41 | ~~~~~~~ | 48 | ~~~~ |
42 | The ASK_VGA mode causes the kernel to offer a video mode menu upon | 49 | |
50 | The ASK_VGA mode causes the kernel to offer a video mode menu upon | ||
43 | bootup. It displays a "Press <RETURN> to see video modes available, <SPACE> | 51 | bootup. It displays a "Press <RETURN> to see video modes available, <SPACE> |
44 | to continue or wait 30 secs" message. If you press <RETURN>, you enter the | 52 | to continue or wait 30 secs" message. If you press <RETURN>, you enter the |
45 | menu; if you press <SPACE> or wait 30 seconds, the kernel will boot up in | 53 | menu; if you press <SPACE> or wait 30 seconds, the kernel will boot up in |
46 | the standard 80x25 mode. | 54 | the standard 80x25 mode. |
47 | 55 | ||
48 | The menu looks like: | 56 | The menu looks like:: |
49 | 57 | ||
50 | Video adapter: <name-of-detected-video-adapter> | 58 | Video adapter: <name-of-detected-video-adapter> |
51 | Mode: COLSxROWS: | 59 | Mode: COLSxROWS: |
52 | 0 0F00 80x25 | 60 | 0 0F00 80x25 |
53 | 1 0F01 80x50 | 61 | 1 0F01 80x50 |
54 | 2 0F02 80x43 | 62 | 2 0F02 80x43 |
55 | 3 0F03 80x26 | 63 | 3 0F03 80x26 |
56 | .... | 64 | .... |
57 | Enter mode number or `scan': <flashing-cursor-here> | 65 | Enter mode number or ``scan``: <flashing-cursor-here> |
58 | 66 | ||
59 | <name-of-detected-video-adapter> tells which video adapter Linux detected | 67 | <name-of-detected-video-adapter> tells which video adapter Linux detected |
60 | -- it's either a generic adapter name (MDA, CGA, HGC, EGA, VGA, VESA VGA [a VGA | 68 | -- it's either a generic adapter name (MDA, CGA, HGC, EGA, VGA, VESA VGA [a VGA |
61 | with VESA-compliant BIOS]) or a chipset name (e.g., Trident). Direct detection | 69 | with VESA-compliant BIOS]) or a chipset name (e.g., Trident). Direct detection |
62 | of chipsets is turned off by default (see CONFIG_VIDEO_SVGA in chapter 4 to see | 70 | of chipsets is turned off by default (see CONFIG_VIDEO_SVGA in chapter 4 to see |
63 | how to enable it if you really want) as it's inherently unreliable due to | 71 | how to enable it if you really want) as it's inherently unreliable due to |
64 | absolutely insane PC design. | 72 | absolutely insane PC design. |
65 | 73 | ||
66 | "0 0F00 80x25" means that the first menu item (the menu items are numbered | 74 | "0 0F00 80x25" means that the first menu item (the menu items are numbered |
67 | from "0" to "9" and from "a" to "z") is a 80x25 mode with ID=0x0f00 (see the | 75 | from "0" to "9" and from "a" to "z") is a 80x25 mode with ID=0x0f00 (see the |
68 | next section for a description of mode IDs). | 76 | next section for a description of mode IDs). |
69 | 77 | ||
70 | <flashing-cursor-here> encourages you to enter the item number or mode ID | 78 | <flashing-cursor-here> encourages you to enter the item number or mode ID |
71 | you wish to set and press <RETURN>. If the computer complains about | 79 | you wish to set and press <RETURN>. If the computer complains about |
72 | "Unknown mode ID", it is trying to tell you that it isn't possible to set such | 80 | "Unknown mode ID", it is trying to tell you that it isn't possible to set such |
73 | a mode. It's also possible to press only <RETURN> which leaves the current mode. | 81 | a mode. It's also possible to press only <RETURN> which leaves the current mode. |
74 | 82 | ||
75 | The mode list usually contains a few basic modes and some VESA modes. In | 83 | The mode list usually contains a few basic modes and some VESA modes. In |
76 | case your chipset has been detected, some chipset-specific modes are shown as | 84 | case your chipset has been detected, some chipset-specific modes are shown as |
77 | well (some of these might be missing or unusable on your machine as different | 85 | well (some of these might be missing or unusable on your machine as different |
78 | BIOSes are often shipped with the same card and the mode numbers depend purely | 86 | BIOSes are often shipped with the same card and the mode numbers depend purely |
79 | on the VGA BIOS). | 87 | on the VGA BIOS). |
80 | 88 | ||
81 | The modes displayed on the menu are partially sorted: The list starts with | 89 | The modes displayed on the menu are partially sorted: The list starts with |
82 | the standard modes (80x25 and 80x50) followed by "special" modes (80x28 and | 90 | the standard modes (80x25 and 80x50) followed by "special" modes (80x28 and |
83 | 80x43), local modes (if the local modes feature is enabled), VESA modes and | 91 | 80x43), local modes (if the local modes feature is enabled), VESA modes and |
84 | finally SVGA modes for the auto-detected adapter. | 92 | finally SVGA modes for the auto-detected adapter. |
85 | 93 | ||
86 | If you are not happy with the mode list offered (e.g., if you think your card | 94 | If you are not happy with the mode list offered (e.g., if you think your card |
87 | is able to do more), you can enter "scan" instead of item number / mode ID. The | 95 | is able to do more), you can enter "scan" instead of item number / mode ID. The |
88 | program will try to ask the BIOS for all possible video mode numbers and test | 96 | program will try to ask the BIOS for all possible video mode numbers and test |
89 | what happens then. The screen will probably flash wildly for some time, and | 97 | what happens then. The screen will probably flash wildly for some time, and |
90 | strange noises will be heard from inside the monitor; after that, | 98 | strange noises will be heard from inside the monitor; after that, |
91 | all consistent video modes supported by your BIOS will appear (plus maybe some | 99 | all consistent video modes supported by your BIOS will appear (plus maybe some |
92 | `ghost modes'). If you are afraid this could damage your monitor, don't use this | 100 | ``ghost modes``). If you are afraid this could damage your monitor, don't use |
93 | function. | 101 | this function. |
94 | 102 | ||
95 | After scanning, the mode ordering is a bit different: the auto-detected SVGA | 103 | After scanning, the mode ordering is a bit different: the auto-detected SVGA |
96 | modes are not listed at all and the modes revealed by `scan' are shown before | 104 | modes are not listed at all and the modes revealed by ``scan`` are shown before |
97 | all VESA modes. | 105 | all VESA modes. |
98 | 106 | ||
99 | 3. Mode IDs | 107 | Mode IDs |
100 | ~~~~~~~~~~~ | 108 | ~~~~~~~~ |
101 | Because of the complexity of all the video stuff, the video mode IDs | 109 | |
110 | Because of the complexity of all the video stuff, the video mode IDs | ||
102 | used here are also a bit complex. A video mode ID is a 16-bit number usually | 111 | used here are also a bit complex. A video mode ID is a 16-bit number usually |
103 | expressed in hexadecimal notation (starting with "0x"). You can set a mode | 112 | expressed in hexadecimal notation (starting with "0x"). You can set a mode |
104 | by entering its mode ID directly, if you know it, even if it isn't on the menu. | 113 | by entering its mode ID directly, if you know it, even if it isn't on the menu. |
105 | 114 | ||
106 | The ID numbers can be divided into three regions: | 115 | The ID numbers can be divided into three regions:: |
107 | 116 | ||
108 | 0x0000 to 0x00ff - menu item references. 0x0000 is the first item. Don't use | 117 | 0x0000 to 0x00ff - menu item references. 0x0000 is the first item. Don't use |
109 | outside the menu as this can change from boot to boot (especially if you | 118 | outside the menu as this can change from boot to boot (especially if you |
110 | have used the `scan' feature). | 119 | have used the ``scan`` feature). |
111 | 120 | ||
112 | 0x0100 to 0x017f - standard BIOS modes. The ID is a BIOS video mode number | 121 | 0x0100 to 0x017f - standard BIOS modes. The ID is a BIOS video mode number |
113 | (as presented to INT 10, function 00) increased by 0x0100. | 122 | (as presented to INT 10, function 00) increased by 0x0100. |
@@ -142,53 +151,54 @@ The ID numbers can be divided to three regions: | |||
142 | 0xffff equivalent to 0x0f00 (standard 80x25) | 151 | 0xffff equivalent to 0x0f00 (standard 80x25) |
143 | 0xfffe equivalent to 0x0f01 (EGA 80x43 or VGA 80x50) | 152 | 0xfffe equivalent to 0x0f01 (EGA 80x43 or VGA 80x50) |
144 | 153 | ||
145 | If you add 0x8000 to the mode ID, the program will try to recalculate | 154 | If you add 0x8000 to the mode ID, the program will try to recalculate |
146 | vertical display timing according to mode parameters, which can be used to | 155 | vertical display timing according to mode parameters, which can be used to |
147 | eliminate some annoying bugs of certain VGA BIOSes (usually those used for | 156 | eliminate some annoying bugs of certain VGA BIOSes (usually those used for |
148 | cards with S3 chipsets and old Cirrus Logic BIOSes) -- mainly extra lines at the | 157 | cards with S3 chipsets and old Cirrus Logic BIOSes) -- mainly extra lines at the |
149 | end of the display. | 158 | end of the display. |
150 | 159 | ||
151 | 4. Options | 160 | Options |
152 | ~~~~~~~~~~ | 161 | ~~~~~~~ |
153 | Some options can be set in the source text (in arch/i386/boot/video.S). | 162 | |
163 | Some options can be set in the source text (in arch/i386/boot/video.S). | ||
154 | All of them are simple #define's -- change them to #undef's when you want to | 164 | All of them are simple #define's -- change them to #undef's when you want to |
155 | switch them off. Currently supported: | 165 | switch them off. Currently supported: |
156 | 166 | ||
157 | CONFIG_VIDEO_SVGA - enables autodetection of SVGA cards. This is switched | 167 | CONFIG_VIDEO_SVGA - enables autodetection of SVGA cards. This is switched |
158 | off by default as it's a bit unreliable due to terribly bad PC design. If you | 168 | off by default as it's a bit unreliable due to terribly bad PC design. If you |
159 | really want to have the adapter autodetected (maybe in case the `scan' feature | 169 | really want to have the adapter autodetected (maybe in case the ``scan`` feature |
160 | doesn't work on your machine), switch this on and don't cry if the results | 170 | doesn't work on your machine), switch this on and don't cry if the results |
161 | are not completely sane. In case you really need this feature, please drop me | 171 | are not completely sane. In case you really need this feature, please drop me |
162 | a mail as I think of removing it some day. | 172 | a mail as I think of removing it some day. |
163 | 173 | ||
164 | CONFIG_VIDEO_VESA - enables autodetection of VESA modes. If it doesn't work | 174 | CONFIG_VIDEO_VESA - enables autodetection of VESA modes. If it doesn't work |
165 | on your machine (or displays a "Error: Scanning of VESA modes failed" message), | 175 | on your machine (or displays a "Error: Scanning of VESA modes failed" message), |
166 | you can switch it off and report as a bug. | 176 | you can switch it off and report as a bug. |
167 | 177 | ||
168 | CONFIG_VIDEO_COMPACT - enables compacting of the video mode list. If there | 178 | CONFIG_VIDEO_COMPACT - enables compacting of the video mode list. If there |
169 | are more modes with the same screen size, only the first one is kept (see above | 179 | are more modes with the same screen size, only the first one is kept (see above |
170 | for more info on mode ordering). However, in very strange cases it's possible | 180 | for more info on mode ordering). However, in very strange cases it's possible |
171 | that the first "version" of the mode doesn't work although some of the others | 181 | that the first "version" of the mode doesn't work although some of the others |
172 | do -- in this case turn this switch off to see the rest. | 182 | do -- in this case turn this switch off to see the rest. |
173 | 183 | ||
174 | CONFIG_VIDEO_RETAIN - enables retaining of screen contents when switching | 184 | CONFIG_VIDEO_RETAIN - enables retaining of screen contents when switching |
175 | video modes. Works only with some boot loaders which leave enough room for the | 185 | video modes. Works only with some boot loaders which leave enough room for the |
176 | buffer. (If you have old LILO, you can adjust heap_end_ptr and loadflags | 186 | buffer. (If you have old LILO, you can adjust heap_end_ptr and loadflags |
177 | in setup.S, but it's better to upgrade the boot loader...) | 187 | in setup.S, but it's better to upgrade the boot loader...) |
178 | 188 | ||
179 | CONFIG_VIDEO_LOCAL - enables inclusion of "local modes" in the list. The | 189 | CONFIG_VIDEO_LOCAL - enables inclusion of "local modes" in the list. The |
180 | local modes are added automatically to the beginning of the list, regardless | 190 | local modes are added automatically to the beginning of the list, regardless |
181 | of hardware configuration. The local modes are listed in the source text after | 191 | of hardware configuration. The local modes are listed in the source text after |
182 | the "local_mode_table:" line. The comment before this line describes the format | 192 | the "local_mode_table:" line. The comment before this line describes the format |
183 | of the table (which also includes a video card name to be displayed on the | 193 | of the table (which also includes a video card name to be displayed on the |
184 | top of the menu). | 194 | top of the menu). |
185 | 195 | ||
186 | CONFIG_VIDEO_400_HACK - force setting of 400 scan lines for standard VGA | 196 | CONFIG_VIDEO_400_HACK - force setting of 400 scan lines for standard VGA |
187 | modes. This option is intended to be used on certain buggy BIOSes which draw | 197 | modes. This option is intended to be used on certain buggy BIOSes which draw |
188 | some useless logo using font download and then fail to reset the correct mode. | 198 | some useless logo using font download and then fail to reset the correct mode. |
189 | Don't use unless needed as it forces resetting the video card. | 199 | Don't use unless needed as it forces resetting the video card. |
190 | 200 | ||
191 | CONFIG_VIDEO_GFX_HACK - includes special hack for setting of graphics modes | 201 | CONFIG_VIDEO_GFX_HACK - includes special hack for setting of graphics modes |
192 | to be used later by special drivers (e.g., 800x600 on IBM ThinkPad -- see | 202 | to be used later by special drivers (e.g., 800x600 on IBM ThinkPad -- see |
193 | ftp://ftp.phys.keio.ac.jp/pub/XFree86/800x600/XF86Configs/XF86Config.IBM_TP560). | 203 | ftp://ftp.phys.keio.ac.jp/pub/XFree86/800x600/XF86Configs/XF86Config.IBM_TP560). |
194 | Allows setting _any_ BIOS mode, including graphic ones, and forcing a specific | 204 | Allows setting _any_ BIOS mode, including graphic ones, and forcing a specific |
@@ -196,33 +206,36 @@ text screen resolution instead of peeking it from BIOS variables. Don't use | |||
196 | unless you think you know what you're doing. To activate this setup, use | 206 | unless you think you know what you're doing. To activate this setup, use |
197 | mode number 0x0f08 (see section 3). | 207 | mode number 0x0f08 (see section 3). |
198 | 208 | ||
199 | 5. Still doesn't work? | 209 | Still doesn't work? |
200 | ~~~~~~~~~~~~~~~~~~~~~~ | 210 | ~~~~~~~~~~~~~~~~~~~ |
201 | When the mode detection doesn't work (e.g., the mode list is incorrect or | 211 | |
212 | When the mode detection doesn't work (e.g., the mode list is incorrect or | ||
202 | the machine hangs instead of displaying the menu), try to switch off some of | 213 | the machine hangs instead of displaying the menu), try to switch off some of |
203 | the configuration options listed in section 4. If that fails, you can still use | 214 | the configuration options listed in section 4. If that fails, you can still use |
204 | your kernel with the video mode set directly via the kernel parameter. | 215 | your kernel with the video mode set directly via the kernel parameter. |
205 | 216 | ||
206 | In either case, please send me a bug report containing what _exactly_ | 217 | In either case, please send me a bug report containing what _exactly_ |
207 | happens and how the configuration switches affect the behaviour of the bug. | 218 | happens and how the configuration switches affect the behaviour of the bug. |
208 | 219 | ||
209 | If you start Linux from M$-DOS, you might also use some DOS tools for | 220 | If you start Linux from M$-DOS, you might also use some DOS tools for |
210 | video mode setting. In this case, you must specify the 0x0f04 mode ("leave | 221 | video mode setting. In this case, you must specify the 0x0f04 mode ("leave |
211 | current settings") to Linux, because if you don't and you use any non-standard | 222 | current settings") to Linux, because if you don't and you use any non-standard |
212 | mode, Linux will switch to 80x25 automatically. | 223 | mode, Linux will switch to 80x25 automatically. |
213 | 224 | ||
214 | If you set some extended mode and there's one or more extra lines on the | 225 | If you set some extended mode and there's one or more extra lines on the |
215 | bottom of the display containing already scrolled-out text, your VGA BIOS | 226 | bottom of the display containing already scrolled-out text, your VGA BIOS |
216 | contains the most common video BIOS bug called "incorrect vertical display | 227 | contains the most common video BIOS bug called "incorrect vertical display |
217 | end setting". Adding 0x8000 to the mode ID might fix the problem. Unfortunately, | 228 | end setting". Adding 0x8000 to the mode ID might fix the problem. Unfortunately, |
218 | this must be done manually -- no autodetection mechanisms are available. | 229 | this must be done manually -- no autodetection mechanisms are available. |
219 | 230 | ||
220 | If you have a VGA card and your display still looks as it does on EGA, your BIOS | 231 | If you have a VGA card and your display still looks as it does on EGA, your BIOS |
221 | is probably broken and you need to set the CONFIG_VIDEO_400_HACK switch to | 232 | is probably broken and you need to set the CONFIG_VIDEO_400_HACK switch to |
222 | force setting of the correct mode. | 233 | force setting of the correct mode. |
223 | 234 | ||
224 | 6. History | 235 | History |
225 | ~~~~~~~~~~ | 236 | ~~~~~~~ |
237 | |||
238 | =============== ================================================================ | ||
226 | 1.0 (??-Nov-95) First version supporting all adapters supported by the old | 239 | 1.0 (??-Nov-95) First version supporting all adapters supported by the old |
227 | setup.S + Cirrus Logic 54XX. Present in some 1.3.4? kernels | 240 | setup.S + Cirrus Logic 54XX. Present in some 1.3.4? kernels |
228 | and then removed due to instability on some machines. | 241 | and then removed due to instability on some machines. |
@@ -260,17 +273,18 @@ force setting of the correct mode. | |||
260 | original version written by hhanemaa@cs.ruu.nl, patched by | 273 | original version written by hhanemaa@cs.ruu.nl, patched by |
261 | Jeff Chua, rewritten by me). | 274 | Jeff Chua, rewritten by me). |
262 | - Screen store/restore fixed. | 275 | - Screen store/restore fixed. |
263 | 2.8 (14-Apr-96) - Previous release was not compilable without CONFIG_VIDEO_SVGA. | 276 | 2.8 (14-Apr-96) - Previous release was not compilable without CONFIG_VIDEO_SVGA. |
264 | - Better recognition of text modes during mode scan. | 277 | - Better recognition of text modes during mode scan. |
265 | 2.9 (12-May-96) - Ignored VESA modes 0x80 - 0xff (more VESA BIOS bugs!) | 278 | 2.9 (12-May-96) - Ignored VESA modes 0x80 - 0xff (more VESA BIOS bugs!) |
266 | 2.10 (11-Nov-96)- The whole thing made optional. | 279 | 2.10(11-Nov-96) - The whole thing made optional. |
267 | - Added the CONFIG_VIDEO_400_HACK switch. | 280 | - Added the CONFIG_VIDEO_400_HACK switch. |
268 | - Added the CONFIG_VIDEO_GFX_HACK switch. | 281 | - Added the CONFIG_VIDEO_GFX_HACK switch. |
269 | - Code cleanup. | 282 | - Code cleanup. |
270 | 2.11 (03-May-97)- Yet another cleanup, now including also the documentation. | 283 | 2.11(03-May-97) - Yet another cleanup, now including also the documentation. |
271 | - Direct testing of SVGA adapters turned off by default, `scan' | 284 | - Direct testing of SVGA adapters turned off by default, ``scan`` |
272 | offered explicitly on the prompt line. | 285 | offered explicitly on the prompt line. |
273 | - Removed the doc section describing adding of new probing | 286 | - Removed the doc section describing adding of new probing |
274 | functions as I try to get rid of _all_ hardware probing here. | 287 | functions as I try to get rid of _all_ hardware probing here. |
275 | 2.12 (25-May-98)- Added support for VESA frame buffer graphics. | 288 | 2.12(25-May-98) Added support for VESA frame buffer graphics. |
276 | 2.13 (14-May-99)- Minor documentation fixes. | 289 | 2.13(14-May-99) Minor documentation fixes. |
290 | =============== ================================================================ | ||
diff --git a/Documentation/tee.txt b/Documentation/tee.txt index 718599357596..56ea85ffebf2 100644 --- a/Documentation/tee.txt +++ b/Documentation/tee.txt | |||
@@ -1,4 +1,7 @@ | |||
1 | ============= | ||
1 | TEE subsystem | 2 | TEE subsystem |
3 | ============= | ||
4 | |||
2 | This document describes the TEE subsystem in Linux. | 5 | This document describes the TEE subsystem in Linux. |
3 | 6 | ||
4 | A TEE (Trusted Execution Environment) is a trusted OS running in some | 7 | A TEE (Trusted Execution Environment) is a trusted OS running in some |
@@ -80,27 +83,27 @@ The GlobalPlatform TEE Client API [5] is implemented on top of the generic | |||
80 | TEE API. | 83 | TEE API. |
81 | 84 | ||
82 | Picture of the relationship between the different components in the | 85 | Picture of the relationship between the different components in the |
83 | OP-TEE architecture. | 86 | OP-TEE architecture:: |
84 | 87 | ||
85 | User space Kernel Secure world | 88 | User space Kernel Secure world |
86 | ~~~~~~~~~~ ~~~~~~ ~~~~~~~~~~~~ | 89 | ~~~~~~~~~~ ~~~~~~ ~~~~~~~~~~~~ |
87 | +--------+ +-------------+ | 90 | +--------+ +-------------+ |
88 | | Client | | Trusted | | 91 | | Client | | Trusted | |
89 | +--------+ | Application | | 92 | +--------+ | Application | |
90 | /\ +-------------+ | 93 | /\ +-------------+ |
91 | || +----------+ /\ | 94 | || +----------+ /\ |
92 | || |tee- | || | 95 | || |tee- | || |
93 | || |supplicant| \/ | 96 | || |supplicant| \/ |
94 | || +----------+ +-------------+ | 97 | || +----------+ +-------------+ |
95 | \/ /\ | TEE Internal| | 98 | \/ /\ | TEE Internal| |
96 | +-------+ || | API | | 99 | +-------+ || | API | |
97 | + TEE | || +--------+--------+ +-------------+ | 100 | + TEE | || +--------+--------+ +-------------+ |
98 | | Client| || | TEE | OP-TEE | | OP-TEE | | 101 | | Client| || | TEE | OP-TEE | | OP-TEE | |
99 | | API | \/ | subsys | driver | | Trusted OS | | 102 | | API | \/ | subsys | driver | | Trusted OS | |
100 | +-------+----------------+----+-------+----+-----------+-------------+ | 103 | +-------+----------------+----+-------+----+-----------+-------------+ |
101 | | Generic TEE API | | OP-TEE MSG | | 104 | | Generic TEE API | | OP-TEE MSG | |
102 | | IOCTL (TEE_IOC_*) | | SMCCC (OPTEE_SMC_CALL_*) | | 105 | | IOCTL (TEE_IOC_*) | | SMCCC (OPTEE_SMC_CALL_*) | |
103 | +-----------------------------+ +------------------------------+ | 106 | +-----------------------------+ +------------------------------+ |
104 | 107 | ||
105 | RPCs (Remote Procedure Calls) are requests from the secure world to the kernel driver | 108 | RPCs (Remote Procedure Calls) are requests from the secure world to the kernel driver |
106 | or tee-supplicant. An RPC is identified by a special range of SMCCC return | 109 | or tee-supplicant. An RPC is identified by a special range of SMCCC return |
@@ -109,10 +112,16 @@ kernel are handled by the kernel driver. Other RPC messages will be forwarded to | |||
109 | tee-supplicant without further involvement of the driver, except switching | 112 | tee-supplicant without further involvement of the driver, except switching |
110 | shared memory buffer representation. | 113 | shared memory buffer representation. |
111 | 114 | ||
112 | References: | 115 | References |
116 | ========== | ||
117 | |||
113 | [1] https://github.com/OP-TEE/optee_os | 118 | [1] https://github.com/OP-TEE/optee_os |
119 | |||
114 | [2] http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html | 120 | [2] http://infocenter.arm.com/help/topic/com.arm.doc.den0028a/index.html |
121 | |||
115 | [3] drivers/tee/optee/optee_smc.h | 122 | [3] drivers/tee/optee/optee_smc.h |
123 | |||
116 | [4] drivers/tee/optee/optee_msg.h | 124 | [4] drivers/tee/optee/optee_msg.h |
125 | |||
117 | [5] http://www.globalplatform.org/specificationsdevice.asp look for | 126 | [5] http://www.globalplatform.org/specificationsdevice.asp look for |
118 | "TEE Client API Specification v1.0" and click download. | 127 | "TEE Client API Specification v1.0" and click download. |
diff --git a/Documentation/this_cpu_ops.txt b/Documentation/this_cpu_ops.txt index 2cbf71975381..5cb8b883ae83 100644 --- a/Documentation/this_cpu_ops.txt +++ b/Documentation/this_cpu_ops.txt | |||
@@ -1,5 +1,9 @@ | |||
1 | =================== | ||
1 | this_cpu operations | 2 | this_cpu operations |
2 | ------------------- | 3 | =================== |
4 | |||
5 | :Author: Christoph Lameter, August 4th, 2014 | ||
6 | :Author: Pranith Kumar, Aug 2nd, 2014 | ||
3 | 7 | ||
4 | this_cpu operations are a way of optimizing access to per cpu | 8 | this_cpu operations are a way of optimizing access to per cpu |
5 | variables associated with the *currently* executing processor. This is | 9 | variables associated with the *currently* executing processor. This is |
@@ -39,7 +43,7 @@ operations. | |||
39 | 43 | ||
40 | The following this_cpu() operations with implied preemption protection | 44 | The following this_cpu() operations with implied preemption protection |
41 | are defined. These operations can be used without worrying about | 45 | are defined. These operations can be used without worrying about |
42 | preemption and interrupts. | 46 | preemption and interrupts:: |
43 | 47 | ||
44 | this_cpu_read(pcp) | 48 | this_cpu_read(pcp) |
45 | this_cpu_write(pcp, val) | 49 | this_cpu_write(pcp, val) |
@@ -67,14 +71,14 @@ to relocate a per cpu relative address to the proper per cpu area for | |||
67 | the processor. So the relocation to the per cpu base is encoded in the | 71 | the processor. So the relocation to the per cpu base is encoded in the |
68 | instruction via a segment register prefix. | 72 | instruction via a segment register prefix. |
69 | 73 | ||
70 | For example: | 74 | For example:: |
71 | 75 | ||
72 | DEFINE_PER_CPU(int, x); | 76 | DEFINE_PER_CPU(int, x); |
73 | int z; | 77 | int z; |
74 | 78 | ||
75 | z = this_cpu_read(x); | 79 | z = this_cpu_read(x); |
76 | 80 | ||
77 | results in a single instruction | 81 | results in a single instruction:: |
78 | 82 | ||
79 | mov ax, gs:[x] | 83 | mov ax, gs:[x] |
80 | 84 | ||
@@ -84,16 +88,16 @@ this_cpu_ops such sequence also required preempt disable/enable to | |||
84 | prevent the kernel from moving the thread to a different processor | 88 | prevent the kernel from moving the thread to a different processor |
85 | while the calculation is performed. | 89 | while the calculation is performed. |
86 | 90 | ||
87 | Consider the following this_cpu operation: | 91 | Consider the following this_cpu operation:: |
88 | 92 | ||
89 | this_cpu_inc(x) | 93 | this_cpu_inc(x) |
90 | 94 | ||
91 | The above results in the following single instruction (no lock prefix!) | 95 | The above results in the following single instruction (no lock prefix!):: |
92 | 96 | ||
93 | inc gs:[x] | 97 | inc gs:[x] |
94 | 98 | ||
95 | instead of the following operations required if there is no segment | 99 | instead of the following operations required if there is no segment |
96 | register: | 100 | register:: |
97 | 101 | ||
98 | int *y; | 102 | int *y; |
99 | int cpu; | 103 | int cpu; |
@@ -121,8 +125,10 @@ has to be paid for this optimization is the need to add up the per cpu | |||
121 | counters when the value of a counter is needed. | 125 | counters when the value of a counter is needed. |
122 | 126 | ||
123 | 127 | ||
124 | Special operations: | 128 | Special operations |
125 | ------------------- | 129 | ------------------ |
130 | |||
131 | :: | ||
126 | 132 | ||
127 | y = this_cpu_ptr(&x) | 133 | y = this_cpu_ptr(&x) |
128 | 134 | ||
@@ -153,11 +159,15 @@ Therefore the use of x or &x outside of the context of per cpu | |||
153 | operations is invalid and will generally be treated like a NULL | 159 | operations is invalid and will generally be treated like a NULL |
154 | pointer dereference. | 160 | pointer dereference. |
155 | 161 | ||
162 | :: | ||
163 | |||
156 | DEFINE_PER_CPU(int, x); | 164 | DEFINE_PER_CPU(int, x); |
157 | 165 | ||
158 | In the context of per cpu operations the above implies that x is a per | 166 | In the context of per cpu operations the above implies that x is a per |
159 | cpu variable. Most this_cpu operations take a cpu variable. | 167 | cpu variable. Most this_cpu operations take a cpu variable. |
160 | 168 | ||
169 | :: | ||
170 | |||
161 | int __percpu *p = &x; | 171 | int __percpu *p = &x; |
162 | 172 | ||
163 | &x and hence p is the *offset* of a per cpu variable. this_cpu_ptr() | 173 | &x and hence p is the *offset* of a per cpu variable. this_cpu_ptr() |
@@ -168,7 +178,7 @@ strange. | |||
168 | Operations on a field of a per cpu structure | 178 | Operations on a field of a per cpu structure |
169 | -------------------------------------------- | 179 | -------------------------------------------- |
170 | 180 | ||
171 | Let's say we have a percpu structure | 181 | Let's say we have a percpu structure:: |
172 | 182 | ||
173 | struct s { | 183 | struct s { |
174 | int n,m; | 184 | int n,m; |
@@ -177,14 +187,14 @@ Let's say we have a percpu structure | |||
177 | DEFINE_PER_CPU(struct s, p); | 187 | DEFINE_PER_CPU(struct s, p); |
178 | 188 | ||
179 | 189 | ||
180 | Operations on these fields are straightforward | 190 | Operations on these fields are straightforward:: |
181 | 191 | ||
182 | this_cpu_inc(p.m) | 192 | this_cpu_inc(p.m) |
183 | 193 | ||
184 | z = this_cpu_cmpxchg(p.m, 0, 1); | 194 | z = this_cpu_cmpxchg(p.m, 0, 1); |
185 | 195 | ||
186 | 196 | ||
187 | If we have an offset to struct s: | 197 | If we have an offset to struct s:: |
188 | 198 | ||
189 | struct s __percpu *ps = &p; | 199 | struct s __percpu *ps = &p; |
190 | 200 | ||
@@ -194,7 +204,7 @@ If we have an offset to struct s: | |||
194 | 204 | ||
195 | 205 | ||
196 | The calculation of the pointer may require the use of this_cpu_ptr() | 206 | The calculation of the pointer may require the use of this_cpu_ptr() |
197 | if we do not make use of this_cpu ops later to manipulate fields: | 207 | if we do not make use of this_cpu ops later to manipulate fields:: |
198 | 208 | ||
199 | struct s *pp; | 209 | struct s *pp; |
200 | 210 | ||
@@ -206,7 +216,7 @@ if we do not make use of this_cpu ops later to manipulate fields: | |||
206 | 216 | ||
207 | 217 | ||
208 | Variants of this_cpu ops | 218 | Variants of this_cpu ops |
209 | ------------------------- | 219 | ------------------------ |
210 | 220 | ||
211 | this_cpu ops are interrupt safe. Some architectures do not support | 221 | this_cpu ops are interrupt safe. Some architectures do not support |
212 | these per cpu local operations. In that case the operation must be | 222 | these per cpu local operations. In that case the operation must be |
@@ -222,7 +232,7 @@ preemption. If a per cpu variable is not used in an interrupt context | |||
222 | and the scheduler cannot preempt, then they are safe. If any interrupts | 232 | and the scheduler cannot preempt, then they are safe. If any interrupts |
223 | still occur while an operation is in progress and if the interrupt also | 233 | still occur while an operation is in progress and if the interrupt also |
224 | modifies the variable, then RMW actions cannot be guaranteed to be | 234 | modifies the variable, then RMW actions cannot be guaranteed to be |
225 | safe. | 235 | safe:: |
226 | 236 | ||
227 | __this_cpu_read(pcp) | 237 | __this_cpu_read(pcp) |
228 | __this_cpu_write(pcp, val) | 238 | __this_cpu_write(pcp, val) |
@@ -279,7 +289,7 @@ unless absolutely necessary. Please consider using an IPI to wake up | |||
279 | the remote CPU and perform the update to its per cpu area. | 289 | the remote CPU and perform the update to its per cpu area. |
280 | 290 | ||
281 | To access per-cpu data structure remotely, typically the per_cpu_ptr() | 291 | To access per-cpu data structure remotely, typically the per_cpu_ptr() |
282 | function is used: | 292 | function is used:: |
283 | 293 | ||
284 | 294 | ||
285 | DEFINE_PER_CPU(struct data, datap); | 295 | DEFINE_PER_CPU(struct data, datap); |
@@ -289,7 +299,7 @@ function is used: | |||
289 | This makes it explicit that we are getting ready to access a percpu | 299 | This makes it explicit that we are getting ready to access a percpu |
290 | area remotely. | 300 | area remotely. |
291 | 301 | ||
292 | You can also do the following to convert the datap offset to an address | 302 | You can also do the following to convert the datap offset to an address:: |
293 | 303 | ||
294 | struct data *p = this_cpu_ptr(&datap); | 304 | struct data *p = this_cpu_ptr(&datap); |
295 | 305 | ||
@@ -305,7 +315,7 @@ the following scenario that occurs because two per cpu variables | |||
305 | share a cache-line but the relaxed synchronization is applied to | 315 | share a cache-line but the relaxed synchronization is applied to |
306 | only one process updating the cache-line. | 316 | only one process updating the cache-line. |
307 | 317 | ||
308 | Consider the following example | 318 | Consider the following example:: |
309 | 319 | ||
310 | 320 | ||
311 | struct test { | 321 | struct test { |
@@ -327,6 +337,3 @@ mind that a remote write will evict the cache line from the processor | |||
327 | that most likely will access it. If the processor wakes up and finds a | 337 | that most likely will access it. If the processor wakes up and finds a |
328 | missing local cache line of a per cpu area, its performance and hence | 338 | missing local cache line of a per cpu area, its performance and hence |
329 | the wake up times will be affected. | 339 | the wake up times will be affected. |
330 | |||
331 | Christoph Lameter, August 4th, 2014 | ||
332 | Pranith Kumar, Aug 2nd, 2014 | ||
diff --git a/Documentation/unaligned-memory-access.txt b/Documentation/unaligned-memory-access.txt index 3f76c0c37920..51b4ff031586 100644 --- a/Documentation/unaligned-memory-access.txt +++ b/Documentation/unaligned-memory-access.txt | |||
@@ -1,6 +1,15 @@ | |||
1 | ========================= | ||
1 | UNALIGNED MEMORY ACCESSES | 2 | UNALIGNED MEMORY ACCESSES |
2 | ========================= | 3 | ========================= |
3 | 4 | ||
5 | :Author: Daniel Drake <dsd@gentoo.org>, | ||
6 | :Author: Johannes Berg <johannes@sipsolutions.net> | ||
7 | |||
8 | :With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt, | ||
9 | Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz, | ||
10 | Vadim Lobanov | ||
11 | |||
12 | |||
4 | Linux runs on a wide variety of architectures which have varying behaviour | 13 | Linux runs on a wide variety of architectures which have varying behaviour |
5 | when it comes to memory access. This document presents some details about | 14 | when it comes to memory access. This document presents some details about |
6 | unaligned accesses, why you need to write code that doesn't cause them, | 15 | unaligned accesses, why you need to write code that doesn't cause them, |
@@ -73,7 +82,7 @@ memory addresses of certain variables, etc. | |||
73 | 82 | ||
74 | Fortunately things are not too complex, as in most cases, the compiler | 83 | Fortunately things are not too complex, as in most cases, the compiler |
75 | ensures that things will work for you. For example, take the following | 84 | ensures that things will work for you. For example, take the following |
76 | structure: | 85 | structure:: |
77 | 86 | ||
78 | struct foo { | 87 | struct foo { |
79 | u16 field1; | 88 | u16 field1; |
@@ -106,7 +115,7 @@ On a related topic, with the above considerations in mind you may observe | |||
106 | that you could reorder the fields in the structure in order to place fields | 115 | that you could reorder the fields in the structure in order to place fields |
107 | where padding would otherwise be inserted, and hence reduce the overall | 116 | where padding would otherwise be inserted, and hence reduce the overall |
108 | resident memory size of structure instances. The optimal layout of the | 117 | resident memory size of structure instances. The optimal layout of the |
109 | above example is: | 118 | above example is:: |
110 | 119 | ||
111 | struct foo { | 120 | struct foo { |
112 | u32 field2; | 121 | u32 field2; |
@@ -139,21 +148,21 @@ Code that causes unaligned access | |||
139 | With the above in mind, let's move onto a real life example of a function | 148 | With the above in mind, let's move onto a real life example of a function |
140 | that can cause an unaligned memory access. The following function taken | 149 | that can cause an unaligned memory access. The following function taken |
141 | from include/linux/etherdevice.h is an optimized routine to compare two | 150 | from include/linux/etherdevice.h is an optimized routine to compare two |
142 | ethernet MAC addresses for equality. | 151 | ethernet MAC addresses for equality:: |
143 | 152 | ||
144 | bool ether_addr_equal(const u8 *addr1, const u8 *addr2) | 153 | bool ether_addr_equal(const u8 *addr1, const u8 *addr2) |
145 | { | 154 | { |
146 | #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS | 155 | #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS |
147 | u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) | | 156 | u32 fold = ((*(const u32 *)addr1) ^ (*(const u32 *)addr2)) | |
148 | ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4))); | 157 | ((*(const u16 *)(addr1 + 4)) ^ (*(const u16 *)(addr2 + 4))); |
149 | 158 | ||
150 | return fold == 0; | 159 | return fold == 0; |
151 | #else | 160 | #else |
152 | const u16 *a = (const u16 *)addr1; | 161 | const u16 *a = (const u16 *)addr1; |
153 | const u16 *b = (const u16 *)addr2; | 162 | const u16 *b = (const u16 *)addr2; |
154 | return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0; | 163 | return ((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2])) == 0; |
155 | #endif | 164 | #endif |
156 | } | 165 | } |
157 | 166 | ||
158 | In the above function, when the hardware has efficient unaligned access | 167 | In the above function, when the hardware has efficient unaligned access |
159 | capability, there is no issue with this code. But when the hardware isn't | 168 | capability, there is no issue with this code. But when the hardware isn't |
@@ -171,7 +180,8 @@ as it is a decent optimization for the cases when you can ensure alignment, | |||
171 | which is true almost all of the time in ethernet networking context. | 180 | which is true almost all of the time in ethernet networking context. |
172 | 181 | ||
173 | 182 | ||
174 | Here is another example of some code that could cause unaligned accesses: | 183 | Here is another example of some code that could cause unaligned accesses:: |
184 | |||
175 | void myfunc(u8 *data, u32 value) | 185 | void myfunc(u8 *data, u32 value) |
176 | { | 186 | { |
177 | [...] | 187 | [...] |
@@ -184,6 +194,7 @@ to an address that is not evenly divisible by 4. | |||
184 | 194 | ||
185 | In summary, the 2 main scenarios where you may run into unaligned access | 195 | In summary, the 2 main scenarios where you may run into unaligned access |
186 | problems involve: | 196 | problems involve: |
197 | |||
187 | 1. Casting variables to types of different lengths | 198 | 1. Casting variables to types of different lengths |
188 | 2. Pointer arithmetic followed by access to at least 2 bytes of data | 199 | 2. Pointer arithmetic followed by access to at least 2 bytes of data |
189 | 200 | ||
@@ -195,7 +206,7 @@ The easiest way to avoid unaligned access is to use the get_unaligned() and | |||
195 | put_unaligned() macros provided by the <asm/unaligned.h> header file. | 206 | put_unaligned() macros provided by the <asm/unaligned.h> header file. |
196 | 207 | ||
197 | Going back to an earlier example of code that potentially causes unaligned | 208 | Going back to an earlier example of code that potentially causes unaligned |
198 | access: | 209 | access:: |
199 | 210 | ||
200 | void myfunc(u8 *data, u32 value) | 211 | void myfunc(u8 *data, u32 value) |
201 | { | 212 | { |
@@ -204,7 +215,7 @@ access: | |||
204 | [...] | 215 | [...] |
205 | } | 216 | } |
206 | 217 | ||
207 | To avoid the unaligned memory access, you would rewrite it as follows: | 218 | To avoid the unaligned memory access, you would rewrite it as follows:: |
208 | 219 | ||
209 | void myfunc(u8 *data, u32 value) | 220 | void myfunc(u8 *data, u32 value) |
210 | { | 221 | { |
@@ -215,7 +226,7 @@ To avoid the unaligned memory access, you would rewrite it as follows: | |||
215 | } | 226 | } |
216 | 227 | ||
217 | The get_unaligned() macro works similarly. Assuming 'data' is a pointer to | 228 | The get_unaligned() macro works similarly. Assuming 'data' is a pointer to |
218 | memory and you wish to avoid unaligned access, its usage is as follows: | 229 | memory and you wish to avoid unaligned access, its usage is as follows:: |
219 | 230 | ||
220 | u32 value = get_unaligned((u32 *) data); | 231 | u32 value = get_unaligned((u32 *) data); |
221 | 232 | ||
@@ -245,18 +256,10 @@ For some ethernet hardware that cannot DMA to unaligned addresses like | |||
245 | 4*n+2 or non-ethernet hardware, this can be a problem, and it is then | 256 | 4*n+2 or non-ethernet hardware, this can be a problem, and it is then |
246 | required to copy the incoming frame into an aligned buffer. Because this is | 257 | required to copy the incoming frame into an aligned buffer. Because this is |
247 | unnecessary on architectures that can do unaligned accesses, the code can be | 258 | unnecessary on architectures that can do unaligned accesses, the code can be |
248 | made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so: | 259 | made dependent on CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS like so:: |
249 | |||
250 | #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS | ||
251 | skb = original skb | ||
252 | #else | ||
253 | skb = copy skb | ||
254 | #endif | ||
255 | |||
256 | -- | ||
257 | Authors: Daniel Drake <dsd@gentoo.org>, | ||
258 | Johannes Berg <johannes@sipsolutions.net> | ||
259 | With help from: Alan Cox, Avuton Olrich, Heikki Orsila, Jan Engelhardt, | ||
260 | Kyle McMartin, Kyle Moffett, Randy Dunlap, Robert Hancock, Uli Kunitz, | ||
261 | Vadim Lobanov | ||
262 | 260 | ||
261 | #ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS | ||
262 | skb = original skb | ||
263 | #else | ||
264 | skb = copy skb | ||
265 | #endif | ||
diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt index e5e57b40f8af..1b3950346532 100644 --- a/Documentation/vfio-mediated-device.txt +++ b/Documentation/vfio-mediated-device.txt | |||
@@ -1,14 +1,17 @@ | |||
1 | /* | 1 | .. include:: <isonum.txt> |
2 | * VFIO Mediated devices | 2 | |
3 | * | 3 | ===================== |
4 | * Copyright (c) 2016, NVIDIA CORPORATION. All rights reserved. | 4 | VFIO Mediated devices |
5 | * Author: Neo Jia <cjia@nvidia.com> | 5 | ===================== |
6 | * Kirti Wankhede <kwankhede@nvidia.com> | 6 | |
7 | * | 7 | :Copyright: |copy| 2016, NVIDIA CORPORATION. All rights reserved. |
8 | * This program is free software; you can redistribute it and/or modify | 8 | :Author: Neo Jia <cjia@nvidia.com> |
9 | * it under the terms of the GNU General Public License version 2 as | 9 | :Author: Kirti Wankhede <kwankhede@nvidia.com> |
10 | * published by the Free Software Foundation. | 10 | |
11 | */ | 11 | This program is free software; you can redistribute it and/or modify |
12 | it under the terms of the GNU General Public License version 2 as | ||
13 | published by the Free Software Foundation. | ||
14 | |||
12 | 15 | ||
13 | Virtual Function I/O (VFIO) Mediated devices[1] | 16 | Virtual Function I/O (VFIO) Mediated devices[1] |
14 | =============================================== | 17 | =============================================== |
@@ -42,7 +45,7 @@ removes it from a VFIO group. | |||
42 | 45 | ||
43 | The following high-level block diagram shows the main components and interfaces | 46 | The following high-level block diagram shows the main components and interfaces |
44 | in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM | 47 | in the VFIO mediated driver framework. The diagram shows NVIDIA, Intel, and IBM |
45 | devices as examples, as these are the first devices to use this module. | 48 | devices as examples, as these are the first devices to use this module:: |
46 | 49 | ||
47 | +---------------+ | 50 | +---------------+ |
48 | | | | 51 | | | |
@@ -91,7 +94,7 @@ Registration Interface for a Mediated Bus Driver | |||
91 | ------------------------------------------------ | 94 | ------------------------------------------------ |
92 | 95 | ||
93 | The registration interface for a mediated bus driver provides the following | 96 | The registration interface for a mediated bus driver provides the following |
94 | structure to represent a mediated device's driver: | 97 | structure to represent a mediated device's driver:: |
95 | 98 | ||
96 | /* | 99 | /* |
97 | * struct mdev_driver [2] - Mediated device's driver | 100 | * struct mdev_driver [2] - Mediated device's driver |
@@ -110,14 +113,14 @@ structure to represent a mediated device's driver: | |||
110 | A mediated bus driver for mdev should use this structure in the function calls | 113 | A mediated bus driver for mdev should use this structure in the function calls |
111 | to register and unregister itself with the core driver: | 114 | to register and unregister itself with the core driver: |
112 | 115 | ||
113 | * Register: | 116 | * Register:: |
114 | 117 | ||
115 | extern int mdev_register_driver(struct mdev_driver *drv, | 118 | extern int mdev_register_driver(struct mdev_driver *drv, |
116 | struct module *owner); | 119 | struct module *owner); |
117 | 120 | ||
118 | * Unregister: | 121 | * Unregister:: |
119 | 122 | ||
120 | extern void mdev_unregister_driver(struct mdev_driver *drv); | 123 | extern void mdev_unregister_driver(struct mdev_driver *drv); |
121 | 124 | ||
122 | The mediated bus driver is responsible for adding mediated devices to the VFIO | 125 | The mediated bus driver is responsible for adding mediated devices to the VFIO |
123 | group when devices are bound to the driver and removing mediated devices from | 126 | group when devices are bound to the driver and removing mediated devices from |
@@ -152,15 +155,15 @@ The callbacks in the mdev_parent_ops structure are as follows: | |||
152 | * mmap: mmap emulation callback | 155 | * mmap: mmap emulation callback |
153 | 156 | ||
154 | A driver should use the mdev_parent_ops structure in the function call to | 157 | A driver should use the mdev_parent_ops structure in the function call to |
155 | register itself with the mdev core driver: | 158 | register itself with the mdev core driver:: |
156 | 159 | ||
157 | extern int mdev_register_device(struct device *dev, | 160 | extern int mdev_register_device(struct device *dev, |
158 | const struct mdev_parent_ops *ops); | 161 | const struct mdev_parent_ops *ops); |
159 | 162 | ||
160 | However, the mdev_parent_ops structure is not required in the function call | 163 | However, the mdev_parent_ops structure is not required in the function call |
161 | that a driver should use to unregister itself with the mdev core driver: | 164 | that a driver should use to unregister itself with the mdev core driver:: |
162 | 165 | ||
163 | extern void mdev_unregister_device(struct device *dev); | 166 | extern void mdev_unregister_device(struct device *dev); |
164 | 167 | ||
165 | 168 | ||
166 | Mediated Device Management Interface Through sysfs | 169 | Mediated Device Management Interface Through sysfs |
@@ -183,30 +186,32 @@ with the mdev core driver. | |||
183 | Directories and files under the sysfs for Each Physical Device | 186 | Directories and files under the sysfs for Each Physical Device |
184 | -------------------------------------------------------------- | 187 | -------------------------------------------------------------- |
185 | 188 | ||
186 | |- [parent physical device] | 189 | :: |
187 | |--- Vendor-specific-attributes [optional] | 190 | |
188 | |--- [mdev_supported_types] | 191 | |- [parent physical device] |
189 | | |--- [<type-id>] | 192 | |--- Vendor-specific-attributes [optional] |
190 | | | |--- create | 193 | |--- [mdev_supported_types] |
191 | | | |--- name | 194 | | |--- [<type-id>] |
192 | | | |--- available_instances | 195 | | | |--- create |
193 | | | |--- device_api | 196 | | | |--- name |
194 | | | |--- description | 197 | | | |--- available_instances |
195 | | | |--- [devices] | 198 | | | |--- device_api |
196 | | |--- [<type-id>] | 199 | | | |--- description |
197 | | | |--- create | 200 | | | |--- [devices] |
198 | | | |--- name | 201 | | |--- [<type-id>] |
199 | | | |--- available_instances | 202 | | | |--- create |
200 | | | |--- device_api | 203 | | | |--- name |
201 | | | |--- description | 204 | | | |--- available_instances |
202 | | | |--- [devices] | 205 | | | |--- device_api |
203 | | |--- [<type-id>] | 206 | | | |--- description |
204 | | |--- create | 207 | | | |--- [devices] |
205 | | |--- name | 208 | | |--- [<type-id>] |
206 | | |--- available_instances | 209 | | |--- create |
207 | | |--- device_api | 210 | | |--- name |
208 | | |--- description | 211 | | |--- available_instances |
209 | | |--- [devices] | 212 | | |--- device_api |
213 | | |--- description | ||
214 | | |--- [devices] | ||
210 | 215 | ||
211 | * [mdev_supported_types] | 216 | * [mdev_supported_types] |
212 | 217 | ||
@@ -219,12 +224,12 @@ Directories and files under the sysfs for Each Physical Device | |||
219 | 224 | ||
220 | The [<type-id>] name is created by adding the device driver string as a prefix | 225 | The [<type-id>] name is created by adding the device driver string as a prefix |
221 | to the string provided by the vendor driver. The format of this name is as | 226 | to the string provided by the vendor driver. The format of this name is as |
222 | follows: | 227 | follows:: |
223 | 228 | ||
224 | sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name); | 229 | sprintf(buf, "%s-%s", dev_driver_string(parent->dev), group->name); |
225 | 230 | ||
226 | (or using mdev_parent_dev(mdev) to arrive at the parent device outside | 231 | (or using mdev_parent_dev(mdev) to arrive at the parent device outside |
227 | of the core mdev code) | 232 | of the core mdev code) |
228 | 233 | ||
229 | * device_api | 234 | * device_api |
230 | 235 | ||
@@ -239,7 +244,7 @@ Directories and files under the sysfs for Each Physical Device | |||
239 | * [device] | 244 | * [device] |
240 | 245 | ||
241 | This directory contains links to the devices of type <type-id> that have been | 246 | This directory contains links to the devices of type <type-id> that have been |
242 | created. | 247 | created. |
243 | 248 | ||
244 | * name | 249 | * name |
245 | 250 | ||
@@ -253,21 +258,25 @@ created. | |||
253 | Directories and Files Under the sysfs for Each mdev Device | 258 | Directories and Files Under the sysfs for Each mdev Device |
254 | ---------------------------------------------------------- | 259 | ---------------------------------------------------------- |
255 | 260 | ||
256 | |- [parent phy device] | 261 | :: |
257 | |--- [$MDEV_UUID] | 262 | |
263 | |- [parent phy device] | ||
264 | |--- [$MDEV_UUID] | ||
258 | |--- remove | 265 | |--- remove |
259 | |--- mdev_type {link to its type} | 266 | |--- mdev_type {link to its type} |
260 | |--- vendor-specific-attributes [optional] | 267 | |--- vendor-specific-attributes [optional] |
261 | 268 | ||
262 | * remove (write only) | 269 | * remove (write only) |
270 | |||
263 | Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can | 271 | Writing '1' to the 'remove' file destroys the mdev device. The vendor driver can |
264 | fail the remove() callback if that device is active and the vendor driver | 272 | fail the remove() callback if that device is active and the vendor driver |
265 | doesn't support hot unplug. | 273 | doesn't support hot unplug. |
266 | 274 | ||
267 | Example: | 275 | Example:: |
276 | |||
268 | # echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove | 277 | # echo 1 > /sys/bus/mdev/devices/$mdev_UUID/remove |
269 | 278 | ||
270 | Mediated device Hot plug: | 279 | Mediated device Hot plug |
271 | ------------------------ | 280 | ------------------------ |
272 | 281 | ||
273 | Mediated devices can be created and assigned at runtime. The procedure to hot | 282 | Mediated devices can be created and assigned at runtime. The procedure to hot |
@@ -277,13 +286,13 @@ Translation APIs for Mediated Devices | |||
277 | ===================================== | 286 | ===================================== |
278 | 287 | ||
279 | The following APIs are provided for translating user pfn to host pfn in a VFIO | 288 | The following APIs are provided for translating user pfn to host pfn in a VFIO |
280 | driver: | 289 | driver:: |
281 | 290 | ||
282 | extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, | 291 | extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, |
283 | int npage, int prot, unsigned long *phys_pfn); | 292 | int npage, int prot, unsigned long *phys_pfn); |
284 | 293 | ||
285 | extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, | 294 | extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, |
286 | int npage); | 295 | int npage); |
287 | 296 | ||
288 | These functions call back into the back-end IOMMU module by using the pin_pages | 297 | These functions call back into the back-end IOMMU module by using the pin_pages |
289 | and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently | 298 | and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently |
@@ -304,81 +313,80 @@ card. | |||
304 | 313 | ||
305 | This step creates a dummy device, /sys/devices/virtual/mtty/mtty/ | 314 | This step creates a dummy device, /sys/devices/virtual/mtty/mtty/ |
306 | 315 | ||
307 | Files in this device directory in sysfs are similar to the following: | 316 | Files in this device directory in sysfs are similar to the following:: |
308 | 317 | ||
309 | # tree /sys/devices/virtual/mtty/mtty/ | 318 | # tree /sys/devices/virtual/mtty/mtty/ |
310 | /sys/devices/virtual/mtty/mtty/ | 319 | /sys/devices/virtual/mtty/mtty/ |
311 | |-- mdev_supported_types | 320 | |-- mdev_supported_types |
312 | | |-- mtty-1 | 321 | | |-- mtty-1 |
313 | | | |-- available_instances | 322 | | | |-- available_instances |
314 | | | |-- create | 323 | | | |-- create |
315 | | | |-- device_api | 324 | | | |-- device_api |
316 | | | |-- devices | 325 | | | |-- devices |
317 | | | `-- name | 326 | | | `-- name |
318 | | `-- mtty-2 | 327 | | `-- mtty-2 |
319 | | |-- available_instances | 328 | | |-- available_instances |
320 | | |-- create | 329 | | |-- create |
321 | | |-- device_api | 330 | | |-- device_api |
322 | | |-- devices | 331 | | |-- devices |
323 | | `-- name | 332 | | `-- name |
324 | |-- mtty_dev | 333 | |-- mtty_dev |
325 | | `-- sample_mtty_dev | 334 | | `-- sample_mtty_dev |
326 | |-- power | 335 | |-- power |
327 | | |-- autosuspend_delay_ms | 336 | | |-- autosuspend_delay_ms |
328 | | |-- control | 337 | | |-- control |
329 | | |-- runtime_active_time | 338 | | |-- runtime_active_time |
330 | | |-- runtime_status | 339 | | |-- runtime_status |
331 | | `-- runtime_suspended_time | 340 | | `-- runtime_suspended_time |
332 | |-- subsystem -> ../../../../class/mtty | 341 | |-- subsystem -> ../../../../class/mtty |
333 | `-- uevent | 342 | `-- uevent |
334 | 343 | ||
335 | 2. Create a mediated device by using the dummy device that you created in the | 344 | 2. Create a mediated device by using the dummy device that you created in the |
336 | previous step. | 345 | previous step:: |
337 | 346 | ||
338 | # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \ | 347 | # echo "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" > \ |
339 | /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create | 348 | /sys/devices/virtual/mtty/mtty/mdev_supported_types/mtty-2/create |
340 | 349 | ||
341 | 3. Add parameters to qemu-kvm. | 350 | 3. Add parameters to qemu-kvm:: |
342 | 351 | ||
343 | -device vfio-pci,\ | 352 | -device vfio-pci,\ |
344 | sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 | 353 | sysfsdev=/sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001 |
345 | 354 | ||
346 | 4. Boot the VM. | 355 | 4. Boot the VM. |
347 | 356 | ||
348 | In the Linux guest VM, with no hardware on the host, the device appears | 357 | In the Linux guest VM, with no hardware on the host, the device appears |
349 | as follows: | 358 | as follows:: |
350 | 359 | ||
351 | # lspci -s 00:05.0 -xxvv | 360 | # lspci -s 00:05.0 -xxvv |
352 | 00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550]) | 361 | 00:05.0 Serial controller: Device 4348:3253 (rev 10) (prog-if 02 [16550]) |
353 | Subsystem: Device 4348:3253 | 362 | Subsystem: Device 4348:3253 |
354 | Physical Slot: 5 | 363 | Physical Slot: 5 |
355 | Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- | 364 | Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- |
356 | Stepping- SERR- FastB2B- DisINTx- | 365 | Stepping- SERR- FastB2B- DisINTx- |
357 | Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- | 366 | Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- |
358 | <TAbort- <MAbort- >SERR- <PERR- INTx- | 367 | <TAbort- <MAbort- >SERR- <PERR- INTx- |
359 | Interrupt: pin A routed to IRQ 10 | 368 | Interrupt: pin A routed to IRQ 10 |
360 | Region 0: I/O ports at c150 [size=8] | 369 | Region 0: I/O ports at c150 [size=8] |
361 | Region 1: I/O ports at c158 [size=8] | 370 | Region 1: I/O ports at c158 [size=8] |
362 | Kernel driver in use: serial | 371 | Kernel driver in use: serial |
363 | 00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00 | 372 | 00: 48 43 53 32 01 00 00 02 10 02 00 07 00 00 00 00 |
364 | 10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00 | 373 | 10: 51 c1 00 00 59 c1 00 00 00 00 00 00 00 00 00 00 |
365 | 20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32 | 374 | 20: 00 00 00 00 00 00 00 00 00 00 00 00 48 43 53 32 |
366 | 30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00 | 375 | 30: 00 00 00 00 00 00 00 00 00 00 00 00 0a 01 00 00 |
367 | 376 | ||
368 | In the Linux guest VM, dmesg output for the device is as follows: | 377 | In the Linux guest VM, dmesg output for the device is as follows:: |
369 | 378 | ||
370 | serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ | 379 | serial 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10 |
371 | 10 | 380 | 0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A |
372 | 0000:00:05.0: ttyS1 at I/O 0xc150 (irq = 10) is a 16550A | 381 | 0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A |
373 | 0000:00:05.0: ttyS2 at I/O 0xc158 (irq = 10) is a 16550A | 382 | |
374 | 383 | ||
375 | 384 | 5. In the Linux guest VM, check the serial ports:: | |
376 | 5. In the Linux guest VM, check the serial ports. | 385 | |
377 | 386 | # setserial -g /dev/ttyS* | |
378 | # setserial -g /dev/ttyS* | 387 | /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4 |
379 | /dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4 | 388 | /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10 |
380 | /dev/ttyS1, UART: 16550A, Port: 0xc150, IRQ: 10 | 389 | /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10 |
381 | /dev/ttyS2, UART: 16550A, Port: 0xc158, IRQ: 10 | ||
382 | 390 | ||
383 | 6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or | 391 | 6. Using minicom or any terminal emulation program, open port /dev/ttyS1 or |
384 | /dev/ttyS2 with hardware flow control disabled. | 392 | /dev/ttyS2 with hardware flow control disabled. |
@@ -388,14 +396,14 @@ card. | |||
388 | 396 | ||
389 | Data is looped back from the host's mtty driver. | 397 | Data is looped back from the host's mtty driver. |
390 | 398 | ||
391 | 8. Destroy the mediated device that you created. | 399 | 8. Destroy the mediated device that you created:: |
392 | 400 | ||
393 | # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove | 401 | # echo 1 > /sys/bus/mdev/devices/83b8f4f2-509f-382f-3c1e-e6bfe0fa1001/remove |
394 | 402 | ||
395 | References | 403 | References |
396 | ========== | 404 | ========== |
397 | 405 | ||
398 | [1] See Documentation/vfio.txt for more information on VFIO. | 406 | 1. See Documentation/vfio.txt for more information on VFIO. |
399 | [2] struct mdev_driver in include/linux/mdev.h | 407 | 2. struct mdev_driver in include/linux/mdev.h |
400 | [3] struct mdev_parent_ops in include/linux/mdev.h | 408 | 3. struct mdev_parent_ops in include/linux/mdev.h |
401 | [4] struct vfio_iommu_driver_ops in include/linux/vfio.h | 409 | 4. struct vfio_iommu_driver_ops in include/linux/vfio.h |
diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt index 1dd3fddfd3a1..ef6a5111eaa1 100644 --- a/Documentation/vfio.txt +++ b/Documentation/vfio.txt | |||
@@ -1,5 +1,7 @@ | |||
1 | VFIO - "Virtual Function I/O"[1] | 1 | ================================== |
2 | ------------------------------------------------------------------------------- | 2 | VFIO - "Virtual Function I/O" [1]_ |
3 | ================================== | ||
4 | |||
3 | Many modern systems now provide DMA and interrupt remapping facilities | 5 | Many modern systems now provide DMA and interrupt remapping facilities |
4 | to help ensure I/O devices behave within the boundaries they've been | 6 | to help ensure I/O devices behave within the boundaries they've been |
5 | allotted. This includes x86 hardware with AMD-Vi and Intel VT-d, | 7 | allotted. This includes x86 hardware with AMD-Vi and Intel VT-d, |
@@ -7,14 +9,14 @@ POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC | |||
7 | systems such as Freescale PAMU. The VFIO driver is an IOMMU/device | 9 | systems such as Freescale PAMU. The VFIO driver is an IOMMU/device |
8 | agnostic framework for exposing direct device access to userspace, in | 10 | agnostic framework for exposing direct device access to userspace, in |
9 | a secure, IOMMU protected environment. In other words, this allows | 11 | a secure, IOMMU protected environment. In other words, this allows |
10 | safe[2], non-privileged, userspace drivers. | 12 | safe [2]_, non-privileged, userspace drivers. |
11 | 13 | ||
12 | Why do we want that? Virtual machines often make use of direct device | 14 | Why do we want that? Virtual machines often make use of direct device |
13 | access ("device assignment") when configured for the highest possible | 15 | access ("device assignment") when configured for the highest possible |
14 | I/O performance. From a device and host perspective, this simply | 16 | I/O performance. From a device and host perspective, this simply |
15 | turns the VM into a userspace driver, with the benefits of | 17 | turns the VM into a userspace driver, with the benefits of |
16 | significantly reduced latency, higher bandwidth, and direct use of | 18 | significantly reduced latency, higher bandwidth, and direct use of |
17 | bare-metal device drivers[3]. | 19 | bare-metal device drivers [3]_. |
18 | 20 | ||
19 | Some applications, particularly in the high performance computing | 21 | Some applications, particularly in the high performance computing |
20 | field, also benefit from low-overhead, direct device access from | 22 | field, also benefit from low-overhead, direct device access from |
@@ -31,7 +33,7 @@ KVM PCI specific device assignment code as well as provide a more | |||
31 | secure, more featureful userspace driver environment than UIO. | 33 | secure, more featureful userspace driver environment than UIO. |
32 | 34 | ||
33 | Groups, Devices, and IOMMUs | 35 | Groups, Devices, and IOMMUs |
34 | ------------------------------------------------------------------------------- | 36 | --------------------------- |
35 | 37 | ||
36 | Devices are the main target of any I/O driver. Devices typically | 38 | Devices are the main target of any I/O driver. Devices typically |
37 | create a programming interface made up of I/O access, interrupts, | 39 | create a programming interface made up of I/O access, interrupts, |
@@ -114,40 +116,40 @@ well as mechanisms for describing and registering interrupt | |||
114 | notifications. | 116 | notifications. |
115 | 117 | ||
116 | VFIO Usage Example | 118 | VFIO Usage Example |
117 | ------------------------------------------------------------------------------- | 119 | ------------------ |
118 | 120 | ||
119 | Assume user wants to access PCI device 0000:06:0d.0 | 121 | Assume user wants to access PCI device 0000:06:0d.0:: |
120 | 122 | ||
121 | $ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group | 123 | $ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group |
122 | ../../../../kernel/iommu_groups/26 | 124 | ../../../../kernel/iommu_groups/26 |
123 | 125 | ||
124 | This device is therefore in IOMMU group 26. This device is on the | 126 | This device is therefore in IOMMU group 26. This device is on the |
125 | pci bus, therefore the user will make use of vfio-pci to manage the | 127 | pci bus, therefore the user will make use of vfio-pci to manage the |
126 | group: | 128 | group:: |
127 | 129 | ||
128 | # modprobe vfio-pci | 130 | # modprobe vfio-pci |
129 | 131 | ||
130 | Binding this device to the vfio-pci driver creates the VFIO group | 132 | Binding this device to the vfio-pci driver creates the VFIO group |
131 | character devices for this group: | 133 | character devices for this group:: |
132 | 134 | ||
133 | $ lspci -n -s 0000:06:0d.0 | 135 | $ lspci -n -s 0000:06:0d.0 |
134 | 06:0d.0 0401: 1102:0002 (rev 08) | 136 | 06:0d.0 0401: 1102:0002 (rev 08) |
135 | # echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind | 137 | # echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind |
136 | # echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id | 138 | # echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id |
137 | 139 | ||
138 | Now we need to look at what other devices are in the group to free | 140 | Now we need to look at what other devices are in the group to free |
139 | it for use by VFIO: | 141 | it for use by VFIO:: |
140 | 142 | ||
141 | $ ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices | 143 | $ ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices |
142 | total 0 | 144 | total 0 |
143 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:00:1e.0 -> | 145 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:00:1e.0 -> |
144 | ../../../../devices/pci0000:00/0000:00:1e.0 | 146 | ../../../../devices/pci0000:00/0000:00:1e.0 |
145 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.0 -> | 147 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.0 -> |
146 | ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 | 148 | ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 |
147 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.1 -> | 149 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.1 -> |
148 | ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 | 150 | ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 |
149 | 151 | ||
150 | This device is behind a PCIe-to-PCI bridge[4], therefore we also | 152 | This device is behind a PCIe-to-PCI bridge [4]_, therefore we also |
151 | need to add device 0000:06:0d.1 to the group following the same | 153 | need to add device 0000:06:0d.1 to the group following the same |
152 | procedure as above. Device 0000:00:1e.0 is a bridge that does | 154 | procedure as above. Device 0000:00:1e.0 is a bridge that does |
153 | not currently have a host driver, therefore it's not required to | 155 | not currently have a host driver, therefore it's not required to |
@@ -157,12 +159,12 @@ support PCI bridges). | |||
157 | The final step is to provide the user with access to the group if | 159 | The final step is to provide the user with access to the group if |
158 | unprivileged operation is desired (note that /dev/vfio/vfio provides | 160 | unprivileged operation is desired (note that /dev/vfio/vfio provides |
159 | no capabilities on its own and is therefore expected to be set to | 161 | no capabilities on its own and is therefore expected to be set to |
160 | mode 0666 by the system). | 162 | mode 0666 by the system):: |
161 | 163 | ||
162 | # chown user:user /dev/vfio/26 | 164 | # chown user:user /dev/vfio/26 |
163 | 165 | ||
164 | The user now has full access to all the devices and the iommu for this | 166 | The user now has full access to all the devices and the iommu for this |
165 | group and can access them as follows: | 167 | group and can access them as follows:: |
166 | 168 | ||
167 | int container, group, device, i; | 169 | int container, group, device, i; |
168 | struct vfio_group_status group_status = | 170 | struct vfio_group_status group_status = |
@@ -248,31 +250,31 @@ VFIO bus driver API | |||
248 | VFIO bus drivers, such as vfio-pci make use of only a few interfaces | 250 | VFIO bus drivers, such as vfio-pci make use of only a few interfaces |
249 | into VFIO core. When devices are bound and unbound to the driver, | 251 | into VFIO core. When devices are bound and unbound to the driver, |
250 | the driver should call vfio_add_group_dev() and vfio_del_group_dev() | 252 | the driver should call vfio_add_group_dev() and vfio_del_group_dev() |
251 | respectively: | 253 | respectively:: |
252 | 254 | ||
253 | extern int vfio_add_group_dev(struct iommu_group *iommu_group, | 255 | extern int vfio_add_group_dev(struct iommu_group *iommu_group, |
254 | struct device *dev, | 256 | struct device *dev, |
255 | const struct vfio_device_ops *ops, | 257 | const struct vfio_device_ops *ops, |
256 | void *device_data); | 258 | void *device_data); |
257 | 259 | ||
258 | extern void *vfio_del_group_dev(struct device *dev); | 260 | extern void *vfio_del_group_dev(struct device *dev); |
259 | 261 | ||
260 | vfio_add_group_dev() indicates to the core to begin tracking the | 262 | vfio_add_group_dev() indicates to the core to begin tracking the |
261 | specified iommu_group and register the specified dev as owned by | 263 | specified iommu_group and register the specified dev as owned by |
262 | a VFIO bus driver. The driver provides an ops structure for callbacks | 264 | a VFIO bus driver. The driver provides an ops structure for callbacks |
263 | similar to a file operations structure: | 265 | similar to a file operations structure:: |
264 | 266 | ||
265 | struct vfio_device_ops { | 267 | struct vfio_device_ops { |
266 | int (*open)(void *device_data); | 268 | int (*open)(void *device_data); |
267 | void (*release)(void *device_data); | 269 | void (*release)(void *device_data); |
268 | ssize_t (*read)(void *device_data, char __user *buf, | 270 | ssize_t (*read)(void *device_data, char __user *buf, |
269 | size_t count, loff_t *ppos); | 271 | size_t count, loff_t *ppos); |
270 | ssize_t (*write)(void *device_data, const char __user *buf, | 272 | ssize_t (*write)(void *device_data, const char __user *buf, |
271 | size_t size, loff_t *ppos); | 273 | size_t size, loff_t *ppos); |
272 | long (*ioctl)(void *device_data, unsigned int cmd, | 274 | long (*ioctl)(void *device_data, unsigned int cmd, |
273 | unsigned long arg); | 275 | unsigned long arg); |
274 | int (*mmap)(void *device_data, struct vm_area_struct *vma); | 276 | int (*mmap)(void *device_data, struct vm_area_struct *vma); |
275 | }; | 277 | }; |
276 | 278 | ||
277 | Each function is passed the device_data that was originally registered | 279 | Each function is passed the device_data that was originally registered |
278 | in the vfio_add_group_dev() call above. This allows the bus driver | 280 | in the vfio_add_group_dev() call above. This allows the bus driver |
@@ -285,50 +287,55 @@ own VFIO_DEVICE_GET_REGION_INFO ioctl. | |||
285 | 287 | ||
286 | 288 | ||
287 | PPC64 sPAPR implementation note | 289 | PPC64 sPAPR implementation note |
288 | ------------------------------------------------------------------------------- | 290 | ------------------------------- |
289 | 291 | ||
290 | This implementation has some specifics: | 292 | This implementation has some specifics: |
291 | 293 | ||
292 | 1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per | 294 | 1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per |
293 | container is supported, as an IOMMU table is allocated at boot time, | 295 | container is supported, as an IOMMU table is allocated at boot time, |
294 | one table per IOMMU group, which is a Partitionable Endpoint (PE) | 296 | one table per IOMMU group, which is a Partitionable Endpoint (PE) |
295 | (PE is often a PCI domain but not always). | 297 | (PE is often a PCI domain but not always). |
296 | Newer systems (POWER8 with IODA2) have improved hardware design which allows | 298 | |
297 | to remove this limitation and have multiple IOMMU groups per a VFIO container. | 299 | Newer systems (POWER8 with IODA2) have improved hardware design which allows |
300 | to remove this limitation and have multiple IOMMU groups per a VFIO | ||
301 | container. | ||
298 | 302 | ||
299 | 2) The hardware supports so called DMA windows - the PCI address range | 303 | 2) The hardware supports so called DMA windows - the PCI address range |
300 | within which DMA transfer is allowed, any attempt to access address space | 304 | within which DMA transfer is allowed, any attempt to access address space |
301 | out of the window leads to the whole PE isolation. | 305 | out of the window leads to the whole PE isolation. |
302 | 306 | ||
303 | 3) PPC64 guests are paravirtualized but not fully emulated. There is an API | 307 | 3) PPC64 guests are paravirtualized but not fully emulated. There is an API |
304 | to map/unmap pages for DMA, and it normally maps 1..32 pages per call and | 308 | to map/unmap pages for DMA, and it normally maps 1..32 pages per call and |
305 | currently there is no way to reduce the number of calls. In order to make things | 309 | currently there is no way to reduce the number of calls. In order to make |
306 | faster, the map/unmap handling has been implemented in real mode, which provides | 310 | things faster, the map/unmap handling has been implemented in real mode, |
307 | excellent performance but has limitations, such as the inability to do | 311 | which provides excellent performance but has limitations, such as the |
308 | locked-pages accounting in real time. | 312 | inability to do locked-pages accounting in real time. |
309 | 313 | ||
310 | 4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O | 314 | 4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O |
311 | subtree that can be treated as a unit for the purposes of partitioning and | 315 | subtree that can be treated as a unit for the purposes of partitioning and |
312 | error recovery. A PE may be a single or multi-function IOA (IO Adapter), a | 316 | error recovery. A PE may be a single or multi-function IOA (IO Adapter), a |
313 | function of a multi-function IOA, or multiple IOAs (possibly including switch | 317 | function of a multi-function IOA, or multiple IOAs (possibly including |
314 | and bridge structures above the multiple IOAs). PPC64 guests detect PCI errors | 318 | switch and bridge structures above the multiple IOAs). PPC64 guests detect |
315 | and recover from them via EEH RTAS services, which works on the basis of | 319 | PCI errors and recover from them via EEH RTAS services, which works on the |
316 | additional ioctl commands. | 320 | basis of additional ioctl commands. |
317 | 321 | ||
318 | So 4 additional ioctls have been added: | 322 | So 4 additional ioctls have been added: |
319 | 323 | ||
320 | VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start | 324 | VFIO_IOMMU_SPAPR_TCE_GET_INFO |
321 | of the DMA window on the PCI bus. | 325 | returns the size and the start of the DMA window on the PCI bus. |
322 | 326 | ||
323 | VFIO_IOMMU_ENABLE - enables the container. The locked pages accounting | 327 | VFIO_IOMMU_ENABLE |
328 | enables the container. The locked pages accounting | ||
324 | is done at this point. This lets the user first learn what | 329 | is done at this point. This lets the user first learn what |
325 | the DMA window is and adjust the rlimit before doing any real work. | 330 | the DMA window is and adjust the rlimit before doing any real work. |
326 | 331 | ||
327 | VFIO_IOMMU_DISABLE - disables the container. | 332 | VFIO_IOMMU_DISABLE |
333 | disables the container. | ||
328 | 334 | ||
329 | VFIO_EEH_PE_OP - provides an API for EEH setup, error detection and recovery. | 335 | VFIO_EEH_PE_OP |
336 | provides an API for EEH setup, error detection and recovery. | ||
330 | 337 | ||
331 | The code flow from the example above should be slightly changed: | 338 | The code flow from the example above should be slightly changed:: |
332 | 339 | ||
333 | struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op), .flags = 0 }; | 340 | struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op), .flags = 0 }; |
334 | 341 | ||
@@ -442,73 +449,73 @@ The code flow from the example above should be slightly changed: | |||
442 | .... | 449 | .... |
443 | 450 | ||
444 | 5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/ | 451 | 5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/ |
445 | VFIO_IOMMU_DISABLE and implements 2 new ioctls: | 452 | VFIO_IOMMU_DISABLE and implements 2 new ioctls: |
446 | VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY | 453 | VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY |
447 | (which are unsupported in v1 IOMMU). | 454 | (which are unsupported in v1 IOMMU). |
448 | 455 | ||
449 | PPC64 paravirtualized guests generate a lot of map/unmap requests, | 456 | PPC64 paravirtualized guests generate a lot of map/unmap requests, |
450 | and the handling of those includes pinning/unpinning pages and updating | 457 | and the handling of those includes pinning/unpinning pages and updating |
451 | mm::locked_vm counter to make sure we do not exceed the rlimit. | 458 | mm::locked_vm counter to make sure we do not exceed the rlimit. |
452 | The v2 IOMMU splits accounting and pinning into separate operations: | 459 | The v2 IOMMU splits accounting and pinning into separate operations: |
453 | 460 | ||
454 | - VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls | 461 | - VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls |
455 | receive a user space address and size of the block to be pinned. | 462 | receive a user space address and size of the block to be pinned. |
456 | Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to | 463 | Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to |
457 | be called with the exact address and size used for registering | 464 | be called with the exact address and size used for registering |
458 | the memory block. The userspace is not expected to call these often. | 465 | the memory block. The userspace is not expected to call these often. |
459 | The ranges are stored in a linked list in a VFIO container. | 466 | The ranges are stored in a linked list in a VFIO container. |
460 | 467 | ||
461 | - VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual | 468 | - VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual |
462 | IOMMU table and do not do pinning; instead these check that the userspace | 469 | IOMMU table and do not do pinning; instead these check that the userspace |
463 | address is from a pre-registered range. | 470 | address is from a pre-registered range. |
464 | 471 | ||
465 | This separation helps in optimizing DMA for guests. | 472 | This separation helps in optimizing DMA for guests. |
466 | 473 | ||
467 | 6) sPAPR specification allows guests to have an additional DMA window(s) on | 474 | 6) sPAPR specification allows guests to have an additional DMA window(s) on |
468 | a PCI bus with a variable page size. Two ioctls have been added to support | 475 | a PCI bus with a variable page size. Two ioctls have been added to support |
469 | this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE. | 476 | this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE. |
470 | The platform has to support this functionality or an error will be returned | 477 | The platform has to support this functionality or an error will be returned |
471 | to the userspace. The existing hardware supports up to 2 DMA windows: one is | 478 | to the userspace. The existing hardware supports up to 2 DMA windows: one is |
472 | 2GB long, uses 4K pages, and is called the "default 32bit window"; the other | 479 | 2GB long, uses 4K pages, and is called the "default 32bit window"; the other |
473 | can be as big as the entire RAM, may use a different page size, and is | 480 | can be as big as the entire RAM, may use a different page size, and is |
474 | optional - guests create it at run-time if the guest driver supports 64bit DMA. | 481 | optional - guests create it at run-time if the guest driver supports 64bit DMA. |
475 | 482 | ||
476 | VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and | 483 | VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and |
477 | a number of TCE table levels (if a TCE table is going to be big enough and | 484 | a number of TCE table levels (if a TCE table is going to be big enough and |
478 | the kernel may not be able to allocate enough of physically contiguous memory). | 485 | the kernel may not be able to allocate enough of physically contiguous |
479 | It creates a new window in the available slot and returns the bus address where | 486 | memory). It creates a new window in the available slot and returns the bus |
480 | the new window starts. Due to hardware limitation, the user space cannot choose | 487 | address where the new window starts. Due to hardware limitation, the user |
481 | the location of DMA windows. | 488 | space cannot choose the location of DMA windows. |
482 | 489 | ||
483 | VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window | 490 | VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window |
484 | and removes it. | 491 | and removes it. |
485 | 492 | ||
486 | ------------------------------------------------------------------------------- | 493 | ------------------------------------------------------------------------------- |
487 | 494 | ||
488 | [1] VFIO was originally an acronym for "Virtual Function I/O" in its | 495 | .. [1] VFIO was originally an acronym for "Virtual Function I/O" in its |
489 | initial implementation by Tom Lyon while as Cisco. We've since | 496 | initial implementation by Tom Lyon while as Cisco. We've since |
490 | outgrown the acronym, but it's catchy. | 497 | outgrown the acronym, but it's catchy. |
491 | 498 | ||
492 | [2] "safe" also depends upon a device being "well behaved". It's | 499 | .. [2] "safe" also depends upon a device being "well behaved". It's |
493 | possible for multi-function devices to have backdoors between | 500 | possible for multi-function devices to have backdoors between |
494 | functions and even for single function devices to have alternative | 501 | functions and even for single function devices to have alternative |
495 | access to things like PCI config space through MMIO registers. To | 502 | access to things like PCI config space through MMIO registers. To |
496 | guard against the former we can include additional precautions in the | 503 | guard against the former we can include additional precautions in the |
497 | IOMMU driver to group multi-function PCI devices together | 504 | IOMMU driver to group multi-function PCI devices together |
498 | (iommu=group_mf). The latter we can't prevent, but the IOMMU should | 505 | (iommu=group_mf). The latter we can't prevent, but the IOMMU should |
499 | still provide isolation. For PCI, SR-IOV Virtual Functions are the | 506 | still provide isolation. For PCI, SR-IOV Virtual Functions are the |
500 | best indicator of "well behaved", as these are designed for | 507 | best indicator of "well behaved", as these are designed for |
501 | virtualization usage models. | 508 | virtualization usage models. |
502 | 509 | ||
503 | [3] As always there are trade-offs to virtual machine device | 510 | .. [3] As always there are trade-offs to virtual machine device |
504 | assignment that are beyond the scope of VFIO. It's expected that | 511 | assignment that are beyond the scope of VFIO. It's expected that |
505 | future IOMMU technologies will reduce some, but maybe not all, of | 512 | future IOMMU technologies will reduce some, but maybe not all, of |
506 | these trade-offs. | 513 | these trade-offs. |
507 | 514 | ||
508 | [4] In this case the device is below a PCI bridge, so transactions | 515 | .. [4] In this case the device is below a PCI bridge, so transactions |
509 | from either function of the device are indistinguishable to the iommu: | 516 | from either function of the device are indistinguishable to the iommu:: |
510 | 517 | ||
511 | -[0000:00]-+-1e.0-[06]--+-0d.0 | 518 | -[0000:00]-+-1e.0-[06]--+-0d.0 |
512 | \-0d.1 | 519 | \-0d.1 |
513 | 520 | ||
514 | 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90) | 521 | 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90) |
diff --git a/Documentation/xillybus.txt b/Documentation/xillybus.txt index 1660145b9969..2446ee303c09 100644 --- a/Documentation/xillybus.txt +++ b/Documentation/xillybus.txt | |||
@@ -1,12 +1,11 @@ | |||
1 | ========================================== | ||
2 | Xillybus driver for generic FPGA interface | ||
3 | ========================================== | ||
1 | 4 | ||
2 | ========================================== | 5 | :Author: Eli Billauer, Xillybus Ltd. (http://xillybus.com) |
3 | Xillybus driver for generic FPGA interface | 6 | :Email: eli.billauer@gmail.com or as advertised on Xillybus' site. |
4 | ========================================== | ||
5 | 7 | ||
6 | Author: Eli Billauer, Xillybus Ltd. (http://xillybus.com) | 8 | .. Contents: |
7 | Email: eli.billauer@gmail.com or as advertised on Xillybus' site. | ||
8 | |||
9 | Contents: | ||
10 | 9 | ||
11 | - Introduction | 10 | - Introduction |
12 | -- Background | 11 | -- Background |
@@ -17,7 +16,7 @@ Contents: | |||
17 | -- Synchronization | 16 | -- Synchronization |
18 | -- Seekable pipes | 17 | -- Seekable pipes |
19 | 18 | ||
20 | - Internals | 19 | - Internals |
21 | -- Source code organization | 20 | -- Source code organization |
22 | -- Pipe attributes | 21 | -- Pipe attributes |
23 | -- Host never reads from the FPGA | 22 | -- Host never reads from the FPGA |
@@ -29,7 +28,7 @@ Contents: | |||
29 | -- The "nonempty" message (supporting poll) | 28 | -- The "nonempty" message (supporting poll) |
30 | 29 | ||
31 | 30 | ||
32 | INTRODUCTION | 31 | Introduction |
33 | ============ | 32 | ============ |
34 | 33 | ||
35 | Background | 34 | Background |
@@ -105,7 +104,7 @@ driver is used to work out of the box with any Xillybus IP core. | |||
105 | The data structure just mentioned should not be confused with PCI's | 104 | The data structure just mentioned should not be confused with PCI's |
106 | configuration space or the Flattened Device Tree. | 105 | configuration space or the Flattened Device Tree. |
107 | 106 | ||
108 | USAGE | 107 | Usage |
109 | ===== | 108 | ===== |
110 | 109 | ||
111 | User interface | 110 | User interface |
@@ -117,11 +116,11 @@ names of these files depend on the IP core that is loaded in the FPGA (see | |||
117 | Probing below). To communicate with the FPGA, open the device file that | 116 | Probing below). To communicate with the FPGA, open the device file that |
118 | corresponds to the hardware FIFO you want to send data or receive data from, | 117 | corresponds to the hardware FIFO you want to send data or receive data from, |
119 | and use plain write() or read() calls, just like with a regular pipe. In | 118 | and use plain write() or read() calls, just like with a regular pipe. In |
120 | particular, it makes perfect sense to go: | 119 | particular, it makes perfect sense to go:: |
121 | 120 | ||
122 | $ cat mydata > /dev/xillybus_thisfifo | 121 | $ cat mydata > /dev/xillybus_thisfifo |
123 | 122 | ||
124 | $ cat /dev/xillybus_thatfifo > hisdata | 123 | $ cat /dev/xillybus_thatfifo > hisdata |
125 | 124 | ||
126 | possibly pressing CTRL-C at some stage, even though the xillybus_* pipes have | 125 | possibly pressing CTRL-C at some stage, even though the xillybus_* pipes have |
127 | the capability to send an EOF (but may not use it). | 126 | the capability to send an EOF (but may not use it). |
@@ -178,7 +177,7 @@ the attached memory is done by seeking to the desired address, and calling | |||
178 | read() or write() as required. | 177 | read() or write() as required. |
179 | 178 | ||
180 | 179 | ||
181 | INTERNALS | 180 | Internals |
182 | ========= | 181 | ========= |
183 | 182 | ||
184 | Source code organization | 183 | Source code organization |
@@ -365,7 +364,7 @@ into that page. It can be shown that all pages requested from the kernel | |||
365 | (except possibly for the last) are 100% utilized this way. | 364 | (except possibly for the last) are 100% utilized this way. |
366 | 365 | ||
367 | The "nonempty" message (supporting poll) | 366 | The "nonempty" message (supporting poll) |
368 | --------------------------------------- | 367 | ---------------------------------------- |
369 | 368 | ||
370 | In order to support the "poll" method (and hence select() ), there is a small | 369 | In order to support the "poll" method (and hence select() ), there is a small |
371 | catch regarding the FPGA to host direction: The FPGA may have filled a DMA | 370 | catch regarding the FPGA to host direction: The FPGA may have filled a DMA |
diff --git a/Documentation/xz.txt b/Documentation/xz.txt index 2cf3e2608de3..b2220d03aa50 100644 --- a/Documentation/xz.txt +++ b/Documentation/xz.txt | |||
@@ -1,121 +1,127 @@ | |||
1 | 1 | ============================ | |
2 | XZ data compression in Linux | 2 | XZ data compression in Linux |
3 | ============================ | 3 | ============================ |
4 | 4 | ||
5 | Introduction | 5 | Introduction |
6 | ============ | ||
6 | 7 | ||
7 | XZ is a general purpose data compression format with high compression | 8 | XZ is a general purpose data compression format with high compression |
8 | ratio and relatively fast decompression. The primary compression | 9 | ratio and relatively fast decompression. The primary compression |
9 | algorithm (filter) is LZMA2. Additional filters can be used to improve | 10 | algorithm (filter) is LZMA2. Additional filters can be used to improve |
10 | compression ratio even further. E.g. Branch/Call/Jump (BCJ) filters | 11 | compression ratio even further. E.g. Branch/Call/Jump (BCJ) filters |
11 | improve compression ratio of executable data. | 12 | improve compression ratio of executable data. |
12 | 13 | ||
13 | The XZ decompressor in Linux is called XZ Embedded. It supports | 14 | The XZ decompressor in Linux is called XZ Embedded. It supports |
14 | the LZMA2 filter and optionally also BCJ filters. CRC32 is supported | 15 | the LZMA2 filter and optionally also BCJ filters. CRC32 is supported |
15 | for integrity checking. The home page of XZ Embedded is at | 16 | for integrity checking. The home page of XZ Embedded is at |
16 | <http://tukaani.org/xz/embedded.html>, where you can find the | 17 | <http://tukaani.org/xz/embedded.html>, where you can find the |
17 | latest version and also information about using the code outside | 18 | latest version and also information about using the code outside |
18 | the Linux kernel. | 19 | the Linux kernel. |
19 | 20 | ||
20 | For userspace, XZ Utils provide a zlib-like compression library | 21 | For userspace, XZ Utils provide a zlib-like compression library |
21 | and a gzip-like command line tool. XZ Utils can be downloaded from | 22 | and a gzip-like command line tool. XZ Utils can be downloaded from |
22 | <http://tukaani.org/xz/>. | 23 | <http://tukaani.org/xz/>. |
23 | 24 | ||
24 | XZ related components in the kernel | 25 | XZ related components in the kernel |
25 | 26 | =================================== | |
26 | The xz_dec module provides an XZ decompressor with single-call (buffer | 27 | |
27 | to buffer) and multi-call (stateful) APIs. The usage of the xz_dec | 28 | The xz_dec module provides an XZ decompressor with single-call (buffer |
28 | module is documented in include/linux/xz.h. | 29 | to buffer) and multi-call (stateful) APIs. The usage of the xz_dec |
29 | 30 | module is documented in include/linux/xz.h. | |
30 | The xz_dec_test module is for testing xz_dec. xz_dec_test is not | 31 | |
31 | useful unless you are hacking the XZ decompressor. xz_dec_test | 32 | The xz_dec_test module is for testing xz_dec. xz_dec_test is not |
32 | allocates a char device major dynamically to which one can write | 33 | useful unless you are hacking the XZ decompressor. xz_dec_test |
33 | .xz files from userspace. The decompressed output is thrown away. | 34 | allocates a char device major dynamically to which one can write |
34 | Keep an eye on dmesg to see diagnostics printed by xz_dec_test. | 35 | .xz files from userspace. The decompressed output is thrown away. |
35 | See the xz_dec_test source code for the details. | 36 | Keep an eye on dmesg to see diagnostics printed by xz_dec_test. |
36 | 37 | See the xz_dec_test source code for the details. | |
37 | For decompressing the kernel image, initramfs, and initrd, there | 38 | |
38 | is a wrapper function in lib/decompress_unxz.c. Its API is the | 39 | For decompressing the kernel image, initramfs, and initrd, there |
39 | same as in other decompress_*.c files, which is defined in | 40 | is a wrapper function in lib/decompress_unxz.c. Its API is the |
40 | include/linux/decompress/generic.h. | 41 | same as in other decompress_*.c files, which is defined in |
41 | 42 | include/linux/decompress/generic.h. | |
42 | scripts/xz_wrap.sh is a wrapper for the xz command line tool found | 43 | |
43 | from XZ Utils. The wrapper sets compression options to values suitable | 44 | scripts/xz_wrap.sh is a wrapper for the xz command line tool found |
44 | for compressing the kernel image. | 45 | from XZ Utils. The wrapper sets compression options to values suitable |
45 | 46 | for compressing the kernel image. | |
46 | For kernel makefiles, two commands are provided for use with | 47 | |
47 | $(call if_needed). The kernel image should be compressed with | 48 | For kernel makefiles, two commands are provided for use with |
48 | $(call if_needed,xzkern) which will use a BCJ filter and a big LZMA2 | 49 | $(call if_needed). The kernel image should be compressed with |
49 | dictionary. It will also append a four-byte trailer containing the | 50 | $(call if_needed,xzkern) which will use a BCJ filter and a big LZMA2 |
50 | uncompressed size of the file, which is needed by the boot code. | 51 | dictionary. It will also append a four-byte trailer containing the |
51 | Other things should be compressed with $(call if_needed,xzmisc) | 52 | uncompressed size of the file, which is needed by the boot code. |
52 | which will use no BCJ filter and 1 MiB LZMA2 dictionary. | 53 | Other things should be compressed with $(call if_needed,xzmisc) |
54 | which will use no BCJ filter and 1 MiB LZMA2 dictionary. | ||
53 | 55 | ||
54 | Notes on compression options | 56 | Notes on compression options |
57 | ============================ | ||
55 | 58 | ||
56 | Since XZ Embedded supports only streams with no integrity check or | 59 | Since XZ Embedded supports only streams with no integrity check or |
57 | CRC32, make sure that you don't use some other integrity check type | 60 | CRC32, make sure that you don't use some other integrity check type |
58 | when encoding files that are supposed to be decoded by the kernel. With | 61 | when encoding files that are supposed to be decoded by the kernel. With |
59 | liblzma, you need to use either LZMA_CHECK_NONE or LZMA_CHECK_CRC32 | 62 | liblzma, you need to use either LZMA_CHECK_NONE or LZMA_CHECK_CRC32 |
60 | when encoding. With the xz command line tool, use --check=none or | 63 | when encoding. With the xz command line tool, use --check=none or |
61 | --check=crc32. | 64 | --check=crc32. |
62 | 65 | ||
63 | Using CRC32 is strongly recommended unless there is some other layer | 66 | Using CRC32 is strongly recommended unless there is some other layer |
64 | which will verify the integrity of the uncompressed data anyway. | 67 | which will verify the integrity of the uncompressed data anyway. |
65 | Double checking the integrity would probably be a waste of CPU cycles. | 68 | Double checking the integrity would probably be a waste of CPU cycles. |
66 | Note that the headers will always have a CRC32 which will be validated | 69 | Note that the headers will always have a CRC32 which will be validated |
67 | by the decoder; you can only change the integrity check type (or | 70 | by the decoder; you can only change the integrity check type (or |
68 | disable it) for the actual uncompressed data. | 71 | disable it) for the actual uncompressed data. |
69 | 72 | ||
70 | In userspace, LZMA2 is typically used with dictionary sizes of several | 73 | In userspace, LZMA2 is typically used with dictionary sizes of several |
71 | megabytes. The decoder needs to have the dictionary in RAM, thus big | 74 | megabytes. The decoder needs to have the dictionary in RAM, thus big |
72 | dictionaries cannot be used for files that are intended to be decoded | 75 | dictionaries cannot be used for files that are intended to be decoded |
73 | by the kernel. 1 MiB is probably the maximum reasonable dictionary | 76 | by the kernel. 1 MiB is probably the maximum reasonable dictionary |
74 | size for in-kernel use (maybe more is OK for initramfs). The presets | 77 | size for in-kernel use (maybe more is OK for initramfs). The presets |
75 | in XZ Utils may not be optimal when creating files for the kernel, | 78 | in XZ Utils may not be optimal when creating files for the kernel, |
76 | so don't hesitate to use custom settings. Example: | 79 | so don't hesitate to use custom settings. Example:: |
77 | 80 | ||
78 | xz --check=crc32 --lzma2=dict=512KiB inputfile | 81 | xz --check=crc32 --lzma2=dict=512KiB inputfile |
79 | 82 | ||
80 | An exception to the above dictionary size limitation is when the decoder | 83 | An exception to the above dictionary size limitation is when the decoder |
81 | is used in single-call mode. Decompressing the kernel itself is an | 84 | is used in single-call mode. Decompressing the kernel itself is an |
82 | example of this situation. In single-call mode, the memory usage | 85 | example of this situation. In single-call mode, the memory usage |
83 | doesn't depend on the dictionary size, and it is perfectly fine to | 86 | doesn't depend on the dictionary size, and it is perfectly fine to |
84 | use a big dictionary: for maximum compression, the dictionary should | 87 | use a big dictionary: for maximum compression, the dictionary should |
85 | be at least as big as the uncompressed data itself. | 88 | be at least as big as the uncompressed data itself. |
86 | 89 | ||
87 | Future plans | 90 | Future plans |
91 | ============ | ||
88 | 92 | ||
89 | Creating a limited XZ encoder may be considered if people think it is | 93 | Creating a limited XZ encoder may be considered if people think it is |
90 | useful. LZMA2 is slower to compress than e.g. Deflate or LZO even at | 94 | useful. LZMA2 is slower to compress than e.g. Deflate or LZO even at |
91 | the fastest settings, so it isn't clear if an LZMA2 encoder is wanted | 95 | the fastest settings, so it isn't clear if an LZMA2 encoder is wanted |
92 | in the kernel. | 96 | in the kernel. |
93 | 97 | ||
94 | Support for limited random-access reading is planned for the | 98 | Support for limited random-access reading is planned for the |
95 | decompression code. I don't know if it could have any use in the | 99 | decompression code. I don't know if it could have any use in the |
96 | kernel, but I know that it would be useful in some embedded projects | 100 | kernel, but I know that it would be useful in some embedded projects |
97 | outside the Linux kernel. | 101 | outside the Linux kernel. |
98 | 102 | ||
99 | Conformance to the .xz file format specification | 103 | Conformance to the .xz file format specification |
104 | ================================================ | ||
100 | 105 | ||
101 | There are a couple of corner cases where things have been simplified | 106 | There are a couple of corner cases where things have been simplified |
102 | at the expense of detecting errors as early as possible. These should not | 107 | at the expense of detecting errors as early as possible. These should not |
103 | matter in practice at all, since they don't cause security issues. But | 108 | matter in practice at all, since they don't cause security issues. But |
104 | it is good to know this if testing the code e.g. with the test files | 109 | it is good to know this if testing the code e.g. with the test files |
105 | from XZ Utils. | 110 | from XZ Utils. |
106 | 111 | ||
107 | Reporting bugs | 112 | Reporting bugs |
113 | ============== | ||
108 | 114 | ||
109 | Before reporting a bug, please check that it's not fixed already | 115 | Before reporting a bug, please check that it's not fixed already |
110 | at upstream. See <http://tukaani.org/xz/embedded.html> to get the | 116 | at upstream. See <http://tukaani.org/xz/embedded.html> to get the |
111 | latest code. | 117 | latest code. |
112 | 118 | ||
113 | Report bugs to <lasse.collin@tukaani.org> or visit #tukaani on | 119 | Report bugs to <lasse.collin@tukaani.org> or visit #tukaani on |
114 | Freenode and talk to Larhzu. I don't actively read LKML or other | 120 | Freenode and talk to Larhzu. I don't actively read LKML or other |
115 | kernel-related mailing lists, so if there's something I should know, | 121 | kernel-related mailing lists, so if there's something I should know, |
116 | you should email to me personally or use IRC. | 122 | you should email to me personally or use IRC. |
117 | 123 | ||
118 | Don't bother Igor Pavlov with questions about the XZ implementation | 124 | Don't bother Igor Pavlov with questions about the XZ implementation |
119 | in the kernel or about XZ Utils. While these two implementations | 125 | in the kernel or about XZ Utils. While these two implementations |
120 | include essential code that is directly based on Igor Pavlov's code, | 126 | include essential code that is directly based on Igor Pavlov's code, |
121 | these implementations aren't maintained nor supported by him. | 127 | these implementations aren't maintained nor supported by him. |
diff --git a/Documentation/zorro.txt b/Documentation/zorro.txt index d530971beb00..664072b017e3 100644 --- a/Documentation/zorro.txt +++ b/Documentation/zorro.txt | |||
@@ -1,12 +1,13 @@ | |||
1 | Writing Device Drivers for Zorro Devices | 1 | ======================================== |
2 | ---------------------------------------- | 2 | Writing Device Drivers for Zorro Devices |
3 | ======================================== | ||
3 | 4 | ||
4 | Written by Geert Uytterhoeven <geert@linux-m68k.org> | 5 | :Author: Geert Uytterhoeven <geert@linux-m68k.org> |
5 | Last revised: September 5, 2003 | 6 | :Last revised: September 5, 2003 |
6 | 7 | ||
7 | 8 | ||
8 | 1. Introduction | 9 | Introduction |
9 | --------------- | 10 | ------------ |
10 | 11 | ||
11 | The Zorro bus is the bus used in the Amiga family of computers. Thanks to | 12 | The Zorro bus is the bus used in the Amiga family of computers. Thanks to |
12 | AutoConfig(tm), it's 100% Plug-and-Play. | 13 | AutoConfig(tm), it's 100% Plug-and-Play. |
@@ -20,12 +21,12 @@ There are two types of Zorro buses, Zorro II and Zorro III: | |||
20 | with Zorro II. The Zorro III address space lies outside the first 16 MB. | 21 | with Zorro II. The Zorro III address space lies outside the first 16 MB. |
21 | 22 | ||
22 | 23 | ||
23 | 2. Probing for Zorro Devices | 24 | Probing for Zorro Devices |
24 | ---------------------------- | 25 | ------------------------- |
25 | 26 | ||
26 | Zorro devices are found by calling `zorro_find_device()', which returns a | 27 | Zorro devices are found by calling ``zorro_find_device()``, which returns a |
27 | pointer to the `next' Zorro device with the specified Zorro ID. A probe loop | 28 | pointer to the ``next`` Zorro device with the specified Zorro ID. A probe loop |
28 | for the board with Zorro ID `ZORRO_PROD_xxx' looks like: | 29 | for the board with Zorro ID ``ZORRO_PROD_xxx`` looks like:: |
29 | 30 | ||
30 | struct zorro_dev *z = NULL; | 31 | struct zorro_dev *z = NULL; |
31 | 32 | ||
@@ -35,8 +36,8 @@ for the board with Zorro ID `ZORRO_PROD_xxx' looks like: | |||
35 | ... | 36 | ... |
36 | } | 37 | } |
37 | 38 | ||
38 | `ZORRO_WILDCARD' acts as a wildcard and finds any Zorro device. If your driver | 39 | ``ZORRO_WILDCARD`` acts as a wildcard and finds any Zorro device. If your driver |
39 | supports different types of boards, you can use a construct like: | 40 | supports different types of boards, you can use a construct like:: |
40 | 41 | ||
41 | struct zorro_dev *z = NULL; | 42 | struct zorro_dev *z = NULL; |
42 | 43 | ||
@@ -49,24 +50,24 @@ supports different types of boards, you can use a construct like: | |||
49 | } | 50 | } |
50 | 51 | ||
51 | 52 | ||
52 | 3. Zorro Resources | 53 | Zorro Resources |
53 | ------------------ | 54 | --------------- |
54 | 55 | ||
55 | Before you can access a Zorro device's registers, you have to make sure it's | 56 | Before you can access a Zorro device's registers, you have to make sure it's |
56 | not yet in use. This is done using the I/O memory space resource management | 57 | not yet in use. This is done using the I/O memory space resource management |
57 | functions: | 58 | functions:: |
58 | 59 | ||
59 | request_mem_region() | 60 | request_mem_region() |
60 | release_mem_region() | 61 | release_mem_region() |
61 | 62 | ||
62 | Shortcuts to claim the whole device's address space are provided as well: | 63 | Shortcuts to claim the whole device's address space are provided as well:: |
63 | 64 | ||
64 | zorro_request_device | 65 | zorro_request_device |
65 | zorro_release_device | 66 | zorro_release_device |
66 | 67 | ||
67 | 68 | ||
68 | 4. Accessing the Zorro Address Space | 69 | Accessing the Zorro Address Space |
69 | ------------------------------------ | 70 | --------------------------------- |
70 | 71 | ||
71 | The address regions in the Zorro device resources are Zorro bus address | 72 | The address regions in the Zorro device resources are Zorro bus address |
72 | regions. Due to the identity bus-physical address mapping on the Zorro bus, | 73 | regions. Due to the identity bus-physical address mapping on the Zorro bus, |
@@ -78,26 +79,26 @@ The treatment of these regions depends on the type of Zorro space: | |||
78 | explicitly using z_ioremap(). | 79 | explicitly using z_ioremap(). |
79 | 80 | ||
80 | Conversion from bus/physical Zorro II addresses to kernel virtual addresses | 81 | Conversion from bus/physical Zorro II addresses to kernel virtual addresses |
81 | and vice versa is done using: | 82 | and vice versa is done using:: |
82 | 83 | ||
83 | virt_addr = ZTWO_VADDR(bus_addr); | 84 | virt_addr = ZTWO_VADDR(bus_addr); |
84 | bus_addr = ZTWO_PADDR(virt_addr); | 85 | bus_addr = ZTWO_PADDR(virt_addr); |
85 | 86 | ||
86 | - Zorro III address space must be mapped explicitly using z_ioremap() first | 87 | - Zorro III address space must be mapped explicitly using z_ioremap() first |
87 | before it can be accessed: | 88 | before it can be accessed:: |
88 | 89 | ||
89 | virt_addr = z_ioremap(bus_addr, size); | 90 | virt_addr = z_ioremap(bus_addr, size); |
90 | ... | 91 | ... |
91 | z_iounmap(virt_addr); | 92 | z_iounmap(virt_addr); |
92 | 93 | ||
93 | 94 | ||
94 | 5. References | 95 | References |
95 | ------------- | 96 | ---------- |
96 | 97 | ||
97 | linux/include/linux/zorro.h | 98 | #. linux/include/linux/zorro.h |
98 | linux/include/uapi/linux/zorro.h | 99 | #. linux/include/uapi/linux/zorro.h |
99 | linux/include/uapi/linux/zorro_ids.h | 100 | #. linux/include/uapi/linux/zorro_ids.h |
100 | linux/arch/m68k/include/asm/zorro.h | 101 | #. linux/arch/m68k/include/asm/zorro.h |
101 | linux/drivers/zorro | 102 | #. linux/drivers/zorro |
102 | /proc/bus/zorro | 103 | #. /proc/bus/zorro |
103 | 104 | ||