| -rw-r--r-- | Documentation/vfio.txt | 281 |
1 files changed, 144 insertions, 137 deletions
diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
index 1dd3fddfd3a1..ef6a5111eaa1 100644
--- a/Documentation/vfio.txt
+++ b/Documentation/vfio.txt
| @@ -1,5 +1,7 @@ | |||
| 1 | VFIO - "Virtual Function I/O"[1] | 1 | ================================== |
| 2 | ------------------------------------------------------------------------------- | 2 | VFIO - "Virtual Function I/O" [1]_ |
| 3 | ================================== | ||
| 4 | |||
| 3 | Many modern system now provide DMA and interrupt remapping facilities | 5 | Many modern system now provide DMA and interrupt remapping facilities |
| 4 | to help ensure I/O devices behave within the boundaries they've been | 6 | to help ensure I/O devices behave within the boundaries they've been |
| 5 | allotted. This includes x86 hardware with AMD-Vi and Intel VT-d, | 7 | allotted. This includes x86 hardware with AMD-Vi and Intel VT-d, |
| @@ -7,14 +9,14 @@ POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC | |||
| 7 | systems such as Freescale PAMU. The VFIO driver is an IOMMU/device | 9 | systems such as Freescale PAMU. The VFIO driver is an IOMMU/device |
| 8 | agnostic framework for exposing direct device access to userspace, in | 10 | agnostic framework for exposing direct device access to userspace, in |
| 9 | a secure, IOMMU protected environment. In other words, this allows | 11 | a secure, IOMMU protected environment. In other words, this allows |
| 10 | safe[2], non-privileged, userspace drivers. | 12 | safe [2]_, non-privileged, userspace drivers. |
| 11 | 13 | ||
| 12 | Why do we want that? Virtual machines often make use of direct device | 14 | Why do we want that? Virtual machines often make use of direct device |
| 13 | access ("device assignment") when configured for the highest possible | 15 | access ("device assignment") when configured for the highest possible |
| 14 | I/O performance. From a device and host perspective, this simply | 16 | I/O performance. From a device and host perspective, this simply |
| 15 | turns the VM into a userspace driver, with the benefits of | 17 | turns the VM into a userspace driver, with the benefits of |
| 16 | significantly reduced latency, higher bandwidth, and direct use of | 18 | significantly reduced latency, higher bandwidth, and direct use of |
| 17 | bare-metal device drivers[3]. | 19 | bare-metal device drivers [3]_. |
| 18 | 20 | ||
| 19 | Some applications, particularly in the high performance computing | 21 | Some applications, particularly in the high performance computing |
| 20 | field, also benefit from low-overhead, direct device access from | 22 | field, also benefit from low-overhead, direct device access from |
| @@ -31,7 +33,7 @@ KVM PCI specific device assignment code as well as provide a more | |||
| 31 | secure, more featureful userspace driver environment than UIO. | 33 | secure, more featureful userspace driver environment than UIO. |
| 32 | 34 | ||
| 33 | Groups, Devices, and IOMMUs | 35 | Groups, Devices, and IOMMUs |
| 34 | ------------------------------------------------------------------------------- | 36 | --------------------------- |
| 35 | 37 | ||
| 36 | Devices are the main target of any I/O driver. Devices typically | 38 | Devices are the main target of any I/O driver. Devices typically |
| 37 | create a programming interface made up of I/O access, interrupts, | 39 | create a programming interface made up of I/O access, interrupts, |
| @@ -114,40 +116,40 @@ well as mechanisms for describing and registering interrupt | |||
| 114 | notifications. | 116 | notifications. |
| 115 | 117 | ||
| 116 | VFIO Usage Example | 118 | VFIO Usage Example |
| 117 | ------------------------------------------------------------------------------- | 119 | ------------------ |
| 118 | 120 | ||
| 119 | Assume user wants to access PCI device 0000:06:0d.0 | 121 | Assume user wants to access PCI device 0000:06:0d.0:: |
| 120 | 122 | ||
| 121 | $ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group | 123 | $ readlink /sys/bus/pci/devices/0000:06:0d.0/iommu_group |
| 122 | ../../../../kernel/iommu_groups/26 | 124 | ../../../../kernel/iommu_groups/26 |
| 123 | 125 | ||
| 124 | This device is therefore in IOMMU group 26. This device is on the | 126 | This device is therefore in IOMMU group 26. This device is on the |
| 125 | pci bus, therefore the user will make use of vfio-pci to manage the | 127 | pci bus, therefore the user will make use of vfio-pci to manage the |
| 126 | group: | 128 | group:: |
| 127 | 129 | ||
| 128 | # modprobe vfio-pci | 130 | # modprobe vfio-pci |
| 129 | 131 | ||
| 130 | Binding this device to the vfio-pci driver creates the VFIO group | 132 | Binding this device to the vfio-pci driver creates the VFIO group |
| 131 | character devices for this group: | 133 | character devices for this group:: |
| 132 | 134 | ||
| 133 | $ lspci -n -s 0000:06:0d.0 | 135 | $ lspci -n -s 0000:06:0d.0 |
| 134 | 06:0d.0 0401: 1102:0002 (rev 08) | 136 | 06:0d.0 0401: 1102:0002 (rev 08) |
| 135 | # echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind | 137 | # echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind |
| 136 | # echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id | 138 | # echo 1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id |
| 137 | 139 | ||
| 138 | Now we need to look at what other devices are in the group to free | 140 | Now we need to look at what other devices are in the group to free |
| 139 | it for use by VFIO: | 141 | it for use by VFIO:: |
| 140 | 142 | ||
| 141 | $ ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices | 143 | $ ls -l /sys/bus/pci/devices/0000:06:0d.0/iommu_group/devices |
| 142 | total 0 | 144 | total 0 |
| 143 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:00:1e.0 -> | 145 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:00:1e.0 -> |
| 144 | ../../../../devices/pci0000:00/0000:00:1e.0 | 146 | ../../../../devices/pci0000:00/0000:00:1e.0 |
| 145 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.0 -> | 147 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.0 -> |
| 146 | ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 | 148 | ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0 |
| 147 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.1 -> | 149 | lrwxrwxrwx. 1 root root 0 Apr 23 16:13 0000:06:0d.1 -> |
| 148 | ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 | 150 | ../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1 |
| 149 | 151 | ||
| 150 | This device is behind a PCIe-to-PCI bridge[4], therefore we also | 152 | This device is behind a PCIe-to-PCI bridge [4]_, therefore we also |
| 151 | need to add device 0000:06:0d.1 to the group following the same | 153 | need to add device 0000:06:0d.1 to the group following the same |
| 152 | procedure as above. Device 0000:00:1e.0 is a bridge that does | 154 | procedure as above. Device 0000:00:1e.0 is a bridge that does |
| 153 | not currently have a host driver, therefore it's not required to | 155 | not currently have a host driver, therefore it's not required to |
| @@ -157,12 +159,12 @@ support PCI bridges). | |||
| 157 | The final step is to provide the user with access to the group if | 159 | The final step is to provide the user with access to the group if |
| 158 | unprivileged operation is desired (note that /dev/vfio/vfio provides | 160 | unprivileged operation is desired (note that /dev/vfio/vfio provides |
| 159 | no capabilities on its own and is therefore expected to be set to | 161 | no capabilities on its own and is therefore expected to be set to |
| 160 | mode 0666 by the system). | 162 | mode 0666 by the system):: |
| 161 | 163 | ||
| 162 | # chown user:user /dev/vfio/26 | 164 | # chown user:user /dev/vfio/26 |
| 163 | 165 | ||
| 164 | The user now has full access to all the devices and the iommu for this | 166 | The user now has full access to all the devices and the iommu for this |
| 165 | group and can access them as follows: | 167 | group and can access them as follows:: |
| 166 | 168 | ||
| 167 | int container, group, device, i; | 169 | int container, group, device, i; |
| 168 | struct vfio_group_status group_status = | 170 | struct vfio_group_status group_status = |
| @@ -248,31 +250,31 @@ VFIO bus driver API | |||
| 248 | VFIO bus drivers, such as vfio-pci make use of only a few interfaces | 250 | VFIO bus drivers, such as vfio-pci make use of only a few interfaces |
| 249 | into VFIO core. When devices are bound and unbound to the driver, | 251 | into VFIO core. When devices are bound and unbound to the driver, |
| 250 | the driver should call vfio_add_group_dev() and vfio_del_group_dev() | 252 | the driver should call vfio_add_group_dev() and vfio_del_group_dev() |
| 251 | respectively: | 253 | respectively:: |
| 252 | 254 | ||
| 253 | extern int vfio_add_group_dev(struct iommu_group *iommu_group, | 255 | extern int vfio_add_group_dev(struct iommu_group *iommu_group, |
| 254 | struct device *dev, | 256 | struct device *dev, |
| 255 | const struct vfio_device_ops *ops, | 257 | const struct vfio_device_ops *ops, |
| 256 | void *device_data); | 258 | void *device_data); |
| 257 | 259 | ||
| 258 | extern void *vfio_del_group_dev(struct device *dev); | 260 | extern void *vfio_del_group_dev(struct device *dev); |
| 259 | 261 | ||
| 260 | vfio_add_group_dev() indicates to the core to begin tracking the | 262 | vfio_add_group_dev() indicates to the core to begin tracking the |
| 261 | specified iommu_group and register the specified dev as owned by | 263 | specified iommu_group and register the specified dev as owned by |
| 262 | a VFIO bus driver. The driver provides an ops structure for callbacks | 264 | a VFIO bus driver. The driver provides an ops structure for callbacks |
| 263 | similar to a file operations structure: | 265 | similar to a file operations structure:: |
| 264 | 266 | ||
| 265 | struct vfio_device_ops { | 267 | struct vfio_device_ops { |
| 266 | int (*open)(void *device_data); | 268 | int (*open)(void *device_data); |
| 267 | void (*release)(void *device_data); | 269 | void (*release)(void *device_data); |
| 268 | ssize_t (*read)(void *device_data, char __user *buf, | 270 | ssize_t (*read)(void *device_data, char __user *buf, |
| 269 | size_t count, loff_t *ppos); | 271 | size_t count, loff_t *ppos); |
| 270 | ssize_t (*write)(void *device_data, const char __user *buf, | 272 | ssize_t (*write)(void *device_data, const char __user *buf, |
| 271 | size_t size, loff_t *ppos); | 273 | size_t size, loff_t *ppos); |
| 272 | long (*ioctl)(void *device_data, unsigned int cmd, | 274 | long (*ioctl)(void *device_data, unsigned int cmd, |
| 273 | unsigned long arg); | 275 | unsigned long arg); |
| 274 | int (*mmap)(void *device_data, struct vm_area_struct *vma); | 276 | int (*mmap)(void *device_data, struct vm_area_struct *vma); |
| 275 | }; | 277 | }; |
| 276 | 278 | ||
| 277 | Each function is passed the device_data that was originally registered | 279 | Each function is passed the device_data that was originally registered |
| 278 | in the vfio_add_group_dev() call above. This allows the bus driver | 280 | in the vfio_add_group_dev() call above. This allows the bus driver |
| @@ -285,50 +287,55 @@ own VFIO_DEVICE_GET_REGION_INFO ioctl. | |||
| 285 | 287 | ||
| 286 | 288 | ||
| 287 | PPC64 sPAPR implementation note | 289 | PPC64 sPAPR implementation note |
| 288 | ------------------------------------------------------------------------------- | 290 | ------------------------------- |
| 289 | 291 | ||
| 290 | This implementation has some specifics: | 292 | This implementation has some specifics: |
| 291 | 293 | ||
| 292 | 1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per | 294 | 1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per |
| 293 | container is supported as an IOMMU table is allocated at the boot time, | 295 | container is supported as an IOMMU table is allocated at the boot time, |
| 294 | one table per a IOMMU group which is a Partitionable Endpoint (PE) | 296 | one table per a IOMMU group which is a Partitionable Endpoint (PE) |
| 295 | (PE is often a PCI domain but not always). | 297 | (PE is often a PCI domain but not always). |
| 296 | Newer systems (POWER8 with IODA2) have improved hardware design which allows | 298 | |
| 297 | to remove this limitation and have multiple IOMMU groups per a VFIO container. | 299 | Newer systems (POWER8 with IODA2) have improved hardware design which allows |
| 300 | to remove this limitation and have multiple IOMMU groups per a VFIO | ||
| 301 | container. | ||
| 298 | 302 | ||
| 299 | 2) The hardware supports so called DMA windows - the PCI address range | 303 | 2) The hardware supports so called DMA windows - the PCI address range |
| 300 | within which DMA transfer is allowed, any attempt to access address space | 304 | within which DMA transfer is allowed, any attempt to access address space |
| 301 | out of the window leads to the whole PE isolation. | 305 | out of the window leads to the whole PE isolation. |
| 302 | 306 | ||
| 303 | 3) PPC64 guests are paravirtualized but not fully emulated. There is an API | 307 | 3) PPC64 guests are paravirtualized but not fully emulated. There is an API |
| 304 | to map/unmap pages for DMA, and it normally maps 1..32 pages per call and | 308 | to map/unmap pages for DMA, and it normally maps 1..32 pages per call and |
| 305 | currently there is no way to reduce the number of calls. In order to make things | 309 | currently there is no way to reduce the number of calls. In order to make |
| 306 | faster, the map/unmap handling has been implemented in real mode which provides | 310 | things faster, the map/unmap handling has been implemented in real mode |
| 307 | an excellent performance which has limitations such as inability to do | 311 | which provides an excellent performance which has limitations such as |
| 308 | locked pages accounting in real time. | 312 | inability to do locked pages accounting in real time. |
| 309 | 313 | ||
| 310 | 4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O | 314 | 4) According to sPAPR specification, A Partitionable Endpoint (PE) is an I/O |
| 311 | subtree that can be treated as a unit for the purposes of partitioning and | 315 | subtree that can be treated as a unit for the purposes of partitioning and |
| 312 | error recovery. A PE may be a single or multi-function IOA (IO Adapter), a | 316 | error recovery. A PE may be a single or multi-function IOA (IO Adapter), a |
| 313 | function of a multi-function IOA, or multiple IOAs (possibly including switch | 317 | function of a multi-function IOA, or multiple IOAs (possibly including |
| 314 | and bridge structures above the multiple IOAs). PPC64 guests detect PCI errors | 318 | switch and bridge structures above the multiple IOAs). PPC64 guests detect |
| 315 | and recover from them via EEH RTAS services, which works on the basis of | 319 | PCI errors and recover from them via EEH RTAS services, which works on the |
| 316 | additional ioctl commands. | 320 | basis of additional ioctl commands. |
| 317 | 321 | ||
| 318 | So 4 additional ioctls have been added: | 322 | So 4 additional ioctls have been added: |
| 319 | 323 | ||
| 320 | VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start | 324 | VFIO_IOMMU_SPAPR_TCE_GET_INFO |
| 321 | of the DMA window on the PCI bus. | 325 | returns the size and the start of the DMA window on the PCI bus. |
| 322 | 326 | ||
| 323 | VFIO_IOMMU_ENABLE - enables the container. The locked pages accounting | 327 | VFIO_IOMMU_ENABLE |
| 328 | enables the container. The locked pages accounting | ||
| 324 | is done at this point. This lets user first to know what | 329 | is done at this point. This lets user first to know what |
| 325 | the DMA window is and adjust rlimit before doing any real job. | 330 | the DMA window is and adjust rlimit before doing any real job. |
| 326 | 331 | ||
| 327 | VFIO_IOMMU_DISABLE - disables the container. | 332 | VFIO_IOMMU_DISABLE |
| 333 | disables the container. | ||
| 328 | 334 | ||
| 329 | VFIO_EEH_PE_OP - provides an API for EEH setup, error detection and recovery. | 335 | VFIO_EEH_PE_OP |
| 336 | provides an API for EEH setup, error detection and recovery. | ||
| 330 | 337 | ||
| 331 | The code flow from the example above should be slightly changed: | 338 | The code flow from the example above should be slightly changed:: |
| 332 | 339 | ||
| 333 | struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op), .flags = 0 }; | 340 | struct vfio_eeh_pe_op pe_op = { .argsz = sizeof(pe_op), .flags = 0 }; |
| 334 | 341 | ||
| @@ -442,73 +449,73 @@ The code flow from the example above should be slightly changed: | |||
| 442 | .... | 449 | .... |
| 443 | 450 | ||
| 444 | 5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/ | 451 | 5) There is v2 of SPAPR TCE IOMMU. It deprecates VFIO_IOMMU_ENABLE/ |
| 445 | VFIO_IOMMU_DISABLE and implements 2 new ioctls: | 452 | VFIO_IOMMU_DISABLE and implements 2 new ioctls: |
| 446 | VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY | 453 | VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY |
| 447 | (which are unsupported in v1 IOMMU). | 454 | (which are unsupported in v1 IOMMU). |
| 448 | 455 | ||
| 449 | PPC64 paravirtualized guests generate a lot of map/unmap requests, | 456 | PPC64 paravirtualized guests generate a lot of map/unmap requests, |
| 450 | and the handling of those includes pinning/unpinning pages and updating | 457 | and the handling of those includes pinning/unpinning pages and updating |
| 451 | mm::locked_vm counter to make sure we do not exceed the rlimit. | 458 | mm::locked_vm counter to make sure we do not exceed the rlimit. |
| 452 | The v2 IOMMU splits accounting and pinning into separate operations: | 459 | The v2 IOMMU splits accounting and pinning into separate operations: |
| 453 | 460 | ||
| 454 | - VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls | 461 | - VFIO_IOMMU_SPAPR_REGISTER_MEMORY/VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY ioctls |
| 455 | receive a user space address and size of the block to be pinned. | 462 | receive a user space address and size of the block to be pinned. |
| 456 | Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to | 463 | Bisecting is not supported and VFIO_IOMMU_UNREGISTER_MEMORY is expected to |
| 457 | be called with the exact address and size used for registering | 464 | be called with the exact address and size used for registering |
| 458 | the memory block. The userspace is not expected to call these often. | 465 | the memory block. The userspace is not expected to call these often. |
| 459 | The ranges are stored in a linked list in a VFIO container. | 466 | The ranges are stored in a linked list in a VFIO container. |
| 460 | 467 | ||
| 461 | - VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual | 468 | - VFIO_IOMMU_MAP_DMA/VFIO_IOMMU_UNMAP_DMA ioctls only update the actual |
| 462 | IOMMU table and do not do pinning; instead these check that the userspace | 469 | IOMMU table and do not do pinning; instead these check that the userspace |
| 463 | address is from pre-registered range. | 470 | address is from pre-registered range. |
| 464 | 471 | ||
| 465 | This separation helps in optimizing DMA for guests. | 472 | This separation helps in optimizing DMA for guests. |
| 466 | 473 | ||
| 467 | 6) sPAPR specification allows guests to have an additional DMA window(s) on | 474 | 6) sPAPR specification allows guests to have an additional DMA window(s) on |
| 468 | a PCI bus with a variable page size. Two ioctls have been added to support | 475 | a PCI bus with a variable page size. Two ioctls have been added to support |
| 469 | this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE. | 476 | this: VFIO_IOMMU_SPAPR_TCE_CREATE and VFIO_IOMMU_SPAPR_TCE_REMOVE. |
| 470 | The platform has to support the functionality or error will be returned to | 477 | The platform has to support the functionality or error will be returned to |
| 471 | the userspace. The existing hardware supports up to 2 DMA windows, one is | 478 | the userspace. The existing hardware supports up to 2 DMA windows, one is |
| 472 | 2GB long, uses 4K pages and called "default 32bit window"; the other can | 479 | 2GB long, uses 4K pages and called "default 32bit window"; the other can |
| 473 | be as big as entire RAM, use different page size, it is optional - guests | 480 | be as big as entire RAM, use different page size, it is optional - guests |
| 474 | create those in run-time if the guest driver supports 64bit DMA. | 481 | create those in run-time if the guest driver supports 64bit DMA. |
| 475 | 482 | ||
| 476 | VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and | 483 | VFIO_IOMMU_SPAPR_TCE_CREATE receives a page shift, a DMA window size and |
| 477 | a number of TCE table levels (if a TCE table is going to be big enough and | 484 | a number of TCE table levels (if a TCE table is going to be big enough and |
| 478 | the kernel may not be able to allocate enough of physically contiguous memory). | 485 | the kernel may not be able to allocate enough of physically contiguous |
| 479 | It creates a new window in the available slot and returns the bus address where | 486 | memory). It creates a new window in the available slot and returns the bus |
| 480 | the new window starts. Due to hardware limitation, the user space cannot choose | 487 | address where the new window starts. Due to hardware limitation, the user |
| 481 | the location of DMA windows. | 488 | space cannot choose the location of DMA windows. |
| 482 | 489 | ||
| 483 | VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window | 490 | VFIO_IOMMU_SPAPR_TCE_REMOVE receives the bus start address of the window |
| 484 | and removes it. | 491 | and removes it. |
| 485 | 492 | ||
| 486 | ------------------------------------------------------------------------------- | 493 | ------------------------------------------------------------------------------- |
| 487 | 494 | ||
| 488 | [1] VFIO was originally an acronym for "Virtual Function I/O" in its | 495 | .. [1] VFIO was originally an acronym for "Virtual Function I/O" in its |
| 489 | initial implementation by Tom Lyon while as Cisco. We've since | 496 | initial implementation by Tom Lyon while as Cisco. We've since |
| 490 | outgrown the acronym, but it's catchy. | 497 | outgrown the acronym, but it's catchy. |
| 491 | 498 | ||
| 492 | [2] "safe" also depends upon a device being "well behaved". It's | 499 | .. [2] "safe" also depends upon a device being "well behaved". It's |
| 493 | possible for multi-function devices to have backdoors between | 500 | possible for multi-function devices to have backdoors between |
| 494 | functions and even for single function devices to have alternative | 501 | functions and even for single function devices to have alternative |
| 495 | access to things like PCI config space through MMIO registers. To | 502 | access to things like PCI config space through MMIO registers. To |
| 496 | guard against the former we can include additional precautions in the | 503 | guard against the former we can include additional precautions in the |
| 497 | IOMMU driver to group multi-function PCI devices together | 504 | IOMMU driver to group multi-function PCI devices together |
| 498 | (iommu=group_mf). The latter we can't prevent, but the IOMMU should | 505 | (iommu=group_mf). The latter we can't prevent, but the IOMMU should |
| 499 | still provide isolation. For PCI, SR-IOV Virtual Functions are the | 506 | still provide isolation. For PCI, SR-IOV Virtual Functions are the |
| 500 | best indicator of "well behaved", as these are designed for | 507 | best indicator of "well behaved", as these are designed for |
| 501 | virtualization usage models. | 508 | virtualization usage models. |
| 502 | 509 | ||
| 503 | [3] As always there are trade-offs to virtual machine device | 510 | .. [3] As always there are trade-offs to virtual machine device |
| 504 | assignment that are beyond the scope of VFIO. It's expected that | 511 | assignment that are beyond the scope of VFIO. It's expected that |
| 505 | future IOMMU technologies will reduce some, but maybe not all, of | 512 | future IOMMU technologies will reduce some, but maybe not all, of |
| 506 | these trade-offs. | 513 | these trade-offs. |
| 507 | 514 | ||
| 508 | [4] In this case the device is below a PCI bridge, so transactions | 515 | .. [4] In this case the device is below a PCI bridge, so transactions |
| 509 | from either function of the device are indistinguishable to the iommu: | 516 | from either function of the device are indistinguishable to the iommu:: |
| 510 | 517 | ||
| 511 | -[0000:00]-+-1e.0-[06]--+-0d.0 | 518 | -[0000:00]-+-1e.0-[06]--+-0d.0 |
| 512 | \-0d.1 | 519 | \-0d.1 |
| 513 | 520 | ||
| 514 | 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90) | 521 | 00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 90) |
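The usage example in this patch finds the device's group by reading the `iommu_group` symlink (`../../../../kernel/iommu_groups/26`) and then works with `/dev/vfio/26`. Below is a minimal C sketch of that mapping step. `vfio_group_dev_path()` is a hypothetical helper, not part of any VFIO API. It assumes the caller has already obtained the link target, for example with readlink(2).

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: map the target of a sysfs iommu_group symlink,
 * e.g. "../../../../kernel/iommu_groups/26", to the VFIO group character
 * device path, e.g. "/dev/vfio/26".
 * Returns 0 on success, or -1 if the last path component is not a decimal
 * group number or the output buffer is too small.
 */
static int vfio_group_dev_path(const char *link, char *out, size_t outlen)
{
    const char *slash = strrchr(link, '/');
    const char *num = slash ? slash + 1 : link;   /* last path component */
    char *end;
    long group;
    int n;

    if (!isdigit((unsigned char)*num))
        return -1;
    group = strtol(num, &end, 10);
    if (*end != '\0')                             /* trailing junk */
        return -1;

    n = snprintf(out, outlen, "/dev/vfio/%ld", group);
    if (n < 0 || (size_t)n >= outlen)             /* error or truncated */
        return -1;
    return 0;
}
```

readlink(2) does not NUL-terminate its result. The caller must therefore terminate the buffer before passing it to a helper like this one.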
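Item 2 of the sPAPR note says that any access outside the DMA window isolates the whole PE. A userspace driver may therefore want to check a mapping against the window reported by VFIO_IOMMU_SPAPR_TCE_GET_INFO before issuing it. The sketch below is a hypothetical helper. In practice the bounds would come from the `dma32_window_start` and `dma32_window_size` fields of `struct vfio_iommu_spapr_tce_info` in `<linux/vfio.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the DMA range [iova, iova + size)
 * lies entirely inside the window [win_start, win_start + win_size).
 * Written so that no intermediate sum can overflow uint64_t.
 */
static bool spapr_range_in_window(uint64_t iova, uint64_t size,
                                  uint64_t win_start, uint64_t win_size)
{
    uint64_t off;

    if (size == 0 || iova < win_start)
        return false;
    off = iova - win_start;            /* offset of the range in the window */
    if (off >= win_size)
        return false;
    return size <= win_size - off;     /* room left from off to window end */
}
```

A caller would reject any mapping request for which this returns false, rather than let the hardware isolate the PE.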
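Item 6 says VFIO_IOMMU_SPAPR_TCE_CREATE accepts a number of TCE table levels because a large window may need more physically contiguous memory than the kernel can allocate. The arithmetic behind that is sketched below, assuming one 8-byte TCE per IOMMU page (an assumption of this sketch, as are the helper names).

- The default 32-bit window of 2GB with 4K pages needs 2^31 / 2^12 = 524288 entries, or 4 MiB as a flat table.
- A 64GB window with 4K pages would need 2^24 entries, or 128 MiB.

```c
#include <stdint.h>

/* Assumption of this sketch: each TCE is one 64-bit entry per IOMMU page. */
#define TCE_ENTRY_BYTES 8u

/* Number of TCEs needed to cover win_size bytes with IOMMU pages of
 * (1 << page_shift) bytes; a trailing partial page still needs an entry.
 * page_shift must be less than 64. */
static uint64_t tce_entries(uint64_t win_size, unsigned int page_shift)
{
    uint64_t page_mask = (UINT64_C(1) << page_shift) - 1;

    return (win_size >> page_shift) + ((win_size & page_mask) != 0);
}

/* Bytes needed for a flat, single-level TCE table covering the window;
 * this is the amount that would have to be physically contiguous. */
static uint64_t tce_table_bytes(uint64_t win_size, unsigned int page_shift)
{
    return tce_entries(win_size, page_shift) * TCE_ENTRY_BYTES;
}
```

Larger IOMMU pages shrink the table proportionally. A 64GB window with 64K pages needs only 2^20 entries.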
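Item 5 describes the v2 sPAPR IOMMU bookkeeping in three parts:

- Pre-registered blocks are kept in a linked list.
- VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY must be called with the exact address and size that were registered.
- VFIO_IOMMU_MAP_DMA only checks that the userspace address falls in a registered range.

Below is a simplified userspace model of that bookkeeping, not the kernel implementation. All names are hypothetical, and pinning and locked_vm accounting are reduced to a comment. Rejecting overlapping registrations is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of one pre-registered userspace block. */
struct mem_region {
    uint64_t vaddr;              /* userspace address of the block */
    uint64_t size;               /* length in bytes */
    struct mem_region *next;     /* ranges are kept in a linked list */
};

/* REGISTER_MEMORY analogue. Per item 5, the real ioctl is where pages are
 * pinned and locked_vm is charged; this model only records the range.
 * Overlapping registrations are rejected (an assumption of the model). */
static int region_register(struct mem_region **head,
                           uint64_t vaddr, uint64_t size)
{
    struct mem_region *r;

    if (size == 0)
        return -1;
    for (r = *head; r; r = r->next)
        if (vaddr < r->vaddr + r->size && r->vaddr < vaddr + size)
            return -1;
    r = malloc(sizeof(*r));
    if (!r)
        return -1;
    r->vaddr = vaddr;
    r->size = size;
    r->next = *head;
    *head = r;
    return 0;
}

/* UNREGISTER_MEMORY analogue: no bisecting, so only an exact
 * (vaddr, size) match removes a block. */
static int region_unregister(struct mem_region **head,
                             uint64_t vaddr, uint64_t size)
{
    struct mem_region **pp;

    for (pp = head; *pp; pp = &(*pp)->next) {
        struct mem_region *r = *pp;

        if (r->vaddr == vaddr && r->size == size) {
            *pp = r->next;
            free(r);
            return 0;
        }
    }
    return -1;
}

/* MAP_DMA analogue: no pinning, only a check that [vaddr, vaddr + size)
 * lies inside some pre-registered block. */
static bool region_covers(const struct mem_region *head,
                          uint64_t vaddr, uint64_t size)
{
    for (; head; head = head->next)
        if (vaddr >= head->vaddr && size <= head->size &&
            vaddr - head->vaddr <= head->size - size)
            return true;
    return false;
}
```

Because map and unmap only walk the list, the expensive pinning work is paid once per registered block rather than once per guest map/unmap request. This is the separation item 5 credits for faster DMA handling.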
