author | Alex Williamson <alex.williamson@redhat.com> | 2013-06-21 11:38:02 -0400 |
---|---|---|
committer | Alex Williamson <alex.williamson@redhat.com> | 2013-06-21 11:38:02 -0400 |
commit | 166fd7d94afdac040b28c473e45241820ca522a2 (patch) | |
tree | 044cd4540cb2a949ed8a55949cc39471b05a73b3 /include/uapi | |
parent | cd9b22685e4ccd728550d51fbe108c473f89df4f (diff) |
vfio: hugepage support for vfio_iommu_type1

We currently send all mappings to the iommu in PAGE_SIZE chunks,
which prevents the iommu from enabling support for larger page sizes.
We still need to pin pages, which means we step through them in
PAGE_SIZE chunks, but we can batch up contiguous physical memory
chunks to allow the iommu the opportunity to use larger pages. The
approach here is a bit different than the one currently used for
legacy KVM device assignment. Rather than looking at the vma page
size and using that as the maximum size to pass to the iommu, we
instead simply look at whether the next page is physically
contiguous. This means we might ask the iommu to map a 4MB region,
while legacy KVM might limit itself to a maximum of 2MB.
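As a rough illustration of the batching idea described above, the following is a minimal userspace sketch, not the kernel implementation. The names `map_fn`, `run_log`, `record_run` and `map_contiguous_runs` are hypothetical; the sketch assumes the pages have already been pinned and that their page frame numbers are available in an array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for one mapping request to the iommu. */
typedef void (*map_fn)(uint64_t iova, uint64_t pfn, size_t npages, void *ctx);

/* Hypothetical recorder so the runs handed to the callback can be inspected. */
struct run_log {
    size_t count;
    uint64_t iova[8];
    size_t npages[8];
};

static void record_run(uint64_t iova, uint64_t pfn, size_t npages, void *ctx)
{
    struct run_log *log = ctx;

    (void)pfn;
    if (log->count < 8) {
        log->iova[log->count] = iova;
        log->npages[log->count] = npages;
    }
    log->count++;
}

/*
 * Step through the pfns one page at a time and pass each physically
 * contiguous run to map() as a single request.  Returns the number of
 * map() calls made.
 */
static size_t map_contiguous_runs(uint64_t iova, const uint64_t *pfns,
                                  size_t npages, size_t page_size,
                                  map_fn map, void *ctx)
{
    size_t calls = 0;
    size_t start = 0;

    while (start < npages) {
        size_t len = 1;

        /* Extend the run while the next page is physically adjacent. */
        while (start + len < npages &&
               pfns[start + len] == pfns[start] + len)
            len++;

        map(iova + (uint64_t)start * page_size, pfns[start], len, ctx);
        calls++;
        start += len;
    }
    return calls;
}
```

With pfns {100, 101, 102, 103, 200, 201, 50}, the sketch issues three requests of 4, 2 and 1 pages. Note that the run length is bounded only by physical contiguity, not by the vma page size.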
Splitting our mapping path also allows us to be smarter about locked
memory because we can more easily unwind if the user attempts to
exceed the limit. Therefore, rather than assuming that a mapping
will result in locked memory, we test each page as it is pinned to
determine whether it locks RAM vs an mmap'd MMIO region. This should
result in better locking granularity and fewer locked-page fudge
factors in userspace.
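The per-page accounting described above might be modelled roughly as follows. This is a simplified sketch under stated assumptions: `page_desc` and `pin_and_account` are hypothetical names, and the all-or-nothing unwind on hitting the limit is a simplification that may differ from what the kernel code does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page descriptor; is_ram == false models mmap'd MMIO. */
struct page_desc {
    bool is_ram;
};

/*
 * Test each page as it is "pinned" and charge only RAM-backed pages
 * against the locked memory limit.  If the limit would be exceeded,
 * nothing is committed to *locked and -1 is returned.  Otherwise the
 * number of pages charged is returned.
 */
static long pin_and_account(const struct page_desc *pages, size_t npages,
                            size_t *locked, size_t limit)
{
    size_t charged = 0;

    for (size_t i = 0; i < npages; i++) {
        if (!pages[i].is_ram)
            continue;   /* MMIO does not lock RAM, so it is not charged */

        if (*locked + charged + 1 > limit)
            return -1;  /* unwind: charges from this call are dropped */

        charged++;
    }

    *locked += charged;
    return (long)charged;
}
```

Because MMIO pages are never charged, a caller mapping a device BAR would not need to inflate its locked memory limit to cover it.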
The unmap path uses the same algorithm as legacy KVM. We don't want
to track the pfn for each mapping ourselves, but we need the pfn in
order to unpin pages. We therefore ask the iommu for the iova to
physical address translation, ask it to unpin a page, and see how many
pages were actually unpinned. iommus supporting large pages will
often return something bigger than a page here, which we know will be
physically contiguous and we can unpin a batch of pfns. iommus not
supporting large mappings won't see an improvement in batching here as
they only unmap a page at a time.
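The unmap loop described above can be sketched against a toy model. Everything here (`toy_iommu`, `toy_iova_to_phys`, `toy_unmap_page`, `unmap_range`) is a hypothetical stand-in for illustration, not the kernel's iommu API, and the model assumes lookups always land on the start of a mapping.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Toy stand-in for an iommu domain: each entry is one mapping. */
struct toy_entry {
    uint64_t iova;  /* start of the mapping in IO virtual space */
    uint64_t phys;  /* physically contiguous backing address */
    uint64_t size;  /* bytes; more than a page models a large mapping */
    int valid;
};

struct toy_iommu {
    struct toy_entry entries[8];
    size_t nr_entries;
    size_t unmap_calls;
};

/* Assumption: lookups only ever hit the start of an entry. */
static struct toy_entry *toy_find(struct toy_iommu *m, uint64_t iova)
{
    for (size_t i = 0; i < m->nr_entries; i++)
        if (m->entries[i].valid && m->entries[i].iova == iova)
            return &m->entries[i];
    return NULL;
}

/* Returns 0 and fills *phys if iova is mapped, -1 otherwise. */
static int toy_iova_to_phys(struct toy_iommu *m, uint64_t iova, uint64_t *phys)
{
    struct toy_entry *e = toy_find(m, iova);

    if (!e)
        return -1;
    *phys = e->phys;
    return 0;
}

/* Asked to unmap one page; tears down the whole entry, returns its size. */
static uint64_t toy_unmap_page(struct toy_iommu *m, uint64_t iova)
{
    struct toy_entry *e = toy_find(m, iova);

    m->unmap_calls++;
    if (!e)
        return 0;
    e->valid = 0;
    return e->size;
}

/* Returns the bytes actually unmapped; *unpinned counts released pages. */
static uint64_t unmap_range(struct toy_iommu *m, uint64_t iova,
                            uint64_t size, size_t *unpinned)
{
    uint64_t done = 0;

    while (done < size) {
        uint64_t phys, got;

        if (toy_iova_to_phys(m, iova + done, &phys) < 0) {
            done += TOY_PAGE_SIZE;  /* nothing mapped here, skip a page */
            continue;
        }

        got = toy_unmap_page(m, iova + done);
        if (!got)
            break;

        /* [phys, phys + got) is contiguous: release its pfns as a batch. */
        (void)phys;
        *unpinned += got / TOY_PAGE_SIZE;
        done += got;
    }
    return done;
}
```

In this model, requesting a 4KB unmap at the start of a 2MB entry returns 2MB, which is the "larger than requested" case the API clarification covers.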
With this change, we also make a clarification to the API for mapping
and unmapping DMA. We can only guarantee unmaps at the same
granularity as used for the original mapping. In other words,
unmapping a subregion of a previous mapping is not guaranteed and may
result in a larger or smaller unmapping than requested. The size
field in the unmapping structure is updated to reflect this.
Previously this was unmodified on mapping, always returning the
requested unmap size. This is now updated to return the actual unmap
size on success, allowing userspace to appropriately track mappings.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Diffstat (limited to 'include/uapi')
-rw-r--r-- | include/uapi/linux/vfio.h | 8 |
1 files changed, 6 insertions, 2 deletions
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 284ff2436829..513600612995 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -361,10 +361,14 @@ struct vfio_iommu_type1_dma_map {
 #define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
 
 /**
- * VFIO_IOMMU_UNMAP_DMA - _IOW(VFIO_TYPE, VFIO_BASE + 14, struct vfio_dma_unmap)
+ * VFIO_IOMMU_UNMAP_DMA - _IOWR(VFIO_TYPE, VFIO_BASE + 14,
+ * struct vfio_dma_unmap)
  *
  * Unmap IO virtual addresses using the provided struct vfio_dma_unmap.
- * Caller sets argsz.
+ * Caller sets argsz. The actual unmapped size is returned in the size
+ * field. No guarantee is made to the user that arbitrary unmaps of iova
+ * or size different from those used in the original mapping call will
+ * succeed.
  */
 struct vfio_iommu_type1_dma_unmap {
 __u32 argsz;