Diffstat (limited to 'Documentation/filesystems/xip.txt')
-rw-r--r-- | Documentation/filesystems/xip.txt | 67 |
1 files changed, 67 insertions, 0 deletions
diff --git a/Documentation/filesystems/xip.txt b/Documentation/filesystems/xip.txt
new file mode 100644
index 000000000000..6c0cef10eb4d
--- /dev/null
+++ b/Documentation/filesystems/xip.txt
@@ -0,0 +1,67 @@
1 | Execute-in-place for file mappings | ||
2 | ---------------------------------- | ||
3 | |||
4 | Motivation | ||
5 | ---------- | ||
6 | File mappings are performed by mapping page cache pages to userspace. In | ||
7 | addition, read- and write-type file operations also transfer data from/to the | ||
8 | page cache. | ||
9 | |||
10 | For memory backed storage devices that use the block device interface, the page | ||
11 | cache pages are in fact copies of the original storage. Various approaches | ||
12 | exist to work around the need for an extra copy. The ramdisk driver, for | ||
13 | example, reads the data into the page cache, keeps a reference, and later | ||
14 | discards the original data. | ||
15 | |||
16 | Execute-in-place solves this issue the other way around: instead of keeping | ||
17 | data in the page cache, the need to have a page cache copy is eliminated | ||
18 | completely. With execute-in-place, read- and write-type operations are | ||
19 | performed directly from/to the memory backed storage device. For file | ||
20 | mappings, the storage device itself is mapped directly into userspace. | ||
21 | |||
22 | This implementation was initially written for shared memory segments between | ||
23 | different virtual machines on s390 hardware to allow multiple machines to | ||
24 | share the same binaries and libraries. | ||
25 | |||
26 | Implementation | ||
27 | -------------- | ||
28 | Execute-in-place is implemented in three steps: block device operation, | ||
29 | address space operation, and file operations. | ||
30 | |||
31 | A block device operation named direct_access is used to retrieve a | ||
32 | reference (pointer) to a block on disk. The reference is supposed to be a | ||
33 | CPU-addressable physical address and to remain valid until the release | ||
34 | operation is performed. A struct block_device reference is used to address the | ||
35 | device, and a sector_t argument is used to identify the individual block. As an | ||
36 | alternative, memory technology devices can be used for this. | ||
37 | |||
38 | The block device operation is optional. As of today, the following block | ||
39 | devices support it: | ||
40 | - dcssblk: s390 dcss block device driver | ||
41 | |||
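To make the contract of direct_access concrete, the following is a minimal
userspace sketch, not kernel code. Every name in it (model_bdev,
model_sector_t, model_direct_access, the sector size and count) is a
hypothetical stand-in chosen for illustration, and the return convention is
an assumption. The point it shows is that the lookup hands back an address
inside the storage itself and copies nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_SECTOR_SIZE 512u
#define MODEL_NR_SECTORS  8u

typedef uint64_t model_sector_t;

/* Stand-in for a memory backed block device: the storage is plain memory. */
struct model_bdev {
	unsigned char storage[MODEL_NR_SECTORS * MODEL_SECTOR_SIZE];
};

/*
 * Model of a direct_access-style lookup. On success, store the address of
 * the requested sector in *addr and return 0. Return -1 (an assumed error
 * convention) if the sector lies outside the device. No data is copied.
 */
int model_direct_access(struct model_bdev *bdev, model_sector_t sector,
			unsigned char **addr)
{
	assert(bdev != NULL && addr != NULL);

	if (sector >= MODEL_NR_SECTORS)
		return -1;

	*addr = &bdev->storage[sector * MODEL_SECTOR_SIZE];
	return 0;
}
```

A caller that writes through the returned pointer modifies the device
contents directly, which is what makes an intermediate page cache copy
unnecessary in this model.
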
42 | An address space operation named get_xip_page is used to retrieve a reference | ||
43 | to a struct page. To address the target page, a reference to an address_space | ||
44 | and a sector number are provided. A third argument indicates whether the | ||
45 | function should allocate blocks if needed. | ||
46 | |||
47 | This address space operation is mutually exclusive with readpage and | ||
48 | writepage, which perform page cache read/write operations. | ||
49 | The following filesystems support it as of today: | ||
50 | - ext2: the second extended filesystem, see Documentation/filesystems/ext2.txt | ||
51 | |||
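The three arguments of get_xip_page described above can be modelled in
userspace as follows. This is only a sketch under stated assumptions: the
types model_mapping and model_page, the page and sector sizes, the
per-page "allocated" flag, and returning NULL for a missing block are all
hypothetical and do not claim to match the kernel's definitions or its
error reporting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MODEL_PAGE_SIZE        4096u
#define MODEL_NR_PAGES         4u
#define MODEL_SECTORS_PER_PAGE (MODEL_PAGE_SIZE / 512u)

typedef uint64_t model_sector_t;

/* Stand-in for struct page: it only records where the data lives. */
struct model_page {
	unsigned char *data;
};

/* Stand-in for an address_space on top of memory backed storage. */
struct model_mapping {
	unsigned char storage[MODEL_NR_PAGES][MODEL_PAGE_SIZE];
	int allocated[MODEL_NR_PAGES];	/* 0 = hole, 1 = block present */
	struct model_page pages[MODEL_NR_PAGES];
};

/*
 * Model of a get_xip_page-style lookup. Returns the page that covers
 * 'sector'. If the block is not allocated, it is allocated only when
 * 'create' is nonzero; otherwise NULL is returned (assumed convention).
 */
struct model_page *model_get_xip_page(struct model_mapping *m,
				      model_sector_t sector, int create)
{
	model_sector_t idx = sector / MODEL_SECTORS_PER_PAGE;

	assert(m != NULL);

	if (idx >= MODEL_NR_PAGES)
		return NULL;

	if (!m->allocated[idx]) {
		if (!create)
			return NULL;
		m->allocated[idx] = 1;
	}

	/* The page refers to the storage itself, not to a copy of it. */
	m->pages[idx].data = m->storage[idx];
	return &m->pages[idx];
}
```

In this model a read path would pass create = 0 and treat a missing block
as a hole, while a write path would pass create = 1.
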
52 | A set of file operations that use get_xip_page can be found in | ||
53 | mm/filemap_xip.c. The following file operation implementations are provided: | ||
54 | - aio_read/aio_write | ||
55 | - readv/writev | ||
56 | - sendfile | ||
57 | |||
58 | The generic file operations do_sync_read/do_sync_write can be used to implement | ||
59 | classic synchronous IO calls. | ||
60 | |||
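As a rough illustration of what a read operation amounts to when no page
cache copy exists, the sketch below copies straight from the memory backed
storage into the caller's buffer. It is a userspace model only:
model_xip_file and model_xip_read are hypothetical names, and the real
implementations in mm/filemap_xip.c are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_xip_file {
	const unsigned char *storage;	/* memory backed device contents */
	size_t size;			/* file length in bytes */
};

/*
 * Copy up to 'len' bytes starting at offset 'pos' directly from the
 * storage into 'buf'. Returns the number of bytes copied, which is 0 at
 * or beyond end of file and is shortened when the request crosses it.
 */
size_t model_xip_read(const struct model_xip_file *f, size_t pos,
		      void *buf, size_t len)
{
	assert(f != NULL && buf != NULL);

	if (pos >= f->size)
		return 0;
	if (len > f->size - pos)
		len = f->size - pos;

	memcpy(buf, f->storage + pos, len);
	return len;
}
```

The only copy in this model is the one into the caller's buffer; there is
no intermediate buffer between the storage and the reader.
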
61 | Shortcomings | ||
62 | ------------ | ||
63 | This implementation is limited to storage devices that are CPU-addressable at | ||
64 | all times (no highmem or such). It works well on ROM/RAM, but enhancements are | ||
65 | needed to make it work with flash in read+write mode. | ||
66 | Putting the Linux kernel and/or its modules on a xip filesystem does not | ||
67 | prevent them from being copied. | ||