diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/00-INDEX | 2 | ||||
-rw-r--r-- | Documentation/filesystems/9p.txt | 18 | ||||
-rw-r--r-- | Documentation/filesystems/ceph.txt | 140 | ||||
-rw-r--r-- | Documentation/filesystems/tmpfs.txt | 6 |
4 files changed, 163 insertions, 3 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 3bae418c6ad3..4303614b5add 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX | |||
@@ -16,6 +16,8 @@ befs.txt | |||
16 | - information about the BeOS filesystem for Linux. | 16 | - information about the BeOS filesystem for Linux. |
17 | bfs.txt | 17 | bfs.txt |
18 | - info for the SCO UnixWare Boot Filesystem (BFS). | 18 | - info for the SCO UnixWare Boot Filesystem (BFS). |
19 | ceph.txt | ||
20 | - info for the Ceph Distributed File System | ||
19 | cifs.txt | 21 | cifs.txt |
20 | - description of the CIFS filesystem. | 22 | - description of the CIFS filesystem. |
21 | coda.txt | 23 | coda.txt |
diff --git a/Documentation/filesystems/9p.txt b/Documentation/filesystems/9p.txt index 57e0b80a5274..c0236e753bc8 100644 --- a/Documentation/filesystems/9p.txt +++ b/Documentation/filesystems/9p.txt | |||
@@ -37,6 +37,15 @@ For Plan 9 From User Space applications (http://swtch.com/plan9) | |||
37 | 37 | ||
38 | mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER | 38 | mount -t 9p `namespace`/acme /mnt/9 -o trans=unix,uname=$USER |
39 | 39 | ||
40 | For server running on QEMU host with virtio transport: | ||
41 | |||
42 | mount -t 9p -o trans=virtio <mount_tag> /mnt/9 | ||
43 | |||
44 | where mount_tag is the tag associated by the server to each of the exported | ||
45 | mount points. Each 9P export is seen by the client as a virtio device with an | ||
46 | associated "mount_tag" property. Available mount tags can be | ||
47 | seen by reading /sys/bus/virtio/drivers/9pnet_virtio/virtio<n>/mount_tag files. | ||
48 | |||
40 | OPTIONS | 49 | OPTIONS |
41 | ======= | 50 | ======= |
42 | 51 | ||
@@ -47,7 +56,7 @@ OPTIONS | |||
47 | fd - used passed file descriptors for connection | 56 | fd - used passed file descriptors for connection |
48 | (see rfdno and wfdno) | 57 | (see rfdno and wfdno) |
49 | virtio - connect to the next virtio channel available | 58 | virtio - connect to the next virtio channel available |
50 | (from lguest or KVM with trans_virtio module) | 59 | (from QEMU with trans_virtio module) |
51 | rdma - connect to a specified RDMA channel | 60 | rdma - connect to a specified RDMA channel |
52 | 61 | ||
53 | uname=name user name to attempt mount as on the remote server. The | 62 | uname=name user name to attempt mount as on the remote server. The |
@@ -85,7 +94,12 @@ OPTIONS | |||
85 | 94 | ||
86 | port=n port to connect to on the remote server | 95 | port=n port to connect to on the remote server |
87 | 96 | ||
88 | noextend force legacy mode (no 9p2000.u semantics) | 97 | noextend force legacy mode (no 9p2000.u or 9p2000.L semantics) |
98 | |||
99 | version=name Select 9P protocol version. Valid options are: | ||
100 | 9p2000 - Legacy mode (same as noextend) | ||
101 | 9p2000.u - Use 9P2000.u protocol | ||
102 | 9p2000.L - Use 9P2000.L protocol | ||
89 | 103 | ||
90 | dfltuid attempt to mount as a particular uid | 104 | dfltuid attempt to mount as a particular uid |
91 | 105 | ||
diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt new file mode 100644 index 000000000000..0660c9f5deef --- /dev/null +++ b/Documentation/filesystems/ceph.txt | |||
@@ -0,0 +1,140 @@ | |||
1 | Ceph Distributed File System | ||
2 | ============================ | ||
3 | |||
4 | Ceph is a distributed network file system designed to provide good | ||
5 | performance, reliability, and scalability. | ||
6 | |||
7 | Basic features include: | ||
8 | |||
9 | * POSIX semantics | ||
10 | * Seamless scaling from 1 to many thousands of nodes | ||
11 | * High availability and reliability. No single point of failure. | ||
12 | * N-way replication of data across storage nodes | ||
13 | * Fast recovery from node failures | ||
14 | * Automatic rebalancing of data on node addition/removal | ||
15 | * Easy deployment: most FS components are userspace daemons | ||
16 | |||
17 | Also, | ||
18 | * Flexible snapshots (on any directory) | ||
19 | * Recursive accounting (nested files, directories, bytes) | ||
20 | |||
21 | In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely | ||
22 | on symmetric access by all clients to shared block devices, Ceph | ||
23 | separates data and metadata management into independent server | ||
24 | clusters, similar to Lustre. Unlike Lustre, however, metadata and | ||
25 | storage nodes run entirely as user space daemons. Storage nodes | ||
26 | utilize btrfs to store data objects, leveraging its advanced features | ||
27 | (checksumming, metadata replication, etc.). File data is striped | ||
28 | across storage nodes in large chunks to distribute workload and | ||
29 | facilitate high throughputs. When storage nodes fail, data is | ||
30 | re-replicated in a distributed fashion by the storage nodes themselves | ||
31 | (with some minimal coordination from a cluster monitor), making the | ||
32 | system extremely efficient and scalable. | ||
33 | |||
34 | Metadata servers effectively form a large, consistent, distributed | ||
35 | in-memory cache above the file namespace that is extremely scalable, | ||
36 | dynamically redistributes metadata in response to workload changes, | ||
37 | and can tolerate arbitrary (well, non-Byzantine) node failures. The | ||
38 | metadata server takes a somewhat unconventional approach to metadata | ||
39 | storage to significantly improve performance for common workloads. In | ||
40 | particular, inodes with only a single link are embedded in | ||
41 | directories, allowing entire directories of dentries and inodes to be | ||
42 | loaded into its cache with a single I/O operation. The contents of | ||
43 | extremely large directories can be fragmented and managed by | ||
44 | independent metadata servers, allowing scalable concurrent access. | ||
45 | |||
46 | The system offers automatic data rebalancing/migration when scaling | ||
47 | from a small cluster of just a few nodes to many hundreds, without | ||
48 | requiring an administrator carve the data set into static volumes or | ||
49 | go through the tedious process of migrating data between servers. | ||
50 | When the file system approaches full, new nodes can be easily added | ||
51 | and things will "just work." | ||
52 | |||
53 | Ceph includes flexible snapshot mechanism that allows a user to create | ||
54 | a snapshot on any subdirectory (and its nested contents) in the | ||
55 | system. Snapshot creation and deletion are as simple as 'mkdir | ||
56 | .snap/foo' and 'rmdir .snap/foo'. | ||
57 | |||
58 | Ceph also provides some recursive accounting on directories for nested | ||
59 | files and bytes. That is, a 'getfattr -d foo' on any directory in the | ||
60 | system will reveal the total number of nested regular files and | ||
61 | subdirectories, and a summation of all nested file sizes. This makes | ||
62 | the identification of large disk space consumers relatively quick, as | ||
63 | no 'du' or similar recursive scan of the file system is required. | ||
64 | |||
65 | |||
66 | Mount Syntax | ||
67 | ============ | ||
68 | |||
69 | The basic mount syntax is: | ||
70 | |||
71 | # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt | ||
72 | |||
73 | You only need to specify a single monitor, as the client will get the | ||
74 | full list when it connects. (However, if the monitor you specify | ||
75 | happens to be down, the mount won't succeed.) The port can be left | ||
76 | off if the monitor is using the default. So if the monitor is at | ||
77 | 1.2.3.4, | ||
78 | |||
79 | # mount -t ceph 1.2.3.4:/ /mnt/ceph | ||
80 | |||
81 | is sufficient. If /sbin/mount.ceph is installed, a hostname can be | ||
82 | used instead of an IP address. | ||
83 | |||
84 | |||
85 | |||
86 | Mount Options | ||
87 | ============= | ||
88 | |||
89 | ip=A.B.C.D[:N] | ||
90 | Specify the IP and/or port the client should bind to locally. | ||
91 | There is normally not much reason to do this. If the IP is not | ||
92 | specified, the client's IP address is determined by looking at the | ||
93 | address it's connection to the monitor originates from. | ||
94 | |||
95 | wsize=X | ||
96 | Specify the maximum write size in bytes. By default there is no | ||
97 | maximum. Ceph will normally size writes based on the file stripe | ||
98 | size. | ||
99 | |||
100 | rsize=X | ||
101 | Specify the maximum readahead. | ||
102 | |||
103 | mount_timeout=X | ||
104 | Specify the timeout value for mount (in seconds), in the case | ||
105 | of a non-responsive Ceph file system. The default is 30 | ||
106 | seconds. | ||
107 | |||
108 | rbytes | ||
109 | When stat() is called on a directory, set st_size to 'rbytes', | ||
110 | the summation of file sizes over all files nested beneath that | ||
111 | directory. This is the default. | ||
112 | |||
113 | norbytes | ||
114 | When stat() is called on a directory, set st_size to the | ||
115 | number of entries in that directory. | ||
116 | |||
117 | nocrc | ||
118 | Disable CRC32C calculation for data writes. If set, the storage node | ||
119 | must rely on TCP's error correction to detect data corruption | ||
120 | in the data payload. | ||
121 | |||
122 | noasyncreaddir | ||
123 | Disable client's use its local cache to satisfy readdir | ||
124 | requests. (This does not change correctness; the client uses | ||
125 | cached metadata only when a lease or capability ensures it is | ||
126 | valid.) | ||
127 | |||
128 | |||
129 | More Information | ||
130 | ================ | ||
131 | |||
132 | For more information on Ceph, see the home page at | ||
133 | http://ceph.newdream.net/ | ||
134 | |||
135 | The Linux kernel client source tree is available at | ||
136 | git://ceph.newdream.net/git/ceph-client.git | ||
137 | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git | ||
138 | |||
139 | and the source for the full system is at | ||
140 | git://ceph.newdream.net/git/ceph.git | ||
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index 3015da0c6b2a..fe09a2cb1858 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt | |||
@@ -82,11 +82,13 @@ tmpfs has a mount option to set the NUMA memory allocation policy for | |||
82 | all files in that instance (if CONFIG_NUMA is enabled) - which can be | 82 | all files in that instance (if CONFIG_NUMA is enabled) - which can be |
83 | adjusted on the fly via 'mount -o remount ...' | 83 | adjusted on the fly via 'mount -o remount ...' |
84 | 84 | ||
85 | mpol=default prefers to allocate memory from the local node | 85 | mpol=default use the process allocation policy |
86 | (see set_mempolicy(2)) | ||
86 | mpol=prefer:Node prefers to allocate memory from the given Node | 87 | mpol=prefer:Node prefers to allocate memory from the given Node |
87 | mpol=bind:NodeList allocates memory only from nodes in NodeList | 88 | mpol=bind:NodeList allocates memory only from nodes in NodeList |
88 | mpol=interleave prefers to allocate from each node in turn | 89 | mpol=interleave prefers to allocate from each node in turn |
89 | mpol=interleave:NodeList allocates from each node of NodeList in turn | 90 | mpol=interleave:NodeList allocates from each node of NodeList in turn |
91 | mpol=local prefers to allocate memory from the local node | ||
90 | 92 | ||
91 | NodeList format is a comma-separated list of decimal numbers and ranges, | 93 | NodeList format is a comma-separated list of decimal numbers and ranges, |
92 | a range being two hyphen-separated decimal numbers, the smallest and | 94 | a range being two hyphen-separated decimal numbers, the smallest and |
@@ -134,3 +136,5 @@ Author: | |||
134 | Christoph Rohland <cr@sap.com>, 1.12.01 | 136 | Christoph Rohland <cr@sap.com>, 1.12.01 |
135 | Updated: | 137 | Updated: |
136 | Hugh Dickins, 4 June 2007 | 138 | Hugh Dickins, 4 June 2007 |
139 | Updated: | ||
140 | KOSAKI Motohiro, 16 Mar 2010 | ||