Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (205 commits)
  ceph: update for write_inode API change
  ceph: reset osd after relevant messages timed out
  ceph: fix flush_dirty_caps race with caps migration
  ceph: include migrating caps in issued set
  ceph: fix osdmap decoding when pools include (removed) snaps
  ceph: return EBADF if waiting for caps on closed file
  ceph: set osd request message front length correctly
  ceph: reset front len on return to msgpool; BUG on mismatched front iov
  ceph: fix snaptrace decoding on cap migration between mds
  ceph: use single osd op reply msg
  ceph: reset bits on connection close
  ceph: remove bogus mds forward warning
  ceph: remove fragile __map_osds optimization
  ceph: fix connection fault STANDBY check
  ceph: invalidate_authorizer without con->mutex held
  ceph: don't clobber write return value when using O_SYNC
  ceph: fix client_request_forward decoding
  ceph: drop messages on unregistered mds sessions; cleanup
  ceph: fix comments, locking in destroy_inode
  ceph: move dereference after NULL test
  ...

Fix trivial conflicts in Documentation/ioctl/ioctl-number.txt
author: Linus Torvalds <torvalds@linux-foundation.org> 2010-03-19 12:43:06 -0400
committer: Linus Torvalds <torvalds@linux-foundation.org> 2010-03-19 12:43:06 -0400
commit: fc7f99cf36ebae853639dabb43bc2f0098c59aef (patch)
tree: 3ca7050397f515f91ef98f8b6293f9f7fd84ef02 /Documentation
parent: 0a492fdef8aa241f6139e6455e852cc710ae8ed1 (diff)
parent: f1a3d57213fe264b4cf584e78bac36aaf9998729 (diff)
2 files changed, 140 insertions, 0 deletions
diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt
new file mode 100644
index 000000000000..6e03917316bd
--- /dev/null
+++ b/Documentation/filesystems/ceph.txt
@@ -0,0 +1,139 @@
+Ceph Distributed File System
+============================
+
+Ceph is a distributed network file system designed to provide good
+performance, reliability, and scalability.
+
+Basic features include:
+
+ * POSIX semantics
+ * Seamless scaling from 1 to many thousands of nodes
+ * High availability and reliability.  No single points of failure.
+ * N-way replication of data across storage nodes
+ * Fast recovery from node failures
+ * Automatic rebalancing of data on node addition/removal
+ * Easy deployment: most FS components are userspace daemons
+
+Also,
+ * Flexible snapshots (on any directory)
+ * Recursive accounting (nested files, directories, bytes)
+
+In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely
+on symmetric access by all clients to shared block devices, Ceph
+separates data and metadata management into independent server
+clusters, similar to Lustre.  Unlike Lustre, however, metadata and
+storage nodes run entirely as user space daemons.  Storage nodes
+utilize btrfs to store data objects, leveraging its advanced features
+(checksumming, metadata replication, etc.).  File data is striped
+across storage nodes in large chunks to distribute workload and
+facilitate high throughputs.  When storage nodes fail, data is
+re-replicated in a distributed fashion by the storage nodes themselves
+(with some minimal coordination from a cluster monitor), making the
+system extremely efficient and scalable.
+
+Metadata servers effectively form a large, consistent, distributed
+in-memory cache above the file namespace that is extremely scalable,
+dynamically redistributes metadata in response to workload changes,
+and can tolerate arbitrary (well, non-Byzantine) node failures.  The
+metadata server takes a somewhat unconventional approach to metadata
+storage to significantly improve performance for common workloads.  In
+particular, inodes with only a single link are embedded in
+directories, allowing entire directories of dentries and inodes to be
+loaded into its cache with a single I/O operation.  The contents of
+extremely large directories can be fragmented and managed by
+independent metadata servers, allowing scalable concurrent access.
+
+The system offers automatic data rebalancing/migration when scaling
+from a small cluster of just a few nodes to many hundreds, without
+requiring an administrator to carve the data set into static volumes or
+go through the tedious process of migrating data between servers.
+When the file system approaches full, new nodes can be easily added
+and things will "just work."
+
+Ceph includes a flexible snapshot mechanism that allows a user to create
+a snapshot on any subdirectory (and its nested contents) in the
+system.  Snapshot creation and deletion are as simple as 'mkdir
+.snap/foo' and 'rmdir .snap/foo'.
+
+Ceph also provides some recursive accounting on directories for nested
+files and bytes.  That is, a 'getfattr -d foo' on any directory in the
+system will reveal the total number of nested regular files and
+subdirectories, and a summation of all nested file sizes.  This makes
+the identification of large disk space consumers relatively quick, as
+no 'du' or similar recursive scan of the file system is required.
+
+
+Mount Syntax
+============
+
+The basic mount syntax is:
+
+ # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt
+
+You only need to specify a single monitor, as the client will get the
+full list when it connects.  (However, if the monitor you specify
+happens to be down, the mount won't succeed.)  The port can be left
+off if the monitor is using the default.  So if the monitor is at
+1.2.3.4,
+
+ # mount -t ceph 1.2.3.4:/ /mnt/ceph
+
+is sufficient.  If /sbin/mount.ceph is installed, a hostname can be
+used instead of an IP address.
+
+
+
+Mount Options
+=============
+
+  ip=A.B.C.D[:N]
+        Specify the IP and/or port the client should bind to locally.
+        There is normally not much reason to do this.  If the IP is not
+        specified, the client's IP address is determined by looking at the
+        address its connection to the monitor originates from.
+
+  wsize=X
+        Specify the maximum write size in bytes.  By default there is no
+        maximum.  Ceph will normally size writes based on the file stripe
+        size.
+
+  rsize=X
+        Specify the maximum readahead.
+
+  mount_timeout=X
+        Specify the timeout value for mount (in seconds), in the case
+        of a non-responsive Ceph file system.  The default is 30
+        seconds.
+
+  rbytes
+        When stat() is called on a directory, set st_size to 'rbytes',
+        the summation of file sizes over all files nested beneath that
+        directory.  This is the default.
+
+  norbytes
+        When stat() is called on a directory, set st_size to the
+        number of entries in that directory.
+
+  nocrc
+        Disable CRC32C calculation for data writes.  If set, the OSD
+        must rely on TCP's error correction to detect data corruption
+        in the data payload.
+
+  noasyncreaddir
+        Disable the client's use of its local cache to satisfy readdir
+        requests.  (This does not change correctness; the client uses
+        cached metadata only when a lease or capability ensures it is
+        valid.)
+
+
+More Information
+================
+
+For more information on Ceph, see the home page at
+        http://ceph.newdream.net/
+
+The Linux kernel client source tree is available at
+        git://ceph.newdream.net/linux-ceph-client.git
+
+and the source for the full system is at
+        git://ceph.newdream.net/ceph.git
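
Reviewer's sketch (not part of the patch): the mount syntax and options documented in ceph.txt above can be assembled programmatically. The Python below is a minimal illustration, assuming the documented form `monip[:port][,monip2[:port]...]:/[subdir]`. The helper names `ceph_source` and `ceph_mount_command`, and the port value in the usage note, are hypothetical. The code only builds the argument list and does not invoke mount.

```python
import shlex


def ceph_source(monitors, subdir=""):
    """Build the 'monip[:port][,monip2[:port]...]:/[subdir]' source string.

    Each entry in `monitors` is either an address string or an
    (address, port) tuple. Hypothetical helper, for illustration only.
    """
    if not monitors:
        raise ValueError("at least one monitor address is required")
    parts = []
    for mon in monitors:
        if isinstance(mon, tuple):
            addr, port = mon
            parts.append(f"{addr}:{port}")
        else:
            parts.append(mon)
    # The path component always starts with '/', followed by the optional subdir.
    return ",".join(parts) + ":/" + subdir.lstrip("/")


def ceph_mount_command(monitors, mountpoint, subdir="", options=None):
    """Return the mount(8) argument list for a Ceph mount.

    `options` maps an option name to its value, or to None for bare
    flags such as 'norbytes' or 'nocrc'.
    """
    cmd = ["mount", "-t", "ceph"]
    if options:
        opts = [name if value is None else f"{name}={value}"
                for name, value in options.items()]
        cmd += ["-o", ",".join(opts)]
    cmd += [ceph_source(monitors, subdir), mountpoint]
    return cmd


if __name__ == "__main__":
    # Reproduces the single-monitor example from the documentation.
    print(shlex.join(ceph_mount_command(["1.2.3.4"], "/mnt/ceph")))
```

For example, `ceph_mount_command([("1.2.3.4", 6789), "1.2.3.5"], "/mnt/ceph", subdir="home", options={"mount_timeout": 60, "norbytes": None})` yields an argument list with `-o mount_timeout=60,norbytes` and the source `1.2.3.4:6789,1.2.3.5:/home`; the port here is only an example value.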
diff --git a/Documentation/ioctl/ioctl-number.txt b/Documentation/ioctl/ioctl-number.txt
index 35c9b51d20ea..dd5806f4fcc4 100644
--- a/Documentation/ioctl/ioctl-number.txt
+++ b/Documentation/ioctl/ioctl-number.txt
@@ -291,6 +291,7 @@ Code  Seq#(hex)	Include File		Comments
 0x92    00-0F   drivers/usb/mon/mon_bin.c
 0x93    60-7F   linux/auto_fs.h
 0x94    all     fs/btrfs/ioctl.h
+0x97    00-7F   fs/ceph/ioctl.h         Ceph file system
 0x99    00-0F                           537-Addinboard driver
                                        <mailto:buk@buks.ipn.de>
 0xA0    all     linux/sdp/sdp.h         Industrial Device Project
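
Reviewer's sketch (not part of the patch): the new table row reserves code 0x97 with sequence numbers 00-7F for fs/ceph/ioctl.h. The Python below illustrates how a command number in that range is composed and checked, assuming the generic Linux ioctl bit layout (8-bit sequence number, 8-bit type code, 14-bit size, 2-bit direction); some architectures use a different layout. The names `ioc` and `in_ceph_range` are hypothetical, and no actual Ceph ioctl command is defined here.

```python
# Assumed bit layout: the generic Linux ioctl encoding.
# Some architectures use different size/direction widths.
NRBITS, TYPEBITS, SIZEBITS = 8, 8, 14
NRSHIFT = 0
TYPESHIFT = NRSHIFT + NRBITS        # 8
SIZESHIFT = TYPESHIFT + TYPEBITS    # 16
DIRSHIFT = SIZESHIFT + SIZEBITS     # 30

IOC_NONE, IOC_WRITE, IOC_READ = 0, 1, 2

CEPH_CODE = 0x97       # "Code" column of the new table row
CEPH_SEQ_MAX = 0x7F    # upper end of the reserved "Seq#" range 00-7F


def ioc(direction, code, nr, size):
    """Compose an ioctl command number from its four fields."""
    return ((direction << DIRSHIFT) | (size << SIZESHIFT)
            | (code << TYPESHIFT) | (nr << NRSHIFT))


def in_ceph_range(cmd):
    """True if cmd uses code 0x97 with a sequence number in 00-7F."""
    code = (cmd >> TYPESHIFT) & ((1 << TYPEBITS) - 1)
    nr = (cmd >> NRSHIFT) & ((1 << NRBITS) - 1)
    return code == CEPH_CODE and nr <= CEPH_SEQ_MAX


if __name__ == "__main__":
    # A made-up command: no data transfer, sequence number 1.
    print(hex(ioc(IOC_NONE, CEPH_CODE, 0x01, 0)))
```

A command built with sequence number 0x80 or with the neighbouring btrfs code 0x94 falls outside the reserved Ceph range, so `in_ceph_range` returns False for it.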