path: root/fs/ceph
Commit message (Author, Age)
...
| * ceph: pre-allocate data structure that tracks caps flushing (Yan, Zheng, 2015-06-25)
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: re-send flushing caps (which are revoked) in reconnect stage (Yan, Zheng, 2015-06-25)
| |   If flushing caps were revoked, we should re-send the cap flush in the client reconnect stage. This guarantees that the MDS processes the cap flush message before issuing the flushing caps to another client.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: send TID of the oldest pending caps flush to MDS (Yan, Zheng, 2015-06-25)
| |   Using this information, the MDS can trim its completed caps flush list (which is used to detect duplicated cap flushes).
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: track pending caps flushing globally (Yan, Zheng, 2015-06-25)
| |   This lets us know the TID of the oldest pending caps flush. A later patch will send this information to the MDS, so that the MDS can trim its completed caps flush list. Tracking pending caps flushing globally also simplifies the syncfs code.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: track pending caps flushing accurately (Yan, Zheng, 2015-06-25)
| |   Previously we did not track an accurate TID for flushing caps. When the MDS fails over, we have no choice but to re-send all flushing caps with a new TID. This can cause problems because the MDS may have already flushed some caps and issued the same caps to another client. The re-sent cap flush has a new TID, which makes the MDS unable to detect whether it has already processed the cap flush. This patch adds code to track pending caps flushing accurately. When a cap flush needs to be re-sent, we use its original flush TID.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
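The bookkeeping described in the two tracking commits above can be pictured with a small userspace sketch. This is not the kernel code: `struct flush_tracker`, `tracker_start()`, `tracker_ack()`, `tracker_oldest()` and `tracker_resend_list()` are hypothetical names, and a fixed-size table stands in for whatever structures the patches actually use. The point is only that each flush keeps the TID it was first sent with, so a re-send can reuse it and the oldest pending TID is always known.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8   /* arbitrary capacity chosen for this sketch */

struct pending_flush {
    uint64_t tid;       /* TID assigned when the flush was first sent */
    unsigned caps;      /* bitmask of caps being flushed */
    int in_use;
};

struct flush_tracker {
    uint64_t last_tid;  /* last TID handed out; 0 means none yet */
    struct pending_flush slots[MAX_PENDING];
};

/* Record a new flush and return its TID, or 0 if the table is full. */
static uint64_t tracker_start(struct flush_tracker *t, unsigned caps)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i].tid = ++t->last_tid;
            t->slots[i].caps = caps;
            t->slots[i].in_use = 1;
            return t->slots[i].tid;
        }
    }
    return 0;
}

/* Drop the entry acknowledged by the server; returns 1 if it was pending. */
static int tracker_ack(struct flush_tracker *t, uint64_t tid)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->slots[i].in_use && t->slots[i].tid == tid) {
            t->slots[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}

/* Oldest TID still pending, or 0 if nothing is pending. */
static uint64_t tracker_oldest(const struct flush_tracker *t)
{
    uint64_t oldest = 0;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->slots[i].in_use && (oldest == 0 || t->slots[i].tid < oldest))
            oldest = t->slots[i].tid;
    }
    return oldest;
}

/* Collect the original TIDs of all pending flushes, e.g. for a re-send. */
static size_t tracker_resend_list(const struct flush_tracker *t,
                                  uint64_t *out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < MAX_PENDING && n < max; i++) {
        if (t->slots[i].in_use)
            out[n++] = t->slots[i].tid;
    }
    return n;
}
```

Because the re-send list carries the stored TIDs rather than freshly allocated ones, a receiver that remembers completed TIDs can recognise a duplicate.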
| * ceph: fix directory fsync (Yan, Zheng, 2015-06-25)
| |   fsync() on a directory should flush dirty caps and wait for any uncommitted directory operations to commit. But ceph_dir_fsync() only waits for uncommitted directory operations.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix flushing caps (Yan, Zheng, 2015-06-25)
| |   Currently ceph_fsync() only flushes dirty caps and waits for them to be flushed. It doesn't wait for caps that are already being flushed. This patch makes ceph_fsync() wait for pending flushing caps too. In addition, this patch makes caps_are_flushed() properly handle tid wrapping.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
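The "tid wrapping" mentioned above refers to comparing sequence numbers that can overflow. A common way to do that, sketched here in userspace C, is to look at the sign of the modular difference. The function name `tid_after_eq()` and the 16-bit counter width are assumptions made for illustration; the commit message does not show how caps_are_flushed() implements the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if tid `a` is at or after tid `b` in wrapping order.
 * The subtraction is done modulo 2^16 and then reinterpreted as signed,
 * so the result stays correct as long as the two values are less than
 * half the counter range apart. The narrowing conversion to int16_t
 * relies on two's-complement behaviour, which common compilers provide.
 */
static bool tid_after_eq(uint16_t a, uint16_t b)
{
    return (int16_t)(uint16_t)(a - b) >= 0;
}
```

A plain `a >= b` would report that tid 2 comes before tid 65534, even though 2 was issued four steps later once the counter wrapped.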
| * ceph: don't include used caps in cap_wanted (Yan, Zheng, 2015-06-25)
| |   When copying files to cephfs, file data may stay in the page cache after the corresponding file is closed. Cached data uses the Fc capability. If we include the Fc capability in cap_wanted, the MDS will treat files with cached data as open files, and journal them in an EOpen event when trimming a log segment.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: ratelimit warn messages for MDS closes session (Yan, Zheng, 2015-06-25)
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: simplify two mount_timeout sites (Ilya Dryomov, 2015-06-25)
| |   No need to bifurcate wait now that we've got ceph_timeout_jiffies().
| |   Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| |   Reviewed-by: Alex Elder <elder@linaro.org>
| |   Reviewed-by: Yan, Zheng <zyan@redhat.com>
| * libceph: store timeouts in jiffies, verify user input (Ilya Dryomov, 2015-06-25)
| |   There are currently three libceph-level timeouts that the user can specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of these are in seconds and no checking is done on user input: negative values are accepted, we multiply them all by HZ which may or may not overflow, arbitrarily large jiffies then get added together, etc.
| |   There is also a bug in the way mount_timeout=0 is handled. It's supposed to mean "infinite timeout", but that's not how wait.h APIs treat it and so __ceph_open_session() for example will busy loop without much chance of being interrupted if none of ceph-mons are there.
| |   Fix all this by verifying user input, storing timeouts capped by msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies() helper for all user-specified waits to handle infinite timeouts correctly.
| |   Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| |   Reviewed-by: Alex Elder <elder@linaro.org>
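The two ideas in this commit message, range-checking a timeout given in seconds and mapping 0 to "wait forever", can be sketched in userspace C as follows. Everything here is illustrative: `SKETCH_HZ`, `SKETCH_INFINITE`, `timeout_secs_to_ticks()` and `timeout_or_infinite()` are made-up names, the tick rate of 250 is an assumption, and the exact bounds the patch enforces are not stated in the log.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_HZ       250UL                     /* assumed tick rate */
#define SKETCH_INFINITE ((unsigned long)LONG_MAX) /* stand-in for "no timeout" */

/*
 * Validate a user-supplied timeout in seconds and convert it to ticks.
 * Negative values and values large enough to risk overflow are rejected.
 * `zero_allowed` distinguishes options where 0 has a meaning (infinite)
 * from options where it does not.
 */
static bool timeout_secs_to_ticks(long secs, bool zero_allowed,
                                  unsigned long *ticks)
{
    assert(ticks != NULL);
    if (secs < 0 || (secs == 0 && !zero_allowed) || secs > INT_MAX / 1000)
        return false;
    *ticks = (unsigned long)secs * SKETCH_HZ;
    return true;
}

/* Map a stored timeout of 0 to "infinite" before handing it to a wait call. */
static unsigned long timeout_or_infinite(unsigned long ticks)
{
    return ticks ? ticks : SKETCH_INFINITE;
}
```

Passing the stored value through a helper like `timeout_or_infinite()` at every wait site avoids the situation described above, where a literal 0 is handed to a wait primitive that treats it as "do not wait at all".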
| * ceph: exclude setfilelock requests when calculating oldest tid (Yan, Zheng, 2015-06-25)
| |   setfilelock requests can block for a long time, which can prevent the client from advancing its oldest tid.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: don't pre-allocate space for cap release messages (Yan, Zheng, 2015-06-25)
| |   Previously we pre-allocated cap release messages for each cap. This wastes a lot of memory when there is a large number of caps. This patch makes the code not pre-allocate the cap release messages. Instead, we add the corresponding ceph_cap struct to a list when releasing a cap. Later, when cap releases need to be flushed, we allocate the cap release messages dynamically.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: make sure syncfs flushes all cap snaps (Yan, Zheng, 2015-06-25)
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: don't trim auth cap when there are cap snaps (Yan, Zheng, 2015-06-25)
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: take snap_rwsem when accessing snap realm's cached_context (Yan, Zheng, 2015-06-25)
| |   When a ceph inode's i_head_snapc is NULL, __ceph_mark_dirty_caps() accesses the snap realm's cached_context. So we need to take the read lock of snap_rwsem.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: avoid sending unnessesary FLUSHSNAP message (Yan, Zheng, 2015-06-25)
| |   When a snap notification contains no new snapshot, we can avoid sending a FLUSHSNAP message to the MDS. But we still need to create a cap_snap in some cases because it is required by the write path and the page writeback path.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: set i_head_snapc when getting CEPH_CAP_FILE_WR reference (Yan, Zheng, 2015-06-25)
| |   In most cases where the snap context is needed, we are holding a reference to CEPH_CAP_FILE_WR. So we can set the ceph inode's i_head_snapc when getting the CEPH_CAP_FILE_WR reference, and make the code get the snap context from i_head_snapc. This makes the code simpler.
| |   Another benefit of this change is that we can handle snap notifications more elegantly, especially when the snap context is updated while someone else is writing. The old queue cap_snap code may set the cap_snap's context to either the old context or the new snap context, depending on whether i_head_snapc is set. The new queue cap_snap code always sets the cap_snap's context to the old snap context.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: use empty snap context for uninline_data and get_pool_perm (Yan, Zheng, 2015-06-25)
| |   cached_context in ceph_snap_realm is directly accessed by uninline_data() and get_pool_perm(). This is racy in theory. Neither uninline_data() nor get_pool_perm() modifies existing objects; they only create new objects. So we can pass the empty snap context to them. Unlike cached_context in ceph_snap_realm, the empty snap context does not need protection.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: check OSD caps before read/write (Yan, Zheng, 2015-06-25)
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * libceph: allow setting osd_req_op's flags (Yan, Zheng, 2015-06-25)
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| |   Reviewed-by: Alex Elder <elder@linaro.org>
* | ceph: switch to simple_follow_link() (Al Viro, 2015-05-10)
|/    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs (Linus Torvalds, 2015-04-26)
|\
| |   Pull fourth vfs update from Al Viro:
| |   "d_inode() annotations from David Howells (sat in for-next since before the beginning of merge window) + four assorted fixes"
| |
| |   * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
| |     RCU pathwalk breakage when running into a symlink overmounting something
| |     fix I_DIO_WAKEUP definition
| |     direct-io: only inc/dec inode->i_dio_count for file systems
| |     fs/9p: fix readdir()
| |     VFS: assorted d_backing_inode() annotations
| |     VFS: fs/inode.c helpers: d_inode() annotations
| |     VFS: fs/cachefiles: d_backing_inode() annotations
| |     VFS: fs library helpers: d_inode() annotations
| |     VFS: assorted weird filesystems: d_inode() annotations
| |     VFS: normal filesystems (and lustre): d_inode() annotations
| |     VFS: security/: d_inode() annotations
| |     VFS: security/: d_backing_inode() annotations
| |     VFS: net/: d_inode() annotations
| |     VFS: net/unix: d_backing_inode() annotations
| |     VFS: kernel/: d_inode() annotations
| |     VFS: audit: d_backing_inode() annotations
| |     VFS: Fix up some ->d_inode accesses in the chelsio driver
| |     VFS: Cachefiles should perform fs modifications on the top layer only
| |     VFS: AF_UNIX sockets should call mknod on the top layer only
| * VFS: normal filesystems (and lustre): d_inode() annotations (David Howells, 2015-04-15)
| |   that's the bulk of filesystem drivers dealing with inodes of their own
| |   Signed-off-by: David Howells <dhowells@redhat.com>
| |   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds, 2015-04-22)
|\ \
| |/
|/|
| |   Pull Ceph updates from Sage Weil:
| |   "This time around we have a collection of CephFS fixes from Zheng around MDS failure handling and snapshots, support for a new CRUSH straw2 algorithm (to sync up with userspace) and several RBD cleanups and fixes from Ilya, an error path leak fix from Taesoo, and then an assorted collection of cleanups from others"
| |
| |   * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits)
| |     rbd: rbd_wq comment is obsolete
| |     libceph: announce support for straw2 buckets
| |     crush: straw2 bucket type with an efficient 64-bit crush_ln()
| |     crush: ensuring at most num-rep osds are selected
| |     crush: drop unnecessary include from mapper.c
| |     ceph: fix uninline data function
| |     ceph: rename snapshot support
| |     ceph: fix null pointer dereference in send_mds_reconnect()
| |     ceph: hold on to exclusive caps on complete directories
| |     libceph: simplify our debugfs attr macro
| |     ceph: show non-default options only
| |     libceph: expose client options through debugfs
| |     libceph, ceph: split ceph_show_options()
| |     rbd: mark block queue as non-rotational
| |     libceph: don't overwrite specific con error msgs
| |     ceph: cleanup unsafe requests when reconnecting is denied
| |     ceph: don't zero i_wrbuffer_ref when reconnecting is denied
| |     ceph: don't mark dirty caps when there is no auth cap
| |     ceph: keep i_snap_realm while there are writers
| |     libceph: osdmap.h: Add missing format newlines
| |     ...
| * ceph: fix uninline data function (Yan, Zheng, 2015-04-22)
| |   For CEPH_OSD_CMPXATTR_MODE_U64, the OSD expects the u64 to be encoded as a string in the object's xattr.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
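"Encoded as a string" here means the decimal text of the number rather than its raw 8 bytes. A minimal userspace sketch of that encoding follows; `encode_u64_xattr()` is a hypothetical helper name, and the buffer handling is an assumption rather than a copy of what the fix does.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write `value` into `buf` as decimal text.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if the buffer is too small to hold the whole string.
 */
static int encode_u64_xattr(char *buf, size_t size, uint64_t value)
{
    assert(buf != NULL);
    int len = snprintf(buf, size, "%" PRIu64, value);
    if (len < 0 || (size_t)len >= size)
        return -1;
    return len;
}
```

For the value 42 this produces the two bytes '4' and '2', whereas a raw little-endian u64 would be eight bytes starting with 0x2a.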
| * ceph: rename snapshot support (Yan, Zheng, 2015-04-22)
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix null pointer dereference in send_mds_reconnect() (Yan, Zheng, 2015-04-22)
| |   sb->s_root can be NULL when unmounting.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: hold on to exclusive caps on complete directories (Yan, Zheng, 2015-04-20)
| |   If a directory is complete, we want to keep the exclusive cap, so that the MDS does not end up revoking the shared cap on every create/unlink operation.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: show non-default options only (Ilya Dryomov, 2015-04-20)
| |   Don't pollute /proc/mounts with default options (presently these are dcache, nofsc and acl). Leave the acl/noacl however - it's a bit of a special case due to CONFIG_CEPH_FS_POSIX_ACL.
| |   Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * libceph, ceph: split ceph_show_options() (Ilya Dryomov, 2015-04-20)
| |   Split ceph_show_options() into two pieces and move the piece responsible for printing client (libceph) options into net/ceph. This way people adding a libceph option wouldn't have to remember to update code in fs/ceph.
| |   Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * ceph: cleanup unsafe requests when reconnecting is denied (Yan, Zheng, 2015-04-20)
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: don't zero i_wrbuffer_ref when reconnecting is denied (Yan, Zheng, 2015-04-20)
| |   remove_session_caps_cb() does not truncate dirty data in the page cache, but zeros i_wrbuffer_ref/i_wrbuffer_ref_head. This results in a negative i_wrbuffer_ref/i_wrbuffer_ref_head.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: don't mark dirty caps when there is no auth cap (Yan, Zheng, 2015-04-20)
| |   No i_auth_cap means reconnecting to MDS was denied. So don't add new dirty caps.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: keep i_snap_realm while there are writers (Yan, Zheng, 2015-04-20)
| |   When reconnecting to the MDS is denied, we remove session caps forcibly. But there may be ongoing writes, and the write code needs to reference i_snap_realm. So if there are ongoing writes, we keep i_snap_realm.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: kstrdup() memory handling (Sanidhya Kashyap, 2015-04-20)
| |   Currently, the result of kstrdup() is not checked for r_path2, r_path1 and snapdir_name at various locations, although the allocation can fail under memory pressure. Therefore, return ENOMEM where the checks were missing.
| |   Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
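The pattern this commit describes, checking a string duplication for failure and returning -ENOMEM, looks roughly like the following in userspace C. `set_path()` is a hypothetical helper, and malloc() plus memcpy() stand in for kstrdup(); the field names from the commit message are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Replace *dst with a freshly allocated copy of src.
 * Returns 0 on success or -ENOMEM if the allocation fails, in which
 * case *dst is left untouched so the caller can unwind safely.
 */
static int set_path(char **dst, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src) + 1;
    char *copy = malloc(len);
    if (!copy)
        return -ENOMEM;   /* the check the commit says was missing */

    memcpy(copy, src, len);
    free(*dst);
    *dst = copy;
    return 0;
}
```

A caller would propagate the error, for example `if ((err = set_path(&path, name)) < 0) return err;`, instead of carrying a NULL pointer forward.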
| * ceph: properly release page upon error (Taesoo Kim, 2015-04-20)
| |   When ceph_update_writeable_page fails (including -EAGAIN), it unlocks (w/ unlock_page) the page but does not 'release' (w/ page_cache_release) it properly. Upon error, properly set *pagep to NULL, indicating an error.
| |   Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: match wait_for_completion_timeout return type (Nicholas Mc Guire, 2015-04-20)
| |   The return type of wait_for_completion_timeout is unsigned long, not int. An appropriately named unsigned long is added and the assignment fixed up.
| |   Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: use msecs_to_jiffies for time conversion (Nicholas Mc Guire, 2015-04-20)
| |   This is only an API consolidation and should make things more readable: it replaces var * HZ / 1000 with msecs_to_jiffies(var).
| |   Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
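The two conversion styles named in this commit are not numerically identical for small values, which is worth seeing once. The sketch below is userspace C with an assumed tick rate of 250; `naive_ms_to_ticks()` mirrors the open-coded `var * HZ / 1000`, and `ms_to_ticks_round_up()` is a simplified stand-in that rounds up, not the kernel's msecs_to_jiffies() itself.

```c
#include <assert.h>

#define SKETCH_HZ 250UL   /* assumed tick rate; must divide 1000 evenly here */

/* Open-coded conversion: truncates, so short delays can become 0 ticks. */
static unsigned long naive_ms_to_ticks(unsigned long ms)
{
    return ms * SKETCH_HZ / 1000UL;
}

/* Rounding-up conversion: any non-zero delay yields at least one tick. */
static unsigned long ms_to_ticks_round_up(unsigned long ms)
{
    const unsigned long ms_per_tick = 1000UL / SKETCH_HZ;
    return (ms + ms_per_tick - 1) / ms_per_tick;
}
```

With these definitions a 1 ms delay is 0 ticks in the open-coded form and 1 tick in the rounded form, while 1000 ms is 250 ticks in both.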
| * ceph: remove redundant declaration (Fabian Frederick, 2015-04-20)
| |   ceph_aops was already defined extern in addr.c section
| |   Signed-off-by: Fabian Frederick <fabf@skynet.be>
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: fix dcache/nocache mount option (Yan, Zheng, 2015-04-20)
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
| * ceph: drop cap releases in requests composed before cap reconnect (Yan, Zheng, 2015-04-20)
| |   These cap releases are stale because the MDS will re-establish client caps according to the cap reconnect messages. Note: the MDS can detect stale cap messages, so these stale cap releases are harmless even if we don't drop them.
| |   Signed-off-by: Yan, Zheng <zyan@redhat.com>
* | mirror O_APPEND and O_DIRECT into iocb->ki_flags (Al Viro, 2015-04-11)
| |   ... avoiding write_iter/fcntl races.
| |   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | switch generic_write_checks() to iocb and iter (Al Viro, 2015-04-11)
| |   ... returning -E... upon error and amount of data left in iter after (possible) truncation upon success. Note that the normal case gives a non-zero (positive) return value, so any tests for != 0 _must_ be updated.
| |   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| |   Conflicts: fs/ext4/file.c
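The return convention described in this commit message (negative errno on error, the possibly truncated byte count on success) changes how callers must test the result. The sketch below is a userspace stand-in, not the VFS function: `sketch_write_checks()` and `sketch_write()` are hypothetical names and the size limit is a plain parameter.

```c
#include <assert.h>
#include <errno.h>

/*
 * Stand-in for a write pre-check. Returns a negative errno on error,
 * 0 if there is nothing to write, or the number of bytes that may be
 * written after truncating the request to fit below `limit`.
 */
static long sketch_write_checks(long pos, long count, long limit)
{
    if (pos < 0 || count < 0)
        return -EINVAL;
    if (count == 0)
        return 0;
    if (pos >= limit)
        return -EFBIG;
    if (count > limit - pos)
        count = limit - pos;      /* truncate to the allowed range */
    return count;
}

/* Caller pattern: "<= 0" means stop; a "!= 0" test would be wrong now. */
static long sketch_write(long pos, long count, long limit)
{
    long ret = sketch_write_checks(pos, count, limit);
    if (ret <= 0)
        return ret;               /* error, or nothing to do */
    return ret;                   /* pretend `ret` bytes were written */
}
```

A caller that still treated any non-zero result as failure would bail out on every ordinary write, which is the hazard the commit message warns about.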
* | generic_write_checks(): drop isblk argument (Al Viro, 2015-04-11)
| |   all remaining callers are passing 0; some just obscure that fact.
| |   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | direct_IO: remove rw from a_ops->direct_IO() (Omar Sandoval, 2015-04-11)
| |   Now that no one is using rw, remove it completely.
| |   Signed-off-by: Omar Sandoval <osandov@osandov.com>
| |   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | make new_sync_{read,write}() static (Al Viro, 2015-04-11)
| |   All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases.
| |   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'iocb' into for-next (Al Viro, 2015-04-11)
|\ \
| |/
|/|
| * fs: move struct kiocb to fs.h (Christoph Hellwig, 2015-03-25)
| |   struct kiocb now is a generic I/O container, so move it to fs.h. Also do a #include diet for aio.h while we're at it.
| |   Signed-off-by: Christoph Hellwig <hch@lst.de>
| |   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fs: remove ki_nbytes (Christoph Hellwig, 2015-03-12)
| |   There is no need to pass the total request length in the kiocb, as it is already passed in through the iov_iter argument.
| |   Signed-off-by: Christoph Hellwig <hch@lst.de>
| |   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>