| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
| |
The client reports its version to the kernel on startup. We already test
that it is above the minimum version. Now we record it in a global
variable so code elsewhere can consult it before making a request the
client may not understand.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://github.com/martinbrandenburg/linux
orangefs: integrate readahead cache changes from out-of-tree
The readahead cache has long been present as a compile time option to
OrangeFS. As it had not been tested in some time, it was disabled by
default. Recently, Walt Ligon started work reviving it, which eventually
culminated in the commit below to OrangeFS SVN.
r12519 | walt | 2016-07-13 14:32:42 -0400 (Wed, 13 Jul 2016) | 1 line
reintegrating readahead cache code
This cache is implemented almost entirely in userspace. There are a few
parameters exposed via sysfs, and the cache must be flushed after an
inode is released.
|
| |
| |
| |
| | |
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |
| |
| |
| | |
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This will support a upcoming request where two related values need to be
updated atomically.
This was done without a union in the OrangeFS server source already. Since
that will break the kernel protocol, it has been fixed there and done here
in a way that does not break the kernel protocol.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |
| |
| |
| | |
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This has been dormant code for many years. Parts of it were removed from
the OrangeFS kernel code when it went into mainline. These bits were missed.
Now the readahead cache has been resurrected in the OrangeFS userspace
portions. It was renamed there, since it doesn't really have anything to do
with mmap specifically, so it will be renamed here.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |\
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull orangefs update from Martin Brandenburg:
"Kernel side caching and executable bugfix
This allows OrangeFS to utilize the dcache and adds an in kernel
attribute cache. We previously used the user side client for this
purpose.
We see a modest performance increase on small file operations. For
example, without the cache, compiling coreutils takes about 17
minutes. With the patch and a 50 millisecond timeout for
dcache_timeout_msecs and getattr_timeout_msecs (the default),
compiling coreutils takes about 6 minutes 20 seconds. On the same
hardware, compiling coreutils on an xfs filesystem takes 90 seconds.
We see similar improvements with mdtest and a test involving writing,
reading, and deleting a large number of small files.
Interested parties can review more data at the following URL.
https://docs.google.com/spreadsheets/d/1v4aUeppKexIbRMz_Yn9k4eaM3uy2KCaPoe_93YKWOtA/pubhtml
The eventual goal of this is to allow getdents to turn into a
readdirplus to the OrangeFS server. The cache will be filled then,
which should provide a performance benefit to the common case of
readdir followed by getattr on each entry (i.e. ls -l).
This also fixes a bug. When orangefs_inode_permission was added, it
did not collect i_size from the OrangeFS server, since this presses an
unnecessary load on the OrangeFS server. However, it left a case
where i_size is never initialized. Then running an executable could
fail.
With this patch, size is always collected to be inserted into the
cache. Thus the bug disappears. If this patch is not accepted during
this merge window, we will send a one-line band-aid for this bug
instead"
* tag 'for-linus-v4.8' of git://github.com/martinbrandenburg/linux:
Orangefs: update orangefs.txt
orangefs: Account for jiffies wraparound.
orangefs: Change default dcache and getattr timeout to 50 msec.
orangefs: Allow dcache and getattr cache time to be configured.
orangefs: Cache getattr results.
orangefs: Use d_time to avoid excessive lookups
|
| |
| |
| |
| |
| |
| |
| | |
Describe use of jiffy-based timeout values involved in inode maintenance.
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |
| |
| |
| | |
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |
| |
| |
| | |
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |
| |
| |
| | |
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The userspace component attempts to do this, but this will prevent
us from even needing to go into userspace to satisfy certain getattr
requests.
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |
| |
| |
| | |
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
|
| |\
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull Ceph updates from Ilya Dryomov:
"The highlights are:
- RADOS namespace support in libceph and CephFS (Zheng Yan and
myself). The stopgaps added in 4.5 to deny access to inodes in
namespaces are removed and CEPH_FEATURE_FS_FILE_LAYOUT_V2 feature
bit is now fully supported
- A large rework of the MDS cap flushing code (Zheng Yan)
- Handle some of ->d_revalidate() in RCU mode (Jeff Layton). We were
overly pessimistic before, bailing at the first sight of LOOKUP_RCU
On top of that we've got a few CephFS bug fixes, a couple of cleanups
and Arnd's workaround for a weird genksyms issue"
* tag 'ceph-for-4.8-rc1' of git://github.com/ceph/ceph-client: (34 commits)
ceph: fix symbol versioning for ceph_monc_do_statfs
ceph: Correctly return NXIO errors from ceph_llseek
ceph: Mark the file cache as unreclaimable
ceph: optimize cap flush waiting
ceph: cleanup ceph_flush_snaps()
ceph: kick cap flushes before sending other cap message
ceph: introduce an inode flag to indicates if snapflush is needed
ceph: avoid sending duplicated cap flush message
ceph: unify cap flush and snapcap flush
ceph: use list instead of rbtree to track cap flushes
ceph: update types of some local varibles
ceph: include 'follows' of pending snapflush in cap reconnect message
ceph: update cap reconnect message to version 3
ceph: mount non-default filesystem by name
libceph: fsmap.user subscription support
ceph: handle LOOKUP_RCU in ceph_d_revalidate
ceph: allow dentry_lease_is_valid to work under RCU walk
ceph: clear d_fsinfo pointer under d_lock
ceph: remove ceph_mdsc_lease_release
ceph: don't use ->d_time
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The genksyms helper in the kernel cannot parse a type definition
like "typeof(((type *)0)->keyfld)" that is used in the DEFINE_RB_FUNCS
helper, causing the following EXPORT_SYMBOL() statement to be ignored
when computing the crcs, and triggering a warning about this:
WARNING: "ceph_monc_do_statfs" [fs/ceph/ceph.ko] has no CRC
To work around the problem, we can rewrite the type to reference
an undefined 'extern' symbol instead of a NULL pointer. This is
evidently ok for genksyms, and it no longer complains about the
line when calling it with 'genksyms -w'.
I've looked briefly into extending genksyms instead, but it seems
really hard to do. Jan Beulich introduced basic support for 'typeof'
a while ago in dc53324060f3 ("genksyms: fix typeof() handling"),
but that is not sufficient for the expression we have here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: fcd00b68bbe2 ("libceph: DEFINE_RB_FUNCS macro")
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ceph_llseek does not correctly return NXIO errors because the 'out' path
always returns 'offset'.
Fixes: 06222e491e66 ("fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek")
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Ceph creates multiple caches with the SLAB_RECLAIMABLE flag set, so
that it can satisfy its internal needs. Inspecting the code shows that
most of the caches are indeed reclaimable since they are directly
related to the generic inode/dentry shrinkers. However, one of the
cache used to satisfy struct file is not reclaimable since its
entries are freed only when the last reference to the file is
dropped. If a heavily loaded node opens a lot of files it can
introduce non-trivial discrepancies between memory shown as reclaimable
and what is actually reclaimed when drop_caches is used.
Fix this by removing the reclaimable flag for the file's cache.
Signed-off-by: Nikolay Borisov <n.borisov.lkml@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a 'wake' flag to ceph_cap_flush struct, which indicates if there
is someone waiting for it to finish. When getting flush ack message,
we check the 'wake' flag in corresponding ceph_cap_flush struct to
decide if we should wake up waiters. One corner case is that the
acked cap flush has 'wake' flags is set, but it is not the first one
on the flushing list. We do not wake up waiters in this case, set
'wake' flags of preceding ceph_cap_flush struct instead
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch devide __ceph_flush_snaps() into two stags. In the first
stage, __ceph_flush_snaps() assign snapcaps flush TIDs and add them
to cap flush lists. __ceph_flush_snaps() keeps holding the
i_ceph_lock in this stagge. So inode's auth cap can not change. In
the second stage, __ceph_flush_snaps() send flushsnap cap messages.
i_ceph_lock is unlocked before sending each cap message. If auth cap
changes in the middle, __ceph_flush_snaps() just stops. This is OK
because kick_flushing_inode_caps() will re-send flushsnap cap messages
to inode's new auth MDS.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If ceph_check_caps() wants to send cap message to a recovering MDS,
make sure it kicks cap flushes first.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
make ceph_kick_flushing_caps() ignore inodes whose cap flushes
have already been re-sent by ceph_early_kick_flushing_caps()
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch includes following changes
- Assign flush tid to snapcap flush
- Remove session's s_cap_snaps_flushing list. Add inode to session's
s_cap_flushing list instead. Inode is removed from the list when
there is no pending snapcap flush or cap flush.
- make __kick_flushing_caps() re-send both snapcap flushes and cap
flushes.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We don't have requirement of searching cap flush by TID. In most cases,
we just need to know TID of the oldest cap flush. List is ideal for this
usage.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This helps the recovering MDS to reconstruct the internal states that
tracking pending snapflush.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
To mount non-default filesytem, user currently needs to provide mds
namespace ID. This is inconvenience.
This patch makes user be able to mount filesystem by name. If user
wants to mount non-default filesystem. Client first subscribes to
fsmap.user. Subscribe to mdsmap.<ID> after getting ID of filesystem.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We can now handle the snapshot cases under RCU, as well as the
non-snapshot case when we don't need to queue up a lease renewal
allow LOOKUP_RCU walks to proceed under those conditions.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Under rcuwalk, we need to take extra care when dereferencing d_parent.
We want to do that once and pass a pointer to dentry_lease_is_valid.
Also, we must ensure that that function can handle the case where we're
racing with d_release. Check whether "di" is NULL under the d_lock, and
just return 0 if so.
Finally, we still need to kick off a renewal job if the lease is getting
close to expiration. If that's the case, then just drop out of rcuwalk
mode since that could block.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
To check for a valid dentry lease, we need to get at the
ceph_dentry_info. Under rcuwalk though, we may end up with a dentry that
is on its way to destruction. Since we need to take the d_lock in
dentry_lease_is_valid already, we can just ensure that we clear the
d_fsinfo pointer out under the same lock before destroying it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Nothing calls it.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pretty simple: just use ceph_dentry_info.time instead (which was already
there, unused).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
trivial fix to spelling mistake in pr_err message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
old_snapc->seq is used in dout(...)
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Otherwise ceph_sync_write_unsafe() may access/modify freed inode.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ceph_aio_complete() can free the ceph_aio_request struct before
the code exits the while loop.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Track usage count for individual fmode bit. This can reduce the
array size by half.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds codes that decode pool namespace information in
cap message and request reply. Pool namespace is saved in i_layout,
it will be passed to libceph when doing read/write.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add pool namesapce pointer to struct ceph_file_layout and struct
ceph_object_locator. Pool namespace is used by when mapping object
to PG, it's also used when composing OSD request.
The namespace pointer in struct ceph_file_layout is RCU protected.
So libceph can read namespace without taking lock.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
[idryomov@gmail.com: ceph_oloc_destroy(), misc minor changes]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The data structure is for storing namesapce string. It allows namespace
string to be shared between cephfs inodes with same layout. This data
structure can also be referenced by OSD request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Define new ceph_file_layout structure and rename old ceph_file_layout
to ceph_file_layout_legacy. This is preparation for adding namespace
to ceph_file_layout structure.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add ceph_start_encoding() and ceph_start_decoding(), the equivalent of
ENCODE_START and DECODE_START in the userspace ceph code.
This is based on a patch from Mike Christie <michaelc@cs.wisc.edu>.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
An on-stack oid in ceph_ioctl_get_dataloc() is not initialized,
resulting in a WARN and a NULL pointer dereference later on. We will
have more of these on-stack in the future, so fix it with a convenience
macro.
Fixes: d30291b985d1 ("libceph: variable-sized ceph_object_id")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
- decode.h needs slab.h for kmalloc()
- osd_client.h needs msgpool.h for struct ceph_msgpool
- msgpool.h doesn't need messenger.h
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://people.freedesktop.org/~airlied/linux
Pull i915 drm fixes from Dave Airlie:
"These are the two fixes from Ville for the bug you are seeing on your
HSW laptop.
They pretty much disable PSR in some cases where the panel reports a
setup time that would cause issues, like you seem to have"
* tag 'drm-psr-fixes-for-v4.8' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Check PSR setup time vs. vblank length
drm/dp: Add drm_dp_psr_setup_time()
|