| Commit message (Collapse) | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sysfs is now completely out of driver/module lifetime game. After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners. Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.
This patch kills now unnecessary attribute->owner. Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.
For more info regarding lifetime rule cleanup, please read the
following message.
http://article.gmane.org/gmane.linux.kernel/510293
(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)
Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reimplements sysfs_drop_dentry() such that remove_dir() can
use it to drop dentry instead of using a separate mechanism. With
this change, making directories reclaimable is much easier.
This patch used to contain fixes for two race conditions around
sd->s_dentry but that part has been separated out and included into
mainline early as commit 6aa054aadfea613a437ad0b15d38eca2b963fc0a and
dd14cbc994709a1c5a64ed3621f583c49a27e521.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
| |
Consolidate sd <-> dentry association into sysfs_attach_dentry() and
call it after dentry and inode are properly set up. This is in
preparation of sysfs_drop_dentry() updates.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
| |
Now that sysfs_dirent can be disconnected from kobject on deletion,
there is no need to orphan each attribute files. All [bin_]attribute
nodes are automatically orphaned when the parent node is deleted.
Kill attribute file orphaning.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sysfs: implement sysfs_dirent active reference and immediate disconnect
Opening a sysfs node references its associated kobject, so userland
can arbitrarily prolong lifetime of a kobject which complicates
lifetime rules in drivers. This patch implements active reference and
makes the association between kobject and sysfs immediately breakable.
Now each sysfs_dirent has two reference counts - s_count and s_active.
s_count is a regular reference count which guarantees that the
containing sysfs_dirent is accessible. As long as s_count reference
is held, all sysfs internal fields in sysfs_dirent are accessible
including s_parent and s_name.
The newly added s_active is active reference count. This is acquired
by invoking sysfs_get_active() and it's the caller's responsibility to
ensure sysfs_dirent itself is accessible (should be holding s_count
one way or the other). Dereferencing sysfs_dirent to access objects
out of sysfs proper requires active reference. This includes access
to the associated kobjects, attributes and ops.
The active references can be drained and denied by calling
sysfs_deactivate(). All active sysfs_dirents must be deactivated
after deletion but before the default reference is dropped. This
enables immediate disconnect of sysfs nodes. Once a sysfs_dirent is
deleted, it won't access any entity external to sysfs proper.
Because attr/bin_attr ops access both the node itself and its parent
for kobject, they need to hold active references to both.
sysfs_get/put_active_two() helpers are provided to help grabbing both
references. Parent's is acquired first and released last.
Unlike other operations, mmapped area lingers on after mmap() is
finished and the module implement implementing it and kobj need to
stay referenced till all the mapped pages are gone. This is
accomplished by holding one set of active references to the bin_attr
and its parent if there have been any mmap during lifetime of an
openfile. The references are dropped when the openfile is released.
This change makes sysfs lifetime rules independent from both kobject's
and module's. It not only fixes several race conditions caused by
sysfs not holding onto the proper module when referencing kobject, but
also helps fixing and simplifying lifetime management in driver model
and drivers by taking sysfs out of the equation.
Please read the following message for more info.
http://article.gmane.org/gmane.linux.kernel/510293
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
| |
Implement bin_buffer which contains a mutex and pointer to PAGE_SIZE
buffer to properly synchronize accesses to per-openfile buffer and
prepare for immediate-kobj-disconnect.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sysfs symlink is implemented by referencing dentry and kobject from
sysfs_dirent - symlink entry references kobject, dentry is used to
walk the tree. This complicates object lifetimes rules and is
dangerous - for example, there is no way to tell to which module the
target of a symlink belongs and referencing that kobject can make it
linger after the module is gone.
This patch reimplements symlink using only sysfs_dirent tree. sd for
a symlink points and holds reference to the target sysfs_dirent and
all walking is done using sysfs_dirent tree. Simpler and safer.
Please read the following message for more info.
http://article.gmane.org/gmane.linux.kernel/510293
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
kobj->dentry can go away anytime unless the user controls when the
associated sysfs node is deleted. This patch implements
kobj_sysfs_assoc_lock which protects kobj->dentry. This will be used
to maintain kobj based API when converting sysfs to use sysfs_dirent
tree instead of dentry/kobject.
Note that this lock belongs to kobject/driver-model not sysfs. Once
sysfs is converted to not use kobject in its interface, this can be
removed from sysfs.
This is in preparation of object reference simplification.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make sd->s_element a union of sysfs_elem_{dir|symlink|attr|bin_attr}
and rename it to s_elem. This is to achieve...
* some level of type checking : changing symlink to point to
sysfs_dirent instead of kobject is much safer and less painful now.
* easier / standardized dereferencing
* allow sysfs_elem_* to contain more than one entry
Where possible, pointer is obtained by directly deferencing from sd
instead of going through other entities. This reduces dependencies to
dentry, inode and kobject. to_attr() and to_bin_attr() are unused now
and removed.
This is in preparation of object reference simplification.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add s_name to sysfs_dirent. This is to further reduce dependency to
the associated dentry. Name is copied for directories and symlinks
but not for attributes.
Where possible, name dereferences are converted to use sd->s_name.
sysfs_symlink->link_name and sysfs_get_name() are unused now and
removed.
This change allows symlink to be implemented using sysfs_dirent tree
proper, which is the last remaining dentry-dependent sysfs walk.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add sysfs_dirent->s_parent. With this patch, each sd points to and
holds a reference to its parent. This allows walking sysfs tree
without referencing sd->s_dentry which can go away anytime if the user
doesn't control when it's deleted.
sd->s_parent is initialized and parent is referenced in
sysfs_attach_dirent(). Reference to parent is released when the sd is
released, so as long as reference to a sd is held, s_parent can be
followed.
dentry walk in sysfs_readdir() is convereted to s_parent walk.
This will be used to reimplement symlink such that it uses only
sysfs_dirent tree.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently there are four functions to create sysfs_dirent -
__sysfs_new_dirent(), sysfs_new_dirent(), __sysfs_make_dirent() and
sysfs_make_dirent(). Other than sysfs_make_dirent(), no function has
two users if calls to implement other functions are excluded.
This patch consolidates sysfs_dirent creation functions into the
following two.
* sysfs_new_dirent() : allocate and initialize
* sysfs_attach_dirent() : attach to sysfs_dirent hierarchy and/or
associate with dentry
This simplifies interface and gives callers more flexibility. This is
in preparation of object reference simplification.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Error handling in sysfs_rename_dir() was broken.
* When lookup_one_len() fails, 0 is returned.
* If parent inode check fails, returns with inode mutex and rename
rwsem held.
This patch fixes the above bugs and flattens error handling such that
it's more readable and easier to modify.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
| |
Flatten cleanup paths in sysfs_add_link() and create_dir() to improve
readability and ease further changes to these functions. This is in
preparation of object reference simplification.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Error handling in fs/sysfs/bin.c:write() was wrong because size_t
count is used to receive return value from flush_write() which is
negative on failure.
This patch updates write() such that int variable is used instead.
read() is updated the same way for consistency.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
| |
Make sysfs_put() ignore NULL sd instead of oopsing.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
| |
sysfs used simple incrementing allocator which is not guaranteed to be
unique. This patch makes sysfs use ida to give each sd a unique and
packed inode number.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
| |
There is no reason this function should be inlined and soon to follow
sysfs object reference simplification will make it heavier. Move it
to dir.c.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
| |
Implement debugfs_rename() to allow renaming files/directories in debugfs.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.
This makes code about 300 lines smaller:
The first version of this patch made the helper functions static inline
in the seq_file.h header. This patch moves them to the fs/seq_file.c as
Andrew proposed. The vmlinux .text section sizes are as follows:
2.6.22-rc1-mm1: 0x001794d5
with the previous version: 0x00179505
with this patch: 0x00179135
The config file used was make allnoconfig with the "y" inclusion of all
the possible options to make the files modified by the patch compile plus
drivers I have on the test node.
This patch:
Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
[S390] vmlogrdr function annotation.
[S390] s390: rename CPU_IDLE to S390_CPU_IDLE
[S390] cio: Remove prototype for non-existing function cmf_reset().
[S390] zcrypt: fix request timeout handling
[S390] system call optimization.
[S390] dasd: Avoid compile warnings on !CONFIG_DASD_PROFILE
[S390] Remove volatile from atomic_t
[S390] Program check in diag 210 under 31 bit
[S390] Bogomips calculation for 64 bit.
[S390] smp: Merge smp_count_cpus() and smp_get_save_areas().
[S390] zcore: Fix __user annotation.
[S390] fixed cdl-format detection.
[S390] sclp: Test facility list before executing a service call.
[S390] sclp: introduce some new interfaces.
[S390] Fixed comment typo.
[S390] vmcp cleanup
|
| |
| |
| |
| |
| |
| |
| |
| | |
CDL formated DASDs are now detected correctly even if no VOL1 label is
on the disk. This prevents possible loss of data.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (57 commits)
[GFS2] Accept old format NFS filehandles
[GFS2] Small fixes to logging code
[DLM] dump more lock values
[GFS2] Remove i_mode passing from NFS File Handle
[GFS2] Obtaining no_formal_ino from directory entry
[GFS2] git-gfs2-nmw-build-fix
[GFS2] System won't suspend with GFS2 file system mounted
[GFS2] remounting w/o acl option leaves acls enabled
[GFS2] inode size inconsistency
[DLM] Telnet to port 21064 can stop all lockspaces
[GFS2] Fix gfs2_block_truncate_page err return
[GFS2] Addendum to the journaled file/unmount patch
[GFS2] Simplify multiple glock aquisition
[GFS2] assertion failure after writing to journaled file, umount
[GFS2] Use zero_user_page() in stuffed_readpage()
[GFS2] Remove bogus '\0' in rgrp.c
[GFS2] Journaled file write/unstuff bug
[DLM] don't require FS flag on all nodes
[GFS2] Fix deallocation issues
[GFS2] return conflicts for GETLK
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On Tue, 2007-07-10 at 10:06 +0100, Christoph Hellwig wrote:
> > -#define GFS2_LARGE_FH_SIZE 10
> > -
> > -struct gfs2_fh_obj {
> > - struct gfs2_inum_host this;
> > - u32 imode;
> > -};
> > +#define GFS2_LARGE_FH_SIZE 8
>
> Because gfs2_decode_fh only accepts file handles with GFS2_LARGE_FH_SIZE
> or GFS2_LARGE_FH_SIZE you don't accept filehandles sent out by and older
> gfs version anymore. Stale filehandles because of a new kernel version
> are a big no-no, so please add back code to handle the old filehandles
> on the decode side.
>
This should fix that problem I think since its only relating to end of
the fh we can just ignore that field in order to accept the older
format.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Wendy Cheng <wcheng@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This reverts part of an earlier patch which tried to reclaim
gfs2_bufdata structures too early and resulted in a "use after free"
case (this bit from me). Also a change to not write out log headers
unless we really need to (in the case of flushing nothing we don't need
a header) from Bob.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add two more output fields (lkb_flags and rsb nodeid) to the new debugfs
file that dumps one lock per line. Also, dump all locks instead of just
mastered locks. Accordingly, use a suffix of _locks instead of _master.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
GFS2 has been passing i_mode within NFS File Handle. Other than the
wrong assumption that there is always room for this extra 16 bit value,
the current gfs2_get_dentry doesn't really need the i_mode to work
correctly. Note that GFS2 NFS code does go thru the same lookup code
path as direct file access route (where the mode is obtained from name
lookup) but gfs2_get_dentry() is coded for different purpose. It is not
used during lookup time. It is part of the file access procedure call.
When the call is invoked, if on-disk inode is not in-memory, it has to
be read-in. This makes i_mode passing a useless overhead.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
GFS2 lookup code doesn't ask for inode shared glock. This implies during
in-memory inode creation for existing file, GFS2 will not disk-read in
the inode contents. This leaves no_formal_ino un-initialized during
lookup time. The un-initialized no_formal_ino is subsequently encoded
into file handle. Clients will get ESTALE error whenever it tries to
access these files.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The kernel threads in gfs2, namely gfs2_scand, gfs2_logd, gfs2_quotad,
gfs2_glockd, gfs2_recoverd weren't doing anything when the suspend
mechanism was trying to freeze them.
I put in calls to refrigerator() in the loops for all the daemons and
suspend works as expected.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch is for bugzilla bug #245663. This crosswrites a fix from
gfs1 (bz #210369) so that the mount options are reset properly upon
remount. This was tested on system trin-10.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This should have been part of the NFS patch #1 but somehow I missed it
when packaging the patches. It is not a critical issue as the others (I
hope). RHEL 5.1 31.el5 kernel runs fine without this change.
Our truncate code is chopped into two parts, one for vfs inode changes
(in vmtruncate()) and one of gfs inode (in gfs2_truncatei()). These two
operatons are, unfortunately, not atomic. So it could happens that
vmtruncate() succeeds (inode->i_size is changed) but gfs2_truncatei
fails (say kernel temporarily out of memory). This would leave gfs inode
i_di.di_size out of sync with vfs inode i_size. It will later confuse
gfs2_commit_write() if a write is issued. Last time I checked, it will
cause file corruption.
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch fixes Red Hat bz#245892
Opening a tcp connection from a cluster member to another cluster member
targeting the dlm port it is enough to stop every dlm operation in the cluster.
This means that GFS and rgmanager will hang.
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Code segment inside gfs2_block_truncate_page() doesn't set the return
code correctly. This causes NFSD erroneously returns EIO back to client
with setattr procedure call (truncate error).
Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch is an addendum to the previous journaled file/unmount patch.
It fixes a problem discovered during testing.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There is a bug in the code which acquires multiple glocks where if the
initial out-of-order attempt fails part way though we can land up trying
to acquire the wrong number of glocks. This is part of the fix for red
hat bz #239737. The other part of the bz doesn't apply to upstream
kernels since it was fixed by:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6
Since the out-of-order code doesn't appear to add anything to the
performance of GFS2, this patch just removed it rather than trying to
fix it. It should be much easier to see whats going on here now. In
addition, we don't allocate any memory unless we are using a lot of
glocks (which is a relatively uncommon case).
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch passes all my nasty tests that were causing the code to
fail under one circumstance or another. Here is a complete summary
of all changes from today's git tree, in order of appearance:
1. There are now separate variables for metadata buffer accounting.
2. Variable sd_log_num_hdrs is no longer needed, since the header
accounting is taken care of by the reserve/refund sequence.
3. Fixed a tiny grammatical problem in a comment.
4. Added a new function "calc_reserved" to calculate the reserved
log space. This isn't entirely necessary, but it has two benefits:
First, it simplifies the gfs2_log_refund function greatly.
Second, it allows for easier debugging because I could sprinkle the
code with calls to this function to make sure the accounting is
proper (by adding asserts and printks) at strategic point of the code.
5. In log_pull_tail there apparently was a kludge to fix up the
accounting based on a "pull" parameter. The buffer accounting is
now done properly, so the kludge was removed.
6. File sync operations were making a call to gfs2_log_flush that
writes another journal header. Since that header was unplanned
for (reserved) by the reserve/refund sequence, the free space had
to be decremented so that when log_pull_tail gets called, the free
space is be adjusted properly. (Did I hear you call that a kludge?
well, maybe, but a lot more justifiable than the one I removed).
7. In the gfs2_log_shutdown code, it optionally syncs the log by
specifying the PULL parameter to log_write_header. I'm not sure
this is necessary anymore. It just seems to me there could be
cases where shutdown is called while there are outstanding log
buffers.
8. In the (data)buf_lo_before_commit functions, I changed some offset
values from being calculated on the fly to being constants. That
simplified some code and we might as well let the compiler do the
calculation once rather than redoing those cycles at run time.
9. This version has my rewritten databuf_lo_add function.
This version is much more like its predecessor, buf_lo_add, which
makes it easier to understand. Again, this might not be necessary,
but it seems as if this one works as well as the previous one,
maybe even better, so I decided to leave it in.
10. In databuf_lo_before_commit, a previous data corruption problem
was caused by going off the end of the buffer. The proper solution
is to have the proper limit in place, rather than stopping earlier.
(Thus my previous attempt to fix it is wrong).
If you don't wrap the buffer, you're stopping too early and that
causes more log buffer accounting problems.
11. In lops.h there are two new (previously mentioned) constants for
figuring out the data offset for the journal buffers.
12. There are also two new functions, buf_limit and databuf_limit to
calculate how many entries will fit in the buffer.
13. In function gfs2_meta_wipe, it needs to distinguish between pinned
metadata buffers and journaled data buffers for proper journal buffer
accounting. It can't use the JDATA gfs2_inode flag because it's
sometimes passed the "real" inode and sometimes the "metadata
inode" and the inode flags will be random bits in a metadata
gfs2_inode. It needs to base its decision on which was passed in.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
As suggested by Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Robert P. J. Day <rpjday@mindspring.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Not sure how it slipped in, but we don't want it anyway.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch is for bugzilla bug 283162, which uncovered a number of
bugs pertaining to writing to files that have the journaled bit on.
These bugs happen most often when writing to the meta_fs because
the files are always journaled. So operations like gfs2_grow were
particularly vulnerable, although many of the problems could be
recreated with normal files after setting the journaled bit on.
The problems fixed are:
-GFS2 wasn't ever writing unstuffed journaled data blocks to their
in-place location on disk. Now it does.
-If you unmounted too quickly after doing IO to a journaled file,
GFS2 was crashing because you would discard a buffer whose bufdata
was still on the active items list. GFS2 now deals with this
gracefully.
-GFS2 was losing track of the bufdata for journaled data blocks,
and it wasn't getting freed, causing an error when you tried to
unmount the module. GFS2 now frees all the bufdata structures.
-There was a memory corruption occurring because GFS2 wrote
twice as many log entries for journaled buffers.
-It was occasionally trying to write journal headers in buffers
that weren't currently mapped.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Mask off the recently added DLM_LSFL_FS flag when setting the exflags.
This way all the nodes in the lockspace aren't required to have the FS
flag set, since we later check that exflags matches among all nodes.
Signed-off-by: Patrick Caulfield <pcaulfie@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There were two issues during deallocation of unlinked inodes. The
first was relating to the use of a "try" lock which in the case of
the inode lock wasn't trying hard enough to deallocate in all
circumstances (now changed to a normal glock) and in the case of
the iopen lock didn't wait for the demotion of the shared lock before
attempting to get the exclusive lock, and thereby sometimes (timing dependent)
not completing the deallocation when it should have done.
The second issue related to the lack of a way to invalidate dcache entries
on remote nodes (now fixed by this patch) which meant that unlinks were
taking a long time to return disk space to the fs. By adding some code to
invalidate the dcache entries across the cluster for unlinked inodes, that
is now fixed.
This patch was written jointly by Abhijith Das and Steven Whitehouse.
Signed-off-by: Abhijith Das <adas@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We weren't returning the correct result when GETLK found a conflict,
which is indicated by userspace passing back a 1.
Signed-off-by: Abhijith Das <adas redhat com>
Signed-off-by: David Teigland <teigland redhat com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Set the owner field in the plock info sent to userspace for GETLK.
Without this, gfs_controld won't correctly see when the GETLK from a
process matches one of the process's existing locks.
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
fs/gfs2/inode.c: In function 'gfs2_lookupi':
fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function
Looks like a real bug to me.
Cc: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Under certain circumstances its possible (though rather unlikely) that
inodes which were unlinked by one node while still open on another might
get "lost" in the sense that they don't get deallocated if the node
which held the inode open crashed before it was unlinked.
This patch adds the recovery code which allows automatic deallocation of
the inode if its found during block allocation (the sensible time to
look for such inodes since we are scanning the rgrp's bitmaps anyway at
this time, so it adds no overhead to do this).
Since the inode will have had its i_nlink set to zero, all we need to
trigger recovery is a lookup and an iput(), and the normal deallocation
code takes care of the rest.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch fixes bug 243131: Can't mount GFS2 file system on AoE device.
When using AoE devices with lock_nolock, there is no locking table, so
gfs2 (and gfs1) uses the superblock s_id. This turns out to be the device
name in some cases. In the case of AoE, the device contains a slash,
(e.g. "etherd/e1.1p2") which is an invalid character when we try to
register the table in sysfs. This patch replaces the "/" with underscore.
Rather than add a new variable to the stack, I'm just reusing a (char *)
variable that's no longer used: table.
This code has been tested on the failing system using a RHEL5 patch.
The upstream code was tested by using gfs2_tool sb to interject a "/"
into the table name of a clustered gfs2 file system.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This fixes a bug in the ordering of operations in the error path of
createi. Its not valid to do an iput() when holding the inode's glock
since the iput() will (in this case) result in delete_inode() being
called which needs to grab the lock itself. This was causing the
recursive lock checking code to trigger.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
A typo caused us to pass a NULL pointer when renaming directories. It
was accidentally introduced in: [GFS2] Clean up inode number handling
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace.
This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL.
(This updated version of the patch uses gfp_t for ls_allocation.)
Signed-Off-By: Patrick Caulfield <pcaulfie@redhat.com>
Signed-Off-By: David Teigland <teigland@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|