litmus-rt.git - The LITMUS^RT kernel.

	Commit message	Author	Age
*	ocfs2: Some tiny bug fixes for discontiguous block allocation.	Tao Ma	2010-04-22
\| \| \| \| \| \| \| \| \| \| \| \|	The fixes include: 1. some endian problems. 2. we should use bit/bpc in ocfs2_block_group_grow_discontig to allocate clusters. 3. set num_clusters properly in __ocfs2_claim_clusters. 4. change name from ocfs2_supports_discontig_bh to ocfs2_supports_discontig_bg. Signed-off-by: Tao Ma <tao.ma@oracle.com>
*	ocfs2: Allocate discontiguous block groups.	Joel Becker	2010-04-13
\| \| \| \| \| \| \| \|	If we cannot get a contiguous region for a block group, allocate a discontiguous one when the filesystem supports it. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Tao Ma <tao.ma@oracle.com>
*	ocfs2: Add dir_resv_level mount option	Mark Fasheh	2010-05-05
\| \| \| \| \| \| \| \| \|	The default behavior for directory reservations stays the same, but we add a mount option so people can tweak the size of directory reservations according to their workloads. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: increase the default size of local alloc windows	Mark Fasheh	2010-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I have observed that the current size of 8M gives us pretty poor fragmentation on multi-threaded workloads which do lots of writes. Generally, I can increase the size of local alloc windows and observe a marked decrease in fragmentation, even up and beyond window sizes of 512 megabytes. This makes sense for a couple of reasons - larger local alloc means more room for reservation windows. On multi-node workloads the larger local alloc helps as well because we don't have to do window slides as often. Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no longer used and the comment above it was out of date. To test fragmentation, I used a workload which launched 4 threads that did 4k writes into a series of about 140 alternating files. With resv_level=2, and a 4k/4k file system I observed the following average fragmentation for various localalloc= parameters (localalloc= value: avg. fragmentation): 8: 48; 32: 16; 64: 10; 120: 7. On larger cluster sizes, the difference is more dramatic. The new default size tops out at 256M, which we'll only get for cluster sizes of 32K and above. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: clean up localalloc mount option size parsing	Mark Fasheh	2010-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch pulls the local alloc sizing code into localalloc.c and provides a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged except that I correctly calculate the maximum local alloc size. The old code in ocfs2_parse_options() calculated the max size as: ocfs2_local_alloc_size(sb) * 8 which is correct, in bits. Unfortunately though the option passed in is in megabytes. Ultimately, this bug made no real difference - the shrink code would catch a too-large size and bring it down to something reasonable. Still, it's less than efficient as-is. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
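The unit mismatch described above (a limit computed in bits, compared against an option given in megabytes) can be illustrated with a minimal sketch. It assumes one bitmap bit per cluster; the function name, the 256-byte bitmap, and the cluster sizes are hypothetical values chosen for illustration, not taken from the kernel source.

```python
def local_alloc_max_megabytes(bitmap_bytes, cluster_size_bits):
    """Convert a local alloc bitmap capacity into megabytes.

    bitmap_bytes and cluster_size_bits are illustrative inputs
    (assumption: each bitmap bit tracks exactly one cluster).
    """
    bits = bitmap_bytes * 8                # capacity in bits, i.e. in clusters
    total_bytes = bits << cluster_size_bits
    return total_bytes >> 20               # bytes -> megabytes


# A hypothetical 256-byte bitmap holds 2048 bits. Compared directly against
# a megabyte option, "2048" would be far too generous for 4K clusters.
print(local_alloc_max_megabytes(256, 12))  # 4K clusters
print(local_alloc_max_megabytes(256, 15))  # 32K clusters
```

The same bitmap covers very different amounts of space depending on the cluster size, which is why the limit has to be converted before it is compared with the mount option.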
*	ocfs2: allocation reservations	Mark Fasheh	2010-05-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch improves Ocfs2 allocation policy by allowing an inode to reserve a portion of the local alloc bitmap for itself. The reserved portion (allocation window) is advisory in that other allocation windows might steal it if the local alloc bitmap becomes full. Otherwise, the reservations are honored and guaranteed to be free. When the local alloc window is moved to a different portion of the bitmap, existing reservations are discarded. Reservation windows are represented internally by a red-black tree. Within that tree, each node represents the reservation window of one inode. An LRU of active reservations is also maintained. When new data is written, we allocate it from the inode's window. When all bits in a window are exhausted, we allocate a new one as close to the previous one as possible. Should we not find free space, an existing reservation is pulled off the LRU and cannibalized. Signed-off-by: Mark Fasheh <mfasheh@suse.com>
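The reservation scheme above can be modelled with a toy sketch. It is not the kernel implementation: a sorted scan stands in for the red-black tree, an `OrderedDict` stands in for the LRU list, the "as close to the previous window as possible" placement is omitted, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class ReservationMap:
    """Toy model of per-inode reservation windows over a bitmap of `size` bits."""

    def __init__(self, size):
        self.size = size
        # inode -> (start, length). Insertion order doubles as LRU order,
        # oldest first (assumption: a window is "used" when it is created).
        self.windows = OrderedDict()

    def _find_gap(self, length):
        # Walk the windows in start order and return the first gap that fits.
        cursor = 0
        for start, win_len in sorted(self.windows.values()):
            if start - cursor >= length:
                return cursor
            cursor = start + win_len
        return cursor if self.size - cursor >= length else None

    def reserve(self, inode, length):
        if length > self.size:
            raise ValueError("window larger than bitmap")
        self.windows.pop(inode, None)      # drop this inode's old window, if any
        while True:
            start = self._find_gap(length)
            if start is not None:
                self.windows[inode] = (start, length)
                return start
            # No free space: cannibalize the least recently used reservation.
            self.windows.popitem(last=False)


m = ReservationMap(8)
m.reserve("a", 4)   # takes bits 0-3
m.reserve("b", 4)   # takes bits 4-7
m.reserve("c", 4)   # bitmap full, so the oldest window ("a") is cannibalized
```

Because eviction only happens when no gap fits, reservations stay advisory: they are honored until the bitmap is full.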
*	ocfs2: Clear undo bits when local alloc is freed	Mark Fasheh	2010-03-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the local alloc file changes windows, unused bits are freed back to the global bitmap. By definition, those bits cannot be in use by any file. Also, the local alloc will never have been able to allocate those bits if they were part of a previous truncate. Therefore it makes sense that we should clear unused local alloc bits in the undo buffer so that they can be used immediately. [ Modified to call it ocfs2_release_clusters() -- Joel ] Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	Ocfs2: Move ocfs2 ioctl definitions from ocfs2_fs.h to newly added ocfs2_ioctl.h	Tristan Ye	2010-03-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we add ioctl cmds/structures for ocfs2 to ocfs2_fs.h, which is used to define the ocfs2 on-disk layout. That is a little confusing, and the header may quickly become polluted, especially when the ocfs2_info_request ioctls grow afterwards (they will grow, I bet). As a result, such OCFS2 IOCs need to be placed somewhere other than ocfs2_fs.h; a separate ocfs2_ioctl.h is added to store such ioctl structures and definitions, which can also be used from userspace to invoke ioctl calls. Signed-off-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Attach the connection to the lksb	Joel Becker	2010-02-26
\| \| \| \| \| \| \|	We're going to want it in the ast functions, so we convert union ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection. Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: add extent block stealing for ocfs2 v5	Tiger Yang	2010-02-26
\| \| \| \| \| \| \| \| \| \| \|	This patch adds an extent block (metadata) stealing mechanism for extent allocation. This mechanism is the same as inode stealing: if there is no room in the slot-specific extent_alloc, we will try to allocate an extent block from the next slot. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Acked-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Prevent a livelock in dlmglue	Sunil Mushran	2010-02-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is possibility of a livelock in __ocfs2_cluster_lock(). If a node were to get an ast for an upconvert request, followed immediately by a bast, there is a small window where the fs may downconvert the lock before the process requesting the upconvert is able to take the lock. This patch adds a new flag to indicate that the upconvert is still in progress and that the dc thread should not downconvert it right now. Wengang Wang <wen.gang.wang@oracle.com> and Joel Becker <joel.becker@oracle.com> contributed heavily to this patch. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2009-12-24
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: ocfs2/trivial: Use le16_to_cpu for a disk value in xattr.c ocfs2/trivial: Use proper mask for 2 places in hearbeat.c Ocfs2: Let ocfs2 support fiemap for symlink and fast symlink. Ocfs2: Should ocfs2 support fiemap for S_IFDIR inode? ocfs2: Use FIEMAP_EXTENT_SHARED fiemap: Add new extent flag FIEMAP_EXTENT_SHARED ocfs2: replace u8 by __u8 in ocfs2_fs.h ocfs2: explicit declare uninitialized var in user_cluster_connect() ocfs2-devel: remove redundant OCFS2_MOUNT_POSIX_ACL check in ocfs2_get_acl_nolock() ocfs2: return -EAGAIN instead of EAGAIN in dlm ocfs2/cluster: Make fence method configurable - v2 ocfs2: Set MS_POSIXACL on remount ocfs2: Make acl use the default ocfs2: Always include ACL support
\| *	ocfs2: Make acl use the default	Jan Kara	2009-10-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change acl mount options handling to match the one of XFS and BTRFS and hopefully it is also easier to use now. When admin does not specify any acl mount option, acls are enabled if and only if the filesystem has xattr feature enabled. If admin specifies 'acl' mount option, we fail the mount if the filesystem does not have xattr feature and thus acls cannot be enabled. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
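The mount-option rules described above amount to a small decision function. The sketch below is illustrative only: the function name is hypothetical, and the handling of an explicit `noacl` option is an assumption, since the commit message only describes the unspecified and `acl` cases.

```python
def resolve_acl(option, has_xattr):
    """Decide whether ACLs end up enabled for a mount.

    option: None when no acl option was given, "acl", or "noacl"
            ("noacl" behaviour is an assumption, not stated in the message).
    has_xattr: whether the filesystem has the xattr feature enabled.
    """
    if option is None:
        # Default: ACLs are enabled if and only if xattrs are available.
        return has_xattr
    if option == "acl":
        if not has_xattr:
            # ACLs were requested explicitly but cannot be enabled.
            raise ValueError("mount fails: acl requested without xattr feature")
        return True
    if option == "noacl":
        return False
    raise ValueError(f"unknown option: {option!r}")


print(resolve_acl(None, True))    # default follows the xattr feature
print(resolve_acl(None, False))
```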
* \|	ocfs2: Trivial cleanup of jbd compatibility layer removal	Sunil Mushran	2009-11-13
\|/ \| \| \| \| \| \| \| \|	Mainline commit 53ef99cad9878f02f27bb30bc304fc42af8bdd6e removed the JBD compatibility layer from OCFS2. This patch removes the last remaining remnants of that. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Add functions for extents refcounted.	Tao Ma	2009-09-22
\| \| \| \| \| \| \|	Add function ocfs2_mark_extent_refcounted which can mark an extent refcounted. Signed-off-by: Tao Ma <tao.ma@oracle.com>
*	ocfs2: Add refcount tree lock mechanism.	Tao Ma	2009-09-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Implement locking around struct ocfs2_refcount_tree. This protects all read/write operations on refcount trees. ocfs2_refcount_tree has its own lock and its own caching_info, protecting buffers among multiple nodes. User must call ocfs2_lock_refcount_tree before his operation on the tree and unlock it after that. ocfs2_refcount_trees are referenced by the block number of the refcount tree root block, So we create an rb-tree on the ocfs2_super to look them up. Signed-off-by: Tao Ma <tao.ma@oracle.com>
*	ocfs2: Add ocfs2_read_refcount_block.	Tao Ma	2009-09-22
\| \| \| \|	Signed-off-by: Tao Ma <tao.ma@oracle.com>
*	ocfs2: Pass struct ocfs2_caching_info to the journal functions.	Joel Becker	2009-09-04
\| \| \| \| \| \| \| \| \| \| \| \| \|	The next step in divorcing metadata I/O management from struct inode is to pass struct ocfs2_caching_info to the journal functions. Thus the journal locks a metadata cache with the cache io_lock function. It also can compare ci_last_trans and ci_created_trans directly. This is a large patch because of all the places we change ocfs2_journal_access..(handle, inode, ...) to ocfs2_journal_access..(handle, INODE_CACHE(inode), ...). Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: move ip_created_trans to struct ocfs2_caching_info	Joel Becker	2009-09-04
\| \| \| \| \| \| \| \| \| \| \| \| \|	Similar to ip_last_trans, ip_created_trans tracks the creation of a journal managed inode. This specifically tracks what transaction created the inode. This is so the code can know if the inode has ever been written to disk. This behavior is desirable for any journal managed object. We move it to struct ocfs2_caching_info as ci_created_trans so that any object using ocfs2_caching_info can rely on this behavior. Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: move ip_last_trans to struct ocfs2_caching_info	Joel Becker	2009-09-04
\| \| \| \| \| \| \| \| \| \| \| \|	We have the read side of metadata caching isolated to struct ocfs2_caching_info, now we need the write side. This means the journal functions. The journal only does a couple of things with struct inode. This change moves the ip_last_trans field onto struct ocfs2_caching_info as ci_last_trans. This field tells the journal whether a pending journal flush is required. Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Change metadata caching locks to an operations structure.	Joel Becker	2009-09-04
\| \| \| \| \| \| \| \| \|	We don't really want to cart around too many new fields on the ocfs2_caching_info structure. So let's wrap all our access of the parent object in a set of operations. One pointer on caching_info, and more flexibility to boot. Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Make the ocfs2_caching_info structure self-contained.	Joel Becker	2009-09-04
\| \| \| \| \| \| \| \| \| \| \| \|	We want to use the ocfs2_caching_info structure in places that are not inodes. To do that, it can no longer rely on referencing the inode directly. This patch moves the flags to ocfs2_caching_info->ci_flags, stores pointers to the parent's locks on the ocfs2_caching_info, and renames the constants and flags to reflect its independent state. Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Fix deadlock on umount	Jan Kara	2009-07-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In commit ea455f8ab68338ba69f5d3362b342c115bea8e13, we moved the dentry lock put process into ocfs2_wq. This causes problems during umount because ocfs2_wq can drop references to inodes while they are being invalidated by invalidate_inodes() causing all sorts of nasty things (invalidate_inodes() ending in an infinite loop, "Busy inodes after umount" messages etc.). We fix the problem by stopping ocfs2_wq from doing any further releasing of inode references on the superblock being unmounted, wait until it finishes the current round of releasing and finally cleaning up all the references in dentry_lock_list from ocfs2_put_super(). The issue was tracked down by Tao Ma <tao.ma@oracle.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Add lockdep annotations	Jan Kara	2009-06-22
\| \| \| \| \| \| \| \| \| \|	Add lockdep support to OCFS2. The support also covers all of the cluster locks except for open locks, journal locks, and local quotafile locks. These are special because they are acquired for a node, not for a particular process and lockdep cannot deal with such type of locking. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Stop orphan scan as early as possible during umount	Sunil Mushran	2009-06-22
\| \| \| \| \| \| \| \| \| \| \|	Currently if the orphan scan fires a tick before the user issues the umount, the umount will wait for the queued orphan scan tasks to complete. This patch makes the umount stop the orphan scan as early as possible so as to reduce the probability of the queued tasks slowing down the umount. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Add statistics for the checksum and ecc operations.	Joel Becker	2009-06-03
\| \| \| \| \| \| \| \| \| \| \| \| \|	It would be nice to know how often we get checksum failures. Even better, how many of them we can fix with the single bit ecc. So, we add a statistics structure. The structure can be installed into debugfs wherever the user wants. For ocfs2, we'll put it in the superblock-specific debugfs directory and pass it down from our higher-level functions. The stats are only registered with debugfs when the filesystem supports metadata ecc. Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2 patch to track delayed orphan scan timer statistics	Srinivas Eeda	2009-06-03
\| \| \| \| \| \| \| \| \| \|	Patch to track delayed orphan scan timer statistics. Modifies ocfs2_osb_dump to print the following: Orphan Scan=> Local: 10 Global: 21 Last Scan: 67 seconds ago Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: timer to queue scan of all orphan slots	Srinivas Eeda	2009-06-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a dentry is unlinked, the unlinking node takes an EX on the dentry lock before moving the dentry to the orphan directory. Other nodes that have this dentry in cache have a PR on the same dentry lock. When the EX is requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED during downconvert. The inode is finally deleted when the last node to iput the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set. A problem arises if a node is forced to free dentry locks because of memory pressure. If this happens, the node will no longer get downconvert notifications for the dentries that have been unlinked on another node. If it also happens that node is actively using the corresponding inode and happens to be the one performing the last iput on that inode, it will fail to delete the inode as it will not have the MAYBE_ORPHANED flag set. This patch fixes this shortcoming by introducing a periodic scan of the orphan directories to delete such inodes. Care has been taken to distribute the workload across the cluster so that no one node has to perform the task all the time. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Signed-off-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: recover orphans in offline slots during recovery and mount	Srinivas Eeda	2009-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During recovery, a node recovers orphans in its slot and the dead node(s). But if the dead nodes were holding orphans in offline slots, they will be left unrecovered. If the dead node is the last one to die and is holding orphans in other slots and is the first one to mount, then it only recovers its own slot, which leaves orphans in offline slots. This patch queues complete_recovery to clean orphans for all offline slots during mount and node recovery. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: fix rare stale inode errors when exporting via nfs	wengang wang	2009-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For NFS exporting, ocfs2_get_dentry() returns the dentry for a file handle. ocfs2_get_dentry() may read from disk when the inode is not in memory, without any cross-cluster lock. This leads to the file system loading a stale inode. This patch fixes that problem. When the inode is not in memory, we take the cluster lock (PR) of the alloc inode from which the inode in question was allocated (this causes the node on which the deletion was done to sync the alloc inode) before reading out the inode itself. Then we check the bitmap in the group the inode in question was allocated from to see if the bit is clear. If it is clear, the inode is stale. If the bit is set, we then check the generation as the existing code does. We have to read out the inode in question from disk first to know its alloc slot and alloc bit. If it is not stale, we read it out using ocfs2_iget(). The second read should then come from cache. We also have to add a per-superblock nfs_sync_lock to cover the lock for the alloc inode and the lock for the inode in question, because ocfs2_get_dentry() and ocfs2_delete_inode() lock them in reverse order. nfs_sync_lock is locked in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(), so that multiple ocfs2_delete_inode() calls can run concurrently in the normal case. [mfasheh@suse.com: build warning fixes and comment cleanups] Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Optimize inode group allocation by recording last used group.	Tao Ma	2009-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In ocfs2, the block group search looks for the "emptiest" group to allocate from. So if the allocator has many equally(or almost equally) empty groups, new block group will tend to get spread out amongst them. So we add osb_inode_alloc_group in ocfs2_super to record the last used inode allocation group. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. I have done some basic test and the results are a ten times improvement on some cold-cache stat workloads. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: fix leaf start calculation in ocfs2_dx_dir_rebalance()	Mark Fasheh	2009-04-03
\| \| \| \| \| \| \| \| \| \| \| \|	ocfs2_dx_dir_rebalance() is passed the block offset of a dx leaf which needs rebalancing. Since we rebalance an entire cluster at a time however, this function needs to calculate the beginning of that cluster, in blocks. The calculation was wrong, which would result in a read of non-leaf blocks. Fix the calculation by adding ocfs2_block_to_cluster_start() which is a more straight-forward way of determining this. Reported-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
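Rounding a block offset down to the first block of its cluster can be sketched as below. This is a sketch of the arithmetic only, assuming block and cluster sizes are powers of two with the cluster at least as large as the block; the Python function and its parameter names are illustrative, not the kernel helper's signature.

```python
def block_to_cluster_start(blkno, blocksize_bits, clustersize_bits):
    """Return the first block of the cluster that contains block `blkno`.

    blocksize_bits / clustersize_bits are log2 of the sizes in bytes
    (illustrative parameters; assumes clustersize_bits >= blocksize_bits).
    """
    bits = clustersize_bits - blocksize_bits   # log2(blocks per cluster)
    cluster = blkno >> bits                    # cluster index, rounding down
    return cluster << bits                     # back to a block offset


# With 4K blocks (12 bits) and 32K clusters (15 bits) there are 8 blocks
# per cluster, so block 13 lives in the cluster that starts at block 8.
print(block_to_cluster_start(13, 12, 15))
```

Shifting down and back up discards the block's offset within its cluster, which is the step a hand-rolled calculation can easily get wrong.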
*	ocfs2: Increase max links count	Mark Fasheh	2009-04-03
\| \| \| \| \| \| \| \| \|	Since we've now got a directory format capable of handling a large number of entries, we can increase the maximum link count supported. This only gets increased if the directory indexing feature is turned on. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Add a name indexed b-tree to directory inodes	Mark Fasheh	2009-04-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value, and pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <joel.becker@oracle.com>
*	ocfs2: Remove debugfs file local_alloc_stats	Sunil Mushran	2009-04-03
\| \| \| \| \| \| \| \|	This patch removes the debugfs file local_alloc_stats as that information is now included in the fs_state debugfs file. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Expose the file system state via debugfs	Sunil Mushran	2009-04-03
\| \| \| \| \| \| \| \| \| \|	This patch creates a per mount debugfs file, fs_state, which exposes information like, cluster stack in use, states of the downconvert, recovery and commit threads, number of journal txns, some allocation stats, list of all slots, etc. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: lock the metaecc process for xattr bucket	Tao Ma	2009-02-26
\| \| \| \| \| \| \| \| \| \| \|	For other metadata in ocfs2, metaecc is checked in ocfs2_read_blocks with io_mutex held. While for xattr bucket, it is calculated by the whole buckets. So we have to add a spin_lock to prevent multiple processes calculating metaecc. Signed-off-by: Tao Ma <tao.ma@oracle.com> Tested-by: Tristan Ye <tristan.ye@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Push out dropping of dentry lock to ocfs2_wq	Jan Kara	2009-02-02
\| \| \| \| \| \| \| \| \| \|	Dropping of last reference to dentry lock is a complicated operation involving dropping of reference to inode. This can get complicated and quota code in particular needs to obtain some quota locks which leads to potential deadlock. Thus we defer dropping of inode reference to ocfs2_wq. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Add directory block trailers.	Mark Fasheh	2009-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Future ocfs2 features metaecc and indexed directories need to store a little bit of data in each dirblock. For compatibility, we place this in a trailer at the end of the dirblock. The trailer plays itself as an empty dirent, so that if the features are turned off, it can be reused without requiring a tunefs scan. This code adds the trailer and validates it when the block is read in. [ Mark is the original author, but I reinserted this code before his dir index work. -- Joel ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Use metadata-specific ocfs2_journal_access_*() functions.	Joel Becker	2009-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The per-metadata-type ocfs2_journal_access_() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extended attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions, let's typedef them in ocfs2.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Add the underlying blockcheck code.	Joel Becker	2009-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the code that computes crc32 and ecc for ocfs2 metadata blocks. There are high-level functions that check whether the filesystem has the ecc feature, mid-level functions that work on a single block or array of buffer_heads, and the low-level ecc hamming code that can handle multiple buffers like crc32_le(). It's not hooked up to the filesystem yet. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
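Two ideas from the description above can be sketched in a few lines: a CRC32 that is carried across multiple buffers, and a parity code that locates a single flipped bit. This is a simplified illustration, not the ocfs2 on-disk format: it uses Python's `zlib.crc32` (whose seed conventions differ from the kernel's `crc32_le()`), and the position-XOR parity below is a reduced form of the Hamming idea with hypothetical function names.

```python
import zlib


def crc32_over_buffers(buffers):
    """Checksum several buffers as if they were one contiguous block."""
    crc = 0
    for buf in buffers:
        crc = zlib.crc32(buf, crc)     # feed the running value back in
    return crc


def position_parity(data):
    """XOR of the 1-based positions of all set bits in `data`."""
    code = 0
    for byte_index, byte in enumerate(data):
        for bit in range(8):
            if byte & (1 << bit):
                code ^= byte_index * 8 + bit + 1
    return code


def fix_single_bit(data, stored_parity):
    """Repair at most one flipped bit using the stored parity value."""
    diff = position_parity(data) ^ stored_parity
    if diff == 0:
        return bytes(data)             # nothing to fix
    pos = diff - 1                     # 0-based position of the flipped bit
    if pos >= len(data) * 8:
        raise ValueError("more than a single-bit error")
    fixed = bytearray(data)
    fixed[pos // 8] ^= 1 << (pos % 8)
    return bytes(fixed)


good = b"ocfs2"
parity = position_parity(good)
damaged = bytearray(good)
damaged[2] ^= 0x10                     # flip one bit
print(fix_single_bit(bytes(damaged), parity) == good)
```

Flipping one bit changes the parity by exactly that bit's position, so XOR-ing the stored and recomputed values points at the bit to flip back. A separate checksum is still needed to detect the cases where more than one bit changed.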
*	ocfs2: Enable quota accounting on mount, disable on umount	Jan Kara	2009-01-05
\| \| \| \| \| \| \| \| \|	Enable quota usage tracking on mount and disable it on umount. Also add support for quota on and quota off quotactls and usrquota and grpquota mount options. Add quota features among supported ones. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Implement quota recovery	Jan Kara	2009-01-05
\| \| \| \| \| \| \| \|	Implement functions for recovery after a crash. Functions just read local quota file and sync info to global quota file. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Wrap extent block reads in a dedicated function.	Joel Becker	2009-01-05
\| \| \| \| \| \| \| \| \| \|	We weren't consistently checking extent blocks after we read them. Most places checked the signature, but none checked h_blkno or h_fs_signature. Create a toplevel ocfs2_read_extent_block() that does the read and the validation. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Morph the haphazard OCFS2_IS_VALID_GROUP_DESC() checks.	Joel Becker	2009-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \|	Random places in the code would check a group descriptor bh to see if it was valid. The previous commit unified descriptor block reads, validating all block reads in the same place. Thus, these checks are no longer necessary. Rather than eliminate them, however, we change them to BUG_ON() checks. This ensures the assumptions remain true. All of the code paths to these checks have been audited to ensure they come from a validated descriptor read. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks.	Joel Becker	2009-01-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Random places in the code would check a dinode bh to see if it was valid. Not only did they do different levels of validation, they handled errors in different ways. The previous commit unified inode block reads, validating all block reads in the same place. Thus, these haphazard checks are no longer necessary. Rather than eliminate them, however, we change them to BUG_ON() checks. This ensures the assumptions remain true. All of the code paths to these checks have been audited to ensure they come from a validated inode read. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: add POSIX ACL API	Tiger Yang	2009-01-05
\| \| \| \| \| \| \| \| \|	This patch adds POSIX ACL(access control lists) APIs in ocfs2. We convert struct posix_acl to many ocfs2_acl_entry and regard them as an extended attribute entry. Signed-off-by: Tiger Yang <tiger.yang@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: comments typo fix	Coly Li	2008-12-01
\| \| \| \| \| \| \|	This patch fixes two typos in comments of ocfs2. Signed-off-by: Coly Li <coyli@suse.de> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Check xattr block signatures properly.	Joel Becker	2008-11-10
\| \| \| \| \| \| \| \| \| \| \| \|	The xattr.c code is currently memcmp()ing naked buffer pointers. Create the OCFS2_IS_VALID_XATTR_BLOCK() macro to match its peers and use that. In addition, failed signature checks were returning -EFAULT, which is completely wrong. Return -EIO. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
*	ocfs2: Switch over to JBD2.	Joel Becker	2008-10-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formatted for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>