aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
* nfsd41: proc stubsAndy Adamson2009-04-03
| | | | | Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd41: xdr infrastructureAndy Adamson2009-04-03
| | | | | | | | | | | | | | | | | Define nfsd41_dec_ops vector and add it to nfsd4_minorversion for minorversion 1. Note: nfsd4_enc_ops vector is shared for v4.0 and v4.1 since we don't need to filter out obsolete ops as this is done in the decoding phase. exchange_id, create_session, destroy_session, and sequence ops are implemented as stubs returning nfserr_opnotsupp at this stage. [was nfsd41: xdr stubs] [get rid of CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd41: sessionid hashingMarc Eshel2009-04-03
| | | | | | | | | | | | | | | | | | Simple sessionid hashing using its monotonically increasing sequence number. Locking considerations: sessionid_hashtbl access is controlled by the sessionid_lock spin lock. It must be taken for insert, delete, and lookup. nfsd4_sequence looks up the session id and if the session is found, it calls nfsd4_get_session (still under the sessionid_lock). nfsd4_destroy_session calls nfsd4_put_session after unhashing it, so when the session's kref reaches zero it's going to get freed. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [we don't use a prime for sessionid hash table size] [use sessionid_lock spin lock] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd41: release_session when client is expiredMarc Eshel2009-04-03
| | | | | | | | Signed-off-by: Benny Halevy <bhalevy@panasas.com> [add CONFIG_NFSD_V4_1 to fix v4.0 regression bug] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd41: introduce nfs4_client cl_sessions listMarc Eshel2009-04-03
| | | | | | [get rid of CONFIG_NFSD_V4_1] Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd41: sessions basic data typesAndy Adamson2009-04-03
| | | | | | | | | | | | | | | | | | | | | This patch provides basic data structures representing the nfs41 sessions and slots, plus helpers for keeping a reference count on the session and freeing it. Note that our server only support a headerpadsz of 0 and it ignores backchannel attributes at the moment. Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: remove headerpadsz from channel attributes] [nfsd41: embed nfsd4_channel in nfsd4_session] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: use bool inuse for slot state] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41 remove sl_session from nfsd4_slot] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd: don't use the deferral service, return NFS4ERR_DELAYAndy Adamson2009-04-03
| | | | | | | | | | | | | | | | | | | | | | | On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be returned. It is up to the NFSv4.1 client to resend only the operations that have not been processed. Initialize rq_usedeferral to 1 in svc_process(). It sill be turned off in nfsd4_proc_compound() only when NFSv4.1 Sessions are used. Note: this isn't an adequate solution on its own. It's acceptable as a way to get some minimal 4.1 up and working, but we're going to have to find a way to avoid returning DELAY in all common cases before 4.1 can really be considered ready. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> [nfsd41: reverse rq_nodeferral negative logic] Signed-off-by: Benny Halevy <bhalevy@panasas.com> [sunrpc: initialize rq_usedeferral] Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd: remove nfsd4_ops array sizeBenny Halevy2009-03-30
| | | | | | | There's no need for it. Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd: embed nfsd4_current_state in nfsd4_compoundresAndy Adamson2009-03-29
| | | | | | | | Remove the allocation of struct nfsd4_compound_state. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* Inconsistent setattr behaviourSachin S. Prabhu2009-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is an inconsistency seen in the behaviour of nfs compared to other local filesystems on linux when changing owner or group of a directory. If the directory has SUID/SGID flags set, on changing owner or group on the directory, the flags are stripped off on nfs. These flags are maintained on other filesystems such as ext3. To reproduce on a nfs share or local filesystem, run the following commands mkdir test; chmod +s+g test; chown user1 test; ls -ld test On the nfs share, the flags are stripped and the output seen is drwxr-xr-x 2 user1 root 4096 Feb 23 2009 test On other local filesystems(ex: ext3), the flags are not stripped and the output seen is drwsr-sr-x 2 user1 root 4096 Feb 23 13:57 test chown_common() called from sys_chown() will only strip the flags if the inode is not a directory. static int chown_common(struct dentry * dentry, uid_t user, gid_t group) { .. if (!S_ISDIR(inode->i_mode)) newattrs.ia_valid |= ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV; .. } See: http://www.opengroup.org/onlinepubs/7990989775/xsh/chown.html "If the path argument refers to a regular file, the set-user-ID (S_ISUID) and set-group-ID (S_ISGID) bits of the file mode are cleared upon successful return from chown(), unless the call is made by a process with appropriate privileges, in which case it is implementation-dependent whether these bits are altered. If chown() is successfully invoked on a file that is not a regular file, these bits may be cleared. These bits are defined in <sys/stat.h>." The behaviour as it stands does not appear to violate POSIX. However the actions performed are inconsistent when comparing ext3 and nfs. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: don't check ip address in setclientidJ. Bruce Fields2009-03-18
| | | | | | | | | | | The spec allows clients to change ip address, so we shouldn't be requiring that setclientid always come from the same address. For example, a client could reboot and get a new dhcpd address, but still present the same clientid to the server. In that case the server should revoke the client's previous state and allow it to continue, instead of (as it currently does) returning a CLID_INUSE error. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* knfsd: add file to export stats about nfsd poolsGreg Banks2009-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | Add /proc/fs/nfsd/pool_stats to export to userspace various statistics about the operation of rpc server thread pools. This patch is based on a forward-ported version of knfsd-add-pool-thread-stats which has been shipping in the SGI "Enhanced NFS" product since 2006 and which was previously posted: http://article.gmane.org/gmane.linux.nfs/10375 It has also been updated thus: * moved EXPORT_SYMBOL() to near the function it exports * made the new struct struct seq_operations const * used SEQ_START_TOKEN instead of ((void *)1) * merged fix from SGI PV 990526 "sunrpc: use dprintk instead of printk in svc_pool_stats_*()" by Harshula Jayasuriya. * merged fix from SGI PV 964001 "Crash reading pool_stats before nfsds are started". Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: Harshula Jayasuriya <harshula@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* knfsd: remove the nfsd thread busy histogramGreg Banks2009-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stop gathering the data that feeds the 'th' line in /proc/net/rpc/nfsd because the questionable data provided is not worth the scalability impact of calculating it. Instead, always report zeroes. The current approach suffers from three major issues: 1. update_thread_usage() increments buckets by call service time or call arrival time...in jiffies. On lightly loaded machines, call service times are usually < 1 jiffy; on heavily loaded machines call arrival times will be << 1 jiffy. So a large portion of the updates to the buckets are rounded down to zero, and the histogram is undercounting. 2. As seen previously on the nfs mailing list, the format in which the histogram is presented is cryptic, difficult to explain, and difficult to use. 3. Updating the histogram requires taking a global spinlock and dirtying the global variables nfsd_last_call, nfsd_busy, and nfsdstats *twice* on every RPC call, which is a significant scaling limitation. Testing on a 4 CPU 4 NIC Altix using 4 IRIX clients each doing 1K streaming reads at full line rate, shows the stats update code (inlined into nfsd()) takes about 1.7% of each CPU. This patch drops the contribution from nfsd() into the profile noise. This patch is a forward-ported version of knfsd-remove-nfsd-threadstats which has been shipping in the SGI "Enhanced NFS" product since 2006. In that time, exactly one customer has noticed that the threadstats were missing. It has been previously posted: http://article.gmane.org/gmane.linux.nfs/10376 and more recently requested to be posted again. Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: remove redundant check from nfsd4_openJ. Bruce Fields2009-03-18
| | | | | | | Note that we already checked for this invalid case at the top of this function. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: don't do lookup within readdir in recovery codeJ. Bruce Fields2009-03-18
| | | | | | | | | | | | | | | | The main nfsd code was recently modified to no longer do lookups from withing the readdir callback, to avoid locking problems on certain filesystems. This (rather hacky, and overdue for replacement) NFSv4 recovery code has the same problem. Fix it to build up a list of names (instead of dentries) and do the lookups afterwards. Reported symptoms were a deadlock in the xfs code (called from nfsd4_recdir_load), with /var/lib/nfs on xfs. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reported-by: David Warren <warren@atmos.washington.edu>
* nfsd4: support putpubfh operationJ. Bruce Fields2009-03-18
| | | | | | | | | | | | | | | Currently putpubfh returns NFSERR_OPNOTSUPP, which isn't actually allowed for v4. The right error is probably NFSERR_NOTSUPP. But let's just implement it; though rarely seen, it can be used by Solaris (with a special mount option), is mandated by the rfc, and is trivial for us to support. Thanks to Yang Hongyang for pointing out the original problem, and to Mike Eisler, Tom Talpey, Trond Myklebust, and Dave Noveck for further argument.... Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* Short write in nfsd becomes a full write to the clientDavid Shaw2009-03-18
| | | | | | | | | | | | | | | | | | | | | | If a filesystem being written to via NFS returns a short write count (as opposed to an error) to nfsd, nfsd treats that as a success for the entire write, rather than the short count that actually succeeded. For example, given a 8192 byte write, if the underlying filesystem only writes 4096 bytes, nfsd will ack back to the nfs client that all 8192 bytes were written. The nfs client does have retry logic for short writes, but this is never called as the client is told the complete write succeeded. There are probably other ways it could happen, but in my case it happened with a fuse (filesystem in userspace) filesystem which can rather easily have a partial write. Here is a patch to properly return the short write count to the client. Signed-off-by: David Shaw <dshaw@jabberwocky.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* NFSD: return nfsv4 error code nfserr_notsupp rather than nfsv[23]'s ↵Benny Halevy2009-03-18
| | | | | | | | | | | | nfserr_opnotsupp Thanks for Bill Baker at sun.com for catching this at Connectathon 2009. This bug was introduced in 2.6.27 Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: move rpc_client setup to a separate functionJ. Bruce Fields2009-03-18
| | | | Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: fix do_probe_callback errorsJ. Bruce Fields2009-03-18
| | | | | | | | | The errors returned aren't used. Just return 0 and make them available to a dprintk(). Also, consistently use -ERRNO errors instead of nfs errors. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-by: Benny Halevy <bhalevy@panasas.com>
* nfsd4: remove use of mutex for file_hashtableJ. Bruce Fields2009-03-18
| | | | | | | | | | | | | As part of reducing the scope of the client_mutex, and in order to remove the need for mutexes from the callback code (so that callbacks can be done as asynchronous rpc calls), move manipulations of the file_hashtable under the recall_lock. Update the relevant comments while we're here. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Alexandros Batsakis <batsakis@netapp.com> Reviewed-by: Benny Halevy <bhalevy@panasas.com>
* nfsd4: put_nfs4_client does not require state lockJ. Bruce Fields2009-03-18
| | | | | | | | | Since free_client() is guaranteed to only be called once, and to only touch the client structure itself (not any common data structures), it has no need for the state lock. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Cc: Alexandros Batsakis <batsakis@netapp.com>
* nfsd4: rename io_during_grace_disallowedJ. Bruce Fields2009-03-18
| | | | | | | Use a slightly clearer, more concise name. Also removed unused argument. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: remove unused CHECK_FH flagJ. Bruce Fields2009-03-18
| | | | | | All users now pass this, so it's meaningless. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: fail when delegreturn gets a non-delegation stateidJ. Bruce Fields2009-03-18
| | | | | | | | Previous cleanup reveals an obvious (though harmless) bug: when delegreturn gets a stateid that isn't for a delegation, it should return an error rather than doing nothing. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: separate delegreturn case from preprocess_stateid_opJ. Bruce Fields2009-03-18
| | | | | | | | | | | | Delegreturn is enough a special case for preprocess_stateid_op to warrant just open-coding it in delegreturn. There should be no change in behavior here; we're just reshuffling code. Thanks to Yang Hongyang for catching a critical typo. Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: add a helper function to decide if stateid is delegationJ. Bruce Fields2009-03-18
| | | | | | Make this check self-documenting. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: remove some dprintk'sJ. Bruce Fields2009-03-18
| | | | | | | | | I can't recall ever seeing these printk's used to debug a problem. I'll happily put them back if we see a case where they'd be useful. (Though if we do that the find_XXX() errors would probably be better reported in find_XXX() functions themselves.) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: remove unneeded local variableJ. Bruce Fields2009-03-18
| | | | | | We no longer need stidp. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: remove redundant "if" in nfs4_preprocess_stateid_opJ. Bruce Fields2009-03-18
| | | | | | | | | Note that we exit this first big "if" with stp == NULL if and only if we took the first branch; therefore, the second "if" is redundant, and we can just combine the two, simplifying the logic. Reviewed-by: Yang Hongyang <yanghy@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: move check_stateid_generation checkJ. Bruce Fields2009-03-18
| | | | | | No change in behavior. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: trivial preprocess_stateid_op cleanupJ. Bruce Fields2009-03-18
| | | | | | Remove a couple redundant comments, adjust style; no change in behavior. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd(v2/v3): fix the failure of creation from HPUX clientwengang wang2009-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sometimes HPUX nfs client sends a create request to linux nfs server(v2/v3). the dump of the request is like: obj_attributes mode: value follows set_it: value follows (1) mode: 00 uid: no value set_it: no value (0) gid: value follows set_it: value follows (1) gid: 8030 size: value follows set_it: value follows (1) size: 0 atime: don't change set_it: don't change (0) mtime: don't change set_it: don't change (0) note that mode is 00(havs no rwx privilege even for the owner) and it requires to set size to 0. as current nfsd(v2/v3) implementation, the server does mainly 2 steps: 1) creates the file in mode specified by calling vfs_create(). 2) sets attributes for the file by calling nfsd_setattr(). at step 2), it finally calls file system specific setattr() function which may fail when checking permission because changing size needs WRITE privilege but it has none since mode is 000. for this case, a new file created, we may simply ignore the request of setting size to 0, so that WRITE privilege is not needed and the open succeeds. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> -- vfs.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* lockd: clean up blocking lock cases of nlsmvc_lock()Miklos Szeredi2009-03-18
| | | | | | | | No change in behavior, just rearranging the switch so that we break out of the switch if and only if we're in the wait case. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd: lock state around put client and delegation in nfsd4_cb_recallAlexandros Batsakis2009-03-18
| | | | | | | | | not having the state locked before putting the client/delegation causes a bug. Also removed the comment from the function header about the state being already locked Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: use helper for copying delegation filehandleJ. Bruce Fields2009-03-18
| | | | Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: use helper for copying filehandles for replayJ. Bruce Fields2009-03-18
| | | | Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: fix misplaced commentJ. Bruce Fields2009-03-18
| | | | Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd: clarify exclusive create bitmask result.J. Bruce Fields2009-03-18
| | | | | | | The use of |= is confusing--the bitmask is always initialized to zero in this case, so we're effectively just doing an assignment here. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd : Define NFSD only when FILE_LOCKING is enabledManish Katiyar2009-03-18
| | | | | | | | Enable NFSD only when FILE_LOCKING is enabled, since we don't want to support NFSD without FILE_LOCKING. Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* NFSD: cleanup for nfs3proc.cQinghuang Feng2009-03-18
| | | | | | | | MSDOS_SUPER_MAGIC is defined in <linux/magic.h>, so use MSDOS_SUPER_MAGIC directly. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: split open/lockowner release codeJ. Bruce Fields2009-03-18
| | | | | | | | The caller always knows specifically whether it's releasing a lockowner or an openowner, and the code is simpler if we use separate functions (and the apparent recursion is gone). Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: remove a forward declarationJ. Bruce Fields2009-03-18
| | | | Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* nfsd4: split lockstateid/openstateid release logicJ. Bruce Fields2009-03-18
| | | | | | | | | | | | | | | | | The flags here attempt to make the code more general, but I find it actually just adds confusion. I think it's clearer to separate the logic for the open and lock cases entirely. And eventually we may want to separate the stateowner and stateid types as well, as many of the fields aren't shared between the lock and open cases. Also move to eliminate forward references. Start with the stateid's. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Reviewed-by: Benny Halevy <bhalevy@panasas.com>
* Merge branch 'for-2.6.29' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2009-03-18
|\ | | | | | | | | | | * 'for-2.6.29' of git://linux-nfs.org/~bfields/linux: nfsd: nfsd should drop CAP_MKNOD for non-root NFSD: provide encode routine for OP_OPENATTR
| * NFSD: provide encode routine for OP_OPENATTRBenny Halevy2009-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although this operation is unsupported by our implementation we still need to provide an encode routine for it to merely encode its (error) status back in the compound reply. Thanks for Bill Baker at sun.com for testing with the Sun OpenSolaris' client, finding, and reporting this bug at Connectathon 2009. This bug was introduced in 2.6.27 Signed-off-by: Benny Halevy <bhalevy@panasas.com> Cc: stable@kernel.org Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
* | Merge branch 'for_linus' of ↵Linus Torvalds2009-03-17
|\ \ | |/ |/| | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix bb_prealloc_list corruption due to wrong group locking ext4: fix bogus BUG_ONs in in mballoc code ext4: Print the find_group_flex() warning only once ext4: fix header check in ext4_ext_search_right() for deep extent trees.
| * ext4: fix bb_prealloc_list corruption due to wrong group lockingEric Sandeen2009-03-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is for Red Hat bug 490026: EXT4 panic, list corruption in ext4_mb_new_inode_pa ext4_lock_group(sb, group) is supposed to protect this list for each group, and a common code flow to remove an album is like this: ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL); ext4_lock_group(sb, grp); list_del(&pa->pa_group_list); ext4_unlock_group(sb, grp); so it's critical that we get the right group number back for this prealloc context, to lock the right group (the one associated with this pa) and prevent concurrent list manipulation. however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a comment, "-1 is to protect from crossing allocation group". This makes sense for the group_pa, where pa_pstart is advanced by the length which has been used (in ext4_mb_release_context()), and when the entire length has been used, pa_pstart has been advanced to the first block of the next group. However, for inode_pa, pa_pstart is never advanced; it's just set once to the first block in the group and not moved after that. So in this case, if we subtract one in ext4_mb_put_pa(), we are actually locking the *previous* group, and opening the race with the other threads which do not subtract off the extra block. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix bogus BUG_ONs in in mballoc codeEric Sandeen2009-03-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thiemo Nagel reported that: # dd if=/dev/zero of=image.ext4 bs=1M count=2 # mkfs.ext4 -v -F -b 1024 -m 0 -g 512 -G 4 -I 128 -N 1 \ -O large_file,dir_index,flex_bg,extent,sparse_super image.ext4 # mount -o loop image.ext4 mnt/ # dd if=/dev/zero of=mnt/file oopsed, with a BUG_ON in ext4_mb_normalize_request because size == EXT4_BLOCKS_PER_GROUP It appears to me (esp. after talking to Andreas) that the BUG_ON is bogus; a request of exactly EXT4_BLOCKS_PER_GROUP should be allowed, though larger sizes do indicate a problem. Fix that an another (apparently rare) codepath with a similar check. Reported-by: Thiemo Nagel <thiemo.nagel@ph.tum.de> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Print the find_group_flex() warning only onceTheodore Ts'o2009-03-12
| | | | | | | | | | | | | | This is a short-term warning, and even printk_ratelimit() can result in too much noise in system logs. So only print it once as a warning. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>