author Jeff Layton <jlayton@redhat.com> 2007-05-08 03:32:29 -0400
committer Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-05-08 14:15:16 -0400
commit 866b04fccbf125cd39f2bdbcfeaa611d39a061a8
tree 5f59337d971bcb696b75e6fc39ca5d7d2a5a287b
parent 63bd23591e6c3891d34e4c6dba7c6aa41b05caad
inode numbering: make static counters in new_inode and iunique be 32 bits
The problems are:

- on filesystems w/o permanent inode numbers, i_ino values can be larger
  than 32 bits, which can cause problems for some 32 bit userspace programs
  on a 64 bit kernel. We can't do anything for filesystems that have actual
  >32-bit inode numbers, but on filesystems that generate i_ino values on
  the fly, we should try to have them fit in 32 bits. We could trivially
  fix this by making the static counters in new_inode and iunique 32 bits,
  but...

- many filesystems call new_inode and assume that the i_ino values they are
  given are unique. They are not guaranteed to be so, since the static
  counter can wrap. This problem is exacerbated by the fix for #1.

- after allocating a new inode, some filesystems call iunique to try to get
  a unique i_ino value, but they don't actually add their inodes to the
  hashtable, and so they're still not guaranteed to be unique if that
  counter wraps.

This patch set takes the simpler approach of simply using iunique and
hashing the inodes afterward. Christoph H. previously mentioned that he
thought that this approach may slow down lookups for filesystems that
currently hash their inodes.

The questions are:

1) how much would this slow down lookups for these filesystems?
2) is it enough to justify adding more infrastructure to avoid it?

What might be best is to start with this approach and then only move to
using IDR or some other scheme if these extra inodes in the hashtable prove
to be problematic.

I've done some cursory testing with this patch and the overhead of hashing
and unhashing the inodes with pipefs is pretty low -- just a few seconds of
system time added on to the creation and destruction of 10 million pipes
(very similar to the overhead that the IDR approach would add).

The hard thing to measure is what effect this has on other filesystems. I'm
open to ways to try and gauge this.

Again, I've only converted pipefs as an example. If this approach is
acceptable then I'll start work on patches to convert other filesystems.

With a pretty-much-worst-case microbenchmark provided by Eric Dumazet
<dada1@cosmosbay.com>:

hashing patch (pipebench):
sys 1m15.329s
sys 1m16.249s
sys 1m17.169s

unpatched (pipebench):
sys 1m9.836s
sys 1m12.541s
sys 1m14.153s

Which works out to 1.05642174294555027017. So ~5-6% slowdown.

This patch:

When a 32-bit program that was not compiled with large file offsets does a
stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
(correctly) generates an EOVERFLOW error. We can't do anything about fs's
with larger permanent inode numbers, but when we generate them on the fly,
we ought to try and have them fit within a 32 bit field.

This patch takes the first step toward this by making the static counters
in these two functions be 32 bits.

[jlayton@redhat.com: mention that it's only the case for 32bit, non-LFS stat]
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
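The slowdown figure quoted in the message appears to be the ratio of the mean patched sys time to the mean unpatched sys time. A minimal sketch of that arithmetic follows, with the pipebench timings converted to seconds; the helper names `mean` and `slowdown_ratio` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples (hypothetical helper for this sketch). */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/*
 * Ratio of mean patched sys time to mean unpatched sys time, using the
 * pipebench numbers from the commit message converted to seconds.
 */
static double slowdown_ratio(void)
{
    static const double patched[]   = { 75.329, 76.249, 77.169 };
    static const double unpatched[] = { 69.836, 72.541, 74.153 };

    return mean(patched, 3) / mean(unpatched, 3);
}
```

Calling `slowdown_ratio()` gives a value just above 1.056, which matches the "~5-6% slowdown" stated in the message.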
-rw-r--r-- fs/inode.c 14
1 files changed, 12 insertions, 2 deletions
diff --git a/fs/inode.c b/fs/inode.c
index 410f235c337b..df2ef15d03d2 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -524,7 +524,12 @@ repeat:
  */
 struct inode *new_inode(struct super_block *sb)
 {
-	static unsigned long last_ino;
+	/*
+	 * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
+	 * error if st_ino won't fit in target struct field. Use 32bit counter
+	 * here to attempt to avoid that.
+	 */
+	static unsigned int last_ino;
 	struct inode * inode;
 
 	spin_lock_prefetch(&inode_lock);
@@ -683,7 +688,12 @@ static unsigned long hash(struct super_block *sb, unsigned long hashval)
  */
 ino_t iunique(struct super_block *sb, ino_t max_reserved)
 {
-	static ino_t counter;
+	/*
+	 * On a 32bit, non LFS stat() call, glibc will generate an EOVERFLOW
+	 * error if st_ino won't fit in target struct field. Use 32bit counter
+	 * here to attempt to avoid that.
+	 */
+	static unsigned int counter;
 	struct inode *inode;
 	struct hlist_head *head;
 	ino_t res;
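
For illustration only, the two behaviours this change trades off can be modelled in user space: a 32-bit counter always produces values that fit a 32-bit st_ino field, but it wraps, so the values are not guaranteed unique. The sketch below is not kernel or glibc code; `next_ino` and `store_st_ino` are hypothetical names, and the narrowing check only approximates what the comment in the patch says glibc does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a static 32-bit inode counter.
 * Unsigned arithmetic wraps modulo 2^32, so the value after
 * UINT32_MAX is 0 and earlier numbers get handed out again.
 */
static uint32_t next_ino(uint32_t *counter)
{
    assert(counter != NULL);
    return ++*counter;
}

/*
 * Hypothetical model of the 32-bit, non-LFS stat() narrowing step:
 * an inode number that does not fit the 32-bit field is reported as
 * EOVERFLOW instead of being silently truncated.
 */
static int store_st_ino(uint64_t ino, uint32_t *st_ino)
{
    assert(st_ino != NULL);
    if (ino > UINT32_MAX)
        return EOVERFLOW;
    *st_ino = (uint32_t)ino;
    return 0;
}
```

Any value returned by `next_ino` passes `store_st_ino`, while a counter value of 2^32 or more, which an `unsigned long` counter can reach on a 64-bit kernel, would not. The wrap to 0 is the uniqueness problem the commit message lists as its second point.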