author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-03-21 16:36:41 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-03-21 16:36:41 -0400 |
commit | e2a0883e4071237d09b604a342c28b96b44a04b3 (patch) | |
tree | aa56f4d376b5eb1c32358c19c2669c2a94e0e1fd /Documentation/filesystems | |
parent | 3a990a52f9f25f45469e272017a31e7a3fda60ed (diff) | |
parent | 07c0c5d8b8c122b2f2df9ee574ac3083daefc981 (diff) |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
"This is _not_ all; in particular, Miklos' and Jan's stuff is not there
yet."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
ext4: initialization of ext4_li_mtx needs to be done earlier
debugfs-related mode_t whack-a-mole
hfsplus: add an ioctl to bless files
hfsplus: change finder_info to u32
hfsplus: initialise userflags
qnx4: new helper - try_extent()
qnx4: get rid of qnx4_bread/qnx4_getblk
take removal of PF_FORKNOEXEC to flush_old_exec()
trim includes in inode.c
um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
um: embed ->stub_pages[] into mmu_context
gadgetfs: list_for_each_safe() misuse
ocfs2: fix leaks on failure exits in module_init
ecryptfs: make register_filesystem() the last potential failure exit
ntfs: forgets to unregister sysctls on register_filesystem() failure
logfs: missing cleanup on register_filesystem() failure
jfs: mising cleanup on register_filesystem() failure
make configfs_pin_fs() return root dentry on success
configfs: configfs_create_dir() has parent dentry in dentry->d_parent
configfs: sanitize configfs_create()
...
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/debugfs.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/porting | 6 | ||||
-rw-r--r-- | Documentation/filesystems/qnx6.txt | 174 |
3 files changed, 181 insertions, 1 deletions
diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt index 4e2575873187..7a34f827989c 100644 --- a/Documentation/filesystems/debugfs.txt +++ b/Documentation/filesystems/debugfs.txt | |||
@@ -136,7 +136,7 @@ file. | |||
136 | void __iomem *base; | 136 | void __iomem *base; |
137 | }; | 137 | }; |
138 | 138 | ||
139 | struct dentry *debugfs_create_regset32(const char *name, mode_t mode, | 139 | struct dentry *debugfs_create_regset32(const char *name, umode_t mode, |
140 | struct dentry *parent, | 140 | struct dentry *parent, |
141 | struct debugfs_regset32 *regset); | 141 | struct debugfs_regset32 *regset); |
142 | 142 | ||
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index b4a3d765ff9a..74acd9618819 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting | |||
@@ -429,3 +429,9 @@ filemap_write_and_wait_range() so that all dirty pages are synced out properly. | |||
429 | You must also keep in mind that ->fsync() is not called with i_mutex held | 429 | You must also keep in mind that ->fsync() is not called with i_mutex held |
430 | anymore, so if you require i_mutex locking you must make sure to take it and | 430 | anymore, so if you require i_mutex locking you must make sure to take it and |
431 | release it yourself. | 431 | release it yourself. |
432 | |||
433 | -- | ||
434 | [mandatory] | ||
435 | d_alloc_root() is gone, along with a lot of bugs caused by code | ||
436 | misusing it. Replacement: d_make_root(inode). The difference is, | ||
437 | d_make_root() drops the reference to inode if dentry allocation fails. | ||
diff --git a/Documentation/filesystems/qnx6.txt b/Documentation/filesystems/qnx6.txt new file mode 100644 index 000000000000..050223ea03c7 --- /dev/null +++ b/Documentation/filesystems/qnx6.txt | |||
@@ -0,0 +1,174 @@ | |||
1 | The QNX6 Filesystem | ||
2 | =================== | ||
3 | |||
4 | The qnx6fs is used by newer QNX operating system versions. (e.g. Neutrino) | ||
5 | It was introduced in QNX 6.4.0 and has been the default since 6.4.1. | ||
6 | |||
7 | Option | ||
8 | ====== | ||
9 | |||
10 | mmi_fs Mount filesystem as used for example by Audi MMI 3G system | ||
11 | |||
12 | Specification | ||
13 | ============= | ||
14 | |||
15 | qnx6fs shares many properties with traditional Unix filesystems. It has the | ||
16 | concepts of blocks, inodes and directories. | ||
17 | On QNX it is possible to create little endian and big endian qnx6 filesystems. | ||
18 | This makes it possible to create a filesystem for a target platform (QNX is | ||
19 | used on quite a range of embedded systems) whose endianness differs from | ||
20 | that of the system it is created on. | ||
21 | The Linux driver handles endianness transparently. (LE and BE) | ||
22 | |||
23 | Blocks | ||
24 | ------ | ||
25 | |||
26 | The space in the device or file is split up into blocks. These are a fixed | ||
27 | size of 512, 1024, 2048 or 4096, which is decided when the filesystem is | ||
28 | created. | ||
29 | Block pointers are 32bit, so the maximum space that can be addressed is | ||
30 | 2^32 * 4096 bytes or 16TB. | ||
31 | |||
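The addressing limit above can be checked with a few lines of arithmetic. This is a minimal sketch, not driver code; the function and constant names are made up for illustration.

```python
# Sketch of the addressing limit described above.
# Names are hypothetical; only the numbers come from the text.
BLOCK_POINTER_BITS = 32
ALLOWED_BLOCK_SIZES = (512, 1024, 2048, 4096)


def max_addressable_bytes(block_size):
    """Return the largest byte range 32bit block pointers can cover."""
    if block_size not in ALLOWED_BLOCK_SIZES:
        raise ValueError("unsupported block size: %d" % block_size)
    return (1 << BLOCK_POINTER_BITS) * block_size


if __name__ == "__main__":
    for size in ALLOWED_BLOCK_SIZES:
        print(size, max_addressable_bytes(size) // 2**40, "TB")
```

With the largest blocksize of 4096 this gives 2^44 bytes, which is the 16TB figure quoted above; smaller blocksizes lower the limit proportionally.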
32 | The superblocks | ||
33 | --------------- | ||
34 | |||
35 | The superblock contains all global information about the filesystem. | ||
36 | Each qnx6fs has two superblocks, each one having a 64bit serial number. | ||
37 | That serial number is used to identify the "active" superblock. | ||
38 | In write mode, with each new snapshot (after each synchronous write), the | ||
39 | serial of the new master superblock is increased (old superblock serial + 1). | ||
40 | |||
41 | So basically the snapshot functionality is realized by an atomic final | ||
42 | update of the serial number. Before updating that serial, all modifications | ||
43 | are done by copying all modified blocks during that specific write request | ||
44 | (or period) and building up a new (stable) filesystem structure under the | ||
45 | inactive superblock. | ||
46 | |||
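The serial number scheme described above can be sketched as follows. This is an illustration only: the Superblock tuple and both helper names are hypothetical, and preferring the first superblock when both serials are equal is an assumption the text does not cover.

```python
# Sketch of active-superblock selection by serial number.
# The type and function names are hypothetical.
from typing import NamedTuple


class Superblock(NamedTuple):
    index: int   # 0 for the first superblock, 1 for the second
    serial: int  # 64bit serial number read from disk


def active_superblock(first, second):
    """Pick the superblock with the higher serial number.

    Assumption: on equal serials the first superblock is used.
    """
    return first if first.serial >= second.serial else second


def serial_after_snapshot(active):
    """Serial the inactive superblock receives once a snapshot completes."""
    return active.serial + 1
```

Because the new structure is only made visible by this final serial update, a reader that picks the superblock with the higher serial sees either the old snapshot or the new one.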
47 | Each superblock holds a set of root inodes for the different filesystem | ||
48 | parts. (Inode, Bitmap and Longfilenames) | ||
49 | Each of these root nodes holds information like total size of the stored | ||
50 | data and the addressing levels in that specific tree. | ||
51 | If the level value is 0, up to 16 direct blocks can be addressed by each | ||
52 | node. | ||
53 | Level 1 adds an additional indirect addressing level where each indirect | ||
54 | addressing block holds up to blocksize / 4 pointers (4 bytes each) to data blocks. | ||
55 | Level 2 adds an additional indirect addressing block level (so, already up | ||
56 | to 16 * 256 * 256 = 1048576 blocks can be addressed by such a tree with a | ||
| blocksize of 1024). | ||
57 | |||
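The capacity of such a tree follows directly from the numbers above: 16 pointers in the root node, and blocksize / 4 pointers per indirect block for each additional level. A minimal sketch, with a made-up function name:

```python
# Sketch of the tree capacity arithmetic described above.
# The function name is hypothetical.
ROOT_POINTERS = 16      # direct pointers held by each root node
POINTER_SIZE = 4        # bytes per 32bit block pointer


def max_tree_blocks(levels, block_size):
    """Number of data blocks addressable by a tree with the given level value."""
    if levels < 0:
        raise ValueError("levels must not be negative")
    pointers_per_block = block_size // POINTER_SIZE
    return ROOT_POINTERS * pointers_per_block ** levels
```

For a blocksize of 1024 and a level value of 2 this yields 16 * 256 * 256 = 1048576 blocks, matching the example in the text.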
58 | Unused block pointers are always set to ~0 - regardless of root node, | ||
59 | indirect addressing blocks or inodes. | ||
60 | Data leaves are always on the lowest level. So no data is stored on upper | ||
61 | tree levels. | ||
62 | |||
63 | The first Superblock is located at 0x2000. (0x2000 is the bootblock size) | ||
64 | The Audi MMI 3G first superblock directly starts at byte 0. | ||
65 | The second superblock position can either be calculated from the superblock | ||
66 | information (total number of filesystem blocks) or by taking the highest | ||
67 | device address, zeroing the last 3 bytes and then subtracting 0x1000 from | ||
68 | that address. | ||
69 | |||
70 | 0x1000 is the size reserved for each superblock - regardless of the | ||
71 | blocksize of the filesystem. | ||
72 | |||
73 | Inodes | ||
74 | ------ | ||
75 | |||
76 | Each object in the filesystem is represented by an inode. (index node) | ||
77 | The inode structure contains pointers to the filesystem blocks which contain | ||
78 | the data held in the object and all of the metadata about an object except | ||
79 | its longname. (filenames longer than 27 characters) | ||
80 | The metadata about an object includes the permissions, owner, group, flags, | ||
81 | size, number of blocks used, access time, change time and modification time. | ||
82 | |||
83 | Object mode field is POSIX format. (which makes things easier) | ||
84 | |||
85 | There are also pointers to the first 16 blocks, if the object data can be | ||
86 | addressed with 16 direct blocks. | ||
87 | For more than 16 blocks, indirect addressing in the form of another tree is | ||
88 | used. (scheme is the same as the one used for the superblock root nodes) | ||
89 | |||
90 | The file size is stored as a 64bit value. Inode counting starts with 1. (whilst | ||
91 | long filename inodes start with 0) | ||
92 | |||
93 | Directories | ||
94 | ----------- | ||
95 | |||
96 | A directory is a filesystem object and has an inode just like a file. | ||
97 | It is a specially formatted file containing records which associate each | ||
98 | name with an inode number. | ||
99 | '.' inode number points to the directory inode | ||
100 | '..' inode number points to the parent directory inode | ||
101 | Each filename record additionally has a filename length field. | ||
102 | |||
103 | One special case is long filenames or subdirectory names. | ||
104 | These have the filename length field set to 0xff in the corresponding | ||
105 | directory record, which also stores the longfile inode number. | ||
106 | With that longfilename inode number, the longfilename tree can be walked | ||
107 | starting with the superblock longfilename root node pointers. | ||
108 | |||
109 | Special files | ||
110 | ------------- | ||
111 | |||
112 | Symbolic links are also filesystem objects with inodes. They have a specific | ||
113 | bit in the inode mode field identifying them as symbolic links. | ||
114 | The directory entry file inode pointer points to the target file inode. | ||
115 | |||
116 | Hard links have an inode and a directory entry, but with a specific mode bit | ||
117 | set, no block pointers and the directory file record pointing to the target | ||
118 | file inode. | ||
119 | |||
120 | Character and block special devices do not exist in QNX as those files | ||
121 | are handled by the QNX kernel/drivers and created in /dev independent of the | ||
122 | underlying filesystem. | ||
123 | |||
124 | Long filenames | ||
125 | -------------- | ||
126 | |||
127 | Long filenames are stored in a separate addressing tree. The starting point | ||
128 | is the longfilename root node in the active superblock. | ||
129 | Each data block (tree leaf) holds one long filename. That filename is | ||
130 | limited to 510 bytes. The first two bytes are used as length field | ||
131 | for the actual filename. | ||
132 | Since that structure has to fit into every allowed blocksize, including the | ||
133 | smallest one (512 bytes), the actual filename stored is limited to 510 bytes. | ||
134 | |||
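The long filename layout described above (two length bytes followed by up to 510 name bytes) can be decoded as sketched below. This is an illustration, not the driver's code: the function name is hypothetical, and reading the length field as an unsigned 16bit integer in the filesystem's endianness is an assumption.

```python
# Sketch of decoding one long filename data block.
# Assumption: the two-byte length field is an unsigned 16bit integer
# stored in the endianness of the filesystem.
import struct

LONG_NAME_MAX = 510  # 512-byte minimum blocksize minus the 2-byte length field


def parse_long_filename(block, little_endian=True):
    """Return the raw filename bytes stored in a long filename block."""
    fmt = "<H" if little_endian else ">H"
    (length,) = struct.unpack_from(fmt, block, 0)
    if length > LONG_NAME_MAX:
        raise ValueError("length field %d exceeds %d" % (length, LONG_NAME_MAX))
    if len(block) < 2 + length:
        raise ValueError("block too short for the stated name length")
    return bytes(block[2:2 + length])
```

The name is returned as raw bytes because the text does not say which character encoding is used.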
135 | Bitmap | ||
136 | ------ | ||
137 | |||
138 | The qnx6fs filesystem allocation bitmap is stored in a tree under the bitmap | ||
139 | root node in the superblock and each bit in the bitmap represents one | ||
140 | filesystem block. | ||
141 | The first block is block 0, which starts 0x1000 after superblock start. | ||
142 | So for a normal qnx6fs 0x3000 (bootblock + superblock) is the physical | ||
143 | address at which block 0 is located. | ||
144 | |||
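The location of block 0 follows from the offsets given in this document. The sketch below is illustrative only: the names are made up, and the Audi MMI 3G case assumes block 0 also starts 0x1000 after that layout's superblock at byte 0, which the text implies but does not state outright.

```python
# Sketch of the physical byte offset of a filesystem block.
# Names are hypothetical; the constants come from the text.
BOOTBLOCK_SIZE = 0x2000        # the first superblock follows the bootblock
SUPERBLOCK_AREA_SIZE = 0x1000  # reserved per superblock, any blocksize


def first_superblock_offset(mmi_fs=False):
    """Byte offset of the first superblock (0 for the Audi MMI 3G layout)."""
    return 0 if mmi_fs else BOOTBLOCK_SIZE


def block_offset(block_number, block_size, mmi_fs=False):
    """Byte offset of a filesystem block, counted from block 0."""
    start_of_block0 = first_superblock_offset(mmi_fs) + SUPERBLOCK_AREA_SIZE
    return start_of_block0 + block_number * block_size
```

For a normal qnx6fs, block 0 therefore sits at 0x3000, as stated above.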
145 | Bits at the end of the last bitmap block are set to 1, if the device is | ||
146 | smaller than addressing space in the bitmap. | ||
147 | |||
148 | Bitmap system area | ||
149 | ------------------ | ||
150 | |||
151 | The bitmap itself is divided into three parts. | ||
152 | First the system area, which is split into two halves. | ||
153 | Then userspace. | ||
154 | |||
155 | The requirement for a static, fixed preallocated system area comes from how | ||
156 | qnx6fs deals with writes. | ||
157 | Each superblock has its own half of the system area. So superblock #1 | ||
158 | always uses blocks from the lower half whilst superblock #2 just writes to | ||
159 | blocks represented by the upper half bitmap system area bits. | ||
160 | |||
161 | Bitmap blocks, Inode blocks and indirect addressing blocks for those two | ||
162 | tree structures are treated as system blocks. | ||
163 | |||
164 | The rationale behind that is that a write request can work on a new snapshot | ||
165 | (system area of the inactive - resp. lower serial numbered superblock) while | ||
166 | at the same time there is still a complete stable filesystem structure in the | ||
167 | other half of the system area. | ||
168 | |||
169 | When writing is finished (a sync write is completed, the maximum sync leap | ||
170 | time is reached or a filesystem sync is requested), the serial of the | ||
171 | previously inactive superblock is atomically increased and the fs switches | ||
172 | over to that superblock, which is then declared stable. | ||
173 | |||
174 | For all data outside the system area, blocks are just copied while writing. | ||