Diffstat (limited to 'Documentation/filesystems/porting')
| -rw-r--r-- | Documentation/filesystems/porting | 101 |
1 files changed, 95 insertions, 6 deletions
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index b12c89538680..6e29954851a2 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
| @@ -216,7 +216,6 @@ had ->revalidate()) add calls in ->follow_link()/->readlink(). | |||
| 216 | ->d_parent changes are not protected by BKL anymore. Read access is safe | 216 | ->d_parent changes are not protected by BKL anymore. Read access is safe |
| 217 | if at least one of the following is true: | 217 | if at least one of the following is true: |
| 218 | * filesystem has no cross-directory rename() | 218 | * filesystem has no cross-directory rename() |
| 219 | * dcache_lock is held | ||
| 220 | * we know that parent had been locked (e.g. we are looking at | 219 | * we know that parent had been locked (e.g. we are looking at |
| 221 | ->d_parent of ->lookup() argument). | 220 | ->d_parent of ->lookup() argument). |
| 222 | * we are called from ->rename(). | 221 | * we are called from ->rename(). |
| @@ -299,11 +298,14 @@ be used instead. It gets called whenever the inode is evicted, whether it has | |||
| 299 | remaining links or not. Caller does *not* evict the pagecache or inode-associated | 298 | remaining links or not. Caller does *not* evict the pagecache or inode-associated |
| 300 | metadata buffers; getting rid of those is responsibility of method, as it had | 299 | metadata buffers; getting rid of those is responsibility of method, as it had |
| 301 | been for ->delete_inode(). | 300 | been for ->delete_inode(). |
| 302 | ->drop_inode() returns int now; it's called on final iput() with inode_lock | 301 | |
| 303 | held and it returns true if filesystems wants the inode to be dropped. As before, | 302 | ->drop_inode() returns int now; it's called on final iput() with |
| 304 | generic_drop_inode() is still the default and it's been updated appropriately. | 303 | inode->i_lock held and it returns true if filesystems wants the inode to be |
| 305 | generic_delete_inode() is also alive and it consists simply of return 1. Note that | 304 | dropped. As before, generic_drop_inode() is still the default and it's been |
| 306 | all actual eviction work is done by caller after ->drop_inode() returns. | 305 | updated appropriately. generic_delete_inode() is also alive and it consists |
| 306 | simply of return 1. Note that all actual eviction work is done by caller after | ||
| 307 | ->drop_inode() returns. | ||
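The ->drop_inode() contract in the paragraph above can be sketched in user space as follows. This is only an illustration, not kernel code: `struct demo_inode`, `demo_delete_inode()`, `demo_drop_inode()` and `demo_final_iput()` are hypothetical names, the drop policy is an assumption, and all locking is omitted. The point is that the method only returns a verdict and the caller does the eviction afterwards.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few inode fields this sketch needs. */
struct demo_inode {
	unsigned int i_nlink;	/* remaining hard links */
	bool hashed;		/* still present in the inode hash */
	bool evicted;		/* set by the caller, never by the method */
};

/* Models a method that always asks for the inode to be dropped,
 * which is what the text says generic_delete_inode() amounts to. */
static int demo_delete_inode(struct demo_inode *inode)
{
	(void)inode;
	return 1;
}

/* Assumed policy for illustration: drop once no links remain
 * or the inode is no longer hashed. */
static int demo_drop_inode(struct demo_inode *inode)
{
	return !inode->i_nlink || !inode->hashed;
}

/* Models the caller: the method decides, the caller evicts. */
static void demo_final_iput(struct demo_inode *inode,
			    int (*drop)(struct demo_inode *))
{
	if (drop(inode))
		inode->evicted = true;
}
```

A filesystem that never wants to keep unreferenced inodes cached would pass the always-true variant; one that wants caching of still-linked inodes would pass the policy variant.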
| 308 | |||
| 307 | clear_inode() is gone; use end_writeback() instead. As before, it must | 309 | clear_inode() is gone; use end_writeback() instead. As before, it must |
| 308 | be called exactly once on each call of ->evict_inode() (as it used to be for | 310 | be called exactly once on each call of ->evict_inode() (as it used to be for |
| 309 | each call of ->delete_inode()). Unlike before, if you are using inode-associated | 311 | each call of ->delete_inode()). Unlike before, if you are using inode-associated |
| @@ -318,3 +320,90 @@ if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput( | |||
| 318 | may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly | 320 | may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly |
| 319 | free the on-disk inode, you may end up doing that while ->write_inode() is writing | 321 | free the on-disk inode, you may end up doing that while ->write_inode() is writing |
| 320 | to it. | 322 | to it. |
| 323 | |||
| 324 | --- | ||
| 325 | [mandatory] | ||
| 326 | |||
| 327 | .d_delete() now only advises the dcache as to whether or not to cache | ||
| 328 | unreferenced dentries, and is now only called when the dentry refcount goes to | ||
| 329 | 0. Even on 0 refcount transition, it must be able to tolerate being called 0, | ||
| 330 | 1, or more times (e.g. constant, idempotent). | ||
| 331 | |||
| 332 | --- | ||
| 333 | [mandatory] | ||
| 334 | |||
| 335 | .d_compare() calling convention and locking rules are significantly | ||
| 336 | changed. Read updated documentation in Documentation/filesystems/vfs.txt (and | ||
| 337 | look at examples of other filesystems) for guidance. | ||
| 338 | |||
| 339 | --- | ||
| 340 | [mandatory] | ||
| 341 | |||
| 342 | .d_hash() calling convention and locking rules are significantly | ||
| 343 | changed. Read updated documentation in Documentation/filesystems/vfs.txt (and | ||
| 344 | look at examples of other filesystems) for guidance. | ||
| 345 | |||
| 346 | --- | ||
| 347 | [mandatory] | ||
| 348 | dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c | ||
| 349 | for details of what locks to replace dcache_lock with in order to protect | ||
| 350 | particular things. Most of the time, a filesystem only needs ->d_lock, which | ||
| 351 | protects *all* the dcache state of a given dentry. | ||
| 352 | |||
| 353 | -- | ||
| 354 | [mandatory] | ||
| 355 | |||
| 356 | Filesystems must RCU-free their inodes, if they can have been accessed | ||
| 357 | via rcu-walk path walk (basically, if the file can have had a path name in the | ||
| 358 | vfs namespace). | ||
| 359 | |||
| 360 | i_dentry and i_rcu share storage in a union, and the vfs expects | ||
| 361 | i_dentry to be reinitialized before it is freed, so an: | ||
| 362 | |||
| 363 | INIT_LIST_HEAD(&inode->i_dentry); | ||
| 364 | |||
| 365 | must be done in the RCU callback. | ||
| 366 | |||
| 367 | -- | ||
| 368 | [recommended] | ||
| 369 | vfs now tries to do path walking in "rcu-walk mode", which avoids | ||
| 370 | atomic operations and scalability hazards on dentries and inodes (see | ||
| 371 | Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes | ||
| 372 | (above) are examples of the changes required to support this. For more complex | ||
| 373 | filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so | ||
| 374 | no changes are required to the filesystem. However, this is costly and loses | ||
| 375 | the benefits of rcu-walk mode. We will begin to add filesystem callbacks that | ||
| 376 | are rcu-walk aware, shown below. Filesystems should take advantage of this | ||
| 377 | where possible. | ||
| 378 | |||
| 379 | -- | ||
| 380 | [mandatory] | ||
| 381 | d_revalidate is a callback that is made on every path element (if | ||
| 382 | the filesystem provides it), which requires dropping out of rcu-walk mode. This | ||
| 383 | may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be | ||
| 384 | returned if the filesystem cannot handle rcu-walk. See | ||
| 385 | Documentation/filesystems/vfs.txt for more details. | ||
| 386 | |||
| 387 | permission and check_acl are inode permission checks that are called | ||
| 388 | on many or all directory inodes on the way down a path walk (to check for | ||
| 389 | exec permission). These must now be rcu-walk aware (flags & IPERM_FLAG_RCU). | ||
| 390 | See Documentation/filesystems/vfs.txt for more details. | ||
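As a rough illustration of the d_revalidate rule above, a filesystem that cannot revalidate without blocking might bail out of rcu-walk like this. It is a user-space sketch under stated assumptions: `struct demo_nameidata`, `demo_d_revalidate()` and the `DEMO_LOOKUP_RCU` value are hypothetical stand-ins for the kernel's nameidata, the real callback and LOOKUP_RCU; only the -ECHILD convention comes from the text.

```c
#include <errno.h>

/* Stand-in flag value for this sketch; real code tests LOOKUP_RCU. */
#define DEMO_LOOKUP_RCU 0x0040u

/* Hypothetical reduction of the lookup state to the flags word. */
struct demo_nameidata {
	unsigned int flags;
};

/*
 * Models a d_revalidate that needs to block (e.g. to talk to a server).
 * In rcu-walk mode it must not block, so it returns -ECHILD and lets
 * the vfs retry the call outside rcu-walk mode.
 */
static int demo_d_revalidate(const struct demo_nameidata *nd)
{
	if (nd->flags & DEMO_LOOKUP_RCU)
		return -ECHILD;

	/* Not in rcu-walk: blocking work would go here. */
	return 1;	/* assumed result for the sketch: dentry is valid */
}
```

Returning -ECHILD is the simplest way to comply; a filesystem that can validate from cached state alone could instead answer directly in rcu-walk mode.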
| 391 | |||
| 392 | -- | ||
| 393 | [mandatory] | ||
| 394 | In ->fallocate() you must check the mode option passed in. If your | ||
| 395 | filesystem does not support hole punching (deallocating space in the middle of a | ||
| 396 | file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode. | ||
| 397 | Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set, | ||
| 398 | so the i_size should not change when hole punching, even when punching the end of | ||
| 399 | a file off. | ||
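The mode check required above could look roughly like the following for a filesystem without hole-punching support. This is a user-space sketch: `demo_fallocate_check_mode()` is a hypothetical helper, and the flag values are defined locally (guarded by #ifndef) only so the fragment compiles on its own; real code would take them from the kernel headers.

```c
#include <errno.h>

/* Local fallbacks so the sketch is self-contained. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif

/*
 * Hypothetical first step of a ->fallocate() implementation that does
 * not support hole punching: reject the request before doing any work.
 */
static int demo_fallocate_check_mode(int mode)
{
	if (mode & FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;

	return 0;	/* plain or KEEP_SIZE preallocation may proceed */
}
```

The check tests the bit rather than comparing the whole mode word, because FALLOC_FL_PUNCH_HOLE arrives together with FALLOC_FL_KEEP_SIZE.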
| 400 | |||
| 401 | -- | ||
| 402 | [mandatory] | ||
| 403 | |||
| 404 | -- | ||
| 405 | [mandatory] | ||
| 406 | ->get_sb() is gone. Switch to use of ->mount(). Typically it's just | ||
| 407 | a matter of switching from calling get_sb_... to mount_... and changing the | ||
| 408 | function type. If you were doing it manually, just switch from setting ->mnt_root | ||
| 409 | to some pointer to returning that pointer. On errors return ERR_PTR(...). | ||
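The change of shape from ->get_sb() to ->mount() described above can be modelled in user space as below. Everything here is a hypothetical stand-in: `demo_err_ptr()`, `demo_is_err()` and `demo_ptr_err()` imitate the kernel's ERR_PTR(), IS_ERR() and PTR_ERR() helpers, and `demo_get_sb()` / `demo_mount()` take a simple `fail` flag instead of the real arguments. The sketch only shows the move from "store the root through a pointer and return an int" to "return the root, or an encoded error".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* User-space imitations of the pointer-encoded error helpers. */
static inline void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical root object standing in for the root dentry. */
struct demo_dentry {
	const char *name;
};

static struct demo_dentry demo_root = { "/" };

/* Old shape: result stored through a pointer, status returned as int. */
static int demo_get_sb(int fail, struct demo_dentry **mnt_root)
{
	if (fail)
		return -ENOMEM;
	*mnt_root = &demo_root;
	return 0;
}

/* New shape: the pointer itself is returned, errors are encoded in it. */
static struct demo_dentry *demo_mount(int fail)
{
	if (fail)
		return demo_err_ptr(-ENOMEM);
	return &demo_root;
}
```

A caller of the new shape checks the returned pointer with the is-error helper before using it, instead of testing a separate status code.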
