diff options
Diffstat (limited to 'Documentation/filesystems/vfs.txt')
-rw-r--r-- | Documentation/filesystems/vfs.txt | 177 |
1 files changed, 2 insertions, 175 deletions
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 821090946a1a..ee4c0a8b8db7 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt | |||
@@ -721,181 +721,8 @@ manipulate dentries: | |||
721 | and the dentry is returned. The caller must use d_put() | 721 | and the dentry is returned. The caller must use d_put() |
722 | to free the dentry when it finishes using it. | 722 | to free the dentry when it finishes using it. |
723 | 723 | ||
724 | 724 | For further information on dentry locking, please refer to the document | |
725 | RCU-based dcache locking model | 725 | Documentation/filesystems/dentry-locking.txt. |
726 | ------------------------------ | ||
727 | |||
728 | On many workloads, the most common operation on dcache is | ||
729 | to look up a dentry, given a parent dentry and the name | ||
730 | of the child. Typically, for every open(), stat() etc., | ||
731 | the dentry corresponding to the pathname will be looked | ||
732 | up by walking the tree starting with the first component | ||
733 | of the pathname and using that dentry along with the next | ||
734 | component to look up the next level and so on. Since it | ||
735 | is a frequent operation for workloads like multiuser | ||
736 | environments and web servers, it is important to optimize | ||
737 | this path. | ||
738 | |||
739 | Prior to 2.5.10, dcache_lock was acquired in d_lookup and thus | ||
740 | in every component during path look-up. Since 2.5.10 onwards, | ||
741 | fast-walk algorithm changed this by holding the dcache_lock | ||
742 | at the beginning and walking as many cached path component | ||
743 | dentries as possible. This significantly decreases the number | ||
744 | of acquisition of dcache_lock. However it also increases the | ||
745 | lock hold time significantly and affects performance in large | ||
746 | SMP machines. Since 2.5.62 kernel, dcache has been using | ||
747 | a new locking model that uses RCU to make dcache look-up | ||
748 | lock-free. | ||
749 | |||
750 | The current dcache locking model is not very different from the existing | ||
751 | dcache locking model. Prior to 2.5.62 kernel, dcache_lock | ||
752 | protected the hash chain, d_child, d_alias, d_lru lists as well | ||
753 | as d_inode and several other things like mount look-up. RCU-based | ||
754 | changes affect only the way the hash chain is protected. For everything | ||
755 | else the dcache_lock must be taken for both traversing as well as | ||
756 | updating. The hash chain updates too take the dcache_lock. | ||
757 | The significant change is the way d_lookup traverses the hash chain, | ||
758 | it doesn't acquire the dcache_lock for this and rely on RCU to | ||
759 | ensure that the dentry has not been *freed*. | ||
760 | |||
761 | |||
762 | Dcache locking details | ||
763 | ---------------------- | ||
764 | |||
765 | For many multi-user workloads, open() and stat() on files are | ||
766 | very frequently occurring operations. Both involve walking | ||
767 | of path names to find the dentry corresponding to the | ||
768 | concerned file. In 2.4 kernel, dcache_lock was held | ||
769 | during look-up of each path component. Contention and | ||
770 | cache-line bouncing of this global lock caused significant | ||
771 | scalability problems. With the introduction of RCU | ||
772 | in Linux kernel, this was worked around by making | ||
773 | the look-up of path components during path walking lock-free. | ||
774 | |||
775 | |||
776 | Safe lock-free look-up of dcache hash table | ||
777 | =========================================== | ||
778 | |||
779 | Dcache is a complex data structure with the hash table entries | ||
780 | also linked together in other lists. In 2.4 kernel, dcache_lock | ||
781 | protected all the lists. We applied RCU only on hash chain | ||
782 | walking. The rest of the lists are still protected by dcache_lock. | ||
783 | Some of the important changes are : | ||
784 | |||
785 | 1. The deletion from hash chain is done using hlist_del_rcu() macro which | ||
786 | doesn't initialize next pointer of the deleted dentry and this | ||
787 | allows us to walk safely lock-free while a deletion is happening. | ||
788 | |||
789 | 2. Insertion of a dentry into the hash table is done using | ||
790 | hlist_add_head_rcu() which take care of ordering the writes - | ||
791 | the writes to the dentry must be visible before the dentry | ||
792 | is inserted. This works in conjunction with hlist_for_each_rcu() | ||
793 | while walking the hash chain. The only requirement is that | ||
794 | all initialization to the dentry must be done before hlist_add_head_rcu() | ||
795 | since we don't have dcache_lock protection while traversing | ||
796 | the hash chain. This isn't different from the existing code. | ||
797 | |||
798 | 3. The dentry looked up without holding dcache_lock by cannot be | ||
799 | returned for walking if it is unhashed. It then may have a NULL | ||
800 | d_inode or other bogosity since RCU doesn't protect the other | ||
801 | fields in the dentry. We therefore use a flag DCACHE_UNHASHED to | ||
802 | indicate unhashed dentries and use this in conjunction with a | ||
803 | per-dentry lock (d_lock). Once looked up without the dcache_lock, | ||
804 | we acquire the per-dentry lock (d_lock) and check if the | ||
805 | dentry is unhashed. If so, the look-up is failed. If not, the | ||
806 | reference count of the dentry is increased and the dentry is returned. | ||
807 | |||
808 | 4. Once a dentry is looked up, it must be ensured during the path | ||
809 | walk for that component it doesn't go away. In pre-2.5.10 code, | ||
810 | this was done holding a reference to the dentry. dcache_rcu does | ||
811 | the same. In some sense, dcache_rcu path walking looks like | ||
812 | the pre-2.5.10 version. | ||
813 | |||
814 | 5. All dentry hash chain updates must take the dcache_lock as well as | ||
815 | the per-dentry lock in that order. dput() does this to ensure | ||
816 | that a dentry that has just been looked up in another CPU | ||
817 | doesn't get deleted before dget() can be done on it. | ||
818 | |||
819 | 6. There are several ways to do reference counting of RCU protected | ||
820 | objects. One such example is in ipv4 route cache where | ||
821 | deferred freeing (using call_rcu()) is done as soon as | ||
822 | the reference count goes to zero. This cannot be done in | ||
823 | the case of dentries because tearing down of dentries | ||
824 | require blocking (dentry_iput()) which isn't supported from | ||
825 | RCU callbacks. Instead, tearing down of dentries happen | ||
826 | synchronously in dput(), but actual freeing happens later | ||
827 | when RCU grace period is over. This allows safe lock-free | ||
828 | walking of the hash chains, but a matched dentry may have | ||
829 | been partially torn down. The checking of DCACHE_UNHASHED | ||
830 | flag with d_lock held detects such dentries and prevents | ||
831 | them from being returned from look-up. | ||
832 | |||
833 | |||
834 | Maintaining POSIX rename semantics | ||
835 | ================================== | ||
836 | |||
837 | Since look-up of dentries is lock-free, it can race against | ||
838 | a concurrent rename operation. For example, during rename | ||
839 | of file A to B, look-up of either A or B must succeed. | ||
840 | So, if look-up of B happens after A has been removed from the | ||
841 | hash chain but not added to the new hash chain, it may fail. | ||
842 | Also, a comparison while the name is being written concurrently | ||
843 | by a rename may result in false positive matches violating | ||
844 | rename semantics. Issues related to race with rename are | ||
845 | handled as described below : | ||
846 | |||
847 | 1. Look-up can be done in two ways - d_lookup() which is safe | ||
848 | from simultaneous renames and __d_lookup() which is not. | ||
849 | If __d_lookup() fails, it must be followed up by a d_lookup() | ||
850 | to correctly determine whether a dentry is in the hash table | ||
851 | or not. d_lookup() protects look-ups using a sequence | ||
852 | lock (rename_lock). | ||
853 | |||
854 | 2. The name associated with a dentry (d_name) may be changed if | ||
855 | a rename is allowed to happen simultaneously. To avoid memcmp() | ||
856 | in __d_lookup() go out of bounds due to a rename and false | ||
857 | positive comparison, the name comparison is done while holding the | ||
858 | per-dentry lock. This prevents concurrent renames during this | ||
859 | operation. | ||
860 | |||
861 | 3. Hash table walking during look-up may move to a different bucket as | ||
862 | the current dentry is moved to a different bucket due to rename. | ||
863 | But we use hlists in dcache hash table and they are null-terminated. | ||
864 | So, even if a dentry moves to a different bucket, hash chain | ||
865 | walk will terminate. [with a list_head list, it may not since | ||
866 | termination is when the list_head in the original bucket is reached]. | ||
867 | Since we redo the d_parent check and compare name while holding | ||
868 | d_lock, lock-free look-up will not race against d_move(). | ||
869 | |||
870 | 4. There can be a theoretical race when a dentry keeps coming back | ||
871 | to original bucket due to double moves. Due to this look-up may | ||
872 | consider that it has never moved and can end up in a infinite loop. | ||
873 | But this is not any worse that theoretical livelocks we already | ||
874 | have in the kernel. | ||
875 | |||
876 | |||
877 | Important guidelines for filesystem developers related to dcache_rcu | ||
878 | ==================================================================== | ||
879 | |||
880 | 1. Existing dcache interfaces (pre-2.5.62) exported to filesystem | ||
881 | don't change. Only dcache internal implementation changes. However | ||
882 | filesystems *must not* delete from the dentry hash chains directly | ||
883 | using the list macros like allowed earlier. They must use dcache | ||
884 | APIs like d_drop() or __d_drop() depending on the situation. | ||
885 | |||
886 | 2. d_flags is now protected by a per-dentry lock (d_lock). All | ||
887 | access to d_flags must be protected by it. | ||
888 | |||
889 | 3. For a hashed dentry, checking of d_count needs to be protected | ||
890 | by d_lock. | ||
891 | |||
892 | |||
893 | Papers and other documentation on dcache locking | ||
894 | ================================================ | ||
895 | |||
896 | 1. Scaling dcache with RCU (http://linuxjournal.com/article.php?sid=7124). | ||
897 | |||
898 | 2. http://lse.sourceforge.net/locking/dcache/dcache.html | ||
899 | 726 | ||
900 | 727 | ||
901 | Resources | 728 | Resources |