author     Linus Torvalds <torvalds@woody.linux-foundation.org>  2007-10-15 19:07:40 -0400
committer  Linus Torvalds <torvalds@woody.linux-foundation.org>  2007-10-15 19:07:40 -0400
commit     541010e4b8921cd781ff02ae68028501457045b6
tree       58bd529d4c6e69899a0aa20afa2d7f1c23326417 /Documentation/filesystems
parent     e457f790d8b05977853aa238bbc667b3bb375671
parent     5e7fc436426b1f9e106f511a049de91c82ec2c53
Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove IS_ISMNDLCK macro
  Rework /proc/locks via seq_files and seq_list helpers
  fs/locks.c: use list_for_each_entry() instead of list_for_each()
  NFS: clean up explicit check for mandatory locks
  AFS: clean up explicit check for mandatory locks
  9PFS: clean up explicit check for mandatory locks
  GFS2: clean up explicit check for mandatory locks
  Cleanup macros for distinguishing mandatory locks
  Documentation: move locks.txt in filesystems/
  locks: add warning about mandatory locking races
  Documentation: move mandatory locking documentation to filesystems/
  locks: Fix potential OOPS in generic_setlease()
  Use list_first_entry in locks_wake_up_blocks
  locks: fix flock_lock_file() comment
  Memory shortage can result in inconsistent flocks state
  locks: kill redundant local variable
  locks: reverse order of posix_locks_conflict() arguments
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--  Documentation/filesystems/00-INDEX                  4
-rw-r--r--  Documentation/filesystems/locks.txt                67
-rw-r--r--  Documentation/filesystems/mandatory-locking.txt   171
3 files changed, 242 insertions, 0 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 59db1bca7027..599593a17067 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -52,6 +52,10 @@ isofs.txt
 - info and mount options for the ISO 9660 (CDROM) filesystem.
 jfs.txt
 - info and mount options for the JFS filesystem.
+locks.txt
+- info on file locking implementations, flock() vs. fcntl(), etc.
+mandatory-locking.txt
+- info on the Linux implementation of Sys V mandatory file locking.
 ncpfs.txt
 - info on Novell Netware(tm) filesystem using NCP protocol.
 ntfs.txt
diff --git a/Documentation/filesystems/locks.txt b/Documentation/filesystems/locks.txt
new file mode 100644
index 000000000000..fab857accbd6
--- /dev/null
+++ b/Documentation/filesystems/locks.txt
@@ -0,0 +1,67 @@
+File Locking Release Notes
+
+Andy Walker <andy@lysaker.kvaerner.no>
+
+12 May 1997
+
+
+1. What's New?
+--------------
+
+1.1 Broken Flock Emulation
+--------------------------
+
+The old flock(2) emulation in the kernel was swapped for proper
+BSD-compatible flock(2) support in the 1.3.x series of kernels. With the
+release of the 2.1.x kernel series, support for the old emulation has
+been totally removed, so that we don't need to carry this baggage
+forever.
+
+This should not cause problems for anybody, since everybody using a
+2.1.x kernel should have updated their C library to a suitable version
+anyway (see the file "Documentation/Changes").
+
+1.2 Allow Mixed Locks Again
+---------------------------
+
+1.2.1 Typical Problems - Sendmail
+---------------------------------
+Because sendmail was unable to use the old flock() emulation, many sendmail
+installations use fcntl() instead of flock(). This is true of Slackware 3.0,
+for example. This gave rise to some other subtle problems if sendmail was
+configured to rebuild the alias file. Sendmail tried to lock the aliases.dir
+file with fcntl() at the same time as the GDBM routines tried to lock this
+file with flock(). With pre-1.3.96 kernels this could result in deadlocks that,
+over time, or under a very heavy mail load, would eventually cause the kernel
+to lock solid with deadlocked processes.
+
+
+1.2.2 The Solution
+------------------
+The solution I have chosen, after much experimentation and discussion,
+is to make flock() and fcntl() locks oblivious to each other. Both can
+exist, and neither will have any effect on the other.
+
+I wanted the two lock styles to be cooperative, but there were so many
+race and deadlock conditions that the current solution was the only
+practical one. It puts us in the same position as, for example, SunOS
+4.1.x and several other commercial Unices. The only OSes that support
+cooperative flock()/fcntl() are those that emulate flock() using
+fcntl(), with all the problems that implies.
+
+
+1.3 Mandatory Locking As A Mount Option
+---------------------------------------
+
+Mandatory locking, as described in 'Documentation/filesystems/mandatory-locking.txt',
+was, prior to this release, a general configuration option that was valid for
+all mounted filesystems. This had a number of inherent dangers, not the
+least of which was the ability to freeze an NFS server by asking it to read
+a file for which a mandatory lock existed.
+
+From this release of the kernel, mandatory locking can be turned on and off
+on a per-filesystem basis, using the mount options 'mand' and 'nomand'.
+The default is to disallow mandatory locking. The intention is that
+mandatory locking only be enabled on a local filesystem as the specific need
+arises.
+
diff --git a/Documentation/filesystems/mandatory-locking.txt b/Documentation/filesystems/mandatory-locking.txt
new file mode 100644
index 000000000000..0979d1d2ca8b
--- /dev/null
+++ b/Documentation/filesystems/mandatory-locking.txt
@@ -0,0 +1,171 @@
+Mandatory File Locking For The Linux Operating System
+
+Andy Walker <andy@lysaker.kvaerner.no>
+
+15 April 1996
+(Updated September 2007)
+
+0. Why you should avoid mandatory locking
+-----------------------------------------
+
+The Linux implementation is prey to a number of difficult-to-fix race
+conditions which, in practice, make it unreliable:
+
+- The write system call checks for a mandatory lock only once
+  at its start. It is therefore possible for a lock request to
+  be granted after this check but before the data is modified.
+  A process may then see file data change even while a mandatory
+  lock is held.
+- Similarly, an exclusive lock may be granted on a file after
+  the kernel has decided to proceed with a read, but before the
+  read has actually completed, and the reading process may see
+  the file data in a state which should not have been visible
+  to it.
+- Similar races also make the claimed mutual exclusion between
+  lock and mmap unreliable.
+
+1. What is mandatory locking?
+-----------------------------
+
+Mandatory locking is kernel-enforced file locking, as opposed to the more usual
+cooperative file locking used to guarantee sequential access to files among
+processes. File locks are applied using the flock() and fcntl() system calls
+(and the lockf() library routine, which is a wrapper around fcntl()). It is
+normally a process's responsibility to check for locks on a file it wishes to
+update, before applying its own lock, updating the file and unlocking it again.
+The most commonly used example of this (and in the case of sendmail, the most
+troublesome) is access to a user's mailbox. The mail user agent and the mail
+transfer agent must guard against updating the mailbox at the same time, and
+prevent reading the mailbox while it is being updated.
+
+In a perfect world all processes would use and honour a cooperative, or
+"advisory", locking scheme. However, the world isn't perfect, and there's
+a lot of poorly written code out there.
+
+In trying to address this problem, the designers of System V UNIX came up
+with a "mandatory" locking scheme, whereby the operating system kernel would
+block attempts by a process to write to a file that another process holds a
+"read" -or- "shared" lock on, and block attempts to both read and write to a
+file that a process holds a "write" -or- "exclusive" lock on.
+
+The System V mandatory locking scheme was intended to have as little impact as
+possible on existing user code. The scheme is based on marking individual files
+as candidates for mandatory locking, and using the existing fcntl()/lockf()
+interface for applying locks just as if they were normal, advisory locks.
+
+Note 1: In saying "file" in the paragraphs above I am actually not telling
+the whole truth. System V locking is based on fcntl(). The granularity of
+fcntl() is such that it allows the locking of byte ranges in files, in addition
+to entire files, so the mandatory locking rules also have byte-level
+granularity.
+
+Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
+borrowing the fcntl() locking scheme from System V. The mandatory locking
+scheme is defined by the System V Interface Definition (SVID) Version 3.
+
+2. Marking a file for mandatory locking
+---------------------------------------
+
+A file is marked as a candidate for mandatory locking by setting the group-id
+bit in its file mode but removing the group-execute bit. This is an otherwise
+meaningless combination, and was chosen by the System V implementors so as not
+to break existing user programs.
+
+Note that the group-id bit is usually automatically cleared by the kernel when
+a setgid file is written to. This is a security measure. The kernel has been
+modified to recognize the special case of a mandatory lock candidate and to
+refrain from clearing this bit. Similarly, the kernel has been modified not
+to run mandatory lock candidates with setgid privileges.
+
+3. Available implementations
+----------------------------
+
+I have considered the implementations of mandatory locking available with
+SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.
+
+Generally I have tried to make the most sense out of the behaviour exhibited
+by these three reference systems. There are many anomalies.
+
+All the reference systems reject all calls to open() for a file on which
+another process has outstanding mandatory locks. This is in direct
+contravention of SVID 3, which states that only calls to open() with the
+O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
+definition, which is the "Right Thing", since only calls with O_TRUNC can
+modify the contents of the file.
+
+HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
+just mandatory locks. That would appear to contravene POSIX.1.
+
+mmap() is another interesting case. All the operating systems mentioned
+prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
+also disallows advisory locks for such a file. SVID actually specifies the
+paranoid HP-UX behaviour.
+
+In my opinion, only MAP_SHARED mappings should be immune from locking, and then
+only from mandatory locks - that is what is currently implemented.
+
+SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
+mandatory locks, so reads and writes to locked files always block when they
+should return EAGAIN.
+
+I'm afraid that this is such an esoteric area that the semantics described
+below are just as valid as any others, so long as the main points seem to
+agree.
+
+4. Semantics
+------------
+
+1. Mandatory locks can only be applied via the fcntl()/lockf() locking
+   interface - in other words, the System V/POSIX interface. BSD-style
+   locks using flock() never result in a mandatory lock.
+
+2. If a process has locked a region of a file with a mandatory read lock, then
+   other processes are permitted to read from that region. If any of these
+   processes attempts to write to the region, it will block until the lock is
+   released, unless the process has opened the file with the O_NONBLOCK
+   flag, in which case the system call will return immediately with the error
+   status EAGAIN.
+
+3. If a process has locked a region of a file with a mandatory write lock, all
+   attempts to read or write to that region block until the lock is released,
+   unless a process has opened the file with the O_NONBLOCK flag, in which case
+   the system call will return immediately with the error status EAGAIN.
+
+4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
+   any mandatory locks owned by other processes will be rejected with the
+   error status EAGAIN.
+
+5. Attempts to apply a mandatory lock to a file that is memory mapped and
+   shared (via mmap() with MAP_SHARED) will be rejected with the error status
+   EAGAIN.
+
+6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
+   that has any mandatory locks in effect will be rejected with the error status
+   EAGAIN.
+
+5. Which system calls are affected?
+-----------------------------------
+
+Those which read or modify a file's contents, not just the inode. That gives read(),
+write(), readv(), writev(), open(), creat(), mmap(), truncate() and
+ftruncate(). truncate() and ftruncate() are considered to be "write" actions
+for the purposes of mandatory locking.
+
+The affected region is usually defined as stretching from the current position
+for the total number of bytes read or written. For the truncate calls, it is
+defined as the bytes of a file removed or added (we must also consider bytes
+added, as a lock can specify just "the whole file", rather than a specific
+range of bytes).
+
+Note 3: I may have overlooked some system calls that need mandatory lock
+checking in my eagerness to get this code out the door. Please let me know, or,
+better still, fix the system calls yourself and submit a patch to me or Linus.
+
+6. Warning!
+-----------
+
+Not even root can override a mandatory lock, so runaway processes can wreak
+havoc if they lock crucial files. The way around it is to change the file
+permissions (remove the setgid bit) before trying to read or write to it.
+Of course, that might be a bit tricky if the system is hung :-(
+