Diffstat (limited to 'Documentation/filesystems/mandatory-locking.txt')
-rw-r--r-- | Documentation/filesystems/mandatory-locking.txt | 171 |
1 files changed, 171 insertions, 0 deletions
diff --git a/Documentation/filesystems/mandatory-locking.txt b/Documentation/filesystems/mandatory-locking.txt
new file mode 100644
index 000000000000..0979d1d2ca8b
--- /dev/null
+++ b/Documentation/filesystems/mandatory-locking.txt
@@ -0,0 +1,171 @@
1 | Mandatory File Locking For The Linux Operating System | ||
2 | |||
3 | Andy Walker <andy@lysaker.kvaerner.no> | ||
4 | |||
5 | 15 April 1996 | ||
6 | (Updated September 2007) | ||
7 | |||
8 | 0. Why you should avoid mandatory locking | ||
9 | ----------------------------------------- | ||
10 | |||
11 | The Linux implementation is prey to a number of difficult-to-fix race | ||
12 | conditions which in practice make it not dependable: | ||
13 | |||
14 | - The write system call checks for a mandatory lock only once | ||
15 | at its start. It is therefore possible for a lock request to | ||
16 | be granted after this check but before the data is modified. | ||
17 | A process may then see file data change even while a mandatory | ||
18 | lock is held. | ||
19 | - Similarly, an exclusive lock may be granted on a file after | ||
20 | the kernel has decided to proceed with a read, but before the | ||
21 | read has actually completed, and the reading process may see | ||
22 | the file data in a state which should not have been visible | ||
23 | to it. | ||
24 | - Comparable races make the claimed mutual exclusion between lock | ||
25 | and mmap equally unreliable. | ||
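The first race above is a check-then-act gap. As a minimal sketch, the toy model below (plain Python, not kernel code; ToyFile and racy_write are hypothetical names invented for this illustration) splits a write into its single lock check and its data update, so that a lock can be granted in between:

```python
class ToyFile:
    """In-memory stand-in for a file with a single whole-file lock flag."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.locked = False


def racy_write(f: ToyFile, payload: bytes):
    """Generator modelling a write that checks the lock only once, at its start.

    It yields after the check so that a caller can interleave a lock
    request before the data is actually modified.
    """
    if f.locked:
        raise BlockingIOError("file is locked")
    yield "checked"                     # window in which a lock may be granted
    f.data[: len(payload)] = payload    # modification happens with no re-check
    yield "written"
```

Stepping the generator once, setting f.locked = True, and stepping it again leaves the data modified while the lock flag is set, which is the outcome the first bullet describes.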
26 | |||
27 | 1. What is mandatory locking? | ||
28 | ------------------------------ | ||
29 | |||
30 | Mandatory locking is kernel-enforced file locking, as opposed to the more usual | ||
31 | cooperative file locking used to guarantee sequential access to files among | ||
32 | processes. File locks are applied using the flock() and fcntl() system calls | ||
33 | (and the lockf() library routine which is a wrapper around fcntl().) It is | ||
34 | normally a process's responsibility to check for locks on a file it wishes to | ||
35 | update, before applying its own lock, updating the file and unlocking it again. | ||
36 | The most commonly used example of this (and in the case of sendmail, the most | ||
37 | troublesome) is access to a user's mailbox. The mail user agent and the mail | ||
38 | transfer agent must guard against updating the mailbox at the same time, and | ||
39 | prevent reading the mailbox while it is being updated. | ||
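The cooperative lock, update, unlock sequence described above can be sketched with Python's fcntl module, which wraps the same fcntl() locking interface. This is only an illustration: append_under_lock is a hypothetical helper name, and the lock protects the file only against other processes that also take the lock.

```python
import fcntl
import os


def append_under_lock(path: str, message: bytes) -> None:
    """Append to a file while holding an exclusive advisory fcntl() lock."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)   # blocks until cooperating lockers release
        os.write(fd, message)            # update while the lock is held
        fcntl.lockf(fd, fcntl.LOCK_UN)   # explicit unlock
    finally:
        os.close(fd)                     # closing also drops any lock still held
```

A mail delivery agent and a mail reader that both go through a routine like this cannot interleave their updates; a program that skips the lock is not stopped by anything.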
40 | |||
41 | In a perfect world all processes would use and honour a cooperative, or | ||
42 | "advisory" locking scheme. However, the world isn't perfect, and there's | ||
43 | a lot of poorly written code out there. | ||
44 | |||
45 | In trying to address this problem, the designers of System V UNIX came up | ||
46 | with a "mandatory" locking scheme, whereby the operating system kernel would | ||
47 | block attempts by a process to write to a file that another process holds a | ||
48 | "read" -or- "shared" lock on, and block attempts to both read and write to a | ||
49 | file that a process holds a "write" -or- "exclusive" lock on. | ||
50 | |||
51 | The System V mandatory locking scheme was intended to have as little impact as | ||
52 | possible on existing user code. The scheme is based on marking individual files | ||
53 | as candidates for mandatory locking, and using the existing fcntl()/lockf() | ||
54 | interface for applying locks just as if they were normal, advisory locks. | ||
55 | |||
56 | Note 1: In saying "file" in the paragraphs above I am actually not telling | ||
57 | the whole truth. System V locking is based on fcntl(). The granularity of | ||
58 | fcntl() is such that it allows the locking of byte ranges in files, in addition | ||
59 | to entire files, so the mandatory locking rules also have byte level | ||
60 | granularity. | ||
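To make the byte-range granularity of Note 1 concrete, here is a hedged sketch using Python's fcntl.lockf(), which takes a length and a start offset. The helper names try_lock_range and other_process_can_lock are assumptions made for this example. Because fcntl() locks belong to a process, the conflict is probed from a forked child:

```python
import fcntl
import os


def try_lock_range(fd: int, start: int, length: int) -> bool:
    """Try a non-blocking exclusive lock on bytes [start, start + length)."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
    except OSError:   # EACCES or EAGAIN: another process holds a conflicting lock
        return False
    return True


def other_process_can_lock(path: str, start: int, length: int) -> bool:
    """Fork a child that probes the range and reports through its exit status."""
    pid = os.fork()
    if pid == 0:                          # child: holds none of the parent's locks
        code = 1
        try:
            fd = os.open(path, os.O_RDWR)
            code = 0 if try_lock_range(fd, start, length) else 1
        finally:
            os._exit(code)                # never fall back into the parent's code
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 0
```

If the parent holds a lock on bytes 0 to 9, a child probing bytes 0 to 4 is refused while a child probing bytes 10 to 19 succeeds, even though both ranges are in the same file.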
61 | |||
62 | Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite | ||
63 | borrowing the fcntl() locking scheme from System V. The mandatory locking | ||
64 | scheme is defined by the System V Interface Definition (SVID) Version 3. | ||
65 | |||
66 | 2. Marking a file for mandatory locking | ||
67 | --------------------------------------- | ||
68 | |||
69 | A file is marked as a candidate for mandatory locking by setting the group-id | ||
70 | bit in its file mode but removing the group-execute bit. This is an otherwise | ||
71 | meaningless combination, and was chosen by the System V implementors so as not | ||
72 | to break existing user programs. | ||
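The mode test described above (setgid set, group-execute clear) is easy to express with the constants in Python's stat module. This is a sketch; the three function names are hypothetical and chosen only for this illustration:

```python
import os
import stat


def is_mandatory_lock_candidate(mode: int) -> bool:
    """True when the setgid bit is set and the group-execute bit is clear."""
    return bool(mode & stat.S_ISGID) and not (mode & stat.S_IXGRP)


def candidate_mode(mode: int) -> int:
    """Return mode with setgid added and group-execute removed."""
    return (mode | stat.S_ISGID) & ~stat.S_IXGRP


def mark_for_mandatory_locking(path: str) -> None:
    """Apply the candidate mode to an existing file (like chmod g+s,g-x)."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, candidate_mode(current))
```

For instance, mode 02644 is a candidate, whereas 02755 (an ordinary setgid executable) and 0644 are not.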
73 | |||
74 | Note that the group-id bit is usually automatically cleared by the kernel when | ||
75 | a setgid file is written to. This is a security measure. The kernel has been | ||
76 | modified to recognize the special case of a mandatory lock candidate and to | ||
77 | refrain from clearing this bit. Similarly the kernel has been modified not | ||
78 | to run mandatory lock candidates with setgid privileges. | ||
79 | |||
80 | 3. Available implementations | ||
81 | ---------------------------- | ||
82 | |||
83 | I have considered the implementations of mandatory locking available with | ||
84 | SunOS 4.1.x, Solaris 2.x and HP-UX 9.x. | ||
85 | |||
86 | Generally I have tried to make the most sense out of the behaviour exhibited | ||
87 | by these three reference systems. There are many anomalies. | ||
88 | |||
89 | All the reference systems reject all calls to open() for a file on which | ||
90 | another process has outstanding mandatory locks. This is in direct | ||
91 | contravention of SVID 3, which states that only calls to open() with the | ||
92 | O_TRUNC flag set should be rejected. The Linux implementation follows the SVID | ||
93 | definition, which is the "Right Thing", since only calls with O_TRUNC can | ||
94 | modify the contents of the file. | ||
95 | |||
96 | HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not | ||
97 | just mandatory locks. That would appear to contravene POSIX.1. | ||
98 | |||
99 | mmap() is another interesting case. All the operating systems mentioned | ||
100 | prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX | ||
101 | also disallows advisory locks for such a file. SVID actually specifies the | ||
102 | paranoid HP-UX behaviour. | ||
103 | |||
104 | In my opinion only MAP_SHARED mappings should be immune from locking, and then | ||
105 | only from mandatory locks - that is what is currently implemented. | ||
106 | |||
107 | SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for | ||
108 | mandatory locks, so reads and writes to locked files always block when they | ||
109 | should return EAGAIN. | ||
110 | |||
111 | I'm afraid that this is such an esoteric area that the semantics described | ||
112 | below are just as valid as any others, so long as the main points seem to | ||
113 | agree. | ||
114 | |||
115 | 4. Semantics | ||
116 | ------------ | ||
117 | |||
118 | 1. Mandatory locks can only be applied via the fcntl()/lockf() locking | ||
119 | interface - in other words the System V/POSIX interface. BSD style | ||
120 | locks using flock() never result in a mandatory lock. | ||
121 | |||
122 | 2. If a process has locked a region of a file with a mandatory read lock, then | ||
123 | other processes are permitted to read from that region. If any of these | ||
124 | processes attempts to write to the region it will block until the lock is | ||
125 | released, unless the process has opened the file with the O_NONBLOCK | ||
126 | flag, in which case the system call will return immediately with the error | ||
127 | status EAGAIN. | ||
128 | |||
129 | 3. If a process has locked a region of a file with a mandatory write lock, all | ||
130 | attempts to read or write to that region block until the lock is released, | ||
131 | unless the process has opened the file with the O_NONBLOCK flag, in which case | ||
132 | the system call will return immediately with the error status EAGAIN. | ||
133 | |||
134 | 4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has | ||
135 | any mandatory locks owned by other processes will be rejected with the | ||
136 | error status EAGAIN. | ||
137 | |||
138 | 5. Attempts to apply a mandatory lock to a file that is memory mapped and | ||
139 | shared (via mmap() with MAP_SHARED) will be rejected with the error status | ||
140 | EAGAIN. | ||
141 | |||
142 | 6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED) | ||
143 | that has any mandatory locks in effect will be rejected with the error status | ||
144 | EAGAIN. | ||
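Rules 2 and 3 above amount to a small decision table. The following sketch models it as a pure function. It is an illustration of the rules as stated, not kernel code, and access_outcome and its parameter names are invented here:

```python
def access_outcome(lock: str, op: str, overlaps: bool, nonblocking: bool) -> str:
    """Model rules 2 and 3 for a read() or write() by another process.

    lock: "read" (shared) or "write" (exclusive) mandatory lock held elsewhere
    op:   "read" or "write" attempted by this process
    overlaps: whether the access touches the locked region
    nonblocking: whether the file was opened with O_NONBLOCK
    Returns "proceed", "block", or "EAGAIN".
    """
    if lock not in ("read", "write") or op not in ("read", "write"):
        raise ValueError("lock and op must be 'read' or 'write'")
    # A read lock only excludes writers; a write lock excludes everyone.
    conflict = overlaps and (lock == "write" or op == "write")
    if not conflict:
        return "proceed"
    return "EAGAIN" if nonblocking else "block"
```

So a read against a read lock proceeds, a write against a read lock blocks (or fails with EAGAIN under O_NONBLOCK), and any overlapping access against a write lock blocks or fails the same way.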
145 | |||
146 | 5. Which system calls are affected? | ||
147 | ----------------------------------- | ||
148 | |||
149 | Those which read or modify a file's contents, not just the inode. That gives read(), | ||
150 | write(), readv(), writev(), open(), creat(), mmap(), truncate() and | ||
151 | ftruncate(). truncate() and ftruncate() are considered to be "write" actions | ||
152 | for the purposes of mandatory locking. | ||
153 | |||
154 | The affected region is usually defined as stretching from the current position | ||
155 | for the total number of bytes read or written. For the truncate calls it is | ||
156 | defined as the bytes of a file removed or added (we must also consider bytes | ||
157 | added, as a lock can specify just "the whole file", rather than a specific | ||
158 | range of bytes.) | ||
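The region arithmetic above can be sketched with half-open byte ranges. All helper names here are hypothetical, and the sketch assumes the usual fcntl() convention that a lock length of 0 means "to the end of the file, however far it grows":

```python
import math


def io_region(position: int, count: int) -> tuple:
    """Half-open byte range [start, end) touched by a read or write."""
    return (position, position + count)


def truncate_region(old_size: int, new_size: int) -> tuple:
    """Bytes removed (shrinking) or added (growing) by a truncate call."""
    return (min(old_size, new_size), max(old_size, new_size))


def lock_region(start: int, length: int) -> tuple:
    """Range covered by a lock; length 0 is taken to mean 'to end of file'."""
    return (start, math.inf if length == 0 else start + length)


def overlaps(a: tuple, b: tuple) -> bool:
    """True when two half-open ranges share at least one byte."""
    return max(a[0], b[0]) < min(a[1], b[1])
```

Growing a 100-byte file to 200 bytes touches bytes 100 to 199. That does not overlap a lock on bytes 0 to 99, but it does overlap a whole-file lock, which is why added bytes have to be considered too.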
159 | |||
160 | Note 3: I may have overlooked some system calls that need mandatory lock | ||
161 | checking in my eagerness to get this code out the door. Please let me know, or | ||
162 | better still fix the system calls yourself and submit a patch to me or Linus. | ||
163 | |||
164 | 6. Warning! | ||
165 | ----------- | ||
166 | |||
167 | Not even root can override a mandatory lock, so runaway processes can wreak | ||
168 | havoc if they lock crucial files. The way around it is to change the file | ||
169 | permissions (remove the setgid bit) before trying to read or write to it. | ||
170 | Of course, that might be a bit tricky if the system is hung :-( | ||
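The workaround mentioned above, removing the setgid bit, could look like this from a script. It is a minimal sketch: clear_mandatory_mark is a hypothetical helper name, and the caller needs permission to chmod the file.

```python
import os
import stat


def clear_mandatory_mark(path: str) -> int:
    """Remove the setgid bit from path and return the new permission bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = mode & ~stat.S_ISGID
    os.chmod(path, new_mode)
    return new_mode
```

Once the bit is cleared the file is no longer a mandatory lock candidate, so any locks on it are treated as ordinary advisory locks.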
171 | |||