author Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-10-15 19:07:40 -0400
committer Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-10-15 19:07:40 -0400
commit 541010e4b8921cd781ff02ae68028501457045b6
tree 58bd529d4c6e69899a0aa20afa2d7f1c23326417 /Documentation/filesystems
parent e457f790d8b05977853aa238bbc667b3bb375671
parent 5e7fc436426b1f9e106f511a049de91c82ec2c53
Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove IS_ISMNDLCK macro
  Rework /proc/locks via seq_files and seq_list helpers
  fs/locks.c: use list_for_each_entry() instead of list_for_each()
  NFS: clean up explicit check for mandatory locks
  AFS: clean up explicit check for mandatory locks
  9PFS: clean up explicit check for mandatory locks
  GFS2: clean up explicit check for mandatory locks
  Cleanup macros for distinguishing mandatory locks
  Documentation: move locks.txt in filesystems/
  locks: add warning about mandatory locking races
  Documentation: move mandatory locking documentation to filesystems/
  locks: Fix potential OOPS in generic_setlease()
  Use list_first_entry in locks_wake_up_blocks
  locks: fix flock_lock_file() comment
  Memory shortage can result in inconsistent flocks state
  locks: kill redundant local variable
  locks: reverse order of posix_locks_conflict() arguments
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--  Documentation/filesystems/00-INDEX                  4
-rw-r--r--  Documentation/filesystems/locks.txt                67
-rw-r--r--  Documentation/filesystems/mandatory-locking.txt   171
3 files changed, 242 insertions, 0 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 59db1bca7027..599593a17067 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -52,6 +52,10 @@ isofs.txt
 	- info and mount options for the ISO 9660 (CDROM) filesystem.
 jfs.txt
 	- info and mount options for the JFS filesystem.
+locks.txt
+	- info on file locking implementations, flock() vs. fcntl(), etc.
+mandatory-locking.txt
+	- info on the Linux implementation of Sys V mandatory file locking.
 ncpfs.txt
 	- info on Novell Netware(tm) filesystem using NCP protocol.
 ntfs.txt
diff --git a/Documentation/filesystems/locks.txt b/Documentation/filesystems/locks.txt
new file mode 100644
index 000000000000..fab857accbd6
--- /dev/null
+++ b/Documentation/filesystems/locks.txt
@@ -0,0 +1,67 @@
                      File Locking Release Notes

                Andy Walker <andy@lysaker.kvaerner.no>

                            12 May 1997


1. What's New?
--------------

1.1 Broken Flock Emulation
--------------------------

The old flock(2) emulation in the kernel was swapped for proper BSD
compatible flock(2) support in the 1.3.x series of kernels. With the
release of the 2.1.x kernel series, support for the old emulation has
been totally removed, so that we don't need to carry this baggage
forever.

This should not cause problems for anybody, since everybody using a
2.1.x kernel should have updated their C library to a suitable version
anyway (see the file "Documentation/Changes".)

1.2 Allow Mixed Locks Again
---------------------------

1.2.1 Typical Problems - Sendmail
---------------------------------
Because sendmail was unable to use the old flock() emulation, many sendmail
installations use fcntl() instead of flock(). This is true of Slackware 3.0
for example. This gave rise to some other subtle problems if sendmail was
configured to rebuild the alias file. Sendmail tried to lock the aliases.dir
file with fcntl() at the same time as the GDBM routines tried to lock this
file with flock(). With pre 1.3.96 kernels this could result in deadlocks that,
over time, or under a very heavy mail load, would eventually cause the kernel
to lock solid with deadlocked processes.


1.2.2 The Solution
------------------
The solution I have chosen, after much experimentation and discussion,
is to make flock() and fcntl() locks oblivious to each other. Both can
exist, and neither will have any effect on the other.

I wanted the two lock styles to be cooperative, but there were so many
race and deadlock conditions that the current solution was the only
practical one. It puts us in the same position as, for example, SunOS
4.1.x and several other commercial Unices. The only OSes that support
cooperative flock()/fcntl() are those that emulate flock() using
fcntl(), with all the problems that implies.


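The independence of the two lock styles described in 1.2.2 can be observed
from user space. The following is a minimal sketch, assuming a local Linux
filesystem; the helper name locks_are_independent() is hypothetical and is
not part of the kernel or of this documentation. The parent holds an
exclusive flock() lock while a child asks for an exclusive fcntl() lock on
the same file.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a child process can take an exclusive
 * fcntl() lock on `path` while the caller holds an exclusive flock() lock
 * on it, 0 if the fcntl() lock is refused, and -1 on any other error.
 */
int locks_are_independent(const char *path)
{
    int status;
    pid_t pid;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    /* BSD-style lock on the whole file, held by the parent. */
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: POSIX-style write lock; l_len == 0 means "to end of file". */
        struct flock fl = { 0 };
        int cfd = open(path, O_RDWR);

        if (cfd < 0)
            _exit(2);
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        _exit(fcntl(cfd, F_SETLK, &fl) == 0 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }
    close(fd);  /* closing the descriptor drops the flock() lock */

    if (!WIFEXITED(status) || WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

On a kernel that behaves as described above, the helper is expected to
return 1: the child's fcntl() lock is granted even though the parent's
flock() lock is still held.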
1.3 Mandatory Locking As A Mount Option
---------------------------------------

Mandatory locking, as described in
'Documentation/filesystems/mandatory-locking.txt', was prior to this release
a general configuration option that was valid for all mounted filesystems.
This had a number of inherent dangers, not the least of which was the
ability to freeze an NFS server by asking it to read a file for which a
mandatory lock existed.

From this release of the kernel, mandatory locking can be turned on and off
on a per-filesystem basis, using the mount options 'mand' and 'nomand'.
The default is to disallow mandatory locking. The intention is that
mandatory locking only be enabled on a local filesystem as the specific need
arises.

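A program can ask whether the filesystem holding a given path was mounted
with the 'mand' option. The following is a minimal sketch, assuming a Linux
C library that reports mount flags through statvfs(); the helper name
fs_allows_mandatory_locks() is hypothetical, and the fallback value for
ST_MANDLOCK is an assumption taken from the Linux definition.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc exposes ST_MANDLOCK only with this */
#endif
#include <sys/statvfs.h>

#ifndef ST_MANDLOCK
#define ST_MANDLOCK 64          /* assumed Linux value, used only as a fallback */
#endif

/*
 * Hypothetical helper: returns 1 if the filesystem containing `path`
 * reports the mandatory-locking mount flag, 0 if it does not, and -1 if
 * statvfs() fails.
 */
int fs_allows_mandatory_locks(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) < 0)
        return -1;
    return (sv.f_flag & ST_MANDLOCK) != 0;
}
```

The check only reports the mount flag. Whether the running kernel enforces
mandatory locks at all is a separate question that this sketch does not
answer.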
diff --git a/Documentation/filesystems/mandatory-locking.txt b/Documentation/filesystems/mandatory-locking.txt
new file mode 100644
index 000000000000..0979d1d2ca8b
--- /dev/null
+++ b/Documentation/filesystems/mandatory-locking.txt
@@ -0,0 +1,171 @@
        Mandatory File Locking For The Linux Operating System

                Andy Walker <andy@lysaker.kvaerner.no>

                           15 April 1996
                     (Updated September 2007)

0. Why you should avoid mandatory locking
-----------------------------------------

The Linux implementation is prey to a number of difficult-to-fix race
conditions which in practice make it not dependable:

    - The write system call checks for a mandatory lock only once
      at its start. It is therefore possible for a lock request to
      be granted after this check but before the data is modified.
      A process may then see file data change even while a mandatory
      lock was held.
    - Similarly, an exclusive lock may be granted on a file after
      the kernel has decided to proceed with a read, but before the
      read has actually completed, and the reading process may see
      the file data in a state which should not have been visible
      to it.
    - Similar races make the claimed mutual exclusion between lock
      and mmap similarly unreliable.

1. What is mandatory locking?
------------------------------

Mandatory locking is kernel-enforced file locking, as opposed to the more usual
cooperative file locking used to guarantee sequential access to files among
processes. File locks are applied using the flock() and fcntl() system calls
(and the lockf() library routine which is a wrapper around fcntl().) It is
normally a process' responsibility to check for locks on a file it wishes to
update, before applying its own lock, updating the file and unlocking it again.
The most commonly used example of this (and in the case of sendmail, the most
troublesome) is access to a user's mailbox. The mail user agent and the mail
transfer agent must guard against updating the mailbox at the same time, and
prevent reading the mailbox while it is being updated.

In a perfect world all processes would use and honour a cooperative, or
"advisory" locking scheme. However, the world isn't perfect, and there's
a lot of poorly written code out there.

In trying to address this problem, the designers of System V UNIX came up
with a "mandatory" locking scheme, whereby the operating system kernel would
block attempts by a process to write to a file that another process holds a
"read" -or- "shared" lock on, and block attempts to both read and write to a
file that a process holds a "write" -or- "exclusive" lock on.

The System V mandatory locking scheme was intended to have as little impact as
possible on existing user code. The scheme is based on marking individual files
as candidates for mandatory locking, and using the existing fcntl()/lockf()
interface for applying locks just as if they were normal, advisory locks.

Note 1: In saying "file" in the paragraphs above I am actually not telling
the whole truth. System V locking is based on fcntl(). The granularity of
fcntl() is such that it allows the locking of byte ranges in files, in addition
to entire files, so the mandatory locking rules also have byte level
granularity.

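The byte-range granularity mentioned in Note 1 comes from the struct flock
argument of fcntl(). The following is a minimal sketch; the helper name
lock_range() is hypothetical. The same call yields an advisory or a
mandatory lock depending only on how the file is marked and mounted.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to lock bytes [start, start + len) of `fd`
 * without blocking. `type` is F_RDLCK, F_WRLCK or F_UNLCK; len == 0 means
 * "from start to the end of the file, however far it grows".
 * Returns 0 on success, -1 with errno set on failure.
 */
int lock_range(int fd, short type, off_t start, off_t len)
{
    struct flock fl = { 0 };

    fl.l_type = type;
    fl.l_whence = SEEK_SET;     /* start is an absolute file offset */
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLK, &fl);
}
```

For example, lock_range(fd, F_WRLCK, 100, 50) covers bytes 100 through 149
only, and leaves the rest of the file unlocked.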
Note 2: POSIX.1 does not specify any scheme for mandatory locking, despite
borrowing the fcntl() locking scheme from System V. The mandatory locking
scheme is defined by the System V Interface Definition (SVID) Version 3.

2. Marking a file for mandatory locking
---------------------------------------

A file is marked as a candidate for mandatory locking by setting the group-id
bit in its file mode but removing the group-execute bit. This is an otherwise
meaningless combination, and was chosen by the System V implementors so as not
to break existing user programs.

Note that the group-id bit is usually automatically cleared by the kernel when
a setgid file is written to. This is a security measure. The kernel has been
modified to recognize the special case of a mandatory lock candidate and to
refrain from clearing this bit. Similarly the kernel has been modified not
to run mandatory lock candidates with setgid privileges.

3. Available implementations
----------------------------

I have considered the implementations of mandatory locking available with
SunOS 4.1.x, Solaris 2.x and HP-UX 9.x.

Generally I have tried to make the most sense out of the behaviour exhibited
by these three reference systems. There are many anomalies.

All the reference systems reject all calls to open() for a file on which
another process has outstanding mandatory locks. This is in direct
contravention of SVID 3, which states that only calls to open() with the
O_TRUNC flag set should be rejected. The Linux implementation follows the SVID
definition, which is the "Right Thing", since only calls with O_TRUNC can
modify the contents of the file.

HP-UX even disallows open() with O_TRUNC for a file with advisory locks, not
just mandatory locks. That would appear to contravene POSIX.1.

mmap() is another interesting case. All the operating systems mentioned
prevent mandatory locks from being applied to an mmap()'ed file, but HP-UX
also disallows advisory locks for such a file. SVID actually specifies the
paranoid HP-UX behaviour.

In my opinion only MAP_SHARED mappings should be immune from locking, and then
only from mandatory locks - that is what is currently implemented.

SunOS is so hopeless that it doesn't even honour the O_NONBLOCK flag for
mandatory locks, so reads and writes to locked files always block when they
should return EAGAIN.

I'm afraid that this is such an esoteric area that the semantics described
below are just as valid as any others, so long as the main points seem to
agree.

4. Semantics
------------

1. Mandatory locks can only be applied via the fcntl()/lockf() locking
   interface - in other words the System V/POSIX interface. BSD style
   locks using flock() never result in a mandatory lock.

2. If a process has locked a region of a file with a mandatory read lock, then
   other processes are permitted to read from that region. If any of these
   processes attempts to write to the region it will block until the lock is
   released, unless the process has opened the file with the O_NONBLOCK
   flag in which case the system call will return immediately with the error
   status EAGAIN.

3. If a process has locked a region of a file with a mandatory write lock, all
   attempts to read or write to that region block until the lock is released,
   unless a process has opened the file with the O_NONBLOCK flag in which case
   the system call will return immediately with the error status EAGAIN.

4. Calls to open() with O_TRUNC, or to creat(), on an existing file that has
   any mandatory locks owned by other processes will be rejected with the
   error status EAGAIN.

5. Attempts to apply a mandatory lock to a file that is memory mapped and
   shared (via mmap() with MAP_SHARED) will be rejected with the error status
   EAGAIN.

6. Attempts to create a shared memory map of a file (via mmap() with MAP_SHARED)
   that has any mandatory locks in effect will be rejected with the error status
   EAGAIN.

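Rules 2 and 3 mean that a process which opens a file with O_NONBLOCK has to
be prepared for EAGAIN from an ordinary write(). The following is a minimal
sketch of such a caller; both helper names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` for writing without blocking on locks. */
int open_for_nonblocking_write(const char *path)
{
    return open(path, O_WRONLY | O_NONBLOCK);
}

/*
 * Hypothetical helper for a descriptor opened with O_NONBLOCK.
 * Returns the number of bytes written, 0 if the write was refused with
 * EAGAIN (under rules 2 and 3, the region is covered by another process's
 * mandatory lock), or -1 on any other error.
 */
ssize_t write_unless_locked(int fd, const void *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* would have blocked: the caller may retry later */
    return n;
}
```

A return value of 0 is ambiguous when len is 0, so a caller that needs to
tell the two cases apart should check errno itself.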
5. Which system calls are affected?
-----------------------------------

Those which read or modify a file's contents, not just the inode. That gives
read(), write(), readv(), writev(), open(), creat(), mmap(), truncate() and
ftruncate(). truncate() and ftruncate() are considered to be "write" actions
for the purposes of mandatory locking.

The affected region is usually defined as stretching from the current position
for the total number of bytes read or written. For the truncate calls it is
defined as the bytes of a file removed or added (we must also consider bytes
added, as a lock can specify just "the whole file", rather than a specific
range of bytes.)

Note 3: I may have overlooked some system calls that need mandatory lock
checking in my eagerness to get this code out the door. Please let me know, or
better still fix the system calls yourself and submit a patch to me or Linus.

6. Warning!
-----------

Not even root can override a mandatory lock, so runaway processes can wreak
havoc if they lock crucial files. The way around it is to change the file
permissions (remove the setgid bit) before trying to read or write to it.
Of course, that might be a bit tricky if the system is hung :-(

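The workaround from section 6, removing the setgid bit so that existing
locks are no longer treated as mandatory, can be sketched as follows. The
helper name clear_mandlock_marker() is hypothetical. It opens the file
without O_TRUNC, which section 3 says Linux does not reject, and changes
only the inode's mode.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: clear the setgid bit on `path`, leaving the other
 * permission bits alone. Returns 0 on success, -1 on error.
 */
int clear_mandlock_marker(const char *path)
{
    struct stat st;
    int rc = -1;
    int fd = open(path, O_RDONLY);  /* no O_TRUNC, contents are not touched */

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) == 0)
        rc = fchmod(fd, st.st_mode & 07777 & ~(mode_t)S_ISGID);
    close(fd);
    return rc;
}
```

The caller needs to own the file or be root for fchmod() to succeed.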