author     Steven Whitehouse <swhiteho@redhat.com>  2006-07-05 08:27:42 -0400
committer  Steven Whitehouse <swhiteho@redhat.com>  2006-07-05 08:27:42 -0400
commit     cf57a308436653f3094590202c77459aab250ff3
tree       8a9e7096e494141911147a1f24865c3d79d583c1 /Documentation
parent     faac9bd0e3ce7cb0572ec66e0a426cacf6afa970
parent     ca78f6baca863afe2e6a244a0fe94b3a70211d46
Merge branch 'master'
Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/DocBook/mtdnand.tmpl             11
-rw-r--r--  Documentation/irqflags-tracing.txt             57
-rw-r--r--  Documentation/kernel-parameters.txt             9
-rw-r--r--  Documentation/lockdep-design.txt              197
-rw-r--r--  Documentation/networking/ipvs-sysctl.txt      143
-rw-r--r--  Documentation/powerpc/booting-without-of.txt    4
-rw-r--r--  Documentation/scsi/ChangeLog.megaraid_sas      16
-rw-r--r--  Documentation/sysctl/vm.txt                    14
8 files changed, 444 insertions, 7 deletions
diff --git a/Documentation/DocBook/mtdnand.tmpl b/Documentation/DocBook/mtdnand.tmpl
index 999afe1ca8cb..a8c8cce50633 100644
--- a/Documentation/DocBook/mtdnand.tmpl
+++ b/Documentation/DocBook/mtdnand.tmpl
@@ -109,7 +109,7 @@
  for most of the implementations. These functions can be replaced by the
  board driver if neccecary. Those functions are called via pointers in the
  NAND chip description structure. The board driver can set the functions which
- should be replaced by board dependend functions before calling nand_scan().
+ should be replaced by board dependent functions before calling nand_scan().
  If the function pointer is NULL on entry to nand_scan() then the pointer
  is set to the default function which is suitable for the detected chip type.
  </para></listitem>
@@ -133,7 +133,7 @@
  [REPLACEABLE]</para><para>
  Replaceable members hold hardware related functions which can be
  provided by the board driver. The board driver can set the functions which
- should be replaced by board dependend functions before calling nand_scan().
+ should be replaced by board dependent functions before calling nand_scan().
  If the function pointer is NULL on entry to nand_scan() then the pointer
  is set to the default function which is suitable for the detected chip type.
  </para></listitem>
@@ -156,9 +156,8 @@
  <title>Basic board driver</title>
  <para>
  For most boards it will be sufficient to provide just the
- basic functions and fill out some really board dependend
+ basic functions and fill out some really board dependent
  members in the nand chip description structure.
- See drivers/mtd/nand/skeleton for reference.
  </para>
  <sect1>
  <title>Basic defines</title>
@@ -1295,7 +1294,9 @@ in this page</entry>
  </para>
 !Idrivers/mtd/nand/nand_base.c
 !Idrivers/mtd/nand/nand_bbt.c
-!Idrivers/mtd/nand/nand_ecc.c
+<!-- No internal functions for kernel-doc:
+X!Idrivers/mtd/nand/nand_ecc.c
+-->
  </chapter>
 
  <chapter id="credits">
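The "replaceable member" rule in the mtdnand.tmpl hunks above (the board driver sets only the hooks it needs, and nand_scan() substitutes a default for every pointer that is still NULL) can be sketched in plain userspace C. This is an illustration of the pattern, not the MTD API: struct chip_ops, default_read_byte(), board_read_byte() and scan_fill_defaults() are hypothetical names, and the real NAND chip description structure has many more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for the NAND chip description
 * structure: just two replaceable hooks. */
struct chip_ops {
	uint8_t (*read_byte)(void);
	void (*select_chip)(int chipnr);
};

static int last_selected = -1;

/* Generic defaults, standing in for the ones the core would supply. */
static uint8_t default_read_byte(void) { return 0xff; }
static void default_select_chip(int chipnr) { last_selected = chipnr; }

/* A board-specific override (hypothetical). */
static uint8_t board_read_byte(void) { return 0x42; }

/* The documented rule: every hook still NULL on entry gets the default;
 * hooks the board driver already set are left untouched. */
static void scan_fill_defaults(struct chip_ops *ops)
{
	assert(ops != NULL);
	if (ops->read_byte == NULL)
		ops->read_byte = default_read_byte;
	if (ops->select_chip == NULL)
		ops->select_chip = default_select_chip;
}
```

A board driver that assigns read_byte and leaves select_chip NULL keeps its own read_byte after the scan step, while select_chip falls back to the generic default.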
diff --git a/Documentation/irqflags-tracing.txt b/Documentation/irqflags-tracing.txt
new file mode 100644
index 000000000000..6a444877ee0b
--- /dev/null
+++ b/Documentation/irqflags-tracing.txt
@@ -0,0 +1,57 @@
+IRQ-flags state tracing
+
+started by Ingo Molnar <mingo@redhat.com>
+
+the "irq-flags tracing" feature "traces" hardirq and softirq state, in
+that it gives interested subsystems an opportunity to be notified of
+every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that
+happens in the kernel.
+
+CONFIG_TRACE_IRQFLAGS_SUPPORT is needed for CONFIG_PROVE_SPIN_LOCKING
+and CONFIG_PROVE_RW_LOCKING to be offered by the generic lock debugging
+code. Otherwise only CONFIG_PROVE_MUTEX_LOCKING and
+CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these
+are locking APIs that are not used in IRQ context. (the one exception
+for rwsems is worked around)
+
+architecture support for this is certainly not in the "trivial"
+category, because lots of lowlevel assembly code deal with irq-flags
+state changes. But an architecture can be irq-flags-tracing enabled in a
+rather straightforward and risk-free manner.
+
+Architectures that want to support this need to do a couple of
+code-organizational changes first:
+
+- move their irq-flags manipulation code from their asm/system.h header
+ to asm/irqflags.h
+
+- rename local_irq_disable()/etc to raw_local_irq_disable()/etc. so that
+ the linux/irqflags.h code can inject callbacks and can construct the
+ real local_irq_disable()/etc APIs.
+
+- add and enable TRACE_IRQFLAGS_SUPPORT in their arch level Kconfig file
+
+and then a couple of functional changes are needed as well to implement
+irq-flags-tracing support:
+
+- in lowlevel entry code add (build-conditional) calls to the
+ trace_hardirqs_off()/trace_hardirqs_on() functions. The lock validator
+ closely guards whether the 'real' irq-flags matches the 'virtual'
+ irq-flags state, and complains loudly (and turns itself off) if the
+ two do not match. Usually most of the time for arch support for
+ irq-flags-tracing is spent in this state: look at the lockdep
+ complaint, try to figure out the assembly code we did not cover yet,
+ fix and repeat. Once the system has booted up and works without a
+ lockdep complaint in the irq-flags-tracing functions arch support is
+ complete.
+- if the architecture has non-maskable interrupts then those need to be
+ excluded from the irq-tracing [and lock validation] mechanism via
+ lockdep_off()/lockdep_on().
+
+in general there is no risk from having an incomplete irq-flags-tracing
+implementation in an architecture: lockdep will detect that and will
+turn itself off. I.e. the lock validator will still be reliable. There
+should be no crashes due to irq-tracing bugs. (except if the assembly
+changes break other code by modifying conditions or registers that
+shouldnt be)
+
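To make the 'real' versus 'virtual' irq-flags distinction in irqflags-tracing.txt concrete, here is a userspace simulation. A boolean stands in for the CPU interrupt flag, the traced wrappers call the trace hooks around the raw operations, and a checker complains once and then switches itself off on the first mismatch, as the text says the validator does. The wrapper ordering (trace after disabling, trace before enabling) reflects my understanding of linux/irqflags.h and should be read as an assumption; the state variables and check_irq_state() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Simulated CPU interrupt-enable flag: the "real" irq-flags state. */
static bool hw_irqs_enabled = true;
/* What the tracer believes: the "virtual" irq-flags state. */
static bool traced_irqs_enabled = true;
/* Like the validator, the checker turns itself off after one complaint. */
static bool tracing_active = true;

/* Raw operations: change the real state only (hypothetical bodies). */
static void raw_local_irq_disable(void) { hw_irqs_enabled = false; }
static void raw_local_irq_enable(void) { hw_irqs_enabled = true; }

/* Trace callbacks: update the virtual state only. */
static void trace_hardirqs_off(void) { traced_irqs_enabled = false; }
static void trace_hardirqs_on(void) { traced_irqs_enabled = true; }

/* Traced wrappers built on top of the raw operations. */
static void local_irq_disable(void)
{
	raw_local_irq_disable();	/* turn interrupts off first... */
	trace_hardirqs_off();		/* ...then record it */
}

static void local_irq_enable(void)
{
	trace_hardirqs_on();		/* record it while still disabled... */
	raw_local_irq_enable();		/* ...then turn interrupts on */
}

/* Compare real and virtual state; on the first mismatch, complain once
 * and stop checking. */
static void check_irq_state(void)
{
	if (!tracing_active || hw_irqs_enabled == traced_irqs_enabled)
		return;
	fprintf(stderr, "irq-flags mismatch (real=%d, virtual=%d), turning off\n",
		(int)hw_irqs_enabled, (int)traced_irqs_enabled);
	tracing_active = false;
}
```

Going through local_irq_disable()/local_irq_enable() keeps both states in step, so check_irq_state() stays quiet. Calling raw_local_irq_disable() directly, as an entry path without the trace calls would, makes the next check report a mismatch and disable further checking.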
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 86e9282d1c20..149f62ba14a5 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -435,6 +435,15 @@ running once the system is up.
 
  debug [KNL] Enable kernel debugging (events log level).
 
+ debug_locks_verbose=
+ [KNL] verbose self-tests
+ Format=<0|1>
+ Print debugging info while doing the locking API
+ self-tests.
+ We default to 0 (no extra messages), setting it to
+ 1 will print _a lot_ more information - normally
+ only useful to kernel developers.
+
  decnet= [HW,NET]
  Format: <area>[,<node>]
  See also Documentation/networking/decnet.txt.
diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
new file mode 100644
index 000000000000..00d93605bfd3
--- /dev/null
+++ b/Documentation/lockdep-design.txt
@@ -0,0 +1,197 @@
+Runtime locking correctness validator
+=====================================
+
+started by Ingo Molnar <mingo@redhat.com>
+additions by Arjan van de Ven <arjan@linux.intel.com>
+
+Lock-class
+----------
+
+The basic object the validator operates upon is a 'class' of locks.
+
+A class of locks is a group of locks that are logically the same with
+respect to locking rules, even if the locks may have multiple (possibly
+tens of thousands of) instantiations. For example a lock in the inode
+struct is one class, while each inode has its own instantiation of that
+lock class.
+
+The validator tracks the 'state' of lock-classes, and it tracks
+dependencies between different lock-classes. The validator maintains a
+rolling proof that the state and the dependencies are correct.
+
+Unlike an lock instantiation, the lock-class itself never goes away: when
+a lock-class is used for the first time after bootup it gets registered,
+and all subsequent uses of that lock-class will be attached to this
+lock-class.
+
+State
+-----
+
+The validator tracks lock-class usage history into 5 separate state bits:
+
+- 'ever held in hardirq context' [ == hardirq-safe ]
+- 'ever held in softirq context' [ == softirq-safe ]
+- 'ever held with hardirqs enabled' [ == hardirq-unsafe ]
+- 'ever held with softirqs and hardirqs enabled' [ == softirq-unsafe ]
+
+- 'ever used' [ == !unused ]
+
+Single-lock state rules:
+------------------------
+
+A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The
+following states are exclusive, and only one of them is allowed to be
+set for any lock-class:
+
+ <hardirq-safe> and <hardirq-unsafe>
+ <softirq-safe> and <softirq-unsafe>
+
+The validator detects and reports lock usage that violate these
+single-lock state rules.
+
+Multi-lock dependency rules:
+----------------------------
+
+The same lock-class must not be acquired twice, because this could lead
+to lock recursion deadlocks.
+
+Furthermore, two locks may not be taken in different order:
+
+ <L1> -> <L2>
+ <L2> -> <L1>
+
+because this could lead to lock inversion deadlocks. (The validator
+finds such dependencies in arbitrary complexity, i.e. there can be any
+other locking sequence between the acquire-lock operations, the
+validator will still track all dependencies between locks.)
+
+Furthermore, the following usage based lock dependencies are not allowed
+between any two lock-classes:
+
+ <hardirq-safe> -> <hardirq-unsafe>
+ <softirq-safe> -> <softirq-unsafe>
+
+The first rule comes from the fact the a hardirq-safe lock could be
+taken by a hardirq context, interrupting a hardirq-unsafe lock - and
+thus could result in a lock inversion deadlock. Likewise, a softirq-safe
+lock could be taken by an softirq context, interrupting a softirq-unsafe
+lock.
+
+The above rules are enforced for any locking sequence that occurs in the
+kernel: when acquiring a new lock, the validator checks whether there is
+any rule violation between the new lock and any of the held locks.
+
+When a lock-class changes its state, the following aspects of the above
+dependency rules are enforced:
+
+- if a new hardirq-safe lock is discovered, we check whether it
+ took any hardirq-unsafe lock in the past.
+
+- if a new softirq-safe lock is discovered, we check whether it took
+ any softirq-unsafe lock in the past.
+
+- if a new hardirq-unsafe lock is discovered, we check whether any
+ hardirq-safe lock took it in the past.
+
+- if a new softirq-unsafe lock is discovered, we check whether any
+ softirq-safe lock took it in the past.
+
+(Again, we do these checks too on the basis that an interrupt context
+could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which
+could lead to a lock inversion deadlock - even if that lock scenario did
+not trigger in practice yet.)
+
+Exception: Nested data dependencies leading to nested locking
+-------------------------------------------------------------
+
+There are a few cases where the Linux kernel acquires more than one
+instance of the same lock-class. Such cases typically happen when there
+is some sort of hierarchy within objects of the same type. In these
+cases there is an inherent "natural" ordering between the two objects
+(defined by the properties of the hierarchy), and the kernel grabs the
+locks in this fixed order on each of the objects.
+
+An example of such an object hieararchy that results in "nested locking"
+is that of a "whole disk" block-dev object and a "partition" block-dev
+object; the partition is "part of" the whole device and as long as one
+always takes the whole disk lock as a higher lock than the partition
+lock, the lock ordering is fully correct. The validator does not
+automatically detect this natural ordering, as the locking rule behind
+the ordering is not static.
+
+In order to teach the validator about this correct usage model, new
+versions of the various locking primitives were added that allow you to
+specify a "nesting level". An example call, for the block device mutex,
+looks like this:
+
+enum bdev_bd_mutex_lock_class
+{
+ BD_MUTEX_NORMAL,
+ BD_MUTEX_WHOLE,
+ BD_MUTEX_PARTITION
+};
+
+ mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_PARTITION);
+
+In this case the locking is done on a bdev object that is known to be a
+partition.
+
+The validator treats a lock that is taken in such a nested fasion as a
+separate (sub)class for the purposes of validation.
+
+Note: When changing code to use the _nested() primitives, be careful and
+check really thoroughly that the hiearchy is correctly mapped; otherwise
+you can get false positives or false negatives.
+
+Proof of 100% correctness:
+--------------------------
+
+The validator achieves perfect, mathematical 'closure' (proof of locking
+correctness) in the sense that for every simple, standalone single-task
+locking sequence that occured at least once during the lifetime of the
+kernel, the validator proves it with a 100% certainty that no
+combination and timing of these locking sequences can cause any class of
+lock related deadlock. [*]
+
+I.e. complex multi-CPU and multi-task locking scenarios do not have to
+occur in practice to prove a deadlock: only the simple 'component'
+locking chains have to occur at least once (anytime, in any
+task/context) for the validator to be able to prove correctness. (For
+example, complex deadlocks that would normally need more than 3 CPUs and
+a very unlikely constellation of tasks, irq-contexts and timings to
+occur, can be detected on a plain, lightly loaded single-CPU system as
+well!)
+
+This radically decreases the complexity of locking related QA of the
+kernel: what has to be done during QA is to trigger as many "simple"
+single-task locking dependencies in the kernel as possible, at least
+once, to prove locking correctness - instead of having to trigger every
+possible combination of locking interaction between CPUs, combined with
+every possible hardirq and softirq nesting scenario (which is impossible
+to do in practice).
+
+[*] assuming that the validator itself is 100% correct, and no other
+ part of the system corrupts the state of the validator in any way.
+ We also assume that all NMI/SMM paths [which could interrupt
+ even hardirq-disabled codepaths] are correct and do not interfere
+ with the validator. We also assume that the 64-bit 'chain hash'
+ value is unique for every lock-chain in the system. Also, lock
+ recursion must not be higher than 20.
+
+Performance:
+------------
+
+The above rules require _massive_ amounts of runtime checking. If we did
+that for every lock taken and for every irqs-enable event, it would
+render the system practically unusably slow. The complexity of checking
+is O(N^2), so even with just a few hundred lock-classes we'd have to do
+tens of thousands of checks for every event.
+
+This problem is solved by checking any given 'locking scenario' (unique
+sequence of locks taken after each other) only once. A simple stack of
+held locks is maintained, and a lightweight 64-bit hash value is
+calculated, which hash is unique for every lock chain. The hash value,
+when the chain is validated for the first time, is then put into a hash
+table, which hash-table can be checked in a lockfree manner. If the
+locking chain occurs again later on, the hash table tells us that we
+dont have to validate the chain again.
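The lock-inversion rule in lockdep-design.txt, and its claim that a deadlock can be proven from single-task chains that never ran concurrently, can be illustrated with a toy dependency graph. This sketch is not lockdep's implementation: it keeps an adjacency matrix of "class b was taken while class a was held" edges, rejects taking a class that is already held, and rejects any acquisition that would close a cycle. The names acquire(), release(), MAX_CLASSES and enum verdict are hypothetical, and the irq-usage states, nesting subclasses and chain-hash cache are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 16	/* toy limit on lock classes */
#define MAX_HELD 8	/* toy limit on nesting depth */

/* dep[a][b] is true once class b has been acquired while class a was held. */
static bool dep[MAX_CLASSES][MAX_CLASSES];
/* Stack of classes currently held by the (single) simulated task. */
static int held[MAX_HELD];
static int depth;

enum verdict { LOCK_OK, LOCK_RECURSION, LOCK_INVERSION };

/* Depth-first search: is 'to' reachable from 'from' through recorded edges? */
static bool reachable(int from, int to, bool seen[MAX_CLASSES])
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (dep[from][next] && !seen[next] && reachable(next, to, seen))
			return true;
	return false;
}

/* Validate taking class c against every held class, then record the new
 * held -> c edges. On a violation nothing is recorded or pushed. */
static enum verdict acquire(int c)
{
	assert(c >= 0 && c < MAX_CLASSES && depth < MAX_HELD);
	for (int i = 0; i < depth; i++) {
		bool seen[MAX_CLASSES] = { false };

		if (held[i] == c)
			return LOCK_RECURSION;
		/* held[i] -> c closes a cycle if c already leads to held[i]. */
		if (reachable(c, held[i], seen))
			return LOCK_INVERSION;
	}
	for (int i = 0; i < depth; i++)
		dep[held[i]][c] = true;
	held[depth++] = c;
	return LOCK_OK;
}

/* Drop the most recently taken lock (strict LIFO in this sketch). */
static void release(void)
{
	assert(depth > 0);
	depth--;
}
```

Taking A then B in one sequence, and B then C in a later one, records A -> B -> C. A third, single-task sequence that takes C and then A is reported as an inversion, even though the three sequences never overlapped in time; taking C again while it is held is reported as recursion.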
diff --git a/Documentation/networking/ipvs-sysctl.txt b/Documentation/networking/ipvs-sysctl.txt
new file mode 100644
index 000000000000..4ccdbca03811
--- /dev/null
+++ b/Documentation/networking/ipvs-sysctl.txt
@@ -0,0 +1,143 @@
+/proc/sys/net/ipv4/vs/* Variables:
+
+am_droprate - INTEGER
+ default 10
+
+ It sets the always mode drop rate, which is used in the mode 3
+ of the drop_rate defense.
+
+amemthresh - INTEGER
+ default 1024
+
+ It sets the available memory threshold (in pages), which is
+ used in the automatic modes of defense. When there is no
+ enough available memory, the respective strategy will be
+ enabled and the variable is automatically set to 2, otherwise
+ the strategy is disabled and the variable is set to 1.
+
+cache_bypass - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ If it is enabled, forward packets to the original destination
+ directly when no cache server is available and destination
+ address is not local (iph->daddr is RTN_UNICAST). It is mostly
+ used in transparent web cache cluster.
+
+debug_level - INTEGER
+ 0 - transmission error messages (default)
+ 1 - non-fatal error messages
+ 2 - configuration
+ 3 - destination trash
+ 4 - drop entry
+ 5 - service lookup
+ 6 - scheduling
+ 7 - connection new/expire, lookup and synchronization
+ 8 - state transition
+ 9 - binding destination, template checks and applications
+ 10 - IPVS packet transmission
+ 11 - IPVS packet handling (ip_vs_in/ip_vs_out)
+ 12 or more - packet traversal
+
+ Only available when IPVS is compiled with the CONFIG_IPVS_DEBUG
+
+ Higher debugging levels include the messages for lower debugging
+ levels, so setting debug level 2, includes level 0, 1 and 2
+ messages. Thus, logging becomes more and more verbose the higher
+ the level.
+
+drop_entry - INTEGER
+ 0 - disabled (default)
+
+ The drop_entry defense is to randomly drop entries in the
+ connection hash table, just in order to collect back some
+ memory for new connections. In the current code, the
+ drop_entry procedure can be activated every second, then it
+ randomly scans 1/32 of the whole and drops entries that are in
+ the SYN-RECV/SYNACK state, which should be effective against
+ syn-flooding attack.
+
+ The valid values of drop_entry are from 0 to 3, where 0 means
+ that this strategy is always disabled, 1 and 2 mean automatic
+ modes (when there is no enough available memory, the strategy
+ is enabled and the variable is automatically set to 2,
+ otherwise the strategy is disabled and the variable is set to
+ 1), and 3 means that that the strategy is always enabled.
+
+drop_packet - INTEGER
+ 0 - disabled (default)
+
+ The drop_packet defense is designed to drop 1/rate packets
+ before forwarding them to real servers. If the rate is 1, then
+ drop all the incoming packets.
+
+ The value definition is the same as that of the drop_entry. In
+ the automatic mode, the rate is determined by the follow
+ formula: rate = amemthresh / (amemthresh - available_memory)
+ when available memory is less than the available memory
+ threshold. When the mode 3 is set, the always mode drop rate
+ is controlled by the /proc/sys/net/ipv4/vs/am_droprate.
+
+expire_nodest_conn - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ The default value is 0, the load balancer will silently drop
+ packets when its destination server is not available. It may
+ be useful, when user-space monitoring program deletes the
+ destination server (because of server overload or wrong
+ detection) and add back the server later, and the connections
+ to the server can continue.
+
+ If this feature is enabled, the load balancer will expire the
+ connection immediately when a packet arrives and its
+ destination server is not available, then the client program
+ will be notified that the connection is closed. This is
+ equivalent to the feature some people requires to flush
+ connections when its destination is not available.
+
+expire_quiescent_template - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ When set to a non-zero value, the load balancer will expire
+ persistent templates when the destination server is quiescent.
+ This may be useful, when a user makes a destination server
+ quiescent by setting its weight to 0 and it is desired that
+ subsequent otherwise persistent connections are sent to a
+ different destination server. By default new persistent
+ connections are allowed to quiescent destination servers.
+
+ If this feature is enabled, the load balancer will expire the
+ persistence template if it is to be used to schedule a new
+ connection and the destination server is quiescent.
+
+nat_icmp_send - BOOLEAN
+ 0 - disabled (default)
+ not 0 - enabled
+
+ It controls sending icmp error messages (ICMP_DEST_UNREACH)
+ for VS/NAT when the load balancer receives packets from real
+ servers but the connection entries don't exist.
+
+secure_tcp - INTEGER
+ 0 - disabled (default)
+
+ The secure_tcp defense is to use a more complicated state
+ transition table and some possible short timeouts of each
+ state. In the VS/NAT, it delays the entering the ESTABLISHED
+ until the real server starts to send data and ACK packet
+ (after 3-way handshake).
+
+ The value definition is the same as that of drop_entry or
+ drop_packet.
+
+sync_threshold - INTEGER
+ default 3
+
+ It sets synchronization threshold, which is the minimum number
+ of incoming packets that a connection needs to receive before
+ the connection will be synchronized. A connection will be
+ synchronized, every time the number of its incoming packets
+ modulus 50 equals the threshold. The range of the threshold is
+ from 0 to 49.
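Three rules in ipvs-sysctl.txt are small enough to state as code: the 0..3 mode numbering that drop_entry, drop_packet and secure_tcp share, the automatic drop_packet rate formula, and the sync_threshold test. The helpers below are hypothetical userspace sketches of the documented behaviour, not code taken from IPVS; using integer division for the rate and returning 0 to mean "no dropping" are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Re-evaluate a defense mode: automatic modes (1 and 2) become 2 while
 * memory is short and fall back to 1 otherwise; 0 (off) and 3 (always)
 * never change. Hypothetical helper. */
static int update_defense_mode(int mode, bool memory_short)
{
	assert(mode >= 0 && mode <= 3);
	if (mode == 1 || mode == 2)
		return memory_short ? 2 : 1;
	return mode;
}

/* The strategy is in effect in mode 2 (automatic, engaged) and mode 3. */
static bool defense_active(int mode)
{
	return mode >= 2;
}

/* Automatic drop_packet rate:
 *   rate = amemthresh / (amemthresh - available_memory)
 * applied only while available memory (in pages) is below amemthresh.
 * Returns 0 for "no dropping"; integer division is an assumption. */
static long auto_drop_rate(long amemthresh, long available_pages)
{
	assert(amemthresh > 0 && available_pages >= 0);
	if (available_pages >= amemthresh)
		return 0;
	return amemthresh / (amemthresh - available_pages);
}

/* sync_threshold: a connection is synchronized whenever its incoming
 * packet count modulo 50 equals the threshold (valid range 0..49). */
static bool should_sync(unsigned long in_pkts, unsigned int threshold)
{
	assert(threshold < 50);
	return in_pkts % 50 == threshold;
}
```

With the default amemthresh of 1024 pages and 512 pages available, the automatic rate is 1024 / 512 = 2, so about one packet in two is dropped; with no memory left the rate reaches 1 and every packet is dropped. With the default sync_threshold of 3, a connection is synchronized when its incoming packet count reaches 3, 53, 103 and so on.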
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index 217e51768b87..3c62e66e1fcc 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -1436,9 +1436,9 @@ platforms are moved over to use the flattened-device-tree model.
  interrupts = <1d 3>;
  interrupt-parent = <40000>;
  num-channels = <4>;
- channel-fifo-len = <24>;
+ channel-fifo-len = <18>;
  exec-units-mask = <000000fe>;
- descriptor-types-mask = <073f1127>;
+ descriptor-types-mask = <012b0ebf>;
  };
 
 
diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas
index 0a85a7e8120e..d9e5960dafd5 100644
--- a/Documentation/scsi/ChangeLog.megaraid_sas
+++ b/Documentation/scsi/ChangeLog.megaraid_sas
@@ -1,4 +1,20 @@
 
+1 Release Date : Sun May 14 22:49:52 PDT 2006 - Sumant Patro <Sumant.Patro@lsil.com>
+2 Current Version : 00.00.03.01
+3 Older Version : 00.00.02.04
+
+i. Added support for ZCR controller.
+
+ New device id 0x413 added.
+
+ii. Bug fix : Disable controller interrupt before firing INIT cmd to FW.
+
+ Interrupt is enabled after required initialization is over.
+ This is done to ensure that driver is ready to handle interrupts when
+ it is generated by the controller.
+
+ -Sumant Patro <Sumant.Patro@lsil.com>
+
 1 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com>
 2 Current Version : 00.00.02.04
 3 Older Version : 00.00.02.04
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 86754eb390da..7cee90223d3a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -28,6 +28,7 @@ Currently, these files are in /proc/sys/vm:
 - block_dump
 - drop-caches
 - zone_reclaim_mode
+- min_unmapped_ratio
 - panic_on_oom
 
 ==============================================================
@@ -168,6 +169,19 @@ in all nodes of the system.
 
 =============================================================
 
+min_unmapped_ratio:
+
+This is available only on NUMA kernels.
+
+A percentage of the file backed pages in each zone. Zone reclaim will only
+occur if more than this percentage of pages are file backed and unmapped.
+This is to insure that a minimal amount of local pages is still available for
+file I/O even if the node is overallocated.
+
+The default is 1 percent.
+
+=============================================================
+
 panic_on_oom
 
 This enables or disables panic on out-of-memory feature. If this is set to 1,
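The min_unmapped_ratio entry in vm.txt can be read as a per-zone gate on zone reclaim, sketched below. The text leaves the base of the percentage somewhat open. This sketch assumes the percentage is applied to the zone's total (present) pages, and that the result is compared with the number of file backed pages that are not mapped. struct zone_stats and zone_reclaim_allowed() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-zone counters, all in pages. */
struct zone_stats {
	unsigned long present_pages;     /* all pages in the zone */
	unsigned long file_pages;        /* file backed (page cache) pages */
	unsigned long file_mapped_pages; /* file backed pages that are mapped */
};

/* Returns true when zone reclaim would be allowed under the
 * min_unmapped_ratio rule. Assumption: the percentage is taken of the
 * zone's present pages, using integer arithmetic. */
static bool zone_reclaim_allowed(const struct zone_stats *z,
				 unsigned int min_unmapped_ratio)
{
	unsigned long threshold, unmapped;

	assert(min_unmapped_ratio <= 100);
	assert(z->file_mapped_pages <= z->file_pages);
	threshold = z->present_pages * min_unmapped_ratio / 100;
	unmapped = z->file_pages - z->file_mapped_pages;
	return unmapped > threshold;
}
```

For a 100000-page zone at the default of 1 percent, the threshold is 1000 pages: 500 unmapped file pages would keep zone reclaim from running, while 2000 would allow it.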