Diffstat (limited to 'Documentation/preempt-locking.txt')
-rw-r--r-- | Documentation/preempt-locking.txt | 135 |
1 files changed, 135 insertions, 0 deletions
diff --git a/Documentation/preempt-locking.txt b/Documentation/preempt-locking.txt
new file mode 100644
index 000000000000..57883ca2498b
--- /dev/null
+++ b/Documentation/preempt-locking.txt
@@ -0,0 +1,135 @@
1 | Proper Locking Under a Preemptible Kernel: | ||
2 | Keeping Kernel Code Preempt-Safe | ||
3 | Robert Love <rml@tech9.net> | ||
4 | Last Updated: 28 Aug 2002 | ||
5 | |||
6 | |||
7 | INTRODUCTION | ||
8 | |||
9 | |||
10 | A preemptible kernel creates new locking issues. The issues are the same as | ||
11 | those under SMP: concurrency and reentrancy. Thankfully, the Linux preemptible | ||
12 | kernel model leverages existing SMP locking mechanisms. Thus, the kernel | ||
13 | requires explicit additional locking in very few situations. | ||
14 | |||
15 | This document is for all kernel hackers. Code developed in the kernel must | ||
16 | protect against the situations described below. | ||
17 | |||
18 | |||
19 | RULE #1: Per-CPU data structures need explicit protection | ||
20 | |||
21 | |||
22 | Two similar problems arise. An example code snippet: | ||
23 | |||
24 | struct this_needs_locking tux[NR_CPUS]; | ||
25 | tux[smp_processor_id()] = some_value; | ||
26 | /* task is preempted here... */ | ||
27 | something = tux[smp_processor_id()]; | ||
28 | |||
29 | First, since the data is per-CPU, it may have no explicit SMP locking, yet | ||
30 | under preemption it still needs protection. Second, when a preempted task is | ||
31 | finally rescheduled, smp_processor_id() may no longer return the previous value. | ||
32 | You must protect these situations by disabling preemption around them. | ||
33 | |||
34 | You can also use get_cpu(), which disables preemption, and put_cpu(), which re-enables it. | ||
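
To make the second problem concrete, here is a minimal userspace sketch. It is not kernel code: every model_* name is a hypothetical stand-in invented for this illustration, and the "scheduler" is a single function that moves the task to another CPU whenever preemption is allowed. It assumes only what the text above states, namely that get_cpu() disables preemption and put_cpu() re-enables it.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace model; none of these names are kernel APIs. */
static int model_preempt_count;          /* > 0 means preemption is disabled */
static int model_current_cpu;            /* CPU the modelled task runs on */
static int model_tux[MODEL_NR_CPUS];     /* stands in for the per-CPU array */

/* Models get_cpu(): disable preemption, then report the current CPU. */
static int model_get_cpu(void)
{
    model_preempt_count++;
    return model_current_cpu;
}

/* Models put_cpu(): re-enable preemption. */
static void model_put_cpu(void)
{
    assert(model_preempt_count > 0);
    model_preempt_count--;
}

/*
 * Models the scheduler preempting the task and resuming it on new_cpu.
 * Returns true if the task was moved, false if preemption was disabled.
 */
static bool model_try_migrate(int new_cpu)
{
    if (model_preempt_count > 0)
        return false;
    model_current_cpu = new_cpu;
    return true;
}

/* Unsafe pattern: write, get preempted, read back from the "current" slot. */
static int model_unprotected_access(int value)
{
    model_tux[model_current_cpu] = value;
    model_try_migrate((model_current_cpu + 1) % MODEL_NR_CPUS);
    return model_tux[model_current_cpu];   /* may be a different slot */
}

/* Safe pattern: the whole access sits between get_cpu and put_cpu. */
static int model_protected_access(int value)
{
    int cpu = model_get_cpu();
    int seen;

    model_tux[cpu] = value;
    model_try_migrate((cpu + 1) % MODEL_NR_CPUS);   /* refused */
    seen = model_tux[cpu];
    model_put_cpu();
    return seen;
}
```

In this model the protected variant always reads back what it wrote, while the unprotected variant reads a slot belonging to another CPU once the task has been moved.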
35 | |||
36 | |||
37 | RULE #2: CPU state must be protected | ||
38 | |||
39 | |||
40 | Under preemption, the state of the CPU must be protected. This is arch- | ||
41 | dependent, but includes CPU structures and state not preserved over a context | ||
42 | switch. For example, on x86, entering and exiting FPU mode is now a critical | ||
43 | section that must occur while preemption is disabled. Think what would happen | ||
44 | if the kernel is executing a floating-point instruction and is then preempted. | ||
45 | Remember, the kernel does not save FPU state except for user tasks. Therefore, | ||
46 | upon preemption, the FPU registers will be sold to the lowest bidder. Thus, | ||
47 | preemption must be disabled around such regions. | ||
48 | |||
49 | Note, some FPU functions are already explicitly preempt safe. For example, | ||
50 | kernel_fpu_begin and kernel_fpu_end will disable and enable preemption. | ||
51 | However, math_state_restore must be called with preemption disabled. | ||
52 | |||
53 | |||
54 | RULE #3: Lock acquire and release must be performed by same task | ||
55 | |||
56 | |||
57 | A lock acquired in one task must be released by the same task. This | ||
58 | means you can't do oddball things like acquire a lock and go off to | ||
59 | play while another task releases it. If you want to do something | ||
60 | like this, acquire and release the lock in the same code path and | ||
61 | have the caller wait on an event by the other task. | ||
62 | |||
63 | |||
64 | SOLUTION | ||
65 | |||
66 | |||
67 | Data protection under preemption is achieved by disabling preemption for the | ||
68 | duration of the critical region. | ||
69 | |||
70 | preempt_enable()              decrement the preempt counter | ||
71 | preempt_disable()             increment the preempt counter | ||
72 | preempt_enable_no_resched()   decrement, but do not immediately preempt | ||
73 | preempt_check_resched()       if needed, reschedule | ||
74 | preempt_count()               return the preempt counter | ||
75 | |||
76 | The functions are nestable. In other words, you can call preempt_disable | ||
77 | n times in a code path, and preemption will not be reenabled until the n-th | ||
78 | call to preempt_enable. The preempt statements compile to nothing if | ||
79 | preemption is not configured into the kernel. | ||
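
The nesting behaviour can be sketched with a small userspace model of the counter. This is an illustration under stated assumptions, not the kernel implementation: the model_* names are hypothetical, the "reschedule" is just a counter increment, and preempt_enable is assumed to behave like preempt_enable_no_resched followed by preempt_check_resched, as the table above suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
static int model_count;           /* models the preempt counter */
static bool model_need_resched;   /* models a pending reschedule request */
static int model_reschedules;     /* how many times the model "scheduled" */

/* Reschedule only if one is pending and the counter has dropped to zero. */
static void model_preempt_check_resched(void)
{
    if (model_count == 0 && model_need_resched) {
        model_need_resched = false;
        model_reschedules++;
    }
}

static void model_preempt_disable(void)
{
    model_count++;
}

static void model_preempt_enable_no_resched(void)
{
    assert(model_count > 0);
    model_count--;
}

static void model_preempt_enable(void)
{
    model_preempt_enable_no_resched();
    model_preempt_check_resched();
}

static int model_preempt_count(void)
{
    return model_count;
}

/*
 * Two nested disable/enable pairs with a reschedule requested in between.
 * Returns how many reschedules the outermost enable triggered.
 */
static int model_nested_demo(void)
{
    int after_inner;

    model_preempt_disable();        /* count 1 */
    model_preempt_disable();        /* count 2 */
    model_need_resched = true;      /* e.g. a wakeup requests a reschedule */
    model_preempt_enable();         /* count 1: still not preemptible */
    after_inner = model_reschedules;
    model_preempt_enable();         /* count 0: the reschedule happens now */
    return model_reschedules - after_inner;
}
```

Only the outermost enable lets the pending reschedule through; the inner one merely decrements the counter.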
80 | |||
81 | Note that you do not need to explicitly prevent preemption if you are holding | ||
82 | any locks or if interrupts are disabled, since preemption is implicitly disabled | ||
83 | in those cases. | ||
84 | |||
85 | But keep in mind that 'irqs disabled' is a fundamentally unsafe way of | ||
86 | disabling preemption - any spin_unlock() decreasing the preemption count | ||
87 | to 0 might trigger a reschedule. A simple printk() might trigger a reschedule. | ||
88 | So use this implicit preemption-disabling property only if you know that the | ||
89 | affected codepath does not do any of this. Best policy is to use this only for | ||
90 | small, atomic code that you wrote and which calls no complex functions. | ||
91 | |||
92 | Example: | ||
93 | |||
94 | cpucache_t *cc; /* this is per-CPU */ | ||
95 | preempt_disable(); | ||
96 | cc = cc_data(searchp); | ||
97 | if (cc && cc->avail) { | ||
98 |         __free_block(searchp, cc_entry(cc), cc->avail); | ||
99 |         cc->avail = 0; | ||
100 | } | ||
101 | preempt_enable(); | ||
102 | return 0; | ||
103 | |||
104 | Notice how the preemption statements must encompass every reference of the | ||
105 | critical variables. Another example: | ||
106 | |||
107 | int buf[NR_CPUS]; | ||
108 | set_cpu_val(buf); | ||
109 | if (buf[smp_processor_id()] == -1) printk(KERN_INFO "wee!\n"); | ||
110 | spin_lock(&buf_lock); | ||
111 | /* ... */ | ||
112 | |||
113 | This code is not preempt-safe, but see how easily we can fix it by simply | ||
114 | moving the spin_lock up two lines. | ||
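
As a rough userspace sketch of that fix, the two orderings can be compared side by side. The model_* functions are hypothetical stand-ins rather than kernel APIs, and the behaviour of set_cpu_val is an assumption made for this illustration (it marks the current CPU's slot with -1), since the example above does not define it. The model relies only on the point made earlier: holding a spin lock implicitly disables preemption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MODEL_NR_CPUS 4

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static int model_preempt_count;   /* > 0 means preemption is disabled */
static int model_cpu;             /* CPU the modelled task runs on */
static bool model_buf_locked;     /* stands in for buf_lock */

/* Like spin_lock on a preemptible kernel: also disables preemption. */
static void model_spin_lock(void)
{
    model_preempt_count++;
    model_buf_locked = true;
}

static void model_spin_unlock(void)
{
    assert(model_buf_locked);
    model_buf_locked = false;
    model_preempt_count--;
}

/* Assumed behaviour of set_cpu_val: mark the current CPU's slot. */
static void model_set_cpu_val(int *buf)
{
    buf[model_cpu] = -1;
}

/* The scheduler may move the task only while preemption is enabled. */
static void model_maybe_migrate(void)
{
    if (model_preempt_count == 0)
        model_cpu = (model_cpu + 1) % MODEL_NR_CPUS;
}

/* Fixed ordering: the lock is taken before the per-CPU slot is touched. */
static bool model_fixed_example(void)
{
    int buf[MODEL_NR_CPUS] = { 0 };
    bool found;

    model_spin_lock();            /* moved up two lines */
    model_set_cpu_val(buf);
    model_maybe_migrate();        /* preemption attempt: refused */
    found = (buf[model_cpu] == -1);
    if (found)
        printf("wee!\n");
    /* ... */
    model_spin_unlock();
    return found;
}

/* Original ordering: the slot is written and read before the lock. */
static bool model_racy_example(void)
{
    int buf[MODEL_NR_CPUS] = { 0 };
    bool found;

    model_set_cpu_val(buf);
    model_maybe_migrate();        /* preempted, resumed on another CPU */
    found = (buf[model_cpu] == -1);
    model_spin_lock();
    /* ... */
    model_spin_unlock();
    return found;
}
```

With the lock taken first, the write and the read are guaranteed to hit the same slot; in the original ordering the read can land on a slot the task never wrote.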
115 | |||
116 | |||
117 | PREVENTING PREEMPTION USING INTERRUPT DISABLING | ||
118 | |||
119 | |||
120 | It is possible to prevent a preemption event using local_irq_disable and | ||
121 | local_irq_save. Note, when doing so, you must be very careful to not cause | ||
122 | an event that would set need_resched and result in a preemption check. When | ||
123 | in doubt, rely on locking or explicit preemption disabling. | ||
124 | |||
125 | Note in 2.5 interrupt disabling is now only per-CPU (i.e. local). | ||
126 | |||
127 | An additional concern is proper usage of local_irq_disable and local_irq_save. | ||
128 | These may be used to protect against preemption. However, on exit, if | ||
129 | preemption may be enabled, you should test whether a reschedule is required. If | ||
130 | these are called from the spin_lock and read/write lock macros, the right thing | ||
131 | is done. They may also be called within a spin-lock protected region. However, | ||
132 | if they are ever called outside of this context, a test for preemption should | ||
133 | be made. Do note that calls from interrupt context or bottom halves/tasklets | ||
134 | are also protected by preemption locks and so may use the versions which do | ||
135 | not check preemption. | ||
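
The exit-time check described above might look like the following userspace sketch. Everything here is a hypothetical model, not kernel code: the model_* names are invented, the flags are a plain bool passed by pointer (the kernel's local_irq_save is a macro with a different calling convention), and the model assumes that restoring interrupts does not by itself look for a pending reschedule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
static bool model_irqs_off;       /* true while interrupts are "disabled" */
static bool model_need_resched;   /* models a pending reschedule request */
static int model_preempt_count;   /* stays 0 here: no locks are held */
static int model_reschedules;     /* how many times the model "scheduled" */

static void model_local_irq_save(bool *flags)
{
    *flags = model_irqs_off;
    model_irqs_off = true;
}

/* Assumption: restoring the flags performs no preemption check. */
static void model_local_irq_restore(bool flags)
{
    model_irqs_off = flags;
}

static void model_preempt_check_resched(void)
{
    if (model_need_resched && !model_irqs_off && model_preempt_count == 0) {
        model_need_resched = false;
        model_reschedules++;
    }
}

/*
 * An interrupt-disabled region, outside any spin lock, that causes a
 * reschedule request. Returns true if the request was still pending
 * right after interrupts were restored.
 */
static bool model_irq_region(void)
{
    bool flags, pending;

    model_local_irq_save(&flags);
    model_need_resched = true;         /* e.g. the region woke a task */
    model_local_irq_restore(flags);
    pending = model_need_resched;      /* nothing has acted on it yet */
    model_preempt_check_resched();     /* the explicit test on exit */
    return pending;
}
```

In the model the request sits unserviced after the restore and is only honoured by the explicit check, which is the situation the paragraph above warns about.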