Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/gpio.txt | 17 | ||||
-rw-r--r-- | Documentation/hrtimer/timer_stats.txt | 68 | ||||
-rw-r--r-- | Documentation/hrtimers/highres.txt | 249 | ||||
-rw-r--r-- | Documentation/hrtimers/hrtimers.txt (renamed from Documentation/hrtimers.txt) | 0 | ||||
-rw-r--r-- | Documentation/kernel-parameters.txt | 8 |
5 files changed, 335 insertions, 7 deletions
diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index 09dd510c4a5f..576ce463cf44 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt | |||
@@ -78,7 +78,8 @@ Identifying GPIOs | |||
78 | ----------------- | 78 | ----------------- |
79 | GPIOs are identified by unsigned integers in the range 0..MAX_INT. That | 79 | GPIOs are identified by unsigned integers in the range 0..MAX_INT. That |
80 | reserves "negative" numbers for other purposes like marking signals as | 80 | reserves "negative" numbers for other purposes like marking signals as |
81 | "not available on this board", or indicating faults. | 81 | "not available on this board", or indicating faults. Code that doesn't |
82 | touch the underlying hardware treats these integers as opaque cookies. | ||
82 | 83 | ||
83 | Platforms define how they use those integers, and usually #define symbols | 84 | Platforms define how they use those integers, and usually #define symbols |
84 | for the GPIO lines so that board-specific setup code directly corresponds | 85 | for the GPIO lines so that board-specific setup code directly corresponds |
@@ -139,10 +140,10 @@ issues including wire-OR and output latencies. | |||
139 | The get/set calls have no error returns because "invalid GPIO" should have | 140 | The get/set calls have no error returns because "invalid GPIO" should have |
140 | been reported earlier in gpio_set_direction(). However, note that not all | 141 | been reported earlier in gpio_set_direction(). However, note that not all |
141 | platforms can read the value of output pins; those that can't should always | 142 | platforms can read the value of output pins; those that can't should always |
142 | return zero. Also, these calls will be ignored for GPIOs that can't safely | 143 | return zero. Also, using these calls for GPIOs that can't safely be accessed |
143 | be accessed wihtout sleeping (see below). | 144 | without sleeping (see below) is an error. |
144 | 145 | ||
145 | Platform-specific implementations are encouraged to optimise the two | 146 | Platform-specific implementations are encouraged to optimize the two |
146 | calls to access the GPIO value in cases where the GPIO number (and for | 147 | calls to access the GPIO value in cases where the GPIO number (and for |
147 | output, value) are constant. It's normal for them to need only a couple | 148 | output, value) are constant. It's normal for them to need only a couple |
148 | of instructions in such cases (reading or writing a hardware register), | 149 | of instructions in such cases (reading or writing a hardware register), |
@@ -239,7 +240,8 @@ options are part of the IRQ interface, e.g. IRQF_TRIGGER_FALLING, as are | |||
239 | system wakeup capabilities. | 240 | system wakeup capabilities. |
240 | 241 | ||
241 | Non-error values returned from irq_to_gpio() would most commonly be used | 242 | Non-error values returned from irq_to_gpio() would most commonly be used |
242 | with gpio_get_value(). | 243 | with gpio_get_value(), for example to initialize or update driver state |
244 | when the IRQ is edge-triggered. | ||
243 | 245 | ||
244 | 246 | ||
245 | 247 | ||
@@ -260,9 +262,10 @@ pullups (or pulldowns) so that the on-chip ones should not be used. | |||
260 | There are other system-specific mechanisms that are not specified here, | 262 | There are other system-specific mechanisms that are not specified here, |
261 | like the aforementioned options for input de-glitching and wire-OR output. | 263 | like the aforementioned options for input de-glitching and wire-OR output. |
262 | Hardware may support reading or writing GPIOs in gangs, but that's usually | 264 | Hardware may support reading or writing GPIOs in gangs, but that's usually |
263 | configuration dependednt: for GPIOs sharing the same bank. (GPIOs are | 265 | configuration dependent: for GPIOs sharing the same bank. (GPIOs are |
264 | commonly grouped in banks of 16 or 32, with a given SOC having several such | 266 | commonly grouped in banks of 16 or 32, with a given SOC having several such |
265 | banks.) Code relying on such mechanisms will necessarily be nonportable. | 267 | banks.) Some systems can trigger IRQs from output GPIOs. Code relying on |
268 | such mechanisms will necessarily be nonportable. | ||
266 | 269 | ||
267 | Dynamic definition of GPIOs is not currently supported; for example, as | 270 | Dynamic definition of GPIOs is not currently supported; for example, as |
268 | a side effect of configuring an add-on board with some GPIO expanders. | 271 | a side effect of configuring an add-on board with some GPIO expanders. |
diff --git a/Documentation/hrtimer/timer_stats.txt b/Documentation/hrtimer/timer_stats.txt new file mode 100644 index 000000000000..27f782e3593f --- /dev/null +++ b/Documentation/hrtimer/timer_stats.txt | |||
@@ -0,0 +1,68 @@ | |||
1 | timer_stats - timer usage statistics | ||
2 | ------------------------------------ | ||
3 | |||
4 | timer_stats is a debugging facility to make the timer (ab)usage in a Linux | ||
5 | system visible to kernel and userspace developers. It is not intended for | ||
6 | production usage as it adds significant overhead to the (hr)timer code and the | ||
7 | (hr)timer data structures. | ||
8 | |||
9 | timer_stats should be used by kernel and userspace developers to verify that | ||
10 | their code does not make undue use of timers. This helps to avoid unnecessary | ||
11 | wakeups, which in turn reduces power consumption. | ||
12 | |||
13 | It can be enabled by CONFIG_TIMER_STATS in the "Kernel hacking" configuration | ||
14 | section. | ||
15 | |||
16 | timer_stats collects information about the timer events which are fired in a | ||
17 | Linux system over a sample period: | ||
18 | |||
19 | - the pid of the task (process) which initialized the timer | ||
20 | - the name of the process which initialized the timer | ||
21 | - the function where the timer was initialized | ||
22 | - the callback function which is associated with the timer | ||
23 | - the number of events (callbacks) | ||
24 | |||
25 | timer_stats adds an entry to /proc: /proc/timer_stats | ||
26 | |||
27 | This entry is used to control the statistics functionality and to read out the | ||
28 | sampled information. | ||
29 | |||
30 | The timer_stats functionality is inactive on bootup. | ||
31 | |||
32 | To activate a sample period issue: | ||
33 | # echo 1 >/proc/timer_stats | ||
34 | |||
35 | To stop a sample period issue: | ||
36 | # echo 0 >/proc/timer_stats | ||
37 | |||
38 | The statistics can be retrieved by: | ||
39 | # cat /proc/timer_stats | ||
40 | |||
41 | The readout of /proc/timer_stats automatically disables sampling. The sampled | ||
42 | information is kept until a new sample period is started. This allows multiple | ||
43 | readouts. | ||
44 | |||
45 | Sample output of /proc/timer_stats: | ||
46 | |||
47 | Timerstats sample period: 3.888770 s | ||
48 | 12, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick) | ||
49 | 15, 1 swapper hcd_submit_urb (rh_timer_func) | ||
50 | 4, 959 kedac schedule_timeout (process_timeout) | ||
51 | 1, 0 swapper page_writeback_init (wb_timer_fn) | ||
52 | 28, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick) | ||
53 | 22, 2948 IRQ 4 tty_flip_buffer_push (delayed_work_timer_fn) | ||
54 | 3, 3100 bash schedule_timeout (process_timeout) | ||
55 | 1, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) | ||
56 | 1, 1 swapper queue_delayed_work_on (delayed_work_timer_fn) | ||
57 | 1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer) | ||
58 | 1, 2292 ip __netdev_watchdog_up (dev_watchdog) | ||
59 | 1, 23 events/1 do_cache_clean (delayed_work_timer_fn) | ||
60 | 90 total events, 30.0 events/sec | ||
61 | |||
62 | The first column is the number of events, the second column the pid, the third | ||
63 | column is the name of the process. The fourth column shows the function which | ||
64 | initialized the timer and in parentheses the callback function which was | ||
65 | executed on expiry. | ||
66 | |||
67 | Thomas, Ingo | ||
68 | |||
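The column layout described in timer_stats.txt above can be turned into a small reader. The following is a minimal sketch, not part of the patch: the names `TimerStatsEntry`, `parse_line` and `parse_timer_stats` are hypothetical, and the line pattern is inferred only from the sample output shown in the document (including a process name with a space, such as "IRQ 4").

```python
import re
from typing import Iterable, List, NamedTuple, Optional


class TimerStatsEntry(NamedTuple):
    count: int          # number of events (callbacks)
    pid: int            # pid of the task that initialized the timer
    comm: str           # process name; may contain spaces, e.g. "IRQ 4"
    init_function: str  # function where the timer was initialized
    callback: str       # callback function executed on expiry


# Assumed layout: "<count>, <pid> <process name> <init function> (<callback>)"
_ENTRY = re.compile(r"^\s*(\d+),\s*(\d+)\s+(.+?)\s+(\S+)\s+\((\S+)\)\s*$")


def parse_line(line: str) -> Optional[TimerStatsEntry]:
    """Parse one entry line; return None for header and summary lines."""
    m = _ENTRY.match(line)
    if m is None:
        return None
    count, pid, comm, func, callback = m.groups()
    return TimerStatsEntry(int(count), int(pid), comm, func, callback)


def parse_timer_stats(lines: Iterable[str]) -> List[TimerStatsEntry]:
    """Collect all entry lines, skipping anything that does not match."""
    entries = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            entries.append(entry)
    return entries
```

Applied to the sample output in the document, the header and summary lines are skipped and the per-entry counts add up to the 90 total events reported in the summary line.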
diff --git a/Documentation/hrtimers/highres.txt b/Documentation/hrtimers/highres.txt new file mode 100644 index 000000000000..ce0e9a91e157 --- /dev/null +++ b/Documentation/hrtimers/highres.txt | |||
@@ -0,0 +1,249 @@ | |||
1 | High resolution timers and dynamic ticks design notes | ||
2 | ----------------------------------------------------- | ||
3 | |||
4 | Further information can be found in the paper of the OLS 2006 talk "hrtimers | ||
5 | and beyond". The paper is part of the OLS 2006 Proceedings Volume 1, which can | ||
6 | be found on the OLS website: | ||
7 | http://www.linuxsymposium.org/2006/linuxsymposium_procv1.pdf | ||
8 | |||
9 | The slides to this talk are available from: | ||
10 | http://tglx.de/projects/hrtimers/ols2006-hrtimers.pdf | ||
11 | |||
12 | The slides contain five figures (pages 2, 15, 18, 20, 22), which illustrate the | ||
13 | changes in the time(r) related Linux subsystems. Figure #1 (p. 2) shows the | ||
14 | design of the Linux time(r) system before hrtimers and other building blocks | ||
15 | got merged into mainline. | ||
16 | |||
17 | Note: the paper and the slides talk about "clock event source", while we have | ||
18 | switched to the name "clock event devices" in the meantime. | ||
19 | |||
20 | The design contains the following basic building blocks: | ||
21 | |||
22 | - hrtimer base infrastructure | ||
23 | - timeofday and clock source management | ||
24 | - clock event management | ||
25 | - high resolution timer functionality | ||
26 | - dynamic ticks | ||
27 | |||
28 | |||
29 | hrtimer base infrastructure | ||
30 | --------------------------- | ||
31 | |||
32 | The hrtimer base infrastructure was merged into the 2.6.16 kernel. Details of | ||
33 | the base implementation are covered in Documentation/hrtimers/hrtimers.txt. See | ||
34 | also figure #2 (OLS slides p. 15). | ||
35 | |||
36 | The main differences to the timer wheel, which holds the armed timer_list type | ||
37 | timers, are: | ||
38 | - time ordered enqueueing into a rb-tree | ||
39 | - independent of ticks (the processing is based on nanoseconds) | ||
40 | |||
41 | |||
42 | timeofday and clock source management | ||
43 | ------------------------------------- | ||
44 | |||
45 | John Stultz's Generic Time Of Day (GTOD) framework moves a large portion of | ||
46 | code out of the architecture-specific areas into a generic management | ||
47 | framework, as illustrated in figure #3 (OLS slides p. 18). The architecture | ||
48 | specific portion is reduced to the low level hardware details of the clock | ||
49 | sources, which are registered in the framework and selected on a quality based | ||
50 | decision. The low level code provides hardware setup and readout routines and | ||
51 | initializes data structures, which are used by the generic time keeping code to | ||
52 | convert the clock ticks to nanosecond based time values. All other time keeping | ||
53 | related functionality is moved into the generic code. The GTOD base patch got | ||
54 | merged into the 2.6.18 kernel. | ||
55 | |||
56 | Further information about the Generic Time Of Day framework is available in the | ||
57 | OLS 2005 Proceedings Volume 1: | ||
58 | http://www.linuxsymposium.org/2005/linuxsymposium_procv1.pdf | ||
59 | |||
60 | The paper "We Are Not Getting Any Younger: A New Approach to Time and | ||
61 | Timers" was written by J. Stultz, D.V. Hart, & N. Aravamudan. | ||
62 | |||
63 | Figure #3 (OLS slides p.18) illustrates the transformation. | ||
64 | |||
65 | |||
66 | clock event management | ||
67 | ---------------------- | ||
68 | |||
69 | While clock sources provide read access to the monotonically increasing time | ||
70 | value, clock event devices are used to schedule the next event | ||
71 | interrupt(s). The next event is currently defined to be periodic, with its | ||
72 | period defined at compile time. The setup and selection of the event device | ||
73 | for various event driven functionalities is hardwired into the architecture | ||
74 | dependent code. This results in duplicated code across all architectures and | ||
75 | makes it extremely difficult to change the configuration of the system to use | ||
76 | event interrupt devices other than those already built into the | ||
77 | architecture. Another implication of the current design is that it is necessary | ||
78 | to touch all the architecture-specific implementations in order to provide new | ||
79 | functionality like high resolution timers or dynamic ticks. | ||
80 | |||
81 | The clock events subsystem tries to address this problem by providing a generic | ||
82 | solution to manage clock event devices and their usage for the various clock | ||
83 | event driven kernel functionalities. The goal of the clock event subsystem is | ||
84 | to minimize the clock event related architecture dependent code to the pure | ||
85 | hardware related handling and to allow easy addition and utilization of new | ||
86 | clock event devices. It also minimizes the duplicated code across the | ||
87 | architectures as it provides generic functionality down to the interrupt | ||
88 | service handler, which is almost inherently hardware dependent. | ||
89 | |||
90 | Clock event devices are registered either by the architecture dependent boot | ||
91 | code or at module insertion time. Each clock event device fills a data | ||
92 | structure with clock-specific property parameters and callback functions. The | ||
93 | clock event management decides, by using the specified property parameters, the | ||
94 | set of system functions a clock event device will be used to support. This | ||
95 | includes the distinction of per-CPU and per-system global event devices. | ||
96 | |||
97 | System-level global event devices are used for the Linux periodic tick. Per-CPU | ||
98 | event devices are used to provide local CPU functionality such as process | ||
99 | accounting, profiling, and high resolution timers. | ||
100 | |||
101 | The management layer assigns one or more of the following functions to a clock | ||
102 | event device: | ||
103 | - system global periodic tick (jiffies update) | ||
104 | - cpu local update_process_times | ||
105 | - cpu local profiling | ||
106 | - cpu local next event interrupt (non periodic mode) | ||
107 | |||
108 | The clock event device delegates the selection of those timer interrupt related | ||
109 | functions completely to the management layer. The clock management layer stores | ||
110 | a function pointer in the device description structure, which has to be called | ||
111 | from the hardware level handler. This removes a lot of duplicated code from the | ||
112 | architecture specific timer interrupt handlers and hands the control over the | ||
113 | clock event devices and the assignment of timer interrupt related functionality | ||
114 | to the core code. | ||
115 | |||
116 | The clock event layer API is rather small. Aside from the clock event device | ||
117 | registration interface it provides functions to schedule the next event | ||
118 | interrupt, clock event device notification service and support for suspend and | ||
119 | resume. | ||
120 | |||
121 | The framework adds about 700 lines of code which results in a 2KB increase of | ||
122 | the kernel binary size. The conversion of i386 removes about 100 lines of | ||
123 | code. The binary size decrease is in the range of 400 bytes. We believe that the | ||
124 | increase of flexibility and the avoidance of duplicated code across | ||
125 | architectures justifies the slight increase of the binary size. | ||
126 | |||
127 | The conversion of an architecture has no functional impact, but allows the | ||
128 | high resolution and dynamic tick functionalities to be utilized without any change | ||
129 | to the clock event device and timer interrupt code. After the conversion the | ||
130 | enabling of high resolution timers and dynamic ticks is simply provided by | ||
131 | adding the kernel/time/Kconfig file to the architecture specific Kconfig and | ||
132 | adding the dynamic tick specific calls to the idle routine (a total of 3 lines | ||
133 | added to the idle function and the Kconfig file). | ||
134 | |||
135 | Figure #4 (OLS slides p.20) illustrates the transformation. | ||
136 | |||
137 | |||
138 | high resolution timer functionality | ||
139 | ----------------------------------- | ||
140 | |||
141 | During system boot it is not possible to use the high resolution timer | ||
142 | functionality, while making it possible would be difficult and would serve no | ||
143 | useful function. The initialization of the clock event device framework, the | ||
144 | clock source framework (GTOD) and hrtimers itself has to be done and | ||
145 | appropriate clock sources and clock event devices have to be registered before | ||
146 | the high resolution functionality can work. Up to the point where hrtimers are | ||
147 | initialized, the system works in the usual low resolution periodic mode. The | ||
148 | clock source and the clock event device layers provide notification functions | ||
149 | which inform hrtimers about availability of new hardware. hrtimers validates | ||
150 | the usability of the registered clock sources and clock event devices before | ||
151 | switching to high resolution mode. This also ensures that a kernel which is | ||
152 | configured for high resolution timers can run on a system which lacks the | ||
153 | necessary hardware support. | ||
154 | |||
155 | The high resolution timer code does not support SMP machines which have only | ||
156 | global clock event devices. The support of such hardware would involve IPI | ||
157 | calls when an interrupt happens. The overhead would be much larger than the | ||
158 | benefit. This is the reason why we currently disable high resolution and | ||
159 | dynamic ticks on i386 SMP systems which stop the local APIC in C3 power | ||
160 | state. A workaround is available as an idea, but the problem has not been | ||
161 | tackled yet. | ||
162 | |||
163 | The time ordered insertion of timers provides all the infrastructure to decide | ||
164 | whether the event device has to be reprogrammed when a timer is added. The | ||
165 | decision is made per timer base and synchronized across per-cpu timer bases in | ||
166 | a support function. The design allows the system to utilize separate per-CPU | ||
167 | clock event devices for the per-CPU timer bases, but currently only one | ||
168 | reprogrammable clock event device per-CPU is utilized. | ||
169 | |||
170 | When the timer interrupt happens, the next event interrupt handler is called | ||
171 | from the clock event distribution code and moves expired timers from the | ||
172 | red-black tree to a separate doubly linked list and invokes the softirq | ||
173 | handler. An additional mode field in the hrtimer structure allows the system to | ||
174 | execute callback functions directly from the next event interrupt handler. This | ||
175 | is restricted to code which can safely be executed in the hard interrupt | ||
176 | context. This applies, for example, to the common case of a wakeup function as | ||
177 | used by nanosleep. The advantage of executing the handler in the interrupt | ||
178 | context is the avoidance of up to two context switches - from the interrupted | ||
179 | context to the softirq and to the task which is woken up by the expired | ||
180 | timer. | ||
181 | |||
182 | Once a system has switched to high resolution mode, the periodic tick is | ||
183 | switched off. This disables the per system global periodic clock event device - | ||
184 | e.g. the PIT on i386 SMP systems. | ||
185 | |||
186 | The periodic tick functionality is provided by a per-cpu hrtimer. The callback | ||
187 | function is executed in the next event interrupt context and updates jiffies | ||
188 | and calls update_process_times and profiling. The implementation of the hrtimer | ||
189 | based periodic tick is designed to be extended with dynamic tick functionality. | ||
190 | This allows a single clock event device to be used to schedule high resolution | ||
191 | timer and periodic events (jiffies tick, profiling, process accounting) on UP | ||
192 | systems. This has been proven to work with the PIT on i386 and the Incrementer | ||
193 | on PPC. | ||
194 | |||
195 | The softirq for running the hrtimer queues and executing the callbacks has been | ||
196 | separated from the tick bound timer softirq to allow accurate delivery of high | ||
197 | resolution timer signals which are used by itimer and POSIX interval | ||
198 | timers. The execution of this softirq can still be delayed by other softirqs, | ||
199 | but the overall latencies have been significantly improved by this separation. | ||
200 | |||
201 | Figure #5 (OLS slides p.22) illustrates the transformation. | ||
202 | |||
203 | |||
204 | dynamic ticks | ||
205 | ------------- | ||
206 | |||
207 | Dynamic ticks are the logical consequence of the hrtimer based periodic tick | ||
208 | replacement (sched_tick). The functionality of the sched_tick hrtimer is | ||
209 | extended by three functions: | ||
210 | |||
211 | - hrtimer_stop_sched_tick | ||
212 | - hrtimer_restart_sched_tick | ||
213 | - hrtimer_update_jiffies | ||
214 | |||
215 | hrtimer_stop_sched_tick() is called when a CPU goes into idle state. The code | ||
216 | evaluates the next scheduled timer event (from both hrtimers and the timer | ||
217 | wheel) and, if the next event is further away than the next tick, it | ||
218 | reprograms the sched_tick to this future event, to allow longer idle sleeps | ||
219 | without needless interruption by the periodic tick. The function is also | ||
220 | called when an interrupt happens during the idle period, which does not cause a | ||
221 | reschedule. The call is necessary as the interrupt handler might have armed a | ||
222 | new timer whose expiry time is before the time which was identified as the | ||
223 | nearest event in the previous call to hrtimer_stop_sched_tick. | ||
224 | |||
225 | hrtimer_restart_sched_tick() is called when the CPU leaves the idle state before | ||
226 | it calls schedule(). hrtimer_restart_sched_tick() resumes the periodic tick, | ||
227 | which is kept active until the next call to hrtimer_stop_sched_tick(). | ||
228 | |||
229 | hrtimer_update_jiffies() is called from irq_enter() when an interrupt happens | ||
230 | in the idle period to make sure that jiffies are up to date and the interrupt | ||
231 | handler does not have to deal with a possibly stale jiffies value. | ||
232 | |||
233 | The dynamic tick feature provides statistical values which are exported to | ||
234 | userspace via /proc/stats and can be made available for enhanced power | ||
235 | management control. | ||
236 | |||
237 | The implementation leaves room for further development like full tickless | ||
238 | systems, where the time slice is controlled by the scheduler, variable | ||
239 | frequency profiling, and a complete removal of jiffies in the future. | ||
240 | |||
241 | |||
242 | Aside from the current initial submission of i386 support, the patchset has been | ||
243 | extended to x86_64 and ARM already. Initial (work in progress) support is also | ||
244 | available for MIPS and PowerPC. | ||
245 | |||
246 | Thomas, Ingo | ||
247 | |||
248 | |||
249 | |||
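The decision that highres.txt describes for hrtimer_stop_sched_tick() can be modelled in a few lines. This is only an illustrative sketch under stated assumptions, not the kernel implementation: the function name `sched_tick_target`, its parameters and the nanosecond integer timestamps are all hypothetical, and the behaviour with no pending timers is an assumption because the text does not cover that case.

```python
from typing import Iterable


def sched_tick_target(next_tick_ns: int,
                      hrtimer_expiries_ns: Iterable[int],
                      timer_wheel_expiries_ns: Iterable[int]) -> int:
    """Return the expiry the sched_tick would be programmed to on idle entry.

    The next scheduled event is the earliest expiry from both hrtimers and
    the timer wheel. If it lies further away than the next periodic tick,
    the tick is pushed out to that event; otherwise the tick stays put.
    """
    pending = list(hrtimer_expiries_ns) + list(timer_wheel_expiries_ns)
    if not pending:
        # Assumption: with nothing pending, keep the periodic tick unchanged.
        return next_tick_ns
    next_event = min(pending)
    if next_event > next_tick_ns:
        return next_event
    return next_tick_ns
```

Calling the function again after an interrupt in the idle period mirrors the re-evaluation described in the text: if the handler armed a timer that expires earlier than the previously chosen event, the returned expiry moves forward accordingly.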
diff --git a/Documentation/hrtimers.txt b/Documentation/hrtimers/hrtimers.txt index ce31f65e12e7..ce31f65e12e7 100644 --- a/Documentation/hrtimers.txt +++ b/Documentation/hrtimers/hrtimers.txt | |||
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 22b19962a1a2..abd575cfc759 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -609,6 +609,10 @@ and is between 256 and 4096 characters. It is defined in the file | |||
609 | highmem otherwise. This also works to reduce highmem | 609 | highmem otherwise. This also works to reduce highmem |
610 | size on bigger boxes. | 610 | size on bigger boxes. |
611 | 611 | ||
612 | highres= [KNL] Enable/disable high resolution timer mode. | ||
613 | Valid parameters: "on", "off" | ||
614 | Default: "on" | ||
615 | |||
612 | hisax= [HW,ISDN] | 616 | hisax= [HW,ISDN] |
613 | See Documentation/isdn/README.HiSax. | 617 | See Documentation/isdn/README.HiSax. |
614 | 618 | ||
@@ -1078,6 +1082,10 @@ and is between 256 and 4096 characters. It is defined in the file | |||
1078 | in certain environments such as networked servers or | 1082 | in certain environments such as networked servers or |
1079 | real-time systems. | 1083 | real-time systems. |
1080 | 1084 | ||
1085 | nohz= [KNL] Boottime enable/disable dynamic ticks | ||
1086 | Valid arguments: on, off | ||
1087 | Default: on | ||
1088 | |||
1081 | noirqbalance [IA-32,SMP,KNL] Disable kernel irq balancing | 1089 | noirqbalance [IA-32,SMP,KNL] Disable kernel irq balancing |
1082 | 1090 | ||
1083 | noirqdebug [IA-32] Disables the code which attempts to detect and | 1091 | noirqdebug [IA-32] Disables the code which attempts to detect and |