Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/ABI/testing/sysfs-firmware-memmap | 71
-rw-r--r--  Documentation/HOWTO | 2
-rw-r--r--  Documentation/ftrace.txt | 1353
-rw-r--r--  Documentation/kdump/kdump.txt | 2
-rw-r--r--  Documentation/kernel-parameters.txt | 37
-rw-r--r--  Documentation/networking/ip-sysctl.txt | 256
-rw-r--r--  Documentation/nmi_watchdog.txt | 16
-rw-r--r--  Documentation/scheduler/sched-domains.txt | 7
-rw-r--r--  Documentation/scheduler/sched-rt-group.txt | 4
-rw-r--r--  Documentation/x86/i386/IO-APIC.txt (renamed from Documentation/i386/IO-APIC.txt) | 0
-rw-r--r--  Documentation/x86/i386/boot.txt (renamed from Documentation/i386/boot.txt) | 79
-rw-r--r--  Documentation/x86/i386/usb-legacy-support.txt (renamed from Documentation/i386/usb-legacy-support.txt) | 0
-rw-r--r--  Documentation/x86/i386/zero-page.txt (renamed from Documentation/i386/zero-page.txt) | 0
-rw-r--r--  Documentation/x86/x86_64/00-INDEX (renamed from Documentation/x86_64/00-INDEX) | 0
-rw-r--r--  Documentation/x86/x86_64/boot-options.txt (renamed from Documentation/x86_64/boot-options.txt) | 0
-rw-r--r--  Documentation/x86/x86_64/cpu-hotplug-spec (renamed from Documentation/x86_64/cpu-hotplug-spec) | 0
-rw-r--r--  Documentation/x86/x86_64/fake-numa-for-cpusets (renamed from Documentation/x86_64/fake-numa-for-cpusets) | 0
-rw-r--r--  Documentation/x86/x86_64/kernel-stacks (renamed from Documentation/x86_64/kernel-stacks) | 0
-rw-r--r--  Documentation/x86/x86_64/machinecheck (renamed from Documentation/x86_64/machinecheck) | 0
-rw-r--r--  Documentation/x86/x86_64/mm.txt (renamed from Documentation/x86_64/mm.txt) | 5
-rw-r--r--  Documentation/x86/x86_64/uefi.txt (renamed from Documentation/x86_64/uefi.txt) | 4
21 files changed, 1740 insertions, 96 deletions
diff --git a/Documentation/ABI/testing/sysfs-firmware-memmap b/Documentation/ABI/testing/sysfs-firmware-memmap
new file mode 100644
index 000000000000..0d99ee6ae02e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-memmap
@@ -0,0 +1,71 @@
1What: /sys/firmware/memmap/
2Date: June 2008
3Contact: Bernhard Walle <bwalle@suse.de>
4Description:
5 On all platforms, the firmware provides a memory map which the
6 kernel reads. The resources from that memory map are registered
7 in the kernel resource tree and exposed to userspace via
8 /proc/iomem (together with other resources).
9
10 However, on most architectures that firmware-provided memory
11 map is modified afterwards by the kernel itself, either because
12 the kernel merges that memory map with other information or
13 just because the user overwrites that memory map via command
14 line.
15
16 kexec needs the raw firmware-provided memory map to set up the
17 parameter segment of the kernel that should be booted with
18 kexec. Also, the raw memory map is useful for debugging. For
19 that reason, /sys/firmware/memmap is an interface that provides
20 the raw memory map to userspace.
21
22 The structure is as follows: Under /sys/firmware/memmap there
23 are subdirectories with the number of the entry as their name:
24
25 /sys/firmware/memmap/0
26 /sys/firmware/memmap/1
27 /sys/firmware/memmap/2
28 /sys/firmware/memmap/3
29 ...
30
31 The maximum depends on the number of memory map entries provided
32 by the firmware. The order is just the order that the firmware
33 provides.
34
35 Each directory contains three files:
36
37 start : The start address (as hexadecimal number with the
38 '0x' prefix).
39 end : The end address, inclusive (regardless of whether the
40 firmware provides inclusive or exclusive ranges).
41 type : Type of the entry as string. See below for a list of
42 valid types.
43
44 So, for example:
45
46 /sys/firmware/memmap/0/start
47 /sys/firmware/memmap/0/end
48 /sys/firmware/memmap/0/type
49 /sys/firmware/memmap/1/start
50 ...
51
52 Currently the following types exist:
53
54 - System RAM
55 - ACPI Tables
56 - ACPI Non-volatile Storage
57 - reserved
58
59 The following shell snippet can be used to display that memory
60 map in a human-readable format:
61
-------------------- 8< ----------------------------------------
#!/bin/bash
cd /sys/firmware/memmap || exit 1
for dir in * ; do
    start=$(cat "$dir/start")
    end=$(cat "$dir/end")
    type=$(cat "$dir/type")
    # "end" is inclusive; print the exclusive end address (end + 1)
    printf "%016x-%016x (%s)\n" "$start" "$(( end + 1 ))" "$type"
done
-------------------- >8 ----------------------------------------
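
The same per-entry files can also be summed, for example to report how
much memory the firmware marked as "System RAM". This is only a minimal
sketch: the function name memmap_total and its optional directory
argument are hypothetical (the argument exists so the sketch can be run
against a copy of the tree), and it assumes a shell with 64-bit
arithmetic that accepts 0x-prefixed values, such as bash.

```shell
# Sum the sizes of all entries of one type.
# $1: directory holding the numbered entries (default /sys/firmware/memmap)
# $2: type string to match (default "System RAM")
memmap_total() (
    root=${1:-/sys/firmware/memmap}
    want=${2:-System RAM}
    total=0
    for dir in "$root"/*/ ; do
        # Skip anything that is not an entry directory.
        [ -r "${dir}type" ] || continue
        [ "$(cat "${dir}type")" = "$want" ] || continue
        start=$(cat "${dir}start")
        end=$(cat "${dir}end")
        # "end" is inclusive, so the size is end - start + 1.
        total=$(( total + $end - $start + 1 ))
    done
    echo "$total"
)
```

Called without arguments, memmap_total prints the number of bytes of
"System RAM" described by the firmware map.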
diff --git a/Documentation/HOWTO b/Documentation/HOWTO
index 0291ade44c17..619e8caf30db 100644
--- a/Documentation/HOWTO
+++ b/Documentation/HOWTO
@@ -377,7 +377,7 @@ Bug Reporting
 bugzilla.kernel.org is where the Linux kernel developers track kernel
 bugs. Users are encouraged to report all bugs that they find in this
 tool. For details on how to use the kernel bugzilla, please see:
-    http://test.kernel.org/bugzilla/faq.html
+    http://bugzilla.kernel.org/page.cgi?id=faq.html
 
 The file REPORTING-BUGS in the main kernel source directory has a good
 template for how to report a possible kernel bug, and details what kind
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
new file mode 100644
index 000000000000..13e4bf054c38
--- /dev/null
+++ b/Documentation/ftrace.txt
@@ -0,0 +1,1353 @@
1 ftrace - Function Tracer
2 ========================
3
4Copyright 2008 Red Hat Inc.
5Author: Steven Rostedt <srostedt@redhat.com>
6
7
8Introduction
9------------
10
11Ftrace is an internal tracer designed to help out developers and
12designers of systems to find what is going on inside the kernel.
13It can be used for debugging or analyzing latencies and performance
14issues that take place outside of user-space.
15
16Although ftrace is the function tracer, it also includes an
17infrastructure that allows for other types of tracing. The tracers
18that are currently in ftrace include tracers for
19context switches, the time it takes for a high priority task to
20run after it was woken up, the time interrupts are disabled, and
21more.
22
23
24The File System
25---------------
26
27Ftrace uses the debugfs file system to hold the control files as well
28as the files to display output.
29
30To mount the debugfs system:
31
32 # mkdir /debug
33 # mount -t debugfs nodev /debug
34
35
36That's it! (assuming that you have ftrace configured into your kernel)
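
Before mounting, it can be useful to check whether debugfs is already
mounted somewhere. The following is a rough sketch only; the helper
name debugfs_mountpoint is made up for this example, and it relies on
the usual /proc/mounts layout of "device mountpoint fstype ...".

```shell
# Print the first mount point of type debugfs, if any.
# $1: mounts table to read (default /proc/mounts)
debugfs_mountpoint() (
    awk '$3 == "debugfs" { print $2; exit }' "${1:-/proc/mounts}"
)

# Possible use (as root): mount only when nothing is mounted yet.
#   [ -n "$(debugfs_mountpoint)" ] || {
#       mkdir -p /debug && mount -t debugfs nodev /debug
#   }
```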
37
38After mounting the debugfs, you can see a directory called
39"tracing". This directory contains the control and output files
40of ftrace. Here is a list of some of the key files:
41
42
43 Note: all time values are in microseconds.
44
45 current_tracer : This is used to set or display the current tracer
46 that is configured.
47
48 available_tracers : This holds the different types of tracers that
49 have been compiled into the kernel. The tracers
50 listed here can be configured by echoing in their
51 name into current_tracer.
52
53 tracing_enabled : This sets or displays whether the current_tracer
54 is activated and tracing or not. Echo 0 into this
55 file to disable the tracer or 1 (or non-zero) to
56 enable it.
57
58 trace : This file holds the output of the trace in a human readable
59 format.
60
61 latency_trace : This file shows the same trace but the information
62 is organized more to display possible latencies
63 in the system.
64
65 trace_pipe : The output is the same as the "trace" file but this
66 file is meant to be streamed with live tracing.
67 Reads from this file will block until new data
68 is retrieved. Unlike the "trace" and "latency_trace"
69 files, this file is a consumer. This means reading
70 from this file causes sequential reads to display
71 more current data. Once data is read from this
72 file, it is consumed, and will not be read
73 again with a sequential read. The "trace" and
74 "latency_trace" files are static, and if the
75 tracer isn't adding more data, they will display
76 the same information every time they are read.
77
78 iter_ctrl : This file lets the user control the amount of data
79 that is displayed in one of the above output
80 files.
81
82 tracing_max_latency : Some of the tracers record the max latency.
83 For example, the time interrupts are disabled.
84 This time is saved in this file. The max trace
85 will also be stored, and displayed by either
86 "trace" or "latency_trace". A new max trace will
87 only be recorded if the latency is greater than
88 the value in this file. (in microseconds)
89
90 trace_entries : This sets or displays the number of trace
91 entries each CPU buffer can hold. The tracer buffers
92 are the same size for each CPU, so care must be
93 taken when modifying the trace_entries. The number
94 of actual entries will be the number given
95 times the number of possible CPUS. The buffers
96 are saved as individual pages, and the actual entries
97 will always be rounded up to entries per page.
98
99 This can only be updated when the current_tracer
100 is set to "none".
101
102 NOTE: It is planned on changing the allocated buffers
103 from being the number of possible CPUS to
104 the number of online CPUS.
105
106 tracing_cpumask : This is a mask that lets the user only trace
107 on specified CPUS. The format is a hex string
108 representing the CPUS.
109
110 set_ftrace_filter : When dynamic ftrace is configured in, the
111 code is dynamically modified to disable calling
112 of the function profiler (mcount). This lets
113 tracing be configured in with practically no overhead
114 in performance. This also has a side effect of
115 enabling or disabling specific functions to be
116 traced. Echoing in names of functions into this
117 file will limit the trace to only those functions.
118
119 set_ftrace_notrace: This has the opposite effect of
120 set_ftrace_filter. Any function that is added
121 here will not be traced. If a function exists
122 in both set_ftrace_filter and set_ftrace_notrace
123 the function will _not_ be traced.
124
125 available_filter_functions : When a function is encountered the first
126 time by the dynamic tracer, it is recorded and
127 later the call is converted into a nop. This file
128 lists the functions that have been recorded
129 by the dynamic tracer and these functions can
130 be used to set the ftrace filter by the above
131 "set_ftrace_filter" file.
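
As an illustration of the tracing_cpumask format described above, the
hex string can be built from a list of CPU numbers. This is a sketch
under the assumption that bit N of the mask selects CPU N and that
only CPUs below 32 are involved; the helper name cpus_to_mask is
hypothetical.

```shell
# Print the hex mask for the CPU numbers given as arguments.
cpus_to_mask() (
    mask=0
    for cpu in "$@"; do
        # Set bit number "cpu" in the mask.
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
)

# Possible use (as root, with debugfs mounted on /debug):
#   cpus_to_mask 0 2 > /debug/tracing/tracing_cpumask
```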
132
133
134The Tracers
135-----------
136
137Here is the list of current tracers that can be configured.
138
139 ftrace - function tracer that uses mcount to trace all functions.
140 It is possible to filter which functions are
141 traced when dynamic ftrace is configured in.
142
143 sched_switch - traces the context switches between tasks.
144
145 irqsoff - traces the areas that disable interrupts and saves off
146 the trace with the longest max latency.
147 See tracing_max_latency. When a new max is recorded,
148 it replaces the old trace. It is best to view this
149 trace with the latency_trace file.
150
151 preemptoff - Similar to irqsoff but traces and records the time
152 preemption is disabled.
153
154 preemptirqsoff - Similar to irqsoff and preemptoff, but traces and
155 records the longest time for which irqs and/or preemption are
156 disabled.
157
158 wakeup - Traces and records the max latency that it takes for
159 the highest priority task to get scheduled after
160 it has been woken up.
161
162 none - This is not a tracer. To remove all tracers from tracing
163 simply echo "none" into current_tracer.
164
165
166Examples of using the tracer
167----------------------------
168
169Here are typical examples of using the tracers when controlling
170them only through the debugfs interface (without using any user-land utilities).
171
172Output format:
173--------------
174
175Here's an example of the output format of the file "trace"
176
177 --------
178# tracer: ftrace
179#
180# TASK-PID CPU# TIMESTAMP FUNCTION
181# | | | | |
182 bash-4251 [01] 10152.583854: path_put <-path_walk
183 bash-4251 [01] 10152.583855: dput <-path_put
184 bash-4251 [01] 10152.583855: _atomic_dec_and_lock <-dput
185 --------
186
187A header is printed with the name of the tracer that produced the trace. In this case
188the tracer is "ftrace". Then comes a header showing the format: the task name
189"bash", the task PID "4251", the CPU that it was running on
190"01", the timestamp in <secs>.<usecs> format, the function name that was
191traced "path_put" and the parent function that called this function
192"path_walk".
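
Because each line of the "trace" file follows this fixed layout, it is
easy to post-process with standard tools. The sketch below counts how
often each traced function appears. The helper name count_functions is
hypothetical, and the field positions assume task names without
embedded spaces.

```shell
# Count how often each function appears in "trace"-format input,
# read from stdin or from the files given as arguments.
# Fields: TASK-PID  [CPU]  TIMESTAMP:  FUNCTION  <-PARENT
count_functions() (
    # Skip header lines (starting with '#') and anything whose second
    # field is not a bracketed CPU number.
    awk '$1 !~ /^#/ && $2 ~ /^\[[0-9]+\]$/ { n[$4]++ }
         END { for (f in n) print n[f], f }' "$@" |
        sort -k1,1nr -k2,2
)

# Possible use:
#   count_functions /debug/tracing/trace | head
```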
193
194The sched_switch tracer also includes tracing of task wake ups and
195context switches.
196
197 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 2916:115:S
198 ksoftirqd/1-7 [01] 1453.070013: 7:115:R + 10:115:S
199 ksoftirqd/1-7 [01] 1453.070013: 7:115:R ==> 10:115:R
200 events/1-10 [01] 1453.070013: 10:115:S ==> 2916:115:R
201 kondemand/1-2916 [01] 1453.070013: 2916:115:S ==> 7:115:R
202 ksoftirqd/1-7 [01] 1453.070013: 7:115:S ==> 0:140:R
203
204Wake ups are represented by a "+" and the context switches show
205"==>". The format is:
206
207 Context switches:
208
209 Previous task Next Task
210
211 <pid>:<prio>:<state> ==> <pid>:<prio>:<state>
212
213 Wake ups:
214
215 Current task Task waking up
216
217 <pid>:<prio>:<state> + <pid>:<prio>:<state>
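
To make the two event formats concrete, the following sketch splits
each sched_switch line into its parts. It is only an illustration: the
helper name decode_sched_events and its output wording are invented
here, and it assumes task names without embedded spaces.

```shell
# Decode sched_switch events read from stdin or from the given files.
# Fields: TASK-PID [CPU] TIMESTAMP: <pid>:<prio>:<state> OP <pid>:<prio>:<state>
decode_sched_events() (
    awk '$5 == "+" || $5 == "==>" {
        split($4, a, ":"); split($6, b, ":")
        # "+" marks a wake up, "==>" a context switch.
        kind = ($5 == "+") ? "wakeup" : "switch"
        printf "%s %s(prio %s, %s) -> %s(prio %s, %s)\n",
               kind, a[1], a[2], a[3], b[1], b[2], b[3]
    }' "$@"
)
```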
218
219The prio is the internal kernel priority, which is inverse to the
220priority that is usually displayed by user-space tools. Zero represents
221the highest priority (99). Prio 100 starts the "nice" priorities with
222100 being equal to nice -20 and 139 being nice 19. The prio "140" is
223reserved for the idle task which is the lowest priority thread (pid 0).
224
225
226Latency trace format
227--------------------
228
229For traces that display latency times, the latency_trace file gives
230a bit more information to see why a latency happened. Here's a typical
231trace.
232
233# tracer: irqsoff
234#
235irqsoff latency trace v1.1.5 on 2.6.26-rc8
236--------------------------------------------------------------------
237 latency: 97 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
238 -----------------
239 | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0)
240 -----------------
241 => started at: apic_timer_interrupt
242 => ended at: do_softirq
243
244# _------=> CPU#
245# / _-----=> irqs-off
246# | / _----=> need-resched
247# || / _---=> hardirq/softirq
248# ||| / _--=> preempt-depth
249# |||| /
250# ||||| delay
251# cmd pid ||||| time | caller
252# \ / ||||| \ | /
253 <idle>-0 0d..1 0us+: trace_hardirqs_off_thunk (apic_timer_interrupt)
254 <idle>-0 0d.s. 97us : __do_softirq (do_softirq)
255 <idle>-0 0d.s1 98us : trace_hardirqs_on (do_softirq)
256
257
258vim:ft=help
259
260
261This shows that the current tracer is "irqsoff" tracing the time
262interrupts are disabled. It gives the trace version and the kernel
263this was executed on (2.6.26-rc8). Then it displays the max latency
264in microsecs (97 us), followed by the number of trace entries displayed
265and the total number recorded (both are three: #3/3), and the type of
266preemption that was used (PREEMPT). VP, KP, SP, and HP are always zero
267and reserved for later use. #P is the number of online CPUS (#P:2).
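
The summary line of the header can be picked apart with sed when a
script needs the numbers. This is a sketch that assumes the exact
"latency: N us, #shown/recorded, CPU#n" layout shown in the example
above; the helper name latency_summary is hypothetical.

```shell
# Print "<latency_us> <shown> <recorded> <cpu>" for the summary line
# of a latency_trace header (stdin or the files given as arguments).
latency_summary() (
    sed -n 's/^ *latency: \([0-9][0-9]*\) us, #\([0-9][0-9]*\)\/\([0-9][0-9]*\), CPU#\([0-9][0-9]*\).*/\1 \2 \3 \4/p' "$@"
)

# Possible use:
#   latency_summary /debug/tracing/latency_trace
```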
268
269The task is the process that was running when the latency happened.
270(swapper pid: 0).
271
272The start and stop that caused the latencies:
273
274 apic_timer_interrupt is where the interrupts were disabled.
275 do_softirq is where they were enabled again.
276
277The next lines after the header are the trace itself. The header
278explains which is which.
279
280 cmd: The name of the process in the trace.
281
282 pid: The PID of that process.
283
284 CPU#: The CPU that the process was running on.
285
286 irqs-off: 'd' interrupts are disabled. '.' otherwise.
287
288 need-resched: 'N' task need_resched is set, '.' otherwise.
289
290 hardirq/softirq:
291 'H' - hard irq happened inside a softirq.
292 'h' - hard irq is running
293 's' - soft irq is running
294 '.' - normal context.
295
296 preempt-depth: The level of preempt_disabled
297
298The above is mostly meaningful for kernel developers.
299
300 time: This differs from the trace output, which
301 contains an absolute timestamp. This timestamp is relative
302 to the start of the first entry in the trace.
303
304 delay: This is just to help catch your eye a bit better. It still
305 needs to be fixed to be only relative to the same CPU.
306 The mark is determined by the difference between this
307 current trace and the next trace.
308 '!' - greater than preempt_mark_thresh (default 100)
309 '+' - greater than 1 microsecond
310 ' ' - less than or equal to 1 microsecond.
311
312 The rest is the same as the 'trace' file.
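
The rule for the delay mark can be written down as a small function.
This is a sketch of the rule as stated above, not the kernel's code:
the name delay_mark is hypothetical, and the threshold is assumed to
be in microseconds with the documented default of 100.

```shell
# Print the delay mark for the gap (in microseconds) between an entry
# and the next one.
# $1: gap in microseconds
# $2: threshold for '!' (default 100, as for preempt_mark_thresh)
delay_mark() (
    delta=$1
    thresh=${2:-100}
    if [ "$delta" -gt "$thresh" ]; then
        printf '!'
    elif [ "$delta" -gt 1 ]; then
        printf '+'
    else
        printf ' '
    fi
)
```

For the irqsoff example above, the first entry is followed by one 97us
later, which gives '+'. A gap of 294us would give '!'.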
313
314
315iter_ctrl
316---------
317
318The iter_ctrl file is used to control what gets printed in the trace
319output. To see what is available, simply cat the file:
320
321 cat /debug/tracing/iter_ctrl
322 print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
323 noblock nostacktrace nosched-tree
324
325To disable one of the options, echo in the option prefixed with "no".
326
327 echo noprint-parent > /debug/tracing/iter_ctrl
328
329To enable an option, leave off the "no".
330
331 echo sym-offset > /debug/tracing/iter_ctrl
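
Since the file lists every option either by name or with a "no" prefix,
a script can check the current state of one option before changing it.
A minimal sketch follows. The helper name iter_opt_state and its
optional file argument are hypothetical, and it assumes the options
are separated by whitespace as in the output shown above.

```shell
# Report whether an iter_ctrl option is enabled or disabled.
# $1: option name without the "no" prefix, e.g. sym-offset
# $2: file to read (default /debug/tracing/iter_ctrl)
iter_opt_state() (
    opt=$1
    file=${2:-/debug/tracing/iter_ctrl}
    # Put each option on its own line, then look for an exact match.
    if tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "$opt"; then
        echo enabled
    elif tr -s ' \t' '\n\n' < "$file" | grep -qxF -- "no$opt"; then
        echo disabled
    else
        echo unknown
    fi
)
```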
332
333Here are the available options:
334
335 print-parent - On function traces, display the calling function
336 as well as the function being traced.
337
338 print-parent:
339 bash-4000 [01] 1477.606694: simple_strtoul <-strict_strtoul
340
341 noprint-parent:
342 bash-4000 [01] 1477.606694: simple_strtoul
343
344
345 sym-offset - Display not only the function name, but also the offset
346 in the function. For example, instead of seeing just
347 "ktime_get" you will see "ktime_get+0xb/0x20"
348
349 sym-offset:
350 bash-4000 [01] 1477.606694: simple_strtoul+0x6/0xa0
351
352 sym-addr - This will display the function address as well as
353 the function name.
354
355 sym-addr:
356 bash-4000 [01] 1477.606694: simple_strtoul <c0339346>
357
358 verbose - This deals with the latency_trace file.
359
360 bash 4000 1 0 00000000 00010a95 [58127d26] 1720.415ms \
361 (+0.000ms): simple_strtoul (strict_strtoul)
362
363 raw - This will display raw numbers. This option is best for use with
364 user applications that can translate the raw numbers better than
365 having it done in the kernel.
366
367 hex - similar to raw, but the numbers will be in a hexadecimal format.
368
369 bin - This will print out the formats in raw binary.
370
371 block - TBD (needs update)
372
373 stacktrace - This is one of the options that changes the trace itself.
374 When a trace is recorded, so is the stack of functions.
375 This allows for back traces of trace sites.
376
377 sched-tree - TBD (any users??)
378
379
380sched_switch
381------------
382
383This tracer simply records schedule switches. Here's an example
384of how to use it.
385
386 # echo sched_switch > /debug/tracing/current_tracer
387 # echo 1 > /debug/tracing/tracing_enabled
388 # sleep 1
389 # echo 0 > /debug/tracing/tracing_enabled
390 # cat /debug/tracing/trace
391
392# tracer: sched_switch
393#
394# TASK-PID CPU# TIMESTAMP FUNCTION
395# | | | | |
396 bash-3997 [01] 240.132281: 3997:120:R + 4055:120:R
397 bash-3997 [01] 240.132284: 3997:120:R ==> 4055:120:R
398 sleep-4055 [01] 240.132371: 4055:120:S ==> 3997:120:R
399 bash-3997 [01] 240.132454: 3997:120:R + 4055:120:S
400 bash-3997 [01] 240.132457: 3997:120:R ==> 4055:120:R
401 sleep-4055 [01] 240.132460: 4055:120:D ==> 3997:120:R
402 bash-3997 [01] 240.132463: 3997:120:R + 4055:120:D
403 bash-3997 [01] 240.132465: 3997:120:R ==> 4055:120:R
404 <idle>-0 [00] 240.132589: 0:140:R + 4:115:S
405 <idle>-0 [00] 240.132591: 0:140:R ==> 4:115:R
406 ksoftirqd/0-4 [00] 240.132595: 4:115:S ==> 0:140:R
407 <idle>-0 [00] 240.132598: 0:140:R + 4:115:S
408 <idle>-0 [00] 240.132599: 0:140:R ==> 4:115:R
409 ksoftirqd/0-4 [00] 240.132603: 4:115:S ==> 0:140:R
410 sleep-4055 [01] 240.133058: 4055:120:S ==> 3997:120:R
411 [...]
412
413
414As discussed previously, the header shows
415the name of the tracer and labels the columns. The "FUNCTION" column
416is a misnomer since here it represents the wake ups and context
417switches.
418
419The sched_switch tracer only lists the wake ups (represented with '+')
420and context switches ('==>'), with the previous or current task
421first, followed by the next task or the task waking up. The format for both
422of these is PID:KERNEL-PRIO:TASK-STATE. Remember that the KERNEL-PRIO
423is the inverse of the actual priority with zero (0) being the highest
424priority and the nice values starting at 100 (nice -20). Below is
425a quick chart to map the kernel priority to user land priorities.
426
427 Kernel priority: 0 to 99 ==> user RT priority 99 to 0
428 Kernel priority: 100 to 139 ==> user nice -20 to 19
429 Kernel priority: 140 ==> idle task priority
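
The chart above can be expressed as a small conversion helper. This is
a sketch of the mapping only; the function name kprio_to_user and its
output wording are invented for this example.

```shell
# Translate a kernel priority, as shown by sched_switch, into the
# value that user-space tools display.
kprio_to_user() (
    prio=$1
    if [ "$prio" -lt 0 ] || [ "$prio" -gt 140 ]; then
        echo "unknown"
    elif [ "$prio" -le 99 ]; then
        # Kernel 0..99 maps to user RT priority 99..0.
        echo "rt $(( 99 - prio ))"
    elif [ "$prio" -le 139 ]; then
        # Kernel 100..139 maps to nice -20..19.
        echo "nice $(( prio - 120 ))"
    else
        echo "idle"
    fi
)
```

For example, the kernel priority 115 seen for ksoftirqd in the trace
above corresponds to nice -5, and 120 corresponds to nice 0.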
430
431The task states are:
432
433 R - running : wants to run, may not actually be running
434 S - sleep : process is waiting to be woken up (handles signals)
435 D - deep sleep : process must be woken up (ignores signals)
436 T - stopped : process suspended
437 t - traced : process is being traced (with something like gdb)
438 Z - zombie : process waiting to be cleaned up
439 X - unknown
440
441
442ftrace_enabled
443--------------
444
445The following tracers give different output depending on whether
446or not the sysctl ftrace_enabled is set. To set ftrace_enabled,
447one can either use the sysctl function or set it via the proc
448file system interface.
449
450 sysctl kernel.ftrace_enabled=1
451
452 or
453
454 echo 1 > /proc/sys/kernel/ftrace_enabled
455
456To disable ftrace_enabled simply replace the '1' with '0' in
457the above commands.
458
459When ftrace_enabled is set the tracers will also record the functions
460that are within the trace. The descriptions of the tracers
461will also show an example with ftrace enabled.
462
463
464irqsoff
465-------
466
467When interrupts are disabled, the CPU can not react to any other
468external event (besides NMIs and SMIs). This prevents the timer
469interrupt from triggering or the mouse interrupt from letting the
470kernel know of a new mouse event. The result is added latency in the
471reaction time.
472
473The irqsoff tracer tracks the time interrupts are disabled and when
474they are re-enabled. When a new maximum latency is hit, it saves off
475the trace so that it may be retrieved at a later time. Every time a
476new maximum is reached, the old saved trace is discarded and the new
477trace is saved.
478
479To reset the maximum, echo 0 into tracing_max_latency. Here's an
480example:
481
482 # echo irqsoff > /debug/tracing/current_tracer
483 # echo 0 > /debug/tracing/tracing_max_latency
484 # echo 1 > /debug/tracing/tracing_enabled
485 # ls -ltr
486 [...]
487 # echo 0 > /debug/tracing/tracing_enabled
488 # cat /debug/tracing/latency_trace
489# tracer: irqsoff
490#
491irqsoff latency trace v1.1.5 on 2.6.26-rc8
492--------------------------------------------------------------------
493 latency: 6 us, #3/3, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
494 -----------------
495 | task: bash-4269 (uid:0 nice:0 policy:0 rt_prio:0)
496 -----------------
497 => started at: copy_page_range
498 => ended at: copy_page_range
499
500# _------=> CPU#
501# / _-----=> irqs-off
502# | / _----=> need-resched
503# || / _---=> hardirq/softirq
504# ||| / _--=> preempt-depth
505# |||| /
506# ||||| delay
507# cmd pid ||||| time | caller
508# \ / ||||| \ | /
509 bash-4269 1...1 0us+: _spin_lock (copy_page_range)
510 bash-4269 1...1 7us : _spin_unlock (copy_page_range)
511 bash-4269 1...2 7us : trace_preempt_on (copy_page_range)
512
513
514vim:ft=help
515
516Here we see that we had a latency of 6 microsecs (which is
517very good). The spin_lock in copy_page_range disabled interrupts.
518The difference between the 6 and the displayed timestamp 7us is
519because the clock must have incremented between the time of recording
520the max latency and recording the function that had that latency.
521
522Note that the above had ftrace_enabled not set. If we set ftrace_enabled,
523we get a much larger output:
524
525# tracer: irqsoff
526#
527irqsoff latency trace v1.1.5 on 2.6.26-rc8
528--------------------------------------------------------------------
529 latency: 50 us, #101/101, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
530 -----------------
531 | task: ls-4339 (uid:0 nice:0 policy:0 rt_prio:0)
532 -----------------
533 => started at: __alloc_pages_internal
534 => ended at: __alloc_pages_internal
535
536# _------=> CPU#
537# / _-----=> irqs-off
538# | / _----=> need-resched
539# || / _---=> hardirq/softirq
540# ||| / _--=> preempt-depth
541# |||| /
542# ||||| delay
543# cmd pid ||||| time | caller
544# \ / ||||| \ | /
545 ls-4339 0...1 0us+: get_page_from_freelist (__alloc_pages_internal)
546 ls-4339 0d..1 3us : rmqueue_bulk (get_page_from_freelist)
547 ls-4339 0d..1 3us : _spin_lock (rmqueue_bulk)
548 ls-4339 0d..1 4us : add_preempt_count (_spin_lock)
549 ls-4339 0d..2 4us : __rmqueue (rmqueue_bulk)
550 ls-4339 0d..2 5us : __rmqueue_smallest (__rmqueue)
551 ls-4339 0d..2 5us : __mod_zone_page_state (__rmqueue_smallest)
552 ls-4339 0d..2 6us : __rmqueue (rmqueue_bulk)
553 ls-4339 0d..2 6us : __rmqueue_smallest (__rmqueue)
554 ls-4339 0d..2 7us : __mod_zone_page_state (__rmqueue_smallest)
555 ls-4339 0d..2 7us : __rmqueue (rmqueue_bulk)
556 ls-4339 0d..2 8us : __rmqueue_smallest (__rmqueue)
557[...]
558 ls-4339 0d..2 46us : __rmqueue_smallest (__rmqueue)
559 ls-4339 0d..2 47us : __mod_zone_page_state (__rmqueue_smallest)
560 ls-4339 0d..2 47us : __rmqueue (rmqueue_bulk)
561 ls-4339 0d..2 48us : __rmqueue_smallest (__rmqueue)
562 ls-4339 0d..2 48us : __mod_zone_page_state (__rmqueue_smallest)
563 ls-4339 0d..2 49us : _spin_unlock (rmqueue_bulk)
564 ls-4339 0d..2 49us : sub_preempt_count (_spin_unlock)
565 ls-4339 0d..1 50us : get_page_from_freelist (__alloc_pages_internal)
566 ls-4339 0d..2 51us : trace_hardirqs_on (__alloc_pages_internal)
567
568
569vim:ft=help
570
571
572Here we traced a 50 microsecond latency. But we also see all the
573functions that were called during that time. Note that by enabling
574function tracing, we incur added overhead. This overhead may
575extend the latency times. Nevertheless, this trace has provided
576some very helpful debugging information.
577
578
579preemptoff
580----------
581
582When preemption is disabled we may be able to receive interrupts but
583the task can not be preempted and a higher priority task must wait
584for preemption to be enabled again before it can preempt a lower
585priority task.
586
587The preemptoff tracer traces the places that disable preemption.
588Like the irqsoff tracer, it records the maximum latency for which preemption
589was disabled. The control of preemptoff is much like the irqsoff.
590
591 # echo preemptoff > /debug/tracing/current_tracer
592 # echo 0 > /debug/tracing/tracing_max_latency
593 # echo 1 > /debug/tracing/tracing_enabled
594 # ls -ltr
595 [...]
596 # echo 0 > /debug/tracing/tracing_enabled
597 # cat /debug/tracing/latency_trace
598# tracer: preemptoff
599#
600preemptoff latency trace v1.1.5 on 2.6.26-rc8
601--------------------------------------------------------------------
602 latency: 29 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
603 -----------------
604 | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
605 -----------------
606 => started at: do_IRQ
607 => ended at: __do_softirq
608
609# _------=> CPU#
610# / _-----=> irqs-off
611# | / _----=> need-resched
612# || / _---=> hardirq/softirq
613# ||| / _--=> preempt-depth
614# |||| /
615# ||||| delay
616# cmd pid ||||| time | caller
617# \ / ||||| \ | /
618 sshd-4261 0d.h. 0us+: irq_enter (do_IRQ)
619 sshd-4261 0d.s. 29us : _local_bh_enable (__do_softirq)
620 sshd-4261 0d.s1 30us : trace_preempt_on (__do_softirq)
621
622
623vim:ft=help
624
625This has some more changes. Preemption was disabled when an interrupt
626came in (notice the 'h'), and was enabled while doing a softirq.
627(notice the 's'). But we also see that interrupts have been disabled
628when entering the preempt off section and leaving it (the 'd').
629We do not know if interrupts were enabled in the meantime.
630
631# tracer: preemptoff
632#
633preemptoff latency trace v1.1.5 on 2.6.26-rc8
634--------------------------------------------------------------------
635 latency: 63 us, #87/87, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
636 -----------------
637 | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
638 -----------------
639 => started at: remove_wait_queue
640 => ended at: __do_softirq
641
642# _------=> CPU#
643# / _-----=> irqs-off
644# | / _----=> need-resched
645# || / _---=> hardirq/softirq
646# ||| / _--=> preempt-depth
647# |||| /
648# ||||| delay
649# cmd pid ||||| time | caller
650# \ / ||||| \ | /
651 sshd-4261 0d..1 0us : _spin_lock_irqsave (remove_wait_queue)
652 sshd-4261 0d..1 1us : _spin_unlock_irqrestore (remove_wait_queue)
653 sshd-4261 0d..1 2us : do_IRQ (common_interrupt)
654 sshd-4261 0d..1 2us : irq_enter (do_IRQ)
655 sshd-4261 0d..1 2us : idle_cpu (irq_enter)
656 sshd-4261 0d..1 3us : add_preempt_count (irq_enter)
657 sshd-4261 0d.h1 3us : idle_cpu (irq_enter)
658 sshd-4261 0d.h. 4us : handle_fasteoi_irq (do_IRQ)
659[...]
660 sshd-4261 0d.h. 12us : add_preempt_count (_spin_lock)
661 sshd-4261 0d.h1 12us : ack_ioapic_quirk_irq (handle_fasteoi_irq)
662 sshd-4261 0d.h1 13us : move_native_irq (ack_ioapic_quirk_irq)
663 sshd-4261 0d.h1 13us : _spin_unlock (handle_fasteoi_irq)
664 sshd-4261 0d.h1 14us : sub_preempt_count (_spin_unlock)
665 sshd-4261 0d.h1 14us : irq_exit (do_IRQ)
666 sshd-4261 0d.h1 15us : sub_preempt_count (irq_exit)
667 sshd-4261 0d..2 15us : do_softirq (irq_exit)
668 sshd-4261 0d... 15us : __do_softirq (do_softirq)
669 sshd-4261 0d... 16us : __local_bh_disable (__do_softirq)
670 sshd-4261 0d... 16us+: add_preempt_count (__local_bh_disable)
671 sshd-4261 0d.s4 20us : add_preempt_count (__local_bh_disable)
672 sshd-4261 0d.s4 21us : sub_preempt_count (local_bh_enable)
673 sshd-4261 0d.s5 21us : sub_preempt_count (local_bh_enable)
674[...]
675 sshd-4261 0d.s6 41us : add_preempt_count (__local_bh_disable)
676 sshd-4261 0d.s6 42us : sub_preempt_count (local_bh_enable)
677 sshd-4261 0d.s7 42us : sub_preempt_count (local_bh_enable)
678 sshd-4261 0d.s5 43us : add_preempt_count (__local_bh_disable)
679 sshd-4261 0d.s5 43us : sub_preempt_count (local_bh_enable_ip)
680 sshd-4261 0d.s6 44us : sub_preempt_count (local_bh_enable_ip)
681 sshd-4261 0d.s5 44us : add_preempt_count (__local_bh_disable)
682 sshd-4261 0d.s5 45us : sub_preempt_count (local_bh_enable)
683[...]
684 sshd-4261 0d.s. 63us : _local_bh_enable (__do_softirq)
685 sshd-4261 0d.s1 64us : trace_preempt_on (__do_softirq)
686
687
688The above is an example of the preemptoff trace with ftrace_enabled
689set. Here we see that interrupts were disabled the entire time.
690The irq_enter code lets us know that we entered an interrupt 'h'.
691Before that, the functions being traced still show that it is not
692in an interrupt, but we can see by the functions themselves that
693this is not the case.
694
695Notice that __do_softirq, when called, doesn't have a preempt_count.
696It may seem that we missed a preempt enable. What really happened
697is that the preempt count is held on the thread's stack and we
698switched to the softirq stack (4K stacks in effect). The code
699does not copy the preempt count, but because interrupts are disabled
700we don't need to worry about it. Having a tracer like this is good
701to let people know what really happens inside the kernel.
702
703
704preemptirqsoff
705--------------
706
707Knowing the locations that have interrupts disabled or preemption
708disabled for the longest times is helpful. But sometimes we would
709like to know when either preemption or interrupts (or both) are disabled.
710
711Consider the following code:
712
713 local_irq_disable();
714 call_function_with_irqs_off();
715 preempt_disable();
716 call_function_with_irqs_and_preemption_off();
717 local_irq_enable();
718 call_function_with_preemption_off();
719 preempt_enable();
720
721The irqsoff tracer will record the total length of
722call_function_with_irqs_off() and
723call_function_with_irqs_and_preemption_off().
724
725The preemptoff tracer will record the total length of
726call_function_with_irqs_and_preemption_off() and
727call_function_with_preemption_off().
728
729But neither will trace the total time that interrupts and/or preemption
730are disabled. This total time is the time that we cannot schedule.
731To record this time, use the preemptirqsoff tracer.
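
To make the difference concrete, here is a small shell sketch of the
arithmetic. The three durations are made-up numbers (in microseconds)
standing in for the three calls in the code sequence, and
critical_sections is a hypothetical helper, not part of ftrace.

```shell
#!/bin/sh
# Illustration only: durations are invented, not measured.
# $1: time spent in call_function_with_irqs_off()
# $2: time spent in call_function_with_irqs_and_preemption_off()
# $3: time spent in call_function_with_preemption_off()
critical_sections() {
    irqs_only=$1
    both_off=$2
    preempt_only=$3

    # irqsoff sees only the part where interrupts are disabled.
    echo "irqsoff=$((irqs_only + both_off))"
    # preemptoff sees only the part where preemption is disabled.
    echo "preemptoff=$((both_off + preempt_only))"
    # preemptirqsoff sees the whole window in which we cannot schedule.
    echo "preemptirqsoff=$((irqs_only + both_off + preempt_only))"
}

critical_sections 10 20 30
```

With these numbers, neither irqsoff nor preemptoff alone reports the
full window; only the preemptirqsoff figure covers all three calls.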
732
733Again, using this trace is much like the irqsoff and preemptoff tracers.
734
735 # echo preemptirqsoff > /debug/tracing/current_tracer
736 # echo 0 > /debug/tracing/tracing_max_latency
737 # echo 1 > /debug/tracing/tracing_enabled
738 # ls -ltr
739 [...]
740 # echo 0 > /debug/tracing/tracing_enabled
741 # cat /debug/tracing/latency_trace
742# tracer: preemptirqsoff
743#
744preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
745--------------------------------------------------------------------
746 latency: 293 us, #3/3, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
747 -----------------
748 | task: ls-4860 (uid:0 nice:0 policy:0 rt_prio:0)
749 -----------------
750 => started at: apic_timer_interrupt
751 => ended at: __do_softirq
752
753# _------=> CPU#
754# / _-----=> irqs-off
755# | / _----=> need-resched
756# || / _---=> hardirq/softirq
757# ||| / _--=> preempt-depth
758# |||| /
759# ||||| delay
760# cmd pid ||||| time | caller
761# \ / ||||| \ | /
762 ls-4860 0d... 0us!: trace_hardirqs_off_thunk (apic_timer_interrupt)
763 ls-4860 0d.s. 294us : _local_bh_enable (__do_softirq)
764 ls-4860 0d.s1 294us : trace_preempt_on (__do_softirq)
765
766
767vim:ft=help
768
769
770The trace_hardirqs_off_thunk is called from assembly on x86 when
771interrupts are disabled in the assembly code. Without the function
772tracing, we don't know if interrupts were enabled within the preemption
773points. We do see that it started with preemption enabled.
774
775Here is a trace with ftrace_enabled set:
776
777
778# tracer: preemptirqsoff
779#
780preemptirqsoff latency trace v1.1.5 on 2.6.26-rc8
781--------------------------------------------------------------------
782 latency: 105 us, #183/183, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
783 -----------------
784 | task: sshd-4261 (uid:0 nice:0 policy:0 rt_prio:0)
785 -----------------
786 => started at: write_chan
787 => ended at: __do_softirq
788
789# _------=> CPU#
790# / _-----=> irqs-off
791# | / _----=> need-resched
792# || / _---=> hardirq/softirq
793# ||| / _--=> preempt-depth
794# |||| /
795# ||||| delay
796# cmd pid ||||| time | caller
797# \ / ||||| \ | /
798 ls-4473 0.N.. 0us : preempt_schedule (write_chan)
799 ls-4473 0dN.1 1us : _spin_lock (schedule)
800 ls-4473 0dN.1 2us : add_preempt_count (_spin_lock)
801 ls-4473 0d..2 2us : put_prev_task_fair (schedule)
802[...]
803 ls-4473 0d..2 13us : set_normalized_timespec (ktime_get_ts)
804 ls-4473 0d..2 13us : __switch_to (schedule)
805 sshd-4261 0d..2 14us : finish_task_switch (schedule)
806 sshd-4261 0d..2 14us : _spin_unlock_irq (finish_task_switch)
807 sshd-4261 0d..1 15us : add_preempt_count (_spin_lock_irqsave)
808 sshd-4261 0d..2 16us : _spin_unlock_irqrestore (hrtick_set)
809 sshd-4261 0d..2 16us : do_IRQ (common_interrupt)
810 sshd-4261 0d..2 17us : irq_enter (do_IRQ)
811 sshd-4261 0d..2 17us : idle_cpu (irq_enter)
812 sshd-4261 0d..2 18us : add_preempt_count (irq_enter)
813 sshd-4261 0d.h2 18us : idle_cpu (irq_enter)
814 sshd-4261 0d.h. 18us : handle_fasteoi_irq (do_IRQ)
815 sshd-4261 0d.h. 19us : _spin_lock (handle_fasteoi_irq)
816 sshd-4261 0d.h. 19us : add_preempt_count (_spin_lock)
817 sshd-4261 0d.h1 20us : _spin_unlock (handle_fasteoi_irq)
818 sshd-4261 0d.h1 20us : sub_preempt_count (_spin_unlock)
819[...]
820 sshd-4261 0d.h1 28us : _spin_unlock (handle_fasteoi_irq)
821 sshd-4261 0d.h1 29us : sub_preempt_count (_spin_unlock)
822 sshd-4261 0d.h2 29us : irq_exit (do_IRQ)
823 sshd-4261 0d.h2 29us : sub_preempt_count (irq_exit)
824 sshd-4261 0d..3 30us : do_softirq (irq_exit)
825 sshd-4261 0d... 30us : __do_softirq (do_softirq)
826 sshd-4261 0d... 31us : __local_bh_disable (__do_softirq)
827 sshd-4261 0d... 31us+: add_preempt_count (__local_bh_disable)
828 sshd-4261 0d.s4 34us : add_preempt_count (__local_bh_disable)
829[...]
830 sshd-4261 0d.s3 43us : sub_preempt_count (local_bh_enable_ip)
831 sshd-4261 0d.s4 44us : sub_preempt_count (local_bh_enable_ip)
832 sshd-4261 0d.s3 44us : smp_apic_timer_interrupt (apic_timer_interrupt)
833 sshd-4261 0d.s3 45us : irq_enter (smp_apic_timer_interrupt)
834 sshd-4261 0d.s3 45us : idle_cpu (irq_enter)
835 sshd-4261 0d.s3 46us : add_preempt_count (irq_enter)
836 sshd-4261 0d.H3 46us : idle_cpu (irq_enter)
837 sshd-4261 0d.H3 47us : hrtimer_interrupt (smp_apic_timer_interrupt)
838 sshd-4261 0d.H3 47us : ktime_get (hrtimer_interrupt)
839[...]
840 sshd-4261 0d.H3 81us : tick_program_event (hrtimer_interrupt)
841 sshd-4261 0d.H3 82us : ktime_get (tick_program_event)
842 sshd-4261 0d.H3 82us : ktime_get_ts (ktime_get)
843 sshd-4261 0d.H3 83us : getnstimeofday (ktime_get_ts)
844 sshd-4261 0d.H3 83us : set_normalized_timespec (ktime_get_ts)
845 sshd-4261 0d.H3 84us : clockevents_program_event (tick_program_event)
846 sshd-4261 0d.H3 84us : lapic_next_event (clockevents_program_event)
847 sshd-4261 0d.H3 85us : irq_exit (smp_apic_timer_interrupt)
848 sshd-4261 0d.H3 85us : sub_preempt_count (irq_exit)
849 sshd-4261 0d.s4 86us : sub_preempt_count (irq_exit)
850 sshd-4261 0d.s3 86us : add_preempt_count (__local_bh_disable)
851[...]
852 sshd-4261 0d.s1 98us : sub_preempt_count (net_rx_action)
853 sshd-4261 0d.s. 99us : add_preempt_count (_spin_lock_irq)
854 sshd-4261 0d.s1 99us+: _spin_unlock_irq (run_timer_softirq)
855 sshd-4261 0d.s. 104us : _local_bh_enable (__do_softirq)
856 sshd-4261 0d.s. 104us : sub_preempt_count (_local_bh_enable)
857 sshd-4261 0d.s. 105us : _local_bh_enable (__do_softirq)
858 sshd-4261 0d.s1 105us : trace_preempt_on (__do_softirq)
859
860
861This is a very interesting trace. It started with the preemption of
862the ls task. We see that the task had the "need_resched" bit set
863with the 'N' in the trace. Interrupts are disabled in the spin_lock
864and the trace started. We see that a schedule took place to run
865sshd. When the interrupts were enabled we took an interrupt.
866On return of the interrupt the softirq ran. We took another interrupt
867while running the softirq as we see with the capital 'H'.
868
869
870wakeup
871------
872
873In a Real-Time environment it is very important to know the time
874it takes from when the highest priority task is woken up to the
875time it executes. This is also known as "schedule latency".
876I stress the point that this is about RT tasks. It is also important
877to know the scheduling latency of non-RT tasks, but the average
878schedule latency is better for non-RT tasks. Tools like
879LatencyTop are more appropriate for such measurements.
880
881Real-Time environments are interested in the worst case latency.
882That is the longest latency it takes for something to happen, and
883not the average. We can have a very fast scheduler that may only
884have a large latency once in a while, but that would not work well
885with Real-Time tasks. The wakeup tracer was designed to record
886the worst case wakeups of RT tasks. Non-RT tasks are not recorded
887because the tracer only records one worst case and tracing non-RT
888tasks that are unpredictable will overwrite the worst case latency
889of RT tasks.
890
891Since this tracer only deals with RT tasks, we will run this slightly
892differently than we did with the previous tracers. Instead of performing
893an 'ls' we will run 'sleep 1' under 'chrt' which changes the
894priority of the task.
895
896 # echo wakeup > /debug/tracing/current_tracer
897 # echo 0 > /debug/tracing/tracing_max_latency
898 # echo 1 > /debug/tracing/tracing_enabled
899 # chrt -f 5 sleep 1
900 # echo 0 > /debug/tracing/tracing_enabled
901 # cat /debug/tracing/latency_trace
902# tracer: wakeup
903#
904wakeup latency trace v1.1.5 on 2.6.26-rc8
905--------------------------------------------------------------------
906 latency: 4 us, #2/2, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
907 -----------------
908 | task: sleep-4901 (uid:0 nice:0 policy:1 rt_prio:5)
909 -----------------
910
911# _------=> CPU#
912# / _-----=> irqs-off
913# | / _----=> need-resched
914# || / _---=> hardirq/softirq
915# ||| / _--=> preempt-depth
916# |||| /
917# ||||| delay
918# cmd pid ||||| time | caller
919# \ / ||||| \ | /
920 <idle>-0 1d.h4 0us+: try_to_wake_up (wake_up_process)
921 <idle>-0 1d..4 4us : schedule (cpu_idle)
922
923
924vim:ft=help
925
926
927Running this on an idle system we see that it only took 4 microseconds
928to perform the task switch. Note, since the trace marker in the
929schedule is before the actual "switch" we stop the tracing when
930the recorded task is about to schedule in. This may change if
931we add a new marker at the end of the scheduler.
932
933Notice that the recorded task is 'sleep' with the PID of 4901 and it
934has an rt_prio of 5. This priority is user-space priority and not
935the internal kernel priority. The policy is 1 for SCHED_FIFO and 2
936for SCHED_RR.
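
As a small aid for reading the header, the following shell sketch pulls
the policy number out of a "task:" line and prints its name. The helper
names policy_name and task_policy are made up for this example. The
mapping of 0 to SCHED_OTHER is the usual Linux value and matches the
policy:0 shown for the non-RT tasks in the earlier traces.

```shell
#!/bin/sh
# Map the numeric policy from the trace header to its name.
policy_name() {
    case "$1" in
        0) echo SCHED_OTHER ;;
        1) echo SCHED_FIFO ;;
        2) echo SCHED_RR ;;
        *) echo "unknown($1)" ;;
    esac
}

# $1: a "| task: ..." header line copied from latency_trace
task_policy() {
    num=$(printf '%s\n' "$1" | sed -n 's/.*policy:\([0-9][0-9]*\).*/\1/p')
    policy_name "$num"
}

task_policy ' | task: sleep-4901 (uid:0 nice:0 policy:1 rt_prio:5)'
```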
937
938Doing the same with chrt -r 5 and ftrace_enabled set:
939
940# tracer: wakeup
941#
942wakeup latency trace v1.1.5 on 2.6.26-rc8
943--------------------------------------------------------------------
944 latency: 50 us, #60/60, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
945 -----------------
946 | task: sleep-4068 (uid:0 nice:0 policy:2 rt_prio:5)
947 -----------------
948
949# _------=> CPU#
950# / _-----=> irqs-off
951# | / _----=> need-resched
952# || / _---=> hardirq/softirq
953# ||| / _--=> preempt-depth
954# |||| /
955# ||||| delay
956# cmd pid ||||| time | caller
957# \ / ||||| \ | /
958ksoftirq-7 1d.H3 0us : try_to_wake_up (wake_up_process)
959ksoftirq-7 1d.H4 1us : sub_preempt_count (marker_probe_cb)
960ksoftirq-7 1d.H3 2us : check_preempt_wakeup (try_to_wake_up)
961ksoftirq-7 1d.H3 3us : update_curr (check_preempt_wakeup)
962ksoftirq-7 1d.H3 4us : calc_delta_mine (update_curr)
963ksoftirq-7 1d.H3 5us : __resched_task (check_preempt_wakeup)
964ksoftirq-7 1d.H3 6us : task_wake_up_rt (try_to_wake_up)
965ksoftirq-7 1d.H3 7us : _spin_unlock_irqrestore (try_to_wake_up)
966[...]
967ksoftirq-7 1d.H2 17us : irq_exit (smp_apic_timer_interrupt)
968ksoftirq-7 1d.H2 18us : sub_preempt_count (irq_exit)
969ksoftirq-7 1d.s3 19us : sub_preempt_count (irq_exit)
970ksoftirq-7 1..s2 20us : rcu_process_callbacks (__do_softirq)
971[...]
972ksoftirq-7 1..s2 26us : __rcu_process_callbacks (rcu_process_callbacks)
973ksoftirq-7 1d.s2 27us : _local_bh_enable (__do_softirq)
974ksoftirq-7 1d.s2 28us : sub_preempt_count (_local_bh_enable)
975ksoftirq-7 1.N.3 29us : sub_preempt_count (ksoftirqd)
976ksoftirq-7 1.N.2 30us : _cond_resched (ksoftirqd)
977ksoftirq-7 1.N.2 31us : __cond_resched (_cond_resched)
978ksoftirq-7 1.N.2 32us : add_preempt_count (__cond_resched)
979ksoftirq-7 1.N.2 33us : schedule (__cond_resched)
980ksoftirq-7 1.N.2 33us : add_preempt_count (schedule)
981ksoftirq-7 1.N.3 34us : hrtick_clear (schedule)
982ksoftirq-7 1dN.3 35us : _spin_lock (schedule)
983ksoftirq-7 1dN.3 36us : add_preempt_count (_spin_lock)
984ksoftirq-7 1d..4 37us : put_prev_task_fair (schedule)
985ksoftirq-7 1d..4 38us : update_curr (put_prev_task_fair)
986[...]
987ksoftirq-7 1d..5 47us : _spin_trylock (tracing_record_cmdline)
988ksoftirq-7 1d..5 48us : add_preempt_count (_spin_trylock)
989ksoftirq-7 1d..6 49us : _spin_unlock (tracing_record_cmdline)
990ksoftirq-7 1d..6 49us : sub_preempt_count (_spin_unlock)
991ksoftirq-7 1d..4 50us : schedule (__cond_resched)
992
993The interrupt went off while running ksoftirqd. This task runs at
994SCHED_OTHER. Why didn't we see the 'N' set early? This may be
995a harmless bug with x86_32 and 4K stacks. The need_resched() function
996that tests if we need to reschedule looks on the actual stack,
997whereas the setting of the NEED_RESCHED bit happens on the
998task's stack. But because we are in a hard interrupt, the test
999is done against the interrupt stack, where the bit is not set. We don't
1000see the 'N' until we switch back to the task's stack.
1001
1002ftrace
1003------
1004
1005ftrace is not only the name of the tracing infrastructure, but it
1006is also the name of one of the tracers. That tracer is the function
1007tracer. Enabling the function tracer can be done from the
1008debug file system. Make sure ftrace_enabled is set, otherwise
1009this tracer is a nop.
1010
1011 # sysctl kernel.ftrace_enabled=1
1012 # echo ftrace > /debug/tracing/current_tracer
1013 # echo 1 > /debug/tracing/tracing_enabled
1014 # usleep 1
1015 # echo 0 > /debug/tracing/tracing_enabled
1016 # cat /debug/tracing/trace
1017# tracer: ftrace
1018#
1019# TASK-PID CPU# TIMESTAMP FUNCTION
1020# | | | | |
1021 bash-4003 [00] 123.638713: finish_task_switch <-schedule
1022 bash-4003 [00] 123.638714: _spin_unlock_irq <-finish_task_switch
1023 bash-4003 [00] 123.638714: sub_preempt_count <-_spin_unlock_irq
1024 bash-4003 [00] 123.638715: hrtick_set <-schedule
1025 bash-4003 [00] 123.638715: _spin_lock_irqsave <-hrtick_set
1026 bash-4003 [00] 123.638716: add_preempt_count <-_spin_lock_irqsave
1027 bash-4003 [00] 123.638716: _spin_unlock_irqrestore <-hrtick_set
1028 bash-4003 [00] 123.638717: sub_preempt_count <-_spin_unlock_irqrestore
1029 bash-4003 [00] 123.638717: hrtick_clear <-hrtick_set
1030 bash-4003 [00] 123.638718: sub_preempt_count <-schedule
1031 bash-4003 [00] 123.638718: sub_preempt_count <-preempt_schedule
1032 bash-4003 [00] 123.638719: wait_for_completion <-__stop_machine_run
1033 bash-4003 [00] 123.638719: wait_for_common <-wait_for_completion
1034 bash-4003 [00] 123.638720: _spin_lock_irq <-wait_for_common
1035 bash-4003 [00] 123.638720: add_preempt_count <-_spin_lock_irq
1036[...]
1037
1038
1039Note: It is sometimes better to enable or disable tracing directly from
1040a program, because the buffer may be overflowed by the echo commands
1041before you get to the point you want to trace. It is also easier to
1042stop the tracing at the point that you hit the part that you are
1043interested in. Since the ftrace buffer is a ring buffer with the
1044oldest data being overwritten, usually it is sufficient to start the
1045tracer with an echo command but have your code stop it. Something
1046like the following is usually appropriate for this.
1047
1048int trace_fd;
1049[...]
1050int main(int argc, char *argv[]) {
1051 [...]
1052 trace_fd = open("/debug/tracing/tracing_enabled", O_WRONLY);
1053 [...]
1054 if (condition_hit()) {
1055 write(trace_fd, "0", 1);
1056 }
1057 [...]
1058}
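
When the workload is a single command, a shell wrapper can narrow the
window as well, because the enable, the command and the disable all run
from one shell function with no typing in between. This is only a sketch:
with_tracing is a hypothetical helper, and the tracing directory is
passed in as an argument (on the system used here it would be
/debug/tracing). The demonstration at the end uses a scratch directory
so that it can be run without debugfs mounted.

```shell
#!/bin/sh
# with_tracing TRACING_DIR COMMAND [ARGS...]
# Enable tracing, run the command, then disable tracing right away.
with_tracing() {
    dir=$1
    shift
    echo 1 > "$dir/tracing_enabled"
    status=0
    "$@" || status=$?
    echo 0 > "$dir/tracing_enabled"
    return $status
}

# Demonstration against a scratch directory instead of the real debugfs.
scratch=$(mktemp -d)
with_tracing "$scratch" true
cat "$scratch/tracing_enabled"
rm -rf "$scratch"
```

On a real system the call would look like
"with_tracing /debug/tracing usleep 1". Stopping from inside the program
itself, as in the C outline, still gives the tightest cut-off.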
1059
1060
1061dynamic ftrace
1062--------------
1063
1064If CONFIG_DYNAMIC_FTRACE is set, then the system will run with
1065virtually no overhead when function tracing is disabled. The way
1066this works is the mcount function call (placed at the start of
1067every kernel function, produced by the -pg switch in gcc), starts
1068off pointing to a simple return.
1069
1070When dynamic ftrace is initialized, it calls kstop_machine to make it
1071act like a uniprocessor so that it can freely modify code without
1072worrying about other processors executing that same code. At
1073initialization, the mcount calls are changed to call a "record_ip"
1074function. After this, the first time a kernel function is called,
1075it has the calling address saved in a hash table.
1076
1077Later on the ftraced kernel thread is awoken and will again call
1078kstop_machine if new functions have been recorded. The ftraced thread
1079will change all calls to mcount to "nop". Just calling mcount
1080and having mcount return has shown a 10% overhead. By converting
1081it to a nop, there is no recordable overhead to the system.
1082
1083One special side-effect to the recording of the functions being
1084traced, is that we can now selectively choose which functions we
1085want to trace and which ones we want the mcount calls to remain as
1086nops.
1087
1088Two files that control the enabling and disabling of recorded
1089functions are:
1090
1091 set_ftrace_filter
1092
1093and
1094
1095 set_ftrace_notrace
1096
1097A list of available functions that you can add to these files is listed
1098in:
1099
1100 available_filter_functions
1101
1102 # cat /debug/tracing/available_filter_functions
1103put_prev_task_idle
1104kmem_cache_create
1105pick_next_task_rt
1106get_online_cpus
1107pick_next_task_fair
1108mutex_lock
1109[...]
1110
1111If I'm only interested in sys_nanosleep and hrtimer_interrupt:
1112
1113 # echo sys_nanosleep hrtimer_interrupt \
1114 > /debug/tracing/set_ftrace_filter
1115 # echo ftrace > /debug/tracing/current_tracer
1116 # echo 1 > /debug/tracing/tracing_enabled
1117 # usleep 1
1118 # echo 0 > /debug/tracing/tracing_enabled
1119 # cat /debug/tracing/trace
1120# tracer: ftrace
1121#
1122# TASK-PID CPU# TIMESTAMP FUNCTION
1123# | | | | |
1124 usleep-4134 [00] 1317.070017: hrtimer_interrupt <-smp_apic_timer_interrupt
1125 usleep-4134 [00] 1317.070111: sys_nanosleep <-syscall_call
1126 <idle>-0 [00] 1317.070115: hrtimer_interrupt <-smp_apic_timer_interrupt
1127
1128To see what functions are being traced, you can cat the file:
1129
1130 # cat /debug/tracing/set_ftrace_filter
1131hrtimer_interrupt
1132sys_nanosleep
1133
1134
1135Perhaps this isn't enough. The filters also allow simple wild cards.
1136Only the following are currently available:
1137
1138 <match>* - will match functions that begin with <match>
1139 *<match> - will match functions that end with <match>
1140 *<match>* - will match functions that have <match> in them
1141
1142Those are all the wild cards that are allowed.
1143
1144 <match>*<match> will not work.
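
The three forms can be modelled with a few lines of shell. This is a
sketch of the rules as described here, not the kernel's matching code,
and filter_match is a made-up name. It returns 0 on a match, 1 on no
match, and 2 for a pattern with a '*' in the middle, which the filter
files do not support.

```shell
#!/bin/sh
# filter_match PATTERN NAME
filter_match() {
    pat=$1
    name=$2
    lead=0
    trail=0
    # Strip one leading and one trailing '*', remembering which were there.
    case "$pat" in \**) lead=1; pat=${pat#\*} ;; esac
    case "$pat" in *\*) trail=1; pat=${pat%\*} ;; esac
    # Anything still containing '*' (such as <match>*<match>) is unsupported.
    case "$pat" in *\**) return 2 ;; esac
    case "$lead$trail" in
        11) case "$name" in *"$pat"*) return 0 ;; esac ;;  # *<match>*
        01) case "$name" in "$pat"*) return 0 ;; esac ;;   # <match>*
        10) case "$name" in *"$pat") return 0 ;; esac ;;   # *<match>
        00) [ "$name" = "$pat" ] && return 0 ;;            # plain name
    esac
    return 1
}

for fn in hrtimer_init sys_nanosleep _spin_lock_irq; do
    for pat in 'hrtimer_*' '*_irq' '*lock*'; do
        if filter_match "$pat" "$fn"; then
            echo "$pat matches $fn"
        fi
    done
done
```

Note that the patterns are quoted so that the shell does not expand them
against file names in the current directory before they are used.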
1145
1146 # echo 'hrtimer_*' > /debug/tracing/set_ftrace_filter
1147
1148Produces:
1149
1150# tracer: ftrace
1151#
1152# TASK-PID CPU# TIMESTAMP FUNCTION
1153# | | | | |
1154 bash-4003 [00] 1480.611794: hrtimer_init <-copy_process
1155 bash-4003 [00] 1480.611941: hrtimer_start <-hrtick_set
1156 bash-4003 [00] 1480.611956: hrtimer_cancel <-hrtick_clear
1157 bash-4003 [00] 1480.611956: hrtimer_try_to_cancel <-hrtimer_cancel
1158 <idle>-0 [00] 1480.612019: hrtimer_get_next_event <-get_next_timer_interrupt
1159 <idle>-0 [00] 1480.612025: hrtimer_get_next_event <-get_next_timer_interrupt
1160 <idle>-0 [00] 1480.612032: hrtimer_get_next_event <-get_next_timer_interrupt
1161 <idle>-0 [00] 1480.612037: hrtimer_get_next_event <-get_next_timer_interrupt
1162 <idle>-0 [00] 1480.612382: hrtimer_get_next_event <-get_next_timer_interrupt
1163
1164
1165Notice that we lost the sys_nanosleep.
1166
1167 # cat /debug/tracing/set_ftrace_filter
1168hrtimer_run_queues
1169hrtimer_run_pending
1170hrtimer_init
1171hrtimer_cancel
1172hrtimer_try_to_cancel
1173hrtimer_forward
1174hrtimer_start
1175hrtimer_reprogram
1176hrtimer_force_reprogram
1177hrtimer_get_next_event
1178hrtimer_interrupt
1179hrtimer_nanosleep
1180hrtimer_wakeup
1181hrtimer_get_remaining
1182hrtimer_get_res
1183hrtimer_init_sleeper
1184
1185
1186This is because the '>' and '>>' act just like they do in bash.
1187To rewrite the filters, use '>'
1188To append to the filters, use '>>'
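
The difference between the two redirections can be tried safely on an
ordinary file. In the sketch below, filter_set and filter_add are
hypothetical helpers that take the tracing directory as their first
argument. The demonstration uses a scratch directory, so it only shows
the shell's truncate and append behaviour, not the kernel's handling of
the real set_ftrace_filter file.

```shell
#!/bin/sh
# filter_set DIR NAME...  replaces the filter list ('>' truncates first)
filter_set() {
    dir=$1
    shift
    echo "$@" > "$dir/set_ftrace_filter"
}

# filter_add DIR NAME...  appends to the filter list ('>>' keeps old data)
filter_add() {
    dir=$1
    shift
    echo "$@" >> "$dir/set_ftrace_filter"
}

scratch=$(mktemp -d)
filter_set "$scratch" sys_nanosleep
filter_set "$scratch" hrtimer_interrupt    # sys_nanosleep is gone now
filter_add "$scratch" sys_nanosleep        # added back alongside it
cat "$scratch/set_ftrace_filter"
rm -rf "$scratch"
```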
1189
1190To clear out a filter so that all functions will be recorded again:
1191
1192 # echo > /debug/tracing/set_ftrace_filter
1193 # cat /debug/tracing/set_ftrace_filter
1194 #
1195
1196Again, now we want to append.
1197
1198 # echo sys_nanosleep > /debug/tracing/set_ftrace_filter
1199 # cat /debug/tracing/set_ftrace_filter
1200sys_nanosleep
1201 # echo 'hrtimer_*' >> /debug/tracing/set_ftrace_filter
1202 # cat /debug/tracing/set_ftrace_filter
1203hrtimer_run_queues
1204hrtimer_run_pending
1205hrtimer_init
1206hrtimer_cancel
1207hrtimer_try_to_cancel
1208hrtimer_forward
1209hrtimer_start
1210hrtimer_reprogram
1211hrtimer_force_reprogram
1212hrtimer_get_next_event
1213hrtimer_interrupt
1214sys_nanosleep
1215hrtimer_nanosleep
1216hrtimer_wakeup
1217hrtimer_get_remaining
1218hrtimer_get_res
1219hrtimer_init_sleeper
1220
1221
1222The set_ftrace_notrace prevents those functions from being traced.
1223
1224 # echo '*preempt*' '*lock*' > /debug/tracing/set_ftrace_notrace
1225
1226Produces:
1227
1228# tracer: ftrace
1229#
1230# TASK-PID CPU# TIMESTAMP FUNCTION
1231# | | | | |
1232 bash-4043 [01] 115.281644: finish_task_switch <-schedule
1233 bash-4043 [01] 115.281645: hrtick_set <-schedule
1234 bash-4043 [01] 115.281645: hrtick_clear <-hrtick_set
1235 bash-4043 [01] 115.281646: wait_for_completion <-__stop_machine_run
1236 bash-4043 [01] 115.281647: wait_for_common <-wait_for_completion
1237 bash-4043 [01] 115.281647: kthread_stop <-stop_machine_run
1238 bash-4043 [01] 115.281648: init_waitqueue_head <-kthread_stop
1239 bash-4043 [01] 115.281648: wake_up_process <-kthread_stop
1240 bash-4043 [01] 115.281649: try_to_wake_up <-wake_up_process
1241
1242We can see that there's no more lock or preempt tracing.
1243
1244ftraced
1245-------
1246
1247As mentioned above, when dynamic ftrace is configured in, a kernel
1248thread wakes up once a second and checks to see if there are mcount
1249calls that need to be converted into nops. If there are none, then
1250it simply goes back to sleep. But if there is, it will call
1251kstop_machine to convert the calls to nops.
1252
1253There may be a case that you do not want this added latency.
1254Perhaps you are doing some audio recording and this activity might
1255cause skips in the playback. There is an interface to disable
1256and enable the ftraced kernel thread.
1257
1258 # echo 0 > /debug/tracing/ftraced_enabled
1259
1260This will disable the calling of the kstop_machine to update the
1261mcount calls to nops. Remember that there's a large overhead
1262to calling mcount. Without this kernel thread, that overhead will
1263exist.
1264
1265Any write to the ftraced_enabled file will cause the kstop_machine
1266to run if there are recorded calls to mcount. This means that a
1267user can manually perform the updates when they want to by simply
1268echoing a '0' into the ftraced_enabled file.
1269
1270The updates are also done at the beginning of enabling a tracer
1271that uses ftrace function recording.
1272
1273
1274trace_pipe
1275----------
1276
1277The trace_pipe outputs the same as trace, but the effect on the
1278tracing is different. Every read from trace_pipe is consumed.
1279This means that subsequent reads will be different. The trace
1280is live.
1281
1282 # echo ftrace > /debug/tracing/current_tracer
1283 # cat /debug/tracing/trace_pipe > /tmp/trace.out &
1284[1] 4153
1285 # echo 1 > /debug/tracing/tracing_enabled
1286 # usleep 1
1287 # echo 0 > /debug/tracing/tracing_enabled
1288 # cat /debug/tracing/trace
1289# tracer: ftrace
1290#
1291# TASK-PID CPU# TIMESTAMP FUNCTION
1292# | | | | |
1293
1294 #
1295 # cat /tmp/trace.out
1296 bash-4043 [00] 41.267106: finish_task_switch <-schedule
1297 bash-4043 [00] 41.267106: hrtick_set <-schedule
1298 bash-4043 [00] 41.267107: hrtick_clear <-hrtick_set
1299 bash-4043 [00] 41.267108: wait_for_completion <-__stop_machine_run
1300 bash-4043 [00] 41.267108: wait_for_common <-wait_for_completion
1301 bash-4043 [00] 41.267109: kthread_stop <-stop_machine_run
1302 bash-4043 [00] 41.267109: init_waitqueue_head <-kthread_stop
1303 bash-4043 [00] 41.267110: wake_up_process <-kthread_stop
1304 bash-4043 [00] 41.267110: try_to_wake_up <-wake_up_process
1305 bash-4043 [00] 41.267111: select_task_rq_rt <-try_to_wake_up
1306
1307
1308Note, reading the trace_pipe will block until more input is added.
1309By changing the tracer, trace_pipe will issue an EOF. We needed
1310to set the ftrace tracer _before_ running cat on the trace_pipe file.
1311
1312
1313trace entries
1314-------------
1315
1316Having too much or not enough data can be troublesome in diagnosing
1317some issue in the kernel. The file trace_entries is used to modify
1318the size of the internal trace buffers. The number listed
1319is the number of entries that can be recorded per CPU. To know
1320the full size, multiply the number of possible CPUs by the
1321number of entries.
1322
1323 # cat /debug/tracing/trace_entries
132465620
1325
1326Note, to modify this you must have tracing fully disabled. To do that,
1327echo "none" into the current_tracer.
1328
1329 # echo none > /debug/tracing/current_tracer
1330 # echo 100000 > /debug/tracing/trace_entries
1331 # cat /debug/tracing/trace_entries
1332100045
1333
1334
1335Notice that we echoed in 100,000 but the size is 100,045. The entries
1336are held by individual pages. It allocates the number of pages it takes
1337to fulfill the request. If more entries may fit on the last page
1338it will add them.
1339
1340 # echo 1 > /debug/tracing/trace_entries
1341 # cat /debug/tracing/trace_entries
134285
1343
1344This shows us that 85 entries can fit on a single page.
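
The rounding can be checked with a little shell arithmetic. The sketch
below assumes 85 entries per page, which is simply the value seen on
this system; it depends on the entry size and the page size. The helper
names round_entries and total_entries are made up for the example. The
CPU count of 2 comes from the "#P:2" in the trace headers shown earlier.

```shell
#!/bin/sh
# round_entries REQUESTED PER_PAGE
# Round the request up to a whole number of pages worth of entries.
round_entries() {
    req=$1
    per_page=$2
    pages=$(( (req + per_page - 1) / per_page ))
    echo $(( pages * per_page ))
}

# total_entries PER_CPU_ENTRIES NR_CPUS
# The buffers are per CPU, so the full size is the product.
total_entries() {
    echo $(( $1 * $2 ))
}

round_entries 100000 85    # 1177 pages, so 100045 entries
round_entries 1 85         # a single page, so 85 entries
total_entries 65620 2      # full size with the earlier value on 2 CPUs
```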
1345
1346The number of pages that will be allocated is a percentage of available
1347memory. Allocating too much will produce an error.
1348
1349 # echo 1000000000000 > /debug/tracing/trace_entries
1350-bash: echo: write error: Cannot allocate memory
1351 # cat /debug/tracing/trace_entries
135285
1353
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index b8e52c0355d3..9691c7f5166c 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -109,7 +109,7 @@ There are two possible methods of using Kdump.
1092) Or use the system kernel binary itself as dump-capture kernel and there is 1092) Or use the system kernel binary itself as dump-capture kernel and there is
110 no need to build a separate dump-capture kernel. This is possible 110 no need to build a separate dump-capture kernel. This is possible
111 only with the architecutres which support a relocatable kernel. As 111 only with the architecutres which support a relocatable kernel. As
112 of today i386 and ia64 architectures support relocatable kernel. 112 of today, i386, x86_64 and ia64 architectures support relocatable kernel.
113 113
114Building a relocatable kernel is advantageous from the point of view that 114Building a relocatable kernel is advantageous from the point of view that
115one does not have to build a second kernel for capturing the dump. But 115one does not have to build a second kernel for capturing the dump. But
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index b52f47d588b4..795c487af8e4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -271,6 +271,17 @@ and is between 256 and 4096 characters. It is defined in the file
271 aic79xx= [HW,SCSI] 271 aic79xx= [HW,SCSI]
272 See Documentation/scsi/aic79xx.txt. 272 See Documentation/scsi/aic79xx.txt.
273 273
274 amd_iommu= [HW,X86-84]
275 Pass parameters to the AMD IOMMU driver in the system.
276 Possible values are:
277 isolate - enable device isolation (each device, as far
278 as possible, will get its own protection
279 domain)
280 amd_iommu_size= [HW,X86-64]
281 Define the size of the aperture for the AMD IOMMU
282 driver. Possible values are:
283 '32M', '64M' (default), '128M', '256M', '512M', '1G'
284
274 amijoy.map= [HW,JOY] Amiga joystick support 285 amijoy.map= [HW,JOY] Amiga joystick support
275 Map of devices attached to JOY0DAT and JOY1DAT 286 Map of devices attached to JOY0DAT and JOY1DAT
276 Format: <a>,<b> 287 Format: <a>,<b>
@@ -599,6 +610,29 @@ and is between 256 and 4096 characters. It is defined in the file
599 See drivers/char/README.epca and 610 See drivers/char/README.epca and
600 Documentation/digiepca.txt. 611 Documentation/digiepca.txt.
601 612
613 disable_mtrr_cleanup [X86]
614 enable_mtrr_cleanup [X86]
615 The kernel tries to adjust MTRR layout from continuous
616 to discrete, to make X server driver able to add WB
617 entry later. This parameter enables/disables that.
618
619 mtrr_chunk_size=nn[KMG] [X86]
620 used for mtrr cleanup. It is largest continous chunk
621 that could hold holes aka. UC entries.
622
623 mtrr_gran_size=nn[KMG] [X86]
624 Used for mtrr cleanup. It is granularity of mtrr block.
625 Default is 1.
626 Large value could prevent small alignment from
627 using up MTRRs.
628
629 mtrr_spare_reg_nr=n [X86]
630 Format: <integer>
631 Range: 0,7 : spare reg number
632 Default : 1
633 Used for mtrr cleanup. It is spare mtrr entries number.
634 Set to 2 or more if your graphical card needs more.
635
602 disable_mtrr_trim [X86, Intel and AMD only] 636 disable_mtrr_trim [X86, Intel and AMD only]
603 By default the kernel will trim any uncacheable 637 By default the kernel will trim any uncacheable
604 memory out of your available memory pool based on 638 memory out of your available memory pool based on
@@ -2116,6 +2150,9 @@ and is between 256 and 4096 characters. It is defined in the file
2116 usbhid.mousepoll= 2150 usbhid.mousepoll=
2117 [USBHID] The interval which mice are to be polled at. 2151 [USBHID] The interval which mice are to be polled at.
2118 2152
2153 add_efi_memmap [EFI; x86-32,X86-64] Include EFI memory map in
2154 kernel's map of available physical RAM.
2155
2119 vdso= [X86-32,SH,x86-64] 2156 vdso= [X86-32,SH,x86-64]
2120 vdso=2: enable compat VDSO (default with COMPAT_VDSO) 2157 vdso=2: enable compat VDSO (default with COMPAT_VDSO)
2121 vdso=1: enable VDSO (default) 2158 vdso=1: enable VDSO (default)
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 17f1f91af35c..946b66e1b652 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -148,9 +148,9 @@ tcp_available_congestion_control - STRING
148 but not loaded. 148 but not loaded.
149 149
150tcp_base_mss - INTEGER 150tcp_base_mss - INTEGER
151 The initial value of search_low to be used by Packetization Layer 151 The initial value of search_low to be used by the packetization layer
152 Path MTU Discovery (MTU probing). If MTU probing is enabled, 152 Path MTU discovery (MTU probing). If MTU probing is enabled,
153 this is the inital MSS used by the connection. 153 this is the initial MSS used by the connection.
154 154
155tcp_congestion_control - STRING 155tcp_congestion_control - STRING
156 Set the congestion control algorithm to be used for new 156 Set the congestion control algorithm to be used for new
@@ -185,10 +185,9 @@ tcp_frto - INTEGER
185 timeouts. It is particularly beneficial in wireless environments 185 timeouts. It is particularly beneficial in wireless environments
186 where packet loss is typically due to random radio interference 186 where packet loss is typically due to random radio interference
187 rather than intermediate router congestion. F-RTO is sender-side 187 rather than intermediate router congestion. F-RTO is sender-side
188 only modification. Therefore it does not require any support from 188 only modification. Therefore it does not require any support from
189 the peer, but in a typical case, however, where wireless link is 189 the peer.
190 the local access link and most of the data flows downlink, the 190
191 faraway servers should have F-RTO enabled to take advantage of it.
192 If set to 1, basic version is enabled. 2 enables SACK enhanced 191 If set to 1, basic version is enabled. 2 enables SACK enhanced
193 F-RTO if flow uses SACK. The basic version can be used also when 192 F-RTO if flow uses SACK. The basic version can be used also when
194 SACK is in use though scenario(s) with it exists where F-RTO 193 SACK is in use though scenario(s) with it exists where F-RTO
@@ -276,7 +275,7 @@ tcp_mem - vector of 3 INTEGERs: min, pressure, max
276 memory. 275 memory.
277 276
278tcp_moderate_rcvbuf - BOOLEAN 277tcp_moderate_rcvbuf - BOOLEAN
279 If set, TCP performs receive buffer autotuning, attempting to 278 If set, TCP performs receive buffer auto-tuning, attempting to
280 automatically size the buffer (no greater than tcp_rmem[2]) to 279 automatically size the buffer (no greater than tcp_rmem[2]) to
281 match the size required by the path for full throughput. Enabled by 280 match the size required by the path for full throughput. Enabled by
282 default. 281 default.
@@ -336,7 +335,7 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
336 pressure. 335 pressure.
337 Default: 8K 336 Default: 8K
338 337
339 default: default size of receive buffer used by TCP sockets. 338 default: initial size of receive buffer used by TCP sockets.
340 This value overrides net.core.rmem_default used by other protocols. 339 This value overrides net.core.rmem_default used by other protocols.
341 Default: 87380 bytes. This value results in window of 65535 with 340 Default: 87380 bytes. This value results in window of 65535 with
342 default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit 341 default setting of tcp_adv_win_scale and tcp_app_win:0 and a bit
@@ -344,8 +343,10 @@ tcp_rmem - vector of 3 INTEGERs: min, default, max
344 343
345 max: maximal size of receive buffer allowed for automatically 344 max: maximal size of receive buffer allowed for automatically
346 selected receiver buffers for TCP socket. This value does not override 345 selected receiver buffers for TCP socket. This value does not override
347 net.core.rmem_max, "static" selection via SO_RCVBUF does not use this. 346 net.core.rmem_max. Calling setsockopt() with SO_RCVBUF disables
348 Default: 87380*2 bytes. 347 automatic tuning of that socket's receive buffer size, in which
348 case this value is ignored.
349 Default: between 87380B and 4MB, depending on RAM size.
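
The interaction between SO_RCVBUF and automatic tuning described above can be sketched in C. This is a minimal illustration, not kernel code: pin_rcvbuf() is a hypothetical helper name, and only the standard setsockopt()/getsockopt() socket calls are used. Once the option is set on a socket, the tcp_rmem max value no longer applies to it.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: pin the receive buffer of one socket.
 * Setting SO_RCVBUF turns off receive buffer auto-tuning for this
 * socket. Returns the size the kernel reports back (which may differ
 * from the request), or -1 if either call fails.
 */
static int pin_rcvbuf(int fd, int bytes)
{
	int actual = 0;
	socklen_t len = sizeof(actual);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
		return -1;
	return actual;
}
```

A caller that wants auto-tuning to stay active should simply not call such a helper on that socket.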
349 350
350tcp_sack - BOOLEAN 351tcp_sack - BOOLEAN
351 Enable select acknowledgments (SACKS). 352 Enable select acknowledgments (SACKS).
@@ -358,7 +359,7 @@ tcp_slow_start_after_idle - BOOLEAN
358 Default: 1 359 Default: 1
359 360
360tcp_stdurg - BOOLEAN 361tcp_stdurg - BOOLEAN
361 Use the Host requirements interpretation of the TCP urg pointer field. 362 Use the Host requirements interpretation of the TCP urgent pointer field.
362 Most hosts use the older BSD interpretation, so if you turn this on 363 Most hosts use the older BSD interpretation, so if you turn this on
363 Linux might not communicate correctly with them. 364 Linux might not communicate correctly with them.
364 Default: FALSE 365 Default: FALSE
@@ -371,12 +372,12 @@ tcp_synack_retries - INTEGER
371tcp_syncookies - BOOLEAN 372tcp_syncookies - BOOLEAN
372 Only valid when the kernel was compiled with CONFIG_SYNCOOKIES 373 Only valid when the kernel was compiled with CONFIG_SYNCOOKIES
373 Send out syncookies when the syn backlog queue of a socket 374 Send out syncookies when the syn backlog queue of a socket
374 overflows. This is to prevent against the common 'syn flood attack' 375 overflows. This is to prevent against the common 'SYN flood attack'
375 Default: FALSE 376 Default: FALSE
376 377
377 Note, that syncookies is fallback facility. 378 Note, that syncookies is fallback facility.
378 It MUST NOT be used to help highly loaded servers to stand 379 It MUST NOT be used to help highly loaded servers to stand
379 against legal connection rate. If you see synflood warnings 380 against legal connection rate. If you see SYN flood warnings
380 in your logs, but investigation shows that they occur 381 in your logs, but investigation shows that they occur
381 because of overload with legal connections, you should tune 382 because of overload with legal connections, you should tune
382 other parameters until this warning disappears. 383 other parameters until this warning disappears.
@@ -386,7 +387,7 @@ tcp_syncookies - BOOLEAN
386 to use TCP extensions, can result in serious degradation 387 to use TCP extensions, can result in serious degradation
387 of some services (f.e. SMTP relaying), visible not by you, 388 of some services (f.e. SMTP relaying), visible not by you,
388 but your clients and relays, contacting you. While you see 389 but your clients and relays, contacting you. While you see
389 synflood warnings in logs not being really flooded, your server 390 SYN flood warnings in logs not being really flooded, your server
390 is seriously misconfigured. 391 is seriously misconfigured.
391 392
392tcp_syn_retries - INTEGER 393tcp_syn_retries - INTEGER
@@ -419,19 +420,21 @@ tcp_window_scaling - BOOLEAN
419 Enable window scaling as defined in RFC1323. 420 Enable window scaling as defined in RFC1323.
420 421
421tcp_wmem - vector of 3 INTEGERs: min, default, max 422tcp_wmem - vector of 3 INTEGERs: min, default, max
422 min: Amount of memory reserved for send buffers for TCP socket. 423 min: Amount of memory reserved for send buffers for TCP sockets.
423 Each TCP socket has rights to use it due to fact of its birth. 424 Each TCP socket has rights to use it due to fact of its birth.
424 Default: 4K 425 Default: 4K
425 426
426 default: Amount of memory allowed for send buffers for TCP socket 427 default: initial size of send buffer used by TCP sockets. This
427 by default. This value overrides net.core.wmem_default used 428 value overrides net.core.wmem_default used by other protocols.
428 by other protocols, it is usually lower than net.core.wmem_default. 429 It is usually lower than net.core.wmem_default.
429 Default: 16K 430 Default: 16K
430 431
431 max: Maximal amount of memory allowed for automatically selected 432 max: Maximal amount of memory allowed for automatically tuned
432 send buffers for TCP socket. This value does not override 433 send buffers for TCP sockets. This value does not override
433 net.core.wmem_max, "static" selection via SO_SNDBUF does not use this. 434 net.core.wmem_max. Calling setsockopt() with SO_SNDBUF disables
434 Default: 128K 435 automatic tuning of that socket's send buffer size, in which case
436 this value is ignored.
437 Default: between 64K and 4MB, depending on RAM size.
435 438
436tcp_workaround_signed_windows - BOOLEAN 439tcp_workaround_signed_windows - BOOLEAN
437 If set, assume no receipt of a window scaling option means the 440 If set, assume no receipt of a window scaling option means the
@@ -1060,24 +1063,193 @@ bridge-nf-filter-pppoe-tagged - BOOLEAN
1060 Default: 1 1063 Default: 1
1061 1064
1062 1065
1063UNDOCUMENTED: 1066proc/sys/net/sctp/* Variables:
1067
1068addip_enable - BOOLEAN
1069 Enable or disable extension of Dynamic Address Reconfiguration
1070 (ADD-IP) functionality specified in RFC5061. This extension provides
1071 the ability to dynamically add and remove new addresses for the SCTP
1072 associations.
1073
1074 1: Enable extension.
1075
1076 0: Disable extension.
1077
1078 Default: 0
1079
1080addip_noauth_enable - BOOLEAN
1081 Dynamic Address Reconfiguration (ADD-IP) requires the use of
1082 authentication to protect the operations of adding or removing new
1083 addresses. This requirement is mandated so that unauthorized hosts
1084 would not be able to hijack associations. However, older
1085 implementations may not have implemented this requirement while
1086 allowing the ADD-IP extension. For reasons of interoperability,
1087 we provide this variable to control the enforcement of the
1088 authentication requirement.
1089
1090 1: Allow ADD-IP extension to be used without authentication. This
1091 should only be set in a closed environment for interoperability
1092 with older implementations.
1093
1094 0: Enforce the authentication requirement
1095
1096 Default: 0
1097
1098auth_enable - BOOLEAN
1099 Enable or disable Authenticated Chunks extension. This extension
1100 provides the ability to send and receive authenticated chunks and is
1101 required for secure operation of Dynamic Address Reconfiguration
1102 (ADD-IP) extension.
1103
1104 1: Enable this extension.
1105 0: Disable this extension.
1106
1107 Default: 0
1108
1109prsctp_enable - BOOLEAN
1110 Enable or disable the Partial Reliability extension (RFC3758) which
1111 is used to notify peers that a given DATA should no longer be expected.
1112
1113 1: Enable extension
1114 0: Disable
1115
1116 Default: 1
1117
1118max_burst - INTEGER
1119 The limit of the number of new packets that can be initially sent. It
1120 controls how bursty the generated traffic can be.
1121
1122 Default: 4
1123
1124association_max_retrans - INTEGER
1125 Set the maximum number for retransmissions that an association can
1126 attempt before deciding that the remote end is unreachable. If this value
1127 is exceeded, the association is terminated.
1128
1129 Default: 10
1130
1131max_init_retransmits - INTEGER
1132 The maximum number of retransmissions of INIT and COOKIE-ECHO chunks
1133 that an association will attempt before declaring the destination
1134 unreachable and terminating.
1135
1136 Default: 8
1137
1138path_max_retrans - INTEGER
1139 The maximum number of retransmissions that will be attempted on a given
1140 path. Once this threshold is exceeded, the path is considered
1141 unreachable, and new traffic will use a different path when the
1142 association is multihomed.
1143
1144 Default: 5
1145
1146rto_initial - INTEGER
1147 The initial round trip timeout value in milliseconds that will be used
1148 in calculating round trip times. This is the initial time interval
1149 for retransmissions.
1150
1151 Default: 3000
1064 1152
1065dev_weight FIXME 1153rto_max - INTEGER
1066discovery_slots FIXME 1154 The maximum value (in milliseconds) of the round trip timeout. This
1067discovery_timeout FIXME 1155 is the largest time interval that can elapse between retransmissions.
1068fast_poll_increase FIXME 1156
1069ip6_queue_maxlen FIXME 1157 Default: 60000
1070lap_keepalive_time FIXME 1158
1071lo_cong FIXME 1159rto_min - INTEGER
1072max_baud_rate FIXME 1160 The minimum value (in milliseconds) of the round trip timeout. This
1073max_dgram_qlen FIXME 1161 is the smallest time interval that can elapse between retransmissions.
1074max_noreply_time FIXME 1162
1075max_tx_data_size FIXME 1163 Default: 1000
1076max_tx_window FIXME 1164
1077min_tx_turn_time FIXME 1165hb_interval - INTEGER
1078mod_cong FIXME 1166 The interval (in milliseconds) between HEARTBEAT chunks. These chunks
1079no_cong FIXME 1167 are sent at the specified interval on idle paths to probe the state of
1080no_cong_thresh FIXME 1168 a given path between 2 associations.
1081slot_timeout FIXME 1169
1082warn_noreply_time FIXME 1170 Default: 30000
1171
1172sack_timeout - INTEGER
1173 The amount of time (in milliseconds) that the implementation will wait
1174 to send a SACK.
1175
1176 Default: 200
1177
1178valid_cookie_life - INTEGER
1179 The default lifetime of the SCTP cookie (in milliseconds). The cookie
1180 is used during association establishment.
1181
1182 Default: 60000
1183
1184cookie_preserve_enable - BOOLEAN
1185 Enable or disable the ability to extend the lifetime of the SCTP cookie
1186 that is used during the establishment phase of SCTP association
1187
1188 1: Enable cookie lifetime extension.
1189 0: Disable
1190
1191 Default: 1
1192
1193rcvbuf_policy - INTEGER
1194 Determines if the receive buffer is attributed to the socket or to
1195 association. SCTP supports the capability to create multiple
1196 associations on a single socket. When using this capability, it is
1197 possible that a single stalled association that's buffering a lot
1198 of data may block other associations from delivering their data by
1199 consuming all of the receive buffer space. To work around this,
1200 the rcvbuf_policy could be set to attribute the receiver buffer space
1201 to each association instead of the socket. This prevents the described
1202 blocking.
1203
1204 1: rcvbuf space is per association
1205 0: rcvbuf space is per socket
1206
1207 Default: 0
1208
1209sndbuf_policy - INTEGER
1210 Similar to rcvbuf_policy above, this applies to send buffer space.
1211
1212 1: Send buffer is tracked per association
1213 0: Send buffer is tracked per socket.
1214
1215 Default: 0
1216
1217sctp_mem - vector of 3 INTEGERs: min, pressure, max
1218 Number of pages allowed for queueing by all SCTP sockets.
1219
1220 min: Below this number of pages SCTP is not bothered about its
1221 memory appetite. When amount of memory allocated by SCTP exceeds
1222 this number, SCTP starts to moderate memory usage.
1223
1224 pressure: This value was introduced to follow format of tcp_mem.
1225
1226 max: Number of pages allowed for queueing by all SCTP sockets.
1227
1228 Default is calculated at boot time from amount of available memory.
1229
1230sctp_rmem - vector of 3 INTEGERs: min, default, max
1231 See tcp_rmem for a description.
1232
1233sctp_wmem - vector of 3 INTEGERs: min, default, max
1234 See tcp_wmem for a description.
1235
1236UNDOCUMENTED:
1083 1237
1238/proc/sys/net/core/*
1239 dev_weight FIXME
1240
1241/proc/sys/net/unix/*
1242 max_dgram_qlen FIXME
1243
1244/proc/sys/net/irda/*
1245 fast_poll_increase FIXME
1246 warn_noreply_time FIXME
1247 discovery_slots FIXME
1248 slot_timeout FIXME
1249 max_baud_rate FIXME
1250 discovery_timeout FIXME
1251 lap_keepalive_time FIXME
1252 max_noreply_time FIXME
1253 max_tx_data_size FIXME
1254 max_tx_window FIXME
1255 min_tx_turn_time FIXME
diff --git a/Documentation/nmi_watchdog.txt b/Documentation/nmi_watchdog.txt
index 757c729ee42e..90aa4531cb67 100644
--- a/Documentation/nmi_watchdog.txt
+++ b/Documentation/nmi_watchdog.txt
@@ -10,7 +10,7 @@ us to generate 'watchdog NMI interrupts'. (NMI: Non Maskable Interrupt
10which get executed even if the system is otherwise locked up hard). 10which get executed even if the system is otherwise locked up hard).
11This can be used to debug hard kernel lockups. By executing periodic 11This can be used to debug hard kernel lockups. By executing periodic
12NMI interrupts, the kernel can monitor whether any CPU has locked up, 12NMI interrupts, the kernel can monitor whether any CPU has locked up,
13and print out debugging messages if so. 13and print out debugging messages if so.
14 14
15In order to use the NMI watchdog, you need to have APIC support in your 15In order to use the NMI watchdog, you need to have APIC support in your
16kernel. For SMP kernels, APIC support gets compiled in automatically. For 16kernel. For SMP kernels, APIC support gets compiled in automatically. For
@@ -22,8 +22,7 @@ CONFIG_X86_UP_IOAPIC is for uniprocessor with an IO-APIC. [Note: certain
22kernel debugging options, such as Kernel Stack Meter or Kernel Tracer, 22kernel debugging options, such as Kernel Stack Meter or Kernel Tracer,
23may implicitly disable the NMI watchdog.] 23may implicitly disable the NMI watchdog.]
24 24
25For x86-64, the needed APIC is always compiled in, and the NMI watchdog is 25For x86-64, the needed APIC is always compiled in.
26always enabled with I/O-APIC mode (nmi_watchdog=1).
27 26
28Using local APIC (nmi_watchdog=2) needs the first performance register, so 27Using local APIC (nmi_watchdog=2) needs the first performance register, so
29you can't use it for other purposes (such as high precision performance 28you can't use it for other purposes (such as high precision performance
@@ -63,16 +62,15 @@ when the system is idle), but if your system locks up on anything but the
63"hlt", then you are out of luck -- the event will not happen at all and the 62"hlt", then you are out of luck -- the event will not happen at all and the
64watchdog won't trigger. This is a shortcoming of the local APIC watchdog 63watchdog won't trigger. This is a shortcoming of the local APIC watchdog
65-- unfortunately there is no "clock ticks" event that would work all the 64-- unfortunately there is no "clock ticks" event that would work all the
66time. The I/O APIC watchdog is driven externally and has no such shortcoming. 65time. The I/O APIC watchdog is driven externally and has no such shortcoming.
67But its NMI frequency is much higher, resulting in a more significant hit 66But its NMI frequency is much higher, resulting in a more significant hit
68to the overall system performance. 67to the overall system performance.
69 68
70NOTE: starting with 2.4.2-ac18 the NMI-oopser is disabled by default, 69On x86 nmi_watchdog is disabled by default so you have to enable it with
71you have to enable it with a boot time parameter. Prior to 2.4.2-ac18 70a boot time parameter.
72the NMI-oopser is enabled unconditionally on x86 SMP boxes.
73 71
74On x86-64 the NMI oopser is on by default. On 64bit Intel CPUs 72NOTE: In kernels prior to 2.4.2-ac18 the NMI-oopser is enabled unconditionally
75it uses IO-APIC by default and on AMD it uses local APIC. 73on x86 SMP boxes.
76 74
77[ feel free to send bug reports, suggestions and patches to 75[ feel free to send bug reports, suggestions and patches to
78 Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing 76 Ingo Molnar <mingo@redhat.com> or the Linux SMP mailing
diff --git a/Documentation/scheduler/sched-domains.txt b/Documentation/scheduler/sched-domains.txt
index a9e990ab980f..373ceacc367e 100644
--- a/Documentation/scheduler/sched-domains.txt
+++ b/Documentation/scheduler/sched-domains.txt
@@ -61,10 +61,7 @@ builder by #define'ing ARCH_HASH_SCHED_DOMAIN, and exporting your
61arch_init_sched_domains function. This function will attach domains to all 61arch_init_sched_domains function. This function will attach domains to all
62CPUs using cpu_attach_domain. 62CPUs using cpu_attach_domain.
63 63
64Implementors should change the line 64The sched-domains debugging infrastructure can be enabled by enabling
65#undef SCHED_DOMAIN_DEBUG 65CONFIG_SCHED_DEBUG. This enables an error checking parse of the sched domains
66to
67#define SCHED_DOMAIN_DEBUG
68in kernel/sched.c as this enables an error checking parse of the sched domains
69which should catch most possible errors (described above). It also prints out 66which should catch most possible errors (described above). It also prints out
70the domain structure in a visual format. 67the domain structure in a visual format.
diff --git a/Documentation/scheduler/sched-rt-group.txt b/Documentation/scheduler/sched-rt-group.txt
index 14f901f639ee..3ef339f491e0 100644
--- a/Documentation/scheduler/sched-rt-group.txt
+++ b/Documentation/scheduler/sched-rt-group.txt
@@ -51,9 +51,9 @@ needs only about 3% CPU time to do so, it can do with a 0.03 * 0.005s =
510.00015s. So this group can be scheduled with a period of 0.005s and a run time 510.00015s. So this group can be scheduled with a period of 0.005s and a run time
52of 0.00015s. 52of 0.00015s.
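
The arithmetic in this example can be checked with a short C sketch. The helper name rt_runtime_us() is hypothetical and exists only for illustration; it works in microseconds to avoid floating point, so a period of 0.005s is 5000 and the resulting run time of 0.00015s is 150.

```c
#include <stdint.h>

/*
 * Hypothetical helper: run time (in microseconds) that a group needs
 * when it requires `percent` of every period of `period_us`
 * microseconds.
 */
static uint64_t rt_runtime_us(uint64_t period_us, unsigned int percent)
{
	return period_us * percent / 100;
}
```

Calling rt_runtime_us(5000, 3) gives 150, matching the 0.03 * 0.005s figure in the text.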
53 53
54The remaining CPU time will be used for user input and other tass. Because 54The remaining CPU time will be used for user input and other tasks. Because
55realtime tasks have explicitly allocated the CPU time they need to perform 55realtime tasks have explicitly allocated the CPU time they need to perform
56their tasks, buffer underruns in the graphocs or audio can be eliminated. 56their tasks, buffer underruns in the graphics or audio can be eliminated.
57 57
58NOTE: the above example is not fully implemented as of yet (2.6.25). We still 58NOTE: the above example is not fully implemented as of yet (2.6.25). We still
59lack an EDF scheduler to make non-uniform periods usable. 59lack an EDF scheduler to make non-uniform periods usable.
diff --git a/Documentation/i386/IO-APIC.txt b/Documentation/x86/i386/IO-APIC.txt
index 30b4c714fbe1..30b4c714fbe1 100644
--- a/Documentation/i386/IO-APIC.txt
+++ b/Documentation/x86/i386/IO-APIC.txt
diff --git a/Documentation/i386/boot.txt b/Documentation/x86/i386/boot.txt
index 95ad15c3b01f..147bfe511cdd 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/x86/i386/boot.txt
@@ -1,17 +1,14 @@
1 THE LINUX/I386 BOOT PROTOCOL 1 THE LINUX/x86 BOOT PROTOCOL
2 ---------------------------- 2 ---------------------------
3 3
4 H. Peter Anvin <hpa@zytor.com> 4On the x86 platform, the Linux kernel uses a rather complicated boot
5 Last update 2007-05-23
6
7On the i386 platform, the Linux kernel uses a rather complicated boot
8convention. This has evolved partially due to historical aspects, as 5convention. This has evolved partially due to historical aspects, as
9well as the desire in the early days to have the kernel itself be a 6well as the desire in the early days to have the kernel itself be a
10bootable image, the complicated PC memory model and due to changed 7bootable image, the complicated PC memory model and due to changed
11expectations in the PC industry caused by the effective demise of 8expectations in the PC industry caused by the effective demise of
12real-mode DOS as a mainstream operating system. 9real-mode DOS as a mainstream operating system.
13 10
14Currently, the following versions of the Linux/i386 boot protocol exist. 11Currently, the following versions of the Linux/x86 boot protocol exist.
15 12
16Old kernels: zImage/Image support only. Some very early kernels 13Old kernels: zImage/Image support only. Some very early kernels
17 may not even support a command line. 14 may not even support a command line.
@@ -372,10 +369,17 @@ Protocol: 2.00+
372 - If 0, the protected-mode code is loaded at 0x10000. 369 - If 0, the protected-mode code is loaded at 0x10000.
373 - If 1, the protected-mode code is loaded at 0x100000. 370 - If 1, the protected-mode code is loaded at 0x100000.
374 371
372 Bit 5 (write): QUIET_FLAG
373 - If 0, print early messages.
374 - If 1, suppress early messages.
375 This requests that the kernel (decompressor and early
376 kernel) not write early messages that require
377 accessing the display hardware directly.
378
375 Bit 6 (write): KEEP_SEGMENTS 379 Bit 6 (write): KEEP_SEGMENTS
376 Protocol: 2.07+ 380 Protocol: 2.07+
377 - if 0, reload the segment registers in the 32bit entry point. 381 - If 0, reload the segment registers in the 32bit entry point.
378 - if 1, do not reload the segment registers in the 32bit entry point. 382 - If 1, do not reload the segment registers in the 32bit entry point.
379 Assume that %cs %ds %ss %es are all set to flat segments with 383 Assume that %cs %ds %ss %es are all set to flat segments with
380 a base of 0 (or the equivalent for their environment). 384 a base of 0 (or the equivalent for their environment).
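
As a rough sketch of how a boot loader might drive the two flag bits described above, the following C fragment sets or clears them in a one-byte loadflags value while leaving all other bits alone. The bit positions come from the text; the helper name loadflags_set() is an assumption made for this example.

```c
#include <stdint.h>

#define QUIET_FLAG	(1u << 5)	/* bit 5: suppress early messages */
#define KEEP_SEGMENTS	(1u << 6)	/* bit 6: do not reload segment registers */

/*
 * Hypothetical helper: return loadflags with QUIET_FLAG and
 * KEEP_SEGMENTS set according to the two arguments. All other
 * bits are preserved.
 */
static uint8_t loadflags_set(uint8_t loadflags, int quiet, int keep_segments)
{
	uint8_t mask = QUIET_FLAG | KEEP_SEGMENTS;

	loadflags &= (uint8_t)~mask;
	if (quiet)
		loadflags |= QUIET_FLAG;
	if (keep_segments)
		loadflags |= KEEP_SEGMENTS;
	return loadflags;
}
```

KEEP_SEGMENTS is only meaningful for protocol 2.07 and later, so a loader would check the protocol version before setting it.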
381 385
@@ -504,7 +508,7 @@ Protocol: 2.06+
504 maximum size was 255. 508 maximum size was 255.
505 509
506Field name: hardware_subarch 510Field name: hardware_subarch
507Type: write 511Type: write (optional, defaults to x86/PC)
508Offset/size: 0x23c/4 512Offset/size: 0x23c/4
509Protocol: 2.07+ 513Protocol: 2.07+
510 514
@@ -520,11 +524,13 @@ Protocol: 2.07+
520 0x00000002 Xen 524 0x00000002 Xen
521 525
522Field name: hardware_subarch_data 526Field name: hardware_subarch_data
523Type: write 527Type: write (subarch-dependent)
524Offset/size: 0x240/8 528Offset/size: 0x240/8
525Protocol: 2.07+ 529Protocol: 2.07+
526 530
527 A pointer to data that is specific to hardware subarch 531 A pointer to data that is specific to hardware subarch
532 This field is currently unused for the default x86/PC environment,
533 do not modify.
528 534
529Field name: payload_offset 535Field name: payload_offset
530Type: read 536Type: read
@@ -545,6 +551,34 @@ Protocol: 2.08+
545 551
546 The length of the payload. 552 The length of the payload.
547 553
554Field name: setup_data
555Type: write (special)
556Offset/size: 0x250/8
557Protocol: 2.09+
558
559 The 64-bit physical pointer to NULL terminated single linked list of
560 struct setup_data. This is used to define a more extensible boot
561 parameters passing mechanism. The definition of struct setup_data is
562 as follows:
563
564 struct setup_data {
565 u64 next;
566 u32 type;
567 u32 len;
568 u8 data[0];
569 };
570
571 Where, the next is a 64-bit physical pointer to the next node of
572 linked list, the next field of the last node is 0; the type is used
573 to identify the contents of data; the len is the length of data
574 field; the data holds the real payload.
575
576 This list may be modified at a number of points during the bootup
577 process. Therefore, when modifying this list one should always make
578 sure to consider the case where the linked list already contains
579 entries.
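
The advice about a list that already contains entries can be illustrated with a small C sketch that appends a node at the tail instead of overwriting the head. It assumes, purely for illustration, that physical addresses can be used directly as pointers (an identity mapping); sd_from_phys(), setup_data_append() and setup_data_count() are hypothetical names, not part of the boot protocol.

```c
#include <stddef.h>
#include <stdint.h>

struct setup_data {
	uint64_t next;		/* physical address of next node, 0 at the end */
	uint32_t type;
	uint32_t len;
	uint8_t data[];
};

/* Assumption: identity mapping, so a physical address is a valid pointer. */
static struct setup_data *sd_from_phys(uint64_t phys)
{
	return (struct setup_data *)(uintptr_t)phys;
}

/* Append node at the tail so that existing entries are preserved. */
static void setup_data_append(uint64_t *head, struct setup_data *node)
{
	node->next = 0;
	while (*head)
		head = &sd_from_phys(*head)->next;
	*head = (uint64_t)(uintptr_t)node;
}

/* Count the nodes reachable from head. */
static size_t setup_data_count(uint64_t head)
{
	size_t n = 0;

	for (; head; head = sd_from_phys(head)->next)
		n++;
	return n;
}
```

Here head stands for the setup_data field at offset 0x250. Walking to the tail before linking is what keeps entries added earlier in the boot process intact.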
580
581
548**** THE IMAGE CHECKSUM 582**** THE IMAGE CHECKSUM
549 583
550From boot protocol version 2.08 onwards the CRC-32 is calculated over 584From boot protocol version 2.08 onwards the CRC-32 is calculated over
@@ -553,6 +587,7 @@ initial remainder of 0xffffffff. The checksum is appended to the
553file; therefore the CRC of the file up to the limit specified in the 587file; therefore the CRC of the file up to the limit specified in the
554syssize field of the header is always 0. 588syssize field of the header is always 0.
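
The "always 0" property can be demonstrated with a short C sketch. It assumes a bit-reflected CRC-32 (polynomial constant 0xEDB88320), an initial remainder of 0xffffffff, no final inversion, and a checksum appended in little-endian byte order. These details and the helper names are assumptions made for the example, not a statement of how the kernel build tools are written.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise, reflected CRC-32 update without a final XOR (assumed variant). */
static uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return crc;
}

/*
 * Append the remainder over buf[0..len) in little-endian order.
 * buf must have room for 4 more bytes. Returns the new length.
 */
static size_t append_checksum(uint8_t *buf, size_t len)
{
	uint32_t crc = crc32_update(0xffffffffu, buf, len);

	for (int i = 0; i < 4; i++)
		buf[len + i] = (uint8_t)(crc >> (8 * i));
	return len + 4;
}
```

Running crc32_update() with the same initial remainder over the data plus the four appended bytes yields 0. Each appended byte cancels the low byte of the running remainder, so the remainder shifts out to zero.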
555 589
590
556**** THE KERNEL COMMAND LINE 591**** THE KERNEL COMMAND LINE
557 592
558The kernel command line has become an important way for the boot 593The kernel command line has become an important way for the boot
@@ -584,28 +619,6 @@ command line is entered using the following protocol:
584 covered by setup_move_size, so you may need to adjust this 619 covered by setup_move_size, so you may need to adjust this
585 field. 620 field.
586 621
587Field name: setup_data
588Type: write (obligatory)
589Offset/size: 0x250/8
590Protocol: 2.09+
591
592 The 64-bit physical pointer to NULL terminated single linked list of
593 struct setup_data. This is used to define a more extensible boot
594 parameters passing mechanism. The definition of struct setup_data is
595 as follow:
596
597 struct setup_data {
598 u64 next;
599 u32 type;
600 u32 len;
601 u8 data[0];
602 };
603
604 Where, the next is a 64-bit physical pointer to the next node of
605 linked list, the next field of the last node is 0; the type is used
606 to identify the contents of data; the len is the length of data
607 field; the data holds the real payload.
608
609 622
610**** MEMORY LAYOUT OF THE REAL-MODE CODE 623**** MEMORY LAYOUT OF THE REAL-MODE CODE
611 624
diff --git a/Documentation/i386/usb-legacy-support.txt b/Documentation/x86/i386/usb-legacy-support.txt
index 1894cdfc69d9..1894cdfc69d9 100644
--- a/Documentation/i386/usb-legacy-support.txt
+++ b/Documentation/x86/i386/usb-legacy-support.txt
diff --git a/Documentation/i386/zero-page.txt b/Documentation/x86/i386/zero-page.txt
index 169ad423a3d1..169ad423a3d1 100644
--- a/Documentation/i386/zero-page.txt
+++ b/Documentation/x86/i386/zero-page.txt
diff --git a/Documentation/x86_64/00-INDEX b/Documentation/x86/x86_64/00-INDEX
index 92fc20ab5f0e..92fc20ab5f0e 100644
--- a/Documentation/x86_64/00-INDEX
+++ b/Documentation/x86/x86_64/00-INDEX
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index b0c7b6c4abda..b0c7b6c4abda 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
diff --git a/Documentation/x86_64/cpu-hotplug-spec b/Documentation/x86/x86_64/cpu-hotplug-spec
index 3c23e0587db3..3c23e0587db3 100644
--- a/Documentation/x86_64/cpu-hotplug-spec
+++ b/Documentation/x86/x86_64/cpu-hotplug-spec
diff --git a/Documentation/x86_64/fake-numa-for-cpusets b/Documentation/x86/x86_64/fake-numa-for-cpusets
index d1a985c5b00a..d1a985c5b00a 100644
--- a/Documentation/x86_64/fake-numa-for-cpusets
+++ b/Documentation/x86/x86_64/fake-numa-for-cpusets
diff --git a/Documentation/x86_64/kernel-stacks b/Documentation/x86/x86_64/kernel-stacks
index 5ad65d51fb95..5ad65d51fb95 100644
--- a/Documentation/x86_64/kernel-stacks
+++ b/Documentation/x86/x86_64/kernel-stacks
diff --git a/Documentation/x86_64/machinecheck b/Documentation/x86/x86_64/machinecheck
index a05e58e7b159..a05e58e7b159 100644
--- a/Documentation/x86_64/machinecheck
+++ b/Documentation/x86/x86_64/machinecheck
diff --git a/Documentation/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt
index b89b6d2bebfa..efce75097369 100644
--- a/Documentation/x86_64/mm.txt
+++ b/Documentation/x86/x86_64/mm.txt
@@ -11,9 +11,8 @@ ffffc10000000000 - ffffc1ffffffffff (=40 bits) hole
11ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space 11ffffc20000000000 - ffffe1ffffffffff (=45 bits) vmalloc/ioremap space
12ffffe20000000000 - ffffe2ffffffffff (=40 bits) virtual memory map (1TB) 12ffffe20000000000 - ffffe2ffffffffff (=40 bits) virtual memory map (1TB)
13... unused hole ... 13... unused hole ...
14ffffffff80000000 - ffffffff82800000 (=40 MB) kernel text mapping, from phys 0 14ffffffff80000000 - ffffffffa0000000 (=512 MB) kernel text mapping, from phys 0
15... unused hole ... 15ffffffffa0000000 - fffffffffff00000 (=1536 MB) module mapping space
16ffffffff88000000 - fffffffffff00000 (=1919 MB) module mapping space
17 16
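
The sizes quoted for the new ranges can be verified with a small C sketch. span_mb() is a hypothetical helper, and it assumes the second address of each range is an exclusive end.

```c
#include <stdint.h>

/* Size of the range [start, end) in megabytes (1 MB = 2^20 bytes). */
static uint64_t span_mb(uint64_t start, uint64_t end)
{
	return (end - start) >> 20;
}
```

Under that assumption, ffffffff80000000 - ffffffffa0000000 is 512 MB. The module range ffffffffa0000000 - fffffffffff00000 comes to 1535 MB, one megabyte short of the 1536 MB shown in the listing.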
18The direct mapping covers all memory in the system up to the highest 17The direct mapping covers all memory in the system up to the highest
19memory address (this means in some cases it can also include PCI memory 18memory address (this means in some cases it can also include PCI memory
diff --git a/Documentation/x86_64/uefi.txt b/Documentation/x86/x86_64/uefi.txt
index 7d77120a5184..a5e2b4fdb170 100644
--- a/Documentation/x86_64/uefi.txt
+++ b/Documentation/x86/x86_64/uefi.txt
@@ -36,3 +36,7 @@ Mechanics:
36 services. 36 services.
37 noefi turn off all EFI runtime services 37 noefi turn off all EFI runtime services
38 reboot_type=k turn off EFI reboot runtime service 38 reboot_type=k turn off EFI reboot runtime service
39- If the EFI memory map has additional entries not in the E820 map,
40 you can include those entries in the kernel's memory map of available
41 physical RAM by using the following kernel command line parameter.
42 add_efi_memmap include EFI memory map of available physical RAM