author    Andrea Bastoni <bastoni@cs.unc.edu>  2010-05-30 19:16:45 -0400
committer Andrea Bastoni <bastoni@cs.unc.edu>  2010-05-30 19:16:45 -0400
commit    ada47b5fe13d89735805b566185f4885f5a3f750 (patch)
tree      644b88f8a71896307d71438e9b3af49126ffb22b /Documentation/trace
parent    43e98717ad40a4ae64545b5ba047c7b86aa44f4f (diff)
parent    3280f21d43ee541f97f8cda5792150d2dbec20d5 (diff)
Merge branch 'wip-2.6.34' into old-private-master (archived-private-master)
Diffstat (limited to 'Documentation/trace')
-rw-r--r--  Documentation/trace/events-kmem.txt           14
-rw-r--r--  Documentation/trace/ftrace-design.txt         52
-rw-r--r--  Documentation/trace/ftrace.txt                 4
-rw-r--r--  Documentation/trace/kprobetrace.txt          156
-rw-r--r--  Documentation/trace/mmiotrace.txt             15
-rw-r--r--  Documentation/trace/ring-buffer-design.txt    56
-rw-r--r--  Documentation/trace/tracepoint-analysis.txt   60
7 files changed, 271 insertions, 86 deletions
diff --git a/Documentation/trace/events-kmem.txt b/Documentation/trace/events-kmem.txt
index 6ef2a8652e17..aa82ee4a5a87 100644
--- a/Documentation/trace/events-kmem.txt
+++ b/Documentation/trace/events-kmem.txt
@@ -1,7 +1,7 @@
1 Subsystem Trace Points: kmem 1 Subsystem Trace Points: kmem
2 2
3The tracing system kmem captures events related to object and page allocation 3The kmem tracing system captures events related to object and page allocation
4within the kernel. Broadly speaking there are four major subheadings. 4within the kernel. Broadly speaking there are five major subheadings.
5 5
6 o Slab allocation of small objects of unknown type (kmalloc) 6 o Slab allocation of small objects of unknown type (kmalloc)
7 o Slab allocation of small objects of known type 7 o Slab allocation of small objects of known type
@@ -9,7 +9,7 @@ within the kernel. Broadly speaking there are four major subheadings.
9 o Per-CPU Allocator Activity 9 o Per-CPU Allocator Activity
10 o External Fragmentation 10 o External Fragmentation
11 11
12This document will describe what each of the tracepoints are and why they 12This document describes what each of the tracepoints is and why they
13might be useful. 13might be useful.
14 14
151. Slab allocation of small objects of unknown type 151. Slab allocation of small objects of unknown type
@@ -34,7 +34,7 @@ kmem_cache_free call_site=%lx ptr=%p
34These events are similar in usage to the kmalloc-related events except that 34These events are similar in usage to the kmalloc-related events except that
35it is likely easier to pin the event down to a specific cache. At the time 35it is likely easier to pin the event down to a specific cache. At the time
36of writing, no information is available on what slab is being allocated from, 36of writing, no information is available on what slab is being allocated from,
37but the call_site can usually be used to extrapolate that information 37but the call_site can usually be used to extrapolate that information.
38 38
393. Page allocation 393. Page allocation
40================== 40==================
@@ -80,9 +80,9 @@ event indicating whether it is for a percpu_refill or not.
80When the per-CPU list is too full, a number of pages are freed, each one 80When the per-CPU list is too full, a number of pages are freed, each one
81which triggers a mm_page_pcpu_drain event. 81which triggers a mm_page_pcpu_drain event.
82 82
83The individual nature of the events are so that pages can be tracked 83The individual nature of the events is so that pages can be tracked
84between allocation and freeing. A number of drain or refill pages that occur 84between allocation and freeing. A number of drain or refill pages that occur
85consecutively imply the zone->lock being taken once. Large amounts of PCP 85consecutively imply the zone->lock being taken once. Large amounts of per-CPU
86refills and drains could imply an imbalance between CPUs where too much work 86refills and drains could imply an imbalance between CPUs where too much work
87is being concentrated in one place. It could also indicate that the per-CPU 87is being concentrated in one place. It could also indicate that the per-CPU
88lists should be a larger size. Finally, large amounts of refills on one CPU 88lists should be a larger size. Finally, large amounts of refills on one CPU
@@ -102,6 +102,6 @@ is important.
102 102
103Large numbers of this event implies that memory is fragmenting and 103Large numbers of this event implies that memory is fragmenting and
104high-order allocations will start failing at some time in the future. One 104high-order allocations will start failing at some time in the future. One
105means of reducing the occurange of this event is to increase the size of 105means of reducing the occurrence of this event is to increase the size of
106min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes where 106min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes where
107pageblock_size is usually the size of the default hugepage size. 107pageblock_size is usually the size of the default hugepage size.
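
For a rough worked example of that increment (the values here are only an
illustration): on an x86-64 machine whose default hugepage is 2 MB,
pageblock_size corresponds to 2048 kB, so with a single online node one
increment is

    3 * 2048 kB * 1 = 6144 kB

added to /proc/sys/vm/min_free_kbytes.
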
diff --git a/Documentation/trace/ftrace-design.txt b/Documentation/trace/ftrace-design.txt
index 7003e10f10f5..f1f81afee8a0 100644
--- a/Documentation/trace/ftrace-design.txt
+++ b/Documentation/trace/ftrace-design.txt
@@ -1,5 +1,6 @@
1 function tracer guts 1 function tracer guts
2 ==================== 2 ====================
3 By Mike Frysinger
3 4
4Introduction 5Introduction
5------------ 6------------
@@ -53,14 +54,14 @@ size of the mcount call that is embedded in the function).
53For example, if the function foo() calls bar(), when the bar() function calls 54For example, if the function foo() calls bar(), when the bar() function calls
54mcount(), the arguments mcount() will pass to the tracer are: 55mcount(), the arguments mcount() will pass to the tracer are:
55 "frompc" - the address bar() will use to return to foo() 56 "frompc" - the address bar() will use to return to foo()
56 "selfpc" - the address bar() (with _mcount() size adjustment) 57 "selfpc" - the address bar() (with mcount() size adjustment)
57 58
58Also keep in mind that this mcount function will be called *a lot*, so 59Also keep in mind that this mcount function will be called *a lot*, so
59optimizing for the default case of no tracer will help the smooth running of 60optimizing for the default case of no tracer will help the smooth running of
60your system when tracing is disabled. So the start of the mcount function is 61your system when tracing is disabled. So the start of the mcount function is
61typically the bare min with checking things before returning. That also means 62typically the bare minimum with checking things before returning. That also
62the code flow should usually kept linear (i.e. no branching in the nop case). 63means the code flow should usually be kept linear (i.e. no branching in the nop
63This is of course an optimization and not a hard requirement. 64case). This is of course an optimization and not a hard requirement.
64 65
65Here is some pseudo code that should help (these functions should actually be 66Here is some pseudo code that should help (these functions should actually be
66implemented in assembly): 67implemented in assembly):
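
Here is a compact C-style sketch of that fast path (ftrace_trace_function and
ftrace_stub are the generic names used later in this document; the frompc and
selfpc setup is ABI specific, and the real implementation must be assembly):

void mcount(void)
{
	/* save only the bare state needed to do the initial check */

	extern void (*ftrace_trace_function)(unsigned long, unsigned long);
	if (ftrace_trace_function != ftrace_stub)
		goto do_trace;

	/* restore that bare state */
	return;

do_trace:
	/* save all state needed by the ABI */

	unsigned long frompc = ...;	/* caller's return address, ABI specific */
	unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;

	ftrace_trace_function(frompc, selfpc);

	/* restore all state needed by the ABI */
}
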
@@ -131,10 +132,10 @@ some functions to save (hijack) and restore the return address.
131 132
132The mcount function should check the function pointers ftrace_graph_return 133The mcount function should check the function pointers ftrace_graph_return
133(compare to ftrace_stub) and ftrace_graph_entry (compare to 134(compare to ftrace_stub) and ftrace_graph_entry (compare to
134ftrace_graph_entry_stub). If either of those are not set to the relevant stub 135ftrace_graph_entry_stub). If either of those is not set to the relevant stub
135function, call the arch-specific function ftrace_graph_caller which in turn 136function, call the arch-specific function ftrace_graph_caller which in turn
136calls the arch-specific function prepare_ftrace_return. Neither of these 137calls the arch-specific function prepare_ftrace_return. Neither of these
137function names are strictly required, but you should use them anyways to stay 138function names is strictly required, but you should use them anyway to stay
138consistent across the architecture ports -- easier to compare & contrast 139consistent across the architecture ports -- easier to compare & contrast
139things. 140things.
140 141
@@ -144,7 +145,7 @@ but the first argument should be a pointer to the "frompc". Typically this is
144located on the stack. This allows the function to hijack the return address 145located on the stack. This allows the function to hijack the return address
145temporarily to have it point to the arch-specific function return_to_handler. 146temporarily to have it point to the arch-specific function return_to_handler.
146That function will simply call the common ftrace_return_to_handler function and 147That function will simply call the common ftrace_return_to_handler function and
147that will return the original return address with which, you can return to the 148that will return the original return address with which you can return to the
148original call site. 149original call site.
149 150
150Here is the updated mcount pseudo code: 151Here is the updated mcount pseudo code:
@@ -173,14 +174,16 @@ void ftrace_graph_caller(void)
173 174
174 unsigned long *frompc = &...; 175 unsigned long *frompc = &...;
175 unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE; 176 unsigned long selfpc = <return address> - MCOUNT_INSN_SIZE;
176 prepare_ftrace_return(frompc, selfpc); 177 /* passing frame pointer up is optional -- see below */
178 prepare_ftrace_return(frompc, selfpc, frame_pointer);
177 179
178 /* restore all state needed by the ABI */ 180 /* restore all state needed by the ABI */
179} 181}
180#endif 182#endif
181 183
182For information on how to implement prepare_ftrace_return(), simply look at 184For information on how to implement prepare_ftrace_return(), simply look at the
183the x86 version. The only architecture-specific piece in it is the setup of 185x86 version (the frame pointer passing is optional; see the next section for
186more information). The only architecture-specific piece in it is the setup of
184the fault recovery table (the asm(...) code). The rest should be the same 187the fault recovery table (the asm(...) code). The rest should be the same
185across architectures. 188across architectures.
186 189
@@ -205,6 +208,23 @@ void return_to_handler(void)
205#endif 208#endif
206 209
207 210
211HAVE_FUNCTION_GRAPH_FP_TEST
212---------------------------
213
214An arch may pass in a unique value (frame pointer) to both the entering and
215exiting of a function. On exit, the value is compared and if it does not
216match, then it will panic the kernel. This is largely a sanity check for bad
217code generation with gcc. If gcc for your port sanely updates the frame
218pointer under different optimization levels, then ignore this option.
219
220However, adding support for it isn't terribly difficult. In your assembly code
221that calls prepare_ftrace_return(), pass the frame pointer as the 3rd argument.
222Then in the C version of that function, do what the x86 port does and pass it
223along to ftrace_push_return_trace() instead of a stub value of 0.
224
225Similarly, when you call ftrace_return_to_handler(), pass it the frame pointer.
226
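As a loose illustration of the scheme described above (argument names and the
ftrace_push_return_trace() signature follow the x86 port of this era and may
differ in your tree; the fault-recovery and ftrace_graph_entry details are
omitted):

void prepare_ftrace_return(unsigned long *parent, unsigned long self_addr,
			   unsigned long frame_pointer)
{
	unsigned long old = *parent;
	int depth;

	/* hijack the return address so the function exits via return_to_handler */
	*parent = (unsigned long)&return_to_handler;

	/* record the real return address; pass the frame pointer instead of
	 * a stub value of 0 so the exit-time sanity check can compare it */
	if (ftrace_push_return_trace(old, self_addr, &depth,
				     frame_pointer) == -EBUSY)
		*parent = old;		/* buffer full -- restore and give up */
}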
227
208HAVE_FTRACE_NMI_ENTER 228HAVE_FTRACE_NMI_ENTER
209--------------------- 229---------------------
210 230
@@ -213,10 +233,18 @@ If you can't trace NMI functions, then skip this option.
213<details to be filled> 233<details to be filled>
214 234
215 235
216HAVE_FTRACE_SYSCALLS 236HAVE_SYSCALL_TRACEPOINTS
217--------------------- 237---------------------
218 238
219<details to be filled> 239You need very few things to get the syscalls tracing in an arch.
240
241- Support HAVE_ARCH_TRACEHOOK (see arch/Kconfig).
242- Have a NR_syscalls variable in <asm/unistd.h> that provides the number
243 of syscalls supported by the arch.
244- Support the TIF_SYSCALL_TRACEPOINT thread flag.
245- Call the trace_sys_enter() and trace_sys_exit() tracepoints from the arch's
246 ptrace syscall tracing path (a sketch follows this list).
247- Tag this arch as HAVE_SYSCALL_TRACEPOINTS.
220 248
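As a sketch of those tracepoint calls (the wrapper names and the
syscall_get_* accessors are illustrative; the hooks live wherever your arch
does its ptrace syscall-entry/exit tracing):

static void arch_syscall_trace_enter(struct pt_regs *regs)
{
	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
		trace_sys_enter(regs, syscall_get_nr(current, regs));
}

static void arch_syscall_trace_exit(struct pt_regs *regs)
{
	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
		trace_sys_exit(regs, syscall_get_return_value(current, regs));
}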
221 249
222HAVE_FTRACE_MCOUNT_RECORD 250HAVE_FTRACE_MCOUNT_RECORD
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index 8179692fbb90..03485bfbd797 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -1588,7 +1588,7 @@ module author does not need to worry about it.
1588 1588
1589When tracing is enabled, kstop_machine is called to prevent 1589When tracing is enabled, kstop_machine is called to prevent
1590races with the CPUS executing code being modified (which can 1590races with the CPUS executing code being modified (which can
1591cause the CPU to do undesireable things), and the nops are 1591cause the CPU to do undesirable things), and the nops are
1592patched back to calls. But this time, they do not call mcount 1592patched back to calls. But this time, they do not call mcount
1593(which is just a function stub). They now call into the ftrace 1593(which is just a function stub). They now call into the ftrace
1594infrastructure. 1594infrastructure.
@@ -1625,7 +1625,7 @@ If I am only interested in sys_nanosleep and hrtimer_interrupt:
1625 1625
1626 # echo sys_nanosleep hrtimer_interrupt \ 1626 # echo sys_nanosleep hrtimer_interrupt \
1627 > set_ftrace_filter 1627 > set_ftrace_filter
1628 # echo ftrace > current_tracer 1628 # echo function > current_tracer
1629 # echo 1 > tracing_enabled 1629 # echo 1 > tracing_enabled
1630 # usleep 1 1630 # usleep 1
1631 # echo 0 > tracing_enabled 1631 # echo 0 > tracing_enabled
diff --git a/Documentation/trace/kprobetrace.txt b/Documentation/trace/kprobetrace.txt
new file mode 100644
index 000000000000..a9100b28eb84
--- /dev/null
+++ b/Documentation/trace/kprobetrace.txt
@@ -0,0 +1,156 @@
1 Kprobe-based Event Tracing
2 ==========================
3
4 Documentation is written by Masami Hiramatsu
5
6
7Overview
8--------
 9These events are similar to tracepoint-based events. Instead of tracepoints,
 10they are based on kprobes (kprobe and kretprobe), so they can probe wherever
 11kprobes can probe (that is, the body of any function except for __kprobes
 12functions). Unlike tracepoint-based events, they can be added and removed
 13dynamically, on the fly.
14
15To enable this feature, build your kernel with CONFIG_KPROBE_TRACING=y.
16
17Similar to the events tracer, this doesn't need to be activated via
 18current_tracer. Instead, add probe points via
 19/sys/kernel/debug/tracing/kprobe_events, and enable them via
20/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enabled.
21
22
23Synopsis of kprobe_events
24-------------------------
25 p[:[GRP/]EVENT] SYMBOL[+offs]|MEMADDR [FETCHARGS] : Set a probe
26 r[:[GRP/]EVENT] SYMBOL[+0] [FETCHARGS] : Set a return probe
27 -:[GRP/]EVENT : Clear a probe
28
29 GRP : Group name. If omitted, use "kprobes" for it.
30 EVENT : Event name. If omitted, the event name is generated
31 based on SYMBOL+offs or MEMADDR.
32 SYMBOL[+offs] : Symbol+offset where the probe is inserted.
33 MEMADDR : Address where the probe is inserted.
34
35 FETCHARGS : Arguments. Each probe can have up to 128 args.
36 %REG : Fetch register REG
37 @ADDR : Fetch memory at ADDR (ADDR should be in kernel)
38 @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
39 $stackN : Fetch Nth entry of stack (N >= 0)
40 $stack : Fetch stack address.
41 $retval : Fetch return value.(*)
42 +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
43 NAME=FETCHARG: Set NAME as the argument name of FETCHARG.
44
45 (*) only for return probe.
46 (**) this is useful for fetching a field of data structures.
47
48
49Per-Probe Event Filtering
50-------------------------
 51 The per-probe event filtering feature allows you to set a different filter on
 52each probe and to choose which arguments will be shown in the trace buffer. If an
 53event name is specified right after 'p:' or 'r:' in kprobe_events, an event is
 54added under tracing/events/kprobes/<EVENT>, and in that directory you can see
 55'id', 'enabled', 'format' and 'filter'.
56
57enabled:
58 You can enable/disable the probe by writing 1 or 0 on it.
59
60format:
61 This shows the format of this probe event.
62
63filter:
64 You can write filtering rules of this event.
65
66id:
67 This shows the id of this probe event.
68
69
70Event Profiling
71---------------
72 You can check the total number of probe hits and probe miss-hits via
73/sys/kernel/debug/tracing/kprobe_profile.
74 The first column is event name, the second is the number of probe hits,
75the third is the number of probe miss-hits.
76
77
78Usage examples
79--------------
80To add a probe as a new event, write a new definition to kprobe_events
81as below.
82
83 echo 'p:myprobe do_sys_open dfd=%ax filename=%dx flags=%cx mode=+4($stack)' > /sys/kernel/debug/tracing/kprobe_events
84
 85 This sets a kprobe at the top of the do_sys_open() function, recording the
 861st to 4th arguments as the "myprobe" event. Note that which register/stack entry is
 87assigned to each function argument depends on the arch-specific ABI. If you are unsure of
 88the ABI, please try the probe subcommand of perf tools (you can find it
 89under tools/perf/).
 90As this example shows, users can choose more familiar names for each argument.
91
92 echo 'r:myretprobe do_sys_open $retval' >> /sys/kernel/debug/tracing/kprobe_events
93
 94 This sets a kretprobe on the return point of the do_sys_open() function,
 95recording the return value as the "myretprobe" event.
96 You can see the format of these events via
97/sys/kernel/debug/tracing/events/kprobes/<EVENT>/format.
98
99 cat /sys/kernel/debug/tracing/events/kprobes/myprobe/format
100name: myprobe
101ID: 780
102format:
103 field:unsigned short common_type; offset:0; size:2; signed:0;
104 field:unsigned char common_flags; offset:2; size:1; signed:0;
105 field:unsigned char common_preempt_count; offset:3; size:1;signed:0;
106 field:int common_pid; offset:4; size:4; signed:1;
107 field:int common_lock_depth; offset:8; size:4; signed:1;
108
109 field:unsigned long __probe_ip; offset:12; size:4; signed:0;
110 field:int __probe_nargs; offset:16; size:4; signed:1;
111 field:unsigned long dfd; offset:20; size:4; signed:0;
112 field:unsigned long filename; offset:24; size:4; signed:0;
113 field:unsigned long flags; offset:28; size:4; signed:0;
114 field:unsigned long mode; offset:32; size:4; signed:0;
115
116
117print fmt: "(%lx) dfd=%lx filename=%lx flags=%lx mode=%lx", REC->__probe_ip,
118REC->dfd, REC->filename, REC->flags, REC->mode
119
120 You can see that the event has 4 arguments as in the expressions you specified.
121
122 echo > /sys/kernel/debug/tracing/kprobe_events
123
124 This clears all probe points.
125
126 Or,
127
128 echo -:myprobe >> kprobe_events
129
130 This clears probe points selectively.
131
 132 Right after definition, each event is disabled by default. To trace these
 133events, you need to enable them.
134
135 echo 1 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
136 echo 1 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
137
138 And you can see the traced information via /sys/kernel/debug/tracing/trace.
139
140 cat /sys/kernel/debug/tracing/trace
141# tracer: nop
142#
143# TASK-PID CPU# TIMESTAMP FUNCTION
144# | | | | |
145 <...>-1447 [001] 1038282.286875: myprobe: (do_sys_open+0x0/0xd6) dfd=3 filename=7fffd1ec4440 flags=8000 mode=0
146 <...>-1447 [001] 1038282.286878: myretprobe: (sys_openat+0xc/0xe <- do_sys_open) $retval=fffffffffffffffe
147 <...>-1447 [001] 1038282.286885: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=40413c flags=8000 mode=1b6
148 <...>-1447 [001] 1038282.286915: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
149 <...>-1447 [001] 1038282.286969: myprobe: (do_sys_open+0x0/0xd6) dfd=ffffff9c filename=4041c6 flags=98800 mode=10
150 <...>-1447 [001] 1038282.286976: myretprobe: (sys_open+0x1b/0x1d <- do_sys_open) $retval=3
151
152
 153 Each line shows when the kernel hits an event, and <- SYMBOL means the kernel
 154returns from SYMBOL (e.g. "sys_open+0x1b/0x1d <- do_sys_open" means the kernel
 155returns from do_sys_open to sys_open+0x1b).
156
diff --git a/Documentation/trace/mmiotrace.txt b/Documentation/trace/mmiotrace.txt
index 162effbfbdec..664e7386d89e 100644
--- a/Documentation/trace/mmiotrace.txt
+++ b/Documentation/trace/mmiotrace.txt
@@ -44,7 +44,8 @@ Check for lost events.
44Usage 44Usage
45----- 45-----
46 46
47Make sure debugfs is mounted to /sys/kernel/debug. If not, (requires root privileges) 47Make sure debugfs is mounted to /sys/kernel/debug.
48If not (requires root privileges):
48$ mount -t debugfs debugfs /sys/kernel/debug 49$ mount -t debugfs debugfs /sys/kernel/debug
49 50
50Check that the driver you are about to trace is not loaded. 51Check that the driver you are about to trace is not loaded.
@@ -91,7 +92,7 @@ $ dmesg > dmesg.txt
91$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt 92$ tar zcf pciid-nick-mmiotrace.tar.gz mydump.txt lspci.txt dmesg.txt
92and then send the .tar.gz file. The trace compresses considerably. Replace 93and then send the .tar.gz file. The trace compresses considerably. Replace
93"pciid" and "nick" with the PCI ID or model name of your piece of hardware 94"pciid" and "nick" with the PCI ID or model name of your piece of hardware
94under investigation and your nick name. 95under investigation and your nickname.
95 96
96 97
97How Mmiotrace Works 98How Mmiotrace Works
@@ -100,7 +101,7 @@ How Mmiotrace Works
100Access to hardware IO-memory is gained by mapping addresses from PCI bus by 101Access to hardware IO-memory is gained by mapping addresses from PCI bus by
101calling one of the ioremap_*() functions. Mmiotrace is hooked into the 102calling one of the ioremap_*() functions. Mmiotrace is hooked into the
102__ioremap() function and gets called whenever a mapping is created. Mapping is 103__ioremap() function and gets called whenever a mapping is created. Mapping is
103an event that is recorded into the trace log. Note, that ISA range mappings 104an event that is recorded into the trace log. Note that ISA range mappings
104are not caught, since the mapping always exists and is returned directly. 105are not caught, since the mapping always exists and is returned directly.
105 106
106MMIO accesses are recorded via page faults. Just before __ioremap() returns, 107MMIO accesses are recorded via page faults. Just before __ioremap() returns,
@@ -122,11 +123,11 @@ Trace Log Format
122---------------- 123----------------
123 124
124The raw log is text and easily filtered with e.g. grep and awk. One record is 125The raw log is text and easily filtered with e.g. grep and awk. One record is
125one line in the log. A record starts with a keyword, followed by keyword 126one line in the log. A record starts with a keyword, followed by keyword-
126dependant arguments. Arguments are separated by a space, or continue until the 127dependent arguments. Arguments are separated by a space, or continue until the
127end of line. The format for version 20070824 is as follows: 128end of line. The format for version 20070824 is as follows:
128 129
129Explanation Keyword Space separated arguments 130Explanation Keyword Space-separated arguments
130--------------------------------------------------------------------------- 131---------------------------------------------------------------------------
131 132
132read event R width, timestamp, map id, physical, value, PC, PID 133read event R width, timestamp, map id, physical, value, PC, PID
@@ -136,7 +137,7 @@ iounmap event UNMAP timestamp, map id, PC, PID
136marker MARK timestamp, text 137marker MARK timestamp, text
137version VERSION the string "20070824" 138version VERSION the string "20070824"
138info for reader LSPCI one line from lspci -v 139info for reader LSPCI one line from lspci -v
139PCI address map PCIDEV space separated /proc/bus/pci/devices data 140PCI address map PCIDEV space-separated /proc/bus/pci/devices data
140unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID 141unk. opcode UNKNOWN timestamp, map id, physical, data, PC, PID
141 142
142Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual 143Timestamp is in seconds with decimals. Physical is a PCI bus address, virtual
diff --git a/Documentation/trace/ring-buffer-design.txt b/Documentation/trace/ring-buffer-design.txt
index 5b1d23d604c5..d299ff31df57 100644
--- a/Documentation/trace/ring-buffer-design.txt
+++ b/Documentation/trace/ring-buffer-design.txt
@@ -33,9 +33,9 @@ head_page - a pointer to the page that the reader will use next
33 33
34tail_page - a pointer to the page that will be written to next 34tail_page - a pointer to the page that will be written to next
35 35
36commit_page - a pointer to the page with the last finished non nested write. 36commit_page - a pointer to the page with the last finished non-nested write.
37 37
38cmpxchg - hardware assisted atomic transaction that performs the following: 38cmpxchg - hardware-assisted atomic transaction that performs the following:
39 39
40 A = B iff previous A == C 40 A = B iff previous A == C
41 41
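In C terms the transaction above behaves like this sketch (purely
illustrative -- the real cmpxchg is a single atomic instruction, and the
kernel macro is cmpxchg(ptr, old, new), returning the previous value):

unsigned long cmpxchg_like(unsigned long *A, unsigned long C, unsigned long B)
{
	unsigned long prev = *A;

	if (prev == C)		/* update A only if it still holds C */
		*A = B;
	return prev;		/* caller checks prev == C to see if it won */
}
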
@@ -52,15 +52,15 @@ The Generic Ring Buffer
52The ring buffer can be used in either an overwrite mode or in 52The ring buffer can be used in either an overwrite mode or in
53producer/consumer mode. 53producer/consumer mode.
54 54
55Producer/consumer mode is where the producer were to fill up the 55Producer/consumer mode is where if the producer were to fill up the
56buffer before the consumer could free up anything, the producer 56buffer before the consumer could free up anything, the producer
57will stop writing to the buffer. This will lose most recent events. 57will stop writing to the buffer. This will lose most recent events.
58 58
59Overwrite mode is where the produce were to fill up the buffer 59Overwrite mode is where if the producer were to fill up the buffer
60before the consumer could free up anything, the producer will 60before the consumer could free up anything, the producer will
61overwrite the older data. This will lose the oldest events. 61overwrite the older data. This will lose the oldest events.
62 62
63No two writers can write at the same time (on the same per cpu buffer), 63No two writers can write at the same time (on the same per-cpu buffer),
64but a writer may interrupt another writer, but it must finish writing 64but a writer may interrupt another writer, but it must finish writing
65before the previous writer may continue. This is very important to the 65before the previous writer may continue. This is very important to the
66algorithm. The writers act like a "stack". The way interrupts works 66algorithm. The writers act like a "stack". The way interrupts works
@@ -79,16 +79,16 @@ the interrupt doing a write as well.
79 79
80Readers can happen at any time. But no two readers may run at the 80Readers can happen at any time. But no two readers may run at the
81same time, nor can a reader preempt/interrupt another reader. A reader 81same time, nor can a reader preempt/interrupt another reader. A reader
82can not preempt/interrupt a writer, but it may read/consume from the 82cannot preempt/interrupt a writer, but it may read/consume from the
83buffer at the same time as a writer is writing, but the reader must be 83buffer at the same time as a writer is writing, but the reader must be
84on another processor to do so. A reader may read on its own processor 84on another processor to do so. A reader may read on its own processor
85and can be preempted by a writer. 85and can be preempted by a writer.
86 86
87A writer can preempt a reader, but a reader can not preempt a writer. 87A writer can preempt a reader, but a reader cannot preempt a writer.
88But a reader can read the buffer at the same time (on another processor) 88But a reader can read the buffer at the same time (on another processor)
89as a writer. 89as a writer.
90 90
91The ring buffer is made up of a list of pages held together by a link list. 91The ring buffer is made up of a list of pages held together by a linked list.
92 92
93At initialization a reader page is allocated for the reader that is not 93At initialization a reader page is allocated for the reader that is not
94part of the ring buffer. 94part of the ring buffer.
@@ -102,7 +102,7 @@ the head page.
102 102
103The reader has its own page to use. At start up time, this page is 103The reader has its own page to use. At start up time, this page is
104allocated but is not attached to the list. When the reader wants 104allocated but is not attached to the list. When the reader wants
105to read from the buffer, if its page is empty (like it is on start up) 105to read from the buffer, if its page is empty (like it is on start-up),
106it will swap its page with the head_page. The old reader page will 106it will swap its page with the head_page. The old reader page will
107become part of the ring buffer and the head_page will be removed. 107become part of the ring buffer and the head_page will be removed.
108The page after the inserted page (old reader_page) will become the 108The page after the inserted page (old reader_page) will become the
@@ -206,7 +206,7 @@ The main pointers:
206 206
207 commit page - the page that last finished a write. 207 commit page - the page that last finished a write.
208 208
209The commit page only is updated by the outer most writer in the 209The commit page only is updated by the outermost writer in the
210writer stack. A writer that preempts another writer will not move the 210writer stack. A writer that preempts another writer will not move the
211commit page. 211commit page.
212 212
@@ -281,7 +281,7 @@ with the previous write.
281The commit pointer points to the last write location that was 281The commit pointer points to the last write location that was
282committed without preempting another write. When a write that 282committed without preempting another write. When a write that
283preempted another write is committed, it only becomes a pending commit 283preempted another write is committed, it only becomes a pending commit
284and will not be a full commit till all writes have been committed. 284and will not be a full commit until all writes have been committed.
285 285
286The commit page points to the page that has the last full commit. 286The commit page points to the page that has the last full commit.
287The tail page points to the page with the last write (before 287The tail page points to the page with the last write (before
@@ -292,7 +292,7 @@ be several pages ahead. If the tail page catches up to the commit
292page then no more writes may take place (regardless of the mode 292page then no more writes may take place (regardless of the mode
293of the ring buffer: overwrite and produce/consumer). 293of the ring buffer: overwrite and produce/consumer).
294 294
295The order of pages are: 295The order of pages is:
296 296
297 head page 297 head page
298 commit page 298 commit page
@@ -311,7 +311,7 @@ Possible scenario:
311There is a special case that the head page is after either the commit page 311There is a special case that the head page is after either the commit page
312and possibly the tail page. That is when the commit (and tail) page has been 312and possibly the tail page. That is when the commit (and tail) page has been
313swapped with the reader page. This is because the head page is always 313swapped with the reader page. This is because the head page is always
314part of the ring buffer, but the reader page is not. When ever there 314part of the ring buffer, but the reader page is not. Whenever there
315has been less than a full page that has been committed inside the ring buffer, 315has been less than a full page that has been committed inside the ring buffer,
316and a reader swaps out a page, it will be swapping out the commit page. 316and a reader swaps out a page, it will be swapping out the commit page.
317 317
@@ -338,7 +338,7 @@ and a reader swaps out a page, it will be swapping out the commit page.
338In this case, the head page will not move when the tail and commit 338In this case, the head page will not move when the tail and commit
339move back into the ring buffer. 339move back into the ring buffer.
340 340
341The reader can not swap a page into the ring buffer if the commit page 341The reader cannot swap a page into the ring buffer if the commit page
342is still on that page. If the read meets the last commit (real commit 342is still on that page. If the read meets the last commit (real commit
343not pending or reserved), then there is nothing more to read. 343not pending or reserved), then there is nothing more to read.
344The buffer is considered empty until another full commit finishes. 344The buffer is considered empty until another full commit finishes.
@@ -395,7 +395,7 @@ The main idea behind the lockless algorithm is to combine the moving
395of the head_page pointer with the swapping of pages with the reader. 395of the head_page pointer with the swapping of pages with the reader.
396State flags are placed inside the pointer to the page. To do this, 396State flags are placed inside the pointer to the page. To do this,
397each page must be aligned in memory by 4 bytes. This will allow the 2 397each page must be aligned in memory by 4 bytes. This will allow the 2
398least significant bits of the address to be used as flags. Since 398least significant bits of the address to be used as flags, since
399they will always be zero for the address. To get the address, 399they will always be zero for the address. To get the address,
400simply mask out the flags. 400simply mask out the flags.
401 401
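A tiny sketch of that encoding (the names here are made up for illustration
and are not the ring buffer's real identifiers):

#define RB_FLAGS_MASK	3UL	/* the 2 low bits of a 4-byte aligned pointer */

/* strip the state flags to recover the real page address */
static void *rb_ptr_to_page(unsigned long val)
{
	return (void *)(val & ~RB_FLAGS_MASK);
}

/* extract the state flags (HEADER/UPDATE/NORMAL) from the pointer */
static unsigned long rb_ptr_flags(unsigned long val)
{
	return val & RB_FLAGS_MASK;
}
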
@@ -460,7 +460,7 @@ When the reader tries to swap the page with the ring buffer, it
460will also use cmpxchg. If the flag bit in the pointer to the 460will also use cmpxchg. If the flag bit in the pointer to the
461head page does not have the HEADER flag set, the compare will fail 461head page does not have the HEADER flag set, the compare will fail
462and the reader will need to look for the new head page and try again. 462and the reader will need to look for the new head page and try again.
463Note, the flag UPDATE and HEADER are never set at the same time. 463Note, the flags UPDATE and HEADER are never set at the same time.
464 464
465The reader swaps the reader page as follows: 465The reader swaps the reader page as follows:
466 466
@@ -539,7 +539,7 @@ updated to the reader page.
539 | +-----------------------------+ | 539 | +-----------------------------+ |
540 +------------------------------------+ 540 +------------------------------------+
541 541
542Another important point. The page that the reader page points back to 542Another important point: The page that the reader page points back to
543by its previous pointer (the one that now points to the new head page) 543by its previous pointer (the one that now points to the new head page)
544never points back to the reader page. That is because the reader page is 544never points back to the reader page. That is because the reader page is
545not part of the ring buffer. Traversing the ring buffer via the next pointers 545not part of the ring buffer. Traversing the ring buffer via the next pointers
@@ -572,7 +572,7 @@ not be able to swap the head page from the buffer, nor will it be able to
572move the head page, until the writer is finished with the move. 572move the head page, until the writer is finished with the move.
573 573
574This eliminates any races that the reader can have on the writer. The reader 574This eliminates any races that the reader can have on the writer. The reader
575must spin, and this is why the reader can not preempt the writer. 575must spin, and this is why the reader cannot preempt the writer.
576 576
577 tail page 577 tail page
578 | 578 |
@@ -659,9 +659,9 @@ before pushing the head page. If it is, then it can be assumed that the
659tail page wrapped the buffer, and we must drop new writes. 659tail page wrapped the buffer, and we must drop new writes.
660 660
661This is not a race condition, because the commit page can only be moved 661This is not a race condition, because the commit page can only be moved
662by the outter most writer (the writer that was preempted). 662by the outermost writer (the writer that was preempted).
663This means that the commit will not move while a writer is moving the 663This means that the commit will not move while a writer is moving the
664tail page. The reader can not swap the reader page if it is also being 664tail page. The reader cannot swap the reader page if it is also being
665used as the commit page. The reader can simply check that the commit 665used as the commit page. The reader can simply check that the commit
666is off the reader page. Once the commit page leaves the reader page 666is off the reader page. Once the commit page leaves the reader page
667it will never go back on it unless a reader does another swap with the 667it will never go back on it unless a reader does another swap with the
@@ -733,7 +733,7 @@ The write converts the head page pointer to UPDATE.
733--->| |<---| |<---| |<---| |<--- 733--->| |<---| |<---| |<---| |<---
734 +---+ +---+ +---+ +---+ 734 +---+ +---+ +---+ +---+
735 735
736But if a nested writer preempts here. It will see that the next 736But if a nested writer preempts here, it will see that the next
737page is a head page, but it is also nested. It will detect that 737page is a head page, but it is also nested. It will detect that
738it is nested and will save that information. The detection is the 738it is nested and will save that information. The detection is the
739fact that it sees the UPDATE flag instead of a HEADER or NORMAL 739fact that it sees the UPDATE flag instead of a HEADER or NORMAL
@@ -761,7 +761,7 @@ to NORMAL.
761--->| |<---| |<---| |<---| |<--- 761--->| |<---| |<---| |<---| |<---
762 +---+ +---+ +---+ +---+ 762 +---+ +---+ +---+ +---+
763 763
764After the nested writer finishes, the outer most writer will convert 764After the nested writer finishes, the outermost writer will convert
765the UPDATE pointer to NORMAL. 765the UPDATE pointer to NORMAL.
766 766
767 767
@@ -812,7 +812,7 @@ head page.
812 +---+ +---+ +---+ +---+ 812 +---+ +---+ +---+ +---+
813 813
814The nested writer moves the tail page forward. But does not set the old 814The nested writer moves the tail page forward. But does not set the old
815update page to NORMAL because it is not the outer most writer. 815update page to NORMAL because it is not the outermost writer.
816 816
817 tail page 817 tail page
818 | 818 |
@@ -892,7 +892,7 @@ It will return to the first writer.
892--->| |<---| |<---| |<---| |<--- 892--->| |<---| |<---| |<---| |<---
893 +---+ +---+ +---+ +---+ 893 +---+ +---+ +---+ +---+
894 894
895The first writer can not know atomically test if the tail page moved 895The first writer cannot know atomically if the tail page moved
896while it updates the HEAD page. It will then update the head page to 896while it updates the HEAD page. It will then update the head page to
897what it thinks is the new head page. 897what it thinks is the new head page.
898 898
@@ -923,9 +923,9 @@ if the tail page is either where it use to be or on the next page:
923--->| |<---| |<---| |<---| |<--- 923--->| |<---| |<---| |<---| |<---
924 +---+ +---+ +---+ +---+ 924 +---+ +---+ +---+ +---+
925 925
926If tail page != A and tail page does not equal B, then it must reset the 926If tail page != A and tail page != B, then it must reset the pointer
927pointer back to NORMAL. The fact that it only needs to worry about 927back to NORMAL. The fact that it only needs to worry about nested
928nested writers, it only needs to check this after setting the HEAD page. 928writers means that it only needs to check this after setting the HEAD page.
929 929
930 930
931(first writer) 931(first writer)
@@ -939,7 +939,7 @@ nested writers, it only needs to check this after setting the HEAD page.
939 +---+ +---+ +---+ +---+ 939 +---+ +---+ +---+ +---+
940 940
941Now the writer can update the head page. This is also why the head page must 941Now the writer can update the head page. This is also why the head page must
942remain in UPDATE and only reset by the outer most writer. This prevents 942remain in UPDATE and only reset by the outermost writer. This prevents
943the reader from seeing the incorrect head page. 943the reader from seeing the incorrect head page.
944 944
945 945
diff --git a/Documentation/trace/tracepoint-analysis.txt b/Documentation/trace/tracepoint-analysis.txt
index 5eb4e487e667..87bee3c129ba 100644
--- a/Documentation/trace/tracepoint-analysis.txt
+++ b/Documentation/trace/tracepoint-analysis.txt
@@ -10,8 +10,8 @@ Tracepoints (see Documentation/trace/tracepoints.txt) can be used without
10creating custom kernel modules to register probe functions using the event 10creating custom kernel modules to register probe functions using the event
11tracing infrastructure. 11tracing infrastructure.
12 12
13Simplistically, tracepoints will represent an important event that when can 13Simplistically, tracepoints represent important events that can be
14be taken in conjunction with other tracepoints to build a "Big Picture" of 14taken in conjunction with other tracepoints to build a "Big Picture" of
15what is going on within the system. There are a large number of methods for 15what is going on within the system. There are a large number of methods for
16gathering and interpreting these events. Lacking any current Best Practises, 16gathering and interpreting these events. Lacking any current Best Practises,
17this document describes some of the methods that can be used. 17this document describes some of the methods that can be used.
@@ -33,12 +33,12 @@ calling
33 33
34will give a fair indication of the number of events available. 34will give a fair indication of the number of events available.
35 35
362.2 PCL 362.2 PCL (Performance Counters for Linux)
37------- 37-------
38 38
39Discovery and enumeration of all counters and events, including tracepoints 39Discovery and enumeration of all counters and events, including tracepoints,
40are available with the perf tool. Getting a list of available events is a 40are available with the perf tool. Getting a list of available events is a
41simple case of 41simple case of:
42 42
43 $ perf list 2>&1 | grep Tracepoint 43 $ perf list 2>&1 | grep Tracepoint
44 ext4:ext4_free_inode [Tracepoint event] 44 ext4:ext4_free_inode [Tracepoint event]
@@ -49,19 +49,19 @@ simple case of
49 [ .... remaining output snipped .... ] 49 [ .... remaining output snipped .... ]
50 50
51 51
522. Enabling Events 523. Enabling Events
53================== 53==================
54 54
552.1 System-Wide Event Enabling 553.1 System-Wide Event Enabling
56------------------------------ 56------------------------------
57 57
58See Documentation/trace/events.txt for a proper description on how events 58See Documentation/trace/events.txt for a proper description on how events
59can be enabled system-wide. A short example of enabling all events related 59can be enabled system-wide. A short example of enabling all events related
60to page allocation would look something like 60to page allocation would look something like:
61 61
62 $ for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done 62 $ for i in `find /sys/kernel/debug/tracing/events -name "enable" | grep mm_`; do echo 1 > $i; done
63 63
642.2 System-Wide Event Enabling with SystemTap 643.2 System-Wide Event Enabling with SystemTap
65--------------------------------------------- 65---------------------------------------------
66 66
67In SystemTap, tracepoints are accessible using the kernel.trace() function 67In SystemTap, tracepoints are accessible using the kernel.trace() function
@@ -86,7 +86,7 @@ were allocating the pages.
86 print_count() 86 print_count()
87 } 87 }
88 88
892.3 System-Wide Event Enabling with PCL 893.3 System-Wide Event Enabling with PCL
90--------------------------------------- 90---------------------------------------
91 91
92By specifying the -a switch and analysing sleep, the system-wide events 92By specifying the -a switch and analysing sleep, the system-wide events
@@ -107,16 +107,16 @@ for a duration of time can be examined.
107Similarly, one could execute a shell and exit it as desired to get a report 107Similarly, one could execute a shell and exit it as desired to get a report
108at that point. 108at that point.
109 109
1102.4 Local Event Enabling 1103.4 Local Event Enabling
111------------------------ 111------------------------
112 112
113Documentation/trace/ftrace.txt describes how to enable events on a per-thread 113Documentation/trace/ftrace.txt describes how to enable events on a per-thread
114basis using set_ftrace_pid. 114basis using set_ftrace_pid.
115 115
1162.5 Local Event Enablement with PCL 1163.5 Local Event Enablement with PCL
117----------------------------------- 117-----------------------------------
118 118
119Events can be activate and tracked for the duration of a process on a local 119Events can be activated and tracked for the duration of a process on a local
120basis using PCL such as follows. 120basis using PCL such as follows.
121 121
122 $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ 122 $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \
@@ -131,18 +131,18 @@ basis using PCL such as follows.
131 131
132 0.973913387 seconds time elapsed 132 0.973913387 seconds time elapsed
133 133
1343. Event Filtering 1344. Event Filtering
135================== 135==================
136 136
137Documentation/trace/ftrace.txt covers in-depth how to filter events in 137Documentation/trace/ftrace.txt covers in-depth how to filter events in
138ftrace. Obviously using grep and awk of trace_pipe is an option as well 138ftrace. Obviously using grep and awk of trace_pipe is an option as well
139as any script reading trace_pipe. 139as any script reading trace_pipe.
140 140
1414. Analysing Event Variances with PCL 1415. Analysing Event Variances with PCL
142===================================== 142=====================================
143 143
144Any workload can exhibit variances between runs and it can be important 144Any workload can exhibit variances between runs and it can be important
145to know what the standard deviation in. By and large, this is left to the 145to know what the standard deviation is. By and large, this is left to the
146performance analyst to do it by hand. In the event that the discrete event 146performance analyst to do it by hand. In the event that the discrete event
147occurrences are useful to the performance analyst, then perf can be used. 147occurrences are useful to the performance analyst, then perf can be used.
148 148
@@ -166,7 +166,7 @@ In the event that some higher-level event is required that depends on some
166aggregation of discrete events, then a script would need to be developed. 166aggregation of discrete events, then a script would need to be developed.
167 167
168Using --repeat, it is also possible to view how events are fluctuating over 168Using --repeat, it is also possible to view how events are fluctuating over
169time on a system wide basis using -a and sleep. 169time on a system-wide basis using -a and sleep.
170 170
171 $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ 171 $ perf stat -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \
172 -e kmem:mm_pagevec_free \ 172 -e kmem:mm_pagevec_free \
@@ -180,7 +180,7 @@ time on a system wide basis using -a and sleep.
180 180
181 1.002251757 seconds time elapsed ( +- 0.005% ) 181 1.002251757 seconds time elapsed ( +- 0.005% )
182 182
1835. Higher-Level Analysis with Helper Scripts 1836. Higher-Level Analysis with Helper Scripts
184============================================ 184============================================
185 185
186When events are enabled the events that are triggering can be read from 186When events are enabled the events that are triggering can be read from
@@ -190,11 +190,11 @@ be gathered on-line as appropriate. Examples of post-processing might include
190 190
191 o Reading information from /proc for the PID that triggered the event 191 o Reading information from /proc for the PID that triggered the event
192 o Deriving a higher-level event from a series of lower-level events. 192 o Deriving a higher-level event from a series of lower-level events.
193 o Calculate latencies between two events 193 o Calculating latencies between two events
194 194
195Documentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example 195Documentation/trace/postprocess/trace-pagealloc-postprocess.pl is an example
196script that can read trace_pipe from STDIN or a copy of a trace. When used 196script that can read trace_pipe from STDIN or a copy of a trace. When used
197on-line, it can be interrupted once to generate a report without existing 197on-line, it can be interrupted once to generate a report without exiting
198and twice to exit. 198and twice to exit.
199 199
200Simplistically, the script just reads STDIN and counts up events but it 200Simplistically, the script just reads STDIN and counts up events but it
@@ -212,12 +212,12 @@ also can do more such as
212 processes, the parent process responsible for creating all the helpers 212 processes, the parent process responsible for creating all the helpers
213 can be identified 213 can be identified
214 214
2156. Lower-Level Analysis with PCL 2157. Lower-Level Analysis with PCL
216================================ 216================================
217 217
218There may also be a requirement to identify what functions with a program 218There may also be a requirement to identify what functions within a program
219were generating events within the kernel. To begin this sort of analysis, the 219were generating events within the kernel. To begin this sort of analysis, the
220data must be recorded. At the time of writing, this required root 220data must be recorded. At the time of writing, this required root:
221 221
222 $ perf record -c 1 \ 222 $ perf record -c 1 \
223 -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ 223 -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \
@@ -253,11 +253,11 @@ perf report.
253 # (For more details, try: perf report --sort comm,dso,symbol) 253 # (For more details, try: perf report --sort comm,dso,symbol)
254 # 254 #
255 255
256According to this, the vast majority of events occured triggered on events 256According to this, the vast majority of events triggered on events
257within the VDSO. With simple binaries, this will often be the case so lets 257within the VDSO. With simple binaries, this will often be the case so let's
258take a slightly different example. In the course of writing this, it was 258take a slightly different example. In the course of writing this, it was
259noticed that X was generating an insane amount of page allocations so lets look 259noticed that X was generating an insane amount of page allocations so let's look
260at it 260at it:
261 261
262 $ perf record -c 1 -f \ 262 $ perf record -c 1 -f \
263 -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \ 263 -e kmem:mm_page_alloc -e kmem:mm_page_free_direct \
@@ -280,8 +280,8 @@ This was interrupted after a few seconds and
280 # (For more details, try: perf report --sort comm,dso,symbol) 280 # (For more details, try: perf report --sort comm,dso,symbol)
281 # 281 #
282 282
283So, almost half of the events are occuring in a library. To get an idea which 283So, almost half of the events are occurring in a library. To get an idea which
284symbol. 284symbol:
285 285
286 $ perf report --sort comm,dso,symbol 286 $ perf report --sort comm,dso,symbol
287 # Samples: 27666 287 # Samples: 27666
@@ -297,7 +297,7 @@ symbol.
297 0.01% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 [.] get_fast_path 297 0.01% Xorg /opt/gfx-test/lib/libpixman-1.so.0.13.1 [.] get_fast_path
298 0.00% Xorg [kernel] [k] ftrace_trace_userstack 298 0.00% Xorg [kernel] [k] ftrace_trace_userstack
299 299
300To see where within the function pixmanFillsse2 things are going wrong 300To see where within the function pixmanFillsse2 things are going wrong:
301 301
302 $ perf annotate pixmanFillsse2 302 $ perf annotate pixmanFillsse2
303 [ ... ] 303 [ ... ]