-rw-r--r-- | Documentation/perf-counters.txt | 104 |
1 files changed, 104 insertions, 0 deletions
diff --git a/Documentation/perf-counters.txt b/Documentation/perf-counters.txt new file mode 100644 index 000000000000..19033a0bb526 --- /dev/null +++ b/Documentation/perf-counters.txt | |||
@@ -0,0 +1,104 @@ | |||
1 | |||
2 | Performance Counters for Linux | ||
3 | ------------------------------ | ||
4 | |||
5 | Performance counters are special hardware registers available on most modern | ||
6 | CPUs. These registers count the number of certain types of hw events, such | ||
7 | as instructions executed, cache misses suffered, or branches mispredicted - | ||
8 | without slowing down the kernel or applications. These registers can also | ||
9 | trigger interrupts when a threshold number of events have passed - and can | ||
10 | thus be used to profile the code that runs on that CPU. | ||
11 | |||
12 | The Linux Performance Counter subsystem provides an abstraction of these | ||
13 | hardware capabilities. It provides per task and per CPU counters, and | ||
14 | it provides event capabilities on top of those. | ||
15 | |||
16 | Performance counters are accessed via special file descriptors. | ||
17 | There's one file descriptor per virtual counter used. | ||
18 | |||
19 | The special file descriptor is opened via the perf_counter_open() | ||
20 | system call: | ||
21 | |||
22 | int | ||
23 | perf_counter_open(u32 hw_event_type, | ||
24 | u32 hw_event_period, | ||
25 | u32 record_type, | ||
26 | pid_t pid, | ||
27 | int cpu); | ||
28 | |||
29 | The syscall returns the new fd. The fd can be used via the normal | ||
30 | VFS system calls: read() can be used to read the counter, fcntl() | ||
31 | can be used to set the blocking mode, etc. | ||
32 | |||
33 | Multiple counters can be kept open at a time, and the counters | ||
34 | can be poll()ed. | ||
35 | |||
36 | When creating a new counter fd, 'hw_event_type' is one of: | ||
37 | |||
38 | enum hw_event_types { | ||
39 | PERF_COUNT_CYCLES, | ||
40 | PERF_COUNT_INSTRUCTIONS, | ||
41 | PERF_COUNT_CACHE_REFERENCES, | ||
42 | PERF_COUNT_CACHE_MISSES, | ||
43 | PERF_COUNT_BRANCH_INSTRUCTIONS, | ||
44 | PERF_COUNT_BRANCH_MISSES, | ||
45 | }; | ||
46 | |||
47 | These are standardized types of events that work uniformly on all CPUs | ||
48 | that implement Performance Counters support under Linux. If a CPU is | ||
49 | not able to count a requested event (branch-misses, for example), then the | ||
50 | system call will return -EINVAL. | ||
51 | |||
52 | [ Note: more hw_event_types are supported as well, but they are CPU | ||
53 | specific and are enumerated via /sys on a per CPU basis. Raw hw event | ||
54 | types can be passed in as negative numbers. For example, to count | ||
55 | "External bus cycles while bus lock signal asserted" events on Intel | ||
56 | Core CPUs, pass in a -0x4064 event type value. ] | ||
57 | |||
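The raw event convention in the note above can be sketched in C. This is
only an illustration: the helper names are hypothetical and are not part
of the patch, and the sketch assumes the raw code fits in 31 bits so that
its negation is representable as a signed 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not from the patch): encode a CPU-specific raw
 * event code as the negative hw_event_type value described in the note.
 */
static int32_t raw_hw_event_type(uint32_t raw_code)
{
	/* Assumption: raw codes are non-zero and fit in 31 bits. */
	assert(raw_code > 0 && raw_code <= INT32_MAX);
	return -(int32_t)raw_code;
}

/* Negative values denote raw events; the generic enum values are >= 0. */
static int is_raw_hw_event_type(int32_t type)
{
	return type < 0;
}
```

With these helpers, raw_hw_event_type(0x4064) yields the -0x4064 value
used in the Intel Core example.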
58 | The parameter 'hw_event_period' is the number of events before waking up | ||
59 | a read() that is blocked on a counter fd. A value of zero means a non-blocking | ||
60 | counter. | ||
61 | |||
62 | 'record_type' is the type of data that a read() will provide for the | ||
63 | counter, and it can be one of: | ||
64 | |||
65 | enum perf_record_type { | ||
66 | PERF_RECORD_SIMPLE, | ||
67 | PERF_RECORD_IRQ, | ||
68 | }; | ||
69 | |||
70 | A "simple" counter is one that counts hardware events and allows | ||
71 | them to be read out into a u64 count value. (read() returns 8 on | ||
72 | a successful read of a simple counter.) | ||
73 | |||
74 | An "irq" counter is one that will also provide IRQ context information: | ||
75 | the IP of the interrupted context. In this case read() will return | ||
76 | the 8-byte counter value, plus the Instruction Pointer address of the | ||
77 | interrupted context. | ||
78 | |||
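The two record layouts described above could be decoded from a read()
buffer roughly as follows. This is a sketch under stated assumptions:
the struct and function names are hypothetical, and the text does not
give the size of the Instruction Pointer field, so the sketch assumes
it is a 64-bit value stored directly after the 8-byte count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one record; not defined by the patch. */
struct counter_record {
	uint64_t count;		/* 8-byte counter value */
	uint64_t ip;		/* interrupted IP, 0 for "simple" records */
	int has_ip;		/* non-zero if the record carried an IP */
};

/*
 * Decode 'len' bytes returned by read() on a counter fd.
 * Returns 0 on success, -1 if 'len' matches neither layout.
 * ASSUMPTION: an "irq" record is the count followed by a 64-bit IP.
 */
static int decode_counter_record(const void *buf, size_t len,
				 struct counter_record *rec)
{
	const unsigned char *p = buf;

	assert(buf != NULL && rec != NULL);

	if (len != sizeof(uint64_t) && len != 2 * sizeof(uint64_t))
		return -1;

	/* memcpy avoids alignment problems with arbitrary buffers. */
	memcpy(&rec->count, p, sizeof(rec->count));
	rec->has_ip = (len == 2 * sizeof(uint64_t));
	rec->ip = 0;
	if (rec->has_ip)
		memcpy(&rec->ip, p + sizeof(uint64_t), sizeof(rec->ip));
	return 0;
}
```

A caller would pass the buffer and the byte count returned by read();
a return value of 8 from read() then decodes as a "simple" record.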
79 | The 'pid' parameter allows the counter to be specific to a task: | ||
80 | |||
81 | pid == 0: if the pid parameter is zero, the counter is attached to the | ||
82 | current task. | ||
83 | |||
84 | pid > 0: the counter is attached to a specific task (if the current task | ||
85 | has sufficient privilege to do so) | ||
86 | |||
87 | pid < 0: all tasks are counted (per cpu counters) | ||
88 | |||
89 | The 'cpu' parameter allows a counter to be made specific to a full | ||
90 | CPU: | ||
91 | |||
92 | cpu >= 0: the counter is restricted to a specific CPU | ||
93 | cpu == -1: the counter counts on all CPUs | ||
94 | |||
95 | Note: the combination of 'pid == -1' and 'cpu == -1' is not valid. | ||
96 | |||
97 | A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts | ||
98 | events of that task and 'follows' that task to whatever CPU the task | ||
99 | gets scheduled to. Per task counters can be created by any user, for | ||
100 | their own tasks. | ||
101 | |||
102 | A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts | ||
103 | all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege. | ||
104 | |||
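The 'pid' and 'cpu' rules above can be summarized as a small
classification function. This is an illustrative sketch only: the enum
and function names are hypothetical, it does not reflect how the kernel
validates the arguments, and the 'pid >= 0' with 'cpu >= 0' case is
inferred from the individual parameter descriptions rather than stated
explicitly in the text.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical names, used only to illustrate the rules in the text. */
enum counter_scope {
	SCOPE_INVALID,		/* e.g. pid == -1 together with cpu == -1 */
	SCOPE_TASK_ANY_CPU,	/* follows one task across all CPUs */
	SCOPE_TASK_ONE_CPU,	/* one task, counted only on one CPU */
	SCOPE_CPU_WIDE,		/* all tasks on one CPU */
};

static enum counter_scope classify_counter_scope(pid_t pid, int cpu)
{
	/* Assumption: the text only defines cpu >= 0 and cpu == -1. */
	if (cpu < -1)
		return SCOPE_INVALID;
	if (pid < 0)
		return cpu == -1 ? SCOPE_INVALID : SCOPE_CPU_WIDE;
	/* pid == 0 (current task) and pid > 0 are both per task. */
	return cpu == -1 ? SCOPE_TASK_ANY_CPU : SCOPE_TASK_ONE_CPU;
}

/* Per CPU counters need CAP_SYS_ADMIN according to the text. */
static int scope_needs_admin(enum counter_scope scope)
{
	assert(scope != SCOPE_INVALID);
	return scope == SCOPE_CPU_WIDE;
}
```

For example, classify_counter_scope(-1, 2) describes a per CPU counter
on CPU 2, while classify_counter_scope(0, -1) describes a counter that
follows the current task.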