Diffstat (limited to 'Documentation/accounting/taskstats.txt')
-rw-r--r-- | Documentation/accounting/taskstats.txt | 64 |
1 files changed, 52 insertions, 12 deletions
diff --git a/Documentation/accounting/taskstats.txt b/Documentation/accounting/taskstats.txt
index efd8f605bcd5..92ebf29e9041 100644
--- a/Documentation/accounting/taskstats.txt
+++ b/Documentation/accounting/taskstats.txt
@@ -26,20 +26,28 @@ leader - a process is deemed alive as long as it has any task belonging to it. | |||
26 | Usage | 26 | Usage |
27 | ----- | 27 | ----- |
28 | 28 | ||
29 | To get statistics during task's lifetime, userspace opens a unicast netlink | 29 | To get statistics during a task's lifetime, userspace opens a unicast netlink |
30 | socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid. | 30 | socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid. |
31 | The response contains statistics for a task (if pid is specified) or the sum of | 31 | The response contains statistics for a task (if pid is specified) or the sum of |
32 | statistics for all tasks of the process (if tgid is specified). | 32 | statistics for all tasks of the process (if tgid is specified). |
33 | 33 | ||
34 | To obtain statistics for tasks which are exiting, userspace opens a multicast | 34 | To obtain statistics for tasks which are exiting, the userspace listener |
35 | netlink socket. Each time a task exits, its per-pid statistics is always sent | 35 | sends a register command and specifies a cpumask. Whenever a task exits on |
36 | by the kernel to each listener on the multicast socket. In addition, if it is | 36 | one of the cpus in the cpumask, its per-pid statistics are sent to the |
37 | the last thread exiting its thread group, an additional record containing the | 37 | registered listener. Using cpumasks allows the data received by one listener |
38 | per-tgid stats are also sent. The latter contains the sum of per-pid stats for | 38 | to be limited and assists in flow control over the netlink interface and is |
39 | all threads in the thread group, both past and present. | 39 | explained in more detail below. |
40 | |||
41 | If the exiting task is the last thread exiting its thread group, | ||
42 | an additional record containing the per-tgid stats is also sent to userspace. | ||
43 | The latter contains the sum of per-pid stats for all threads in the thread | ||
44 | group, both past and present. | ||
40 | 45 | ||
41 | getdelays.c is a simple utility demonstrating usage of the taskstats interface | 46 | getdelays.c is a simple utility demonstrating usage of the taskstats interface |
42 | for reporting delay accounting statistics. | 47 | for reporting delay accounting statistics. Users can register cpumasks, |
48 | send commands and process responses, listen for per-tid/tgid exit data, | ||
49 | write the data received to a file and do basic flow control by increasing | ||
50 | receive buffer sizes. | ||
43 | 51 | ||
44 | Interface | 52 | Interface |
45 | --------- | 53 | --------- |
@@ -66,10 +74,20 @@ The messages are in the format | |||
66 | 74 | ||
67 | The taskstats payload is one of the following three kinds: | 75 | The taskstats payload is one of the following three kinds: |
68 | 76 | ||
69 | 1. Commands: Sent from user to kernel. The payload is one attribute, of type | 77 | 1. Commands: Sent from user to kernel. Commands to get data on |
70 | TASKSTATS_CMD_ATTR_PID/TGID, containing a u32 pid or tgid in the attribute | 78 | a pid/tgid consist of one attribute, of type TASKSTATS_CMD_ATTR_PID/TGID, |
71 | payload. The pid/tgid denotes the task/process for which userspace wants | 79 | containing a u32 pid or tgid in the attribute payload. The pid/tgid denotes |
72 | statistics. | 80 | the task/process for which userspace wants statistics. |
81 | |||
82 | Commands to register/deregister interest in exit data from a set of cpus | ||
83 | consist of one attribute, of type | ||
84 | TASKSTATS_CMD_ATTR_REGISTER/DEREGISTER_CPUMASK and contain a cpumask in the | ||
85 | attribute payload. The cpumask is specified as an ascii string of | ||
86 | comma-separated cpu ranges e.g. to listen to exit data from cpus 1,2,3,5,7,8 | ||
87 | the cpumask would be "1-3,5,7-8". If userspace forgets to deregister interest | ||
88 | in cpus before closing the listening socket, the kernel cleans up its interest | ||
89 | set over time. However, for the sake of efficiency, an explicit deregistration | ||
90 | is advisable. | ||
73 | 91 | ||
74 | 2. Response for a command: sent from the kernel in response to a userspace | 92 | 2. Response for a command: sent from the kernel in response to a userspace |
75 | command. The payload is a series of three attributes of type: | 93 | command. The payload is a series of three attributes of type: |
@@ -138,4 +156,26 @@ struct too much, requiring disparate userspace accounting utilities to | |||
138 | unnecessarily receive large structures whose fields are of no interest, then | 156 | unnecessarily receive large structures whose fields are of no interest, then |
139 | extending the attributes structure would be worthwhile. | 157 | extending the attributes structure would be worthwhile. |
140 | 158 | ||
159 | Flow control for taskstats | ||
160 | -------------------------- | ||
161 | |||
162 | When the rate of task exits becomes large, a listener may not be able to keep | ||
163 | up with the kernel's rate of sending per-tid/tgid exit data leading to data | ||
164 | loss. This possibility gets compounded when the taskstats structure gets | ||
165 | extended and the number of cpus grows large. | ||
166 | |||
167 | To avoid losing statistics, userspace should do one or more of the following: | ||
168 | |||
169 | - increase the receive buffer sizes for the netlink sockets opened by | ||
170 | listeners to receive exit data. | ||
171 | |||
172 | - create more listeners and reduce the number of cpus being listened to by | ||
173 | each listener. In the extreme case, there could be one listener for each cpu. | ||
174 | Users may also consider setting the cpu affinity of the listener to the subset | ||
175 | of cpus to which it listens, especially if they are listening to just one cpu. | ||
176 | |||
177 | Despite these measures, if the userspace receives ENOBUFS error messages | ||
178 | indicating overflow of receive buffers, it should take measures to handle the | |
179 | loss of data. | ||
180 | |||
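Note on the first flow control measure added by this patch: a minimal sketch of
raising a listener's receive buffer, assuming a Linux socket API. The helper
name set_rcvbuf is hypothetical and is not taken from getdelays.c. On Linux the
kernel may round, double, or cap the requested size (for example at
net.core.rmem_max), so the sketch reads the value back rather than assuming the
request was honoured.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: ask the kernel for a receive buffer of `bytes`
 * on socket `fd`, then report the size the kernel actually applied.
 * Returns the applied size in bytes, or -1 on error (errno is set by
 * setsockopt/getsockopt).
 */
int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t optlen = sizeof(actual);

    /* Request the new size; the kernel is free to adjust it. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;

    /* Read back the effective size instead of trusting the request. */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &optlen) < 0)
        return -1;

    return actual;
}
```

A listener would call this once on its netlink socket before registering a
cpumask, and would still need to handle ENOBUFS from its receive calls, since a
larger buffer only reduces the chance of overflow.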
141 | ---- | 181 | ---- |
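Note on the cpumask format described in the register/deregister commands: the
attribute payload is an ascii string of comma-separated cpu ranges. The sketch
below builds such a string from a sorted, duplicate-free array of cpu numbers.
The function name format_cpumask and its signature are assumptions made for
illustration; they are not part of the taskstats interface or of getdelays.c.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a sorted, duplicate-free array of cpu
 * numbers as comma-separated ranges, e.g. {1,2,3,5,7,8} -> "1-3,5,7-8".
 * Returns the length of the resulting string, or -1 if buf is too small.
 */
int format_cpumask(const unsigned int *cpus, size_t n, char *buf, size_t len)
{
    size_t used = 0;
    size_t i = 0;

    if (len == 0)
        return -1;
    buf[0] = '\0';

    while (i < n) {
        size_t j = i;
        int w;

        /* Extend the run while the next cpu number is consecutive. */
        while (j + 1 < n && cpus[j + 1] == cpus[j] + 1)
            j++;

        if (j > i)
            w = snprintf(buf + used, len - used, "%s%u-%u",
                         used ? "," : "", cpus[i], cpus[j]);
        else
            w = snprintf(buf + used, len - used, "%s%u",
                         used ? "," : "", cpus[i]);

        /* snprintf reports truncation by returning >= the space given. */
        if (w < 0 || (size_t)w >= len - used)
            return -1;

        used += (size_t)w;
        i = j + 1;
    }

    return (int)used;
}
```

For the cpus in the documentation's example (1,2,3,5,7,8) this yields
"1-3,5,7-8". The resulting string would then be placed in the payload of a
TASKSTATS_CMD_ATTR_REGISTER_CPUMASK or TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK
attribute; building the netlink message itself is outside this sketch.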