Diffstat (limited to 'Documentation/accounting')
-rw-r--r-- Documentation/accounting/taskstats.txt | 146
1 files changed, 146 insertions, 0 deletions
diff --git a/Documentation/accounting/taskstats.txt b/Documentation/accounting/taskstats.txt
new file mode 100644
index 000000000000..ad9b6997e162
--- /dev/null
+++ b/Documentation/accounting/taskstats.txt
@@ -0,0 +1,146 @@
Per-task statistics interface
-----------------------------


Taskstats is a netlink-based interface for sending per-task and
per-process statistics from the kernel to userspace.

Taskstats was designed for the following benefits:

- efficiently provide statistics during the lifetime of a task and on its exit
- unified interface for multiple accounting subsystems
- extensibility for use by future accounting patches

Terminology
-----------

"pid", "tid" and "task" are used interchangeably and refer to the standard
Linux task defined by struct task_struct. Per-pid stats are the same as
per-task stats.

"tgid", "process" and "thread group" are used interchangeably and refer to the
tasks that share an mm_struct, i.e. the traditional Unix process. Despite the
use of tgid, there is no special treatment for the task that is the thread
group leader: a process is deemed alive as long as it has any task belonging
to it.

Usage
-----

To get statistics during a task's lifetime, userspace opens a unicast netlink
socket (NETLINK_GENERIC family) and sends commands specifying a pid or a tgid.
The response contains statistics for a task (if pid is specified) or the sum of
statistics for all tasks of the process (if tgid is specified).

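As a rough illustration of the request side, the sketch below fills a buffer
with one command asking for the stats of a single pid or tgid. It assumes the
uapi headers <linux/netlink.h>, <linux/genetlink.h> and <linux/taskstats.h>
are available. build_taskstats_cmd() and its parameters are hypothetical names
chosen for this example, and family_id is assumed to have been resolved
through the generic netlink controller beforehand (not shown).

```c
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <linux/taskstats.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Fill buf with a TASKSTATS_CMD_GET request carrying a single u32 attribute.
 * attr_type is TASKSTATS_CMD_ATTR_PID or TASKSTATS_CMD_ATTR_TGID and id is
 * the pid or tgid of interest. buf must be suitably aligned for nlmsghdr.
 * Returns the total message length, or 0 if buf is too small.
 */
size_t build_taskstats_cmd(void *buf, size_t buflen, uint16_t family_id,
                           uint16_t attr_type, uint32_t id, uint32_t seq)
{
    /* nlmsghdr + genlmsghdr, followed by one attribute holding a u32 */
    size_t len = NLMSG_LENGTH(GENL_HDRLEN) + NLA_HDRLEN + NLA_ALIGN(sizeof(id));
    struct nlmsghdr *nlh = buf;
    struct genlmsghdr genl;
    struct nlattr attr;
    char *p = buf;

    if (buflen < len)
        return 0;
    memset(buf, 0, len);

    nlh->nlmsg_len = (uint32_t)len;
    nlh->nlmsg_type = family_id;        /* generic netlink family id */
    nlh->nlmsg_flags = NLM_F_REQUEST;
    nlh->nlmsg_seq = seq;

    memset(&genl, 0, sizeof(genl));
    genl.cmd = TASKSTATS_CMD_GET;
    genl.version = TASKSTATS_GENL_VERSION;
    memcpy(p + NLMSG_HDRLEN, &genl, sizeof(genl));

    attr.nla_type = attr_type;
    attr.nla_len = (uint16_t)(NLA_HDRLEN + sizeof(id));
    memcpy(p + NLMSG_HDRLEN + GENL_HDRLEN, &attr, sizeof(attr));
    memcpy(p + NLMSG_HDRLEN + GENL_HDRLEN + NLA_HDRLEN, &id, sizeof(id));

    return len;
}
```

The resulting buffer would then be handed to sendto() on the unicast
NETLINK_GENERIC socket. Passing TASKSTATS_CMD_ATTR_TGID instead of
TASKSTATS_CMD_ATTR_PID asks for the per-process sum.
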
To obtain statistics for tasks which are exiting, userspace opens a multicast
netlink socket. Each time a task exits, two records are sent by the kernel to
each listener on the multicast socket. The first is the exiting task's per-pid
statistics and the second is the sum for all tasks of the process to which the
task belongs (the task does not need to be the thread group leader). The need
for per-tgid stats to be sent for each exiting task is explained in the
per-tgid stats section below.


Interface
---------

The user-kernel interface is encapsulated in include/linux/taskstats.h.

To avoid this documentation becoming obsolete as the interface evolves, only
an outline of the current version is given. taskstats.h always overrides the
description here.

struct taskstats is the common accounting structure for both per-pid and
per-tgid data. It is versioned and can be extended by each accounting subsystem
that is added to the kernel. The fields and their semantics are defined in the
taskstats.h file.

The data exchanged between user and kernel space is a netlink message belonging
to the NETLINK_GENERIC family and using the netlink attributes interface.
The messages are in the format:

 +----------+- - -+-------------+-------------------+
 | nlmsghdr | Pad | genlmsghdr  | taskstats payload |
 +----------+- - -+-------------+-------------------+


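To make the framing concrete, here is a minimal sketch of how a receiver might
locate the taskstats payload inside one message. It relies only on the
NLMSG_HDRLEN and GENL_HDRLEN macros from the uapi netlink headers, which
already include the alignment padding shown as "Pad" in the diagram.
taskstats_payload() is a hypothetical helper name, not part of any kernel or
library API.

```c
#include <linux/genetlink.h>
#include <linux/netlink.h>
#include <stddef.h>

/*
 * Return a pointer to the taskstats payload (the first netlink attribute)
 * of a message and store the payload size in *len.
 * Returns NULL if the message is too short to hold both headers.
 */
const void *taskstats_payload(const struct nlmsghdr *nlh, size_t *len)
{
    /* aligned nlmsghdr followed by the aligned genlmsghdr */
    size_t hdrs = (size_t)NLMSG_HDRLEN + GENL_HDRLEN;

    if (nlh->nlmsg_len < hdrs)
        return NULL;

    *len = nlh->nlmsg_len - hdrs;
    return (const char *)nlh + hdrs;
}
```

A caller would typically check nlmsg_type against the taskstats family id
before interpreting the payload as taskstats attributes.
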
The taskstats payload is one of the following three kinds:

1. Commands: Sent from user to kernel. The payload is one attribute, of type
TASKSTATS_CMD_ATTR_PID/TGID, containing a u32 pid or tgid in the attribute
payload. The pid/tgid denotes the task/process for which userspace wants
statistics.

2. Response for a command: Sent from the kernel in response to a userspace
command. The payload is a series of three attributes of type:

a) TASKSTATS_TYPE_AGGR_PID/TGID: attribute containing no payload; it indicates
that a pid/tgid followed by some stats comes next.

b) TASKSTATS_TYPE_PID/TGID: attribute whose payload is the pid/tgid whose stats
are being returned.

c) TASKSTATS_TYPE_STATS: attribute with a struct taskstats as payload. The
same structure is used for both per-pid and per-tgid stats.

3. New message: Sent by the kernel whenever a task exits. The payload consists
of a series of attributes of the following type:

a) TASKSTATS_TYPE_AGGR_PID: indicates next two attributes will be pid+stats
b) TASKSTATS_TYPE_PID: contains exiting task's pid
c) TASKSTATS_TYPE_STATS: contains the exiting task's per-pid stats
d) TASKSTATS_TYPE_AGGR_TGID: indicates next two attributes will be tgid+stats
e) TASKSTATS_TYPE_TGID: contains tgid of process to which task belongs
f) TASKSTATS_TYPE_STATS: contains the per-tgid stats for exiting task's process


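The attribute sequences listed above could be walked roughly as follows. This
is only a sketch: struct ts_record and collect_records() are hypothetical
names, and the attribute type constants are taken from <linux/taskstats.h>.
The text describes the AGGR attribute as carrying no payload; since an
implementation may instead nest the id and stats attributes inside it, the
walker steps over just the AGGR header so that either layout is read the same
way.

```c
#include <linux/netlink.h>
#include <linux/taskstats.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ts_record {          /* hypothetical holder for one id+stats pair */
    int is_tgid;            /* 0: per-pid record, 1: per-tgid record */
    uint32_t id;            /* pid or tgid */
    const void *stats;      /* points into the message, NULL if absent */
    size_t stats_len;
};

/*
 * Walk len bytes of attributes and fill at most max records.
 * Returns the number of records found, or -1 on malformed input.
 */
int collect_records(const void *attrs, size_t len,
                    struct ts_record *rec, int max)
{
    const char *p = attrs;
    const size_t hdr = NLA_HDRLEN;
    int n = 0;

    while (len >= hdr) {
        struct nlattr na;
        size_t payload, step;

        memcpy(&na, p, sizeof(na));
        if (na.nla_len < hdr || na.nla_len > len)
            return -1;                  /* malformed or truncated */
        payload = na.nla_len - hdr;
        step = NLA_ALIGN(na.nla_len);

        switch (na.nla_type) {
        case TASKSTATS_TYPE_AGGR_PID:
        case TASKSTATS_TYPE_AGGR_TGID:
            if (n == max)
                return n;
            memset(&rec[n], 0, sizeof(rec[n]));
            rec[n].is_tgid = (na.nla_type == TASKSTATS_TYPE_AGGR_TGID);
            n++;
            step = hdr;                 /* id and stats follow the header */
            break;
        case TASKSTATS_TYPE_PID:
        case TASKSTATS_TYPE_TGID:
            if (n == 0 || payload < sizeof(rec->id))
                return -1;
            memcpy(&rec[n - 1].id, p + hdr, sizeof(rec->id));
            break;
        case TASKSTATS_TYPE_STATS:
            if (n == 0)
                return -1;
            rec[n - 1].stats = p + hdr;
            rec[n - 1].stats_len = payload;
            break;
        default:
            break;                      /* unknown attribute: skip it */
        }

        if (step > len)                 /* last attribute may lack padding */
            step = len;
        p += step;
        len -= step;
    }
    return n;
}
```

For a response to a command the function would be expected to yield one
record; for an exit message it would yield two, the per-pid record followed by
the per-tgid record.
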
per-tgid stats
--------------

Taskstats provides per-process stats, in addition to per-task stats, since
resource management is often done at a process granularity and aggregating task
stats in userspace alone is inefficient and potentially inaccurate (due to lack
of atomicity).

However, maintaining per-process stats in addition to per-task stats within the
kernel has space and time overheads. Hence the taskstats implementation
dynamically sums up the per-task stats for each task belonging to a process
whenever per-process stats are needed.

Not maintaining per-tgid stats creates a problem when userspace is interested
in getting these stats when the process dies, i.e. when the last thread of
a process exits. It isn't possible to simply return some aggregated per-process
statistic from the kernel, because the per-task stats of threads that exited
earlier are no longer there to be summed.

The approach taken by taskstats is to return the per-tgid stats *each* time
a task exits, in addition to the per-pid stats for that task. Userspace can
maintain task<->process mappings and use them to maintain the per-process stats
in userspace, updating the aggregate appropriately as the tasks of a process
exit.

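One way userspace might keep such a running aggregate is sketched below. It is
deliberately simplified: a single hypothetical counter (cpu time in
nanoseconds) stands in for the fields of struct taskstats, and the fixed-size
table, struct proc_total and account_exit() are assumptions made for this
example only.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PROCS 64                /* arbitrary capacity for the sketch */

struct proc_total {
    int in_use;
    uint32_t tgid;
    uint64_t exited_cpu_ns;         /* sum over tasks that have exited */
    unsigned int exited_tasks;      /* number of exits seen so far */
};

static struct proc_total proc_table[MAX_PROCS];

/*
 * Fold the per-pid value reported for one exiting task into the running
 * total of the process (tgid) it belongs to.
 * Returns the updated entry, or NULL if the table is full.
 */
struct proc_total *account_exit(uint32_t tgid, uint64_t task_cpu_ns)
{
    struct proc_total *slot = NULL;
    size_t i;

    for (i = 0; i < MAX_PROCS; i++) {
        if (proc_table[i].in_use && proc_table[i].tgid == tgid) {
            slot = &proc_table[i];
            break;
        }
        if (!proc_table[i].in_use && slot == NULL)
            slot = &proc_table[i];  /* remember the first free entry */
    }
    if (slot == NULL)
        return NULL;

    if (!slot->in_use) {
        slot->in_use = 1;
        slot->tgid = tgid;
        slot->exited_cpu_ns = 0;
        slot->exited_tasks = 0;
    }
    slot->exited_cpu_ns += task_cpu_ns;
    slot->exited_tasks++;
    return slot;
}
```

account_exit() would be called once per exit message, with the tgid taken from
the per-tgid record and the value taken from the per-pid record of the same
message.
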
Extending taskstats
-------------------

There are two ways to extend the taskstats interface to export more
per-task/process stats as patches to collect them get added to the kernel
in the future:

1. Adding more fields to the end of the existing struct taskstats. Backward
   compatibility is ensured by the version number within the
   structure. Userspace will use only the fields of the struct that correspond
   to the version it is using.

2. Defining separate statistic structs and using the netlink attributes
   interface to return them. Since userspace processes each netlink attribute
   independently, it can always ignore attributes whose type it does not
   understand (because it is using an older version of the interface).


Choosing between 1. and 2. is a matter of trading off flexibility and
overhead. If only a few fields need to be added, then 1. is the preferable
path since the kernel and userspace don't need to incur the overhead of
processing new netlink attributes. But if the new fields expand the existing
struct too much, requiring disparate userspace accounting utilities to
unnecessarily receive large structures whose fields are of no interest, then
extending the attributes structure would be worthwhile.

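For approach 1, a userspace tool could guard against version skew along the
following lines. This is a sketch under the assumption that struct taskstats
begins with its version field and that TASKSTATS_VERSION in
<linux/taskstats.h> names the version the tool was built against.
load_taskstats() is a hypothetical helper name.

```c
#include <linux/taskstats.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a received TASKSTATS_TYPE_STATS payload into a zeroed local struct.
 * Only the bytes known to both sides are copied, so fields appended by a
 * newer kernel are dropped and fields missing from an older kernel stay 0.
 * Returns the highest version whose fields may be relied on, or -1 if the
 * payload is too short to contain the version field.
 */
int load_taskstats(struct taskstats *dst, const void *payload, size_t len)
{
    size_t n = len < sizeof(*dst) ? len : sizeof(*dst);

    memset(dst, 0, sizeof(*dst));
    if (len < sizeof(dst->version))
        return -1;
    memcpy(dst, payload, n);

    /* use only the fields of the older of the two versions */
    return dst->version < TASKSTATS_VERSION ? dst->version
                                            : TASKSTATS_VERSION;
}
```

The caller would then read only those fields that taskstats.h documents as
present in the returned version or earlier.
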
----