author Will Drewry <wad@chromium.org> 2011-06-27 12:11:13 -0400
committer Leann Ogasawara <leann.ogasawara@canonical.com> 2011-08-30 17:33:51 -0400
commit ba94ba4d8fc971b1b6a607bbb6885da79319d65a
tree a4f533c2caee3388d5e85472828721893f3fbe17
parent 2a4a5645a73cbcfc1d19673edd0200e24e350194
UBUNTU: SAUCE: seccomp_filter: Document what seccomp_filter is and how it works.

Adds a text file covering what CONFIG_SECCOMP_FILTER is, how it is
implemented presently, and what it may be used for. In addition, the
limitations and caveats of the proposed implementation are included.

v10: fix to reflect mode==13 now.
v9: rebase on to bccaeafd7c117acee36e90d37c7e05c19be9e7bf
v8: -
v7: Add a caveat around fork behavior and execve
v6: -
v5: -
v4: rewording (courtesy kees.cook@canonical.com)
    reflect support for event ids
    add a small section on adding per-arch support
v3: a little more cleanup
v2: moved to prctl/
    updated for the v2 syntax.
    adds a note about compat behavior

Signed-off-by: Will Drewry <wad@chromium.org>

BUG=chromium-os:14496
TEST=I can readz.

Change-Id: I10945ea369757756b08834650e59d148b3e08aa2
Reviewed-on: http://gerrit.chromium.org/gerrit/3243
Reviewed-by: Sonny Rao <sonnyrao@chromium.org>
Tested-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/prctl/seccomp_filter.txt 189
1 files changed, 189 insertions, 0 deletions
diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
new file mode 100644
index 00000000000..5afb4787e6a
--- /dev/null
+++ b/Documentation/prctl/seccomp_filter.txt
@@ -0,0 +1,189 @@
		Seccomp filtering
		=================

Introduction
------------

A large number of system calls are exposed to every userland process,
with many of them going unused for the entire lifetime of the process.
As system calls change and mature, bugs are found and eradicated. A
certain subset of userland applications benefits from having a reduced
set of available system calls. The resulting set reduces the total
kernel surface exposed to the application. System call filtering is
meant for use with those applications.

The implementation currently leverages both the existing seccomp
infrastructure and the kernel tracing infrastructure. By centralizing
hooks for attack surface reduction in seccomp, it is possible to assure
attention to security that is less relevant in normal ftrace scenarios,
such as time-of-check, time-of-use attacks. However, ftrace provides a
rich, human-friendly environment for interfacing with system call
specific arguments. (As such, this requires CONFIG_FTRACE_SYSCALLS for
any introspective filtering support.)


What it isn't
-------------

System call filtering isn't a sandbox. It provides a clearly defined
mechanism for minimizing the exposed kernel surface. Beyond that,
policy for logical behavior and information flow should be managed with
a combination of other system hardening techniques and, potentially, an
LSM of your choosing. Expressive, dynamic filters based on the ftrace
filter engine provide further options down this path (avoiding
pathological sizes or selecting which of the multiplexed system calls in
socketcall() is allowed, for instance) which could be construed,
incorrectly, as a more complete sandboxing solution.


Usage
-----

An additional seccomp mode is exposed through mode '13'.
This mode depends on CONFIG_SECCOMP_FILTER. By default, it provides
only the most trivial filter support: a filter of "1" or a cleared
filter. However, if CONFIG_FTRACE_SYSCALLS is enabled, the ftrace
filter engine may be used for more expressive filters.

A collection of filters may be supplied via prctl, and the current set
of filters is exposed in /proc/<pid>/seccomp_filter.

Interacting with seccomp filters is done through one existing prctl
call and three new ones.

PR_SET_SECCOMP:
	A pre-existing option for enabling strict seccomp mode (1) or
	filtering seccomp (13).

	Usage:
		prctl(PR_SET_SECCOMP, 1);  /* strict */
		prctl(PR_SET_SECCOMP, 13);  /* filters */

PR_SET_SECCOMP_FILTER:
	Allows the specification of a new filter for a given system
	call, by number, and filter string. By default, the filter
	string may only be "1". However, if CONFIG_FTRACE_SYSCALLS is
	supported, the filter string may make use of the ftrace
	filtering language's awareness of system call arguments.

	In addition, the event id for the system call entry may be
	specified in lieu of the system call number itself, as
	determined by the 'type' argument. This allows for the future
	addition of seccomp-based filtering on other registered,
	relevant ftrace events.

	All calls to PR_SET_SECCOMP_FILTER for a given system
	call will append the supplied string to any existing filters,
	joined with a logical AND.
	Filter construction looks as follows:
		(Nothing) + "fd == 1 || fd == 2" => fd == 1 || fd == 2
		... + "fd != 2" => (fd == 1 || fd == 2) && fd != 2
		... + "size < 100" =>
			((fd == 1 || fd == 2) && fd != 2) && size < 100
	If a system call has no filter and the seccomp mode has already
	transitioned to filtering, no filter can be added for it.
	Filters may only be added that reduce the available kernel
	surface.

	Usage (per the construction example above):
		unsigned long type = PR_SECCOMP_FILTER_SYSCALL;
		prctl(PR_SET_SECCOMP_FILTER, type, __NR_write,
		      "fd == 1 || fd == 2");
		prctl(PR_SET_SECCOMP_FILTER, type, __NR_write,
		      "fd != 2");
		prctl(PR_SET_SECCOMP_FILTER, type, __NR_write,
		      "size < 100");

	The 'type' argument may be one of PR_SECCOMP_FILTER_SYSCALL or
	PR_SECCOMP_FILTER_EVENT.
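
The append rule for PR_SET_SECCOMP_FILTER can be modeled in user space
without a patched kernel. What follows is a minimal sketch and not the
kernel implementation: compose_filter() is a hypothetical helper, and
the 512-byte scratch buffer is an arbitrary choice. It builds the
aggregated string the way the construction example does, wrapping the
existing expression in parentheses and joining the new one with "&&".

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space model of the append rule; not kernel code.
 * Combines the aggregated filter in 'agg' (capacity 'cap') with 'add':
 *   (empty) + add  =>  add
 *   old     + add  =>  (old) && add
 * Returns 0 on success, -1 if the result would not fit. On failure
 * 'agg' is left unchanged.
 */
static int compose_filter(char *agg, size_t cap, const char *add)
{
	char tmp[512];	/* arbitrary scratch size for this sketch */
	int n;

	if (agg[0] == '\0')
		n = snprintf(tmp, sizeof(tmp), "%s", add);
	else
		n = snprintf(tmp, sizeof(tmp), "(%s) && %s", agg, add);

	/* Reject output errors and anything truncated or too long. */
	if (n < 0 || (size_t)n >= sizeof(tmp) || (size_t)n >= cap)
		return -1;

	memcpy(agg, tmp, (size_t)n + 1);	/* include the trailing NUL */
	return 0;
}
```

Starting from an empty buffer and appending "fd == 1 || fd == 2",
"fd != 2" and "size < 100" in that order yields the same final string
as the construction example.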

PR_CLEAR_SECCOMP_FILTER:
	Removes all filter entries for a given system call number or
	event id. When called prior to entering seccomp filtering mode,
	it allows for new filters to be applied to the same system call.
	After transition, however, it completely drops access to the
	call.

	Usage:
		prctl(PR_CLEAR_SECCOMP_FILTER,
		      PR_SECCOMP_FILTER_SYSCALL, __NR_open);
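
The difference between clearing before and after the transition to
filtering mode can be illustrated with a small state model. This is a
sketch under stated assumptions, not the kernel's data structures: the
names model_task, model_set_filter(), model_clear_filter() and
model_syscall_allowed() are invented here, the table size is
arbitrary, and clearing an entry that has no filter is assumed to
succeed.

```c
#include <stdbool.h>

#define MODEL_NSYSCALLS 8	/* arbitrary table size for this sketch */

/* Hypothetical per-task state: one "has a filter" flag per syscall. */
struct model_task {
	bool filtering;				/* true once mode 13 is active */
	bool has_filter[MODEL_NSYSCALLS];
};

static bool model_nr_valid(int nr)
{
	return nr >= 0 && nr < MODEL_NSYSCALLS;
}

/* Add a filter. After the transition, only existing filters may grow. */
static int model_set_filter(struct model_task *t, int nr)
{
	if (!model_nr_valid(nr))
		return -1;
	if (t->filtering && !t->has_filter[nr])
		return -1;	/* would widen the exposed surface */
	t->has_filter[nr] = true;
	return 0;
}

/* Remove every filter entry for 'nr'. */
static int model_clear_filter(struct model_task *t, int nr)
{
	if (!model_nr_valid(nr))
		return -1;
	t->has_filter[nr] = false;
	return 0;
}

/* In filtering mode a call is reachable only while it has a filter. */
static bool model_syscall_allowed(const struct model_task *t, int nr)
{
	if (!t->filtering)
		return true;
	return model_nr_valid(nr) && t->has_filter[nr];
}
```

In this model a clear followed by a set succeeds while filtering is
off. Once filtering is on, a clear removes the call and a later set
for the same number fails.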

PR_GET_SECCOMP_FILTER:
	Returns the aggregated filter string for a system call into a
	user-supplied buffer of a given length.

	Usage:
		prctl(PR_GET_SECCOMP_FILTER,
		      PR_SECCOMP_FILTER_SYSCALL, __NR_write, buf,
		      sizeof(buf));

All of the above calls return 0 on success and non-zero on error. If
CONFIG_FTRACE_SYSCALLS is not supported and a rich filter was specified,
the caller may check errno for ENOSYS. The same is true if specifying a
filter by event id fails to discover any relevant event entries.
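
The error convention can be handled in one place. The sketch below is
an illustration only: filter_error_hint() is a hypothetical helper and
its messages are invented for this example. It relies solely on what
the text states, namely 0 on success and ENOSYS when rich filters are
unavailable or an event id matches nothing.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: turn the result of a
 * seccomp filter prctl() call into a short hint.
 * 'rc' is the prctl() return value; 'err' is errno saved right after.
 */
static const char *filter_error_hint(int rc, int err)
{
	if (rc == 0)
		return "ok";
	if (err == ENOSYS)
		return "rich filters unsupported or no matching event";
	/* Any other failure: fall back to the C library's description. */
	return strerror(err);
}
```

A caller would save errno immediately after the prctl() call and pass
both values in, so that later library calls cannot overwrite it.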


Example
-------

Assume a process would like to cleanly read and write to stdin/out/err
as well as access its filters after seccomp enforcement begins. This
may be done as follows:

	int filter_syscall(int nr, const char *buf) {
		return prctl(PR_SET_SECCOMP_FILTER, PR_SECCOMP_FILTER_SYSCALL,
			     nr, buf);
	}

	filter_syscall(__NR_read, "fd == 0");
	filter_syscall(__NR_write, "fd == 1 || fd == 2");
	filter_syscall(__NR_exit, "1");
	filter_syscall(__NR_prctl, "1");
	prctl(PR_SET_SECCOMP, 13);

	/* Do stuff with fdset . . .*/

	/* Drop read access and keep only write access to fd 1. */
	prctl(PR_CLEAR_SECCOMP_FILTER, PR_SECCOMP_FILTER_SYSCALL, __NR_read);
	filter_syscall(__NR_write, "fd != 2");

	/* Perform any final processing . . . */
	syscall(__NR_exit, 0);
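
To see why the second write filter in the example leaves only fd 1
usable, the write policy can be modeled as a list of predicates that
must all hold. This is a hedged user-space sketch, not how the ftrace
filter engine evaluates expressions: model_filter, model_append(),
model_allows() and the two predicate functions are names made up for
illustration, and the term limit is arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TERMS 4	/* arbitrary limit for this sketch */

typedef bool (*fd_term)(long fd);

/* Hypothetical model: every appended term must accept the fd. */
struct model_filter {
	fd_term terms[MODEL_MAX_TERMS];
	size_t nterms;
};

/* Models the filter string "fd == 1 || fd == 2". */
static bool fd_is_stdout_or_stderr(long fd)
{
	return fd == 1 || fd == 2;
}

/* Models the filter string "fd != 2". */
static bool fd_is_not_stderr(long fd)
{
	return fd != 2;
}

/* Appending a term can only narrow what is allowed. */
static int model_append(struct model_filter *f, fd_term term)
{
	if (f->nterms >= MODEL_MAX_TERMS)
		return -1;
	f->terms[f->nterms++] = term;
	return 0;
}

/* No terms models "no filter", which blocks the call in filtering mode. */
static bool model_allows(const struct model_filter *f, long fd)
{
	size_t i;

	if (f->nterms == 0)
		return false;
	for (i = 0; i < f->nterms; i++)
		if (!f->terms[i](fd))
			return false;
	return true;
}
```

With only the first term appended, the model accepts fds 1 and 2.
After the second term is appended, only fd 1 passes, which matches the
comment in the example about keeping write access to fd 1 alone.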


Caveats
-------

- Avoid using a filter of "0" to disable a filter. Always favor calling
  prctl(PR_CLEAR_SECCOMP_FILTER, ...). Otherwise the behavior may vary
  depending on whether CONFIG_FTRACE_SYSCALLS support exists -- though
  an error will be returned if the support is missing.

- execve is always blocked. seccomp filters may not cross that boundary.

- Filters are inherited across fork/clone only when they are active
  (that is, PR_SET_SECCOMP has been set to 13), not before. This stops
  the parent process from adding filters that may undermine the child
  process security or create unexpected behavior after an execve.

- Some platforms support a 32-bit userspace with 64-bit kernels. In
  these cases (CONFIG_COMPAT), system call numbers may not match across
  64-bit and 32-bit system calls. When the first PR_SET_SECCOMP_FILTER
  call is made, the in-memory filter state is annotated with whether the
  call has been made via the compat interface. All subsequent calls will
  be checked for compat call mismatch. In the long run, it may make sense
  to store compat and non-compat filters separately, but that is not
  supported at present. Once one type of system call interface has been
  used, it must continue to be used.
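
The compat caveat amounts to a "first caller wins" annotation. The
following is a minimal sketch of that rule as described, not the
kernel's implementation: model_compat_state and model_check_compat()
are hypothetical names, and the -1 return stands in for whatever error
the kernel actually reports on a mismatch.

```c
#include <stdbool.h>

/* Hypothetical model of the annotation kept with the filter state. */
struct model_compat_state {
	bool annotated;	/* set by the first filter call */
	bool compat;	/* interface used by that first call */
};

/*
 * Record the interface on the first call, then require every later
 * call to use the same one. Returns 0 if the call may proceed and
 * -1 on a compat mismatch (placeholder error value).
 */
static int model_check_compat(struct model_compat_state *s, bool via_compat)
{
	if (!s->annotated) {
		s->annotated = true;
		s->compat = via_compat;
		return 0;
	}
	return s->compat == via_compat ? 0 : -1;
}
```

In this model, a task whose first filter call came through the compat
interface is rejected when it later uses the native one, and the
reverse also holds.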


Adding architecture support
---------------------------

Any platform with seccomp support should be able to support the bare
minimum of seccomp filter features. However, since seccomp_filter
requires that execve be blocked, it expects the architecture to expose a
__NR_seccomp_execve define that maps to the execve system call number.
On platforms where CONFIG_COMPAT applies, __NR_seccomp_execve_32 must
also be provided. Once those macros exist, "select HAVE_SECCOMP_FILTER"
support may be added to the architecture's Kconfig.