author    Will Drewry <wad@chromium.org>  2012-04-12 17:48:04 -0400
committer James Morris <james.l.morris@oracle.com>  2012-04-13 21:13:22 -0400
commit    8ac270d1e29f0428228ab2b9a8ae5e1ed4a5cd84
tree      6deba4ed83da9ace758004b29d15aa0d2ec875a7
parent    c6cfbeb4029610c8c330c312dcf4d514cc067554
Documentation: prctl/seccomp_filter

Documents how system call filtering using Berkeley Packet Filter
programs works and how it may be used.
Includes an example for x86 and a semi-generic
example using a macro-based code generator.

Acked-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>

v18: - added acked by
     - update no new privs numbers
v17: - remove @compat note and add Pitfalls section for arch checking
       (keescook@chromium.org)
v16: -
v15: -
v14: - rebase/nochanges
v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
v12: - comment on the ptrace_event use
     - update arch support comment
     - note the behavior of SECCOMP_RET_DATA when there are multiple filters
       (keescook@chromium.org)
     - lots of samples/ clean up incl 64-bit bpf-direct support
       (markus@chromium.org)
     - rebase to linux-next
v11: - overhaul return value language, updates (keescook@chromium.org)
     - comment on do_exit(SIGSYS)
v10: - update for SIGSYS
     - update for new seccomp_data layout
     - update for ptrace option use
v9:  - updated bpf-direct.c for SIGILL
v8:  - add PR_SET_NO_NEW_PRIVS to the samples.
v7:  - updated for all the new stuff in v7: TRAP, TRACE
     - only talk about PR_SET_SECCOMP now
     - fixed bad JLE32 check (coreyb@linux.vnet.ibm.com)
     - adds dropper.c: a simple system call disabler
v6:  - tweak the language to note the requirement of
       PR_SET_NO_NEW_PRIVS being called prior to use. (luto@mit.edu)
v5:  - update sample to use system call arguments
     - adds a "fancy" example using a macro-based generator
     - cleaned up bpf in the sample
     - update docs to mention arguments
     - fix prctl value (eparis@redhat.com)
     - language cleanup (rdunlap@xenotime.net)
v4:  - update for no_new_privs use
     - minor tweaks
v3:  - call out BPF <-> Berkeley Packet Filter (rdunlap@xenotime.net)
     - document use of tentative always-unprivileged
     - guard sample compilation for i386 and x86_64
v2:  - move code to samples (corbet@lwn.net)

Signed-off-by: James Morris <james.l.morris@oracle.com>

-rw-r--r--  Documentation/prctl/seccomp_filter.txt  163
-rw-r--r--  samples/Makefile                          2
-rw-r--r--  samples/seccomp/Makefile                 38
-rw-r--r--  samples/seccomp/bpf-direct.c            176
-rw-r--r--  samples/seccomp/bpf-fancy.c             102
-rw-r--r--  samples/seccomp/bpf-helper.c             89
-rw-r--r--  samples/seccomp/bpf-helper.h            238
-rw-r--r--  samples/seccomp/dropper.c                68
8 files changed, 875 insertions, 1 deletions

diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
new file mode 100644
index 000000000000..597c3c581375
--- /dev/null
+++ b/Documentation/prctl/seccomp_filter.txt
@@ -0,0 +1,163 @@
1 SECure COMPuting with filters
2 =============================
3
4Introduction
5------------
6
7A large number of system calls are exposed to every userland process
8with many of them going unused for the entire lifetime of the process.
9As system calls change and mature, bugs are found and eradicated. A
10certain subset of userland applications benefit by having a reduced set
11of available system calls. The resulting set reduces the total kernel
12surface exposed to the application. System call filtering is meant for
13use with those applications.
14
15Seccomp filtering provides a means for a process to specify a filter for
16incoming system calls. The filter is expressed as a Berkeley Packet
17Filter (BPF) program, as with socket filters, except that the data
18operated on is related to the system call being made: system call
19number and the system call arguments. This allows for expressive
20filtering of system calls using a filter program language with a long
21history of being exposed to userland and a straightforward data set.
22
23Additionally, BPF makes it impossible for users of seccomp to fall prey
24to time-of-check-time-of-use (TOCTOU) attacks that are common in system
25call interposition frameworks. BPF programs may not dereference
26pointers, which constrains all filters to evaluating the system call
27arguments directly.
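
As a rough illustration of the data a filter operates on, the sketch
below prints the offset of each field of struct seccomp_data as
exported by <linux/seccomp.h>. The helper name
show_seccomp_data_layout() is made up for this example. Each args[]
entry is a raw 64-bit register value, so a filter can only compare a
pointer argument as a number; it cannot follow it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <linux/seccomp.h>

/* Hypothetical helper: print the BPF_ABS load offset of each field. */
static void show_seccomp_data_layout(void)
{
	/* Filters load 32-bit words, so a 64-bit argument needs two loads. */
	assert(sizeof(((struct seccomp_data *)0)->args[0]) == 8);

	printf("nr                  @ %zu\n",
	       offsetof(struct seccomp_data, nr));
	printf("arch                @ %zu\n",
	       offsetof(struct seccomp_data, arch));
	printf("instruction_pointer @ %zu\n",
	       offsetof(struct seccomp_data, instruction_pointer));
	printf("args[0]             @ %zu\n",
	       offsetof(struct seccomp_data, args[0]));
	printf("args[5]             @ %zu\n",
	       offsetof(struct seccomp_data, args[5]));
}
```

These offsets are the values a filter passes to BPF_LD+BPF_W+BPF_ABS,
as the syscall_nr and syscall_arg() macros in
samples/seccomp/bpf-direct.c do.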
28
29What it isn't
30-------------
31
32System call filtering isn't a sandbox. It provides a clearly defined
33mechanism for minimizing the exposed kernel surface. It is meant to be
34a tool for sandbox developers to use. Beyond that, policy for logical
35behavior and information flow should be managed with a combination of
36other system hardening techniques and, potentially, an LSM of your
37choosing. Expressive, dynamic filters provide further options down this
38path (avoiding pathological sizes or selecting which of the multiplexed
39system calls in socketcall() is allowed, for instance) which could be
40construed, incorrectly, as a more complete sandboxing solution.
41
42Usage
43-----
44
45An additional seccomp mode is added and is enabled using the same
46prctl(2) call as the strict seccomp. If the architecture has
47CONFIG_HAVE_ARCH_SECCOMP_FILTER, then filters may be added as below:
48
49PR_SET_SECCOMP:
50 Now takes an additional argument which specifies a new filter
51 using a BPF program.
52 The BPF program will be executed over struct seccomp_data
53 reflecting the system call number, arguments, and other
54 metadata. The BPF program must then return one of the
55 acceptable values to inform the kernel which action should be
56 taken.
57
58 Usage:
59 prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog);
60
61 The 'prog' argument is a pointer to a struct sock_fprog which
62 will contain the filter program. If the program is invalid, the
63 call will return -1 and set errno to EINVAL.
64
65 If fork/clone and execve are allowed by 'prog', any child
66 processes will be constrained to the same filters and system
67 call ABI as the parent.
68
69 Prior to use, the task must call prctl(PR_SET_NO_NEW_PRIVS, 1) or
70 run with CAP_SYS_ADMIN privileges in its namespace. If neither
71 holds, the call will return -1 and set errno to EACCES. This
72 requirement ensures that filter programs cannot be applied to child
73 processes with greater privileges than the task that installed them.
74
75 Additionally, if prctl(2) is allowed by the attached filter,
76 additional filters may be layered on which will increase evaluation
77 time, but allow for further decreasing the attack surface during
78 execution of a process.
79
80The above call returns 0 on success and non-zero on error.
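
A minimal sketch of the call sequence described above, assuming a
kernel with seccomp filter support. install_filter() and the
one-instruction allow_all program are illustrative names, not part of
any API. A real filter would check the architecture and the system
call number before allowing anything.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif

/* Illustrative one-instruction program: allow every system call. */
static struct sock_filter allow_all[] = {
	BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
};

static const struct sock_fprog allow_all_prog = {
	.len = sizeof(allow_all) / sizeof(allow_all[0]),
	.filter = allow_all,
};

/*
 * Hypothetical wrapper: sets no_new_privs, then attaches the filter.
 * Returns 0 on success, or -1 with errno set by the failing prctl().
 */
static int install_filter(const struct sock_fprog *prog)
{
	assert(prog != NULL && prog->len > 0);

	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -1;
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog))
		return -1;
	return 0;
}
```

A caller would invoke install_filter(&allow_all_prog) and report the
failure with perror(), as the programs in samples/seccomp/ do.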
81
82Return values
83-------------
84A seccomp filter may return any of the following values. If multiple
85filters exist, the return value for the evaluation of a given system
86call will always use the highest-precedence value. (For example,
87SECCOMP_RET_KILL will always take precedence.)
88
89In precedence order, they are:
90
91SECCOMP_RET_KILL:
92 Results in the task exiting immediately without executing the
93 system call. The exit status of the task (status & 0x7f) will
94 be SIGSYS, not SIGKILL.
95
96SECCOMP_RET_TRAP:
97 Results in the kernel sending a SIGSYS signal to the triggering
98 task without executing the system call. The kernel will
99 roll back the register state to just before the system call
100 entry such that a signal handler in the task will be able to
101 inspect the ucontext_t->uc_mcontext registers and emulate
102 system call success or failure upon return from the signal
103 handler.
104
105 The SECCOMP_RET_DATA portion of the return value will be passed
106 as si_errno.
107
108 SIGSYS triggered by seccomp will have a si_code of SYS_SECCOMP.
109
110SECCOMP_RET_ERRNO:
111 Results in the lower 16 bits of the return value being passed
112 to userland as the errno without executing the system call.
113
114SECCOMP_RET_TRACE:
115 When returned, this value will cause the kernel to attempt to
116 notify a ptrace()-based tracer prior to executing the system
117 call. If there is no tracer present, -ENOSYS is returned to
118 userland and the system call is not executed.
119
120 A tracer will be notified if it requests PTRACE_O_TRACESECCOMP
121 using ptrace(PTRACE_SETOPTIONS). The tracer will be notified
122 of a PTRACE_EVENT_SECCOMP and the SECCOMP_RET_DATA portion of
123 the BPF program return value will be available to the tracer
124 via PTRACE_GETEVENTMSG.
125
126SECCOMP_RET_ALLOW:
127 Results in the system call being executed.
128
129If multiple filters exist, the return value for the evaluation of a
130given system call will always use the highest-precedence value.
131
132Precedence is only determined using the SECCOMP_RET_ACTION mask. When
133multiple filters return values of the same precedence, only the
134SECCOMP_RET_DATA from the most recently installed filter will be
135returned.
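
The precedence rule above can be modelled in user space as follows.
This is a sketch, not the kernel's implementation: combine_results()
is a made-up name, and it assumes that a numerically lower action has
higher precedence, which holds for the five action constants in
<linux/seccomp.h> listed in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>

/*
 * Hypothetical model of combining per-filter return values.
 * results[] holds one return value per installed filter, ordered
 * from the most recently installed filter to the oldest.
 */
static uint32_t combine_results(const uint32_t *results, size_t n)
{
	uint32_t ret;
	size_t i;

	assert(results != NULL && n > 0);

	ret = results[0];
	for (i = 1; i < n; i++) {
		/*
		 * Only the action bits take part in the comparison. The
		 * strict '<' keeps the newest filter's SECCOMP_RET_DATA
		 * when two filters return the same action.
		 */
		if ((results[i] & SECCOMP_RET_ACTION) <
		    (ret & SECCOMP_RET_ACTION))
			ret = results[i];
	}
	return ret;
}
```

For instance, if the newest filter returns SECCOMP_RET_ALLOW and two
older filters return SECCOMP_RET_ERRNO with different data, the result
carries the data of the more recently installed of those two.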
136
137Pitfalls
138--------
139
140The biggest pitfall to avoid during use is filtering on system call
141number without checking the architecture value. Why? On any
142architecture that supports multiple system call invocation conventions,
143the system call numbers may vary based on the specific invocation. If
144the numbers in the different calling conventions overlap, then checks in
145the filters may be abused. Always check the arch value!
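
One way to follow this advice is to make the architecture check the
first thing a filter does. The sketch below builds such a filter; the
helper build_arch_checked_filter() and its parameters are hypothetical,
and the policy (allow one system call, kill on everything else) is
chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Hypothetical helper: writes a filter into out[] that kills the task
 * unless seccomp_data.arch equals 'arch', then allows system call 'nr'
 * and kills on any other number. Returns the instruction count, or 0
 * if out[] (capacity 'cap') is too small.
 */
static size_t build_arch_checked_filter(struct sock_filter *out, size_t cap,
					__u32 arch, __u32 nr)
{
	const struct sock_filter prog[] = {
		/* Check the architecture before looking at the number. */
		BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
			 offsetof(struct seccomp_data, arch)),
		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, arch, 1, 0),
		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_KILL),
		/* Only now is the system call number meaningful. */
		BPF_STMT(BPF_LD + BPF_W + BPF_ABS,
			 offsetof(struct seccomp_data, nr)),
		BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, nr, 0, 1),
		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_ALLOW),
		BPF_STMT(BPF_RET + BPF_K, SECCOMP_RET_KILL),
	};
	const size_t n = sizeof(prog) / sizeof(prog[0]);

	assert(out != NULL);
	if (cap < n)
		return 0;
	memcpy(out, prog, sizeof(prog));
	return n;
}
```

The arch argument would be one of the AUDIT_ARCH_* constants from
<linux/audit.h>, for example AUDIT_ARCH_X86_64, as used by
samples/seccomp/dropper.c.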
146
147Example
148-------
149
150The samples/seccomp/ directory contains both an x86-specific example
151and a more generic example of a higher level macro interface for BPF
152program generation.
153
154
155
156Adding architecture support
157---------------------------
158
159See arch/Kconfig for the authoritative requirements. In general, if an
160architecture supports both ptrace_event and seccomp, it will be able to
161support seccomp filter with minor fixups: SIGSYS support and seccomp return
162value checking. It then only needs to add CONFIG_HAVE_ARCH_SECCOMP_FILTER
163to its arch-specific Kconfig.
diff --git a/samples/Makefile b/samples/Makefile
index 2f75851ec629..5ef08bba96ce 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -1,4 +1,4 @@
 # Makefile for Linux samples code
 
 obj-$(CONFIG_SAMPLES) += kobject/ kprobes/ tracepoints/ trace_events/ \
-                         hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/
+                         hw_breakpoint/ kfifo/ kdb/ hidraw/ rpmsg/ seccomp/
diff --git a/samples/seccomp/Makefile b/samples/seccomp/Makefile
new file mode 100644
index 000000000000..e8fe0f57b68f
--- /dev/null
+++ b/samples/seccomp/Makefile
@@ -0,0 +1,38 @@
1# kbuild trick to avoid linker error. Can be omitted if a module is built.
2obj- := dummy.o
3
4hostprogs-$(CONFIG_SECCOMP) := bpf-fancy dropper
5bpf-fancy-objs := bpf-fancy.o bpf-helper.o
6
7HOSTCFLAGS_bpf-fancy.o += -I$(objtree)/usr/include
8HOSTCFLAGS_bpf-fancy.o += -idirafter $(objtree)/include
9HOSTCFLAGS_bpf-helper.o += -I$(objtree)/usr/include
10HOSTCFLAGS_bpf-helper.o += -idirafter $(objtree)/include
11
12HOSTCFLAGS_dropper.o += -I$(objtree)/usr/include
13HOSTCFLAGS_dropper.o += -idirafter $(objtree)/include
14dropper-objs := dropper.o
15
16# bpf-direct.c is x86-only.
17ifeq ($(SRCARCH),x86)
18# List of programs to build
19hostprogs-$(CONFIG_SECCOMP) += bpf-direct
20bpf-direct-objs := bpf-direct.o
21endif
22
23HOSTCFLAGS_bpf-direct.o += -I$(objtree)/usr/include
24HOSTCFLAGS_bpf-direct.o += -idirafter $(objtree)/include
25
26# Try to match the kernel target.
27ifeq ($(CONFIG_64BIT),)
28HOSTCFLAGS_bpf-direct.o += -m32
29HOSTCFLAGS_dropper.o += -m32
30HOSTCFLAGS_bpf-helper.o += -m32
31HOSTCFLAGS_bpf-fancy.o += -m32
32HOSTLOADLIBES_bpf-direct += -m32
33HOSTLOADLIBES_bpf-fancy += -m32
34HOSTLOADLIBES_dropper += -m32
35endif
36
37# Tell kbuild to always build the programs
38always := $(hostprogs-y)
diff --git a/samples/seccomp/bpf-direct.c b/samples/seccomp/bpf-direct.c
new file mode 100644
index 000000000000..26f523e6ed74
--- /dev/null
+++ b/samples/seccomp/bpf-direct.c
@@ -0,0 +1,176 @@
1/*
2 * Seccomp filter example for x86 (32-bit and 64-bit) with BPF macros
3 *
4 * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
5 * Author: Will Drewry <wad@chromium.org>
6 *
7 * The code may be used by anyone for any purpose,
8 * and can serve as a starting point for developing
9 * applications using prctl(PR_SET_SECCOMP, 2, ...).
10 */
11#define __USE_GNU 1
12#define _GNU_SOURCE 1
13
14#include <linux/types.h>
15#include <linux/filter.h>
16#include <linux/seccomp.h>
17#include <linux/unistd.h>
18#include <signal.h>
19#include <stdio.h>
20#include <stddef.h>
21#include <string.h>
22#include <sys/prctl.h>
23#include <unistd.h>
24
25#define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
26#define syscall_nr (offsetof(struct seccomp_data, nr))
27
28#if defined(__i386__)
29#define REG_RESULT REG_EAX
30#define REG_SYSCALL REG_EAX
31#define REG_ARG0 REG_EBX
32#define REG_ARG1 REG_ECX
33#define REG_ARG2 REG_EDX
34#define REG_ARG3 REG_ESI
35#define REG_ARG4 REG_EDI
36#define REG_ARG5 REG_EBP
37#elif defined(__x86_64__)
38#define REG_RESULT REG_RAX
39#define REG_SYSCALL REG_RAX
40#define REG_ARG0 REG_RDI
41#define REG_ARG1 REG_RSI
42#define REG_ARG2 REG_RDX
43#define REG_ARG3 REG_R10
44#define REG_ARG4 REG_R8
45#define REG_ARG5 REG_R9
46#else
47#error Unsupported platform
48#endif
49
50#ifndef PR_SET_NO_NEW_PRIVS
51#define PR_SET_NO_NEW_PRIVS 38
52#endif
53
54#ifndef SYS_SECCOMP
55#define SYS_SECCOMP 1
56#endif
57
58static void emulator(int nr, siginfo_t *info, void *void_context)
59{
60 ucontext_t *ctx = (ucontext_t *)(void_context);
61 int syscall;
62 char *buf;
63 ssize_t bytes;
64 size_t len;
65 if (info->si_code != SYS_SECCOMP)
66 return;
67 if (!ctx)
68 return;
69 syscall = ctx->uc_mcontext.gregs[REG_SYSCALL];
70 buf = (char *) ctx->uc_mcontext.gregs[REG_ARG1];
71 len = (size_t) ctx->uc_mcontext.gregs[REG_ARG2];
72
73 if (syscall != __NR_write)
74 return;
75 if (ctx->uc_mcontext.gregs[REG_ARG0] != STDERR_FILENO)
76 return;
77 /* Redirect stderr messages to stdout. Doesn't handle EINTR, etc */
78 ctx->uc_mcontext.gregs[REG_RESULT] = -1;
79 if (write(STDOUT_FILENO, "[ERR] ", 6) > 0) {
80 bytes = write(STDOUT_FILENO, buf, len);
81 ctx->uc_mcontext.gregs[REG_RESULT] = bytes;
82 }
83 return;
84}
85
86static int install_emulator(void)
87{
88 struct sigaction act;
89 sigset_t mask;
90 memset(&act, 0, sizeof(act));
91 sigemptyset(&mask);
92 sigaddset(&mask, SIGSYS);
93
94 act.sa_sigaction = &emulator;
95 act.sa_flags = SA_SIGINFO;
96 if (sigaction(SIGSYS, &act, NULL) < 0) {
97 perror("sigaction");
98 return -1;
99 }
100 if (sigprocmask(SIG_UNBLOCK, &mask, NULL)) {
101 perror("sigprocmask");
102 return -1;
103 }
104 return 0;
105}
106
107static int install_filter(void)
108{
109 struct sock_filter filter[] = {
110 /* Grab the system call number */
111 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_nr),
112 /* Jump table for the allowed syscalls */
113 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_rt_sigreturn, 0, 1),
114 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
115#ifdef __NR_sigreturn
116 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_sigreturn, 0, 1),
117 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
118#endif
119 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit_group, 0, 1),
120 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
121 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_exit, 0, 1),
122 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
123 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_read, 1, 0),
124 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, __NR_write, 3, 2),
125
126 /* Check that read is only using stdin. */
127 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
128 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDIN_FILENO, 4, 0),
129 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
130
131 /* Check that write is only using stdout */
132 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, syscall_arg(0)),
133 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDOUT_FILENO, 1, 0),
134 /* Trap attempts to write to stderr */
135 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, STDERR_FILENO, 1, 2),
136
137 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
138 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_TRAP),
139 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL),
140 };
141 struct sock_fprog prog = {
142 .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
143 .filter = filter,
144 };
145
146 if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
147 perror("prctl(NO_NEW_PRIVS)");
148 return 1;
149 }
150
151
152 if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
153 perror("prctl");
154 return 1;
155 }
156 return 0;
157}
158
159#define payload(_c) (_c), sizeof((_c))
160int main(int argc, char **argv)
161{
162 char buf[4096];
163 ssize_t bytes = 0;
164 if (install_emulator())
165 return 1;
166 if (install_filter())
167 return 1;
168 syscall(__NR_write, STDOUT_FILENO,
169 payload("OHAI! WHAT IS YOUR NAME? "));
170 bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf));
171 syscall(__NR_write, STDOUT_FILENO, payload("HELLO, "));
172 syscall(__NR_write, STDOUT_FILENO, buf, bytes);
173 syscall(__NR_write, STDERR_FILENO,
174 payload("Error message going to STDERR\n"));
175 return 0;
176}
diff --git a/samples/seccomp/bpf-fancy.c b/samples/seccomp/bpf-fancy.c
new file mode 100644
index 000000000000..8eb483aaec46
--- /dev/null
+++ b/samples/seccomp/bpf-fancy.c
@@ -0,0 +1,102 @@
1/*
2 * Seccomp BPF example using a macro-based generator.
3 *
4 * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
5 * Author: Will Drewry <wad@chromium.org>
6 *
7 * The code may be used by anyone for any purpose,
8 * and can serve as a starting point for developing
9 * applications using prctl(PR_SET_SECCOMP, 2, ...).
10 */
11
12#include <linux/filter.h>
13#include <linux/seccomp.h>
14#include <linux/unistd.h>
15#include <stdio.h>
16#include <string.h>
17#include <sys/prctl.h>
18#include <unistd.h>
19
20#include "bpf-helper.h"
21
22#ifndef PR_SET_NO_NEW_PRIVS
23#define PR_SET_NO_NEW_PRIVS 38
24#endif
25
26int main(int argc, char **argv)
27{
28 struct bpf_labels l = { .count = 0 }; /* count must start at 0 for label lookup */
29 static const char msg1[] = "Please type something: ";
30 static const char msg2[] = "You typed: ";
31 char buf[256];
32 struct sock_filter filter[] = {
33 /* TODO: LOAD_SYSCALL_NR(arch) and enforce an arch */
34 LOAD_SYSCALL_NR,
35 SYSCALL(__NR_exit, ALLOW),
36 SYSCALL(__NR_exit_group, ALLOW),
37 SYSCALL(__NR_write, JUMP(&l, write_fd)),
38 SYSCALL(__NR_read, JUMP(&l, read)),
39 DENY, /* Don't passthrough into a label */
40
41 LABEL(&l, read),
42 ARG(0),
43 JNE(STDIN_FILENO, DENY),
44 ARG(1),
45 JNE((unsigned long)buf, DENY),
46 ARG(2),
47 JGE(sizeof(buf), DENY),
48 ALLOW,
49
50 LABEL(&l, write_fd),
51 ARG(0),
52 JEQ(STDOUT_FILENO, JUMP(&l, write_buf)),
53 JEQ(STDERR_FILENO, JUMP(&l, write_buf)),
54 DENY,
55
56 LABEL(&l, write_buf),
57 ARG(1),
58 JEQ((unsigned long)msg1, JUMP(&l, msg1_len)),
59 JEQ((unsigned long)msg2, JUMP(&l, msg2_len)),
60 JEQ((unsigned long)buf, JUMP(&l, buf_len)),
61 DENY,
62
63 LABEL(&l, msg1_len),
64 ARG(2),
65 JLT(sizeof(msg1), ALLOW),
66 DENY,
67
68 LABEL(&l, msg2_len),
69 ARG(2),
70 JLT(sizeof(msg2), ALLOW),
71 DENY,
72
73 LABEL(&l, buf_len),
74 ARG(2),
75 JLT(sizeof(buf), ALLOW),
76 DENY,
77 };
78 struct sock_fprog prog = {
79 .filter = filter,
80 .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
81 };
82 ssize_t bytes;
83 bpf_resolve_jumps(&l, filter, sizeof(filter)/sizeof(*filter));
84
85 if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
86 perror("prctl(NO_NEW_PRIVS)");
87 return 1;
88 }
89
90 if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
91 perror("prctl(SECCOMP)");
92 return 1;
93 }
94 syscall(__NR_write, STDOUT_FILENO, msg1, strlen(msg1));
95 bytes = syscall(__NR_read, STDIN_FILENO, buf, sizeof(buf)-1);
96 bytes = (bytes > 0 ? bytes : 0);
97 syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2));
98 syscall(__NR_write, STDERR_FILENO, buf, bytes);
99 /* Now get killed */
100 syscall(__NR_write, STDERR_FILENO, msg2, strlen(msg2)+2);
101 return 0;
102}
diff --git a/samples/seccomp/bpf-helper.c b/samples/seccomp/bpf-helper.c
new file mode 100644
index 000000000000..579cfe331886
--- /dev/null
+++ b/samples/seccomp/bpf-helper.c
@@ -0,0 +1,89 @@
1/*
2 * Seccomp BPF helper functions
3 *
4 * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
5 * Author: Will Drewry <wad@chromium.org>
6 *
7 * The code may be used by anyone for any purpose,
8 * and can serve as a starting point for developing
9 * applications using prctl(PR_SET_SECCOMP, 2, ...).
10 */
11
12#include <stdio.h>
13#include <string.h>
14
15#include "bpf-helper.h"
16
17int bpf_resolve_jumps(struct bpf_labels *labels,
18 struct sock_filter *filter, size_t count)
19{
20 struct sock_filter *begin = filter;
21 size_t insn = count - 1; /* wide enough to index filters longer than 256 instructions */
22
23 if (count < 1)
24 return -1;
25 /*
26 * Walk it once, backwards, to build the label table and do fixups.
27 * Since backward jumps are disallowed by BPF, this is easy.
28 */
29 filter += insn;
30 for (; filter >= begin; --insn, --filter) {
31 if (filter->code != (BPF_JMP+BPF_JA))
32 continue;
33 switch ((filter->jt<<8)|filter->jf) {
34 case (JUMP_JT<<8)|JUMP_JF:
35 if (labels->labels[filter->k].location == 0xffffffff) {
36 fprintf(stderr, "Unresolved label: '%s'\n",
37 labels->labels[filter->k].label);
38 return 1;
39 }
40 filter->k = labels->labels[filter->k].location -
41 (insn + 1);
42 filter->jt = 0;
43 filter->jf = 0;
44 continue;
45 case (LABEL_JT<<8)|LABEL_JF:
46 if (labels->labels[filter->k].location != 0xffffffff) {
47 fprintf(stderr, "Duplicate label use: '%s'\n",
48 labels->labels[filter->k].label);
49 return 1;
50 }
51 labels->labels[filter->k].location = insn;
52 filter->k = 0; /* fall through */
53 filter->jt = 0;
54 filter->jf = 0;
55 continue;
56 }
57 }
58 return 0;
59}
60
61/* Simple lookup table for labels. */
62__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label)
63{
64 struct __bpf_label *begin = labels->labels, *end;
65 int id;
66 if (labels->count == 0) {
67 begin->label = label;
68 begin->location = 0xffffffff;
69 labels->count++;
70 return 0;
71 }
72 end = begin + labels->count;
73 for (id = 0; begin < end; ++begin, ++id) {
74 if (!strcmp(label, begin->label))
75 return id;
76 }
77 begin->label = label;
78 begin->location = 0xffffffff;
79 labels->count++;
80 return id;
81}
82
83void seccomp_bpf_print(struct sock_filter *filter, size_t count)
84{
85 struct sock_filter *end = filter + count;
86 for ( ; filter < end; ++filter)
87 printf("{ code=%u,jt=%u,jf=%u,k=%u },\n",
88 filter->code, filter->jt, filter->jf, filter->k);
89}
diff --git a/samples/seccomp/bpf-helper.h b/samples/seccomp/bpf-helper.h
new file mode 100644
index 000000000000..643279dd30fb
--- /dev/null
+++ b/samples/seccomp/bpf-helper.h
@@ -0,0 +1,238 @@
1/*
2 * Example wrapper around BPF macros.
3 *
4 * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
5 * Author: Will Drewry <wad@chromium.org>
6 *
7 * The code may be used by anyone for any purpose,
8 * and can serve as a starting point for developing
9 * applications using prctl(PR_SET_SECCOMP, 2, ...).
10 *
11 * No guarantees are provided with respect to the correctness
12 * or functionality of this code.
13 */
14#ifndef __BPF_HELPER_H__
15#define __BPF_HELPER_H__
16
17#include <asm/bitsperlong.h> /* for __BITS_PER_LONG */
18#include <endian.h>
19#include <linux/filter.h>
20#include <linux/seccomp.h> /* for seccomp_data */
21#include <linux/types.h>
22#include <linux/unistd.h>
23#include <stddef.h>
24
25#define BPF_LABELS_MAX 256
26struct bpf_labels {
27 int count;
28 struct __bpf_label {
29 const char *label;
30 __u32 location;
31 } labels[BPF_LABELS_MAX];
32};
33
34int bpf_resolve_jumps(struct bpf_labels *labels,
35 struct sock_filter *filter, size_t count);
36__u32 seccomp_bpf_label(struct bpf_labels *labels, const char *label);
37void seccomp_bpf_print(struct sock_filter *filter, size_t count);
38
39#define JUMP_JT 0xff
40#define JUMP_JF 0xff
41#define LABEL_JT 0xfe
42#define LABEL_JF 0xfe
43
44#define ALLOW \
45 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW)
46#define DENY \
47 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_KILL)
48#define JUMP(labels, label) \
49 BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
50 JUMP_JT, JUMP_JF)
51#define LABEL(labels, label) \
52 BPF_JUMP(BPF_JMP+BPF_JA, FIND_LABEL((labels), (label)), \
53 LABEL_JT, LABEL_JF)
54#define SYSCALL(nr, jt) \
55 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (nr), 0, 1), \
56 jt
57
58/* Lame, but just an example */
59#define FIND_LABEL(labels, label) seccomp_bpf_label((labels), #label)
60
61#define EXPAND(...) __VA_ARGS__
62/* Map all width-sensitive operations */
63#if __BITS_PER_LONG == 32
64
65#define JEQ(x, jt) JEQ32(x, EXPAND(jt))
66#define JNE(x, jt) JNE32(x, EXPAND(jt))
67#define JGT(x, jt) JGT32(x, EXPAND(jt))
68#define JLT(x, jt) JLT32(x, EXPAND(jt))
69#define JGE(x, jt) JGE32(x, EXPAND(jt))
70#define JLE(x, jt) JLE32(x, EXPAND(jt))
71#define JA(x, jt) JA32(x, EXPAND(jt))
72#define ARG(i) ARG_32(i)
73#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)])
74
75#elif __BITS_PER_LONG == 64
76
77/* Ensure that we load the logically correct offset. */
78#if __BYTE_ORDER == __LITTLE_ENDIAN
79#define ENDIAN(_lo, _hi) _lo, _hi
80#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)])
81#define HI_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) + sizeof(__u32)
82#elif __BYTE_ORDER == __BIG_ENDIAN
83#define ENDIAN(_lo, _hi) _hi, _lo
84#define LO_ARG(idx) offsetof(struct seccomp_data, args[(idx)]) + sizeof(__u32)
85#define HI_ARG(idx) offsetof(struct seccomp_data, args[(idx)])
86#else
87#error "Unknown endianness"
88#endif
89
90union arg64 {
91 struct {
92 __u32 ENDIAN(lo32, hi32);
93 };
94 __u64 u64;
95};
96
97#define JEQ(x, jt) \
98 JEQ64(((union arg64){.u64 = (x)}).lo32, \
99 ((union arg64){.u64 = (x)}).hi32, \
100 EXPAND(jt))
101#define JGT(x, jt) \
102 JGT64(((union arg64){.u64 = (x)}).lo32, \
103 ((union arg64){.u64 = (x)}).hi32, \
104 EXPAND(jt))
105#define JGE(x, jt) \
106 JGE64(((union arg64){.u64 = (x)}).lo32, \
107 ((union arg64){.u64 = (x)}).hi32, \
108 EXPAND(jt))
109#define JNE(x, jt) \
110 JNE64(((union arg64){.u64 = (x)}).lo32, \
111 ((union arg64){.u64 = (x)}).hi32, \
112 EXPAND(jt))
113#define JLT(x, jt) \
114 JLT64(((union arg64){.u64 = (x)}).lo32, \
115 ((union arg64){.u64 = (x)}).hi32, \
116 EXPAND(jt))
117#define JLE(x, jt) \
118 JLE64(((union arg64){.u64 = (x)}).lo32, \
119 ((union arg64){.u64 = (x)}).hi32, \
120 EXPAND(jt))
121
122#define JA(x, jt) \
123 JA64(((union arg64){.u64 = (x)}).lo32, \
124 ((union arg64){.u64 = (x)}).hi32, \
125 EXPAND(jt))
126#define ARG(i) ARG_64(i)
127
128#else
129#error __BITS_PER_LONG value unusable.
130#endif
131
132/* Loads the arg into A */
133#define ARG_32(idx) \
134 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, LO_ARG(idx))
135
136/* Loads lo into M[0], and hi into both M[1] and A */
137#define ARG_64(idx) \
138 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, LO_ARG(idx)), \
139 BPF_STMT(BPF_ST, 0), /* lo -> M[0] */ \
140 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, HI_ARG(idx)), \
141 BPF_STMT(BPF_ST, 1) /* hi -> M[1] */
142
143#define JEQ32(value, jt) \
144 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 0, 1), \
145 jt
146
147#define JNE32(value, jt) \
148 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (value), 1, 0), \
149 jt
150
151/* Checks hi first, then swaps in lo to check it. On entry A=hi, M[0]=lo, M[1]=hi */
152#define JEQ64(lo, hi, jt) \
153 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
154 BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
155 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 0, 2), \
156 BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
157 jt, \
158 BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
159
160#define JNE64(lo, hi, jt) \
161 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 3), /* hi differs: match */ \
162 BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
163 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (lo), 2, 0), \
164 BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
165 jt, \
166 BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
167
168#define JA32(value, jt) \
169 BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (value), 0, 1), \
170 jt
171
172#define JA64(lo, hi, jt) \
173 BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (hi), 3, 0), \
174 BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
175 BPF_JUMP(BPF_JMP+BPF_JSET+BPF_K, (lo), 0, 2), \
176 BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
177 jt, \
178 BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
179
180#define JGE32(value, jt) \
181 BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 0, 1), \
182 jt
183
184#define JLT32(value, jt) \
185 BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (value), 1, 0), \
186 jt
187
188/* Shortcut checking if arg.hi > hi. */
189#define JGE64(lo, hi, jt) \
190 BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
191 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
192 BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
193 BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 0, 2), \
194 BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
195 jt, \
196 BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
197
198#define JLT64(lo, hi, jt) \
199 BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (hi), 0, 4), \
200 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
201 BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
202 BPF_JUMP(BPF_JMP+BPF_JGE+BPF_K, (lo), 2, 0), /* arg.lo >= lo: no match */ \
203 BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
204 jt, \
205 BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
206
207#define JGT32(value, jt) \
208 BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 0, 1), \
209 jt
210
211#define JLE32(value, jt) \
212 BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (value), 1, 0), \
213 jt
214
215/* Check arg.hi > hi first, then do the GT checking on lo */
216#define JGT64(lo, hi, jt) \
217 BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 4, 0), \
218 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 5), \
219 BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
220 BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 0, 2), \
221 BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
222 jt, \
223 BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
224
225#define JLE64(lo, hi, jt) \
226 BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (hi), 6, 0), \
227 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, (hi), 0, 3), \
228 BPF_STMT(BPF_LD+BPF_MEM, 0), /* swap in lo */ \
229 BPF_JUMP(BPF_JMP+BPF_JGT+BPF_K, (lo), 2, 0), \
230 BPF_STMT(BPF_LD+BPF_MEM, 1), /* passed: swap hi back in */ \
231 jt, \
232 BPF_STMT(BPF_LD+BPF_MEM, 1) /* failed: swap hi back in */
233
234#define LOAD_SYSCALL_NR \
235 BPF_STMT(BPF_LD+BPF_W+BPF_ABS, \
236 offsetof(struct seccomp_data, nr))
237
238#endif /* __BPF_HELPER_H__ */
diff --git a/samples/seccomp/dropper.c b/samples/seccomp/dropper.c
new file mode 100644
index 000000000000..c69c347c7011
--- /dev/null
+++ b/samples/seccomp/dropper.c
@@ -0,0 +1,68 @@
1/*
2 * Naive system call dropper built on seccomp_filter.
3 *
4 * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
5 * Author: Will Drewry <wad@chromium.org>
6 *
7 * The code may be used by anyone for any purpose,
8 * and can serve as a starting point for developing
9 * applications using prctl(PR_SET_SECCOMP, 2, ...).
10 *
11 * When run, returns the specified errno for the specified
12 * system call number against the given architecture.
13 *
14 * Run this one as root as PR_SET_NO_NEW_PRIVS is not called.
15 */
16
17#include <errno.h>
18#include <linux/audit.h>
19#include <linux/filter.h>
20#include <linux/seccomp.h>
21#include <linux/unistd.h>
22#include <stdio.h>
23#include <stddef.h>
24#include <stdlib.h>
25#include <sys/prctl.h>
26#include <unistd.h>
27
28static int install_filter(int nr, int arch, int error)
29{
30 struct sock_filter filter[] = {
31 BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
32 (offsetof(struct seccomp_data, arch))),
33 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, arch, 0, 3),
34 BPF_STMT(BPF_LD+BPF_W+BPF_ABS,
35 (offsetof(struct seccomp_data, nr))),
36 BPF_JUMP(BPF_JMP+BPF_JEQ+BPF_K, nr, 0, 1),
37 BPF_STMT(BPF_RET+BPF_K,
38 SECCOMP_RET_ERRNO|(error & SECCOMP_RET_DATA)),
39 BPF_STMT(BPF_RET+BPF_K, SECCOMP_RET_ALLOW),
40 };
41 struct sock_fprog prog = {
42 .len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
43 .filter = filter,
44 };
45 if (prctl(PR_SET_SECCOMP, 2, &prog)) {
46 perror("prctl");
47 return 1;
48 }
49 return 0;
50}
51
52int main(int argc, char **argv)
53{
54 if (argc < 5) {
55 fprintf(stderr, "Usage:\n"
56 "dropper <syscall_nr> <arch> <errno> <prog> [<args>]\n"
57 "Hint: AUDIT_ARCH_I386: 0x%X\n"
58 " AUDIT_ARCH_X86_64: 0x%X\n"
59 "\n", AUDIT_ARCH_I386, AUDIT_ARCH_X86_64);
60 return 1;
61 }
62 if (install_filter(strtol(argv[1], NULL, 0), strtol(argv[2], NULL, 0),
63 strtol(argv[3], NULL, 0)))
64 return 1;
65 execv(argv[4], &argv[4]);
66 printf("Failed to execv\n");
67 return 255;
68}