| | | |
|---|---|---|
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2019-01-02 12:48:13 -0500 |
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2019-01-02 12:48:13 -0500 |
| commit | d9a7fa67b4bfe6ce93ee9aab23ae2e7ca0763e84 (patch) | |
| tree | ea15c22c088160107c09da1c8d380753bb0c8d21 /Documentation/userspace-api | |
| parent | f218a29c25ad8abdb961435d6b8139f462061364 (diff) | |
| parent | 55b8cbe470d103b44104c64dbf89e5cad525d4e0 (diff) | |
Merge branch 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull seccomp updates from James Morris:

 - Add SECCOMP_RET_USER_NOTIF

 - seccomp fixes for sparse warnings and s390 build (Tycho)

* 'next-seccomp' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  seccomp, s390: fix build for syscall type change
  seccomp: fix poor type promotion
  samples: add an example of seccomp user trap
  seccomp: add a return code to trap to userspace
  seccomp: switch system call argument type to void *
  seccomp: hoist struct seccomp_data recalculation higher

Diffstat (limited to 'Documentation/userspace-api')
| -rw-r--r-- | Documentation/userspace-api/seccomp_filter.rst | 84 |
1 files changed, 84 insertions, 0 deletions
diff --git a/Documentation/userspace-api/seccomp_filter.rst b/Documentation/userspace-api/seccomp_filter.rst
index 82a468bc7560..b1b846d8a094 100644
--- a/Documentation/userspace-api/seccomp_filter.rst
+++ b/Documentation/userspace-api/seccomp_filter.rst
| @@ -122,6 +122,11 @@ In precedence order, they are: | |||
| 122 | Results in the lower 16-bits of the return value being passed | 122 | Results in the lower 16-bits of the return value being passed |
| 123 | to userland as the errno without executing the system call. | 123 | to userland as the errno without executing the system call. |
| 124 | 124 | ||
| 125 | ``SECCOMP_RET_USER_NOTIF``: | ||
| 126 | Results in a ``struct seccomp_notif`` message sent on the userspace | ||
| 127 | notification fd, if it is attached, or ``-ENOSYS`` if it is not. See the | ||
| 128 | discussion below on how to handle user notifications. | ||
| 129 | |||
| 125 | ``SECCOMP_RET_TRACE``: | 130 | ``SECCOMP_RET_TRACE``: |
| 126 | When returned, this value will cause the kernel to attempt to | 131 | When returned, this value will cause the kernel to attempt to |
| 127 | notify a ``ptrace()``-based tracer prior to executing the system | 132 | notify a ``ptrace()``-based tracer prior to executing the system |
| @@ -183,6 +188,85 @@ The ``samples/seccomp/`` directory contains both an x86-specific example | |||
| 183 | and a more generic example of a higher level macro interface for BPF | 188 | and a more generic example of a higher level macro interface for BPF |
| 184 | program generation. | 189 | program generation. |
| 185 | 190 | ||
| 191 | Userspace Notification | ||
| 192 | ====================== | ||
| 193 | |||
| 194 | The ``SECCOMP_RET_USER_NOTIF`` return code lets seccomp filters pass a | ||
| 195 | particular syscall to userspace to be handled. This may be useful for | ||
| 196 | applications like container managers, which wish to intercept particular | ||
| 197 | syscalls (``mount()``, ``finit_module()``, etc.) and change their behavior. | ||
| 198 | |||
| 199 | To acquire a notification FD, pass the ``SECCOMP_FILTER_FLAG_NEW_LISTENER`` | ||
| 200 | flag to the ``seccomp()`` syscall: | ||
| 201 | |||
| 202 | .. code-block:: c | ||
| 203 | |||
| 204 | fd = seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog); | ||
| 205 | |||
| 206 | which (on success) will return a listener fd for the filter, which can then be | ||
| 207 | passed around via ``SCM_RIGHTS`` or similar. Note that filter fds correspond to | ||
| 208 | a particular filter, and not a particular task. So if this task then forks, | ||
| 209 | notifications from both tasks will appear on the same filter fd. Reads and | ||
| 210 | writes to/from a filter fd are also synchronized, so a filter fd can safely | ||
| 211 | have many readers. | ||
| 212 | |||
| 213 | The interface for a seccomp notification fd consists of the following structures: | ||
| 214 | |||
| 215 | .. code-block:: c | ||
| 216 | |||
| 217 | struct seccomp_notif_sizes { | ||
| 218 | __u16 seccomp_notif; | ||
| 219 | __u16 seccomp_notif_resp; | ||
| 220 | __u16 seccomp_data; | ||
| 221 | }; | ||
| 222 | |||
| 223 | struct seccomp_notif { | ||
| 224 | __u64 id; | ||
| 225 | __u32 pid; | ||
| 226 | __u32 flags; | ||
| 227 | struct seccomp_data data; | ||
| 228 | }; | ||
| 229 | |||
| 230 | struct seccomp_notif_resp { | ||
| 231 | __u64 id; | ||
| 232 | __s64 val; | ||
| 233 | __s32 error; | ||
| 234 | __u32 flags; | ||
| 235 | }; | ||
| 236 | |||
| 237 | The ``struct seccomp_notif_sizes`` structure can be used to determine the size | ||
| 238 | of the various structures used in seccomp notifications. The size of ``struct | ||
| 239 | seccomp_data`` may change in the future, so code should use: | ||
| 240 | |||
| 241 | .. code-block:: c | ||
| 242 | |||
| 243 | struct seccomp_notif_sizes sizes; | ||
| 244 | seccomp(SECCOMP_GET_NOTIF_SIZES, 0, &sizes); | ||
| 245 | |||
| 246 | to determine the size of the various structures to allocate. See | ||
| 247 | samples/seccomp/user-trap.c for an example. | ||
| 248 | |||
| 249 | Users can call ``ioctl(SECCOMP_IOCTL_NOTIF_RECV)`` on a seccomp notification | ||
| 250 | fd, optionally after waiting with ``poll()``, to receive a ``struct | ||
| 251 | seccomp_notif``, which contains four members: a unique-per-filter ``id``, | ||
| 252 | the ``pid`` of the task which triggered this request (which may be 0 if the | ||
| 253 | task is in a pid ns not visible from the listener's pid namespace), a ``flags`` | ||
| 254 | member which for now only has ``SECCOMP_NOTIF_FLAG_SIGNALED``, representing | ||
| 255 | whether or not the notification is a result of a non-fatal signal, and the | ||
| 256 | ``data`` passed to seccomp. Userspace can then make a decision based on this | ||
| 257 | information about what to do, and ``ioctl(SECCOMP_IOCTL_NOTIF_SEND)`` a | ||
| 258 | response, indicating what should be returned to the task that made the | ||
| 259 | syscall. The ``id`` member of ``struct seccomp_notif_resp`` should be the | ||
| 260 | same ``id`` as in ``struct seccomp_notif``. | ||
| 261 | |||
| 262 | It is worth noting that ``struct seccomp_data`` contains the values of register | ||
| 263 | arguments to the syscall, but not the memory that pointer arguments refer to. | ||
| 264 | The task's memory is accessible to suitably privileged tracers via ``ptrace()`` | ||
| 265 | or ``/proc/pid/mem``. However, care should be taken to avoid the TOCTOU race | ||
| 266 | mentioned above in this document: all arguments being read from the tracee's | ||
| 267 | memory should be read into the tracer's memory before any policy decisions are | ||
| 268 | made. This allows for an atomic decision on syscall arguments. | ||
| 269 | |||
| 186 | Sysctls | 270 | Sysctls |
| 187 | ======= | 271 | ======= |
| 188 | 272 | ||
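
The receive/respond cycle described in the "Userspace Notification" text added by this diff can be sketched in C roughly as follows. This is a minimal illustration, not code from the commit. The function names `build_response` and `handle_one` are hypothetical, and so is the policy of refusing `mount()` with `EPERM`. Storing a negative errno in `error` is an assumption, since the text above does not state a sign convention. The sketch also assumes kernel headers that define the notification structures and ioctls (Linux 5.0 or later), and calls `seccomp()` through `syscall()` on the assumption that the C library provides no wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/seccomp.h>

/*
 * Hypothetical policy: refuse mount() with EPERM and report success (0)
 * for every other trapped syscall. Only the id copy is taken from the
 * documentation; the negative errno convention is an assumption.
 */
void build_response(const struct seccomp_notif *req,
                    struct seccomp_notif_resp *resp)
{
    memset(resp, 0, sizeof(*resp));
    resp->id = req->id;             /* must echo the id of the request */
    if (req->data.nr == SYS_mount)
        resp->error = -EPERM;       /* assumed: negative errno */
    else
        resp->val = 0;              /* spoofed successful return value */
}

/* Receive one notification on listener_fd and answer it. Returns 0 or -1. */
int handle_one(int listener_fd)
{
    struct seccomp_notif_sizes sizes;
    struct seccomp_notif *req;
    struct seccomp_notif_resp *resp;
    size_t req_sz, resp_sz;
    int ret = -1;

    /* Ask the kernel for the structure sizes rather than trusting sizeof. */
    if (syscall(__NR_seccomp, SECCOMP_GET_NOTIF_SIZES, 0, &sizes) < 0)
        return -1;

    /* Never allocate less than what this program was compiled against. */
    req_sz = sizes.seccomp_notif;
    if (req_sz < sizeof(*req))
        req_sz = sizeof(*req);
    resp_sz = sizes.seccomp_notif_resp;
    if (resp_sz < sizeof(*resp))
        resp_sz = sizeof(*resp);

    req = calloc(1, req_sz);        /* zeroed buffer for the request */
    resp = calloc(1, resp_sz);
    if (!req || !resp)
        goto out;

    if (ioctl(listener_fd, SECCOMP_IOCTL_NOTIF_RECV, req) < 0)
        goto out;
    build_response(req, resp);
    if (ioctl(listener_fd, SECCOMP_IOCTL_NOTIF_SEND, resp) < 0)
        goto out;
    ret = 0;
out:
    free(req);
    free(resp);
    return ret;
}
```

A supervisor holding the listener fd returned by `seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_NEW_LISTENER, &prog)` could call `handle_one(fd)` in a loop. A real handler that inspects pointer arguments would first copy the referenced memory out of the target task, as the TOCTOU note above requires, before deciding on a response.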
