author     Linus Torvalds <torvalds@linux-foundation.org>  2015-04-14 17:37:47 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>  2015-04-14 17:37:47 -0400
commit     6c8a53c9e6a151fffb07f8b4c34bd1e33dddd467 (patch)
tree       791caf826ef136c521a97b7878f226b6ba1c1d75 /samples
parent     e95e7f627062be5e6ce971ce873e6234c91ffc50 (diff)
parent     066450be419fa48007a9f29e19828f2a86198754 (diff)
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Core kernel changes:

   - One of the more interesting features in this cycle is the ability
     to attach eBPF programs (user-defined, sandboxed bytecode executed
     by the kernel) to kprobes.

     This allows user-defined instrumentation on a live kernel image
     that can never crash, hang or interfere with the kernel negatively.
     (Right now it's limited to root-only, but in the future we might
     allow unprivileged use as well.)

     (Alexei Starovoitov)

   - Another non-trivial feature is per event clockid support: this
     allows, amongst other things, the selection of different clock
     sources for event timestamps traced via perf.

     This feature is sought by people who'd like to merge perf
     generated events with external events that were measured with
     different clocks:

       - cluster wide profiling

       - for system wide tracing with user-space events,

       - JIT profiling events

     etc. Matching perf tooling support is added as well, available
     via the -k, --clockid <clockid> parameter to perf record et al.

     (Peter Zijlstra)

  Hardware enablement kernel changes:

   - x86 Intel Processor Trace (PT) support: which is a hardware tracer
     on steroids, available on Broadwell CPUs.

     The hardware trace stream is directly output into the user-space
     ring-buffer, using the 'AUX' data format extension that was added
     to the perf core to support hardware constraints such as the
     necessity to have the tracing buffer physically contiguous.

     This patch-set was developed for two years and this is the result.
     A simple way to make use of this is to use BTS tracing, the PT
     driver emulates BTS output - available via the 'intel_bts' PMU.
     More explicit PT specific tooling support is in the works as well -
     will probably be ready by 4.2.

     (Alexander Shishkin, Peter Zijlstra)

   - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware
     feature of Intel Xeon CPUs that allows the measurement and
     allocation/partitioning of caches to individual workloads.

     These kernel changes expose the measurement side as a new PMU
     driver, which exposes various QoS related PMU events. (The
     partitioning change is work in progress and is planned to be
     merged as a cgroup extension.)

     (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P
     Waskiewicz Jr)

   - x86 Intel Haswell LBR call stack support: this is a new Haswell
     feature that allows the hardware recording of call chains, plus
     tooling support. To activate this feature you have to enable it
     via the new 'lbr' call-graph recording option:

        perf record --call-graph lbr
        perf report

     or:

        perf top --call-graph lbr

     This hardware feature is a lot faster than stack walk or dwarf
     based unwinding, but has some limitations:

       - It reuses the current LBR facility, so LBR call stack and
         branch record can not be enabled at the same time.

       - It is only available for user-space callchains.

     (Yan, Zheng)

   - x86 Intel Broadwell CPU support and various event constraints and
     event table fixes for earlier models.

     (Andi Kleen)

   - x86 Intel HT CPUs event scheduling workarounds. This is a complex
     CPU bug affecting the SNB,IVB,HSW families that results in counter
     value corruption. The mitigation code is automatically enabled and
     is transparent.

     (Maria Dimakopoulou, Stephane Eranian)

  The perf tooling side had a ton of changes in this cycle as well, so
  I'm only able to list the user visible changes here, in addition to
  the tooling changes outlined above:

  User visible changes affecting all tools:

   - Improve support of compressed kernel modules (Jiri Olsa)

   - Save DSO loading errno to better report errors (Arnaldo Carvalho
     de Melo)

   - Bash completion for subcommands (Yunlong Song)

   - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri
     Olsa)

   - Support missing -f to override perf.data file ownership. (Yunlong
     Song)

   - Show the first event with an invalid filter (David Ahern, Arnaldo
     Carvalho de Melo)

  User visible changes in individual tools:

  'perf data':

    New tool for converting perf.data to other formats, initially for
    the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian
    Siewior)

  'perf diff':

    Add --kallsyms option (David Ahern)

  'perf list':

    Allow listing events with 'tracepoint' prefix (Yunlong Song)

    Sort the output of the command (Yunlong Song)

  'perf kmem':

    Respect -i option (Jiri Olsa)

    Print big numbers using thousands' group (Namhyung Kim)

    Allow -v option (Namhyung Kim)

    Fix alignment of slab result table (Namhyung Kim)

  'perf probe':

    Support multiple probes on different binaries on the same command
    line (Masami Hiramatsu)

    Support unnamed union/structure members data collection. (Masami
    Hiramatsu)

    Check kprobes blacklist when adding new events. (Masami Hiramatsu)

  'perf record':

    Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

    Support recording running/enabled time (Andi Kleen)

  'perf sched':

    Improve the performance of 'perf sched replay' on high CPU core
    count machines (Yunlong Song)

  'perf report' and 'perf top':

    Allow annotating entries in callchains in the hists browser
    (Arnaldo Carvalho de Melo)

    Indicate which callchain entries are annotated in the TUI hists
    browser (Arnaldo Carvalho de Melo)

    Add pid/tid filtering to 'report' and 'script' commands (David
    Ahern)

    Consider PERF_RECORD_ events with cpumode == 0 in 'perf top',
    removing one cause of long term memory usage buildup, i.e. not
    processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo)

  'perf stat':

    Report unsupported events properly (Suzuki K. Poulose)

    Output running time and run/enabled ratio in CSV mode (Andi Kleen)

  'perf trace':

    Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho
    de Melo)

    Only insert blank duration bracket when tracing syscalls (Arnaldo
    Carvalho de Melo)

    Filter out the trace pid when no threads are specified (Arnaldo
    Carvalho de Melo)

    Dump stack on segfaults (Arnaldo Carvalho de Melo)

    No need to explicitly enable evsels for workload started from perf,
    let it be enabled via perf_event_attr.enable_on_exec, removing some
    events that take place in the 'perf trace' before a workload is
    really started by it. (Arnaldo Carvalho de Melo)

    Allow mixing with tracepoints and suppressing plain syscalls.
    (Arnaldo Carvalho de Melo)

  There's also been a ton of infrastructure work done, such as the
  split-out of perf's build system into tools/build/ and other changes -
  see the shortlog and changelog for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits)
  perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
  perf evlist: Fix type for references to data_head/tail
  perf probe: Check the orphaned -x option
  perf probe: Support multiple probes on different binaries
  perf buildid-list: Fix segfault when show DSOs with hits
  perf tools: Fix cross-endian analysis
  perf tools: Fix error path to do closedir() when synthesizing threads
  perf tools: Fix synthesizing fork_event.ppid for non-main thread
  perf tools: Add 'I' event modifier for exclude_idle bit
  perf report: Don't call map__kmap if map is NULL.
  perf tests: Fix attr tests
  perf probe: Fix ARM 32 building error
  perf tools: Merge all perf_event_attr print functions
  perf record: Add clockid parameter
  perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
  perf sched replay: Support using -f to override perf.data file ownership
  perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
  perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
  perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
  perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
  ...
Diffstat (limited to 'samples')
-rw-r--r--  samples/bpf/Makefile        |  16
-rw-r--r--  samples/bpf/bpf_helpers.h   |   6
-rw-r--r--  samples/bpf/bpf_load.c      | 125
-rw-r--r--  samples/bpf/bpf_load.h      |   3
-rw-r--r--  samples/bpf/libbpf.c        |  14
-rw-r--r--  samples/bpf/libbpf.h        |   5
-rw-r--r--  samples/bpf/sock_example.c  |   2
-rw-r--r--  samples/bpf/test_verifier.c |   2
-rw-r--r--  samples/bpf/tracex1_kern.c  |  50
-rw-r--r--  samples/bpf/tracex1_user.c  |  25
-rw-r--r--  samples/bpf/tracex2_kern.c  |  86
-rw-r--r--  samples/bpf/tracex2_user.c  |  95
-rw-r--r--  samples/bpf/tracex3_kern.c  |  89
-rw-r--r--  samples/bpf/tracex3_user.c  | 150
-rw-r--r--  samples/bpf/tracex4_kern.c  |  54
-rw-r--r--  samples/bpf/tracex4_user.c  |  69
16 files changed, 779 insertions(+), 12 deletions(-)
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index b5b3600dcdf5..fe98fb226e6e 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -6,23 +6,39 @@ hostprogs-y := test_verifier test_maps
 hostprogs-y += sock_example
 hostprogs-y += sockex1
 hostprogs-y += sockex2
+hostprogs-y += tracex1
+hostprogs-y += tracex2
+hostprogs-y += tracex3
+hostprogs-y += tracex4
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
 sock_example-objs := sock_example.o libbpf.o
 sockex1-objs := bpf_load.o libbpf.o sockex1_user.o
 sockex2-objs := bpf_load.o libbpf.o sockex2_user.o
+tracex1-objs := bpf_load.o libbpf.o tracex1_user.o
+tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
+tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
+tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
 always += sockex1_kern.o
 always += sockex2_kern.o
+always += tracex1_kern.o
+always += tracex2_kern.o
+always += tracex3_kern.o
+always += tracex4_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
 HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
 HOSTLOADLIBES_sockex1 += -lelf
 HOSTLOADLIBES_sockex2 += -lelf
+HOSTLOADLIBES_tracex1 += -lelf
+HOSTLOADLIBES_tracex2 += -lelf
+HOSTLOADLIBES_tracex3 += -lelf
+HOSTLOADLIBES_tracex4 += -lelf -lrt
 
 # point this to your LLVM backend with bpf support
 LLC=$(srctree)/tools/bpf/llvm/bld/Debug+Asserts/bin/llc
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index ca0333146006..1c872bcf5a80 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -15,6 +15,12 @@ static int (*bpf_map_update_elem)(void *map, void *key, void *value,
 	(void *) BPF_FUNC_map_update_elem;
 static int (*bpf_map_delete_elem)(void *map, void *key) =
 	(void *) BPF_FUNC_map_delete_elem;
+static int (*bpf_probe_read)(void *dst, int size, void *unsafe_ptr) =
+	(void *) BPF_FUNC_probe_read;
+static unsigned long long (*bpf_ktime_get_ns)(void) =
+	(void *) BPF_FUNC_ktime_get_ns;
+static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) =
+	(void *) BPF_FUNC_trace_printk;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 1831d236382b..38dac5a53b51 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -8,29 +8,70 @@
 #include <unistd.h>
 #include <string.h>
 #include <stdbool.h>
+#include <stdlib.h>
 #include <linux/bpf.h>
 #include <linux/filter.h>
+#include <linux/perf_event.h>
+#include <sys/syscall.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+#include <poll.h>
 #include "libbpf.h"
 #include "bpf_helpers.h"
 #include "bpf_load.h"
 
+#define DEBUGFS "/sys/kernel/debug/tracing/"
+
 static char license[128];
+static int kern_version;
 static bool processed_sec[128];
 int map_fd[MAX_MAPS];
 int prog_fd[MAX_PROGS];
+int event_fd[MAX_PROGS];
 int prog_cnt;
 
 static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 {
-	int fd;
 	bool is_socket = strncmp(event, "socket", 6) == 0;
-
-	if (!is_socket)
-		/* tracing events tbd */
+	bool is_kprobe = strncmp(event, "kprobe/", 7) == 0;
+	bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
+	enum bpf_prog_type prog_type;
+	char buf[256];
+	int fd, efd, err, id;
+	struct perf_event_attr attr = {};
+
+	attr.type = PERF_TYPE_TRACEPOINT;
+	attr.sample_type = PERF_SAMPLE_RAW;
+	attr.sample_period = 1;
+	attr.wakeup_events = 1;
+
+	if (is_socket) {
+		prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
+	} else if (is_kprobe || is_kretprobe) {
+		prog_type = BPF_PROG_TYPE_KPROBE;
+	} else {
+		printf("Unknown event '%s'\n", event);
 		return -1;
+	}
+
+	if (is_kprobe || is_kretprobe) {
+		if (is_kprobe)
+			event += 7;
+		else
+			event += 10;
+
+		snprintf(buf, sizeof(buf),
+			 "echo '%c:%s %s' >> /sys/kernel/debug/tracing/kprobe_events",
+			 is_kprobe ? 'p' : 'r', event, event);
+		err = system(buf);
+		if (err < 0) {
+			printf("failed to create kprobe '%s' error '%s'\n",
+			       event, strerror(errno));
+			return -1;
+		}
+	}
 
-	fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER,
-			   prog, size, license);
+	fd = bpf_prog_load(prog_type, prog, size, license, kern_version);
 
 	if (fd < 0) {
 		printf("bpf_prog_load() err=%d\n%s", errno, bpf_log_buf);
@@ -39,6 +80,41 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 
 	prog_fd[prog_cnt++] = fd;
 
+	if (is_socket)
+		return 0;
+
+	strcpy(buf, DEBUGFS);
+	strcat(buf, "events/kprobes/");
+	strcat(buf, event);
+	strcat(buf, "/id");
+
+	efd = open(buf, O_RDONLY, 0);
+	if (efd < 0) {
+		printf("failed to open event %s\n", event);
+		return -1;
+	}
+
+	err = read(efd, buf, sizeof(buf));
+	if (err < 0 || err >= sizeof(buf)) {
+		printf("read from '%s' failed '%s'\n", event, strerror(errno));
+		return -1;
+	}
+
+	close(efd);
+
+	buf[err] = 0;
+	id = atoi(buf);
+	attr.config = id;
+
+	efd = perf_event_open(&attr, -1/*pid*/, 0/*cpu*/, -1/*group_fd*/, 0);
+	if (efd < 0) {
+		printf("event %d fd %d err %s\n", id, efd, strerror(errno));
+		return -1;
+	}
+	event_fd[prog_cnt - 1] = efd;
+	ioctl(efd, PERF_EVENT_IOC_ENABLE, 0);
+	ioctl(efd, PERF_EVENT_IOC_SET_BPF, fd);
+
 	return 0;
 }
 
@@ -135,6 +211,9 @@ int load_bpf_file(char *path)
 	if (gelf_getehdr(elf, &ehdr) != &ehdr)
 		return 1;
 
+	/* clear all kprobes */
+	i = system("echo \"\" > /sys/kernel/debug/tracing/kprobe_events");
+
 	/* scan over all elf sections to get license and map info */
 	for (i = 1; i < ehdr.e_shnum; i++) {
 
@@ -149,6 +228,14 @@ int load_bpf_file(char *path)
 		if (strcmp(shname, "license") == 0) {
 			processed_sec[i] = true;
 			memcpy(license, data->d_buf, data->d_size);
+		} else if (strcmp(shname, "version") == 0) {
+			processed_sec[i] = true;
+			if (data->d_size != sizeof(int)) {
+				printf("invalid size of version section %zd\n",
+				       data->d_size);
+				return 1;
+			}
+			memcpy(&kern_version, data->d_buf, sizeof(int));
 		} else if (strcmp(shname, "maps") == 0) {
 			processed_sec[i] = true;
 			if (load_maps(data->d_buf, data->d_size))
@@ -178,7 +265,8 @@ int load_bpf_file(char *path)
 		if (parse_relo_and_apply(data, symbols, &shdr, insns))
 			continue;
 
-		if (memcmp(shname_prog, "events/", 7) == 0 ||
+		if (memcmp(shname_prog, "kprobe/", 7) == 0 ||
+		    memcmp(shname_prog, "kretprobe/", 10) == 0 ||
 		    memcmp(shname_prog, "socket", 6) == 0)
 			load_and_attach(shname_prog, insns, data_prog->d_size);
 	}
@@ -193,7 +281,8 @@ int load_bpf_file(char *path)
 		if (get_sec(elf, i, &ehdr, &shname, &shdr, &data))
 			continue;
 
-		if (memcmp(shname, "events/", 7) == 0 ||
+		if (memcmp(shname, "kprobe/", 7) == 0 ||
+		    memcmp(shname, "kretprobe/", 10) == 0 ||
 		    memcmp(shname, "socket", 6) == 0)
 			load_and_attach(shname, data->d_buf, data->d_size);
 	}
@@ -201,3 +290,23 @@ int load_bpf_file(char *path)
 	close(fd);
 	return 0;
 }
+
+void read_trace_pipe(void)
+{
+	int trace_fd;
+
+	trace_fd = open(DEBUGFS "trace_pipe", O_RDONLY, 0);
+	if (trace_fd < 0)
+		return;
+
+	while (1) {
+		static char buf[4096];
+		ssize_t sz;
+
+		sz = read(trace_fd, buf, sizeof(buf));
+		if (sz > 0) {
+			buf[sz] = 0;
+			puts(buf);
+		}
+	}
+}
diff --git a/samples/bpf/bpf_load.h b/samples/bpf/bpf_load.h
index 27789a34f5e6..cbd7c2b532b9 100644
--- a/samples/bpf/bpf_load.h
+++ b/samples/bpf/bpf_load.h
@@ -6,6 +6,7 @@
 
 extern int map_fd[MAX_MAPS];
 extern int prog_fd[MAX_PROGS];
+extern int event_fd[MAX_PROGS];
 
 /* parses elf file compiled by llvm .c->.o
  * . parses 'maps' section and creates maps via BPF syscall
@@ -21,4 +22,6 @@ extern int prog_fd[MAX_PROGS];
  */
 int load_bpf_file(char *path);
 
+void read_trace_pipe(void);
+
 #endif
diff --git a/samples/bpf/libbpf.c b/samples/bpf/libbpf.c
index 46d50b7ddf79..7e1efa7e2ed7 100644
--- a/samples/bpf/libbpf.c
+++ b/samples/bpf/libbpf.c
@@ -81,7 +81,7 @@ char bpf_log_buf[LOG_BUF_SIZE];
 
 int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, int prog_len,
-		  const char *license)
+		  const char *license, int kern_version)
 {
 	union bpf_attr attr = {
 		.prog_type = prog_type,
@@ -93,6 +93,11 @@ int bpf_prog_load(enum bpf_prog_type prog_type,
 		.log_level = 1,
 	};
 
+	/* assign one field outside of struct init to make sure any
+	 * padding is zero initialized
+	 */
+	attr.kern_version = kern_version;
+
 	bpf_log_buf[0] = 0;
 
 	return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
@@ -121,3 +126,10 @@ int open_raw_sock(const char *name)
 
 	return sock;
 }
+
+int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
+		    int group_fd, unsigned long flags)
+{
+	return syscall(__NR_perf_event_open, attr, pid, cpu,
+		       group_fd, flags);
+}
diff --git a/samples/bpf/libbpf.h b/samples/bpf/libbpf.h
index 58c5fe1bdba1..ac7b09672b46 100644
--- a/samples/bpf/libbpf.h
+++ b/samples/bpf/libbpf.h
@@ -13,7 +13,7 @@ int bpf_get_next_key(int fd, void *key, void *next_key);
 
 int bpf_prog_load(enum bpf_prog_type prog_type,
 		  const struct bpf_insn *insns, int insn_len,
-		  const char *license);
+		  const char *license, int kern_version);
 
 #define LOG_BUF_SIZE 65536
 extern char bpf_log_buf[LOG_BUF_SIZE];
@@ -182,4 +182,7 @@ extern char bpf_log_buf[LOG_BUF_SIZE];
 /* create RAW socket and bind to interface 'name' */
 int open_raw_sock(const char *name);
 
+struct perf_event_attr;
+int perf_event_open(struct perf_event_attr *attr, int pid, int cpu,
+		    int group_fd, unsigned long flags);
 #endif
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index c8ad0404416f..a0ce251c5390 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -56,7 +56,7 @@ static int test_sock(void)
 	};
 
 	prog_fd = bpf_prog_load(BPF_PROG_TYPE_SOCKET_FILTER, prog, sizeof(prog),
-				"GPL");
+				"GPL", 0);
 	if (prog_fd < 0) {
 		printf("failed to load prog '%s'\n", strerror(errno));
 		goto cleanup;
diff --git a/samples/bpf/test_verifier.c b/samples/bpf/test_verifier.c
index b96175e90363..740ce97cda5e 100644
--- a/samples/bpf/test_verifier.c
+++ b/samples/bpf/test_verifier.c
@@ -689,7 +689,7 @@ static int test(void)
 
 		prog_fd = bpf_prog_load(BPF_PROG_TYPE_UNSPEC, prog,
 					prog_len * sizeof(struct bpf_insn),
-					"GPL");
+					"GPL", 0);
 
 		if (tests[i].result == ACCEPT) {
 			if (prog_fd < 0) {
diff --git a/samples/bpf/tracex1_kern.c b/samples/bpf/tracex1_kern.c
new file mode 100644
index 000000000000..31620463701a
--- /dev/null
+++ b/samples/bpf/tracex1_kern.c
@@ -0,0 +1,50 @@
+/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <uapi/linux/bpf.h>
+#include <linux/version.h>
+#include "bpf_helpers.h"
+
+#define _(P) ({typeof(P) val = 0; bpf_probe_read(&val, sizeof(val), &P); val;})
+
+/* kprobe is NOT a stable ABI
+ * kernel functions can be removed, renamed or completely change semantics.
+ * Number of arguments and their positions can change, etc.
+ * In such case this bpf+kprobe example will no longer be meaningful
+ */
+SEC("kprobe/__netif_receive_skb_core")
+int bpf_prog1(struct pt_regs *ctx)
+{
+	/* attaches to kprobe netif_receive_skb,
+	 * looks for packets on loopback device and prints them
+	 */
+	char devname[IFNAMSIZ] = {};
+	struct net_device *dev;
+	struct sk_buff *skb;
+	int len;
+
+	/* non-portable! works for the given kernel only */
+	skb = (struct sk_buff *) ctx->di;
+
+	dev = _(skb->dev);
+
+	len = _(skb->len);
+
+	bpf_probe_read(devname, sizeof(devname), dev->name);
+
+	if (devname[0] == 'l' && devname[1] == 'o') {
+		char fmt[] = "skb %p len %d\n";
+		/* using bpf_trace_printk() for DEBUG ONLY */
+		bpf_trace_printk(fmt, sizeof(fmt), skb, len);
+	}
+
+	return 0;
+}
+
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex1_user.c b/samples/bpf/tracex1_user.c
new file mode 100644
index 000000000000..31a48183beea
--- /dev/null
+++ b/samples/bpf/tracex1_user.c
@@ -0,0 +1,25 @@
+#include <stdio.h>
+#include <linux/bpf.h>
+#include <unistd.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+int main(int ac, char **argv)
+{
+	FILE *f;
+	char filename[256];
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	f = popen("taskset 1 ping -c5 localhost", "r");
+	(void) f;
+
+	read_trace_pipe();
+
+	return 0;
+}
diff --git a/samples/bpf/tracex2_kern.c b/samples/bpf/tracex2_kern.c
new file mode 100644
index 000000000000..19ec1cfc45db
--- /dev/null
+++ b/samples/bpf/tracex2_kern.c
@@ -0,0 +1,86 @@
+/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/version.h>
+#include <uapi/linux/bpf.h>
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") my_map = {
+	.type = BPF_MAP_TYPE_HASH,
+	.key_size = sizeof(long),
+	.value_size = sizeof(long),
+	.max_entries = 1024,
+};
+
+/* kprobe is NOT a stable ABI. If kernel internals change this bpf+kprobe
+ * example will no longer be meaningful
+ */
+SEC("kprobe/kfree_skb")
+int bpf_prog2(struct pt_regs *ctx)
+{
+	long loc = 0;
+	long init_val = 1;
+	long *value;
+
+	/* x64 specific: read ip of kfree_skb caller.
+	 * non-portable version of __builtin_return_address(0)
+	 */
+	bpf_probe_read(&loc, sizeof(loc), (void *)ctx->sp);
+
+	value = bpf_map_lookup_elem(&my_map, &loc);
+	if (value)
+		*value += 1;
+	else
+		bpf_map_update_elem(&my_map, &loc, &init_val, BPF_ANY);
+	return 0;
+}
+
+static unsigned int log2(unsigned int v)
+{
+	unsigned int r;
+	unsigned int shift;
+
+	r = (v > 0xFFFF) << 4; v >>= r;
+	shift = (v > 0xFF) << 3; v >>= shift; r |= shift;
+	shift = (v > 0xF) << 2; v >>= shift; r |= shift;
+	shift = (v > 0x3) << 1; v >>= shift; r |= shift;
+	r |= (v >> 1);
+	return r;
+}
+
+static unsigned int log2l(unsigned long v)
+{
+	unsigned int hi = v >> 32;
+	if (hi)
+		return log2(hi) + 32;
+	else
+		return log2(v);
+}
+
+struct bpf_map_def SEC("maps") my_hist_map = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(u32),
+	.value_size = sizeof(long),
+	.max_entries = 64,
+};
+
+SEC("kprobe/sys_write")
+int bpf_prog3(struct pt_regs *ctx)
+{
+	long write_size = ctx->dx; /* arg3 */
+	long init_val = 1;
+	long *value;
+	u32 index = log2l(write_size);
+
+	value = bpf_map_lookup_elem(&my_hist_map, &index);
+	if (value)
+		__sync_fetch_and_add(value, 1);
+	return 0;
+}
+char _license[] SEC("license") = "GPL";
+u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex2_user.c b/samples/bpf/tracex2_user.c
new file mode 100644
index 000000000000..91b8d0896fbb
--- /dev/null
+++ b/samples/bpf/tracex2_user.c
@@ -0,0 +1,95 @@
+#include <stdio.h>
+#include <unistd.h>
+#include <stdlib.h>
+#include <signal.h>
+#include <linux/bpf.h>
+#include "libbpf.h"
+#include "bpf_load.h"
+
+#define MAX_INDEX 64
+#define MAX_STARS 38
+
+static void stars(char *str, long val, long max, int width)
+{
+	int i;
+
+	for (i = 0; i < (width * val / max) - 1 && i < width - 1; i++)
+		str[i] = '*';
+	if (val > max)
+		str[i - 1] = '+';
+	str[i] = '\0';
+}
+
+static void print_hist(int fd)
+{
+	int key;
+	long value;
+	long data[MAX_INDEX] = {};
+	char starstr[MAX_STARS];
+	int i;
+	int max_ind = -1;
+	long max_value = 0;
+
+	for (key = 0; key < MAX_INDEX; key++) {
+		bpf_lookup_elem(fd, &key, &value);
+		data[key] = value;
+		if (value && key > max_ind)
+			max_ind = key;
+		if (value > max_value)
+			max_value = value;
+	}
+
+	printf("           syscall write() stats\n");
+	printf("     byte_size       : count     distribution\n");
+	for (i = 1; i <= max_ind + 1; i++) {
+		stars(starstr, data[i - 1], max_value, MAX_STARS);
+		printf("%8ld -> %-8ld : %-8ld |%-*s|\n",
+		       (1l << i) >> 1, (1l << i) - 1, data[i - 1],
+		       MAX_STARS, starstr);
+	}
+}
+static void int_exit(int sig)
+{
+	print_hist(map_fd[1]);
+	exit(0);
+}
+
+int main(int ac, char **argv)
+{
+	char filename[256];
+	long key, next_key, value;
+	FILE *f;
+	int i;
+
+	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
+
+	signal(SIGINT, int_exit);
+
+	/* start 'ping' in the background to have some kfree_skb events */
+	f = popen("ping -c5 localhost", "r");
+	(void) f;
+
+	/* start 'dd' in the background to have plenty of 'write' syscalls */
+	f = popen("dd if=/dev/zero of=/dev/null count=5000000", "r");
+	(void) f;
+
+	if (load_bpf_file(filename)) {
+		printf("%s", bpf_log_buf);
+		return 1;
+	}
+
+	for (i = 0; i < 5; i++) {
+		key = 0;
+		while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
+			bpf_lookup_elem(map_fd[0], &next_key, &value);
+			printf("location 0x%lx count %ld\n", next_key, value);
+			key = next_key;
+		}
+		if (key)
+			printf("\n");
+		sleep(1);
+	}
+	print_hist(map_fd[1]);
+
+	return 0;
+}
diff --git a/samples/bpf/tracex3_kern.c b/samples/bpf/tracex3_kern.c
new file mode 100644
index 000000000000..255ff2792366
--- /dev/null
+++ b/samples/bpf/tracex3_kern.c
@@ -0,0 +1,89 @@
1/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
2 *
3 * This program is free software; you can redistribute it and/or
4 * modify it under the terms of version 2 of the GNU General Public
5 * License as published by the Free Software Foundation.
6 */
7#include <linux/skbuff.h>
8#include <linux/netdevice.h>
9#include <linux/version.h>
10#include <uapi/linux/bpf.h>
11#include "bpf_helpers.h"
12
13struct bpf_map_def SEC("maps") my_map = {
14 .type = BPF_MAP_TYPE_HASH,
15 .key_size = sizeof(long),
16 .value_size = sizeof(u64),
17 .max_entries = 4096,
18};
19
20/* kprobe is NOT a stable ABI. If kernel internals change this bpf+kprobe
21 * example will no longer be meaningful
22 */
23SEC("kprobe/blk_mq_start_request")
24int bpf_prog1(struct pt_regs *ctx)
25{
26 long rq = ctx->di;
27 u64 val = bpf_ktime_get_ns();
28
29 bpf_map_update_elem(&my_map, &rq, &val, BPF_ANY);
30 return 0;
31}
32
33static unsigned int log2l(unsigned long long n)
34{
35#define S(k) if (n >= (1ull << k)) { i += k; n >>= k; }
36 int i = -(n == 0);
37 S(32); S(16); S(8); S(4); S(2); S(1);
38 return i;
39#undef S
40}
41
42#define SLOTS 100
43
44struct bpf_map_def SEC("maps") lat_map = {
45 .type = BPF_MAP_TYPE_ARRAY,
46 .key_size = sizeof(u32),
47 .value_size = sizeof(u64),
48 .max_entries = SLOTS,
49};
50
51SEC("kprobe/blk_update_request")
52int bpf_prog2(struct pt_regs *ctx)
53{
54 long rq = ctx->di;
55 u64 *value, l, base;
56 u32 index;
57
58 value = bpf_map_lookup_elem(&my_map, &rq);
59 if (!value)
60 return 0;
61
62 u64 cur_time = bpf_ktime_get_ns();
63 u64 delta = cur_time - *value;
64
65 bpf_map_delete_elem(&my_map, &rq);
66
67 /* the lines below are computing index = log10(delta)*10
68 * using integer arithmetic
69 * index = 29 ~ 1 usec
70 * index = 59 ~ 1 msec
71 * index = 89 ~ 1 sec
72 * index = 99 ~ 10 sec or more
73 * log10(x)*10 = log2(x)*10/log2(10) = log2(x)*3
74 */
75 l = log2l(delta);
76 base = 1ll << l;
77 index = (l * 64 + (delta - base) * 64 / base) * 3 / 64;
78
79 if (index >= SLOTS)
80 index = SLOTS - 1;
81
82 value = bpf_map_lookup_elem(&lat_map, &index);
83 if (value)
84 __sync_fetch_and_add((long *)value, 1);
85
86 return 0;
87}
88char _license[] SEC("license") = "GPL";
89u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex3_user.c b/samples/bpf/tracex3_user.c
new file mode 100644
index 000000000000..0aaa933ab938
--- /dev/null
+++ b/samples/bpf/tracex3_user.c
@@ -0,0 +1,150 @@
1/* Copyright (c) 2013-2015 PLUMgrid, http://plumgrid.com
2 *
3 * This program is free software; you can redistribute it and/or
4 * modify it under the terms of version 2 of the GNU General Public
5 * License as published by the Free Software Foundation.
6 */
7#include <stdio.h>
8#include <stdlib.h>
9#include <signal.h>
10#include <unistd.h>
11#include <stdbool.h>
12#include <string.h>
13#include <linux/bpf.h>
14#include "libbpf.h"
15#include "bpf_load.h"
16
17#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
18
19#define SLOTS 100
20
21static void clear_stats(int fd)
22{
23 __u32 key;
24 __u64 value = 0;
25
26 for (key = 0; key < SLOTS; key++)
27 bpf_update_elem(fd, &key, &value, BPF_ANY);
28}
29
30const char *color[] = {
31 "\033[48;5;255m",
32 "\033[48;5;252m",
33 "\033[48;5;250m",
34 "\033[48;5;248m",
35 "\033[48;5;246m",
36 "\033[48;5;244m",
37 "\033[48;5;242m",
38 "\033[48;5;240m",
39 "\033[48;5;238m",
40 "\033[48;5;236m",
41 "\033[48;5;234m",
42 "\033[48;5;232m",
43};
44const int num_colors = ARRAY_SIZE(color);
45
46const char nocolor[] = "\033[00m";
47
48const char *sym[] = {
49 " ",
50 " ",
51 ".",
52 ".",
53 "*",
54 "*",
55 "o",
56 "o",
57 "O",
58 "O",
59 "#",
60 "#",
61};
62
63bool full_range = false;
64bool text_only = false;
65
66static void print_banner(void)
67{
68 if (full_range)
69 printf("|1ns |10ns |100ns |1us |10us |100us"
70 " |1ms |10ms |100ms |1s |10s\n");
71 else
72 printf("|1us |10us |100us |1ms |10ms "
73 "|100ms |1s |10s\n");
74}
75
76static void print_hist(int fd)
77{
78 __u32 key;
79 __u64 value;
80 __u64 cnt[SLOTS];
81 __u64 max_cnt = 0;
82 __u64 total_events = 0;
83
84 for (key = 0; key < SLOTS; key++) {
85 value = 0;
86 bpf_lookup_elem(fd, &key, &value);
87 cnt[key] = value;
88 total_events += value;
89 if (value > max_cnt)
90 max_cnt = value;
91 }
92 clear_stats(fd);
93 for (key = full_range ? 0 : 29; key < SLOTS; key++) {
94 int c = num_colors * cnt[key] / (max_cnt + 1);
95
96 if (text_only)
97 printf("%s", sym[c]);
98 else
99 printf("%s %s", color[c], nocolor);
100 }
101 printf(" # %llu\n", total_events);
102}
103
104int main(int ac, char **argv)
105{
106 char filename[256];
107 int i;
108
109 snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
110
111 if (load_bpf_file(filename)) {
112 printf("%s", bpf_log_buf);
113 return 1;
114 }
115
116 for (i = 1; i < ac; i++) {
117 if (strcmp(argv[i], "-a") == 0) {
118 full_range = true;
119 } else if (strcmp(argv[i], "-t") == 0) {
120 text_only = true;
121 } else if (strcmp(argv[i], "-h") == 0) {
122 printf("Usage:\n"
123 " -a display wider latency range\n"
124 " -t text only\n");
125 return 1;
126 }
127 }
128
129 printf(" heatmap of IO latency\n");
130 if (text_only)
131 printf(" %s", sym[num_colors - 1]);
132 else
133 printf(" %s %s", color[num_colors - 1], nocolor);
134 printf(" - many events with this latency\n");
135
136 if (text_only)
137 printf(" %s", sym[0]);
138 else
139 printf(" %s %s", color[0], nocolor);
140 printf(" - few events\n");
141
142 for (i = 0; ; i++) {
143 if (i % 20 == 0)
144 print_banner();
145 print_hist(map_fd[1]);
146 sleep(2);
147 }
148
149 return 0;
150}
diff --git a/samples/bpf/tracex4_kern.c b/samples/bpf/tracex4_kern.c
new file mode 100644
index 000000000000..126b80512228
--- /dev/null
+++ b/samples/bpf/tracex4_kern.c
@@ -0,0 +1,54 @@
1/* Copyright (c) 2015 PLUMgrid, http://plumgrid.com
2 *
3 * This program is free software; you can redistribute it and/or
4 * modify it under the terms of version 2 of the GNU General Public
5 * License as published by the Free Software Foundation.
6 */
7#include <linux/ptrace.h>
8#include <linux/version.h>
9#include <uapi/linux/bpf.h>
10#include "bpf_helpers.h"
11
12struct pair {
13 u64 val;
14 u64 ip;
15};
16
17struct bpf_map_def SEC("maps") my_map = {
18 .type = BPF_MAP_TYPE_HASH,
19 .key_size = sizeof(long),
20 .value_size = sizeof(struct pair),
21 .max_entries = 1000000,
22};
23
24/* kprobe is NOT a stable ABI. If kernel internals change, this bpf+kprobe
25 * example will no longer be meaningful.
26 */
27SEC("kprobe/kmem_cache_free")
28int bpf_prog1(struct pt_regs *ctx)
29{
30 long ptr = ctx->si;
31
32 bpf_map_delete_elem(&my_map, &ptr);
33 return 0;
34}
35
36SEC("kretprobe/kmem_cache_alloc_node")
37int bpf_prog2(struct pt_regs *ctx)
38{
39 long ptr = ctx->ax;
40 long ip = 0;
41
42 /* get ip address of kmem_cache_alloc_node() caller */
43 bpf_probe_read(&ip, sizeof(ip), (void *)(ctx->bp + sizeof(ip)));
44
45 struct pair v = {
46 .val = bpf_ktime_get_ns(),
47 .ip = ip,
48 };
49
50 bpf_map_update_elem(&my_map, &ptr, &v, BPF_ANY);
51 return 0;
52}
53char _license[] SEC("license") = "GPL";
54u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex4_user.c b/samples/bpf/tracex4_user.c
new file mode 100644
index 000000000000..bc4a3bdea6ed
--- /dev/null
+++ b/samples/bpf/tracex4_user.c
@@ -0,0 +1,69 @@
1/* Copyright (c) 2015 PLUMgrid, http://plumgrid.com
2 *
3 * This program is free software; you can redistribute it and/or
4 * modify it under the terms of version 2 of the GNU General Public
5 * License as published by the Free Software Foundation.
6 */
7#include <stdio.h>
8#include <stdlib.h>
9#include <signal.h>
10#include <unistd.h>
11#include <stdbool.h>
12#include <string.h>
13#include <time.h>
14#include <linux/bpf.h>
15#include "libbpf.h"
16#include "bpf_load.h"
17
18struct pair {
19 long long val;
20 __u64 ip;
21};
22
23static __u64 time_get_ns(void)
24{
25 struct timespec ts;
26
27 clock_gettime(CLOCK_MONOTONIC, &ts);
28 return ts.tv_sec * 1000000000ull + ts.tv_nsec;
29}
30
31static void print_old_objects(int fd)
32{
33 long long val = time_get_ns();
34 __u64 key, next_key;
35 struct pair v;
36
37 key = write(1, "\e[1;1H\e[2J", 11); /* clear screen */
38
39 key = -1;
40 while (bpf_get_next_key(map_fd[0], &key, &next_key) == 0) {
41 bpf_lookup_elem(map_fd[0], &next_key, &v);
42 key = next_key;
43 if (val - v.val < 1000000000ll)
44 /* skip objects allocated less than 1 sec ago */
45 continue;
46 printf("obj 0x%llx is %2lld sec old, was allocated at ip 0x%llx\n",
47 next_key, (val - v.val) / 1000000000ll, v.ip);
48 }
49}
50
51int main(int ac, char **argv)
52{
53 char filename[256];
54 int i;
55
56 snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
57
58 if (load_bpf_file(filename)) {
59 printf("%s", bpf_log_buf);
60 return 1;
61 }
62
63 for (i = 0; ; i++) {
64 print_old_objects(map_fd[0]);
65 sleep(1);
66 }
67
68 return 0;
69}