-rw-r--r-- | Documentation/bpf/bpf_design_QA.txt | 156 | ||||
-rw-r--r-- | MAINTAINERS | 1 |
2 files changed, 157 insertions, 0 deletions
diff --git a/Documentation/bpf/bpf_design_QA.txt b/Documentation/bpf/bpf_design_QA.txt new file mode 100644 index 000000000000..f3e458a0bb2f --- /dev/null +++ b/Documentation/bpf/bpf_design_QA.txt | |||
@@ -0,0 +1,156 @@ | |||
1 | BPF extensibility and applicability to networking, tracing, security | ||
2 | in the Linux kernel and several user space implementations of the BPF | ||
3 | virtual machine have led to a number of misunderstandings about what BPF | ||
4 | actually is. This short QA is an attempt to address that and outline | ||
5 | the direction in which BPF is heading long term. | ||
6 | |||
7 | Q: Is BPF a generic instruction set similar to x64 and arm64? | ||
8 | A: NO. | ||
9 | |||
10 | Q: Is BPF a generic virtual machine? | ||
11 | A: NO. | ||
12 | |||
13 | BPF is a generic instruction set _with_ the C calling convention. | ||
14 | |||
15 | Q: Why was the C calling convention chosen? | ||
16 | A: Because BPF programs are designed to run in the Linux kernel, | ||
17 | which is written in C. Hence BPF defines an instruction set compatible | ||
18 | with the two most used architectures, x64 and arm64 (and takes into | ||
19 | consideration important quirks of other architectures), and | ||
20 | defines a calling convention that is compatible with the C calling | ||
21 | convention of the Linux kernel on those architectures. | ||
22 | |||
23 | Q: Can multiple return values be supported in the future? | ||
24 | A: NO. BPF allows only register R0 to be used as the return value. | ||
25 | |||
26 | Q: Can more than 5 function arguments be supported in the future? | ||
27 | A: NO. The BPF calling convention only allows registers R1-R5 to be used | ||
28 | as arguments. BPF is not a standalone instruction set | ||
29 | (unlike the x64 ISA, which allows msft, cdecl and other conventions). | ||
30 | |||
31 | Q: Can BPF programs access the instruction pointer or return address? | ||
32 | A: NO. | ||
33 | |||
34 | Q: Can BPF programs access the stack pointer? | ||
35 | A: NO. Only the frame pointer (register R10) is accessible. | ||
36 | From the compiler's point of view it is necessary to have a stack pointer. | ||
37 | For example, LLVM defines register R11 as the stack pointer in its | ||
38 | BPF backend, but it makes sure that generated code never uses it. | ||
39 | |||
40 | Q: Does the C calling convention diminish possible use cases? | ||
41 | A: YES. BPF design forces addition of major functionality in the form | ||
42 | of kernel helper functions and kernel objects like BPF maps with | ||
43 | seamless interoperability between them. It lets the kernel call into | ||
44 | BPF programs and programs call kernel helpers with zero overhead, | ||
45 | as if all of them were native C code. That is particularly the case | ||
46 | for JITed BPF programs, which are indistinguishable from | ||
47 | native kernel C code. | ||
48 | |||
49 | Q: Does it mean that 'innovative' extensions to BPF code are disallowed? | ||
50 | A: Soft yes. At least for now until BPF core has support for | ||
51 | bpf-to-bpf calls, indirect calls, loops, global variables, | ||
52 | jump tables, read only sections and all other normal constructs | ||
53 | that C code can produce. | ||
54 | |||
55 | Q: Can loops be supported in a safe way? | ||
56 | A: It's not clear yet. BPF developers are trying to find a way to | ||
57 | support bounded loops where the verifier can guarantee that | ||
58 | the program terminates in less than 4096 instructions. | ||
59 | |||
60 | Q: How come LD_ABS and LD_IND instructions are present in BPF whereas | ||
61 | C code cannot express them and has to use builtin intrinsics? | ||
62 | A: This is an artifact of compatibility with classic BPF. Modern | ||
63 | networking code in BPF performs better without them. | ||
64 | See 'direct packet access'. | ||
65 | |||
66 | Q: It seems not all BPF instructions are one-to-one to native CPU. | ||
67 | For example, why are BPF_JNE and other compare and jumps not CPU-like? | ||
68 | A: This was necessary to avoid introducing flags into the ISA, which are | ||
69 | impossible to make generic and efficient across CPU architectures. | ||
70 | |||
71 | Q: Why doesn't the BPF_DIV instruction map to x64 div? | ||
72 | A: Because a one-to-one relationship to x64 would have made it | ||
73 | more complicated to support on arm64 and other archs. Also it | ||
74 | needs a div-by-zero runtime check. | ||
75 | |||
76 | Q: Why is there no BPF_SDIV for the signed divide operation? | ||
77 | A: Because it would be rarely used. LLVM errors in such a case and | ||
78 | prints a suggestion to use unsigned divide instead. | ||
79 | |||
80 | Q: Why does BPF have an implicit prologue and epilogue? | ||
81 | A: Because architectures like sparc have register windows and in general | ||
82 | there are enough subtle differences between architectures, so naively | ||
83 | storing the return address on the stack won't work. Another reason is | ||
84 | that BPF has to be safe from division by zero (and the legacy exception | ||
85 | path of the LD_ABS insn). Those instructions need to invoke the epilogue | ||
86 | and return implicitly. | ||
87 | |||
88 | Q: Why were BPF_JLT and BPF_JLE instructions not introduced in the beginning? | ||
89 | A: Because classic BPF didn't have them and BPF authors felt that a compiler | ||
90 | workaround would be acceptable. It turned out that programs lose performance | ||
91 | due to the lack of these compare instructions, so they were added. | ||
92 | These two instructions are a perfect example of the kind of new BPF | ||
93 | instructions that are acceptable and can be added in the future. | ||
94 | These two already had equivalent instructions in native CPUs. | ||
95 | New instructions that don't have a one-to-one mapping to HW instructions | ||
96 | will not be accepted. | ||
97 | |||
98 | Q: BPF 32-bit subregisters have a requirement to zero the upper 32 bits of BPF | ||
99 | registers, which makes BPF an inefficient virtual machine for 32-bit | ||
100 | CPU architectures and 32-bit HW accelerators. Can true 32-bit registers | ||
101 | be added to BPF in the future? | ||
102 | A: NO. The first step to improve performance on 32-bit archs is to teach | ||
103 | LLVM to generate code that uses 32-bit subregisters. The second step | ||
104 | is to teach the verifier to mark operations where zeroing the upper bits | ||
105 | is unnecessary. Then JITs can take advantage of those markings and | ||
106 | drastically reduce the size of generated code and improve performance. | ||
107 | |||
108 | Q: Does BPF have a stable ABI? | ||
109 | A: YES. BPF instructions, arguments to BPF programs, the set of helper | ||
110 | functions and their arguments, and recognized return codes are all part | ||
111 | of the ABI. However, when tracing programs use the bpf_probe_read() helper | ||
112 | to walk kernel internal data structures and are compiled with kernel | ||
113 | internal headers, these accesses can and will break with newer | ||
114 | kernels. The union bpf_attr -> kern_version is checked at load time | ||
115 | to prevent accidentally loading kprobe-based bpf programs written | ||
116 | for a different kernel. Networking programs don't do the kern_version check. | ||
117 | |||
118 | Q: How much stack space does a BPF program use? | ||
119 | A: Currently all program types are limited to 512 bytes of stack | ||
120 | space, but the verifier computes the actual amount of stack used | ||
121 | and both the interpreter and most JITed code consume the necessary amount. | ||
122 | |||
123 | Q: Can BPF be offloaded to HW? | ||
124 | A: YES. BPF HW offload is supported by the NFP driver. | ||
125 | |||
126 | Q: Does the classic BPF interpreter still exist? | ||
127 | A: NO. Classic BPF programs are converted into extended BPF instructions. | ||
128 | |||
129 | Q: Can BPF call arbitrary kernel functions? | ||
130 | A: NO. BPF programs can only call a set of helper functions which | ||
131 | is defined for every program type. | ||
132 | |||
133 | Q: Can BPF overwrite arbitrary kernel memory? | ||
134 | A: NO. Tracing bpf programs can _read_ arbitrary memory with bpf_probe_read() | ||
135 | and bpf_probe_read_str() helpers. Networking programs cannot read | ||
136 | arbitrary memory, since they don't have access to these helpers. | ||
137 | Programs can never read or write arbitrary memory directly. | ||
138 | |||
139 | Q: Can BPF overwrite arbitrary user memory? | ||
140 | A: Sort of. Tracing BPF programs can overwrite the user memory | ||
141 | of the current task with bpf_probe_write_user(). Every time such a | ||
142 | program is loaded the kernel will print a warning message, so | ||
143 | this helper is only useful for experiments and prototypes. | ||
144 | Tracing BPF programs are root only. | ||
145 | |||
146 | Q: When the bpf_trace_printk() helper is used the kernel prints a nasty | ||
147 | warning message. Why is that? | ||
148 | A: This is done to nudge program authors towards better interfaces when | ||
149 | programs need to pass data to user space. For example, bpf_perf_event_output() | ||
150 | can be used to efficiently stream data via the perf ring buffer, and | ||
151 | BPF maps can be used for asynchronous data sharing between kernel | ||
152 | and user space. bpf_trace_printk() should only be used for debugging. | ||
153 | |||
154 | Q: Can BPF functionality such as new program or map types, new | ||
155 | helpers, etc. be added from kernel module code? | ||
156 | A: NO. | ||
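
Note on the 32-bit subregister question in bpf_design_QA.txt: the
zero-extension requirement can be illustrated with a small user-space model.
This is only a sketch. The type bpf_reg_t and the functions alu32_add() and
alu64_add() are hypothetical names made up for this illustration and are not
kernel or LLVM APIs. It assumes that a 32-bit ALU operation computes on the
low 32 bits and leaves the upper 32 bits of the 64-bit destination cleared,
as the question describes.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit BPF register (not a kernel type). */
typedef uint64_t bpf_reg_t;

/*
 * Model of a 32-bit add on a subregister: only the low 32 bits take part
 * in the addition (wrapping modulo 2^32), and the result is widened back
 * to 64 bits with the upper 32 bits zeroed.
 */
static bpf_reg_t alu32_add(bpf_reg_t dst, bpf_reg_t src)
{
	uint32_t lo = (uint32_t)dst + (uint32_t)src;

	return (bpf_reg_t)lo;	/* zero-extension, upper half is cleared */
}

/* Model of the 64-bit add, for comparison: all 64 bits take part. */
static bpf_reg_t alu64_add(bpf_reg_t dst, bpf_reg_t src)
{
	return dst + src;
}
```

On a 32-bit CPU the cleared upper half has to be materialized with an extra
store or move unless something proves that nobody reads it, which is the
verifier marking the answer refers to.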
diff --git a/MAINTAINERS b/MAINTAINERS index c4d21b302409..66471f7d77d4 100644 --- a/MAINTAINERS +++ b/MAINTAINERS | |||
@@ -2713,6 +2713,7 @@ L: linux-kernel@vger.kernel.org | |||
2713 | S: Supported | 2713 | S: Supported |
2714 | F: arch/x86/net/bpf_jit* | 2714 | F: arch/x86/net/bpf_jit* |
2715 | F: Documentation/networking/filter.txt | 2715 | F: Documentation/networking/filter.txt |
2716 | F: Documentation/bpf/ | ||
2716 | F: include/linux/bpf* | 2717 | F: include/linux/bpf* |
2717 | F: include/linux/filter.h | 2718 | F: include/linux/filter.h |
2718 | F: include/uapi/linux/bpf* | 2719 | F: include/uapi/linux/bpf* |
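
Note on the BPF_DIV and implicit epilogue answers in bpf_design_QA.txt above:
the div-by-zero runtime check can be sketched in user-space C. This is a toy
model, not interpreter or JIT code. The names struct toy_div_result and
toy_div64() are hypothetical, and the choice of 0 as the program's return
value on a zero divisor is an assumption of this sketch, since the text only
says that the instruction invokes the epilogue and returns implicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result of executing one unsigned 64-bit divide. */
struct toy_div_result {
	bool exited;	/* true: the program left through the implicit epilogue */
	uint64_t value;	/* quotient, or the program's return value if exited */
};

/*
 * Model of an unsigned divide with a runtime check: a zero divisor never
 * reaches the hardware divide, the program returns instead.
 */
static struct toy_div_result toy_div64(uint64_t dst, uint64_t src)
{
	struct toy_div_result res;

	if (src == 0) {
		res.exited = true;
		res.value = 0;	/* assumed return value, see the note above */
		return res;
	}

	res.exited = false;
	res.value = dst / src;
	return res;
}
```

A plain x64 div would fault on a zero divisor, so a one-to-one mapping is
not possible without this extra check in front of it.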