diff options
author | Alexei Starovoitov <ast@plumgrid.com> | 2014-09-26 03:17:02 -0400 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2014-09-26 15:05:14 -0400 |
commit | 51580e798cb61b0fc63fa3aa6c5c975375aa0550 (patch) | |
tree | 2b608f048ba6415a28be79135af26f28ba7ebd5b /Documentation/networking | |
parent | 0a542a86d73b1577e7d4f55fc95dcffd3fe62643 (diff) |
bpf: verifier (add docs)
this patch adds all of eBPF verfier documentation and empty bpf_check()
The end goal for the verifier is to statically check safety of the program.
Verifier will catch:
- loops
- out of range jumps
- unreachable instructions
- invalid instructions
- uninitialized register access
- uninitialized stack access
- misaligned stack access
- out of range stack access
- invalid calling convention
More details in Documentation/networking/filter.txt
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/filter.txt | 224 |
1 files changed, 224 insertions, 0 deletions
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index 4a01d71785e9..5ce4d07406a5 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt | |||
@@ -1001,6 +1001,99 @@ instruction that loads 64-bit immediate value into a dst_reg. | |||
1001 | Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads | 1001 | Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM which loads |
1002 | 32-bit immediate value into a register. | 1002 | 32-bit immediate value into a register. |
1003 | 1003 | ||
1004 | eBPF verifier | ||
1005 | ------------- | ||
1006 | The safety of the eBPF program is determined in two steps. | ||
1007 | |||
1008 | First step does DAG check to disallow loops and other CFG validation. | ||
1009 | In particular it will detect programs that have unreachable instructions. | ||
1010 | (though classic BPF checker allows them) | ||
1011 | |||
1012 | Second step starts from the first insn and descends all possible paths. | ||
1013 | It simulates execution of every insn and observes the state change of | ||
1014 | registers and stack. | ||
1015 | |||
1016 | At the start of the program the register R1 contains a pointer to context | ||
1017 | and has type PTR_TO_CTX. | ||
1018 | If verifier sees an insn that does R2=R1, then R2 has now type | ||
1019 | PTR_TO_CTX as well and can be used on the right hand side of expression. | ||
1020 | If R1=PTR_TO_CTX and insn is R2=R1+R1, then R2=UNKNOWN_VALUE, | ||
1021 | since addition of two valid pointers makes invalid pointer. | ||
1022 | (In 'secure' mode verifier will reject any type of pointer arithmetic to make | ||
1023 | sure that kernel addresses don't leak to unprivileged users) | ||
1024 | |||
1025 | If register was never written to, it's not readable: | ||
1026 | bpf_mov R0 = R2 | ||
1027 | bpf_exit | ||
1028 | will be rejected, since R2 is unreadable at the start of the program. | ||
1029 | |||
1030 | After kernel function call, R1-R5 are reset to unreadable and | ||
1031 | R0 has a return type of the function. | ||
1032 | |||
1033 | Since R6-R9 are callee saved, their state is preserved across the call. | ||
1034 | bpf_mov R6 = 1 | ||
1035 | bpf_call foo | ||
1036 | bpf_mov R0 = R6 | ||
1037 | bpf_exit | ||
1038 | is a correct program. If there was R1 instead of R6, it would have | ||
1039 | been rejected. | ||
1040 | |||
1041 | load/store instructions are allowed only with registers of valid types, which | ||
1042 | are PTR_TO_CTX, PTR_TO_MAP, FRAME_PTR. They are bounds and alignment checked. | ||
1043 | For example: | ||
1044 | bpf_mov R1 = 1 | ||
1045 | bpf_mov R2 = 2 | ||
1046 | bpf_xadd *(u32 *)(R1 + 3) += R2 | ||
1047 | bpf_exit | ||
1048 | will be rejected, since R1 doesn't have a valid pointer type at the time of | ||
1049 | execution of instruction bpf_xadd. | ||
1050 | |||
1051 | At the start R1 type is PTR_TO_CTX (a pointer to generic 'struct bpf_context') | ||
1052 | A callback is used to customize verifier to restrict eBPF program access to only | ||
1053 | certain fields within ctx structure with specified size and alignment. | ||
1054 | |||
1055 | For example, the following insn: | ||
1056 | bpf_ld R0 = *(u32 *)(R6 + 8) | ||
1057 | intends to load a word from address R6 + 8 and store it into R0 | ||
1058 | If R6=PTR_TO_CTX, via is_valid_access() callback the verifier will know | ||
1059 | that offset 8 of size 4 bytes can be accessed for reading, otherwise | ||
1060 | the verifier will reject the program. | ||
1061 | If R6=FRAME_PTR, then access should be aligned and be within | ||
1062 | stack bounds, which are [-MAX_BPF_STACK, 0). In this example offset is 8, | ||
1063 | so it will fail verification, since it's out of bounds. | ||
1064 | |||
1065 | The verifier will allow eBPF program to read data from stack only after | ||
1066 | it wrote into it. | ||
1067 | Classic BPF verifier does similar check with M[0-15] memory slots. | ||
1068 | For example: | ||
1069 | bpf_ld R0 = *(u32 *)(R10 - 4) | ||
1070 | bpf_exit | ||
1071 | is invalid program. | ||
1072 | Though R10 is correct read-only register and has type FRAME_PTR | ||
1073 | and R10 - 4 is within stack bounds, there were no stores into that location. | ||
1074 | |||
1075 | Pointer register spill/fill is tracked as well, since four (R6-R9) | ||
1076 | callee saved registers may not be enough for some programs. | ||
1077 | |||
1078 | Allowed function calls are customized with bpf_verifier_ops->get_func_proto() | ||
1079 | The eBPF verifier will check that registers match argument constraints. | ||
1080 | After the call register R0 will be set to return type of the function. | ||
1081 | |||
1082 | Function calls is a main mechanism to extend functionality of eBPF programs. | ||
1083 | Socket filters may let programs to call one set of functions, whereas tracing | ||
1084 | filters may allow completely different set. | ||
1085 | |||
1086 | If a function made accessible to eBPF program, it needs to be thought through | ||
1087 | from safety point of view. The verifier will guarantee that the function is | ||
1088 | called with valid arguments. | ||
1089 | |||
1090 | seccomp vs socket filters have different security restrictions for classic BPF. | ||
1091 | Seccomp solves this by two stage verifier: classic BPF verifier is followed | ||
1092 | by seccomp verifier. In case of eBPF one configurable verifier is shared for | ||
1093 | all use cases. | ||
1094 | |||
1095 | See details of eBPF verifier in kernel/bpf/verifier.c | ||
1096 | |||
1004 | eBPF maps | 1097 | eBPF maps |
1005 | --------- | 1098 | --------- |
1006 | 'maps' is a generic storage of different types for sharing data between kernel | 1099 | 'maps' is a generic storage of different types for sharing data between kernel |
@@ -1040,6 +1133,137 @@ The map is defined by: | |||
1040 | . key size in bytes | 1133 | . key size in bytes |
1041 | . value size in bytes | 1134 | . value size in bytes |
1042 | 1135 | ||
1136 | Understanding eBPF verifier messages | ||
1137 | ------------------------------------ | ||
1138 | |||
1139 | The following are few examples of invalid eBPF programs and verifier error | ||
1140 | messages as seen in the log: | ||
1141 | |||
1142 | Program with unreachable instructions: | ||
1143 | static struct bpf_insn prog[] = { | ||
1144 | BPF_EXIT_INSN(), | ||
1145 | BPF_EXIT_INSN(), | ||
1146 | }; | ||
1147 | Error: | ||
1148 | unreachable insn 1 | ||
1149 | |||
1150 | Program that reads uninitialized register: | ||
1151 | BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), | ||
1152 | BPF_EXIT_INSN(), | ||
1153 | Error: | ||
1154 | 0: (bf) r0 = r2 | ||
1155 | R2 !read_ok | ||
1156 | |||
1157 | Program that doesn't initialize R0 before exiting: | ||
1158 | BPF_MOV64_REG(BPF_REG_2, BPF_REG_1), | ||
1159 | BPF_EXIT_INSN(), | ||
1160 | Error: | ||
1161 | 0: (bf) r2 = r1 | ||
1162 | 1: (95) exit | ||
1163 | R0 !read_ok | ||
1164 | |||
1165 | Program that accesses stack out of bounds: | ||
1166 | BPF_ST_MEM(BPF_DW, BPF_REG_10, 8, 0), | ||
1167 | BPF_EXIT_INSN(), | ||
1168 | Error: | ||
1169 | 0: (7a) *(u64 *)(r10 +8) = 0 | ||
1170 | invalid stack off=8 size=8 | ||
1171 | |||
1172 | Program that doesn't initialize stack before passing its address into function: | ||
1173 | BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), | ||
1174 | BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), | ||
1175 | BPF_LD_MAP_FD(BPF_REG_1, 0), | ||
1176 | BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), | ||
1177 | BPF_EXIT_INSN(), | ||
1178 | Error: | ||
1179 | 0: (bf) r2 = r10 | ||
1180 | 1: (07) r2 += -8 | ||
1181 | 2: (b7) r1 = 0x0 | ||
1182 | 3: (85) call 1 | ||
1183 | invalid indirect read from stack off -8+0 size 8 | ||
1184 | |||
1185 | Program that uses invalid map_fd=0 while calling to map_lookup_elem() function: | ||
1186 | BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), | ||
1187 | BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), | ||
1188 | BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), | ||
1189 | BPF_LD_MAP_FD(BPF_REG_1, 0), | ||
1190 | BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), | ||
1191 | BPF_EXIT_INSN(), | ||
1192 | Error: | ||
1193 | 0: (7a) *(u64 *)(r10 -8) = 0 | ||
1194 | 1: (bf) r2 = r10 | ||
1195 | 2: (07) r2 += -8 | ||
1196 | 3: (b7) r1 = 0x0 | ||
1197 | 4: (85) call 1 | ||
1198 | fd 0 is not pointing to valid bpf_map | ||
1199 | |||
1200 | Program that doesn't check return value of map_lookup_elem() before accessing | ||
1201 | map element: | ||
1202 | BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), | ||
1203 | BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), | ||
1204 | BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), | ||
1205 | BPF_LD_MAP_FD(BPF_REG_1, 0), | ||
1206 | BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), | ||
1207 | BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0), | ||
1208 | BPF_EXIT_INSN(), | ||
1209 | Error: | ||
1210 | 0: (7a) *(u64 *)(r10 -8) = 0 | ||
1211 | 1: (bf) r2 = r10 | ||
1212 | 2: (07) r2 += -8 | ||
1213 | 3: (b7) r1 = 0x0 | ||
1214 | 4: (85) call 1 | ||
1215 | 5: (7a) *(u64 *)(r0 +0) = 0 | ||
1216 | R0 invalid mem access 'map_value_or_null' | ||
1217 | |||
1218 | Program that correctly checks map_lookup_elem() returned value for NULL, but | ||
1219 | accesses the memory with incorrect alignment: | ||
1220 | BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), | ||
1221 | BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), | ||
1222 | BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), | ||
1223 | BPF_LD_MAP_FD(BPF_REG_1, 0), | ||
1224 | BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), | ||
1225 | BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), | ||
1226 | BPF_ST_MEM(BPF_DW, BPF_REG_0, 4, 0), | ||
1227 | BPF_EXIT_INSN(), | ||
1228 | Error: | ||
1229 | 0: (7a) *(u64 *)(r10 -8) = 0 | ||
1230 | 1: (bf) r2 = r10 | ||
1231 | 2: (07) r2 += -8 | ||
1232 | 3: (b7) r1 = 1 | ||
1233 | 4: (85) call 1 | ||
1234 | 5: (15) if r0 == 0x0 goto pc+1 | ||
1235 | R0=map_ptr R10=fp | ||
1236 | 6: (7a) *(u64 *)(r0 +4) = 0 | ||
1237 | misaligned access off 4 size 8 | ||
1238 | |||
1239 | Program that correctly checks map_lookup_elem() returned value for NULL and | ||
1240 | accesses memory with correct alignment in one side of 'if' branch, but fails | ||
1241 | to do so in the other side of 'if' branch: | ||
1242 | BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), | ||
1243 | BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), | ||
1244 | BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), | ||
1245 | BPF_LD_MAP_FD(BPF_REG_1, 0), | ||
1246 | BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem), | ||
1247 | BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), | ||
1248 | BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0), | ||
1249 | BPF_EXIT_INSN(), | ||
1250 | BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 1), | ||
1251 | BPF_EXIT_INSN(), | ||
1252 | Error: | ||
1253 | 0: (7a) *(u64 *)(r10 -8) = 0 | ||
1254 | 1: (bf) r2 = r10 | ||
1255 | 2: (07) r2 += -8 | ||
1256 | 3: (b7) r1 = 1 | ||
1257 | 4: (85) call 1 | ||
1258 | 5: (15) if r0 == 0x0 goto pc+2 | ||
1259 | R0=map_ptr R10=fp | ||
1260 | 6: (7a) *(u64 *)(r0 +0) = 0 | ||
1261 | 7: (95) exit | ||
1262 | |||
1263 | from 5 to 8: R0=imm0 R10=fp | ||
1264 | 8: (7a) *(u64 *)(r0 +0) = 1 | ||
1265 | R0 invalid mem access 'imm' | ||
1266 | |||
1043 | Testing | 1267 | Testing |
1044 | ------- | 1268 | ------- |
1045 | 1269 | ||