author Alexei Starovoitov <ast@fb.com> 2016-05-05 22:49:13 -0400
committer David S. Miller <davem@davemloft.net> 2016-05-06 16:01:54 -0400
commit f9c8d19d6c7c15a59963f80ec47e68808914abd4
tree aa1ef70115c0fb206623a613956412f3aa330cec /Documentation/networking
parent db58ba45920255e967cc1d62a430cebd634b5046
bpf: add documentation for 'direct packet access'
explain how verifier checks safety of packet access
and update email addresses.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- Documentation/networking/filter.txt 85
1 files changed, 83 insertions, 2 deletions
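
The range rule that the 'Direct packet access' section in the diff below
describes can be modelled in a few lines of user-space C. This is only a
sketch under stated assumptions: struct pkt_reg and the pkt_* helpers are
hypothetical names that mirror the pkt(id,off,r) notation of the verifier
log; they are not the data structures or functions in kernel/bpf/verifier.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a packet pointer as the verifier log prints it:
 * pkt(id, off, r). Illustration only, not kernel code.
 */
struct pkt_reg {
	uint32_t id;    /* changes whenever a variable is added to the pointer */
	uint32_t off;   /* sum of the constants added to the pointer */
	uint32_t range; /* r: bytes proven to lie below skb->data_end */
};

/* 'rX += imm' on a packet pointer: only the constant offset changes. */
static struct pkt_reg pkt_add_const(struct pkt_reg reg, uint32_t imm)
{
	reg.off += imm;
	return reg;
}

/* Fall-through of 'if (ptr > pkt_end) goto err', where ptr has the given
 * id and constant offset checked_off: every packet register with the same
 * id is now known to have checked_off safe bytes.
 */
static void pkt_mark_checked(struct pkt_reg *regs, int nregs,
			     uint32_t id, uint32_t checked_off)
{
	for (int i = 0; i < nregs; i++)
		if (regs[i].id == id)
			regs[i].range = checked_off;
}

/* '*(size *)(reg + insn_off)' is safe only if the whole access stays
 * inside the proven range.
 */
static bool pkt_access_ok(const struct pkt_reg *reg, int32_t insn_off,
			  uint32_t size)
{
	int64_t start = (int64_t)reg->off + insn_off;

	return start >= 0 && start + size <= reg->range;
}

/* Replays insns 3-6 of the first example: regs[0] is R3, regs[1] is R5. */
static void pkt_model_example(void)
{
	struct pkt_reg regs[2] = { { 0, 0, 0 }, { 0, 0, 0 } };

	regs[1] = pkt_add_const(regs[0], 14);               /* r5 = r3; r5 += 14 */
	pkt_mark_checked(regs, 2, regs[1].id, regs[1].off); /* insn 5 */

	assert(pkt_access_ok(&regs[0], 12, 2));  /* insn 6: bytes 12 and 13 */
	assert(!pkt_access_ok(&regs[0], 13, 2)); /* would touch byte 14 */
	assert(!pkt_access_ok(&regs[1], 0, 1));  /* R5 has zero safe bytes */
}
```

Calling pkt_model_example() reproduces the R3=pkt(id=0,off=0,r=14) and
R5=pkt(id=0,off=14,r=14) states: the same range value yields 14 accessible
bytes through R3 and none through R5, because the constant offset is
subtracted from it.
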
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 96da119a47e7..6aef0b5f3bc7 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -1095,6 +1095,87 @@ all use cases.
 
 See details of eBPF verifier in kernel/bpf/verifier.c
 
+Direct packet access
+--------------------
+In cls_bpf and act_bpf programs the verifier allows direct access to the packet
+data via skb->data and skb->data_end pointers.
+Ex:
+1:  r4 = *(u32 *)(r1 +80)  /* load skb->data_end */
+2:  r3 = *(u32 *)(r1 +76)  /* load skb->data */
+3:  r5 = r3
+4:  r5 += 14
+5:  if r5 > r4 goto pc+16
+R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
+6:  r0 = *(u16 *)(r3 +12) /* access bytes 12 and 13 of the packet */
+
+This 2-byte load from the packet is safe to do, since the program author
+did check 'if (skb->data + 14 > skb->data_end) goto err' at insn #5 which
+means that in the fall-through case the register R3 (which points to skb->data)
+has at least 14 directly accessible bytes. The verifier marks it
+as R3=pkt(id=0,off=0,r=14).
+id=0 means that no additional variables were added to the register.
+off=0 means that no additional constants were added.
+r=14 is the range of safe access which means that bytes [R3, R3 + 14) are ok.
+Note that R5 is marked as R5=pkt(id=0,off=14,r=14). It also points
+to the packet data, but constant 14 was added to the register, so
+it now points to 'skb->data + 14' and accessible range is [R5, R5 + 14 - 14)
+which is zero bytes.
+
+More complex packet access may look like:
+ R0=imm1 R1=ctx R3=pkt(id=0,off=0,r=14) R4=pkt_end R5=pkt(id=0,off=14,r=14) R10=fp
+ 6:  r0 = *(u8 *)(r3 +7) /* load 7th byte from the packet */
+ 7:  r4 = *(u8 *)(r3 +12)
+ 8:  r4 *= 14
+ 9:  r3 = *(u32 *)(r1 +76) /* load skb->data */
+10:  r3 += r4
+11:  r2 = r1
+12:  r2 <<= 48
+13:  r2 >>= 48
+14:  r3 += r2
+15:  r2 = r3
+16:  r2 += 8
+17:  r1 = *(u32 *)(r1 +80) /* load skb->data_end */
+18:  if r2 > r1 goto pc+2
+ R0=inv56 R1=pkt_end R2=pkt(id=2,off=8,r=8) R3=pkt(id=2,off=0,r=8) R4=inv52 R5=pkt(id=0,off=14,r=14) R10=fp
+19:  r1 = *(u8 *)(r3 +4)
+The state of the register R3 is R3=pkt(id=2,off=0,r=8)
+id=2 means that two 'r3 += rX' instructions were seen, so r3 points to some
+offset within a packet and since the program author did
+'if (r3 + 8 > r1) goto err' at insn #18, the safe range is [R3, R3 + 8).
+The verifier only allows 'add' operation on packet registers. Any other
+operation will set the register state to 'unknown_value' and it won't be
+available for direct packet access.
+Operation 'r3 += rX' may overflow and become less than original skb->data,
+therefore the verifier has to prevent that. So it tracks the number of
+upper zero bits in all 'unknown_value' registers, so when it sees
+'r3 += rX' instruction and rX is more than 16-bit value, it will error as:
+"cannot add integer value with N upper zero bits to ptr_to_packet"
+Ex. after insn 'r4 = *(u8 *)(r3 +12)' (insn #7 above) the state of r4 is
+R4=inv56 which means that upper 56 bits of the register are guaranteed
+to be zero. After insn 'r4 *= 14' the state becomes R4=inv52, since
+multiplying 8-bit value by constant 14 will keep upper 52 bits as zero.
+Similarly 'r2 >>= 48' will make R2=inv48, since the shift is not sign
+extending. This logic is implemented in evaluate_reg_alu() function.
+
+The end result is that the bpf program author can access the packet directly
+using normal C code as:
+  void *data = (void *)(long)skb->data;
+  void *data_end = (void *)(long)skb->data_end;
+  struct eth_hdr *eth = data;
+  struct iphdr *iph = data + sizeof(*eth);
+  struct udphdr *udp = data + sizeof(*eth) + sizeof(*iph);
+
+  if (data + sizeof(*eth) + sizeof(*iph) + sizeof(*udp) > data_end)
+          return 0;
+  if (eth->h_proto != htons(ETH_P_IP))
+          return 0;
+  if (iph->protocol != IPPROTO_UDP || iph->ihl != 5)
+          return 0;
+  if (udp->dest == htons(53) || udp->source == htons(9))
+          ...;
+which makes such programs easier to write compared to LD_ABS insn
+and significantly faster.
+
 eBPF maps
 ---------
 'maps' is a generic storage of different types for sharing data between kernel
@@ -1293,5 +1374,5 @@ to give potential BPF hackers or security auditors a better overview of
 the underlying architecture.
 
 Jay Schulist <jschlst@samba.org>
-Daniel Borkmann <dborkman@redhat.com>
-Alexei Starovoitov <ast@plumgrid.com>
+Daniel Borkmann <daniel@iogearbox.net>
+Alexei Starovoitov <ast@kernel.org>
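
The upper-zero-bits bookkeeping described in the 'Direct packet access'
hunk above (inv56, inv52, inv48) can be illustrated with a small user-space
model. This is a sketch under assumptions, not the logic of
evaluate_reg_alu(): all function names are hypothetical, and the only rules
taken from the text are that a u8 load gives inv56, that 'r4 *= 14' gives
inv52, that 'r2 >>= 48' gives inv48, and that a value wider than 16 bits
may not be added to a packet pointer.

```c
#include <assert.h>
#include <stdbool.h>

/* zbits is the N in 'invN': the number of upper bits of a 64-bit register
 * that are guaranteed to be zero. Hypothetical helpers, illustration only.
 */

/* A zero-extending load of size_bytes leaves the remaining upper bits zero. */
static int zbits_after_load(int size_bytes)
{
	return 64 - 8 * size_bytes;
}

/* Number of bits needed to represent c (0 for c == 0). */
static int bit_width(unsigned long long c)
{
	int width = 0;

	while (c) {
		width++;
		c >>= 1;
	}
	return width;
}

/* Multiplying by a constant can grow the value by at most bit_width(c) bits. */
static int zbits_after_mul_const(int zbits, unsigned long long c)
{
	int left = zbits - bit_width(c);

	return left < 0 ? 0 : left;
}

/* A left shift consumes known zero bits from the top. */
static int zbits_after_lsh(int zbits, int shift)
{
	int left = zbits - shift;

	return left < 0 ? 0 : left;
}

/* A logical (not sign-extending) right shift adds zero bits at the top. */
static int zbits_after_rsh(int zbits, int shift)
{
	int total = zbits + shift;

	return total > 64 ? 64 : total;
}

/* Only values that fit in 16 bits may be added to a packet pointer. */
static bool may_add_to_packet_ptr(int zbits)
{
	return zbits >= 48;
}

/* Replays insns 7, 8, 12 and 13 of the second example. */
static void zero_bits_example(void)
{
	int r4 = zbits_after_load(1);        /* insn 7: u8 load, inv56 */
	assert(r4 == 56);
	r4 = zbits_after_mul_const(r4, 14);  /* insn 8: r4 *= 14, inv52 */
	assert(r4 == 52);

	int r2 = zbits_after_lsh(0, 48);     /* insn 12: nothing known */
	r2 = zbits_after_rsh(r2, 48);        /* insn 13: r2 >>= 48, inv48 */
	assert(r2 == 48);

	assert(may_add_to_packet_ptr(r4));   /* insn 10 is accepted */
	assert(may_add_to_packet_ptr(r2));   /* insn 14 is accepted */
	assert(!may_add_to_packet_ptr(zbits_after_load(4))); /* u32: inv32 */
}
```

In this model a raw u32 load (inv32) would be rejected as an addend, which
is why the example program narrows r2 with the shift pair before
'r3 += r2'.
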