author: Alexei Starovoitov <ast@plumgrid.com> 2014-06-10 11:44:07 -0400
committer: David S. Miller <davem@davemloft.net> 2014-06-11 18:39:18 -0400
commit: 783e327b69e24924055359a4e5779d04c052974a
tree: 0ef64e416793dd392c44e51c6db1e63c518bca11 /Documentation/networking
parent: e4ad403269ff0ecdfb137b2a72349c30941cec7a
net: filter: document internal instruction encoding
This patch adds a description of eBPF's instruction encoding in order to bring
the documentation in line with the implementation.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- Documentation/networking/filter.txt 161
1 files changed, 161 insertions, 0 deletions
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 1c7fc6baed84..ee78eba78a9d 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -834,6 +834,167 @@ loops and other CFG validation; second step starts from the first insn and
descends all possible paths. It simulates execution of every insn and observes
the state change of registers and stack.

eBPF opcode encoding
--------------------

eBPF reuses most of the opcode encoding from classic BPF to simplify the
conversion of classic BPF to eBPF. For arithmetic and jump instructions the
8-bit 'code' field is divided into three parts:

  +----------------+--------+--------------------+
  |   4 bits       |  1 bit |   3 bits           |
  | operation code | source | instruction class  |
  +----------------+--------+--------------------+
  (MSB)                                      (LSB)

The three LSB bits store the instruction class, which is one of:

  Classic BPF classes:    eBPF classes:

  BPF_LD    0x00          BPF_LD    0x00
  BPF_LDX   0x01          BPF_LDX   0x01
  BPF_ST    0x02          BPF_ST    0x02
  BPF_STX   0x03          BPF_STX   0x03
  BPF_ALU   0x04          BPF_ALU   0x04
  BPF_JMP   0x05          BPF_JMP   0x05
  BPF_RET   0x06          [ class 6 unused, for future if needed ]
  BPF_MISC  0x07          BPF_ALU64 0x07

When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ...

  BPF_K     0x00
  BPF_X     0x08

 * in classic BPF, this means:

  BPF_SRC(code) == BPF_X - use register X as source operand
  BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand

 * in eBPF, this means:

  BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
  BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand

... and four MSB bits store operation code.

If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:

  BPF_ADD   0x00
  BPF_SUB   0x10
  BPF_MUL   0x20
  BPF_DIV   0x30
  BPF_OR    0x40
  BPF_AND   0x50
  BPF_LSH   0x60
  BPF_RSH   0x70
  BPF_NEG   0x80
  BPF_MOD   0x90
  BPF_XOR   0xa0
  BPF_MOV   0xb0  /* eBPF only: mov reg to reg */
  BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
  BPF_END   0xd0  /* eBPF only: endianness conversion */

If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of:

  BPF_JA    0x00
  BPF_JEQ   0x10
  BPF_JGT   0x20
  BPF_JGE   0x30
  BPF_JSET  0x40
  BPF_JNE   0x50  /* eBPF only: jump != */
  BPF_JSGT  0x60  /* eBPF only: signed '>' */
  BPF_JSGE  0x70  /* eBPF only: signed '>=' */
  BPF_CALL  0x80  /* eBPF only: function call */
  BPF_EXIT  0x90  /* eBPF only: function return */

So BPF_ADD | BPF_X | BPF_ALU means 32-bit addition in both classic BPF
and eBPF. There are only two registers in classic BPF, so it means A += X.
In eBPF it means dst_reg = (u32) dst_reg + (u32) src_reg; similarly,
BPF_XOR | BPF_K | BPF_ALU means A ^= imm32 in classic BPF and the analogous
dst_reg = (u32) dst_reg ^ (u32) imm32 in eBPF.

Classic BPF uses the BPF_MISC class to represent A = X and X = A moves.
eBPF uses the BPF_MOV | BPF_X | BPF_ALU code instead. Since there are no
BPF_MISC operations in eBPF, class 7 is used as BPF_ALU64 to mean
exactly the same operations as BPF_ALU, but with 64-bit wide operands
instead. So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.:
dst_reg = dst_reg + src_reg

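The difference between the BPF_ALU and BPF_ALU64 forms of the same operation
can be sketched as plain C functions. The ex_* names are hypothetical and
registers are modelled as uint64_t values; the upper half of a BPF_ALU result
comes out as zero here simply because that is how C converts the 32-bit
expression given in the text.

```c
#include <assert.h>
#include <stdint.h>

/* BPF_ADD | BPF_X | BPF_ALU: dst_reg = (u32) dst_reg + (u32) src_reg */
static uint64_t ex_alu32_add(uint64_t dst_reg, uint64_t src_reg)
{
	return (uint32_t)((uint32_t)dst_reg + (uint32_t)src_reg);
}

/* BPF_ADD | BPF_X | BPF_ALU64: dst_reg = dst_reg + src_reg */
static uint64_t ex_alu64_add(uint64_t dst_reg, uint64_t src_reg)
{
	return dst_reg + src_reg;
}

/* BPF_XOR | BPF_K | BPF_ALU: 32-bit XOR of a register with imm32 */
static uint64_t ex_alu32_xor_imm(uint64_t dst_reg, int32_t imm32)
{
	return (uint32_t)dst_reg ^ (uint32_t)imm32;
}

/* Usage: the same inputs wrap at 32 bits but carry at 64 bits. */
static void ex_alu_demo(void)
{
	assert(ex_alu32_add(0xffffffffULL, 1) == 0);
	assert(ex_alu64_add(0xffffffffULL, 1) == 0x100000000ULL);
}
```
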
Classic BPF wastes the whole BPF_RET class to represent a single 'ret'
operation. Classic BPF_RET | BPF_K means copy imm32 into the return register
and perform function exit. eBPF is modeled to match a CPU, so
BPF_JMP | BPF_EXIT in eBPF means function exit only. The eBPF program needs
to store the return value into register R0 before doing a BPF_EXIT. Class 6
in eBPF is currently unused and reserved for future use.

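A toy interpreter loop may help to show why R0 has to be written before
BPF_EXIT. This is a sketch under several assumptions and not the kernel's
interpreter: struct ex_insn is a hypothetical record invented for the example,
eleven registers (R0 - R10) are assumed, imm32 is assumed to be sign-extended
for 64-bit operations, and only BPF_MOV, BPF_ADD and BPF_EXIT are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_SRC(code)   ((code) & 0x08)
#define BPF_OP(code)    ((code) & 0xf0)

#define BPF_JMP		0x05
#define BPF_ALU64	0x07
#define BPF_K		0x00
#define BPF_X		0x08
#define BPF_ADD		0x00
#define BPF_MOV		0xb0
#define BPF_EXIT	0x90

#define EX_NREGS	11	/* assumption: R0 - R10 */

/* Hypothetical instruction record, for this sketch only. */
struct ex_insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int32_t imm;
};

/*
 * Run 'prog' until BPF_JMP | BPF_EXIT. On success return 0 and store R0
 * in *ret; return -1 on an unsupported opcode or a missing BPF_EXIT.
 */
static int ex_run(const struct ex_insn *prog, size_t len, uint64_t *ret)
{
	uint64_t reg[EX_NREGS] = { 0 };
	size_t pc;

	for (pc = 0; pc < len; pc++) {
		const struct ex_insn *insn = &prog[pc];
		uint64_t src;

		if (BPF_CLASS(insn->code) == BPF_JMP &&
		    BPF_OP(insn->code) == BPF_EXIT) {
			*ret = reg[0];	/* exit only: R0 was set earlier */
			return 0;
		}
		if (BPF_CLASS(insn->code) != BPF_ALU64 ||
		    insn->dst_reg >= EX_NREGS || insn->src_reg >= EX_NREGS)
			return -1;

		/* assumption: imm32 is sign-extended for 64-bit operations */
		src = BPF_SRC(insn->code) == BPF_X ?
			reg[insn->src_reg] : (uint64_t)(int64_t)insn->imm;

		switch (BPF_OP(insn->code)) {
		case BPF_MOV:
			reg[insn->dst_reg] = src;
			break;
		case BPF_ADD:
			reg[insn->dst_reg] += src;
			break;
		default:
			return -1;
		}
	}
	return -1;
}

/* Usage: R0 = 40; R1 = 2; R0 += R1; exit */
static void ex_exit_demo(void)
{
	const struct ex_insn prog[] = {
		{ BPF_MOV | BPF_K | BPF_ALU64, 0, 0, 40 },
		{ BPF_MOV | BPF_K | BPF_ALU64, 1, 0, 2 },
		{ BPF_ADD | BPF_X | BPF_ALU64, 0, 1, 0 },
		{ BPF_JMP | BPF_EXIT, 0, 0, 0 },
	};
	uint64_t r0 = 0;

	assert(ex_run(prog, sizeof(prog) / sizeof(prog[0]), &r0) == 0);
	assert(r0 == 42);
	/* Without the final BPF_EXIT the sketch reports an error. */
	assert(ex_run(prog, 3, &r0) == -1);
}
```

The exit instruction carries no operand of its own; the value handed back is
whatever the earlier instructions left in R0.
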
For load and store instructions the 8-bit 'code' field is divided as:

  +--------+--------+-------------------+
  | 3 bits | 2 bits |   3 bits          |
  |  mode  |  size  | instruction class |
  +--------+--------+-------------------+
  (MSB)                             (LSB)

Size modifier is one of ...

  BPF_W   0x00    /* word */
  BPF_H   0x08    /* half word */
  BPF_B   0x10    /* byte */
  BPF_DW  0x18    /* eBPF only, double word */

... which encodes size of load/store operation:

 B  - 1 byte
 H  - 2 byte
 W  - 4 byte
 DW - 8 byte (eBPF only)

Mode modifier is one of:

  BPF_IMM  0x00  /* classic BPF only, reserved in eBPF */
  BPF_ABS  0x20
  BPF_IND  0x40
  BPF_MEM  0x60
  BPF_LEN  0x80  /* classic BPF only, reserved in eBPF */
  BPF_MSH  0xa0  /* classic BPF only, reserved in eBPF */
  BPF_XADD 0xc0  /* eBPF only, exclusive add */

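The mode/size/class split for load and store opcodes can be sketched the same
way. BPF_SIZE() and BPF_MODE() are defined locally here by analogy with
BPF_CLASS(), with masks derived from the bit layout above, and ex_size_bytes()
is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_CLASS(code) ((code) & 0x07)	/* three LSB bits */
#define BPF_SIZE(code)  ((code) & 0x18)	/* two middle bits */
#define BPF_MODE(code)  ((code) & 0xe0)	/* three MSB bits */

#define BPF_LD		0x00
#define BPF_LDX		0x01
#define BPF_STX		0x03
#define BPF_W		0x00
#define BPF_H		0x08
#define BPF_B		0x10
#define BPF_DW		0x18
#define BPF_IND		0x40
#define BPF_MEM		0x60
#define BPF_XADD	0xc0

/* Hypothetical helper: access width in bytes for a load/store opcode. */
static unsigned int ex_size_bytes(uint8_t code)
{
	switch (BPF_SIZE(code)) {
	case BPF_B:
		return 1;
	case BPF_H:
		return 2;
	case BPF_W:
		return 4;
	default:
		return 8;	/* BPF_DW, the only remaining 2-bit value */
	}
}

/* Usage: take BPF_XADD | BPF_DW | BPF_STX apart again. */
static void ex_ldst_demo(void)
{
	uint8_t code = BPF_XADD | BPF_DW | BPF_STX;	/* 0xdb */

	assert(BPF_MODE(code) == BPF_XADD);
	assert(BPF_CLASS(code) == BPF_STX);
	assert(ex_size_bytes(code) == 8);
	assert(ex_size_bytes(BPF_IND | BPF_W | BPF_LD) == 4);
}
```
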
eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.

They had to be carried over from classic BPF so that socket filters keep
their performance when running in the eBPF interpreter. These instructions
can only be used when the interpreter context is a pointer to
'struct sk_buff', and they have seven implicit operands. Register R6 is an
implicit input that must contain a pointer to the sk_buff. Register R0 is an
implicit output which contains the data fetched from the packet. Registers
R1-R5 are scratch registers and must not be used to store data across
BPF_ABS | BPF_LD or BPF_IND | BPF_LD instructions.

These instructions have an implicit program exit condition as well. When an
eBPF program tries to access data beyond the packet boundary, the
interpreter aborts the execution of the program. JIT compilers therefore
must preserve this property. The src_reg and imm32 fields are explicit
inputs to these instructions.

For example:

  BPF_IND | BPF_W | BPF_LD means:

    R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
    and R1 - R5 were scratched.

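The bounds check and byte-order conversion implied by this example can be
sketched in user-space C. struct ex_packet is a hypothetical stand-in for the
data reachable through R6, not 'struct sk_buff', and the -1 return value
stands for the point where the interpreter would abort the program. As a
simplification, the sketch treats every negative offset as out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the packet data reachable through R6. */
struct ex_packet {
	const uint8_t *data;
	size_t len;
};

/*
 * Sketch of BPF_IND | BPF_W | BPF_LD. Returns 0 and stores the fetched
 * word in *r0, or -1 where the interpreter would abort the program.
 */
static int ex_ld_ind_w(const struct ex_packet *pkt, int32_t src_reg,
		       int32_t imm32, uint64_t *r0)
{
	int64_t off = (int64_t)src_reg + imm32;
	const uint8_t *p;

	if (off < 0 || (uint64_t)off + 4 > pkt->len)
		return -1;

	p = pkt->data + off;
	/* Assembling the bytes MSB first equals ntohl() of the raw word. */
	*r0 = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	      ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	return 0;
}

/* Usage: two loads inside a 6-byte packet, one past its end. */
static void ex_ld_demo(void)
{
	static const uint8_t bytes[] = { 0xde, 0xad, 0xbe, 0xef, 0x01, 0x02 };
	const struct ex_packet pkt = { bytes, sizeof(bytes) };
	uint64_t r0 = 0;

	assert(ex_ld_ind_w(&pkt, 0, 0, &r0) == 0 && r0 == 0xdeadbeef);
	assert(ex_ld_ind_w(&pkt, 1, 1, &r0) == 0 && r0 == 0xbeef0102);
	assert(ex_ld_ind_w(&pkt, 2, 1, &r0) == -1);	/* needs bytes 3..6 */
}
```
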
Unlike the classic BPF instruction set, eBPF has generic load/store operations:

BPF_MEM | <size> | BPF_STX:  *(size *) (dst_reg + off) = src_reg
BPF_MEM | <size> | BPF_ST:   *(size *) (dst_reg + off) = imm32
BPF_MEM | <size> | BPF_LDX:  dst_reg = *(size *) (src_reg + off)
BPF_XADD | BPF_W  | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg

Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
2 byte atomic increments are not supported.

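The BPF_MEM store and load forms can be modelled with memcpy() on a byte
buffer. This is a sketch only: the ex_* helpers are hypothetical, the size is
passed in bytes instead of as a BPF_<size> modifier, loads are assumed to
zero-extend into the 64-bit register, and the atomic BPF_XADD forms are not
covered.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* BPF_MEM | <size> | BPF_STX: store the low 'size' bytes of src_reg. */
static void ex_stx_mem(uint8_t *dst_reg, int16_t off, uint64_t src_reg,
		       unsigned int size)
{
	uint8_t b = (uint8_t)src_reg;
	uint16_t h = (uint16_t)src_reg;
	uint32_t w = (uint32_t)src_reg;

	switch (size) {
	case 1: memcpy(dst_reg + off, &b, 1); break;
	case 2: memcpy(dst_reg + off, &h, 2); break;
	case 4: memcpy(dst_reg + off, &w, 4); break;
	case 8: memcpy(dst_reg + off, &src_reg, 8); break;
	default: assert(!"size must be 1, 2, 4 or 8");
	}
}

/* BPF_MEM | <size> | BPF_LDX: load 'size' bytes, zero-extended. */
static uint64_t ex_ldx_mem(const uint8_t *src_reg, int16_t off,
			   unsigned int size)
{
	uint8_t b;
	uint16_t h;
	uint32_t w;
	uint64_t dw;

	switch (size) {
	case 1: memcpy(&b, src_reg + off, 1); return b;
	case 2: memcpy(&h, src_reg + off, 2); return h;
	case 4: memcpy(&w, src_reg + off, 4); return w;
	case 8: memcpy(&dw, src_reg + off, 8); return dw;
	default: assert(!"size must be 1, 2, 4 or 8");
	}
	return 0;
}

/* Usage: store below a hypothetical frame pointer and load back. */
static void ex_mem_demo(void)
{
	uint8_t stack[16] = { 0 };
	uint8_t *fp = stack + sizeof(stack);

	ex_stx_mem(fp, -8, 0x1122334455667788ULL, 8);
	assert(ex_ldx_mem(fp, -8, 8) == 0x1122334455667788ULL);
	ex_stx_mem(fp, -16, 0x1122334455667788ULL, 2);
	assert(ex_ldx_mem(fp, -16, 2) == 0x7788);
}
```

Narrow stores keep only the low-order bytes of the register, which is why the
2-byte round trip above reads back 0x7788.
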
Testing
-------
