author | Alexei Starovoitov <ast@plumgrid.com> | 2014-06-10 11:44:07 -0400 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2014-06-11 18:39:18 -0400 |
commit | 783e327b69e24924055359a4e5779d04c052974a (patch) | |
tree | 0ef64e416793dd392c44e51c6db1e63c518bca11 /Documentation/networking | |
parent | e4ad403269ff0ecdfb137b2a72349c30941cec7a (diff) |
net: filter: document internal instruction encoding
This patch adds a description of eBPF's instruction encoding in order
to bring the documentation in line with the implementation.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/filter.txt | 161 |
1 files changed, 161 insertions, 0 deletions
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 1c7fc6baed84..ee78eba78a9d 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -834,6 +834,167 @@ loops and other CFG validation; second step starts from the first insn and | |||
834 | descends all possible paths. It simulates execution of every insn and observes | 834 | descends all possible paths. It simulates execution of every insn and observes |
835 | the state change of registers and stack. | 835 | the state change of registers and stack. |
836 | 836 | ||
837 | eBPF opcode encoding | ||
838 | -------------------- | ||
839 | |||
840 | eBPF reuses most of the opcode encoding of classic BPF to simplify conversion | ||
841 | of classic BPF to eBPF. For arithmetic and jump instructions the 8-bit 'code' | ||
842 | field is divided into three parts: | ||
843 | |||
844 | +----------------+--------+--------------------+ | ||
845 | | 4 bits | 1 bit | 3 bits | | ||
846 | | operation code | source | instruction class | | ||
847 | +----------------+--------+--------------------+ | ||
848 | (MSB) (LSB) | ||
849 | |||
850 | Three LSB bits store instruction class which is one of: | ||
851 | |||
852 | Classic BPF classes: eBPF classes: | ||
853 | |||
854 | BPF_LD 0x00 BPF_LD 0x00 | ||
855 | BPF_LDX 0x01 BPF_LDX 0x01 | ||
856 | BPF_ST 0x02 BPF_ST 0x02 | ||
857 | BPF_STX 0x03 BPF_STX 0x03 | ||
858 | BPF_ALU 0x04 BPF_ALU 0x04 | ||
859 | BPF_JMP 0x05 BPF_JMP 0x05 | ||
860 | BPF_RET 0x06 [ class 6 unused, for future if needed ] | ||
861 | BPF_MISC 0x07 BPF_ALU64 0x07 | ||
862 | |||
863 | When BPF_CLASS(code) == BPF_ALU or BPF_JMP, 4th bit encodes source operand ... | ||
864 | |||
865 | BPF_K 0x00 | ||
866 | BPF_X 0x08 | ||
867 | |||
868 | * in classic BPF, this means: | ||
869 | |||
870 | BPF_SRC(code) == BPF_X - use register X as source operand | ||
871 | BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand | ||
872 | |||
873 | * in eBPF, this means: | ||
874 | |||
875 | BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand | ||
876 | BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand | ||
877 | |||
878 | ... and four MSB bits store operation code. | ||
879 | |||
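As a minimal sketch of the layout above, the three fields of an arithmetic or jump 'code' byte can be separated with plain bit masks. The mask values follow the 4/1/3-bit split described here. The macros and constants are repeated locally so the fragment compiles on its own, and struct alu_jmp_fields and split_code() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks for the 4-bit op / 1-bit source / 3-bit class split. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_SRC(code)   ((code) & 0x08)
#define BPF_OP(code)    ((code) & 0xf0)

/* A few values from the tables in this section. */
#define BPF_ALU 0x04
#define BPF_JMP 0x05
#define BPF_K   0x00
#define BPF_X   0x08
#define BPF_ADD 0x00
#define BPF_JEQ 0x10

/* Composing the fields is a plain bitwise OR. */
static_assert((BPF_ADD | BPF_X | BPF_ALU) == 0x0c, "ADD | X | ALU");
static_assert((BPF_JEQ | BPF_K | BPF_JMP) == 0x15, "JEQ | K | JMP");

/* Hypothetical helper type: the three parts of one 'code' byte. */
struct alu_jmp_fields {
	uint8_t op;	/* four MSBs, still in place (e.g. 0x10) */
	uint8_t src;	/* BPF_K or BPF_X */
	uint8_t cls;	/* three LSBs */
};

static inline struct alu_jmp_fields split_code(uint8_t code)
{
	struct alu_jmp_fields f = {
		.op  = BPF_OP(code),
		.src = BPF_SRC(code),
		.cls = BPF_CLASS(code),
	};
	return f;
}
```

The field values are kept unshifted, so they compare directly against the constants listed in the tables.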
880 | If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of: | ||
881 | |||
882 | BPF_ADD 0x00 | ||
883 | BPF_SUB 0x10 | ||
884 | BPF_MUL 0x20 | ||
885 | BPF_DIV 0x30 | ||
886 | BPF_OR 0x40 | ||
887 | BPF_AND 0x50 | ||
888 | BPF_LSH 0x60 | ||
889 | BPF_RSH 0x70 | ||
890 | BPF_NEG 0x80 | ||
891 | BPF_MOD 0x90 | ||
892 | BPF_XOR 0xa0 | ||
893 | BPF_MOV 0xb0 /* eBPF only: mov reg to reg */ | ||
894 | BPF_ARSH 0xc0 /* eBPF only: sign extending shift right */ | ||
895 | BPF_END 0xd0 /* eBPF only: endianness conversion */ | ||
896 | |||
897 | If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of: | ||
898 | |||
899 | BPF_JA 0x00 | ||
900 | BPF_JEQ 0x10 | ||
901 | BPF_JGT 0x20 | ||
902 | BPF_JGE 0x30 | ||
903 | BPF_JSET 0x40 | ||
904 | BPF_JNE 0x50 /* eBPF only: jump != */ | ||
905 | BPF_JSGT 0x60 /* eBPF only: signed '>' */ | ||
906 | BPF_JSGE 0x70 /* eBPF only: signed '>=' */ | ||
907 | BPF_CALL 0x80 /* eBPF only: function call */ | ||
908 | BPF_EXIT 0x90 /* eBPF only: function return */ | ||
909 | |||
910 | So BPF_ADD | BPF_X | BPF_ALU means 32-bit addition in both classic BPF | ||
911 | and eBPF. There are only two registers in classic BPF, so it means A += X. | ||
912 | In eBPF it means dst_reg = (u32) dst_reg + (u32) src_reg; similarly, | ||
913 | BPF_XOR | BPF_K | BPF_ALU means A ^= imm32 in classic BPF and analogous | ||
914 | dst_reg = (u32) dst_reg ^ (u32) imm32 in eBPF. | ||
915 | |||
916 | Classic BPF uses the BPF_MISC class to represent A = X and X = A moves. | ||
917 | eBPF uses the BPF_MOV | BPF_X | BPF_ALU code instead. Since there are no | ||
918 | BPF_MISC operations in eBPF, the class 7 is used as BPF_ALU64 to mean | ||
919 | exactly the same operations as BPF_ALU, but with 64-bit wide operands | ||
920 | instead. So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.: | ||
921 | dst_reg = dst_reg + src_reg | ||
922 | |||
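The difference between the two ALU classes can be illustrated with a small sketch that models a register as a 64-bit value. The function names alu32_add() and alu64_add() are hypothetical. Returning the 32-bit sum in a 64-bit value, and so leaving the upper half zero, is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_ALU   0x04
#define BPF_ALU64 0x07
#define BPF_X     0x08
#define BPF_ADD   0x00

/* The same operation and source bits, differing only in the class. */
static_assert((BPF_ADD | BPF_X | BPF_ALU)   == 0x0c, "32-bit add");
static_assert((BPF_ADD | BPF_X | BPF_ALU64) == 0x0f, "64-bit add");

/* BPF_ADD | BPF_X | BPF_ALU: dst_reg = (u32) dst_reg + (u32) src_reg */
static inline uint64_t alu32_add(uint64_t dst_reg, uint64_t src_reg)
{
	/* Unsigned 32-bit arithmetic wraps modulo 2^32. */
	uint32_t sum = (uint32_t)dst_reg + (uint32_t)src_reg;

	return sum;
}

/* BPF_ADD | BPF_X | BPF_ALU64: dst_reg = dst_reg + src_reg */
static inline uint64_t alu64_add(uint64_t dst_reg, uint64_t src_reg)
{
	return dst_reg + src_reg;
}
```

With dst_reg = 0xffffffff and src_reg = 1, the 32-bit form wraps to 0 while the 64-bit form carries into bit 32.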
923 | Classic BPF wastes the whole BPF_RET class to represent a single 'ret' | ||
924 | operation. Classic BPF_RET | BPF_K means copy imm32 into return register | ||
925 | and perform function exit. eBPF is modeled to match a CPU, so BPF_JMP | BPF_EXIT | ||
926 | in eBPF means function exit only. The eBPF program needs to store return | ||
927 | value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is currently | ||
928 | unused and reserved for future use. | ||
929 | |||
930 | For load and store instructions the 8-bit 'code' field is divided as: | ||
931 | |||
932 | +--------+--------+-------------------+ | ||
933 | | 3 bits | 2 bits | 3 bits | | ||
934 | | mode | size | instruction class | | ||
935 | +--------+--------+-------------------+ | ||
936 | (MSB) (LSB) | ||
937 | |||
938 | Size modifier is one of ... | ||
939 | |||
940 | BPF_W 0x00 /* word */ | ||
941 | BPF_H 0x08 /* half word */ | ||
942 | BPF_B 0x10 /* byte */ | ||
943 | BPF_DW 0x18 /* eBPF only, double word */ | ||
944 | |||
945 | ... which encodes size of load/store operation: | ||
946 | |||
947 | B - 1 byte | ||
948 | H - 2 bytes | ||
949 | W - 4 bytes | ||
950 | DW - 8 bytes (eBPF only) | ||
951 | |||
952 | Mode modifier is one of: | ||
953 | |||
954 | BPF_IMM 0x00 /* classic BPF only, reserved in eBPF */ | ||
955 | BPF_ABS 0x20 | ||
956 | BPF_IND 0x40 | ||
957 | BPF_MEM 0x60 | ||
958 | BPF_LEN 0x80 /* classic BPF only, reserved in eBPF */ | ||
959 | BPF_MSH 0xa0 /* classic BPF only, reserved in eBPF */ | ||
960 | BPF_XADD 0xc0 /* eBPF only, exclusive add */ | ||
961 | |||
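A load/store 'code' byte can be taken apart in the same way as an arithmetic one. The sketch below assumes masks that match the 3/2/3-bit split shown above. BPF_SIZE() and BPF_MODE() are defined locally by analogy with BPF_CLASS(), and ldst_size_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Field masks for the 3-bit mode / 2-bit size / 3-bit class split. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_SIZE(code)  ((code) & 0x18)
#define BPF_MODE(code)  ((code) & 0xe0)

#define BPF_LD  0x00
#define BPF_STX 0x03

#define BPF_W   0x00
#define BPF_H   0x08
#define BPF_B   0x10
#define BPF_DW  0x18

#define BPF_IND 0x40
#define BPF_MEM 0x60

static_assert((BPF_MEM | BPF_DW | BPF_STX) == 0x7b, "MEM | DW | STX");
static_assert((BPF_IND | BPF_W | BPF_LD) == 0x40, "IND | W | LD");

/* Returns the access width in bytes encoded in a load/store code. */
static inline unsigned int ldst_size_bytes(uint8_t code)
{
	switch (BPF_SIZE(code)) {
	case BPF_W:
		return 4;
	case BPF_H:
		return 2;
	case BPF_B:
		return 1;
	case BPF_DW:
		return 8;
	}
	return 0;	/* not reached: all four 2-bit patterns are handled */
}
```

Note that the size values are not ordered by width: BPF_W is 0x00 and BPF_B is 0x10, so a lookup is needed rather than a shift.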
962 | eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and | ||
963 | (BPF_IND | <size> | BPF_LD) which are used to access packet data. | ||
964 | |||
965 | They had to be carried over from classic to have strong performance of | ||
966 | socket filters running in eBPF interpreter. These instructions can only | ||
967 | be used when interpreter context is a pointer to 'struct sk_buff' and | ||
968 | have seven implicit operands. Register R6 is an implicit input that must | ||
969 | contain pointer to sk_buff. Register R0 is an implicit output which contains | ||
970 | the data fetched from the packet. Registers R1-R5 are scratch registers | ||
971 | and must not be used to store the data across BPF_ABS | BPF_LD or | ||
972 | BPF_IND | BPF_LD instructions. | ||
973 | |||
974 | These instructions have implicit program exit condition as well. When | ||
975 | eBPF program is trying to access the data beyond the packet boundary, | ||
976 | the interpreter will abort the execution of the program. JIT compilers | ||
977 | therefore must preserve this property. src_reg and imm32 fields are | ||
978 | explicit inputs to these instructions. | ||
979 | |||
980 | For example: | ||
981 | |||
982 | BPF_IND | BPF_W | BPF_LD means: | ||
983 | |||
984 | R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32)) | ||
985 | and R1 - R5 are scratched. | ||
986 | |||
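The load and its implicit exit condition can be modeled outside the kernel with a rough sketch. struct pkt is a hypothetical stand-in for the packet data reached through R6, and ld_ind_w() is an illustrative name. Treating negative offsets as out of bounds is an assumption of this sketch, not a statement about the interpreter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the packet bytes reachable through R6. */
struct pkt {
	const uint8_t *data;
	size_t len;
};

/*
 * Models BPF_IND | BPF_W | BPF_LD: fetch a 32-bit word in network byte
 * order from data + src_reg + imm32 into *r0. Returns false when the
 * access falls outside the packet; the caller is then expected to
 * abort the program.
 */
static inline bool ld_ind_w(const struct pkt *p, uint32_t src_reg,
			    int32_t imm32, uint32_t *r0)
{
	int64_t off = (int64_t)src_reg + imm32;

	assert(p != NULL && r0 != NULL);

	/* Sketch assumption: negative offsets count as out of bounds. */
	if (off < 0 || (uint64_t)off + 4 > p->len)
		return false;

	const uint8_t *b = p->data + off;

	/* Assemble big-endian bytes, which is what ntohl() achieves. */
	*r0 = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	      ((uint32_t)b[2] << 8) | (uint32_t)b[3];
	return true;
}
```

Assembling the word byte by byte keeps the sketch independent of host endianness and of alignment at the computed offset.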
987 | Unlike classic BPF instruction set, eBPF has generic load/store operations: | ||
988 | |||
989 | BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg | ||
990 | BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32 | ||
991 | BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off) | ||
992 | BPF_XADD | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg | ||
993 | BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg | ||
994 | |||
995 | Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and | ||
996 | 2 byte atomic increments are not supported. | ||
997 | |||
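The generic store and the word-sized atomic add listed above can be sketched in user space as follows. stx_mem() and xadd_w() are hypothetical names, 'off' is modeled as a signed 16-bit offset, and __atomic_fetch_add is a GCC/Clang builtin used here only to imitate the 'lock xadd' behaviour. None of this is taken from the interpreter itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BPF_W  0x00
#define BPF_H  0x08
#define BPF_B  0x10
#define BPF_DW 0x18

/* BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg */
static inline bool stx_mem(uint8_t size, void *dst_reg, int16_t off,
			   uint64_t src_reg)
{
	assert(dst_reg != NULL);

	uint8_t *addr = (uint8_t *)dst_reg + off;
	uint8_t b = (uint8_t)src_reg;	/* truncate to the access width */
	uint16_t h = (uint16_t)src_reg;
	uint32_t w = (uint32_t)src_reg;

	switch (size) {
	case BPF_B:
		memcpy(addr, &b, sizeof(b));
		return true;
	case BPF_H:
		memcpy(addr, &h, sizeof(h));
		return true;
	case BPF_W:
		memcpy(addr, &w, sizeof(w));
		return true;
	case BPF_DW:
		memcpy(addr, &src_reg, sizeof(src_reg));
		return true;
	}
	return false;	/* not a valid size modifier */
}

/* BPF_XADD | BPF_W | BPF_STX: atomic *(u32 *)(dst_reg + off) += src_reg */
static inline void xadd_w(void *dst_reg, int16_t off, uint32_t src_reg)
{
	/* Caller must pass a 4-byte aligned target address. */
	uint32_t *addr = (uint32_t *)((uint8_t *)dst_reg + off);

	__atomic_fetch_add(addr, src_reg, __ATOMIC_SEQ_CST);
}
```

Only the 4-byte atomic form is shown; an 8-byte variant would follow the same pattern with uint64_t, in line with the note that 1 and 2 byte atomic increments are not supported.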
837 | Testing | 998 | Testing |
838 | ------- | 999 | ------- |
839 | 1000 | ||