net: filter: document internal instruction encoding

This patch adds a description of eBPF's instruction encoding in order to bring
the documentation in line with the implementation.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
author: Alexei Starovoitov <ast@plumgrid.com> 2014-06-10 11:44:07 -0400
committer: David S. Miller <davem@davemloft.net> 2014-06-11 18:39:18 -0400
commit: 783e327b69e24924055359a4e5779d04c052974a (patch)
tree: 0ef64e416793dd392c44e51c6db1e63c518bca11 /Documentation/networking
parent: e4ad403269ff0ecdfb137b2a72349c30941cec7a (diff)
1 files changed, 161 insertions, 0 deletions
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 1c7fc6baed84..ee78eba78a9d 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -834,6 +834,167 @@ loops and other CFG validation; second step starts from the first insn and
 descends all possible paths. It simulates execution of every insn and observes
 the state change of registers and stack.
+eBPF opcode encoding
+--------------------
+eBPF reuses most of the opcode encoding from classic BPF to simplify conversion
+of classic BPF to eBPF. For arithmetic and jump instructions the 8-bit 'code'
+field is divided into three parts:
+  +----------------+--------+--------------------+
+  |   4 bits       |  1 bit |   3 bits           |
+  | operation code | source | instruction class  |
+  +----------------+--------+--------------------+
+  (MSB)                                      (LSB)
+The three LSB bits store the instruction class, which is one of:
+  Classic BPF classes:    eBPF classes:
+  BPF_LD    0x00          BPF_LD    0x00
+  BPF_LDX   0x01          BPF_LDX   0x01
+  BPF_ST    0x02          BPF_ST    0x02
+  BPF_STX   0x03          BPF_STX   0x03
+  BPF_ALU   0x04          BPF_ALU   0x04
+  BPF_JMP   0x05          BPF_JMP   0x05
+  BPF_RET   0x06          [ class 6 unused, for future if needed ]
+  BPF_MISC  0x07          BPF_ALU64 0x07
+When BPF_CLASS(code) == BPF_ALU or BPF_JMP, the 4th bit encodes the source operand ...
+  BPF_K     0x00
+  BPF_X     0x08
+ * in classic BPF, this means:
+  BPF_SRC(code) == BPF_X - use register X as source operand
+  BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
+ * in eBPF, this means:
+  BPF_SRC(code) == BPF_X - use 'src_reg' register as source operand
+  BPF_SRC(code) == BPF_K - use 32-bit immediate as source operand
+... and four MSB bits store operation code.
+If BPF_CLASS(code) == BPF_ALU or BPF_ALU64 [ in eBPF ], BPF_OP(code) is one of:
+  BPF_ADD   0x00
+  BPF_SUB   0x10
+  BPF_MUL   0x20
+  BPF_DIV   0x30
+  BPF_OR    0x40
+  BPF_AND   0x50
+  BPF_LSH   0x60
+  BPF_RSH   0x70
+  BPF_NEG   0x80
+  BPF_MOD   0x90
+  BPF_XOR   0xa0
+  BPF_MOV   0xb0  /* eBPF only: mov reg to reg */
+  BPF_ARSH  0xc0  /* eBPF only: sign extending shift right */
+  BPF_END   0xd0  /* eBPF only: endianness conversion */
+If BPF_CLASS(code) == BPF_JMP, BPF_OP(code) is one of:
+  BPF_JA    0x00
+  BPF_JEQ   0x10
+  BPF_JGT   0x20
+  BPF_JGE   0x30
+  BPF_JSET  0x40
+  BPF_JNE   0x50  /* eBPF only: jump != */
+  BPF_JSGT  0x60  /* eBPF only: signed '>' */
+  BPF_JSGE  0x70  /* eBPF only: signed '>=' */
+  BPF_CALL  0x80  /* eBPF only: function call */
+  BPF_EXIT  0x90  /* eBPF only: function return */
+So BPF_ADD | BPF_X | BPF_ALU means 32-bit addition in both classic BPF
+and eBPF. There are only two registers in classic BPF, so it means A += X.
+In eBPF it means dst_reg = (u32) dst_reg + (u32) src_reg; similarly,
+BPF_XOR | BPF_K | BPF_ALU means A ^= imm32 in classic BPF and, analogously,
+dst_reg = (u32) dst_reg ^ (u32) imm32 in eBPF.
+Classic BPF uses the BPF_MISC class to represent A = X and X = A moves.
+eBPF uses the BPF_MOV | BPF_X | BPF_ALU code instead. Since there are no
+BPF_MISC operations in eBPF, class 7 is used as BPF_ALU64 to mean
+exactly the same operations as BPF_ALU, but with 64-bit wide operands
+instead. So BPF_ADD | BPF_X | BPF_ALU64 means 64-bit addition, i.e.:
+dst_reg = dst_reg + src_reg
+Classic BPF wastes the whole BPF_RET class to represent a single 'ret'
+operation. Classic BPF_RET | BPF_K means copy imm32 into the return register
+and perform function exit. eBPF is modeled to match a CPU, so BPF_JMP | BPF_EXIT
+in eBPF means function exit only. The eBPF program needs to store the return
+value into register R0 before doing a BPF_EXIT. Class 6 in eBPF is currently
+unused and reserved for future use.
+For load and store instructions the 8-bit 'code' field is divided as:
+  +--------+--------+-------------------+
+  | 3 bits | 2 bits |   3 bits          |
+  |  mode  |  size  | instruction class |
+  +--------+--------+-------------------+
+  (MSB)                             (LSB)
+Size modifier is one of ...
+  BPF_W   0x00    /* word */
+  BPF_H   0x08    /* half word */
+  BPF_B   0x10    /* byte */
+  BPF_DW  0x18    /* eBPF only, double word */
+... which encodes size of load/store operation:
+ B  - 1 byte
+ H  - 2 bytes
+ W  - 4 bytes
+ DW - 8 bytes (eBPF only)
+Mode modifier is one of:
+  BPF_IMM  0x00  /* classic BPF only, reserved in eBPF */
+  BPF_ABS  0x20
+  BPF_IND  0x40
+  BPF_MEM  0x60
+  BPF_LEN  0x80  /* classic BPF only, reserved in eBPF */
+  BPF_MSH  0xa0  /* classic BPF only, reserved in eBPF */
+  BPF_XADD 0xc0  /* eBPF only, exclusive add */
+eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
+(BPF_IND | <size> | BPF_LD) which are used to access packet data.
+They had to be carried over from classic BPF to keep socket filters
+performing well in the eBPF interpreter. These instructions can only
+be used when the interpreter context is a pointer to 'struct sk_buff', and
+they have seven implicit operands. Register R6 is an implicit input that must
+contain a pointer to the sk_buff. Register R0 is an implicit output which
+contains the data fetched from the packet. Registers R1-R5 are scratch
+registers and must not be used to store data across BPF_ABS | BPF_LD or
+BPF_IND | BPF_LD instructions.
+These instructions have an implicit program exit condition as well. When
+an eBPF program tries to access data beyond the packet boundary,
+the interpreter aborts the execution of the program. JIT compilers
+must therefore preserve this property. The src_reg and imm32 fields are
+explicit inputs to these instructions.
+For example:
+  BPF_IND | BPF_W | BPF_LD means:
+    R0 = ntohl(*(u32 *) (((struct sk_buff *) R6)->data + src_reg + imm32))
+    and R1 - R5 were scratched.
+Unlike the classic BPF instruction set, eBPF has generic load/store operations:
+BPF_MEM | <size> | BPF_STX:  *(size *) (dst_reg + off) = src_reg
+BPF_MEM | <size> | BPF_ST:   *(size *) (dst_reg + off) = imm32
+BPF_MEM | <size> | BPF_LDX:  dst_reg = *(size *) (src_reg + off)
+BPF_XADD | BPF_W  | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
+BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
+Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
+2 byte atomic increments are not supported.
 Testing
 -------
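
As a reading aid for the encoding documented in this patch, here is a minimal C
sketch that packs and unpacks the 8-bit 'code' field. It is not part of the
patch. The macro and constant names follow the ones used in the text above, but
they are redefined locally with masks derived from the two field diagrams (an
assumption made to keep the sketch self-contained, rather than pulling them from
a kernel header). The helpers uses_alu_jmp_layout() and ldst_size_bytes() are
hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for the 8-bit 'code' field (local definitions,
 * masks derived from the two layout diagrams in the text). */
#define BPF_CLASS(code) ((code) & 0x07) /* 3 LSB bits: instruction class */
#define BPF_SRC(code)   ((code) & 0x08) /* ALU/JMP: 1 source bit */
#define BPF_OP(code)    ((code) & 0xf0) /* ALU/JMP: 4 MSB bits, operation */
#define BPF_SIZE(code)  ((code) & 0x18) /* LD/ST: 2 size bits */
#define BPF_MODE(code)  ((code) & 0xe0) /* LD/ST: 3 MSB bits, mode */

/* Instruction classes (eBPF column of the class table) */
#define BPF_LD    0x00
#define BPF_LDX   0x01
#define BPF_ST    0x02
#define BPF_STX   0x03
#define BPF_ALU   0x04
#define BPF_JMP   0x05
#define BPF_ALU64 0x07

/* Source operand bit */
#define BPF_K     0x00
#define BPF_X     0x08

/* A few operation codes, size modifiers and modes from the tables */
#define BPF_ADD   0x00
#define BPF_XOR   0xa0
#define BPF_EXIT  0x90
#define BPF_W     0x00
#define BPF_H     0x08
#define BPF_B     0x10
#define BPF_DW    0x18
#define BPF_IND   0x40
#define BPF_MEM   0x60
#define BPF_XADD  0xc0

/* Hypothetical helper: returns 1 if 'code' uses the arithmetic/jump
 * layout (op | source | class), 0 if it uses the load/store layout
 * (mode | size | class). */
static inline int uses_alu_jmp_layout(uint8_t code)
{
	uint8_t cls = BPF_CLASS(code);

	return cls == BPF_ALU || cls == BPF_JMP || cls == BPF_ALU64;
}

/* Hypothetical helper: operand width in bytes of a load/store code. */
static inline int ldst_size_bytes(uint8_t code)
{
	switch (BPF_SIZE(code)) {
	case BPF_B:
		return 1;
	case BPF_H:
		return 2;
	case BPF_W:
		return 4;
	default:
		return 8; /* BPF_DW, the only remaining 2-bit value */
	}
}
```

With these definitions, BPF_ADD | BPF_X | BPF_ALU packs to 0x0c and
BPF_JMP | BPF_EXIT to 0x95, while BPF_XADD | BPF_DW | BPF_STX packs to 0xdb and
ldst_size_bytes() reports 8 bytes for it.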