author: Alexei Starovoitov <ast@plumgrid.com> 2014-06-10 11:44:06 -0400
committer: David S. Miller <davem@davemloft.net> 2014-06-11 18:39:18 -0400
commit: e4ad403269ff0ecdfb137b2a72349c30941cec7a
tree: 059c9ca9c07dbcba990ddf8e2032cec35ee19699 /Documentation
parent: 9709674e68646cee5a24e3000b3558d25412203a
net: filter: mention eBPF terminology as well

Since the term eBPF is used anyway on mailing list discussions, lets also
document that in the main BPF documentation file and replace a couple of
occurrences with eBPF terminology to be more clear.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/networking/filter.txt | 85
1 files changed, 43 insertions, 42 deletions
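
Reviewer note: the calling convention that this patch relabels as "eBPF calling convention" (R0 for the return value, R1 - R5 as arguments and scratch registers, R6 - R9 callee saved, R10 as frame pointer) can be illustrated with a small user-space model. This is only a sketch and not kernel code: model_call(), sum5(), demo_call_convention() and the JUNK marker are hypothetical names introduced for illustration, and the clobbering of R1 - R5 is simulated rather than produced by a real interpreter or JIT.

```c
#include <assert.h>
#include <stdint.h>

/* Register file as listed in the patched text: R0..R10. */
enum { R0, R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, NREGS };

/* Arbitrary marker standing in for "junk" in clobbered scratch registers. */
#define JUNK UINT64_C(0xdeadbeefdeadbeef)

/* Prototype the text gives for in-kernel functions: five u64 arguments. */
typedef uint64_t (*helper_fn)(uint64_t, uint64_t, uint64_t, uint64_t, uint64_t);

/* Hypothetical helper, used only for this illustration. */
static uint64_t sum5(uint64_t a, uint64_t b, uint64_t c, uint64_t d, uint64_t e)
{
	return a + b + c + d + e;
}

/*
 * Model of a call: arguments are taken from R1-R5, the result lands in R0,
 * R1-R5 are overwritten with junk, and R6-R10 are left untouched.
 */
static void model_call(uint64_t regs[NREGS], helper_fn fn)
{
	regs[R0] = fn(regs[R1], regs[R2], regs[R3], regs[R4], regs[R5]);
	for (int i = R1; i <= R5; i++)
		regs[i] = JUNK;
}

/* Copy R1 into callee-saved R6 before the call, then check it afterwards. */
static uint64_t demo_call_convention(void)
{
	uint64_t regs[NREGS] = { 0 };

	for (int i = R1; i <= R5; i++)
		regs[i] = (uint64_t)i;	/* R1..R5 = 1..5 */
	regs[R6] = regs[R1];		/* preserve R1 across the call */

	model_call(regs, sum5);

	assert(regs[R1] == JUNK);	/* scratch register was clobbered */
	assert(regs[R6] == 1);		/* callee-saved copy survived */
	return regs[R0];		/* 1 + 2 + 3 + 4 + 5 */
}
```

In this model demo_call_convention() returns 15. Reading regs[R1] after model_call() instead of regs[R6] corresponds to the program the documentation calls invalid, since R1 - R5 hold junk once the call returns.
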
diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt
index 9f49b8690500..1c7fc6baed84 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -561,42 +561,43 @@ toolchain for developing and testing the kernel's JIT compiler.
 
 BPF kernel internals
 --------------------
-Internally, for the kernel interpreter, a different BPF instruction set
+Internally, for the kernel interpreter, a different instruction set
 format with similar underlying principles from BPF described in previous
 paragraphs is being used. However, the instruction set format is modelled
 closer to the underlying architecture to mimic native instruction sets, so
-that a better performance can be achieved (more details later).
+that a better performance can be achieved (more details later). This new
+ISA is called 'eBPF' or 'internal BPF' interchangeably. (Note: eBPF which
+originates from [e]xtended BPF is not the same as BPF extensions! While
+eBPF is an ISA, BPF extensions date back to classic BPF's 'overloading'
+of BPF_LD | BPF_{B,H,W} | BPF_ABS instruction.)
 
 It is designed to be JITed with one to one mapping, which can also open up
-the possibility for GCC/LLVM compilers to generate optimized BPF code through
-a BPF backend that performs almost as fast as natively compiled code.
+the possibility for GCC/LLVM compilers to generate optimized eBPF code through
+an eBPF backend that performs almost as fast as natively compiled code.
 
 The new instruction set was originally designed with the possible goal in
-mind to write programs in "restricted C" and compile into BPF with a optional
+mind to write programs in "restricted C" and compile into eBPF with a optional
 GCC/LLVM backend, so that it can just-in-time map to modern 64-bit CPUs with
-minimal performance overhead over two steps, that is, C -> BPF -> native code.
+minimal performance overhead over two steps, that is, C -> eBPF -> native code.
 
 Currently, the new format is being used for running user BPF programs, which
 includes seccomp BPF, classic socket filters, cls_bpf traffic classifier,
 team driver's classifier for its load-balancing mode, netfilter's xt_bpf
 extension, PTP dissector/classifier, and much more. They are all internally
 converted by the kernel into the new instruction set representation and run
-in the extended interpreter. For in-kernel handlers, this all works
-transparently by using sk_unattached_filter_create() for setting up the
-filter, resp. sk_unattached_filter_destroy() for destroying it. The macro
-SK_RUN_FILTER(filter, ctx) transparently invokes the right BPF function to
-run the filter. 'filter' is a pointer to struct sk_filter that we got from
-sk_unattached_filter_create(), and 'ctx' the given context (e.g. skb pointer).
-All constraints and restrictions from sk_chk_filter() apply before a
-conversion to the new layout is being done behind the scenes!
+in the eBPF interpreter. For in-kernel handlers, this all works transparently
+by using sk_unattached_filter_create() for setting up the filter, resp.
+sk_unattached_filter_destroy() for destroying it. The macro
+SK_RUN_FILTER(filter, ctx) transparently invokes eBPF interpreter or JITed
+code to run the filter. 'filter' is a pointer to struct sk_filter that we
+got from sk_unattached_filter_create(), and 'ctx' the given context (e.g.
+skb pointer). All constraints and restrictions from sk_chk_filter() apply
+before a conversion to the new layout is being done behind the scenes!
 
-Currently, for JITing, the user BPF format is being used and current BPF JIT
-compilers reused whenever possible. In other words, we do not (yet!) perform
-a JIT compilation in the new layout, however, future work will successively
-migrate traditional JIT compilers into the new instruction format as well, so
-that they will profit from the very same benefits. Thus, when speaking about
-JIT in the following, a JIT compiler (TBD) for the new instruction format is
-meant in this context.
+Currently, the classic BPF format is being used for JITing on most of the
+architectures. Only x86-64 performs JIT compilation from eBPF instruction set,
+however, future work will migrate other JIT compilers as well, so that they
+will profit from the very same benefits.
 
 Some core changes of the new internal format:
 
@@ -605,35 +606,35 @@ Some core changes of the new internal format:
   The old format had two registers A and X, and a hidden frame pointer. The
   new layout extends this to be 10 internal registers and a read-only frame
   pointer. Since 64-bit CPUs are passing arguments to functions via registers
-  the number of args from BPF program to in-kernel function is restricted
+  the number of args from eBPF program to in-kernel function is restricted
   to 5 and one register is used to accept return value from an in-kernel
   function. Natively, x86_64 passes first 6 arguments in registers, aarch64/
   sparcv9/mips64 have 7 - 8 registers for arguments; x86_64 has 6 callee saved
   registers, and aarch64/sparcv9/mips64 have 11 or more callee saved registers.
 
-  Therefore, BPF calling convention is defined as:
+  Therefore, eBPF calling convention is defined as:
 
-    * R0 - return value from in-kernel function, and exit value for BPF program
-    * R1 - R5 - arguments from BPF program to in-kernel function
+    * R0 - return value from in-kernel function, and exit value for eBPF program
+    * R1 - R5 - arguments from eBPF program to in-kernel function
     * R6 - R9 - callee saved registers that in-kernel function will preserve
     * R10 - read-only frame pointer to access stack
 
-  Thus, all BPF registers map one to one to HW registers on x86_64, aarch64,
-  etc, and BPF calling convention maps directly to ABIs used by the kernel on
+  Thus, all eBPF registers map one to one to HW registers on x86_64, aarch64,
+  etc, and eBPF calling convention maps directly to ABIs used by the kernel on
   64-bit architectures.
 
   On 32-bit architectures JIT may map programs that use only 32-bit arithmetic
   and may let more complex programs to be interpreted.
 
-  R0 - R5 are scratch registers and BPF program needs spill/fill them if
-  necessary across calls. Note that there is only one BPF program (== one BPF
-  main routine) and it cannot call other BPF functions, it can only call
-  predefined in-kernel functions, though.
+  R0 - R5 are scratch registers and eBPF program needs spill/fill them if
+  necessary across calls. Note that there is only one eBPF program (== one
+  eBPF main routine) and it cannot call other eBPF functions, it can only
+  call predefined in-kernel functions, though.
 
 - Register width increases from 32-bit to 64-bit:
 
   Still, the semantics of the original 32-bit ALU operations are preserved
-  via 32-bit subregisters. All BPF registers are 64-bit with 32-bit lower
+  via 32-bit subregisters. All eBPF registers are 64-bit with 32-bit lower
   subregisters that zero-extend into 64-bit if they are being written to.
   That behavior maps directly to x86_64 and arm64 subregister definition, but
   makes other JITs more difficult.
@@ -644,8 +645,8 @@ Some core changes of the new internal format:
 
   Operation is 64-bit, because on 64-bit architectures, pointers are also
   64-bit wide, and we want to pass 64-bit values in/out of kernel functions,
-  so 32-bit BPF registers would otherwise require to define register-pair
-  ABI, thus, there won't be able to use a direct BPF register to HW register
+  so 32-bit eBPF registers would otherwise require to define register-pair
+  ABI, thus, there won't be able to use a direct eBPF register to HW register
   mapping and JIT would need to do combine/split/move operations for every
   register in and out of the function, which is complex, bug prone and slow.
   Another reason is the use of atomic 64-bit counters.
@@ -690,7 +691,7 @@ Some core changes of the new internal format:
     subq %rsi, %rax
     ret
 
-  Function f2 in BPF may look like:
+  Function f2 in eBPF may look like:
 
   f2:
     bpf_mov R2, R1
@@ -702,7 +703,7 @@ Some core changes of the new internal format:
   returns will be seamless. Without JIT, __sk_run_filter() interpreter needs to
   be used to call into f2.
 
-  For practical reasons all BPF programs have only one argument 'ctx' which is
+  For practical reasons all eBPF programs have only one argument 'ctx' which is
   already placed into R1 (e.g. on __sk_run_filter() startup) and the programs
   can call kernel functions with up to 5 arguments. Calls with 6 or more arguments
   are currently not supported, but these restrictions can be lifted if necessary
@@ -779,9 +780,9 @@ Some core changes of the new internal format:
 
   In-kernel functions foo() and bar() with prototype: u64 (*)(u64 arg1, u64
   arg2, u64 arg3, u64 arg4, u64 arg5); will receive arguments in proper
-  registers and place their return value into '%rax' which is R0 in BPF.
+  registers and place their return value into '%rax' which is R0 in eBPF.
   Prologue and epilogue are emitted by JIT and are implicit in the
-  interpreter. R0-R5 are scratch registers, so BPF program needs to preserve
+  interpreter. R0-R5 are scratch registers, so eBPF program needs to preserve
   them across the calls as defined by calling convention.
 
   For example the following program is invalid:
@@ -792,12 +793,12 @@ Some core changes of the new internal format:
     bpf_exit
 
   After the call the registers R1-R5 contain junk values and cannot be read.
-  In the future a BPF verifier can be used to validate internal BPF programs.
+  In the future an eBPF verifier can be used to validate internal BPF programs.
 
-Also in the new design, BPF is limited to 4096 insns, which means that any
+Also in the new design, eBPF is limited to 4096 insns, which means that any
 program will terminate quickly and will only call a fixed number of kernel
 functions. Original BPF and the new format are two operand instructions,
-which helps to do one-to-one mapping between BPF insn and x86 insn during JIT.
+which helps to do one-to-one mapping between eBPF insn and x86 insn during JIT.
 
 The input context pointer for invoking the interpreter function is generic,
 its content is defined by a specific use case. For seccomp register R1 points