author | David Ahern <dsa@cumulusnetworks.com> | 2015-09-15 12:50:14 -0400 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2015-09-17 19:06:43 -0400 |
commit | 562d897d15a6e2bab3cc9b4c172286b612834fe8 (patch) | |
tree | 6a396532aa6894c5d1e089886f54bcbc37b6a57a /Documentation/networking | |
parent | cc5706056baa3002b844ff240a1cc2199a978795 (diff) |
net: Add documentation for VRF device
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'Documentation/networking')
-rw-r--r-- | Documentation/networking/vrf.txt | 96 |
1 files changed, 96 insertions, 0 deletions
diff --git a/Documentation/networking/vrf.txt b/Documentation/networking/vrf.txt
new file mode 100644
index 000000000000..031ef4a63485
--- /dev/null
+++ b/Documentation/networking/vrf.txt
@@ -0,0 +1,96 @@
Virtual Routing and Forwarding (VRF)
====================================
The VRF device combined with ip rules provides the ability to create virtual
routing and forwarding domains (aka VRFs, VRF-lite to be specific) in the
Linux network stack. One use case is the multi-tenancy problem where each
tenant has its own unique routing tables and at the very least needs a
different default gateway.

Processes can be "VRF aware" by binding a socket to the VRF device. Packets
through the socket then use the routing table associated with the VRF
device. An important feature of the VRF device implementation is that it
impacts only Layer 3 and above, so L2 tools (e.g., LLDP) are not affected
(i.e., they do not need to be run in each VRF). The design also allows
the use of higher priority ip rules (Policy Based Routing, PBR) to take
precedence over the VRF device rules, directing specific traffic as desired.

In addition, VRF devices allow VRFs to be nested within namespaces. For
example, network namespaces provide separation of network interfaces at L1
(Layer 1 separation), VLANs on the interfaces within a namespace provide
L2 separation, and then VRF devices provide L3 separation.

Design
------
A VRF device is created with an associated route table. Network interfaces
are then enslaved to a VRF device:

         +-----------------------------+
         |           vrf-blue          |  ===> route table 10
         +-----------------------------+
            |        |            |
         +------+ +------+     +-------------+
         | eth1 | | eth2 | ... |    bond1    |
         +------+ +------+     +-------------+
                                  |       |
                              +------+ +------+
                              | eth8 | | eth9 |
                              +------+ +------+

Packets received on an enslaved device are switched to the VRF device using
an rx_handler, which gives the impression that packets flow through the VRF
device. Similarly, on egress, routing rules are used to send packets to the
VRF device driver before they are sent out the actual interface. This allows
tcpdump on a VRF device to capture all packets into and out of the VRF as a
whole.[1] Similarly, netfilter [2] and tc rules can be applied using the VRF
device to specify rules that apply to the VRF domain as a whole.

[1] Packets in the forwarded state do not flow through the device, so those
    packets are not seen by tcpdump. This limitation will be revisited in a
    future release.

[2] Iptables on ingress is limited to NF_INET_PRE_ROUTING only, with skb->dev
    set to the real ingress device, and egress is limited to
    NF_INET_POST_ROUTING. This limitation will be revisited in a future
    release.


Setup
-----
1. VRF device is created with an association to a FIB table.
   e.g., ip link add vrf-blue type vrf table 10
         ip link set dev vrf-blue up

2. Rules are added that send lookups to the associated FIB table when the
   iif or oif is the VRF device. e.g.,
       ip ru add oif vrf-blue table 10
       ip ru add iif vrf-blue table 10

   Set the default route for the table (and hence default route for the VRF).
   e.g., ip route add table 10 prohibit default

3. Enslave L3 interfaces to a VRF device.
   e.g., ip link set dev eth1 master vrf-blue

   Local and connected routes for enslaved devices are automatically moved to
   the table associated with the VRF device. Any additional routes depending
   on the enslaved device will need to be reinserted following the
   enslavement.

4. Additional VRF routes are added to the associated table.
   e.g., ip route add table 10 ...


Applications
------------
Applications that are to work within a VRF need to bind their socket to the
VRF device:

    setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, strlen(dev)+1);

or to specify the output device using cmsg and IP_PKTINFO.


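As a minimal sketch of the two options described in the Applications
section, the helpers below wrap the SO_BINDTODEVICE call and build an
IP_PKTINFO control message for use with sendmsg(). The function names
vrf_bind and vrf_set_pktinfo are hypothetical and not part of any kernel or
libc API, and error handling is reduced to returning -1 with errno set.

```c
#define _GNU_SOURCE         /* exposes struct in_pktinfo on glibc */
#include <errno.h>
#include <net/if.h>         /* IFNAMSIZ */
#include <netinet/in.h>     /* IPPROTO_IP, IP_PKTINFO, struct in_pktinfo */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: bind socket sd to the VRF device named dev.
 * Returns 0 on success, -1 with errno set on failure. */
int vrf_bind(int sd, const char *dev)
{
    size_t len = strlen(dev);

    /* The name plus its terminating NUL must fit in IFNAMSIZ bytes. */
    if (len == 0 || len >= IFNAMSIZ) {
        errno = EINVAL;
        return -1;
    }
    /* Same call as in the text; it may require CAP_NET_RAW. */
    return setsockopt(sd, SOL_SOCKET, SO_BINDTODEVICE, dev, len + 1);
}

/* Hypothetical helper: attach an IP_PKTINFO control message to msg that
 * names ifindex as the output device. cbuf must be aligned for
 * struct cmsghdr and hold at least CMSG_SPACE(sizeof(struct in_pktinfo))
 * bytes. Returns 0 on success, -1 with errno set on failure. */
int vrf_set_pktinfo(struct msghdr *msg, void *cbuf, size_t cbuflen,
                    unsigned int ifindex)
{
    struct in_pktinfo pi;
    struct cmsghdr *cmsg;

    if (ifindex == 0 || cbuflen < CMSG_SPACE(sizeof(pi))) {
        errno = EINVAL;
        return -1;
    }
    memset(cbuf, 0, cbuflen);
    msg->msg_control = cbuf;
    msg->msg_controllen = CMSG_SPACE(sizeof(pi));

    cmsg = CMSG_FIRSTHDR(msg);
    cmsg->cmsg_level = IPPROTO_IP;
    cmsg->cmsg_type = IP_PKTINFO;
    cmsg->cmsg_len = CMSG_LEN(sizeof(pi));

    /* Only the interface index is set; addresses are left as zero. */
    memset(&pi, 0, sizeof(pi));
    pi.ipi_ifindex = (int)ifindex;
    memcpy(CMSG_DATA(cmsg), &pi, sizeof(pi));
    return 0;
}
```

A caller might invoke vrf_bind(sd, "vrf-blue") right after socket(), or
look up the device index with if_nametoindex("vrf-blue") and pass it to
vrf_set_pktinfo() before filling in the rest of the msghdr and calling
sendmsg().
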
Limitations
-----------
The VRF device currently only works for IPv4. Support for IPv6 is under
development.

The index of the original ingress interface is not available via cmsg. This
will be addressed soon.