diff --git a/docs/network-setup.md b/docs/network-setup.md index ad77bd3e7e9..077b893ff3a 100644 --- a/docs/network-setup.md +++ b/docs/network-setup.md @@ -1,49 +1,99 @@ # Getting Started Firecracker Network Setup -This is a very simple quick-start guide to getting a Firecracker guest connected -to the network. If you're using Firecracker in production, or even want to run -multiple guests, you'll need to adapt this setup. +This is a simple quick-start guide to getting one or more Firecracker microVMs +connected to the Internet via the host. If you run a production setup, you should +consider modifying this setup to accommodate your specific needs. -**Note** Currently firecracker supports only TUN/TAP network backend with no +**Note:** Currently, Firecracker supports only a TUN/TAP network backend with no multi queue support. -The simple steps in this guide assume that your internet-facing interface is -`eth0`, you have nothing else using `tap0` and no other `iptables` rules. Check -out the *Advanced:* sections if that doesn't work for you. +The steps in this guide assume `eth0` to be your Internet-facing network interface +on the host. If `eth0` isn't your main network interface, you should change the +value to the correct one in the commands below. IPv4 is also assumed to be used, +so you will need to adapt the instructions accordingly to support IPv6. + +Each microVM requires a host network interface (like `eth0`) and a Linux +`tap` device (like `tap0`) used by Firecracker, but the differences in configuration +stem from routing: how packets from the `tap` get to the network interface (egress) +and vice-versa (ingress). There are three main approaches of how to configure routing +for a microVM. + +1. **NAT-based**, which is presented in the main part of this guide. It is simple but + doesn't expose your microVM to the local network (LAN). +2. **Bridge-based**, which exposes your microVM to the local network. 
Learn more about it in + the _Advanced: Bridge-based routing_ section of this guide. +3. **Namespaced NAT**, which sacrifices performance in comparison to the other approaches + but is desired in the scenario when two clones of the same microVM are running at the same + time. To learn more about it, check out the [Network Connectivity for Clones](./snapshotting/network-for-clones.md) guide. + +To run multiple microVMs while using NAT-based routing, check out the +_Advanced: Multiple guests_ section. The same principles can be applied to other routing +methods with a bit more effort. + +For the choice of firewall, `nft` is recommended for use on production Linux systems, +but, for the sake of compatibility, this guide provides a choice between either +`nft` or the `iptables-nft` translation layer. The latter is +[no longer recommended](https://access.redhat.com/solutions/6739041) but may be more +familiar to readers. + +## On the Host + +The first step on the host for any microVM is to create a Linux `tap` device, which Firecracker +will use for networking. + +For this setup, only two IP addresses will be necessary - one for the `tap` device and one for +the guest itself, through which you will, for example, `ssh` into the guest. So, we'll choose the +smallest IPv4 subnet needed for 2 addresses: `/30`. For this VM, let's use the `172.16.0.1` `tap` IP +and the `172.16.0.2` guest IP. -## On The Host +```bash +# Create the tap device. +sudo ip tuntap add tap0 mode tap +# Assign it the tap IP and start up the device. +sudo ip addr add 172.16.0.1/30 dev tap0 +sudo ip link set tap0 up +``` -The first step on the host is to create a `tap` device: +**Note:** The IP of the TAP device should be chosen such that it's not in the same +subnet as the IP address of the host. +We'll need to enable IPv4 forwarding on the system. 
```bash -sudo ip tuntap add tap0 mode tap +echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward ``` -Then you have a few options for routing traffic out of the tap device, through -your host's network interface. One option is NAT, set up like this: +### Configuration via `nft` +We'll need an nftables table for our routing needs, and 2 chains inside that table: one +for NAT on `postrouting` stage, and another one for filtering on `forward` stage: ```bash -sudo ip addr add 172.16.0.1/24 dev tap0 -sudo ip link set tap0 up -sudo sh -c "echo 1 > /proc/sys/net/ipv4/ip_forward" -sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE -sudo iptables -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT -sudo iptables -A FORWARD -i tap0 -o eth0 -j ACCEPT +sudo nft add table firecracker +sudo nft 'add chain firecracker postrouting { type nat hook postrouting priority srcnat; policy accept; }' +sudo nft 'add chain firecracker filter { type filter hook forward priority filter; policy accept; }' ``` -*Note:* The IP of the TAP device should be chosen such that it's not in the same -subnet as the IP address of the host. +The first rule we'll need will masquerade packets from the guest IP as if they came from the +host's IP, by changing the source IP address of these packets: +```bash +sudo nft add rule firecracker postrouting ip saddr 172.16.0.2 oifname eth0 counter masquerade +``` + +The second rule we'll need will accept packets from the tap IP (the guest will use the tap IP as its +gateway and will therefore route its own packets through the tap IP) and direct them to the host +network interface: +```bash +sudo nft add rule firecracker filter iifname tap0 oifname eth0 accept +``` -*Advanced:* If you are running multiple Firecracker MicroVMs in parallel, or -have something else on your system using `tap0` then you need to create a `tap` -for each one, with a unique name. 
+### Configuration via `iptables-nft` -*Advanced:* You also need to do the `iptables` set up for each new `tap`. If you -have `iptables` rules you care about on your host, you may want to save those -rules before starting. +Tables and chains are managed by `iptables-nft` automatically, but we'll need three rules to perform +the NAT steps: ```bash -sudo iptables-save > iptables.rules.old +sudo iptables-nft -t nat -A POSTROUTING -o eth0 -s 172.16.0.2 -j MASQUERADE +sudo iptables-nft -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT +sudo iptables-nft -A FORWARD -i tap0 -o eth0 -j ACCEPT ``` ## Setting Up Firecracker @@ -85,14 +135,20 @@ configuration file like this: ``` Alternatively, if you are using firectl, add ---tap-device=tap0/06:00:AC:10:00:02\` to your command line. +`--tap-device=tap0/06:00:AC:10:00:02\` to your command line. ## In The Guest -Once you have booted the guest, bring up networking within the guest: +Once you have booted the guest, it will have its networking interface with the +name specified by `iface_id` in the Firecracker configuration. + +You'll now need to assign the guest its IP, activate the guest's networking +interface and set up the `tap` IP as the guest's gateway address, so that packets +are routed through the `tap` device, where they are then picked up by the setup +on the host prepared before: ```bash -ip addr add 172.16.0.2/24 dev eth0 +ip addr add 172.16.0.2/30 dev eth0 ip link set eth0 up ip route add default via 172.16.0.1 dev eth0 ``` @@ -107,23 +163,171 @@ your environment. For testing, you can add a public DNS server to nameserver 8.8.8.8 ``` -## \[Advanced\] Setting Up a Bridge Interface +**Note:** Sometimes, it's undesirable to have `iproute2` (providing the `ip` command) +installed on your guest OS, or you simply want to have these steps be performed +automatically. To do this, check out the +_Advanced: Guest network configuration using kernel command line_ section. 
+ +## Cleaning up + +The first step to cleaning up is to delete the tap device on the host: + +```bash +sudo ip link del tap0 +``` + +### Cleanup using `nft` + +You'll want to delete the two nftables rules for NAT routing from the +`postrouting` and `filter` chains. To do this with nftables, you'll need to +look up the _handles_ (identifiers) of these rules by running: + +```bash +sudo nft -a list ruleset +``` + +Now, find the `# handle` comments relating to the two rules and delete them. +For example, if the handle to the masquerade rule is 1 and the one to the +forwarding rule is 2: +```bash +sudo nft delete rule firecracker postrouting handle 1 +sudo nft delete rule firecracker filter handle 2 +``` + +Run the following steps only **if you have no more guests** running on the host: + +Set IPv4 forwarding back to disabled: +```bash +echo 0 | sudo tee /proc/sys/net/ipv4/ip_forward +``` + +If you're using `nft`, delete the `firecracker` table to revert your nftables +configuration fully back to its initial state: +```bash +sudo nft delete table firecracker +``` + +### Cleanup using `iptables-nft` + +Of the configured `iptables-nft` rules, two should be deleted if you have guests +remaining in your configuration: + +```bash +sudo iptables-nft -t nat -D POSTROUTING -o eth0 -s 172.16.0.2 -j MASQUERADE +sudo iptables-nft -D FORWARD -i tap0 -o eth0 -j ACCEPT +``` + +**If you have no more guests** running on the host, then similarly set IPv4 forwarding +back to disabled: + +```bash +echo 0 | sudo tee /proc/sys/net/ipv4/ip_forward +``` + +And delete the remaining `conntrack` rule that applies to all guests: + +```bash +sudo iptables-nft -D FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT +``` + +If nothing else is using `iptables-nft` on the system, you may even want to delete the +entire system ruleset like so: + +```bash +sudo iptables-nft -F +sudo iptables-nft -t nat -F +``` + +## Advanced: Multiple guests + +To configure multiple guests, we will only need 
to repeat some of the steps in this setup +for each of the microVMs: + +1. Each microVM has its own subnet and the two IP addresses inside of it: the `tap` IP and +the guest IP. +2. Each microVM has its own two nftables rules for masquerading and forwarding, while the same +table and two chains can be shared between the microVMs. +3. Each microVM has its own routing configuration inside the guest itself (achieved through +`iproute2` or the method described in the _Advanced: Guest network configuration using kernel command line_ +section). + +To give a more concrete example, **let's add a second microVM** to the one you've already configured: + +Let's assume we allocate /30 subnets in the 172.16.0.0/16 range sequentially to give out as +few addresses as needed. + +The next /30 subnet in the 172.16.0.0/16 range will give us these two IPs: 172.16.0.5 as the +`tap` IP and 172.16.0.6 as the guest IP. + +Our new `tap` device will, sequentially, have the name `tap1`: +```bash +sudo ip tuntap add tap1 mode tap +sudo ip addr add 172.16.0.5/30 dev tap1 +sudo ip link set tap1 up +``` + +Now, let's add the two new `nft` rules, also with the new values: +```bash +sudo nft add rule firecracker postrouting ip saddr 172.16.0.6 oifname eth0 counter masquerade +sudo nft add rule firecracker filter iifname tap1 oifname eth0 accept +``` + +If using `iptables-nft`, add the rules like so: +```bash +sudo iptables-nft -t nat -A POSTROUTING -o eth0 -s 172.16.0.6 -j MASQUERADE +sudo iptables-nft -A FORWARD -i tap1 -o eth0 -j ACCEPT +``` + +Modify your Firecracker configuration with the `host_dev_name` now being `tap1` instead of `tap0`, +boot up the guest and perform the routing inside of it like so, changing the guest IP and `tap` IP: +```bash +ip addr add 172.16.0.6/30 dev eth0 +ip link set eth0 up +ip route add default via 172.16.0.5 dev eth0 +``` + +Or, you can use the setup from _Advanced: Guest network configuration using kernel command line_ by simply +changing the G and T variables, i.e. 
the guest IP and `tap` IP. + +**Note:** if you'd like to calculate the guest and `tap` IPs using the sequential subnet allocation +method that has been used here, you can use the following formulas specific to IPv4 addresses: + +`tap` IP = `172.16.[(A*O+1)/256].[(A*O+1)%256]`. + +Guest IP = `172.16.[(A*O+2)/256].[(A*O+2)%256]`. + +Round down the division and replace `A` with the number of IP addresses inside your subnet (for a +/30 subnet, that will be 4 addresses, for example) and replace `O` with the sequential number of +your microVM, starting at 0. You can replace `172.16` with any other values that fit between +1 and 255 as usual with an IPv4 address. + +For example, let's calculate the addresses of the 1000-th microVM with a /30 subnet in +the `172.16.0.0/16` range: + +`tap` IP = `172.16.[(4*999+1)/256].[(4*999+1)%256]` = `172.16.15.157`. + +Guest IP = `172.16.[(4*999+2)/256].[(4*999+2)%256]` = `172.16.15.158`. + +This allocation setup has been used successfully in the `firecracker-demo` project for launching several +thousand microVMs on the same host: [relevant lines](https://github.com/firecracker-microvm/firecracker-demo/blob/63717c6e7fbd277bdec8e26a5533d53544a760bb/start-firecracker.sh#L45). + +## Advanced: Bridge-based routing ### On The Host -1. Create a bridge interface +1. Create a bridge interface: ```bash sudo ip link add name br0 type bridge ``` -1. Add tap interface [created above](#on-the-host) to the bridge +2. Add the `tap` device [created above](#on-the-host) to the bridge: ```bash sudo ip link set dev tap0 master br0 ``` -1. Define an IP address in your network for the bridge. +3. Define an IP address in your network for the bridge: For example, if your gateway were on `192.168.1.1` and you wanted to use this for getting dynamic IPs, you would want to give the bridge an unused IP @@ -133,36 +337,42 @@ nameserver 8.8.8.8 sudo ip address add 192.168.1.7/24 dev br0 ``` -1. 
Add firewall rules to allow traffic to be routed to the guest +4. Add a firewall rule to allow traffic to be routed to the guest: ```bash sudo iptables -t nat -A POSTROUTING -o br0 -j MASQUERADE ``` +5. When you clean up the configuration, make sure to delete the bridge: + + ```bash + sudo ip link del br0 + ``` + ### On The Guest 1. Define an unused IP address in the bridge's subnet e.g., `192.168.1.169/24`. - _Note: Alternatively, you could rely on DHCP for getting a dynamic IP address - from your gateway._ + **Note**: Alternatively, you could rely on DHCP for getting a dynamic IP address + from your gateway. ```bash ip addr add 192.168.1.169/24 dev eth0 ``` -1. Set the interface up. +2. Enable the network interface: ```bash ip link set eth0 up ``` -1. Create a route to the bridge device +3. Create a route to the bridge device ```bash ip r add 192.168.1.1 via 192.168.1.7 dev eth0 ``` -1. Create a route to the internet via the bridge +4. Create a route to the internet via the bridge ```bash ip r add default via 192.168.1.7 dev eth0 @@ -177,43 +387,34 @@ nameserver 8.8.8.8 192.168.1.1 via 192.168.1.7 dev eth0 ``` -1. Add your nameserver to `resolve.conf` +5. Add your nameserver to `/etc/resolv.conf` ```bash # cat /etc/resolv.conf nameserver 192.168.1.1 ``` -## Cleaning up - -The first step to cleaning up is deleting the tap device: - -```bash -sudo ip link del tap0 -``` - -If you don't have anything else using `iptables` on your machine, clean up those -rules: +## Advanced: Guest network configuration using kernel command line -```bash -sudo iptables -F -sudo sh -c "echo 0 > /proc/sys/net/ipv4/ip_forward" # usually the default -``` +The Linux kernel supports an `ip` CLI argument that can be passed to it when booting. +Boot arguments in Firecracker are configured in the `boot_args` property of the boot source +(`boot-source` object in the JSON configuration or the equivalent endpoint in the API server). 
-If you have an existing iptables setup, you'll want to be more careful about -cleaning up. +The value of the `ip` CLI argument for our setup will be of this format: +`G::T:GM::GI:off`. G is the guest IP (without the subnet), T is the `tap` IP (without the subnet), +GM is the netmask of the guest subnet in dotted-decimal ("long") form and GI is the name of the guest network interface. -*Advanced:* If you saved your iptables rules in the first step, then you can -restore them like this: +Substituting our values, we get: `ip=172.16.0.2::172.16.0.1:255.255.255.252::eth0:off`. Insert this +at the end of your boot arguments for your microVM, and the guest Linux kernel will automatically +perform the routing configuration done in the _In the Guest_ section without needing `iproute2` +installed in the guest. -```bash -if [ -f iptables.rules.old ]; then - sudo iptables-restore < iptables.rules.old -fi -``` +As soon as you boot the guest, it will already be connected to the network (assuming you correctly +performed the other steps). -*Advanced:* If you created a bridge interface, delete it using the following: +**Note**: you can also use the `ip` argument to configure a primary DNS server and, optionally, a +second DNS server without needing to touch `/etc/resolv.conf`. As an example: -```bash -sudo ip link del br0 -``` +`ip=172.16.0.2::172.16.0.1:255.255.255.252::eth0:off:8.8.8.8:1.1.1.1` configures `8.8.8.8` as the +primary DNS server and `1.1.1.1` as the secondary DNS server, as well as the rest of the guest-side +routing.
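
The sequential subnet allocation formulas and the `G::T:GM::GI:off` format described in this guide can be combined in a small shell helper. This is a minimal sketch, not part of Firecracker: the function names `subnet_ip` and `kernel_ip_arg` are hypothetical, and the `172.16` prefix, the /30 netmask and the `eth0` interface name are assumptions carried over from the examples in this guide.

```bash
# Hypothetical helpers (not part of Firecracker) for the sequential
# allocation scheme in the 172.16.0.0/16 range.

# Print the address at offset $2 (1 = tap IP, 2 = guest IP) for the microVM
# with 0-based sequential number $1, with $3 addresses per subnet
# (defaults to 4, i.e. a /30 subnet).
subnet_ip() (
    n=$(( ${3:-4} * $1 + $2 ))
    # Shell arithmetic division rounds down, as the formula requires.
    printf '172.16.%d.%d\n' $(( n / 256 )) $(( n % 256 ))
)

# Print the kernel `ip=` argument for guest IP $1 and tap IP $2.
# The netmask ($3) and guest interface name ($4) default to the values
# assumed in this guide.
kernel_ip_arg() (
    printf 'ip=%s::%s:%s::%s:off\n' "$1" "$2" "${3:-255.255.255.252}" "${4:-eth0}"
)

# The 1000-th microVM has sequential number 999.
tap_ip=$(subnet_ip 999 1)      # -> 172.16.15.157
guest_ip=$(subnet_ip 999 2)    # -> 172.16.15.158
kernel_ip_arg "$guest_ip" "$tap_ip"
# -> ip=172.16.15.158::172.16.15.157:255.255.255.252::eth0:off
```

The printed value can be appended to the `boot_args` of the corresponding microVM. With /30 subnets, the formula stays inside the 172.16.0.0/16 range only for sequential numbers up to 16383.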