-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VPC Subnet Routing #490
VPC Subnet Routing #490
Conversation
This also adds in the `RouterClass` specifier, which does not yet do anything.
9149747
to
1ebbbb6
Compare
RFD 21 goes into some detail describing use cases where users might want to use one instance as a tunnel endpoint for a VPN, and how traffic towards an arbitrary CIDR can be routed to an instance for, e.g., software routing purposes. To enable this, we need to be able to setup holes in the anti-spoof filters in the gateway, which are added here as a new pair of ioctls to add and remove these holes. It's not clear from the RFD whether this is to be an on/off toggle or as fine-grained as the new ioctls allow, but we can suoport either case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some minor stuff while looking at this code in draft.
dest: IpCidr, | ||
vpc_mappings: Arc<VpcMappings>, | ||
) -> (Rule<Finalized>, Rule<Finalized>) { | ||
let vpc_meta = Arc::new(VpcMeta::new(vpc_mappings)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n00b2 Q: Would it make sense to implement MetaAction
(used with vpc_meta
/ Action::Meta
below) on vpc_mappings directly or impl it on Arc<VpcMappings>
so you don't have to initialize the Arc here (again)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The double-Arc
is really unfortunate for sure. We could impl MetaAction for VpcMappings
, but we'd need to recast the design of the MetaAction
to use something additional like RuleDisplay
instead of Display
, I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FelixMcFelix ah interesting, I'll look further at this then.
Breaks out the transit IPs/CIDR allowlist functions as requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together Kyle! The demo was great and excited to see this land. Looks good overall but one question about dealing with partial failures during add/remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review Luqman!
// ``` | ||
// | ||
// `max_prefix_len` is the maximum prefix length for a given IP | ||
// CIDR type: `32` for IPv4, `128` for IPv6. | ||
// | ||
// `prefix_len` comes from the passed in `cidr` argument. | ||
// | ||
// One bit is used to ensure that 'custom' router rules take precedence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: is there a test that specifically tests precedence? It's probably already captured, but I thought to ask.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking a look. The new system vs. custom route precedence is tested in the new intra_subnet_routes_with_custom
test, adding a second rule with an identical destination CIDR:
opte/lib/oxide-vpc/tests/integration_tests.rs
Lines 4064 to 4090 in 789b927
// Suppose the user now installs a 'custom' route in the first subnet to | |
// drop traffic towards the second subnet. This rule must take priority. | |
router::add_entry( | |
&g1.port, | |
cidr.clone(), | |
RouterTarget::Drop, | |
RouterClass::Custom, | |
) | |
.unwrap(); | |
incr!(g1, ["epoch", "router.rules.out"]); | |
let mut pkt2 = gen_icmpv4_echo_req( | |
g1_cfg.guest_mac, | |
g1_cfg.gateway_mac, | |
g1_cfg.ipv4().private_ip, | |
dst_ip, | |
7777, | |
1, | |
data, | |
1, | |
); | |
let res = g1.port.process(Out, &mut pkt2, ActionMeta::new()); | |
assert!(matches!( | |
res, | |
Ok(ProcessResult::Drop { | |
reason: DropReason::Layer { name: "router", .. } | |
}) | |
)); |
I don't believe we have any explicit tests on e.g. the priority values we're using to get LPM-like behaviour, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small suggestion for opteadm but otherwise lgtm! Thanks Kyle.
bin/opteadm/src/bin/opteadm.rs
Outdated
/// Control whether traffic on the CIDR should be allowed for | ||
/// inbound or outbound traffic. | ||
#[arg(long = "dir")] | ||
direction: Direction, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given opteadm
is primarily a dev tool, I think making this an Option<Direction>
and doing the ioctl calls for both directions if it's None
is fine.
With rollback behaviour!
This PR wires up all the backing machinery for VPC subnet routing, and automatically resolves and pushes updated rules to sleds using an RPW. This allows instances in all subnets of a VPC to talk with one another -- assuming no firewall rules have been configured otherwise. At a high level, this works by a few changes: * During the VPC create saga, we now push two rules explicitly to the system router -- default routes from `(0.0.0.0/0, ::/0) -> inetgw:outbound`. * Any CRUD operation on a VPC subnet will reconcile the set of VPC subnet routes within the system router to have one entry per subnet. This takes the form `subnet:{name} -> subnet:{name}` for each subnet, which are later resolved to both v4 and v6 entries. * Ports are created using route information known to sled-agent -- this defaults to an empty route set for instances/probes, and an internet gateway rule for services to enable early NTP sync. * Routes are sync'd with sleds using a new background task. Broadly, this asks each sled for the set of VPCs and subnets it has ports on, and a version for the current route set installed in each. The background task will use this information to determine which routes must be rebuilt, and will send updated versions out in response. The most immediate consequence in this PR is that hosts within a subnet -- on different VPCs -- will be able to talk with one another at last. The user facing API (#2116) will be re-enabled in a concurrent PR -- #5823 -- as will NIC spoof detection hole-punching. Depends on oxidecomputer/opte#490. Closes #2232, Fixes #1336. --- A few pieces will block tests passing & merge-readiness: - [x] Creation of a `lab-2.0-opte-0.32` image. - [x] Merge of oxidecomputer/maghemite#274 (and updating all the right SHAs in this PR).
This PR introduces various features needed in OPTE side for VPC subnet routing:
192.168.0.0/16 -> ip(instance_abc)
for software routing or enabling VPN endpoints.I'm leaving this in draft while the omicron-side pieces are in development.