Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VPC Subnet Routing #490

Merged
merged 14 commits into from
Jun 13, 2024
Merged

VPC Subnet Routing #490

merged 14 commits into from
Jun 13, 2024

Conversation

FelixMcFelix
Copy link
Collaborator

@FelixMcFelix FelixMcFelix commented Apr 9, 2024

This PR introduces various features needed in OPTE side for VPC subnet routing:

  • Add ioctl for deletion of router rules.
  • Support for the 'system' vs 'custom' router distinction, and prioritisation of 'custom' rules of equal prefix length.
  • Allow IP spoof-detection to be disabled (and re-enabled) on a per-CIDR basis. This is necassary to enable rules such as 192.168.0.0/16 -> ip(instance_abc) for software routing or enabling VPN endpoints.

I'm leaving this in draft while the omicron-side pieces are in development.

This also adds in the `RouterClass` specifier, which does not yet do
anything.
RFD 21 goes into some detail describing use cases where users might want
to use one instance as a tunnel endpoint for a VPN, and how traffic
towards an arbitrary CIDR can be routed to an instance for, e.g.,
software routing purposes.

To enable this, we need to be able to setup holes in the anti-spoof
filters in the gateway, which are added here as a new pair of ioctls to
add and remove these holes. It's not clear from the RFD whether this is
to be an on/off toggle or as fine-grained as the new ioctls allow, but
we can suoport either case here.
@FelixMcFelix FelixMcFelix self-assigned this Apr 22, 2024
@FelixMcFelix FelixMcFelix added this to the 9 milestone Apr 22, 2024
Copy link

@zeeshanlakhani zeeshanlakhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor stuff while looking at this code in draft.

lib/oxide-vpc/src/engine/gateway/mod.rs Outdated Show resolved Hide resolved
dest: IpCidr,
vpc_mappings: Arc<VpcMappings>,
) -> (Rule<Finalized>, Rule<Finalized>) {
let vpc_meta = Arc::new(VpcMeta::new(vpc_mappings));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n00b2 Q: Would it make sense to implement MetaAction (used with vpc_meta / Action::Meta below) on vpc_mappings directly or impl it on Arc<VpcMappings> so you don't have to initialize the Arc here (again)?

Copy link
Collaborator Author

@FelixMcFelix FelixMcFelix May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The double-Arc is really unfortunate for sure. We could impl MetaAction for VpcMappings, but we'd need to recast the design of the MetaAction to use something additional like RuleDisplay instead of Display, I think?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FelixMcFelix ah interesting, I'll look further at this then.

@FelixMcFelix FelixMcFelix marked this pull request as ready for review May 31, 2024 16:27
@FelixMcFelix FelixMcFelix changed the title WIP: VPC Subnet Routing VPC Subnet Routing May 31, 2024
Breaks out the transit IPs/CIDR allowlist functions as requested.
Copy link
Contributor

@luqmana luqmana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together Kyle! The demo was great and excited to see this land. Looks good overall but one question about dealing with partial failures during add/remove.

lib/oxide-vpc/src/engine/router.rs Outdated Show resolved Hide resolved
lib/oxide-vpc/tests/integration_tests.rs Show resolved Hide resolved
lib/oxide-vpc/tests/integration_tests.rs Show resolved Hide resolved
bin/opteadm/src/bin/opteadm.rs Outdated Show resolved Hide resolved
lib/oxide-vpc/src/engine/gateway/transit.rs Outdated Show resolved Hide resolved
lib/oxide-vpc/src/engine/gateway/transit.rs Outdated Show resolved Hide resolved
Copy link
Collaborator Author

@FelixMcFelix FelixMcFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review Luqman!

bin/opteadm/src/bin/opteadm.rs Outdated Show resolved Hide resolved
lib/oxide-vpc/src/engine/gateway/transit.rs Outdated Show resolved Hide resolved
lib/oxide-vpc/src/engine/gateway/transit.rs Outdated Show resolved Hide resolved
lib/oxide-vpc/src/engine/router.rs Outdated Show resolved Hide resolved
lib/oxide-vpc/src/api.rs Show resolved Hide resolved
// ```
//
// `max_prefix_len` is the maximum prefix length for a given IP
// CIDR type: `32` for IPv4, `128` for IPv6.
//
// `prefix_len` comes from the passed in `cidr` argument.
//
// One bit is used to ensure that 'custom' router rules take precedence

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: is there a test that specifically tests precedence? It's probably already captured, but I thought to ask.

Copy link
Collaborator Author

@FelixMcFelix FelixMcFelix Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look. The new system vs. custom route precedence is tested in the new intra_subnet_routes_with_custom test, adding a second rule with an identical destination CIDR:

// Suppose the user now installs a 'custom' route in the first subnet to
// drop traffic towards the second subnet. This rule must take priority.
router::add_entry(
&g1.port,
cidr.clone(),
RouterTarget::Drop,
RouterClass::Custom,
)
.unwrap();
incr!(g1, ["epoch", "router.rules.out"]);
let mut pkt2 = gen_icmpv4_echo_req(
g1_cfg.guest_mac,
g1_cfg.gateway_mac,
g1_cfg.ipv4().private_ip,
dst_ip,
7777,
1,
data,
1,
);
let res = g1.port.process(Out, &mut pkt2, ActionMeta::new());
assert!(matches!(
res,
Ok(ProcessResult::Drop {
reason: DropReason::Layer { name: "router", .. }
})
));

I don't believe we have any explicit tests on e.g. the priority values we're using to get LPM-like behaviour, though.

Copy link
Contributor

@luqmana luqmana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small suggestion for opteadm but otherwise lgtm! Thanks Kyle.

/// Control whether traffic on the CIDR should be allowed for
/// inbound or outbound traffic.
#[arg(long = "dir")]
direction: Direction,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given opteadm is primarily a dev tool, I think making this an Option<Direction> and doing the ioctl calls for both directions if it's None is fine.

@FelixMcFelix FelixMcFelix merged commit 915975f into master Jun 13, 2024
10 checks passed
@FelixMcFelix FelixMcFelix deleted the custom-subnet-routing branch June 13, 2024 11:10
FelixMcFelix added a commit to oxidecomputer/omicron that referenced this pull request Jun 26, 2024
This PR wires up all the backing machinery for VPC subnet routing, and
automatically resolves and pushes updated rules to sleds using an RPW.
This allows instances in all subnets of a VPC to talk with one another
-- assuming no firewall rules have been configured otherwise. At a high
level, this works by a few changes:
* During the VPC create saga, we now push two rules explicitly to the
system router -- default routes from `(0.0.0.0/0, ::/0) ->
inetgw:outbound`.
* Any CRUD operation on a VPC subnet will reconcile the set of VPC
subnet routes within the system router to have one entry per subnet.
This takes the form `subnet:{name} -> subnet:{name}` for each subnet,
which are later resolved to both v4 and v6 entries.
* Ports are created using route information known to sled-agent -- this
defaults to an empty route set for instances/probes, and an internet
gateway rule for services to enable early NTP sync.
* Routes are sync'd with sleds using a new background task. Broadly,
this asks each sled for the set of VPCs and subnets it has ports on, and
a version for the current route set installed in each. The background
task will use this information to determine which routes must be
rebuilt, and will send updated versions out in response.

The most immediate consequence in this PR is that hosts within a subnet
-- on different VPCs -- will be able to talk with one another at last.
The user facing API (#2116) will be re-enabled in a concurrent PR --
#5823 -- as will NIC spoof detection hole-punching.

Depends on oxidecomputer/opte#490.

Closes #2232, Fixes #1336.

---

A few pieces will block tests passing & merge-readiness:
- [x] Creation of a `lab-2.0-opte-0.32` image.
- [x] Merge of oxidecomputer/maghemite#274 (and updating all the right
SHAs in this PR).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants