Adding X.509 SPIFFE auth #14084

jgnagy · 2023-09-22T22:03:13Z

Description

Attempts to implement #14083.

This change adds a new gRPC server option that enables X.509 SVIDs for authentication.

Support for this auth plugin requires the use of --grpc_auth_mode=spiffe and --grpc_auth_spiffe_allowed_trust_domains (plus the existing --grpc_key, --grpc_cert, --grpc_ca, --vtctld_grpc_key, --vtctld_grpc_cert, --vtctld_grpc_ca parameters, used the same as the mtls auth mode).

When enabled, gRPC servers will expect that clients will provide valid X.509 Leaf SVIDs as client auth certificates. These are described in the SPIFFE documentation as:

A leaf certificate is an SVID that serves to identify a caller or resource and is suitable for use in authentication processes. A leaf certificate [...] is the only type that may serve to identify a resource or caller.

Leaf certificate SPIFFE IDs MUST have a non-root path component. The Subject field is not required, however, the URI SAN extension MUST be marked as critical if Subject is omitted, per section 4.1.2.6 of RFC 5280.

Its validation is further described as:

When validating an X.509 SVID for authentication purposes, the validator MUST ensure that the CA field in the basic constraints extension is set to false, and that keyCertSign and cRLSign are not set in the key usage extension. The validator must also ensure that the scheme of the SPIFFE ID is set to spiffe://. SVIDs containing more than one URI SAN MUST be rejected.

Thus an X.509 SVID is an mTLS client cert with specific features. Crafting them is fairly straightforward and there are several tools to help do this automatically.

Related Issue(s)

Feature Request: Add Support for SPIFFE gRPC auth #14083

Checklist

"Backport to:" labels have been added if this change should be back-ported
Tests were added or are not required
Did the new or modified tests pass consistently locally and on the CI
Documentation was added or is not required

Deployment Notes

None that I'm aware of, other than a TLS infrastructure that can issue valid SVIDs.

vitess-bot · 2023-09-22T22:03:15Z

jgnagy · 2023-09-22T22:04:25Z

I'll admit that I'm very new to developing Vitess (this is my first PR, after all), and I would not consider myself a Golang expert by any stretch.

I may have missed some things, so thanks in advance for your PR reviews!

go/vt/servenv/grpc_server_auth_spiffe.go

mattrobenolt · 2023-09-25T17:22:07Z

go/vt/servenv/grpc_server_auth_spiffe.go

+	}
+
+	for _, trustDomain := range trustedDomains {
+		if cert.URIs[0].Hostname() == trustDomain {


Just to clarify, this is explicitly only validating the hostnames obviously, but realistically, would you not want to also validate against the path as well?

Typically, this also can be paired with a ~~(excuse me since I don't fully know the terms here), some~~ SPIRE server to phone home to to ask "is this URI allowed to do this", similar to OAuth.

Without that full implementation, is there any merit into only validating the hostname rather than the full path?

I can see a case of:

spiffe://example.com/client1, and spiffe://example.com/client2 and wanting to allow client1, but not allow client2.

I'm not entirely sure of all the use cases, but I believe in a lot of contexts, this translates to a per-user issuing a certificate for auth, or per-app, and you'd want to restrict to some apps/users, and not the others.

I think explicitly declaring we only validate the hostname, but I would assume there'd be a desire to do more and the implementation here would make that a bit harder to shoehorn in.

Also, I'm not an expert on this subject at all, but I tend to refer to Envoy for a golden standard with this configuration, and they seem to cover a lot more options as well as a SPIRE agent. https://spiffe.io/docs/latest/microservices/envoy-x509/readme/

If we're choosing to implement a subset of the feature set, I think that's personally fine, I just worry if we implement a subset, now others might want more.

@mattrobenolt these are great questions/points. For SPIFFE, the entire URI forms the SPIFFE ID, which uniquely identifies a particular user/workload, so things like ACLs could be based on that ID. However, a collection of SPIFFE IDs can belong to a single trust domain (which usually corresponds to a CA, but doesn't necessarily have to). Given this, I think, for accepting connections at all, the thing we should validate is the trust domain, much like we're validating the CA that signed a client cert. This is mostly because we'd accept connections from many individual workloads and users all issued for the same trust domain.

The SPIFFE ID makes more sense to use in ACLs, where we might decide that certain SPIFFE IDs (perhaps some applications or users) should be considered writers, while other IDs are readers.

SPIRE is one implementation of the SPIFFE Workload API which makes it an issuer of SVIDs. It can attest to the identity of workloads and then provide them the tools to access protected resources and validate the IDs of other workloads. You could think of it as something like cert-manager + let's encrypt. SPIRE agents are used to create and automatically rotate workload certs and provide the CA bundle for verifying both sides of an mTLS connection. SPIRE is not used as an API to query information about other workloads. Further, SPIRE is not required to use SPIFFE as long as you have some issuer for SVIDs.

deepthi · 2023-09-25T19:11:04Z

@jgnagy can you provide a concrete motivation for implementing this at this time? Are you adopting Vitess for possible production use?
I did read the linked issue, but that doesn't address my question.

jgnagy · 2023-09-26T00:01:50Z

[C]an you provide a concrete motivation for implementing this at this time?

Yes, though it'll take a bit of explaining. The current mTLS approach in Vitess is limited. In particular, it uses SANs in an unexpected/non-standard way. Typically, subjectAltNames (SANs) are alternative ways of referring to the same certificate subject. Vitess is using them as a way of providing group information. Beyond that, we're using a substring search that seems like it would easy to work around or collide (since CN=foo would match both CN=foo and CN=foobar). These are symptoms of using mTLS, which in general lacks well-defined standards beyond standard TLS verification.

SPIFFE allows us to defer to a real, well-defined standard for identifying services distributed across providers/clusters/etc., which is precisely what we need for authentication. This is a better form of mTLS that we could rely on working the same wherever it is used.

For my use-case, I would expect a SVID (SPIFFE Verifiable Identity Document) issuer to be created for a Vitess Cluster. The specifics of how this issuer works isn't important, other than it could attest to workloads that need to access various parts of the cluster.

Let's call these components "application", "vtgate", and "vttablet". Let's describe SVIDs issued for these components:

The application could be issued spiffe://some.vitess.cluster/applications/some.app
The vtgate could be issued spiffe://some.vitess.cluster/vtgates/some.vtgate
The vttablet could be issued spiffe://some.vitess.cluster/vttablets/some.vttablet

When issued SVIDs, the components also received the SPIFFE bundle (CA bundle for the issuer). Given this, the application can connect to the vtgate, where the vtgate will recognize the issuer and will be able to see that it is for the allowed trust domain. We could make the vtgate extract the ID of the application caller and pass it via the context to the vttablet.

The vtgate would connect to the vttablet, authenticating via its SVID. The same thing takes place, where the vttablet recognizes the issuer and sees that the SVID is for an allowed trust domain. The vtgate could pass through the effective caller ID, allowing the vttablet to use ACLs based on the application's SPIFFE ID.

This would mean authentication end-to-end, support ACLs, and no passwords required, much like the mTLS authentication tooling, but using a widely-used implementation.

Note: I could definitely add more to this PR. I plan on adding another commit to add the caller effective ID pass-through I described above. We could also add support for requiring certain path components of SPIFFE IDs, but this isn't required for the use-case I'd like to demonstrate.

Are you adopting Vitess for possible production use?

Yes, definitely. I work at Shopify with Vitess regularly as a part of our DB team. That said, I wanted to implement this as a personal PR since I haven't yet proposed the SPIFFE-based approach at work.

This change adds a new gRPC server option that enables X.509 SVIDs for authentication between gRPC clients and servers. Signed-off-by: Jonathan Gnagy <[email protected]>

deepthi · 2023-09-26T17:22:11Z

I wanted to implement this as a personal PR since I haven't yet proposed the SPIFFE-based approach at work.

This raises some concerns. We've had issues in the past (like any large open-source project) where we accept contributions which are ultimately not used by anyone in production and start bit-rotting. At the same time the original PR author is no longer active, and it falls to the maintainer team to keep things up to date with security issues, golang changes etc.
When proposing anything new, it is much better if there is an actual user that is planning to use that feature soon-ish.
The order I would suggest for this feature is not "Hey Vitess already supports this, let's use it". The way to do it would be to first convince Shopify that this is worth doing and then implement it in Vitess.

Having said that, I do think it will be a good addition to the project. I can say right now that if it is done with Shopify's backing we will almost certainly accept it.

jgnagy · 2023-09-26T20:42:34Z

We've had issues in the past (like any large open-source project) where we accept contributions which are ultimately not used by anyone in production and [...] it falls to the maintainer team to keep things up to date

@deepthi I completely understand this sentiment. Based on your comment, I had a preliminary conversation with other members of my team and I'll push this to our internal fork so I can put together a proof-of-concept and demonstrate it end-to-end. Happily, as it stands, there is at least some interest in using the feature. I'll explore it more there for now.

While I do have confidence that we'll use this (including in production), it might take some time before we get all the way there. In the meantime, I'd like to keep this PR open if possible, in case there is any other interest in the feature. Does that work?

github-actions · 2023-10-30T01:26:43Z

This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:

Push additional commits to the associated branch.
Remove the stale label.
Add a comment indicating why it is not stale.

If no action is taken within 7 days, this PR will be closed.

github-actions · 2023-11-06T01:28:23Z

This PR was closed because it has been stale for 7 days with no activity.

jgnagy requested review from deepthi and ajm188 as code owners September 22, 2023 22:03

vitess-bot bot added NeedsDescriptionUpdate The description is not clear or comprehensive enough, and needs work NeedsIssue A linked issue is missing for this Pull Request NeedsWebsiteDocsUpdate What it says labels Sep 22, 2023

github-actions bot added this to the v18.0.0 milestone Sep 22, 2023

jgnagy force-pushed the feat/grpc-spiffe-auth branch from ca5fdb9 to 6710cdb Compare September 22, 2023 23:34

mattrobenolt reviewed Sep 25, 2023

View reviewed changes

feat: adding X.509 SPIFFE auth to gRPC server

f01abb2

This change adds a new gRPC server option that enables X.509 SVIDs for authentication between gRPC clients and servers. Signed-off-by: Jonathan Gnagy <[email protected]>

jgnagy force-pushed the feat/grpc-spiffe-auth branch from 6710cdb to f01abb2 Compare September 26, 2023 04:31

jgnagy requested review from harshit-gangal, systay, frouioui and GuptaManan100 as code owners September 26, 2023 04:31

jgnagy changed the title ~~Adding X.509 SPIFFE auth to gRPC server~~ Adding X.509 SPIFFE auth Sep 26, 2023

deepthi mentioned this pull request Sep 26, 2023

Feature Request: Add Support for SPIFFE gRPC auth #14083

Open

frouioui modified the milestones: v18.0.0, v19.0.0 Sep 29, 2023

github-actions bot added the Stale Marks PRs as stale after a period of inactivity, which are then closed after a grace period. label Oct 30, 2023

github-actions bot closed this Nov 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding X.509 SPIFFE auth #14084

Adding X.509 SPIFFE auth #14084

jgnagy commented Sep 22, 2023 •

edited

Loading

vitess-bot bot commented Sep 22, 2023

jgnagy commented Sep 22, 2023

mattrobenolt Sep 25, 2023 •

edited

Loading

mattrobenolt Sep 25, 2023

jgnagy Sep 25, 2023

deepthi commented Sep 25, 2023

jgnagy commented Sep 26, 2023

deepthi commented Sep 26, 2023

jgnagy commented Sep 26, 2023

github-actions bot commented Oct 30, 2023

github-actions bot commented Nov 6, 2023

Adding X.509 SPIFFE auth #14084

Adding X.509 SPIFFE auth #14084

Conversation

jgnagy commented Sep 22, 2023 • edited Loading

Description

Related Issue(s)

Checklist

Deployment Notes

vitess-bot bot commented Sep 22, 2023

Review Checklist

General

Tests

Documentation

New flags

If a workflow is added or modified:

Backward compatibility

jgnagy commented Sep 22, 2023

mattrobenolt Sep 25, 2023 • edited Loading

Choose a reason for hiding this comment

mattrobenolt Sep 25, 2023

Choose a reason for hiding this comment

jgnagy Sep 25, 2023

Choose a reason for hiding this comment

deepthi commented Sep 25, 2023

jgnagy commented Sep 26, 2023

deepthi commented Sep 26, 2023

jgnagy commented Sep 26, 2023

github-actions bot commented Oct 30, 2023

github-actions bot commented Nov 6, 2023

jgnagy commented Sep 22, 2023 •

edited

Loading

mattrobenolt Sep 25, 2023 •

edited

Loading