-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding X.509 SPIFFE auth #14084
Adding X.509 SPIFFE auth #14084
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
I'll admit that I'm very new to developing Vitess (this is my first PR, after all), and I would not consider myself a Golang expert by any stretch. I may have missed some things, so thanks in advance for your PR reviews! |
ca5fdb9
to
6710cdb
Compare
} | ||
|
||
for _, trustDomain := range trustedDomains { | ||
if cert.URIs[0].Hostname() == trustDomain { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify, this is explicitly only validating the hostnames obviously, but realistically, would you not want to also validate against the path as well?
Typically, this also can be paired with a (excuse me since I don't fully know the terms here), some SPIRE server to phone home to to ask "is this URI allowed to do this", similar to OAuth.
Without that full implementation, is there any merit into only validating the hostname rather than the full path?
I can see a case of:
spiffe://example.com/client1
, and spiffe://example.com/client2
and wanting to allow client1, but not allow client2.
I'm not entirely sure of all the use cases, but I believe in a lot of contexts, this translates to a per-user issuing a certificate for auth, or per-app, and you'd want to restrict to some apps/users, and not the others.
I think explicitly declaring we only validate the hostname, but I would assume there'd be a desire to do more and the implementation here would make that a bit harder to shoehorn in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm not an expert on this subject at all, but I tend to refer to Envoy for a golden standard with this configuration, and they seem to cover a lot more options as well as a SPIRE agent. https://spiffe.io/docs/latest/microservices/envoy-x509/readme/
If we're choosing to implement a subset of the feature set, I think that's personally fine, I just worry if we implement a subset, now others might want more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mattrobenolt these are great questions/points. For SPIFFE, the entire URI forms the SPIFFE ID, which uniquely identifies a particular user/workload, so things like ACLs could be based on that ID. However, a collection of SPIFFE IDs can belong to a single trust domain (which usually corresponds to a CA, but doesn't necessarily have to). Given this, I think, for accepting connections at all, the thing we should validate is the trust domain, much like we're validating the CA that signed a client cert. This is mostly because we'd accept connections from many individual workloads and users all issued for the same trust domain.
The SPIFFE ID makes more sense to use in ACLs, where we might decide that certain SPIFFE IDs (perhaps some applications or users) should be considered writers, while other IDs are readers.
SPIRE is one implementation of the SPIFFE Workload API which makes it an issuer of SVIDs. It can attest to the identity of workloads and then provide them the tools to access protected resources and validate the IDs of other workloads. You could think of it as something like cert-manager + let's encrypt. SPIRE agents are used to create and automatically rotate workload certs and provide the CA bundle for verifying both sides of an mTLS connection. SPIRE is not used as an API to query information about other workloads. Further, SPIRE is not required to use SPIFFE as long as you have some issuer for SVIDs.
@jgnagy can you provide a concrete motivation for implementing this at this time? Are you adopting Vitess for possible production use? |
Yes, though it'll take a bit of explaining. The current mTLS approach in Vitess is limited. In particular, it uses SANs in an unexpected/non-standard way. Typically, SPIFFE allows us to defer to a real, well-defined standard for identifying services distributed across providers/clusters/etc., which is precisely what we need for authentication. This is a better form of mTLS that we could rely on working the same wherever it is used. For my use-case, I would expect a SVID (SPIFFE Verifiable Identity Document) issuer to be created for a Vitess Cluster. The specifics of how this issuer works isn't important, other than it could attest to workloads that need to access various parts of the cluster. Let's call these components "application", "vtgate", and "vttablet". Let's describe SVIDs issued for these components:
When issued SVIDs, the components also received the SPIFFE bundle (CA bundle for the issuer). Given this, the application can connect to the vtgate, where the vtgate will recognize the issuer and will be able to see that it is for the allowed trust domain. We could make the vtgate extract the ID of the application caller and pass it via the context to the vttablet. The vtgate would connect to the vttablet, authenticating via its SVID. The same thing takes place, where the vttablet recognizes the issuer and sees that the SVID is for an allowed trust domain. The vtgate could pass through the effective caller ID, allowing the vttablet to use ACLs based on the application's SPIFFE ID. This would mean authentication end-to-end, support ACLs, and no passwords required, much like the mTLS authentication tooling, but using a widely-used implementation. Note: I could definitely add more to this PR. I plan on adding another commit to add the caller effective ID pass-through I described above. We could also add support for requiring certain path components of SPIFFE IDs, but this isn't required for the use-case I'd like to demonstrate.
Yes, definitely. I work at Shopify with Vitess regularly as a part of our DB team. That said, I wanted to implement this as a personal PR since I haven't yet proposed the SPIFFE-based approach at work. |
This change adds a new gRPC server option that enables X.509 SVIDs for authentication between gRPC clients and servers. Signed-off-by: Jonathan Gnagy <[email protected]>
6710cdb
to
f01abb2
Compare
This raises some concerns. We've had issues in the past (like any large open-source project) where we accept contributions which are ultimately not used by anyone in production and start bit-rotting. At the same time the original PR author is no longer active, and it falls to the maintainer team to keep things up to date with security issues, golang changes etc. Having said that, I do think it will be a good addition to the project. I can say right now that if it is done with Shopify's backing we will almost certainly accept it. |
@deepthi I completely understand this sentiment. Based on your comment, I had a preliminary conversation with other members of my team and I'll push this to our internal fork so I can put together a proof-of-concept and demonstrate it end-to-end. Happily, as it stands, there is at least some interest in using the feature. I'll explore it more there for now. While I do have confidence that we'll use this (including in production), it might take some time before we get all the way there. In the meantime, I'd like to keep this PR open if possible, in case there is any other interest in the feature. Does that work? |
This PR is being marked as stale because it has been open for 30 days with no activity. To rectify, you may do any of the following:
If no action is taken within 7 days, this PR will be closed. |
This PR was closed because it has been stale for 7 days with no activity. |
Description
Attempts to implement #14083.
This change adds a new gRPC server option that enables X.509 SVIDs for authentication.
Support for this auth plugin requires the use of
--grpc_auth_mode=spiffe
and--grpc_auth_spiffe_allowed_trust_domains
(plus the existing--grpc_key
,--grpc_cert
,--grpc_ca
,--vtctld_grpc_key
,--vtctld_grpc_cert
,--vtctld_grpc_ca
parameters, used the same as themtls
auth mode).When enabled, gRPC servers will expect that clients will provide valid X.509 Leaf SVIDs as client auth certificates. These are described in the SPIFFE documentation as:
Its validation is further described as:
Thus an X.509 SVID is an mTLS client cert with specific features. Crafting them is fairly straightforward and there are several tools to help do this automatically.
Related Issue(s)
Checklist
Deployment Notes
None that I'm aware of, other than a TLS infrastructure that can issue valid SVIDs.