meshca: credentials/sts: PerRPCCreds Implementation #3696
credentials/sts/sts.go
// callCreds provides the implementation of call credentials based on an STS
// token exchange.
type callCreds struct {
	opts Options
One thought: we could decouple the call creds from the STS implementation, so that the call creds become a thin wrapper around it. That way, we may be able to move the STS code to another library (the cloud libraries?).
I tried doing this. The problem seems to be the following:
- Many of our call creds implementations model themselves after the oauth2.TokenSource interface, which provides a Token method to retrieve the token to use. But this method does not take a context, so it would be hard (or ugly) for us to enforce a timeout with this approach. There is an open issue about this: "TokenSource.Token method should take in a Context" (golang/oauth2#262).
And based on an offline discussion with @menghanl, we decided to keep the implementation as is.
@dfawley: Ping...
credentials/sts/sts.go
// Send the request with exponential backoff and retry. Even though not
// retrying here is OK, as the connection attempt will be retried by the
// subConn, it is not very hard to perform some basic retries here.
It's not hard, but it may go against some principles of gRPC. Consider the first connection attempt in particular: if we get an error here caused by misconfiguration, we could return that error immediately and RPCs would fail right away at startup. But if we retry and wait 20 seconds or a minute before failing, even fail-fast RPCs will stall for that whole duration, although they have no hope of succeeding. Also, the error returned is (currently) the context error rather than the last encountered error, which hurts debuggability (though this can be fixed).
Yes, we will churn the server connection if a transient error occurs here, but that seems like a tradeoff worth making to me.
I remember Muxi saying that C-core does not do any retries here. I'm hearing that we shouldn't be doing any either. Shall I go ahead and get rid of all the code doing retries and just return an error if we run into one?
I think that would be best, yes.
Done.
PTAL.
Also, refactored the code a bit for better testability. PTAL.
Done.
- Unexport the [Request/Response]Parameters structs.
- Hold the lock for the entire duration of the STS request.
- Don't ignore errors on attempts to read the actor token.
Implementation of call credentials based on https://tools.ietf.org/html/rfc8693.