-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HPCC-30299 Update secrets in the background to avoid roxie stalls #18071
Conversation
https://track.hpccsystems.com/browse/HPCC-30299 |
This is easiest to review as two separate commits - one which refactors the code and rearranges it. The second implements the background updater and fixes problems with access times not being updated. |
Worth checking the unit tests are correct, and also that the timeouts are sensible. |
15fea4b
to
c692f35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, plus a question on the logic
{ | ||
const char * slash = strchr(key, '/'); | ||
assertex(slash); | ||
const char * at = strchr(slash, '@'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will go wrong if anyone put a # into a vault id. Perhaps that is checked elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. The vault id is just the name of the vault config entry in values.yaml or environment.xml. I don't think there is currently any checking for allowed characters. We have control over what is allowed and probably should limit the characters used. Could add checking to the helm chart and the CVault constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added HPCC-30124 to check them.
const char * at = strchr(slash, '@'); | ||
const char * hash = strchr(slash, '#'); | ||
|
||
const char * end = nullptr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like this code would be simpler if you initialised end = key+strlen(key) rather than null
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be, not sure I want to change it no though.
// We should never replace known contents for unknown contents | ||
// so once this returns true it should always return true | ||
bool hasContents() const | ||
{ | ||
return contents != nullptr; | ||
} | ||
|
||
//Has the secret value been used since it was last checked for an update? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm struggling to understand why it is relevant whether a secret value has been used recently, when updating them. Unless you are removing secrets from the cache if not used recently (which I don't think you are).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The aim is to not refresh secrets in the background that have not been used recently - to avoid unnecessary load on the vault.
for (auto & entry : secrets) | ||
{ | ||
SecretCacheEntry * secret = entry.second.get(); | ||
if (secret->isActive() && secret->needsRefresh(when)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't feel right - just because a secret has not been read recently doesn't mean that the next time it IS used it's ok to use an old version, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Next time it is actually used it will check to see if it needs a refresh, and if it does it will go and get the new value. This PR doesn't change that behaviour, it preemptively refreshes secrets that are in active use behind the scenes to avoid roxie pausing when it needs to access them.
I found a bunch of core files (in CentOS based On-Demand Smoketest) and the generated trace files show similar problem:
I don't know it is related to your changes or not. |
if (resolved) | ||
globalSecretCache.updateSecret(secret, resolved, now, accessed); | ||
else | ||
secret->noteFailedUpdate(now, accessed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would we always honor accessed if not resolved ?
const char * slash = strchr(key, '/'); | ||
assertex(slash); | ||
const char * at = strchr(slash, '@'); | ||
const char * hash = strchr(slash, '#'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - are these '/', '#', '@' special chars always ok to use as separators ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two minor comments, but otherwise looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor comments otherwise looks good. Vault name should be restricted to basic charset and validated in helm chart and perhaps elsewhere.
{ | ||
const char * slash = strchr(key, '/'); | ||
assertex(slash); | ||
const char * at = strchr(slash, '@'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. The vault id is just the name of the vault config entry in values.yaml or environment.xml. I don't think there is currently any checking for allowed characters. We have control over what is allowed and probably should limit the characters used. Could add checking to the helm chart and the CVault constructor.
testing/unittests/jlibtests.cpp
Outdated
MilliSleep(30); // elapsed=180 = 80 + 80 + 20 | ||
CPPUNIT_ASSERT(secret6->isValid()); | ||
CPPUNIT_ASSERT(!secret6->isStale()); | ||
unsigned version3 = secret6->getVersion(); // Mark the value as accessed, but too early to be refreshed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the above comment wrong? No longer too early to be refreshed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct comment is wrong.
c692f35
to
8956ab2
Compare
@afishbeck I have created a jira, fixed the comment and squashed. |
I agree that this is an important improvement. One way of looking at it is that since this can be turned off via config, what is the chance the code change would break something even if turned off? |
8956ab2
to
42782ff
Compare
@mckellyln I decided not to merge because I was concerned about msTick() wrapping. In this case, secrets are never removed from the hash table - which means that if a roxie has been up for more than 25 days, and early in its life it accessed a secret, but hasn't since, the test for needs refresh and expired may be wrong. |
} | ||
|
||
extern jlib_decl void setSecretTimeout(unsigned timeoutMs) | ||
{ | ||
secretTimeoutMs = timeoutMs; | ||
secretTimeoutNs = (unsigned __int64)timeoutMs * 1000000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why unsigned __int64 and not __uint64 ? Does it matter ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit 2 looks good.
Approved.
Signed-off-by: Gavin Halliday <[email protected]>
Type of change:
Checklist:
Smoketest:
Testing: