-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
srvtopo: Setup metrics in init() function #15304
Conversation
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15304 +/- ##
==========================================
+ Coverage 67.41% 67.64% +0.22%
==========================================
Files 1560 1561 +1
Lines 192752 193537 +785
==========================================
+ Hits 129952 130919 +967
+ Misses 62800 62618 -182 ☔ View full report in Codecov by Sentry. |
This sounds like a bug rather than an internal cleanup, and should be backported? |
There's no user facing bug here at the moment. Creating multiple of these is something that only happens in tests, and not in for example spinning up a |
Can you create an issue describing the problems that this leads to? It's unclear from the description what bad things can happen with the current state. |
Since it was an internal refactor to avoid potential problems in the future I didn't. But I guess I failed to describe in the PR properly what this is avoiding? What exactly is unclear there that I should expand on? |
ef898ae
to
e89ff3d
Compare
if srvTopoCacheRefresh > srvTopoCacheTTL { | ||
log.Fatalf("srv_topo_cache_refresh must be less than or equal to srv_topo_cache_ttl") | ||
} | ||
|
||
metric := "" | ||
if counterPrefix != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
counterPrefix
was never empty, so we never created counters with no name anyway.
e89ff3d
to
12f753e
Compare
12f753e
to
3393ee3
Compare
3393ee3
to
a2b0374
Compare
if counterPrefix != "" { | ||
metric = counterPrefix + "Counts" | ||
} | ||
counts := stats.NewCountersWithSingleLabel(metric, "Resilient srvtopo server operations", "type") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dynamic naming here is what is core to the problem and if the same name is (accidentally) used twice, it would panic. Instead we now always pass in the counter and build it in init()
functions in the appropriate places beforehand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Callers have to do a little more work, but I do think this is a good change overall. As a bonus we get rid of the random hack we had to introduce in tests to avoid panics.
a2b0374
to
6cd4b1e
Compare
We currently create a single `srvTopoServer` inside `NewTabletServer` to avoid panics on duplicate metrics. This is wrong though, since the topo server created is tied to the context given, and if that is cancelled, it will be stopped. We will then never start a new one. The reason this was set up like that is because of metrics. So instead of what we did before, we need to create metrics once. All other metrics are created in `init()` functions and we need to do the same here. Signed-off-by: Dirkjan Bussink <[email protected]>
We don't need to store this in the resilient server, we can just pass it down to the watchers instead. Signed-off-by: Dirkjan Bussink <[email protected]>
Signed-off-by: Dirkjan Bussink <[email protected]>
6cd4b1e
to
2973dc4
Compare
We currently create a single
srvTopoServer
insideNewTabletServer
to avoid panics on duplicate metrics.This is wrong though, since the topo server created is tied to the context given, and if that is cancelled, it will be stopped. We will then never start a new one.
The reason this was set up like that is because of metrics. So instead of what we did before, we need to create metrics once. All other metrics are created in
init()
functions and we need to do the same here.Issue
Fixes #15319
Checklist