Auto-set 5s on-demand heartbeat if --enable_heartbeat is disabled #15099
Signed-off-by: Shlomi Noach <[email protected]>
Codecov Report
Attention:
Additional details and impacted files
@@ Coverage Diff @@
## main #15099 +/- ##
==========================================
+ Coverage 47.29% 47.72% +0.42%
==========================================
Files 1137 1155 +18
Lines 238684 240275 +1591
==========================================
+ Hits 112895 114673 +1778
+ Misses 117168 116999 -169
+ Partials 8621 8603 -18
The description seems pretty geared towards making heartbeats available in `examples`. With that context in mind, I think changing the default value (and behavior) of Vitess to meet an `examples` need creates added risk. The default value being dynamic here adds more complexity that will become difficult to reason about when someone encounters behavior they can't explain.
I'm not sure the added risk and complexity is worth it to satisfy an `examples` need, if we're able to satisfy it just as well by setting an explicit flag in the `examples` config?
@maxenglander I see your point and it is valid. Still, some further thoughts:
Yes and no. As I went back and forth with the configuration, I realized the heartbeat configuration emerged organically but ended up in a bit of a confusing state. Like the fact it takes either of two flags to make heartbeats possible, or the fact that one flag ( A different approach could be that, upon activation, the throttler could programmatically enforce the activation of heartbeats.
  fs.DurationVar(&heartbeatInterval, "heartbeat_interval", 1*time.Second, "How frequently to read and write replication heartbeat.")
- fs.DurationVar(&heartbeatOnDemandDuration, "heartbeat_on_demand_duration", 0, "If non-zero, heartbeats are only written upon consumer request, and only run for up to given duration following the request. Frequent requests can keep the heartbeat running consistently; when requests are infrequent heartbeat may completely stop between requests")
+ fs.DurationVar(&heartbeatOnDemandDuration, "heartbeat_on_demand_duration", 0, "If non-zero, heartbeats are only written upon consumer request, and only run for up to given duration following the request. Frequent requests can keep the heartbeat running consistently. Automatically set to 5s when --heartbeat_enable is unset.")
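The two behaviors described in the new help text (the automatic 5s default, and heartbeats that keep running only for a bounded time after a request) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolveOnDemandDuration` and `onDemandWindow` are hypothetical names, and the sketch assumes that "unset" means `--heartbeat_enable` is false and `--heartbeat_on_demand_duration` is zero.

```go
package main

import (
	"fmt"
	"time"
)

// defaultOnDemandDuration mirrors the 5s value named in the help text.
const defaultOnDemandDuration = 5 * time.Second

// resolveOnDemandDuration is a hypothetical helper (not the Vitess code).
// If heartbeats are enabled, or an on-demand duration was given, the value
// is returned unchanged. Otherwise it falls back to the 5s default.
func resolveOnDemandDuration(heartbeatEnable bool, onDemand time.Duration) time.Duration {
	if heartbeatEnable || onDemand != 0 {
		return onDemand
	}
	return defaultOnDemandDuration
}

// onDemandWindow is a hypothetical type that tracks until when
// heartbeats should keep being written.
type onDemandWindow struct {
	duration time.Duration
	until    time.Time
}

// Request records a consumer request and extends the window to now+duration.
func (w *onDemandWindow) Request(now time.Time) {
	w.until = now.Add(w.duration)
}

// Active reports whether heartbeats should be written at the given time.
func (w *onDemandWindow) Active(now time.Time) bool {
	return now.Before(w.until)
}

func main() {
	d := resolveOnDemandDuration(false, 0)
	w := &onDemandWindow{duration: d}
	start := time.Unix(0, 0)
	w.Request(start)
	// Within the window heartbeats run; once it lapses they stop
	// until the next request arrives.
	fmt.Println(d, w.Active(start.Add(3*time.Second)), w.Active(start.Add(6*time.Second)))
}
```

Requests that arrive more often than the duration keep extending `until`, which is how frequent requests keep the heartbeat running consistently.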
Should we make the two flags mutually exclusive in pflag? That might simplify the logic.
That's a great idea, irrespective of this PR! I only wonder if it's going to break someone's existing setup?
Yeah, it might... 😢 We could potentially add it to the 19.0.0 changelog as a breaking change though.
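For illustration, a mutual-exclusion check along these lines could look like the sketch below. It uses Go's standard `flag` package as a stand-in rather than pflag, and `checkMutuallyExclusive` and `parseHeartbeatFlags` are hypothetical helpers, so treat it as one possible shape of the idea rather than how Vitess would implement it.

```go
package main

import (
	"flag"
	"fmt"
)

// checkMutuallyExclusive is a hypothetical helper. It returns an error when
// both named flags were explicitly set on the command line.
func checkMutuallyExclusive(fs *flag.FlagSet, a, b string) error {
	seen := map[string]bool{}
	// Visit only walks flags that were actually set, not defaults.
	fs.Visit(func(f *flag.Flag) { seen[f.Name] = true })
	if seen[a] && seen[b] {
		return fmt.Errorf("flags --%s and --%s are mutually exclusive", a, b)
	}
	return nil
}

// parseHeartbeatFlags defines stand-ins for the two heartbeat flags,
// parses args, and applies the exclusivity check.
func parseHeartbeatFlags(args []string) error {
	fs := flag.NewFlagSet("sketch", flag.ContinueOnError)
	fs.Bool("heartbeat_enable", false, "stand-in for the real flag")
	fs.Duration("heartbeat_on_demand_duration", 0, "stand-in for the real flag")
	if err := fs.Parse(args); err != nil {
		return err
	}
	return checkMutuallyExclusive(fs, "heartbeat_enable", "heartbeat_on_demand_duration")
}

func main() {
	fmt.Println(parseHeartbeatFlags([]string{"--heartbeat_enable", "--heartbeat_on_demand_duration=5s"}))
	fmt.Println(parseHeartbeatFlags([]string{"--heartbeat_on_demand_duration=5s"}))
}
```

Note that a check based on "was the flag set" also rejects an explicit `--heartbeat_enable=false` combined with an on-demand duration, which is the kind of existing setup the concern above is about.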
Maybe we should take a documentation driven approach here. How would we document the new/proposed configuration here? https://vitess.io/docs/19.0/reference/features/tablet-throttler/#configuration
- IMO we should remove all of the old tablet level config parts/noise there in the v19 docs
- That seems a little misleading now as it says "Enabling the lag throttler also automatically enables heartbeat injection." and it does not note that you must set `--heartbeat_enable=false` (which I think was true before given the noted issues reported with the examples) and `--heartbeat_on_demand_duration` to a non-zero value, although it does describe what it is and recommend values for it
In the meantime I'll open a new PR adding the flag in
Followup in #15204.
Meanwhile converting this to
Please see this followup issue: #15303
Closing for now. Follow #15303 for related work.
Description
This work started with #14978 and #14980. And while the discussion was about the `examples` suite, it applies to all environments.
After #14980 was merged, we realized `examples` could no longer facilitate the tablet throttler with its default setup. That's because the throttler, by default, relies on replication lag calculated from `_vt.heartbeat`.
And while the throttler can be enabled/disabled dynamically, the heartbeat configuration is set on startup.
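To make the dependency on heartbeats concrete, here is a rough sketch of deriving lag from a heartbeat timestamp. It assumes lag is simply the age of the most recent heartbeat seen on the replica; `lagFromHeartbeat` is a hypothetical function and the actual Vitess computation is not shown on this page.

```go
package main

import (
	"fmt"
	"time"
)

// lagFromHeartbeat is a hypothetical helper. It estimates replication lag as
// the age of the most recent heartbeat, with both times given as Unix nanoseconds.
func lagFromHeartbeat(nowUnixNano, heartbeatUnixNano int64) time.Duration {
	lag := time.Duration(nowUnixNano - heartbeatUnixNano)
	if lag < 0 {
		// Guard against clock skew making the heartbeat appear to be in the future.
		return 0
	}
	return lag
}

func main() {
	base := time.Now().UnixNano()
	// A heartbeat written one second ago yields about one second of lag.
	fresh := lagFromHeartbeat(base+int64(time.Second), base)
	// If heartbeats stop being written, the same stale row yields an
	// ever-growing figure even when replication itself is caught up.
	stale := lagFromHeartbeat(base+int64(10*time.Minute), base)
	fmt.Println(fresh, stale)
}
```

Under this assumption, a throttler that is enabled at runtime but has no heartbeats being written would see a large lag value that does not reflect actual replication state.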
Following #14980 we asked ourselves: should the throttler be available for operation in `examples`? And we agreed that it should -- same as any other component in Vitess. To that effect, the throttler should be able to read valid heartbeat information.
There are multiple ways to go about it, and some considerations are:
- `examples` are expected to run on local laptops or otherwise small hosts.
- `examples` should not generate any excessive load.
- `/debug/status` page reported lag due to stale heartbeats, even though there was no actual lag.
The solution in this PR is simple: if `--heartbeat_enable` is set, great, heartbeats are on, and nothing else to do. But if it is unset, and neither is `heartbeat_on_demand_duration`, then we set `heartbeat_on_demand_duration` to `5s`. Meaning, there has to be some sort of heartbeat.
As a reminder, the rules for `heartbeat_on_demand_duration` are:
- `Open()`.
- `5s`, upon request.
- `vreplication`) actively asks it for throttling feedback.
In short, with this new `5s` on-demand heartbeat default value, the throttler will be able to work correctly in `examples` and in any other Vitess setup that is otherwise not configured for heartbeats. Heartbeats will only be generated while the throttler is both enabled and actively engaged with some vreplication workflow or Online DDL operation etc.
And, back to `examples`, because `heartbeat_enable` is unset, `heartbeatReader` is not enabled, and as a result `/debug/status` reports replication lag based on MySQL replication as opposed to based on `_vt.heartbeat`. So this PR still respects #14978.
Related Issue(s)
Backporting
Like #15099, this should be backported to `18.0` and `17.0` and for the same reasons.
Checklist
Deployment Notes