-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MDEV-34614] mysqlbinlog warn on EOF before GTID in --stop-position #3600
base: 10.11
Are you sure you want to change the base?
Conversation
PR # 60² |
mysql-test/suite/binlog/t/binlog_mysqlbinlog_warn_stop_gtid.test
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ParadoxV5
Thanks for a well done PR! Please see my notes.
sql/rpl_gtid.cc
Outdated
rpl_gtid *stop_gtid= &stop_gtids[i]; | ||
struct audit_elem *audit_elem= (struct audit_elem *)my_hash_search( | ||
&m_audit_elem_domain_lookup, | ||
(const uchar *)&(stop_gtid->domain_id), 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(const uchar *)&(stop_gtid->domain_id), 0 | |
reinterpret_cast<const uchar *>(&(stop_gtid->domain_id)), 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks; my background was in C and I’m unfamiliar with C++ trickeries like this.
Note that I copied based this section from
Lines 3188 to 3190 in 0e8fb97
struct audit_elem *audit_elem= (struct audit_elem *) my_hash_search( | |
&m_audit_elem_domain_lookup, | |
(const uchar *) &(stop_gtid->domain_id), 0); |
which used the C-cast.
mysql-test/suite/binlog/t/binlog_mysqlbinlog_warn_stop_gtid.test
Outdated
Show resolved
Hide resolved
2bf416c
to
ffec23d
Compare
sql/rpl_gtid.cc
Outdated
@@ -1,5 +1,5 @@ | |||
/* Copyright (c) 2013, Kristian Nielsen and MariaDB Services Ab. | |||
Copyright (c) 2020, 2022, MariaDB Corporation. | |||
Copyright (c) 2020, 2024, MariaDB Corporation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello seniors, new member question: Who, when, and what conventions are on the copyright years?
By the way, perhaps devs should ditch them altogether: https://hynek.me/til/copyright-years
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2,25 +2,25 @@ | |||
--echo # MDEV-27037 mysqlbinlog emits a warning when reaching EOF before stop-condition | |||
--echo | |||
|
|||
--let assert_file= $MYSQLTEST_VARDIR/tmp/warn_pos_test_file.out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This renaming is fine, but IMHO not worth the churn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ParadoxV5 ! See my latest notes (though I can't help but feel I lead you in a circle earlier, to arrive back where you had already started looking 😅)
client/mysqlbinlog.cc
Outdated
@@ -3490,6 +3491,8 @@ static Exit_status dump_local_log_entries(PRINT_EVENT_INFO *print_event_info, | |||
/* | |||
Emit a warning in the event that we finished processing input | |||
before reaching the boundary indicated by --stop-position. | |||
Due to design limitations, Binlog_gtid_state_validator::report |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was originally going to say something like..
For more straightforward reading, it'd be better to highlight the inconsistency at the start of the sentence. Also, "due to design limitations" doesn't really provide context into why (which is good to have when going back through time :), so it would be good to expand upon that).
and in writing up a suggestion, my rational on design limitations was to reduce code redundancy, b/c I was thinking the stop-check would go into repot_audit_findings()
which is already at the domain-scope. But now that is no longer the case, and gtid_state_validator::report()
is inconsistent already, in the sense that the stop-gtid check is already separated from the rest of the checks. So we may as well retain some level of consistency and do the check here (e.g. by adding a new API call for gtid_state_validator
that checks if the stop position has been hit).
_Though I wonder if that is already what you had planned before I interfered in the fist place 😇 _
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I wonder if that is already what you had planned before I interfered in the fist place 😇
Yes, it was similar.
I originally considered adding an API to Domain_gtid_event_filter *position_gtid_filter
that exposes the completion states of each domain-specific subfilter.
I was ultimately convinced that hiding the code in gtid_state_validator
instead is a lesser change to APIs. After all,
This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.
I’d definitely choose the former, more consistent strategy if
This is a new feature or a refactoring, and the PR is based against the main branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more straightforward reading, it'd be better to highlight the inconsistency at the start of the sentence. Also, "due to design limitations" doesn't really provide context into why (which is good to have when going back through time :), so it would be good to expand upon that).
I have my rationals in the commit description. I’ll copy them over. (EDIT: if we’re not going back to the drawing board, that is)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here’s what we could do: Apply these introverted changes for 10.11
branch, but also follow-up with a refactor (see top post) on the main
branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally considered adding an API to Domain_gtid_event_filter *position_gtid_filter that exposes the completion states of each domain-specific subfilter.
You could also do the warnings in the filters. There is already has_finished()
, which you could extend with an optional err-mode parameter (with some noop default) , and then call has_finished(<err_mode_config>)
here (or something like that. One idea could be a function-pointer to an output function (like report_f
in report_audit_findings
(just an idea, you could do it how you like, if you agree with the overall direction)). Though you'll also have to extend the filters a little to support this :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd be ok extending the filter API in 10.11, so long as we do it minimally (i.e. I'm not a huge fan of exposing the filters directly as position_gtid_filter.m_stop_filters
). Perhaps we can add a function to Domain_gtid_event_filter
of signature rpl_gtid *Domain_gtid_event_filter::get_stop_gtids_for_active_filters(uint n)
which abstracts the loop and has_finished()
call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have reëvaluated that it’s not worth to implement suboptimally and inconsistentlly (and also tamper the delete gtid_state_validator
optimization) just to minimalize API differences.
I think we'd be ok extending the filter API in 10.11, so long as we do it minimally.
I am going to reïmplement this and #3400 with my ideal concept imagined in the top post:
Move the warning feature to within the (entire?) Gtid_event_filter
family.
This will probably be a new method (rather than repurposing has_finished
) so it futureproofs for other filters such as --start-position
and even --ignore-server-ids
(MDEV-20119).
If this turns out to be too much change for 10.11, I might rebrand this ticket & PR to a new feature on main
.
(I won’t refactor the validator, at least for now.)
Speaking of – according to git grep
, this is an unimplemented prototype found only in its header:
Lines 666 to 674 in a37f71b
protected: | |
/* | |
When processing GTID streams, the order in which they are processed should | |
be sequential with no gaps between events. If a gap is found within a | |
window, warn the user. | |
*/ | |
void verify_gtid_is_expected(rpl_gtid *gtid); | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good (though I could never find what you were referring to with):
with my ideal concept imagined in the top post:
And please feel free to remove verify_gtid_is_expected()
while you're at it :) (though as a separate cleanup commit would be best)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with my ideal concept imagined in the top post:
oh, it’s collapsed under “▶ Unlike 3400”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design B is up.
#3400 was not affected (fortunately?) because index and timestamp filters were not cousin objects of the Gtid_event_filter
s.
I prefer OOP for them too, but such rewriting would be well beyond the scope of this JIRA ticket, especially one that’s targeting a–insert repeated phrase here.
mysql-test/suite/binlog/t/binlog_mysqlbinlog_warn_stop_gtid.test
Outdated
Show resolved
Hide resolved
This commit is a collection of various minor touchups extracted from multiple iterations of MDEV-34614 mysqlbinlog warn on EOF before GTID in --stop-position. * `rpl_gtid.h`: Remove unimplemented function prototype `Window_gtid_event_filter::verify_gtid_is_expected` (it’s also a red herring to MDEV-34614) * `rpl_gtid.cc`: Add brief description to `report_audit_findings` * `binlog_mysqlbinlog_warn_stop_position`: Name the temporary output redirect as ignored to match `binlog_mysqlbinlog_warn_stop_datetime` * #3400 (comment) * `binlog_mysqlbinlog_warn_stop_datetime`: Remove duplicate case
This commit adds warnings for `--stop-position` GTIDs that were not reached at EOF, mainly as a follow-up to MDEV-27037 Mysqlbinlog should output a warning if EOF is found before its stop condition (#3400). `--stop-position` warnings inform possible mistakes in the input, especially for progress reporting of scripts/wrappers. MDEV-34614 enhances MDEV-27037 with individualized GTID validation, for GTID range selection weren’t in all versions that MDEV-27037 targeted. The `Gtid_event_filter` family provides the the warning mechanism polymorphically and through the new public method `verify_completed_state`. It’s uncommon to commit a new public API targeting a non-latest version, but this design is not only hierarchical but also extensible (e.g., to `--ignore-server-ids`).
ffec23d
to
4007ff0
Compare
@@ -4067,3 +4119,17 @@ my_bool Intersecting_gtid_event_filter::has_finished() | |||
} | |||
return FALSE; | |||
} | |||
|
|||
my_bool Intersecting_gtid_event_filter::verify_completed_state() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn’t currently called, but I implemented the override anyway for potential future-proofing.
(position_gtid_filter
is never an Intersecting_gtid_event_filter
, but gtid_event_filter
may be.)
my_hash_iterate(&m_filters_by_id_hash, | ||
verify_subfilter_completed_state, &is_any_incomplete); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of iterating the hash (and segfault myself from the hash’s untypedness), I could’ve directly iterated the Domain_gtid_event_filter::m_stop_filters
.
Iterating Id_delegating_gtid_event_filter::m_filters_by_id_hash
- future-proofs for
--start-position
warnings – if we want to. - is possible in the superclass, so
Server_gtid_event_filter
(used by--ignore-server-ids
) can inherit the method override.
Any preference?
{ | ||
my_bool is_any_incomplete= FALSE; | ||
Gtid_event_filter *subfilter; | ||
for (size_t i= 0; i < m_filters.elements; i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method was based on ::has_finished()
.
Besides formatting, a difference is that this uses size_t
to match the type of m_filters.elements
.
The former had a ulong
; it’s a potential bug, but I’m yet to review which earliest (supported) version this fix should apply on.
it specifies stderr and has a `fflush(result_file)` step. | ||
*/ | ||
Binlog_gtid_state_validator::warn(stderr, | ||
"Did not reach stop position %u-%u-%llu before end of input", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%u-%u-%llu
is used throughout this file (and likely more); but GTID struct fields are actually fixed-size, whereas these formats varies by platform.
I’ll file this mismatch as a separate subissue of #3360 (review) (MDEV-21978).
Description
This PR adds
mariadb-binlog
warnings for--stop-position
GTIDs that were not reached at EOF, mainly as a follow-up to MDEV-27037 which added warnings for unreached--stop-position
file indices and--stop-datetime
s.What problem is the patch trying to solve?
--stop-position
warnings inform possible mistakes in the input; they’re helpful for both DBAs on CLI and scripts/wrappers that check/log the output status.MDEV-34614 enchances MDEV-27037 with per-domain GTID validation.
MDEV-27037 was considered a bug fix and so built upon the oldest supported branch (at the time) for forwardporting, but GTID range selection wasn’t a feature of its ancient base version.
Unlike #3400, the warning implementation is in the
Gtid_event_filter
family through a new public polymorphic (virtual
&override
) methodverify_completed_state
, which provides (sub)filter statuses tostderr
without exposing internals such asDomain_gtid_event_filter#m_stop_filters
.While it’s uncommon to commit a new public API targeting a non-latest version, this method is hierarchical and is extensible to say
--start-position
and even--ignore-server-ids
(MDEV-20119).This choice is superior to the previous
Binlog_gtid_state_validator
design.Note that in
--gtid-strict-mode
(errors) or-vvv
mode (warnings) (only),Binlog_gtid_state_validator
is already prompting that all (rather than any)--start-position
s are unreached. This may rather be an erroneous side effect (not certain; I’m new 👶).server/client/mysqlbinlog.cc
Lines 1130 to 1151 in 0e8fb97
Ideally, the Separation of Concerns would have:
--start-position
/datetime
s regardless of--gtid-strict-mode
/-vvv
, not just--stop-position
/datetime
s.Do you think this patch might introduce side-effects in other parts of the server?
Performance: The task of validating
--stop-position
(and--start-position
too, if those’re implemented) GTIDs isO(n)
.Release Notes
What does #3400 say for its Release Notes?… Nothing? 🤷
Here’s a summary that covers both this and the previous PR:
How can this PR be tested?
The new test
binlog_mysqlbinlog_warn_stop_gtid
covers positive expectations(but not negative testing yet), though QA should run the entirebinlog
suite to also confirm no regressions.PR quality check
main
branch.