Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MDEV-34614] mysqlbinlog warn on EOF before GTID in --stop-position #3600

Open
wants to merge 2 commits into
base: 10.11
Choose a base branch
from

Conversation

ParadoxV5
Copy link
Contributor

@ParadoxV5 ParadoxV5 commented Oct 29, 2024

Description

This PR adds mariadb-binlog warnings for --stop-position GTIDs that were not reached at EOF, mainly as a follow-up to MDEV-27037 which added warnings for unreached --stop-position file indices and --stop-datetimes.

What problem is the patch trying to solve?

--stop-position warnings inform possible mistakes in the input; they’re helpful for both DBAs on CLI and scripts/wrappers that check/log the output status.

MDEV-34614 enchances MDEV-27037 with per-domain GTID validation.
MDEV-27037 was considered a bug fix and so built upon the oldest supported branch (at the time) for forwardporting, but GTID range selection wasn’t a feature of its ancient base version.

Unlike #3400, the warning implementation is in the Gtid_event_filter family through a new public polymorphic (virtual&override) method verify_completed_state, which provides (sub)filter statuses to stderr without exposing internals such as Domain_gtid_event_filter#m_stop_filters.
While it’s uncommon to commit a new public API targeting a non-latest version, this method is hierarchical and is extensible to say --start-position and even --ignore-server-ids (MDEV-20119).

This choice is superior to the previous Binlog_gtid_state_validator design.
Note that in --gtid-strict-mode (errors) or -vvv mode (warnings) (only), Binlog_gtid_state_validator is already prompting that all (rather than any) --start-positions are unreached. This may rather be an erroneous side effect (not certain; I’m new 👶).

server/client/mysqlbinlog.cc

Lines 1130 to 1151 in 0e8fb97

/*
Where we always ensure the initial binlog state is valid, we only
continually monitor the GTID stream for validity if we are in GTID
strict mode (for errors) or if three levels of verbosity is provided
(for warnings).
If we don't care about ensuring GTID validity, just delete the auditor
object to disable it for future checks.
*/
if (gtid_state_validator && print_event_info->is_event_group_active())
{
if (!(opt_gtid_strict_mode || verbose >= 3))
{
delete gtid_state_validator;
/*
Explicitly reset to NULL to simplify checks on if auditing is enabled
i.e. if it is defined, assume we want to use it
*/
gtid_state_validator= NULL;
}
else

Ideally, the Separation of Concerns would have:

  • Filters preselect which entries (row or transaction depending on the format) are processed.
  • The GTID validator doesn’t care about ranges; it only validates entries (e.g., confirming GITD order) within the range given by the filters.
  • For parity, the program warns out-of-range --start-position/datetimes regardless of --gtid-strict-mode/-vvv, not just --stop-position/datetimes.

Do you think this patch might introduce side-effects in other parts of the server?

Performance: The task of validating --stop-position (and --start-position too, if those’re implemented) GTIDs is O(n).

Release Notes

What does #3400 say for its Release Notes?… Nothing? 🤷

Here’s a summary that covers both this and the previous PR:

mariadb-binlog now warns about any --stop-positions and --stop-datetimes that didn’t apply after reading the entire log.

How can this PR be tested?

The new test binlog_mysqlbinlog_warn_stop_gtid covers positive expectations (but not negative testing yet), though QA should run the entire binlog suite to also confirm no regressions.

PR quality check

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.
  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@ParadoxV5
Copy link
Contributor Author

PR # 60²

sql/rpl_gtid.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ParadoxV5

Thanks for a well done PR! Please see my notes.

client/mysqlbinlog.cc Outdated Show resolved Hide resolved
sql/rpl_gtid.cc Outdated Show resolved Hide resolved
sql/rpl_gtid.cc Outdated Show resolved Hide resolved
sql/rpl_gtid.cc Outdated Show resolved Hide resolved
sql/rpl_gtid.cc Outdated
rpl_gtid *stop_gtid= &stop_gtids[i];
struct audit_elem *audit_elem= (struct audit_elem *)my_hash_search(
&m_audit_elem_domain_lookup,
(const uchar *)&(stop_gtid->domain_id), 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(const uchar *)&(stop_gtid->domain_id), 0
reinterpret_cast<const uchar *>(&(stop_gtid->domain_id)), 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks; my background was in C and I’m unfamiliar with C++ trickeries like this.

Note that I copied based this section from

server/sql/rpl_gtid.cc

Lines 3188 to 3190 in 0e8fb97

struct audit_elem *audit_elem= (struct audit_elem *) my_hash_search(
&m_audit_elem_domain_lookup,
(const uchar *) &(stop_gtid->domain_id), 0);

which used the C-cast.

sql/rpl_gtid.cc Outdated Show resolved Hide resolved
sql/rpl_gtid.cc Outdated
@@ -1,5 +1,5 @@
/* Copyright (c) 2013, Kristian Nielsen and MariaDB Services Ab.
Copyright (c) 2020, 2022, MariaDB Corporation.
Copyright (c) 2020, 2024, MariaDB Corporation.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello seniors, new member question: Who, when, and what conventions are on the copyright years?

By the way, perhaps devs should ditch them altogether: https://hynek.me/til/copyright-years

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably a question for @vuvova (or maybe @dbart ?)

And FWIW, speaking for myself, I've only actually set a year when creating a new file. I never felt it necessary to update one in an existing file.

@@ -2,25 +2,25 @@
--echo # MDEV-27037 mysqlbinlog emits a warning when reaching EOF before stop-condition
--echo

--let assert_file= $MYSQLTEST_VARDIR/tmp/warn_pos_test_file.out

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This renaming is fine, but IMHO not worth the churn.

sql/rpl_gtid.cc Outdated Show resolved Hide resolved
Copy link
Contributor

@bnestere bnestere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ParadoxV5 ! See my latest notes (though I can't help but feel I lead you in a circle earlier, to arrive back where you had already started looking 😅)

sql/rpl_gtid.cc Outdated Show resolved Hide resolved
@@ -3490,6 +3491,8 @@ static Exit_status dump_local_log_entries(PRINT_EVENT_INFO *print_event_info,
/*
Emit a warning in the event that we finished processing input
before reaching the boundary indicated by --stop-position.
Due to design limitations, Binlog_gtid_state_validator::report
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was originally going to say something like..

For more straightforward reading, it'd be better to highlight the inconsistency at the start of the sentence. Also, "due to design limitations" doesn't really provide context into why (which is good to have when going back through time :), so it would be good to expand upon that).

and in writing up a suggestion, my rational on design limitations was to reduce code redundancy, b/c I was thinking the stop-check would go into repot_audit_findings() which is already at the domain-scope. But now that is no longer the case, and gtid_state_validator::report() is inconsistent already, in the sense that the stop-gtid check is already separated from the rest of the checks. So we may as well retain some level of consistency and do the check here (e.g. by adding a new API call for gtid_state_validator that checks if the stop position has been hit).

_Though I wonder if that is already what you had planned before I interfered in the fist place 😇 _

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I wonder if that is already what you had planned before I interfered in the fist place 😇

Yes, it was similar.
I originally considered adding an API to Domain_gtid_event_filter *position_gtid_filter that exposes the completion states of each domain-specific subfilter.
I was ultimately convinced that hiding the code in gtid_state_validator instead is a lesser change to APIs. After all,

This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

I’d definitely choose the former, more consistent strategy if

This is a new feature or a refactoring, and the PR is based against the main branch.

Copy link
Contributor Author

@ParadoxV5 ParadoxV5 Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more straightforward reading, it'd be better to highlight the inconsistency at the start of the sentence. Also, "due to design limitations" doesn't really provide context into why (which is good to have when going back through time :), so it would be good to expand upon that).

I have my rationals in the commit description. I’ll copy them over. (EDIT: if we’re not going back to the drawing board, that is)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here’s what we could do: Apply these introverted changes for 10.11 branch, but also follow-up with a refactor (see top post) on the main branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally considered adding an API to Domain_gtid_event_filter *position_gtid_filter that exposes the completion states of each domain-specific subfilter.

You could also do the warnings in the filters. There is already has_finished(), which you could extend with an optional err-mode parameter (with some noop default) , and then call has_finished(<err_mode_config>) here (or something like that. One idea could be a function-pointer to an output function (like report_f in report_audit_findings (just an idea, you could do it how you like, if you agree with the overall direction)). Though you'll also have to extend the filters a little to support this :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'd be ok extending the filter API in 10.11, so long as we do it minimally (i.e. I'm not a huge fan of exposing the filters directly as position_gtid_filter.m_stop_filters). Perhaps we can add a function to Domain_gtid_event_filter of signature rpl_gtid *Domain_gtid_event_filter::get_stop_gtids_for_active_filters(uint n) which abstracts the loop and has_finished() call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have reëvaluated that it’s not worth to implement suboptimally and inconsistentlly (and also tamper the delete gtid_state_validator optimization) just to minimalize API differences.

I think we'd be ok extending the filter API in 10.11, so long as we do it minimally.

I am going to reïmplement this and #3400 with my ideal concept imagined in the top post:
Move the warning feature to within the (entire?) Gtid_event_filter family.
This will probably be a new method (rather than repurposing has_finished) so it futureproofs for other filters such as --start-position and even --ignore-server-ids (MDEV-20119).

If this turns out to be too much change for 10.11, I might rebrand this ticket & PR to a new feature on main.
(I won’t refactor the validator, at least for now.)


Speaking of – according to git grep, this is an unimplemented prototype found only in its header:

server/sql/rpl_gtid.h

Lines 666 to 674 in a37f71b

protected:
/*
When processing GTID streams, the order in which they are processed should
be sequential with no gaps between events. If a gap is found within a
window, warn the user.
*/
void verify_gtid_is_expected(rpl_gtid *gtid);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good (though I could never find what you were referring to with):

with my ideal concept imagined in the top post:

And please feel free to remove verify_gtid_is_expected() while you're at it :) (though as a separate cleanup commit would be best)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with my ideal concept imagined in the top post:

oh, it’s collapsed under “▶ Unlike 3400”

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design B is up.

#3400 was not affected (fortunately?) because index and timestamp filters were not cousin objects of the Gtid_event_filters.
I prefer OOP for them too, but such rewriting would be well beyond the scope of this JIRA ticket, especially one that’s targeting a–insert repeated phrase here.

sql/rpl_gtid.cc Show resolved Hide resolved
@ParadoxV5 ParadoxV5 marked this pull request as draft November 5, 2024 20:18
This commit is a collection of various minor
touchups extracted from multiple iterations of
MDEV-34614 mysqlbinlog warn on EOF before GTID in --stop-position.

* `rpl_gtid.h`: Remove unimplemented function prototype
  `Window_gtid_event_filter::verify_gtid_is_expected`
  (it’s also a red herring to MDEV-34614)
* `rpl_gtid.cc`: Add brief description to `report_audit_findings`
* `binlog_mysqlbinlog_warn_stop_position`: Name the temporary output
  redirect as ignored to match `binlog_mysqlbinlog_warn_stop_datetime`
  * #3400 (comment)
* `binlog_mysqlbinlog_warn_stop_datetime`: Remove duplicate case
This commit adds warnings for `--stop-position` GTIDs
that were not reached at EOF, mainly as a follow-up to
MDEV-27037 Mysqlbinlog should output a warning if EOF is found before its stop condition
(#3400).

`--stop-position` warnings inform possible mistakes in the input,
especially for progress reporting of scripts/wrappers.
MDEV-34614 enhances MDEV-27037 with individualized GTID validation, for
GTID range selection weren’t in all versions that MDEV-27037 targeted.

The `Gtid_event_filter` family provides the the warning
mechanism polymorphically and through the new public method
`verify_completed_state`. It’s uncommon to commit a new public
API targeting a non-latest version, but this design is not only
hierarchical but also extensible (e.g., to `--ignore-server-ids`).
@ParadoxV5 ParadoxV5 marked this pull request as ready for review November 11, 2024 01:14
@@ -4067,3 +4119,17 @@ my_bool Intersecting_gtid_event_filter::has_finished()
}
return FALSE;
}

my_bool Intersecting_gtid_event_filter::verify_completed_state()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn’t currently called, but I implemented the override anyway for potential future-proofing.
(position_gtid_filter is never an Intersecting_gtid_event_filter, but gtid_event_filter may be.)

Comment on lines +3657 to +3658
my_hash_iterate(&m_filters_by_id_hash,
verify_subfilter_completed_state, &is_any_incomplete);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of iterating the hash (and segfault myself from the hash’s untypedness), I could’ve directly iterated the Domain_gtid_event_filter::m_stop_filters.
Iterating Id_delegating_gtid_event_filter::m_filters_by_id_hash

  • future-proofs for --start-position warnings – if we want to.
  • is possible in the superclass, so Server_gtid_event_filter (used by --ignore-server-ids) can inherit the method override.

Any preference?

{
my_bool is_any_incomplete= FALSE;
Gtid_event_filter *subfilter;
for (size_t i= 0; i < m_filters.elements; i++)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was based on ::has_finished().
Besides formatting, a difference is that this uses size_t to match the type of m_filters.elements.
The former had a ulong; it’s a potential bug, but I’m yet to review which earliest (supported) version this fix should apply on.

it specifies stderr and has a `fflush(result_file)` step.
*/
Binlog_gtid_state_validator::warn(stderr,
"Did not reach stop position %u-%u-%llu before end of input",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

%u-%u-%llu is used throughout this file (and likely more); but GTID struct fields are actually fixed-size, whereas these formats varies by platform.
I’ll file this mismatch as a separate subissue of #3360 (review) (MDEV-21978).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants