-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Online DDL: do not CONVERT column charset when migration does not modify column's charset or collation #16594
Conversation
…ify column's charset or collation Signed-off-by: Shlomi Noach <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #16594 +/- ##
==========================================
- Coverage 68.85% 68.83% -0.03%
==========================================
Files 1557 1557
Lines 199891 199893 +2
==========================================
- Hits 137644 137596 -48
- Misses 62247 62297 +50 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! ❤️
if trivialCharset(fromCollation) && trivialCharset(toCollation) && targetCol.Type() != "json" { | ||
sb.WriteString(escapeName(name)) | ||
} else if fromCollation == toCollation && targetCol.Type() != "json" { | ||
sb.WriteString(escapeName(name)) | ||
} else { | ||
v.convertCharset[targetName] = &binlogdatapb.CharsetConversion{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I think for the next step in this work we can see if the from/to collations are coercible and only use CONVERT
if they are not (using CONVERT(coo, using target_charset)
in the filter (then no CONVERT
is needed on the vplayer
side, which I think would resolve #16023).
This implicit coercion is already happening e.g. when we specify the LastPK values in the subsequent copy phase cycles where we add those predicates to the base filter as literals — the passed literals will be using the character_set_connection
and collation_connection
(which is altered by set names
) and is compared to the column value which is using the collation of the column. Every character set is a subset of unicode (utf8mb4) so when that is the TO then no CONVERT
is necessary. When we're moving the other direction then I'm actually not sure if we should use convert or not as w/o it the migration could fail (if we try to insert invalid chars on the target, at least assuming STRICT mode), but with it you lose data (e.g. non-latin1 chars would be lost in the CONVERT
output ):
mysql> select "I'll have a 🍺" as msg;
+------------------+
| msg |
+------------------+
| I'll have a 🍺 |
+------------------+
1 row in set (0.00 sec)
mysql> select convert("I'll have a 🍺" using latin1) as msg;
+---------------+
| msg |
+---------------+
| I'll have a ? |
+---------------+
1 row in set (0.00 sec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we're moving the other direction then I'm actually not sure if we should use convert or not as w/o it the migration could fail
There's an explicit test that validate that the migration fails. See fail-invalid-character
.
The changes in this PR were implicitly merged by #16597 |
Description
Partial fix to #16023 (to be followed up).
Per the discussion in #16023 (comment) and #16023 (comment), adding
CONVERT(...)
to the column negates the ability to use the index over that column, and if we try and outsmart theORDER BY
clause we can end up with incorrect ordering due to collation differences.In this PR we avoid adding
CONVERT()
to columns whose charset/collations have not been changed in the migration. For the simplest example, if we have:The column
text_id
is not modified, and more specifically its charset and collation remain the same. In this scenario, there is no need to addCONVERT
. There used to be such need, in the past, from way before Vitess supported non-utf8 charsets&collations.All tests remain the same (and one added). We expect everything to pass.
Related Issue(s)
Checklist
Deployment Notes