Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Online DDL: do not CONVERT column charset when migration does not modify column's charset or collation #16594

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented Aug 14, 2024

Description

Partial fix to #16023 (to be followed up).

Per the discussion in #16023 (comment) and #16023 (comment), adding CONVERT(...) to the column negates the ability to use the index over that column, and if we try and outsmart the ORDER BY clause we can end up with incorrect ordering due to collation differences.

In this PR we avoid adding CONVERT() to columns whose charset/collations have not been changed in the migration. For the simplest example, if we have:

create table t (
  text_id varchar(64) charset latin1 not null,
  col1 int,
  primary key (text_id)
);

alter table t add column col2 int;

The column text_id is not modified, and more specifically its charset and collation remain the same. In this scenario, there is no need to add CONVERT. There used to be such need, in the past, from way before Vitess supported non-utf8 charsets&collations.

All tests remain the same (and one added). We expect everything to pass.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

…ify column's charset or collation

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach added Type: Bug Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) labels Aug 14, 2024
@github-actions github-actions bot added this to the v21.0.0 milestone Aug 14, 2024
Copy link

codecov bot commented Aug 14, 2024

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 68.83%. Comparing base (cc68dd5) to head (63d4ae3).
Report is 4 commits behind head on main.

Files Patch % Lines
go/vt/vttablet/onlineddl/vrepl.go 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #16594      +/-   ##
==========================================
- Coverage   68.85%   68.83%   -0.03%     
==========================================
  Files        1557     1557              
  Lines      199891   199893       +2     
==========================================
- Hits       137644   137596      -48     
- Misses      62247    62297      +50     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@shlomi-noach shlomi-noach requested review from dbussink and a team August 14, 2024 07:33
Copy link
Contributor

@mattlord mattlord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! ❤️

if trivialCharset(fromCollation) && trivialCharset(toCollation) && targetCol.Type() != "json" {
sb.WriteString(escapeName(name))
} else if fromCollation == toCollation && targetCol.Type() != "json" {
sb.WriteString(escapeName(name))
} else {
v.convertCharset[targetName] = &binlogdatapb.CharsetConversion{
Copy link
Contributor

@mattlord mattlord Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think for the next step in this work we can see if the from/to collations are coercible and only use CONVERT if they are not (using CONVERT(coo, using target_charset) in the filter (then no CONVERT is needed on the vplayer side, which I think would resolve #16023).

This implicit coercion is already happening e.g. when we specify the LastPK values in the subsequent copy phase cycles where we add those predicates to the base filter as literals — the passed literals will be using the character_set_connection and collation_connection (which is altered by set names) and is compared to the column value which is using the collation of the column. Every character set is a subset of unicode (utf8mb4) so when that is the TO then no CONVERT is necessary. When we're moving the other direction then I'm actually not sure if we should use convert or not as w/o it the migration could fail (if we try to insert invalid chars on the target, at least assuming STRICT mode), but with it you lose data (e.g. non-latin1 chars would be lost in the CONVERT output ):

mysql> select "I'll have a 🍺" as msg;
+------------------+
| msg              |
+------------------+
| I'll have a 🍺     |
+------------------+
1 row in set (0.00 sec)

mysql> select convert("I'll have a 🍺" using latin1) as msg;
+---------------+
| msg           |
+---------------+
| I'll have a ? |
+---------------+
1 row in set (0.00 sec)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we're moving the other direction then I'm actually not sure if we should use convert or not as w/o it the migration could fail

There's an explicit test that validate that the migration fails. See fail-invalid-character.

@shlomi-noach
Copy link
Contributor Author

The changes in this PR were implicitly merged by #16597

@shlomi-noach shlomi-noach deleted the online-ddl-no-convert-identical-collation branch August 15, 2024 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Online DDL Online DDL (vitess/native/gh-ost/pt-osc) Type: Bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants