[Direct PR] [release-18.0] Remove sql changes from schemacopy #17050

GuptaManan100
Member

Description

This PR fixes an issue we saw while running v18 in production. With the SQL changes made in #15859, every iteration of syncSideCarDB finds a schema diff and runs it -

Applying DDL for table schemacopy:
ALTER TABLE `schemacopy` MODIFY COLUMN `table_schema` varchar(64) COLLATE utf8mb3_bin NOT NULL, MODIFY COLUMN `table_name` varchar(64) COLLATE utf8mb3_bin NOT NULL, MODIFY COLUMN `column_name` varchar(64) NOT NULL, MODIFY COLUMN `character_set_name` varchar(32), MODIFY COLUMN `collation_name` varchar(32), MODIFY COLUMN `data_type` varchar(64) COLLATE utf8mb3_bin NOT NULL, MODIFY COLUMN `column_key` varchar(3) COLLATE utf8mb3_bin NOT NULL, ALGORITHM = COPY

This is because the schema diff code doesn't handle collations and charsets correctly. I've added a test that asserts this behavior too. Moreover, in PR #15859 we already changed how we read the data from schemacopy, so it doesn't look like the SQL changes there are required. They are being reverted in this PR.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

Deployment Notes

Contributor

vitess-bot bot commented Oct 23, 2024

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes), new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test, enhancement and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

@vitess-bot vitess-bot bot added the NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels Oct 23, 2024
@GuptaManan100
Member Author

I would appreciate reviews from both of you, @shlomi-noach and @arthurschreiber, as the reviewer and creator of the original PR! 🥺

@github-actions github-actions bot added this to the v18.0.8 milestone Oct 23, 2024
@GuptaManan100 GuptaManan100 removed the NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, NeedsIssue, and NeedsBackportReason labels Oct 23, 2024
@deepthi
Member

deepthi commented Oct 23, 2024

I agree with Manan's analysis, but we'll wait for the other reviewers.

@GuptaManan100
Member Author

Without the schema change the detect query fails -

COLLATION 'utf8mb3_bin' is not valid for CHARACTER SET 'utf8mb4' (errno 1253) (sqlstate 42000) during query: SELECT DISTINCT table_name
FROM (
    SELECT table_name COLLATE utf8mb3_bin AS table_name, column_name COLLATE utf8mb3_general_ci AS column_name, ordinal_position, character_set_name COLLATE utf8mb3_general_ci AS character_set_name, collation_name COLLATE utf8mb3_general_ci AS collation_name, data_type COLLATE utf8mb3_bin AS data_type, column_key COLLATE utf8mb3_bin AS column_key
    FROM information_schema.columns
    WHERE table_schema = database()

    UNION ALL

    SELECT table_name COLLATE utf8mb3_bin AS table_name, column_name COLLATE utf8mb3_general_ci AS column_name, ordinal_position, character_set_name COLLATE utf8mb3_general_ci AS character_set_name, collation_name COLLATE utf8mb3_general_ci AS collation_name, data_type COLLATE utf8mb3_bin AS data_type, column_key COLLATE utf8mb3_bin AS column_key
    FROM _vt.schemacopy
    WHERE table_schema = database()
) _inner
GROUP BY table_name, column_name, ordinal_position, character_set_name, collation_name, data_type, column_key
HAVING COUNT(*) = 1
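
For context on errno 1253: MySQL only lets a COLLATE clause name a collation that belongs to the character set of the expression it is applied to, and by MySQL's naming convention a collation name starts with the name of its character set. The error suggests the columns being read are utf8mb4 while the query asks for utf8mb3 collations. Here is a minimal Go sketch of that compatibility rule. The helper collationMatchesCharset is hypothetical, written only for illustration; it is not part of Vitess or MySQL, and it ignores aliases such as utf8 for utf8mb3.

```go
package main

import (
	"fmt"
	"strings"
)

// collationMatchesCharset reports whether a collation belongs to a charset,
// relying on the MySQL convention that a collation name begins with the name
// of its character set (for example utf8mb3_bin belongs to utf8mb3).
// Hypothetical helper: it expects lowercase names and does not resolve aliases.
func collationMatchesCharset(collation, charset string) bool {
	// The equality check covers charsets whose only collation shares
	// their name, such as "binary".
	return collation == charset || strings.HasPrefix(collation, charset+"_")
}

func main() {
	// The failing combination from the error message above.
	fmt.Println(collationMatchesCharset("utf8mb3_bin", "utf8mb4"))
	// The combination that works once the columns are utf8mb3.
	fmt.Println(collationMatchesCharset("utf8mb3_bin", "utf8mb3"))
}
```

The first call models the failing case in the error message; the second models the same COLLATE clause applied to a utf8mb3 column.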

@arthurschreiber
Contributor

arthurschreiber commented Oct 23, 2024

I don't remember the reason behind it, but can't we use utf8mb4 based collations for the query? 🤔

Contributor

@shlomi-noach shlomi-noach left a comment

This is in line with schema_migrations.sql and other tables.

toQueries: []string{
"create table t1 (id int primary key, foo varchar(64) collate utf8mb3_bin)",
},
// This isn't strictly correct. We have a diff even though there shouldn't be one.
Contributor

It is, if we assume the default charset is utf8mb4, right?

@mattlord
Contributor

This is potentially related? #16670

@GuptaManan100
Member Author

I looked into this in more detail, and here is what I found.

In release-17.0, the schemacopy schema we had was -

CREATE TABLE IF NOT EXISTS schemacopy
 (
     `table_schema`       varchar(64)     NOT NULL,
     `table_name`         varchar(64)     NOT NULL,
     `column_name`        varchar(64)     NOT NULL,
     `ordinal_position`   bigint unsigned NOT NULL,
     `character_set_name` varchar(32) DEFAULT NULL,
     `collation_name`     varchar(32) DEFAULT NULL,
     `data_type`          varchar(64)     NOT NULL,
     `column_key`         varchar(3)      NOT NULL,
     PRIMARY KEY (`table_schema`, `table_name`, `ordinal_position`)
 ) ENGINE = InnoDB

and then when we upgrade to release-18.0, the schema becomes -

CREATE TABLE IF NOT EXISTS schemacopy
 (
     `table_schema`          varchar(64) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin         NOT NULL,
     `table_name`            varchar(64) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin         NOT NULL,
     `column_name`           varchar(64) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci  NOT NULL,
     `ordinal_position`      bigint unsigned                                               NOT NULL,
     `character_set_name`    varchar(32) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci  DEFAULT NULL,
     `collation_name`        varchar(32) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci  DEFAULT NULL,
     `data_type`             varchar(64) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin         NOT NULL,
     `column_key`            varchar(3)  CHARACTER SET utf8mb3 COLLATE utf8mb3_bin         NOT NULL,
     PRIMARY KEY (`table_schema`, `table_name`, `ordinal_position`)
 ) ENGINE = InnoDB, CHARACTER SET = utf8mb3

this makes schema diff generate this diff -

ALTER TABLE `schemacopy` MODIFY COLUMN `table_schema` varchar(64) COLLATE utf8mb3_bin NOT NULL, MODIFY COLUMN `table_name` varchar(64) COLLATE utf8mb3_bin NOT NULL, MODIFY COLUMN `column_name` varchar(64) NOT NULL, MODIFY COLUMN `character_set_name` varchar(32), MODIFY COLUMN `collation_name` varchar(32), MODIFY COLUMN `data_type` varchar(64) COLLATE utf8mb3_bin NOT NULL, MODIFY COLUMN `column_key` varchar(3) COLLATE utf8mb3_bin NOT NULL, ALGORITHM = COPY

However, after we apply this, the output of SHOW CREATE TABLE is -

CREATE TABLE `schemacopy` (
  `table_schema` varchar(64) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL,
  `table_name` varchar(64) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL,
  `column_name` varchar(64) NOT NULL,
  `ordinal_position` bigint unsigned NOT NULL,
  `character_set_name` varchar(32) DEFAULT NULL,
  `collation_name` varchar(32) DEFAULT NULL,
  `data_type` varchar(64) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL,
  `column_key` varchar(3) CHARACTER SET utf8mb3 COLLATE utf8mb3_bin NOT NULL,
  PRIMARY KEY (`table_schema`,`table_name`,`ordinal_position`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

this causes schema diff to keep generating the following diff against the desired schema -

ALTER TABLE `schemacopy` MODIFY COLUMN `table_schema` varchar(64) COLLATE utf8mb3_bin NOT NULL, MODIFY COLUMN `table_name` varchar(64) COLLATE utf8mb3_bin NOT NULL, MODIFY COLUMN `column_name` varchar(64) NOT NULL, MODIFY COLUMN `character_set_name` varchar(32), MODIFY COLUMN `collation_name` varchar(32), MODIFY COLUMN `data_type` varchar(64) COLLATE utf8mb3_bin NOT NULL, MODIFY COLUMN `column_key` varchar(3) COLLATE utf8mb3_bin NOT NULL, ALGORITHM = COPY

The problem is the default charset added at the end of the table definition. We could change the schemacopy table to have utf8mb4 collations and charsets to fix this problem on MySQL 8.0, but then MySQL 5.7 would break because the default charset there would be different! I think this was fixed properly by #14930 (I'm not sure, I didn't confirm), but it's only available from release-19.0 onwards.

Since there is no good way of fixing this easily, and it doesn't actually cause any catastrophic issue, we're just gonna not do anything for now.
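
To illustrate why the ALTER never settles, here is a simplified Go model of one of the affected columns, column_name. It is a sketch under stated assumptions, not the Vitess schemadiff implementation: the column type and the effectiveCharset and stillDiffers helpers are hypothetical. The model assumes only what the statements above show: the generated MODIFY COLUMN for column_name carries no charset, and a column defined without a charset takes the table's default. It does not try to explain why the utf8mb3_bin columns also stay in the diff.

```go
package main

import "fmt"

// column records the explicit charset of a column definition.
// An empty charset means the column inherits the table default.
type column struct {
	name    string
	charset string
}

// effectiveCharset resolves the charset a column ends up with, assuming a
// column without an explicit charset takes the table's default charset.
func effectiveCharset(c column, tableDefault string) string {
	if c.charset == "" {
		return tableDefault
	}
	return c.charset
}

// stillDiffers reports whether the live column still has a different charset
// from the desired one after the generated MODIFY COLUMN has been applied.
// modify is the column definition as written in the ALTER statement.
func stillDiffers(modify column, liveDefault string, desired column, desiredDefault string) bool {
	return effectiveCharset(modify, liveDefault) != effectiveCharset(desired, desiredDefault)
}

func main() {
	// Desired schema: utf8mb3 column in a table whose default is utf8mb3.
	desired := column{name: "column_name", charset: "utf8mb3"}
	// Generated ALTER: MODIFY COLUMN `column_name` varchar(64) NOT NULL (no charset).
	modify := column{name: "column_name"}

	// The live table keeps DEFAULT CHARSET=utf8mb4, so the ALTER changes nothing
	// and the next sync sees the same difference again.
	fmt.Println(stillDiffers(modify, "utf8mb4", desired, "utf8mb3"))
	// If the live table default were utf8mb3, the same ALTER would converge.
	fmt.Println(stillDiffers(modify, "utf8mb3", desired, "utf8mb3"))
}
```

In this model the only input that changes the outcome is the live table's default charset, which is consistent with the observation above that the default charset is the root of the problem.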
