Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: (#3387) use collation to determine if string type is compatible #3400

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

DrewMcArthur
Copy link

@DrewMcArthur DrewMcArthur commented Aug 2, 2024

Does your PR solve an issue?

fixes #3387
fixes #3390

todo

Copy link
Author

@DrewMcArthur DrewMcArthur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abonander @alu I'd love your feedback on this, whenever you get a chance 😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these changes are all a result of checking against SELECT ID, COLLATION_NAME FROM INFORMATION_SCHEMA.COLLATIONS ORDER BY ID; locally - let me know if any of them should be thrown out.

  • There was an additional ~100 collations on my machine, which I added and then commented out, rather than implementing all the other pieces necessary. let me know if you'd like me to finish that out, I'd be happy to, if that's something we want. otherwise, we can remove the comments of those extra collations.
  • uft8_* was displayed as utf8mb3_* on my machine. let me know if those changes should be reverted
  • also added impl TryFrom<u16> for Collation for use here

Comment on lines 65 to +69
// https://dev.mysql.com/doc/internals/en/com-query-response.html#column-type
// dead link ^
// https://dev.mysql.com/doc/dev/mysql-server/9.0.0/page_protocol_com_query_response_text_resultset_column_definition.html
// the "type of the column as defined in enum_field_types"
// https://dev.mysql.com/doc/dev/mysql-server/9.0.0/field__types_8h.html
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to suggest which of these links we should keep - the original link here was dead when i tried it.

@DrewMcArthur DrewMcArthur changed the title [3387] use collation to determine if string type is compatible fix: (3387) use collation to determine if string type is compatible Aug 2, 2024
@DrewMcArthur DrewMcArthur changed the title fix: (3387) use collation to determine if string type is compatible fix: (#3387) use collation to determine if string type is compatible Aug 2, 2024
@@ -12,6 +13,7 @@ impl Type<MySql> for str {
MySqlTypeInfo {
r#type: ColumnType::VarString, // VARCHAR
flags: ColumnFlags::empty(),
collation: Collation::utf8mb3_bin,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DrewMcArthur

I think using utf8mb3_bin regardless of session state is misleading.
In my opinion, the only difference needed in this context is whether it is binary or string.
How about using an enum like this one?

CollationType {
    Binary,
    String
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was one of the pieces I wasn't sure about, what the default should be for some of these. I think you're right, _bin probably isn't right & I'll update this to utf8mb3_general_ci if you think that'd be correct.

As for changing the type of MySqlTypeInfo's collation field to an enum, I think the source of the issue on #3387 is that there are three cases being compressed into the ColumnFlag::binary flag:

  • a VARCHAR has a collation like utf8_general_ci, and the ColumnFlag::binary is false, so this is compatible with a Rust String
  • a BLOB has a collation of binary, ColumnFlag::binary is true, so it's not compatible with a Rust String, which is also correct
  • a VARCHAR has a collation of *_bin, so ColumnFlag::binary is true, so it's marked incompatible when we should be able to decode this into a String. This is the case that needs to be fixed.

This could be handled with that enum, if the value is String for both VARCHAR cases, but I feel like that could also be confusing to have a CollationType::String (as opposed to binary) while simultaneously having ColumnFlag::binary = true.

I think my preference is to keep track of each unique collation here, since it's only a byte, but I'll defer to you and @abonander to decide what makes more sense.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
collation: Collation::utf8mb3_bin,
collation: Collation::utf8mb3_general_ci,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ @alu (sorry, I'm terrible at remembering to directly @ people in replies here)

@eirnym
Copy link

eirnym commented Oct 9, 2024

@alu This fixes only for MySQL, not for SQLite one. I understand that original question is for MySQL, but SQLite and presumably PostgreSQL are also affected.

Please, tell me if i need to create a separate issue for SQLite.

@alvico
Copy link

alvico commented Oct 17, 2024

can you folks merge this?

@DrewMcArthur
Copy link
Author

@eirnym oh good catch - my bandwidth is pretty low right now, so if you wanted to open a PR into my branch with those changes, or even just the tests for them that'd be awesome!

@alvico I'm not sure their release schedule, but you can target this branch in your Cargo.toml if you'd like to have the fix locally.

@alvico
Copy link

alvico commented Nov 11, 2024

any news on merging this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants