Fix MySQL parsing of GRANT, REVOKE, and CREATE VIEW #1538

Open

mvzink wants to merge 12 commits into main
Conversation

mvzink (Contributor) commented Nov 20, 2024

  • Support MySQL-style 'user'@'host' grantee syntax.
  • Support wildcards on grant objects: *.*.
  • Make CASCADE/RESTRICT in REVOKE statements optional, since MySQL doesn't accept it.
  • Support optional MySQL parameters for CREATE VIEW: algorithm, definer (requires the user@host syntax), and security context.

The tokenizer change is needed for parsing MySQL-style `'user'@'host'` grantee syntax.

As far as I can tell, no dialect allows quotes or backticks as part of
an identifier (regardless of whether it starts with `@`) without other
specific syntax (e.g. nested in another quote style and thus not
starting with `@`), so this shouldn't adversely affect non-MySQL
dialects.
Introduces a new `Grantee` enum to differentiate bare identifiers (every
other database: `root`) from user/host pairs (MySQL: `'root'@'%'`).

While we're here, make the CASCADE/RESTRICT syntax for REVOKE optional
since Postgres doesn't require it and MySQL doesn't allow it.

Add support for MySQL wildcard object syntax: `GRANT ALL ON *.* ...`
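
For illustration, the kinds of statements this should make parseable can be round-tripped through the crate's public API. This is a sketch; the example statements below are illustrative and may not match the PR's test cases exactly:

use sqlparser::dialect::MySqlDialect;
use sqlparser::parser::Parser;

fn main() {
    let dialect = MySqlDialect {};
    let examples = [
        // MySQL-style grantee plus the *.* wildcard grant object.
        "GRANT SELECT, INSERT ON *.* TO 'jeffrey'@'localhost'",
        // REVOKE with no trailing CASCADE/RESTRICT, which MySQL doesn't accept.
        "REVOKE SELECT ON db.* FROM 'jeffrey'@'localhost'",
        // CREATE VIEW with the optional MySQL parameters (algorithm, definer, security).
        "CREATE ALGORITHM = MERGE DEFINER = 'admin'@'%' SQL SECURITY DEFINER VIEW v AS SELECT 1",
    ];
    for sql in examples {
        let statements = Parser::parse_sql(&dialect, sql).expect("should parse");
        // Round-trip back to SQL text via Display.
        println!("{}", statements[0]);
    }
}
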
mvzink (Contributor, Author) commented Nov 20, 2024

Would appreciate feedback on the naming of the MySQL-specific params types (would a generic/non-branded name be better?) and on the API change for wildcard support (I didn't want to change all 107 parse_object_name call sites for an option that is so niche, but happy to be convinced otherwise).

Comment on lines +1205 to +1207
Some('\'') => Ok(Some(Token::AtSign)),
Some('\"') => Ok(Some(Token::AtSign)),
Some('`') => Ok(Some(Token::AtSign)),
Contributor

The changes feel unexpected. Taking a quick look, it seems it's rather a bug that the MySQL dialect is currently configured to accept @ as an identifier start? Their docs seem to suggest otherwise.

Contributor Author

I thought so at first too, but variables in MySQL start with @ and I don't think there's a way to differentiate them during tokenization. We would want to tokenize @foo as a single Token::Word, but @'foo' (in my understanding) as [Token::AtSign, Token::SingleQuotedString]. In this check during tokenization, we aren't differentiating between "schema names etc." and "variable names", just "identifiers". Maybe there is something more correct to do here, like checking for self.is_identifier_continuation(sch) and ruling out ' there. But my approach of just adding cases for quotes is based on the fact that as far as I can tell, no dialect allows quote marks within an identifier unless it's in an outer quotation, which doesn't get tokenized by this path.

I think an approach like self.is_identifier_continuation(sch) would effectively be the same but could allow customization per dialect, but it has another pitfall: say in addition to quote marks, a dialect doesn't allow _ in an identifier. Then should we fail to tokenize @_ with an error, or tokenize them separately as [@, _]? If the former, we would then need another special case for quote marks, and the latter seems wrong to me.

One other example in support of tokenizing like [Token::AtSign, Token::SingleQuotedString], just in case it's not clear why I think that's right: in Postgres, @ is absolute value, and it will implicitly cast the operand to a numeric type, so that select @'1'; (no space after @) treats '1' as a string and casts it to double.

With all that in mind, I still think my current approach is the simplest correct one I can think of; let me know what you think!
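
To make the tokenization described above concrete, here is a minimal sketch using the crate's public tokenizer API (the assertions reflect the intended behavior with this change applied; they are not the PR's actual test cases):

use sqlparser::dialect::MySqlDialect;
use sqlparser::tokenizer::{Token, Tokenizer};

fn main() {
    let dialect = MySqlDialect {};

    // A quote right after '@' ends the token, so the quoted part becomes its
    // own string token: [AtSign, SingleQuotedString("foo")].
    let tokens = Tokenizer::new(&dialect, "@'foo'").tokenize().unwrap();
    assert!(matches!(tokens[0], Token::AtSign));
    assert!(matches!(tokens[1], Token::SingleQuotedString(_)));

    // A bare '@foo' (a MySQL user variable) still tokenizes as a single word.
    let tokens = Tokenizer::new(&dialect, "@foo").tokenize().unwrap();
    assert!(matches!(tokens[0], Token::Word(_)));
}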

Contributor

Ah I see, that makes sense to me, and the current approach sounds reasonable! Can we add a comment mentioning the example use case? I think that would be useful since the logic/tradeoff wouldn't be obvious otherwise. Also, can we extend the tokenizer test case to cover the other two quote characters supported by this path, " and '`'?

@@ -8353,20 +8409,40 @@ impl<'a> Parser<'a> {
}
}

/// Parse a possibly qualified, possibly quoted identifier, optionally allowing for wildcards,
/// e.g. *, `foo`.*, or "foo"."bar"
pub fn parse_object_name_with_wildcards(
Contributor

The method sounds like what parse_wildcard_expr does, minus the return type. Can we reuse that functionality instead, e.g. by moving that logic out into its own method so that parse_wildcard_expr and other callers like the GRANT parsing can call it directly?

Contributor Author

Still having a think about the best way to do this; posting the other updates for further feedback.

Comment on lines +7483 to +7497
let parts = [
self.algorithm
.as_ref()
.map(|algorithm| format!("ALGORITHM = {algorithm}")),
self.definer
.as_ref()
.map(|definer| format!("DEFINER = {definer}")),
self.security
.as_ref()
.map(|security| format!("SQL SECURITY {security}")),
]
.into_iter()
.flatten()
.collect::<Vec<_>>();
display_separated(&parts, " ").fmt(f)
Contributor

Can we write these out explicitly, e.g.

if let Some(algorithm) = self.algorithm { write!() }

I think that potentially makes it easier to spot the final output and to extend it in the future.
Also, ideally we can use an exhaustive check with

let CreateviewParams { algorithm, definer, security } = self

so that the user is automatically guided to this function whenever the representation changes
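
For illustration, a sketch of that explicit style (the struct and field names here are stand-ins based on the snippet above, with String fields instead of the PR's actual enums):

use std::fmt;

// Hypothetical stand-in for the PR's params struct.
struct CreateViewParams {
    algorithm: Option<String>,
    definer: Option<String>,
    security: Option<String>,
}

impl fmt::Display for CreateViewParams {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Exhaustive destructuring: adding a field to the struct stops this
        // impl from compiling until the new field is printed or ignored here.
        let CreateViewParams { algorithm, definer, security } = self;
        let mut first = true;
        // Emit a single space between parts, but not before the first one.
        let sep = |f: &mut fmt::Formatter<'_>, first: &mut bool| -> fmt::Result {
            if !*first {
                write!(f, " ")?;
            }
            *first = false;
            Ok(())
        };
        if let Some(algorithm) = algorithm {
            sep(f, &mut first)?;
            write!(f, "ALGORITHM = {algorithm}")?;
        }
        if let Some(definer) = definer {
            sep(f, &mut first)?;
            write!(f, "DEFINER = {definer}")?;
        }
        if let Some(security) = security {
            sep(f, &mut first)?;
            write!(f, "SQL SECURITY {security}")?;
        }
        Ok(())
    }
}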
