Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

splitSql too aggressive in Redshift for stored procedure definition #381

Open
MPagel opened this issue Dec 17, 2024 · 4 comments
Open

splitSql too aggressive in Redshift for stored procedure definition #381

MPagel opened this issue Dec 17, 2024 · 4 comments

Comments

@MPagel
Copy link

MPagel commented Dec 17, 2024

splitSql does not consider $$ or $LABEL$ to constitute valid code block toggles that are used in plpgsql (redshift and postgress).

e.g.

CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$
DECLARE
	i int DEFAULT 1;
BEGIN
	CREATE TABLE IF NOT EXISTS testtab (
			id INT NOT NULL DEFAULT i
			,data varchar(50) DEFAULT NULL
			)
	DISTSTYLE ALL;
	INSERT INTO testtab(id, data) VALUES(i,$1);
END; $$ LANGUAGE plpgsql;

is a valid single SQL statement

however splitSQL attempts to break the stored procedure into three separate statements
splitSql("CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nDECLARE\n\ti int DEFAULT 1;\nBEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(i,$1);\nEND; $$ LANGUAGE plpgsql;\n")

[1] "CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nDECLARE\n\ti int DEFAULT 1"
[2] "BEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(i,$1);\nEND;"
[3] "$$ LANGUAGE plpgsql"

If not using a DECLARE section, you get 2 total statements
[1] "CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nBEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(1,$1);\nEND;"
[2] "$$ LANGUAGE plpgsql"

Then executing these leads to

com.amazon.redshift.util.RedshiftException: Unterminated dollar quote started at position nn in SQL CREATE OR REPLACE PROCEDURE ... i int DEFAULT 1;. Expected terminating $$

To test that it the original statement at top is valid, you can execute (using DBeaver, OmniDB, etc) that statement followed by

truncate testtab;
CALL testproc('abc');
select * from testtab;

id "data"
1 abc

insert into testtab values (2,'mno');
insert into testtab(data) values ('def');
insert into testtab(id) values (7);
select * from testtab;

id "data"
1 abc
2 mno
1 def
7 [NULL]

possibly relevant refs:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_PROCEDURE.html
https://www.postgresql.org/docs/current/sql-createprocedure.html

@MPagel
Copy link
Author

MPagel commented Dec 17, 2024

In order to handle $$ (but not $LABEL$) properly, I think boolean doubleDollar = false; String doubleDollarText = "$$"; needs to be inserted before

for (cursor = start; cursor < tokens.size(); cursor++) {

then add an else block into the if statement that follows (L42)

else if (token.text.equals(doubleDollarText)) {
  if (doubleDollar) { // closing an open block
    lastPop = nestStack.pop();
  } else { // open a $$ block
    nestStack.push(token.text);
  }
  doubleDollar = !(doubleDollar); //toggle
}

This has not been tested, and I don't know if the BEGIN/CASE and END logic would need to be altered.

@MPagel
Copy link
Author

MPagel commented Dec 18, 2024

my workaround for now is to use a modified version of N3C's parse_sql function to do the splitting of one query from the next, rather than using DatabaseConnector::executeSQL. Then I overrode executeSQL by adding a noSplit with default value FALSE to the function parameters and added a block between lines 416 and 417
https://github.com/OHDSI/DatabaseConnector/blob/46eb774beae80714efa7a6ad39558c478ceb597c/R/Sql.R#L416

  if (noSplit) { 
    sqlStatements <- sql 
  } else {
    sqlStatements <- SqlRender::splitSql(sql)
  }

@schuemie
Copy link
Member

The main purpose of SqlRender is to translate from one SQL dialect (OHDSISql) to others. splitSql() is mostly a convenience function for afterwards. I don't think the RedShift SQL you're referring to is the result of a translation?

@MPagel
Copy link
Author

MPagel commented Dec 19, 2024

The code that I'm trying to wrap into a procedure/function is from N3C's phenotype scripts, which I believe was the result of code translated from T-SQL.

Further, N3C uses the OHDSI DatabaseConnector package to execute the queries by way of executeSql. executeSql calls SqlRender::splitSql, which is why I posted here. I will see if I can bypass the call out to splitSql by calling lowLevelExecuteSql instead of executeSql, seeing as I've already done the SQL splitting.

I will also open a pull request over at DatabaseConnector to allow override of the call out to splitSql within (the non-low-level) executeSql and will reference this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants