splitSql too aggressive in Redshift for stored procedure definition #381

MPagel · 2024-12-17T17:24:09Z

splitSql does not consider $$ or $LABEL$ to constitute valid code block toggles that are used in plpgsql (redshift and postgress).

e.g.

CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$
DECLARE
	i int DEFAULT 1;
BEGIN
	CREATE TABLE IF NOT EXISTS testtab (
			id INT NOT NULL DEFAULT i
			,data varchar(50) DEFAULT NULL
			)
	DISTSTYLE ALL;
	INSERT INTO testtab(id, data) VALUES(i,$1);
END; $$ LANGUAGE plpgsql;

is a valid single SQL statement

however splitSQL attempts to break the stored procedure into three separate statements
splitSql("CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nDECLARE\n\ti int DEFAULT 1;\nBEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(i,$1);\nEND; $$ LANGUAGE plpgsql;\n")

[1] "CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nDECLARE\n\ti int DEFAULT 1"
[2] "BEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(i,$1);\nEND;"
[3] "$$ LANGUAGE plpgsql"

If not using a DECLARE section, you get 2 total statements
[1] "CREATE OR REPLACE PROCEDURE testproc (varchar(50)) AS $$\nBEGIN\n\tCREATE TABLE IF NOT EXISTS testtab (\n\t\t\tid INT NOT NULL DEFAULT i\n\t\t\t,data varchar(50) DEFAULT NULL\n\t\t\t)\n\tDISTSTYLE ALL;\n\tINSERT INTO testtab(id, data) VALUES(1,$1);\nEND;"
[2] "$$ LANGUAGE plpgsql"

Then executing these leads to

com.amazon.redshift.util.RedshiftException: Unterminated dollar quote started at position nn in SQL CREATE OR REPLACE PROCEDURE ... i int DEFAULT 1;. Expected terminating $$

To test that it the original statement at top is valid, you can execute (using DBeaver, OmniDB, etc) that statement followed by

truncate testtab;
CALL testproc('abc');
select * from testtab;

id "data"
1 abc

insert into testtab values (2,'mno');
insert into testtab(data) values ('def');
insert into testtab(id) values (7);
select * from testtab;

id "data"
1 abc
2 mno
1 def
7 [NULL]

possibly relevant refs:
https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_PROCEDURE.html
https://www.postgresql.org/docs/current/sql-createprocedure.html

The text was updated successfully, but these errors were encountered:

MPagel · 2024-12-17T17:47:07Z

In order to handle $$ (but not $LABEL$ ) properly, I think boolean doubleDollar = false; String doubleDollarText = "$$"; needs to be inserted before

SqlRender/java/org/ohdsi/sql/SqlSplit.java

Line 40 in a0dad78

for (cursor = start; cursor < tokens.size(); cursor++) {

then add an else block into the if statement that follows (L42)

else if (token.text.equals(doubleDollarText)) {
  if (doubleDollar) { // closing an open block
    lastPop = nestStack.pop();
  } else { // open a $$ block
    nestStack.push(token.text);
  }
  doubleDollar = !(doubleDollar); //toggle
}

This has not been tested, and I don't know if the BEGIN/CASE and END logic would need to be altered.

MPagel · 2024-12-18T19:04:29Z

my workaround for now is to use a modified version of N3C's parse_sql function to do the splitting of one query from the next, rather than using DatabaseConnector::executeSQL. Then I overrode executeSQL by adding a noSplit with default value FALSE to the function parameters and added a block between lines 416 and 417
https://github.com/OHDSI/DatabaseConnector/blob/46eb774beae80714efa7a6ad39558c478ceb597c/R/Sql.R#L416

  if (noSplit) { 
    sqlStatements <- sql 
  } else {
    sqlStatements <- SqlRender::splitSql(sql)
  }

schuemie · 2024-12-19T15:40:37Z

The main purpose of SqlRender is to translate from one SQL dialect (OHDSISql) to others. splitSql() is mostly a convenience function for afterwards. I don't think the RedShift SQL you're referring to is the result of a translation?

MPagel · 2024-12-19T16:48:06Z

The code that I'm trying to wrap into a procedure/function is from N3C's phenotype scripts, which I believe was the result of code translated from T-SQL.

Further, N3C uses the OHDSI DatabaseConnector package to execute the queries by way of executeSql. executeSql calls SqlRender::splitSql, which is why I posted here. I will see if I can bypass the call out to splitSql by calling lowLevelExecuteSql instead of executeSql, seeing as I've already done the SQL splitting.

I will also open a pull request over at DatabaseConnector to allow override of the call out to splitSql within (the non-low-level) executeSql and will reference this issue.

MPagel mentioned this issue Dec 19, 2024

add noSplit option to executeSql OHDSI/DatabaseConnector#298

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

splitSql too aggressive in Redshift for stored procedure definition #381

splitSql too aggressive in Redshift for stored procedure definition #381

MPagel commented Dec 17, 2024 •

edited

Loading

MPagel commented Dec 17, 2024 •

edited

Loading

MPagel commented Dec 18, 2024 •

edited

Loading

schuemie commented Dec 19, 2024

MPagel commented Dec 19, 2024 •

edited

Loading

splitSql too aggressive in Redshift for stored procedure definition #381

splitSql too aggressive in Redshift for stored procedure definition #381

Comments

MPagel commented Dec 17, 2024 • edited Loading

MPagel commented Dec 17, 2024 • edited Loading

MPagel commented Dec 18, 2024 • edited Loading

schuemie commented Dec 19, 2024

MPagel commented Dec 19, 2024 • edited Loading

MPagel commented Dec 17, 2024 •

edited

Loading

MPagel commented Dec 17, 2024 •

edited

Loading

MPagel commented Dec 18, 2024 •

edited

Loading

MPagel commented Dec 19, 2024 •

edited

Loading