Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unescape query from EMR spark submit parameter #306

Merged
merged 2 commits into from
Apr 17, 2024

Conversation

seankao-az
Copy link
Collaborator

@seankao-az seankao-az commented Apr 10, 2024

Description

In a previous PR in sql, query is escaped in spark submit parameter. This PR unescapes the query on the receiving end, retrieving the original query. This affects only the query in SparkConf.

End to end test is performed as follows:
Query:

CREATE INDEX cv
ON mys3.default.http_logs (`status`, `day`, `size`, `request`)
WITH (
  auto_refresh = true,
  refresh_interval="5 Minutes",
  checkpoint_location = 's3://test',
  index_settings = "{\"number_of_shards\":1,\"number_of_replicas\":1}"
)

Spark submit parameter for the EMR job (query escaped and wrapped with quotes):

--conf spark.flint.job.query="CREATE INDEX cv ON mys3.default.http_logs (`status`, `day`, `size`, `request`) WITH (auto_refresh = true, refresh_interval=\"5 Minutes\", checkpoint_location = 's3://test',index_settings = \"{\\\"number_of_shards\\\":1,\\\"number_of_replicas\\\":1}\")"

Query received on spark before this PR fix (query still escaped and result in syntax error):

CREATE INDEX cv ON mys3.default.http_logs (`status`, `day`, `size`, `request`) WITH (auto_refresh = true, refresh_interval=\"5 Minutes\", checkpoint_location = 's3://test',index_settings = \"{\\\"number_of_shards\\\":1,\\\"number_of_replicas\\\":1}\")

Query received on spark after this PR fix (query unescaped):

CREATE INDEX cv ON mys3.default.http_logs (`status`, `day`, `size`, `request`) WITH (auto_refresh = true, refresh_interval="5 Minutes", checkpoint_location = 's3://test',index_settings = "{\"number_of_shards\":1,\"number_of_replicas\":1}")

Issues Resolved

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@seankao-az seankao-az changed the title Bug Fix: unescape query from EMR spark submit parameter Unescape query from EMR spark submit parameter Apr 10, 2024
@seankao-az seankao-az self-assigned this Apr 10, 2024
@seankao-az seankao-az added bug Something isn't working 0.4 labels Apr 10, 2024
Signed-off-by: Sean Kao <[email protected]>
@noCharger
Copy link
Collaborator

let's include this in 0.3 after testing

@seankao-az seankao-az added 0.3 and removed 0.4 labels Apr 11, 2024
@dai-chen
Copy link
Collaborator

let's include this in 0.3 after testing

Because we've started working on 0.4 task, is this the last PR for 0.3? If not, I think we'd better cut 0.3 branch now?

@@ -371,6 +380,7 @@ trait FlintJobExecutor {
val startTime = System.currentTimeMillis()
// we have to set job group in the same thread that started the query according to spark doc
spark.sparkContext.setJobGroup(queryId, "Job group for " + queryId, interruptOnCancel = true)
logInfo(s"Executing query: $query")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may require sensitive info anonymized like: https://github.com/opensearch-project/sql/blob/2649200e065dff48282dce438ceb0ee5ac39054e/sql/src/main/java/org/opensearch/sql/sql/antlr/AnonymizerListener.java#L34. Probably we can do this later since this is also helpful for analyzing query pattern.

Copy link
Collaborator

@dai-chen dai-chen Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seankao-az @noCharger could you address this in other 0.3 PR if any? Probably remove it for now?

@noCharger
Copy link
Collaborator

let's include this in 0.3 after testing

Because we've started working on 0.4 task, is this the last PR for 0.3? If not, I think we'd better cut 0.3 branch now?

Can we include this #303? I will add one more commit later today.

@dai-chen
Copy link
Collaborator

let's include this in 0.3 after testing

Because we've started working on 0.4 task, is this the last PR for 0.3? If not, I think we'd better cut 0.3 branch now?

Can we include this #303? I will add one more commit later today.

Sure. Will block other 0.4 PRs for now.

@noCharger
Copy link
Collaborator

let's include this in 0.3 after testing

Because we've started working on 0.4 task, is this the last PR for 0.3? If not, I think we'd better cut 0.3 branch now?

Can we include this #303? I will add one more commit later today.

Sure. Will block other 0.4 PRs for now.

We can cut the branch and backport too.

@noCharger noCharger merged commit b69f38b into opensearch-project:main Apr 17, 2024
5 checks passed
@noCharger
Copy link
Collaborator

@dai-chen cut 0.3 branch to unblock 0.4.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
0.3 bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants