Change emr job names based on the query type #2543

Merged: 1 commit merged into opensearch-project:main on Mar 11, 2024


vamsimanohar (Member) commented Mar 7, 2024

Description

  • Change EMR job names to the following:
    * Batch Query: clustername:batch
    * Interactive Query: clustername:interactive:sessionId
    * Index Streaming Query: clustername:streaming:flint_my_glue_default_http_logs_elb_and_requesturi_index
  • Truncate the job name if it is longer than 255 characters.
  • Refactor SparkQueryDispatcherTest.java and CreateAsyncQueryRequestTest.java.
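The naming scheme above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the class `EmrJobNames`, its methods, and the `MAX_JOB_NAME_LENGTH` constant are hypothetical; only the three name formats and the 255-character limit come from the description.

```java
/**
 * Hypothetical helper that builds EMR job names in the formats listed in
 * the PR description. Not the code merged in this PR.
 */
final class EmrJobNames {
  // EMR job name length limit, as stated in the PR discussion.
  static final int MAX_JOB_NAME_LENGTH = 255;

  private EmrJobNames() {}

  /** Batch query: clustername:batch */
  static String batch(String clusterName) {
    return truncate(clusterName + ":batch");
  }

  /** Interactive query: clustername:interactive:sessionId */
  static String interactive(String clusterName, String sessionId) {
    return truncate(clusterName + ":interactive:" + sessionId);
  }

  /** Index streaming query: clustername:streaming:indexName */
  static String streaming(String clusterName, String indexName) {
    return truncate(clusterName + ":streaming:" + indexName);
  }

  /** Keeps the first MAX_JOB_NAME_LENGTH characters and drops the rest. */
  static String truncate(String jobName) {
    return jobName.length() > MAX_JOB_NAME_LENGTH
        ? jobName.substring(0, MAX_JOB_NAME_LENGTH)
        : jobName;
  }

  public static void main(String[] args) {
    // "my-cluster" and "abc123" are illustrative values only.
    System.out.println(batch("my-cluster"));
    System.out.println(interactive("my-cluster", "abc123"));
    System.out.println(
        streaming("my-cluster", "flint_my_glue_default_http_logs_elb_and_requesturi_index"));
  }
}
```

Because the cluster name and query type always lead the name in these formats, truncating from the end only shortens the trailing session ID or index name.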

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

codecov bot commented Mar 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.42%. Comparing base (f57d686) to head (71333b3).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##               main    #2543   +/-   ##
=========================================
  Coverage     95.41%   95.42%           
  Complexity     5027     5027           
=========================================
  Files           483      483           
  Lines         14016    14020    +4     
  Branches        944      944           
=========================================
+ Hits          13374    13378    +4     
  Misses          621      621           
  Partials         21       21           
Flag         Coverage Δ
sql-engine   95.42% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


penghuo (Collaborator) commented Mar 8, 2024

Could we also add the index name to the job name for batch queries? For example:

Batch Query: clustername:flint_my_glue_default_http_logs_elb_and_requesturi_index:batch
Streaming Query: clustername:flint_my_glue_default_http_logs_elb_and_requesturi_index:streaming
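The alternative suggested here puts the index name before the query type. A hypothetical sketch of that ordering follows; the class `IndexedJobNames` and its `build` method are illustrative only, and this is the reviewer's proposal rather than the format listed in the PR description.

```java
/**
 * Hypothetical sketch of the ordering proposed in review:
 * clustername:indexName:queryType. Not the code merged in this PR.
 */
final class IndexedJobNames {
  // EMR job name length limit, as stated in the PR discussion.
  static final int MAX_JOB_NAME_LENGTH = 255;

  private IndexedJobNames() {}

  static String build(String clusterName, String indexName, String queryType) {
    String name = String.join(":", clusterName, indexName, queryType);
    // Naive tail truncation: with a very long index name this also cuts
    // off the trailing query type.
    return name.length() > MAX_JOB_NAME_LENGTH
        ? name.substring(0, MAX_JOB_NAME_LENGTH)
        : name;
  }

  public static void main(String[] args) {
    String index = "flint_my_glue_default_http_logs_elb_and_requesturi_index";
    System.out.println(build("clustername", index, "batch"));
    System.out.println(build("clustername", index, "streaming"));
  }
}
```

One trade-off of this ordering: since the query type comes last, simple tail truncation of a long index name would drop it, whereas in the formats from the PR description the query type precedes the index name and survives truncation.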

penghuo previously approved these changes Mar 8, 2024
dai-chen previously approved these changes Mar 8, 2024
dai-chen (Collaborator) left a comment:

Just wondering: is there a maximum length limit on EMR job names?

vamsimanohar dismissed stale reviews from dai-chen and penghuo via 23eaa3b, March 8, 2024 20:12
vamsimanohar (Member, Author) replied to dai-chen's question:

Yes, I'm addressing that; suresh also raised this concern. The limit is 255 characters.

vamsimanohar (Member, Author) commented:

The cluster name is between 3 and 28 characters, but the index name can be longer than 255 characters.
I will truncate the job name to 255 characters before making the EMR job call.
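For the streaming format the worst case follows directly from these numbers: a 28-character cluster name plus the 11-character `:streaming:` separator uses 39 characters, leaving 255 − 39 = 216 characters for the index name. The sketch below only makes that arithmetic explicit; the class `JobNameBudget` and its method are hypothetical.

```java
/**
 * Worked arithmetic for the streaming job name clustername:streaming:indexName,
 * using the limits quoted in this PR (255-character job names, 3 to 28
 * character cluster names). Hypothetical helper, not project code.
 */
final class JobNameBudget {
  static final int MAX_JOB_NAME_LENGTH = 255;
  static final int MIN_CLUSTER_NAME_LENGTH = 3;
  static final int MAX_CLUSTER_NAME_LENGTH = 28;
  static final String STREAMING_INFIX = ":streaming:"; // 11 characters

  private JobNameBudget() {}

  /** Characters left for the index name once the fixed prefix is counted. */
  static int indexNameBudget(int clusterNameLength) {
    if (clusterNameLength < MIN_CLUSTER_NAME_LENGTH
        || clusterNameLength > MAX_CLUSTER_NAME_LENGTH) {
      throw new IllegalArgumentException(
          "cluster name length out of range: " + clusterNameLength);
    }
    return MAX_JOB_NAME_LENGTH - (clusterNameLength + STREAMING_INFIX.length());
  }

  public static void main(String[] args) {
    System.out.println(indexNameBudget(MAX_CLUSTER_NAME_LENGTH)); // 255 - (28 + 11) = 216
    System.out.println(indexNameBudget(MIN_CLUSTER_NAME_LENGTH)); // 255 - (3 + 11) = 241
  }
}
```

So any index name longer than 216 characters can push a streaming job name past the limit, which is why truncation happens on the full job name rather than relying on the cluster name being short.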

vamsimanohar self-assigned this Mar 8, 2024
vamsimanohar added the labels v2.13.0 (Issues targeting release v2.13.0), enhancement (New feature or request), and backport 2.x on Mar 8, 2024
noCharger closed this Mar 9, 2024
noCharger reopened this Mar 9, 2024
spark/src/main/antlr/SqlBaseParser.g4 (resolved review thread)
Comment on lines -97 to +96:
-      jobName,
+      clusterName,
A collaborator commented:

What is the benefit of moving the job name formatting to the callee side? I'd rather leave it at the handler level for consistency.

vamsimanohar (Member, Author) replied Mar 11, 2024:

I tried to do that, but it requires a lot of refactoring, and the current code structure is a little convoluted.
Currently, the sessionId is created inside CreateSessionRequest, so the jobName can't be built on the caller (handler) side.
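The ordering constraint described here can be illustrated with a small sketch. Everything in it is an assumption: `SessionRequestSketch` is a hypothetical stand-in for CreateSessionRequest, and the UUID-based session ID is chosen only for illustration, not taken from the project. The point it shows is that the interactive format needs the sessionId, which does not exist until the request object is constructed.

```java
import java.util.UUID;

/**
 * Hypothetical stand-in for CreateSessionRequest, illustrating why the
 * interactive job name is built where the session ID is created.
 * The UUID-based session ID is an assumption, not the project's scheme.
 */
final class SessionRequestSketch {
  private static final int MAX_JOB_NAME_LENGTH = 255;

  private final String clusterName;
  private final String sessionId;

  SessionRequestSketch(String clusterName) {
    this.clusterName = clusterName;
    // The session ID only exists once this object is constructed, so the
    // handler that creates the request cannot format the job name itself.
    this.sessionId = UUID.randomUUID().toString();
  }

  String sessionId() {
    return sessionId;
  }

  /** Interactive query format: clustername:interactive:sessionId */
  String jobName() {
    String name = clusterName + ":interactive:" + sessionId;
    return name.length() > MAX_JOB_NAME_LENGTH ? name.substring(0, MAX_JOB_NAME_LENGTH) : name;
  }

  public static void main(String[] args) {
    SessionRequestSketch request = new SessionRequestSketch("my-cluster");
    System.out.println(request.jobName());
  }
}
```

Passing only the cluster name into the request, as in the diff under review, lets the object that owns the sessionId assemble the full job name.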

vamsimanohar merged commit 1a09f96 into opensearch-project:main on Mar 11, 2024
53 of 58 checks passed
opensearch-trigger-bot pushed a commit that referenced this pull request Mar 11, 2024
Signed-off-by: Vamsi Manohar <[email protected]>
(cherry picked from commit 1a09f96)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
vamsimanohar pushed a commit that referenced this pull request Mar 12, 2024
(cherry picked from commit 1a09f96)

Signed-off-by: Vamsi Manohar <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Labels: backport 2.x, enhancement (New feature or request), v2.13.0 (Issues targeting release v2.13.0)
4 participants