[BUG] Async Job query cannot be longer than 10280 chars #2376

Open
penghuo opened this issue Oct 26, 2023 · 3 comments
Labels
bug Something isn't working Flint

Comments

@penghuo
Collaborator

penghuo commented Oct 26, 2023

POST _plugins/_async_query
{"query":"CREATE MATERIALIZED VIEW\n  mys3.default.aws_elb_mview AS\n    SELECT\n      type as `aws.elb.elb_type`,\n      time as `@timestamp`,\n      elb as `aws.elb.elb_name`,\n      client_ip as `aws.elb.client.ip`,\n      client_port as `aws.elb.client.port`,\n      target_ip as `aws.elb.target_ip`,\n      target_port as `aws.elb.target_port`,\n      request_processing_time as `aws.elb.request_processing_time`,\n      target_processing_time as `aws.elb.target_processing_time`,\n      response_processing_time as `aws.elb.response_processing_time`,\n      elb_status_code as `aws.elb.elb_status_code`,\n      target_status_code as `aws.elb.target_status_code`,\n      received_bytes as `aws.elb.received_bytes`,\n      sent_bytes as `aws.elb.sent_bytes`,\n      request_verb as `http.request.method`,\n      request_url as `url.full`,\n      request_proto as `url.schema`,\n      user_agent as `http.user_agent.name`,\n      ssl_cipher as `aws.elb.ssl_cipher`,\n      ssl_protocol as `aws.elb.ssl_protocol`,\n      target_group_arn as `aws.elb.target_group_arn`,\n      trace_id as `traceId`,\n      domain_name as `url.domain`,\n      chosen_cert_arn as `aws.elb.chosen_cert_arn`,\n      matched_rule_priority as `aws.elb.matched_rule_priority`,\n      request_creation_time as `aws.elb.request_creation_time`,\n      actions_executed as `aws.elb.actions_executed`,\n      redirect_url as `aws.elb.redirect_url`,\n      lambda_error_reason as `aws.elb.lambda_error_reason`,\n      target_port_list as `aws.elb.target_port_list`,\n      target_status_code_list as `aws.elb.target_status_code_list`,\n      classification as `aws.elb.classification`,\n      classification_reason as `aws.elb.classification_reason`\n    FROM\n      mys3.default.aws_elb;\n","datasource":"mys3","lang":"sql","sessionId":"Y3ZFWkFMR2NPS215czM="}

> "{\n  \"status\": 503,\n  \"error\": {\n    \"type\": \"ValidationException\",\n    \"reason\": \"There was internal problem at backend\",\n    \"details\": \"1 validation error detected: Value at \\u0027jobDriver.sparkSubmit.entryPointArguments\\u0027 failed to satisfy constraint: Member must satisfy constraint: [Member must have length less than or equal to 10280, Member must have length greater than or equal to 1, Member must satisfy regular expression pattern: .*\\\\S.*] (Service: AWSEMRServerless; Status Code: 400; Error Code: ValidationException; Request ID: 5bf4879b-6d5b-4c29-aca9-04f845a50f7b; Proxy: null)\"\n  }\n}"
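The length bounds named in the error above can be checked on the client side before a request is submitted. The following is a minimal Python sketch under stated assumptions: the helper name and constants are hypothetical and not part of the plugin, length is counted in characters (the service may count differently), and the regular-expression constraint from the error is not reproduced.

```python
# Length bounds quoted in the ValidationException above.
MIN_ARG_LEN = 1
MAX_ARG_LEN = 10280


def entry_point_argument_violations(arg: str) -> list[str]:
    """Return the length constraints that `arg` would violate.

    Hypothetical helper: it mirrors only the two length bounds from the
    error message and assumes length is measured in characters.
    """
    violations = []
    if len(arg) < MIN_ARG_LEN:
        violations.append(f"length must be >= {MIN_ARG_LEN}")
    if len(arg) > MAX_ARG_LEN:
        violations.append(f"length must be <= {MAX_ARG_LEN}")
    return violations
```

An empty list means the argument is within both bounds; any other result lists the bound that would be rejected.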
@penghuo penghuo added bug Something isn't working untriaged v2.11.1 Issues targeting release v2.11.1 labels Oct 26, 2023
@penghuo
Collaborator Author

penghuo commented Oct 26, 2023

  1. For a Job, pass the query through Spark conf instead of as an entry point argument.
  2. For a Session, support non-streaming queries.

@penghuo penghuo self-assigned this Oct 26, 2023
@penghuo penghuo added Flint and removed v2.11.1 Issues targeting release v2.11.1 labels Dec 8, 2023
@anirudha anirudha changed the title [BUG] Async Job query can longer than 10280 [BUG] Async Job query can longer than 10280 chars Feb 9, 2024
@noCharger
Collaborator

noCharger commented Feb 12, 2024

Currently, the query is stored in StartJobRunRequest.entryPointArguments and retrieved by the entry point FlintREPL as its first argument. If it is moved to Spark conf as part of StartJobRunRequest.sparkSubmitParameters, it will share the maximum length of 102400 with the other confs (whose total length averages 3000 to 5000).
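As a rough illustration of that move, the Python sketch below builds a `sparkSubmit` payload that carries the query in `sparkSubmitParameters` rather than in `entryPointArguments`. Several parts are assumptions and not the plugin's actual code: the conf key `spark.flint.job.query`, the function name, and the use of shell-style quoting to keep a multi-line query as one token. The 102400 limit is the one quoted above.

```python
import shlex

MAX_SUBMIT_PARAMS_LEN = 102400  # shared limit quoted in the comment above
QUERY_CONF_KEY = "spark.flint.job.query"  # hypothetical conf key


def build_spark_submit(entry_point: str, query: str, other_params: str) -> dict:
    """Build a sparkSubmit payload with the query carried in Spark conf.

    Hypothetical helper: the query no longer appears in
    entryPointArguments, so the 10280-character member limit does not
    apply to it.
    """
    # Quote key=value so whitespace and newlines in the query stay in a
    # single token (an assumed quoting convention).
    query_conf = "--conf " + shlex.quote(f"{QUERY_CONF_KEY}={query}")
    params = f"{other_params} {query_conf}".strip()
    if len(params) > MAX_SUBMIT_PARAMS_LEN:
        raise ValueError(
            f"sparkSubmitParameters is {len(params)} chars, "
            f"limit is {MAX_SUBMIT_PARAMS_LEN}"
        )
    return {"entryPoint": entry_point, "sparkSubmitParameters": params}
```

With the other confs averaging 3000 to 5000 characters in total, this leaves most of the 102400 budget for the query, but the budget is still shared, so the length check covers the combined string.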

@noCharger
Collaborator

noCharger commented Feb 12, 2024

The query passed when creating a job for an interactive query is a dummy query. For batch and streaming queries, however, the actual query is passed in. We need to ensure that this breaking change is compatible with the future release in the opensearch-spark repository.
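One way to keep both sides compatible during the transition is for the entry point to prefer the query from Spark conf and fall back to the first argument. The sketch below is written in Python for illustration only; it is not the FlintREPL implementation, and the conf key, function name, and fallback order are all assumptions.

```python
QUERY_CONF_KEY = "spark.flint.job.query"  # hypothetical conf key


def resolve_query(args: list[str], conf: dict[str, str]) -> str:
    """Pick the query for a job, tolerating old and new submitters.

    Hypothetical helper: a new submitter puts the query in Spark conf,
    while an old submitter still passes it as the first argument.
    """
    query = conf.get(QUERY_CONF_KEY, "")
    if query.strip():
        return query
    if args and args[0].strip():
        return args[0]
    raise ValueError("no query found in Spark conf or entry point arguments")
```

Under this ordering, an older plugin release that still sends the query as an argument keeps working against a newer entry point, and the conf value takes precedence once both are present.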
