[FEATURE] Track query state in plugin with calling EMR-S jobRun API #2401

Open
penghuo opened this issue Oct 31, 2023 · 1 comment

Comments

@penghuo
Collaborator

penghuo commented Oct 31, 2023

  • Track the state of all queries in the plugin. Currently, we only track interactive queries; we should also track batch and streaming queries.
  • One requirement of this feature is to reduce the number of getJobRun calls to the EMR-S service.
@penghuo penghuo added enhancement New feature or request untriaged v2.12.0 Issues targeting release v2.12.0 Flint labels Oct 31, 2023
@penghuo penghuo self-assigned this Nov 1, 2023
@penghuo penghuo removed the untriaged label Nov 1, 2023
@penghuo
Collaborator Author

penghuo commented Nov 1, 2023

Queries Journey

Based on job types, we categorize queries into three main types:

  1. Interactive jobs. e.g. create table
  2. Batch jobs. e.g. refresh table
  3. Streaming jobs. e.g. create table with (auto_refresh=true)

Additionally, some queries run on the plugin side, e.g. drop index.
In the following table, we explain how queries are processed in the system.

POST

POST async_query creates two docs in the flint_execution_request index:

  • statement. docId: statementId

type: statement
queryId: // assigned by Transport
statementId: // same as queryId
state: "waiting"

  • session. docId: sessionId
    • The sessionType depends on the query.

type: session
sessionType: interactive / batch / streaming / local
state: "NOT_STARTED"
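
As a rough illustration of the two docs described above, here is a minimal Python sketch. The helper names (`classify_session_type`, `build_request_docs`), the `sessionId` and `query` fields on the statement doc, and the keyword-based classification are all assumptions made for illustration; the issue only says that the sessionType depends on the query, not how it is derived.

```python
import uuid


def classify_session_type(query: str) -> str:
    """Hypothetical heuristic: map a query string to a sessionType.

    The mapping follows the examples in this issue (create table, refresh
    table, auto_refresh=true, drop index); the real rules are not specified.
    """
    normalized = " ".join(query.lower().split())
    if normalized.startswith("drop index"):
        return "local"  # runs on the plugin side
    if "auto_refresh=true" in normalized.replace(" ", ""):
        return "streaming"
    if normalized.startswith("refresh"):
        return "batch"
    return "interactive"


def build_request_docs(query: str, session_id: str = None):
    """Build the statement doc and the session doc for one POST."""
    # Stand-in for the id assigned by Transport.
    query_id = uuid.uuid4().hex
    session_id = session_id or uuid.uuid4().hex
    statement = {
        "_id": query_id,  # docId: statementId
        "type": "statement",
        "queryId": query_id,
        "statementId": query_id,  # same as queryId
        "sessionId": session_id,  # assumed link back to the session
        "query": query,  # assumed field
        "state": "waiting",
    }
    session = {
        "_id": session_id,  # docId: sessionId
        "type": "session",
        "sessionType": classify_session_type(query),
        "state": "NOT_STARTED",
    }
    return statement, session
```

Both docs would then be indexed into flint_execution_request; the indexing call itself is left out of the sketch.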

FlintREPLJob

  • get sessionId from spark.flint.job.sessionId conf.
  • get sessionType of sessionId.
  • take action based on sessionType
    • interactive
      • while (true)
        • read statement with sessionId and process.
        • write result to resultIndex
        • update statementState = success
      • quit if process done
    • batch
      • read statement with sessionId and process.
      • write result to resultIndex
      • update statementState = success
      • quit if process done
    • streaming
      • read statement with sessionId and process.
      • write result to resultIndex
      • update statementState = success
  • Update session lastUpdateTime regularly
  • Update sessionState to DEAD before quit
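
The dispatch above could be sketched roughly as follows, using in-memory dicts in place of the real indices. Everything here is illustrative: `run_flint_job`, `process_pending`, the `execute` callback, and the `max_idle_polls` exit condition are assumptions (the issue only says the interactive loop quits "if process done"). The sketch also leaves a streaming session alive after one pass, on the assumption that a streaming job keeps running.

```python
import time


def process_pending(session_id, statements, execute, result_index):
    """Run every waiting statement of the session; return how many ran."""
    ran = 0
    for stmt in statements:
        if stmt["sessionId"] == session_id and stmt["state"] == "waiting":
            # Write the result to the (in-memory) result index.
            result_index[stmt["statementId"]] = execute(stmt["query"])
            stmt["state"] = "success"
            ran += 1
    return ran


def run_flint_job(conf, sessions, statements, execute, result_index,
                  max_idle_polls=1):
    """Hypothetical sketch of the FlintREPLJob control flow."""
    session_id = conf["spark.flint.job.sessionId"]
    session = sessions[session_id]
    session_type = session["sessionType"]

    if session_type == "interactive":
        idle = 0
        # The issue says while (true); the loop is bounded here by an
        # assumed idle-poll limit so that the sketch terminates.
        while idle < max_idle_polls:
            ran = process_pending(session_id, statements, execute, result_index)
            session["lastUpdateTime"] = time.time()
            idle = 0 if ran else idle + 1
    elif session_type in ("batch", "streaming"):
        process_pending(session_id, statements, execute, result_index)
        session["lastUpdateTime"] = time.time()
    else:
        raise ValueError(f"unsupported sessionType: {session_type}")

    # Mark the session DEAD before quitting; a streaming job is assumed
    # to keep running, so its session state is left untouched.
    if session_type != "streaming":
        session["state"] = "DEAD"
```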

GET

GET async_query fetches the state from the statement doc. If the state is success, it fetches the result from query_result_indices.
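
A minimal sketch of this read path, assuming the statement docs and the results are available as dicts keyed by queryId. The function name and the response shape are hypothetical. Note that the state comes only from the statement doc, so no getJobRun call to EMR-S is needed.

```python
def get_async_query(query_id, statements, query_results):
    """Hypothetical GET handler: read state from the statement doc only."""
    stmt = statements.get(query_id)
    if stmt is None:
        raise KeyError(f"no statement doc for queryId {query_id}")
    response = {"queryId": query_id, "state": stmt["state"]}
    # Only look up the result once the statement has succeeded.
    if stmt["state"] == "success":
        response["result"] = query_results.get(query_id)
    return response
```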

CANCEL

CANCEL async_query.

  • opt-1
    • Plugin: (1) sets the statement state to cancelling.
    • FlintJob: (1) reads the statement state; if it is cancelling, marks the statement as cancelled and force-quits the job.
  • opt-2
    • Plugin: (1) sets the statement state to cancelling, (2) cancels the job associated with the session, (3) sets the statement state to cancelled.

Limitation: a DROP INDEX query cannot be cancelled.
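
The two cancellation options can be compared with a small sketch of the state transitions. The function names and the `cancel_job` callback are hypothetical; how the plugin would actually cancel the job behind a session is not specified in this issue.

```python
def plugin_request_cancel(stmt):
    """opt-1, plugin side: only flag the statement as cancelling."""
    stmt["state"] = "cancelling"


def job_check_cancel(stmt) -> bool:
    """opt-1, job side: returns True when the job should force-quit."""
    if stmt["state"] == "cancelling":
        stmt["state"] = "cancelled"
        return True
    return False


def plugin_cancel_directly(stmt, cancel_job):
    """opt-2: the plugin drives the whole transition itself.

    cancel_job is an assumed caller-supplied callback that cancels the
    job associated with the given sessionId.
    """
    stmt["state"] = "cancelling"
    cancel_job(stmt["sessionId"])
    stmt["state"] = "cancelled"
```

In opt-1 the job must poll the statement state to notice the request, whereas in opt-2 the plugin makes one extra call to cancel the job but does not depend on the job being responsive.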

@penghuo penghuo changed the title [FEATURE] Track query state in plugin. [FEATURE] Track query state in plugin with calling EMR-S jobRun API Nov 2, 2023
@vmmusings vmmusings removed the v2.12.0 Issues targeting release v2.12.0 label Jan 29, 2024