The system is configured to handle up to 15 queries per queue and 1,000 queries at the domain level. Exceeding these thresholds triggers throttling, and additional requests are rejected with an exception. SessionManager tracks the active sessions.
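The admission check described above can be sketched roughly as follows. The two limits (15 per queue, 1,000 per domain) come from the text; the class and method names are assumptions for illustration, not the actual plugin API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the backpressure check: admit a query only while
// both the per-queue and domain-level limits still have headroom.
public class ConcurrencyLimiter {
    private static final int MAX_PER_QUEUE = 15;
    private static final int MAX_PER_DOMAIN = 1000;

    private final Map<String, Integer> perQueue = new HashMap<>();
    private int domainTotal = 0;

    /** Admit a query, or throw when either threshold is exceeded. */
    public synchronized void admit(String queue) {
        int queued = perQueue.getOrDefault(queue, 0);
        if (queued >= MAX_PER_QUEUE || domainTotal >= MAX_PER_DOMAIN) {
            throw new IllegalStateException("Too many queries, queue=" + queue);
        }
        perQueue.put(queue, queued + 1);
        domainTotal++;
    }

    /** Release a slot when a query finishes. */
    public synchronized void release(String queue) {
        perQueue.merge(queue, -1, Integer::sum);
        domainTotal--;
    }
}
```

In practice the counts would be derived from the session store rather than held in memory; the sketch only shows where the two thresholds apply.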
Spark Application Failure Handling
Happy case - The Spark job running in EMR-S updates the connection state. The Connection running in the Plugin listens for connection state changes and notifies the registered Sessions.
No Statement Exit - If the Spark job does not receive a statement for 10 minutes, it should exit.
Heartbeat - The Connection uses lastUpdateTime to detect a lost Spark job. If the lastUpdateTime check fails once, (1) trigger Connection close, and (2) trigger a session state change to AppState.ERROR and set sessionState = ERROR.
LSE - In the case of an EMR-S large-scale event (LSE), the Plugin takes no action and depends on the CP to assign a new Application ID.
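The heartbeat rule above can be sketched as follows. The single-failure threshold comes from the text; the class name and the 10-minute timeout are assumptions for illustration.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the lastUpdateTime heartbeat check: a single
// failed check (lastUpdateTime older than the timeout) marks the
// connection's Spark job as lost.
public class HeartbeatMonitor {
    private final Duration timeout;

    public HeartbeatMonitor(Duration timeout) {
        this.timeout = timeout;
    }

    /** True when the Spark job has not updated its state within the timeout. */
    public boolean isLost(Instant lastUpdateTime, Instant now) {
        return Duration.between(lastUpdateTime, now).compareTo(timeout) > 0;
    }
}
```

When `isLost` returns true, the Plugin would close the Connection and move the session to ERROR, as described above.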
Plugin Failure Handling
Datasource metadata index unavailable: Flint async query does not work. Error response is “no datasource exist”. Todo - SOP needed.
Session index unavailable: Flint async query does not work. Error response is “no session store available”. Todo - SOP needed.
Result index unavailable: Flint async query cannot fetch results. Error response is “result write failed”. Todo - SOP needed.
Data index unavailable: Flint skipping index / covering index / MV do not work. (1) Queries cannot be rewritten. (2) Refresh fails with the error message “update index failed”.
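The per-index failure responses above could be tabulated in code roughly like this. The error strings are the ones quoted above; the index keys and class name are hypothetical.

```java
// Hypothetical lookup from an unavailable index to the error response the
// Plugin returns, mirroring the failure-handling list above.
public class IndexAvailability {
    public static String errorFor(String index) {
        switch (index) {
            case "datasource-metadata": return "no datasource exist";
            case "session":             return "no session store available";
            case "result":              return "result write failed";
            case "data":                return "update index failed";
            default: throw new IllegalArgumentException("unknown index: " + index);
        }
    }
}
```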
Interactive Session Recovery
The Plugin could trigger interactive session recovery when a session is in the dead state. The concern is that, during an LSE, the Plugin would keep creating new sessions and recovering statements. Instead, the Plugin should emit metrics to the CP, and the CP decides when to trigger statement recovery.
// 1. List all waiting statements in the dead session
List<Statement> stList = sql(
    "SELECT statement " +
    "FROM flint_ql_sessions " +
    "WHERE type = 'statement' " +
    "AND sessionId = 'deadSessionId' " +
    "AND state = 'waiting'");

// 2. Create a new session
Session newSession = sessionManager.allocateSession();

// 3. Recover each statement in the new session
for (Statement st : stList) {
    st.recover(newSession);
}
Deployment
Migrate to a new Application
Opt-1
CP updates the DNS record with the new Application ID.
CP lists active sessions in the domain.
For non-streaming sessions, call DELETE async_query/sessions/{sessionId}?wait=30mins to close the session.
An InteractiveSession waits until 30 minutes elapse or all of its statements finish.
A BatchSession waits until 30 minutes elapse or its statement finishes.
For streaming sessions,
call DELETE async_query/sessions/{sessionId} to close the session
call POST async_query/sessions/{sessionId}?recover=true&applicationId=newApp to recover the session.
Opt-2 - Preferred
CP updates the DNS record with both the new ApplicationId and the old ApplicationId.
DP lists active sessions in the domain.
For non-streaming sessions, call Session.close().
An InteractiveSession waits until 30 minutes elapse or all of its statements finish.
A BatchSession waits until 30 minutes elapse or its statement finishes.
For streaming sessions, call Session.close() to close the current session and Session.recover() to recover the session in the new ApplicationId.
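The Opt-2 steps above can be sketched as follows. The `Session` interface, its methods, and the `migrate` helper are assumptions distilled from the text, not real APIs from the plugin.

```java
import java.time.Duration;
import java.util.List;

// Hypothetical sketch of the preferred (Opt-2) migration flow: the DP
// closes non-streaming sessions with a bounded wait, and closes then
// recovers streaming sessions in the new application.
public class Migration {
    interface Session {
        boolean isStreaming();
        void close(Duration maxWait);          // waits until statements finish or timeout
        void recover(String newApplicationId); // restarts the session in the new app
    }

    public static void migrate(List<Session> activeSessions, String newAppId) {
        for (Session s : activeSessions) {
            if (s.isStreaming()) {
                s.close(Duration.ZERO);              // close immediately
                s.recover(newAppId);                 // recover in the new ApplicationId
            } else {
                s.close(Duration.ofMinutes(30));     // wait 30 mins or until finished
            }
        }
    }
}
```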
Limitation
If the DP is not available, this solution does not work. The CP should leverage the EMR-S statement to trigger deployment.
Metrics & Monitoring
Statement
statement count
success statement count
failed statement count
Session
active session count
dead session count
error session count
finish session count
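The counters listed above could be maintained with a minimal registry like the following. The metric names mirror the list; the `Metrics` class itself is an assumption, not the plugin's actual metrics API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a counter registry for the statement and
// session metrics listed above.
public class Metrics {
    private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    /** Increment a named counter, creating it on first use. */
    public void increment(String name) {
        counters.computeIfAbsent(name, k -> new AtomicLong()).incrementAndGet();
    }

    /** Read a counter; unknown names read as zero. */
    public long get(String name) {
        AtomicLong c = counters.get(name);
        return c == null ? 0 : c.get();
    }
}
```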