Implement FlintJob to handle all query types in warmpool mode #979
base: main
Conversation
Force-pushed 6747ab9 to 59aa26b
logInfo(s"WarmpoolEnabled: ${warmpoolEnabled}")

if (!warmpoolEnabled) {
  val jobType = sparkSession.conf.get("spark.flint.job.type", FlintJobType.BATCH)
Any particular reason to have the conf key hard-coded here?
We could use FlintSparkConf.JOB_TYPE.key instead, similar to FlintSparkConf.WARMPOOL_ENABLED.key above.
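The suggestion above can be sketched as follows. This is a minimal, self-contained illustration: `ConfEntry` and `FlintConfSketch` are stand-ins for the real `FlintSparkConf` machinery (which is not reproduced here), and the `"batch"` default is an assumption standing in for `FlintJobType.BATCH`.

```scala
// Stand-in for a typed config entry; the real FlintSparkConf constants are
// not available here, so this models the idea rather than the actual API.
final case class ConfEntry(key: String, default: String)

object FlintConfSketch {
  // Mirrors FlintSparkConf.JOB_TYPE: one definition of the key string,
  // referenced everywhere via JOB_TYPE.key instead of a repeated literal.
  val JOB_TYPE = ConfEntry("spark.flint.job.type", "batch")

  // Read the job type through the typed entry, falling back to its default.
  def jobType(conf: Map[String, String]): String =
    conf.getOrElse(JOB_TYPE.key, JOB_TYPE.default)
}
```

Centralizing the key in one constant avoids the hard-coded string drifting out of sync between call sites.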
The existing code was doing the same; I just wrapped it in an if block. I can modify this if needed.
CustomLogging.logInfo(s"""Job type is: ${jobType}""")
sparkSession.conf.set(FlintSparkConf.JOB_TYPE.key, jobType)

val dataSource = conf.get("spark.flint.datasource.name", "")
Same for DATA_SOURCE_NAME
The existing code was doing the same; I just wrapped it in an if block. I can modify this if needed.
Can you clarify and document how WarmPool is abstracted and can be enabled/disabled?
val warmpoolEnabled = conf.get(FlintSparkConf.WARMPOOL_ENABLED.key, "false").toBoolean
logInfo(s"WarmpoolEnabled: ${warmpoolEnabled}")

if (!warmpoolEnabled) {
This introduces a huge if/else block, which greatly reduces readability and maintainability. Can you split the class into a warmpool job and the original interactive job?
+1 on this, let's abstract the common interface and move from there.
Ack
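The split suggested above could look roughly like this. All names here (`QueryJob`, `WarmpoolJob`, `InteractiveJob`) are illustrative stand-ins, not classes from the PR; the sketch only shows how a common interface would replace the large if/else branch.

```scala
// Common interface for both execution modes, so the entry point picks an
// implementation once instead of branching throughout the class.
trait QueryJob {
  def mode: String
  def run(query: String): String
}

final class WarmpoolJob extends QueryJob {
  val mode = "warmpool"
  def run(query: String): String = s"[$mode] $query"
}

final class InteractiveJob extends QueryJob {
  val mode = "interactive"
  def run(query: String): String = s"[$mode] $query"
}

object QueryJob {
  // Selection mirrors the warmpoolEnabled flag read from the Spark conf.
  def apply(warmpoolEnabled: Boolean): QueryJob =
    if (warmpoolEnabled) new WarmpoolJob else new InteractiveJob
}
```

With this shape, the warmpool-specific and interactive-specific logic live in separate classes, and the conf flag is consulted in exactly one place.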
val queryId = conf.get(FlintSparkConf.QUERY_ID.key, "")
}

def queryLoop(commandContext: CommandContext, segmentName: String): Unit = {
This looks similar to the FlintREPL.queryLoop method, but modified. It would become very difficult to maintain, since we would need to carefully keep both versions consistent. Can you add an abstraction and avoid the duplication?
This redundancy is expected. After this PR is merged, FlintREPL will be deprecated. FlintJob will be the single point of entry for all types of queries.
def getSegmentName(sparkSession: SparkSession): String = {
  val maxExecutorsCount =
    sparkSession.conf.get(FlintSparkConf.MAX_EXECUTORS_COUNT.key, "unknown")
  String.format("%se", maxExecutorsCount)
}
This segmentName is specific to warmpool logic; let us create abstractions on warmpool and record metrics via AOP.
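For reference, the getSegmentName logic quoted above reduces to a one-line formatter: the max-executors setting is read as a string and suffixed with "e". A minimal stand-alone reproduction (SegmentNameSketch is a hypothetical wrapper, not a class from the PR):

```scala
object SegmentNameSketch {
  // Mirrors getSegmentName above: suffix the executor count with "e",
  // e.g. a max-executors value of "10" yields the segment name "10e".
  def segmentName(maxExecutorsCount: String): String =
    String.format("%se", maxExecutorsCount)
}
```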
@@ -610,15 +610,15 @@ object FlintREPL extends Logging with FlintJobExecutor {
    }
  }

-  private def handleCommandTimeout(
+  def handleCommandTimeout(
Both FlintJob and FlintREPL extend FlintJobExecutor; consider refactoring the common methods into FlintJobExecutor.
Ack
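The refactor suggested above can be sketched as follows. The names are simplified stand-ins (JobExecutorBase for FlintJobExecutor, the two objects for FlintJob and FlintREPL), and the method body is a placeholder; the point is only that a shared method defined once in the base trait is visible to both entry points.

```scala
// Shared base trait: helpers duplicated across FlintJob and FlintREPL
// (such as handleCommandTimeout) would be defined once here.
trait JobExecutorBase {
  // Placeholder body; the real method handles a query timeout.
  def handleCommandTimeout(queryId: String): String =
    s"timeout:$queryId"
}

// Both entry points mix in the same base, so behavior cannot drift apart.
object FlintJobSketch extends JobExecutorBase
object FlintREPLSketch extends JobExecutorBase
```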
@@ -32,7 +32,8 @@ case class JobOperator(
    dataSource: String,
    resultIndex: String,
    jobType: String,
-   streamingRunningCount: AtomicInteger)
+   streamingRunningCount: AtomicInteger,
+   statementContext: Map[String, Any] = Map.empty[String, Any])
What's the purpose of adding statementContext, which belongs to the FlintStatement model?
The getNextStatement call in FlintJob retrieves all the query information, including the statementContext. However, when the query is a streaming or batch query, we need to invoke the JobOperator. Currently, the JobOperator class only accepts the query, queryId, and other related information, but it does not include the statementContext. As a result, when the JobOperator calls executeStatement, we may encounter issues. To resolve this, the statementContext should be passed to the JobOperator.

Additionally, before calling executeStatement, the FlintStatement is constructed. Currently, the FlintStatement does not include the statementContext. However, with the introduction of the warmpool, it becomes necessary to include the statementContext in the FlintStatement, as there is client-side logic that depends on it.
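The data flow described above can be sketched with simplified stand-in types (FlintStatementSketch and JobOperatorSketch are illustrative, not the real Flint classes): the context retrieved by getNextStatement is carried through the operator and into the statement it builds.

```scala
// Simplified stand-in for FlintStatement, now carrying the context map.
final case class FlintStatementSketch(
    query: String,
    context: Map[String, Any])

// Simplified stand-in for JobOperator with the new statementContext
// parameter, defaulting to empty to preserve existing call sites.
final case class JobOperatorSketch(
    query: String,
    queryId: String,
    statementContext: Map[String, Any] = Map.empty) {

  // The statement constructed before executeStatement now includes the
  // context that was propagated from getNextStatement.
  def buildStatement: FlintStatementSketch =
    FlintStatementSketch(query, statementContext)
}
```

The default value keeps non-warmpool callers source-compatible while letting warmpool mode thread the context through.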
Force-pushed cd81203 to 044aeea
Signed-off-by: Shri Saran Raj N <[email protected]>
Force-pushed 044aeea to adef5b6
Description
This PR introduces support for FlintJob to handle all types of queries — interactive, streaming, and batch — with all data sources in warmpool mode. Additionally, FlintJob will also support non-warmpool mode for streaming and batch queries, configurable via a Spark configuration setting.
Changes:
Related Issues
Check List
Commits are signed per the DCO using --signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.