[FEATURE] Add maxExecutors configuration for streaming queries #324

Closed
penghuo opened this issue Apr 30, 2024 · 0 comments
Labels
enhancement New feature or request

Comments


penghuo commented Apr 30, 2024

Is your feature request related to a problem?
Currently, interactive and streaming queries share the same Spark executor configuration. While streaming jobs run, executors are dynamically allocated up to the limit set by spark.dynamicAllocation.maxExecutors (default 30). Spark's dynamic resource allocation (DRA) aggressively allocates more executors than necessary to achieve low latency. For streaming workloads this is inefficient, because the volume of data processed in each interval is relatively small (about 100 GB per 15 minutes) and the latency requirement is not stringent (15 minutes).

What solution would you like?
Add a spark.dynamicAllocation.streaming.maxExecutors configuration, with a default value of 10, that caps executor allocation for streaming queries independently of interactive queries.
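A minimal sketch of how the proposed setting could resolve the executor cap per query type (plain Python, no Spark dependency; the helper name and fallback behavior are illustrative, not part of the proposal):

```python
# Defaults discussed in this issue: 30 is the current shared default,
# 10 is the proposed default for the new streaming-specific setting.
DEFAULT_MAX_EXECUTORS = 30
DEFAULT_STREAMING_MAX_EXECUTORS = 10

def max_executors(conf: dict, is_streaming: bool) -> int:
    """Resolve the executor cap for a query under the proposed config.

    Streaming queries read spark.dynamicAllocation.streaming.maxExecutors
    (default 10); interactive queries keep using the existing
    spark.dynamicAllocation.maxExecutors (default 30).
    """
    if is_streaming:
        return int(conf.get(
            "spark.dynamicAllocation.streaming.maxExecutors",
            DEFAULT_STREAMING_MAX_EXECUTORS,
        ))
    return int(conf.get(
        "spark.dynamicAllocation.maxExecutors",
        DEFAULT_MAX_EXECUTORS,
    ))
```

With no explicit settings, a streaming query would be capped at 10 executors while an interactive query keeps the existing cap of 30.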

What alternatives have you considered?
In opensearch-sql, submit interactive and streaming queries separately and configure spark.dynamicAllocation.maxExecutors per submission.
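The alternative above amounts to building a separate submission configuration per query type. A minimal sketch (plain Python, no Spark dependency; the helper name is hypothetical, and the caps of 10 and 30 mirror the values discussed above):

```python
# Hypothetical helper: build a per-submission Spark conf so streaming and
# interactive queries receive different spark.dynamicAllocation.maxExecutors
# caps, without introducing any new configuration key.
def submission_conf(is_streaming: bool) -> dict:
    conf = {"spark.dynamicAllocation.enabled": "true"}
    # Lower cap for streaming submissions, higher for interactive ones.
    conf["spark.dynamicAllocation.maxExecutors"] = "10" if is_streaming else "30"
    return conf
```

The drawback of this approach is that the cap lives in the submitting client rather than in one shared Spark configuration, so every caller must apply it consistently.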
