[CELEBORN-1595] Support quota low watermark for checking quota available #2727
base: main
IMO this configuration could be a dynamic configuration. WDYT?
Also, IMO quota should be a hard limit; any application breaching quota should get a
ping @FMX @SteNicholas @waitinfuture, WDYT about this?
val quota = getQuota(userIdentifier)
Quota(
  (quota.diskBytesWritten * quotaWatermark).toLong,
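The diff fragment above shows only the disk-bytes dimension being scaled. The sketch below shows how a watermark-scaled quota could be derived, assuming the same factor is applied to every dimension. `QuotaLimits` is a hypothetical, simplified stand-in for Celeborn's `Quota`, and its field set is an assumption. `lowWatermarkQuota` is likewise a hypothetical helper, not the PR's actual code.

```scala
// Hypothetical, simplified stand-in for Celeborn's quota type.
// The four fields are an assumption made for illustration.
final case class QuotaLimits(
    diskBytesWritten: Long,
    diskFileCount: Long,
    hdfsBytesWritten: Long,
    hdfsFileCount: Long)

// Hypothetical helper: scales every dimension of the configured quota by the
// watermark factor, mirroring `(quota.diskBytesWritten * quotaWatermark).toLong`.
// The result is the threshold used when deciding whether to admit new jobs.
def lowWatermarkQuota(quota: QuotaLimits, quotaWatermark: Double): QuotaLimits = {
  require(
    quotaWatermark > 0.0 && quotaWatermark <= 1.0,
    s"quotaWatermark must be in (0, 1], got $quotaWatermark")
  QuotaLimits(
    (quota.diskBytesWritten * quotaWatermark).toLong,
    (quota.diskFileCount * quotaWatermark).toLong,
    (quota.hdfsBytesWritten * quotaWatermark).toLong,
    (quota.hdfsFileCount * quotaWatermark).toLong)
}
```

With `quotaWatermark = 1.0`, the scaled quota equals the configured one, which corresponds to the existing behaviour. This is also why the reviewer asks below whether simply configuring a smaller quota would be enough.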
Hi @s0nskar, I don't quite understand why we can't just configure the quota as quota.diskBytesWritten * 1.0 (some user-defined factor). Is it necessary to introduce this new config?
@waitinfuture Currently Celeborn treats quota as a soft limit. Once a user's quota is breached, it only stops new jobs from using Celeborn, while existing jobs keep running and continue to exceed the quota. IMO there are two shortcomings in the current quota implementation:
- It allows new jobs to onboard even when the quota is just about to be breached. Ideally, it should stop onboarding jobs after a certain threshold so that usage does not go far beyond the quota limits (this PR addresses that).
- It does not kill jobs when a user's quota is breached (I was planning to implement this, but noticed that @leixm raised a PR to handle it today: [CELEBORN-1577][Phase1] Storage quota should support interrupt shuffle. #2801).
@waitinfuture WDYT about this? IMO this is a must-have to avoid interrupting user/tenant jobs in bulk.
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.
What changes were proposed in this pull request?
Support a quota low watermark when checking whether quota is available. With this change, new jobs are not allowed to run on Celeborn if the used quota is above the lowWatermark threshold.
Why are the changes needed?
Currently we allow jobs to run even when we are just about to breach the quota limits. This is not ideal behaviour. Ideally, we should not allow any new jobs to run on Celeborn after a certain threshold (called lowWatermark here). This ensures that currently running jobs can use the remaining quota without going far beyond the quota limit.
I'll also follow up with a PR to throw a CelebornIOException if the quota is breached.
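The admission rule described above can be sketched as follows. A new job is admitted only if every quota dimension's usage is at or below `limit * lowWatermark`, while already-running jobs are not re-checked here. `Usage` and `quotaAvailable` are hypothetical names chosen for illustration. Treating usage exactly at the threshold as still acceptable is an assumption based on the wording "above lowWatermark".

```scala
// Hypothetical per-dimension record: current usage and configured limit (same unit).
final case class Usage(used: Long, limit: Long)

// Hypothetical admission check: returns true if a NEW job may start.
// Every dimension must be at or below limit * lowWatermark; usage exactly at the
// threshold is treated as acceptable (an assumption). Running jobs are unaffected.
def quotaAvailable(usages: Map[String, Usage], lowWatermark: Double): Boolean = {
  require(
    lowWatermark > 0.0 && lowWatermark <= 1.0,
    s"lowWatermark must be in (0, 1], got $lowWatermark")
  usages.values.forall(u => u.used <= (u.limit * lowWatermark).toLong)
}

val GiB: Long = 1L << 30

// A user at 92 GiB of a 100 GiB disk quota.
val nearlyFull = Map("diskBytesWritten" -> Usage(92 * GiB, 100 * GiB))
```

With `lowWatermark = 0.9`, `quotaAvailable(nearlyFull, 0.9)` is `false`, so the new job is rejected even though the hard limit has not been reached. With `lowWatermark = 1.0` (the current behaviour), the same job would still be admitted.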
Does this PR introduce any user-facing change?
NA
How was this patch tested?
UTs