Provide configurable option to queue requests when concurrency is limited with "max-concurrent-requests" #9229
This is not different from setting …
The Fault Tolerance Bulkhead feature (SE, MP) provides a mechanism for rate-limiting access to specific tasks. You control both parallelism and wait-queue length. See the Helidon SE Rate Limiting example for examples of using a Bulkhead as well as a Java Semaphore for rate limiting. I think of …
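As a rough illustration of the Semaphore approach mentioned above, here is a minimal sketch. It assumes the same routing.addFilter hook used in the Bulkhead example later in this thread and Helidon 4 packages for Status; the permit count, wait timeout, and 503 handling are illustrative choices, not Helidon defaults.

import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

import io.helidon.http.Status;

// A fair semaphore bounds concurrency; waiting for a permit is what
// effectively queues a request for a bounded amount of time.
Semaphore permits = new Semaphore(20, true);

routing.addFilter((chain, req, res) -> {
    boolean acquired = false;
    try {
        acquired = permits.tryAcquire(2, TimeUnit.SECONDS);
        if (!acquired) {
            // could not get a permit within the wait budget
            res.status(Status.SERVICE_UNAVAILABLE_503).send();
            return;
        }
        chain.proceed();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        res.status(Status.SERVICE_UNAVAILABLE_503).send();
    } finally {
        if (acquired) {
            permits.release();
        }
    }
});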
The bulkhead feature requires programmatic changes, whereas providing the queue via "max-concurrent-requests" would just be a config change that keeps old code behaving the same way.
You can set up a bulkhead for all requests with a filter:

// Helidon SE Fault Tolerance bulkhead shared by all requests
int rateLimit = Config.global().get("ratelimit").asInt().orElse(20);

Bulkhead bulkhead = Bulkhead.builder()
        .limit(rateLimit)               // concurrent invocations allowed
        .queueLength(rateLimit * 2)     // invocations allowed to wait in the queue
        .build();

routing
        .addFilter((chain, req, res) -> {
            try {
                bulkhead.invoke(() -> {
                    chain.proceed();    // continue processing inside the bulkhead
                    return null;
                });
            } catch (BulkheadException ex) {
                // limit and queue are both full: reject the request
                res.status(Status.SERVICE_UNAVAILABLE_503).send();
            }
        })
Yes, this is not the same as having the capability at the Helidon level; the behavior is not consistent with H3, and individual services have to make code changes to implement this.
Hello team, I am with @vasanth-bhat on this request, as we are also experiencing the same issue on H4. Our team has identified that implementing the Bulkhead API with a queue is necessary to effectively manage spikes in load. However, we've encountered a challenge: each team is required to implement and maintain the same logic independently. This approach is not only time-consuming but also potentially leads to inconsistencies across teams. To streamline our process and ensure uniformity, I propose implementing this solution at the Helidon level. This approach would:
I would greatly appreciate your thoughts on this proposal. Thank you for your attention to this matter. Best regards,
Hi Team,
Hierarchical rate limiting is intended to provide multiple levels of limiting. Instead of a single rate limit applied uniformly across all requests, hierarchical rate limiting applies different limits at different levels or scopes. In this context:

Application-level (whole-app) rate limiting:
Simplicity: easier to implement and manage, since you set one global limit for all requests.

Path-level rate limiting:
Granularity: allows different rate limits based on the sensitivity or resource intensity of different paths. For example, you might have a more lenient limit for read operations than for write operations.

Hybrid approach:
Sometimes a hybrid approach works well, where you have a global rate limit plus specific limits for particularly sensitive or high-traffic paths. This combines the simplicity of app-level control with the precision of path-level limits where needed (a sketch follows at the end of this comment).

Example scenario:
In this setup, even if /api/search is not hitting its own limit, the global limit could still throttle requests if the total across all paths exceeds 1000 requests per second.

Counters:
globalRequestsDenied: tracks the number of requests denied at the global level due to rate limiting.

Gauges:
globalQueueLengthPercentage: the current percentage of the global queue length in use, calculated as (current queue size / maximum queue length) * 100.

Queue utilization:
The queue-length-percentage gauges offer real-time information on how close the system is to its rate-limit capacity, both globally and for specific paths. This is crucial for understanding system load and for capacity planning or scaling decisions.

Properly configuring the queue size offers several key benefits:
- Mitigating Horizontal Pod Autoscaler (HPA) lag
- Serving as a key scale-out indicator

By carefully tuning the queue size, you can enhance your system's resilience, responsiveness, and overall performance in the face of varying workloads.

Conclusion: for small to medium-sized apps, or where simplicity is key, application-wide rate limiting might be preferable due to ease of management.
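To make the hybrid approach above concrete, a minimal sketch using two Bulkheads, assuming the same filter hook and Bulkhead API shown in the earlier comment; the limits, the /api/search path, and the req.path().path() accessor are illustrative assumptions.

// Global bulkhead for the whole app, plus a tighter one for a hot path.
Bulkhead globalBulkhead = Bulkhead.builder()
        .limit(1000)
        .queueLength(2000)
        .build();

Bulkhead searchBulkhead = Bulkhead.builder()
        .limit(100)
        .queueLength(200)
        .build();

routing.addFilter((chain, req, res) -> {
    // assumption: req.path().path() yields the request path as a String
    boolean isSearch = req.path().path().startsWith("/api/search");
    try {
        globalBulkhead.invoke(() -> {
            if (isSearch) {
                searchBulkhead.invoke(() -> {
                    chain.proceed();
                    return null;
                });
            } else {
                chain.proceed();
            }
            return null;
        });
    } catch (BulkheadException ex) {
        // denied either globally or for the specific path
        res.status(Status.SERVICE_UNAVAILABLE_503).send();
    }
});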
Thanks,
You can create a shared module that implements ServerFeature to register a filter automatically.
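For illustration, a minimal sketch of such a shared module, assuming Helidon 4 SE packages; the class name RateLimitSupport, the config keys, and the idea of calling it from each service (or wrapping the call in a ServerFeature so it is registered automatically) are assumptions, not an existing Helidon API.

import io.helidon.config.Config;
import io.helidon.faulttolerance.Bulkhead;
import io.helidon.faulttolerance.BulkheadException;
import io.helidon.http.Status;
import io.helidon.webserver.http.HttpRouting;

// Hypothetical shared helper: each service calls RateLimitSupport.configure(routing, config),
// or the shared module's ServerFeature implementation invokes it during server setup.
public final class RateLimitSupport {

    private RateLimitSupport() {
    }

    public static void configure(HttpRouting.Builder routing, Config config) {
        int limit = config.get("ratelimit.limit").asInt().orElse(20);
        int queue = config.get("ratelimit.queue-length").asInt().orElse(limit * 2);

        Bulkhead bulkhead = Bulkhead.builder()
                .limit(limit)
                .queueLength(queue)
                .build();

        routing.addFilter((chain, req, res) -> {
            try {
                bulkhead.invoke(() -> {
                    chain.proceed();
                    return null;
                });
            } catch (BulkheadException ex) {
                res.status(Status.SERVICE_UNAVAILABLE_503).send();
            }
        });
    }
}

With a shared class like this, adopting the limit in a service is a dependency plus at most one line of code, which addresses the consistency concern raised above.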
Hi @romain-grecourt - yes, that is what we have to do, but in the interest of everyone else who is using Helidon outside of our organization, don't you think it would be valuable to have it at the Helidon level?
The reasoning that this feature must be provided by Helidon out of the box because using Bulkhead requires programmatic changes, or because it cannot be shared among projects, is not correct. It is reasonable to expect Helidon to provide a more sophisticated feature for concurrency limits; that is currently addressed by #8897. This issue overlaps with #8897, and given the prescribed workarounds it isn't clear what it represents other than sharing one single class.
There is now a PR for Helidon. |
Environment Details
In Helidon 4.x, the WebServer supports Loom-based virtual threads and uses the new thread-per-request model. So by design there is no longer a server thread pool or any associated queue where requests get queued.

By default there is no limit on concurrency, and this can lead to issues when resources such as DB connections, external system integrations, and other such downstream resources are limited. This can lead to performance degradation and also to errors when requests time out waiting for such resources.

To address this, Helidon provides the "max-concurrent-requests" parameter in the Listener configuration. While it helps to limit concurrency, services are running into issues when trying to use this parameter to limit concurrency.

When the "max-concurrent-requests" parameter is set, any surge of requests beyond the limit gets rejected and fails with 503. There can be occasional surges that push concurrency beyond the configured limit, and in such cases the requests error out.

This behaviour is not consistent with the behavior in earlier versions of Helidon, where in this situation the requests would get queued in the queue associated with Helidon's server thread pool.

It would be good to have an additional configurable option in Helidon 4 where one can additionally enable queueing of requests when a limit is configured for "max-concurrent-requests".
Something like the following:
server:
  max-concurrent-requests: 40
  request-queue:
    enable: true
    max: 100