diff --git a/rfcs/0000-limitador-multithread-inmemory.md b/rfcs/0000-limitador-multithread-inmemory.md
new file mode 100644
index 00000000..a30c1b5a
--- /dev/null
+++ b/rfcs/0000-limitador-multithread-inmemory.md
@@ -0,0 +1,241 @@
+# RFC Template
+
+- Feature Name: `limitador_multithread_inmemory`
+- Start Date: 2023-11-02
+- RFC PR: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/pull/0000)
+
+# Summary
+[summary]: #summary
+
+Enable the Limitador service to process requests in parallel when configured to use the in-memory storage.
+
+# Motivation
+[motivation]: #motivation
+
+Currently, the Limitador service is single-threaded regardless of the chosen storage, which means it can only process
+one request at a time. A multi-threaded approach would let it process requests in parallel and improve the overall
+performance of Limitador. However, it would also introduce some particular behaviour regarding the _Accuracy_ of the
+defined limit counters and the _Throughput_ of the service.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+To achieve the desired multi-threading functionality, we need to make sure that users understand the trade-offs this
+approach implies, are aware of its possible consequences, and know how to configure the service to obtain the desired
+behaviour. To that end, we will need to introduce an interface that allows users to express their requirements.
+
+A few concepts need to be defined in order to understand the proposal:
+* **Accuracy**: The _accuracy_ of the service is its ability to enforce the limits exactly. If the service is
+  configured to allow 10 requests per second, it will allow exactly 10 requests per second, not 11 or 9. In terms of
+  observability, the counters will decrease strictly and monotonically, without repeating any value.
+* **Throughput**: The _throughput_ of the service is the number of requests it can process in a given amount of time.
+  Favouring higher throughput requires introducing two more concepts:
+  * **Overshoot**: The _overshoot_ of a limit counter is the difference between the _expected_ value of the counter
+    and the value held in storage, when the stored value is **greater** than the expected one.
+  * **Undershoot**: The _undershoot_ of a limit counter is defined the same way as the _overshoot_, but with the
+    stored value **lower** than the expected one.
+
+## Behaviour
+
+When using the multi-threading approach, we need to understand the trade-offs we are making. The main one is that it
+is not possible to maximise both _Accuracy_ and _Throughput_ at the same time: we need to choose one of them and
+configure the service accordingly. Favouring accuracy brings the behaviour close to the current single-threaded
+implementation, while conceding some _overshoot_ or _undershoot_ allows higher throughput. We can still obtain decent
+values for both if we choose to introduce a more complex implementation (thread pools, PID control, etc.).
+
+[IMG Accuracy vs Throughput]
+
+## Configuration
+
+In order to configure Limitador, we need to introduce a clear interface that allows the user to choose the desired
+behaviour.
+This interface should integrate seamlessly with the current configuration options and, at least in the first
+implementation, it should be set at initialization time. Whether the service uses multiple threads or a single thread
+is not something users should have to be aware of, so we abstract that away from them; in the end, they only care
+about the "precision" of the service.
+
+### Example
+
+In this example, we configure the service to use the _InMemory_ storage in _Throughput_ mode.
+```bash
+limitador-server --mode=throughput memory
+```
+
+In this other example, we configure the service to use the _InMemory_ storage in _Accuracy_ mode.
+```bash
+limitador-server --mode=accuracy memory
+```
+
+## Implications
+
+### Accuracy
+
+When using the _Accuracy_ mode, the service will behave in a similar way to the current implementation, most likely
+in a single thread. This means that the service will process one request at a time, and the _accuracy_ of the limit
+counters will always be as expected. This is the mode to use when enforcing the limits as accurately as possible is
+important and throughput is not a concern.
+
+### Throughput
+
+When using the _Throughput_ mode, the service will behave in a fully multi-threaded way, which means that it will be
+able to process multiple requests at the same time. However, this introduces some particular behaviour regarding the
+_accuracy_ of the limit counters, which will be affected by _overshoot_ and _undershoot_.
+
+#### Overshoot
+
+Given the following (simplified) limit definitions:
+
+Limit 1:
+```yaml
+namespace: example.org
+max_value: 4
+seconds: 60
+```
+This translates to: _any_ request to the namespace `example.org` can be authorized up to 4 times in a span of
+1 minute.
+
+Limit 2:
+```yaml
+namespace: example.org
+max_value: 2
+seconds: 1
+conditions:
+  - "req.method == 'POST'"
+```
+While this one translates to: _POST_ requests to the namespace `example.org` can be authorized up to 2 times in a
+span of 1 second.
+
+Now imagine the following requests arriving at the same time, in parallel: `POST`, `POST`, `POST`, `GET`, `GET`. It
+could happen that the service authorizes the first two `POST` requests and updates both counters to 2, the third
+`POST` request is denied *but* the counter of limit 1 is (wrongly) updated to 3, and then only one of the incoming
+`GET` requests is authorized, leaving the other wrongly denied. In this case we would have an _overshoot_ of 1 on the
+limit 1 counter, limiting requests to the service for the next 60 seconds.
+
+#### Undershoot
+
+This behaviour is the opposite of _overshoot_, and it happens when the service authorizes a request that should have
+been denied. This scenario is less likely to happen, but it is still possible. In this case, the _undershoot_ is the
+difference between the expected value of the counter and the value held in storage, when the stored value is
+**lower** than the expected one. It would usually happen when a wrongly updated counter is reverted twice in a row,
+for example when trying to compensate for the _overshoot_ from the previous example.
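+
+To make the race behind this _overshoot_ concrete, here is a minimal, hypothetical sketch (plain Rust, not
+Limitador's actual code) of a single counter with `max_value: 4` that several threads update optimistically: each
+request increments the counter *before* the limit check, so the stored value ends up above the number of requests
+that were actually authorized. The `MAX_VALUE` constant and the whole flow are illustrative assumptions only.
+
+```rust
+use std::sync::Arc;
+use std::sync::atomic::{AtomicU64, Ordering};
+use std::thread;
+
+const MAX_VALUE: u64 = 4; // e.g. limit 1 above: 4 hits per 60 seconds
+
+fn main() {
+    // Shared counter for a single limit on the `example.org` namespace.
+    let counter = Arc::new(AtomicU64::new(0));
+
+    // Six requests arrive in parallel; only four should be authorized.
+    let handles: Vec<_> = (0..6)
+        .map(|_| {
+            let counter = Arc::clone(&counter);
+            // Optimistic update: increment first, check the limit afterwards.
+            thread::spawn(move || counter.fetch_add(1, Ordering::SeqCst) + 1 <= MAX_VALUE)
+        })
+        .collect();
+
+    let authorized = handles
+        .into_iter()
+        .map(|h| h.join().unwrap())
+        .filter(|&ok| ok)
+        .count();
+
+    println!("authorized requests: {authorized}"); // 4
+    println!("stored counter: {}", counter.load(Ordering::SeqCst)); // 6 -> overshoot of 2
+}
+```
+
+In _Accuracy_ mode the increment and the limit check would have to happen as one atomic step (or under a lock), so a
+denied request never leaves a trace in the counter; _Throughput_ mode accepts this kind of discrepancy in exchange
+for avoiding that coordination.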
+
+
+# Reference-level explanation
+[reference-level-explanation]: #reference-level-explanation
+
+Enhancing the Limitador service to process requests in parallel, while being capable of operating in the previously
+described modes, entails considering various implications.
+
+Firstly, ensuring consistency in storing counter values across multiple threads is paramount to maintaining the
+integrity of rate limiting operations. This involves implementing robust concurrency control mechanisms to prevent
+race conditions and data corruption when accessing and updating shared counter data.
+
+Additionally, the data structure chosen to store the counter values must prioritize efficiency and thread safety in
+order to handle concurrent access effectively.
+
+Finally, considering that initially the mode will be set at service initialization time, balancing the need for
+strict adherence to the defined limits with the desire to maximize throughput presents a trade-off that we could give
+the user the ability to manage.
+
+
+## Implementation guidelines
+
+### Data structures
+
+Limitador already possesses a data structure to store the limit counters that also tracks their expiration time; the
+type is named `AtomicExpiringCounter`. This data structure is a thread-safe counter that can be incremented and
+decremented atomically, and it provides methods to check the value at a certain point in time and to update the
+counter.
+
+The collection of these counters needs to be consistent across threads and able to quickly sort and retrieve the
+counters associated with a given namespace and limit definitions. The current data structure cannot provide this
+functionality, so we need to introduce a new one that can.
+
+### Request handling and concurrency control
+
+When a request comes in, Limitador needs to determine the appropriate Counter object(s) based on the request's
+namespace and limit definitions. Taking into account that the collection is kept in ascending order by expiration
+time, Limitador iterates over the counter objects matching those criteria and performs the following steps:
+
+1. Retrieve the Counter object from the collection.
+2. Compare the current timestamp with the expiration time of the counter to determine if the request falls within the
+sliding window.
+3. If the request falls within the window, increment the hit count of the Counter.
+4. If the hit count exceeds the limit defined for the Counter, deny the request.
+5. If the request passes all limit checks, allow it to proceed.
+
+To ensure that these operations are performed atomically and consistently across threads, we need to use fine-grained
+locking or atomic operations when accessing and updating the counter data in the collection. Using synchronization
+primitives such as mutexes or atomic types will help protect the integrity of the counter data from concurrent
+modifications, as illustrated by the sketch below.
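+
+As an illustration of that check-and-update step, the following hypothetical helper (a sketch, not a proposed API;
+the name `try_record_hit` is made up for this example) uses a compare-and-swap loop so that a hit is recorded only
+while the counter is below its limit, which is the behaviour Accuracy mode needs. A Throughput-mode variant could
+instead perform a single `fetch_add` and tolerate the temporary overshoot described in the guide-level explanation.
+
+```rust
+use std::sync::atomic::{AtomicU64, Ordering};
+
+/// Strict check-and-increment over a single counter value (steps 3 and 4 above):
+/// the hit is recorded only if it does not take the counter past `max_value`,
+/// so denied requests never leave a trace in the stored value.
+fn try_record_hit(counter: &AtomicU64, max_value: u64) -> bool {
+    let mut current = counter.load(Ordering::Relaxed);
+    loop {
+        if current >= max_value {
+            return false; // over the limit: deny without touching the counter
+        }
+        match counter.compare_exchange_weak(
+            current,
+            current + 1,
+            Ordering::SeqCst,
+            Ordering::Relaxed,
+        ) {
+            Ok(_) => return true,                // we won the race, hit recorded
+            Err(observed) => current = observed, // another thread updated first, retry
+        }
+    }
+}
+```
+
+Where the expiration check (step 2) happens relative to this loop, and whether the loop lives in the
+`AtomicExpiringCounter` type itself or in the surrounding collection, is left as an implementation detail.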
+
+### Configuration and mode selection
+
+The service will be configured at initialization time, through the command line, as described in the
+[Guide-level explanation](#guide-level-explanation) section.
+
+#### Accuracy mode
+
+* In Accuracy mode, we need to strictly adhere to the defined limits by denying any request that would exceed the
+limit of a Counter object.
+* Check and update Counter values atomically to maintain consistency and prevent race conditions.
+* Enforce rate limits accurately by comparing the number of hits within the sliding window to the defined limit.
+
+#### Throughput mode
+
+* In Throughput mode, we prioritize maximizing the number of requests that can be processed concurrently over strict
+adherence to the defined limits.
+* Requests are allowed to proceed even if they temporarily exceed the defined limits, favouring higher request rates.
+* We should use appropriate concurrency mechanisms and CAS operations to handle requests efficiently while keeping
+the counter data consistent. We might not need locks, but we still need to be careful with the CAS operations.
+
+
+# Drawbacks
+[drawbacks]: #drawbacks
+
+* The implementation of a multi-threaded Limitador service will introduce additional complexity to the codebase.
+* The need to manage concurrency and consistency across threads will require careful design and testing to ensure
+that the service operates correctly and efficiently.
+
+# Rationale and alternatives
+[rationale-and-alternatives]: #rationale-and-alternatives
+
+- **Single-threaded**: We could choose to keep the current single-threaded implementation of Limitador, which would
+  maintain the simplicity and predictability of the service, but would limit its ability to handle concurrent
+  requests and process them at a higher throughput.
+
+# Prior art
+[prior-art]: #prior-art
+
+- [Limitador Issue #69](https://github.com/Kuadrant/limitador/issues/69) - This issue discusses "Sharable counters
+  across multiple threads" and provides some context and background on the need for multi-threading support in
+  Limitador.
+- **Redis**: Redis is a popular in-memory data store that is often used to implement rate limiting and throttling
+  functionality. It provides atomic operations and data structures such as counters and sorted sets that can be used
+  to implement rate limiting with high throughput and accuracy.
+
+# Unresolved questions
+[unresolved-questions]: #unresolved-questions
+
+- How do we balance the need for strict adherence to the defined limits with the desire to maximize throughput in the
+  multi-threaded implementation of Limitador?
+- What are the best concurrency control mechanisms and data structures to use for storing and updating counter data
+  across multiple threads?
+- What algorithm should be used to balance the _Accuracy_ and _Throughput_ modes?
+
+# Future possibilities
+[future-possibilities]: #future-possibilities
+
+- **Dynamic mode selection**: We could introduce a mechanism to dynamically switch between _Accuracy_ and
+  _Throughput_ modes based on the current load and performance characteristics of the service.
+- **Fine-grained configuration**: We could provide more granular configuration options to allow users to fine-tune
+  the behaviour of the multi-threaded Limitador service based on their specific requirements. As shown below, we
+  might be able to provide a richer interface that allows the user to configure the service in a balanced and/or
+  more granular way.
+  ```bash
+  limitador-server --mode=balanced --accuracy=0.1 --throughput=0.9 memory
+  ```
+
+  or simply
+  ```bash
+  limitador-server --accuracy=0.1 --throughput=0.9 memory
+  ```
+
+- **Performance optimizations**: We could explore various performance optimizations such as caching, pre-fetching,
+  and load balancing to further improve the throughput and efficiency of the multi-threaded Limitador service.
\ No newline at end of file