Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-threaded Limitador #37

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
241 changes: 241 additions & 0 deletions rfcs/0000-limitador-multithread-inmemory.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,241 @@
# RFC Template

- Feature Name: `limitador_multithread_inmemory`
- Start Date: 2023-11-02
- RFC PR: [Kuadrant/architecture#0000](https://github.com/Kuadrant/architecture/pull/0000)

# Summary
[summary]: #summary

Enable Limitador service to process requests in parallel when configured to use In Memory storage.

# Motivation
[motivation]: #motivation

Currently, Limitador service is single threaded, regardless of its chosen storage. This means that it can only process one request at a time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually not true, is it? The gRPC service (tonic) is built on top of Tokio, an event-driven, non-blocking I/O platform. Or when the storage is memory, there is no I/O involved, hence only one request at a time can be processed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See this issues, other than something having changed in newer version, this was still true like 10 months ago.

Alternatively, we could use a multithreading approach to process requests in parallel, which would improve the overall performance of Limitador.
However, this would introduce some particular behaviour regarding the _Accuracy_ of the defined limit counters and the _Throughput_ of the service.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

In order to achieve the desired multi-threading functionality, we'd need to make sure that the user understands the
trade-offs that this approach implies and that they are aware of the possible consequences of using it, and how to
configure the service in order to obtain the desired behaviour. Regarding this last statement, we will need to
introduce an interface that allows the user to meet their requirements.

A couple of concepts we need to define in order to understand the proposal:
* **Accuracy**: The _accuracy_ is defined by the service's ability to enforce the limits in an exact factual way. This means
that if the service is configured to allow 10 requests per second, it will only allow 10 requests per second, and
not 11 or 9. In terms of observability, the counters, will be monotonically strictly decreasing without repeating any
value given per service.
* **Throughput**: The _throughput_ of a service is defined by the number of requests that the service can process in a
given time. As a consequence of having higher throughput, we need to introduce two more concepts:
* **Overshoot**: The _overshooting_ of a limit counter is defined by the difference between the _expected_ value of the counter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overshooting is when the counter goes beyond the limit and the request is still admitted, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I agree with your definition, but looks like @didierofrivia looks at it the other way around.

and the value that the counter has in the storage, when the stored value is **greater** than the expected one.
* **Undershoot**: The _undershooting_ of a limit counter is defined the same way the **Overshoot** is, but the stored value
is **lower** than the expected one.

## Behaviour

When using the multi-threading approach, we need to understand the trade-offs that we are making. The main one is that
it's not possible to have both _Accuracy_ and _Throughput_ at the same time. This means that we need to choose one of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to explain why both are not achievable.

them and configure the service accordingly, favouring accuracy comes close to the current single thread implementation,
or concede to _overshoot_ or _undershoot_ in order to have higher throughput. However, we can still have decent values for both of them, if we choose to
introduce a more complex implementation (thread pools, PID control, etc.).

[IMG Accuracy vs Throughput]

## Configuration

In order to configure Limitador, we need to introduce a clear interface that allows the user to choose the desired
behaviour. This interface should be able to seamlessly integrate with the current configuration options, and at least
at the first implementation, it should be able to be configured at initialization time. The fact that we are using
multi-threading or a single thread, it's not something that the user should be aware of, so we need to abstract that
away from them, in the end, they would only care about the "precision" of the service.

### Example

In this example, we are configuring the service to use the _InMemory_ storage and to use the _Throughput_ mode.
```bash
limitador-server --mode=throughput memory
```

In this other example, we are configuring the service to use the _InMemory_ storage and to use the _Accuracy_ mode.
```bash
limitador-server --mode=accuracy memory
```

## Implications

### Accuracy

When using the _Accuracy_ mode, the service will behave in a similar way to the current implementation, most likely
in a single thread. This means that the service will be able to process one request at a time, and the _accuracy_ of
the limit counters will always be as expected. This mode is the one that one should use when it's important to enforce
the limits as accurately as possible, and the throughput is not a concern.

### Throughput

When using the _Throughput_ mode, the service will fully behave in a multi-threaded way, which means that it will be able
to process multiple requests at the same time. However, this will introduce some particular behaviour regarding the
_accuracy_ of the limit counters, which will be affected by the _overshoot_ and _undershoot_ concepts.

#### Overshoot

Given the following (simplified) limit definitions:

Limit 1:
```yaml
namespace: example.org
max_value: 4
seconds: 60
```
This could be translated to _any_ request to the namespace `example.org` can be authorized up to 4 times in a span
of 1 minute.

Limit 2:
```yaml
namespace: example.org
max_value: 2
seconds: 1
conditions:
- "req.method == 'POST'"
```
While this one would be translated to _POST_ requests to the namespace `example.org` can be authorized up to 2 times
in a span of 1 second.

Now imagine that we have the following requests happening at the same time in parallel: `POST`, `POST`, `POST`, `GET`,
`GET`; it could happen that the service authorizes the first two `POST` requests and updates both counters to 2, then the third
`POST` request is not authorized *but* the counter of limit 1 is updated to 3 (wrongly), and finally then only one `GET` request
that comes in would be authorized, leaving one wrongly denied. In this case, we would have an _overshoot_ of 1 in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say this is a undershooting example, i.e. traffic is being rejected before reaching the limit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may need to review my understanding about these concepts.

limit 1 counter, limiting the request to the service for the next 60 seconds.

#### Undershoot

This behaviour is the opposite of the _overshoot_ one, and it happens when the service authorizes a request that should
have been denied. This scenario is less likely to happen, but it's still possible. In this case, the _undershoot_ would
be the difference between the expected value of the counter and the value that the counter has in the storage, when the
stored value is **lower** than the expected one. Usually, this would happen when trying to revert a wrongly updated counter
twice in a row, for example, when trying to revert the _overshoot_ from the previous example.


# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Enhancing the Limitador service to process requests in parallel and being capable of operating in the previously described
modes, entails considering various implications.

Firstly, ensuring consistency in storing counter values across multiple threads is paramount to maintaining the integrity
of rate limiting operations. This involves implementing robust concurrency control mechanisms to prevent race conditions
and data corruption when accessing and updating shared counter data.

Additionally, choosing the data structure that will store the counter values must prioritize efficiency and thread safety
to handle concurrent access effectively.

Finally, considering that initially we will implement the setup of the mode at the service initialization time, balancing
the need for strict adherence to defined limits with the desire to maximize throughput presents a trade-off that we could
give the user the ability to manage.


## Implementation guidelines

### Data structures

Limitador already possess a data structure to store the limit counters that also stores the expiration time of the counter,
the type is named `AtomicExpiringCounter`. This data structure is a thread-safe counter that can be incremented and
decremented atomically, and it also has a methods to check the value at a certain time and update the counter.

The collection of these counters should be consistent across threads and also being able to quickly sort and retrieve the
counters associated to a certain namespace and limits definitions. Currently, the data structure can't provide this
much desired functionality, so we need to introduce a new data structure that can provide this.

### Request handling and concurrency control

When a request comes in, Limitador needs to determine the appropriate Counter object(s) based on the request's namespace
and limit definitions. Taking into account the collection will be in ascending order by expiration time, iterating over the
counter objects associated with those premises and performing the following steps:

1. Retrieve the Counter object from the collection.
2. Compare the current timestamp with the expiration time of the counter to determine if the request falls within the
sliding window
3. If the request falls within the window, increment the hit count of the Counter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the increment happen only when it has been verified that no counters has exceeded the associated limit?

4. If the hit count exceeds the limit defined for the Counter, deny the request.
5. If the request passes all limit checks, allow it to proceed.

To ensure that these operations are performed atomically and consistently across threads, we need to use fine-grained
locking or atomic operations when accessing and updating the counter data in the collection. Using synchronization
primitives such as mutexes or atomic types will help to protect the integrity of the counter data from concurrent modifications.

### Configuration and mode selection

The configuration of the service will be done at the initialization time, and it will be done through the command line as
described in the [Guide-level explanation](#guide-level-explanation) section.

#### Accuracy mode

* In Accuracy mode, we need to strictly adhere to the defined limits by denying requests that exceed the limits for each
Counter Object.
* Check and update Counter values atomically to maintain consistency and prevent race conditions.
* Enforce rate limits accurately by comparing the number of hits within the sliding window to the defined limit.

#### Throughput mode

* In Throughput mode, we need to prioritize maximizing the number of requests that can be processed concurrently over
strict adherence to defined limits.
* Allowing request to proceed even if they temporarily exceed the defined limits, favouring higher request rates.
* We should use appropriate concurrency mechanisms and CAS operations to handle requests efficiently while maintaining
consistency in the counter data. We might not need to use locks, but we need to be careful with the CAS operations nonetheless.


# Drawbacks
[drawbacks]: #drawbacks

* The implementation of a multi-threaded Limitador service will introduce additional complexity to the codebase
* The need to manage concurrency and consistency across threads will require careful design and testing to ensure
that the service operates correctly and efficiently.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

- **Single-threaded**: We could choose to keep the current single-threaded implementation of Limitador, which would
maintain the simplicity and predictability of the service, but would limit its ability to handle concurrent requests
and process requests at a higher throughput.

# Prior art
[prior-art]: #prior-art

- [Limitador Issue #69](https://github.com/Kuadrant/limitador/issues/69) - This issue discusses "Sharable counters
across multiple threads" and provides some context and background on the need for multi-threading support in Limitador.
- **Redis**: Redis is a popular in-memory data store that is often used to implement rate limiting and throttling
functionality. It provides atomic operations and data structures such as counters and sorted sets that can be used
to implement rate limiting with high throughput and accuracy.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- How to balance the need for strict adherence to defined limits with the desire to maximize throughput in the
multi-threaded implementation of Limitador?
- What are the best concurrency control mechanisms and data structures to use for storing and updating counter data
across multiple threads?
- What would be the algorithm to use to balance the _Accuracy_ and _Throughput_ modes?

# Future possibilities
[future-possibilities]: #future-possibilities

- **Dynamic mode selection**: We could introduce a mechanism to dynamically switch between _Accuracy_ and _Throughput_
modes based on the current load and performance characteristics of the service.
- **Fine-grained configuration**: We could provide more granular configuration options to allow users to fine-tune the
behaviour of the multi-threaded Limitador service based on their specific requirements. As shown below, we might be able
to provide a richer interface that allows the user to configure the service in a balanced and/or more granular way.
```bash
limitador-server --mode=balanced --accuracy=0.1 --throughput=0.9 memory
```

or simply
```bash
limitador-server --accuracy=0.1 --throughput=0.9 memory
```

- **Performance optimizations**: We could explore various performance optimizations such as caching, pre-fetching, and
load balancing to further improve the throughput and efficiency of the multi-threaded Limitador service.