Add prometheus.relabel external cache proposal #1600

# Proposal: Prometheus relabel external cache

- Author(s): Wilfried ROSET (@wilfriedroset), Pierre BAILHACHE (@pbailhache)
- Last updated: 2024-09-02
- Original issue: <https://github.com/grafana/alloy/issues/1600>

## Abstract

This proposal introduces a way to configure the `prometheus.relabel` component so that it can use an external cache such as `Redis` or `Memcached` instead of the in-memory one.

## Problem

The `prometheus.relabel` component rewrites the label set of each metric passed along to the exported receiver by applying one or more relabeling rules. To do so, it uses a relabeling cache stored in memory. Alloy creates a dedicated relabel cache for each `prometheus.relabel` component. This is not a huge issue per se, since multiple rules can be registered in one component to share the same cache.

However, if you horizontally scale an Alloy deployment with load balancing, you end up with one cache per Alloy instance, and those local caches overlap, increasing the footprint of each instance. Moreover, the cache is tied to the pod, which means that a new Alloy process starts with an empty cache.

For a couple of horizontally scaled Alloy pods this is acceptable, but if you plan to have many instances processing data it is not sustainable.

## Proposal

Allow using `Redis` or `Memcached` as the cache instead of the in-memory one.

Use the [dskit](https://github.com/grafana/dskit/blob/main/cache/cache.go) cache code to manage connection and client configuration.

We could create an interface that abstracts the kind of cache used away from the component, so that the component logic itself does not change.

## Pros and cons

**Pros:**

- No logic change, so the impact on users is expected to be nil.
- Possibility to use an external cache if needed, even multiple caches for different relabeling components.
- Easy to reuse in other parts of the codebase.

**Cons:**

- The configuration is a bit more complex than the previous one.

## Alternative solutions

The alternative is to do nothing, if we deem this improvement unnecessary.

## Compatibility

This proposal offers to deprecate the old way of configuring the in-memory cache for a couple of releases, then drop it in release+2. Doing so gives users time to migrate their settings to the new block.

## Implementation

We should add a reusable cache interface compatible with multiple backends; it should be usable with different value types:

```golang
type Cache[valueType any] interface {
	Get(key string) (*valueType, error)
	Set(key string, value *valueType, ttl time.Duration) error
	Remove(key string)
	GetMultiple(keys []string) ([]*valueType, error)
	SetMultiple(values map[string]*valueType, ttl time.Duration) error
	Clear(newSize int) error
	GetCacheSize() int
}
```

Each backend cache should be implemented in a separate file. For now, we should support `inmemory`, `redis`, and `memcached`. A minimal sketch of an in-memory implementation is shown below.
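
For illustration, a minimal in-memory implementation of this interface could look like the following. This is a sketch, not the proposed implementation: the constructor name, the `ErrNotFound` sentinel, and the decision to ignore TTLs and eviction are assumptions made here for brevity.

```golang
package cache

import (
	"errors"
	"sync"
	"time"
)

// ErrNotFound is returned when a key is absent (the name is an assumption).
var ErrNotFound = errors.New("cache: key not found")

// inMemoryCache is a minimal, mutex-guarded map implementation of Cache.
// It ignores TTLs and never evicts; a production backend would likely wrap
// an LRU so that size limits and Clear(newSize) are meaningful.
type inMemoryCache[valueType any] struct {
	mtx     sync.RWMutex
	entries map[string]*valueType
}

func NewInMemoryCache[valueType any](size int) *inMemoryCache[valueType] {
	return &inMemoryCache[valueType]{entries: make(map[string]*valueType, size)}
}

func (c *inMemoryCache[valueType]) Get(key string) (*valueType, error) {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	if v, ok := c.entries[key]; ok {
		return v, nil
	}
	return nil, ErrNotFound
}

func (c *inMemoryCache[valueType]) Set(key string, value *valueType, _ time.Duration) error {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.entries[key] = value
	return nil
}

func (c *inMemoryCache[valueType]) Remove(key string) {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	delete(c.entries, key)
}

func (c *inMemoryCache[valueType]) GetMultiple(keys []string) ([]*valueType, error) {
	out := make([]*valueType, 0, len(keys))
	for _, key := range keys {
		v, err := c.Get(key)
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

func (c *inMemoryCache[valueType]) SetMultiple(values map[string]*valueType, ttl time.Duration) error {
	for key, value := range values {
		if err := c.Set(key, value, ttl); err != nil {
			return err
		}
	}
	return nil
}

func (c *inMemoryCache[valueType]) Clear(newSize int) error {
	c.mtx.Lock()
	defer c.mtx.Unlock()
	c.entries = make(map[string]*valueType, newSize)
	return nil
}

func (c *inMemoryCache[valueType]) GetCacheSize() int {
	c.mtx.RLock()
	defer c.mtx.RUnlock()
	return len(c.entries)
}
```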

We should add a way to select the cache backend and its connection options through the component's arguments.

For example, based on what's done in the [Mimir index cache](https://github.com/grafana/mimir/blob/main/pkg/storage/tsdb/index_cache.go#L47):

```golang
type Arguments struct {
	// Where the relabelled metrics should be forwarded to.
	ForwardTo []storage.Appendable `alloy:"forward_to,attr"`

	// The relabelling rules to apply to each metric before it's forwarded.
	MetricRelabelConfigs []*alloy_relabel.Config `alloy:"rule,block,optional"`

	// DEPRECATED: use backend = "inmemory" and the cache_size field instead.
	InMemoryCacheSizeDeprecated int `alloy:"max_cache_size,attr,optional"`

	// Cache backend configuration.
	CacheConfig cache.CacheConfig `alloy:"cache,block,optional"`
}

type CacheConfig struct {
	cache.BackendConfig `yaml:",inline"`
	InMemory            InMemoryCacheConfig `yaml:"inmemory"`
}

type InMemoryCacheConfig struct {
	CacheSize int `yaml:"cache_size"`
}

// BackendConfig, as provided by dskit's cache package:
type BackendConfig struct {
	Backend   string                `yaml:"backend"`
	Memcached MemcachedClientConfig `yaml:"memcached"`
	Redis     RedisClientConfig     `yaml:"redis"`
}
```
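
Under the same assumptions, the component could build the right cache from this configuration with a small factory. This is a sketch only: `NewMemcachedCache` and `NewRedisCache` are hypothetical constructors for the other backend files, not part of the proposal.

```golang
import "fmt"

// newCache builds a Cache for the configured backend. Sketch only: the
// backend constructor names are assumptions, not part of the proposal.
func newCache[valueType any](cfg CacheConfig) (Cache[valueType], error) {
	switch cfg.Backend {
	case "inmemory", "":
		// In-memory is the default when no backend is configured.
		return NewInMemoryCache[valueType](cfg.InMemory.CacheSize), nil
	case "memcached":
		return NewMemcachedCache[valueType](cfg.Memcached)
	case "redis":
		return NewRedisCache[valueType](cfg.Redis)
	default:
		return nil, fmt.Errorf("unsupported cache backend %q", cfg.Backend)
	}
}
```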

Configuration should default to `inmemory`.
`max_cache_size` should still be taken into account, but only if `backend = "inmemory"`. It should also be deprecated, and we should redirect users to the `InMemoryCacheConfig.CacheSize` field, as sketched below.
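
As a hedged sketch of that redirection, assuming the component validates its arguments through the usual `Validate` hook, the deprecated attribute could be folded into the new block like this (the exact wiring is an assumption):

```golang
// Validate folds the deprecated max_cache_size attribute into the new
// cache block; field names follow the Arguments struct above.
func (a *Arguments) Validate() error {
	if a.InMemoryCacheSizeDeprecated == 0 {
		return nil // deprecated attribute not set, nothing to migrate
	}
	if a.CacheConfig.Backend != "" && a.CacheConfig.Backend != "inmemory" {
		return fmt.Errorf("max_cache_size is deprecated and only valid with the %q backend", "inmemory")
	}
	if a.CacheConfig.InMemory.CacheSize == 0 {
		a.CacheConfig.Backend = "inmemory"
		a.CacheConfig.InMemory.CacheSize = a.InMemoryCacheSizeDeprecated
	}
	return nil
}
```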

Here are some examples:

- Legacy config, unchanged:

```river
prometheus.relabel "legacy_config" {
  forward_to = [...]
  max_cache_size = 10000000

  rule {
    ...
  }
}
```

- Redis config:

```river
prometheus.relabel "redis_config" {
  forward_to = [...]
  cache {
    backend = "redis"
    redis {
      endpoint = "redis.url"
      username = "user"
      password = "password"
      ...
    }
  }
  ...
}
```

- New in-memory config:

```river
prometheus.relabel "inmemory_config" {
  forward_to = [...]
  cache {
    backend = "inmemory"
    inmemory {
      cache_size = 10000000
    }
  }
  ...
}
```

- Memcached config:

```river
prometheus.relabel "memcached_config" {
  forward_to = [...]
  cache {
    backend = "memcached"
    memcached {
      addresses = "address1, address2"
      timeout = 10
      ...
    }
  }
  ...
}
```

## Related open issues

N/A

---

Review comments on the proposed cache design:

> Instead of a bespoke redis/memcached action, how about a redis and a memcached component that export a key-value interface, along with generic accessors? I could see this being used for more than just relabel, much in the same way we allow components to consume strings but those strings can come from vault, file, http, etc.

> Could even get clever and allow prometheus.relabel to have a list of caches: if there are problems with redis, then fall back to in-memory.

> I definitely like the idea of having a unified way to define the caching mechanism. You are correct, but this adds extra complexity to take into account. A highly available external cache is indeed recommended, but having a fallback is definitely a good idea. The extra complexity comes in when the external cache becomes unavailable and Alloy needs to use the in-memory one. This could lead to OOM, e.g. you planned to use an external cache, then you fall back on in-memory, but the limits of your pod are not high enough. That being said, I'm not that familiar with the code base yet, and your proposal seems harder to achieve. What do you think about extending our proposal to:
>
> ```golang
> type RelabelCache interface {
>     Get(key uint64) (*labelAndID, error)
>     Set(key uint64, value *labelAndID) error
>     Remove(key uint64)
>     Clear(newSize int) error
>     GetCacheSize() int
> }
> ```
>
> Once done, the associated work to create the [...]

> Fallback caches are likely a stretch, though I do feel the kv store should be a generic component. As far as the interface, I would go with something like the below. Disclaimer: this is getting into the deeper technical ideas. The reason keys might want to be strings is to mirror how Alloy uses maps, where the key is always a string.

> You shouldn't have to specifically use that for the first round. Capsules are mainly if you want to bridge into it being accessible within the Alloy config. In this case you're likely passing an interface/pointer from Go code to Go code, which should work without problem.

> I've updated the proposal to reflect what we have been discussing.

> I also think we will likely need [...]

> @mattdurham I've tried to take your feedback into account, the PR should be fine. What do you think about it? P.S.: I don't know what to do about the build 😅

> Just opened a draft pull request for the feature. This is still a work in progress and I'm ready to iterate over it.

Review comments on the Compatibility section:

> I think our accepted backwards-compatibility guarantees go against this. Any breaking changes for stable functionality would have to wait until v2.0. Is there any reason for wanting to deprecate it? We can't control the pace of users adopting new versions, so IMO doing our best to avoid breaking their configurations and use cases would be the way to go.

> We don't want to deprecate anything sooner than what is guaranteed. I will rephrase to say that we keep compatibility with both until the next major release.