feat: Add support for multiple kinds of cache for relabeling components #1692
base: main
Conversation
Some comments to open a discussion on improving this code.
@@ -80,7 +80,6 @@ func (s *server) Run(ctx context.Context) error {
	})

	mw := middleware.Instrument{
		RouteMatcher: r,
Not sure about the impact of this change (this is related to the dskit update: grafana/dskit@27d7d41).
	return fmt.Errorf("cache_size must be greater than 0 and is %d", arg.CacheConfig.InMemory.CacheSize)
}
case cache.Memcached:
	// todo
Need to determine what to include here; maybe move this to the service/cache package?
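For instance, a rough sketch of what the Memcached case could validate, which could live in either place. Addresses and Timeout are assumed field names here, not the actual schema:

```go
package relabel

import (
	"fmt"
	"time"
)

// MemcachedConfig is a stand-in for the PR's actual struct; the field
// names here are assumptions.
type MemcachedConfig struct {
	Addresses []string
	Timeout   time.Duration
}

// validateMemcached mirrors the in-memory cache_size check above.
func validateMemcached(cfg MemcachedConfig) error {
	if len(cfg.Addresses) == 0 {
		return fmt.Errorf("at least one memcached address must be provided")
	}
	if cfg.Timeout <= 0 {
		return fmt.Errorf("memcached timeout must be greater than 0 and is %s", cfg.Timeout)
	}
	return nil
}
```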
@@ -230,7 +264,13 @@ func (c *Component) Update(args component.Arguments) error {
	defer c.mut.Unlock()

	newArgs := args.(Arguments)
	c.clearCache(newArgs.CacheSize)

	// todo: maybe recreate the whole relabelCache here in case the redis/memcached client config changes
As there is no clearCache for redis or memcached, I don't really know what to do here with those kinds of caches.
If it's not supported by redis/memcached, can we simply return an error on a call to clearCache for redis/memcached? Or would it break something?
I think this would be a no-op; generally this is used to reset the cache size. In the case of redis/memcached I imagine this would not even be called, i.e. the cache size should not exist within Alloy.
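Roughly, the no-op could look like this; all names are illustrative, not the PR's actual identifiers:

```go
package relabel

// backend identifies the configured cache implementation.
type backend int

const (
	backendInMemory backend = iota
	backendRedis
	backendMemcached
)

// resizable matches golang-lru's Resize signature.
type resizable interface {
	Resize(size int) (evicted int)
}

// clearCache resets the local LRU size; remote backends have no local
// state to clear, so they are a deliberate no-op rather than an error.
func clearCache(b backend, lru resizable, newSize int) {
	if b == backendInMemory && lru != nil {
		lru.Resize(newSize)
	}
}
```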
// Ideally we should be using the dskit/cache conf directly, but it means it should not
// be in the alloy configuration?

type RedisConf struct {
As explained in the comment, I'm open to suggestions here on how to handle the config part.
For now each cache is configured at the relabel component level, but this includes copying the struct to add the alloy
tags, as we cannot embed the dskit/cache configs into the Grafana Alloy config.
If we decide to move this config outside of the alloy config, we could directly use the dskit/cache config.
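To illustrate the copying, a sketch of the mirrored struct and the conversion. The Alloy tag names and the dskit type/field set should be double-checked against the vendored dskit version; only fields visible in this diff are mapped:

```go
package relabel

import (
	"github.com/grafana/dskit/cache"
	"github.com/grafana/dskit/flagext"
)

// RedisConf mirrors the dskit redis config with Alloy tags, since the
// dskit struct cannot be embedded directly.
type RedisConf struct {
	Endpoint            string `alloy:"endpoint,attr"`
	DB                  int    `alloy:"db,attr,optional"`
	Password            string `alloy:"password,attr,optional"`
	MaxAsyncConcurrency int    `alloy:"max_async_concurrency,attr,optional"`
	MaxAsyncBufferSize  int    `alloy:"max_async_buffer_size,attr,optional"`
}

// toDskit copies the Alloy-level config into the dskit one.
func (c RedisConf) toDskit() cache.RedisClientConfig {
	return cache.RedisClientConfig{
		Password:            flagext.SecretWithValue(c.Password),
		DB:                  c.DB,
		MaxAsyncConcurrency: c.MaxAsyncConcurrency,
		MaxAsyncBufferSize:  c.MaxAsyncBufferSize,
		// Endpoint, timeouts, pool sizes, etc. would be mapped the same way.
	}
}
```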
This file is a re-implementation of the LRU cache that was present in the Prometheus relabel component.
}

func newMemcachedCache[valueType any](cfg MemcachedConfig) (*MemcachedCache[valueType], error) {
	client, err := cache.NewMemcachedClientWithConfig(
Some things to add here (same in the other implementation)
func (c *MemcachedCache[valueType]) Remove(key string) {
	ctx := context.Background()
	// TODO: manage error
	_ = c.client.Delete(ctx, key)
We ignore the error for now, but this isn't ideal.
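A possible middle ground until proper handling is decided: log the failure instead of discarding it. This assumes a go-kit logger field is added to MemcachedCache:

```go
// Remove logs the deletion failure instead of silently dropping it.
// Requires "context" and "github.com/go-kit/log/level" imports plus a
// logger field on the struct (an assumption, not the PR's code).
func (c *MemcachedCache[valueType]) Remove(key string) {
	ctx := context.Background()
	if err := c.client.Delete(ctx, key); err != nil {
		level.Warn(c.logger).Log("msg", "failed to remove cache entry", "key", key, "err", err)
	}
}
```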
}

func newRedisCache[valueType any](cfg RedisConf) (*RedisCache[valueType], error) {
	client, err := cache.NewRedisClient(
Some things to add here (same in the other implementation)
func (c *RedisCache[valueType]) Remove(key string) {
	ctx := context.Background()
	// TODO: manage error
Ignoring the error is never ideal.
	"sync"
	"time"

	lru "github.com/hashicorp/golang-lru/v2"
Isn't there also an LRU cache interface in dskit/cache that we should use instead?
)

type InMemoryCache[valueType any] struct {
	lru *lru.Cache[string, *valueType]
Should we worry about the cache strategy, or is it out of scope for this PR?
The previous cache was LRU, so I kept it this way; I think it's out of scope.
LRU should be kept as is; if we want to change that behavior, it should be a separate PR.
found := false
values[key], found = c.lru.Get(key)
if !found {
	return nil, errNotFound
This behavior is different from memcached's GetMulti: https://github.com/grafana/dskit/blob/931a021fb06b39732425870848e12b5a61333cb9/cache/memcached_client.go#L374
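For comparison, a sketch of GetMulti-style semantics here, where misses are simply absent from the result instead of aborting with errNotFound (shape assumed from the dskit behavior linked above):

```go
// GetMulti returns only the hits; missing keys are skipped, matching
// dskit's memcached GetMulti contract.
func (c *InMemoryCache[valueType]) GetMulti(keys []string) map[string]*valueType {
	values := make(map[string]*valueType, len(keys))
	for _, key := range keys {
		if v, found := c.lru.Get(key); found {
			values[key] = v
		}
	}
	return values
}
```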
if err := encoder.Encode(*value); err != nil {
	return err
}
c.client.SetAsync(key, indexBuffer.Bytes(), ttl)
I find it weird to have no way of being notified of an error here (and in SetMulti), but it seems to be how asyncQueue works, so not much to be done here...
DB int `alloy:"db,attr"`

// MaxAsyncConcurrency specifies the maximum number of SetAsync goroutines.
MaxAsyncConcurrency int `yaml:"max_async_concurrency" category:"advanced"`
Are there some decent default values that we can choose for this and all the *BufferSize fields?
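If we do pick defaults, something like this could work; the numbers are guesses in the ballpark used by other dskit consumers, not values from dskit or this PR:

```go
// SetToDefault implements Alloy's defaulting hook; values are
// illustrative and should be benchmarked before being committed to.
func (c *RedisConf) SetToDefault() {
	*c = RedisConf{
		MaxAsyncConcurrency: 50,    // goroutines draining the SetAsync queue
		MaxAsyncBufferSize:  25000, // queued async writes before ops are dropped
	}
}
```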
Password:            flagext.SecretWithValue(""),
MaxAsyncConcurrency: cfg.MaxAsyncConcurrency,
MaxAsyncBufferSize:  cfg.MaxAsyncBufferSize,
DB:                  0,
Why not set DB to cfg.DB here? The default value would still be 0, and it would be possible to configure it.
CacheConfig: cache.CacheConfig{
	Backend: cache.InMemory,
	InMemory: cache.InMemoryCacheConfig{
		CacheSize: 100_100,
Typo here?
Before we get too far down this path, I have a few longer-term concerns about using this while we still handle a single metric at a time. In an ideal world the Appender would be a batch-based interface so we can batch the cache requests, but that's a bigger lift.
Thank you @mattdurham for your review and feedback. Regarding your concerns:
Do you think it would be possible to split the two needs? First we introduce the external caches via this PR, then we take time to design and implement the batching to further optimize performance. As you have pointed out, batching the requests will require more work. WDYT?
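To make the batching idea concrete, an entirely hypothetical sketch of the kind of interface change being discussed (none of these names exist in the PR):

```go
// sample is a hypothetical stand-in for a single metric update.
type sample struct {
	labelsHash  string
	value       float64
	timestampMs int64
}

// batchAppender sketches a batch-based alternative to the current
// one-sample-at-a-time Appender: receiving a whole batch would let the
// relabel component resolve all cache keys in one GetMulti round trip.
type batchAppender interface {
	AppendBatch(samples []sample) error
}
```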
I'd really like to see some benchmarks using redis/memcached; use something like https://golang.testcontainers.org/modules/redis/ with relabel and a few thousand signals.
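For example, a rough shape for such a benchmark (generateMetrics and relabelThroughCache are hypothetical stand-ins, and the testcontainers calls should be checked against the pinned module version):

```go
package relabel_test

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go/modules/redis"
)

// BenchmarkRelabelRedis starts redis once, then measures relabel
// throughput against it with a few thousand distinct series.
func BenchmarkRelabelRedis(b *testing.B) {
	ctx := context.Background()
	container, err := redis.Run(ctx, "redis:7")
	if err != nil {
		b.Fatal(err)
	}
	defer container.Terminate(ctx)

	endpoint, err := container.ConnectionString(ctx)
	if err != nil {
		b.Fatal(err)
	}

	metrics := generateMetrics(10_000) // hypothetical helper
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// hypothetical helper driving relabel with a redis-backed cache
		relabelThroughCache(endpoint, metrics[i%len(metrics)])
	}
}
```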
I made a bench branch; you can see the commit here: Here are the results of the benchmark on my laptop, running with dockertest. It's a bit better with a docker container running outside of the benchmark (around 65000 ns/op for redis). Here are the three flamegraphs for each backend; as we can see, the Get calls to redis/memcached are what makes it slow. I'm still using the dskit/cache package here; I chose to use it because I did not want to re-implement the whole cache client system. If you think it's better not to use it here, I could implement a simpler version of each client and bench that, but I don't think the difference will be huge.
Actually, we are already using some dskit structs in a few places; I will review your imports. We recently found an issue with exemplars that will likely require changes to the appender interface to batch the samples, which should significantly improve the viability of this. I would hold off until we get that out, which will likely be in two weeks.
PR Description
Related to the proposal introduced by #1600.
This is a (working) draft for this feature.
Which issue(s) this PR fixes
Notes to the Reviewer
PR Checklist