feat: Add support for multiple kinds of cache for relabeling components #1692
base: main
Conversation
Some comments to open a discussion on improving this code.
@@ -80,7 +80,6 @@ func (s *server) Run(ctx context.Context) error {
	})

	mw := middleware.Instrument{
		RouteMatcher: r,
Not sure about the impact of this change (this is related to the dskit update: grafana/dskit@27d7d41).
	return fmt.Errorf("cache_size must be greater than 0 and is %d", arg.CacheConfig.InMemory.CacheSize)
}
case cache.Memcached:
	// todo
Need to determine what to include here; maybe move this to the service/cache package?
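For instance, a rough sketch of what the Memcached case could validate, which could live in either place. Addresses and Timeout are assumed field names here, not the actual schema:

```go
package relabel

import (
	"fmt"
	"time"
)

// MemcachedConfig is a stand-in for the PR's actual struct; the field
// names here are assumptions.
type MemcachedConfig struct {
	Addresses []string
	Timeout   time.Duration
}

// validateMemcached mirrors the in-memory cache_size check above.
func validateMemcached(cfg MemcachedConfig) error {
	if len(cfg.Addresses) == 0 {
		return fmt.Errorf("at least one memcached address must be provided")
	}
	if cfg.Timeout <= 0 {
		return fmt.Errorf("memcached timeout must be greater than 0 and is %s", cfg.Timeout)
	}
	return nil
}
```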
@@ -230,7 +264,13 @@ func (c *Component) Update(args component.Arguments) error {
	defer c.mut.Unlock()

	newArgs := args.(Arguments)
	c.clearCache(newArgs.CacheSize)

	// todo: maybe recreate the whole relabelCache here in case the redis/memcached client config changes
As there is no clearCache for redis or memcached, I don't really know what to do here with those kinds of caches.
If it's not supported by redis/memcached, can we simply return an error on a call to clearCache for redis/memcached? Or would it break something?
I think this would be a no-op; generally this is used to reset the cache size. In the case of redis/memcached I imagine this would not even be called, i.e. the cache size should not exist within Alloy.
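Roughly, the no-op could look like this; all names are illustrative, not the PR's actual identifiers:

```go
package relabel

// backend identifies the configured cache implementation.
type backend int

const (
	backendInMemory backend = iota
	backendRedis
	backendMemcached
)

// resizable matches golang-lru's Resize signature.
type resizable interface {
	Resize(size int) (evicted int)
}

// clearCache resets the local LRU size; remote backends have no local
// state to clear, so they are a deliberate no-op rather than an error.
func clearCache(b backend, lru resizable, newSize int) {
	if b == backendInMemory && lru != nil {
		lru.Resize(newSize)
	}
}
```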
// Ideally we should be using the dskit/cache conf directly, but it means it should not
// be in the alloy configuration?

type RedisConf struct {
As explained in the comment, I'm open to suggestions here on how to handle the config part.
For now each cache is configured at the relabel component level, but this includes copying the struct to add the alloy
tags, as we cannot embed the dskit/cache configs into the Grafana Alloy config.
If we decide to move this config outside of the alloy config, we could directly use the dskit/cache config.
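To illustrate the copying, a sketch of the mirrored struct and the conversion. The Alloy tag names and the dskit type/field set should be double-checked against the vendored dskit version; only fields visible in this diff are mapped:

```go
package relabel

import (
	"github.com/grafana/dskit/cache"
	"github.com/grafana/dskit/flagext"
)

// RedisConf mirrors the dskit redis config with Alloy tags, since the
// dskit struct cannot be embedded directly.
type RedisConf struct {
	Endpoint            string `alloy:"endpoint,attr"`
	DB                  int    `alloy:"db,attr,optional"`
	Password            string `alloy:"password,attr,optional"`
	MaxAsyncConcurrency int    `alloy:"max_async_concurrency,attr,optional"`
	MaxAsyncBufferSize  int    `alloy:"max_async_buffer_size,attr,optional"`
}

// toDskit copies the Alloy-level config into the dskit one.
func (c RedisConf) toDskit() cache.RedisClientConfig {
	return cache.RedisClientConfig{
		Password:            flagext.SecretWithValue(c.Password),
		DB:                  c.DB,
		MaxAsyncConcurrency: c.MaxAsyncConcurrency,
		MaxAsyncBufferSize:  c.MaxAsyncBufferSize,
		// Endpoint, timeouts, pool sizes, etc. would be mapped the same way.
	}
}
```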
This file is a re-implementation of the LRU cache that was present in the Prometheus relabel component.
}

func newMemcachedCache[valueType any](cfg MemcachedConfig) (*MemcachedCache[valueType], error) {
	client, err := cache.NewMemcachedClientWithConfig(
Some things to add here (same in the other implementation)
func (c *MemcachedCache[valueType]) Remove(key string) {
	ctx := context.Background()
	// TODO: manage error
	_ = c.client.Delete(ctx, key)
We ignore the error for now, but this isn't ideal.
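A possible middle ground until proper handling is decided: log the failure instead of discarding it. This assumes a go-kit logger field is added to MemcachedCache:

```go
// Remove logs the deletion failure instead of silently dropping it.
// Requires "context" and "github.com/go-kit/log/level" imports plus a
// logger field on the struct (an assumption, not the PR's code).
func (c *MemcachedCache[valueType]) Remove(key string) {
	ctx := context.Background()
	if err := c.client.Delete(ctx, key); err != nil {
		level.Warn(c.logger).Log("msg", "failed to remove cache entry", "key", key, "err", err)
	}
}
```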
}

func newRedisCache[valueType any](cfg RedisConf) (*RedisCache[valueType], error) {
	client, err := cache.NewRedisClient(
Some things to add here (same in the other implementation)
func (c *RedisCache[valueType]) Remove(key string) {
	ctx := context.Background()
	// TODO: manage error
Ignoring the error is never ideal.
	"sync"
	"time"

	lru "github.com/hashicorp/golang-lru/v2"
Isn't there also an LRU cache interface in dskit/cache that we should use instead?
)

type InMemoryCache[valueType any] struct {
	lru *lru.Cache[string, *valueType]
Should we worry about the cache strategy, or is it out of scope for this PR?
The previous cache was LRU, so I kept it this way; I think it's out of scope.
LRU should be kept as is; if we want to change that behavior, it should be a separate PR.
found := false
values[key], found = c.lru.Get(key)
if !found {
	return nil, errNotFound
This behavior is different from memcached's GetMulti: https://github.com/grafana/dskit/blob/931a021fb06b39732425870848e12b5a61333cb9/cache/memcached_client.go#L374
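For comparison, a sketch of GetMulti-style semantics here, where misses are simply absent from the result instead of aborting with errNotFound (shape assumed from the dskit behavior linked above):

```go
// GetMulti returns only the hits; missing keys are skipped, matching
// dskit's memcached GetMulti contract.
func (c *InMemoryCache[valueType]) GetMulti(keys []string) map[string]*valueType {
	values := make(map[string]*valueType, len(keys))
	for _, key := range keys {
		if v, found := c.lru.Get(key); found {
			values[key] = v
		}
	}
	return values
}
```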
if err := encoder.Encode(*value); err != nil {
	return err
}
c.client.SetAsync(key, indexBuffer.Bytes(), ttl)
I find it weird to have no way of being notified of an error here (and in SetMulti), but it seems to be how asyncQueue works, so not much to be done here...
DB int `alloy:"db,attr"`

// MaxAsyncConcurrency specifies the maximum number of SetAsync goroutines.
MaxAsyncConcurrency int `yaml:"max_async_concurrency" category:"advanced"`
Are there some decent default values that we can choose for this and all the *BufferSize fields?
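If we do pick defaults, something like this could work; the numbers are guesses in the ballpark used by other dskit consumers, not values from dskit or this PR:

```go
// SetToDefault implements Alloy's defaulting hook; values are
// illustrative and should be benchmarked before being committed to.
func (c *RedisConf) SetToDefault() {
	*c = RedisConf{
		MaxAsyncConcurrency: 50,    // goroutines draining the SetAsync queue
		MaxAsyncBufferSize:  25000, // queued async writes before ops are dropped
	}
}
```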
Password:            flagext.SecretWithValue(""),
MaxAsyncConcurrency: cfg.MaxAsyncConcurrency,
MaxAsyncBufferSize:  cfg.MaxAsyncBufferSize,
DB:                  0,
Why not set DB to cfg.DB here? The default value would still be 0, and it would be possible to configure it.
CacheConfig: cache.CacheConfig{
	Backend: cache.InMemory,
	InMemory: cache.InMemoryCacheConfig{
		CacheSize: 100_100,
Typo here?
Before we get too far down this path, I have a few longer-term concerns about using this while we still handle a single metric at a time. In an ideal world the Appender would be a batch-based interface so we can batch the cache requests, but that's a bigger lift.
Thank you @mattdurham for your review and feedback. Regarding your concerns:
Do you think it would be possible to split the two needs? First we introduce the external caches via this PR, then we take time to design and implement the batching to further optimize performance. As you have pointed out, batching the requests will require more work. WDYT?
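To make the batching idea concrete, an entirely hypothetical sketch of the kind of interface change being discussed (none of these names exist in the PR):

```go
// sample is a hypothetical stand-in for a single metric update.
type sample struct {
	labelsHash  string
	value       float64
	timestampMs int64
}

// batchAppender sketches a batch-based alternative to the current
// one-sample-at-a-time Appender: receiving a whole batch would let the
// relabel component resolve all cache keys in one GetMulti round trip.
type batchAppender interface {
	AppendBatch(samples []sample) error
}
```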
I'd really like to see some benchmarks using redis/memcached; use something like https://golang.testcontainers.org/modules/redis/ with relabel and a few thousand signals.
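For example, a rough shape for such a benchmark (generateMetrics and relabelThroughCache are hypothetical stand-ins, and the testcontainers calls should be checked against the pinned module version):

```go
package relabel_test

import (
	"context"
	"testing"

	"github.com/testcontainers/testcontainers-go/modules/redis"
)

// BenchmarkRelabelRedis starts redis once, then measures relabel
// throughput against it with a few thousand distinct series.
func BenchmarkRelabelRedis(b *testing.B) {
	ctx := context.Background()
	container, err := redis.Run(ctx, "redis:7")
	if err != nil {
		b.Fatal(err)
	}
	defer container.Terminate(ctx)

	endpoint, err := container.ConnectionString(ctx)
	if err != nil {
		b.Fatal(err)
	}

	metrics := generateMetrics(10_000) // hypothetical helper
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		// hypothetical helper driving relabel with a redis-backed cache
		relabelThroughCache(endpoint, metrics[i%len(metrics)])
	}
}
```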
I made a bench branch; you can see the commit here: Here are the results of the benchmark on my laptop, running with dockertest. It's a bit better with a docker container running outside of the benchmark (around 65000 ns/op for redis). Here are the three flamegraphs for each backend; as we can see, the Get calls to redis/memcached are what makes it slow. I'm still using the dskit/cache package here; I chose to use it because I did not want to re-implement the whole cache client system. If you think it's better not to use it here, I could implement a simpler version of each client and bench that, but I don't think the difference will be huge.
Actually, we are already using some dskit structs in a few places; I will review your imports. We recently found an issue with exemplars that will likely require changes to the appender interface to batch the samples, which should significantly improve the viability of this. I would hold off until we get that out, which will likely be in two weeks.
PR Description
Related to the proposal introduced by #1600.
This is a (working) draft for this feature.
Which issue(s) this PR fixes
Notes to the Reviewer
PR Checklist