[WIP] TBS: Replace badger with pebble #15235

Draft: carsonip wants to merge 39 commits into main from tbs-pebble
Commits (39)
7478521  WIP (carsonip, Dec 16, 2024)
29b3a63  Add pebble (carsonip, Dec 16, 2024)
452c0ca  Use pebble (carsonip, Dec 16, 2024)
e30ee49  Implement sampling decision (carsonip, Dec 16, 2024)
3467415  Fix locking, use NoSync (carsonip, Dec 16, 2024)
886643f  Tune batch flush size, add delete (carsonip, Dec 16, 2024)
7d2618d  Sharded (carsonip, Dec 16, 2024)
79e32c0  Increase flush threshold to 5MB (carsonip, Dec 16, 2024)
75efa89  Merge branch 'main' into tbs-pebble (carsonip, Dec 19, 2024)
dc5bc9a  Merge branch 'main' into tbs-pebble (carsonip, Jan 14, 2025)
d2b1080  Replace badger (carsonip, Jan 14, 2025)
640ee0c  Remove gc loop and drop loop (carsonip, Jan 14, 2025)
8596317  Fix test (carsonip, Jan 14, 2025)
3a6f727  Restore storage monitoring (carsonip, Jan 14, 2025)
fd9abbc  Update processor test (carsonip, Jan 14, 2025)
896dedb  Explain AlreadyTailSampled failure (carsonip, Jan 14, 2025)
2c70ad5  Update size estimation (carsonip, Jan 14, 2025)
cc38dbb  Add FIXME (carsonip, Jan 14, 2025)
8841d41  Flush by number of pending writes for fair perf comparison (carsonip, Jan 14, 2025)
adac2a2  Try disable WAL (carsonip, Jan 14, 2025)
a71fab0  Try in-memory mode (carsonip, Jan 14, 2025)
15387c8  Revert "Try in-memory mode" (carsonip, Jan 14, 2025)
d73809d  Revert "Try disable WAL" (carsonip, Jan 14, 2025)
b294877  Use fork for pebble batch config (carsonip, Jan 14, 2025)
e2d2cdf  Add db commit threshold bytes (carsonip, Jan 14, 2025)
60ab9ca  Use BatchOption (carsonip, Jan 14, 2025)
ef1b422  Add FIXME (carsonip, Jan 14, 2025)
4b1cbdd  Fix compile error (carsonip, Jan 14, 2025)
c0e679c  Add logger to pebble (carsonip, Jan 14, 2025)
c883c35  Sort imports, remove logger.go (carsonip, Jan 14, 2025)
08c9205  Disable pebble level compression (carsonip, Jan 14, 2025)
1c89735  Add table bloom filter (carsonip, Jan 14, 2025)
c35dcb9  Add FIXME (carsonip, Jan 14, 2025)
ea7b481  Add prefix to sampling decision key to separate from events to improv… (carsonip, Jan 15, 2025)
5dde3d0  Use a sync.Map for sampling decision to establish baseline perf (carsonip, Jan 15, 2025)
b8f50f3  Try enabling snappy to trade cpu for reduced disk IO / usage (carsonip, Jan 15, 2025)
554e93d  Try 16KB block size for better compression (carsonip, Jan 15, 2025)
e1694a1  Try FormatMajorVersion (carsonip, Jan 15, 2025)
a1e9010  Revert "Use a sync.Map for sampling decision to establish baseline perf" (carsonip, Jan 15, 2025)
go.mod (5 changes: 3 additions & 2 deletions)
@@ -5,6 +5,7 @@ go 1.23.0
require (
github.com/KimMachineGun/automemlimit v0.7.0-pre.3
github.com/cespare/xxhash/v2 v2.3.0
+github.com/cockroachdb/pebble v1.1.2
github.com/dgraph-io/badger/v2 v2.2007.4
github.com/dustin/go-humanize v1.0.1
github.com/elastic/apm-aggregation v1.2.0
@@ -66,9 +67,7 @@ require (
github.com/beorn7/perks v1.0.1 // indirect
github.com/cespare/xxhash v1.1.0 // indirect
github.com/cockroachdb/errors v1.11.3 // indirect
-github.com/cockroachdb/fifo v0.0.0-20240816210425-c5d0cb0b6fc0 // indirect
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b // indirect
-github.com/cockroachdb/pebble v1.1.2 // indirect
github.com/cockroachdb/redact v1.1.5 // indirect
github.com/cockroachdb/tokenbucket v0.0.0-20230807174530-cc333fc44b06 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
@@ -159,3 +158,5 @@ require (
)

replace github.com/dop251/goja => github.com/elastic/goja v0.0.0-20190128172624-dd2ac4456e20 // pin to version used by beats
+
+replace github.com/cockroachdb/pebble => github.com/carsonip/pebble v0.0.0-20250114162318-fa34738bbef0
go.sum (6 changes: 2 additions & 4 deletions)
@@ -60,6 +60,8 @@ github.com/axiomhq/hyperloglog v0.2.0/go.mod h1:GcgMjz9gaDKZ3G0UMS6Fq/VkZ4l7uGgc
github.com/benbjohnson/clock v1.1.0/go.mod h1:J11/hYXuz8f4ySSvYwY0FKfm+ezbsZBKZxNJlLklBHA=
github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
+github.com/carsonip/pebble v0.0.0-20250114162318-fa34738bbef0 h1:FSsCMsR/nTCbTsfbxQu2Xy5VArWxzgjBXRe0uEJiMMI=
+github.com/carsonip/pebble v0.0.0-20250114162318-fa34738bbef0/go.mod h1:sEHm5NOXxyiAoKWhoFxT8xMgd/f3RA6qUqQ1BXKrh2E=
github.com/census-instrumentation/opencensus-proto v0.2.1/go.mod h1:f6KPmirojxKA12rnyqOA5BBL4O983OfeGPqjHWSTneU=
github.com/cespare/xxhash v1.1.0 h1:a6HrQnmkObjyL+Gs60czilIUGqrzKutQD6XZog3p+ko=
github.com/cespare/xxhash v1.1.0/go.mod h1:XrSqR1VqqWfGrhpAt58auRo0WTKS1nRRg3ghfAqPWnc=
@@ -71,12 +73,8 @@ github.com/cockroachdb/datadriven v1.0.3-0.20230413201302-be42291fc80f h1:otljaY
github.com/cockroachdb/datadriven v1.0.3-0.20230413201302-be42291fc80f/go.mod h1:a9RdTaap04u637JoCzcUoIcDmvwSUtcUFtT/C3kJlTU=
github.com/cockroachdb/errors v1.11.3 h1:5bA+k2Y6r+oz/6Z/RFlNeVCesGARKuC6YymtcDrbC/I=
github.com/cockroachdb/errors v1.11.3/go.mod h1:m4UIW4CDjx+R5cybPsNrRbreomiFqt8o1h1wUVazSd8=
-github.com/cockroachdb/fifo v0.0.0-20240816210425-c5d0cb0b6fc0 h1:pU88SPhIFid6/k0egdR5V6eALQYq2qbSmukrkgIh/0A=
-github.com/cockroachdb/fifo v0.0.0-20240816210425-c5d0cb0b6fc0/go.mod h1:9/y3cnZ5GKakj/H4y9r9GTjCvAFta7KLgSHPJJYc52M=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b h1:r6VH0faHjZeQy818SGhaone5OnYfxFR/+AzdY3sf5aE=
github.com/cockroachdb/logtags v0.0.0-20230118201751-21c54148d20b/go.mod h1:Vz9DsVWQQhf3vs21MhPMZpMGSht7O/2vFW2xusFUVOs=
-github.com/cockroachdb/pebble v1.1.2 h1:CUh2IPtR4swHlEj48Rhfzw6l/d0qA31fItcIszQVIsA=
-github.com/cockroachdb/pebble v1.1.2/go.mod h1:4exszw1r40423ZsmkG/09AFEG83I0uDgfujJdbL6kYU=
github.com/cockroachdb/redact v1.1.5 h1:u1PMllDkdFfPWaNGMyLD1+so+aq3uUItthCFqzwPJ30=
github.com/cockroachdb/redact v1.1.5/go.mod h1:BVNblN9mBWFyMyqK1k3AAiSxhvhfK2oOZZ2lK+dpvRg=
github.com/cockroachdb/tokenbucket v0.0.0-20230807174530-cc333fc44b06 h1:zuQyyAKVxetITBuuhv3BI9cMrmStnpT18zmgmTxunpo=
x-pack/apm-server/main.go (32 changes: 16 additions & 16 deletions)
@@ -40,9 +40,9 @@ var (
// will hopefully disappear in the future, when agents no longer send unsampled transactions.
samplingMonitoringRegistry = monitoring.Default.GetRegistry("apm-server.sampling")

-// badgerDB holds the badger database to use when tail-based sampling is configured.
-badgerMu sync.Mutex
-badgerDB *eventstorage.StorageManager
+// db holds the database to use when tail-based sampling is configured.
+dbMu sync.Mutex
+db *eventstorage.StorageManager

storageMu sync.Mutex
storage *eventstorage.ManagedReadWriter
@@ -117,11 +117,11 @@ func newTailSamplingProcessor(args beater.ServerParams) (*sampling.Processor, er
}

storageDir := paths.Resolve(paths.Data, tailSamplingStorageDir)
-badgerDB, err = getBadgerDB(storageDir)
+db, err := getDB(storageDir)
if err != nil {
return nil, fmt.Errorf("failed to get Badger database: %w", err)
}
-readWriter := getStorage(badgerDB)
+readWriter := getStorage(db)

policies := make([]sampling.Policy, len(tailSamplingConfig.Policies))
for i, in := range tailSamplingConfig.Policies {
@@ -155,7 +155,7 @@ func newTailSamplingProcessor(args beater.ServerParams) (*sampling.Processor, er
UUID: samplerUUID.String(),
},
StorageConfig: sampling.StorageConfig{
-DB: badgerDB,
+DB: db,
Storage: readWriter,
StorageDir: storageDir,
StorageGCInterval: tailSamplingConfig.StorageGCInterval,
@@ -166,17 +166,17 @@ func newTailSamplingProcessor(args beater.ServerParams) (*sampling.Processor, er
})
}

-func getBadgerDB(storageDir string) (*eventstorage.StorageManager, error) {
-badgerMu.Lock()
-defer badgerMu.Unlock()
-if badgerDB == nil {
+func getDB(storageDir string) (*eventstorage.StorageManager, error) {
+dbMu.Lock()
+defer dbMu.Unlock()
+if db == nil {
sm, err := eventstorage.NewStorageManager(storageDir)
if err != nil {
return nil, err
}
-badgerDB = sm
+db = sm
}
-return badgerDB, nil
+return db, nil
}

func getStorage(sm *eventstorage.StorageManager) *eventstorage.ManagedReadWriter {
@@ -251,11 +251,11 @@ func wrapServer(args beater.ServerParams, runServer beater.RunServerFunc) (beate

// closeBadger is called at process exit time to close the badger.DB opened
// by the tail-based sampling processor constructor, if any. This is never
-// called concurrently with opening badger.DB/accessing the badgerDB global,
-// so it does not need to hold badgerMu.
+// called concurrently with opening badger.DB/accessing the db global,
+// so it does not need to hold dbMu.
func closeBadger() error {
-if badgerDB != nil {
-return badgerDB.Close()
+if db != nil {
+return db.Close()
}
return nil
}
x-pack/apm-server/sampling/eventstorage/badger.go (45 changes: 0 additions & 45 deletions)

This file was deleted.

x-pack/apm-server/sampling/eventstorage/logger.go (50 changes: 0 additions & 50 deletions)

This file was deleted.

x-pack/apm-server/sampling/eventstorage/pebble.go (60 changes: 60 additions & 0 deletions)
@@ -0,0 +1,60 @@
package eventstorage

import (
"github.com/cockroachdb/pebble"
"github.com/cockroachdb/pebble/bloom"

"github.com/elastic/apm-server/internal/logs"
"github.com/elastic/elastic-agent-libs/logp"
)

const (
// Batch grows in multiples of 2 based on the initial size. For
// example, if the initial size is 1MB then the batch will grow as
// {2, 4, 8, 16, ...}. If a batch larger than 4MB is consistently
// committed, then that batch will never be retained if the max
// retained size is smaller than 8MB, as the batch capacity will
// always grow to 8MB.
initialPebbleBatchSize = 64 << 10 // 64KB
maxRetainedPebbleBatchSize = 8 << 20 // 8MB

// pebbleMemTableSize defines the max steady-state size of a memtable.
// There can be more than 1 memtable in memory at a time as it takes
// time for an old memtable to flush. The memtable size also defines
// the size for large batches. A large batch is a batch which will
// take at least half of the memtable size. Note that the Batch#Len
// is not the same as the memtable size that the batch will occupy,
// as data in batches is encoded differently. In general, the
// memtable size of the batch will be higher than the length of the
// batch data.
//
// On commit, data in the large batch may be kept by pebble and thus
// large batches will need to be reallocated. Note that large batch
// classification uses the memtable size that a batch will occupy
// rather than the length of the data slice backing the batch.
pebbleMemTableSize = 32 << 20 // 32MB

// dbCommitThresholdBytes is a soft limit and the batch is committed
// to the DB as soon as it crosses this threshold. To make sure that
// the commit threshold plays well with the max retained batch size,
// the threshold should be kept smaller than the sum of max retained
// batch size and encoded size of aggregated data to be committed.
dbCommitThresholdBytes = 8000 << 10 // 8000KB
)

func OpenPebble(storageDir string) (*pebble.DB, error) {
return pebble.Open(storageDir, &pebble.Options{
// FIXME: Specify FormatMajorVersion to use value blocks?
FormatMajorVersion: pebble.FormatNewest,
Logger: logp.NewLogger(logs.Sampling),
MemTableSize: pebbleMemTableSize,
Levels: []pebble.LevelOptions{
{
BlockSize: 16 << 10,
Compression: pebble.SnappyCompression,
FilterPolicy: bloom.FilterPolicy(10),
FilterType: pebble.TableFilter,
},
},
})
}
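
For illustration only, here is a minimal, hypothetical sketch of the write pattern these constants describe: writes are buffered in a pebble batch and committed with NoSync once the encoded batch length crosses the commit threshold, in the spirit of the "Fix locking, use NoSync" and "Add db commit threshold bytes" commits. It opens pebble directly with default options rather than through OpenPebble, and the directory, key layout, and values are invented for the example; the PR's actual read/write path lives in the eventstorage package and is not shown in this diff.

```go
package main

import (
	"fmt"
	"log"

	"github.com/cockroachdb/pebble"
)

// commitThresholdBytes mirrors dbCommitThresholdBytes above: a soft limit at
// which the pending batch is committed to the DB.
const commitThresholdBytes = 8000 << 10 // 8000KB

func main() {
	// For simplicity this uses default options; the PR's OpenPebble
	// additionally tunes the memtable size, block size, compression,
	// and bloom filters.
	db, err := pebble.Open("/tmp/tbs-pebble-example", &pebble.Options{})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	batch := db.NewBatch()
	for i := 0; i < 200_000; i++ {
		// Hypothetical key/value layout; the real key encoding in the PR
		// (trace IDs, sampling-decision prefix) is not shown in this diff.
		key := []byte(fmt.Sprintf("trace-%08d:event", i))
		if err := batch.Set(key, []byte("encoded event"), nil); err != nil {
			log.Fatal(err)
		}
		// Soft limit: commit as soon as the encoded batch crosses the
		// threshold. NoSync skips the WAL fsync, trading a small
		// durability window for write throughput.
		if batch.Len() >= commitThresholdBytes {
			if err := batch.Commit(pebble.NoSync); err != nil {
				log.Fatal(err)
			}
			if err := batch.Close(); err != nil {
				log.Fatal(err)
			}
			batch = db.NewBatch()
		}
	}
	// Commit whatever remains in the final batch.
	if err := batch.Commit(pebble.NoSync); err != nil {
		log.Fatal(err)
	}
	if err := batch.Close(); err != nil {
		log.Fatal(err)
	}

	// Reads go to the DB; data committed above is visible.
	value, closer, err := db.Get([]byte("trace-00000000:event"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read back %d bytes\n", len(value))
	closer.Close()
}
```

Note that Batch.Len reports the encoded batch length, which, as the comment on pebbleMemTableSize points out, differs from the memtable space the batch will eventually occupy.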