Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(store): set optimal badger config to avoid memory spikes #1072

Merged
merged 14 commits into from
Sep 25, 2024

Conversation

srene
Copy link
Contributor

@srene srene commented Sep 13, 2024

PR Standards

Opening a pull request should be able to meet the following requirements

--

PR naming convention: https://hackmd.io/@nZpxHZ0CT7O5ngTp0TP9mg/HJP_jrm7A


Close #1067

<-- Briefly describe the content of this pull request -->

For Author:

  • Targeted PR against correct branch
  • included the correct type prefix in the PR title
  • Linked to Github issue with discussion and accepted design
  • Targets only one github issue
  • Wrote unit and integration tests
  • All CI checks have passed
  • Added relevant godoc comments

For Reviewer:

  • confirmed the correct type prefix in the PR title
  • Reviewers assigned
  • confirmed all author checklist items have been addressed

After reviewer approval:

  • In case targets main branch, PR should be squashed and merged.
  • In case PR targets a release branch, PR should be rebased.

@srene srene requested a review from a team as a code owner September 13, 2024 08:55
@srene srene self-assigned this Sep 13, 2024
@srene srene marked this pull request as draft September 13, 2024 08:56
store/badger.go Fixed Show fixed Hide fixed
@srene srene marked this pull request as ready for review September 13, 2024 18:16
store/badger.go Outdated
Comment on lines 16 to 17
gcTimeout = 1 * time.Minute
discardRatio = 0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you comment a justification for these? link to some relevant forum thread or reasoning?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

store/badger.go Outdated
Comment on lines 78 to 89
ctx := context.Background()
for {
select {
case <-ctx.Done():
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ctx is useless

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

case <-gcTimeout.C:
err := b.db.RunValueLogGC(discardRatio)
if err != nil {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to log?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log added

store/badger.go Fixed Show fixed Hide fixed
Copy link
Contributor

@danwt danwt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good just not sure about this conditional code based on context

store/badger.go Outdated
Comment on lines 53 to 54
if ctx != context.TODO() {
go b.gc(ctx, gcTimeout, discardRatio, logger)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bit weird
can you comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactl is the problem with running the gc even with context todo?
Cant you just handle ErrGCInMemoryMode at the callsite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it does not make sense. i actually removed the context.

@srene srene requested a review from danwt September 17, 2024 14:29
db: db,
}
go b.gc(gcTimeout, discardRatio, logger)

Check notice

Code scanning / CodeQL

Spawning a Go routine Note

Spawning a Go routine may be a possible source of non-determinism
danwt
danwt previously approved these changes Sep 18, 2024
store/badger.go Outdated
func (b *BadgerKV) gc(period time.Duration, discardRatio float64, logger types.Logger) {
ticker := time.NewTicker(period)
defer ticker.Stop()
for range ticker.C {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add graceful shutdown handling
and probably godoc

case <-db.ctx.Done():
    // Exit the function if the database context is cancelled
    return
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

channel added as an exit case

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the db has exposed ctx.
from the docs:

// Only one GC is allowed at a time. If another value log GC is running, or DB
// has been closed, this would return an ErrRejected.

maybe it's more elegenat to just check for this type of error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if we check for ErrRejected, it maybe the previous didnt finish for some reason, for instance, and we kill the loop while it was not necessary...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine
Even though I'm not sure how maybe the previous didnt finish is possible. does RunValueLogGC actually runs in the background?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry, this shoudlnt happen, but from the docs not sure if ErrRejected will only happen when closing the db...

// Not really necessary if disabling compression
opts.BlockCacheSize = 0
// compressions reduces storage usage but increases memory consumption, specially during compaction
opts.Compression = options.None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no compression at all?
maybe we need to separate between sequencer/full node/archive

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem with dymint store is not the storage space but the memory consumption when compacting. its true its not optimal (the size it can be doubled when using no compression) but taking into account that in an unpruned node the application db space can be more than 10x the dymint storage space, it does not seem to be the issue. also this is a short-term solution to avoid oom issues in nodes. long-term solution will probably be replace badger by a more optimal store for dymint requirements.

mtsitrin
mtsitrin previously approved these changes Sep 19, 2024
danwt
danwt previously approved these changes Sep 19, 2024
Copy link
Contributor

@danwt danwt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closing channel is a bit weird, that's the point of context, which is what was there before

so context would make more sense?

@srene srene dismissed stale reviews from danwt and mtsitrin via 70dc4b7 September 25, 2024 08:16
@omritoptix omritoptix merged commit 5f8b1f7 into main Sep 25, 2024
6 checks passed
@omritoptix omritoptix deleted the srene/1067-badger-mem-fix branch September 25, 2024 10:26
srene added a commit that referenced this pull request Sep 25, 2024
omritoptix pushed a commit that referenced this pull request Sep 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Badger compaction memory optimization
4 participants