node: startup cleanup plus enable jemalloc #1819
base: stage
// Args expose available global args for cli command
type Args struct {
	ConfigPath string
	// ConfigPath is a path to main configuration file.
	ConfigPath string
	// ShareConfigPath is an additional config file (path) that (if present) will overwrite
	// configuration supplied from config file at ConfigPath.
	ShareConfigPath string
}
Perhaps we could add another comment line explaining why we might need 2 config files (not just 1).
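As a rough illustration of the override behaviour the ShareConfigPath comment describes, here is a minimal sketch. It assumes both files have already been parsed into flat key/value maps; the `overlay` function, the map representation, and the keys are hypothetical and are not taken from this PR.

```go
package main

import "fmt"

// overlay returns a copy of base in which every key present in share
// overwrites the base value. Both maps stand in for parsed config files
// (a hypothetical representation used only for this sketch).
func overlay(base, share map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(share))
	for k, v := range base {
		merged[k] = v
	}
	for k, v := range share {
		merged[k] = v // the share config wins on conflicts
	}
	return merged
}

func main() {
	base := map[string]string{"db.path": "./data/db", "log.level": "info"}
	share := map[string]string{"log.level": "debug"}
	fmt.Println(overlay(base, share))
}
```

Keys that appear only in the main config survive unchanged, so the second file only needs to list the values it wants to replace.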
replace github.com/dgraph-io/ristretto => github.com/dgraph-io/ristretto v0.1.1-0.20211108053508-297c39e6640f
Not sure what this line is about. I've upgraded badgerdb to the latest release in this PR (which by extension upgrades ristretto to a newer version), hence we must remove it for the code to compile (or at least adjust it somehow).
Great job
storage/kv/badger.go
Outdated
out := z.CallocNoRef(1, "jemalloc check")
defer z.Free(out)
jemallocEnabled := len(out) > 0
logger.Debug("jemalloc allocator will be used", zap.Bool("jemalloc_enabled", jemallocEnabled))
This is pretty much the best way I found to check whether jemalloc is actually used or not. Note that we print true both for -tags="blst_enabled,jemalloc" and -tags="blst_enabled,jemalloc,allocator", hence it reports on the presence of jemalloc only (nothing about allocator).
allocator is a "different strategy" for memory management. From what I understand, it serves as a kind of cache between the Go application and jemalloc, buffering large chunks of memory; hence it will make fewer cgo calls but will hold onto large unused memory chunks, increasing overall memory consumption:
// Allocator amortizes the cost of small allocations by allocating memory in
// bigger chunks. Internally it uses z.Calloc to allocate memory. Once
// allocated, the memory is not moved, so it is safe to use the allocated bytes
// to unsafe cast them to Go struct pointers. Maintaining a freelist is slow.
// Instead, Allocator only allocates memory, with the idea that finally we
// would just release the entire Allocator.
type Allocator struct {
	sync.Mutex
	compIdx uint64 // Stores bufIdx in 32 MSBs and posIdx in 32 LSBs.
	buffers [][]byte
	Ref     uint64
	Tag     string
}
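The compIdx comment above describes two 32-bit indices packed into one 64-bit word. A minimal sketch of that packing follows; the `pack` and `unpack` helpers are hypothetical names for illustration and are not ristretto's actual code.

```go
package main

import "fmt"

// pack stores bufIdx in the 32 most significant bits and posIdx in the
// 32 least significant bits of a single uint64.
func pack(bufIdx, posIdx uint32) uint64 {
	return uint64(bufIdx)<<32 | uint64(posIdx)
}

// unpack reverses pack and returns the two indices.
func unpack(compIdx uint64) (bufIdx, posIdx uint32) {
	return uint32(compIdx >> 32), uint32(compIdx)
}

func main() {
	c := pack(3, 70)
	b, p := unpack(c)
	fmt.Println(c, b, p)
}
```

Keeping both indices in one word means a buffer index and a position within that buffer can be read or replaced together as a single value.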
It's hard to tell without doing any tests (or me not having enough context on how exactly we use Badger), but I think if we want to reduce memory footprint as much as possible - we better go with -tags="blst_enabled,jemalloc".
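The detection idea from the diff (allocate one byte and check whether anything came back) can be sketched without the ristretto dependency. This is an illustration only: `allocFn`, `probeAllocator`, and the two stand-in allocators are hypothetical, and the sketch assumes, as the check in this PR does, that a build without the jemalloc tag returns no memory from z.CallocNoRef.

```go
package main

import "fmt"

// allocFn mirrors the shape of z.CallocNoRef (size in bytes, tag) for
// illustration only.
type allocFn func(n int, tag string) []byte

// probeAllocator reports whether alloc actually hands back memory,
// releasing whatever it received before returning.
func probeAllocator(alloc allocFn, free func([]byte)) bool {
	out := alloc(1, "jemalloc check")
	defer free(out)
	return len(out) > 0
}

func main() {
	// Stand-in for a build without the jemalloc tag (assumed to return nothing).
	noJemalloc := func(n int, tag string) []byte { return nil }
	// Stand-in for a jemalloc-backed build.
	withJemalloc := func(n int, tag string) []byte { return make([]byte, n) }
	noop := func([]byte) {}

	fmt.Println(probeAllocator(noJemalloc, noop))   // prints "false"
	fmt.Println(probeAllocator(withJemalloc, noop)) // prints "true"
}
```

As noted above, a probe of this kind says nothing about whether the allocator tag is also set.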
It's hard to tell without doing any tests (or me not having enough context on how exactly we use Badger), but I think if we want to reduce memory footprint as much as possible - we better go with -tags="blst_enabled,jemalloc"
Having done some testing on stage, I would say there isn't a noticeable difference for the workloads we run (for an operator managing ~500 validators); see #1643 (comment) for more details. So I've added the allocator tag as well, to match how it was originally configured back when we first started using jemalloc.
cli/operator/node.go
Outdated
// load & parse local events yaml if exists, otherwise sync from contract
if len(cfg.LocalEventsPath) != 0 {
localEvents, err := localevents.Load(cfg.LocalEventsPath)
if err != nil {
logger.Fatal("failed to load local events", zap.Error(err))
}
// Sync historical registry events from Ethereum smart contract.
logger.Debug("syncing historical registry events", zap.Uint64("fromBlock", fromBlock.Uint64()))
This code is unreachable because setupEventHandling is only ever called when len(cfg.LocalEventsPath) == 0, which suggests we are never calling EventHandler.HandleLocalEvents ... which is kind of weird?
I also don't see cfg.LocalEventsPath being used anywhere to actually read the contents of the file; not sure if such usage was removed in the past or just hasn't been added yet. @nkryuchkov could you elaborate? I'd rather clean it up while we are at it (maybe by removing cfg.LocalEventsPath altogether if we don't need it anymore).
Just as additional context: the current behavior seems to be that we are syncing every Ethereum smart contract event that happened since the block reported by nodeStorage.GetLastProcessedBlock.
We always need to call setupEventHandling because it handles both on-chain and local events. cfg.LocalEventsPath is used by localevents.Load(cfg.LocalEventsPath).
cfg.LocalEventsPath is used by localevents.Load(cfg.LocalEventsPath)
But like I mentioned above, we never enter the branch that executes localevents.Load(cfg.LocalEventsPath) because it's "guarded" by 2 contradictory conditions:
len(cfg.LocalEventsPath) == 0
len(cfg.LocalEventsPath) != 0
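To make the contradiction concrete, here is a minimal sketch of the control flow being described. The function `loadsLocalEvents` is a hypothetical model and not code from this PR: the outer guard stands for the call site, the inner guard for the check inside setupEventHandling.

```go
package main

import "fmt"

// loadsLocalEvents models the two nested guards and reports whether the
// local-events branch would run for the given path.
func loadsLocalEvents(localEventsPath string) bool {
	if len(localEventsPath) == 0 { // call site: set up only when no path is given
		if len(localEventsPath) != 0 { // inside: load only when a path is given
			return true // never reached, the two guards exclude each other
		}
	}
	return false
}

func main() {
	fmt.Println(loadsLocalEvents(""))            // prints "false"
	fmt.Println(loadsLocalEvents("events.yaml")) // prints "false"
}
```

Whatever the path is, the inner branch cannot run while both guards are in place.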
@iurii-ssv in your PR we never do, because you call setupEventHandling only if !usingLocalEvents, but on stage we always call setupEventHandling, which calls localevents.Load(cfg.LocalEventsPath) if len(cfg.LocalEventsPath) != 0.
Oh, you are right ... I see what's going on. On stage the following happens:
eventSyncer := setupEventHandling(
	cmd.Context(),
	logger,
	executionClient,
	validatorCtrl,
	metricsReporter,
	networkConfig,
	nodeStorage,
	operatorDataStore,
	operatorPrivKey,
	keyManager,
)
if len(cfg.LocalEventsPath) == 0 {
	nodeProber.AddNode("event syncer", eventSyncer)
}
And setupEventHandling returning eventSyncer baited me into thinking "there is no need to even call setupEventHandling if we aren't gonna do nodeProber.AddNode("event syncer", eventSyncer)", so I merged these 2 under if len(cfg.LocalEventsPath) == 0.
Let me revert that part, and see how I can separate those things from each other (to make it clear & explicit).
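One way to picture the separation being proposed is the sketch below. It is not the content of the follow-up commit: `startup` and the step names are hypothetical, and the only thing taken from the discussion is that event handling always runs while the prober registration applies only when syncing from the contract.

```go
package main

import "fmt"

// startup lists, in order, the steps that would run for a given local
// events path. The step names are illustrative labels only.
func startup(localEventsPath string) []string {
	usingLocalEvents := len(localEventsPath) != 0
	var steps []string
	if usingLocalEvents {
		// Local events replace the on-chain sync entirely.
		steps = append(steps, "load local events")
	} else {
		steps = append(steps, "sync events from contract")
		// Only the on-chain syncer is registered with the prober.
		steps = append(steps, "register event syncer with prober")
	}
	return steps
}

func main() {
	fmt.Println(startup(""))
	fmt.Println(startup("events.yaml"))
}
```

Making the two paths explicit at the call site keeps the prober registration from hiding the fact that local events still need to be loaded.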
Okay, reworked this a bit, relevant commit is here 9900a0b and PR is ready (but I'll redeploy to stage to test it again 1 more time).
What's not 100% clear to me though is "what those local events are". I thought of it as some kind of cache, but that's unlikely to be correct because we don't "subscribe" for events that come after the last event in the cfg.LocalEventsPath file. Could you clarify (and I'll add a comment about it somewhere)?
44e4e0c to 463fc76
463fc76 to 9900a0b
Re-tested on stage for post-Alan-fork (previously tested for pre-Alan-fork).
cmd.Context(),
logger,
executionClient,
eventFilterer, err := executionClient.Filterer()
Can we try to extract some lines to a function/method/entity to avoid increasing the size of this function? It's quite large and IMO we need to gradually decrease its size
I agree that we want function size to be smaller (not larger) pretty much all the time, but this is not the only/best thing to optimize for. The more important thing is that the general code structure stays predictable (easy to understand and navigate).
In that sense, grouping related code together rather than splitting it apart (possibly mixing it with unrelated things, aka mixing different abstraction layers) could be a better choice, and the above is a good example of why we might want to "untangle" code sometimes.
Now that it is untangled, we can find a better way to refactor StartNodeCmd.Run, which I'm trying to do in #1843 (better to do it in a separate PR because this one is getting larger and larger already, plus it's also better to do after we merge #1820 because it removes some code from there too).
LGTM, but I'd prefer not to increase the size of code initializing the node
ac23d12 to e5e652b
This PR contains two different features (I bundled these together so I could test both extensively on stage in one fell swoop):
Before merging: