node: startup cleanup plus enable jemalloc #1819

iurii-ssv · 2024-10-26T10:18:06Z

This PR contains two different features (I bundled these together so I could test both extensively on stage in one fell swoop):

While trying to run the SSV node in Docker locally, I encountered a bunch of minor issues (mostly with the way config parsing is handled); this PR addresses all of them
closes Re-enable Go RSA after Alan fork #1643 by enabling jemalloc (per my testing on stage, it works fine with OpenSSL enabled for the Go/jemalloc/BadgerDB versions used in this PR)

Before merging:

test on stage

cli/bootnode/boot_node.go

iurii-ssv · 2024-10-26T10:20:33Z

cli/config/config.go

 // Args expose available global args for cli command
 type Args struct {
-	ConfigPath      string
+	// ConfigPath is a path to main configuration file.
+	ConfigPath string
+	// ShareConfigPath is an additional config file (path) that (if present) will overwrite
+	// configuration supplied from config file at ConfigPath.
 	ShareConfigPath string
 }


Perhaps we could add another comment line explaining why we might need 2 config files (not just 1).
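One way to read the ShareConfigPath comment: the second file is decoded on top of the first, so it only overrides the keys it actually sets. Below is a minimal sketch of that layering, assuming JSON via encoding/json so it stays stdlib-only (the node's real config format and loader may differ); the Config fields and loadLayered are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical, trimmed-down config struct for illustration.
type Config struct {
	Network  string `json:"network"`
	LogLevel string `json:"log_level"`
	DBPath   string `json:"db_path"`
}

// loadLayered decodes base first and then overlay into the same struct.
// encoding/json leaves fields that are absent from overlay untouched, so
// overlay only overwrites the keys it actually contains.
func loadLayered(base, overlay []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(base, &cfg); err != nil {
		return Config{}, fmt.Errorf("decode base config: %w", err)
	}
	if len(overlay) > 0 { // the overlay file is optional
		if err := json.Unmarshal(overlay, &cfg); err != nil {
			return Config{}, fmt.Errorf("decode overlay config: %w", err)
		}
	}
	return cfg, nil
}

func main() {
	base := []byte(`{"network":"holesky","log_level":"info","db_path":"./data/db"}`)
	overlay := []byte(`{"log_level":"debug"}`)

	cfg, err := loadLayered(base, overlay)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg) // {Network:holesky LogLevel:debug DBPath:./data/db}
}
```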

cli/config/config.go

iurii-ssv · 2024-10-26T10:25:35Z

go.mod

-
-replace github.com/dgraph-io/ristretto => github.com/dgraph-io/ristretto v0.1.1-0.20211108053508-297c39e6640f


Not sure what this line is about, but I've upgraded BadgerDB to the latest release in this PR (which by extension upgrades ristretto to a newer version), hence we must remove it for the code to compile (or at least adjust it somehow).

nkryuchkov

Great job

nodeprobe/nodeprobe.go

Dockerfile

Makefile

cli/config/config.go

nodeprobe/nodeprobe.go

iurii-ssv · 2024-10-29T18:21:27Z

storage/kv/badger.go

+	out := z.CallocNoRef(1, "jemalloc check")
+	defer z.Free(out)
+	jemallocEnabled := len(out) > 0
+	logger.Debug("jemalloc allocator will be used", zap.Bool("jemalloc_enabled", jemallocEnabled))


This is pretty much the best way I found to check whether jemalloc is actually used.

Note that we print true for both -tags="blst_enabled,jemalloc" and -tags="blst_enabled,jemalloc,allocator", so it only reports on the presence of jemalloc (nothing about allocator).
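Why len(out) > 0 works as a probe: as far as I can tell, ristretto's z package selects its CallocNoRef implementation at build time (jemalloc build tag plus cgo), and the fallback version returns nil instead of allocating. A single file can't show build-tag selection, so this sketch mimics it with two hypothetical stand-in functions; it is not ristretto's actual code.

```go
package main

import "fmt"

// callocFunc mirrors the shape of z.CallocNoRef(n int, tag string) []byte.
type callocFunc func(n int, tag string) []byte

// jemallocCalloc is a hypothetical stand-in for the jemalloc-backed build:
// it really returns n bytes.
func jemallocCalloc(n int, tag string) []byte { return make([]byte, n) }

// fallbackCalloc is a hypothetical stand-in for the non-jemalloc build,
// which hands back no memory at all.
func fallbackCalloc(n int, tag string) []byte { return nil }

// jemallocEnabled probes the allocator with a 1-byte request: only the
// jemalloc-backed implementation returns a non-empty slice.
func jemallocEnabled(calloc callocFunc, free func([]byte)) bool {
	out := calloc(1, "jemalloc check")
	defer free(out)
	return len(out) > 0
}

func main() {
	noopFree := func([]byte) {} // stand-in for z.Free
	fmt.Println(jemallocEnabled(jemallocCalloc, noopFree)) // true
	fmt.Println(jemallocEnabled(fallbackCalloc, noopFree)) // false
}
```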

allocator is a "different strategy" for memory management. From what I understand, it serves as a kind of cache between the Go application and jemalloc, buffering large chunks of memory; hence it makes fewer cgo calls, but it holds onto large unused memory chunks, increasing overall memory consumption:

// Allocator amortizes the cost of small allocations by allocating memory in
// bigger chunks. Internally it uses z.Calloc to allocate memory. Once
// allocated, the memory is not moved, so it is safe to use the allocated bytes
// to unsafe cast them to Go struct pointers. Maintaining a freelist is slow.
// Instead, Allocator only allocates memory, with the idea that finally we
// would just release the entire Allocator.
type Allocator struct {
	sync.Mutex
	compIdx uint64 // Stores bufIdx in 32 MSBs and posIdx in 32 LSBs.
	buffers [][]byte
	Ref     uint64
	Tag     string
}
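To illustrate the trade-off described above, here is a minimal pure-Go bump allocator in the spirit of that Allocator comment: small requests are carved out of larger chunks, nothing is freed individually, and everything is released at once. It is a simplified sketch, not ristretto's z.Allocator; arena, chunkSize and the method names are hypothetical, it uses make rather than cgo/jemalloc, and unlike the real one (which embeds a sync.Mutex) it is not safe for concurrent use.

```go
package main

import "fmt"

// arena hands out small byte slices carved from larger chunks.
type arena struct {
	chunkSize int
	chunks    [][]byte // every chunk obtained from the underlying allocator
	pos       int      // next free offset in the last chunk
}

func newArena(chunkSize int) *arena { return &arena{chunkSize: chunkSize} }

// Allocate returns n zeroed bytes. A new chunk is requested only when the
// current one can't fit n, so many small allocations cost one underlying call.
func (a *arena) Allocate(n int) []byte {
	if len(a.chunks) == 0 || a.pos+n > len(a.chunks[len(a.chunks)-1]) {
		size := a.chunkSize
		if n > size {
			size = n // an oversized request gets a chunk of its own
		}
		a.chunks = append(a.chunks, make([]byte, size))
		a.pos = 0
	}
	last := a.chunks[len(a.chunks)-1]
	// Cap the capacity so appending to one slice can't overwrite its neighbor.
	out := last[a.pos : a.pos+n : a.pos+n]
	a.pos += n
	return out
}

// Reserved reports how many bytes the arena holds, used or not.
func (a *arena) Reserved() int {
	total := 0
	for _, c := range a.chunks {
		total += len(c)
	}
	return total
}

// Release drops all chunks at once; there is no per-allocation free.
func (a *arena) Release() {
	a.chunks = nil
	a.pos = 0
}

func main() {
	a := newArena(1024)
	for i := 0; i < 100; i++ {
		a.Allocate(10) // 100 small allocations, 1000 bytes in total
	}
	fmt.Println(len(a.chunks), a.Reserved()) // 1 1024
	a.Allocate(100)                          // doesn't fit in the remaining 24 bytes
	fmt.Println(len(a.chunks), a.Reserved()) // 2 2048
	a.Release()
}
```

The second printout shows the cost side: 2048 bytes stay reserved for 1100 bytes of data until Release is called.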

It's hard to tell without doing any tests (or without more context on how exactly we use Badger), but I think if we want to reduce the memory footprint as much as possible, we'd better go with -tags="blst_enabled,jemalloc".


Having done some testing on stage, I'd say there isn't a noticeable difference for our workloads (an operator managing ~500 validators); see #1643 (comment) for more details. So I've added the allocator tag as well, to match how it was originally configured when we first started using jemalloc.

Makefile

iurii-ssv · 2024-11-03T13:18:05Z

cli/operator/node.go

-	// load & parse local events yaml if exists, otherwise sync from contract
-	if len(cfg.LocalEventsPath) != 0 {
-		localEvents, err := localevents.Load(cfg.LocalEventsPath)
-		if err != nil {
-			logger.Fatal("failed to load local events", zap.Error(err))
-		}
+	// Sync historical registry events from Ethereum smart contract.
+	logger.Debug("syncing historical registry events", zap.Uint64("fromBlock", fromBlock.Uint64()))


This code is unreachable because setupEventHandling is only ever called when len(cfg.LocalEventsPath) == 0, which suggests we never call EventHandler.HandleLocalEvents... which is kind of weird?

I also don't see cfg.LocalEventsPath being used anywhere to actually read the contents of the file; I'm not sure whether such usage was removed in the past or just hasn't been added yet. @nkryuchkov, could you elaborate? I'd rather clean it up while we're at it (maybe by removing cfg.LocalEventsPath altogether if we don't need it anymore).

Just as additional context: the current behavior seems to be that we sync every Ethereum smart contract event that happened since the block reported by nodeStorage.GetLastProcessedBlock.

We always need to call setupEventHandling because it handles both on-chain and local events. cfg.LocalEventsPath is used by localevents.Load(cfg.LocalEventsPath).

But like I mentioned above, we never enter the branch that executes localevents.Load(cfg.LocalEventsPath) because it's "guarded" by two contradictory conditions:

len(cfg.LocalEventsPath) == 0

len(cfg.LocalEventsPath) != 0

@iurii-ssv In your PR we never do, because you call setupEventHandling only if !usingLocalEvents; but on stage we always call setupEventHandling, which calls localevents.Load(cfg.LocalEventsPath) if len(cfg.LocalEventsPath) != 0.

Oh, you are right ... I see what's going on. On stage the following happens:

eventSyncer := setupEventHandling(
	cmd.Context(),
	logger,
	executionClient,
	validatorCtrl,
	metricsReporter,
	networkConfig,
	nodeStorage,
	operatorDataStore,
	operatorPrivKey,
	keyManager,
)
if len(cfg.LocalEventsPath) == 0 {
	nodeProber.AddNode("event syncer", eventSyncer)
}

The fact that setupEventHandling returns eventSyncer baited me into thinking "there is no need to even call setupEventHandling if we aren't going to call nodeProber.AddNode("event syncer", eventSyncer)", so I merged the two under if len(cfg.LocalEventsPath) == 0.

Let me revert that part, and see how I can separate those things from each other (to make it clear & explicit).

Okay, I reworked this a bit (the relevant commit is 9900a0b) and the PR is ready, though I'll redeploy to stage to test it one more time.

What's not 100% clear to me, though, is what those local events are. I thought of them as some kind of cache, but that's unlikely to be correct because we don't "subscribe" to events that come after the last event in the cfg.LocalEventsPath file. Could you clarify (and I'll add a comment about it somewhere)?
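The untangled flow discussed in this thread comes down to a single branch on usingLocalEvents instead of two contradictory length checks. A minimal sketch of that decision logic, assuming the behavior described above (with a local events file, events are loaded from it; otherwise historical registry events are synced from the contract and the event syncer is registered with the node prober). The types and field names are hypothetical, and this is not the code from 9900a0b.

```go
package main

import "fmt"

// nodeConfig is a hypothetical, trimmed-down stand-in for the operator config.
type nodeConfig struct {
	LocalEventsPath string
}

// eventPlan records which event-handling steps startup would take.
type eventPlan struct {
	LoadLocalEvents  bool // read events from the local YAML file
	SyncHistorical   bool // sync registry events from the smart contract
	ProbeEventSyncer bool // register the event syncer with the node prober
}

// planEventHandling always runs (event handling is always set up) and
// branches exactly once, so no step is guarded by contradictory conditions.
func planEventHandling(cfg nodeConfig) eventPlan {
	usingLocalEvents := len(cfg.LocalEventsPath) != 0
	return eventPlan{
		LoadLocalEvents:  usingLocalEvents,
		SyncHistorical:   !usingLocalEvents,
		ProbeEventSyncer: !usingLocalEvents,
	}
}

func main() {
	fmt.Printf("%+v\n", planEventHandling(nodeConfig{LocalEventsPath: "events.yaml"}))
	// {LoadLocalEvents:true SyncHistorical:false ProbeEventSyncer:false}
	fmt.Printf("%+v\n", planEventHandling(nodeConfig{}))
	// {LoadLocalEvents:false SyncHistorical:true ProbeEventSyncer:true}
}
```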

cli/operator/node.go

iurii-ssv · 2024-11-04T13:02:45Z

Re-tested on stage for post-Alan-fork (previously tested for pre-Alan-fork).

cli/operator/node.go

nkryuchkov · 2024-11-07T19:01:18Z

cli/operator/node.go

-			cmd.Context(),
-			logger,
-			executionClient,
+		eventFilterer, err := executionClient.Filterer()


Can we try to extract some lines into a function/method/entity to avoid increasing the size of this function? It's quite large, and IMO we need to gradually decrease its size.

I agree that we want functions to be smaller (not larger) pretty much all the time, but that's not the only (or best) thing to optimize for; what matters more is that the general code structure stays predictable (easy to understand and navigate).

So in that sense, grouping related code together rather than splitting it apart (and possibly mixing it with unrelated things, i.e. mixing different abstraction layers) can be the better choice; the case above is a good example of why we sometimes want to "untangle" code.

Now that it's untangled, we can find a better way to refactor StartNodeCmd.Run, which I'm trying to do in #1843 (better to do it in a separate PR, because this one is already getting larger and larger; it's also better to do after we merge #1820, because that removes some code from there too).

cli/operator/node.go

nodeprobe/nodeprobe.go

nkryuchkov

LGTM, but I'd prefer not to increase the size of the code that initializes the node.

iurii-ssv commented Oct 26, 2024

cli/bootnode/boot_node.go (resolved)

iurii-ssv commented Oct 26, 2024

cli/config/config.go (resolved)

iurii-ssv commented Oct 26, 2024

nkryuchkov reviewed Oct 26, 2024

nodeprobe/nodeprobe.go (outdated, resolved)

Dockerfile (outdated, resolved)

Makefile (outdated, resolved)

cli/config/config.go (outdated, resolved)

nodeprobe/nodeprobe.go (outdated, resolved)

iurii-ssv commented Oct 29, 2024

iurii-ssv mentioned this pull request Oct 29, 2024

Re-enable Go RSA after Alan fork #1643

Open

iurii-ssv commented Oct 31, 2024

Makefile (resolved)

iurii-ssv commented Nov 3, 2024

iurii-ssv requested a review from nkryuchkov November 3, 2024 13:18

iurii-ssv force-pushed the node-startup-cleanup-plus-enable-jemalloc branch from 44e4e0c to 463fc76 November 3, 2024 13:22

nkryuchkov reviewed Nov 3, 2024

cli/operator/node.go (resolved)

nkryuchkov reviewed Nov 3, 2024

cli/operator/node.go (resolved)

nkryuchkov reviewed Nov 3, 2024

cli/operator/node.go (resolved)

nkryuchkov reviewed Nov 3, 2024

cli/operator/node.go (resolved)

iurii-ssv force-pushed the node-startup-cleanup-plus-enable-jemalloc branch from 463fc76 to 9900a0b November 3, 2024 19:10

nkryuchkov reviewed Nov 7, 2024

nkryuchkov approved these changes Nov 7, 2024

iurii-ssv mentioned this pull request Nov 8, 2024

node: simplify start node cmd #1843

Draft

1 task

iurii-ssv added 10 commits November 19, 2024 12:22

node: startup cleanup plus enable jemalloc

9edda31

remove allocator tag

40cd2d9

add jemalloc printout, disable jemalloc

113892d

enable jemalloc

1d7988c

enable jemaloc, fix misc issues

e0fd197

use allocator

dbfaa40

disable allocator

0fc7c7d

use allocator

6f993d9

adjust unit-test to account for added log-line

2283a34

rework evenySyncer initialization/startup

5cadbaa

address review comments

e5e652b

iurii-ssv force-pushed the node-startup-cleanup-plus-enable-jemalloc branch from ac23d12 to e5e652b November 19, 2024 10:23
