Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node: startup cleanup plus enable jemalloc #1819

Open
wants to merge 11 commits into
base: stage
Choose a base branch
from

Conversation

iurii-ssv
Copy link
Contributor

@iurii-ssv iurii-ssv commented Oct 26, 2024

This PR contains two different features (I bundled these together so I could test both extensively on stage in one fell swoop):

  • while I've been trying to run SSV node in Docker locally I've encountered a bunch of minor issues (mostly with the way config parsing is handled), this PR addresses all of these
  • closes Re-enable Go RSA after Alan fork #1643 by enabling jemalloc (it works fine with openssl on for go/jemalloc/badgerdb version in this PR, as per my testing done on stage)

Before merging:

  • test on stage

Comment on lines 10 to 17
// Args expose available global args for cli command
type Args struct {
ConfigPath string
// ConfigPath is a path to main configuration file.
ConfigPath string
// ShareConfigPath is an additional config file (path) that (if present) will overwrite
// configuration supplied from config file at ConfigPath.
ShareConfigPath string
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could add another comment line explaining why we might need 2 config files (not just 1).

Comment on lines -245 to -247

replace github.com/dgraph-io/ristretto => github.com/dgraph-io/ristretto v0.1.1-0.20211108053508-297c39e6640f
Copy link
Contributor Author

@iurii-ssv iurii-ssv Oct 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what this line is about,

I've upgraded badgerdb to the latest release in this PR (which upgrades ristretto by extension to a newer version) - hence we must remove it for code to compile (or at least adjust somehow).

Copy link
Contributor

@nkryuchkov nkryuchkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job

nodeprobe/nodeprobe.go Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
cli/config/config.go Outdated Show resolved Hide resolved
nodeprobe/nodeprobe.go Outdated Show resolved Hide resolved
Comment on lines 95 to 98
out := z.CallocNoRef(1, "jemalloc check")
defer z.Free(out)
jemallocEnabled := len(out) > 0
logger.Debug("jemalloc allocator will be used", zap.Bool("jemalloc_enabled", jemallocEnabled))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty much the best way I found to check if jemalloc is actually used or not,

note, we print true both for -tags="blst_enabled,jemalloc" and -tags="blst_enabled,jemalloc,allocator", and hence it reports on the presence of jemalloc-only (nothing about allocator),

allocator is a "different strategy" to approach memory management (from what I understand, it serves as a kind of cache between go application and jemalloc buffering large chunks of memory - hence it will make less cgo calls but will be holding onto large unused memory chunks increasing overall memory consumption):

// Allocator amortizes the cost of small allocations by allocating memory in
// bigger chunks.  Internally it uses z.Calloc to allocate memory. Once
// allocated, the memory is not moved, so it is safe to use the allocated bytes
// to unsafe cast them to Go struct pointers. Maintaining a freelist is slow.
// Instead, Allocator only allocates memory, with the idea that finally we
// would just release the entire Allocator.
type Allocator struct {
	sync.Mutex
	compIdx uint64 // Stores bufIdx in 32 MSBs and posIdx in 32 LSBs.
	buffers [][]byte
	Ref     uint64
	Tag     string
}

It's hard to tell without doing any tests (or me not having enough context on how exactly we use Badger), but I think if we want to reduce memory footprint as much as possible - we better go with -tags="blst_enabled,jemalloc".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to tell without doing any tests (or me not having enough context on how exactly we use Badger), but I think if we want to reduce memory footprint as much as possible - we better go with -tags="blst_enabled,jemalloc"

Having done some testing on stage I would say there isn't a noticeable difference for the workloads we do (for operator managing ~500 validators), see #1643 (comment) for more details. So I've added allocator tag as well to match how it originally was configured back when we first started using jemalloc.

Makefile Show resolved Hide resolved
Comment on lines 724 to 730
// load & parse local events yaml if exists, otherwise sync from contract
if len(cfg.LocalEventsPath) != 0 {
localEvents, err := localevents.Load(cfg.LocalEventsPath)
if err != nil {
logger.Fatal("failed to load local events", zap.Error(err))
}
// Sync historical registry events from Ethereum smart contract.
logger.Debug("syncing historical registry events", zap.Uint64("fromBlock", fromBlock.Uint64()))
Copy link
Contributor Author

@iurii-ssv iurii-ssv Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is unreachable because setupEventHandling is only ever called when len(cfg.LocalEventsPath) == 0. Which kind of suggests we are never calling EventHandler.HandleLocalEvents ... which is kind of weird ?

I also don't see cfg.LocalEventsPath being used anywhere to actually read the contents of the file, not sure if such usage was removed in the past or just hasn't been added yet, @nkryuchkov could you elaborate ? I'd rather clean it up while we are at it (by maybe removing cfg.LocalEventsPath altogether if we don't need it anymore).

Just as additional context - the current behavior seems to be that we are synching every Ethereum smart contract event that happened since block reported by nodeStorage.GetLastProcessedBlock.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We always need to call setupEventHandling because it handles both on-chain and local events. cfg.LocalEventsPath is used by localevents.Load(cfg.LocalEventsPath)

Copy link
Contributor Author

@iurii-ssv iurii-ssv Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cfg.LocalEventsPath is used by localevents.Load(cfg.LocalEventsPath)

But like I mentioned above, we never enter the branch that executes localevents.Load(cfg.LocalEventsPath) because it's "guarded" by 2 contradicting conditions:

  1. len(cfg.LocalEventsPath) == 0
  2. len(cfg.LocalEventsPath) != 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iurii-ssv in your PR we never do because you call setupEventHandling only if !usingLocalEvents, but on stage we always call setupEventHandling, which calls localevents.Load(cfg.LocalEventsPath) if len(cfg.LocalEventsPath) != 0

Copy link
Contributor Author

@iurii-ssv iurii-ssv Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you are right ... I see what's going on. On stage the following happens:

		eventSyncer := setupEventHandling(
			cmd.Context(),
			logger,
			executionClient,
			validatorCtrl,
			metricsReporter,
			networkConfig,
			nodeStorage,
			operatorDataStore,
			operatorPrivKey,
			keyManager,
		)
		if len(cfg.LocalEventsPath) == 0 {
			nodeProber.AddNode("event syncer", eventSyncer)
		}

And setupEventHandling returning eventSyncer baited me into thinking "there is no need to even call setupEventHandling if we aren't gonna do nodeProber.AddNode("event syncer", eventSyncer)" - and I merged these 2 under if len(cfg.LocalEventsPath) == 0.

Let me revert that part, and see how I can separate those things from each other (to make it clear & explicit).

Copy link
Contributor Author

@iurii-ssv iurii-ssv Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, reworked this a bit, relevant commit is here 9900a0b and PR is ready (but I'll redeploy to stage to test it again 1 more time).

What's not 100% clear to me though is "what those local events are", I thought of it as some kind of cache - but that's unlikely to be correct because we don't "subscribe" for events that come after the last event in cfg.LocalEventsPath file, could you clarify (and I'll add a comment about it somewhere) ?

@iurii-ssv iurii-ssv force-pushed the node-startup-cleanup-plus-enable-jemalloc branch from 44e4e0c to 463fc76 Compare November 3, 2024 13:22
@iurii-ssv iurii-ssv force-pushed the node-startup-cleanup-plus-enable-jemalloc branch from 463fc76 to 9900a0b Compare November 3, 2024 19:10
@iurii-ssv
Copy link
Contributor Author

Re-tested on stage for post-Alan-fork (previuosly tested for pre-Alan-fork).

cli/operator/node.go Show resolved Hide resolved
cmd.Context(),
logger,
executionClient,
eventFilterer, err := executionClient.Filterer()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we try to extract some lines to a function/method/entity to avoid increasing the size of this function? It's quite large and IMO we need to gradually decrease its size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we want function size to be smaller (not larger) pretty much all the time, but this is not the only/best thing to optimize for - the more important thing is that general code structure stays predictable (easy to understand and navigate),

so in that sense grouping related code together rather than splitting it apart (possibly mixing it with unrelated things - aka mixing different abstraction layers) could be a better choice - and above is a good example of why we might want to "untangle" code sometimes

and now that it is untangled - we can find a better way to refactor StartNodeCmd.Run - which I'm trying to do in #1843 (better do it in a separate PR cause this one is getting larger and larger already, plus it's also better to do after we merge #1820 cause it removes some code from there too)

cli/operator/node.go Outdated Show resolved Hide resolved
nodeprobe/nodeprobe.go Outdated Show resolved Hide resolved
Copy link
Contributor

@nkryuchkov nkryuchkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'd prefer not to increase the size of code initializing the node

@iurii-ssv iurii-ssv mentioned this pull request Nov 8, 2024
1 task
@iurii-ssv iurii-ssv force-pushed the node-startup-cleanup-plus-enable-jemalloc branch from ac23d12 to e5e652b Compare November 19, 2024 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants