This repository has been archived by the owner on Sep 27, 2023. It is now read-only.

demo: multi thread key generation #79

Closed · wants to merge 4 commits
Conversation

@h4ck3rk3y (Contributor) commented Jul 28, 2023

This PR shows how we could multi-thread key generation. It is just a proof of concept: the code is rough and needs a rewrite.

Running 10 nodes with 10000 keys per node takes about 8 minutes 23 seconds to generate the keys. The previous implementation took about 55 seconds per node, i.e. roughly 9 minutes 10 seconds for all 10 nodes, so the improvement isn't as dramatic as I had hoped. I re-ran the previous approach on the same 10x10000 workload and it took 8 minutes 30 seconds. Perhaps this is because Python threads give IO parallelism rather than CPU parallelism; I'll try multiprocessing. Multiprocessing (latest commit) took 8 minutes 17 seconds. With 5 cores assigned I constantly hit 500% CPU whether I run single-threaded, multi-threaded, or multiprocessing; this is a compute-heavy operation, so I wonder how much we can really squeeze out by parallelizing.
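For reference, a minimal sketch of the fan-out, under the assumption that the script shells out to eth2-val-tools once per node (the command line mirrors the log further down; the function name and loop here are illustrative, not the PR's actual code). Worth noting: the subprocess does its work outside the Python interpreter, so the threads are not GIL-bound here, which is consistent with threading and multiprocessing landing within seconds of each other; both are capped by the 5 CPUs given to Docker.

import subprocess
from concurrent.futures import ThreadPoolExecutor

max_concurrent_threads = 10   # the knob mentioned below
keys_per_node = 10000
mnemonic = "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete"

def generate_node_keys(node_index):
    # Each node gets a disjoint key range derived from its index.
    start = node_index * keys_per_node
    stop = start + keys_per_node
    subprocess.run(
        [
            "eth2-val-tools", "keystores",
            "--insecure",
            "--prysm-pass", "password",
            "--out-loc", "/node-%d-keystores" % node_index,
            "--source-mnemonic", mnemonic,
            "--source-min", str(start),
            "--source-max", str(stop),
        ],
        check=True,
    )

# Threads suffice because the heavy lifting happens in the subprocess.
with ThreadPoolExecutor(max_workers=max_concurrent_threads) as pool:
    list(pool.map(generate_node_keys, range(10)))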

These numbers are from my M1 Pro (32 GB), with Docker given 5 CPUs and 11.65 GB of memory. Note that I have several other applications running, but you can expect a similar improvement on your end. Perhaps we can even add min_cpu && min_memory to the configuration when we launch this on K8s.

Another thing being considered: generating a bunch of keys ahead of time, letting a user upload a few files artifacts, and then passing the artifact ids to skip key generation altogether. This would have some limitations. Currently the following are configurable:

mnemonic - if we have a cache this won't be configurable
prysm-pass - if we have a cache this won't be configurable
num_keys_per_node - currently this is configurable; if we make it static, it no longer is. We could store different pre-built bundles somewhere, though, or offer a "bring your own keystore" API

@barnabasbusa Can you give this a spin and see if you get any improvements in your k8s workflow? Also, see how much you can play with max_concurrent_threads = 10 on line 44 in the Python script to find out how far you can push it. When I tried running all 50 nodes x 10000 keys together, it failed. Perhaps your k8s boxes are bigger and you can try a higher count.

Imagining the bring-your-own-bundle API a bit further, it could look like:

kurtosis files upload ./validator-keystore-0 --name velvet-underground
kurtosis files upload ./validator-keystore-9 --name greenday
  "participants": [
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "teku",
      "cl_client_image": "consensys/teku:develop",
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "count": 10
      "keystores": ["velvet-underground" ... "greenday"]
    }
  ],

How does this sound?

Relevant issue: #78

To try out this branch, run:

kurtosis run github.com/kurtosis-tech/eth-network-package@gyani/speed-up

@h4ck3rk3y (Contributor, Author)

@adschwartz fysa

@h4ck3rk3y (Contributor, Author)

{ "participants": [ { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "teku", "cl_client_image": "consensys/teku:develop", "bn_max_cpu": 3000, "bn_max_mem": 4096, "count": 10 }, { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "lighthouse", "cl_client_image": "sigp/lighthouse:latest", "bn_max_cpu": 3000, "bn_max_mem": 4096, "v_max_cpu": 3000, "v_max_mem": 2048, "count": 10 }, { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "lodestar", "cl_client_image": "chainsafe/lodestar:latest", "bn_max_cpu": 3000, "bn_max_mem": 4096, "v_max_cpu": 3000, "v_max_mem": 2048, "count": 10 }, { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "prysm", "cl_client_image": "prysmaticlabs/prysm-beacon-chain:latest,prysmaticlabs/prysm-validator:latest", "beacon_extra_params": ["--grpc-max-msg-size=18388608"], "validator_extra_params": ["--grpc-max-msg-size=18388608"], "bn_max_cpu": 3000, "bn_max_mem": 4096, "v_max_cpu": 3000, "v_max_mem": 2048, "count": 10 }, { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "nimbus", "cl_client_image": "statusim/nimbus-eth2:amd64-latest", "v_max_cpu": 3000, "v_max_mem": 2048, "count": 10 } ], "network_params": { "network_id": "3151908", "deposit_contract_address": "0x4242424242424242424242424242424242424242", "seconds_per_slot": 12, "slots_per_epoch": 32, "genesis_delay": 1800, "capella_fork_epoch": 2, "deneb_fork_epoch": 1000, "num_validator_keys_per_node": 10000, "preregistered_validator_keys_mnemonic": "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete" }, "launch_additional_services": true, "wait_for_finalization": false, "wait_for_verifications": false, "verifications_epoch_limit": 5, "global_client_log_level": "info" }

@barnabasbusa (Collaborator)

I had an idea today which may or may not work well: instead of using one container to generate all the keystore files, would it be possible to spin up n*m pods/containers and let each node (of a cluster) generate its own set of keys, decreasing the total time spent on key generation?

Is this something that could be done? There is probably no benefit on a Docker system, but I would expect very big gains on large deployments where we have 50-100 nodes.
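Sketching what that might look like in the package's Starlark, under the assumption of an image that ships eth2-val-tools; the image tag, service names, and helper below are hypothetical, not the package's actual code, and in practice the plan would also need to wait on each generator and collect the output directories as files artifacts.

# Rough Kurtosis Starlark sketch: one key-generation service per node,
# each working on a disjoint key range derived from its index.
def launch_keystore_generators(plan, num_nodes, keys_per_node, mnemonic):
    for index in range(num_nodes):
        start = index * keys_per_node
        stop = start + keys_per_node
        plan.add_service(
            name = "keystore-gen-" + str(index),
            config = ServiceConfig(
                image = "ethpandaops/eth2-val-tools:latest",  # assumed image
                entrypoint = ["eth2-val-tools"],
                cmd = [
                    "keystores", "--insecure",
                    "--prysm-pass", "password",
                    "--out-loc", "/node-" + str(index) + "-keystores",
                    "--source-mnemonic", mnemonic,
                    "--source-min", str(start),
                    "--source-max", str(stop),
                ],
            ),
        )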

@barnabasbusa (Collaborator)

Running the following test on an 8-node cluster:

{
  "participants": [
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "teku",
      "cl_client_image": "consensys/teku:develop",
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "count": 3
    },
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "lighthouse",
      "cl_client_image": "sigp/lighthouse:latest",
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "v_max_cpu": 3000,
      "v_max_mem": 2048,
      "count": 1
    },
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "lodestar",
      "cl_client_image": "chainsafe/lodestar:latest",
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "v_max_cpu": 3000,
      "v_max_mem": 2048,
      "count": 1
    },
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "prysm",
      "cl_client_image": "prysmaticlabs/prysm-beacon-chain:latest,prysmaticlabs/prysm-validator:latest",
      "beacon_extra_params": ["--grpc-max-msg-size=18388608"],
      "validator_extra_params": ["--grpc-max-msg-size=18388608"],
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "v_max_cpu": 3000,
      "v_max_mem": 2048,
      "count": 1
    },
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "nimbus",
      "cl_client_image": "statusim/nimbus-eth2:amd64-latest",
      "v_max_cpu": 3000,
      "v_max_mem": 2048,
      "count": 1
    }
  ],
  "network_params": {
    "network_id": "3151908",
    "deposit_contract_address": "0x4242424242424242424242424242424242424242",
    "seconds_per_slot": 12,
    "slots_per_epoch": 32,
    "genesis_delay": 1800,
    "capella_fork_epoch": 2,
    "deneb_fork_epoch": 1000,
    "num_validator_keys_per_node": 10000,
    "preregistered_validator_keys_mnemonic": "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete"
  },
  "launch_additional_services": true,
  "wait_for_finalization": false,
  "wait_for_verifications": false,
  "verifications_epoch_limit": 5,
  "global_client_log_level": "info"
}

Results in the following error:

Command returned with exit code '0' and the following output:
--------------------
starting at time.struct_time(tm_year=2023, tm_mon=7, tm_mday=31, tm_hour=11, tm_min=5, tm_sec=39, tm_wday=0, tm_yday=212, tm_isdst=0)
executing eth2-val-tools keystores --insecure --prysm-pass password --out-loc /node-3-keystores --source-mnemonic "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete" --source-min 30000 --source-max 40000
Error occurred while executing: eth2-val-tools keystores --insecure --prysm-pass password --out-loc /node-3-keystores --source-mnemonic "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete" --source-min 30000 --source-max 40000

Error output:
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion

runtime stack:
runtime.throw({0x73a5fa?, 0x47dd80?})
	/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7fbf95511cb8 sp=0x7fbf95511c88 pc=0x451e1d
runtime.checkmcount()
	/usr/local/go/src/runtime/proc.go:766 +0x8c fp=0x7fbf95511ce0 sp=0x7fbf95511cb8 pc=0x455b2c
runtime.mReserveID()
	/usr/local/go/src/runtime/proc.go:782 +0x36 fp=0x7fbf95511d08 sp=0x7fbf95511ce0 pc=0x455b76
runtime.startm(0xc00002cf00, 0x0)
	/usr/local/go/src/runtime/proc.go:2318 +0x92 fp=0x7fbf95511d50 sp=0x7fbf95511d08 pc=0x4589d2
runtime.handoffp(0xffffffff?)
	/usr/local/go/src/runtime/proc.go:2361 +0x2ee fp=0x7fbf95511d78 sp=0x7fbf95511d50 pc=0x458eee
runtime.retake(0x1cc3f15dd4)
	/usr/local/go/src/runtime/proc.go:5351 +0x1d5 fp=0x7fbf95511db8 sp=0x7fbf95511d78 pc=0x45fc75
runtime.sysmon()
	/usr/local/go/src/runtime/proc.go:5259 +0x325 fp=0x7fbf95511e28 sp=0x7fbf95511db8 pc=0x45f9a5
runtime.mstart1()
	/usr/local/go/src/runtime/proc.go:1426 +0x93 fp=0x7fbf95511e50 sp=0x7fbf95511e28 pc=0x457313
runtime.mstart0()
	/usr/local/go/src/runtime/proc.go:1383 +0x79 fp=0x7fbf95511e80 sp=0x7fbf95511e50 pc=0x457259
runtime.mstart()
	/usr/local/go/src/runtime/asm_amd64.s:390 +0x5 fp=0x7fbf95511e88 sp=0x7fbf95511e80 pc=0x47dd85

@barnabasbusa (Collaborator)

Fix for the keystore_stop_index: 6db2e07

@barnabasbusa (Collaborator)

It looks like the secret files are no longer populated on the pods.
cp: cannot stat '/validator-keys/node-1-keystores/teku-secrets': No such file or directory
teku-secrets is an empty dir.

@h4ck3rk3y (Contributor, Author)

Interesting. I only tried Docker; I haven't tried this on k8s.

I'm away till Wednesday (back Thursday), but I really like your multi-node key generation idea. That scales well, and we could get the time down to minutes, especially if we run one container per node for which we need to generate keystores.

@h4ck3rk3y closed this Aug 3, 2023
@h4ck3rk3y (Contributor, Author) commented Aug 3, 2023

I have implemented #82 (a container per keystore generator).

I closed this PR in favor of that.

@h4ck3rk3y deleted the gyani/speed-up branch September 21, 2023 09:11