This repository has been archived by the owner on Sep 27, 2023. It is now read-only.

demo: multi thread key generation #79

Closed · wants to merge 4 commits
Conversation

@h4ck3rk3y (Contributor) commented Jul 28, 2023

This PR shows how we could multi-thread key generation. It is just a proof of concept: the code is rough and needs a rewrite.

Running 10 nodes with 10000 keys per node takes about 8 minutes 23 seconds to generate the keys. The previous implementation took about 55 seconds per node, i.e. roughly 9 minutes 10 seconds for all 10 nodes, so the improvement isn't as dramatic as I had hoped. I re-ran the previous approach on the same 10x10000 workload and it took 8 minutes 30 seconds. Perhaps this is because Python threads give IO parallelism rather than CPU parallelism; I'll try multiprocessing. Multiprocessing (latest commit) took 8 minutes 17 seconds. With 5 cores assigned I constantly hit 500% CPU whether I run single-threaded, multi-threaded, or multiprocessing; this is a compute-heavy operation, so I wonder how much we can really squeeze out by parallelizing.
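For reference, a minimal sketch of the fan-out, under the assumption that the script shells out to eth2-val-tools once per node (the command line mirrors the log further down; the function name and loop here are illustrative, not the PR's actual code). Worth noting: the subprocess does its work outside the Python interpreter, so the threads are not GIL-bound here, which is consistent with threading and multiprocessing landing within seconds of each other; both are capped by the 5 CPUs given to Docker.

import subprocess
from concurrent.futures import ThreadPoolExecutor

max_concurrent_threads = 10   # the knob mentioned below
keys_per_node = 10000
mnemonic = "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete"

def generate_node_keys(node_index):
    # Each node gets a disjoint key range derived from its index.
    start = node_index * keys_per_node
    stop = start + keys_per_node
    subprocess.run(
        [
            "eth2-val-tools", "keystores",
            "--insecure",
            "--prysm-pass", "password",
            "--out-loc", "/node-%d-keystores" % node_index,
            "--source-mnemonic", mnemonic,
            "--source-min", str(start),
            "--source-max", str(stop),
        ],
        check=True,
    )

# Threads suffice because the heavy lifting happens in the subprocess.
with ThreadPoolExecutor(max_workers=max_concurrent_threads) as pool:
    list(pool.map(generate_node_keys, range(10)))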

These numbers are from my M1 Pro (32 GB), with Docker given 5 CPUs and 11.65 GB of memory. Note that I have several other applications running, but you can expect a similar improvement on your end. Perhaps we can even add min_cpu && min_memory to the configuration when we launch this on K8s.

Another thing being considered: generating a bunch of keys ahead of time, letting a user upload a few files artifacts, and then passing the artifact ids to skip key generation altogether. This would have some limitations. Currently the following are configurable:

mnemonic - if we have a cache this won't be configurable
prysm-pass - if we have a cache this won't be configurable
num_keys_per_node - currently this is configurable; if we make it static, it no longer is. We could store different pre-built bundles somewhere, though, or offer a "bring your own keystore" API

@barnabasbusa Can you give this a spin and see if you get any improvements in your k8s workflow? Also, see how much you can play with max_concurrent_threads = 10 on line 44 in the Python script to find out how far you can push it. When I tried running all 50 nodes x 10000 keys together, it failed. Perhaps your k8s boxes are bigger and you can try a higher count.

Imagining the bring-your-own-bundle API a bit further, it could look like:

kurtosis files upload ./validator-keystore-0 --name velvet-underground
kurtosis files upload ./validator-keystore-9 --name greenday
  "participants": [
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "teku",
      "cl_client_image": "consensys/teku:develop",
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "count": 10
      "keystores": ["velvet-underground" ... "greenday"]
    }
  ],

How does this sound?

Relevant issue: #78

To try out this branch, run:

kurtosis run github.com/kurtosis-tech/eth-network-package@gyani/speed-up

@h4ck3rk3y (Contributor, Author)

@adschwartz fysa

@h4ck3rk3y (Contributor, Author)

{ "participants": [ { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "teku", "cl_client_image": "consensys/teku:develop", "bn_max_cpu": 3000, "bn_max_mem": 4096, "count": 10 }, { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "lighthouse", "cl_client_image": "sigp/lighthouse:latest", "bn_max_cpu": 3000, "bn_max_mem": 4096, "v_max_cpu": 3000, "v_max_mem": 2048, "count": 10 }, { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "lodestar", "cl_client_image": "chainsafe/lodestar:latest", "bn_max_cpu": 3000, "bn_max_mem": 4096, "v_max_cpu": 3000, "v_max_mem": 2048, "count": 10 }, { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "prysm", "cl_client_image": "prysmaticlabs/prysm-beacon-chain:latest,prysmaticlabs/prysm-validator:latest", "beacon_extra_params": ["--grpc-max-msg-size=18388608"], "validator_extra_params": ["--grpc-max-msg-size=18388608"], "bn_max_cpu": 3000, "bn_max_mem": 4096, "v_max_cpu": 3000, "v_max_mem": 2048, "count": 10 }, { "el_client_type": "geth", "el_client_image": "ethpandaops/geth:master", "cl_client_type": "nimbus", "cl_client_image": "statusim/nimbus-eth2:amd64-latest", "v_max_cpu": 3000, "v_max_mem": 2048, "count": 10 } ], "network_params": { "network_id": "3151908", "deposit_contract_address": "0x4242424242424242424242424242424242424242", "seconds_per_slot": 12, "slots_per_epoch": 32, "genesis_delay": 1800, "capella_fork_epoch": 2, "deneb_fork_epoch": 1000, "num_validator_keys_per_node": 10000, "preregistered_validator_keys_mnemonic": "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete" }, "launch_additional_services": true, "wait_for_finalization": false, "wait_for_verifications": false, "verifications_epoch_limit": 5, "global_client_log_level": "info" }

@barnabasbusa (Collaborator)

I had an idea today which may or may not work well: instead of using one container to generate all the keystore files, would it be possible to spin up n*m pods/containers and let each node (of a cluster) generate its own set of keys, decreasing the total time spent on key generation?

Is this something that could be done? There is probably no benefit on a Docker system, but I would expect very big gains on large deployments where we have 50-100 nodes.
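Sketching what that might look like in the package's Starlark, under the assumption of an image that ships eth2-val-tools; the image tag, service names, and helper below are hypothetical, not the package's actual code, and in practice the plan would also need to wait on each generator and collect the output directories as files artifacts.

# Rough Kurtosis Starlark sketch: one key-generation service per node,
# each working on a disjoint key range derived from its index.
def launch_keystore_generators(plan, num_nodes, keys_per_node, mnemonic):
    for index in range(num_nodes):
        start = index * keys_per_node
        stop = start + keys_per_node
        plan.add_service(
            name = "keystore-gen-" + str(index),
            config = ServiceConfig(
                image = "ethpandaops/eth2-val-tools:latest",  # assumed image
                entrypoint = ["eth2-val-tools"],
                cmd = [
                    "keystores", "--insecure",
                    "--prysm-pass", "password",
                    "--out-loc", "/node-" + str(index) + "-keystores",
                    "--source-mnemonic", mnemonic,
                    "--source-min", str(start),
                    "--source-max", str(stop),
                ],
            ),
        )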

@barnabasbusa (Collaborator)

Running the following test on an 8-node cluster:

{
  "participants": [
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "teku",
      "cl_client_image": "consensys/teku:develop",
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "count": 3
    },
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "lighthouse",
      "cl_client_image": "sigp/lighthouse:latest",
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "v_max_cpu": 3000,
      "v_max_mem": 2048,
      "count": 1
    },
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "lodestar",
      "cl_client_image": "chainsafe/lodestar:latest",
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "v_max_cpu": 3000,
      "v_max_mem": 2048,
      "count": 1
    },
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "prysm",
      "cl_client_image": "prysmaticlabs/prysm-beacon-chain:latest,prysmaticlabs/prysm-validator:latest",
      "beacon_extra_params": ["--grpc-max-msg-size=18388608"],
      "validator_extra_params": ["--grpc-max-msg-size=18388608"],
      "bn_max_cpu": 3000,
      "bn_max_mem": 4096,
      "v_max_cpu": 3000,
      "v_max_mem": 2048,
      "count": 1
    },
    {
      "el_client_type": "geth",
      "el_client_image": "ethpandaops/geth:master",
      "cl_client_type": "nimbus",
      "cl_client_image": "statusim/nimbus-eth2:amd64-latest",
      "v_max_cpu": 3000,
      "v_max_mem": 2048,
      "count": 1
    }
  ],
  "network_params": {
    "network_id": "3151908",
    "deposit_contract_address": "0x4242424242424242424242424242424242424242",
    "seconds_per_slot": 12,
    "slots_per_epoch": 32,
    "genesis_delay": 1800,
    "capella_fork_epoch": 2,
    "deneb_fork_epoch": 1000,
    "num_validator_keys_per_node": 10000,
    "preregistered_validator_keys_mnemonic": "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete"
  },
  "launch_additional_services": true,
  "wait_for_finalization": false,
  "wait_for_verifications": false,
  "verifications_epoch_limit": 5,
  "global_client_log_level": "info"
}

Results in the following error:

Command returned with exit code '0' and the following output:
--------------------
starting at time.struct_time(tm_year=2023, tm_mon=7, tm_mday=31, tm_hour=11, tm_min=5, tm_sec=39, tm_wday=0, tm_yday=212, tm_isdst=0)
executing eth2-val-tools keystores --insecure --prysm-pass password --out-loc /node-3-keystores --source-mnemonic "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete" --source-min 30000 --source-max 40000
Error occurred while executing: eth2-val-tools keystores --insecure --prysm-pass password --out-loc /node-3-keystores --source-mnemonic "giant issue aisle success illegal bike spike question tent bar rely arctic volcano long crawl hungry vocal artwork sniff fantasy very lucky have athlete" --source-min 30000 --source-max 40000

Error output:
runtime: program exceeds 10000-thread limit
fatal error: thread exhaustion

runtime stack:
runtime.throw({0x73a5fa?, 0x47dd80?})
	/usr/local/go/src/runtime/panic.go:1047 +0x5d fp=0x7fbf95511cb8 sp=0x7fbf95511c88 pc=0x451e1d
runtime.checkmcount()
	/usr/local/go/src/runtime/proc.go:766 +0x8c fp=0x7fbf95511ce0 sp=0x7fbf95511cb8 pc=0x455b2c
runtime.mReserveID()
	/usr/local/go/src/runtime/proc.go:782 +0x36 fp=0x7fbf95511d08 sp=0x7fbf95511ce0 pc=0x455b76
runtime.startm(0xc00002cf00, 0x0)
	/usr/local/go/src/runtime/proc.go:2318 +0x92 fp=0x7fbf95511d50 sp=0x7fbf95511d08 pc=0x4589d2
runtime.handoffp(0xffffffff?)
	/usr/local/go/src/runtime/proc.go:2361 +0x2ee fp=0x7fbf95511d78 sp=0x7fbf95511d50 pc=0x458eee
runtime.retake(0x1cc3f15dd4)
	/usr/local/go/src/runtime/proc.go:5351 +0x1d5 fp=0x7fbf95511db8 sp=0x7fbf95511d78 pc=0x45fc75
runtime.sysmon()
	/usr/local/go/src/runtime/proc.go:5259 +0x325 fp=0x7fbf95511e28 sp=0x7fbf95511db8 pc=0x45f9a5
runtime.mstart1()
	/usr/local/go/src/runtime/proc.go:1426 +0x93 fp=0x7fbf95511e50 sp=0x7fbf95511e28 pc=0x457313
runtime.mstart0()
	/usr/local/go/src/runtime/proc.go:1383 +0x79 fp=0x7fbf95511e80 sp=0x7fbf95511e50 pc=0x457259
runtime.mstart()
	/usr/local/go/src/runtime/asm_amd64.s:390 +0x5 fp=0x7fbf95511e88 sp=0x7fbf95511e80 pc=0x47dd85

@barnabasbusa (Collaborator)

Fix for the keystore_stop_index: 6db2e07

@barnabasbusa (Collaborator)

It looks like the secret files are no longer populated on the pods.
cp: cannot stat '/validator-keys/node-1-keystores/teku-secrets': No such file or directory
teku-secrets is an empty dir.

@h4ck3rk3y (Contributor, Author)

Interesting. I only tried Docker; I haven't tried this on k8s.

I'm away till Wednesday (back Thursday), but I really like your multi-node key generation idea. That scales well, and we could get the time down to minutes, especially if we run one container per node for which we need to generate keystores.

@h4ck3rk3y closed this Aug 3, 2023
@h4ck3rk3y (Contributor, Author) commented Aug 3, 2023

I have implemented #82 (a container per keystore generator).

I closed this PR in favor of that.

@h4ck3rk3y deleted the gyani/speed-up branch September 21, 2023 09:11