-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor node provision #2322
base: main
Are you sure you want to change the base?
Refactor node provision #2322
Conversation
* prompt addresses * fix lint * fix lint * fix lint * fix lint * add flags * address comments * fix test * address comments * fix lint --------- Signed-off-by: sukantoraymond <[email protected]>
* prompt addresses * fix lint * fix lint * fix lint * fix lint * add flags * validator prompt * bootstrap validators * add mock * fix lint * update prompt validator * update prompt validator * implement prompts * fix prompt mock * update sidecar * update prompts * update prompts * refactor * update prompts * update prompts * use default balance and weight * generate new node ids and bls * fix lint * add flags * add flags * address comments * address comments * address comments * move validator prompt to deploy * address comments * address comments * fix lint * address comments * address comments * fix test * address comments * fix lint * address coments * address coments * fix test * fix test * fix test * fix lint --------- Signed-off-by: sukantoraymond <[email protected]>
* adding teleporter contract to genesis * fixes for icm on genesis * add icm from a given embedded file * add function to deploy ICM registry * creating mapping values * add consts * make local deploy to work * do not print info for icm support addrs * nit * nit * fix wiz stuff * address PR comments * filter based on balance * add unit tests * Update cmd/blockchaincmd/describe.go Co-authored-by: Michael Kaplan <[email protected]> Signed-off-by: felipemadero <[email protected]> * lint --------- Signed-off-by: felipemadero <[email protected]> Co-authored-by: Michael Kaplan <[email protected]>
* prompt addresses * fix lint * fix lint * fix lint * fix lint * add flags * adding teleporter contract to genesis * validator prompt * bootstrap validators * add mock * fix lint * update prompt validator * update prompt validator * fixes for icm on genesis * add icm from a given embedded file * add function to deploy ICM registry * creating mapping values * add consts * make local deploy to work * do not print info for icm support addrs * nit * implement prompts * fix prompt mock * update sidecar * update prompts * update prompts * refactor * nit * update prompts * fix wiz stuff * update prompts * use default balance and weight * generate new node ids and bls * fix lint * add flags * add flags * add a function to initialize PoA manager * add lib folder * address comments * address comments * address comments * move validator prompt to deploy * address comments * address comments * fix lint * address comments * address comments * fix test * address PR comments * basic PoA manager setting * lint * address comments * fix lint * fix e2e * fix merge * fix merge * filter based on balance * add unit tests * nit * address PR comments * Update cmd/blockchaincmd/describe.go Co-authored-by: Michael Kaplan <[email protected]> Signed-off-by: felipemadero <[email protected]> * nit * address PR comments * lint --------- Signed-off-by: sukantoraymond <[email protected]> Signed-off-by: felipemadero <[email protected]> Co-authored-by: Raymond Sukanto <[email protected]> Co-authored-by: Michael Kaplan <[email protected]>
* wip * go mod tidy * use awm-relayer@main * fixes * use latest awm-relayer pkg which caused upgrade on coreth and avago - has to be resolved * use latest anr and rm prometheus metrics * check for errors * lint * mv to sdk/interchain * some refactoring and added unittests * fix license * fix unittest for now * fix lint * allow empty subnetID add usage working example * rm error msg from example * refactor to address felipe feedback * we should remove this from the test in favour of https://github.com/ava-labs/awm-relayer/blob/main/signature-aggregator/aggregator/aggregator_test.go - we just use exposed API * update readme example * rm goconst fix CI * put `- goconst` back * resolve conficts --------- Co-authored-by: Raymond Sukanto <[email protected]>
* use latest anr main * tidy * bump e2e avago version to be dynamic fee compatible * fix e2e for dynamic fees * lint
* prompt addresses * fix lint * fix lint * fix lint * fix lint * add flags * adding teleporter contract to genesis * validator prompt * bootstrap validators * add mock * fix lint * update prompt validator * update prompt validator * fixes for icm on genesis * add icm from a given embedded file * add function to deploy ICM registry * creating mapping values * add consts * make local deploy to work * do not print info for icm support addrs * nit * implement prompts * fix prompt mock * update sidecar * update prompts * update prompts * refactor * nit * update prompts * fix wiz stuff * update prompts * use default balance and weight * generate new node ids and bls * fix lint * add flags * add flags * add a function to initialize PoA manager * add lib folder * address comments * address comments * address comments * move validator prompt to deploy * address comments * address comments * fix lint * address comments * address comments * fix test * address PR comments * basic PoA manager setting * lint * address comments * fix lint * fix e2e * fix merge * fix merge * convert Subnet Tx * convert subnet * register validator * add validator * mock * remove validator * update weight * update weight * add flags add validator * remove validator update * refactor addvalidator * use node id as arg in addvalidator * fix change weight arg * resolve merge conflict * fix lint * address comments * fix lint * Enable non sov subnets (#2203) * subnet create non sov * support non sov blockchain * update test * update tests * update tests * update test * update tests * update tests * fix test * update tests * Rename Subnet to L1 for new commands ACP77 (#2209) * rename convert subnet * rename to l1 * fix lint * fix lint * fix lint * fix lint * Integrate ConvertSubnetTx AvalancheGo (#2213) * complete implementation of avalanche go convertsubnet * update relayer * update convertsubnettx * update avalanchego * update avalanche go * update avalanche go * dont prompt control key for sov * add tx cases for convert subnet tx (#2218) * Poa integration (#2212) * use latest contract * point to latest convert subnet avago PR * nit * use fixed anr * nit * add subnet conversion id calculation * created unsigned warp message * fixed local network deploy * added signing * now this is working * using latest avago master + awm relayer * working pretty well * ready * lint * address PR comments * nit --------- Signed-off-by: sukantoraymond <[email protected]> Co-authored-by: sukantoraymond <[email protected]> * fix lint * reverse sidecar sov logic * fix merge * add poa setup to blockchain deploy (#2219) * add setup poa to blockchain deploy * add command * add proposervm update * proposervm flag for nodes * nit * nit * fixed dynamic fees params calculation * only adding tracked apis to apis in sidecar * improve prompts * fixing various stuff * add upgrade file * use anr etna enabled * keep upgrade on sync * workin * almost working * working1 * Update cmd/keycmd/transfer.go Co-authored-by: Meaghan FitzGerald <[email protected]> Signed-off-by: felipemadero <[email protected]> * fix some lint * address PR comments * missing file * nit --------- Signed-off-by: felipemadero <[email protected]> Signed-off-by: sukantoraymond <[email protected]> Co-authored-by: Meaghan FitzGerald <[email protected]> Co-authored-by: sukantoraymond <[email protected]> --------- Signed-off-by: sukantoraymond <[email protected]> Signed-off-by: felipemadero <[email protected]> Co-authored-by: felipemadero <[email protected]> Co-authored-by: Meaghan FitzGerald <[email protected]> * fix lint --------- Signed-off-by: sukantoraymond <[email protected]> Signed-off-by: felipemadero <[email protected]> Co-authored-by: felipemadero <[email protected]> Co-authored-by: Meaghan FitzGerald <[email protected]> --------- Signed-off-by: sukantoraymond <[email protected]> Signed-off-by: felipemadero <[email protected]> Co-authored-by: Felipe Madero <[email protected]> Co-authored-by: Meaghan FitzGerald <[email protected]>
Signed-off-by: sukantoraymond <[email protected]>
* refactor subnet sync * fix lint * Remove redundancies (#2229) * remove redundant code * fix lint * fix lint * Convert Subnet in 1 command (#2230) * one command deploy * use default private key to deploy validator manager contract * fix lint * remove redundant code * address comments --------- Signed-off-by: sukantoraymond <[email protected]> --------- Signed-off-by: sukantoraymond <[email protected]>
* generate new node id * fix lint
Signed-off-by: sukantoraymond <[email protected]>
* node local init * hardcode etna devnet config * wip * rm not needed utils * ready for draft PR * nits rm unused struct * refactor - wip * make sure we can resume without prompts * add custom network and local cluster * devnet instead of custom network, rm custom * go mod tidy * upgrade.json for etna devnet * more local cluster * separate between local network and local cluster * working * nit * fix plugin * nit * improve snapshot management * fix bug * nit * proper etna devnet boostrappers * fix bug with local rootdir * remove global node config stuff * destroy node data and processes if bad start * nit * always use fresh staking keys * nit * so less flags * add back fast boot time * add checks for local cluster. minor refactor (#2239) * add checks for local cluster minor refactoring * fix test * addressed Felipe feedback --------- Signed-off-by: arturrez <[email protected]> * bump anr * fixes for convert flow * add check for already initialized error on PoA init * nit * use anr main * fix merge * Local mkdir rootdir (#2245) * make sure rootDir exists * mv it to proper location * it's done before the start * update anr * Integrate local devnet (#2254) * integrate deploy * integrate deploy * Fix lint * address comments * address comments * Integrate start (#2262) * integrate start * integrate start * update etna network * integrate start * integrate start * finish integrate start * fix lint * stop existing local avalanchego * more robust --------- Signed-off-by: arturrez <[email protected]> Signed-off-by: sukantoraymond <[email protected]> Co-authored-by: Felipe Madero <[email protected]> Co-authored-by: sukantoraymond <[email protected]>
@@ -1043,18 +1016,29 @@ func generateNodeCertAndKeys(stakerCertFilePath, stakerKeyFilePath, blsKeyFilePa | |||
} | |||
|
|||
func provideStakingCertAndKey(host *models.Host) error { | |||
instanceID := host.GetCloudID() | |||
keyPath := filepath.Join(app.GetNodesDir(), instanceID) | |||
keyPath := filepath.Join(app.GetNodesDir(), "staking", host.IP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we are saving node staking info in dir nodes/cloudid/
This PR saves staking info in staking/hostIP and also in nodes/cloudid
nodeResults.AddResult(host.NodeID, nil, err) | ||
ux.SpinFailWithError(spinner, "", err) | ||
return | ||
publicAccessToHTTPPort := slices.Contains(cloudConfigMap.GetAllAPIInstanceIDs(), host.GetCloudID()) || publicHTTPPortAccess |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we are not using CloudIDs anymore we should rename and change logic GetAllAPIInstanceIDs to GetAllAPIInstanceIPs.
host.GetCloudID is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we rely on IPs now or we should use staking nodeID instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetAllAPIInstanceIDs is mainly to determine whether a node is API node / not. With us changing function of node create to set up, users will set host.APINode to true / false themselves
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we are provisioning the nodes ourselves, we can still use the cloud id
defer wg.Done() | ||
spinner := spinSession.SpinToUser(utils.ScriptLog(host.IP, "Add Monitoring")) | ||
if addMonitoring { | ||
cloudID := host.GetCloudID() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should remove host.GetCloudID() and change logic behind RunSSHSetupPromtailConfig and
GetNodeInstanceDirPath(cloudID)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is just to separate setup process from creation process, while keeping all functionalities (like wiz) the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore, the proposed change seems like out of scope of this PR
cmd/nodecmd/create.go
Outdated
if network.Kind == models.Devnet { | ||
if err := setupDevnet(clusterName, hosts, apiNodeIPMap); err != nil { | ||
return err | ||
} | ||
} | ||
for _, node := range hosts { | ||
if wgResults.HasNodeIDWithError(node.NodeID) { | ||
ux.Logger.RedXToUser("Node %s is ERROR with error: %s", node.NodeID, wgResults.GetErrorHostMap()[node.NodeID]) | ||
if wgResults.HasNodeIDWithError(node.IP) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HasNodeIDWithError should be HasNodeIPWithError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function is used outside of create.go like setupdevnet, which still uses node id. Changed to HasIDWithError
@@ -1043,18 +1016,29 @@ func generateNodeCertAndKeys(stakerCertFilePath, stakerKeyFilePath, blsKeyFilePa | |||
} | |||
|
|||
func provideStakingCertAndKey(host *models.Host) error { | |||
instanceID := host.GetCloudID() | |||
keyPath := filepath.Join(app.GetNodesDir(), instanceID) | |||
keyPath := filepath.Join(app.GetNodesDir(), "staking", host.IP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we create a separate func that uses keyPath := filepath.Join(app.GetNodesDir(), "staking", host.IP)
so IP is used by default when we want to save stuff. In terms of the reading stuff I suggest func that used keyPath := filepath.Join(app.GetNodesDir(), "staking", host.IP)
but falls back to keyPath := filepath.Join(app.GetNodesDir(), "staking", NodeID) if this folder doesn't exists to support backward compatibility with nodes created with prev versions of CLI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the directory for filepath.Join(nodeDir, constants.StakerCertFileName) is used in mutliple functions. I would rather just create a duplicate staking dir to make it work this PR and not break other functionality
nodeID, err := generateNodeCertAndKeys( | ||
filepath.Join(keyPath, constants.StakerCertFileName), | ||
filepath.Join(keyPath, constants.StakerKeyFileName), | ||
filepath.Join(keyPath, constants.BLSKeyFileName), | ||
) | ||
instanceID := host.GetCloudID() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should deprecate host.GetCloudID()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get cloud id can still be used for when we provision ourselves
cmd.Flags().StringVar(&useAvalanchegoVersionFromSubnet, "avalanchego-version-from-subnet", "", "install latest avalanchego version, that is compatible with the given subnet, on node/s") | ||
cmd.Flags().BoolVar(&publicHTTPPortAccess, "public-http-port", false, "allow public access to avalanchego HTTP port") | ||
cmd.Flags().StringArrayVar(&nodeIPs, "node-ips", []string{}, "IP addresses of nodes") | ||
cmd.Flags().StringArrayVar(&sshKeyPaths, "ssh-key-paths", []string{}, "ssh key paths") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most likely group of the nodes will be created using the same ssh key file, so I feel like better UX will be allowing users to supply only 1 ssh key for all IP. I guess we can support case when sshKeyPath slice has 1 element or exact number of elements as nodeIP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this pr is already checkign if ssh-key-paths and node-ips have same number of elements, if ssh-agent is false
cmd/nodecmd/setup.go
Outdated
wg.Wait() | ||
spinSession.Stop() | ||
for _, node := range hosts { | ||
if wgResults.HasNodeIDWithError(node.NodeID) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HasNodeIPWithError
we should not use NodeID as we only have NodeIP now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think even better way is to depricate ansible nodeiD in favour of NodeID from avalanchego staking cert.
We will need a special nodeID for monitoring box as it doesn't have avalanchego
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment with out of scope of this pr for now
|
||
func printSetupResults(hosts []*models.Host) { | ||
for _, host := range hosts { | ||
nodePath := filepath.Join(app.GetNodesDir(), "staking", host.IP) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be abstracted so we can hide if we access staking data from IP or NodeID(for backwards compatibility)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abstract which part?
return cmd | ||
} | ||
|
||
func setup(hosts []*models.Host, avalancheGoVersion string, network models.Network) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we exposed this function to the users, we need to check if provided slice of hosts in compatible with CLI
i.e it has ubuntu user and it's supported OS
i.e it has list of pkgs needed
I would expect people to input weird stuff Ip/ssh key for some user and redhat box, or raspery pi or whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed below
return err | ||
} | ||
for len(nodeIPs) < numNodes { | ||
ux.Logger.PrintToUser("Getting info for node %d", len(nodeIPs)+1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to tell the user what OS we expect etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added ux.Logger.PrintToUser("Note that currently we only support Ubuntu operating system")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also the command description already mentions it
Long: `The node setup command installs Avalanche Go on specified remote servers.
To run the command, the remote servers' IP addresses and SSH private keys are required.
Currently, only ubuntu-based operating system is supported.`,
cmd/nodecmd/setup.go
Outdated
} | ||
nodeIPs = append(nodeIPs, ipAddress) | ||
sshKeyPaths = append(sshKeyPaths, sshKeyPath) | ||
ux.Logger.GreenCheckmarkToUser("Bootstrap Validator %d:", len(nodeIPs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bootstrap validator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
@@ -46,7 +47,9 @@ var _ = ginkgo.Describe("[Node create]", func() { | |||
re := regexp.MustCompile(`Generated staking keys for host (\S+)\[NodeID-(\S+)\]`) | |||
match := re.FindStringSubmatch(output) | |||
if len(match) >= 3 { | |||
hostName = match[1] | |||
var err error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to provide basic e2e for new setup cmd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can do it for next pr when we default ot using local machine.
if err != nil { | ||
return "", err | ||
} | ||
_, cloudHostID, _ := models.HostAnsibleIDToCloudID(hosts[0].NodeID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we are only getting IP address/ SSH key pairs from the user we should not use CloudID and use IP addresses instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is to pass the test u made for e2e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest switching to avalanchego NodeID from staking cert to replace CloudID everywhere as node setup
doesn't require cloud nodes anymore and can be some bare metal
* use network instead of cluster * replace clustername with network * update test * fix test * fix test * fix lint * fix lint * fix test * fix lint * fix test * fix lint * address comments * fix lint * address comments * fix lint
Signed-off-by: sukantoraymond <[email protected]>
* fix contexts and signals * add fuji cluster * lint * point to anr main * address PR comments * remove unused function * address PR comments * add flags for partial sync to local and remote * fix panic * fix e2e
Signed-off-by: sukantoraymond <[email protected]>
Why this should be merged
Separates node provisioning from node infrastructure creation in node create command. This will enable node provisioning function to be used in other packages.
New command node provision is also created, which will take the node ip and the node ssh key path to provision specified nodes with AvalancheGo and Avalanche CLI installed
How this works
Modularize node provision out of node create
How this was tested
Create cloud servers (e.g. AWS / GCP) without them being provisioned and provision them using node provision command.
How is this documented
NA