Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor node provision #2322

Open
wants to merge 102 commits into
base: main
Choose a base branch
from
Open

Refactor node provision #2322

wants to merge 102 commits into from

Conversation

sukantoraymond
Copy link
Collaborator

@sukantoraymond sukantoraymond commented Nov 7, 2024

Why this should be merged

Separates node provisioning from node infrastructure creation in node create command. This will enable node provisioning function to be used in other packages.

New command node provision is also created, which will take the node ip and the node ssh key path to provision specified nodes with AvalancheGo and Avalanche CLI installed

How this works

Modularize node provision out of node create

How this was tested

Create cloud servers (e.g. AWS / GCP) without them being provisioned and provision them using node provision command.

How is this documented

NA

sukantoraymond and others added 30 commits September 9, 2024 10:36
* prompt addresses

* fix lint

* fix lint

* fix lint

* fix lint

* add flags

* address comments

* fix test

* address comments

* fix lint

---------

Signed-off-by: sukantoraymond <[email protected]>
* prompt addresses

* fix lint

* fix lint

* fix lint

* fix lint

* add flags

* validator prompt

* bootstrap validators

* add mock

* fix lint

* update prompt validator

* update prompt validator

* implement prompts

* fix prompt mock

* update sidecar

* update prompts

* update prompts

* refactor

* update prompts

* update prompts

* use default balance and weight

* generate new node ids and bls

* fix lint

* add flags

* add flags

* address comments

* address comments

* address comments

* move validator prompt to deploy

* address comments

* address comments

* fix lint

* address comments

* address comments

* fix test

* address comments

* fix lint

* address coments

* address coments

* fix test

* fix test

* fix test

* fix lint

---------

Signed-off-by: sukantoraymond <[email protected]>
* adding teleporter contract to genesis

* fixes for icm on genesis

* add icm from a given embedded file

* add function to deploy ICM registry

* creating mapping values

* add consts

* make local deploy to work

* do not print info for icm support addrs

* nit

* nit

* fix wiz stuff

* address PR comments

* filter based on balance

* add unit tests

* Update cmd/blockchaincmd/describe.go

Co-authored-by: Michael Kaplan <[email protected]>
Signed-off-by: felipemadero <[email protected]>

* lint

---------

Signed-off-by: felipemadero <[email protected]>
Co-authored-by: Michael Kaplan <[email protected]>
* prompt addresses

* fix lint

* fix lint

* fix lint

* fix lint

* add flags

* adding teleporter contract to genesis

* validator prompt

* bootstrap validators

* add mock

* fix lint

* update prompt validator

* update prompt validator

* fixes for icm on genesis

* add icm from a given embedded file

* add function to deploy ICM registry

* creating mapping values

* add consts

* make local deploy to work

* do not print info for icm support addrs

* nit

* implement prompts

* fix prompt mock

* update sidecar

* update prompts

* update prompts

* refactor

* nit

* update prompts

* fix wiz stuff

* update prompts

* use default balance and weight

* generate new node ids and bls

* fix lint

* add flags

* add flags

* add a function to initialize PoA manager

* add lib folder

* address comments

* address comments

* address comments

* move validator prompt to deploy

* address comments

* address comments

* fix lint

* address comments

* address comments

* fix test

* address PR comments

* basic PoA manager setting

* lint

* address comments

* fix lint

* fix e2e

* fix merge

* fix merge

* filter based on balance

* add unit tests

* nit

* address PR comments

* Update cmd/blockchaincmd/describe.go

Co-authored-by: Michael Kaplan <[email protected]>
Signed-off-by: felipemadero <[email protected]>

* nit

* address PR comments

* lint

---------

Signed-off-by: sukantoraymond <[email protected]>
Signed-off-by: felipemadero <[email protected]>
Co-authored-by: Raymond Sukanto <[email protected]>
Co-authored-by: Michael Kaplan <[email protected]>
* wip

* go mod tidy

* use awm-relayer@main

* fixes

* use latest awm-relayer pkg  which caused upgrade on coreth and avago  - has to be resolved

* use latest anr and rm prometheus metrics

* check for errors

* lint

* mv to sdk/interchain

* some refactoring and added unittests

* fix license

* fix unittest for now

* fix lint

* allow empty subnetID
add usage working example

* rm error msg from example

* refactor to address felipe feedback

* we should remove this from the test in favour of https://github.com/ava-labs/awm-relayer/blob/main/signature-aggregator/aggregator/aggregator_test.go - we just use exposed API

* update readme example

* rm goconst fix CI

* put `- goconst` back

* resolve conficts

---------

Co-authored-by: Raymond Sukanto <[email protected]>
* use latest anr main

* tidy

* bump e2e avago version to be dynamic fee compatible

* fix e2e for dynamic fees

* lint
* prompt addresses

* fix lint

* fix lint

* fix lint

* fix lint

* add flags

* adding teleporter contract to genesis

* validator prompt

* bootstrap validators

* add mock

* fix lint

* update prompt validator

* update prompt validator

* fixes for icm on genesis

* add icm from a given embedded file

* add function to deploy ICM registry

* creating mapping values

* add consts

* make local deploy to work

* do not print info for icm support addrs

* nit

* implement prompts

* fix prompt mock

* update sidecar

* update prompts

* update prompts

* refactor

* nit

* update prompts

* fix wiz stuff

* update prompts

* use default balance and weight

* generate new node ids and bls

* fix lint

* add flags

* add flags

* add a function to initialize PoA manager

* add lib folder

* address comments

* address comments

* address comments

* move validator prompt to deploy

* address comments

* address comments

* fix lint

* address comments

* address comments

* fix test

* address PR comments

* basic PoA manager setting

* lint

* address comments

* fix lint

* fix e2e

* fix merge

* fix merge

* convert Subnet Tx

* convert subnet

* register validator

* add validator

* mock

* remove validator

* update weight

* update weight

* add flags add validator

* remove validator update

* refactor addvalidator

* use node id as arg in addvalidator

* fix change weight arg

* resolve merge conflict

* fix lint

* address comments

* fix lint

* Enable non sov subnets (#2203)

* subnet create non sov

* support non sov blockchain

* update test

* update tests

* update tests

* update test

* update tests

* update tests

* fix test

* update tests

* Rename Subnet to L1 for new commands ACP77 (#2209)

* rename convert subnet

* rename to l1

* fix lint

* fix lint

* fix lint

* fix lint

* Integrate ConvertSubnetTx AvalancheGo  (#2213)

* complete implementation of avalanche go convertsubnet

* update relayer

* update convertsubnettx

* update avalanchego

* update avalanche go

* update avalanche go

* dont prompt control key for sov

* add tx cases for convert subnet tx (#2218)

* Poa integration (#2212)

* use latest contract

* point to latest convert subnet avago PR

* nit

* use fixed anr

* nit

* add subnet conversion id calculation

* created unsigned warp message

* fixed local network deploy

* added signing

* now this is working

* using latest avago master + awm relayer

* working pretty well

* ready

* lint

* address PR comments

* nit

---------

Signed-off-by: sukantoraymond <[email protected]>
Co-authored-by: sukantoraymond <[email protected]>

* fix lint

* reverse sidecar sov logic

* fix merge

* add poa setup to blockchain deploy (#2219)

* add setup poa to blockchain deploy

* add command

* add proposervm update

* proposervm flag for nodes

* nit

* nit

* fixed dynamic fees params calculation

* only adding tracked apis to apis in sidecar

* improve prompts

* fixing various stuff

* add upgrade file

* use anr etna enabled

* keep upgrade on sync

* workin

* almost working

* working1

* Update cmd/keycmd/transfer.go

Co-authored-by: Meaghan FitzGerald <[email protected]>
Signed-off-by: felipemadero <[email protected]>

* fix some lint

* address PR comments

* missing file

* nit

---------

Signed-off-by: felipemadero <[email protected]>
Signed-off-by: sukantoraymond <[email protected]>
Co-authored-by: Meaghan FitzGerald <[email protected]>
Co-authored-by: sukantoraymond <[email protected]>

---------

Signed-off-by: sukantoraymond <[email protected]>
Signed-off-by: felipemadero <[email protected]>
Co-authored-by: felipemadero <[email protected]>
Co-authored-by: Meaghan FitzGerald <[email protected]>

* fix lint

---------

Signed-off-by: sukantoraymond <[email protected]>
Signed-off-by: felipemadero <[email protected]>
Co-authored-by: felipemadero <[email protected]>
Co-authored-by: Meaghan FitzGerald <[email protected]>

---------

Signed-off-by: sukantoraymond <[email protected]>
Signed-off-by: felipemadero <[email protected]>
Co-authored-by: Felipe Madero <[email protected]>
Co-authored-by: Meaghan FitzGerald <[email protected]>
Signed-off-by: sukantoraymond <[email protected]>
* refactor subnet sync

* fix lint

* Remove redundancies (#2229)

* remove redundant code

* fix lint

* fix lint

* Convert Subnet in  1 command (#2230)

* one command deploy

* use default private key to deploy validator manager contract

* fix lint

* remove redundant code

* address comments

---------

Signed-off-by: sukantoraymond <[email protected]>

---------

Signed-off-by: sukantoraymond <[email protected]>
* generate new node id

* fix lint
Signed-off-by: sukantoraymond <[email protected]>
* node local init

* hardcode etna devnet config

* wip

* rm not needed utils

* ready for draft PR

* nits
rm unused struct

* refactor - wip

* make sure we can resume without prompts

* add custom network and local cluster

* devnet instead of custom network, rm custom

* go mod tidy

* upgrade.json for etna devnet

* more local cluster

* separate between local network and local cluster

* working

* nit

* fix plugin

* nit

* improve snapshot management

* fix bug

* nit

* proper etna devnet boostrappers

* fix bug with local rootdir

* remove global node config stuff

* destroy node data and processes if bad start

* nit

* always use fresh staking keys

* nit

* so less flags

* add back fast boot time

* add checks for local cluster. minor refactor (#2239)

* add checks for local cluster
minor refactoring

* fix test

* addressed Felipe feedback

---------

Signed-off-by: arturrez <[email protected]>

* bump anr

* fixes for convert flow

* add check for already initialized error on PoA init

* nit

* use anr main

* fix merge

* Local mkdir rootdir (#2245)

* make sure rootDir exists

* mv it to proper location

* it's done before the start

* update anr

* Integrate local devnet (#2254)

* integrate deploy

* integrate deploy

* Fix lint

* address comments

* address comments

* Integrate start (#2262)

* integrate start

* integrate start

* update etna network

* integrate start

* integrate start

* finish integrate start

* fix lint

* stop existing local avalanchego

* more robust

---------

Signed-off-by: arturrez <[email protected]>
Signed-off-by: sukantoraymond <[email protected]>
Co-authored-by: Felipe Madero <[email protected]>
Co-authored-by: sukantoraymond <[email protected]>
@@ -1043,18 +1016,29 @@ func generateNodeCertAndKeys(stakerCertFilePath, stakerKeyFilePath, blsKeyFilePa
}

func provideStakingCertAndKey(host *models.Host) error {
instanceID := host.GetCloudID()
keyPath := filepath.Join(app.GetNodesDir(), instanceID)
keyPath := filepath.Join(app.GetNodesDir(), "staking", host.IP)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we are saving node staking info in dir nodes/cloudid/

This PR saves staking info in staking/hostIP and also in nodes/cloudid

@sukantoraymond sukantoraymond added the DO NOT MERGE This PR is not meant to be merged in its current state label Nov 13, 2024
nodeResults.AddResult(host.NodeID, nil, err)
ux.SpinFailWithError(spinner, "", err)
return
publicAccessToHTTPPort := slices.Contains(cloudConfigMap.GetAllAPIInstanceIDs(), host.GetCloudID()) || publicHTTPPortAccess
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we are not using CloudIDs anymore we should rename and change logic GetAllAPIInstanceIDs to GetAllAPIInstanceIPs.
host.GetCloudID is not needed

Copy link
Collaborator

@arturrez arturrez Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we rely on IPs now or we should use staking nodeID instead

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetAllAPIInstanceIDs is mainly to determine whether a node is API node / not. With us changing function of node create to set up, users will set host.APINode to true / false themselves

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are provisioning the nodes ourselves, we can still use the cloud id

defer wg.Done()
spinner := spinSession.SpinToUser(utils.ScriptLog(host.IP, "Add Monitoring"))
if addMonitoring {
cloudID := host.GetCloudID()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should remove host.GetCloudID() and change logic behind RunSSHSetupPromtailConfig and
GetNodeInstanceDirPath(cloudID)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is just to separate setup process from creation process, while keeping all functionalities (like wiz) the same.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, the proposed change seems like out of scope of this PR

if network.Kind == models.Devnet {
if err := setupDevnet(clusterName, hosts, apiNodeIPMap); err != nil {
return err
}
}
for _, node := range hosts {
if wgResults.HasNodeIDWithError(node.NodeID) {
ux.Logger.RedXToUser("Node %s is ERROR with error: %s", node.NodeID, wgResults.GetErrorHostMap()[node.NodeID])
if wgResults.HasNodeIDWithError(node.IP) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasNodeIDWithError should be HasNodeIPWithError

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function is used outside of create.go like setupdevnet, which still uses node id. Changed to HasIDWithError

@@ -1043,18 +1016,29 @@ func generateNodeCertAndKeys(stakerCertFilePath, stakerKeyFilePath, blsKeyFilePa
}

func provideStakingCertAndKey(host *models.Host) error {
instanceID := host.GetCloudID()
keyPath := filepath.Join(app.GetNodesDir(), instanceID)
keyPath := filepath.Join(app.GetNodesDir(), "staking", host.IP)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we create a separate func that uses keyPath := filepath.Join(app.GetNodesDir(), "staking", host.IP) so IP is used by default when we want to save stuff. In terms of the reading stuff I suggest func that used keyPath := filepath.Join(app.GetNodesDir(), "staking", host.IP) but falls back to keyPath := filepath.Join(app.GetNodesDir(), "staking", NodeID) if this folder doesn't exists to support backward compatibility with nodes created with prev versions of CLI

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the directory for filepath.Join(nodeDir, constants.StakerCertFileName) is used in mutliple functions. I would rather just create a duplicate staking dir to make it work this PR and not break other functionality

nodeID, err := generateNodeCertAndKeys(
filepath.Join(keyPath, constants.StakerCertFileName),
filepath.Join(keyPath, constants.StakerKeyFileName),
filepath.Join(keyPath, constants.BLSKeyFileName),
)
instanceID := host.GetCloudID()
Copy link
Collaborator

@arturrez arturrez Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should deprecate host.GetCloudID()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get cloud id can still be used for when we provision ourselves

cmd.Flags().StringVar(&useAvalanchegoVersionFromSubnet, "avalanchego-version-from-subnet", "", "install latest avalanchego version, that is compatible with the given subnet, on node/s")
cmd.Flags().BoolVar(&publicHTTPPortAccess, "public-http-port", false, "allow public access to avalanchego HTTP port")
cmd.Flags().StringArrayVar(&nodeIPs, "node-ips", []string{}, "IP addresses of nodes")
cmd.Flags().StringArrayVar(&sshKeyPaths, "ssh-key-paths", []string{}, "ssh key paths")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most likely group of the nodes will be created using the same ssh key file, so I feel like better UX will be allowing users to supply only 1 ssh key for all IP. I guess we can support case when sshKeyPath slice has 1 element or exact number of elements as nodeIP

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this pr is already checkign if ssh-key-paths and node-ips have same number of elements, if ssh-agent is false

wg.Wait()
spinSession.Stop()
for _, node := range hosts {
if wgResults.HasNodeIDWithError(node.NodeID) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasNodeIPWithError
we should not use NodeID as we only have NodeIP now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even better way is to depricate ansible nodeiD in favour of NodeID from avalanchego staking cert.
We will need a special nodeID for monitoring box as it doesn't have avalanchego

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed below

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment with out of scope of this pr for now


func printSetupResults(hosts []*models.Host) {
for _, host := range hosts {
nodePath := filepath.Join(app.GetNodesDir(), "staking", host.IP)
Copy link
Collaborator

@arturrez arturrez Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be abstracted so we can hide if we access staking data from IP or NodeID(for backwards compatibility)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abstract which part?

return cmd
}

func setup(hosts []*models.Host, avalancheGoVersion string, network models.Network) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we exposed this function to the users, we need to check if provided slice of hosts in compatible with CLI
i.e it has ubuntu user and it's supported OS
i.e it has list of pkgs needed

I would expect people to input weird stuff Ip/ssh key for some user and redhat box, or raspery pi or whatever.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed below

return err
}
for len(nodeIPs) < numNodes {
ux.Logger.PrintToUser("Getting info for node %d", len(nodeIPs)+1)
Copy link
Collaborator

@arturrez arturrez Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to tell the user what OS we expect etc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added ux.Logger.PrintToUser("Note that currently we only support Ubuntu operating system")

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also the command description already mentions it

		Long: `The node setup command installs Avalanche Go on specified remote servers. 
To run the command, the remote servers' IP addresses and SSH private keys are required. 

Currently, only ubuntu-based operating system is supported.`,

}
nodeIPs = append(nodeIPs, ipAddress)
sshKeyPaths = append(sshKeyPaths, sshKeyPath)
ux.Logger.GreenCheckmarkToUser("Bootstrap Validator %d:", len(nodeIPs))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bootstrap validator?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch

@@ -46,7 +47,9 @@ var _ = ginkgo.Describe("[Node create]", func() {
re := regexp.MustCompile(`Generated staking keys for host (\S+)\[NodeID-(\S+)\]`)
match := re.FindStringSubmatch(output)
if len(match) >= 3 {
hostName = match[1]
var err error
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to provide basic e2e for new setup cmd?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can do it for next pr when we default ot using local machine.

if err != nil {
return "", err
}
_, cloudHostID, _ := models.HostAnsibleIDToCloudID(hosts[0].NodeID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we are only getting IP address/ SSH key pairs from the user we should not use CloudID and use IP addresses instead

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to pass the test u made for e2e

Copy link
Collaborator

@arturrez arturrez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest switching to avalanchego NodeID from staking cert to replace CloudID everywhere as node setup doesn't require cloud nodes anymore and can be some bare metal

sukantoraymond and others added 5 commits November 13, 2024 18:16
* use network instead of cluster

* replace clustername with network

* update test

* fix test

* fix test

* fix lint

* fix lint

* fix test

* fix lint

* fix test

* fix lint

* address comments

* fix lint

* address comments

* fix lint
Signed-off-by: sukantoraymond <[email protected]>
* fix contexts and signals

* add fuji cluster

* lint

* point to anr main

* address PR comments

* remove unused function

* address PR comments

* add flags for partial sync to local and remote

* fix panic

* fix e2e
@sukantoraymond sukantoraymond changed the base branch from acp-77 to main November 14, 2024 18:51
@sukantoraymond sukantoraymond changed the base branch from main to acp-77 November 14, 2024 18:51
Base automatically changed from acp-77 to main November 23, 2024 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DO NOT MERGE This PR is not meant to be merged in its current state
Projects
Status: Backlog 🗄️
Development

Successfully merging this pull request may close these issues.

4 participants