From a64d3e328df3d3d02bcdb11ebb1fc6d7caec58fe Mon Sep 17 00:00:00 2001 From: Lynn Bendixsen Date: Wed, 20 Sep 2023 15:18:02 -0600 Subject: [PATCH 1/4] Added some troubleshooting steps that might be useful to others when they have trouble while adding a node to a network. Some workarounds for bugs I encountered are included as well. Signed-off-by: Lynn Bendixsen --- docs/source/node-add-troubleshooting.md | 66 +++++++++++++++++++++++++ 1 file changed, 66 insertions(+) create mode 100644 docs/source/node-add-troubleshooting.md diff --git a/docs/source/node-add-troubleshooting.md b/docs/source/node-add-troubleshooting.md new file mode 100644 index 000000000..c235ddb63 --- /dev/null +++ b/docs/source/node-add-troubleshooting.md @@ -0,0 +1,66 @@ +# Troubleshooting - Adding or Upgrading Indy Nodes +Many things can go wrong while adding or upgrading nodes on an existing Indy network. This guide covers the symptoms and issues encountered so far, along with some steps you might take to recover from them. The steps listed are possible remedies for the listed issues, not guaranteed fixes. Feel free to add more remedies or issues if you don't see yours included here. As bugs are fixed, the issues noted below might not occur any more, or might have a different remedy. + +## Adding a Node +This section covers troubleshooting the addition of a node to a network. This can occur either as part of an upgrade (e.g. the 20.04 upgrade) or as part of a new node being added to an existing network. + +### Symptom 1 - Node is unresponsive +- Cause #1 - Node is performing catchup. (Large Network) +If your node appears unresponsive after adding it to a network (i.e. validator-info shows non-incrementing subledger counts) then the most likely thing you need to do is wait. 
While smaller networks with a low number of transactions seem to perform "catchup" quite fast (within a minute or two for a domain ledger with 15K transactions) larger networks or networks that have been running for a long time can take 3 hours or more. Networks do not respond or recover well if you restart a node while it is performing catchup, so please be patient. To verify that this is the cause first check that the node is connected to the Primary Node (if not, see Cause #2), then check the logs to verify that "catchup" operations are in process. +- Cause #2 - Node is not connected to the Primary Node +If the added node cannot reach the primary node, then it sometimes has problems with catchup. Further symptoms in this case include Out Of Consensus (OOC) for your node and possibly others. +If you realize the issue quickly, you might be able to recover from this by simply a) stopping the node, b) repairing the connection and then c) restarting the node. Otherwise, to recover you will need to perform the following: +1. Stop indy-node on the new node + `sudo systemctl stop indy-node` +1. Remove the new node from the network using the indy-cli (if the network is OOC, see instructions below for restoring consensus, then return here and try again) + `ledger node target= alias= services=` +1. Repair the connection issue from the new node to the Primary Node (e.g. fix the firewall on the new node or on the Primary Node) +1. After the connection to the Primary Node is verified, remove the "data" directory on the new node. The data directory will be recreated when the node starts back up. This step clears out any possible corruption that might have occurred while the new node was unable to connect to the Primary Node. + `sudo rm -rf /var/lib/indy//data` +1. Add the node back to the ledger with VALIDATOR privileges (indy-cli) + `ledger node target= alias= services=VALIDATOR` +1. 
Start indy-node on the new node + `sudo systemctl start indy-node` +### Symptom #2 - General Network Connectivity +Sometimes network connectivity issues have unexpected consequences and manifestations. Here is a checklist of items that might help correct some of the odd cases that have been seen: +1. Firewall - Double check that both of the ports are open only for the matching interfaces and that all of the Node ports' allowed-list items have the correct port and IPs associated with them. Also check (where applicable) that you have assigned the correct security group and that each interface only has its own security group assigned to it. Also check that the Client IP and interface are "allowed" for the world and that the Node IP and interface are restricted to only allow other nodes on your network. +1. On the node, run `ip a` and note the internal (local) IP addresses of your Client and Node interfaces. Then run `cat /etc/indy/indy.env` and verify that the internal IP addresses are used in the appropriate places. For at least AWS/GCP/Azure nodes, it's important not to have "0.0.0.0" used as the IP addresses in the files, as it causes weird connection problems that are difficult to diagnose. +1. More? + +### Symptom #3 - Out of Consensus (OOC) +OOC can happen at any time and for a variety of reasons, not all of them known. Sometimes it's just one node, and sometimes it's several nodes all at once. Due to a presumed bug that has not yet been diagnosed or fixed, sometimes when adding a node to an existing network ("large"?), the network goes OOC immediately. OOC is a serious state that does not allow writes to be made to the network and sometimes indicates a "slow response" even for reads, and thus should be dealt with at the first indication of even one node going OOC. Sometimes one node going OOC will lead to others following. 
If the network has entered a state where the nodes' primary node is not consistent (two or more primary nodes listed for the network nodes), then skip down to "Symptom #4 Multiple Primaries". Here's a general order of remedies to follow for returning the network to consensus: +- Restart the OOC nodes. +1. From an Indy CLI: + `ledger pool-restart action=start nodes=` +1. or if all nodes are OOC: + `ledger pool-restart action=start` +NOTE: This last command causes a brief but complete "Downtime" for the entire network and should only be run on "test networks" or as a last resort on production networks. While the network is usually only down for a few seconds, network "reads" will not be possible during that time. (Network "writes" are already not happening if the whole network is OOC.) That said, this complete restart regularly restores the network to an "in consensus" state. +- Remove the OOC node from participating in the network (then re-add). +Sometimes one node going OOC will lead to others having the same problem. If you can identify the node having or causing the problem (not always easy), removing it from the network and then restarting the other nodes in the network (as described in the previous remedy) can sometimes return the network to consensus. Then what? Sometimes adding the offending node back into the network immediately causes the problem to recur, so the following steps are recommended: +1. Stop indy-node on the offending node + `sudo systemctl stop indy-node` +1. Remove the offending node from the network using the indy-cli + `ledger node target= alias= services=` +1. Remove the "data" directory on the offending node. + `sudo rm -rf /var/lib/indy//data` +1. (Optional) For larger networks this step might be required, but on smaller networks you can probably skip it. +Get a full copy of the "/var/lib/indy//data" directory from a "good" node and copy it to the offending node. 
Be sure to stop indy-node on the "good" node before zipping up a copy of the directory on it, then restart it when you are done. Yes, this could have some interesting side effects for an already troubled network... +1. Add the node back to the ledger with VALIDATOR privileges (indy-cli) + `ledger node target= alias= services=VALIDATOR` +1. Start indy-node on the offending node + `sudo systemctl start indy-node` +NOTE: Sometimes when a node is added to a network and it exhibits poor behavior, as mentioned in several symptoms in this document, other nodes become "corrupted" in the process. In that case, performing the remedies in this symptom for each "corrupted node" may be required and is the recommended course of action. +### Symptom #4 Multiple Primaries +This is the case where different nodes on the network claim two or more nodes as the primary node (split-brain). This is likely caused by a "view change" bug that as yet is unresolved, but has occurred enough times to warrant a mention and steps for recovery here. This condition usually exists along with an OOC condition, but getting the primaries to agree should be the first step towards complete resolution (and sometimes also repairs the associated OOC condition). Our goal will be to get all nodes to re-agree to the primary that is the consistent one that most nodes agree on. +1. If the addition of a node to the network seems to have "caused" the multi-primary condition, remove that node from participating in the network. +`ledger node target= alias= services=` +3. It is important to watch what is happening with the primary node changes on your network to determine what to do next. In the case of multiple primaries, regularly some of the nodes have a consistent primary that is the same, where the rest of the nodes are looping through a "view-change" process where their primary is changing somewhat rapidly. 
Watch the IndyMonitor tool and see which nodes are doing the rapid change and then make a list of those nodes. Add to that list any of the remaining nodes that have a different primary node than the majority. In other words, find the nodes that seem to all have the same, unchanging primary node, then add all of the OTHER nodes to a list. +4. These next 2 steps must be done in relatively rapid succession, so make sure you are aware and prepared to do them before beginning. Run the following from the indy-cli using the list made in the previous steps. +`ledger pool-restart action=start nodes=` +Wait for about 30 seconds, then run: +`ledger pool-restart action=start` +While this might seem a bit unusual, this two-step "restart the network" has been the approach that has worked to recover the network from split-primary issues. If it doesn't work, try again and maybe wait a bit longer in between the two steps (the timing might be remembered incorrectly or might be somewhat arbitrary). +NOTE: Sometimes when a network begins to exhibit "split-brain" behaviours, in severe cases the symptoms will recur. A complete network reset may be required to remedy the issue if this happens on your network (or apply a future network upgrade that contains an undetermined fix). + +#### Looking for other symptoms? +For issues not covered here, there's also a great guide with some deeper troubleshooting tips: [Indy Network Troubleshooting](https://github.com/hyperledger/indy-node/blob/main/docs/source/troubleshooting.md) From 1eea9ae01e378ad4247e990d2275b257da7479ca Mon Sep 17 00:00:00 2001 From: Lynn Bendixsen Date: Thu, 21 Sep 2023 08:06:39 -0600 Subject: [PATCH 2/4] Made a few minor changes for added clarity. 
Signed-off-by: Lynn Bendixsen --- docs/source/node-add-troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/node-add-troubleshooting.md b/docs/source/node-add-troubleshooting.md index c235ddb63..1ac81289a 100644 --- a/docs/source/node-add-troubleshooting.md +++ b/docs/source/node-add-troubleshooting.md @@ -6,7 +6,7 @@ This section covers troubleshooting the addition of a node to a network. This c ### Symptom 1 - Node is unresponsive - Cause #1 - Node is performing catchup. (Large Network) -If your node appears unresponsive after adding it to a network (i.e. validator-info shows non-incrementing subledger counts) then the most likely thing you need to do is wait. While smaller networks with a low number of transactions seem to perform "catchup" quite fast (within a minute or two for a domain ledger with 15K transactions) larger networks or networks that have been running for a long time can take 3 hours or more. Networks do not respond or recover well if you restart a node while it is performing catchup, so please be patient. To verify that this is the cause first check that the node is connected to the Primary Node (if not, see Cause #2), then check the logs to verify that "catchup" operations are in process. +If your node appears unresponsive after adding it to a network (i.e. validator-info shows non-incrementing subledger counts) and no other symptoms are evident, then the first thing to do is wait. While smaller networks with a low number of transactions seem to perform "catchup" quite fast (within a minute or two for a domain ledger with 15K transactions) larger networks or networks that have been running for a long time can take 3 hours or more. Networks do not respond or recover well if you restart a node while it is performing catchup, so please be patient. 
To verify that this is the cause first check that the node is connected to the Primary Node (if not, see Cause #2), then check the logs to verify that normal "catchup" operations are in process. - Cause #2 - Node is not connected to the Primary Node If the added node cannot reach the primary node, then it sometimes has problems with catchup. Further symptoms in this case include Out Of Consensus (OOC) for your node and possibly others. If you realize the issue quickly, you might be able to recover from this by simply a) stopping the node, b) repairing the connection and then c) restarting the node. Otherwise, to recover you will need to perform the following: From 8940b7e99847db46efd382fa771485371525ac9b Mon Sep 17 00:00:00 2001 From: Lynn Bendixsen Date: Tue, 26 Sep 2023 17:00:49 -0600 Subject: [PATCH 3/4] Install file directory plus the first installment of install files are in this commit. These install files are the ones that Indicio and Sovrin are using for the initial installation of servers in different scenarios. 
Signed-off-by: Lynn Bendixsen --- .../source/install-docs/AWS-NodeInstall-20.04 | 343 ++++++++++++++++ .../install-docs/Azure-NodeInstall-20.04 | 374 +++++++++++++++++ docs/source/install-docs/GC-NodeInstall-20.04 | 378 ++++++++++++++++++ .../install-docs/Physical-NodeInstall-20.04 | 219 ++++++++++ 4 files changed, 1314 insertions(+) create mode 100644 docs/source/install-docs/AWS-NodeInstall-20.04 create mode 100644 docs/source/install-docs/Azure-NodeInstall-20.04 create mode 100644 docs/source/install-docs/GC-NodeInstall-20.04 create mode 100644 docs/source/install-docs/Physical-NodeInstall-20.04 diff --git a/docs/source/install-docs/AWS-NodeInstall-20.04 b/docs/source/install-docs/AWS-NodeInstall-20.04 new file mode 100644 index 000000000..fd7c881a2 --- /dev/null +++ b/docs/source/install-docs/AWS-NodeInstall-20.04 @@ -0,0 +1,343 @@ +## AWS - Install a VM for an Indy Node - Ubuntu 20.04 + +#### Introduction +The following steps are one way to adhere to the Indy Node guidelines for installing an AWS instance server to host an Indy Node. For the hardware requirements applicable for your network, ask the Network administrator or refer to the Technical Requirements document included in the Network Governance documents for your network. +NOTE: Since AWS regularly updates their user interface, this document becomes outdated quickly. The general steps can still be followed with a good chance of success, but please submit a PR with any changes you see or inform the author of the updates (lynn@indicio.tech) to keep this document up to date. + +#### Installation + +1. Before you begin the installation steps, login to your AWS console and select a region to run your VM in. Recommendation: Select the region matching the jurisdiction of your company's corporate offices. +2. From the AWS EC2 services page, click 'Instances' +3. Click 'Launch Instances' +4. Name: Your choice (this is the name you will see in AWS. 
Usually name it the same thing as your node name which should have the name of your company in it. So if the name of your company is “Jecko”, the node name would work well as “jecko”. Capitals are allowed) +5. Application and OS images + 1. Click the Ubuntu square + 2. Click the down arrow next to the default Ubuntu version and select 'Ubuntu Server 20.04 LTS (HVM), SSD Volume Type'. + 3. Architecture (your choice) +6. Instance Type + 1. Select a type with at least 2 vCPUs and 8G RAM (or whatever your technical requirements specify) then click 'Next: Configure Instance Details'. + 2. HINT: t3.large is sufficient, but you can choose a 'm' or 'c' class server if you prefer. +7. Key pair (login) + 1. Select 'Create a new key pair' (or use an existing one if preferred and available). + 2. Key pair name - Your choice (I selected the alias name that I will eventually assign to this Validator Node, “jecko”) + 3. Select RSA and .pem (or your choice) + 4. Click ‘Create Key Pair’ + 5. Your key pair is now automatically downloaded to your default Download location. + 6. Copy the downloaded file to a "secure and accessible location". For example, you can use a ~/pems directory for your .pem files. + 7. Change the permissions on your new .pem file + 1. chmod 600 ~/pems/jecko.pem +8. Network Settings + 1. Network - default should be fine + 2. Subnet - default (No preference, but record your choice) + 3. Firewall + 1. Select 'Create security group' (default) + 2. Change the default Allow SSH rule that is already in the group. + 1. To restrict SSH access to Admins only, add IP addresses for all admin users of the system in a comma separated list here. This part can be done later and instructions for doing so are included later in this guide. + 2. For Example: Select ‘My IP’ from the dropdown choices and then add more admins later. + 3. We’ll add more firewall rules later to restrict the traffic appropriately. +9. Configure Storage + 1. 
Root volume - Your choice, defaults are fine + 2. Click 'Add New Volume' + 1. Size - 250 GiB + 2. Encrypted at rest -> on + 3. Volume Type - your choice (magnetic standard is fine, SSD is more expensive and not required for a Node). +10. Advanced Details - Your choice (you can leave all values at defaults) + 1. Suggestion: Enable "Termination Protection" and leave the rest as default. +11. Review the summary, then click ‘Launch instance’ (lower right of the screen in the “Summary” window). +12. Read the information on the screen, become familiar with it and click on any links that you need. + 1. Scroll down and click the 'View all Instances' button (bottom right) to proceed. +13. On the Instances screen, select your instance (i.e. check only the box next to the instance you just created). + 1. Record the Instance ID, Availability Zone, VPC ID, and the Subnet ID. You will need these when you add the second NIC. (The Availability Zone must be recorded in full, e.g. af-south-1a.) +14. Stop your virtual machine so that you can configure firewalls and add a new Network Interface (NIC). + 1. Click 'Instance State' -> 'Stop instance' -> 'Stop' + 2. Wait for your instance in the instance list to stop and for the Instance State to be 'stopped' before proceeding. (Hint: This could take several minutes. You can create a subnet and security group while you wait, but you won’t be able to add the new NIC until the VM is stopped.) +15. Configure the Client-NIC firewall + 1. Click the networking tab + 2. Scroll down to 'Network interfaces' then scroll to the right on the existing interface and click on the security group name. + 3. Click 'Edit inbound rules', then click 'Add rule'. + 4. Type - Custom TCP Rule + 5. Protocol - TCP + 6. Port Range - 9702 + 7. Source - Anywhere ipv4 (ignore the warnings) + 8. Description - your choice (e.g. "Allow all external agents and clients to access this node through port 9702") + 9. Click 'Save rules' +16. 
Create a new Subnet for the second NIC (Node-NIC). + 1. Return to the EC2 view and select your VM. + 2. Scroll down in the instance details of your new VM and click on your VPC ID link. + 3. Select 'Subnets' from the new left menu, then click 'Create subnet'. + 1. VPC ID - select the same VPC as your new VM (recorded in the previous step). + 2. Subnet name - your choice (e.g. ValidatorNode9701-subnet) + 3. Availability Zone - must select the same Availability zone as your new VM (recorded in the previous step). + 4. IPv4 CIDR block - Type in a valid new subnet block similar (not the same) to the CIDR already showing above. (e.g. I used 172.31.128.0/24) + 5. Click 'Create Subnet' + 6. Return to the EC2 services main page. +17. Create a new security group for the second NIC (Node-NIC) + 1. On the EC2 side menu, click 'Security Groups' (under Network & Security) + 2. Click 'Create security group' + 3. Security group name - ValidatorNode9701-nsg + 4. Description -Your choice + 5. VPC - default (make sure it's the same as new subnet you just created) + 6. Before performing the next set of steps, obtain a list of Node IP addresses from your network administrator. (If you want to add the other Node IP's to the firewall later, open the firewall up by adding a rule to allow 0.0.0.0/0 to all and skip the rest of the steps in this section.) + 7. YOU CAN'T DO THIS STEP YET, SAVE IT FOR LATER: To get your own list of nodes on your network, run the following command from your validator node after installation is complete and the node is added to the network: \ + > **sudo current_validators --writeJson | node_address_list** + 8. Inbound rules: + 1. Repeat the following steps for each IP address in your Nodes list. + 2. Click 'Add rule' + 3. Type - Custom TCP + 4. Port range - 9701 (Must match the port that you will set up later in the Node software configuration and must be the same for all rules added to the allowed list) + 5. 
Source - Custom -> The next IP address from the Nodes list. (Be sure to add a /32 to the end of the address. Example: 44.242.86.156/32) + 6. Description - Enter the Alias name of the Node IP for ease of future management. + 9. Click 'Create security group' when you have added all of the Node IPs from your list. + 10. Record the security group Name and ID (e.g. ValidatorNode9701-nsg, sg-09c5205a3af5fb5c6) +18. Create a second Network Interface (Node-NIC) + 1. On the EC2 left side menu - Under 'Network & Security' click 'Network Interfaces' + 2. Click 'Create Network Interface' + 1. Description - your choice (e.g. "NIC for the Node IP and port on Jecko") + 2. Subnet -> Select the new subnet created in a previous step of these instructions. Double check that it is in the exact same availability zone as your instance. (e.g. us-east-2b) + 3. IPv4 Private IP -> auto assign + 4. Elastic Fabric Adapter - leave unchecked + 5. Security groups - Select the Group created during a previous step of these instructions (e.g. ValidatorNode9701-nsg) + 6. Click 'Create network interface' + 7. Select the new interface (only) and click the 'Attach' button in the top menu bar. + 1. Find and select the AWS VM instance ID (recorded in an earlier step) + 2. Click 'Attach' +19. Record the Network Interface ID of each network interface + 3. On EC2 left side menu - INSTANCES -> Instances + 4. Select your new instance + 5. At the bottom of the screen click “Networking” tab + 6. Scroll down to ‘Network interfaces’ + 7. Record the ‘Interface ID’ and the ‘Private IP Address’ for the Client and Node interfaces for later use. Usually ens5 is the Client-NIC and ens6 is the Node-NIC. +20. Create 2 Elastic IP’s and associate them with the NIC's + 1. For Indy Nodes on AWS we create Elastic IP addresses because we want the addresses to be static and the default is for them to be dynamically assigned. We do not want the IP address to change every time you have to reboot your server. + 2. 
On EC2 left side menu - Network & Security -> Elastic IPs + 1. Click 'Allocate Elastic IP Address' + 2. Verify the zone (border group) then click 'Allocate' + 3. Repeat the above 2 steps to allocate another IP address + 4. At this point you will not see both addresses created! A filter appears that blocks you from seeing any more new addresses created. Remove the filter to see all of the addresses. If you have created too many addresses, select the ones you want to remove, click 'Actions', then select 'Release Elastic IP addresses' and follow the prompts for removal. + 3. Give your new addresses appropriate names so that you can identify them later. (e.g. Jecko Client and Jecko Node) + 4. For each new Elastic IP do the following: + 1. Select one of the Elastic IPs you just created + 2. Click Actions -> Associate address + 1. Resource type - ‘Network interface’ + 2. Network Interface - <use one of the network interface IDs noted in the previous step> + 3. Private IP - Select the IP from the list (there should only be one option and it should match the internal IP address of the chosen interface) + 4. Leave the 'Reassociation' checkbox empty + 5. Click 'Associate' + 6. Click 'Clear filters' + 3. Repeat the above steps for the other interface. + 5. Click 'Clear filters' again. + 6. Check to make sure that both Elastic IPs have been associated, and then record and label the Public/Private IP address combinations in a place where you can get to them later. + 7. Click 'Instances' in the left menu and then select your instance. + 8. Select the Networking tab in the bottom pane, expand “Network interfaces (2)” and view each of the network interfaces to double check that you have recorded the Public and Private IP addresses associated with each named interface. (e.g. Client - ens5, 13.58.197.208, 172.31.26.65 and Node - ens6, 3.135.134.42, 172.31.128.42) This information will be used when you install the Validator on your instance. +21. Start your instance + 1. 
On the EC2 left side menu - click 'Instances' + 2. Select your instance - click Instance State -> Start instance. + 3. Wait for the 'Instance State' of your instance to be 'running' before performing the next step. +22. Log in to your VM + 1. From your Linux or Mac workstation do the following: (a Windows workstation will be different) + 2. ssh -i <public rsa key file> ubuntu@<Client IP Address> + 3. Where rsa key file is the ssh key .pem file generated earlier + 4. And where Client IP is the public address from NIC #1 (Client Public IP from your Node Installation Info spreadsheet) + 5. for example: ssh -i ~/pems/jecko.pem ubuntu@13.58.197.208 + 6. NOTE: I got an error the first time I ran the above to log in: "Permission denied" because "Permissions are too open" <for your pem file>. To correct the issue I ran chmod 600 ~/pems/jecko.pem and then I was able to log in successfully. +23. Configure networking to the second NIC + 1. From your instance's command prompt, run the command `ip a` and verify that you have 2 internal IP addresses that match what you have in your Node Installation Info spreadsheet. Note the names of the network interfaces. (e.g. ens5 and ens6) The remaining instructions in this section assume ens5 is your original primary NIC (Client-NIC) and ens6 is the secondary NIC (Node-NIC). + 2. Record the interface names, IP addresses, and MAC addresses of your 2 interfaces contained in the output of `ip a` + 1. The MAC address is found right after ‘link/ether’ for each interface and is formatted like this: 12:e6:fa:8f:42:79 + 2. For the ens6 or node interface, you might only have the MAC address displayed and not the local IP address yet. If so, use the IP address you recorded earlier for this interface. + 3. Find the default gateway for the main interface. + 1. `ip r` + 2. Look for the line that starts with ‘default’; the gateway address on it usually ends with .1 + 3. For example: 172.31.84.1 + 4. Disable automatic network management by cloud-init. Run the following: + 1. 
`sudo su -` + 2. `echo 'network: {config: disabled}' > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg` + 5. `vim /etc/iproute2/rt_tables` + 1. Add 2 lines + 800 800 + 801 801 + 6. `vim /etc/netplan/50-cloud-init.yaml` + 1. Replace the “ethernets:” section of the file with the following, but substitute in your own local (internal) IP addresses, and your own routes in several places. + + ethernets: + ens5: + addresses: + - 172.31.84.84/24 + gateway4: 172.31.84.1 + match: + macaddress: 12:e6:fa:8f:42:79 + mtu: 1500 + set-name: ens5 + routes: + - to: 0.0.0.0/0 + via: 172.31.84.1 + table: 800 + routing-policy: + - from: 172.31.84.84 + table: 800 + priority: 300 + nameservers: + addresses: + - 8.8.8.8 + - 8.8.4.4 + - 1.1.1.1 + ens6: + addresses: + - 172.31.128.159/24 + match: + macaddress: 12:69:78:aa:0d:b1 + mtu: 1500 + set-name: ens6 + routes: + - to: 0.0.0.0/0 + via: 172.31.128.1 + table: 801 + routing-policy: + - from: 172.31.128.159 + table: 801 + priority: 300 + nameservers: + addresses: + - 8.8.8.8 + - 8.8.4.4 + - 1.1.1.1 + 7. Please double and triple check that all of the information in the above file is correct before proceeding. Mistakes in the netplan file can cause you to lose access to your VM and you might have to start over. + 8. `netplan generate` + 9. If no output appears (no errors) run: + 1. `netplan apply` + 2. If the above command does not return you to the command prompt, then you made an error in your netplan file and will need to start over from the beginning of this document. Sorry! (You will likely be able to re-use most of the elastic IPs, subnets, security groups, and etc, that you created, but otherwise you need to start from the beginning.) + 10. NOTE: Netplan guidance came from: https://www.opensourcelisting.com/how-to-configure-multiple-network-interfaces/ + 11. Restart your instance. + 1. `reboot` + 12. ssh to your instance again as described earlier. + 1. `ssh -i <public rsa key file> ubuntu@<Client IP Address>` +24. 
Configure and mount the data disk. + 1. Find the name of your data disk: + 1. `sudo fdisk -l` + 2. In most cases **/dev/nvme1n1** will be the name of the 250 GiB data disk created during the EC2 instance setup. + 2. The following steps assume that your disk size is less than 2 TiB, that your disk is /dev/nvme1n1 and that you will be using MBR partitioning. + 3. `sudo fdisk /dev/nvme1n1` + 1. Create a new partition + 1. n + 2. p + 3. <defaults for the rest> TIP: press enter 3 times to accept the defaults and complete the process of creating a partition. + 4. Now print the partition table, write it to disk, and exit. + 5. p + 6. w + 4. Update the kernel's view of the partition table: + 1. `sudo partprobe` + 5. Add a filesystem to your new disk partition: + 1. `sudo mkfs -t ext4 /dev/nvme1n1p1` + 6. Mount the disk to the directory where the Node software does the most writing (/var/lib/indy): + 1. `sudo mkdir /var/lib/indy` + 2. `sudo mount /dev/nvme1n1p1 /var/lib/indy` + 7. Add the drive to /etc/fstab so that it mounts at server startup. + 1. `sudo blkid` + 2. Record the UUID of /dev/nvme1n1p1 for use in the /etc/fstab file. + 3. `sudo vim /etc/fstab` + 4. Add the following line to the end of the fstab file (substituting in your own UUID): + 1. `UUID=336030b9-df26-42e7-8c42-df7a967f3c1e /var/lib/indy ext4 defaults,nofail 1 2` + 2. Vim Hint: In vim, arrow down to the last line of the file, press the ‘o’ key and then paste in the above line. As before, <esc> then :wq will write and then exit the file. + 3. WARNING! If you mistakenly use the wrong UUID here and continue on (without the verifications listed below), you will likely have to remove your VM and start over. (At some point during the install process, ownership is changed on multiple files simultaneously, and accidentally using the UUID of nvme0n1p1 will cause that command to wreak havoc at the root of your drive.) +25. Restart the instance to check for NIC and Disk persistence. + 1. From EC2 select your instance, click 'Instance State' -> 'Reboot' + 2. 
Log in to your VM as before: + 1. `ssh -i <public rsa key file> ubuntu@<Client IP Address>` + 3. Check the NIC and Disk + 1. `ip a` + 2. The output of the above command should have 2 NICs with the correct IP addresses displayed. + 3. `df -h` + 4. The output of the above command should show /dev/nvme1n1p1 mounted at /var/lib/indy with the correct size (250G). + 5. More NIC and disk verifications will occur during the Indy Node install process. +26. Add a temporary administrative user as a safety net during Two Factor Authentication (2FA) setup. (This is optional, continue to the next step if you choose not to set up a temporary user.) + 1. `sudo adduser tempadmin` + 1. You can safely ignore messages like “sent invalidate(passwd) request, exiting” + 2. `sudo usermod -aG sudo tempadmin` + 3. Set up sshd_config to temporarily allow password login for the tempadmin user. + 1. `sudo vim /etc/ssh/sshd_config` + 2. Comment out the line containing ‘ChallengeResponseAuthentication’. + 1. #ChallengeResponseAuthentication no + 3. Make sure this line exists and is set to yes: + 1. PasswordAuthentication yes + 4. :wq to save and exit. + 5. `sudo systemctl restart sshd` + 6. The above lines will be altered again when you set up 2FA. + 4. To be able to log in, you will also likely need to set up an ssh key + 1. `sudo mkdir /home/tempadmin/.ssh` + 2. `sudo chown tempadmin:tempadmin /home/tempadmin/.ssh` + 3. `sudo vim /home/tempadmin/.ssh/authorized_keys` + 4. Paste the user's public key into the open file and then save it (:wq) (You can use the same key as you used for the ubuntu user in this case, since it is a temporary user) + 5. `sudo chown tempadmin:tempadmin /home/tempadmin/.ssh/authorized_keys` +27. By default, AWS assigns a hostname that is not user-friendly (e.g. ip-172-31-30-158), which makes Google Authenticator and 2FA harder to use. To change your hostname to “NewHostName” do the following: + 1. `sudo hostnamectl set-hostname NewHostName` + 2. `sudo vi /etc/hosts` + 1. 
Add a line right after “localhost” + 2. 127.0.0.1 NewHostName + 3. `sudo vi /etc/cloud/cloud.cfg` + 1. Search for preserve_hostname and change the value from false to true: +28. Setup 2FA for SSH access to the Node for your base user. + 1. Optional: Login in a separate terminal as your tempadmin user (that has sudo privileges) to have a backup just in case something goes wrong during setup. + 1. `ssh tempadmin@<Client IP Addr>` + 2. Install Google Authenticator, Duo, or Authy on your phone. + 3. As your base user on the Node VM, run the following to install the authenticator: + 1. `sudo apt-get install libpam-google-authenticator` + 4. Configure the authenticator to allow both password and SSH key login with 2FA by changing 2 files: + 1. `sudo vim /etc/pam.d/common-auth` + 2. Add the following line as the first uncommented line in the file + 1. auth sufficient pam_google_authenticator.so + 2. <esc> + 3. :wq + 3. `sudo vim /etc/ssh/sshd_config` + 1. add/configure the following lines: + 1. `ChallengeResponseAuthentication yes` + 2. `PasswordAuthentication no` + 3. `AuthenticationMethods publickey,keyboard-interactive` + 4. `UsePAM yes` + 2. If you see any of the above lines commented out, remove the # to uncomment them. If you don't see any of the above lines, make sure to add them. If you see those lines configured in any different way, edit them to reflect the above. In my file, a. needed changed b. and d. were already set, and c. needed added (I added it right by b.) + 3. :wq + 4. `sudo systemctl restart sshd` + 5. Setup your base user to use 2FA by running the following from a terminal: + 1. `google-authenticator` + 2. Answer ‘y’ to all questions asked during the setup + 3. Save the secret key, verification code and scratch codes in a safe place. These are all just for your user and can be used to login or to recover as needed. + 6. On your phone app add an account and then scan the barcode or enter the 16 character secret key from the previous steps output. + 7. 
You should now be able to login using 2FA. First, check that login still works for your base user in a new terminal. If that doesn’t work, double check all of the configuration steps above and then restart sshd again. If it still doesn’t work, it’s possible that a server restart is required to make 2FA work (NOTE: It is dangerous to restart at this point, because then all of your backup terminals that are logged in will be logged out and there is a chance that you will lose access. Please check that all other steps have been executed properly before restarting.) +29. Add other administrative users: + 1. Send the other new admin users the following instructions for generating their own SSH keys: + 1. `ssh-keygen -P "" -t rsa -b 4096 -m pem -f ~/pems/validatornode.pem` + 2. Have the new users send you their public key (e.g. validatornode.pem.pub if they do the above command) + 3. Also have them send you their Public IP address so that you can add it to the EC2 firewall to allow them access. Optionally, have them send a preferred username also. + 2. Add their IP addresses to the EC2 firewall: + 1. From the EC2 instance screen, select your instance and scroll down to find and click on the primary security group. (e.g. ValidatorClient9702) + 2. Click the Inbound rules tab just below the middle of the screen and click the 'Edit inbound rules' button. + 3. In the new window that pops up, click in the 'Source' field of the port 22 rule to add the new users' IP addresses separated by commas.(no spaces) + 4. Click ‘Save’ (Note: Restart is not needed. As soon as you save, they should have access.) + 3. Add the users to the server: + 1. Login to the Node as the base user. + 2. Run the following commands, substituting the username in for <newuser> + 3. `sudo adduser <newuser>` + 1. You can safely ignore messages like “sent invalidate(passwd) request, exiting“ + 2. For “Enter new UNIX password:” put password1 (This will be changed later) + 3. Enter a name (optional) + 4. 
Defaults are fine for the rest + 4. `sudo usermod -aG sudo <newuser>` + 5. Then create a file in the newusers home directory: + 1. `sudo mkdir /home/<newuser>/.ssh` + 2. `sudo chown <newuser>:<newuser> /home/<newuser>/.ssh` + 3. `sudo vim /home/<newuser>/.ssh/authorized_keys` + 4. Paste the users public key into the open file and then save it (:wq) + 5. `sudo chown <newuser>:<newuser> /home/<newuser>/.ssh/authorized_keys` + 6. Repeat the above for each new admin user you create. + 4. The new users are now able to login. Since 2FA is required, when you send the password to each of the new users, also send the following instructions (HINT: fill in the username, Client IP address, and password for them with the correct values): + 1. Thanks for agreeing to help with the administration of our Indy Validator Node. Please login to the node, change your password, and setup Two Factor Authentication (2FA) using the following instructions: + 1. ssh -i <your private SSH key file> <username>@<Client IP Addr> + 2. Type in password1 for your password + 3. On successful login, type in ‘passwd’ to change your password on the Validator Node. Please use a unique password of sufficient length and store it in a secure place (i.e. a password manager). + 4. To set up 2FA, type in ‘google-authenticator’ + 1. Answer ‘y’ to all questions asked during the setup + 2. Save the secret key, verification code, and scratch codes in a safe place. These are all for your user and can be used to login or to recover as needed. + 5. Install Google Authenticator, Duo, Authy, or other google-authenticator compatible app on your phone or device. + 6. On your 2FA phone app, add an account, and then scan the barcode or enter the 16 character secret key from step 4’s output. + 7. Log out and then log back in to check and make sure it worked! + 5. All of your secondary admin users should be setup now. +30. You can now begin the Indy Node installation using the Validator Preparation Guide. 
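
The fstab warning in step 24 (a wrong UUID for /var/lib/indy) can be checked mechanically before the reboot in step 25. The following is only a minimal sketch and not part of the original procedure: the function name `check_fstab_uuid` is hypothetical, and it assumes a whitespace-separated fstab with the device in the first field and the mount point in the second.

```shell
# Hypothetical helper (not part of the Indy tooling): confirm that the fstab
# entry for a mount point uses the UUID you expect.
# Usage: check_fstab_uuid <fstab file> <mount point> <expected uuid>
# Returns 0 on match, 1 on mismatch, 2 if the mount point has no entry.
check_fstab_uuid() {
    fstab="$1"
    mountpoint="$2"
    expected="$3"

    # Skip comment lines; print the device field of the first entry whose
    # second field is the requested mount point.
    found=$(awk -v mp="$mountpoint" '$1 !~ /^#/ && $2 == mp { print $1; exit }' "$fstab")

    if [ -z "$found" ]; then
        echo "no entry for $mountpoint in $fstab"
        return 2
    fi
    if [ "$found" = "UUID=$expected" ]; then
        echo "ok: $mountpoint uses UUID=$expected"
        return 0
    fi
    echo "MISMATCH: $mountpoint uses $found, expected UUID=$expected"
    return 1
}
```

On the node, one way to call it would be `check_fstab_uuid /etc/fstab /var/lib/indy "$(sudo blkid -s UUID -o value /dev/nvme1n1p1)"`, which compares the fstab entry against the UUID that `blkid` reports for the data partition. A mismatch means the fstab line should be corrected before rebooting.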
diff --git a/docs/source/install-docs/Azure-NodeInstall-20.04 b/docs/source/install-docs/Azure-NodeInstall-20.04 new file mode 100644 index 000000000..eb9d902b3 --- /dev/null +++ b/docs/source/install-docs/Azure-NodeInstall-20.04 @@ -0,0 +1,374 @@ +## Azure - Install a VM for an Indy Node - Ubuntu 20.04 + +#### Introduction +The following steps are one way to adhere to the Indy Node guidelines for installing an Azure instance server to host an Indy Node. For the hardware requirements applicable for your network, ask the Network administrator or refer to the Technical Requirements document included in the Network Governance documents for your network. +NOTE: Since Azure regularly updates their user interface, this document becomes outdated quickly. The general steps can still be followed with a good chance of success, but please submit a PR with any changes you see or inform the author of the updates (lynn@indicio.tech) to keep this document up to date. + +#### Installation + +1. Create a new or open an existing “Resource Group” (“Create New” was used for this document.) You can also do this later. +2. From the Azure portal ‘home’ click 'Create a resource'. +3. Type “ubuntu server” in the search field, ‘Enter’, then select 'Ubuntu server 20.04 LTS' +4. Click 'Create' to deploy with Resource Manager. +5. TIP: Throughout the process of going through all of the tabs in the next several steps do NOT click your browsers back button as this will remove all previous selections and cause you to have to start over. Instead, you should either click on each tab across the top of the interface, or use the ‘Previous’ and ‘Next’ buttons at the bottom for navigation between the steps. +6. Basics tab + 1. Project Details + 1. Subscription - Your choice. + 2. Resource group - Your choice. Recommended: For ease of administration, create a new one of your choosing for your Node. For example: NODE + 2. Instance Details + 1. VM Name - Your Choice + 2. 
Region - Recommendation: select the region that your business resides in, for added network diversity. + 3. Availability options - select 'No infrastructure redundancy required'. + 4. Image - Default (already filled in with Ubuntu Server 20.04 LTS) + 5. Azure Spot instance - select No (or uncheck the box). + 6. Size - click 'see all sizes', click the x by 'Size : Small(0-6)', and select a size with at least 2 vCPUs and 8G RAM or greater then click ‘select’. Minimum: Standard B2ms, 2 vcpus, 8 GiB memory. Or follow the governance for the network you are joining. + 3. Administrator Account + 1. Authentication type: SSH public key + 2. Username: <your choice> (e.g. “ubuntu”) + 3. SSH public key: “Generate new key pair”. + 4. Click Next:Disks at the bottom of the screen. +7. Disks tab + 1. Disk options + 1. OS disk type - standard HDD is inexpensive and acceptable, but the choice is yours. + 2. Encryption type - check the box for “encrypted at rest” to be enabled. + 2. Data disks - click 'Create and attach a new disk' and a new entry screen appears: + 1. Name - default is fine + 2. Source Type - default (None) + 3. Size - click 'Change size' -> Select storage type as Standard HDD (or better) and then select 256 GiB (or use the disk size required by governance documents) and click OK + 4. Encryption type - default + 5. Click ‘OK’ + 3. Leave LUN as 0 + 4. Leave advanced options at default (use managed disks - Yes) NOTE: Changing managed disks to No is not supported and it resets all selections made in the Data Disks section! + 5. DO NOT click 'Review + create' yet. This is not for reviewing and creating the disk, it is for the whole VM and we have a few more tabs to go through before we are ready for that. + 6. Click Next:Networking +8. Networking tab + 1. Virtual network - default + 2. Subnet - default + 3. Public IP - click ‘Create new’ (for the Client-NIC) + 1. SKU - standard (NOTE: this is critical! 
It must match what you choose later for the Second IP address on the Node-NIC) + 2. Click ‘OK’ + 4. NIC network security group - Advanced + 5. Configure network security group - click 'Create new' + 1. Change the name for ease of identification, because this is the Client-NIC’s nsg and it will operate on port 9702 (i.e. ValidatorClient9702-nsg) + 2. Click on the provided default SSH entry to change it + 1. Source - IP Addresses + 2. Source IP addresses - add all the Node admins’ workstations IP addresses to allow them to access the machine for maintenance. + 3. Priority - 900 + 4. Leave the rest of the values at default and click ‘OK’ (lower right). + 3. Remove rules allowing port 80 and port 443. + 4. Click '+ Add an inbound rule' to setup this NIC as the “Client” NIC: + 1. Source - Any + 2. Source port ranges - * (Any) + 3. Destination - Any + 4. Destination port ranges - 9702 + 5. Protocol - TCP + 6. Name - your choice (probably make it more appropriate than the default, i.e. ClientPort_9702) + 7. Description - add if desired. For example, you might add: “This is the rule that opens up port 9702 for all client connections to the Validator Node.” + 8. Click “Add” + 5. You should now have 2 inbound rules, 1 for admin SSH access and 1 for client access to the node on port 9702. + 6. Click 'OK' + 6. Delete public IP and NIC when VM is deleted - leave unchecked + 7. Accelerated networking - unchecked + 8. Place this virtual machine behind an existing load balancing solution? - unchecked + 9. Click Next:Management +9. Management tab + 1. Identity - Off + 2. Enable auto-shutdown - Off (required) + 3. Enable backup - On (unless you have another backup solution. Some type of backup is required) + 1. Recovery Services vault - your choice (some of the steps below will change if you select ‘Use existing’) + 2. Resource group - your choice. For example: (new) NODE + 3. Backup Policy - your choice, but backup is required. Suggested Backup policy setup follows: + 1. 
Click ‘Create new’ + 2. Policy name - WeeklyPolicy + 3. Backup schedule - your choice + 4. Retain instant recovery snapshots for - 5 Day(s) + 5. Retention range - defaults are fine + 6. Click ‘OK’ +10. Monitoring tab + 1. Your choice +11. Advanced tab + 1. Defaults are fine +12. Tags tab + 1. No tags needed +13. Click Review + create + 1. Check all values for accuracy +14. If accurate, click ‘Create’ + 1. Download of ssh key occurs here +15. Wait for the message “Your deployment is complete” (Hint: This will take a few minutes) +16. Click ‘Go to resource’ +17. On the VM overview screen find and record the public and private(local) IP addresses for later use. These are the Client IP’s. +18. Stop your Virtual machine so that you can add a new NIC. + 1. From Azure Portal Home, select your virtual machine then select ‘overview’. + 2. From the menu bar across the top select ‘Stop’ then ‘Yes’ to stop the VM + 3. Wait for notification that states “Successfully stopped virtual machine” (Hint: This could take several minutes, you can create a new Node IP address while you wait, but you won’t be able to add the new NIC until the VM is stopped) +19. Add a subnet to your virtual network. + 1. Select ‘Virtual networks’ from the Azure Home screen. (Hint: You might need to click ‘More services’ and search for it if you do not see it on the main page.) + 2. Click on the new Virtual network that you made earlier as part of these instructions. (example: NODE-vnet) + 3. Click ‘Address space’ in the left menu + 1. Click in the ‘Add additional address range’ entry box + 2. Type in 10.2.0.0/16 + 3. Click ‘Save’ in the bottom left + 4. Click ‘Subnets’ in the left menu + 1. Click ‘+ Subnet’ on the top menu + 2. Name - your choice (i.e. nodeSubnet-9701) + 3. Address range - 10.2.0.0/24 + 4. Defaults for the rest + 5. Click ‘Save’ +20. Add a second NIC to ValidatorNode VM + 1. From Azure home, find and select your new VM + 2. 
Select ‘Networking’ from the side menu of the Azure portal Virtual machine interface. + 3. Select ‘Attach network interface’ from the top menu + 4. Click ‘Create and attach network interface’ + 5. Subscription - your choice + 6. Resource group - NODE (**must be the same** as the new Node VM) + 7. Name - your choice (ValidatorNode9701-NIC) + 8. Subnet - Select the subnet created in the previous step of these instructions. + 9. NIC network security group -> Advanced + 10. Network security group - Click the arrow to create a new group. + 1. Click ‘+ Create new’ + 2. Name - ValidatorNode9701-nsg + 3. The following steps must be repeated for each Node in the Indy Network that you will be a part of. For a list of IPs and ports in your network, please ask your network administrator. Note: This step can be done later and the “Allowed list” that you begin during this step needs updated every time a new node is added to your network. + 4. LATER: To get your own list of nodes on your network, run the following command from your validator node after installation is complete and the node is added to the network: + `sudo current_validators --writeJson | node_address_list` + 1. Click ‘+ Add an inbound rule’ + 2. Source - IP Addresses + 1. Enter the next IP address from the IP/port list. + 3. Source port ranges - * + 4. Destination - IP addresses (your local IP address) + 1. 10.2.0.5 + 5. Destination port ranges - 9701 (Must match the port that you will set up later in the Node software configuration and must be the same for all rules added to the allowed list) + 6. Protocol - TCP + 7. Action - Allow + 8. Priority - your choice (default value should work here) + 9. Name - your choice. Recommended to add the name of the node allowed access for ease of future removal or IP/port change. + 10. Click ‘Add’ + 4. Repeat steps iii.1-10 above until all nodes in the network have been added to your “allowed list” + 5. Click ‘OK’ to complete the Security Group creation + 11. 
Private IP address assignment - Static + 12. Private IP address - 10.2.0.5 (or another address if preferred. Record this as the private (local) IP address of node_ip) + 13. Click ‘Create’ to create the new NIC + 14. Select ‘Attach network interface’ from the top menu (again) + 15. Select the new NIC from the dropdown list (if it isn’t there, it may already be connected) + 16. Click ‘OK’ to complete the addition of the new NIC to the VM +21. Add a static public IP address to the new NIC + 1. Click on the new Network Interface name then click on the name again (next to **Network Interface:**) to open the ‘Overview’ view for the new NIC. + 2. Click ‘IP configurations’ in the left menu. + 3. Click ‘ipconfig1’ to open the settings for the configuration + 4. Public IP address - select “Associate” + 1. Click ‘Create new’ + 1. Name - your choice (e.g. ValidatorNode9701-ip) + 2. SKU - Standard (HINT: This value must match what you used for the first IP address for your VM!) + 3. Click ‘OK’ + 2. Assignment - Static + 3. IP address - 10.2.0.5 + 4. Click ‘Save’ (Upper left) + 5. Click the ‘X’ in the upper right of the active window **twice** to close the IP configuration windows. + 6. Refresh the screen (refresh browser, then click again on the second NIC) to view and copy the new Public IP just created. Save that value for future use. + 7. Click ‘Overview’ on the left bar to prepare for the next step. +22. Start your new VM and then log in to your VM + 1. From your Linux or Mac workstation do the following: (a Windows workstation will be different) + 2. ssh -i <public rsa key file> ubuntu@<Client IP Address> + 3. Where rsa key file is the ssh key .pem file generated earlier + 4. And where Client IP is the public address from NIC #1 (Client Public IP from your Node Installation Info spreadsheet) + 5. for example: ssh -i ~/pems/jecko.pem ubuntu@13.58.197.208 + 6. 
NOTE: I got an error the first time I ran the above to login: "Permission denied" because "Permissions are too open" <for your pem file>. To correct the issue I ran `chmod 600 ~/pems/jecko.pem` and then I was able to login successfully. +23. Configure networking to the second NIC + 1. From your instance's command prompt, run the command `ip a` and verify that you have 2 internal IP addresses that match what you have in your Node Installation Info spreadsheet. Note the names of the network interfaces. (e.g. eth0 and eth1) The remaining instructions in this section assume eth0 is your original primary NIC (Client-NIC) and eth1 is the secondary NIC (Node-NIC). + 2. Record the interface names, IP addresses, and MAC addresses of your 2 interfaces contained in the output of `ip a` + 1. The MAC address is found right after ‘link/ether’ for each interface and is formatted like this: 12:e6:fa:8f:42:79 + 3. Find the default gateway for the main interface. + 1. `ip r` + 2. Look for the line that begins with ‘default’; the gateway address on that line usually ends with .1 + 3. For example: 10.1.0.1 + 4. Disable cloud-init's automatic network configuration. Run the following: + 1. `sudo su -` + 2. `echo 'network: {config: disabled}' > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg` + 5. `vim /etc/iproute2/rt_tables` + 1. Add 2 lines + 800 800 + 801 801 + 6. `vim /etc/netplan/50-cloud-init.yaml` + 1. Replace the “ethernets:” section of the file with the following, but substitute in your own local (internal) IP addresses, and your own routes in several places. 
+ + ethernets: + eth0: + addresses: + - 172.31.84.84/24 + gateway4: 172.31.84.1 + match: + macaddress: 12:e6:fa:8f:42:79 + mtu: 1500 + set-name: eth0 + routes: + - to: 0.0.0.0/0 + via: 172.31.84.1 + table: 800 + routing-policy: + - from: 172.31.84.84 + table: 800 + priority: 300 + nameservers: + addresses: + - 8.8.8.8 + - 8.8.4.4 + - 1.1.1.1 + eth1: + addresses: + - 172.31.128.159/24 + match: + macaddress: 12:69:78:aa:0d:b1 + mtu: 1500 + set-name: eth1 + routes: + - to: 0.0.0.0/0 + via: 172.31.128.1 + table: 801 + routing-policy: + - from: 172.31.128.159 + table: 801 + priority: 300 + nameservers: + addresses: + - 8.8.8.8 + - 8.8.4.4 + - 1.1.1.1 + 7. Please double and triple check that all of the information in the above file is correct before proceeding. Mistakes in the netplan file can cause you to lose access to your VM and you might have to start over. + 8. `netplan generate` + 9. If no output appears (no errors) run: + 1. `netplan apply` + 2. If the above command does not return you to the command prompt, then you made an error in your netplan file and will need to start over from the beginning of this document. Sorry! (You will likely be able to re-use most of the public IPs, subnets, security groups, etc. that you created, but otherwise you need to start from the beginning.) + 10. NOTE: Netplan guidance came from: https://www.opensourcelisting.com/how-to-configure-multiple-network-interfaces/ + 11. Restart your instance. + 1. `reboot` + 12. ssh to your instance again as described earlier. + 1. `ssh -i <public rsa key file> ubuntu@<Client IP Address>` +24. Configure and mount the data disk. + 1. Find the name of your data disk: + 1. `sudo fdisk -l` + 2. In most cases **/dev/sda** will be the name of the 250 GiB data disk created during the instance setup. + 2. The following steps assume that your disk size is less than 2 TiB, that your disk is /dev/sda, and that you will be using MBR partitioning. + 3. `sudo fdisk /dev/sda` + 1. Create a new partition + 1. 
n + 2. p + 3. <defaults for the rest> TIP: press enter 3 times to accept the defaults and complete the process of creating a partition. + 4. Now, print and write the partition and exit. + 5. p + 6. w + 4. Update the kernel: + 1. `sudo partprobe` + 5. Add a filesystem to your new disk partition: + 1. `sudo mkfs -t ext4 /dev/sda1` + 6. Mount the disk to the directory where the Node software does the most writing (/var/lib/indy): + 1. `sudo mkdir /var/lib/indy` + 2. `sudo mount /dev/sda1 /var/lib/indy` + 7. Add the drive to /etc/fstab so that it mounts at server startup. + 1. `sudo blkid` + 2. Record the UUID of /dev/sda1 for use in the /etc/fstab file. + 3. `sudo vim /etc/fstab` + 4. Add the following line to the end of the fstab file (substituting in your own UUID): + 1. `UUID=336030b9-df26-42e7-8c42-df7a967f3c1e /var/lib/indy ext4 defaults,nofail 1 2` + 2. Vim Hint: In vim, arrow down to the last line of the file, press the ‘o’ key and then paste in the above line. As before, <esc> then :wq will write and then exit the file. + 3. WARNING! If you mistakenly use the wrong UUID here and continue on (without the verifications listed below), you will likely have to remove your VM and start over. (At some point during the install process, ownership is changed on multiple files simultaneously, and using the wrong UUID will cause that command to wreak havoc at the root of your drive.) +25. Restart the instance to check for NIC and Disk persistence. + 1. From your Virtual Machine overview in Azure, click ‘Restart’, then ‘Yes’ + 2. Login to your VM as before: + 1. `ssh -i <public rsa key file> ubuntu@<Client IP Address>` + 3. Check the NIC and Disk + 1. `ip a` + 2. The output of the above command should have 2 NICs with the correct IP addresses displayed. + 3. `df -h` + 4. The output of the above command should show /var/lib/indy mounted to the /dev/sda1 disk with the correct size (250G). + 5. 
More NIC and disk verifications will occur during the Indy Node install process. +26. Add a temporary administrative user as a safety net during Two Factor Authentication (2FA) setup. (This is optional, continue to the next step if you choose not to set up a temporary user.) + 1. `sudo adduser tempadmin` + 1. You can safely ignore messages like “sent invalidate(passwd) request, exiting“ + 2. `sudo usermod -aG sudo tempadmin` + 3. Setup sshd_config to temporarily allow password login for the tempadmin user. + 1. `sudo vim /etc/ssh/sshd_config` + 2. Comment out the line containing ‘ChallengeResponseAuthentication’. + 1. #ChallengeResponseAuthentication no + 3. Make sure this line exists and is set to yes: + 1. PasswordAuthentication yes + 4. :wq to save and exit. + 5. `sudo systemctl restart sshd` + 6. The above lines will be altered again when you set up 2FA. + 4. To be able to login, you will also likely need to setup an ssh key + 1. `sudo mkdir /home/tempadmin/.ssh` + 2. `sudo chown tempadmin:tempadmin /home/tempadmin/.ssh` + 3. `sudo vim /home/tempadmin/.ssh/authorized_keys` + 4. Paste the users public key into the open file and then save it (:wq) (You can use the same key as you used for the ubuntu user in this case, since it is a temporary user) + 5. `sudo chown tempadmin:tempadmin /home/tempadmin/.ssh/authorized_keys` +27. If Azure does not setup a user-friendly hostname, then for an easy to use experience with Google authenticator and 2FA, change your hostname to “NewHostName” by doing the following: + 1. `sudo hostnamectl set-hostname NewHostName` + 2. `sudo vi /etc/hosts` + 1. Add a line right after “localhost” + 2. 127.0.0.1 NewHostName + 3. `sudo vi /etc/cloud/cloud.cfg` + 1. Search for preserve_hostname and change the value from false to true: +28. Setup 2FA for SSH access to the Node for your base user. + 1. 
Optional: Login in a separate terminal as your tempadmin user (that has sudo privileges) to have a backup just in case something goes wrong during setup. + 1. `ssh tempadmin@<Client IP Addr>` + 2. Install Google Authenticator, Duo, or Authy on your phone. + 3. As your base user on the Node VM, run the following to install the authenticator: + 1. `sudo apt-get install libpam-google-authenticator` + 4. Configure the authenticator to allow both password and SSH key login with 2FA by changing 2 files: + 1. `sudo vim /etc/pam.d/common-auth` + 2. Add the following line as the first uncommented line in the file + 1. auth sufficient pam_google_authenticator.so + 2. <esc> + 3. :wq + 3. `sudo vim /etc/ssh/sshd_config` + 1. add/configure the following lines: + 1. `ChallengeResponseAuthentication yes` + 2. `PasswordAuthentication no` + 3. `AuthenticationMethods publickey,keyboard-interactive` + 4. `UsePAM yes` + 2. If you see any of the above lines commented out, remove the # to uncomment them. If you don't see any of the above lines, make sure to add them. If you see those lines configured in any different way, edit them to reflect the above. In my file, a. needed changed b. and d. were already set, and c. needed added (I added it right by b.) + 3. :wq + 4. `sudo systemctl restart sshd` + 5. Setup your base user to use 2FA by running the following from a terminal: + 1. `google-authenticator` + 2. Answer ‘y’ to all questions asked during the setup + 3. Save the secret key, verification code and scratch codes in a safe place. These are all just for your user and can be used to login or to recover as needed. + 6. On your phone app add an account and then scan the barcode or enter the 16 character secret key from the previous steps output. + 7. You should now be able to login using 2FA. First, check that login still works for your base user in a new terminal. If that doesn’t work, double check all of the configuration steps above and then restart sshd again. 
If it still doesn’t work, it’s possible that a server restart is required to make 2FA work (NOTE: It is dangerous to restart at this point, because then all of your backup terminals that are logged in will be logged out and there is a chance that you will lose access. Please check that all other steps have been executed properly before restarting.) +29. Add other administrative users: + 1. Send the other new admin users the following instructions for generating their own SSH keys: + 1. `ssh-keygen -P "" -t rsa -b 4096 -m pem -f ~/pems/validatornode.pem` + 2. Have the new users send you their public key (e.g. validatornode.pem.pub if they do the above command) + 3. Also have them send you their Public IP address so that you can add it to the Azure firewall to allow them access. Optionally, have them send a preferred username also. + 2. Add their IP addresses to the Azure firewall: + 1. From the Azure portal, select your VM name and click ‘Networking’ in the left menu. + 2. Select the Client NIC (default) and then click on the priority 900 rule allowing port 22 access to your Client IP. + 3. In the new window that pops up, add the new users' IP addresses to the ‘Source IP addresses’ field, separated by commas.(no spaces) + 4. Click ‘Save’ (Note: Restart is not needed. As soon as you save, they should have access.) + 3. Add the users to the server: + 1. Login to the Node as the base user. + 2. Run the following commands, substituting the username in for <newuser> + 3. `sudo adduser <newuser>` + 1. You can safely ignore messages like “sent invalidate(passwd) request, exiting“ + 2. For “Enter new UNIX password:” put password1 (This will be changed later) + 3. Enter a name (optional) + 4. Defaults are fine for the rest + 4. `sudo usermod -aG sudo <newuser>` + 5. Then create a file in the newusers home directory: + 1. `sudo mkdir /home/<newuser>/.ssh` + 2. `sudo chown <newuser>:<newuser> /home/<newuser>/.ssh` + 3. `sudo vim /home/<newuser>/.ssh/authorized_keys` + 4. 
Paste the users public key into the open file and then save it (:wq) + 5. `sudo chown <newuser>:<newuser> /home/<newuser>/.ssh/authorized_keys` + 6. Repeat the above for each new admin user you create. + 4. The new users are now able to login. Since 2FA is required, when you send the password to each of the new users, also send the following instructions (HINT: fill in the username, Client IP address, and password for them with the correct values): + 1. Thanks for agreeing to help with the administration of our Indy Validator Node. Please login to the node, change your password, and setup Two Factor Authentication (2FA) using the following instructions: + 1. ssh -i <your private SSH key file> <username>@<Client IP Addr> + 2. Type in password1 for your password + 3. On successful login, type in ‘passwd’ to change your password on the Validator Node. Please use a unique password of sufficient length and store it in a secure place (i.e. a password manager). + 4. To set up 2FA, type in ‘google-authenticator’ + 1. Answer ‘y’ to all questions asked during the setup + 2. Save the secret key, verification code, and scratch codes in a safe place. These are all for your user and can be used to login or to recover as needed. + 5. Install Google Authenticator, Duo, Authy, or other google-authenticator compatible app on your phone or device. + 6. On your 2FA phone app, add an account, and then scan the barcode or enter the 16 character secret key from step 4’s output. + 7. Log out and then log back in to check and make sure it worked! + 5. All of your secondary admin users should be setup now. +30. You can now begin the Indy Node installation using the Validator Preparation Guide. 
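
The four sshd_config settings listed in step 28 can be verified before restarting sshd, which lowers the risk of locking yourself out. The following is a minimal sketch rather than part of the original procedure: the function name `check_sshd_2fa` is hypothetical, and it assumes each setting appears on its own line exactly as written in step 28 (no leading whitespace, single spaces).

```shell
# Hypothetical helper (not part of the Indy tooling): report which of the
# four 2FA-related settings from step 28 are active in an sshd_config file.
# Usage: check_sshd_2fa <sshd_config file>
# Returns the number of settings that are missing (0 means all are present).
check_sshd_2fa() {
    config="$1"
    missing=0
    for want in \
        "ChallengeResponseAuthentication yes" \
        "PasswordAuthentication no" \
        "AuthenticationMethods publickey,keyboard-interactive" \
        "UsePAM yes"
    do
        # -x matches whole lines only and -F treats the pattern as a fixed
        # string, so commented-out lines ("#UsePAM yes") do not count.
        if grep -qxF "$want" "$config"; then
            echo "ok:      $want"
        else
            echo "missing: $want"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example, `check_sshd_2fa /etc/ssh/sshd_config` prints one line per setting. Because the match is exact, a setting written with different spacing or capitalization is reported as missing even though sshd would accept it; `sudo sshd -T` prints the effective configuration and is the more authoritative check.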
diff --git a/docs/source/install-docs/GC-NodeInstall-20.04 b/docs/source/install-docs/GC-NodeInstall-20.04 new file mode 100644 index 000000000..626d42dd3 --- /dev/null +++ b/docs/source/install-docs/GC-NodeInstall-20.04 @@ -0,0 +1,378 @@ +## GC - Install a VM for an Indy Node - Ubuntu 20.04 + +#### Introduction +The following steps are one way to adhere to the Indy Node guidelines for installing a Google Cloud(GC) instance server to host an Indy Node. For the hardware requirements applicable for your network, ask the Network Administrator or refer to the Technical Requirements document included in the Network Governance documents for your network. +NOTE: Since GC regularly updates their user interface, this document becomes outdated quickly. The general steps can still be followed with a good chance of success, but please submit a PR with any changes you see or inform the author of the updates (lynn@indicio.tech) to keep this document up to date. + +#### Installation +1. To prepare for VM creation, there are a few preliminary steps needed. First you might need to create a project in which you will create your VM. You will then need to set up items needed for Node networking (detailed steps below). You will also need to create a snapshot schedule so that your VM can be backed up automatically (optional, but this is the only method described herein that satisfies the "backup" requirement). +2. From the GCP console ([https://console.cloud.google.com/](https://console.cloud.google.com/)) scroll down in the upper left hamburger menu to the 'Networking' section, select 'VPC Network', then 'VPC Networks' If you haven’t already, you might need to “Enable” the compute engine API. + 1. Before you begin, decide on a 'region' in which to run your VM that closely matches the jurisdiction of your company's corporate offices. Record the region selected as it will be used later in these instructions. + 2. Create 2 new VPC Networks using the following steps. + 1. 
Click 'CREATE VPC NETWORK' to create a network for your Client connection on your node. + 2. Name - your choice (e.g. client-vpc-9702) + 3. Description - your choice + 4. Subnets - select 'Custom' and create a new subnet. + 1. Expand the 'new subnet' section + 2. Name - your choice (e.g. client-subnet-9702) + 3. Region - Select the region chosen earlier. + 4. IP address range - Type in a valid new subnet block. (e.g. 10.0.1.0/24) + 5. Private Google access - off + 6. Flow logs - your choice (e.g. off) + 7. Click 'Done' + 5. Dynamic routing mode - Regional + 6. Click 'Create' + 7. Repeat the above steps to create a second VPC Network for the Node IP of your server using names node-vpc-9701 and node-subnet-9701 and a range of 10.0.2.0/24 + 3. Now set up the firewalls for your new VPC's + 4. Click on the Client VPC in the list of VPC Networks - left side (e.g. client-vpc-9702) + 1. Click 'Firewalls' in about the middle of the page, and then click 'ADD FIREWALL RULE' to add SSH access through the Client VPC. + 1. Name - your choice (e.g. ssh-for-admin-access) + 2. Logs - Off + 3. Network - client-vpc-9702 (should already be set) + 4. Priority - default is fine + 5. Direction of traffic - Ingress + 6. Action on match - Allow + 7. Targets - All instances in the network (If you have other VM's using the same VPC as this one, then perform the optional steps listed next) + 8. OPTIONAL: Targets - Specified target tags + 1. Target tags - client9702 (record this value as you will need to associate it later with the VM.) + 9. Source filter - IPv4 ranges + 10. Source IP ranges - Enter the public IP addresses or ranges for your Node Administrators. (e.g. 67.199.174.247) + 11. Protocols and ports - Specified protocols and ports + 1. Select the tcp box and enter 22 for the port. + 12. Click 'Create' + 2. Click 'Firewall rules' in about the middle of the page, and then click 'ADD FIREWALL RULE' to add port 9702 access through the Client VPC. + 1. Name - your choice (e.g. 
client-access-9702) + 2. Logs - Off + 3. Network - client-vpc-9702 (should already be set) + 4. Priority - default is fine + 5. Direction of traffic - Ingress + 6. Action on match - Allow + 7. Targets - All instances in the network + 8. Source filter - IPv4 ranges + 9. Source IP ranges - Enter the CIDR range that means "all access" (e.g. 0.0.0.0/0) + 10. Protocols and ports - Specified protocols and ports + 1. Select the tcp box and enter 9702 for the port. + 11. Click 'Create' + 3. Click the back arrow to return to the 'VPC networks' view + 4. Click on the node-vpc-9701 network then click 'FIREWALLS' to add some rules. + 5. Ask your network administrator for a list of node IPs to add to your 'allowed list' as part of the following steps. NOTE: If you choose to do this firewall setup step later, open up port 9701 on 0.0.0.0/0 temporarily, then remove that rule later when you add the other nodes' IPs. For each node IP on the Indy network you will be joining, do the following: + 1. Click 'Add firewall rule' + 2. Name - Name (alias) of the node you are adding (the next name in the list) + 3. Logs - Off + 4. Network - (should already be set) + 5. Priority - default is fine + 6. Direction of traffic - Ingress + 7. Action on match - Allow + 8. Targets - All instances in the network + 9. Source filter - IP ranges + 10. Source IP ranges - Enter the public IP address matching the Node name that you are adding. (e.g. 68.179.145.150/32) + 11. Protocols and ports - Specified protocols and ports + 1. Select the tcp box and enter 9701 for the port. + 12. Click 'Create' + 6. Repeat the last set of steps for each node in the node list, changing the node Name and IP address for each new rule. +3. From the GC 'Compute Engine' console, click 'Snapshots' in the left pane + 1. Select the 'SNAPSHOT SCHEDULES' tab then click 'CREATE SNAPSHOT SCHEDULE' + 2. Name - your choice (e.g. 'nodesnapweekly') + 3. Region - Select the same region chosen earlier in this guide. + 4.
Snapshot location - Regional (default location) + 5. Schedule frequency - Weekly (then your choice of day and time.) + 6. Autodelete snapshots after - 60 days + 7. Deletion rule - your choice (e.g. Select 'Delete snapshots older than days' to remove the snapshots after you no longer need the VM) + 8. Other options - your choice (defaults are fine) + 9. Click 'CREATE' +4. From the GC Compute Engine console, click 'VM Instances' in the left pane +5. Click 'CREATE INSTANCE' +6. WARNING: Do not press enter or return at any time while filling out the form that is now displayed. Pressing enter before you have completed the configuration might inadvertently create the VM, and you might have to delete the VM and start over. +7. Select 'New VM instance' in the left pane +8. Name - your choice (tempnet-node1) +9. Labels - none needed +10. Region - Select the same region chosen earlier in this guide. +11. Choose and record a zone (us-east4-c) +12. Machine configuration + 1. Select 'General-purpose' tab + 2. Series - N1 is probably sufficient + 3. Machine Type - Select a type with 2 vCPUs and 8 GB RAM (n1-standard-2 is close enough) or greater, or choose a type matching your network's governance rules. +13. Container - leave unchecked (not needed) +14. Boot disk - Click 'Change' + 1. Select the 'Public images' tab (default) + 2. Operating system - select “Ubuntu” + 3. Version - 'Ubuntu 20.04 LTS (x86/64)' (note: there are many Ubuntu 20.04 LTS options. Please find the one with ‘x86/64’ in the subtitle, and you must choose the 20.04 Ubuntu version) + 4. Boot disk type - your choice (Standard is sufficient) + 5. Size - default is sufficient (10 GB) + 6. Click 'Select' +15. Identity and API access - leave at defaults +16. Firewall - leave boxes unchecked +17. Click to expand the “Advanced options” section + 1. Networking tab + 1. Network tags - leave blank + 2. Hostname - default (blank) should be fine + 3.
Network interfaces (1) - Fill in the fields for the network interface that will correspond to the Client-NIC interface. The Node-NIC interface will be the second interface created for this instance. Expand “default” using the “arrow” to begin with the client interface: + 1. Network - select the name you created earlier ( client-vpc-9702) + 2. Subnetwork - default + 3. Primary internal IP - Reserve static internal IP + 1. Name - your choice (e.g. client-internal-ip) + 2. Description - optional + 3. Subnet - default + 4. Static IP address - Assign automatically + 5. Purpose - Non-shared + 6. Click 'RESERVE' + 4. External IPv4 address - click the down arrow, then click “CREATE IP ADDRESS” + 7. Name - your choice (client-public-ip) + 8. Description - optional + 9. Click 'RESERVE' + 5. Public DNS PTR Record - unchecked + 6. Click 'Done' + 4. Click 'Add network interface' (Node-NIC) + 1. Network - node-vpc-9701 + 2. Subnetwork - default + 3. Primary internal IP - Reserve static internal IP + 1. Name - your choice (e.g. node-internal-ip) + 2. Subnet - default + 3. Static IP address - Assign automatically + 4. Purpose - Non-shared + 5. Click 'RESERVE' + 4. External IP - Create IP address + 6. Name - your choice (node-public-ip) + 7. Click 'RESERVE' + 5. Click 'Done' + 2. Disks tab + 1. Click '+ Add new disk` + 1. Name - your choice (e.g. nodedatadisk) + 2. Description - your choice + 3. Source type - Blank disk + 4. Disk Type - Standard Persistent disk + 5. Size (GB) - 250 + 6. Snapshot schedule - nodesnapweekly (created earlier in these instructions) If it does not appear in the list, type the name in and then select it. + 7. Encryption - your choice, default is fine + 8. Mode - Read-write + 9. Deletion rule - your choice (e.g. use 'Delete disk' to make sure there are no unseen charges when you no longer need the VM) + 10. Defaults are fine for the rest of this section + 2. Click ‘SAVE’ + 3. Security Tab + 1. Click the “manage access” dropdown + 2. 
Shielded VM - (defaults) + 3. SSH Keys + 1. Check the box to 'Block project-wide SSH keys' (recommended) + 2. Enter a public SSH key for each Admin user (at least your own) + 3. To create an SSH key: + 1. You can use the following command to create a new SSH key pair on Linux or Mac that will work for this step. + 1. `ssh-keygen -P "" -t rsa -b 4096 -m pem -f ~/pems/gcpnode.pem` + 2. Once the key pair is created, the following command can be used on Mac or Linux to display the public key so that you can copy it to the form: + 1. `cat ~/pems/gcpnode.pem.pub` + 3. Copy the output of the above command and paste it into the space provided, being careful NOT to copy any leading or trailing whitespace. + 4. Do NOT click 'Create' yet! Please proceed to the Management tab. + 4. Management Tab + 1. Description - your choice + 2. Deletion protection - Check the box (recommended) + 3. (the rest of the options under Management) - your choice (defaults will work) +18. Click 'Create' to create the new GCP VM instance. +19. Wait for your VM to launch. +20. Log in to your VM + 1. From your Linux or Mac workstation do the following: (a Windows workstation will be different) + 2. `ssh -i <private rsa key file> ubuntu@<Client IP Address>` + 3. Where the private rsa key file is the SSH key .pem file generated earlier + 4. And where Client IP is the public address from NIC #1 (Client Public IP from your Node Installation Info spreadsheet) + 5. For example: `ssh -i ~/pems/jecko.pem ubuntu@13.58.197.208` + 6. NOTE: I got an error the first time I ran the above to log in: "Permission denied" because "Permissions are too open" <for your pem file>. To correct the issue I ran `chmod 600 ~/pems/jecko.pem` and then I was able to log in successfully. +21. Configure networking to the second NIC + 1. From your instance's command prompt, run the command `ip a` and verify that you have 2 internal IP addresses that match what you have in your Node Installation Info spreadsheet. Note the names of the network interfaces. (e.g.
ens4 and ens5) The remaining instructions in this section assume ens4 is your original primary NIC (Client-NIC) and ens5 is the secondary NIC (Node-NIC). + 2. Record the interface names, IP addresses, and MAC addresses of your 2 interfaces contained in the output of `ip a` + 1. The MAC address is found right after ‘link/ether’ for each interface and is formatted like this: 12:e6:fa:8f:42:79 + 3. Find the default gateway for the main interface. + 1. `ip r` + 2. Look for the line that begins with ‘default’; the gateway address usually ends with .1 + 3. For example: 10.0.1.1 + 4. Disable automatic network management by GC. Run the following: + 1. `sudo su -` + 2. `echo 'network: {config: disabled}' > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg` + 5. `vim /etc/iproute2/rt_tables` + 1. Add 2 lines + 800 800 + 801 801 + 6. `vim /etc/netplan/50-cloud-init.yaml` + 1. Replace the “ethernets:” section of the file with the following, but substitute in your own interface names, MAC addresses, local (internal) IP addresses, and routes in several places. + + ethernets: + ens4: + addresses: + - 10.0.1.2/24 + gateway4: 10.0.1.1 + match: + macaddress: 12:e6:fa:8f:42:79 + mtu: 1500 + set-name: ens4 + routes: + - to: 0.0.0.0/0 + via: 10.0.1.1 + table: 800 + routing-policy: + - from: 10.0.1.2 + table: 800 + priority: 300 + nameservers: + addresses: + - 8.8.8.8 + - 8.8.4.4 + - 1.1.1.1 + ens5: + addresses: + - 10.0.2.2/24 + match: + macaddress: 12:69:78:aa:0d:b1 + mtu: 1500 + set-name: ens5 + routes: + - to: 0.0.0.0/0 + via: 10.0.2.1 + table: 801 + routing-policy: + - from: 10.0.2.2 + table: 801 + priority: 300 + nameservers: + addresses: + - 8.8.8.8 + - 8.8.4.4 + - 1.1.1.1 + 7. Please double and triple check that all of the information in the above file is correct before proceeding. Mistakes in the netplan file can cause you to lose access to your VM and you might have to start over. + 8. `netplan generate` + 9. If no output appears (no errors) run: + 1. `netplan apply` + 2.
If the above command does not return you to the command prompt, then you made an error in your netplan file and will need to start over from the beginning of this document. Sorry! (You will likely be able to re-use most of the IP addresses, subnets, firewall rules, etc. that you created, but otherwise you need to start from the beginning.) + 10. NOTE: Netplan guidance came from: https://www.opensourcelisting.com/how-to-configure-multiple-network-interfaces/ + 11. Restart your instance. + 1. `reboot` + 12. ssh to your instance again as described earlier. + 1. `ssh -i <private rsa key file> ubuntu@<Client IP Address>` +22. Configure and mount the data disk. + 1. Find the name of your data disk: + 1. `sudo fdisk -l` + 2. In most cases **/dev/sdb** will be the name of the 250 GiB data disk created during the GC instance setup. + 2. The following steps assume that your disk size is less than 2 TiB, that your disk is /dev/sdb, and that you will be using MBR partitioning. + 3. `sudo fdisk /dev/sdb` + 1. Create a new partition + 1. n + 2. p + 3. <defaults for the rest> TIP: press enter 3 times to accept the defaults and complete the process of creating a partition. + 4. Now, print and write the partition and exit. + 5. p + 6. w + 4. Update the kernel: + 1. `sudo partprobe` + 5. Add a filesystem to your new disk partition: + 1. `sudo mkfs -t ext4 /dev/sdb1` + 6. Mount the disk to the directory where the Node software does the most writing (/var/lib/indy): + 1. `sudo mkdir /var/lib/indy` + 2. `sudo mount /dev/sdb1 /var/lib/indy` + 7. Add the drive to /etc/fstab so that it mounts at server startup. + 1. `sudo blkid` + 2. Record the UUID of /dev/sdb1 for use in the /etc/fstab file. + 3. `sudo vim /etc/fstab` + 4. Add the following line to the end of the fstab file (substituting in your own UUID): + 1. `UUID=336030b9-df26-42e7-8c42-df7a967f3c1e /var/lib/indy ext4 defaults,nofail 1 2` + 2.
Vim Hint: In vim, arrow down to the last line of the file, press the ‘o’ key and then paste in the above line. As before, <esc> then :wq will write and then exit the file. + 3. WARNING! If you mistakenly use the wrong UUID here and continue on (without the verifications listed below), you will likely have to remove your VM and start over. (At some point during the install process, ownership is changed on multiple files simultaneously, and a wrong UUID will cause that command to wreak havoc at the root of your drive.) +23. Restart the instance to check for NIC and Disk persistence. + 1. From the GC Compute Engine console, click 'VM Instances' in the left pane, select your VM and click 'Reset'. + 2. Log in to your VM as before: + 1. `ssh -i <private rsa key file> ubuntu@<Client IP Address>` + 3. Check the NIC and Disk + 1. `ip a` + 2. The output of the above command should have 2 NICs with the correct IP addresses displayed. + 3. `df -h` + 4. The output of the above command should show /var/lib/indy mounted to the /dev/sdb1 disk with the correct size (250G). + 5. More NIC and disk verifications will occur during the Indy Node install process. +24. Add a temporary administrative user as a safety net during Two Factor Authentication (2FA) setup. (This is optional, continue to the next step if you choose not to set up a temporary user.) + 1. `sudo adduser tempadmin` + 1. You can safely ignore messages like “sent invalidate(passwd) request, exiting” + 2. `sudo usermod -aG sudo tempadmin` + 3. Set up sshd_config to temporarily allow password login for the tempadmin user. + 1. `sudo vim /etc/ssh/sshd_config` + 2. Comment out the line containing ‘ChallengeResponseAuthentication’. + 1. #ChallengeResponseAuthentication no + 3. Make sure this line exists and is set to yes: + 1. PasswordAuthentication yes + 4. :wq to save and exit. + 5. `sudo systemctl restart sshd` + 6. The above lines will be altered again when you set up 2FA. + 4.
To be able to login, you will also likely need to setup an ssh key + 1. `sudo mkdir /home/tempadmin/.ssh` + 2. `sudo chown tempadmin:tempadmin /home/tempadmin/.ssh` + 3. `sudo vim /home/tempadmin/.ssh/authorized_keys` + 4. Paste the users public key into the open file and then save it (:wq) (You can use the same key as you used for the ubuntu user in this case, since it is a temporary user) + 5. `sudo chown tempadmin:tempadmin /home/tempadmin/.ssh/authorized_keys` +25. If GC does not setup a user-friendly hostname, then for an easy to use experience with Google authenticator and 2FA, change your hostname to “NewHostName” by doing the following: + 1. `sudo hostnamectl set-hostname NewHostName` + 2. `sudo vi /etc/hosts` + 1. Add a line right after “localhost” + 2. 127.0.0.1 NewHostName + 3. `sudo vi /etc/cloud/cloud.cfg` + 1. Search for preserve_hostname and change the value from false to true: +26. Setup 2FA for SSH access to the Node for your base user. + 1. Optional: Login in a separate terminal as your tempadmin user (that has sudo privileges) to have a backup just in case something goes wrong during setup. + 1. `ssh tempadmin@<Client IP Addr>` + 2. Install Google Authenticator, Duo, or Authy on your phone. + 3. As your base user on the Node VM, run the following to install the authenticator: + 1. `sudo apt-get install libpam-google-authenticator` + 4. Configure the authenticator to allow both password and SSH key login with 2FA by changing 2 files: + 1. `sudo vim /etc/pam.d/common-auth` + 2. Add the following line as the first uncommented line in the file + 1. auth sufficient pam_google_authenticator.so + 2. <esc> + 3. :wq + 3. `sudo vim /etc/ssh/sshd_config` + 1. add/configure the following lines: + 1. `ChallengeResponseAuthentication yes` + 2. `PasswordAuthentication no` + 3. `AuthenticationMethods publickey,keyboard-interactive` + 4. `UsePAM yes` + 2. If you see any of the above lines commented out, remove the # to uncomment them. 
If any of the above lines are missing, add them. If you see those lines configured in any different way, edit them to reflect the above. In my file, line 1 needed to be changed, lines 2 and 4 were already set, and line 3 needed to be added (I added it right after line 2). + 3. :wq + 4. `sudo systemctl restart sshd` + 5. Set up your base user to use 2FA by running the following from a terminal: + 1. `google-authenticator` + 2. Answer ‘y’ to all questions asked during the setup + 3. Save the secret key, verification code and scratch codes in a safe place. These are all just for your user and can be used to log in or to recover as needed. + 6. On your phone app add an account and then scan the barcode or enter the 16 character secret key from the previous step's output. + 7. You should now be able to log in using 2FA. First, check that login still works for your base user in a new terminal. If that doesn’t work, double check all of the configuration steps above and then restart sshd again. If it still doesn’t work, it’s possible that a server restart is required to make 2FA work (NOTE: It is dangerous to restart at this point, because then all of your backup terminals that are logged in will be logged out and there is a chance that you will lose access. Please check that all other steps have been executed properly before restarting.) +27. Add other administrative users: + 1. Send the other new admin users the following instructions for generating their own SSH keys: + 1. `ssh-keygen -P "" -t rsa -b 4096 -m pem -f ~/pems/validatornode.pem` + 2. Have the new users send you their public key (e.g. validatornode.pem.pub if they run the above command) + 3. Also have them send you their Public IP address so that you can add it to the GC firewall to allow them access. Optionally, have them send a preferred username also. + 2. Add their IP addresses to the GC firewall: + 1. From the GC VPC Networks screen (GC main menu -> VPC network -> VPC networks), click on your Client VPC (e.g.
client-vpc-9702) + 2. Click the 'Firewall rules' tab (in about the middle of the screen). + 3. Click on the name of the rule that allows port 22 access for your admins (e.g. ssh-for-admin-access) + 4. Click 'EDIT' at the top of the screen. + 5. Scroll down to the list of Source IP ranges and add the new Admins' IP addresses. + 6. Click ‘SAVE’ (Note: Restart is not needed. As soon as you save, they should have access.) + 3. Add the users to the server: + 1. Login to the Node as the base user. + 2. Run the following commands, substituting the username in for <newuser> + 3. `sudo adduser <newuser>` + 1. You can safely ignore messages like “sent invalidate(passwd) request, exiting“ + 2. For “Enter new UNIX password:” put password1 (This will be changed later) + 3. Enter a name (optional) + 4. Defaults are fine for the rest + 4. `sudo usermod -aG sudo <newuser>` + 5. Then create a file in the newusers home directory: + 1. `sudo mkdir /home/<newuser>/.ssh` + 2. `sudo chown <newuser>:<newuser> /home/<newuser>/.ssh` + 3. `sudo vim /home/<newuser>/.ssh/authorized_keys` + 4. Paste the users public key into the open file and then save it (:wq) + 5. `sudo chown <newuser>:<newuser> /home/<newuser>/.ssh/authorized_keys` + 6. Repeat the above for each new admin user you create. + 4. The new users are now able to login. Since 2FA is required, when you send the password to each of the new users, also send the following instructions (HINT: fill in the username, Client IP address, and password for them with the correct values): + 1. Thanks for agreeing to help with the administration of our Indy Validator Node. Please login to the node, change your password, and setup Two Factor Authentication (2FA) using the following instructions: + 1. ssh -i <your private SSH key file> <username>@<Client IP Addr> + 2. Type in password1 for your password + 3. On successful login, type in ‘passwd’ to change your password on the Validator Node. 
Please use a unique password of sufficient length and store it in a secure place (i.e. a password manager). + 4. To set up 2FA, type in ‘google-authenticator’ + 1. Answer ‘y’ to all questions asked during the setup + 2. Save the secret key, verification code, and scratch codes in a safe place. These are all for your user and can be used to login or to recover as needed. + 5. Install Google Authenticator, Duo, Authy, or other google-authenticator compatible app on your phone or device. + 6. On your 2FA phone app, add an account, and then scan the barcode or enter the 16 character secret key from step 4’s output. + 7. Log out and then log back in to check and make sure it worked! + 5. All of your secondary admin users should be setup now. +28. You can now begin the Indy Node installation using the Validator Preparation Guide. diff --git a/docs/source/install-docs/Physical-NodeInstall-20.04 b/docs/source/install-docs/Physical-NodeInstall-20.04 new file mode 100644 index 000000000..3adf7f406 --- /dev/null +++ b/docs/source/install-docs/Physical-NodeInstall-20.04 @@ -0,0 +1,219 @@ +## Physical Hardware - Install a server for an Indy Node - Ubuntu 20.04 + +#### Introduction +The following steps are one way to adhere to the Indy Node guidelines for installing a physical server to host an Indy Node. For the hardware requirements applicable for your network, ask the Network administrator or refer to the Technical Requirements document included in the Network Governance documents for your network. + +#### Installation + +1. Before you begin: + 1. For most governance frameworks' hardware requirements, you will need 2 NIC's and 2 subnets (one per NIC). Configure these before beginning the install. + 2. Hardware requirements might include the following, (or greater, depending on your network governance requirements): + 1. 8 G RAM + 2. 2 CPU cores + 3. 250G RAIDed disk space + 4. 2 NICs with 2 Public IP addresses (1 per NIC) + 3. 
Create your own SSH key to use later for logging in to the Node. + 1. `mkdir ~/pems` + 2. `ssh-keygen -P "" -t rsa -b 4096 -m pem -f ~/pems/validatornode.pem` +2. Install Ubuntu 20.04 on the server (or VM). + 1. During installation, please make sure that the /var/lib/indy directory has the required amount of disk space available to it. The space does not need to be dedicated to that specific directory; it's okay if / has all of it. +3. Log in to your VM + 1. Use an admin user created during the installation process (not root). + + + +4. Configure networking to the second NIC + 1. From your instance's command prompt, run the command `ip a` and verify that you have 2 internal IP addresses that match what you have in your Node Installation Info spreadsheet. Note the names of the network interfaces. (e.g. ens5 and ens6) The remaining instructions in this section assume ens5 is your original primary NIC (Client-NIC) and ens6 is the secondary NIC (Node-NIC). + 2. Record the interface names, IP addresses, and MAC addresses of your 2 interfaces contained in the output of `ip a` + 1. The MAC address is found right after ‘link/ether’ for each interface and is formatted like this: 12:e6:fa:8f:42:79 + 2. For the ens6 or node interface, you might only have the MAC address displayed and not the local IP address yet. If so, use the IP address you recorded earlier for this interface. + 3. Find the default gateway for the main interface. + 1. `ip r` + 2. Look for the line that begins with ‘default’; the gateway address usually ends with .1 + 3. For example: 172.31.84.1 + 4. Disable automatic network management by cloud-init. Run the following: + 1. `sudo su -` + 2. `echo 'network: {config: disabled}' > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg` + 5. `vim /etc/iproute2/rt_tables` + 1. Add 2 lines + 800 800 + 801 801 + 6. `vim /etc/netplan/50-cloud-init.yaml` + 1.
Replace the “ethernets:” section of the file with the following, but substitute in your own interface names, MAC addresses, local (internal) IP addresses, and routes in several places. + + ethernets: + ens5: + addresses: + - 172.31.84.84/24 + gateway4: 172.31.84.1 + match: + macaddress: 12:e6:fa:8f:42:79 + mtu: 1500 + set-name: ens5 + routes: + - to: 0.0.0.0/0 + via: 172.31.84.1 + table: 800 + routing-policy: + - from: 172.31.84.84 + table: 800 + priority: 300 + nameservers: + addresses: + - 8.8.8.8 + - 8.8.4.4 + - 1.1.1.1 + ens6: + addresses: + - 172.31.128.159/24 + match: + macaddress: 12:69:78:aa:0d:b1 + mtu: 1500 + set-name: ens6 + routes: + - to: 0.0.0.0/0 + via: 172.31.128.1 + table: 801 + routing-policy: + - from: 172.31.128.159 + table: 801 + priority: 300 + nameservers: + addresses: + - 8.8.8.8 + - 8.8.4.4 + - 1.1.1.1 + 7. Please double and triple check that all of the information in the above file is correct before proceeding. Mistakes in the netplan file can cause you to lose access to your server and you might have to start over. + 8. `netplan generate` + 9. If no output appears (no errors) run: + 1. `netplan apply` + 2. If the above command does not return you to the command prompt, then you made an error in your netplan file and will need to start over from the beginning of this document. Sorry! (You will likely be able to re-use most of the IP addresses, subnets, and firewall settings that you created, but otherwise you need to start from the beginning.) + 10. NOTE: Netplan guidance came from: https://www.opensourcelisting.com/how-to-configure-multiple-network-interfaces/ + 11. Restart your instance. + 1. `reboot` + 12. ssh to your instance again as described earlier. + 1. `ssh -i <private rsa key file> ubuntu@<Client IP Address>` +5. Configure and mount the data disk. + 1. Find the name of your data disk: + 1. `sudo fdisk -l` + 2.
The following steps assume that you have 2 disks, that your data disk is smaller than 2 TiB, that your data disk is /dev/sdb, and that you will be using MBR partitioning. + 3. `sudo fdisk /dev/sdb` + 1. Create a new partition + 1. n + 2. p + 3. <defaults for the rest> TIP: press enter 3 times to accept the defaults and complete the process of creating a partition. + 4. Now, print and write the partition and exit. + 5. p + 6. w + 4. Update the kernel: + 1. `sudo partprobe` + 5. Add a filesystem to your new disk partition: + 1. `sudo mkfs -t ext4 /dev/sdb1` + 6. Mount the disk to the directory where the Node software does the most writing (/var/lib/indy): + 1. `sudo mkdir /var/lib/indy` + 2. `sudo mount /dev/sdb1 /var/lib/indy` + 7. Add the drive to /etc/fstab so that it mounts at server startup. + 1. `sudo blkid` + 2. Record the UUID of /dev/sdb1 for use in the /etc/fstab file. + 3. `sudo vim /etc/fstab` + 4. Add the following line to the end of the fstab file (substituting in your own UUID): + 1. `UUID=336030b9-df26-42e7-8c42-df7a967f3c1e /var/lib/indy ext4 defaults,nofail 1 2` + 2. Vim Hint: In vim, arrow down to the last line of the file, press the ‘o’ key and then paste in the above line. As before, <esc> then :wq will write and then exit the file. + 3. WARNING! If you mistakenly use the wrong UUID here and continue on (without the verifications listed below), you will likely have to reinstall your server and start over. (At some point during the install process, ownership is changed on multiple files simultaneously, and a wrong UUID will cause that command to wreak havoc at the root of your drive.) +6. Restart the instance to check for NIC and Disk persistence. + 1. Log in to your VM as before: + 1. `ssh -i <private rsa key file> ubuntu@<Client IP Address>` + 2. Check the NIC and Disk + 1. `ip a` + 2. The output of the above command should have 2 NICs with the correct IP addresses displayed. + 3. `df -h` + 4.
The output of the above command should show /var/lib/indy mounted to the /dev/sdb1 disk with the correct size. + 5. More NIC and disk verifications will occur during the Indy Node install process. +7. Add a temporary administrative user as a safety net during Two Factor Authentication (2FA) setup. (This is optional, continue to the next step if you choose not to set up a temporary user.) + 1. `sudo adduser tempadmin` + 1. You can safely ignore messages like “sent invalidate(passwd) request, exiting” + 2. `sudo usermod -aG sudo tempadmin` + 3. Set up sshd_config to temporarily allow password login for the tempadmin user. + 1. `sudo vim /etc/ssh/sshd_config` + 2. Comment out the line containing ‘ChallengeResponseAuthentication’. + 1. #ChallengeResponseAuthentication no + 3. Make sure this line exists and is set to yes: + 1. PasswordAuthentication yes + 4. :wq to save and exit. + 5. `sudo systemctl restart sshd` + 6. The above lines will be altered again when you set up 2FA. + 4. To be able to log in, you will also likely need to set up an SSH key + 1. `sudo mkdir /home/tempadmin/.ssh` + 2. `sudo chown tempadmin:tempadmin /home/tempadmin/.ssh` + 3. `sudo vim /home/tempadmin/.ssh/authorized_keys` + 4. Paste the user's public key into the open file and then save it (:wq) (You can use the same key as you used for your base admin user in this case, since it is a temporary user) + 5. `sudo chown tempadmin:tempadmin /home/tempadmin/.ssh/authorized_keys` +8. For an easy to use experience with Google Authenticator and 2FA you might want to change your hostname. To change your hostname to “NewHostName” do the following: + 1. `sudo hostnamectl set-hostname NewHostName` + 2. `sudo vi /etc/hosts` + 1. Add a line right after “localhost” + 2. 127.0.0.1 NewHostName + 3. `sudo vi /etc/cloud/cloud.cfg` + 1. Search for preserve_hostname and change the value from false to true. +9. Set up 2FA for SSH access to the Node for your base user. + 1.
Optional: Log in from a separate terminal as your tempadmin user (that has sudo privileges) to have a backup just in case something goes wrong during setup. + 1. `ssh tempadmin@<Client IP Addr>` + 2. Install Google Authenticator, Duo, or Authy on your phone. + 3. As your base user on the Node VM, run the following to install the authenticator: + 1. `sudo apt-get install libpam-google-authenticator` + 4. Configure the authenticator to allow both password and SSH key login with 2FA by changing 2 files: + 1. `sudo vim /etc/pam.d/common-auth` + 2. Add the following line as the first uncommented line in the file + 1. auth sufficient pam_google_authenticator.so + 2. <esc> + 3. :wq + 3. `sudo vim /etc/ssh/sshd_config` + 1. add/configure the following lines: + 1. `ChallengeResponseAuthentication yes` + 2. `PasswordAuthentication no` + 3. `AuthenticationMethods publickey,keyboard-interactive` + 4. `UsePAM yes` + 2. If you see any of the above lines commented out, remove the # to uncomment them. If any of the above lines are missing, add them. If you see those lines configured in any different way, edit them to reflect the above. In my file, line 1 needed to be changed, lines 2 and 4 were already set, and line 3 needed to be added (I added it right after line 2). + 3. :wq + 4. `sudo systemctl restart sshd` + 5. Set up your base user to use 2FA by running the following from a terminal: + 1. `google-authenticator` + 2. Answer ‘y’ to all questions asked during the setup + 3. Save the secret key, verification code and scratch codes in a safe place. These are all just for your user and can be used to log in or to recover as needed. + 6. On your phone app add an account and then scan the barcode or enter the 16 character secret key from the previous step's output. + 7. You should now be able to log in using 2FA. First, check that login still works for your base user in a new terminal. If that doesn’t work, double check all of the configuration steps above and then restart sshd again.
If it still doesn’t work, it’s possible that a server restart is required to make 2FA work. (NOTE: It is dangerous to restart at this point, because all of your logged-in backup terminals will be logged out and there is a chance that you will lose access. Please check that all other steps have been executed properly before restarting.)
+29. Add other administrative users:
+    1. Send the other new admin users the following instructions for generating their own SSH keys:
+        1. `ssh-keygen -P "" -t rsa -b 4096 -m pem -f ~/pems/validatornode.pem`
+        2. Have the new users send you their public key (e.g. validatornode.pem.pub if they run the above command).
+        3. Also have them send you their public IP address so that you can add it to the firewall to allow them access. Optionally, have them send a preferred username as well.
+    2. Add their IP addresses to the firewall.
+    3. Add the users to the server:
+        1. Log in to the Node as the base user.
+        2. Run the following commands, substituting the username for `<newuser>`.
+        3. `sudo adduser <newuser>`
+            1. You can safely ignore messages like “sent invalidate(passwd) request, exiting”
+            2. For “Enter new UNIX password:” put password1 (this will be changed later).
+            3. Enter a name (optional).
+            4. Defaults are fine for the rest.
+        4. `sudo usermod -aG sudo <newuser>`
+        5. Then create an authorized_keys file in the new user's home directory:
+            1. `sudo mkdir /home/<newuser>/.ssh`
+            2. `sudo chown <newuser>:<newuser> /home/<newuser>/.ssh`
+            3. `sudo vim /home/<newuser>/.ssh/authorized_keys`
+            4. Paste the user's public key into the open file and then save it (`:wq`).
+            5. `sudo chown <newuser>:<newuser> /home/<newuser>/.ssh/authorized_keys`
+        6. Repeat the above for each new admin user you create.
+    4. The new users are now able to log in. Since 2FA is required, when you send the password to each of the new users, also send the following instructions (HINT: fill in the username, Client IP address, and password for them with the correct values):
+        1.
Thanks for agreeing to help with the administration of our Indy Validator Node. Please log in to the node, change your password, and set up Two Factor Authentication (2FA) using the following instructions:
+            1. `ssh -i <your private SSH key file> <username>@<Client IP Addr>`
+            2. Type in password1 for your password.
+            3. On successful login, type in ‘passwd’ to change your password on the Validator Node. Please use a unique password of sufficient length and store it in a secure place (e.g. a password manager).
+            4. To set up 2FA, type in ‘google-authenticator’
+                1. Answer ‘y’ to all questions asked during the setup.
+                2. Save the secret key, verification code, and scratch codes in a safe place. These are all for your user and can be used to log in or to recover as needed.
+            5. Install Google Authenticator, Duo, Authy, or another google-authenticator compatible app on your phone or device.
+            6. In your 2FA phone app, add an account, and then scan the barcode or enter the 16-character secret key from step 4’s output.
+            7. Log out and then log back in to make sure it worked!
+    5. All of your secondary admin users should be set up now.
+30. You can now begin the Indy Node installation using the Validator Preparation Guide.

From c02538ca94c68cb797f1f0ba5244f5b8743f7e67 Mon Sep 17 00:00:00 2001
From: Lynn Bendixsen
Date: Tue, 31 Oct 2023 08:20:48 -0600
Subject: [PATCH 4/4] Made some edits based on comments. Thanks!
Signed-off-by: Lynn Bendixsen --- docs/source/install-docs/Physical-NodeInstall-20.04 | 4 ++++ docs/source/node-add-troubleshooting.md | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/source/install-docs/Physical-NodeInstall-20.04 b/docs/source/install-docs/Physical-NodeInstall-20.04 index 3adf7f406..ca63c669e 100644 --- a/docs/source/install-docs/Physical-NodeInstall-20.04 +++ b/docs/source/install-docs/Physical-NodeInstall-20.04 @@ -63,6 +63,8 @@ The following steps are one way to adhere to the Indy Node guidelines for instal - 8.8.8.8 - 8.8.4.4 - 1.1.1.1 + dhcp6: no + link-local: [ ] ens6: addresses: - 172.31.128.159/24 @@ -83,6 +85,8 @@ The following steps are one way to adhere to the Indy Node guidelines for instal - 8.8.8.8 - 8.8.4.4 - 1.1.1.1 + dhcp6: no + link-local: [ ] 7. Please double and triple check that all of the information in the above file is correct before proceeding. Mistakes in the netplan file can cause you to lose access to your VM and you might have to start over. 8. `netplan generate` 9. If no output appears (no errors) run: diff --git a/docs/source/node-add-troubleshooting.md b/docs/source/node-add-troubleshooting.md index 1ac81289a..c33f786f6 100644 --- a/docs/source/node-add-troubleshooting.md +++ b/docs/source/node-add-troubleshooting.md @@ -1,12 +1,12 @@ # Troubleshooting - Adding or Upgrading Indy Nodes -Many things can go wrong while adding or upgrading nodes on an existing Indy network and this guide will cover symptoms and issues encountered and some steps you might take to recover from those. The steps listed are likely just possible remedies to the listed issues. Feel free to add more remedies or issues if you don't see your's included here. As bugs are fixed, the issues noted below might not occur any more, or might have a different remedy. 
+Things can go wrong while adding or upgrading nodes on an existing Indy network. This guide covers the symptoms and issues encountered and some steps you might take to recover from them. The steps listed are only possible remedies for the listed issues. Feel free to add more remedies or issues if you don't see yours included here. As bugs are fixed, the issues noted below might not occur anymore, or might have a different remedy.
 
 ## Adding a Node
 This section covers troubleshooting the addition of a node to a network. This can occur either as part of an upgrade (e.g. the 20.04 upgrade) or as part of a new node being added to an existing network.
 
 ### Symptom 1 - Node is unresponsive
 - Cause #1 - Node is performing catchup. (Large Network)
-If your node appears unresponsive after adding it to a network (i.e. validator-info shows non-incrementing subledger counts) and no other symptoms are evident, then the first thing to do is wait. While smaller networks with a low number of transactions seem to perform "catchup" quite fast (within a minute or two for a domain ledger with 15K transactions) larger networks or networks that have been running for a long time can take 3 hours or more. Networks do not respond or recover well if you restart a node while it is performing catchup, so please be patient. To verify that this is the cause first check that the node is connected to the Primary Node (if not, see Cause #2), then check the logs to verify that normal "catchup" operations are in process.
+If your node appears unresponsive after adding it to a network (i.e. validator-info shows non-incrementing subledger counts) and no other symptoms are evident, then the first thing to do is wait. While smaller networks with a low number of transactions seem to perform "catchup" quite fast (within a minute or two for a domain ledger with 15K transactions), larger networks or networks that have been running for a long time can take 3 hours or more.
Networks do not respond or recover well if you restart a node while it is performing catchup, so please be patient. To verify that this is the cause, first check that the node is connected to the Primary Node (if not, see Cause #2), then check the logs to verify that normal "catchup" operations are in progress. The best remedy is to apply the "Best Practice" listed in the Validator Preparation Guide, which suggests that you "pre-fill" the data directory, especially on large networks, before starting a node for the first time.
 - Cause #2 - Node is not connected to the Primary Node
 If the added node cannot reach the primary node, then it sometimes has problems with catchup. Further symptoms in this case include Out Of Consensus (OOC) for your node and possibly others.
 If you realize the issue quickly, you might be able to recover from this by simply a) stopping the node, b) repairing the connection and then c) restarting the node. Otherwise, to recover you will need to perform the following: