Multi Region Cluster, Large footprint

Plan topology and layout

The main tasks to create a dual data centre are:

  • Plan your cluster, hybrid topology layout, and required infrastructure components;
  • Create clusters;
  • Create multi-Region Cassandra ring;
  • Install Runtimes in each data centre;
  • Configure GTM.

Topology and Configuration

?. Let's define the configuration of the installation.

Project: emea-cs-hybrid-demo6

Cluster Type: Multi-Zonal

             config                   region/zones          data x 3        runtime x 3
common       dc-all.sh                                      n1-standard-4   n1-standard-4
dc1-cluster  dc1-cluster-l-1.1.0.sh   us-east1: b, c, d
dc2-cluster  dc2-cluster-l-1.1.0.sh   asia-east1: a, b, c

NOTE: Keep an eye on the total core count as well as memory requirements. COMING SOON: resource calculator...
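Until the calculator is available, one way to sanity-check the regional CPU quota before creating the clusters is to inspect the region descriptions (a hedged sketch using standard gcloud commands; adjust the regions to your layout):

# Show CPU quota limit and usage for each target region
gcloud compute regions describe us-east1 --format=yaml | grep -B1 -A1 'metric: CPUS$'
gcloud compute regions describe asia-east1 --format=yaml | grep -B1 -A1 'metric: CPUS$'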

Environment Configuration Bootstrap

TODO: [ ] These are generic ahr usage instructions; move them to a separate page.

?. Clone ahr and source the ahr environment.

NOTE: ahr scripts require yq for YAML file processing. To install yq on Linux:

curl -L https://github.com/mikefarah/yq/releases/download/3.2.1/yq_linux_amd64 -o ~/bin/yq

chmod +x ~/bin/yq

mkdir -p ~/apigee-hybrid
cd ~/apigee-hybrid
git clone https://github.com/yuriylesyuk/ahr

export AHR_HOME=~/apigee-hybrid/ahr

?. Create a project directory:

export HYBRID_HOME=~/apigee-hybrid/dual-dc-hybrid-110
mkdir -p $HYBRID_HOME

?. Copy the example multi-region environment variable files.

cp $AHR_HOME/examples/dc-all.sh $HYBRID_HOME
cp $AHR_HOME/examples/dc1-cluster-l-1.1.0.sh $HYBRID_HOME
cp $AHR_HOME/examples/dc2-cluster-l-1.1.0.sh $HYBRID_HOME

?. Runtime Configuration

vi $HYBRID_HOME/dc-all.sh

Identify variables common to every data centre and put them into the dc-all.sh config file. Define the differing elements in each data centre file.

The main groups of variables are:

  • Hybrid version
  • Project definition
  • Cluster parameters
  • Runtime configuration

TODO: [ ] Expand.

Changes in this case:

export PROJECT=emea-cs-hybrid-demo6

# As we plan to use Apigee Connect, in this version the MART IP and hostname must be defined but will not be used.
export MART_HOST_ALIAS=$ORG-mart.hybrid-apigee.net
export MART_IP=35.197.194.6

TODO: configure the LB first!

For each DC, define:

  • regions and zones
  • runtime IPs
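To see exactly which variables differ between the two data centre files, you can simply diff them:

diff $HYBRID_HOME/dc1-cluster-l-1.1.0.sh $HYBRID_HOME/dc2-cluster-l-1.1.0.sh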

Define the working environment configuration

?. To keep things nice and tidy, we define the cluster credentials config file in the project directory.

export KUBECONFIG=$HYBRID_HOME/config-dual-dc

?. Configure kubectl aliases and autocompletion, the ahr-*-ctl path, and the current project settings.

source $HYBRID_HOME/dc-all.sh
source $AHR_HOME/bin/ahr-env

Check that the environment points to the correct project.
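For example (assuming gcloud on your workstation is configured for the same project):

echo $PROJECT
gcloud config get-value project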

?. OPTIONAL: You can cd to the project directory; however, to keep things CI/CD-friendly, all file invocations use full paths and are therefore independent of the current location.

cd $HYBRID_HOME

?. Create the dc1 cluster config JSON and the cluster:

(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh;

ahr-cluster-ctl template $CLUSTER_TEMPLATE > $CLUSTER_CONFIG;

ahr-cluster-ctl create
)

?. Create the dc2 cluster config JSON and the cluster:

(
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh;

ahr-cluster-ctl template $CLUSTER_TEMPLATE > $CLUSTER_CONFIG;

ahr-cluster-ctl create
)

After the clusters are created, your config-dual-dc kubeconfig contains two cluster contexts, dc1-cluster and dc2-cluster.
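You can verify this by listing the contexts in the kubeconfig:

kubectl config get-contexts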

?. Source the kubectl configuration for dc1 and check the cluster version:

source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
kubectl version

TIP: Your session will expire eventually. This is a set of statements to copy and paste into your terminal to reset it to point at the project level [+ DC1 cluster [+ project directory]]:

# for project level
export AHR_HOME=~/apigee-hybrid/ahr
export HYBRID_HOME=~/apigee-hybrid/dual-dc-hybrid-110
export KUBECONFIG=$HYBRID_HOME/config-dual-dc

source $HYBRID_HOME/dc-all.sh
source $AHR_HOME/bin/ahr-env
# for DC-cluster level
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
# go to project directory
cd $HYBRID_HOME

Create Service Accounts

In our case, we create a single set of project SAs that is used by both clusters.

ahr-sa-ctl create all

Validate runtime configuration

(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)

ahr-verify
)

(
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)

ahr-verify
)

TIP: ahr-verify stops on error. Use

ahr-verify --stoponerror=false

if you want to check all known violations.

Create the dc1 and dc2 cluster runtime config YAMLs

(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)

ahr-runtime-ctl template $RUNTIME_TEMPLATE > $RUNTIME_CONFIG;
)

(
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)

ahr-runtime-ctl template $RUNTIME_TEMPLATE > $RUNTIME_CONFIG;
)

?. Get the hybrid installation code and apigeectl:

ahr-runtime-ctl get

Installing Supporting Components at dc1

# dc1
(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)

ahr-runtime-ctl apigeectl init -f $RUNTIME_CONFIG
)

Cassandra Ring

To set up the Cassandra ring, we follow these steps:

  • Install Cassandra in DC1
  • Boot up a new region DC2 with an external seed from DC1
  • Change seed host in DC2 back to its local cluster
  • Reconfigure replication and rebuild nodes

This time, we will execute the manual steps from the official documentation. Besides being error-prone, these steps are also harder to automate for CI/CD inclusion. ahr solves this problem: it includes the ahr-cs-ctl command, which converts those steps into three actions:

ahr-cs-ctl keyspaces-list
ahr-cs-ctl keyspaces-expand
ahr-cs-ctl nodetool <args>
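For instance, instead of the long kubectl exec invocation used later on this page, you could presumably check the ring with the nodetool wrapper (the arguments being passed through to nodetool):

ahr-cs-ctl nodetool status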

?. Install Cassandra in dc1

(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)

cd $APIGEECTL_HOME
apigeectl -c cassandra apply -f $RUNTIME_CONFIG
)

TIP: If you need to delete the Cassandra component completely, don't forget about the PVCs:

(cd $APIGEECTL_HOME
apigeectl -c cassandra delete -f $RUNTIME_CONFIG
kubectl delete pvc -l app=apigee-cassandra
)

?. Check Cassandra ring status

kubectl --context dc1-cluster -n apigee exec -it apigee-cassandra-0 -- nodetool status

Output:

Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.44.0.5  90.72 KiB  256          100.0%            bcbee035-2984-4e09-bec3-d117fbbbf80d  ra-1
UN  10.44.3.3  94.15 KiB  256          100.0%            744e362b-d911-470d-8cd0-ff5dec0da4d8  ra-1
UN  10.44.5.3  112.3 KiB  256          100.0%            43c8ed0a-bf87-44da-ac6d-fc5ece8298f5  ra-1

Adding dc2 to the Hybrid topology

?. Configure dc2-cluster as the active cluster

# dc2
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)

Root CA Certificate from dc1 to dc2

?. To install the dc1 cluster Root CA certificate and key into dc2:

  1. Create the cert-manager namespace in dc2
  2. Fetch the apigee-ca secret from dc1
  3. Apply it to dc2

?. Create the cert-manager namespace in dc2

kubectl create namespace cert-manager

?. Replicate the apigee-ca key and certificate to dc2-cluster

kubectl --context=dc1-cluster get secret apigee-ca --namespace=cert-manager --export -o yaml | kubectl --context=dc2-cluster apply --namespace=cert-manager -f -

?. Installing Supporting Components at dc2

ahr-runtime-ctl apigeectl init -f $RUNTIME_CONFIG
ahr-runtime-ctl apigeectl wait-for-ready -f $RUNTIME_CONFIG

Install Cassandra in dc2

Seed hosts are normally local cluster members. To boot up a new region, an external seed host is required. Once the region boots up, you need to change the seed hosts back to their local cluster in your runtime config yaml and then re-apply the configuration.

cassandra:
  multiRegionSeedHost: <ip-address-of-first-cs-node-in-dc1>
  datacenter: "dc-2"
  rack: "ra-1"

IMPORTANT: There is a bug in 1.1.x versions of Hybrid that prevents correct processing of the .cassandra.multiRegionSeedHost property. You have hit this problem if you see an error like:

Debug: Name does not resolve
ERROR io.apigee.common.format.ErrorMessages - getFormattedMessage() : Unable to locate a resource bundle for error code apigee-cassandra-0.apigee-cassandra.apigee.svc.cluster.local,10.44.5.9: Name does not resolve
apigee-cassandra-0: node: gke-dc2-cluster-apigee-data-999924fc-219f.

We need to patch the 4_cps-cassandra-setup.yaml file:

?. vi $APIGEECTL_HOME/templates/4_cps-cassandra-setup.yaml

?. Edit line 6 from

{{- $cassSeed = (printf "%s,%s" $cassSeed .cassandra.multiRegionSeedHost) }}

to

{{- $cassSeed = (printf "%s" .cassandra.multiRegionSeedHost) }}
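If you prefer to script this edit instead of using vi, the same substitution can be done with sed; this is a hedged sketch, so double-check the resulting template line afterwards:

# Replace the seed-list expression with the multi-region seed host only
sed -i 's|printf "%s,%s" $cassSeed .cassandra.multiRegionSeedHost|printf "%s" .cassandra.multiRegionSeedHost|' \
    $APIGEECTL_HOME/templates/4_cps-cassandra-setup.yaml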

?. Look at the output of the nodetool status command and note the IP address of the first Cassandra node. In our case, it's 10.44.0.5. We will use this node as an external seed node for dc2.

export DC1_CS_SEED_NODE=10.44.0.5
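Alternatively, you can fetch the pod IP straight from the Kubernetes API instead of copying it from the nodetool output (a sketch, assuming the apigee namespace and the dc1-cluster context):

export DC1_CS_SEED_NODE=$(kubectl --context dc1-cluster -n apigee get pod apigee-cassandra-0 -o jsonpath='{.status.podIP}')
echo $DC1_CS_SEED_NODE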

?. Add the multiRegionSeedHost, datacenter, and rack properties into the dc2 runtime config file ($RUNTIME_CONFIG). You should be in the active dc2 environment.

echo $CLUSTER


yq m -i $RUNTIME_CONFIG - <<EOF                                                          
cassandra:
  multiRegionSeedHost: $DC1_CS_SEED_NODE
  datacenter: "dc-2"
  rack: "ra-1"
EOF

?. Install the cassandra component into dc2

ahr-runtime-ctl apigeectl -c cassandra apply -f $RUNTIME_CONFIG
ahr-runtime-ctl apigeectl -c cassandra wait-for-ready -f $RUNTIME_CONFIG

You can now see 6 PVCs on the Storage page of Kubernetes Engine.
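You can also list them from the command line (assuming the PVCs carry the app=apigee-cassandra label used earlier and live in the apigee namespace):

kubectl --context dc1-cluster -n apigee get pvc -l app=apigee-cassandra
kubectl --context dc2-cluster -n apigee get pvc -l app=apigee-cassandra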

?. Run the nodetool status command at dc1 to see that the Cassandra ring now has 3 nodes in each DC.

kubectl --context dc1-cluster -n apigee exec -it apigee-cassandra-0 -- nodetool status

Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.44.0.5  426.04 KiB  256          100.0%            bcbee035-2984-4e09-bec3-d117fbbbf80d  ra-1
UN  10.44.3.3  427.08 KiB  256          100.0%            744e362b-d911-470d-8cd0-ff5dec0da4d8  ra-1
UN  10.44.5.3  420.85 KiB  256          100.0%            43c8ed0a-bf87-44da-ac6d-fc5ece8298f5  ra-1
Datacenter: dc-2
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.32.5.3  79.04 KiB  256          0.0%              17376e22-12ba-4b67-9b26-e0772cad2276  ra-1
UN  10.32.2.4  106.69 KiB  256          0.0%              8372f1a9-acae-418d-8b6f-abb37cd46dea  ra-1
UN  10.32.0.4  79.04 KiB  256          0.0%              0f558429-f3ec-4cd7-ad34-b6e783e7d29a  ra-1

Adjust replication configuration and rebuild nodes in dc2

?. Create an interactive container. The bash shell will start.

kubectl run -i --tty --restart=Never --rm --image google/apigee-hybrid-cassandra-client:1.0.0 cqlsh

?. At the container bash prompt, execute the cqlsh utility:

cqlsh apigee-cassandra-0.apigee-cassandra.apigee.svc.cluster.local -u ddl_user --ssl
# Password: iloveapis123

?. At the cqlsh prompt, run the following commands to change the replication and to check the state before and after. Replace the project id in the keyspace names with yours.

NOTE: A keyspace name contains the project id with dashes converted to underscores. As we execute the CQL statements inside cqlsh, we cannot use shell environment variable expansion, so replace the project id in the keyspace names manually.

For example, for the kms_${PROJECT}_hybrid keyspace in the emea-cs-hybrid-demo6 project, use kms_emea_cs_hybrid_demo6_hybrid.

SELECT * from system_schema.keyspaces;

ALTER KEYSPACE cache_emea_cs_hybrid_demo6_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
ALTER KEYSPACE kms_emea_cs_hybrid_demo6_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
ALTER KEYSPACE kvm_emea_cs_hybrid_demo6_hybrid WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
ALTER KEYSPACE perses WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};
ALTER KEYSPACE quota_emea_cs_hybrid_demo6_hybrid  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};

SELECT * from system_schema.keyspaces;
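TIP: To avoid editing the keyspace names by hand, you can generate the ALTER statements in your workstation shell (outside cqlsh, where $PROJECT is set) and paste them into the cqlsh prompt. A small sketch, assuming the keyspace list above matches your SELECT output:

# Convert dashes in the project id to underscores and print the ALTER statements
PROJECT_US=${PROJECT//-/_}
for ks in cache_${PROJECT_US}_hybrid kms_${PROJECT_US}_hybrid kvm_${PROJECT_US}_hybrid quota_${PROJECT_US}_hybrid perses; do
  echo "ALTER KEYSPACE $ks WITH replication = {'class': 'NetworkTopologyStrategy', 'dc-1':3, 'dc-2':3};"
done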

?. Exit from cqlsh and from the cqlsh container.

exit
exit

?. Rebuild the nodes in dc2, using dc-1 as the source.

kubectl --context=dc2-cluster exec apigee-cassandra-0 -- nodetool rebuild dc-1
kubectl --context=dc2-cluster exec apigee-cassandra-1 -- nodetool rebuild dc-1
kubectl --context=dc2-cluster exec apigee-cassandra-2 -- nodetool rebuild dc-1

?. You can verify the rebuild process using the kubectl logs -f command. Example for the first Cassandra node:

kubectl --context=dc2-cluster logs apigee-cassandra-0 -f

...
INFO  22:54:33 rebuild from dc: dc-1, (All keyspaces), (All tokens)
INFO  22:54:34 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] Executing streaming plan for Rebuild
INFO  22:54:34 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] Starting streaming to /10.44.0.5
INFO  22:54:36 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8, ID#0] Beginning stream session with /10.44.0.5
INFO  22:54:37 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8 ID#0] Prepare completed. Receiving 6 files(5.122KiB), sending 0 files(0.000KiB)
INFO  22:54:38 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] Session with /10.44.0.5 is complete
INFO  22:54:38 [Stream #ee33cbd0-6bc6-11ea-941f-81a83ed8b7d8] All sessions completed

TIP: Something might go wrong with your Cassandra install and you'd need to rebuild the second cluster. I know, because that's what happened to me.

In this case, you would need to remove the Cassandra PVCs and repair the Cassandra topology by removing the nodes that correspond to the deleted PVCs.

For this, run nodetool status on dc1 and note the UUIDs of the non-existent nodes.

Execute the nodetool removenode operation to clean up the topology. For example:

kubectl --context dc1-cluster exec -it apigee-cassandra-0 -- nodetool removenode 0f558429-f3ec-4cd7-ad34-b6e783e7d29a

?. Remove the multiRegionSeedHost property from the runtime config yaml and delete/re-apply the apigee-cps-setup* component to switch from the external seed node back to the local data centre seed nodes. Then check that the apigee-cps-setup pod re-runs successfully in dc2-cluster.

yq d -i $RUNTIME_CONFIG cassandra.multiRegionSeedHost

kubectl delete pod apigee-cps-setup-emea-cs-hybrid-demo6

ahr-runtime-ctl apigeectl -c cassandra apply -f $RUNTIME_CONFIG

?. Check the status of the ring and observe that both data centres have replicated data correctly.

kubectl --context dc1-cluster -n apigee exec -it apigee-cassandra-0 -- nodetool status

Datacenter: dc-1
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.44.0.5  466.28 KiB  256          100.0%            bcbee035-2984-4e09-bec3-d117fbbbf80d  ra-1
UN  10.44.3.3  451.89 KiB  256          100.0%            744e362b-d911-470d-8cd0-ff5dec0da4d8  ra-1
UN  10.44.5.3  430.05 KiB  256          100.0%            43c8ed0a-bf87-44da-ac6d-fc5ece8298f5  ra-1
Datacenter: dc-2
================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.32.5.4  660.88 KiB  256          100.0%            17376e22-12ba-4b67-9b26-e0772cad2276  ra-1
UN  10.32.2.5  677.26 KiB  256          100.0%            8372f1a9-acae-418d-8b6f-abb37cd46dea  ra-1
UN  10.32.0.5  614.42 KiB  256          100.0%            0f558429-f3ec-4cd7-ad34-b6e783e7d29a  ra-1

Create Other Hybrid Runtime Components in both DCs

source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)

ahr-runtime-ctl apigeectl apply -f $RUNTIME_CONFIG
ahr-runtime-ctl apigeectl wait-for-ready -f $RUNTIME_CONFIG


source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh;
source <(ahr-runtime-ctl home)

# apigeectl apply in dc2
ahr-runtime-ctl apigeectl apply -f $RUNTIME_CONFIG
ahr-runtime-ctl apigeectl wait-for-ready -f $RUNTIME_CONFIG

TODO: GTM section.

Delete Multi Region Cluster

TODO: add others

[ ] clear setsync
[ ] remove SAs (see the sketch below)
[ ] delete clusters
[ ] remove PVCs
(
source $HYBRID_HOME/dc1-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
ahr-cluster-ctl delete
)
(
source $HYBRID_HOME/dc2-cluster-l-1.1.0.sh
source <(ahr-runtime-ctl home)
ahr-cluster-ctl delete
)
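For the SA cleanup, a hedged sketch (the exact service account names depend on what ahr-sa-ctl created in your project; list them first and substitute the real emails):

# List the service accounts in the project, then delete the hybrid ones you no longer need
gcloud iam service-accounts list --project $PROJECT
gcloud iam service-accounts delete <sa-name>@$PROJECT.iam.gserviceaccount.com --project $PROJECT --quiet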