
Fusion Cloud Native on Kubernetes

This repo contains scripts for installing Fusion 5.x on Kubernetes (K8s). The scripts provide an option to create Kubernetes clusters that are suitable for demo / proof-of-concept purposes only. We assume that you’ll want to control how your production clusters are provisioned, secured, and managed, as these are typically concerns we’re not able to script for you.

Prerequisites

This section covers prerequisites and background knowledge needed to help you understand the structure of this document and how the Fusion installation process works with Kubernetes.

Release Name and Namespace

Before installing Fusion, you need to choose a unique release name for Fusion, such as f5; Helm uses the release name to track a specific installation of an application in the cluster. Use a short name for your release containing only letters, digits, underscores, and dashes.

As of Helm v3, releases are managed at the namespace level, so you can have multiple releases with the same name across different namespaces in the same cluster. However, we recommend NOT doing this and instead using a unique release name for every namespace; if you do reuse the same release name across multiple namespaces in the same cluster, take care to include the namespace in your custom values yaml file(s).

You also need to choose the Kubernetes namespace to install Fusion into. Think of a K8s namespace as a virtual cluster within a physical cluster. You can install multiple instances of Fusion in the same cluster in separate namespaces. However, please do not install more than one Fusion release in the same namespace.

NOTE: All Fusion services must run in the same namespace, i.e. you should not try to split a Fusion cluster across multiple namespaces.
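For example, assuming a hypothetical release named f5 and a namespace named fusion-dev, you could create the namespace up front and confirm which releases (if any) already live there:

kubectl create namespace fusion-dev
helm ls -n fusion-dev

The setup scripts in this repo accept the release and namespace via the -r and -n options shown in the platform-specific sections below.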

Install Helm

Helm is a package manager for Kubernetes that helps you install and manage applications on your Kubernetes cluster. Regardless of which Kubernetes platform you’re using, you need to install Helm, as it is required to install Fusion. On macOS, you can do:

brew install kubernetes-helm

If you already have helm installed, make sure you’re using the latest version:

brew upgrade kubernetes-helm

For other operating systems, please refer to the Helm installation docs: https://helm.sh/docs/using_helm/

The Fusion Helm chart requires Helm version 2.12.0 or later, but Lucidworks recommends upgrading to Helm v3. Check your Helm version by running helm version.

Clone fusion-cloud-native from GitHub

You should clone this repo from GitHub as you’ll need to run the scripts on your local workstation:

git clone https://github.com/lucidworks/fusion-cloud-native.git

You should get into the habit of pulling this repo for the latest changes before performing any maintenance operations on your Fusion cluster to ensure you have the latest updates to the scripts.

cd fusion-cloud-native
git pull

Google Kubernetes Engine (GKE)

The setup_f5_gke.sh script provided in this repo is strictly optional. The script is mainly to help those new to Kubernetes and/or Fusion get started quickly. If you’re already familiar with K8s, Helm, and GKE, then you can skip the script and just use Helm directly to install Fusion into an existing cluster or one you create yourself using the process described here.

If you’re new to Google Cloud Platform (GCP), then you need an account on Google Cloud Platform before you can begin deploying Fusion on GKE.

Set up the Google Cloud SDK (one time only)

If you’ve already installed the gcloud command-line tools, you can skip to Create a Fusion cluster in GKE.

These steps set up your local Google Cloud SDK environment so that you’re ready to use the command-line tools to manage your Fusion deployment.

Usually, you only need to perform these setup steps once. After that, you’re ready to create a cluster.

For a nice getting started tutorial for GKE, see: https://codelabs.developers.google.com/codelabs/cloud-gke-workshop-v2/#1

How to set up the Google Cloud SDK
  1. Enable the Kubernetes Engine API.

  2. Log in to Google Cloud: gcloud auth login

  3. Set up the Google Cloud SDK:

    1. gcloud config set compute/zone <zone-name>

      If you are working with regional clusters instead of zone clusters, use gcloud config set compute/region <region-name> instead.

    2. gcloud config set core/account <email address>

    3. New GKE projects only: gcloud projects create <new-project-name>

      If you have already created a project, for example in the Google Cloud Platform console, then skip to the next step.

    4. gcloud config set project <project-name>

Make sure you install the Kubernetes command-line tool kubectl using:

gcloud components install kubectl
gcloud components update

Create a Fusion cluster in GKE

Run the setup_f5_gke.sh script to install Fusion 5.x in a GKE cluster. To create a new cluster and install Fusion, simply do:

./setup_f5_gke.sh -c <cluster_name> -p <gcp_project_id> -r <release> -n <namespace>

Use the --help option to see script usage. If you want the script to create a cluster for you (the default behavior), then you need to pass the --create option with either demo or multi_az. If you don’t want the script to create a cluster, then you need to create a cluster before running the script and simply pass the name of the existing cluster using the -c parameter.

If you pass --create demo to the script, then we create a single node GKE cluster. The minimum node type you’ll need for a 1 node cluster is an n1-standard-4 (on GKE) which has 4 CPU and 15 GB of memory. This is cutting it very close in terms of resources as you also need to host all of the Kubernetes system pods on this same node. Obviously, this works for kicking the tires on Fusion 5.0 but is not sufficient for production workloads.

You can change the instance type using the -i parameter; see https://cloud.google.com/compute/docs/regions-zones/#available for a list of the machine types available in your desired region.
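For example, the following hypothetical invocations create a single-node demo cluster with a larger machine type, or install into a cluster you already created yourself (all names are placeholders):

./setup_f5_gke.sh -c my-demo-cluster -p my-gcp-project -z us-west1-a -r f5 -n default -i n1-standard-8 --create demo
./setup_f5_gke.sh -c my-existing-cluster -p my-gcp-project -r f5 -n default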

Note: If a custom values file is not provided, the script generates one named gke_<cluster>_<release>_fusion_values.yaml, which you can use to customize the Fusion chart.

WARNING If using Helm V2, the setup_f5_gke.sh script installs Helm’s tiller component into your GKE cluster with the cluster admin role. If you don’t want this, then please upgrade to Helm v3.

If you see an error similar to the following, then wait a few seconds and try running the setup_f5_gke.sh script again with the same arguments as this is usually a transient issue:

Error: could not get apiVersions from Kubernetes: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request

After running the setup_f5_gke.sh script, proceed to the Verifying the Fusion Installation section below.

The steps below show you how to create several kinds of Fusion clusters.

How to create a single-node Fusion demo cluster

A single-node configuration is useful for exploring Fusion in a demo or development environment.

This type of deployment can take at least 12 minutes, plus 3–5 minutes for cluster startup.

  1. Run the setup script:

    ./setup_f5_gke.sh -c <cluster> -p <project> -z <zone-name> --create demo
    • <cluster> value should be the name of a non-existent cluster; the script will create the new cluster.

    • <project> must match the name of an existing project in GKE.

      Run gcloud config get-value project to get this value, or see the GKE setup instructions.

    • <zone-name> must match the name of the zone you set in GKE. For a demo cluster, the zone must be a specific Availability Zone and not a Region, such as us-west1-a instead of us-west1.

      Run gcloud config get-value compute/zone to get this value, or see the GKE setup instructions to set the value.

    Upon success, the script shows you where to find the Fusion UI. For example:

    Fusion 5 Gateway service exposed at: <some-external-ip>:6764
  2. Access the Fusion UI by pointing your browser to the IP address and port specified in the setup script’s output.
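If you need to look up the gateway address again later, you can query the proxy service directly; this is a quick sketch that assumes the service is named proxy, as in the port-forward example later in this document:

kubectl get svc proxy -n <namespace> -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

Append :6764 to the returned IP to reach the Fusion UI.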

Create a three-node regional cluster to withstand a zone outage

With a three-node regional cluster, nodes are deployed across three separate availability zones.

./setup_f5_gke.sh -c <cluster> -p <project> -z <zone-name> --create multi_az

In this configuration, we want a ZooKeeper and Solr instance on each node, which allows the cluster to retain ZK quorum and remain operational after losing one node, such as during an outage in one availability zone.

When running in a multi-zone cluster, each Solr node has the solr_zone system property set to the zone it is running in, such as -Dsolr_zone=us-west1-a.

GKE Ingress and TLS

The Fusion proxy service provides authentication and serves as an API gateway for accessing all other Fusion services. It’s typical to use an Ingress for TLS termination in front of the proxy service.

The setup_f5_gke.sh script supports creating an Ingress with a TLS cert for a domain you own by passing: -t -h <hostname>

After the script runs, you need to create an A record in GCP’s DNS service to map your domain name to the Ingress IP. Once the DNS record is in place, our setup uses Let’s Encrypt to issue a TLS cert for your Ingress.

To see the status of the Let’s Encrypt issued certificate, do:

kubectl get managedcertificates -n <namespace> -o yaml

Please refer to the Kubernetes documentation on configuring an Ingress for GKE: Setting up HTTP Load Balancing with Ingress

Note
The GCP Ingress defaults to a 30-second timeout, which can lead to false negatives for long-running requests such as importing apps. To configure the timeout for the backend in Kubernetes:

Create a BackendConfig object in your namespace:

---
apiVersion: cloud.google.com/v1beta1
kind: BackendConfig
metadata:
  name: backend-config-name
spec:
  timeoutSec: 120
  connectionDraining:
    drainingTimeoutSec: 60
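Assuming you saved the manifest above to a file named backend-config.yaml (a hypothetical filename), apply it to your namespace with:

kubectl apply -f backend-config.yaml -n <namespace>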

Then make sure that the following entries are in the right place in your values.yaml file:

api-gateway:
  service:
    annotations:
      beta.cloud.google.com/backend-config: '{"ports": {"6764":"backend-config-name"}}'

Then upgrade your release to apply the configuration changes.

Ingresses and externalTrafficPolicy

When running a Fusion cluster behind an externally controlled LoadBalancer, it can be advantageous to set the externalTrafficPolicy of the proxy service to Local. This preserves the client source IP and avoids a second hop for LoadBalancer and NodePort type services, at the risk of potentially imbalanced traffic spreading. However, in a cluster with a dedicated node pool for Spark jobs that scales up and down freely, it can prevent unwanted request failures. This behavior can be altered with the api-gateway.externalTrafficPolicy value, which is set to Local if the example values file is used.
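As a minimal sketch, based on the api-gateway.externalTrafficPolicy value described above, the setting would look like this in your custom values yaml; upgrade your release afterwards to apply it:

api-gateway:
  externalTrafficPolicy: "Local"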

Considerations when using the nginx ingress controller

If you are using the nginx ingress controller to fulfill your Ingress definitions, we recommend setting the following options in its ConfigMap:

enable-underscores-in-headers: "true"   # Fusion can return some headers that have underscores, these have to be explicitly enabled in nginx
proxy-body-size: "0"        # By default nginx places a maximum size on request bodies, either increase as needed or disable by setting to 0
proxy-read-timeout: "300"   # Increases the timeout for potential slow queries.
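For reference, a minimal sketch of where these keys live; the ConfigMap name and namespace below (ingress-nginx-controller in ingress-nginx) are assumptions that depend on how you installed the controller:

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed name; match your controller's ConfigMap
  namespace: ingress-nginx         # assumed namespace
data:
  enable-underscores-in-headers: "true"
  proxy-body-size: "0"
  proxy-read-timeout: "300"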

Custom values

The example-values folder contains example values files that you can use as a starting point for resources, affinity, and replica count configuration. These can be passed to the install script using the --values option, for example:

./setup_f5_gke.sh -c <cluster> -p <project> -r <release> -n <namespace> \
  --values example-values/affinity.yaml --values example-values/resources.yaml --values example-values/replicas.yaml

The --values option can be passed multiple times; if the same configuration property is contained in multiple values files, then the value from the last file passed with --values is used.

Upgrades and Ingress

IMPORTANT If you used the -t -h <hostname> options when installing your cluster, our script created an additional values yaml file named tls-values.yaml.

To make things easier for you when upgrading, you should add the settings from this file into your main custom values yaml file, e.g.:

api-gateway:
  service:
    type: "NodePort"
  ingress:
    enabled: true
    host: "<hostname>"
    tls:
      enabled: true
    annotations:
      "networking.gke.io/managed-certificates": "<RELEASE>-managed-certificate"
      "kubernetes.io/ingress.class": "gce"

This way you don’t have to remember to pass the additional tls-values.yaml file when upgrading.

Upgrade Fusion on GKE

NOTE: If you’re currently running Fusion 5.0.1, then please use the instructions at Upgrade from 5.0.1

During installation, the script generates a file named gke_<cluster>_<release>_fusion_values.yaml; use this file to customize Fusion settings.

After making changes to this file, you need to run the following command:

./setup_f5_gke.sh -c <existing_cluster> -p <gcp_project_id> -r <release> -n <namespace> \
  --values gke_<cluster>_<release>_fusion_values.yaml --upgrade

You will also use the --upgrade option to upgrade to a newer version of Fusion, such as 5.0.2. Our setup script also creates an upgrade script you can use to perform upgrades; see:

gke_<cluster>_<release>_upgrade_fusion.sh

If you’re using the default namespace and see an error similar to the following, then simply pass the --force parameter when upgrading:

Namespace default is owned by: , by we are: OWNER please provide the `--force` parameter if you are sure you wish to upgrade this namespace

This owner label check before upgrading is in place as a safeguard for shared clusters with Fusion deployed to multiple namespaces.

After running the upgrade, use kubectl get pods to see the changes being applied to your cluster. It may take several minutes to perform the upgrade as new Docker images need to be pulled from DockerHub. To see the versions of running pods, do:

kubectl get po -o jsonpath='{..image}'  | tr -s '[[:space:]]' '\n' | sort | uniq

Upgrade from 5.0.1 to 5.0.2 (Zookeeper 3.5.6 and Solr 8.3.1)

Fusion 5.0.1 (and subsequent 5.0.2 pre-release versions, such as 5.0.2-7) runs Solr 8.2.0 and Zookeeper 3.4.14. Prior to upgrading to Fusion 5.0.2, you need to upgrade Solr to 8.3.1 in your existing cluster and perform some minor changes to the custom values yaml.

When you upgrade to 5.0.2, Zookeeper will migrate from 3.4.14 to 3.5.6. Behind the scenes, we also had to update the ZK Helm chart to work around an issue with purging logs (kubernetes-retired/contrib#2942), so we’ll have to delete the existing StatefulSet in order to switch charts during the upgrade.

Prior to upgrading, list your releases with Helm v2:

helm ls --all-namespaces

Once you’re ready to upgrade, on a Mac, do:

brew upgrade kubernetes-helm

For other OS, download from https://github.com/helm/helm/releases

Verify: helm version --short

v3.0.0+ge29ce2a
Migrate your release to Helm v3 using the helm-2to3 plugin (if needed)

If you installed your F5 cluster using Helm v2, you need to migrate it to v3 using the process described here: https://helm.sh/blog/migrate-from-helm-v2-to-helm-v3/. Basically, you need to migrate the release metadata that lives in Tiller over to your local system.

If you installed your cluster with Helm v3 originally, then you don’t need to do this step. Just verify your release is shown by: helm ls
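A rough sketch of the migration with the helm-2to3 plugin (verify the steps against the blog post above before running them; the release and namespace names are placeholders):

helm plugin install https://github.com/helm/helm-2to3
helm 2to3 move config            # copies Helm v2 configuration/data to Helm v3 locations
helm 2to3 convert <release>      # converts the Tiller-managed release to a Helm v3 release
helm ls -n <namespace>           # confirm the release now shows up under Helm v3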

During testing, we found upgrading Solr to 8.3.1 before moving to ZK 3.5.6 was more stable.

Edit your custom values yaml file and change the Solr version to 8.3.1.

solr:
  image:
    tag: 8.3.1
  updateStrategy:
    type: "RollingUpdate"

Determine the version of the Fusion chart you are currently running (shown by helm ls -n <namespace>) as you’ll need to pass that to the setup script when upgrading Solr to 8.3.1.

For instance, your chart version may be: fusion-5.0.2-7 in which case you would pass --version 5.0.2-7. The -7 part of the version is considered a "pre-release" of 5.0.2 in the semantic versioning scheme, see: https://semver.org/
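For example, the CHART column of helm ls shows the installed chart version:

helm ls -n <namespace>
# if the CHART column shows fusion-5.0.2-7, pass --version 5.0.2-7 below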

./setup_f5_gke.sh -c <existing_cluster> -p <gcp_project_id> -r <release> -n <namespace> \
  --version <CHART_VERSION> \
  --values gke_<cluster>_<release>_fusion_values.yaml --upgrade

Wait until Solr is back up and healthy.

IMPORTANT: You need to edit your custom values file and move the Zookeeper settings out from under the solr: section to the main level, e.g. instead of:

solr:
  ...
  zookeeper:
    ...

You need:

solr:
  ...

zookeeper:
  ...

At this point you’re ready to switch over to ZK 3.5.6. However, we cannot do this with zero downtime, meaning your cluster will lose quorum momentarily. So plan to have a minute or so of downtime in this cluster. Also, to avoid as much downtime as possible, be ready to upgrade to 5.0.2 immediately after deleting the existing statefulset.

When ready, do:

kubectl delete statefulset ${RELEASE}-solr
kubectl delete statefulset ${RELEASE}-zookeeper

Deleting the StatefulSet does not remove the persistent volumes backing Zookeeper and Solr, so no data will be lost.

After editing your custom values yaml file, run:

cd fusion-cloud-native

./setup_f5_gke.sh -c <CLUSTER> -p <PROJECT> -z <ZONE> \
  -n <NAMESPACE> -r <RELEASE> \
    --values <MY_VALUES> --version 5.0.2 --upgrade --force

Wait a few minutes and then verify the new ZK establishes quorum:

kubectl get pods

It will take some time for the upgrade to roll out across all the services, as K8s needs to pull new Docker images and then perform a rolling upgrade for each Fusion service.

After upgrading, verify the versions of each pod:

kubectl get po -o jsonpath='{..image}'  | tr -s '[[:space:]]' '\n' | sort | uniq
Install Prometheus / Grafana to Existing Cluster

As of 5.0.2, the Fusion setup scripts provide the option to install Prometheus and Grafana using the --prometheus option. However, if you installed a previous version of Fusion 5.0.x, then the upgrade does not install Prometheus / Grafana for you.

Once you complete the upgrade to Fusion 5.0.2, you can run the install_prom.sh script to install these additional services into your namespace. Pass the --help option to see script usage details.

For instance, to install into a GKE cluster and schedule the new pods in the default Node Pool, you would do:

./install_prom.sh -c <cluster> -r <release> -n <namespace> \
  --node-pool "cloud.google.com/gke-nodepool: default-pool" --provider gke

Once Prometheus and Grafana are deployed, edit your custom values yaml file for Fusion to enable the Solr exporter:

solr:
  ...
  exporter:
    enabled: true
    podAnnotations:
      prometheus.io/scrape: "true"
      prometheus.io/port: "9983"
      prometheus.io/path: "/metrics"
    nodeSelector:
      cloud.google.com/gke-nodepool: default-pool

Add pod annotations to the query-pipeline, fusion-indexing, and api-gateway services as needed to allow Prometheus to scrape metrics:

fusion-indexing:
  ...
  pod:
    annotations:
      prometheus.io/port: "8765"
      prometheus.io/scrape: "true"
query-pipeline:
  ...
  pod:
    annotations:
      prometheus.io/port: "8787"
      prometheus.io/scrape: "true"
api-gateway:
  ...
  pod:
    annotations:
      prometheus.io/port: "6764"
      prometheus.io/scrape: "true"

After making changes to the custom values yaml file, run an upgrade on the Fusion Helm chart.
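For example, on GKE you can re-run the generated upgrade script described earlier to apply the annotation changes:

./gke_<cluster>_<release>_upgrade_fusion.sh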

Amazon Elastic Kubernetes Service (EKS)

The setup_f5_eks.sh script provided in this repo is strictly optional. The script is mainly to help those new to Kubernetes and/or Fusion get started quickly. If you’re already familiar with K8s, Helm, and EKS, then you can use Helm directly to install Fusion into an existing cluster or one you create yourself using the process described here.

If you’re new to Amazon Web Services (AWS), then please visit the Amazon Web Services Getting Started Center to set up an account.

If you’re new to Kubernetes and EKS, then we recommend going through Amazon’s EKS Workshop before proceeding with Fusion.

Set up the AWS CLI tools

Before launching an EKS cluster, you need to install and configure kubectl, aws, eksctl, and aws-iam-authenticator using the links provided below:

Required AWS Command-line Tools:
  1. kubectl: Install kubectl

  2. aws: Installing the AWS CLI

  3. eksctl: Getting Started with eksctl

  4. aws-iam-authenticator: AWS IAM Authenticator for Kubernetes

Run aws configure to configure a profile for authenticating to AWS. You’ll use the profile name you configure in this step, which defaults to default, as the -p argument to the setup_f5_eks.sh script in the next section.
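For example, to configure and then use a non-default profile (the profile name fusion is hypothetical):

aws configure --profile fusion
./setup_f5_eks.sh -c <cluster_name> -p fusion --create demo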

Note
When working in Ubuntu, avoid using the eksctl snap version. Alternative sources can have different versions that could cause command failures.

Set up Fusion on EKS

To create a cluster in EKS, the following IAM policies are required:

  • AmazonEC2FullAccess

  • AWSCloudFormationFullAccess

Table 1. EKS Permissions

eks:DeleteCluster

eks:UpdateClusterVersion

eks:ListUpdates

eks:DescribeUpdate

eks:DescribeCluster

eks:ListClusters

eks:CreateCluster

Table 2. VPC Permissions

ec2:DeleteSubnet

ec2:DeleteVpcEndpoints

ec2:CreateVpc

ec2:AttachInternetGateway

ec2:DetachInternetGateway

ec2:DisassociateSubnetCidrBlock

ec2:DescribeVpcAttribute

ec2:AssociateVpcCidrBlock

ec2:ModifySubnetAttribute

ec2:DisassociateVpcCidrBlock

ec2:CreateVpcEndpoint

ec2:DescribeVpcs

ec2:CreateInternetGateway

ec2:AssociateSubnetCidrBlock

ec2:ModifyVpcAttribute

ec2:DeleteInternetGateway

ec2:DeleteVpc

ec2:CreateSubnet

ec2:DescribeSubnets

ec2:ModifyVpcEndpoint

Table 3. IAM Permissions

iam:CreateInstanceProfile

iam:DeleteInstanceProfile

iam:GetRole

iam:GetPolicyVersion

iam:UntagRole

iam:GetInstanceProfile

iam:GetPolicy

iam:TagRole

iam:RemoveRoleFromInstanceProfile

iam:DeletePolicy

iam:CreateRole

iam:DeleteRole

iam:AttachRolePolicy

iam:PutRolePolicy

iam:ListInstanceProfiles

iam:AddRoleToInstanceProfile

iam:CreatePolicy

iam:ListInstanceProfilesForRole

iam:PassRole

iam:DetachRolePolicy

iam:DeleteRolePolicy

iam:CreatePolicyVersion

iam:GetRolePolicy

iam:DeletePolicyVersion

Download and run the setup_f5_eks.sh script to install Fusion 5.x in an EKS cluster. To create a new cluster and install Fusion, simply do:

./setup_f5_eks.sh -c <cluster_name> -p <aws_profile>

If you want the script to create a cluster for you (the default behavior), then you need to pass the --create option with either demo or multi_az. If you don’t want the script to create a cluster, then you need to create a cluster before running the script and simply pass the name of the existing cluster using the -c parameter.

Use the --help option to see full script usage.

WARNING If using Helm V2, the setup_f5_eks.sh script installs Helm’s tiller component into your EKS cluster with the cluster admin role. If you don’t want this, then please upgrade to Helm v3.

WARNING The setup_f5_eks.sh script creates a service account that provides S3 read-only permissions to the created pods.

After running the setup_f5_eks.sh script, proceed to the Verifying the Fusion Installation section below.

EKS cluster overview

The EKS cluster is created using eksctl (https://eksctl.io/). By default, it will set up the following resources in your AWS account:

  • A dedicated VPC for the EKS cluster in the specified region with CIDR: 192.168.0.0/16

  • 3 Public and 3 Private subnets within the created VPC, each with a /19 CIDR range, along with the corresponding route tables.

  • A NAT gateway in each Public subnet

  • An Auto Scaling Group of the instance type specified by the script, which defaults to m5.2xlarge, with 3 instances spanning the public subnets.

See https://eksctl.io/usage/vpc-networking/ for more information on the networking setup.

EKS Ingress

The setup_f5_eks.sh script exposes the Fusion proxy service on an external IP over HTTP. This is done for demo or getting started purposes. However, you’re strongly encouraged to configure a K8s Ingress with TLS termination in front of the proxy service. See: https://aws.amazon.com/premiumsupport/knowledge-center/terminate-https-traffic-eks-acm/

Upgrade Fusion on EKS

During installation, the script generates a file named eks_<cluster>_<release>_fusion_values.yaml. Use this file to customize Fusion settings. After making changes to this file, run the following command:

./setup_f5_eks.sh -c <existing_cluster> -p <aws_profile> -r <release> -n <namespace> \
  --values eks_<cluster>_<release>_fusion_values.yaml --upgrade

You will also use the --upgrade option to upgrade to a newer version of Fusion, such as 5.0.2.

To make things easier for you, our setup script creates an upgrade script you can use to perform upgrades, see:

eks_<cluster>_<release>_upgrade_fusion.sh

Provide access to the EKS cluster to other users

Initially, only the user that created the Amazon EKS cluster has system:masters permissions to configure the cluster. In order to extend the permissions, a ConfigMap should be created to allow access to IAM users or roles.

To provide these permissions, use the following YAML file as a template, replacing the placeholder values:

aws-auth.yaml

apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: <node_instance_role_arn>
      username: system:node:{{EC2PrivateDNSName}}
      groups:
        - system:bootstrappers
        - system:nodes
  mapUsers: |
    - userarn: arn:aws:iam::<account_id>:user/<username>
      username: <username>
      groups:
        - system:masters

Apply the YAML file with: kubectl apply -f aws-auth.yaml
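To confirm the mappings were applied, inspect the ConfigMap:

kubectl describe configmap aws-auth -n kube-system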

Azure Kubernetes Service (AKS)

The setup_f5_aks.sh script provided in this repo is strictly optional. The script is mainly to help those new to Kubernetes and/or Fusion get started quickly. If you’re already familiar with K8s, Helm, and AKS, then you can use Helm directly to install Fusion into an existing cluster or one you create yourself using the process described here.

If you’re new to Azure, then please visit https://azure.microsoft.com/en-us/free/search/ to set up an account.

Set up the AKS CLI tools

Before launching an AKS cluster, you need to install and configure kubectl and az using the links provided below:

Required AKS Command-line Tools:
  1. kubectl: Install kubectl

  2. az: Installing the Azure CLI

To confirm your account access and command-line tools are set up correctly, run the az login command (az login --help to see available options).

Azure Prerequisites

To launch a cluster in AKS (or pretty much do anything with Azure), you need to set up a Resource Group. Resource Groups are a way of organizing and managing related resources in Azure. For more information about resource groups, see https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups.

You also need to choose a location where you want to spin up your AKS cluster, such as westus2. For a list of locations you can choose, see https://azure.microsoft.com/en-us/global-infrastructure/locations/.

Use the Azure console in your browser to create a resource group, or simply do:

az group create -g $AZURE_RESOURCE_GROUP -l $AZURE_LOCATION
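The command above assumes you have already exported the resource group and location as environment variables, e.g. (values are examples only):

export AZURE_RESOURCE_GROUP=fusion-demo
export AZURE_LOCATION=westus2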
To recap, you should have the following requirements in place:
  1. Azure Account set up.

  2. azure-cli (az) command-line tools installed.

  3. az login working.

  4. Created an Azure Resource Group and selected a location to launch the cluster.

Set up Fusion on AKS

Download and run the setup_f5_aks.sh script to install Fusion 5.x in an AKS cluster. To create a new cluster and install Fusion, simply do:

./setup_f5_aks.sh -c <cluster_name> -p <aks_resource_group>

If you don’t want the script to create a cluster, then you need to create a cluster before running the script and simply pass the name of the existing cluster using the -c parameter.

Use the --help option to see full script usage.

By default, our script installs Fusion into the default namespace; think of a K8s namespace as a virtual cluster within a physical cluster. You can install multiple instances of Fusion in the same cluster in separate namespaces. However, please do not install more than one Fusion release in the same namespace.

You can override the namespace using the -n option. In addition, our script uses f5 for the Helm release name; you can customize this using the -r option. Helm uses the release name you provide to track a specific instance of an installation, allowing you to perform updates and rollback changes for that specific release only.

You can also pass the --preview option to the script, which enables soon-to-be-released features for AKS, such as deploying a multi-zone cluster across 3 availability zones for higher availability guarantees. For more information about the Availability Zone feature, see https://docs.microsoft.com/en-us/azure/aks/availability-zones.

It takes a while for AKS to spin up the new cluster. The cluster will have three Standard_D4_v3 nodes, each with 4 CPU cores and 16 GB of memory. Behind the scenes, our script calls the az aks create command.

Warning
If using Helm V2, the setup_f5_aks.sh script installs Helm’s tiller component into your AKS cluster with the cluster admin role. If you don’t want this, then please upgrade to Helm v3.

After running the setup_f5_aks.sh script, proceed to Verifying the Fusion Installation.

AKS Ingress

The setup_f5_aks.sh script exposes the Fusion proxy service on an external IP over HTTP. This is done for demo or getting started purposes. However, you’re strongly encouraged to configure a K8s Ingress with TLS termination in front of the proxy service.

Use the -t and -h <hostname> options to have our script create an Ingress with a TLS certificate issued by Let’s Encrypt.
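For example (the hostname below is a placeholder for a domain you own):

./setup_f5_aks.sh -c <cluster_name> -p <aks_resource_group> -t -h fusion.example.com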

Upgrades and Ingress

Important
If you used the -t -h <hostname> options when installing your cluster, our script created an additional values yaml file named tls-values.yaml.

To make things easier for you when upgrading, you should add the settings from this file into your main custom values yaml file. For example:

api-gateway:
  service:
    type: "NodePort"
  ingress:
    enabled: true
    host: "<hostname>"
    tls:
      enabled: true
    annotations:
      "networking.gke.io/managed-certificates": "<RELEASE>-managed-certificate"
      "kubernetes.io/ingress.class": "gce"

This way, you don’t have to remember to pass the additional tls-values.yaml file when upgrading.

Upgrade Fusion on AKS

During installation, the script generates a file named aks_<cluster>_<release>_fusion_values.yaml. Use this file to customize Fusion settings. After making changes to this file, run the following command:

./setup_f5_aks.sh -c <existing_cluster> -p <aks_resource_group> -r <release> -n <namespace> \
  --values aks_<cluster>_<release>_fusion_values.yaml --upgrade

You will also use the --upgrade option to upgrade to a newer version of Fusion.

To make things easier for you, our setup script creates an upgrade script you can use to perform upgrades, see:

aks_<cluster>_<release>_upgrade_fusion.sh

Other Kubernetes Platforms

If you’re not running on a managed K8s platform such as GKE, AKS, or EKS, you can use Helm to install the Fusion chart to an existing Kubernetes cluster.

(To install Fusion locally, you should have a node with at least 12 GB of memory, 100 GB of disk, and 4 CPU cores. The recommended setup is 16 GB (or more) of memory and 8+ CPU cores. Please follow the instructions here to adjust Docker’s resource limits.)

Then run the setup_f5_k8s.sh script to install Fusion 5.x into your local K8s cluster:

./setup_f5_k8s.sh -c <cluster_name> -r <release> -n <namespace>

Make sure all the pods are healthy and running by watching the cluster rollout using: kubectl get pods --watch

Hit ctrl-c once all the pods are running.

Run the following command to access Fusion locally at http://localhost:6764:

kubectl port-forward svc/proxy 6764:6764

Use Helm v3 to Install Fusion

You should upgrade to the latest version of Helm v3 for working with Fusion. If you need to keep Helm V2 for other clusters, ensure Helm V3 is ahead of Helm V2 in your working shell’s PATH before proceeding.
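A quick sanity check of which binary your shell resolves and its version:

which helm
helm version --short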

Customize Fusion Chart Settings

Fusion aims to be well-configured out-of-the-box, but you can customize any of the built-in settings using a custom values YAML file. If you use one of our setup scripts, such as setup_f5_gke.sh, then it will create a custom values YAML file for you the first time you run it using the customize_fusion_values.yaml.example as a template.

If you’re working with Helm directly and not using one of our setup scripts, then run the customize_fusion_values.sh script to create a custom values YAML file from our customize_fusion_values.yaml.example template as a starting point:

./customize_fusion_values.sh <provider>_<cluster>_<release>_fusion_values.yaml \
  -c <cluster> -r <release> \
  --provider <provider> --num-solr 1 --node-pool "<node_pool>"
Note
Pass --help for usage details.

In this example:

  • <provider> is the K8s platform you’re running on, such as gke

  • <cluster> is the name of your cluster

  • <release> is the name you give to your Fusion release, such as f5

Note
The --node-pool option specifies the node selector label for determining which nodes to run Fusion pods. You can pass "{}" to let Kubernetes decide which nodes to schedule pods on.

This file is referred to as ${MY_VALUES} in the commands below. Replace the filename with the correct filename for your environment. Keep this file handy, as you’ll need it to customize Fusion settings and upgrade to a newer version.

Review the settings in the custom values YAML file to ensure the defaults are appropriate for your environment, including the number of Solr and Zookeeper replicas.

RELEASE=f5
NAMESPACE=default

helm version --short
helm repo add lucidworks https://charts.lucidworks.com
helm repo update
helm install ${RELEASE} lucidworks/fusion --timeout=240s --namespace "${NAMESPACE}" --values "${MY_VALUES}" --version 5.0.2
kubectl rollout status deployment/${RELEASE}-api-gateway --timeout=600s --namespace "${NAMESPACE}"

Upgrade Existing Installation with Helm V3

To update an existing installation, do:

RELEASE=f5
NAMESPACE=default
helm repo update
helm upgrade ${RELEASE} "lucidworks/fusion" --namespace "${NAMESPACE}" --values "${MY_VALUES}"

Except for Zookeeper, all K8s deployments and statefulsets use a RollingUpdate update policy:

  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate

Zookeeper instances use OnDelete to avoid changing critical stateful pods in the Fusion deployment. To apply changes to Zookeeper after performing the upgrade (uncommon), you need to manually delete the pods. For example:

kubectl delete pod f5-zookeeper-0
Important
Delete one pod at a time, and verify the new pod is healthy and serving traffic before deleting the next healthy pod.
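A cautious sketch of that process, assuming the f5 release name used in this example and a three-node ensemble:

kubectl delete pod f5-zookeeper-0
kubectl wait --for=condition=Ready pod/f5-zookeeper-0 --timeout=300s
# repeat for f5-zookeeper-1 and f5-zookeeper-2, one pod at a time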

Alternatively, you can set the updateStrategy under the zookeeper section in your "${MY_VALUES}" file:

solr:
  ...
  zookeeper:
    updateStrategy:
      type: "RollingUpdate"

RedHat OpenShift

We can deploy Fusion in an existing OpenShift cluster. This cluster should be created using OpenShift Infrastructure Provider. A Red Hat Customer Portal account is required. OpenShift Online services are not supported.

If Tiller is required, the cluster security policy needs to be relaxed to allow images to run with different UIDs:

oc adm policy add-scc-to-group anyuid system:authenticated

Verifying the Fusion Installation

In this section, we provide some tips on how to verify the Fusion installation. First, let’s review some useful kubectl commands.

Enhance the K8s Command-line Experience

When working with Kubernetes on the command-line, it’s useful to create a shell alias for kubectl, e.g.:

alias k=kubectl

Here is a list of tools we found useful for improving your command-line experience with Kubernetes:

Useful kubectl commands

Set the namespace for kubectl if not using the default:

kubectl config set-context --current --namespace=<NAMESPACE>

This saves you from having to pass -n with every command.

Get a list of running pods: k get pods

Get logs for a pod using a label: k logs -l app.kubernetes.io/component=query-pipeline

Get pod deployment spec and details: k get pods <pod_id> -o yaml

Get details about a pod's events: k describe po <pod_id>

Port forward to a specific pod: k port-forward <pod_id> 8983:8983

SSH into a pod: k exec -it <pod_id> -- /bin/bash

CPU/Memory usage report for pods: k top pods

Forcefully kill a pod: k delete po <pod_id> --force --grace-period 0

Scale up (or down) a deployment: k scale deployment.v1.apps/<id> --replicas=N

Get a list of pod versions: k get po -o jsonpath='{..image}' | tr -s '[[:space:]]' '\n' | sort | uniq

Check Fusion Pods and Services

Once the install script completes, you can check that all pods and services are available using:

kubectl get pods

If all goes well, you should see a list of pods similar to:

NAME                                     READY   STATUS    RESTARTS   AGE
f5-admin-ui-669bb68f74-pjqtw           1/1     Running   0          19h
f5-api-gateway-6f7fdd69d-bt2nc         1/1     Running   0          19h
f5-auth-ui-b4dfd4f6d-f9tb6             1/1     Running   0          19h
f5-classic-rest-service-0              1/1     Running   1          19h
f5-devops-ui-768cf6f55b-wphsw          1/1     Running   0          19h
f5-fusion-admin-5888f54447-hprt6       1/1     Running   0          19h
f5-fusion-indexing-76dfb65dfd-929f4    1/1     Running   0          19h
f5-insights-686464b75b-6pzw5           1/1     Running   0          19h
f5-job-launcher-5d84c859c4-dl7s9       1/1     Running   0          19h
f5-job-rest-server-fb99fcfd7-lmqvd     1/1     Running   0          19h
f5-logstash-0                          1/1     Running   0          19h
f5-ml-model-service-8574b96c68-jqt88   2/2     Running   0          17h
f5-query-pipeline-77956f56f8-22wg7     1/1     Running   0          19h
f5-rest-service-77ff7d45-rbrn4         1/1     Running   0          19h
f5-rpc-service-67b6f4bf49-2d65g        1/1     Running   1          19h
f5-rules-ui-65d59dc5b4-5ntq9           1/1     Running   0          19h
f5-solr-0                              1/1     Running   0          19h
f5-webapps-7d9497c485-bbtg9            1/1     Running   0          19h
f5-zookeeper-0                         1/1     Running   0          19h

The number of pods per deployment / statefulset will vary based on your cluster size and replicaCount settings in your custom values YAML file. Also, don’t worry if you see some pods having been restarted as that just means they were too slow to come up and Kubernetes killed and restarted them. You do want to see at least one pod running for every service. If a pod is not running after waiting a sufficient amount of time, use kubectl logs <pod_id> to see the logs for that pod; to see the logs for previous versions of a pod, use: kubectl logs <pod_id> -p. You can also look at the actions Kubernetes performed on the pod using kubectl describe po <pod_id>.

To see a list of Fusion services, do:

kubectl get svc

For an overview of the various Fusion 5 microservices, see: https://doc.lucidworks.com/fusion-server/5.0/deployment/kubernetes/microservices.html

Once you’re ready to build a Fusion cluster for production, please see the Fusion 5 Survival Guide PDF in this repo.

Upgrading with Zero Downtime

One of the most powerful features provided by Kubernetes and a cloud-native microservices architecture is the ability to do a rolling update on a live cluster. Fusion 5 allows customers to upgrade from Fusion 5.x.y to a later 5.x.z version on a live cluster with zero downtime or disruption of service.

When Kubernetes performs a rolling update to an individual microservice, there will be a mix of old and new services in the cluster concurrently (only briefly in most cases) and requests from other services will be routed to both versions. Consequently, Lucidworks ensures all changes we make to our services do not break the API interfaces exposed to other services in the same 5.x line of releases. We also ensure stored configuration remains compatible in the same 5.x release line.

Lucidworks releases minor updates to individual services frequently, so our customers can pull in those upgrades using Helm at their discretion.

To upgrade your cluster at any time, use the --upgrade option with our setup scripts in this repo.

The scripts in this repo automatically pull in the latest chart updates from our Helm repository and deploy any updates needed by doing a diff of your current installation and the latest release from Lucidworks. To see what would be upgraded, you can pass the --dry-run option to the script.
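For example, on GKE:

./setup_f5_gke.sh -c <existing_cluster> -p <gcp_project_id> -r <release> -n <namespace> \
  --values gke_<cluster>_<release>_fusion_values.yaml --upgrade --dry-run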

Grafana Dashboards

Get the initial Grafana password from a K8s secret by doing:

kubectl get secret --namespace "${NAMESPACE}" ${RELEASE}-graf-grafana \
  -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

With Grafana, you can either set up a temporary port-forward to a Grafana pod or expose Grafana on an external IP using a K8s LoadBalancer. To define a LoadBalancer, do (replace ${RELEASE} with your Helm release label):

kubectl expose deployment ${RELEASE}-graf-grafana --type=LoadBalancer --name=grafana

You can use kubectl get services --namespace <namespace> to determine when the load balancer is set up and to get its IP address. Direct your browser to http://<GrafanaIP>:3000 and enter the username admin@localhost and the password that was returned in the previous step.

This will log you into the application. It is recommended that you create another administrative user with a more secure password.

One of the first things you will want to do is to configure the Prometheus data source in Grafana. Go to the gear icon on the left and then to Data Sources.

Click Add Data Source and then click Prometheus as the data source type. This brings you to a page that asks for the HTTP URL of the Prometheus server.

http://<RELEASE>-prom-prometheus-server

Import dashboards from monitoring/grafana/*.json