Create Observability Stack for Monitoring and Logging #3

MooseQuest · 2020-03-17T16:36:01Z

Generating the observability stack serves the following purposes:

- Allows monitoring of both the cluster and the application.
- Identifies resource contention and scaling issues through metrics.
- Allows developers to pinpoint errors and surface them for ops alerts.

Components to generate:

- The data pipeline, which will deliver logs from the environment and the application.
- The metrics visualization.
- A data management layer, either on the cluster or providing connectivity to another component that will surface the data.

Technologies and software to consider:

- Elasticsearch
- Grafana
- Prometheus
- Splunk

lottspot · 2020-03-18T22:33:21Z

Going to try out this pre-rolled stack as a starting point: https://github.com/coreos/kube-prometheus

Even if all goes well, this doesn't get us a logging stack; just metrics and monitoring.

lottspot · 2020-03-18T23:47:20Z

Roll out went well and we have metrics dashboards running at https://metrics.chime-live-cluster.phl.io/

The manifests used for the rollout are currently sitting in the issues/3 branch, where they will remain until the freeze on PRs to master is lifted.

rcknplyr · 2020-04-12T22:40:00Z

@lottspot would we consider this completed?

lottspot · 2020-04-12T23:51:33Z

We don't have anything capturing logs yet so this is technically not completed

MooseQuest · 2020-04-15T18:42:25Z

I'll be pushing up what we have so far onto a branch and will reference here.

mariekers · 2020-04-17T20:12:02Z

Would someone be interested in telling a non-devops person how this differs from #32 ?

fxdgear · 2020-04-22T17:10:58Z

Just going to leave a few comments here for posterity:

I had a conversation with @MooseQuest and he told me that Elasticsearch was installed on the dev k8s cluster.

Elasticsearch was installed following the instructions here: https://www.elastic.co/guide/en/cloud-on-k8s/current/k8s-quickstart.html

For reference, the instructions here are for installing Elastic Cloud, which is a service for managing multiple Elasticsearch deployments. Think of this as https://cloud.elastic.co on prem, meaning you will have a web interface for managing multiple ES clusters. You can upgrade, manage backups, etc. It's a great service, but it might be overkill to have an Elastic Cloud service for each CHIME deployment.

My recommendation is that for each deployment of CHIME it would have a single deployment of Elasticsearch.

To deploy Elasticsearch (and the elastic stack at large) I would recommend using the Elasticsearch Helm Charts

Elasticsearch Helm chart requirements are:

- Helm >=2.8.0 and <3.0.0 (see parent README for more details)
- Kubernetes >=1.8
- Minimum cluster requirements include the following to run this chart with default settings. All of these settings are configurable.
  - Three Kubernetes nodes to respect the default "hard" affinity settings
  - 1GB of RAM for the JVM heap

Elasticsearch, being a distributed system, operates on a high-availability model, meaning the minimum number of Elasticsearch nodes should be 3. This is why the Kubernetes cluster must have at least 3 nodes: it allows the Elasticsearch cluster to survive the failure of a Kubernetes node.
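To make the three-node reasoning concrete, here is a minimal sketch of majority-quorum arithmetic. It assumes every Elasticsearch node is master-eligible and that each one runs on a separate Kubernetes node (which is what the "hard" affinity default is meant to enforce). The function names are illustrative only and are not part of any Elasticsearch API.

```python
def quorum(master_eligible_nodes: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    return master_eligible_nodes // 2 + 1


def tolerated_failures(master_eligible_nodes: int) -> int:
    """How many nodes can be lost while a majority still remains."""
    return master_eligible_nodes - quorum(master_eligible_nodes)


# Compare cluster sizes 1, 2 and 3.
for n in (1, 2, 3):
    print(f"nodes={n} quorum={quorum(n)} tolerated_failures={tolerated_failures(n)}")
```

Under these assumptions a two-node cluster tolerates no more failures than a single node, and three is the smallest size that survives the loss of one node.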

Using the Helm charts also gives us the added benefit of being able to deploy:

- elasticsearch
- filebeat
- metricbeat
- kibana
- apm-server

Filebeat can be configured to read the logs from pods in the k8s cluster and ship them to Elasticsearch.
Metricbeat can be configured to collect metrics from the k8s cluster and ship them to Elasticsearch.
APM Server is a service that runs on the k8s cluster, accepts APM data from the various applications deployed in that cluster, and ships it to Elasticsearch.
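On the application side, one way to make pod logs easy to ship is to write one JSON object per line to stdout. The sketch below assumes that Filebeat is set up to collect container output and to decode JSON lines, which is a configuration choice not described in this thread. The `JsonFormatter` class, the `build_logger` helper and the `"chime"` logger name are hypothetical.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name: str) -> logging.Logger:
    """Return a logger that writes JSON lines to stdout."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # replace any handlers set up earlier
    logger.setLevel(logging.INFO)
    logger.propagate = False      # avoid duplicate lines via the root logger
    return logger


log = build_logger("chime")  # placeholder logger name
log.info("model run finished")
```

Keeping each record on one line means a multi-line traceback travels as a single event instead of being split across several log entries.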

The benefit of having all this data going into Elasticsearch is that you can use Kibana to visualize all these different data sources in one place.

Kibana also has a "logs" app which lets you tail logs as they arrive in Elasticsearch. You can even filter on k8s labels, pod names, namespaces, etc.
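The same kind of filtering can be expressed as an Elasticsearch query outside Kibana. This is a rough sketch that only builds the request body. It assumes the documents carry the `kubernetes.namespace` and `kubernetes.pod.name` fields that Filebeat's Kubernetes metadata enrichment normally adds; the exact field names can vary with the Beats version and configuration. The namespace and pod name used here are made up.

```python
import json


def pod_log_query(namespace: str, pod_name: str, limit: int = 50) -> dict:
    """Build a search body for the newest log lines of one pod."""
    return {
        "size": limit,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                # Filters do not affect scoring, they only narrow the result set.
                "filter": [
                    {"term": {"kubernetes.namespace": namespace}},
                    {"term": {"kubernetes.pod.name": pod_name}},
                ]
            }
        },
    }


# "chime" and "chime-web-0" are placeholder values.
print(json.dumps(pod_log_query("chime", "chime-web-0"), indent=2))
```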

The Elastic APM service currently supports the following languages:

- Go
- Java
- .NET
- Node.js
- Python
- Ruby

themightychris · 2020-04-22T17:17:09Z

@fxdgear long term, we're not looking to give each deployment of CHIME its own cluster. That was a stop-gap measure to proceed quickly. Eventually, we want to have a single prod cluster hosting many civic applications including chime, alternate versions of chime, follow-up projects related to chime, and other local civic projects. We are thinking that each project would be within its own namespace.

We need an infrastructure that gets us as close as possible to each project/namespace being free-when-idle. Any cluster services that we need to deploy instances of per-project/namespace will create poor economics for us. We have very modest funding within which we need to be able to host a large number of low-traffic projects sustainably for many years. At any given time, only a small number of projects, if any, will have high traffic. It's kind of the inverse of most enterprise use cases.

Given that, would you adjust your recommendations at all?

fxdgear · 2020-04-22T17:30:46Z

@themightychris Thanks for the quick response.

Given the long-term goal of a single K8s cluster with multiple namespaces, what I would recommend in this case is the following:

- Deploy the elastic stack into its own namespace
  - APM
  - *beats
  - Elasticsearch
  - Kibana
- Configure the *beats to read from ALL namespaces
- Set up APM Server to run as an internal service (i.e. no Ingress)
  - Configure the apps which send APM data to reach APM Server by its full service name, i.e. service-name.namespace.svc.cluster.local
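As a sketch of the last point, an app in any namespace could derive the APM Server address from the service and namespace names. The names `apm-server` and `elastic` below are placeholders. `cluster.local` is the usual default cluster domain and 8200 is, as far as I know, APM Server's default port, so treat both as assumptions to verify. The two environment variable names are the ones I believe the Elastic APM agents read, but check them against the agent documentation.

```python
def service_dns_name(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Fully qualified in-cluster DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def apm_agent_env(app_name: str, apm_service: str, apm_namespace: str, port: int = 8200) -> dict:
    """Environment settings an app could use to reach the shared APM Server."""
    host = service_dns_name(apm_service, apm_namespace)
    return {
        "ELASTIC_APM_SERVICE_NAME": app_name,
        "ELASTIC_APM_SERVER_URL": f"http://{host}:{port}",
    }


# Placeholder names: an app called "chime" talking to "apm-server" in the "elastic" namespace.
for key, value in apm_agent_env("chime", "apm-server", "elastic").items():
    print(f"{key}={value}")
```

Using the full name rather than the short service name is what lets apps in other namespaces resolve a service that lives in the elastic stack's namespace.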

The end goal here (wrt the elastic stack) is a single deployment of the elastic tooling, configured in a way that lets you add and remove namespaces (i.e. various CHIME-related projects and deployments).

But you end up with a singular entity to monitor ALL your deployments.

This was not explicit in my previous comment, but the goal here is that whether you end up with multiple k8s clusters or a single k8s cluster, you still only need a single elastic stack deployment per k8s cluster.

This strategy will scale regardless.

On another note, depending on the volume of logs/metrics, you may or may not run out of disk space for storing data in Elasticsearch. There are a couple of ways to handle this.

If you have a policy on the length of time you are required (or want) to store logs, you can do any of the following:

- Increase the disk size of your PVC to account for the amount of data you need to store.
- Schedule snapshots of the data to store outside the cluster.
- Use rollups (basically storing older data with lower fidelity).
- And finally, use ILM (Index Lifecycle Management) to automate a lot of this and ensure your disks don't fill up with stale data.
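To give a feel for the first and last options, here is a rough sketch that estimates the disk a retention policy needs and builds a matching ILM policy body. All numbers (2 GB/day, 30 days, one replica, 25% headroom, the rollover thresholds) are illustrative assumptions, and real index overhead will differ. The policy dict follows the shape I understand the ILM API to accept (`PUT _ilm/policy/<name>`); note that with rollover the delete phase's `min_age` is counted from the time an index rolls over.

```python
import json
import math


def required_disk_gb(daily_gb: float, retention_days: int,
                     replicas: int = 1, headroom: float = 0.25) -> int:
    """Estimate total disk for the retained data, replica copies included."""
    raw = daily_gb * retention_days * (1 + replicas)
    return math.ceil(raw * (1 + headroom))


def ilm_policy(retention_days: int, rollover_age: str = "1d",
               rollover_size: str = "50gb") -> dict:
    """Roll indices over while hot, then delete them after the retention window."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_age": rollover_age, "max_size": rollover_size}
                    }
                },
                "delete": {
                    "min_age": f"{retention_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


print(required_disk_gb(daily_gb=2, retention_days=30))
print(json.dumps(ilm_policy(30), indent=2))
```

Sizing the PVC from the retention window and letting ILM delete anything older keeps the two choices consistent with each other.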

MooseQuest self-assigned this Mar 17, 2020

MooseQuest added the observability label Mar 17, 2020

lottspot added devops and removed observability labels Mar 18, 2020

lottspot self-assigned this Mar 18, 2020

themightychris added a commit that referenced this issue Mar 19, 2020

Merge pull request #80 from /issues/3

cc59ee8

Add prometheus+grafana to k8s; refactor other infra manifests

mariekers removed the devops label Mar 20, 2020

lottspot added the "k8s infra" label (Requires work on ops-facing workloads which support k8s app) Mar 20, 2020

quinn-dougherty pushed a commit that referenced this issue Mar 25, 2020

Merge pull request #3 from BrianThomasRoss/dash-app

8cc5443

Push to remote devel

ckoerber pushed a commit that referenced this issue Apr 1, 2020

Merge pull request #3 from CodeForPhilly/251_add_pdf_export_bufferd

849131e

251 add pdf export bufferd [WIP]

fxdgear mentioned this issue Apr 22, 2020

[DevOps] adding readme for how to deploy elasticstack using helm charts #565

Merged

6 tasks
