From e7f3ecc76da57f738b5af1b52d67a48738a63190 Mon Sep 17 00:00:00 2001 From: pipo02mix Date: Fri, 3 May 2024 11:01:30 +0200 Subject: [PATCH 1/8] Proposal for observability overview page --- src/content/overview/observability/_index.md | 24 ++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/src/content/overview/observability/_index.md b/src/content/overview/observability/_index.md index bb737ad4e1..ca1731c114 100644 --- a/src/content/overview/observability/_index.md +++ b/src/content/overview/observability/_index.md @@ -1,12 +1,32 @@ --- title: Observability -description: Observability tooling to provide you with visibility into the Giant Swarm platform, your cluster fleet and application workloads. +description: Monitoring, Logging and Tracing to provide you with visibility into the Giant Swarm platform, your cluster fleet and application workloads. weight: 70 menu: principal: parent: overview identifier: overview-observability -last_review_date: 2024-03-18 +last_review_date: 2024-05-03 owner: - https://github.com/orgs/giantswarm/teams/sig-product --- + +Check first the template (https://github.com/giantswarm/docs/pull/2180/files) it is still on development and you feedback is welcome, but you can take it as a reference. + +Concepts come to my mind when I think on observability (worthy to mention here without going deep): + +- Monitoring +- Logging +- Tracing +- Alerting +- Profiling + +Write a brief description of the the observability in general from introductory point of view. + +## Features + +Describe what use cases are covered by the observability features. + +## Cloud-native technologies + +- What technologies are used to provide observability in Giant Swarm? 
\ No newline at end of file From 78a15bca51d8d1e234133d4d81f6b9b9be1164a1 Mon Sep 17 00:00:00 2001 From: Fernando Ripoll Date: Fri, 3 May 2024 11:46:08 +0200 Subject: [PATCH 2/8] Update src/content/overview/observability/_index.md --- src/content/overview/observability/_index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/content/overview/observability/_index.md b/src/content/overview/observability/_index.md index ca1731c114..6f62b4e450 100644 --- a/src/content/overview/observability/_index.md +++ b/src/content/overview/observability/_index.md @@ -21,7 +21,7 @@ Concepts come to my mind when I think on observability (worthy to mention here w - Alerting - Profiling -Write a brief description of the the observability in general from introductory point of view. +The main idea is to write a brief description of observability in general from an introductory point of view setting the high-level goals we would like to achieve. ## Features From c088edb66a98f437dbc773ca67fa0758e15d05e3 Mon Sep 17 00:00:00 2001 From: pipo02mix Date: Mon, 3 Jun 2024 13:31:10 +0200 Subject: [PATCH 3/8] Add a proposal --- src/content/overview/observability/_index.md | 38 +++++++++++++------- 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/src/content/overview/observability/_index.md b/src/content/overview/observability/_index.md index 6f62b4e450..1f315e12f4 100644 --- a/src/content/overview/observability/_index.md +++ b/src/content/overview/observability/_index.md @@ -6,27 +6,39 @@ menu: principal: parent: overview identifier: overview-observability -last_review_date: 2024-05-03 +last_review_date: 2024-06-03 owner: - https://github.com/orgs/giantswarm/teams/sig-product --- -Check first the template (https://github.com/giantswarm/docs/pull/2180/files) it is still on development and you feedback is welcome, but you can take it as a reference. 
+Observability is a fundamental aspect of modern cloud-native environments, providing the insights needed to understand and improve the performance, reliability, and overall health of applications and infrastructure. At Giant Swarm, we prioritize observability to ensure our customers can maintain visibility into their systems, quickly identify and resolve issues, and continuously optimize their operations. -Concepts come to my mind when I think on observability (worthy to mention here without going deep): +## Capabilities -- Monitoring -- Logging -- Tracing -- Alerting -- Profiling +- **Monitoring**: It involves the continuous collection and analysis of metrics to assess the performance and health of applications and infrastructure. Effective monitoring across all environments allows teams to detect anomalies, understand usage patterns, and make data-driven decisions to optimize their systems. -The main idea is to write a brief description of observability in general from an introductory point of view setting the high-level goals we would like to achieve. +- **Logging**: It captures detailed records of system and application events, providing crucial information for troubleshooting and auditing. Centralized logging, a powerful tool that empowers teams, enables them to search, analyze, and visualize log data, helping them identify issues, track changes, and ensure compliance with security and regulatory requirements, thereby giving them a greater sense of control over their system's health. -## Features +- **Tracing**: lets you track requests' journeys through various application services and components. It also provides a detailed view of the interactions and dependencies between services, helping to pinpoint performance bottlenecks and understand request end-to-end latency. -Describe what use cases are covered by the observability features. 
+- **Alerting**: It is crucial to be able to notify your teams about significant events or issues that require immediate attention. By making it easy to set up and configure across apps and environments, alerting helps platform teams avoid wasting time on repetitive tasks, thereby enhancing their efficiency and allowing them to focus on what matters. -## Cloud-native technologies +- **Profiling**: Profiling analyzes applications' resource usage and performance characteristics, identifying inefficient code paths and resource bottlenecks. A standardized approach to profiling applications helps teams optimize performance, reduce costs, and deliver a better user experience. -- What technologies are used to provide observability in Giant Swarm? \ No newline at end of file +One of the key benefits using Giant Swarm is that we provide a set of integrated observability tools that help you have a comprehensive view of your applications and infrastructure. + +## Cloud-Native technologies + +**Prometheus** is a leading open-source monitoring and alerting toolkit for reliability and scalability. It collects and stores metrics as time series data, enabling powerful querying and alerting capabilities. Prometheus integrates seamlessly with Kubernetes, making it a popular choice for cloud-native environments. + +- **Grafana**: Grafana is a popular open-source visualization tool for creating dashboards and graphs for monitoring and observability. It supports various data sources, including Prometheus, Elasticsearch, and InfluxDB, making it versatile for visualizing metrics and logs. + +- **Fluent Bit**: It is a versatile log collector that can aggregate, process, and forward logs to various destinations. It supports multiple input and output plugins, making integrating different logging backends and systems effortless. + +- **Tempo**: Tempo is an open-source tracing system for monitoring and troubleshooting microservices-based architectures. 
It helps visualize request flows, measure latencies, and analyze the performance of distributed systems. Tempo integrates well with Kubernetes and other cloud-native tools, providing comprehensive tracing capabilities. + +- **Alertmanager**: This is part of the Prometheus ecosystem and is responsible for handling Prometheus-generated alerts. It manages the routing, grouping, and silencing of alerts, ensuring that the right people are notified at the right time. At the same time, it supports various notification methods, including email, Slack, and PagerDuty. + +- **Pyroscope**: This tool provides continuous profiling capabilities for cloud-native applications. It collects and visualizes profiling data, helping teams identify performance issues and optimize resource usage. Continuous profiling ensures that applications run efficiently, reducing costs and improving performance. + +Learn how to start with observability on Giant Swarm by visiting our [getting started observability page]({{< relref "getting-started/observe-your-clusters-and-apps/" >}}). From 46a07c74e498185d2bd81d5dba77eec3751109f1 Mon Sep 17 00:00:00 2001 From: Dominik Kress Date: Thu, 6 Jun 2024 10:29:59 +0200 Subject: [PATCH 4/8] refactor observability overview to align with observability platform vision --- src/content/overview/observability/_index.md | 28 +++++++++----------- 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/src/content/overview/observability/_index.md b/src/content/overview/observability/_index.md index 1f315e12f4..d918f2f95d 100644 --- a/src/content/overview/observability/_index.md +++ b/src/content/overview/observability/_index.md @@ -1,6 +1,6 @@ --- title: Observability -description: Monitoring, Logging and Tracing to provide you with visibility into the Giant Swarm platform, your cluster fleet and application workloads. +description: The Observability Platform provides you with visibility into the Giant Swarm platform, your cluster fleet and application workloads. 
weight: 70 menu: principal: @@ -11,34 +11,30 @@ owner: - https://github.com/orgs/giantswarm/teams/sig-product --- -Observability is a fundamental aspect of modern cloud-native environments, providing the insights needed to understand and improve the performance, reliability, and overall health of applications and infrastructure. At Giant Swarm, we prioritize observability to ensure our customers can maintain visibility into their systems, quickly identify and resolve issues, and continuously optimize their operations. +Observability is a fundamental aspect of modern cloud-native environments, providing the insights needed to understand and improve the performance, reliability, and overall health of applications and infrastructure. At Giant Swarm, we prioritize observability to ensure our customers can maintain visibility into their systems, quickly identify and resolve issues, and continuously optimize their operations. With our Observability Platform we aim to empower you to fulfill all of your observability needs from Data Exploration, over Visualisation to Alerting in a self-service fashion, while providing you useful and battle-proven out-of-the-box defaults. ## Capabilities -- **Monitoring**: It involves the continuous collection and analysis of metrics to assess the performance and health of applications and infrastructure. Effective monitoring across all environments allows teams to detect anomalies, understand usage patterns, and make data-driven decisions to optimize their systems. +- **Monitoring**: The heart of our Observability Platform is the continuous collection and analysis of metrics to assess the performance and health of applications and infrastructure. Effective monitoring across all environments allows teams to detect anomalies, understand usage patterns, and make data-driven decisions to optimize their systems. 
-- **Logging**: It captures detailed records of system and application events, providing crucial information for troubleshooting and auditing. Centralized logging, a powerful tool that empowers teams, enables them to search, analyze, and visualize log data, helping them identify issues, track changes, and ensure compliance with security and regulatory requirements, thereby giving them a greater sense of control over their system's health. +- **Logging**: Our Observability Platform also captures detailed records of system and application events, providing crucial information for troubleshooting and auditing. Centralized logging, a powerful tool that empowers teams, enables them to search and analyze log data, helping them identify issues, track changes, and ensure compliance with security and regulatory requirements, thereby giving them a greater sense of control over their system's health. -- **Tracing**: lets you track requests' journeys through various application services and components. It also provides a detailed view of the interactions and dependencies between services, helping to pinpoint performance bottlenecks and understand request end-to-end latency. +- **Visualisation**: At Giant Swarm we believe that the real power of observability comes from not only exploring isolated data but also "connecting the dots" - we aim to provide you not just with data, but knowledge. For this our Observability Platform offers a wide range of visualisations and dashboards for metrics and logs and enables you to create and add your own dashboards to empower your teams to learn the insights they need to know to run their systems and apps efficiently in just a glance. -- **Alerting**: It is crucial to be able to notify your teams about significant events or issues that require immediate attention. 
By making it easy to set up and configure across apps and environments, alerting helps platform teams avoid wasting time on repetitive tasks, thereby enhancing their efficiency and allowing them to focus on what matters. +- **Alerting**: To not just look at dashboards all day it is crucial to be able to get notified about significant events or issues that require immediate attention. By making it easy to set up and configure alerting rules across apps and environments, our Observability Platforms alerting helps your teams avoid wasting time on repetitive tasks, thereby enhancing their efficiency and allowing them to focus on what matters. -- **Profiling**: Profiling analyzes applications' resource usage and performance characteristics, identifying inefficient code paths and resource bottlenecks. A standardized approach to profiling applications helps teams optimize performance, reduce costs, and deliver a better user experience. - -One of the key benefits using Giant Swarm is that we provide a set of integrated observability tools that help you have a comprehensive view of your applications and infrastructure. +One of the key benefits using Giant Swarm is that we provide a set of battle-tested and highly integrated observability tools that our own teams already use on a daily basis and will help you have a comprehensive view of your applications and infrastructure. ## Cloud-Native technologies -**Prometheus** is a leading open-source monitoring and alerting toolkit for reliability and scalability. It collects and stores metrics as time series data, enabling powerful querying and alerting capabilities. Prometheus integrates seamlessly with Kubernetes, making it a popular choice for cloud-native environments. - -- **Grafana**: Grafana is a popular open-source visualization tool for creating dashboards and graphs for monitoring and observability. 
It supports various data sources, including Prometheus, Elasticsearch, and InfluxDB, making it versatile for visualizing metrics and logs. +- **Mimir** is an open source, horizontally scalable, highly available, multi-tenant time series database for long-term storage for metrics and serves as central core component for storing and analyzing metrics on all our managed clusters. Think of it as Prometheus on Steroids. Like Prometheus it integrates seamlessly with Kubernetes and collects, but also stores metrics for a longer period of time as time series data, enabling powerful querying and alerting capabilities. -- **Fluent Bit**: It is a versatile log collector that can aggregate, process, and forward logs to various destinations. It supports multiple input and output plugins, making integrating different logging backends and systems effortless. +- **Loki** is an open source, horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very efficient and cost effective and integrates all sorts of logs in any format from any source, covering everything from app and system events to audit logs. With LogQL Loki also offers an easy Query Language simplifying querying logs. Additionally the language facilitates the generation of metrics from log data, a powerful feature that goes well beyond log aggregation. -- **Tempo**: Tempo is an open-source tracing system for monitoring and troubleshooting microservices-based architectures. It helps visualize request flows, measure latencies, and analyze the performance of distributed systems. Tempo integrates well with Kubernetes and other cloud-native tools, providing comprehensive tracing capabilities. +- **Grafana** is a popular open-source visualization tool for exploring metrics and logs and creating dashboards and graphs for all observability needs. 
It perfectly integrates Mimir and Loki but also supports various other data sources, including Prometheus, Elasticsearch, InfluxDB, and others, making it versatile for visualizing all observability data in just one single place. -- **Alertmanager**: This is part of the Prometheus ecosystem and is responsible for handling Prometheus-generated alerts. It manages the routing, grouping, and silencing of alerts, ensuring that the right people are notified at the right time. At the same time, it supports various notification methods, including email, Slack, and PagerDuty. +- **Alertmanager** is part of the Grafana ecosystem and is responsible for handling all alerts. It manages the routing, grouping, and silencing of alerts, ensuring that the right people are notified at the right time. At the same time, it supports various notification methods, including email, Slack, and PagerDuty. -- **Pyroscope**: This tool provides continuous profiling capabilities for cloud-native applications. It collects and visualizes profiling data, helping teams identify performance issues and optimize resource usage. Continuous profiling ensures that applications run efficiently, reducing costs and improving performance. +- **Various Agents** like Grafana-Agent, Fluent-bit and others help the Observability Platform collect and integrate relevant data from Workload Clusters, making the process to add new data as simple as just setting up a new service monitor or add the right labels. Learn how to start with observability on Giant Swarm by visiting our [getting started observability page]({{< relref "getting-started/observe-your-clusters-and-apps/" >}}). 
From ab8ecc882fa127696a615c96654312a3ffe796da Mon Sep 17 00:00:00 2001 From: Dominik Kress Date: Thu, 6 Jun 2024 10:31:46 +0200 Subject: [PATCH 5/8] update review date --- src/content/overview/observability/_index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/content/overview/observability/_index.md b/src/content/overview/observability/_index.md index d918f2f95d..473b3e3ff8 100644 --- a/src/content/overview/observability/_index.md +++ b/src/content/overview/observability/_index.md @@ -6,7 +6,7 @@ menu: principal: parent: overview identifier: overview-observability -last_review_date: 2024-06-03 +last_review_date: 2024-06-06 owner: - https://github.com/orgs/giantswarm/teams/sig-product --- From 1ab9083d2161d8240e6105b5547f0451f1582ce0 Mon Sep 17 00:00:00 2001 From: Dominik Kress Date: Thu, 6 Jun 2024 10:36:16 +0200 Subject: [PATCH 6/8] fix linting errors --- src/content/overview/observability/_index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/src/content/overview/observability/_index.md b/src/content/overview/observability/_index.md index 473b3e3ff8..98a9014de0 100644 --- a/src/content/overview/observability/_index.md +++ b/src/content/overview/observability/_index.md @@ -19,11 +19,11 @@ Observability is a fundamental aspect of modern cloud-native environments, provi - **Logging**: Our Observability Platform also captures detailed records of system and application events, providing crucial information for troubleshooting and auditing. Centralized logging, a powerful tool that empowers teams, enables them to search and analyze log data, helping them identify issues, track changes, and ensure compliance with security and regulatory requirements, thereby giving them a greater sense of control over their system's health. 
-- **Visualisation**: At Giant Swarm we believe that the real power of observability comes from not only exploring isolated data but also "connecting the dots" - we aim to provide you not just with data, but knowledge. For this our Observability Platform offers a wide range of visualisations and dashboards for metrics and logs and enables you to create and add your own dashboards to empower your teams to learn the insights they need to know to run their systems and apps efficiently in just a glance. +- **Visualisation**: At Giant Swarm we believe that the real power of observability comes from not only exploring isolated data but also "connecting the dots" - we aim to provide you not just with data, but knowledge. For this our Observability Platform offers a wide range of visualisations and dashboards for metrics and logs and enables you to create and add your own dashboards to empower your teams to learn the insights they need to know to run their systems and apps efficiently in just a glance. - **Alerting**: To not just look at dashboards all day it is crucial to be able to get notified about significant events or issues that require immediate attention. By making it easy to set up and configure alerting rules across apps and environments, our Observability Platforms alerting helps your teams avoid wasting time on repetitive tasks, thereby enhancing their efficiency and allowing them to focus on what matters. -One of the key benefits using Giant Swarm is that we provide a set of battle-tested and highly integrated observability tools that our own teams already use on a daily basis and will help you have a comprehensive view of your applications and infrastructure. +One of the key benefits using Giant Swarm is that we provide a set of battle-tested and highly integrated observability tools that our own teams already use on a daily basis and will help you have a comprehensive view of your applications and infrastructure. 
## Cloud-Native technologies @@ -31,7 +31,7 @@ One of the key benefits using Giant Swarm is that we provide a set of battle-tes - **Loki** is an open source, horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very efficient and cost effective and integrates all sorts of logs in any format from any source, covering everything from app and system events to audit logs. With LogQL Loki also offers an easy Query Language simplifying querying logs. Additionally the language facilitates the generation of metrics from log data, a powerful feature that goes well beyond log aggregation. -- **Grafana** is a popular open-source visualization tool for exploring metrics and logs and creating dashboards and graphs for all observability needs. It perfectly integrates Mimir and Loki but also supports various other data sources, including Prometheus, Elasticsearch, InfluxDB, and others, making it versatile for visualizing all observability data in just one single place. +- **Grafana** is a popular open-source visualization tool for exploring metrics and logs and creating dashboards and graphs for all observability needs. It perfectly integrates Mimir and Loki but also supports various other data sources, including Prometheus, Elasticsearch, InfluxDB, and others, making it versatile for visualizing all observability data in just one single place. - **Alertmanager** is part of the Grafana ecosystem and is responsible for handling all alerts. It manages the routing, grouping, and silencing of alerts, ensuring that the right people are notified at the right time. At the same time, it supports various notification methods, including email, Slack, and PagerDuty. 
From 4cc3936740adb7d8b8691ffd40ea6783a291797a Mon Sep 17 00:00:00 2001 From: Fernando Ripoll Date: Thu, 6 Jun 2024 15:13:29 +0200 Subject: [PATCH 7/8] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Hervé Nicol Co-authored-by: Quentin Bisson --- src/content/overview/observability/_index.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/content/overview/observability/_index.md b/src/content/overview/observability/_index.md index 98a9014de0..f93b9dbe31 100644 --- a/src/content/overview/observability/_index.md +++ b/src/content/overview/observability/_index.md @@ -17,24 +17,24 @@ Observability is a fundamental aspect of modern cloud-native environments, provi - **Monitoring**: The heart of our Observability Platform is the continuous collection and analysis of metrics to assess the performance and health of applications and infrastructure. Effective monitoring across all environments allows teams to detect anomalies, understand usage patterns, and make data-driven decisions to optimize their systems. -- **Logging**: Our Observability Platform also captures detailed records of system and application events, providing crucial information for troubleshooting and auditing. Centralized logging, a powerful tool that empowers teams, enables them to search and analyze log data, helping them identify issues, track changes, and ensure compliance with security and regulatory requirements, thereby giving them a greater sense of control over their system's health. +- **Logging**: Our Observability Platform also captures detailed records of system and application events, providing crucial information for troubleshooting and auditing. 
Centralized logging, a powerful tool that empowers teams, enables them to search and analyze log data, helping them identify issues, track changes, and ensure compliance with security and regulatory requirements, thereby giving them greater control over their system's health. - **Visualisation**: At Giant Swarm we believe that the real power of observability comes from not only exploring isolated data but also "connecting the dots" - we aim to provide you not just with data, but knowledge. For this our Observability Platform offers a wide range of visualisations and dashboards for metrics and logs and enables you to create and add your own dashboards to empower your teams to learn the insights they need to know to run their systems and apps efficiently in just a glance. -- **Alerting**: To not just look at dashboards all day it is crucial to be able to get notified about significant events or issues that require immediate attention. By making it easy to set up and configure alerting rules across apps and environments, our Observability Platforms alerting helps your teams avoid wasting time on repetitive tasks, thereby enhancing their efficiency and allowing them to focus on what matters. +- **Alerting**: To not just look at dashboards all day it is crucial to be able to get notified about significant events or issues that require immediate attention. By making it easy to set up and configure alerting rules across apps and environments, our Observability Platform's alerting helps your teams avoid wasting time on repetitive tasks, thereby enhancing their efficiency and allowing them to focus on what matters. One of the key benefits using Giant Swarm is that we provide a set of battle-tested and highly integrated observability tools that our own teams already use on a daily basis and will help you have a comprehensive view of your applications and infrastructure. 
## Cloud-Native technologies -- **Mimir** is an open source, horizontally scalable, highly available, multi-tenant time series database for long-term storage for metrics and serves as central core component for storing and analyzing metrics on all our managed clusters. Think of it as Prometheus on Steroids. Like Prometheus it integrates seamlessly with Kubernetes and collects, but also stores metrics for a longer period of time as time series data, enabling powerful querying and alerting capabilities. +- **Mimir** is an open source, horizontally scalable, highly available, multi-tenant time series database for long-term storage for metrics and serves as central core component for storing and analyzing metrics on all our managed clusters. Think of it as Prometheus on Steroids. Like Prometheus, it integrates seamlessly with Kubernetes and collects, but also stores metrics for a longer period of time as time series data, enabling powerful querying and alerting capabilities. -- **Loki** is an open source, horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very efficient and cost effective and integrates all sorts of logs in any format from any source, covering everything from app and system events to audit logs. With LogQL Loki also offers an easy Query Language simplifying querying logs. Additionally the language facilitates the generation of metrics from log data, a powerful feature that goes well beyond log aggregation. +- **Loki** is an open source, horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It is designed to be very efficient and cost effective and integrates all sorts of logs in any format from any source, covering everything from app and system events to audit logs. With LogQL, Loki also offers a simple Query Language to query logs. 
Additionally the language facilitates the generation of metrics from log data, a powerful feature that goes well beyond log aggregation. - **Grafana** is a popular open-source visualization tool for exploring metrics and logs and creating dashboards and graphs for all observability needs. It perfectly integrates Mimir and Loki but also supports various other data sources, including Prometheus, Elasticsearch, InfluxDB, and others, making it versatile for visualizing all observability data in just one single place. - **Alertmanager** is part of the Grafana ecosystem and is responsible for handling all alerts. It manages the routing, grouping, and silencing of alerts, ensuring that the right people are notified at the right time. At the same time, it supports various notification methods, including email, Slack, and PagerDuty. -- **Various Agents** like Grafana-Agent, Fluent-bit and others help the Observability Platform collect and integrate relevant data from Workload Clusters, making the process to add new data as simple as just setting up a new service monitor or add the right labels. +- **Various Agents** like Grafana-Agent (now renamed Alloy), Fluent-bit and others help the Observability Platform collect and integrate relevant data from Workload Clusters, making the process to add new data as simple as just setting up a new service monitor or add the right labels. Learn how to start with observability on Giant Swarm by visiting our [getting started observability page]({{< relref "getting-started/observe-your-clusters-and-apps/" >}}). 
From 45ea4c98dffd4ecc424899cf5f09d6aabb976c51 Mon Sep 17 00:00:00 2001 From: pipo02mix Date: Thu, 6 Jun 2024 15:37:26 +0200 Subject: [PATCH 8/8] Address comments --- .../config/vocabularies/docs/accept.txt | 1 + src/content/overview/observability/_index.md | 20 +++++++++---------- 2 files changed, 11 insertions(+), 10 deletions(-) diff --git a/.vale/styles/config/vocabularies/docs/accept.txt b/.vale/styles/config/vocabularies/docs/accept.txt index 12b5b01215..b1068ef259 100644 --- a/.vale/styles/config/vocabularies/docs/accept.txt +++ b/.vale/styles/config/vocabularies/docs/accept.txt @@ -1,4 +1,5 @@ # Please keep this list sorted (alphabetically, case-insensitive) +Alertmanager API[s] auditable CIDRs diff --git a/src/content/overview/observability/_index.md b/src/content/overview/observability/_index.md index f93b9dbe31..27a6f1af63 100644 --- a/src/content/overview/observability/_index.md +++ b/src/content/overview/observability/_index.md @@ -1,6 +1,6 @@ --- title: Observability -description: The Observability Platform provides you with visibility into the Giant Swarm platform, your cluster fleet and application workloads. +description: The observability platform provides you with visibility into the Giant Swarm platform, your cluster fleet and application workloads. weight: 70 menu: principal: @@ -11,30 +11,30 @@ owner: - https://github.com/orgs/giantswarm/teams/sig-product --- -Observability is a fundamental aspect of modern cloud-native environments, providing the insights needed to understand and improve the performance, reliability, and overall health of applications and infrastructure. At Giant Swarm, we prioritize observability to ensure our customers can maintain visibility into their systems, quickly identify and resolve issues, and continuously optimize their operations. 
With our Observability Platform we aim to empower you to fulfill all of your observability needs from Data Exploration, over Visualisation to Alerting in a self-service fashion, while providing you useful and battle-proven out-of-the-box defaults. +Observability is a fundamental aspect of modern cloud-native environments, providing the insights needed to understand and improve the performance, reliability, and overall health of applications and infrastructure. At Giant Swarm, we prioritize observability to ensure our customers can maintain visibility into their systems, quickly identify and resolve issues, and continuously optimize their operations. With our observability platform, we aim to empower you to fulfill all of your observability needs, from data exploration through visualization to alerting, in a self-service fashion, while providing you with useful, battle-proven out-of-the-box defaults. ## Capabilities -- **Monitoring**: The heart of our Observability Platform is the continuous collection and analysis of metrics to assess the performance and health of applications and infrastructure. Effective monitoring across all environments allows teams to detect anomalies, understand usage patterns, and make data-driven decisions to optimize their systems. +- **Monitoring**: The heart of our observability platform is the continuous collection and analysis of metrics to assess the performance and health of applications and infrastructure. Effective monitoring across all environments allows teams to detect anomalies, understand usage patterns, and make data-driven decisions to optimize their systems. -- **Logging**: Our Observability Platform also captures detailed records of system and application events, providing crucial information for troubleshooting and auditing.
Centralized logging, a powerful tool that empowers teams, enables them to search and analyze log data, helping them identify issues, track changes, and ensure compliance with security and regulatory requirements, thereby giving them greater control over their system's health. +- **Logging**: Our observability platform also captures detailed records of system and application events, providing crucial information for troubleshooting and auditing. Centralized logging lets teams search and analyze log data, helping them identify issues, track changes, and ensure compliance with security and regulatory requirements, which gives them greater control over their systems' health. -- **Visualisation**: At Giant Swarm we believe that the real power of observability comes from not only exploring isolated data but also "connecting the dots" - we aim to provide you not just with data, but knowledge. For this our Observability Platform offers a wide range of visualisations and dashboards for metrics and logs and enables you to create and add your own dashboards to empower your teams to learn the insights they need to know to run their systems and apps efficiently in just a glance. +- **Visualization**: At Giant Swarm we believe that the real power of observability comes from not only exploring isolated data but also "connecting the dots" - we aim to provide you not just with data, but knowledge. Our staff operates the observability platform using a wide range of battle-tested dashboards, visualizations and alerts to ensure continuous availability of the whole platform. Our customers can use those assets to monitor the system, and they can also configure new metrics, design new dashboards or set workload alerts. -- **Alerting**: To not just look at dashboards all day it is crucial to be able to get notified about significant events or issues that require immediate attention.
By making it easy to set up and configure alerting rules across apps and environments, our Observability Platform's alerting helps your teams avoid wasting time on repetitive tasks, thereby enhancing their efficiency and allowing them to focus on what matters. +- **Alerting**: So that you don't have to look at dashboards all day, it's crucial to get notified about significant events or issues that require immediate attention. By making it easy to set up and configure alerting rules across apps and environments, our observability platform's alerting helps your teams avoid wasting time on repetitive tasks, thereby enhancing their efficiency and allowing them to focus on what matters. -One of the key benefits using Giant Swarm is that we provide a set of battle-tested and highly integrated observability tools that our own teams already use on a daily basis and will help you have a comprehensive view of your applications and infrastructure. +One of the key benefits of using Giant Swarm is that we provide a set of reliable and highly integrated observability tools that our own teams already use on a daily basis and that will help you get a comprehensive view of your applications and infrastructure. -## Cloud-Native technologies +## Cloud-native technologies - **Mimir** is an open source, horizontally scalable, highly available, multi-tenant time series database for long-term storage for metrics and serves as central core component for storing and analyzing metrics on all our managed clusters. Think of it as Prometheus on Steroids. Like Prometheus, it integrates seamlessly with Kubernetes and collects, but also stores metrics for a longer period of time as time series data, enabling powerful querying and alerting capabilities. -- **Loki** is an open source, horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus.
It is designed to be very efficient and cost effective and integrates all sorts of logs in any format from any source, covering everything from app and system events to audit logs. With LogQL, Loki also offers a simple Query Language to query logs. Additionally the language facilitates the generation of metrics from log data, a powerful feature that goes well beyond log aggregation. +- **Loki** is an open source, horizontally scalable, highly available, multi-tenant log aggregation system inspired by Prometheus. It's designed to be very efficient and cost-effective and integrates all sorts of logs in any format from any source, covering everything from app and system events to audit logs. With LogQL, Loki also offers a simple query language for logs. Additionally, the language facilitates the generation of metrics from log data, a powerful feature that goes well beyond log aggregation. - **Grafana** is a popular open-source visualization tool for exploring metrics and logs and creating dashboards and graphs for all observability needs. It perfectly integrates Mimir and Loki but also supports various other data sources, including Prometheus, Elasticsearch, InfluxDB, and others, making it versatile for visualizing all observability data in just one single place. - **Alertmanager** is part of the Grafana ecosystem and is responsible for handling all alerts. It manages the routing, grouping, and silencing of alerts, ensuring that the right people are notified at the right time. At the same time, it supports various notification methods, including email, Slack, and PagerDuty. -- **Various Agents** like Grafana-Agent (now renamed Alloy), Fluent-bit and others help the Observability Platform collect and integrate relevant data from Workload Clusters, making the process to add new data as simple as just setting up a new service monitor or add the right labels.
+- **Various agents** like Grafana-Agent (now renamed Alloy), Fluent-bit and others help the observability platform collect and integrate relevant data from Workload Clusters, making the process of adding new data as simple as setting up a new service monitor or adding the right labels. Learn how to start with observability on Giant Swarm by visiting our [getting started observability page]({{< relref "getting-started/observe-your-clusters-and-apps/" >}}).