docs: pod overrides and other improvements #630

Merged
merged 9 commits on Sep 27, 2024
20 changes: 10 additions & 10 deletions docs/modules/druid/pages/getting_started/first_steps.adoc
@@ -1,8 +1,8 @@
= First steps
:description: Set up a Druid cluster using the Stackable Operator by installing ZooKeeper, HDFS, and Druid. Ingest and query example data via the web UI or API.

After going through the xref:getting_started/installation.adoc[] section and having installed all the Operators, you will now deploy a Druid cluster and its dependencies.
Afterwards you can <<_verify_that_it_works, verify that it works>> by ingesting example data and subsequently query it.
With the operators installed, deploy a Druid cluster and its dependencies.
Afterward you can <<_verify_that_it_works, verify that it works>> by ingesting example data and subsequently querying it.

== Setup

@@ -12,8 +12,8 @@ Three things need to be installed to have a Druid cluster:
* An HDFS instance to be used as a backend for deep storage
* The Druid cluster itself

We will create them in this order, each one is created by applying a manifest file.
The Operators you just installed will then create the resources according to the manifest.
Create them in this order; each one is created by applying a manifest file.
The operators you just installed then create the resources according to the manifests.

=== ZooKeeper

@@ -25,7 +25,7 @@ Create a file named `zookeeper.yaml` with the following content:
include::example$getting_started/zookeeper.yaml[]
----
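The included `zookeeper.yaml` above is the authoritative manifest.
As a rough orientation only, a minimal ZookeeperCluster defined by such a file could look like the following sketch; all values are assumptions, not the contents of the included file:

[source,yaml]
----
# Sketch only: illustrates the shape of a ZookeeperCluster, not the actual included manifest.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  image:
    productVersion: 3.8.4  # assumed version, use one supported by your operator release
  servers:
    roleGroups:
      default:
        replicas: 1
----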

Then create the resources by applying the manifest file
Then create the resources by applying the manifest file:

[source,bash]
----
@@ -62,15 +62,15 @@ And apply it:
include::example$getting_started/getting_started.sh[tag=install-druid]
----

This will create the actual druid instance.
This creates the actual Druid Stacklet.

WARNING: This Druid instance uses Derby (`dbType: derby`) as a metadata store, which is an internal SQL database.
It is not persisted and not suitable for production use!
Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.html#available-metadata-stores[Druid documentation] for a list of supported databases and setup instructions for production instances.
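For production you would instead point Druid at an external database.
A minimal sketch of such a configuration in the DruidCluster resource, assuming a PostgreSQL instance reachable inside the cluster; field names are assumed from the DruidCluster CRD and the values are illustrative:

[source,yaml]
----
# Sketch only: replace Derby with PostgreSQL as the metadata store.
spec:
  clusterConfig:
    metadataStorageDatabase:
      dbType: postgresql
      connString: jdbc:postgresql://postgresql.default.svc.cluster.local:5432/druid
      host: postgresql.default.svc.cluster.local
      port: 5432
      credentialsSecret: druid-metadata-credentials  # assumed Secret holding username and password
----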

== Verify that it works

Next you will submit an ingestion job and then query the ingested data - either through the web interface or the API.
Submit an ingestion job and then query the ingested data -- either through the web interface or the API.

First, make sure that all the Pods in the StatefulSets are ready:

@@ -102,7 +102,7 @@ include::example$getting_started/getting_started.sh[tag=port-forwarding]

=== Ingest example data

Next, we will ingest some example data using the web interface.
Next, ingest some example data using the web interface.
If you prefer to use the command line instead, follow the instructions in the collapsed section below.


@@ -137,7 +137,7 @@ image::getting_started/load_example.png[]
Click through all pages of the load process.
You can also follow the https://druid.apache.org/docs/latest/tutorials/index.html#step-4-load-data[Druid Quickstart Guide].

Once you finished the ingestion dialog you should see the ingestion overview with the job, which will eventually show SUCCESS:
Once you have finished the ingestion dialog, you should see the ingestion overview with the job, which eventually shows SUCCESS:

image::getting_started/load_success.png[]

@@ -173,4 +173,4 @@ Great! You've set up your first Druid cluster, ingested some data and queried it

== What's next

Have a look at the xref:usage-guide/index.adoc[] page to find out more about the features of the Operator, such as S3-backed deep storage (as opposed to the HDFS backend used in this guide) or OPA-based authorization.
Have a look at the xref:usage-guide/index.adoc[] page to find out more about the features of the operator, such as S3-backed deep storage (as opposed to the HDFS backend used in this guide) or OPA-based authorization.
7 changes: 4 additions & 3 deletions docs/modules/druid/pages/getting_started/index.adoc
@@ -1,17 +1,18 @@
= Getting started
:description: Get started with Druid on Kubernetes using the Stackable Operator. Follow steps to install, configure, and query data.

This guide will get you started with Druid using the Stackable Operator. It will guide you through the installation of the Operator and its dependencies, setting up your first Druid instance and connecting to it, ingesting example data and querying that data.
This guide helps you get started with Druid using the Stackable Operator.
It covers installing the operator and its dependencies, setting up your first Druid instance, ingesting example data, and querying that data.

== Prerequisites

You will need:
You need:

* a Kubernetes cluster
* kubectl
* optional: Helm

Resource sizing depends on cluster type(s), usage and scope, but as a starting point we recommend a minimum of the following resources for this operator:
Resource sizing depends on cluster type(s), usage and scope, but as a starting point the following resources are recommended as a minimum for this operator:

* 0.2 cores (e.g. i5 or similar)
* 256MB RAM
38 changes: 20 additions & 18 deletions docs/modules/druid/pages/getting_started/installation.adoc
@@ -1,20 +1,17 @@
= Installation
:description: Install the Stackable Druid Operator and its dependencies on Kubernetes using stackablectl or Helm.

On this page you will install the Stackable Druid Operator and Operators for its dependencies - ZooKeeper and HDFS - as
well as the commons, secret and listener operator which are required by all Stackable Operators.
Install the Stackable Operator for Apache Druid and operators for its dependencies -- ZooKeeper and HDFS -- as well as the commons, secret and listener operators, which are required by all Stackable operators.

== Stackable Operators
There are multiple ways to install the operators: xref:management:stackablectl:index.adoc[] is the preferred way, but Helm is also supported.

OpenShift users may prefer installing the operator from the RedHat Certified Operator catalog using the OpenShift web console.

There are 2 ways to run Stackable Operators

1. Using xref:management:stackablectl:index.adoc[]

2. Using Helm

=== stackablectl

stackablectl is the command line tool to interact with Stackable operators and our recommended way to install Operators.
[tabs]
====
stackablectl::
+
--
stackablectl is the command line tool to interact with Stackable operators and the recommended way to install operators.
Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.

After you have installed stackablectl, run the following command to install all operators necessary for Druid:
@@ -24,29 +21,34 @@
include::example$getting_started/getting_started.sh[tag=stackablectl-install-operators]
----

The tool will show
The tool prints

[source]
include::example$getting_started/install_output.txt[]

TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`.
--

=== Helm
Helm::
+
--
You can also use Helm to install the operators.

You can also use Helm to install the Operators. Add the Stackable Helm repository:
.Add the Stackable Helm repository
[source,bash]
----
include::example$getting_started/getting_started.sh[tag=helm-add-repo]
----

Then install the Stackable Operators:
.Install the Stackable operators
[source,bash]
----
include::example$getting_started/getting_started.sh[tag=helm-install-operators]
----

Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Apache Druid service (as well as
the CRDs for the required operators). You are now ready to deploy Apache Druid in Kubernetes.
Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the Apache Druid service (as well as the CRDs for the required operators).
--
====

== What's next

4 changes: 2 additions & 2 deletions docs/modules/druid/pages/index.adoc
@@ -15,7 +15,7 @@
* {feature-tracker}[Feature Tracker {external-link-icon}^]
* {crd}[CRD documentation {external-link-icon}^]

The Stackable operator for Apache Druid is an operator that can deploy and manage {druid}[Apache Druid] clusters on Kubernetes.
The Stackable operator for Apache Druid deploys and manages {druid}[Apache Druid] clusters on Kubernetes.
Apache Druid is an open-source, distributed data store designed to quickly process large amounts of data in real-time.
It enables users to ingest, store, and query massive amounts of data in real time, making it a great tool for handling high-volume data processing and analysis.
This operator provides several resources and features to manage Druid clusters efficiently.
@@ -89,7 +89,7 @@ The xref:demos:nifi-kafka-druid-earthquake-data.adoc[] demo ingests {earthquake}
== Supported versions

The Stackable operator for Apache Druid currently supports the Druid versions listed below.
To use a specific Druid version in your DruidCluster, you have to specify an image - this is explained in the xref:concepts:product-image-selection.adoc[] documentation.
To use a specific Druid version in your Druid Stacklet, you have to specify an image -- this is explained in the xref:concepts:product-image-selection.adoc[] documentation.
The operator also supports running images from a custom registry or running entirely customized images; both of these cases are explained under xref:concepts:product-image-selection.adoc[] as well.
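For example, pinning a product version is a small change in the Druid Stacklet definition.
A sketch, assuming the `image.productVersion` field described in the linked documentation:

[source,yaml]
----
# Sketch only: select a specific Druid version for the Stacklet.
spec:
  image:
    productVersion: 30.0.0  # illustrative; pick a version from the supported list below
----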

include::partial$supported-versions.adoc[]
@@ -23,7 +23,7 @@ stackable-druid-operator run --product-config /foo/bar/properties.yaml

*Multiple values:* false

The operator will **only** watch for resources in the provided namespace `test`:
The operator **only** watches for resources in the provided namespace `test`:

[source]
----
2 changes: 1 addition & 1 deletion docs/modules/druid/pages/reference/crds.adoc
@@ -1,3 +1,3 @@
= CRD Reference

Find all CRD reference for the Stackable Operator for Apache Druid at: {crd-docs-base-url}/druid-operator/{crd-docs-version}.
Find the full CRD reference for the Stackable operator for Apache Druid at {crd-docs-base-url}/druid-operator/{crd-docs-version}.
3 changes: 1 addition & 2 deletions docs/modules/druid/pages/reference/discovery.adoc
@@ -6,8 +6,7 @@
:namespace: stackable
:routerPort: 8888

The Stackable Operator for Druid publishes a xref:concepts:service_discovery.adoc[] with the following properties,
where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:
The Stackable operator for Druid publishes a xref:concepts:service_discovery.adoc[] with the following properties, where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:

`DRUID_AVATICA_JDBC`::
====
@@ -36,7 +36,7 @@ docker run \

*Multiple values:* false

The operator will **only** watch for resources in the provided namespace `test`:
The operator **only** watches for resources in the provided namespace `test`:

[source]
----
15 changes: 9 additions & 6 deletions docs/modules/druid/pages/required-external-components.adoc
@@ -1,14 +1,17 @@
# Required external components
= Required external components
:description: Druid requires an SQL database for metadata and supports various deep storage options like S3, HDFS, and cloud storage
:druid-available-metadata-stores: https://druid.apache.org/docs/latest/design/metadata-storage/#available-metadata-stores
:druid-deep-storage: https://druid.apache.org/docs/latest/design/deep-storage

Druid uses an SQL database to store metadata.
Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.html#available-metadata-stores[Druid documentation] for a list of supported databases and setup instructions.
Consult the {druid-available-metadata-stores}[Druid documentation] for a list of supported databases and setup instructions.

## Feature specific: S3 and cloud deep storage
== Feature specific: S3 and cloud deep storage

https://druid.apache.org/docs/latest/dependencies/deep-storage.html[Deep storage] is where segments are stored.
Druid offers multiple storage backends. For the local storage there are no prerequisites.
{druid-deep-storage}[Deep storage] is where segments are stored.
Druid offers multiple storage backends.
For the local storage there are no prerequisites.
HDFS deep storage can be set up with the xref:hdfs:index.adoc[Stackable Operator for Apache HDFS].
For S3 deep storage or the Google Cloud and Azure storage backends, you need to set up the storage.
For S3 deep storage or the Google Cloud and Azure storage backends, you need to set up the respective storage backend.

Read the xref:usage-guide/deep-storage.adoc[deep storage usage guide] to learn more about configuring Druid deep storage.
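As an illustration, HDFS deep storage is typically wired up by referencing the discovery ConfigMap of an HDFS Stacklet.
The following is a sketch with assumed field names and values; the linked guide has the authoritative configuration:

[source,yaml]
----
# Sketch only: point Druid deep storage at an HDFS Stacklet via its discovery ConfigMap.
spec:
  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs  # assumed name of the HDFS discovery ConfigMap
        directory: /druid           # HDFS path where segments are stored
----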
@@ -1,5 +1,6 @@
= Configuration & Environment Overrides
:description: Override Druid configuration properties and environment variables per role or role group. Customize runtime.properties, jvm.config, and security.properties as needed.
:druid-config-reference: https://druid.apache.org/docs/latest/configuration/index.html

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

@@ -9,9 +10,9 @@ IMPORTANT: Overriding certain properties which are set by the operator (such as

For a role or role group, at the same level as `config`, you can specify `configOverrides` for the following files:

- `runtime.properties`
- `jvm.config`
- `security.properties`
* `runtime.properties`
* `jvm.config`
* `security.properties`

For example, if you want to set the `druid.server.http.numThreads` for the router to 100, adapt the `routers` section of the cluster resource like so:

@@ -43,13 +44,16 @@ routers:

All override property values must be strings.
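Condensed, such an override could look roughly like the following sketch; the role group structure is an assumption, and the value is given as a string as required:

[source,yaml]
----
# Sketch only: set a runtime.properties override for the default router role group.
routers:
  roleGroups:
    default:
      configOverrides:
        runtime.properties:
          druid.server.http.numThreads: "100"
----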

For a full list of configuration options please refer to the Druid https://druid.apache.org/docs/latest/configuration/index.html[Configuration Reference].
For a full list of configuration options, please refer to the Druid {druid-config-reference}[Configuration Reference].

=== The security.properties file

The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
The `security.properties` file is used to configure JVM security properties.
It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.

The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 26.0.0 Apache Druid performs poorly if the positive cache is disabled. To cache resolved host names, and thus speeding up Druid queries you can configure the TTL of entries in the positive cache like this:
The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved.
Some products of the Stackable platform are very sensitive to the contents of these caches, and their performance is heavily affected by them.
As of version 26.0.0, Apache Druid performs poorly if the positive cache is disabled. To cache resolved host names, and thus speed up Druid queries, you can configure the TTL of entries in the positive cache like this:

[source,yaml]
----
@@ -84,9 +88,10 @@ NOTE: The operator configures DNS caching by default as shown in the example abo

For details on the JVM security see https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html

== Environment Variables
== Environment variables

In a similar fashion, environment variables can be (over)written. For example per role group:
In a similar fashion, environment variables can be (over)written.
For example per role group:

[source,yaml]
----
@@ -113,3 +118,8 @@ routers:
----

// cliOverrides don't make sense for this operator, so the feature is omitted for now

== Pod overrides

The Druid operator also supports Pod overrides, allowing you to override any property that you can set on a Kubernetes Pod.
Read the xref:concepts:overrides.adoc#pod-overrides[Pod overrides documentation] to learn more about this feature.
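As a quick illustration, a sketch of a Pod override on the routers role is shown below; the container name and the merged fields are assumptions, and the linked documentation describes the exact semantics:

[source,yaml]
----
# Sketch only: podOverrides take a (partial) Pod template that is merged into the generated Pods.
routers:
  podOverrides:
    metadata:
      labels:
        team: data-platform
    spec:
      containers:
        - name: druid  # assumed name of the main container
          resources:
            limits:
              memory: 2Gi
----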
3 changes: 2 additions & 1 deletion docs/modules/druid/pages/usage-guide/deep-storage.adoc
@@ -1,7 +1,8 @@
= Deep storage configuration
:description: Configure Apache Druid deep storage with HDFS or S3. Set up HDFS via a ConfigMap, or use S3 with inline or referenced bucket details.
:druid-deep-storage: https://druid.apache.org/docs/latest/design/deep-storage/

https://druid.apache.org/docs/latest/design/deep-storage/[Deep Storage] is where Druid stores data segments.
{druid-deep-storage}[Deep Storage] is where Druid stores data segments.
For a Kubernetes environment, either the HDFS or S3 backend is recommended.

== [[hdfs]]HDFS
28 changes: 15 additions & 13 deletions docs/modules/druid/pages/usage-guide/extensions.adoc
@@ -1,32 +1,32 @@
= Druid extensions
:druid-extensions: https://druid.apache.org/docs/latest/configuration/extensions/
:druid-core-extensions: https://druid.apache.org/docs/latest/configuration/extensions/#core-extensions
:druid-community-extensions: https://druid.apache.org/docs/latest/configuration/extensions/#loading-community-extensions
:description: Add functionality to Druid with default or custom extensions. Default extensions include Kafka and HDFS support; community extensions require extra setup.

{druid-extensions}[Druid extensions] are used to provide additional functionality at runtime, e.g. for data formats or different types of deep storage.

== [[default-extensions]]Default extensions

Some extensions are loaded by default:
Druid Stacklets use the following extensions by default:

- `druid-kafka-indexing-service`
- `druid-datasketches`
- `prometheus-emitter`
- `druid-basic-security`
- `druid-opa-authorizer`
- `druid-hdfs-storage`
* `druid-kafka-indexing-service`
* `druid-datasketches`
* `prometheus-emitter`
* `druid-basic-security`
* `druid-opa-authorizer`
* `druid-hdfs-storage`

Some extensions are loaded conditionally depending on the DruidCluster configuration:

- `postgresql-metadata-storage` (when using PostgreSQL as metadata store)
- `mysql-metadata-storage` (when using MySQL as metadata store)
- `druid-s3-extensions` (when S3 is used as deep storage)
- `simple-client-sslcontext` (when TLS is enabled)
* `postgresql-metadata-storage` (when using PostgreSQL as metadata store)
* `mysql-metadata-storage` (when using MySQL as metadata store)
* `druid-s3-extensions` (when S3 is used as deep storage)
* `simple-client-sslcontext` (when TLS is enabled)

== [[custom-extensions]]Custom extensions

Druid can be configured load any number of additional extensions.
Core extensions are already bundled with Druid but adding community extensions requires {druid-community-extensions}[some extra steps].
You can configure Druid to load more extensions by adding them to the `additionalExtensions` list as shown below:

[source,yaml]
----
@@ -38,4 +38,6 @@ spec:
...
----

{druid-core-extensions}[Core extensions] are already bundled with Druid, but adding community extensions requires {druid-community-extensions}[some extra steps].

Some extensions may require additional configuration which can be added using xref:usage-guide/configuration-and-environment-overrides.adoc[configuration and environment overrides].
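To tie this together, here is a sketch of loading one additional community extension; the extension name is illustrative, and its placement under `clusterConfig` is an assumption rather than the exact layout of the example above:

[source,yaml]
----
# Sketch only: add a community extension to the list of extensions Druid loads at startup.
# Extension-specific properties would then be supplied via configOverrides.
spec:
  clusterConfig:
    additionalExtensions:
      - druid-distinctcount  # illustrative community extension
----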