docs: pod overrides and other improvements #630

Merged
merged 9 commits on Sep 27, 2024
20 changes: 10 additions & 10 deletions docs/modules/druid/pages/getting_started/first_steps.adoc
@@ -1,8 +1,8 @@
= First steps
:description: Set up a Druid cluster using the Stackable Operator by installing ZooKeeper, HDFS, and Druid. Ingest and query example data via the web UI or API.

After going through the xref:getting_started/installation.adoc[] section and having installed all the Operators, you will now deploy a Druid cluster and its dependencies.
Afterwards you can <<_verify_that_it_works, verify that it works>> by ingesting example data and subsequently query it.
With the operators installed, deploy a Druid cluster and its dependencies.
Afterward you can <<_verify_that_it_works, verify that it works>> by ingesting example data and subsequently querying it.

== Setup

@@ -12,8 +12,8 @@ Three things need to be installed to have a Druid cluster:
* An HDFS instance to be used as a backend for deep storage
* The Druid cluster itself

We will create them in this order, each one is created by applying a manifest file.
The Operators you just installed will then create the resources according to the manifest.
Create them in this order; each one is created by applying a manifest file.
The operators you just installed then create the resources according to the manifests.

=== ZooKeeper

@@ -25,7 +25,7 @@ Create a file named `zookeeper.yaml` with the following content:
include::example$getting_started/zookeeper.yaml[]
----
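The included `zookeeper.yaml` above is the authoritative manifest.
As a rough orientation only, a minimal ZookeeperCluster defined by such a file could look like the following sketch; all values are assumptions, not the contents of the included file:

[source,yaml]
----
# Sketch only: illustrates the shape of a ZookeeperCluster, not the actual included manifest.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  image:
    productVersion: 3.8.4  # assumed version, use one supported by your operator release
  servers:
    roleGroups:
      default:
        replicas: 1
----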

Then create the resources by applying the manifest file
Then create the resources by applying the manifest file:

[source,bash]
----
@@ -62,15 +62,15 @@ And apply it:
include::example$getting_started/getting_started.sh[tag=install-druid]
----

This will create the actual druid instance.
This creates the actual Druid Stacklet.

WARNING: This Druid instance uses Derby (`dbType: derby`) as a metadata store, which is an internal SQL database.
It is not persisted and not suitable for production use!
Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.html#available-metadata-stores[Druid documentation] for a list of supported databases and setup instructions for production instances.
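For production you would instead point Druid at an external database.
A minimal sketch of such a configuration in the DruidCluster resource, assuming a PostgreSQL instance reachable inside the cluster; field names are assumed from the DruidCluster CRD and the values are illustrative:

[source,yaml]
----
# Sketch only: replace Derby with PostgreSQL as the metadata store.
spec:
  clusterConfig:
    metadataStorageDatabase:
      dbType: postgresql
      connString: jdbc:postgresql://postgresql.default.svc.cluster.local:5432/druid
      host: postgresql.default.svc.cluster.local
      port: 5432
      credentialsSecret: druid-metadata-credentials  # assumed Secret holding username and password
----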

== Verify that it works

Next you will submit an ingestion job and then query the ingested data - either through the web interface or the API.
Submit an ingestion job and then query the ingested data -- either through the web interface or the API.

First, make sure that all the Pods in the StatefulSets are ready:

@@ -102,7 +102,7 @@ include::example$getting_started/getting_started.sh[tag=port-forwarding]

=== Ingest example data

Next, we will ingest some example data using the web interface.
Next, ingest some example data using the web interface.
If you prefer to use the command line instead, follow the instructions in the collapsed section below.


@@ -137,7 +137,7 @@ image::getting_started/load_example.png[]
Click through all pages of the load process.
You can also follow the https://druid.apache.org/docs/latest/tutorials/index.html#step-4-load-data[Druid Quickstart Guide].

Once you finished the ingestion dialog you should see the ingestion overview with the job, which will eventually show SUCCESS:
Once you have finished the ingestion dialog, you should see the ingestion overview with the job, which eventually shows SUCCESS:

image::getting_started/load_success.png[]

@@ -173,4 +173,4 @@ Great! You've set up your first Druid cluster, ingested some data and queried it

== What's next

Have a look at the xref:usage-guide/index.adoc[] page to find out more about the features of the Operator, such as S3-backed deep storage (as opposed to the HDFS backend used in this guide) or OPA-based authorization.
Have a look at the xref:usage-guide/index.adoc[] page to find out more about the features of the operator, such as S3-backed deep storage (as opposed to the HDFS backend used in this guide) or OPA-based authorization.
7 changes: 4 additions & 3 deletions docs/modules/druid/pages/getting_started/index.adoc
@@ -1,17 +1,18 @@
= Getting started
:description: Get started with Druid on Kubernetes using the Stackable Operator. Follow steps to install, configure, and query data.

This guide will get you started with Druid using the Stackable Operator. It will guide you through the installation of the Operator and its dependencies, setting up your first Druid instance and connecting to it, ingesting example data and querying that data.
This guide helps you get started with Druid using the Stackable Operator.
It covers installing the operator and its dependencies, setting up your first Druid instance, ingesting example data, and querying that data.

== Prerequisites

You will need:
You need:

* a Kubernetes cluster
* kubectl
* optional: Helm

Resource sizing depends on cluster type(s), usage and scope, but as a starting point we recommend a minimum of the following resources for this operator:
Resource sizing depends on cluster type(s), usage and scope, but as a starting point the following resources are recommended as a minimum for this operator:

* 0.2 cores (e.g. i5 or similar)
* 256MB RAM
38 changes: 20 additions & 18 deletions docs/modules/druid/pages/getting_started/installation.adoc
@@ -1,20 +1,17 @@
= Installation
:description: Install the Stackable Druid Operator and its dependencies on Kubernetes using stackablectl or Helm.

On this page you will install the Stackable Druid Operator and Operators for its dependencies - ZooKeeper and HDFS - as
well as the commons, secret and listener operator which are required by all Stackable Operators.
Install the Stackable Operator for Apache Druid and operators for its dependencies -- ZooKeeper and HDFS -- as well as the commons, secret and listener operators, which are required by all Stackable operators.

== Stackable Operators
There are multiple ways to install the operators: xref:management:stackablectl:index.adoc[] is the preferred way, but Helm is also supported.

OpenShift users may prefer installing the operator from the RedHat Certified Operator catalog using the OpenShift web console.

There are 2 ways to run Stackable Operators

1. Using xref:management:stackablectl:index.adoc[]

2. Using Helm

=== stackablectl

stackablectl is the command line tool to interact with Stackable operators and our recommended way to install Operators.
[tabs]
====
stackablectl::
+
--
stackablectl is the command line tool to interact with Stackable operators and the recommended way to install operators.
Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.

After you have installed stackablectl, run the following command to install all operators necessary for Druid:
@@ -24,29 +21,34 @@
include::example$getting_started/getting_started.sh[tag=stackablectl-install-operators]
----

The tool will show
The tool prints

[source]
include::example$getting_started/install_output.txt[]

TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`.
--

=== Helm
Helm::
+
--
You can also use Helm to install the operators.

You can also use Helm to install the Operators. Add the Stackable Helm repository:
.Add the Stackable Helm repository
[source,bash]
----
include::example$getting_started/getting_started.sh[tag=helm-add-repo]
----

Then install the Stackable Operators:
.Install the Stackable operators
[source,bash]
----
include::example$getting_started/getting_started.sh[tag=helm-install-operators]
----

Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Apache Druid service (as well as
the CRDs for the required operators). You are now ready to deploy Apache Druid in Kubernetes.
Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the Apache Druid service (as well as the CRDs for the required operators).
--
====

== What's next

4 changes: 2 additions & 2 deletions docs/modules/druid/pages/index.adoc
@@ -15,7 +15,7 @@
* {feature-tracker}[Feature Tracker {external-link-icon}^]
* {crd}[CRD documentation {external-link-icon}^]

The Stackable operator for Apache Druid is an operator that can deploy and manage {druid}[Apache Druid] clusters on Kubernetes.
The Stackable operator for Apache Druid deploys and manages {druid}[Apache Druid] clusters on Kubernetes.
Apache Druid is an open-source, distributed data store designed to quickly process large amounts of data in real-time.
It enables users to ingest, store, and query massive amounts of data in real time, making it a great tool for handling high-volume data processing and analysis.
This operator provides several resources and features to manage Druid clusters efficiently.
@@ -89,7 +89,7 @@ The xref:demos:nifi-kafka-druid-earthquake-data.adoc[] demo ingests {earthquake}
== Supported versions

The Stackable operator for Apache Druid currently supports the Druid versions listed below.
To use a specific Druid version in your DruidCluster, you have to specify an image - this is explained in the xref:concepts:product-image-selection.adoc[] documentation.
To use a specific Druid version in your Druid Stacklet, you have to specify an image -- this is explained in the xref:concepts:product-image-selection.adoc[] documentation.
The operator also supports running images from a custom registry or running entirely customized images; both of these cases are explained under xref:concepts:product-image-selection.adoc[] as well.
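For example, pinning a product version is a small change in the Druid Stacklet definition.
A sketch, assuming the `image.productVersion` field described in the linked documentation:

[source,yaml]
----
# Sketch only: select a specific Druid version for the Stacklet.
spec:
  image:
    productVersion: 30.0.0  # illustrative; pick a version from the supported list below
----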

include::partial$supported-versions.adoc[]
@@ -23,7 +23,7 @@ stackable-druid-operator run --product-config /foo/bar/properties.yaml

*Multiple values:* false

The operator will **only** watch for resources in the provided namespace `test`:
The operator **only** watches for resources in the provided namespace `test`:

[source]
----
2 changes: 1 addition & 1 deletion docs/modules/druid/pages/reference/crds.adoc
@@ -1,3 +1,3 @@
= CRD Reference

Find all CRD reference for the Stackable Operator for Apache Druid at: {crd-docs-base-url}/druid-operator/{crd-docs-version}.
Find the full CRD reference for the Stackable operator for Apache Druid at {crd-docs-base-url}/druid-operator/{crd-docs-version}.
3 changes: 1 addition & 2 deletions docs/modules/druid/pages/reference/discovery.adoc
@@ -6,8 +6,7 @@
:namespace: stackable
:routerPort: 8888

The Stackable Operator for Druid publishes a xref:concepts:service_discovery.adoc[] with the following properties,
where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:
The Stackable operator for Druid publishes a xref:concepts:service_discovery.adoc[] with the following properties, where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:

`DRUID_AVATICA_JDBC`::
====
@@ -36,7 +36,7 @@ docker run \

*Multiple values:* false

The operator will **only** watch for resources in the provided namespace `test`:
The operator **only** watches for resources in the provided namespace `test`:

[source]
----
15 changes: 9 additions & 6 deletions docs/modules/druid/pages/required-external-components.adoc
@@ -1,14 +1,17 @@
# Required external components
= Required external components
:description: Druid requires an SQL database for metadata and supports various deep storage options like S3, HDFS, and cloud storage
:druid-available-metadata-stores: https://druid.apache.org/docs/latest/design/metadata-storage/#available-metadata-stores
:druid-deep-storage: https://druid.apache.org/docs/latest/design/deep-storage

Druid uses an SQL database to store metadata.
Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.html#available-metadata-stores[Druid documentation] for a list of supported databases and setup instructions.
Consult the {druid-available-metadata-stores}[Druid documentation] for a list of supported databases and setup instructions.

## Feature specific: S3 and cloud deep storage
== Feature specific: S3 and cloud deep storage

https://druid.apache.org/docs/latest/dependencies/deep-storage.html[Deep storage] is where segments are stored.
Druid offers multiple storage backends. For the local storage there are no prerequisites.
{druid-deep-storage}[Deep storage] is where segments are stored.
Druid offers multiple storage backends.
For the local storage there are no prerequisites.
HDFS deep storage can be set up with the xref:hdfs:index.adoc[Stackable Operator for Apache HDFS].
For S3 deep storage or the Google Cloud and Azure storage backends, you need to set up the storage.
For S3 deep storage or the Google Cloud and Azure storage backends, you need to set up the respective storage backend.

Read the xref:usage-guide/deep-storage.adoc[deep storage usage guide] to learn more about configuring Druid deep storage.
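As an illustration, HDFS deep storage is typically wired up by referencing the discovery ConfigMap of an HDFS Stacklet.
The following is a sketch with assumed field names and values; the linked guide has the authoritative configuration:

[source,yaml]
----
# Sketch only: point Druid deep storage at an HDFS Stacklet via its discovery ConfigMap.
spec:
  clusterConfig:
    deepStorage:
      hdfs:
        configMapName: simple-hdfs  # assumed name of the HDFS discovery ConfigMap
        directory: /druid           # HDFS path where segments are stored
----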
@@ -1,5 +1,6 @@
= Configuration & Environment Overrides
:description: Override Druid configuration properties and environment variables per role or role group. Customize runtime.properties, jvm.config, and security.properties as needed.
:druid-config-reference: https://druid.apache.org/docs/latest/configuration/index.html

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

@@ -9,9 +10,9 @@ IMPORTANT: Overriding certain properties which are set by the operator (such as

For a role or role group, at the same level as `config`, you can specify `configOverrides` for the following files:

- `runtime.properties`
- `jvm.config`
- `security.properties`
* `runtime.properties`
* `jvm.config`
* `security.properties`

For example, if you want to set the `druid.server.http.numThreads` for the router to 100, adapt the `routers` section of the cluster resource like so:

@@ -43,13 +44,16 @@ routers:

All override property values must be strings.
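Condensed, such an override could look roughly like the following sketch; the role group structure is an assumption, and the value is given as a string as required:

[source,yaml]
----
# Sketch only: set a runtime.properties override for the default router role group.
routers:
  roleGroups:
    default:
      configOverrides:
        runtime.properties:
          druid.server.http.numThreads: "100"
----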

For a full list of configuration options please refer to the Druid https://druid.apache.org/docs/latest/configuration/index.html[Configuration Reference].
For a full list of configuration options, please refer to the Druid {druid-config-reference}[Configuration Reference].

=== The security.properties file

The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
The `security.properties` file is used to configure JVM security properties.
It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.

The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 26.0.0 Apache Druid performs poorly if the positive cache is disabled. To cache resolved host names, and thus speeding up Druid queries you can configure the TTL of entries in the positive cache like this:
The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved.
Some products of the Stackable platform are very sensitive to the contents of these caches, and their performance is heavily affected by them.
As of version 26.0.0, Apache Druid performs poorly if the positive cache is disabled. To cache resolved host names, and thus speed up Druid queries, you can configure the TTL of entries in the positive cache like this:

[source,yaml]
----
@@ -84,9 +88,10 @@ NOTE: The operator configures DNS caching by default as shown in the example abo

For details on the JVM security see https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html

== Environment Variables
== Environment variables

In a similar fashion, environment variables can be (over)written. For example per role group:
In a similar fashion, environment variables can be (over)written.
For example per role group:

[source,yaml]
----
@@ -113,3 +118,8 @@ routers:
----

// cliOverrides don't make sense for this operator, so the feature is omitted for now

== Pod overrides

The Druid operator also supports Pod overrides, allowing you to override any property that you can set on a Kubernetes Pod.
Read the xref:concepts:overrides.adoc#pod-overrides[Pod overrides documentation] to learn more about this feature.
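As a quick illustration, a sketch of a Pod override on the routers role is shown below; the container name and the merged fields are assumptions, and the linked documentation describes the exact semantics:

[source,yaml]
----
# Sketch only: podOverrides take a (partial) Pod template that is merged into the generated Pods.
routers:
  podOverrides:
    metadata:
      labels:
        team: data-platform
    spec:
      containers:
        - name: druid  # assumed name of the main container
          resources:
            limits:
              memory: 2Gi
----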
3 changes: 2 additions & 1 deletion docs/modules/druid/pages/usage-guide/deep-storage.adoc
@@ -1,7 +1,8 @@
= Deep storage configuration
:description: Configure Apache Druid deep storage with HDFS or S3. Set up HDFS via a ConfigMap, or use S3 with inline or referenced bucket details.
:druid-deep-storage: https://druid.apache.org/docs/latest/design/deep-storage/

https://druid.apache.org/docs/latest/design/deep-storage/[Deep Storage] is where Druid stores data segments.
{druid-deep-storage}[Deep Storage] is where Druid stores data segments.
For a Kubernetes environment, either the HDFS or S3 backend is recommended.

== [[hdfs]]HDFS
28 changes: 15 additions & 13 deletions docs/modules/druid/pages/usage-guide/extensions.adoc
@@ -1,32 +1,32 @@
= Druid extensions
:druid-extensions: https://druid.apache.org/docs/latest/configuration/extensions/
:druid-core-extensions: https://druid.apache.org/docs/latest/configuration/extensions/#core-extensions
:druid-community-extensions: https://druid.apache.org/docs/latest/configuration/extensions/#loading-community-extensions
:description: Add functionality to Druid with default or custom extensions. Default extensions include Kafka and HDFS support; community extensions require extra setup.

{druid-extensions}[Druid extensions] are used to provide additional functionality at runtime, e.g. for data formats or different types of deep storage.

== [[default-extensions]]Default extensions

Some extensions are loaded by default:
Druid Stacklets use the following extensions by default:

- `druid-kafka-indexing-service`
- `druid-datasketches`
- `prometheus-emitter`
- `druid-basic-security`
- `druid-opa-authorizer`
- `druid-hdfs-storage`
* `druid-kafka-indexing-service`
* `druid-datasketches`
* `prometheus-emitter`
* `druid-basic-security`
* `druid-opa-authorizer`
* `druid-hdfs-storage`

Some extensions are loaded conditionally depending on the DruidCluster configuration:

- `postgresql-metadata-storage` (when using PostgreSQL as metadata store)
- `mysql-metadata-storage` (when using MySQL as metadata store)
- `druid-s3-extensions` (when S3 is used as deep storage)
- `simple-client-sslcontext` (when TLS is enabled)
* `postgresql-metadata-storage` (when using PostgreSQL as metadata store)
* `mysql-metadata-storage` (when using MySQL as metadata store)
* `druid-s3-extensions` (when S3 is used as deep storage)
* `simple-client-sslcontext` (when TLS is enabled)

== [[custom-extensions]]Custom extensions

Druid can be configured load any number of additional extensions.
Core extensions are already bundled with Druid but adding community extensions requires {druid-community-extensions}[some extra steps].
You can configure Druid to load more extensions by adding them to the `additionalExtensions` list as shown below:

[source,yaml]
----
@@ -38,4 +38,6 @@ spec:
...
----

{druid-core-extensions}[Core extensions] are already bundled with Druid, but adding community extensions requires {druid-community-extensions}[some extra steps].

Some extensions may require additional configuration which can be added using xref:usage-guide/configuration-and-environment-overrides.adoc[configuration and environment overrides].
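To tie this together, here is a sketch of loading one additional community extension; the extension name is illustrative, and its placement under `clusterConfig` is an assumption rather than the exact layout of the example above:

[source,yaml]
----
# Sketch only: add a community extension to the list of extensions Druid loads at startup.
# Extension-specific properties would then be supplied via configOverrides.
spec:
  clusterConfig:
    additionalExtensions:
      - druid-distinctcount  # illustrative community extension
----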