Add descriptions (#620)
* Add descriptions

* Update docs/modules/druid/pages/getting_started/first_steps.adoc

Co-authored-by: Malte Sander <[email protected]>

---------

Co-authored-by: Malte Sander <[email protected]>
fhennig and maltesander authored Sep 12, 2024
1 parent 0cf0b46 commit cacb557
Showing 14 changed files with 34 additions and 11 deletions.
21 changes: 14 additions & 7 deletions docs/modules/druid/pages/getting_started/first_steps.adoc
@@ -1,6 +1,8 @@
= First steps
+:description: Set up a Druid cluster using the Stackable Operator by installing ZooKeeper, HDFS, and Druid. Ingest and query example data via the web UI or API.

-After going through the xref:getting_started/installation.adoc[] section and having installed all the Operators, you will now deploy a Druid cluster and it's dependencies. Afterwards you can <<_verify_that_it_works, verify that it works>> by ingesting example data and subsequently query it.
+After going through the xref:getting_started/installation.adoc[] section and having installed all the Operators, you will now deploy a Druid cluster and its dependencies.
+Afterwards you can <<_verify_that_it_works, verify that it works>> by ingesting example data and subsequently query it.

== Setup

@@ -10,7 +12,8 @@ Three things need to be installed to have a Druid cluster:
* An HDFS instance to be used as a backend for deep storage
* The Druid cluster itself

-We will create them in this order, each one is created by applying a manifest file. The Operators you just installed will then create the resources according to the manifest.
+We will create them in this order, each one is created by applying a manifest file.
+The Operators you just installed will then create the resources according to the manifest.

=== ZooKeeper

@@ -61,11 +64,13 @@ include::example$getting_started/getting_started.sh[tag=install-druid]

This will create the actual druid instance.

-WARNING: This Druid instance uses Derby (`dbType: derby`) as a metadata store, which is an interal SQL database. It is not persisted and not suitable for production use! Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.html#available-metadata-stores[Druid documentation] for a list of supported databases and setup instructions for production instances.
+WARNING: This Druid instance uses Derby (`dbType: derby`) as a metadata store, which is an interal SQL database.
+It is not persisted and not suitable for production use!
+Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.html#available-metadata-stores[Druid documentation] for a list of supported databases and setup instructions for production instances.

== Verify that it works

-Next you will submit an ingestion job and then query the ingested data - either through the web interface or the API.
+Next you will submit an ingestion job and then query the ingested data - either through the web interface or the API.

First, make sure that all the Pods in the StatefulSets are ready:

@@ -97,7 +102,8 @@ include::example$getting_started/getting_started.sh[tag=port-forwarding]

=== Ingest example data

-Next, we will ingest some example data using the web interface. If you prefer to use the command line instead, follow the instructions in the collapsed section below.
+Next, we will ingest some example data using the web interface.
+If you prefer to use the command line instead, follow the instructions in the collapsed section below.


[#ingest-cmd-line]
@@ -128,15 +134,16 @@ Now load the example data:

image::getting_started/load_example.png[]

-Click through all pages of the load process. You can also follow the https://druid.apache.org/docs/latest/tutorials/index.html#step-4-load-data[Druid Quickstart Guide].
+Click through all pages of the load process.
+You can also follow the https://druid.apache.org/docs/latest/tutorials/index.html#step-4-load-data[Druid Quickstart Guide].

Once you finished the ingestion dialog you should see the ingestion overview with the job, which will eventually show SUCCESS:

image::getting_started/load_success.png[]

=== Query the data

-Query from the user interface by navigating to the "Query" interface in the menu and query the `wikipedia` table:
+Query from the user interface by navigating to the "Query" interface in the menu and query the `wikipedia` table:

[#query-cmd-line]
.Alternative: Using the command line
1 change: 1 addition & 0 deletions docs/modules/druid/pages/getting_started/index.adoc
@@ -1,4 +1,5 @@
= Getting started
+:description: Get started with Druid on Kubernetes using the Stackable Operator. Follow steps to install, configure, and query data.

This guide will get you started with Druid using the Stackable Operator. It will guide you through the installation of the Operator and its dependencies, setting up your first Druid instance and connecting to it, ingesting example data and querying that data.

1 change: 1 addition & 0 deletions docs/modules/druid/pages/getting_started/installation.adoc
@@ -1,4 +1,5 @@
= Installation
+:description: Install the Stackable Druid Operator and its dependencies on Kubernetes using stackablectl or Helm.

On this page you will install the Stackable Druid Operator and Operators for its dependencies - ZooKeeper and HDFS - as
well as the commons, secret and listener operator which are required by all Stackable Operators.
2 changes: 1 addition & 1 deletion docs/modules/druid/pages/index.adoc
@@ -1,5 +1,5 @@
= Stackable Operator for Apache Druid
-:description: The Stackable Operator for Apache Druid is a Kubernetes operator that can manage Apache Druid clusters. Learn about its features, resources, dependencies, and demos, and see the list of supported Druid versions.
+:description: The Stackable Operator for Apache Druid is a Kubernetes operator that manages Druid clusters, handling setup, dependencies, and integration with tools like Trino.
:keywords: Stackable Operator, Apache Druid, Kubernetes, operator, DevOps, CRD, ZooKeeper, HDFS, S3, Kafka, Trino, OPA
:github: https://github.com/stackabletech/druid-operator/
:crd: {crd-docs-base-url}/druid-operator/{crd-docs-version}/
9 changes: 7 additions & 2 deletions docs/modules/druid/pages/required-external-components.adoc
@@ -1,9 +1,14 @@
# Required external components
+:description: Druid requires an SQL database for metadata and supports various deep storage options like S3, HDFS, and cloud storage

-Druid uses an SQL database to store metadata. Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.html#available-metadata-stores[Druid documentation] for a list of supported databases and setup instructions.
+Druid uses an SQL database to store metadata.
+Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.html#available-metadata-stores[Druid documentation] for a list of supported databases and setup instructions.

## Feature specific: S3 and cloud deep storage

-https://druid.apache.org/docs/latest/dependencies/deep-storage.html[Deep storage] is where segments are stored. Druid offers multiple storage backends. For the local storage there are no prerequisites. HDFS deep storage can be set up with the xref:hdfs:index.adoc[Stackable Operator for Apache HDFS]. For S3 deep storage or the Google Cloud and Azure storage backends, you need to set up the storage.
+https://druid.apache.org/docs/latest/dependencies/deep-storage.html[Deep storage] is where segments are stored.
+Druid offers multiple storage backends. For the local storage there are no prerequisites.
+HDFS deep storage can be set up with the xref:hdfs:index.adoc[Stackable Operator for Apache HDFS].
+For S3 deep storage or the Google Cloud and Azure storage backends, you need to set up the storage.

Read the xref:usage-guide/deep-storage.adoc[deep storage usage guide] to learn more about configuring Druid deep storage.
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
= Configuration & Environment Overrides
+:description: Override Druid configuration properties and environment variables per role or role group. Customize runtime.properties, jvm.config, and security.properties as needed.

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

3 changes: 2 additions & 1 deletion docs/modules/druid/pages/usage-guide/deep-storage.adoc
@@ -1,4 +1,5 @@
= Deep storage configuration
+:description: Configure Apache Druid deep storage with HDFS or S3. Set up HDFS via a ConfigMap, or use S3 with inline or referenced bucket details.

https://druid.apache.org/docs/latest/design/deep-storage/[Deep Storage] is where Druid stores data segments.
For a Kubernetes environment, either the HDFS or S3 backend is recommended.
@@ -19,7 +20,7 @@ spec:
directory: /druid # <2>
...
----
-<1> Name of the HDFS cluster discovery config map. Can be supplied manually for a cluster not provided by Stackable. Needs to contain the `core-site.xml` and `hdfs-site.xml`.
+<1> Name of the HDFS cluster discovery ConfigMap. Can be supplied manually for a cluster not provided by Stackable. Needs to contain the `core-site.xml` and `hdfs-site.xml`.
<2> The directory where to store the druid data.

== [[s3]]S3
1 change: 1 addition & 0 deletions docs/modules/druid/pages/usage-guide/extensions.adoc
@@ -1,6 +1,7 @@
= Druid extensions
:druid-extensions: https://druid.apache.org/docs/latest/configuration/extensions/
:druid-community-extensions: https://druid.apache.org/docs/latest/configuration/extensions/#loading-community-extensions
+:description: Add functionality to Druid with default or custom extensions. Default extensions include Kafka and HDFS support; community extensions require extra setup.

{druid-extensions}[Druid extensions] are used to provide additional functionality at runtime, e.g. for data formats or different types of deep storage.

1 change: 1 addition & 0 deletions docs/modules/druid/pages/usage-guide/ingestion.adoc
@@ -1,4 +1,5 @@
= Ingestion
+:description: Ingest data from S3 by specifying the host and optional credentials. Add external files to Druid pods using extra volumes for client certificates or keytabs.

== [[s3]]From S3

1 change: 1 addition & 0 deletions docs/modules/druid/pages/usage-guide/listenerclass.adoc
@@ -1,4 +1,5 @@
= Service exposition with ListenerClasses
+:description: Configure Apache Druid service exposure using ListenerClass to control service types: cluster-internal, external-unstable, or external-stable.

Apache Druid offers a web UI and an API, both are exposed by the `router` role.
Other roles also expose API endpoints such as the `broker` and `coordinator`.
1 change: 1 addition & 0 deletions docs/modules/druid/pages/usage-guide/logging.adoc
@@ -1,4 +1,5 @@
= Log aggregation
+:description: Forward logs to a Vector aggregator by enabling the log agent and specifying a discovery ConfigMap.

The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:

1 change: 1 addition & 0 deletions docs/modules/druid/pages/usage-guide/monitoring.adoc
@@ -1,4 +1,5 @@
= Monitoring
+:description: Managed Druid instances export Prometheus metrics by default for easy monitoring.

The managed Druid instances are automatically configured to export Prometheus metrics.
See xref:operators:monitoring.adoc[] for more details.
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
= Storage and resource configuration
+:description: Configure storage and resource requests for Druid with default settings for CPU, memory, and additional settings for historical segment caches.

== Storage for data volumes

1 change: 1 addition & 0 deletions docs/modules/druid/pages/usage-guide/security.adoc
@@ -1,4 +1,5 @@
= Security
+:description: Secure your Druid cluster with TLS encryption, LDAP, or OIDC authentication. Connect with OPA for policy-based authorization.

The Druid cluster can be secured and protected in multiple ways.
