diff --git a/docs/modules/druid/pages/getting_started/first_steps.adoc b/docs/modules/druid/pages/getting_started/first_steps.adoc
index 18bb9e8c..30a90cee 100644
--- a/docs/modules/druid/pages/getting_started/first_steps.adoc
+++ b/docs/modules/druid/pages/getting_started/first_steps.adoc
@@ -1,8 +1,8 @@
 = First steps
 :description: Set up a Druid cluster using the Stackable Operator by installing ZooKeeper, HDFS, and Druid. Ingest and query example data via the web UI or API.
-After going through the xref:getting_started/installation.adoc[] section and having installed all the Operators, you will now deploy a Druid cluster and its dependencies.
-Afterwards you can <<_verify_that_it_works, verify that it works>> by ingesting example data and subsequently query it.
+With the operators installed, deploy a Druid cluster and its dependencies.
+Afterward you can <<_verify_that_it_works, verify that it works>> by ingesting example data and subsequently querying it.
 == Setup
@@ -12,8 +12,8 @@ Three things need to be installed to have a Druid cluster:
 * An HDFS instance to be used as a backend for deep storage
 * The Druid cluster itself
-We will create them in this order, each one is created by applying a manifest file.
-The Operators you just installed will then create the resources according to the manifest.
+Create them in this order; each one is created by applying a manifest file.
+The operators you just installed then create the resources according to the manifests.
 === ZooKeeper
@@ -25,7 +25,7 @@ Create a file named `zookeeper.yaml` with the following content:
 include::example$getting_started/zookeeper.yaml[]
 ----
-Then create the resources by applying the manifest file
+Then create the resources by applying the manifest file:
 [source,bash]
 ----
@@ -62,7 +62,7 @@ And apply it:
 include::example$getting_started/getting_started.sh[tag=install-druid]
 ----
-This will create the actual druid instance.
+This creates the actual Druid Stacklet.
 WARNING: This Druid instance uses Derby (`dbType: derby`) as a metadata store, which is an internal SQL database.
 It is not persisted and not suitable for production use!
@@ -70,7 +70,7 @@ Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.h
 == Verify that it works
-Next you will submit an ingestion job and then query the ingested data - either through the web interface or the API.
+Submit an ingestion job and then query the ingested data -- either through the web interface or the API.
 First, make sure that all the Pods in the StatefulSets are ready:
@@ -102,7 +102,7 @@ include::example$getting_started/getting_started.sh[tag=port-forwarding]
 === Ingest example data
-Next, we will ingest some example data using the web interface.
+Next, ingest some example data using the web interface.
 If you prefer to use the command line instead, follow the instructions in the collapsed section below.
@@ -137,7 +137,7 @@ image::getting_started/load_example.png[]
 Click through all pages of the load process.
 You can also follow the https://druid.apache.org/docs/latest/tutorials/index.html#step-4-load-data[Druid Quickstart Guide].
-Once you finished the ingestion dialog you should see the ingestion overview with the job, which will eventually show SUCCESS:
+Once you finish the ingestion dialog, you should see the ingestion overview with the job, which eventually shows SUCCESS:
 image::getting_started/load_success.png[]
@@ -173,4 +173,4 @@ Great!
 You've set up your first Druid cluster, ingested some data and queried it
 == What's next
-Have a look at the xref:usage-guide/index.adoc[] page to find out more about the features of the Operator, such as S3-backed deep storage (as opposed to the HDFS backend used in this guide) or OPA-based authorization.
+Have a look at the xref:usage-guide/index.adoc[] page to find out more about the features of the operator, such as S3-backed deep storage (as opposed to the HDFS backend used in this guide) or OPA-based authorization.
diff --git a/docs/modules/druid/pages/getting_started/index.adoc b/docs/modules/druid/pages/getting_started/index.adoc
index b353bb1e..0283e09b 100644
--- a/docs/modules/druid/pages/getting_started/index.adoc
+++ b/docs/modules/druid/pages/getting_started/index.adoc
@@ -1,17 +1,18 @@
 = Getting started
 :description: Get started with Druid on Kubernetes using the Stackable Operator. Follow steps to install, configure, and query data.
-This guide will get you started with Druid using the Stackable Operator. It will guide you through the installation of the Operator and its dependencies, setting up your first Druid instance and connecting to it, ingesting example data and querying that data.
+This guide helps you get started with Druid using the Stackable Operator.
+It covers installing the operator and its dependencies, setting up your first Druid instance, ingesting example data, and querying that data.
 == Prerequisites
-You will need:
+You need:
 * a Kubernetes cluster
 * kubectl
 * optional: Helm
-Resource sizing depends on cluster type(s), usage and scope, but as a starting point we recommend a minimum of the following resources for this operator:
+Resource sizing depends on cluster type(s), usage and scope, but as a starting point the following resources are recommended as a minimum for this operator:
 * 0.2 cores (e.g. i5 or similar)
 * 256MB RAM
diff --git a/docs/modules/druid/pages/getting_started/installation.adoc b/docs/modules/druid/pages/getting_started/installation.adoc
index 07f28129..710ac354 100644
--- a/docs/modules/druid/pages/getting_started/installation.adoc
+++ b/docs/modules/druid/pages/getting_started/installation.adoc
@@ -1,20 +1,17 @@
 = Installation
 :description: Install the Stackable Druid Operator and its dependencies on Kubernetes using stackablectl or Helm.
-On this page you will install the Stackable Druid Operator and Operators for its dependencies - ZooKeeper and HDFS - as
-well as the commons, secret and listener operator which are required by all Stackable Operators.
+Install the Stackable Operator for Apache Druid and operators for its dependencies -- ZooKeeper and HDFS -- as well as the commons, secret and listener operators, which are required by all Stackable operators.
-== Stackable Operators
+There are multiple ways to install the operators: xref:management:stackablectl:index.adoc[] is the preferred way, but Helm is also supported.
+OpenShift users may prefer installing the operator from the RedHat Certified Operator catalog using the OpenShift web console.
-There are 2 ways to run Stackable Operators
-
-1. Using xref:management:stackablectl:index.adoc[]
-
-2. Using Helm
-
-=== stackablectl
-
-stackablectl is the command line tool to interact with Stackable operators and our recommended way to install Operators.
+[tabs]
+====
+stackablectl::
++
+--
+stackablectl is the command line tool to interact with Stackable operators and the recommended way to install operators.
 Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.
 After you have installed stackablectl, run the following command to install all Operators necessary for Druid:
@@ -24,29 +21,34 @@
 include::example$getting_started/getting_started.sh[tag=stackablectl-install-operators]
 ----
-The tool will show
+The tool prints
 [source]
 include::example$getting_started/install_output.txt[]
 TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`.
+--
-=== Helm
+Helm::
++
+--
+You can also use Helm to install the operators.
-You can also use Helm to install the Operators. Add the Stackable Helm repository:
+.Add the Stackable Helm repository
 [source,bash]
 ----
 include::example$getting_started/getting_started.sh[tag=helm-add-repo]
 ----
-Then install the Stackable Operators:
+.Install the Stackable operators
 [source,bash]
 ----
 include::example$getting_started/getting_started.sh[tag=helm-install-operators]
 ----
-Helm will deploy the Operators in a Kubernetes Deployment and apply the CRDs for the Apache Druid service (as well as
-the CRDs for the required operators). You are now ready to deploy Apache Druid in Kubernetes.
+Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the Apache Druid service (as well as the CRDs for the required operators).
+--
+====
 == What's next
diff --git a/docs/modules/druid/pages/index.adoc b/docs/modules/druid/pages/index.adoc
index 89f52d3b..0c796853 100644
--- a/docs/modules/druid/pages/index.adoc
+++ b/docs/modules/druid/pages/index.adoc
@@ -15,7 +15,7 @@
 * {feature-tracker}[Feature Tracker {external-link-icon}^]
 * {crd}[CRD documentation {external-link-icon}^]
-The Stackable operator for Apache Druid is an operator that can deploy and manage {druid}[Apache Druid] clusters on Kubernetes.
+The Stackable operator for Apache Druid deploys and manages {druid}[Apache Druid] clusters on Kubernetes.
 Apache Druid is an open-source, distributed data store designed to quickly process large amounts of data in real-time.
 It enables users to ingest, store, and query massive amounts of data in real-time, a great tool for handling high-volume data processing and analysis.
 This operator provides several resources and features to manage Druid clusters efficiently.
@@ -89,7 +89,7 @@
 == Supported versions
 The Stackable operator for Apache Druid currently supports the Druid versions listed below.
-To use a specific Druid version in your DruidCluster, you have to specify an image - this is explained in the xref:concepts:product-image-selection.adoc[] documentation.
+To use a specific Druid version in your Druid Stacklet, you have to specify an image -- this is explained in the xref:concepts:product-image-selection.adoc[] documentation.
 The operator also supports running images from a custom registry or running entirely customized images; both of these cases are explained under xref:concepts:product-image-selection.adoc[] as well.
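+
+As a quick illustration, a minimal sketch of pinning a product version in the DruidCluster manifest -- the version shown is only an example, pick one from the list below:
+
+[source,yaml]
+----
+spec:
+  image:
+    productVersion: "26.0.0"  # example version, use one from the supported list
+----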
 include::partial$supported-versions.adoc[]
diff --git a/docs/modules/druid/pages/reference/commandline-parameters.adoc b/docs/modules/druid/pages/reference/commandline-parameters.adoc
index 0d4aabf7..2aeb7591 100644
--- a/docs/modules/druid/pages/reference/commandline-parameters.adoc
+++ b/docs/modules/druid/pages/reference/commandline-parameters.adoc
@@ -23,7 +23,7 @@ stackable-druid-operator run --product-config /foo/bar/properties.yaml
 *Multiple values:* false
-The operator will **only** watch for resources in the provided namespace `test`:
+The operator **only** watches for resources in the provided namespace `test`:
 [source]
 ----
diff --git a/docs/modules/druid/pages/reference/crds.adoc b/docs/modules/druid/pages/reference/crds.adoc
index 30d189f7..3e8bb3fe 100644
--- a/docs/modules/druid/pages/reference/crds.adoc
+++ b/docs/modules/druid/pages/reference/crds.adoc
@@ -1,3 +1,3 @@
 = CRD Reference
-Find all CRD reference for the Stackable Operator for Apache Druid at: {crd-docs-base-url}/druid-operator/{crd-docs-version}.
+Find the CRD reference for the Stackable operator for Apache Druid at: {crd-docs-base-url}/druid-operator/{crd-docs-version}.
diff --git a/docs/modules/druid/pages/reference/discovery.adoc b/docs/modules/druid/pages/reference/discovery.adoc
index 0c549a2b..0dfa4df4 100644
--- a/docs/modules/druid/pages/reference/discovery.adoc
+++ b/docs/modules/druid/pages/reference/discovery.adoc
@@ -6,8 +6,7 @@
 :namespace: stackable
 :routerPort: 8888
-The Stackable Operator for Druid publishes a xref:concepts:service_discovery.adoc[] with the following properties,
-where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:
+The Stackable operator for Druid publishes a xref:concepts:service_discovery.adoc[] with the following properties, where `{clusterName}` represents the name and `{namespace}` the namespace of the cluster:
 `DRUID_AVATICA_JDBC`::
 ====
diff --git a/docs/modules/druid/pages/reference/environment-variables.adoc b/docs/modules/druid/pages/reference/environment-variables.adoc
index 725e7caa..f71f0c01 100644
--- a/docs/modules/druid/pages/reference/environment-variables.adoc
+++ b/docs/modules/druid/pages/reference/environment-variables.adoc
@@ -36,7 +36,7 @@ docker run \
 *Multiple values:* false
-The operator will **only** watch for resources in the provided namespace `test`:
+The operator **only** watches for resources in the provided namespace `test`:
 [source]
 ----
diff --git a/docs/modules/druid/pages/required-external-components.adoc b/docs/modules/druid/pages/required-external-components.adoc
index c040b22c..d5d790e1 100644
--- a/docs/modules/druid/pages/required-external-components.adoc
+++ b/docs/modules/druid/pages/required-external-components.adoc
@@ -1,14 +1,17 @@
-# Required external components
+= Required external components
 :description: Druid requires an SQL database for metadata and supports various deep storage options like S3, HDFS, and cloud storage
+:druid-available-metadata-stores: https://druid.apache.org/docs/latest/design/metadata-storage/#available-metadata-stores
+:druid-deep-storage: https://druid.apache.org/docs/latest/design/deep-storage
 Druid uses an SQL database to store metadata.
-Consult the https://druid.apache.org/docs/latest/dependencies/metadata-storage.html#available-metadata-stores[Druid documentation] for a list of supported databases and setup instructions.
+Consult the {druid-available-metadata-stores}[Druid documentation] for a list of supported databases and setup instructions.
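+
+A sketch of how a DruidCluster could reference a PostgreSQL metadata store. The `dbType` key follows the getting started manifest; the host, database name and the `credentialsSecret` reference are illustrative assumptions -- check the CRD reference for the exact field set:
+
+[source,yaml]
+----
+spec:
+  clusterConfig:
+    metadataStorageDatabase:
+      dbType: postgresql
+      connString: jdbc:postgresql://postgresql-druid/druid  # hypothetical host and database
+      host: postgresql-druid  # hypothetical host
+      port: 5432
+      credentialsSecret: druid-db-credentials  # hypothetical Secret holding username and password
+----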
-## Feature specific: S3 and cloud deep storage
+== Feature specific: S3 and cloud deep storage
-https://druid.apache.org/docs/latest/dependencies/deep-storage.html[Deep storage] is where segments are stored.
-Druid offers multiple storage backends. For the local storage there are no prerequisites.
+{druid-deep-storage}[Deep storage] is where segments are stored.
+Druid offers multiple storage backends.
+For the local storage there are no prerequisites.
 HDFS deep storage can be set up with the xref:hdfs:index.adoc[Stackable Operator for Apache HDFS].
-For S3 deep storage or the Google Cloud and Azure storage backends, you need to set up the storage.
+For S3 deep storage or the Google Cloud and Azure storage backends, you need to set up the respective storage backend.
 Read the xref:usage-guide/deep-storage.adoc[deep storage usage guide] to learn more about configuring Druid deep storage.
diff --git a/docs/modules/druid/pages/usage-guide/configuration-and-environment-overrides.adoc b/docs/modules/druid/pages/usage-guide/configuration-and-environment-overrides.adoc
index 9760c568..483b797a 100644
--- a/docs/modules/druid/pages/usage-guide/configuration-and-environment-overrides.adoc
+++ b/docs/modules/druid/pages/usage-guide/configuration-and-environment-overrides.adoc
@@ -1,5 +1,6 @@
 = Configuration & Environment Overrides
 :description: Override Druid configuration properties and environment variables per role or role group. Customize runtime.properties, jvm.config, and security.properties as needed.
+:druid-config-reference: https://druid.apache.org/docs/latest/configuration/index.html
 The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).
@@ -9,9 +10,9 @@ IMPORTANT: Overriding certain properties which are set by the operator (such as
 For a role or role group, at the same level of `config`, you can specify:
 `configOverrides` for the following files:
-- `runtime.properties`
-- `jvm.config`
-- `security.properties`
+* `runtime.properties`
+* `jvm.config`
+* `security.properties`
 For example, if you want to set the `druid.server.http.numThreads` for the router to 100 adapt the `routers` section of the cluster resource like so:
@@ -43,13 +44,16 @@ routers:
 All override property values must be strings.
-For a full list of configuration options please refer to the Druid https://druid.apache.org/docs/latest/configuration/index.html[Configuration Reference].
+For a full list of configuration options please refer to the Druid {druid-config-reference}[Configuration Reference].
 === The security.properties file
-The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
+The `security.properties` file is used to configure JVM security properties.
+It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
-The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 26.0.0 Apache Druid performs poorly if the positive cache is disabled. To cache resolved host names, and thus speeding up Druid queries you can configure the TTL of entries in the positive cache like this:
+The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved.
+Some products of the Stackable platform are very sensitive to the contents of these caches and their performance is heavily affected by them.
+As of version 26.0.0, Apache Druid performs poorly if the positive cache is disabled.
+To cache resolved host names, and thus speed up Druid queries, you can configure the TTL of entries in the positive cache like this:
 [source,yaml]
 ----
@@ -84,9 +88,10 @@ NOTE: The operator configures DNS caching by default as shown in the example abo
 For details on the JVM security see https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html
-== Environment Variables
+== Environment variables
-In a similar fashion, environment variables can be (over)written. For example per role group:
+In a similar fashion, environment variables can be (over)written.
+For example, per role group:
 [source,yaml]
 ----
@@ -113,3 +118,8 @@ routers:
 ----
 // cliOverrides don't make sense for this operator, so the feature is omitted for now
+
+== Pod overrides
+
+The Druid operator also supports Pod overrides, allowing you to override any property that you can set on a Kubernetes Pod.
+Read the xref:concepts:overrides.adoc#pod-overrides[Pod overrides documentation] to learn more about this feature.
diff --git a/docs/modules/druid/pages/usage-guide/deep-storage.adoc b/docs/modules/druid/pages/usage-guide/deep-storage.adoc
index 4ebb29c7..b7c1058f 100644
--- a/docs/modules/druid/pages/usage-guide/deep-storage.adoc
+++ b/docs/modules/druid/pages/usage-guide/deep-storage.adoc
@@ -1,7 +1,8 @@
 = Deep storage configuration
 :description: Configure Apache Druid deep storage with HDFS or S3. Set up HDFS via a ConfigMap, or use S3 with inline or referenced bucket details.
+:druid-deep-storage: https://druid.apache.org/docs/latest/design/deep-storage/
-https://druid.apache.org/docs/latest/design/deep-storage/[Deep Storage] is where Druid stores data segments.
+{druid-deep-storage}[Deep Storage] is where Druid stores data segments.
 For a Kubernetes environment, either the HDFS or S3 backend is recommended.
 == [[hdfs]]HDFS
diff --git a/docs/modules/druid/pages/usage-guide/extensions.adoc b/docs/modules/druid/pages/usage-guide/extensions.adoc
index b4bdcbe7..e3ac68bd 100644
--- a/docs/modules/druid/pages/usage-guide/extensions.adoc
+++ b/docs/modules/druid/pages/usage-guide/extensions.adoc
@@ -1,5 +1,6 @@
 = Druid extensions
 :druid-extensions: https://druid.apache.org/docs/latest/configuration/extensions/
+:druid-core-extensions: https://druid.apache.org/docs/latest/configuration/extensions/#core-extensions
 :druid-community-extensions: https://druid.apache.org/docs/latest/configuration/extensions/#loading-community-extensions
 :description: Add functionality to Druid with default or custom extensions. Default extensions include Kafka and HDFS support; community extensions require extra setup.
@@ -7,26 +8,25 @@
 == [[default-extensions]]Default extensions
-Some extensions are loaded by default:
+Druid Stacklets use the following extensions by default:
-- `druid-kafka-indexing-service`
-- `druid-datasketches`
-- `prometheus-emitter`
-- `druid-basic-security`
-- `druid-opa-authorizer`
-- `druid-hdfs-storage`
+* `druid-kafka-indexing-service`
+* `druid-datasketches`
+* `prometheus-emitter`
+* `druid-basic-security`
+* `druid-opa-authorizer`
+* `druid-hdfs-storage`
 Some extensions are loaded conditionally depending on the DruidCluster configuration:
-- `postgresql-metadata-storage` (when using PostgreSQL as metadata store)
-- `mysql-metadata-storage` (when using MySQL as metadata store)
-- `druid-s3-extensions` (when S3 is used as deep storage)
-- `simple-client-sslcontext` (when TLS is enabled)
+* `postgresql-metadata-storage` (when using PostgreSQL as metadata store)
+* `mysql-metadata-storage` (when using MySQL as metadata store)
+* `druid-s3-extensions` (when S3 is used as deep storage)
+* `simple-client-sslcontext` (when TLS is enabled)
 == [[custom-extensions]]Custom extensions
-Druid can be configured load any number of additional extensions.
-Core extensions are already bundled with Druid but adding community extensions requires {druid-community-extensions}[some extra steps].
+You can configure Druid to load more extensions by adding them to the `additionalExtensions` list as shown below:
 [source,yaml]
 ----
@@ -38,4 +38,6 @@ spec:
 ...
 ----
+{druid-core-extensions}[Core extensions] are already bundled with Druid but adding community extensions requires {druid-community-extensions}[some extra steps].
+
 Some extensions may require additional configuration which can be added using xref:usage-guide/configuration-and-environment-overrides.adoc[configuration and environment overrides].
diff --git a/docs/modules/druid/pages/usage-guide/ingestion.adoc b/docs/modules/druid/pages/usage-guide/ingestion.adoc
index f715d610..c6a7460b 100644
--- a/docs/modules/druid/pages/usage-guide/ingestion.adoc
+++ b/docs/modules/druid/pages/usage-guide/ingestion.adoc
@@ -19,7 +19,9 @@ spec:
 <1> The S3 host, not optional
 <2> Port, optional, defaults to 80
-<3> Credentials to use. Since these might be bucket-dependent, they can instead be given in the ingestion job. Specifying the credentials here is explained <>.
+<3> Credentials to use.
+    Since these might be bucket-dependent, they can instead be given in the ingestion job.
+    Specifying the credentials here is explained <>.
 include::partial$s3-note.adoc[]
@@ -33,7 +35,7 @@ Since Druid actively runs ingestion tasks there may be a need to make extra file
 These could for example be client certificates used to connect to a Kafka cluster or a keytab to obtain a Kerberos ticket.
-In order to make these files available the operator allows specifying extra volumes that will be added to all pods deployed for this cluster.
+In order to make these files available, the operator allows specifying extra volumes that are added to all pods deployed for this cluster.
 [source,yaml]
 ----
@@ -45,4 +47,4 @@ spec:
 secretName: google-service-account
 ----
-All `Volumes` specified in this section will be made available under `/stackable/userdata/\{volumename\}`.
+All Volumes specified in this section are made available under `/stackable/userdata/\{volumename\}`.
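+For example, with the `google-service-account` volume from the snippet above, all Pods of the cluster see the Secret's files under `/stackable/userdata/google-service-account`, from where ingestion tasks can read them.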
diff --git a/docs/modules/druid/pages/usage-guide/operations/cluster-operations.adoc b/docs/modules/druid/pages/usage-guide/operations/cluster-operations.adoc
index 510102c2..a7e36869 100644
--- a/docs/modules/druid/pages/usage-guide/operations/cluster-operations.adoc
+++ b/docs/modules/druid/pages/usage-guide/operations/cluster-operations.adoc
@@ -1,3 +1,4 @@
 = Cluster operation
-Druid installations can be configured with different cluster operations like pausing reconciliation or stopping the cluster. See xref:concepts:operations/cluster_operations.adoc[cluster operations] for more details.
+Druid installations can be configured with different cluster operations like pausing reconciliation or stopping the cluster.
+See xref:concepts:operations/cluster_operations.adoc[cluster operations] for more details.
diff --git a/docs/modules/druid/pages/usage-guide/operations/graceful-shutdown.adoc b/docs/modules/druid/pages/usage-guide/operations/graceful-shutdown.adoc
index fb497ab7..d979c41e 100644
--- a/docs/modules/druid/pages/usage-guide/operations/graceful-shutdown.adoc
+++ b/docs/modules/druid/pages/usage-guide/operations/graceful-shutdown.adoc
@@ -2,9 +2,9 @@
 You can configure the graceful shutdown as described in xref:concepts:operations/graceful_shutdown.adoc[].
-The Druid processes will receive a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
-It will log the received signal as shown in the log below and initiate a graceful shutdown.
-After the graceful shutdown timeout runs out, and the process still didn't exit, Kubernetes will issue a `SIGKILL` signal.
+The Druid process receives a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
+It logs the received signal as shown in the log below and initiates a graceful shutdown.
+If the process is still running after the graceful shutdown timeout runs out, Kubernetes issues a `SIGKILL` signal.
 == Broker
@@ -121,7 +121,8 @@ druid 2023-11-13T10:56:54,212 INFO [Thread-55] org.apache.druid.java.util.common
 As a default, Druid middle managers have `5 minutes` to shut down gracefully.
-The middle manager can be terminated gracefully by disabling it. Meaning the overlord will not send any new tasks and the middle manager will terminate after all tasks are finished or the termination grace period is exceeded.
+The middle manager can be terminated gracefully by disabling it.
+This means the overlord does not send any new tasks, and the middle manager terminates after all tasks are finished or the termination grace period is exceeded.
 [source,text]
 ----
diff --git a/docs/modules/druid/pages/usage-guide/operations/pod-disruptions.adoc b/docs/modules/druid/pages/usage-guide/operations/pod-disruptions.adoc
index 9b610e39..482dc391 100644
--- a/docs/modules/druid/pages/usage-guide/operations/pod-disruptions.adoc
+++ b/docs/modules/druid/pages/usage-guide/operations/pod-disruptions.adoc
@@ -2,19 +2,19 @@
 You can configure the permitted Pod disruptions for Druid nodes as described in xref:concepts:operations/pod_disruptions.adoc[].
-Unless you configure something else or disable our PodDisruptionBudgets (PDBs), we write the following PDBs:
+Unless you configure something else or disable the default PodDisruptionBudgets (PDBs), the following PDBs apply:
 == Brokers
-We only allow a single broker to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only one broker may be offline at any time, regardless of the number of replicas or role groups.
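+
+If that is too strict for your setup, the budget can be raised per role. A minimal sketch following the generic mechanism from the linked concepts page; the field layout and the value shown are illustrative:
+
+[source,yaml]
+----
+spec:
+  brokers:
+    roleConfig:
+      podDisruptionBudget:
+        maxUnavailable: 2  # illustrative value
+----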
 == Coordinators
-We only allow a single coordinator to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only one coordinator may be offline at any time, regardless of the number of replicas or role groups.
 == Historicals
-We only allow a single historical to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only one historical may be offline at any time, regardless of the number of replicas or role groups.
 == MiddleManagers
-We only allow a single middleManager to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only one middle manager may be offline at any time, regardless of the number of replicas or role groups.
 == Routers
-We only allow a single Router to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only one router may be offline at any time, regardless of the number of replicas or role groups.
diff --git a/docs/modules/druid/pages/usage-guide/operations/pod-placement.adoc b/docs/modules/druid/pages/usage-guide/operations/pod-placement.adoc
index b1071ab8..5a3e2ddb 100644
--- a/docs/modules/druid/pages/usage-guide/operations/pod-placement.adoc
+++ b/docs/modules/druid/pages/usage-guide/operations/pod-placement.adoc
@@ -7,7 +7,7 @@ The default affinities created by the operator are:
 1. Distribute all Pods within the same role (brokers, coordinators, historicals, middle-managers, routers) (weight 70)
 Some of the Druid roles do frequently communicate with each other.
-To address this, some affinities will be created to attract these roles:
+To address this, some affinities are created to attract these roles:
 *For brokers:*
@@ -24,4 +24,4 @@ To address this, some affinities will be created to attract these roles:
 *For coordinators:*
-- No affinities
+No affinities
diff --git a/docs/modules/druid/pages/usage-guide/resources-and-storage.adoc b/docs/modules/druid/pages/usage-guide/resources-and-storage.adoc
index 086c5b10..b54d49ea 100644
--- a/docs/modules/druid/pages/usage-guide/resources-and-storage.adoc
+++ b/docs/modules/druid/pages/usage-guide/resources-and-storage.adoc
@@ -76,7 +76,11 @@ You can configure your own resource requests and limits by following the example
 In addition to the cpu and memory resources described above, historical Pods also accept a `storage` resource with the following properties:
-* `segmentCache` - used to set the maximum size allowed for the historical segment cache locations. See the Druid documentation regarding https://druid.apache.org/docs/latest/configuration/index.html#historical[druid.segmentCache.locations]. The operator creates an `emptyDir` and sets the `max_size` of the volume to be the value of the `capacity` property. In addition Druid is configured to keep 7% volume size free. By default, if no `segmentCache` is configured, the operator will create an `emptyDir` with a size of `1G` and `freePercentage` of `5`.
+* `segmentCache` - used to set the maximum size allowed for the historical segment cache locations.
+  See the Druid documentation regarding https://druid.apache.org/docs/latest/configuration/index.html#historical[druid.segmentCache.locations].
+  The operator creates an `emptyDir` and sets the `max_size` of the volume to be the value of the `capacity` property.
+  In addition, Druid is configured to keep 7% of the volume size free.
+  By default, if no `segmentCache` is configured, the operator creates an `emptyDir` with a size of `1G` and `freePercentage` of `5`.
 Example historical configuration with storage resources:
diff --git a/docs/modules/druid/pages/usage-guide/security.adoc b/docs/modules/druid/pages/usage-guide/security.adoc
index 5850ec74..d0e54d32 100644
--- a/docs/modules/druid/pages/usage-guide/security.adoc
+++ b/docs/modules/druid/pages/usage-guide/security.adoc
@@ -1,11 +1,13 @@
 = Security
 :description: Secure your Druid cluster with TLS encryption, LDAP, or OIDC authentication. Connect with OPA for policy-based authorization.
+:druid-auth-authz-model: https://druid.apache.org/docs/latest/operations/security-user-auth/#authentication-and-authorization-model
+:opa-rego-docs: https://www.openpolicyagent.org/docs/latest/#rego
-The Druid cluster can be secured and protected in multiple ways.
+Secure your Apache Druid Stacklet by enabling TLS encryption, user authentication and authorization.
 == Encryption
-TLS encryption is supported for internal cluster communication (e.g. between Broker and Coordinator) as well as for external communication (e.g. between the Browser and the Router Web UI).
+TLS encryption is supported for internal cluster communication (e.g. between Broker and Coordinator) as well as for external communication (e.g. between the browser and the Router web UI).
 [source,yaml]
 ----
@@ -14,9 +16,9 @@ spec:
   tls:
     serverAndInternalSecretClass: tls # <1>
 ----
-<1> Name of the `SecretClass` that is used to encrypt internal and external communication.
+<1> Name of the SecretClass that is used to encrypt internal and external communication.
-IMPORTANT: A Stackable Druid cluster is always encrypted per default i.e. `spec.clusterConfig.tls.serverAndInternalSecretClass: tls` in the above example does not need to be specified as it will be applied by default: in order to disable this default behavior you can set `spec.clusterConfig.tls.serverAndInternalSecretClass: null`.
+IMPORTANT: A Druid Stacklet is always encrypted by default, i.e. `spec.clusterConfig.tls.serverAndInternalSecretClass: tls` in the above example does not need to be specified as it is applied by default.
+To disable this default behavior, set `spec.clusterConfig.tls.serverAndInternalSecretClass: null`.
 == [[authentication]]Authentication
@@ -32,9 +34,9 @@ spec:
   authentication:
   - authenticationClass: druid-tls-auth # <1>
 ----
-<1> Name of the `AuthenticationClass` that is used to encrypt and authenticate communication.
+<1> Name of the AuthenticationClass that is used to encrypt and authenticate communication.
-The `AuthenticationClass` may or may not have a `SecretClass` configured:
+The AuthenticationClass may or may not have a SecretClass configured:
 [source,yaml]
 ----
 ---
@@ -50,12 +52,13 @@ spec:
   # Option 2
   tls: {} # <2>
 ----
-<1> If a client `SecretClass` is provided in the `AuthenticationClass` (here `druid-mtls`), these certificates will be used for encryption and authentication.
-<2> If no client `SecretClass` is provided in the `AuthenticationClass`, the `spec.clusterConfig.tls.serverAndInternalSecretClass` will be used for encryption and authentication. It cannot be explicitly set to null in this case.
+<1> If a client SecretClass is provided in the AuthenticationClass (here `druid-mtls`), these certificates are used for encryption and authentication.
+<2> If no client SecretClass is provided in the AuthenticationClass, the `spec.clusterConfig.tls.serverAndInternalSecretClass` is used for encryption and authentication.
+It cannot be explicitly set to null in this case.
 === LDAP
-Druid supports xref:concepts:authentication.adoc[authentication] of users against an LDAP server.
+Druid supports xref:concepts:authentication.adoc[authentication] of users via an LDAP server.
 This requires setting up an xref:concepts:authentication.adoc#authenticationclass[AuthenticationClass] for the LDAP server:
 [source,yaml]
 ----
 include::example$druid-ldap-authentication.yaml[tag=authclass]
 ----
-NOTE: You can follow the xref:tutorials:authentication_with_openldap.adoc[] tutorial to learn how to create an AuthenticationClass for an LDAP server.
+TIP: You can follow the xref:tutorials:authentication_with_openldap.adoc[] tutorial to learn how to create an AuthenticationClass for an LDAP server.
 Reference the AuthenticationClass in your DruidCluster resource:
@@ -103,15 +106,19 @@ include::example$druid-oidc-authentication.yaml[tag=secret]
 At the moment you can either use TLS, LDAP or OIDC authentication but not a combination of authentication methods.
-Using an LDAP server **without** bind credentials is not supported. This limitation is due to Druid not supporting this scenario. See https://github.com/stackabletech/druid-operator/issues/383[our issue] for details.
+Druid doesn't support LDAP authentication **without** bind credentials.
+See https://github.com/stackabletech/druid-operator/issues/383[our issue] for details.
-Authorization is done using the `allowAll` authorizer. Support for `memberOf` and OPA authorization is planned.
+Authorization is done using the `allowAll` authorizer or OPA (see below).
 == [[authorization]]Authorization with Open Policy Agent (OPA)
-Druid can connect to an Open Policy Agent (OPA) instance for authorization policy decisions. You need to run an OPA instance to connect to: for this, please refer to the https://docs.stackable.tech/opa/index.html[OPA Operator docs]. A short explanation of how to write RegoRules for Druid is given <<_defining_regorules, below>>.
+Druid can connect to an Open Policy Agent (OPA) instance for authorization policy decisions.
+You need a running OPA instance to connect to; refer to the xref:opa:index.adoc[OPA operator docs] to set one up.
+A short explanation of how to write RegoRules for Druid is given <<_defining_regorules, below>>.
-Once you have defined your rules, you need to configure the OPA cluster name and endpoint to use for Druid authorization requests. Add a section to the `spec` for OPA:
+Once you have defined your rules, you need to configure the OPA cluster name and endpoint to use for Druid authorization requests.
+Add a section to the `spec` for OPA:
 [source,yaml]
 ----
@@ -122,27 +129,36 @@ spec:
     configMapName: simple-opa <1>
     package: my-druid-rules <2>
 ----
-<1> The name of your OPA cluster (`simple-opa` in this case)
-<2> The RegoRule package to use for policy decisions. The package should contain an `allow` rule. This is optional and will default to the name of the Druid cluster.
+<1> The name of your OPA Stacklet (`simple-opa` in this case)
+<2> The RegoRule package to use for policy decisions.
+The package needs to contain an `allow` rule.
+This is optional and defaults to the name of the Druid Stacklet.
 === Defining RegoRules
-For a general explanation of how rules are written, please refer to the https://www.openpolicyagent.org/docs/latest/#rego[OPA documentation]. Inside your rule you will have access to input from Druid. Druid provides this data to you to base your policy decisions on:
+For a general explanation of how rules are written, please refer to the {opa-rego-docs}[OPA documentation].
+Inside your rule you have access to input from Druid.
+Druid provides this data to you to base your policy decisions on:
 [source,json]
 ----
 {
-  "user": "someUsername", <1>
-  "action": "READ", <2>
+  "authenticationResult": {
+    "identity": "someUsername", <1>
+    "authorizerName": "<authorizer-name>",
+    "authenticatedBy": "<authenticator-name>",
+    "context": {}
+  },
+  "action": "READ", <2>
   "resource": {
-    "type": "DATASOURCE", <3>
-    "name": "myTable" <4>
+    "name": "myTable", <3>
+    "type": "DATASOURCE" <4>
   }
 }
 ----
 <1> The authenticated identity of the user that wants to perform the action
 <2> The action type, can be either `READ` or `WRITE`.
-<3> The resource type, one of `STATE`, `CONFIG` and `DATASOURCE`.
-<4> In case of a datasource this is the table name, for `STATE` this will simply be `STATE`, the same for `CONFIG`.
+<3> In case of a datasource this is the table name; for `STATE` this is simply `STATE`, the same for `CONFIG`.
+<4> The resource type, one of `STATE`, `CONFIG` and `DATASOURCE`.
-For more details consult the https://druid.apache.org/docs/latest/operations/security-user-auth.html#authentication-and-authorization-model[Druid Authentication and Authorization Model].
+For more details consult the {druid-auth-authz-model}[Druid Authentication and Authorization Model].
diff --git a/docs/modules/druid/partials/s3-credentials.adoc b/docs/modules/druid/partials/s3-credentials.adoc
index b61ee7d8..1c08e579 100644
--- a/docs/modules/druid/partials/s3-credentials.adoc
+++ b/docs/modules/druid/partials/s3-credentials.adoc
@@ -1,6 +1,7 @@
-No matter if a connection is specified inline or as a separate object, the credentials are always specified in the same way. You will need a `Secret` containing the access key ID and secret access key, a `SecretClass` and then a reference to this `SecretClass` where you want to specify the credentials.
+No matter if a connection is specified inline or as a separate object, the credentials are always specified in the same way.
+You need a Secret containing the access key ID and secret access key, a SecretClass and then a reference to this SecretClass where you want to specify the credentials.
-The `Secret`:
+The Secret:
 [source,yaml]
 ----