Add Collector security documentation (#5209)

Co-authored-by: Juraci Paixão Kröhling <[email protected]>
Co-authored-by: Pablo Baeyens <[email protected]>
Co-authored-by: opentelemetrybot <[email protected]>
Co-authored-by: Reiley Yang <[email protected]>
Co-authored-by: Patrice Chalin <[email protected]>
open-telemetry · Nov 2, 2024 · d96ef10
1 parent 81475f0
Showing 6 changed files with 322 additions and 2 deletions.
diff --git a/content/en/docs/security/_index.md b/content/en/docs/security/_index.md
@@ -2,3 +2,40 @@
 title: Security
 weight: 970
 ---
+
+In this section, learn how the OpenTelemetry project discloses vulnerabilities
+and responds to incidents, and discover what you can do to securely collect and
+transmit your observability data.
+
+## Common Vulnerabilities and Exposures (CVEs)
+
+For CVEs across all repositories, see
+[Common Vulnerabilities and Exposures](cve/).
+
+## Incident response
+
+Learn how to report a vulnerability or find out how incident responses are
+handled in [Community incident response guidelines](security-response/).
+
+## Collector security
+
+When setting up the OpenTelemetry Collector, consider implementing security best
+practices in both your hosting infrastructure and your Collector configuration.
+Running a secure Collector can help you:
+
+- Protect telemetry that shouldn't, but might, contain sensitive information,
+  such as personally identifiable information (PII), application-specific data,
+  or network traffic patterns.
+- Prevent data tampering that makes telemetry unreliable and disrupts incident
+  responses.
+- Comply with data privacy and security regulations.
+- Defend against denial of service (DoS) attacks.
+
+See [Hosting best practices](hosting-best-practices/) to learn how to secure
+your Collector's infrastructure.
+
+See [Configuration best practices](config-best-practices/) to learn how to
+securely configure your Collector.
+
+For Collector component developers, see
+[Security best practices](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md).
diff --git a/content/en/docs/security/config-best-practices.md b/content/en/docs/security/config-best-practices.md
@@ -0,0 +1,208 @@
+---
+title: Collector configuration best practices
+linkTitle: Collector configuration
+weight: 112
+cSpell:ignore: exporterhelper
+---
+
+When configuring the OpenTelemetry (OTel) Collector, consider these best
+practices to better secure your Collector instance.
+
+## Create secure configurations
+
+Follow these guidelines to secure your Collector's configuration and its
+pipelines.
+
+### Store your configuration securely
+
+The Collector's configuration might contain sensitive information including:
+
+- Authentication information such as API tokens.
+- TLS certificates including private keys.
+
+Store sensitive information securely, such as on an encrypted filesystem or in
+a secret store. Because the Collector supports
+[environment variable expansion](/docs/collector/configuration/#environment-variables),
+you can use environment variables to handle both sensitive and non-sensitive
+data.
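+
+As a minimal sketch of this approach, the fragment below reads an API token
+from the environment instead of hardcoding it in the configuration file. The
+variable name `API_TOKEN` and the `api-key` header are hypothetical
+placeholders, not names required by any particular backend:
+
+```yaml
+exporters:
+  otlp:
+    endpoint: <ENDPOINT>
+    headers:
+      # API_TOKEN is a hypothetical variable name; set it in the
+      # Collector's environment, for example from a secret store.
+      api-key: ${env:API_TOKEN}
+```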
+
+### Use encryption and authentication
+
+Your OTel Collector configuration should include encryption and authentication.
+
+- For communication encryption, see
+  [Configuring certificates](/docs/collector/configuration/#setting-up-certificates).
+- For authentication, use the OTel Collector's authentication mechanism, as
+  described in [Authentication](/docs/collector/configuration/#authentication).
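+
+The two recommendations above might be combined as in the following sketch,
+which assumes the `bearertokenauth` extension from the Collector Contrib
+distribution is available in your build. The certificate paths and the
+`OTLP_TOKEN` variable are hypothetical placeholders:
+
+```yaml
+extensions:
+  bearertokenauth:
+    # Hypothetical variable name; the token is read from the environment.
+    token: ${env:OTLP_TOKEN}
+
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: localhost:4317
+        tls:
+          # Placeholder paths; point these at your own certificate and key.
+          cert_file: /path/to/server.crt
+          key_file: /path/to/server.key
+        auth:
+          authenticator: bearertokenauth
+
+service:
+  # An authenticator must also be listed here to be started.
+  extensions: [bearertokenauth]
+```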
+
+### Minimize the number of components
+
+We recommend limiting the set of components in your Collector configuration to
+only those you need. Minimizing the number of components you use minimizes the
+attack surface exposed.
+
+- Use the
+  [OpenTelemetry Collector Builder (`ocb`)](/docs/collector/custom-collector) to
+  create a Collector distribution that uses only the components you need.
+- Remove unused components from your configuration.
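+
+For illustration, a builder manifest for `ocb` that includes only an OTLP
+receiver, the batch processor, and an OTLP exporter could look like the sketch
+below. The distribution name, output path, and the `v0.112.0` module versions
+are assumptions for this example; use the versions that match your `ocb`
+release:
+
+```yaml
+dist:
+  name: otelcol-custom # hypothetical distribution name
+  description: Collector with a minimal set of components
+  output_path: ./otelcol-custom
+
+# Only the components listed here are compiled into the binary.
+receivers:
+  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.112.0
+processors:
+  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.112.0
+exporters:
+  - gomod: go.opentelemetry.io/collector/exporter/otlpexporter v0.112.0
+```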
+
+### Configure with care
+
+Some components can increase the security risk of your Collector pipelines.
+
+- Receivers, exporters, and other components should establish network
+  connections over a secure channel, potentially authenticated as well.
+- Receivers and exporters might expose buffer, queue, payload, and worker
+  settings using configuration parameters. If these settings are available, you
+  should proceed with caution before modifying the default configuration values.
+  Improperly setting these values might expose the OpenTelemetry Collector to
+  additional attack vectors.
+
+## Set permissions carefully
+
+Avoid running the Collector as a root user. Some components might require
+special permissions, however. In those cases, follow the principle of least
+privilege and make sure your components only have the access they need to do
+their job.
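+
+If you run the Collector in Kubernetes, one way to apply this guidance is a
+restrictive container `securityContext`. The following is a sketch only: the
+container name, the user ID, and the `<VERSION>` tag are placeholders, and some
+components might need fewer restrictions than shown here:
+
+```yaml
+containers:
+  - name: otel-collector # hypothetical container name
+    image: otel/opentelemetry-collector:<VERSION>
+    securityContext:
+      runAsNonRoot: true # refuse to start as root
+      runAsUser: 10001 # arbitrary non-root UID chosen for this example
+      allowPrivilegeEscalation: false
+      readOnlyRootFilesystem: true
+      capabilities:
+        drop: ['ALL'] # add back only what a component requires
+```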
+
+### Observers
+
+Observers are implemented as extensions. Extensions are a type of component that
+adds capabilities on top of the primary functions of the Collector. Extensions
+don't require direct access to telemetry and aren't part of pipelines, but they
+can still pose security risks if they require special permissions.
+
+An observer discovers networked endpoints such as a Kubernetes pod, Docker
+container, or local listening port on behalf of the
+[receiver creator](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/receivercreator/README.md).
+In order to discover services, observers might require greater access. For
+example, the `k8s_observer` requires
+[role-based access control (RBAC) permissions](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/extension/observer/k8sobserver#setting-up-rbac-permissions)
+in Kubernetes.
+
+## Manage specific security risks
+
+Configure your Collector to block these security threats.
+
+### Protect against denial of service attacks
+
+For server-like receivers and extensions, you can protect your Collector from
+exposure to the public internet or to wider networks than necessary by binding
+these components' endpoints to addresses that limit connections to authorized
+users. Try to always use specific interfaces, such as a pod's IP, or `localhost`
+instead of `0.0.0.0`. For more information, see
+[CWE-1327: Binding to an Unrestricted IP Address](https://cwe.mitre.org/data/definitions/1327.html).
+
+From Collector v0.110.0, the default host for all servers in Collector
+components is `localhost`. For earlier versions of the Collector, change the
+default endpoint from `0.0.0.0` to `localhost` in all components by enabling the
+`component.UseLocalHostAsDefaultHost`
+[feature gate](https://github.com/open-telemetry/opentelemetry-collector/tree/main/featuregate).
+
+If `localhost` resolves to a different IP due to your DNS settings, then
+explicitly use the loopback IP instead: `127.0.0.1` for IPv4 or `::1` for IPv6.
+For example, here's an IPv4 configuration using a gRPC port:
+
+```yaml
+receivers:
+  otlp:
+    protocols:
+      grpc:
+        endpoint: 127.0.0.1:4317
+```
+
+In IPv6 setups, make sure your system supports both IPv4 and IPv6 loopback
+addresses so the network functions properly in dual-stack environments and
+applications, where both protocol versions are used.
+
+If you are working in environments that have nonstandard networking setups, such
+as Docker or Kubernetes, see the
+[example configurations](https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/security-best-practices.md#safeguards-against-denial-of-service-attacks)
+in our component developer documentation for ideas on how to bind your component
+endpoints.
+
+### Scrub sensitive data
+
+[Processors](/docs/collector/configuration/#processors) are the Collector
+components that sit between receivers and exporters. They are responsible for
+processing telemetry before it's analyzed. You can use the OpenTelemetry
+Collector's `redaction` processor to obfuscate or scrub sensitive data before
+exporting it to a backend.
+
+The
+[`redaction` processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/redactionprocessor)
+deletes span, log, and metric datapoint attributes that don't match a list of
+allowed attributes. It also masks attribute values that match a blocked value
+list. Attributes that aren't on the allowed list are removed before any value
+checks are done.
+
+For example, here is a configuration that masks values containing credit card
+numbers:
+
+```yaml
+processors:
+  redaction:
+    allow_all_keys: false
+    allowed_keys:
+      - description
+      - group
+      - id
+      - name
+    ignored_keys:
+      - safe_attribute
+    blocked_values: # Regular expressions for blocking values of allowed span attributes
+      - '4[0-9]{12}(?:[0-9]{3})?' # Visa credit card number
+      - '(5[1-5][0-9]{14})' # MasterCard number
+    summary: debug
+```
+
+See the
+[documentation](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/redactionprocessor)
+to learn how to add the `redaction` processor to your Collector configuration.
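+
+As a rough sketch, a processor takes effect only once it is listed in a
+pipeline. The fragment below assumes that an `otlp` receiver and an `otlp`
+exporter are already defined elsewhere in the same configuration:
+
+```yaml
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      # Redact attributes before the data reaches any exporter.
+      processors: [redaction]
+      exporters: [otlp]
+```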
+
+### Safeguard resource utilization
+
+After implementing safeguards for resource utilization in your
+[hosting infrastructure](../hosting-best-practices/), consider also adding these
+safeguards to your OpenTelemetry Collector configuration.
+
+Batching your telemetry and limiting the memory available to your Collector can
+prevent out-of-memory errors and usage spikes. You can also handle traffic
+spikes by adjusting queue sizes to manage memory usage while avoiding data loss.
+For example, use the
+[`exporterhelper`](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md)
+to manage queue size for your `otlp` exporter:
+
+```yaml
+exporters:
+  otlp:
+    endpoint: <ENDPOINT>
+    sending_queue:
+      queue_size: 800
+```
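+
+The batching and memory limits mentioned above can be sketched with the
+`memory_limiter` and `batch` processors. Choosing these two processors is an
+assumption of this example, and the values shown are illustrative rather than
+recommended settings. The fragment also assumes that an `otlp` receiver and
+exporter are defined elsewhere:
+
+```yaml
+processors:
+  memory_limiter:
+    check_interval: 1s
+    limit_percentage: 80 # soft limit as a share of total available memory
+    spike_limit_percentage: 20
+  batch:
+    send_batch_size: 8192
+    timeout: 200ms
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      # Place memory_limiter first so it can refuse data before other work is done.
+      processors: [memory_limiter, batch]
+      exporters: [otlp]
+```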
+
+Filtering unwanted telemetry is another way you can protect your Collector's
+resources. Not only does filtering protect your Collector instance, but it also
+reduces the load on your backend. You can use the
+[`filter` processor](/docs/collector/transforming-telemetry/#basic-filtering) to
+drop logs, metrics, and spans you don't need. For example, here's a
+configuration that drops non-HTTP spans:
+
+```yaml
+processors:
+  filter:
+    error_mode: ignore
+    traces:
+      span:
+        - attributes["http.request.method"] == nil
+```
+
+You can also configure your components with appropriate timeout and retry
+limits. These limits should allow your Collector to handle failures without
+accumulating too much data in memory. See the
+[`exporterhelper` documentation](https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/README.md)
+for more information.
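+
+For example, the following sketch sets explicit timeout and retry limits on an
+`otlp` exporter by using the `exporterhelper` settings. The values are
+illustrative, so tune them for your own backend and traffic:
+
+```yaml
+exporters:
+  otlp:
+    endpoint: <ENDPOINT>
+    timeout: 5s # give up on a single request after 5 seconds
+    retry_on_failure:
+      enabled: true
+      initial_interval: 5s
+      max_interval: 30s
+      # Stop retrying, and drop the data, after this much time has elapsed.
+      max_elapsed_time: 300s
+```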
+
+Finally, consider using compression with your exporters to reduce the send size
+of your data and conserve network bandwidth, at the cost of some additional CPU
+for the compression itself. By default, the
+[`otlp` exporter](https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter)
+uses `gzip` compression.
diff --git a/content/en/docs/security/cve.md b/content/en/docs/security/cve.md
@@ -1,6 +1,6 @@
 ---
 title: Common Vulnerabilities and Exposures
-weight: 102
+weight: 100
 ---
 
 This is a list of reported Common Vulnerabilities and Exposures (CVEs) across

diff --git a/content/en/docs/security/hosting-best-practices.md b/content/en/docs/security/hosting-best-practices.md
@@ -0,0 +1,63 @@
+---
+title: Collector hosting best practices
+linkTitle: Collector hosting
+weight: 115
+---
+
+When setting up hosting for the OpenTelemetry (OTel) Collector, consider these
+best practices to better secure your hosting instance.
+
+## Store data securely
+
+Your Collector configuration file might contain sensitive data, including
+authentication tokens or TLS certificates. See the best practices for
+[securing your configuration](../config-best-practices/#create-secure-configurations).
+
+If you are storing telemetry for processing, make sure to restrict access to
+those directories to prevent tampering with raw data.
+
+## Keep your secrets safe
+
+Kubernetes [secrets](https://kubernetes.io/docs/concepts/configuration/secret/)
+are objects that hold confidential data, such as passwords, tokens, or keys
+used to authenticate and authorize privileged access. If you're using a
+Kubernetes deployment for your Collector, make sure to follow these
+[recommended practices](https://kubernetes.io/docs/concepts/security/secrets-good-practices/)
+to improve security for your clusters.
+
+## Apply the principle of least privilege
+
+The Collector should not require privileged access, except where the data it's
+collecting is in a privileged location. For example, in a Kubernetes deployment,
+system logs, application logs, and container runtime logs are often stored in a
+node volume that requires special permission to access. If your Collector is
+running as a daemonset on the node, make sure to grant only the specific volume
+mount permissions it needs to access these logs and no more. You can configure
+privileged access with role-based access control (RBAC). See
+[RBAC good practices](https://kubernetes.io/docs/concepts/security/rbac-good-practices/)
+for more information.
+
+## Control access to server-like components
+
+Some Collector components such as receivers and exporters can function like
+servers. To limit access to authorized users, you should:
+
+- Enable authentication by using, for example, the bearer token or basic
+  authentication extensions.
+- Restrict the IPs that your Collector runs on.
+
+## Safeguard resource utilization
+
+Use the Collector's own
+[internal telemetry](/docs/collector/internal-telemetry/) to monitor its
+performance. Collect metrics from the Collector about its CPU usage, memory
+usage, and throughput, and set alerts for resource exhaustion.
+
+If resource limits are reached, consider horizontally
+[scaling the Collector](/docs/collector/scaling/) by deploying multiple
+instances in a load-balanced configuration. Scaling your Collector distributes
+the resource demands and prevents bottlenecks.
+
+Once you secure resource utilization in your deployment, make sure your
+Collector instance also uses
+[safeguards in its configuration](../config-best-practices/#safeguard-resource-utilization).
diff --git a/content/en/docs/security/security-response.md b/content/en/docs/security/security-response.md
@@ -1,5 +1,5 @@
 ---
-title: Community Incident Response Guidelines
+title: Community incident response guidelines
 weight: 102
 ---
 

diff --git a/static/refcache.json b/static/refcache.json
@@ -8755,6 +8755,10 @@
     "StatusCode": 206,
     "LastSeen": "2024-08-09T10:44:30.895853-04:00"
   },
+  "https://kubernetes.io/docs/concepts/configuration/secret/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-10-17T20:41:39.419625448-07:00"
+  },
   "https://kubernetes.io/docs/concepts/configuration/secret/#using-a-secret": {
     "StatusCode": 206,
     "LastSeen": "2024-04-25T00:01:05.630302-04:00"
@@ -8783,6 +8787,14 @@
     "StatusCode": 206,
     "LastSeen": "2024-08-09T10:45:22.265624-04:00"
   },
+  "https://kubernetes.io/docs/concepts/security/rbac-good-practices/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-10-28T23:48:13.923440181Z"
+  },
+  "https://kubernetes.io/docs/concepts/security/secrets-good-practices/": {
+    "StatusCode": 206,
+    "LastSeen": "2024-10-17T20:41:39.602462106-07:00"
+  },
   "https://kubernetes.io/docs/concepts/services-networking/service/": {
     "StatusCode": 206,
     "LastSeen": "2024-01-30T06:06:10.439014-05:00"