Skip to content

Commit

Permalink
Major push for Jaeger v2 docs updates (#735)
Browse files Browse the repository at this point in the history
## Description of the changes
Lots of changes per the new menu format, also updated a lot of content
but more to go around configurations.

## How was this change tested?
make develop

## Checklist
- [X] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [X] I have signed all commits

---------

Signed-off-by: Jonah Kowall <[email protected]>
  • Loading branch information
jkowall authored Sep 11, 2024
1 parent 6c61b39 commit 832a3ec
Show file tree
Hide file tree
Showing 27 changed files with 3,977 additions and 2 deletions.
6 changes: 4 additions & 2 deletions content/docs/next-release-v2/_index.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,11 @@ weight: 1
children:
- title: Features
url: features
- title: Migration
url: migration
---

Welcome to Jaeger v2 documentation portal! Below, you'll find information for beginners and experienced Jaeger users.
Welcome to Jaeger's documentation! Below, you'll find information for beginners and experienced Jaeger users.

If you cannot find what you are looking for, or have an issue not covered here, we'd love to [hear from you](/get-in-touch).

Expand All @@ -28,7 +30,7 @@ Uber published a blog post, [Evolving Distributed Tracing at Uber](https://eng.u

* [OpenTracing](https://opentracing.io/)-inspired data model
* [OpenTelemetry](https://opentelemetry.io/) compatible
* Multiple built-in storage backends: Cassandra, Elasticsearch and in-memory
* Multiple built-in storage backends: Cassandra, Elasticsearch, OpenSearch, and in-memory
* Community supported external storage backends via the gRPC plugin: [ClickHouse](https://github.com/jaegertracing/jaeger-clickhouse)
* System topology graphs
* Adaptive sampling
Expand Down
11 changes: 11 additions & 0 deletions content/docs/next-release-v2/advanced.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
title: Advanced
weight: 6
children:
- title: APIs
url: apis
- title: Tools
url: tools
---

For other advanced topics of Jaeger see the subpages.
144 changes: 144 additions & 0 deletions content/docs/next-release-v2/apis.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,144 @@
---
title: APIs
hasparent: true
---

Jaeger components implement various APIs for saving or retrieving trace data.

The following labels are used to describe API compatibility guarantees.

* **stable** - the API guarantees backwards compatibility. If breaking changes are going to be made in the future, they will result in a new API version, e.g. `/api/v2` URL prefix or a different namespace in the IDL.
* **internal** - the APIs are intended for internal communications between Jaeger components and are not recommended for use by external components.
* **deprecated** - the APIs that are only maintained for legacy reasons and will be phased out in the future.

Since Jaeger v1.32, **jaeger-collector** and **jaeger-query** Service ports that serve gRPC endpoints enable [gRPC reflection][grpc-reflection]. Unfortunately, the internally used `gogo/protobuf` has a [compatibility issue][gogo-reflection] with the official `golang/protobuf`, and as a result only the `list` reflection command is currently working properly.

## Span reporting APIs

**jaeger-agent** and **jaeger-collector** are the two components of the Jaeger backend that can receive spans. At this time they support two sets of non-overlapping APIs.

### OpenTelemetry Protocol (stable)

Since v1.35, the Jaeger backend can receive trace data from the OpenTelemetry SDKs in their native [OpenTelemetry Protocol (OTLP)][otlp]. It is no longer necessary to configure the OpenTelemetry SDKs with Jaeger exporters, nor deploy the OpenTelemetry Collector between the OpenTelemetry SDKs and the Jaeger backend.

The OTLP data is accepted in these formats: (1) binary gRPC, (2) Protobuf over HTTP, (3) JSON over HTTP. For more details on the OTLP receiver see the [official documentation][otlp-rcvr]. Note that not all configuration options are supported in **jaeger-collector** (see `--collector.otlp.*` [CLI Flags](../cli/#jaeger-collector)), and only tracing data is accepted, since Jaeger does not store other telemetry types.

| Port | Protocol | Endpoint | Format
| ----- | ------- | ------------ | ----
| 4317 | gRPC | n/a | Protobuf
| 4318 | HTTP | `/v1/traces` | Protobuf or JSON

[otlp-rcvr]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/receiver/otlpreceiver/README.md
[otlp]: https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md

### Thrift over UDP (stable)

**jaeger-agent** can only receive spans over UDP in Thrift format. The primary API is a UDP packet that contains a Thrift-encoded `Batch` struct defined in the [jaeger.thrift] IDL file, located in the [jaeger-idl] repository. Most Jaeger Clients use Thrift's `compact` encoding, however some client libraries do not support it (notably, Node.js) and use Thrift's `binary` encoding (sent to a different UDP port). **jaeger-agent**'s API is defined by the [agent.thrift] IDL file.

For legacy reasons, **jaeger-agent** also accepts spans over UDP in Zipkin format, however, only very old versions of Jaeger clients can send data in that format and it is officially deprecated.

### Protobuf via gRPC (stable)

In a typical Jaeger deployment, **jaeger-agent**s receive spans from Clients and forward them to **jaeger-collector**s. Since Jaeger v1.11, the official and recommended protocol between **jaeger-agent**s and **jaeger-collector**s is `jaeger.api_v2.CollectorService` gRPC endpoint defined in [collector.proto] IDL file. The same endpoint can be used to submit trace data from SDKs directly to **jaeger-collector**.

### Thrift over HTTP (stable)

In some cases it is not feasible to deploy **jaeger-agent** next to the application, for example, when the application code is running as a serverless function. In these scenarios the SDKs can be configured to submit spans directly to **jaeger-collector**s over HTTP/HTTPS.

The same [jaeger.thrift] payload can be submitted in an HTTP POST request to the `/api/traces` endpoint, for example, `https://jaeger-collector:14268/api/traces`. The `Batch` struct needs to be encoded using Thrift's `binary` encoding, and the HTTP request should specify the content type header:

```
Content-Type: application/vnd.apache.thrift.binary
```

### JSON over HTTP (n/a)

There is no official Jaeger JSON format that can be accepted by **jaeger-collector**.
Jaeger does accept the OpenTelemetry protocol via JSON (see [above](#opentelemetry-protocol-stable)).

### Zipkin Formats (stable)

**jaeger-collector** can also accept spans in several Zipkin data formats, namely JSON v1/v2 and Thrift. **jaeger-collector** needs to be configured to enable Zipkin HTTP server, e.g. on port 9411 used by Zipkin collectors. The server enables two endpoints that expect POST requests:

* `/api/v1/spans` for submitting spans in Zipkin JSON v1 or Zipkin Thrift format.
* `/api/v2/spans` for submitting spans in Zipkin JSON v2.

## Trace retrieval APIs

Traces saved in the storage can be retrieved by calling **jaeger-query** Service.

### gRPC/Protobuf (stable)

The recommended way for programmatically retrieving traces and other data is via the `jaeger.api_v2.QueryService` gRPC endpoint defined in [query.proto] IDL file. In the default configuration this endpoint is accessible from `jaeger-query:16685`.

### HTTP JSON (internal)

Jaeger UI communicates with **jaeger-query** Service via JSON API. For example, a trace can be retrieved via a GET request to `https://jaeger-query:16686/api/traces/{trace-id-hex-string}`. This JSON API is intentionally undocumented and subject to change.

## Remote Storage API (stable)

When using the `grpc` storage type (a.k.a. [remote storage](../deployment/#remote-storage)), Jaeger components can use custom storage backends as long as those backends implement the gRPC [Remote Storage API][storage.proto].

## Remote Sampling Configuration (stable)

This API supports Jaeger's [Remote Sampling](../sampling/#remote-sampling) protocol, defined in the [sampling.proto] IDL file.

Both **jaeger-agent** and **jaeger-collector** implement the API. See [Remote Sampling](../sampling/#remote-sampling) for details on how to configure the Collector with sampling strategies. **jaeger-agent** is merely acting as a proxy to **jaeger-collector**.

The following table lists different endpoints and formats that can be used to query for sampling strategies. The official HTTP/JSON endpoints use standard [Protobuf-to-JSON mapping](https://developers.google.com/protocol-buffers/docs/proto3#json).

Component | Port | Endpoint | Format | Notes
--------- | ----- | ----------------- | --------- | -----
Collector | 14268 | `/api/sampling` | HTTP/JSON | Recommended for most SDKs
Collector | 14250 | [sampling.proto] | gRPC | For SDKs that want to use gRPC (e.g. OpenTelemetry Java SDK)
Agent | 5778 | `/sampling` | HTTP/JSON | Recommended for most SDKs if the Agent is used in a deployment
Agent | 5778 | `/` (deprecated) | HTTP/JSON | Legacy format, with enums encoded as numbers. **Not recommended.**

**Examples**

Run all-in-one in one terminal:
```shell
$ go run ./cmd/all-in-one \
--sampling.strategies-file=cmd/all-in-one/sampling_strategies.json
```

Query different endpoints in another terminal:
```shell
# Collector
$ curl "http://localhost:14268/api/sampling?service=foo"
{"strategyType":"PROBABILISTIC","probabilisticSampling":{"samplingRate":1}}

# Agent
$ curl "http://localhost:5778/sampling?service=foo"
{"strategyType":"PROBABILISTIC","probabilisticSampling":{"samplingRate":1}}

# Agent, legacy endpoint / (not recommended)
$ curl "http://localhost:5778/?service=foo"
{"strategyType":0,"probabilisticSampling":{"samplingRate":1}}
```

## Service dependencies graph (internal)

Can be retrieved from**jaeger-query** Service at `/api/dependencies` endpoint. The GET request expects two parameters:

* `endTs` (number of milliseconds since epoch) - the end of the time interval
* `lookback` (in milliseconds) - the length the time interval (i.e. start-time + lookback = end-time).

The returned JSON is a list of edges represented as tuples `(caller, callee, count)`.

For programmatic access to the service graph, the recommended API is gRPC/Protobuf described above.

## Service Performance Monitoring (internal)

Please refer to the [SPM Documentation](../spm#api)

[jaeger-idl]: https://github.com/jaegertracing/jaeger-idl/
[jaeger.thrift]: https://github.com/jaegertracing/jaeger-idl/blob/main/thrift/jaeger.thrift
[agent.thrift]: https://github.com/jaegertracing/jaeger-idl/blob/main/thrift/agent.thrift
[sampling.thrift]: https://github.com/jaegertracing/jaeger-idl/blob/main/thrift/sampling.thrift
[collector.proto]: https://github.com/jaegertracing/jaeger-idl/blob/main/proto/api_v2/collector.proto
[query.proto]: https://github.com/jaegertracing/jaeger-idl/blob/main/proto/api_v2/query.proto
[sampling.proto]: https://github.com/jaegertracing/jaeger-idl/blob/main/proto/api_v2/sampling.proto
[grpc-reflection]: https://github.com/grpc/grpc-go/blob/master/Documentation/server-reflection-tutorial.md#enable-server-reflection
[gogo-reflection]: https://jbrandhorst.com/post/gogoproto/#reflection
[storage.proto]: https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/grpc/proto/storage.proto
81 changes: 81 additions & 0 deletions content/docs/next-release-v2/architecture.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
---
title: Architecture
hasparent: true
weight: 3
children:
- title: APIs
url: apis
- title: Sampling
url: sampling
---

## Terminology

Jaeger represents tracing data in a data model inspired by the [OpenTracing Specification](https://github.com/opentracing/specification/blob/master/specification.md). The data model is logically very similar to [OpenTelemetry Traces](https://opentelemetry.io/docs/concepts/signals/traces/), with some naming differences:

| Jaeger | OpenTelemetry | Notes |
| -------------------- | --------------- | ----------------------------------------------------------------------- |
| Tags | Attributes | Both support typed values, but nested tags are not supported in Jaeger. |
| Span Logs | Span Events | Point-in-time events on the span recorded in a structured form. |
| Span References | Span Links | Jaeger's Span References have a required type (`child-of` or `follows-from`) and always refer to predecessor spans; OpenTelemetry's Span Links have no type, but allow attributes. |
| Process | Resource | A struct describing the entity that produces the telemetry. |

### Span

A **span** represents a logical unit of work that has an operation name, the start time of the operation, and the duration. Spans may be nested and ordered to model causal relationships.

![Traces And Spans](/img/spans-traces.png)

### Trace

A **trace** represents the data or execution path through the system. It can be thought of as a directed acyclic graph of spans.

### Baggage

**Baggage** is arbitrary user-defined metadata (key-value pairs) that can be attached to distributed context and propagated by the tracing SDKs. See [W3C Baggage](https://www.w3.org/TR/baggage/) for more information.

## Architecture

Jaeger can be deployed either as an **all-in-one** binary, where all Jaeger backend components
run in a single process, or as a scalable distributed system. There are two main deployment options discussed below.

### Direct to storage

In this deployment Jaeger receives the data from traced applications and writes it directly to storage. The storage must be able to handle both average and peak traffic. Collectors use an in-memory queue to smooth short-term traffic peaks, but a sustained traffic spike may result in dropped data if the storage is not able to keep up.

![Architecture](/img/architecture-v1-2023.png)

### Via Kafka

To prevent data loss between collectors and storage, Kafka can be used as an intermediary, persistent queue. Jaeger can be deployed with OpenTelemetry to handle writing the data to Kafka and pulling it off the queue and writing the data to the storage. Multiple Jaeger instances can be deployed to scale up ingestion; they will automatically partition the load across them.

![Architecture](/img/architecture-v2-2023.png)

### With OpenTelemetry Collector

You **do not need to use OpenTelemetry Collector**, because **Jaeger** is a customized distribution of the OpenTelemetry Collector with different roles. However, if you already use the OpenTelemetry Collectors, for gathering other types of telemetry or for pre-processing / enriching the tracing data, it __can be placed before__ **Jaeger**. The OpenTelemetry Collectors can be run as an application sidecar, as a host agent / daemon, or as a central cluster.

The OpenTelemetry Collector supports Jaeger's Remote Sampling protocol and can either serve static configurations from config files directly, or proxy the requests to the Jaeger backend (e.g., when using adaptive sampling).

![Architecture](/img/architecture-otel.png)

#### OpenTelemetry Collector as a sidecar / host agent

Benefits:

* The SDK configuration is simplified as both trace export endpoint and sampling config endpoint can point to a local host and not worry about discovering where those services run remotely.
* Collector may provide data enrichment by adding environment information, like k8s pod name.
* Resource usage for data enrichment can be distributed across all application hosts.

Downsides:

* An extra layer of marshaling/unmarshaling the data.

#### OpenTelemetry Collector as a remote cluster

Benefits:
* Sharding capabilities, e.g., when using [tail-based sampling](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/README.md).

Downsides:

* An extra layer of marshaling/unmarshaling the data.
22 changes: 22 additions & 0 deletions content/docs/next-release-v2/badger.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
---
title: Badger
hasparent: true
---

[Badger](https://github.com/dgraph-io/badger) is an embedded local storage, only available
with **all-in-one** distribution. By default, it acts as ephemeral storage using a temporary file system.
This can be overridden by using the `--badger.ephemeral=false` option.

```sh
docker run \
-e SPAN_STORAGE_TYPE=badger \
-e BADGER_EPHEMERAL=false \
-e BADGER_DIRECTORY_VALUE=/badger/data \
-e BADGER_DIRECTORY_KEY=/badger/key \
-v <storage_dir_on_host>:/badger \
-p 16686:16686 \
jaegertracing/all-in-one:{{< currentVersion >}}
```

* [Upgrade Badger v1 to v3](https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/badger/docs/upgrade-v1-to-v3.md)
* [Badger file permissions as non-root service](https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/badger/docs/storage-file-non-root-permission.md)
103 changes: 103 additions & 0 deletions content/docs/next-release-v2/cassandra.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
---
title: Cassandra
hasparent: true
---

* Supported versions: 3.4+

Deploying Cassandra itself is out of scope for our documentation. One good
source of documentation is the [Apache Cassandra Docs](https://cassandra.apache.org/doc/latest/).

Cassandra also has the following officially supported resources available from the community:
- [Docker container](https://hub.docker.com/_/cassandra) for getting a single node up quickly
- [Helm chart](https://artifacthub.io/packages/helm/bitnami/cassandra) from Bitnami
- [Kubernetes Operator](https://github.com/k8ssandra/cass-operator) from DataStax

#### Configuration
##### Minimal
```sh
docker run \
-e SPAN_STORAGE_TYPE=cassandra \
-e CASSANDRA_SERVERS=<...> \
jaegertracing/jaeger-collector:{{< currentVersion >}}
```

Note: White space characters are allowed in `CASSANDRA_SERVERS`. For Example: Servers can be passed as `CASSANDRA_SERVERS="1.2.3.4, 5.6.7.8" for better readability.

##### All options
To view the full list of configuration options, you can run the following command:
```sh
docker run \
-e SPAN_STORAGE_TYPE=cassandra \
jaegertracing/jaeger-collector:{{< currentVersion >}} \
--help
```

#### Schema script

A script is provided to initialize Cassandra keyspace and schema
using Cassandra's interactive shell [`cqlsh`][cqlsh]:

```sh
MODE=test sh ./plugin/storage/cassandra/schema/create.sh | cqlsh
```

Or using the published Docker image (make sure to provide the right IP address):
```sh
docker run \
-e CQLSH_HOST={server IP address} \
jaegertracing/jaeger-cassandra-schema:{{< currentVersion >}}
```

For production deployment, pass `MODE=prod DATACENTER={datacenter}` arguments to the script,
where `{datacenter}` is the name used in the Cassandra configuration / network topology.

The script also allows overriding TTL, keyspace name, replication factor, etc.
Run the script without arguments to see the full list of recognized parameters.

**Note**: See [README](https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/cassandra/schema/README.md) for more details on Cassandra schema management.

#### TLS support

Jaeger supports TLS client to node connections as long as you've configured
your Cassandra cluster correctly. After verifying with e.g. `cqlsh`, you can
configure the collector and query like this:

```sh
docker run \
-e CASSANDRA_SERVERS=<...> \
-e CASSANDRA_TLS=true \
-e CASSANDRA_TLS_SERVER_NAME="CN-in-certificate" \
-e CASSANDRA_TLS_KEY=<path to client key file> \
-e CASSANDRA_TLS_CERT=<path to client cert file> \
-e CASSANDRA_TLS_CA=<path to your CA cert file> \
jaegertracing/jaeger-collector:{{< currentVersion >}}
```

The schema tool also supports TLS. You need to make a custom cqlshrc file like
so:

```
# Creating schema in a cassandra cluster requiring client TLS certificates.
#
# Create a volume for the schema docker container containing four files:
# cqlshrc: this file
# ca-cert: the cert authority for your keys
# client-key: the keyfile for your client
# client-cert: the cert file matching client-key
#
# if there is any sort of DNS mismatch and you want to ignore server validation
# issues, then uncomment validate = false below.
#
# When running the container, map this volume to /root/.cassandra and set the
# environment variable CQLSH_SSL=--ssl
[ssl]
certfile = ~/.cassandra/ca-cert
userkey = ~/.cassandra/client-key
usercert = ~/.cassandra/client-cert
# validate = false
```

#### Compatible Backends

* ScyllaDB [can be used](https://github.com/jaegertracing/jaeger/blob/main/plugin/storage/scylladb/README.md) as a drop-in replacement for Cassandra since it uses the same data model and query language.
Loading

0 comments on commit 832a3ec

Please sign in to comment.