Adds documentation for DocumentDB as a source. #7137

Merged
merged 35 commits into from
May 24, 2024
79d3f43
Adds documentation for DocumentDB as a source.
dlvenable May 14, 2024
e444d3a
Merge branch 'main' into data-prepper-documentdb
vagimeli May 20, 2024
c82878e
Apply suggestions from code review
dlvenable May 20, 2024
a5d9d8a
Made some other changes to the documentation per the PR.
dlvenable May 24, 2024
fc88eda
Merge branch 'main' into data-prepper-documentdb
vagimeli May 24, 2024
f59c570
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
8c9b630
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
dd1130d
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
960f6ce
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
b8372f6
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
d19bfb7
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
f7e120e
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
ed6daae
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
7d76a74
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
bc33f53
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
134c4af
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
bf743ab
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
cbb33f1
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
666752c
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
79dc780
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
797ad86
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
c8da4bb
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
b78c736
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
78d551c
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
176dfa7
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
a350940
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
a66cc1c
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
9e7e8fb
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
07c6f25
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
df48ebc
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
6abf7bd
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
4e3e546
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
69102ee
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
dd10324
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
cd27480
Update _data-prepper/pipelines/configuration/sources/documentdb.md
vagimeli May 24, 2024
98 changes: 98 additions & 0 deletions _data-prepper/pipelines/configuration/sources/documentdb.md
@@ -0,0 +1,98 @@
---
layout: default
title: documentdb
parent: Sources
grand_parent: Pipelines
nav_order: 2
---

# documentdb

The `documentdb` source reads documents from [Amazon DocumentDB](https://aws.amazon.com/documentdb/) collections.
It can read historical data from an export and keep up to date on the data using Amazon DocumentDB [change streams](https://docs.aws.amazon.com/documentdb/latest/developerguide/change_streams.html).

The `documentdb` source reads data from Amazon DocumentDB and puts that data into an [Amazon Simple Storage Service (Amazon S3)](https://aws.amazon.com/s3/) bucket.
Then, other Data Prepper workers read from the S3 bucket to process data.

## Usage
The following example pipeline uses the `documentdb` source:

```yaml
version: "2"
documentdb-pipeline:
  source:
    documentdb:
      host: "docdb-mycluster.cluster-random.us-west-2.docdb.amazonaws.com"
      port: 27017
      authentication:
        username: ${{aws_secrets:secret:username}}
        password: ${{aws_secrets:secret:password}}
      aws:
        sts_role_arn: "arn:aws:iam::123456789012:role/MyRole"
      s3_bucket: my-bucket
      s3_region: us-west-2
      collections:
        - collection: my-collection
          export: true
          stream: true
      acknowledgments: true
```
{% include copy-curl.html %}

## Configuration

You can use the following options to configure the `documentdb` source.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`host` | Yes | String | The hostname of the Amazon DocumentDB cluster.
`port` | No | Integer | The port number of the Amazon DocumentDB cluster. Defaults to `27017`.
`trust_store_file_path` | No | String | The path to a truststore file that contains the public certificate for the Amazon DocumentDB cluster.
`trust_store_password` | No | String | The password for the truststore specified by `trust_store_file_path`.
`authentication` | Yes | Authentication | The authentication configuration. See the [authentication](#authentication) section for more information.
`collections` | Yes | List | A list of collection configurations. Exactly one collection is required. See the [collection](#collection) section for more information.
`s3_bucket` | Yes | String | The S3 bucket to use for processing events from Amazon DocumentDB.
`s3_prefix` | No | String | An optional Amazon S3 key prefix. By default, there is no key prefix.
`s3_region` | No | String | The AWS Region in which the S3 bucket resides.
`aws` | Yes | AWS | The AWS configuration. See the [aws](#aws) section for more information.
`id_key` | No | String | When specified, the value of the Amazon DocumentDB `_id` field is added to the event under the key name specified by `id_key`. You can use this when you need more information than is provided by the `ObjectId` string saved to your sink. By default, the `_id` is not included as part of the event.
`direct_connection` | No | Boolean | When `true`, the MongoDB driver connects directly to the specified Amazon DocumentDB server(s) without discovering and connecting to the entire replica set. Defaults to `true`.
`read_preference` | No | String | Determines how to read from Amazon DocumentDB. See [Read Preference Modes](https://www.mongodb.com/docs/v3.6/reference/read-preference/#read-preference-modes) for more information. Defaults to `primaryPreferred`.
`disable_s3_read_for_leader` | No | Boolean | When `true`, the current leader node does not read from Amazon S3. It only reads the stream. Defaults to `false`.
`partition_acknowledgment_timeout` | No | Duration | Configures the amount of time during which the node holds a partition. Defaults to `2h`.
`acknowledgments` | No | Boolean | When set to `true`, enables [end-to-end acknowledgments]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/pipelines#end-to-end-acknowledgments) on the source after events are sent to the sinks.
`insecure` | No | Boolean | Disables TLS. Defaults to `false`. Do not use this value in production.
`ssl_insecure_disable_verification` | No | Boolean | Disables TLS hostname verification. Defaults to `false`. Do not enable this flag in production. Instead, use the `trust_store_file_path` to verify the hostname.
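
As a rough sketch of how several of the optional settings in the preceding table might be combined, the following fragment verifies the cluster certificate using a truststore, stores the `_id` value under a custom key, prefers secondary reads, and shortens the partition hold time. The truststore path and file type, the password placeholder, the `document_id` key name, and the chosen values are illustrative assumptions. Only the option names come from the table. The sink is omitted for brevity:

```yaml
documentdb-pipeline:
  source:
    documentdb:
      host: "docdb-mycluster.cluster-random.us-west-2.docdb.amazonaws.com"
      authentication:
        username: ${{aws_secrets:secret:username}}
        password: ${{aws_secrets:secret:password}}
      aws:
        sts_role_arn: "arn:aws:iam::123456789012:role/MyRole"
      s3_bucket: my-bucket
      collections:
        - collection: my-collection
      # Hypothetical truststore location and password placeholder.
      trust_store_file_path: "/usr/share/data-prepper/certs/truststore.jks"
      trust_store_password: "<truststore-password>"
      # Hypothetical key name passed to `id_key`.
      id_key: "document_id"
      # One of the MongoDB read preference modes (default is primaryPreferred).
      read_preference: "secondaryPreferred"
      # Illustrative value; the default is 2h.
      partition_acknowledgment_timeout: "1h"
```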

### `authentication`

The following parameters enable you to configure authentication for the Amazon DocumentDB cluster.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`username` | Yes | String | The username to use when authenticating with the Amazon DocumentDB cluster. Supports automatic refresh.
`password` | Yes | String | The password to use when authenticating with the Amazon DocumentDB cluster. Supports automatic refresh.

### `collection`

The following parameters enable you to configure the `collection` to read from the Amazon DocumentDB cluster.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`collection` | Yes | String | The name of the collection.
`export` | No | Boolean | Whether to include an export or a full load. Defaults to `true`.
`stream` | No | Boolean | Whether to enable a stream. Defaults to `true`.
`partition_count` | No | Integer | Defines the number of partitions to create in Amazon S3. Defaults to `100`.
`export_batch_size` | No | Integer | Defaults to `10,000`.
`stream_batch_size` | No | Integer | Defaults to `1,000`.
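
For illustration, a stream-only collection entry might look like the following fragment, which skips the initial export and overrides two of the defaults. The collection name and the numeric values are arbitrary assumptions, not recommendations:

```yaml
collections:
  - collection: my-collection
    # Skip the initial export and read only from the change stream.
    export: false
    stream: true
    # Illustrative overrides; the defaults are 100 and 1,000, respectively.
    partition_count: 50
    stream_batch_size: 500
```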

### `aws`

The following parameters enable you to configure your access to Amazon DocumentDB.

Option | Required | Type | Description
:--- | :--- | :--- | :---
`sts_role_arn` | No | String | The AWS Security Token Service (AWS STS) role to assume for requests to Amazon Simple Queue Service (Amazon SQS) and Amazon S3. Defaults to `null`, which uses the [standard SDK behavior for credentials](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials.html).
`aws_sts_header_overrides` | No | Map | A map of header overrides that the AWS Identity and Access Management (IAM) role assumes for the source plugin.
`sts_external_id` | No | String | An external STS ID used when Data Prepper assumes the STS role. See `ExternalID` in the [STS AssumeRole](https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html) API reference documentation.
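
The following fragment sketches an `aws` block that sets all three options. The external ID value and the header name and value are hypothetical placeholders. The role ARN is the same example ARN used in the usage section:

```yaml
aws:
  sts_role_arn: "arn:aws:iam::123456789012:role/MyRole"
  # Hypothetical external ID expected by the role's trust policy.
  sts_external_id: "my-external-id"
  # Hypothetical header override; replace with the headers you need.
  aws_sts_header_overrides:
    x-example-header: "example-value"
```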