Checkpoint shards and rotate through them for DynamoDB streams source #5208

graytaylor0 · 2024-11-20T21:25:50Z

Is your feature request related to a problem? Please describe.
Currently, the DynamoDB source grabs up to 150 active shards in one Data Prepper container and holds onto them until each shard is closed and the end of its shard iterator is reached. That happens either after 4 hours or once the shard has accumulated a certain amount of data.

This means that for DynamoDB tables with a large number of shards on their streams, many Data Prepper containers (a minimum of shard count / 150) must be used to achieve low latency on the DDB stream data, regardless of how much data is being sent to the streams.
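To make the "shard count / 150" minimum concrete, here is a small arithmetic sketch. The 150-shard limit comes from the description above; the function name `min_containers` is made up for illustration and is not part of Data Prepper.

```python
import math

# Per-container shard limit as stated in this issue.
MAX_SHARDS_PER_CONTAINER = 150


def min_containers(shard_count: int, per_container: int = MAX_SHARDS_PER_CONTAINER) -> int:
    """Smallest number of containers so that every active shard has an owner."""
    if shard_count < 0 or per_container <= 0:
        raise ValueError("shard_count must be >= 0 and per_container must be > 0")
    # Round up: a partially filled container is still a whole container.
    return math.ceil(shard_count / per_container)


if __name__ == "__main__":
    for shards in (150, 151, 1000):
        print(shards, "shards ->", min_containers(shards), "container(s)")
```

A stream with 1000 active shards therefore needs at least 7 containers today, even if most of those shards carry almost no traffic.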

Describe the solution you'd like
A single Data Prepper container should take ownership of a shard, process it for some time, then checkpoint it with a sequence number before giving up that shard and moving to the next one. This would allow one Data Prepper container to process all of the shards in a DynamoDB stream in a somewhat timely manner, with the trade-off that latency may be slightly higher compared with using a large number of Data Prepper containers.
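The rotation described above could look roughly like the following in-memory sketch. It is only an illustration of the idea, not Data Prepper code: shards are plain lists, the checkpoint is a list index standing in for a stream sequence number, and `max_records_per_turn` stands in for "process it for some time". All names here are hypothetical.

```python
from collections import deque
from typing import Dict, List


def rotate_through_shards(shards: Dict[str, List[str]], max_records_per_turn: int) -> List[str]:
    """Process every shard with a single worker by rotating between them.

    Returns the records in the order the worker processed them.
    """
    if max_records_per_turn <= 0:
        raise ValueError("max_records_per_turn must be positive")

    # Checkpoint per shard: index of the next unread record
    # (a stand-in for a persisted sequence number).
    checkpoints: Dict[str, int] = {shard_id: 0 for shard_id in shards}
    queue = deque(shards)  # shards waiting for their next turn
    processed: List[str] = []

    while queue:
        shard_id = queue.popleft()  # take ownership of the next shard
        start = checkpoints[shard_id]  # resume from the last checkpoint
        batch = shards[shard_id][start:start + max_records_per_turn]
        processed.extend(batch)
        checkpoints[shard_id] = start + len(batch)  # checkpoint before releasing

        # Give up the shard; revisit it later only if records remain.
        if checkpoints[shard_id] < len(shards[shard_id]):
            queue.append(shard_id)

    return processed


if __name__ == "__main__":
    demo = {"shard-a": ["a1", "a2", "a3"], "shard-b": ["b1"], "shard-c": ["c1", "c2"]}
    print(rotate_through_shards(demo, max_records_per_turn=2))
```

In this sketch a busy shard such as `shard-a` is interrupted after two records so that the quieter shards get a turn, which is where the extra latency in the trade-off comes from. A real implementation would bound each turn by time as well as record count and would store checkpoints durably so another container can resume from them.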



graytaylor0 added the untriaged label Nov 20, 2024

github-project-automation bot added this to Data Prepper Tracking Board Nov 20, 2024

github-project-automation bot moved this to Unplanned in Data Prepper Tracking Board Nov 20, 2024
