Checkpoint shards and rotate through them for DynamoDB streams source #5208

Open
graytaylor0 opened this issue Nov 20, 2024 · 0 comments

@graytaylor0
Member

Is your feature request related to a problem? Please describe.
Currently, the DynamoDB source acquires up to 150 active shards in a single Data Prepper container and keeps ownership of those shards until each shard is closed and the end of its shard iterator is reached. This happens either after 4 hours or once the shard has accumulated a certain amount of data.

This means that for DynamoDB tables whose streams have a large number of shards, many Data Prepper containers (at least shard count / 150, rounded up) must be used to achieve low latency on the DynamoDB stream data, regardless of how much data is actually being written to the stream.
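To make the scaling requirement concrete, here is a small sketch of the container count implied by the 150-shard cap. The class and method names (`ContainerEstimate`, `minimumContainers`) are hypothetical and not part of Data Prepper; the only input taken from the issue is the 150-shards-per-container limit. For example, a stream with 600 active shards needs at least 4 containers under the current model, even if very little data flows through it.

```java
// Hypothetical helper: minimum number of containers needed so that every active shard
// has an owner under the current model, where each container holds at most 150 shards.
final class ContainerEstimate {
    // Per-container shard cap described in the issue.
    static final int MAX_SHARDS_PER_CONTAINER = 150;

    private ContainerEstimate() {}

    // Ceiling division: a partially filled container still counts as one container.
    static int minimumContainers(int activeShards) {
        if (activeShards < 0) {
            throw new IllegalArgumentException("activeShards must be non-negative");
        }
        return (activeShards + MAX_SHARDS_PER_CONTAINER - 1) / MAX_SHARDS_PER_CONTAINER;
    }

    public static void main(String[] args) {
        for (int shards : new int[] {100, 150, 151, 600}) {
            System.out.println(shards + " shards -> " + minimumContainers(shards) + " container(s)");
        }
    }
}
```

Note that the data volume never appears in the calculation: under the current model the container count is driven purely by the shard count.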

Describe the solution you'd like
A single Data Prepper container should take ownership of a shard, process it for some time, checkpoint it with a sequence number, and then give up that shard and move on to the next one. This would allow one Data Prepper container to process all of the shards in a DynamoDB stream in a reasonably timely manner, with the trade-off that latency may be slightly higher when a large number of Data Prepper containers is used.
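A minimal sketch of the proposed rotation, assuming a single worker and an in-memory shard queue. `Shard`, `ShardRotator`, `recordsPerTurn` and the `checkpoints` map are hypothetical names for illustration: the map stands in for whatever shared coordination store the source would use, and a per-turn record budget replaces the "process it for some time" time budget so the example is deterministic. Each shard is also modeled as a finite list of records, whereas a real open shard keeps receiving data until it is closed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of one stream shard: an ordered, finite list of record sequence numbers.
final class Shard {
    final String id;
    final List<Long> sequenceNumbers;

    Shard(String id, List<Long> sequenceNumbers) {
        this.id = id;
        this.sequenceNumbers = sequenceNumbers;
    }
}

// Hypothetical worker that rotates through shards instead of pinning them:
// acquire a shard, process a bounded batch, checkpoint, then release it.
final class ShardRotator {
    // Stand-in for a shared coordination store: shard id -> last checkpointed sequence number.
    final Map<String, Long> checkpoints = new HashMap<>();
    // Every record handled, in processing order, as "shardId:sequenceNumber".
    final List<String> processed = new ArrayList<>();

    // Drains all shards with one worker, taking at most recordsPerTurn records
    // from a shard before giving it up and moving on to the next one.
    void run(List<Shard> shards, int recordsPerTurn) {
        if (recordsPerTurn <= 0) {
            throw new IllegalArgumentException("recordsPerTurn must be positive");
        }
        Deque<Shard> available = new ArrayDeque<>(shards);
        while (!available.isEmpty()) {
            Shard shard = available.pollFirst(); // take ownership of the next shard
            int start = resumeIndex(shard);      // continue after the last checkpoint
            int end = Math.min(start + recordsPerTurn, shard.sequenceNumbers.size());
            for (int i = start; i < end; i++) {
                processed.add(shard.id + ":" + shard.sequenceNumbers.get(i));
            }
            if (end > start) {
                // Checkpoint once per turn with the last processed sequence number.
                checkpoints.put(shard.id, shard.sequenceNumbers.get(end - 1));
            }
            if (end < shard.sequenceNumbers.size()) {
                available.addLast(shard);        // give up the shard; it is picked up again later
            }
        }
    }

    // Index of the first record after the checkpoint, or 0 if the shard has never been checkpointed.
    private int resumeIndex(Shard shard) {
        Long last = checkpoints.get(shard.id);
        return last == null ? 0 : shard.sequenceNumbers.indexOf(last) + 1;
    }

    public static void main(String[] args) {
        ShardRotator rotator = new ShardRotator();
        rotator.run(List.of(
                new Shard("shard-a", List.of(1L, 2L, 3L)),
                new Shard("shard-b", List.of(10L, 20L))), 2);
        // prints [shard-a:1, shard-a:2, shard-b:10, shard-b:20, shard-a:3]
        System.out.println(rotator.processed);
    }
}
```

In a real DynamoDB Streams consumer, resuming from a checkpoint would typically mean requesting a shard iterator of type `AFTER_SEQUENCE_NUMBER` with the stored sequence number, rather than looking up an index in a list. Checkpointing once per turn, instead of per record, keeps coordination-store writes low at the cost of possibly reprocessing up to one turn's records after a crash.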

