Why can using Kinesis Aggregation result in data loss?

All Kinesis Streams are divided into Shards, each of which owns a contiguous range of a 128-bit integer hash key space (0 to 340282366920938463463374607431768211455). Each Shard contains the messages whose Explicit Hash Key, or the MD5 hash of the provided Partition Key, falls between two values: the Shard's Start and End Hash.
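As an illustration of the mapping described above, here is a minimal sketch of how a record's hash key could be computed and matched to a Shard. The function names `hash_key_for` and `find_shard`, and the two-Shard topology, are hypothetical and exist only for this example. The sketch assumes the MD5 digest of the UTF-8 encoded Partition Key is read as an unsigned big-endian 128-bit integer.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # 340282366920938463463374607431768211455


def hash_key_for(partition_key, explicit_hash_key=None):
    """Return the 128-bit integer used to place a record on a Shard."""
    if explicit_hash_key is not None:
        # An Explicit Hash Key is a decimal string and overrides the Partition Key.
        return int(explicit_hash_key)
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def find_shard(shards, hash_key):
    """Return the id of the Shard whose [start, end] range contains hash_key.

    `shards` is a list of (shard_id, start_hash, end_hash) tuples.
    """
    for shard_id, start, end in shards:
        if start <= hash_key <= end:
            return shard_id
    raise LookupError("no Shard covers hash key %d" % hash_key)


# Hypothetical topology: two Shards that split the key space in half.
EXAMPLE_SHARDS = [
    ("shard-0", 0, 2**127 - 1),
    ("shard-1", 2**127, MAX_HASH_KEY),
]
```

For example, `find_shard(EXAMPLE_SHARDS, hash_key_for("some-key"))` returns whichever of the two hypothetical Shard ids covers the hash of `"some-key"`.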

When Kinesis Aggregation is used, multiple messages are aggregated into one, which is targeted at a single Shard based upon the Partition Key or Explicit Hash Key of the first message. This means that messages that would individually target multiple Shards can be stored within a single Protobuf message. In v1.x versions of KCL processing applications, these messages were passed directly to the consumer, regardless of which Shard that consumer was 'assigned' to. However, in v2.x KCL consumers, messages that are not destined for the Shard the consumer is 'assigned' to are dropped silently by the KCL and not delivered to the consumer. This can result in data loss unless Kinesis Aggregation is carefully designed.

To address this issue, we recommend that you use the DescribeStream API to obtain a detailed view of the Stream's Shard topology, and create a separate RecordAggregator object per destination Shard. You should then use each record's Partition Key or Explicit Hash Key to compute its destination Shard, and add the record through the associated RecordAggregator. Please ensure that you periodically refresh the Stream topology to maintain a relatively up-to-date mapping. While this architecture is an improvement on using a single base RecordAggregator, there is still the possibility of data loss, because the Shard topology may change while records are being aggregated. You can minimise the amount of time during which this inconsistent aggregation may be occurring, but you cannot eliminate it completely. Therefore, for any use case where guaranteed delivery of messages is required, DO NOT USE KINESIS AGGREGATION SEPARATELY FROM THE KPL, and instead use the Kinesis PutRecords API without Aggregation.
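The per-Shard routing described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `open_shards` and `PerShardBuffers` are hypothetical names, a plain list stands in for each RecordAggregator object, and `fetch_topology` is a caller-supplied function that would wrap a DescribeStream call. The sketch assumes the whole Shard list arrives in a single DescribeStream response and skips closed Shards, identified here by the presence of an `EndingSequenceNumber`.

```python
import hashlib
import time


def open_shards(describe_stream_response):
    """Extract (shard_id, start_hash, end_hash) for Shards that are still open."""
    shards = []
    for shard in describe_stream_response["StreamDescription"]["Shards"]:
        # Assumption: a closed (parent) Shard carries an EndingSequenceNumber.
        if "EndingSequenceNumber" in shard.get("SequenceNumberRange", {}):
            continue
        key_range = shard["HashKeyRange"]
        shards.append((
            shard["ShardId"],
            int(key_range["StartingHashKey"]),
            int(key_range["EndingHashKey"]),
        ))
    return shards


class PerShardBuffers:
    """Route each record to a buffer dedicated to its destination Shard.

    Each buffer (a list) is a stand-in for one RecordAggregator object.
    """

    def __init__(self, fetch_topology, max_age_seconds=60.0):
        self._fetch_topology = fetch_topology  # hypothetical DescribeStream wrapper
        self._max_age = max_age_seconds
        self._shards = []
        self._fetched_at = None
        self.buffers = {}

    def _topology(self):
        # Refresh periodically. A reshard between refreshes can still misroute
        # records, which is the residual risk the text above describes.
        now = time.monotonic()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            self._shards = self._fetch_topology()
            self._fetched_at = now
        return self._shards

    def add(self, partition_key, data, explicit_hash_key=None):
        """Buffer the record for its destination Shard and return that Shard's id."""
        if explicit_hash_key is not None:
            hash_key = int(explicit_hash_key)
        else:
            digest = hashlib.md5(partition_key.encode("utf-8")).digest()
            hash_key = int.from_bytes(digest, "big")
        for shard_id, start, end in self._topology():
            if start <= hash_key <= end:
                record = (partition_key, explicit_hash_key, data)
                self.buffers.setdefault(shard_id, []).append(record)
                return shard_id
        raise LookupError("no open Shard covers hash key %d" % hash_key)
```

In a real producer, each buffer would be replaced by a RecordAggregator, and any records still pending for a Shard that disappears after a refresh would need to be flushed or re-routed. The sketch leaves that step out.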
