StreamingIngestion: 10 files every 10 seconds, 1 MB per file (24 MB uncompressed) - streaming ingestion case
BatchIngestion: 10 files every 30 minutes, 100 MB per file - log case
Configuration
streaming source mode
pull mode: each micro-batch pulls files from S3.
push mode: each micro-batch reads from SQS/SNS (TBD).
configure the job with dynamicAllocation enabled / disabled
configure the job with different numbers of executors
executors = 3 (default)
executors = 10
executors = 30
Spark streaming job interval
default
10 minutes
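The configuration dimensions above multiply into a run matrix. The Python sketch below enumerates that matrix and computes the average compressed ingest rate of each workload. It is an illustration under assumptions: the `Workload` class, `benchmark_matrix` function and the dict layout are hypothetical; `spark.dynamicAllocation.enabled` and `spark.executor.instances` are standard Spark configuration keys, and `None` stands in for Spark's default trigger. Source mode is left out because push mode is still TBD.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of one ingestion workload."""
    name: str
    files_per_interval: int
    interval_seconds: int
    file_size_mb: float  # compressed size of each file

    def ingest_rate_mb_per_s(self) -> float:
        """Average compressed ingest rate in MB per second."""
        return self.files_per_interval * self.file_size_mb / self.interval_seconds


WORKLOADS = [
    Workload("StreamingIngestion", files_per_interval=10, interval_seconds=10, file_size_mb=1),
    Workload("BatchIngestion", files_per_interval=10, interval_seconds=30 * 60, file_size_mb=100),
]
DYNAMIC_ALLOCATION = [True, False]
EXECUTORS = [3, 10, 30]
# None = default trigger (next micro-batch starts as soon as the previous one finishes)
TRIGGER_INTERVALS = [None, "10 minutes"]


def benchmark_matrix():
    """Yield one run description per combination of the configuration dimensions."""
    for wl, dyn, execs, trigger in product(WORKLOADS, DYNAMIC_ALLOCATION, EXECUTORS, TRIGGER_INTERVALS):
        yield {
            "workload": wl.name,
            "spark_conf": {
                "spark.dynamicAllocation.enabled": str(dyn).lower(),
                "spark.executor.instances": str(execs),
            },
            "trigger_interval": trigger,
        }
```

With two workloads, dynamic allocation on/off, three executor counts and two intervals, this yields 2 × 2 × 3 × 2 = 24 runs. The streaming workload averages 1 MB/s compressed (about 24 MB/s uncompressed), while the batch workload averages 1000 MB / 1800 s, roughly 0.56 MB/s, delivered in bursts. When dynamic allocation is enabled, Spark treats `spark.executor.instances` as the initial executor count (if it exceeds `spark.dynamicAllocation.initialExecutors`) rather than a fixed cluster size.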
Measurement
Latency: p90 and p75 latency. Latency is the time between the moment the data is produced at the source (the PUT object on S3) and the moment that data produces an output.
Cost: p90 and p75 of billed resource utilization.
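Both metrics are reported as p75 and p90 over the samples collected in a run. One common way to compute these is the nearest-rank percentile, sketched below in Python. Choosing nearest-rank rather than an interpolating method is an assumption, and `percentile`/`summarize` are hypothetical helpers. For latency, each sample would be the output time minus the S3 PUT time of the source object.

```python
import math
from typing import Dict, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("percentile of an empty sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Report the two percentiles used in this plan."""
    return {"p75": percentile(samples, 75), "p90": percentile(samples, 90)}
```

For example, `summarize(list(range(1, 11)))` returns `{"p75": 8, "p90": 9}`.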
Latency
We measure the event-time latency.
For the skipping index, we define event-time latency as the interval between a file's event time and its emission time from the output operator.
The generator appends eventTime to the file name.
The streaming system calculates latency = processTime - extractTimeFromFileName(fileName).
For the covering index, we define event-time latency as the interval between a tuple's event time and its emission time from the output operator.
The generator appends eventTime to each tuple.
The streaming system calculates latency = processTime - eventTime.
For the MV, we define event-time latency as the interval between a tuple's event time and its emission time from the output operator.
The generator appends eventTime to each tuple.
The streaming system recalculates the window's eventTime = max(eventTime of the tuples contributing to the window).
latency = processTime - eventTime
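The three latency definitions differ only in where eventTime comes from. The Python sketch below shows each calculation in milliseconds. The file-name convention (`<prefix>-<epochMillis>.<ext>`) and all function names are hypothetical stand-ins for whatever the generator actually writes, and processTime is assumed to be captured when the output operator emits.

```python
import re
from typing import List, Sequence

# Hypothetical naming convention: the generator appends the event time (epoch millis)
# to the file name, e.g. "data-1700000000000.json" or "data-1700000000000.json.gz".
_EVENT_TIME_IN_NAME = re.compile(r"-(\d+)(?:\.\w+)+$")


def extract_time_from_file_name(file_name: str) -> int:
    """Return the epoch-millis event time embedded in a file name."""
    match = _EVENT_TIME_IN_NAME.search(file_name)
    if match is None:
        raise ValueError(f"no event time found in file name: {file_name!r}")
    return int(match.group(1))


def skipping_index_latency(file_name: str, process_time_ms: int) -> int:
    """Skipping index: latency = processTime - event time taken from the file name."""
    return process_time_ms - extract_time_from_file_name(file_name)


def covering_index_latencies(event_times_ms: Sequence[int], process_time_ms: int) -> List[int]:
    """Covering index: one latency per tuple, latency = processTime - eventTime."""
    return [process_time_ms - t for t in event_times_ms]


def mv_window_latency(window_event_times_ms: Sequence[int], process_time_ms: int) -> int:
    """MV: the window's eventTime is the max eventTime of its contributing tuples."""
    if not window_event_times_ms:
        raise ValueError("window has no contributing tuples")
    return process_time_ms - max(window_event_times_ms)
```

Taking max(eventTime) for the MV means latency is measured from the newest tuple in the window, so it captures the delay after the window's last input arrived, not how long older tuples in the same window waited.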