This repo contains the code showcased in the Beam Summit talk Understanding Exactly-Once Processing And Windowing In Streaming Pipelines
In that talk, I explained how windowing works in streaming pipelines, and what are the decisions you have to make in order to do complex event processing in streaming with Apache Beam.
Whenever we apply a window, there are always doubts about whether the window will drop data, or how many times (and when) will be the output triggered.
This repo contains a sample pipeline that uses unit testing to check if your window would drop data, and how many times would the window be trigered. You write your window in a function, and then use the unit test to check the output of that pipeline. If the window drops data, the test will fail. In addition, you get a CSV output that you can examine to see how and when your window produced output.
Watch the video at the Beam Summit website, or at YouTube:
Check also the slides and the notes of each slide.
The pipeline processes 60 messages. 50 messages produced on time, and 10 messages that arrive after the watermark (late data).
- We first add the 50 messages and advance the watermark
- Then we add 10 messages with a timestamp older than the watermark, and we advance the watermark some seconds per message (to simulate some elapsed time)
- We read the messages, and count them before applying the window
- Then we apply the window, group, calculate a sum (and update another metric to count the aggregated messages), and generate a CSV
- We can now check if the window dropped any message or not
Add a new window to src/main/java/com/google/cloud/pso/windows/SomeSampleWindow.java
.
For that, just add a new method with this signature:
public Window<KV<String, MyDummyEvent>> myCustomWindow()
(maybe with some input parameters if you want to use those in your window).
See some examples of windows in that file
TODO
Copyright 2020 Google LLC Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.