Consumer Shutdown Testing: Exception handling #6

markglh · 2017-05-30T10:46:40Z

When an exception occurs while processing a batch, we need to handle it gracefully. Exception scenarios include:

The application not acknowledging processed messages beyond the configured threshold (failed batches).
An exception within the Kinesis client itself (definitely a bug, but again one that causes a failed batch). This is handled in the manager.
Losing connectivity to Kinesis. I'm almost certain the KCL will retry x times before shutting down the worker. This in turn causes a "GracefulShutdown", which notifies the application's message processor actor, and what happens next is up to that actor (Notifications calls System.exit; perhaps sugar shouldn't?). What is certain is that once the Worker shuts down, it will never restart. Can we increase the retries on the KCL/KPL via config? Or do we need a process that restarts the worker? Or should we shut down instead? ...
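One way to frame the "process which restarts the worker" option, independent of any KCL setting: run the worker in a loop, restart it with exponential backoff when it stops unexpectedly, and give up (falling back to a shutdown) once a restart budget is spent. This is a Python sketch, not a KCL or reactive-kinesis API. It assumes a hypothetical `start_worker` callable that blocks until the worker stops and returns True only when the stop was requested by the application; `max_restarts` and `base_delay` are illustrative values, not KCL configuration.

```python
import time
from typing import Callable


def run_with_restarts(
    start_worker: Callable[[], bool],  # hypothetical: blocks; True means "stop was requested"
    max_restarts: int = 3,             # illustrative restart budget
    base_delay: float = 1.0,           # illustrative initial backoff, in seconds
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing
) -> bool:
    """Run a worker, restarting it with exponential backoff after unexpected stops.

    Returns True if the worker stopped on request, False once the restart
    budget is exhausted (the caller then decides whether to shut down).
    """
    attempt = 0
    while True:
        try:
            if start_worker():
                return True  # intentional shutdown: do not restart
        except Exception as exc:  # a crash counts as an unexpected stop
            print(f"worker crashed: {exc!r}")
        if attempt >= max_restarts:
            return False
        sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... base_delay
        attempt += 1
```

Returning False instead of exiting leaves the final decision to the caller, which keeps the question of what "gracefully" should mean open.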

By "gracefully", do we mean:

Shutting down the whole application, regardless of the number of individual stream consumers, producers and shards?
Delegating this to the application?

Currently we shut down the processing and then the whole application, on the assumption that without a consumer the application can't serve its purpose. Perhaps this should be configurable?
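Making this configurable could be as small as a two-valued setting that mirrors the two options listed above. A minimal Python sketch, assuming a hypothetical `ConsumerFailurePolicy` setting and hypothetical `notify_application` / `exit_application` callbacks (none of these are reactive-kinesis or KCL APIs). The default keeps today's shut-down-the-application behaviour.

```python
import sys
from enum import Enum
from typing import Callable


class ConsumerFailurePolicy(Enum):
    """Hypothetical setting: what to do once a consumer's worker has shut down."""

    SHUTDOWN_APPLICATION = "shutdown-application"  # current behaviour
    DELEGATE = "delegate"                          # let the application decide

    @classmethod
    def from_config(cls, value: str = "shutdown-application") -> "ConsumerFailurePolicy":
        # Unknown values raise ValueError, so a typo fails fast at startup.
        return cls(value.strip().lower())


def handle_consumer_shutdown(
    policy: ConsumerFailurePolicy,
    reason: str,
    notify_application: Callable[[str], None],          # hypothetical callback
    exit_application: Callable[[int], None] = sys.exit,
) -> None:
    if policy is ConsumerFailurePolicy.SHUTDOWN_APPLICATION:
        # Without a consumer the application is assumed unable to serve its purpose.
        exit_application(1)
    else:
        # The application chooses: restart the consumer, degrade, or exit itself.
        notify_application(reason)
```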

markglh · 2017-05-30T10:48:00Z

Initial testing scenarios by @DavidW-ww

Scenario #1 - Baseline

1 producer putRecord for 10MM records over 30 minutes. Make sure records are still being put towards the end of the 30 minutes.
1 consumer writing all sequence numbers to file.
1 shard

Expectation: exactly 10MM sequence numbers in the file.
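It may be worth checking the producer rate against shard limits before running this. 10MM records over 30 minutes is 10,000,000 / 1,800 s ≈ 5,556 records/s. Per the Kinesis Data Streams quotas, a shard accepts at most 1,000 records/s (and 1 MB/s) of writes, so if each record is put as its own Kinesis record (no KPL aggregation), a single shard would be throttled, and even 5 shards would fall slightly short if scenario #4 uses the same rate. The Python sketch below does that arithmetic and paces puts evenly across the window so records are still being put near the end. `put` is a hypothetical callable wrapping the actual putRecord call.

```python
import math
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")

# Kinesis Data Streams per-shard write quota (records per second).
SHARD_WRITE_RECORDS_PER_S = 1_000


def required_rate(total_records: int, duration_s: float) -> float:
    return total_records / duration_s


def min_shards(total_records: int, duration_s: float,
               per_shard: int = SHARD_WRITE_RECORDS_PER_S) -> int:
    """Shards needed if every record is its own Kinesis record (no aggregation)."""
    return math.ceil(required_rate(total_records, duration_s) / per_shard)


def send_offsets(total_records: int, duration_s: float) -> Iterator[float]:
    """Evenly spaced send times (seconds from start) covering the whole window."""
    interval = duration_s / total_records
    return (i * interval for i in range(total_records))


def paced_put(records: Sequence[T], duration_s: float, put: Callable[[T], None],
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Call `put` (hypothetical putRecord wrapper) for each record on schedule."""
    start = clock()
    for offset, record in zip(send_offsets(len(records), duration_s), records):
        delay = start + offset - clock()
        if delay > 0:
            sleep(delay)  # ahead of schedule: wait; behind schedule: put immediately
        put(record)
```

For the numbers above, `min_shards(10_000_000, 30 * 60)` gives 6. With fewer shards and no aggregation, the producer falls behind schedule and the test runs longer than 30 minutes.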

Scenario #2 - Single worker recovers
1 producer putRecord for 10MM records over 30 minutes.
1 consumer writing all sequence numbers to file.
1 shard

10 minutes after start, Ctrl-C the consumer
Wait 2 minutes, then restart the consumer

At the end of the test, expect exactly 10MM unique sequence numbers in the file.
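Scenario #1 expects exactly 10MM sequence numbers, while the recovery scenarios only require 10MM unique ones. That difference follows from the KCL's at-least-once delivery: after a restart or lease takeover, processing resumes from the last checkpoint, so records processed after it can be written twice. A minimal Python checker, assuming the consumer writes one sequence number per line (the function names are illustrative). With several shards, writing `shardId:sequenceNumber` per line is a safer key, since the documented uniqueness of sequence numbers is scoped to a shard.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class SeqReport(NamedTuple):
    total: int       # lines written by the consumer(s)
    unique: int      # distinct sequence numbers
    duplicates: int  # total - unique, i.e. replayed records


def summarize(lines: Iterable[str]) -> SeqReport:
    """Count total and distinct sequence numbers, one per line; blank lines ignored."""
    counts = Counter(s for s in (line.strip() for line in lines) if s)
    total = sum(counts.values())
    return SeqReport(total, len(counts), total - len(counts))


def check(report: SeqReport, expected: int, allow_duplicates: bool) -> None:
    """Raise AssertionError if the report does not meet the scenario's expectation."""
    if report.unique != expected:
        raise AssertionError(f"expected {expected} unique sequence numbers, found {report.unique}")
    if not allow_duplicates and report.duplicates:
        raise AssertionError(f"expected no duplicates, found {report.duplicates}")


def verify_file(path: str, expected: int = 10_000_000, allow_duplicates: bool = True) -> SeqReport:
    with open(path) as f:  # iterating a file yields its lines lazily
        report = summarize(f)
    check(report, expected, allow_duplicates)
    return report
```

Scenario #1 would use `allow_duplicates=False`; the recovery scenarios would use `allow_duplicates=True`, where the reported duplicate count still shows how much was replayed.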

Scenario #3 - Inter-worker Checkpoint
1 producer putRecord for 10MM records over 30 minutes.
1 shard
2 consumers listening to same stream writing all sequence numbers to file.

Start consumer #1.
Start producer.
Start consumer #2. Consumer #2 holds no lease, since the only shard is leased by consumer #1.
10 minutes after start, Ctrl-C consumer #1
Verify consumer #2 is picking up messages from stream.
At the end of the test, verify that 10MM unique sequence numbers are in the file.

Scenario #4 - Simulate Kubernetes terminate during re-deploy
5 shards for the stream.
1 producer
5 consumers writing all sequence numbers to file.

Start producer
Verify all 5 consumers are processing messages.
10 minutes after start of test, Ctrl-C 2 consumers.
2 minutes later, restart the 2 consumers previously shut down.
At the end of the test, verify exactly 10MM unique sequence numbers in the file.
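The Ctrl-C and restart steps in scenarios #2 and #4 could be scripted so the timings are repeatable between runs. A POSIX-only Python sketch, assuming each consumer is started by a hypothetical command `cmd`. It sends SIGINT (what Ctrl-C delivers in a terminal), records each stopped consumer's exit code, and kills any consumer that has not shut down within an assumed timeout.

```python
import signal
import subprocess
import time
from typing import List, Optional, Sequence, Tuple


def ctrl_c_and_restart(
    cmd: Sequence[str],          # hypothetical command that starts one consumer
    n_consumers: int,
    n_to_stop: int,
    stop_after_s: float,
    restart_after_s: float,
    shutdown_timeout_s: float = 30.0,  # assumed limit for a graceful shutdown
) -> Tuple[List[subprocess.Popen], List[Optional[int]]]:
    """Start consumers, Ctrl-C some of them, then restart those after a pause.

    Returns the running processes and the exit codes of the stopped ones
    (None means the consumer did not exit in time and had to be killed).
    """
    procs = [subprocess.Popen(list(cmd)) for _ in range(n_consumers)]
    time.sleep(stop_after_s)

    exit_codes: List[Optional[int]] = []
    for proc in procs[:n_to_stop]:
        proc.send_signal(signal.SIGINT)  # same signal Ctrl-C sends (POSIX)
        try:
            exit_codes.append(proc.wait(timeout=shutdown_timeout_s))
        except subprocess.TimeoutExpired:
            proc.kill()  # graceful shutdown did not finish in time
            proc.wait()
            exit_codes.append(None)

    time.sleep(restart_after_s)
    for i in range(n_to_stop):
        procs[i] = subprocess.Popen(list(cmd))
    return procs, exit_codes
```

Scenario #2 would be `ctrl_c_and_restart(cmd, 1, 1, 600, 120)` and scenario #4 `ctrl_c_and_restart(cmd, 5, 2, 600, 120)`, with the producer started separately. A None exit code is itself a finding: it means the consumer did not shut down gracefully.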

markglh closed this as completed May 30, 2017

markglh reopened this May 30, 2017

markglh mentioned this issue in Producer Shutdown Testing: Graceful Shutdown #7 (open) on May 30, 2017

markglh changed the title from "Kinesis Consumer Shutdown Testing: Exception handling" to "Consumer Shutdown Testing: Exception handling" on May 30, 2017
