This repository has been archived by the owner on Oct 23, 2023. It is now read-only.

Consumer Shutdown Testing: Exception handling #6

Open
markglh opened this issue May 30, 2017 · 1 comment
Comments

markglh commented May 30, 2017

When an exception scenario occurs while processing a batch, we need to handle it gracefully. Scenarios include:

  • The application not acknowledging processed messages, with the number unacknowledged exceeding the configured threshold (failed batches).
  • An exception within the Kinesis Client itself (definitely a bug, but again it causes a failed batch). This is handled in the manager.
  • Losing connectivity to Kinesis. I'm almost certain the KCL will retry x times before shutting down the worker, which in turn will cause a "GracefulShutdown" that notifies the application's message processor actor. What happens next is up to that actor (Notification's calls System.exit - sugar perhaps shouldn't?). What's certain is that once the Worker shuts down, it won't ever restart. Can we increase the retries on the KCL/KPL via config? Or do we need a process which restarts the worker? Or should we shut down instead? ...
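One of the options raised above is a process that restarts the worker. A minimal sketch of that idea follows, assuming a worker failure surfaces as an exception from a blocking `run_worker` callable. The names `run_with_restarts`, `run_worker`, `max_restarts` and `base_delay_s` are hypothetical and are not part of the KCL or of this library; the real Worker shuts down gracefully rather than raising, so this only illustrates the restart budget and backoff.

```python
import time
from typing import Callable


def run_with_restarts(
    run_worker: Callable[[], None],
    max_restarts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run a blocking worker, restarting it with exponential backoff.

    Returns True if the worker eventually returned normally, and False
    once the restart budget is exhausted (the caller then decides whether
    to shut the application down).
    """
    restarts = 0
    while True:
        try:
            run_worker()  # hypothetical: blocks until the worker stops
            return True
        except Exception:
            if restarts >= max_restarts:
                return False
            # Back off 1x, 2x, 4x ... the base delay before restarting.
            sleep(base_delay_s * (2 ** restarts))
            restarts += 1
```

Returning `False` rather than exiting keeps the final decision (shut down or carry on) with the caller, which matches the open question of who owns that decision.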

By "gracefully", do we mean:

  • Shutting down the whole application, regardless of the number of individual stream consumers, producers and shards?
  • Delegating this to the application?

Currently we shut down the processing and then the whole application, working on the assumption that without a consumer the application can't serve its purpose. Perhaps this should be configurable?
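To make the "configurable" question concrete, here is a rough sketch of a failure policy with the two behaviours discussed above. Everything in it is hypothetical: `FailurePolicy`, `handle_consumer_failure` and the three callbacks are illustrative names, not existing configuration keys or APIs in this library.

```python
from enum import Enum
from typing import Callable


class FailurePolicy(Enum):
    # Current behaviour: stop processing, then stop the whole application.
    SHUTDOWN_APPLICATION = "shutdown-application"
    # Alternative: stop processing and let the application decide.
    DELEGATE_TO_APPLICATION = "delegate-to-application"


def handle_consumer_failure(
    policy: FailurePolicy,
    stop_consumer: Callable[[], None],
    shutdown_application: Callable[[], None],
    notify_application: Callable[[], None],
) -> None:
    """Stop the failed consumer, then apply the configured policy."""
    stop_consumer()  # processing always stops first
    if policy is FailurePolicy.SHUTDOWN_APPLICATION:
        shutdown_application()
    else:
        notify_application()
```

With a default of `SHUTDOWN_APPLICATION`, existing behaviour would stay unchanged, while applications with several independent consumers could opt into delegation.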

markglh commented May 30, 2017

Initial testing scenarios by @DavidW-ww

Scenario #1 - Baseline

1 producer putRecord for 10MM records over 30 minutes. Make sure that towards the end of the 30 minutes, records are still being put.
1 consumer writing all sequence numbers to file.
1 shard

Expectation: exactly 10MM sequence numbers in the file.
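As a sanity check on the baseline, the sustained rate the producer must hit can be worked out from the numbers above. The sketch below also compares that rate with a per-shard write limit; the default of 1,000 records per second is the figure AWS documents for Kinesis Data Streams, but treat it here as an assumption to verify, and note that it counts Kinesis records, so producer-side aggregation changes the picture. The function names are hypothetical.

```python
import math


def required_rate(total_records: int, duration_minutes: float) -> float:
    """Average records per second needed to finish within the duration."""
    return total_records / (duration_minutes * 60)


def min_shards(rate_per_second: float, per_shard_limit: float = 1000.0) -> int:
    """Shards needed if every user record is written as one Kinesis record.

    per_shard_limit is an assumed write limit in records per second.
    """
    return math.ceil(rate_per_second / per_shard_limit)


rate = required_rate(10_000_000, 30)  # roughly 5,556 records per second
shards = min_shards(rate)             # 6 under the assumed limit
```

If the assumed limit holds and records are not aggregated, a single shard would throttle this load, so the test setup may need either aggregation or a lower record count per shard.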

Scenario #2 - Single worker recovers
1 producer putRecord for 10MM records over 30 minutes.
1 consumer writing all sequence numbers to file.
1 shard

At 10 minutes after start, Ctrl-C consumer
Wait 2 minutes then restart consumer

At end of test, expect exactly 10MM unique sequence numbers inside file.
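The end-of-test check can be scripted. The sketch below assumes the consumer writes one sequence number per line; that file layout and the names `SequenceSummary` and `summarize_sequence_numbers` are assumptions for illustration. It reports the total and unique counts separately, because a consumer restarted from its last checkpoint may write some sequence numbers twice.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class SequenceSummary(NamedTuple):
    total: int       # non-blank lines read
    unique: int      # distinct sequence numbers
    duplicated: int  # distinct sequence numbers seen more than once


def summarize_sequence_numbers(lines: Iterable[str]) -> SequenceSummary:
    """Count total, unique and duplicated sequence numbers.

    Sequence numbers are compared as strings; blank lines are ignored.
    All distinct values are held in memory.
    """
    counts = Counter(s for s in (line.strip() for line in lines) if s)
    total = sum(counts.values())
    duplicated = sum(1 for n in counts.values() if n > 1)
    return SequenceSummary(total, len(counts), duplicated)
```

For a real run, pass an open file handle, for example `summarize_sequence_numbers(open("sequence_numbers.txt"))` (hypothetical file name), and compare `unique` with the number of records produced.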

Scenario #3 - Inter-worker Checkpoint
1 producer putRecord for 10MM records over 30 minutes.
1 shard
2 consumers listening to same stream writing all sequence numbers to file.

Start consumer #1.
Start producer.
Start consumer #2. Consumer #2 has no lease, since the only shard is already leased by consumer #1.
10 minutes after start, Ctrl-C consumer #1
Verify consumer #2 is picking up messages from stream.
Verify at end of test that 10MM unique sequence numbers are in file

Scenario #4 - Simulate Kubernetes terminate during re-deploy
Five shards for stream.
1 producer
5 consumers

Start producer
Verify 5 consumers are processing messages
10 minutes after start of test, Ctrl-C 2 consumers.
2 minutes later, restart 2 consumers previously shut down.
At end of test, verify exactly 10MM unique sequence numbers in file.
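With several consumers, the check has to combine their outputs. The sketch below assumes each consumer writes its own file with one sequence number per line, which the scenario does not state, and `merge_consumer_outputs` is a hypothetical helper. Besides the overall unique count, it returns the sequence numbers seen by more than one consumer, which is where re-processing after a lease handover would show up.

```python
from typing import Dict, Iterable, Set, Tuple


def merge_consumer_outputs(
    outputs: Dict[str, Iterable[str]],
) -> Tuple[int, Set[str]]:
    """Merge per-consumer sequence numbers.

    outputs maps a consumer name to the lines it wrote.
    Returns (number of unique sequence numbers across all consumers,
    set of sequence numbers written by more than one consumer).
    """
    seen_by: Dict[str, Set[str]] = {}
    for consumer, lines in outputs.items():
        for line in lines:
            seq = line.strip()
            if seq:
                seen_by.setdefault(seq, set()).add(consumer)
    shared = {seq for seq, names in seen_by.items() if len(names) > 1}
    return len(seen_by), shared
```

A non-empty shared set is not necessarily a failure: it only shows that a record was delivered to two consumers, so whether it counts against "exactly 10MM unique" depends on how the file is deduplicated.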

@markglh markglh closed this as completed May 30, 2017
@markglh markglh reopened this May 30, 2017
@markglh markglh changed the title from "Kinesis Consumer Shutdown Testing: Exception handling" to "Consumer Shutdown Testing: Exception handling" May 30, 2017