Addressing Out Of Order Processing via a datastore solution #265
Comments
Won't other messages (not necessarily of the same order id) be blocked in the partition even if we use a unique key? Shouldn't we be offloading such poison messages into a DLQ (which Ziggurat already offers) according to a timeout? This would ensure the blocking can be scoped to the expected SLA. Also, what happens if there is a timeout or degradation on the Redis end (although not common)? Will the message be treated as a successfully processed message or not?
Also, what advantage does native Ziggurat support for OOO offer other than DRY?
We are thinking of having a guarantee that is-stale? will deliver three statuses -> true / false / failed. The third status implies that we couldn't determine the staleness of the message, in which case it is up to the user to choose the behaviour.
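For illustration, a handler could branch on that three-valued flag roughly like the sketch below. The metadata shape, the :failed representation, and the helper functions are assumptions, not the final API; the :skip / :success return values follow Ziggurat's usual mapper conventions.

```clojure
;; Hypothetical user-defined steps (stubs for illustration only).
(defn process-message [message] :success)
(defn process-after-refetch [message] :success)   ;; e.g. refetch latest state from the producer service

(defn handler-fn [{:keys [message metadata]}]
  (case (:is-stale? metadata)
    true    :skip                          ;; a later event was already processed, drop this one
    false   (process-message message)      ;; fresh message, handle normally
    :failed (process-after-refetch message))) ;; staleness unknown (e.g. Redis timeout); user decides

;; e.g. (handler-fn {:message {...} :metadata {:is-stale? :failed}})
```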
DRY is definitely one of the motivations, and a big one. However, another is a use case I often see in our org. A lot of services using Ziggurat are written in a way where we treat the Kafka messages as a notification log/ticker and call the API of the same service when we get a message, to fetch the information needed to process it. We do this even if the message contains everything the actor needs to process it. This causes a perf problem: the service first constructs the message and sends it to the Kafka event log, but the actor calls the service again, forcing it to reconstruct the message by recomputing the same info. Why do developers do this? Mostly to protect against out-of-order processing when retrying messages or replaying messages from the DLQ; calling the originating/producer service always ensures that you get the latest state of the resource. Two solutions to it are:
The idea is for Ziggurat to provide this small middleware so that the only things devs need to do are create/maintain the infra and fill in the configs.
In principle, I think this arises because we are now keeping the source of truth in two places.
Which timestamp are we talking about here? Timestamps might not be strictly increasing (two events can land in the same millisecond) and are subject to clock skew. It could thus be better to use the offset number and the partition key as a combination.
Server time, yes, so it is subject to the problem you are mentioning; also, same-millisecond messages for the same user are a very real possibility.
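For what it's worth, a minimal sketch of the offset-based variant suggested above, using an in-memory atom in place of the real store; the field names are assumptions:

```clojure
;; Track the highest offset seen per [topic partition record-key]. Offsets are
;; strictly increasing within a partition, so this sidesteps clock skew and
;; same-millisecond ties. An atom stands in for the real datastore here.
(def last-offset-seen (atom {}))

(defn stale-by-offset? [{:keys [topic partition record-key offset]}]
  (let [prev (get @last-offset-seen [topic partition record-key])]
    (boolean (and prev (< offset prev)))))

(defn remember-offset! [{:keys [topic partition record-key offset]}]
  (swap! last-offset-seen update [topic partition record-key] (fnil max -1) offset))
```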
Problem and context
I can explain the out-of-order processing problem with an example. Assume the following timeline of events in a Kafka topic (for instance: M1 order-created at 10:00, M2 order-modified at 10:01, M3 order-cancelled at 10:02, all keyed by the same order_id).
Out-of-order processing can happen when a consumer retries a failed message or replays messages from the DLQ after later messages have already been processed.
Now someone cannot simply replay the DLQ, because if they do, M2 might get processed, and that's an issue: you can't have a cancelled order modified. It should be handled in the consumer, but then all these rules need to be written in the consumer to spell out an acceptable sequence of events, which might not be feasible.
Kafka guarantees that messages pushed to the same partition will always be delivered in the order they were pushed. Here, messages for a single order are pushed to the same partition because the key of the record is the order_id. Kafka encourages you to segment your data this way; after all, that is what allows in-order processing.
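To make the partitioning point concrete: Kafka's default partitioner hashes the record key and takes it modulo the partition count (the real implementation uses murmur2 over the serialized key; the sketch below only illustrates the determinism with Clojure's own hash).

```clojure
;; Same key => same partition => per-order messages are consumed in order.
;; NOTE: illustration only, not Kafka's actual hashing algorithm.
(defn partition-for [record-key num-partitions]
  (mod (Math/abs (long (hash record-key))) num-partitions))

(partition-for "order-123" 12) ;; always the same partition for "order-123"
```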
However, it is not always the case that developers push the change events for the same resource to the same partition; they sometimes choose to generate a unique key for each Kafka record instead. The use case is to even out the partitions and avoid one partition being completely blocked when a single order's message is causing a problem.
Solution
A universal solution for all consumers is to remember the last seen timestamp for each order. For the example above, the store would be a mapping from order_id to the latest timestamp processed (e.g. order_id → 10:02 once M3 has been handled).
Every new message that comes in makes an entry in the table, and based on its staleness we take an action. Let's say, since M2 with timestamp 10:01 arrived after M3, it can be skipped or handled differently after comparison with this table (the table shows that we already processed a later message, so M2 is a stale event).
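A minimal in-memory sketch of that "last seen timestamp per order" table and the comparison just described (a real implementation would keep this in Redis; the names and timestamp values are illustrative):

```clojure
(def last-seen (atom {}))   ;; order-id -> last processed timestamp (epoch millis in practice)

(defn stale? [order-id ts]
  (if-let [seen (get @last-seen order-id)]
    (< ts seen)
    false))

(defn record! [order-id ts]
  (swap! last-seen update order-id (fnil max 0) ts))

;; M3 (10:02) is handled first, then M2 (10:01) arrives late:
(record! "order-123" 1002)   ;; pretend 1002 ~ 10:02
(stale? "order-123" 1001)    ;; => true, so M2 is a stale event and can be skipped
```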
Proposal
We are proposing an optional middleware with which a user can find out, via the message-metadata passed to the mapper function, whether or not the message is stale (i.e. behind the latest processed message).
The function would take in a message, extract the key (userid/orderid/driverid) from it, and upsert the id and its timestamp into the store. Every time a new message is processed, it would compare timestamps and add an is-stale? key to the message-metadata persistent map. If the key is not present in the store (i.e. the message is seen for the first time) or the message is fresh (i.e. its timestamp is greater than the one present in the store), it will just upsert the entry (id, timestamp) into the store.
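A hedged sketch of what such a middleware could look like, written in the wrap-handler style Ziggurat middlewares follow. The use of the carmine Redis client, the key prefix, the extract-key/extract-ts hooks, and the metadata shape are all assumptions for illustration; the GET-then-SET is also not atomic, so a real version would likely use a Lua script or a transaction.

```clojure
(ns ooo.middleware
  (:require [taoensso.carmine :as car]))

(def redis-conn {:pool {} :spec {:host "localhost" :port 6379}})

(defn wrap-staleness-check
  "Adds :is-stale? (true / false / :failed) to the message metadata by comparing
   the message timestamp with the last one stored in Redis for the same key,
   and upserts the newer timestamp when the message is first-seen or fresh."
  [handler-fn extract-key extract-ts]
  (fn [message-payload]
    (let [k  (str "ooo:last-seen:" (extract-key message-payload))
          ts (extract-ts message-payload)
          is-stale
          (try
            (let [prev (some-> (car/wcar redis-conn (car/get k)) Long/parseLong)]
              (if (and prev (> prev ts))
                true                                           ;; a later event was already processed
                (do (car/wcar redis-conn (car/set k (str ts))) ;; first-seen or fresh: upsert
                    false)))
            (catch Exception _ :failed))]                      ;; Redis timeout/degradation
      (handler-fn (assoc-in message-payload [:metadata :is-stale?] is-stale)))))
```

A user would then just wrap their existing handler, e.g. `(wrap-staleness-check handler-fn :order-id :event-ts)`, assuming the key and timestamp can be pulled straight out of the payload with keywords.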
We would be using Redis as the data store. We would need four configs from the user.
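The issue does not spell out the four configs, so the shape below is purely hypothetical, only to show where they might live in Ziggurat's edn config:

```clojure
;; Purely hypothetical config keys; the actual four configs are not listed here.
{:ziggurat {:ooo-middleware {:redis-host  "localhost"
                             :redis-port  6379
                             :key-prefix  "ooo:last-seen:"
                             :key-ttl-sec 86400}}}
```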
For any issues, do reach out.
Thanks,
Rijul