Reorder messages by timestamp #8440

Open
ybyzek opened this issue Dec 2, 2021 · 3 comments

Contributor

ybyzek commented Dec 2, 2021

Is your feature request related to a problem? Please describe.

Use case 1: We receive stock market data from an external source, but the source sometimes delivers the market data out of order. We need to re-order the records by timestamp within a certain time window (say, 30 seconds) in ksqlDB, so that the downstream topics have the results back in the right order.

Use case 2: Game provider clients can be offline and accumulate messages; when they come back online, the messages are (sometimes) delivered. We need to re-order the messages for proper processing.

Describe the solution you'd like

Built-in function that re-orders records within a given window.
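
As a rough illustration of the requested behavior (plain Python, not ksqlDB syntax and not an existing function), the sketch below buckets records into fixed 30-second windows by timestamp and sorts each window before handing it downstream. The name `reorder_in_windows` and the `(timestamp_ms, payload)` record shape are hypothetical, and the sketch assumes every window is complete before it is sorted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, str]  # (timestamp in ms, payload); hypothetical record shape


def reorder_in_windows(records: Iterable[Record], window_ms: int = 30_000) -> List[Record]:
    """Sort records by timestamp inside fixed, non-overlapping windows."""
    windows: Dict[int, List[Record]] = defaultdict(list)
    for ts, value in records:
        # Bucket each record by the start of the window its timestamp falls into.
        windows[ts - ts % window_ms].append((ts, value))

    ordered: List[Record] = []
    for window_start in sorted(windows):
        # sorted() is stable, so records sharing a timestamp keep arrival order.
        ordered.extend(sorted(windows[window_start], key=lambda r: r[0]))
    return ordered


# Out-of-order market ticks spanning two 30-second windows.
ticks = [(1_000, "A"), (4_000, "C"), (2_000, "B"), (31_000, "E"), (30_500, "D")]
result = reorder_in_windows(ticks)
print(result)
```

On an unbounded stream a window can only be emitted once it is considered complete, so a real implementation would also need some rule for how long to wait for stragglers.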

Describe alternatives you've considered

Kafka Streams example: confluentinc/kafka-streams-examples#411

Additional context
Add any other context or screenshots about the feature request here.

agavra added the streaming-engine label (Tickets owned by the ksqlDB Streaming Team) on Dec 3, 2021
Contributor Author

ybyzek commented Dec 7, 2021

At a high level, this seems related to ORDER BY (#1572)

Member

mjsax commented Dec 7, 2021

Just to dump a few thoughts:

  • we should look into the SQL OVER clause (maybe we could leverage it for this case?)
  • if the OVER clause does not fit, we might need to consider adding a new operator (or maybe allow using a SLIDING WINDOW [that we haven't added yet] without a GROUP BY clause) -- not sure whether we should reuse ORDER BY as the keyword or not
  • it might also be possible to just add a completely new operator reorder(stream, grace) (i.e., a table-valued function) that does the reordering
  • adding a "re-order" operator to Kafka Streams might be beneficial for KS users as well (as an alternative, we could use a custom transformValues to implement it)
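
To make the `reorder(stream, grace)` idea a bit more concrete, here is a minimal Python sketch of one possible semantics. It is not ksqlDB or Kafka Streams code. The assumptions are mine: records are `(timestamp_ms, payload)` tuples, "stream time" is the highest timestamp seen so far, a record is released once stream time has moved `grace_ms` past its timestamp, records arriving later than that are dropped, and the buffer is flushed when the input ends.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[int, str]  # (timestamp in ms, payload); hypothetical record shape


def reorder(stream: Iterable[Record], grace_ms: int) -> Iterator[Record]:
    """Yield records in timestamp order, buffering each one for grace_ms."""
    heap: List[Tuple[int, int, str]] = []  # (timestamp, arrival index, payload)
    stream_time = -1  # highest timestamp observed so far

    for seq, (ts, value) in enumerate(stream):
        stream_time = max(stream_time, ts)
        if ts < stream_time - grace_ms:
            # Assumed policy: a record older than the grace period is dropped,
            # because records after it may already have been emitted.
            continue
        heapq.heappush(heap, (ts, seq, value))
        # Release everything that can no longer be overtaken by a valid record.
        while heap and heap[0][0] <= stream_time - grace_ms:
            ts_out, _, value_out = heapq.heappop(heap)
            yield ts_out, value_out

    # Assumed policy: flush the remaining buffer when the input ends.
    while heap:
        ts_out, _, value_out = heapq.heappop(heap)
        yield ts_out, value_out


events = [(1_000, "a"), (3_000, "c"), (2_000, "b"), (9_000, "e"), (1_500, "late"), (8_000, "d")]
out = list(reorder(events, grace_ms=5_000))
print(out)
```

The arrival index in the heap entries only breaks ties between equal timestamps. The trade-off is the usual one: a larger grace period tolerates more disorder but adds that much latency to every record.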


gphilipp commented Aug 2, 2022

It's a problem that we are currently facing too. E.g., the Salesforce source KC connector produces messages without a key. If you use a topic with multiple partitions to store those messages, they will end up in random partitions, and you may process them out of order.
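
The effect can be sketched with a toy partitioner in Python. This is a simplified stand-in, not Kafka's actual partitioner: `zlib.crc32` replaces the real key hash, keyless records are spread round-robin, and `NUM_PARTITIONS` and the key `"account-42"` are made up for the example. The point is only that ordering is preserved within a partition, so keyless records scattered over several partitions lose their relative order.

```python
import zlib
from itertools import count
from typing import Optional

NUM_PARTITIONS = 3  # hypothetical topic with three partitions
_round_robin = count()


def choose_partition(key: Optional[str]) -> int:
    """Toy partitioner: hash the key if present, otherwise rotate partitions."""
    if key is None:
        # No key: successive records are spread across partitions.
        return next(_round_robin) % NUM_PARTITIONS
    # With a key: the same key always maps to the same partition.
    # crc32 is a stand-in for the real hash function.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


keyed = [choose_partition("account-42") for _ in range(3)]
keyless = [choose_partition(None) for _ in range(3)]
print("keyed:  ", keyed)    # one partition, repeated
print("keyless:", keyless)  # three different partitions
```

Three related keyless updates land in three different partitions here, and consumers reading those partitions independently can see them in any interleaving.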

6 participants