Handling data redundancy for legacy data with the event bus #319

Open
robrap opened this issue Feb 21, 2024 · 6 comments
Labels
event-bus Work related to the Event Bus.

Comments

@robrap
Contributor

robrap commented Feb 21, 2024

How do we handle the case where services want to use the event bus for data redundancy across services, but there is legacy data that predates the existence of the event?

It's unclear if this ticket would result in any tooling, or just documentation to acknowledge and provide some guidance around this potentially common situation.

It is possible that each situation will need its own way to copy the old data (API, export/import, etc.) that doesn't conflict with the ongoing events, or the events could be rerun from a certain time.

How we handle db timestamps for each event may come into play.
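One way timestamps could come into play is sketched below. This is only an illustration, assuming each record carries a `modified` timestamp from the source service; `Row` and `apply_backfill` are hypothetical names, not part of any existing event bus tooling. The backfill writes a legacy row only when the replica has nothing newer for that key, so a copy of old data cannot overwrite what an ongoing event already delivered.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Row:
    key: str
    value: str
    modified: datetime  # assumed: when the source service last changed the record


def apply_backfill(replica: dict, legacy_rows: list) -> int:
    """Copy legacy rows into the replica without clobbering newer event data.

    Returns the number of rows actually written.
    """
    written = 0
    for row in legacy_rows:
        existing = replica.get(row.key)
        # Skip when an event has already delivered an equal or newer version.
        if existing is not None and existing.modified >= row.modified:
            continue
        replica[row.key] = row
        written += 1
    return written


t_old = datetime(2024, 1, 1, tzinfo=timezone.utc)
t_new = datetime(2024, 2, 1, tzinfo=timezone.utc)

# "a" already arrived through an event; "b" exists only in the legacy data.
replica = {"a": Row("a", "from-event", t_new)}
written = apply_backfill(
    replica,
    [Row("a", "legacy", t_old), Row("b", "legacy", t_old)],
)
```

Here only `b` is written; `a` keeps the value that came from the event.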

@robrap added the event-bus label Feb 21, 2024
@zacharis278
Contributor

If we want to rely on the event bus as the only method of data transfer, I could see "rebuilding" events from the old/existing data working safely when there is a single consumer. What happens the next time we need this same event data synced to a different service? Can we run that process again, or would it risk unforeseen side effects in the first service?

@robrap
Contributor Author

robrap commented Feb 21, 2024

@zacharis278: Based on the issues that you are raising, I don't think re-sending all events to the event bus is the right solution, at least for the topics we have created.

  1. There is the idea of event sourcing, where a topic becomes the source of truth of certain data and contains all history with infinite data retention (which may have PII and other implications). We decided to punt on this, but it is an option for a new topic. I'm not clear on whether and how event sourcing could be introduced for a previously existing source of data, but if we could, it would require a new topic that retains all the data from the start, and could be read/re-read by different consumers through time.
  2. For our existing topics, I think the initial data simply needs to be loaded separately. As you noted, all consumers will be affected, so it doesn't seem like the right pattern in general. Whether or not it is a hack that could be used in certain circumstances is a separate question, but we need to think about whether this is public or private code, and who else might be affected, etc. I'd prefer not to if it can be avoided though.
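To make the contrast between the two options concrete, here is a toy in-memory sketch of a fully retained topic. It is not how the event bus or any broker is implemented; `RetainedTopic` and its methods are hypothetical. Because each consumer tracks its own read position, a consumer added later can read the whole history without anything being re-sent to the consumers that already exist.

```python
class RetainedTopic:
    """Append-only log that never discards events (hypothetical, in-memory)."""

    def __init__(self):
        self._events = []
        self._offsets = {}  # consumer name -> index of the next event to read

    def publish(self, event):
        self._events.append(event)

    def poll(self, consumer):
        """Return the events this consumer has not read yet and advance its offset."""
        start = self._offsets.get(consumer, 0)
        batch = self._events[start:]
        self._offsets[consumer] = len(self._events)
        return batch


topic = RetainedTopic()
topic.publish({"id": 1})
topic.publish({"id": 2})

first_batch = topic.poll("service-a")   # service-a reads the two existing events
topic.publish({"id": 3})

late_batch = topic.poll("service-b")    # a consumer added later reads from the start
next_batch = topic.poll("service-a")    # service-a only sees the new event
```

With an existing topic that lacks full retention, the only way to give `service-b` the early events would be to publish them again, and then `service-a` would receive them a second time.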

@bmtcril
Contributor

bmtcril commented Feb 21, 2024

Creating synthetic old events is tricky in that it will explicitly be sending events out of order, which puts the onus of figuring out ordering and idempotency on every consumer of that stream (some of which may be out of the operator's control). IMO bootstrapping new consumers with all of the existing event data is a perfect use case for the event bus, but once data is flowing, manufacturing old events is fraught with peril.
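For a sense of what that onus looks like, here is a rough sketch of the bookkeeping a single consumer would need. It assumes every event carries a unique id and the time the change occurred; `Event` and `Consumer` are hypothetical names, not existing event bus classes. Every consumer of the stream would need something equivalent.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    event_id: str          # assumed unique per event
    key: str
    value: str
    occurred_at: datetime  # assumed: when the change happened at the source


class Consumer:
    def __init__(self):
        self.seen_ids = set()
        self.state = {}  # key -> (value, occurred_at)

    def handle(self, event: Event) -> bool:
        """Apply the event; return False if it was dropped as a duplicate or stale."""
        if event.event_id in self.seen_ids:
            return False  # idempotency: the same event delivered twice
        self.seen_ids.add(event.event_id)
        current = self.state.get(event.key)
        if current is not None and current[1] >= event.occurred_at:
            return False  # ordering: an older change arrived after a newer one
        self.state[event.key] = (event.value, event.occurred_at)
        return True


t_old = datetime(2023, 6, 1, tzinfo=timezone.utc)
t_new = datetime(2024, 2, 1, tzinfo=timezone.utc)

consumer = Consumer()
live = Event("e2", "user-1", "current", t_new)
synthetic = Event("e1", "user-1", "legacy", t_old)

applied_live = consumer.handle(live)            # normal event, applied
applied_synthetic = consumer.handle(synthetic)  # manufactured old event, dropped as stale
applied_again = consumer.handle(live)           # redelivery, dropped as duplicate
```

A consumer without these checks would have overwritten `current` with `legacy`.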

@robrap
Contributor Author

robrap commented Feb 21, 2024

@bmtcril: Can you help explain the following?

> IMO bootstrapping new consumers with all of the existing event data is a perfect use case for the event bus, but once data is flowing, manufacturing old events is fraught with peril.

When data on the event bus is sent to old, new, and future consumers, I'm just not clear on what perfect use case you are referring to? It feels like "once data is flowing" will always be the case, unless you mean quite literally when a topic is first getting produced to? And if so, I'm not clear on whether you are saying the event bus should be used for this and how you think this would work?

@zacharis278
Contributor

It's sounding like the best option for right now is to load data separately, and it's up to the particular implementation to work out a seamless transition from initially loaded data to incoming events. Luckily, I don't think this is problematic for our team's use case. As a general solution, I could see this being a bit harder to deal with if we have high-frequency events that do more than just create a DB row.
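One possible shape for that transition, as a sketch only: buffer incoming events while the separate load runs, then replay them on top of the loaded data. This assumes the consumer starts buffering before the snapshot is taken, and that each event carries the full current value for its key, so replaying an event already reflected in the snapshot is harmless. `BootstrappingConsumer` is a hypothetical name.

```python
class BootstrappingConsumer:
    """Holds events back during the initial load, then replays them in order."""

    def __init__(self):
        self.rows = {}
        self.loading = True
        self.pending = []  # events received while the initial load is running

    def on_event(self, key, value):
        if self.loading:
            self.pending.append((key, value))
        else:
            self.rows[key] = value

    def finish_initial_load(self, snapshot):
        """Install the separately loaded data, then apply the buffered events."""
        self.rows.update(snapshot)
        for key, value in self.pending:
            self.rows[key] = value
        self.pending.clear()
        self.loading = False


consumer = BootstrappingConsumer()
consumer.on_event("a", "new")  # arrives while the legacy data is still loading
consumer.finish_initial_load({"a": "old", "b": "old"})
consumer.on_event("b", "newer")  # applied directly once the load is done
```

This only covers events that upsert a row. Events with other side effects (sending email, incrementing counters) would need more care, which is the harder case mentioned above.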

@bmtcril
Contributor

bmtcril commented Feb 21, 2024

@robrap our comments crossed, but I think we were saying the same things. You just did it better. :)

3 participants