New pulsar source #8285

Open
jszwedko opened this issue Jul 14, 2021 · 4 comments
Labels
domain: sources (Anything related to the Vector's sources)
source: new (A request for a new source)
type: feature (A value-adding code addition that introduce new functionality.)

Comments

@jszwedko
Member

Suggested by a user in Discord.

@jszwedko jszwedko added domain: sources Anything related to the Vector's sources type: feature A value-adding code addition that introduce new functionality. labels Jul 14, 2021
@jaysonsantos
Contributor

Relaying the chat message for better visibility here.

Hey there folks, I'd like to contribute to Vector by adding a Pulsar source, but I have a few questions first. Do you mind shedding some light?
I'd like to try to build it with at-most-once guarantees and make it re-entrant. To do that with the S3 sink, for example, the flow I was thinking of is:
- use an exclusive/failover subscription so batching is more efficient and I can acknowledge just the last message, letting Pulsar commit everything up to it atomically
- send messages to the sink with the batch notifier
- acknowledge only the last message that the sink managed to push to the final destination
My questions about it are:
- I saw that the Kafka source uses a batch notifier [1] that calls the receiver function for every log entry. Is it feasible to aggregate the messages arriving over a few seconds in order to ack only the last one?
- Would the receiver be called only once the S3 sink has actually put the file into the bucket?
- Could the sink be configured to use the datetime of the first message in the batch, so that if the worker dies before acknowledging to Pulsar it can be re-entrant by pushing the batch file with the same name (at the expense of upload bandwidth)?
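
To make the last question concrete, here is a rough sketch of naming a batch file after its first message. `Event`, `timestamp_ms`, `object_key`, and the `logs/` prefix are hypothetical names for illustration only, not Vector's S3 sink API. It also assumes that a replay after a crash starts from the same first unacknowledged message, so the same name is produced again:

```rust
/// Minimal stand-in for an event read from Pulsar (hypothetical type).
struct Event {
    /// Publish time of the message in milliseconds since the Unix epoch.
    timestamp_ms: u64,
    payload: String,
}

/// Derive the object name from the first event only, so that replaying the
/// same batch after a crash overwrites the same object instead of adding one.
/// Returns `None` for an empty batch, since there is nothing to name.
fn object_key(prefix: &str, batch: &[Event]) -> Option<String> {
    let first = batch.first()?;
    Some(format!("{}{}.log", prefix, first.timestamp_ms))
}

fn main() {
    let batch = vec![
        Event { timestamp_ms: 1_626_220_800_000, payload: "first".to_string() },
        Event { timestamp_ms: 1_626_220_801_500, payload: "second".to_string() },
    ];
    // The body is every payload in the batch; only the name is deterministic.
    let body: Vec<&str> = batch.iter().map(|e| e.payload.as_str()).collect();
    println!("{:?} -> {} lines", object_key("logs/", &batch), body.len());
}
```

The name must come from the message itself (for example its publish time) rather than the wall clock at upload time, otherwise a retry would produce a different file.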

[1] https://github.com/timberio/vector/blob/master/src/sources/kafka.rs#L215-L219
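
The "aggregate for a few seconds, then ack only the last one" idea from the first question could look roughly like the sketch below. It is not how the Kafka source or any existing Vector code works; `AckTracker` and its plain `u64` sequence numbers are hypothetical stand-ins for Pulsar message IDs. The per-event callback only records deliveries, and a periodic flush returns the newest message for which every earlier message has also been delivered, which is the one that would be safe to acknowledge cumulatively:

```rust
use std::collections::BTreeSet;

/// Tracks which messages the sink has confirmed and reports the newest
/// message that is safe to acknowledge cumulatively.
#[derive(Debug, Default)]
struct AckTracker {
    /// Next sequence number that has not yet been covered by a flush.
    next: u64,
    /// Delivered sequence numbers waiting to be folded into the prefix.
    pending: BTreeSet<u64>,
}

impl AckTracker {
    /// Record that the sink delivered message `seq` (may arrive out of order).
    fn delivered(&mut self, seq: u64) {
        if seq >= self.next {
            self.pending.insert(seq);
        }
    }

    /// Intended to be called on a timer, e.g. every few seconds. Returns the
    /// last sequence number of the contiguous delivered prefix if it advanced,
    /// or `None` if there is nothing new to acknowledge.
    fn flush(&mut self) -> Option<u64> {
        let mut last = None;
        while self.pending.remove(&self.next) {
            last = Some(self.next);
            self.next += 1;
        }
        last
    }
}

fn main() {
    let mut tracker = AckTracker::default();
    for seq in [0, 1, 3] {
        tracker.delivered(seq);
    }
    // Message 2 is still in flight, so only 0..=1 can be acked cumulatively.
    println!("{:?}", tracker.flush());
    tracker.delivered(2);
    // Now 2 and 3 are both covered.
    println!("{:?}", tracker.flush());
}
```

Because the acknowledgement happens only after delivery, a crash between the upload and the ack means the same messages are read again on restart, which is why the re-entrant file naming in the third question matters.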

@sumeet-zuora

@jaysonsantos were you able to fork and create a Pulsar source? We are also interested.

@jszwedko jszwedko added the source: new A request for a new source label Dec 29, 2022
@bwmcadams

I am working on a Pulsar source for Vector, hopefully to have a PR soonish.

@bmcadams-datastax

> I am working on a Pulsar source for Vector, hopefully to have a PR soonish.

Scratch that, as it looks like there's an open PR for it already.
