Support multiple producers #36

josevalim · 2019-02-13T11:28:23Z

This issue is open to collect feedback and use cases.

lessless · 2019-03-08T20:02:30Z

Hi folks,

Thanks for pushing boundaries. Broadway has potential to make our lives so much simpler, and according to my current understanding having multiple producers is a key here.

In our case latency and correctness are two top priorities. Say, we need to consume a 1k messages at the same time (1k producers?) and be sure that each of them will end up in the sam processing unit (partitioned demand dispatcher?).

Our events has a lifecycle and to avoid queuing we are ok for spawning a process per event and shutting it down at the end of the event lifecycle or by timeout in days.

josevalim · 2019-03-08T20:04:53Z

How do you determine which messages go to which processing unit? EDIT: I am asking just so we understand the use case better to develop this feature.

josevalim · 2019-03-08T21:49:25Z

Our events has a lifecycle and to avoid queuing we are ok for spawning a process per event and shutting it down at the end of the event lifecycle or by timeout in days.

Follow up question: why is it important to go to the same unit? Do you keep intermediate results in memory?

lessless · 2019-03-09T08:39:57Z

Each message has an unique event_id field
Nope, it's about sequentiality. Some of a parallelly processed messages may update an exact same entity and in our system all updated should happen in an exactly the same order as they were in a queue.
Similar to a problem with word "are" in the word counting example ;)

josevalim · 2019-03-09T09:34:06Z

Alright, it is more about ordering than locality. Thank you!

mgwidmann · 2019-07-24T12:05:56Z

The broadway sqs producer is very difficult to tune without this feature. If I cannot separate download concurrency from processing concurrency (by making a separate stage for downloaders and a separate stage for processors) then I cannot scale them independently.

The best workaround currently is to have the processor spawn a few tasks to do the download but that leaves the processor waiting instead of processing an already ready message (say from another downloader).

msaraiva · 2019-07-24T12:18:04Z

Hi @mgwidmann!

If I cannot separate download concurrency from processing concurrency (by making a separate stage for downloaders and a separate stage for processors) then I cannot scale them independently.

They are already independent. You can define the concurrency level of the producer or processor by setting the :stages option individually. This issue is about supporting multiple different kinds of producers/sources simultaneously, like consuming data from different queues or even from SQS and RabbitMQ.

mgwidmann · 2019-07-24T12:20:03Z

Sorry I misread the title, thought this was about multiple processors! Is there a separate issue for that?

josevalim · 2019-07-24T12:37:02Z

@mgwidmann the issue right above this one. :D #39 I have some comments on this, so please copy and paste your original comment there and we can discuss solutions.

whatyouhide · 2019-08-30T16:46:23Z

We have a use case for multiple producers where we have different (RabbitMQ) producers producing from different RabbitMQ connections but producing the same kind of messages, that we want to process in the same way. I think it might not be such a unique use case, so it might be worth adding this :) As always, I volunteer to help if we want to go through with this at some point.

josevalim · 2019-08-31T19:21:16Z

In this case you can share the code using modules. I think we won’t get this feature in because we are adding the feature for a producer to change the topology, só producers could change the topology in conflicting ways.

whatyouhide · 2019-08-31T22:19:40Z

@josevalim I can share the code, yep, but I need to start two different Broadway pipelines with basically the same set of options except for the producer. It's fine, it's what we do now, but since I saw the open issue I thought I would discuss. I'm a bit concerned about the current Broadway API which suggests that multiple producers/processors should be supported (for example, passing a list of producers/processors, passing the producer name in handle_message, and so on). So maybe it might be worth deprecating the current API at some point.

whatyouhide · 2019-08-31T22:20:23Z

@josevalim btw, can you expand on the feature of a producer changing the topology?

josevalim · 2019-08-31T22:45:54Z

It is issue #100.

stavro · 2023-01-24T19:15:40Z

I have a situation where we have ~40 SQS queues that need to be consumed from.

I have concerns about setting each SQS queue up in an isolated BroadwaySQS pipeline, with the major concern being that tuning the global concurrency of message handling isn't possible.

In an ideal scenario I would be able to merge the messages from all queues into a singular pipeline, of which a limited number of processors would handle messages across all 40 queues (perhaps with custom priority logic).

With each BroadwaySQS pipeline in isolation, each of the 40 isolated pipelines would have a fixed number of processors, and under heavy load could overwhelm the system.

josevalim · 2023-01-25T07:40:49Z

My suggestion is two:

Change BroadwaySQS to allow multiple queues
Allow the queue/queues themselves to be costumised per producer index (BroadwayRabbitMQ already has this feature)

Then the idea is that you start X producers with Y queues. This is better than 40 producers because demand is always individual between producer/processor, and not shared.

josevalim · 2023-01-25T07:41:10Z

Pull request are welcome! :)

xandervr · 2023-03-10T15:39:45Z

Is someone working on this?

My use case would be to handle multiple sources of information in one pipeline and thus needing multiple types of producers in one pipeline.

whatyouhide · 2023-03-10T19:11:54Z

@xandervr do you mean the SQS case that José mentioned above?

xandervr · 2023-03-16T08:54:47Z

Not in particular, in general I just want Broadway to be able to have multiple types of producers in 1 pipeline. SQS, RabbitMQ... does not really matter, just every type of GenStage should be supported.

Let me know if I misunderstood your question.

@whatyouhide

josevalim · 2023-04-06T07:32:23Z

I cleaned up the thread a bit and reopened it.

atomkirk · 2023-08-05T13:01:08Z

My use case is I have a bunch of GCP deadletter topics that I'd like to consume from and process all the exact same way: store them in a table for review and possible retry. Would be nice not to set up a pipeline for each one.

ekosz · 2024-07-18T22:59:48Z

Hi there! Thank you again for the amazing library. I figured I would explain our usecase for multiple producers. We're an application that works with content creators. As creators signup with our service we need to listen to incoming events either from Youtube or Twitch for that new user, transform them, then both rebroadcast them to our own event system and store the (batched) events our datalake. We were thinking each creator/source tuple would be their own producer as they each have their own API credentials to manage as well as different ways they fetch events and such.

If we can only have a single producer than would the suggestion to create single process that all of our creator-specific produces send their events to? Which then acts as the source for the rest of the pipeline? Also not sure how we would shard / dynamically add or remove the producers as well.

josevalim · 2024-07-19T00:28:32Z

I think you only need the batching part of Broadway. We had plans to extract it out but we never completed them. But you should be able to roll your own batcher processes that accumulates items and then starts a task to publish them to your storage.

mgwidmann mentioned this issue Jul 24, 2019

Support multiple processors #39

Open

josevalim closed this as completed Sep 18, 2019

whatyouhide mentioned this issue Apr 6, 2023

Use Broadway for stream processing #264

Closed

dashbitco deleted a comment from msaraiva Apr 6, 2023

josevalim reopened this Apr 6, 2023

whatyouhide added the Kind:Feature label Apr 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support multiple producers #36

Support multiple producers #36

josevalim commented Feb 13, 2019

lessless commented Mar 8, 2019

josevalim commented Mar 8, 2019 via email •

edited

Loading

josevalim commented Mar 8, 2019

lessless commented Mar 9, 2019

josevalim commented Mar 9, 2019

mgwidmann commented Jul 24, 2019

msaraiva commented Jul 24, 2019

mgwidmann commented Jul 24, 2019

josevalim commented Jul 24, 2019

whatyouhide commented Aug 30, 2019

josevalim commented Aug 31, 2019

whatyouhide commented Aug 31, 2019

whatyouhide commented Aug 31, 2019

josevalim commented Aug 31, 2019

stavro commented Jan 24, 2023

josevalim commented Jan 25, 2023

josevalim commented Jan 25, 2023

xandervr commented Mar 10, 2023 •

edited

Loading

whatyouhide commented Mar 10, 2023

xandervr commented Mar 16, 2023 •

edited

Loading

josevalim commented Apr 6, 2023

atomkirk commented Aug 5, 2023

ekosz commented Jul 18, 2024 •

edited

Loading

josevalim commented Jul 19, 2024

Support multiple producers #36

Support multiple producers #36

Comments

josevalim commented Feb 13, 2019

lessless commented Mar 8, 2019

josevalim commented Mar 8, 2019 via email • edited Loading

josevalim commented Mar 8, 2019

lessless commented Mar 9, 2019

josevalim commented Mar 9, 2019

mgwidmann commented Jul 24, 2019

msaraiva commented Jul 24, 2019

mgwidmann commented Jul 24, 2019

josevalim commented Jul 24, 2019

whatyouhide commented Aug 30, 2019

josevalim commented Aug 31, 2019

whatyouhide commented Aug 31, 2019

whatyouhide commented Aug 31, 2019

josevalim commented Aug 31, 2019

stavro commented Jan 24, 2023

josevalim commented Jan 25, 2023

josevalim commented Jan 25, 2023

xandervr commented Mar 10, 2023 • edited Loading

whatyouhide commented Mar 10, 2023

xandervr commented Mar 16, 2023 • edited Loading

josevalim commented Apr 6, 2023

atomkirk commented Aug 5, 2023

ekosz commented Jul 18, 2024 • edited Loading

josevalim commented Jul 19, 2024

josevalim commented Mar 8, 2019 via email •

edited

Loading

xandervr commented Mar 10, 2023 •

edited

Loading

xandervr commented Mar 16, 2023 •

edited

Loading

ekosz commented Jul 18, 2024 •

edited

Loading