Batch message publishing #42
We'd also need to figure out how the … Option 1) …

Option 2) …

I think option 1 is better because …
To dive right into the options, Option 1 is correct in that the ttl is defined per channel. It's also future-proof in that we can add the … As you correctly point out, Option 2 doesn't make sense since the … I'd like to propose a third option that I think would result in a simpler publisher/consumer:

```json
[
  {
    "channel": "/my/unique/channel",
    "payload": {"pay": "load"},
    "ttl": 90
  },
  {
    "channel": "/my/unique/channel",
    "payload": {"would": "have been the body"},
    "ttl": 90
  },
  {
    "channel": "/some/other/channel",
    "payload": {"more": "data"},
    "ttl": 120,
    "buffer_size": 1
  },
  {
    "channel": "/some/other/channel",
    "payload": {"that would": "have been the body"},
    "ttl": 120,
    "buffer_size": 1
  }
]
```

The client can be less stateful when putting together a message to send to the server (e.g. you don't need to check whether a channel exists before adding messages to it), and this lends itself better to being a "stream" of events. This protocol would also be useful for connection multiplexing in the Firehose.js library on the client side, so that we don't have to open up 10 different WS connections for 10 different channels:

```json
[
  {
    "channel": "/my/unique/channel",
    "payload": {"pay": "load"},
    "sequence": 1
  },
  {
    "channel": "/some/other/channel",
    "payload": {"that would": "have been the body"},
    "sequence": 3
  }
]
```

Taking a step back, have you considered that HTTP/HTTPS is the wrong protocol for publishing to Firehose? Perhaps a direct connection to the Redis pub/sub instances would be performant without the HTTP connection overhead. There are also other serialization formats, like protobuf, that could speed things up.
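To make the "less stateful" point concrete, here is a minimal sketch of an Option 3 publisher. The `StreamPublisher` class name and its methods are hypothetical, not part of Firehose's API; the point is only that the client appends to a flat array with no per-channel bookkeeping.

```ruby
require "json"

# Hypothetical Option 3 client: messages accumulate in one flat array,
# each entry carrying its own channel and ttl.
class StreamPublisher
  def initialize
    @messages = []
  end

  # No need to check whether the channel already exists in the batch
  # before adding a message to it.
  def add(channel:, payload:, ttl:)
    @messages << { channel: channel, payload: payload, ttl: ttl }
  end

  # The whole array becomes a single POST body.
  def body
    JSON.generate(@messages)
  end
end
```

Because the state is just an array, the client can keep appending while a previous batch is in flight, which is the "stream of events" property described above.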
Those are some great ideas. I totally agree that the 3rd option's syntax, as you've proposed it, is much more stream-based and simpler for the client. However, for the use case I have in mind, I actually want my client to be more stateful. My goal is to address https://github.com/polleverywhere/firehose/issues/35 in the publisher, so that the unneeded messages are not even sent to the server. That is much more efficient than sending them to the server and having the server drop them.

The other reason I like Option 1 better than Option 3 is that it would allow the server to batch-update Redis. The current pull request doesn't do this; it just publishes one message at a time to Redis. In theory, though, we could have a single Redis Lua script that batch-updates each channel, so we'd only need one Redis command per channel in the batch update.

The idea of using other protocols is very interesting, but I think it requires more sysadmin overhead, so I'm not personally as interested in it at the moment.

Edit: I like Option 1, not 2.
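A sketch of what "one Redis command per channel" could look like: group the batch by channel, then run one atomic Lua `EVAL` per group. The key naming scheme, script, and helper below are assumptions for illustration, not Firehose's actual Redis schema.

```ruby
# Hypothetical Lua script: push a channel's batched payloads in one
# atomic call, then refresh the channel key's TTL.
CHANNEL_PUBLISH_SCRIPT = <<~LUA
  -- KEYS[1]: the channel's message list
  -- ARGV[1]: ttl in seconds; ARGV[2..]: message payloads
  local ttl = tonumber(ARGV[1])
  for i = 2, #ARGV do
    redis.call('rpush', KEYS[1], ARGV[i])
  end
  redis.call('expire', KEYS[1], ttl)
  return #ARGV - 1
LUA

# Group a mixed batch so each channel costs exactly one EVAL.
# The "firehose:<channel>:list" key format is made up for this sketch.
def eval_args_by_channel(messages)
  messages.group_by { |m| m[:channel] }.map do |channel, msgs|
    { keys: ["firehose:#{channel}:list"],
      argv: [msgs.first[:ttl].to_s, *msgs.map { |m| m[:payload] }] }
  end
end

# With a live connection you would then run, per channel:
#   redis.eval(CHANNEL_PUBLISH_SCRIPT, keys: call[:keys], argv: call[:argv])
```

Since each script invocation executes atomically in Redis, a channel's messages land in order even when several channels are updated in one batch.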
For Option 3 the client can still be more stateful, but that additional state doesn't have to be imposed on the protocol. If you impose state in the protocol, then at a minimum every client also needs to be stateful. If you don't, you can have both simple and more complex clients.
I'm going to do some performance tests to see how much benefit I get from batching requests client-side vs. both batching and dropping unneeded requests client-side.

Regarding stateful vs. stateless, I'm not sure that's the best comparison between the two options. Any client that deals with batching is stateful; one uses an array for its state, the other a dictionary/hash. The array is definitely simpler state, and it lends itself toward streaming: for example, the client could continue adding to the array while it is in the process of uploading to the server. The hash state requires a bit more logic client-side, but if the server can benefit from that logic (such as by batching Redis updates), it makes the system more scalable. Firehose clients (including publishers) are more horizontally scalable than Firehose servers.
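For contrast with the array-state publisher, here is a sketch of the hash-state client being argued for: keyed by channel, it can drop superseded messages before they are ever sent (the client-side version of issue #35's behavior for `buffer_size: 1` channels). The class and its internals are illustrative, not Firehose's API.

```ruby
# Hypothetical hash-state batch: pending messages are grouped by
# channel, so the client can coalesce before uploading.
class CoalescingBatch
  def initialize
    @by_channel = {}   # channel => pending messages for that channel
  end

  def add(channel:, payload:, ttl:, buffer_size: nil)
    pending = (@by_channel[channel] ||= [])
    # For a channel that only buffers one message, the newest payload
    # supersedes anything not yet sent; drop the stale ones here
    # instead of making the server do it.
    pending.clear if buffer_size == 1
    pending << { payload: payload, ttl: ttl, buffer_size: buffer_size }
  end

  # Total messages that would actually be sent.
  def size
    @by_channel.values.sum(&:length)
  end
end
```

The extra bookkeeping buys bandwidth savings at the publisher, which is the scalability trade-off described above.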
I've started to see the stream format emerge on the server:

```json
[
  {
    "message": "Hi there",
    "channel": "/greetings/from/earth",
    "ttl": "90"
  },
  {
    "message": "Bee boop",
    "channel": "/greetings/from/mars",
    "ttl": "60"
  },
  {
    "message": "Hi there again",
    "channel": "/greetings/from/earth",
    "ttl": "30"
  }
]
```

The Firehose server would process these messages from top to bottom, preserving the client's intent about the order in which it wants the messages published (though we can't make guarantees on publish order, we should at least make a best effort). I also want to attempt to define a more consistent message format for subscribing and publishing. Our clients are already consuming a stream of sequential messages, which an array best approximates. The service that's publishing to Firehose could batch up a bunch of messages with something like:

```ruby
batch = Firehose::Publisher::Batch.new
batch.messages << Firehose::Message.new("Hi there", channel: "/greetings/from/earth", ttl: 90, buffer_size: 1)
batch.messages << Firehose::Message.new("Bee boop", channel: "/greetings/from/mars", ttl: 60)
batch.messages << Firehose::Message.new("Hi there again", channel: "/greetings/from/earth", ttl: 30, buffer_size: 1)
batch.publish
```

Those messages would be converted into the JSON shown above and sent to the Firehose server for processing. The Firehose client would be free to implement rate limiting, etc. based on the contents of the messages (e.g. messages with …). We could guarantee the order of publishing received by the Firehose server from the JSON payload via the Lua script, since that's executed atomically.
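To show the batch-to-JSON conversion step concretely, here is a standalone sketch. `Firehose::Message` and `Firehose::Publisher::Batch` don't exist in this form; the `Message` struct and `Batch#to_json_body` below are assumptions chosen to reproduce the stream format shown earlier (including ttl serialized as a string).

```ruby
require "json"

# Hypothetical stand-in for Firehose::Message.
Message = Struct.new(:message, :channel, :ttl) do
  # Serialize to the stream format; note ttl becomes a string, as in
  # the JSON example above.
  def to_h
    { "message" => message, "channel" => channel, "ttl" => ttl.to_s }
  end
end

# Hypothetical stand-in for Firehose::Publisher::Batch.
class Batch
  attr_reader :messages

  def initialize
    @messages = []
  end

  # One JSON array, in the order the messages were added, so the
  # server can process top to bottom.
  def to_json_body
    JSON.generate(messages.map(&:to_h))
  end
end
```

`batch.publish` would then POST `to_json_body` to the Firehose server in a single request.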
I would like firehose to accept a batch of messages for various channels with a single HTTP request.
The POST body might look like this:
It might also be reasonable to return a 500 error in the case of an error, since we would generally expect all messages to be processed.
If any of the POSTed JSON is unexpectedly incorrect (for example, if one element in the array isn't a valid message to publish to a firehose channel), then we could return a 400.
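A minimal sketch of that status-code logic, assuming a message is valid when it has a string `channel` and some `message` body (the exact required fields are an assumption, not Firehose's spec):

```ruby
require "json"

# Decide the HTTP status for a batch-publish POST body:
#   400 for malformed JSON or any invalid message element,
#   200 when every element looks publishable.
# (A 500 would come from a later processing failure, not validation.)
def batch_response_status(body)
  messages = JSON.parse(body)
  return 400 unless messages.is_a?(Array)
  ok = messages.all? do |m|
    m.is_a?(Hash) && m["channel"].is_a?(String) && m.key?("message")
  end
  ok ? 200 : 400
rescue JSON::ParserError
  400
end
```

Rejecting the whole batch with a 400 keeps the semantics simple: the publisher can assume all-or-nothing acceptance rather than tracking partial failures.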
The use cases for this are: