Performance Issue #165
Comments
I was actually going to revisit performance issues. There was a problem with the new C lib that slowed things down, and it got me thinking about it again. When that fix is released, I want to compare against the C library, and also start testing the different async executors to see if they make a difference. My expectation is that this library should be a little slower than the C one, due to the FFI overhead, but not by much.
Thanks for your reply! I look forward to the new release.
Just profiled the async consumer with perf.
Interesting. In my initial notes from 5 years ago, it says, "reduce/eliminate mem copies." I should have listened to my own advice! I really do need to create some performance tests with large payloads to stay on top of this kind of thing. It might be worth keeping the underlying C struct and creating a Rust slice over the payload. That creates all kinds of issues, like creating the messages from Rust, moving them, and so on. But for that performance improvement, it would be worth trying.
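As a rough illustration of that idea, here is a minimal sketch. The `CMessage` struct below is a hypothetical stand-in; the real `MQTTAsync_message` in Paho C has more fields, and this only shows the borrowing part:

```rust
use std::slice;

// Hypothetical stand-in for the C library's message struct; the real
// MQTTAsync_message layout differs. This only illustrates the idea.
#[repr(C)]
struct CMessage {
    payload: *const u8,
    payloadlen: i32,
}

// Borrow the payload in place instead of copying it into a Vec.
// The returned slice is only valid while the C message stays alive,
// which is exactly the ownership/moving issue mentioned above.
unsafe fn payload_slice(msg: &CMessage) -> &[u8] {
    slice::from_raw_parts(msg.payload, msg.payloadlen as usize)
}

fn main() {
    let data = b"hello";
    let msg = CMessage {
        payload: data.as_ptr(),
        payloadlen: data.len() as i32,
    };
    let view = unsafe { payload_slice(&msg) };
    assert_eq!(view, b"hello"); // no copy of the payload was made
}
```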
I explored your code and saw that the message payload gets copied when it comes in from the C library. For subscription, that means an extra copy for every incoming message.
Yes. Originally, when this library was started, Rust used a different allocator (jemalloc) that was not compatible with the system allocator used by C. So you could not allocate memory in Rust and free it in C, or vice versa; the app would likely crash. This changed in 2018(?), and now Rust uses the system allocator by default.

For the most part, though, the C library doesn't pass memory ownership in and out of the library. The big exception is incoming messages, which the application must free manually. But, going the other way, on publish, the library doesn't take possession of the message. I can only assume it does an internal copy of the payload if it needs to cache the message, particularly for QoS >= 1.

That said, the Rust `Message` could keep the underlying C struct and borrow the payload out of it. But it also might just flip the problem around. Receiving messages would be more performant, but creating and sending messages could suffer a performance drop. With the current code, the app can build a payload into a `Vec<u8>` and hand it to the message.

What we would likely need is something like a `Cow`-style payload that can either borrow from the C buffer on receive or own a Rust buffer on publish.
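A minimal sketch of that shape, assuming a hypothetical `Payload` type that is not part of the crate's API:

```rust
// Hypothetical payload type, not part of paho-mqtt's actual API.
// Incoming messages could borrow the C library's buffer directly,
// while outgoing ones own their Vec, so neither path forces a copy.
enum Payload<'a> {
    Borrowed(&'a [u8]), // receive path: view over the C buffer
    Owned(Vec<u8>),     // publish path: buffer built by the app
}

impl Payload<'_> {
    fn as_slice(&self) -> &[u8] {
        match self {
            Payload::Borrowed(s) => s,
            Payload::Owned(v) => v,
        }
    }
}

fn main() {
    let rx = Payload::Borrowed(b"from C");
    let tx = Payload::Owned(b"from Rust".to_vec());
    assert_eq!(rx.as_slice(), b"from C");
    assert_eq!(tx.as_slice(), b"from Rust");
}
```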
The issue that @jerry73204 raised is a real one, and I'm still thinking of a way to make a message that avoids the receive-side copy. I copy/pasted @YuanYuYuan's test code to try it here.
The -sys crate has been updated on crates.io to eliminate one known performance issue. I'm using Linux Mint 20 on a fairly beefy desktop: an Intel® Core™ i9-10900KF CPU @ 3.70GHz × 20, with 64GB RAM. So I'm seeing these speeds:
A little closer to the Python speeds. But... trying again with the release build:
This is certainly faster, but a lot more erratic. It would be interesting to see where the bottleneck is here and what's causing the differences. (BTW: the messages with an 8-byte payload total 87 bytes each. So at max ~160k msg/s, it's doing around 14MB/s, not counting the TCP headers, etc.) I also tested with the unreleased code in the release branch, which now uses Paho C v1.3.12, which has some more changes to the underlying network code.
This seems fairly comparable with the previous version.
Sorry for resurrecting this old thread, but we're trying to integrate the Rust library, so I got curious. I think the performance difference comes mainly from how the C library is implemented. It basically uses a dedicated thread for sending out the messages, which reads from a queue and, if there are no more elements, waits on a condition variable. The problem is that in this test there is only a single Rust thread, which adds an element to the queue and then sleeps until it's woken up when the MQTT send thread successfully sends the message via TCP. Since there is always at most one message in the queue, the MQTT send thread always ends up waiting on the condition variable, so for small messages like 8 bytes the runtime is dominated by thread switching. To confirm this, at least on my machine, removing the wait in the C library made it a lot faster. Since that is not viable, a better way would be to allow publishing with a callback, for example along the lines sketched below.
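Here is a hedged sketch of that idea; `publish_with_callback` is a hypothetical method (the actual API in my branch, linked below, may differ), and the client is mocked so the snippet stands alone:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Mock client; `publish_with_callback` only shows the shape of the
// proposed API. A real client would enqueue the message and fire the
// callback from the send thread once the TCP write succeeds.
struct Client;

impl Client {
    fn publish_with_callback<F>(&self, _payload: &[u8], on_complete: F)
    where
        F: Fn() + Send + 'static,
    {
        on_complete();
    }
}

fn main() {
    let client = Client;
    let sent = Arc::new(AtomicUsize::new(0));

    for _ in 0..100_000 {
        let sent = Arc::clone(&sent);
        // No await/wait per message: the publish queue stays full, so
        // the C send thread never parks on its condition variable.
        client.publish_with_callback(b"12345678", move || {
            sent.fetch_add(1, Ordering::Relaxed);
        });
    }
    println!("completed publishes: {}", sent.load(Ordering::Relaxed));
}
```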
Instead of awaiting the token, the callback simply counts the number of successful publishes. This keeps the MQTT send queue full and prevents the send thread from sleeping/waiting. With this I get around 450k messages per second published to a local mosquitto broker on my M3. I will upload the branch to GitHub and link it here; it would be interesting to see if this is reproducible on other systems.
Here is a link to the test case and the fixes: https://github.com/jjj-vtm/paho.mqtt.rust/blob/performance_improvements/examples/pub_loop.rs. I am not sure if it is worth a feature request, but it shows that the Rust + C combination is quite fast.
@jjj-vtm Thanks for your insight. It would be nice if we could have the callback in the public crate. It could serve those focused on performance until we can make async/.await performant.
This is great stuff. I'm deep into refactoring the options structs to remove the C FFI dependence in them, as a first step towards a fully Rust library. That is nearing completion, and then I can start looking at this more closely. Also, @jerry73204, I haven't forgotten about the copy-on-receiving-messages performance hit. That would be a relatively quick boost as well.
Hi there,
While benchmarking the throughput of MQTT, I found that the Rust version is unstable and slower than the Python one. The test measures the throughput of the publisher. Here are the details.
pub.rs
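The original snippet was not preserved in this thread; below is a plausible minimal reconstruction of a publish loop that waits on every delivery token, using the paho-mqtt crate (the broker URI, topic, payload size, and message count are assumptions):

```rust
use std::time::Instant;

fn main() -> paho_mqtt::Result<()> {
    let client = paho_mqtt::AsyncClient::new("tcp://localhost:1883")?;
    client.connect(None).wait()?;

    let n = 100_000;
    let start = Instant::now();
    for _ in 0..n {
        let msg = paho_mqtt::Message::new("bench/topic", vec![0u8; 8], 0);
        // Waiting on every token serializes the send queue: there is
        // never more than one message in flight.
        client.publish(msg).wait()?;
    }
    println!("{:.0} msg/s", n as f64 / start.elapsed().as_secs_f64());

    client.disconnect(None).wait()?;
    Ok(())
}
```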
pub.py
Throughput comparison
pub.rs
pub.py
The problems are:

1. The throughput of the Rust publisher fluctuates a lot.
2. The Rust publisher is slower than the Python one.
I'm wondering if this is a performance issue, or if anything is missing in my Rust code. Any suggestions are appreciated!