-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: TopicMatcher as extension trait to iterators (BREAKING) #228
base: develop
Are you sure you want to change the base?
Conversation
Thanks God ECA didn't fail this time 😅 |
I couldn't run the tests myself as there were some compilation errors: error[E0277]: the trait bound `message::Message: std::convert::From<(&str, &[u8], types::QoS, bool)>` is not satisfied
--> src/message.rs:402:19
|
402 | let msg = Message::from((TOPIC, PAYLOAD, QOS, RETAINED));
| ^^^^^^^ the trait `std::convert::From<(&str, &[u8], types::QoS, bool)>` is not implemented for `message::Message`
|
= help: the following other types implement trait `std::convert::From<T>`:
<message::Message as std::convert::From<token::DeliveryToken>>
<message::Message as std::convert::From<(&'a str, &'b [u8])>>
<message::Message as std::convert::From<(&'a str, &'b [u8], i32, bool)>>
error[E0308]: mismatched types
--> src/message.rs:545:38
|
545 | assert_eq!(QOS as c_int, msg.qos());
| ^^^^^^^^^ expected `i32`, found `QoS`
Some errors have detailed explanations: E0277, E0308.
For more information about an error, try `rustc --explain E0277`.
error: could not compile `paho-mqtt` (lib test) due to 2 previous errors
warning: build failed, waiting for other jobs to finish... |
Thanks for the submission and wrestling with the ECA! The simple But the function does miss a corner case: Wildcards in the first field should not match a topic that starts with a dollar sign So together, something like this:
To verify, I extracted the tests from the Python library and added a few extras:
The updated version passes all the tests. |
But the problem with implementing the matcher as a collection of string filters is that it runs the risk of trading some up-front (probably one-time) allocations with runtime performance. And runtime performance is far more critical. A trie structure is extremely performant for lookups. Especially one that works correctly 🫤 Say you're writing an RPC server that maps a lot of similar request topics to callback functions, like:
and imagine you had hundreds of callbacks. For each incoming request message you would need to manually split each of those hundres of filters, as you searched through them, and then walk down the fields of each topic. Every time. With a trie you only need to walk down six nodes to find the matching value in this example... no matter how many topics are in the collection. So finding a match against a single topic is less efficient. But finding a match in a large collection of topics is much more efficient. At the cost of some up-front allocations. |
Good idea. It is also useful for application that (internally) use topics as arrays of levels.
Wow that's a good one. Gotcha. However I would rather have something like this: pub fn matches_iter<'a>(filter: impl Iterator<Item = &'a str>, topic: impl Iterator<Item = &'a str>) -> bool {
let mut filter = filter.peekable();
let mut topic = topic.peekable();
// See https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901246
if matches!(filter.peek(), Some(&"#" | &"+")) && topic.peek().unwrap_or_default().starts_with('$') {
return false;
}
loop {
let filter_level = filter.next();
let topic_level = topic.next();
match (filter_level, topic_level) {
// Exhausted both filter and topic
(None, None) => return true,
// Wildcard on filter
(Some("#"), _) => return true,
// Single level wildcard on filter
(Some("+"), Some(_)) => continue,
// Equal levels
(Some(filter), Some(topic)) if filter == topic => continue,
// Otherwise, no match
_ => return false,
}
}
} It clearly states the corner case before the main loop. Also, while we are at it, why not make
In theory, yes. However, I don't think its worth our time. Modern CPUs really like comparing contiguous strings. A trie structure would only be desirable if you maintain a considerable large collection of subscriptions with lots levels. Considering that the target audience for this crate are mostly MQTT clients for embedded and web services, I think this implementation is sufficient, and more importantly, correct. |
Yeah, I was thinking of checking for the corner case at the top. I think either implementation is fine.
No, that's not necessary. You typically specify the filter once when you create the matcher object, so it's optimal to split it once at the time of creation. But you get the topic as a
I, personally, maintain considerably large collection of subscriptions with lots levels! I do a lot of RPC-driven microservices on beefy Linux gateways, so runtime performance is paramount and I'm willing to sacrifice some allocations to get it. But really the only way to find out the optimal solution is to write some tests to measure performance. TODO.
This library is for the heftier operating systems: Linux, Windows, and MacOs; not for an RTOS. And considering all the issues that I've answered for problems on Windows, I can't say that this is mostly used in embedded systems! Who knows? But your point is taken. This library may be used in a number of different systems where one solution is not the best. So what I'm thinking....
That should give us the best of both worlds. User's would have a few choices out-of-the-box, but also be able to create a new implementation to suit their personal needs. And it will (mostly) eliminate the breaking change. (Just need to add the trait declaration to apps). I have a lot of code I don't want to change. I can merge your PR as-is and then restore the trie, or if you want to update the PR to just add your code and keep the pre-existing code. Let me know. (I've already started on #2 and #3) Oh, and BTW... if you can use MQTT v5, then use subscription identifiers whenever possible, and you don't need a topic matcher at all!!! |
Consider this example: use paho_mqtt::topic_matcher::matches_iter;
let message = // From a received MQTT message
let levels: Vec<_> = message.topic().split('/').collect();
if matches_iter(["+", "living_room", "+"].iter(), levels.iter()) {
let device_id: u8 = levels[0].parse()?;
let command: Command = levels[3].parse()?;
do_stuff(device_id, command, message);
}
if matches_iter(["+", "bathroom", "+", "+"].iter(), levels.iter()) {
let device_id: u8 = levels[0].parse()?;
let command: Command = levels[3].parse()?;
let arg = levels[4].parse()?;
// ...
} If Also, consider
Roger.
Note that if
Better to keep it separate imho
Hmm didn't really bother with MQTT v5. Ill take a look nonetheless. |
You should be using v5. Everyone should. The improvements to the spec are astounding. Better error reporting, server management, subscription options (no echo, identifiers), and message metadata (such as for standardized RPCs). Sorry, I should have mentioned this from the start because v5 can totally eliminate the burden of matching from the client and shift it to the server. Which is probably what you want in an under-powered client. As of a few months ago RabbitMQ was the last of the major brokers to add MQTT v5 support, and although it is incomplete, it does cover a lot of the spec, including subscription identifiers. All the other major brokers (Mosquitto, VerneMQ, HiveMQ, etc) have had full v5 support for years. So unless your suck with a really old broker due to your slow-moving IT department, there's no reason not to move to v5. With subscription identifiers, when you subscribe, you add a property to the subscription message assigning an integer identifier to the topic filter. Then, when the broker sends you a message, it includes the subscription identifier(s), telling you which filters matched for that message by integer ID. So you don't need a topic matcher at all; the broker does the work for you. And finding a value associated with that specific topic filter is a simple integer lookup table. Note that if you "subscribe many" with several filters in one SUBSCRIBE message, they all get the same ID, so you would typically subscribe to filters individually to tell them apart. And the broker has the option of sending you a separate message for each match with a single ID, or one message with multiple IDs. So watch for that if you overlap. The sync_consume_v5.rs example is a full implementation of a client using subscription identifiers with integer table lookup for associating callbacks to incoming messages. |
Unfortunately my application is required to work with v3.11 as a baseline. We will look into optimizing the software for v5 tho, it sounds promising! |
So I fixed, cleaned up, and optimized the I put it into a new branch topic_matcher and merged it with this PR. Now I'm trying to figure how to get to a trait that both techniques can share. I implemented an iterator that walks the whole collection so you can see all the items. And used it for an So, any ideas for a trait that allows for an implementation to use its optimal search? Or am I missing how to implement what's here already? Oh, I also renamed the EDIT: Oh, I see one mistake I made. A |
The existing trait also breaks MSRV with v1.63.0:
We may be able to bump that at some point, but I know people are using this crate with Yocto, and the the Rust compiler bundled into the last two versions are:
|
Those are some amazing news. Thank you.
Without https://rust-lang.github.io/rfcs/1210-impl-specialization.html there is little to be done. Either the trait is changed or TopicMatcher:!IntoIterator
Yea, good one. I'm terrible naming things
Does it break MSRV if you guard the trait behind a feature flag? |
Solves #226