Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TopicMatcher does not work as expected #226

Closed
Altair-Bueno opened this issue Apr 16, 2024 · 6 comments
Closed

TopicMatcher does not work as expected #226

Altair-Bueno opened this issue Apr 16, 2024 · 6 comments

Comments

@Altair-Bueno
Copy link

Summary

TopicMatcher behaviour is nowhere near the advertised behaviour, and it differs from the Python implementation

Example

from paho.mqtt.matcher import MQTTMatcher

matcher = MQTTMatcher()

matcher["$MAC/+/+/rpc"] = "_/device_type/systemid/_"
matcher["$MAC/+/+/+/rpc"] = "_/device_type/systemid/zoneid/_"
matcher["$MAC/+/rpc"] = "_/device_type/_"
matcher["$MAC/rpc"] = ""

print(list(matcher.iter_match("$MAC/humidifier/1/rpc")))
# ['_/device_type/systemid/_']
use paho_mqtt::topic_matcher::TopicMatcher;

fn main() {
    let mut topic_matcher = TopicMatcher::new();
    topic_matcher.insert("$MAC/+/+/rpc", "_/device_type/systemid/_");
    // Comment all inserts below to see the difference
    topic_matcher.insert("$MAC/+/+/+/rpc", "_/device_type/systemid/zoneid/_");
    topic_matcher.insert("$MAC/+/rpc", "_/device_type/_");
    topic_matcher.insert("$MAC/rpc", "");

    let matches: Vec<_> = topic_matcher.matches("$MAC/humidifier/1/rpc").collect();
    println!("{:?}", matches);
    // Prints []
    // Only works if it is the only insertion
}

MRE

topic-matcher-behaviour.zip

@Altair-Bueno
Copy link
Author

Looking at the current implementation of TopicMatcher, it seems to me it uses too many resources. There are multiple string clones which is not ideal, at least for my usecase (memory constrained device).

I decided to use a far simpler solution that is both correct and should use less memory. It should be somewhat performant too, at least with a few topics (which is my usecase anyways).

This is the code, for anyone interested (trait bounds should be revisited, as they are too general)

Code

// Author: Altair Bueno <[email protected]>

use std::borrow::Borrow;
use std::collections::BTreeMap;
use std::collections::HashMap;
use std::hash::Hash;

pub fn matches(pattern: &str, topic: &str) -> bool {
    let mut pattern = pattern.split('/');
    let mut topic = topic.split('/');
    loop {
        let pattern_level = pattern.next();
        let topic_level = topic.next();
        match (pattern_level, topic_level) {
            // Exhausted both pattern and topic
            (None, None) => return true,
            // Wildcard on pattern
            (Some("#"), _) => return true,
            // Single level wildcard on pattern
            (Some("+"), Some(_)) => continue,
            // Equal levels
            (Some(pattern), Some(topic)) if pattern == topic => continue,
            // Otherwise, no match
            _ => return false,
        }
    }
}

pub trait MQTTMatches {
    type Key<'this>
    where
        Self: 'this;
    type Value<'this>
    where
        Self: 'this;
    fn matches<'this, 'topic>(
        &'this self,
        topic: &'topic str,
    ) -> impl Iterator<Item = (Self::Key<'this>, Self::Value<'this>)> + 'topic
    where
        'this: 'topic;
}

impl<K, V, S> MQTTMatches for HashMap<K, V, S>
where
    K: Eq + Hash + Borrow<str>,
    S: std::hash::BuildHasher,
{
    type Key<'this> = &'this K
    where
        Self: 'this;
    type Value<'this> = &'this V
    where
        Self: 'this;

    fn matches<'this, 'topic>(
        &'this self,
        topic: &'topic str,
    ) -> impl Iterator<Item = (Self::Key<'this>, Self::Value<'this>)> + 'topic
    where
        'this: 'topic,
    {
        self.iter().filter(move |(pattern, _)| {
            let pattern: &str = (*pattern).borrow();
            matches(pattern, topic)
        })
    }
}

impl<K, V> MQTTMatches for BTreeMap<K, V>
where
    K: Eq + Hash + Borrow<str>,
{
    type Key<'this> = &'this K
    where
        Self: 'this;
    type Value<'this> = &'this V
    where
        Self: 'this;

    fn matches<'this, 'topic>(
        &'this self,
        topic: &'topic str,
    ) -> impl Iterator<Item = (Self::Key<'this>, Self::Value<'this>)> + 'topic
    where
        'this: 'topic,
    {
        self.iter().filter(move |(pattern, _)| {
            let pattern: &str = (*pattern).borrow();
            matches(pattern, topic)
        })
    }
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn assert_it_works_for_our_usecase() {
        let mut matcher = HashMap::<&str, &str>::new();
        matcher.insert("$MAC/+/+/rpc", "_/device_type/systemid/_");
        matcher.insert("$MAC/+/+/+/rpc", "_/device_type/systemid/zoneid/_");
        matcher.insert("$MAC/+/rpc", "_/device_type/_");
        matcher.insert("$MAC/rpc", "");

        let topic = "$MAC/humidifier/1/rpc";
        let matches: Vec<_> = matcher.matches(topic).collect();
        assert_eq!(
            matches,
            vec![(&"$MAC/+/+/rpc", &"_/device_type/systemid/_")]
        );
    }
}

@Altair-Bueno
Copy link
Author

Improved code with tests and documentation. Also, it now works with anything that can be iterated over.

Code

// Author: Altair Bueno <[email protected]>

/// Checks if a topic with wildcards matches a given topic.
fn matches(pattern: &str, topic: &str) -> bool {
    let mut pattern = pattern.split('/');
    let mut topic = topic.split('/');
    loop {
        let pattern_level = pattern.next();
        let topic_level = topic.next();
        match (pattern_level, topic_level) {
            // Exhausted both pattern and topic
            (None, None) => return true,
            // Wildcard on pattern
            (Some("#"), _) => return true,
            // Single level wildcard on pattern
            (Some("+"), Some(_)) => continue,
            // Equal levels
            (Some(pattern), Some(topic)) if pattern == topic => continue,
            // Otherwise, no match
            _ => return false,
        }
    }
}

/// Extension trait for map types and tuple iterators that allows to filter
/// entries by matching a MQTT topic.
///
/// # Example
///
/// ```
/// use std::collections::HashMap;
/// use std::collections::HashSet;
/// use az_mqtt::util::MQTTMatchesExt;
///
/// let mut matcher = HashMap::<&str, &str>::new();
/// matcher.insert("$MAC/+/+/rpc", "_/device_type/systemid/_");
/// matcher.insert("$MAC/+/+/+/rpc", "_/device_type/systemid/zoneid/_");
/// matcher.insert("$MAC/+/rpc", "_/device_type/_");
/// matcher.insert("$MAC/rpc", "");
///
/// let topic = "$MAC/humidifier/1/rpc";
/// let matches: HashSet<_> = matcher.matches(topic).collect();
/// assert_eq!(
///    matches,
///   HashSet::from([("$MAC/+/+/rpc", "_/device_type/systemid/_")])
/// );
/// ```
pub trait MQTTMatchesExt {
    /// The key type returned by the iterator.
    type Key;
    /// The value type returned by the iterator.
    type Value;

    /// Matches the given topic against the keys of the map and returns an
    /// iterator over the matching entries. Keys of the map are expected to
    /// be MQTT topic patterns and may contain wildcards.
    fn matches<'topic>(
        self,
        topic: &'topic str,
    ) -> impl Iterator<Item = (Self::Key, Self::Value)> + 'topic
    where
        Self: 'topic;
}

impl<K, V, C> MQTTMatchesExt for C
where
    C: IntoIterator<Item = (K, V)>,
    K: AsRef<str>,
{
    type Key = K;
    type Value = V;

    fn matches<'topic>(
        self,
        topic: &'topic str,
    ) -> impl Iterator<Item = (Self::Key, Self::Value)> + 'topic
    where
        Self: 'topic,
    {
        self.into_iter()
            .filter(move |(pattern, _)| matches(pattern.as_ref(), topic))
    }
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn assert_that_no_wildcards_matches() {
        assert!(matches("a/b/c", "a/b/c"));
    }
    #[test]
    fn assert_that_plus_wildcard_matches() {
        assert!(matches("a/+/c", "a/b/c"));
    }
    #[test]
    fn assert_that_leading_plus_wildcard_matches() {
        assert!(matches("+/b/c", "a/b/c"));
    }
    #[test]
    fn assert_that_trailing_plus_wildcard_matches() {
        assert!(matches("a/b/+", "a/b/c"));
    }
    #[test]
    fn assert_that_hash_wildcard_matches_none_level() {
        assert!(matches("a/b/#", "a/b"));
    }
    #[test]
    fn assert_that_hash_wildcard_matches_single_level() {
        assert!(matches("a/b/#", "a/b/c"));
    }
    #[test]
    fn assert_that_hash_wildcard_matches_multiple_levels() {
        assert!(matches("a/b/#", "a/b/c/d"));
    }
}

@fpagliughi
Copy link
Contributor

Hey! Thanks for this. Yes, the TopicMatcher was a fairly quick hack which I meant to re-assess when I had the time. But it has been working for my use case, so it got a low priority.

Unfortunately, since this is an Eclipse project, I can't really do anything with you code unless you submit it as a PR with a signed Eclipse (ECA) Agreement. Eclipse is big on the legal stuff.

@Altair-Bueno
Copy link
Author

I can do that, i already signed the annoying ECA for another project.

Do you want me to remove TopicMatcher and replace it with MQTTMatchesExt? If so, keep in mind it is a breaking change. But to be fair, TopicMatcher is already broken so there is little motivation to keep it. Additionally, want a different name or API?

@fpagliughi
Copy link
Contributor

Keep the name TopicMatcher. Submit against the develop branch. I'll use that to roll up a couple of breaking changes, if necessary.

@fpagliughi
Copy link
Contributor

The broken TopicMatcher is fixed in v0.12.4. The Rust example above now produces:

[("$MAC/+/+/rpc", "_/device_type/systemid/_")]

which is the key/value pair that matched; the key being the filter.

The new TopcMatcher is optimized quite a bit, and I'm much happier with the implementation. There are string and map allocations as the trie is created, but the iterator only uses a single Vec<> stack to search for matches. That seems pretty normal/reasonable for a tree search, especially one that is normally built/modified once and searched maybe millions of times.

I'll write some performance benchmarks in the near future.

I assume the existing implementation "does not work as expected" is fixed, so will close this issue. If it still seems broken, please feel free to re-open or create a new ticket.

As for new and alternate implementations and a common trait, let's move that discussion to PR #228

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants