-
-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize selectors matching #510
Conversation
26b07ff
to
75f9507
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice optimizations! 🚀
Let's see if we can solve that issue with the whitespaces :)
lib/floki/selector.ex
Outdated
end | ||
|
||
defp do_classes_matches?(class_attr_value, [class]) do | ||
Enum.member?(String.split(class_attr_value, [" ", "\t", "\n"]), class) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not exactly the same from the regex version, because the occurrence of multiple whitespace chars in sequence are treated differently.
For example, if we have a class attribute that contains multiple spaces between classes values, the former version would capture that correctly, while the new version does not:
class_attr_value = "my class\tanother one"
# new
String.split(class_attr_value, [" ", "\t", "\n"])
#=> ["my", "class", "another", "", "", "one"]
# old
String.split(class_attr_value, ~r/\s+/)
#=> ["my", "class", "another", "one"]
We can, however, reject the empty binaries from the match. Can you try this and see if there is an impact in performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, fa64b01.
Adding trim: true
didn't had performance impact in the benchmark, but it did reduce the memory usage, since we won't have this empty strings in the list.
lib/floki/selector.ex
Outdated
classes -- String.split(class_attr_value, ~r/\s+/) == [] | ||
min_size = Enum.reduce(classes, -1, fn item, acc -> acc + 1 + bit_size(item) end) | ||
can_match? = bit_size(class_attr_value) >= min_size | ||
can_match? && classes -- String.split(class_attr_value, [" ", "\t", "\n"]) == [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -63,7 +63,7 @@ defmodule Floki.Selector.AttributeSelector do | |||
s.attribute | |||
|> get_value(attributes) | |||
# Splits by whitespaces ("a b c" -> ["a", "b", "c"]) | |||
|> String.split(~r/\s+/) | |||
|> String.split([" ", "\t", "\n"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here as well. I think we need to consider the same problems here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -103,7 +103,7 @@ defmodule Floki.Selector.AttributeSelector do | |||
|
|||
def match?(attributes, s = %AttributeSelector{match_type: :includes, value: value}) do | |||
get_value(s.attribute, attributes) | |||
|> String.split(~r/\s+/) | |||
|> String.split([" ", "\t", "\n"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ypconstante thank you! |
Apply some optimizations on selector matching to try to avoid expensive calls:
match?
, check if selector requires descendant, and skip other checks if the current node doesn't have descendants