Complete rewrite #16
base: devel
added regex filtering ...
All right, reviewing the diff now
Basically you have to collapse the logic you built for AllMatches into a hash of keys (all keys that need to match) and intersect it with the keys available: if not all keys are around, just fail. If the keys are there, keep a hashset of all values and look for presence in it. This takes all the lookups to LOG complexity instead of the naive linear complexity you have everywhere. Implement NotAnyMatch the same way, i.e. a hash of keys (keys that should not match) intersected with the keys available; the resulting keys need the same set lookup on values. The rest is a good addition IMO; the not/or/and can be a nice way to generically combine stuff, and could possibly even allow a futures implementation over time to get parallelization on evaluation. |
ad 1. Yeah, this particular sentence was copy-pasted from
ad 2. This is complicated. First of all, but slightly unrelated: I'm still not sure about the semantics of your original implementation, as I found the comments kind of confusing. Does the filter accept when, for just one key, all the values match, or any value matches? I've implemented "all values match" and you don't seem to disagree after the code review, but after looking into your original implementation again, it seems to me that maybe the intent is actually "any value matches"? Can you please state what behavior you need?
Now to the actual problem: I do agree my solution doesn't scale. On the other hand, for simple filters it will be more performant, as it avoids all the bookkeeping of allocating HashMaps/Sets, computing hashes etc. I'm not sure there will be usages more complicated than yours. That is why I specifically asked you about your typical logging scenario. You said you have 5-6 keys, and I presumed that a linear find of a string among 5-6 strings wouldn't be much slower, if slower at all, than having to compute the hash of the string and doing a HashSet lookup. Most of the times, When the appropriate key is found, looking for the value in the
There will be some overhead associated with allocating the intermediate structures both with your implementation and my implementation, and various possibilities to mitigate it. I guess my structures have a bigger overhead for more complicated filters as there's more mallocing going on, but that would need to be measured to be sure. I also have some ideas on how to make the structures less alloc hungry, if that indeed proves to be a problem. I think at this point it would be too bold to claim that any one implementation will "outperform them all", as different scenarios ask for different "best" solutions. We could seriously complicate the implementation while actually harming the performance for the common use-cases.
If you do seriously care about performance, I suggest that you implement a criterion benchmark for your original implementation, with the scenario you want to benchmark. IIRC that would be 5-6 keys, 10 values for each, carefully selecting the keys and values to have similar characteristics to the ones you use in production with respect to length and comparison performance (that is, if your production keys are different words like "car", "mouse", "remote", don't use keys like "key1", "key2", "key3" in the benchmark, as those take longer to compare; try to use string lengths similar to the ones you use in production). I'll then port the benchmark to the new implementation so we can compare the performance based on hard data and not just speculation. If there is indeed a serious performance regression, we'll try to find ways to make it better.
ad 3. Ok, let's solve that after 2. is resolved. |
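The trade-off argued above (a linear scan over a handful of strings versus hashing the candidate and probing a set) can be pictured with a minimal sketch. It uses only the standard library; the function names and the sample values are hypothetical and are not taken from either KVFilter implementation. It only shows that the two strategies answer the same question, and makes no claim about which is faster, which is exactly what the proposed benchmark would have to measure.

```rust
use std::collections::HashSet;

/// Linear scan: compare the candidate against each accepted value in turn.
fn matches_linear(accepted: &[&str], candidate: &str) -> bool {
    accepted.iter().any(|value| *value == candidate)
}

/// Hash lookup: hash the candidate once, then probe the set.
fn matches_hashed(accepted: &HashSet<&str>, candidate: &str) -> bool {
    accepted.contains(candidate)
}

fn main() {
    // Hypothetical values, chosen as distinct words rather than "key1", "key2", ...
    let values = ["car", "mouse", "remote", "keyboard", "monitor", "cable"];
    let as_set: HashSet<&str> = values.iter().copied().collect();

    for candidate in ["remote", "printer"].iter() {
        // Both strategies must agree; they differ only in cost.
        assert_eq!(
            matches_linear(&values, candidate),
            matches_hashed(&as_set, candidate)
        );
    }
    println!("linear and hashed lookups agree");
}
```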
Hey Tomas, so I rewrote the comment (or actually I think maybe even David wrote it @ the end). /// * a key is ignored until present in should be really clear. filter takes a set of keys, each holding a set of values, and only passes if all keys have at least one matching value. neg_filter takes the same but rejects if any of the keys have at least one matching value. Feel free to add the according criterion benchmark on the main and submit another pull. I agree, good idea to run both over criterion ... yes, a scale of about 5-6 keys with 5-6 values each is about the envelope I think would be good to test. And I hear you: having a pre-computed hash of keys in front (I didn't see that in your implementation when looking @ the diff) to get the set of keys that need testing is good, and then I see the value of what you say, i.e. strcmp being fast compared to computing hashes. Let's look @ perf and then maybe we can fall back on something like prefix trees to try things ... |
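The filter / neg_filter semantics stated above can be modelled in a few lines. This is a sketch under assumptions, not the crate's code: the `KeyValues` alias, the helper names, and the sample keys are all hypothetical, and a log record is reduced to a slice of string pairs.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical shape of a filter: each key maps to its set of values.
type KeyValues = HashMap<String, HashSet<String>>;

/// Convenience builder for the examples below (not part of any real API).
fn key_values(spec: Vec<(&str, Vec<&str>)>) -> KeyValues {
    spec.into_iter()
        .map(|(key, values)| {
            let set: HashSet<String> = values.into_iter().map(String::from).collect();
            (key.to_string(), set)
        })
        .collect()
}

/// Positive filter: pass only if every configured key appears in the
/// record with at least one of its accepted values.
fn passes_filter(filter: &KeyValues, record: &[(&str, &str)]) -> bool {
    filter.iter().all(|(key, accepted)| {
        record
            .iter()
            .any(|(k, v)| *k == key.as_str() && accepted.contains(*v))
    })
}

/// Negative filter: reject if any configured key appears in the record
/// with one of its listed values.
fn rejected_by_neg_filter(neg_filter: &KeyValues, record: &[(&str, &str)]) -> bool {
    record
        .iter()
        .any(|(k, v)| neg_filter.get(*k).map_or(false, |values| values.contains(*v)))
}

fn main() {
    let filter = key_values(vec![("thread", vec!["100", "200"]), ("direction", vec!["send"])]);
    let neg_filter = key_values(vec![("deepcomp", vec!["1", "2"])]);

    let record = [("thread", "100"), ("direction", "send")];
    assert!(passes_filter(&filter, &record));
    assert!(!rejected_by_neg_filter(&neg_filter, &record));

    // A configured key is missing, so the positive filter fails.
    let missing_key = [("thread", "100")];
    assert!(!passes_filter(&filter, &missing_key));

    // A suppressed value is present, so the negative filter rejects.
    let suppressed = [("thread", "100"), ("direction", "send"), ("deepcomp", "1")];
    assert!(rejected_by_neg_filter(&neg_filter, &suppressed));
}
```

The value lookups here are hash-based, while the record itself is scanned linearly; with 5-6 keys per record either choice is plausible, which is the point the benchmark is meant to settle.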
Ok, I was kind of hoping you could do the Criterion part for the master and I then migrate it into complete-rewrite, but if you don't want to do it I guess I can do both, I've used Criterion before. I've got some pressing work to do, hope I'll get back to this on Wednesday or Thursday... |
Unfortunately, same here, I wouldn't get to it for a couple of weeks. If you don't find time I'll do it then ... |
OK, finally got to do it. I've implemented the benchmark on both the original KVFilter (see here: https://github.com/dvtomas/kvfilter/tree/criterion) and the complete-rewrite branch (see here: https://github.com/dvtomas/kvfilter/tree/complete-rewrite). I've implemented a simple AND benchmark with just two KVs, and a more complicated one with
The results for the original:
and for complete-rewrite
So, yes, the performance for your use-case has regressed, from roughly 4.5 us to 10.5 us per three log messages for your use case. In my opinion, that's not too bad, and I really like the new opportunities the new KVFilter has (basically, for me, it turns something that just can't do what I need into something that's actually useful to me). What do you think? |
So, Tony, did you take a look at the results? |
thomas, on my plate, after this weekend. sorry, drafts to submit & code to chuck ... |
@przygienda Hey, Tony, still busy coding? Any chance of a moment of free time to take a peek at the benchmarks in the near future? |
yes, on my plate. beg forgiveness ...
…On Wed, Oct 31, 2018 at 11:56 AM Tomas Dvorak ***@***.***> wrote:
|
Can this PR be merged? |
I've been using this branch personally for the last two years to my satisfaction, but I guess it is a breaking change (different API), and the performance might be somewhat worse for some specific cases. IMO the much more powerful design makes up for it, but opinions of others on whether it is worth the change may differ. |
yeah, I looked @ it couple times but the API completely breaks stuff & my
stuff is extremely performance sensitive. Best probably take it into a new
crate
…On Wed, Mar 10, 2021 at 1:28 PM Tomas Dvorak ***@***.***> wrote:
|
@dvtomas Do you have time to update the documentation, or some basic examples, for how to use your branch? I've been trying to setup a KVFilter to discard messages where a key is present and contains "bad" values, or if the key is not present... some semantics like this:
// discard/filter out before logging to drain
info!(logger, ""; "err" => None);
// discard/filter out before logging to drain
info!(logger, ""; "err" => "");
// discard/filter out before logging to drain
info!(logger, ""; "key" => "value");
// do not discard, log to the drain
info!(logger, ""; "err" => "error!!!");
However, I haven't been able to figure out how to achieve this with standard KVFilter. How would I express these logging semantics using your branch, @dvtomas?
Edit: Here's the current KVFilter code I've been using to no success:
let filter = KVFilter::new(drain, Level::Debug)
.always_suppress_any(Some(
vec![(
"err".to_string(),
HashSet::from_iter(vec!["None".to_string(), "".to_string()]),
)]
.into_iter()
.collect(),
))
.only_pass_any_on_all_keys(Some(vec![("err".to_string(), HashSet::new())].into_iter().collect())); |
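The only documentation for now is the unit tests, let me know if they are too difficult to understand. There is currently no filter with the semantics "Key exists". That is what you want, because with it you would be able to express your filter as (pseudocode) "Pass message if key_exists(key) && not(key_matches(key, bad_value)) && not(key_matches(key, ""))". The only missing piece is key_exists, the rest is already implemented. If you are really interested, I can implement key_exists for you and provide a complete example, it should be fairly simple to do that as the new architecture is easily extensible. |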
I think this is the major piece I'm looking for, @dvtomas. I don't know if @przygienda has plans to implement this, but I'd be open to using either the main project or your fork if you're willing/have the spare time to implement something like key_exists(key) for the filter.
I wouldn't say they're too difficult to understand, but I just have a hard time/have to spend a lot of time to extract info. on how to use the library just from reading unit tests... 😅
I'd very much appreciate you doing this! key_exists, along with an example of how to use it, would be very useful to me, given the semantics I've been trying (and failing) to achieve. |
hmm, key exists only or key not exists only, no, nothing like this but I
could add that if that's the only thing that bugs you ;-) not much work
…-- tony
On Wed, Mar 10, 2021 at 8:22 PM Sean Pianka ***@***.***> wrote:
There is currently no filter with the semantics "Key exists".
I think this is the major piece I'm looking for, @dvtomas
<https://github.com/dvtomas>. I don't know if @przygienda
<https://github.com/przygienda> has plans to implement this, but I'd be
open to using either the main project or your fork if you're willing/have
the spare time to implement something like key_exists(key) for the filter.
The only documentation for now is the unit tests, let me know if they are
too difficult to understand.
I wouldn't say they're too difficult to understand, but I just have a hard
time/have to spend a lot of time to extract info. on how to use the library
just from reading unit tests... 😅
The only missing piece is key_exists, the rest is already implemented. If
you are really interested, I can implement key_exists for you and provide a
complete example, it should be fairly simple to do that as the new
architecture is easily extensible.
I'd very much appreciate you doing this! key_exists, along with an
example of how to use it, would be very useful to me, given the semantics
I've been trying (and failing) to achieve.
|
@przygienda That'd be the only thing! 😁 I'm surprised no one has requested key_exists so far, perhaps the semantics with key_exists is an uncommon use-case? Anyways, do you want me to open a new issue for this, @przygienda? |
it is for me. let me implement that for fun now.
…On Wed, Mar 10, 2021 at 9:37 PM Sean Pianka ***@***.***> wrote:
@przygienda <https://github.com/przygienda> That'd be the only thing! 😁
I'm surprised no one has requested key_exists so far, perhaps the
semantics with key_exists is an uncommon use-case?
Anyways, do you want me to open a new issue for this, @przygienda
<https://github.com/przygienda>?
|
Pushed a preliminary version. Give it a try. If fits I'll publish as 0.7
…-- tony
On Wed, Mar 10, 2021 at 9:37 PM Sean Pianka ***@***.***> wrote:
|
version I pushed should allow you to do that
check for absence first
and then check for what values it has inside
…On Wed, Mar 10, 2021 at 7:18 PM Sean Pianka ***@***.***> wrote:
|
I'm still seeing some issues in a unit-test I'm writing for my custom filter. Here's a function which builds a filter that should match the semantics from above: "Pass message if key_exists(key) && not(key_matches(key, bad_value)) && not(key_matches(key, ""))".
fn filter<D>(drain: D, level: Level) -> KVFilter<D> where D: Drain {
KVFilter::new(drain, level)
.only_pass_on_any_key_present(["err".to_string()].iter())
.always_suppress_any(Some(
HashMap::from_iter(
vec![(
"err".to_string(),
HashSet::from_iter(vec!["None".to_string(), "".to_string()]),
)]
)
))
}
#[cfg(test)]
mod tests {
// `o` is imported as well because the test below calls `o!()`.
use slog::{debug, error, info, o, Drain, Level, Logger};
use super::filter;
#[test]
fn should_not_log_info_messages() {
let decorator = slog_term::PlainSyncDecorator::new(slog_term::TestStdoutWriter);
let drain = slog_term::FullFormat::new(decorator).build().fuse();
let filter = filter(drain, Level::Debug).fuse();
let logger = Logger::root_typed(filter, o!());
// should discard
info!(logger, "NO: test info");
info!(logger, "NO: test info"; "count" => 10);
info!(logger, "NO: test error"; "err" => "None");
info!(logger, "NO: test error"; "err" => "");
debug!(logger, "NO: test debug");
// should log to drain
info!(logger, "YES: test error"; "err" => "Panic!");
error!(logger, "YES: test error");
}
}
The output from this test is:
I'm not sure why any of the messages are being filtered. Am I building the logger incorrectly, or am I mis-using the code you just added? |
@seanpianka Ok, I've implemented what you need. See latest https://github.com/dvtomas/kvfilter/tree/complete-rewrite. Your exact use-case is implemented here: https://github.com/dvtomas/kvfilter/blob/f856384d9b6bbfbeb531152d439aabc4f3a4afe9/src/lib.rs#L1213 The logic is
#[test]
fn test_complex_example_3() {
// Implements https://github.com/slog-rs/kvfilter/pull/16#issuecomment-795856834
let tester = Tester::new(
LevelAtLeast(Level::Warning).or(
LevelAtLeast(Level::Info).and(
FilterSpec::key_exists("err")
.and(FilterSpec::match_kv("err", "None").not())
.and(FilterSpec::match_kv("err", "").not())
)),
EvaluationOrder::LoggerAndMessage,
);
// should discard
info!(tester.log, "REJECT: test info");
info!(tester.log, "REJECT: test info"; "count" => 10);
info!(tester.log, "REJECT: test error"; "err" => "None");
info!(tester.log, "REJECT: test error"; "err" => "");
debug!(tester.log, "REJECT: test debug");
// should log to drain
info!(tester.log, "ACCEPT: test error"; "err" => "Panic!");
error!(tester.log, "ACCEPT: test error");
tester.assert_accepted(2);
} |
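To make the key/value part of the test above easier to follow without the crate at hand, here is a small stand-alone model of how `key_exists`, `match_kv`, `not`, `and` and `or` could compose. It is a sketch under assumptions: the `Spec` enum and its `accepts` method are hypothetical stand-ins written for illustration, not the branch's actual `FilterSpec` type, and the level checks and evaluation order are left out.

```rust
/// Hypothetical stand-in for the filter combinators used in the test above.
#[derive(Debug, Clone)]
enum Spec {
    KeyExists(String),
    MatchKv(String, String),
    Not(Box<Spec>),
    And(Box<Spec>, Box<Spec>),
    Or(Box<Spec>, Box<Spec>),
}

impl Spec {
    fn key_exists(key: &str) -> Spec {
        Spec::KeyExists(key.to_string())
    }
    fn match_kv(key: &str, value: &str) -> Spec {
        Spec::MatchKv(key.to_string(), value.to_string())
    }
    fn not(self) -> Spec {
        Spec::Not(Box::new(self))
    }
    fn and(self, other: Spec) -> Spec {
        Spec::And(Box::new(self), Box::new(other))
    }
    fn or(self, other: Spec) -> Spec {
        Spec::Or(Box::new(self), Box::new(other))
    }

    /// Evaluate the expression tree against a record given as key/value pairs.
    fn accepts(&self, record: &[(&str, &str)]) -> bool {
        match self {
            Spec::KeyExists(key) => record.iter().any(|(k, _)| *k == key.as_str()),
            Spec::MatchKv(key, value) => record
                .iter()
                .any(|(k, v)| *k == key.as_str() && *v == value.as_str()),
            Spec::Not(inner) => !inner.accepts(record),
            Spec::And(a, b) => a.accepts(record) && b.accepts(record),
            Spec::Or(a, b) => a.accepts(record) || b.accepts(record),
        }
    }
}

/// "err" must be present and be neither "None" nor the empty string.
fn err_filter() -> Spec {
    Spec::key_exists("err")
        .and(Spec::match_kv("err", "None").not())
        .and(Spec::match_kv("err", "").not())
}

fn main() {
    let filter = err_filter();
    assert!(filter.accepts(&[("err", "Panic!")]));
    assert!(!filter.accepts(&[("err", "None")]));
    assert!(!filter.accepts(&[("err", "")]));
    assert!(!filter.accepts(&[("count", "10")]));
    assert!(!filter.accepts(&[]));

    // `or` composes the same way.
    let either = Spec::key_exists("err").or(Spec::key_exists("count"));
    assert!(either.accepts(&[("count", "10")]));
}
```

Modelling the filter as an expression tree is what makes an addition such as `key_exists` cheap: it is one more variant and one more match arm.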
sure, will look @ it tonite
…On Thu, Mar 11, 2021 at 1:01 AM Sean Pianka ***@***.***> wrote:
I'm still seeing some issues in a unit-test I'm writing for my custom
filter. Here's a function which builds a filter that *should* match the
semantics from above: "Pass message if key_exists(key) &&
not(key_matches(key, bad_value)) && not(key_matches(key, ""))".
|
I thought you gave me a unit test. write a full unit test for me that fails
and then I'll fix it. I don't have time to do that right now ...
…On Thu, Mar 11, 2021 at 1:01 AM Sean Pianka ***@***.***> wrote:
|
I wrote a unit test that fails, @przygienda. Did you see my message above? |
works fine, committed. what you missed is that only things @ logging level
same or higher than the KVFilter are filtered
since you put in debug ONLY the debug was filtered, e'thing else was higher
preference and making it out. check out the fixed test
…On Thu, Mar 11, 2021 at 7:44 PM Sean Pianka ***@***.***> wrote:
|
Your implementation and my implementation differ in handling the following statement (not included in the original @seanpianka unit test, but possibly important):
debug!(logger, "NO: test debug"; "err" => "Panic!");
My implementation rejects the message (see my newest commit), because it is at the debug level. Your implementation accepts it, because it passes the "has err key with a non-suppressed value" check. The question is which behavior @seanpianka prefers. My guess is that the intent is to reject debug messages altogether, so my implementation would be a better fit, but of course it is up to @seanpianka to decide. Just to let you know, it is quite easy to change the filter configuration in my implementation to give the alternative "accept debug" behavior. |
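The difference described above comes down to how the level check is combined with the key/value check. The sketch below models both policies as plain functions; it is a reading of this thread, not code from either implementation. The `Level` enum, its ordering (least to most severe), the `Info` threshold, and both function names are assumptions made for illustration.

```rust
/// Hypothetical severity levels, ordered from least to most severe.
/// (This ordering is an assumption of the sketch, not slog's own `Level` type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    Debug,
    Info,
    Warning,
    Error,
}

/// Threshold policy: messages more severe than `threshold` always pass;
/// messages at or below it pass only if the key/value check succeeds.
fn threshold_policy(level: Level, threshold: Level, kv_ok: bool) -> bool {
    level > threshold || kv_ok
}

/// Combinator policy mirroring the test above: warnings and errors always
/// pass, info passes only with a good "err" value, debug never passes.
fn combinator_policy(level: Level, kv_ok: bool) -> bool {
    level >= Level::Warning || (level >= Level::Info && kv_ok)
}

fn main() {
    // debug!(logger, "..."; "err" => "Panic!"): the key/value check succeeds.
    assert!(threshold_policy(Level::Debug, Level::Info, true));
    assert!(!combinator_policy(Level::Debug, true));

    // The two policies agree on the cases from the original unit test.
    for (level, kv_ok) in [
        (Level::Info, true),
        (Level::Info, false),
        (Level::Debug, false),
        (Level::Error, false),
    ]
    .iter()
    {
        assert_eq!(
            threshold_policy(*level, Level::Info, *kv_ok),
            combinator_policy(*level, *kv_ok)
        );
    }
}
```

Under these assumptions the two policies only diverge for messages below info that carry an acceptable "err" value, which is the single statement quoted above.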
my experience on large systems is that you normally run above debug for
performance reasons and you switch on debug but have to filter debug
statements because otherwise the output will overrun you. Nevertheless you
want info/error and so on not be filtered because they're too important. I
make a living out of that for last 25+ years and work on systems that
literally hold up infra @ planet scale so I have fairly firm opinion ;-)
…On Thu, Mar 11, 2021 at 10:18 PM Tomas Dvorak ***@***.***> wrote:
@przygienda <https://github.com/przygienda> @seanpianka
<https://github.com/seanpianka>
Your implementation and my implementation differ in handling the following
statement (not included in the original @seanpianka
<https://github.com/seanpianka> unit test, but possibly important):
debug!(logger, "NO: test debug"; "err" => "Panic!");
My implementation rejects the message (see my newest commit
<dvtomas@a1bad7d>),
because it is at the *debug* level. Your implementation accepts it,
because it passes the *has err key with a non-supressed value*. The
question is, which behavior @seanpianka <https://github.com/seanpianka>
prefers. My guess is that the intent is to reject debug messages
altogether, so my implementation would be a better fit, but of course it is
up to @seanpianka <https://github.com/seanpianka> to decide. Just to let
you know, it is quite easy to change the filter configuration in my
implementation to give the alternative *accept debug* behavior.
|
Yeah, I usually do the same. @seanpianka seems to have a different use-case, though, judging by him wanting to have Also, your implementation accepts the debug messages, which is something you usually don't want, as you've just claimed yourself. Is there a way to configure your filter to pass @seanpianka 's unit test, but reject debug messages? |
# Conflicts: # Cargo.toml # README.md # src/lib.rs
Implement #15