Filter Duplicate Input Execution #2771
* Rules * more * aa
* fixing empty multipart name * fixing clippy * improve flexibility of DumpToDiskStage * adding note to MIGRATION.md
Updates the requirements on [bindgen](https://github.com/rust-lang/rust-bindgen) to permit the latest version. - [Release notes](https://github.com/rust-lang/rust-bindgen/releases) - [Changelog](https://github.com/rust-lang/rust-bindgen/blob/main/CHANGELOG.md) - [Commits](rust-lang/rust-bindgen@v0.70.1...v0.71.1) --- updated-dependencies: - dependency-name: bindgen dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* no from stage * fixer * doc fix * how was this working???? * more fixes * delete more * rq * cargo-fuzz * m * aa
* go * fixing stuf * hello from windows * more * lolg * lolf * fix * a --------- Co-authored-by: Your Name <[email protected]>
* Maybe fix CI * does this help? * Very dirty 'fix'
* fixing empty multipart name * fixing clippy * New rules for the contributing (AFLplusplus#2752) * Rules * more * aa * Improve Flexibility of DumpToDiskStage (AFLplusplus#2753) * fixing empty multipart name * fixing clippy * improve flexibility of DumpToDiskStage * adding note to MIGRATION.md * Introduce WrappingMutator * introducing mutators for int types * fixing no_std * random fixes * Add hash derivation for WrappingInput * Revert fixes that broke things * Derive Default on WrappingInput * Add unit tests * Fixes according to code review * introduce mappable ValueInputs * remove unnecessary comments * Elide more lifetimes * remove dead code * simplify hashing * improve docs * improve randomization * rename method to align with standard library * add typedefs for int types for ValueMutRefInput * rename test * add safety notice to trait function * improve randomize performance for i128/u128 * rename macro * improve comment * actually check return values in test * make 128 bit int randomize even more efficient * shifting signed values --------- Co-authored-by: Dongjia "toka" Zhang <[email protected]> Co-authored-by: Dominik Maier <[email protected]>
Or even |
As I stated in the discussion thread, I think a method for rejecting inputs that were already tried would be more useful (but I don't know your use case, so..)
I'm targeting the TCP/IP stack of an OS, so each execution takes on the order of 1s, although most of that is spent in wait states (hence previous work like overcommit). Even so, the added runtime of this would be negligible compared to the execution, so this felt like an easy win.
Something like this would definitely further improve the situation. Do you suggest creating a wrapping executor that returns either Tracing this back it seems most appropriate in the stage? But that seems not that generic. So maybe in I'm also not sure if there's an opportunity here to combine this somehow with |
I think it could simply wrap an executor, yeah. And have an extra observation that's "skipped": if it's true, the testcase isn't interesting. Should be easy enough to do. We can still merge this PR as well, but the feedback should be renamed IMHO.
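The wrapping idea from this comment can be sketched with the standard library alone. Everything below is hypothetical and not LibAFL's API: `SkippingExecutor`, `Outcome`, and the closure-based harness are stand-ins, and the `Skipped` variant plays the role of the "skipped" observation. Because only a 64-bit hash is stored, a hash collision would cause a new input to be skipped.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Outcome of one call; `Skipped` stands in for the "skipped" observation.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Executed,
    Skipped,
}

/// Hypothetical wrapper (not a LibAFL type): remembers the hashes of inputs
/// it has already run and refuses to run them again.
pub struct SkippingExecutor<F: FnMut(&[u8])> {
    harness: F,
    seen: HashSet<u64>,
}

impl<F: FnMut(&[u8])> SkippingExecutor<F> {
    pub fn new(harness: F) -> Self {
        Self {
            harness,
            seen: HashSet::new(),
        }
    }

    pub fn run(&mut self, input: &[u8]) -> Outcome {
        let mut hasher = DefaultHasher::new();
        input.hash(&mut hasher);
        // `insert` returns false if the hash was already present.
        if !self.seen.insert(hasher.finish()) {
            return Outcome::Skipped;
        }
        (self.harness)(input);
        Outcome::Executed
    }
}

/// Runs three inputs, one of them a repeat, and returns how often the
/// harness was actually called.
pub fn demo_skipping_executor() -> usize {
    let mut runs = 0;
    {
        let mut exec = SkippingExecutor::new(|_input: &[u8]| runs += 1);
        exec.run(b"a");
        exec.run(b"b");
        exec.run(b"a"); // repeat: skipped
    }
    runs
}
```

A feedback could then treat `Outcome::Skipped` as "not interesting" without looking at any other observer.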
How about something like this?
I'll do some performance comparisons later today. Initial runs suggest that adding even a 10µs sleep to the harness reduces the performance penalty to <5%. I might also see how many duplicate inputs actually appear. But for now I feel like for slow targets this very well might be worth using.
Alright, some performance tests. Running against the
All these numbers obviously depend on the exact fuzzers:
Overall, I feel like this may be worth having in the library. Btw: There is no easy way of adding metadata to the state such that it is printed by monitors, right? Otherwise, calculating the number/rate of duplicates may be an interesting addition.
There is an easy way, using |
@@ -56,28 +56,13 @@ license = "MIT OR Apache-2.0"
# Internal deps
libafl = { path = "./libafl", version = "0.14.1", default-features = false }
libafl_bolts = { path = "./libafl_bolts", version = "0.14.1", default-features = false }
libafl_cc = { path = "./libafl_cc", version = "0.14.1", default-features = false }
That's just testing for I assume?
@@ -8,8 +8,9 @@ authors = [
edition = "2021"

[features]
-default = ["std"]
+default = ["std", "bloom_filter"]
The feature flag should be named after the feature, not after an implementation detail. Maybe something like "reexecution_filter" or similar?
bloom_input_filter would be a middle ground(?)
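For readers unfamiliar with the implementation detail being named here, this is a minimal Bloom-filter sketch using only the standard library. It is an illustration under assumptions, not the code from this PR: `BloomInputFilter`, `check_and_insert`, and the seeded use of `DefaultHasher` are all hypothetical choices.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical Bloom filter over raw input bytes (not the PR's type).
pub struct BloomInputFilter {
    bits: Vec<bool>,
    num_hashes: u64,
}

impl BloomInputFilter {
    pub fn new(num_bits: usize, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        Self {
            bits: vec![false; num_bits],
            num_hashes,
        }
    }

    /// Derives the bit index for one of the `num_hashes` hash functions by
    /// mixing a seed into the hasher before the input.
    fn index(&self, input: &[u8], seed: u64) -> usize {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        input.hash(&mut hasher);
        (hasher.finish() % self.bits.len() as u64) as usize
    }

    /// Records the input and returns `true` if it was definitely not seen
    /// before. A `false` result means "probably seen": Bloom filters can
    /// report false positives but never false negatives.
    pub fn check_and_insert(&mut self, input: &[u8]) -> bool {
        let mut is_new = false;
        for seed in 0..self.num_hashes {
            let idx = self.index(input, seed);
            if !self.bits[idx] {
                is_new = true;
                self.bits[idx] = true;
            }
        }
        is_new
    }
}
```

The trade-off is fixed memory in exchange for occasionally skipping an input that was never executed; the false-positive rate depends on the bit count, the number of hash functions, and how many inputs have been inserted.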
Btw: This will not work, since the stage might mutate the input and execute it multiple times, while the filtering is only possible at the start of the stage. So while the input at the beginning of the mutational stage might not have been seen before, mutations might still transform it into a version we've already executed. Also: the input at the beginning of the stage comes from the corpus, right? So it has been executed before by definition. So implementing this as a wrapper stage does not seem possible. Implementing this within stages would require changes to every mutational stage. I'm in favour of doing it in the fuzzer tbh. Implementations in the executor still require observers/feedbacks to run; implementations in the stage don't really work either.
Yes. After talking to domenukk, I realized that what we want is to filter every generated input, so it's impossible to do with stages.
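The point about filtering every generated input, rather than only the input a stage starts from, can be illustrated with a toy loop. This is a self-contained sketch with made-up names (`mutate`, `executions_in_stage`) and a deliberately trivial deterministic mutator. It stores inputs exactly in a `HashSet` instead of using a Bloom filter.

```rust
use std::collections::HashSet;

/// Toy deterministic "mutator": iteration `i` overwrites the first byte
/// with `i % 2`, so it keeps regenerating the same two children.
fn mutate(parent: &[u8], i: usize) -> Vec<u8> {
    let mut child = parent.to_vec();
    if let Some(first) = child.first_mut() {
        *first = (i % 2) as u8;
    }
    child
}

/// Counts how many times the target would run during one mutational stage.
/// With `filter_each_child == false`, every child is executed.
pub fn executions_in_stage(parent: &[u8], iterations: usize, filter_each_child: bool) -> usize {
    // The parent comes from the corpus, so it has already been executed.
    let mut seen: HashSet<Vec<u8>> = HashSet::new();
    seen.insert(parent.to_vec());

    let mut executions = 0;
    for i in 0..iterations {
        let child = mutate(parent, i);
        // `insert` returns false for a child that was already seen.
        let is_new = seen.insert(child);
        if filter_each_child && !is_new {
            continue;
        }
        executions += 1; // stand-in for running the target
    }
    executions
}
```

With parent `[0]` and four iterations, the children are `[0]`, `[1]`, `[0]`, `[1]`. A check at the start of the stage sees only the parent and cannot prevent any of the four runs, while a per-child check leaves a single execution (the first `[1]`).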
So do I fix the things @domenukk mentioned in the beginning and we merge this approach? Or how do we continue?
The idea is to add the option to return |
Correct me if I'm wrong, but wouldn't this still run the observers and feedback every time?
Not if we change the executors to not execute in this case
But those are not run from within the executors but instead in the fuzzer, no? Or do you want to change this as well?
@riesentoaster |
I think the problem is that we are using several types of APIs to call the harness target. I think adding your change to one of them before unifying their use is not a good idea.
Another solution: since your stuff will mostly work with MutationalStage or PowerMutationalStage, we can just add the filter to those files only, then domenukk's
this problem is solved
Also GenStage and TuneableMutationalStage at least. Sounds like a good solution but we need to be careful not to forget things
This sounds like a lot of code duplication.
I like the ability to just call the executor without any observers or anything around it, since it may be helpful to run just the target. I personally think the functionality in this PR should be implemented wherever the observers and executor are called during the fuzzing loop. If that is in the fuzzer, so be it. If we want it elsewhere, move the logic that calls observers/executors there, too. It's a single function. Anything else is just hacky. Btw: Why does executor.run_target need a reference to the fuzzer? That seems like a cyclic dependency, since it would otherwise mostly be called through the fuzzer. I think it's exclusively used to run observers in |
It's the right amount of code duplication: each stage should probably decide for itself if it needs to filter inputs or not.
Unrelated, if you can remove some trait bounds you're more than welcome to open a PR :)
We could have a run_with_filter method on the executor trait, maybe that'd reduce the shared code?
And have a default implementation that calls |
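One way to read the `run_with_filter` suggestion is a provided trait method that consults a filter and then delegates. The sketch below is an assumption-laden illustration, not LibAFL's `Executor` trait: the trait shapes, the `should_execute` method, and the choice to delegate to a plain `run_target` are all hypothetical, since the comment above does not say what the default implementation would call.

```rust
use std::collections::HashSet;

/// Hypothetical filter interface (the real `InputFilter` in this PR may differ).
pub trait InputFilter {
    /// Returns `true` if the input should be executed, and records it.
    fn should_execute(&mut self, input: &[u8]) -> bool;
}

/// Hypothetical, heavily simplified executor trait.
pub trait Executor {
    fn run_target(&mut self, input: &[u8]);

    /// Provided method: implementors get filtering for free.
    /// Returns `true` if the target was actually run.
    fn run_with_filter<F: InputFilter>(&mut self, filter: &mut F, input: &[u8]) -> bool {
        if filter.should_execute(input) {
            self.run_target(input);
            true
        } else {
            false
        }
    }
}

/// Exact (non-probabilistic) filter used for the demonstration.
#[derive(Default)]
pub struct ExactFilter {
    seen: HashSet<Vec<u8>>,
}

impl InputFilter for ExactFilter {
    fn should_execute(&mut self, input: &[u8]) -> bool {
        // `insert` returns true only for inputs not seen before.
        self.seen.insert(input.to_vec())
    }
}

/// Executor that only counts how often the target ran.
#[derive(Default)]
pub struct CountingExecutor {
    pub runs: usize,
}

impl Executor for CountingExecutor {
    fn run_target(&mut self, input: &[u8]) {
        let _ = input; // a real executor would run the harness here
        self.runs += 1;
    }
}
```

Stages that want filtering would call `run_with_filter`, while calibration or tracing stages keep calling `run_target` directly. The open question raised in this thread, where observers and feedbacks are run, is not addressed by this sketch.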
I've thought about this some more. To me, filtering the evaluation of an input belongs in the
Would you be willing to entertain any of these ideas? My current favourite is 3 > 1 > 2.
Now we know the executor hook won't work, and implementing it in the stages causes code duplication,
@@ -549,6 +589,7 @@ where
+ UsesInput<Input = <S::Corpus as Corpus>::Input>,
<S::Corpus as Corpus>::Input: Input,
S::Solutions: Corpus<Input = <S::Corpus as Corpus>::Input>,
IF: InputFilter<<S::Corpus as Corpus>::Input>,
{
/// Process one input, adding to the respective corpora if needed and firing the right events
Can you add a doc comment here explaining that filtering takes place at this point?
I'm still pro stages: most stages don't need filtering. At least the Calibration, Tracing, and Concolic execution stages will all break when we filter the inputs. Many other stages do not execute the target at all.
It's only relevant for three stages or so, right? Let's just add a helper function that the stages can call if they choose.
Good points. That would also mean storing the filter data in each stage. Is that the best approach? Or do we want to store that globally for each fuzzing instance (so probably either in the state or the fuzzer)? Also: I think we should still make the unfiltered version of each stage available, since the overhead of filtering may not always be worth it. And since we need to store the data somewhere, but only in certain cases, it would involve significant code changes to each stage, more than a simple helper function.
can't we add |
I would name it |
See #2759.
Some mutators report MutationResult::Mutated, even if nothing actually changes about the input. HashMutator is a wrapper around other mutators that hashes inputs pre- and post-mutation to ensure MutationResult::Mutated is only reported if something actually changed.

This may be worth using on slow targets, where the hashing is quicker than the unnecessary additional executions of the target for previously tried inputs.
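The wrapper described above can be approximated in a few lines. This is a simplified, self-contained sketch and not the actual `HashMutator`: the `Mutator` trait and `MutationResult` enum here are local stand-ins that operate on `Vec<u8>`, and `SketchHashMutator` and `ZeroFirstByte` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Local stand-in for the library's result type.
#[derive(Debug, PartialEq, Eq)]
pub enum MutationResult {
    Mutated,
    Skipped,
}

/// Local stand-in for a mutator over byte inputs.
pub trait Mutator {
    fn mutate(&mut self, input: &mut Vec<u8>) -> MutationResult;
}

fn hash_of(input: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    input.hash(&mut hasher);
    hasher.finish()
}

/// Wraps another mutator and downgrades `Mutated` to `Skipped` when the
/// hash of the input is the same before and after the inner mutation.
pub struct SketchHashMutator<M> {
    pub inner: M,
}

impl<M: Mutator> Mutator for SketchHashMutator<M> {
    fn mutate(&mut self, input: &mut Vec<u8>) -> MutationResult {
        let before = hash_of(input);
        if self.inner.mutate(input) == MutationResult::Skipped {
            return MutationResult::Skipped;
        }
        if hash_of(input) == before {
            MutationResult::Skipped
        } else {
            MutationResult::Mutated
        }
    }
}

/// Toy mutator that sets the first byte to 0 and always claims `Mutated`,
/// even when the byte was already 0.
pub struct ZeroFirstByte;

impl Mutator for ZeroFirstByte {
    fn mutate(&mut self, input: &mut Vec<u8>) -> MutationResult {
        if let Some(first) = input.first_mut() {
            *first = 0;
        }
        MutationResult::Mutated
    }
}
```

Wrapping `ZeroFirstByte` this way reports `Skipped` for an input that already starts with 0, so a stage can avoid re-executing an unchanged input. A hash collision between two different inputs would wrongly report `Skipped`, which is the cost of comparing hashes instead of full inputs.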