-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AFL filename formats #11
Conversation
Note that only the two first commits directly concern the PR. |
Thanks for the PR @cponcelets! I will take a closer look at the code sometime soon, sorry about that as I am quite busy these days. A bit more context on why using the simplified format, as KLEE needs to convert the synthesized input back to something AFL can recognize, it does not have the mutation context on this seed, thus I chose the simplified format in the first place. BTW, I am not opinionated towards either directions though, just curious on what you think the pros and cons are : ) |
Sure, let me be clearer. As you said, there are three modules within SAVIOR: the fuzzer (AFL), the coordinator and the concolic engine (klee). Now, you have also different data-flows between these modules:
As shown in the example, the current version is using:
In order for AFL to understand klee outputs:
True, the problem here is that I cannot change klee converter outputs. It should also work if klee outputs
Yes, but this is not a problem, only The problem are the To depict you an overview after the PR, the modules use:
Correct me if I am wrong but the coordinator uses filename as a testcase id in its seed lists. In order to map the scores/first_seen values to a file, it needs to unify the names coming from A. and B. It is thus necessary to convert standard into simple formats and avoid confusions between |
Thanks for the detailed clarification, Indeed KLEE outputting the standard format will disrupt AFL from benefiting from the generated seeds. I was scratching my head to recall what happened, cuz when we experimented before we see AFL imported KLEE's generated inputs, @junxzm1990 please hold me accountable here. Now here we have 2 options, make the rest of the modules understand standard format, and we have more insightful names maybe for further analysis, or we can ask KLEE to output simplified name to have minimum change. I am happy to accept the PR btw, but would like to flag it to @DanielGuoVT for awareness as he is working to open source KLEE, and AFAIK there is another version of savior in Baidu's internal repo. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we remove the binary files incl. *.o
and *.o.bc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem are the first_seen values in AFL. You cannot retrieve easily filenames from ids as it is currently implemented in AFL. This is the reason why I kept simple file format inside the coordinator, demanding the format conversions.
it's been a few years since i work on this code, can you elaborate a bit more, why we can't keep the naming scheme consistent (i.e., use the standard naming across all modules?), the first_seen values in AFL can be modified here:
Line 3546 in b419c86
fprintf(f, "%u\t%s/queue/id_%06u\n", i, out_dir, edge_san_first_seen[i]); |
would it help make things come cleaner?
if not tmp: | ||
return "" | ||
else: | ||
return tmp[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there a case when there will be multiple entries return by glob
given a unique name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah never mind, glob.glob returns a list,
can we add an assert here to ensure the list len is 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an assert is too strong since a file can be removed by AFL on the fly.
Time to time AFL calls a routine to polish the queue (a cmin similar function if you want), this is briefly mentionned here as a part of afl-fuzz algorithm.
Unfortunately, it may raise the assertion if the file savior wants to read has been removed by AFL.
I preferred the way you chose here and simply continue if a problem occurred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem are the first_seen values in AFL. You cannot retrieve easily filenames from ids as it is currently implemented in AFL. This is the reason why I kept simple file format inside the coordinator, demanding the format conversions.
it's been a few years since i work on this code, can you elaborate a bit more, why we can't keep the naming scheme consistent (i.e., use the standard naming across all modules?), the first_seen values in AFL can be modified here:
Line 3546 in b419c86
fprintf(f, "%u\t%s/queue/id_%06u\n", i, out_dir, edge_san_first_seen[i]); would it help make things come cleaner?
The problem here is the use of edge_san_first_seen[i]
storing only the id of the first testcase covering a branch. I have not seen a simple way to print back the full filename in the standard format. A solution would be to store the full name but it does not sound like a simpler way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, I got your point, reading the code again we use the seed names in the input_id_map
for SE converter, so it needs to be a full match.
Thanks for the discussion btw, my concern was keeping a mixed scheme will make the code logic more convoluted, being able to modified KLEE seems like a more straightforward approach but we don't have source.
@DanielGuoVT maybe you could consider release another klee version before fully open source. But until then we can use the solution in this PR.
can we remove the binary files incl. *.o and *.o.bc?
Of course, make clean should even remove these hidden files
(.savior_sanitizer_combination and .afl_coverage_combination also).
|
Goal:
Fix filename formats between SAVIOR and AFL.
Issue:
AFL can use two kinds of filename formats:
id_[0-9]{6}
),id:[0-9]{6}
) and followed by AFL information.However, SAVIOR:
SIMPLE_FILES
,As a consequence AFL does not read SAVIOR testcases because of a format mismatch.
A first solution has been committed (commit:e2c18d9bf) removing
SIMPLE_FILES
flag.However, there is still a mix between simple and standard formats in SAVIOR.
For example::
coverage.csv
,edge_sanitizer.csv
.These files are using the simple format to specify testcases.
Problems:
Solution: Use only simple filename format into SAVIOR.
The idea is to:
This way, a testcase has a unique internal name into SAVIOR (the simple formatted one).
Example:
Before:
Comments:
od
(dumping the file in octal formats). The second0000000
shows that none of the testcases have been checked.After:
Comment:
4
, all the testcases have been checked.Just to be sure and check the AFL testcases: