Wasmtime fuzzing: manually inject faults in a branch to verify fuzzing end-to-end #9449

cfallin · 2024-10-10T21:50:26Z

In the Wasmtime meeting today, there was a lot of good discussion about our procedures and standards around fuzzing, and how to make process improvements to make sure fuzzing catches issues in on-by-default features. I thought I'd expand a bit on an idea I proposed there.

The basic problem is: the configuration space of Wasm features that get fuzzed is the result of a fairly nontrivial interaction between feature selection logic in multiple places and crates. There are plausible reasons why this isn't all centralized; even if we do eventually get to a point where there is One Main Function that determines what is fuzzed, there is still the question of whether the plumbing that flows outward from that definition faithfully preserves the feature.

I'm suggesting as a principle that the only way to really know whether fuzzing will catch bugs is to... make fuzzing catch bugs. An "integration test" wrapped around our whole setup, as it were. The idea (which I'll maybe call a "fire drill" approach) is:

Have a branch in the repo (fuzzing-firedrill) that has main at some recent point plus one commit.
Update our fuzzing configuration to run one or several targets on this branch as well as main in oss-fuzz.
As a weekly/monthly/?? chore, recreate this branch by taking current main and injecting one fault. Perhaps we can keep a list/queue of ideas. They should be simple changes -- one-line commits that invert a bit or conditional, or delete a register update, or ... Ideally, this is ~10 minutes per week of work.
Wait for oss-fuzz to find the issue. Revert the fault commit (or move to the next fault?) when it does. If it doesn't, find out why.

The main idea is that we want to test the whole pipeline all the way to oss-fuzz-email-in-inboxes. Nothing else tests this whole pipeline today. We simply don't know if it works, except when we get emails. The idea here is to turn the email rate up from "0 + actual errors" to "fault-injected stream + actual errors", so it's always nonzero, so we can keep verifying it works.

There were a few objections or alternative ideas in discussions today:

Suggestions to do automated mutation testing (the meeting notes above describe the proposal as mutation testing even). I don't think this is quite the same thing: the idea is explicitly not to have automation (which could fail), but to simulate a human making a faulty commit. It's like a firedrill: do everything except actually set the fire (put the commit on main).
Suggestions that this only catches bugs we think to inject: I would argue the main point is to (i) ensure the reporting pipeline works, and (ii) that code we inject faults into is as load-bearing as we think it is. As long as our sense of which code is load-bearing is tuned right, it shouldn't matter too much which faults we inject, just that we break it (and watch for the email).
Suggestions that this would take too much time: I don't think "push one commit a week" is too bad, but on the other hand, we're getting something for it too: this adds end-to-end assurance/confidence in our setup that we don't really have today otherwise. If we get no emails from oss-fuzz for a month, is that because we're on a lucky streak and writing good code, or because something else changed? I'd argue that "push a commit that deletes a line every Monday morning" is easier than reviewing coverage carefully.

Thoughts?

The text was updated successfully, but these errors were encountered:

This commit adds a test to the `wasmtime-fuzzing` crate as a sanity-check that eventually a module is generated requiring all of the features that wasmtime supports. This is intended to be another double-check in the process of enabling a proposal in Wasmtime by ensuring that the feature is added to this list which then transitively requires that fuzzing eventually generates a module needing that feature. cc bytecodealliance#9449

alexcrichton · 2024-10-10T22:44:09Z

One idea I had sort of along these lines (but also relatively orthogonal) which was similar to what Nick was saying about tests in wasmtime-fuzzing is #9452. Mostly cc'ing that as related-to, but definitely not replacing this.

I personally really like the idea of testing "oss-fuzz-email-in-inboxes" as oss-fuzz infrastructure itself has hiccups relatively frequently and this helps us diagnose that as well. The success of this testing though I think will be based in how little effort we need to apply to it. Automating as much as we can with "just hit the button" or something like that I think would be best. I wouldn't want to have to maintain a document explaining all the various steps of updating branches, how/when to revert, what cadence, etc.

That being said I think this is all quite doable. GitHub Actions gives us cron jobs and we can leave an unprotected branch able to get modified by it. Having a button to hit for "ok revert that top commit" which ideally auto-merges or just doesn't even go through PRs would be pretty nice too.

cfallin · 2024-10-10T22:54:43Z

Ah, perhaps this is workable: note certain points in the code with a magic comment string. // delete-to-test-fuzzing before a line of code or somesuch. Then a GH Actions cronjob could pick one of these at random and update the branch once a week, as you say. (The selection of such lines needs a little human thought, but not much: should be deletions that won't break compilation trivially, and will do interesting things: misalign the stack, generate the wrong value, skip a GC write barrier, ...)

alexcrichton · 2024-10-10T22:59:44Z

Oh man now I really like that. One could imagine dialing that up to 11 and fully automate this entire system. For example a branch could be made and if a maintainer doesn't flag somewhere "hey the fuzzers found the bug" then an issue would automatically get opened and filed. Otherwise the management of the "fuzz this broken wasmtime" branch would be 100% automated in theory.

This commit adds a test to the `wasmtime-fuzzing` crate as a sanity-check that eventually a module is generated requiring all of the features that wasmtime supports. This is intended to be another double-check in the process of enabling a proposal in Wasmtime by ensuring that the feature is added to this list which then transitively requires that fuzzing eventually generates a module needing that feature. cc #9449

cfallin mentioned this issue Oct 10, 2024

Update docs/stabilization of wasm proposals #9434

Merged

alexcrichton mentioned this issue Oct 10, 2024

Add a test that fuzzers generate wasm proposals #9452

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wasmtime fuzzing: manually inject faults in a branch to verify fuzzing end-to-end #9449

Wasmtime fuzzing: manually inject faults in a branch to verify fuzzing end-to-end #9449

cfallin commented Oct 10, 2024

alexcrichton commented Oct 10, 2024

cfallin commented Oct 10, 2024

alexcrichton commented Oct 10, 2024

Wasmtime fuzzing: manually inject faults in a branch to verify fuzzing end-to-end #9449

Wasmtime fuzzing: manually inject faults in a branch to verify fuzzing end-to-end #9449

Comments

cfallin commented Oct 10, 2024

alexcrichton commented Oct 10, 2024

cfallin commented Oct 10, 2024

alexcrichton commented Oct 10, 2024