Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious counterexamples #844

Open
samysweb opened this issue Oct 9, 2024 · 0 comments
Open

Spurious counterexamples #844

samysweb opened this issue Oct 9, 2024 · 0 comments

Comments

@samysweb
Copy link

samysweb commented Oct 9, 2024

Hi,

during experiments, we observed that Marabou returns spurious counterexamples that are not actual counterexamples to the provided spec. Our experiments are concerned with specifications for product networks in the context of MNIST, i.e. we have two NNs that are glued together and fed the same inputs. We then try to prove bounds on the difference between the two NNs.

To this end, we try to prove unsatisfiability for an output spec that looks something like this (full VNNLIB spec in the zip file):

(assert (or
	(and (>= Y_0 1.0))
	(and (<= Y_0 -1.0))
        ...
	(and (>= Y_9 1.0))
	(and (<= Y_9 -1.0))
))

Note that in this case the spec can be satisfied iff one of the outputs is <= -1 or >= 1.0.
However, Marabou will sometimes claim satisfiability while returning counterexamples that do not violate the property (all outputs between -1 and 1). In particular, for 13 instances we have good reason to believe that these specs are unsatisfiable as we have received unsatisfiability results from other tools. We have found 13 such cases in Marabou 2.0 (pip version) and were able to reproduce 8 of these cases with Marabou's master branch (commit 984c51f) -- for the other cases we had timeouts due to computational constraints.

It also seems like we have further cases where other tools also claim satisfiability, but the counterexample returned by Marabou does not violate the spec, i.e. Marabou returns the right judgment, but with a spurious counterexample.
For now, we have only included the instances where Marabou gives the wrong answer (i.e. sat instead of unsat).

For reproducibility, I am attaching a zip file with the following structure:

  • nets/product_mnist_relu_3_100_mnist_relu_3_100_pruned5_eps.onnx: ONNX network to verify
  • specs/: VNNLIB files with full specifications of the 13 properties
  • logs/: Log output produced for the instances by master branch version
  • logs-2.0: Log output produced for the instances by 2.0 version
  • instances_reproduced.csv Instances that were reproduced on master branch version
  • instances_unknown.csv Instances with unknown result on master branch and bug on 2.0
  • run_marabou.sh: Script to rerun instances in instances_reproduced.csv (requires adjusting MARABOU_PATH; assumes no timeout)
    Note that this overwrites the logs/ directory

We hope this report will help with fixing the issue, thanks a lot for your work on Marabou!

report.zip

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant