You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
during experiments, we observed that Marabou returns spurious counterexamples that are not actual counterexamples to the provided spec. Our experiments are concerned with specifications for product networks in the context of MNIST, i.e. we have two NNs that are glued together and fed the same inputs. We then try to prove bounds on the difference between the two NNs.
To this end, we try to prove unsatisfiability for an output spec that looks something like this (full VNNLIB spec in the zip file):
Note that in this case the spec can be satisfied iff one of the outputs is <= -1 or >= 1.0.
However, Marabou will sometimes claim satisfiability while returning counterexamples that do not violate the property (all outputs between -1 and 1). In particular, for 13 instances we have good reason to believe that these specs are unsatisfiable as we have received unsatisfiability results from other tools. We have found 13 such cases in Marabou 2.0 (pip version) and were able to reproduce 8 of these cases with Marabou's master branch (commit 984c51f) -- for the other cases we had timeouts due to computational constraints.
It also seems like we have further cases where other tools also claim satisfiability, but the counterexample returned by Marabou does not violate the spec, i.e. Marabou returns the right judgment, but with a spurious counterexample.
For now, we have only included the instances where Marabou gives the wrong answer (i.e. sat instead of unsat).
For reproducibility, I am attaching a zip file with the following structure:
nets/product_mnist_relu_3_100_mnist_relu_3_100_pruned5_eps.onnx: ONNX network to verify
specs/: VNNLIB files with full specifications of the 13 properties
logs/: Log output produced for the instances by master branch version
logs-2.0: Log output produced for the instances by 2.0 version
instances_reproduced.csv Instances that were reproduced on master branch version
instances_unknown.csv Instances with unknown result on master branch and bug on 2.0
run_marabou.sh: Script to rerun instances in instances_reproduced.csv (requires adjusting MARABOU_PATH; assumes no timeout) Note that this overwrites the logs/ directory
We hope this report will help with fixing the issue, thanks a lot for your work on Marabou!
Hi,
during experiments, we observed that Marabou returns spurious counterexamples that are not actual counterexamples to the provided spec. Our experiments are concerned with specifications for product networks in the context of MNIST, i.e. we have two NNs that are glued together and fed the same inputs. We then try to prove bounds on the difference between the two NNs.
To this end, we try to prove unsatisfiability for an output spec that looks something like this (full VNNLIB spec in the zip file):
Note that in this case the spec can be satisfied iff one of the outputs is <= -1 or >= 1.0.
However, Marabou will sometimes claim satisfiability while returning counterexamples that do not violate the property (all outputs between -1 and 1). In particular, for 13 instances we have good reason to believe that these specs are unsatisfiable as we have received unsatisfiability results from other tools. We have found 13 such cases in Marabou 2.0 (pip version) and were able to reproduce 8 of these cases with Marabou's master branch (commit
984c51f
) -- for the other cases we had timeouts due to computational constraints.It also seems like we have further cases where other tools also claim satisfiability, but the counterexample returned by Marabou does not violate the spec, i.e. Marabou returns the right judgment, but with a spurious counterexample.
For now, we have only included the instances where Marabou gives the wrong answer (i.e. sat instead of unsat).
For reproducibility, I am attaching a zip file with the following structure:
nets/product_mnist_relu_3_100_mnist_relu_3_100_pruned5_eps.onnx
: ONNX network to verifyspecs/
: VNNLIB files with full specifications of the 13 propertieslogs/
: Log output produced for the instances by master branch versionlogs-2.0
: Log output produced for the instances by 2.0 versioninstances_reproduced.csv
Instances that were reproduced on master branch versioninstances_unknown.csv
Instances with unknown result on master branch and bug on 2.0run_marabou.sh
: Script to rerun instances ininstances_reproduced.csv
(requires adjustingMARABOU_PATH
; assumes no timeout)Note that this overwrites the
logs/
directoryWe hope this report will help with fixing the issue, thanks a lot for your work on Marabou!
report.zip
The text was updated successfully, but these errors were encountered: