5 Instances in Verified Fail for Gold Patch #267
Comments
I was looking into it. The matplotlib problem could be something on my end. The failing test is lib/matplotlib/tests/test_image.py::test_https_imread_smoketest, a simple test that tries to read https://matplotlib.org/1.5.0/_static/logo2.png. Things work fine on my machine when I set up the test manually, so it could be that the urllib request gets blocked. This has happened before with Django (that case was fixable, though, since it was part of the SWE-bench code, not the benchmark itself).
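To check the "urllib request gets blocked" hypothesis outside the test suite, a minimal sketch of what the smoketest does is to fetch the PNG over HTTPS. This version only verifies the PNG signature instead of decoding with matplotlib, and wraps the request so a blocked network shows up as a message rather than a traceback; the URL is the one from the failing test.

```python
# Sketch of test_https_imread_smoketest's network step: fetch logo2.png over
# HTTPS. A URLError here suggests the request is being blocked in the
# container, rather than a bug in the gold patch.
import urllib.error
import urllib.request

URL = "https://matplotlib.org/1.5.0/_static/logo2.png"

try:
    with urllib.request.urlopen(URL, timeout=10) as resp:
        data = resp.read()
    # Every PNG file starts with the fixed 8-byte signature \x89PNG\r\n\x1a\n
    is_png = data[:8] == b"\x89PNG\r\n\x1a\n"
    print(f"fetched {len(data)} bytes; valid PNG: {is_png}")
except urllib.error.URLError as exc:
    print(f"request blocked or failed: {exc}")
```

If this script prints "request blocked or failed" inside the evaluation container but succeeds on the host, the failure is environmental rather than a problem with the instance itself.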
I ran the provided script, but I observed that all the instances pass.
Thanks for the details on the matplotlib instance, good to know. For the astropy instances, I'm not sure there's an issue? I started containers for the respective images as well, and
I've just tried again with the latest code. I can confirm that astropy 8707 and 8872 work now. The problem for 7606 remains: it is resolved when using SWE-bench, but not when using Verified. I deleted ~/.cache/huggingface/datasets before running the harness. I also checked the website; the row doesn't seem to have been updated.
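For reference, the cache-reset step described above can be done as below. Deleting the datasets cache forces a fresh download of the dataset instead of reusing a stale local copy; the harness invocation in the comment is illustrative, with flag values taken from this thread.

```shell
# Remove the Hugging Face datasets cache so the next harness run re-downloads
# the dataset rather than using a previously cached (possibly stale) copy.
CACHE_DIR="${HOME}/.cache/huggingface/datasets"
rm -rf "${CACHE_DIR}"
echo "cleared ${CACHE_DIR}"

# Then re-run the evaluation against Verified, e.g. (illustrative invocation):
#   python -m swebench.harness.run_evaluation \
#       --dataset_name princeton-nlp/SWE-bench_Verified \
#       --predictions_path gold \
#       --instance_ids astropy__astropy-7606 \
#       --run_id check-7606
```

If the instance still fails after a clean re-download, the stale data is on the dataset side rather than in the local cache.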
Describe the bug
There are 5 cases in which the gold patch fails on Verified:
The reason astropy__astropy-7606 fails seems to be that Verified was not updated (see #223). It works when using princeton-nlp/SWE-bench instead.
Steps/Code to Reproduce
Expected Results
All 5 problems are resolved
Actual Results
None of them is resolved.
System Information
latest version on main