5 Instances in Verified Fail for Gold Patch #267

Open
wistuba opened this issue Nov 30, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@wistuba
Contributor

wistuba commented Nov 30, 2024

Describe the bug

There are 5 instances in SWE-bench Verified for which the gold patch fails:

  • astropy__astropy-7606
  • astropy__astropy-8707
  • astropy__astropy-8872
  • matplotlib__matplotlib-20488
  • django__django-10097

astropy__astropy-7606 appears to fail because Verified was not updated (see #223). It passes when using princeton-nlp/SWE-bench instead.

Steps/Code to Reproduce

python -m swebench.harness.run_evaluation \
 --predictions_path gold \
 --max_workers 5 \
 --dataset_name princeton-nlp/SWE-bench_Verified \
 --run_id validate-gold \
 --instance_ids astropy__astropy-7606 matplotlib__matplotlib-20488 django__django-10097 astropy__astropy-8872 astropy__astropy-8707 \
 --cache_level instance
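The same invocation can also be driven from Python, which avoids shell line continuations altogether (a trailing space after a `\` silently splits a multi-line shell command). This is a minimal sketch: only the module name and flags come from the command above; `build_gold_validation_cmd` and `FAILING_IDS` are hypothetical names introduced for illustration.

```python
import shlex
import subprocess
import sys

# Instance IDs reported in this issue as failing with the gold patch on Verified.
FAILING_IDS = [
    "astropy__astropy-7606",
    "astropy__astropy-8707",
    "astropy__astropy-8872",
    "matplotlib__matplotlib-20488",
    "django__django-10097",
]


def build_gold_validation_cmd(
    instance_ids,
    dataset_name="princeton-nlp/SWE-bench_Verified",
    run_id="validate-gold",
    max_workers=5,
    cache_level="instance",
):
    """Hypothetical helper: return the argv list for the run_evaluation call.

    Passing a list to subprocess means no shell parsing happens, so stray
    whitespace cannot break the command into pieces.
    """
    return [
        sys.executable, "-m", "swebench.harness.run_evaluation",
        "--predictions_path", "gold",
        "--max_workers", str(max_workers),
        "--dataset_name", dataset_name,
        "--run_id", run_id,
        "--instance_ids", *instance_ids,
        "--cache_level", cache_level,
    ]


if __name__ == "__main__":
    cmd = build_gold_validation_cmd(FAILING_IDS)
    # Print the equivalent single-line shell command for copy/paste.
    print(shlex.join(cmd))
    # Only execute the harness when explicitly asked to.
    if len(sys.argv) > 1 and sys.argv[1] == "--run":
        subprocess.run(cmd, check=True)
```

To cross-check astropy__astropy-7606 against the full dataset, pass `dataset_name="princeton-nlp/SWE-bench"` and `instance_ids=["astropy__astropy-7606"]`.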

Expected Results

All 5 instances are resolved.

Actual Results

None of them is resolved

System Information

Latest version on main.

@wistuba wistuba added the bug Something isn't working label Nov 30, 2024
@wistuba wistuba changed the title 5 Instances in Verified Fail 5 Instances in Verified Fail for Gold Patch Nov 30, 2024
@wistuba
Contributor Author

wistuba commented Nov 30, 2024

I looked into it. The matplotlib failure may be a problem on my end:

Test failing: lib/matplotlib/tests/test_image.py::test_https_imread_smoketest
Reason: urllib.error.HTTPError: HTTP Error 403: Forbidden

This is a simple test that reads https://matplotlib.org/1.5.0/_static/logo2.png. It works fine on my machine when I set up the test manually, so the urllib request may be getting blocked. This has happened before with django (that case was fixable because it was part of the swebench code, not the benchmark itself).
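One common way a host blocks scripted requests is by rejecting urllib's default `Python-urllib/3.x` User-Agent with a 403. Whether that is what happens here is an assumption, not something confirmed in this issue. The sketch below reproduces that pattern against a local stand-in server so the failure mode can be seen offline; `UABlockingHandler`, `fetch_status`, and `demo` are hypothetical names.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class UABlockingHandler(BaseHTTPRequestHandler):
    """Hypothetical server that rejects urllib's default User-Agent with 403."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if ua.startswith("Python-urllib"):
            self.send_response(403)
            self.end_headers()
        else:
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass


def fetch_status(url, user_agent=None, timeout=5):
    """Return the HTTP status for url, reporting HTTPError codes instead of raising."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def demo():
    # Bind to an ephemeral port on localhost and serve in a background thread.
    server = HTTPServer(("127.0.0.1", 0), UABlockingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    url = f"http://127.0.0.1:{server.server_port}/logo2.png"
    try:
        default_status = fetch_status(url)  # urllib's default User-Agent
        custom_status = fetch_status(url, user_agent="Mozilla/5.0")
    finally:
        server.shutdown()
        server.server_close()
    return default_status, custom_status


if __name__ == "__main__":
    print(demo())  # (403, 200) with the simulated server
```

To probe the real host, `fetch_status("https://matplotlib.org/1.5.0/_static/logo2.png")` can be run both inside the evaluation container and on the host machine; a 403 in one environment but not the other would point to the request being blocked rather than to the gold patch.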
