PASS_TO_PASS and FAIL_TO_PASS test cases #257

chenzimin · 2024-11-21T09:11:28Z

Describe the issue

Hi, sorry if I missed this in your paper or somewhere in your GitHub repository.

I tried to run all existing test cases for a given SWE-bench project, for example using the Docker image (which I assume the SWE-bench team uploaded) from https://hub.docker.com/r/swebench/sweb.eval.x86_64.psf_1776_requests-1142:

docker run -it swebench/sweb.eval.x86_64.psf_1776_requests-1142:v1
pytest -rA

Here is the full list of test results:

PASSED test_requests.py::RequestsTestCase::test_basic_building
PASSED test_requests.py::RequestsTestCase::test_entry_points
PASSED test_requests.py::RequestsTestCase::test_invalid_url
PASSED test_requests.py::RequestsTestCase::test_params_are_added_before_fragment
PASSED test_requests.py::RequestsTestCase::test_path_is_not_double_encoded
FAILED test_requests.py::RequestsTestCase::test_BASICAUTH_TUPLE_HTTP_200_OK_GET - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_DIGESTAUTH_WRONG_HTTP_401_GET - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_DIGEST_HTTP_200_OK_GET - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_HTTP_200_OK_GET_ALTERNATIVE - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_HTTP_200_OK_GET_WITH_MIXED_PARAMS - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_HTTP_200_OK_GET_WITH_PARAMS - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_HTTP_200_OK_HEAD - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_HTTP_200_OK_PUT - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_HTTP_302_ALLOW_REDIRECT_GET - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_POSTBIN_GET_POST_FILES - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_POSTBIN_GET_POST_FILES_WITH_DATA - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_custom_content_type - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_decompress_gzip - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_different_encodings_dont_break_post - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_links - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_prepared_request_hook - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_request_ok_set - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_status_raising - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_unicode_get - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_urlencoded_get_query_multivalued_param - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_user_agent_transfers - TypeError: __init__() got an unexpected keyword argument 'strict'
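
To make the pattern in this list easier to see, here is a minimal sketch that groups `pytest -rA` summary lines by status and failure message. The regular expression and the helper names (`parse_summary`, `failure_reasons`) are my own illustration, not part of SWE-bench, and the sample input is a small excerpt of the list above.

```python
import re
from collections import Counter

# Matches pytest short-summary lines such as
# "FAILED test_requests.py::RequestsTestCase::test_links - TypeError: ..."
SUMMARY_LINE = re.compile(r"^(PASSED|FAILED|ERROR)\s+(\S+)(?:\s+-\s+(.*))?$")


def parse_summary(text):
    """Return (status, test_id, message) tuples found in `pytest -rA` output."""
    results = []
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        if match:
            status, test_id, message = match.groups()
            results.append((status, test_id, message or ""))
    return results


def failure_reasons(results):
    """Count how often each failure message occurs among FAILED tests."""
    return Counter(msg for status, _, msg in results if status == "FAILED")


# Excerpt of the output listed above (hypothetical helper input, real lines).
sample = """\
PASSED test_requests.py::RequestsTestCase::test_invalid_url
FAILED test_requests.py::RequestsTestCase::test_links - TypeError: __init__() got an unexpected keyword argument 'strict'
FAILED test_requests.py::RequestsTestCase::test_unicode_get - TypeError: __init__() got an unexpected keyword argument 'strict'
"""

results = parse_summary(sample)
for reason, count in failure_reasons(results).items():
    print(count, reason)
```

Applied to the full list, this kind of grouping shows that every failing test reports the same `TypeError` about the `strict` keyword argument.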

My first question is: why don't all test cases pass? I found the same behavior in several other instances.

My second question is: what is your reasoning for not allowing the use of PASS_TO_PASS test cases when evaluating on SWE-bench? This is related to the first question. The test cases that pass before the issue is fixed shouldn't be a secret, and having existing tests fail makes it harder to determine whether the failures are caused by the bug the issue describes or by a problem in SWE-bench itself.

I come from an automated program repair background, where the usual assumption is that existing test cases pass, so we use them as regression tests to check that a patch does not break existing functionality.
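
The regression check I have in mind can be sketched roughly as follows. This is only an illustration under my own assumptions: `regression_report` is a hypothetical helper, the category names reflect my reading of PASS_TO_PASS and FAIL_TO_PASS rather than the SWE-bench implementation, and the "after" statuses in the example are made up.

```python
def regression_report(before, after):
    """Compare two {test_id: status} mappings from runs before and after a patch.

    A test that passed before and is missing from `after` is counted as a
    regression, since it can no longer be shown to pass.
    """
    passed_before = {t for t, s in before.items() if s == "PASSED"}
    failed_before = {t for t, s in before.items() if s != "PASSED"}
    passed_after = {t for t, s in after.items() if s == "PASSED"}
    return {
        "pass_to_pass": sorted(passed_before & passed_after),
        "fail_to_pass": sorted(failed_before & passed_after),
        "regressions": sorted(passed_before - passed_after),
    }


# Statuses before the patch are taken from the list above;
# statuses after the patch are invented for illustration only.
before = {
    "test_basic_building": "PASSED",
    "test_entry_points": "PASSED",
    "test_links": "FAILED",
}
after = {
    "test_basic_building": "PASSED",
    "test_entry_points": "FAILED",
    "test_links": "PASSED",
}

report = regression_report(before, after)
print(report)
```

A patch would be acceptable under this view only if `regressions` is empty, which is why knowing the set of tests that pass before the fix matters.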

Suggest an improvement to documentation

No response

chenzimin added the "documentation" label (Improvements or additions to documentation) on Nov 21, 2024