Update datasets to 2.20.0 #3040

Merged: 10 commits, merged Aug 23, 2024

albertvillanova (Member) commented Aug 22, 2024

Update datasets to 2.20.0.

This PR addresses the CI errors raised by this update, as a first step before:

Fixes needed after the datasets update:

  • Pass trust_remote_code=True when loading the script-based dataset: 227f2c4
  • Use a JSON-Lines dataset (instead of a JSON one) in test_statistics_endpoint, to avoid a pandas bug that downcasts a float column to int: a86d040

albertvillanova (Member, Author) commented Aug 22, 2024

EDIT:
I have replaced the JSON dataset with a JSON-Lines dataset in test_statistics_endpoint: a86d040

  • JSON-Lines files are read with pyarrow instead of pandas
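A minimal standard-library sketch of the switch from a single JSON array to JSON Lines (one JSON object per line). The records and the helper names to_json_lines/from_json_lines are hypothetical, not the fixture actually used in test_statistics_endpoint. The serialized text keeps 1.0 as a float literal either way; whether the column stays float depends on which reader loads it, which is the point of the EDIT above.

```python
import json


def to_json_lines(records):
    """Serialize each record as one JSON object per line (JSON Lines)."""
    return "".join(json.dumps(record) + "\n" for record in records)


def from_json_lines(text):
    """Parse JSON Lines text back into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Hypothetical float column whose values all happen to be integral.
records = [{"col": 1.0}, {"col": 2.0}, {"col": 3.0}]
jsonl = to_json_lines(records)
print(jsonl, end="")
# {"col": 1.0}
# {"col": 2.0}
# {"col": 3.0}
```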

Note: I have commented out an assertion in test_statistics_endpoint: c437c9f

  • when reading a JSON file, pandas downcasts a float column to int if all its values are integral, e.g. [1.0, 2.0, 3.0] -> [1, 2, 3]
  • I don't think this is a big issue
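The downcast rule described above can be modeled in a few lines. This is a simplified, hypothetical model (infer_column is not a pandas function and this is not pandas' actual inference code): it only handles a column of Python floats and converts it to ints when every value is integral.

```python
import json


def infer_column(values):
    """Simplified model of the float -> int downcast (not pandas' real code).

    If every value is a float with an integral value, return them as ints;
    otherwise return the values unchanged.
    """
    if values and all(isinstance(v, float) and v.is_integer() for v in values):
        return [int(v) for v in values]
    return values


# json.loads keeps 1.0 as a float; the modeled inference step then downcasts it.
rows = json.loads('[{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]')
print(infer_column([row["x"] for row in rows]))  # [1, 2, 3]
print(infer_column([1.0, 2.5, 3.0]))  # [1.0, 2.5, 3.0]
```

A single non-integral value (2.5 above) is enough to keep the whole column as float, which is why the issue only shows up with fixtures whose floats all happen to be whole numbers.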

albertvillanova merged commit 29465f3 into main on Aug 23, 2024
27 checks passed
albertvillanova deleted the update-datasets-2.20.0 branch on August 23, 2024 at 05:05

2 participants