download pile dataset is not working #1419
Comments
I learned that the dataset was removed because of a DMCA claim. There is an alternative location: https://huggingface.co/datasets/EleutherAI/the_pile_deduplicated/tree/main. The data there is stored as Parquet, so you will need to read Parquet files (instead of jsonl.zst) and convert them to TFRecord.
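A minimal sketch of that Parquet-to-TFRecord conversion, assuming `pyarrow` and `tensorflow` are installed and that a Parquet shard has already been downloaded locally. The column name `text`, the feature key, and the helper names (`shard_name`, `iter_texts`, `parquet_to_tfrecord`) are assumptions for illustration; the feature key and record layout this project's input pipeline actually expects may differ, so check them before using the output.

```python
from typing import Iterator


def shard_name(prefix: str, index: int, num_shards: int) -> str:
    """Build a sharded output file name (hypothetical naming scheme)."""
    return f"{prefix}-{index:05d}-of-{num_shards:05d}.tfrecord"


def iter_texts(parquet_path: str, column: str = "text",
               batch_size: int = 1024) -> Iterator[str]:
    """Stream one string column from a Parquet file without loading it whole."""
    import pyarrow.parquet as pq  # third-party dependency

    parquet_file = pq.ParquetFile(parquet_path)
    for batch in parquet_file.iter_batches(batch_size=batch_size, columns=[column]):
        for value in batch.column(0).to_pylist():
            if value is not None:  # skip null rows
                yield value


def parquet_to_tfrecord(parquet_path: str, out_path: str,
                        column: str = "text") -> int:
    """Write each row as a tf.train.Example with a single bytes feature.

    The feature key reuses the column name, which is an assumption.
    Returns the number of records written.
    """
    import tensorflow as tf  # third-party dependency

    count = 0
    with tf.io.TFRecordWriter(out_path) as writer:
        for text in iter_texts(parquet_path, column):
            feature = tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[text.encode("utf-8")]))
            example = tf.train.Example(
                features=tf.train.Features(feature={column: feature}))
            writer.write(example.SerializeToString())
            count += 1
    return count
```

For example, `parquet_to_tfrecord("shard.parquet", shard_name("pile", 0, 1))` would convert one local file (the input file name here is a placeholder). Streaming in batches keeps memory use bounded, which matters because the individual Parquet shards are large.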
It seems that https://the-eye.eu/public/AI/pile/ is not reachable.
Is there another reliable location to fetch the data from?