The data can be downloaded from Azure Blob Storage at the following URLs (sizes shown are for the compressed archives):
- Computer Science (32.79 GB)
- Economics (14.99 GB)
- Geography (16.02 GB)
- History (11.39 GB)
- Physics (42.14 GB)
Please note that the datasets are much larger once decompressed. Each download is a zip file with the following structure:
├── documents.json.gz
├── graph.csv
├── qrels
│   ├── test.csv
│   ├── train.csv
│   └── val.csv
└── queries.json.gz
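
Because the archives are large, it can help to peek inside one before extracting it. The following is a minimal sketch using only the Python standard library, not an official loader. The function name `summarize_archive` and the archive file name in the usage note are hypothetical, and the sketch assumes the files are UTF-8 text. It makes no assumption about the fields inside the JSON or CSV files: it returns the first 200 characters of each `.json.gz` member and the first row of each `.csv` member.

```python
import csv
import gzip
import io
import zipfile


def summarize_archive(zip_path):
    """Return {member name: short preview} for each file in the archive.

    Hypothetical helper: nothing is extracted to disk, and only the
    beginning of each member is read.
    """
    previews = {}
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            with archive.open(info) as raw:
                if info.filename.endswith(".gz"):
                    # Decompress on the fly; read only the first 200 characters.
                    with gzip.open(raw, mode="rt", encoding="utf-8") as text:
                        previews[info.filename] = text.read(200)
                elif info.filename.endswith(".csv"):
                    # Read only the first row (assumed, not known, to be a header).
                    text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                    previews[info.filename] = next(csv.reader(text), [])
    return previews
```

For example, `summarize_archive("computer_science.zip")` (a placeholder file name) would return a dictionary keyed by member paths such as `qrels/train.csv`, which is a quick way to check the real column names and JSON layout before writing any loading code.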
The repository will be updated with scripts and examples for using the data and performing retrieval in the coming days.