About: 6000+ arXiv papers from the AI category in 2020. The dataset contains LaTeX source files and images, which makes it a good research dataset for multimodal learning.
- Dataset URL: https://pan.baidu.com/s/1DsLVmZno7JSWxNQ9CBbBJQ
- Dataset size: ~20 GB (compressed).
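Since the dataset ships LaTeX sources alongside images, a first step for multimodal work is usually to pull out the figure paths, captions, and formulas from each paper. The following is a minimal sketch of that idea, not a description of the dataset's actual layout: the function name `extract_items` and the regular expressions are illustrative assumptions, and the simple patterns will miss captions with nested braces or figures included through custom macros.

```python
import re

# Hypothetical patterns; real papers use many more environments and macros.
INCLUDEGRAPHICS = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")
EQUATION = re.compile(
    r"\\begin\{(equation\*?|align\*?)\}(.*?)\\end\{\1\}", re.DOTALL
)
# Assumption: captions without nested braces (e.g. no \textbf{...} inside).
CAPTION = re.compile(r"\\caption\{([^}]*)\}")


def extract_items(latex_source: str) -> dict:
    """Collect image paths, captions, and display formulas from one LaTeX source string."""
    return {
        "images": INCLUDEGRAPHICS.findall(latex_source),
        "captions": CAPTION.findall(latex_source),
        "formulas": [body.strip() for _env, body in EQUATION.findall(latex_source)],
    }


if __name__ == "__main__":
    sample = r"""
    \begin{figure}
    \includegraphics[width=\linewidth]{figs/model.png}
    \caption{Model overview.}
    \end{figure}
    \begin{equation}
    E = mc^2
    \end{equation}
    """
    print(extract_items(sample))
```

Pairing each extracted image path with its caption and the surrounding paragraph gives aligned text-image examples that the projects below can build on.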
About: Build an AI helper system for computer science.
- See Home for Researchers for reference.
About: Build a multimodal retrieval or recommendation system supporting text, images, formulas, and tables. Consider answering the following questions:
- Which image is most relevant to a given sentence/query?
- Which sentence/paragraph is most relevant to a given image?
- Which formulas are relevant to a given sentence/query?
- Which tables are relevant to a given sentence/query?
- What concepts are relevant to a given formula?
- ... other important questions ...
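All of the questions above reduce to the same pattern: score every candidate item against a query and return the best matches. As a rough text-only baseline, the sketch below ranks items by cosine similarity between bag-of-words vectors, using each item's caption or surrounding text as a stand-in for the image, formula, or table itself. The function names and the sample data are hypothetical, and a real system would likely replace the word counts with learned multimodal embeddings.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank(query: str, candidates: dict) -> list:
    """Return candidate keys ordered from most to least similar to the query.

    `candidates` maps an item id (image path, table id, ...) to descriptive text.
    Ties are broken alphabetically by key so the order is deterministic.
    """
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), key) for key, text in candidates.items()]
    return [key for _score, key in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    # Hypothetical items: captions stand in for the images and the table.
    items = {
        "fig1.png": "Architecture of the transformer encoder",
        "fig2.png": "Training loss over epochs",
        "tab1": "Accuracy on the test set",
    }
    print(rank("transformer architecture", items))
```

The same `rank` function covers the reverse direction (image to sentence) by swapping which side is the query, which makes it a convenient baseline to compare learned models against.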
About: Build your own dataset and develop some interesting models with it.