Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

在hugging face上下载的liuhaotian--LLaVA-CC3M-Pretrain-595K数据集没有cc3m.csv文件 #5

Open
lsuae opened this issue Oct 14, 2024 · 1 comment

Comments

@lsuae
Copy link

lsuae commented Oct 14, 2024

您好,我根据tvl_llama下的readme的指示下载CCM3数据集,发现下载到的数据集中并没有.csv文件,这个文件是需要自己写吗?还是说我理解错了,恳请指教,谢谢!

@Max-Fu
Copy link
Owner

Max-Fu commented Oct 22, 2024

Hi,

Thanks for the question. The data server of our lab is down, so I can't upload the csv file. I remember I downloaded from somewhere else, please check the following two links for references:

  1. https://huggingface.co/spaces/flax-community/dalle-mini/commit/75b01a0a3a29bb2eb6962f5f2fdf160e5c784647
  2. https://github.com/rom1504/img2dataset/blob/main/dataset_examples/cc3m.md

I think it is also fine to create a two-column csv file from the json that is available on:
https://huggingface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K?row=0

Let me know if it works! I will adjust the readme accordingly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants