Waiting for the dataset #1

Open
shipengai opened this issue Nov 1, 2023 · 12 comments

Comments

@shipengai

Great work!

@yqy2001
Member

yqy2001 commented Nov 1, 2023

Thanks for your interest! The dataset probably won't be released in the next 1-2 weeks. However, the CapsLLaMA model used to generate the data, together with the large-scale distributed inference code, will be released in about 1-2 weeks, so you can use it to run inference and build data in the meantime. Please stay tuned.

@cliangyu

Thanks for the update. Is there a planned timeline of dataset release?

@yqy2001
Member

yqy2001 commented Nov 29, 2023

Hi there, the CapsFus-LLaMA model and distributed inference code have been released. Please check them out and let me know about any problems you encounter.

Thank you.

@shipengai
Author

Great, still looking forward to the public release of the dataset.

@Moonteresa

I'm looking for the dataset, too. Please @ me when it is released. Thanks!

@iamlockelightning

Waiting for the datasets to be released! 👀

@yqy2001
Member

yqy2001 commented Jan 10, 2024

@shipengai @cliangyu @Moonteresa @iamlockelightning
Hi there, we have released the CapsFusion-120M dataset. Please check it out!

@Moonteresa

Hi! I downloaded the parquet files, but only the third one can be read correctly by pd.read_parquet; the other three fail with a Thrift data error. Is there another way to read these parquet files?

@TyRantLQlyf

Hi @yqy2001! I'm facing the same problem as @Moonteresa. I need your help reading the data.

@yqy2001
Member

yqy2001 commented Jan 16, 2024

@Moonteresa @TyRantLQlyf Thank you for your feedback. I will check it.

@TyRantLQlyf

Hi @yqy2001. Have you had a chance to check this dataset issue?

@yqy2001
Member

yqy2001 commented Jan 18, 2024

@TyRantLQlyf @Moonteresa

Hello! I've downloaded data directly from the HuggingFace repository. Upon testing, I successfully accessed the data using the following code:

[Image: screenshot of the code, not preserved]

Can you share your error messages?

(Note that the pandas and pyarrow packages need to be installed; you can install them with pip install pandas pyarrow.)
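
Editor's note: the code in the screenshot above is not available as text, so what follows is only a minimal sketch of one way to read a shard with pandas and pyarrow, not the maintainer's actual snippet. Parquet footer metadata is Thrift-encoded, so an error mentioning Thrift may (this is an assumption, not something confirmed in the thread) point to a truncated or incomplete download. The helper names `looks_like_parquet` and `load_parquet` and the file name in the usage comment are hypothetical.

```python
from pathlib import Path

# A standard (non-encrypted) Parquet file starts and ends with these 4 bytes.
PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the Parquet magic bytes."""
    path = Path(path)
    if path.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False  # too small to hold both magic markers
    with path.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), 2)  # 2 == os.SEEK_END
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


def load_parquet(path):
    """Read one Parquet file into a DataFrame, failing early on a damaged file."""
    # Imported lazily so the magic-byte check above works without pandas installed.
    import pandas as pd

    if not looks_like_parquet(path):
        raise ValueError(
            f"{path} does not look like a complete Parquet file; try re-downloading it"
        )
    return pd.read_parquet(path, engine="pyarrow")


# Usage (hypothetical file name):
# df = load_parquet("capsfusion_1.parquet")
# print(df.head())
```

If the magic-byte check fails for a file, comparing its size on disk with the size listed in the repository is a cheap next step before sharing the full error message.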
