Add train/val/test split script for raw #20

Open
kdu4108 opened this issue Jul 23, 2024 · 1 comment

kdu4108 commented Jul 23, 2024

Given a folder data/raw/..., where ... is the downloaded v2d datasets, we need a script that partitions them by moving them into subdirectories named train, val, and test.

This is because the 4M directory layout expects the split (train/val/test) to precede the modality, e.g.:

```
train/video_rgb/class0/000.tar
train/video_det/class0/000.tar
train/video_transcript/class0/000.tar
```

Then we can just run merge_data.sh three times, once per split, to go from raw to 4m-data.

For now we can use a 70/10/20 train/val/test split.
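
A minimal sketch of what such a split script could look like. It assumes that the raw folder contains one directory per modality, laid out as `<modality>/<class>/<shard>.tar`, and that a shard with the same relative path (e.g. `class0/000.tar`) must land in the same split for every modality. The function names `assign_splits` and `split_raw`, the seed, and the integer-percent arguments are hypothetical and not taken from the repository.

```python
import random
import shutil
from pathlib import Path

SPLITS = ("train", "val", "test")


def assign_splits(keys, percents=(70, 10, 20), seed=0):
    """Map each shard key to a split name using a seeded shuffle."""
    keys = sorted(keys)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(keys)
    n_train = len(keys) * percents[0] // 100
    n_val = len(keys) * percents[1] // 100
    assignment = {}
    for i, key in enumerate(keys):
        if i < n_train:
            assignment[key] = "train"
        elif i < n_train + n_val:
            assignment[key] = "val"
        else:
            assignment[key] = "test"  # remainder goes to test
    return assignment


def split_raw(raw_root, percents=(70, 10, 20), seed=0):
    """Move <raw_root>/<modality>/<class>/<shard>.tar (assumed layout)
    to <raw_root>/<split>/<modality>/<class>/<shard>.tar."""
    raw_root = Path(raw_root)
    modalities = [
        d for d in sorted(raw_root.iterdir())
        if d.is_dir() and d.name not in SPLITS
    ]
    # A key is a shard path relative to its modality, e.g. class0/000.tar,
    # so aligned shards share one split across all modalities.
    keys = {p.relative_to(m) for m in modalities for p in m.rglob("*.tar")}
    assignment = assign_splits(keys, percents, seed)
    for m in modalities:
        for key, split in assignment.items():
            src = m / key
            if not src.exists():  # a modality may lack some shards
                continue
            dst = raw_root / split / m.name / key
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    return assignment
```

Calling `split_raw("data/raw/...")` with the real dataset path would move the shards in place and leave the emptied modality directories behind. Splitting by shard rather than by individual sample is a simplification, so the resulting proportions are only approximately 70/10/20 when shards differ in size.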
markus583 self-assigned this Jul 23, 2024
@markus583

See #15
