Video StreamingDataset setup #6

Open
kshitijkg opened this issue Jul 19, 2023 · 5 comments

@kshitijkg
Collaborator

No description provided.

@kshitijkg kshitijkg changed the title VideoStreaming dataset setup Video StreamingDataset setup Jul 19, 2023

t46 commented Aug 24, 2023

Option 1:
Store each entire YouTube video and its captions as bytes using the MDS writer. Then, in the data loader's collate function, split each video into multiple clips and return both text and vision in batched form, where the batched vision data has dimensions (B, C, T, H, W, Ch).
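
A minimal sketch of what such a collate function could look like, using NumPy only. It assumes each sample has already been decoded from the stored bytes into a frame array of shape (N, H, W, Ch), and it reads the axes as B = batch, C = clips per video, T = frames per clip, Ch = colour channels; that reading, the names `split_into_clips` and `collate_fn`, and the sample keys `frames` and `caption` are all assumptions for illustration, not taken from the uploaded scripts.

```python
import numpy as np


def split_into_clips(frames, clip_len, num_clips):
    """Cut a (N, H, W, Ch) frame array into num_clips clips of clip_len frames."""
    needed = clip_len * num_clips
    if frames.shape[0] < needed:
        raise ValueError(f"need {needed} frames, got {frames.shape[0]}")
    # Keep the first `needed` frames and fold the time axis into (C, T).
    return frames[:needed].reshape(num_clips, clip_len, *frames.shape[1:])


def collate_fn(samples, clip_len=4, num_clips=2):
    """Batch decoded videos into (B, C, T, H, W, Ch) plus a list of captions."""
    vision = np.stack(
        [split_into_clips(s["frames"], clip_len, num_clips) for s in samples]
    )
    text = [s["caption"] for s in samples]
    return {"vision": vision, "text": text}


# Hypothetical toy batch: two "videos" of different lengths, 8x8 RGB frames.
toy_samples = [
    {"frames": np.arange(10 * 8 * 8 * 3).reshape(10, 8, 8, 3), "caption": "first"},
    {"frames": np.zeros((9, 8, 8, 3), dtype=np.int64), "caption": "second"},
]
toy_batch = collate_fn(toy_samples)
```

Taking a fixed number of clips per video keeps the batch rectangular; frames beyond `clip_len * num_clips` are simply dropped in this sketch, which may or may not be what the real pipeline wants.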

Option 2:
Store the YouTube videos already split into clips, as arrays, so that no splitting happens online.
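
For comparison, the offline splitting could be sketched roughly as below: each video is cut into fixed-length clips before writing, so every stored sample is already one clip and the loader only has to stack them. The names `presplit_video` and `collate_clips`, the sample keys, and the choice to drop a trailing partial clip are hypothetical; the actual write step (for example through the MDS writer) is left out.

```python
import numpy as np


def presplit_video(frames, caption, clip_len):
    """Yield one sample per full clip of a (N, H, W, Ch) frame array."""
    num_clips = frames.shape[0] // clip_len  # a trailing partial clip is dropped
    for i in range(num_clips):
        start = i * clip_len
        yield {
            "clip": frames[start:start + clip_len].copy(),
            "caption": caption,
            "clip_index": i,
        }


def collate_clips(samples):
    """Stack pre-split clips into a (B, T, H, W, Ch) batch."""
    return {
        "vision": np.stack([s["clip"] for s in samples]),
        "text": [s["caption"] for s in samples],
    }


# Hypothetical toy video: 10 frames of 2x2 RGB, split into clips of 4 frames.
toy_frames = np.arange(10 * 2 * 2 * 3).reshape(10, 2, 2, 3)
toy_clips = list(presplit_video(toy_frames, "a caption", clip_len=4))
toy_clip_batch = collate_clips(toy_clips)
```

The trade-off against Option 1 is that the clip length is fixed at write time, while the loader becomes simpler and does no per-batch splitting work.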


t46 commented Aug 24, 2023

Uploaded a sample script for Option 2.
https://github.com/t46/video-dataset


t46 commented Aug 28, 2023

Uploaded the script for Option 1 as well. The scripts for both Option 1 and Option 2 are now stored in the videorl repo.

https://github.com/TheDuckAI/videorl/tree/feature/streaming-dataset/code/data/streamingdataset/preprocess


t46 commented Aug 28, 2023


t46 commented Sep 2, 2023

Updated
