Thank you for releasing the code for this project.
I'm having trouble understanding how the DSG is integrated into the generation process when I compare the code against the paper, so I'd like to ask for a few clarifications.
Step 2 states: "This step uses gold DSG of video for the updating of recurrent graph Transformer in 3D-UNet."
However, the corresponding config file appears to disable the DSG conditioning via use_temporal_transformer: False. Is this step therefore still a T2V-only pretraining step?
It would also help to have more explanation of how to "parse the DSG annotations in advance with the tools in dysen/DSG", since that original code is designed for images, not videos. Alternatively, could you provide pre-parsed representations for one of the video datasets used?
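To make the question concrete, here is a minimal sketch of the preprocessing loop I was expecting to write, assuming the image-oriented parser in dysen/DSG can simply be wrapped per frame caption. The function names and the output shape below are hypothetical placeholders I made up, not the actual API:

```python
from typing import Dict, List

def parse_caption_to_dsg(caption: str) -> Dict:
    """Hypothetical wrapper around the image-level parser in dysen/DSG."""
    # Placeholder output: objects as nodes, relations/attributes as edges.
    return {"caption": caption, "nodes": [], "edges": []}

def build_video_dsg(frame_captions: List[str]) -> List[Dict]:
    """One scene graph per (key)frame, i.e. the sequence I assume the
    recurrent graph Transformer in the 3D-UNet consumes."""
    return [parse_caption_to_dsg(c) for c in frame_captions]

if __name__ == "__main__":
    print(build_video_dsg(["a dog runs on the beach",
                           "the dog jumps into the water"]))
```

Is this roughly the intended procedure, or is the temporal dimension handled differently during parsing?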
I may have misunderstood the process, but even in bash shellscripts/run_sample_vdm_text2video.sh I cannot find the step that turns the textual representation into the graph representation. Is that the script used to generate the data that is then passed to shellscripts/run_eval_dysen_vdm.sh?
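For reference, this is the overall order I assumed the inference pipeline follows. The first entry point is purely my own guess (it is exactly the step I cannot locate), while the last two scripts are the ones shipped in shellscripts/:

```python
import subprocess

# My assumed end-to-end order; the first command is hypothetical (the missing
# text-to-DSG step), the last two are the scripts that exist in the repo.
steps = [
    "python dysen/DSG/parse_dsg.py --input captions.txt --output dsgs.json",  # hypothetical
    "bash shellscripts/run_sample_vdm_text2video.sh",
    "bash shellscripts/run_eval_dysen_vdm.sh",
]
for cmd in steps:
    subprocess.run(cmd, shell=True, check=True)
```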
Thank you in advance!