option for max_sequence_length of video generation #699

wr0124 · 2024-09-23T14:22:57Z

Adds an option, --vid_max_sequence_length, that sets the maximum sequence length for video generation. The value of --data_temporal_number_frames should not exceed --vid_max_sequence_length.
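The constraint between the two options could be enforced with a check along the following lines. This is only a sketch, not the code from this PR: `build_parser` and `check_temporal_options` are hypothetical helpers, the option names are taken from the description above, and the defaults shown are illustrative assumptions.

```python
import argparse


def build_parser():
    # Hypothetical minimal parser; the real train.py defines many more options.
    parser = argparse.ArgumentParser()
    # Defaults are assumptions chosen for illustration only.
    parser.add_argument("--data_temporal_number_frames", type=int, default=8)
    parser.add_argument("--vid_max_sequence_length", type=int, default=25)
    return parser


def check_temporal_options(opt):
    """Raise ValueError if more frames are requested than the maximum sequence length."""
    if opt.data_temporal_number_frames > opt.vid_max_sequence_length:
        raise ValueError(
            f"--data_temporal_number_frames ({opt.data_temporal_number_frames}) "
            f"should not exceed --vid_max_sequence_length ({opt.vid_max_sequence_length})"
        )
    return opt


if __name__ == "__main__":
    # Same values as the example command: 8 frames, maximum length 24.
    opt = build_parser().parse_args(
        ["--data_temporal_number_frames", "8", "--vid_max_sequence_length", "24"]
    )
    check_temporal_options(opt)
```

Failing early at option-parsing time gives a clearer message than an indexing or shape error raised later inside the model.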

Example run. The following command works:

python3 -W ignore::UserWarning  train.py \
--dataroot /path/to/online_mario2sonic_full_mario  \
--checkpoints_dir  /path/to/checkpoints \
--name  mario_vid   \
--gpu_ids 0    \
--model_type palette \
--output_print_freq 1   \
--output_display_freq 1   \
--data_dataset_mode  self_supervised_temporal_labeled_mask_online  \
--train_batch_size 1  \
--train_iter_size 1  \
--model_input_nc 3 \
--model_output_nc 3 \
--data_relative_paths \
--train_G_ema \
--train_optim adamw \
--G_netG unet_vid   \
--data_online_creation_crop_size_A 32  \
--data_online_creation_crop_size_B 32 \
--data_crop_size 32 \
--data_load_size 32  \
--data_online_creation_rand_mask_A \
--train_G_lr 0.0001 \
--dataaug_no_rotate \
--G_diff_n_timestep_train  6  \
--G_diff_n_timestep_test  3  \
--data_temporal_number_frames 8  \
--data_temporal_frame_step 1 \
--data_online_creation_mask_delta_A_ratio 0.12 0.12 \
--alg_diffusion_cond_image_creation    computed_sketch  \
--alg_diffusion_cond_computed_sketch_list canny \
--alg_diffusion_vid_canny_dropout 0.1 0.8  \
--alg_diffusion_cond_sketch_canny_range  500 1000  \
--vid_max_sequence_length 24

beniz · 2024-09-25T08:09:03Z

docs/options.md

@@ -70,7 +70,7 @@ Here are all the available options to call with `train.py`
 | --G_unet_mha_num_heads | int | 1 | number of heads in the mha architecture |
 | --G_unet_mha_res_blocks | array | [2, 2, 2, 2] | distribution of resnet blocks across the UNet stages, should have same size as --G_unet_mha_channel_mults |
 | --G_unet_mha_vit_efficient | flag |  | if true, use efficient attention in UNet and UViT |
-| --G_unet_vid_max_frame | int | 24 | max frame number for unet_vid in the PositionalEncoding |
+| --vid_max_sequence_length | int | 25 | max frame number for unet_vid in the PositionalEncoding |


Rename to --G_unet_vid_max_sequence_length, since this is a property of the unet_vid architecture, not of the data or the video.

beniz · 2024-09-25T08:09:29Z

docs/source/options.rst

@@ -130,7 +130,7 @@ Generator
 +------------------------------------------------+-----------------+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | --G_unet_mha_vit_efficient                     | flag            |                                                   | if true, use efficient attention in UNet and UViT                                                                                                                                                                                       |
 +------------------------------------------------+-----------------+---------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| --G_unet_vid_max_frame                         | int             | 24                                                | max frame number for unet_vid in the PositionalEncoding                                                                                                                                                                                 |
+| --vid_max_sequence_length                         | int             | 25                                                | max frame number for unet_vid in the PositionalEncoding                                                                                                                                                                                 |


Don't modify this file; it is generated automatically.

beniz · 2024-09-25T08:10:27Z

models/modules/unet_generator_attn/unet_generator_attn_vid.py

@@ -379,7 +379,7 @@ def __init__(
        attention_block_types=("Temporal_Self", "Temporal_Self"),
        cross_frame_attention_mode=None,
        temporal_position_encoding=False,
-        temporal_position_encoding_max_len=25,
+        temporal_position_encoding_max_len=None,


Keep the default value of 25: None cannot work here and should not be used for init.
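To illustrate why None is a problematic default, here is a minimal sketch of a sinusoidal position table. It assumes the positional encoding precomputes one row per position up to the maximum length, which is a common design; `sinusoidal_table` is a hypothetical helper and the actual PositionalEncoding in unet_generator_attn_vid.py may be implemented differently.

```python
import math


def sinusoidal_table(max_len, d_model):
    """Precompute a max_len x d_model table of sinusoidal position encodings."""
    table = []
    # range(None) raises TypeError, so max_len must be a concrete integer.
    for pos in range(max_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            # Even columns use sine, odd columns use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


if __name__ == "__main__":
    table = sinusoidal_table(25, 4)
    print(len(table))  # one row per position
    try:
        sinusoidal_table(None, 4)
    except TypeError as err:
        print("None is rejected:", err)
```

With an integer default, the module can be constructed without the caller passing a length; with None, construction fails unless every call site supplies one.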

beniz · 2024-09-25T08:10:49Z

models/modules/unet_generator_attn/unet_generator_attn_vid.py

@@ -439,7 +438,7 @@ def __init__(
        upcast_attention=False,
        cross_frame_attention_mode=None,
        temporal_position_encoding=False,
-        temporal_position_encoding_max_len=25,
+        temporal_position_encoding_max_len=None,


Set a default value instead.

wr0124 · 2024-09-25T09:45:32Z

The code works with:

python3 -W ignore::UserWarning train.py \
--dataroot /data1/juliew/dataset/online_mario2sonic_full_mario \
--checkpoints_dir /data1/juliew/checkpoints \
--name mario_vid \
--gpu_ids 0 \
--model_type palette \
--output_print_freq 1 \
--output_display_freq 1 \
--data_dataset_mode self_supervised_temporal_labeled_mask_online \
--train_batch_size 1 \
--train_iter_size 1 \
--model_input_nc 3 \
--model_output_nc 3 \
--data_relative_paths \
--train_G_ema \
--train_optim adamw \
--G_netG unet_vid \
--data_online_creation_crop_size_A 32 \
--data_online_creation_crop_size_B 32 \
--data_crop_size 32 \
--data_load_size 32 \
--data_online_creation_rand_mask_A \
--train_G_lr 0.0001 \
--dataaug_no_rotate \
--G_diff_n_timestep_train 6 \
--G_diff_n_timestep_test 3 \
--data_temporal_number_frames 8 \
--data_temporal_frame_step 1 \
--data_online_creation_mask_delta_A_ratio 0.12 0.12 \
--alg_diffusion_cond_image_creation computed_sketch \
--alg_diffusion_cond_computed_sketch_list canny \
--alg_diffusion_vid_canny_dropout 0.1 0.8 \
--alg_diffusion_cond_sketch_canny_range 500 1000 \
--G_unet_vid_max_sequence_length 15

wr0124 changed the title ~~feat(ml): option for max_sequence_lenght of video generation~~ option for max_sequence_lenght of video generation Sep 23, 2024

wr0124 requested review from royale and beniz September 23, 2024 19:31

wr0124 added alg:diffusion alg:palette labels Sep 23, 2024

beniz changed the title ~~option for max_sequence_lenght of video generation~~ option for max_sequence_length of video generation Sep 25, 2024

beniz assigned wr0124 Sep 25, 2024

beniz reviewed Sep 25, 2024

wr0124 force-pushed the opt_maxframe branch from b718c8c to 9f0b76a Compare September 25, 2024 10:15

wr0124 requested a review from beniz September 25, 2024 10:16

feat(ml): option for max_sequence_lenght of video generation

efa0e8c

wr0124 force-pushed the opt_maxframe branch from 9f0b76a to efa0e8c Compare September 25, 2024 11:45

beniz merged commit 12cfc1b into jolibrain:master Oct 1, 2024
2 checks passed
