Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
sandbox bench experiment workflow (#364)
* FVD and ISV for video eval
* restore tools init
* restore tools init
* pre-commit done
* add FID KID IS PR and PRV metrics
* add KVD metric
* fix doc
* allow relative path
* fix sample 50000 image
* fvd sandbox
* fvd sandbox test done
* precommit done
* easyanimate train and infer in sandbox
* divide dataset pipline
* fix data num for each partition
* pre-commit done
* test sandbox for videos done
* fix executor
* fix executor
* check datalen
* sort data for partition
* sort data for partition
* fix video_aspect_ratio_filter
* fix video_aspect_ratio_filter
* tensor stats to float
* precommit done
* fix words num filter
* pre-commit done
* add seed for train and infer
* add seed for easyanimate
* sandbox rebuild v1
* fix empty frames
* switch
* fix conflict
* fix hpo 3sigma
* after pre-commit
* sandbox readme zh
* finish doc
* remove training limit
* other_configs -> extra_configs
* other_configs -> extra_configs
* res_name -> meta_name
* hooker -> hook
* analyze -> analyse
* after pre-commit
* analyse -> analyze
* analyser.py -> analyzer.py
* analyser.py -> analyzer.py
* analyser.py -> analyzer.py
* regist -> register, DICT -> MAPPING
* range_specified_field_selector
* pipline test done
* dataset in readme
* update readme
* pre-commit done
* rm experiment name in dj
* add init dataset
* fix auto_evaluation_helm readme
* remove easyanimate code
* shorten diff

---------

Co-authored-by: binke <[email protected]>
Showing 60 changed files with 3,817 additions and 268 deletions.
@@ -1,6 +1,5 @@
```

# data & resources
models/
outputs/
assets/

```
@@ -0,0 +1,68 @@
```yaml
# Sandbox config example

# global parameters
project_name: 'demo-bench'
experiment_name: 'single_op_language_score'  # for wandb tracer name
work_dir: './outputs/demo-bench'  # the default output dir for meta logging

# configs for each job, the jobs will be executed according to the order in the list
probe_job_configs:
  # compute the stats value for each sample and analyze the distribution at the given percentiles
  - hook: 'ProbeViaAnalyzerHook'
    meta_name: 'analysis_ori_data'
    dj_configs:
      project_name: 'demo-bench'
      dataset_path: './demos/data/demo-dataset-videos.jsonl'  # path to your dataset directory or file
      percentiles: [0.333, 0.667]  # percentiles to analyze the dataset distribution
      export_path: './outputs/demo-bench/demo-dataset-with-language-score.jsonl'
      export_original_dataset: true  # must be true to keep statistics values with dataset
      process:
        - language_id_score_filter:
            lang: 'zh'
            min_score: 0.8
    extra_configs:

refine_recipe_job_configs:

execution_job_configs:
  # select the split with high statistics values (the top third, percentiles 0.667 to 1.000)
  - hook: 'ProcessDataHook'
    meta_name:
    dj_configs:
      project_name: 'demo-bench'
      dataset_path: './outputs/demo-bench/demo-dataset-with-language-score.jsonl'  # output dataset of the probe job
      export_path: './outputs/demo-bench/demo-dataset-with-high-language-score.jsonl'
      process:
        - range_specified_field_selector:
            # Keys of multi-level fields are joined with '.'. '__dj__stats__' is the default field where
            # Data-Juicer stores stats, and 'lang_score' is the stat produced by language_id_score_filter.
            field_key: '__dj__stats__.lang_score'
            lower_percentile: 0.667
            upper_percentile: 1.000
    extra_configs:
  # randomly sample a fixed number of instances
  - hook: 'ProcessDataHook'
    meta_name:
    dj_configs:
      project_name: 'demo-bench'
      dataset_path: './outputs/demo-bench/demo-dataset-with-high-language-score.jsonl'  # output dataset of the previous execution job
      export_path: './outputs/demo-bench/demo-dataset-for-train.jsonl'
      process:
        - random_selector:
            select_num: 16
    extra_configs:
  # train model
  - hook: 'TrainModelHook'
    meta_name:
    dj_configs:
    extra_configs: './configs/demo/bench/model_train.yaml'
  # infer model
  - hook: 'InferModelHook'
    meta_name:
    dj_configs:
    extra_configs: './configs/demo/bench/model_infer.yaml'

evaluation_job_configs:
  # vbench evaluation
  - hook: 'EvaluateDataHook'
    meta_name: 'vbench_eval'
    dj_configs:
    extra_configs: './configs/demo/bench/vbench_eval.yaml'
```
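The two `ProcessDataHook` execution jobs above narrow the probed dataset in two steps: `range_specified_field_selector` keeps samples whose `__dj__stats__.lang_score` falls between the 0.667 and 1.000 percentiles, and `random_selector` then keeps 16 of them. Below is a minimal pure-Python sketch of that selection logic, not Data-Juicer's implementation. It assumes samples are plain dicts read from JSONL and that a percentile bound maps to an index into the samples sorted by the field value; the helper names `get_nested`, `select_by_percentile_range`, and `random_select`, and the `seed` parameter, are hypothetical.

```python
import random
from typing import Any, Dict, List

Sample = Dict[str, Any]


def get_nested(sample: Sample, dotted_key: str) -> Any:
    """Resolve a multi-level key such as '__dj__stats__.lang_score'."""
    value: Any = sample
    for part in dotted_key.split('.'):
        value = value[part]
    return value


def select_by_percentile_range(samples: List[Sample], field_key: str,
                               lower_percentile: float,
                               upper_percentile: float) -> List[Sample]:
    """Keep samples ranked between the two percentiles of `field_key`.

    Assumption (hypothetical helper): a percentile p maps to index int(p * n)
    in the list sorted ascending by the field value.
    """
    ranked = sorted(samples, key=lambda s: get_nested(s, field_key))
    n = len(ranked)
    start = int(lower_percentile * n)
    end = int(upper_percentile * n)
    return ranked[start:end]


def random_select(samples: List[Sample], select_num: int,
                  seed: int = 42) -> List[Sample]:
    """Draw `select_num` samples without replacement (all of them if fewer exist).

    The `seed` parameter is an assumption added for reproducibility.
    """
    if select_num >= len(samples):
        return list(samples)
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return rng.sample(samples, select_num)


if __name__ == '__main__':
    # Toy dataset: 60 samples with lang_score 0/60, 1/60, ..., 59/60.
    data = [{'text': f's{i}', '__dj__stats__': {'lang_score': i / 60}} for i in range(60)]
    high = select_by_percentile_range(data, '__dj__stats__.lang_score', 0.667, 1.000)
    train = random_select(high, select_num=16)
    print(len(high), len(train))  # 20 16
```

Sorting before slicing makes the bounds rank-based, so the selected split does not depend on the absolute scale of the stat, and a fixed seed keeps the sampled training subset identical across sandbox runs.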
@@ -0,0 +1,58 @@
```yaml
# Sandbox config example

# global parameters
project_name: 'demo-bench'
experiment_name: 'single_op_language_score'  # for wandb tracer name
work_dir: './outputs/demo-bench'  # the default output dir for meta logging

# configs for each job, the jobs will be executed according to the order in the list
probe_job_configs:

refine_recipe_job_configs:

execution_job_configs:
  - hook: 'ProcessDataHook'
    meta_name:
    dj_configs:
      project_name: 'demo-bench'
      dataset_path: './demos/data/demo-dataset-videos.jsonl'  # path to your dataset directory or file
      export_path: './outputs/demo-bench/demo-dataset-with-multi-op-stats.jsonl'
      export_original_dataset: true  # must be true to keep statistics values with dataset
      process:
        # select samples with high language score
        - language_id_score_filter:
            lang:
            min_score: 0.7206037306785583  # taken from the probe job's analysis result in the single-op experiment
        # select samples with middle video duration
        - video_duration_filter:
            min_duration: 19.315000  # taken from the probe job's analysis result in the single-op experiment
            max_duration: 32.045000  # taken from the probe job's analysis result in the single-op experiment

    extra_configs:
  - hook: 'ProcessDataHook'
    meta_name:
    dj_configs:
      project_name: 'demo-bench'
      dataset_path: './outputs/demo-bench/demo-dataset-with-multi-op-stats.jsonl'
      export_path: './outputs/demo-bench/demo-dataset-for-train.jsonl'
      process:
        - random_selector:
            select_num: 16
    extra_configs:
  # train model
  - hook: 'TrainModelHook'
    meta_name:
    dj_configs:
    extra_configs: './configs/demo/bench/model_train.yaml'
  # infer model
  - hook: 'InferModelHook'
    meta_name:
    dj_configs:
    extra_configs: './configs/demo/bench/model_infer.yaml'

evaluation_job_configs:
  # vbench evaluation
  - hook: 'EvaluateDataHook'
    meta_name: 'vbench_eval'
    dj_configs:
    extra_configs: './configs/demo/bench/vbench_eval.yaml'
```
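In this multi-op config the thresholds are not computed at run time: `min_score` and the duration window are copied from the probe analysis of the single-op experiment, and the two filters run one after the other. A minimal sketch of that sequential filtering follows. It assumes each sample already carries scalar `lang_score` and `video_duration` stats under `__dj__stats__` (Data-Juicer's actual stats layout may differ, for example one duration per video), and the names `lang_score_at_least`, `duration_within`, and `run_filters` are hypothetical.

```python
from typing import Any, Callable, Dict, List

Sample = Dict[str, Any]
Predicate = Callable[[Sample], bool]


def lang_score_at_least(min_score: float) -> Predicate:
    """Keep samples whose language-ID score is >= min_score (assumed scalar stat)."""
    return lambda s: s['__dj__stats__']['lang_score'] >= min_score


def duration_within(min_duration: float, max_duration: float) -> Predicate:
    """Keep samples whose duration in seconds lies in [min_duration, max_duration].

    Assumption: 'video_duration' is stored as a single float per sample.
    """
    return lambda s: min_duration <= s['__dj__stats__']['video_duration'] <= max_duration


def run_filters(samples: List[Sample], predicates: List[Predicate]) -> List[Sample]:
    """Apply the filters in order; each one only sees samples kept by the previous ones."""
    for keep in predicates:
        samples = [s for s in samples if keep(s)]
    return samples


if __name__ == '__main__':
    data = [
        {'id': 'a', '__dj__stats__': {'lang_score': 0.90, 'video_duration': 25.0}},
        {'id': 'b', '__dj__stats__': {'lang_score': 0.50, 'video_duration': 25.0}},  # score too low
        {'id': 'c', '__dj__stats__': {'lang_score': 0.95, 'video_duration': 40.0}},  # too long
        {'id': 'd', '__dj__stats__': {'lang_score': 0.80, 'video_duration': 19.0}},  # too short
    ]
    # Thresholds copied from the config above.
    pipeline = [lang_score_at_least(0.7206037306785583), duration_within(19.315, 32.045)]
    print([s['id'] for s in run_filters(data, pipeline)])  # ['a']
```

Because each filter is a pure predicate, running them in sequence keeps exactly the samples that pass all of them; reordering the filters changes only how much work the later ones do, not the result.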