The source of this project is Donut; please refer to the Donut repository.

SynthDoG 🐶: Synthetic Document Generator

SynthDoG is a synthetic document generator for visual document understanding (VDU).

Prerequisites

python>=3.6
synthtiger (pip install synthtiger)
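
The two prerequisites above can be checked before generating anything. The following is a minimal sketch, not part of SynthDoG itself; the function name `check_prerequisites` is hypothetical, and it assumes that the pip package exposes an importable module also named `synthtiger`.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 6)  # taken from the Prerequisites list


def check_prerequisites(version_info=sys.version_info, module_name="synthtiger"):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper. Assumes the pip package installs a module
    with the same name as the package (`synthtiger`).
    """
    problems = []
    # Compare only (major, minor) against the documented minimum.
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            "Python {}.{} or newer is required".format(*MIN_PYTHON)
        )
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        problems.append(
            "module '{}' not found (try: pip install synthtiger)".format(module_name)
        )
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter; the parameters exist only so the check can be exercised without changing the environment.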

Usage

# Set environment variable (for macOS)
$ export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

$ synthtiger -o ./outputs/SynthDoG_en -c 50 -w 4 -v template.py SynthDoG config_en.yaml

{'config': 'config_en.yaml',
 'count': 50,
 'name': 'SynthDoG',
 'output': './outputs/SynthDoG_en',
 'script': 'template.py',
 'verbose': True,
 'worker': 4}
{'aspect_ratio': [1, 2],
     .
     .
 'quality': [50, 95],
 'short_size': [720, 1024]}
Generated 1 data (task 3)
Generated 2 data (task 0)
Generated 3 data (task 1)
     .
     .
Generated 49 data (task 48)
Generated 50 data (task 49)
46.32 seconds elapsed
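
In the sample log above, the running count and the task number do not match line by line, which is what one would expect when several workers finish tasks out of order. As a rough illustration, the progress lines can be parsed into pairs. This is a sketch that assumes the log format is exactly as shown in the sample; `parse_progress` is a hypothetical helper, not part of synthtiger.

```python
import re

# Matches lines shaped like the sample output: "Generated 1 data (task 3)".
PROGRESS_LINE = re.compile(r"^Generated (\d+) data \(task (\d+)\)$")


def parse_progress(lines):
    """Return (running_count, task_index) pairs for matching lines.

    Hypothetical helper; lines that do not match the sample format
    (for example the elapsed-time line) are skipped.
    """
    pairs = []
    for line in lines:
        match = PROGRESS_LINE.match(line.strip())
        if match:
            pairs.append((int(match.group(1)), int(match.group(2))))
    return pairs


sample = [
    "Generated 1 data (task 3)",
    "Generated 2 data (task 0)",
    "Generated 3 data (task 1)",
    "46.32 seconds elapsed",
]
pairs = parse_progress(sample)
print(pairs)  # [(1, 3), (2, 0), (3, 1)]
```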

Some important arguments:

-o : directory path to save data.
-c : number of samples to generate.
-w : number of workers.
-s : random seed.
-v : print error messages.
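
To make the mapping between these flags and the command line concrete, here is a small sketch that assembles the argument list without running anything. The function name `build_command` and its parameter names are hypothetical; only the flags and the positional order (`template.py SynthDoG <config>`) come from the usage example above. Placing `-s` before the positional arguments is an assumption.

```python
import shlex


def build_command(output, count, workers, config, seed=None, verbose=True,
                  script="template.py", name="SynthDoG"):
    """Assemble a synthtiger argument list from the flags listed above.

    Hypothetical helper: it only builds the list and does not invoke
    synthtiger.
    """
    cmd = ["synthtiger", "-o", output, "-c", str(count), "-w", str(workers)]
    if seed is not None:
        # Assumption: -s is accepted alongside the other options.
        cmd += ["-s", str(seed)]
    if verbose:
        cmd.append("-v")
    # Positional arguments in the order used by the usage example.
    cmd += [script, name, config]
    return cmd


cmd = build_command("./outputs/SynthDoG_en", 50, 4, "config_en.yaml")
# Quote each argument so the printed line is safe to paste into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` if one prefers to launch generation from Python rather than from the shell.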

To generate ECJK (English, Chinese, Japanese, Korean) samples:

# english
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_en.yaml

# chinese
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_zh.yaml

# japanese
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_ja.yaml

# korean
synthtiger -o {dataset_path} -c {num_of_data} -w {num_of_workers} -v template.py SynthDoG config_ko.yaml
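
The four commands above differ only in the config file, so they can be generated in a loop. The sketch below builds one argument list per language; `ecjk_commands` is a hypothetical helper, and writing each language to its own subdirectory of `dataset_root` is an assumption made for illustration, not something the commands above require.

```python
from pathlib import PurePosixPath

# Config file names as they appear in the commands above.
CONFIGS = {
    "english": "config_en.yaml",
    "chinese": "config_zh.yaml",
    "japanese": "config_ja.yaml",
    "korean": "config_ko.yaml",
}


def ecjk_commands(dataset_root, count, workers):
    """Return {language: argument list} for the four ECJK configs.

    Hypothetical helper. Each language gets its own output directory
    under dataset_root (an illustrative choice).
    """
    commands = {}
    for language, config in CONFIGS.items():
        output_dir = str(PurePosixPath(dataset_root) / language)
        commands[language] = [
            "synthtiger", "-o", output_dir,
            "-c", str(count), "-w", str(workers), "-v",
            "template.py", "SynthDoG", config,
        ]
    return commands


for language, cmd in ecjk_commands("outputs", 50, 4).items():
    print(language, "->", " ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains `template.py` and the config files.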

Citation

@inproceedings{kim2022donut,
  title     = {OCR-Free Document Understanding Transformer},
  author    = {Kim, Geewook and Hong, Teakgyu and Yim, Moonbin and Nam, JeongYeon and Park, Jinyoung and Yim, Jinyeong and Hwang, Wonseok and Yun, Sangdoo and Han, Dongyoon and Park, Seunghyun},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2022}
}
