AIHub LAVT: Language-Aware Vision Transformer for Referring Image Segmentation

Welcome to the official repository for the method presented in "LAVT: Language-Aware Vision Transformer for Referring Image Segmentation."

Train with AIHub Data

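Training command for the manufacturing-environment data. The log directory ./models/refcoco_manufact_80_uniq_id must exist before you run it, because tee does not create missing directories.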

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node 8 --master_port 12345 train.py --model lavt_one_xlm --dataset aihub_manufact_80 --model_id refcoco_manufact_80_uniq_id --batch-size 4 --lr 0.00005 --wd 1e-2 --swin_type base --pretrained_swin_weights ./pretrained_weights/swin_base_patch4_window12_384_22k.pth --epochs 40 --img_size 480 2>&1 | tee ./models/refcoco_manufact_80_uniq_id/output
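The command above launches 8 processes with --batch-size 4 and writes its log under ./models/<model_id>/output. A minimal sketch of the two bookkeeping steps around it follows, assuming --batch-size is a per-process (per-GPU) value, which the command itself does not state. The helper names `effective_batch_size` and `prepare_log_path` are hypothetical and are not part of this repository.

```python
from pathlib import Path


def effective_batch_size(per_gpu_batch: int, num_gpus: int) -> int:
    """Total samples per optimizer step, ASSUMING --batch-size is per GPU."""
    return per_gpu_batch * num_gpus


def prepare_log_path(model_id: str, root: Path = Path(".")) -> Path:
    """Create ./models/<model_id>/ and return the log file path used by `tee`."""
    log_dir = root / "models" / model_id
    # `tee` fails if the directory is missing, so create it up front.
    log_dir.mkdir(parents=True, exist_ok=True)
    return log_dir / "output"


if __name__ == "__main__":
    # Values taken from the training command: --batch-size 4, --nproc_per_node 8.
    print(effective_batch_size(4, 8))
    print(prepare_log_path("refcoco_manufact_80_uniq_id"))
```

Under that assumption, the run trains with 32 samples per step, so changing the number of GPUs without adjusting --batch-size also changes the effective batch size.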

Test with AIHub Data

Test command for the manufacturing-environment data

python test.py --model lavt_one_xlm --swin_type base --dataset aihub_manufact_80 --split test --resume ./checkpoints/model_best_refcoco_manufact_80_uniq_id.pth --workers 4 --ddp_trained_weights --window12 --img_size 480
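The --resume path in the test command reuses the --model_id from training (model_best_<model_id>.pth under ./checkpoints). The sketch below rebuilds the test command from a model id so the two stay in sync. It assumes that naming pattern holds for other model ids, which is inferred from the two commands here rather than confirmed by train.py. The function names are hypothetical.

```python
import shlex


def best_checkpoint_path(model_id: str, ckpt_dir: str = "./checkpoints") -> str:
    """Checkpoint path, ASSUMING the model_best_<model_id>.pth naming pattern."""
    return f"{ckpt_dir}/model_best_{model_id}.pth"


def build_test_command(model_id: str,
                       dataset: str = "aihub_manufact_80",
                       split: str = "test") -> list:
    """Return the test.py invocation as an argument list (flags copied from the README)."""
    return [
        "python", "test.py",
        "--model", "lavt_one_xlm",
        "--swin_type", "base",
        "--dataset", dataset,
        "--split", split,
        "--resume", best_checkpoint_path(model_id),
        "--workers", "4",
        "--ddp_trained_weights",
        "--window12",
        "--img_size", "480",
    ]


if __name__ == "__main__":
    cmd = build_test_command("refcoco_manufact_80_uniq_id")
    # Print a copy-pasteable shell line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the command as an argument list avoids shell-quoting mistakes if a path ever contains spaces.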

Citing LAVT

@inproceedings{yang2022lavt,
  title={LAVT: Language-Aware Vision Transformer for Referring Image Segmentation},
  author={Yang, Zhao and Wang, Jiaqi and Tang, Yansong and Chen, Kai and Zhao, Hengshuang and Torr, Philip HS},
  booktitle={CVPR},
  year={2022}
}

Contributing

We appreciate all contributions. It helps the project if you could

report issues you are facing,
give a 👍 on issues reported by others that are relevant to you,
answer issues reported by others for which you have found solutions,
and implement helpful new features or improve the code otherwise with pull requests.

Acknowledgements

Code in this repository is built upon several public repositories. Specifically,

data pre-processing leverages the refer repository,
the backbone model is implemented based on code from Swin Transformer for Semantic Segmentation,
the training and testing pipelines are adapted from RefVOS,
and implementation of the BERT model (files in the bert directory) is from Hugging Face Transformers v3.0.2 (we migrated over the relevant code to fix a bug and simplify the installation process).

Some of these repositories in turn adapt code from OpenMMLab and TorchVision. We'd like to thank the authors/organizations of these repositories for open sourcing their projects.

License

GNU GPLv3

