You Only Look At CoefficienTs

    ██╗   ██╗ ██████╗ ██╗      █████╗  ██████╗████████╗
    ╚██╗ ██╔╝██╔═══██╗██║     ██╔══██╗██╔════╝╚══██╔══╝
     ╚████╔╝ ██║   ██║██║     ███████║██║        ██║   
      ╚██╔╝  ██║   ██║██║     ██╔══██║██║        ██║   
       ██║   ╚██████╔╝███████╗██║  ██║╚██████╗   ██║   
       ╚═╝    ╚═════╝ ╚══════╝╚═╝  ╚═╝ ╚═════╝   ╚═╝

A simple, fully convolutional model for real-time instance segmentation. This is the code for our paper, and for the forseeable future is still in development.

Here's a look at our current results for our base model (33 fps on a Titan Xp and 29.8 mAP on COCO's test-dev):

Installation

Set up a Python3 environment.
Install Pytorch 1.0.1 (or higher) and TorchVision.

Install some other packages:

# Cython needs to be installed before pycocotools
pip install cython
pip install opencv-python pillow pycocotools matplotlib

Clone this repository and enter it:

git clone https://github.com/dbolya/yolact.git
cd yolact

If you'd like to train YOLACT, download the COCO dataset and the 2014/2017 annotations. Note that this script will take a while and dump 21gb of files into ./data/coco.
```
sh data/scripts/COCO.sh
```
If you'd like to evaluate YOLACT on test-dev, download test-dev with this script.
```
sh data/scripts/COCO_test.sh
```

Evaluation

As of April 5th, 2019 here are our latest models along with their FPS on a Titan Xp and mAP on test-dev:

Image Size	Backbone	FPS	mAP	Weights
550	Resnet50-FPN	42.5	28.2	yolact_resnet50_54_800000.pth	Mirror
550	Darknet53-FPN	40.0	28.7	yolact_darknet53_54_800000.pth	Mirror
550	Resnet101-FPN	33.0	29.8	yolact_base_54_800000.pth	Mirror
700	Resnet101-FPN	23.6	31.2	yolact_im700_54_800000.pth	Mirror

To evalute the model, put the corresponding weights file in the ./weights directory and run one of the following commands.

Quantitative Results on COCO

# Quantitatively evaluate a trained model on the entire validation set. Make sure you have COCO downloaded as above.
# This should get 29.92 validation mask mAP last time I checked.
python eval.py --trained_model=weights/yolact_base_54_800000.pth

# Output a COCOEval json to submit to the website or to use the run_coco_eval.py script.
# This command will create './results/bbox_detections.json' and './results/mask_detections.json' for detection and instance segmentation respectively.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json

# You can run COCOEval on the files created in the previous command. The performance should match my implementation in eval.py.
python run_coco_eval.py

# To output a coco json file for test-dev, make sure you have test-dev downloaded from above and go
python eval.py --trained_model=weights/yolact_base_54_800000.pth --output_coco_json --dataset=coco2017_testdev_dataset

Qualitative Results on COCO

# Display qualitative results on COCO. From here on I'll use a confidence threshold of 0.3.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --display

Benchmarking on COCO

# Run just the raw model on the first 1k images of the validation set
python eval.py --trained_model=weights/yolact_base_54_800000.pth --benchmark --max_images=1000

Images

# Display qualitative results on the specified image.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --image=my_image.png

# Process an image and save it to another file.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --image=input_image.png:output_image.png

# Process a whole folder of images.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --images=path/to/input/folder:path/to/output/folder

Video

# Display a video in real-time. "--video_multiframe" will process that many frames at once for improved performance.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --video=my_video.mp4

# Display a webcam feed in real-time. If you have multiple webcams pass the index of the webcam you want instead of 0.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video_multiframe=2 --video=0

# Process a video and save it to another file. This is unoptimized.
python eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video=input_video.mp4:output_video.mp4

As you can tell, eval.py can do a ton of stuff. Run the --help command to see everything it can do.

python eval.py --help

Training

By default, we Train on COCO. Make sure to download the entire dataset using the commands above.

To train, grab an imagenet-pretrained model and put it in ./weights.
- For Resnet101, download resnet101_reducedfc.pth from here.
- For Resnet50, download resnet50-19c8e357.pth from here.
- For Darknet53, download darknet53.pth from here.
Run one of the training commands below.
- Note that you can press ctrl+c while training and it will save an *_interrupt.pth file at the current iteration.
- All weights are saved in the ./weights directory by default with the file name <config>_<epoch>_<iter>.pth.

# Trains using the base config with a batch size of 8 (the default).
python train.py --config=yolact_base_config

# Trains yolact_base_config with a batch_size of 5. For the 550px models, 1 batch takes up around 1.5 gigs of VRAM, so specify accordingly.
python train.py --config=yolact_base_config --batch_size=5

# Resume training yolact_base with a specific weight file and start from the iteration specified in the weight file's name.
python train.py --config=yolact_base_config --resume=weights/yolact_base_10_32100.pth --start_iter=-1

# Use the help option to see a description of all available command line arguments
python train.py --help

Custom Datasets

You can also train on your own dataset by following these steps:

Create a COCO-style Object Detection JSON annotation file for your dataset. The specification for this can be found here. Note that we don't use some fields, so the following may be omitted:
- info
- liscense
- Under image: license, flickr_url, coco_url, date_captured
- categories (we use our own format for categories, see below)
Create a definition for your dataset under dataset_base in data/config.py (see the comments in dataset_base for an explanation of each field):

my_custom_dataset = dataset_base.copy({
    'name': 'My Dataset',

    'train_images': 'path_to_training_images',
    'train_info':   'path_to_training_annotation',

    'valid_images': 'path_to_validation_images',
    'valid_info':   'path_to_validation_annotation',

    'has_gt': True,
    'class_names': ('my_class_id_1', 'my_class_id_2', 'my_class_id_3', ...)
})

A couple things to note:
- Class IDs in the annotation file should start at 1 and increase sequentially on the order of class_names. If this isn't the case for your annotation file (like in COCO), see the field label_map in dataset_base.
- If you do not want to create a validation split, use the same image path and annotations file for validation. By default (see python train.py --help), train.py will output validation mAP for the first 5000 images in the dataset every 2 epochs.
Finally, in yolact_base_config in the same file, change the value for 'dataset' to 'my_custom_dataset' or whatever you named the config object above. Then you can use any of the training commands in the previous section.

Creating a Custom Dataset from Scratch

See this nice post by @Amit12690 for tips on how to annotate a custom dataset and prepare it for use with YOLACT.

Green screen mode

The green screen mode can be activated by running e.g. the following command:

python .\eval.py --trained_model=weights/yolact_base_54_800000.pth --display_masks=False --display_bboxes=False --display_text=False --score_threshold=0.3 --video=0 --background_image=path_to_image.jpg

This will insert a person found on the video stream from the webcam (--video=0) on top of the image given in the --background_image option. Using other --video options can do this to video files instead of the webcam video.

Citation

If you use YOLACT or this code base in your work, please cite

@article{bolya-arxiv2019,
  author    = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},
  title     = {YOLACT: {Real-time} Instance Segmentation},
  journal   = {arXiv},
  year      = {2019},
}

Contact

For questions about our paper or code, please contact Daniel Bolya.

Name		Name	Last commit message	Last commit date
Latest commit History 486 Commits
data		data
layers		layers
scripts		scripts
utils		utils
web		web
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
backbone.py		backbone.py
eval.py		eval.py
run_coco_eval.py		run_coco_eval.py
train.py		train.py
yolact.py		yolact.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

You Only Look At CoefficienTs

Installation

Evaluation

Quantitative Results on COCO

Qualitative Results on COCO

Benchmarking on COCO

Images

Video

Training

Custom Datasets

Creating a Custom Dataset from Scratch

Green screen mode

Citation

Contact

About

Releases

Packages

Languages

License

MadsBornebusch/yolact

Folders and files

Latest commit

History

Repository files navigation

You Only Look At CoefficienTs

Installation

Evaluation

Quantitative Results on COCO

Qualitative Results on COCO

Benchmarking on COCO

Images

Video

Training

Custom Datasets

Creating a Custom Dataset from Scratch

Green screen mode

Citation

Contact

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages