MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation

¹S-Lab, ²Nanyang Technological University
🚩 Accepted to IJCV 2024

[arXiv]

We present MosaicFusion, a general diffusion-based data augmentation pipeline for large-vocabulary instance segmentation. The instance segmentation dataset synthesized by MosaicFusion can be used to train various downstream detection and segmentation models and improve their performance, especially on rare and novel categories.

🤩 Key Properties

  • Training-free
  • Directly generates multiple objects
  • Agnostic to detection architectures
  • Requires no extra detectors or segmentors

😎 Method

    MosaicFusion is a training-free, diffusion-based data augmentation pipeline that produces image and mask pairs containing multiple objects simultaneously, using off-the-shelf text-to-image diffusion models. The pipeline consists of two components: image generation and mask generation.

    🥰 Qualitative Examples

    Given only the names of the categories of interest, MosaicFusion generates high-quality multi-object images and masks simultaneously by conditioning each region on its own text prompt.
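To make the idea of "a specific text prompt for each region" concrete, here is a minimal sketch that tiles a canvas into a regular grid and attaches one prompt per tile. It is an illustration only, not the repository's implementation: the `Region` class, the `build_mosaic_regions` function, the regular grid layout, and the `"a photo of a single {}"` template (modeled on the prompt string used in the Data Generation commands) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One rectangular tile of the canvas with its own text prompt (hypothetical type)."""
    x0: int
    y0: int
    x1: int
    y1: int
    prompt: str


def build_mosaic_regions(categories, height, width, rows, cols,
                         template="a photo of a single {}"):
    """Split a height x width canvas into rows x cols tiles, one category per tile.

    The regular grid is an assumption made for this sketch; the actual
    pipeline may choose region boundaries differently.
    """
    if len(categories) != rows * cols:
        raise ValueError("need exactly one category per region")
    regions = []
    for i, name in enumerate(categories):
        r, c = divmod(i, cols)  # row-major placement of the i-th category
        y0, y1 = r * height // rows, (r + 1) * height // rows
        x0, x1 = c * width // cols, (c + 1) * width // cols
        regions.append(Region(x0, y0, x1, y1, template.format(name)))
    return regions


if __name__ == "__main__":
    for region in build_mosaic_regions(["cat", "kite", "teapot", "bench"], 512, 512, 2, 2):
        print(region)
```

Each `Region` pairs a pixel box with the prompt that would condition generation inside that box; the tiles cover the canvas without gaps or overlap.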

    🛠️ Usage

    Installation

    • Clone our repo from GitHub:
    git clone https://github.com/Jiahao000/MosaicFusion.git
    cd MosaicFusion
    • Create the conda environment:
    conda env create -f environment.yml
    • Download lvis_v1_train.json, unzip it, and place it in a directory, e.g., data/lvis/meta/lvis_v1_train.json.
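Before generating data, it can help to confirm that the annotation file is where the later steps expect it and that it parses. The sketch below is not part of the repository; `summarize_lvis` is a hypothetical helper, and it assumes the usual COCO-style top-level keys (`images`, `annotations`, `categories`) plus a per-category `frequency` field as found in LVIS v1.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_lvis(ann):
    """Return basic counts for a COCO/LVIS-style annotation dict."""
    missing = [k for k in ("images", "annotations", "categories") if k not in ann]
    if missing:
        raise KeyError(f"missing top-level keys: {missing}")
    # 'frequency' is assumed to be present per category; fall back to 'unknown'.
    freq = Counter(cat.get("frequency", "unknown") for cat in ann["categories"])
    return {
        "num_images": len(ann["images"]),
        "num_annotations": len(ann["annotations"]),
        "num_categories": len(ann["categories"]),
        "categories_by_frequency": dict(freq),
    }


if __name__ == "__main__":
    path = Path("data/lvis/meta/lvis_v1_train.json")
    if path.exists():
        with path.open() as f:
            print(summarize_lvis(json.load(f)))
    else:
        print(f"{path} not found; check the download step")
```

Loading the full training annotations takes a noticeable amount of memory, so run this check once rather than on every job.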

    Data Generation

    1. Generate images and masks with MosaicFusion:
    bash scripts/dist_text2seg.sh "a photo of a single category" output/text2seg Generation_log

    Alternatively, if you run MosaicFusion on a cluster managed with Slurm:

    bash scripts/slurm_text2seg.sh Dummy Generation_job "a photo of a single category" output/text2seg Generation_log
    2. Convert generated images and masks to the required data format:
    bash scripts/run_seg2ann.sh output/text2seg output/seg2ann
    3. Merge MosaicFusion annotations into LVIS annotations:
    bash scripts/run_merge_ann.sh data/lvis/meta/lvis_v1_train.json output/seg2ann/annotations/lvis_v1_train_mosaicfusion.json output/seg2ann/annotations/lvis_v1_train_merged.json
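For readers who want to see what merging two annotation files involves, the following is a minimal sketch of combining COCO/LVIS-style annotation dicts without ID collisions. It does not reproduce what `scripts/run_merge_ann.sh` does internally; `merge_annotations` is a hypothetical function, and it assumes both inputs share the same category list and use integer `id` and `image_id` fields.

```python
import copy


def merge_annotations(base, extra):
    """Append images and annotations from `extra` to a copy of `base`.

    IDs from `extra` are renumbered to start after the largest ID in `base`,
    so the merged file has no duplicate image or annotation IDs.
    Categories are taken from `base` unchanged (assumed identical in both).
    """
    merged = copy.deepcopy(base)
    image_offset = max((img["id"] for img in base["images"]), default=0)
    ann_offset = max((a["id"] for a in base["annotations"]), default=0)

    # Map each old image ID in `extra` to its new, collision-free ID.
    image_id_map = {}
    for n, img in enumerate(extra["images"], start=1):
        new_img = dict(img, id=image_offset + n)
        image_id_map[img["id"]] = new_img["id"]
        merged["images"].append(new_img)

    # Renumber annotations and point them at the remapped image IDs.
    for n, ann in enumerate(extra["annotations"], start=1):
        merged["annotations"].append(
            dict(ann, id=ann_offset + n, image_id=image_id_map[ann["image_id"]])
        )
    return merged
```

The function leaves both inputs unmodified, which makes it safe to reuse the base annotations for several merges with different synthetic sets.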

    Training Downstream Detectors or Segmentors

    Please refer to TRAIN.md for training details.

    👨‍💻 Todo

    • Data generation code for MosaicFusion
    • Third-party training code with MosaicFusion data

    🤟 Citation

    If you find this work useful for your research, please consider citing our paper:

    @article{xie2024mosaicfusion,
      author = {Xie, Jiahao and Li, Wei and Li, Xiangtai and Liu, Ziwei and Ong, Yew Soon and Loy, Chen Change},
      title = {MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation},
      journal = {International Journal of Computer Vision},
      year = {2024}
    }

    🗞️ License

    Distributed under the S-Lab License. See LICENSE for more information.