Official implementation of the paper "GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane" (ACM MM 2024).

🆒 GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

Paper PDF | Project Page | MipNeRF360-OV | Replica

😊 TL;DR

GOI locates 3D Gaussians of interest as directed by open-vocabulary prompts.

⭐ Key components of GOI:

  • A Trainable Feature Clustering Codebook that efficiently condenses noisy, high-dimensional semantic features into compact, low-dimensional vectors, ensuring well-defined segmentation boundaries.
  • Finetuning of the Semantic-space Hyperplane, initialized from the text query embedding, to better locate the target region (a minimal sketch of this query step follows this list).
  • A proposed open-vocabulary evaluation dataset, MipNeRF360-OV.
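
To make the query step concrete, below is a minimal, hypothetical sketch of how a semantic-space hyperplane can select Gaussians. Every name and shape here is illustrative only and is not the repository's actual API.

import torch
import torch.nn.functional as F

def select_gaussians(low_dim_feats, hyperplane_normal, bias=0.0):
    """Boolean mask of Gaussians on the positive side of the hyperplane."""
    feats = F.normalize(low_dim_feats, dim=-1)        # (N, D) codebook features, one per Gaussian
    normal = F.normalize(hyperplane_normal, dim=-1)   # (D,) initialized from the text query embedding
    scores = feats @ normal + bias                    # signed distance to the hyperplane
    return scores > 0                                 # Gaussians of interest

The hyperplane normal starts as the text embedding and can then be finetuned (together with the bias) against a 2D segmentation signal, which is the "optimizable" part of the method.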

🔥 News:

  • We have updated the GUI and released the evaluation code.
  • We provide the processed Replica dataset used in our paper, which is derived from the Semantic-NeRF version.

📖 Open-vocabulary Query Results

❗ You can precisely locate 3D Gaussians of Interest with an open-vocabulary text prompt

teaser.mp4

Visit our Project Page for more results.

🔧 Installation

  • Clone this repo:
git clone https://github.com/Quyans/GOI-Hyperplane.git
cd GOI-Hyperplane
  • Set up a new conda environment:
conda env create --file environment.yml
conda activate goi
  • If you encounter any problems during the pip installation step, try the following commands in the goi environment:
conda activate goi
pip install submodules/diff-gaussian-rasterization submodules/simple-knn
pip install trimesh kiui pymeshlab rembg open3d scipy dearpygui omegaconf open_clip_torch transformations transformers==4.38.1 yapf pycocotools
pip install clip@git+https://github.com/openai/CLIP.git

📚 Data Preparation

We use datasets in the COLMAP format. For your own dataset, you can use the convert.py script. Refer to 3DGS for specific usage.

In addition to RGB data, we also use pixel-aligned semantic feature maps. Specifically, we use APE as our vision-language model to extract semantic features from images.

First, install our modified APE repository. Then, run the following command to extract semantic features from all RGB images and save them in the clip_feat folder under the scene path:

cd ../APE
python demo/demo_lazy.py -i <scene_path>/images/* --feat-out <scene_path>/clip_feat/
  • Due to the high dimensionality of the pixel-aligned features encoded by APE, we recommend using lower-resolution (< 1.6K) images for encoding (e.g., the images_4 folder of the Mip360 dataset).
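
If you have several scenes to process, the extraction command can be wrapped in a small driver script. The sketch below only re-issues the command shown above; the scene paths are hypothetical, and it assumes demo_lazy.py accepts multiple image paths after -i, as the shell glob implies.

import subprocess
from pathlib import Path

# Hypothetical scene list; adjust to your own data layout.
scenes = [Path("data/garden").resolve(), Path("data/room").resolve()]

for scene in scenes:
    images = sorted(str(p) for p in (scene / "images").glob("*"))  # same files as the shell glob above
    out_dir = scene / "clip_feat"
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["python", "demo/demo_lazy.py", "-i", *images, "--feat-out", str(out_dir)],
        cwd="../APE",   # run from the APE repository, as in the command above
        check=True,
    )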

After preparing the feature maps, your scene folder should look like this:

scene_path
├── clip_feat/
├── images/
├── sparse/
└── ...

🚋 Training

Our method reconstructs a 3D semantic field on top of a pre-trained 3DGS scene. To begin, run the training script provided in the original 3DGS project. Once training is complete, rename the output folder from the 3DGS training (e.g., iteration_30000) to iteration_1.
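
For example, assuming the standard 3DGS output layout where checkpoints are stored under <model path>/point_cloud/iteration_<N>, the rename could be scripted like this (the model path is hypothetical):

from pathlib import Path

model_path = Path("output/garden")                 # hypothetical pre-trained 3DGS scene
src = model_path / "point_cloud" / "iteration_30000"
dst = model_path / "point_cloud" / "iteration_1"
if src.exists() and not dst.exists():
    src.rename(dst)                                # reuse the pre-trained Gaussians as iteration_1
    print(f"renamed {src} -> {dst}")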

Next, to reconstruct the 3D semantic field, run the following command. Be sure to use the -m option to specify the path to the pre-trained scene.

python train.py -s <scene path> -m <model path> -i <alternative image path>

Ensure that the resolution of the feature maps matches the resolution of the RGB images. For example, if you're using images from the images_4 folder to extract semantic features, use the -i images_4 option in the train.py script.
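
As a rough sanity check before training, you can compare the stored feature-map resolution against the RGB images. The snippet below assumes the features were saved as .npy arrays named after the images with shape (H, W, C); this may differ from your APE export, so adjust accordingly.

from pathlib import Path
import numpy as np
from PIL import Image

scene = Path("data/garden")                        # hypothetical scene path
img_dir, feat_dir = scene / "images_4", scene / "clip_feat"

for img_path in sorted(img_dir.glob("*")):
    feat_path = feat_dir / (img_path.stem + ".npy")  # assumed naming and format
    if not feat_path.exists():
        print("missing feature map for", img_path.name)
        continue
    w, h = Image.open(img_path).size
    feat = np.load(feat_path)                        # expected (H, W, C) here
    if feat.shape[:2] != (h, w):
        print(f"{img_path.name}: image {h}x{w} vs features {feat.shape[:2]}")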

For detailed usage instructions for train.py, please refer to the 3DGS documentation.

👀 Visualization

After completing the reconstruction, you can visualize the results using our GUI.

First, download the language model of APE from here, and place it in the models folder in the root directory.

Run the following command to use the updated GUI:

python gui/main.py --config gui/configs/default.yaml
  • A few additional models will be automatically downloaded the first time you run the script.

  • You can download our pre-trained scenes for evaluation.

  • Please refer to the documentation and the configuration file for guidance on using the new GUI and performing evaluation.

Evaluation

Since our work utilizes a 2D RES model, you'll need to use the GUI to query objects and save the segmentation masks. For further guidance, you can refer to the GUI documentation.

Once you have obtained the segmentation for a scene or the entire evaluation set, you can run the evaluation code.

python eval_seg.py --eval_root <eval dataset path> --saving_root <mask saving path> --dataset <'m360' or 'replica'> --scene_list [scenes to eval]

Depending on where you save the masks, you may need to adjust the format of their paths in the code.
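
The evaluation compares the saved masks against ground truth. For reference, a generic mask-IoU computation (not necessarily the exact implementation inside eval_seg.py) might look like this, with hypothetical file paths:

import numpy as np
from PIL import Image

def mask_iou(pred_path, gt_path, threshold=128):
    """Intersection-over-union between two binary segmentation masks."""
    pred = np.array(Image.open(pred_path).convert("L")) > threshold
    gt = np.array(Image.open(gt_path).convert("L")) > threshold
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

# Example (hypothetical paths):
# print(mask_iou("masks/garden/flower.png", "eval/garden/gt/flower.png"))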

Citation

@article{goi2024,
    title={GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane},
    author={Qu, Yansong and Dai, Shaohui and Li, Xinyang and Lin, Jianghang and Cao, Liujuan and Zhang, Shengchuan and Ji, Rongrong},
    journal={arXiv preprint arXiv:2405.17596},
    year={2024}
}

License

Licensed under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International)

The code is released for academic research use only.

If you have any questions, please contact me via [email protected].
