Official implementation of the paper "GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane" (ACM MM 2024).

🆒 GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane

Paper PDF | Project Page | MipNeRF360-OV | Replica

😊 TL;DR

GOI locates 3D Gaussians of interest as directed by open-vocabulary prompts.

⭐ Key components of GOI:

  • A Trainable Feature Clustering Codebook that efficiently condenses noisy, high-dimensional semantic features into compact, low-dimensional vectors, ensuring well-defined segmentation boundaries.
  • Finetuning of the Semantic-space Hyperplane, initialized from the text query embedding, to better locate the target region (a minimal sketch of this query step follows this list).
  • A proposed open-vocabulary evaluation dataset, MipNeRF360-OV.
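
To make the query step concrete, below is a minimal, hypothetical sketch of how a semantic-space hyperplane can select Gaussians. Every name and shape here is illustrative only and is not the repository's actual API.

import torch
import torch.nn.functional as F

def select_gaussians(low_dim_feats, hyperplane_normal, bias=0.0):
    """Boolean mask of Gaussians on the positive side of the hyperplane."""
    feats = F.normalize(low_dim_feats, dim=-1)        # (N, D) codebook features, one per Gaussian
    normal = F.normalize(hyperplane_normal, dim=-1)   # (D,) initialized from the text query embedding
    scores = feats @ normal + bias                    # signed distance to the hyperplane
    return scores > 0                                 # Gaussians of interest

The hyperplane normal starts as the text embedding and can then be finetuned (together with the bias) against a 2D segmentation signal, which is the "optimizable" part of the method.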

🔥 News:

  • We have updated the GUI and released the evaluation code.
  • We provide the processed Replica dataset used in our paper, which is derived from the Semantic-NeRF version.

📖 Open-vocabulary Query Results

❗ You can precisely locate 3D Gaussians of Interest with an open-vocabulary text prompt

teaser.mp4

Visit our Project Page for more results.

🔧 Installation

  • Clone this repo:
git clone https://github.com/Quyans/GOI-Hyperplane.git
cd GOI-Hyperplane
  • Set up a new conda environment:
conda env create --file environment.yml
conda activate goi
  • If you encounter any problems during the pip installation step, try the following commands in the goi environment:
conda activate goi
pip install submodules/diff-gaussian-rasterization submodules/simple-knn
pip install trimesh kiui pymeshlab rembg open3d scipy dearpygui omegaconf open_clip_torch transformations transformers==4.38.1 yapf pycocotools
pip install clip@git+https://github.com/openai/CLIP.git

📚 Data Preparation

We use datasets in the COLMAP format. For your own dataset, you can use the convert.py script. Refer to 3DGS for specific usage.

In addition to RGB data, we also use pixel-aligned semantic feature maps. Specifically, we use APE as our vision-language model to extract semantic features from images.

First, install our modified APE repository. Then, run the following command to extract semantic features from all RGB images and save them in the clip_feat folder under the scene path:

cd ../APE
python demo/demo_lazy.py -i <scene_path>/images/* --feat-out <scene_path>/clip_feat/
  • Due to the high dimensionality of the pixel-aligned features encoded by APE, we recommend using lower-resolution (< 1.6K) images for encoding (e.g., the images_4 folder of the Mip360 dataset).
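
If you have several scenes to process, the extraction command can be wrapped in a small driver script. The sketch below only re-issues the command shown above; the scene paths are hypothetical, and it assumes demo_lazy.py accepts multiple image paths after -i, as the shell glob implies.

import subprocess
from pathlib import Path

# Hypothetical scene list; adjust to your own data layout.
scenes = [Path("data/garden").resolve(), Path("data/room").resolve()]

for scene in scenes:
    images = sorted(str(p) for p in (scene / "images").glob("*"))  # same files as the shell glob above
    out_dir = scene / "clip_feat"
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["python", "demo/demo_lazy.py", "-i", *images, "--feat-out", str(out_dir)],
        cwd="../APE",   # run from the APE repository, as in the command above
        check=True,
    )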

After preparing the feature maps, your scene folder should look like this:

scene_path
├── clip_feat/
├── images/
├── sparse/
└── ...

🚋 Training

Our method reconstructs a 3D semantic field on top of a pre-trained 3DGS scene. To begin, run the training script provided in the original 3DGS project. Once training is complete, rename the output folder from the 3DGS training (e.g., iteration_30000) to iteration_1.
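
For example, assuming the standard 3DGS output layout where checkpoints are stored under <model path>/point_cloud/iteration_<N>, the rename could be scripted like this (the model path is hypothetical):

from pathlib import Path

model_path = Path("output/garden")                 # hypothetical pre-trained 3DGS scene
src = model_path / "point_cloud" / "iteration_30000"
dst = model_path / "point_cloud" / "iteration_1"
if src.exists() and not dst.exists():
    src.rename(dst)                                # reuse the pre-trained Gaussians as iteration_1
    print(f"renamed {src} -> {dst}")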

Next, to reconstruct the 3D semantic field, run the following command. Be sure to use the -m option to specify the path to the pre-trained scene.

python train.py -s <scene path> -m <model path> -i <alternative image path>

Ensure that the resolution of the feature maps matches the resolution of the RGB images. For example, if you're using images from the images_4 folder to extract semantic features, use the -i images_4 option in the train.py script.
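
As a rough sanity check before training, you can compare the stored feature-map resolution against the RGB images. The snippet below assumes the features were saved as .npy arrays named after the images with shape (H, W, C); this may differ from your APE export, so adjust accordingly.

from pathlib import Path
import numpy as np
from PIL import Image

scene = Path("data/garden")                        # hypothetical scene path
img_dir, feat_dir = scene / "images_4", scene / "clip_feat"

for img_path in sorted(img_dir.glob("*")):
    feat_path = feat_dir / (img_path.stem + ".npy")  # assumed naming and format
    if not feat_path.exists():
        print("missing feature map for", img_path.name)
        continue
    w, h = Image.open(img_path).size
    feat = np.load(feat_path)                        # expected (H, W, C) here
    if feat.shape[:2] != (h, w):
        print(f"{img_path.name}: image {h}x{w} vs features {feat.shape[:2]}")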

For detailed usage instructions for train.py, please refer to the 3DGS documentation.

👀 Visualization

After completing the reconstruction, you can visualize the results using our GUI.

First, download the language model of APE from here, and place it in the models folder in the root directory.

Run the following command to use the updated GUI:

python gui/main.py --config gui/configs/default.yaml
  • A few additional models will be automatically downloaded the first time you run the script.

  • You can download our pre-trained scenes for evaluation.

  • Please refer to the documentation and the configuration file for guidance on using the new GUI and performing evaluation.

Evaluation

Since our work utilizes a 2D RES model, you'll need to use the GUI to query objects and save the segmentation masks. For further guidance, you can refer to the GUI documentation.

Once you have obtained the segmentation for a scene or the entire evaluation set, you can run the evaluation code.

python eval_seg.py --eval_root <eval dataset path> --saving_root <mask saving path> --dataset <'m360' or 'replica'> --scene_list [scenes to eval]

Depending on where you save the masks, you may need to adjust the format of their paths in the code.
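
The evaluation compares the saved masks against ground truth. For reference, a generic mask-IoU computation (not necessarily the exact implementation inside eval_seg.py) might look like this, with hypothetical file paths:

import numpy as np
from PIL import Image

def mask_iou(pred_path, gt_path, threshold=128):
    """Intersection-over-union between two binary segmentation masks."""
    pred = np.array(Image.open(pred_path).convert("L")) > threshold
    gt = np.array(Image.open(gt_path).convert("L")) > threshold
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 1.0

# Example (hypothetical paths):
# print(mask_iou("masks/garden/flower.png", "eval/garden/gt/flower.png"))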

Citation

@article{goi2024,
    title={GOI: Find 3D Gaussians of Interest with an Optimizable Open-vocabulary Semantic-space Hyperplane},
    author={Qu, Yansong and Dai, Shaohui and Li, Xinyang and Lin, Jianghang and Cao, Liujuan and Zhang, Shengchuan and Ji, Rongrong},
    journal={arXiv preprint arXiv:2405.17596},
    year={2024}
}

License

Licensed under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International)

The code is released for academic research use only.

If you have any questions, please contact me via [email protected].
