A Large-scale and Diverse Benchmark for Text-based 3D Model Retrieval

yuanze1024/LD-T3D

LD-T3D: A Large-scale and Diverse Benchmark for Text-based 3D Model Retrieval

Benchmark

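| Method | Features | All mAP | All mNDCG | All mFT | All mST | Easy mAP | Easy mFT | Medium mAP | Medium mFT | Hard mAP | Hard mFT |
|---|---|---|---|---|---|---|---|---|---|---|---|
| E5 | Text | 64.5 | 82.1 | 58.4 | 79.6 | 69.4 | 64.9 | 62.4 | 55.9 | 61.7 | 54.4 |
| AnglE | Text | 66.8 | 83.4 | 60.8 | 81.1 | 71.3 | 66.8 | 65.7 | 59.0 | 63.3 | 56.6 |
| CLIP | Text | 59.5 | 79.7 | 53.6 | 74.2 | 59.6 | 54.8 | 59.1 | 52.3 | 59.9 | 53.8 |
| CLIP | Image | 66.1 | 82.4 | 59.8 | 79.9 | 73.8 | 68.2 | 66.1 | 59.1 | 58.2 | 51.7 |
| CLIP | Text & Image | 69.7 | 85.1 | 63.0 | 83.2 | 74.1 | 68.3 | 69.4 | 62.7 | 65.5 | 57.6 |
| BLIP2 | Text | 56.5 | 77.3 | 50.3 | 71.8 | 58.4 | 52.4 | 53.7 | 47.6 | 57.8 | 51.3 |
| BLIP2 | Image | 68.6 | 84.1 | 62.5 | 81.7 | 74.5 | 69.4 | 68.2 | 61.9 | 63.1 | 56.1 |
| BLIP2 | Text & Image | 70.0 | 84.9 | 63.6 | 83.2 | 75.0 | 69.6 | 69.1 | 62.4 | 66.0 | 58.7 |
| Openshape | 3D | 51.9 | 73.1 | 46.5 | 67.0 | 63.6 | 58.8 | 50.8 | 45.8 | 40.9 | 34.5 |
| Openshape | 3D & Image | 70.2 | 85.1 | 63.7 | 82.6 | 76.9 | 71.5 | 70.0 | 62.7 | 63.5 | 56.7 |
| Openshape | 3D & Image & Text | 74.3 | 87.8 | 67.0 | 86.1 | 78.4 | 72.4 | 74.5 | 66.7 | 69.9 | 61.6 |
| Uni3D | 3D | 66.8 | 82.5 | 60.5 | 80.3 | 76.8 | 72.0 | 64.5 | 58.3 | 59.0 | 51.0 |
| Uni3D | 3D & Image | 75.0 | 87.9 | 68.3 | 86.8 | 81.0 | 75.7 | 74.4 | 67.5 | 69.6 | 61.8 |
| Uni3D | 3D & Image & Text | 77.1 | 89.3 | 70.0 | 88.3 | 81.4 | 75.8 | 76.8 | 69.1 | 73.0 | 65.1 |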
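
The README does not define the metric abbreviations. The sketch below assumes they follow the usual retrieval conventions: AP is average precision, NDCG is normalized discounted cumulative gain, FT (First Tier) is recall within the top R results, and ST (Second Tier) is recall within the top 2R results, where R is the number of relevant models for a query; the leading "m" would then denote the mean over all queries. The function names are illustrative and are not taken from the repository's eval.py, which may compute these values differently.

```python
import math


def average_precision(ranked_relevance):
    """AP for one query, given 0/1 relevance flags in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / hits if hits else 0.0


def tier_recall(ranked_relevance, multiplier):
    """Recall within the top (multiplier * R) results; 1 = FT, 2 = ST."""
    total = sum(ranked_relevance)
    if total == 0:
        return 0.0
    cutoff = multiplier * total
    return sum(ranked_relevance[:cutoff]) / total


def ndcg(ranked_relevance):
    """NDCG with binary gains and a log2 rank discount."""
    def dcg(flags):
        return sum(rel / math.log2(rank + 1)
                   for rank, rel in enumerate(flags, start=1))

    ideal = dcg(sorted(ranked_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal else 0.0


# Toy ranking for one query: relevant models at ranks 1 and 3.
ranking = [1, 0, 1, 0, 0]
ap = average_precision(ranking)
ft = tier_recall(ranking, 1)
st = tier_recall(ranking, 2)
gain = ndcg(ranking)
```

Averaging each value over all queries, and scaling by 100, would give numbers on the same scale as the table, under the assumptions stated above.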

Online Demo

All our experiments are conducted on an A100 GPU. You can try our online demo to visualize Uni3D's retrieval results.

Installation

[Note]: The installation steps are not necessary if you only want the dataset, which you can use directly from HF Dataset. The Docker image is fairly large because it bundles the requirements of every retrieval method in our benchmark. If you only want to try one of those methods, refer to its official code repo.

All in One

We provide a prebuilt Docker image, yuanze1024/LD-T3D.

Pull it with:

docker pull yuanze1024/ld-t3d:v1

From Scratch

If pulling our image fails, you can build it from the Dockerfile:

git clone https://github.com/yuanze1024/LD-T3D.git
cd LD-T3D
docker build -t ld-t3d .

[Note]: Before building, set TORCH_CUDA_ARCH_LIST in the Dockerfile to match your GPU's compute capability, e.g., 8.0 for A100 and 8.6 for 3090.

Config

Set your Hugging Face cache_dir in config/config.yaml. [Note]: Make sure general.cache_dir is set correctly: it must point to the directory where the downloaded pretrained checkpoints are stored.

Evaluation

Each method's checkpoints are downloaded automatically the first time you use that method.

# E5
python eval.py --option e5 --cross_modal text --batch_size 1024
# AnglE
python eval.py --option angle --cross_modal text --batch_size 1024
# CLIP
python eval.py --option clip --cross_modal image --angles diag_above --batch_size 256
# BLIP2
python eval.py --option blip2 --cross_modal image --angles diag_above --batch_size 256
# Openshape
python eval.py --option Openshape --cross_modal text_image_3D --op add --angles diag_above --batch_size 256
# Uni3D
python eval.py --option Uni3D --cross_modal text_image_3D --op add --angles diag_above --batch_size 256
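
To run several of these evaluations in sequence, the commands can be assembled from a small table. The sketch below only rebuilds the six command lines listed above; the flag values are copied verbatim, while the names RUNS and build_command are illustrative and not part of the repository.

```python
import shlex

# Flags copied verbatim from the commands listed above.
RUNS = {
    "e5": ["--cross_modal", "text", "--batch_size", "1024"],
    "angle": ["--cross_modal", "text", "--batch_size", "1024"],
    "clip": ["--cross_modal", "image", "--angles", "diag_above",
             "--batch_size", "256"],
    "blip2": ["--cross_modal", "image", "--angles", "diag_above",
              "--batch_size", "256"],
    "Openshape": ["--cross_modal", "text_image_3D", "--op", "add",
                  "--angles", "diag_above", "--batch_size", "256"],
    "Uni3D": ["--cross_modal", "text_image_3D", "--op", "add",
              "--angles", "diag_above", "--batch_size", "256"],
}


def build_command(option):
    """Return the argument list for one evaluation run."""
    return ["python", "eval.py", "--option", option, *RUNS[option]]


if __name__ == "__main__":
    # Print the commands; to execute them, pass each list to
    # subprocess.run(cmd, check=True) from the repository root.
    for option in RUNS:
        print(shlex.join(build_command(option)))
```

Printing rather than executing keeps the sketch safe to run outside the Docker image.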

Eval Custom Method

Note that we only support dual-stream architectures for now, which means the embeddings of the queries and of the multimodal features must be encoded separately.
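
As a rough illustration of the dual-stream constraint, the sketch below encodes queries and candidate models with two independent functions and compares them only afterwards, by cosine similarity. The toy encoders, the vocabulary, and all function names are assumptions made for this example; they stand in for real text and multimodal encoders and do not reflect the repository's code.

```python
import math

# Toy shared embedding space: one dimension per vocabulary word.
VOCAB = ["chair", "wooden", "car", "red"]


def encode_query(text):
    """Stand-in query encoder: sees only the query text."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


def encode_model(tags):
    """Stand-in multimodal encoder: sees only the candidate model."""
    return [float(tags.count(w)) for w in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(query_embedding, candidate_embeddings):
    """Return candidate indices sorted by descending similarity."""
    scores = [cosine(query_embedding, c) for c in candidate_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Candidate embeddings are computed once, without access to any query.
candidates = [["wooden", "chair"], ["red", "car"], ["red", "chair"]]
candidate_embeddings = [encode_model(tags) for tags in candidates]

order = rank_candidates(encode_query("red car"), candidate_embeddings)
```

Because neither encoder sees the other's input, candidate embeddings can be cached and reused across queries; a model that scores a query and a candidate jointly would not fit this pattern.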

You can refer to the encoders in feature_extractors and implement your own method by inheriting from the base class FeatureExtractor in feature_extractors/__init__.py. If you want to use the image modality, you also need to implement a get_img_transform function.
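
The shape of such a subclass might look like the sketch below. The real FeatureExtractor interface is not shown in this README, so the base class here is a hypothetical stand-in, and the method names encode_query and encode_image, as well as the signature of get_img_transform, are assumptions; check feature_extractors/__init__.py for the actual abstract methods before adapting it.

```python
from abc import ABC, abstractmethod


class BaseExtractorStandIn(ABC):
    """Hypothetical stand-in for FeatureExtractor; the real one may differ."""

    @abstractmethod
    def encode_query(self, queries):
        """Return one embedding per text query."""

    @abstractmethod
    def encode_image(self, images):
        """Return one embedding per preprocessed image."""


class MeanColorExtractor(BaseExtractorStandIn):
    """Toy method: embeds text and images as RGB triples."""

    COLORS = {
        "red": [1.0, 0.0, 0.0],
        "green": [0.0, 1.0, 0.0],
        "blue": [0.0, 0.0, 1.0],
    }

    def get_img_transform(self):
        # Assumed role: return a callable applied to each raw image
        # before encoding. Here it scales 0-255 pixels to 0-1.
        def transform(pixels):
            return [[channel / 255.0 for channel in pixel] for pixel in pixels]
        return transform

    def encode_query(self, queries):
        return [self.COLORS.get(q.lower(), [0.0, 0.0, 0.0]) for q in queries]

    def encode_image(self, images):
        embeddings = []
        for pixels in images:
            count = len(pixels)
            embeddings.append(
                [sum(p[c] for p in pixels) / count for c in range(3)]
            )
        return embeddings


extractor = MeanColorExtractor()
transform = extractor.get_img_transform()
image = transform([(255, 0, 0), (255, 0, 0)])  # a tiny all-red "image"
query_embedding = extractor.encode_query(["red"])[0]
image_embedding = extractor.encode_image([image])[0]
```

The query and image paths never call each other, which keeps the toy method within the dual-stream requirement.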

Citation

not published yet
