# LD-T3D: A Large-scale and Diverse Benchmark for Text-based 3D Model Retrieval

## Benchmark

| Method | Features | mAP (All) | mNDCG (All) | mFT (All) | mST (All) | mAP (Easy) | mFT (Easy) | mAP (Medium) | mFT (Medium) | mAP (Hard) | mFT (Hard) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| E5 | Text | 64.5 | 82.1 | 58.4 | 79.6 | 69.4 | 64.9 | 62.4 | 55.9 | 61.7 | 54.4 |
| AnglE | Text | 66.8 | 83.4 | 60.8 | 81.1 | 71.3 | 66.8 | 65.7 | 59.0 | 63.3 | 56.6 |
| CLIP | Text | 59.5 | 79.7 | 53.6 | 74.2 | 59.6 | 54.8 | 59.1 | 52.3 | 59.9 | 53.8 |
| CLIP | Image | 66.1 | 82.4 | 59.8 | 79.9 | 73.8 | 68.2 | 66.1 | 59.1 | 58.2 | 51.7 |
| CLIP | Text & Image | 69.7 | 85.1 | 63.0 | 83.2 | 74.1 | 68.3 | 69.4 | 62.7 | 65.5 | 57.6 |
| BLIP2 | Text | 56.5 | 77.3 | 50.3 | 71.8 | 58.4 | 52.4 | 53.7 | 47.6 | 57.8 | 51.3 |
| BLIP2 | Image | 68.6 | 84.1 | 62.5 | 81.7 | 74.5 | 69.4 | 68.2 | 61.9 | 63.1 | 56.1 |
| BLIP2 | Text & Image | 70.0 | 84.9 | 63.6 | 83.2 | 75.0 | 69.6 | 69.1 | 62.4 | 66.0 | 58.7 |
| Openshape | 3D | 51.9 | 73.1 | 46.5 | 67.0 | 63.6 | 58.8 | 50.8 | 45.8 | 40.9 | 34.5 |
| Openshape | 3D & Image | 70.2 | 85.1 | 63.7 | 82.6 | 76.9 | 71.5 | 70.0 | 62.7 | 63.5 | 56.7 |
| Openshape | 3D & Image & Text | 74.3 | 87.8 | 67.0 | 86.1 | 78.4 | 72.4 | 74.5 | 66.7 | 69.9 | 61.6 |
| Uni3D | 3D | 66.8 | 82.5 | 60.5 | 80.3 | 76.8 | 72.0 | 64.5 | 58.3 | 59.0 | 51.0 |
| Uni3D | 3D & Image | 75.0 | 87.9 | 68.3 | 86.8 | 81.0 | 75.7 | 74.4 | 67.5 | 69.6 | 61.8 |
| Uni3D | 3D & Image & Text | 77.1 | 89.3 | 70.0 | 88.3 | 81.4 | 75.8 | 76.8 | 69.1 | 73.0 | 65.1 |

## Online Demo

All our experiments are conducted on an NVIDIA A100 GPU. You can try our online demo to visualize Uni3D's retrieval results.

## Installation

**[Note]:** The installation steps are not necessary for using our dataset, which you can easily access via HF Datasets. The Docker image may be quite heavy because it includes the requirements of all the retrieval methods mentioned in our benchmark. If you only want to try one of those methods, you can refer to its official code repo instead.

### All in One

We provide a pre-built image, `yuanze1024/LD-T3D`. You can pull it with:

```bash
docker pull yuanze1024/ld-t3d:v1
```
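
For example, you can start an interactive container with GPU access as follows (a minimal sketch; the container name and mount path are illustrative, not part of the image):

```bash
# Hypothetical invocation: adjust the mounted path to wherever you cloned the repo.
docker run --gpus all -it --name ld-t3d \
    -v /path/to/LD-T3D:/workspace \
    yuanze1024/ld-t3d:v1 /bin/bash
```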

### From Scratch

If you fail to pull our image, you can build it from the Dockerfile:

```bash
git clone https://github.com/yuanze1024/LD-T3D.git
cd LD-T3D
docker build -t ld-t3d .
```

**[Note]:** Change `TORCH_CUDA_ARCH_LIST` in the Dockerfile to match your GPU's compute capability before compiling, e.g., `8.0` for an A100 and `8.6` for a 3090.
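
For instance, assuming the variable is set via an `ENV` line (the exact line in the Dockerfile may differ), building for a 3090 would look like:

```dockerfile
# Assumed form of the Dockerfile line; edit the value to match your GPU.
ENV TORCH_CUDA_ARCH_LIST="8.6"
```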

## Config

Set your Hugging Face cache_dir in `config/config.yaml`. **[Note]:** Make sure `general.cache_dir` is set correctly; it is the directory where the downloaded pretrained checkpoints are stored.
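
A minimal sketch of the relevant part of `config/config.yaml` (only the `general.cache_dir` key is named here; the real file may contain other keys):

```yaml
general:
  # Directory where pretrained checkpoints are downloaded and cached.
  cache_dir: /path/to/your/hf_cache
```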

## Evaluation

Each method's checkpoints will be downloaded automatically the first time you use that method.

```bash
# E5
python eval.py --option e5 --cross_modal text --batch_size 1024
# AnglE
python eval.py --option angle --cross_modal text --batch_size 1024
# CLIP
python eval.py --option clip --cross_modal image --angles diag_above --batch_size 256
# BLIP2
python eval.py --option blip2 --cross_modal image --angles diag_above --batch_size 256
# Openshape
python eval.py --option Openshape --cross_modal text_image_3D --op add --angles diag_above --batch_size 256
# Uni3D
python eval.py --option Uni3D --cross_modal text_image_3D --op add --angles diag_above --batch_size 256
```

### Eval Custom Method

Note that we only support dual-stream architectures for now, which means the embeddings of the queries and of the multimodal features must be encoded separately.

You can refer to the encoders in `feature_extractors` and implement your own method by inheriting the base class `FeatureExtractor` in `feature_extractors/__init__.py`. If you want to use the image modality, you also need to implement a `get_img_transform` function. A minimal sketch is shown below.
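
Here is a sketch of a custom extractor, assuming the base class exposes separate encoding hooks for queries and for multimodal features (the method names `encode_query` and `encode_3D` are illustrative; check `feature_extractors/__init__.py` for the actual abstract interface):

```python
from torchvision import transforms

from feature_extractors import FeatureExtractor

class MyExtractor(FeatureExtractor):
    """Hypothetical dual-stream extractor: queries and multimodal
    features are encoded independently, as the benchmark requires."""

    def __init__(self, cache_dir):
        # Load your pretrained text and 3D encoders here (placeholders).
        self.text_encoder = ...
        self.shape_encoder = ...

    def encode_query(self, queries):
        # Illustrative: embed a batch of text queries into an (N, D) tensor.
        return self.text_encoder(queries)

    def encode_3D(self, point_clouds):
        # Illustrative: embed a batch of point clouds into an (N, D) tensor.
        return self.shape_encoder(point_clouds)

    def get_img_transform(self):
        # Required only if the image modality is used: preprocessing
        # applied to rendered views before encoding.
        return transforms.Compose([
            transforms.Resize(224),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
        ])
```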

## Citation

Not published yet.