# LD-T3D: A Large-scale and Diverse Benchmark for Text-based 3D Model Retrieval

## Benchmark

| Method | Features | mAP (All) | mNDCG (All) | mFT (All) | mST (All) | mAP (Easy) | mFT (Easy) | mAP (Medium) | mFT (Medium) | mAP (Hard) | mFT (Hard) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| E5 | Text | 64.5 | 82.1 | 58.4 | 79.6 | 69.4 | 64.9 | 62.4 | 55.9 | 61.7 | 54.4 |
| AnglE | Text | 66.8 | 83.4 | 60.8 | 81.1 | 71.3 | 66.8 | 65.7 | 59.0 | 63.3 | 56.6 |
| CLIP | Text | 59.5 | 79.7 | 53.6 | 74.2 | 59.6 | 54.8 | 59.1 | 52.3 | 59.9 | 53.8 |
| CLIP | Image | 66.1 | 82.4 | 59.8 | 79.9 | 73.8 | 68.2 | 66.1 | 59.1 | 58.2 | 51.7 |
| CLIP | Text & Image | 69.7 | 85.1 | 63.0 | 83.2 | 74.1 | 68.3 | 69.4 | 62.7 | 65.5 | 57.6 |
| BLIP2 | Text | 56.5 | 77.3 | 50.3 | 71.8 | 58.4 | 52.4 | 53.7 | 47.6 | 57.8 | 51.3 |
| BLIP2 | Image | 68.6 | 84.1 | 62.5 | 81.7 | 74.5 | 69.4 | 68.2 | 61.9 | 63.1 | 56.1 |
| BLIP2 | Text & Image | 70.0 | 84.9 | 63.6 | 83.2 | 75.0 | 69.6 | 69.1 | 62.4 | 66.0 | 58.7 |
| Openshape | 3D | 51.9 | 73.1 | 46.5 | 67.0 | 63.6 | 58.8 | 50.8 | 45.8 | 40.9 | 34.5 |
| Openshape | 3D & Image | 70.2 | 85.1 | 63.7 | 82.6 | 76.9 | 71.5 | 70.0 | 62.7 | 63.5 | 56.7 |
| Openshape | 3D & Image & Text | 74.3 | 87.8 | 67.0 | 86.1 | 78.4 | 72.4 | 74.5 | 66.7 | 69.9 | 61.6 |
| Uni3D | 3D | 66.8 | 82.5 | 60.5 | 80.3 | 76.8 | 72.0 | 64.5 | 58.3 | 59.0 | 51.0 |
| Uni3D | 3D & Image | 75.0 | 87.9 | 68.3 | 86.8 | 81.0 | 75.7 | 74.4 | 67.5 | 69.6 | 61.8 |
| Uni3D | 3D & Image & Text | 77.1 | 89.3 | 70.0 | 88.3 | 81.4 | 75.8 | 76.8 | 69.1 | 73.0 | 65.1 |

## Online Demo

All our experiments are conducted on an NVIDIA A100 GPU. You can try our online demo to visualize Uni3D's retrieval results.

## Installation

**[Note]:** The installation steps are not necessary for using our dataset, which you can easily access via HF Datasets. The Docker image may be quite heavy because it includes the requirements of all the retrieval methods mentioned in our benchmark. If you only want to try one of those methods, you can refer to its official code repo instead.

### All in One

We provide a pre-built image, `yuanze1024/LD-T3D`. You can pull it with:

```bash
docker pull yuanze1024/ld-t3d:v1
```
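
For example, you can start an interactive container with GPU access as follows (a minimal sketch; the container name and mount path are illustrative, not part of the image):

```bash
# Hypothetical invocation: adjust the mounted path to wherever you cloned the repo.
docker run --gpus all -it --name ld-t3d \
    -v /path/to/LD-T3D:/workspace \
    yuanze1024/ld-t3d:v1 /bin/bash
```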

### From Scratch

If you fail to pull our image, you can build it from the Dockerfile:

```bash
git clone https://github.com/yuanze1024/LD-T3D.git
cd LD-T3D
docker build -t ld-t3d .
```

**[Note]:** Change `TORCH_CUDA_ARCH_LIST` in the Dockerfile to match your GPU's compute capability before compiling, e.g., `8.0` for an A100 and `8.6` for a 3090.
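
For instance, assuming the variable is set via an `ENV` line (the exact line in the Dockerfile may differ), building for a 3090 would look like:

```dockerfile
# Assumed form of the Dockerfile line; edit the value to match your GPU.
ENV TORCH_CUDA_ARCH_LIST="8.6"
```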

## Config

Set your Hugging Face cache_dir in `config/config.yaml`. **[Note]:** Make sure `general.cache_dir` is set correctly; it is the directory where the downloaded pretrained checkpoints are stored.
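
A minimal sketch of the relevant part of `config/config.yaml` (only the `general.cache_dir` key is named here; the real file may contain other keys):

```yaml
general:
  # Directory where pretrained checkpoints are downloaded and cached.
  cache_dir: /path/to/your/hf_cache
```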

## Evaluation

Each method's checkpoints will be downloaded automatically the first time you use that method.

```bash
# E5
python eval.py --option e5 --cross_modal text --batch_size 1024
# AnglE
python eval.py --option angle --cross_modal text --batch_size 1024
# CLIP
python eval.py --option clip --cross_modal image --angles diag_above --batch_size 256
# BLIP2
python eval.py --option blip2 --cross_modal image --angles diag_above --batch_size 256
# Openshape
python eval.py --option Openshape --cross_modal text_image_3D --op add --angles diag_above --batch_size 256
# Uni3D
python eval.py --option Uni3D --cross_modal text_image_3D --op add --angles diag_above --batch_size 256
```

### Eval Custom Method

Note that we only support dual-stream architectures for now, which means the embeddings of the queries and of the multimodal features must be encoded separately.

You can refer to the encoders in `feature_extractors` and implement your own method by inheriting the base class `FeatureExtractor` in `feature_extractors/__init__.py`. If you want to use the image modality, you also need to implement a `get_img_transform` function. A minimal sketch is shown below.
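
Here is a sketch of a custom extractor, assuming the base class exposes separate encoding hooks for queries and for multimodal features (the method names `encode_query` and `encode_3D` are illustrative; check `feature_extractors/__init__.py` for the actual abstract interface):

```python
from torchvision import transforms

from feature_extractors import FeatureExtractor

class MyExtractor(FeatureExtractor):
    """Hypothetical dual-stream extractor: queries and multimodal
    features are encoded independently, as the benchmark requires."""

    def __init__(self, cache_dir):
        # Load your pretrained text and 3D encoders here (placeholders).
        self.text_encoder = ...
        self.shape_encoder = ...

    def encode_query(self, queries):
        # Illustrative: embed a batch of text queries into an (N, D) tensor.
        return self.text_encoder(queries)

    def encode_3D(self, point_clouds):
        # Illustrative: embed a batch of point clouds into an (N, D) tensor.
        return self.shape_encoder(point_clouds)

    def get_img_transform(self):
        # Required only if the image modality is used: preprocessing
        # applied to rendered views before encoding.
        return transforms.Compose([
            transforms.Resize(224),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
        ])
```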

## Citation

Not published yet.