We're releasing 4 new variants of DreamSim! These new checkpoints are:
- DINOv2 B/14 and SynCLR B/16 as backbones
- DINOv2 B/14 and DINO B/16 trained with the original contrastive loss on both CLS and dense features.
These models (and the originals) are further evaluated in our new NeurIPS 2024 paper, *When Does Perceptual Alignment Benefit Vision Representations?*
We find that our perceptually-aligned representations outperform the baseline models on a variety of standard downstream computer vision tasks, including semantic segmentation, depth estimation, object counting, instance retrieval, and retrieval-augmented generation. These results point towards perceptual alignment as a useful objective for learning general-purpose vision representations. See the paper and our blog post for more details.
Here's how they perform on NIGHTS:
| Model | NIGHTS - Val | NIGHTS - Test |
|---|---|---|
| ensemble | 96.9% | 96.2% |
| dino_vitb16 | 95.6% | 94.8% |
| open_clip_vitb32 | 95.6% | 95.3% |
| clip_vitb32 | 94.9% | 93.6% |
| dinov2_vitb14 | 94.9% | 95.0% |
| synclr_vitb16 | 96.0% | 95.9% |
| dino_vitb16 (patch) | 94.9% | 94.8% |
| dinov2_vitb14 (patch) | 95.5% | 95.1% |
Additionally, we fixed a bug in embedding normalization. This shouldn't significantly affect model performance, but it may explain very minor changes in pipelines that use DreamSim with `normalize_embeds=True`.
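For context, embedding normalization here refers to L2-normalizing feature vectors to unit length before comparing them. A minimal sketch of that operation in plain Python (the helper name and epsilon handling are illustrative, not DreamSim's actual implementation):

```python
import math

def l2_normalize(embed, eps=1e-8):
    """Scale a feature vector to unit L2 norm.

    Hypothetical helper illustrating what an option like
    `normalize_embeds=True` generally does; not DreamSim's code.
    """
    norm = math.sqrt(sum(x * x for x in embed))
    return [x / (norm + eps) for x in embed]

# A [3, 4] vector has norm 5, so it normalizes to roughly [0.6, 0.8].
unit_vec = l2_normalize([3.0, 4.0])
```

After normalization, cosine similarity between two embeddings reduces to a plain dot product, which is why a subtle normalization bug can shift similarity scores slightly without changing model weights.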