The list will be continually updated. Stay tuned!
Object-centric learning (OCL) models decompose and reconstruct synthetic and real-world scenes by learning multiple disentangled abstract representations that capture object-centric concepts at multiple levels, in a fully unsupervised manner.
< Last updated: Aug/07/2023 >
Name | Source |
---|---|
Multi-Object Datasets | Link |
Name | Source |
---|---|
PASCAL VOC, COCO | Link |
CUB200 Birds, Stanford Dogs, Stanford Cars, and Caltech Flowers | Link |
YCB, ScanNet, and COCO | Link |
Year | Publication | Title | Source | Forum | Real-World? | Using VLMs? |
---|---|---|---|---|---|---|
2024 | arXiv | Representation Alignment for Generation: Training Diffusion Transformers is Easier Than You Think REPA: Aligning DINO and DiT features to potentially enable scalable object-centric learning. |
Code | ✅ |
Year | Publication | Title | Source | Forum | Real-World? | Using VLMs? |
---|---|---|---|---|---|---|
2023 | ICCV | Unsupervised Compositional Concepts Discovery with Text-to-Image Generative Models UCCD: Discovering a set of compositional concepts given a dataset of unlabeled images. |
Code | ✅ | ✅ | |
2023 | ICML | Composer: Creative and Controllable Image Synthesis with Composable Conditions Composer: Learning multiple concepts from a given real-world image and synthesizing a new one by altering and composing them. |
PMLR | ✅ | ✅ | |
2023 | ICML | Provably Learning Object-Centric Representations ProvablyOCL: Analyzing when object-centric representations can be learned without supervision and introducing two assumptions, compositionality and irreducibility, to prove that ground-truth object representations can be identified. |
Code | PMLR | ||
2023 | ICML | Unlocking Slot Attention by Changing Optimal Transport Costs MESH: A cross-attention module that combines the tiebreaking properties of unregularized optimal transport with the speed of regularized optimal transport. |
PMLR | |||
2023 | ICML | Slot-VAE: Object-Centric Scene Generation with Slot Attention SlotVAE: A generative model that integrates slot attention with the hierarchical VAE framework for object-centric structured scene generation. |
PMLR | |||
2023 | ICML | Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames InvSlotAttns: Incorporating equivariance to per-object pose transformations into the attention and generation mechanism of Slot Attention by translating, scaling, and rotating position encodings. |
Code | PMLR | ✅ | |
2023 | ICML | An Investigation into Pre-Training Object-Centric Representations for Reinforcement Learning OCRL: Examining critical aspects of incorporating object-centric representation pre-training in reinforcement learning, such as performance in visually complex environments and the selection of an appropriate pooling layer for aggregating object representations. |
Code | PMLR | ||
2023 | ICML | Discovering Object-Centric Generalized Value Functions From Pixels OCGVFs: Introducing a method that tries to discover meaningful features from objects, translating them to temporally coherent ‘question’ functions and leveraging the subsequent learned general value functions for control. |
PMLR | |||
2023 | UAI | Time-Conditioned Generative Modeling of Object-Centric Representations for Video Decomposition and Prediction TCGM-OCL: Introducing a time-conditioned generative model for videos. |
PMLR | |||
2023 | CVPR | Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models PHYCINE: A system that infers physical concepts in different abstract levels without supervision. |
CVF Proceedings | |||
2023 | CVPR | Object Discovery from Motion-Guided Tokens MoTok: Enabling the emergence of interpretable object-specific mid-level features, demonstrating the benefits of motion-guidance (no labeling) and quantization (interpretability, memory efficiency). |
Code | CVF Proceedings | ✅ | |
2023 | CVPR | Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning SLASH: Consisting of two simple-yet-effective modules on top of Slot Attention. |
Code | CVF Proceedings | ||
2023 | CVPR | Multi-Object Manipulation via Object-Centric Neural Scattering Functions OSFs-MOM: Combining object-centric neural scattering functions with inverse parameter estimation and graph-based neural dynamics models. |
Code | CVF Proceedings | ✅ | |
2023 | ICLR | Bridging the Gap to Real-World Object-Centric Learning DINOSAUR: Using slot attention with self-supervised DINO features to discover objects on real-world data. |
Code | OpenReview | ✅ | |
2023 | ICLR | Improving Object-centric Learning with Query Optimization BO-QSA: Extending slot attention, outperforming previous baselines on both synthetic and real images. |
Code | OpenReview | ✅ | |
2023 | ICLR | Learning to Reason over Visual Objects STSN: Combining slot attention (an object-centric encoding method) with a transformer reasoning module. |
OpenReview | |||
2023 | ICLR | Learning What and Where: Disentangling Location and Identity Tracking Without Supervision Loci: An unsupervised disentangled location and identity tracking system, which excels on the CATER and related object tracking challenges featuring emergent object permanence and stable entity disentanglement via fully unsupervised learning. |
Code | OpenReview | ✅ | |
2023 | ICLR | Neural Constraint Satisfaction: Hierarchical Abstraction for Combinatorial Generalization in Object Rearrangement NCS: Demonstrating how to generalize over a combinatorially large space of rearrangement tasks from only pixel observations by constructing from video demonstrations a factorized transition graph over entity state transitions that we use for control. |
Code | OpenReview | ||
2023 | ICLR | Robust and Controllable Object-Centric Learning through Energy-based Models EGO: A conceptually simple and general approach to learning object-centric representations through energy-based models. |
OpenReview | |||
2023 | ICLR | Neural Groundplans: Persistent Neural Scene Representations from a Single Image GroundPlans: Training a self-supervised model that learns to map a single image to a 3D representation of the scene, with separate components for the immovable and movable 3D regions. |
Code | OpenReview | ||
2023 | ICLR | Neural Systematic Binder NSB: Proposing a novel object-centric representation called block-slots, which unlike the conventional slots, provides within-slot disentanglement via vector-formed factor representations. |
Code | OpenReview | ||
2023 | ICLR | SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models SlotFormer: Proposing a general Transformer-based dynamic model to enable consistent future prediction in object-centric models. |
Code | OpenReview | ||
2023 | CLeaR | Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning CausalTriplet: Presenting a causal representation learning benchmark that is close to realistic settings and empirically demonstrating the strengths and weaknesses of recent hypotheses and methods. |
Code | OpenReview | ✅ | |
2023 | NeurIPS | SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models SlotDiff: An object-centric latent diffusion model designed for both synthetic/real-world image and video data. |
Code | ✅ | ||
2023 | arXiv | Object-Centric Slot Diffusion LSD: Replacing the conventional slot decoders with a latent diffusion model conditioned on the object slots. |
Code | ✅ | ||
2023 | arXiv | Sensitivity of Slot-Based Object-Centric Models to their Number of Slots NumSlots: Proposing to use analogs to precision and recall based on the Adjusted Rand Index to accurately quantify model behavior over a large range of slots. |
||||
2023 | arXiv | Spotlight Attention: Robust Object-Centric Learning With a Spatial Locality Prior LSP: Incorporating a spatial-locality prior into state-of-the-art object-centric vision models, and obtaining significant improvements in segmenting objects in both synthetic and real-world datasets. |
✅ | |||
2023 | arXiv | Unsupervised Open-Vocabulary Object Localization in Videos RWV-OCL: Proposing an unsupervised approach to localize and name objects in real-world videos. |
✅ | ✅ |
Year | Publication | Title | Source | Forum | Real-World? | Using VLMs? |
---|---|---|---|---|---|---|
2022 | TMLR | Complex-Valued Autoencoders for Object Discovery CAE: Introducing complex-valued activations into a convolutional autoencoder, which learns to encode feature information in the activations’ magnitudes and object affiliation in their phase values. |
Code | |||
2022 | NeurIPSw | Object-Centric Causal Representation Learning CausalOCL: Advancing causal representation learning by developing an object-centric architecture that leverages weak supervision from sparse perturbations to disentangle each object's properties. |
OpenReview | |||
2022 | NeurIPSw | Unlocking Slot Attention by Changing Optimal Transport Costs SA-MESH: Slot attention can do tiebreaking by changing the costs for optimal transport to minimize entropy, which improves results significantly on object detection. |
Code | OpenReview | ||
2022 | NeurIPS | Visual Concepts Tokenization VCT: Proposing an unsupervised transformer-based Visual Concepts Tokenization framework, to perceive an image into a set of disentangled visual concept tokens, with each concept token responding to one type of independent visual concept. |
Code | OpenReview | ✅ | |
2022 | NeurIPS | Unsupervised Multi-object Segmentation by Predicting Probable Motion Patterns PPMP: Segmenting independent objects in still images by predicting regions that contain motion patterns likely to arise from such objects. |
Code | OpenReview | ✅ | |
2022 | NeurIPS | Unsupervised Causal Generative Understanding of Images UCGU: A framework for unsupervised object-centric 3D scene understanding that generalizes robustly to out-of-distribution images. |
OpenReview | |||
2022 | NeurIPS | SAVi++: Towards End-to-End Object-Centric Learning from Real-World Videos SAVi++: An object-centric video model which is trained to predict depth signals from a slot-based video representation. SAVi++ is able to learn emergent object segmentation and tracking from videos in the real-world Waymo Open dataset. |
Code | OpenReview | ✅ | |
2022 | NeurIPS | Promising or Elusive? Unsupervised Object Segmentation from Real-world Single Images UnsupObjSeg: Training more than 200 models to demonstrate that current unsupervised methods cannot segment generic objects from real-world single images, unless the complex objectness biases are removed. |
Code | OpenReview | ✅ | |
2022 | NeurIPS | Object Scene Representation Transformer OSRT: Proposing Object Scene Representation Transformer, a highly efficient 3D-centric model in which individual object representations naturally emerge through novel view synthesis. |
Code | OpenReview | ||
2022 | NeurIPS | Object Representations as Fixed Points: Training Iterative Refinement Algorithms with Implicit Differentiation iSlotAttns: Improving the training of object-centric learning methods by applying implicit differentiation to slot attention. |
Code | OpenReview | ||
2022 | NeurIPS | Simple Unsupervised Object-Centric Learning for Complex and Naturalistic Videos STEVE: A simple fully unsupervised model for object-centric learning in complex and naturalistic videos. |
Code | OpenReview | ✅ | |
2022 | ICML | Unsupervised Image Representation Learning with Deep Latent Particles DLP: Decomposing the visual input into low-dimensional latent particles, where each particle is described by its spatial location and features of its surrounding region. |
Code | PMLR | ✅ | |
2022 | ICML | Toward Compositional Generalization in Object-Oriented World Modeling HOWM: Formalizing the compositional generalization problem with an algebraic approach and studying how a world model can achieve that. |
Code | PMLR | ||
2022 | ICML | COAT: Measuring Object Compositionality in Emergent Representations COAT: Directly measuring compositionality in the representation space as a form of objectness, making such evaluations tractable for a wider class of models. |
PMLR | |||
2022 | ICML | Generalization and Robustness Implications in Object-Centric Learning OCLLib: Finding that, when the distribution shift affects the input in a less structured manner, robustness in terms of segmentation and downstream task performance may vary significantly across models and distribution shifts. |
Code | PMLR | ||
2022 | CVPR | HP-Capsule: Unsupervised Face Part Discovery by Hierarchical Parsing Capsule Network HPCapsule: Extending the application of capsule networks from digits to human faces and taking a step forward in showing how neural networks understand homologous objects without human intervention. |
CVF Proceedings | ✅ | ||
2022 | CVPR | Discovering Objects that Can Move ObjMove: Simplifying auto-encoders' architecture, and augmenting the resulting model with a weak learning signal from general motion segmentation algorithms. |
Code | CVF Proceedings | ✅ | |
2022 | ICLRw | Towards Self-Supervised Learning of Global and Object-Centric Representations SSL-OCL: Discussing the interplay of attention, global and per-object contrastive losses, and data augmentation for learning object representations through self-supervision. |
Code | OpenReview | ||
2022 | ICLR | Illiterate DALL-E Learns to Compose SLATE: Learning a compositional slot-based representation of an image and performing slot composition for zero-shot novel image generation. |
Code | OpenReview | ✅ | ✅ |
2022 | ICLR | Conditional Object-Centric Learning from Video SAVi: A sequential extension to Slot Attention. |
Code | OpenReview | ||
2022 | ICLR | Unsupervised Discovery of Object Radiance Fields uORF: Inferring object-centric factorized 3D scene representations from a single image, learned without 3D geometry or segmentation supervision. |
Code | OpenReview | ✅ | |
2022 | ICLR | Evaluating Disentanglement of Structured Representations SLR-Metric: Introducing the first metric for evaluating disentanglement at individual hierarchy levels of a structured latent representation, and applying it to object-centric generative models. |
OpenReview | |||
2022 | CLeaR | VIM: Variational Independent Modules for Video Prediction VIM: Defining an object-centric video prediction model that learns modular object dynamics and displays good compositional generalization skills. |
OpenReview | |||
2022 | SIGGRAPH | Sprite-from-Sprite: Cartoon Animation Decomposition with Self-supervised Sprite Estimation ToonDecompose: Decomposing a cartoon animation into several components (known as "sprites"), where optical flow is the only external prior used for model training. |
Code | ✅ |
Year | Publication | Title | Source | Forum | Real-World? | Using VLMs? |
---|---|---|---|---|---|---|
2021 | NeurIPS | Unsupervised Foreground Extraction via Deep Region Competition DRC: An unsupervised foreground extraction algorithm applicable to unseen objects, in both synthetic and low-resolution real-world scenes. |
Code | OpenReview | ✅ | |
2021 | NeurIPS | SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition SIMONe: A video scene model which separates the time-invariant, object-level contents of the scene from global time-varying elements such as viewpoint. |
Code | OpenReview | ||
2021 | NeurIPS | Object-Centric Representation Learning with Generative Spatial-Temporal Factorization DyMON: Extending unsupervised object-centric representation learning to multi-view-dynamic-scene settings. |
OpenReview | ✅ | ||
2021 | NeurIPS | Neural Production Systems NPS: Modelling sparse interactions among separate entities using dynamically selected rules. |
OpenReview | |||
2021 | NeurIPS | MarioNette: Self-Supervised Sprite Learning MarioNette: Jointly learning a dictionary of texture patches and training a network that places them onto a canvas, effectively deconstructing sprite-based video content. |
Code | OpenReview | ✅ | |
2021 | NeurIPS | GENESIS-V2: Inferring Unordered Object Representations without Iterative Refinement GENESISv2: Presenting an improved object-centric generative model of visual scenes that uses a stochastic clustering algorithm for inferring object representations without imposing a fixed ordering on objects or using iterative refinement. |
Code | OpenReview | ✅ | |
2021 | NeurIPS | Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language VRDP: A unified framework to learn visual concepts and infer physics models of objects and their interactions jointly from videos and language. |
Code | OpenReview | ✅ | ✅ |
2021 | NeurIPS | Attention over Learned Object Embeddings Enables Complex Visual Reasoning ALOE: A general framework of attention over learned object embeddings that outperforms task-specific models on complex visual reasoning tasks thought to be too challenging for general models. |
Code | OpenReview | ||
2021 | ICML | Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations EfficientMORL: A framework for efficient multi-object representation learning consisting of a hierarchical VAE and a lightweight network for iterative refinement. |
Code | PMLR | ||
2021 | ICCV | PARTS: Unsupervised segmentation with slots, attention and independence maximization PARTS: Introducing a recurrent slot-attention-like encoder that allows for top-down influence during inference, applied to both 3D synthetic and real-world robotics scenes. |
CVF Proceedings | ✅ | ||
2021 | ICLR | Self-supervised Visual Reinforcement Learning with Object-centric Representations SMORL: The combination of object-centric representations and goal-conditioned attention policies helps autonomous agents to learn useful multi-task policies in visual multi-object environments. |
Code | OpenReview |
Year | Publication | Title | Source | Forum | Real-World? | Using VLMs? |
---|---|---|---|---|---|---|
2020 | NeurIPS | Learning Object-Centric Representations of Multi-Object Scenes from Multiple Views MulMON: Extending IODINE with Generative Query Network (GQN)-based module, for unsupervised object segmentation in 2D multi-view synthetic data. |
Code | NeurIPS Proceedings | ||
2020 | NeurIPS | Object-Centric Learning with Slot Attention SlotAttns: Learning abstract representations of Convolutional Neural Networks (CNNs) for unsupervised object segmentation in 2D synthetic data. |
Code | NeurIPS Proceedings | ||
2020 | NeurIPS | Unsupervised object-centric video generation and decomposition in 3D O3V: Generation and decomposition of 3D synthetic scenes with 2D synthetic videos. |
Code | NeurIPS Proceedings | ||
2020 | NeurIPS | BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images BlockGAN: Representing 3D synthetic objects using GANs based on 2D synthetic data. |
Code | NeurIPS Proceedings | ✅ | |
2020 | ICLR | GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations GENESIS: Modeling relationship of scene components for decomposition and generation of 3D synthetic scenes. |
Code | OpenReview | ||
2020 | ICLR | Structured Object-Aware Physics Prediction for Video Modeling and Planning STOVE: Proposing a structured object-aware video prediction model that explicitly reasons about objects, and demonstrating that it provides high-quality long-term video predictions for planning. |
Code | OpenReview | ||
2020 | ICLR | SCALOR: Generative World Models with Scalable Object Representations SCALOR: Generation of both synthetic and low-resolution real-world scenes where a large number of objects exist. |
Code | OpenReview | ✅ | |
2020 | ICML | Improving Generative Imagination in Object-Centric World Models G-SWM: Modeling multi-modal uncertainty and situation-awareness for 3D synthetic scene generation. |
Code | PMLR |
Year | Publication | Title | Source | Forum | Real-World? | Using VLMs? |
---|---|---|---|---|---|---|
2019 | ICML | Multi-Object Representation Learning with Iterative Variational Inference IODINE: Learning multi-object disentangled representations for unsupervised object segmentation in 2D synthetic data. |
Code | PMLR | ✅ | |
2019 | arXiv | MONet: Unsupervised Scene Decomposition and Representation MONet: Training a Variational Autoencoder (VAE) and a recurrent attention network to decompose and represent 3D synthetic scenes. |
Code |
Feel free to drop an e-mail to [email protected]