A collection of World Models (for Autonomous Driving) papers.
If you find papers we have missed, feel free to create pull requests, open issues, or email me / Qi Wang. Contributions in any form that make this list more comprehensive are welcome. 📣📣📣
If you find this repository useful, please consider giving us a star 🌟.
Feel free to share this list with others! 🥳🥳🥳
- CVPR 2024 Workshop & Challenge | OpenDriveLab, Track #4: Predictive World Model. Serving as an abstract spatio-temporal representation of reality, the world model can predict future states based on the current state. The learning process of world models has the potential to elevate a pre-trained foundation model to the next level. Given vision-only inputs, the neural network outputs point clouds in the future to testify its predictive capability of the world.
- CVPR 2023 Workshop on Autonomous Driving, CHALLENGE 3: ARGOVERSE CHALLENGES, 3D Occupancy Forecasting using the Argoverse 2 Sensor Dataset. Predict the spacetime occupancy of the world for the next 3 seconds.
- Using Occupancy Grids for Mobile Robot Perception and Navigation [paper]
- Yann LeCun: A Path Towards Autonomous Machine Intelligence [paper] [Video]
- CVPR'23 WAD: Keynote - Ashok Elluswamy, Tesla [Video]
- Wayve: Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy [blog]
  World models are the basis for the ability to predict what might happen next, which is fundamentally important for autonomous driving. They can act as a learned simulator, or a mental “what if” thought experiment for model-based reinforcement learning (RL) or planning. By incorporating world models into our driving models, we can enable them to understand human decisions better and ultimately generalise to more real-world situations.
- A Survey on Multimodal Large Language Models for Autonomous Driving. WACVW 2024 [Paper] [Code]
- World Models: The Safety Perspective. ISSREW [Paper]
- Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey. arXiv 2024.11 [Paper]
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI. arXiv 2024.7 [Paper] [Code]
- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond. arXiv 2024.5 [Paper] [Code]
- World Models for Autonomous Driving: An Initial Survey. arXiv 2024.3 [Paper]
- [SEM2] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. TITS [Paper]
- Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability. NeurIPS 2024 [Paper] [Code]
- DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model. NeurIPS 2024 [Paper] [Project]
- Think2Drive: Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving. ECCV 2024 [Paper]
- [MARL-CCE] Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model. ECCV 2024 [Paper] [Code]
- DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving. ECCV 2024 [Paper] [Code]
- GenAD: Generative End-to-End Autonomous Driving. ECCV 2024 [Paper] [Code]
- OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving. ECCV 2024 [Paper] [Code]
- [NeMo] Neural Volumetric World Models for Autonomous Driving. ECCV 2024 [Paper]
- CarFormer: Self-Driving with Learned Object-Centric Representations. ECCV 2024 [Paper] [Code]
- [MARL-CCE] Modelling-Competitive-Behaviors-in-Autonomous-Driving-Under-Generative-World-Model. ECCV 2024 [Code]
- DrivingDiffusion: Layout-Guided Multi-View Driving Scene Video Generation with Latent Diffusion Model. ECCV 2024 [Paper] [Code]
- 3D-VLA: A 3D Vision-Language-Action Generative World Model. ICML 2024 [Paper]
- [ViDAR] Visual Point Cloud Forecasting enables Scalable Autonomous Driving. CVPR 2024 [Paper] [Code]
- [GenAD] Generalized Predictive Model for Autonomous Driving. CVPR 2024 [Paper] [Data]
- Cam4DOCC: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications. CVPR 2024 [Paper] [Code]
- [Drive-WM] Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. CVPR 2024 [Paper] [Code]
- DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving. CVPR 2024 [Paper]
- Panacea: Panoramic and Controllable Video Generation for Autonomous Driving. CVPR 2024 [Paper] [Code]
- UnO: Unsupervised Occupancy Fields for Perception and Forecasting. CVPR 2024 [Paper] [Code]
- MagicDrive: Street View Generation with Diverse 3D Geometry Control. ICLR 2024 [Paper] [Code]
- Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion. ICLR 2024 [Paper]
- SafeDreamer: Safe Reinforcement Learning with World Models. ICLR 2024 [Paper] [Code]
- Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles. arXiv 2024.11 [Paper] [Project Page]
- WorldSimBench: Towards Video Generation Models as World Simulator. arXiv 2024.10 [Paper] [Project Page]
- DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation. arXiv 2024.10 [Paper] [Project Page]
- DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model. arXiv 2024.10 [Paper] [Project Page]
- [SSR] Does End-to-End Autonomous Driving Really Need Perception Tasks? arXiv 2024.9 [Paper] [Code]
- Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models. arXiv 2024.9 [Paper]
- [LatentDriver] Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving. arXiv 2024.9 [Paper] [Code]
- RenderWorld: World Model with Self-Supervised 3D Label. arXiv 2024.9 [Paper]
- OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving. arXiv 2024.9 [Paper]
- DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving. arXiv 2024.8 [Paper]
- [Drive-OccWorld] Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving. arXiv 2024.8 [Paper]
- BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space. arXiv 2024.7 [Paper] [Code]
- [TOKEN] Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving. arXiv 2024.7 [Paper]
- UMAD: Unsupervised Mask-Level Anomaly Detection for Autonomous Driving. arXiv 2024.6 [Paper]
- SimGen: Simulator-conditioned Driving Scene Generation. arXiv 2024.6 [Paper] [Code]
- [AdaptiveDriver] Planning with Adaptive World Models for Autonomous Driving. arXiv 2024.6 [Paper] [Code]
- [LAW] Enhancing End-to-End Autonomous Driving with Latent World Model. arXiv 2024.6 [Paper] [Code]
- [Delphi] Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation. arXiv 2024.6 [Paper] [Code]
- OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving. arXiv 2024.5 [Paper] [Code]
- MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes. arXiv 2024.5 [Paper] [Code]
- CarDreamer: Open-Source Learning Platform for World Model based Autonomous Driving. arXiv 2024.5 [Paper] [Code]
- [DriveSim] Probing Multimodal LLMs as World Models for Driving. arXiv 2024.5 [Paper] [Code]
- LidarDM: Generative LiDAR Simulation in a Generated World. arXiv 2024.4 [Paper] [Code]
- SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control. arXiv 2024.3 [Paper] [Project]
- DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation. arXiv 2024.3 [Paper] [Code]
- TrafficBots: Towards World Models for Autonomous Driving Simulation and Motion Prediction. ICRA 2023 [Paper] [Code]
- WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation. arXiv 2023.12 [Paper] [Code]
- [CTT] Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent. arXiv 2023.11 [Paper]
- MUVO: A Multimodal Generative World Model for Autonomous Driving with Geometric Representations. arXiv 2023.11 [Paper]
- GAIA-1: A Generative World Model for Autonomous Driving. arXiv 2023.9 [Paper]
- ADriver-I: A General World Model for Autonomous Driving. arXiv 2023.9 [Paper]
- UniWorld: Autonomous Driving Pre-training via World Models. arXiv 2023.8 [Paper] [Code]
- [MILE] Model-Based Imitation Learning for Urban Driving. NeurIPS 2022 [Paper] [Code]
- Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. NeurIPS 2022 Spotlight [Paper] [Code]
- Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation. ICRA 2022 [Paper]
- Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving. IROS 2022 [Paper]
- [SEM2] Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model. NeurIPS 2022 workshop [Paper]
- [SMAC] Grounded Answers for Multi-agent Decision-making Problem through Generative World Model. NeurIPS 2024 [Paper]
- [CoWorld] Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning. NeurIPS 2024 [Paper]
- PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation. NeurIPS 2024 [Paper]
- [MUN] Learning World Models for Unconstrained Goal Navigation. NeurIPS 2024 [Paper] [Code]
- VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation. NeurIPS 2024 [Paper]
- Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity. NeurIPSW 2024 [Paper]
- Emergence of Implicit World Models from Mortal Agents. NeurIPSW 2024 [Paper]
- PreLAR: World Model Pre-training with Learnable Action Representation. ECCV 2024 [Paper] [Code]
- [CWM] Understanding Physical Dynamics with Counterfactual World Modeling. ECCV 2024 [Paper] [Code]
- [DWL] Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning. RSS 2024 (Best Paper Award Finalist) [Paper]
- [LLM-Sim] Can Language Models Serve as Text-Based World Simulators? ACL [Paper] [Code]
- RoboDreamer: Learning Compositional World Models for Robot Imagination. ICML 2024 [Paper] [Code]
- [Δ-IRIS] Efficient World Models with Context-Aware Tokenization. ICML 2024 [Paper] [Code]
- AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors. ICML 2024 [Paper]
- Hieros: Hierarchical Imagination on Structured State Space Sequence World Models. ICML 2024 [Paper]
- [HRSSM] Learning Latent Dynamic Robust Representations for World Models. ICML 2024 [Paper] [Code]
- HarmonyDream: Task Harmonization Inside World Models. ICML 2024 [Paper] [Code]
- [REM] Improving Token-Based World Models with Parallel Observation Prediction. ICML 2024 [Paper] [Code]
- Do Transformer World Models Give Better Policy Gradients? ICML 2024 [Paper]
- TD-MPC2: Scalable, Robust World Models for Continuous Control. ICLR 2024 [Paper] [Torch Code]
- DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing. ICLR 2024 [Paper]
- [R2I] Mastering Memory Tasks with World Models. ICLR 2024 [Paper] [JAX Code]
- MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning. ICLR 2024 [Paper] [Code]
- Multi-Task Interactive Robot Fleet Learning with Visual World Models. CoRL 2024 [Paper] [Code]
- Generative World Explorer. arXiv 2024.11 [Paper] [Project]
- [WebDreamer] Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents. arXiv 2024.11 [Paper] [Code]
- WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making. arXiv 2024.11 [Paper]
- DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning. arXiv 2024.11, Yann LeCun [Paper]
- Scaling Laws for Pre-training Agents and World Models. arXiv 2024.11 [Paper]
- [Phyworld] How Far is Video Generation from World Model: A Physical Law Perspective. arXiv 2024.11 [Paper] [Project]
- IGOR: Image-GOal Representations are the Atomic Control Units for Foundation Models in Embodied AI. arXiv 2024.10 [Paper] [Project]
- EVA: An Embodied World Model for Future Video Anticipation. arXiv 2024.10 [Paper]
- VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning. arXiv 2024.10 [Paper]
- [LLMCWM] Language Agents Meet Causality -- Bridging LLMs and Causal World Models. arXiv 2024.10 [Paper] [Code]
- Reward-free World Models for Online Imitation Learning. arXiv 2024.10 [Paper]
- Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation. arXiv 2024.10 [Paper]
- [GLIMO] Grounding Large Language Models In Embodied Environment With Imperfect World Models. arXiv 2024.10 [Paper]
- AVID: Adapting Video Diffusion Models to World Models. arXiv 2024.10 [Paper] [Code]
- [WMP] World Model-based Perception for Visual Legged Locomotion. arXiv 2024.9 [Paper] [Project]
- [OSWM] One-shot World Models Using a Transformer Trained on a Synthetic Prior. arXiv 2024.9 [Paper]
- R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models. arXiv 2024.9 [Paper]
- Representing Positional Information in Generative World Models for Object Manipulation. arXiv 2024.9 [Paper]
- Making Large Language Models into World Models with Precondition and Effect Knowledge. arXiv 2024.9 [Paper]
- DexSim2Real$^2$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation. arXiv 2024.9 [Paper]
- Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction. arXiv 2024.8 [Paper]
- [MoReFree] World Models Increase Autonomy in Reinforcement Learning. arXiv 2024.8 [Paper] [Project]
- UrbanWorld: An Urban World Model for 3D City Generation. arXiv 2024.7 [Paper]
- PWM: Policy Learning with Large World Models. arXiv 2024.7 [Paper] [Code]
- Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling. arXiv 2024.7 [Paper]
- [GenRL] Multimodal foundation world models for generalist embodied agents. arXiv 2024.6 [Paper] [Code]
- [DLLM] World Models with Hints of Large Language Models for Goal Achieving. arXiv 2024.6 [Paper]
- Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model. arXiv 2024.6 [Paper]
- CityBench: Evaluating the Capabilities of Large Language Model as World Model. arXiv 2024.6 [Paper] [Code]
- CoDreamer: Communication-Based Decentralised World Models. arXiv 2024.6 [Paper]
- [EBWM] Cognitively Inspired Energy-Based World Models. arXiv 2024.6 [Paper]
- Evaluating the World Model Implicit in a Generative Model. arXiv 2024.6 [Paper] [Code]
- Transformers and Slot Encoding for Sample Efficient Physical World Modelling. arXiv 2024.5 [Paper] [Code]
- [Puppeteer] Hierarchical World Models as Visual Whole-Body Humanoid Controllers. arXiv 2024.5, Yann LeCun [Paper] [Code]
- BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation. arXiv 2024.5 [Paper]
- Pandora: Towards General World Model with Natural Language Actions and Video States. [Paper] [Code]
- [WKM] Agent Planning with World Knowledge Model. arXiv 2024.5 [Paper] [Code]
- [Diamond] Diffusion for World Modeling: Visual Details Matter in Atari. arXiv 2024.5 [Paper] [Code]
- Newton™ – a first-of-its-kind foundation model for understanding the physical world. Archetype AI [Blog]
- Compete and Compose: Learning Independent Mechanisms for Modular World Models. arXiv 2024.4 [Paper]
- MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators. arXiv 2024.4 [Paper] [Code]
- Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization. arXiv 2024.3 [Paper] [Code]
- ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation. arXiv 2024.3 [Paper] [Code]
- V-JEPA: Video Joint Embedding Predictive Architecture. Meta AI, Yann LeCun [Blog] [Paper] [Code]
- [IWM] Learning and Leveraging World Models in Visual Representation Learning. Meta AI [Paper]
- Genie: Generative Interactive Environments. DeepMind [Paper] [Blog]
- [Sora] Video generation models as world simulators. OpenAI [Technical report]
- [LWM] World Model on Million-Length Video And Language With RingAttention. arXiv 2024.2 [Paper] [Code]
- Planning with an Ensemble of World Models. OpenReview [Paper]
- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens. arXiv 2024.1 [Paper] [Code]
- [IRIS] Transformers are Sample Efficient World Models. ICLR 2023 Oral [Paper] [Torch Code]
- STORM: Efficient Stochastic Transformer based World Models for Reinforcement Learning. NeurIPS 2023 [Paper] [Torch Code]
- [TWM] Transformer-based World Models Are Happy with 100k Interactions. ICLR 2023 [Paper] [Torch Code]
- [Dynalang] Learning to Model the World with Language. arXiv 2023.8 [Paper] [JAX Code]
- [DreamerV3] Mastering Diverse Domains through World Models. arXiv 2023.1 [Paper] [JAX Code] [Torch Code]
- [TD-MPC] Temporal Difference Learning for Model Predictive Control. ICML 2022 [Paper] [Torch Code]
- DreamerPro: Reconstruction-Free Model-Based Reinforcement Learning with Prototypical Representations. ICML 2022 [Paper] [TF Code]
- DayDreamer: World Models for Physical Robot Learning. CoRL 2022 [Paper] [TF Code]
- Deep Hierarchical Planning from Pixels. NeurIPS 2022 [Paper] [TF Code]
- Iso-Dream: Isolating and Leveraging Noncontrollable Visual Dynamics in World Models. NeurIPS 2022 Spotlight [Paper] [Torch Code]
- DreamingV2: Reinforcement Learning with Discrete World Models without Reconstruction. arXiv 2022.3 [Paper]
- [DreamerV2] Mastering Atari with Discrete World Models. ICLR 2021 [Paper] [TF Code] [Torch Code]
- Dreaming: Model-based Reinforcement Learning by Latent Imagination without Reconstruction. ICRA 2021 [Paper]
- [DreamerV1] Dream to Control: Learning Behaviors by Latent Imagination. ICLR 2020 [Paper] [TF Code] [Torch Code]
- [Plan2Explore] Planning to Explore via Self-Supervised World Models. ICML 2020 [Paper] [TF Code] [Torch Code]
- World Models. NeurIPS 2018 Oral [Paper]