This project is a work-in-progress implementation of world model training for Unity environments, using the ML-Agents to OpenAI Gym wrapper. The goal is to train reinforcement learning agents efficiently by having them learn inside a learned world model rather than directly in the environment.
It currently runs against the Civilization Simulations. I'm developing it to add depth of gameplay to my 2D civilization simulations, but it can also be used to train generalist agents for other kinds of simulations.
The project implements three key components based on modern world model approaches:
- Data Collection: Gather experience from Unity environments via the ML-Agents Gym wrapper
- World Model Training: Learn to predict next observations and rewards
- Policy Training: Train RL agents inside the learned world model
The trained policy can then be deployed in a Unity environment to control NPC behavior or for other purposes.
- Python 3.10+
- Unity ML-Agents
- PyTorch
- OpenAI Gym
pip install -r requirements.txt
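For reference, a minimal requirements.txt consistent with the dependency list above might look like the following. The package names are assumptions on my part; the repo's actual file is authoritative:

```text
# Assumed minimal requirements.txt; versions unpinned
mlagents_envs   # Unity ML-Agents Python API, includes the Gym wrapper
torch           # PyTorch, for the world model and policy networks
gym             # OpenAI Gym interface
```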
python src/collect.py
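As a rough illustration of what a collection script like src/collect.py does, here is a minimal sketch: roll out a policy in the Unity build exposed as a Gym environment and save transitions to disk. The build path, episode budget, output location, and random exploration policy are all assumptions, and the `UnityToGymWrapper` import path varies by ML-Agents version:

```python
# Hypothetical sketch of a collection loop; not the repo's actual script.
import os
import pickle

from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.envs.unity_gym_env import UnityToGymWrapper

ENV_PATH = "builds/CivSim"   # placeholder: path to your Unity build
NUM_EPISODES = 100           # assumed episode budget

unity_env = UnityEnvironment(file_name=ENV_PATH)
env = UnityToGymWrapper(unity_env)

episodes = []
for _ in range(NUM_EPISODES):
    obs = env.reset()
    done = False
    transitions = []
    while not done:
        action = env.action_space.sample()            # random exploration
        next_obs, reward, done, info = env.step(action)
        transitions.append((obs, action, reward, done))
        obs = next_obs
    episodes.append(transitions)

env.close()
os.makedirs("data", exist_ok=True)
with open("data/rollouts.pkl", "wb") as f:            # assumed output format
    pickle.dump(episodes, f)
```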
The training process follows two main stages that can be run with a single command:
python src/train.py
This will automatically:
- Train World Model: Updates the world model on experience collected from the Unity environment so it better predicts next observations, rewards, and episode terminations. The world model combines a VAE for compact state representation with an MDN-RNN for dynamics prediction (see the sketch after this list).
- Train Policy in Imagination: Optimizes the agent's policy entirely inside the learned world model using actor-critic RL, allowing rapid policy improvement without additional environment interaction.
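To make the two stages concrete, here is a hedged sketch of one world-model gradient step and one imagined rollout. All module interfaces, shapes, losses, and the 15-step horizon are illustrative assumptions; the actual architectures live in the repo's source:

```python
import torch
import torch.nn.functional as F

def world_model_step(vae, mdnrnn, batch, optimizer):
    """One gradient step on a batch of (obs, action, next_obs, reward, done).

    Assumed interfaces: vae(obs) -> (z, mu, logvar, recon);
    vae.encode(obs) -> z; mdnrnn.loss(z, a, z_next) -> (nll, r_hat, d_logit).
    """
    obs, action, next_obs, reward, done = batch

    # VAE: reconstruct the observation and regularize the latent space.
    z, mu, logvar, recon = vae(obs)
    recon_loss = F.mse_loss(recon, obs)
    kl_loss = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # MDN-RNN: predict next latent, reward, and termination from (z, action).
    with torch.no_grad():
        z_next = vae.encode(next_obs)                 # prediction target
    nll, pred_reward, pred_done = mdnrnn.loss(z, action, z_next)
    dyn_loss = (nll
                + F.mse_loss(pred_reward, reward)
                + F.binary_cross_entropy_with_logits(pred_done, done))

    loss = recon_loss + kl_loss + dyn_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def imagine_rollout(mdnrnn, policy, z0, horizon=15):
    """Unroll the learned dynamics from a real starting latent z0.

    The policy only ever sees model-generated latents here, so no extra
    environment interaction is needed; the caller computes actor-critic
    losses over the returned trajectory.
    """
    z, trajectory = z0, []
    for _ in range(horizon):
        action = policy(z)
        z, reward, done = mdnrnn.sample(z, action)    # sample the MDN head
        trajectory.append((z, action, reward, done))
    return trajectory
```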
- Add the Agentics package to your Unity project
- Add the Brain component to your character
- Configure required components (Sensor, Plan, Motivation, etc.)
- Set up the training configuration using the python directory at the root of this repo
- Add example plans to the Data directory and configure them with the NetworkingController [coming soon]
@incollection{ha2018worldmodels,
title = {Recurrent World Models Facilitate Policy Evolution},
author = {Ha, David and Schmidhuber, J{\"u}rgen},
booktitle = {Advances in Neural Information Processing Systems 31},
pages = {2451--2463},
year = {2018},
publisher = {Curran Associates, Inc.},
url = {https://papers.nips.cc/paper/7512-recurrent-world-models-facilitate-policy-evolution},
note = {\url{https://worldmodels.github.io}},
}
@inproceedings{Park2023GenerativeAgents,
author = {Park, Joon Sung and O'Brien, Joseph C. and Cai, Carrie J. and Morris, Meredith Ringel and Liang, Percy and Bernstein, Michael S.},
title = {Generative Agents: Interactive Simulacra of Human Behavior},
year = {2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
booktitle = {Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23)},
keywords = {Human-AI interaction, agents, generative AI, large language models},
location = {San Francisco, CA, USA},
series = {UIST '23}
}
@inproceedings{alonso2024diffusionworldmodelingvisual,
title={Diffusion for World Modeling: Visual Details Matter in Atari},
author={Eloi Alonso and Adam Jelley and Vincent Micheli and Anssi Kanervisto and Amos Storkey and Tim Pearce and François Fleuret},
booktitle={Thirty-eighth Conference on Neural Information Processing Systems},
year={2024},
url={https://arxiv.org/abs/2405.12399},
}
@article{hafner2023dreamerv3,
title={Mastering Diverse Domains through World Models},
author={Hafner, Danijar and Pasukonis, Jurgis and Ba, Jimmy and Lillicrap, Timothy},
journal={arXiv preprint arXiv:2301.04104},
year={2023}
}