This project uses reinforcement learning (RL) with an actor-critic model to instill flocking behaviour in drones, based on biological models. This involved:
- Developing a simulation of multi-agent systems in an environment that incorporates potential field functions and boid flocking behavior.
- Formulating an effective loss function.
- Designing a custom reward function that uses exponential terms to encourage flocking within a set number of steps.
- Directing drones through gains that boost specific behaviours, with the goal of maximizing long-term rewards.
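To make the idea of an exponentially shaped reward concrete, here is a minimal Python sketch. It is not the reward used in the dissertation: the function name `flocking_reward`, the parameters `target_spacing` and `decay`, and the choice of two terms are all assumptions made for illustration. Each term decays exponentially with an error, so the reward is highest when the flock is compact and close to its destination.

```python
import math

def flocking_reward(positions, destination, target_spacing=2.0, decay=0.5):
    """Hypothetical shaped reward in (0, 2]; not the dissertation's formula."""
    n = len(positions)
    # Centroid of the flock.
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n
    # Mean distance of the drones from the centroid.
    spread = sum(math.hypot(p[0] - cx, p[1] - cy) for p in positions) / n
    # Cohesion term: equals 1 when the spread matches the target spacing.
    cohesion = math.exp(-decay * abs(spread - target_spacing))
    # Progress term: equals 1 when the centroid sits on the destination.
    goal_dist = math.hypot(destination[0] - cx, destination[1] - cy)
    progress = math.exp(-decay * goal_dist)
    return cohesion + progress
```

Because both terms are bounded and smooth, small improvements in spacing or progress always change the reward, which is one common motivation for exponential shaping.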
This work highlights RL's promise in UAV systems and suggests transitioning to a more scalable multi-agent environment.
📖 For a comprehensive overview, refer to the dissertation document. 📖
The methods employed include:
- Incorporating the boid flocking model with separation, alignment, and cohesion behaviors.
- Utilizing the actor-critic RL model to learn gains that control flocking behavior.
- Layering potential field functions for attraction to destinations.
- Designing a custom environment and reward shaping to foster flocking and ensure drones reach their destinations.
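The way gains combine the three boid rules with a potential-field attraction can be sketched as follows. This is an illustrative Python sketch rather than the project's MATLAB code: the function `steer`, the gain ordering `(k_sep, k_ali, k_coh, k_att)`, the neighbourhood `radius`, and the quadratic attractive potential are assumptions. In the project, the gains are what the actor-critic agent learns; here they are simply passed in.

```python
import math

def steer(i, positions, velocities, destination, gains, radius=5.0):
    """Return the (ax, ay) steering command for drone i (hypothetical sketch)."""
    k_sep, k_ali, k_coh, k_att = gains
    px, py = positions[i]
    vx, vy = velocities[i]
    sep = [0.0, 0.0]
    ali = [0.0, 0.0]
    coh = [0.0, 0.0]
    # Neighbours are the other drones within the sensing radius.
    neighbours = [
        j for j in range(len(positions))
        if j != i and math.hypot(positions[j][0] - px, positions[j][1] - py) < radius
    ]
    if neighbours:
        m = len(neighbours)
        for j in neighbours:
            dx, dy = px - positions[j][0], py - positions[j][1]
            d2 = dx * dx + dy * dy + 1e-9  # avoid division by zero
            # Separation: push away from each neighbour, more strongly when close.
            sep[0] += dx / d2
            sep[1] += dy / d2
        # Alignment: steer towards the neighbours' mean velocity.
        ali[0] = sum(velocities[j][0] for j in neighbours) / m - vx
        ali[1] = sum(velocities[j][1] for j in neighbours) / m - vy
        # Cohesion: steer towards the neighbours' mean position.
        coh[0] = sum(positions[j][0] for j in neighbours) / m - px
        coh[1] = sum(positions[j][1] for j in neighbours) / m - py
    # Attraction: negative gradient of the assumed potential 0.5 * |p - goal|^2.
    att = (destination[0] - px, destination[1] - py)
    ax = k_sep * sep[0] + k_ali * ali[0] + k_coh * coh[0] + k_att * att[0]
    ay = k_sep * sep[1] + k_ali * ali[1] + k_coh * coh[1] + k_att * att[1]
    return ax, ay
```

Raising one gain relative to the others boosts the corresponding behaviour, for example a larger `k_att` favours reaching the destination over staying with the flock.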
The project's outcomes are:
- The RL agent successfully learns the emergent flocking behavior of drones.
- The significance of reward shaping in promoting desired behaviors is highlighted.
- Certain challenges, such as some drones not reaching their destinations, underscore the need for further refinement of the reward function.
Potential avenues for future exploration:
- Transition to a decentralized multi-agent approach to enhance the RL structure.
- Enrich the environment dynamics and introduce obstacles.
- Boost simulation speed using techniques like quad trees.
- Refine the reward function to rectify any undesirable behaviors.
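As a rough illustration of the quad tree idea mentioned above, the Python sketch below stores drone positions in a point quadtree so that a neighbour query only visits nearby regions instead of comparing every pair of drones. It is a generic sketch, not part of the project: the class name `QuadTree`, its parameters, and the depth limit are assumptions.

```python
class QuadTree:
    """Point quadtree over the square [x, x + size) x [y, y + size)."""

    def __init__(self, x, y, size, capacity=4, depth=0, max_depth=8):
        self.x, self.y, self.size = x, y, size
        self.capacity = capacity
        self.depth, self.max_depth = depth, max_depth
        self.points = []      # points held directly by this node
        self.children = None  # four sub-quadrants once the node splits

    def _contains(self, p):
        return (self.x <= p[0] < self.x + self.size
                and self.y <= p[1] < self.y + self.size)

    def insert(self, p):
        if not self._contains(p):
            return False
        if self.children is None:
            # Store here while there is room, or when the depth limit is reached.
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        self._push_down(p)
        return True

    def _push_down(self, p):
        # Keep the point in this node if rounding leaves no child that accepts it.
        if not any(c.insert(p) for c in self.children):
            self.points.append(p)

    def _split(self):
        h = self.size / 2
        self.children = [
            QuadTree(self.x + dx, self.y + dy, h, self.capacity,
                     self.depth + 1, self.max_depth)
            for dx in (0, h) for dy in (0, h)
        ]
        old, self.points = self.points, []
        for q in old:
            self._push_down(q)

    def query(self, cx, cy, r):
        """Return all stored points within distance r of (cx, cy)."""
        # Prune nodes whose square cannot overlap the query circle's bounding box.
        if (cx + r < self.x or cx - r >= self.x + self.size
                or cy + r < self.y or cy - r >= self.y + self.size):
            return []
        found = [p for p in self.points
                 if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= r * r]
        if self.children:
            for c in self.children:
                found.extend(c.query(cx, cy, r))
        return found
```

In a flocking simulation the tree would typically be rebuilt each time step from the current positions, and `query` would replace the all-pairs distance check used to find each drone's neighbours.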
MATLAB Simulation: The custom environment and the RL agent.
Dissertation: The primary document detailing the research, methodology, and findings.
git clone --recursive https://github.com/oscell/Biologically-inspired-UAV.git
Meunier, Oscar. "Biologically inspired UAV guidance: Using reinforcement learning to optimize flocking behaviour." University of Glasgow, 2022.