This directory contains the work of Carlos F. De Inza Niemeijer for the completion of the MSc degree in Aerospace Engineering at TU Delft. The purpose of this project is to use reinforcement learning to train a neural network policy that can perform an autonomous rendezvous with a rotating target.
A custom environment was created to simulate the rendezvous scenario. This environment is located in `rendezvous_env.py` and follows the interface guidelines specified by OpenAI Gym.
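For reference, a Gym-compatible environment exposes `reset` and `step` methods. The sketch below illustrates that interface only; the class name, space dimensions, dynamics, and reward are placeholders, not the contents of `rendezvous_env.py`.

```python
import gym
import numpy as np
from gym import spaces

class RendezvousSketch(gym.Env):
    """Toy stand-in for the real environment in rendezvous_env.py."""

    def __init__(self):
        super().__init__()
        # Placeholder spaces: e.g. relative position/velocity observations
        # and a continuous 3-axis thrust command.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(6,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)
        self.state = np.zeros(6, dtype=np.float32)

    def reset(self):
        self.state = np.zeros(6, dtype=np.float32)
        return self.state

    def step(self, action):
        # Placeholder dynamics and reward: penalize control effort only.
        reward = -float(np.linalg.norm(action))
        done = False
        return self.state, reward, done, {}
```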
The controller is trained with Proximal Policy Optimization (PPO), as implemented in Stable-Baselines3.
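As a minimal sketch (assuming the environment class is named `RendezvousEnv`; the actual hyperparameters are set in the training scripts), wiring the custom environment to Stable-Baselines3's PPO looks like:

```python
from stable_baselines3 import PPO
from rendezvous_env import RendezvousEnv  # class name assumed

# "MlpPolicy" selects a feedforward actor-critic network.
model = PPO("MlpPolicy", RendezvousEnv(), verbose=1)
```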
The scripts `main.py` or `main_rnn.py` can be executed to train a feedforward policy or a recurrent policy, respectively. The trained policy will be saved as a `.zip` file. If Weights & Biases is enabled, the callback function defined in `custom/custom_callbacks.py` will log the progress of the training process.
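The sketch below mirrors that train-and-save flow (the timestep budget, file name, and callback body are assumptions; the recurrent variant in `main_rnn.py` would use a different model class and is not shown).

```python
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import BaseCallback
from rendezvous_env import RendezvousEnv  # class name assumed

class WandbSketchCallback(BaseCallback):
    """Hypothetical stand-in for the callback in custom/custom_callbacks.py."""

    def _on_step(self) -> bool:
        # Training metrics could be forwarded to Weights & Biases here.
        return True  # returning False would stop training early

model = PPO("MlpPolicy", RendezvousEnv(), verbose=1)
model.learn(total_timesteps=1_000_000, callback=WandbSketchCallback())
model.save("ppo_rendezvous")  # Stable-Baselines3 writes ppo_rendezvous.zip
```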
The `monte_carlo.py` script can be executed to evaluate the trajectories generated by a trained policy based on a set of initial conditions. The initial conditions for the Monte Carlo trajectories can be generated as a `.csv` file by running the `verification/get_initial_conditions.py` script. Trajectories can be plotted in various formats using the `plot_*.py` scripts.
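A Monte Carlo pass of this kind might look like the sketch below (the CSV layout, file names, and how an episode is seeded from a row are assumptions; `monte_carlo.py` holds the actual logic).

```python
import csv
from stable_baselines3 import PPO
from rendezvous_env import RendezvousEnv  # class name assumed

model = PPO.load("ppo_rendezvous")  # reads ppo_rendezvous.zip
env = RendezvousEnv()

with open("initial_conditions.csv") as f:  # from get_initial_conditions.py
    for row in csv.reader(f):
        obs = env.reset()  # the real script would seed the episode from `row`
        done, total_reward = False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, _ = env.step(action)
            total_reward += reward
        print(f"episode return: {total_reward:.2f}")
```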
- The scripts `sensitivity_analysis.py` and `tune_*.py` are used for performing a sensitivity analysis and for tuning the different components of the model (see the sketch after this list).
- The trained policies can be found in the `models` directory.
- All of the results from the tuning, training, and evaluation experiments can be found in the `results` directory.
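A hyperparameter sweep in the spirit of the `tune_*.py` scripts could be sketched as follows (the swept parameter, values, and budgets are illustrative; the actual scripts may tune other components of the model):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy
from rendezvous_env import RendezvousEnv  # class name assumed

# Sweep one illustrative hyperparameter and compare mean episode returns.
for lr in (1e-4, 3e-4, 1e-3):
    model = PPO("MlpPolicy", RendezvousEnv(), learning_rate=lr, verbose=0)
    model.learn(total_timesteps=50_000)
    mean_r, std_r = evaluate_policy(model, RendezvousEnv(), n_eval_episodes=10)
    print(f"learning_rate={lr}: {mean_r:.2f} +/- {std_r:.2f}")
```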