This project was made in a learning context, so the code may contain some errors.
(There is a list in the "Bug List" file in the doc folder if you are interested in helping the project!)
This project was built with Gymnasium from the Farama Foundation, a Python library made for Reinforcement Learning and Q-Learning.
(If you want to see what Gymnasium is, click here to go to the Gymnasium GitHub page.)
If you want more information about Q-Learning and the Frozen Lake game, you can read this Medium article, which helped me a lot to understand how Q-Learning works: Q-Learning For Beginners by Maxime Labonne
Welcome to one of the most detailed versions of the
Frozen-Lake Q-Learning project
Ver. 2.1.0
As its name suggests, this project is an ultra-detailed version of the Frozen-Lake Q-Learning project.
This program trains an agent on the Frozen-Lake game over a number of episodes that the user enters at the start of the program. The training uses the Exploration X Exploitation method: the agent explores the environment but also exploits the Q-Table updated so far, which leads to a better Q-Table at the end (see the sketch below).
The program also offers the user the possibility of testing the Q-Table obtained after the training.
During the training, as during the test, a lot of data is printed in detail in the console during the sessions.
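To give an idea of what the Exploration X Exploitation method looks like in code, here is a minimal sketch of an epsilon-greedy action choice; the names (`choose_action`, `epsilon`, `qtable`, `env`) are assumptions for illustration and may differ from the actual project code:

```python
import numpy as np

# Minimal sketch (assumed names): pick a random action to explore,
# or the best action from the Q-Table to exploit.
def choose_action(env, qtable, state, epsilon):
    if np.random.random() < epsilon or np.max(qtable[state, :]) == 0:
        return env.action_space.sample()        # exploration: random action
    return int(np.argmax(qtable[state, :]))     # exploitation: best known action
```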
To run this project correctly, you need to install a few packages:
- gymnasium(ToyText):
pip install "gymnasium[toytext]"
- matplotlib:
pip install matplotlib
- numpy:
pip install numpy
- pygame:
pip install pygame
- time:
part of the Python standard library, no installation needed
- warnings:
part of the Python standard library, no installation needed
(optional, only used to hide a warning)
- nb_success: used in the formula nb_success / episodes * 100 to calculate the success rate of the training and of the test of the training (a small sketch follows this list)
- best_sequence: list of states in the best (shortest) episode that reaches the goal
- longest_best_sequence: list of states in the longest episode that reaches the goal
- longest_sequence: list of states in the longest episode that does not reach the goal
- shortest_sequence: list of states in the shortest episode that does not reach the goal
(All the sequences appear both in the input format (0, 1, 2, 3) and in the word format (LEFT, DOWN, RIGHT, UP).)
- reward_counter: number of times the agent obtains the reward
- reward_episode: list of the episodes in which the agent obtains the reward
- reward_sequence: list of the states in the episodes in which the agent obtains the reward
- recurent_sequence: number of episodes in which the agent reaches the goal with the same sequence as the best sequence
- total_actions: total number of actions in the episodes where the agent reaches the goal
- action_counts[action_words[action]]: number of actions per type of action (LEFT, DOWN, RIGHT, UP)
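As an illustration of how some of these values relate to each other, here is a minimal, self-contained sketch; the per-episode data is made up, only the variable names above are reused, and the computation is an assumption rather than the project's exact code:

```python
# Illustrative data only: one reward and one action list per episode.
rewards_per_episode = [0.0, 1.0, 0.0, 1.0]
actions_per_episode = [[2, 2, 1], [1, 1, 2, 2], [0], [1, 1, 2, 2]]

action_words = {0: "LEFT", 1: "DOWN", 2: "RIGHT", 3: "UP"}
action_counts = {word: 0 for word in action_words.values()}

episodes = len(rewards_per_episode)
nb_success = sum(1 for r in rewards_per_episode if r > 0)   # episodes that obtained the reward
success_rate = nb_success / episodes * 100                  # formula from the list above

for episode_actions in actions_per_episode:
    for action in episode_actions:
        action_counts[action_words[action]] += 1            # count actions by type

print(f"Success rate: {success_rate:.1f}%")
print(action_counts)
```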
- 2x2 map
- 4x4 map
- 8x8 map
- 16x16 map
(The list of predefined maps and of the randomly generated ones is in the map.txt file in the tools folder; a short creation sketch follows below.)
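For illustration, here is a minimal sketch of how such maps can be created with Gymnasium; the 4x4 and 8x8 layouts are built into FrozenLake-v1, while other sizes are passed as custom grids (the 2x2 grid below is only an example, the project's actual grids are in tools/map.txt):

```python
import gymnasium as gym
from gymnasium.envs.toy_text.frozen_lake import generate_random_map

# Built-in layouts shipped with FrozenLake-v1.
env_4x4 = gym.make("FrozenLake-v1", map_name="4x4")
env_8x8 = gym.make("FrozenLake-v1", map_name="8x8")

# Custom layout passed as a grid of rows (S = start, F = frozen, H = hole, G = goal).
# Example grid only; the project's real grids live in tools/map.txt.
env_2x2 = gym.make("FrozenLake-v1", desc=["SF",
                                          "FG"])

# Randomly generated layout of a given size.
env_random = gym.make("FrozenLake-v1", desc=generate_random_map(size=8))
```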
The Q-Injection is a feature whose goal is to test Q-Tables such as:
- Randomized Q-Table
- Trained Q-Table (obtained by a training done by our team)
- A start of a trained Q-Table (three values)
It also lets you train them to obtain better results using the Exploration X Exploitation method (a small sketch follows below).
(For more information about the Q-Injection, read the injection.md file in the tools folder.)
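To illustrate the idea behind the Q-Injection, here is a minimal sketch of building Q-Tables of the kinds listed above before training; the shapes and the injected values are purely illustrative assumptions, the real details are in injection.md:

```python
import numpy as np

n_states, n_actions = 16, 4   # e.g. a 4x4 map: 16 tiles, 4 actions

# Randomized Q-Table: every state/action value starts with a random number.
random_qtable = np.random.rand(n_states, n_actions)

# "Start of a trained" Q-Table: mostly zeros with a few hand-picked values
# (the three values below are made up for the example).
partial_qtable = np.zeros((n_states, n_actions))
partial_qtable[14, 2] = 0.5   # nudge RIGHT on the tile next to the goal
partial_qtable[13, 2] = 0.3
partial_qtable[9, 1] = 0.1

# A fully trained Q-Table could be loaded from disk instead; training then
# continues from the injected table rather than from an all-zero one.
```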
For those who are interested in how the Q-Table is calculated, here is an explanation:
(Hopefully it helps you understand Q-Learning)
qtable[state, action] = qtable[state, action] + alpha * (reward + gamma * np.max(qtable[next_state, :]) - qtable[state, action])
- qtable[state, action]: the current value of action (0, 1, 2, 3 (LEFT, DOWN, RIGHT, UP)) in state (the number of the tile) in the Q-Table. This is the value we will update.
- alpha: the learning rate. It controls the extent to which new information is integrated into the old values of the Q-Table. A high value means that new information has a greater impact on the existing values, while a low value means it has a lesser impact.
- reward: the immediate reward obtained after taking action in state. This reward is a positive float (1.0) when the agent reaches the goal.
- gamma: the discount factor. It represents the importance of future rewards compared to immediate rewards. A gamma close to 1 gives great importance to future rewards, while a gamma close to 0 makes the agent focus almost exclusively on the immediate reward.
- np.max(qtable[next_state, :]): the maximum value among all possible actions in the next state (next_state). This represents the best estimate of the future value that the agent can obtain from the next state.
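Putting the formula in context, here is a minimal, self-contained training loop built around this update rule; the hyperparameter values (alpha, gamma, epsilon) and the epsilon decay are illustrative assumptions, not necessarily the project's settings:

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=True)
qtable = np.zeros((env.observation_space.n, env.action_space.n))

alpha, gamma, epsilon = 0.5, 0.9, 1.0   # illustrative hyperparameters
episodes = 1000

for episode in range(episodes):
    state, _ = env.reset()
    done = False
    while not done:
        # Exploration X Exploitation: random action with probability epsilon,
        # otherwise the best known action from the Q-Table.
        if np.random.random() < epsilon or np.max(qtable[state, :]) == 0:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(qtable[state, :]))

        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated

        # The update rule explained above.
        qtable[state, action] = qtable[state, action] + alpha * (
            reward + gamma * np.max(qtable[next_state, :]) - qtable[state, action]
        )
        state = next_state

    epsilon = max(epsilon - 1 / episodes, 0.0)   # shift from exploration to exploitation
```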