In this project you will apply reinforcement learning techniques for a self-driving agent in a simplified world to aid it in effectively reaching its destinations in the allotted time. You will first investigate the environment the agent operates in by constructing a very basic driving implementation. Once your agent is successful at operating within the environment, you will then identify each possible state the agent can be in when considering such things as traffic lights and oncoming traffic at each intersection. With states identified, you will then implement a Q-Learning algorithm for the self-driving agent to guide the agent towards its destination within the allotted time. Finally, you will improve upon the Q-Learning algorithm to find the best configuration of learning and exploration factors to ensure the self-driving agent is reaching its destinations with consistently positive results.
In the not-so-distant future, taxicab companies across the United States no longer employ human drivers to operate their fleet of vehicles. Instead, the taxicabs are operated by self-driving agents, known as smartcabs, to transport people from one location to another within the cities those companies operate. In major metropolitan areas, such as Chicago, New York City, and San Francisco, an increasing number of people have come to rely on smartcabs to get to where they need to go as safely and efficiently as possible. Although smartcabs have become the transport of choice, concerns have arose that a self-driving agent might not be as safe or efficient as human drivers, particularly when considering city traffic lights and other vehicles. To alleviate these concerns, your task as an employee for a national taxicab company is to use reinforcement learning techniques to construct a demonstration of a smartcab operating in real-time to prove that both safety and efficiency can be achieved.
This project uses the following software and Python libraries:
- Python 2.7
- NumPy
- PyGame
- Helpful links for installing PyGame:
- Getting Started
- PyGame Information
- Google Group
- PyGame subreddit
If you do not have Python installed yet, it is highly recommended that you install the Anaconda distribution of Python, which already has the above packages and more included. Make sure that you select the Python 2.7 installer and not the Python 3.x installer. pygame
can then be installed using one of the following commands:
Mac: conda install -c https://conda.anaconda.org/quasiben pygame
Windows: conda install -c https://conda.anaconda.org/tlatorre pygame
Linux: conda install -c https://conda.anaconda.org/prkrekel pygame
For this assignment, you can find the smart cab
folder containing the necessary project files on the Machine Learning projects GitHub, under the projects
folder. You may download all of the files for projects we'll use in this Nanodegree program directly from this repo. Please make sure that you use the most recent version of project files when completing a project!
This project contains two directories:
/images/
: This folder contains various images of cars to be used in the graphical user interface. You will not need to modify or create any files in this directory./smartcab/
: This folder contains the Python scripts that create the environment, graphical user interface, the simulation, and the agents. You will not need to modify or create any files in this directory except foragent.py
.
In /smartcab/
are the following four files:
- Modify:
agent.py
: This is the main Python file where you will be performing your work on the project.
- Do not modify:
environment.py
: This Python file will create the smartcab environment.planner.py
: This Python file creates a high-level planner for the agent to follow towards a set goal.simulation.py
: This Python file creates the simulation and graphical user interface.
In a terminal or command window, navigate to the top-level project directory smartcab/
(that contains the two project directories) and run one of the following commands:
python smartcab/agent.py
or
python -m smartcab.agent
This will run the agent.py
file and execute your implemented agent code into the environment. A README file has also been provided with the project files which may contain additional necessary information or instruction for the project. The following Definitions and Tasks slides will provide details for how the project will be completed.
The smartcab operates in an ideal, grid-like city (similar to New York City), with roads going in the North-South and East-West directions. Other vehicles will certainly be present on the road, but there will be no pedestrians to be concerned with. At each intersection there is a traffic light that either allows traffic in the North-South direction or the East-West direction. U.S. Right-of-Way rules apply:
- On a green light, a left turn is permitted if there is no oncoming traffic making a right turn or coming straight through the intersection.
- On a red light, a right turn is permitted if no oncoming traffic is approaching from your left through the intersection. To understand how to correctly yield to oncoming traffic when turning left, you may refer to this official drivers? education video, or this passionate exposition.
Assume that the smartcab is assigned a route plan based on the passengers? starting location and destination. The route is split at each intersection into waypoints, and you may assume that the smartcab, at any instant, is at some intersection in the world. Therefore, the next waypoint to the destination, assuming the destination has not already been reached, is one intersection away in one direction (North, South, East, or West). The smartcab has only an egocentric view of the intersection it is at: It can determine the state of the traffic light for its direction of movement, and whether there is a vehicle at the intersection for each of the oncoming directions. For each action, the smartcab may either idle at the intersection, or drive to the next intersection to the left, right, or ahead of it. Finally, each trip has a time to reach the destination which decreases for each action taken (the passengers want to get there quickly). If the allotted time becomes zero before reaching the destination, the trip has failed.
The smartcab receives a reward for each successfully completed trip, and also receives a smaller reward for each action it executes successfully that obeys traffic rules. The smartcab receives a small penalty for any incorrect action, and a larger penalty for any action that violates traffic rules or causes an accident with another vehicle. Based on the rewards and penalties the smartcab receives, the self-driving agent implementation should learn an optimal policy for driving on the city roads while obeying traffic rules, avoiding accidents, and reaching passengers? destinations in the allotted time.
You will be required to submit a project report along with your modified agent code as part of your submission. As you complete the tasks below, include thorough, detailed answers to each question provided in italics.
To begin, your only task is to get the smartcab to move around in the environment. At this point, you will not be concerned with any sort of optimal driving policy. Note that the driving agent is given the following information at each intersection:
- The next waypoint location relative to its current location and heading.
- The state of the traffic light at the intersection and the presence of oncoming vehicles from other directions.
- The current time left from the allotted deadline.
To complete this task, simply have your driving agent choose a random action from the set of possible actions (None
, 'forward'
, 'left'
, 'right'
) at each intersection, disregarding the input information above. Set the simulation deadline enforcement, enforce_deadline
to False
and observe how it performs.
QUESTION: Observe what you see with the agent's behavior as it takes random actions. Does the smartcab eventually make it to the destination? Are there any other interesting observations to note?
Now that your driving agent is capable of moving around in the environment, your next task is to identify a set of states that are appropriate for modeling the smartcab and environment. The main source of state variables are the current inputs at the intersection, but not all may require representation. You may choose to explicitly define states, or use some combination of inputs as an implicit state. At each time step, process the inputs and update the agent's current state using the self.state
variable. Continue with the simulation deadline enforcement enforce_deadline
being set to False
, and observe how your driving agent now reports the change in state as the simulation progresses.
QUESTION: What states have you identified that are appropriate for modeling the smartcab and environment? Why do you believe each of these states to be appropriate for this problem?
OPTIONAL: How many states in total exist for the smartcab in this environment? Does this number seem reasonable given that the goal of Q-Learning is to learn and make informed decisions about each state? Why or why not?
With your driving agent being capable of interpreting the input information and having a mapping of environmental states, your next task is to implement the Q-Learning algorithm for your driving agent to choose the best action at each time step, based on the Q-values for the current state and action. Each action taken by the smartcab will produce a reward which depends on the state of the environment. The Q-Learning driving agent will need to consider these rewards when updating the Q-values. Once implemented, set the simulation deadline enforcement enforce_deadline
to True
. Run the simulation and observe how the smartcab moves about the environment in each trial.
The formulas for updating Q-values can be found in this video.
QUESTION: What changes do you notice in the agent's behavior when compared to the basic driving agent when random actions were always taken? Why is this behavior occurring?
Your final task for this project is to enhance your driving agent so that, after sufficient training, the smartcab is able to reach the destination within the allotted time safely and efficiently. Parameters in the Q-Learning algorithm, such as the learning rate (alpha
), the discount factor (gamma
) and the exploration rate (epsilon
) all contribute to the driving agent?s ability to learn the best action for each state. To improve on the success of your smartcab:
- Set the number of trials,
n_trials
, in the simulation to 100. - Run the simulation with the deadline enforcement
enforce_deadline
set toTrue
(you will need to reduce the update delayupdate_delay
and set thedisplay
toFalse
). - Observe the driving agent?s learning and smartcab?s success rate, particularly during the later trials.
- Adjust one or several of the above parameters and iterate this process.
This task is complete once you have arrived at what you determine is the best combination of parameters required for your driving agent to learn successfully.
QUESTION: Report the different values for the parameters tuned in your basic implementation of Q-Learning. For which set of parameters does the agent perform best? How well does the final driving agent perform?
QUESTION: Does your agent get close to finding an optimal policy, i.e. reach the destination in the minimum possible time, and not incur any penalties? How would you describe an optimal policy for this problem?
Your project will be reviewed by a Udacity reviewer against the Train a Smartcab to Drive project rubric. Be sure to review this rubric thoroughly and self-evaluate your project before submission. All criteria found in the rubric must be meeting specifications for you to pass.
When you are ready to submit your project, collect the following files and compress them into a single archive for upload. Alternatively, you may supply the following files on your GitHub Repo in a folder named smartcab
for ease of access:
- The
agent.py
Python file with all code implemented as required in the instructed tasks. - A PDF project report with the name report.pdf which answers all of the questions related to the tasks completed. This file must be present for your project to be evaluated.
Once you have collected these files and reviewed the project rubric, proceed to the project submission page.
When you're ready to submit your project, click on the Submit Project button at the bottom of the page.
If you are having any problems submitting your project or wish to check on the status of your submission, please email us at [email protected] or visit us in the discussion forums.
You will get an email as soon as your reviewer has feedback for you. In the meantime, review your next project and feel free to get started on it or the courses supporting it!