# Deep RL Arm Manipulation

This project is based on the Nvidia open source project "jetson-reinforcement" developed by [Dustin Franklin](https://github.com/dusty-nv). The goal of the project is to create a DQN agent and define reward functions to teach a robotic arm to carry out two primary objectives:

1. Have any part of the robot arm touch the object of interest, with at least 90% accuracy.
2. Have only the gripper base of the robot arm touch the object, with at least 80% accuracy.

## Building from Source (Nvidia Jetson TX2)

Run the following commands from the terminal to build the project from source:

``` bash
$ sudo apt-get install cmake
$ git clone http://github.com/udacity/RoboND-DeepRL-Project
$ cd RoboND-DeepRL-Project
$ git submodule update --init
$ mkdir build
$ cd build
$ cmake ../
$ make
```

During the `cmake` step, Torch will be installed, so this step can take a while. It will download packages and ask for your `sudo` password during the install.

## Testing the API

To make sure that the reinforcement learners are still functioning properly from C++, a simple example of using the API called [`catch`](samples/catch/catch.cpp) is provided. Similar in concept to Pong, a ball drops from the top of the screen, and the agent must catch it before it reaches the bottom of the screen by moving its paddle left or right.

To test the textual [`catch`](samples/catch/catch.cpp) sample, run the following executable from the terminal. After around 100 episodes, the agent should start winning the episodes nearly 100% of the time:

``` bash
$ cd RoboND-DeepRL-Project/build/aarch64/bin
$ ./catch
[deepRL] input_width: 64
[deepRL] input_height: 64
[deepRL] input_channels: 1
[deepRL] num_actions: 3
[deepRL] optimizer: RMSprop
[deepRL] learning rate: 0.01
[deepRL] replay_memory: 10000
[deepRL] batch_size: 32
[deepRL] gamma: 0.9
[deepRL] epsilon_start: 0.9
[deepRL] epsilon_end: 0.05
[deepRL] epsilon_decay: 200.0
[deepRL] allow_random: 1
[deepRL] debug_mode: 0
[deepRL] creating DQN model instance
[deepRL] DQN model instance created
[deepRL] DQN script done init
[cuda] cudaAllocMapped 16384 bytes, CPU 0x1020a800000 GPU 0x1020a800000
[deepRL] pyTorch THCState 0x0318D490
[deepRL] nn.Conv2d() output size = 800
WON! episode 1
001 for 001 (1.0000)
WON! episode 5
004 for 005 (0.8000)
WON! episode 10
007 for 010 (0.7000)
WON! episode 15
010 for 015 (0.6667)
WON! episode 20
013 for 020 (0.6500) 13 of last 20 (0.65) (max=0.65)
WON! episode 25
015 for 025 (0.6000) 11 of last 20 (0.55) (max=0.65)
LOST episode 30
018 for 030 (0.6000) 11 of last 20 (0.55) (max=0.65)
LOST episode 35
019 for 035 (0.5429) 09 of last 20 (0.45) (max=0.65)
WON! episode 40
022 for 040 (0.5500) 09 of last 20 (0.45) (max=0.65)
LOST episode 45
024 for 045 (0.5333) 09 of last 20 (0.45) (max=0.65)
WON! episode 50
027 for 050 (0.5400) 09 of last 20 (0.45) (max=0.65)
WON! episode 55
031 for 055 (0.5636) 12 of last 20 (0.60) (max=0.65)
LOST episode 60
034 for 060 (0.5667) 12 of last 20 (0.60) (max=0.65)
WON! episode 65
038 for 065 (0.5846) 14 of last 20 (0.70) (max=0.70)
WON! episode 70
042 for 070 (0.6000) 15 of last 20 (0.75) (max=0.75)
LOST episode 75
045 for 075 (0.6000) 14 of last 20 (0.70) (max=0.75)
WON! episode 80
050 for 080 (0.6250) 16 of last 20 (0.80) (max=0.80)
WON! episode 85
055 for 085 (0.6471) 17 of last 20 (0.85) (max=0.85)
WON! episode 90
059 for 090 (0.6556) 17 of last 20 (0.85) (max=0.85)
WON! episode 95
063 for 095 (0.6632) 18 of last 20 (0.90) (max=0.90)
WON! episode 100
068 for 100 (0.6800) 18 of last 20 (0.90) (max=0.90)
WON! episode 105
073 for 105 (0.6952) 18 of last 20 (0.90) (max=0.90)
WON! episode 110
078 for 110 (0.7091) 19 of last 20 (0.95) (max=0.95)
WON! episode 111
079 for 111 (0.7117) 19 of last 20 (0.95) (max=0.95)
WON! episode 112
080 for 112 (0.7143) 20 of last 20 (1.00) (max=1.00)
```
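
The `epsilon_*` values in the log control the agent's exploration schedule: it begins acting mostly at random (`epsilon_start: 0.9`) and gradually shifts to exploiting its learned policy (`epsilon_end: 0.05`), which is why the win rate climbs over the first ~100 episodes. As a rough illustration only, assuming the exponential decay rule common to DQN implementations (the repo's actual rule lives inside the library and may differ):

``` cpp
#include <cmath>
#include <cstdio>

// Sketch of an epsilon-greedy exploration schedule using the hyperparameters
// printed in the log above. The form eps_end + (eps_start - eps_end) *
// exp(-t / eps_decay) is a common DQN convention, assumed here for illustration.
int main()
{
	const float epsStart = 0.9f;    // epsilon_start
	const float epsEnd   = 0.05f;   // epsilon_end
	const float epsDecay = 200.0f;  // epsilon_decay

	for( int t = 0; t <= 1000; t += 100 )
	{
		const float eps = epsEnd + (epsStart - epsEnd) * expf(-(float)t / epsDecay);
		printf("step %4d -> epsilon %.3f\n", t, eps);
	}

	return 0;
}
```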

Internally, [`catch`](samples/catch/catch.cpp) uses the [`dqnAgent`](c/dqnAgent.h) API from our C++ library to implement the learning.
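
In outline, driving a `dqnAgent` is a request/response loop: the environment fills a state tensor, asks the agent for an action, applies it, and reports back the resulting reward. The sketch below is modeled on that loop; the `Tensor::Alloc()` call and the exact `Create()` argument list are assumptions for illustration, so consult [`dqnAgent.h`](c/dqnAgent.h) and [`catch.cpp`](samples/catch/catch.cpp) for the authoritative signatures:

``` cpp
#include "dqnAgent.h"

// Illustrative dqnAgent training loop, modeled on catch.cpp. The input
// dimensions and action count match the log output above; Tensor::Alloc()
// and the Create() parameters are assumptions -- see c/dqnAgent.h.
int main()
{
	dqnAgent* agent = dqnAgent::Create(64, 64, 1, /*numActions=*/3);

	if( !agent )
		return 1;

	Tensor* state = Tensor::Alloc(64, 64, 1);   // assumed allocator for the input state
	bool gameOver = false;

	for( int step = 0; step < 100 && !gameOver; step++ )
	{
		// 1. fill 'state' with the current observation (e.g. the catch playfield)

		// 2. request the agent's next action for this state
		int action = 0;
		if( !agent->NextAction(state, &action) )
			break;

		// 3. apply 'action' to the environment, then report the scalar reward
		//    (and whether the episode ended) so the agent can learn from it
		const float reward = 0.0f;   // placeholder; computed by the environment
		agent->NextReward(reward, gameOver);
	}

	return 0;
}
```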

## Project Environment

To get started with the project environment, run the following:

``` bash
$ cd RoboND-DeepRL-Project/build/aarch64/bin
$ ./gazebo-arm.sh
```

<img src="https://github.com/dusty-nv/jetson-reinforcement/raw/master/docs/images/gazebo.png">

The plugins which hook the learning into the simulation are located in the `gazebo/` directory of the repo. The RL agent and the reward functions are to be defined in [`ArmPlugin.cpp`](gazebo/ArmPlugin.cpp).
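
One common starting point for the interim reward is to shape it from the change in distance between the gripper and the object, smoothed with a moving average so per-frame jitter doesn't dominate. The sketch below is illustrative only; the names and constants (`avgGoalDelta`, `alpha`) are assumptions, not the plugin's actual code:

``` cpp
// Illustrative interim reward for ArmPlugin.cpp. All names and constants
// here are assumptions for the sketch, not the plugin's real code.
// Idea: reward per-frame progress of the gripper toward the object,
// smoothed with an exponential moving average to suppress jitter.

static float avgGoalDelta = 0.0f;   // smoothed change in gripper-to-object distance

float interimReward( float distToGoal, float lastDistToGoal )
{
	const float alpha     = 0.4f;                         // smoothing factor (assumed)
	const float distDelta = lastDistToGoal - distToGoal;  // > 0 when moving closer

	avgGoalDelta = (avgGoalDelta * alpha) + (distDelta * (1.0f - alpha));
	return avgGoalDelta;
}
```

Terminal events, such as the arm touching the object (or only the gripper base, for the second objective) or colliding with the ground, would then issue large fixed win/loss rewards on top of this shaping term.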