# Part 1 - The Big Picture
## Robot Model

Our ROV consists of 3 different layers. From lowest to highest level:
1. Motor code, which drives the motors at certain voltages (i.e. speeds).
2. An onboard Raspberry Pi, which tells the Arduino what to do and streams a live video feed to the surface.
3. A surface computer, which we use to give commands to the ROV.

Mostly we'll be working between layers 2 and 3, creating computer vision algorithms to complete tasks.

To make our lives a little simpler, we'll be using ROS to handle all the communication between layers.
## ROS
[ROS](http://wiki.ros.org/) is the Robot Operating System. Like any other operating system, it abstracts away a lot of the nitty-gritty logistical details from you - the programmer.

ROS helps different programs (nodes) communicate with each other: publishers put data out on named topics, and subscribers read that data in. There's a lot of networking involved in getting data from one place to another, and ROS provides an abstraction for that.

ROS also moonlights as a package manager. It supports both Python and C++, so it has a lot of versatility. A big bonus of using ROS is that nodes are totally independent of programming language - so we can have one person write a publisher in Python and another write a subscriber in C++.

We'll be writing most of the code in this tutorial in Python, but we could have written it in C++.
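To make that concrete, here's a minimal sketch of a publisher and subscriber pair in rospy (ROS 1). The topic name is the one this tutorial publishes to later; using `geometry_msgs/Twist` as the message type is an assumption based on the velocity commands in `cam_control.py`.

```
# minimal_pub.py - sketch of a publisher node (assumes ROS 1 / rospy)
import rospy
from geometry_msgs.msg import Twist

rospy.init_node('minimal_pub')
pub = rospy.Publisher('/wheely_boi/wheely_boi/cmd', Twist, queue_size=10)
rate = rospy.Rate(10)  # publish at 10 Hz

while not rospy.is_shutdown():
    t = Twist()
    t.linear.x = 0.1   # creep forward
    pub.publish(t)
    rate.sleep()
```

```
# minimal_sub.py - the matching subscriber (could just as well be C++)
import rospy
from geometry_msgs.msg import Twist

def callback(msg):
    rospy.loginfo("Got v: %.1f" % msg.linear.x)

rospy.init_node('minimal_sub')
rospy.Subscriber('/wheely_boi/wheely_boi/cmd', Twist, callback)
rospy.spin()
```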
Underwater simulator: https://github.com/clydemcqueen/orca2
# Building cam_control.py

Now we'll apply our newfound ML knowledge to a ROS controller.

Here's our roadmap for this file:
- Make a new thread to stream video from a source
- Load a trained model to recognize handwritten digits
- Publish findings to '/wheely_boi/wheely_boi/cmd'
## Threading
The VM we're running our image recognition on doesn't have that many resources. So if we try to toss every frame of our video stream into the model, we'll be met with poor performance and a lot of lag.

To avoid this, we'll create a new thread dedicated to streaming the video, and only grab and process a frame from it every second or so.

Now let's jump into the code.
```
from threading import Thread
from tools.VideoStream import VideoStream
...
stream = VideoStream(src)  # src identifies the video source (e.g. a camera index)
stream.start()             # kicks off VideoStream.run() on its own thread
```

This chunk of code creates a new VideoStream object, and that's where the threading magic happens.
```
class VideoStream(Thread):
    def __init__(self, src=0):
        Thread.__init__(self)
```
`VideoStream` is an object which extends `Thread`, which means it has support for threading. We call `Thread.__init__(self)` to invoke the superclass constructor for the object.

Now when we run `stream.start()`, a new thread will start and execute `VideoStream.run()`: a while loop that just keeps reading from the video source.
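The tutorial doesn't show `run()` here, but a minimal sketch of the idea might look like this - using OpenCV's `cv2.VideoCapture` and a `self.frame` attribute to hold the latest frame (the attribute name matches how `cam_control.py` uses it below; everything else is an assumption):

```
import cv2
from threading import Thread

class VideoStream(Thread):
    def __init__(self, src=0):
        Thread.__init__(self)
        self.capture = cv2.VideoCapture(src)  # open the video source
        self.frame = None                     # latest frame, read by consumers
        self.running = True

    def run(self):
        # keep overwriting self.frame so readers always see the newest frame
        while self.running:
            ok, frame = self.capture.read()
            if ok:
                self.frame = frame

    def stop(self):
        self.running = False
```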
Now, back in the while loop in `cam_control.py`, we'll grab a frame with `frame = stream.frame`. This is the frame that we'll throw into our model for prediction.
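Putting the two pieces together, the main loop in `cam_control.py` might be sketched like this (the `rate` throttling and the `None` check are assumptions; the other names come from the walkthrough below):

```
while not rospy.is_shutdown():
    frame = stream.frame  # grab the latest frame from the stream thread
    if frame is not None:
        # classify the frame and publish the matching velocity command
        publish(find_label(frame, network), t, velocity_publisher)
    rate.sleep()          # only sample a frame every second or so
```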
## Digit Recognition
It's a little hidden in this function, but all our digit recognition happens within a call to `find_label(frame, network)` inside our while loop.

`find_label()` is a function that lives in `cam_util.py`, a utility file containing some useful helper functions.
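One note before the walkthrough: `network` is the trained digit classifier from the roadmap's "load a trained model" step. In PyTorch that step typically looks something like the sketch below (the `Net` class and the checkpoint path are assumptions, not the tutorial's actual names):

```
import torch
from model import Net  # hypothetical module defining the network architecture

network = Net()
network.load_state_dict(torch.load('mnist_net.pth'))  # hypothetical checkpoint path
network.eval()  # inference mode: disables dropout and batch-norm updates
```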
Let's walk through a call to `find_label()`:
```
# inside find_label():
final_img = img_to_mnist(frame)

# the helper it calls, from cam_util.py:
def img_to_mnist(frame):
    gray_img = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray_img = cv2.GaussianBlur(gray_img, (5, 5), 0)
    # adaptive thresholding does better with variable lighting:
    gray_img = cv2.adaptiveThreshold(gray_img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                     cv2.THRESH_BINARY_INV, blockSize=321, C=28)
    return gray_img
```
First we'll do some filtering on our input frame with `img_to_mnist()`. Recall that the images in the MNIST dataset are all black and white, but our camera pictures things in color, so we'll apply some filters to get the frame into a usable state. Going play by play:
1. `cv2.cvtColor`: First we convert the image to grayscale.
2. `cv2.GaussianBlur`: Next we apply a blur to reduce noise. This is helpful as it reduces the number of contours we'll have to sift through later.
3. `cv2.adaptiveThreshold`: This function pushes the image to nearly pure black and white. It's responsible for making our image look like an MNIST data point.
```
contours, _ = cv2.findContours(final_img.copy(), cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
```
Next, we find the contours - the boundaries in our image. This will hopefully give us an outline of our handwritten digit, if `img_to_mnist()` did its job correctly.
```
rects = [cv2.boundingRect(contour) for contour in contours]
# each rect is (x, y, width, height); drop rectangles that are too small
rects = [rect for rect in rects if rect[2] >= 3 and rect[3] >= 8]
...
# keep the rectangle with the largest area
max_rect_i = np.argmax([rect[2] * rect[3] for rect in rects])
rect = rects[max_rect_i]
```
Here we compute a bounding rectangle for each contour, filter out the tiny ones, and single out the largest rectangle present in the image (if there is one) - hopefully the box around our digit.
```
# inside find_label():
mnist_frame = extract_digit(frame, final_img, rect, pad=15)

# the helper it calls:
def extract_digit(frame, img, rect, pad=10, SIZE=28):
    x, y, w, h = rect  # unpack the bounding box
    cropped_digit = img[y-pad:y+h+pad, x-pad:x+w+pad]
    cropped_digit = cropped_digit / 255.0  # scale pixel values to [0, 1]
    # only look at images that are somewhat big:
    if cropped_digit.shape[0] < 32 or cropped_digit.shape[1] < 32:
        return
    return cv2.resize(cropped_digit, (SIZE, SIZE))
```
Given the predetermined rectangle and its contents, `extract_digit()` crops the image (with some padding) and resizes it to 28x28 to better fit our model. Crops that are too small are ignored.
```
model_in = torch.tensor(mnist_frame).float()
# lift the image into the (batch, channel, height, width) shape the model expects
model_in = model_in.unsqueeze(0).unsqueeze(0)
with torch.no_grad():
    output = network(model_in)
label = output.data.max(1, keepdim=True)[1][0][0]
ret = int(label)
```
Now we run into some PyTorch voodoo. First we spend some time getting `model_in` into a format that our neural network will accept: the two `unsqueeze(0)` calls turn the 28x28 image into a 1x1x28x28 tensor - a batch of one single-channel image.

Next, we do `output = network(model_in)` to get an output from our network, then extract the label with the highest score and return it.
## Publishing
Our publishing is essentially the same as it was in `key_in.py`, but with different signals.

In our while loop, we'll simply tell ROS to publish a message according to what our model decides the output should be.
```
# inside the while loop:
publish(find_label(frame, network), t, velocity_publisher)

# the helper it calls:
def publish(signal, t, velocity_publisher):
    if signal == 8:    # 8: speed up
        t.linear.x = min(t.linear.x + 0.1, 1.0)
    elif signal == 4:  # 4: turn one way
        t.angular.z = min(t.angular.z + 0.1, 1.0)
    elif signal == 6:  # 6: slow down / reverse
        t.linear.x = max(t.linear.x - 0.1, -1.0)
    elif signal == 2:  # 2: turn the other way
        t.angular.z = max(t.angular.z - 0.1, -1.0)
    else:              # any other digit: stop
        t.linear.x = 0
        t.angular.z = 0
    t.linear.x = round(t.linear.x, 1)
    t.angular.z = round(t.angular.z, 1)
    rospy.loginfo("Sending Command v:" + str(t.linear.x)
                  + ", y:" + str(t.angular.z))
    velocity_publisher.publish(t)
```
`publish(signal, t, velocity_publisher)` is essentially lifted from `key_in.py`.