diff --git a/INSTRUCTION.md b/INSTRUCTION.md
index 86909c34..c629f874 100644
--- a/INSTRUCTION.md
+++ b/INSTRUCTION.md
@@ -68,7 +68,7 @@ You will need to implement the following features:
   * See notes on diffuse/specular in `scatterRay` and on imperfect specular below.
 * Path continuation/termination using Stream Compaction from Project 2.
 * After you have a [basic pathtracer up and running](img/REFERENCE_cornell.5000samp.png),
-implement a means of making rays/pathSegments/intersections contiguous in memory by material type. This should be easily toggleable.
+  implement a means of making rays/pathSegments/intersections contiguous in memory by material type. This should be easily toggleable.
   * Consider the problems with coloring every path segment in a buffer and performing BSDF evaluation using one big shading kernel: different materials/BSDF evaluations within the kernel will take different amounts of time to complete.
   * Sort the rays/path segments so that rays/paths interacting with the same material are contiguous in memory before shading. How does this impact performance? Why?
 * A toggleable option to cache the first bounce intersections for re-use across all subsequent iterations. Provide performance benefit analysis across different max ray depths.
@@ -149,8 +149,8 @@ For each extra feature, you must provide the following analysis:
 
 You'll be working in the following files. Look for important parts of the code:
 
-* Search for `CHECKITOUT`.
-* You'll have to implement parts labeled with `TODO`. (But don't let these constrain you - you have free rein!)
+* Search for `CHECKITOUT`. (`interactions.h`, `intersections.h` and `main.cpp`)
+* You'll have to implement parts labeled with `TODO`. (But don't let these constrain you - you have free rein!) (`interactions.h`, `pathtrace.cu`)
 
 * `src/pathtrace.cu`: path tracing kernels, device functions, and calling code
   * `pathtraceInit` initializes the path tracer state - it should copy scene data (e.g. geometry, materials) from `Scene`.
diff --git a/README.md b/README.md
index 110697ce..f82a683d 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,45 @@
 CUDA Path Tracer
 ================
 
-**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3**
+![](img/Demo.png)
 
-* (TODO) YOUR NAME HERE
-* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
 
-### (TODO: Your README)
 
-*DO NOT* leave the README to the last minute! It is a crucial part of the
-project, and we will not be able to grade you without a good README.
+## Sorting by Material
+
+5000 iterations with the simple Cornell scene:
+
+![](img/sorting.png)
+
+Sorting by material worsened performance on the simple Cornell scene. The likely cause is that the scene has so few materials that the cost of the sort itself outweighs any gain in the shading kernel. I expect sorting to pay off in a complex scene with many materials.
+
+## Caching the First Bounce
+
+5000 iterations with the simple Cornell scene:
+
+![](img/caching.png)
+
+Caching the first bounce gives a large performance boost, as shown in the figure. When anti-aliasing is disabled, the camera rays are identical in every iteration, so the first-bounce intersections can be computed once and reused by all later iterations.
+
+## Anti-aliasing
+
+Visual comparison:
+
+![](img/AA.gif)
+
+Performance comparison:
+
+![](img/AA.png)
+
+## Notes
+
+Changes to scene files: 
+
+- Use `FOVYH` for the half-angle vertical field of view
+
+### Notes to myself
+
+ATTENTION: 
+
+1. Read access violation when using `thrust::remove_if`. Fix: pass `thrust::device` as the first argument to select the device execution policy, because `dev_paths` is in device memory.
 
diff --git a/img/AA.gif b/img/AA.gif
new file mode 100644
index 00000000..7555d7cd
Binary files /dev/null and b/img/AA.gif differ
diff --git a/img/AA.png b/img/AA.png
new file mode 100644
index 00000000..c03a0ed4
Binary files /dev/null and b/img/AA.png differ
diff --git a/img/Demo.png b/img/Demo.png
new file mode 100644
index 00000000..8df1f82b
Binary files /dev/null and b/img/Demo.png differ
diff --git a/img/caching.png b/img/caching.png
new file mode 100644
index 00000000..b833c14b
Binary files /dev/null and b/img/caching.png differ
diff --git a/img/sorting.png b/img/sorting.png
new file mode 100644
index 00000000..4afa770d
Binary files /dev/null and b/img/sorting.png differ
diff --git a/img/stuck.png b/img/stuck.png
new file mode 100644
index 00000000..cab788c9
Binary files /dev/null and b/img/stuck.png differ
diff --git a/scenes/cornell.txt b/scenes/cornell.txt
index 83ff8202..1dd0093e 100644
--- a/scenes/cornell.txt
+++ b/scenes/cornell.txt
@@ -50,8 +50,8 @@ EMITTANCE   0
 
 // Camera
 CAMERA
-RES         800 800
-FOVY        45
+RES         1920 1080
+FOVYH       45
 ITERATIONS  5000
 DEPTH       8
 FILE        cornell
@@ -114,4 +114,4 @@ sphere
 material 4
 TRANS       -1 4 -1
 ROTAT       0 0 0
-SCALE       3 3 3
+SCALE       6 6 6
\ No newline at end of file
diff --git a/scenes/newScene.txt b/scenes/newScene.txt
new file mode 100644
index 00000000..bf062455
--- /dev/null
+++ b/scenes/newScene.txt
@@ -0,0 +1,149 @@
+// Emissive material (light)
+MATERIAL 0
+RGB         1 1 1
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   5
+
+// Diffuse white
+MATERIAL 1
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse red
+MATERIAL 2
+RGB         .85 .35 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Diffuse green
+MATERIAL 3
+RGB         .35 .85 .35
+SPECEX      0
+SPECRGB     0 0 0
+REFL        0
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Specular white
+MATERIAL 4
+RGB         .98 .98 .98
+SPECEX      0
+SPECRGB     .98 .98 .98
+REFL        1
+REFR        0
+REFRIOR     0
+EMITTANCE   0
+
+// Camera
+CAMERA
+RES         1920 1080
+FOVYH       45
+ITERATIONS  5000
+DEPTH       8
+FILE        cornell
+EYE         0.0 5 10.5
+LOOKAT      0 5 0
+UP          0 1 0
+
+
+// Ceiling light
+OBJECT 0
+cube
+material 0
+TRANS       0 10 0
+ROTAT       0 0 0
+SCALE       3 .3 3
+
+// Floor
+OBJECT 1
+cube
+material 1
+TRANS       0 0 0
+ROTAT       0 0 0
+SCALE       10 .01 10
+
+// Ceiling
+OBJECT 2
+cube
+material 1
+TRANS       0 10 0
+ROTAT       0 0 90
+SCALE       .01 10 10
+
+// Back wall
+OBJECT 3
+cube
+material 1
+TRANS       0 5 -5
+ROTAT       0 90 0
+SCALE       .01 10 10
+
+// Left wall
+OBJECT 4
+cube
+material 2
+TRANS       -5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// Right wall
+OBJECT 5
+cube
+material 3
+TRANS       5 5 0
+ROTAT       0 0 0
+SCALE       .01 10 10
+
+// Sphere
+OBJECT 6
+sphere
+material 4
+TRANS       -1 2 2
+ROTAT       0 0 0
+SCALE       3 3 3
+
+// Sphere
+OBJECT 7
+sphere
+material 1
+TRANS       -1 4 0
+ROTAT       0 0 0
+SCALE       3 3 3
+
+// Sphere
+OBJECT 8
+sphere
+material 3
+TRANS       -1 5 -1
+ROTAT       0 0 0
+SCALE       3 3 3
+
+// Sphere
+OBJECT 9
+sphere
+material 2
+TRANS       0 4 -1
+ROTAT       0 0 0
+SCALE       3 3 3
+
+// Sphere
+OBJECT 10
+sphere
+material 4
+TRANS       1 2 2
+ROTAT       0 0 0
+SCALE       3 3 3
diff --git a/scenes/sphere.txt b/scenes/sphere.txt
index a74b5458..a8221ccf 100644
--- a/scenes/sphere.txt
+++ b/scenes/sphere.txt
@@ -11,7 +11,7 @@ EMITTANCE   5
 // Camera
 CAMERA
 RES         800 800
-FOVY        45
+FOVYH       45
 ITERATIONS  5000
 DEPTH       8
 FILE        sphere
diff --git a/src/interactions.h b/src/interactions.h
index f969e458..63f717d1 100644
--- a/src/interactions.h
+++ b/src/interactions.h
@@ -1,3 +1,4 @@
+// ray scattering functions
 #pragma once
 
 #include "intersections.h"
@@ -50,7 +51,7 @@ glm::vec3 calculateRandomDirectionInHemisphere(
  *
  * The visual effect you want is to straight-up add the diffuse and specular
  * components. You can do this in a few ways. This logic also applies to
- * combining other types of materias (such as refractive).
+ * combining other types of materials (such as refractive).
  *
  * - Always take an even (50/50) split between a each effect (a diffuse bounce
  *   and a specular bounce), but divide the resulting color of either branch
@@ -72,8 +73,33 @@ void scatterRay(
         glm::vec3 intersect,
         glm::vec3 normal,
         const Material &m,
-        thrust::default_random_engine &rng) {
+        thrust::default_random_engine &rng) 
+{
     // TODO: implement this.
     // A basic implementation of pure-diffuse shading will just call the
     // calculateRandomDirectionInHemisphere defined above.
+    pathSegment.ray.origin = intersect + EPSILON * normal;
+    if (m.hasReflective)
+    {
+        // specular
+        glm::vec3 reflectedDir = glm::reflect(glm::normalize(pathSegment.ray.direction),
+                                              normal);
+        glm::vec3 reflectedColor = m.specular.color;
+        pathSegment.ray.direction = reflectedDir;
+        pathSegment.color *= reflectedColor;
+    }
+    else if (m.hasRefractive)
+    {
+        // TODO: refraction is not implemented yet; the ray direction and color are left unchanged
+    }
+    else {
+        // diffuse
+        glm::vec3 diffuseDir = glm::normalize(calculateRandomDirectionInHemisphere(normal, rng));
+        glm::vec3 diffuseColor = m.color;
+        pathSegment.ray.direction = diffuseDir;
+        pathSegment.color *= diffuseColor;
+    }
+
+    pathSegment.color = glm::clamp(pathSegment.color, glm::vec3(0.0f), glm::vec3(1.0f));
+    pathSegment.remainingBounces--;
 }
diff --git a/src/intersections.h b/src/intersections.h
index b1504071..8ea283d4 100644
--- a/src/intersections.h
+++ b/src/intersections.h
@@ -1,3 +1,4 @@
+// ray intersection functions
 #pragma once
 
 #include <glm/glm.hpp>
@@ -20,14 +21,6 @@ __host__ __device__ inline unsigned int utilhash(unsigned int a) {
 }
 
 // CHECKITOUT
-/**
- * Compute a point at parameter value `t` on ray `r`.
- * Falls slightly short so that it doesn't intersect the object it's hitting.
- */
-__host__ __device__ glm::vec3 getPointOnRay(Ray r, float t) {
-    return r.origin + (t - .0001f) * glm::normalize(r.direction);
-}
-
 /**
  * Multiplies a mat4 and a vec4 and returns a vec3 clipped from the vec4.
  */
@@ -82,7 +75,7 @@ __host__ __device__ float boxIntersectionTest(Geom box, Ray r,
             tmin_n = tmax_n;
             outside = false;
         }
-        intersectionPoint = multiplyMV(box.transform, glm::vec4(getPointOnRay(q, tmin), 1.0f));
+        intersectionPoint = multiplyMV(box.transform, glm::vec4(q.evaluate(tmin), 1.0f));
         normal = glm::normalize(multiplyMV(box.invTranspose, glm::vec4(tmin_n, 0.0f)));
         return glm::length(r.origin - intersectionPoint);
     }
@@ -131,8 +124,8 @@ __host__ __device__ float sphereIntersectionTest(Geom sphere, Ray r,
         t = max(t1, t2);
         outside = false;
     }
-
-    glm::vec3 objspaceIntersection = getPointOnRay(rt, t);
+    
+    glm::vec3 objspaceIntersection = rt.evaluate(t);
 
     intersectionPoint = multiplyMV(sphere.transform, glm::vec4(objspaceIntersection, 1.f));
     normal = glm::normalize(multiplyMV(sphere.invTranspose, glm::vec4(objspaceIntersection, 0.f)));
diff --git a/src/main.cpp b/src/main.cpp
index fe8e85ec..d32083e2 100644
--- a/src/main.cpp
+++ b/src/main.cpp
@@ -2,6 +2,8 @@
 #include "preview.h"
 #include <cstring>
 
+// this is a global variable with internal linkage
+// thus can only be seen and used in this file
 static std::string startTimeString;
 
 // For camera controls
@@ -31,6 +33,7 @@ int height;
 //-------------------------------
 
 int main(int argc, char** argv) {
+    // Used for naming image files
     startTimeString = currentTimeString();
 
     if (argc < 2) {
@@ -75,6 +78,7 @@ int main(int argc, char** argv) {
     return 0;
 }
 
+// Save an image in Project3-CUDA-Path-Tracer/build/ 
 void saveImage() {
     float samples = iteration;
     // output image file
@@ -90,6 +94,8 @@ void saveImage() {
 
     std::string filename = renderState->imageName;
     std::ostringstream ss;
+    // Each run gets a distinct filename based on the time the
+    // program started (see startTimeString)
     ss << filename << "." << startTimeString << "." << samples << "samp";
     filename = ss.str();
 
@@ -98,8 +104,11 @@ void saveImage() {
     //img.saveHDR(filename);  // Save a Radiance HDR file
 }
 
+clock_t start, stop;
+
 void runCuda() {
     if (camchanged) {
+        start = clock();
         iteration = 0;
         Camera &cam = renderState->camera;
         cameraPosition.x = zoom * sin(phi) * sin(theta);
@@ -142,6 +151,10 @@ void runCuda() {
         saveImage();
         pathtraceFree();
         cudaDeviceReset();
+        stop = clock();
+        double timer_seconds = ((double)(stop - start)) / CLOCKS_PER_SEC;
+        std::cerr << "took " << timer_seconds << " seconds.\n";
+
         exit(EXIT_SUCCESS);
     }
 }
diff --git a/src/pathtrace.cu b/src/pathtrace.cu
index 056e1467..4d0908ee 100644
--- a/src/pathtrace.cu
+++ b/src/pathtrace.cu
@@ -3,7 +3,8 @@
 #include <cmath>
 #include <thrust/execution_policy.h>
 #include <thrust/random.h>
-#include <thrust/remove.h>
+#include <thrust/partition.h>
+#include <thrust/sort.h>
 
 #include "sceneStructs.h"
 #include "scene.h"
@@ -14,6 +15,9 @@
 #include "intersections.h"
 #include "interactions.h"
 
+#define SORT_BY_MATERIAL 0
+static bool useCachedFirstBounce = false;
+
 #define ERRORCHECK 1
 
 #define FILENAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
@@ -72,10 +76,13 @@ static glm::vec3 * dev_image = NULL;
 static Geom * dev_geoms = NULL;
 static Material * dev_materials = NULL;
 static PathSegment * dev_paths = NULL;
-static ShadeableIntersection * dev_intersections = NULL;
+static HitRecord * dev_intersections = NULL;
+static HitRecord* dev_intersections_first_bounce = NULL;
 // TODO: static variables for device memory, any extra info you need, etc
 // ...
 
+// initializes the path tracer state - it should copy scene data 
+// (e.g. geometry, materials) from `Scene`.
 void pathtraceInit(Scene *scene) {
     hst_scene = scene;
     const Camera &cam = hst_scene->state.camera;
@@ -92,14 +99,19 @@ void pathtraceInit(Scene *scene) {
     cudaMalloc(&dev_materials, scene->materials.size() * sizeof(Material));
     cudaMemcpy(dev_materials, scene->materials.data(), scene->materials.size() * sizeof(Material), cudaMemcpyHostToDevice);
 
-    cudaMalloc(&dev_intersections, pixelcount * sizeof(ShadeableIntersection));
-    cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
+    cudaMalloc(&dev_intersections, pixelcount * sizeof(HitRecord));
+    cudaMemset(dev_intersections, 0, pixelcount * sizeof(HitRecord));
 
     // TODO: initialize any extra device memeory you need
+    cudaMalloc(&dev_intersections_first_bounce, pixelcount * sizeof(HitRecord));
+    cudaMemset(dev_intersections_first_bounce, 0, pixelcount * sizeof(HitRecord));
+
+
 
     checkCUDAError("pathtraceInit");
 }
 
+// frees memory allocated by `pathtraceInit`
 void pathtraceFree() {
     cudaFree(dev_image);  // no-op if dev_image is null
     cudaFree(dev_paths);
@@ -107,6 +119,7 @@ void pathtraceFree() {
     cudaFree(dev_materials);
     cudaFree(dev_intersections);
     // TODO: clean up any extra device memory you created
+    cudaFree(dev_intersections_first_bounce);
 
     checkCUDAError("pathtraceFree");
 }
@@ -115,12 +128,16 @@ void pathtraceFree() {
 * Generate PathSegments with rays from the camera through the screen into the
 * scene, which is the first bounce of rays.
 *
+* iter: current iteration, up to ITERATIONS in the scene file's CAMERA block (5000 here); seeds the jitter RNG
+* traceDepth: DEPTH in the scene file's CAMERA block (8 here)
+* 
 * Antialiasing - add rays for sub-pixel sampling
 * motion blur - jitter rays "in time"
 * lens effect - jitter ray origin positions based on a lens
 */
 __global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, PathSegment* pathSegments)
 {
+    // x and y are pixel coordinates in screen space: [0, width) x [0, height)
     int x = (blockIdx.x * blockDim.x) + threadIdx.x;
     int y = (blockIdx.y * blockDim.y) + threadIdx.y;
 
@@ -129,12 +146,16 @@ __global__ void generateRayFromCamera(Camera cam, int iter, int traceDepth, Path
         PathSegment & segment = pathSegments[index];
 
         segment.ray.origin = cam.position;
-    segment.color = glm::vec3(1.0f, 1.0f, 1.0f);
+        segment.color = glm::vec3(1.0f, 1.0f, 1.0f);
 
         // TODO: implement antialiasing by jittering the ray
+        thrust::default_random_engine rng = makeSeededRandomEngine(iter, index, 0);
+        thrust::uniform_real_distribution<float> u01(0, 1);
+        float jitteredX = (float)x + u01(rng);
+        float jitteredY = (float)y + u01(rng);
         segment.ray.direction = glm::normalize(cam.view
-            - cam.right * cam.pixelLength.x * ((float)x - (float)cam.resolution.x * 0.5f)
-            - cam.up * cam.pixelLength.y * ((float)y - (float)cam.resolution.y * 0.5f)
+            - cam.right * cam.pixelLength.x * (jitteredX - (float)cam.resolution.x * 0.5f)
+            - cam.up * cam.pixelLength.y * (jitteredY - (float)cam.resolution.y * 0.5f)
             );
 
         segment.pixelIndex = index;
@@ -152,7 +173,7 @@ __global__ void computeIntersections(
     , PathSegment * pathSegments
     , Geom * geoms
     , int geoms_size
-    , ShadeableIntersection * intersections
+    , HitRecord * intersections
     )
 {
     int path_index = blockIdx.x * blockDim.x + threadIdx.x;
@@ -208,6 +229,7 @@ __global__ void computeIntersections(
             intersections[path_index].t = t_min;
             intersections[path_index].materialId = geoms[hit_geom_index].materialid;
             intersections[path_index].surfaceNormal = normal;
+            intersections[path_index].intersectionPoint = intersect_point;
         }
     }
 }
@@ -224,45 +246,51 @@ __global__ void computeIntersections(
 __global__ void shadeFakeMaterial (
   int iter
   , int num_paths
-    , ShadeableIntersection * shadeableIntersections
+    , HitRecord * shadeableIntersections
     , PathSegment * pathSegments
     , Material * materials
     )
 {
-  int idx = blockIdx.x * blockDim.x + threadIdx.x;
-  if (idx < num_paths)
-  {
-    ShadeableIntersection intersection = shadeableIntersections[idx];
-    if (intersection.t > 0.0f) { // if the intersection exists...
-      // Set up the RNG
-      // LOOK: this is how you use thrust's RNG! Please look at
-      // makeSeededRandomEngine as well.
-      thrust::default_random_engine rng = makeSeededRandomEngine(iter, idx, 0);
-      thrust::uniform_real_distribution<float> u01(0, 1);
-
-      Material material = materials[intersection.materialId];
-      glm::vec3 materialColor = material.color;
-
-      // If the material indicates that the object was a light, "light" the ray
-      if (material.emittance > 0.0f) {
-        pathSegments[idx].color *= (materialColor * material.emittance);
-      }
-      // Otherwise, do some pseudo-lighting computation. This is actually more
-      // like what you would expect from shading in a rasterizer like OpenGL.
-      // TODO: replace this! you should be able to start with basically a one-liner
-      else {
-        float lightTerm = glm::dot(intersection.surfaceNormal, glm::vec3(0.0f, 1.0f, 0.0f));
-        pathSegments[idx].color *= (materialColor * lightTerm) * 0.3f + ((1.0f - intersection.t * 0.02f) * materialColor) * 0.7f;
-        pathSegments[idx].color *= u01(rng); // apply some noise because why not
-      }
-    // If there was no intersection, color the ray black.
-    // Lots of renderers use 4 channel color, RGBA, where A = alpha, often
-    // used for opacity, in which case they can indicate "no opacity".
-    // This can be useful for post-processing and image compositing.
-    } else {
-      pathSegments[idx].color = glm::vec3(0.0f);
+    int idx = blockIdx.x * blockDim.x + threadIdx.x;
+    if (idx < num_paths)
+    { 
+        HitRecord intersection = shadeableIntersections[idx];
+        if (intersection.t > 0.0f) { // if the intersection exists...
+            // Set up the RNG
+            // LOOK: this is how you use thrust's RNG! Please look at
+            // makeSeededRandomEngine as well. 
+            // Each (iter, path index) pair seeds its own random engine
+            thrust::default_random_engine rng = makeSeededRandomEngine(iter, idx, 0);
+            thrust::uniform_real_distribution<float> u01(0, 1);
+
+            Material material = materials[intersection.materialId];
+            glm::vec3 materialColor = material.color;
+            // could use material.specular.color, etc
+
+            // If the material indicates that the object was a light, "light" the ray
+            if (material.emittance > 0.0f) {
+                pathSegments[idx].color *= (materialColor * material.emittance);
+                pathSegments[idx].remainingBounces = 0;
+            }
+            // Otherwise, scatter the ray according to the material.
+            // scatterRay updates the path's origin, direction, and color,
+            // and decrements its remaining bounces.
+            else {
+                scatterRay(pathSegments[idx],
+                           intersection.intersectionPoint,
+                           intersection.surfaceNormal,
+                           material,
+                           rng);
+            }
+        // If there was no intersection, color the ray black.
+        // Lots of renderers use 4 channel color, RGBA, where A = alpha, often
+        // used for opacity, in which case they can indicate "no opacity".
+        // This can be useful for post-processing and image compositing.
+        } else {
+            pathSegments[idx].color = Color3f(0.f);
+            pathSegments[idx].remainingBounces = 0;
+        }
     }
-  }
 }
 
 // Add the current iteration's output to the overall image
@@ -277,9 +305,43 @@ __global__ void finalGather(int nPaths, glm::vec3 * image, PathSegment * iterati
     }
 }
 
+
+
+
+///////////////////////////////////////////////////////////////////////////
+
+// Recap:
+// * Initialize array of path rays (using rays that come out of the camera)
+//   * You can pass the Camera object to that kernel.
+//   * Each path ray must carry at minimum a (ray, color) pair,
+//   * where color starts as the multiplicative identity, white = (1, 1, 1).
+//   * This has already been done for you.
+// * For each depth:
+//   * Compute an intersection in the scene for each path ray.
+//     A very naive version of this has been implemented for you, but feel
+//     free to add more primitives and/or a better algorithm.
+//     Currently, intersection distance is recorded as a parametric distance,
+//     t, or a "distance along the ray." t = -1.0 indicates no intersection.
+//     * Color is attenuated (multiplied) by reflections off of any object
+//   * TODO: Stream compact away all of the terminated paths.
+//     You may use either your implementation or `thrust::remove_if` or its
+//     cousins.
+//     * Note that you can't really use a 2D kernel launch any more - switch
+//       to 1D.
+//   * TODO: Shade the rays that intersected something or didn't bottom out.
+//     That is, color the ray by performing a color computation according
+//     to the shader, then generate a new ray to continue the ray path.
+//     We recommend just updating the ray's PathSegment in place.
+//     Note that this step may come before or after stream compaction,
+//     since some shaders you write may also cause a path to terminate.
+// * Finally, add this iteration's results to the image. This has been done
+//   for you.
+// TODO: perform one iteration of path tracing
 /**
  * Wrapper for the __global__ call that sets up the kernel calls and does a ton
  * of memory management
+ * Performs one iteration of rendering:
+ * handles kernel launches, memory copies, data transfers, etc.
  */
 void pathtrace(uchar4 *pbo, int frame, int iter) {
     const int traceDepth = hst_scene->state.traceDepth;
@@ -295,90 +357,148 @@ void pathtrace(uchar4 *pbo, int frame, int iter) {
     // 1D block for path tracing
     const int blockSize1d = 128;
 
-    ///////////////////////////////////////////////////////////////////////////
-
-    // Recap:
     // * Initialize array of path rays (using rays that come out of the camera)
     //   * You can pass the Camera object to that kernel.
     //   * Each path ray must carry at minimum a (ray, color) pair,
     //   * where color starts as the multiplicative identity, white = (1, 1, 1).
     //   * This has already been done for you.
-    // * For each depth:
-    //   * Compute an intersection in the scene for each path ray.
-    //     A very naive version of this has been implemented for you, but feel
-    //     free to add more primitives and/or a better algorithm.
-    //     Currently, intersection distance is recorded as a parametric distance,
-    //     t, or a "distance along the ray." t = -1.0 indicates no intersection.
-    //     * Color is attenuated (multiplied) by reflections off of any object
-    //   * TODO: Stream compact away all of the terminated paths.
-    //     You may use either your implementation or `thrust::remove_if` or its
-    //     cousins.
-    //     * Note that you can't really use a 2D kernel launch any more - switch
-    //       to 1D.
-    //   * TODO: Shade the rays that intersected something or didn't bottom out.
-    //     That is, color the ray by performing a color computation according
-    //     to the shader, then generate a new ray to continue the ray path.
-    //     We recommend just updating the ray's PathSegment in place.
-    //     Note that this step may come before or after stream compaction,
-    //     since some shaders you write may also cause a path to terminate.
-    // * Finally, add this iteration's results to the image. This has been done
-    //   for you.
-
-    // TODO: perform one iteration of path tracing
-
     generateRayFromCamera <<<blocksPerGrid2d, blockSize2d >>>(cam, iter, traceDepth, dev_paths);
+    // After this kernel, dev_paths holds pixelcount path segments
+    // (640,000 at the default 800x800 resolution).
+    // Each path segment contains:
+    // 1) a ray starting at the camera position and shooting through one pixel of the image
+    // 2) a white color
+    // 3) a pixelIndex in [0, pixelcount)
+    // 4) remainingBounces initialized to traceDepth (8 by default)
     checkCUDAError("generate camera ray");
 
     int depth = 0;
     PathSegment* dev_path_end = dev_paths + pixelcount;
+    // Each pixel in the image has one ray initially
     int num_paths = dev_path_end - dev_paths;
 
     // --- PathSegment Tracing Stage ---
     // Shoot ray into scene, bounce between objects, push shading chunks
 
-  bool iterationComplete = false;
+    // * For each depth:
+    bool iterationComplete = false;
     while (!iterationComplete) {
+        // tracing
+        // Note: num_paths shrinks after stream compaction, so the grid size is recomputed every bounce
+        dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
+        if (useCachedFirstBounce)
+        {
+            bool firstBounceInFirstIteration = iter == 1 && depth == 0;
+            bool firstBounceInLaterIteration = iter != 1 && depth == 0;
+            if (firstBounceInFirstIteration)
+            {
+                cudaMemset(dev_intersections_first_bounce, 0, pixelcount * sizeof(HitRecord));
+                computeIntersections << <numblocksPathSegmentTracing, blockSize1d >> > (
+                    depth
+                    , num_paths
+                    , dev_paths
+                    , dev_geoms
+                    , hst_scene->geoms.size()
+                    , dev_intersections
+                    );
+                cudaMemcpy(dev_intersections_first_bounce, dev_intersections,
+                    pixelcount * sizeof(HitRecord), cudaMemcpyDeviceToDevice);
+            }
+            else if (firstBounceInLaterIteration)
+            {
+                cudaMemcpy(dev_intersections, dev_intersections_first_bounce,
+                    pixelcount * sizeof(HitRecord), cudaMemcpyDeviceToDevice);
+            }
+            else {
+                cudaMemset(dev_intersections, 0, pixelcount * sizeof(HitRecord));
+                computeIntersections << <numblocksPathSegmentTracing, blockSize1d >> > (
+                    depth
+                    , num_paths
+                    , dev_paths
+                    , dev_geoms
+                    , hst_scene->geoms.size()
+                    , dev_intersections
+                    );
+                checkCUDAError("trace one bounce");
+                cudaDeviceSynchronize();
+            }
+        }
+        else {
+            cudaMemset(dev_intersections, 0, pixelcount * sizeof(HitRecord));
+            //   * Compute an intersection in the scene for each path ray.
+            //     A very naive version of this has been implemented for you, but feel
+            //     free to add more primitives and/or a better algorithm.
+            //     Currently, intersection distance is recorded as a parametric distance,
+            //     t, or a "distance along the ray." t = -1.0 indicates no intersection.
+            //     * Color is attenuated (multiplied) by reflections off of any object
+            computeIntersections << <numblocksPathSegmentTracing, blockSize1d >> > (
+                depth
+                , num_paths
+                , dev_paths
+                , dev_geoms
+                , hst_scene->geoms.size()
+                , dev_intersections
+                );
+            // After this kernel, dev_intersections holds one HitRecord for each
+            // of the num_paths active paths.
+            // Each HitRecord contains:
+            // 1) t: -1 if there is no intersection, otherwise the parametric distance to the hit
+            // 2) surfaceNormal and intersectionPoint
+            // 3) materialId
+            checkCUDAError("trace one bounce");
+            cudaDeviceSynchronize();
+        }
 
-    // clean shading chunks
-    cudaMemset(dev_intersections, 0, pixelcount * sizeof(ShadeableIntersection));
-
-    // tracing
-    dim3 numblocksPathSegmentTracing = (num_paths + blockSize1d - 1) / blockSize1d;
-    computeIntersections <<<numblocksPathSegmentTracing, blockSize1d>>> (
-        depth
-        , num_paths
-        , dev_paths
-        , dev_geoms
-        , hst_scene->geoms.size()
-        , dev_intersections
+        depth++;
+
+        // TODO:
+        // --- Sort rays by material Stage ---
+#if SORT_BY_MATERIAL
+        thrust::sort_by_key(thrust::device, dev_intersections, dev_intersections + num_paths, dev_paths, sortMaterial());
+#endif
+        // TODO:
+        // --- Shading Stage (will generate new rays in shader(s)) ---
+        //     * TODO: Shade the rays that intersected something or didn't bottom out.
+        //     That is, color the ray by performing a color computation according
+        //     to the shader, then generate a new ray to continue the ray path.
+        //     We recommend just updating the ray's PathSegment in place.
+        //     Note that this step may come before or after stream compaction,
+        //     since some shaders you write may also cause a path to terminate.
+        // Shade path segments based on intersections and generate new rays by
+        // evaluating the BSDF.
+        // Start off with just a big kernel that handles all the different
+        // materials you have in the scenefile.
+        // TODO: compare between directly shading the path segments and shading
+        // path segments that have been reshuffled to be contiguous in memory.
+        shadeFakeMaterial<<<numblocksPathSegmentTracing, blockSize1d>>> (
+            iter,
+            num_paths,
+            dev_intersections,
+            dev_paths,
+            dev_materials
         );
-    checkCUDAError("trace one bounce");
-    cudaDeviceSynchronize();
-    depth++;
-
-
-    // TODO:
-    // --- Shading Stage ---
-    // Shade path segments based on intersections and generate new rays by
-  // evaluating the BSDF.
-  // Start off with just a big kernel that handles all the different
-  // materials you have in the scenefile.
-  // TODO: compare between directly shading the path segments and shading
-  // path segments that have been reshuffled to be contiguous in memory.
-
-  shadeFakeMaterial<<<numblocksPathSegmentTracing, blockSize1d>>> (
-    iter,
-    num_paths,
-    dev_intersections,
-    dev_paths,
-    dev_materials
-  );
-  iterationComplete = true; // TODO: should be based off stream compaction results.
+        
+
+        // --- Stream Compaction Stage ---
+        //     * TODO: Stream compact away all of the terminated paths.
+        //     You may use either your implementation or `thrust::remove_if` or its
+        //     cousins.
+        //     * Note that you can't really use a 2D kernel launch any more - switch
+        //       to 1D.
+        // A path terminates when:
+        //   * there is no intersection, or
+        //   * there is an intersection, but it hits a light.
+        PathSegment* newEnd = thrust::stable_partition(thrust::device, dev_paths, dev_paths + num_paths, isTerminated());
+        num_paths = newEnd - dev_paths;
+        iterationComplete = num_paths <= 0;
     }
 
-  // Assemble this iteration and apply it to the image
-  dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
-    finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths);
+    // * Finally, add this iteration's results to the image. This has been done
+    //   for you.
+    // Assemble this iteration and apply it to the image
+    dim3 numBlocksPixels = (pixelcount + blockSize1d - 1) / blockSize1d;
+    // original BASE CODE: finalGather<<<numBlocksPixels, blockSize1d>>>(num_paths, dev_image, dev_paths);
+    finalGather<<<numBlocksPixels, blockSize1d>>>(pixelcount, dev_image, dev_paths);
 
     ///////////////////////////////////////////////////////////////////////////
 
diff --git a/src/preview.cpp b/src/preview.cpp
index 4eb0bc13..a7ddab9d 100644
--- a/src/preview.cpp
+++ b/src/preview.cpp
@@ -1,4 +1,8 @@
 #define _CRT_SECURE_NO_DEPRECATE
+// This header was originally in the C standard library as <time.h>.
+// This header is part of the C-style date and time library.
+// https://en.cppreference.com/w/cpp/header/ctime
+// some types and functions in this header are used in currentTimeString()
 #include <ctime>
 #include "main.h"
 #include "preview.h"
@@ -10,11 +14,23 @@ GLuint displayImage;
 
 GLFWwindow *window;
 
+// Return a string of the current local time (the base code used UTC)
 std::string currentTimeString() {
+    // time_t: Arithmetic type capable of representing times.
     time_t now;
+    // stores current calendar time in now 
     time(&now);
+    // sizeof expression
+    // yields the size in bytes of the object representation of 
+    // the type of expression, if that expression is evaluated.
+    // "0000-00-00_00-00-00z" is char[21] (includes null terminator)
+    // sizeof "0000-00-00_00-00-00z" evaluates to 21
     char buf[sizeof "0000-00-00_00-00-00z"];
-    strftime(buf, sizeof buf, "%Y-%m-%d_%H-%M-%Sz", gmtime(&now));
+    // localtime() converts the given time since epoch (a std::time_t value) into calendar time
+    // expressed in local time. Changed from gmtime(), which yields UTC, by Charles.
+    // strftime() formats that calendar time into buf.
+    // format: year-month-day_hour-minute-second, followed by a literal 'z'
+    strftime(buf, sizeof buf, "%Y-%m-%d_%H-%M-%Sz", localtime(&now));
     return std::string(buf);
 }
 
diff --git a/src/scene.cpp b/src/scene.cpp
index 3fb6239a..990cda52 100644
--- a/src/scene.cpp
+++ b/src/scene.cpp
@@ -11,6 +11,7 @@ Scene::Scene(string filename) {
     fp_in.open(fname);
     if (!fp_in.is_open()) {
         cout << "Error reading from file - aborting!" << endl;
+        // a bare "throw;" with no active exception calls std::terminate()
         throw;
     }
     while (fp_in.good()) {
@@ -93,7 +94,7 @@ int Scene::loadCamera() {
     cout << "Loading Camera ..." << endl;
     RenderState &state = this->state;
     Camera &camera = state.camera;
-    float fovy;
+    float fovyh;
 
     //load static properties
     for (int i = 0; i < 5; i++) {
@@ -103,8 +104,8 @@ int Scene::loadCamera() {
         if (strcmp(tokens[0].c_str(), "RES") == 0) {
             camera.resolution.x = atoi(tokens[1].c_str());
             camera.resolution.y = atoi(tokens[2].c_str());
-        } else if (strcmp(tokens[0].c_str(), "FOVY") == 0) {
-            fovy = atof(tokens[1].c_str());
+        } else if (strcmp(tokens[0].c_str(), "FOVYH") == 0) {
+            fovyh = atof(tokens[1].c_str());
         } else if (strcmp(tokens[0].c_str(), "ITERATIONS") == 0) {
             state.iterations = atoi(tokens[1].c_str());
         } else if (strcmp(tokens[0].c_str(), "DEPTH") == 0) {
@@ -130,16 +131,17 @@ int Scene::loadCamera() {
     }
 
     //calculate fov based on resolution
-    float yscaled = tan(fovy * (PI / 180));
-    float xscaled = (yscaled * camera.resolution.x) / camera.resolution.y;
-    float fovx = (atan(xscaled) * 180) / PI;
-    camera.fov = glm::vec2(fovx, fovy);
-
-    camera.right = glm::normalize(glm::cross(camera.view, camera.up));
-    camera.pixelLength = glm::vec2(2 * xscaled / (float)camera.resolution.x,
-                                   2 * yscaled / (float)camera.resolution.y);
-
+    //assume |n| = 1. Following Games 101 convention. 
+    float t = tan(fovyh * (PI / 180));
+    float r = (t * camera.resolution.x) / camera.resolution.y;
+    float fovxh = (atan(r) * 180) / PI;
+    camera.fov = glm::vec2(fovxh, fovyh);
+    
     camera.view = glm::normalize(camera.lookAt - camera.position);
+    camera.right = glm::normalize(glm::cross(camera.view, camera.up));
+    // pixels are assumed to be in NDC -1 to 1 
+    camera.pixelLength = glm::vec2(2 * r / (float)camera.resolution.x,
+                                   2 * t / (float)camera.resolution.y);
 
     //set up render camera stuff
     int arraylen = camera.resolution.x * camera.resolution.y;
diff --git a/src/sceneStructs.h b/src/sceneStructs.h
index da4dbf30..cc3e4dfe 100644
--- a/src/sceneStructs.h
+++ b/src/sceneStructs.h
@@ -4,6 +4,7 @@
 #include <vector>
 #include <cuda_runtime.h>
 #include "glm/glm.hpp"
+#include "utilities.h"
 
 #define BACKGROUND_COLOR (glm::vec3(0.0f))
 
@@ -12,9 +13,32 @@ enum GeomType {
     CUBE,
 };
 
-struct Ray {
-    glm::vec3 origin;
-    glm::vec3 direction;
+class Ray {
+public:
+    __host__ __device__ Ray()
+        : origin(), direction()
+    {}
+    __host__ __device__ Ray(const Point3f& o, const Vector3f& d)
+        : origin(o), direction(d)
+    {}
+
+    __host__ __device__ Point3f getOrigin() const
+    {
+        return origin;
+    }
+
+    __host__ __device__ Vector3f getDirection() const
+    {
+        return direction;
+    }
+
+    //  Falls slightly short so that it doesn't intersect the object it's hitting. 
+    __host__ __device__ Point3f evaluate(float t) const {
+        return origin + (t - .0001f) * glm::normalize(direction);
+    }
+public:
+    Point3f origin;
+    Vector3f direction;
 };
 
 struct Geom {
@@ -42,6 +66,7 @@ struct Material {
 
 struct Camera {
     glm::ivec2 resolution;
+    // EYE
     glm::vec3 position;
     glm::vec3 lookAt;
     glm::vec3 view;
@@ -54,8 +79,10 @@ struct Camera {
 struct RenderState {
     Camera camera;
     unsigned int iterations;
+    // DEPTH entry of the camera block in the scene file
     int traceDepth;
     std::vector<glm::vec3> image;
+    // FILE entry of the camera block in the scene file
     std::string imageName;
 };
 
@@ -69,8 +96,27 @@ struct PathSegment {
 // Use with a corresponding PathSegment to do:
 // 1) color contribution computation
 // 2) BSDF evaluation: generate a new ray
-struct ShadeableIntersection {
+struct HitRecord {
   float t;
   glm::vec3 surfaceNormal;
   int materialId;
+  glm::vec3 intersectionPoint;
 };
+
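+// NOTE: despite its name, isTerminated returns true for paths that are still alive (remainingBounces > 0).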
+struct isTerminated
+{
+    __host__ __device__
+    bool operator()(const PathSegment& p)
+    {
+        return p.remainingBounces > 0;
+    }
+};
+
+struct sortMaterial 
+{
+    __host__ __device__ 
+    bool operator()(const HitRecord& a, const HitRecord& b) const {
+        return a.materialId < b.materialId;
+    }
+};
\ No newline at end of file
diff --git a/src/utilities.h b/src/utilities.h
index abb4f27c..76cd5ede 100644
--- a/src/utilities.h
+++ b/src/utilities.h
@@ -13,14 +13,32 @@
 #define TWO_PI            6.2831853071795864769252867665590057683943f
 #define SQRT_OF_ONE_THIRD 0.5773502691896257645091487805019574556476f
 #define EPSILON           0.00001f
+#define uPtr std::unique_ptr
+#define mkU std::make_unique
+#define sPtr std::shared_ptr
+#define mkS std::make_shared
+
+typedef glm::vec3 Color3f;
+typedef glm::vec3 Point3f;
+typedef glm::vec3 Normal3f;
+typedef glm::vec2 Point2f;
+typedef glm::ivec2 Point2i;
+typedef glm::ivec3 Point3i;
+typedef glm::vec3 Vector3f;
+typedef glm::vec2 Vector2f;
+typedef glm::ivec2 Vector2i;
+typedef glm::mat4 Matrix4x4;
+typedef glm::mat3 Matrix3x3;
 
 namespace utilityCore {
     extern float clamp(float f, float min, float max);
     extern bool replaceString(std::string& str, const std::string& from, const std::string& to);
     extern glm::vec3 clampRGB(glm::vec3 color);
     extern bool epsilonCheck(float a, float b);
+    // Split str on whitespace and return the tokens as a vector
     extern std::vector<std::string> tokenizeString(std::string str);
     extern glm::mat4 buildTransformationMatrix(glm::vec3 translation, glm::vec3 rotation, glm::vec3 scale);
     extern std::string convertIntToString(int number);
+    // Get the next line in a file and put it into t
     extern std::istream& safeGetline(std::istream& is, std::string& t); //Thanks to http://stackoverflow.com/a/6089413
 }