CUDA Path Tracer

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3

Yian Chen
- LinkedIn, personal website etc.
Tested on: Windows 10, AMD Ryzen 5800 HS with Radeon Graphics CPU @ 3.20GHz 16GB, NVIDIA GeForce RTX3060 Laptop 8GB

Implemented Features

Core (as required in Project 3)
- Diffuse & Specular
- Jittering (Antialiasing)
- First Bounce Cache
- Sort by material
Load glTF
BVH & SAH
Texture mapping & bump mapping
Environment Mapping
Microfacet BSDF
Emissive BSDF (with Emissive Texture)
Direct Lighting
Multiple Importance Sampling
Depth of Field
Tone Mapping & Gamma Correction

Core Features (as required by the project instructions)

Diffuse & Specular

Jittering

Before jittering	After jittering

`gltf` Load & A Better Workflow (?)

This path tracer uses glTF as its scene format because it can express rich 3D scenes. Please view this page for more details about glTF.

In practice, most scenes used for testing during development were exported directly from Blender, which makes testing much more flexible.

Scene: scenes/pathtracer_robots_demo.glb

BVH

On the host, a BVH can be constructed and traversed recursively. In this project, however, traversal runs on the GPU. Although CUDA does allow recursive device functions, relying on recursion is risky in a performance-critical ray tracer: it can slow the kernel down, because the required stack size cannot be determined at compile time.

Following this paper, this path tracer adopts MTBVH, a stack-free method for constructing and traversing BVHs.

This path tracer implements only a simplified version of MTBVH. Instead of constructing six BVHs (one per traversal order for the ±x, ±y and ±z directions) and choosing one of them at runtime, it builds a single BVH. This leaves room for further speedup.
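To illustrate why a threaded BVH needs no stack, here is a minimal CPU-side C++ sketch of the traversal loop. It assumes each flattened node stores a hit link (next node to visit when the ray hits its box) and a miss link (next node when it misses, which skips the whole subtree), with -1 marking the end. The node layout and the names `ThreadedNode`, `hitLink`, `missLink` and `primId` are hypothetical, not this project's actual structures. In full MTBVH, six such arrays with different child orderings would be built and one selected per ray.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical flattened ("threaded") BVH node, stored in preorder.
struct ThreadedNode {
    Vec3 bmin, bmax;  // axis-aligned bounding box
    int hitLink;      // next node if the ray hits this box (first child, or next node for a leaf)
    int missLink;     // next node if the ray misses (skips this node's subtree)
    int primId;       // primitive index for leaves, -1 for interior nodes
};

// Slab test against the node's box within [0, tMax].
// Assumes no zero direction components; real code must handle 1/0 = inf carefully.
bool hitAABB(const ThreadedNode& n, Vec3 o, Vec3 invD, float tMax) {
    const float lo[3]  = {n.bmin.x, n.bmin.y, n.bmin.z};
    const float hi[3]  = {n.bmax.x, n.bmax.y, n.bmax.z};
    const float org[3] = {o.x, o.y, o.z};
    const float inv[3] = {invD.x, invD.y, invD.z};
    float t0 = 0.0f, t1 = tMax;
    for (int a = 0; a < 3; ++a) {
        float tNear = (lo[a] - org[a]) * inv[a];
        float tFar  = (hi[a] - org[a]) * inv[a];
        if (tNear > tFar) std::swap(tNear, tFar);
        t0 = std::max(t0, tNear);
        t1 = std::min(t1, tFar);
        if (t0 > t1) return false;
    }
    return true;
}

// Stack-free traversal: follow the hit link on a box hit, the miss link otherwise.
// Returns the primitives whose leaf boxes the ray overlaps; a real tracer would
// intersect each one here and shrink tMax to the closest hit found so far.
std::vector<int> traverse(const std::vector<ThreadedNode>& nodes, Vec3 o, Vec3 d, float tMax) {
    Vec3 invD{1.0f / d.x, 1.0f / d.y, 1.0f / d.z};
    std::vector<int> candidates;
    int idx = nodes.empty() ? -1 : 0;  // start at the root
    while (idx != -1) {
        const ThreadedNode& n = nodes[idx];
        if (hitAABB(n, o, invD, tMax)) {
            if (n.primId >= 0) candidates.push_back(n.primId);
            idx = n.hitLink;
        } else {
            idx = n.missLink;
        }
    }
    return candidates;
}
```

The only per-ray state is the current node index, so no per-thread stack is needed on the GPU; the cost is that the visiting order is fixed at build time, which is exactly what the six orderings in MTBVH address.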

With BVH & Without BVH:

With BVH	Without BVH

As expected, the speedup is large, up to 40 times. A more complex scene should benefit even more from the BVH.

Texture Mapping & Bump Mapping

Texture mapping is essential for adding detail to mesh surfaces and geometry. Mipmapping has not been implemented on the GPU yet, though it should not be difficult to add.
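Without mipmaps, every lookup reads the full-resolution texture directly, which is why distant textures alias. A minimal sketch of such a lookup, assuming bilinear filtering and repeat wrapping (REPEAT is the default wrap mode of a glTF sampler); the README does not state which filter this project uses, and `Texture` and `sampleBilinear` are hypothetical names. A single channel is used to keep it short; real textures store RGBA per texel.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical single-channel texture, row-major with width * height texels.
struct Texture {
    int width, height;
    std::vector<float> texels;
    float at(int x, int y) const {
        // Repeat wrapping: map any integer texel coordinate into [0, size).
        x = ((x % width) + width) % width;
        y = ((y % height) + height) % height;
        return texels[y * width + x];
    }
};

// Bilinear lookup at (u, v); texel centers sit at (i + 0.5) / size.
float sampleBilinear(const Texture& tex, float u, float v) {
    float x = u * tex.width - 0.5f;
    float y = v * tex.height - 0.5f;
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float fx = x - x0, fy = y - y0;  // interpolation weights in [0, 1)
    float top    = tex.at(x0, y0)     * (1 - fx) + tex.at(x0 + 1, y0)     * fx;
    float bottom = tex.at(x0, y0 + 1) * (1 - fx) + tex.at(x0 + 1, y0 + 1) * fx;
    return top * (1 - fy) + bottom * fy;
}
```

Mipmapping would replace the single `texels` array with a pyramid of prefiltered levels and pick a level from the ray footprint, which is the antialiasing step still listed under future work.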

Scene: scenes/pathtracer_test_texture.glb

Before bump mapping	After bump mapping
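In glTF, this kind of bump mapping is usually supplied as a tangent-space normal map. A minimal sketch of how a sampled normal-map texel can perturb the shading normal, assuming texel values in [0, 1] and an orthonormal tangent frame (T, B, N) from the mesh; the function names are hypothetical, and glTF's optional `scale` factor on the x/y components is omitted.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 mul(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
Vec3 normalize(Vec3 a) {
    float len = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return mul(a, 1.0f / len);
}

// Hypothetical helper: perturb the geometric normal N with a tangent-space
// normal-map texel in [0, 1]^3. T and B are the surface tangent and bitangent.
Vec3 applyNormalMap(Vec3 texel, Vec3 T, Vec3 B, Vec3 N) {
    // Remap each component from [0, 1] to [-1, 1].
    Vec3 n{texel.x * 2.0f - 1.0f, texel.y * 2.0f - 1.0f, texel.z * 2.0f - 1.0f};
    // Tangent space to world space: n.x * T + n.y * B + n.z * N.
    return normalize(add(add(mul(T, n.x), mul(B, n.y)), mul(N, n.z)));
}
```

A "flat" texel (0.5, 0.5, 1) maps back to N unchanged, so only the painted details of the map (such as the logo on the box) bend the normal.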

Microfacet BSDF

Supporting a wider variety of materials requires BSDFs more sophisticated than pure diffuse/specular. As a first step, this path tracer implements the classic microfacet BSDF to extend the range of materials it can render.

The microfacet implementation in this path tracer is based on pbrt.

Metalness = 1; roughness increases from 0 to 1 from left to right.
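For reference, a minimal scalar sketch of the Trowbridge-Reitz (GGX) microfacet terms in the form pbrt uses: the normal distribution D, Smith's Lambda, the height-correlated masking-shadowing term G, and the reflection lobe f = D·G·F / (4 cos θo cos θi). It assumes the glTF-style mapping alpha = roughness², works with cosines measured from the shading normal, and takes the Fresnel value F from the caller; the exact roughness remapping in this project may differ.

```cpp
#include <cassert>
#include <cmath>

const float kPi = 3.14159265358979f;

// Trowbridge-Reitz / GGX distribution of microfacet normals; cosH = cos(theta_h).
float ggxD(float cosH, float alpha) {
    float a2 = alpha * alpha;
    float d = cosH * cosH * (a2 - 1.0f) + 1.0f;
    return a2 / (kPi * d * d);
}

// Smith Lambda for Trowbridge-Reitz, given cos(theta) of a direction.
float ggxLambda(float cosT, float alpha) {
    float cos2 = cosT * cosT;
    float tan2 = (1.0f - cos2) / cos2;
    return (-1.0f + std::sqrt(1.0f + alpha * alpha * tan2)) * 0.5f;
}

// Height-correlated masking-shadowing: G = 1 / (1 + Lambda(wo) + Lambda(wi)).
float ggxG(float cosO, float cosI, float alpha) {
    return 1.0f / (1.0f + ggxLambda(cosO, alpha) + ggxLambda(cosI, alpha));
}

// Microfacet reflection f = D * G * F / (4 cos(theta_o) cos(theta_i)).
float microfacetBRDF(float cosO, float cosI, float cosH, float F, float roughness) {
    // Assumed glTF-style remap, clamped so roughness 0 does not produce a delta lobe.
    float alpha = std::fmax(roughness * roughness, 1e-3f);
    return ggxD(cosH, alpha) * ggxG(cosO, cosI, alpha) * F / (4.0f * cosO * cosI);
}
```

As roughness approaches 0, D concentrates around the mirror direction; at alpha = 1, D is the constant 1/π, which is why the rightmost spheres look almost diffuse.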

Please note that the sphere used here is not an actual sphere but an icosphere.

Scene: scenes/pathtracer_test_microfacet.glb

With texture mapping in place, metallicRoughness textures can be used as well. Conveniently, glTF natively supports the metallic-roughness workflow.

Scene: scenes/pathtracer_robot.glb
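In glTF 2.0 the metallicRoughness texture stores roughness in the green channel and metalness in the blue channel, each multiplied by the material's roughnessFactor and metallicFactor. A minimal sketch of turning a texel into shading inputs, assuming the common metallic-workflow convention F0 = lerp(0.04, baseColor, metallic) together with Schlick's Fresnel approximation; this is the usual glTF reference approach, not necessarily what this project does, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct RGB { float r, g, b; };
struct MetalRough { float metallic, roughness; };

// Hypothetical decoder: glTF reads roughness from G and metalness from B,
// each scaled by the corresponding material factor.
MetalRough decodeMetallicRoughness(RGB texel, float metallicFactor, float roughnessFactor) {
    return {texel.b * metallicFactor, texel.g * roughnessFactor};
}

// Reflectance at normal incidence: about 0.04 for dielectrics, the base color for metals.
RGB baseReflectance(RGB baseColor, float metallic) {
    auto lerp = [](float a, float b, float t) { return a + (b - a) * t; };
    return {lerp(0.04f, baseColor.r, metallic),
            lerp(0.04f, baseColor.g, metallic),
            lerp(0.04f, baseColor.b, metallic)};
}

// Schlick's approximation for one channel: F = F0 + (1 - F0) * (1 - cos)^5.
float fresnelSchlick(float f0, float cosTheta) {
    float m = 1.0f - cosTheta;
    return f0 + (1.0f - f0) * m * m * m * m * m;
}
```

The per-channel F computed this way is what the microfacet lobe multiplies by, so metals tint their reflections with the base color while dielectrics reflect a neutral color.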

Direct Lighting & MIS

To highlight how MIS speeds up convergence, Russian roulette is disabled for the renders in this section.

A thin dark stripe is visible in some renders. This is because double-sided lighting is disabled by default in this path tracer.

By default, the number of light samples is set to 3.

When sampling the direction of the next bounce, we mostly use BSDF importance sampling. This speeds up convergence for specular materials, since the sampling distribution closely matches the expected radiance distribution over the hemisphere. For diffuse/matte surfaces, however, this strategy leaves room for improvement: their BSDF spreads energy broadly over the hemisphere, so their reflected radiance is dominated by where the lights are rather than by the direction of the outgoing ray. Sampling the lights directly is therefore a valuable strategy for speeding up convergence on rough surfaces.

In this demo scene, three metal planes are lit by four cube lights. With BSDF sampling only, the radiance on the metal planes converges well. With light sampling only, the rougher part of the scene, the white back wall, converges faster. We therefore want a strategy that combines the strengths of both: multiple importance sampling.
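Light sampling usually picks a point on an emitter with a pdf per unit area, while BSDF sampling produces a pdf per unit solid angle. Before the two can be compared or combined, the area pdf is converted with p_ω = p_A · d² / cos θ_l, where d is the distance to the light point and θ_l is the angle between the light normal and the direction back to the shading point. A minimal sketch, assuming single-sided emitters (matching the default described above) and hypothetical function names:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical helper: convert the pdf of sampling point y on a light (per unit area)
// into a pdf over directions (per unit solid angle) as seen from shading point x.
// lightNormal points out of the emitting side; a point behind it receives nothing.
float areaToSolidAnglePdf(float pdfArea, Vec3 x, Vec3 y, Vec3 lightNormal) {
    Vec3 d = sub(x, y);                 // from the light point back toward x
    float dist2 = dot(d, d);
    float cosLight = dot(lightNormal, d) / std::sqrt(dist2);
    if (cosLight <= 0.0f) return 0.0f;  // back side of a one-sided emitter
    return pdfArea * dist2 / cosLight;
}
```

For a light of area A sampled uniformly, pdfArea = 1 / A; the converted pdf grows with distance and at grazing angles, which is exactly where light sampling becomes less effective.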
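pbrt combines the two strategies with a weighting heuristic. A minimal sketch, assuming the power heuristic with β = 2 (pbrt's default choice); the README does not state which heuristic this project uses, and `powerHeuristic` is a hypothetical name. `nf` and `ng` are the numbers of samples taken with each strategy.

```cpp
#include <cassert>
#include <cmath>

// Power heuristic (beta = 2): MIS weight for a sample drawn from strategy f
// when strategy g could also have generated the same direction.
float powerHeuristic(int nf, float fPdf, int ng, float gPdf) {
    float f = nf * fPdf, g = ng * gPdf;
    if (f == 0.0f && g == 0.0f) return 0.0f;  // neither strategy can produce this direction
    return (f * f) / (f * f + g * g);
}
```

A light sample is then weighted by powerHeuristic(1, pdfLight, 1, pdfBsdf) and a BSDF sample that hits a light by powerHeuristic(1, pdfBsdf, 1, pdfLight). The two weights for the same direction sum to 1, so the combined estimator stays unbiased while each strategy dominates where its pdf is larger: BSDF sampling on the metal planes, light sampling on the back wall.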

Scene: scenes/pathtracer_mis_demo.glb

Only sample bsdf 500spp	Only sample light 500spp	MIS 500spp

For more details, see this part of pbrt or this post of mine.

Test on the bunny scene: faster convergence can be observed.

Scene: scenes/pathtracer_bunny_mis.glb

Without MIS 256spp	With MIS 256spp

Without MIS 5k spp	With MIS 5k spp

Depth of Field

Depth of field is controlled by two parameters: focal_length and aperture.

More details can be viewed in this post.
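A common way to use these two parameters is the thin-lens model (as in pbrt): jitter the ray origin over a lens disk, then aim the ray at the point where the original pinhole ray crosses the plane of focus. A minimal camera-space sketch, assuming the camera sits at the origin looking down +z, that `aperture` is the lens radius, and that `focal_length` is the distance to the plane of focus along the view axis; the project may define them differently, and `thinLensRay` is a hypothetical name. The lens point is drawn with a simple polar mapping rather than pbrt's concentric mapping.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray { Vec3 origin, dir; };

// Hypothetical thin-lens ray generation in camera space.
// pinholeDir: original primary-ray direction (pinholeDir.z > 0);
// u1, u2: uniform random numbers in [0, 1) used to pick a point on the lens.
Ray thinLensRay(Vec3 pinholeDir, float aperture, float focalLength, float u1, float u2) {
    const float kPi = 3.14159265358979f;
    // Uniformly sample the lens disk of radius `aperture` (sqrt keeps density uniform).
    float r = aperture * std::sqrt(u1);
    float phi = 2.0f * kPi * u2;
    Vec3 lens{r * std::cos(phi), r * std::sin(phi), 0.0f};
    // Point where the pinhole ray meets the plane of focus z = focalLength.
    float t = focalLength / pinholeDir.z;
    Vec3 focus{pinholeDir.x * t, pinholeDir.y * t, pinholeDir.z * t};
    // New ray from the lens point toward the focus point.
    Vec3 d{focus.x - lens.x, focus.y - lens.y, focus.z - lens.z};
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return {lens, {d.x / len, d.y / len, d.z / len}};
}
```

Every ray through a given pixel converges on the same point of the focal plane, so objects at focal_length stay sharp, while a larger aperture spreads the rays more and blurs everything else (the render above uses aperture 0.3). An aperture of 0 reduces to the pinhole camera.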

Depth of Field (Aperture=0.3)

How to Run

Due to time constraints, this code has not been refactored to be production-ready. Many features are controlled by macros or are even hardcoded, which is admittedly not ideal. More refactoring may be done in the future.

For now, users need to adjust the code to run this program with different settings. Most changes are made in pathtrace.cu.

...
// Default settings
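#define USE_FIRST_BOUNCE_CACHE 0 // cache first-bounce intersections across iterations
#define USE_SORT_BY_MATERIAL 0   // sort path segments by material before shading
#define MIS 1 // Multiple Importance Sampling
#define RUSSIAN_ROULETTE 0
#if RUSSIAN_ROULETTE
	#define RR_THRESHOLD 0.7f
#endif
#define USE_ENV_MAP 1            // use the environment map for rays that escape the scene
#define USE_BVH 1                // accelerate ray-scene intersection with the BVH
#define TONE_MAPPING 0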
...
// Change loaded scene here
static Scene * hst_scene = new Scene("..\\scenes\\pathtracer_robots_demo.glb");
...

Most settings should be self-explanatory.

Further settings can be changed in bvh.h:

...
#define BVH_NAIVE 1
#define BVH_SAH 0
...

These macros select whether the BVH is built with a naive split or with the surface area heuristic (SAH).

Other Results
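The SAH estimates the cost of a candidate split as C = C_trav + C_isect · (S_L/S_P · N_L + S_R/S_P · N_R), where S is the surface area of a box and N the number of primitives on each side; the builder keeps the cheapest candidate, or makes a leaf when no split beats N · C_isect. A minimal sketch of evaluating that cost, assuming C_trav = C_isect = 1 and primitives already sorted along the candidate axis; the constants and the names `AABB`, `makeBox` and `sahCost` are hypothetical, not this project's values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical bounding box; default-constructed boxes are empty.
struct AABB {
    float lo[3] = { 1e30f,  1e30f,  1e30f};
    float hi[3] = {-1e30f, -1e30f, -1e30f};
    void expand(const AABB& b) {
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], b.lo[a]);
            hi[a] = std::max(hi[a], b.hi[a]);
        }
    }
    float surfaceArea() const {
        float dx = hi[0] - lo[0], dy = hi[1] - lo[1], dz = hi[2] - lo[2];
        if (dx < 0 || dy < 0 || dz < 0) return 0.0f;  // empty box
        return 2.0f * (dx * dy + dy * dz + dz * dx);
    }
};

AABB makeBox(float x0, float y0, float z0, float x1, float y1, float z1) {
    AABB b;
    b.lo[0] = x0; b.lo[1] = y0; b.lo[2] = z0;
    b.hi[0] = x1; b.hi[1] = y1; b.hi[2] = z1;
    return b;
}

// SAH cost when the first `splitIndex` primitives go left and the rest go right.
float sahCost(const std::vector<AABB>& prims, size_t splitIndex,
              float traversalCost = 1.0f, float intersectCost = 1.0f) {
    AABB parent, left, right;
    for (size_t i = 0; i < prims.size(); ++i) {
        parent.expand(prims[i]);
        (i < splitIndex ? left : right).expand(prims[i]);
    }
    float sp = parent.surfaceArea();
    float nl = static_cast<float>(splitIndex);
    float nr = static_cast<float>(prims.size() - splitIndex);
    return traversalCost +
           intersectCost * (left.surfaceArea() / sp * nl + right.surfaceArea() / sp * nr);
}
```

For two tight clusters of unit cubes at x in [0, 2] and x in [10, 12], splitting between the clusters costs 1.8, while peeling off a single cube costs 3.88 (and a leaf costs 4), so the SAH separates the clusters, whereas a naive median or midpoint split may not handle uneven distributions as well.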

Run Video
Depth of Field Video

Future (If possible)

CUDA Side

More CUDA optimization
- Bank conflicts
- Loop unrolling
  - Light sample loop (if multiple light rays)
- Higher parallelism (use streams?)
Tile-based ray tracing
- Could increase rendering speed by maximizing locality within a pixel/tile, at the cost of real-time camera movement.

Render Side

Adaptive Sampling
Mipmap
ReSTIR
Refractive
True BSDF (Add some subsurface scattering if possible?)
Volume Rendering ~~(Ready for NeRF)~~

Below are the development history and some bloopers of this program. If you are not interested, feel free to skip the rest.

History

Load mesh within arbitrary scene
- Triangle
- Integrate tinygltf
- Scene Node Tree
Core
- G Buffer
- Russian Roulette
- Sort by material
More BSDF
- Diffuse
- Emissive
- Microfacet
- Reflective
- Refractive
- Disney
BVH
- Basic BVH
  - BoundingBox Array
  - Construct BVH
  - Traverse BVH
- Better Heuristics
  - SAH
- MTBVH
Texture
- Naive texture sampling
  - A Resource Manager to help get the handle to a texture?
- Bump mapping
- Displacement mapping
- Deal with antialiasing
Better sampler
- Encapsulate a sampler class
  - Deal with a CMake issue
- Monte Carlo sampling
- Importance sampling
- Direct Lighting
- Multiple Importance Sampling
Camera
- Jitter
- Depth of field
- Motion blur
Denoiser
- Use Intel Open Image Denoise for now

Log

09.20

Basic raytracer
Refactor integrator
First triangle!

09.21-22

Load arbitrary scenes (geometry only)
- Triangle
- ~~Primitive assembly phase~~ (this will not work; see the README of this commit)
- Use tinygltf. Remember to check the data type before using an accessor.
- Done with loading a scene with a node tree!
  
  Can't tell how excited I am! Now my ray tracer can open most scenes!
  - Scenes with parent-child relationships

09.23-09.26

Wasted too much time on OOP; eventually switched to C-style code.

09.26

Finally finished glTF loading and a basic BSDF.

A brief trial
- Note that this difference might be due to the different BSDFs in use: for convenience, we are currently using the most naive diffuse BSDF, while Blender uses a standard BSDF by default.

09.27

Naive BVH (probably done...), tested on a scene with 1k faces

One bounce, normal shading
- Without BVH: FPS 10.9
- With BVH: FPS 53.4
- 5 times faster
Multiple bounces
- Without BVH: FPS 7.6
- With BVH: FPS 22.8

09.28

SAH BVH (probably done...)
Texture sampling
- Try billboarding

09.29

Texture mapping
- Texture mapping test(only baseColor shading)
- Bump mapping
  - Normals in world coordinates
  - Before bump mapping
  - After bump mapping. The difference may be hard to notice; look at the logo on the box, which shows more detail after bump mapping.
- Texture aliasing is indeed quite serious!
  - However, antialiasing texture lookups will probably require implementing mipmapping.
Microfacet PBR model failed...
- Need to read through the microfacet chapter and figure out how roughness and metallic should actually be used

09.30

Microfacet
- Metal Fresnel hack
- Conductor
  - After mixing, need to consider how to sample
Camera
- Antialiasing

10.1-10.2

Tried to refactor the camera.

Failed. glTF seems to have a rather ambiguous camera definition.

10.3

Denoising
- Open Image Denoise (OIDN) built
  - CPU only for now
  - Figure out how to build OIDN with CUDA support
- Integrate it into project

10.4-10.6

Microfacet

10.7

Environment map

10.8

Fixed a random number issue (maybe generate a better random number array in the future?)
- Before
- After
Note the fracture on the rabbit's head before the fix.

10.9

MIS (Finally!)
Russian Roulette
- Pro: 60% speedup
- Con: Slower convergence
Depth of field
- Add a realtime slider to adjust
