University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3
- Yian Chen
- LinkedIn, personal website etc.
- Tested on: Windows 10, AMD Ryzen 5800 HS with Radeon Graphics CPU @ 3.20GHz 16GB, NVIDIA GeForce RTX3060 Laptop 8GB
- Core (as required in Project 3)
- Diffuse & Specular
- Jittering (Antialiasing)
- First Bounce Cache
- Sort by material
- Load gltf
- BVH && SAH
- Texture mapping & bump mapping
- Environment Mapping
- Microfacet BSDF
- Emissive BSDF (with Emissive Texture)
- Direct Lighting
- Multiple Importance Sampling
- Depth of Field
- Tone mapping && Gamma Correction
- Diffuse & Specular
- Jittering
Before jittering | After jittering |
---|---|
This pathtracer loads scenes in the gltf format, chosen for its ability to express complex 3D scenes. Please view this page for more details about gltf.
During development, most test scenes were exported directly from Blender, which made testing much more flexible.
scenes/pathtracer_robots_demo.glb
Link
On the host, a BVH can be constructed and traversed recursively. In this project, however, the code runs on the GPU. Although recent CUDA versions allow recursive functions on the device, we cannot take that risk, because a raytracer is very performance-sensitive. Recursion can slow the kernel down, as it may require a dynamically sized stack.
Following this paper, this pathtracer adopts MTBVH, a BVH construction and traversal algorithm that is stack-free.
This pathtracer implements only a simplified version of MTBVH. Instead of constructing 6 BVHs and traversing one of them at runtime, only 1 BVH is constructed. This means the pathtracer still has room for further speedup.
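To illustrate what stack-free traversal means, here is a minimal host-side sketch, not the code used in this pathtracer. It assumes a flattened node array in which every node stores a "hit" link and a "miss" link, so the loop needs neither recursion nor a stack. The names `FlatNode`, `hitAABB` and `traverse` are hypothetical, and leaves only report a primitive index instead of intersecting triangles. A single set of links is used, which corresponds to the single-BVH simplification described above.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

struct Ray  { float o[3]; float d[3]; };   // origin, direction
struct AABB { float lo[3]; float hi[3]; }; // min and max corners

// Hypothetical flattened node, stored in depth-first order.
struct FlatNode {
    AABB box;
    int  hit;   // next node to visit if the ray hits this box (-1 = done)
    int  miss;  // next node to visit if the ray misses this box (-1 = done)
    int  prim;  // primitive index for leaves, -1 for interior nodes
};

// Slab test: true if the ray hits the box at some t >= 0.
// Relies on IEEE infinities when a direction component is zero.
bool hitAABB(const Ray& r, const AABB& b) {
    float tmin = 0.0f, tmax = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float inv = 1.0f / r.d[a];
        float t0 = (b.lo[a] - r.o[a]) * inv;
        float t1 = (b.hi[a] - r.o[a]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tmin = std::max(tmin, t0);
        tmax = std::min(tmax, t1);
        if (tmax < tmin) return false;
    }
    return true;
}

// Follows hit/miss links until the end marker; no stack, no recursion.
// Returns the primitive indices of all leaves whose box the ray hits.
std::vector<int> traverse(const std::vector<FlatNode>& nodes, const Ray& r) {
    std::vector<int> candidates;
    int cur = nodes.empty() ? -1 : 0;
    while (cur >= 0) {
        const FlatNode& n = nodes[cur];
        if (hitAABB(r, n.box)) {
            if (n.prim >= 0) candidates.push_back(n.prim);
            cur = n.hit;   // descend to the first child, or move on after a leaf
        } else {
            cur = n.miss;  // skip the whole subtree
        }
    }
    return candidates;
}
```

The loop body uses only a fixed number of local variables, so the same structure should carry over to a device function. The full MTBVH scheme would store six sets of links, one per major ray direction, and pick one per ray.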
- With BVH & Without BVH:
With BVH | Without BVH |
---|---|
As expected, the speedup is large, up to 40 times. With a more complex scene, the BVH should give an even higher speedup.
To enhance the detail of mesh surfaces and geometry, texture mapping is a must. Mipmapping is not implemented on the GPU yet, though it should not be difficult to add.
scenes/pathtracer_test_texture.glb
Link
Before bump mapping | After bump mapping |
---|---|
To support a wider range of materials, bsdfs more complex than diffuse/specular are required. Here, we first implement the classic microfacet BSDF to extend the material capabilities of this pathtracer.
This pathtracer uses a Microfacet implementation based on pbrt.
Metalness = 1. Roughness goes from 0 to 1, left to right.
Please note that the sphere used here is not an actual sphere but an icosphere.
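To show how roughness shapes the highlight, here is a small sketch of the isotropic Trowbridge-Reitz (GGX) normal distribution, one of the microfacet distributions described in pbrt. It is an illustration rather than this pathtracer's code: the function names are hypothetical, and the mapping `alpha = roughness * roughness` is an assumed convention that may differ from the one used here.

```cpp
#include <cmath>

// Isotropic Trowbridge-Reitz (GGX) normal distribution function D(h).
// cosThetaH: cosine of the angle between the surface normal and the half vector.
// alpha:     width of the distribution (larger = rougher).
float ggxD(float cosThetaH, float alpha) {
    if (cosThetaH <= 0.0f) return 0.0f;  // half vector below the surface
    const float kPi = 3.14159265358979f;
    float a2 = alpha * alpha;
    float d  = cosThetaH * cosThetaH * (a2 - 1.0f) + 1.0f;
    return a2 / (kPi * d * d);
}

// Assumed mapping from artist-facing roughness to alpha.
// The clamp keeps roughness = 0 from collapsing into a delta distribution.
float roughnessToAlpha(float roughness) {
    float r = std::fmax(roughness, 1e-3f);
    return r * r;
}
```

With a small alpha the distribution is sharply peaked around the normal, which gives the mirror-like spheres on the left. With alpha close to 1 it flattens towards a constant `1 / pi`, which gives the dull spheres on the right.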
scenes/pathtracer_test_microfacet.glb
Link
With texture mapping implemented, we can now use the metallicRoughness texture. Luckily, gltf has good support for the metallic workflow.
scenes/pathtracer_robot.glb
Link
To highlight how MIS speeds up convergence, Russian roulette is disabled for the renders in this part.
A thin dark stripe is visible in some renders. This is because double-sided lighting is disabled by default in this pathtracer.
By default, the number of light samples is set to 3.
When sampling the direction of the next bounce, we have mostly used importance sampling of the bsdf. This speeds up convergence for specular materials, because the sampling strategy aligns closely with the expected radiance distribution over the hemisphere. For diffuse/matte surfaces, however, this strategy can be improved, because the dominant factor in the radiance distribution of such materials is the light rather than the outgoing direction. Sampling from the lights is therefore also a valuable strategy for speeding up convergence on rough surfaces.
In this demo scene, three metal planes are placed with four cube lights. When we only sample the bsdf, the expected radiance on the surface of the metal planes converges well. When we only sample the lights, the rougher part of the scene, the white back wall, converges faster. Hence, we want a sampling strategy that combines the advantages of both, which is multiple importance sampling.
scenes/pathtracer_mis_demo.glb
Link
Only sample bsdf 500spp | Only sample light 500spp | MIS 500spp |
---|---|---|
For more details, see this part of pbrt or this post of mine.
Tested on the bunny scene, where faster convergence can be observed.
scenes/pathtracer_bunny_mis.glb
Link
Without MIS 256spp | With MIS 256spp |
---|---|
Without MIS 5k spp | With MIS 5k spp |
For depth of field, we define two variables: focal_length and aperture.
More details can be viewed in this post.
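As an illustration of how these two variables could drive a thin-lens camera, here is a minimal sketch. It assumes that `aperture` is the lens radius and `focalLength` is the distance from the camera to the plane in focus. Both interpretations are assumptions, and the names `Vec3`, `Ray` and `thinLensRay` are hypothetical. The camera sits at the origin of camera space and looks down +z.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin; Vec3 dir; };

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// pinholeDir: direction of the ordinary pinhole ray (pinholeDir.z > 0).
// u1, u2:     uniform random numbers in [0, 1).
Ray thinLensRay(Vec3 pinholeDir, float aperture, float focalLength,
                float u1, float u2) {
    const float kPi = 3.14159265358979f;
    // Point where the pinhole ray crosses the plane of focus z = focalLength.
    float t = focalLength / pinholeDir.z;
    Vec3 focus = { pinholeDir.x * t, pinholeDir.y * t, focalLength };
    // Uniformly sample a point on the lens disk.
    float r   = aperture * std::sqrt(u1);
    float phi = 2.0f * kPi * u2;
    Vec3 o = { r * std::cos(phi), r * std::sin(phi), 0.0f };
    // The new ray starts on the lens and passes through the focus point.
    Vec3 d = normalize({ focus.x - o.x, focus.y - o.y, focus.z - o.z });
    return { o, d };
}
```

Every ray generated for a pixel passes through the same point on the plane of focus, so objects on that plane stay sharp while everything else is blurred in proportion to the lens radius. With `aperture = 0` the function falls back to a pinhole camera.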
Depth of Field (Aperture=0.3) |
---|
Due to time limits, this code has not been refactored to be production-ready. Many features are controlled by macros or even hardcoded, which is far from ideal. More refactoring may follow in the future.
To run this program for now, users need to adjust the code.
Most changes should be made within pathtrace.cu.
...
// Default settings
#define USE_FIRST_BOUNCE_CACHE 0
#define USE_SORT_BY_MATERIAL 0
#define MIS 1 // Multiple Importance Sampling
#define RUSSIAN_ROULETTE 0
#if RUSSIAN_ROULETTE
#define RR_THRESHOLD 0.7f
#endif
#define USE_ENV_MAP 1
#define USE_BVH 1
#define TONE_MAPPING 0
...
// Change loaded scene here
static Scene * hst_scene = new Scene("..\\scenes\\pathtracer_robots_demo.glb");
...
Most settings should be self-explanatory.
Settings can also be changed in bvh.h:
...
#define BVH_NAIVE 1
#define BVH_SAH 0
...
These macros select whether the BVH is built with a naive split or with the surface area heuristic (SAH).
- More CUDA optimization
- Bank conflict
- Loop unroll
- Light sample loop (if multiple light rays)
- Higher parallelism (Use streams?)
- Tile-based raytracing
- Potentially, this should increase rendering speed, as it maximizes locality within one pixel/tile. No more realtime camera movement, though.
- Adaptive Sampling
- Mipmap
- ReSTIR
- Refractive
- True BSDF (Add some subsurface scattering if possible?)
- Volume Rendering
(Ready for NeRF)
Below are the development history and some bloopers of this program. If you are not interested, there is no need to keep reading.
-
Load mesh within arbitrary scene
- Triangle
- Integrate tinygltf
- Scene Node Tree
-
Core
- G Buffer
- Russian Roulette
- Sort by material
-
More BSDF
- Diffuse
- Emissive
- Microfacet
- Reflective
- Refractive
- Disney
-
BVH
- Basic BVH
- BoundingBox Array
- Construct BVH
- Traverse BVH
- Better Heuristics
- SAH
- MTBVH
-
Texture
- Naive texture sampling
- A Resource Manager to help get the handle to texture?
- Bump mapping
- Displacement mapping
- Deal with antialiasing
-
Better sampler
- Encapsulate a sampler class
- Gotta deal with cmake issue
- Monte carlo sampling
- Importance sampling
- Direct Lighting
- Multiple Importance Sampling
-
Camera
- Jitter
- Depth of field
- Motion blur
-
Denoiser
- Use Intel Open Image Denoise for now
09.20
- Basic raytracer
- Refactor integrator
- First triangle!
09.21-22
- Load arbitrary scene (geometry only)
09.23-09.26
Wasted too much time on OOP. Eventually switched to C-style coding.
09.26 Finally finished gltf loading and basic bsdf.
- A brief trial
09.27
Naive BVH (probably done...) Scene with 1k faces
- One bounce, normal shading
- Multiple bounces
09.28
09.29
-
Texture mapping
-
Bump mapping
-
Texture aliasing is indeed quite serious!
- However, to implement antialiasing for texture mapping, I may need to consider implementing mipmapping.
-
Microfacet model pbr failed...
- Need to read through the microfacet part and think about how roughness and metallic should actually be used
09.30
- Microfacet
- Metal Fresnel hack
- Conductor
- After mixing, need to consider how to sample
- Camera
- Antialiasing
10.1-10.2 Try to refactor camera
- Failed. gltf seems to have a really ambiguous definition of camera.
10.3
- Denoising
- OpenImage Denoiser built
- CPU only for now
- Figure out how to build oidn for cuda
- Integrate it into project
10.4-10.6
- Microfacet
10.7
- Environment map
10.8
-
Fix random number issue (maybe try to generate a better random number array in the future?)
Note the fracture on the rabbit's head before the fix
10.9
-
MIS (Finally!)
-
Russian Roulette
- Pro: Speed up by 60%
- Con: Slower convergence
-
Depth of field
- Add a realtime slider to adjust