Awesome
Steps to Build and Run
- git clone the repo
- git submodule update --init --recursive
- Run .\tools\premake5\win\premake5.exe vs2022
- Build directory will be created which will have the solution file.
HIP-BVH-Construction
This repo is implementation of different GPU BVH build methods and optimizations. Following methods are on the roadmap.
- LBVH - Two pass method based on the research paper Maximizing Parallelism in the Construction of BVHs,Octrees, and k-d Trees [DONE]
- LBVH - Single pass method based on the research paper Fast and Simple Agglomerative LBVH Construction [DONE]
- Inplace LBVH - Based on the HIPRT paper.Refer to HIPRT source for more details. [DONE]
- Inplace PLOC++ - When nClusters drop below block size we can optimize the PLOC++ further. After this ploc bvh builder takes ~1ms less. Based on the HIPRT paper. Refer to HIPRT source for more details. [DONE]
- HPLOC - Based on the paper. [DONE]
- Ploc++ - Based on the PLOC++ paper. You can find the kernel is heavily commented to help understand each step more in detail. [DONE]
- Collapse LBVH to nwide BVH - Based on the research paper Getting Rid of Packets Efficient SIMD Single-Ray Traversal using Multi-branching BVHs [Cpu/Gpu DONE] The BVH SAH cost for bunny model drops from ~46 to ~22 while for sponza it drops to ~59 from ~131.
- Binned SAH Builder - Based on research paper On fast Construction of SAH-based Bounding Volume Hierarchies, by I. Wald [DONE] Currently this is CPU only build, though implemented based on task model so should be easy to port to GPU.
- Early Split Clipping - Based on the research paper Early Split Clipping for Bounding Volume Hierarchies Basic idea is to split the AABB of the primitive along the maximum extent midpoint. The problem is leaf node explosion and the criteria to decide to split is user defined parameter. It is hard to find a generic value for this heuristic. Though, I dont see any performance boost in traversal at least for sponza.(Hard to verify my implementation as of now).
- If-If/while-while traversal shaders - Based on the research paper Understanding the Efficiency of Ray Traversal on GPUs – Kepler and Fermi Addendum
- Restart Trail Traversal : Based on the research paper.
- Speculative while/while
Note : You might find function duplication in the kernel files. I did not get time to refactor this much though it helps to understand code as we have all code in one file.
Reference Images
Leaf Node Visits
Performance Numbers With Two Pass LBVH
Timings for Bunny Box(150K triangles) on RX6800 AMD
-----------------------Perf Times----------------------------
CalculateCentroidExtentsTime :0.247701ms
CalculateMortonCodesTime :0.037101ms
SortingTime : 0.1635ms
BvhBuildTime : 0.693201ms
CollapseTime : 2.9558ms
Bvh Cost : 22.6397
Total Time : 1.1415ms
Timings for Sponza(260K triangles) on RX6800 AMD
-------------------------Perf Times------------------------
CalculateCentroidExtentsTime :0.223ms
CalculateMortonCodesTime :0.082599ms
SortingTime : 0.249ms
BvhBuildTime : 0.929ms
CollapseTime : 3.6383ms
Bvh Cost : 59.4779
Total Time : 1.4836ms
Performance Numbers with Single Pass LBVH
Timings for Bunny Box(150K triangles) on RX6800 AMD
------------------------Perf Times------------------------------
CalculateCentroidExtentsTime :0.2209ms
CalculateMortonCodesTime :0.0465ms
SortingTime : 0.173501ms
BvhBuildTime : 0.4865ms
CollapseTime : 3.3596ms
Bvh Cost : 22.6397
Total Time : 0.927401ms
Timings for Sponza(260K triangles) on RX6800 AMD
-----------------------Perf Times---------------------------
CalculateCentroidExtentsTime :0.224899ms
CalculateMortonCodesTime :0.0853ms
SortingTime : 0.2496ms
BvhBuildTime : 0.428799ms
CollapseTime : 3.316ms
Bvh Cost : 59.4779
Total Time : 0.988598ms
Performance Numbers with HPLOC
Timings for Bunny Box(150K triangles) on RX6800 AMD
-----------------------Perf Times----------------------------
CalculateCentroidExtentsTime :0.2593ms
CalculateMortonCodesTime :0.070599ms
SortingTime : 0.177599ms
BvhBuildTime : 0.5147ms
CollapseTime : 3.3104ms
Bvh Cost : 21.9676
Total Time : 1.0222ms
Timings for Sponza(260K triangles) on RX6800 AMD
-----------------------Perf Times---------------------------
CalculateCentroidExtentsTime :0.279299ms
CalculateMortonCodesTime :0.204699ms
SortingTime : 0.2535ms
BvhBuildTime : 0.6133ms
CollapseTime : 4.0426ms
Bvh Cost : 48.2362
Total Time : 1.3508ms
Performance Numbers with PLOC++
Timings for Bunny Box(150K triangles) on RX6800 AMD
-----------------------Perf Times---------------------------
CalculateCentroidExtentsTime :0.1981ms
CalculateMortonCodesTime :0.0776ms
SortingTime : 0.1944ms
BvhBuildTime : 0.688ms
CollapseTime : 3.2174ms
Bvh Cost : 21.9248
Total Time : 1.1581ms
Timings for Sponza(260K triangles) on RX6800 AMD
-----------------------Perf Times---------------------------
CalculateCentroidExtentsTime :0.2975ms
CalculateMortonCodesTime :0.1004ms
SortingTime : 0.2598ms
BvhBuildTime : 0.983297ms
CollapseTime : 4.3278ms
Bvh Cost : 48.842
Total Time : 1.641ms
Note : Timings are on AMD 6800W Pro