vkgs

Gaussian splatting viewer written in Vulkan.

The main goal of this project is to maximize rendering speed.

Now that I have achieved satisfactory performance with the Vulkan-based viewer, I would like to catch my breath before the next steps, or stop further development and start a new side project: compression, large-scale scenes, training, etc.

Desktop Viewer

The viewer takes pre-trained vanilla 3DGS models as input.

Feature Highlights

Requirements

Dependencies

Build

$ cmake . -B build
$ cmake --build build --config Release -j

Run

$ ./build/vkgs_viewer  # or ./build/Release/vkgs_viewer
$ ./build/vkgs_viewer -i <ply_filepath>

Drag and drop a pre-trained .ply file, e.g. from the official Gaussian splatting Pre-trained Models (14 GB).

Left drag to rotate.

Right drag to translate.

Left+Right drag to zoom in/out.

WASD and Space to move.

Wheel to zoom in/out.

Ctrl+wheel to change FOV.

Performance Test

Rendering Algorithm Details

Like other web-based viewers, it uses the traditional graphics pipeline, drawing splats projected into 2D screen space.

One benefit of using the graphics pipeline rather than a compute pipeline is that splats can be drawn together with other objects and can use graphics pipeline features such as MSAA.

  1. (COMPUTE) rank
    • Cull splats outside the view frustum and create key-value pairs for sorting, keyed by view-space depth.
  2. (COMPUTE) sort
    • Perform a 32-bit key-value radix sort.
    • Use indirect dispatch to sort only visible points. This is not a big deal, since sort time is negligible compared to the projection/rendering steps.
  3. (COMPUTE) inverse
    • Create an inverse index map from each splat's original index to its position in the sorted order.
    • This enables a sequential memory access pattern in the next step.
  4. (COMPUTE) projection
    • Compute the 3D-to-2D Gaussian splat projection and the color from spherical harmonics.
    • Storing spherical harmonics as F16 increased rendering speed.
  5. (GRAPHICS) rendering
    • Simply draw 2D Gaussian quads.
    • Indirect rendering speeds this up by issuing only visible splats in the draw command, reducing the number of shader invocations.
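The rank, sort, and inverse steps above can be sketched on the CPU as follows. This is a minimal illustration under stated assumptions, not the vkgs shader code: culling is simplified to a depth range (no side planes), the sort order is assumed to be back-to-front for alpha blending, Python's `sorted` stands in for the GPU radix sort, and the inverse map is read as "splat id to sorted slot". `rank`, `inverse`, and `project_all` are hypothetical names.

```python
import struct

def float_to_key(depth):
    """Map a positive float32 depth to a uint32 key; ascending keys = far-to-near."""
    bits = struct.unpack("<I", struct.pack("<f", depth))[0]
    # For non-negative IEEE-754 floats the raw bit pattern grows with the value,
    # so inverting it makes an ascending sort produce back-to-front order (assumed).
    return 0xFFFFFFFF - bits

def rank(view_depths, near=0.01, far=100.0):
    """Step 1 sketch: cull, then emit (key, splat_id) pairs for visible splats only."""
    keys, ids = [], []
    for splat_id, z in enumerate(view_depths):
        if near <= z <= far:  # simplified culling: depth range only, no side planes
            keys.append(float_to_key(z))
            ids.append(splat_id)
    return keys, ids

def inverse(sorted_ids, num_splats):
    """Step 3 sketch: splat id -> position in sorted order (-1 if culled)."""
    inv = [-1] * num_splats
    for order, splat_id in enumerate(sorted_ids):
        inv[splat_id] = order
    return inv

def project_all(splats, inv, project):
    """Step 4 access pattern: read splats sequentially, write results to sorted slots."""
    out = [None] * sum(1 for slot in inv if slot >= 0)
    for splat_id, splat in enumerate(splats):  # sequential reads of per-splat data
        slot = inv[splat_id]
        if slot >= 0:
            out[slot] = project(splat)
    return out

depths = [5.0, -1.0, 2.0, 50.0, 200.0]  # view-space depths of five splats
keys, ids = rank(depths)                 # splat 1 is behind the camera, splat 4 beyond `far`
sorted_ids = [i for _, i in sorted(zip(keys, ids))]  # stand-in for the GPU radix sort
inv = inverse(sorted_ids, len(depths))
print(sorted_ids, inv)  # [3, 0, 2] [1, -1, 2, 0, -1]
```

With the inverse map, the projection pass can walk splat data in its original storage order (coalesced reads) and scatter only the small projected results into draw order, instead of gathering large per-splat records (including SH coefficients) in sorted order.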

The projection and rendering steps are the bottlenecks.
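For reference, the per-splat math in the projection step typically follows the original 3DGS formulation (EWA splatting). The sketch below assumes the 3D covariance and center are already in view space, uses hypothetical helper names, and is not taken from the vkgs shaders. It projects the covariance with the Jacobian of the perspective map, derives the quad half-size drawn in the rendering step, computes the degree-0 SH color, and shows the memory saved by storing SH coefficients as F16.

```python
import math
import struct

SH_C0 = 0.28209479177387814  # Y_0^0 = 1 / (2 * sqrt(pi))

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def project_covariance(cov3d_view, mean_view, fx, fy):
    """2x2 screen-space covariance from a 3x3 view-space covariance.

    Local affine approximation of the perspective projection: Sigma2D = J Sigma3D J^T,
    where J is the Jacobian of (fx*x/z, fy*y/z) at the splat center.
    """
    x, y, z = mean_view
    jac = [[fx / z, 0.0, -fx * x / (z * z)],
           [0.0, fy / z, -fy * y / (z * z)]]
    cov = matmul(matmul(jac, cov3d_view), transpose(jac))
    cov[0][0] += 0.3  # small dilation, as in the reference 3DGS rasterizer
    cov[1][1] += 0.3
    return cov

def quad_radius(cov2d):
    """Half-size in pixels of the screen-space quad: 3 sigma along the major axis."""
    a, b, c = cov2d[0][0], cov2d[0][1], cov2d[1][1]
    mid = 0.5 * (a + c)
    lambda_max = mid + math.sqrt(max(0.1, mid * mid - (a * c - b * b)))
    return math.ceil(3.0 * math.sqrt(lambda_max))

def sh_degree0_color(sh_dc):
    """Base RGB from the degree-0 SH coefficients; higher bands add view-dependent terms."""
    return [max(0.0, 0.5 + SH_C0 * c) for c in sh_dc]

def to_f16_bytes(values):
    """Pack coefficients as IEEE half floats (struct format 'e'): 2 bytes each instead of 4."""
    return struct.pack(f"<{len(values)}e", *values)

cov2d = project_covariance([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                           (0.0, 0.0, 2.0), fx=100.0, fy=100.0)
print(quad_radius(cov2d))              # 151
print(len(to_f16_bytes([0.0] * 48)))   # 96 bytes for 48 degree-3 SH coefficients (192 as F32)
```

Degree-3 SH stores 16 coefficients per color channel (48 per splat), so halving their size noticeably reduces the memory traffic of the projection pass, which is consistent with the F16 speedup noted above.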

The current Onesweep radix sort implementation doesn't seem to work on macOS, likely because of the portability issues of single-pass prefix sums discussed here:

https://raphlinus.github.io/gpu/2021/11/17/prefix-sum-portable.html

So I've implemented a reduce-then-scan radix sort instead. There is no big performance difference, even on NVIDIA GPUs.
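As a rough illustration of the reduce-then-scan scheme (a CPU simulation, not the vkgs shader code), here is a 32-bit key-value LSD radix sort with 8-bit digits. Each pass runs the three phases a GPU implementation would dispatch as separate kernels. Unlike Onesweep's single-pass scan, no block waits on another block's partial result within a dispatch, so the scheme does not rely on forward-progress guarantees between workgroups. `BLOCK_SIZE` stands in for a workgroup's tile and is set tiny only for readability; on a GPU the scatter phase would also need a local ranking step instead of this sequential loop.

```python
RADIX_BITS = 8
RADIX = 1 << RADIX_BITS
BLOCK_SIZE = 4  # hypothetical tiny "workgroup" tile so the example is easy to trace

def radix_sort_reduce_then_scan(keys, values):
    """LSD key-value radix sort over 32-bit keys, one 8-bit digit per pass.

    Each pass mirrors the three kernels of a reduce-then-scan sort:
      1. reduce:  every block counts its digits (per-block histogram)
      2. scan:    exclusive prefix sum over histograms in digit-major, block-minor order
      3. scatter: every block writes its elements to their global offsets, stably
    """
    n = len(keys)
    blocks = [range(s, min(s + BLOCK_SIZE, n)) for s in range(0, n, BLOCK_SIZE)]
    for shift in range(0, 32, RADIX_BITS):
        # 1. reduce: per-block digit histograms
        hist = [[0] * RADIX for _ in blocks]
        for b, idx in enumerate(blocks):
            for i in idx:
                hist[b][(keys[i] >> shift) & (RADIX - 1)] += 1
        # 2. scan: offset[b][d] = count of smaller digits + same digit in earlier blocks
        offset = [[0] * RADIX for _ in blocks]
        running = 0
        for d in range(RADIX):
            for b in range(len(blocks)):
                offset[b][d] = running
                running += hist[b][d]
        # 3. scatter: keep input order within a block so each pass is stable
        out_keys, out_values = [0] * n, [0] * n
        for b, idx in enumerate(blocks):
            for i in idx:
                d = (keys[i] >> shift) & (RADIX - 1)
                dst = offset[b][d]
                offset[b][d] += 1
                out_keys[dst], out_values[dst] = keys[i], values[i]
        keys, values = out_keys, out_values
    return keys, values

keys = [0xDEADBEEF, 5, 0, 0xFFFFFFFF, 5, 256, 1 << 24]
sorted_keys, order = radix_sort_reduce_then_scan(keys, list(range(len(keys))))
print(order)  # [2, 1, 4, 5, 6, 0, 3]  (the two keys equal to 5 keep their input order)
```

Stability of every pass is what makes LSD radix sort correct, which is why the scan orders offsets by digit first and block second.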

References

Notes

pygs: Python Binding (WIP)

The GUI is created on a background (non-main) thread. According to the GLFW documentation, windows should be created on the main thread. However, managing windows off the main thread somehow seems to work on Windows and Linux.

Unfortunately, Apple doesn't allow this. Apple’s UI frameworks can only be called from the main thread. Here's a related thread by Apple staff.
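The pattern described above (a viewer loop on a background thread, with load requests returning immediately) can be sketched with a command queue. This is a hypothetical illustration, not the pygs implementation: `BackgroundViewer` and its methods are made-up names, and the window and GPU work are replaced by placeholders.

```python
import queue
import threading
import time

class BackgroundViewer:
    """Hypothetical sketch: a viewer loop on a worker thread, fed through a command queue."""

    def __init__(self):
        self._commands = queue.Queue()
        self.loaded = []  # paths "loaded" by the viewer thread, in arrival order
        self._thread = threading.Thread(target=self._run, daemon=True)

    def show(self):
        self._thread.start()

    def load(self, path):
        # Returns immediately; the viewer thread handles the request on a later iteration.
        self._commands.put(("load", path))

    def close(self):
        self._commands.put(("close", None))
        self._thread.join()

    def _run(self):
        # A real viewer would create its window here. GLFW documents window creation
        # and event processing as main-thread only, and macOS enforces that rule.
        while True:
            try:
                cmd, arg = self._commands.get_nowait()
            except queue.Empty:
                time.sleep(1 / 240)  # placeholder for polling window events and drawing a frame
                continue
            if cmd == "close":
                return
            if cmd == "load":
                self.loaded.append(arg)  # placeholder for parsing the .ply and uploading it to the GPU

viewer = BackgroundViewer()
viewer.show()
viewer.load("bicycle_30000.ply")
viewer.load("garden_30000.ply")
viewer.close()
print(viewer.loaded)  # ['bicycle_30000.ply', 'garden_30000.ply']
```

On macOS the roles would have to be swapped: the window loop stays on the main thread, and the interactive session or loader runs elsewhere.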

Requirements

$ conda create -n pygs python=3.10
$ conda activate pygs
$ conda install conda-forge::cmake
$ conda install conda-forge::pybind11
$ conda install nvidia/label/cuda-12.2.2::cuda-toolkit  # or any other version

Build

The Python package dynamically links to the C++ shared library, so build the shared library first, then install the Python package.

$ cmake . -B build
$ cmake --build build --config Release -j
$ pip install -e binding/python

Test

$ python
>>> import pygs
>>> pygs.show()
>>> pygs.load("./models/bicycle_30000.ply")  # asynchronously load model to viewer
>>> pygs.load("./models/garden_30000.ply")
>>> pygs.close()