vk_displacement_micromaps

This is a sample for rendering models using displacement micromaps, resulting in a displaced NVIDIA Micro-Mesh. Micromeshes are a new geometry representation that allows high geometric complexity with a reduced memory footprint and fast BVH build times for raytracing.

Please refer to any additional details:

While micromeshes were designed primarily for raytracing hardware, this sample focuses on rasterizing them, although it also implements a raytracing renderer. Rasterization is demonstrated both with hardware rasterization using task and mesh shaders and with a "software" compute-shader approach that works well for near pixel-sized triangles.

This sample may also be interesting to those looking into doing custom tessellation for rasterization beyond the traditional tessellation shaders.

Note: The Vulkan SDK may not yet contain a shaderc version that supports all the features of this sample. See Known Issues towards the end of this document.

Fixed Meshlet Allocation

As mesh shaders or compute shaders use fixed allocation schemes for the number of vertices/triangles they operate on, we want to maximize the use of that memory. For mesh shaders this is the output space (layout(max_vertices = ..., max_primitives = ...) out;), for compute shaders we use shared memory to store transient data. Both are configured to 64 vertices and 64 triangles.

A micromesh is the result of power-of-two subdivision of a base triangle. We want to create the necessary triangle topology on the fly within the meshlet allocation.

A key feature, and a source of complexity, is load balancing the shader work across varying subdivision levels. Both approaches rely on fixed meshlet allocations and are therefore only efficient when those allocations are well utilized. Triangles with high subdivision levels can be distributed over multiple meshlets, and those with low subdivision levels can be batched together.

Micromeshes support up to subdivision level 5, which yields 4^5 = 1024 microtriangles. That means we need up to 16 meshlets of 64 triangles each to render such a displaced triangle. In the shader code we refer to such partial regions of the input triangle as parts. We can also pack multiple triangles with lower subdivision levels into one meshlet:

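| Subdivision Level | Microtriangles | Parts / Meshlets | Packed | Utilized Triangles | Utilized Vertices |
|---|---|---|---|---|---|
| 0 | 1 | 1 | 16 | 16 | 48 |
| 1 | 4 | 1 | 8 | 32 | 48 |
| 2 | 16 | 1 | 4 | 64 | 60 |
| 3 | 64 | 1 | 1 | 64 | 45 |
| 4 | 256 | 4 | 1 | 64 | 45 |
| 5 | 1024 | 16 | 1 | 64 | 45 |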
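
The numbers in the table can be reproduced with a little arithmetic. The following C++ sketch is illustrative only and is not taken from the sample's source: all names are hypothetical, and the rule that the packing count is rounded down to a power of two is an assumption chosen because it matches the table (as is the fact that packed triangles do not share vertices, which the 16 x 3 = 48 entry suggests).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Fixed meshlet allocation as stated in the text: 64 vertices, 64 triangles.
constexpr uint32_t kMeshletMaxVertices  = 64;
constexpr uint32_t kMeshletMaxTriangles = 64;
// Level 3 yields exactly 64 microtriangles, so higher levels are split into level-3 parts.
constexpr uint32_t kPartLevel = 3;

struct MeshletUsage {
  uint32_t microTriangles;     // microtriangles of the whole base triangle
  uint32_t parts;              // meshlets needed for one base triangle
  uint32_t packed;             // base triangles batched into one meshlet
  uint32_t utilizedTriangles;  // triangles used per meshlet
  uint32_t utilizedVertices;   // vertices used per meshlet
};

// 4^level microtriangles per base triangle.
inline uint32_t triangleCount(uint32_t level) { return 1u << (2u * level); }

// Triangular number of vertices: (2^level + 1) vertices along each edge.
inline uint32_t vertexCount(uint32_t level) {
  const uint32_t perEdge = (1u << level) + 1u;
  return perEdge * (perEdge + 1u) / 2u;
}

// Largest power of two that is <= v (v must be >= 1).
inline uint32_t floorPow2(uint32_t v) {
  uint32_t p = 1;
  while (p * 2u <= v) p *= 2u;
  return p;
}

inline MeshletUsage meshletUsage(uint32_t level) {
  assert(level <= 5);  // maximum subdivision level stated in the text
  MeshletUsage r{};
  r.microTriangles = triangleCount(level);
  if (level >= kPartLevel) {
    // High levels: one base triangle spans several meshlets, each a level-3 part.
    r.parts             = r.microTriangles / kMeshletMaxTriangles;
    r.packed            = 1;
    r.utilizedTriangles = triangleCount(kPartLevel);
    r.utilizedVertices  = vertexCount(kPartLevel);
  } else {
    // Low levels: several base triangles share one meshlet.
    // ASSUMPTION: the count is rounded down to a power of two.
    const uint32_t tris  = triangleCount(level);
    const uint32_t verts = vertexCount(level);
    const uint32_t fit   = std::min(kMeshletMaxTriangles / tris, kMeshletMaxVertices / verts);
    r.parts             = 1;
    r.packed            = floorPow2(fit);
    r.utilizedTriangles = r.packed * tris;
    r.utilizedVertices  = r.packed * verts;
  }
  return r;
}
```

Note how the vertex budget, not the triangle budget, limits packing at levels 0 and 1, which is why those rows leave triangle slots unused.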

To rasterize micromeshes some key operations must be performed:

We recommend the Micro-Mesh Rasterization slides which illustrate these operations in more detail.

This sample implements both vertex decoding variants through an abstract interface, micromesh_decoder_api.glsl. The decoding process adds a fair amount of code complexity, and this layered API makes it easier to benchmark different decoders (during research even more variants existed). However, this setup could be simplified if only the intrinsics were targeted.

The decoder implementations currently use several pre-computed lookup tables to get key properties of each vertex within the meshlet, as well as the relevant meshlet index buffer permutations. As before, if only the intrinsic-based decoder were used, some of this could be handled more easily at runtime by computing vertex uv-coordinates on the fly.
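
To illustrate what computing vertex uv-coordinates on the fly could look like, here is a minimal C++ sketch. It assumes a simple ordering in which vertices are enumerated row by row (increasing u, then increasing v within a row); this ordering and all names are hypothetical and may well differ from the vertex layout the sample and its lookup tables actually use.

```cpp
#include <cassert>
#include <cstdint>

// Integer grid position of a microvertex; u + v <= 2^level.
struct MicroVertexUV {
  uint32_t u;
  uint32_t v;
};

// Barycentric weights relative to the three base-triangle corners.
struct Barycentric {
  float w;
  float u;
  float v;
};

// Maps a linear vertex index to its (u, v) grid position.
// ASSUMPTION: row-by-row ordering, row u holds (2^level + 1 - u) vertices.
inline MicroVertexUV microVertexUV(uint32_t level, uint32_t index) {
  const uint32_t segments = 1u << level;  // edge segments after subdivision
  assert(index < (segments + 1u) * (segments + 2u) / 2u);
  uint32_t u         = 0;
  uint32_t rowLength = segments + 1u;
  while (index >= rowLength) {  // skip complete rows
    index -= rowLength;
    --rowLength;
    ++u;
  }
  return {u, index};
}

// Normalizes the grid position into barycentric weights that sum to one.
inline Barycentric microVertexBarycentric(uint32_t level, uint32_t index) {
  const float         segments = float(1u << level);
  const MicroVertexUV uv       = microVertexUV(level, index);
  const float         u        = float(uv.u) / segments;
  const float         v        = float(uv.v) / segments;
  return {1.0f - u - v, u, v};
}
```

With these weights a shader could interpolate the base triangle's positions and displacement directions without a per-vertex table lookup.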

Note: On NVIDIA hardware launching compute shader workgroups with just 32 threads may yield suboptimal performance. Hence some of the compute shaders in this sample are set to do the equivalent of two jobs at a time, increasing the threads per workgroup. The values for MICRO_FLAT_MESH_GROUPS, MICRO_FLAT_TASK_GROUPS and MICRO_FLAT_SPLIT_TASK_GROUPS were based on a few benchmarks.
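
The idea of handling two jobs per workgroup can be sketched as a simple index mapping. The C++ snippet below is only an illustration: the constants are hypothetical stand-ins (the actual values of MICRO_FLAT_MESH_GROUPS and the related defines are not stated here), and the real logic lives in GLSL shaders.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical configuration: each job wants 32 threads, two jobs share a workgroup.
constexpr uint32_t kThreadsPerJob    = 32;
constexpr uint32_t kJobsPerWorkgroup = 2;  // assumed value, stands in for a *_GROUPS define
constexpr uint32_t kWorkgroupSize    = kThreadsPerJob * kJobsPerWorkgroup;

struct JobThread {
  uint32_t jobIndex;   // which job (e.g. meshlet) this thread works on
  uint32_t laneInJob;  // thread index within that job, 0..31
};

// Maps a workgroup id and local thread id to a job and a lane within the job.
inline JobThread mapThreadToJob(uint32_t workgroupId, uint32_t localThreadId) {
  assert(localThreadId < kWorkgroupSize);
  return {workgroupId * kJobsPerWorkgroup + localThreadId / kThreadsPerJob,
          localThreadId % kThreadsPerJob};
}
```

Each 32-thread half of the workgroup then behaves like an independent 32-thread job, while the hardware sees a 64-thread workgroup.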

Loading

By default the sample loads umesh_Murex_Romosus_compressed.gltf, which can be found in the downloaded_resources/umesh_Murex_Romosus directory. The same directory also contains a version of the file that uses uncompressed displacements, as well as the config files used for the benchmarks mentioned in the slides. The model was converted from Ramose Murex by Three D Scans as part of the Micro-Mesh Construction research paper by Maggiordomo et al.

The top menu allows loading glTF model files.

The models are represented in the MeshSet struct:

Viewing

At the top of the UI you will find drop-downs to change which variant of the model is rendered, as well as the option to overlay a wireframe, although not all renderers support overlays.

Rendering

The application always uses downsampling for anti-aliasing and therefore renders 4x as many pixels as are shown on screen. Simple forward shading is used; you can easily hack the shading output by editing draw_shading.glsl and pressing R to reload.

The renderers only affect the rendering of the displaced data. The list of available renderers in the UI will change depending on whether the loaded file uses compressed or uncompressed data.

Renderer name codes:

Generic files used by renderers

uncompressed common data (rasterization)

uncompressed ms

Uses mesh shaders to render uncompressed displacement data.

uncompressed cs

Uses compute shaders to render uncompressed displacement data.

This renderer does basic "software rasterization" using 64-bit atomics. Each triangle is sampled in dynamic for loops within a single thread, and pixel points within the triangle trigger a 64-bit atomicMin at the appropriate output image location. The upper 32 bits store the depth and the lower 32 bits a payload. The final shading is done as a fullscreen fragment shader pass that turns the 64-bit image into a color and depth output.
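
The depth-plus-payload trick can be sketched on the CPU. The following C++ example is a minimal illustration under stated assumptions, not the sample's shader code: it uses std::atomic in place of GLSL's 64-bit atomicMin, assumes non-negative depth values (so their IEEE-754 bit patterns sort like unsigned integers), samples at pixel centers, and uses hypothetical names throughout.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr uint64_t kClearValue = ~uint64_t(0);  // "infinitely far", no payload

struct Vec3 { float x, y, z; };  // x, y in pixels; z is depth (assumed >= 0)

// Upper 32 bits: depth bits, lower 32 bits: payload. Smaller value == closer.
inline uint64_t packDepthPayload(float depth, uint32_t payload) {
  uint32_t depthBits;
  std::memcpy(&depthBits, &depth, sizeof(depthBits));
  return (uint64_t(depthBits) << 32) | payload;
}

// CPU stand-in for a 64-bit atomicMin, implemented as a compare-exchange loop.
inline void atomicMin64(std::atomic<uint64_t>& target, uint64_t value) {
  uint64_t current = target.load(std::memory_order_relaxed);
  while (value < current &&
         !target.compare_exchange_weak(current, value, std::memory_order_relaxed)) {
  }
}

inline void clearImage(std::vector<std::atomic<uint64_t>>& image) {
  for (auto& pixel : image) pixel.store(kClearValue, std::memory_order_relaxed);
}

// Signed area term of point p relative to edge a->b.
inline float edgeFunction(const Vec3& a, const Vec3& b, float px, float py) {
  return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// Loops over the triangle's bounding box and depth-tests covered pixel centers.
// Like the renderer described above, this does no near/far clipping.
inline void rasterizeTriangle(std::vector<std::atomic<uint64_t>>& image, int width, int height,
                              Vec3 a, Vec3 b, Vec3 c, uint32_t payload) {
  assert(image.size() == size_t(width) * size_t(height));
  const float area = edgeFunction(a, b, c.x, c.y);
  if (area == 0.0f) return;  // degenerate triangle
  const int minX = std::max(0, int(std::floor(std::min({a.x, b.x, c.x}))));
  const int minY = std::max(0, int(std::floor(std::min({a.y, b.y, c.y}))));
  const int maxX = std::min(width - 1, int(std::floor(std::max({a.x, b.x, c.x}))));
  const int maxY = std::min(height - 1, int(std::floor(std::max({a.y, b.y, c.y}))));
  for (int y = minY; y <= maxY; ++y) {
    for (int x = minX; x <= maxX; ++x) {
      const float px = float(x) + 0.5f, py = float(y) + 0.5f;
      // Dividing by the signed area makes the test independent of winding order.
      const float w0 = edgeFunction(b, c, px, py) / area;
      const float w1 = edgeFunction(c, a, px, py) / area;
      const float w2 = edgeFunction(a, b, px, py) / area;
      if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) continue;  // outside the triangle
      const float depth = w0 * a.z + w1 * b.z + w2 * c.z;
      atomicMin64(image[size_t(y) * size_t(width) + size_t(x)], packDepthPayload(depth, payload));
    }
  }
}
```

Because the depth occupies the upper bits, a single atomic minimum both resolves visibility and carries the winning payload, so a later fullscreen pass only needs to unpack each pixel.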

The shading is very much simplified in this sample and always outputs global microtriangle ids as colors.

Software rasterization may be faster than hardware rasterization for subpixel-sized triangles and when only simple values need to be rasterized. The software rasterization logic used here is fairly basic and was not specifically optimized. It will not handle near or far clipping properly, and larger triangles can quickly degrade performance. A more sophisticated renderer would distribute these more complex scenarios to traditional hardware rasterization.

One compute shader pass operates similarly to the task shader phase and stores its result in a global scratch buffer. An indirect dispatch is then computed, which performs the actual rasterization over all visible binpacks. Internally this global scratch buffer is also referred to as the "flat" buffer, and the maximum number of visible elements can be set in the advanced UI (Render Advanced : flat max visible mshlts).
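
Computing the indirect dispatch arguments amounts to a clamp and a rounded-up division. The C++ sketch below shows one plausible form of this step; it is an assumption about how such a setup typically works, not the sample's code. The struct mirrors the x/y/z layout of VkDispatchIndirectCommand, and the function and parameter names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Same three-field layout as VkDispatchIndirectCommand.
struct DispatchIndirectCommand {
  uint32_t x;
  uint32_t y;
  uint32_t z;
};

// visibleCount:     number of elements the first pass appended to the scratch buffer
// maxVisible:       capacity of the scratch buffer (the UI-configurable maximum)
// jobsPerWorkgroup: how many elements one workgroup of the second pass processes
inline DispatchIndirectCommand makeRasterDispatch(uint32_t visibleCount,
                                                  uint32_t maxVisible,
                                                  uint32_t jobsPerWorkgroup) {
  assert(jobsPerWorkgroup > 0);
  // ASSUMPTION: the counter may overshoot the capacity, so it is clamped
  // before use to avoid reading past the end of the scratch buffer.
  const uint32_t count  = std::min(visibleCount, maxVisible);
  const uint32_t groups = (count + jobsPerWorkgroup - 1u) / jobsPerWorkgroup;  // round up
  return {groups, 1u, 1u};
}
```

In a GPU-driven setup this computation would run in a tiny compute shader that writes the three values into the buffer later passed to the indirect dispatch call.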

uncompressed split cs

This is a version of the previous renderer that does some rasterization directly within the task-shading phase: all bins below subdivision level 3 are rasterized directly. The other triangles are sent through the scratch buffer for rasterization in the second pass. This can be slightly faster than the previous renderer.
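
The split decision can be pictured as a simple partition by subdivision level. The following C++ sketch runs sequentially on the CPU and uses hypothetical names; in the actual renderer the equivalent decision would be made per thread on the GPU, with the deferred elements appended to the scratch buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Bins below this level are rasterized in the first pass (threshold from the text).
constexpr uint32_t kDirectRasterMaxLevel = 3;

struct SplitBins {
  std::vector<uint32_t> direct;    // bin indices rasterized immediately
  std::vector<uint32_t> deferred;  // bin indices written to the scratch buffer
};

// Partitions bins by their subdivision level; binLevels[i] is the level of bin i.
inline SplitBins splitBins(const std::vector<uint32_t>& binLevels) {
  SplitBins result;
  for (uint32_t i = 0; i < uint32_t(binLevels.size()); ++i) {
    assert(binLevels[i] <= 5);  // maximum subdivision level
    if (binLevels[i] < kDirectRasterMaxLevel) {
      result.direct.push_back(i);    // cheap: few triangles, handle right away
    } else {
      result.deferred.push_back(i);  // heavy: leave for the second pass
    }
  }
  return result;
}
```

Handling the small bins immediately saves the round trip through the scratch buffer for work that is too small to benefit from a dedicated second pass.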

compressed ray

Uses the raytracing pipeline to generate an image.

The displacement data needs to be created as a VkMicromapEXT, and we also need to build the raytracing BLAS and TLAS.

compressed common data (rasterization)

Most of the common data here is only relevant for rasterization.

The Micro-Mesh Rasterization slides explain many of the details and should be consulted first.

In the UI, "Render Settings > decoder type" selects which decoder logic is used to decompress the displacement at render time.

Micromesh rasterization makes heavy use of pre-computed lookup tables for different micromesh configurations and decoders. As above, you will find relevant information in the appropriate files:

compressed ms

Uses mesh shaders to render compressed displacement data.

The following shader code is relevant:

compressed cs

Similar to uncompressed cs but handles compressed data.

The following shader code is relevant:

compressed split cs

Just like uncompressed split cs but handles compressed data.

Screenshot

(Image: screenshot of the sample application.)

Build

Vulkan SDK 1.3.261.0 or higher

Use CMake to generate the solution in /build and leave the settings as they are. The sample must be built for x64.

Use vk_displacement_micromaps as the start-up solution.

Known Issues & Limitations

Known Issues

You can find shaderc prebuilt binaries here and the latest Vulkan Beta drivers from NVIDIA here.

Limitations

Third Party Licenses