Realtime DXT/BCn compression, in Unity, on the GPU

Small testbed to see how compute shaders can be used to do texture compression on the GPU in Unity.

Screenshot

Outline of how to do GPU texture compression:

  1. Input is any 2D texture (regular texture, render texture etc.) that the GPU can sample.
  2. We'll need a temporary RenderTexture that is 4x smaller than the destination texture on each axis, so that each "pixel" in it is one BCn block. The format of this texture is GraphicsFormat.R32G32_SInt (64 bits) for DXT1/BC1, and GraphicsFormat.R32G32B32A32_SInt (128 bits) otherwise. Make it writable from a compute shader by setting enableRandomWrite=true.
  3. Output is a Texture2D of the same size as the input (plus any padding needed to reach a multiple-of-4 size), using one of the compressed formats (DXT1/BC1, DXT5/BC3 etc.). It only needs to exist on the GPU, so create the Texture2D with TextureCreationFlags.DontInitializePixels | TextureCreationFlags.DontUploadUponCreate flags to save some time, and call Apply(false, true) on it; the last argument ditches the CPU-side memory copy.
  4. A compute shader reads input texture from step 1, does {whatever GPU texture compression you do}, and writes into the "one pixel per BCn block" temporary texture from step 2.
  5. Now we must copy from the temporary "one pixel per BCn block" texture (step 2) into the actual destination texture (step 3). Graphics.CopyTexture or CommandBuffer.CopyTexture with just source and destination textures will not work, since that overload checks whether width and height match, and they don't (they differ 4x on each axis). But the Graphics.CopyTexture overload (or the CommandBuffer equivalent) that takes srcElement and dstElement arguments (zeroes for the largest mip level) does work!
  6. Profit! 📈
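
The size bookkeeping in steps 2, 3 and 5 can be sketched in a few lines. This is not Unity code: it is plain Python used only to show the arithmetic, and the names (padded_size, block_grid, layout) are hypothetical helpers, not anything from this project. It assumes the usual BCn layout of 4x4 pixel blocks, with 8 bytes per BC1 block and 16 bytes per BC3 block.

```python
# Illustration only: hypothetical helpers, not part of the Unity project.

BLOCK_DIM = 4  # BCn blocks cover 4x4 pixels

# Bytes per compressed block: BC1 is 64 bits, BC3 is 128 bits.
BYTES_PER_BLOCK = {"BC1": 8, "BC3": 16}


def padded_size(width, height):
    """Round each dimension up to the next multiple of 4 (step 3)."""
    def pad(v):
        return (v + BLOCK_DIM - 1) // BLOCK_DIM * BLOCK_DIM
    return pad(width), pad(height)


def block_grid(width, height):
    """Size of the temporary 'one pixel per BCn block' texture (step 2)."""
    pw, ph = padded_size(width, height)
    return pw // BLOCK_DIM, ph // BLOCK_DIM


def layout(width, height, fmt):
    """Collect the sizes involved for a given input size and format."""
    pw, ph = padded_size(width, height)
    bw, bh = block_grid(width, height)
    bytes_per_block = BYTES_PER_BLOCK[fmt]
    return {
        "output_size": (pw, ph),                 # compressed Texture2D
        "temp_size": (bw, bh),                   # temporary RenderTexture
        "temp_texel_bits": bytes_per_block * 8,  # 64 or 128 bits per texel
        "total_bytes": bw * bh * bytes_per_block,
    }


if __name__ == "__main__":
    print(layout(1280, 720, "BC3"))
    print(layout(1281, 721, "BC1"))
```

For a 1280x720 input in BC3 this gives a 320x180 temporary texture with 128-bit texels. Each texel of the temporary texture holds exactly one compressed block, so both textures in step 5 contain the same number of bytes even though their width and height differ.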

What is in this project:

Project is based on Unity 2022.3.4. There's one scene that renders things, compresses the rendered result and displays it on screen. The display on screen also shows the difference (multiplied 2x) between original and compressed, as well as alpha channel and difference of that between original and compressed.

Actual GPU texture compressors are just code taken from external projects, under GPUTexCompression/External:

It is extremely likely that better real-time compute shader texture compressors are possible; the two above are just the ones I found that were already written in HLSL. There's also Betsy, but that one is written in GLSL, and possibly some others. This example is not so much about the compressor itself, but rather "how to plug that into Unity".

Timings for compression of 1280x720 image into BC3 format on several configurations I tried:

| Compressor | GeForce 3080 Ti (D3D11, D3D12, Vulkan) | Apple M1 Max (Metal) |
|---|---|---|
| XDK | 0.01ms, RMSE 3.877, 2.006 | 0.01ms, RMSE 3.865, 1.994 |
| AMD q<0.5 | 0.01ms, RMSE 3.562, 2.006 | 0.17ms, RMSE 3.563, 1.994 |
| AMD q<0.8 | 0.01ms, RMSE 2.817, 2.006 | 0.83ms, RMSE 2.819, 1.994 |
| AMD q<=1 | 3.10ms, RMSE 2.544, 1.534 | 117ms😲, RMSE 2.544, 1.524 |
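
This page does not spell out how the RMSE figures are computed. As a rough guide to reading them, here is a minimal sketch of a conventional root-mean-square error over two equally sized lists of channel values (for example 8-bit values in 0..255). The function name rmse and the choice of value range are assumptions for illustration, and the project's actual metric may differ.

```python
import math


def rmse(original, compressed):
    """Root-mean-square error between two equally long value sequences."""
    if len(original) != len(compressed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((a - b) ** 2 for a, b in zip(original, compressed))
    return math.sqrt(squared_error / len(original))


if __name__ == "__main__":
    # Every value is off by exactly 2, so the error is 2.0.
    print(rmse([0, 0, 0, 0], [2, 2, 2, 2]))
```

Lower is better: identical inputs give 0, and a uniform error of 2 levels on every value gives 2.0.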

On Apple/Metal the AMD compressor at the "high" quality level is astonishingly slow when using the default (FXC) HLSL shader compiler. However, switching to the more modern DXC shader compiler via #pragma use_dxc metal does not work at all: it gives an "Error creating compute pipeline state: Compiler encountered an internal error" failure when the compute shader is actually used. Fun!