Home

Awesome

<p align="center"> <!-- license badge --> <a href="https://github.com/nerfstudio-project/nerfstudio/blob/master/LICENSE"> <img alt="License" src="https://img.shields.io/badge/License-Apache_2.0-blue.svg"></a> <!-- stars badge --> <a href="https://github.com/nyu-systems/Grendel-GS/stargazers"> <img alt="GitHub stars" src="https://img.shields.io/github/stars/nyu-systems/Grendel-GS?style=social"/> </a> <!-- community badges --> <!-- <a href="https://discord.gg/uMbNqcraFc"><img src="https://dcbadge.vercel.app/api/server/uMbNqcraFc?style=plastic"/></a> --> <!-- last commit badge --> <!-- <a href="https://github.com/TarzanZhao/Dist-GS/commits/master"> <img alt="Last Commit" src="https://img.shields.io/github/last-commit/TarzanZhao/Dist-GS"/> </a> --> <!-- pull requests badge --> <a href="https://github.com/nyu-systems/Grendel-GS/pulls"> <img alt="Pull Requests" src="https://img.shields.io/github/issues-pr/nyu-systems/Grendel-GS"/> </a> </p> <div align="center">

Grendel-GS

<h4>Gaussian Splatting at Scale with Distributed Training System</h4>

Paper | Project Page

<div align="left"> <div align="center"> <img src="assets/teaser.png" width="900"> </div> <details> <summary> <strong> Click Here to Download Pre-trained Models behind the above visualizations </strong> </summary> </details> <!-- ### [Rubble Model ](https://www.wpeebles.com/DiT) | [Rubble Evaluation Images](https://www.wpeebles.com/DiT) | [MatrixCity Model](https://www.wpeebles.com/DiT) | [MatrixCity Evaluation Images](https://www.wpeebles.com/DiT) ### [Pre-trained Rubble Model(the left one)](https://www.wpeebles.com/DiT) | [Rubble Evaluation Images(the left one)](https://www.wpeebles.com/DiT) | [Pre-trained MatrixCity Model(the right one)](https://www.wpeebles.com/DiT) | [MatrixCity Evaluation Images](https://www.wpeebles.com/DiT) -->

Overview

We design and implement Grendel-GS, which serves as a distributed implementation of 3D Gaussian Splatting training. We aim to help 3DGS achieve its scaling laws with distributed system support, just as the achievements of current LLMs rely on distributed system.

By using Grendel, your 3DGS training could leverage multiple GPUs' capability to achieve significantly faster training, supports a substantially more Gaussians in GPU memory, and ultimately allows for the reconstruction of larger-area, higher-resolution scenes to better PSNR. Grendel-GS retains the original algorithm, making it a direct and safe replacement for original 3DGS implementation in any Gaussian Splatting workflow or application.

For examples, with 4 GPU, Grendel-GS allows you to:

<!-- *Many more new features are developing now, following us!* -->

📢 News

🌟 Follow us for future updates! Interested in collaborating or contributing? Email us!

Table of contents


Why use Grendel-GS

Here is a diagram showing why you may need distributed gaussian splatting training like our Grendel-GS' techniques:

whydistributed

How to use Grendel-GS

This repo and its dependency, our customized distributed version of rendering cuda code(diff-gaussian-rasterization), are both forks from the original 3DGS implementation. Therefore, the usage is generally very similar to the original 3DGS.

The two main differences are:

  1. We support training on multiple GPUs, using the torchrun command-line utility provided by PyTorch to launch jobs.
  2. We support batch sizes greater than 1, with the --bsz argument flag used to specify the batch size.
<!-- Our repository contains a distributed PyTorch-based optimizer to produce a 3D Gaussian model from SfM inputs. The optimizer uses PyTorch and CUDA extensions to produce trained models. -->

Setup

Cloning the Repository

The repository contains submodules, thus please check it out with

git clone git@github.com:nyu-systems/Grendel-GS.git --recursive

Pytorch Environment

Ensure you have Conda, GPU with compatible driver and cuda environment installed on your machine, as prerequisites. Then please install PyTorch, Torchvision, Plyfile, tqdm which are essential packages. Make sure PyTorch version >= 1.10 to have torchrun for distributed training. Finally, compile and install two dependent cuda repo diff-gaussian-rasterization and simple-knn containing our customized cuda kernels for rendering and etc.

We provide a yml file for easy environment setup. However, you should choose the versions to match your local running environment.

conda env create --file environment.yml
conda activate gaussian_splatting

NOTES: We kept additional dependencies minimal compared to the original 3DGS. For environment setup issues, maybe you could refer to the original 3DGS repo issue section first.

Dataset

We use colmap format to load dataset. Therefore, please download and unzip colmap datasets before trainning, for example Mip360 dataset and 4 scenes from Tanks&Temple and DeepBlending.

<!-- Its core idea is to leverage more GPU's capability during training and to increase the batch size to utilize these GPU better. Therefore, we could accommodate much more gaussian primitives in large-scale and high resolution scenes, and speed up at the same time. It contains a distributed PyTorch-based optimizer to produce a 3D Gaussian model from SfM inputs. The optimizer uses PyTorch and CUDA extensions in a Python environment to produce trained models. --> <!-- _Authors: [**Hexu Zhao¹**](https://tarzanzhao.github.io), [**Haoyang Weng¹\***](https://egalahad.github.io), [**Daohan Lu¹\***](https://daohanlu.github.io), [**Ang Li²**](https://www.angliphd.com), [**Jinyang Li¹**](https://www.news.cs.nyu.edu/~jinyang/), [**Aurojit Panda¹**](https://cs.nyu.edu/~apanda/), [**Saining Xie¹**](https://www.sainingxie.com)_ (\* *Indicates equal contribution*) _Affiliations: [**¹New York University**](https://cs.nyu.edu/home/index.html), [**²Pacific Northwest National Laboratory**](https://www.pnnl.gov)_ - **[Arxiv Paper](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fp6/03-05-2024/README.md)** - **[Pre-trained Models (14 GB)](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fp6/03-05-2024/README.md)** - **[Evaluation Images (7 GB)](https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/evaluation/images.zip)** (TODO: check all the links here before releasing) Abstract: *3D Gaussian Splatting (3DGS) is gaining popularity for 3D reconstruction because of its superior visual quality and rendering speed. However, training is currently done on a single GPU, and thus cannot handle high-resolution and large-scale 3D reconstruction tasks due to the GPU's memory capacity limit. We build a distributed system, called Grendel, to partition 3DGS' parameters and parallelize its computation across multiple GPUs. As each Gaussian affects a small and changing subset of rendered pixels, Grendel relies on sparse all-to-all communication to transfer each required Gaussian to a pixel partition and performs dynamic load balancing. Unlike existing 3DGS systems that train using one camera view image at a time, Grendel supports batched training using multiple views. We explore different learning rate scaling strategies and identify the simple sqrt(batch size) scaling rule to be highly effective. Evaluation using large-scale high-resolution scenes show that Grendel can improve rendering quality by scaling up 3DGS quantity using multiple GPUs. On the "Rubble" dataset, we achieve a test PSNR of 27.28 by distributing 40.4 million Gaussians across 16 GPUs. By comparison, one achieves a PSNR of 26.28 when using 11.2 million Gaussians in order to fit in a single GPU's memory.* <section class="section" id="BibTeX"> <div class="container is-max-desktop content"> <h2 class="title">BibTeX</h2> <pre><code>@Article{kerbl3Dgaussians, author = {Kerbl, Bernhard and Kopanas, Georgios and Leimk{\"u}hler, Thomas and Drettakis, George}, title = {3D Gaussian Splatting for Real-Time Radiance Field Rendering}, journal = {ACM Transactions on Graphics}, number = {4}, volume = {42}, month = {July}, year = {2023}, url = {https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/} }</code></pre> </div> (TODO:change it to our bib) </section> --> <!-- --- # Repository Overview Our repository is a research-focused framework for distributed Gaussian Splatting training, as detailed in the paper. Its core idea is to leverage more GPU by distributed computation during training and to increase the batch size to utilize these GPU better. Therefore, we could accommodate much more gaussian primitives in large-scale and high resolution scenes, and speed up at the same time. It contains a distributed PyTorch-based optimizer to produce a 3D Gaussian model from SfM inputs. The optimizer uses PyTorch and CUDA extensions in a Python environment to produce trained models. Advantages of using our distributed implementation for gaussians splatting: 1. Train at higher resolution. 2. Train larger scenes. 3. Train the same scene dozens of times faster. 4. Increased PSNR with the same training time. -->

Training

For single-GPU non-distributed training with batch size of 1:

python train.py -s <path to COLMAP dataset> --eval

For 4 GPU distributed training and batch size of 4:

torchrun --standalone --nnodes=1 --nproc-per-node=4 train.py --bsz 4 -s <path to COLMAP dataset> --eval
<details> <summary><span style="font-weight: bold;">Command Line Arguments for train.py</span></summary>

--source_path / -s

Path to the source directory containing a COLMAP data set.

--model_path / -m

Path where the trained model and loggings should be stored (/tmp/gaussian_splatting by default).

--eval

Add this flag to use a MipNeRF360-style training/test split for evaluation.

--bsz

The batch size(the number of camera views) in single step training. 1 by default.

--backend

The CUDA backend to use in training. Valid options include diff (diff-gaussian-rasterization) and gsplat. diff by default.

--lr_scale_mode

The mode of scaling learning rate given larger batch size. sqrt by default.

--preload_dataset_to_gpu

Save all groundtruth images from the dataset in GPU, rather than load each image on-the-fly at each training step. If dataset is large, preload_dataset_to_gpu will lead to OOM; when the dataset is small, preload_dataset_to_gpu could speed up the training a little bit by avoiding some cpu-gpu communication.

--iterations

Number of total iterations to train for, 30_000 by default.

--test_iterations

Space-separated iterations at which the training script computes L1 and PSNR over test set, 7000 30000 by default.

--save_iterations

Space-separated iterations at which the training script saves the Gaussian model, 7000 30000 <iterations> by default.

--checkpoint_iterations

Space-separated iterations at which to store a checkpoint for continuing later, saved in the model directory.

--start_checkpoint

Path to a saved checkpoint to continue training from.

--white_background / -w

Add this flag to use white background instead of black (default), e.g., for evaluation of NeRF Synthetic dataset.

--sh_degree

Order of spherical harmonics to be used (no larger than 3). 3 by default.

--feature_lr

Spherical harmonics features learning rate, 0.0025 by default.

--opacity_lr

Opacity learning rate, 0.05 by default.

--scaling_lr

Scaling learning rate, 0.005 by default.

--rotation_lr

Rotation learning rate, 0.001 by default.

--position_lr_max_steps

Number of steps (from 0) where position learning rate goes from initial to final. 30_000 by default.

--position_lr_init

Initial 3D position learning rate, 0.00016 by default.

--position_lr_final

Final 3D position learning rate, 0.0000016 by default.

--position_lr_delay_mult

Position learning rate multiplier (cf. Plenoxels), 0.01 by default.

--densify_from_iter

Iteration where densification starts, 500 by default.

--densify_until_iter

Iteration where densification stops, 15_000 by default.

--densify_grad_threshold

Limit that decides if points should be densified based on 2D position gradient, 0.0002 by default.

--densification_interval

How frequently to densify, 100 (every 100 iterations) by default.

--opacity_reset_interval

How frequently to reset opacity, 3_000 by default.

--lambda_dssim

Influence of SSIM on total loss from 0 to 1, 0.2 by default.

--percent_dense

Percentage of scene extent (0--1) a point must exceed to be forcibly densified, 0.01 by default.

</details> <br>

Rendering

python render.py -s <path to COLMAP dataset> --model_path <path to folder of saving model> 
<details> <summary><span style="font-weight: bold;">Command Line Arguments for render.py</span></summary>

--model_path / -m

Path to the trained model directory you want to create renderings for.

--skip_train

Flag to skip rendering the training set.

--skip_test

Flag to skip rendering the test set.

--distributed_load

If point cloud models are saved distributedly during training, we should set this flag to load all of them.

--quiet

Flag to omit any text written to standard out pipe.

The below parameters will be read automatically from the model path, based on what was used for training. However, you may override them by providing them explicitly on the command line.

--source_path / -s

Path to the source directory containing a COLMAP or Synthetic NeRF data set.

--images / -i

Alternative subdirectory for COLMAP images (images by default).

--eval

Add this flag to use a MipNeRF360-style training/test split for evaluation.

--llffhold

The training/test split ratio in the whole dataset for evaluation. llffhold=8 means 1/8 is used as test set and others are used as train set.

--white_background / -w

Add this flag to use white background instead of black (default), e.g., for evaluation of NeRF Synthetic dataset.

</details>

For interactive rendering, please refer to GaussFusion, which also support rendering two checkpoints with interactive controls.

Evaluating metrics

python metrics.py --model_path <path to folder of saving model> 
<details> <summary><span style="font-weight: bold;">Command Line Arguments for metrics.py</span></summary>

--model_paths / -m

Space-separated list of model paths for which metrics should be computed.

</details> <br>

Migrating from original 3DGS codebase

If you are currently using the original 3DGS codebase for training in your application, you can effortlessly switch to our codebase because we haven't made any algorithmic changes. This will allow you to train faster and successfully train larger, higher-precision scenes without running out of memory (OOM) within a reasonable time frame.

It is worth noting that we only support the training functionality; this repository does not include the interactive viewer, network viewer, or colmap features from the original 3DGS. We are actively developing to support more features. Please let us know your needs or directly contribute to our project. Thank you!


Benefits and Examples

Significantly Faster Training Without Compromising Reconstruction Quality On Mip360 Dataset

Training Time

30k Train Time(min)stumpbicyclekitchenroomcountergardenbonsai
1 GPU + Batch Size=124.0330.1825.5822.4521.630.1519.18
4 GPU + Batch Size=19.0711.679.538.938.8210.858.03
4 GPU + Batch Size=45.226.476.986.185.986.485.28

Test PSNR

30k Test PSNRstumpbicyclekitchenroomcountergardenbonsai
1 GPU + Batch Size=126.6125.2131.431.428.9327.2732.01
4 GPU + Batch Size=126.6525.1931.4131.3828.9827.2831.92
4 GPU + Batch Size=426.5925.1731.3731.3228.9827.231.94

Reproduction Instructions

  1. Download and unzip the Mip360 dataset.
  2. Activate the appropriate conda/python environment.
  3. To execute all experiments and generate this table, run the following command:
    bash examples/mip360/eval_all_mip360.sh <path_to_save_experiment_results> <path_to_mip360_dataset>
    

Significantly Speed up and Reduce per-GPU memory usage on Mip360 at 4K Resolution

Configuration50k Training TimeMemory Per GPUPSNR
bicycle + 1 GPU + Batch Size=12h 38min37.1823.78
bicycle + 4 GPU + Batch Size=10h 50min10.3923.79
garden + 1 GPU + Batch Size=12h 49min29.8726.06
garden + 4 GPU + Batch Size=10h 50min7.8826.06

Unlike the typical approach of downsampling the Mip360 dataset by a factor of four before training, our system can train directly at full resolution. The bicycle and garden images have resolutions of 4946x3286 and 5187x3361, respectively. Our distributed system demonstrates that we can significantly accelerate and reduce memory usage per GPU by several folds without sacrificing quality.

Reproduction Instructions

Set up the dataset and Python environment as outlined previously, then execute the following:

   bash examples/mip360_4k/eval_mip360_4k.sh <path_to_save_experiment_results> <path_to_mip360_dataset>

Train in 45 Seconds on Tanks&Temple at 1K Resolution

Configuration7k Training Time7k test PSNR30k Training Time30k test PSNR
train + 4 GPU + Batch Size=844s19.373min 30s21.87
truck + 4 GPU + Batch Size=845s23.793min 39s25.35

Tanks&Temples dataset includes train and truck scenes with resolutions of 980x545 and 979x546, respectively. Utilizing 4 GPUs, we've managed to train on these small scenes to a reasonable quality in just 45 seconds(7k iterations). In the original Gaussian splatting papers, achieving a test PSNR of 18.892 and 23.506 at 7K resolution was considered good on train and truck, respectively. Our results are comparable to these benchmarks.

Reproduction Instructions

Set up the Tanks&Temple and DeepBlending Dataset and Python environment as outlined previously, then execute the following:

   bash examples/train_truck_1k/eval_train_truck_1k.sh <path_to_save_experiment_results> <path_to_tandb_dataset>

(TODO: check these scripts have no side-effects)

Experimental Setup for all experiments statistics above


New features [Please check regularly!]

Paper and Citation

Our system design, analysis of large-batch training dynamics, and insights from scaling up are all documented in the paper below:

On Scaling Up 3D Gaussian Splatting Training<br> Hexu Zhao¹, Haoyang Weng¹*, Daohan Lu¹*, Ang Li², Jinyang Li¹, Aurojit Panda¹, Saining Xie¹ (* co-second authors) <br>¹New York University, ²Pacific Northwest National Laboratory <br>

<section class="section" id="BibTeX"> <div class="container is-max-desktop content"> <h2 class="title">BibTeX</h2> <pre><code>@misc{zhao2024scaling3dgaussiansplatting, title={On Scaling Up 3D Gaussian Splatting Training}, author={Hexu Zhao and Haoyang Weng and Daohan Lu and Ang Li and Jinyang Li and Aurojit Panda and Saining Xie}, year={2024}, eprint={2406.18533}, archivePrefix={arXiv}, primaryClass={cs.CV}, url={https://arxiv.org/abs/2406.18533}, }</code></pre> </div> </section>

Code Specification

Please use "black" with default settings to format the code if you want to contribute.

Setup

conda install black==24.4.2

License

Distributed under the Apache License Version 2.0 License. See LICENSE.txt for more information.

Reference

  1. Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, July 2023. URL: https://repo-sam.inria.fr/fungraph/3d-gaussian-splatting/.