pixelSplat

This is the code for pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann.

Check out the project website here. We presented pixelSplat at CVPR 2024 in Seattle. You can find the presentation slides here.

https://github.com/dcharatan/pixelsplat/assets/13124225/de90101e-1bb5-42e4-8c5b-35922cae8f64

Camera-ready Updates

This version of the codebase has been updated slightly to reflect the CVPR camera-ready version of the paper (and the latest version of the paper on arXiv). Here are the changes:

| Run Name | PSNR | SSIM | LPIPS |
| --- | --- | --- | --- |
| re10k (old) | 25.89 | 0.858 | 0.142 |
| re10k (new) | 26.09 | 0.863 | 0.136 |
| acid (old) | 28.14 | 0.839 | 0.150 |
| acid (new) | 28.27 | 0.843 | 0.146 |

Installation

To get started, create a virtual environment using Python 3.10+:

python3.10 -m venv venv
source venv/bin/activate
# Install these first! On Ubuntu, also make sure the Python development headers matching your interpreter are installed (e.g., python3.10-dev).
pip install wheel torch torchvision torchaudio
pip install -r requirements.txt

If your system does not use CUDA 12.1 by default, see the troubleshooting tips below.

<details> <summary>Troubleshooting</summary> <br>

The Gaussian splatting CUDA code (diff-gaussian-rasterization) must be compiled with the same version of CUDA that PyTorch was built with. As of December 2023, the PyTorch build installed by pip install torch was built with CUDA 12.1. If your system does not use CUDA 12.1 by default, you can try the following:

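# Example: install a PyTorch build compiled against a specific CUDA version (here, CUDA 11.8).
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# Point LD_LIBRARY_PATH at your CUDA toolkit's lib64 directory. The path below is an example; adjust it to your installation.
LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64 pip install -r requirements.txt
# If everything else was installed but you're missing diff-gaussian-rasterization, do:
LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64 pip install git+https://github.com/dcharatan/diff-gaussian-rasterization-modified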
</details>
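
To check for the version mismatch described in the troubleshooting notes, one option is to compare the CUDA version that PyTorch reports (torch.version.cuda) with the release printed by nvcc --version. The sketch below is only an illustration: the helper names parse_nvcc_release and cuda_versions_match are hypothetical and not part of this codebase, and the sample nvcc output is abbreviated.

```python
import re
from typing import Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the 'major.minor' CUDA release from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def cuda_versions_match(torch_cuda: Optional[str], nvcc_output: str) -> bool:
    """Return True if PyTorch's CUDA version equals the toolkit's release.

    torch_cuda is the value of torch.version.cuda (None for CPU-only builds).
    """
    toolkit = parse_nvcc_release(nvcc_output)
    return torch_cuda is not None and toolkit is not None and torch_cuda == toolkit


# Abbreviated sample of what `nvcc --version` prints (illustrative only).
sample = "Cuda compilation tools, release 12.1, V12.1.105"
print(cuda_versions_match("12.1", sample))  # True
print(cuda_versions_match("11.8", sample))  # False
```

In a live environment, torch_cuda would come from torch.version.cuda after importing torch, and nvcc_output from running nvcc --version.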

Acquiring Datasets

pixelSplat was trained using versions of the Real Estate 10k and ACID datasets that were split into ~100 MB chunks for use on server cluster file systems. Small subsets of both datasets in this format can be found here. To use them, unzip them into a newly created datasets folder in the project root directory.
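
If you prefer to script the unzip step, a minimal sketch using only the Python standard library is shown below. The function name extract_dataset and the archive file name in the usage comment are hypothetical; substitute the name of the archive you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive: Path, project_root: Path) -> Path:
    """Unzip `archive` into <project_root>/datasets, creating the folder if needed."""
    target = project_root / "datasets"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


# Example usage (hypothetical archive name), run from the project root:
# extract_dataset(Path("re10k_subset.zip"), Path("."))
```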

If you would like to convert downloaded versions of the Real Estate 10k and ACID datasets to our format, you can use the scripts here. Reach out to us if you want the full versions of our processed datasets, which are about 500 GB for Real Estate 10k and 160 GB for ACID.

Acquiring Pre-trained Checkpoints

You can find pre-trained checkpoints here. Checkpoints for the original codebase (without the improvements from the camera-ready version of the paper) are available here.

Running the Code

Training

The main entry point is src/main.py. Call it via:

python3 -m src.main +experiment=re10k

This configuration requires a single GPU with 80 GB of VRAM (A100 or H100). To reduce memory usage, you can change the batch size as follows:

python3 -m src.main +experiment=re10k data_loader.train.batch_size=1

Our code supports multi-GPU training. The above batch size is the per-GPU batch size.
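
Because the batch size is set per GPU, the number of samples processed per optimization step grows with the number of GPUs. A minimal sketch of that arithmetic, assuming standard data-parallel training with no gradient accumulation (the function name is hypothetical and not part of this codebase):

```python
def effective_batch_size(per_gpu_batch_size: int, num_gpus: int) -> int:
    """Total samples per optimization step under data-parallel training."""
    if per_gpu_batch_size < 1 or num_gpus < 1:
        raise ValueError("batch size and GPU count must be positive")
    return per_gpu_batch_size * num_gpus


# With data_loader.train.batch_size=1 on 4 GPUs, each step sees 4 samples.
print(effective_batch_size(1, 4))  # 4
```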

Evaluation

To render frames from an existing checkpoint, run the following:

# Real Estate 10k
python3 -m src.main +experiment=re10k mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_re10k.json checkpointing.load=checkpoints/re10k.ckpt

# ACID
python3 -m src.main +experiment=acid mode=test dataset/view_sampler=evaluation dataset.view_sampler.index_path=assets/evaluation_index_acid.json checkpointing.load=checkpoints/acid.ckpt

Note that you can also use the evaluation indices that end with _video (in /assets) to render the videos shown on the website.

Ablations

You can run the ablations from the paper by using the corresponding experiment configurations. For example, to ablate the epipolar encoder:

python3 -m src.main +experiment=re10k_ablation_no_epipolar_transformer

Our collection of pre-trained checkpoints includes checkpoints for the ablations.

VS Code Launch Configuration

We provide VS Code launch configurations for easy debugging.

Camera Conventions

Our extrinsics are OpenCV-style camera-to-world matrices. This means that +Z is the camera look vector, +X is the camera right vector, and -Y is the camera up vector. Our intrinsics are normalized, meaning that the first row is divided by image width, and the second row is divided by image height.
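
To make these conventions concrete, here is a small pure-Python sketch. It is an illustration under stated assumptions rather than code from this repository: the helper names normalize_intrinsics and camera_axes are hypothetical, intrinsics are assumed to be a 3x3 pixel-space matrix, and extrinsics a 4x4 camera-to-world matrix stored as nested lists.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def normalize_intrinsics(k: Matrix, width: int, height: int) -> Matrix:
    """Divide the first row by image width and the second row by image height."""
    return [
        [v / width for v in k[0]],
        [v / height for v in k[1]],
        list(k[2]),  # the last row is left unchanged
    ]


def camera_axes(c2w: Matrix) -> Tuple[Vector, Vector, Vector, Vector]:
    """Read world-space right, up, look, and position from a camera-to-world matrix."""
    def column(j: int) -> Vector:
        return [c2w[i][j] for i in range(3)]

    right = column(0)              # +X is the camera right vector
    up = [-v for v in column(1)]   # -Y is the camera up vector
    look = column(2)               # +Z is the camera look vector
    position = column(3)           # translation column is the camera position
    return right, up, look, position


# Pixel-space intrinsics for a 640x480 image (illustrative values).
k_pixels = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
k_norm = normalize_intrinsics(k_pixels, 640, 480)
print(k_norm[0][2], k_norm[1][2])  # 0.5 0.5

# A camera at (1, 2, 3) with axes aligned to the world axes.
c2w = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
right, up, look, position = camera_axes(c2w)
```

With normalized intrinsics, a principal point at the image center maps to (0.5, 0.5) regardless of resolution, so the same matrix remains valid when images are resized without cropping.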

Figure Generation Code

We've included the scripts that generate the tables and figures in the paper. Note that since these are one-offs, they may need to be modified before they will run.

Notes on Bugs

Since the original release of the pixelSplat codebase, the following bugs have been identified:

Related Papers

Check out the following papers that build on top of pixelSplat's codebase:

If you used ideas or code from pixelSplat and would like to be featured here, send an email to charatan@mit.edu!

BibTeX

@inproceedings{charatan23pixelsplat,
      title={pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction},
      author={David Charatan and Sizhe Li and Andrea Tagliasacchi and Vincent Sitzmann},
      year={2023},
      booktitle={arXiv},
}

Acknowledgements

This work was supported by the National Science Foundation under Grant No. 2211259, by the Singapore DSTA under DST00OECI20300823 (New Representations for Vision), by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior/Interior Business Center (DOI/IBC) under 140D0423C0075, and by the Amazon Science Hub. The Toyota Research Institute also partially supported this work. The views and conclusions contained herein reflect the opinions and conclusions of its authors and no other entity.