
<!-- # NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields --> <div align="center"> <img src="demo/nerf-mae_teaser.png" width="85%"> <img src="demo/nerf-mae_teaser.jpeg" width="85%"> </div> <!-- <p align="center"> <img src="demo/nerf-mae_teaser.jpeg" width="100%"> </p> --> <br> <div align="center">


</div>
<a href="https://www.tri.global/" target="_blank"> <img align="right" src="demo/GeorgiaTech_RGB.png" width="18%"/> </a> <a href="https://www.tri.global/" target="_blank"> <img align="right" src="demo/tri-logo.png" width="17%"/> </a>

Project Page | arXiv | PDF

NeRF-MAE : Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields

<a href="https://zubairirshad.com"><strong>Muhammad Zubair Irshad</strong></a> · <a href="https://zakharos.github.io/"><strong>Sergey Zakharov</strong></a> · <a href="https://www.linkedin.com/in/vitorguizilini"><strong>Vitor Guizilini</strong></a> · <a href="https://adriengaidon.com/"><strong>Adrien Gaidon</strong></a> · <a href="https://faculty.cc.gatech.edu/~zk15/"><strong>Zsolt Kira</strong></a> · <a href="https://www.tri.global/about-us/dr-rares-ambrus"><strong>Rares Ambrus</strong></a> <br> European Conference on Computer Vision, ECCV 2024<br>

<b> Toyota Research Institute   |   Georgia Institute of Technology</b>

💡 Highlights

🏷️ TODO

NeRF-MAE Model Architecture

<p align="center"> <img src="demo/nerf-mae_architecture.jpg" width="90%"> </p> <!-- _________________ <p align="center"> <img src="demo/comparison_mae.jpeg" width="100%"> </p> -->

Citation

If you find this repository or our dataset useful, please star ⭐ this repository and consider citing 📝:

@inproceedings{irshad2024nerfmae,
    title={NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields},
    author={Muhammad Zubair Irshad and Sergey Zakharov and Vitor Guizilini and Adrien Gaidon and Zsolt Kira and Rares Ambrus},
    booktitle={European Conference on Computer Vision (ECCV)},
    year={2024}
}

Contents

🌇 Environment

Create a Python 3.9 virtual environment and install the requirements:

cd $NeRF-MAE repo
conda create -n nerf-mae python=3.9
conda activate nerf-mae
pip install --upgrade pip
pip install -r requirements.txt
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 -f https://download.pytorch.org/whl/torch_stable.html

The code was built and tested with CUDA 11.3.
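As a quick optional sanity check (a minimal sketch, not part of the repository), you can confirm that the expected PyTorch and CUDA versions are active:

```python
# Optional sanity check: confirm the installed PyTorch build matches CUDA 11.3.
import torch

print(torch.__version__)          # expected: 1.12.1+cu113
print(torch.version.cuda)         # expected: 11.3
print(torch.cuda.is_available())  # should print True on a machine with a CUDA-capable GPU
```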

Compile the CUDA extension required for downstream-task finetuning, as described in NeRF-RPN:

cd $NeRF-MAE repo
cd nerf_rpn/model/rotated_iou/cuda_op
python setup.py install
cd ../../../..

⛳ Dataset

Download the preprocessed datasets here.

Extract the pretraining and finetuning datasets under NeRF-MAE/datasets. The directory structure should look like this:

NeRF-MAE
├── pretrain
│   ├── features
│   └── nerfmae_split.npz
└── finetune
    └── front3d_rpn_data
        ├── features
        ├── aabb
        └── obb
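The short script below is an optional convenience (not part of the repository) that checks whether the extracted files are in the expected locations; the root path is an assumption, so adjust it to wherever you extracted the data:

```python
# Hypothetical helper: verify the dataset layout shown above before training.
from pathlib import Path

root = Path("NeRF-MAE/datasets")  # assumption: datasets were extracted here; adjust to your layout
expected = [
    "pretrain/features",
    "pretrain/nerfmae_split.npz",
    "finetune/front3d_rpn_data/features",
    "finetune/front3d_rpn_data/aabb",
    "finetune/front3d_rpn_data/obb",
]
for rel in expected:
    print(("ok     " if (root / rel).exists() else "MISSING"), rel)
```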

Note: The above datasets are all you need to train and evaluate our method. Bonus: we will soon be releasing our multi-view rendered posed RGB images from Front3D, HM3D, and Hypersim, as well as Instant-NGP trained checkpoints (these comprise over 1M images and 3k+ NeRF checkpoints).

Please note that our dataset was generated using the instructions from NeRF-RPN and 3D-CLR. Please consider citing our work, NeRF-RPN, and 3D-CLR if you find this dataset useful in your research.

Please also note that our dataset uses Front3D, Habitat-Matterport3D, HyperSim, and ScanNet as base datasets, i.e. we train a NeRF per scene and extract radiance and density grids as well as aligned NeRF-grid 3D annotations. Please read the terms of use for each dataset if you want to utilize their posed multi-view images.

💫 Usage (Coming Soon)

NeRF-MAE (package: nerf-mae) is structured to provide easy access to pretrained NeRF-MAE models (and reproductions), and to facilitate their use for various downstream tasks. It is intended for extracting good visual features from NeRFs when you do not have the resources for large-scale pretraining. Our pretraining provides an easy-to-access embedding of any NeRF scene, which can be used for a variety of downstream tasks in a straightforward way. The package, usage instructions, and our pretrained checkpoints are coming soon.

<!-- Using a pretrained NeRF-MAE model is easy: Navigate to **nerf-mae** folder and run pretraining script. -->

📉 Pretraining

Of course, you can also pretrain your own NeRF-MAE models. Navigate to the nerf-mae folder and run the pretraining script:

cd nerf-mae
bash train_mae3d.sh

Check out the train_mae3d.sh file for a complete list of hyperparameters such as num_epochs, lr, masking_prob, etc.

Checkpoints are saved every 200 epochs. For reproducing the paper results, we utilize the checkpoints at 1200 epochs.

Notes:

  1. With the default settings, i.e. batch_size 32 and gpus 0,1,2,3,4,5,6,7 on A100 GPUs, pretraining is expected to take around 2 days. Please adjust these settings according to your machine's capacity (see the sketch after this list).

  2. The dataset_name defaults to dataset_name="nerfmae". This is a convenience for the dataloader, as it describes the data format. Our pretraining data comprises scenes from Front3D, Habitat-Matterport3D, and Hypersim.
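As an illustrative sketch only (the authoritative GPU and batch-size settings live inside train_mae3d.sh), you can restrict which GPUs are visible via the standard CUDA environment variable, assuming the script does not override it:

```bash
# Hedged example: limit pretraining to 4 GPUs; adjust batch_size inside
# train_mae3d.sh to fit your GPU memory. The script's own gpu/batch settings
# take precedence if it sets them explicitly.
cd nerf-mae
CUDA_VISIBLE_DEVICES=0,1,2,3 bash train_mae3d.sh
```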

📊 Finetuning

Our finetuning code is largely based on NeRF-RPN. In fact, we use the same dataset as NeRF-RPN (unseen during pretraining) for finetuning. This ensures our comparison with NeRF-RPN is based on the same architecture; the only difference is that NeRF-RPN's network weights are initialized from scratch, whereas ours are initialized from our pretrained network weights. Please see our paper for more details.

Note: The ScanNet dataset is not seen during our pretraining, so ScanNet 3D OBB prediction finetuning is a challenging case of cross-dataset transfer.

3D Object Detection

Navigate to the nerf-rpn folder and run the finetuning script.

To run 3D Swin Transformer + FPN model finetuning with our pretrained weights:

cd nerf-rpn
bash train_fcos_pretrained.sh

To train the 3D Swin Transformer + FPN model with weights initialized from scratch:

cd nerf-rpn
bash train_fcos.sh

Note: only the 3D Swin Transformer weights are initialized from our pretraining; the FPN weights are initialized from scratch in both cases. To evaluate the model finetuned from our pretrained weights or trained from scratch, use bash test_fcos_pretrained.sh or bash test_fcos.sh, respectively.

Check out the train_fcos_pretrained.sh and test_fcos_pretrained.sh files for a complete list of hyperparameters such as mae_checkpoint, num_epochs, lr, masking_prob, etc. The finetuning and evaluation code for our downstream tasks is based on NeRF-RPN's implementation.
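For illustration only (the checkpoint path below is hypothetical, and the exact hyperparameter name and default are defined in train_fcos_pretrained.sh), finetuning from a pretraining checkpoint looks roughly like this:

```bash
# Hedged sketch: point the mae_checkpoint hyperparameter in train_fcos_pretrained.sh
# to a pretraining checkpoint (e.g. the one saved at epoch 1200), then launch.
# The path below is a placeholder, not the actual checkpoint naming scheme.
cd nerf-rpn
# edit train_fcos_pretrained.sh: mae_checkpoint=<path/to/pretrain/checkpoint_epoch_1200>
bash train_fcos_pretrained.sh
```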

Acknowledgments

This code is built upon the implementation from NeRF-RPN. We appreciate the authors for releasing their open-source implementation.

Licenses

This repository and dataset are released under the CC BY-NC 4.0 license.