# MINER_pl

Unofficial implementation of MINER: Multiscale Implicit Neural Representations in pytorch-lightning.

Official implementation: https://github.com/vishwa91/miner


## :open_book: Ref readings

<p align="center"> <a href="https://youtu.be/cXZtbfjnJtA"> <img src="https://user-images.githubusercontent.com/11364490/168209075-330d879e-2bff-467f-bf31-4e0ad2809777.png" width="45%"> </a> <a href="https://youtu.be/MSVEhq67Ca4"> <img src="https://user-images.githubusercontent.com/11364490/168209233-4bde51ba-df6d-4fdb-87d6-9704986c1248.png" width="45%"> </a> </p>

## :warning: Main differences w.r.t. the original paper (read before continuing)

## :computer: Installation

## :key: Training

<details> <summary><h2>image</h2></summary>

Pluto example:

python train.py \
    --task image --path images/pluto.png \
    --input_size 4096 4096 --patch_size 32 32 --batch_size 256 --n_scales 4 \
    --use_pe --n_layers 3 \
    --num_epochs 50 50 50 200 \
    --exp_name pluto4k_4scale

Tokyo station example:

python train.py \
    --task image --path images/tokyo-station.jpg \
    --input_size 6000 4000 --patch_size 25 25 --batch_size 192 --n_scales 5 \
    --use_pe --n_layers 3 \
    --num_epochs 50 50 50 50 150 \
    --exp_name tokyo6k_5scale
| Image (size) | Train time (s) | GPU mem (MiB) | #Params (M) | PSNR (dB) |
|---|---|---|---|---|
| Pluto (4096x4096) | 53 | 3171 | 9.16 | 42.14 |
| Pluto (8192x8192) | 106 | 6099 | 28.05 | 45.09 |
| Tokyo station (6000x4000) | 68 | 6819 | 35.4 | 42.48 |
| Shibuya (7168x2560) | 101 | 8967 | 17.73 | 37.78 |
| Shibuya (14336x5120) | 372 | 8847 | 75.42 | 39.32 |
| Shibuya (28672x10240) | 890 | 10255 | 277.37 | 41.93 |
| Shibuya (28672x10240)* | 1244 | 6277 | 98.7 | 37.59 |

*paper settings (6 scales; each network has 4 layers with 9 hidden units)

The original image will be resized to `img_wh` for reconstruction. Make sure that `img_wh` divided by `2^(n_scales-1)` (the resolution at the coarsest level) is still a multiple of `patch_wh`.
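As a quick sanity check of this constraint (a standalone sketch, not part of the repo's code), using the Pluto settings above:

```python
# Standalone sketch: verify that the image size, downscaled to the coarsest
# level, is still a multiple of the patch size.
def check_sizes(img_wh, patch_wh, n_scales):
    for img, patch in zip(img_wh, patch_wh):
        factor = 2 ** (n_scales - 1)
        assert img % factor == 0 and (img // factor) % patch == 0, \
            f"{img} is incompatible with patch size {patch} at {n_scales} scales"

# Pluto example: 4096 / 2^(4-1) = 512, and 512 is a multiple of 32 -> OK
check_sizes(img_wh=(4096, 4096), patch_wh=(32, 32), n_scales=4)
```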


</details> <details> <summary><h2>mesh</h2></summary>

First, convert the mesh to an N^3 occupancy grid by running

python preprocess_mesh.py --N 512 --M 1 --T 1 --path <path/to/mesh> 

This creates the N^3 occupancy grid that the neural network will regress. For detailed options, please see preprocess_mesh.py. Typically, increase M or T if the resulting occupancy looks bad.
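To get a feel for what the preprocessing produces, you can inspect the saved volume (an illustrative sketch; the file name follows the bunny example below, and the exact dtype is an assumption):

```python
# Illustrative sketch: inspect the occupancy grid saved by preprocess_mesh.py.
# File name taken from the bunny example below; the dtype is an assumption.
import numpy as np

occ = np.load('occupancy/bunny_512.npy')
print(occ.shape)             # expected: (512, 512, 512), i.e. N^3 voxels
print(occ.min(), occ.max())  # range of the occupancy values the network regresses
```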

Next, start training (bunny example):

python train.py \
    --task mesh --path occupancy/bunny_512.npy \
    --input_size 512 --patch_size 16 --batch_size 512 --n_scales 4 \
    --use_pe --n_freq 5 --n_layers 2 --n_hidden 8 \
    --loss_thr 5e-3 --b_chunks 512 \
    --num_epochs 50 50 50 150 \
    --exp_name bunny512_4scale

</details>

For the full list of options (including the most important ones), please see here.

It is recommended to monitor the training progress with

tensorboard --logdir logs

where you can see training curves and images.

## :red_square::green_square::blue_square: Block decomposition

To reconstruct the image with a trained model and visualize the block decomposition per scale (like Fig. 4 in the paper), see image_test.ipynb or mesh_test.ipynb.

<!-- Pretrained models can be downloaded from [releases](https://github.com/kwea123/MINER_pl/releases). -->

Examples:

<p align="center"> <img src="https://user-images.githubusercontent.com/11364490/168275200-e625d828-61df-4ff2-a658-7dd10e123847.jpg" width="45%"> <img src="https://user-images.githubusercontent.com/11364490/168275208-a35e828d-0ca0-408f-90c3-89dd97d108ba.jpg" width="45%"> </p> <p align="center"> <img src="https://user-images.githubusercontent.com/11364490/169640414-9da542dc-2df5-4a46-b80a-9591e86f98b3.jpg" width="30%"> <img src="https://user-images.githubusercontent.com/11364490/169640416-59e4391f-7377-4b7e-b103-715b25d7253c.jpg" width="30%"> <img src="https://user-images.githubusercontent.com/11364490/169640742-49f4a43e-4705-4463-bbe4-822839220ddd.jpg" width="30%"> </p>

## :bulb: Implementation tricks

## :gift_heart: Acknowledgement

## :question: Further readings

During a stream, my audience suggested that I test on this image with random pixels:

*(random-pixel test image)*

The default 32x32 patch size doesn't work well, since the texture varies too quickly inside a patch. Decreasing the patch size to 16x16 and increasing the number of hidden units makes the network converge to 43.91 dB within a minute. Surprisingly, with another image-reconstruction SOTA, instant-ngp, the network gets stuck at 17 dB no matter how long I train.

*(instant-ngp result on the random-pixel image)*

Is this a possible weakness of instant-ngp? What effect could it have in real applications? You are welcome to test other methods on this image!
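If you want to try this yourself, here is one way to generate a similar random-pixel image (an illustrative sketch; the resolution is an arbitrary choice, not necessarily the one used above):

```python
# Generate a random-pixel test image (illustrative; resolution chosen arbitrarily).
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(1024, 1024, 3), dtype=np.uint8)
Image.fromarray(img).save('random.png')
```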