ACAN: Attention-based Context Aggregation Model for Monocular Depth Estimation.

PyTorch implementation of ACAN for monocular depth estimation. More details can be found in the arXiv paper.

Architecture

Visualization of Attention Maps

The first and second rows show the attention maps of models trained with and without the attention loss, respectively.

Soft Inference VS Hard Inference

The third and fourth columns show the results of soft inference and hard inference, respectively.

Quick start

Requirements

torch=0.4.1
torchvision
tensorboardX
pillow
tqdm
h5py
scikit-learn
cv2

This code was tested with PyTorch 0.4.1, CUDA 9.1 and Ubuntu 18.04.
Training on the KITTI dataset with the default parameters takes about 48 hours on a machine with an NVIDIA GTX 1080 Ti GPU.
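Before training, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of this repository: the helper name `find_missing` and the mapping are illustrative assumptions. Note that some packages are imported under a different name than the one they are installed as (pillow as `PIL`, scikit-learn as `sklearn`).

```python
import importlib.util

# Hypothetical mapping (not from the repository): listed requirement -> import name.
REQUIRED_MODULES = {
    "torch": "torch",
    "torchvision": "torchvision",
    "tensorboardX": "tensorboardX",
    "pillow": "PIL",
    "tqdm": "tqdm",
    "h5py": "h5py",
    "scikit-learn": "sklearn",
    "cv2": "cv2",
}


def find_missing(modules=REQUIRED_MODULES):
    """Return the sorted requirement names whose import module cannot be located."""
    return sorted(
        name
        for name, module in modules.items()
        if importlib.util.find_spec(module) is None
    )


if __name__ == "__main__":
    missing = find_missing()
    print("missing:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be found; it does not verify versions such as the PyTorch 0.4.1 requirement.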

Data

There are two main datasets available:

KITTI

We use the Eigen split of the data, which amounts to approximately 22k training samples. The file lists are in the kitti_path_txt folder.

NYU v2

We downloaded the raw dataset, which weighs about 428 GB, and used the NYU v2 toolbox to sample around 12k training samples. The relevant scripts are in the matlab folder: run Get_Dataset.m to produce the training set, or download the processed dataset from BaiduCloud.

Training

Warning: The input sizes need to be multiples of 8.
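If your images do not already satisfy this constraint, one option is to round the height and width down to the nearest multiple of 8 before cropping or resizing. The sketch below only illustrates the arithmetic. The helper names `snap_to_multiple` and `snap_size` are hypothetical and do not come from this repository, and whether to crop or resize to the resulting size is left to you.

```python
def snap_to_multiple(value: int, base: int = 8) -> int:
    """Round value down to the nearest multiple of base."""
    if value < base:
        raise ValueError(f"value {value} is smaller than base {base}")
    return (value // base) * base


def snap_size(height: int, width: int, base: int = 8) -> tuple:
    """Return (height, width) with both sides rounded down to multiples of base."""
    return snap_to_multiple(height, base), snap_to_multiple(width, base)


if __name__ == "__main__":
    # A 375 x 1242 image becomes 368 x 1240.
    print(snap_size(375, 1242))
```

Rounding down rather than up avoids padding, so the result never exceeds the original image bounds.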

bash ./code/train_nyu_script.sh

Testing

bash ./code/test_nyu_script.sh

Attention Map

To obtain the task-specific attention maps, first train the model from scratch, then fine-tune it with the attention loss by setting

BETA=1
RESUME=./workspace/log/best.pkl
EPOCHES=10

Thanks to the Third Party Libs

Non-local_pytorch

Pytorch-OCNet

NConv-CNN