Official implementation of GLUNet

This is the official implementation of our paper:

GLU-Net: Global-Local Universal Network for dense flow and correspondences (CVPR 2020-Oral).

Authors: Prune Truong, Martin Danelljan and Radu Timofte <br /> [Paper] [Website] [Poster] [Oral Video] [Teaser Video]

Check out our related publication GOCor (website) and the corresponding code here!

For an improved version of a dense correspondence network, also predicting a confidence mask, check out PDCNet (website) and code here. <br /><br /><br />

For any questions, issues or recommendations, please contact Prune at prune.truong@vision.ee.ethz.ch <br /> If our project is helpful for your research, please consider citing:

@inproceedings{GLUNet_Truong_2020,
      title = {{GLU-Net}: Global-Local Universal Network for dense flow and correspondences},
      author    = {Prune Truong and
                   Martin Danelljan and
                   Radu Timofte},
      year = {2020},
      booktitle = {2020 {IEEE} Conference on Computer Vision and Pattern Recognition, {CVPR} 2020}
}

Updates

06/03/2022: We found that initializing the weights of the transposed convolutions, which upsample the predicted flow fields between the different pyramid levels, with bilinear interpolation weights gives significantly better performance and reduced training time. This initialization is now the default. We might provide updated pre-trained weights as well. Alternatively, one can simply use bilinear interpolation for upsampling, with similar (maybe slightly better) performance; this is now also available as an option.
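As a rough illustration of what bilinear interpolation weights look like, here is a minimal pure-Python sketch of the usual bilinear upsampling kernel. The 4x4 kernel size is an assumption made for the example, not a value read from this repository; in PyTorch, values like these would typically be copied into the `weight` tensor of a `torch.nn.ConvTranspose2d` layer.

```python
def bilinear_kernel_1d(kernel_size):
    """Return 1D bilinear interpolation weights for a given kernel size."""
    factor = (kernel_size + 1) // 2
    # The kernel centre sits on a tap for odd sizes and between two taps for even sizes.
    center = factor - 1 if kernel_size % 2 == 1 else factor - 0.5
    return [1 - abs(i - center) / factor for i in range(kernel_size)]


def bilinear_kernel_2d(kernel_size):
    """Outer product of the 1D weights, giving a (k x k) upsampling filter."""
    w = bilinear_kernel_1d(kernel_size)
    return [[a * b for b in w] for a in w]


# Assumed kernel size of 4 (a common choice for x2 upsampling with stride 2).
print(bilinear_kernel_1d(4))  # [0.25, 0.75, 0.75, 0.25]
kernel = bilinear_kernel_2d(4)
for row in kernel:
    print(row)
```

With stride 2, the 16 weights sum to 4, so each output pixel receives contributions that sum to 1, which is what makes the layer start out as plain bilinear upsampling.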

Network

Our model GLU-Net is illustrated below: alt text

The models, evaluation and training code for Local-Net (a 3-level pyramidal network with only local correlations), Global-Net (a 3-level pyramidal network with a single global correlation followed by concatenation of feature maps) and GLOCAL-Net (a combination of the two: a 3-level pyramidal network with a single global correlation followed by two local correlation layers) are also available for reference. They are all illustrated below: alt text

For more details, refer to our paper

Table of Contents

  1. Installation
  2. Test on your own image pairs !
  3. Datasets downloading
    1. Training datasets
    2. Testing datasets
  4. Training
  5. Evaluation
    1. Performance on geometric matching dataset
    2. Performance on semantic matching dataset
    3. Performance on optical flow dataset
  6. Acknowledgement
  7. Changelog

1. Installation <a name="Installation"></a>

Note that the models were trained with torch 1.0. Torch versions up to 1.7 were tested for inference on the testing datasets but NOT for training, so I cannot guarantee that the models train smoothly with higher torch versions.

conda create -n GLUNet_env python=3.7
conda activate GLUNet_env
pip install -r requirements.txt

ATTENTION: CUDA is required to run the code. The correlation layer is implemented in CUDA using CuPy, which is why CuPy is a required dependency. It can be installed using pip install cupy or, alternatively, using one of the provided binary packages as outlined in the CuPy repository. The code was developed using Python 3.7, PyTorch 1.0 and CUDA 9.0, which is why I installed cupy for cuda90. For another CUDA version, change the package accordingly.

pip install cupy-cuda90 --no-cache-dir 

2. Test on your own image pairs ! <a name="test_pairs"></a>

One can test GLU-Net on a pair of images using test_GLUNet.py and the provided trained model weights. The inputs are the paths to the source and target images. They are then passed to the network which outputs the corresponding flow field relating the target to the source image. The source is then warped according to the estimated flow, and a figure is saved.
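The warping step can be sketched as follows. This is a minimal pure-Python illustration, not the code used by test_GLUNet.py: it assumes the flow at target pixel (x, y) is a displacement (u, v) such that the matching source location is (x + u, y + v), works on a small grayscale image stored as nested lists, and fills out-of-image lookups with 0. The function names are hypothetical.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a 2D grayscale image (list of rows) at a real-valued location.
    Returns 0.0 when the location falls outside the image."""
    h, w = len(image), len(image[0])
    if x < 0 or y < 0 or x > w - 1 or y > h - 1:
        return 0.0
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * image[y0][x0] + ax * image[y0][x1]
    bottom = (1 - ax) * image[y1][x0] + ax * image[y1][x1]
    return (1 - ay) * top + ay * bottom


def warp_source(source, flow):
    """Warp `source` into the target frame.
    Assumed convention: flow[y][x] = (u, v) maps target pixel (x, y)
    to the source location (x + u, y + v)."""
    h, w = len(flow), len(flow[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            warped[y][x] = bilinear_sample(source, x + u, y + v)
    return warped


source = [[0.0, 1.0, 2.0],
          [3.0, 4.0, 5.0],
          [6.0, 7.0, 8.0]]
# Constant flow of one pixel to the right: each target pixel reads its right neighbour.
flow = [[(1.0, 0.0)] * 3 for _ in range(3)]
warped = warp_source(source, flow)
print(warped)  # [[1.0, 2.0, 0.0], [4.0, 5.0, 0.0], [7.0, 8.0, 0.0]]
```

If the estimated flow is accurate, the warped source should look like the target image wherever the two views overlap.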

For this pair of images (provided to check that the code is working properly), the output is:

python test_GLUNet.py --path_source_image images/yosemite_source.png --path_target_image images/yosemite_target.png --write_dir evaluation/

optional arguments:

alt text

Another example and output (note: the images are large):

python test_GLUNet.py --path_source_image images/hp_source.png --path_target_image images/hp_target.png --write_dir evaluation/

optional arguments:

alt text

3. Datasets downloading <a name="download_dataset"></a>

3.1. Training datasets <a name="training_dataset"></a>

For training, we use a combination of the DPED, CityScapes and ADE-20K datasets. The DPED training dataset is composed of only approximately 5,000 sets of images taken by four different cameras. We use the images from two cameras, resulting in around 10,000 images. CityScapes adds about 23,000 images. We complement these with a random sample of ADE-20K images with a minimum resolution of 750 x 750. This results in 40,000 original images, which are used to create pairs of training images by applying geometric transformations to them. The paths to the original images as well as the geometric transformation parameters are given in the csv files 'datasets/csv_files/homo_aff_tps_train_DPED_CityScape_ADE.csv' and 'datasets/csv_files/homo_aff_tps_test_DPED_CityScape_ADE.csv'.
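To make the idea of ground truth from a known geometric transformation concrete, here is a minimal sketch for the homography case only. It assumes the transformation is available as a 3x3 matrix and that the flow is the displacement from each pixel to its transformed position; the function names and the example matrix are hypothetical, and the actual layout of the csv files is not reproduced here.

```python
def apply_homography(H, x, y):
    """Map a point (x, y) through a 3x3 homography given as nested lists."""
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    wh = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / wh, yh / wh


def flow_from_homography(H, height, width):
    """Dense flow field with flow[y][x] = H(x, y) - (x, y)."""
    flow = []
    for y in range(height):
        row = []
        for x in range(width):
            xm, ym = apply_homography(H, x, y)
            row.append((xm - x, ym - y))
        flow.append(row)
    return flow


# Hypothetical transformation: a pure translation by (+2, -1) pixels.
H = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0],
     [0.0, 0.0, 1.0]]
flow = flow_from_homography(H, height=2, width=3)
print(flow[0][0])  # (2.0, -1.0)
```

Because the transformation is known exactly, every pixel gets a ground-truth flow vector, which is what allows dense supervision on synthetic pairs.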

Put all the datasets in the same directory. As an illustration, your root training directory should be organised as follows:

<pre>
/training_datasets/
    original_images/
    CityScape/
    CityScape_extra/
    ADE20K_2016_07_26/
</pre>

Optional: To save the synthetic image pairs and flows to disk
During training, from this set of original images, the pairs of synthetic images are created on the fly at each epoch. However, this dataset generation takes time and since no augmentation is applied at each epoch, one can also create the dataset in advance and save it to disk. During training, the image pairs composing the training datasets are then just loaded from the disk before passing through the network, which is a lot faster. To generate the training dataset and save it to disk:

python save_training_dataset_to_disk.py --image_data_path /directory/to/original/training_datasets/ \
--csv_path datasets/csv_files/homo_aff_tps_train_DPED_CityScape_ADE.csv --save_dir /path/to/save_dir --plot True

It will create the image pairs and corresponding flow fields in save_dir/images and save_dir/flow respectively.

3.2. Testing datasets <a name="testing_dataset"></a>

The testing datasets are available at the following links:

4. Training <a name="training"></a>

Training files for GLUNet (and its variants, including Semantic-GLU-Net), GLOCAL-Net, LocalNet and GlobalNet are available.

This creates the synthetic training and evaluation pairs, along with the ground truth, on the fly:

python train_GLUNet.py --name_exp GLUNet_train --training_data_dir /path/to/directory/original_images-for-training/ --evaluation_data_dir /path/to/directory/original_images-for-evaluation/

If the network is already pretrained and you want to start the training from an existing weight file, add: --pretrained /path/to/pretrained_file.pth

To load the pre-saved synthetic training and evaluation image pairs and ground truth flow fields instead (created earlier and saved to disk):

python train_GLUNet.py --name_exp GLUNet_train --pre_loaded_training_dataset True --training_data_dir /path/to/directory/synthetic_training_image_pairs_and_flows/ \
--evaluation_data_dir /path/to/directory/synthetic_validation_image_pairs_and_flows/

If the network is already pretrained and you want to start the training from an existing weight file, add: --pretrained /path/to/pretrained_file.pth

In the training files, one can modify all the parameters of the network. The default ones are for GLU-Net.

5. Evaluation <a name="evaluation"></a>

5.1. Performance on geometric matching dataset <a name="geometric_matching"></a>

In the case of geometric matching, pairs of images present different viewpoints of the same scene.

HPATCHES (original size and resized to 240x240)

To test on the HPatches dataset, HP-240 (images and flow rescaled to 240x240) and HP (original):

python eval.py --model GLUNet --pre_trained_models DPED_CityScape_ADE --dataset HPatchesdataset --data_dir /directory/to/hpatches --save_dir /directory/to/save_dir

optional argument: --hpatches_original_size, True or False (defaults to False), to test on the original image size.

Out of the 120 sequences of HPatches, we only evaluate on the 59 sequences in HP labelled with v_X, which have viewpoint changes, thus excluding the ones labelled i_X, which only have illumination changes.
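Selecting the viewpoint sequences amounts to a simple filter on the folder name prefix. The sketch below assumes the sequence names are available as a list of strings; the names used are made up for illustration and the helper is hypothetical, not part of eval.py.

```python
def viewpoint_sequences(sequence_names):
    """Keep sequences whose name starts with 'v_' (viewpoint changes),
    dropping those starting with 'i_' (illumination changes only)."""
    return sorted(name for name in sequence_names if name.startswith("v_"))


# Hypothetical folder names, for illustration only.
names = ["i_example_a", "v_example_b", "v_example_c", "i_example_d"]
print(viewpoint_sequences(names))  # ['v_example_b', 'v_example_c']
```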

<br /><br /> AEPE on the different viewpoints of HP-240 and HP:

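| Method | HP-240 I | HP-240 II | HP-240 III | HP-240 IV | HP-240 V | HP-240 All | HP I | HP II | HP III | HP IV | HP V | HP All |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PWC-Net | 5.74 | 17.69 | 20.46 | 27.61 | 36.97 | 21.68 | 23.93 | 76.33 | 91.30 | 124.22 | 164.91 | 96.14 |
| LiteFlowNet | 6.99 | 16.78 | 19.13 | 25.27 | 28.89 | 19.41 | 36.69 | 102.17 | 113.58 | 154.97 | 186.82 | 118.85 |
| DGC-Net (paper) | 1.55 | 5.53 | 8.98 | 11.66 | 16.70 | 8.88 | - | - | - | - | - | - |
| DGC-Net (repo) | 1.74 | 5.88 | 9.07 | 12.14 | 16.50 | 9.07 | 5.71 | 20.48 | 34.15 | 43.94 | 62.01 | 33.26 |
| GLU-Net (Ours) | 0.59 | 4.05 | 7.64 | 9.82 | 14.89 | 7.40 | 1.55 | 12.66 | 27.54 | 32.04 | 51.47 | 25.05 |
| GLU-Net (this repo) | 0.59 | 4.05 | 7.64 | 9.82 | 14.89 | 7.40 | 1.53 | 12.62 | 27.38 | 31.95 | 51.11 | 24.91 |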
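For reference, the AEPE (average end-point error) reported above is the mean Euclidean distance between estimated and ground-truth flow vectors. The following is a minimal pure-Python sketch under the assumption that flows are stored as nested lists of (u, v) tuples and that an optional validity mask marks pixels with ground truth; it is not the evaluation code of this repository.

```python
import math


def average_end_point_error(flow_est, flow_gt, valid=None):
    """Mean Euclidean distance between estimated and ground-truth flow vectors.
    flow_est, flow_gt: nested lists with flow[y][x] = (u, v).
    valid: optional nested list of booleans selecting pixels to evaluate."""
    total, count = 0.0, 0
    for y, (row_est, row_gt) in enumerate(zip(flow_est, flow_gt)):
        for x, ((ue, ve), (ug, vg)) in enumerate(zip(row_est, row_gt)):
            if valid is not None and not valid[y][x]:
                continue
            total += math.hypot(ue - ug, ve - vg)
            count += 1
    return total / count if count else float("nan")


flow_gt = [[(0.0, 0.0), (1.0, 1.0)]]
flow_est = [[(3.0, 4.0), (1.0, 1.0)]]
print(average_end_point_error(flow_est, flow_gt))  # 2.5
```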

Illustration on two examples of pairs of HP: alt text

ETH3D dataset

Data preparation: execute 'bash download_ETH3D.sh' (the script is stored in datasets/). It does the following:

As an illustration, your root ETH3D directory should be organised as follows:

<pre>
/ETH3D/
    multiview_testing/
        lakeside/
        sand_box/
        storage_room/
        storage_room_2/
        tunnel/
    multiview_training/
        delivery_area/
        electro/
        forest/
        playground/
        terrains/
    info_ETH3D_files/
</pre>

The organisation of your directories is important, since the bundle files contain the relative paths to the images, from the ETH3D root folder.

<br /><br /> Evaluation: for each interval rate (3,5,7,9,11,13,15), we compute the metrics for each of the sub-datasets (lakeside, delivery area and so on). The final metrics are the average over all datasets for each rate.

python eval_ETH3D.py --model GLUNet --pre_trained_models DPED_CityScape_ADE --data_dir /directory/to/ETH3D --save_dir /directory/to/save_dir

<br /> AEPE for different rates of intervals between image pairs (corresponds to Fig. 5 of the main paper and Tab. 9 in the supplementary). PCK results are also provided in the paper. These results are computed with a slightly more precise ground truth than in the paper.

| Method | rate=3 | rate=5 | rate=7 | rate=9 | rate=11 | rate=13 | rate=15 |
|---|---|---|---|---|---|---|---|
| PWC-Net | 1.75 | 2.10 | 3.21 | 5.59 | 14.35 | 27.49 | 43.41 |
| LiteFlowNet | 1.66 | 2.58 | 6.05 | 12.95 | 29.67 | 52.41 | 74.96 |
| DGC-Net | 2.49 | 3.28 | 4.18 | 5.35 | 6.78 | 9.02 | 12.23 |
| GLU-Net (Ours) | 1.98 | 2.54 | 3.49 | 4.24 | 5.61 | 7.55 | 10.78 |
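The per-rate averaging described for the ETH3D evaluation can be sketched as below. The dictionary layout and the numbers are invented for the example; eval_ETH3D.py may organise its results differently.

```python
def average_over_datasets(per_dataset_aepe):
    """per_dataset_aepe: {rate: {dataset_name: aepe}} -> {rate: mean aepe}."""
    return {
        rate: sum(values.values()) / len(values)
        for rate, values in per_dataset_aepe.items()
    }


# Made-up numbers, only to show the shape of the computation.
results = {
    3: {"lakeside": 1.0, "delivery_area": 3.0},
    5: {"lakeside": 2.0, "delivery_area": 5.0},
}
print(average_over_datasets(results))  # {3: 2.0, 5: 3.5}
```

Averaging per sub-dataset first gives every scene the same weight in the final number, regardless of how many image pairs it contains.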

Qualitative examples on the testing set of DPED

We tested our network GLU-Net as well as DGC-Net on the testing image pairs of the DPED dataset. A few examples are presented below. No ground-truth flow fields between the image pairs are available, so these are only qualitative results. alt text

5.2. Performance on semantic matching dataset <a name="semantic_matching"></a>

In the case of semantic matching, pairs of images show two instances of the same object or scene category.

TSS dataset = only dataset with dense ground truth on foreground objects

To test on TSS

python eval.py --model GLUNet --flipping_condition True --pre_trained_models DPED_CityScape_ADE --dataset TSS --data_dir /directory/to/TSS/DJOBS --save_dir /directory/to/save_dir 

optional argument: --flipping_condition, True or False (recommended for TSS)

or for the custom network:

python eval.py --model SemanticGLUNet --flipping_condition True --pre_trained_models DPED_CityScape_ADE --dataset TSS --data_dir /directory/to/TSS/JOBS --save_dir /directory/to/save_dir 

optional argument: --flipping_condition, True or False (recommended for TSS)

<br /> PCK [%] obtained on TSS for the task of semantic matching

| Method | FG3DCar | JODS | PASCAL | avg. |
|---|---|---|---|---|
| PARN (VGG-16) | 87.6 | 71.6 | 68.8 | 76.0 |
| NC-Net | 94.5 | 81.4 | 57.1 | 77.7 |
| DCCNet | 93.5 | 82.6 | 57.6 | 77.9 |
| SAM-Net | 96.1 | 82.2 | 67.2 | 81.8 |
| GLU-Net (Ours) | 93.2 | 73.3 | 71.1 | 79.2 |
| GLU-Net (this repo) | 93.2 | 73.7 | 71.1 | 79.3 |
| Semantic-GLU-Net (Ours) | 94.4 | 75.5 | 78.3 | 82.8 |
| Semantic-GLU-Net (this repo) | 94.4 | 75.7 | 78.3 | 82.8 |
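PCK (percentage of correct keypoints) counts a predicted point as correct when it lies within a distance threshold of the ground truth. The sketch below is a generic illustration: the threshold factor `alpha=0.05` and the normalisation by the larger image side are assumptions for the example, since the exact convention depends on the benchmark and is not stated here.

```python
import math


def pck(points_est, points_gt, height, width, alpha=0.05):
    """Percentage of points whose error is within alpha * max(height, width).
    The threshold convention is an assumption of this sketch."""
    threshold = alpha * max(height, width)
    correct = sum(
        1
        for (xe, ye), (xg, yg) in zip(points_est, points_gt)
        if math.hypot(xe - xg, ye - yg) <= threshold
    )
    return 100.0 * correct / len(points_gt)


points_gt = [(10.0, 10.0), (50.0, 50.0), (80.0, 20.0), (30.0, 70.0)]
points_est = [(11.0, 10.0), (50.0, 58.0), (80.0, 20.0), (32.0, 73.0)]
print(pck(points_est, points_gt, height=100, width=100))  # 75.0
```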

Illustration on examples of the TSS dataset alt text

Qualitative examples of day/night and seasonal changes

Scenarios such as day/night and seasonal changes can be considered borderline between geometric matching and semantic matching tasks: such pairs of images depict the same scenes, but the appearance variations are so drastic that the images can be associated with semantic matching tasks. We qualitatively tested our networks GLU-Net and Semantic-GLU-Net on examples of such cases and compared them to DGC-Net. The corresponding figures are presented below:

Day/Night changes: alt text

Seasonal changes: alt text

Some of the images are from the WxBS dataset [WxBS: Wide Baseline Stereo Generalizations. D. Mishkin et al. Proceedings of the British Machine Vision Conference. 2015. ]

5.3. Performance on optical flow dataset <a name="OF"></a>

In the case of optical flow datasets, pairs of images are two consecutive frames of a sequence or video.

KITTI 2012 and 2015

To test on KITTI datasets

python eval.py --model GLUNet --pre_trained_models DPED_CityScape_ADE --dataset KITTI_occ --data_dir /directory/to/KITTI/training/ --save_dir /directory/to/save_dir 

| Method | KITTI-2012 AEPE | KITTI-2015 AEPE | KITTI-2015 F1 [%] |
|---|---|---|---|
| PWC-Net (flying-chairs ft 3d-Things) | 4.14 | 10.35 | 33.67 |
| LiteFlowNet (flying-chairs ft 3d-Things) | 4.0 | 10.39 | 28.50 |
| DGC-Net (tokyo) | 8.50 | 14.97 | 50.98 |
| GLU-Net (CityScape-DPED-ADE) | 3.34 | 9.79 | 37.52 |
| GLU-Net (CityScape-DPED-ADE), this repo | 3.33 | 9.79 | 37.77 |

Quantitative results on the KITTI optical flow training datasets. Fl-all (reported as F1 in the table): percentage of outliers averaged over all pixels. A pixel counts as an inlier if its end-point error is < 3 pixels or < 5 %. Lower is better for both F1 and AEPE.
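The outlier percentage can be sketched directly from the inlier rule stated above. This minimal version assumes the 5 % criterion is relative to the magnitude of the ground-truth flow vector and takes flat lists of (u, v) tuples for pixels with valid ground truth; it is an illustration, not the evaluation code of this repository.

```python
import math


def outlier_percentage(flow_est, flow_gt, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of pixels that are NOT inliers.
    Inlier: end-point error < abs_thresh pixels, or < rel_thresh times the
    ground-truth flow magnitude (the relative reference is an assumption)."""
    outliers = 0
    for (ue, ve), (ug, vg) in zip(flow_est, flow_gt):
        epe = math.hypot(ue - ug, ve - vg)
        magnitude = math.hypot(ug, vg)
        inlier = epe < abs_thresh or epe < rel_thresh * magnitude
        if not inlier:
            outliers += 1
    return 100.0 * outliers / len(flow_gt)


flow_gt = [(10.0, 0.0), (100.0, 0.0), (0.0, 20.0), (5.0, 5.0)]
flow_est = [(11.0, 0.0), (104.0, 0.0), (0.0, 30.0), (5.0, 5.0)]
# Errors are 1, 4, 10 and 0 pixels; only the third fails both criteria.
print(outlier_percentage(flow_est, flow_gt))  # 25.0
```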

6. Acknowledgement <a name="Acknowledgement"></a>

We borrow code from public projects, such as DGC-Net, PWC-Net, NC-Net, Flow-Net-Pytorch...

7. Changelog <a name="changelog"></a>