Pytorch Correlation module

This is a custom C++/CUDA implementation of the Correlation module, used e.g. in FlowNetC.

This tutorial was used as a basis for the implementation, along with NVIDIA's CUDA code.

Requirements

This module is expected to compile for PyTorch 2.1.0.

Before installing, check the Compute Capability (CC) of your GPU and its compatibility with your CUDA version in NVIDIA's docs. For example, the RTX 6000 Ada has CC 8.9, so the environment variable is set to:

export TORCH_CUDA_ARCH_LIST="8.9+PTX"
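
If PyTorch is already installed with CUDA support, the value can also be read from the device instead of looked up by hand. The following is a minimal sketch, not part of this package: `arch_list_entry` is a hypothetical helper name, and it assumes the GPU you build for is device 0.

```python
def arch_list_entry(major, minor, with_ptx=True):
    """Format a compute capability as a TORCH_CUDA_ARCH_LIST entry (hypothetical helper)."""
    entry = f"{major}.{minor}"
    return entry + "+PTX" if with_ptx else entry


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        # get_device_capability returns a (major, minor) tuple, e.g. (8, 9)
        major, minor = torch.cuda.get_device_capability(0)
        print(f'export TORCH_CUDA_ARCH_LIST="{arch_list_entry(major, minor)}"')
    else:
        print("No CUDA device visible to PyTorch; look up the value in NVIDIA's docs.")
```

The printed line can be pasted into the shell before running the installation.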

Installation

Note that this module requires python3-dev to compile the C++ code. On Ubuntu, for example, run:

apt install python3-dev

This module is available on PyPI and can be installed with pip:

pip install spatial-correlation-sampler

For a CPU-only version, you can install from source with:

python setup_cpu.py install

Known Problems

This module needs a compatible gcc version and CUDA to be compiled. Namely, CUDA 9.1 and below need gcc5, while CUDA 9.2 and 10.0 need gcc7. See this issue for more information.

Usage

The API differs from NVIDIA's module in a few ways:

input (B x C x H x W) -> output (B x PatchH x PatchW x oH x oW)
kernel_size=1,
patch_size=21,
stride=1,
padding=0,
dilation=1,
dilation_patch=2
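
To make the 5D output layout concrete, here is a small shape calculator. It is a sketch under a stated assumption: that `oH` and `oW` follow the usual dilated-convolution size formula and do not depend on `patch_size` or `dilation_patch`. The function name `correlation_output_shape` is hypothetical and not part of the package.

```python
def correlation_output_shape(batch, height, width, kernel_size, patch_size,
                             stride, padding, dilation):
    """Expected (B, PatchH, PatchW, oH, oW), ASSUMING conv-style size arithmetic."""
    # Extent of the dilated correlation kernel along one axis.
    dilated_kernel = dilation * (kernel_size - 1) + 1
    out_h = (height + 2 * padding - dilated_kernel) // stride + 1
    out_w = (width + 2 * padding - dilated_kernel) // stride + 1
    return (batch, patch_size, patch_size, out_h, out_w)


# Parameters listed above, with a batch of 4 feature maps of size 48 x 64.
print(correlation_output_shape(4, 48, 64, kernel_size=1, patch_size=21,
                               stride=1, padding=0, dilation=1))
```

Under that assumption, a 1x1 kernel with stride 1 and no padding keeps the spatial size, so only the two patch dimensions are added.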

Example

import torch
from spatial_correlation_sampler import SpatialCorrelationSampler, spatial_correlation_sample

device = "cuda"
batch_size = 1
channel = 1
H = 10
W = 10
dtype = torch.float32

input1 = torch.randint(1, 4, (batch_size, channel, H, W), dtype=dtype, device=device, requires_grad=True)
input2 = torch.randint_like(input1, 1, 4).requires_grad_(True)

# You can either use the function or the module.
# Note that the module doesn't contain any parameter tensor.

# function

out = spatial_correlation_sample(input1,
                                 input2,
                                 kernel_size=3,
                                 patch_size=1,
                                 stride=2,
                                 padding=0,
                                 dilation=2,
                                 dilation_patch=1)

# module

correlation_sampler = SpatialCorrelationSampler(
    kernel_size=3,
    patch_size=1,
    stride=2,
    padding=0,
    dilation=2,
    dilation_patch=1)
out = correlation_sampler(input1, input2)
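
For intuition about what each output entry holds, the following naive pure-Python reference covers the simplest case only (a single sample, `kernel_size=1`, `stride=1`, `padding=0`). It is an illustrative sketch, not the package's implementation: the function name is hypothetical, and it assumes that patch index `(ph, pw)` corresponds to shifting `input2` by `(ph - radius) * dilation_patch` and that positions falling outside the image contribute zero.

```python
def naive_correlation(feat1, feat2, patch_size, dilation_patch=1):
    """Naive correlation of two C x H x W nested lists (hypothetical reference)."""
    channels = len(feat1)
    height = len(feat1[0])
    width = len(feat1[0][0])
    radius = (patch_size - 1) // 2
    # Output layout: PatchH x PatchW x H x W, initialised to zero.
    out = [[[[0.0] * width for _ in range(height)]
            for _ in range(patch_size)] for _ in range(patch_size)]
    for ph in range(patch_size):
        for pw in range(patch_size):
            # Displacement applied to the second feature map (assumed convention).
            dy = (ph - radius) * dilation_patch
            dx = (pw - radius) * dilation_patch
            for y in range(height):
                for x in range(width):
                    y2, x2 = y + dy, x + dx
                    if 0 <= y2 < height and 0 <= x2 < width:
                        # Dot product over channels at the two positions.
                        out[ph][pw][y][x] = sum(
                            feat1[c][y][x] * feat2[c][y2][x2]
                            for c in range(channels))
    return out
```

With `patch_size=1` this reduces to a per-pixel dot product over channels. Comparing it against the CUDA output on a tiny input is a quick way to check whether the assumed conventions hold.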

Benchmark

CUDA Benchmark

CUDA_LAUNCH_BLOCKING=1 python benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256 cuda -d float

CUDA_LAUNCH_BLOCKING=1 python NV_correlation_benchmark.py --scale ms -k1 --patch 21 -s1 -p0 --patch_dilation 2 -b4 --height 48 --width 64 -c256

| implementation | Correlation parameters | device | pass | min time | avg time |
|---|---|---|---|---|---|
| ours | default | 980 GTX | forward | 5.745 ms | 5.851 ms |
| ours | default | 980 GTX | backward | 77.694 ms | 77.957 ms |
| NVIDIA | default | 980 GTX | forward | 13.779 ms | 13.853 ms |
| NVIDIA | default | 980 GTX | backward | 73.383 ms | 73.708 ms |
| ours | FlowNetC | 980 GTX | forward | 26.102 ms | 26.179 ms |
| ours | FlowNetC | 980 GTX | backward | 208.091 ms | 208.510 ms |
| NVIDIA | FlowNetC | 980 GTX | forward | 35.363 ms | 35.550 ms |
| NVIDIA | FlowNetC | 980 GTX | backward | 283.748 ms | 284.346 ms |
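
For readers who want to time their own configuration, the "min time" and "avg time" columns can be approximated with a small helper like the one below. This is a generic sketch and not the code in benchmark.py; `time_calls` is a hypothetical name. Because CUDA kernels launch asynchronously, wall-clock timing on GPU is only meaningful if each call is forced to finish, for example with `CUDA_LAUNCH_BLOCKING=1` as in the commands above or by calling `torch.cuda.synchronize()` inside the timed function.

```python
import time


def time_calls(fn, repeats=10):
    """Call fn() `repeats` times; return (min, average) wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return min(durations), sum(durations) / len(durations)


# Stand-in workload; replace with a forward or backward pass.
fastest, average = time_calls(lambda: sum(range(100_000)), repeats=5)
print(f"min {fastest * 1e3:.3f} ms, avg {average * 1e3:.3f} ms")
```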

Notes

CPU Benchmark

| Correlation parameters | device | pass | min time | avg time |
|---|---|---|---|---|
| default | E5-2630 v3 @ 2.40GHz | forward | 159.616 ms | 188.727 ms |
| default | E5-2630 v3 @ 2.40GHz | backward | 282.641 ms | 294.194 ms |
| FlowNetC | E5-2630 v3 @ 2.40GHz | forward | 2.138 s | 2.144 s |
| FlowNetC | E5-2630 v3 @ 2.40GHz | backward | 7.006 s | 7.075 s |