cudaDecon
GPU-accelerated 3D image deconvolution & affine transforms using CUDA.
Python bindings are also available at pycudadecon
Install
Precompiled binaries are available for Linux and Windows on conda-forge (see GPU driver requirements below).
# install just the executable binary and shared libraries from this repo
conda install -c conda-forge cudadecon
# install binary, libraries, and python bindings
conda install -c conda-forge pycudadecon
GPU requirements
This software requires a CUDA-compatible NVIDIA GPU.
The libraries available on conda-forge have been compiled against different versions of the CUDA toolkit. The required CUDA libraries are bundled in the conda distributions so you don't need to install the CUDA toolkit separately.
If desired, you may specify cuda-version as follows:
conda install -c conda-forge cudadecon cuda-version=<11 or 12>
You should also ensure that you have the minimum required driver version installed for the CUDA version you are using.
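As a rough sketch of that driver check on Linux, the snippet below compares an installed driver version against a minimum. The helper names version_ge and min_driver_for_cuda are hypothetical, and the minimum values (450.80.02 for CUDA 11.x, 525.60.13 for CUDA 12.x) are assumed from NVIDIA's published Linux figures; confirm them against the current NVIDIA release notes, since Windows minimums differ.

```shell
#!/usr/bin/env bash
# version_ge A B: succeed if dotted version A >= B (relies on GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed Linux minimum driver version for each CUDA major version.
min_driver_for_cuda() {
    case "$1" in
        11) echo "450.80.02" ;;
        12) echo "525.60.13" ;;
        *)  return 1 ;;
    esac
}

# Usage (requires nvidia-smi on PATH):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$installed" "$(min_driver_for_cuda 12)" && echo "driver OK for CUDA 12"
```

If the check fails, either update the driver or install the build for the older CUDA major version with cuda-version=11.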
Usage
# check that GPU is discovered
cudaDecon -Q
# Basic Usage
# 1. create an OTF from a PSF with "radialft"
radialft /path/to/psf.tif /path/to/otf_output.tif --nocleanup --fixorigin 10
# 2. run decon on a folder of tiffs:
# 'filename_pattern' is a string that must appear in the filename to be processed
cudaDecon $OPTIONS /folder/of/images filename_pattern /path/to/otf_output.tif
# see manual for all of the available arguments
cudaDecon --help
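Before starting a long run, it can help to preview which files a given filename_pattern would select. The sketch below is not part of cudaDecon: list_matches is a hypothetical helper, and it assumes that only files ending in .tif are of interest and that the pattern is matched as a plain substring, as described above.

```shell
#!/usr/bin/env bash
# list_matches DIR PATTERN: print the names of *.tif files in DIR whose
# filename contains PATTERN (treated as a literal substring, not a glob).
list_matches() {
    dir=$1
    pattern=$2
    for f in "$dir"/*"$pattern"*.tif; do
        # Skip the unexpanded glob when nothing matches.
        [ -e "$f" ] && basename "$f"
    done
    return 0
}

# Usage:
#   list_matches /folder/of/images filename_pattern
```

If the listing looks right, pass the same folder and pattern to cudaDecon.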
Local build instructions
If you simply wish to use this package, it is best to install the precompiled binaries from conda as described above.
To build the source locally, you have two options:
1. Build using run_docker_build
With docker installed, use .scripts/run_docker_build.sh with one of the configs available in .ci_support, for instance:
CONFIG=linux_64_cuda_compiler_version10.2 .scripts/run_docker_build.sh
2. Build using cmake directly
This package depends on boost, libtiff, fftw, and cuda.
Here we create a dedicated conda environment with all of the build dependencies installed, and then use cmake directly. This method is faster and creates an immediately usable binary (i.e. it is better for iteration if you're changing the source code), but requires that you set up the build dependencies correctly.
- (windows only) install build tools for VisualStudio 2019. For linux, all necessary build tools will be installed by conda.
- create a new conda environment with all of the dependencies installed
  conda config --add channels conda-forge
  conda create -n build -y cmake libboost-devel libtiff fftw ninja cuda-nvcc libcufft-dev
  conda activate build
  # you will need to reactivate the "build" environment each time you close the terminal
- create a new build directory inside of the top level cudaDecon folder
  mkdir build  # inside the cudaDecon folder
  cd build
- (windows only) Activate your build tools (adjust the path to your installation):
  "C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Auxiliary\Build\vcvars64.bat"
- Run cmake and compile with ninja on windows or make on linux.
  # windows
  cmake ../src -DCMAKE_BUILD_TYPE=Release -G "Ninja"
  ninja
  # linux
  cmake ../src -DCMAKE_BUILD_TYPE=Release
  make -j4
Note that you can specify the CUDA version to use by using the -DCUDA_TOOLKIT_ROOT_DIR flag.
The binary will be written to cudaDecon\build\<platform>-<compiler>-release.
If you change the source code, you can just rerun ninja or make and the binary will be updated.
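The -DCUDA_TOOLKIT_ROOT_DIR note above can be sketched as a small wrapper around the linux configure step. The function name configure_build and the toolkit path in the usage comment are hypothetical; the flag is only added when a toolkit directory is passed, otherwise cmake picks up whatever CUDA it finds in the active environment.

```shell
#!/usr/bin/env bash
# configure_build [TOOLKIT_DIR]: run the configure step from inside the
# build directory, adding -DCUDA_TOOLKIT_ROOT_DIR only when TOOLKIT_DIR is given.
configure_build() {
    cmake ../src -DCMAKE_BUILD_TYPE=Release ${1:+"-DCUDA_TOOLKIT_ROOT_DIR=$1"}
}

# Usage (hypothetical toolkit location):
#   configure_build /usr/local/cuda-12.2 && make -j4
```

On windows the same optional flag can be appended to the cmake line that uses -G "Ninja".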
Developer Notes
- GPU-based resources have a d_ prefix in their name, such as: GPUBuffer & d_interpOTF
- transferConstants() is a function to send small data values from host to GPU device.
- The link between the function arguments of transferConstants() and globals such as __constant__ unsigned const_nzotf; is found in RLgpuImpl.cu, in calls like: cutilSafeCall(cudaMemcpyToSymbol(const_nzotf, &nzotf, sizeof(int)));
- This RL (Richardson-Lucy) implementation is based upon the built-in Matlab version: deconvlucy.m (see http://ecco2.jpl.nasa.gov/opendap/hyrax/matlab/images/images/deconvlucy.m)
- The Cudadecon.exe main() function is in src/linearDecon.cpp
- If there is not enough memory on the GPU, the program will use the host PC's RAM.
- If you are processing on the GPU that drives the display, Windows will terminate cudaDecon if an iteration takes too long. Set the Windows display driver timeout to something larger (like 10 seconds instead of the default 2 seconds). See http://stackoverflow.com/questions/17186638/modifying-registry-to-increase-gpu-timeout-windows-7 for details. Running this command from an administrator command prompt should set the timeout to 10 seconds:
  reg.exe ADD "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v "TdrDelay" /t REG_DWORD /D "10" /f
- Better yet, use a second GPU. The GPU you wish to use for computation only should use the TCC driver (must be a Titan or Tesla or other GPU that supports TCC). This card should be initialized after the display GPU, so put the compute card in a higher-numbered slot than the display card. To select the TCC driver, run nvidia-smi.exe -L from an administrator cmd window to show the GPUs, then nvidia-smi.exe -dm 1 -i 0 to set TCC on GPU 0. Then use set CUDA_VISIBLE_DEVICES to pick the GPU the deconv code should execute on.
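In a bash-like shell, the GPU selection in the last note can be scoped to a single command rather than the whole session. The wrapper below is a minimal sketch; run_on_gpu is a hypothetical helper, not something shipped with cudaDecon. In a Windows cmd window the equivalent is to run set CUDA_VISIBLE_DEVICES=1 before launching cudaDecon.

```shell
#!/usr/bin/env bash
# run_on_gpu INDEX CMD...: run CMD with only GPU INDEX visible to CUDA.
# The variable is set for that one command and does not persist afterwards.
run_on_gpu() {
    gpu=$1
    shift
    CUDA_VISIBLE_DEVICES="$gpu" "$@"
}

# Usage: confirm that only the chosen GPU is discovered.
#   run_on_gpu 1 cudaDecon -Q
```

The index CUDA assigns to a device may not match the order printed by nvidia-smi -L, so checking with cudaDecon -Q first is a reasonable precaution.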