Awesome

DiffEqGPU

This library is a component package of the DifferentialEquations.jl ecosystem. It includes functionality for making use of GPUs in the differential equation solvers.

The two ways to accelerate ODE solvers with GPUs

There are two very different ways that one can accelerate an ODE solution with GPUs. There is one case where u is very big and f is very expensive but very structured, and you use GPUs to accelerate the computation of said f. The other use case is where u is very small but you want to solve the ODE f over many different initial conditions (u0) or parameters p. In that case, you can use GPUs to parallelize over different parameters and initial conditions. In other words:

Type of Problem	SciML Solution
Accelerate a big ODE	Use CUDA.jl's CuArray as `u0`
Solve the same ODE with many `u0` and `p`	Use DiffEqGPU.jl's `EnsembleGPUArray` and `EnsembleGPUKernel`

Supported GPUs

SciML's GPU support extends to a wide array of hardware, including:

GPU Manufacturer	GPU Kernel Language	Julia Support Package	Backend Type
NVIDIA	CUDA	CUDA.jl	`CUDA.CUDABackend()`
AMD	ROCm	AMDGPU.jl	`AMDGPU.ROCBackend()`
Intel	OneAPI	OneAPI.jl	`oneAPI.oneAPIBackend()`
Apple (M-Series)	Metal	Metal.jl	`Metal.MetalBackend()`

For this tutorial we will demonstrate the CUDA backend for NVIDIA GPUs, though any of the other GPUs can be used by simply swapping out the backend choice.

Example of Within-Method GPU Parallelism

using OrdinaryDiffEq, CUDA, LinearAlgebra
u0 = cu(rand(1000))
A = cu(randn(1000, 1000))
f(du, u, p, t) = mul!(du, A, u)
prob = ODEProblem(f, u0, (0.0f0, 1.0f0)) # Float32 is better on GPUs!
sol = solve(prob, Tsit5())

Example of Parameter-Parallelism with GPU Ensemble Methods

using DiffEqGPU, CUDA, OrdinaryDiffEq, StaticArrays

function lorenz(u, p, t)
    σ = p[1]
    ρ = p[2]
    β = p[3]
    du1 = σ * (u[2] - u[1])
    du2 = u[1] * (ρ - u[3]) - u[2]
    du3 = u[1] * u[2] - β * u[3]
    return SVector{3}(du1, du2, du3)
end

u0 = @SVector [1.0f0; 0.0f0; 0.0f0]
tspan = (0.0f0, 10.0f0)
p = @SVector [10.0f0, 28.0f0, 8 / 3.0f0]
prob = ODEProblem{false}(lorenz, u0, tspan, p)
prob_func = (prob, i, repeat) -> remake(prob, p = (@SVector rand(Float32, 3)) .* p)
monteprob = EnsembleProblem(prob, prob_func = prob_func, safetycopy = false)

@time sol = solve(monteprob, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend()),
    trajectories = 10_000, adaptive = false, dt = 0.1f0)

Benchmarks

Curious about our claims? See https://github.com/utkarsh530/GPUODEBenchmarks for comparsion of our GPU solvers against CPUs and GPUs implementation in C++, JAX and PyTorch.

Citation

If you are using DiffEqGPU.jl in your work, consider citing our paper:

@article{utkarsh2024automated,
  title={Automated translation and accelerated solving of differential equations on multiple GPU platforms},
  author={Utkarsh, Utkarsh and Churavy, Valentin and Ma, Yingbo and Besard, Tim and Srisuma, Prakitr and Gymnich, Tim and Gerlach, Adam R and Edelman, Alan and Barbastathis, George and Braatz, Richard D and others},
  journal={Computer Methods in Applied Mechanics and Engineering},
  volume={419},
  pages={116591},
  year={2024},
  publisher={Elsevier}
}