

A SYCL plug-in to run AMReX apps on AMD/Nvidia GPUs

Nuno Nobre, Alex Grant, Karthikeyan Chockalingam and Xiaohu Guo


SYCL unlocks single-source development for hardware accelerators by leveraging C++ templated functions, greatly easing the otherwise laborious task of porting C++ code to heterogeneous architectures. The aim of this work is to show that state-of-the-art scientific software such as AMReX can be written solely in SYCL while still preserving its performance portability features.
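To make the single-source idea concrete, here is a minimal sketch in plain C++17 of the pattern AMReX uses in loop helpers such as `amrex::ParallelFor`: the loop body is written once as a lambda and handed to a template that decides how to run it. The helper `parallel_for` and the `axpy` routine are hypothetical and run serially on the host; in an actual SYCL backend the template would instead submit the lambda to the device as a kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a backend-dispatching loop helper, in the spirit
// of amrex::ParallelFor. A SYCL or CUDA backend would launch `body` as a
// device kernel; this host-only version simply runs it serially.
template <typename F>
void parallel_for(std::size_t n, F&& body) {
    for (std::size_t i = 0; i < n; ++i) {
        body(i);  // the same lambda body is reused by every backend
    }
}

// Application code is written once against the template: y = a * x + y.
void axpy(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    parallel_for(y.size(), [&](std::size_t i) { y[i] += a * x[i]; });
}
```

Calling `axpy(2.0, x, y)` with `x = {1, 2, 3}` and `y = {1, 1, 1}` leaves `y = {3, 5, 7}`. Replacing the body of `parallel_for` with a different backend's dispatch would not require touching `axpy`, which is what keeps the cost of adding a new backend low.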

We demonstrate how little extra development effort is required using AMReX's ElectromagneticPIC tutorial. We chose this tutorial for its relevance to plasma fusion, but all AMReX applications should be able to benefit. Here is an illustration of a PIC simulation you can carry out using this code:

Plasma Oscillations

Current-driven Langmuir oscillations at the plasma frequency on a 32 x 32 x 32 grid with 1 electron per cell. The mesh is coloured by the amplitude of the electric field, which oscillates in time but is spatially uniform, from blue (-) to red (+).
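The oscillation frequency in the animation is the electron plasma frequency, omega_p = sqrt(n_e e^2 / (eps0 m_e)), which depends only on the electron number density n_e. The sketch below evaluates it in SI units; the function names are hypothetical, and the density used in the example is an arbitrary illustrative value, not the one set in the ElectromagneticPIC tutorial.

```cpp
#include <cassert>
#include <cmath>

// SI constants (CODATA 2018 values).
constexpr double kElementaryCharge = 1.602176634e-19;     // C
constexpr double kElectronMass = 9.1093837015e-31;        // kg
constexpr double kVacuumPermittivity = 8.8541878128e-12;  // F/m

// Electron plasma frequency omega_p = sqrt(n_e e^2 / (eps0 m_e)) in rad/s,
// for an electron number density n_e given in m^-3.
double plasma_frequency(double n_e) {
    assert(n_e >= 0.0);
    return std::sqrt(n_e * kElementaryCharge * kElementaryCharge /
                     (kVacuumPermittivity * kElectronMass));
}

// Period of one Langmuir oscillation, T_p = 2 * pi / omega_p, in seconds.
double plasma_period(double n_e) {
    const double pi = std::acos(-1.0);
    return 2.0 * pi / plasma_frequency(n_e);
}
```

For an illustrative density of 10^24 m^-3, `plasma_frequency` returns roughly 5.6e13 rad/s, i.e. a period of about 0.11 ps; the time step of a PIC run must resolve this period for the oscillation to be captured.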

The plug-in consists of a build script and code patches which extend AMReX's SYCL capability beyond Intel GPUs. We support two open-source SYCL compiler and runtime frameworks:

The plug-in has been tested on all the high-performance computing GPUs generally available at the beginning of 2023:

Since AMReX also includes native support for both the Nvidia CUDA and the AMD HIP programming models, a direct comparison against those backends is straightforward. This plot shows that the SYCL implementation is broadly as fast as those vendor alternatives.

Performance Results

SYCL vs CUDA and HIP. Performance comparison for a Langmuir oscillations simulation on a 128 x 128 x 128 grid with 64 electrons per cell and 100 time steps. The AMD MI100 is an order of magnitude slower due to the lack of support for FP64 atomics on that GPU. The PIC loop is always faster with the SYCL implementation, but the particle initialisation routine is notably slower; with so few time steps, this makes the total SYCL execution time slower on some GPUs, such as the A100. For a detailed per-routine comparison for each GPU, see here.
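Particle-in-cell codes deposit particle charge and current onto the mesh, and many particles can update the same cell concurrently, so the accumulation needs double-precision atomic additions. When a GPU has no native FP64 atomic add, the operation is typically emulated with a compare-and-swap loop that retries under contention, which is one way the MI100 penalty can arise. The CPU sketch below shows that fallback using `std::atomic<double>`; `atomic_add` and `deposit` are hypothetical helpers, and the nearest-cell deposition is a simplification, not AMReX's actual kernel.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Atomic add on a double built from compare-and-swap: the usual fallback
// when hardware lacks a native FP64 atomic add instruction.
double atomic_add(std::atomic<double>& target, double value) {
    double expected = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the current value back into
    // `expected`, so each retry uses fresh data. Contention means more retries.
    while (!target.compare_exchange_weak(expected, expected + value,
                                         std::memory_order_relaxed)) {
    }
    return expected;  // value held just before our addition
}

// Hypothetical deposition step: each particle adds its charge to the cell it
// sits in. Threads may hit the same cell, hence the atomic update.
void deposit(const std::vector<std::size_t>& cell_of_particle,
             double charge_per_particle,
             std::vector<std::atomic<double>>& grid,
             unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Thread t handles particles t, t + num_threads, t + 2 * num_threads, ...
            for (std::size_t p = t; p < cell_of_particle.size(); p += num_threads) {
                atomic_add(grid[cell_of_particle[p]], charge_per_particle);
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

Depositing 1000 unit charges spread evenly over 4 cells from 8 threads leaves exactly 250 in each cell, whatever the interleaving; a plain non-atomic `+=` could lose updates.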
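The initialisation-versus-loop trade-off described in the caption follows from a simple cost model: total time = initialisation time + number of steps × time per step. An implementation with a slower initialisation but a faster loop only comes out ahead once the run is long enough to amortise its start-up penalty. The sketch below computes that break-even step count; `CostModel` and `break_even_steps` are hypothetical names, and the timings in the example are illustrative, not measurements from the plot.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Simple cost model: one-off initialisation plus a fixed cost per time step.
struct CostModel {
    double init_seconds;
    double step_seconds;
    double total(double steps) const {
        assert(steps >= 0.0);
        return init_seconds + steps * step_seconds;
    }
};

// Number of steps after which `fast_loop` (slower init, faster loop) catches
// up with `fast_init`. Returns +infinity if its loop is not actually faster.
double break_even_steps(const CostModel& fast_loop, const CostModel& fast_init) {
    const double step_gain = fast_init.step_seconds - fast_loop.step_seconds;
    if (step_gain <= 0.0) return std::numeric_limits<double>::infinity();
    const double init_penalty = fast_loop.init_seconds - fast_init.init_seconds;
    return init_penalty <= 0.0 ? 0.0 : init_penalty / step_gain;
}
```

With illustrative timings of 3 s + 0.25 s/step versus 1 s + 0.5 s/step, the break-even point is 8 steps: a shorter run favours the faster initialisation and a longer one the faster loop, which is why a fixed 100-step benchmark can rank implementations differently from a production-length run.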

To learn how to install and use the plug-in, continue reading here.

Acknowledgments