Sparkler

Overview

The Sparkler miniapp computes a specialized dense matrix-matrix product C = A^T A, where the elements of the matrix A are small integers. This operation mimics the matrix product used to compute the Custom Correlation Coefficient (CCC) in the CoMet computational genomics code.
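As a small illustration of the product described above, the following Python sketch forms C = A^T A for a tiny integer matrix. It is not taken from the Sparkler source: the function name `gram_matrix` and the sample matrix are assumptions chosen for illustration only, and the miniapp itself performs the product as GEMM operations rather than with nested loops.

```python
def gram_matrix(a):
    """Return C = A^T A for a matrix A given as a list of rows of small integers."""
    num_rows = len(a)
    num_cols = len(a[0])
    # C[i][j] is the dot product of column i and column j of A.
    return [
        [sum(a[k][i] * a[k][j] for k in range(num_rows)) for j in range(num_cols)]
        for i in range(num_cols)
    ]


# Hypothetical 3 x 2 matrix with small integer entries.
A = [
    [0, 1],
    [2, 1],
    [1, 2],
]
C = gram_matrix(A)
print(C)  # [[5, 4], [4, 6]]
```

The result is symmetric, so only one triangle of C holds distinct values.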

Sparkler is licensed under the CoMet license; see https://github.com/wdj/comet.

Building

The build requires MPI and make. The default build requires CUDA 9.2 or higher for NVIDIA GPUs. An alternative build path for CPU-only execution requires an installed BLAS library, preferably multithreaded if the runs use more than one core per MPI rank.

To build for a cluster, modify the Makefile to reflect your MPI and CUDA installs and then type "make" (GPU case) or "env USE_GPU=NO make" (CPU-only case).

Running

Running the GPU executable requires one or more NVIDIA GPUs. Volta V100 or later GPUs (compute capability 7.0 or higher) are preferred; older GPUs will run much slower because they lack tensor core hardware.

A run is composed of a series of iterations, each representing a global dense matrix-matrix product. A single iteration is composed of steps, each corresponding to a single GEMM executed on each GPU.
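The sample outputs later in this document show 1, 2, 2 and 4 steps per iteration for 1, 2, 3 and 6 MPI ranks. One formula consistent with those four data points is floor(p/2) + 1, and the sketch below encodes it. This is inferred from the printed output, not from the Sparkler source, and `steps_per_iteration` is a hypothetical helper name.

```python
def steps_per_iteration(num_proc):
    """Steps (GEMMs per GPU) in one iteration, as inferred from the sample output."""
    if num_proc < 1:
        raise ValueError("num_proc must be at least 1")
    # Assumption: this formula only reproduces the four sample runs.
    return num_proc // 2 + 1


for p in (1, 2, 3, 6):
    print(p, steps_per_iteration(p))
# 1 1
# 2 2
# 3 2
# 6 4
```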

Command-line options:

    --num_vector - number of vectors (half the number of columns of matrix A)

    --num_field - number of fields (the number of rows of A)

    --num_iterations - number of (global) matrix products done

Example:

mpirun -n 2 ./exec.gpu --num_vector 1000 --num_field 2000 --num_iterations 2

Reported values are:

TF - total number of GEMM floating-point operations, in units of 10^12 (tera) operations

GEMM sec - total time spent in GPU GEMM operations

GEMM TF/sec - GEMM teraflop rate, ratio of TF to GEMM sec

total sec - total runtime

hash - a hash of the results computed, for evaluating correctness
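The TF value can be cross-checked by hand. The sketch below reproduces the TF values printed in the four sample outputs in this document, under assumptions that are inferred from those numbers rather than taken from the Sparkler source: A has 2 * num_vector columns split evenly across the MPI ranks, C is treated as a num_proc x num_proc grid of blocks, only the num_proc * (num_proc + 1) / 2 blocks of one triangle are counted, and each block GEMM costs 2 * m * n * k operations. The function name `expected_tf` is hypothetical.

```python
def expected_tf(num_vector, num_field, num_iterations, num_proc):
    """Estimate the reported TF value (units of 10^12 operations)."""
    num_cols = 2 * num_vector  # columns of A
    if num_cols % num_proc != 0:
        raise ValueError("sketch assumes columns divide evenly among ranks")
    cols_per_proc = num_cols // num_proc
    # Assumption: only one triangle of the block grid of C is counted.
    num_blocks = num_proc * (num_proc + 1) // 2
    # One block GEMM: 2 * m * n * k operations (a multiply and an add each).
    ops_per_block = 2 * cols_per_proc * cols_per_proc * num_field
    return num_iterations * num_blocks * ops_per_block / 1e12


# Parameters of the four competition test cases, from the sample outputs.
cases = [
    (4000, 90000, 400, 1),
    (18000, 90000, 350, 2),
    (27000, 90000, 1600, 3),
    (54000, 90000, 3000, 6),
]
for case in cases:
    print(f"{expected_tf(*case):.3f}")
# 4608.000
# 61236.000
# 559872.000
# 3674160.000
```

Dividing this value by the measured GEMM sec gives the GEMM TF/sec figure.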

Competition Test Cases:

The four competition test cases can be run with

  ./run_test_case.sh <i>

where <i> = 1, 2, 3 or 4. Test cases 1, 2, 3 and 4 use 1, 2, 3 and 6 GPUs respectively, and runtime increases with the test case number. Test case 1 can run on smaller-memory GPUs, but cases 2, 3 and 4 require at least 16 GB of memory per GPU.

The script run_test_case.sh may need to be modified for your specific CUDA and MPI installations. The execution mode is one MPI rank per GPU.

Values to be reported are (1) the hash, to validate correctness, and (2) the GEMM TF/sec value, to measure performance. Note that, due to load balancing issues, the best per-GPU performance is achieved for odd numbers of GPUs.
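One possible reading of the odd/even effect, inferred only from the step counts and TF values in the sample outputs and not from the Sparkler source, is that each of the p ranks runs floor(p/2) + 1 GEMM steps per iteration while only p * (p + 1) / 2 blocks are counted as useful work. Under that assumption the two counts match exactly for odd p and leave some GEMM slots unused for even p. The helper name `gemm_slot_utilization` is hypothetical.

```python
def gemm_slot_utilization(num_proc):
    """Fraction of per-iteration GEMM slots that map to a counted block.

    Assumption: steps and block counts follow the pattern seen in the
    sample outputs; the actual scheduling in Sparkler may differ.
    """
    steps = num_proc // 2 + 1                # GEMM steps per rank
    slots = num_proc * steps                 # GEMM slots across all ranks
    blocks = num_proc * (num_proc + 1) // 2  # blocks counted in TF
    return blocks / slots


for p in (1, 2, 3, 6):
    print(p, f"{gemm_slot_utilization(p):.3f}")
# 1 1.000
# 2 0.750
# 3 1.000
# 6 0.875
```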

Representative outputs are shown below, from test runs on the Summit architecture using the Volta V100 tensor cores. The hash values from your runs should match those shown. Values marked here by "XXXXXX" will appear as actual numbers in your runs.

summit-batch4$ ./run_test_case.sh 1
num_vector 4000 num_field 90000 num_iterations 400 num_proc 1
Iteration 1 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 400 of 400, step 1 of 1, elapsed sec XXXXXX: setup... GEMM... check...
TF 4608.000 GEMM sec XXXXXX GEMM TF/sec XXXXXX total sec XXXXXX hash 435999930709XXXXXX
summit-batch4$ ./run_test_case.sh 2
num_vector 18000 num_field 90000 num_iterations 350 num_proc 2
Iteration 1 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 350 of 350, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 350 of 350, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
TF 61236.000 GEMM sec XXXXXX GEMM TF/sec XXXXXX total sec XXXXXX hash 2775866192702XXXXXX
summit-batch4$ ./run_test_case.sh 3
num_vector 27000 num_field 90000 num_iterations 1600 num_proc 3
Iteration 1 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 512 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 512 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 768 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 768 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1024 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1024 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1280 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1280 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1536 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1536 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1600 of 1600, step 1 of 2, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1600 of 1600, step 2 of 2, elapsed sec XXXXXX: setup... GEMM... check...
TF 559872.000 GEMM sec XXXXXX GEMM TF/sec XXXXXX total sec XXXXXX hash 3719610844656XXXXXX
peak-login1$ ./run_test_case.sh 4
num_vector 54000 num_field 90000 num_iterations 3000 num_proc 6
Iteration 1 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 4 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 8 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 16 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 32 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 64 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 128 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 256 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 512 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 512 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 512 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 512 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 768 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 768 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 768 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 768 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1024 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1024 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1024 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1024 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1280 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1280 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1280 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1280 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1536 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1536 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1536 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1536 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1792 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1792 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1792 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 1792 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2048 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2048 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2048 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2048 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2304 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2304 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2304 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2304 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2560 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2560 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2560 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2560 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2816 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2816 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2816 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 2816 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 3000 of 3000, step 1 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 3000 of 3000, step 2 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 3000 of 3000, step 3 of 4, elapsed sec XXXXXX: setup... GEMM... check...
Iteration 3000 of 3000, step 4 of 4, elapsed sec XXXXXX: setup... GEMM... check...
TF 3674160.000 GEMM sec XXXXXX GEMM TF/sec XXXXXX total sec XXXXXX hash 4137762059954XXXXXX
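
The final line of each run above carries the two values to report. As a convenience, the following Python sketch pulls the fields out of such a line with a regular expression. It is not part of Sparkler; the names `SUMMARY_RE` and `parse_summary` are hypothetical, and the pattern assumes the summary line has exactly the layout shown in the sample outputs.

```python
import re

# Pattern for the summary line layout shown in the sample outputs.
SUMMARY_RE = re.compile(
    r"TF\s+(?P<tf>\S+)\s+GEMM sec\s+(?P<gemm_sec>\S+)\s+"
    r"GEMM TF/sec\s+(?P<gemm_tf_per_sec>\S+)\s+"
    r"total sec\s+(?P<total_sec>\S+)\s+hash\s+(?P<hash>\S+)"
)


def parse_summary(line):
    """Return the summary fields as strings, or None if the line does not match."""
    match = SUMMARY_RE.search(line)
    return match.groupdict() if match else None


# Summary line copied from the test case 1 output in this document;
# "XXXXXX" stands for digits that a real run prints.
line = (
    "TF 4608.000 GEMM sec XXXXXX GEMM TF/sec XXXXXX "
    "total sec XXXXXX hash 435999930709XXXXXX"
)
fields = parse_summary(line)
print(fields["hash"], fields["gemm_tf_per_sec"])
```

On output from a real run, `float(fields["gemm_tf_per_sec"])` gives the performance value as a number, and `fields["hash"]` can be compared against the reference hash.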