CUDPP documentation {#mainpage}

Introduction

CUDPP is the CUDA Data Parallel Primitives Library. CUDPP is a library of data-parallel algorithm primitives such as parallel-prefix-sum ("scan"), parallel sort and parallel reduction. Primitives such as these are important building blocks for a wide variety of data-parallel algorithms, including sorting, stream compaction, and building data structures such as trees and summed-area tables.
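To make the "building block" idea concrete, here is a minimal sequential CPU sketch of an exclusive prefix sum and of stream compaction built on top of it. This is not CUDPP code and does not use the CUDPP API; the names `exclusiveScan` and `compact` are hypothetical, and the flags are assumed to be 0 or 1. It only illustrates what the primitives compute, not how the library computes them in parallel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for an exclusive prefix sum ("scan"):
// out[i] = in[0] + in[1] + ... + in[i-1], with out[0] = 0.
std::vector<unsigned int> exclusiveScan(const std::vector<unsigned int>& in)
{
    std::vector<unsigned int> out(in.size());
    unsigned int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;   // sum of all elements before position i
        running += in[i];
    }
    return out;
}

// Stream compaction built on scan: keep in[i] where flags[i] == 1.
// Assumes every flag is 0 or 1, so the scan of the flags gives each
// surviving element its index in the output.
std::vector<float> compact(const std::vector<float>& in,
                           const std::vector<unsigned int>& flags)
{
    assert(in.size() == flags.size());
    if (in.empty()) return std::vector<float>();
    const std::vector<unsigned int> offsets = exclusiveScan(flags);
    // Number of kept elements: last offset plus the last flag.
    const std::size_t count = offsets.back() + flags.back();
    std::vector<float> out(count);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (flags[i] != 0) out[offsets[i]] = in[i];  // scatter step
    }
    return out;
}
```

For example, scanning `{3, 1, 7, 0, 4}` yields `{0, 3, 4, 11, 11}`, and compacting `{1.5, 2.5, 3.5, 4.5}` with flags `{1, 0, 0, 1}` keeps the first and last values. On a GPU, both the scan and the scatter are performed in parallel, which is why scan is the key primitive for compaction.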

Overview Presentation

A brief set of slides that describe the features, design principles, applications and impact of CUDPP is available: CUDPP Presentation.

Home Page

Homepage for CUDPP: http://cudpp.github.io/

Announcements and discussion of CUDPP are hosted on the CUDPP Google Group.

Getting Started with CUDPP

You may want to start by browsing the [CUDPP Public Interface](@ref publicInterface). For information on building CUDPP, see [Building CUDPP](@ref building-cudpp). See [Overview of CUDPP hash tables](@ref hash_overview) for an overview of CUDPP's hash table support.

The "apps" subdirectory included with CUDPP has a few source code samples that use CUDPP:

We have also provided a code walkthrough of the [simpleCUDPP](@ref example_simpleCUDPP) example.

Getting Help and Reporting Problems

To get help using CUDPP, please use the CUDPP Google Group.

To report CUDPP bugs or request features, please file an issue directly on GitHub.

Release Notes {#release-notes}

For specific release details see the [Change Log](@ref changelog).

Known Issues

For a complete list of issues, see the CUDPP issues list on GitHub.

Algorithm Input Size Limitations

The following maximum size limitations currently apply. In some cases the limit is theoretical: the algorithm may not have been tested at its maximum size. Also, for operations such as 32-bit integer scans, precision often limits the useful maximum size.

| Algorithm | Maximum Supported Size |
| --- | --- |
| CUDPP_SCAN | 67,107,840 elements |
| CUDPP_SEGMENTED_SCAN | 67,107,840 elements |
| CUDPP_COMPACT | 67,107,840 elements |
| CUDPP_COMPRESS | 1,048,576 elements |
| CUDPP_LISTRANK | No limit |
| CUDPP_MTF | Bounded by GPU memory |
| CUDPP_BWT | 1,048,576 elements |
| CUDPP_SA | 0.14 GPU memory |
| CUDPP_STRINGSORT | 2,147,450,880 elements |
| CUDPP_MERGESORT | 2,147,450,880 elements |
| CUDPP_MULTISPLIT | Bounded by GPU memory |
| CUDPP_REDUCE | No limit |
| CUDPP_RAND | 33,554,432 elements |
| CUDPP_SPMVMULT | 67,107,840 non-zero elements |
| CUDPP_HASH | See [Hash Space Limitations](@ref hash_space_limitations) |
| CUDPP_TRIDIAGONAL | 65,535 systems; 1024 equations per system (compute capability 2.x), 512 equations per system (compute capability < 2.0) |
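The remark about 32-bit integer scans can be illustrated with a small host-side check. The sketch below is not part of CUDPP: `sumFitsInInt32` and `kMaxScanElements` are hypothetical names, and the helper assumes non-negative inputs bounded by `maxValue`. It tests whether the final value of a sum scan over `count` such elements still fits in a signed 32-bit integer.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Maximum element count listed for CUDPP_SCAN in the table above.
constexpr std::size_t kMaxScanElements = 67107840;

// Hypothetical helper (not part of CUDPP): true if the total of `count`
// non-negative elements, each at most `maxValue`, fits in a signed
// 32-bit integer.
constexpr bool sumFitsInInt32(std::size_t count, std::int32_t maxValue)
{
    // Widen to 64 bits so the check itself cannot overflow.
    return static_cast<std::int64_t>(count) * maxValue
           <= std::numeric_limits<std::int32_t>::max();
}
```

At the listed scan limit, `sumFitsInInt32(kMaxScanElements, 32)` holds (67,107,840 × 32 = 2,147,450,880), while a per-element bound of 33 already exceeds the signed 32-bit range. In other words, a full-size integer scan is only safe when the inputs are small, or when a wider or floating-point datatype is acceptable.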

Operating System Support and Requirements

This release (2.3) has been tested on the following OSes. For more information, visit our test results page.

We expect CUDPP to build and run correctly on other flavors of Linux and Windows, but only the above are actively tested at this time. Version 2.3 does not currently support 32-bit operating systems.

Requirements

Starting with release 2.3, CUDPP requires a GPU with compute capability (SM) 3.0 or higher. CUDPP 2.3 has not been tested with CUDA versions earlier than 6.5.

CUDA

CUDPP is implemented in CUDA C/C++. It requires the CUDA Toolkit. Please see the NVIDIA CUDA homepage to download CUDA as well as the CUDA Programming Guide and CUDA SDK, which includes many CUDA code examples.

Design Goals

Design goals for CUDPP include:

Programmers may use any of the lower three CUDPP layers in their own programs by building the source directly into their application. However, the typical usage of CUDPP is to link to the library and invoke functions in the CUDPP [Public Interface](@ref publicInterface), as in the [simpleCUDPP](@ref example_simpleCUDPP), satGL, cudpp_testrig, and cudpp_hash_testrig application examples included in the CUDPP distribution.

Use Cases

We expect the normal use of CUDPP will be in one of two ways:

References {#references}

The following publications describe work incorporated in CUDPP.

Many researchers are using CUDPP in their work, and there are many publications that have used it ([references](@ref cudpp_refs)). If your work uses CUDPP, please let us know by sending us a reference (preferably in BibTeX format) to your work.

Citing CUDPP

If you make use of CUDPP primitives in your work and want to cite CUDPP (thanks!), we would prefer that you cite the appropriate papers above, since they form the core of CUDPP. Specifically, the GPU Gems paper (Harris et al.) describes (unsegmented) scan, multi-scan for summed-area tables, and stream compaction. The Sengupta et al. book chapter describes the current scan and segmented scan algorithms used in the library, and the Sengupta et al. Graphics Hardware paper describes an earlier implementation of segmented scan, quicksort, and sparse matrix-vector multiply. The IPDPS paper (Satish et al.) describes the radix sort used in CUDPP prior to release 2.0 (later releases use thrust::sort), and the I3D paper (Tzeng and Wei) describes the random number generation algorithm. The two Alcantara papers describe the hash algorithms. The two Zhang papers describe the tridiagonal solvers.

Credits

CUDPP Developers

Other CUDPP Contributors

Acknowledgments

Thanks to Jim Ahrens, Timo Aila, Nathan Bell, Ian Buck, Guy Blelloch, Jeff Bolz, Michael Garland, Jeff Inman, Eric Lengyel, Samuli Laine, David Luebke, Pat McCormick, Duane Merrill, and Richard Vuduc for their contributions during the development of this library.

CUDPP Developers from UC Davis thank their funding agencies:

CUDPP Copyright and Software License

CUDPP is copyright The Regents of the University of California, Davis campus and NVIDIA Corporation. The library, examples, and all source code are released under the BSD license, designed to encourage reuse of this software in other projects, both commercial and non-commercial. For details, please see the [license](@ref license) page.

Non source-code content (such as documentation, web pages, etc.) from CUDPP is distributed under a Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0) license.

Note that prior to release 1.1 of CUDPP, the license used was a modified BSD license. With release 1.1, this license was replaced with the pure BSD license to facilitate the use of open source hosting of the code.

CUDPP also includes the Mersenne twister code of Makoto Matsumoto, also licensed under BSD.

CUDPP also calls functions in the Thrust template library, which is included with the CUDA Toolkit and licensed under the Apache 2.0 open source license.

CUDPP also includes a modified version of FindGLEW.cmake from nvidia-texture-tools, licensed under the MIT license.