# cutorch

**NOTE on API changes and versioning**

Cutorch provides a CUDA backend for torch7.

Cutorch provides the following:

### torch.CudaTensor

This new tensor type behaves exactly like a torch.FloatTensor, but has a couple of extra functions of note.
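As a sketch (assuming a working cutorch install and at least one CUDA device; `getDevice` is one of the GPU-specific additions), a FloatTensor can be moved to the GPU, operated on, and copied back:

```lua
require 'cutorch'

-- Create a CPU float tensor and copy it to the current GPU.
local cpuTensor = torch.FloatTensor(4, 4):fill(1)
local gpuTensor = cpuTensor:cuda()       -- gpuTensor is a torch.CudaTensor

-- getDevice() reports which GPU holds the tensor's storage.
print(gpuTensor:getDevice())

-- Most FloatTensor operations work unchanged on the GPU.
gpuTensor:add(2)

-- Copy the result back to the CPU.
local back = gpuTensor:float()           -- every element is now 3
```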

### Other CUDA tensor types

Most other (besides float) CPU torch tensor types now have a cutorch equivalent, with similar names:

- `torch.CudaByteTensor`
- `torch.CudaCharTensor`
- `torch.CudaShortTensor`
- `torch.CudaIntTensor`
- `torch.CudaLongTensor`
- `torch.CudaDoubleTensor`

Note: these are currently limited to copying/conversion, and several indexing and shaping operations (e.g. `narrow`, `select`, `unfold`, `transpose`).
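A brief sketch of how these types are typically used (this assumes cutorch provides `:cudaLong()`-style conversion methods mirroring the CPU `:long()`/`:byte()` methods; check your installed version):

```lua
require 'cutorch'

-- Convert a CPU LongTensor to its CUDA equivalent.
local idx = torch.LongTensor{1, 3, 5}
local cudaIdx = idx:cudaLong()           -- torch.CudaLongTensor

-- The supported shaping operations work as on CPU tensors:
local t = torch.CudaIntTensor(10)
local slice = t:narrow(1, 2, 4)          -- view of elements 2..5
print(torch.type(slice))                 -- still a CUDA int tensor
```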

### CUDA memory allocation

Set the environment variable `THC_CACHING_ALLOCATOR=1` to enable the caching CUDA memory allocator.

By default, cutorch calls cudaMalloc and cudaFree when CUDA tensors are allocated and freed. This is expensive because cudaFree synchronizes the CPU with the GPU. Setting THC_CACHING_ALLOCATOR=1 will cause cutorch to cache and re-use CUDA device and pinned memory allocations to avoid synchronizations.

With the caching memory allocator, device allocations and frees should logically be considered "usages" of the memory segment associated with streams, just like kernel launches. The programmer must insert the proper synchronization if memory segments are used from multiple streams.
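For example, the allocator can be enabled per-process through the environment (the script name `train.lua` below is just a placeholder):

```shell
# Enable the caching CUDA memory allocator for this run only.
THC_CACHING_ALLOCATOR=1 th train.lua

# Or export it for the whole shell session.
export THC_CACHING_ALLOCATOR=1
th train.lua
```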

### cutorch.* API

Low-level stream functions are also exposed. Don't use these as a regular user; it is easy to shoot yourself in the foot.
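A hedged sketch of what driving the stream API looks like (the names `cutorch.reserveStreams`, `cutorch.setStream`, and `cutorch.streamSynchronize` follow cutorch's stream interface; verify them against your installed version before relying on this):

```lua
require 'cutorch'

-- Reserve two extra streams (stream 0 is the default stream).
cutorch.reserveStreams(2)

-- Direct subsequent kernel launches on this thread to stream 1.
cutorch.setStream(1)
local a = torch.CudaTensor(1000):fill(1)
a:mul(2)                                 -- queued on stream 1

-- Block the CPU until stream 1 has finished its queued work.
cutorch.streamSynchronize(1)

-- Return to the default stream.
cutorch.setStream(0)
```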

### Common Examples

Transferring a FloatTensor `src` to the GPU:

```lua
dest = src:cuda() -- dest is on the current GPU
```

Allocating a tensor on a given GPU (here, allocating `src` on GPU 3):

```lua
cutorch.setDevice(3)
src = torch.CudaTensor(100)
```

Copying a CUDA tensor from one GPU to another: given a tensor called `src` on GPU 1, to create its clone on GPU 2:

```lua
cutorch.setDevice(2)
local dest = src:clone()
```

OR

```lua
local dest
cutorch.withDevice(2, function() dest = src:clone() end)
```

### API changes and versioning

Version 1.0 can be installed via `luarocks install cutorch 1.0-0`. Compared to version 1.0, the current master includes the following API changes:

| operators | 1.0 | master |
| --- | --- | --- |
| `lt`, `le`, `gt`, `ge`, `eq`, `ne` return type | `torch.CudaTensor` | `torch.CudaByteTensor` |
| `min`, `max` (2nd return value) | `torch.CudaTensor` | `torch.CudaLongTensor` |
| `maskedFill`, `maskedCopy` (mask input) | `torch.CudaTensor` | `torch.CudaByteTensor` |
| `topk`, `sort` (2nd return value) | `torch.CudaTensor` | `torch.CudaLongTensor` |
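In practice the change is mostly transparent when a comparison result is fed straight back into a masked operation, because the mask type changed on both sides. A sketch (assuming master's `torch.CudaByteTensor` masks):

```lua
require 'cutorch'

local t = torch.CudaTensor(5):copy(torch.FloatTensor{-2, -1, 0, 1, 2})

-- On master, gt returns a torch.CudaByteTensor mask...
local mask = t:gt(0)
print(torch.type(mask))

-- ...which maskedFill now expects, so this idiom works unchanged:
t:maskedFill(mask, 0)                    -- set all positive elements to 0
```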

### Inconsistencies with CPU API

| operators | CPU | CUDA |
| --- | --- | --- |