cutorch-rtc
Basic feature list:
- cutorch.launchPTX function
- apply1/apply2/apply3 functions for CudaTensors, backed by runtime-compiled kernels
This package brings CUDA 7 runtime compilation to Torch. Linux or OS X with a C++11 compiler is required. Installation:
luarocks install https://raw.githubusercontent.com/szagoruyko/cutorch-rtc/master/cutorch-rtc-scm-1.rockspec
After requiring cutorch-rtc you get the cutorch.launchPTX function, which can run PTX code generated with NVRTC, and the CudaTensor apply functions:
require 'cutorch-rtc'
t = torch.randn(8):cuda()
t:apply1'x = x < 0 ? 0 : x'
This is a simple in-place ReLU implementation.
Documentation
cutorch.launchPTX
Runs compiled PTX.
function cutorch.launchPTX(ptx, kernel_name, arguments, gridDim, blockDim)
Arguments:
- ptx - compiled PTX as a Lua string
- kernel_name - name of the kernel to run from the given PTX
- arguments - Lua table with CudaTensors as inputs and subtables of the form {'int', n} for scalar arguments
- gridDim - grid size as a table; it has to have at least one value, the remaining dimensions are filled with ones
- blockDim - block size as a table; again, it has to have at least one value, the remaining dimensions are filled with ones
PTX can be generated at runtime with https://github.com/szagoruyko/nvrtc.torch
Short example:
require 'nvrtc' -- provided by nvrtc.torch, compiles CUDA C to PTX at runtime

local kernel = [[
extern "C" __global__
void kernel(float *a, int n)
{
  int tx = blockIdx.x*blockDim.x + threadIdx.x;
  if(tx < n)
    a[tx] *= 2.f;
}
]]
local ptx = nvrtc.compileReturnPTX(kernel)
local a = torch.randn(32):cuda()
local b = a:clone() -- keep a copy of the original values
cutorch.launchPTX(ptx, 'kernel', {a, {'int', a:numel()}}, {1}, {32})
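For tensors with more elements than fit in a single block, the grid size can be computed from the element count. The following is a minimal sketch reusing the ptx and kernel from the example above; the tensor size and block size are arbitrary choices, not requirements of the package.

local threads = 256
local c = torch.randn(10000):cuda()
local blocks = math.ceil(c:numel() / threads) -- enough blocks to cover every element
cutorch.launchPTX(ptx, 'kernel', {c, {'int', c:numel()}}, {blocks}, {threads})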
apply1
Applies the provided operator to a tensor:
function CudaTensor.apply1(self, op)
op has to be a Lua string that assigns a value to the variable 'x'. CUDA built-in device functions can be used; see the CUDA documentation for more information. Multiline ops are supported and have to be separated with ';'. Both contiguous and non-contiguous tensors are valid. The first call to any apply operation takes about 0.5 s to compile the kernel; the compiled code is then cached, so subsequent calls are fast.
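For example, a multiline op that uses CUDA built-in device functions could look like the following sketch; the tensor and the op itself are illustrative, not part of the package.

require 'cutorch-rtc'

local t = torch.randn(4, 4):cuda()
-- two statements separated by ';', using the CUDA built-ins fabsf and sqrtf
t:apply1('x = fabsf(x); x = sqrtf(x)')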
apply2
Applies the provided operator using two tensors:
function CudaTensor.apply2(self, a, op)
op has to use 'x' and 'y', which refer to self and a respectively. Values can be assigned to both tensors. See apply1 for other properties.
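As a sketch, an elementwise axpy-style update written with apply2; the tensors and the constant are illustrative.

local x = torch.randn(16):cuda()
local y = torch.randn(16):cuda()
-- x = x + 2*y elementwise; y is only read here
x:apply2(y, 'x = x + 2.f*y')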
apply3
Applies the provided operator using three tensors:
function CudaTensor.apply3(self, a, b, op)
op has to use 'x', 'y' and 'z', which refer to self, a and b respectively. Values can be assigned to all three tensors. See apply1 for other properties.
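A possible apply3 usage, fusing a multiply-add over three tensors; this is an illustrative sketch, with fmaf being the CUDA built-in fused multiply-add.

local x = torch.zeros(16):cuda()
local y = torch.randn(16):cuda()
local z = torch.randn(16):cuda()
-- x = y*z + x elementwise; y and z are only read here
x:apply3(y, z, 'x = fmaf(y, z, x)')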