

managedCuda aims to provide easy integration of NVIDIA's CUDA into .NET applications written in C#, Visual Basic, or any other .NET language.


Welcome to managedCuda

managedCuda combines CUDA's GPU computing power with the comfort of managed .NET code. It offers access to the entire feature set of CUDA's driver API and provides type-safe wrapper classes for every handle defined by the API. managedCuda also includes wrappers for all CUDA-based libraries: CUFFT, CURAND, CUSPARSE, CUBLAS, CUSOLVER, NPP, and NVRTC.

What managedCuda is not

managedCuda is not a code converter: no C# code is translated to CUDA. Every CUDA kernel that you want to use has to be written in CUDA-C and compiled to PTX or CUBIN format using the NVCC toolchain.

What managedCuda is

managedCuda is the right library if you want to accelerate your .NET application with CUDA without any restrictions. Because every kernel is written in plain CUDA-C, all CUDA-specific features remain available. Even future improvements to CUDA by NVIDIA can be integrated without any changes to your application's host code.

Where to get it

Here on GitHub.

Previously, managedCuda was hosted on CodePlex. Older releases (pre-CUDA 7.5) are available there.

managedCuda is also available as NuGet packages: search for managedCuda in the NuGet package manager.

Sample code as given by the Cuda SDK samples:

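//Kernel code:
extern "C" {
    // Device code
    __global__ void VecAdd(const float* A, const float* B, float* C, int N)
    {
        // Global index of this thread within the 1D grid
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        // Threads past the end of the vectors do nothing
        if (i < N)
            C[i] = A[i] + B[i];
    }
}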
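
To see what the index computation and the bounds check do, the kernel body can be replayed on the CPU. The following is a minimal C++ sketch and not part of managedCuda: `vec_add_thread` and `vec_add_emulated` are hypothetical names, and the two loops visit sequentially what the GPU would run in parallel.

```cpp
#include <cassert>
#include <vector>

// CPU stand-in for the VecAdd kernel body: one call corresponds to one GPU thread.
// block_dim, block_idx and thread_idx mirror blockDim.x, blockIdx.x and threadIdx.x.
void vec_add_thread(const float* A, const float* B, float* C, int N,
                    int block_dim, int block_idx, int thread_idx)
{
    int i = block_dim * block_idx + thread_idx;
    if (i < N)                 // same guard as in the kernel
        C[i] = A[i] + B[i];
}

// Hypothetical helper: runs every "thread" of a 1D grid one after another.
std::vector<float> vec_add_emulated(const std::vector<float>& a,
                                    const std::vector<float>& b,
                                    int grid_dim, int block_dim)
{
    assert(a.size() == b.size());
    const int n = static_cast<int>(a.size());
    std::vector<float> c(a.size(), 0.0f);
    for (int block = 0; block < grid_dim; ++block)
        for (int thread = 0; thread < block_dim; ++thread)
            vec_add_thread(a.data(), b.data(), c.data(), n, block_dim, block, thread);
    return c;
}
```

Calling `vec_add_emulated(a, b, 4, 256)` on two 1000-element vectors runs 1024 emulated threads; the last 24 fail the `i < N` check and write nothing.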

Corresponding C# code to call the kernel:

int N = 50000;
int deviceID = 0;
CudaContext ctx = new CudaContext(deviceID);
CudaKernel kernel = ctx.LoadKernel("vectorAdd.ptx", "VecAdd");
kernel.GridDimensions = (N + 255) / 256;
kernel.BlockDimensions = 256;

// Allocate input vectors h_A and h_B in host memory
float[] h_A = new float[N];
float[] h_B = new float[N];

// TODO: Initialize input vectors h_A, h_B

// Allocate vectors in device memory and copy vectors from host memory to device memory 
CudaDeviceVariable<float> d_A = h_A;
CudaDeviceVariable<float> d_B = h_B;
CudaDeviceVariable<float> d_C = new CudaDeviceVariable<float>(N);

// Invoke kernel
kernel.Run(d_A.DevicePointer, d_B.DevicePointer, d_C.DevicePointer, N);

// Copy result from device memory to host memory
// h_C contains the result in host memory
float[] h_C = d_C;
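
The grid size in the sample, `(N + 255) / 256`, is an integer ceiling division: it picks the smallest number of 256-thread blocks that covers all N elements. The C++ sketch below spells out that arithmetic. The helpers `blocks_for` and `idle_threads` are hypothetical and not part of managedCuda. For N = 50000 and 256 threads per block they give 196 blocks, i.e. 50176 threads, of which 176 are idle.

```cpp
#include <cassert>

// Hypothetical helper: smallest number of blocks whose threads cover n elements.
int blocks_for(int n, int threads_per_block)
{
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Threads launched beyond the last element; the kernel's `if (i < N)` guard
// makes them return without touching memory.
int idle_threads(int n, int threads_per_block)
{
    return blocks_for(n, threads_per_block) * threads_per_block - n;
}
```

Rounding up rather than down is what makes the `if (i < N)` guard in the kernel necessary: plain `N / 256` would launch only 195 blocks and leave the last 80 elements unprocessed.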

Sample showing the simple and elegant integration of NPP:

//Load an image
Bitmap bmp = new Bitmap("niceImage.png");

//Alloc device memory using NPP images
NPPImage_8uC3 bmp_d = new NPPImage_8uC3(bmp.Width, bmp.Height);
NPPImage_8uC3 bmpDest_d = new NPPImage_8uC3(bmp.Width, bmp.Height);

//Copy image to GPU
//Run a NPP function
bmp_d.FilterGaussBorder(bmpDest_d, MaskSize.Size_5_X_5, NppiBorderType.Replicate);
//Copy result back to host
//Use the result