http://www.multiscalelab.org/swan
Swan: A simple tool for porting CUDA kernels to OpenCL
Download latest version
For the impatient: download the latest version of Swan here. By downloading this file, you accept the GPL2 license (http://www.gnu.org/licenses/gpl-2.0.txt). To build, untar the archive and follow the instructions in the README. The requirements are:
* A modern(ish) Linux OS
* CUDA toolkit and/or OpenCL runtime and a matching GPU
* Perl (with the package Text::Balanced)
* gcc/g++ 3.x or later
What is it?
Swan is a small tool that aids the reversible conversion of existing CUDA codebases to OpenCL. It does several useful things:
* Translates CUDA kernel source-code to OpenCL.
* Provides a common API that abstracts both CUDA and OpenCL runtimes.
* Preserves the convenience of the CUDA <<< grid, block >>> kernel launch syntax by generating C source-code for kernel entry-point functions.
Why might you want it?
Possible uses include:
* Evaluating OpenCL performance of an existing CUDA code.
* Maintaining a dual-target OpenCL and CUDA code.
* Reducing dependence on NVCC when compiling host code.
* Supporting multiple CUDA compute capabilities in a single binary.
Limitations
Swan is not a drop-in replacement for nvcc: all kernel invocations and CUDA API calls in the host code must be re-written.
Swan does not support a few things. In particular:
* CUDA C++ templating in kernel code.
* OpenCL Images/Samplers (analogous to Textures).
* Multiple device management in a single process.
* Compiling kernels for the CPU.
* CUDA device-emulation mode.
Furthermore, it's a work in progress. It works for our code, but we make no promises that it will work for yours.
Use
The essential steps for converting a CUDA code:
* Separate CUDA kernels into separate source files.
* Compile the kernel source using swan. For a CUDA target:
swan --cuda kernel.kh kernel.cu
or, for an OpenCL target:
swan --opencl kernel.kh kernel.cu
* The output file kernel.kh contains:
o a data block which holds the compiled kernel source code (PTX or IL).
o autogenerated source-code functions for running each of the kernels.
* #include "kernel.kh" into the host source.
* Re-write all kernel invocations to use the corresponding entry-point functions. For example, the kernel with formal declaration:
__global__ void vec_add( float *in, float *out, int N );
will have an entry point defined as:
void k_vec_add( int3 grid, int3 block, int shmem, float *in, float *out, int N );
where grid, block and shmem correspond to the first three arguments of the <<< >>> syntax.
Entry points are named systematically. For any kernel kernel(), there are two entry points: k_kernel() and k_kernel_async(). The former is a synchronous call (it blocks until the kernel completes execution); the latter, as its name indicates, is the asynchronous variant.
* Re-write all of the CUDA API calls with the Swan analogues (see swan_api.h).
* Compile and link code against the appropriate Swan library (libswan_ocl or libswan_cuda).
Examples can be found in the package.
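As an illustration of the kernel-invocation rewrite described in the steps above, here is a minimal sketch in plain C. It is not taken from the Swan package. So that it compiles without Swan or a GPU, it defines its own stand-ins for int3 and for the generated entry point k_vec_add(); the stand-in runs on the CPU and assumes that vec_add adds in[i] to out[i], which the declaration above does not actually specify. The helper launch_vec_add() and the block size of 128 are likewise hypothetical.

```c
#include <assert.h>

/* Stand-in for the int3 type, so that the sketch builds without any GPU headers. */
typedef struct { int x, y, z; } int3;

/*
 * HYPOTHETICAL stand-in for the entry point that swan would generate in
 * kernel.kh. The real function launches the kernel on the GPU; this one
 * emulates a 1D launch on the CPU. The kernel body (out[i] += in[i]) is
 * an assumption made only for this illustration.
 */
void k_vec_add(int3 grid, int3 block, int shmem, float *in, float *out, int N)
{
    long threads = (long)grid.x * block.x; /* 1D launches only in this sketch */
    long tid;

    assert(N >= 0);
    (void)shmem; /* dynamic shared memory size is unused by the stand-in */

    for (tid = 0; tid < threads; tid++) {
        if (tid < N) {          /* same bounds guard a CUDA kernel would use */
            out[tid] += in[tid];
        }
    }
}

/*
 * Host-side call site (hypothetical helper). The CUDA original would read:
 *
 *     vec_add<<< grid, block, shmem >>>( in, out, N );
 *
 * With Swan, the three launch parameters become the first three arguments
 * of the entry point, followed by the kernel's own arguments.
 */
void launch_vec_add(float *in, float *out, int N)
{
    int threads_per_block = 128; /* assumed block size */
    int3 block = { threads_per_block, 1, 1 };
    int3 grid  = { (N + threads_per_block - 1) / threads_per_block, 1, 1 };
    int shmem  = 0;

    k_vec_add(grid, block, shmem, in, out, N);
}
```

In a real port the two stand-in definitions would be dropped in favour of #include "kernel.kh", and in and out would be device buffers obtained through the Swan API (see swan_api.h) rather than host arrays.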
Contact
For queries, problems and suggestions, contact Matt @ M.J.Harvey (at) imperial (dot) ic (dot) ac (dot) uk.
Publications
* Experiences porting from CUDA to OpenCL. Presentation at the Daresbury Machine Evaluation Workshop, 2009.
License
Swan is licensed under the GPL version 2.