Tag Archives: opencl

AMD and Oracle to collaborate on Heterogenous Computing in Java

In August John Coomes from Oracle made a proposal to add GPU support to Java. One month later, on Sep 10, he proposed the creation of a new project called Sumatra to continue with this endeavour. On Sep 24 this project was approved by a 100% vote in favour. During the recent JavaOne 2012 AMD officially announced its participation in OpenJDK Project Sumatra in collaboration with Oracle and OpenJDK to bring heterogenous computing to Java for server and cloud environments. The Inquirer also reports on this subject.

This is very exciting news indeed. Although there are already two libraries for GPU programming in Java – namely rootbeer and aparapi, having GPU support built in to the Java language, the Java API and most importantly the JVM will provide an alternative more compelling than the use of any external library. And to be quite frank there could not be a collaborator than AMD given their vast contribution to date to OpenCL and OpenCL development tools. And unlike Nvidia, they are wholly committed to OpenCL and not working on their own proprietary alternative.

Although it’ll be a while before this project sees any substantial contribution I cannot wait to see this take form over the next year or two. OpenCL and, in general, the GPU programming paradigm is hard; very hard; and even more importantly porting existing code is even harder; and if anyone can make this domain accessible to the mainstream it’s Java. Once Sumatra is ready hopefully we won’t have to write OpenCL anymore. We’ll be able to write normal Java, compile it and at either compile time or runtime the byte code will get translated into OpenCL and compiled. At execution time we won’t have to worry about what hardware we’re running because with any luck it’ll be write once run anywhere!

StreamComputing.eu links to OpenCL Cookbook series!

I am delighted to find that streamcomputing.eu, a Dutch consultancy specialising in high performance parallel computing, has linked to my OpenCL Cookbook series under their training > self study > tutorials > learning opencl section! Thanks streamcomputing.eu – I really appreciate it! It’s an honour to be linked to from such a specialist and dedicated resource on a subject as challenging, as critically important and as promising as GPGPU computing.

I first came across streamcomputing.eu a few weeks ago just when I started learning OpenCL. OpenCL being such a niche and young topic I found that there was precious little content on it out there and streamcomputing.eu was one of the few resources that not only had a lot of content on this topic but was almost entirely dedicated to it. They also appeared to be very passionate holding talks as well as producing a lot of blog content.

Needless to say I will continue to follow them to keep up to date on this exhilarating subject and I will also do my best to expand this series to make it an indispensable resource on the subject (time permitting!).

Update: Thanks for the two tweets Vincent.

OpenCL Cookbook: Series Reference

Recently I’d been writing a number of primer articles on OpenCL programming under the common reference name of ‘OpenCL Cookbook’ but, caught up in the content, I had completely forgotten to provide a single point of reference to all articles in the series. Here it is finally. This page will always be kept up-to-date, in chronological order, with all new articles in the series.

  1. OpenCL Cookbook: Listing all platforms and their attributes
  2. OpenCL Cookbook: Listing all devices and their critical attributes
  3. OpenCL Cookbook: Creating contexts and reference counting
  4. OpenCL Cookbook: Creating programs and reading kernels from a file
  5. OpenCL Cookbook: Building a program and debugging failures
  6. OpenCL Cookbook: Hello World using C host binding
  7. OpenCL Cookbook: Hello World using C++ host binding
  8. OpenCL Cookbook: Parallelise your host loops using OpenCL
  9. OpenCL Cookbook: Hello World using C# Cloo host binding
  10. OpenCL Cookbook: How to leverage multiple devices in OpenCL
  11. OpenCL Cookbook: Compiling OpenCL with Ubuntu 12.10, Unity, AMD 12.11 beta drivers & AMD APP SDK 2.7
  12. OpenCL Cookbook: Using amdconfig/aticonfig – a powerful utility in the AMD OpenCL toolset on Linux
  13. OpenCL Cookbook: Running remote multi-gpu OpenCL computations over SSH on Ubuntu or Debian using AMD Catalyst drivers
  14. OpenCL Cookbook: Multi device utilisation strategies
  15. OpenCL Cookbook: 10 tips for high performance kernels

OpenCL Cookbook: Hello World using C host binding

In our OpenCL Cookbook series so far we’ve looked at some preliminary data structures in OpenCL host programming using the C language. This time – we finally arrive at a complete end-to-end example – the customary Hello World!

What this example does is simple. The host program in C passes a character array to the GPU into which the GPU writes the characters of the phrase: “Hello, World!”. The host program then reads the contents of the character array back and prints them on the screen. The output should be “Hello, World!”.

The code is annotated using brief comments. There are some aspects of OpenCL that are new that I have not yet been through in previous articles but don’t worry I’ll go through a full dissection after presenting the complete code.

Note that error handling has been taken out completely to keep the program short for easy viewing. The most important aspect of error handling in the program below is checking the build error, status and log for a failed program build which you can find further details of in my previous post.

Host source

#include
#include
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    cl_platform_id platform; cl_device_id device; cl_context context;
    cl_program program; cl_kernel kernel; cl_command_queue queue;
    cl_mem kernelBuffer;

    FILE* programHandle; char *programBuffer; char *programLog;
    size_t programSize; char hostBuffer[32];

    // get first available sdk and gpu and create context
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    // get size of kernel source
    programHandle = fopen("helloWorld.cl", "r");
    fseek(programHandle, 0, SEEK_END);
    programSize = ftell(programHandle);
    rewind(programHandle);

    // read kernel source into buffer
    programBuffer = (char*) malloc(programSize + 1);
    programBuffer[programSize] = '\0';
    fread(programBuffer, sizeof(char), programSize, programHandle);
    fclose(programHandle);

    // create and build program
    program = clCreateProgramWithSource(context, 1,
            (const char**) &programBuffer, &programSize, NULL);
    free(programBuffer);
    clBuildProgram(program, 1, &device, "-Werror -cl-std=CL1.1", NULL, NULL);

    // create kernel and command queue
    kernel = clCreateKernel(program, "hello", NULL);
    queue = clCreateCommandQueue(context, device, 0, NULL);

    // create kernel argument buffer and set it into kernel
    kernelBuffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
            32 * sizeof(char), NULL, NULL);
    clSetKernelArg(kernel, 0, sizeof(cl_mem), &kernelBuffer);

    // execute kernel, read back the output and print to screen
    clEnqueueTask(queue, kernel, 0, NULL, NULL);
    clEnqueueReadBuffer(queue, kernelBuffer, CL_TRUE, 0,
            32 * sizeof(char), hostBuffer, 0, NULL, NULL);
    puts(hostBuffer);

    clFlush(queue);
    clFinish(queue);
    clReleaseKernel(kernel);
    clReleaseProgram(program);
    clReleaseMemObject(kernelBuffer);
    clReleaseCommandQueue(queue);
    clReleaseContext(context);
    return 0;

}

The host source runs on the CPU and is written in C in this case though you could also write it in C++, Python or Java whereas the kernel source runs on a device which could be one or more CPUs, GPUs or accelerators. The host source must be written in a host language whereas the kernel source must be written in OpenCL.

Host source by dissection

Here I describe what the host source is doing by dissecting it. A hello world example should ideally be entirely self contained and not rely on other articles to complement the reader’s understanding. With the exception of error handling and particularly how to debug a failed program build which I address elsewhere this example is self contained.

Below I present one snippet of code at a time followed by its dissection.

Creating platforms, devices and contexts

// get first available sdk and gpu and create context
clGetPlatformIDs(1, &platform, NULL);
clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

Here I first get a platform (an OpenCL SDK/framework). As I know I only have the Apple OpenCL framework installed on my Mac it will always be the one selected. However, if you have multiple SDKs installed such as AMD, Nvidia and Intel then you may want to select one explicitly. Next I ask for a GPU device. Once again, my machine only has one GPU so it will always be the one that’s selected but if you have multiple GPUs installed you may want to choose one in particular. Finally I create a context which is an incredibly important OpenCL data structure as it is required for the creation of numerous other structures such as programs, command queues and kernel buffers.

Loading kernel sources

// get size of kernel source
programHandle = fopen("helloWorld.cl", "r");
fseek(programHandle, 0, SEEK_END);
programSize = ftell(programHandle);
rewind(programHandle);

// read kernel source into buffer
programBuffer = (char*) malloc(programSize + 1);
programBuffer[programSize] = '\0';
fread(programBuffer, sizeof(char), programSize, programHandle);
fclose(programHandle);

As this is a host source file it has the responsibility of involving the kernel source. Generally speaking the kernel source is usually compiled at runtime as part of the execution of the host source. Therefore, the host source file must pull in the kernel source and compile it. Above I first calculate the size of the kernel source file and then read the source in into a buffer of that calculated size.

Creating a program and compiling kernel sources

// create and build program
program = clCreateProgramWithSource(context, 1,
        (const char**) &programBuffer, &programSize, NULL);
free(programBuffer);
clBuildProgram(program, 1, &device, "-Werror -cl-std=CL1.1", NULL, NULL);

Here I construct a program structure by passing in a context and the buffer containing the kernel source. Then I build the program which essentially compiles the kernel source based on supplied build options. Note that a program can contain numerous kernel sources containing multiple OpenCL functions potentially drawn in from a number of files. This program build steps builds the sum total of all kernels sources read in. At this point the build could fail for a variety of reasons and it’s critically important to be able to narrow the cause easily. Here I’ve skipped this error handling but I address this subject in detail on my previous post in the series.

Creating kernels and command queues

// create kernel and command queue
kernel = clCreateKernel(program, "hello", NULL);
queue = clCreateCommandQueue(context, device, 0, NULL);

Here I create a kernel and a command queue structure. Let’s look at what each one means in turn.

A kernel is an OpenCL function that executes on one or more devices. The program structure above may contain numerous functions so the purpose of creating the kernel structure above is to pinpoint one particular one called ‘hello’. I need a reference to this kernel in order to pass it an argument later on in the process.

A command queue is exactly what the name implies. The host program invokes a device by sending it a command. The sending mechanism for that command is a queue. Commands are by default processed in FIFO order but that can be changed by a configuration option. Sending a command, also known as a task, to a command queue is a way of requesting its execution.

Setting kernel arguments

// create kernel argument buffer and set it into kernel
kernelBuffer = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
        32 * sizeof(char), NULL, NULL);
clSetKernelArg(kernel, 0, sizeof(cl_mem), &kernelBuffer);

Here, I create an OpenCL memory object. There are two types of memory objects – image and buffer. Here I am not dealing with image data so I choose a buffer memory object. Our goal here is to provide the kernel with a character array big enough to hold the phrase ‘Hello, World!’. However I cannot pass a character array into a kernel directly. I must create an OpenCL buffer memory object of a given size and then set it as the first kernel argument and that’s what I’m doing above. You’ll notice that I set the memory object to be write only as the device only needs to write to it.

Executing kernels and reading output data

// execute kernel, read back the output and print to screen
clEnqueueTask(queue, kernel, 0, NULL, NULL);
clEnqueueReadBuffer(queue, kernelBuffer, CL_TRUE, 0,
        32 * sizeof(char), hostBuffer, 0, NULL, NULL);
puts(hostBuffer);

This is the final step. Earlier I created a command queue and a kernel structure for the hello function and passed in a buffer memory object as the first argument. Here I complete the entire process by enqueuing the kernel for execution as a task into the command queue and reading back the output by passing in a character array of the same size as the original kernel buffer memory object. I then print the contents of that array onto the screen to prove that it contains what the GPU originally wrote into it.

Cleaning up

clFlush(queue);
clFinish(queue);
clReleaseKernel(kernel);
clReleaseProgram(program);
clReleaseMemObject(kernelBuffer);
clReleaseCommandQueue(queue);
clReleaseContext(context);

Above I first ensure that all commands have been issed to the device associated with the command queue by calling clFlush(). Then I block until all commands have been issued and completed by calling clFinish(). The rest of the functions above simple deallocate their own respective named structures.

Kernel source

__kernel void hello(__global char* message){
message[0] = 'H';
message[1] = 'e';
message[2] = 'l';
message[3] = 'l';
message[4] = 'o';
message[5] = ',';
message[6] = ' ';
message[7] = 'W';
message[8] = 'o';
message[9] = 'r';
message[10] = 'l';
message[11] = 'd';
message[12] = '!';
message[13] = '';
}

The kernel source is fairly self explanatory. It simply receives a character array called message and writes its message into it. Kernel functions get infinitely more complex than this one but this one has been kept deliberately simple.

Compile and run as follows keeping both source files in the same directory.

clang -framework OpenCL helloWorld.c -o helloWorld && ./helloWorld

As always if you have any feedback or if this helped you let me know in the comments!

OpenCL Cookbook: Building a program and debugging failures

Last time, in the opencl cookbook series, we looked at how to create a program data structure in the C OpenCL host programming API as well as how to read kernel source from a separate file. A program is a container or collection of kernels and a kernel, in turn, is a function in OpenCL that executes on an OpenCL device such as a CPU, GPU or accelerator. This time we look at how to build a program, which in turn builds the kernels within it and also how to debug failures that occur in the program build. For the latter we recreate two kinds of failures to see how the program reacts.

But what actually happens when you build a program? The clBuildProgram() function in C takes a program and a program in turn contains the source of one or more kernels read in from one or more files. However, kernel sources in raw form are of little use. To be functionally useful they must be compiled. This is what happens when you build a program.

Every OpenCL framework/SDK/implementation (whatever you want to call it) is mandated by the specification to make a compiler accessible through the clBuildProgram(). Though, they may provide other interfaces to their OpenCL compiler. AMD provides a compile time compiler command called clc whereas NVidia provides only a runtime one. clBuildProgram() compiles and link a program for devices associated with the platform.

Our example host program below is an extension of the one in the previous article in the series. It reads in a kernel function from a separate file into an OpenCL program and builds the program which would normally succeed. However, here I introduce a couple of mistakes to show you how the API reacts in each case and how to debug such issues.

Host source (buildProgramDebug.c)

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    cl_platform_id platform; cl_device_id device; cl_context context;
    cl_program program; cl_int error; cl_build_status status;

    FILE* programHandle;
    char *programBuffer; char *programLog;
    size_t programSize; size_t logSize;

    // get first available platform and gpu and create context
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    // get size of kernel source
    programHandle = fopen("kernel.cl", "r");
    fseek(programHandle, 0, SEEK_END);
    programSize = ftell(programHandle);
    rewind(programHandle);

    // read kernel source into buffer
    programBuffer = (char*) malloc(programSize + 1);
    programBuffer[programSize] = '\0';
    fread(programBuffer, sizeof(char), programSize, programHandle);
    fclose(programHandle);

    // create program from buffer
    program = clCreateProgramWithSource(context, 1,
            (const char**) &programBuffer, &programSize, NULL);
    free(programBuffer);

    // build program
    const char options[] = "-Werror -cl-std=CL1.1";
    error = clBuildProgram(program, 1, &device, options, NULL, NULL);

    // build failed
    if (error != CL_SUCCESS) {

        // check build error and build status first
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_STATUS, 
                sizeof(cl_build_status), &status, NULL);

        // check build log
        clGetProgramBuildInfo(program, device, 
                CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
        programLog = (char*) calloc (logSize+1, sizeof(char));
        clGetProgramBuildInfo(program, device, 
                CL_PROGRAM_BUILD_LOG, logSize+1, programLog, NULL);
        printf("Build failed; error=%d, status=%d, programLog:nn%s", 
                error, status, programLog);
        free(programLog);

    }

    clReleaseContext(context);
    return 0;

}

Kernel source (kernel.cl)

__kernel void hello(__global char* string){

string[0] = 'H';
string[1] = 'e';
string[2] = 'l';
string[3] = 'l';
string[4] = 'o';
string[5] = ',';
string[6] = ' ';
string[7] = 'W';
string[8] = 'o';
string[9] = 'r';
string[10] = 'l';
string[11] = 'd';
string[12] = '!';
string[13] = '';

}

These two programs, as they stand, have no errors and in that state produce no output. The kernel source is read in to a program and the program built based on the command line options provided on line 41. The error variable on line 45 always equals CL_SUCCESS so nothing is printed. Let’s now introduce two errors in our source code one at a time and see what happens. Specifically, when a problem occurs, we’ll be examining three separate variables – the error code (which is the return value from clBuildProgram), the program build status (which has to be specifically requested) and the program build log (which also has to be requested).

Error 1: Rogue build option

Here I change line 41 to contain a bogus command line option called ‘-foobar’ as below.

const char options[] = "-Werror -cl-std=CL1.1 -foobar";

When the program is built, unsurprisingly it fails, with the output below.

tron:opencl dhruba$ clang -framework OpenCL buildProgramDebug.c -o buildProgramDebug && ./buildProgramDebug
Build failed; error=-43, status=-1, programLog:

Above the error code of -43 corresponds to the constant CL_INVALID_BUILD_OPTIONS and a status of -1 corresponds to the constant CL_BUILD_NONE. The former is self explanatory whereas the latter means that the kernel was not compiled which is to be expected as the build options were wrong.

Error 2: Kernel source syntax error

Here, I revert the error I introduced last time, and instead add an extra underscore as the first character of the kernel source to create an OpenCL syntax error. This time the output is longer as the program log is no longer blank.

tron:opencl dhruba$ clang -framework OpenCL buildProgramDebug.c -o buildProgramDebug && ./buildProgramDebug
Build failed; error=-11, status=-2, programLog:

:1:1: error: unknown type name '___kernel'
___kernel void hello(__global char* string){
^
:1:11: error: expected identifier or '('
___kernel void hello(__global char* string){

Above the error code of -11 refers to the constant CL_BUILD_PROGRAM_FAILURE and a program build status of -2 refers to the constant CL_BUILD_ERROR which makes sense. This time, however, we have some output in the program log field. The syntax error in the kernel source is being reported by the runtime compiler to the program log.

Error code and build status constants

One final tip: you may be asking how I knew which error codes and build statuses corresponded to which constants in OpenCL. Well – this is a bit of a nightmare to be quite honest. I had to open up the following header file in the Apple OpenCL framework to check which constants matched those integers. I’m sure there will be similar places to look in other SDKs.

/System/Library/Frameworks/OpenCL.framework/Headers/cl.h

As you can see, the above three fields, provide critically important means of debugging the failure to build of your program and can point our errors both in your host source and your kernel source. Did this help you, did you face any issues or do you have feedback for improvement? Let me know in the comments!

OpenCL Cookbook: Creating programs and reading kernels from a file

In OpenCL host programming, after platforms, devices and contexts in our OpenCL Cookbook series, comes the program data structure. Remember that a kernel represents a function in actual OpenCL code intended for execution on any given device as opposed to being written in a host language like C which can only execute on the CPU. In contrast a program is a container of kernels in the host programming language – in this case C though it could be C++, Java or Python. In order to execute a kernel function on a given device (CPU, GPU or accelerator) it must first be fed in to a host program to create a program data structure.

In the host program below, written in C, we take a small step further than last time to read in an OpenCL kernel function from a separate file into our host program and create a program data structure. We, then, retrieve the kernel source code from within the program data structure and verify that it is the same kernel function that we fed in earlier as a simple test. This can be valuable in the real world where you may be feeding in kernels from multiple files and want to make sure that all of them have been read into the program. When the program structure is queried for the kernel source it concatenates all kernel sources into one string so simply reading that will give you a global view of all kernel code previously fed in.

The host program which reads the kernel file in is as follows.

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    cl_platform_id platform;
    cl_device_id device;
    cl_context context;
    cl_program program;

    FILE* programHandle;
    size_t programSize, kernelSourceSize;
    char *programBuffer, *kernelSource;

    // get first available platform and gpu and create context
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    // get size of kernel source
    programHandle = fopen("kernel.cl", "r");
    fseek(programHandle, 0, SEEK_END);
    programSize = ftell(programHandle);
    rewind(programHandle);

    // read kernel source into buffer
    programBuffer = (char*) malloc(programSize + 1);
    programBuffer[programSize] = '\0';
    fread(programBuffer, sizeof(char), programSize, programHandle);
    fclose(programHandle);

    // create program from buffer
    program = clCreateProgramWithSource(context, 1,
            (const char**) &programBuffer, &programSize, NULL);
    free(programBuffer);

    // read kernel source back in from program to check
    clGetProgramInfo(program, CL_PROGRAM_SOURCE, 0, NULL, &kernelSourceSize);
    kernelSource = (char*) malloc(kernelSourceSize);
    clGetProgramInfo(program, CL_PROGRAM_SOURCE, kernelSourceSize, kernelSource, NULL);
    printf("nKernel source:nn%sn", kernelSource);
    free(kernelSource);

    clReleaseContext(context);
    return 0;

}

The kernel code, saved in a file called kernel.cl, is as below.

_kernel void hello(__global char* string){

string[0] = 'H';
string[1] = 'e';
string[2] = 'l';
string[3] = 'l';
string[4] = 'o';
string[5] = ',';
string[6] = ' ';
string[7] = 'W';
string[8] = 'o';
string[9] = 'r';
string[10] = 'l';
string[11] = 'd';
string[12] = '!';
string[13] = '';

}

Save the host program as programs.c and then compile and run as follows. If you don’t have the clang command install it.

clang -framework OpenCL programs.c -o programs && ./programs

The output on my system (Macbook Air) is, as you would expect, the kernel source exactly as above.

Did this help you, did you face any issues or do you have any feedback for improvements? Let me know in the comments!

OpenCL Cookbook: Creating contexts and reference counting

Following on from my previous articles on platforms and devices in the OpenCL Cookbook series, in this instalment, I move onto the next most critical host programming data structure in OpenCL – the context.

Contexts

A context in OpenCL requires a platform and one or more devices to function and is used to create command queues which are the structures that allow hosts to send kernels to devices. That’s a loaded sentence so let’s break it down. The program such as the one written below in C (the host) may want the CPU or the GPU (devices) to execute a calculation (a kernel i.e. a function). In order for that to happen a command queue for that device must be created and the calculation enqueued onto it. That, in essence, is how a task is relayed to a device and execution is triggered in OpenCL.

Context creation

In creating a context – the platform and devices do not necessarily have to be created and supplied to the context creating method. For example a context can be created simply by choosing a device type as below.

context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

In the example above, the platform that is selected is implementation defined but if you only have one platform like me then that’s automatically selected. The device selected will be the first available one of the type you specify. So, on my Macbook Air, the Apple SDK and my only GPU – the NVidia card are automatically selected.

Alternatively, as in the snippet below you can create and provide the platform and device explicitly.

context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

Controlling context lifetime using its reference count

In the complete program below I also introduce another important concept related to contexts – its reference count. A context when created starts with a reference count of 1 and when the function you are creating it in goes out of scope it is deallocated automatically. But this may be undesirable – maybe you want to continue accessing the context after the current function has gone out of scope. For this reason a context’s reference count can be incremented and decremented and it is only deallocated when its reference count reaches zero.

The general guideline is that if you are writing a function that uses an already created context you should increment the reference count at the start and decrement it at the end of your function. If, however, you are creating a context then at the end of your function you must simply decrement the reference count as I’m doing below.

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    cl_platform_id platform;
    cl_device_id device;
    cl_context context;
    cl_uint refCount;

    // get first available platform
    clGetPlatformIDs(1, &platform, NULL);

    // get first available gpu device
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    // create context
    context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    // get context reference count
    clGetContextInfo(context, CL_CONTEXT_REFERENCE_COUNT,
            sizeof(refCount), &refCount, NULL);
    printf("Ref count: %u ", refCount);

    // increment reference count
    clRetainContext(context);
    clGetContextInfo(context, CL_CONTEXT_REFERENCE_COUNT,
            sizeof(refCount), &refCount, NULL);
    printf(">> %u ", refCount);

    // decrement reference count
    clReleaseContext(context);
    clGetContextInfo(context, CL_CONTEXT_REFERENCE_COUNT,
            sizeof(refCount), &refCount, NULL);
    printf(">> %u ", refCount);

    // finally release context
    clReleaseContext(context);
    printf(">> 0n");
    return 0;

}

Compile and run on the Mac as follows. If you don’t have the clang, g++ or gcc commands install them.

$ clang -framework OpenCL contexts.c -o contexts && ./contexts

The output produced on my machine is as below.

Ref count: 1 >> 2 >> 1 >> 0

As always error handling has been omitted for brevity and the code is only tested on my Macbook Air but should work on other platforms. If you have any issues or have suggestions for improvements to the code do let me know.

Did this help you? Let me know in the comments!

OpenCL Cookbook: Listing all devices and their critical attributes

Last time, in our OpenCL Cookbook series, we looked at how to list all platforms and their attributes. This time, we take the next step and list all devices that a platform provides access to and their critical attributes. This program is very useful in that it provides a quick and easy way of introspecting a given system’s OpenCL capabilities. Note that error handling has been omitted for brevity and the host language as before is C.

To recap on terminology: a platform is an OpenCL SDK such as an Apple, Intel, NVidia or AMD SDK. A device, on the other hand, may be a cpu, gpu or accelerator and as a result it’s highly likely a system will have multiple devices. For each device we list its critical attributes: hardware OpenCL version, software driver version, opencl c version supported by compiler for device and finally the number of parallel compute units (cores) it possesses which symbolises the extent of task based parallelism that we can achieve.

#include <stdio.h>                                                                                                                                               
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    int i, j;
    char* value;
    size_t valueSize;
    cl_uint platformCount;
    cl_platform_id* platforms;
    cl_uint deviceCount;
    cl_device_id* devices;
    cl_uint maxComputeUnits;

    // get all platforms
    clGetPlatformIDs(0, NULL, &platformCount);
    platforms = (cl_platform_id*) malloc(sizeof(cl_platform_id) * platformCount);
    clGetPlatformIDs(platformCount, platforms, NULL);

    for (i = 0; i < platformCount; i++) {

        // get all devices
        clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL, 0, NULL, &deviceCount);
        devices = (cl_device_id*) malloc(sizeof(cl_device_id) * deviceCount);
        clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL, deviceCount, devices, NULL);

        // for each device print critical attributes
        for (j = 0; j < deviceCount; j++) {

            // print device name
            clGetDeviceInfo(devices[j], CL_DEVICE_NAME, 0, NULL, &valueSize);
            value = (char*) malloc(valueSize);
            clGetDeviceInfo(devices[j], CL_DEVICE_NAME, valueSize, value, NULL);
            printf("%d. Device: %sn", j+1, value);
            free(value);

            // print hardware device version
            clGetDeviceInfo(devices[j], CL_DEVICE_VERSION, 0, NULL, &valueSize);
            value = (char*) malloc(valueSize);
            clGetDeviceInfo(devices[j], CL_DEVICE_VERSION, valueSize, value, NULL);
            printf(" %d.%d Hardware version: %sn", j+1, 1, value);
            free(value);

            // print software driver version
            clGetDeviceInfo(devices[j], CL_DRIVER_VERSION, 0, NULL, &valueSize);
            value = (char*) malloc(valueSize);
            clGetDeviceInfo(devices[j], CL_DRIVER_VERSION, valueSize, value, NULL);
            printf(" %d.%d Software version: %sn", j+1, 2, value);
            free(value);

            // print c version supported by compiler for device
            clGetDeviceInfo(devices[j], CL_DEVICE_OPENCL_C_VERSION, 0, NULL, &valueSize);
            value = (char*) malloc(valueSize);
            clGetDeviceInfo(devices[j], CL_DEVICE_OPENCL_C_VERSION, valueSize, value, NULL);
            printf(" %d.%d OpenCL C version: %sn", j+1, 3, value);
            free(value);

            // print parallel compute units
            clGetDeviceInfo(devices[j], CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(maxComputeUnits), &maxComputeUnits, NULL);
            printf(" %d.%d Parallel compute units: %dn", j+1, 4, maxComputeUnits);

        }

        free(devices);

    }

    free(platforms);
    return 0;

}

Compile and run on the Mac as follows. If you don’t have the clang, g++ or gcc commands install them. Any of those commands should work.

$ clang -framework OpenCL devices.c -o devices && ./devices

The output produced on my machine is as follows but may differ on your system.

1. Device: Intel(R) Core(TM)2 Duo CPU     U9600  @ 1.60GHz
 1.1 Hardware version: OpenCL 1.2 
 1.2 Software version: 1.1
 1.3 OpenCL C version: OpenCL C 1.2 
 1.4 Parallel compute units: 2
2. Device: GeForce 320M
 2.1 Hardware version: OpenCL 1.0 
 2.2 Software version: CLH 1.0
 2.3 OpenCL C version: OpenCL C 1.1 
 2.4 Parallel compute units: 6

As you can see my Macbook Air shows rather feeble and outdated metadata being an old slimline laptop. As always, the code is only tested on my Macbook Air but, in theory, should run on Windows and Linux though the way you compile and run will differ from above slightly. If you have any issues or would like to critique and improve my code (given I’m not a C programmer) by all means leave a comment.

Did this help you? Let me know in the comments!

OpenCL Cookbook: Listing all platforms and their attributes

The first article in the OpenCL Cookbook series looks at how to list all platforms and their attributes in OpenCL using C as a host language on an OpenCL supported system.

For those new to OpenCL (like me) a platform is a top level entity in the OpenCL API and represents an SDK. You have to get a platform before you can delve deeper into what a platform provides access to such as devices (cpu, gpu). Depending on the hardware/GPU of a system may find an AMD, NVidia, Intel or Apple OpenCL SDK. You may even find multiple SDKs, for instance, if you have multiple GPUs of different makes.

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    int i, j;
    char* info;
    size_t infoSize;
    cl_uint platformCount;
    cl_platform_id *platforms;
    const char* attributeNames[5] = { "Name", "Vendor",
        "Version", "Profile", "Extensions" };
    const cl_platform_info attributeTypes[5] = { CL_PLATFORM_NAME, CL_PLATFORM_VENDOR,
        CL_PLATFORM_VERSION, CL_PLATFORM_PROFILE, CL_PLATFORM_EXTENSIONS };
    const int attributeCount = sizeof(attributeNames) / sizeof(char*);

    // get platform count
    clGetPlatformIDs(5, NULL, &platformCount);

    // get all platforms
    platforms = (cl_platform_id*) malloc(sizeof(cl_platform_id) * platformCount);
    clGetPlatformIDs(platformCount, platforms, NULL);

    // for each platform print all attributes
    for (i = 0; i < platformCount; i++) {

        printf("n %d. Platform n", i+1);

        for (j = 0; j < attributeCount; j++) {

            // get platform attribute value size
            clGetPlatformInfo(platforms[i], attributeTypes[j], 0, NULL, &infoSize);
            info = (char*) malloc(infoSize);

            // get platform attribute value
            clGetPlatformInfo(platforms[i], attributeTypes[j], infoSize, info, NULL);

            printf("  %d.%d %-11s: %sn", i+1, j+1, attributeNames[j], info);
            free(info);

        }

        printf("n");

    }

    free(platforms);
    return 0;

}

Compile and run on the Mac as follows. If you don’t have the clang, g++ or gcc commands install them. Any of those commands should work.

clang -framework OpenCL platforms.c -o platforms && ./platforms

The output produced on my Macbook Air is as follows but may differ for your system.

 1. Platform 
  1.1 Name       : Apple
  1.2 Vendor     : Apple
  1.3 Version    : OpenCL 1.2 (Jun 20 2012 14:18:19)
  1.4 Profile    : FULL_PROFILE
  1.5 Extensions : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

The code has been tested only on my Macbook Air but should work on Windows and Linux too though the way you compile and run will differ from above slightly. If you find any issues or would like to suggest improvements to the program (given that I’m not a C programmer) then please let me know in the comments. If you would like a step by step dissected guide to the above program explaining what it’s doing let me know and if there’s enough demand I’ll do a breakdown in another post.