Tag Archives: opencl cookbook

OpenCL Cookbook: Building a program and debugging failures

Last time, in the opencl cookbook series, we looked at how to create a program data structure in the C OpenCL host programming API as well as how to read kernel source from a separate file. A program is a container or collection of kernels and a kernel, in turn, is a function in OpenCL that executes on an OpenCL device such as a CPU, GPU or accelerator. This time we look at how to build a program, which in turn builds the kernels within it and also how to debug failures that occur in the program build. For the latter we recreate two kinds of failures to see how the program reacts.

But what actually happens when you build a program? The clBuildProgram() function in C takes a program and a program in turn contains the source of one or more kernels read in from one or more files. However, kernel sources in raw form are of little use. To be functionally useful they must be compiled. This is what happens when you build a program.

Every OpenCL framework/SDK/implementation (whatever you want to call it) is mandated by the specification to make a compiler accessible through the clBuildProgram(). Though, they may provide other interfaces to their OpenCL compiler. AMD provides a compile time compiler command called clc whereas NVidia provides only a runtime one. clBuildProgram() compiles and link a program for devices associated with the platform.

Our example host program below is an extension of the one in the previous article in the series. It reads in a kernel function from a separate file into an OpenCL program and builds the program which would normally succeed. However, here I introduce a couple of mistakes to show you how the API reacts in each case and how to debug such issues.

Host source (buildProgramDebug.c)

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    cl_platform_id platform; cl_device_id device; cl_context context;
    cl_program program; cl_int error; cl_build_status status;

    FILE* programHandle;
    char *programBuffer; char *programLog;
    size_t programSize; size_t logSize;

    // get first available platform and gpu and create context
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    // get size of kernel source
    programHandle = fopen("kernel.cl", "r");
    fseek(programHandle, 0, SEEK_END);
    programSize = ftell(programHandle);
    rewind(programHandle);

    // read kernel source into buffer
    programBuffer = (char*) malloc(programSize + 1);
    programBuffer[programSize] = '\0';
    fread(programBuffer, sizeof(char), programSize, programHandle);
    fclose(programHandle);

    // create program from buffer
    program = clCreateProgramWithSource(context, 1,
            (const char**) &programBuffer, &programSize, NULL);
    free(programBuffer);

    // build program
    const char options[] = "-Werror -cl-std=CL1.1";
    error = clBuildProgram(program, 1, &device, options, NULL, NULL);

    // build failed
    if (error != CL_SUCCESS) {

        // check build error and build status first
        clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_STATUS, 
                sizeof(cl_build_status), &status, NULL);

        // check build log
        clGetProgramBuildInfo(program, device, 
                CL_PROGRAM_BUILD_LOG, 0, NULL, &logSize);
        programLog = (char*) calloc (logSize+1, sizeof(char));
        clGetProgramBuildInfo(program, device, 
                CL_PROGRAM_BUILD_LOG, logSize+1, programLog, NULL);
        printf("Build failed; error=%d, status=%d, programLog:nn%s", 
                error, status, programLog);
        free(programLog);

    }

    clReleaseContext(context);
    return 0;

}

Kernel source (kernel.cl)

__kernel void hello(__global char* string){

string[0] = 'H';
string[1] = 'e';
string[2] = 'l';
string[3] = 'l';
string[4] = 'o';
string[5] = ',';
string[6] = ' ';
string[7] = 'W';
string[8] = 'o';
string[9] = 'r';
string[10] = 'l';
string[11] = 'd';
string[12] = '!';
string[13] = '';

}

These two programs, as they stand, have no errors and in that state produce no output. The kernel source is read in to a program and the program built based on the command line options provided on line 41. The error variable on line 45 always equals CL_SUCCESS so nothing is printed. Let’s now introduce two errors in our source code one at a time and see what happens. Specifically, when a problem occurs, we’ll be examining three separate variables – the error code (which is the return value from clBuildProgram), the program build status (which has to be specifically requested) and the program build log (which also has to be requested).

Error 1: Rogue build option

Here I change line 41 to contain a bogus command line option called ‘-foobar’ as below.

const char options[] = "-Werror -cl-std=CL1.1 -foobar";

When the program is built, unsurprisingly it fails, with the output below.

tron:opencl dhruba$ clang -framework OpenCL buildProgramDebug.c -o buildProgramDebug && ./buildProgramDebug
Build failed; error=-43, status=-1, programLog:

Above the error code of -43 corresponds to the constant CL_INVALID_BUILD_OPTIONS and a status of -1 corresponds to the constant CL_BUILD_NONE. The former is self explanatory whereas the latter means that the kernel was not compiled which is to be expected as the build options were wrong.

Error 2: Kernel source syntax error

Here, I revert the error I introduced last time, and instead add an extra underscore as the first character of the kernel source to create an OpenCL syntax error. This time the output is longer as the program log is no longer blank.

tron:opencl dhruba$ clang -framework OpenCL buildProgramDebug.c -o buildProgramDebug && ./buildProgramDebug
Build failed; error=-11, status=-2, programLog:

:1:1: error: unknown type name '___kernel'
___kernel void hello(__global char* string){
^
:1:11: error: expected identifier or '('
___kernel void hello(__global char* string){

Above the error code of -11 refers to the constant CL_BUILD_PROGRAM_FAILURE and a program build status of -2 refers to the constant CL_BUILD_ERROR which makes sense. This time, however, we have some output in the program log field. The syntax error in the kernel source is being reported by the runtime compiler to the program log.

Error code and build status constants

One final tip: you may be asking how I knew which error codes and build statuses corresponded to which constants in OpenCL. Well – this is a bit of a nightmare to be quite honest. I had to open up the following header file in the Apple OpenCL framework to check which constants matched those integers. I’m sure there will be similar places to look in other SDKs.

/System/Library/Frameworks/OpenCL.framework/Headers/cl.h

As you can see, the above three fields, provide critically important means of debugging the failure to build of your program and can point our errors both in your host source and your kernel source. Did this help you, did you face any issues or do you have feedback for improvement? Let me know in the comments!

OpenCL Cookbook: Creating programs and reading kernels from a file

In OpenCL host programming, after platforms, devices and contexts in our OpenCL Cookbook series, comes the program data structure. Remember that a kernel represents a function in actual OpenCL code intended for execution on any given device as opposed to being written in a host language like C which can only execute on the CPU. In contrast a program is a container of kernels in the host programming language – in this case C though it could be C++, Java or Python. In order to execute a kernel function on a given device (CPU, GPU or accelerator) it must first be fed in to a host program to create a program data structure.

In the host program below, written in C, we take a small step further than last time to read in an OpenCL kernel function from a separate file into our host program and create a program data structure. We, then, retrieve the kernel source code from within the program data structure and verify that it is the same kernel function that we fed in earlier as a simple test. This can be valuable in the real world where you may be feeding in kernels from multiple files and want to make sure that all of them have been read into the program. When the program structure is queried for the kernel source it concatenates all kernel sources into one string so simply reading that will give you a global view of all kernel code previously fed in.

The host program which reads the kernel file in is as follows.

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    cl_platform_id platform;
    cl_device_id device;
    cl_context context;
    cl_program program;

    FILE* programHandle;
    size_t programSize, kernelSourceSize;
    char *programBuffer, *kernelSource;

    // get first available platform and gpu and create context
    clGetPlatformIDs(1, &platform, NULL);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    // get size of kernel source
    programHandle = fopen("kernel.cl", "r");
    fseek(programHandle, 0, SEEK_END);
    programSize = ftell(programHandle);
    rewind(programHandle);

    // read kernel source into buffer
    programBuffer = (char*) malloc(programSize + 1);
    programBuffer[programSize] = '\0';
    fread(programBuffer, sizeof(char), programSize, programHandle);
    fclose(programHandle);

    // create program from buffer
    program = clCreateProgramWithSource(context, 1,
            (const char**) &programBuffer, &programSize, NULL);
    free(programBuffer);

    // read kernel source back in from program to check
    clGetProgramInfo(program, CL_PROGRAM_SOURCE, 0, NULL, &kernelSourceSize);
    kernelSource = (char*) malloc(kernelSourceSize);
    clGetProgramInfo(program, CL_PROGRAM_SOURCE, kernelSourceSize, kernelSource, NULL);
    printf("nKernel source:nn%sn", kernelSource);
    free(kernelSource);

    clReleaseContext(context);
    return 0;

}

The kernel code, saved in a file called kernel.cl, is as below.

_kernel void hello(__global char* string){

string[0] = 'H';
string[1] = 'e';
string[2] = 'l';
string[3] = 'l';
string[4] = 'o';
string[5] = ',';
string[6] = ' ';
string[7] = 'W';
string[8] = 'o';
string[9] = 'r';
string[10] = 'l';
string[11] = 'd';
string[12] = '!';
string[13] = '';

}

Save the host program as programs.c and then compile and run as follows. If you don’t have the clang command install it.

clang -framework OpenCL programs.c -o programs && ./programs

The output on my system (Macbook Air) is, as you would expect, the kernel source exactly as above.

Did this help you, did you face any issues or do you have any feedback for improvements? Let me know in the comments!

OpenCL Cookbook: Creating contexts and reference counting

Following on from my previous articles on platforms and devices in the OpenCL Cookbook series, in this instalment, I move onto the next most critical host programming data structure in OpenCL – the context.

Contexts

A context in OpenCL requires a platform and one or more devices to function and is used to create command queues which are the structures that allow hosts to send kernels to devices. That’s a loaded sentence so let’s break it down. The program such as the one written below in C (the host) may want the CPU or the GPU (devices) to execute a calculation (a kernel i.e. a function). In order for that to happen a command queue for that device must be created and the calculation enqueued onto it. That, in essence, is how a task is relayed to a device and execution is triggered in OpenCL.

Context creation

In creating a context – the platform and devices do not necessarily have to be created and supplied to the context creating method. For example a context can be created simply by choosing a device type as below.

context = clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU, NULL, NULL, NULL);

In the example above, the platform that is selected is implementation defined but if you only have one platform like me then that’s automatically selected. The device selected will be the first available one of the type you specify. So, on my Macbook Air, the Apple SDK and my only GPU – the NVidia card are automatically selected.

Alternatively, as in the snippet below you can create and provide the platform and device explicitly.

context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

Controlling context lifetime using its reference count

In the complete program below I also introduce another important concept related to contexts – its reference count. A context when created starts with a reference count of 1 and when the function you are creating it in goes out of scope it is deallocated automatically. But this may be undesirable – maybe you want to continue accessing the context after the current function has gone out of scope. For this reason a context’s reference count can be incremented and decremented and it is only deallocated when its reference count reaches zero.

The general guideline is that if you are writing a function that uses an already created context you should increment the reference count at the start and decrement it at the end of your function. If, however, you are creating a context then at the end of your function you must simply decrement the reference count as I’m doing below.

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    cl_platform_id platform;
    cl_device_id device;
    cl_context context;
    cl_uint refCount;

    // get first available platform
    clGetPlatformIDs(1, &platform, NULL);

    // get first available gpu device
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);

    // create context
    context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);

    // get context reference count
    clGetContextInfo(context, CL_CONTEXT_REFERENCE_COUNT,
            sizeof(refCount), &refCount, NULL);
    printf("Ref count: %u ", refCount);

    // increment reference count
    clRetainContext(context);
    clGetContextInfo(context, CL_CONTEXT_REFERENCE_COUNT,
            sizeof(refCount), &refCount, NULL);
    printf(">> %u ", refCount);

    // decrement reference count
    clReleaseContext(context);
    clGetContextInfo(context, CL_CONTEXT_REFERENCE_COUNT,
            sizeof(refCount), &refCount, NULL);
    printf(">> %u ", refCount);

    // finally release context
    clReleaseContext(context);
    printf(">> 0n");
    return 0;

}

Compile and run on the Mac as follows. If you don’t have the clang, g++ or gcc commands install them.

$ clang -framework OpenCL contexts.c -o contexts && ./contexts

The output produced on my machine is as below.

Ref count: 1 >> 2 >> 1 >> 0

As always error handling has been omitted for brevity and the code is only tested on my Macbook Air but should work on other platforms. If you have any issues or have suggestions for improvements to the code do let me know.

Did this help you? Let me know in the comments!

OpenCL Cookbook: Listing all devices and their critical attributes

Last time, in our OpenCL Cookbook series, we looked at how to list all platforms and their attributes. This time, we take the next step and list all devices that a platform provides access to and their critical attributes. This program is very useful in that it provides a quick and easy way of introspecting a given system’s OpenCL capabilities. Note that error handling has been omitted for brevity and the host language as before is C.

To recap on terminology: a platform is an OpenCL SDK such as an Apple, Intel, NVidia or AMD SDK. A device, on the other hand, may be a cpu, gpu or accelerator and as a result it’s highly likely a system will have multiple devices. For each device we list its critical attributes: hardware OpenCL version, software driver version, opencl c version supported by compiler for device and finally the number of parallel compute units (cores) it possesses which symbolises the extent of task based parallelism that we can achieve.

#include <stdio.h>                                                                                                                                               
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    int i, j;
    char* value;
    size_t valueSize;
    cl_uint platformCount;
    cl_platform_id* platforms;
    cl_uint deviceCount;
    cl_device_id* devices;
    cl_uint maxComputeUnits;

    // get all platforms
    clGetPlatformIDs(0, NULL, &platformCount);
    platforms = (cl_platform_id*) malloc(sizeof(cl_platform_id) * platformCount);
    clGetPlatformIDs(platformCount, platforms, NULL);

    for (i = 0; i < platformCount; i++) {

        // get all devices
        clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL, 0, NULL, &deviceCount);
        devices = (cl_device_id*) malloc(sizeof(cl_device_id) * deviceCount);
        clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_ALL, deviceCount, devices, NULL);

        // for each device print critical attributes
        for (j = 0; j < deviceCount; j++) {

            // print device name
            clGetDeviceInfo(devices[j], CL_DEVICE_NAME, 0, NULL, &valueSize);
            value = (char*) malloc(valueSize);
            clGetDeviceInfo(devices[j], CL_DEVICE_NAME, valueSize, value, NULL);
            printf("%d. Device: %sn", j+1, value);
            free(value);

            // print hardware device version
            clGetDeviceInfo(devices[j], CL_DEVICE_VERSION, 0, NULL, &valueSize);
            value = (char*) malloc(valueSize);
            clGetDeviceInfo(devices[j], CL_DEVICE_VERSION, valueSize, value, NULL);
            printf(" %d.%d Hardware version: %sn", j+1, 1, value);
            free(value);

            // print software driver version
            clGetDeviceInfo(devices[j], CL_DRIVER_VERSION, 0, NULL, &valueSize);
            value = (char*) malloc(valueSize);
            clGetDeviceInfo(devices[j], CL_DRIVER_VERSION, valueSize, value, NULL);
            printf(" %d.%d Software version: %sn", j+1, 2, value);
            free(value);

            // print c version supported by compiler for device
            clGetDeviceInfo(devices[j], CL_DEVICE_OPENCL_C_VERSION, 0, NULL, &valueSize);
            value = (char*) malloc(valueSize);
            clGetDeviceInfo(devices[j], CL_DEVICE_OPENCL_C_VERSION, valueSize, value, NULL);
            printf(" %d.%d OpenCL C version: %sn", j+1, 3, value);
            free(value);

            // print parallel compute units
            clGetDeviceInfo(devices[j], CL_DEVICE_MAX_COMPUTE_UNITS,
                    sizeof(maxComputeUnits), &maxComputeUnits, NULL);
            printf(" %d.%d Parallel compute units: %dn", j+1, 4, maxComputeUnits);

        }

        free(devices);

    }

    free(platforms);
    return 0;

}

Compile and run on the Mac as follows. If you don’t have the clang, g++ or gcc commands install them. Any of those commands should work.

$ clang -framework OpenCL devices.c -o devices && ./devices

The output produced on my machine is as follows but may differ on your system.

1. Device: Intel(R) Core(TM)2 Duo CPU     U9600  @ 1.60GHz
 1.1 Hardware version: OpenCL 1.2 
 1.2 Software version: 1.1
 1.3 OpenCL C version: OpenCL C 1.2 
 1.4 Parallel compute units: 2
2. Device: GeForce 320M
 2.1 Hardware version: OpenCL 1.0 
 2.2 Software version: CLH 1.0
 2.3 OpenCL C version: OpenCL C 1.1 
 2.4 Parallel compute units: 6

As you can see my Macbook Air shows rather feeble and outdated metadata being an old slimline laptop. As always, the code is only tested on my Macbook Air but, in theory, should run on Windows and Linux though the way you compile and run will differ from above slightly. If you have any issues or would like to critique and improve my code (given I’m not a C programmer) by all means leave a comment.

Did this help you? Let me know in the comments!

OpenCL Cookbook: Listing all platforms and their attributes

The first article in the OpenCL Cookbook series looks at how to list all platforms and their attributes in OpenCL using C as a host language on an OpenCL supported system.

For those new to OpenCL (like me) a platform is a top level entity in the OpenCL API and represents an SDK. You have to get a platform before you can delve deeper into what a platform provides access to such as devices (cpu, gpu). Depending on the hardware/GPU of a system may find an AMD, NVidia, Intel or Apple OpenCL SDK. You may even find multiple SDKs, for instance, if you have multiple GPUs of different makes.

#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main() {

    int i, j;
    char* info;
    size_t infoSize;
    cl_uint platformCount;
    cl_platform_id *platforms;
    const char* attributeNames[5] = { "Name", "Vendor",
        "Version", "Profile", "Extensions" };
    const cl_platform_info attributeTypes[5] = { CL_PLATFORM_NAME, CL_PLATFORM_VENDOR,
        CL_PLATFORM_VERSION, CL_PLATFORM_PROFILE, CL_PLATFORM_EXTENSIONS };
    const int attributeCount = sizeof(attributeNames) / sizeof(char*);

    // get platform count
    clGetPlatformIDs(5, NULL, &platformCount);

    // get all platforms
    platforms = (cl_platform_id*) malloc(sizeof(cl_platform_id) * platformCount);
    clGetPlatformIDs(platformCount, platforms, NULL);

    // for each platform print all attributes
    for (i = 0; i < platformCount; i++) {

        printf("n %d. Platform n", i+1);

        for (j = 0; j < attributeCount; j++) {

            // get platform attribute value size
            clGetPlatformInfo(platforms[i], attributeTypes[j], 0, NULL, &infoSize);
            info = (char*) malloc(infoSize);

            // get platform attribute value
            clGetPlatformInfo(platforms[i], attributeTypes[j], infoSize, info, NULL);

            printf("  %d.%d %-11s: %sn", i+1, j+1, attributeNames[j], info);
            free(info);

        }

        printf("n");

    }

    free(platforms);
    return 0;

}

Compile and run on the Mac as follows. If you don’t have the clang, g++ or gcc commands install them. Any of those commands should work.

clang -framework OpenCL platforms.c -o platforms && ./platforms

The output produced on my Macbook Air is as follows but may differ for your system.

 1. Platform 
  1.1 Name       : Apple
  1.2 Vendor     : Apple
  1.3 Version    : OpenCL 1.2 (Jun 20 2012 14:18:19)
  1.4 Profile    : FULL_PROFILE
  1.5 Extensions : cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event

The code has been tested only on my Macbook Air but should work on Windows and Linux too though the way you compile and run will differ from above slightly. If you find any issues or would like to suggest improvements to the program (given that I’m not a C programmer) then please let me know in the comments. If you would like a step by step dissected guide to the above program explaining what it’s doing let me know and if there’s enough demand I’ll do a breakdown in another post.