Tag Archives: amd

AMD release APP SDK 2.8, CodeXL 1.0 and Bolt

This is old news now that AMD have released APP SDK 2.8, CodeXL 1.0 and Bolt. Bolt is a C++ abstraction of OpenCL which allows you to write OpenCL applications without actually writing OpenCL. I personally haven’t tried it as we are not going to be using it at work but it sounds a good step forward in making GPGPU more accessible to the mainstream. For anyone interested there are a number of blog posts by AMD on how to use Bolt.

8 GPU watercooled computation rig using Power Color Devil 13 HD 7990 cards

Workstation motherboards provide 4 or more PCIE x16 gen 3 slots with a good distribution of dedicated links direct to multiple onboard CPUs. With dual cards such as 7990 which are in fact 2×7970 each one can in theory load up a 4 slot motherboard with 8 gpus in total. Alas, in practice, there can be obstacles booting with 7 or 8 of them with certain motherboads as they tend to run out of PCIE resources after around 6 cards which could be considered to be surprising for a workstation class motherboard. Incidentally, I think it’s worth noting that not only are these the only model of 7990 made but they are practically impossible to get hold of so very rare gems indeed.

Watercooling four dual gpus on a standard workstation motherboard can also be a challenge due to a severe shortage of space between them. Dual slot spacing between PCIE x16 slots is a tight fit as the tubes can take up almost three slots worth of space between cards. In this configuration they are installed on alternate slots so if you had 7 slots in total you’d only install on 1, 3, 5 and 7 leaving 2, 4 and 6 empty. Though 2, 4 and 6 will usually be PCIE x8 slots anyway as opposed to the rest being PCIE x16.

As you can see below these cards have been completely stripped down of their air cooling and heatsink apparatus prior to attaching waterblocks and tubing for coolant to pass through them. The tubing is secured using barb fittings which are stronger than the alternative: compression fittings though they do lack the aesthetic appeal of compression fittings. Compression fittings can come apart under tension and that can create a real mess as I realised the hard way one night.

If not using full card waterblocks (which these aren’t) individual adhesive heatsinks for all the ram chips (known as ram sinks) are required for sufficient cooling. There may be upto 12 of these tiny little ram sinks on each face of a card. I don’t have any photos of that right now but I’ll try and get some. Though, ram sinks, can be flimsy and easily become dislodged and fall off the cards if they are knocked which is why some people prefer full cover blocks. Full cover blocks are more robust but also more expensive.

In terms of power the system is supplied with 2400 watts of power composed of two 1200W power supplies chained by an adapter for the first to kickstart the other on boot. Half the gpus are powered by one and half by the other. This particular machine has 128GB of RAM and dual xeons with a combined total of 32 cores. Update: As rightly pointed out by the commenter below I meant hardware threads not physical cores here.

Note: I do not own the hardware or the photographs but consent has been acquired to publish them here. Also this system is not conceived or assembled by me though I do have the pleasure of loading it with OpenCL benchmarks and computations.

OpenCL Cookbook: Managing your GPUs using amdconfig/aticonfig – a powerful utility in the AMD OpenCL toolset on Linux

When you install AMD Catalyst drivers (I’m using 12.11 beta) on linux you gain access to a command line utility called amdconfig. This is better known by its legacy name aticonfig but in this article we’ll stick with the new name. This tool provides a wealth of functionality in querying and configuring your AMD gpu making it a very powerful utility in your OpenCL toolset on Linux.

Here we explore some basic yet powerful uses of this tool to query and manage the state of our gpus. In the following examples I use the long form of the arguments to the command so that it is easier to remember and understand for those new to the command. Note that in general write commands due to the inherent risks involved can only be run as root. Read only commands can be run by normal users.

List all gpu adapters

Note – just because you see multiple adapters here does not mean they will be enabled in the OpenCL runtime. For that you have to generate a new X configuration with all adapters (see command for further down).

dhruba@debian:~$ amdconfig --list-adapters
* 0. 05:00.0 AMD Radeon HD 7900 Series  
  1. 06:00.0 AMD Radeon HD 7900 Series  
  2. 09:00.0 AMD Radeon HD 7900 Series  
  3. 0a:00.0 AMD Radeon HD 7900 Series  
  4. 85:00.0 AMD Radeon HD 7900 Series  
  5. 86:00.0 AMD Radeon HD 7900 Series  

* - Default adapter

Generate a fresh X config with all adapters enabled

Generating a new config in this way will backup your old config.

dhruba@debian:~$ sudo amdconfig --initial --force --adapter=all
Uninitialised file found, configuring.
Using xorg.conf
Saving back-up to xorg.conf.fglrx-0

Or you can specify your old and new config files explicitly.

dhruba@debian:~$ sudo amdconfig --initial --force --adapter=all --input=foo --output=bar
Uninitialised file found, configuring.
Using bar
Saving back-up to bar.original-0

Query current clocks, clock ranges and load for all adapters

Note here that by default, in idle mode, on linux adapters are clocked at 300/150 (core/memory) but under load the clocks automatically increase to 925/1375 (core/memory) which is nice.

dhruba@debian:~$ amdconfig --od-getclocks --adapter=all

Adapter 0 - AMD Radeon HD 7900 Series  
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    99%

Adapter 1 - AMD Radeon HD 7900 Series  
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Adapter 2 - AMD Radeon HD 7900 Series  
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Adapter 3 - AMD Radeon HD 7900 Series  
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Adapter 4 - AMD Radeon HD 7900 Series  
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Adapter 5 - AMD Radeon HD 7900 Series  
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Query temperatues for all adapters

This is handy to keep an eye on your gpus under load to check they are not overheating. The following temperatures were taken under load and you can see that adapter 3 has reached 70C despite all cards being aggressively water cooled.

dhruba@debian:~$ amdconfig --odgt --adapter=all

Adapter 0 - AMD Radeon HD 7900 Series  
            Sensor 0: Temperature - 65.00 C

Adapter 1 - AMD Radeon HD 7900 Series  
            Sensor 0: Temperature - 50.00 C

Adapter 2 - AMD Radeon HD 7900 Series  
            Sensor 0: Temperature - 60.00 C

Adapter 3 - AMD Radeon HD 7900 Series  
            Sensor 0: Temperature - 70.00 C

Adapter 4 - AMD Radeon HD 7900 Series  
            Sensor 0: Temperature - 58.00 C

Adapter 5 - AMD Radeon HD 7900 Series  
            Sensor 0: Temperature - 54.00 C

List crossfire candidates and crossfire status

For OpenCL it is essential that you have crossfire disabled. You can disable it either using amdconfig --crossfire=off or through catalyst control centre which you start by running amdcccle.

dhruba@debian:~$ amdconfig --list-crossfire-candidates

Master adapter:  0. 05:00.0 AMD Radeon HD 7900 Series  
    Candidates:  none
Master adapter:  1. 06:00.0 AMD Radeon HD 7900 Series  
    Candidates:  none
Master adapter:  2. 09:00.0 AMD Radeon HD 7900 Series  
    Candidates:  none
Master adapter:  3. 0a:00.0 AMD Radeon HD 7900 Series  
    Candidates:  none
Master adapter:  4. 85:00.0 AMD Radeon HD 7900 Series  
    Candidates:  none
Master adapter:  5. 86:00.0 AMD Radeon HD 7900 Series  
    Candidates:  none
dhruba@debian:~$ amdconfig --list-crossfire-status
    Candidate Combination: 
    Master: 0:0:0 
    Slave: 0:0:0 
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination: 
    Master: 0:0:0 
    Slave: 0:0:0 
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination: 
    Master: 0:0:0 
    Slave: 0:0:0 
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination: 
    Master: 0:0:0 
    Slave: 0:0:0 
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination: 
    Master: 0:0:0 
    Slave: 0:0:0 
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART
    Candidate Combination: 
    Master: 0:0:0 
    Slave: 0:0:0 
    CrossFire is disabled on current device
    CrossFire Diagnostics:
    CrossFire can work with P2P mapping through GART

You can also use amdconfig to set core and memory clocks but this will be covered in a separate article. I do not want to run these commands on my system as I’m happy with current clocks. But here’s a snippet from the man page which is fairly self explanatory. Bear in mind that to tweak clocks you need to enable overdrive using –od-enable.

  --od-enable
        Unlocks the ability to change core or memory clock values by
        acknowledging that you have read and understood the AMD Overdrive (TM)
        disclaimer and accept responsibility for and recognize the potential
        dangers posed to your hardware by changing the default core or memory
        clocks
  --od-disable
        Disables AMD Overdrive(TM) set related aticonfig options.  Previously
        commited core and memory clock values will remain, but will not be set
        on X Server restart.
  --odsc, --od-setclocks={NewCoreClock|0,NewMemoryClock|0}
        Sets the core and memory clock to the values specified in MHz
        The new clock values must be within the theoretical ranges provided
        by --od-getclocks.  If a 0 is passed as either the NewCoreClock or
        NewMemoryClock it will retain the previous value and not be changed.
        There is no guarantee that the attempted clock values will succeed
        even if they lay inside the theoretical range.  These newly set
        clock values will revert to the default values if they are not
        committed using the "--od-commitclocks" command before X is
        restarted
  --odrd, --od-restoredefaultclocks
        Sets the core and memory clock to the default values.
        Warning X needs to be restarted before these clock changes will take
        effect
  --odcc, --od-commitclocks
        Once the stability of a new set of custom clocks has been proven this
        command will ensure that the Adapter will attempt to run at these new
        values whenever X is restarted

OpenCL Cookbook: Compiling OpenCL with Ubuntu 12.10, Unity, AMD 12.11 beta drivers & AMD APP SDK 2.7

Continuing on in the OpenCL cookbook series here I present a post not about code but about environmental setup further diversifying the scope of the cookbook. It can be a real challenge for the uninitiated to install all the above and compile an opencl c or c++ program on linux. Here’s a short guide. First download and install ubuntu (duh!).

Install ubuntu build tools and linux kernel extras

Then install the following packages which are a prerequisite to the amd installers and the subsequent c/c++ compilation.

sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install linux-source
sudo apt-get install linux-headers-generic

Then download AMD 12.11 beta drivers (amd-driver-installer-catalyst-12.11-beta-x86.x86_64.zip) and AMD APP SDK 2.7 (AMD-APP-SDK-v2.7-lnx64.tgz). Obviously download either 32bit or 64bit based on what your system supports.

AMD 12.11 beta drivers installation

Once you’ve done that install the AMD 12.11 beta drivers as root first. Installation is as simple as extracting the tarball, marking the script inside as executable and running the script as root. Reboot. After the reboot unity should start using the new AMD 12.11 beta driver and you’ll know it’s the beta because you’ll see a watermark at the bottom left of the screen saying ‘AMD Testing use only’. Note that the reason we’re using the beta here is because unity does not work with earlier versions of the driver. You get a problem where you see the desktop background and a mouse pointer but there’s no toolbar or status bar. But the 12.11 beta driver works which is great.

AMD APP SDK 2.7 installation

Then install the AMD APP SDK 2.7 also as root. Again installation is very simple and exactly the same as for the beta driver above. The AMD beta drivers install a video driver and the OpenCL runtime. The AMD APP SDK install the SDK and also OpenCL and OpenGL runtimes. However if you’ve already installed the video driver first you’ll already have the OpenCL runtime on your system in /usr/lib/libamdocl64.so so the APP SDK won’t install another copy in its location of /opt/AMDAPP/lib/x86_64/libOpenCL.so. You’ll see some messages during installation that it’s skipping the opencl runtime and that’s absolutely fine for now.

Test your OpenCL environment

Now you should test your OpenCL environment by compiling and running an example c opencl program. Get my C file to list all devices on your system as an example calling it devices.c and compile as follows.

gcc -L/usr/lib -I/opt/AMDAPP/include devices.c -lamdocl64 -o devices.o # for c
g++ -L/usr/lib -I/opt/AMDAPP/include devices.c -lamdocl64 -o devices.o # for c++

Once compiled run the output file (devices.o) and if it works then you should output similar to that below.

1. Device: Tahiti
 1.1 Hardware version: OpenCL 1.2 AMD-APP (923.1)
 1.2 Software version: CAL 1.4.1741 (VM)
 1.3 OpenCL C version: OpenCL C 1.2 
 1.4 Parallel compute units: 32
2. Device: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
 2.1 Hardware version: OpenCL 1.2 AMD-APP (923.1)
 2.2 Software version: 2.0 (sse2,avx)
 2.3 OpenCL C version: OpenCL C 1.2 
 2.4 Parallel compute units: 32

Enabling multiple gpus for OpenCL

You may find that you are only seeing one gpu in your opencl programs. There are two things you need to do to enable multiple gpus in the OpenCL runtime. The first is to disable all crossfire. You can do this either in the amd catalyst control centre > performance which you start by running amdcccle or you can do it using the awesome amdconfig tool by running amdconfig --crossfire=off. See my post on amdconfig to find out more about this incredibly powerful tool.

The second thing you may or may not need to do is to enable COMPUTE mode as follows.

export COMPUTE=:0

Once you’ve done the above you should see program output from the program above similar to below.

dhruba@debian:~$ ./source/devices.o 
1. Device: Tahiti
 1.1 Hardware version: OpenCL 1.2 AMD-APP (1084.2)
 1.2 Software version: 1084.2 (VM)
 1.3 OpenCL C version: OpenCL C 1.2 
 1.4 Parallel compute units: 32
2. Device: Tahiti
 2.1 Hardware version: OpenCL 1.2 AMD-APP (1084.2)
 2.2 Software version: 1084.2 (VM)
 2.3 OpenCL C version: OpenCL C 1.2 
 2.4 Parallel compute units: 32
3. Device: Tahiti
 3.1 Hardware version: OpenCL 1.2 AMD-APP (1084.2)
 3.2 Software version: 1084.2 (VM)
 3.3 OpenCL C version: OpenCL C 1.2 
 3.4 Parallel compute units: 32
4. Device: Tahiti
 4.1 Hardware version: OpenCL 1.2 AMD-APP (1084.2)
 4.2 Software version: 1084.2 (VM)
 4.3 OpenCL C version: OpenCL C 1.2 
 4.4 Parallel compute units: 32
5. Device: Tahiti
 5.1 Hardware version: OpenCL 1.2 AMD-APP (1084.2)
 5.2 Software version: 1084.2 (VM)
 5.3 OpenCL C version: OpenCL C 1.2 
 5.4 Parallel compute units: 32
6. Device: Tahiti
 6.1 Hardware version: OpenCL 1.2 AMD-APP (1084.2)
 6.2 Software version: 1084.2 (VM)
 6.3 OpenCL C version: OpenCL C 1.2 
 6.4 Parallel compute units: 32
7. Device: Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
 7.1 Hardware version: OpenCL 1.2 AMD-APP (1084.2)
 7.2 Software version: 1084.2 (sse2,avx)
 7.3 OpenCL C version: OpenCL C 1.2 
 7.4 Parallel compute units: 32

Standardising the OpenCL runtime library path

Now – it may be that you wish for the OpenCL runtime library to be installed in the standard AMD APP SDK location of /opt/AMDAPP/lib/x86_64/libOpenCL.so as opposed to the non-standard location of /usr/lib/libamdocl64.so which is where the beta driver installation puts it. The proper way to do this would probably be to install the AMD APP SDK first and then the video driver or simply skip the video driver installation (I haven’t tried either of these options so they may need verification).

However, I used a little trick to make this easier since I’d already installed the video driver followed by the APP SDK. I renamed /usr/lib/libamdocl64.so to /usr/lib/libamdocl64.so.x and reinstalled the APP SDK. This time it detected that the runtime wasn’t present and installed another runtime in /opt/AMDAPP/lib/x86_64/libOpenCL.so – the standard SDK runtime path. With the new APP SDK OpenCL runtime in place I was able to compile the same program using the new runtime as below depending on whether you want the c or c++ compiler.

gcc -L/opt/AMDAPP/lib/x86_64/ -I/opt/AMDAPP/include devices.c -lOpenCL -o devices.o # for c
g++ -L/opt/AMDAPP/lib/x86_64/ -I/opt/AMDAPP/include devices.c -lOpenCL -o devices.o # for c++

Summary

And there you have it – an opencl compiler working on ubuntu 12.10 using the AMD 12.11 beta drivers and the AMD APP 2.7 SDK. Sometimes you just need someone else to have done it first and written a guide and I hope this serves to help someone out there.

AMD and Oracle to collaborate on Heterogenous Computing in Java

In August John Coomes from Oracle made a proposal to add GPU support to Java. One month later, on Sep 10, he proposed the creation of a new project called Sumatra to continue with this endeavour. On Sep 24 this project was approved by a 100% vote in favour. During the recent JavaOne 2012 AMD officially announced its participation in OpenJDK Project Sumatra in collaboration with Oracle and OpenJDK to bring heterogenous computing to Java for server and cloud environments. The Inquirer also reports on this subject.

This is very exciting news indeed. Although there are already two libraries for GPU programming in Java – namely rootbeer and aparapi, having GPU support built in to the Java language, the Java API and most importantly the JVM will provide an alternative more compelling than the use of any external library. And to be quite frank there could not be a collaborator than AMD given their vast contribution to date to OpenCL and OpenCL development tools. And unlike Nvidia, they are wholly committed to OpenCL and not working on their own proprietary alternative.

Although it’ll be a while before this project sees any substantial contribution I cannot wait to see this take form over the next year or two. OpenCL and, in general, the GPU programming paradigm is hard; very hard; and even more importantly porting existing code is even harder; and if anyone can make this domain accessible to the mainstream it’s Java. Once Sumatra is ready hopefully we won’t have to write OpenCL anymore. We’ll be able to write normal Java, compile it and at either compile time or runtime the byte code will get translated into OpenCL and compiled. At execution time we won’t have to worry about what hardware we’re running because with any luck it’ll be write once run anywhere!

Oracle and AMD propose GPU support in Java

A very exciting development indeed has come to my attention albeit four days late. This news is, without exaggeration, in my humble opinion nothing short of groundbreaking, not only because it pushes the boundaries and capabilities of Java even further in terms of hardware support and performance but also because, as I’m gradually beginning to realise, the future of high performance computing is in the GPU.

John Coomes (OpenJDK HotSpot Group Lead) and Gary Frost (AMD) have proposed a new OpenJDK project to implement GPU support in Java with a native JVM (full thread).

This project intends to enable Java applications to seamlessly take advantage of a GPU–whether it is a discrete device or integrated with a CPU–with the objective to improve the application performance.

Their focus will be on code generation, garbage collection and runtimes.

We propose to use the Hotspot JVM, and will concentrate on code generation, garbage collection, and
runtimes. Performance will be improved, while preserving compile time, memory consumption and code generation quality.

The project will also use the new Java 8 lambda language and may eventually sprout further platform enhancements.

We will start exploring leveraging the new Java 8 Lambda language and library features. As this project progress, we may identify challenges with the Java API and constructs which may lead to new language, JVM and library extensions that will need standardization under the JCP process.

John Coomes (Oracle) will lead the project and Gary Frost (AMD) has gladly offered resources from AMD. As the mail thread got underway two already existing related projects were brought to the list’s attention: rootbeer (PDF, slashdot) and Aparapi (AMD page). Rootbeer is a particularly interesting project as it performs static analysis of Java code and generates CUDA code automatically – quite different from compile time OpenCL bindings from Java. The developer of rootbeer has also shown interest in joining the openjdk project. John Rose, Oracle HotSpot developer, also posted a talk he recently gave on Arrays 2.0 which I’m yet to watch.

The missing link in deriving value from GPUs, from what I understand in a domain that I’m still fairly new to, is that GPU hardware and programming need to be made accessible to the mainstream and that’s the purpose I hope this project will serve. I hope also that, once underway, it also raises wider awareness of how we must make a mental shift from concurrency to parallelism and from data parallelism to task parallelism and how the next generational leap in parallelism will come, not from cpu cores or concurrency frameworks, but from harnessing the mighty GPU.

Hacker news has some discussion. What do you think? Let me know.