Tag Archives: java

Oracle and AMD propose GPU support in Java

A very exciting development has come to my attention, albeit four days late. In my opinion this news is nothing short of groundbreaking, not only because it pushes the capabilities of Java further in terms of hardware support and performance but also because, as I’m gradually beginning to realise, the future of high performance computing is in the GPU.

John Coomes (OpenJDK HotSpot Group Lead) and Gary Frost (AMD) have proposed a new OpenJDK project to implement GPU support in Java with a native JVM (full thread).

This project intends to enable Java applications to seamlessly take advantage of a GPU–whether it is a discrete device or integrated with a CPU–with the objective to improve the application performance.

Their focus will be on code generation, garbage collection and runtimes.

We propose to use the Hotspot JVM, and will concentrate on code generation, garbage collection, and runtimes. Performance will be improved, while preserving compile time, memory consumption and code generation quality.

The project will also use the new Java 8 lambda language features and may eventually lead to further platform enhancements.

We will start exploring leveraging the new Java 8 Lambda language and library features. As this project progress, we may identify challenges with the Java API and constructs which may lead to new language, JVM and library extensions that will need standardization under the JCP process.

John Coomes (Oracle) will lead the project and Gary Frost (AMD) has gladly offered resources from AMD. As the mail thread got underway, two existing related projects were brought to the list’s attention: rootbeer (PDF, slashdot) and Aparapi (AMD page). Rootbeer is a particularly interesting project as it performs static analysis of Java code and generates CUDA code automatically, which is quite different from compile time OpenCL bindings from Java. The developer of rootbeer has also shown interest in joining the OpenJDK project. John Rose, Oracle HotSpot developer, also posted a talk he recently gave on Arrays 2.0 which I have yet to watch.

The missing link in deriving value from GPUs, from what I understand of a domain that I’m still fairly new to, is that GPU hardware and programming need to be made accessible to the mainstream, and that’s the purpose I hope this project will serve. I also hope that, once underway, it raises wider awareness of how we must make a mental shift from concurrency to parallelism and from data parallelism to task parallelism, and of how the next generational leap in parallelism will come not from CPU cores or concurrency frameworks but from harnessing the mighty GPU.

Hacker News has some discussion. What do you think? Let me know.

Two bug fixes in JDK6u34 worthy of note

JDK6u34 fixes a lot of bugs but two struck me as particularly worthy of note.

  • Bug 7027300 – Unsynchronized HashMap access causes endless loop
  • Bug 6941923 – RFE: Handling large log files produced by long running Java Applications

The first fix really made me laugh, as this problem has existed for so long, has been widely written about on the web and has, over the years, plagued many who made the novice and deadly mistake of using HashMap unsynchronized from multiple threads. It’s also a tricky one to debug once you’re hit with it: nothing visibly happens other than the one thread executing the HashMap code going into an infinite loop, and that thread remains in a RUNNABLE state rather than the BLOCKED state that people tend to look for in thread dumps. The application may become unresponsive which, if you’re working with CPU-intensive code such as aggregate statistical calculations, may lead you to think that it’s probably just some slow code.
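The safe alternatives take only a few lines. The following is a minimal sketch rather than anything taken from the bug report: the class and method names (`SharedMapDemo`, `fillConcurrently`) are made up for illustration and it assumes a Java 8 compiler for the lambda. It populates one map from several threads using `ConcurrentHashMap`. Wrapping a `HashMap` with `Collections.synchronizedMap` would also work, whereas a bare `HashMap` in the same position risks lost updates or the endless loop described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: several threads write to one shared map.
class SharedMapDemo {

    // Starts `threads` writers, each inserting `perThread` distinct keys,
    // waits for all of them to finish and returns the populated map.
    static Map<Integer, Integer> fillConcurrently(int threads, int perThread) {
        // Safe for concurrent writers. A plain HashMap here would be a bug.
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // each thread gets its own key range
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return map;
    }
}
```

With four threads inserting 1000 keys each, the map reliably ends up with 4000 entries.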

The key to effective debugging of such subtle concurrency bugs is to always have jvisualvm open on local processes and to check the real-time thread visualisation tab frequently to see what your application is doing. If the application is running live, take multiple thread dumps, either manually or programmatically, and check for differences between them.
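Taking the dumps programmatically can be sketched with the standard `java.lang.management` API. This is only an illustration under some assumptions: the check runs inside the JVM being inspected, the helper names (`StuckThreadFinder`, `runnableTopFrames`, `suspects`) are hypothetical, and "RUNNABLE with the same top stack frame in two dumps" is a crude heuristic for spotting a spinning thread, not a diagnosis.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: flags threads that stay RUNNABLE in the same place.
class StuckThreadFinder {

    // Maps thread id -> "name @ top frame" for every thread that is RUNNABLE now.
    static Map<Long, String> runnableTopFrames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> frames = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            StackTraceElement[] stack = info.getStackTrace();
            if (info.getThreadState() == Thread.State.RUNNABLE && stack.length > 0) {
                frames.put(info.getThreadId(), info.getThreadName() + " @ " + stack[0]);
            }
        }
        return frames;
    }

    // Takes two dumps `pauseMillis` apart and returns the threads (other than
    // the caller) that were RUNNABLE with an identical top frame in both.
    static List<String> suspects(long pauseMillis) {
        long self = Thread.currentThread().getId();
        Map<Long, String> first = runnableTopFrames();
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Map<Long, String> second = runnableTopFrames();
        List<String> result = new ArrayList<>();
        for (Map.Entry<Long, String> entry : first.entrySet()) {
            boolean sameFrame = entry.getValue().equals(second.get(entry.getKey()));
            if (entry.getKey() != self && sameFrame) {
                result.add(entry.getValue());
            }
        }
        return result;
    }
}
```

Threads parked in native I/O calls will also show up as suspects, so the output is a list of places to look rather than a verdict.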

The second fix also made me laugh, as this is another issue that, though less critical than the previous one, has caused much inconvenience to Java developers, system administrators and support staff over the years. Large log files have resulted in periodic process restarts being incorporated and numerous emails being sent about running out of disk space. Can you believe that in this day and age we still run out of disk space? Much bash code has also been written to do log rotation manually, as was the case in my last workplace, and processing or reading large log files is also much slower.

To be honest, though, I have to confess I’m a little confused by the Oracle bug system. On the second bug above it says ‘Closed, Will not Fix’ but at the same time under ‘Evaluation’ it says three new flags have been introduced, so has the issue been fixed or not? I also noticed another bug which looked interesting: bug 7071826 – UUID.randomUUID() race condition. ID generation is a critical function of any environment and a race condition there could potentially be a serious and widespread issue, but the bug contains no description of the problem, merely pointers to comments elsewhere. Where are those comments being held? Help me out if you know.

On a related note JDK7u6 introduces JDK and JRE support for Mac OS X!

JDK8: StampedLock: A possible SequenceLock replacement

Quite some time back I wrote about SequenceLock, a new kind of lock due for release in JDK8 in which each lock acquisition or release advances a sequence number. Two days ago, however, Doug Lea reported that, for a variety of reasons, this API may be “less useful than anticipated”. Instead he has proposed a replacement API called StampedLock and requested feedback on it.

Here I reproduce example usage of both APIs for easy comparison by the reader.

SequenceLock sample usage

 class Point {
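   private volatile double x, y;
   private final SequenceLock sl = new SequenceLock();
   // Optimistic read attempts allowed before falling back to the lock
   private static final int RETRIES_BEFORE_LOCKING = 8;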

   // A read-only method
   double distanceFromOriginV1() {
     double currentX, currentY;
     long seq;
     do {
       seq = sl.awaitAvailability();
       currentX = x;
       currentY = y;
     } while (sl.getSequence() != seq); // retry if sequence changed
     return Math.sqrt(currentX * currentX + currentY * currentY);
   }

   // an exclusively locked method
   void move(double deltaX, double deltaY) {
     sl.lock();
     try {
       x += deltaX;
       y += deltaY;
     } finally {
       sl.unlock();
     }
   }

   // Uses bounded retries before locking
   double distanceFromOriginV2() {
     double currentX, currentY;
     long seq;
     int retries = RETRIES_BEFORE_LOCKING; // for example 8
     try {
       do {
         if (--retries < 0)
           sl.lock();
         seq = sl.awaitAvailability();
         currentX = x;
         currentY = y;
       } while (sl.getSequence() != seq);
     } finally {
       if (retries < 0)
         sl.unlock();
     }
     return Math.sqrt(currentX * currentX + currentY * currentY);
   }
 }

StampedLock sample usage

class Point {
    private int x, y;
    private final StampedLock lock = new StampedLock();

    public int magnitude() { // a read-only method
        // Optimistic read: no lock is taken, the stamp is validated afterwards
        long stamp = lock.beginObserving();
        int currentX = x;
        int currentY = y;
        if (!lock.validate(stamp)) {
            // A write intervened, so re-read under a real read lock
            stamp = lock.lockForReading();
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlock(stamp);
            }
        }
        return currentX * currentX + currentY * currentY;
    }

    public void move(int deltaX, int deltaY) { // a write-locked method
        long stamp = lock.lockForWriting();
        try {
            x += deltaX;
            y += deltaY;
        } finally {
            lock.unlock(stamp);
        }
    }
}

There are two primary differences that I can see. First, in SequenceLock the read-only method has an element of indefinite retry without lock acquisition (unless, as in distanceFromOriginV2, you bound the retries yourself). In StampedLock, however, the retry element is replaced with acquisition of a read lock, which would perform better under a lot of writes. Second, the single undifferentiated lock in SequenceLock is replaced with differentiated read and write locks. The latter feature makes StampedLock another alternative to the existing ReentrantReadWriteLock, one that Doug Lea describes as “cheaper but more restricted”. It will be fascinating to watch the progression of this API over time.
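For comparison, here is the same Point written against `java.util.concurrent.locks.StampedLock` as it shipped in JDK 8, where the method names differ from the proposal above: `tryOptimisticRead`, `readLock`/`unlockRead` and `writeLock`/`unlockWrite`. Treat it as a sketch; the class name `StampedPoint` is mine, chosen only to avoid clashing with the listings above.

```java
import java.util.concurrent.locks.StampedLock;

// The Point example using the StampedLock API that shipped in JDK 8.
class StampedPoint {
    private int x, y;
    private final StampedLock lock = new StampedLock();

    public int magnitude() { // a read-only method
        // Optimistic read: returns a stamp without blocking writers
        long stamp = lock.tryOptimisticRead();
        int currentX = x;
        int currentY = y;
        if (!lock.validate(stamp)) {
            // A writer got in between, so fall back to a real read lock
            stamp = lock.readLock();
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return currentX * currentX + currentY * currentY;
    }

    public void move(int deltaX, int deltaY) { // a write-locked method
        long stamp = lock.writeLock();
        try {
            x += deltaX;
            y += deltaY;
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```

The shape is the same as the proposal: one optimistic attempt, then a single fallback to the read lock rather than an open-ended retry loop.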

JDK8 ConcurrentHashMap gets huge map support

Doug Lea announces huge map support in the latest incarnation of ConcurrentHashMap to more effectively support more than a billion elements.

Finally acting on an old idea, I committed an update to ConcurrentHashMap (currently only the one in our jdk8 preview package, as jsr166e.ConcurrentHashMap8) that much more gracefully handles maps that are huge or have many keys with colliding hash codes. Internally, it uses tree-map-like structures to maintain bins containing more nodes than would be expected under ideal random key distributions over ideal numbers of bins.

It’s nice to see the huge map consideration get some real effort put into it. Mark Reinhold, I believe, talked about huge map 64-bit support as a possible feature in JDK8 onwards in one of the videos hosted by Adam Messinger. I don’t think Doug Lea’s efforts are related to that but rather his own ideas.
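The colliding-hash-code case is easy to reproduce. The sketch below uses a deliberately bad, hypothetical key type (`BadKey`) whose instances all hash to the same bin. On a JDK 8 or later `java.util.concurrent.ConcurrentHashMap` an overflowing bin of `Comparable` keys is kept as a balanced tree, so lookups stay roughly logarithmic; on older versions the same code still works but degrades to a linear scan of the bin.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo: every key lands in the same hash bin.
class CollidingKeysDemo {

    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            return 42; // worst case: all keys collide
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BadKey && ((BadKey) other).id == id;
        }

        @Override
        public int compareTo(BadKey other) {
            return Integer.compare(id, other.id); // lets the map order keys within the bin
        }
    }

    // Builds a map of n colliding keys, each mapped to its own id.
    static ConcurrentHashMap<BadKey, Integer> build(int n) {
        ConcurrentHashMap<BadKey, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }
}
```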

JDK8 update: Release schedule and StringBuilder patch

Looking through the jdk8-dev archives (albeit slightly late) I found two points worthy of note. The first is the JDK8 release schedule (thread view), along with commendable openness about the schedule from the release manager and commitment by Oracle. The second is a movement (thread view) in OpenJDK to migrate from StringBuffer to StringBuilder, which in my opinion is fantastic not only as a means of keeping the codebase up to date in a backward compatible manner but also as a performance improvement. Note the LMAX Disruptor whitepaper, in which benchmarks showed how synchronized code can be slower than unsynchronized code even when only executed by a single thread. On a note of curious trivia: Adam Messinger, who hosted the JDK launch event on video, has left Oracle for Twitter to become VP of Application Engineering!
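For code that never shares the buffer between threads the migration is mechanical, since the two classes offer the same `append` and `toString` calls. A small sketch of a before and after (the `JoinDemo` class and its method names are made up for illustration and are not from the OpenJDK patch):

```java
// Hypothetical before/after of the StringBuffer -> StringBuilder migration.
class JoinDemo {

    // Before: every append acquires the buffer's monitor, even on one thread.
    static String joinWithBuffer(String[] parts) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    // After: same calls, no synchronization. Only safe because `sb` never
    // escapes this method and so is confined to a single thread.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }
}
```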

Java SE 8 Developer Preview with Lambda Support

The Java SE 8 developer preview has been released with lambda support. The big question now is which editors support Java 8 and lambda syntax. I wasn’t able to find explicit mention of any editors supporting it, so I suspect I’ll have to hack around in vim for a while. If you know, please comment.

Three Java releases in one day!

Oracle have just made three Java releases (or maybe I’ve only just noticed them).

The 7u1 release fixes six bugs of which two appear to be loop related (1,2). The bug with loop predication was the one that originally tainted the release of Java 7 as you might remember. If I was working for a startup or running my own company I’d move to 7u1 straight away but in a bank that’ll never happen. Still, at least, I’ve managed to persuade them to move to 1.6.0_25 due to a critical CMS fragmentation bug afflicting 1.6.0_23. 🙁

The 6u29 release appears to have skipped a build number which is justified in the release notes. This one has two bug fixes worthy of mention out of a total of five – one is where ‘java.net.InterfaceAddress’s equals method may throw NPE’ (odd) and the other is where there is a ‘Memory leak of java.lang.ref.WeakReference objects’ (how ironic!).

Regarding the Mac release, it is great to see the Mac get official recognition on the Oracle Java homepage and have the release packaged as a dmg rather than needing to be built from source. (It’s quite possible, by the way, that I’ve only just noticed this one and that it’s been there all along, or maybe I’d seen it earlier and have forgotten!) I’m liking the steady progress though, Oracle. Keep up the good work. I hope you are working on Java 8 as planned for the end of next year!

jsr166e: Upcoming java.util.concurrent APIs for Java 8

Jsr166e is to Java 8 as Jsr166y was to Java 7. Jsr166y introduced the fork join framework and Phaser to Java 7, both of which are worthy of blog posts of their own. The fork join framework will enable us to introduce fine grained inversion of concurrency whereby we can code logic without really needing to think about or implement how that logic will perform on arbitrary hardware.
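To make the fork join idea concrete, here is a minimal sketch of a divide-and-conquer sum using the Java 7 `ForkJoinPool` and `RecursiveTask` classes. The task describes only how to split the work; how many cores end up running it is left to the pool. The class name `SumTask` and the threshold value are arbitrary choices for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a slice of an array by recursively splitting it in half.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1000; // arbitrary cut-off for going sequential

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    SumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();                      // schedule the left half asynchronously
        long rightSum = right.compute();  // work on the right half in this thread
        return left.join() + rightSum;    // wait for the left half and combine
    }

    // Convenience entry point: sums the whole array on a fresh pool.
    static long sum(long[] data) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.invoke(new SumTask(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }
}
```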

Now that Java 7 has been released, jsr166e has emerged as a repository of utilities that are intended for inclusion in Java 8 next year. Having followed the discussion on the concurrency mailing list I’ve become absolutely fascinated by the work going on in jsr166e, for the simple reason that it caters for use cases that have been directly relevant to my recent work. So without further delay here is an abridged guide to the current state of jsr166e.

Collections

  • ConcurrentHashMapV8: A candidate replacement for java.util.concurrent.ConcurrentHashMap with lower memory footprint. The exact improvements of this over the old implementation I’m yet to explore. Here’s the mail thread announcement and discussion.
  • ConcurrentHashMapV8.MappingFunction: A (well overdue) mechanism for automatically computing a value for a key that doesn’t already have one. I’ve waited a long time for this and in my opinion this is the most basic requirement of a concurrent map as without this you always end up locking to create a new mapping.
  • LongAdderTable: A concurrent counting map where a key is associated with an efficient concurrent primitive counter. This provides significantly improved performance over AtomicLong under high contention as it utilises striping across multiple values. I’ve desperately needed this in my job and I am overjoyed that this has finally been written by the expert group. I’ve recently been exploring and coding up various implementations of such a class myself but I’d rather have this provided by the JDK. Again a very basic requirement and a class that’s well overdue.
  • ReadMostlyVector: Same as Vector but with reduced contention and better throughput for concurrent reads. I’m a little surprised about this one. Does anyone even use Vector anymore? Why not just replace the underlying implementations of Hashtable and Vector with more performant ones? Is there any backward compatibility constraint that’s restricting this?
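The compute-if-absent and counting-map ideas combine naturally. The sketch below does not use the jsr166e classes listed above. It assumes the API that eventually shipped in JDK 8, namely `ConcurrentHashMap.computeIfAbsent` and `java.util.concurrent.atomic.LongAdder`, and the `HitCounter` class with its `hammer` helper is hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical counting map: one striped counter per key, created on demand.
class HitCounter {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Creates the counter atomically on first use, then increments it.
    void hit(String key) {
        counts.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    long count(String key) {
        LongAdder adder = counts.get(key);
        return adder == null ? 0L : adder.sum();
    }

    // Demo helper: `threads` threads each record `hitsPerThread` hits on one key.
    static long hammer(int threads, int hitsPerThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < hitsPerThread; i++) {
                    counter.hit("GET /index");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return counter.count("GET /index");
    }
}
```

No explicit locking is needed anywhere: the map guarantees a single counter per key and the adder absorbs contended increments.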

Adders

The following adders are essentially high performance concurrent primitive counters that dynamically adapt to growing contention in order to reduce it. The key value add here is achieved by striping across values on writes and summing across the stripes on reads.

Again, high performance primitive counters are something I’ve desperately needed in my work lately. Imagine you are implementing client server protocols. You may need message sequence numbers to ensure you can discard out of order or older messages. You might also need request response id correlation, for which id generation is necessary. For any such id generation I wanted to use primitive longs for efficiency and as a result needed a high performance primitive long counter, and now I have one!

Important: these counting APIs have one limitation. There are no compound methods like incrementAndGet() or addAndGet(), which significantly reduces their utility. I can see why this is the case: although the writes can stripe across values, the read must act across all striped values and as a result is quite expensive. I therefore need to think about how much this will compromise the use of this API for the use case of an efficient id generator.

  • DoubleAdder: A high performance concurrent primitive double counter.
  • LongAdder: A high performance concurrent primitive long counter.
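The limitation is easiest to see side by side. The sketch below assumes the `LongAdder` that later shipped in `java.util.concurrent.atomic` in JDK 8 rather than the jsr166e class, and the `Counters` class is hypothetical. A statistic that many threads bump and something occasionally reads suits an adder. An id generator needs each caller to get its own value back, which is exactly the compound operation the adder lacks, so it stays on `AtomicLong`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical holder contrasting the two kinds of counter.
class Counters {
    // Statistic: written by many threads, read rarely. Striping pays off.
    private final LongAdder messagesSeen = new LongAdder();

    // Identifier source: needs an atomic increment-and-read per caller.
    private final AtomicLong nextId = new AtomicLong();

    void recordMessage() {
        messagesSeen.increment(); // returns nothing, by design
    }

    long messagesSeen() {
        return messagesSeen.sum(); // walks all stripes; not an atomic snapshot under writes
    }

    long newId() {
        return nextId.incrementAndGet(); // unique per call, even under contention
    }
}
```

Calling `increment()` and then `sum()` on the adder is not a substitute for `incrementAndGet()`, because another thread can increment between the two calls and two callers may then observe the same value.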

MaxUpdaters

The following exhibit similar performance characteristics to the adders above but instead of maintaining a count or sum they maintain a maximum value. These also use striped values for writes and reading across striped values to compute aggregate values.

  • DoubleMaxUpdater: A high performance primitive double maximum value maintainer.
  • LongMaxUpdater: A high performance primitive long maximum value maintainer.
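A running maximum can be sketched in the same style. This example does not use the jsr166e MaxUpdater classes. It assumes JDK 8’s `java.util.concurrent.atomic.LongAccumulator`, which covers the same use case by taking the combining function as a constructor argument. The `PeakLatency` class is hypothetical.

```java
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical tracker for the largest value seen so far.
class PeakLatency {
    // Long::max combines each update with the current value; Long.MIN_VALUE
    // is the identity, i.e. what get() returns before any update.
    private final LongAccumulator max = new LongAccumulator(Long::max, Long.MIN_VALUE);

    void record(long latencyMillis) {
        max.accumulate(latencyMillis);
    }

    long peak() {
        return max.get();
    }
}
```

As with the adders, updates are cheap under contention and the read combines the values held across the internal cells.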

Synchronizers

  • SequenceLock: Finally, jsr166e adds an additional synchronisation utility. This is an interesting class whose value add took me two or three reviews of the javadoc example to understand. Essentially it allows a more accommodating conversation between you and the lock provider: you can choose not to lock and still retain consistent visibility, and you can detect when other threads have been active at the same time as your logic, which lets you retry until your read of any state is completely consistent at that moment in time. I can see what value this adds and how to use it but I need to think about real world use cases for this utility.

What is still missing?

Sadly, despite the above, Java shows no signs of addressing a number of other real world use cases of mine.

  • Concurrent primitive key maps
  • Concurrent primitive value maps
  • Concurrent primitive key value maps
  • Externalised (inverted) striping utilities that allow you to hash an incoming key to a particular lock across a distribution of locks. This means that you no longer have to lock entire collections but just the lock relevant to the input you are working with. This is absolutely fundamental and essential in my opinion and has already been written by EhCache for their own use but this should ideally be provided as a building block by the JDK.
  • There’s also been a lot of talk about core-striping as opposed to lock striping which I suppose is an interesting need. In other words instead of the distribution of contention being across lock instances they are across representations (IDs) of physical processor cores. Check the mailing list for details.
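The externalised lock striping utility described above can be sketched in a few lines. This is my own minimal illustration, not EhCache’s implementation or a JDK class: the `StripedLocks` name, the `lockFor` method and the hash spreading step are all assumptions.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical striping utility: maps any key to one of a fixed set of locks.
class StripedLocks {
    private final Lock[] stripes;

    StripedLocks(int stripeCount) {
        if (stripeCount <= 0) {
            throw new IllegalArgumentException("stripeCount must be positive");
        }
        stripes = new Lock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    // Equal keys always map to the same lock; unrelated keys usually map to
    // different ones, so they rarely contend with each other.
    Lock lockFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16); // fold high bits down to spread poor hash codes
        return stripes[(h & 0x7fffffff) % stripes.length];
    }
}
```

A caller wraps work on a given key in `lockFor(key).lock()` and `unlock()` in a `finally` block, instead of locking the whole collection. The stripe count trades memory against the chance of two hot keys sharing a lock.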

Summary

I’m very excited indeed by the additions in jsr166e, not only because they have directly addressed a number of my real world use cases but also because they give an early peek at what’s to come in Java 8. The additional support for primitives is welcome as it will eliminate reliance on the ghastly autoboxing and gc churn of primitive wrappers. I’ll certainly be using these utilities for my own purposes. Keep up the great work! However, I’d love to hear why the use cases listed under ‘What is still missing?’ still haven’t seen any activity in Java.

Java 7 loop predication bugs surface and workaround known

Software and bugs always have been and always will be inseparable. Java 7 certainly wasn’t going to be the first exception to this rule. Unsurprisingly, since internal testing can never compete with the testing that takes place through mass adoption, bugs surfaced with loop predication less than a day after Java 7’s release.

Oracle are aware of the issues and Mark Reinhold has suggested a workaround while they work on fixes for an early update release. Apparently update 1 will contain security fixes only and the loop fixes are more likely to appear in update 2, though they will try to push them into update 1 if possible. Keep at it Oracle, you have our support.

Supposedly the bugs were found by Apache Lucene and Solr, but I’m sure I’m not the first to wonder why these projects neglected to test with the OpenJDK nightly snapshots and particularly with the release candidate.

Update [01/08/2011]: The following links provide a bit more insight on what happened: ‘The real story behind the Java 7 GA bugs affecting Apache Lucene / Solr’ and ‘Don’t Use Java 7? Are you kidding me?’.