Maximum Zeal ~ Emphatic prose on indulged fascinations

Concurrency Pattern: Concurrent Set implementations in Java 6

An interesting question came up recently on the JSR 166 concurrency-interest mailing list which I felt was worthy of mention. Why is there no ConcurrentHashSet equivalent of the ConcurrentHashMap data structure, and how does one achieve the concurrency and performance characteristics of the latter while maintaining the uniqueness semantics of the former? Currently there are a few different ways of making a Set concurrent.

  • CopyOnWriteArraySet
  • ConcurrentSkipListSet
  • Collections.synchronizedSet(Set<T> s)
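
For reference, here is a minimal sketch of constructing each of the three. The class and method names (ExistingConcurrentSets, copyOnWrite, skipList, synchronizedHashSet) are made up for illustration; only the JDK types are real.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CopyOnWriteArraySet;

class ExistingConcurrentSets {

    // Copies the whole backing array on every write; suited to small, read-mostly sets.
    static Set<String> copyOnWrite() {
        return new CopyOnWriteArraySet<String>();
    }

    // Sorted set backed by a concurrent skip list; elements need a natural order
    // (or a Comparator passed to the constructor).
    static Set<String> skipList() {
        return new ConcurrentSkipListSet<String>();
    }

    // Wraps any Set; every method call synchronizes on one shared lock.
    static Set<String> synchronizedHashSet() {
        return Collections.synchronizedSet(new HashSet<String>());
    }

    public static void main(String[] args) {
        Set<String> sorted = skipList();
        sorted.add("b");
        sorted.add("a");
        System.out.println(sorted); // prints [a, b]
    }
}
```

Note that a set returned by Collections.synchronizedSet still requires the caller to synchronize on it manually while iterating.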

However, none of these exhibits the lock-striped, minimally blocking, high-concurrency characteristics of a ConcurrentHashMap. In actual fact, all Set implementations in the JDK, with the exception of EnumSet, are little more than wrappers around a backing implementation of a different kind. Here are the Set implementations with their corresponding backing implementations in brackets.

  • HashSet (HashMap)
  • TreeSet (TreeMap)
  • LinkedHashSet (LinkedHashMap)
  • CopyOnWriteArraySet (CopyOnWriteArrayList)
  • ConcurrentSkipListSet (ConcurrentSkipListMap)
  • JobStateReasons (HashMap, via its superclass HashSet)

Following on from that, for those map implementations that do not already have a Set equivalent, the JDK from version 1.6 onwards provides Collections.newSetFromMap, which lets you create a set with your own choice of backing map implementation. For example, one can create a ConcurrentHashSet or a WeakHashSet simply by doing the following.

Collections.newSetFromMap(new ConcurrentHashMap<Object, Boolean>())
Collections.newSetFromMap(new WeakHashMap<Object, Boolean>())
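
As a minimal sketch of how such a set behaves under concurrent writes, the following wraps a ConcurrentHashMap and adds the same values from several threads. The names ConcurrentHashSetDemo, newConcurrentHashSet and fillConcurrently are hypothetical helpers for illustration, not JDK API.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class ConcurrentHashSetDemo {

    // Hypothetical helper: a Set view over a fresh, empty ConcurrentHashMap.
    static <T> Set<T> newConcurrentHashSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>());
    }

    // Every thread adds the same integers 0..perThread-1, so duplicates must collapse.
    static Set<Integer> fillConcurrently(int threads, final int perThread) {
        final Set<Integer> set = newConcurrentHashSet();
        final CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < perThread; i++) {
                        set.add(i);
                    }
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await(); // wait for all writer threads to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> set = fillConcurrently(4, 1000);
        System.out.println(set.size()); // prints 1000
    }
}
```

The map passed to Collections.newSetFromMap must be empty when it is handed over, and it should not be accessed directly afterwards.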

With this knowledge, some may opt to use the underlying map implementations directly. That is a trade-off between a purely theoretical performance optimisation (skipping one thin layer of delegation) and making your choice of collection a semantically correct one.
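
To make the trade-off concrete, here is a small sketch that performs the same "add if not already present" operation in both styles. The class and method names (SetVersusMapDemo, addViaMap, addViaSet) are hypothetical and exist only for this comparison.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SetVersusMapDemo {

    // Map used directly: the Boolean value is a placeholder the caller has to manage.
    static boolean addViaMap(ConcurrentMap<String, Boolean> map, String element) {
        // putIfAbsent returns null only when the key was not already present.
        return map.putIfAbsent(element, Boolean.TRUE) == null;
    }

    // Set view: the same operation expressed with Set semantics.
    static boolean addViaSet(Set<String> set, String element) {
        return set.add(element);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Boolean> map = new ConcurrentHashMap<String, Boolean>();
        Set<String> set = Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());

        System.out.println(addViaMap(map, "a")); // true
        System.out.println(addViaMap(map, "a")); // false
        System.out.println(addViaSet(set, "a")); // true
        System.out.println(addViaSet(set, "a")); // false
    }
}
```

Both versions behave the same way here; the Set version simply states the intent in its type, which is the semantic argument made above.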

9 Responses to Concurrency Pattern: Concurrent Set implementations in Java 6

  1. Paul says:

    Thanks for this list. I find it very useful. :)

  2. Evgeny says:

    thanks for your post. That’s exactly what I’ve been looking for

  3. Siggy says:

    Big thanks; a very useful method.

  4. Pip says:

    very nice! thanks.

  5. JamesD says:

    Awesome stuff. Thanks!

  6. Semyon Ch says:

    Hello Dhruba!
    7th result in google search on query “ConcurrentHashSet”!
    On the first page!
    Of course, your blog was the first link I navigated to as I was sure there would be a clear answer!

  7. Thanks Semyon! I’m really glad I could help. And when readers provide positive feedback it makes all the effort worthwhile. Don’t forget: if you have the option to use the NonBlockingHashSet then use it instead. It will be faster than the JDK equivalent for most use cases.

    • Semyon Ch says:

      NonBlockingHashSet is good for “add” but it is very slow when it comes to iteration. Profiling (JProfiler 7) shows that iteration over NonBlockingHashSet is around 10 times slower than iteration over the set built on ConcurrentHashMap.
      We replaced NonBlockingHashSet with the latter one.

  8. Hi Semyon! That is an excellent point. And well done for profiling. So few people do it these days; everyone is always complaining they have no time. I do miss the more interesting work along with you guys. So, yes, I agree. The key point about Cliff Click’s collections, as has often been mentioned by Doug Lea himself on the concurrency mailing list (http://goo.gl/ATpmV as an example), is that they are better for certain usage patterns, with the JDK equivalents being better for others.

    It’s amazing how many people don’t know about the trick in this post. This must be one of the most popular posts on the blog. The entire codebase I’m working with at work is using CopyOnWriteArraySet as a concurrent set everywhere. Imagine that! :-) Luckily most uses of Set for us are only to store listeners (observer pattern). I’m also looking forward to the new ConcurrentHashMap coming in Java 8 (http://goo.gl/2cHFJ). Check it out!