
How to print assembly for your Java code in OS X

0. Write a program.

package name.dhruba;
public class Main {
  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}

1. Add JVM arguments to your program.

-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly

2. Run your program. You will probably see the output below.

Java HotSpot(TM) 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output
Could not load hsdis-amd64.dylib; library not loadable; PrintAssembly is disabled
Hello World!
Process finished with exit code 0

3. Download the missing library from here. The direct link to the lib file itself is here. I downloaded the one named ‘gnu-bsd-libhsdis-amd64.dylib’ as I’m running 64-bit. This produces a file on your system called ‘hsdis-amd64.dylib’.

4. Move it to the location where the JDK you are running looks for it. I was using Java 8.

sudo mv ~/Downloads/hsdis-amd64.dylib /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib

5. Run your program again and see assembly! The output for Hello World is huge so I can't paste all of it here, but here's the initial bit where you can see that the disassembler has been loaded.

Java HotSpot(TM) 64-Bit Server VM warning: PrintAssembly is enabled; turning on DebugNonSafepoints to gain additional output
Loaded disassembler from /Library/Java/JavaVirtualMachines/jdk1.8.0.jdk/Contents/Home/jre/lib/hsdis-amd64.dylib
Decoding compiled method 0x000000010f4c23d0:
[Disassembling for mach='i386:x86-64']
[Entry Point]
  # {method} {0x00000001280fe660} 'arraycopy' '(Ljava/lang/Object;ILjava/lang/Object;II)V' in 'java/lang/System'
  # parm0:    rsi:rsi   = 'java/lang/Object'
  # parm1:    rdx       = int
  # parm2:    rcx:rcx   = 'java/lang/Object'
  # parm3:    r8        = int
  # parm4:    r9        = int
  #           [sp+0x60]  (sp of caller)
  0x000000010f4c2540: mov    0x8(%rsi),%r10d
  0x000000010f4c2544: shl    $0x3,%r10
  0x000000010f4c2548: cmp    %r10,%rax
  0x000000010f4c254b: je     0x000000010f4c2558
  0x000000010f4c2551: jmpq   0x000000010f407b60  ;   {runtime_call}
  0x000000010f4c2556: xchg   %ax,%ax

Credit: Nitsan Wakart.

Three Java releases in one day!

Oracle have just made three Java releases (or maybe I've only just noticed them).

The 7u1 release fixes six bugs, of which two appear to be loop related (1,2). The bug with loop predication was the one that originally tainted the release of Java 7, as you might remember. If I were working for a startup or running my own company I'd move to 7u1 straight away, but in a bank that'll never happen. Still, at least I've managed to persuade them to move to 1.6.0_25 due to a critical CMS fragmentation bug afflicting 1.6.0_23. 🙁

The 6u29 release appears to have skipped a build number, which is justified in the release notes. This one has two bug fixes worthy of mention out of a total of five – one where a '…'s equals method may throw NPE' (odd) and another where there is a 'Memory leak of java.lang.ref.WeakReference objects' (how ironic!).

Regarding the Mac release, it's great to see the Mac get official recognition on the Oracle Java homepage and have the release packaged as a dmg rather than needing to be built from source. (It's quite possible, by the way, that I've only just noticed this one and it's been there all along, or maybe I'd seen it earlier and forgotten!) Anyway, I'm liking the steady progress, Oracle. Keep up the good work. I hope you are working on Java 8 as planned for the end of next year!

jsr166e: Upcoming java.util.concurrent APIs for Java 8

Jsr166e is to Java 8 as jsr166y was to Java 7. Jsr166y introduced the fork/join framework and Phaser to Java 7, both of which are worthy of blog posts of their own. The fork/join framework will enable us to introduce fine-grained inversion of concurrency whereby we can code logic without really needing to think about, or implement, how that logic will perform on arbitrary hardware.

Now that Java 7 has been released jsr166e has emerged as a repository of utilities that are intended for inclusion into Java 8 next year. Having followed the discussion on the concurrency mailing list I’ve become absolutely fascinated by the work going on in jsr166e for the simple reason that it is catering for use cases that have been directly relevant to my recent work. So without further delay here is an abridged guide to the current state of jsr166e.


  • ConcurrentHashMapV8: A candidate replacement for java.util.concurrent.ConcurrentHashMap with a lower memory footprint. I'm yet to explore the exact improvements of this over the old implementation. Here's the mail thread announcement and discussion.
  • ConcurrentHashMapV8.MappingFunction: A (well overdue) mechanism for automatically computing a value for a key that doesn’t already have one. I’ve waited a long time for this and in my opinion this is the most basic requirement of a concurrent map as without this you always end up locking to create a new mapping.
  • LongAdderTable: A concurrent counting map where a key is associated with an efficient concurrent primitive counter. This provides significantly improved performance over AtomicLong under high contention as it utilises striping across multiple values. I’ve desperately needed this in my job and I am overjoyed that this has finally been written by the experts group. I’ve been exploring and coding up various implementations of my own of such a class recently but I’d rather have this provided by the JDK. Again a very basic requirement and a class that’s well overdue.
  • ReadMostlyVector: Same as Vector but with reduced contention and better throughput for concurrent reads. I’m a little surprised about this one. Does anyone even use Vector anymore? Why not just replace the underlying implementations of Hashtable and Vector with more performant ones? Is there any backward compatibility constraint that’s restricting this?
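
As it happens, the mapping-function idea above eventually shipped in Java 8 as ConcurrentHashMap.computeIfAbsent. A minimal sketch of the pattern it enables (the topic and subscriber names here are purely illustrative):

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class ComputeIfAbsentExample {
    public static void main(String[] args) {
        ConcurrentMap<String, List<String>> subscribers = new ConcurrentHashMap<>();
        // Atomically create the list for a new key; no external locking needed.
        subscribers.computeIfAbsent("prices", topic -> new CopyOnWriteArrayList<>())
                   .add("client-1");
        // Second call finds the existing list rather than creating a new one.
        subscribers.computeIfAbsent("prices", topic -> new CopyOnWriteArrayList<>())
                   .add("client-2");
        System.out.println(subscribers.get("prices").size()); // 2
    }
}
```

The mapping function is invoked at most once per absent key, which is exactly the lock-free mapping creation the bullet above asks for.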


The following adders are essentially high performance concurrent primitive counters that dynamically adapt to growing contention to reduce it. The key value add here is achieved by striping writes across multiple values and then reading across the stripes to produce aggregate results.

Again, high performance primitive counters, are something I’ve desperately needed in my work lately. Imagine if you are implementing client server protocols. You may need message sequence numbers to ensure you can discard out of order/older messages. You might also need request response id correlation for which id generation is necessary. For any such id generation I wanted to use primitive longs for efficiency and as a result needed a high performance primitive long counter and now I have one!

Important: note one limitation of these counting APIs. There are no compound methods like incrementAndGet() or addAndGet(), which significantly reduces the utility of such an API. I can see why this is the case: although writes can stripe across values, a read must act across all striped values and as a result is quite expensive. I therefore need to think about how much this compromises the use of this API for the use case of an efficient id generator.

  • DoubleAdder: A high performance concurrent primitive double counter.
  • LongAdder: A high performance concurrent primitive long counter.
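
For those on Java 8, these adders shipped as java.util.concurrent.atomic.LongAdder and DoubleAdder. A small sketch of the counting pattern – the striped increment() is cheap under contention, while sum() must read across all stripes:

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderExample {
    public static void main(String[] args) throws InterruptedException {
        LongAdder counter = new LongAdder();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.increment(); // writes stripe across internal cells under contention
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // sum() aggregates across all stripes, so it is the expensive operation
        System.out.println(counter.sum()); // 200000
    }
}
```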


The following exhibit similar performance characteristics to the adders above but instead of maintaining a count or sum they maintain a maximum value. These also use striped values for writes and reading across striped values to compute aggregate values.

  • DoubleMaxUpdater: A high performance primitive double maximum value maintainer.
  • LongMaxUpdater: A high performance primitive long maximum value maintainer.
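
The max updaters did not ultimately ship under these names; Java 8 instead provides the more general java.util.concurrent.atomic.LongAccumulator, which covers the maximum-tracking use case when given Long::max and an identity of Long.MIN_VALUE:

```java
import java.util.concurrent.atomic.LongAccumulator;

public class MaxAccumulatorExample {
    public static void main(String[] args) {
        // Identity is Long.MIN_VALUE so any observed value replaces it.
        LongAccumulator max = new LongAccumulator(Long::max, Long.MIN_VALUE);
        max.accumulate(42);
        max.accumulate(7);
        max.accumulate(99);
        System.out.println(max.get()); // 99
    }
}
```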


  • SequenceLock: Finally, jsr166e adds an additional synchronisation utility. This is an interesting class which took me two or three reviews of the javadoc example to understand. Essentially it offers a more accommodating conversation between you and the lock provider: not only can you choose not to lock and still retain consistent visibility, but you can also detect when other threads have been active simultaneously with your logic, allowing you to retry your behaviour until your read of the state is completely consistent at that moment in time. I can see what value this adds and how to use it, but I need to think about real world use cases for this utility.
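
SequenceLock itself never made it into the JDK, but Java 8's java.util.concurrent.locks.StampedLock offers the same conversational pattern described above: read optimistically without locking, then validate that no writer intervened and fall back (or retry) if one did. A minimal sketch:

```java
import java.util.concurrent.locks.StampedLock;

public class OptimisticReadExample {
    private final StampedLock lock = new StampedLock();
    private double x = 1.0, y = 2.0;

    double sum() {
        long stamp = lock.tryOptimisticRead(); // no blocking on the read path
        double cx = x, cy = y;
        if (!lock.validate(stamp)) {           // a writer was active; fall back to a real read lock
            stamp = lock.readLock();
            try {
                cx = x;
                cy = y;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return cx + cy;
    }

    public static void main(String[] args) {
        System.out.println(new OptimisticReadExample().sum()); // 3.0
    }
}
```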

What is still missing?

Sadly, despite the above, Java shows no signs of addressing a number of other real world use cases of mine.

  • Concurrent primitive key maps
  • Concurrent primitive value maps
  • Concurrent primitive key value maps
  • Externalised (inverted) striping utilities that allow you to hash an incoming key to a particular lock across a distribution of locks. This means that you no longer have to lock entire collections but just the lock relevant to the input you are working with. This is absolutely fundamental and essential in my opinion and has already been written by EhCache for their own use but this should ideally be provided as a building block by the JDK.
  • There’s also been a lot of talk about core-striping as opposed to lock striping which I suppose is an interesting need. In other words instead of the distribution of contention being across lock instances they are across representations (IDs) of physical processor cores. Check the mailing list for details.
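
To illustrate the externalised striping idea, here is a minimal hand-rolled sketch (the class name and the hash-spreading step are my own choices for illustration, not a proposed JDK API):

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * A minimal sketch of externalised (inverted) striping: hash a key to one lock
 * out of a fixed power-of-two sized array, so unrelated keys rarely contend.
 */
public class StripedLocks {
    private final ReentrantLock[] stripes;

    public StripedLocks(int powerOfTwoSize) {
        stripes = new ReentrantLock[powerOfTwoSize];
        for (int i = 0; i < powerOfTwoSize; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    /** Spread the hash bits, then mask into the power-of-two sized array. */
    public ReentrantLock lockFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return stripes[h & (stripes.length - 1)];
    }

    public static void main(String[] args) {
        StripedLocks locks = new StripedLocks(16);
        ReentrantLock lock = locks.lockFor("some-key");
        lock.lock();
        try {
            // the same key always resolves to the same stripe
            System.out.println("stable: " + (lock == locks.lockFor("some-key")));
        } finally {
            lock.unlock();
        }
    }
}
```

The caller locks only the stripe relevant to its input rather than an entire collection, which is exactly the building block the bullet above describes.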


I'm very excited indeed by the additions in jsr166e, not only because they directly address a number of my real world use cases but also because they give an early peek at what's to come in Java 8. The additional support for primitives is welcome as it will eliminate reliance on ghastly autoboxing and the gc churn of primitive wrappers. I'll certainly be using these utilities for my own purposes. Keep up the great work! However, I'd love to hear why the use cases under 'What is still missing?' above still haven't seen any activity in Java.

Java 7 loop predication bugs surface and workaround known

Software and bugs always have been and always will be inseparable. Java 7 certainly wasn’t going to be the first exception to this rule. Unsurprisingly, since internal testing can never compete with the testing that takes place through mass adoption, in less than a day after Java 7’s release bugs surfaced with loop predication.

Oracle are aware of the issues and Mark Reinhold has suggested a workaround while they work on fixes for an early update release. Apparently update 1 will be security fixes only and the loop fixes are more likely to appear in update 2, though they will try to push them into update 1 if possible. Keep at it Oracle – you have our support.

Supposedly the bugs were found by Apache Lucene and Solr but I’m sure I’m not the first to wonder why these projects neglected to test with all the openjdk nightly snapshots and particularly with the release candidate.

Update [01/08/2011]: The following links provide a bit more insight on what happened: 'The real story behind the Java 7 GA bugs affecting Apache Lucene / Solr' and 'Don't Use Java 7? Are you kidding me?'.

Java 7 released!

As if you didn’t know – Java 7 is released (1, 2, 3). As the linked post says it’s been a long five years but hopefully more regular release cycles and expert innovation of the kind we’ve already seen in Java 7 will become the norm and turn the droves of skeptics, cynics and deserters back to believing in Java and the JVM as the supreme platform.

The delay hasn't been all bad. In fact I think it's been quite positive in many ways. The lack of growth in Java has fostered innovation in JVM languages to plug the inadequacies while also creating new things. It's also spurred its loyal users to do more with less, explore alternative languages and paradigms, and contribute back to Java what they've learnt. And in Java 8, with Project Lambda, Java now has the benefit of hindsight: it can examine, for example, how Scala and Clojure have done things and take the best of all worlds, but at the same time it will need to compete effectively with other languages both on and outside of the JVM. The ubiquitous nature of Java means that it must grow and compete in all directions to continue to be so.

This is, in my opinion, as I'm sure you realise – if you look back at what's gone on in the past year and what is provisionally to come in the next year or two – only the beginning. With Oracle heading Java this is now very much a commercial endeavour, and with the first release out the audience is more unrelenting than ever and eagerly awaiting the next.

Paul Graham – Beating the averages

Recently, having been prodded sufficiently by fellow enthusiasts, I’d been looking into the rationale behind Clojure amidst the ongoing explosion of dynamic languages on the jvm. And while I was looking into that, somehow, I came across numerous sites linking to the essay by Paul Graham – Beating the averages. Today I finished reading it and I have to say it was a fascinating, captivating and thought provoking read. Paul Graham is an adept writer. Yes I’m late to this scene. I’ve never really paid much attention to his essays in the past but I guess I’m finding myself going back in time in some ways now.

Also being one of the most persuasive pieces of prose speaking in favour of a language I’ve ever read – it made me incredibly curious about Lisp of which Clojure is a derivative. I think, at this stage, simply as a result of having read that post, there is a real danger of me looking into Lisp along the way which, if nothing else, should at least give me some insight into why the jvm developers Joe Darcy, John Rose and Mark Reinhold have been so enamoured with it and why they take so much inspiration from it.

The Java 7 launch party videos also made numerous mentions of Scala and Clojure, which you should watch if you haven't already. The Q&A video at the end is the one I'm referring to here, but I'd recommend watching them all in order. Anyway, you should read Paul Graham's essay simply to provoke thought if for no other reason.

If you’re an avid Paul Graham follower which essay is your favourite that you’d recommend?

JProfiler 7 released

Aside from knowledge of the JVM, the profiler is another critical tool to have in our toolbox. It pays to know how to use profilers and how to leverage new innovations in them to gain insight into what your code is doing. Speaking of which, JProfiler 7 has been released with a number of rather interesting features.

It’s interesting that they’ve chosen to add higher level introspection features such as with jdbc. I’ve always felt that introspection tools shouldn’t just regurgitate verbatim whatever they are analysing but should in addition draw conclusions from such lower level details and offer higher level insights.

A classic example of this is the Eclipse Memory Analyser Tool. It doesn't just give you a listing of all objects in the heap like most other analysers. It reviews the heap contents and is able to draw conclusions and create higher level reports, such as leak suspect and dominator reports, which not only give you the required numbers but also graphical representations, so that you can instantly tell what's going wrong and then drill down into the details. I think this is the direction in which introspection tools in general should be heading.

Exposing JMX attributes and operations in Java

Something I’m working on currently in my spare time requires me to expose attributes and operations over JMX programmatically without the use of Spring. And I jumped at the opportunity to do a quick post on how to do so.

The general steps are as follows.

  • Write an interface.
  • Write an implementation.
  • Expose over JMX using JMX API.
  • View using JMX Client!

Below I provide a simple but complete example.

Write an interface.

Note that the interface name ends in 'MBean'. This isn't essential; it's a way of telling the JMX API that you are coding by convention. You can call the interface whatever you like, but then you'll have to use the JMX API in a slightly different way. Personally I prefer arbitrary naming.

package test;

public interface UserMBean {

    public enum Mood {
        HAPPY, SAD
    }

    int getAge();

    void setAge(int age);

    String getName();

    void setName(String name);

    String getMood();

    void makeSad();
}


Write an implementation.

Note that the class name here is the same as the interface name but without ‘MBean’. Again this is coding by JMX convention but isn’t essential.

package test;

public class User implements UserMBean {

    private int age;
    private String name;
    private Mood mood;

    public User(String name, int age, Mood mood) {
        this.name = name;
        this.age = age;
        this.mood = mood;
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public String getMood() {
        return mood.toString();
    }

    public void makeSad() {
        mood = Mood.SAD;
    }
}


Expose over JMX by convention

Here we just pass the user to the JMX API. JMX checks that we are either following the coding convention or that we are passing the interface explicitly.

package test;

import java.lang.management.ManagementFactory;

import javax.management.MBeanServer;
import javax.management.ObjectName;

import test.UserMBean.Mood;

public class JmxExampleByConvention {

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName id = new ObjectName("name.dhruba.test:type=test1");
        User user = new User("dhruba", 32, Mood.HAPPY);
        server.registerMBean(user, id);
        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive so a JMX client can connect
    }
}


If you violate the naming convention you will get an exception like the one below.

Exception in thread "main" MBean class test.DefaultUser does not implement DynamicMBean, neither follows the Standard MBean conventions ( Class test.DefaultUser is not a JMX compliant Standard MBean) nor the MXBean conventions ( test.DefaultUser: Class test.DefaultUser is not a JMX compliant MXBean)
	at com.sun.jmx.mbeanserver.Introspector.checkCompliance(
	at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(
	at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(
	at test.JmxExampleByConvention.main(

Expose over JMX by configuration

Here is the use of the JMX api if we do not wish to use conventional naming for our classes and want to call our classes whatever we want. In this case we have to pass the interface explicitly to JMX.

package test;

import java.lang.management.ManagementFactory;

import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

import test.UserMBean.Mood;

public class JmxExampleByConfiguration {

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName id = new ObjectName("name.dhruba.test:type=test1");
        User user = new User("dhruba", 32, Mood.HAPPY);
        StandardMBean mbean = new StandardMBean(user, UserMBean.class);
        server.registerMBean(mbean, id);
        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive so a JMX client can connect
    }
}


View using JMX Client

When you start a JMX client like JVisualVM or JConsole you should initially see some attributes.

Jmx attributes

You can then double click the value cells in JVisualVM to change them or invoke an operation in the Operations tab.

JMX operations

Having done so you’ll end up with new values in the Attributes tab.

JMX attributes

And that’s it. This trick is useful to expose attributes and operations over JMX without the use of Spring and without needing to make your classes Spring beans. Thanks for reading.
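
As a footnote, the same in-process MBeanServer can act as its own 'client': you can read attributes, write them and invoke operations programmatically via the server API. A self-contained sketch (the Counter MBean here is a hypothetical example of my own, not the User class above):

```java
import java.lang.management.ManagementFactory;

import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxClientExample {

    // Standard MBean convention: interface name = class name + "MBean".
    public interface CounterMBean {
        int getCount();
        void setCount(int count);
        void reset();
    }

    public static class Counter implements CounterMBean {
        private int count = 5;
        public int getCount() { return count; }
        public void setCount(int count) { this.count = count; }
        public void reset() { count = 0; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName id = new ObjectName("name.dhruba.test:type=counter");
        server.registerMBean(new Counter(), id);

        // Read an attribute, write it, and invoke an operation, all via the server API.
        System.out.println(server.getAttribute(id, "Count"));   // 5
        server.setAttribute(id, new Attribute("Count", 10));
        System.out.println(server.getAttribute(id, "Count"));   // 10
        server.invoke(id, "reset", null, null);
        System.out.println(server.getAttribute(id, "Count"));   // 0
    }
}
```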

LMAX disruptor framework and whitepaper

This is really old news now as I’m very late in posting it but since I’m still coming across people who have remained blissfully unaware I thought this was worth re-iterating. If you haven’t come across this yet drop everything else and read about the LMAX Disruptor framework and the associated whitepaper titled Disruptor: High performance alternative to bounded queues for exchanging data between concurrent threads. There is also an associated (and rather dated now) infoq presentation titled How to Do 100K TPS at Less than 1ms Latency.

In the beginning there was a main thread of execution, then came two and then thousands. Once we had scaled to starvation with threads came SEDA and the concept of queues, hierarchical topologies of queues and lots of writers and readers operating on queues with threads now relegated to second class citizen status. For a while the industry rested in the assurance that it had achieved equilibrium with innovation on latency. Then – out of the blue LMAX happened. LMAX (London Multi Asset eXchange) are the highest performance financial exchange in the world.

Read the whitepaper to find out just how outdated conventional wisdom on concurrent queuing in Java actually is, and how a lack of awareness of how your financial code performs end-to-end, from hardware to VM, could be creating bottlenecks for your platform. The essence of the disruptor framework is a strikingly simple concept but at the same time profound, not only in its effectiveness in attaining its goal – reducing latency – but also in the extent to which it leverages knowledge of the hardware and the Java virtual machine that it runs on.

It proves wrong beyond doubt the rather outdated mindset that questions employing Java for financial low latency use cases. Ever since Java 5 and particularly Java 6 – the JVM has dwarfed the Java language in its importance, capabilities and scope and as a result utilising Java is now fundamentally synonymous with utilising the JVM which is what makes the language so compelling.

It isn’t about the code that you write. It’s about the code that’s interpreted and then runs natively. It is naive to consider only the language as many seem to be doing in the light of the imminent release of Java 7. It’s important to bear in mind that whilst language sugar is important if runtime matters to you then you’ll want to focus on: (1) the VM (2) writing wholly non-idiomatic Java and (3) opposing conventional wisdom at every level of abstraction every step of the way.

Performance pattern: Modulo and powers of two

The modulo operator is rarely seen but does occur in certain vital use cases in Java programming. I've been seeing it a lot in striping, segmenting, pipelining and circularity use cases lately. The normal and naive implementation is as below.

    public static int modulo(int x, int y) {
        return x % y;
    }

Recently I saw the following pipelining logic in quite a few places in the codebase on a fast runtime path. This essentially takes an incoming message and uses the following to resolve a queue to enqueue the message onto for later dequeuing by the next stage of the workflow.

int chosenPipeline = input.hashCode() % numberOfPipelines;

It is little known, however, that on most hardware the division operation, and as a result the modulo operation, can actually be quite expensive. You'll notice, for example, that modulo is never used in hash functions. Have you ever asked why? The reason is that there is a far quicker alternative: bitwise AND. This does involve making a small compromise in the inputs to the original problem, however. If we are willing to always supply y in the above method as a power of two, we can do the following instead.

    public static int moduloPowerOfTwo(int x, int powerOfTwoY) {
        return x & (powerOfTwoY - 1);
    }

This is dramatically quicker. To give you some stats which you should be asking for at this point see the table below.

Iterations    Modulo (ms)    PowerOfTwoModulo (ms)
10^4 * 32     5              1
10^5 * 32     24             4
10^6 * 32     237            54
10^7 * 32     2348           549
10^8 * 32     30829          5504
10^9 * 32     320755         54947

You might be thinking at this point – if I’m expecting a power of two I should validate the input to make sure it is so. Well, that’s one viewpoint. The other is, if you’re supplying y or if it is statically configured at startup then you can make sure it is a power of two without taking the performance hit of a runtime check. But if you really want to check here’s how to do so.

    public static boolean isPowerOfTwo(int i) {
        // note: this also returns true for zero, so callers should ensure i > 0
        return (i & (i - 1)) == 0;
    }

So the next time you're writing something that falls into one of the above use cases, or any other use of the modulo operator, and your method needs to be fast at runtime for one reason or another, consider the faster alternative. Certainly for price streaming (which is what I'm doing) latency matters! It would be interesting to check whether the JIT compiler actually makes this optimisation by substitution for you automatically when the divisor is a constant power of two. If so, one can stick with the slower alternative for better readability.
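
As a quick sanity check, the two forms can be verified to agree – but only for non-negative inputs, which is a caveat worth knowing: Java's % keeps the sign of the dividend, while the bitwise AND always yields a non-negative result.

```java
public class ModuloCheck {
    public static void main(String[] args) {
        int y = 16; // must be a power of two for the trick to hold
        for (int x = 0; x < 1000; x++) {
            if ((x % y) != (x & (y - 1))) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        // Beware negative inputs: % keeps the sign of the dividend, & does not.
        System.out.println(-5 % 16);   // -5
        System.out.println(-5 & 15);   // 11
    }
}
```

For hashCode-based pipelining as in the snippet above, this means the hash should be spread and masked to non-negative before indexing.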

The intelligent reader might say that in any such typical modulo use case the use of a bounded data structure, and the resulting contention, will far outweigh the cost of the modulo operation – and the reader would be right in saying so – but that's another problem space entirely that I intend to explore separately. In short: there's no need to be limited by a bounded data structure 🙂