Performance pattern: Modulo and powers of two

The modulo operator is rare but does occur in certain vital use-cases in Java programming. I’ve been seeing it a lot in striping, segmenting, pipelining and circularity use cases lately. The normal and naive implementation is as below.

    public static int modulo(int x, int y) {
        return x % y;

Recently I saw the following pipelining logic in quite a few places in the codebase on a fast runtime path. This essentially takes an incoming message and uses the following to resolve a queue to enqueue the message onto for later dequeuing by the next stage of the workflow.

int chosenPipeline = input.hashCode() % numberOfPipelines

It is little known, however, that on most hardware the division operation and as a result the modulo operation can actually be quite expensive. You’ll notice that for example modulo is never used in hash functions for example. Have you ever asked why? The reason is that there is a far quicker alternative: bitwise AND. This does involve making a small compromise however in the inputs to the original problem. If we are willing to always supply y in the above method as a power of two we can do the following instead.

    public static int moduloPowerOfTwo(int x, int powerOfTwoY) {
        return x & (powerOfTwoY - 1);

This is dramatically quicker. To give you some stats which you should be asking for at this point see the table below.

Iterations Modulo (ms) PowerOfTwoModulo (ms)
10*4*32 5 1
10*5*32 24 4
10*6*32 237 54
10*7*32 2348 549
10*8*32 30829 5504
10*9*32 320755 54947

You might be thinking at this point – if I’m expecting a power of two I should validate the input to make sure it is so. Well, that’s one viewpoint. The other is, if you’re supplying y or if it is statically configured at startup then you can make sure it is a power of two without taking the performance hit of a runtime check. But if you really want to check here’s how to do so.

    public static boolean isPowerOfTwo(int i) {
        return (i & (i - 1)) == 0;

So the next time you’re writing something that falls into one of the above use cases or any other for the modulo operator and your method needs to be fast at runtime for one reason or another consider the faster alternative. Certainly for price streaming (which is what I’m doing) latency matters! It would be interesting to check whether the java compiler actually makes this optimisation by substitution for you automatically. If so one can stick with the slower alternative for better readability.

The intelligent reader might say that in any such typical modulo use case the use of a bounded data structure and the resulting contention will far outweigh the costs of the modulo operation and the reader would be right in saying so but that’s another problem space entirely that I intend to explore separately. In short – there’s no need to be limited by a bounded data structure 🙂

1 thought on “Performance pattern: Modulo and powers of two”

Leave a Reply