| By Paul Murray | Article Rating: |
|
| December 15, 2009 10:15 AM EST | Reads: |
2,146 |
Java Development on Ulitzer
Writing meaningful Java benchmarks is a tricky business. It's well known that the Java Virtual Machine's just in time (JIT) compilation process means that running an application for a few seconds won't let you predict the performance of the application over hours or days of uptime. In spite of this, developers often rely on micro-benchmarks to set performance SLAs for their applications.
Micro-benchmarks test some small, discrete component of an application. They're usually written in an effort to benchmark a component considered critical to the app's overall performance. Here's a typical example, summing all the numbers from one to a specified limit:
long accumulatedTotal(int limit) {
long result=0L;
for (int i=1; i<=limit; i++) {
result += i;
}
return result;
}
How long does this method take to execute for different values of limit? On my 32-bit OpenSolaris machine using Java 6u16 the first call to this method shows a non-linear runtime for small values of limit:

The method is initially executed by an interpreter and the JIT compiler doesn't kick in until the JVM detects that the method is doing a lot of looping. It then takes some time to compile the method, during which time the loop continues to run under the interpreter. Once this process is complete, the JVM can jump into the compiled version of the method and the average runtime for the remaining iterations is greatly reduced. The upshot is that
for small values of limit, the runtime is non-linear, but as limit gets much larger it becomes much more predictable.
Benchmarks like SPECjvm2008 know about this quirk in the behavior of the JVM and try to account for it. SPECjvm2008 repeatedly runs its benchmark code for a fixed "warm-up" period - typically 120 seconds. The warm-up time must be long enough to allow compilation of all the performance-critical methods in the benchmark code. You can observe the compilation process in the JVM by adding the Java command-line option -XX:+PrintCompilation. The benchmark then continues to execute for a "run" period - for SPECjvm2008 the default duration is 240 seconds - and the benchmark result is the number of iterations of the benchmark that were completed per minute of the run period.
But does this approach really predict application performance? As so often with the JVM, it turns out to be more complicated than you expect.
A modern JVM such as Sun's Java 6 contains a JIT compiler that is capable of generating native code every bit as efficient as that produced by an ahead-of-time (AOT) C++ compiler. Among the many optimizations performed by the JIT is method inlining. Calling small methods can incur a large overhead - for example, Java classes routinely contain getter and setter methods that simply read or write the value of a variable. The cost of calling a trivial method can be high relative to the work performed by the method, so the JIT will attempt to copy the body of a small method into its caller.
Consider a simple getter method like this:
Class Limit {
private int limit;
public Limit(int limit) {
this.limit = limit;
}
public int getLimit() {
return limit;
}
}
If this is called from a loop like this, where "o" is an instance of class Limit:
for (int i=0; i<o.getLimit(); i++) {
// Do something
}
then it would be nice to inline the getLimit() method so that the resulting code looks like this:
for (int i=0; i<o.limit; i++) {
// Do something
}
In an AOT compiler, this inlining might not be possible. For example, the object "o" might not be an instance of Limit but of some subclass of Limit that overrides the getLimit() method. Because it compiles with knowledge of runtime behavior and can throw away compiled code if necessary, the JIT can decide to inline Limit.getLimit(). If "o" is ever seen to be an instance of a subclass of Limit, then the compiled code can be discarded and the loop recompiled appropriately.
Let's take a look at the Arrays.sort(Object[] a) method. The javadocs say that all elements in the array must implement the Comparable interface. Arrays.sort() make many calls to the compareTo() method of the elements it is sorting. The compareTo() method in class Integer is very simple:
public int compareTo(Integer anotherInteger) {
int thisVal = this.value;
int anotherVal = anotherInteger.value;
return (thisVal<anotherVal ? -1 : (thisVal==anotherVal ? 0 : 1));
}
If you only ever call Arrays.sort() on arrays consisting entirely of Integers, the JIT compiler can detect this and inline the Integer.compareTo() method into the sort().
What happens if you later call Arrays.sort() to sort an array of instances of class Short? The compiled code performs a quick check on each array element to ensure that it is an Integer. When this check fails, the compiled code is discarded and Arrays.sort() is recompiled to account for the new set of observed array element types.
The updated compiled code will now inline both Integer.compareTo() and Short.compareTo(). The decision about which inlined method to execute will be taken by explicitly checking whether the array element is of class Integer or Short. Even with two different implementors of the Comparable interface, the JIT compiler is still able to perform method inlining.
Let's go even further and call Arrays.sort() on an array of instances of class Byte. If the JIT compiler now inlines Byte.compareTo() as well, we'll have three different inlined versions of this method in the Arrays.sort() code. This is all starting to get out of hand!
Now the JIT compiler changes strategy. It throws away all its previous inlining of the compareTo() method and does a traditional vtable dispatch just like a static compiler would do. For very small methods such as Integer.compareTo(), this can have a significant performance impact as the time spent calling and returning from such a tiny method may be far greater than the time spent executing its code.
The limitation that only two receivers of a virtual method dispatch can be inlined is known as the bimorphic inlining policy and is a poorly understood but significant limitation on JIT performance.
Listing 1 contains a benchmark program that shows this effect in action.
The test code calls Arrays.sort() five times. First it calls it on an array of Byte objects. This is simply a "warm up" and the results are ignored. It then calls Arrays.sort() on an array of Bytes, then on an array of Shorts, then an array of Integers, and finally Arrays.sort() is called for a second time on an array of Bytes.
Here's the result of running this benchmark using 32-bit Java 6u16 on my OpenSolaris machine. For this test the arrays to be sorted have 100,000 elements:
Java SortBench 100000
(WARMUP) BYTE SORT PERFORMANCE = 1778 operations per minute
(FIRST PASS) BYTE SORT PERFORMANCE = 2034 operations per minute
SHORT SORT PERFORMANCE = 1623 operations per minute
INTEGER SORT PERFORMANCE = 1261 operations per minute
(SECOND PASS) BYTE SORT PERFORMANCE = 1307 operations per minute
As you can see, the warm-up performance is - as expected - a bit lower than for the first pass of the byte sort test. However the performance of subsequent sort tests falls away quite sharply and the performance of the second byte sort test is only 64% of the performance seen first time round.
For complex server-side applications where it's likely that Arrays.sort() will be called on by many different classes, the real performance when sorting 100,000 random byte instances is likely to be about 1,307 operations per minute, not 2,034 operations per minute as the "standard" benchmarking approach suggests.
This limitation is also important in attempting to benchmark Scala code. Scalac - the Scala compiler - rewrites Scala code blocks as standard Java classes that implement a special Scala interface. This interface requires a method called apply() that executes the code block. As you create code blocks in Scala, you are actually creating tiny Java classes that implement an interface. This is the same as the scenario when calling Arrays.sort() with different implementors of Comparable.
Take a look at the Scala benchmark in Listing 2. The benchmark repeatedly sums all the numbers from 1 to 1 million for a specified period of time and then prints out the number of iterations completed and the accumulated result.
Notice that the same code block is used for all executions of the benchmark. This looks more straightforward than the Java case seen previously. But here's the output from this program:
scala Test
WARM-UP RUN : 7628 operations per minute
FIRST RUN : 7713 operations per minute
SECOND RUN : 6185 operations per minute
THIRD RUN : 6134 operations per minute
Even though the code block is identical in all four cases, each code block is handled inside its own unique class. Once again the effect of the bimorphic inlining policy in the JIT compiler is to generate more performant code for the runBench method when only one or two implementors of the interface have been seen.
The bimorphic inlining policy is a great example of why micro-benchmarking isn't just hard in Java - it's all but impossible. Even if you know about this issue and work around it, the probability is that there will be something else in the JVM to trip you up and give you phony results. Generating masses of micro-benchmark numbers can be worse than useless - they give the erroneous impression of scientific accuracy that can lead to performance SLAs that can't be met in production.
Bottom line - don't use micro-benchmarks as predictors of application performance. Run your app for real and profile, then profile again, and then profile some more!
Published December 15, 2009 Reads 2,146
Copyright © 2009 Ulitzer, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Paul Murray
Paul Murray is a freelance specialist in Java performance analysis. He was previously a Technology Fellow at a leading Wall Street Investment Bank where he provided Java performance and scalability consultancy to several thousand Java developers. Paul was also a Member of Technical Staff at Silicon Graphics, working on dynamic compiler implementation and porting Sun's JRE to the SGI IRIX platform.
- Reflections on Java Command Line Options
- Java vs C++? Really?
- Cloud Computing Bootcamp Returns to Cloud Expo in New York April 20, 2010
- 101 on jQuery Selector Performance
- Sun Finally Belongs to Oracle
- ZIP and UNZIP with Passwords in Java
- EC Antitrust Chief’s Job Ambitions Reportedly Delay Oracle-Sun OK
- Developing a Master-Detail View
- eXtremeDB Gets Java Front End
- Browsing Database Artifacts Using Data Source Explorer
- Is This the End of Enterprise Software?
- New Beginnings: It's Time for the Yearly ‘State of the JCP’ Review
- Reflections on Java Command Line Options
- Java vs C++? Really?
- What is Enterprise Cloud Computing?
- Oracle Claims Victory Over EC; Says Sun Will Sell Clouds
- Why Cops and Java Developers Have Low Salaries?
- Making the Impossible Easy: Failover for Any Application
- Cloud Computing Bootcamp Returns to Cloud Expo in New York April 20, 2010
- 101 on jQuery Selector Performance
- Oracle-Sun: MySQL Creator Kicks Off Worldwide Petition
- Case Study in Secure Software Development
- Sun Finally Belongs to Oracle
- ZIP and UNZIP with Passwords in Java
- Secrets Of The Masters: Core Java Job Interview Questions
- Java Developer's Journal Exclusive: 2006 "JDJ Editors' Choice" Awards
- The i-Technology Right Stuff
- Creating Web Applications with the Eclipse Web Tools Project
- Rich Internet Applications with Adobe Flex 2 and Java
- Java vs C++ "Shootout" Revisited
- Who Are The All-Time Heroes of i-Technology?
- Creating a Pet Store Application with JavaServer Faces, Spring, and Hibernate
- What's New in Eclipse?
- Developing Web Services "Eclipse Web Tools Project"
- SYS-CON Media Readers Cast More Than 4,000 Votes In First Week Of Voting
- Java Feature — Bringing Together Eclipse,WTP, Struts, and Hibernate





















Ulitzer content is offered under Creative Commons "Attribution Non-Commercial No Derivatives" License.
For any reuse or distribution, you must make clear to others the license terms of this work.
The best way to do this is with a link to this web page.
Any of the above conditions can be waived if you get written permission from Ulitzer, Inc., the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.