IBM WebSphere Application Server Performance Cookbook

Oracle (HotSpot) Java

Oracle (HotSpot) Java Recipe

In addition to the overall recipe in the Java chapter:

In most cases, the -XX:+UseParallelOldGC garbage collection policy works best, with the key tuning being the maximum heap size (-Xmx) and maximum new generation size (-XX:MaxNewSize).
Set -XX:+HeapDumpOnOutOfMemoryError.
When using ergonomics, consider tuning -XX:MaxGCPauseMillis and -XX:GCTimeRatio.
When fine-tuning is required, consider disabling ergonomics (-XX:-UseAdaptiveSizePolicy) and tuning the survivor ratio (-XX:SurvivorRatio).

General

Use -XX:+PrintFlagsFinal to see all the options the JVM actually starts with.
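As a complement to -XX:+PrintFlagsFinal, a running JVM can report individual flag values through com.sun.management.HotSpotDiagnosticMXBean. This interface is HotSpot-specific and may be missing on other JVMs. The sketch below prints a few example flags and where each value came from (for example DEFAULT, VM_CREATION for the command line, or ERGONOMIC); the class name FlagReader and the chosen flags are illustrative only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

class FlagReader {
    // Current value of a HotSpot -XX flag, as a string.
    // Throws IllegalArgumentException if this JVM does not define the flag.
    static String flagValue(String name) {
        return option(name).getValue();
    }

    // Where the value came from: DEFAULT, VM_CREATION (command line), ERGONOMIC, ...
    static VMOption.Origin flagOrigin(String name) {
        return option(name).getOrigin();
    }

    private static VMOption option(String name) {
        HotSpotDiagnosticMXBean hotspot =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return hotspot.getVMOption(name);
    }

    public static void main(String[] args) {
        // Example flags only; any product flag listed by -XX:+PrintFlagsFinal can be queried.
        for (String flag : new String[] {"MaxHeapSize", "ParallelGCThreads", "UseParallelGC"}) {
            System.out.println(flag + " = " + flagValue(flag) + " (" + flagOrigin(flag) + ")");
        }
    }
}
```

A flag whose origin is ERGONOMIC was chosen by the JVM rather than set explicitly, which helps when checking what ergonomics decided.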

Garbage Collection

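By default, the parallel collector uses N threads for minor collections, where N is the number of logical CPUs when there are 8 or fewer; on larger machines it uses roughly 5/8 of each additional logical CPU. Control with -XX:ParallelGCThreads=N.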
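The default thread count can be sketched as follows. This assumes the x86/x64 formula used by the Java 6 to 8 parallel collector (one thread per logical CPU up to 8, then 5/8 of each additional one, using integer arithmetic); other platforms may use a different fraction, and newer container-aware JDKs derive the CPU count from container limits. The class and method names are hypothetical.

```java
class DefaultGcThreads {
    // Approximate default for -XX:ParallelGCThreads given the logical CPU count.
    static int defaultParallelGcThreads(int logicalCpus) {
        final int switchPoint = 8;
        if (logicalCpus <= switchPoint) {
            return logicalCpus;                                  // one GC thread per CPU
        }
        return switchPoint + (logicalCpus - switchPoint) * 5 / 8; // 5/8 of the rest
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println(cpus + " logical CPUs -> about "
            + defaultParallelGcThreads(cpus) + " parallel GC threads");
    }
}
```

For example, a 32-way machine would get about 8 + 24 * 5 / 8 = 23 GC threads; when several JVMs share one host, setting -XX:ParallelGCThreads lower avoids oversubscribing the CPUs.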

Comparing Policies

Feature	HotSpot -XX:+UseParallelOldGC	HotSpot -XX:+UseConcMarkSweepGC	HotSpot -XX:+UseG1GC
Generational (most GC pauses are short nursery/scavenge collections)	Yes	Yes	Yes
Compaction	Always	Never	Partial
Large Heaps (>10GB)	Maybe	Maybe	Yes
Soft Real Time (all GC pauses are very short unless CPU/heap exhaustion occurs)	No	Yes	Yes
Hard Real Time (requires a hard real-time OS; all GC pauses are very short unless CPU/heap exhaustion occurs)	No	No	No
Benefits	Tries to balance application throughput with low pause times	Special circumstances	Strategic direction
Potential Consequences	Not designed for low-latency requirements	Non-compacting (requires a strategy to force compactions); hard to tune; larger memory footprint (~30%); reduced throughput; longest worst-case pause times (when compaction is unavoidable)	Hard to tune
Recommended for	General use (e.g. web applications, messaging systems)	Special circumstances, e.g. SIP-based (voice/video) systems	Large heaps (>10GB)

Ergonomics

Prior to the J2SE platform version 5.0, tuning for garbage collection consisted principally of specifying the size of the overall heap and possibly the size of the generations in the heap. Other controls for tuning garbage collection include the size of the survivor spaces in the young generation and the threshold for promotion from the young generation to the old generation. Tuning required a series of experiments with different values of these parameters and the use of specialized tools or just good judgment to decide when garbage collection was performing well.

http://www.oracle.com/technetwork/java/ergo5-140223.html

The goal of ergonomics is to provide good performance with little or no tuning of command line options.

http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#ergonomics

The implementation checks (in this order):

If the GC pause time is greater than the pause time goal, then reduce the generation sizes to better attain the goal.

If the pause time goal is being met then consider the application's throughput goal. If the application's throughput goal is not being met, then increase the sizes of the generations to better attain the goal.

If both the pause time goal and the throughput goal are being met, then the sizes of the generations are decreased to reduce footprint.
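The three checks above can be sketched as a single decision function. This is a hypothetical simplification for illustration, not HotSpot's adaptive size policy code: the real policy works on running averages and adjusts the young and old generations separately. The names ErgonomicsSketch, Action, and decide are invented here.

```java
class ErgonomicsSketch {
    enum Action { SHRINK_FOR_PAUSE_GOAL, GROW_FOR_THROUGHPUT_GOAL, SHRINK_FOR_FOOTPRINT }

    // pauseGoalMillis: Double.POSITIVE_INFINITY models "no -XX:MaxGCPauseMillis set".
    // gcTimeGoal: maximum fraction of time in GC, i.e. 1 / (1 + GCTimeRatio).
    static Action decide(double recentPauseMillis, double pauseGoalMillis,
                         double recentGcTimeFraction, double gcTimeGoal) {
        if (recentPauseMillis > pauseGoalMillis) {
            return Action.SHRINK_FOR_PAUSE_GOAL;     // 1. pause time goal is checked first
        }
        if (recentGcTimeFraction > gcTimeGoal) {
            return Action.GROW_FOR_THROUGHPUT_GOAL;  // 2. then the throughput goal
        }
        return Action.SHRINK_FOR_FOOTPRINT;          // 3. both goals met: reduce footprint
    }

    public static void main(String[] args) {
        double gcTimeGoal = 1.0 / (1 + 99);          // default GCTimeRatio=99
        System.out.println(decide(350, 200, 0.004, gcTimeGoal));
        System.out.println(decide(120, 200, 0.030, gcTimeGoal));
        System.out.println(decide(120, Double.POSITIVE_INFINITY, 0.004, gcTimeGoal));
    }
}
```

The ordering matters: a tight pause goal can keep the generations small even while the throughput goal is being missed.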

-XX:MaxGCPauseMillis=nnn

A hint to the virtual machine that pause times of nnn milliseconds or less are desired. The VM will adjust the java heap size and other GC-related parameters in an attempt to keep GC-induced pauses shorter than nnn milliseconds. Note that this may cause the VM to reduce overall throughput, and in some cases the VM will not be able to meet the desired pause time goal.

By default there is no pause time goal. There are definite limitations on how well a pause time goal can be met. The pause time for a GC depends on the amount of live data in the heap. The minor and major collections depend in different ways on the amount of live data. This parameter should be used with caution. A value that is too small will cause the system to spend an excessive amount of time doing garbage collection.

-XX:GCTimeRatio=nnn

A hint to the virtual machine that it's desirable that not more than 1 / (1 + nnn) of the application execution time be spent in the collector.

For example, -XX:GCTimeRatio=19 sets a goal of 5% of the total time for GC and a throughput goal of 95%. That is, the application should get 19 times as much time as the collector.

By default the value is 99, meaning the application should get at least 99 times as much time as the collector. That is, the collector should run for not more than 1% of the total time. This was selected as a good choice for server applications. A value that is too high will cause the size of the heap to grow to its maximum.
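The relationship between -XX:GCTimeRatio and the GC time goal is simple arithmetic, sketched below in both directions. The class and method names are hypothetical; the formula 1 / (1 + nnn) is the one quoted above.

```java
class GcTimeRatio {
    // Upper bound on the fraction of total time spent in GC: 1 / (1 + nnn).
    static double maxGcTimeFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    // Inverse: the GCTimeRatio value for a desired maximum GC time percentage.
    static long gcTimeRatioFor(double maxGcPercent) {
        return Math.round(100.0 / maxGcPercent - 1);
    }

    public static void main(String[] args) {
        for (int ratio : new int[] {99, 19, 9}) {
            System.out.printf("GCTimeRatio=%d -> at most %.1f%% of time in GC%n",
                ratio, 100 * maxGcTimeFraction(ratio));
        }
    }
}
```

So a 5% GC budget corresponds to -XX:GCTimeRatio=19 and a 1% budget to the default of 99.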

http://docs.oracle.com/javase/7/docs/technotes/guides/vm/gc-ergonomics.html

If you have set -Xms != -Xmx, and default or reasonable values of -Xminf/-Xmaxf, yet you see unexpected heap expansions or contractions (particularly during nursery collections), then ergonomics is likely the cause.

Ergonomics may be disabled with -XX:-UseAdaptiveSizePolicy.

Default Throughput/Parallel Scavenge Collector (ParallelGC)

The throughput collector performs parallel scavenge copy collection on the young generation. This type of garbage collection is the default on multi-processor server-class machines.

There are two ways to tune this collector:

Option 1: Use the default throughput/parallel scavenge collector with built-in tuning enabled.

Starting with Version 5, the Sun HotSpot JVM provides some detection of the operating system on which the server is running, and the JVM attempts to set up an appropriate generational garbage collection mode, either parallel or serial, depending on the presence of multiple processors and the size of physical memory. It is expected that all of the hardware on which the product runs in production and preproduction mode satisfies the requirements to be considered a server-class machine. However, some development hardware might not meet these criteria.

The behavior of the throughput garbage collector, whether tuned automatically or not, remains the same: it introduces some significant pauses, proportional to the size of the used heap, into the execution of the Java application as it tries to maximize the benefit of generational garbage collection. However, these automatic algorithms cannot determine whether your workload is well suited to their actions, or whether the system requires or is better suited to a different garbage collection strategy.

Consult these tuning parameters:

    -XX:+UseParallelGC

    -XX:+UseAdaptiveSizePolicy

    -XX:+AggressiveHeap

Option 2: Use the default throughput/parallel scavenge collector, but tune it manually.

Disadvantages of using the built-in algorithm that is established using the -XX:+UseAdaptiveSizePolicy parameter include limiting what other parameters, such as the -XX:SurvivorRatio parameter, can be configured to do in combination with the built-in algorithm. When you use the built-in algorithm, you give up some control over determining the resource allocations that are used during execution. If the results of using the built-in algorithm are unsatisfactory, it is easier to manually configure the JVM resources than to try to tune the actions of the algorithm. Manually configuring the JVM resources involves half as many options as it takes to tune the actions of the algorithm.

Consult these tuning parameters:

    -XX:NewRatio=2 (this is the default when the JVM runs in server VM mode)

    -XX:MaxNewSize= and -XX:NewSize=

    -XX:SurvivorRatio=

    -XX:+PrintTenuringDistribution

    -XX:TargetSurvivorRatio=

http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/topic/com.ibm.websphere.nd.multiplatform.doc/ae/tprf_hotspot_jvm.html

http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/topic/com.ibm.websphere.nd.multiplatform.doc/ae/rprf_hotspot_parms.html
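The generation sizes implied by the maximum heap, -XX:NewRatio, and -XX:SurvivorRatio can be estimated with the arithmetic below: NewRatio is the old:young ratio, and SurvivorRatio is the eden:survivor ratio with two survivor spaces. This is a sketch under simplifying assumptions: a real JVM rounds sizes to alignment boundaries, explicit -XX:NewSize/-XX:MaxNewSize settings take precedence over NewRatio, and with -XX:+UseAdaptiveSizePolicy the parallel collector resizes survivor spaces at run time. The class name is hypothetical.

```java
final class GenerationSizes {
    final long youngMb;
    final long oldMb;
    final long edenMb;
    final long survivorMb; // size of EACH of the two survivor spaces

    private GenerationSizes(long youngMb, long oldMb, long edenMb, long survivorMb) {
        this.youngMb = youngMb;
        this.oldMb = oldMb;
        this.edenMb = edenMb;
        this.survivorMb = survivorMb;
    }

    // Approximate sizes for a heap of heapMb with the given ratios.
    static GenerationSizes of(long heapMb, int newRatio, int survivorRatio) {
        long young = heapMb / (newRatio + 1);        // old : young = newRatio : 1
        long survivor = young / (survivorRatio + 2); // eden : survivor = survivorRatio : 1, two survivors
        return new GenerationSizes(young, heapMb - young, young - 2 * survivor, survivor);
    }

    public static void main(String[] args) {
        GenerationSizes s = of(3000, 2, 8); // e.g. -Xmx3000m -XX:NewRatio=2 -XX:SurvivorRatio=8
        System.out.printf("young=%dMB old=%dMB eden=%dMB survivor=%dMB x2%n",
            s.youngMb, s.oldMb, s.edenMb, s.survivorMb);
    }
}
```

With a 3000 MB heap, NewRatio=2 gives a 1000 MB young generation and a 2000 MB old generation, and SurvivorRatio=8 splits the young generation into an 800 MB eden and two 100 MB survivor spaces.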

Concurrent low-pause mark-sweep collector (CMS)

This collector is a radical departure from the evolution of generational garbage collection that has underpinned the HotSpot architecture, permitting application thread processing to overlap with a dedicated low-priority, background garbage collection thread. If your application data is incompatible with the behavior of the default throughput collector, then the concurrent mark-sweep (CMS) collector might be a viable strategy, particularly for application systems that are intolerant of invasive pauses. This collector is particularly helpful with the very large heaps that are used with the 64-bit JVM, or with applications that have a large set of long-lived data, also referred to as a large tenured generation. It also maintains comparatively good cache utilization, largely preserving pages of the young generation, even while the background thread must scan through all the pages of the entire heap.

To employ the concurrent mark-sweep collector as the principal housekeeping agent, add the -XX:+UseConcMarkSweepGC option to your JVM configuration instead of any other garbage collection mode.

Consult these tuning parameters:

    -XX:+UseConcMarkSweepGC

    -XX:CMSInitiatingOccupancyFraction=75

    -XX:SurvivorRatio=6

    -XX:MaxTenuringThreshold=8

    -XX:NewSize=128m

Among the difficulties of tuning CMS is that the worst-case garbage collection times, which occur when a CMS cycle aborts, can take several seconds, which is especially costly for a system that uses CMS precisely to avoid long pauses. Service level agreements might dictate the use of CMS because its average or median pause times are very low, but the tuning must then err on the cautious side to ensure that CMS cycles don't abort. CMS succeeds only when its anticipatory trigger starts each CMS cycle early enough that sufficient free space is available before it is demanded. If the CMS collector is unable to finish before the tenured generation fills up, the collection is completed by pausing the application threads, which is known as a full collection. Full collections are a sign that the CMS collector needs further tuning to better suit your application.

Finally, unlike other garbage collection modes with a compaction phase, the use of CMS theoretically raises the risk of heap fragmentation in HotSpot. However, in practice this is rarely a problem while each collection recovers a healthy proportion of the heap. When CMS fails, or aborts a collection, an alternative compacting garbage collection is triggered, and that collection inevitably incurs a significantly longer, more invasive pause than a normal CMS collection.

http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/topic/com.ibm.websphere.nd.multiplatform.doc/ae/tprf_hotspot_jvm.html

http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/topic/com.ibm.websphere.nd.multiplatform.doc/ae/rprf_hotspot_parms.html
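Whether the CMS trigger starts early enough can be estimated with rough arithmetic: the free space left in the old generation at the trigger point must absorb promotions for the whole concurrent cycle. The sketch assumes -XX:+UseCMSInitiatingOccupancyOnly is set, so that the JVM uses -XX:CMSInitiatingOccupancyFraction for every cycle instead of its own heuristic start time. The promotion rate and cycle duration are hypothetical inputs you would read from verbose GC logs, and the model ignores fragmentation. The class and method names are invented for illustration.

```java
class CmsTrigger {
    // Old-generation occupancy (MB) at which a CMS cycle starts.
    static long triggerMb(long oldGenMb, int initiatingOccupancyPct) {
        return oldGenMb * initiatingOccupancyPct / 100;
    }

    // True if the headroom above the trigger lasts longer than one concurrent cycle
    // at the given promotion rate; otherwise the cycle is likely to abort and a
    // stop-the-world full collection follows.
    static boolean startsEarlyEnough(long oldGenMb, int initiatingOccupancyPct,
                                     double promotionMbPerSec, double cycleSeconds) {
        long headroomMb = oldGenMb - triggerMb(oldGenMb, initiatingOccupancyPct);
        return headroomMb / promotionMbPerSec > cycleSeconds;
    }

    public static void main(String[] args) {
        // 2000 MB old generation, -XX:CMSInitiatingOccupancyFraction=75, 4-second cycles.
        System.out.println("trigger at " + triggerMb(2000, 75) + " MB");
        System.out.println("50 MB/s promotion:  " + startsEarlyEnough(2000, 75, 50, 4));
        System.out.println("200 MB/s promotion: " + startsEarlyEnough(2000, 75, 200, 4));
    }
}
```

With 500 MB of headroom, a 50 MB/s promotion rate leaves about 10 seconds to finish a 4-second cycle, while 200 MB/s leaves only 2.5 seconds, so in that case the fraction would need to be lowered.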

Since Java 6, the option -XX:+ExplicitGCInvokesConcurrent may be used to make calls to System.gc() run concurrently instead of as stop-the-world operations (http://docs.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html).

"The incremental mode of CMS (i-CMS) has been deprecated and will likely be removed in a future release. It is recommended to use G1 or regular CMS instead of i-CMS." (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8003822)

CMS Compaction

"CMS (Concurrent Mark Sweep ) garbage collection does not do compaction." (http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)

Garbage-First Garbage Collector (G1GC)

The Garbage-First (G1) garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with high probability, while achieving high throughput. Whole-heap operations, such as global marking, are performed concurrently with the application threads.

http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html

Verbose garbage collection (-verbose:gc)

See the verbosegc section in the general Java chapter for background.

Verbose garbage collection is written to stdout (e.g. native_stdout.log).

With Java >= 6 update 4, run with

-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails

With Java < 6 update 4, run with

-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails

-XX:+PrintHeapAtGC may be used for additional information although it has a significant overhead.

HP-UX adds the -Xverbosegc option in addition to the existing verbose GC options. This data is more detailed and can be graphed in HPjmeter.

Send verbose:gc output to a particular log file: -Xloggc:output.log

Starting with Java 6 Update 34 and Java 7 Update 2, rotate the verbose GC log with -Xloggc:verbosegc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M (http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6941923)

Example:

2010-04-22T18:12:27.796+0200: 22.317: [GC 59030K->52906K(97244K), 0.0019061 secs]

If the verbosegc includes "[Full GC (System)" then it was caused by a call to System.gc or Runtime.gc.
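Records in the compact -verbose:gc format shown above can be parsed with a regular expression, for example to total the memory reclaimed or to count explicit System.gc() collections. This is a sketch that assumes the compact format only; lines written with -XX:+PrintGCDetails have nested per-generation sections and need a different pattern. The class, field, and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GcLineParser {
    // Matches e.g. "[GC 59030K->52906K(97244K), 0.0019061 secs]"
    // or "[Full GC (System) 12000K->8000K(97244K), 0.0500000 secs]".
    private static final Pattern RECORD = Pattern.compile(
        "\\[(Full GC|GC)[^\\d\\]]*(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    static final class GcEvent {
        final boolean full;
        final boolean explicit; // "[Full GC (System": caused by System.gc() or Runtime.gc()
        final long beforeKb;
        final long afterKb;
        final long heapKb;
        final double pauseSecs;

        GcEvent(boolean full, boolean explicit, long beforeKb, long afterKb,
                long heapKb, double pauseSecs) {
            this.full = full;
            this.explicit = explicit;
            this.beforeKb = beforeKb;
            this.afterKb = afterKb;
            this.heapKb = heapKb;
            this.pauseSecs = pauseSecs;
        }

        long reclaimedKb() {
            return beforeKb - afterKb;
        }
    }

    // Returns null if the line contains no compact-format GC record.
    static GcEvent parse(String line) {
        Matcher m = RECORD.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new GcEvent(
            m.group(1).equals("Full GC"),
            line.contains("[Full GC (System"),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3)),
            Long.parseLong(m.group(4)),
            Double.parseDouble(m.group(5)));
    }

    public static void main(String[] args) {
        GcEvent e = parse(
            "2010-04-22T18:12:27.796+0200: 22.317: [GC 59030K->52906K(97244K), 0.0019061 secs]");
        System.out.println("reclaimed " + e.reclaimedKb() + "K in " + e.pauseSecs + "s");
    }
}
```

For the example line above, the collection reduced heap occupancy from 59030K to 52906K, reclaiming 6124K.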

Detailed Garbage Collection Tuning

On 64-bit, ensure -XX:+UseCompressedOops is enabled: "Compressed oops is supported and enabled by default in Java SE 6u23 and later" (http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html)

-XX:+AggressiveOpts:

Turns on point performance optimizations that are expected to be on by default in upcoming releases. The changes grouped by this flag are minor changes to JVM runtime compiled code and not distinct performance features (such as BiasedLocking and ParallelOldGC). This is a good flag to try the JVM engineering team's latest performance tweaks for upcoming releases. Note: this option is experimental! The specific optimizations enabled by this option can change from release to release and even build to build. You should reevaluate the effects of this option prior to deploying a new release of Java.

http://www.oracle.com/technetwork/java/tuning-139912.html#section4.2.4

Consider -XX:+UseTLAB which "uses thread-local object allocation blocks. This improves concurrency by reducing contention on the shared heap lock." (http://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmgc.html)

Setting -XX:MaxTenuringThreshold=0 means that all the objects surviving a minor GC are instantly promoted. This can cause more full GCs.

Permanent Region (permgen)

HotSpot used to have a dedicated region of the address space called the permanent generation to store things such as class meta-data, interned Strings, and class static variables. This region needed to be manually sized. If the region was exhausted, the JVM would throw an OutOfMemoryError with the message "PermGen space." The PermGen space has been removed in Java 8 (http://openjdk.java.net/projects/jdk8/milestones) and replaced with the Metaspace (unbounded by default but may be capped with -XX:MaxMetaspaceSize).

Hotspot's representation of Java classes (referred to here as class meta-data) is currently stored in a portion of the Java heap referred to as the permanent generation. In addition, interned Strings and class static variables are stored in the permanent generation. The permanent generation is managed by Hotspot and must have enough room for all the class meta-data, interned Strings and class statics used by the Java application. Class metadata and statics are allocated in the permanent generation when a class is loaded and are garbage collected from the permanent generation when the class is unloaded. Interned Strings are also garbage collected when the permanent generation is GC'ed.

The proposed implementation will allocate class meta-data in native memory and move interned Strings and class statics to the Java heap. Hotspot will explicitly allocate and free the native memory for the class meta-data. Allocation of new class meta-data would be limited by the amount of available native memory rather than fixed by the value of -XX:MaxPermSize, whether the default or specified on the command line.

http://openjdk.java.net/jeps/122
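To see which class metadata region a given JVM uses, and whether it is capped, the memory pool MXBeans can be listed. This sketch relies on HotSpot's pool naming conventions, which are not part of the standard API contract: "Metaspace" and "Compressed Class Space" on Java 8 and later, and a pool whose name contains "Perm Gen" on Java 7 and earlier. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

class ClassMetadataPools {
    // Returns the memory pools that hold class metadata on HotSpot.
    static List<MemoryPoolMXBean> classMetadataPools() {
        List<MemoryPoolMXBean> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.contains("Metaspace") || name.contains("Perm Gen")
                    || name.contains("Class Space")) {
                pools.add(pool);
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : classMetadataPools()) {
            MemoryUsage usage = pool.getUsage();
            // getMax() is -1 when no limit is defined, e.g. Metaspace without -XX:MaxMetaspaceSize.
            String max = usage.getMax() < 0 ? "unbounded" : (usage.getMax() >> 20) + "MB";
            System.out.println(pool.getName() + ": used=" + (usage.getUsed() >> 20)
                + "MB, max=" + max);
        }
    }
}
```

On Java 8 and later, an "unbounded" Metaspace means a class loader leak is limited only by native memory, which is one reason to consider setting -XX:MaxMetaspaceSize.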

"The -XX:MaxPermSize= and -Xmx (Maximum Java Heap size) parameters respectively configure the maximum size of the permanent region, where the class code and related data are logically presented as part of the old generation region but are kept physically separate, and the maximum size of the main heap where Java objects and their data are stored either in the young or old generation regions. Together the permanent region and the main heap comprise the total Java heap. An allocation failure in either of these regions either represents the inability to accommodate either all the application code or all the application data, both of which are terminal conditions, that can exhaust available storage, and cause an OutOfMemory error." (http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/topic/com.ibm.websphere.nd.multiplatform.doc/ae/tprf_hotspot_jvm.html, https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/memleaks.html)

Heap Expansion and Contraction

Java heap expansion and contraction is generally controlled by -XX:MinHeapFreeRatio/-Xminf and -XX:MaxHeapFreeRatio/-Xmaxf: http://docs.oracle.com/cd/E19683-01/806-7930/vmoptions-chapter/index.html

However, ergonomics may sometimes render these options moot.
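The effect of the free ratios can be approximated as follows: after a collection, the committed size grows until at least MinHeapFreeRatio percent is free, or shrinks until at most MaxHeapFreeRatio percent is free, always staying between -Xms and -Xmx. This is a simplified model under stated assumptions: HotSpot applies the ratios per generation, may damp shrinking over several collections, and, as noted above, the parallel collector's ergonomics can override them. The defaults of 40 and 70 used in the example are the usual HotSpot values; the class name is hypothetical.

```java
class HeapFreeRatioSketch {
    // Committed size (MB) the generation would move toward after a collection.
    static long targetCommittedMb(long usedMb, long committedMb, int minFreePct, int maxFreePct,
                                  long xmsMb, long xmxMb) {
        double freePct = 100.0 * (committedMb - usedMb) / committedMb;
        long target = committedMb;
        if (freePct < minFreePct) {
            // Expand so that minFreePct percent of the new size is free.
            target = (long) Math.ceil(usedMb / (1 - minFreePct / 100.0));
        } else if (freePct > maxFreePct) {
            // Shrink so that only maxFreePct percent of the new size is free.
            target = (long) Math.ceil(usedMb / (1 - maxFreePct / 100.0));
        }
        return Math.max(xmsMb, Math.min(xmxMb, target)); // clamp to [-Xms, -Xmx]
    }

    public static void main(String[] args) {
        // -Xms256m -Xmx4096m -XX:MinHeapFreeRatio=40 -XX:MaxHeapFreeRatio=70
        System.out.println(targetCommittedMb(700, 1000, 40, 70, 256, 4096)); // expands
        System.out.println(targetCommittedMb(200, 1000, 40, 70, 256, 4096)); // shrinks
        // With -Xms equal to -Xmx the committed size cannot change.
        System.out.println(targetCommittedMb(700, 1000, 40, 70, 1000, 1000));
    }
}
```

The last case shows why setting -Xms equal to -Xmx removes expansion and contraction entirely.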

Parallel Reference Processing

By default, the garbage collector processes discovered Reference objects (soft, weak, final, and phantom references) with a single thread during a collection. Use -XX:+ParallelRefProcEnabled to let multiple GC threads process references in parallel. This may be useful for things such as high DirectByteBuffer allocation and free rates.

String.substring Performance

Java 7 update 6 (HotSpot) introduced a significant change to the implementation of java.lang.String: calls to substring no longer return a "view" into the original String, but instead return a copy of the substring portion:

List of Java SE 7 Release Notes: http://www.oracle.com/technetwork/java/javase/7u-relnotes-515228.html
List of bugs fixed in 7u6: http://www.oracle.com/technetwork/java/javase/2col/7u6-bugfixes-1733378.html
Change request discussing the changes: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6924259
Developer mailing list discussing the changes: http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-February/014609.html
Java Lobby article on the subject: http://java.dzone.com/articles/changes-stringsubstring-java-7
Java Performance Tuning article on the subject: http://java-performance.info/changes-to-string-java-1-7-0_06/

If profiling shows significant activity in substring or in array copy, then this change may be why. In general, the change is believed to be positive because, with the old behavior, the original, potentially very large, String cannot be garbage collected until all of its substrings are garbage collected. However, applications that use substring heavily may need to be re-coded.
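One way to re-code substring-heavy parsing is to compare characters in place with String.startsWith(prefix, offset) and String.regionMatches instead of creating intermediate Strings. The sketch below contrasts the copying and non-copying styles on hypothetical comma-separated records; the class and method names are invented for illustration.

```java
class SubstringFreeParsing {
    // Copying style: on Java 7u6 and later, substring() allocates a new String per call.
    static boolean fieldStartsWithCopying(String record, int offset, String prefix) {
        return record.substring(offset, offset + prefix.length()).equals(prefix);
    }

    // In-place style: no intermediate String. Unlike the version above, it returns
    // false instead of throwing when the record is too short.
    static boolean fieldStartsWith(String record, int offset, String prefix) {
        return record.startsWith(prefix, offset);
    }

    // Counts comma-separated fields equal to value without creating substrings.
    static int countFieldsEqualTo(String line, String value) {
        int count = 0;
        int start = 0;
        while (start <= line.length()) {
            int end = line.indexOf(',', start);
            if (end < 0) {
                end = line.length(); // last field
            }
            if (end - start == value.length()
                    && line.regionMatches(start, value, 0, value.length())) {
                count++;
            }
            start = end + 1;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countFieldsEqualTo("GET,200,GET,404,GET", "GET"));
        System.out.println(fieldStartsWith("HTTP/1.1 200 OK", 9, "200"));
    }
}
```

The in-place versions avoid both the allocation and the array copy that show up in profiles after the 7u6 change.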

Reflection Inflation

For a discussion of reflection and inflation, see the general Java chapter. On the HotSpot JVM, the option -Dsun.reflect.inflationThreshold=0 creates an inflated Java bytecode accessor which is used on the second and every subsequent method invocation.
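The sketch below shows the kind of repeated Method.invoke call that inflation applies to. On JDKs that use inflation, the first calls go through a native accessor; once the call count passes sun.reflect.inflationThreshold (15 by default), HotSpot generates a bytecode accessor class for the method. JDK 18 and later reimplemented core reflection on method handles (JEP 416), so this property no longer has that effect there. The class and method names are hypothetical, and running the example with -Dsun.reflect.inflationThreshold=0 is how you would observe the option described above.

```java
import java.lang.reflect.Method;

class ReflectionInflationDemo {
    public static int square(int x) {
        return x * x;
    }

    // Calls square() reflectively `calls` times and sums the results.
    static long invokeRepeatedly(int calls) {
        try {
            Method square = ReflectionInflationDemo.class.getMethod("square", int.class);
            long sum = 0;
            for (int i = 0; i < calls; i++) {
                sum += (Integer) square.invoke(null, i); // reflective call, result unboxed
            }
            return sum;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("sun.reflect.inflationThreshold = "
            + System.getProperty("sun.reflect.inflationThreshold", "(not set, JVM default)"));
        System.out.println(invokeRepeatedly(100));
    }
}
```

Caching the Method object, as done here, matters as much as inflation: looking the method up on every call adds its own overhead.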

