HotSpot JVM

HotSpot JVM Recipe

  1. Review the JVM-independent recipe in the Java chapter.
  2. In most cases, the default -XX:+UseG1GC or -XX:+UseParallelOldGC garbage collection policies (depending on version) work best, with the key tuning being the maximum heap size (-Xmx).
  3. Set -XX:+HeapDumpOnOutOfMemoryError.
  4. Enable verbose garbage collection and use a tool such as the Garbage Collection and Memory Visualizer to confirm the proportion of time in stop-the-world garbage collection pauses is less than ~10% and ideally less than 1%.
    1. Check for long individual pause times (e.g. greater than 400ms, or whatever the response time expectations are).
    2. For G1GC, check for humongous allocations.
    3. Review the latest garbage collection tuning guidance.
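As a rough sketch of the arithmetic behind step 4, the proportion of time in stop-the-world pauses is total pause time divided by the measurement interval. The function name and pause data below are illustrative only; tools such as the Garbage Collection and Memory Visualizer compute this from the verbose GC log.

```python
# Sketch: estimate the proportion of wall-clock time spent in stop-the-world
# GC pauses over a measurement interval. The pause data is made up for
# illustration.

def pause_proportion(pauses, interval_seconds):
    """pauses: list of (start_seconds, duration_seconds) stop-the-world pauses."""
    total_paused = sum(duration for _, duration in pauses)
    return total_paused / interval_seconds

# Three hypothetical young-generation pauses over a 60-second interval:
pauses = [(5.0, 0.030), (25.0, 0.045), (50.0, 0.025)]
print(f"{pause_proportion(pauses, 60.0):.2%}")  # 0.17%: well under the ~1% target
```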

General

Use -XX:+PrintFlagsFinal to see all the options the JVM actually starts with.

Garbage Collection

By default, the collector uses N threads for minor collection where N = # of CPU core threads. Control with -XX:ParallelGCThreads=N
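As a sketch, the commonly documented heuristic for the default is all CPUs up to 8, then 8 plus 5/8 of the CPUs beyond 8; verify the actual value on your JVM with -XX:+PrintFlagsFinal. The function below is illustrative, not JVM code.

```python
# Sketch of the commonly documented HotSpot heuristic for the default
# -XX:ParallelGCThreads value: all CPUs up to 8, then 8 plus 5/8 of the
# CPUs beyond 8. Confirm against -XX:+PrintFlagsFinal output on your JVM.

def default_parallel_gc_threads(ncpus):
    if ncpus <= 8:
        return ncpus
    return 8 + (ncpus - 8) * 5 // 8

for n in (4, 8, 16, 64):
    print(n, "CPUs ->", default_parallel_gc_threads(n), "GC threads")
```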

Comparing Policies

| | -XX:+UseParallelOldGC | -XX:+UseG1GC | -XX:+UseShenandoahGC | -XX:+UseZGC | -XX:+UseSerialGC | -XX:+UseConcMarkSweepGC | -XX:+UseEpsilonGC |
|---|---|---|---|---|---|---|---|
| Generational - most GC pauses are short (nursery/scavenge collections) | Yes (two generations) | Yes (two generations) | Yes (one generation) | Yes (one generation) | Yes (two generations) | Yes (two generations) | No |
| Compaction | Always | Partial | Concurrent | ? | No | Never | No |
| Large heaps (>10GB) | Maybe | Yes | Yes | ? | ? | Maybe | Maybe |
| Soft real time - all GC pauses are very short (unless CPU/heap exhaustion occurs) | No | Yes | Yes | ? | No | Yes | Yes |
| Hard real time - requires a hard real time OS; all GC pauses are very short (unless CPU/heap exhaustion occurs) | No | No | No | ? | No | No | Yes |
| Benefits | Tries to balance application throughput with low pause times | Regionalized heap - good for very large heaps | Designed for very large heaps | ? | No cross-thread contention | Special circumstances | No garbage collection |
| Potential consequences | Not designed for low latency requirements | ? | ? | ? | Potentially limited GC throughput | Non-compacting - requires a strategy to force compactions; hard to tune; larger memory footprint (~30%); reduced throughput; longest worst-case pause times (when compaction is unavoidable); deprecated | Memory exhaustion |
| Recommended for | General use (e.g. web applications, messaging systems) | General use on recent versions of Java | Large heaps (>10GB) | ? | ? | Deprecated | Benchmarks |
Garbage-First Garbage Collector (G1GC)

The Garbage First Garbage Collector (G1GC) is a multi-region, generational garbage collector. Review the G1GC Tuning Guide.

G1GC is the default collector starting with Java 9.

Humongous objects

Any object larger than half the region size (-XX:G1HeapRegionSize) is considered a humongous object. It is allocated directly into the old generation and consumes entire regions, which drives fragmentation.
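The threshold check can be sketched as follows. The function name is illustrative, and the exact boundary condition (larger than versus at least half a region) may vary by version.

```python
# Sketch: G1 treats an allocation as humongous when it is roughly half the
# region size (-XX:G1HeapRegionSize) or more. The exact boundary condition
# may vary by version.

def is_humongous(allocation_bytes, region_bytes):
    return allocation_bytes >= region_bytes // 2

region = 4 * 1024 * 1024  # e.g. a 4MB G1 region
print(is_humongous(2 * 1024 * 1024, region))  # True: half the region size
print(is_humongous(1 * 1024 * 1024, region))  # False
```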

Print humongous requests from a verbosegc:

awk 'BEGIN {print "Humongous Allocation";} /humongous/ { for (i=1;i<=NF;i++) { if ($i == "allocation" && $(i+1) == "request:") { print $(i+2); } } }' $FILE > data.csv
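A rough Python equivalent of the awk one-liner, for cases where further processing is easier in Python. The sample log line below is made up for illustration; the exact log format varies by Java version.

```python
# Extract the byte counts that follow "allocation request:" on log lines
# mentioning humongous allocations, mirroring the awk script above.
import re

PATTERN = re.compile(r"humongous.*?allocation request: (\d+)")

def extract_humongous_sizes(lines):
    """Return the allocation request sizes (bytes) from humongous log lines."""
    return [int(m.group(1)) for m in (PATTERN.search(l) for l in lines) if m]

# Hypothetical log lines for illustration:
log = [
    "[GC concurrent humongous allocation] allocation request: 12582912 bytes",
    "[GC pause (young)] 1024M->512M(2048M), 0.0300000 secs",
]
print(extract_humongous_sizes(log))  # [12582912]
```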

To create a histogram using Python+Seaborn:

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

sns.set_theme()

# Load the allocation request sizes extracted by the awk script above
data = pd.read_csv("data.csv")

# Plot a histogram of humongous allocation request sizes
axes = sns.histplot(data, x="Humongous Allocation")

# Use plain, comma-separated numbers on both axes instead of scientific notation
axes.ticklabel_format(style='plain')
axes.get_xaxis().set_major_formatter(matplotlib.ticker.StrMethodFormatter('{x:,.0f}'))
axes.get_yaxis().set_major_formatter(matplotlib.ticker.StrMethodFormatter('{x:,.0f}'))
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig('humongous.png')
plt.show()

Throughput/Parallel Scavenge Collector (ParallelGC)

This is the default policy on Java 8 and below.

The throughput collector performs parallel scavenge copy collection on the young generation. It is the default collector on multi-processor server-class machines.

Two types of tuning for this collector:

Option 1: Use the default throughput/parallel scavenge collector with built-in tuning enabled.

Starting with Version 5, the Sun HotSpot JVM detects the operating system and hardware it is running on and attempts to select an appropriate generational garbage collection mode (parallel or serial), depending on the presence of multiple processors and the size of physical memory. All hardware on which the product runs in production and preproduction is expected to satisfy the requirements of a server-class machine, but some development hardware might not meet these criteria.

Whether tuned automatically or not, the throughput collector introduces significant pauses, proportional to the size of the used heap, as it tries to maximize the benefit of generational garbage collection. These automatic algorithms cannot determine whether your workload is well suited to this behavior, or whether the system would be better served by a different garbage collection strategy.

Consult these tuning parameters:
-XX:+UseParallelGC
-XX:+UseAdaptiveSizePolicy
-XX:+AggressiveHeap

Option 2: Use the default throughput/parallel scavenge collector, but tune it manually.

A disadvantage of the built-in algorithm enabled by -XX:+UseAdaptiveSizePolicy is that it limits what other parameters, such as -XX:SurvivorRatio, can do in combination with it, and you give up some control over the resource allocations used during execution. If the results of the built-in algorithm are unsatisfactory, it is easier to configure the JVM resources manually than to try to tune the algorithm's actions; manual configuration involves about half as many options as tuning the algorithm.

Consult these tuning parameters:

-XX:NewRatio=2 This is the default for a server that is configured for VM mode  
-XX:MaxNewSize= and -XX:NewSize=  
-XX:SurvivorRatio=  
-XX:+PrintTenuringDistribution  
-XX:TargetSurvivorRatio=

See https://www.ibm.com/docs/en/was-nd/9.0.5?topic=tj-tuning-hotspot-java-virtual-machines-solaris-hp-ux and https://www.ibm.com/docs/en/was-nd/9.0.5?topic=thjvmshu-sun-hotspot-jvm-tuning-parameters-solaris-hp-ux

Verbose garbage collection (-verbose:gc)

Verbose garbage collection is a low-overhead log used to understand garbage collection times and behavior. By default, it is written to stdout (e.g. native_stdout.log).

Java 8

For Java 8, use the following Java options:

-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails

To add safepoint times:

-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime

To send output to a set of rolling files instead of stdout:

-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -Xloggc:verbosegc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M

For WAS traditional, use SERVER_LOG_ROOT to write to the same directory as other files:

-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -Xloggc:${SERVER_LOG_ROOT}/verbosegc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M

-XX:+PrintHeapAtGC may be used for additional information although it has some overhead.

Java >= 9

For Java >= 9, use the recommended -Xlog:gc option instead. Note that -XX:+PrintGCDetails is no longer required (see the mapping for other options):

-Xlog:gc:stdout:time,level,tags

To add safepoints:

-Xlog:safepoint=info,gc:stdout:time,level,tags

To send output to a set of rolling files instead of stdout:

-Xlog:safepoint=info,gc:file=verbosegc.log:time,level,tags:filecount=10,filesize=100M
CompressedOops

On 64-bit, if using -Xmx less than or equal to 32GB, then -XX:+UseCompressedOops is enabled by default: "Compressed oops is supported and enabled by default in Java SE 6u23 and later" (http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html)

Oops stands for ordinary object pointer.

Recent versions of HotSpot support -Xmx much larger than 32GB with CompressedOops using -XX:ObjectAlignmentInBytes: https://bugs.openjdk.java.net/browse/JDK-8040176
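The addressing math behind the 32GB limit can be sketched as follows: a 32-bit compressed reference is scaled by the object alignment, so the maximum addressable heap is 2^32 times the alignment.

```python
# Sketch of the compressed oops addressing math: a 32-bit compressed
# reference is scaled by the object alignment, so the maximum addressable
# heap is 2^32 * alignment bytes: 32GB at the default 8-byte alignment,
# and more with a larger -XX:ObjectAlignmentInBytes.

def max_compressed_oops_heap_gb(object_alignment_bytes=8):
    return (2 ** 32 * object_alignment_bytes) / (1024 ** 3)

print(max_compressed_oops_heap_gb())    # 32.0 (default 8-byte alignment)
print(max_compressed_oops_heap_gb(16))  # 64.0
```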

Detailed Garbage Collection Tuning

-XX:+AggressiveOpts:

Turns on point performance optimizations that are expected to be on by default in upcoming releases. The changes grouped by this flag are minor changes to JVM runtime compiled code and not distinct performance features (such as BiasedLocking and ParallelOldGC). This is a good flag to try the JVM engineering team's latest performance tweaks for upcoming releases. Note: this option is experimental! The specific optimizations enabled by this option can change from release to release and even build to build. You should reevaluate the effects of this option prior to deploying a new release of Java.

http://www.oracle.com/technetwork/java/tuning-139912.html#section4.2.4

Consider -XX:+UseTLAB which "uses thread-local object allocation blocks. This improves concurrency by reducing contention on the shared heap lock." (http://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmgc.html)

The -XX:+AlwaysPreTouch option may be used to force the entire Java heap into RAM on startup.

Permanent Region (permgen)

HotSpot used to have a dedicated region of the address space called the permanent generation to store things such as class meta-data, interned Strings, and class static variables. This region needed to be manually sized. If the region was exhausted, the JVM would throw an OutOfMemoryError with the message "PermGen space." The PermGen space has been removed in Java 8 (http://openjdk.java.net/projects/jdk8/milestones) and replaced with the Metaspace (unbounded by default but may be capped with -XX:MaxMetaspaceSize).

Hotspot's representation of Java classes (referred to here as class meta-data) is currently stored in a portion of the Java heap referred to as the permanent generation. In addition, interned Strings and class static variables are stored in the permanent generation. The permanent generation is managed by Hotspot and must have enough room for all the class meta-data, interned Strings and class statics used by the Java application. Class metadata and statics are allocated in the permanent generation when a class is loaded and are garbage collected from the permanent generation when the class is unloaded. Interned Strings are also garbage collected when the permanent generation is GC'ed.

The proposed implementation will allocate class meta-data in native memory and move interned Strings and class statics to the Java heap. Hotspot will explicitly allocate and free the native memory for the class meta-data. Allocation of new class meta-data would be limited by the amount of available native memory rather than fixed by the value of -XX:MaxPermSize, whether the default or specified on the command line.

http://openjdk.java.net/jeps/122

"The -XX:MaxPermSize= and -Xmx (Maximum Java Heap size) parameters respectively configure the maximum size of the permanent region, where the class code and related data are logically presented as part of the old generation region but are kept physically separate, and the maximum size of the main heap where Java objects and their data are stored either in the young or old generation regions. Together the permanent region and the main heap comprise the total Java heap. An allocation failure in either of these regions either represents the inability to accommodate either all the application code or all the application data, both of which are terminal conditions, that can exhaust available storage, and cause an OutOfMemory error." (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_hotspot_jvm.html, https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/memleaks.html)

In addition, note that interned Strings moved to the Java heap starting in Java 7: https://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes

Reference Processing

PhantomReferences are handled differently than finalizers. Queued PhantomReferences are processed on the back of every GC cycle.

By default, there is a single "Reference Handler" thread which processes the ReferenceQueue. Use -XX:+ParallelRefProcEnabled to enable multiple threads for parallel reference queue processing. This may be useful for things such as high DirectByteBuffer allocation and free rates.

DirectByteBuffers may be monitored with the BufferPoolMXBean: http://docs.oracle.com/javase/7/docs/api/java/lang/management/BufferPoolMXBean.html

Safepoints

Safepoints are the internal mechanism by which the JVM tries to pause application threads for operations such as stop-the-world garbage collections.

Additional information may be printed with:

-XX:+PrintSafepointStatistics

PreserveFramePointer

If using JDK >= 8u60, use -XX:+PreserveFramePointer to allow tools such as Linux perf to perform higher quality stack walking.

DTrace Integration

Newer versions of Java have DTrace integration, but one large limitation is Bug 6617153, which causes DTrace to fail to evaluate Java thread stack names, making jstack nearly useless.

Code Cache

The default JIT compiled code cache size is 32MB-48MB. If there is available RAM, consider increasing this code cache size to improve JIT performance. For example:

-XX:ReservedCodeCacheSize=1536m

Increasing the maximum code cache size may have negative consequences. The longer the JIT keeps compiling, the more likely it is to generate code at higher optimization levels. Compiling at higher optimization levels takes a long time, and that compilation time can itself be a cost. More broadly, higher optimization compilations produce much bigger compiled method bodies, and too many of these can start to impact the instruction cache. Ideally, you want the JIT to compile just the "right" set of methods at "appropriate" optimization levels and then stop. There is no way of knowing when that has happened, so if the code cache is set very large, the JIT may keep compiling into negative territory if it runs long enough. The best way to find the right value is to run experiments with different values over long periods of time.

To exclude certain methods from JIT code cache compilation:

-XX:CompileCommand=exclude,com/example/Example.method

To log code compilations: -XX:+LogCompilation

Code Cache Flushing

We have observed cases where code cache flushing, when the code cache is full, causes application thread pauses (e.g. DTrace stack samples in libjvm.so\1cJCodeCachebAfind_and_remove_saved_code6FpnNmethodOopDesc__pnHnmethod+0x50). You may test whether this is the case by disabling code cache flushing, although this means that code can no longer be JITted after the code cache limit is reached:

-XX:-UseCodeCacheFlushing

For performance reasons, consider increasing the code cache size as well when doing this (tuned to available RAM):

-XX:-UseCodeCacheFlushing -XX:ReservedCodeCacheSize=1536m

If the code cache still fills up (e.g. lots of reflection, etc.) then you will receive this message in stderr:

CodeCache is full. Compiler has been disabled.


Environment Variables

Use JAVA_TOOL_OPTIONS to specify additional JVM arguments for programs launched from that terminal/command prompt. For example:

export JAVA_TOOL_OPTIONS="-Xmx1024m"
/opt/IBM/WebSphere/AppServer/bin/collector.sh

async-profiler

async-profiler is a low-overhead native sampling profiler that avoids the safepoint bias of traditional JVM profilers.

Concurrent low-pause mark-sweep collector (CMS)

The CMS collector has been removed since Java 14: https://openjdk.java.net/jeps/363

The stop-the-world phases of the CMS garbage collector include CMS-remark (https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs), and CMS-initial-mark (https://blogs.oracle.com/jonthecollector/entry/the_unspoken_cms_and_printgcdetails).

CMS has poor contraction capabilities, partly because it can only compact on the back of a failed CMS collection, via a stop-the-world full collection. If fragmentation is high, this can cause CMS collections to fail more often, causing many full GCs.

"CMS (Concurrent Mark Sweep) garbage collection does not do compaction." (http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)