HotSpot JVM
HotSpot JVM Recipe
- Review the JVM-independent recipe in the Java chapter.
- In most cases, the default -XX:+UseG1GC or -XX:+UseParallelOldGC garbage collection policies (depending on version) work best, with the key tuning being the maximum heap size (-Xmx).
- Set -XX:+HeapDumpOnOutOfMemoryError.
- Enable verbose garbage collection and use a tool such as the Garbage Collection and Memory Visualizer to confirm the proportion of time in stop-the-world garbage collection pauses is less than ~10% and ideally less than 1%.
- Check for long individual pause times (e.g. greater than 400ms or whatever response time expectations are).
- For G1GC, check for humongous allocations.
- Review the latest garbage collection tuning guidance.
General
Use -XX:+PrintFlagsFinal to see all the options the JVM actually starts with.
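The output of -XX:+PrintFlagsFinal is a long table, so it can help to filter it programmatically. The following is a minimal Python sketch, assuming each flag line has the shape `type name = value {kind}`, with `:=` marking a value that differs from its default. The sample lines are illustrative rather than captured from a particular JVM, and `parse_flags` is a hypothetical helper name.

```python
import re

# Assumed line shape: "<type> <name> <= or :=> <value> {<kind>}".
FLAG_LINE = re.compile(r"^\s*(\S+)\s+(\S+)\s+(:?=)\s+(\S*)\s*\{")

def parse_flags(text):
    """Return {name: (type, value, changed)} for each flag line in text."""
    flags = {}
    for line in text.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            flag_type, name, operator, value = match.groups()
            flags[name] = (flag_type, value, operator == ":=")
    return flags

# Illustrative sample, not real output from a specific JVM.
sample = """[Global flags]
    uintx MaxHeapSize                              := 1073741824                          {product}
     bool UseG1GC                                   = true                                {product}
"""
flags = parse_flags(sample)
# Names of flags whose value was changed from the default.
changed = {name for name, (_, _, was_set) in flags.items() if was_set}
```

In practice the text would come from something like `java -XX:+PrintFlagsFinal -version`, and the exact column layout may vary between Java versions.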
Garbage Collection
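By default, the collector uses N threads for minor collections, where N is the number of CPU core threads on machines with up to 8 of them, and roughly 8 plus 5/8 of the remaining threads on larger machines. Control this with -XX:ParallelGCThreads=N.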
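As a rough illustration of how the default thread count scales, here is a minimal Python sketch of the heuristic commonly described for HotSpot: all hardware threads up to 8, then 5/8 of each additional one. The function name is hypothetical, and the real default can differ by version, platform, and container CPU limits, so treat the result as an estimate and confirm it with -XX:+PrintFlagsFinal.

```python
def default_parallel_gc_threads(hardware_threads):
    """Estimate the default -XX:ParallelGCThreads for a machine (assumed heuristic)."""
    if hardware_threads <= 8:
        return hardware_threads
    # Above 8, only 5/8 of the additional hardware threads are used.
    return 8 + (hardware_threads - 8) * 5 // 8

estimates = {cpus: default_parallel_gc_threads(cpus) for cpus in (4, 8, 16, 64)}
print(estimates)
```

On a large shared host, this estimate is a useful starting point when deciding whether to cap the value explicitly for each JVM.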
Comparing Policies
| | -XX:+UseParallelOldGC | -XX:+UseG1GC | -XX:+UseShenandoahGC | -XX:+UseZGC | -XX:+UseSerialGC | -XX:+UseConcMarkSweepGC | -XX:+UseEpsilonGC |
---|---|---|---|---|---|---|---|
Generational - most GC pauses are short (nursery/scavenge collections) | Yes (Two Generations) | Yes (Two Generations) | Yes (One Generation) | Yes (One Generation) | Yes (Two Generations) | Yes (Two Generations) | No |
Compaction | Always | Partial | Concurrent | ? | No | Never | No |
Large Heaps (>10GB) | Maybe | Yes | Yes | ? | ? | Maybe | Maybe |
Soft Real Time - all GC pauses are very short (unless CPU/heap exhaustion occurs) | No | Yes | Yes | ? | No | Yes | Yes |
Hard Real Time - requires hard real time OS, all GC pauses are very short (unless CPU/heap exhaustion occurs) | No | No | No | ? | No | No | Yes |
Benefits | Tries to balance application throughput with low pause times | Regionalized heap - good for very large heaps | Designed for very large heaps | ? | No cross-thread contention | Special circumstances | No garbage collection |
Potential Consequences | Not designed for low latency requirements | ? | ? | ? | Potentially limited throughput of GC | Non-compacting - requires strategy to force compactions; Hard to tune; Larger memory footprint (~30%); Reduced throughput; Longest worst-case pause times (when compaction is unavoidable); Deprecated | Memory exhaustion |
Recommended for | General Use (e.g. Web applications, messaging systems) | General Use on recent versions of Java | Large heaps (>10GB) | ? | ? | Deprecated | Benchmarks |
Garbage-First Garbage Collector (G1GC)
The Garbage First Garbage Collector (G1GC) is a multi-region, generational garbage collector. Review the G1GC Tuning Guide.
G1GC is the default collector starting with Java 9.
Humongous objects
Any object larger than half the region size (-XX:G1HeapRegionSize) is considered a humongous object. It is allocated directly into the old generation and consumes one or more entire regions, which drives fragmentation.
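To make the threshold concrete, the following minimal Python sketch estimates whether an allocation of a given size would be humongous and how many whole regions it would occupy. It assumes the "larger than half the region size" rule stated above (behavior exactly at the boundary may differ by version), and `humongous_regions` is a hypothetical helper name.

```python
MB = 1024 * 1024

def humongous_regions(object_bytes, region_bytes):
    """Return the number of whole regions a humongous object occupies, or 0 if it is not humongous."""
    if object_bytes <= region_bytes // 2:
        return 0
    # Humongous objects occupy whole regions, so round up.
    return (object_bytes + region_bytes - 1) // region_bytes

def wasted_bytes(object_bytes, region_bytes):
    """Return the unused tail of the last region for a humongous object."""
    regions = humongous_regions(object_bytes, region_bytes)
    return regions * region_bytes - object_bytes if regions else 0

# Example: with 4MB regions, a 9MB array needs 3 regions and leaves 3MB unused.
regions_needed = humongous_regions(9 * MB, 4 * MB)
unused = wasted_bytes(9 * MB, 4 * MB)
```

Comparing the request sizes reported in the verbose GC log against this threshold can suggest whether a larger -XX:G1HeapRegionSize would make most of those allocations ordinary ones.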
Print humongous requests from a verbosegc:
awk 'BEGIN {print "Humongous Allocation";} /humongous/ { for (i=1;i<=NF;i++) { if ($i == "allocation" && $(i+1) == "request:") { print $(i+2); } } }' $FILE > data.csv
To create a histogram using Python+Seaborn:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import pandas as pd
import seaborn as sns
sns.set_theme()
# data.csv is the single-column file produced by the awk command above
data = pd.read_csv("data.csv")
axes = sns.histplot(data, x="Humongous Allocation")
# Show full numbers with thousands separators instead of scientific notation
axes.ticklabel_format(style='plain')
axes.get_xaxis().set_major_formatter(ticker.StrMethodFormatter('{x:,.0f}'))
axes.get_yaxis().set_major_formatter(ticker.StrMethodFormatter('{x:,.0f}'))
plt.xticks(rotation=45)
plt.tight_layout()
# Save the chart before showing it
plt.savefig('humongous.png')
plt.show()
Throughput/Parallel Scavenge Collector (ParallelGC)
This is the default policy on Java 8 and below.
The throughput collector performs parallel scavenge copy collection on the young generation. This type of garbage collection is the default on multi-processor server-class machines.
Two types of tuning for this collector:
Option 1: Use the default throughput/parallel scavenge collector with built-in tuning enabled.
Starting with Version 5, the Sun HotSpot JVM detects some characteristics of the operating system on which the server is running and attempts to set up an appropriate generational garbage collection mode, either parallel or serial, depending on the presence of multiple processors and the size of physical memory. All hardware on which the product runs in production and preproduction mode is expected to satisfy the requirements to be considered a server-class machine. However, some development hardware might not meet these criteria.
The behavior of the throughput garbage collector, whether tuned automatically or not, remains the same: as it tries to maximize the benefit of generational garbage collection, it introduces some significant pauses into the execution of the Java application that are proportional to the size of the used heap. However, the automatic algorithms cannot determine whether their actions suit your workload, or whether the system would be better served by a different garbage collection strategy.
Consult these tuning parameters:
-XX:+UseParallelGC
-XX:+UseAdaptiveSizePolicy
-XX:+AggressiveHeap
Option 2: Use the default throughput/parallel scavenge collector, but tune it manually.
A disadvantage of the built-in algorithm, which is enabled with the -XX:+UseAdaptiveSizePolicy parameter, is that it limits what other parameters, such as -XX:SurvivorRatio, can be configured to do in combination with it. When you use the built-in algorithm, you give up some control over the resource allocations that are used during execution. If the results of using the built-in algorithm are unsatisfactory, it is easier to configure the JVM resources manually than to try to tune the actions of the algorithm. Manually configuring the JVM resources involves half as many options as tuning the actions of the algorithm.
Consult these tuning parameters:
-XX:NewRatio=2 (this is the default for a server that is configured for VM mode)
-XX:MaxNewSize= and -XX:NewSize=
-XX:SurvivorRatio=
-XX:+PrintTenuringDistribution
-XX:TargetSurvivorRatio=
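When tuning manually, it helps to see what the ratio options imply in absolute terms. The following minimal Python sketch assumes the usual meanings of the two ratios: -XX:NewRatio is the old-to-young size ratio, and -XX:SurvivorRatio is the ratio of eden to a single survivor space. The function name and the 3000MB heap are illustrative only, and real sizes are rounded for alignment and may move if adaptive sizing is enabled.

```python
def generation_sizes_mb(heap_mb, new_ratio, survivor_ratio):
    """Estimate generation sizes implied by -XX:NewRatio and -XX:SurvivorRatio."""
    # NewRatio is old:young, so young is 1/(NewRatio + 1) of the heap.
    young = heap_mb / (new_ratio + 1)
    # SurvivorRatio is eden:one survivor space, and there are two survivor spaces.
    survivor = young / (survivor_ratio + 2)
    return {
        "old": heap_mb - young,
        "young": young,
        "eden": young - 2 * survivor,
        "survivor_each": survivor,
    }

# Illustrative values: a 3000MB heap with NewRatio=2 and SurvivorRatio=8.
sizes = generation_sizes_mb(heap_mb=3000, new_ratio=2, survivor_ratio=8)
print(sizes)
```

With these example values the young generation is one third of the heap (1000MB), split into an 800MB eden and two 100MB survivor spaces. The output of -XX:+PrintTenuringDistribution can then show whether the survivor spaces are large enough for the observed object ages.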
See https://www.ibm.com/docs/en/was-nd/9.0.5?topic=tj-tuning-hotspot-java-virtual-machines-solaris-hp-ux and https://www.ibm.com/docs/en/was-nd/9.0.5?topic=thjvmshu-sun-hotspot-jvm-tuning-parameters-solaris-hp-ux
Verbose garbage collection (-verbose:gc)
Verbose garbage collection is a low-overhead log to understand garbage collection times and behavior. By default, it is written to stdout (e.g. native_stdout.log).
Java 8
For Java 8, use the following Java options:
-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails
To add safepoint times:
-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime
To send output to a set of rolling files instead of stdout:
-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -Xloggc:verbosegc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
For WAS traditional, use SERVER_LOG_ROOT to write to the same directory as other files:
-verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -Xloggc:${SERVER_LOG_ROOT}/verbosegc.log -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=100M
-XX:+PrintHeapAtGC may be used for additional information although it has some overhead.
Java >= 9
For Java >= 9, use the recommended -Xlog:gc option instead. Note that -XX:+PrintGCDetails is no longer required (see the mapping for other options):
-Xlog:gc:stdout:time,level,tags
To add safepoints:
-Xlog:safepoint=info,gc:stdout:time,level,tags
To send output to a set of rolling files instead of stdout:
-Xlog:safepoint=info,gc:file=verbosegc.log:time,level,tags:filecount=10,filesize=100M
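The recipe's checks on pause proportion and long individual pauses can be roughed out with a short script when a dedicated tool is not at hand. The following minimal Python sketch assumes log lines of the shape `[time][level][tags] GC(n) Pause ... 12.345ms`, as produced with the `time,level,tags` decorators. The sample lines are hypothetical and only modeled on that shape, `summarize_pauses` is a made-up name, and the first-to-last timestamp interval is only an approximation of elapsed time.

```python
import re
from datetime import datetime

# Assumed line shape with the time,level,tags decorators:
#   [<ISO-8601 time>][<level>][<tags>] GC(<n>) Pause ... <millis>ms
PAUSE_LINE = re.compile(r"^\[([^\]]+)\].*\bPause\b.*?(\d+(?:\.\d+)?)ms\s*$")
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"

def summarize_pauses(lines, long_pause_ms=400.0):
    """Return total pause time, its share of the logged interval, and long pauses."""
    pauses = []
    for line in lines:
        match = PAUSE_LINE.match(line)
        if match:
            when = datetime.strptime(match.group(1), TIME_FORMAT)
            pauses.append((when, float(match.group(2))))
    total_ms = sum(ms for _, ms in pauses)
    # Approximate the observed interval as first-to-last pause timestamp.
    elapsed_ms = 0.0
    if len(pauses) > 1:
        elapsed_ms = (pauses[-1][0] - pauses[0][0]).total_seconds() * 1000.0
    return {
        "total_pause_ms": total_ms,
        "pause_fraction": total_ms / elapsed_ms if elapsed_ms > 0 else None,
        "long_pauses_ms": [ms for _, ms in pauses if ms > long_pause_ms],
    }

# Hypothetical sample lines modeled on the assumed format, not real output.
sample = [
    "[2024-01-01T10:00:00.000+0000][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 5.000ms",
    "[2024-01-01T10:00:05.000+0000][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 60M->8M(256M) 20.000ms",
    "[2024-01-01T10:00:07.000+0000][info][gc] GC(2) Concurrent Cycle 30.000ms",
    "[2024-01-01T10:00:10.000+0000][info][gc] GC(3) Pause Full (G1 Compaction Pause) 200M->50M(256M) 475.000ms",
]
summary = summarize_pauses(sample)
```

Only lines containing the word "Pause" are counted, so concurrent phases are excluded. A tool such as the Garbage Collection and Memory Visualizer remains the more complete option for real analysis.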
CompressedOops
On 64-bit, if using -Xmx less than about 32GB, then -XX:+UseCompressedOops is enabled by default:
"Compressed oops is supported and enabled by default in Java SE 6u23 and
later" (http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html)
Oops stands for ordinary object pointer.
Recent versions of HotSpot support -Xmx much larger than 32GB with CompressedOops using -XX:ObjectAlignmentInBytes: https://bugs.openjdk.java.net/browse/JDK-8040176
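The 32GB figure follows from simple arithmetic: a compressed pointer is 32 bits wide and counts object-alignment units rather than bytes. The following minimal Python sketch shows that relationship. The function name is hypothetical, and the result is a theoretical ceiling, since the heap size at which the JVM actually keeps compressed oops enabled is somewhat lower.

```python
GB = 1024 ** 3

def max_compressed_oops_heap_bytes(object_alignment_bytes=8):
    """Return the theoretical heap size addressable with 32-bit compressed oops."""
    # A 32-bit compressed pointer counts alignment units rather than bytes.
    return (2 ** 32) * object_alignment_bytes

# Default 8-byte alignment, then -XX:ObjectAlignmentInBytes=16.
default_limit_gb = max_compressed_oops_heap_bytes(8) // GB
aligned_16_limit_gb = max_compressed_oops_heap_bytes(16) // GB
print(default_limit_gb, aligned_16_limit_gb)
```

A larger alignment raises the ceiling but also pads every object to that alignment, so the extra addressable heap is partly offset by wasted space.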
Detailed Garbage Collection Tuning
-XX:+AggressiveOpts:
Turns on point performance optimizations that are expected to be on by default in upcoming releases. The changes grouped by this flag are minor changes to JVM runtime compiled code and not distinct performance features (such as BiasedLocking and ParallelOldGC). This is a good flag to try the JVM engineering team's latest performance tweaks for upcoming releases. Note: this option is experimental! The specific optimizations enabled by this option can change from release to release and even build to build. You should reevaluate the effects of this option prior to deploying a new release of Java.
http://www.oracle.com/technetwork/java/tuning-139912.html#section4.2.4
Consider -XX:+UseTLAB which "uses thread-local object allocation blocks. This improves concurrency by reducing contention on the shared heap lock." (http://docs.oracle.com/cd/E13209_01/wlcp/wlss30/configwlss/jvmgc.html)
The -XX:+AlwaysPreTouch option may be used to force the entire Java heap into RAM on startup.
Permanent Region (permgen)
HotSpot used to have a dedicated region of the address space called the permanent generation to store things such as class meta-data, interned Strings, and class static variables. This region needed to be manually sized. If the region was exhausted, the JVM would throw an OutOfMemoryError with the message "PermGen space." The PermGen space has been removed in Java 8 (http://openjdk.java.net/projects/jdk8/milestones) and replaced with the Metaspace (unbounded by default but may be capped with -XX:MaxMetaspaceSize).
The proposal to remove the permanent generation (JEP 122) described the earlier design and the change as follows:
"Hotspot's representation of Java classes (referred to here as class meta-data) is currently stored in a portion of the Java heap referred to as the permanent generation. In addition, interned Strings and class static variables are stored in the permanent generation. The permanent generation is managed by Hotspot and must have enough room for all the class meta-data, interned Strings and class statics used by the Java application. Class metadata and statics are allocated in the permanent generation when a class is loaded and are garbage collected from the permanent generation when the class is unloaded. Interned Strings are also garbage collected when the permanent generation is GC'ed.
The proposed implementation will allocate class meta-data in native memory and move interned Strings and class statics to the Java heap. Hotspot will explicitly allocate and free the native memory for the class meta-data. Allocation of new class meta-data would be limited by the amount of available native memory rather than fixed by the value of -XX:MaxPermSize, whether the default or specified on the command line."
"The -XX:MaxPermSize= and -Xmx (Maximum Java Heap size) parameters respectively configure the maximum size of the permanent region, where the class code and related data are logically presented as part of the old generation region but are kept physically separate, and the maximum size of the main heap where Java objects and their data are stored either in the young or old generation regions. Together the permanent region and the main heap comprise the total Java heap. An allocation failure in either of these regions either represents the inability to accommodate either all the application code or all the application data, both of which are terminal conditions, that can exhaust available storage, and cause an OutOfMemory error." (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_hotspot_jvm.html, https://docs.oracle.com/javase/7/docs/webnotes/tsg/TSG-VM/html/memleaks.html)
In addition, note that interned Strings moved to the Java heap starting in Java 7: https://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes
Reference Processing
PhantomReferences are handled differently than finalizers. Queued PhantomReferences are processed on the back of every GC cycle.
By default, there is a single "Reference Handler" thread which processes the ReferenceQueue. Use -XX:+ParallelRefProcEnabled to enable multiple threads for parallel reference queue processing. This may be useful for things such as high DirectByteBuffer allocation and free rates.
DirectByteBuffers may be monitored with the BufferPoolMXBean: http://docs.oracle.com/javase/7/docs/api/java/lang/management/BufferPoolMXBean.html
Safepoints
Safepoints are the internal mechanism by which the JVM tries to pause application threads for operations such as stop-the-world garbage collections.
Additional information may be printed with:
-XX:+PrintSafepointStatistics
PreserveFramePointer
If using JDK >= 8u60, use -XX:+PreserveFramePointer to allow tools such as Linux perf to perform higher quality stack walking. Details:
- https://docs.oracle.com/javase/9/tools/java.htm
- http://www.brendangregg.com/Slides/JavaOne2015_MixedModeFlameGraphs.pdf
DTrace Integration
Newer versions of Java have DTrace integration, but one large limitation is Bug 6617153, which causes DTrace to fail to evaluate Java thread stack names, making jstack nearly useless.
Code Cache
The default JIT compiled code cache size is 32MB-48MB. If there is available RAM, consider increasing this code cache size to improve JIT performance. For example:
-XX:ReservedCodeCacheSize=1536m
Increasing the maximum code cache size may have negative consequences. The longer the JIT keeps compiling, the more likely it is to generate code at higher optimization levels. It takes a long time to compile at the higher optimization levels, and the time spent compiling can be a negative in itself. More broadly, the higher optimization compilations produce much bigger compiled method bodies. Too many can start to impact the instruction cache. So, ideally, you want the JIT to just compile the "right" set of methods at "appropriate" optimization levels and then stop. There isn't any way of knowing when that has happened, so if the code cache is set very big, the JIT may keep going into negative territory if it runs for long enough. The best way to find the right value is to run experiments with different values and run for long periods of time.
To exclude certain methods from JIT code cache compilation:
-XX:CompileCommand=exclude,com/example/Example.method
To log code compilations (a diagnostic option, so it must be unlocked first): -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation
Code Cache Flushing
We have observed cases where code cache flushing when the code cache is full causes application thread pauses (e.g. DTrace stack samples in libjvm.so`1cJCodeCachebAfind_and_remove_saved_code6FpnNmethodOopDesc__pnHnmethod+0x50). You may test whether this is the case by disabling code cache flushing, although of course this means that code can no longer be JITted after the code cache limit is reached:
-XX:-UseCodeCacheFlushing
For performance reasons, consider increasing the code cache size as well when doing this (tuned to available RAM):
-XX:-UseCodeCacheFlushing -XX:ReservedCodeCacheSize=1536m
If the code cache still fills up (e.g. lots of reflection, etc.) then you will receive this message in stderr:
CodeCache is full. Compiler has been disabled.
Relevant code cache changes:
Environment Variables
Use JAVA_TOOL_OPTIONS to specify additional JVM arguments for programs launched from that terminal/command prompt. For example:
export JAVA_TOOL_OPTIONS="-Xmx1024m"
/opt/IBM/WebSphere/AppServer/bin/collector.sh
async-profiler
async-profiler is a Safepoint-aware native sampling profiler.
Concurrent low-pause mark-sweep collector (CMS)
The CMS collector was removed in Java 14: https://openjdk.java.net/jeps/363
The stop-the-world phases of the CMS garbage collector include CMS-remark (https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs), and CMS-initial-mark (https://blogs.oracle.com/jonthecollector/entry/the_unspoken_cms_and_printgcdetails).
CMS has poor contraction capabilities, partly because it can only compact during the full collection that follows a failed CMS cycle. If fragmentation is high, this can cause CMS to fail more often and cause many full GCs.
"CMS (Concurrent Mark Sweep) garbage collection does not do compaction." (http://www.oracle.com/technetwork/java/javase/tech/g1-intro-jsp-135488.html)