Sizing OpenJ9 Native Memory
If running in a memory-constrained environment, review the following diagnostic and sizing guidance for OpenJ9 native memory.
It's not uncommon for a heavy application to use 500MB or more of native memory outside the Java heap. This footprint can often be reduced as discussed below, though that may come at a performance cost.
Diagnostics
- On Linux, consider limiting malloc arenas with the environment variable `MALLOC_ARENA_MAX=1` and restart.
- If using IBM Java 8 and there's an opportunity to restart the JVM, restart with the following option for additional "Standard Class Libraries" native memory accounting in javacores (minimal performance overhead): `-Dcom.ibm.dbgmalloc=true`
- Gather operating system statistics on resident process memory usage. A single snapshot at peak workload is an okay start, but periodic snapshots over time using a script provide a better picture (a combined collection sketch follows this list).
  - Linux examples:
    - With `/proc`:

      ```
      $ PID=...
      $ grep VmRSS /proc/$PID/status
      VmRSS:    201936 kB
      ```

    - With `ps` (in KB):

      ```
      $ PID=...
      $ ps -p $PID -o rss
        RSS
      201936
      ```

    - With `top`, review the `RES` column in KB (change `-d` for the interval between outputs in seconds and `-n` for the number of intervals):

      ```
      $ PID=...
      $ top -b -d 1 -n 1 -p $PID
      top - 19:10:46 up  5:43,  0 users,  load average: 0.01, 0.05, 0.02
      Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  0.0 us,  0.0 sy,  0.0 ni,100.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
      MiB Mem :  15950.9 total,  13429.6 free,    473.5 used,   2047.8 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  15239.3 avail Mem

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
            1 default   20   0 7832760 201936  55304 S   6.2   1.2   0:06.86 java
      ```
- Gather javacores of the process. A single javacore at peak workload is an okay start, but periodic javacores over time using a script provide a better picture.
  - Linux example:
    - Use `kill -QUIT` (assuming default settings that only produce a javacore):

      ```
      $ PID=...
      $ kill -QUIT $PID
      ```
- Gather detailed per-process memory mappings. A single snapshot at peak workload is an okay start, but periodic snapshots over time using a script provide a better picture.
  - Linux example:

    ```
    $ PID=...
    $ cat /proc/$PID/smaps
    ```
- If possible, ensure verbose garbage collection is enabled; for example: `-Xverbosegclog:verbosegc.%seq.log,20,50000`
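Several of the items above suggest collecting snapshots periodically with a script. The following is a minimal sketch, assuming a Linux host and default `-Xdump` settings; the script name, interval, snapshot count, and output file names are illustrative and should be adapted:

```sh
#!/bin/sh
# Periodic native-memory snapshot sketch (illustrative; adjust INTERVAL and COUNT).
# Usage: ./collect.sh <java-pid>
PID=${1:?usage: $0 <java-pid>}
INTERVAL=60   # seconds between snapshots (assumption)
COUNT=60      # number of snapshots (assumption)

i=0
while [ "$i" -lt "$COUNT" ]; do
  TS=$(date +%Y%m%d_%H%M%S)
  grep VmRSS /proc/$PID/status > "rss_$TS.txt"   # resident set size
  cat /proc/$PID/smaps > "smaps_$TS.txt"         # detailed per-process memory mappings
  kill -QUIT "$PID"                              # javacore, assuming default -Xdump settings
  sleep "$INTERVAL"
  i=$((i + 1))
done
```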
Review and Sizing
- Review the `NATIVEMEMINFO` section in the javacores. Note that these are virtual memory allocations and not necessarily resident. Review all of them with a particular focus on:
  - `Java Heap`: The native allocation(s) for the Java heap itself (`-Xmx`/`-XX:MaxRAMPercentage`). Even if `-Xms` (or `-XX:InitialRAMPercentage`) is less than `-Xmx`/`-XX:MaxRAMPercentage` and heap usage is less than `-Xmx`/`-XX:MaxRAMPercentage`, you should always assume the entire `-Xmx`/`-XX:MaxRAMPercentage` native memory will be touched (and thus resident) at some point: even if the application workload never reaches that amount of live Java heap usage, most modern garbage collectors are generational, which almost always means trash accumulates in the tenured region until a full GC, so most or all of the Java heap is likely to become resident given enough time.
  - `Classes`: The native backing of classes loaded in the Java heap. If this is excessively large, gather a system dump of the process and check for classloader memory leaks with the Eclipse Memory Analyzer Tool.
  - `Threads`: Each thread has two stacks, both of which live in native memory. In some cases, very large stacks and/or a customized maximum stack size (`-Xss`) can inflate this number, but more often a large value here simply reflects a large number of threads that can be reduced, or may be due to a thread leak. Review threads and thread stacks in the javacore and consider reducing thread pool maximums. To investigate a thread leak, gather a system dump of the process and review it with the Eclipse Memory Analyzer Tool.
  - `JIT`: Some growth here is expected over time, but it should level out at the maximums specified by `-Xcodecachetotal`, `-Xjit:dataTotal`, and `-Xjit:scratchSpaceLimit` (see below). Note that defaults in recent versions are relatively large (upwards of 550MB or more at peak, primarily driven by the code cache and spikes in JIT compilation).
  - `Direct Byte Buffers`: Native memory allocations driven by Java code, which may have different drivers. To investigate what's holding on to DirectByteBuffers, gather a system dump of the process, review it with the Eclipse Memory Analyzer Tool, and run the query IBM Extensions } Java SE Runtime } DirectByteBuffers.
  - `Unused <32bit allocation regions`: Available space within the `-Xmcrs` value (for compressed references).
- Review the `1STSEGMENT` lines in the javacores using `get_memory_use.pl` to break down the resident amounts behind some of the above virtual amounts.
- Review verbose garbage collection for a lot of phantom reference processing, which may be a symptom of spikes in DirectByteBuffers. Even if Direct Byte Buffer usage in `NATIVEMEMINFO` above is relatively low, there may have been a spike in DirectByteBuffer memory usage which, in general, will only be returned to libc free lists rather than going back to the operating system.
- Check the javacore and per-process memory mappings for non-standard native libraries (e.g. loaded with `-agentpath`, `-agentlib`, and `-Xrun`) and consider testing without each library.
- Consider tuning the following options (a combined, illustrative command line follows this list):
  - Maximum Java heap size: `-Xmx_m` or `-XX:MaxRAMPercentage=_`
  - Maximum size of threads and thread pools (e.g. `<executor coreThreads="_" />` for Liberty, or maximum thread pool sizes for WebSphere Application Server traditional)
  - Maximum JIT code cache: `-XX:codecachetotalMaxRAMPercentage=_` or `-Xcodecachetotal_m` (default 256MB)
  - Maximum JIT data size (in KB): `-Xjit:dataTotal=_`
  - JIT scratch space limit (in KB): `-Xjit:scratchSpaceLimit=_` (default 256MB)
  - Maximum shared class cache size (though it must be destroyed first to reduce an existing one): `-Xscmx_m`
  - Number of garbage collection helper threads: `-Xgcthreads_`
  - Number of JIT compilation threads: `-XcompilationThreads_`
  - Maximum stack size: `-Xss_k`
- Consider using a JITServer (a.k.a. IBM Semeru Cloud Compiler) to move most JIT compilation memory demands to another process.
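As a purely illustrative combination of the tuning knobs above (every value is a placeholder rather than a recommendation, each limit trades native memory for potential throughput, startup, or pause-time cost, and `app.jar` is a stand-in for your application), a constrained startup might look like:

```sh
# Illustrative only: verify each option against your Java version before use.
# -Xjit suboptions are in KB and are combined into one -Xjit: option because,
# in general, only the last -Xjit option on the command line takes effect.
MALLOC_ARENA_MAX=1 java \
  -Xmx512m \
  -Xss512k \
  -Xgcthreads4 \
  -XcompilationThreads2 \
  -Xcodecachetotal128m \
  -Xjit:dataTotal=65536,scratchSpaceLimit=131072 \
  -Xscmx60m \
  -Xverbosegclog:verbosegc.%seq.log,20,50000 \
  -jar app.jar
```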
Detailed Diagnostics
- If you suspect a leak, monitor unfreed native memory allocations:
  - Linux >= 4.1: eBPF `memleak.py`
  - IBM Java 8: LinuxNativeTracker
    - For IBM Semeru Runtimes, open a support case to ask for a custom build of LinuxNativeTracker.
- If unaccounted memory remains, gather all of the same information above as well as a system dump of the process, and cross-reference the per-process memory maps against known JVM virtual memory allocations to find what is unaccounted for.
  - Note that `NATIVEMEMINFO` may be dumped from a system dump using the `!nativememinfo` command in `jdmpview` (see the sketch after this list).
- Fragmentation in C libraries is also possible. Use a native debugger script (e.g. for Linux glibc) to walk the in-use and free lists and search for holes in memory.
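As a sketch of the cross-referencing step (the awk one-liner is an assumption about how you might tally the mappings, and the core file name will vary by dump configuration), the resident totals from `smaps` can be summed and compared against the JVM's own `NATIVEMEMINFO` view from the system dump:

```sh
# Sum resident memory across all mappings of the process (smaps values are in kB).
PID=...   # PID of the java process, as in the examples above
awk '/^Rss:/ { total += $2 } END { printf "Total RSS: %d kB\n", total }' /proc/$PID/smaps

# Open the system dump with jdmpview (shipped with OpenJ9 / IBM Semeru Runtimes) and run
# !nativememinfo at its interactive prompt; the core file name below is illustrative:
#   $ jdmpview -core core.dmp
#   > !nativememinfo
```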