J9 Native OutOfMemoryError Recipe

On IBM Java and Semeru/OpenJ9 JVMs, compressed references are typically enabled by default in recent versions of Java when -Xmx is less than 57GB. When compressed references are enabled, native structures backing classes, classloaders, threads, and monitors (CTM) must be allocated in the 0-4GB virtual address space range (or 0-2GB on z/OS). This discussion calls that space "below the bar". If there is insufficient space below the bar for a new CTM due to excessive usage of these structures, competition with other native memory allocations, or native memory fragmentation, then a native OutOfMemoryError (NOOM) is thrown even if physical and virtual memory are still available to the process.

  1. Confirm from the javacore.txt file produced by the OutOfMemoryError that it is not a Java heap OOM by checking the text that follows Detail "java/lang/OutOfMemoryError". The following are examples of NOOMs; note that such errors may also be caused by physical memory exhaustion, virtual memory exhaustion, ulimit settings, or other causes unrelated to compressed references:
    1. 1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "native memory exhausted"
    2. 1TISIGINFO Dump Event "systhrow" (00040000) Detail "java/lang/OutOfMemoryError" "Failed to create a thread: retVal -1073741830, errno 112 (0x70), errno2 0xb510292" received
  2. Confirm in the javacore.txt file that compressed references are enabled by checking that 1CIOMRVERSION or 1CIGCVERSION contains CMPRSS (or check for "Compressed References" in !coreinfo in jdmpview for a system dump).
  3. On non-z/OS platforms, review the 1STHEAPTYPE Object Memory section in the javacore.txt file to confirm that all Java heap region start column values are 0x1_0000_0000 or above (or 0x8000_0000 on z/OS). If any are lower, use -Xgc:preferredHeapBase=0x100000000, although this will have a slight performance impact. For example:
    1STHEAPTYPE    Object Memory
    NULL           id                 start              end                size               space/region
    1STHEAPSPACE   0x000000501B4EA810         --                 --                 --         Generational 
    1STHEAPREGION  0x000000501B4EAB40 0x0000000100000000 0x000000017C000000 0x00000000FC000000 Generational/Tenured Region 
    1STHEAPREGION  0x000000501B4EAA30 0x0000000230000000 0x00000002AC000000 0x000000007C000000 Generational/Nursery Region 
    1STHEAPREGION  0x000000501B4EA920 0x00000002AC000000 0x00000002C0000000 0x0000000014000000 Generational/Nursery Region 
  4. If you have a system dump, you can get the most accurate understanding of below the bar storage by walking the J9HeapWrapper linked list rooted in j9javavm } portLibrary } omrPortLibrary } portGlobals } platformGlobals } subAllocHeapMem32 } firstHeapWrapper. Consider the example !belowthebar command to do this.
  5. Otherwise, review the NATIVEMEMINFO section of the javacore.txt file and focus on the following lines (or run the !nativememinfo command in jdmpview on a core dump):
    • 3MEMUSER | +--Classes: 1,223,940,328 bytes / 162364 allocations
      • A subset of these data structures is below the bar. Use get_memory_use.pl and look for Class Memory in 256MB segments numbered below 0x10 (that is, below 4GB). For example:
      Virtual Address Segments (split 0x10000000/256.00 MB)
      ==
      0x2 = Class Memory (RAM) (182.71 MB), Segment Total (191588400)
      0x3 = Class Memory (RAM) (5.51 MB), Segment Total (5774504)
      0x4 = Class Memory (RAM) (64.07 MB), Segment Total (67182008)
      0x5 = Class Memory (RAM) (8.02 KB), Segment Total (8208)
      If this usage is excessive, check for class or classloader leaks or excessive class usage. Use -Dsun.reflect.inflationThreshold=0 if there are a large number of sun/reflect/DelegatingClassLoader instances.
    • 4MEMUSER | | +--Java Stack: 17,656,576 bytes / 345 allocations
      • These are thread stacks below the bar (the stacks on the "Native Stacks" line are allocated by the operating system and are usually high in the address space). Important note: these are allocated from the J9 segments even though they are not printed in the 1STSEGMENT lines. If there are many, try to reduce the number of threads or check for thread leaks, and check for an excessively large stack size (-Xss).
    • 4MEMUSER | | +--Unused <32bit allocation regions: 73,580,197 bytes / 19 allocations
      • This is free segment memory below the bar although some of it may be fragmented.
    • 5MEMUSER | | | +--Direct Byte Buffers: 7,945,600 bytes / 562 allocations
      • On non-z/OS platforms, these might compete with CTM or drive fragmentation. If there are many, try to reduce them, search for leaks, or drive more frequent cleanup of their PhantomReferences with GC tuning (or -XX:MaxDirectMemorySize). On WAS traditional, try channelwritetype=sync.
    • Run get_memory_use.pl and observe any other usage in 256MB segments less than 0x10.
  6. If using a type 2 JDBC driver, its native memory may compete with CTM (and must compete with CTM in the case of the type 2 DB2 driver on z/OS); consider switching to the type 4 driver.
  7. On Windows, consider setting HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\AllocationPreference=0x100000 (REG_DWORD) to avoid any tendency of allocating non-CTM native memory below the bar.
  8. Check that the operating system has sufficient free physical memory at the time of the NOOM.
  9. If the NOOM might be related to competition from non-CTM sources, consider increasing -Xmcrs.
  10. Set -Dcom.ibm.dbgmalloc=true and review 4MEMUSER Zip, 4MEMUSER Wrappers, and 5MEMUSER Malloc usage in NATIVEMEMINFO which may compete with CTM or drive fragmentation. If there are many, try to reduce them or search for leaks.
  11. If native memory usage below the bar cannot be accounted for using the above items, then look for other native memory users: review -agentpath, -agentlib, and -Xrun libraries and any other loaded shared objects as seen in native operating system core debuggers, or run a native leak tracker such as eBPF or LinuxNativeTracker.
  12. If a temporary workaround is needed, test using -Xnocompressedrefs although this may have a performance impact of up to 10-20% and increase Java heap usage significantly.
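
The first three checks in the recipe can be partly automated. The following is a minimal Python sketch, not an official tool; the function name check_javacore and the result keys are hypothetical, and it assumes the javacore line tags look like the examples in the steps above.

```python
# Heap regions should start at or above 4GB on non-z/OS platforms.
# Pass bar=0x8000_0000 for z/OS.
BAR = 0x1_0000_0000

def check_javacore(text, bar=BAR):
    """Summarize the OOM detail, compressed references, and heap placement.

    Hypothetical helper: it relies only on the line tags shown in the
    recipe (1TISIGINFO, 1CIOMRVERSION, 1CIGCVERSION, 1STHEAPREGION).
    """
    result = {"oom_details": [], "compressed_refs": False, "regions_below_bar": []}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("1TISIGINFO") and "java/lang/OutOfMemoryError" in line:
            # Keep the text that follows the exception name.
            detail = line.partition("java/lang/OutOfMemoryError")[2].strip('" ')
            result["oom_details"].append(detail)
        elif line.startswith(("1CIOMRVERSION", "1CIGCVERSION")) and "CMPRSS" in line:
            result["compressed_refs"] = True
        elif line.startswith("1STHEAPREGION"):
            # Columns: tag, id, start, end, size, space/region
            start = line.split()[2]
            if int(start, 16) < bar:
                result["regions_below_bar"].append(start)
    return result
```

To use it, read the javacore with open(path).read() and pass the text in. A non-empty regions_below_bar list suggests that the Java heap itself occupies space below the bar, which is the case step 3 addresses.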

For background, see J9 Compressed References.
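
For step 5, the NATIVEMEMINFO lines of interest can be pulled out of a javacore with a short script. This is a rough Python sketch under the assumption that the lines follow the "name: bytes / allocations" layout shown in the examples above; the function name parse_nativememinfo is hypothetical.

```python
import re

# Matches lines such as:
#   4MEMUSER | | +--Java Stack: 17,656,576 bytes / 345 allocations
MEMUSER = re.compile(
    r"^\dMEMUSER[\s|]*\+--(?P<name>.+?): (?P<bytes>[\d,]+) bytes"
    r"(?: / (?P<allocs>[\d,]+) allocations?)?"
)

def parse_nativememinfo(text):
    """Return {category name: (bytes, allocations or None)}.

    If a category name appears more than once, the last occurrence wins.
    """
    usage = {}
    for raw in text.splitlines():
        m = MEMUSER.match(raw.strip())
        if not m:
            continue
        allocs = m.group("allocs")
        usage[m.group("name")] = (
            int(m.group("bytes").replace(",", "")),
            int(allocs.replace(",", "")) if allocs else None,
        )
    return usage
```

The categories to look at first are the ones named in step 5: Classes, Java Stack, Unused <32bit allocation regions, and Direct Byte Buffers. Comparing these values across several javacores taken over time may help separate a leak from steady-state usage.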