IBM Java Health Center

IBM Monitoring and Diagnostics for Java - Health Center is a profiler for IBM Java. It includes a statistical CPU profiler that samples Java stacks that are using CPU. Recent versions generally have an overhead of less than 1% and are suitable for production use and may be enabled dynamically without restarting the JVM.

Client Installation

See the download instructions.

Agent Installation

The Health Center agent is pre-packaged with IBM Java 8 (on Linux, AIX, Windows, and z/OS) and IBM Semeru Runtimes 11 (on z/OS). See the informal instructions for other versions of IBM Semeru Runtimes.

Agent Usage

There are two ways to enable the Health Center agent: 1) a live socket mode or 2) a headless mode that produces HCD files. In general, we recommend the headless mode at startup (optionally for a limited duration) or dynamically enabled at runtime (optionally for a limited duration).

Health Center Recipe

  1. Open Garbage Collection View } Right Click } Change units } X axis } Date
  2. Observe the garbage collection to see roughly when there was significant activity and mouse over the start and end times. Take these times and crop the data in the menu } Data } Crop Data
  3. Observe the proportion of time spent in garbage collection % in the GC Summary view
  4. Switch to the CPU view and observe the average and maximum CPU %
  5. Switch to the Locking view and sort by Slow. If the number of gets or average hold times are high for the top hitters, review the lock Name
  6. Switch to the Profiling view
  7. Sort by self % (default sort) and observe the top 10 methods or so. If a single method self % is greater than 5-10%, this is concerning. Click on each one and observe the Invocation paths.
  8. Sort by tree %. Usually the first one will be something like Thread.run or Worker.run. Select this and change to the Called methods view. Expand the largest tree items until there is a large "drop;" for example, if methods are 100, 99, 100, 100, etc., and then suddenly there is a drop to one method with 60% and one with 40%, this is usually an indication of a major divergence in general application activity. Continue as needed until something interesting comes up (this is an art more than a science).

Overhead

In general, the overhead of Health Center is between 0.4 to 3%, depending on the mode. In the headless mode, the overhead is about 1%; however, if files roll over, this involves zipping files together and that has a momentary impact, which averaged out can increase overhead up to 3%. The socket mode has the lowest overhead of about 0.4%.

Gathering Data

There are two ways to gather HealthCenter data:

  1. Headless mode: writes data to local .hcd files with the HealthCenter agent enabled in headless mode.
  2. Socket mode: direct TCP connection from a HealthCenter client to a JVM with the HealthCenter agent enabled and data is streamed to the client (there is an option to export streamed data to an hcd file).

In general, headless mode is preferred for production usage to avoid connectivity/firewall issues and doesn't require direct human involvement.

Review Health Center agent configuration properties.

Warning: Health Center enables J9 -Xtrace under the covers which uses additional native memory on each thread based on -Xtrace:buffers. If you have a lot of threads and native memory constraints (RAM or virtual), this may cause native OutOfMemoryErrors, so it's always best to test Health Center with a stress test in a test environment before using in production. This risk may be exacerbated with Liberty which defaults to an unlimited maximum thread pool designed to maximize throughput, so Liberty users should consider capping this with <executor maxThreads="N" /> based on available native memory.

Headless mode

See the four different recipes of enabling headless mode:

Socket mode

Socket mode may be enabled during restart or dynamically similar to above although omit level=headless.

When using the socket mode, by default Health Center uses CORBA as the communication protocol. Another option is to use JRMP with -Dcom.ibm.java.diagnostics.healthcenter.agent.transport=jrmp. Both methods are similar in performance although JRMP has a built-in reaper thread that runs approximately once an hour which calls System.gc.

Enabling Health Center Dynamically

As partly covered in the above topics, Health Center may be enabled dynamically.

Disabling Health Center

After data collection has started, there are two ways to disable data collection:

  1. When configuring headless mode (either on restart or dynamically), use the configuration properties to stop collection after some point. For example (in minutes):

    -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=30 -Dcom.ibm.java.diagnostics.healthcenter.headless.run.number.of.runs=1

    To skip some amount of profiling (e.g. startup), use, for example (in minutes):

    -Dcom.ibm.java.diagnostics.healthcenter.headless.delay.start=5  
  2. In socket mode, in the HealthCenter client, open the "Configure Collection of Data" screen and uncheck all boxes which will send a command to the agent to turn off most collection.

Analyzing Data

  1. Start Eclipse with Health Center installed
  2. In Eclipse } Window } Perspective } Open Perspective } Other... } Health Center Status Summary
  3. Click File } Load Data... to load an HCD file, or File } New Connection... to connect to a running Health Center agent.

WARNING: The HealthCenter client may quietly drop older data if its Java heap is nearly full, so use a very large heap when analyzing. (This is because it was originally designed for socket mode which streams data in).

Profiling View

Click the Profiling link. By default, the list is sorted by self percentage which will show very hot methods.

The Self (%) column reports the percent of samples where a method was at the top of the stack. The Tree (%) column reports the percent of samples where a method was somewhere else in the stack. In this example, WAS NCSA access logging was sampled more than 2% of the time, and the next largest single sample is java.lang.String.regionMatches. If we click on this row, we can expand the "Invocation Paths" to see which methods call this method. In this example, this 1% of samples was mostly called by JSF HTML rendering from the application.

If you sort by Tree %, skip the framework methods from Java and WAS, and find the first application method. In this example, about 47% of total samples was consumed by com.ibm.websphere.samples.daytrader.web.TradeAppServlet.performTask and all of the methods it called. The "Called Methods" view may be further reviewed to investigate the details of this usage.

Updating to the latest agent

  1. Stop the application servers and node agents (and DMGR if on that node).
  2. Install the Health Center Eclipse client and access the local help content to download the latest agent.
  3. After updating the files, make sure to chown them to the user that runs WAS.
  4. Re-start the application servers.

Platform-Specific Agent Update Notes

On AIX, if you had previously run Health Center, even if you stop all JVMs, you will probably see this error extracting libhealthcenter.so:

tar: 0511-188 Cannot create ./jre/lib/ppc64/libhealthcenter.so: Cannot open or remove a file containing a running program.

By default, AIX will keep shared libraries in memory even after all JVMs referencing that library have stopped. To remove the shared library from memory, you may either reboot the box or more simply, run the slibclean command (see also genkld and genld). This should be safe to run because it only affects shared libraries that have no current load or use counts:

The slibclean command unloads all object files with load and use counts of 0. It can also be used to remove object files that are no longer used from both the shared library region and in the shared library and kernel text regions by removing object files that are no longer required.

Now you should be able to overwrite libhealthcenter.so.

You may also find that healthcenter.jar has open file handles (e.g. lsof) in Java processes even if healthcenter was not enabled. This is because healthcenter.jar is in the ext JRE directory which is searched as part of some classpath operations. If you take a system dump, you will find a java.util.jar.JarFile object with a name field that includes healthcenter.jar, and this JarFile object probably has a native file handle open (although you will not find a java.io.File object with that path). In theory, it should be safe to overwrite healthcenter.jar even if running processes have open file handles to it because the JAR file will not be read by those JVMs that do not have healthcenter enabled.

Point to a different agent installation directory

It is possible to update to the latest agent without modifying the binaries in the WAS folder:

  1. Extract the agent ZIP into any directory; for example, /opt/healthcenter/agent/
  2. Take a javacore of the running target server and find the last value of -Djava.ext.dirs (note that there may be multiple instances, so always take the last value). For example: -Djava.ext.dirs=/opt/local/was85/tivoli/tam:/opt/local/was85/java/jre/lib/ext
  3. Prepend the path to the ext folder under the expanded HealthCenter agent directory to -Djava.ext.dirs. For example: -Djava.ext.dirs=/opt/healthcenter/agent/jre/lib/ext:/opt/local/was85/tivoli/tam:/opt/local/was85/java/jre/lib/ext
  4. Append this parameter as well as the following parameters (replacing the path to the HealthCenter agent) to the generic JVM arguments: -Djava.ext.dirs=/opt/healthcenter/agent/jre/lib/ext:/opt/local/was85/tivoli/tam:/opt/local/was85/java/jre/lib/ext -agentpath:/opt/healthcenter/agent/jre/bin/libhealthcenter.so -Dcom.ibm.java.diagnostics.healthcenter.agent.properties.file=/opt/healthcenter/agent/jre/lib/healthcenter.properties
  5. Append this parameter to the "Claspath" textbox on the same page as the generic JVM arguments (replacing the path to the HealthCenter agent): /opt/healthcenter/agent/jre/lib/ext/healthcenter.jar
  6. Add the necessary HealthCenter arguments described above to enable it.
  7. Restart the JVM.

Low mode

-Xhealthcenter:level=low disables method profiling since this has the highest overhead and creates the most data. This would be useful if you wanted something else from health center (e.g. garbage collection, native memory, etc.) with less overhead.

You may also disable thread stack capturing to reduce overhead further: -Dcom.ibm.diagnostics.healthcenter.data.threads=off

Low mode cannot be combined with headless (e.g. -Xhealthcenter:level=low,level=headless), so the way to do it is to use headless mode and then: In jre/lib/ext there is a file called healthcenter.jar. If you unpack that you will find a file called TRACESourceConfiguration.properties and this is what defines which data is switched on by trace. When we run in low mode, we turn off one of the profiling trace points. You can do this manually by editing this file and finding the entry j9jit.16=on and then changing it to j9jit.16=off. If you repackage the jar up you should find that the amount of trace generated is a lot less (but you won't get method profiling).

Hexadecimal Method Names

Sometimes you may see addresses (0x0123ABCD) instead of method names. This usually occurs for methods loaded very early in the JVM such as classloading methods.

This issue is generally resolved by starting HealthCenter from JVM startup or by using newer agents, so test upgrading the agent.

Warning: Health Center is sometimes recommended with -Xtrace:buffers={2m,dynamic} to reduce the probability of losing method name translations but note that this further increases native memory demands.

To investigate, use the option -Dcom.ibm.diagnostics.healthcenter.logging.methodlookup=debug and upload stdout/stderr.

Health Center Details

In one of the largest customer production situations, health center wrote about 5GB per hour of data to the filesystem.

Docker

To use Health Center through Docker with JRMP and the GUI client, java.rmi.server.hostname=9.140.104.32 which is the IP address of the machine doing the docker run and run with: docker run -p 1972:1972 myDockerContainer

FROM ibmjava:sdk
RUN mkdir /opt/app
COPY HelloWorld.java /opt/app
WORKDIR /opt/app
RUN javac  HelloWorld.java
EXPOSE 1972
CMD ["java","-Dcom.sun.management.jmxremote","-Dcom.sun.management.jmxremote.authenticate=false","-Dcom.sun.management.jmxremote.ssl=false","-Dcom.sun.management.jmxremote.rmi.port=1972","-Djava.rmi.server.hostname=9.140.104.32","-Dcom.sun.management.jmxremote.local.only=false","-Xhealthcenter:transport=jrmp","HelloWorld"]

Gathering HCD at Runtime

  1. Go to the current working directory (cwd) of the process. For example, for tWAS, that's normally /opt/IBM/WebSphere/AppServer/profiles/${PROFILE}/
  2. Underneath the cwd, there should be a temporary directory that HealthCenter created. This is normally named tmp_${NUMBER1}_${NUMBER2}_. If this directory is not found under the profiles directory, check under /tmp. If it's still not found, use lsof to search for the trace file: lsof -p ${PID} | grep "trace$"
  3. Copy the contents of this directory to a temporary directory. For example:
    mkdir /tmp/hc
    cp tmp*_/* /tmp/hc/
  4. Go to the temporary directory. For example: cd /tmp/hc/
  5. Zip up the files in this directory (if you don't have the zip utility, you can use the jar utility in the JDK which also creates a zip file):
    zip hctmp.hcd *

HCD File Name

The HCD file name is of the form ${dir}/${prefix}healthcenter${startday}${startmonth}${startyear}_${starthour}${startminutes}${startsseconds}_${pid}_${filenumber}.hcd. The temporary directory name is of the form ${dir}/tmp_${startday}${startmonth}${startyear}_${starthour}${startminutes}${startsseconds}_.

By default, ${dir} is the current working directory but may be overridden with -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=somedir.

By default, ${prefix} is blank but may be set with -Dcom.ibm.java.diagnostics.healthcenter.headless.filename=someprefix. If this is set, an _ is automatically appended to the prefix; therefore, the file will be ${prefix}_healthcenter.... On WAS traditional, if this is set in the generic JVM arguments, the name of the WAS JVM may be set with -Xhealthcenter:level=headless -Dcom.ibm.java.diagnostics.healthcenter.headless.filename=${WAS_SERVER_NAME}.

Agent Version

To find out the agent version from the JDK, extract hcversion.properties (version.properties on older JDKs) from healthcenter.jar:

$ jar xf healthcenter.jar hcversion.properties && cat hcversion.properties && rm hcversion.properties
jar.version=3.0.17.20190121

Disabling Components

Particular Health Center components may be disabled with off system properties. For example:

-Dcom.ibm.diagnostics.healthcenter.data.classes=off
-Dcom.ibm.diagnostics.healthcenter.data.cpu=off
-Dcom.ibm.diagnostics.healthcenter.data.gc=off
-Dcom.ibm.diagnostics.healthcenter.data.io=off
-Dcom.ibm.diagnostics.healthcenter.data.jit=off
-Dcom.ibm.diagnostics.healthcenter.data.memory=off
-Dcom.ibm.diagnostics.healthcenter.data.profiling=off
-Dcom.ibm.diagnostics.healthcenter.data.threads=off

Diagnostic Traces

  • Log trace engine configuration processing:
    -Dcom.ibm.diagnostics.healthcenter.logging.TraceDataProvider=debug
  • Headless file processing:
    -Dcom.ibm.diagnostics.healthcenter.logging.headless=debug
  • Investigating MBean authentication/authorization:
    -Dcom.ibm.java.diagnostics.healthcenter.agent.debug=true
    Then set diagnostic trace using the product trace mechanism, or -Djava.util.logging.config.file:
    • com.ibm.java.diagnostics.healthcenter.agent=all
    • com.ibm.java.diagnostics.healthcenter.agent.mbean.HCLaunchMBean=all
  • Others:
    -Dcom.ibm.diagnostics.healthcenter.logging.Port=debug
    -Dcom.ibm.diagnostics.healthcenter.logging.ClassHistogram=debug
    -Dcom.ibm.diagnostics.healthcenter.logging.cpuplugin=debug

Large Memory Allocations

Health Center may be configured to track large memory allocations based on a number of bytes. For example, to track allocations greater than 10MB:

-Dcom.ibm.java.diagnostics.healthcenter.allocation.threshold.low=10485760

Note that, under the covers, this will cause Health Center to configure -Xdump. For example:

-Xdump:silent:events=allocation,filter=#10485760...

This will not produce any output files and will only be tracked by Health Center; however, this will silently increment the diagnostic artifact ID every time such an allocation occurs. This may explain very large artifact numbers in other produced files such as javacores, etc.

Health Center Thread Stacks

The profiler in Health Center is only aware of threads that use CPU, so if a thread is waiting on a database, for example, it will not show up in Health Center. However, starting with the Health Center agent version 2.2, it periodically captures every thread stack. The Health Center client has minimal capabilities to display this information; however, you can use the Health Center API to read an HCD file and print these stacks:

import java.io.File;
import java.io.PrintWriter;
import java.lang.Thread.State;
import java.lang.management.LockInfo;
import java.lang.management.MonitorInfo;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import com.ibm.java.diagnostics.healthcenter.api.HealthCenter;
import com.ibm.java.diagnostics.healthcenter.api.factory.HealthCenterFactory;
import com.ibm.java.diagnostics.healthcenter.api.threads.ThreadData;
import com.ibm.java.diagnostics.healthcenter.api.threads.ThreadsData;

public class HealthCenterThreadDumpAnalyzer {
  private static final SimpleDateFormat threadDumpDate = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

  public static void main(String[] args) throws Throwable {
    if (args == null || args.length == 0) {
      System.err.println("usage: HealthCenterThreadDumpAnalyzer ${HCDFILE}");
      return;
    }

    File file = new File(args[0]);

    message("Loading " + file.getAbsolutePath());

    HealthCenter hc = HealthCenterFactory.connect(file);

    message("Getting all thread dumps");

    ThreadsData threadsData = hc.getThreadsData();
    HashMap<Long, ThreadData[]> threadDumps = threadsData.getAllThreads();
    List<Entry<Long, ThreadData[]>> sortedThreadDumps = new ArrayList<>();
    for (Entry<Long, ThreadData[]> entry : threadDumps.entrySet()) {
      sortedThreadDumps.add(entry);
    }
    sortedThreadDumps.sort(new Comparator<Entry<Long, ThreadData[]>>() {
      @Override
      public int compare(Entry<Long, ThreadData[]> x, Entry<Long, ThreadData[]> y) {
        return x.getKey().compareTo(y.getKey());
      }
    });

    message("Processing all " + sortedThreadDumps.size() + " thread dumps");

    try (PrintWriter out = new PrintWriter(new File(file.getParentFile(), file.getName() + "_threads.txt"))) {
      for (Entry<Long, ThreadData[]> threadDump : sortedThreadDumps) {
        out.println(threadDumpDate.format(new Date(threadDump.getKey())));
        out.println("Full thread dump Java:");
        out.println();

        ThreadData[] threads = threadDump.getValue();
        Map<String, String> contendedMonitorOwners = new HashMap<>();
        for (int i = 0; i < threads.length; i++) {
          ThreadData thread = threads[i];
          if (thread.getContendedMonitor() != null && !thread.getContendedMonitor().isEmpty()
              && thread.getContendedMonitorOwner() != null
              && !thread.getContendedMonitorOwner().isEmpty()) {
            contendedMonitorOwners.put(thread.getContendedMonitorOwner(), thread.getContendedMonitor());
          }
        }

        for (int i = 0; i < threads.length; i++) {
          ThreadData thread = threads[i];
          String threadName = thread.getName();

          String nid = String.format("%08x", threadName.hashCode());
          nid = "0" + nid.substring(1);

          out.println("\"" + threadName + "\" #" + i + " daemon prio=9 os_prio=0 tid=0x"
              + String.format("%016x", threadName.hashCode()) + " nid=0x" + nid + " "
              + getThreadDumpThreadState(thread.getState()) + " [0x0000000000000000]");

          out.println("   java.lang.Thread.State: " + thread.getState());

          StackTraceElement[] threadStack = thread.getStackTrace();

          if (threadStack != null) {
            boolean firstFrame = true;
            for (StackTraceElement frame : threadStack) {
              out.println("   at " + frame);

              if (firstFrame) {
                firstFrame = false;

                if (thread.getContendedMonitor() != null && !thread.getContendedMonitor().isEmpty()) {
                  String contendedMonitor = thread.getContendedMonitor();
                  writeContendedMonitor(out, contendedMonitor, true);
                }
              }
            }

            for (MonitorInfo monitor : thread.getLockedMonitors()) {
              out.println("   - locked " + monitor);
              throw new IllegalStateException("Not implemented " + monitor);
            }

            for (LockInfo lock : thread.getLockedSynchronizers()) {
              out.println("   - locked " + lock);
              throw new IllegalStateException("Not implemented " + lock);
            }

            for (String ownedMonitor : thread.getOwnedMonitors()) {
              int space = ownedMonitor.indexOf(' ');
              if (space != -1) {
                ownedMonitor = ownedMonitor.substring(0, space);
              }
              out.println("   - locked <0x" + String.format("%016x", ownedMonitor.hashCode()) + "> (a "
                  + ownedMonitor + ")");
            }

            String contendedMonitor = contendedMonitorOwners.get(threadName);
            if (contendedMonitor != null) {
              writeContendedMonitor(out, contendedMonitor, false);
            }
          }

          out.println();
        }

        out.println();
      }
    }

    message("Finished processing");
    System.exit(0);
  }

  private static void writeContendedMonitor(PrintWriter out, String contendedMonitor, boolean waiting) {
    String className = contendedMonitor.substring(0, contendedMonitor.indexOf('@'));
    String identityHashCode = contendedMonitor.substring(contendedMonitor.indexOf('@') + 1);
    out.println("   - " + (waiting ? "waiting to lock" : "locked") + " <0x"
        + String.format("%016x", Long.parseLong(identityHashCode, 16)) + "> (a " + className + ")");
  }

  public static String getThreadDumpThreadState(State threadState) {
    switch (threadState) {
    case BLOCKED:
      return "waiting for monitor entry";
    case NEW:
      return "new";
    case RUNNABLE:
      return "runnable";
    case TERMINATED:
      return "terminate";
    case TIMED_WAITING:
      return "waiting on condition";
    case WAITING:
      return "waiting on condition";
    default:
      throw new IllegalStateException("Not implemented for " + threadState);
    }
  }

  public static void message(String message) {
    System.out.println("[" + new Date() + "] " + message);
  }
}

By default, Health Center captures full stacks. If this appears to be a performance impact, you can limit this with -Dcom.ibm.java.diagnostics.healthcenter.thread.stack.depth=${MAXDEPTH}

The default thread stack collection interval is 30 seconds. This can be changed with -Dcom.ibm.java.diagnostics.healthcenter.thread.collection.interval=${SECONDS}

To disable collection of thread stacks: -Dcom.ibm.diagnostics.healthcenter.data.threads=off

References