Linux perf Recipe

  1. Install perf if it's not installed.
  2. Prepare the Java process:
    1. For an IBM Java or IBM Semeru Runtimes JVM:
      1. For IBM Java >= 8.0.7.20 or IBM Semeru Runtimes >= v8.0.352 / 11.0.17.0 / 17.0.5.0, restart the Java process with -XX:+PerfTool
      2. For older versions of IBM Java or IBM Semeru Runtimes, restart the Java process with -Xjit:perfTool. If the process already has -Xjit options, append perfTool to that existing comma-separated list instead of adding a second -Xjit option
    2. For a HotSpot JVM:
      1. Restart the Java process with -XX:+UnlockDiagnosticVMOptions -XX:+PreserveFramePointer -XX:+ShowHiddenFrames
      2. For a HotSpot JVM >= Java 16, also add the option -XX:+DumpPerfMapAtExit
      3. For a HotSpot JVM < Java 16:
        1. Also add the option -XX:-UseCodeCacheFlushing
        2. Compile perf-map-agent into some directory
  3. Run all of the following commands as root
  4. During the performance problem, run one of the following commands. Change 60 to the number of seconds you want to gather data for:
    1. For IBM Java and IBM Semeru Runtimes running on an Intel processor that is Haswell or later (check the model name in cat /proc/cpuinfo against Intel's processor names), use the following command. Note that LBR has a limited stack depth, so use the next option if you need deeper stacks:
      date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph lbr -F 99 -a -g -- sleep 60
    2. For IBM Java and IBM Semeru Runtimes running on any other processor or if you're not sure what the processor is:
      date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph dwarf,65528 -F 99 -a -g -- sleep 60
    3. For a HotSpot JVM:
      date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph fp -F 99 -a -g -- sleep 60
  5. Wait for the above command to complete
  6. If running a HotSpot JVM:
    1. For a HotSpot JVM >= Java 16, stop the JVM gracefully to produce the /tmp/perf-$PID.map file. If you can't stop the JVM, follow the instructions above for a HotSpot JVM < Java 16 and use perf-map-agent instead
    2. For a HotSpot JVM < Java 16, run bin/create-java-perf-map.sh $PID from the perf-map-agent directory to create the /tmp/perf-$PID.map file
  7. Run the following command from the directory where perf record was run:
    perf script > diag_perfscript_$(hostname)_$(date +%Y%m%d_%H%M%S_%N).txt
  8. Gather a thread dump. The overhead is very low: the process generally pauses for about 10ms to 100ms.
    kill -3 $PID
  9. Optionally, for IBM Java and IBM Semeru Runtimes processes, gather an operating system core dump of the process using one of various mechanisms, and then run jextract (IBM Java) or jpackcore (Semeru) on it. Do this only if the security, disk and performance risks are acceptable (the process may pause for 30 seconds or more) and the process and operating system are configured for it (e.g. core and file ulimits, kernel.core_pattern truncation settings, etc.). For example:
    $JDK/bin/jpackcore core*.dmp
  10. Run the following command from the directory where perf record was run. Replace $LOGS_DIR with the location of the logs of the JVM (including stdout/stderr, verbosegc, server/application logs, etc.), $THREAD_DUMPS_DIR with the location where thread dumps were produced, and $OS_CORE_DUMPS_DIR with the location of the core dump. If no core dump was produced, omit the last argument:
    tar czvf diag_perf_$(hostname)_$(date +%Y%m%d_%H%M%S).tar.gz perf.data* diag_perfscript* /proc/kallsyms /boot/System.map-$(uname -r) /tmp/perf*map $LOGS_DIR $THREAD_DUMPS_DIR/javacore*.txt $OS_CORE_DUMPS_DIR/core*.dmp.zip
  11. Upload diag_perf_*.tar.gz
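
The choice of call-graph mode in step 4 can be sketched as a small shell helper. This is only an illustration: the function names call_graph_mode and perf_record_command and their arguments are hypothetical, and the helper only prints the perf record command for review rather than running it. Whether the processor is Haswell or later is left to the caller, since the recipe determines that by reading /proc/cpuinfo.

```shell
#!/bin/sh
# Hypothetical helpers (not part of perf or any JVM); they only print text.

# Print the --call-graph mode that step 4 uses.
#   $1: "ibm" (IBM Java or IBM Semeru Runtimes) or "hotspot"
#   $2: "yes" if the CPU is known to be Intel Haswell or later; anything else
#       falls back to dwarf, matching the "not sure" case in step 4.2
call_graph_mode() {
  case "$1" in
    hotspot) echo "fp" ;;
    ibm)
      if [ "$2" = "yes" ]; then
        echo "lbr"
      else
        echo "dwarf,65528"
      fi
      ;;
    *)
      echo "unknown JVM family: $1" >&2
      return 1
      ;;
  esac
}

# Print (do not run) the perf record command for review.
#   $3: seconds to record, default 60
perf_record_command() {
  mode=$(call_graph_mode "$1" "$2") || return 1
  echo "perf record --call-graph $mode -F 99 -a -g -- sleep ${3:-60}"
}
```

For example, perf_record_command ibm no 30 prints perf record --call-graph dwarf,65528 -F 99 -a -g -- sleep 30, which can then be combined with the date and /proc/uptime logging shown in step 4.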

If you want to do basic analysis of the perf output yourself:

  1. Top 10 CPU-using stack frames:
    cat diag_perfscript*txt | awk 'go { go=0; print; } /cpu-clock:/ || /cycles:/ { go=1; }' | sort | uniq -c | sort -nr | head
  2. Create FlameGraphs:
    1. git clone https://github.com/brendangregg/FlameGraph
    2. cd FlameGraph
    3. cat diag_perfscript*txt | ./stackcollapse-perf.pl > out.perf-folded
    4. ./flamegraph.pl --width 1024 out.perf-folded > perf.svg
    5. ./flamegraph.pl --reverse --width 1024 out.perf-folded > perf-reverse.svg
    6. Open perf.svg and perf-reverse.svg in your browser
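
The top-10 command in item 1 prints the line that follows each sample header, which is the leaf frame of that sample, and counts identical lines. Because each frame line begins with an instruction address, samples in the same function at different addresses are counted separately. A possible variation, sketched below, strips the address and the +0x offset so that counts are grouped by symbol. The function name top_leaf_symbols is hypothetical, and the sketch assumes frame lines have the form "address symbol+0xoffset (dso)".

```shell
#!/bin/sh
# Hypothetical helper: read perf script text on stdin and print the N most
# frequent leaf frames (default 10), grouped by symbol rather than address.
# Assumption: frame lines look like "<hex address> <symbol>+0x<offset> (<dso>)".
top_leaf_symbols() {
  awk '
    go {
      go = 0
      sub(/^[ \t]*[0-9a-f]+[ \t]+/, "")   # drop the leading instruction address
      sub(/\+0x[0-9a-f]+/, "")            # drop the offset within the symbol
      print
    }
    /cpu-clock:/ || /cycles:/ { go = 1 }  # sample header; next line is the leaf frame
  ' | sort | uniq -c | sort -nr | head -n "${1:-10}"
}
```

Usage follows the same pattern as the original command: cat diag_perfscript*txt | top_leaf_symbols 10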

Notes:

  • For IBM Java and IBM Semeru Runtimes, if not all symbols are resolved, try again with the additional option -Xlp:codecache:pagesize=4k
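
One rough way to judge whether symbols are being resolved is to count how many stack frames in the perf script output have no symbol name. The sketch below is an illustration under assumptions: the function name unresolved_frames is hypothetical, frame lines are assumed to start with whitespace followed by a hex address, and unresolved symbols are assumed to be printed as [unknown].

```shell
#!/bin/sh
# Hypothetical helper: read perf script text on stdin and print
# "<unresolved frames> <total frames>".
# Assumptions: frame lines start with whitespace and a hex address, and
# unresolved symbols are printed as [unknown].
unresolved_frames() {
  awk '
    /cpu-clock:/ || /cycles:/ { next }   # skip sample header lines
    /^[ \t]+[0-9a-f]+[ \t]/ {
      total++
      if (index($0, "[unknown]")) unknown++
    }
    END { printf "%d %d\n", unknown, total }
  '
}
```

Usage: cat diag_perfscript*txt | unresolved_frames. Comparing the two numbers before and after a change in JVM options gives a rough indication of whether more symbols are being resolved.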

For background, see Linux perf.