Linux perf Recipe

  1. Install perf if it's not installed.
  2. Prepare the Java process:
    1. For IBM Java >= 8.0.7.20 or Semeru >= v8.0.352 / 11.0.17.0 / 17.0.5.0, restart the Java process with -XX:+PerfTool
    2. For older versions of IBM Java and Semeru, restart the Java process with -Xjit:perfTool while making sure to combine with commas with any pre-existing -Xjit options
    3. For a HotSpot JVM, restart the Java process with -XX:+PreserveFramePointer and perf-map-agent or, for Java >= 16, restart with -XX:+DumpPerfMapAtExit to create /tmp/perf-$PID.map on graceful JVM exit.
  3. During the performance problem, run one of the following commands as root. Change 60 to the number of seconds you want to gather data for:
    1. For IBM Java/Semeru running on top of an Intel processor that is Haswell or later (see cat /proc/cpuinfo and reference Intel.com), use the following, although note that LBR has a limited stack depth, so use the next option if you need longer stacks:
      date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph lbr -F 99 -a -g -- sleep 60
    2. For IBM Java/Semeru running on any other processor or if you're not sure what the processor is:
      date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph dwarf,65528 -F 99 -a -g -- sleep 60
    3. For a HotSpot JVM:
      date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph fp -F 99 -a -g -- sleep 60
  4. After the above completes, run the following command:
    perf script > diag_perfscript_$(hostname)_$(date +%Y%m%d_%H%M%S_%N).txt
  5. After the above completes, gather a thread dump so that thread IDs may be mapped to thread names. This is very low overhead with the process pausing for generally about 10ms to 100ms.
    kill -3 $PID
  6. Similarly, gather an operating sytem core dump of the process if the security, disk and performance risks are acceptable (the process may pause for up to 30 seconds or more) and the process and operating system are configured for it (e.g. core and file ulimits, kernel.core_pattern truncation settings, etc.) using one of various mechanisms and then run jextract (IBM Java) or jpackcore (Semeru) on it; for example:
    $JAVA/bin/jextract core*.dmp
  7. As root (needed to access /proc/kallsyms), run the following commands to archive the perf data; replace $THREAD_DUMPS_DIR with the location where thread dumps were produced, and include the packed operating system core dump if produced:
    # perf archive
    # tar czvf diag_perf_$(hostname)_$(date +%Y%m%d_%H%M%S).tar.gz perf.data* diag_perfscript* diag_perfscript* perf.data.tar.bz2 /proc/kallsyms /boot/System.map-$(uname -r) /tmp/perf*map $THREAD_DUMPS_DIR/javacore*.txt $OS_CORE_DUMPS_DIR/core*.dmp.zip
  8. Upload diag_perf_*.tar.gz and any Java/WAS logs, particularly verbosegc if enabled

If you want to do basic analysis of the perf output yourself:

  1. Top 10 CPU-using stack frames:
    cat diag_perfscript*txt | awk 'go { go=0; print; } /cpu-clock:/ || /cycles:/ { go=1; }' | sort | uniq -c | sort -nr | head
  2. Create FlameGraphs:
    1. git clone https://github.com/brendangregg/FlameGraph
    2. cd FlameGraph
    3. cat diag_perfscript*txt | ./stackcollapse-perf.pl > out.perf-folded
    4. ./flamegraph.pl --width 1024 out.perf-folded > perf.svg
    5. ./flamegraph.pl --reverse --width 1024 out.perf-folded > perf-reverse.svg
    6. Open perf.svg and perf-reverse.svg in your browser

Notes:

  • If not all symbols are resolved, try again with the additional option -Xlp:codecache:pagesize=4k

For background, see Linux perf.