Linux perf Recipe

  1. Install perf if it's not installed.
  2. Prepare the Java process:
    1. For IBM Java >= 8.0.7.20 or Semeru >= v8.0.352 / 11.0.17.0 / 17.0.5.0, restart the Java process with -XX:+PerfTool
    2. For older versions of IBM Java and Semeru, restart the Java process with -Xjit:perfTool. If there are pre-existing -Xjit options, combine them with perfTool in a single comma-separated -Xjit option
    3. For a HotSpot JVM >= Java 16, restart with -XX:+DumpPerfMapAtExit
    4. For an older HotSpot JVM, restart with the perf-map-agent
  3. Run all of the following commands as root
  4. During the performance problem, run one of the following commands. Change 60 to the number of seconds you want to gather data for:
    1. For IBM Java/Semeru running on an Intel processor that is Haswell or later (check cat /proc/cpuinfo against Intel's processor names), use the following. Note that LBR has a limited stack depth, so use the next option if you need deeper stacks:
      date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph lbr -F 99 -a -g -- sleep 60
    2. For IBM Java/Semeru running on any other processor or if you're not sure what the processor is:
      date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph dwarf,65528 -F 99 -a -g -- sleep 60
    3. For a HotSpot JVM:
      date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph fp -F 99 -a -g -- sleep 60
  5. After the above completes, run the following command:
    perf script > diag_perfscript_$(hostname)_$(date +%Y%m%d_%H%M%S_%N).txt
  6. After the above completes, gather a thread dump so that thread IDs may be mapped to thread names. This has very low overhead; the process generally pauses for about 10ms to 100ms. Replace $PID with the Java process ID:
    kill -3 $PID
  7. Optionally, for IBM Java and IBM Semeru Runtimes processes, gather an operating system core dump of the process using one of various mechanisms, and then run jextract (IBM Java) or jpackcore (Semeru) on it. Do this only if the security, disk and performance risks are acceptable (the process may pause for 30 seconds or more) and the process and operating system are configured for it (e.g. core and file ulimits, kernel.core_pattern truncation settings, etc.). For example:
    $JDK/bin/jpackcore core*.dmp
  8. Run the following commands to archive the perf data; replace $THREAD_DUMPS_DIR with the directory where thread dumps were produced and, if a core dump was produced, $OS_CORE_DUMPS_DIR with the directory containing it:
    perf archive
    tar czvf diag_perf_$(hostname)_$(date +%Y%m%d_%H%M%S).tar.gz perf.data* diag_starttimes* diag_perfscript* perf.data.tar.bz2 /proc/kallsyms /boot/System.map-$(uname -r) /tmp/perf*map $THREAD_DUMPS_DIR/javacore*.txt $OS_CORE_DUMPS_DIR/core*.dmp.zip
  9. Upload diag_perf_*.tar.gz and any Java/WAS logs, particularly verbosegc if enabled
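
The choice among the three commands in step 4 can be sketched as a small shell function. This is an illustrative sketch rather than part of the recipe: the function name perf_record_cmd and its arguments (j9 or hotspot, yes or no for LBR, and a duration in seconds) are hypothetical. It only prints the perf record command from the steps above so that it can be reviewed before being run as root.

```shell
# Hypothetical helper: print the perf record command from step 4.
#   $1: JVM type, "j9" (IBM Java/Semeru) or "hotspot"
#   $2: "yes" if the CPU is Intel Haswell or later and LBR's limited
#       stack depth is acceptable (default: no)
#   $3: number of seconds to record (default: 60)
perf_record_cmd() {
  jvm=$1
  lbr=${2:-no}
  secs=${3:-60}
  case "$jvm" in
    hotspot)
      mode="fp" ;;
    j9)
      if [ "$lbr" = "yes" ]; then
        mode="lbr"
      else
        mode="dwarf,65528"
      fi ;;
    *)
      echo "unknown JVM type: $jvm" >&2
      return 1 ;;
  esac
  printf 'perf record --call-graph %s -F 99 -a -g -- sleep %s\n' "$mode" "$secs"
}
```

For example, perf_record_cmd j9 no 120 prints the dwarf variant with a 120-second duration. The date and /proc/uptime logging from step 4 still needs to be run before the printed command.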

If you want to do basic analysis of the perf output yourself:

  1. Top 10 CPU-using stack frames:
    cat diag_perfscript*txt | awk 'go { go=0; print; } /cpu-clock:/ || /cycles:/ { go=1; }' | sort | uniq -c | sort -nr | head
  2. Create FlameGraphs:
    1. git clone https://github.com/brendangregg/FlameGraph
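    2. cd FlameGraph (the following commands assume that the diag_perfscript*txt file is in this directory; otherwise, adjust the path)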
    3. cat diag_perfscript*txt | ./stackcollapse-perf.pl > out.perf-folded
    4. ./flamegraph.pl --width 1024 out.perf-folded > perf.svg
    5. ./flamegraph.pl --reverse --width 1024 out.perf-folded > perf-reverse.svg
    6. Open perf.svg and perf-reverse.svg in your browser
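
The one-liner in step 1 counts the line that follows each sample header, so the same function sampled at different addresses or offsets is counted separately. A possible variant, sketched below, groups by symbol name instead. The function name top_frames is hypothetical, and the sketch assumes the usual perf script frame layout of an address, then symbol+offset, then the DSO in parentheses; frames that do not follow that layout are counted as-is.

```shell
# Hypothetical helper: read perf script output on stdin and print the
# most frequent top-of-stack symbols. $1 is how many to show (default 10).
top_frames() {
  awk '
    go {
      go = 0
      s = $0
      sub(/^[ \t]*[0-9a-f]+ /, "", s)   # drop the leading address
      sub(/ \([^()]*\)$/, "", s)        # drop the trailing (dso)
      sub(/\+0x[0-9a-f]+$/, "", s)      # drop the +0x offset
      print s
    }
    /cpu-clock:/ || /cycles:/ { go = 1 }
  ' | sort | uniq -c | sort -nr | head -n "${1:-10}"
}
```

For example: cat diag_perfscript*txt | top_frames 20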

Notes:

  • If some symbols are not resolved, restart the Java process with the additional option -Xlp:codecache:pagesize=4k and try again
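
For older IBM Java and Semeru versions, the restart options from step 2 of the recipe and the note above can be assembled mechanically. The sketch below is illustrative only: add_perftool is a hypothetical helper, it assumes each JVM argument is passed as a separate word, and -Xjit:verbose stands in for whatever -Xjit options are already configured.

```shell
# Hypothetical helper: print the given JVM arguments with perfTool merged
# into the first existing -Xjit option, or with -Xjit:perfTool appended
# if there is no -Xjit option yet.
add_perftool() {
  out=""
  found=0
  for arg in "$@"; do
    case "$arg" in
      -Xjit:*perfTool*)
        found=1 ;;                      # perfTool is already present
      -Xjit:*)
        if [ "$found" -eq 0 ]; then
          arg="$arg,perfTool"           # combine with commas
          found=1
        fi ;;
    esac
    out="${out:+$out }$arg"
  done
  if [ "$found" -eq 0 ]; then
    out="${out:+$out }-Xjit:perfTool"
  fi
  printf '%s\n' "$out"
}
```

For example, add_perftool -Xmx1g -Xjit:verbose -Xlp:codecache:pagesize=4k prints -Xmx1g -Xjit:verbose,perfTool -Xlp:codecache:pagesize=4k.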

For background, see Linux perf.