Linux perf Recipe
- Install perf if it's not installed.
- Prepare the Java process:
  - For IBM Java >= 8.0.7.20 or Semeru >= v8.0.352 / 11.0.17.0 / 17.0.5.0, restart the Java process with
    -XX:+PerfTool
  - For older versions of IBM Java and Semeru, restart the Java process with
    -Xjit:perfTool
    while making sure to combine it with commas with any pre-existing -Xjit options
  - For a HotSpot JVM >= Java 16, restart with
    -XX:+DumpPerfMapAtExit
  - For an older HotSpot JVM, restart with the perf-map-agent
- Run all of the following commands as root.
- During the performance problem, run one of the following commands. Change 60 to the number of seconds you want to gather data for:
  - For IBM Java/Semeru running on top of an Intel processor that is Haswell or later (see cat /proc/cpuinfo and reference Intel Processor names), use the following, although note that LBR has a limited stack depth, so use the next option if you need longer stacks:
    date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph lbr -F 99 -a -g -- sleep 60
  - For IBM Java/Semeru running on any other processor or if you're not sure what the processor is:
    date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph dwarf,65528 -F 99 -a -g -- sleep 60
  - For a HotSpot JVM:
    date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph fp -F 99 -a -g -- sleep 60
- After the above completes, run the following command:
perf script > diag_perfscript_$(hostname)_$(date +%Y%m%d_%H%M%S_%N).txt
- After the above completes, gather a thread dump so that thread IDs may be mapped to thread names. This has very low overhead; the process generally pauses for about 10ms to 100ms. Replace $PID with the Java process ID:
kill -3 $PID
- Optionally, for IBM Java and IBM Semeru Runtimes processes, gather an operating system core dump of the process if the security, disk and performance risks are acceptable (the process may pause for up to 30 seconds or more) and the process and operating system are configured for it (e.g. core and file ulimits, kernel.core_pattern truncation settings, etc.) using one of various mechanisms, and then run jextract (IBM Java) or jpackcore (Semeru) on it; for example:
$JDK/bin/jpackcore core*.dmp
- Run the following commands to archive the perf data; replace $THREAD_DUMPS_DIR with the location where thread dumps were produced, and $OS_CORE_DUMPS_DIR with the location of the core dump if one was produced:
perf archive
tar czvf diag_perf_$(hostname)_$(date +%Y%m%d_%H%M%S).tar.gz perf.data* diag_perfscript* diag_starttimes* perf.data.tar.bz2 /proc/kallsyms /boot/System.map-$(uname -r) /tmp/perf*map $THREAD_DUMPS_DIR/javacore*.txt $OS_CORE_DUMPS_DIR/core*.dmp.zip
- Upload diag_perf_*.tar.gz and any Java/WAS logs, particularly verbosegc if enabled.
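The choice between the three perf record variants in the recipe above can be sketched as a small shell helper. This is only an illustration: the function name perf_record_cmd and its arguments are hypothetical, and it prints the command rather than running it so that the result can be reviewed first.

```shell
# Sketch: print the perf record command the recipe suggests.
# perf_record_cmd and its arguments are hypothetical helpers, not part of perf.
#   $1: JVM type, "ibm" (IBM Java/Semeru) or "hotspot"
#   $2: "yes" if the CPU is an Intel Haswell or later (LBR usable), else "no"
#   $3: seconds to record (defaults to 60)
perf_record_cmd() {
  jvm=$1
  lbr=$2
  seconds=${3:-60}
  case "$jvm" in
    hotspot)
      callgraph="fp"
      ;;
    ibm)
      if [ "$lbr" = "yes" ]; then
        callgraph="lbr"            # limited stack depth
      else
        callgraph="dwarf,65528"    # use when unsure of the processor
      fi
      ;;
    *)
      echo "unknown JVM type: $jvm" >&2
      return 1
      ;;
  esac
  echo "perf record --call-graph $callgraph -F 99 -a -g -- sleep $seconds"
}
```

For example, perf_record_cmd ibm no 120 prints the dwarf variant with a 120 second capture window.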
If you want to do basic analysis of the perf output yourself:
- Top 10 CPU-using stack frames:
cat diag_perfscript*txt | awk 'go { go=0; print; } /cpu-clock:/ || /cycles:/ { go=1; }' | sort | uniq -c | sort -nr | head
- Create FlameGraphs:
git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph
cat ../diag_perfscript*txt | ./stackcollapse-perf.pl > out.perf-folded
./flamegraph.pl --width 1024 out.perf-folded > perf.svg
./flamegraph.pl --reverse --width 1024 out.perf-folded > perf-reverse.svg
- Open perf.svg and perf-reverse.svg in your browser
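As a text-only complement to the flame graphs, the folded file can also be summarized per leaf frame. The sketch below assumes out.perf-folded uses the format that stackcollapse-perf.pl emits (frames separated by semicolons, followed by a space and the sample count); top_leaf_frames is a hypothetical helper name, not part of FlameGraph.

```shell
# top_leaf_frames FOLDED_FILE [N]
# Sums sample counts per leaf (last) frame and prints the N largest (default 10).
top_leaf_frames() {
  awk '{
    count = $NF                       # sample count is the last field
    sub(/ [0-9]+$/, "")               # strip the count, leaving only the stack
    n = split($0, frames, ";")        # frames[n] is the leaf frame
    total[frames[n]] += count
  }
  END { for (f in total) print total[f], f }' "$1" |
    sort -k1,1nr -k2 | head -n "${2:-10}"
}
```

Running top_leaf_frames out.perf-folded 20 lists the 20 leaf frames with the most samples, which is roughly what the widest bars at the base of the reversed flame graph represent.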
Notes:
- If not all symbols are resolved, try again with the additional
option
-Xlp:codecache:pagesize=4k
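To judge whether symbol resolution improved after a change such as the option above, one rough measure is the share of stack frames that perf script could not resolve. This is a minimal sketch under two assumptions about the perf script text output: stack frame lines are indented and begin with a hex address, and unresolved frames are printed as [unknown]. The helper name unresolved_frame_pct is hypothetical.

```shell
# unresolved_frame_pct PERF_SCRIPT_FILE
# Prints the percentage of stack frame lines that contain "[unknown]".
unresolved_frame_pct() {
  awk '
    /^[ \t]+[0-9a-f]+ / {             # indented line starting with a hex address
      frames++
      if (index($0, "[unknown]")) unknown++
    }
    END {
      if (frames == 0) { print "0.0"; exit }
      printf "%.1f\n", 100 * unknown / frames
    }' "$1"
}
```

Comparing the value for a diag_perfscript file gathered before and after the change gives a quick indication of whether more frames are being resolved.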
For background, see Linux perf.