Linux perf Recipe
- Install perf if it's not installed.
- Prepare the Java process:
- For IBM Java >= 8.0.7.20 or Semeru >= v8.0.352 / 11.0.17.0 / 17.0.5.0, restart the Java process with -XX:+PerfTool
- For older versions of IBM Java and Semeru, restart the Java process with -Xjit:perfTool, making sure to merge it, comma-separated, into any pre-existing -Xjit options (for example, -Xjit:verbose becomes -Xjit:verbose,perfTool)
- For a HotSpot JVM, restart the Java process with -XX:+PreserveFramePointer and either use perf-map-agent or, for Java >= 16, add -XX:+DumpPerfMapAtExit to create /tmp/perf-$PID.map when the JVM exits gracefully.
- During the performance problem, run one of the following commands as root. Change 60 to the number of seconds for which you want to gather data:
- For IBM Java/Semeru running on an Intel processor that is Haswell or later (check the model in cat /proc/cpuinfo against Intel.com), use the following. Note that LBR has a limited stack depth, so use the next option if you need deeper stacks:
date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph lbr -F 99 -a -g -- sleep 60
- For IBM Java/Semeru running on any other processor, or if you're not sure what the processor is:
date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph dwarf,65528 -F 99 -a -g -- sleep 60
- For a HotSpot JVM:
date +'%Y-%m-%d %H:%M:%S.%N %Z' &>> diag_starttimes_$(hostname).log; cat /proc/uptime &>> diag_starttimes_$(hostname).log; perf record --call-graph fp -F 99 -a -g -- sleep 60
- After the above completes, run the following command:
perf script > diag_perfscript_$(hostname)_$(date +%Y%m%d_%H%M%S_%N).txt
- After the above completes, gather a thread dump so that thread IDs can be mapped to thread names, replacing $PID with the Java process ID. This has very low overhead; the process generally pauses for about 10 ms to 100 ms.
kill -3 $PID
- Similarly, if the security, disk, and performance risks are acceptable (the process may pause for 30 seconds or more), gather an operating system core dump of the process, provided the process and operating system are configured for it (e.g. core and file ulimits, kernel.core_pattern truncation settings, etc.). Use one of the various available mechanisms, and then run jextract (IBM Java) or jpackcore (Semeru) on the core dump; for example:
$JAVA/bin/jextract core*.dmp
- As root (needed to access /proc/kallsyms), run the following commands to archive the perf data. Replace $THREAD_DUMPS_DIR with the location where thread dumps were produced and $OS_CORE_DUMPS_DIR with the location of the packed operating system core dump (omit that argument if no core dump was produced):
perf archive
tar czvf diag_perf_$(hostname)_$(date +%Y%m%d_%H%M%S).tar.gz perf.data* diag_perfscript* diag_starttimes* /proc/kallsyms /boot/System.map-$(uname -r) /tmp/perf*map $THREAD_DUMPS_DIR/javacore*.txt $OS_CORE_DUMPS_DIR/core*.dmp.zip
- Upload diag_perf_*.tar.gz and any Java/WAS logs, particularly verbosegc if enabled.
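The comma-merging rule for -Xjit:perfTool in the "Prepare the Java process" step can be sketched as a small POSIX shell function. add_perftool is a hypothetical helper, not part of IBM Java or Semeru tooling, and the sketch assumes whitespace-separated options, at most one -Xjit: option, and no bare -Xjit without a colon.

```shell
#!/bin/sh
# Hypothetical helper, not part of IBM Java or Semeru: add perfTool to a
# whitespace-separated JVM option string. If an -Xjit: option already exists,
# perfTool is merged into it with a comma; otherwise -Xjit:perfTool is added.
# Assumes at most one -Xjit: option and no options containing spaces.
add_perftool() (
  set -f                     # disable globbing while word-splitting options
  result=""
  merged=0
  for opt in $1; do
    case "$opt" in
      -Xjit:*perfTool*)      # already enabled; leave the option unchanged
        merged=1
        ;;
      -Xjit:*)               # merge into the existing -Xjit option
        opt="$opt,perfTool"
        merged=1
        ;;
    esac
    result="$result $opt"
  done
  if [ "$merged" -eq 0 ]; then
    result="$result -Xjit:perfTool"
  fi
  printf '%s\n' "${result# }"   # drop the leading space added by the loop
)
```

For example, add_perftool "-Xmx2g -Xjit:verbose" prints -Xmx2g -Xjit:verbose,perfTool. The function body runs in a subshell so that set -f does not leak into the caller.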
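The three capture commands in the "During the performance problem" step differ only in the --call-graph mode, so they can be wrapped in one function. This is a sketch with hypothetical names (record_perf, perf_callgraph_arg); it appends to the start-times log with POSIX redirections (>> file 2>&1) instead of bash's &>>, so it also runs under /bin/sh.

```shell
#!/bin/sh
# Hypothetical wrapper around the capture commands in this recipe.
# Map a short mode name to the --call-graph argument used above:
#   lbr   - IBM Java/Semeru on Intel Haswell or later (limited stack depth)
#   dwarf - IBM Java/Semeru on other processors
#   fp    - HotSpot started with -XX:+PreserveFramePointer
perf_callgraph_arg() {
  case "$1" in
    lbr)   echo "lbr" ;;
    dwarf) echo "dwarf,65528" ;;
    fp)    echo "fp" ;;
    *)     echo "unknown mode: $1 (expected lbr, dwarf, or fp)" >&2; return 1 ;;
  esac
}

# Usage: record_perf MODE [SECONDS]; run as root. SECONDS defaults to 60.
record_perf() {
  cg=$(perf_callgraph_arg "$1") || return 1
  secs="${2:-60}"
  log="diag_starttimes_$(hostname).log"
  # Append wall-clock time and system uptime to the start-times log,
  # as the original commands do.
  date +'%Y-%m-%d %H:%M:%S.%N %Z' >> "$log" 2>&1
  cat /proc/uptime >> "$log" 2>&1
  perf record --call-graph "$cg" -F 99 -a -g -- sleep "$secs"
}
```

For example, record_perf dwarf 120 records two minutes of data using DWARF unwinding.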
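To map thread IDs to names, note that perf script prints thread IDs in decimal, while IBM javacores typically print native thread IDs in hex (the "native thread ID:0x..." field). The sketch below converts a decimal ID and searches a thread dump for it. tid_to_hex and find_thread are hypothetical names, and the case-insensitive whole-word grep is an assumption meant to tolerate upper-case hex digits. If your JVM prints thread IDs in decimal instead, search for the decimal value directly.

```shell
#!/bin/sh
# Convert a decimal thread ID (as printed by perf script) to lower-case hex.
tid_to_hex() {
  printf '0x%x\n' "$1"
}

# Hypothetical helper: print the lines of a thread dump file that mention
# the given decimal thread ID in hex. -i tolerates upper-case hex digits and
# -w avoids matching longer IDs that merely start with the same digits.
find_thread() {
  grep -iw -- "$(tid_to_hex "$1")" "$2"
}
```

For example, find_thread 19070 javacore.txt looks for 0x4a7e (or 0x4A7E) in javacore.txt.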
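Before requesting a core dump, the settings mentioned in the core dump step can be inspected from a shell. This is a sketch with a hypothetical function name; it reads the running process's effective limits from /proc/$PID/limits when a PID is given (a shell's ulimit values apply only to processes started from that shell), and it only reports settings without changing them.

```shell
#!/bin/sh
# Hypothetical helper: report settings that commonly suppress or truncate
# core dumps. Pass the Java PID to see that process's effective limits.
check_core_settings() {
  echo "kernel.core_pattern: $(cat /proc/sys/kernel/core_pattern 2>/dev/null)"
  if [ -n "$1" ] && [ -r "/proc/$1/limits" ]; then
    # Lines look like: "Max core file size   0   unlimited   bytes"
    grep -E '^Max (core file|file) size' "/proc/$1/limits"
  else
    echo "core file size ulimit (this shell): $(ulimit -c)"
    echo "file size ulimit (this shell): $(ulimit -f)"
  fi
}
```

A soft core file size limit of 0 prevents core dumps, and a core_pattern starting with | pipes cores to a handler that may apply its own size limits.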
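One caveat when archiving /proc/kallsyms directly: procfs files report a size of 0, so some tar implementations may store an empty file. Under that assumption, a possible workaround is to copy the symbols to a regular file as root and also add the copy to the tar command. snapshot_kallsyms is a hypothetical name, and the all-zero check assumes 64-bit addresses (the kernel shows zeroed addresses to insufficiently privileged readers, depending on kernel.kptr_restrict).

```shell
#!/bin/sh
# Hypothetical helper: copy kernel symbols to a regular file so the archive
# gets real content. Usage: snapshot_kallsyms [OUTPUT] [SOURCE]
snapshot_kallsyms() {
  out="${1:-kallsyms_$(hostname).txt}"
  src="${2:-/proc/kallsyms}"
  cat "$src" > "$out" || return 1
  # If no line has a non-zero (64-bit) address, the copy is useless for
  # symbol resolution; this usually means it was not read as root.
  if ! grep -qv '^0000000000000000 ' "$out"; then
    echo "warning: $out contains only zero addresses; re-run as root" >&2
  fi
  printf '%s\n' "$out"
}
```

The function prints the output file name, so it can be used inline, for example as an extra argument to tar: $(snapshot_kallsyms).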
If you want to do basic analysis of the perf output yourself:
- Top 10 CPU-using stack frames:
cat diag_perfscript*txt | awk 'go { go=0; print; } /cpu-clock:/ || /cycles:/ { go=1; }' | sort | uniq -c | sort -nr | head
- Create FlameGraphs:
git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph
cat diag_perfscript*txt | ./stackcollapse-perf.pl > out.perf-folded
./flamegraph.pl --width 1024 out.perf-folded > perf.svg
./flamegraph.pl --reverse --width 1024 out.perf-folded > perf-reverse.svg
- Open perf.svg and perf-reverse.svg in your browser.
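Because thread IDs can be mapped to names using the thread dump, a per-thread sample count is a useful companion to the top stack frames command. The sketch below assumes the default perf script header layout "comm tid [cpu] time: ... event:", where comm may contain spaces, so it takes the TID to be the number immediately before " [cpu]". top_threads is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: top 10 thread IDs by number of samples.
# For each sample header line (cpu-clock: or cycles:), extract the
# number that directly precedes " [cpu]" and count occurrences.
top_threads() {
  awk '/cpu-clock:|cycles:/ {
         if (match($0, /[0-9]+ +\[[0-9]+\]/)) {
           s = substr($0, RSTART, RLENGTH)
           sub(/ +\[.*/, "", s)   # keep only the TID
           print s
         }
       }' "$@" | sort | uniq -c | sort -nr | head
}
```

For example, top_threads diag_perfscript*txt lists the busiest thread IDs, which can then be converted to hex and looked up in the thread dump.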
Notes:
- If not all symbols are resolved, restart the Java process with the additional IBM Java/Semeru option -Xlp:codecache:pagesize=4k and gather the data again.
For background, see Linux perf.