MustGather: High CPU¶
This document describes common problems leading to high CPU, as well
as instructions for running the executable collector on Unix platforms.
The GatherHighCPUDoc collector should only be run on a system that is
actively experiencing high CPU.
Known issues to check for first¶
High CPU with mod_ldap and virtualhosts¶
mod_ldap prior to PI94050 could cause loops/hangs/high cpu when used with virtual hosts.
High CPU on 8.0.0.13 and 8.5.5.11¶
PI70829 can cause high CPU with logHandshakeError() or log_handshake_transcript() in the backtrace. The 8.5.5.11 and 8.0.0.13 interim fixes for PI76874 include a fix for this (PI76757).
High CPU on z/OS before 8.0.0.13, 8.5.5.11, and 9.0.0.2 due to PI68803¶
Prior to 8.0.0.13, 8.5.5.11, and 9.0.0.2, IHS on z/OS may use increasing CPU as individual processes age. The CPU usage may show up in BPXVRCAN (async I/O cancel), and BPX4AIO (async I/O poll) operations or the msgrcv() call. This is PI68803.
High CPU with IHS 8.0 or later on AIX¶
There are known issues in the ICC (IBM Crypto for C-Language) that is part of the IBM Global Security Kit (GSKit) bundled with earlier versions of IBM HTTP Server 8.0 and 8.5.5 that can result in high CPU, delays, or (frontend) timeouts. This is typically noticed by users migrating from IHS 7.0 on AIX or Linux/PPC.
The issues are resolved by IBM Global Security Kit (GSKit) 8.0.50.17, provided by PI09443, which is shipped in IHS 8.0.0.9 and 8.5.5.2. The problem no longer exists in those and later versions, and the additional configuration described below is not required for them.
The recommended solution for earlier versions of IBM HTTP Server is
either to upgrade to at least IHS fixpack 8.0.0.9 or 8.5.5.2, or to
upgrade the IBM Global Security Kit (GSKit) to at least version
8.0.50.17 using the PI09443 interim fix.
NOTE: It is always recommended to use the latest available IBM Global Security Kit (GSKit). To
determine the latest recommended IBM Global Security Kit (GSKit), you can refer to the
'Comments' section for your fixpack version in the Recommended fixes for IBM HTTP Server document.
If you are experiencing the problem symptoms described above, but cannot apply IHS fixpacks 8.0.0.9, 8.5.5.2 or newer, nor published interim fix PI09443 or newer providing at least IBM Global Security Kit (GSKit) 8.0.50.17, then you can complete the steps below:
You must apply a maintenance level with IBM Global Security Kit (GSKit) 8.0.14.27 or later in order to resolve the problem.
8.0.0.6, 8.0.0.7, 8.5.5.0, 8.5.5.1: These versions already contain IBM Global Security Kit (GSKit) 8.0.14.27, so you can skip to the environment variable configuration below.
8.0.0.0 - 8.0.0.6: Install a minimum (see the NOTE above) of IBM Global Security Kit (GSKit) 8.0.14.27 as provided by interim fix PM85211, then refer to the environment variable configuration below.
Set environment variables
(Not needed for IBM Global Security Kit (GSKit) 8.0.50.17 or higher. The default behavior of those is to act identically to the configuration below without any user configuration.)
Set the following in $IHSROOT/bin/envvars and perform a full stop and start:
ICC_IGNORE_FIPS=YES
export ICC_IGNORE_FIPS
If the problem is not yet resolved, try also setting the following in $IHSROOT/bin/envvars and perform a full stop and start:
ICC_TRNG=ALT2
export ICC_TRNG
Note: 'ICC_TRNG=ALT' had been previously recommended for this setting, but 'ALT2' is now the recommended value to try. The only difference between the two is the PRNG used, but the one used by 'ALT2' has been determined to resolve the problem in a wider range of circumstances, so it is now the suggested value.
Customers who previously encountered this problem and set this to 'ALT' can leave it as-is if it resolved the problem for them, or change it to 'ALT2'.
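Before editing envvars, it can help to see which ICC settings are already present. The following is a minimal sketch, not part of IHS or ihsdiag: the function name show_icc_settings is hypothetical, and it simply prints the non-comment lines of the given file that mention ICC_.

```shell
# Hypothetical helper: list active (non-comment) ICC_ lines in an envvars file.
# Usage: show_icc_settings "$IHSROOT/bin/envvars"
show_icc_settings() {
    # grep -n prefixes each match with "LINENO:"; the second grep drops
    # lines whose content starts with a '#' comment marker.
    grep -n 'ICC_' "$1" | grep -v '^[0-9]*:[[:space:]]*#'
}
```

If the function prints nothing, no ICC_ variables are currently set in that file.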
If FIPSEnable is required by your configuration, the above fixes will
use a cryptographic library that is NOT FIPS certified. The protocols
and ciphers will however still be restricted to the protocols and
ciphers deemed acceptable under FIPS140-2, but the implementation has
not (yet) been certified. We currently have no timeframe for when a FIPS
certified IBM Global Security Kit (GSKit) with the fixes will be available in an IHS fixpack, as
it depends on external FIPS certification. When it is available, this
document will be updated with the APAR number.
High CPU with SSL and 8.0.0.4 or earlier¶
Disable TLS compression: ssl_questions.html#compress
High CPU with the WebSphere Plugin on 6.0.2.33, 6.0.2.35, 6.1.0.23, 6.1.0.25, or 7.0.0.3¶
If you're using one of these levels of the WebSphere Plugin, and have HTTPS transports configured, upgrade to a level including PK85105 before proceeding.
High CPU with SSL¶
Verify that SSL sessions are being reused.
Verify that the number of handshakes hasn't increased due to a config change.
The same SSL sessions documentation referenced above also contains information on counting handshakes. Count for both the current and prior configuration for comparison.
Verify that ThreadsPerChild has not been increased far beyond the default of 25.
Verify that a supported, modern OS is being used.
Assess the impact of using larger SSL keys.
Assess whether a more computationally expensive cipher is being used (such as 3DES).
The negotiated cipher can be logged using %{HTTPS_CIPHER}e as described in the ihsdiag ciphers documentation.
Verify that sufficient dedicated CPUs are available. Avoid over-subscribed virtual CPUs, or very few cores for dozens of concurrent handshakes per second.
Review the access log to make sure there is a healthy mix of 304 responses for static content; otherwise CPU time is needlessly spent encrypting content.
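The access log review in the last item can be approximated with a short script. This is a minimal sketch under stated assumptions: the function name status_mix is hypothetical, and it assumes the common or combined log format, where the response status is the ninth whitespace-separated field and the request line contains no extra spaces.

```shell
# Hypothetical helper: count responses per HTTP status code.
# Reads the named access log files, or standard input if none are given.
# Assumes the status code is field 9 (common/combined log format).
status_mix() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$@" | sort
}
```

For example, `status_mix /path/to/IHS/logs/access_log` prints one line per status code with its count, which makes it easy to compare the number of 304 responses against 200 responses. A custom LogFormat may place the status in a different field.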
Other configuration issues or suggestions that relate to high CPU¶
Avoid proxying requests for static files.
Limit on-the-fly compression with mod_deflate.
Consider conditional logging to avoid logging overhead on static assets.
Apply any missing recommended maintenance.
Identifying high CPU processes¶
Operating systems provide special programs for displaying the CPU usage
of processes running on the system. These programs include ps, top,
topas, nmon, and others. When a high CPU problem is suspected, one
of these programs should be used to determine which processes are
actively consuming high CPU. If a web server process is the highest
consumer of CPU, documentation about the suspected problem should be
submitted to IBM HTTP Server support for analysis.
While ps is universally available, it averages CPU usage across the
lifetime of the process. A very old process that begins to use high CPU
will often not show high CPU in ps output. Using "top" or "topas" to
show processes with high CPU over some short (1 second) interval is
usually best. If you cannot find a high CPU process via something like
"top" or "topas", the example below demonstrates how to identify the
high CPU process with ps.
ps invocations which show percent CPU¶
Note: Read the preceding section carefully before relying on ps output. It is a last resort.
Platform | ps invocation |
AIX, z/OS |
|
Linux |
|
Solaris |
|
HP-UX |
|
Example of using ps to find high CPU processes¶
Substitute the appropriate ps command for your platform.
# ps -A -o pid,ppid,pcpu,time,args
PID PPID %CPU TIME COMMAND
0 0 0.0 00:07:56 swapper
1 0 0.0 00:01:18 /etc/init
516 0 96.4 89-10:46:02 wait
774 0 0.0 00:00:09 reaper
...
32662 3170 0.0 00:00:00 /usr/sbin/sshd -D
33448 1 0.0 00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
35482 33448 40.0 00:09:12 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
45270 33448 0.0 00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
Processes 516 and 35482 are the only ones with a high current CPU
utilization (third column in the ps output). This example is from AIX,
where wait is a special process which represents all idle CPU on the
system, so the wait process can be ignored. This leaves process
35482, which is an IBM HTTP Server process.
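The manual scan of the ps output can be approximated with a small filter. This is a minimal sketch, not part of ihsdiag: the function name top_cpu is hypothetical, and it assumes the column layout shown above (PID, PPID, %CPU, TIME, COMMAND) with a single header line.

```shell
# Hypothetical helper: show the processes with the highest %CPU.
# Reads "ps -A -o pid,ppid,pcpu,time,args" style output on standard input.
# The optional argument is the number of processes to show (default 5).
top_cpu() {
    # Drop the header line and the AIX idle "wait" process, then sort
    # numerically on the third column (%CPU), highest first.
    awk 'NR > 1 && $5 != "wait"' | sort -k3,3nr | head -n "${1:-5}"
}
```

A typical invocation would be `ps -A -o pid,ppid,pcpu,time,args | top_cpu`, substituting the appropriate ps command for your platform. The same caveat applies as for ps itself: the %CPU column is averaged over the lifetime of the process.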
Process 35482 can be confirmed to be actively consuming high amounts of
CPU by examining the fourth column in the ps output over time, or by
watching the process with topas/top/prstat:
# ps -A -o pid,ppid,pcpu,time,args | grep 35482
35482 33448 96.2 00:10:28 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
(wait 10 seconds or so)
# ps -A -o pid,ppid,pcpu,time,args | grep 35482
35482 33448 96.1 00:10:35 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
(wait 10 seconds or so)
[trawick@b80-2 trawick]$ ps -A -o pid,ppid,pcpu,time,args | grep 35482
35482 33448 96.6 00:10:44 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
The subsequent checks for process 35482 show that cumulative CPU time (the fourth column) continues to increase. At this point, the suspect process is clearly identified, and we should proceed to collecting information about the process using the instructions below.
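The repeated checks above can also be scripted. The following is a minimal sketch, not part of ihsdiag: the function names time_to_seconds and cpu_seconds_used are hypothetical, and it assumes that ps accepts the POSIX options -o time= and -p and prints TIME as [dd-]hh:mm:ss, as in the AIX output above.

```shell
# Hypothetical helper: convert a ps TIME value ([dd-]hh:mm:ss) to seconds.
time_to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $1
        # An optional "dd-" prefix carries the number of days.
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}

# Hypothetical helper: CPU seconds consumed by a process over an interval.
# Usage: cpu_seconds_used PID [INTERVAL_SECONDS]   (interval defaults to 10)
cpu_seconds_used() {
    pid=$1
    interval=${2:-10}
    t1=$(ps -o time= -p "$pid") || return 1
    sleep "$interval"
    t2=$(ps -o time= -p "$pid") || return 1
    echo $(( $(time_to_seconds "$t2") - $(time_to_seconds "$t1") ))
}
```

For the first and last samples above, 00:10:28 and 00:10:44 convert to 628 and 644 seconds, a difference of 16 CPU seconds. A result close to the length of the interval suggests that the process is keeping roughly one CPU busy.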
Making sure required support programs are available¶
Please refer to these instructions for verifying that required support programs are installed.
Running the tool¶
You will need to download the collector.
Warning¶
The collector tool uses native tools such as strace and truss to obtain
system call traces, which include the contents of buffers used to read
and write data from the network.
Note: This executable mustgather is not used on Windows or z/OS.
On Windows, refer to win32_hang_doc.html and collect a few procdump captures of the high CPU process.
On z/OS, system dumps that include the httpd address spaces should be collected. A few dumps should be taken over a short period of time. Previously, the executable mustgather was provided on z/OS, but it depends on the dbx debugger, which conflicts with the very common SAFRunAs directive.
Run the tool as root to avoid any permissions problems with obtaining
backtraces or reading files, such as log files and configuration files.
(More information about the requirement to run this tool as root is
available in the section on understanding the root requirement below.)
ServerDoc takes four parameters for gathering high CPU documentation:
GatherHighCpuDoc
the name of the IHS installation directory (e.g., /usr/HTTPServer)
the process id of the high CPU process to examine
the address of a non-SSL port handled by the web server (e.g., 127.0.0.1:80), or "-" if there is no non-SSL port
# java -jar ServerDoc.jar GatherHighCpuDoc /path/to/IHS 35482 127.0.0.1:80
The tool creates a new directory which contains a timestamp in the name, and the high CPU documentation will be saved in that directory.
Determining the value of the non-SSL address parameter¶
If the IHS installation only supports SSL, then use - for this parameter. Otherwise, specify an IP address and port which can be used to reach the server from the local machine without using SSL.
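One way to find candidate addresses is to list the Listen directives in the web server configuration. This is a minimal sketch, not part of ihsdiag: the function name list_listen is hypothetical, and it only shows which addresses and ports are configured. Deciding which of them is non-SSL still requires checking where SSLEnable is used.

```shell
# Hypothetical helper: print the active Listen directives in a config file.
# Usage: list_listen /path/to/IHS/conf/httpd.conf
list_listen() {
    # Match "Listen" at the start of a line (directive names are
    # case-insensitive), which skips commented-out directives.
    grep -i '^[[:space:]]*Listen[[:space:]]' "$1"
}
```

Listen directives in files pulled in with Include are not covered, so those files would need to be checked separately.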
A sample run¶
For this example, IHS is installed in /usr/IBMIHS-1.3.28, the high CPU
process is 35482, the non-SSL port can be reached from the web server
machine on address 127.0.0.1:8080, and ihsdiag was unpacked into
directory /root/ihsdiag-1.3.4.
# java -jar /root/ihsdiag-1.3.4/ServerDoc.jar GatherHighCpuDoc /usr/IBMIHS-1.3.28 \
35482 127.0.0.1:8080
Tracing process for 10 seconds...
Seconds remaining before gathering information again:
60...54...48...42...36...30...24...18...12...6...
Tracing process for 10 seconds...
Seconds remaining before gathering information again:
30...27...24...21...18...15...12...9...6...3...
Tracing process for 10 seconds...
Reports, log files, and configuration files have been saved to directory
HighCpuDoc.200501201122
If you have additional log files or configuration files, copy them there
before packing up the directory.
Web server log and conf files other than the default will have to be copied manually.
WebSphere plug-in conf and log files will have to be copied manually.
Hint for packing up the directory:
tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
gzip HighCpuDoc.200501201122.tar
[trawick@b80-2 ServerDoc]$ ls -l HighCpuDoc.200501201122/
total 11479
-rw-r--r-- 1 root system 32986 Jan 20 11:22 access_log
-rw-r--r-- 1 root system 7129 Jan 20 11:22 apachectl
-rw-r--r-- 1 root system 2251 Jan 20 11:22 error_log
-rw-r--r-- 1 root system 662246 Jan 20 11:22 httpd
-rw-r--r-- 1 root system 22835 Jan 20 11:22 httpd.conf
-rw-r--r-- 1 root system 197962 Jan 20 11:24 log
-rw-r--r-- 1 root system 1624 Jan 20 11:24 report
-rw-r--r-- 1 root system 1641481 Jan 20 11:22 trace.0
-rw-r--r-- 1 root system 1642385 Jan 20 11:24 trace.1
-rw-r--r-- 1 root system 1650769 Jan 20 11:24 trace.2
-rwxr-xr-x 1 root system 1442 Jan 20 11:22 traceprocess.sh
Copying other web server and plug-in files¶
The next step is to copy any other web server or plug-in configuration files and logs into the new HighCpuDoc directory. Here is a list of files to copy if they are being used:
any IHS configuration file other than httpd.conf in the IHS install directory
any additional web server error or access log files, such as log files specific to each virtual host or log files created by rotatelogs
the WebSphere plug-in configuration file
the WebSphere plug-in log file
Saving the documentation directory¶
The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or pax followed by compress. The easiest way is to cut and paste the messages displayed by ServerDoc previously which showed the commands to use. The suggested commands will vary by platform. On z/OS, for example, pax and compress will be suggested instead of tar and gzip.
A sample run¶
# tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
# gzip HighCpuDoc.200501201122.tar
The resulting compressed file is the file to send to IBM support.
Understanding the root requirement¶
When gathering information on high CPU problems, the tool must attach to a live web server process to obtain information about the state of that process.
If the web server is started as root, then that process will be owned
by root or by the web server user id (e.g., nobody or www). Only
root has the authority to attach to any of the web server processes,
so it is easiest if the tool itself is run as root. If the web server
administrator does not have authority to log in or switch user to
root, a simple script can be created to gather the high CPU
documentation, and the system administrator can give the web server
administrator sudo access to that script. sudo is a third-party tool
available without cost for all appropriate platforms.
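Such a script might look like the following. This is a minimal sketch under stated assumptions: the function names is_valid_pid and gather_high_cpu are hypothetical, and the installation directory, jar location, and non-SSL address are example values that must be adjusted for the local system.

```shell
#!/bin/sh
# Hypothetical wrapper for gathering high CPU documentation via sudo.
# The three values below are assumptions; change them for your system.
IHS_ROOT=/usr/HTTPServer
SERVERDOC_JAR=/root/ihsdiag/ServerDoc.jar
NON_SSL_ADDR=127.0.0.1:80

# Accept only a plain decimal process id, since the script runs as root.
is_valid_pid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Run the collector against the given process id.
gather_high_cpu() {
    is_valid_pid "$1" || { echo "usage: gather_high_cpu PID" >&2; return 2; }
    java -jar "$SERVERDOC_JAR" GatherHighCpuDoc "$IHS_ROOT" "$1" "$NON_SSL_ADDR"
}
```

A complete script would end with a call such as `gather_high_cpu "$1"`. Fixing the paths inside the script and validating the process id keeps the sudo grant narrow, because the web server administrator can only choose which process to examine.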
If the web server is not started as root, there are no such concerns,
and the high CPU documentation tool may be run by the user id which
starts the web server.
If the tool is run as non-root and it is unable to gather the required
information, the problem will have to be recreated. It may not be
possible to determine if this problem occurred until the documentation
has been analyzed by IBM HTTP Server support.