MustGather: High CPU

This document describes common problems that lead to high CPU and gives instructions for running the executable collector on Unix platforms. The GatherHighCpuDoc collector should only be run on a system that is actively experiencing high CPU.

  1. Review known issues

  2. Run the collector

For Windows, see the Windows MustGather.

Known issues to check for first

High CPU with mod_ldap and virtualhosts

mod_ldap prior to PI94050 could cause loops, hangs, or high CPU when used with virtual hosts.

High CPU on 8.0.0.13 and 8.5.5.11

PI70829 can cause high CPU with logHandshakeError() or log_handshake_transcript() in the backtrace. The 8.5.5.11 and 8.0.0.13 interim fixes for PI76874 include a fix for this (PI76757).

High CPU on z/OS before 8.0.0.13, 8.5.5.11, and 9.0.0.2 due to PI68803

Prior to 8.0.0.13, 8.5.5.11, and 9.0.0.2, IHS on z/OS may use increasing CPU as individual processes age. The CPU usage may show up in BPXVRCAN (async I/O cancel), and BPX4AIO (async I/O poll) operations or the msgrcv() call. This is PI68803.

High CPU with IHS 8.0 or later on AIX

There are known issues in the ICC (IBM Crypto for C-Language) that is part of the IBM Global Security Kit (GSKit) bundled with earlier versions of IBM HTTP Server 8.0 and 8.5.5 that can result in high CPU, delays, or (frontend) timeouts. This is typically noticed by users migrating from IHS 7.0 on AIX or Linux/PPC.

The issues are resolved by IBM Global Security Kit (GSKit) 8.0.50.17 provided by PI09443, which is shipped in IHS 8.0.0.9 or 8.5.5.2. So, the problem no longer exists in those and higher versions, and the additional configuration described below is not required for those.

The recommended solution for earlier versions of IBM HTTP Server is to either upgrade to a minimum of IHS fixpack versions 8.0.0.9 or 8.5.5.2, or to upgrade the IBM Global Security Kit (GSKit) to a minimum version of 8.0.50.17 using the PI09443 interim fix.
NOTE: It is always recommended to use the latest available IBM Global Security Kit (GSKit). To determine the latest recommended IBM Global Security Kit (GSKit), refer to the 'Comments' section for your fixpack version in the Recommended fixes for IBM HTTP Server document.
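
When checking a system against the levels above, comparing dotted version strings by eye is error-prone. The following is a minimal sketch of such a comparison. The name version_ge is a hypothetical helper, not part of IHS or GSKit, and the installed GSKit level is assumed to be already known and passed in as a string.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when dotted version $1 >= $2.
# Assumes every field is purely numeric, e.g. 8.0.50.17.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            # Missing trailing fields compare as 0
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Example with a hard-coded level; substitute the level found on your system.
if version_ge 8.0.14.27 8.0.50.17; then
    echo "GSKit level includes the fix"
else
    echo "GSKit level is older than 8.0.50.17"
fi
```

Fields are compared numerically rather than as text, so 8.0.50.17 correctly sorts above 8.0.14.27 and 8.0.9 sorts below 8.0.14.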

If you are experiencing the problem symptoms described above, but cannot apply IHS fixpacks 8.0.0.9, 8.5.5.2 or newer, nor published interim fix PI09443 or newer providing at least IBM Global Security Kit (GSKit) 8.0.50.17, then you can complete the steps below:

  • You must apply a maintenance level with IBM Global Security Kit (GSKit) 8.0.14.27 or later in order to resolve the problem.

    • 8.0.0.6, 8.0.0.7, 8.5.5.0, 8.5.5.1: These versions already contain IBM Global Security Kit (GSKit) 8.0.14.27, so you can skip to the environment variable configuration below.

    • 8.0.0.0 - 8.0.0.6: Install a minimum {see 'Note' above} of IBM Global Security Kit (GSKit) 8.0.14.27 as provided by interim fix PM85211, then refer to the environment variable configuration below.

  • Set environment variables
    (Not needed for IBM Global Security Kit (GSKit) 8.0.50.17 or higher. The default behavior of those is to act identically to the configuration below without any user configuration.)

    • Set the following in $IHSROOT/bin/envvars and perform a full stop and start:
      ICC_IGNORE_FIPS=YES
      export ICC_IGNORE_FIPS

    • If the problem is not yet resolved, also set the following in $IHSROOT/bin/envvars and perform a full stop and start:
      ICC_TRNG=ALT2
      export ICC_TRNG
      Note: 'ICC_TRNG=ALT' was previously recommended for this setting, but 'ALT2' is now the recommended value to try. The only difference between the two is the PRNG used; the one used by 'ALT2' has been found to resolve the problem in a wider range of circumstances.
      Customers who previously encountered this problem and set this to 'ALT' can leave it as-is if it resolved the problem for them, or change it to 'ALT2'.
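
Before editing envvars, it can help to check whether a setting is already present. The sketch below is one way to do that. envvars_sets is a hypothetical helper name, and it only recognizes the two-line "VAR=VALUE" plus "export VAR" form shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed when FILE contains both "VAR=VALUE" and
# "export VAR" on their own lines (leading/trailing whitespace allowed).
envvars_sets() {
    file=$1 var=$2 value=$3
    grep -q "^[[:space:]]*${var}=${value}[[:space:]]*\$" "$file" &&
        grep -q "^[[:space:]]*export[[:space:]][[:space:]]*${var}[[:space:]]*\$" "$file"
}

# Example (assumes IHSROOT points at your IHS installation):
#   envvars_sets "$IHSROOT/bin/envvars" ICC_IGNORE_FIPS YES && echo "already set"
```

A combined "export VAR=VALUE" line would not be detected by this sketch.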

If FIPSEnable is required by your configuration, the above fixes will use a cryptographic library that is NOT FIPS certified. The protocols and ciphers will still be restricted to those deemed acceptable under FIPS140-2, but the implementation has not (yet) been certified. We currently have no timeframe for when a FIPS certified IBM Global Security Kit (GSKit) with the fixes will be available in an IHS fixpack, as it depends on external FIPS certification. When it is available, this document will be updated with the APAR number.

High CPU with SSL and 8.0.0.4 or earlier

Disable TLS compression: ssl_questions.html#compress

High CPU with the WebSphere Plugin on 6.0.2.33, 6.0.2.35, 6.1.0.23, 6.1.0.25, or 7.0.0.3

If you're using this level of the WebSphere Plugin, and have HTTPS transports configured, upgrade to a level including PK85105 before proceeding.

High CPU with SSL

  1. Verify that SSL sessions are being reused.

  2. Verify that the number of handshakes hasn't increased due to a config change.
    The same SSL sessions documentation referenced above also contains information on counting handshakes. Count for both current and prior configuration for comparison.

  3. Verify that ThreadsPerChild has not been increased far beyond the default of 25.

  4. Verify that a supported, modern OS is being used.

  5. Assess the impact of using larger SSL keys

  6. Assess whether a more computationally expensive cipher is being used (such as 3DES).
    The negotiated cipher can be logged using %{HTTPS_CIPHER}e as described in the ihsdiag ciphers documentation.

  7. Verify that sufficient dedicated CPUs are available. Avoid over-subscribed virtual CPUs, or very few cores for dozens of concurrent handshakes per second.

  8. Review the access log to make sure there is a healthy mix of 304 responses for static content; otherwise, CPU time is needlessly spent encrypting content.
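
The access log review in the last step can be roughed out with a short script. This is a sketch only: pct_304 is a hypothetical helper name, and it assumes the status code is the 9th whitespace-separated field, as in the common and combined log formats. A custom LogFormat would need a different field number.

```shell
#!/bin/sh
# Hypothetical helper: read access log lines on stdin and print the
# percentage of responses whose status is 304 (one decimal place).
# Lines whose 9th field is not a three-digit status are ignored.
pct_304() {
    awk '$9 ~ /^[0-9][0-9][0-9]$/ { total++; if ($9 == 304) hits++ }
         END {
             if (total) printf "%.1f\n", 100 * hits / total
             else print "0.0"
         }'
}

# Example (the log path is a placeholder):
#   pct_304 < /path/to/IHS/logs/access_log
```

The figure covers all requests, not only static content, so treat it as a rough indicator rather than a precise measurement.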

Other configuration issues or suggestions that relate to high CPU

Identifying high CPU processes

Operating systems provide special programs for displaying the CPU usage of processes running on the system. These programs include ps, top, topas, nmon, and others. When a high CPU problem is suspected, one of these programs should be used to determine which processes are actively consuming high CPU. If a web server process is the highest consumer of CPU, documentation about the suspected problem should be submitted to IBM HTTP Server support for analysis.

While ps is universally available, it averages CPU usage across the lifetime of the process. A very old process that begins to use high CPU will often not show high CPU in ps output. Using "top" or "topas" to show processes with high CPU over some short (1 second) interval is usually best. If you cannot find a high CPU process via something like "top" or "topas", the example below demonstrates how to identify the high CPU process with ps.

ps invocations which show percent CPU

Note: Read the preceding section carefully before relying on ps output. It is a last resort.

Platform     ps invocation
AIX, z/OS    ps -A -o pid,ppid,pcpu,time,args
Linux        ps -eo pid,ppid,pcpu,time,fname,cmd
Solaris      ps -A -o pid,ppid,pcpu,time,comm
HP-UX        UNIX95=1 ps -A -o pid,ppid,pcpu,time,args
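
The table above can be turned into a small lookup keyed on the output of uname -s. This is a sketch: ps_command_for is a hypothetical helper name, and the uname values used here (AIX, OS/390 for z/OS, Linux, SunOS for Solaris, HP-UX) are assumptions that should be checked on your systems.

```shell
#!/bin/sh
# Hypothetical helper: print the ps invocation from the table above for a
# given "uname -s" value. Returns non-zero for platforms not in the table.
ps_command_for() {
    case $1 in
        AIX|OS/390) echo "ps -A -o pid,ppid,pcpu,time,args" ;;
        Linux)      echo "ps -eo pid,ppid,pcpu,time,fname,cmd" ;;
        SunOS)      echo "ps -A -o pid,ppid,pcpu,time,comm" ;;
        HP-UX)      echo "UNIX95=1 ps -A -o pid,ppid,pcpu,time,args" ;;
        *)          echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}

# Example: show the invocation for the current system
#   ps_command_for "$(uname -s)"
```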

Example of using ps to find high CPU processes

Substitute the appropriate ps command for your platform.

# ps -A -o pid,ppid,pcpu,time,args
  PID  PPID  %CPU        TIME COMMAND
    0     0   0.0    00:07:56 swapper
    1     0   0.0    00:01:18 /etc/init
  516     0  96.4 89-10:46:02 wait
  774     0   0.0    00:00:09 reaper

 ...
32662  3170   0.0    00:00:00 /usr/sbin/sshd -D
33448     1   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
35482 33448  40.0    00:09:12 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
45270 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28

Only two processes show high current CPU utilization (the third column in the ps output): 516 (wait) and 35482. This example is from AIX, where wait is a special process that represents all idle CPU on the system, so it can be ignored. This leaves process 35482, which is an IBM HTTP Server process.
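
On a busy system the ps listing can be long, so filtering it helps. The following sketch keeps only rows at or above a chosen %CPU threshold and drops the AIX wait pseudo-process. high_cpu_rows is a hypothetical helper name, and the column order is assumed to be the pid,ppid,pcpu,time,args order used in this example.

```shell
#!/bin/sh
# Hypothetical filter: read ps output (header line plus rows) on stdin and
# print the rows whose third column (%CPU) is >= the threshold in $1.
# The row whose command is "wait" (idle CPU on AIX) is skipped.
high_cpu_rows() {
    awk -v min="$1" 'NR > 1 && $3 + 0 >= min + 0 && $5 != "wait"'
}

# Example: rows using at least 20% CPU
#   ps -A -o pid,ppid,pcpu,time,args | high_cpu_rows 20
```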

Process 35482 can be confirmed as actively consuming CPU by examining the fourth column in the ps output over time, or by watching the process with topas, top, or prstat.

# ps -A -o pid,ppid,pcpu,time,args | grep 35482
35482 33448  96.2    00:10:28 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28

(wait 10 seconds or so)

# ps -A -o pid,ppid,pcpu,time,args | grep 35482
35482 33448  96.1    00:10:35 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28

(wait 10 seconds or so)

# ps -A -o pid,ppid,pcpu,time,args | grep 35482
35482 33448  96.6    00:10:44 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28

The subsequent checks for process 35482 show that cumulative CPU time (the fourth column) continues to increase. At this point, the suspect process is clearly identified, and we should proceed to collecting information about the process using the instructions below.
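
The growth in cumulative CPU time can also be computed rather than read off by eye. The sketch below converts the TIME column to seconds and subtracts two samples. Both function names are hypothetical, and the TIME value is assumed to have the [dd-]hh:mm:ss form shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: convert a ps TIME value such as 00:10:28 or
# 89-10:46:02 (days-hours:minutes:seconds) into a number of seconds.
ps_time_to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $1
        if (split(t, d, "-") == 2) { days = d[1]; t = d[2] }
        n = split(t, p, ":")
        secs = 0
        for (i = 1; i <= n; i++) secs = secs * 60 + p[i]
        print days * 86400 + secs
    }'
}

# Hypothetical helper: CPU seconds consumed between two TIME samples.
cpu_seconds_between() {
    echo $(( $(ps_time_to_seconds "$2") - $(ps_time_to_seconds "$1") ))
}

# Using the first two samples above:
#   cpu_seconds_between 00:10:28 00:10:35    # prints 7
```

Seven CPU seconds consumed across a wait of roughly ten seconds is consistent with a process that is busy most of the time.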

Making sure required support programs are available

Please refer to these instructions for verifying that required support programs are installed.

Running the tool

You will need to download the collector.

Warning

The collector tool uses native tools such as strace and truss to obtain system call traces, which include the contents of buffers used to read and write data from the network.

Note: This executable mustgather is not used on Windows nor on z/OS.

  • On Windows, refer to win32_hang_doc.html and collect a few procdump captures of the High CPU process.

  • On z/OS, system dumps should be collected that include the httpd address spaces. A few dumps over a short period of time should be taken. Previously, the executable mustgather was provided on z/OS but it depends on the dbx debugger which conflicts with the very common SAFRunAs directive.

Run the tool as root to avoid any permissions problems with obtaining backtraces or reading files, such as log files and configuration files. (More information about the requirement to run this tool as root is available under "Understanding the root requirement" below.)

ServerDoc takes four parameters for gathering high CPU documentation:

  1. GatherHighCpuDoc

  2. the name of the IHS installation directory (e.g., /usr/HTTPServer)

  3. the process id of the high CPU process to examine

  4. the address of a non-SSL port handled by the web server (e.g., 127.0.0.1:80), or "-" if there is no non-SSL port

# java -jar ServerDoc.jar GatherHighCpuDoc /path/to/IHS 35482 127.0.0.1:80

The tool creates a new directory which contains a timestamp in the name, and the high CPU documentation will be saved in that directory.

Determining the value of the non-SSL address parameter

If the IHS installation only supports SSL, then use - for this parameter. Otherwise, specify an IP address and port which can be used to reach the server from the local machine without using SSL.
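
The four parameters can be assembled by a small wrapper that checks the process id and falls back to - when no non-SSL address is given. This is a sketch only: serverdoc_command is a hypothetical helper name, and it prints the command line for review instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper: validate the arguments and print the ServerDoc
# command line. Arguments: jar path, IHS install directory, process id,
# and an optional non-SSL address (defaults to "-" when omitted).
serverdoc_command() {
    jar=$1 ihs_root=$2 pid=$3 addr=${4:--}
    case $pid in
        ''|*[!0-9]*) echo "process id must be numeric: $pid" >&2; return 1 ;;
    esac
    echo "java -jar $jar GatherHighCpuDoc $ihs_root $pid $addr"
}

# Example for an SSL-only server:
#   serverdoc_command ServerDoc.jar /usr/HTTPServer 35482
```

Printing rather than executing keeps the sketch safe to try on a system that is not currently experiencing high CPU.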

A sample run

For this example, IHS is installed in /usr/IBMIHS-1.3.28, the high CPU process is 35482, the non-SSL port can be reached from the web server machine on address 127.0.0.1:8080, and ihsdiag was unpacked into directory /root/ihsdiag-1.3.4.

# java -jar /root/ihsdiag-1.3.4/ServerDoc.jar GatherHighCpuDoc /usr/IBMIHS-1.3.28 \
35482 127.0.0.1:8080
Tracing process for 10 seconds...

Seconds remaining before gathering information again:
60...54...48...42...36...30...24...18...12...6...

Tracing process for 10 seconds...

Seconds remaining before gathering information again:
30...27...24...21...18...15...12...9...6...3...

Tracing process for 10 seconds...

Reports, log files, and configuration files have been saved to directory
  HighCpuDoc.200501201122
If you have additional log files or configuration files, copy them there
before packing up the directory.
Web server log and conf files other than the default will have to be copied manually.
WebSphere plug-in conf and log files will have to be copied manually.

Hint for packing up the directory:
  tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
  gzip HighCpuDoc.200501201122.tar
[trawick@b80-2 ServerDoc]$ ls -l HighCpuDoc.200501201122/
total 11479
-rw-r--r--   1 root     system        32986 Jan 20 11:22 access_log
-rw-r--r--   1 root     system         7129 Jan 20 11:22 apachectl
-rw-r--r--   1 root     system         2251 Jan 20 11:22 error_log
-rw-r--r--   1 root     system       662246 Jan 20 11:22 httpd
-rw-r--r--   1 root     system        22835 Jan 20 11:22 httpd.conf
-rw-r--r--   1 root     system       197962 Jan 20 11:24 log
-rw-r--r--   1 root     system         1624 Jan 20 11:24 report
-rw-r--r--   1 root     system      1641481 Jan 20 11:22 trace.0
-rw-r--r--   1 root     system      1642385 Jan 20 11:24 trace.1
-rw-r--r--   1 root     system      1650769 Jan 20 11:24 trace.2
-rwxr-xr-x   1 root     system         1442 Jan 20 11:22 traceprocess.sh

Copying other web server and plug-in files

The next step is to copy any other web server or plug-in configuration files and logs into the new HighCpuDoc directory. Here is a list of files to copy if they are being used:

  • any IHS configuration file other than httpd.conf in the IHS install directory

  • any additional web server error or access log files, such as log files specific to each virtual host or log files created by rotatelogs

  • the WebSphere plug-in configuration file

  • the WebSphere plug-in log file
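
Copying the files in the list above can be scripted. The following is a minimal sketch; copy_extra_files is a hypothetical helper name, and the file paths in the example are placeholders for wherever these files live on your system.

```shell
#!/bin/sh
# Hypothetical helper: copy each extra file that exists into the
# documentation directory ($1) and report any that were not found.
# Files sharing a base name would overwrite each other; rename those by hand.
copy_extra_files() {
    dest=$1; shift
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp -p "$f" "$dest/" && echo "copied $f"
        else
            echo "skipped (not found): $f" >&2
        fi
    done
}

# Example (placeholder paths):
#   copy_extra_files HighCpuDoc.200501201122 \
#       /path/to/plugin-cfg.xml /path/to/http_plugin.log
```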

Saving the documentation directory

The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or pax followed by compress. The easiest way is to cut and paste the messages displayed by ServerDoc previously which showed the commands to use. The suggested commands will vary by platform. On z/OS, for example, pax and compress will be suggested instead of tar and gzip.

A sample run

# tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
# gzip HighCpuDoc.200501201122.tar

The resulting compressed file is the file to send to IBM support.

Understanding the root requirement

When gathering information on high CPU problems, the tool must attach to a live web server process to obtain information about the state of that process.

If the web server is started as root, then that process will be owned by root or by the web server user id (e.g., nobody or www). Only root has the authority to attach to any of the web server processes, so it is easiest if the tool itself is run as root. If the web server administrator does not have authority to log in or switch user to root, a simple script can be created to gather the high CPU documentation, and the system administrator can give the web server administrator sudo access to that script. sudo is a third-party tool available without cost for all appropriate platforms.

If the web server is not started as root, there are no such concerns, and the high CPU documentation tool may be run by the user id which starts the web server.

If the tool is run as non-root and it is unable to gather the required information, the problem will have to be recreated. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.