# MustGather: High CPU

This document describes common problems leading to high CPU, as well as instructions for running the executable collector on Unix platforms.

The `GatherHighCPUDoc` collector should only be run on a system actively experiencing high CPU.

1. [Review known issues](#known-issues-to-check-for-first)
2. [Run the collector](#running-the-tool)

## Known issues to check for first

### [High CPU with mod_ldap and virtualhosts](#PI94050)

mod_ldap prior to PI94050 could cause loops, hangs, or high CPU when used with virtual hosts.

### [High CPU on 8.0.0.13 and 8.5.5.11](#PI70829)

PI70829 can cause high CPU with logHandshakeError() or log\_handshake\_transcript() in the backtrace. The 8.5.5.11 and 8.0.0.13 interim fixes for PI76874 include a fix for this (PI76757).

### [High CPU on z/OS before 8.0.0.13, 8.5.5.11, and 9.0.0.2 due to PI68803](#PI68803)

Prior to 8.0.0.13, 8.5.5.11, and 9.0.0.2, IHS on z/OS may use increasing CPU as individual processes age. The CPU usage may show up in BPXVRCAN (async I/O cancel) and BPX4AIO (async I/O poll) operations, or in the msgrcv() call. This is PI68803.

### [High CPU with IHS 8.0 or later on AIX](#GSKITICC_HIGHCPU)

There are known issues in the ICC (IBM Crypto for C-Language) component of the IBM Global Security Kit (GSKit) bundled with earlier versions of IBM HTTP Server 8.0 and 8.5.5 that can result in high CPU, delays, or (frontend) timeouts. This is typically noticed by users migrating from IHS 7.0 on AIX or Linux/PPC.

The issues are resolved by IBM Global Security Kit (GSKit) 8.0.50.17 provided by PI09443, which is shipped in IHS 8.0.0.9 and 8.5.5.2. The problem no longer exists in those and higher versions, and the additional configuration described below is not required for them.

The recommended solution for earlier versions of IBM HTTP Server is either to upgrade to at least IHS fixpack 8.0.0.9 or 8.5.5.2, or to upgrade the IBM Global Security Kit (GSKit) to a **minimum** version of 8.0.50.17 using the PI09443 interim fix.

**NOTE:** It is always recommended to use the latest available IBM Global Security Kit (GSKit). To determine the latest recommended GSKit level, refer to the 'Comments' section for your fixpack version in the [Recommended fixes for IBM HTTP Server](http://www-01.ibm.com/support/docview.wss?rs=177&context=SSEQTJ&uid=swg27005198) document.

If you are experiencing the problem symptoms described above, but cannot apply IHS fixpack 8.0.0.9, 8.5.5.2, or newer, nor the published interim fix PI09443 (or newer) providing at least IBM Global Security Kit (GSKit) 8.0.50.17, complete the steps below:

- You must apply a maintenance level with IBM Global Security Kit (GSKit) 8.0.14.27 or later in order to resolve the problem.
    - 8.0.0.6, 8.0.0.7, 8.5.5.0, 8.5.5.1: These versions already contain IBM Global Security Kit (GSKit) 8.0.14.27, so you can skip to the environment variable configuration below.
    - 8.0.0.0 - 8.0.0.6: Install a minimum *{see 'Note' above}* of IBM Global Security Kit (GSKit) 8.0.14.27 as provided by [interim fix PM85211](http://www-01.ibm.com/support/docview.wss?uid=swg24035061), then refer to the environment variable configuration below.
- Set environment variables *(Not needed for IBM Global Security Kit (GSKit) 8.0.50.17 or higher. The default behavior of those versions is identical to the configuration below, without any user configuration.)*
    - Set the following in `$IHSROOT/bin/envvars` and perform a full stop and start:

          ICC_IGNORE_FIPS=YES
          export ICC_IGNORE_FIPS

    - If the problem is not yet resolved, try *also* setting the following in `$IHSROOT/bin/envvars` and perform a full stop and start:

          ICC_TRNG=ALT2
          export ICC_TRNG

      Note: '`ICC_TRNG=ALT`' was previously recommended for this setting, but '`ALT2`' is now the recommended value to try. The only difference between the two is the PRNG used; the one used by '`ALT2`' has been found to resolve the problem in a wider range of circumstances. Customers who previously encountered this problem and set the value to '`ALT`' can leave it as-is if it resolved the problem for them, or change it to '`ALT2`'.

If `FIPSEnable` is required by your configuration, the above fixes will use a cryptographic library that is *NOT* FIPS certified. The protocols and ciphers will still be restricted to those deemed acceptable under FIPS140-2, but the implementation has not (yet) been certified. We currently have no timeframe for when a FIPS certified IBM Global Security Kit (GSKit) with the fixes will be available in an IHS fixpack, as it depends on external FIPS certification. When it is available, this document will be updated with the APAR number.

### [High CPU with SSL and 8.0.0.4 or earlier](#tlscomp)

Disable TLS compression: [ssl\_questions.html\#compress](ssl_questions.html#compress)

### [High CPU with the WebSphere Plugin on 6.0.2.33, 6.0.2.35, 6.1.0.23, 6.1.0.25, or 7.0.0.3](#plgsslspin)

If you're using one of these levels of the WebSphere Plugin and have HTTPS transports configured, upgrade to a level including PK85105 before proceeding.

### [mod\_rewrite related performance issues](#rewrite)

#### [Recursive regular expression captures](#pcrerecur)

A very rare (usually accidental) pattern in regular expressions can use excessive CPU: a capture group that is itself repeated. These are most easily identified by searching the configuration file for a repetition character immediately following a closing parenthesis. If the commands below find a regex that matches this pattern, refactor it so that the capture itself is not repeated:

```
grep -F ')+' httpd.conf
```

and

```
grep -F ')*' httpd.conf
```

#### [Excessive mod\_rewrite evaluation](#excessrewrite)

Having many hundreds or thousands of mod\_rewrite directives evaluated for common requests can be detrimental to performance, even more so if they are specified in `.htaccess`. Limit evaluation by defining them in \ context, using the 'L' flag in early rules to avoid further processing for frequent requests, and making matches in the first parameter of `RewriteRule` "fail fast".

### [High CPU with SSL](#SSL)

1. Verify that [SSL sessions are being reused](ihs_performance.html#sid).
2. Verify that the number of handshakes hasn't increased due to a config change. The same [SSL sessions](ihs_performance.html#sid) documentation referenced above also contains information on counting handshakes. Count for both the current and the prior configuration for comparison.
3. Verify that `ThreadsPerChild` has not been increased far beyond the default of `25`.
4. Verify that a supported, modern OS is being used.
5. Assess the impact of [using larger SSL keys](ssl_questions.html#largekeys).
6. Assess whether a more computationally expensive cipher is being used (such as 3DES). The negotiated cipher can be logged using `%{HTTPS_CIPHER}e` as described in the ihsdiag [ciphers](ihs_performance.html#CIPHERLIST) documentation.
7. Verify that sufficient dedicated CPUs are available. Avoid over-subscribed virtual CPUs, or very few cores for dozens of concurrent handshakes per second.
8. Review the access log to make sure there is a healthy mix of 304 responses for static content; otherwise needless CPU time is spent encrypting content.

### [Other configuration issues or suggestions that relate to high CPU](#other)

- Avoid proxying requests for static files.
- Limit on-the-fly compression with mod\_deflate.
- Consider conditional logging to avoid logging overhead on static assets.
- Check for missing recommended maintenance.
- [More relatively obscure configuration features to avoid](ihs_performance.html#Configuration_features_to_avoid)
- [SSL Performance](ihs_performance.html#SSL)

## Identifying high CPU processes

Operating systems provide special programs for displaying the CPU usage of processes running on the system. These programs include `ps`, `top`, `topas`, `nmon`, and others. When a high CPU problem is suspected, one of these programs should be used to determine which processes are actively consuming high CPU. If a web server process is the highest consumer of CPU, documentation about the suspected problem should be submitted to IBM HTTP Server support for analysis.

While `ps` is universally available, it averages CPU usage across the lifetime of the process. A very old process that begins to use high CPU will often not show high CPU in `ps` output. Using `top` or `topas` to show processes with high CPU over some short (1 second) interval is usually best.

If you cannot find a high CPU process via something like `top` or `topas`, the example below demonstrates how to identify the high CPU process with `ps`.

#### `ps` invocations which show percent CPU

Note: Read the preceding section carefully before relying on `ps` output. It is a last resort.

| Platform | `ps` invocation |
|-----------|-----------------|
| AIX, z/OS | `ps -A -o pid,ppid,pcpu,time,args` |
| Linux | `ps -eo pid,ppid,pcpu,time,fname,cmd` |
| Solaris | `ps -A -o pid,ppid,pcpu,time,comm` |
| HP-UX | `UNIX95=1 ps -A -o pid,ppid,pcpu,time,args` |

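
The platform list above could be wrapped in a small helper so that the right invocation is chosen automatically. The following is only a sketch: the function name `ps_cpu_command` is hypothetical and not part of ihsdiag, and it assumes `uname -s` reports `SunOS` on Solaris and `OS/390` on z/OS.

```shell
#!/bin/sh
# Hypothetical helper (not part of ihsdiag): print the ps invocation listed
# above for a given `uname -s` value.
# Assumption: Solaris reports "SunOS" and z/OS reports "OS/390".
ps_cpu_command() {
    case "$1" in
        AIX|OS/390) echo "ps -A -o pid,ppid,pcpu,time,args" ;;
        Linux)      echo "ps -eo pid,ppid,pcpu,time,fname,cmd" ;;
        SunOS)      echo "ps -A -o pid,ppid,pcpu,time,comm" ;;
        HP-UX)      echo "UNIX95=1 ps -A -o pid,ppid,pcpu,time,args" ;;
        *)          echo "unsupported platform: $1" >&2; return 1 ;;
    esac
}
```

One possible use is `cmd=$(ps_cpu_command "$(uname -s)") && eval "$cmd"`. The `eval` is needed so that the `UNIX95=1` prefix on HP-UX is treated as an environment assignment rather than as a command name.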
#### Example of using `ps` to find high CPU processes

Substitute the appropriate `ps` command for your platform.

    # ps -A -o pid,ppid,pcpu,time,args
      PID  PPID  %CPU        TIME COMMAND
        0     0   0.0    00:07:56 swapper
        1     0   0.0    00:01:18 /etc/init
      516     0  96.4 89-10:46:02 wait
      774     0   0.0    00:00:09 reaper
    ...
    32662  3170   0.0    00:00:00 /usr/sbin/sshd -D
    33448     1   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    35482 33448  40.0    00:09:12 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    45270 33448   0.0    00:00:00 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28

Only two processes show a high current CPU utilization (the third column in the `ps` output): 516 (`wait`) and 35482. This example is from AIX, where `wait` is a special process which represents all idle CPU on the system, so the `wait` process must be ignored. This leaves process 35482, which is an IBM HTTP Server process.

To confirm that 35482 is actively consuming high amounts of CPU, examine the fourth column in the `ps` output over time, or watch the process with `topas`, `top`, or `prstat`:

    # ps -A -o pid,ppid,pcpu,time,args | grep 35482
    35482 33448  96.2    00:10:28 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    (wait 10 seconds or so)
    # ps -A -o pid,ppid,pcpu,time,args | grep 35482
    35482 33448  96.1    00:10:35 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28
    (wait 10 seconds or so)
    # ps -A -o pid,ppid,pcpu,time,args | grep 35482
    35482 33448  96.6    00:10:44 /usr/IBMIHS-1.3.28/bin/httpd -d /usr/IBMIHS-1.3.28

The subsequent checks for process 35482 show that cumulative CPU time (the fourth column) continues to increase. At this point, the suspect process is clearly identified, and we should proceed to collecting information about the process using the instructions below.

## Making sure required support programs are available

Please refer to [these instructions](check_platform.html) for verifying that required support programs are installed.

## Running the tool

You will need to [download the collector](install.html).

### Warning

The collector tool uses native tools such as `strace` and `truss` to obtain system call traces, which include the contents of buffers used to read and write data from the network.

Note: This executable mustgather is not used on Windows or z/OS.

- On Windows, refer to [win32\_hang\_doc.html](win32_hang_doc.html) and collect a few `procdump` captures of the high CPU process.
- On z/OS, system dumps should be collected that include the httpd address spaces. A few dumps over a short period of time should be taken. Previously, the executable mustgather was provided on z/OS, but it depends on the dbx debugger, which conflicts with the very common SAFRunAs directive.

Run the tool as `root` to avoid any permissions problems with obtaining backtraces or reading files, such as log files and configuration files. (More information about the requirement to run this tool as `root` is available [here](#root).)

ServerDoc is passed four parameters for gathering high CPU documentation:

1. `GatherHighCpuDoc`
2. the name of the IHS installation directory (e.g., /usr/HTTPServer)
3. the process id of the high CPU process to examine
4. the address of a non-SSL port handled by the web server (e.g., 127.0.0.1:80), or "-" if there is no non-SSL port

For example:

    # java -jar ServerDoc.jar GatherHighCpuDoc /path/to/IHS 35482 127.0.0.1:80

The tool creates a new directory which contains a timestamp in the name, and the high CPU documentation will be saved in that directory.

#### Determining the value of the non-SSL address parameter

If the IHS installation only supports SSL, then use **-** for this parameter. Otherwise, specify an IP address and port which can be used to reach the server from the local machine without using SSL.

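
One way to find candidate values for this parameter is to list the `Listen` directives in the web server configuration. The sketch below assumes the directives live in a single file; the function name `list_listen_directives` is hypothetical and not part of ihsdiag.

```shell
#!/bin/sh
# Hypothetical helper (not part of ihsdiag): print the uncommented Listen
# directives found in one configuration file. Files pulled in through
# Include directives are NOT followed.
list_listen_directives() {
    # Requiring whitespace after "Listen" skips directives such as ListenBacklog.
    grep -i -E '^[[:space:]]*Listen[[:space:]]+' "$1"
}
```

For example, `list_listen_directives /usr/HTTPServer/conf/httpd.conf`. The output still has to be reviewed by hand: any port whose virtual host enables SSL must be excluded, and a `Listen` directive that gives only a port needs a local address such as `127.0.0.1` in front of it.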
#### A sample run

For this example, IHS is installed in `/usr/IBMIHS-1.3.28`, the high CPU process is 35482, the non-SSL port can be reached from the web server machine on address `127.0.0.1:8080`, and ihsdiag was unpacked into directory `/root/ihsdiag-1.3.4`.

    # java -jar /root/ihsdiag-1.3.4/ServerDoc.jar GatherHighCpuDoc /usr/IBMIHS-1.3.28 \
        35482 127.0.0.1:8080
    Tracing process for 10 seconds...
    Seconds remaining before gathering information again:
    60...54...48...42...36...30...24...18...12...6...
    Tracing process for 10 seconds...
    Seconds remaining before gathering information again:
    30...27...24...21...18...15...12...9...6...3...
    Tracing process for 10 seconds...
    Reports, log files, and configuration files have been saved to directory
    HighCpuDoc.200501201122
    If you have additional log files or configuration files, copy them there
    before packing up the directory.
    Web server log and conf files other than the default will have to be copied manually.
    WebSphere plug-in conf and log files will have to be copied manually.
    Hint for packing up the directory:
    tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
    gzip HighCpuDoc.200501201122.tar
    # ls -l HighCpuDoc.200501201122/
    total 11479
    -rw-r--r--   1 root     system        32986 Jan 20 11:22 access_log
    -rw-r--r--   1 root     system         7129 Jan 20 11:22 apachectl
    -rw-r--r--   1 root     system         2251 Jan 20 11:22 error_log
    -rw-r--r--   1 root     system       662246 Jan 20 11:22 httpd
    -rw-r--r--   1 root     system        22835 Jan 20 11:22 httpd.conf
    -rw-r--r--   1 root     system       197962 Jan 20 11:24 log
    -rw-r--r--   1 root     system         1624 Jan 20 11:24 report
    -rw-r--r--   1 root     system      1641481 Jan 20 11:22 trace.0
    -rw-r--r--   1 root     system      1642385 Jan 20 11:24 trace.1
    -rw-r--r--   1 root     system      1650769 Jan 20 11:24 trace.2
    -rwxr-xr-x   1 root     system         1442 Jan 20 11:22 traceprocess.sh

### Copying other web server and plug-in files

The next step is to copy any other web server or plug-in configuration files and logs into the new HighCpuDoc directory.
Here is a list of files to copy if they are being used:

- any IHS configuration file other than httpd.conf in the IHS install directory
- any additional web server error or access log files, such as log files specific to each virtual host or log files created by rotatelogs
- the WebSphere plug-in configuration file
- the WebSphere plug-in log file

### Saving the documentation directory

The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or pax followed by compress. The easiest way is to cut and paste the commands that ServerDoc displayed at the end of its run. The suggested commands will vary by platform. On z/OS, for example, pax and compress will be suggested instead of tar and gzip.

#### A sample run

    # tar -cf HighCpuDoc.200501201122.tar HighCpuDoc.200501201122
    # gzip HighCpuDoc.200501201122.tar

The resulting compressed file is the file to send to IBM support.

### Understanding the `root` requirement

When gathering information on high CPU problems, the tool must attach to a live web server process to obtain information about the state of that process. If the web server is started as `root`, then that process will be owned by `root` or by the web server user id (e.g., `nobody` or `www`). Only `root` has the authority to attach to any of the web server processes, so it is easiest if the tool itself is run as `root`.

If the web server administrator does not have authority to log in or switch user to `root`, a simple script can be created to gather the high CPU documentation, and the system administrator can give the web server administrator `sudo` access to that script. `sudo` is a third-party tool available without cost for all appropriate platforms.

If the web server is not started as `root`, there are no such concerns, and the high CPU documentation tool may be run by the user id which starts the web server.

If the tool is run as non-`root` and it is unable to gather the required information, the problem will have to be recreated. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.
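
The "simple script" mentioned above for `sudo` access might look like the following sketch. It is not shipped with ihsdiag: the function names are hypothetical, and the two paths are assumptions taken from the examples in this document. Because the script would run with `root` authority, it keeps the paths fixed and validates both user-supplied arguments before starting the collector.

```shell
#!/bin/sh
# Hypothetical sudo wrapper sketch (not shipped with ihsdiag).
# Assumed locations, taken from the examples in this document; adjust as needed.
IHS_ROOT=/usr/HTTPServer
SERVERDOC_JAR=/root/ihsdiag-1.3.4/ServerDoc.jar

# Succeeds only when the argument is a non-empty string of digits.
is_pid() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *)           return 0 ;;
    esac
}

# Succeeds for "-" or for host:port made of letters, digits, dots, and colons.
is_addr() {
    case "$1" in
        -)                   return 0 ;;
        ''|*[!0-9A-Za-z.:]*) return 1 ;;
        *:*)                 return 0 ;;
        *)                   return 1 ;;
    esac
}

# Validate the arguments, then run the collector with the fixed paths.
# A missing second argument defaults to "-" (no non-SSL port).
gather_high_cpu() {
    if ! is_pid "$1" || ! is_addr "${2:--}"; then
        echo "usage: gather_high_cpu PID [ADDR:PORT|-]" >&2
        return 2
    fi
    java -jar "$SERVERDOC_JAR" GatherHighCpuDoc "$IHS_ROOT" "$1" "${2:--}"
}
```

A wrapper script would end with `gather_high_cpu "$@"`, and the system administrator would grant `sudo` access to that one script only. The address check here is deliberately strict (for example, it rejects host names containing hyphens), which is a design choice for a script that runs as `root` rather than a requirement of the collector.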