# MustGather: Crashes

## Recent defects and commonly encountered known issues to check for first

Review the issues below before running the mustgather or contacting support. New issues are added at the top.

### Recent bugs that can result in a crash:

- Prior to 8.5.5.25 and 9.0.5.17 the WAS plugin can crash in "detailedLog". (PH54601)
- Prior to 8.5.5.24 and 9.0.5.16 the WAS plugin with IM enabled can crash in "odrFreeDbg". (PH54204)
- Prior to 8.5.5.20 and 9.0.5.8 IHS can crash with "StrictHostCheck ON" (APAR PH35107).
- Prior to 8.5.5.20 and 9.0.5.8 the WAS plugin can crash in the "detailedLog" function (PH36487)
- Prior to 8.5.5.20 and 9.0.5.8 Linux on ppc64le can crash when logging a connection failure error (PH36211)

### Crashes with logrotated

If logrotated is configured to rotate multiple log files for the same httpd instance, it can trigger a crash when doing back-to-back graceful restarts. Delays must be added to the logrotated configuration to avoid this known limitation of multiple graceful restarts.

### SIGBUS serving large static files that are truncated or modified at runtime

If static files are modified or truncated in place, `EnableMMAP OFF` must be specified. On Linux, failures in this scenario will show SIGBUS rather than SIGSEGV.

## Older defects and known issues

### [Graceful restart crashes and memory corruption in configurations with WAS Plugin log rotation](#PH20448)

The WAS Plugin can cause memory corruption when the log rotation feature is enabled in plugin-cfg.xml without PH20448 applied. The backtrace is likely to include `logClose`.

### [Crashes referencing 'handleLogend'](#PI79492)

The WAS plugin can crash in situations where the http_plugin.log can't be opened (for example, due to bad paths or bad permissions). Apply a fixpack with PI79492.

### Crashes (child process restarts) on Windows with mod_mem_cache

IHS is typically a 32-bit application on Windows, and can only address around 2GB of memory. If `MCacheSize` is larger than a few hundred megabytes, it is likely that processes will run out of memory at runtime.

### System crash on AIX

AIX APAR IZ99394 (sysroutes IZ44282 IZ48935 IZ95001 IZ99394 IV13061 IV13834) can cause a system crash when running any networking software, such as IBM HTTP Server. Crashes will be in `net_kmem_rmlist`/`net_malloc` or related AIX OS code.

### Memory allocation errors with high ThreadsPerChild on 64-bit AIX

64-bit IHS builds on AIX mistakenly shipped with a default MAXDATA setting in bin/envvars that limits overall heap size to around 2GB. While this does not cause a leak, it can turn virtual address space size growth into memory allocation failures (or sometimes, crashes).

The line should be commented out on 64-bit IHS installs that use a non-default ThreadsPerChild or otherwise have high heap memory requirements. Be sure that the userid that invokes apachectl has 'ulimit -d unlimited' in their environment, as this rlimit also caps max heap size.

If you have this symptom (OOM and ~1.75GB core file) but a low ThreadsPerChild directive, consider setting `MaxRequestsPerChild 10000` in addition to the bin/envvars fix above. A similar problem can occur if `ulimit -d` is set to anything other than unlimited.

### 32-bit webserver on any platform / any operating system

Exceeding 2000 `ThreadsPerChild` puts any 32-bit server at risk of exhausting all address space available in a single process. When no more memory is addressable, allocations will begin to fail and usually result in crashes. 2000 is not a magic number, and the exact limits on address space vary by system just as exact address space usage varies by configuration and workload.

### Many RewriteCond/RewriteRule directives with long URLs

See memoryleak.html#rewrite

### Crashes under SSL load with IHS prior to 8.5.5.2/8.0.0.9/7.0.0.33 (PI08502)

IBM Global Security Kit (GSKit) prior to 7.0.4.48/8.0.5.17 can crash or corrupt memory under load.

### Crashes under SSL load with IHS 8.0 or 8.5 before PM72915 (8.0.0.0-4, 8.5.0.0)

Circumvention: Set `SSLAttributeSet 445 1` in each context with SSLEnable if you cannot move to 8.0.0.5, 8.5.0.1, or apply the interim fix for PM72915.

This can also occur under later releases when `SSLCompression ON` is configured before IBM Global Security Kit (GSKit) is updated to 8.0.14.24 or later.
### Crashes for each SSL request with cryptographic accelerator

If `SSLPKCSDriver` is used, it's probably related to the symptom. See cryptohw.html for certified adapters and possible debugging tips.

### Crashes under load with SSL and installed cryptographic hardware prior to 8.0.0.0

Try setting `SSLAcceleratorDisable` globally. If this makes the crashes go away, IHS was unexpectedly using a legacy interface on a modern SSL co-processor and you should remain in this configuration.

### Crashes after using the WebServer Plugin "merge tool"

The WebServer Plugin "merge tool" before PM38369 can generate a configuration without a `PrimaryServers` tag, which causes a runtime crash in the WebServer Plugin. A `CrashDoc` will report a crash string including `

Make sure required Solaris AF_UNIX fixes have been applied, using one of the patches below or equivalent:

* SPARC: 120664-01
* x64: 120665-01

### SIGBUS crash on Linux and AIX

The most common cause of a SIGBUS crash on these platforms is that a file is truncated while the web server is trying to send it to a client. Some file replacement methods cause the existing file to be truncated and then the new contents written, instead of writing the new contents to a temporary file and then renaming to the proper name.
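
As a minimal sketch of the safer replacement method described above, the shell function below writes the new contents to a temporary file in the same directory and then renames it over the original, so the served file is never truncated in place. The function name `replace_file_atomically`, the example paths, and the `644` file mode are assumptions for illustration; it also assumes `mktemp` is available on the system.

```shell
# Hypothetical helper: replace TARGET with the contents of NEW without
# truncating TARGET in place.
#   $1 = file holding the new contents
#   $2 = file served by the web server
replace_file_atomically() {
    new=$1
    target=$2
    # Create the temporary file next to the target so that the final
    # rename stays within one filesystem.
    tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || return 1
    # mktemp creates the file with restrictive permissions; 644 is an
    # assumed mode that lets the web server user read the result.
    if cp "$new" "$tmp" && chmod 644 "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}

# Example (hypothetical paths):
# replace_file_atomically /tmp/new-index.html /usr/HTTPServer/htdocs/index.html
```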

If you have static files served from IHS which can be modified in place, try `EnableMMAP Off` to see if the problem is resolved.

Note: On Solaris, many other types of crashes result in SIGBUS.

### z/OS specific crashes

* For U40xx or S0C4 abend in LE CELQLIB at httpd child process termination, check for applicability of LE APAR PK34252.

* For a S0C4 abend in ATOI at IHS startup with LE trace enabled, check for applicability of LE APAR PK81097.

### Crashes with mod_php on Unix platforms

The PHP manual recommends against using PHP in a multithreaded web server; see "Why shouldn't I use Apache2 with a threaded MPM in a production environment?". IHS is multithreaded on all platforms. Thread safety problems in PHP applications or third-party libraries referenced by PHP can cause crashes in a threaded web server. The recommended solution is to configure PHP as a FastCGI application and use mod_fastcgi to communicate with it.

### Crashes on Linux Platforms with ThreadsPerChild over 200

On Linux, child process crashes can occur due to address space exhaustion when large numbers of threads are used with the default thread stack size.
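
As a rough illustration of the arithmetic behind this, the address space reserved for thread stacks in one child process is approximately `ThreadsPerChild` multiplied by the per-thread stack size. The function name `thread_stack_mb` is hypothetical, and the inputs (256 threads, an 8MB stack reported by `ulimit -s` as 8192, and a 128KB stack) are example values rather than measurements.

```shell
# Hypothetical helper: estimate the address space (in MB) reserved for
# thread stacks in one child process.
#   $1 = ThreadsPerChild
#   $2 = per-thread stack size in KB (the unit bash reports for `ulimit -s`)
thread_stack_mb() {
    echo $(( $1 * $2 / 1024 ))
}

thread_stack_mb 256 8192   # 8MB stacks   -> prints 2048
thread_stack_mb 256 128    # 128KB stacks -> prints 32
```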

A thread stack size of 128KB is sufficient for IBM HTTP Server and the WebSphere plug-in; however, the system default is typically 8MB or larger. With the system default and large values for `ThreadsPerChild`, most of the address space can be consumed by thread stacks. For example, with `ThreadsPerChild` set to 256 and a stack size of 8MB, 2GB of the address space will be consumed by thread stacks. Memory allocations during request processing can then exceed the address space limit, typically 3GB, and result in crashes in arbitrary components of the webserver.

The system default can be displayed with `ulimit -s` (8MB applies if the value is 'unlimited'). With high values for `ThreadsPerChild`, the `ThreadStackSize` directive should be used to specify a much smaller stack size, as in the following example:

```
# Default to 128KB stack size
ThreadStackSize 131072
```

Third-party modules may require a larger thread stack size. We recommend setting it to 256KB when third-party modules are used, unless the vendor is able to specify the exact requirement.

### Crash when using crypto hardware

If you are experiencing crashes while using crypto hardware, refer to the information in the Cryptographic accelerator Questions and Answers / Things to check first section.

## Documentation required to diagnose child process crashes

* core dump from crash and backtrace obtained on customer system
* web server and plug-in configuration files
* web server and plug-in log files

Obtaining and installing the collector, ihsdiag, is documented here

If core dumps are not being saved for the child process crashes, the first step is to perform any necessary operating system and web server configuration so that core dumps are saved. Core dump configuration information is described here.
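
As one hedged example of the operating system side of this configuration, the sketch below interprets the core file size limit reported by `ulimit -c` in the shell that starts the web server. The function name `core_limit_status` is hypothetical, and the unit of the reported value depends on the shell. On the web server side, Apache-based servers provide the `CoreDumpDirectory` directive, but the core dump configuration information referenced above remains the authority for your platform.

```shell
# Hypothetical helper: interpret a core file size limit as reported by
# `ulimit -c` in the shell that starts the web server.
core_limit_status() {
    case $1 in
        unlimited) echo "core dumps are not limited by size" ;;
        0)         echo "core dumps are disabled" ;;
        *)         echo "core dumps are limited to $1 blocks" ;;
    esac
}

# Report the limit that applies to the current shell.
core_limit_status "$(ulimit -c)"
```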

When a core dump is available, the ServerDoc tool provided with ihsdiag automates much of the work of gathering and formatting the required documentation. The user runs ServerDoc and provides the IHS installation directory and the path to the core file, and ServerDoc creates a new directory to hold the required documentation, and stores information in that new directory.

Once the ServerDoc tool has completed, the user should copy any remaining log files and configuration files used by the web server and the plug-in into the new directory, and send the directory to IBM support.

Note: If IBM HTTP Server has been upgraded to a newer maintenance level since the core dump was generated, the core dump needs to be reproduced with the new level of product code. Otherwise, the crash information will be incorrect since the core dump and the product won't match.

## Collecting the mustgather

If none of the known issues above is responsible for the crash, proceed to collecting the `CrashDoc` mustgather. You will need to [download the collector](install.html).

### What we expect to learn from this information

A core dump and related information is critical for diagnosing the cause of child process crashes. Without the information, IBM support is limited to suggesting that the customer move to the current level of fixes. With the information, IBM support anticipates being able to make the following initial determination:

* which component crashed, whether from IBM or from a third-party vendor
* for problems in IBM-provided components: whether or not this is a known problem

In cases where an IBM component crashed, the documentation often contains enough information to address the root cause of previously unknown problems. Even when the root cause cannot be determined from a particular core dump, the information is used to decide the next step.

In cases where a third-party component crashed, the vendor of that component will need to investigate further; IBM support is unable to diagnose problems in third-party components.

### Making sure required support programs are available

Please refer to these instructions for verifying that required support programs are installed.

### Running the tool

Run the tool as `root` to avoid any permissions problems with reading the core file or other files, such as log files and configuration files. (More information about this requirement is in "understanding the 'root' requirement" below.)

ServerDoc is passed three parameters for gathering crash documentation:

* `GatherCrashDoc`
* the name of the IHS installation directory (e.g., /usr/HTTPServer)
* the name of the core file (e.g., /tmp/core)

```
# java -jar ServerDoc.jar GatherCrashDoc /path/to/IHS /path/to/corefile
```

The tool creates a new directory which contains a timestamp in the name, and the crash documentation will be saved in that directory.

### a sample run

For this example, IHS is installed in `/usr/HTTPServer`, the core dump was written to `/tmp/core`, and ihsdiag was unpacked into `/root/ihsdiag-1.1.0`.

```
# cd /tmp
# java -jar /root/ihsdiag-1.1.0/ServerDoc.jar GatherCrashDoc \
    /usr/HTTPServer /tmp/core
Reports, log files, and configuration files have been saved to
directory CrashDoc.200404121310

If you have additional log files or configuration files, copy them
there before packing up the directory.

Hint for packing up the directory:

  tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
  gzip CrashDoc.200404121310.tar

# ls -l CrashDoc.200404121310/
total 8136
-rw-r--r-- 1 root system    8779 Apr 12 13:10 access_log
-rw-r--r-- 1 root system    7094 Apr 12 13:10 apachectl
-rw-r--r-- 1 root system 3593703 Apr 12 13:10 core
-rw-r--r-- 1 root system  478483 Apr 12 13:10 core_file_strings
-rw-r--r-- 1 root system   14419 Apr 12 13:10 error_log
-rw-r--r-- 1 root system   37141 Apr 12 13:10 httpd.conf
-rw-r--r-- 1 root system    7500 Apr 12 13:10 log
-rw-r--r-- 1 root system     173 Apr 12 13:10 report
```

### copying other web server and plug-in files

The next step is to copy any other web server or plug-in configuration files and logs into the new CrashDoc directory. Here is a list of files to copy if they are being used:

* any IHS configuration file other than httpd.conf in the IHS install directory
* any additional web server error or access log files, such as log files specific to each virtual host or log files created by rotatelogs
* the WebSphere plug-in configuration file
* the WebSphere plug-in log file

### saving the documentation directory

The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or pax followed by compress. The easiest way is to cut and paste the messages displayed by ServerDoc previously which showed the commands to use. The suggested commands will vary by platform. On z/OS, for example, pax and compress will be suggested instead of tar and gzip.

```
# tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
# gzip CrashDoc.200404121310.tar
```

The resulting compressed file is the file to send to IBM support.
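
The copy and pack steps can also be combined in a small shell function. This is only a sketch: the function name `pack_crashdoc` is hypothetical, the file names in the example are placeholders, and it assumes `tar` and `gzip` are available. On platforms where ServerDoc suggests other commands (such as pax and compress on z/OS), follow those suggestions instead.

```shell
# Hypothetical helper: copy extra files into a CrashDoc directory, then
# pack and compress it with tar and gzip.
#   $1    = CrashDoc directory created by ServerDoc
#   $2... = additional log or configuration files to include
pack_crashdoc() {
    dir=${1%/}          # drop a trailing slash so "$dir.tar" is well formed
    shift
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    for f in "$@"; do
        cp "$f" "$dir/" || return 1
    done
    tar -cf "$dir.tar" "$dir" && gzip "$dir.tar"
}

# Example (placeholder file locations):
# pack_crashdoc CrashDoc.200404121310 \
#     /path/to/plugin-cfg.xml /path/to/http_plugin.log
```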

### understanding the 'root' requirement

When gathering information on web server crashes, the tool must be able to read core files created for web server processes and web server logs and configuration files. Often the web server logs and configuration files are readable by normal user ids, but core files are readable only by `root` or by the web server user id (e.g., nobody or www).

If the web server is started as `root`, the permissions on generated core files and log files and configuration files can be changed to allow a non-`root` user to run the crash documentation tool. If the web server is not started as `root`, there are no such concerns, and the crash documentation tool may be run by the user id which starts the web server.

If the tool is run as non-`root` and it is unable to gather the required information, permissions on the core file or other files can be changed and the tool may be run again. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.
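
Because an unreadable file may otherwise go unnoticed until the documentation is analyzed, a non-`root` user may want to check read access before running the tool. The sketch below is one way to do that; the function name `check_readable` is hypothetical, and the example paths assume the sample installation used earlier with `httpd.conf` under a `conf` subdirectory.

```shell
# Hypothetical helper: report which of the given files the current user
# cannot read. Returns non-zero if any file is unreadable or missing.
check_readable() {
    rc=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            rc=1
        fi
    done
    return $rc
}

# Example (assumed paths; adjust to your installation):
# check_readable /tmp/core /usr/HTTPServer/conf/httpd.conf
```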