MustGather: Crashes

Recent defects and commonly encountered known issues to check for first

Review the issues below before running the mustgather or contacting support. Newer issues are added at the top.

Recent bugs that can result in a crash:

  • Prior to 8.5.5.25 and 9.0.5.17 the WAS plugin can crash in "detailedLog". (PH54601)

  • Prior to 8.5.5.24 and 9.0.5.16 the WAS plugin with IM enabled can crash in "odrFreeDbg". (PH54204)

  • Prior to 8.5.5.20 and 9.0.5.8 IHS can crash with "StrictHostCheck ON" (APAR PH35107).

  • Prior to 8.5.5.20 and 9.0.5.8 the WAS plugin can crash in the "detailedLog" function. (PH36487)

  • Prior to 8.5.5.20 and 9.0.5.8 Linux on ppc64le can crash when logging a connection failure error. (PH36211)

Crashes with logrotated

If logrotated is configured to rotate multiple log files for the same httpd instance, the resulting back-to-back graceful restarts can trigger a crash. Add delays to the logrotated configuration to avoid this known limitation of multiple graceful restarts.

SIGBUS serving large static files that are truncated or modified at runtime

If static files are modified or truncated in place, EnableMMAP OFF must be specified. On Linux, failures in this scenario will show SIGBUS rather than SIGSEGV.

Older defects and known issues

Graceful restart crashes or memory corruption in configurations with WAS Plugin log rotation

The WAS Plugin can cause memory corruption when the log rotation feature is enabled in plugin-cfg.xml without PH20448 applied. The backtrace is likely to include logClose.

Crashes referencing 'handleLogend'

The WAS plugin can crash in situations where the http_plugin.log can't be opened (due to e.g. bad paths, bad permissions, etc). Apply a fixpack with PI79492.

Crashes (child process restarts) on Windows with mod_mem_cache

IHS is typically a 32-bit application on Windows, and can only address around 2GB of memory. If MCacheSize is larger than a few hundred megabytes, it is likely that processes will run out of memory at runtime.

System crash on AIX

AIX APAR IZ99394 (sysroutes IZ44282 IZ48935 IZ95001 IZ99394 IV13061 IV13834) can cause a system crash running any networking software, such as IBM HTTP Server. Crashes will be in net_kmem_rmlist/net_malloc or related AIX OS code.

Memory allocation errors with high ThreadsPerChild on 64-bit AIX

64-bit IHS builds on AIX were mistakenly shipped with a default MAXDATA setting in bin/envvars that limits overall heap size to around 2GB. While this does not cause a leak, it can turn growth in virtual address space into memory allocation failures (or sometimes crashes). The line should be commented out on 64-bit IHS installs that use a non-default ThreadsPerChild or otherwise have high heap memory requirements. Make sure that the user ID that invokes apachectl has 'ulimit -d unlimited' in its environment, as this rlimit also caps the maximum heap size.

If you have this symptom (OOM and ~1.75GB core file) but a low ThreadsPerChild directive, consider setting MaxRequestsPerChild 10000 in addition to the bin/envvars fix above.

A similar problem can occur if ulimit -d is set to anything other than unlimited.
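Both settings can be checked quickly with something like the sketch below. check_heap_limits is a hypothetical helper, not part of IHS. It assumes only that the setting appears as the string MAXDATA on an uncommented line of bin/envvars, and it should be run as the user ID that invokes apachectl so that ulimit -d reflects the right environment.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with IHS): report the two heap limits
# discussed above for the IHS install rooted at $1.
check_heap_limits() {
    envvars="$1/bin/envvars"
    # Only uncommented lines take effect, so drop comment lines before searching.
    if [ -f "$envvars" ] && grep -v '^[[:space:]]*#' "$envvars" | grep -q 'MAXDATA'; then
        echo "MAXDATA is set in $envvars"
    else
        echo "MAXDATA is not set in $envvars"
    fi
    # Data segment limit for the current shell; the text above calls for "unlimited".
    echo "ulimit -d: $(ulimit -d)"
}

# Example (install path is an assumption):
# check_heap_limits /usr/HTTPServer
```

If the first line reports that MAXDATA is set on a 64-bit install, or the second line prints anything other than "unlimited", the heap is capped as described above.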

32-bit webserver on any platform / any operating system

Exceeding 2000 ThreadsPerChild puts any 32-bit server at risk of exhausting all of the address space available to a single process. When no more memory is addressable, allocations begin to fail, which usually results in crashes.

2000 is not a magic number, and the exact limits on address space vary by system just as exact address space usage varies by configuration and workload.

Many RewriteCond/RewriteRule directives with long URLs

See memoryleak.html#rewrite.

Crashes under SSL load with IHS prior to 8.5.5.2/8.0.0.9/7.0.0.33 (PI08502)

IBM Global Security Kit (GSKit) prior to 7.0.4.48/8.0.5.17 can crash or corrupt memory under load.

Crashes under SSL load with IHS 8.0 or 8.5 before PM72915 (8.0.0.0-4, 8.5.0.0)

Circumvention: Set SSLAttributeSet 445 1 in each context with SSLEnable if you cannot move to 8.0.0.5, 8.5.0.1, or apply interim fix for PM72915.

This can also occur in later releases when SSLCompression ON is configured and IBM Global Security Kit (GSKit) has not yet been updated to 8.0.14.24 or later.

Crashes for each SSL request with cryptographic accelerator

If SSLPKCSDriver is used, it's probably related to the symptom.
See cryptohw.html for certified adapters and possible debugging tips.

Crashes under load with SSL and installed cryptographic hardware prior to 8.0.0.0

Try setting SSLAcceleratorDisable globally. If this makes the crashes go away, IHS was unexpectedly using a legacy interface on a modern SSL co-processor and you should remain in this configuration.

Crashes after using the WebServer Plugin "merge tool"

The WebServer Plugin "merge tool" before PM38369 can generate a configuration without a PrimaryServers tag which causes a runtime crash in the WebServer Plugin.

A CrashDoc will report a crash string including <listGetHead<serverGroupGetFirstPrimaryServer<

Crashes on Solaris 10

Make sure required Solaris AF_UNIX fixes have been applied, using one of the patches below or equivalent:

  • SPARC: 120664-01

  • x64: 120665-01

SIGBUS crash on Linux and AIX

The most common cause of a SIGBUS crash on these platforms is that a file is truncated while the web server is trying to send it to a client. Some file replacement methods cause the existing file to be truncated and then the new contents written, instead of writing the new contents to a temporary file and then renaming to the proper name.

If you have static files served from IHS which can be modified in place, try EnableMMAP OFF to see if the problem is resolved.
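The safer replacement method described above (write to a temporary file, then rename) can be sketched as follows. publish_atomically is a hypothetical helper name, and the temporary file naming is an assumption. The temporary file is created beside the destination so that the final rename stays within one filesystem.

```shell
#!/bin/sh
# Hypothetical helper (not part of IHS): install SRC as DEST without ever
# truncating DEST in place.
publish_atomically() {
    src=$1
    dest=$2
    # Write the new contents next to DEST so the rename stays on one filesystem.
    tmp="$dest.new.$$"
    if cp "$src" "$tmp" && mv -f "$tmp" "$dest"; then
        return 0
    fi
    # Clean up the partial copy if either step failed.
    rm -f "$tmp"
    return 1
}

# Example (paths are assumptions):
# publish_atomically /tmp/build/logo.png /usr/HTTPServer/htdocs/logo.png
```

Because the rename swaps the directory entry rather than rewriting the existing file, a process that already has the old file open or mapped keeps seeing the old, complete contents.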

Note: On Solaris, many other types of crashes result in SIGBUS.

z/OS specific crashes

  • For U40xx or S0C4 abend in LE CELQLIB at httpd child process termination, check for applicability of LE APAR PK34252.

  • For a S0C4 abend in ATOI at IHS startup with LE trace enabled, check for applicability of LE APAR PK81097.

Crashes with mod_php on Unix platforms

The PHP manual recommends against using PHP in a multithreaded web server; see "Why shouldn't I use Apache2 with a threaded MPM in a production environment?".

IHS is multithreaded on all platforms.

Thread safety problems in PHP applications or third-party libraries referenced by PHP can cause crashes in a threaded web server. The recommended solution is to configure PHP as a FastCGI application and use mod_fastcgi to communicate with it.

Crashes on Linux Platforms with ThreadsPerChild over 200

On Linux, child process crashes can occur due to address space exhaustion when large numbers of threads are used with the default thread stack size.

A thread stack size of 128KB is sufficient for IBM HTTP Server and the WebSphere plug-in; however, the system default is typically 8MB or larger. With the system default and large values for ThreadsPerChild, most of the address space can be consumed by thread stacks. For example, with ThreadsPerChild set to 256 and a stack size of 8MB, 2GB of the address space will be consumed by thread stacks (256 × 8MB = 2048MB). Memory allocations during request processing can then exceed the address space limit, typically 3GB, and result in crashes in arbitrary components of the webserver.

The system default can be displayed with ulimit -s. If the value is 'unlimited', assume 8MB.
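To see how much address space thread stacks would reserve on a given system, the arithmetic can be sketched as below. stack_total_mb is a hypothetical helper and the ThreadsPerChild value of 256 is only an illustration. The script relies on ulimit -s reporting the limit in KB.

```shell
#!/bin/sh
# Hypothetical helper: MB of address space reserved for thread stacks,
# given a ThreadsPerChild value ($1) and a per-thread stack size in KB ($2).
stack_total_mb() {
    echo $(( $1 * $2 / 1024 ))
}

# ulimit -s reports the stack limit in KB, or the word "unlimited".
stack_kb=$(ulimit -s)
if [ "$stack_kb" = "unlimited" ]; then
    stack_kb=8192   # treat 'unlimited' as 8MB, as noted above
fi

threads=256         # hypothetical ThreadsPerChild value
echo "system default: $(stack_total_mb "$threads" "$stack_kb") MB of thread stacks"
echo "128KB stacks:   $(stack_total_mb "$threads" 128) MB of thread stacks"
```

With a 128KB stack, 256 threads reserve 32MB; with an 8MB stack the same 256 threads reserve 2048MB.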

With high values for ThreadsPerChild, the ThreadStackSize directive should be used to specify a much smaller stack size, as in the following example:

# Use a 128KB thread stack size
ThreadStackSize 131072

Third-party modules may require a larger thread stack size. We recommend setting it to 256KB when third-party modules are used, unless the vendor is able to specify the exact requirement.

Crash when using crypto hardware

If you are experiencing crashes while using crypto hardware, refer to the "Things to check first" section of the Cryptographic accelerator Questions and Answers.

Documentation required to diagnose child process crashes

  • core dump from crash and backtrace obtained on customer system

  • web server and plug-in configuration files

  • web server and plug-in log files

Obtaining and installing the collector, ihsdiag, is documented here.

If core dumps are not being saved for the child process crashes, the first step is to perform any necessary operating system and web server configuration so that core dumps are saved. Core dump configuration information is described here.

When a core dump is available, the ServerDoc tool provided with ihsdiag automates much of the work of gathering and formatting the required documentation. The user runs ServerDoc and provides the IHS installation directory and the path to the core file, and ServerDoc creates a new directory to hold the required documentation, and stores information in that new directory.

Once the ServerDoc tool has completed, the user should copy any remaining log files and configuration files used by the web server and the plug-in into the new directory, and send in the directory to IBM support.

Note: If IBM HTTP Server has been upgraded to a newer maintenance level since the core dump was generated, the core dump needs to be reproduced with the new level of product code. Otherwise, the crash information will be incorrect since the core dump and the product won't match.

Collecting the mustgather

If none of the known issues above are responsible for the crash, proceed on to collecting the CrashDoc mustgather.

You will need to download the collector.

What we expect to learn from this information

A core dump and related information are critical for diagnosing the cause of child process crashes. Without this information, IBM support is limited to suggesting that the customer move to the current level of fixes. With it, IBM support anticipates being able to make the following initial determination:

  • which component crashed, whether from IBM or from a third-party vendor

  • for problems in IBM-provided components: whether or not this is a known problem

In cases where an IBM component crashed, the documentation often contains enough information to address the root cause of previously unknown problems. Even when the root cause cannot be determined from a particular core dump, the documentation is used to decide the next step.

In cases where a third-party component crashed, the vendor of that component will need to investigate further; IBM support is unable to diagnose problems in third-party components.

Making sure required support programs are available

Please refer to these instructions for verifying that required support programs are installed.

Running the tool

Run the tool as root to avoid any permissions problems with reading the core file or other files, such as log files and configuration files. (More information about the requirement to run this tool as root is available here.)

ServerDoc is passed three parameters for gathering crash documentation:

  • GatherCrashDoc

  • the name of the IHS installation directory (e.g., /usr/HTTPServer)

  • the name of the core file (e.g., /tmp/core)

# java -jar ServerDoc.jar GatherCrashDoc /path/to/IHS /path/to/corefile

The tool creates a new directory which contains a timestamp in the name, and the crash documentation will be saved in that directory.

A sample run

For this example, IHS is installed in /usr/HTTPServer, the core dump was written to /tmp/core, and ihsdiag was unpacked into /root/ihsdiag-1.1.0.

# cd /tmp
# java -jar /root/ihsdiag-1.1.0/ServerDoc.jar GatherCrashDoc \
/usr/HTTPServer /tmp/core
Reports, log files, and configuration files have been saved to directory
CrashDoc.200404121310
If you have additional log files or configuration files, copy them there
before packing up the directory.

Hint for packing up the directory:
  tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
  gzip CrashDoc.200404121310.tar
# ls -l CrashDoc.200404121310/
total 8136
-rw-r--r--   1 root  system       8779 Apr 12 13:10 access_log
-rw-r--r--   1 root  system       7094 Apr 12 13:10 apachectl
-rw-r--r--   1 root  system    3593703 Apr 12 13:10 core
-rw-r--r--   1 root  system     478483 Apr 12 13:10 core_file_strings
-rw-r--r--   1 root  system      14419 Apr 12 13:10 error_log
-rw-r--r--   1 root  system      37141 Apr 12 13:10 httpd.conf
-rw-r--r--   1 root  system       7500 Apr 12 13:10 log
-rw-r--r--   1 root  system        173 Apr 12 13:10 report

Copying other web server and plug-in files

The next step is to copy any other web server or plug-in configuration files and logs into the new CrashDoc directory. Here is a list of files to copy if they are being used:

  • any IHS configuration file other than httpd.conf in the IHS install directory

  • any additional web server error or access log files, such as log files specific to each virtual host or log files created by rotatelogs

  • the WebSphere plug-in configuration file

  • the WebSphere plug-in log file
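The copy step can be scripted along the lines of the sketch below. add_to_crashdoc is a hypothetical helper, not part of ihsdiag, and the file locations in the usage comment are placeholders that vary by installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of ihsdiag): copy each existing file given
# after the first argument into the CrashDoc directory named by $1.
add_to_crashdoc() {
    crashdoc_dir=$1
    shift
    for f in "$@"; do
        if [ -f "$f" ]; then
            # -p preserves timestamps, which helps when correlating logs.
            cp -p "$f" "$crashdoc_dir/" && echo "copied $f"
        else
            echo "skipped $f (not found)" >&2
        fi
    done
}

# Example (file locations are placeholders):
# add_to_crashdoc CrashDoc.200404121310 /path/to/plugin-cfg.xml /path/to/http_plugin.log
```

Files are copied under their base names, so two files with the same name from different directories (for example, per-virtual-host error logs) would overwrite each other; rename one of them before copying in that case.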

Saving the documentation directory

The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or pax followed by compress. The easiest way is to cut and paste the messages displayed by ServerDoc previously which showed the commands to use. The suggested commands will vary by platform. On z/OS, for example, pax and compress will be suggested instead of tar and gzip.

# tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
# gzip CrashDoc.200404121310.tar

The resulting compressed file is the file to send to IBM support.

Understanding the 'root' requirement

When gathering information on web server crashes, the tool must be able to read core files created for web server processes and web server logs and configuration files. Often the web server logs and configuration files are readable by normal user ids, but core files are readable only by root or by the web server user id (e.g., nobody or www).

If the web server is started as root, the permissions on generated core files and log files and configuration files can be changed to allow a non-root user to run the crash documentation tool.

If the web server is not started as root, there are no such concerns, and the crash documentation tool may be run by the user id which starts the web server.

If the tool is run as non-root and it is unable to gather the required information, permissions on the core file or other files can be changed and the tool may be run again. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.
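Before running the tool as a non-root user, a simple readability check can catch the most obvious permission problems. The sketch below uses a hypothetical helper, check_readable, and placeholder paths. Passing this check does not guarantee that the tool can gather everything it needs.

```shell
#!/bin/sh
# Hypothetical pre-flight check: list any of the given files that the
# current user cannot read. Returns non-zero if at least one is unreadable.
check_readable() {
    status=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f"
            status=1
        fi
    done
    return $status
}

# Example (paths are assumptions):
# check_readable /tmp/core /usr/HTTPServer/conf/httpd.conf /usr/HTTPServer/logs/error_log
```

Any file reported here needs its permissions changed, or the tool needs to be run as root, before the documentation is gathered.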