MustGather: Crashes¶
Recent defects and commonly encountered known issues to check for first¶
Review the issues below before running the mustgather or contacting support. New issues are added at the top.
Recent bugs that can result in a crash:¶
Prior to 8.5.5.25 and 9.0.5.17 the WAS plugin can crash in "detailedLog". (PH54601)
Prior to 8.5.5.24 and 9.0.5.16 the WAS plugin with IM enabled can crash in "odrFreeDbg". (PH54204)
Prior to 8.5.5.20 and 9.0.5.8 IHS can crash with "StrictHostCheck ON". (PH35107)
Prior to 8.5.5.20 and 9.0.5.8 the WAS plugin can crash in the "detailedLog" function. (PH36487)
Prior to 8.5.5.20 and 9.0.5.8 Linux on ppc64le can crash when logging a connection failure error. (PH36211)
Crashes with logrotated¶
If logrotated is configured to rotate multiple log files for the same httpd instance, it can trigger a crash by performing back-to-back graceful restarts. Delays must be added to the logrotated configuration to avoid this known limitation of multiple graceful restarts.
SIGBUS serving large static files that are truncated or modified at runtime¶
If static files are modified or truncated in place, EnableMMAP OFF must be specified. On Linux, failures in this scenario will show SIGBUS rather than SIGSEGV.
Older defects and known issues¶
Graceful restart crashes or memory corruption in configurations with WAS Plugin log rotation¶
The WAS Plugin can cause memory corruption when the log rotation feature is enabled in plugin-cfg.xml without PH20448 applied. The backtrace is likely to include logClose.
Crashes referencing 'handleLogend'¶
The WAS plugin can crash in situations where the http_plugin.log can't be opened (due to e.g. bad paths, bad permissions, etc). Apply a fixpack with PI79492.
Crashes (child process restarts) on Windows with mod_mem_cache¶
IHS is typically a 32-bit application on Windows, and can only address around 2GB of memory. If MCacheSize is larger than a few hundred megabytes, it is likely that processes will run out of memory at runtime.
System crash on AIX ¶
AIX APAR IZ99394 (sysroutes IZ44282 IZ48935 IZ95001 IZ99394 IV13061 IV13834) can cause a system crash running any networking software, such as IBM HTTP Server. Crashes will be in net_kmem_rmlist/net_malloc or related AIX OS code.
Memory allocation errors with high ThreadsPerChild on 64-bit AIX ¶
64-bit IHS builds on AIX mistakenly shipped with a default MAXDATA setting in bin/envvars that limits overall heap size to around 2GB. While this does not cause a leak, it can turn virtual address space growth into memory allocation failures (or sometimes crashes). The line should be commented out on 64-bit IHS installs that use a non-default ThreadsPerChild or otherwise have high heap memory requirements. Be sure that the user ID that invokes apachectl has 'ulimit -d unlimited' in its environment, as this rlimit also caps the maximum heap size.
If you have this symptom (OOM and ~1.75GB core file) but a low ThreadsPerChild directive, consider setting MaxRequestsPerChild 10000 in addition to the bin/envvars fix above.
A similar problem can occur if ulimit -d is set to anything other than unlimited.
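Both settings can be checked from the shell before restarting IHS. The following is a minimal sketch, assuming IHS is installed in /usr/HTTPServer (the path used in the examples later on this page) and that the shell reports `ulimit -d` in KB. `describe_data_limit` is a hypothetical helper, not part of IHS or ihsdiag.

```shell
# Hypothetical helper: interpret the value printed by `ulimit -d`.
describe_data_limit() {
    if [ "$1" = "unlimited" ]; then
        echo "data segment limit: unlimited"
    else
        echo "data segment limit: $1 KB (heap is capped; consider 'ulimit -d unlimited')"
    fi
}

# Check the limit for the current user; run this as the user that invokes apachectl.
describe_data_limit "$(ulimit -d)"

# Show any MAXDATA setting in bin/envvars (the install path is an assumption).
ENVVARS=/usr/HTTPServer/bin/envvars
if [ -f "$ENVVARS" ]; then
    grep -n MAXDATA "$ENVVARS" || echo "no MAXDATA line in $ENVVARS"
fi
```

If the grep prints an uncommented MAXDATA line on a 64-bit install, that is the line to comment out.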
32-bit webserver on any platform / any operating system ¶
Exceeding 2000 ThreadsPerChild puts any 32-bit server at risk of exhausting all address space available in a single process. When no more memory is addressable, allocations will begin to fail and usually result in crashes.
2000 is not a magic number, and the exact limits on address space vary by system just as exact address space usage varies by configuration and workload.
Many RewriteCond/RewriteRule directives with long URLs ¶
Crashes under SSL load with IHS prior to 8.5.5.2/8.0.0.9/7.0.0.33 (PI08502) ¶
IBM Global Security Kit (GSKit) prior to 7.0.4.48/8.0.5.17 can crash or corrupt memory under load.
Crashes under SSL load with IHS 8.0 or 8.5 before PM72915 (8.0.0.0-4, 8.5.0.0) ¶
Circumvention: Set SSLAttributeSet 445 1 in each context with SSLEnable if you cannot move to 8.0.0.5 or 8.5.0.1, or apply the interim fix for PM72915.
This will also occur under later releases when SSLCompression ON is configured before IBM Global Security Kit (GSKit) is updated to 8.0.14.24 or later.
Crashes for each SSL request with cryptographic accelerator ¶
If SSLPKCSDriver is used, it's probably related to the symptom.
See cryptohw.html for certified adapters and possible debugging tips.
Crashes under load with SSL and installed cryptographic hardware prior to 8.0.0.0 ¶
Try setting SSLAcceleratorDisable globally. If this makes the crashes go away, IHS was unexpectedly using a legacy interface on a modern SSL co-processor and you should remain in this configuration.
Crashes after using the WebServer Plugin "merge tool" ¶
The WebServer Plugin "merge tool" before PM38369 can generate a configuration without a PrimaryServers tag, which causes a runtime crash in the WebServer Plugin.
A CrashDoc will report a crash string including <listGetHead<serverGroupGetFirstPrimaryServer<
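A quick way to check a merged configuration for this symptom is to search it for the tag. This is a rough sketch only: `has_primary_servers` is a hypothetical helper, the path is a placeholder, and a simple text match does not confirm that every cluster in the file has its own PrimaryServers element.

```shell
# Hypothetical helper: succeed if FILE contains at least one PrimaryServers element.
has_primary_servers() {
    grep -q '<PrimaryServers' "$1"
}

# Placeholder path; point this at the plugin-cfg.xml your web server actually loads.
PLUGIN_CFG=${PLUGIN_CFG:-/path/to/plugin-cfg.xml}
if [ -f "$PLUGIN_CFG" ]; then
    if has_primary_servers "$PLUGIN_CFG"; then
        echo "PrimaryServers found in $PLUGIN_CFG"
    else
        echo "WARNING: no PrimaryServers element in $PLUGIN_CFG"
    fi
fi
```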
Crashes on Solaris 10 ¶
Make sure required Solaris AF_UNIX fixes have been applied, using one of the patches below or equivalent:
SPARC: 120664-01
x64: 120665-01
SIGBUS crash on Linux and AIX ¶
The most common cause of a SIGBUS crash on these platforms is that a file is truncated while the web server is trying to send it to a client. Some file replacement methods cause the existing file to be truncated and then the new contents written, instead of writing the new contents to a temporary file and then renaming to the proper name.
If you have static files served from IHS which can be modified in place, try EnableMMAP Off to see if the problem is resolved.
Note: On Solaris, many other types of crashes result in SIGBUS.
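The safer replacement method described above (write to a temporary file, then rename) can be sketched as a small shell function. `replace_atomically` is a hypothetical helper, and the sketch assumes a `mktemp` that accepts a template argument, as on most Linux systems.

```shell
# Hypothetical helper: replace DEST with the contents of SRC without ever
# truncating DEST. The new content is written to a temporary file in the
# same directory, then renamed over DEST in a single step.
replace_atomically() {
    src=$1
    dest=$2
    # Create the temporary file next to DEST so the rename stays on one filesystem.
    tmp=$(mktemp "$(dirname "$dest")/.replace.XXXXXX") || return 1
    # cp -p carries over the mode of SRC, so give SRC the permissions the
    # web server needs before calling this function.
    if cp -p "$src" "$tmp" && mv -f "$tmp" "$dest"; then
        return 0
    fi
    rm -f "$tmp"
    return 1
}
```

With this approach, a process that already has the old file open continues to read the old, intact copy, so it never sees a truncated file.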
z/OS specific crashes ¶
For U40xx or S0C4 abend in LE CELQLIB at httpd child process termination, check for applicability of LE APAR PK34252.
For a S0C4 abend in ATOI at IHS startup with LE trace enabled, check for applicability of LE APAR PK81097.
Crashes with mod_php on Unix platforms ¶
The PHP manual recommends against using PHP in a multithreaded web server; see "Why shouldn't I use Apache2 with a threaded MPM in a production environment?".
IHS is multithreaded on all platforms.
Thread safety problems in PHP applications or third-party libraries referenced by PHP can cause crashes in a threaded web server. The recommended solution is to configure PHP as a FastCGI application and use mod_fastcgi to communicate with it.
Crashes on Linux Platforms with ThreadsPerChild over 200 ¶
On Linux, child process crashes can occur due to address space exhaustion when large numbers of threads are used with the default thread stack size.
A thread stack size of 128KB is sufficient for IBM HTTP Server and the WebSphere plug-in; however, the system default is typically 8MB or larger. With the system default and large values for ThreadsPerChild, most of the address space can be consumed by thread stacks. For example, with ThreadsPerChild set to 256 and a stack size of 8MB, 2GB of the address space will be consumed by thread stacks. Memory allocations during request processing can then exceed the address space limit, typically 3GB, and result in crashes in arbitrary components of the web server.
The system default can be displayed with ulimit -s; if the value is 'unlimited', the effective default is 8MB.
With high values for ThreadsPerChild, the ThreadStackSize directive should be used to specify a much smaller stack size, as in the following example:
# Use a 128KB thread stack size
ThreadStackSize 131072
Third-party modules may require a larger thread stack size. We recommend setting it to 256KB when third-party modules are used, unless the vendor is able to specify the exact requirement.
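To estimate how much address space thread stacks reserve in each child process, multiply the thread count by the stack size. The helpers below are a hypothetical sketch; the thread count of 400 is an illustrative value, and the shell is assumed to report `ulimit -s` in KB.

```shell
# Hypothetical helper: convert `ulimit -s` output to KB,
# treating 'unlimited' as 8MB as described above.
effective_stack_kb() {
    if [ "$1" = "unlimited" ]; then echo 8192; else echo "$1"; fi
}

# stack_total_mb THREADS STACK_KB
# Total address space reserved for thread stacks in one child process, in MB.
stack_total_mb() {
    echo $(( $1 * $2 / 1024 ))
}

THREADS=400   # illustrative ThreadsPerChild value, not a recommendation
echo "system default stack: $(stack_total_mb "$THREADS" "$(effective_stack_kb "$(ulimit -s)")") MB"
echo "ThreadStackSize 131072: $(stack_total_mb "$THREADS" 128) MB"
```

With 400 threads, an 8MB stack reserves 3200MB, which is more than a 3GB address space, while a 128KB stack reserves 50MB.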
Crash when using crypto hardware¶
If you are experiencing crashes while using crypto hardware, refer to the information in the Cryptographic accelerator Questions and Answers / Things to check first section.
Documentation required to diagnose child process crashes¶
core dump from crash and backtrace obtained on customer system
web server and plug-in configuration files
web server and plug-in log files
Obtaining and installing the collector, ihsdiag, is documented here.
If core dumps are not being saved for the child process crashes, the first step is to perform any necessary operating system and web server configuration so that core dumps are saved. Core dump configuration information is described here.
When a core dump is available, the ServerDoc tool provided with ihsdiag automates much of the work of gathering and formatting the required documentation. The user runs ServerDoc and provides the IHS installation directory and the path to the core file, and ServerDoc creates a new directory to hold the required documentation, and stores information in that new directory.
Once the ServerDoc tool has completed, the user should copy any remaining log files and configuration files used by the web server and the plug-in into the new directory, and send in the directory to IBM support.
Note: If IBM HTTP Server has been upgraded to a newer maintenance level since the core dump was generated, the core dump needs to be reproduced with the new level of product code. Otherwise, the crash information will be incorrect since the core dump and the product won't match.
Collecting the mustgather¶
If none of the known issues above are responsible for the crash, proceed to collecting the CrashDoc mustgather.
You will need to download the collector.
What we expect to learn from this information¶
A core dump and related information are critical for diagnosing the cause of child process crashes. Without this information, IBM support is limited to suggesting that the customer move to the current level of fixes. With it, IBM support anticipates being able to make the following initial determination:
which component crashed, whether from IBM or from a third-party vendor
for problems in IBM-provided components: whether or not this is a known problem
In cases where an IBM component crashed, the documentation often contains enough information to address the root cause of previously unknown problems. Even when the root cause cannot be determined from a particular core dump, the information is used to decide the next step.
In cases where a third-party component crashed, the vendor of that component will need to investigate further; IBM support is unable to diagnose problems in third-party components.
Making sure required support programs are available¶
Please refer to these instructions for verifying that required support programs are installed.
Running the tool¶
Run the tool as root to avoid any permissions problems with reading the core file or other files, such as log files and configuration files. (More information about the requirement to run this tool as root is available here.)
ServerDoc is passed three parameters for gathering crash documentation:
GatherCrashDoc
the name of the IHS installation directory (e.g., /usr/HTTPServer)
the name of the core file (e.g., /tmp/core)
# java -jar ServerDoc.jar GatherCrashDoc /path/to/IHS /path/to/corefile
The tool creates a new directory which contains a timestamp in the name, and the crash documentation will be saved in that directory.
a sample run¶
For this example, IHS is installed in /usr/HTTPServer, the core dump was written to /tmp/core, and ihsdiag was unpacked into /root/ihsdiag-1.1.0.
# cd /tmp
# java -jar /root/ihsdiag-1.1.0/ServerDoc.jar GatherCrashDoc \
/usr/HTTPServer /tmp/core
Reports, log files, and configuration files have been saved to directory
CrashDoc.200404121310
If you have additional log files or configuration files, copy them there
before packing up the directory.
Hint for packing up the directory:
tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
gzip CrashDoc.200404121310.tar
# ls -l CrashDoc.200404121310/
total 8136
-rw-r--r-- 1 root system 8779 Apr 12 13:10 access_log
-rw-r--r-- 1 root system 7094 Apr 12 13:10 apachectl
-rw-r--r-- 1 root system 3593703 Apr 12 13:10 core
-rw-r--r-- 1 root system 478483 Apr 12 13:10 core_file_strings
-rw-r--r-- 1 root system 14419 Apr 12 13:10 error_log
-rw-r--r-- 1 root system 37141 Apr 12 13:10 httpd.conf
-rw-r--r-- 1 root system 7500 Apr 12 13:10 log
-rw-r--r-- 1 root system 173 Apr 12 13:10 report
copying other web server and plug-in files¶
The next step is to copy any other web server or plug-in configuration files and logs into the new CrashDoc directory. Here is a list of files to copy if they are being used:
any IHS configuration file other than httpd.conf in the IHS install directory
any additional web server error or access log files, such as log files specific to each virtual host or log files created by rotatelogs
the WebSphere plug-in configuration file
the WebSphere plug-in log file
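Copying the files in the list above can be scripted. The function below is a hypothetical sketch; the file paths in the usage comment are placeholders that must be replaced with the locations used by your configuration. Files that share a base name overwrite each other in the target directory, so rename them first if needed.

```shell
# Hypothetical helper: copy each existing file into the CrashDoc directory,
# reporting (and skipping) any path that does not exist.
copy_if_present() {
    dest=$1
    shift
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp -p "$f" "$dest/" && echo "copied $f"
        else
            echo "skipped $f (not found)"
        fi
    done
}

# Example usage (placeholder paths):
#   copy_if_present CrashDoc.200404121310 \
#       /path/to/plugin-cfg.xml \
#       /path/to/http_plugin.log
```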
saving the documentation directory¶
The last step is to pack up and compress the documentation directory using zip, tar followed by gzip, or pax followed by compress. The easiest way is to cut and paste the messages displayed by ServerDoc previously which showed the commands to use. The suggested commands will vary by platform. On z/OS, for example, pax and compress will be suggested instead of tar and gzip.
# tar -cf CrashDoc.200404121310.tar CrashDoc.200404121310
# gzip CrashDoc.200404121310.tar
The resulting compressed file is the file to send to IBM support.
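Before sending the file, it can be worth confirming that the archive is intact. This is an optional sketch that applies only to the tar and gzip case; `verify_archive` is a hypothetical helper, not part of ihsdiag.

```shell
# Hypothetical helper: test the gzip stream, then list the archive contents.
# Fails if the compressed file is corrupt or is not a valid tar archive.
verify_archive() {
    gzip -t "$1" && gzip -dc "$1" | tar -tf -
}

# Example usage:
#   verify_archive CrashDoc.200404121310.tar.gz
```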
understanding the 'root' requirement ¶
When gathering information on web server crashes, the tool must be able to read core files created for web server processes and web server logs and configuration files. Often the web server logs and configuration files are readable by normal user ids, but core files are readable only by root or by the web server user id (e.g., nobody or www).
If the web server is started as root, the permissions on generated core files and log files and configuration files can be changed to allow a non-root user to run the crash documentation tool.
If the web server is not started as root, there are no such concerns, and the crash documentation tool may be run by the user id which starts the web server.
If the tool is run as non-root and it is unable to gather the required information, permissions on the core file or other files can be changed and the tool may be run again. It may not be possible to determine if this problem occurred until the documentation has been analyzed by IBM HTTP Server support.
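When running the tool as a non-root user, a quick readability check beforehand can catch permission problems early. The function below is a hypothetical sketch; the paths in the usage comment assume the /usr/HTTPServer install and /tmp/core file from the sample run, plus conventional conf and logs locations that may differ on your system.

```shell
# Hypothetical helper: print each given file that the current user cannot
# read (including files that do not exist). Returns non-zero if any is found.
list_unreadable() {
    status=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f"
            status=1
        fi
    done
    return $status
}

# Example usage (assumed paths):
#   list_unreadable /tmp/core \
#       /usr/HTTPServer/conf/httpd.conf \
#       /usr/HTTPServer/logs/error_log
```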