Coredumps on Unix¶
A coredump is a special file which represents the memory image of a process. Many operating systems can save a core dump when an application crashes. The core dump is an important part of diagnosing the cause of the crash, since it contains the data the application was accessing at the time, along with information about which part of the application was running when the crash occurred.
There are various configuration requirements which must be met in order for the operating system to save a core dump when IBM HTTP Server crashes. This document describes the common configuration requirements.
Quick checklist for selected platforms¶
Later sections of this document provide more information. Here is a quick checklist to consider. For z/OS information, refer to this document.
AIX¶
Modify httpd.conf and set the CoreDumpDirectory directive to point to a location where the web server user id (e.g., nobody or www) can create files. Usually this is sufficient:
CoreDumpDirectory /tmp
Note: In some rare situations, core files will be larger than 2GB. They will be truncated unless the filesystem has large file support. By default, JFS filesystems don't support such files; large file support has to be enabled explicitly when the filesystem is created. Also check ulimit -f if your IHS processes are larger than 1GB to prevent the core files from being truncated to 1GB (the ulimit -f default).
Full core handling: Current versions of IBM HTTP Server do not enable full core files automatically because of the potential size. AIX tuning is required to enable them.
Stop IBM HTTP Server.
Open $IHS_ROOT/bin/envvars in an editor and append:
ulimit -c unlimited
ulimit -f unlimited
Start IBM HTTP Server as normal.
Check the IBM HTTP Server error log and make sure you don't see one of these messages, which would indicate that one of the steps above was skipped:
[notice] Core file limit is 0; core dumps will not be written for server crashes
[notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes
(Levels of IBM HTTP Server prior to mid-2004 do not report these potential configuration problems.)
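The envvars edit above can be scripted. The sketch below appends the two ulimit lines only if they are not already present; it runs against a scratch file, since the real path ($IHS_ROOT/bin/envvars) varies by install:

```shell
# Sketch: append the two ulimit lines to envvars only if missing.
# ENVVARS points at a scratch file here; on a real system use
# "$IHS_ROOT/bin/envvars" (install path varies).
ENVVARS=$(mktemp)
for line in 'ulimit -c unlimited' 'ulimit -f unlimited'; do
    # grep -qxF: quiet, whole-line, fixed-string match
    grep -qxF "$line" "$ENVVARS" || printf '%s\n' "$line" >> "$ENVVARS"
done
cat "$ENVVARS"
```

Because of the grep guard, running the loop twice does not duplicate the lines.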
Solaris¶
Modify httpd.conf and set the CoreDumpDirectory directive to point to a location where the web server user id (e.g., nobody or www) can create files. Usually this is sufficient:
CoreDumpDirectory /tmp
Note: On Solaris, /tmp is often mounted on paging space (swap device). If there is a potential paging space shortage, create another directory on a physical file system, make sure that the web server user id can write to it, and set CoreDumpDirectory to point to that new directory.
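Creating such a directory might look like the following fragment. The path /var/coredumps and the nobody user are example assumptions; match the User directive in your httpd.conf:

```shell
# Command fragment (run as root): create a core directory on a
# physical filesystem and make it writable by the web server user.
mkdir -p /var/coredumps          # example path, not a required one
chown nobody /var/coredumps      # match the User directive in httpd.conf
chmod 700 /var/coredumps
```

Then point the directive at it in httpd.conf: CoreDumpDirectory /var/coredumps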
Run the coreadm program to configure the operating system to write core dumps for programs like IBM HTTP Server which switch identity at startup:
# coreadm -e global-setid -e proc-setid -e global
Stop IBM HTTP Server.
Open $IHS_ROOT/bin/envvars in an editor and append:
ulimit -c unlimited
ulimit -f unlimited
Start IBM HTTP Server as normal.
Check the IBM HTTP Server error log and make sure you don't see one of these messages, which would indicate that one of the steps above was skipped:
[notice] Core file limit is 0; core dumps will not be written for server crashes
[notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes
(Levels of IBM HTTP Server prior to mid-2004 do not report these potential configuration problems.)
Linux¶
systemd-coredump¶
If /proc/sys/kernel/core_pattern passes the core to /usr/lib/systemd/systemd-coredump, the core file only exists as part of a larger archive and cannot be read directly with tools such as gdb. Before running any IHS collector, extract the native core using the following procedure:
Determine the crashing IHS process ID. This is usually noted in the IHS error log.
List the available systemd-coredumps:
coredumpctl list
and confirm it has an entry for the affected PID.
Extract the core file to a location with enough space:
coredumpctl dump $PID --output /tmp/core.$PID
Run ihsdiag against /tmp/core.$PID rather than the .lz4/.zst files under /var/lib/systemd/coredump/
Basic enablement in IHS¶
Modify httpd.conf and set the CoreDumpDirectory directive to point to a location where the web server user id (e.g., nobody or www) can create files. Usually this is sufficient:
CoreDumpDirectory /tmp
Note: If /proc/sys/kernel/core_pattern is set, the core dump location can be overridden, including being processed by an application which may choose to put the core file anywhere, or to stop writing cores if they occur too rapidly. If a program is listed, you may need to look at its output in the system log or dmesg to find the location of IHS cores.
Stop IBM HTTP Server.
Open $IHS_ROOT/bin/envvars in an editor and append:
ulimit -c unlimited
ulimit -f unlimited
Start IBM HTTP Server as normal.
Check the IBM HTTP Server error log and make sure you don't see one of these messages, which would indicate that one of the steps above was skipped:
[notice] Core file limit is 0; core dumps will not be written for server crashes
[notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes
(Levels of IBM HTTP Server prior to mid-2004 do not report these potential configuration problems.)
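To see where the Linux kernel will actually deliver cores, inspect core_pattern. This small check is a sketch; a leading "|" means cores are piped to a helper program instead of being written to a file:

```shell
# Report how the Linux kernel disposes of core dumps.
pattern=$(cat /proc/sys/kernel/core_pattern)
case "$pattern" in
  \|*systemd-coredump*) echo "cores are piped to systemd-coredump (use coredumpctl)" ;;
  \|*)                  echo "cores are piped to a helper: ${pattern#|}" ;;
  *)                    echo "cores are written using file pattern: $pattern" ;;
esac
```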
General configuration issues¶
file permission
Make sure that the process for which a coredump is needed has permission to write a coredump. For example, with Apache/IHS, the default location of the coredump is the Apache/IHS install directory or the directory specified by the CoreDumpDirectory directive. The user id associated with Apache/IHS must have permission to write files there. For most processes created by Apache/IHS, that user id and group id is specified by the User and Group directives in httpd.conf. This is often "nobody." A quick work-around to permission problems is to specify "CoreDumpDirectory /tmp" in httpd.conf.
available disk space
Make sure there is plenty of room (possibly many megabytes) available on the partition/mount/volume where you expect the core file to be placed. If you get a core dump which is unusable for some reason, check available disk space with
df -k
on the partition/mount/volume containing the core after the core dump has been written, to ensure that the system did not run out of space. Note that with Apache/IHS, the core file will almost always be placed in the directory specified by the CoreDumpDirectory directive.
operating system file size and core size limits (ulimits)
Make sure your ulimit is set appropriately so that you don't hit a limit on the size of the core file (some systems default the core size limit to zero bytes).
There are two parts: 1) the hard limits imposed by your system or system administrator and 2) the soft limits you can manipulate via the shell.
Please note that the limits in force for the user that starts the server (usually root) are what is important. When the server, started as root, switches user ids, the limits in force do not change.
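You can inspect both kinds of limit from the shell. This sketch shows the distinction:

```shell
# Soft limit: what currently applies. Hard limit: the ceiling the
# soft limit can be raised to without special privileges.
echo "soft core limit: $(ulimit -Sc)"
echo "hard core limit: $(ulimit -Hc)"
# Raising the soft limit up to the hard limit always works; raising
# it beyond the hard limit fails for unprivileged users.
ulimit -Sc unlimited 2>/dev/null || echo "hard limit prevents 'unlimited'"
```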
hard system limits¶
On AIX, a hard limit can be set per user in smit:
smitty user
select "Change / Show Characteristics of a User"
enter user name
set "Hard CORE file size"
soft limits manipulated by your shell¶
On all systems, soft limits can be manipulated by the shell.
For bash or ksh, ulimit -a will display the limits and ulimit -c unlimited will let you get as much as your system [administrator] allows.
On AIX, a soft limit can be set per user in smit.
smitty user
select "Change / Show Characteristics of a User"
enter user name
set "Soft CORE file size"
Note that ulimit manipulation in the shell is still effective.
Alternate locations for coredumps¶
The default location for coredumps is the directory specified by the ServerRoot directive. When the web server is started as root, the child processes run under a different user id, which does not have permission to write to that directory. This is handled by using the CoreDumpDirectory directive to specify an alternate location, such as /tmp.
Some platforms provide a mechanism for specifying an alternate coredump location. This will override the value of the CoreDumpDirectory directive.
syscorepath command on AIX 5.2 and above¶
AIX 5.2 and above provides the syscorepath command for specifying an alternate coredump directory which affects all applications on the system. If the web server was started without the CoreDumpDirectory directive and that is preventing core dumps from being written because the default directory has unsuitable permissions, the syscorepath command can be used to specify a directory with the appropriate permissions, and coredumps can then be written without restarting the web server.
When syscorepath is used to specify an alternate directory, the file name of the coredump is no longer core, but instead includes the process id of the process which crashed and the time of day that the crash occurred.
Refer to the syscorepath manpage for further information.
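As a sketch, the steps might look like the following AIX-only command fragment (the directory is an example; run as root):

```shell
# AIX command fragment: direct all coredumps to a shared directory
# without restarting any applications.
mkdir -p /tmp/coredumps
chmod 1777 /tmp/coredumps        # world-writable with sticky bit
syscorepath -p /tmp/coredumps    # set the system-wide core path
syscorepath -g                   # verify the current setting
```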
coreadm command on Solaris¶
Solaris provides the coreadm command, which controls several coredump settings, including an alternate coredump directory and the format of the coredump's file name.
Refer to the coreadm manpage for further information.
Issues with threaded programs¶
Linux¶
If a thread takes a synchronous signal (e.g., SIGSEGV, SIGABRT, SIGBUS) on Linux < 2.4, the kernel won't take a coredump. A patch is available.
With the Linux 2.4 kernel, if a thread crashes you'll get two coredumps: one for the main process, named core.pid, and one for the bad thread, named core.fakepid.
AIX¶
Make sure the "full core" option is enabled (see below).
setuid() Issues¶
When IHS or Apache starts as root on Unix-like systems, it switches identity to the user and group specified in the configuration file. Sticky-bit programs and programs which start as root and then set their user id to something else have special issues for getting coredumps on some operating systems.
Solaris¶
By default, Solaris does not create coredumps for setuid() programs.
Look at the documentation for the coreadm program (man coreadm). When all types of core dumps are enabled, it will display something like this:
% coreadm
global core file pattern: /coredumps/core.%f.%p
init core file pattern: /coredumps/init-core.%f.%p
global core dumps: enabled
per-process core dumps: enabled
global setid core dumps: enabled
per-process setid core dumps: enabled
global core dump logging: enabled
This will turn on most types of core dumps:
coreadm -e global-setid -e proc-setid -e global
This will set the global core file pattern:
coreadm -g /tmp/core.%f.%p
Note: when you include a directory in the core file pattern, Apache's CoreDumpDirectory directive cannot override that.
FreeBSD¶
You need to set the kern.sugid_coredump variable via sysctl.
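For example (FreeBSD command fragment, run as root):

```shell
# Enable core dumps for setuid/setgid processes on FreeBSD.
sysctl kern.sugid_coredump=1
# Make the setting persistent across reboots:
echo 'kern.sugid_coredump=1' >> /etc/sysctl.conf
```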
Linux¶
When an application does this switch on Linux, the kernel normally disables coredumps. The application can make a special syscall -- prctl(PR_SET_DUMPABLE, 1) -- which re-enables coredumps for that application. This syscall works only on Linux 2.4 and later kernels.
Important note: There are reports that some 2.4 kernels from some vendors may have the prctl() feature broken, such that a core dump is not written even when the prctl() call is issued.
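Later Linux kernels (2.6.13 and above, an addition beyond the kernels discussed above) also expose a system-wide knob, fs.suid_dumpable, for the same purpose. This check is a sketch:

```shell
# Inspect the system-wide setuid-dump policy (Linux 2.6.13+).
# 0 = setuid processes never dump, 1 = dump normally,
# 2 = "suidsafe" (cores readable by root only).
cat /proc/sys/fs/suid_dumpable
```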
AIX¶
no known setuid() issues
HP-UX¶
HP-UX prior to 11i has no known setuid() issues. With 11i, some extra configuration is required. Here is some information a customer received from HP-UX technical support:
This is an issue involving programs that run first as root and then switch to another user. The solution is to poke the kernel. Specifically, set an undocumented kernel parameter called dump_all (works for 11.11, but not for 11.0). Here's how to activate dump_all:
# echo "dump_all/W 1" | adb -w /stand/vmunix /dev/kmem
dump_all: 0 = 1
To deactivate use:
# echo "dump_all/W 0" | adb -w /stand/vmunix /dev/kmem
dump_all: 1 = 0
Special Operating System Considerations¶
AIX¶
full core option¶
AIX has a system-wide "full core" option which must be enabled in order for "user data" areas of memory to be written to the coredump. Without these areas of memory in the coredump, many types of problems cannot be diagnosed, and dbx will have problems analyzing the coredump of a threaded process. It is very important to enable the "full core" option so that all the necessary information is in the coredump.
Here is an example scenario from a dump which was not recorded properly because Enable full CORE dump was false:
[trawick@gorthaur platform_test]$ dbx ./a.out /tmp/core
Type 'help' for help.
warning: The core file is truncated. You may need to increase the ulimit for file and coredump, or free some space on the filesystem.
reading symbolic information ...
[using memory image in /tmp/core]
warning: Unable to access address 0xf0203a48 from core
pthdb_session.c, 487: 1 PTHDB_CALLBACK (callback failed)
k_thread.c, 2124: PTHDB_CALLBACK (callback failed)
Segmentation fault in sig_coredump at line 24
24 kill(ap_my_pid, sig);
(dbx) up
warning: Unable to access address 0x8 from core
not that many levels
(dbx)
Once the "full core" option was enabled the proper information was recorded and dbx could be used to determine the cause of the segfault.
checking if full core is set¶
Run this command: lsattr -El sys0 -a fullcore
The desired output is:
fullcore true Enable full CORE dump True
If either the second word or the last word of the output is not "true", then the full core option is not currently enabled.
(Under some conditions, the full core option may not take effect immediately if set from smitty chgsys.)
enabling full core¶
Run this command to enable the option immediately: chdev -l sys0 -a fullcore=true
Important note: If the full core option took effect after the crashing application was started, the application should be stopped and then started again so that full core dumps are written.
Again, verify with the lsattr command above that the setting took effect.
Large file support¶
Occasionally, core dumps will exceed 2GB in size. Thus, the directory for coredumps must support large files. This is specified during the creation of the JFS filesystem.
# lsfs -q
should report bf: true
an easy way to set all limits to unlimited for a user¶
As root, edit /etc/security/limits and set everything to -1, then have the user log out and back in. Since the server is normally started as root, the user of interest is normally root.
Here is what the settings for the user should look like:
trawick:
fsize = -1
core = -1
cpu = -1
data = -1
rss = -1
stack = -1
nofiles = -1
ulimit -a should show something like this:
trawick@tetra:~/wrk/port/testtool/platform_test% ulimit -a
core file size (blocks) unlimited
data seg size (kbytes) unlimited
file size (blocks) unlimited
max memory size (kbytes) unlimited
open files unlimited
pipe size (512 bytes) 64
stack size (kbytes) unlimited
cpu time (seconds) unlimited
max user processes 128
virtual memory (kbytes) unlimited