Coredumps on Unix

A coredump is a special file which represents the memory image of a process. Many operating systems have the capability of saving a core dump when the application crashes. The core dump is an important part of diagnosing the cause of the crash, since the data which the application was accessing at the time is in the core dump, along with information about which part of the application was running at the time of the crash.

There are various configuration requirements which must be met in order for the operating system to save a core dump when IBM HTTP Server crashes. This document describes the common configuration requirements.

Quick checklist for selected platforms

Later sections of this document provide more information; the checklists below cover the common cases. For z/OS information, refer to this document.

AIX

  1. Modify httpd.conf and set the CoreDumpDirectory directive to point to a location where the web server user id (e.g., nobody or www) can create files. Usually this is sufficient:

    CoreDumpDirectory /tmp
    

    Note: In some rare situations, core files will be larger than 2GB. They will be truncated unless the filesystem has large file support. By default, JFS filesystems don't support such files; large file support has to be enabled explicitly when the filesystem is created. Also check ulimit -f if your IHS processes are larger than 1GB to prevent the core files from being truncated to 1GB (the ulimit -f default).

  2. Full core handling: Current versions of IBM HTTP Server do not enable full core files automatically because of the potential size. AIX tuning is required to enable them.

  3. Stop IBM HTTP Server.

  4. Open $IHS_ROOT/bin/envvars in an editor and append:

    ulimit -c unlimited
    ulimit -f unlimited     
    
  5. Start IBM HTTP Server as normal

  6. Check the IBM HTTP Server error log and make sure you don't see one of these messages, which would indicate that one of the steps above was skipped:

    [notice] Core file limit is 0; core dumps will be not be written for server crashes
    [notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes
    

(Levels of IBM HTTP Server prior to mid-2004 do not report these potential configuration problems.)
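
Step 4 can be scripted so that repeating it does not add duplicate lines. The following is a minimal sketch, not an IBM-supplied tool: append_ulimits is a hypothetical helper name, and it assumes IHS_ROOT is set to the install directory.

```shell
#!/bin/sh
# Append the two ulimit lines to an envvars file unless they are already there.
# append_ulimits is a hypothetical helper name, not part of IBM HTTP Server.
append_ulimits() {
    envvars_file=$1
    for line in "ulimit -c unlimited" "ulimit -f unlimited"; do
        # -x matches whole lines only, -F treats the text literally, -q is silent
        if ! grep -qxF -e "$line" "$envvars_file" 2>/dev/null; then
            printf '%s\n' "$line" >> "$envvars_file"
        fi
    done
}

# Typical use (assumes IHS_ROOT points at the install directory):
#   append_ulimits "$IHS_ROOT/bin/envvars"
```

Because the match is on whole lines, an existing line with trailing blanks is not recognized and the line is appended again; that is harmless, since the second ulimit call simply repeats the first.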

Solaris

  1. Modify httpd.conf and set the CoreDumpDirectory directive to point to a location where the web server user id (e.g., nobody or www) can create files. Usually this is sufficient:

    CoreDumpDirectory /tmp
    

    Note: On Solaris, /tmp is often mounted on paging space (swap device). If there is a potential paging space shortage, create another directory on a physical file system, make sure that the web server user id can write to it, and set CoreDumpDirectory to point to that new directory.

  2. Run the coreadm program to configure the operating system to write core dumps for programs like IBM HTTP Server which switch identity at startup:

    # coreadm -e global-setid -e proc-setid -e global
    
  3. Stop IBM HTTP Server.

  4. Open $IHS_ROOT/bin/envvars in an editor and append:

    ulimit -c unlimited
    ulimit -f unlimited     
    
  5. Start IBM HTTP Server as normal

  6. Check the IBM HTTP Server error log and make sure you don't see one of these messages, which would indicate that one of the steps above was skipped:

    [notice] Core file limit is 0; core dumps will be not be written for server crashes
    [notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes
    

    (Levels of IBM HTTP Server prior to mid-2004 do not report these potential configuration problems.)

Linux

systemd-coredump

If /proc/sys/kernel/core_pattern passes the core to /usr/lib/systemd/systemd-coredump, the core file only exists as part of a larger archive and cannot directly be read with tools such as gdb. Prior to running any IHS collector, extract the native core using the following procedure:

  1. Determine the crashing IHS process ID. This is usually noted in the IHS error log.

  2. List the available systemd-coredump entries with coredumpctl list and confirm that there is an entry for the affected PID.

  3. Extract the core file to a location with enough space: coredumpctl dump $PID --output /tmp/core.$PID

  4. Run ihsdiag against /tmp/core.$PID rather than against the .lz4/.zst files under /var/lib/systemd/coredump/
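
The procedure above might be wrapped in a small script along these lines. This is a sketch under stated assumptions: uses_systemd_coredump and extract_core are hypothetical helper names, and the check assumes that a piped core_pattern begins with "|" followed by the path of the handler program.

```shell
#!/bin/sh
# uses_systemd_coredump PATTERN
# Succeed if PATTERN (the content of /proc/sys/kernel/core_pattern) pipes
# cores to systemd-coredump. Hypothetical helper name.
uses_systemd_coredump() {
    case $1 in
        "|"*systemd-coredump*) return 0 ;;
        *) return 1 ;;
    esac
}

# extract_core PID
# Write the native core for PID to /tmp/core.PID, but only when the kernel
# hands cores to systemd-coredump. Hypothetical helper name.
extract_core() {
    pid=$1
    pattern=$(cat /proc/sys/kernel/core_pattern)
    if uses_systemd_coredump "$pattern"; then
        coredumpctl dump "$pid" --output "/tmp/core.$pid"
    else
        echo "core_pattern is '$pattern'; no extraction needed" >&2
    fi
}

# Typical use, with the PID taken from the IHS error log:
#   extract_core 12345
```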

Basic enablement in IHS

  1. Modify httpd.conf and set the CoreDumpDirectory directive to point to a location where the web server user id (e.g., nobody or www) can create files. Usually this is sufficient:

    CoreDumpDirectory /tmp
    

    Note: If /proc/sys/kernel/core_pattern is set, it can override the core dump location. In particular, it can pass the core to a program, which may put the core file anywhere or stop writing cores if they occur too rapidly. If a program is listed, you may need to look at its output in the system log or dmesg to find the location of IHS cores.

  2. Stop IBM HTTP Server.

  3. Open $IHS_ROOT/bin/envvars in an editor and append:

    ulimit -c unlimited
    ulimit -f unlimited
    
  4. Start IBM HTTP Server as normal

  5. Check the IBM HTTP Server error log and make sure you don't see one of these messages, which would indicate that one of the steps above was skipped:

    [notice] Core file limit is 0; core dumps will be not be written for server crashes
    [notice] CoreDumpDirectory not set; core dumps may not be written for child process crashes
    

    (Levels of IBM HTTP Server prior to mid-2004 do not report these potential configuration problems.)
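
The final step of each checklist can be automated by searching the error log for the two notices. The sketch below is illustrative: check_core_notices is a hypothetical helper name, and the log path in the usage comment is an assumption about a default layout, not something this document specifies.

```shell
#!/bin/sh
# check_core_notices LOGFILE
# Print any core dump configuration notices found in LOGFILE.
# Returns 1 if at least one notice is present, 0 if none are found.
# Hypothetical helper name.
check_core_notices() {
    log=$1
    if grep -e 'Core file limit is 0' -e 'CoreDumpDirectory not set' "$log"; then
        return 1
    fi
    return 0
}

# Typical use (the log location is an assumption; adjust to your ErrorLog setting):
#   check_core_notices "$IHS_ROOT/logs/error_log" || echo "core dump setup incomplete"
```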

General configuration issues

  • file permission

    Make sure that the process for which a coredump is needed has permission to write a coredump. For example, with Apache/IHS, the default location of the coredump is the Apache/IHS install directory or the directory specified by the CoreDumpDirectory directive. The user id associated with Apache/IHS must have permission to write files there. For most processes created by Apache/IHS, that user id and group id is specified by the User and Group directives in httpd.conf. This is often "nobody." A quick work-around to permission problems is to specify "CoreDumpDirectory /tmp" in httpd.conf.

  • available disk space

    Make sure there is plenty of room (possibly many megabytes) available on the partition/mount/volume where you expect the core file to be placed. If you get a core dump which is unusable for some reason, check available disk space with df -k on the partition/mount/volume containing the core after the core dump has been written to ensure that the system did not run out of space.

    Note that with Apache/IHS, the core file will almost always be placed in the directory specified by the CoreDumpDirectory directive.

  • operating system file size and core size limits (ulimits)

    Make sure your ulimit is set appropriately so that you don't hit a limit on the size of the core file (some default configurations limit the core size to zero bytes).

    There are two parts: 1) the hard limits imposed by your system or system administrator and 2) the soft limits you can manipulate via the shell.

    Please note that the limits that matter are those in force for the user that starts the server (usually root). When a server started as root switches user ids, the limits in force do not change.

hard system limits

On AIX, a hard limit can be set per user in smit:

smitty user
select "Change / Show Characteristics of a User"
enter user name
set "Hard CORE file size"

soft limits manipulated by your shell

On all systems, soft limits can be manipulated by the shell.

For bash or ksh, ulimit -a displays the current limits, and ulimit -c unlimited requests the largest core size that your system (or system administrator) allows.

On AIX, a soft limit can be set per user in smit.

smitty user
select "Change / Show Characteristics of a User"
enter user name
set "Soft CORE file size"

Note that ulimit manipulation in the shell is still effective.
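
The three general checks above (file permission, disk space, and limits) can be gathered into a few shell functions. This is a minimal sketch with hypothetical function names. Note that dir_is_writable tests the user running the script, so it would need to be run as the web server user id to be meaningful, and show_core_limits reports the limits of the current shell, which are only relevant if that shell is the one that starts the server.

```shell
#!/bin/sh
# Preflight checks for a core dump directory. All function names are hypothetical.

# dir_is_writable DIR
# Succeed if the current user can create a file in DIR.
dir_is_writable() {
    probe="$1/.core_probe.$$"
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

# free_kb DIR
# Print the free space, in kilobytes, on the filesystem holding DIR.
# -P requests the single-line POSIX format, so column 4 is the available space.
free_kb() {
    df -kP "$1" | awk 'NR == 2 { print $4 }'
}

# show_core_limits
# Print the soft and hard core file size limits of this shell.
show_core_limits() {
    echo "soft core limit: $(ulimit -S -c)"
    echo "hard core limit: $(ulimit -H -c)"
}

# Typical use:
#   dir_is_writable /tmp || echo "cannot write cores to /tmp"
#   free_kb /tmp
#   show_core_limits
```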

Alternate locations for coredumps

The default location for coredumps is the directory specified by the ServerRoot directive. When the web server is started as root, the child processes run under a different user id, which does not have permission to write to that directory. This is handled by using the CoreDumpDirectory directive to specify an alternate location, such as /tmp.

Some platforms provide a mechanism for specifying an alternate coredump location. This will override the value of the CoreDumpDirectory directive.

syscorepath command on AIX 5.2 and above

AIX 5.2 and above provides the syscorepath command for specifying an alternate coredump directory which affects all applications on the system. If the web server was started without the CoreDumpDirectory directive and that is preventing core dumps from being written because the default directory has unsuitable permissions, the syscorepath command can be used to specify a directory with the appropriate permissions, and coredumps can then be written without restarting the web server.

When syscorepath is used to specify an alternate directory, the file name of the coredump is no longer core, but instead includes the process id of the process which crashed, and the time of day that the crash occurred.

Refer to the syscorepath manpage for further information.

coreadm command on Solaris

Solaris provides the coreadm command which controls several coredump settings, including an alternate coredump directory and the format of the name of the coredump.

Refer to the coreadm manpage for further information.

Issues with threaded programs

Linux

If a thread takes a synchronous signal (e.g., SIGSEGV, SIGABRT, SIGBUS) on Linux < 2.4, the kernel won't write a coredump. A patch is available.

With the Linux 2.4 kernel, if a thread crashes you'll get two coredumps: one for the main process, named core.pid, and one for the bad thread, named core.fakepid.

AIX

Make sure the "full core" option is enabled (see below).

setuid() Issues

When IHS or Apache starts as root on Unix-like systems, it switches identity to the user and group specified in the configuration file. Setuid programs, and programs which start as root and then set their user id to something else, have special issues getting coredumps on some operating systems.

Solaris

By default, Solaris does not create coredumps for setuid() programs. Look at the documentation for the coreadm program (man coreadm). When all types of core dumps are enabled, it will display something like this:

  % coreadm
       global core file pattern: /coredumps/core.%f.%p
         init core file pattern: /coredumps/init-core.%f.%p
              global core dumps: enabled
         per-process core dumps: enabled
        global setid core dumps: enabled
   per-process setid core dumps: enabled
       global core dump logging: enabled

This will turn on most types of core dumps:

coreadm -e global-setid -e proc-setid -e global

This will set the global core file pattern:

coreadm -g /tmp/core.%f.%p

Note: when you include a directory in the core file pattern, Apache's CoreDumpDirectory directive cannot override that.
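
To predict where a core will land for a given pattern, the tokens can be expanded by hand. The sketch below is illustrative only: expand_core_pattern is a hypothetical helper, it handles only the %f (executable file name) and %p (process id) tokens used in the examples above, and it assumes the substituted values contain no | or & characters.

```shell
#!/bin/sh
# expand_core_pattern PATTERN EXECUTABLE PID
# Print PATTERN with %f replaced by EXECUTABLE and %p replaced by PID.
# Other coreadm tokens are left untouched. Hypothetical helper name.
expand_core_pattern() {
    printf '%s\n' "$1" | sed -e "s|%f|$2|g" -e "s|%p|$3|g"
}

# Example:
#   expand_core_pattern '/tmp/core.%f.%p' httpd 1234
#   prints /tmp/core.httpd.1234
```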

FreeBSD

You need to set the kern.sugid_coredump variable via sysctl.

Linux

When an application does this switch on Linux, the kernel normally disables coredumps. The application can make a special syscall -- prctl(PR_SET_DUMPABLE, 1) -- which will enable coredumps for that application. This syscall works only on Linux 2.4 and later kernels.

Important note: There are reports that some 2.4 kernels from some vendors may have the prctl() feature broken, such that a core dump is not written even when the prctl() call is issued.

AIX

There are no known setuid() issues on AIX.

HP-UX

HP-UX prior to 11i has no known setuid() issues. With 11i, some extra configuration is required. Here is some information a customer received from HP-UX technical support:

This is an issue involving programs that run first as root and then switch to another user. The solution is to poke the kernel. Specifically, set an undocumented kernel parameter called dump_all (works for 11.11, but not for 11.0). Here's how to activate dump_all:

 
# echo "dump_all/W 1" | adb -w /stand/vmunix /dev/kmem
dump_all: 0 = 1

To deactivate use:

# echo "dump_all/W 0" | adb -w /stand/vmunix /dev/kmem
dump_all: 1 = 0

Special Operating System Considerations

AIX

full core option

AIX has a system-wide "full core" option which must be enabled in order for "user data" areas of memory to be written to the coredump. Without these areas of memory in the coredump, many types of problems cannot be diagnosed, and dbx has problems analyzing the coredump of a threaded process. It is very important to enable the "full core" option so that all the necessary information is in the coredump.

Here is an example scenario from a dump which was not recorded properly because Enable full CORE dump was false:

    [trawick@gorthaur platform_test]$ dbx ./a.out /tmp/core
    Type 'help' for help.
    warning: The core file is truncated.  You may need to increase the ulimit
    for file and coredump, or free some space on the filesystem.
    reading symbolic information ...
    [using memory image in /tmp/core]
    warning: Unable to access address 0xf0203a48 from core
    pthdb_session.c, 487: 1 PTHDB_CALLBACK (callback failed)
    k_thread.c, 2124: PTHDB_CALLBACK (callback failed)
    
    Segmentation fault in sig_coredump at line 24
       24       kill(ap_my_pid, sig);
    (dbx) up
    warning: Unable to access address 0x8 from core
    not that many levels
    (dbx)

Once the "full core" option was enabled the proper information was recorded and dbx could be used to determine the cause of the segfault.

checking if full core is set

Run this command: lsattr -El sys0 -a fullcore

The desired output is:

    fullcore true Enable full CORE dump True

If either the second word or the last word of the output is not "true", then the full core option is not currently enabled.

(Under some conditions, the full core option may not take effect immediately if set from smitty chgsys.)

enabling full core

Run this command to enable the option immediately: chdev -l sys0 -a fullcore=true

Important note: If the full core option took effect after the crashing application was started, the application should be stopped and then started again so that full core dumps are written.

Again, verify with the lsattr command above that the setting took effect.
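
The verification can be scripted by checking the second and last words of the lsattr output, as described above. This is a minimal sketch; fullcore_is_enabled is a hypothetical helper name, and the comparison ignores letter case because the sample output shows both "true" and "True".

```shell
#!/bin/sh
# fullcore_is_enabled LINE
# LINE is one line of "lsattr -El sys0 -a fullcore" output, for example:
#   fullcore true Enable full CORE dump True
# Succeeds only if both the second word and the last word are "true",
# compared without regard to case. Hypothetical helper name.
fullcore_is_enabled() {
    printf '%s\n' "$1" | awk '
        { value = tolower($2); last = tolower($NF) }
        END { exit !(value == "true" && last == "true") }'
}

# Typical use on AIX:
#   fullcore_is_enabled "$(lsattr -El sys0 -a fullcore)" || echo "full core is off"
```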

Large file support

Occasionally, core dumps will exceed 2GB in size. Thus, the directory for coredumps must support large files. This is specified during the creation of the JFS filesystem.

Running lsfs -q as root should report bf: true for the filesystem that holds the coredump directory.

an easy way to set all limits to unlimited for a user

As root, edit /etc/security/limits and set everything to -1, then have the user log out and back in. Since the server is normally started as root, the user of interest is normally root.

Here is what the settings for the user should look like:

trawick:
        fsize = -1
        core = -1
        cpu = -1
        data = -1
        rss = -1
        stack = -1
        nofiles = -1

ulimit -a should show something like this:

trawick@tetra:~/wrk/port/testtool/platform_test% ulimit -a
core file size (blocks)     unlimited
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
max memory size (kbytes)    unlimited
open files                  unlimited
pipe size (512 bytes)       64
stack size (kbytes)         unlimited
cpu time (seconds)          unlimited
max user processes          128
virtual memory (kbytes)     unlimited