Getting a backtrace from a coredump

The best way to get a backtrace from a core dump is the automated ServerDoc tool. Use the manual steps below only if there is a problem running ServerDoc.

In rare circumstances it can be useful to analyze a coredump on a different machine, but in general it is best to analyze it on the machine that generated it. Only that machine is sure to have the same libraries and library versions. When the coredump is analyzed elsewhere, the analysis tools can give misleading information, or refuse to work at all, without those libraries.

instructions for various tools

gdb instructions

# gdb /opt/IBMHTTPD/bin/httpd /tmp/core.13587
(gdb) where
(gdb) thread apply all bt
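When you need the same output without an interactive session (for example, to save it to a file), gdb's `-batch` and `-ex` command-line options run the given commands and then exit. The following is a minimal Python sketch, assuming gdb is on the `PATH`; the helper names `gdb_backtrace_argv` and `gdb_backtrace` are hypothetical and not part of any IBM tool.

```python
import shlex
import subprocess


def gdb_backtrace_argv(binary, core):
    """Build a gdb command line that prints the backtraces, then exits."""
    return [
        "gdb",
        "-q",                           # no startup banner
        "-batch",                       # run the -ex commands, then exit
        "-ex", "where",                 # backtrace of the current thread
        "-ex", "thread apply all bt",   # backtrace of every thread
        binary,
        core,
    ]


def gdb_backtrace(binary, core):
    """Run gdb on the core file and return its combined output as text."""
    result = subprocess.run(
        gdb_backtrace_argv(binary, core),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge gdb's warnings into the same text
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Print the equivalent shell command for the example paths above.
    print(shlex.join(gdb_backtrace_argv("/opt/IBMHTTPD/bin/httpd", "/tmp/core.13587")))
```

The printed command, `gdb -q -batch -ex where -ex 'thread apply all bt' /opt/IBMHTTPD/bin/httpd /tmp/core.13587`, can also be pasted directly into a shell.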

how to get gdb for various platforms

dbx instructions

Example of dbx on AIX (Solaris is similar):

# dbx /usr/HTTPServer/bin/httpd /usr/HTTPServer/core
Type 'help' for help.
warning: The core file is truncated.  You may need to increase the ulimit
for file and coredump, or free some space on the filesystem.
reading symbolic information ...
[using memory image in /usr/HTTPServer/core]

Segmentation fault in sig_coredump at 0x10003f3c ($t1)
0x10003f3c (sig_coredump+0x3c) 80410014        lwz   r2,0x14(r1)
(dbx) where
sig_coredump() at 0x10003f3c
wait_or_timeout() at 0x10004564
standalone_main() at 0x10001a00
main() at 0x1000134c
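As the warning in the session above says, a truncated core file points to a core-size or file-size ulimit that is too small, or to a full filesystem. A minimal Python sketch for checking both, assuming a Unix system with the standard `resource` module. Note that it reports the limits of the process that runs it (inherited from your shell), which are not necessarily the limits httpd was started with; the function names are hypothetical, and the current directory stands in for wherever httpd writes its core.

```python
import resource
import shutil


def describe_limit(value):
    """Render an rlimit value, using 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def core_limits():
    """Soft limits that can truncate a core file (values are in bytes)."""
    core_soft, _core_hard = resource.getrlimit(resource.RLIMIT_CORE)
    fsize_soft, _fsize_hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    return {"core": core_soft, "fsize": fsize_soft}


def may_truncate(limits):
    """True if any limit is finite, so a large core could be cut short."""
    return any(value != resource.RLIM_INFINITY for value in limits.values())


def free_bytes(directory):
    """Free space on the filesystem that holds `directory`."""
    return shutil.disk_usage(directory).free


if __name__ == "__main__":
    limits = core_limits()
    for name, value in limits.items():
        print(f"{name}: {describe_limit(value)}")
    if may_truncate(limits):
        print("core files may be truncated; consider raising ulimit -c and ulimit -f")
    # Assumed core directory: the current directory.
    print(f"free space here: {free_bytes('.')} bytes")
```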

Important note about an AIX limitation

Be aware of a limitation when analyzing illegal instruction (SIGILL) coredumps on AIX: no backtrace is possible. Here is a typical encounter with this limitation:

# dbx /opt/HTTPServer/bin/httpd /IBPP/logs/core
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g

[using memory image in /IBPP/logs/core]

Illegal instruction (illegal opcode) in sig_coredump at 0x10003ee0 ($t1)
0x10003ee0 (sig_coredump+0x3c) 80410014        lwz   r2,0x14(r1)
(dbx) where
sig_coredump() at 0x10003ee0
warning: Unable to access address 0x0 from core
warning: could not locate trace table from starting address 0x0
(dbx)

In the event of an illegal instruction (SIGILL) coredump on AIX, it is best to send the actual core dump file, along with the dbx output, to IBM, where some information can be extracted. Even so, the analysis is very problematic, and the best we can expect is some hints about which module might have crashed.
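One way to keep the core file and the dbx output together for sending is a single compressed archive. The following Python sketch uses the standard `tarfile` module; it assumes you have already saved the dbx session text to a file, and the file names used with it (such as `dbx-output.txt` and `core-sigill.tar.gz`) are hypothetical. It does not reflect any particular IBM upload procedure.

```python
import os
import tarfile


def bundle_for_support(core_path, dbx_output_path, archive_path):
    """Pack the core file and the saved dbx output into one .tar.gz archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in (core_path, dbx_output_path):
            # Store only the file name so the archive has no absolute paths.
            tar.add(path, arcname=os.path.basename(path))
    return archive_path
```

For the session above, a call might look like `bundle_for_support("/IBPP/logs/core", "dbx-output.txt", "core-sigill.tar.gz")`.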

how to get dbx for various platforms

  1. AIX - it is part of the base operating system; just make sure the fileset bos.adt.debug is installed

  2. Solaris - it comes with Sun's C/C++ development product (Forte), which is an extra-cost product; if the target system doesn't have it installed, use pstack instead

  3. Linux and HP-UX - not available; use gdb
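On a machine where you are not sure which debugger is installed, the availability rules in the list above can be encoded as a small lookup. A minimal Python sketch, assuming `platform.system()` reports `AIX`, `SunOS`, `Linux`, or `HP-UX`. The preference for pstack on Solaris (taken from the pstack section below) and the fallback to gdb on other systems are choices of this sketch; the `which` parameter exists only so the lookup can be tested without the tools installed.

```python
import platform
import shutil

# Backtrace tools to try, in order of preference, keyed by platform.system().
PREFERRED_TOOLS = {
    "AIX": ["dbx"],                 # part of the base OS (fileset bos.adt.debug)
    "SunOS": ["pstack", "dbx"],     # pstack is in the base OS; dbx needs Forte
    "Linux": ["gdb"],
    "HP-UX": ["gdb"],
}


def pick_debugger(system=None, which=shutil.which):
    """Return the first preferred tool found on the PATH, or None."""
    system = system or platform.system()
    for tool in PREFERRED_TOOLS.get(system, ["gdb"]):  # assumed fallback: gdb
        if which(tool):
            return tool
    return None


if __name__ == "__main__":
    print(pick_debugger() or "no known debugger found on PATH")
```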

Solaris adb instructions

Example:

    # adb /opt/IBMHTTPD/bin/httpd /opt/IBMHTTPD/core
    core file = /opt/IBMHTTPD/core -- program ``httpd.emerson'' on platform SUNW,Ultra-250
    SIGBUS: Bus Error
    $c
    send_silly(18e938,fe4f0e81,ffffffff,7efefeff,79,79) + f4
    ap_invoke_handler(18e938,fde20148,0,0,6,6) + 174
    process_request_internal(18e938,1,40,ffbef99c,4,1) + 61c
    ap_process_request(18e938,4,18e938,ffbefa24,ffbefa34,2) + 30
    child_main(2,2ce00,ff37f6a8,ff37e000,0,0) + 720
    make_child(9e7a8,2,3ddea716,ffffffc0,10,fde7a0f4) + 158
    startup_children(3,9e7a8,93d1c,9e7a8,80790,7a58c) + 88
    standalone_main(1,ffbefc94,93d1c,ff23a000,ff23cfec,807d8) + 1dc
    main(1,ffbefc94,ffbefc9c,93800,0,0) + 574
    ^D
    #

Note that the command to get the backtrace is $c (a dollar sign followed by c). To exit, type the end-of-file character, usually ^D.
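Assuming adb reads its commands from standard input, as it does in the interactive session above, the same session can be scripted: send `$c` and then close standard input, which has the same effect as typing ^D. A minimal Python sketch, assuming adb is on the `PATH`; because the arguments are passed as a list, no shell is involved and `$c` needs no quoting. The helper names are hypothetical.

```python
import subprocess


def adb_request(binary, core):
    """Command line and stdin text for printing a stack trace with adb."""
    argv = ["adb", binary, core]
    # "$c" prints the backtrace; the end of the input then acts like ^D.
    return argv, "$c\n"


def adb_stack(binary, core):
    """Run adb non-interactively and return everything it printed."""
    argv, commands = adb_request(binary, core)
    result = subprocess.run(
        argv,
        input=commands,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    return result.stdout
```

For example, `adb_stack("/opt/IBMHTTPD/bin/httpd", "/opt/IBMHTTPD/core")` should return text like the `$c` output shown above.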

how to get adb

  1. Solaris - it comes with the base OS

Solaris pstack instructions

Use the pstack command against the coredump. pstack is part of the base operating system, so it does not have to be installed separately. This is the recommended way to get backtraces on Solaris, especially when Sun's dbx tool is not available, since pstack can display function arguments for programs built without symbolic information (like official product builds) whereas gdb can't. Also, there have been circumstances where gdb didn't display the complete backtrace for a segfaulting thread but pstack did.

Note that pstack doesn't know how many arguments a function takes, so it always displays six. If you know that some function has only two arguments, ignore whatever pstack displays after the second argument.
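When you know how many arguments a function really takes, the extra values can be dropped mechanically. A minimal Python sketch, assuming each frame line has the layout used by pstack (address, function name, parenthesized arguments, optional `+ offset`); the regular expression and the `trim_frame` name belong to this sketch, not to pstack.

```python
import re

# One pstack frame line: address, function name, (arguments), optional "+ offset".
FRAME_RE = re.compile(
    r"^\s*(?P<addr>[0-9a-f]+)\s+(?P<func>\S+)\s*\((?P<args>[^)]*)\)"
    r"\s*(?:\+\s*(?P<off>[0-9a-f]+))?"
)


def trim_frame(line, nargs):
    """Return 'func(arg1, ..., argN) + off', keeping only the first nargs arguments.

    Returns None for lines that are not frames (such as thread headers).
    """
    m = FRAME_RE.match(line)
    if m is None:
        return None
    args = [a.strip() for a in m.group("args").split(",")][:nargs]
    text = f"{m.group('func')}({', '.join(args)})"
    if m.group("off"):
        text += f" + {m.group('off')}"
    return text
```

For the `main` frame in the example that follows, `trim_frame(line, 2)` keeps `1` and `ffbeef94` and discards the other four values.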

Example:

    # pstack core.httpd.1008
    core 'core.httpd.1008' of 1008: /opt/IBMHTTPD/bin/httpd
    -----------------  lwp# 1 / thread# 1  --------------------
     0002e3e8 ???????? (ffbeee7c, 1425, d, a16f0, 82b68, 9b098)
     00031188 main     (1, ffbeef94, 96408, ff238018, ff23b03c, 82cf0) + 478
     00031bec parse_byterange (1, ffbeef94, ffbeef9c, 96000, 0, 0) + 484
     00017308 load_module (0, 0, 0, 0, 0, 0) + 140
    -----------------  lwp# 2 / thread# 2  --------------------
     ff21ad54 _signotifywait (ff16e000, 0, 0, ff23b540, 0, 0) + 8
     ff151ae4 thr_yield (0, 0, 0, 0, 0, 0) + 8c
    -----------------  lwp# 3 / thread# 3  --------------------
     ff21b3e0 _lwp_sema_wait (fe30de30, ff16e000, 0, fe30dd78, 250c4, 0) + c
     ff14944c _swtch   (fe30dd78, fe30dd78, ff16e000, 5, 1000, 1) + 424
     ff14d8a4 _reap_wait (ff172a08, 20a38, 0, ff16e000, 0, 0) + 38
     ff14d5fc _reaper  (ff16ee30, ff255d18, ff172a08, ff16ee08, 0, fe400000) + 38
     ff15ba1c _thread_start (0, 0, 0, 0, 0, 0) + 40
    #

The pstack command automatically displays the backtrace for each thread.
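For a core with many threads, it can help to split the pstack output into one list of frames per thread, for example to look at a single thread or to compare stacks. A minimal Python sketch, assuming the `lwp# N / thread# N` header lines shown above and that each frame fits on one line (a frame wrapped across two lines would come out as two entries); `split_threads` is a hypothetical name. Its input would be the saved output of a command such as `pstack core.httpd.1008 > pstack.txt`.

```python
import re

# Header line that starts each thread's backtrace in pstack output.
HEADER_RE = re.compile(r"lwp#\s*(\d+)\s*/\s*thread#\s*(\d+)")


def split_threads(pstack_text):
    """Map thread number -> list of frame lines (stripped), in pstack order."""
    threads = {}
    current = None
    for line in pstack_text.splitlines():
        header = HEADER_RE.search(line)
        if header:
            current = int(header.group(2))
            threads[current] = []
        elif current is not None and line.strip():
            # Lines before the first header (the "core ... of ..." line) are skipped.
            threads[current].append(line.strip())
    return threads
```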

how to find out which thread did the dirty deed

The Solaris pflags command displays information about the various threads in a coredump or live process. Here is some output showing how it labels the thread that did something bad:

    $ pflags core
    core 'core' of 20897:   /export/home/trawick/ph/2.0.42/built/bin/httpd -k start
            data model = _ILP32
      /1:   flags = PR_PCINVAL
            sigmask = 0xffffbefc,0x00001fff  cursig = SIGSEGV
      /2:   flags = PR_STOPPED|PR_ASLWP
            why = PR_SUSPENDED
            sigmask = 0xffbffeff,0x00001fff
      /5:   flags = PR_STOPPED
            why = PR_SUSPENDED
      /4:   flags = PR_STOPPED
            why = PR_SUSPENDED
      /6:   flags = PR_STOPPED
            why = PR_SUSPENDED
      /7:   flags = PR_STOPPED
            why = PR_SUSPENDED
    (rest of output omitted)

Note that thread 1 has cursig = SIGSEGV next to it. That flag marks the thread that Solaris thinks did the dirty deed, and it is often correct. (For other types of problems, cursig may show SIGILL, SIGBUS, SIGABND, or something else.)
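With many threads, scanning pflags output for `cursig` by eye is error-prone. A minimal Python sketch that reports every `/N:` entry with a `cursig` value, assuming the layout shown above (an `/N:` line followed by indented detail lines). The `/N:` label identifies an LWP, which can be matched against the `lwp#` headers in pstack output. The function name is hypothetical.

```python
import re

LWP_RE = re.compile(r"^\s*/(\d+):")             # start of one LWP's entry
CURSIG_RE = re.compile(r"cursig\s*=\s*(SIG\w+)")  # current signal, if any


def faulting_threads(pflags_text):
    """Return [(lwp_number, signal_name)] for every LWP that has a cursig."""
    found = []
    current = None
    for line in pflags_text.splitlines():
        lwp = LWP_RE.match(line)
        if lwp:
            current = int(lwp.group(1))
        # cursig may be on the "/N:" line itself or on a following detail line.
        sig = CURSIG_RE.search(line)
        if sig and current is not None:
            found.append((current, sig.group(1)))
    return found
```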