Getting a backtrace from a coredump¶
The best way to get this information from a core dump is by using the ServerDoc tool, described here. Unless there is a problem running the automated tool, that should be used instead of these manual steps.
In rare circumstances it can be useful to analyze a coredump on a different machine, but in general it is best to try to analyze the coredump on the machine that generated it. The reason? Only that machine will have the same libraries and library versions, and when the coredump is analyzed elsewhere the analysis tools can give misleading information or refuse to work at all without the same libraries.
instructions for various tools¶
gdb
instructions¶
# gdb /opt/IBMHTTPD/bin/httpd /tmp/core.13587
(gdb) where
(gdb) thread apply all bt
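When the backtrace has to be captured to a file rather than read interactively, the same two gdb commands can be run non-interactively. The following is only a sketch: it assumes gdb is on the PATH and uses gdb's standard `-batch` and `-ex` options. The function names are made up for illustration, and the paths are the ones from the example above.

```python
import subprocess


def gdb_backtrace_command(executable, core_file):
    """Build a gdb command line that prints the backtraces and then exits."""
    return [
        "gdb",
        "-batch",                      # run the -ex commands, then exit
        "-ex", "where",                # backtrace of the current thread
        "-ex", "thread apply all bt",  # backtrace of every thread
        executable,
        core_file,
    ]


def collect_backtrace(executable, core_file):
    """Run gdb and return its combined stdout and stderr as text."""
    result = subprocess.run(
        gdb_backtrace_command(executable, core_file),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

Calling `collect_backtrace("/opt/IBMHTTPD/bin/httpd", "/tmp/core.13587")` should return roughly the text an interactive session would show, which can then be saved and attached to a problem report.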
how to get gdb for various platforms¶
Linux - just install it from your installation CD... All Linux distributions provide gdb
Solaris - sunfreeware.com, Sun
HP-UX - HP WDB - debugger based on gdb
dbx
instructions¶
Example of dbx on AIX (Solaris is similar):
# dbx /usr/HTTPServer/bin/httpd /usr/HTTPServer/core
Type 'help' for help.
warning: The core file is truncated.  You may need to increase the ulimit
for file and coredump, or free some space on the filesystem.
reading symbolic information ...
[using memory image in /usr/HTTPServer/core]
Segmentation fault in sig_coredump at 0x10003f3c ($t1)
0x10003f3c (sig_coredump+0x3c) 80410014 lwz r2,0x14(r1)
(dbx) where
sig_coredump() at 0x10003f3c
wait_or_timeout() at 0x10004564
standalone_main() at 0x10001a00
main() at 0x1000134c
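The truncation warning in the transcript above points at the file size and core file size limits. As a rough aid, the sketch below prints both limits using Python's standard `resource` module. Note the assumption: it reports the limits of the process that runs the script, which match the web server's limits only if the server is started from the same environment. The helper names are illustrative.

```python
import resource


def describe_limit(value):
    """Render an rlimit value, which is a byte count or RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes" % value


def limits_report():
    """Return one line each for the core file and file size limits."""
    lines = []
    for label, which in (
        ("core file size", resource.RLIMIT_CORE),
        ("file size", resource.RLIMIT_FSIZE),
    ):
        soft, hard = resource.getrlimit(which)
        lines.append(
            "%s: soft=%s, hard=%s"
            % (label, describe_limit(soft), describe_limit(hard))
        )
    return lines


for line in limits_report():
    print(line)
```

A soft limit smaller than the size of the server's memory image is a likely cause of a truncated core.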
Important note about an AIX limitation¶
Please beware of a limitation of analyzing illegal instruction (SIGILL) coredumps on AIX. No backtrace is possible. Here is a typical encounter with this limitation:
# dbx /opt/HTTPServer/bin/httpd /IBPP/logs/core
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g
[using memory image in /IBPP/logs/core]
Illegal instruction (illegal opcode) in sig_coredump at 0x10003ee0 ($t1)
0x10003ee0 (sig_coredump+0x3c) 80410014 lwz r2,0x14(r1)
(dbx) where
sig_coredump() at 0x10003ee0
warning: Unable to access address 0x0 from core
warning: could not locate trace table from starting address 0x0
(dbx)
In the event of an illegal instruction (SIGILL) coredump on AIX, it is best to send the actual core dump file, along with the dbx output, to IBM, where some information can be extracted. This is very problematic, and the best we can expect is to get some hints about which module might have crashed.
how to get dbx for various platforms¶
AIX - it is part of the base operating system; just make sure the fileset bos.adt.debug is installed
Solaris - it comes with Sun's C/C++ development product (Forte), which is an extra-cost product (so use pstack if the target system doesn't have this installed)
Linux and HP-UX - not available; use gdb
Solaris adb
instructions¶
Example:
# adb /opt/IBMHTTPD/bin/httpd /opt/IBMHTTPD/core
core file = /opt/IBMHTTPD/core -- program ``httpd.emerson'' on -->
-- platform SUNW,Ultra-250
SIGBUS: Bus Error
$c
send_silly(18e938,fe4f0e81,ffffffff,7efefeff,79,79) + f4
ap_invoke_handler(18e938,fde20148,0,0,6,6) + 174
process_request_internal(18e938,1,40,ffbef99c,4,1) + 61c
ap_process_request(18e938,4,18e938,ffbefa24,ffbefa34,2) + 30
child_main(2,2ce00,ff37f6a8,ff37e000,0,0) + 720
make_child(9e7a8,2,3ddea716,ffffffc0,10,fde7a0f4) + 158
startup_children(3,9e7a8,93d1c,9e7a8,80790,7a58c) + 88
standalone_main(1,ffbefc94,93d1c,ff23a000,ff23cfec,807d8) + 1dc
main(1,ffbefc94,ffbefc9c,93800,0,0) + 574
^D
#
Note that the command to get the backtrace is $c (a dollar sign followed by c). The command to get out is the EOF character, usually ^D.
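Because adb reads its commands from standard input and exits at end of file, the session above can in principle be scripted by piping `$c` to it. This is a minimal sketch under that assumption. The function names are hypothetical, and the paths are the ones from the example.

```python
import subprocess


def adb_backtrace_request(executable, core_file):
    """Return the adb argument list and the text to feed it on stdin."""
    argv = ["adb", executable, core_file]
    stdin_text = "$c\n"  # $c prints the backtrace; end of input then exits adb
    return argv, stdin_text


def collect_adb_backtrace(executable, core_file):
    """Run adb with $c on stdin and return everything it printed."""
    argv, stdin_text = adb_backtrace_request(executable, core_file)
    result = subprocess.run(
        argv,
        input=stdin_text,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout
```

Closing standard input after the single command plays the role of typing ^D in the interactive session.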
how to get adb¶
Solaris - it comes with the base OS
Solaris pstack
instructions¶
Use the pstack command against the coredump. pstack is part of the base operating system, so it does not have to be installed separately. This is the recommended way to get backtraces on Solaris, especially when Sun's dbx tool is not available, since pstack can display function arguments for programs built without symbolic information (like official product builds) whereas gdb can't. Also, there have been circumstances where gdb didn't display the complete backtrace for a segfaulting thread but pstack did.
Note that pstack doesn't know how many arguments there are, so it always displays six. So if you know that some function has only two arguments, ignore whatever pstack displays after the second argument.
Example:
# pstack core.httpd.1008
core 'core.httpd.1008' of 1008: /opt/IBMHTTPD/bin/httpd
----------------- lwp# 1 / thread# 1 --------------------
0002e3e8 ???????? (ffbeee7c, 1425, d, a16f0, 82b68, 9b098)
00031188 main (1, ffbeef94, 96408, ff238018, ff23b03c, 82cf0) + 478
00031bec parse_byterange (1, ffbeef94, ffbeef9c, 96000, 0, 0) + 484
00017308 load_module (0, 0, 0, 0, 0, 0) + 140
----------------- lwp# 2 / thread# 2 --------------------
ff21ad54 _signotifywait (ff16e000, 0, 0, ff23b540, 0, 0) + 8
ff151ae4 thr_yield (0, 0, 0, 0, 0, 0) + 8c
----------------- lwp# 3 / thread# 3 --------------------
ff21b3e0 _lwp_sema_wait (fe30de30, ff16e000, 0, fe30dd78, 250c4, 0) + c
ff14944c _swtch (fe30dd78, fe30dd78, ff16e000, 5, 1000, 1) + 424
ff14d8a4 _reap_wait (ff172a08, 20a38, 0, ff16e000, 0, 0) + 38
ff14d5fc _reaper (ff16ee30, ff255d18, ff172a08, ff16ee08, 0, fe400000) + 38
ff15ba1c _thread_start (0, 0, 0, 0, 0, 0) + 40
#
The pstack command automatically displays the backtrace for each thread.
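Since pstack always prints six argument slots, it can help to trim a frame down to the arguments a function really takes before reading the values. The sketch below does that for one line of pstack output. The caller has to supply the real argument counts, because pstack cannot know them. The count of two used for main here is an assumption made for illustration, and the frame layout is inferred from the sample output above.

```python
import re

# One pstack frame: address, function name, "(arguments)", optional "+ offset".
FRAME = re.compile(r"^\s*([0-9a-f]+)\s+(\S+)\s+\(([^)]*)\)(.*)$")


def trim_frame(line, known_arg_counts):
    """Drop arguments beyond a function's known count; leave other lines alone."""
    match = FRAME.match(line)
    if not match:
        return line  # thread separators and the header line pass through
    address, name, args, rest = match.groups()
    if name not in known_arg_counts:
        return line  # argument count unknown, so keep all six slots
    kept = [arg.strip() for arg in args.split(",")][: known_arg_counts[name]]
    return "%s %s (%s)%s" % (address, name, ", ".join(kept), rest)


frame = "00031188 main (1, ffbeef94, 96408, ff238018, ff23b03c, 82cf0) + 478"
print(trim_frame(frame, {"main": 2}))
# prints: 00031188 main (1, ffbeef94) + 478
```

Frames for functions that are not in the dictionary are returned unchanged, so nothing is hidden unless the argument count is actually known.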
how to find out which thread did the dirty deed¶
The Solaris pflags command displays information about the various threads in a coredump or live process. Here is some output showing how it labels the thread that did something bad:
$ pflags core
core 'core' of 20897: /export/home/trawick/ph/2.0.42/built/bin/httpd -k start
data model = _ILP32
/1: flags = PR_PCINVAL
sigmask = 0xffffbefc,0x00001fff cursig = SIGSEGV
/2: flags = PR_STOPPED|PR_ASLWP
why = PR_SUSPENDED
sigmask = 0xffbffeff,0x00001fff
/5: flags = PR_STOPPED
why = PR_SUSPENDED
/4: flags = PR_STOPPED
why = PR_SUSPENDED
/6: flags = PR_STOPPED
why = PR_SUSPENDED
/7: flags = PR_STOPPED
why = PR_SUSPENDED
(rest of output omitted)
Note that thread 1 has cursig = SIGSEGV next to it. That is the thread that Solaris thinks did the dirty deed. This is often correct. (Note: For other types of problems it may say SIGILL or SIGBUS or SIGABRT or something else.)
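With many threads, spotting the cursig entry by eye gets tedious. The following sketch scans saved pflags output and reports which thread carries a cursig. It assumes the layout shown in the transcript above (a `/N:` line starting each thread, with `cursig = ...` on a later line of the same thread), and real output may differ between Solaris releases. The function name is illustrative.

```python
import re

LWP_HEADER = re.compile(r"^\s*/(\d+):")      # e.g. "/1: flags = PR_PCINVAL"
CURSIG = re.compile(r"cursig\s*=\s*(\w+)")   # e.g. "cursig = SIGSEGV"


def threads_with_signal(pflags_output):
    """Return (thread id, signal name) pairs for each thread showing a cursig."""
    found = []
    current = None
    for line in pflags_output.splitlines():
        header = LWP_HEADER.match(line)
        if header:
            current = int(header.group(1))
        signal = CURSIG.search(line)
        if signal and current is not None:
            found.append((current, signal.group(1)))
    return found


# Sample copied from the first lines of the pflags transcript above.
SAMPLE = """\
core 'core' of 20897: /export/home/trawick/ph/2.0.42/built/bin/httpd -k start
data model = _ILP32
/1: flags = PR_PCINVAL
sigmask = 0xffffbefc,0x00001fff cursig = SIGSEGV
/2: flags = PR_STOPPED|PR_ASLWP
why = PR_SUSPENDED
sigmask = 0xffbffeff,0x00001fff
"""

print(threads_with_signal(SAMPLE))
# prints: [(1, 'SIGSEGV')]
```

The thread number it reports is the one to look up in the pstack output for the same core.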