# Getting a backtrace from a coredump

The best way to get this information from a core dump is by using the ServerDoc tool, described here. Unless there is a problem running the automated tool, that should be used instead of these manual steps.

In rare circumstances it can be useful to analyze a coredump on a different machine, but in general it is best to analyze the coredump on the machine that generated it. Only that machine is certain to have the same libraries and library versions. Without them, the analysis tools can give misleading information or refuse to work at all.

## instructions for various tools

### `gdb` instructions

```
# gdb /opt/IBMHTTPD/bin/httpd /tmp/core.13587
(gdb) where
(gdb) thread apply all bt
```

#### how to get `gdb` for various platforms

- Linux - install it from your installation CD; all Linux distributions provide `gdb`
- Solaris - [sunfreeware.com](http://www.sunfreeware.com), [Sun](http://www.sun.com/software/solaris/freeware.html)
- AIX - [AIX toolbox for Linux applications](http://www-1.ibm.com/servers/aix/products/aixos/linux/)
- HP-UX - [HP WDB - debugger based on gdb](http://h21007.www2.hp.com/dspp/tech/tech_TechSoftwareDetailPage_IDX/1,1703,1662,00.html)

### `dbx` instructions

Example of dbx on AIX (Solaris is similar):

```
# dbx /usr/HTTPServer/bin/httpd /usr/HTTPServer/core
Type 'help' for help.
warning: The core file is truncated. You may need to increase the ulimit
for file and coredump, or free some space on the filesystem.
reading symbolic information ...
[using memory image in /usr/HTTPServer/core]
Segmentation fault in sig_coredump at 0x10003f3c ($t1)
0x10003f3c (sig_coredump+0x3c) 80410014 lwz r2,0x14(r1)
(dbx) where
sig_coredump() at 0x10003f3c
wait_or_timeout() at 0x10004564
standalone_main() at 0x10001a00
main() at 0x1000134c
```

#### **Important note about an AIX limitation**

Be aware of a limitation when analyzing illegal instruction (SIGILL) coredumps on AIX: no backtrace is possible. Here is a typical encounter with this limitation:

```
# dbx /opt/HTTPServer/bin/httpd /IBPP/logs/core
Type 'help' for help.
reading symbolic information ...warning: no source compiled with -g
[using memory image in /IBPP/logs/core]
Illegal instruction (illegal opcode) in sig_coredump at 0x10003ee0 ($t1)
0x10003ee0 (sig_coredump+0x3c) 80410014 lwz r2,0x14(r1)
(dbx) where
sig_coredump() at 0x10003ee0
warning: Unable to access address 0x0 from core
warning: could not locate trace table from starting address 0x0
(dbx)
```

In the event of an *illegal instruction* (SIGILL) coredump on AIX, it is best to send the actual core dump file, along with the dbx output, to IBM, where some information can be extracted. This is very problematic, and the best we can expect is to get some hints about which module might have crashed.

#### how to get dbx for various platforms

1. AIX - it is part of the base operating system; just make sure the fileset `bos.adt.debug` is installed
2. Solaris - it comes with Sun's C/C++ development product (Forte), which is an extra-cost product (so use [`pstack`](#pstack) if the target system doesn't have this installed)
3. Linux and HP-UX - not available; use `gdb`

### Solaris `adb` instructions

Example:

```
# adb /opt/IBMHTTPD/bin/httpd /opt/IBMHTTPD/core
core file = /opt/IBMHTTPD/core -- program ``httpd.emerson'' on platform SUNW,Ultra-250
SIGBUS: Bus Error
$c
send_silly(18e938,fe4f0e81,ffffffff,7efefeff,79,79) + f4
ap_invoke_handler(18e938,fde20148,0,0,6,6) + 174
process_request_internal(18e938,1,40,ffbef99c,4,1) + 61c
ap_process_request(18e938,4,18e938,ffbefa24,ffbefa34,2) + 30
child_main(2,2ce00,ff37f6a8,ff37e000,0,0) + 720
make_child(9e7a8,2,3ddea716,ffffffc0,10,fde7a0f4) + 158
startup_children(3,9e7a8,93d1c,9e7a8,80790,7a58c) + 88
standalone_main(1,ffbefc94,93d1c,ff23a000,ff23cfec,807d8) + 1dc
main(1,ffbefc94,ffbefc9c,93800,0,0) + 574
^D
#
```

Note that the command to get the backtrace is `$c` - dollar sign followed by c. The command to exit is the end-of-file character, usually `^D`.

#### how to get `adb`

1. Solaris - it comes with the base OS

### Solaris `pstack` instructions {#pstack}

Use the `pstack` command against the coredump. `pstack` is part of the base operating system, so it does not have to be installed separately.

This is the recommended way to get backtraces on Solaris, especially when Sun's `dbx` tool is not available, since `pstack` can display function arguments for programs built without symbolic information (like official product builds) whereas `gdb` can't. Also, there have been circumstances where `gdb` didn't display the complete backtrace for a segfaulting thread but `pstack` did.

Note that `pstack` doesn't know how many arguments a function takes, so it always displays six. If you know that some function has only two arguments, ignore whatever `pstack` displays after the second argument.

Example:

```
# pstack core.httpd.1008
core 'core.httpd.1008' of 1008: /opt/IBMHTTPD/bin/httpd
----------------- lwp# 1 / thread# 1 --------------------
 0002e3e8 ???????? (ffbeee7c, 1425, d, a16f0, 82b68, 9b098)
 00031188 main (1, ffbeef94, 96408, ff238018, ff23b03c, 82cf0) + 478
 00031bec parse_byterange (1, ffbeef94, ffbeef9c, 96000, 0, 0) + 484
 00017308 load_module (0, 0, 0, 0, 0, 0) + 140
----------------- lwp# 2 / thread# 2 --------------------
 ff21ad54 _signotifywait (ff16e000, 0, 0, ff23b540, 0, 0) + 8
 ff151ae4 thr_yield (0, 0, 0, 0, 0, 0) + 8c
----------------- lwp# 3 / thread# 3 --------------------
 ff21b3e0 _lwp_sema_wait (fe30de30, ff16e000, 0, fe30dd78, 250c4, 0) + c
 ff14944c _swtch (fe30dd78, fe30dd78, ff16e000, 5, 1000, 1) + 424
 ff14d8a4 _reap_wait (ff172a08, 20a38, 0, ff16e000, 0, 0) + 38
 ff14d5fc _reaper (ff16ee30, ff255d18, ff172a08, ff16ee08, 0, fe400000) + 38
 ff15ba1c _thread_start (0, 0, 0, 0, 0, 0) + 40
#
```

The `pstack` command automatically displays the backtrace for each thread.

#### how to find out which thread did the dirty deed

The Solaris `pflags` command displays information about the various threads in a coredump or live process. Here is some output showing how it labels the thread that did something bad:

```
$ pflags core
core 'core' of 20897: /export/home/trawick/ph/2.0.42/built/bin/httpd -k start
    data model = _ILP32
/1: flags = PR_PCINVAL
    sigmask = 0xffffbefc,0x00001fff cursig = SIGSEGV
/2: flags = PR_STOPPED|PR_ASLWP
    why = PR_SUSPENDED
    sigmask = 0xffbffeff,0x00001fff
/5: flags = PR_STOPPED
    why = PR_SUSPENDED
/4: flags = PR_STOPPED
    why = PR_SUSPENDED
/6: flags = PR_STOPPED
    why = PR_SUSPENDED
/7: flags = PR_STOPPED
    why = PR_SUSPENDED
(rest of output omitted)
```

Note that thread 1 has *cursig = SIGSEGV* next to it. That is the thread that Solaris thinks did the dirty deed. This is often correct. (Note: For other types of problems it may say *SIGILL* or *SIGBUS* or *SIGABRT* or something else.)
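
When a process has many threads, scanning the `pflags` output by eye for the *cursig* label gets tedious. The following is a minimal sketch of a filter that does the search, not an official tool. The function name `find_faulting_lwp` is made up for illustration. The sketch assumes the output is laid out like the sample above, with each `/N:` entry starting a thread and `cursig = ...` appearing somewhere in that entry. It also assumes an `awk` that supports `match()`; on Solaris that probably means `nawk` or `/usr/xpg4/bin/awk` rather than the older `/usr/bin/awk`.

```shell
# Hypothetical helper: read pflags output on stdin and print every
# thread (LWP) that has a "cursig = ..." label.
# Assumed usage on a Solaris system:   pflags core | find_faulting_lwp
find_faulting_lwp() {
    awk '
        # A line such as "/1: flags = ..." starts the entry for a new LWP.
        match($0, /^[ \t]*\/[0-9]+:/) {
            lwp = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9]/, "", lwp)      # keep only the LWP number
        }
        # "cursig = SIGxxx" may sit on the same line or on a later line
        # of the same entry; report it against the most recent LWP seen.
        match($0, /cursig = [A-Z0-9]+/) {
            # "cursig = " is 9 characters long; skip it to get the name.
            print "lwp " lwp ": " substr($0, RSTART + 9, RLENGTH - 9)
        }
    '
}
```

The LWP number it reports can then be matched against the `lwp# N` headers in the `pstack` output to pick the backtrace worth reading first. As noted above, the label is often but not always right, so treat the result as a starting point.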