Troubleshooting z/OS

z/OS often writes dates in the form YY.DDD; for example, 09.210. Here, 09 is the last two digits of the year and 210 is the day of the year, so 09.210 is the 210th day of 2009: July 29, 2009.
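The conversion can be sketched in Python. This is a minimal sketch, not a z/OS tool: the function name parse_zos_date is hypothetical, and it assumes the two-digit year follows Python's %y convention (00-68 map to 2000-2068, 69-99 to 1969-1999).

```python
from datetime import date, datetime


def parse_zos_date(stamp: str) -> date:
    """Convert a YY.DDD stamp (two-digit year, day of year) to a calendar date."""
    # %y parses the two-digit year; %j parses the three-digit day of the year.
    return datetime.strptime(stamp, "%y.%j").date()


print(parse_zos_date("09.210"))  # 2009-07-29
```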

Signals

Available signals may be listed with kill -l:

$ kill -l
 1) SIGHUP      2) SIGINT    3) SIGABRT   4) SIGILL      5) SIGPOLL
 6) SIGURG      7) SIGSTOP   8) SIGFPE    9) SIGKILL    10) SIGBUS
11) SIGSEGV    12) SIGSYS   13) SIGPIPE  14) SIGALRM    15) SIGTERM
16) SIGUSR1    17) SIGUSR2  19) SIGCONT  20) SIGCHLD    21) SIGTTIN
22) SIGTTOU    23) SIGIO    24) SIGQUIT  25) SIGTSTP    26) SIGTRAP
28) SIGWINCH   29) SIGXCPU  30) SIGXFSZ  31) SIGVTALRM  32) SIGPROF
33) SIGDANGER
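
Signal numbers in this listing differ from those on other platforms (for example, SIGABRT is 3 here, whereas Linux uses 6), so it is safer to look signals up by name. The following is a minimal sketch that parses the listing above into a name-to-number table; the function name parse_kill_l is hypothetical.

```python
import re

# Listing copied from the `kill -l` output shown above.
KILL_L_OUTPUT = """\
 1) SIGHUP      2) SIGINT    3) SIGABRT   4) SIGILL      5) SIGPOLL
 6) SIGURG      7) SIGSTOP   8) SIGFPE    9) SIGKILL    10) SIGBUS
11) SIGSEGV    12) SIGSYS   13) SIGPIPE  14) SIGALRM    15) SIGTERM
16) SIGUSR1    17) SIGUSR2  19) SIGCONT  20) SIGCHLD    21) SIGTTIN
22) SIGTTOU    23) SIGIO    24) SIGQUIT  25) SIGTSTP    26) SIGTRAP
28) SIGWINCH   29) SIGXCPU  30) SIGXFSZ  31) SIGVTALRM  32) SIGPROF
33) SIGDANGER
"""


def parse_kill_l(text: str) -> dict:
    """Map each signal name to its number from `kill -l` style output."""
    # Each entry looks like "NN) SIGNAME"; entries are separated by whitespace.
    return {name: int(num) for num, name in re.findall(r"(\d+)\)\s+(SIG\w+)", text)}


signals = parse_kill_l(KILL_L_OUTPUT)
print(signals["SIGABRT"], signals["SIGQUIT"])
```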

Console Dump

A console dump is a dump of one or more address spaces. Console dumps generally also contain a system trace for the entire LPAR. The simplest console dump is of just the root address space (ASID 1):

  1. DUMP COMM=ASID1DMP
  2. Reply with dump options:
    R xx,ASID=1,SDATA=(ALLNUC,CSA,LPA,PSA,RGN,SQA,LSQA,TRT),END

High CPU

Review https://www.ibm.com/support/pages/mustgather-high-cpu-causing-hang-or-loop-running-zos

Sending messages to the MVS log and slip trapping on them

Messages may be sent to the joblog, the MVS log, or both. If messages are sent to the MVS log, then you can set SLIP traps on them to trigger dumps; however, be careful not to overload the MVS log with too many messages.

System Dumps

It's best to ensure a dump is produced with maximum memory for dbx (and IPCS) analysis. For example:

/CHNGDUMP SET,SYSMDUMP=(ALL,ALLNUC)
/CHNGDUMP SET,SDUMP,MAXSPACE=5000M
/DD ALLOC=ACTIVE

To display current dump options:

/DISPLAY DUMP,OPTIONS                                                  
IEE857I 13.12.22 DUMP OPTION 813                                      
  SYSABEND- ADD PARMLIB OPTIONS SDATA=(LSQA,TRT,CB,ENQ,DM,IO,ERR,SUM),
                       PDATA=(SA,REGS,LPA,JPA,PSW,SPLS)               
  SYSUDUMP- ADD PARMLIB OPTIONS SDATA=(SUM), NO PDATA OPTIONS         
  SYSMDUMP- ADD OPTIONS (NUC,SQA,LSQA,SWA,TRT,RGN,LPA,CSA,SUM,ALLNUC, 
                      GRSQ)                                           
  SDUMP- ADD NO OPTIONS,BUFFERS=00000000K,MAXSPACE=00005000M,         
                      MSGTIME=99999 MINUTES,MAXSNDSP=015 SECONDS,     
                      AUXMGMT=ON ,DEFERTND=NO ,OPTIMIZE=NO ,          
                      MAXTNDSP=(,,) SECONDS                           
  ABDUMP- TIMEENQ=0240 SECONDS                                        

IPCS

IPCS is the z/OS debugger used to analyze system dumps (similar to gdb or dbx on other operating systems). z/OS also has the dbx USS utility to investigate system dumps produced by C/C++ programs in a similar way to dbx/gdb on other platforms.

In IPCS, first go to 0 DEFAULTS, set the source data set, and press Enter. For example:

Source  ==> DSNAME('ASSR1.JVM.BBOS001S.D210125.T211001.X001')

Then press F3, go to 6 COMMAND, type ip st, and press Enter. If IPCS asks whether you want to use summary data, type Y and press Enter. The dump should now be initialized. Press F8 to page down through details about the dump, then press F3 to go back and enter various commands:

  1. General status report: IP ST
    • Local time of the dump at the top
    • Program Producing Dump: ...
    • LPAR name follows SNAME (NN)
  2. Dump request information: IP LIST TITLE
  3. If produced by a SLIP, list SLIP info: IP LIST SLIPTRAP
  4. z/OS version: IP CBF CVT
    • Search for PRODI.... HBB77C0
  5. ASIDs dumped: IP CBF RTCT
    • ASIDs dumped in the SDAS column under ASTB
  6. ASID to JOBNAME translation: IP SELECT ALL
  7. Switch ASIDs: IP SELECT ASID(x'nn') or IP SELECT JOB(jobname)
  8. Potential abend information: IP ST FAILDATA
  9. History of abends: IP VERBX LOGDATA
  10. Show MVS console log: IP VERBX MTRACE
  11. Show native TCB thread stacks: ip verbx ledata 'nthreads(*)'
  12. Traceback for the specified TCB: ip verbx ledata 'ceedump asid(188) tcb(0098CA48)'
  13. List thread TCBs: IP SUMM FORMAT
    • f "T C B S U M M A R Y"
    • Non-zero code in the CMP column is the abend code
  14. USS thread status (requires USS kernel address space): ip omvsdata process detail
  15. Display memory: IP L 07208CE0 ASID(X'65') L(X'60')
  16. Display memory as instructions: IP L 07208CE0 ASID(X'65') L(X'60') I
  17. Show system trace: IP SYSTRACE ALL TIME(LOCAL)
    • Search for RCVY for processing error
  18. Show system trace for a particular ASID: IP SYSTRACE ASID(x'nn') TIME(LOCAL)
  19. Show system trace for a particular ASID and TCB: IP SYSTRACE ASID(x'0188') TCB(x'0098CA48') TIME(LOCAL)
  20. Memory usage report: ip verbx vsmdata 'summary noglobal'
  21. Review captured CPU information by ASID: IP SYSTRACE PERFDATA
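
Several commands above take the ASID in hexadecimal (x'nn'), whereas the VSMDATA command described in the VSMDATA section takes it in decimal. The following is a minimal sketch of a helper that produces consistent command strings from one hexadecimal ASID. The function name ipcs_commands is hypothetical; the command formats are copied from this page.

```python
def ipcs_commands(asid_hex: str) -> list:
    """Build IPCS commands for one address space, given its ASID in hex."""
    asid = int(asid_hex, 16)  # e.g. "65" -> 101
    return [
        f"IP SELECT ASID(x'{asid:X}')",                 # hexadecimal ASID
        f"IP SYSTRACE ASID(x'{asid:X}') TIME(LOCAL)",   # hexadecimal ASID
        f"IP VERBX VSMDATA 'ASID({asid}) NOG SUM'",     # decimal ASID
    ]


for command in ipcs_commands("65"):
    print(command)
```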

VSMDATA

The VSMDATA command IP VERBX VSMDATA 'ASID(NN) NOG SUM' (specifying the ASID in decimal) displays a summary of LE native memory usage below the 2GB bar. The "User Region" is effectively the native heap (more precisely, it is the LE heap, which components in the process other than the JVM may also use). Subtracting "Ext. User Region Start" from "Ext. User Region Top" gives roughly how much native heap is in use below the 2GB bar. If "Ext. User Region Top" is very close to the 2GB bar, then below-the-bar native memory exhaustion is the likely cause of any native OutOfMemoryErrors (NOOMs). In the following example, about 0x71346000 - 0x1F300000 = 0x52046000 (1.28GB) of native memory is used below the bar and the top is very close to the 2GB bar, so this was a below-the-bar exhaustion NOOM with compressed references.

    LOCAL STORAGE MAP                                                 
 ___________________________                                          
|                           |80000000  <- Top of Ext. Private         
| Extended                  |                                         
| LSQA/SWA/229/230          |7F600000  <- Max Ext. User Region Address
|___________________________|71367000  <- ELSQA Bottom                
|                           |                                         
| (Free Extended Storage)   |                                         
|___________________________|71346000  <- Ext. User Region Top        
|                           |                                         
| Extended User Region      |                                         
|___________________________|1F300000  <- Ext. User Region Start      
:                           :                                         
: Extended Global Storage   :                                         
=======================================<- 16M Line                    
: Global Storage            :                                         
:___________________________:  A00000  <- Top of Private              
|                           |                                         
| LSQA/SWA/229/230          |  986000  <- Max User Region Address     
|___________________________|  931000  <- LSQA Bottom                 
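
The arithmetic on the map values can be checked with a short Python sketch. The constants are copied from the storage map above; the variable names are illustrative. In addition to the used size, it computes the "(Free Extended Storage)" band between "Ext. User Region Top" and "ELSQA Bottom", which is the room left for the extended user region to grow in this example.

```python
# Values copied from the LOCAL STORAGE MAP above.
EXT_USER_REGION_START = 0x1F300000
EXT_USER_REGION_TOP = 0x71346000
ELSQA_BOTTOM = 0x71367000

# Native memory in use in the extended user region (below the 2GB bar).
used = EXT_USER_REGION_TOP - EXT_USER_REGION_START

# Free extended storage between the user region top and the ELSQA bottom.
free_gap = ELSQA_BOTTOM - EXT_USER_REGION_TOP

print(f"Extended user region in use: {used:#x} ({used / 2**30:.2f} GiB)")
print(f"Free extended storage: {free_gap:#x} ({free_gap // 1024} KiB)")
```

With these values, the region in use is 0x52046000 bytes (about 1.28 GiB) and only 0x21000 bytes (132 KiB) remain free.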

C/C++

The C/C++ compilers on z/OS are provided by the XLC package. The USS utility c89 is often used to compile C programs and the USS utility c++ is often used to compile C++ programs. Note that for XLC c++:

Except for the -W, -D, and -U flag options, all flag options that are supported by the c89 utility are supported by the xlc utility with the same semantics.

As with other operating systems, it's generally advised to compile C/C++ programs with symbol information for serviceability purposes when diagnosing crashes. In general, most compiler optimizations are still performed with the -g1 option, so it is generally recommended. The c89 -g options map to the DEBUG compiler options. If -g1 provides insufficient information, try -g9, although it affects optimizations more.

Whether the compiler is run from USS or TSO impacts the default options used:

invoking the compiler with the c89 and xlc utilities overrides the default values for many options, compared to running the compiler in MVS batch or TSO

There is an example C++ program to test compilation: https://www.ibm.com/support/knowledgecenter/SSLTBW_2.4.0/com.ibm.zos.v2r4.cbcux01/cppshell.htm

If your C++ files use the .cpp extension, then run export _CXX_CXXSUFFIX=cpp.

dbx

The dbx USS utility is an alternative to using IPCS for C/C++ programs. Load the dump with the -C option and pass the data set name. For example:

$ dbx -C "//'SF.T00468.S3609.BOSS0030.DUMP1'"

Then use the where command to show the backtrace.

Common commands:

  • List address spaces: asid
  • List all threads: thread
  • Change the current thread: thread current N
  • List loaded shared libraries: map
  • List known processes: pid
  • Display registers: registers
  • Generate copy/paste commands for all threads:
    for i in $(seq 1 164); do echo "thread current ${i}"; echo where; echo "TRASH: THREAD ${i}"; done

LE Native Memory

Environment variables may be used to control memory pools and print memory statistics; for example:

export _CEE_RUNOPTS="$_CEE_RUNOPTS HEAPPOOLS(ON) HEAPPOOLS64(ON) RPTOPTS(ON) RPTSTG(ON)"