IBM i

IBM i product documentation: http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i/welcome

IBM Java on IBM i runs in PASE mode, so most of its behavior is the same as on AIX: http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzalf/rzalfwhatispase.htm?lang=en

IBM i Recipe

  1. CPU core(s) should not be consistently saturated.
  2. Generally, physical memory should never be saturated and the operating system should not page memory out to disk.
  3. Input/Output interfaces such as network cards and disks should not be saturated, and should not have poor response times.
  4. TCP/IP and network tuning, whilst sometimes complicated to investigate, may have dramatic effects on performance.
  5. Operating system level statistics and optionally process level statistics should be periodically monitored and saved for historical analysis.
  6. Review operating system logs for any errors, warnings, or high volumes of messages.
  7. Review snapshots of process activity, and for the largest users of resources, review per thread activity.
  8. If the operating system is running in a virtualized guest, review the configuration and whether or not resource allotments are changing dynamically.
  9. Enable Collection Services for performance data.
  10. If there is sufficient network capacity for the additional packets, consider reducing the default TCP keepalive timer (CHGTCPA TCPKEEPALV) from 2 hours to a value less than intermediate device idle timeouts (e.g. firewalls).
  11. Test disabling delayed ACKs

Also review the general topics in the Operating Systems chapter.

Central Processing Unit (CPU)

IBM Systems Workload Estimator and processStats: https://www-912.ibm.com/estimator

WRKSYSSTS: http://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.iseries.doc/ae/tprf_tunehdwcap.html

Use Collection Services performance data to gather detailed performance information: http://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/tprf_collectionservices.html

IBM iDoctor for IBM i PEX Analyzer

  • Simplifies the collection and enhances the analysis of all types of PEX data, which includes, PROFILE, STATS and TRACE data.
  • Provides the details necessary for the low-level analysis of processor utilization, DASD operations, file space usage, waits, file opens and much more.

IBM iDoctor for IBM i Job Watcher

  • Provides real-time, non-intrusive, detailed and summarized views of job/thread/task performance data.
  • It's the step to take to avoid a system wide trace or to ensure that a trace will yield useful data.
  • It's a super WRKSYSACT that displays both "running" and "waiting" components for a job.

More information on CPU and call stacks can be found using the iDoctor tools including Performance Explorer and Job Watcher: https://www-912.ibm.com/i_dir/idoctor.nsf and http://www-01.ibm.com/support/docview.wss?uid=nas8N1012932

OS CPU Profiling

Profiling the CPU on the IBM i can be done on a global or individual job (JVM) basis. This is used with the Performance Explorer (PEX) tool. The process to gather the data is as follows:

1. Add the PEX definition with the *PMCO events (http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzaha/profperf.htm?lang=en):

ADDPEXDFN DFN(JVMCPU) TYPE(*TRACE) JOB((*ALL/*ALL/JVMNAME *ALL)) MAXSTG(1000000) INTERVAL(1) TRCTYPE(*SLTEVT) SLTEVT(*YES) BASEVT((*PMCO *NONE *FORMAT2))

2. Gather the data using the above PEX definition:

STRPEX SSNID(TRACE1) DFN(JVMCPU)

3. Wait 5-10 minutes while the JVM is using high CPU, and then end the collection:

ENDPEX SSNID(TRACE1) DTALIB(QPEXDATA)

4. Print the PEX report, first by program, and next by statement:

PRTPEXRPT MBR(TRACE1) LIB(QPEXDATA) TYPE(*PROFILE) PROFILEOPT(*SAMPLECOUNT *PROGRAM)
PRTPEXRPT MBR(TRACE1) LIB(QPEXDATA) TYPE(*PROFILE) PROFILEOPT(*SAMPLECOUN T *STATEMENT)

5. This produces two spool files to show the breakout of CPU. Here is a histogram showing the breakdown:

All Jobs/Tasks CPU. . . :      41212922           
Jobs in Collection CPU  :      41212922           
Job CPU . . . . . . . . :      38813410    94.2 %
Task CPU. . . . . . . . :       2399512     5.8 %

 Task ID          Job/Task Name                 Pool Priority Existence    Elapsed Time (us)             CPU (us)    CPU %  
00000000000008E1 WQLWI7     QWEBQRYADM 976015     2      171   Y    Y            211403580             37479915     90.94     

 Cnt    %     %                                             
                                                          -
16116  23.1  23.1   libj9jit23.so MethodMetaData.c/getNextMap                                       
 4539   6.5  29.5   JITC/java/util/StringTokenizer.scanToken(I)I                         
 4081   5.8  35.4   libj9vm23.so strsup.s/copyCharsToUnicode
 3226   4.6  40.0   libj9jit23.so MethodMetaData.c/findMapsAtPC                                     
 2788   4.0  44.0   libj9jit23.so MethodMetaData.c/matchingRange                                    
 1975   2.8  46.8   JITC/java/io/PrintWriter.println(Ljava/lang/String;)V                
 1740   2.5  49.3   JITC/java/lang/Throwable.printStackTrace(Ljava/io/PrintWriter;)V     
 1605   2.3  51.6   libj9jit23.so ArrayCopy.s/__arrayCopy   
 1058   1.5  53.1   libj9gc23.so ScavengerWriteBarrier.cpp/J9WriteBarrierStore                      
 1049   1.5  54.6   libpthreads.a(shr_xpg5.o) pth_mutex.c/_mutex_lock                               
 1041   1.5  56.1   libj9vm23.so strsup.s/convertCharsToString                                      
 1002   1.4  57.5   libj9gc23.so j9memclr.c/J9ZeroMemory    
  996   1.4  59.0   JITC/ibi/webfoc/wfutil/WFTracestackLocation.vectorizeStringBuffer(LjZ)Ljava/util/Vector;
  904   1.3  60.2   libpthreads.a(shr_xpg5.o) pth_locks_ppc_mp.s/global_unlock_ppc_mp               
  869   1.2  61.5   libj9vm23.so optinfo.c/getLineNumberForROMClassFromROMMethod                    
  859   1.2  62.7   libj9gc23.so MarkingScheme.cpp/scanObject__16MM_MarkingSchemeFP14MM_EnvironmentP8J9Object
  623   0.9  63.6   libj9vm23.so segment.c/segmentSearchComparator                                  
  559   0.8  64.4   libpthreads.a(shr_xpg5.o) pth_locks_ppc_mp.s/global_lock_ppc_mp                 
  553   0.8  65.2   libj9vm23.so strsup.s/convertCharsToString                                      
  543   0.8  66.0   libj9vm23.so findmethod.c/findROMClassInSegment                                 
  522   0.7  66.7   libj9vm23.so mthutil.s/nextROMMethod    
  515   0.7  67.5   libj9vm23.so strsup.s/convertCharsToString   

Per thread CPU usage

Gathering per thread CPU usage can be done in a variety of ways. The best is to use the WRKJVMJOB command. Example:

1. WRKJVMJOB

2. Take option 11 to display threads. This shows the total CPU (seconds) for each thread.

  Job  . . . . . . :   TJH80EXP            PID  . . . . . . :   82839         
 User . . . . . . :   QEJBSVR             JDK  . . . . . . :   1.6.0         
 Number . . . . . :   946396              Bits . . . . . . :   32            
                                                                             
Type options, press Enter.                                                    
 10=Display call stack                                                       
                                                   Total         Aux         
    Thread      Name              Status            CPU          I/O         
    00000087    P=704863:O=0:CT   THDW            10.336        8277         
    0000008A    JIT Compilatio >  THDW            76.830         809         
    0000008B    JIT Compilatio >  THDW            67.357           6         
    0000008C    JIT Compilatio >  THDW            42.743           3         
    0000008E    IProfiler         THDW             4.275           0         
    0000008F    Signal Dispatc >  THDW            64.984           0         
    00000091    Concurrent Mar >  THDW             7.643        2790         
    00000092    GC Slave          THDW             3.263          44         
    00000093    GC Slave          THDW             3.172          38         
    00000094    GC Slave          THDW             3.665          46         
                                                                      More...

Another option would be to take option 13 instead of 11. This produces a spool file that can be displayed and sent to support.

Physical Memory (RAM)

Memory Pool Tuning and Faulting Guidelines: http://www-01.ibm.com/support/docview.wss?uid=nas8N1014941

Input/Output (I/O)

WRKDSKSTS shows the status of the disk drives. Look for "hot" drives indicating high %Busy. Units consistently above 30% busy will have slow IO response times.

                            Work with Disk Status                     RCHM199B
                                                            09/09/13  12:27:05
Elapsed time:   00:00:22                                                      
                                                                              
               Size    %     I/O   Request   Read  Write   Read  Write    %   
Unit  Type      (M)  Used    Rqs  Size (K)    Rqs   Rqs     (K)   (K)   Busy  
   1  4327    61744  87.7     .0      4.0      .0     .0    4.0     .0     0  
   2  4327    61744  87.7     .3      4.5      .2     .0    4.0    6.0     0  
   3  4327    61744  87.7     .0      4.0      .0     .0     .0    4.0     0  
   4  4327    61744  87.7     .0      4.0      .0     .0     .0    4.0     0  
   5  4327    61744  87.7     .0      4.0      .0     .0     .0    4.0     0  
   6  4327    61744  87.7     .1      8.0      .0     .1     .0    8.0     0  
   7  4327    61744  87.7     .1      4.0      .0     .0    4.0    4.0     0  
   8  4327    61744  87.7     .1      4.0      .0     .1    4.0    4.0     0  

F11 shows another view and the current status. Look for DEGRADED or FAILED units. This example shows they are all ACTIVE. No issues.

           --Protection--                
Unit  ASP  Type  Status    Compression   
   1    1  DPY   ACTIVE                  
   2    1  DPY   ACTIVE                  
   3    1  DPY   ACTIVE                  
   4    1  DPY   ACTIVE                  
   5    1  DPY   ACTIVE                  
   6    1  DPY   ACTIVE                  
   7    1  DPY   ACTIVE                  
   8    1  DPY   ACTIVE                  

Networking

Tune TCP/IP buffer sizes. Use CHGTCPA to tune them up to 8096 KB: http://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/tprf_tunetcpip.html

Using Collection Services Performance Data

WAS provides scripts to enable collection services for performance data: http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/tprf_collectionservices.html

Gathering Javacores using WRKJVMJOB

Gathering javacores, core dumps, heap dumps, and JVM summary data is very simple on the IBM i. The WRKJVMJOB utility allows you to do all of this.

1. Execute WRKJVMJOB

2. This produces a list of all the JVMs active on the system

          Work with JVM Jobs                      RCHM199B
                                                            09/09/13  12:11:42
Active JVMs on system:   22                                                    
                                                                               
Type options, press Enter.                                                     
  5=Work with   7=Display job log   8=Work with spooled files                  
  9=Display GC information          11=Display threads   12=Dump   13=Print    
Opt  Job Name    User        Number  Function          Status                  
     QSRVMON     QSYS        842707  JVM-ServiceMon     THDW                   
     QP0ZSPWT    HENDERAN    862730  JVM-WSPreLaunc     TIMW                   
     BENNIEDMGR  QEJBSVR     911766  PGM-jvmStartPa     THDW                   
     NODEAGENT   QEJBSVR     911778  PGM-jvmStartPa     THDW                   
     BENNIENODE  QEJBSVR     911779  PGM-jvmStartPa     THDW                   
     SHU85EONE   QEJBSVR     916849  PGM-jvmStartPa     THDW                   
     STIMSERVER  QEJBSVR     934284  PGM-jvmStartPa     THDW                   
     BENNIE      QEJBSVR     937798  PGM-jvmStartPa     THDW                   
     DMGR        QEJBSVR     941298  PGM-jvmStartPa     THDW                   
 12  TJH80EXP    QEJBSVR     946396  PGM-jvmStartPa     THDW        

3. From this list, you can select option 12 to dump. By default, option 12 performs a javacore dump. To produce a different type of dump, you can select 12 next to the JVM, then hit F4 to prompt the command. This will allow you to change. Note the type of dump. (*JAVA = javacore, *SYSTEM = Core dump, *HEAP = heapdump.phd file is produced)

   Generate JVM Dump (GENJVMDMP)                
                                                                      
Type choices, press Enter.                                            
                                                                      
Job name . . . . . . . . . . . . > TJH80EXP      Name                 
  User . . . . . . . . . . . . . >   QEJBSVR     Name                 
  Number . . . . . . . . . . . . >   946396      000000-999999        
Type . . . . . . . . . . . . . .   *JAVA         *JAVA, *SYSTEM, *HEAP
               + for more values      

4. The dumps produced (javacore, heapdump, core dump) will be placed in the JVMs user home directory. The joblog for the JVM will show the location of the file. For example:

DSPJOBLOG JOB(946396/QEJBSVR/TJH80EXP)
JVMDUMP010I Java dump written to /QIBM/UserData/WebSphere/AppServer/V8/Express/profiles/tjh80exp/javacore.20130909.121650.82839.0001.txt

JVM Monitoring

Viewing the Application Server JVM can be done through WRKACTJOB. This command shows the total CPU seconds, CPU %, and IO for the job based on the elapsed time:

                             Work with Active Jobs                     RCHM199B
                                                             09/09/13  11:40:35
 CPU %:     2.2     Elapsed time:   00:00:10     Active jobs:   339            
                                                                               
 Type options, press Enter.                                                    
   2=Change   3=Hold   4=End   5=Work with   6=Release   7=Display message     
   8=Work with spooled files   13=Disconnect ...                               
                                               --------Elapsed---------        
 Opt  Subsystem/Job  Type  Pool  Pty      CPU  Int    Rsp  AuxIO  CPU %        
      QWAS8          SBS     2    0        .0                  0     .0        
        TJH80EXP     BCH     2   20    3454.9                  0     .0        

F11 shows further views, including number of threads, status, and function.

 Opt  Subsystem/Job  User        Number  Type  CPU %  Threads   
      QWAS8          QSYS        894103  SBS      .0        2   
        TJH80EXP     QEJBSVR     946396  BCH      .0       74   

                     Current                                         
 Opt  Subsystem/Job  User        Type  CPU %  Function        Status
      QWAS8          QSYS        SBS      .0                   DEQW  
        TJH80EXP     QEJBSVR     BCH      .0  PGM-jvmStartPa   THDW 

WRKSYSSTS shows the memory pool activity for the JVM. The WRKACTJOB above shows the WebSphere server "TJH80EXP" is running in system pool 2. The example output of WRKSYSSTS below shows system pool 2 as having 28,626MB allocated. The page faults are in faults/second, and split between DB and Non-DB faults. This is based on elapsed time.

WRKSYSSTS ASTLVL(*ADVANCED)    

                           Work with System Status                    RCHM199B
                                                            09/09/13  11:51:52
% CPU used . . . . . . . :        2.0    System ASP . . . . . . . :    493.9 G
% DB capability  . . . . :         .0    % system ASP used  . . . :    87.7574
Elapsed time . . . . . . :   00:07:58    Total aux stg  . . . . . :    493.9 G
Jobs in system . . . . . :       3211    Current unprotect used . :    15970 M
% perm addresses . . . . :       .032    Maximum unprotect  . . . :    22252 M
% temp addresses . . . . :       .569                                         
                                                                              
Sys      Pool   Reserved    Max  ----DB-----  --Non-DB---  Act-   Wait-  Act-
Pool    Size M   Size M     Act  Fault Pages  Fault Pages  Wait   Inel   Inel
  1    1187.55    606.00  +++++     .0    .0     .0    .0   10.5     .0     .0
  2   28626.03     11.30    820     .0    .4     .0    .2  39221     .0     .0
  3   13319.48       .56   1140     .0    .0     .0    .0  558.7     .0     .0
  4        .25       .00      5     .0    .0     .0    .0     .0     .0     .0

The above shows very low page fault rate based on almost 8 minutes elapsed time. Also note the Wait-Inel and Act-Inel counts as being 0. A higher value indicates the max act value is too low for the amount of threads active in the pool. This would cause performance problems.

F11 again shows the pool names. System pool 2 is the *BASE pool. This is the default pool for IBM i batch processes, including WebSphere

Sys      Pool   Reserved    Max                                      Paging
Pool    Size M   Size M     Act  Pool        Subsystem   Library     Option
  1    1187.55    606.00  +++++  *MACHINE                            *FIXED
  2   28626.03     11.30    820  *BASE                               *CALC
  3   13319.48       .56   1140  *INTERACT                           *CALC
  4        .25       .00      5  *SPOOL                              *FIXED