J9 Health Center Enable at Runtime of Limited Duration on z/OS

The Health Center agent is shipped with IBM Java and IBM Semeru Runtimes on z/OS.

The following instructions are for z/OS (in this context, z/Linux is not considered z/OS). For non-z/OS platforms, see alternate instructions.

  1. By default, late attach is disabled. Restart the target process with the additional generic JVM argument -Dcom.ibm.tools.attach.enable=yes
  2. Find the decimal PID of the target JVM. WebSphere traditional shows the PID in SYSPRINT in the BBOJ0051I message. In the following example, it is 16843066:
     Trace: 2024/01/04 16:18:54.968 02 t=7E5E78 c=UNK key=P8 tag= (13007004)        
       SourceId: com.ibm.ws390.orb.CommonBridge                                     
       ExtendedMessage: BBOJ0051I: PROCESS INFORMATION: STC00089/BBOS001S, ASID=76(0x4c), PID=16843066(0x101013a)
  3. Find the path to Java of the target JVM. WebSphere traditional shows this path in SYSPRINT in a BBOJ0077I message for java.home. In the following example, it is /WebSphere/ND/AppServer/java64:
     Trace: 2024/01/04 16:18:54.972 02 t=7E5E78 c=UNK key=P8 tag= (13007004)        
       SourceId: com.ibm.ws390.orb.CommonBridge.printProperties                     
       ExtendedMessage: BBOJ0077I:               java.home = /WebSphere/ND/AppServer/java64
  4. Find the owner of the started task of the target JVM. In the following example in D.DA, it is ASSR1:
     NP   JOBNAME  StepName ProcStep JobID    Owner    C Pos DP Real Paging    SIO  
          BBOS001S BBOS001S BBOPASR  STC00089 ASSR1      IN  C9  93T   0.00  35.64  
  5. Create the following JCL, replacing USER_REPLACEME with the owner of the started task, both instances of JAVAPATH_REPLACEME with the path to Java, PID_REPLACEME with the PID, and 30 with the number of minutes to run (see the notes below on duration considerations). When editing, ensure that CAPS OFF is set and that every line of the STDPARM input except the last ends with a trailing space.
    //SHLLJOB1 JOB (ACCOUNT),NOTIFY=&SYSUID,REGION=0M,CLASS=A,MSGCLASS=H,
    // MSGLEVEL=(1,1),USER=USER_REPLACEME
    //SHLLSTEP EXEC PGM=BPXBATCH
    //BPXPRINT DD SYSOUT=*
    //STDOUT   DD SYSOUT=*
    //STDERR   DD SYSOUT=*
    //STDPARM  DD *
    SH /JAVAPATH_REPLACEME/bin/java 
    -jar 
    /JAVAPATH_REPLACEME/lib/ext/healthcenter.jar 
    ID=PID_REPLACEME 
    level=headless 
    -Dcom.ibm.java.diagnostics.healthcenter.headless.run.number.of.runs=1 
    -Dcom.ibm.java.diagnostics.healthcenter.headless.run.duration=30
    /*
  6. Submit the job.
    1. If you receive the error, LOGON/JOB INITIATION - SUBMITTER IS NOT AUTHORIZED BY USER, then consider allowing surrogate job submission; for example:
      RDEFINE SURROGAT ASSR1.SUBMIT UACC(NONE) OWNER(ASSR1)
      PERMIT ASSR1.SUBMIT CLASS(SURROGAT) ID(MSTONE1) ACCESS(READ)
      SETROPTS RACLIST(SURROGAT) REFRESH
  7. Confirm in D.ST in the SHLLJOB1 job that the output at the bottom looks similar to the following, specifically the Successfully enabled Health Center agent line:
    IEF033I  JOB/SHLLJOB1/STOP  2024004.1259                                        
            CPU:     0 HR  00 MIN  00.00 SEC    SRB:     0 HR  00 MIN  00.00 SEC    
    Successfully enabled Health Center agent in VM: 16843066                        
    Health Center properties used by agent in target VM:                            
    -- listing properties --                                                        
    com.ibm.java.diagnostics.healthcenter.agent.port=1972                           
    com.ibm.java.diagnostics.healthcenter.data.collection.level=HEADLESS            
  8. Confirm in D.DA of the target JVM in SYSOUT that Health Center has started. For example:
    [Thu Jan  4 17:59:09 2024] com.ibm.diagnostics.healthcenter.headless INFO: 4.0.7
    [Thu Jan  4 17:59:09 2024] com.ibm.diagnostics.healthcenter.headless INFO: Headless data collection has started
    [Thu Jan  4 17:59:09 2024] com.ibm.diagnostics.healthcenter.headless INFO: Each data collection run will last for 30 minutes 
    [Thu Jan  4 17:59:09 2024] com.ibm.diagnostics.healthcenter.headless INFO: Agent will run for 1 collections
    [Thu Jan  4 17:59:09 2024] com.ibm.diagnostics.healthcenter.headless INFO: Agent will keep last 5 hcd files
    [Thu Jan  4 17:59:09 2024] com.ibm.diagnostics.healthcenter.headless INFO: Headless collection output directory is /SY1/var/WebSphere/home/WSSR1
  9. After the time has elapsed, refresh the JVM's SYSOUT to confirm that the HCD file was created. For example:
    [Thu Jan  4 18:29:09 2024] com.ibm.diagnostics.healthcenter.headless INFO: Creating hcd import file /SY1/var/WebSphere/home/WSSR1/healthcenter040124_175909_16843066_1.hcd
    [Thu Jan  4 18:29:09 2024] com.ibm.diagnostics.healthcenter.headless INFO: hcd import file /SY1/var/WebSphere/home/WSSR1/healthcenter040124_175909_16843066_1.hcd created
  10. FTP the HCD file(s) in BIN mode.
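
Steps 2, 3, and 5 above can be scripted. The following is a minimal Python sketch, not an IBM-provided tool. It assumes the SYSPRINT text has been saved where the script can read it, and the function names (find_pid, find_java_home, render_stdparm) are hypothetical. It extracts the decimal PID from the BBOJ0051I message and java.home from the BBOJ0077I message, then builds the STDPARM input so that every line except the last ends with a space.

```python
import re

# Patterns modeled on the BBOJ0051I and BBOJ0077I examples shown above.
PID_RE = re.compile(r"BBOJ0051I:.*\bPID=(\d+)\(0x[0-9a-fA-F]+\)")
JAVA_HOME_RE = re.compile(r"BBOJ0077I:\s+java\.home\s*=\s*(\S+)")


def find_pid(sysprint: str) -> int:
    """Return the decimal PID from the first BBOJ0051I message."""
    match = PID_RE.search(sysprint)
    if match is None:
        raise ValueError("no BBOJ0051I message with a PID found")
    return int(match.group(1))


def find_java_home(sysprint: str) -> str:
    """Return the java.home path from the first BBOJ0077I message."""
    match = JAVA_HOME_RE.search(sysprint)
    if match is None:
        raise ValueError("no BBOJ0077I java.home message found")
    return match.group(1)


def render_stdparm(java_home: str, pid: int, minutes: int) -> str:
    """Build the STDPARM input; every line but the last ends with a space."""
    prefix = "com.ibm.java.diagnostics.healthcenter.headless.run"
    lines = [
        f"SH {java_home}/bin/java",
        "-jar",
        f"{java_home}/lib/ext/healthcenter.jar",
        f"ID={pid}",
        "level=headless",
        f"-D{prefix}.number.of.runs=1",
        f"-D{prefix}.duration={minutes}",
    ]
    # Joining with " \n" appends the required trailing space to all
    # lines except the final one.
    return " \n".join(lines)


if __name__ == "__main__":
    sample = (
        "ExtendedMessage: BBOJ0051I: PROCESS INFORMATION: "
        "STC00089/BBOS001S, ASID=76(0x4c), PID=16843066(0x101013a)\n"
        "ExtendedMessage: BBOJ0077I:               "
        "java.home = /WebSphere/ND/AppServer/java64\n"
    )
    print(render_stdparm(find_java_home(sample), find_pid(sample), 30))
```

The generated lines still need to be placed between the //STDPARM DD * card and the closing /* in the job, and the editor used must preserve the trailing spaces.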

Warnings and notes:

  1. Each time a new HCD collection starts, the agent begins resolving method address to method name mappings for all loaded methods. By default, the agent queries up to 3,000 unresolved method name mappings every 5 seconds. Therefore, for profiled method names to be resolved properly, the minimum duration per HCD should be chosen based on the number of loaded methods. It is common for an enterprise application to load hundreds of thousands of methods, so a minimum of 5-10 minutes is a good starting point. To trace this behavior and see the number of methods, add -Dcom.ibm.diagnostics.healthcenter.logging.methodlookup=debug -Dcom.ibm.diagnostics.healthcenter.logging.MethodLookupProvider=debug and search for com.ibm.diagnostics.healthcenter.methodlookup.debug DEBUG: N methods to lookup. Methods created during the HCD interval are captured separately.
  2. If the JVM could not be stopped gracefully, gather the temporary files from a subdirectory of the output directory called tmp_${STARTDAY}${STARTMONTH}${STARTYEAR}_${STARTHOUR}${STARTMINUTES}${STARTSECONDS}_
  3. Use the additional JVM option -Dcom.ibm.java.diagnostics.healthcenter.headless.output.directory=$DIR to redirect Health Center files to a different directory instead of the working directory.
  4. This procedure does not work with Liberty if certain jndi-1.0-related features are loaded; a request for enhancement exists for this limitation.
  5. If you use Liberty and specify -Xtrace:buffers={2m,dynamic} to minimize Health Center method metadata loss, note that Liberty defaults to an unlimited maximum thread pool size, which is designed to maximize throughput. Because trace buffers are allocated per thread, consider capping the pool with <executor maxThreads="N" /> based on available native memory to avoid native memory exhaustion, or use a smaller -Xtrace buffer size such as -Xtrace:buffers={128k,dynamic} (or lower).
  6. We have observed that some monitoring agents cause problems with Health Center. Consider removing other monitoring agents that use -agentpath while using Health Center, engaging IBM and the agent vendor's support teams to investigate, or pointing -Xbootclasspath/p at healthcenter.jar and -agentpath at libhealthcenter.so.
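
The duration guidance in note 1 can be turned into a rough estimate. The sketch below is only arithmetic on the default figures quoted there (up to 3,000 lookups every 5 seconds); the function name min_lookup_seconds is hypothetical. Treat the result as an approximate lower bound for method name resolution alone, not as a recommended run length.

```python
import math


def min_lookup_seconds(loaded_methods: int,
                       batch_size: int = 3000,
                       interval_seconds: int = 5) -> int:
    """Estimate the seconds needed to resolve all loaded method names.

    Assumes the defaults described in note 1: at most batch_size
    lookups are performed every interval_seconds.
    """
    if loaded_methods < 0:
        raise ValueError("loaded_methods must not be negative")
    batches = math.ceil(loaded_methods / batch_size)
    return batches * interval_seconds


if __name__ == "__main__":
    seconds = min_lookup_seconds(300_000)
    # 300,000 methods / 3,000 per batch = 100 batches of 5 seconds each.
    print(f"{seconds} seconds (about {seconds / 60:.1f} minutes)")
    # prints "500 seconds (about 8.3 minutes)"
```

Use the N reported in the "N methods to lookup" debug message as loaded_methods, and choose a run duration comfortably above the estimate.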

For details, see the Health Center chapter.