General WAS traditional Performance Problem

  1. Make sure the logs are capturing as much as possible:
    1. Administrative Console > Troubleshooting > Logs and Trace > server name > JVM Logs. These settings can also be changed dynamically on the Runtime tab.
    2. For example, set Maximum Size = 100 MB and Maximum Number of Historical Log Files = 5.
  2. Ensure verbose garbage collection is enabled. It can often be enabled at runtime; if it cannot, a restart is required to apply the change.
  3. Ensure that PMI is enabled, either at the "Basic" level (the default) or at a "Custom" level (see the WAS chapter for the recommended counters).
  4. Enable PMI logging to files, either with a monitoring product or with the built-in TPV logger:
    1. Important note: all of these steps must be repeated after every application server restart. This can be automated with a wsadmin script.
    2. Log in to the Administrative Console and go to: Monitoring and Tuning > Performance Viewer > View Logs
    3. Select all relevant application servers and click "Start Monitoring"
    4. Click each application server
    5. Click on server > Settings > Log
    6. Set the following values:
      Duration = 300000
      Maximum File Size = 50
      Maximum Number of Historical Files = 5
      Log Output Format = XML
    7. Click Apply
    8. Click server > Summary Reports > Servlets
    9. Click "Start Logging"
  5. For IBM Java, enable IBM Health Center in headless mode:
    1. Choose one of these methods to start Health Center:
      1. Restart the JVM adding the following generic JVM arguments:
        -Xhealthcenter:level=headless -Dcom.ibm.java.diagnostics.healthcenter.headless.files.max.size=104857600 -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=10
      2. Start it dynamically:
        $WEBSPHERE/java/bin/java -jar $WEBSPHERE/java/jre/lib/ext/healthcenter.jar ID=$PID -Dcom.ibm.java.diagnostics.healthcenter.data.collection.level=headless -Dcom.ibm.java.diagnostics.healthcenter.headless.files.max.size=104857600 -Dcom.ibm.java.diagnostics.healthcenter.headless.files.to.keep=10
  6. If there is a web server in front of WAS, see the Web Server recipes.
  7. Archive and truncate any existing logs for each server in $WEBSPHERE/profiles/$PROFILE/logs/$SERVER/*
  8. Reproduce the problem.
  9. Gather the Performance, Hang, or High CPU issue MustGather for your operating system:
    1. Linux
    2. AIX
    3. Windows
    4. z/OS
    5. Solaris
    6. HP-UX
  10. After the problem has been reproduced, gracefully stop the application servers (to produce Health Center HCD files).
  11. Gather:
    1. Server logs under $WEBSPHERE/profiles/$PROFILE/logs/$SERVER/: SystemOut*.log SystemErr*.log native_stderr.log native_stdout.log
    2. FFDC logs under $WEBSPHERE/profiles/$PROFILE/logs/ffdc/*
    3. Javacores, heapdumps, and system dumps, if any: $WEBSPHERE/profiles/$PROFILE/javacore* $WEBSPHERE/profiles/$PROFILE/heapdump* $WEBSPHERE/profiles/$PROFILE/core*
    4. PMI logs: $WEBSPHERE/profiles/$PROFILE/logs/tpv/*
    5. Health Center logs, if any: $WEBSPHERE/profiles/$PROFILE/*.hcd
    6. server.xml for each server: $WEBSPHERE/profiles/$PROFILE/config/cells/$CELL/nodes/$NODE/servers/$SERVER/server.xml
    7. The output of the Performance MustGather
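
Steps 7 and 11 above can be scripted. The following is a minimal sketch, not an official collection script: the function names are hypothetical, the arguments stand in for the $WEBSPHERE/profiles/$PROFILE and $SERVER placeholders used above, and it assumes a tar that supports -z and -T (GNU tar or similar) plus mktemp. It truncates only *.log files so that other files in the log directory (for example, a .pid file) are left alone, and it uses wildcards for $CELL and $NODE when picking up server.xml.

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical; the paths mirror the
# $WEBSPHERE/profiles/$PROFILE and $SERVER placeholders in the steps above.

# Step 7: archive the server log directory, then truncate *.log files in place
# (truncating rather than deleting keeps the files a running JVM has open).
archive_and_truncate_logs() {
  log_dir="$1"   # e.g. $WEBSPHERE/profiles/$PROFILE/logs/$SERVER
  archive="$2"   # e.g. /tmp/logs-before.tar.gz (must be outside log_dir)
  tar -czf "$archive" -C "$log_dir" . || return 1
  for f in "$log_dir"/*.log; do
    if [ -f "$f" ]; then : > "$f"; fi
  done
  return 0
}

# Step 11: collect whichever of the listed files exist into one tarball.
gather_was_data() {
  profile_dir="$1"   # e.g. $WEBSPHERE/profiles/$PROFILE
  server="$2"        # e.g. server1
  out="$3"           # e.g. /tmp/was-data.tar.gz
  list=$(mktemp) || return 1
  (
    cd "$profile_dir" || exit 1
    for p in logs/"$server"/SystemOut*.log logs/"$server"/SystemErr*.log \
             logs/"$server"/native_stderr.log logs/"$server"/native_stdout.log \
             logs/ffdc logs/tpv javacore* heapdump* core* *.hcd \
             config/cells/*/nodes/*/servers/"$server"/server.xml; do
      # Unmatched globs stay literal, so keep only paths that really exist.
      if [ -e "$p" ]; then printf '%s\n' "$p"; fi
    done
  ) > "$list" || { rm -f "$list"; return 1; }
  tar -czf "$out" -C "$profile_dir" -T "$list"
  rc=$?
  rm -f "$list"
  return "$rc"
}

# Example usage (placeholders must be set by the caller):
#   archive_and_truncate_logs "$WEBSPHERE/profiles/$PROFILE/logs/$SERVER" /tmp/logs-before.tar.gz
#   gather_was_data "$WEBSPHERE/profiles/$PROFILE" "$SERVER" /tmp/was-data.tar.gz
```

The output of the operating system Performance MustGather is not included here and would still need to be added to the collected data separately.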

Reviewing the data

  1. Review all WAS logs for any errors, warnings, etc.
  2. Review verbosegc for garbage collection overhead.
  3. Review thread dumps
    1. Review patterns and check for deadlocks and monitor contention (e.g. with the TMDA tool).
  4. Review operating system data for WAS and IHS nodes
    1. If CPU time is high, check whether it is user or system time.
      1. Review per-process and per-thread CPU data for details.
    2. Check virtualization steal time
    3. Check run queue length and any blocked threads
    4. Check for memory swap-ins
      1. If high, check memory statistics such as file cache, free memory, etc.
  5. Review PMI data for the key performance indicators such as the WebContainer thread pool ActiveCount, database connection pool usage, servlet response times, etc. (see WAS - PMI). Try to isolate the problem to particular requests, database queries, etc., by duration or volume.
    1. If using a database, review the response times in the connection pool. Try to isolate the problem to particular queries, by duration or volume.
  6. Review Health Center data
  7. If using web servers, review the IHS access_log, error_log, and plugin log to see whether requests are arriving and whether there are errors (e.g. HTTP error response codes). Also review mpmstats in error_log to see what the threads are doing.
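
As a first pass over steps 1 and 7 above, two small helpers can summarize the logs before a detailed review. This is a sketch under stated assumptions: the function names are hypothetical; the first assumes WAS message IDs of the form of four or five uppercase letters, four digits, and a trailing E or W (for example, WSVR0605W); the second assumes access_log uses the Common Log Format with a three-token request line, so that the HTTP status is the ninth whitespace-separated field. A custom LogFormat would need a different field number.

```shell
#!/bin/sh
# Sketch only. Function names are hypothetical and the log formats are assumed.

# Count lines in a WAS log that carry an error (E) or warning (W) message ID,
# e.g. "WSVR0009E:" or "WSVR0605W:".
count_was_errors_and_warnings() {
  grep -E -c '[A-Z]{4,5}[0-9]{4}[EW]:' "$1"
}

# Print "status count" pairs for an access_log in Common Log Format,
# sorted by status code.
http_status_histogram() {
  awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$1" | sort
}

# Example usage (paths are placeholders):
#   count_was_errors_and_warnings "$WEBSPHERE/profiles/$PROFILE/logs/$SERVER/SystemOut.log"
#   http_status_histogram /path/to/access_log
```

A high count from either helper only indicates where to look; the matching lines still need to be read in context and correlated by timestamp with the time of the problem.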