- Periodically monitor WAS logs for warning and error messages.
- Set the maximum size of JVM logs to at least 256MB and the maximum number of historical files to at least 4.
- Set the maximum size of diagnostic trace to at least 256MB and the maximum number of historical files to at least 4.
- Change the hung thread detection threshold and interval to smaller values tuned for each application, and enable a limited number of thread dumps when these events occur. For example:
com.ibm.websphere.threadmonitor.threshold=30
com.ibm.websphere.threadmonitor.interval=1
com.ibm.websphere.threadmonitor.dump.java=15
com.ibm.websphere.threadmonitor.dump.java.track=3
- Unless com.ibm.websphere.threadmonitor.interval has been set very low, consider enabling periodic thread pool statistics logging with the diagnostic trace *=info:Runtime.ThreadMonitorHeartbeat=detail
- Monitor for increases in the Count column in the FFDC summary file (${SERVER}_exception.log) for each server, because only the first occurrence of an FFDC prints a warning to the logs.
- Review relevant timeout values such as JDBC, HTTP, etc.
- A well-tuned WAS is a better-behaving WAS, so also review the WAS traditional tuning recipes.
- Review the Troubleshooting Operating System Recipes and Troubleshooting Java Recipes.
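As a rough illustration of the FFDC Count monitoring item above, the following Python sketch compares two snapshots of a server's FFDC summary file and reports entries whose Count increased. The row layout it parses (an optional leading +, then Index, Count, two timestamps, and finally the exception, source ID, and probe ID) is an assumption about the file format, and the function names parse_ffdc_summary and count_increases are hypothetical. Check it against a real ${SERVER}_exception.log before relying on it.

```python
import re

# Assumed row layout (not confirmed by the text above):
#   [+] Index Count <first occurrence> <last occurrence> Exception SourceId ProbeId
# Header and separator lines do not start with two integers, so they are skipped.
ROW = re.compile(r"^\s*\+?\s*(\d+)\s+(\d+)\s+(.*\S)\s*$")


def parse_ffdc_summary(text):
    """Map (exception, source_id, probe_id) -> Count for each data row."""
    counts = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header, separator, or blank line
        tail = match.group(3).split()
        if len(tail) < 3:
            continue  # row does not fit the assumed layout
        key = tuple(tail[-3:])  # last three fields identify the incident
        counts[key] = int(match.group(2))
    return counts


def count_increases(previous, current):
    """Return {key: delta} for entries that are new or whose Count grew."""
    return {
        key: count - previous.get(key, 0)
        for key, count in current.items()
        if count > previous.get(key, 0)
    }
```

One way to use this is to save the parsed counts from each periodic check and call count_increases(previous, current) on the next check; any non-empty result indicates FFDC activity that did not necessarily print a new warning.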
- Review all warnings and errors in System*.log (or using logViewer if HPEL is enabled) before and during the problem. A regular expression search is " [WE] ". One common type of warning is an FFDC warning, which points to a matching file in the FFDC logs directory.
- If you're on Linux or use Cygwin, use the following command:
find . -name "*System*" -print0 | xargs -0 grep " [WE] " | grep -v -e supposedly_benign_message1 -e supposedly_benign_message2
- Review all JVM messages in native_stderr.log before and during the problem. This may include things such as OutOfMemoryErrors. The filename of such artifacts includes a timestamp of the form YYYYMMDD.
- Review any strange messages in native_stdout.log before and during the problem.
- If verbose garbage collection is enabled, review verbosegc in native_stderr.log (IBM Java), native_stdout.log (HotSpot Java), or any verbosegc.log files (if using -Xverbosegclog or -Xloggc) in the IBM Garbage Collection and Memory Visualizer Tool. Ensure that the proportion of time in garbage collection for a relevant period before and during the problem is less than 5-10%.
- Review any javacore*.txt files in the IBM Thread and Monitor Dump Analyzer tool. Review the causes of the thread dump (e.g. user-generated, OutOfMemoryError, etc.) and review threads with large stacks and any monitor contention.
- Review any heapdump*.phd and core*.dmp files in the Eclipse Memory Analyzer Tool.
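The 5-10% garbage collection guideline above is a simple ratio: total time paused in garbage collection divided by the wall-clock length of the period examined. The Garbage Collection and Memory Visualizer reports this proportion directly; the Python sketch below only illustrates the arithmetic. The function names, the millisecond units, and the default 10% limit are assumptions chosen for the example.

```python
def gc_time_fraction(pause_ms, window_ms):
    """Fraction of a wall-clock window spent in garbage collection pauses.

    pause_ms:  iterable of individual GC pause durations, in milliseconds
    window_ms: length of the period being examined, in milliseconds
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    return sum(pause_ms) / window_ms


def gc_overhead_ok(pause_ms, window_ms, limit=0.10):
    """True if the GC proportion is below the chosen limit (hypothetical default: 10%)."""
    return gc_time_fraction(pause_ms, window_ms) < limit
```

For example, pauses of 120, 95, 310, and 75 ms in a 60-second window total 600 ms, which is 1% of the window and well within the guideline. Three 3-second pauses in the same window would be 15% and worth investigating.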
- Consider increasing the value of server_region_stalled_thread_threshold_percent so that a servant is only abended when a large percentage of threads are taking a long time. Philosophies on this differ, but consider a value of 10.
- Set control_region_timeout_delay to give some time for work to finish before the servant is abended; for example, 5.
- Set control_region_timeout_dump_action to gather useful diagnostics when a servant is abended; for example, IEATDUMP.
- Consider reducing the control_region_$PROTOCOL_queue_timeout_percent values so that requests time out earlier if they queue for a long time; for example, 10.
- If necessary, apply granular timeouts to particular requests.
- Run listTimeoutsV85.py to review and tune timeouts.