Thread Pools
Thread pools and their corresponding threads control all execution of the application. The more threads you have, the more requests you can service at once. However, the more threads you have, the more they compete for shared resources such as CPUs, and the slower the overall response time may become as those shared resources are contended or exhausted. If you are not reaching a target CPU usage percentage, you can increase the thread pool sizes, but this will probably require more memory, so the pools should be sized carefully. If there is a bottleneck other than the CPUs, then CPU usage will stop increasing.
You can think of thread pools as queuing mechanisms to throttle how many concurrent requests you will have running at any one time in your application.
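As a rough analogy only (this uses the generic java.util.concurrent API, not the WAS thread pool implementation, and all sizes are hypothetical), a bounded pool with a bounded queue throttles concurrency like this:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThrottlingAnalogy {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical sizes: at most 10 requests run concurrently,
        // up to 10 more wait in the queue, and anything beyond that is rejected.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(10),
                new ThreadPoolExecutor.AbortPolicy());

        for (int i = 0; i < 25; i++) {
            try {
                pool.execute(() -> {
                    try {
                        Thread.sleep(100); // simulate request work
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                System.out.println("Request rejected: all threads busy and the queue is full");
            }
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```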
The most commonly used (and tuned) thread pools within the application server are:
- WebContainer: Used when requests come in over HTTP.
- ORB: Used when remote requests come in over RMI/IIOP for an enterprise bean from an EJB application client, remote EJB interface, or another application server.
- Messaging thread pools (see the messaging chapter)
Understand which thread pools your application uses and size all of them appropriately, based on the utilization you observe during tuning exercises through thread dumps or PMI/TPV.
If the application server ends up stalled half of the time it is working on an individual request (likely waiting for a database query to start returning data), then you want roughly twice as many threads as the cores being pinned. Similarly, if a thread is only doing CPU work 25% of the time (stalled 75% of the time), then roughly 4X, and so on; see the sketch below.
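For example, here is a back-of-the-envelope calculation following this rule of thumb. All numbers and the class name are hypothetical; it also folds in the point from the list below about dividing cores among multiple application server instances on the same host.

```java
public class ThreadSizingEstimate {
    public static void main(String[] args) {
        int cores = 8;               // hypothetical: physical cores available on the host
        int appServerInstances = 2;  // hypothetical: WAS JVMs sharing those cores
        double cpuFraction = 0.25;   // hypothetical: fraction of dispatched time a thread is on-CPU
                                     // (i.e. stalled 75% of the time waiting on the back end)

        // threads per instance ~= (cores / instances) * (1 / cpuFraction)
        int estimatedThreads = (int) Math.ceil((cores / (double) appServerInstances) / cpuFraction);
        System.out.println("Estimated thread pool size per instance: " + estimatedThreads); // 16
    }
}
```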
Use TPV or the IBM Thread and Monitor Dump Analyzer (https://www.ibm.com/developerworks/community/groups/service/html/communityview?communityUuid=2245aa39-fa5c-4475-b891-14c205f7333c) to analyze thread pools.
Thread pools need to be sized with the total number of hardware processor cores in mind.
- If sharing a hardware system with other WAS instances, thread pools have to be tuned with that in mind.
- You will more than likely need to cut back on the number of threads active in the system to ensure good performance for all applications, because the operating system must context switch among every thread on the system.
- Sizing or restricting the maximum number of threads an application can have helps prevent rogue applications from impacting others.
Note: The concurrent thread pool usage (PMI ActiveCount) may not necessarily be the concurrently "active" users hitting the application server. This is not due just to human think times and keepalive between requests, but also because of asynchronous I/O where active connections may not be actively using a thread until I/O activity completes (non-blocking I/O). Therefore, it is incorrect to extrapolate incoming concurrent activity from snapshots of thread pool usage.
If this metric approaches its maximum (which is determined by the maximum pool size), then you know that either the pool is simply too small or that there is a bottleneck that blocks the processing of some of the requests.
- Thread pool parameters: A good practice is to use 5 threads per server CPU core for the default thread pool, and 10 threads per server CPU core for the ORB and web container thread pools. For a machine with up to 4 CPUs, the default settings are usually a good start for most applications. If the machine has multiple application server instances, then these sizes should be reduced accordingly. Conversely, there could be situations where the thread pool size might need to be increased to account for slow I/O or long-running back-end connections. Reference: http://www.ibm.com/developerworks/websphere/techjournal/0909_blythe/0909_blythe.html
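A small sketch of those per-core starting points (the class name is purely illustrative, and availableProcessors() reports logical processors visible to the JVM, which may differ from physical cores):

```java
public class StartingPoolSizes {
    public static void main(String[] args) {
        // Logical processors visible to this JVM (may differ from physical cores).
        int cores = Runtime.getRuntime().availableProcessors();

        System.out.println("Default thread pool starting size:          " + (5 * cores));
        System.out.println("ORB/WebContainer thread pool starting size: " + (10 * cores));
    }
}
```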
Hung Thread Detection
WAS hung thread detection may be more accurately called WAS long response time detection (default 10-13 minutes) and the "may be hung" warning may be more accurately read as "has been executing for more than the configured threshold."
WSVR0605W is the warning printed when WAS detects that a unit of work is taking longer than the WAS hung thread detection threshold. The default hung thread detection threshold is 10 minutes. Hang detection monitors most WAS-managed threads, such as the WebContainer thread pool; native threads and threads spawned by an application are not monitored. In recent versions of WAS, the warning includes the stack, which often points to the source of the delay:
[11/16/09 12:41:03:296 PST] 00000020 ThreadMonitor W WSVR0605W: Thread "WebContainer : 0" (00000021) has been active for 655546 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
    at java.lang.Thread.sleep(Native Method)
    at java.lang.Thread.sleep(Thread.java:851)
    at com.ibm.Sleep.doSleep(Sleep.java:55)
    at com.ibm.Sleep.service(Sleep.java:35)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:831)...
WAS checks threads every `com.ibm.websphere.threadmonitor.interval` seconds (default 180), and any thread that has been dispatched for more than `com.ibm.websphere.threadmonitor.threshold` seconds (default 600) is reported (unless the false alarm threshold has been hit). Therefore, a long-running thread will be marked somewhere between `com.ibm.websphere.threadmonitor.threshold` seconds and `com.ibm.websphere.threadmonitor.threshold` + `com.ibm.websphere.threadmonitor.interval` seconds after it was dispatched; see the sketch below.
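A simplified, illustrative sketch of that timing (this is not the actual ThreadMonitor implementation, and the dispatch offset is hypothetical):

```java
public class HungThreadTimingExample {
    public static void main(String[] args) {
        int intervalSeconds = 180;    // com.ibm.websphere.threadmonitor.interval (default)
        int thresholdSeconds = 600;   // com.ibm.websphere.threadmonitor.threshold (default)
        int dispatchedAtSecond = 50;  // hypothetical: work starts 50s into the check cycle

        // The monitor only looks at threads every intervalSeconds, so the thread is
        // flagged at the first check after it has exceeded the threshold.
        for (int check = intervalSeconds; ; check += intervalSeconds) {
            int activeSeconds = check - dispatchedAtSecond;
            if (activeSeconds > thresholdSeconds) {
                // Prints: WSVR0605W would fire at t=720s, thread active for 670s
                System.out.println("WSVR0605W would fire at t=" + check
                        + "s, thread active for " + activeSeconds + "s");
                break;
            }
        }
    }
}
```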
The amount of time the thread has been active is approximate and is based on each container's ability to accurately reflect a thread's waiting or running state; however, in general, it is the number of milliseconds that a thread has been dispatched and doing "work" (i.e. started or reset to "non-waiting" by a container) within a WAS-managed thread pool.
The various hung thread detection properties may be changed and they are effective after a restart: http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/ttrb_confighangdet.html
- com.ibm.websphere.threadmonitor.interval: The frequency (in seconds) at which managed threads in the selected application server will be interrogated. Default: 180 seconds (three minutes).
- com.ibm.websphere.threadmonitor.threshold: The length of time (in seconds) for which a thread can be active before it is considered hung. Any thread that is detected as active for longer than this length of time is reported as hung. Default: 600 seconds (ten minutes).
- com.ibm.websphere.threadmonitor.dump.java: Set to true to cause a javacore to be created when a hung thread is detected and a WSVR0605W message is printed. The threads section of the javacore can be analyzed to determine what the reported thread and other related threads are doing. Default: False. Note: On z/OS, dumpThreads also creates a heapdump and TDUMP by default. This may be controlled with wsadmin_dumpthreads_enable_heapdump and wsadmin_dumpthreads_enable_javatdump: http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/urun_rproperty_custproperties.html?lang=en
- Starting with WAS 8.0.0.10 and 8.5.5.5 (http://www-01.ibm.com/support/docview.wss?uid=swg1PI27232), com.ibm.websphere.threadmonitor.dump.java.track: Set to an integer value in the range 2 through the value of com.ibm.websphere.threadmonitor.dump.java to cause the dumpThreads function to be run over subsequent monitor intervals in which a thread remains hung. The integer value indicates the maximum number of times dumpThreads will be run to track a hung thread.
For IBM JVMs, you can also produce dumps on a hung thread warning (http://www-01.ibm.com/support/knowledgecenter/SSYKE2_7.0.0/com.ibm.java.lnx.71.doc/diag/tools/trace_options_trigger.html):
-Xtrace:trigger=method{com/ibm/ws/runtime/component/ThreadMonitorImpl$RasListener.threadIsHung,sysdump,,,1}
On WAS >= 8.5:
-Xtrace:trigger=method{com/ibm/ws/runtime/component/ThreadMonitorImpl.threadIsHung,sysdump,,,1}
In this example, the maximum number of system dumps to produce for this trigger is 1. Enabling certain -Xtrace options may affect the performance of the entire JVM (see the -Xtrace section in the IBM Java chapter).
Thread Pool Statistics
Starting with WAS 7.0.0.31, 8.0.0.8, and 8.5.5.2, thread pool statistics may be written periodically to SystemOut.log or trace.log. This information may be written to SystemOut.log by enabling the diagnostic trace Runtime.ThreadMonitorHeartbeat=detail or to trace.log by enabling the diagnostic trace Runtime.ThreadMonitorHeartbeat=debug. Example output:
[1/12/15 19:38:15:208 GMT] 000000d4 ThreadMonitor A UsageInfo[ThreadPool:hung/active/size/max]={ SIBFAPThreadPool:0/2/4/50, TCPChannel.DCS:0/3/18/20, server.startup:0/0/1/3, WebContainer:0/3/4/12, SIBJMSRAThreadPool:0/0/10/41, ProcessDiscovery:0/0/1/2, Default:0/2/7/20, ORB.thread.pool:0/0/10/77, HAManager.thread.pool:0/0/2/2 }
When the diagnostic trace is enabled, this output is written every com.ibm.websphere.threadmonitor.interval seconds. Only thread pools that have at least one worker thread (whether active or idle) will be reported.
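If you want to post-process these heartbeat lines (for example, to graph active thread counts over time), here is a minimal parsing sketch based on the field order shown in the header of the example line above (ThreadPool:hung/active/size/max); the class name and sample line are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeartbeatParser {
    // Matches entries such as "WebContainer:0/3/4/12" (hung/active/size/max)
    private static final Pattern ENTRY = Pattern.compile("([\\w.]+):(\\d+)/(\\d+)/(\\d+)/(\\d+)");

    public static void main(String[] args) {
        String line = "UsageInfo[ThreadPool:hung/active/size/max]={ WebContainer:0/3/4/12, Default:0/2/7/20 }";
        Matcher m = ENTRY.matcher(line);
        while (m.find()) {
            String pool = m.group(1);
            int hung = Integer.parseInt(m.group(2));
            int active = Integer.parseInt(m.group(3));
            int max = Integer.parseInt(m.group(5));
            System.out.printf("%s: %d active of %d max (%d flagged hung)%n", pool, active, max, hung);
        }
    }
}
```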
BoundedBuffer
Consider BoundedBuffer tuning: http://pic.dhe.ibm.com/infocenter/wasinfo/v8r5/topic/com.ibm.websphere.nd.multiplatform.doc/ae/tprf_tunechain.html
The thread pool request buffer is essentially a backlog in front of the thread pool. If the thread pool is at its maximum size and all of the threads are dispatched, then work will queue in the requestBuffer. The maximum size of the requestBuffer is equal to the thread pool maximum size; however, if the unit of work is executed on the thread pool with a blocking mode of EXPAND_WHEN_QUEUE_IS_FULL_ERROR_AT_LIMIT or EXPAND_WHEN_QUEUE_IS_FULL_WAIT_AT_LIMIT, then the maximum size is ThreadPoolMaxSize * 10. When the requestBuffer fills up, then WSVR0629I is issued (although only the first time this happens per JVM run per thread pool). When the requestBuffer is full, work will either wait or throw a ThreadPoolQueueIsFullException, depending on how the unit of work is executed.
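As a worked example of those limits (the pool size is hypothetical; this is just the arithmetic from the paragraph above, not WAS internals):

```java
public class RequestBufferSizeExample {
    public static void main(String[] args) {
        int threadPoolMaxSize = 50; // hypothetical: WebContainer maximum size

        // Default: the requestBuffer holds at most as many queued units of work
        // as the thread pool's maximum size.
        int defaultBufferLimit = threadPoolMaxSize;

        // With EXPAND_WHEN_QUEUE_IS_FULL_ERROR_AT_LIMIT or
        // EXPAND_WHEN_QUEUE_IS_FULL_WAIT_AT_LIMIT, the limit is 10 times larger.
        int expandedBufferLimit = threadPoolMaxSize * 10;

        System.out.println("Default requestBuffer limit:  " + defaultBufferLimit);  // 50
        System.out.println("Expanded requestBuffer limit: " + expandedBufferLimit); // 500
    }
}
```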