Threads in socketRead0 in JDBC calls

JDBC calls can sometimes get stuck on socket read calls to the database if some rather nasty network problems exist or if there is a firewall, between the application server and the database, that aggressively closes long-lived connections (some organizations have security reasons to prevent long-lived connections).

The way to determine if network problems exist is to use tcpdump (AIX, Linux) or snoop (Solaris) to capture the packets into files. One can then use Wireshark to read the capture files. If you see issues like "unreassembled packets", "lost segments" or "duplicate ACK" errors then most likely the network is experiencing serious difficulties affecting the server.

WebSphere Application Server is reporting hung threads in the SystemOut.log and a hung thread message as follows:

[1/2/12 1:23:45:678 EDT] 0000001c ThreadMonitor W WSVR0605W: Thread "WebContainer : 15" (00000045) has been active for 722010 milliseconds and may be hung. There is/are 1 thread(s) in total in the server that may be hung.
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.read(SocketInputStream.java:141)
  at com.ibm.db2.jcc.t4.z.b(z.java:199)
  at com.ibm.db2.jcc.am.nn.executeQuery(nn.java:698) [...]

There exists no deadlock or timeout recorded in the logs, even when there are lock timeout (LOCKTIMEOUT) and deadlock check time (DLCHKTIME) settings defined that are greater than 0.

Strategy 1: Apply socketRead timeouts

If threads hang on socketRead0 calls that never seem to get a response then the only way to deal with them is by applying timeouts.

Set the timeout to a reasonable value. The actual value depends on how long is the longest running transaction for the particular application connected to a specific database. If the longest transaction is, for example, 10 seconds then a reasonable value for the timeout could be 12 seconds.

Monitor

Watch the SystemOut.log file and ensure that hung thread messages do not appear again.

Caveats

If the timeout is set too low for the longest running transactions then those transactions will fail.