Java Database Connectivity (JDBC)

Investigating Long Executions

On WAS >= 9.0.0.4 and 8.5.5.12, consider enableJDBCTiming with a millisecond threshold, which prints the query and a stack trace for any execution exceeding that threshold.

Lightweight Query Trace

*=info:com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement=all

https://www.ibm.com/support/pages/how-display-sql-statements-executed-jee-applications-using-minimal-tracing

Database Connection Pools

Database connection pools are highly contended in heavily multi-threaded applications. Ensuring available connections in the pool leads to superior performance. Monitor PMI metrics to watch the number of threads waiting on connections from the pool as well as the average wait time.

  • If threads are waiting, consider increasing the number of pooled connections in conjunction with your database administrator (DBA), decreasing the number of active threads in the system, or investigating the usage of database connections by the application.
  • In some cases, a one-to-one mapping between DB connections and threads may be ideal.
  • Always use the latest database driver for the database you are running as performance optimizations between versions are often significant.

The maximum connection pool size is set under Connection pool settings: Resources > JDBC > Data sources > $DS > Connection pool properties

In order to successfully tune the connection pool, you need to know two pieces of information:

  1. The requests per second that occur during a peak
  2. How long the database takes to respond to each type of operation (SELECT, INSERT, UPDATE, and so on)
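As a rough sizing aid, these two numbers combine via Little's Law: concurrent connections ≈ arrival rate × average connection hold time. The sketch below uses hypothetical figures (500 requests/second at peak, 40 ms average database time, 25% headroom for bursts), not recommendations; validate any result under load with your DBA.

```java
// Rough connection pool sizing via Little's Law: L = lambda x W.
// All figures below are hypothetical examples, not recommendations.
public class PoolSizing {
    static int estimatePoolSize(double requestsPerSecond, double avgDbMillis, double headroom) {
        // Average concurrent connections = arrival rate x average hold time (in seconds)
        double concurrent = requestsPerSecond * avgDbMillis / 1000.0;
        // Round up after adding headroom for bursts
        return (int) Math.ceil(concurrent * headroom);
    }

    public static void main(String[] args) {
        // 500 requests/s at peak, each holding a connection ~40 ms, 25% headroom
        int size = estimatePoolSize(500.0, 40.0, 1.25);
        System.out.println("Estimated maximum pool size: " + size); // 25
    }
}
```

The estimate is a starting point for the "double, then back down" tuning described below, not a substitute for it.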

Maximum connections setting:

  • Start by doubling the Maximum connections parameter, then slowly back it down
  • Better performance is generally achieved if this value is set lower than the maximum size of the web container thread pool

If a ConnectionWaitTimeoutException is found in the WebSphere logs:

  • Obtain the average database operations duration for the application
  • Start with a value that is 5 seconds longer than this average
  • Gradually increase it until the problem is resolved or the setting is at the highest value that the client/SLAs will tolerate.
  • Before you increase the pool size, consult the database administrator (DBA), who sets the maximum number of connections the database will accept. An increase in pool size applies to every cluster member, so it can result in attempting to establish more connections to the database than it will accept. That scenario produces a lot of strange failures that take time to troubleshoot to the root cause.
  • Ensure that the database server is configured to handle the maximum pool size setting.
  • In a clustered environment, every server may allocate up to Maximum connections at the same time.

Connection pools are a shared, synchronized resource. They have been highly optimized, but when there are a very large number of threads, lock synchronization may become a bottleneck. Use the IBM Health Center or a similar tool to measure lock contention; if it is high, consider scaling out to more JVMs.

Connection pool idle and aged timeouts

For maximum performance, connections in the pool should not time out due to the idle timeout ("Unused timeout") or the aged timeout ("Aged timeout"). To accomplish this, disable the connection pool maintenance thread by setting "Reap time" to 0.

The reason to do this is that connection creation and destruction may be expensive (e.g. TLS handshakes, authentication, etc.). Besides increased latency, in some cases this expense may cause a performance tailspin that makes response time spikes worse: for example, something causes an initial database response time spike, incoming load from the clients continues apace, the clients create new connections, and the process of creating those connections slows the database down further than it otherwise would, causing further backups, and so on.

The main potential drawback of this approach is that if there is a firewall between the connection pool and the database, and the firewall has an idle or age timeout, then a connection may be destroyed and cause a stale connection exception the next time it is used. This may fail the request and, if "Purge policy" = "EntirePool", purge the entire connection pool. The main ways to avoid this are either to raise or disable the firewall's idle and age timeouts, or to tune the TCP keepalive settings in the client or database operating systems to fire below those timeouts.

Similarly, some databases have their own idle or age timeouts and should be tuned in the same way. (IBM DB2, for example, does not impose such connection timeouts.)

Finally, some people use connection pool usage as a proxy for database response time spikes; it is better to monitor database response times directly.

Connection Pool Usage

The DataSource MBean may be used to query connection pool usage using wsadmin -lang jython. In the following example, three connections are in use and two connections are free:

wsadmin>print AdminControl.invoke(AdminControl.queryNames("*:type=DataSource,process=server1,name=TradeDataSource,*"), "showPoolContents")

PoolManager name:jdbc/TradeDataSource
PoolManager object:-522043580
Total number of connections: 5 (max/min 5/5, reap/unused/aged 180/1800/0, connectiontimeout/purge 180/EntirePool)
                               (testConnection/inteval false/0, stuck timer/time/threshold 0/0/0, surge time/connections 0/-1)
                               (pool paused false, prePopulate alternate false, resourceFailBackEnabled true, isAlternateResourceEnabled false,
                                 disableDatasourceFailoverAlarm false, startFailBack false)
                               (isPartialResourceAdapterFailoverSupportEnabled false, isAlteranteResourcePoolManager false, resourceAvailabilityTestRetryInterval
                                 10, currentInusePool null, currentMode 100, alternate jndiName null)
Shared Connection information (shared partitions 200)
    com.ibm.ws.tx.jta.TransactionImpl@a47615d6#tid=349227028  MCWrapper id 767a85e9  Managed connection WSRdbManagedConnectionImpl@37f2f2c5  
      State:STATE_TRAN_WRAPPER_INUSE Connections being held 1 Used with transaction com.ibm.ws.tx.jta.TransactionImpl@a47615d6#tid=349227028
    com.ibm.ws.tx.jta.TransactionImpl@9ea5a8b5#tid=349227084  MCWrapper id 3f4eefc9  Managed connection WSRdbManagedConnectionImpl@689ac78c  
      State:STATE_TRAN_WRAPPER_INUSE Connections being held 1 Used with transaction com.ibm.ws.tx.jta.TransactionImpl@9ea5a8b5#tid=349227084
    com.ibm.ws.tx.jta.TransactionImpl@4850aa55#tid=349227060  MCWrapper id 716535f  Managed connection WSRdbManagedConnectionImpl@7424ebb6  
      State:STATE_TRAN_WRAPPER_INUSE Connections being held 1 Used with transaction com.ibm.ws.tx.jta.TransactionImpl@4850aa55#tid=349227060
  Total number of connection in shared pool: 3
Free Connection information (free distribution table/partitions 5/1)
  (0)(0)MCWrapper id 863b69f0  Managed connection WSRdbManagedConnectionImpl@41038936  State:STATE_ACTIVE_FREE
  (0)(0)MCWrapper id 94ff7816  Managed connection WSRdbManagedConnectionImpl@9791d5db  State:STATE_ACTIVE_FREE

  Total number of connection in free pool: 2
UnShared Connection information
  No unshared connections

Connection Leak Logic Information: ...

All data source connection pool statistics may be displayed with showAllPoolContents:

wsadmin>print AdminControl.invoke(AdminControl.queryNames("*:type=DataSource,process=server1,name=TradeDataSource,*"), "showAllPoolContents")

Free connections in a data source connection pool may be purged manually:

wsadmin>AdminControl.invoke(AdminControl.queryNames("*:type=DataSource,process=server1,name=TradeDataSource,*"), "purgePoolContents", "immediate")
''
wsadmin>AdminControl.invoke(AdminControl.queryNames("*:type=DataSource,process=server1,name=TradeDataSource,*"), "purgePoolContents", "0")
''

Statement cache

"The WebSphere Application Server data source optimizes the processing of prepared statements and callable statements by caching those statements that are not being used in an active connection. Both statement types help reduce overhead for transactions with backend data.

A prepared statement is a precompiled SQL statement that is stored in a PreparedStatement object. Application Server uses this object to run the SQL statement multiple times, as required by your application run time, with values that are determined by the run time.
A callable statement is an SQL statement that contains a call to a stored procedure, which is a series of precompiled statements that perform a task and return a result. The statement is stored in the CallableStatement object. Application Server uses this object to run a stored procedure multiple times, as required by your application run time, with values that are determined by the run time.

In general, the more statements your application has, the larger the cache should be. Be aware, however, that specifying a larger statement cache size than needed wastes application memory and does not improve performance.

Determine the value for your cache size by adding the number of uniquely prepared statements and callable statements (as determined by the SQL string, concurrency, and the scroll type) for each application that uses this data source on a particular server. This value is the maximum number of possible statements that can be cached on a given connection over the life of the server.

Default: For most databases the default is 10. Zero means there is no cache statement." (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/rdat_datobjtune.html)

The statement cache size specifies the number of statements that can be cached per connection. Caching prepared statements improves overall response times because an application can reuse a PreparedStatement from a connection's cache, bypassing the need to create a new one. However, to make effective use of this cache, the application must be written to use parameterized SQL statements with the ? (question mark) notation rather than dynamically building strings with the parameter values already substituted, since each unique SQL string occupies its own cache entry and renders the cache useless.
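The effect can be sketched by modeling the cache key as the SQL text (the real key also includes concurrency and scroll type). The class, method, table, and column names here are illustrative only:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrates why parameterized SQL keeps the statement cache effective:
// the cache is keyed by the SQL string, so statements built by string
// concatenation each take their own cache slot.
public class CacheKeyDemo {
    // Count how many distinct cache entries a series of lookups would create
    static int distinctCacheKeys(String sqlTemplate, int[] ids, boolean parameterized) {
        Set<String> cacheKeys = new HashSet<>();
        for (int id : ids) {
            // Parameterized form reuses one SQL text; the value is bound at execution time
            cacheKeys.add(parameterized
                    ? sqlTemplate
                    : sqlTemplate.replace("?", Integer.toString(id)));
        }
        return cacheKeys.size();
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5};
        String sql = "SELECT BALANCE FROM ACCOUNT WHERE ID = ?";
        System.out.println("Concatenated SQL -> cache entries: "
                + distinctCacheKeys(sql, ids, false)); // 5
        System.out.println("Parameterized SQL -> cache entries: "
                + distinctCacheKeys(sql, ids, true));  // 1
    }
}
```

Five concatenated lookups consume five cache slots and never hit; the parameterized form occupies one slot and hits on every reuse.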

Ideally the PreparedStmtCacheDiscardCount should be zero; however, given potential memory constraints, a slowly incrementing count is not necessarily a bad thing. See: https://www.ibm.com/support/pages/node/6410242

Recommendations are made in several WebSphere Application Server documents on the value for the prepared statement cache. They all recommend estimating the number of unique SQL statements an application prepares and using this number to set the number of prepared statements to be cached for each connection.

These formulas work well when the number of unique prepared statements and maximum connections are relatively small; however, these formulas do not take into account the possible memory consumption of the cached prepared statements, particularly when the total number of statements being cached is large. What is considered a small or large prepared statement cache depends on the database vendor in use.

Each prepared statement object consumes some amount of memory. The actual amount is variable, based on the database vendor in use, as well as the number and size of the parameter data for the statement. When prepared statement caches are configured to large values, it is possible to outgrow the amount of memory available to the cache, resulting in unexpected behavior. Depending on the type of JDBC driver, the memory consumption might be from the Java heap or from the JVM's native heap...

If you choose to decrease the size of your prepared statement cache, some cycling of the statement cache could occur, as the least recently used statements are closed to make room for more recently used statements. It can be worthwhile to analyze the usage pattern of the prepared statements in your application. If some prepared statements are executed infrequently, the penalty in consumed resources might outweigh the advantage of the caching mechanism. These infrequently-used statements might be better suited to the java.sql.Statement interface, rather than the java.sql.PreparedStatement interface. Statement objects are not cached by the Application Server and will not consume memory beyond the scope in which they are used.

Shareable versus Unshareable Connections

Database connections marked shareable are not returned to the connection pool when they are closed. Instead, they are reserved for reuse by subsequent requests for a connection within the same transaction containment context. For example, if a thread within a servlet uses the normal get-use-close pattern on a database connection more than once, the second time, the same connection is immediately returned since it was reserved from the pool.

The Java Enterprise Edition specification defines shareable as the default configuration unless otherwise specified:

Sharing connections typically results in efficient usage of resources and better performance. [...] Containers must assume connections to be shareable if no deployment hint is provided.

With all that said, there are some cases where unshareable connections perform better, so you should consider trying unshareable. Note that this may expose connection leaks or other problems. You can set globalConnectionTypeOverride=unshared to disable shareable connections: https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tdat_conpoolman.html

Scenarios where unshareable connections may be preferable:

  • The application takes a long time to service an HTTP request.
  • The application typically does not open/close more than one connection to service an HTTP request.
  • The application rarely uses a transaction other than auto-commit with the database.

Scenarios where shareable connections may be preferable:

  • The time it takes for the application to service an HTTP request is very quick.
  • The application will frequently open/close a connection to the database.
  • The application makes heavy use of transactions to the database.
  • Some EJB container transactions require shareable connections.

As with any application server setting, it is imperative to perform load testing and see which connection setting works better for the application.

More JDBC Connections than Threads

Applications that open a second JDBC connection on the same thread before closing the first can be identified by seeing more connections in the JDBC connection pool than threads in the thread pool. This pattern can deadlock the application if there are not enough connections in the pool. To correct it, the application developers must change the code to close each JDBC connection before acquiring another.
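A minimal sketch of the two patterns, using a bare counter in place of a real pool (nothing here is a WebSphere API). With N threads and only N pooled connections, the nested pattern can deadlock once every thread holds one connection while waiting for a second:

```java
// Toy model of the deadlock-prone pattern: a thread acquiring a second
// pooled connection while still holding the first. The "pool" is just a
// counter; class and method names are illustrative, not WebSphere APIs.
public class NestedConnectionDemo {
    // Simulate two get-use-close cycles on one thread and report the peak
    // number of connections held simultaneously.
    static int peakConnections(boolean closeBeforeNextAcquire) {
        int inUse = 0, peak = 0;
        inUse++; peak = Math.max(peak, inUse);   // first connection acquired
        if (closeBeforeNextAcquire) inUse--;     // fixed pattern: close the first...
        inUse++; peak = Math.max(peak, inUse);   // ...before the second is acquired
        return peak;                             // remaining connections closed after use
    }

    public static void main(String[] args) {
        System.out.println("Nested pattern peak per thread: " + peakConnections(false));    // 2
        System.out.println("Sequential pattern peak per thread: " + peakConnections(true)); // 1
    }
}
```

A per-thread peak above 1 is the signature to look for: it means the pool must hold more connections than there are active threads to avoid exhaustion.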

DB2 JDBC Driver

On HP-UX, preallocate the DB2 trace segment and ensure the database is created with the UTF-8 code set: https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/tprf_tunehp.html

Oracle JDBC Driver

Tracing

Enabling Oracle JDBC driver trace: Oracle ships several JAR files for each version of the JDBC drivers. The optimized JAR files do not contain any logging code and, therefore, do not generate any log output when used. To get log output, you must use the debug JAR files, which are indicated with a "_g" in the file name, like ojdbc5_g.jar or ojdbc6_g.jar.

  • Set this diagnostic trace: *=info:WAS.j2c=all:RRA=all:WAS.database=all:Transaction=all:com.ibm.ws.oracle.logwriter=all
  • Activate the Debug Library by creating the custom property: -Doracle.jdbc.Trace=true

The JVM must be restarted after the changes have been made to use the debug JAR. The Oracle trace points all come from the 'logwriter' component.

Large memory usage

High memory usage, java.lang.OutOfMemoryErrors, slow performance, and a large volume of garbage collection cycles may occur when the Oracle JDBC driver is used to connect to Oracle databases. This is due to the memory management of the Oracle JDBC driver.

In a heap dump, it can be seen that the Oracle JDBC driver stores a large amount of data in Connection and PreparedStatement objects. For example, oracle.jdbc.driver.T4CConnection, oracle.jdbc.driver.PhysicalConnection$BufferCacheStore, oracle.jdbc.driver.BufferCache, oracle.jdbc.driver.T4CPreparedStatement, and others.

The problem is caused by the way that the Oracle JDBC driver manages memory. For full details, refer to the Oracle white paper, Oracle JDBC Memory Management. Here are some relevant quotes:

The Oracle JDBC drivers can use large amounts of memory. This is a conscious design choice, to trade off large memory use for improved performance. For the most part and for most users this has proved to be a good choice. Some users have experienced problems with the amount of memory the JDBC drivers use.

Some users, mostly those with very large scale applications, have seen performance problems due to large heap size, garbage collector thrashing, and even OutOfMemoryExceptions. In subsequent releases the development team has worked to address those issues by improving the way the drivers use memory and by providing users with additional control to address specific problems.

the size of the buffers depends not on the actual size of the row data returned by the query, but on the maximum size possible for the row data. After the SQL is parsed, the type of every column is known and from that information the driver can compute the maximum amount of memory required to store each column. The driver also has the fetchSize, the number of rows to retrieve on each fetch. With the size of each column and the number of rows, the driver can compute the absolute maximum size of the data returned in a single fetch. That is the size of the buffers.

In the worst case, consider a query that returns 255 VARCHAR2(4000) columns. Each column takes 8k bytes per row. Times 255 columns is 2040K bytes or 2MB per row. If the fetchSize is set to 1000 rows, then the driver will try to allocate a 2GB char[]. This would be bad... The primary tool for controlling memory use is the fetchSize.

Although Java memory management is quite good, allocating large buffers is expensive. It is not the actual malloc cost. That is very fast. Instead the problem is the Java language requirement that all such buffers be zero filled. So not only must a large buffer be malloc'ed, it must also be zero filled. Zero filling requires touching every byte of the allocated buffer. Modern processors with their multilevel data caches do ok with small buffers. Zero filling a large buffer overruns the processor data caches and runs at memory speed, substantially less than the maximum speed of the processor. Performance testing has repeatedly shown that allocating buffers is a huge performance drag on the drivers. This has led to a struggle to balance the cost of allocating buffers with the memory footprint required to save buffers for reuse.

The 11.1.0.7.0 drivers introduce a connection property to address the large buffer problem. This property bounds the maximum size of buffer that will be saved in the buffer cache... The connection property is -Doracle.jdbc.maxCachedBufferSize=N ... e.g. 100000. The default is Integer.MAX_VALUE. This is the maximum size for a buffer which will be stored in the internal buffer cache... If you need to set maxCachedBufferSize, start by estimating the buffer sizes for the SQL queries that require the largest buffers. In the process you may find that by tuning the fetch size for these queries you can achieve the desired performance. Considering the frequency of execution and the size of the buffers, pick a size such that most statements can use cached buffers, but still small enough so that the Java runtime can support the number of buffers needed in order to minimize the frequency with which new buffers have to be allocated.

In 11.2 the value of maxCachedBufferSize is interpreted as the log base 2 of the maximum buffer size. For example if maxCachedBufferSize is set to 20 the max size buffer that is cached is 2^20 = 1048576. For backwards compatibility, values larger than 30 are interpreted as the actual size rather than log2 of the size, but using powers of 2 is recommended... It is usually the case that setting maxCachedBufferSize to a reasonable value has no impact. If you need to set maxCachedBufferSize, start with 18. If you have to set the value to less than 16, you probably need more memory.
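The arithmetic from the quoted passages can be sanity-checked in a few lines. The constants restate the white paper's worst-case example (255 VARCHAR2(4000) columns at roughly 8 KB each, fetchSize 1000) and the 11.2 log-base-2 interpretation of maxCachedBufferSize; this is a back-of-the-envelope sketch, not driver internals:

```java
// Sanity-checks the Oracle white paper's worst-case buffer arithmetic and
// the 11.2 interpretation of maxCachedBufferSize. Figures come from the
// quoted text; this is not an implementation of the driver's logic.
public class OracleBufferMath {
    // Maximum fetch buffer: bytes per column x columns x rows per fetch
    static long worstCaseBufferBytes(long columns, long bytesPerColumn, long fetchSize) {
        return columns * bytesPerColumn * fetchSize;
    }

    // 11.2 semantics: values up to 30 are log2 of the byte limit; larger values are literal
    static long cachedBufferLimitBytes(int maxCachedBufferSize) {
        return maxCachedBufferSize <= 30 ? 1L << maxCachedBufferSize : maxCachedBufferSize;
    }

    public static void main(String[] args) {
        // 255 VARCHAR2(4000) columns at ~8 KB per column per row
        long perRow = worstCaseBufferBytes(255, 8 * 1024, 1);
        System.out.println("Per row: " + perRow / 1024 + " KB");              // 2040 KB
        long perFetch = worstCaseBufferBytes(255, 8 * 1024, 1000);
        System.out.println("Per fetch: " + perFetch / (1024 * 1024) + " MB"); // 1992 MB, ~2 GB
        System.out.println("maxCachedBufferSize=20 -> "
                + cachedBufferLimitBytes(20) + " bytes");                     // 1048576
    }
}
```

Shrinking the fetchSize divides the per-fetch buffer directly, which is why the white paper calls fetchSize the primary tool for controlling memory use.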