IBM JCL and Tools

Reflection Inflation

For a discussion of reflection and inflation, see the general Java chapter. On the IBM JVM, the option -Dsun.reflect.inflationThreshold=0 disables inflation completely.

The sun.reflect.inflationThreshold property tells the JVM how many times to use the JNI accessor before inflating to a bytecode accessor. On IBM Java, if it is set to 0, the JNI accessors are always used. Because the generated bytecode accessors use more native memory than the JNI ones, applications that perform a lot of Java reflection may want to stay on the JNI accessors, which is done by setting the inflationThreshold property to zero. (http://www-01.ibm.com/support/docview.wss?uid=swg21566549)

On IBM Java, the default -Dsun.reflect.inflationThreshold=15 means that the JVM uses the JNI accessor for the first 15 accesses and then switches to a generated Java bytecode accessor. Using the bytecode accessor currently costs 3-4x more than the JNI accessor for the first invocation, but subsequent invocations have been benchmarked at over 20x faster than the JNI accessor.
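
As a rough illustration, the following sketch (the class name and loop count are hypothetical, chosen only for demonstration) repeatedly invokes a method reflectively; with default settings the early invocations go through the JNI accessor and later ones through a generated bytecode accessor, unless -Dsun.reflect.inflationThreshold=0 is set on IBM Java.

import java.lang.reflect.Method;

public class ReflectionInflationDemo {
    public static void main(String[] args) throws Exception {
        Method trim = String.class.getMethod("trim");
        for (int i = 0; i < 30; i++) {
            // With the default -Dsun.reflect.inflationThreshold=15, roughly the
            // first 15 calls use the JNI accessor; subsequent calls use a
            // generated bytecode accessor. With the threshold set to 0 on IBM
            // Java, all calls stay on the JNI accessor.
            trim.invoke("  hello  ");
        }
    }
}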

Advanced Encryption Standard New Instructions (AESNI)

AESNI is a set of CPU instructions to improve the speed of encryption and decryption using AES ciphers. It is available on recent Intel and AMD CPUs (https://en.wikipedia.org/wiki/AES_instruction_set) and POWER >= 8 CPUs (http://www.redbooks.ibm.com/abstracts/sg248171.html). If using IBM Java >= 6 and the IBM JCE security provider, then AESNI, if available, can be exploited with -Dcom.ibm.crypto.provider.doAESInHardware=true (http://www-01.ibm.com/support/knowledgecenter/SSYKE2_7.0.0/com.ibm.java.security.component.70.doc/security-component/JceDocs/aesni.html?lang=en).

In some benchmarks, SSL/TLS overhead was reduced by up to 35%.

Use -Dcom.ibm.crypto.provider.AESNITrace=true to check whether the processor supports the AES-NI instruction set.
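
For context, the following is a minimal sketch of standard JCE AES usage that can benefit from AESNI when run on IBM Java with the IBM JCE provider and -Dcom.ibm.crypto.provider.doAESInHardware=true. The class name and plaintext are illustrative, and whether IBMJCE is actually selected depends on the provider order in java.security.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class AesExample {
    public static void main(String[] args) throws Exception {
        // Run with, for example:
        //   -Dcom.ibm.crypto.provider.doAESInHardware=true
        //   -Dcom.ibm.crypto.provider.AESNITrace=true
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] ciphertext = cipher.doFinal("example plaintext".getBytes("UTF-8"));
        System.out.println("Encrypted " + ciphertext.length + " bytes");
    }
}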

Object Request Broker (ORB) and Remote Method Invocation (RMI)

Review the key ORB properties discussed in the WAS documentation and the Java documentation, with the following highlights (an illustrative combination of these options is shown after this list):

  • -Dcom.ibm.CORBA.ConnectTimeout=SECONDS : New socket connect timeout.
  • -Dcom.ibm.CORBA.MaxOpenConnections=X : Maximum number of in-use connections that are to be kept in the connection cache table at any one time.
  • -Dcom.ibm.CORBA.RequestTimeout=SECONDS : Total number of seconds to wait before timing out on a Request message.
  • -Dcom.ibm.CORBA.SocketWriteTimeout=SECONDS : More granular timeout for every socket write. The value will depend on whether fragmentation is enabled or not. If it's enabled, then it should generally be set relatively low (e.g. 5 seconds) because each write is very small (see FragmentSize). If it's disabled, then the write will be as big as the largest message, so set the timeout based on that and your expected network performance. Set this value on both the client and the server.
  • -Dcom.ibm.CORBA.ConnectionMultiplicity=N : See the discussion on ConnectionMultiplicity. Recent versions of Java automatically tune this value at runtime.
  • -Dcom.ibm.websphere.orb.threadPoolTimeout=MILLISECONDS : Avoid potential deadlocks or hangs on reader threads. This is often set to a value of 10000.
  • -Dcom.ibm.CORBA.FragmentSize=N : See the discussion on FragmentSize.
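
For illustration only, a hypothetical combination of these options as generic JVM arguments might look like the following; the values are examples rather than recommendations and should be tuned for your environment:

-Dcom.ibm.CORBA.ConnectTimeout=10
-Dcom.ibm.CORBA.MaxOpenConnections=200
-Dcom.ibm.CORBA.RequestTimeout=180
-Dcom.ibm.CORBA.SocketWriteTimeout=5
-Dcom.ibm.websphere.orb.threadPoolTimeout=10000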

Default ORB configuration is specified in ${java}/jre/lib/orb.properties. For WAS, it is generally recommended instead to change these options (where available) under Administrative Console > WebSphere application servers > $server > Container services > ORB service, or by using -D generic JVM arguments. There may be additional settings under Custom Properties within this panel. WAS on z/OS has additional settings under z/OS additional settings. In the WAS configuration, these settings translate to the <services xmi:type="orb:ObjectRequestBroker" ... element or the <properties ... child elements underneath it in server.xml (or genericJvmArguments and custom properties under the JVM section in server.xml).

Note that in WAS, there is an ORB.thread.pool configuration which is normally used; however, if the ThreadPool properties are specified in orb.properties, then they override the WAS configuration. See a detailed discussion of how properties are evaluated at the different levels.

You may see ORB reader threads (RT) and writer threads (WT). For example, here is a reader thread:

3XMTHREADINFO "RT=265:P=941052:O=0:WSTCPTransportConnection[addr=...,port=2940,local=48884]" J9VMThread:0x000000000E255600, j9thread_t:0x00002AAAC15D5470, java/lang/Thread:0x000000004CF4B4F0, state:R, prio=5
3XMTHREADINFO1 (native thread ID:0x7EFD, native priority:0x5, native policy:UNKNOWN)
3XMTHREADINFO2 (native stack address range from:0x00002AAAD7D6A000, to:0x00002AAAD7DAB000, size:0x41000)
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/net/SocketInputStream.socketRead0(Native Method)
4XESTACKTRACE at java/net/SocketInputStream.read(SocketInputStream.java:140(Compiled Code))
4XESTACKTRACE at com/ibm/rmi/iiop/Connection.readMoreData(Connection.java:1642(Compiled Code))
4XESTACKTRACE at com/ibm/rmi/iiop/Connection.createInputStream(Connection.java:1455(Compiled Code))
4XESTACKTRACE at com/ibm/rmi/iiop/Connection.doReaderWorkOnce(Connection.java:3250(Compiled Code))
4XESTACKTRACE at com/ibm/rmi/transport/ReaderThread.run(ReaderPoolImpl.java:142(Compiled Code))

These will normally be in R (runnable) state, even if they are just waiting for the incoming message.

The number of Reader Threads (RT) is controlled by the number of active socket connections, not by the ORB thread pool size. A reader thread is created for every socket connect/accept and removed when the socket closes. Reader threads are not bounded by MaxConnectionCacheSize, which is a soft limit: the connection cache can grow beyond it. Once the cache reaches MaxConnectionCacheSize, the ORB will try to remove stale (i.e. unused) connections.

The ORB thread pool size will be a cap on the maximum number of Writer Threads (WT), as only up to the number of ORB threads can be writing.

Connection Multiplicity

com.ibm.CORBA.ConnectionMultiplicity: The value of ConnectionMultiplicity defines the number of concurrent TCP connections between a client ORB and a server ORB. The default is 1 or automatic, depending on the version of Java. Low values can create a performance bottleneck in J2EE deployments with a large number of concurrent requests between the client and server ORBs.

For example, -Dcom.ibm.CORBA.ConnectionMultiplicity=N

See further discussion at https://www.ibm.com/support/pages/troubleshooting-object-request-broker-orb-problems-1 and https://www.ibm.com/support/pages/node/244347.

Fragment Size

The ORB separates messages into fragments to send over the ORB connection. You can configure this fragment size through the com.ibm.CORBA.FragmentSize parameter.
To determine and change the size of the messages that transfer over the ORB and the number of required fragments, perform the following steps:

  • In the administrative console, enable ORB tracing in the ORB Properties page.
  • Enable the ORBRas diagnostic trace ORBRas=all (http://www-01.ibm.com/support/docview.wss?uid=swg21254706).
  • Increase the trace file sizes because tracing can generate a lot of data.
  • Restart the server and run at least one iteration (preferably several) of the case that you are measuring.
  • Look at the trace file and search for Fragment to follow: Yes.

This message indicates that the ORB transmitted a fragment, but it still has at least one remaining fragment to send prior to the entire message arriving. A Fragment to follow: No value indicates that the particular fragment is the last in the entire message. This fragment can also be the first, if the message fits entirely into one fragment.

If you go to the spot where Fragment to follow: Yes is located, you find a block that looks similar to the following example:

Fragment to follow: Yes
Message size: 4988 (0x137C)
--
Request ID: 1411

This example indicates that the amount of data in the fragment is 4988 bytes and the Request ID is 1411. If you search for all occurrences of Request ID: 1411, you can see the number of fragments that are used to send that particular message. If you add all the associated message sizes, you have the total size of the message that is being sent through the ORB.
You can configure the fragment size by setting the com.ibm.CORBA.FragmentSize ORB custom property.

http://www.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.lnx.80.doc/diag/understanding/orb_using.html

Setting -Dcom.ibm.CORBA.FragmentSize=0 disables fragmentation and may improve performance in some cases; however, note that multiple requests may be multiplexed on a single connection and there will be a lock on the connection during the write of the full message. If a message is large, fragmentation is disabled, and the concurrent load is greater than ConnectionMultiplicity, this may create a bottleneck.

One additional benefit that FragmentSize=0 may provide is isolating a subset of problematic clients (e.g. on a bad network) to just the reader threads: with fragmentation disabled, the full read occurs on the reader thread, whereas with fragmentation enabled, a worker thread is consumed while waiting for the next fragment. Note that when the server sends the response back to the client, the write happens on an ORB thread pool thread; however, with a sufficient ORB thread pool size, this may help isolate such problematic clients.

Interceptors

Interceptors are ORB extensions that set up the context before the ORB runs a request. For example, the context might include transactions or activity sessions to import. If the client creates a transaction and then flows the transaction context to the server, the server imports the transaction context onto the server request through the interceptors.

Most clients do not start transactions or activity sessions, so most systems can benefit from removing the interceptors that are not required.

To remove the interceptors, manually edit the server.xml file and remove the interceptor lines that are not needed from the ORB section.
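
As a rough, hypothetical sketch only: the entries to review sit under the <services xmi:type="orb:ObjectRequestBroker" ... element mentioned above, and the interceptor name shown below is a placeholder, not a real IBM class. Verify the actual entries and which interceptors your applications require in your own server.xml before removing anything.

<services xmi:type="orb:ObjectRequestBroker" ...>
  <interceptors xmi:id="Interceptor_1" name="com.example.UnusedInterceptor"/>
  <!-- ... remaining interceptor entries ... -->
</services>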

ORB IBM Data Representation (IDR)

ORB 7.1 introduced dramatic performance improvements.

java.nio.DirectByteBuffer

The option -Dcom.ibm.nio.DirectByteBuffer.AggressiveMemoryManagement=true may be used to enable a more aggressive DirectByteBuffer cleanup algorithm (which may increase the frequency of System.gc() calls).
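
For background, direct buffers are the allocations this option targets; the following is a minimal sketch of standard java.nio code that allocates off-heap memory (the size and values are arbitrary):

import java.nio.ByteBuffer;

public class DirectBufferExample {
    public static void main(String[] args) {
        // Allocates 1 MB of native (off-heap) memory backing the buffer.
        // The native memory is only released after the DirectByteBuffer object
        // is garbage collected, which is why cleanup behavior (and the
        // aggressive option above) matters under native memory pressure.
        ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024);
        buffer.putInt(42);
        System.out.println("Direct buffer capacity: " + buffer.capacity());
    }
}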

XML and XSLT

Profile your application using tools such as the IBM Java Health Center or more simply by taking multiple thread dumps. If you observe significant lock contention on an instance of java/lang/Class and/or significant CPU time in com/ibm/xtq/xslt/* classes, then consider testing the older XSLT4J interpreter to see if you have better results:

From Version 6, the XL TXE-J compiler replaces the XSLT4J interpreter as the default XSLT processor.

The XL TXE-J compiler is faster than the XSLT4J interpreter when you are applying the same transformation more than once. If you perform each individual transformation only once, the XL TXE-J compiler is slower than the XSLT4J interpreter because compilation and optimization reduce performance.

For best performance, ensure that you are not recompiling XSLT transformations that can be reused. Use one of the following methods to reuse compiled transformations:

  • If your stylesheet does not change at run time, compile the stylesheet as part of your build process and put the compiled classes on your classpath. Use the org.apache.xalan.xsltc.cmdline.Compile command to compile the stylesheet and set the http://www.ibm.com/xmlns/prod/xltxe-j/use-classpath transformer factory attribute to true to load the classes from the classpath.
  • If your application will use the same stylesheet during multiple runs, set the http://www.ibm.com/xmlns/prod/xltxe-j/auto-translet transformer factory attribute to true to automatically save the compiled stylesheet to disk for reuse. The compiler will use a compiled stylesheet if it is available, and compile the stylesheet if it is not available or is out-of-date. Use the http://www.ibm.com/xmlns/prod/xltxe-j/destination-directory transformer factory attribute to set the directory used to store compiled stylesheets. By default, compiled stylesheets are stored in the same directory as the stylesheet.
  • If your application is a long-running application that reuses the same stylesheet, use the transformer factory to compile the stylesheet and create a Templates object. You can use the Templates object to create Transformer objects without recompiling the stylesheet. The Transformer objects can also be reused but are not thread-safe.
  • If your application uses each stylesheet just once or a very small number of times, or you are unable to make any of the other changes listed in this step, you might want to continue to use the XSLT4J interpreter by setting the javax.xml.transform.TransformerFactory service provider to org.apache.xalan.processor.TransformerFactoryImpl.

http://www.ibm.com/support/knowledgecenter/SSYKE2_8.0.0/com.ibm.java.lnx.80.doc/user/xml/xslt_migrate.html
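
As a minimal sketch of the long-running application case above (the stylesheet and file names are placeholders), a stylesheet can be compiled once into a Templates object and then reused to create per-transformation Transformer instances:

import java.io.File;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesReuse {
    // Compile the stylesheet once; the Templates object is thread-safe and reusable.
    private static final Templates TEMPLATES = compile("stylesheet.xsl");

    private static Templates compile(String stylesheetPath) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new File(stylesheetPath)));
        } catch (Exception e) {
            throw new IllegalStateException("Failed to compile stylesheet", e);
        }
    }

    public static void transform(File input, File output) throws Exception {
        // Transformer instances are cheap to create from Templates but are not
        // thread-safe, so create one per transformation (or per thread).
        Transformer transformer = TEMPLATES.newTransformer();
        transformer.transform(new StreamSource(input), new StreamResult(output));
    }
}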

For additional information, see http://www-01.ibm.com/support/docview.wss?uid=swg21639667

DNS Cache

The DNS Cache works the same as in the OpenJDK JCL; however, there is an additional property -Dcom.ibm.cacheLocalHost=true to cache localhost lookups.
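
For context, a localhost lookup like the following sketch is the kind of call that -Dcom.ibm.cacheLocalHost=true caches on IBM Java; without the property, repeated calls may go back to the resolver.

import java.net.InetAddress;

public class LocalHostLookup {
    public static void main(String[] args) throws Exception {
        // With -Dcom.ibm.cacheLocalHost=true on IBM Java, the result of this
        // lookup is cached instead of being re-resolved on each call.
        InetAddress localHost = InetAddress.getLocalHost();
        System.out.println(localHost.getHostName() + " / " + localHost.getHostAddress());
    }
}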