IBM JCL and Tools
Reflection Inflation
For a discussion of reflection and inflation, see the general Java chapter. On the IBM JVM, the option -Dsun.reflect.inflationThreshold=0 disables inflation completely.
The sun.reflect.inflationThreshold property tells the JVM how many times to use the JNI accessor before inflating to a bytecode accessor. On IBM Java, if it is set to 0, then the JNI accessors are always used. Since the generated bytecode accessors use more native memory than the JNI ones, if the application performs a lot of Java reflection and native memory is a concern, consider using the JNI accessors by setting the inflationThreshold property to zero. (http://www-01.ibm.com/support/docview.wss?uid=swg21566549)
On IBM Java, the default -Dsun.reflect.inflationThreshold=15 means that the JVM will use the JNI accessor for the first 15 accesses, then change to the Java bytecode accessor. Using a bytecode accessor currently costs 3-4x more than an invocation via a JNI accessor for the first invocation, but subsequent invocations have been benchmarked at over 20x faster than the JNI accessor.
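As a hedged illustration (the class and method names below are made up for this sketch), the following standalone program drives enough reflective calls to cross the default inflation threshold; running it with -Dsun.reflect.inflationThreshold=0 on IBM Java keeps every invocation on the JNI accessor instead:

```java
import java.lang.reflect.Method;

public class InflationDemo {
    public static String greet() {
        return "hello";
    }

    public static void main(String[] args) throws Exception {
        Method m = InflationDemo.class.getMethod("greet");
        // With the default -Dsun.reflect.inflationThreshold=15, the first 15
        // calls go through the JNI accessor; later calls may be "inflated"
        // into a generated bytecode accessor class (extra native memory).
        for (int i = 0; i < 20; i++) {
            m.invoke(null);
        }
        System.out.println(m.invoke(null));
    }
}
```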
Advanced Encryption Standard New Instructions (AESNI)
AESNI is a set of CPU instructions to improve the speed of encryption and decryption using AES ciphers. It is available on recent Intel and AMD CPUs (https://en.wikipedia.org/wiki/AES_instruction_set) and POWER >= 8 CPUs (http://www.redbooks.ibm.com/abstracts/sg248171.html). If using IBM Java >= 6 and the IBM JCE security provider, then AESNI, if available, can be exploited with -Dcom.ibm.crypto.provider.doAESInHardware=true (http://www-01.ibm.com/support/knowledgecenter/SSYKE2_7.0.0/com.ibm.java.security.component.70.doc/security-component/JceDocs/aesni.html?lang=en).
In some benchmarks, SSL/TLS overhead was reduced by up to 35%.
Use -Dcom.ibm.crypto.provider.AESNITrace=true to check whether the processor supports the AES-NI instruction set.
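For example, the trace property can be added to the Java command line alongside the hardware option (the application name below is a placeholder, and the exact trace output format varies by JVM level):

```
java -Dcom.ibm.crypto.provider.doAESInHardware=true \
     -Dcom.ibm.crypto.provider.AESNITrace=true \
     MyApp
```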
Object Request Broker (ORB) and Remote Method Invocation (RMI)
Important links:
- General information on ORB
- TroubleShooting: Object Request Broker (ORB) problems
- MustGather: Object Request Broker (ORB)
- Additional troubleshooting guide
- Object Request Broker tuning guidelines
Review the key ORB properties discussed in the WAS documentation and the Java documentation, with the following highlights:
-Dcom.ibm.CORBA.ConnectTimeout=SECONDS
: New socket connect timeout.

-Dcom.ibm.CORBA.MaxOpenConnections=X
: Maximum number of in-use connections that are to be kept in the connection cache table at any one time.

-Dcom.ibm.CORBA.RequestTimeout=SECONDS
: Total number of seconds to wait before timing out on a Request message.

-Dcom.ibm.CORBA.SocketWriteTimeout=SECONDS
: More granular timeout for every socket write. The value will depend on whether fragmentation is enabled or not. If it's enabled, then it should generally be set relatively low (e.g. 5 seconds) because each write is very small (see FragmentSize). If it's disabled, then the write will be as big as the largest message, so set the timeout based on that and your expected network performance. When setting this value, set it on both client and server.

-Dcom.ibm.CORBA.ConnectionMultiplicity=N
: See the discussion on ConnectionMultiplicity. Recent versions of Java automatically tune this value at runtime.

-Dcom.ibm.websphere.orb.threadPoolTimeout=MILLISECONDS
: Avoid potential deadlocks or hangs on reader threads. This is often set to a value of 10000.

-Dcom.ibm.CORBA.FragmentSize=N
: See the discussion on FragmentSize.
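For instance, a hedged starting point combining several of these properties as generic JVM arguments might look like the following (the values are illustrative only and must be tuned for your environment and network):

```
-Dcom.ibm.CORBA.ConnectTimeout=10
-Dcom.ibm.CORBA.RequestTimeout=60
-Dcom.ibm.CORBA.SocketWriteTimeout=5
-Dcom.ibm.websphere.orb.threadPoolTimeout=10000
```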
Default ORB configuration is specified in ${java}/jre/lib/orb.properties. For WAS, it is generally recommended instead to change these options (where available) under Administrative Console > WebSphere application servers > $server > Container services > ORB service, or using -D generic JVM arguments. There may be additional settings under Custom Properties within this panel. WAS on z/OS has additional settings under z/OS additional settings. In the WAS configuration, these available settings translate to the <services xmi:type="orb:ObjectRequestBroker" ... element or <properties ... child elements underneath it in server.xml (or genericJVMarguments and custom properties under the JVM section in server.xml).
Note that in WAS, there is an ORB.thread.pool configuration which is normally used; however, if the ThreadPool properties are specified in orb.properties, then they override the WAS configuration. See a detailed discussion of how properties are evaluated at the different levels.
You may see ORB reader threads (RT) and writer threads (WT). For example, here is a reader thread:
    3XMTHREADINFO      "RT=265:P=941052:O=0:WSTCPTransportConnection[addr=...,port=2940,local=48884]" J9VMThread:0x000000000E255600, j9thread_t:0x00002AAAC15D5470, java/lang/Thread:0x000000004CF4B4F0, state:R, prio=5
    3XMTHREADINFO1            (native thread ID:0x7EFD, native priority:0x5, native policy:UNKNOWN)
    3XMTHREADINFO2            (native stack address range from:0x00002AAAD7D6A000, to:0x00002AAAD7DAB000, size:0x41000)
    3XMTHREADINFO3           Java callstack:
    4XESTACKTRACE                at java/net/SocketInputStream.socketRead0(Native Method)
    4XESTACKTRACE                at java/net/SocketInputStream.read(SocketInputStream.java:140(Compiled Code))
    4XESTACKTRACE                at com/ibm/rmi/iiop/Connection.readMoreData(Connection.java:1642(Compiled Code))
    4XESTACKTRACE                at com/ibm/rmi/iiop/Connection.createInputStream(Connection.java:1455(Compiled Code))
    4XESTACKTRACE                at com/ibm/rmi/iiop/Connection.doReaderWorkOnce(Connection.java:3250(Compiled Code))
    4XESTACKTRACE                at com/ibm/rmi/transport/ReaderThread.run(ReaderPoolImpl.java:142(Compiled Code))
These will normally be in R (runnable) state, even if they are just waiting for an incoming message.
The number of Reader Threads (RT) is controlled by the number of active socket connections, not by the ORB thread pool size. For every socket connect/accept call, an RT is created, and an RT is removed when the socket closes. The number of RTs is not bounded by MaxConnectionCacheSize, which is a soft limit: the cache can grow beyond MaxConnectionCacheSize. Once the cache hits MaxConnectionCacheSize, the ORB will try to remove stale (i.e., unused) connections.
The ORB thread pool size will be a cap on the maximum number of Writer Threads (WT), as only up to the number of ORB threads can be writing.
Connection Multiplicity
com.ibm.CORBA.ConnectionMultiplicity: The value of ConnectionMultiplicity defines the number of concurrent TCP connections between a server and client ORB. The default is 1 or automatic, depending on the version of Java. Low values can lead to a performance bottleneck in J2EE deployments where there are a large number of concurrent requests between the client and server ORBs.
For example,
-Dcom.ibm.CORBA.ConnectionMultiplicity=N
See further discussion at https://www.ibm.com/support/pages/troubleshooting-object-request-broker-orb-problems-1 and https://www.ibm.com/support/pages/node/244347.
Fragment Size
The ORB separates messages into fragments to send over the ORB connection. You can configure this fragment size through the com.ibm.CORBA.FragmentSize parameter.
To determine and change the size of the messages that transfer over the ORB and the number of required fragments, perform the following steps:
- In the administrative console, enable ORB tracing in the ORB Properties page.
- Enable ORBRas diagnostic trace ORBRas=all (http://www-01.ibm.com/support/docview.wss?uid=swg21254706).
- Increase the trace file sizes because tracing can generate a lot of data.
- Restart the server and run at least one iteration (preferably several) of the case that you are measuring.
- Look at the trace file and search for Fragment to follow: Yes. This message indicates that the ORB transmitted a fragment, but it still has at least one remaining fragment to send prior to the entire message arriving. A Fragment to follow: No value indicates that the particular fragment is the last in the entire message. This fragment can also be the first, if the message fits entirely into one fragment.
If you go to the spot where Fragment to follow: Yes is located, you find a block that looks similar to the following example:

    Fragment to follow: Yes
    Message size: 4988 (0x137C)
    --
    Request ID: 1411

This example indicates that the amount of data in the fragment is 4988 bytes and the Request ID is 1411. If you search for all occurrences of Request ID: 1411, you can see the number of fragments that are used to send that particular message. If you add all the associated message sizes, you have the total size of the message that is being sent through the ORB.
You can configure the fragment size by setting the com.ibm.CORBA.FragmentSize ORB custom property.
Setting -Dcom.ibm.CORBA.FragmentSize=0 disables fragmentation and may improve performance in some cases; however, note that multiple requests may be multiplexed on a single connection and there will be a lock on the connection during the write of the full message. If a message is large, fragmentation is disabled, and the concurrent load is greater than ConnectionMultiplicity, this may create a bottleneck.
One additional value that FragmentSize=0 may provide is to isolate a subset of problematic clients (e.g. bad network) to just the reader threads because a full, non-fragmented read occurs on the reader thread whereas with fragmentation, it will need to consume a worker thread while it waits for the next fragment. Note that when the server sends the response back to the client, the write happens on the ORB thread pool thread; however, with a sufficient ORB thread pool size, this may help isolate such problematic clients.
Interceptors
Interceptors are ORB extensions that can set up the context before the ORB runs a request. For example, the context might include transactions or activity sessions to import. If the client creates a transaction, and then flows the transaction context to the server, then the server imports the transaction context onto the server request through the interceptors.
Most clients do not start transactions or activity sessions, so most systems can benefit from removing the interceptors that are not required.
To remove the interceptors, manually edit the server.xml file and remove the interceptor lines that are not needed from the ORB section.
ORB IBM Data Representation (IDR)
ORB 7.1 introduced dramatic performance improvements.
java.nio.DirectByteBuffer
Unlike IBM Semeru Runtimes and similar OpenJDK runtimes, IBM Java has slightly different default DirectByteBuffer behavior: the -XX:MaxDirectMemorySize hard limit defaults to unlimited, and instead there is a "soft" limit that defaults to 64MB. This soft limit grows if the application needs more DirectByteBuffer space and no hard limit is configured. The JDK is reluctant to grow the soft limit: before it expands the limit, there is a series of System GC events that try to free up enough space to avoid growing the limit. System GCs cause long pauses during which application threads cannot do any work, so they should generally be avoided if at all possible for performance reasons. Also, when the soft limit is used, the limit is raised only a little bit at a time, and every time the limit is hit there is another series of System GCs before growth is allowed. So if the application demands a lot of DirectByteBuffers, growing from 64MB to the necessary size will take a long time, during which performance will be significantly impacted. For this reason, from a performance perspective, it is generally recommended to specify -XX:MaxDirectMemorySize as needed, and to ensure there is enough physical memory to support the potential DirectByteBuffer demands. For example:

-XX:MaxDirectMemorySize=1G

The option -Dcom.ibm.nio.DirectByteBuffer.AggressiveMemoryManagement=true may be used to enable a more aggressive DirectByteBuffer cleanup algorithm (which may increase the frequency of System.gc calls).
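As a rough sketch (the sizes are arbitrary for illustration), the following program allocates direct buffers up to the 64MB default soft limit; on IBM Java without -XX:MaxDirectMemorySize, allocation beyond the soft limit may trigger the System GC cycles described above:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DirectBufferDemo {
    public static void main(String[] args) {
        List<ByteBuffer> buffers = new ArrayList<>();
        // Allocate 64 x 1MB of native (off-heap) memory via DirectByteBuffers.
        // Holding references prevents the buffers from being collected.
        for (int i = 0; i < 64; i++) {
            buffers.add(ByteBuffer.allocateDirect(1024 * 1024));
        }
        long total = buffers.stream().mapToLong(ByteBuffer::capacity).sum();
        System.out.println("Allocated " + (total / (1024 * 1024)) + " MB direct");
    }
}
```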
XML and XSLT
Profile your application using tools such as the IBM Java Health Center or more simply by taking multiple thread dumps. If you observe significant lock contention on an instance of java/lang/Class and/or significant CPU time in com/ibm/xtq/xslt/* classes, then consider testing the older XSLT4J interpreter to see if you have better results:
From Version 6, the XL TXE-J compiler replaces the XSLT4J interpreter as the default XSLT processor.
The XL TXE-J compiler is faster than the XSLT4J interpreter when you are applying the same transformation more than once. If you perform each individual transformation only once, the XL TXE-J compiler is slower than the XSLT4J interpreter because the cost of compilation and optimization outweighs the benefit of faster execution.
For best performance, ensure that you are not recompiling XSLT transformations that can be reused. Use one of the following methods to reuse compiled transformations:
- If your stylesheet does not change at run time, compile the stylesheet as part of your build process and put the compiled classes on your classpath. Use the org.apache.xalan.xsltc.cmdline.Compile command to compile the stylesheet and set the http://www.ibm.com/xmlns/prod/xltxe-j/use-classpath transformer factory attribute to true to load the classes from the classpath.
- If your application will use the same stylesheet during multiple runs, set the http://www.ibm.com/xmlns/prod/xltxe-j/auto-translet transformer factory attribute to true to automatically save the compiled stylesheet to disk for reuse. The compiler will use a compiled stylesheet if it is available, and compile the stylesheet if it is not available or is out-of-date. Use the http://www.ibm.com/xmlns/prod/xltxe-j/destination-directory transformer factory attribute to set the directory used to store compiled stylesheets. By default, compiled stylesheets are stored in the same directory as the stylesheet.
- If your application is a long-running application that reuses the same stylesheet, use the transformer factory to compile the stylesheet and create a Templates object. You can use the Templates object to create Transformer objects without recompiling the stylesheet. The Transformer objects can also be reused but are not thread-safe.
- If your application uses each stylesheet just once or a very small number of times, or you are unable to make any of the other changes listed in this step, you might want to continue to use the XSLT4J interpreter by setting the javax.xml.transform.TransformerFactory service provider to org.apache.xalan.processor.TransformerFactoryImpl.
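The Templates approach described above can be sketched as follows (the stylesheet and input are trivial placeholders; a real application would load the stylesheet once at startup):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesDemo {
    private static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:template match='/'><out><xsl:value-of select='/in'/></out></xsl:template>"
      + "</xsl:stylesheet>";

    public static void main(String[] args) throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // Compile the stylesheet once; the Templates object is thread-safe
        // and can be shared across the application.
        Templates templates = tf.newTemplates(new StreamSource(new StringReader(XSL)));
        // Per-use Transformers avoid recompilation but are not thread-safe,
        // so create one per thread or per transformation.
        Transformer t = templates.newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader("<in>hi</in>")),
                    new StreamResult(out));
        System.out.println(out); // <out>hi</out>
    }
}
```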
For additional information, see http://www-01.ibm.com/support/docview.wss?uid=swg21639667
DNS Cache
The DNS cache works the same as in the OpenJDK JCL; however, there is an additional property -Dcom.ibm.cacheLocalHost=true to cache localhost lookups.
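For illustration, the lookups affected by this property are ordinary InetAddress resolutions of localhost (a sketch; the caching itself is internal to the IBM JCL):

```java
import java.net.InetAddress;

public class LocalhostLookup {
    public static void main(String[] args) throws Exception {
        // With -Dcom.ibm.cacheLocalHost=true on IBM Java, repeated lookups
        // like these are served from a cache rather than re-resolved.
        for (int i = 0; i < 3; i++) {
            InetAddress addr = InetAddress.getByName("localhost");
            System.out.println(addr.getHostAddress());
        }
    }
}
```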