Competition and Migration
Comparing Products
Here are some things to compare when two products are performing differently. Look at the configuration, but also gather evidence on each one (e.g. tracing) to actually confirm or deny whether the feature is in use and the relative cost.
- Compare "underlying" configurations (at least at a high level) such as the operating system (e.g. CPU, RAM usage, etc.), Java (e.g. maximum heap size, garbage collection overhead, -D parameters, etc.), etc.
- Ensure the types and volumes of messages are the same. For example, are there more exceptions in the logs of the worse-performing product?
- Security configuration (e.g. authentication provider)
- Ensure that application logging levels and volume are the same. For example, in one case the default classloading policy of a competitor product picked up a different logging configuration file causing less logging to occur versus WAS.
- If a different temporary directory is used between products (-Djava.io.tmpdir), make sure this will not have any impact (e.g. if it's on a slower file system). For example, Tomcat changes the default temporary directory.
- If the time of a product component (e.g. web service call) is in question, there may be no easy way to compare, so instead consider asking the application to write a log entry at the start and end of each call.
- If there is a small difference, try to magnify the difference (for example, adding more concurrent users) and then gather data.
- Use a monitoring product such as ITCAM that works on both products.
- If you know some aspects of the competition, such as the maximum heap size, then you can test with this same value. If, for example, garbage collection overhead is too high with the same heap size, and there are no other application differences, this may indicate that a fundamental configuration difference (thread pool sizes, data source caching, etc.) is driving a difference in heap usage and is the underlying cause of the difference in performance.
- Profile your application using tools such as the IBM Java Health Center or more simply by taking multiple thread dumps.
- If changing JVMs from HotSpot to J9:
  - Review the J9 chapter for common tuning.
  - Profile the application and, if classloading is very heavy, try to eliminate or cache it where possible because performance differences have been observed in some parts of classloading.
  - Test with reduced JIT compilation threads.
- Note that it is possible that two CPUs with identical specifications (e.g. clock speed) may perform differently due to manufacturing defects/differences, physical placement, and other factors which may cause different thermal characteristics and affect behavior such as clock speed (for examples, see Marathe, Aniruddha, et al. "An empirical survey of performance and energy efficiency variation on Intel processors." Proceedings of the 5th International Workshop on Energy Efficient Supercomputing. ACM, 2017.). Consider varying which systems the tests are run on to see if there is any difference.
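The suggestion above to write a log entry at the start and end of each call can be sketched as follows. This is only a minimal illustration, not a recommended production pattern: the class name CallTimer, the method timed, and the "ENTRY"/"EXIT" message format are hypothetical, and the sleeping lambda stands in for a real web service call. Because the same wrapper can be used on both products, the elapsed times in the logs become directly comparable.

```java
import java.util.concurrent.Callable;
import java.util.logging.Logger;

public class CallTimer {
    private static final Logger LOG = Logger.getLogger(CallTimer.class.getName());

    // Runs the given call and writes a log entry before and after it.
    // The exit entry includes the elapsed time, even if the call throws.
    public static <T> T timed(String name, Callable<T> call) throws Exception {
        long start = System.nanoTime();
        LOG.info("ENTRY " + name);
        try {
            return call.call();
        } finally {
            long elapsedMicros = (System.nanoTime() - start) / 1_000;
            LOG.info("EXIT " + name + " elapsedMicros=" + elapsedMicros);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the component being compared
        String result = timed("exampleCall", () -> {
            Thread.sleep(50);
            return "ok";
        });
        System.out.println(result);
    }
}
```

Keep the logging level and destination identical on both products so that the measurement itself does not skew the comparison.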
WAS Migration Performance Differences
If a customer reports that performance is worse after migrating WAS versions, consider the following ideas. In some ways, comparing two versions of the same product (e.g. migration) can also be treated as a "competition" between those two versions using the tips in the previous section.
- See the general comparison checklist above.
- What changed? Often, the hardware, network, and/or application has also changed, and any of these could account for the difference. If possible, try installing both versions and applications in the same operating system instance for comparison.
- If the migration is from WAS < 8 to WAS >= 8 on a platform that runs IBM Java, -Xgcpolicy is not specified on WAS >= 8, and the previous version either did not specify -Xgcpolicy or specified a non-gencon policy, then the effective policy is now gencon because the default gcpolicy changed to gencon with WAS V8.0. With gencon, part of the young generation (-Xmn, which defaults to 25% of -Xmx) is unavailable to the application (the amount changes dynamically based on the tilt ratio), so there is relatively less usable Java heap than before, which can cause performance changes.
- Compare the configurations between versions, first checking the basics such as generic JVM arguments, thread pool configurations, and then more thoroughly. Note that comparing configuration across major product versions may show some known differences in the product that may be unrelated.
- WAS traditional V8.5 includes Intelligent Management (formerly WVE) enabled by default, which includes additional PMI activity amongst other things (ODC rebuilds in the DMGR, etc.), which some customers (particularly on z/OS) may notice during idle periods compared to previous versions. IM may also introduce additional memory overhead, particularly as the size of the cell increases. If you are not using IM features, then consider disabling it with LargeTopologyOptimization=false.
- Java EE5 modules introduced annotation scanning which can increase startup time and decrease application performance. See the Annotation Scanning section in the WAS chapter.
- Use the migration tools to review the application. The toolkit includes a "Performance" section.
- If the migration is from WAS < 8 to WAS >= 8, and the application uses Spring, calls to ApplicationContext.getBean() on beans using the @Async annotation cause higher CPU utilization.
- On z/OS, ensure that WLM service classes and other classifications are the same.
- If you're migrating from IBM Java <= 8 to IBM Java >= 11 or Semeru Java (OpenJ9+OpenJDK), the JVM is largely the same, but the JCL may have significant changes (e.g. the performance characteristics of the JAXP XSLT compiler may change positively or negatively depending on the use case).
- Review the Java migration notes.
- When installing a fixpack, the Java shared class cache and OSGi caches are cleared.
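For the gcpolicy item above, one way to confirm what each JVM is actually running with is to inspect its arguments from inside the process. The following is a rough sketch under stated assumptions: the class and method names are hypothetical, the 25% figure is simply the -Xmn default quoted above, and Runtime.maxMemory() is used only as an approximation of -Xmx. Collector names printed by the MXBeans vary by JVM and policy.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class GcPolicyCheck {
    // Returns every -Xgcpolicy argument found in the given JVM arguments.
    // An empty result means the JVM default policy applies.
    static List<String> explicitGcPolicies(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith("-Xgcpolicy:"))
                .collect(Collectors.toList());
    }

    // Default nursery size under the 25%-of-maximum-heap rule quoted in the text.
    static long defaultNurseryBytes(long maxHeapBytes) {
        return maxHeapBytes / 4;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> policies = explicitGcPolicies(jvmArgs);
        System.out.println(policies.isEmpty()
                ? "No -Xgcpolicy specified; the JVM default applies"
                : "Explicit policy arguments: " + policies);

        // Collector names differ between JVMs and policies; print them for comparison.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector: " + gc.getName());
        }

        // Approximation of -Xmx; some JVMs report slightly less than the configured value.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap bytes: " + maxHeap
                + ", default nursery bytes at 25%: " + defaultNurseryBytes(maxHeap));
    }
}
```

As a worked example of the arithmetic, with -Xmx4g and no -Xmn, the nursery defaults to 1 GB, and some tilt-ratio-dependent part of that 1 GB is unavailable to the application at any given time.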
Known Migration Issues
Linux
- When migrating from RHEL 8.6 (kernel 4.18.0-372.9.1) to RHEL 8.7 and later (kernel 4.18.0-425.3.1 and later), there is a known performance regression of ~8 to 10% on certain hardware due to default changes in the Linux kernel options mmio_stale_data=full and retbleed=auto for hardware vulnerability mitigations.
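To see whether these mitigations are active on a given host, the kernel's reported state can be compared before and after the operating system upgrade. The sketch below is an illustration only: the class name MitigationCheck and the helper readOrMissing are hypothetical, and it assumes the standard Linux sysfs vulnerability files and /proc/cmdline are present, which may not be true on older kernels or inside some containers.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MitigationCheck {
    // Reads a small text file, or returns a placeholder if it cannot be read.
    static String readOrMissing(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8).trim();
        } catch (IOException e) {
            return "(not available: " + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Boot-time kernel options, including any explicit mitigation overrides
        System.out.println("cmdline: " + readOrMissing(Paths.get("/proc/cmdline")));

        // Kernel-reported mitigation status for the two options named above
        String base = "/sys/devices/system/cpu/vulnerabilities/";
        for (String name : new String[] {"mmio_stale_data", "retbleed"}) {
            System.out.println(name + ": " + readOrMissing(Paths.get(base + name)));
        }
    }
}
```

Running this on both the old and new operating system levels shows whether the mitigation state differs between the two environments being compared.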
IBM Semeru Runtimes
- IBM Semeru Runtimes 17.0.10 has a known issue that may cause unexpected global GCs with a reason of "rasdump"; this is fixed in 17.0.11.