Caching and WebSphere eXtreme Scale

Caching Recipes

  1. If available, enable the Java shared class and ahead-of-time compilation caches. WAS enables this by default, but you can increase the size if you have available memory. See the Java chapter.
  2. Pre-compile Java Server Pages (JSPs). See the WAS chapter.
  3. If possible, utilize the WAS Dynacache feature to cache servlet responses. See the HTTP section in the WAS chapter.
  4. The application should set standardized response headers that indicate caching (e.g. Cache-Control in HTTP).
    1. An alternative is to use a web server such as IHS to apply cache headers to responses based on rules. See the Web Servers chapter.
  5. If possible, use the WebSphere eXtreme Scale (WXS) product to maximize data caching (see below).
  6. Consider using an edge cache such as the WebSphere Caching Proxy. See the Web Servers chapter.
  7. If using WebSphere Commerce, set Dynacache caches' sharing modes to NOT_SHARED.

General Caching Topics

Caching (or lack thereof) can have dramatic performance impacts; however, caching must be carefully implemented to avoid inconsistent data:

Most Java EE application workloads have more read operations than write operations. Read operations require passing a request through several topology levels that consist of a front-end web server, the web container of an application server, the EJB container of an application server, and a database. WebSphere Application Server provides the ability to cache results at all levels of the network topology and Java EE programming model that include web services.

Application designers must consider caching when the application architecture is designed because caching integrates at most levels of the programming model. Caching is another reason to enforce the MVC pattern in applications. Combining caching and MVC can provide caching independent of the presentation technology and in cases where there is no presentation to the clients of the application. (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/cprf_appdesign.html)

In general, caches are held in memory or disk which must be properly sized for the additional cache usage. Caches may also introduce additional administration. Example caches (detailed in later chapters):

  • Avoid your infrastructure altogether by telling the client (e.g. browser) to cache as much as possible with response headers.
  • Cache whole or parts of responses (e.g. servlet caching).
  • Use dedicated caching proxy servers between the end-user and application.
  • Place static content as close to the user as possible (e.g. in a web server instead of the application server, content delivery networks, etc.).

WebSphere eXtreme Scale (WXS)

Caching is used to reduce execution path length at any layer to reduce the cost of each execution. This may lower the response time and/or lower the transaction cost. A grid is a set of maps that store data. Within a grid, partitions split maps across multiple container server JVMs using shards. A catalog server coordinates shard placement and monitors container servers. There may be multiple catalog servers in a catalog service domain (which itself is a mini grid) for high availability. A partition has 1 primary shard and 0 or more replica shards. The primary shard receives the actual data (insert, update, remove). A replica shard may be either synchronous or asynchronous. If minSyncReplica is > 0, a transaction in a primary shard is only committed with the agreement of those replicas.

Catalog Servers

A catalog server references an objectGridServer.properties file. On WAS, this is often in <WAS>/properties and may be copied from <WAS>/optionalLibraries/ObjectGrid/properties/sampleServer.properties.

Container Servers

A container server references both an objectGrid.xml file and an objectGridDeployment.xml file. For a WAR, place both into WebContent/META-INF. A container server also must have access to the objectGridServer.properties file. Full objectGridDeployment.xsd: http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.doc/rxsdepschema.html

Example development objectGridDeployment.xml

<?xml version="1.0" encoding="UTF-8"?>
<deploymentPolicy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://ibm.com/ws/objectgrid/deploymentPolicy ../deploymentPolicy.xsd"
  xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy">
  <objectgridDeployment objectgridName="grid1">
    <mapSet name="mapSet" numberOfPartitions="1" developmentMode="true">
      <map ref="map1"/>
    </mapSet>
  </objectgridDeployment>
</deploymentPolicy>

Example non-development objectGridDeployment.xml

<?xml version="1.0" encoding="UTF-8"?>
<deploymentPolicy xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://ibm.com/ws/objectgrid/deploymentPolicy ../deploymentPolicy.xsd"
  xmlns="http://ibm.com/ws/objectgrid/deploymentPolicy">
  <objectgridDeployment objectgridName="grid1">
    <mapSet name="mapSet" numberOfPartitions="17" minSyncReplicas="1" developmentMode="false">
      <map ref="map1"/>
    </mapSet>
  </objectgridDeployment>
</deploymentPolicy>

WXS Client

A client references an objectGrid.xml file. For a WAR, place into WebContent/META-INF. Full objectGrid.xsd: http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.doc/rxslclschema.html

Example objectGrid.xml

<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
  xmlns="http://ibm.com/ws/objectgrid/config">
  <objectGrids>
    <objectGrid name="grid1" txTimeout="120">
      <backingMap name="map1" copyMode="COPY_TO_BYTES" lockStrategy="OPTIMISTIC" lockTimeout="15" />
    </objectGrid>
  </objectGrids>
</objectGridConfig>

Example code to put and get from a grid

import com.ibm.websphere.objectgrid.ClientClusterContext;
import com.ibm.websphere.objectgrid.ConnectException;
import com.ibm.websphere.objectgrid.ObjectGrid;
import com.ibm.websphere.objectgrid.ObjectGridException;
import com.ibm.websphere.objectgrid.ObjectGridManagerFactory;
import com.ibm.websphere.objectgrid.ObjectMap;
import com.ibm.websphere.objectgrid.Session;
import com.ibm.websphere.objectgrid.plugins.TransactionCallbackException;

try {
    long key = 42;
    String value = "Hello World";
    ClientClusterContext ccc = ObjectGridManagerFactory.getObjectGridManager().connect("localhost:4809", null, null);
    ObjectGrid grid = ObjectGridManagerFactory.getObjectGridManager().getObjectGrid(ccc, "grid1");
    Session session = grid.getSession();
    ObjectMap map1 = session.getMap("map1");

    map1.setPutMode(ObjectMap.PutMode.UPSERT);
    map1.put(key, value);

    String fromGrid = (String) map1.get(key);    
    System.out.println(fromGrid.equals(value));
} catch (ConnectException e) {
    throw new RuntimeException(e);
} catch (TransactionCallbackException e) {
    throw new RuntimeException(e);
} catch (ObjectGridException e) {
    throw new RuntimeException(e);
}

When using a catalog service domain (e.g. "csd01") in WAS, use the following instead:

ObjectGridManager objectGridManager = ObjectGridManagerFactory.getObjectGridManager();
CatalogDomainManager catalogDomainManager = objectGridManager.getCatalogDomainManager();
CatalogDomainInfo catalogDomainInfo = catalogDomainManager.getDomainInfo("csd01");
String cep = catalogDomainInfo.getClientCatalogServerEndpoints();
ClientClusterContext ccc = objectGridManager.connect(cep, (ClientSecurityConfiguration) null, (URL) null);
ObjectGrid objectGrid = objectGridManager.getObjectGrid(ccc, "grid1");

Best Practices

Have approximately 10 shards per container. So if you plan to have 50 containers for instance and you have one replica configured in your policy, we would recommend about 250 partitions. This allows for having extra shards available for adding containers in the future when you need to expand without taking a grid outage to change the number of partitions. With having extra partitions per container, elasticity can be achieved. The general formula is (number of containers * 10) / (1 + number of replicas)). That gives you the number of partitions to start with. That usually gives a whole number that is not prime. We recommend choosing a prime number that is close to the number that the formula returns.

When it comes to starting a lot of containers, we recommend making use of the xscmd commands of suspendBalancing and resumeBalancing. You invoke suspendBalancing before staring the containers and resumeBalancing when you are complete. This approach allows eXtreme Scale to make one placement decision instead of multiple ones. If it was making a placement decision for each container as they start, the result can be a lot of unnecessary data movement.

Similarly when you are stopping containers and catalog servers, we recommend making use of the xscmd command of teardown to specify the servers you want to stop if you are stopping more than one. Again this approach allows you to limit the amount of data movement to be more efficient. There are filter options like host or zone to allow you to just say stop all containers on this host or in this zone for instance, or you can just give the complete list of the servers you want to stop. If you want to stop all containers, just run xscmd -c teardown without filters or a list of servers and it will stop all containers. If you want to stop all containers for a specific grid you can use the -g option to specify the grid to filter on.

Thread Pools

Containers:

  • XIOPrimaryPool: Used for WXS CRUD operations.
  • WXS: Used for replication and DataGridAgents.
  • xioNetworkThreadPool: Reads the request and sends response.

Near Cache

A near cache is a client side subset of the grid: http://www-01.ibm.com/support/knowledgecenter/SSTVLU_8.6.0/com.ibm.websphere.extremescale.doc/txsclinearcacheconfig.html?lang=en

The near cache is enabled by default for any map with a non-PESSIMISTIC lockStrategy (default OPTIMISTIC) (see the Spring section for an exception). It is also unbounded by default which may cause OutOfMemoryErrors if an evictor is not specified either through ttlEvictorType/timeToLive or a plugin evictor such as LRU through pluginCollectionRef. Alternatively, nearCacheInvalidationEnabled may be set to true to propagate invalidations from the grid to each nearCache: http://www-01.ibm.com/support/knowledgecenter/SSTVLU_8.6.0/com.ibm.websphere.extremescale.doc/txsnearcacheinv.html?lang=en

The increase in Java heap usage should be monitored to ensure the nearCache is not increasing the proportion of time in garbage collection too much (or its eviction/size should be tuned, or the heap increased).

If the map's copyMode is COPY_TO_BYTES or COPY_TO_BYTES_RAW, then nearCacheCopyMode should be set to NO_COPY, because any copying is unnecessary.

The near cache hit rate is a critical performance metric. A near cache occupancy may be limited by size (e.g. LRU/LFU evictor) or expired over time (e.g. TTL evictor).

Enable near cache statistics through the ObjectGrid Maps PMI module:

Then check the hit rate by analyzing hits / gets:

Spring Integration

WXS provides Spring integration for Spring >= 3.1: http://www-01.ibm.com/support/knowledgecenter/SSTVLU_8.6.0/com.ibm.websphere.extremescale.doc/txsspringprovide.html?cp=SSTVLU_8.6.0&lang=en

Older documentation states that, generally, the nearCache is automatically enabled when the lockStrategy is NONE or OPTIMISTIC (default). This is true, except for the Spring provider which explicitly disables the nearCache even when it would have been enabled, unless a client override XML is provided (see CLIENT_OVERRIDE_XML in the link above).

Example Spring XML specifying the client override XML:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:p="http://www.springframework.org/schema/p" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx.xsd">

    <bean id="domain"
        class="com.ibm.websphere.objectgrid.spring.ObjectGridCatalogServiceDomainBean"
        p:client-override-xml="file:/objectgrid.xml"
        p:catalog-service-endpoints="${catalogServiceUrl}" />
...

Example client override XML which enables a nearCache (see the Near Cache section for more details):

<?xml version="1.0" encoding="UTF-8"?>
<objectGridConfig
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://ibm.com/ws/objectgrid/config ../objectGrid.xsd"
  xmlns="http://ibm.com/ws/objectgrid/config">

  <objectGrids>
    <objectGrid name="CACHE_REMOTE" txTimeout="60">
      <!-- NOEXP caches' nearCaches use LRU to limit number of nearCache entries per map -->
      <backingMap name="CACHE_NOEXP_.*" template="true"
                  lockStrategy="NONE" ttlEvictorType="NONE" timeToLive="0" copyMode="COPY_TO_BYTES"
                  nearCacheEnabled="true" nearCacheCopyMode="NO_COPY" pluginCollectionRef="LRUevictorPlugins" />
      <!-- EXP caches' nearCaches implicitly use backingMap TTL evictor settings -->
      <backingMap name="CACHE_EXP_.*" template="true"
                  lockStrategy="NONE" ttlEvictorType="LAST_UPDATE_TIME" timeToLive="120" copyMode="COPY_TO_BYTES"
                  nearCacheEnabled="true" />
    </objectGrid>
  </objectGrids>
 
  <backingMapPluginCollections>
    <backingMapPluginCollection id="LRUevictorPlugins">
      <bean id="Evictor" className="com.ibm.websphere.objectgrid.plugins.builtins.LRUEvictor">
        <!-- max entries per map = numberOfLRUQueues * maxSize -->
        <property name="numberOfLRUQueues" type="int" value="5" description="set number of LRU queues" />
        <property name="maxSize" type="int" value="5" description="set max size for each LRU queue" />
      </bean>
    </backingMapPluginCollection>
  </backingMapPluginCollections>
</objectGridConfig>

When a client override XML is successfully loaded, messages such as the following will be printed:

[2/10/16 23:50:03:190 EST] 00000000 ObjectGridMan I CWOBJ2433I: Client-side ObjectGrid settings are going to be overridden for domain DefaultDomain using the URL file:/override-objectgrid.xml.
[2/10/16 23:50:03:758 EST] 00000000 ObjectGridImp I CWOBJ1128I: The client cache is enabled for maps [IBM_SPRING_PARTITIONED_.*] on the SPRING_REMOTE ObjectGrid.

In the above example, the maps using the first template have the LRU evictor specified at the bottom of the XML. The maps using the second template do not specify a pluginCollectionRef but they will implicitly use the TTL evictor because the backingMap specifies a TTL evictor type and time.

The WXS Spring provider enables a "fast fail" mechanism by default. This mechanism exists to allow an application to not hang if a temporary network brownout occurs. Without fastfail, if network connectivity is lost between the client and the WXS server, each request will time out before returning. Fastfail quickly identifies that the network is down and allows all cache requests to return null immediately and reconnect once network connectivity has been restored. This is accomplished with one WXSSpringFastFail[${MAP_NAME}] thread created per map (and if maps are used in different applications with the default classloading policy, one per classloader. This fast fail function may be disabled with -Dcom.ibm.websphere.objectgrid.spring.disable.fastfail=true, in which case TargetNotAvailableExceptions and related exceptions will print FFDCs and a null value will be returned from the cache.

Monitoring

There are many ways to monitor WXS: http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.doc/txsadmdeployenv.html

Performance Tracing

See below for additional tracing specific to XIO.

For the overall transaction, use the diagnostic trace com.ibm.ws.objectgrid.SessionImpl=all and calculate the time between the "begin" entry and "commit " exit trace points. That's the lifetime of the transaction on the client. We don't necessarily go to the server immediately after begin() so it's possible if you did the same thing on both the client and the server for the same transaction, you'd get different numbers.

On the client side instrumenting com.ibm.ws.objectgrid.client.RemoteCacheLoader.get() will give you information on the client side for how long a client get operation is taking.

On the container side instrumenting com.ibm.ws.objectgrid.ServerCoreEventProcessor.getFromMap() will give you information on the server side for how long we take to get a value on the server side.

Offload Caching

WXS is frequently used for HTTP Session persistence instead of a database or Dynacache: ftp://ftp.software.ibm.com/software/iea/content/com.ibm.iea.wxs/wxs/7.0/Administration/Labs/XS70_HTTPSession_Lab.pdf. Keep in mind that the Extreme Scale JVMs will also need to be tuned.

eXtreme IO (XIO)

Tuning XIO: http://www-01.ibm.com/support/knowledgecenter/SSTVLU_8.6.0/com.ibm.websphere.extremescale.doc/rxstunexio.html

eXtreme Memory (XM)

WebSphere eXtreme Scale v8.6 provides the ability to store cache data outside of the Java heap space. This feature is termed Extreme Memory or XM. Using XM requires the eXtreme IO feature (XIO) introduced in v8.6.

XM leads to more performant and consistent relative response times: http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.doc/cxsxm.html

Data Serialization

COPY_TO_BYTES

To optimize serialization with any of these options, you can use the COPY_TO_BYTES mode to improve performance up to 70 percent. With COPY_TO_BYTES mode, the data is serialized when transactions commit, which means that serialization happens only one time. The serialized data is sent unchanged from the client to the server or from the server to replicated server. By using the COPY_TO_BYTES mode, you can reduce the memory footprint that a large graph of objects can use. (http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.doc/cxsserializer.html)

ORB

If using IBM Java ORB communication, tune the ORBs in all WXS processes (catalogs, containers, and clients): http://www-01.ibm.com/support/knowledgecenter/SSTVLU_8.6.0/com.ibm.websphere.extremescale.doc/rxsorbproperties.html

eXtreme Data Format (XDF)

WebSphere eXtreme Scale v8.6 introduced eXtreme Data Format (XDF) which allows sharing between Java and .NET applications, additional indexing options, automatic versioning, and partitioning through annotations. XDF is the default serialization mode when XIO is enabled and copy mode is COPY_TO_BYTES: http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.doc/txsconfigxdf.html

XDF supports serialization of Java objects which do not implement the Serializable interface.

XDF does not compress entries, so data placed in the cache may be larger than other serialization modes and may increase the overhead of network transportation.

CAP Theorem

  • Consistency - all clients see the same view, even in the presence of updates
  • High Availability - all clients can find some replica of the data, even in the presence of failures
  • Partition Tolerance - the system properties are held even when the system is partitioned.

CAP theorem states that a grid can only have two of the three. In WXS version prior to WXS v7.1, grids provide CP services. That is to say that the grid provided consistency (only one place to write the data - the primary shard), and partition tolerance (the grid is capable of providing service even if parts of the grid are network partitioned and unavailable). As of WXS v7.1 we can now have AP grids (Availability and Partition Tolerance).

Queries

WXS provides its own SQL-like query language: http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.doc/rxsquerylang.html

Setting eXtreme Scale tuning options

For standalone client JVM

Create the file objectGridClient.properties in the server's root directory, and add a JVM parameter:

-Dobjectgrid.client.props=objectGridClient.properties

For standalone container JVM:

Create the file objectGridServer.properties and add the JVM command line argument:

-serverProps objectGridServer.properties

xscmd

xscmd is the fully supported replacement for the older xsadmin. General:

  • Help: xscmd -help
  • List available commands: xscmd -lc

The key thing to specify to xscmd is -cep which specifies the list of catalog service endpoints. For example:

$ ./xscmd.sh -c listObjectGridNames -cep localhost:4809
...
Grid Name
---------
Grid

When the catalog service is running inside WebSphere Application Server (by default, in the deployment manager), and XIO is enabled, the -cep port is the XIO_ADDRESS port.

Suspend and Resume Status

The suspendStatus command displays the suspend and resume status (ignore the heartbeat option as it only applies to WXS stand alone):

$ xscmd.sh -c suspendStatus
...
*** Printing the results of the balance status command for all data grids.

  Type      ObjectGrid name Map Set Name Status  Details
  ----      --------------- ------------ ------  -------
  placement Grid            mapSet       Resumed

*** Printing the results of the transport communication failure detection status command for
    DefaultDomain catalog service domain. The type requested was failoverAll.

  Type        Domain name   Status  Details
  ----        -----------   ------  -------
  failoverAll DefaultDomain Resumed

When you suspend or resume, the primary catalog logs will contain:

  • Placement: CWOBJ1237 for both suspend and resume request attempt, CWOBJ1214 for both suspend and resume when it completes successfully ... the logs will differ with the word "suspend" or "resume" accordingly.
  • FailoverAll: CWOBJ1262 for the supsend and resume request attempt, CWOBJ1260 for both suspend and resume when it completes successfully ... the logs will differ with the word "suspend" or "resume" accordingly.

Performing Maintenance

  1. Use the WXS teardown command on all the containers which will be undergoing maintenance. For example: xscmd -c teardown -sl cachesvr1
  2. Wait 5 minutes
  3. Use the WXS teardown command on all the catalogs which will be undergoing maintenance one at a time and waiting 5 minutes between each. For example: xscmd -c teardown -sl catlgsrvr1
  4. Wait 5 minutes
  5. Stop underlying JVMs (e.g. if running on WAS)
  6. Apply maintenance
  7. Start catalog servers
  8. Wait 5 minutes
  9. Start container servers
  10. Wait 5 minutes

If there are many servers being started at once, you may first suspend balancing:

  1. xscmd -c suspend

Then resume balancing once the JVMs are started:

  1. xscmd -c resume -t placement
  2. Run $(xscmd -c showPlacement) until all partitions show as placed.
  3. xscmd -c resume -t heartbeat

Application Considerations

FIFO Queue

WXS maps may be used as a FIFO queue with the getNextKey method: http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.doc/rxsmapsfifo.html

Transactions

Twophase transactions will ensure that all changes made to all maps in the transactions are either rolled back or committed: http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.doc/txsprogobjgridtxn.html

You can have two maps involved in a transaction without using the Twophase logic. If the two maps are in the same partition, everything will commit or rollback as part of the transaction. WXS will not partially commit a change by having only one map commit and then not doing the other map due to an error; it is always going to be an atomic operation even with a Onephase transaction.

Transaction Callbacks

The TransactionCallback interface may be used to execute code before a Session.commit completes: http://www.ibm.com/support/knowledgecenter/en/SSTVLU_8.6.1/com.ibm.websphere.extremescale.javadoc.doc/topics/com/ibm/websphere/objectgrid/plugins/TransactionCallback.html