Administration

Administration Best Practices

Use consistent and repeatable administration processes. Manual changes may be applied inconsistently across environments or otherwise cause operational instability, so automate all administration. For example, make changes through wsadmin and other scripts rather than through hard-to-repeat processes such as manual changes in the Administrative Console. This applies to installation, configuration, application changes, and maintenance in all environments.

However, the Administrative Console may still be used to read and review the current configuration, or to test proposed changes; after making a change, the "View administrative scripting command for last action" link helps generate the equivalent automation scripts.
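
For example, the following is a minimal sketch of a repeatable change made through a wsadmin Jython script instead of the console; the node name, server name, file name, and heap size are hypothetical placeholders:

# Sketch: a scripted, repeatable configuration change (hypothetical values).
# Run with: wsadmin -lang jython -f changeHeap.py
server = AdminConfig.getid("/Node:node1/Server:server1/")
# An application server normally has a single JVM definition; take the first.
jvm = AdminConfig.list("JavaVirtualMachine", server).split("\n")[0]
AdminConfig.modify(jvm, [["maximumHeapSize", "1024"]])
AdminConfig.save()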

Deployment Manager

From Best Practices for Large WebSphere Application Server Topologies:

The memory requirement of the deployment manager increases as the size of the topology increases, and as the number of concurrent sessions increases. Since the deployment manager is just a single process, there is no mechanism to balance the load. Therefore, there is a limit to the number of concurrent users that can be supported on a single deployment manager.

Just as you would tune the application server heap size, you need to tune the deployment manager heap size to accommodate the number of concurrent users who access the deployment manager. Enable verbose garbage collection, and observe how the heap size increases with the increase in topology and in the number of users.
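
For instance, the following is a sketch of enabling verbose garbage collection and raising the maximum heap on the deployment manager through wsadmin; the node name and heap size are assumptions, so adjust them for your topology:

# Sketch: tune the deployment manager JVM (hypothetical node name and size).
dmgrJvm = AdminConfig.list("JavaVirtualMachine",
    AdminConfig.getid("/Node:dmgrNode/Server:dmgr/")).split("\n")[0]
AdminConfig.modify(dmgrJvm, [["verboseModeGarbageCollection", "true"],
                             ["maximumHeapSize", "2048"]])
AdminConfig.save()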

If too many concurrent sessions are overloading the deployment manager, you need to place a limit on concurrent access. For scripting, consider using the V7 job manager as a mechanism for users to submit wsadmin jobs. The jobs are run sequentially, and an email notification is sent to the user upon job completion.

A JMX request from the deployment manager to a single application server flows through the deployment manager to the node agent on the same node where the server resides, and finally to the application server itself. This design is intended for scalability. The deployment manager has to communicate with a node agent only, and each node agent has to communicate with its respective application servers only.

If an invocation is made to all of the servers on a node, the deployment manager uses one invocation to the node agent, and the node agent, in turn, broadcasts the invocation to every server on the node. To avoid a scenario where queries get stuck, use narrow queries that target only the servers or nodes from which you really need information. Queries that touch every server can consume considerable cell resources.
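
For example, prefer a query scoped to a specific node and server over a cell-wide wildcard (the names below are examples):

# Narrow query: targets only the one server of interest.
AdminControl.queryNames("WebSphere:type=Server,node=node1,process=server1,*")

# Broad query: touches every server process in the cell; avoid in large topologies.
AdminControl.queryNames("WebSphere:type=Server,*")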

Use -Dcom.ibm.ws.management.connector.soap.keepAlive=true to avoid the cost of SSL re-handshaking when AdminClient uses PullRemoteReceiver/PullRemoteSender.

Starting with WAS 8.5.5.7 (PI42208), you may set -Dcom.ibm.console.overrideSyncPref=true on the deployment manager so that saving any changes will automatically synchronize with any running nodes. This avoids common issues with junior administrators who save a change and restart a server before the automatic synchronization kicks in.

wsadmin/JMX

From Best Practices for Large WebSphere Application Server Topologies:

Often in a script you need to search for a specific configuration object, such as a specific node, server, or data source. The configuration service extracts what you are searching for from the master repository into the workspace so that you can make your changes. How you construct your query can greatly affect how many files are extracted. If you do not use a targeted query, you can potentially cause the entire repository to be extracted. For a large topology this is a very expensive operation.
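
For example, a targeted lookup by containment path extracts only the documents in that scope, while an untargeted listing may extract configuration for the whole cell (the names below are examples):

# Targeted: extracts only the configuration documents for this server's scope.
server = AdminConfig.getid("/Cell:myCell/Node:node1/Server:server1/")

# Untargeted: can cause configuration for every server in the cell to be extracted.
allServers = AdminConfig.list("Server")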

Starting the wsadmin process may take 20 seconds or more, depending on hardware. Avoid breaking up your configuration operations into multiple wsadmin invocations; instead, combine them into a single script that can be run within one wsadmin session. Consider structuring your scripts into multiple files, and import them from a front-end script.
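
For example, a front-end script can pull shared routines into one wsadmin session with execfile; the file and function names below are hypothetical:

# main.py -- run once with: wsadmin -lang jython -f main.py
execfile("common.py")       # shared helper functions
execfile("datasources.py")  # defines configureDataSources()
configureDataSources()      # hypothetical helper from datasources.py
AdminConfig.save()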

The -conntype NONE option runs wsadmin in local mode. Updating the configuration in local mode while the deployment manager is running is not supported. After a change is made in local mode, start the deployment manager and run a node synchronization (syncNode) to propagate the changes to the nodes. In local mode, you cannot run anything operational, such as AdminControl commands that invoke WAS MBeans (and some AdminTask commands also require that the server is running). Other than that, local mode should behave the same.

Getting diagnostics:

  • AdminControl.invoke(AdminControl.completeObjectName("type=JVM,process=server1,*"), "dumpThreads")
  • AdminControl.invoke(AdminControl.completeObjectName("type=JVM,process=server1,*"), "generateHeapDump")
  • AdminControl.invoke(AdminControl.completeObjectName("type=JVM,process=server1,*"), "generateSystemDump")
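
These calls can also be fanned out to several servers at once; for example, the following sketch triggers a thread dump on every running JVM on a single node (the node name is an example, and the query is deliberately scoped to one node per the earlier guidance on narrow queries):

# Sketch: thread-dump every running JVM on one node (hypothetical node name).
for jvm in AdminControl.queryNames("WebSphere:type=JVM,node=node1,*").split("\n"):
    if jvm.strip():
        AdminControl.invoke(jvm.strip(), "dumpThreads")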

Examples

Restart server:

print "Restarting " + sys.argv[0] + "/" + sys.argv[1] + "..."
print AdminControl.invoke(AdminControl.queryNames("WebSphere:*,type=Server,node=" + sys.argv[0] + ",process=" + sys.argv[1]), "restart")
print "Restart asynchronously started..."

The only potential problem with the above is that the restart is fired asynchronously, so you don't know whether it succeeded. Instead, the script can invoke a stop and then a start; the stop is synchronous and reports any errors:

print "Stopping " + sys.argv[0] + "/" + sys.argv[1] + "..."
print AdminControl.stopServer(sys.argv[1], sys.argv[0])
print "Starting " + sys.argv[0] + "/" + sys.argv[1] + "..."
print AdminControl.startServer(sys.argv[1], sys.argv[0])
print "Done"

Querying PMI

import java.lang

# Provide the name of the WebSphere Application Server
serverName = "server1"

# PMI module to query
pmiObject = "JVM"

# If serverName is not unique across the cell, add "node=N," before "process":
lookup = "process=" + serverName

objectName = AdminControl.completeObjectName("type=Perf," + lookup + ",*")
if objectName == '' or objectName is None:
    print "Server not running or not found"
else:
    # Query PMI statistics from the Perf MBean (recursive=false):
    perfName = AdminControl.makeObjectName(objectName)
    targetName = AdminControl.makeObjectName(AdminControl.completeObjectName("type=" + pmiObject + "," + lookup + ",*"))
    stats = AdminControl.invoke_jmx(perfName, "getStatsObject", [targetName, java.lang.Boolean("false")], ["javax.management.ObjectName", "java.lang.Boolean"])

    # JVM PMI statistics report memory sizes in KB
    usedmem = stats.getStatistic("UsedMemory").getCount()
    totalmem = stats.getStatistic("HeapSize").getCurrent()
    percentUsed = int((float(usedmem) / float(totalmem)) * 100.0)

    print("Used Java Heap (MB): %s" % (usedmem / 1024))
    print("Current Java Heap Size (MB): %s" % (totalmem / 1024))
    print("Percent Java Heap Used: %s" % (percentUsed))

Node Synchronization

By default, automatic node synchronization occurs every 1 minute; the interval can be increased up to 60 minutes. In general, do not disable automatic synchronization, as disabling it can affect security components such as LTPA key distribution.
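
A synchronization can also be requested on demand; for example, the following sketch forces an immediate synchronization of one node through its NodeSync MBean (the node name is an example, and the node agent must be running):

# Sketch: force an immediate synchronization of one node (hypothetical name).
sync = AdminControl.completeObjectName("type=NodeSync,node=node1,*")
print AdminControl.invoke(sync, "sync")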

From Best Practices for Large WebSphere Application Server Topologies:

Node synchronization is the process by which the WebSphere configuration is transferred from the deployment manager to the node agent. The deployment manager and node agents compare MD5 hashes of the configuration files to determine whether the files are identical. In the case of a node agent or deployment manager restart, the respective server must create all the MD5 hashes in memory for all the configuration documents in the node or cell. As the cell size and number of documents grow, the start-up time also increases.

WebSphere Application Server has added support for "Hot Restart Sync." With this support, the node agent and deployment managers save the hashes in both memory as well as on the file system. When a restart is performed, the MD5 hashes do not need to be recomputed but rather can be loaded directly from disk. To enable this support, add the following custom property to your deployment manager and node agent:
-DhotRestartSync=true

Notifications

The SOAP connector has the advantage of having a better chance of making it through a firewall (since it is HTTP traffic) than RMI/IIOP; however, you will generally receive notifications faster with RMI than with SOAP. This is because RMI uses a "push" model while SOAP uses a "pull" model.

When the RMI connector is used, a remote object is created on the client side and its stub is passed to the server side. Whenever a notification is received on the server, it is almost immediately sent (or "pushed") to the client and handed to the registered listeners. With SOAP, at regular intervals, the client requests any notifications from the server for this listener. If there are any, they are returned (or "pulled") from the server and then handed to the listeners. This occurs approximately every 20 seconds, but can be more frequent if a large number of notifications are being received.

Since notifications can take up to 20 seconds to be received when using the SOAP connector, it is recommended that the RMI connector be used to receive notifications, when possible.
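
As a sketch of the push model, a standalone Jython program can register a notification listener over RMI; this assumes the WAS administrative thin-client JARs are on the classpath and administrative security is disabled, and the host, port, and target MBean below are examples:

import time
from java.util import Properties
from javax.management import NotificationListener, ObjectName
from com.ibm.websphere.management import AdminClient, AdminClientFactory

# Connect with the RMI connector so notifications are pushed (host/port are examples).
props = Properties()
props.setProperty(AdminClient.CONNECTOR_TYPE, AdminClient.CONNECTOR_TYPE_RMI)
props.setProperty(AdminClient.CONNECTOR_HOST, "dmgrhost")
props.setProperty(AdminClient.CONNECTOR_PORT, "9809")
client = AdminClientFactory.createAdminClient(props)

class PrintListener(NotificationListener):
    def handleNotification(self, notification, handback):
        print notification.getType(), notification.getMessage()

# Example target: a server's JVM MBean.
jvm = client.queryNames(ObjectName("WebSphere:type=JVM,process=server1,*"), None).iterator().next()
client.addNotificationListener(jvm, PrintListener(), None, None)
time.sleep(60)  # keep the process alive long enough to receive pushed notifications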

Re-install Corrupt WAS on the same nodes

To re-install a corrupt WAS installation through Installation Manager, for each node, starting with the DMGR:

  1. Stop all Java processes (application servers, nodeagent, DMGR, etc.)
  2. Back up (recursively copy) $WASHOME
  3. View $WASHOME/properties/version/installed.xml and write down the path values of the agent.launch.command, agent.install.location, and cacheLocation properties. Back up (recursively copy) each of these paths.
  4. Back up (copy) InstallationManager.dat from the home directory of the user that installed Installation Manager, e.g. ~/etc/.ibm/registry/InstallationManager.dat
  5. If Installation Manager itself is suspected to be corrupt, delete InstallationManager.dat, the paths of agent.launch.command, agent.install.location, and cacheLocation properties, and $WASHOME; then, re-install IM, e.g. $IMAGENT/tools/imcl install com.ibm.cic.agent -dataLocation /opt/IBM/IBMIM/data -repositories $IMAGENT/repository.config -installationDirectory /opt/IBM/IBMIM/eclipse -sharedResourcesDirectory /opt/IBM/IBMIMShared -accessRights nonAdmin -acceptLicense -sP -preferences offering.service.repositories.areUsed=false,com.ibm.cic.common.core.preferences.searchForUpdates=false
  6. If Installation Manager is not suspected to be corrupt, then uninstall WAS: $IM/eclipse/tools/imcl uninstallAll -installationDirectory $WASHOME; then, recursively delete $WASHOME
  7. Install WAS, e.g. $IM/eclipse/tools/imcl install com.ibm.websphere.ND.v90_[...] com.ibm.java.jdk.v8_8.0.[...] -sharedResourcesDirectory /opt/IBM/IBMIMShared -repositories /tmp/WASREPO/repository.config -installationDirectory $WASHOME -sP -acceptLicense. Ensure that the exact WAS version, fixpacks, and iFixes installed match the configuration that was backed up.
  8. Recursively copy $WASBACKUP/properties/fsdb to $WASHOME/properties/
  9. Recursively copy $WASBACKUP/properties/profileRegistry.xml to $WASHOME/properties/
  10. Recursively copy $WASBACKUP/profiles to $WASHOME/
  11. Recursively remove $WASHOME/configuration/org.eclipse.core.runtime $WASHOME/configuration/org.eclipse.equinox.app $WASHOME/configuration/org.eclipse.osgi $WASHOME/configuration/org.eclipse.update
  12. For each profile, recursively remove logs $WASHOME/profiles/$PROFILE/logs/*
  13. For each profile, recursively remove $WASHOME/profiles/$PROFILE/configuration/org.eclipse.core.runtime $WASHOME/profiles/$PROFILE/configuration/org.eclipse.equinox.app $WASHOME/profiles/$PROFILE/configuration/org.eclipse.osgi $WASHOME/profiles/$PROFILE/configuration/org.eclipse.update
  14. For each profile, recursively remove $WASHOME/profiles/$PROFILE/temp/* $WASHOME/profiles/$PROFILE/wstemp/*
  15. For each profile, run $WASHOME/profiles/$PROFILE/bin/osgiCfgInit.sh
  16. Run $WASHOME/bin/clearClassCache.sh
  17. If the node is the deployment manager, start the deployment manager
  18. If the node is not the deployment manager, log out of the deployment manager administrative console if logged in, then run $WASHOME/profiles/$PROFILE/bin/syncNode.sh $DMGRHOST $DMGRSOAPPORT, and then run $WASHOME/profiles/$PROFILE/bin/startNode.sh
  19. Start all the application servers and perform tests.
  20. If everything goes well and further fixpacks or iFixes are required, apply those fixes now using normal procedures.