Intelligent Management
Intelligent Management (IM), formerly a separate product called WebSphere Virtual Enterprise (WVE), became part of WebSphere Application Server Network Deployment starting with version 8.5. IM introduces the On Demand Router (ODR), which supports application editioning, health policies, service policies, maintenance mode, automatic discovery, dynamic clusters, traffic shaping, and more. The ODR was first delivered as a Java process based on the Proxy Server, normally placed between a web server and the application servers. Starting with WAS 8.5.5, there is an option called ODRLib, a native C component that delivers much of the same functionality but is integrated directly into the IBM HTTP Server (IHS) web server.
Java On Demand Router (ODR)
The Java On Demand Router (ODR) is built on top of the WAS Java Proxy Server. Both write the following log files asynchronously on a background "LoggerOffThread":
- local.log: A log of the communications between the client (e.g. browser) and the ODR, i.e. the activities in the "local" ODR process.
- proxy.log: A log of the communications between the ODR and the backend server (e.g. application server).
The weighted least outstanding request (WLOR) load balancing algorithm is generally superior to the available load balancing algorithms in the WebSphere plugin. WLOR takes into account both the weight of the server and the number of outstanding requests, so it is better at evening out load if one server slows down. WLOR is the default in both ODRLib and the Java ODR.
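As an illustration only (the actual ODR routing code is not published here), a minimal Python sketch of the general WLOR idea follows; the Server class, its fields, and the scoring formula are assumptions for the example:

# Minimal sketch of weighted least outstanding request (WLOR) style
# selection. This illustrates the general idea only, not the actual ODR
# algorithm; the Server class and scoring formula are hypothetical.
class Server:
    def __init__(self, name, weight):
        self.name = name        # server identifier
        self.weight = weight    # configured routing weight (higher = more capacity)
        self.outstanding = 0    # requests dispatched but not yet completed

def select_server(servers):
    # Prefer the server with the fewest outstanding requests relative to
    # its weight, so a server that slows down (requests piling up)
    # naturally receives less new work.
    return min(servers, key=lambda s: (s.outstanding + 1) / float(s.weight))

servers = [Server("server1", 2), Server("server2", 1)]
target = select_server(servers)
target.outstanding += 1  # dispatched; decrement when the response completes
print("Routing to " + target.name)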
The "excessive request timeout condition" and "excessive response time condition" are useful health policies that the ODR can monitor to gather diagnostics on anomalous requests.
Conditional Request Trace enables traces only for requests that match a particular condition such as a URI.
The ODR measures "service time" as the time from when the request is sent to the application server until the first response chunk arrives.
Create a separate shared class cache: http://www-01.ibm.com/support/docview.wss?uid=swg21965655
Maintenance Mode
Putting servers into maintenance mode is a great way to gather performance diagnostics while reducing the potential impact on users. One maintenance mode option allows users with affinity to continue making requests to the server while new requests are sent to other servers.
Putting a server into maintenance mode is a persistent change; a server remains in maintenance mode (even across restarts) until the mode is explicitly changed. The maintenance mode of a server is stored persistently as a server custom property named "server.maintenancemode" under Application Servers > Administration > Custom Properties. Possible values for that property are (a wsadmin sketch follows the list below):
- false - maintenance mode is disabled
- affinity - only route traffic with affinity to the server
- break - don't route any traffic to the server
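As a sketch of where this property lives, the following wsadmin Jython (run with wsadmin -lang jython) sets the property; the cell, node, and server names are placeholders, and the exact containment of the Property object should be verified for your environment:

# Sketch: set server.maintenancemode as a server custom property with
# wsadmin Jython (AdminConfig is provided by the wsadmin environment).
# Cell/node/server names are hypothetical placeholders.
server = AdminConfig.getid('/Cell:myCell/Node:myNode/Server:server1/')
# Possible values: false, affinity, break (see the list above)
AdminConfig.create('Property', server, [['name', 'server.maintenancemode'], ['value', 'affinity']])
AdminConfig.save()  # in ND, also synchronize the nodes afterwards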
Custom Logging
The Java ODR supports custom logging, which allows conditions to control which requests are logged and provides very flexible fields for formatting the log entries.
The condition uses HTTP request and response operands. Response operands include the response code, target server, response time, and service time. The logFileFormat specifies the log file name and the format of the log entry to create if the condition is true. See the next chart for a list of directives which can be used to specify the format.
Example - logs all requests that took more than 2 seconds to the slow.log file, indicating the service and response times:
condition='response.time > 2000' value='slow.log %t %T %r %s %U %Z'
Use the manageODR.py script to configure custom logging.
Binary Trace Facility (BTF)
The Java ODR supports the Binary Trace Facility (BTF), a different type of tracing from the traditional diagnostic trace. BTF enables tracing on a per-request basis and captures infrequently-occurring conditions out-of-the-box (e.g. the reason for a 503 response). BTF is hierarchical with respect to function rather than code, and trace records are organized top-down and left-to-right (in processing order). The trace specification can be set as a cell custom property whose name starts with "trace", e.g. name=trace.http, value=http.request.loadBalance=2
The "trace" command in the WAS installation directory can be used to format btrace data:
${WAS}/bin/trace read ${SERVER_LOGS_DIRECTORY} ${SPEC_TO_READ}
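As a sketch, the cell custom property from the example above could be set with wsadmin Jython as follows; the cell name is a placeholder:

# Sketch: set a BTF trace specification as a cell custom property with
# wsadmin Jython (AdminConfig is provided by the wsadmin environment).
# The cell name is a hypothetical placeholder.
cell = AdminConfig.getid('/Cell:myCell/')
AdminConfig.create('Property', cell, [['name', 'trace.http'], ['value', 'http.request.loadBalance=2']])
AdminConfig.save()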
Dynamic Clusters
Application Placement Controller (APC)
The Application Placement Controller (APC) code runs in one JVM in the cell and coordinates stopping and starting JVMs when dynamic clusters are in automatic mode, or creates runtime tasks for doing so when dynamic clusters are in supervised mode. The frequency of changes is throttled by the minimum time between placements option (http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/twve_odmonitorapc.html). Some of the basic theory of the APC is described here: http://www2007.org/papers/paper145.pdf
Investigating autonomic dynamic cluster size violations: http://www-01.ibm.com/support/docview.wss?uid=swg21965051
To investigate APC issues:
- Check that all node agents are running and healthy and that the core group is marked as stable.
- Check whether any nodes or servers are in maintenance mode.
- Check server logs to see whether server starts were attempted but failed for some reason (e.g. an application initialization error).
- Check each node's available physical memory to confirm there is sufficient free memory for additional servers.
- Find where the APC is running (DCPC0001I/HAMI0023I) and confirm it has not stopped (DCPC0002I/HAMI0023I), and ensure that it is actually running at the configured minimum time between placements interval (otherwise, it may be hung).
- Check whether the APC detected a violation with the DCPC0309I message. If found, check for any subsequent errors or warnings.
- Check the apcReplayer.log, find the "**BEGIN PLACEMENT INPUT DUMP**" section, and verify that all nodes are registered (lines starting with {CI).
If the APC constantly stops and starts JVMs seemingly without need, test various options such as the following (a wsadmin sketch follows this list):
- APC.BASE.PlaceConfig.DEMAND_DISTANCE_OVERALL=0.05
- APC.BASE.PlaceConfig.UTILITY_DISTANCE_PER_APPL=0.05
- APC.BASE.PlaceConfig.WANT_VIOLATION_SCORE=true
- APC.BASE.PlaceConfig.PRUNE_NO_HELP=false
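A wsadmin Jython sketch for applying these options follows; it assumes they are set as cell custom properties (verify the correct scope and values for your environment before use), and the cell name is a placeholder:

# Sketch: apply the APC tuning options above as cell custom properties
# with wsadmin Jython. The cell name is hypothetical, and the cell scope
# is an assumption; verify values and scope before use.
cell = AdminConfig.getid('/Cell:myCell/')
apcOptions = [
    ('APC.BASE.PlaceConfig.DEMAND_DISTANCE_OVERALL', '0.05'),
    ('APC.BASE.PlaceConfig.UTILITY_DISTANCE_PER_APPL', '0.05'),
    ('APC.BASE.PlaceConfig.WANT_VIOLATION_SCORE', 'true'),
    ('APC.BASE.PlaceConfig.PRUNE_NO_HELP', 'false'),
]
for name, value in apcOptions:
    AdminConfig.create('Property', cell, [['name', name], ['value', value]])
AdminConfig.save()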
Service Policies
Service policies define application goals (e.g. average response time less than 1 second) and relative priorities (e.g. application A is High priority). The Java ODR uses these policies in its request prioritization and routing decisions.
CPU/Memory Overload Protection
These overload protection features cause the Java ODR to queue requests destined for application servers that it observes are over the configured thresholds of CPU and/or memory usage, rather than forwarding the requests immediately.
Health Policies
When using the "excessive memory usage" health policy (http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/cwve_odhealth.html?lang=en), set usexdHeapModule=true (http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/rwve_odhealthcustprop.html?lang=en). Otherwise, the heap usage is sampled and this can create false positives with generational garbage collection policies such as gencon. The "memory leak" health policy uses the built-in traditional WAS performance advisor and this always samples, so it's not recommended with generational garbage collectors.
Visualization Data Service
This service logs key performance data into CSV log files. The logs are written to the deployment manager profile directory at $DMGR_PROFILE/logs/visualization/*.log
- System Administration > Visualization Data Service > Check "Enable Log"
- Timestamp format = MM/dd/yyyy HH:mm:ss
- If the timestamp format is not specified, it defaults to the number of milliseconds since the standard base time known as "the epoch" (January 1, 1970, 00:00:00 GMT), i.e. the value accepted by new Date(timestamp). A conversion sketch follows this list.
- Max file size = 20MB
- Max historical files = 5
- The max file size and max historical files apply to each visualization data log file individually.
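When the timestamp format is left at the default, a small standalone Python sketch such as the following can convert the epoch-millisecond values for analysis; the log file name and the assumption that the timestamp is the first CSV column are hypothetical:

# Standalone sketch: convert default epoch-millisecond timestamps from a
# visualization data CSV log into readable form. The file name and the
# first-column position of the timestamp are assumptions.
import csv
from datetime import datetime, timezone

with open('serverStats.log', newline='') as f:  # hypothetical file name
    reader = csv.reader(f)
    header = next(reader)  # first row typically holds column names
    for row in reader:
        millis = int(row[0])  # assumes the timestamp is the first column
        ts = datetime.fromtimestamp(millis / 1000.0, tz=timezone.utc)
        print(ts.isoformat(), row[1:])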
Bulletin Board over the Structured Overlay Network (BBSON)
BBSON is an alternative to the High Availability Manager (HAManager) that allows some of the WAS components that traditionally relied on the HAManager to use a different approach. BBSON is built on the P2P component, which uses peer-to-peer communication with small groups rather than a full mesh network like the HAManager. This allows for greater scalability and removes the need for core group bridges. All IM components can use BBSON. WAS WLM can also use it: http://www-01.ibm.com/support/docview.wss?uid=swg1PM71531
High Availability Deployment Manager (HADMGR)
The high availability deployment manager allows multiple instances of the deployment manager to share the same configuration (using a networked filesystem), eliminating the deployment manager as a single point of failure: http://www-01.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.multiplatform.doc/ae/cwve_xdsodmgr.html. The HADMGR must be accessed through an On Demand Router (ODR), which routes to one of the active deployment managers. The deployment manager can be very chatty, making many small file I/O accesses, so the performance of the networked filesystem is critical.