Intelligent Management
Intelligent Management Recipe
- If using Java On Demand Routers:
  - Test the relative performance of an increased maximum size of the Default thread pool.
  - If ODRs are on shared installations, consider using separate shared class caches.
  - If using Windows:
    - If using AIO (the default), test the relative performance of -DAIONewWindowsCancelPath=1 (see the sketch after this list).
    - If using AIO (the default), test the relative performance of disabling AIO and using NIO.
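The -DAIONewWindowsCancelPath=1 option referenced above is a generic JVM argument on the ODR, normally set under the server's process definition > Java Virtual Machine settings. The following is a minimal wsadmin (Jython) sketch of the same change; the cell, node, and server names are placeholders only:

# wsadmin (Jython) sketch: add -DAIONewWindowsCancelPath=1 to an ODR's generic JVM arguments.
# Cell/node/server names below are examples only. Run with: wsadmin -lang jython -f thisScript.py
server = AdminConfig.getid('/Cell:cell1/Node:odrNode01/Server:odr1/')
# There is normally exactly one JavaVirtualMachine object under the server's process definition
jvm = AdminConfig.list('JavaVirtualMachine', server).splitlines()[0]
existing = AdminConfig.showAttribute(jvm, 'genericJvmArguments')
AdminConfig.modify(jvm, [['genericJvmArguments', (existing + ' -DAIONewWindowsCancelPath=1').strip()]])
AdminConfig.save()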
Background
Intelligent Management (IM) was formerly a separate product called WebSphere Virtual Enterprise (WVE) and it became a part of WebSphere Network Deployment starting with version 8.5.
IM introduces the On Demand Router which supports application editioning, health policies, service policies, maintenance mode, automatic discovery, dynamic clusters, traffic shaping, and more. The ODR was first delivered as a Java process that was based on the Proxy Server and it was normally placed in between a web server and the application servers. Starting with WAS 8.5.5, there is an option called Intelligent Management for Web Servers (colloquially, ODRLib) which is a native C component that delivers some of the same functionality but is integrated directly into the IBM HTTP Server (IHS) web server.
Java On Demand Router (ODR)
The Java On Demand Router (ODR) is built on top of the WAS Java Proxy Server. Both of these write the following log files asynchronously in a background LoggerOffThread:
- local.log: A log of the communications between the client (e.g. browser) and the ODR, i.e. the activities in the "local" ODR process.
- proxy.log: A log of the communications between the ODR and the backend server (e.g. application server).
The weighted least outstanding request (WLOR) load balancing algorithm is generally superior to the available load balancing algorithms in the WebSphere plugin. WLOR takes into account both the weight of the server and the number of outstanding requests, so it is better at evening out load if one server slows down. WLOR is the default in both ODRLib and the Java ODR.
The "excessive request timeout condition" and "excessive response time condition" are useful health policies that the ODR can monitor to gather diagnostics on anomalous requests.
Conditional Request Trace enables traces only for requests that match a particular condition such as a URI.
The ODR measures "service time" as the time from when the request is sent to the application server until the first response chunk arrives.
Default Thread Pool
The Java ODR/Proxy primarily uses the Default thread pool for its HTTP proxying function; however, most of its activity is asynchronous, so a very large volume of traffic would be required to overwhelm this thread pool. In such a case, it may help to increase its maximum size, although exhaustion of the Default thread pool may just be a symptom of downstream or upstream issues instead.
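As a sketch of how the maximum size might be raised with wsadmin (Jython) rather than through the administrative console (the cell, node, and server names and the new size are examples only):

# wsadmin (Jython) sketch: raise the maximum size of the ODR's Default thread pool.
# The path (cell1/odrNode01/odr1) and the new size (100) are examples only.
server = AdminConfig.getid('/Cell:cell1/Node:odrNode01/Server:odr1/')
for tp in AdminConfig.list('ThreadPool', server).splitlines():
    if AdminConfig.showAttribute(tp, 'name') == 'Default':
        AdminConfig.modify(tp, [['maximumSize', '100']])
AdminConfig.save()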
Maintenance Mode
Putting servers into maintenance mode is a great way to gather performance diagnostics while reducing the potential impact to customers. One maintenance mode option is to allow users with affinity to continue making requests while sending new requests to other servers.
Putting a server into maintenance mode is a persistent change. In other words, a server will remain in maintenance mode (even if the server is restarted) until the mode is explicitly changed. The maintenance mode of a server is stored persistently as a server custom property. The name of the custom property is "server.maintenancemode" under Application Servers > Administration > Custom Properties. Possible values for that property are:
- false - maintenance mode is disabled
- affinity - only route traffic with affinity to the server
- break - don't route any traffic to the server
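Because the mode is persisted as a server custom property, it can be inspected with wsadmin. A minimal Jython sketch (cell, node, and server names are examples; changing the mode is normally done through the administrative console rather than by editing the property directly):

# wsadmin (Jython) sketch: report the persisted maintenance mode of a server.
# Cell/node/server names are examples only.
server = AdminConfig.getid('/Cell:cell1/Node:node1/Server:server1/')
for prop in AdminConfig.list('Property', server).splitlines():
    if AdminConfig.showAttribute(prop, 'name') == 'server.maintenancemode':
        print('server.maintenancemode = ' + AdminConfig.showAttribute(prop, 'value'))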
Custom Logging
The Java ODR supports custom logging, which records information about HTTP responses, allows conditions on what is logged, and provides very flexible fields for logging.
The condition uses HTTP request and response operands. Response operands include response code, target server, response time, and service time.
There are various fields available to print.
Instructions to log all responses:
- Log into the machine that runs the WAS DMGR, open a command prompt, and change directory to the $WAS/bin/ directory.
- Run the following command for each ODR, replacing $ODRNODE with the ODR's node and $ODRSERVER with the name of the ODR: wsadmin -f manageODR.py insertCustomLogRule $ODRNODE:$ODRSERVER 1 "service.time >= 0" "http.log %h %t %r %s %b %Z %v %R %T"
- In the WAS DMGR administrative console, for each ODR, go to: Servers > Server Types > On Demand Routers > $ODR > On Demand Router Properties > On Demand Router settings > Custom Properties
- Click New and set Name=http.log.maxSize and Value=100 and click OK. This value is in MB.
- Click New and set Name=http.log.history and Value=10 and click OK.
- Click Review, check the box to synchronize, and click Save.
- Restart the ODRs.
- Now observe that there should be an http.log file in $WAS\profiles\$PROFILE\logs\$ODR\
The default value for http.log.maxSize is 500 MB and the default value for http.log.history is 1. Note that the number of historical files is in addition to the current file, meaning that the defaults will produce up to 1GB in two files. Also note that changing the values affects not only the ODR custom logs, but also proxy.log, local.log, and cache.log.
Other notes:
Log rules may be listed with:
$ wsadmin -f manageODR.py listCustomLogRules $ODRNODE:$ODRSERVER
WASX7209I: Connected to process "dmgr" on node dmgr1 using SOAP connector; The type of process is: DeploymentManager
WASX7303I: The following options are passed to the scripting environment and are available as arguments that are stored in the argv variable: "[listCustomLogRules, odr1:odrserver1]"
1: condition='service.time >= 0' value='http.log %h %t %r %s %b %Z %v %R %T'
Log rules may be removed by referencing the rule number (specified in insertCustomLogRule or listed on the left side of the output of listCustomLogRules):
$ wsadmin -f manageODR.py removeCustomLogRule ${ODRNODE}:${ODRSERVER} 1
WASX7209I: Connected to process "dmgr" on node dmgr1 using SOAP connector; The type of process is: DeploymentManager
WASX7303I: The following options are passed to the scripting environment and are available as arguments that are stored in the argv variable: "[removeCustomLogRule, odr1:odrserver1, 1]"
Removed log rule #1
If the overhead of the example log rule above is too high, then it may be reduced significantly by only logging requests that take a long time. Change the service.time threshold (in milliseconds) to some large value. For example (the name of the log is also changed to be more meaningful, such as http_slow.log):
$ ./wsadmin.sh -f manageODR.py insertCustomLogRule ${ODRNODE}:${ODRSERVER} 1 "service.time >= 5000" "http_slow.log %h %t %r %s %b %Z %v %R %T"
WASX7209I: Connected to process "dmgr" on node dmgr1 using SOAP connector; The type of process is: DeploymentManager
WASX7303I: The following options are passed to the scripting environment and are available as arguments that are stored in the argv variable: "[insertCustomLogRule, odr1:odrserver1, 1, service.time >= 5000, http_slow.log %h %t %r %s %b %Z %v %R %T]"
Inserted log rule #1
Example output:
localhost6.localdomain6 09/Jan/2018:14:33:55 PST "GET /swat/Sleep HTTP/1.1" 200 326 cell1/node1/dc1_node1 oc3466700346 6006 6004
Note that %r will be double-quoted without you needing to specify the double quotes in insertCustomLogRule. In fact, insertCustomLogRule does not support double quotes around any field.
Binary Trace Facility (BTF)
The Java ODR supports a different type of tracing from the traditional diagnostic trace. Btrace enables tracing on a per-request basis and traces infrequently-occurring conditions out-of-the-box (e.g. the reason for a 503). Btrace is hierarchical with respect to function rather than code, and trace records are organized top-down and left-to-right (processing order). The trace specification can be set as a cell custom property whose name starts with trace, e.g. name=trace.http, value=http.request.loadBalance=2.
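For example, the trace.http specification above could be created as a cell custom property with wsadmin (Jython); this is a sketch and the cell name is an example only:

# wsadmin (Jython) sketch: set a btrace specification as a cell custom property.
# The cell name (cell1) is an example only.
cell = AdminConfig.getid('/Cell:cell1/')
AdminConfig.create('Property', cell, [['name', 'trace.http'], ['value', 'http.request.loadBalance=2']])
AdminConfig.save()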
The trace command in the WAS installation directory can be used to format btrace data:
$WAS/bin/trace read $SERVER_LOGS_DIRECTORY $SPEC_TO_READ
Dynamic Clusters
Application Placement Controller (APC)
The Application Placement Controller code runs in one JVM in the cell and coordinates stopping and starting JVMs when dynamic clusters are in automatic mode, or creating runtime tasks for doing so when dynamic clusters are in supervised mode. The frequency of changes is throttled by the minimum time between placements option. Some of the basic theory of the APC is described in Tang et al., 2007.
Investigate autonomic dynamic cluster size violations.
Investigate APC issues:
- Check all node agents are running and healthy and the core group is marked as stable.
- Check if any nodes or servers are in maintenance mode.
- Check the logs for servers to see if they were attempted to be started but failed for some reason (e.g. application initialization).
- Check each node's available physical memory to confirm there is sufficient free space for additional servers.
- Find where the APC is running (DCPC0001I/HAMI0023I) and not stopped (DCPC0002I/HAMI0023I), and ensure that it is actually running at the interval of the minimum time between placements option (otherwise, it may be hung).
- Check if APC detected a violation with the DCPC0309I message. If found, check for any subsequent errors or warnings.
- Check the apcReplayer.log, find the **BEGIN PLACEMENT INPUT DUMP** section, and review whether all nodes are registered with lines starting with {CI.
If APC is constantly stopping and starting JVMs seemingly needlessly, test various options such as:
APC.BASE.PlaceConfig.DEMAND_DISTANCE_OVERALL=0.05
APC.BASE.PlaceConfig.UTILITY_DISTANCE_PER_APPL=0.05
APC.BASE.PlaceConfig.WANT_VIOLATION_SCORE=true
APC.BASE.PlaceConfig.PRUNE_NO_HELP=false
Service Policies
Service policies define application goals (e.g. average response time less than 1 second) and relative priorities (e.g. application A is High). The Java ODR uses these policies in its request prioritization and routing decisions.
CPU/Memory Overload Protection
These overload protection features cause the Java ODR to queue work to application servers that it sees are over the configured thresholds of CPU and/or memory usage.
Health Policies
When using the "excessive memory usage" health policy, set usexdHeapModule=true.
Otherwise, the heap usage is sampled and this can create false positives
with generational garbage collection policies such as gencon. The
"memory leak" health policy uses the built-in traditional WAS
performance advisor and this always samples, so it's not recommended
with generational garbage collectors.
Visualization Data Service
This service logs key performance data into CSV log files. The logs are written to the deployment manager profile directory at $DMGR_PROFILE/logs/visualization/*.log

- System Administration > Visualization Data Service > Check "Enable Log"
- Timestamp format = MM/dd/yyyy HH:mm:ss
  - If this is not specified, it defaults to the "number of milliseconds since the standard base time known as 'the epoch', namely January 1, 1970, 00:00:00 GMT," i.e. new Date(timestamp)
- Max file size = 20MB
- Max historical files = 5
  - The max file size and historical files apply to each viz data log file, individually.
Example output of ServerStatsCache.log:
timeStamp,name,node,cellName,version,weight,cpu,usedMemory,uptime,totalRequests,liveSessions,updateTime,highMemMark,residentMemory,totalMemory,db_averageResponseTime,db_throughput,totalMethodCalls
01/03/2019 09:45:53,server1,localhostNode01,localhostCell01,XD 9.0.0.9,1,0.26649348143619733,80953,846,1337,0,01/03/2019 09:45:44,,334792,5137836,,,
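Because the files are plain CSV with the header shown above, they are easy to post-process. A minimal Python sketch (the file path is an example; point it at a copy of $DMGR_PROFILE/logs/visualization/ServerStatsCache.log) that prints CPU samples per server:

# Python sketch: print per-server CPU samples from ServerStatsCache.log.
# The file name below is an example only.
import csv

with open('ServerStatsCache.log') as f:
    for row in csv.DictReader(f):
        # Column names come from the header line shown above
        print('%s %s/%s cpu=%s' % (row['timeStamp'], row['node'], row['name'], row['cpu']))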
Bulletin Board over the Structured Overlay Network (BBSON)
BBSON is an alternative to the High Availability Manager (HAManager) and allows some of the WAS components that traditionally relied on the HAManager to use a different approach. BBSON is built on the P2P component which is peer-to-peer with small sized groups rather than a mesh network like HAManager. This can allow for greater scalability and no need for core group bridges. All IM components can use BBSON. WAS WLM can also use BBSON.
The SON thread pool sizes may be set with the cell custom properties son.tcpInThreadPoolMin, son.tcpInThreadPoolMax, son.tcpOutThreadPoolMin, and son.tcpOutThreadPoolMax.
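A wsadmin (Jython) sketch of setting these as cell custom properties (the cell name and the sizes are examples only; validate appropriate values in a test environment):

# wsadmin (Jython) sketch: create the SON thread pool sizing cell custom properties.
# The cell name and the sizes are examples only.
cell = AdminConfig.getid('/Cell:cell1/')
for name, value in [('son.tcpInThreadPoolMin', '5'), ('son.tcpInThreadPoolMax', '50'),
                    ('son.tcpOutThreadPoolMin', '5'), ('son.tcpOutThreadPoolMax', '50')]:
    AdminConfig.create('Property', cell, [['name', name], ['value', value]])
AdminConfig.save()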
High Availability Deployment Manager (HADMGR)
The high availability deployment manager allows multiple instances of the deployment manager to share the same configuration (using a networked filesystem) to eliminate a single point of failure if one of them is not available. The HADMGR must be accessed through an On Demand Router (ODR) which routes to one of the active deployment managers. The deployment manager can be very chatty in making many small file I/O accesses, thus performance of the networked filesystem is critical.
PMI
In WAS ND 8.5 and above, if you are not using any Intelligent Management capabilities, PMI may be disabled completely by setting the cell custom property LargeTopologyOptimization=false, disabling PMI, and restarting:
Intelligent Management, which is part of WebSphere Application Server V8.5.0.0 and later, requires the default PMI counters to be enabled. It is not possible to disable PMI or the default PMI counters when using Intelligent Management capabilities. If no Intelligent Management capabilities will ever be used, then the property described in this fix can be used to disable Intelligent Management. In turn, it will allow disabling the PMI Monitoring Infrastructure and default PMI counters.
- System Administration > Cell > Additional Properties > Custom Properties > New
- Name: LargeTopologyOptimization
- Value: false
- OK
- Servers > Server Types > WebSphere application servers > $SERVER > Performance > Performance Monitoring Infrastructure (PMI)
- Uncheck "Enable Performance Monitoring Infrastructure"
- OK
- Review
- Check "Synchronize changes with Nodes"
- Save
- Restart $SERVER
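A wsadmin (Jython) sketch of the equivalent of the console steps above (cell, node, and server names are examples only; a restart is still required):

# wsadmin (Jython) sketch: set LargeTopologyOptimization=false at the cell level
# and disable PMI on a server. Cell/node/server names are examples only.
cell = AdminConfig.getid('/Cell:cell1/')
AdminConfig.create('Property', cell, [['name', 'LargeTopologyOptimization'], ['value', 'false']])
server = AdminConfig.getid('/Cell:cell1/Node:node1/Server:server1/')
pmi = AdminConfig.list('PMIService', server).splitlines()[0]
AdminConfig.modify(pmi, [['enable', 'false']])
AdminConfig.save()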