Troubleshooting WebSphere Application Server
Education
- Troubleshoot WebSphere Application Server (The Basics)
- Self-paced WebSphere Application Server Troubleshooting and Performance Lab
Preparing for Tracing
WAS supports a diagnostic trace facility:
- WAS traditional: https://www.ibm.com/support/pages/setting-trace-websphere-application-server
- WebSphere Liberty: https://www.ibm.com/support/pages/set-trace-and-get-full-dump-websphere-liberty
When enabling trace, always notify the customer that the performance impact of trace is highly variable across components and proportional to the volume and detail of the trace, so there is no way to predict it in advance. The most reliable approach is for the customer to run a benchmark in a test environment without trace to establish a baseline, then run the same test with trace enabled and compare the relative performance difference. Alternatively, diagnostic trace may be enabled on only a single member of a production cluster to reduce the impact on users.
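The comparison itself is simple arithmetic. A minimal Python sketch, assuming both runs report throughput for the same workload (for example, requests per second from the load-testing tool); trace_overhead_percent is a hypothetical helper name:

```python
def trace_overhead_percent(baseline_throughput: float, traced_throughput: float) -> float:
    """Throughput lost with trace enabled, as a percentage of the baseline (hypothetical helper).

    Both values should come from the same test in the same environment: one run
    without trace (the baseline) and one run with trace enabled.
    """
    if baseline_throughput <= 0:
        raise ValueError("baseline throughput must be positive")
    return (baseline_throughput - traced_throughput) * 100.0 / baseline_throughput


# Example: 1000 requests/second without trace, 850 requests/second with trace.
print(trace_overhead_percent(1000.0, 850.0))  # 15.0
```

If the tool reports average response time instead of throughput, the direction flips (the traced run is slower, so its response time is higher), and the overhead is (traced - baseline) / baseline.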
When enabling a heavy trace, it is critical that the customer follow the instructions in the links above to configure a large number of historical trace files and a large maximum trace file size, which increases the chance that the trace captures the problem before the files roll over.
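How far back the trace reaches is roughly the total size of the retained trace files divided by the rate at which trace is written. A back-of-the-envelope Python sketch, assuming the write rate is measured during the traced benchmark (for example, from the growth of the trace file over a few minutes) and ignoring bursts in trace volume; trace_window_seconds and files_needed are hypothetical helper names:

```python
import math


def trace_window_seconds(num_files: int, file_size_mb: float, write_mb_per_s: float) -> float:
    """Approximate time covered by the retained trace files (hypothetical helper).

    num_files counts every trace file kept on disk, including the one being written.
    """
    return num_files * file_size_mb / write_mb_per_s


def files_needed(window_s: float, file_size_mb: float, write_mb_per_s: float) -> int:
    """Smallest number of trace files of the given size that covers window_s seconds (hypothetical helper)."""
    return math.ceil(window_s * write_mb_per_s / file_size_mb)


# Example: trace is written at 2 MB/s and each file is capped at 250 MB.
print(trace_window_seconds(10, 250, 2.0))  # 1250.0 seconds, roughly 21 minutes
print(files_needed(3600, 250, 2.0))        # 29 files to keep about an hour of trace
```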
Notes
Packages that start with com.ibm.websphere are public. Packages that start with com.ibm.ws are internal.
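One way to apply this rule is to scan application source for imports of internal packages. A minimal Python sketch, assuming Java source text as input and matching only on the two package prefixes named above; classify and internal_imports are hypothetical names. Prefixes are compared on whole package segments so that a different prefix such as com.ibm.wsspi is not mistaken for com.ibm.ws:

```python
import re

# Captures the imported name from Java import statements, including static and wildcard imports.
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+(?:\.\*)?)\s*;", re.MULTILINE)


def _in_package(name: str, package: str) -> bool:
    # Whole-segment match: "com.ibm.ws.x" is in com.ibm.ws, "com.ibm.wsspi.x" is not.
    return name == package or name.startswith(package + ".")


def classify(name: str) -> str:
    """Return "public", "internal", or "other" for a qualified name (hypothetical helper)."""
    if _in_package(name, "com.ibm.websphere"):
        return "public"
    if _in_package(name, "com.ibm.ws"):
        return "internal"
    return "other"


def internal_imports(java_source: str) -> list[str]:
    """List imports in java_source that refer to internal com.ibm.ws packages (hypothetical helper)."""
    return [name for name in IMPORT_RE.findall(java_source) if classify(name) == "internal"]


# The class names below are placeholders for illustration.
print(internal_imports("import com.ibm.ws.example.Helper;\nimport java.util.List;\n"))
# ['com.ibm.ws.example.Helper']
```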
Increasing Resiliency for IBM WebSphere Application Server Deployments
The practices that we have most often observed causing problems in customer situations are (http://www.redbooks.ibm.com/redpapers/pdfs/redp5033.pdf):
- No test environment that is equal to the production environment
- Communication breakdown
- No plan for education
- No load or stress testing
- Not managing the entire application lifecycle
- No capacity or scalability plan
- No production traffic diagram
- Changes are put directly into production
- No migration plan
- No record of changes
- No current architecture plan
Malpractice: Broadly Disabling Core Logging
By default, WebSphere Application Server has a global log level of *=info, which means that info, audit, warning, severe, and fatal messages are logged. It is almost always malpractice to use *=severe, *=fatal, or *=off, because warnings and errors generally occur infrequently and are critical to understanding why problems occurred. It is often malpractice to use *=warning, because many informational messages are very useful for understanding why problems occurred. If repeating messages are flooding your logs, broadly disabling core logging should be the last resort; instead, consider:
- Open a support ticket with the owner of the message to understand why the message occurs so frequently.
- Change the log level of the particular logger for those messages (after understanding what they mean from the first step). Any log level specified after the global log level overrides the log level for that particular logger. For example, if the log configuration is *=info:com.test.Logger=warning, then the threshold is changed only for com.test.Logger messages.
- On Liberty, use <logging hideMessage="..." /> to hide specific messages by message ID.
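The override rule described above can be modeled in a few lines. A simplified Python sketch, not WebSphere's actual implementation, assuming a specification of colon-separated logger=level entries processed left to right where the last matching entry wins, * matches every logger, and a trailing .* matches a package and its subpackages. It covers only the levels named above (finer trace levels are omitted); threshold and is_logged are hypothetical names:

```python
# Message levels named above, from least to most severe; "off" is only a threshold.
LEVELS = ["info", "audit", "warning", "severe", "fatal", "off"]


def _matches(pattern: str, logger: str) -> bool:
    if pattern == "*":
        return True
    if pattern.endswith(".*"):
        package = pattern[:-2]
        return logger == package or logger.startswith(package + ".")
    return logger == pattern


def threshold(spec: str, logger: str) -> str:
    """Effective level for logger: the last matching entry wins (hypothetical helper)."""
    level = "info"  # the default global level described above
    for entry in spec.split(":"):
        pattern, _, value = entry.partition("=")
        if _matches(pattern.strip(), logger):
            level = value.strip().lower()
    return level


def is_logged(spec: str, logger: str, message_level: str) -> bool:
    """True if a message at message_level from logger passes its threshold (hypothetical helper)."""
    return LEVELS.index(message_level) >= LEVELS.index(threshold(spec, logger))


spec = "*=info:com.test.Logger=warning"
print(is_logged(spec, "com.example.Other", "info"))  # True: only *=info applies
print(is_logged(spec, "com.test.Logger", "info"))    # False: overridden to warning
print(is_logged(spec, "com.test.Logger", "severe"))  # True
```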
Command line HTTP client with a keep-alive socket
You may use a command line HTTP/HTTPS client to test keep-alive sockets. Note that the Host header is required. Press Enter twice after entering the Host header to send the request.
If the server supports keep-alive connections, you may send another request after the response is shown.
For HTTP, an example using telnet (to quit, type Ctrl+] and then quit, instead of Ctrl+C):
$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
X-Powered-By: Servlet/3.1
Content-Type: text/plain
Content-Language: en-US
Date: Mon, 05 Apr 2021 17:23:31 GMT

Hello World
GET / HTTP/1.1
[...]
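The same keep-alive check can be scripted. A minimal Python sketch using a raw socket, which sends the same bytes typed into telnet above (request line, Host header, blank line) twice on one TCP connection. It assumes an HTTP/1.x server whose responses use either Content-Length or chunked transfer encoding; probe_keep_alive and read_response are hypothetical names:

```python
import socket


def read_response(reader):
    """Read one HTTP/1.x response from a buffered socket reader (hypothetical helper)."""
    status_line = reader.readline().decode("iso-8859-1").rstrip("\r\n")
    if not status_line:
        raise ConnectionError("server closed the connection")
    headers = {}
    while True:
        line = reader.readline().decode("iso-8859-1").rstrip("\r\n")
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    if headers.get("transfer-encoding", "").lower() == "chunked":
        body = b""
        while True:
            size = int(reader.readline().split(b";")[0].strip(), 16)
            if size == 0:
                # Skip any trailer lines up to the terminating blank line.
                while reader.readline() not in (b"\r\n", b"\n", b""):
                    pass
                return status_line, body
            body += reader.read(size)
            reader.readline()  # CRLF that follows each chunk
    # Assumption: non-chunked responses carry Content-Length (treated as 0 if absent).
    return status_line, reader.read(int(headers.get("content-length", "0")))


def probe_keep_alive(host, port, path="/", timeout=5.0):
    """Send the same GET twice on one connection (hypothetical helper).

    Returns (first status line, first body, keep_alive), where keep_alive is True
    if the second request on the same socket also received a response.
    """
    # The Host header is required; the final blank line ends the request.
    request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as reader:
            sock.sendall(request)
            status_line, body = read_response(reader)
            try:
                sock.sendall(request)
                read_response(reader)
                keep_alive = True
            except ConnectionError:  # includes broken pipe and connection reset
                keep_alive = False
    return status_line, body, keep_alive
```

For example, probe_keep_alive("localhost", 80) returns the first status line and body, plus True if the server answered the second request on the same connection.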
For HTTPS, an example using openssl:
$ openssl s_client -connect localhost:443
CONNECTED(00000003)
[...]
---
GET / HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
X-Powered-By: Servlet/3.1
Content-Type: text/plain
Content-Language: en-US
Date: Mon, 05 Apr 2021 17:25:01 GMT

Hello World
GET / HTTP/1.1
[...]
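For scripted checks that include HTTPS, Python's standard http.client module can issue both requests on one connection. A minimal sketch, assuming certificate verification is disabled for testing against a server with a self-signed certificate (similar to how openssl s_client continues after a verification failure); check_keep_alive is a hypothetical name. When a response indicates that the connection will not persist, http.client closes its socket, so comparing the socket object across the two requests shows whether it was reused:

```python
import http.client
import ssl


def check_keep_alive(host: str, port: int, path: str = "/", tls: bool = True,
                     timeout: float = 5.0) -> tuple[list[int], bool]:
    """Send GET path twice; return (status codes, whether one socket served both) (hypothetical helper)."""
    if tls:
        context = ssl.create_default_context()
        # Testing only: accept self-signed certificates. check_hostname must be
        # disabled before verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
    else:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
    statuses, sockets = [], []
    try:
        for _ in range(2):
            conn.request("GET", path)  # http.client adds the Host header itself
            response = conn.getresponse()
            response.read()  # drain the body so the connection can be reused
            statuses.append(response.status)
            # conn.sock is None here if the response said the connection will close.
            sockets.append(conn.sock)
    finally:
        conn.close()
    return statuses, sockets[0] is not None and sockets[0] is sockets[1]
```

For example, check_keep_alive("localhost", 443) would return ([200, 200], True) if the server kept the TLS connection open for the second request; pass tls=False to run the same check over plain HTTP.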