Troubleshooting WebSphere Application Server

Preparing for Tracing

WAS supports a diagnostic trace facility:

When enabling trace, always notify the customer that the performance impact of trace is highly variable: it depends on the components being traced, the detail level of the trace, and the volume of load driving those components, so there is no way to predict the impact of diagnostic trace in advance. The best approach is for the customer to run a benchmark in a test environment without trace as a baseline, then run the same test with trace enabled and compare the relative performance difference. Alternatively, diagnostic trace may be enabled on only a single member of a production cluster to reduce the impact on users.

It is critical that the customer follow the instructions in the link above to configure a large number and size of historical trace files when enabling a heavy trace, so that the chances are higher that the trace captures the problem before the files roll over.
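
For example, on Liberty, diagnostic trace and historical trace file retention can be configured in server.xml. This is a minimal sketch; the trace specification shown (com.ibm.ws.http.channel.*=all) is only an illustration, maxFileSize is in MB, and the values should be sized for the environment and available disk space:

<logging traceSpecification="*=info:com.ibm.ws.http.channel.*=all"
         traceFileName="trace.log"
         maxFileSize="100"
         maxFiles="10" />

On traditional WAS, the equivalent settings are typically under Troubleshooting > Logs and trace > server_name > Diagnostic trace in the administrative console.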

Notes

Any Java packages that start with com.ibm.websphere are public API. Those that start with com.ibm.ws are internal.

Increasing Resiliency for IBM WebSphere Application Server Deployments

The top problem-causing practices that we have observed in customer situations are (http://www.redbooks.ibm.com/redpapers/pdfs/redp5033.pdf):

  1. No test environment is equal to the production environment
  2. Communication breakdown
  3. No plan for education
  4. No load or stress testing
  5. Not managing the entire application lifecycle
  6. No capacity or scalability plan
  7. No production traffic diagram
  8. Changes are put directly into production
  9. No migration plan
  10. No record of changes
  11. No current architecture plan

Malpractice: Broadly Disabling Core Logging

By default, WebSphere Application Server has a global log level of *=info, which means that info, audit, warning, severe, and fatal messages are logged. It is almost always a malpractice to use *=severe, *=fatal, or *=off, because warnings and errors generally occur infrequently and are critical to understanding why problems occurred. It is often a malpractice to use *=warning, because many informational messages are very useful for understanding why problems occurred. If repeating messages are flooding your logs, broadly disabling core logging should be the last resort; instead, consider:

  1. Open a support ticket with the owner of the message to understand why the message occurs so frequently.
  2. Change the log level of the particular logger for those messages (after understanding what they mean in #1). Any logger-specific levels specified after the global log level override the global level for that logger. For example, if the log configuration is *=info:com.test.Logger=warning, then the threshold is changed to warning only for com.test.Logger messages (see the sketch after this list).
  3. On Liberty, use <logging hideMessage="..." /> to hide specific message IDs from messages.log and console.log.
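
As a sketch of options 2 and 3 on Liberty, the same *=info:com.test.Logger=warning specification and a hideMessage list can be set in server.xml (com.test.Logger and the message IDs below are placeholders, not real identifiers):

<logging traceSpecification="*=info:com.test.Logger=warning"
         hideMessage="SAMPLE0001I,SAMPLE0002W" />

On traditional WAS, the same specification string can be entered under Troubleshooting > Logs and trace > server_name > Change log detail levels.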

Command line HTTP client with a keep-alive socket

You may use a command line HTTP/HTTPS client to test keep-alive sockets. Note that the Host header is required for HTTP/1.1 requests. Press Enter twice after typing the Host header to send the request. If the server supports keep-alive connections, then after the response is shown, you may send another request on the same socket.

For HTTP, an example using telnet. To quit, press Ctrl+] and then type quit instead of pressing Ctrl+C:

$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET / HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
X-Powered-By: Servlet/3.1
Content-Type: text/plain
Content-Language: en-US
Date: Mon, 05 Apr 2021 17:23:31 GMT

Hello World
GET / HTTP/1.1
[...]

For HTTPS, an example using openssl s_client:

$ openssl s_client -connect localhost:443
CONNECTED(00000003)
[...]
---
GET / HTTP/1.1
Host: localhost

HTTP/1.1 200 OK
X-Powered-By: Servlet/3.1
Content-Type: text/plain
Content-Language: en-US
Date: Mon, 05 Apr 2021 17:25:01 GMT

Hello World
GET / HTTP/1.1
[...]
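
Alternatively, curl re-uses a keep-alive connection when multiple URLs are requested in a single invocation; with -v, the second request should report that the existing connection is being re-used (the exact verbose wording varies by curl version):

$ curl -v http://localhost/ http://localhost/

For HTTPS, use https:// URLs instead (add -k only if the server uses a self-signed certificate).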