MustGather: Client Symptoms

on IBM developerWorks.

Known issues to check for first

F5 connection or health check failures over SSL after PI66931 (GSKit 8.0.50.69)

Some clients, including some firmware levels or configurations of F5, are intolerant of the "Extended Master Secret" extension defined in RFC 7627. The client may close the connection immediately after the handshake completes. As a workaround, add the following directive in the same context as SSLEnable to disable RFC 7627 support:

SSLAttributeSet 4002 0

HTTP 400 errors after applying maintenance

After PI73984, IHS performs stricter checks on the format of the request line and HTTP headers. Some of the invalid data that has been caught is listed below:

  • Whitespace in header names.

  • Excess whitespace in the HTTP request line.

  • Invalid line endings (CR or LF only).

  • Host: headers with invalid hostnames (underscores before PI99567; non-alphanumeric characters other than "." and "-").

  • Non-ASCII bytes in request or response headers.

Generally, webserver features that modify HTTP requests cannot repair the invalid data, because it is detected before they have a chance to run. The HTTPProtocolOptions directive relaxes some, but not all, of the new checks.

These problems are usually triggered by non-browser, custom HTTP clients that do not properly implement the HTTP protocol. mod_net_trace can be used to review the content of the HTTP request to find the problem.
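Once a trace points at a suspect client, an offline check of the captured request bytes can narrow down which of the rules above it breaks. The following is a minimal Python sketch, not IHS's actual parser: it approximates the listed checks using RFC 7230 token rules, ignores IPv6 literal hosts, and assumes you have already reassembled the raw request bytes yourself. The function name `find_request_problems` and its messages are hypothetical.

```python
import re

# RFC 7230 "tchar": characters allowed in tokens such as methods and header names.
TCHAR = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]"
HEADER_NAME = re.compile(rf"^{TCHAR}+$")
# Method, target, and version separated by exactly one space, nothing trailing.
REQUEST_LINE = re.compile(rf"^{TCHAR}+ \S+ HTTP/\d\.\d$")
# Letters, digits, ".", "-", and "_" with an optional port (underscore reported separately).
HOST = re.compile(r"^[A-Za-z0-9._-]+(:\d+)?$")


def find_request_problems(raw: bytes) -> list[str]:
    """Return suspected problems in the request line and headers of a raw request."""
    problems = []
    # The header block normally ends at the first CRLF CRLF.
    head = raw.split(b"\r\n\r\n", 1)[0]

    # Every CR must be followed by LF, and every LF must be preceded by CR.
    if re.search(rb"\r(?!\n)|(?<!\r)\n", head):
        problems.append("bare CR or LF line ending")
    if any(byte > 0x7F for byte in head):
        problems.append("non-ASCII byte in request line or headers")

    # Split leniently so the remaining checks still run when line endings are bad.
    lines = [line.decode("latin-1") for line in re.split(rb"\r\n|[\r\n]", head)]

    if not REQUEST_LINE.match(lines[0]):
        problems.append(f"malformed request line: {lines[0]!r}")

    for line in lines[1:]:
        if not line:
            break  # an empty line ends the headers (e.g. LF LF separators)
        name, sep, value = line.partition(":")
        if not sep or not HEADER_NAME.match(name):
            problems.append(f"invalid header name: {name!r}")
            continue
        if name.lower() == "host":
            host = value.strip()
            if host.startswith("["):
                continue  # IPv6 literals are not checked by this sketch
            if not HOST.match(host):
                problems.append(f"invalid Host header: {host!r}")
            elif "_" in host:
                problems.append(f"underscore in Host header (rejected before PI99567): {host!r}")
    return problems
```

For example, a request line with a doubled space after `GET` combined with `Host: my_host` yields two entries: one for the request line and one for the underscore. IHS may still reject requests this sketch accepts, and vice versa.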

HTTPProtocolOptions

Currently "HTTPProtocolOptions unsafe" allows:

  • Invalid HTTP request header names and values.

  • Invalid HTTP response header names, but not values.

Lost/hung requests with low MaxSpareThreads or MaxRequestsPerChild

IHS 8.5 on z/OS has a window in which keepalive requests received while a process is shutting down may appear to hang. This applies to configurations in which IHS uses the event MPM. When a process is shutting down due to MaxSpareThreads, MaxRequestsPerChild, or a server-wide graceful restart, its keepalive connections are not closed immediately. If a long-running request delays the process's final exit and a client sends a new request on one of its idle keepalive connections, the request appears to hang instead of the connection being closed promptly (a closure that clients must already be prepared to handle on any keepalive connection).

As a precaution, minimize child process exits during heavy load: configure MaxSpareThreads as close as possible to MaxClients, keep MaxRequestsPerChild at 0, and avoid graceful restarts during periods of high load.
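The precautions above can be checked mechanically against a configuration file. The Python sketch below is a hypothetical helper, `review_child_exit_settings`, that scans httpd.conf text for the three directives and flags values that let child processes exit under load. It assumes one effective value per directive: it does not evaluate <IfModule> sections or Include files, keeps the last occurrence it sees, and does not report directives that are absent (compiled-in defaults are not considered).

```python
import re

DIRECTIVE = re.compile(
    r"\s*(MaxSpareThreads|MaxClients|MaxRequestsPerChild)\s+(\d+)\s*$", re.IGNORECASE
)


def review_child_exit_settings(conf_text: str) -> list[str]:
    """Flag settings that allow frequent child process exit (hypothetical helper)."""
    values = {}
    for line in conf_text.splitlines():
        match = DIRECTIVE.match(line)
        if match:
            # Later occurrences win; <IfModule> sections are not evaluated.
            values[match.group(1).lower()] = int(match.group(2))

    notes = []
    spare = values.get("maxsparethreads")
    clients = values.get("maxclients")
    if spare is not None and clients is not None and spare < clients:
        notes.append(
            f"MaxSpareThreads ({spare}) is below MaxClients ({clients}); "
            "children may be stopped as load fluctuates"
        )
    requests = values.get("maxrequestsperchild")
    if requests:  # None (absent) and 0 (never exit) are both fine here
        notes.append(f"MaxRequestsPerChild is {requests}; 0 avoids periodic child exit")
    return notes
```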

APAR PI74119 drastically reduces the window.

Hangs, delays, or frontend timeouts with IHS on Linux/PPC or AIX

See gather_highcpu_doc.html#GSKITICC_HIGHCPU

HTTP 503 and 400 errors related to Transfer-Encoding: chunked requests

Prior to WebSphere Plug-in APAR PI32632 (Duplicate/overridden HTTP status codes), a request body with invalid Transfer-Encoding: chunked framing usually resulted in an HTTP 400 error, because the WAS Plug-in overrode the status code. After PI32632, the status code chosen by the Apache-based server is used instead. Servers based on Apache 2.2 (like IBM HTTP Server V7 and V8) respond to some of these errors with unusual status codes: 413 and 503. While these status codes are not an ideal fit for these error scenarios, they are consistent with non-WAS usage of Apache and are not expected to cause any interoperability problems.
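Custom clients that build chunked request bodies by hand are a common source of these errors, and the framing rules are small enough to check directly. The Python sketch below assumes the chunked transfer coding as described in RFC 7230 section 4.1, with no trailer fields. It shows an encoder and a strict decoder that raises on malformed framing. `encode_chunked` and `decode_chunked` are hypothetical names. The decoder does not reproduce Apache's exact checks or predict which status code a given error produces.

```python
import re


def encode_chunked(body: bytes, chunk_size: int = 8192) -> bytes:
    """Frame a body using HTTP/1.1 chunked transfer coding."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        # Each chunk: size in hex, CRLF, the data, CRLF.
        out += b"%X\r\n" % len(chunk) + chunk + b"\r\n"
    # Zero-size last chunk, no trailer fields, then the terminating CRLF.
    out += b"0\r\n\r\n"
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Strictly decode a chunked body; raise ValueError on malformed framing."""
    body = bytearray()
    pos = 0
    while True:
        end = data.find(b"\r\n", pos)
        if end < 0:
            raise ValueError("chunk size line not terminated by CRLF")
        size_field = data[pos:end].split(b";", 1)[0]  # drop chunk extensions
        if not re.fullmatch(rb"[0-9A-Fa-f]+", size_field):
            raise ValueError(f"invalid chunk size {size_field!r}")
        size = int(size_field, 16)
        pos = end + 2
        if size == 0:
            break
        chunk = data[pos:pos + size]
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data shorter than declared or missing CRLF")
        body += chunk
        pos += size + 2
    # This sketch does not accept trailers: only the final CRLF may follow.
    if data[pos:pos + 2] != b"\r\n":
        raise ValueError("missing final CRLF after last chunk")
    return bytes(body)
```

Round-tripping a client's body through `decode_chunked(encode_chunked(...))` is a quick self-test, and running `decode_chunked` on the bytes a client actually sent (for example, as seen in a trace) shows where its framing goes wrong.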

TCP connection resets on AIX

AIX 6.1 and later have a denial-of-service prevention feature that can cause a TCP RST (reset) just after the TCP handshake completes. Check the current values of these network options:

/usr/sbin/no -o tcptr_enable
/usr/sbin/no -o tcp_rand_port

This is most likely to be a problem when the client connections are short-lived and come from only a handful of IP addresses.

Client SSL handshake failures with client authentication and large keystores

See ssl_questions#longhandshake

SSL is slow in IHS 8.0/8.5 on AIX or Linux/PPC

See gather_highcpu_doc.html#GSKITICC_HIGHCPU

Internet Explorer errors after upgrading IHS 7.0 to a later release

See gather_certificate_doc.html#TLS12IE