FAQ: WAS Plug-in

Interactions with Apache

Does the WAS Webserver Plug-in support Apache 2.4?

Yes, the WAS Webserver Plug-in supports Apache 2.4 in 8.5.5.2/8.0.0.9 and later (PI06036).

How can I log my WebSphere-based authentication in the IHS access log?

If WebSphere is configured to use HTTP Basic Authentication, IHS can only log the userid and password together in base64 encoded form. This is accomplished by adding %{Authorization}i to your LogFormat directive.
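Because base64 is an encoding rather than encryption, anything logged with %{Authorization}i can be turned straight back into the userid and password. The Python sketch below shows this for a Basic credential. The function name is hypothetical, and it assumes the credentials are UTF-8 encoded. Access logs that contain this field should be treated as sensitive.

```python
import base64


def decode_basic_auth(header_value):
    """Split a logged 'Authorization: Basic ...' value into (userid, password)."""
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not HTTP Basic credentials")
    # Basic credentials are base64("userid:password"); split on the FIRST colon.
    userid, sep, password = base64.b64decode(token).decode("utf-8").partition(":")
    if not sep:
        raise ValueError("malformed Basic credentials")
    return userid, password
```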

If WebSphere is configured to use form-based authentication, IHS cannot log a username. As an exception, if the application code itself sets a cookie or HTTP header based on the logged in userid, this cookie or header can be logged by IHS.

Example LogFormat additions for logging of incoming cookies or response headers (full information is available in the documentation for mod_log_config).

  • %{cookiename}C (note that this cookie would only be available on requests that come after the login, not during the login itself)

  • %{headername}o

What about mod_dir, mod_rewrite, and the WebSphere plug-in? (IHS 2.0 and above)

What is the priority of the following directives?

  • mod_dir(dir_module)

  • mod_rewrite(rewrite_module)

  • WebSpherePlugin(ibm_app_server_http_module)

mod_dir only handles objects which can be served by IHS as static files, so it cannot be used to redirect requests to WebSphere. mod_dir has the lowest priority of these modules, and the priority cannot be changed. It will only try to handle a request if the request was for a directory and no other module has decided to serve the request.

With IHS 2.0, mod_rewrite always takes precedence over the WebSphere plug-in and, with the proper configuration, mod_rewrite can first rewrite URLs and then the WebSphere plug-in can see the rewritten URL and decide whether or not to serve it.

Example: A customer wants to use mod_rewrite to change URL /home to /servlet/home, and has configured the WebSphere plug-in to handle /servlet/*.

Here is a mod_rewrite directive to map /home to /servlet/home, and at the same time pass it through to the WebSphere plug-in to allow it to see the rewritten URL. The PT flag on the RewriteRule is what allows the WebSphere plug-in to process the rewritten URL.

    RewriteRule ^/home /servlet/home [PT]
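The interplay can be modeled in a few lines of Python. This is a rough sketch, not the plug-in's real matching logic: the function names are hypothetical, and fnmatch only approximates the wildcard matching of the <Uri Name="/servlet/*"> entry assumed here. It shows that the rewritten path, not the original one, is what the plug-in gets to evaluate.

```python
import re
from fnmatch import fnmatchcase

# Mirrors: RewriteRule ^/home /servlet/home [PT]
HOME_RULE = re.compile(r"^/home")


def rewrite(url_path):
    """When the pattern matches, mod_rewrite replaces the whole URL-path."""
    return "/servlet/home" if HOME_RULE.search(url_path) else url_path


def plugin_handles(url_path, uri_patterns=("/servlet/*",)):
    """Rough stand-in for the plug-in's wildcard <Uri Name="..."> matching."""
    return any(fnmatchcase(url_path, pattern) for pattern in uri_patterns)


for path in ("/home", "/homepage", "/index.html"):
    after = rewrite(path)
    print(path, "->", after, "| plug-in handles:", plugin_handles(after))
```

Note that ^/home is anchored only at the start, so /homepage is rewritten too; a tighter pattern such as ^/home$ may be preferable.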

Note: In IHS 1.3, the actual order of the LoadModule or AddModule directives also makes a difference. The LoadModule or AddModule for mod_rewrite needs to come after the WebSphere plug-in is activated to allow mod_rewrite to rewrite URLs and then have the WebSphere plug-in process the rewritten URL. This is not the case with IHS 2.0, where mod_rewrite always takes precedence.

How does mod_cache interact with the WebSphere Plugin?

mod_cache can cache content generated by the WebSphere Plugin if the response has the appropriate HTTP headers; however, this cache does not interact with the Plugin ESI cache. When mod_cache serves content generated by the WebSphere Plugin from its cache, you will not see evidence of the WebSphere Plugin being called for the cached request.

How can I change requests to affect how the plugin handles them?

Changing or hiding a URL from the WAS Plug-in is documented extensively on this page.

To hide a URL from the WAS Plug-in, set the per-request environment variable "skipwas" to any value:

    SetEnvIf Request_URI ^/+app1/+BlockMe skipwas=1

Can the http_plugin.log be rotated?

Yes, by writing logs to an external program such as the rotatelogs binary provided by IHS/Apache.

Further details in APAR PI16910: http://www-01.ibm.com/support/docview.wss?uid=swg1PI16910

Example syntax:

    <!-- the path to the program is relative to the IHS ServerRoot -->
    <Log LogLevel="Error" Name="|bin/rotatelogs /opt/IBM/HTTPServer_Plugins/logs/webserver1/http_plugin.log 100M"/>

Note: The rotatelogs program does not directly support archiving or deleting old logs. The intent of rotatelogs is to allow external mechanisms to archive or delete logs that are no longer actively in use by the server. The logrotate package on Linux is an example of a tool that takes further action on rotated log files. Simple archive/compress/delete can be accomplished by passing a custom script to the rotatelogs -p option.
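As an illustration of the -p approach, here is a minimal Python hook that gzips the file rotatelogs has just closed. It assumes python3 on the webserver host, an IHS level whose rotatelogs supports -p, and the Apache 2.4 behavior of passing the newly opened file as the first argument and, after a rotation, the previous file as the second. The script name compress_previous.py is hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical post-rotation hook for rotatelogs -p."""
import gzip
import os
import shutil
import sys


def compress_previous(argv):
    """Gzip the log file rotatelogs just closed, if one was passed.

    argv[1] is the newly opened log file; argv[2] (present only after a
    rotation) is the previous file. Returns the archive path or None.
    """
    if len(argv) < 3:
        return None  # first open after startup: nothing to archive yet
    previous = argv[2]
    if not os.path.isfile(previous):
        return None
    archived = previous + ".gz"
    with open(previous, "rb") as src, gzip.open(archived, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(previous)  # delete only after the compressed copy is written
    return archived


if __name__ == "__main__":
    compress_previous(sys.argv)
```

It could then be referenced as |bin/rotatelogs -p /path/to/compress_previous.py ... in the Log element's Name attribute. Deleting old archives is deliberately left to another mechanism.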

Retries, timeouts, markdowns, etc.

When will the WAS Plug-in retry a request?

A number of conditions prevent a request from being retried by the WAS Plug-in.

  • If the retry is due to ServerIOTimeout being exceeded, the value of ServerIOTimeoutRetry limits the number of retries.

  • If the HTTP request method is POST and the request body does not fit in the PostBufferSize. Note: PostBufferSize defaulted to 64k before 8.5.0.0; zero is the recommended value.

  • If the application server responds with a non-503 HTTP response.

Provided none of the above conditions are met, the following conditions result in a retry:

  • If the WAS plugin fails to obtain a TCP connection, or fails to complete an SSL handshake, the request is retried on another server.

    Note: Not a retry in the strictest sense, since no HTTP request is written in this case. A failure here always results in a markdown.

  • If the plugin fails to read (EOF or reset connection) from an existing (pooled) connection, the request will be retried on the same server.

  • If the plugin waits longer than ServerIOTimeout seconds for I/O with the application server to complete, the request is retried.

    • The server will be marked down if the ServerIOTimeout setting is a negative value.

    • If the request had affinity, and the server is not marked down as a result of the timeout or other activity in the webserver process, the retry will occur on the same server.

  • If the application server responds with a 503 HTTP status code, the request is retried, subject to the same affinity-related caveats as in the ServerIOTimeout bullet above. Whether the server returning the 503 is marked down depends on the custom property MarkBusyDown.
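The rules above can be condensed into a small decision function. This Python sketch is a simplified model for reasoning about logs, not the plug-in's implementation; the Failure fields and the treatment of an unset ServerIOTimeoutRetry as "no limit" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Failure:
    kind: str                                   # "connect", "pooled_eof", "timeout", or "http"
    method: str = "GET"
    body_fits_post_buffer: bool = True          # did the POST body fit in PostBufferSize?
    status: Optional[int] = None                # HTTP status when kind == "http"
    timeout_retries_done: int = 0               # ServerIOTimeout retries already attempted
    timeout_retry_limit: Optional[int] = None   # ServerIOTimeoutRetry; None = no limit (assumption)


def retry_target(f: Failure) -> Optional[str]:
    """Return where a retry would go, or None when no retry happens."""
    # Conditions that prevent a retry.
    if (f.kind == "timeout" and f.timeout_retry_limit is not None
            and f.timeout_retries_done >= f.timeout_retry_limit):
        return None
    if f.method == "POST" and not f.body_fits_post_buffer:
        return None
    if f.kind == "http" and f.status != 503:
        return None
    # Conditions that cause a retry.
    if f.kind == "connect":
        return "another server"   # no request was written; always marks the server down
    if f.kind == "pooled_eof":
        return "same server"
    if f.kind in ("timeout", "http"):
        return "retry (same server only if affinity held and it was not marked down)"
    return None
```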

What happens during an unplanned application server outage?

If an application server terminates unexpectedly, several things unfold. This is largely independent of the WebSphere edition.

  • The application server's operating system closes all of its open sockets.

  • Webserver threads waiting for a response in the WAS Plug-in are notified of EOF or ECONNRESET.

  • If the error occurred on a new connection to the application server, the server is marked down in the current webserver process. It will not be retried until a configurable interval (RetryInterval) expires.

  • If the error occurred on an existing connection to the application server, the server is not marked down.

  • Retryable requests that were in-flight are retried by the WAS Plug-in, as permitted.

  • If the backend servers use memory to memory session replication (ND only), the WLM component will tell the WAS Plug-in to use a specific replacement affinity server.

  • If the backend servers use any kind of session persistence, the failover is transparent.

    • Session persistence is available in all WebSphere editions.

  • New requests, with or without affinity, are routed to the remaining servers.

  • After the RetryInterval expires, the WAS Plug-in will try to establish new connections to the server. If the server is still down, the attempt fails relatively quickly and puts the server back into the marked-down state.
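The markdown bookkeeping in these bullets can be sketched as follows. This is a hypothetical model (class and method names are invented, not plug-in APIs) capturing four points: state is kept per webserver process, only errors on new connections mark a server down, a server becomes eligible again once RetryInterval has elapsed, and a successful request marks it up.

```python
import time


class MarkdownTracker:
    """Markdown state for ONE webserver process; each process keeps its own."""

    def __init__(self, retry_interval, clock=time.monotonic):
        self.retry_interval = retry_interval  # RetryInterval, in seconds
        self.clock = clock
        self._down_since = {}                 # server name -> time of markdown

    def record_error(self, server, new_connection):
        # Only failures on a *new* connection mark the server down; an
        # EOF/reset on an existing pooled connection does not.
        if new_connection:
            self._down_since[server] = self.clock()

    def record_success(self, server):
        self._down_since.pop(server, None)    # a completed request marks it up

    def is_eligible(self, server):
        started = self._down_since.get(server)
        return started is None or self.clock() - started >= self.retry_interval
```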

What flexibility around timeouts is available?

APAR PM94198 enhances the WAS Plug-in timeouts in a few different ways, mostly limited to Apache-based webservers.

  • A timeout while waiting for the "Extended Handshake" response, "100-Continue" response, or during the SSL handshake now always results in a markdown of the server.

    • Timeouts at this stage imply a severe resource shortage on the application server.

  • The timeout used during "Extended Handshake", "100-Continue", and SSL handshakes can use the (typically shorter) configured ConnectTimeout by setting the per-request environment variable websphere-shorten-handshake (Apache only).

  • ServerIOTimeout can be overridden by setting the per-request environment variable websphere-serveriotimeout (Apache only).

  • Apache-based webservers can override ServerIOTimeoutRetry by setting the per-request environment variable websphere-serveriotimeoutretry (Apache only).

Note that per-request environment variables can be set conditionally in Apache with the SetEnvIf or RewriteRule directives.

Examples:

    SetEnvIf Request_URI "^/app1/" websphere-serveriotimeout=10
    SetEnvIf Request_URI "^/app1/SlowServlet" websphere-serveriotimeout=60
    SetEnvIf Request_URI "^/app1/" websphere-serveriotimeoutretry=2
    SetEnvIf Request_URI "^/app1/" websphere-shorten-handshake=1
    SetEnvIf Request_URI "\.(jpe?g|gif|css|js)$" skipwas=1
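Because both websphere-serveriotimeout lines match /app1/SlowServlet, their order matters. Assuming SetEnvIf directives are applied in configuration order with later matches overwriting earlier values (our understanding of mod_setenvif), this Python sketch evaluates the example rules for a given URI. The function name is hypothetical.

```python
import re

# The SetEnvIf examples above, in configuration order: (Request_URI regex, variable, value)
SETENVIF_RULES = [
    (r"^/app1/", "websphere-serveriotimeout", "10"),
    (r"^/app1/SlowServlet", "websphere-serveriotimeout", "60"),
    (r"^/app1/", "websphere-serveriotimeoutretry", "2"),
    (r"^/app1/", "websphere-shorten-handshake", "1"),
    (r"\.(jpe?g|gif|css|js)$", "skipwas", "1"),
]


def request_env(uri):
    """Apply the rules in order; a later match overwrites an earlier value."""
    env = {}
    for pattern, name, value in SETENVIF_RULES:
        if re.search(pattern, uri):  # SetEnvIf regexes are case-sensitive
            env[name] = value
    return env
```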

Why might a server appear to be used or marked up before RetryInterval has expired?

There are a handful of reasons a server may appear to be used or marked up before RetryInterval has expired:

  1. The server may be used by a different webserver process than the one in which the markdown occurred; markdown state is kept per process.

  2. The server may have been selected prior to the markdown. If the request completes successfully, the server will additionally be marked up.

  3. The config may have been reloaded, or the webserver restarted, subsequent to the markdown.

Backend connection management

Can I disable SSL for backend connections?

In 9.0.0.4 and later, PI77874 allows the Plugin to use plain HTTP (non-SSL) backend connections for inbound HTTPS requests. This is not available in V8R5. See http://www-01.ibm.com/support/docview.wss?uid=swg1PI77874

Why does it sometimes take 1-2x the ServerIOTimeout setting to report 'ServerIOTimeout fired'?

If the plugin is waiting for I/O when the webserver is sent a signal, such as during a webserver stop or when a child process exits because of MaxRequestsPerChild or MaxSpareThreads, the poll() system call that waits for I/O with the specified timeout is restarted. The plugin may already have waited for part of ServerIOTimeout, and it then waits for another full ServerIOTimeout.
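A worked version of this arithmetic, as a hypothetical Python helper: with one interruption t seconds into the wait (t below ServerIOTimeout), the message appears after t + ServerIOTimeout seconds, which is between 1x and 2x the setting.

```python
def observed_wait(server_io_timeout, interruptions=()):
    """Seconds until 'ServerIOTimeout fired' is reported.

    interruptions: seconds into the *current* wait at which a signal
    restarts poll(); each restart waits a full server_io_timeout again.
    """
    total = 0.0
    for elapsed in interruptions:
        if elapsed >= server_io_timeout:
            break  # the timeout had already fired before this signal arrived
        total += elapsed
    return total + server_io_timeout
```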

What's the story with 100-Continue?

  • The Plug-in forwards an Expect: 100-Continue header if the client sent one.

  • It adds the header if the request has a body.

  • It adds the header to requests without a body only if WaitForContinue="true" (non-default) is set for the selected server. See ExtendedHandshake for a related check that is done pre-flight on new connections only. Note that Liberty does not appear to support the extended handshake.
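The three rules above, written as a Python truth function. This is a simplified sketch: the parameter names are hypothetical and the ExtendedHandshake pre-flight is not modeled.

```python
def sends_expect_100(client_sent_expect, has_body, wait_for_continue=False):
    """Whether the request to the application server carries Expect: 100-Continue."""
    if client_sent_expect:
        return True           # forwarded from the client
    if has_body:
        return True           # added whenever there is a request body
    return wait_for_continue  # bodyless requests only with WaitForContinue="true"
```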

How does MaxConnections work?

MaxConnections limits the number of connections the WAS Plug-in will open to a single application server from a single webserver child process. In practice, the per-process limitation severely limits the ability to pick a useful number.

  • Crossing MaxConnections does not result in a markdown.

  • MaxConnections applies even to affinity requests.

  • It is usually better to drastically reduce the TCP listen backlog in the application server and reject excess workload that way.
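A quick calculation shows why a useful per-process MaxConnections is hard to pick. This sketch assumes a threaded MPM where the maximum number of child processes is MaxClients / ThreadsPerChild, rounded up; the function names and tuning values are hypothetical. The total ceiling toward one application server scales with the number of running processes, which changes with load, while each individual process can still hit its own cap.

```python
import math


def max_child_processes(max_clients, threads_per_child):
    """Largest number of child processes a threaded MPM will run (assumption)."""
    return math.ceil(max_clients / threads_per_child)


def connection_ceiling(max_connections, running_processes):
    """Most connections one webserver instance can open to ONE application server."""
    return max_connections * running_processes


# Hypothetical tuning: MaxClients 600, ThreadsPerChild 25, MaxConnections 20
busiest = connection_ceiling(20, max_child_processes(600, 25))  # 24 processes -> 480
quiet = connection_ceiling(20, 2)                                # 2 processes  -> 40
print(busiest, quiet)
```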

How do I debug a "connection refused" error in the http_plugin.log?

  1. Verify the hostname and port in the message are correct for your JVM.

  2. Verify that the JVM was running at the time of the error; if it wasn't, the error is expected.

  3. Verify that the JVM had idle web container threads at the time of the message.

  4. Verify a command-line client can access the hostname and port from the webserver system.

    1. If this fails, verify there is no firewall between IHS and WAS and no local firewall rules that block access.

  5. If multiple IP addresses are associated with the transport hostname, verify that all of them lead to the WAS host and that the application server listens for connections on all of them.
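For steps 4 and 5, a short Python script can stand in for a command-line client such as telnet or curl. This is a hypothetical helper: it resolves the transport hostname, attempts a TCP connection to every returned address, and reports each outcome. It tests only TCP reachability, not whether web container threads are available.

```python
import socket
import sys


def probe_transport(host, port, timeout=5.0):
    """Try a TCP connect to every address `host` resolves to.

    Returns {address: "ok" or an error description}.
    """
    results = {}
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        address = sockaddr[0]
        try:
            with socket.socket(family, socktype, proto) as s:
                s.settimeout(timeout)
                s.connect(sockaddr)
            results[address] = "ok"
        except OSError as exc:  # refused, timed out, unreachable, ...
            results[address] = type(exc).__name__ + ": " + str(exc)
    return results


if __name__ == "__main__":
    for addr, outcome in probe_transport(sys.argv[1], int(sys.argv[2])).items():
        print(addr, outcome)
```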

Misc FAQs

Why are my OSGi context roots not present in plugin-cfg.xml?

OSGi applications need to be mapped (targeted) to a webserver just like a normal web module; however, the WAS admin console interface for this is slightly different.

Why might the plugin not reload plugin-cfg.xml on the fly / require a restart?

  • No request has arrived after the RefreshInterval elapsed, so the plugin has not yet checked for a changed plugin-cfg.xml.

  • When "IM for WebServers" or the "ODRLIB" is enabled, dynamic reloading is disabled.

  • In Apache-based servers, reloads of plugin-cfg.xml happen at request-processing time after the webserver has switched to an unprivileged userid. Ensure the full path to plugin-cfg.xml is readable by this ID.
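One way to check the last bullet is to run a small script as the webserver's unprivileged User, for example with sudo -u. This hypothetical Python helper reports any ancestor directory lacking search permission and whether the file itself is readable. os.access() checks the real user ID, which is why it must run as that user.

```python
import os
import sys


def unreadable_components(path):
    """List the parts of `path` the current user cannot traverse or read."""
    path = os.path.abspath(path)
    problems = []
    directory = os.path.dirname(path)
    while True:
        # Every ancestor directory needs search (execute) permission.
        if not os.access(directory, os.X_OK):
            problems.append(directory)
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the filesystem root
            break
        directory = parent
    # The file itself needs read permission.
    if not os.access(path, os.R_OK):
        problems.append(path)
    return problems


if __name__ == "__main__":
    bad = unreadable_components(sys.argv[1])
    print("OK" if not bad else "Not accessible: " + ", ".join(bad))
```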

Does IHS support websockets? Does the WebSphere Plug-in support websockets?

General

  • The WAS Plug-in supports websockets as of 8.5.5.3 and 9.0.0.0 and all later releases/versions (PI17642). This support applies only to Apache-based webservers. WebSocket connections currently have a unique requirement: the backend transport's host and port need to be specified as host aliases in the virtual host the websocket application is mapped to.

  • IBM HTTP Server does not include any other websockets-related modules (such as mod_proxy_wstunnel).

  • mod_proxy_wstunnel in vanilla Apache 2.4 should interoperate with WebSphere Application Server websockets applications, but no support is provided by IBM for generic HTTP proxy servers.

Operational concerns

Using websockets presents some special concerns for Apache-based servers running the WAS WebServer Plug-in. Each established websocket connection ties up a webserver processing thread for the duration of the websocket connection.

The primary ways that a websocket connection terminates are:

  • Client or server initiated close

  • Idle timeout enforced by the WAS Plug-in

  • Restart (non-graceful) of the Apache-based server instance

  • Crash or other abnormal termination of the Apache-based webserver process

Beyond scalability concerns, perpetually open websocket connections will block the graceful termination of Apache processes. Graceful terminations occur when MaxSpareThreads is less than MaxClients, when MaxRequestsPerChild is non-zero, and when the administrator performs a "graceful restart" of the server. These very same features are often used to limit the memory usage of Apache-based servers, but when termination is delayed they have a counter-productive effect, because the exiting processes linger while replacement processes are started.

This leads to the following general guidance:

  • WebSocket clients should recover (reconnect) in the event of TCP closure or other error.

  • WebSocket backend applications should accommodate clients that reconnect on error.

  • If the client/server implement a keepalive mechanism, avoid webserver features that trigger graceful process termination:

    • set MaxSpareThreads to the same value as MaxClients

    • ensure MaxRequestsPerChild is 0

    • avoid apachectl graceful in these environments, as it will leave processes from the prior generation that need to be forcibly killed.

  • If using z/OS or 9.0 on Linux, set ServerLimit to a value larger than MaxClients/ThreadsPerChild.

    • This ensures that exiting processes do not occupy the process slots needed by replacement processes.
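The guidance above can be turned into a quick configuration check. This is a hypothetical Python sketch that takes the MPM directive values as a dictionary; as noted above, the ServerLimit check only applies to z/OS and 9.0 on Linux, and the first two checks matter when the client and server implement a keepalive mechanism.

```python
import math


def websocket_mpm_warnings(cfg):
    """Compare MPM directive values against the websocket guidance.

    cfg: dict with MaxClients, ThreadsPerChild, MaxSpareThreads,
    MaxRequestsPerChild, and optionally ServerLimit.
    """
    warnings = []
    if cfg["MaxSpareThreads"] < cfg["MaxClients"]:
        warnings.append("MaxSpareThreads < MaxClients: idle processes will be gracefully stopped")
    if cfg["MaxRequestsPerChild"] != 0:
        warnings.append("MaxRequestsPerChild is non-zero: processes will be gracefully recycled")
    needed = math.ceil(cfg["MaxClients"] / cfg["ThreadsPerChild"])
    limit = cfg.get("ServerLimit")
    if limit is not None and limit <= needed:
        warnings.append(f"ServerLimit {limit} leaves no room beyond the {needed} "
                        "processes MaxClients/ThreadsPerChild requires")
    return warnings
```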

What are known limitations in large POST requests?

  • Before 9.0.0.0, the WAS Plug-in can't forward request bodies that specify a Content-Length of 2GB or greater.

  • Before 9.0.0.0, the WAS Plug-in can still forward a POST body larger than 2GB if it is sent with Transfer-Encoding: chunked.

Note that only WebSphere Liberty with servlet-3.1 can parse requests with a 2GB or greater Content-Length header.

What are known ESI limitations?

The ESI processor has a number of known limitations:

  • Responses are buffered before being sent to the client.

  • Non-text content types are scanned for ESI includes.

What maintenance do I need to see milliseconds in the http_plugin.log?

PM76364 (6.1.0.47, 7.0.0.28, 8.0.0.6, 8.5.0.2) adds milliseconds on z/OS and unix platforms.

Session Management questions

What controls whether the Plug-in switches back after a failover?

By default, affinity switches back when the original affinity server comes back. This switch back occurs because new clone IDs are appended to the session cookie, and the Plugin tries them left to right. WAS session management can prepend the new clone ID instead, causing requests NOT to switch back. This is recommended when any persistence with time-based writes is used. See the property "NoAffinitySwitchBack" in topic "rprs_custom_properties".
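The switch-back behavior can be simulated with a simplified model of the cookie value: a session ID followed by ':'-separated clone IDs. The real JSESSIONID has more structure than this (for example a cache ID prefix), and the function names are hypothetical; the sketch only shows why appending switches back and prepending does not.

```python
def clone_ids(cookie_value):
    """Clone IDs follow the first ':' and are themselves ':'-separated."""
    _session, _, clones = cookie_value.partition(":")
    return [c for c in clones.split(":") if c]


def add_clone(cookie_value, clone, prepend=False):
    """Record the clone that served the request: append (default) or prepend."""
    session = cookie_value.partition(":")[0]
    clones = clone_ids(cookie_value)
    if prepend:
        clones = [clone] + [c for c in clones if c != clone]
    elif clone not in clones:
        clones.append(clone)
    return ":".join([session] + clones)


def pick_affinity_server(cookie_value, servers_up):
    """The plug-in tries the clone IDs left to right."""
    for clone in clone_ids(cookie_value):
        if clone in servers_up:
            return clone
    return None  # no affinity server available: normal load balancing (not modeled)


cookie = "0000abcDEF:clone1"                             # clone1 then fails
appended = add_clone(cookie, "clone2")                   # default behavior
prepended = add_clone(cookie, "clone2", prepend=True)    # NoAffinitySwitchBack-style
print(pick_affinity_server(appended, {"clone1", "clone2"}),
      pick_affinity_server(prepended, {"clone1", "clone2"}))
```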

How can I avoid having the JSESSIONID cookie blown away when going between different applications?

By default, a cookie path of '/' is specified and sessions are unique to each application. The JSESSIONID, as opposed to the session itself, can be shared by setting the Session management custom property "HttpSessionIdReuse".

When you access the second application, it will add its clone ID (according to "NoAffinitySwitchBack") but will not change the rest of the session cookie.

How can IHS ignore URL parameters inserted by WebSphere URL session rewriting?

IBM HTTP Server treats the URL rewriting as part of the filesystem path of static resources being requested. mod_rewrite can be used to remove this information from URLs, but care has to be taken to only change requests that really will not be sent to WebSphere.

The first method involves putting the rewrite rules in existing <Directory> containers, because these will never affect WebSphere requests.

  • Enable mod_rewrite by uncommenting the LoadModule rewrite_module... line in httpd.conf.

  • Find the <Directory> container for your DocumentRoot and for any Alias you've added that will be requested with a URL-rewritten session ID.

  • Inside each existing <Directory> container for your DocumentRoot and Alias directives, add this ruleset, substituting the RewriteBase parameter as dictated by the URL-path of the <Directory> context being updated:

    # Note: this must be in an existing <Directory> container 
    RewriteEngine on
    # The RewriteBase is "/" in the case that this <Directory> block is for the DocumentRoot
    # If this <Directory> block is for an Alias, the RewriteBase below
    #   Should be the same as the first argument to the Alias directive.
    RewriteBase /
    RewriteRule (.*); /$1 [PT]
    

The second method puts the mod_rewrite rules in <VirtualHost> context, which simplifies configuration in one way but complicates it in that the user-supplied pattern must be used to restrict the rewrite to static (local IHS) content. This effectively needs to mimic the WebSphere Plugin processing to determine which URLs to remove the URL-rewritten session info from.

  • Enable mod_rewrite by uncommenting the LoadModule rewrite_module... line in httpd.conf.

  • Determine what prefix or filename extension you want to limit the rewriting to, and express it in a Perl Compatible Regular Expression (PCRE).

  • Inside each existing <VirtualHost> container, and once at the bottom of httpd.conf, add this ruleset, substituting prefixes and file extensions as your requirements dictate.

    RewriteEngine on
    RewriteRule (.*\.(?:gif|jpg|css)); $1 [PT]
    RewriteRule (/someprefix/.*\.(?:txt|pdf)); $1 [PT]
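Before deploying patterns like these, it can help to test them offline. This Python sketch mirrors the two <VirtualHost> rules above (with the capture group closed in the first one). It assumes the usual mod_rewrite semantics of an unanchored match whose substitution replaces the whole URL-path, and it stops at the first match because [PT] implies [L]. Python's re module is close to, but not identical with, PCRE; the function name is hypothetical.

```python
import re

# The <VirtualHost> ruleset above, in order.
SESSION_RULES = [
    re.compile(r"(.*\.(?:gif|jpg|css));"),
    re.compile(r"(/someprefix/.*\.(?:txt|pdf));"),
]


def strip_session_info(url_path):
    """Return the URL-path a matching rule would pass through, else it unchanged."""
    for rule in SESSION_RULES:
        match = rule.search(url_path)  # mod_rewrite patterns are unanchored
        if match:
            return match.group(1)      # substitution "$1" replaces the whole path
    return url_path
```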
    

What options are there for removing a server from routing?

The primary consideration is what happens to requests in-flight, new affinity requests, and new non-affinity requests. Every approach listed below is safe for in-flight requests.

  • Performing a normal stop of the application server is the simplest and most effective way to stop it from taking traffic. During stop processing, the application server immediately closes its listening sockets and idle keepalive connections, while allowing in-flight requests up to 3 minutes to complete.

    • With closed listening sockets and keepalive connections, the Plug-in will "fail fast" when trying to route to this server and quickly mark it down.

  • If you set a server's (static/configured) weight to zero and generate and propagate the plugin-cfg.xml, the WAS Plug-in will send it only affinity requests.

    • This method of "draining" can be slow because affinity is maintained by the presence of a session cookie which by default has no timed expiration. Long-dormant browsers with expired server-side sessions may return to the webserver with the draining cloneID.

    • This change can optionally be done in plugin-cfg.xml manually.

    • This method is used by the single-cell rollout procedure discussed here

  • If you make a server a "backup" server in the cluster, it will not be used while any primary (non-backup) servers are available, even if affinity is established with the backup server. This will cause HTTP session failovers.

    • Backup servers are not load-balanced, so this is not recommended if taking multiple servers out of load balancing at the same time.

    • In the UI, the "server role" is under a path like WebSphere application server clusters > mycluster > Cluster members > member1 > Web server plug-in properties

    • In wsadmin terms:

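      # nodename, servername, and backup_or_primary must be set beforehand
      server = AdminConfig.getid('/Node:%s/Server:%s/' % (nodename, servername))
      plgProps = AdminConfig.list('WebserverPluginSettings', server)
      AdminConfig.modify(plgProps, [['Role', backup_or_primary]])
      AdminConfig.save()
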
    • This change can optionally be done in plugin-cfg.xml manually.

  • Removing (or commenting) a <Server> entry from plugin-cfg.xml will stop all subsequent requests, causing affinity requests to failover.

  • If you change an application server's cloneID and generate/propagate plugin-cfg.xml, affinity requests using the old cloneID will be re-balanced. See the HttpSessionCloneId HTTP Session custom property.

    • This change cannot be effectively done in plugin-cfg.xml manually.

  • If Intelligent Management for webservers is enabled, putting the server into one of the two maintenance modes either sends it only affinity requests, or never sends it any subsequent requests. Pending requests are permitted to complete.

  • Using a "maintenance page" in the webserver is a way to quickly bleed traffic from all application servers.

How to safely roll out updates to enterprise applications

There is a spectrum of methods used for rolling out an update. These methods generally try to address one or more of the following requirements to varying degrees:

  • Bleed sessions from servers running version N, for example by setting weights to zero.

  • Validation of the new application version in production before all users are exposed to it. Examples include:

    • Accessing the application server ports directly for validation

    • Using an additional virtual host name in the updated version, possibly in an additional cluster or cell

  • Fast rollback, for example by changing plugin-cfg.xml manually to pivot from servers running version N-1 to version N.

On z/OS, WAS supports "application rollout" as a first-class operation. Details of what happens during a rollout are available here

Network Deployment, when used with Intelligent Management (IM) enabled proxy servers, supports this via Application Editioning. An explanation of what happens during rollout is available here

Other editions of WebSphere, or Network Deployment without an IM proxy server, can achieve many of the requirements with a series of coordinated changes to WebSphere and manual changes to plugin-cfg.xml. More information is available here

A basic but straightforward solution is to use a maintenance page on the proxy server during a low-traffic window.

SSL Questions

How is the plugin-key.kdb personal certificate used?

When a webserver definition is created in traditional WAS, a plugin-key.kdb is created that trusts the deployment manager root CA. A personal certificate, signed by the same deployment manager CA, is also created and assigned the label "default". This key store can be managed in the WAS admin console and propagated to the webserver host.

In default configurations, this initial personal certificate is not used for any purpose.

In non-default configurations described below, the personal certificate may be used implicitly or further configuration might be required.

In any case where TLS mutual auth is required, including XD_AGENT ports or collective dynamic routing:

  1. The certificate label used by the Plugin can be customized by using the CertLabel property: KC Link

  2. The certificate used must be issued by a CA that is trusted by the application server.

    • If the deployment manager is removed or replaced, certificates will need to be manually created.

  3. The WAS Plug-in personal certificate can be converted or replaced via the WAS admin console. Some steps related to replacing older certificate types are documented here

TLS Mutual Authentication required in WAS

For traditional WAS, connections from the WAS Plugin will fail if TLS mutual auth is enabled without taking one of the following additional actions:

  • Using Ikeyman, gskcapicmd, or gskcmd mark the certificate as the default certificate for the keystore.

    • Note that this is different from having a label of "default".

  • Customize the CertLabel as described in the preceding section.

For Liberty, plugin-key.kdb is manually created and administered by the end user. Obtain a certificate and ensure Liberty trusts its issuer via Liberty's configured keystore or truststore.

Intelligent Management for webservers (odrlib) enabled

For traditional WAS, the automatically generated certificate with label "default" will be used by default.

For Liberty collectives, a replacement certificate with separate properties is created during dynamic routing setup.

What's the deal with SSL breakage when moving past 8.5.5.6/8.0.0.11?

After APAR PI39126 (8.5.5.7, 8.0.0.12, 9.0.0.0), the WAS WebServer Plugin uses modern defaults for SSL/TLS processing. This includes disabling legacy protocols and ciphers and enforcing stricter certificate validation. This may cause problems if WAS has been explicitly configured to use only weak/export ciphers, or has been configured with a certificate chain that does not meet contemporary standards.

In practice, problems observed after PI39126 have fit into the following certificate processing related categories (See RFC5280 for complete details):

  1. BasicConstraints extension: All certificates used to validate digital signatures (AKA issuers, signers, or CA's) must contain a BasicConstraints extension with the "criticality" field set to TRUE.

  2. CertificatePolicies extension: The CertificatePolicies extension must be RFC5280 conformant across the certificate chain. The algorithm is quite complex, but in simplified form, an intermediate signer cannot assert policies not also asserted by its own signer.

    • A special case of this is certificate chains that list both specific policies and the special AnyPolicy policy. These are incorrectly rejected by GSKit prior to 8.0.55.10.

The certificate validation changes in PI39126 can be disabled, after PI49893 (8.0.0.12, 8.5.5.8, 9.0.0.0) by setting the WAS Plugin custom property certificate_validation_strict_rfc5280="false".

The default of the property above is flipped in PH27968 (8.5.5.18, 9.0.5.6) so that the checking is more tolerant by default.

What SSL/TLS ciphers/protocols does the Plug-in support?

If the application server and the WAS Plugin do not share any TLS ciphers or protocols, http_plugin.log will typically log a message with GSK_ERROR_SOCKET_CLOSED(420) and PARTNER CERTIFICATE: DN=[No Information Available].

Protocols

  • TLS 1.2 has been supported and enabled by default on all platforms since 8.5.5.10/9.0.0.0.

  • TLS 1.3 is supported, but not enabled by default, on all platforms in 9.0.5.2 and later. For more information see https://www.ibm.com/support/pages/apar/PH17128

Ciphers

  • AIX, Linux, Windows, HP-UX, Solaris:

    • In 9.0.5.2 and later, WAS plugin uses all available FIPS-approved ciphers by default.

      • With AutoSecurity="False", the WAS plugin uses a small subset of weak and moderate historical ciphers.

      • StrictSecurity has no effect.

    • In 8.5.5.7 (and later) and 9.0.0.0-9.0.5.1, the WAS plugin uses all available FIPS-approved ciphers by default.

      • If StrictSecurity="True" is set in plugin-cfg.xml, only TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256(C027) and TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256(C02F) are used.

      • With AutoSecurity="False", the WAS plugin uses a small subset of weak and moderate historical ciphers.

  • z/OS:

    • In 9.0.5.2 and later, the WAS Plugin uses a broad set of ciphers by default, including GCM- and ECDSA-based ciphers.

      • It is possible to override the list of ciphers by setting the following native environment variables:

        export WS_PLG_NO_CIPHERS_EXPANDED=1
        export WS_PLG_GSK_V3_CIPHERS_CHAR4=1
        # Multiple can be listed, see https://www.ibm.com/docs/en/zos/2.4.0?topic=programming-cipher-suite-definitions#csdcwh__telcsd for identifiers
        export GSK_V3_CIPHER_SPECS_EXPANDED=C030
        
      • If StrictSecurity has been explicitly set to False, the WAS plugin uses a small subset of weak and moderate historical ciphers.

    • In 9.0.0.0-9.0.5.1, only TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256(C027) and TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256(C02F) are used by default.

      • If StrictSecurity has been explicitly set to False, the WAS plugin uses a small subset of weak and moderate historical ciphers.

    • In 8.5, the WAS plugin uses a small subset of weak and moderate historical ciphers by default

      • If StrictSecurity has been explicitly set to True, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256(C027) and TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256(C02F) are used.

What does GSK_ERROR_BAD_CERT (gsk rc = 414) mean?

GSK_ERROR_BAD_CERT has multiple causes/solutions, listed below in order of frequency:

CA not trusted by plugin-key.kdb

The http_plugin.log will record a GSK_ERROR_BAD_CERT error when plugin-key.kdb doesn't trust the root issuer of the application server's certificates.

In typical configurations where WAS issues certificates for the servers in the cell, it's only necessary to do a one-time copy of plugin-key.kdb from the WAS admin console. Use the button in the console under Webservers > webserver1 > Plug-in properties:

[Image: the "copy to webserver keystore directory" button on the Plug-in properties page]

If self-signed certificates are replaced with certificates issued by some internal or public CA, the deployment manager's copy of plugin-key.kdb will need to be updated with the new root CA (intermediates are not necessary).

  • If the webserver is used with WebSphere Liberty, or used with traditional WAS but not managed by the WAS admin console, the plugin-key.kdb will have to be modified manually on the webserver system:

    1. Obtain the new CA certificate that issued your application server certificates and store it on the webserver (e.g. /tmp/ca.cer)

    2. Find the path to plugin-key.kdb by checking plugin-cfg.xml

    3. Add the trusted CA to plugin-key.kdb: <IHSHOME>/bin/gskcapicmd -cert -add -db </path/to/>plugin-key.kdb -stashed -file /tmp/ca.cer -label new-ca

    4. Restart IHS.

  • If the webserver is managed by WAS, the issuer certificate is maintained in a keystore in the deployment manager, which must be synchronized with managed or unmanaged webservers via the admin console, wsadmin, or manually.

    1. Obtain the new CA certificate and upload it to a temporary path on the dmgr (e.g. /tmp/ca.cer)

    2. Open the WAS admin console and navigate to Webservers > webserver1 > Plug-in properties

    3. Click 'Manage keys and certificates'

    4. Click 'Signer Certificates', then 'add' to add your updated certificate authority. Click 'OK' and save any changes.

    5. Propagate the plugin-key.kdb to your webserver using the instructions/image in the preceding section.

    6. Restart IHS.

Alternatives to the "copy to webserver keystore directory" button:

  • Using wsadmin:

      # Invokes the propagateKeyring operation on the PluginCfgGenerator MBean
      def propagateKeyring(configRoot, cellName, nodeName, webServerName):
          objNameString = AdminControl.completeObjectName('WebSphere:name=PluginCfgGenerator,*')
          verb = 'propagateKeyring'
          args = '[' + configRoot + ' ' + cellName + ' ' + nodeName + ' ' + webServerName + ']'
          types = '[java.lang.String java.lang.String java.lang.String java.lang.String]'
          print AdminControl.invoke(objNameString, verb, args, types)

  • Manually: Find plugin-key.* under your webserver's directory in the deployment manager's configuration repository and copy it to the path identified in plugin-cfg.xml.

Other causes of GSK_ERROR_BAD_CERT

  • On very old levels, or in configurations with "AutoSecurity" set to False: if you enable strict SP800-131a support in your backend application server, you must also configure the StrictSecurity WAS Plug-in custom property. Instead, it is strongly recommended to upgrade (PI39126: 8.5.5.7, 9.0.0.0) and set "AutoSecurity" to True.

  • Other generic invalid certificate processing errors that need further investigation. Some may be resolved by PH27968 (8.5.5.18, 9.0.5.6).

Other more obscure GSKit errors

  • GSK_ERROR_ASN at startup is usually caused by a corrupt plugin-key.rdb. If no CSR is outstanding, just delete the file.

Intelligent Management for webservers (ODRLIB)

Where can I learn more about IM for webservers, plug-in based ODR?

The best references for now are this topic and its children, combined with the existing IM information and this EA module.

How do traditional configuration elements work once IM is enabled?

Most config elements specified globally in plugin-cfg.xml still apply.

Configuration at the Server or ServerCluster scope in plugin-cfg.xml is not used directly once IM is enabled. Instead, the same source of those values (the server configuration in the cell, or values set in <pluginConfiguration> in a collective) provides that information to the IM-enabled webserver dynamically at runtime.

Where timeout related settings can be overridden in httpd.conf, they take precedence over the dynamic IM-enabled values.

  • Examples of server-level options that are handled dynamically: connectTimeout, serverIOTimeout, waitForContinue, wsServerIdleTimeout, wsServerIOTimeout

    • With Liberty, the server-level settings are read from <pluginConfiguration> in server.xml.

  • Example of a cluster-level option that is handled dynamically: ServerIOTimeoutRetry

  • In traditional WebSphere, server weights are handled dynamically. With Liberty, the load balancing algorithm does not use weights.

When I enable global security, IM is no longer able to handshake with WAS for the control/REST connection

There is a limitation that prevents the IM-enabled Plugin from using the "default keystore certificate" to communicate with the XD_AGENT port when that port requires TLS client authentication.

Here is what works:

What are the firewall requirements/issues around "Intelligent Management for WebServers"?

In original Intelligent Management topologies where IHS uses a static plugin-cfg.xml generated by the On Demand Router (ODR), IHS only needs to be able to access the ODR HTTP/HTTPS ports.
The ODR needs access to all of the individual application servers on their web container ports AND the XD_AGENT_PORT on the node agents and dmgrs.

In topologies with an IM-enabled IHS (i.e. using ODRLIB), the IM-enabled IHS needs to be able to talk to all of the individual application servers on their web container ports AND the XD_AGENT_PORT on the node agents and dmgrs. There are several ways to manage this requirement:

  • Permit the traffic between the IM-enabled IHS and the cell.

  • Use a dedicated proxy server (such as WebSphere DataPower) in the DMZ and move the IM-enabled IHS into a network segment where the traffic will be permitted.

  • Use a statically configured plugin-cfg.xml in the DMZ that routes to an IM-enabled IHS deeper in the network where the traffic will be permitted (simulating the IHS + Java ODR topology described above).
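Before choosing among these options, it can help to check from the IHS host which of the required flows are already permitted. This Python sketch is illustrative only: audit_endpoints is a hypothetical helper, and the endpoint list (application server hosts and web container ports, node agent and dmgr XD_AGENT_PORT values) must be filled in from your own cell. It tests TCP reachability only.

```python
import socket


def audit_endpoints(endpoints, timeout=3.0):
    """Attempt a TCP connect to each (label, host, port) endpoint.

    Returns {label: exception class name} for every endpoint that could not be
    reached. A completed handshake only shows the network path is open, not
    that the service behind the port is healthy.
    """
    failures = {}
    for label, host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # handshake completed; the firewall permits this flow
        except OSError as exc:
            failures[label] = type(exc).__name__
    return failures
```

An empty result means every listed flow is open; any entry names a flow that still needs a firewall change or one of the topology alternatives above.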

Logging FAQs

Can I rotate the http_plugin.log?

PI16910 and later allows http_plugin.log to be rotated if running in an Apache-based server.

Liberty-specific questions

How can the liberty-generated plugin-cfg.xml be customized?

Some elements of plugin-cfg.xml can be specified in server.xml in the <pluginConfiguration> element. See the Knowledge Center for details.

Attributes of the top-level <Config> element of plugin-cfg.xml can be added or changed with the extraConfigProperties element:

<pluginConfiguration pluginInstallRoot="/opt/PLG">  
  <extraConfigProperties certificate_validation_strict_rfc5280="false" IgnoreDNSFailures="true"/>  
</pluginConfiguration>  
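After the server regenerates plugin-cfg.xml, you can confirm that the extraConfigProperties values reached the top-level <Config> element. A minimal Python sketch; config_attributes is a hypothetical helper, and the attribute names to check are the ones from the example above.

```python
import xml.etree.ElementTree as ET


def config_attributes(plugin_cfg_text, names):
    """Return {name: value} for the requested attributes of the root <Config> element.

    A value of None means the attribute is absent from the generated file.
    """
    root = ET.fromstring(plugin_cfg_text)
    if root.tag != "Config":
        raise ValueError("expected <Config> as the root element, got <%s>" % root.tag)
    return {name: root.get(name) for name in names}
```

Pass it the bytes of the generated file (Path(...).read_bytes()) and the list of attribute names you set in server.xml.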

What options are available for plugin-cfg.xml generation in Liberty?

Historically, plugin-cfg.xml generation was problematic under Liberty, but it improved greatly in late 2016.

  • In 16.0.0.3 and later, individual Liberty servers automatically generate a plugin-cfg.xml at startup and when new applications are loaded. Just copy it into place.

  • The collective controller provides a ClusterManager MBean that can be used to generate a plugin-cfg.xml for a configured Liberty cluster. The script bin/pluginUtility can call this bean from the controller or an application server.

  • Servers with the collectiveMember or clusterMember feature can use IM-enabled WAS WebServer Plug-in, in which case only collective controller metadata lives in plugin-cfg.xml. More information is available here: Setting up Dynamic Routing for Liberty collectives.

  • Historical interest only: Individual/standalone Liberty application servers expose an MBean over JMX that writes a plugin-cfg.xml out to the local filesystem.

    • jconsole, which is included with Java development kits, can be used to invoke the MBean.

More information about the WAS Plugin on Liberty is available here: https://www.ibm.com/support/knowledgecenter/SSAW57_liberty/com.ibm.websphere.wlp.nd.multiplatform.doc/ae/twlp_admin_conf_webserver_plugin.html

What options are available for plugin-cfg.xml merging in Liberty?

  • In 16.0.0.3, a merge utility was added to Liberty. The executable is bin/pluginUtility. It merges the plugin-cfg.xml files generated on standalone servers. 16.0.0.4 is required for the function to work in distributions other than Liberty ND.

  • The Liberty collective controller can generate pre-merged plugin-cfg.xml files for the servers it manages.

  • The plugin-cfg.xml files of individual/standalone Liberty application servers can be merged by the scripts provided with traditional WAS.

  • The plugin-cfg.xml files of individual/standalone Liberty application servers can be merged by the standalone command-line merge tool available here: https://github.com/WASdev/sample.pluginmergetool.

More information about the WAS Plugin on Liberty is available here: http://www14.software.ibm.com/webapp/wsbroker/redirect?version=phil&product=was-nd-dist&topic=twlp_admin_webserver_plugin

Alternatives to the WAS WebServer Plug-in

While the typical configuration uses a WAS-aware reverse proxy tier for load balancing, failover, and offload (like IHS + the WAS WebServer Plug-in), generic HTTP reverse proxies can be used with only a handful of functional differences.

These generic proxies range from appliance-based HTTP load balancers to open source reverse proxy servers. Some might even have explicit WAS exploitation, but the depth of that exploitation would need to be discussed with the vendor. The info below assumes the server is backend-agnostic.

Directly fronting WebSphere with a layer 3 or layer 4 device ("IP sprayer" or NAT forwarding load balancer) that doesn't even terminate HTTP is also an option, albeit not a very flexible one.

Information about specific alternatives

What's missing in a generic HTTP reverse proxy?

  • Filtering of private HTTP headers used by WAS. These are relatively well known to some administrators who have had to look at WAS Plug-in traces, but are strictly speaking not managed as a public API. Headers beginning with $WS should be filtered from untrusted clients (the WAS WebServer Plug-in and On-Demand-Router can both whitelist IP addresses that are permitted to supply $WS headers).

  • Setting of private headers used by WAS. An HTTP server can communicate various bits of data via the private headers described above, many of which become accessible in HttpServletRequest APIs (client address, client certificate). See the web container custom property httpsIndicatorHeader for one example where a generic HTTP reverse proxy might need additional configuration if it does HTTPS offload.

  • When memory-to-memory session persistence is enabled, the semantics of the session ID cookie change with an added level of indirection. A table of partition to clone mappings can be requested from the application server which may change over time. Third-party proxies are unlikely to request/parse this table and may do their own affinity, ignoring DWLM in the application server.

  • Most Intelligent Management functionality requires either an On-Demand Router or IM-enabled WAS Plug-in to be involved in the topology.

  • Cell- and collective-managed generation of the reverse proxy configuration.

  • Dynamic Clustering (in either traditional or Liberty profiles) communicates the cell topology to On-Demand Routers, DataPower, or IM-enabled WAS Plug-in in a way that third-party proxies would be unlikely to be able to interpret. This feature would be difficult to use without one of the above enumerated proxy servers in the request pipeline.
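With a generic proxy, the first item above becomes your responsibility: $WS-prefixed headers must be stripped from untrusted clients. A minimal Python sketch of that rule, assuming headers arrive as (name, value) pairs and that the set of trusted addresses is supplied by the administrator; filter_private_headers is hypothetical and does not describe how the WAS Plug-in itself implements this.

```python
def filter_private_headers(headers, client_ip, trusted_ips):
    """Drop headers whose names start with "$WS" unless the client is trusted.

    headers: list of (name, value) pairs as received from the client.
    Header names are compared case-insensitively, as HTTP requires.
    """
    if client_ip in trusted_ips:
        return list(headers)
    return [(name, value) for name, value in headers
            if not name.upper().startswith("$WS")]
```

A real proxy would apply the same rule in its own configuration language; the point is that the filter keys on the connecting peer's address, not on anything the client sends.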

What's missing with no HTTP proxy at all?

  • Static file offload

  • HTTP caching

  • Header manipulation

  • Cookie-based session affinity

  • SSL offload

What is likely to work?

  • Basic HTTP reverse proxy (offload, load balancing / clustering, security enforcement, affinity)

  • WebSockets with Liberty

What's better?

Some contemporary dedicated proxies are highly vertically scalable and use very low memory. There is also a benefit to standardization if there already exists a preferred enterprise proxy solution.

If an HTTP proxy is built into a Platform as a Service, or part of an otherwise orchestrated container technology, it may already know how to deal with the dynamic registration and deregistration of HTTP servers.

What's worse?

  • No IBM support of the proxy server tier. Not only would support be arranged separately, any problems now become multi-vendor investigations.

  • Unlimited use and commercial support of IHS and the WAS WebServer Plug-in come with every application server edition, whereas other commercial alternatives may be separately licensed.

  • Beyond true dynamic clustering, frequent changes to applications or application servers are much simpler to handle with a generated plugin-cfg.xml than with hand-maintained, WAS-agnostic proxy server configurations.

Debugging questions

Why does my application get the wrong client or remote IP address or not pass authentication?

HttpServletRequest.getRemoteAddr() and HttpServletRequest.getRemoteHost() rely on the WAS Plug-in passing private HTTP headers to the server. Passing TLS client authentication (mutual authentication) information from IHS to WAS also depends on WAS trusting the WAS Plug-in to supply these headers.

  • The 9.0.5.1 WAS Plug-in fails to propagate these values correctly due to PH17449. Upgrade past 9.0.5.1.

  • If you recently upgraded past 8.5.5.16, 9.0.0.11, or 19.0.0.3 for the first time, configuration is required in WAS to trust the WAS Plug-in to pass this information along: technote

Keywords: trustedSensitiveHeaderOrigin, trustedHeaderOrigin

How do I troubleshoot a connection refused or ConnectTimeout error?

There are several high-level reasons this symptom might be reported. Rule them out or confirm them in the order presented below, which reflects a combination of ease of checking and likelihood.

A "connection refused" error implies one of the following:

  1. There is no process listening on the target IP:port. Confirm the JVM was up at the time the error was reported.

  2. A firewall on either host, or an intermediate device, has blocked the connection.

  3. The server is hung or overloaded to the point where it has no threads available to accept new connections, and the OS buffer for new connections is full.

A "ConnectTimeout" error implies one of the following:

  1. The application server is hung or overloaded to the point where it periodically has no threads available to accept new connections in a timely fashion.

  2. A firewall on either host, or an intermediate device, has blocked the connections but is doing so in a way that silently discards packets.

  3. The connect() attempts are being retransmitted due to network failures.
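The practical difference between the two symptoms is whether the connect attempt is actively rejected or simply goes unanswered. This Python sketch, using a hypothetical classify_connect helper, repeats a single connect attempt from the IHS host so you can see which symptom a given target produces. It is a diagnostic aid, not the plug-in's own logic.

```python
import socket


def classify_connect(host, port, timeout=5.0):
    """Attempt one TCP connect and map the outcome to the symptoms above."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # Actively rejected: no listener, a firewall sending resets,
        # or (on some platforms) a full listen backlog.
        return "refused"
    except socket.timeout:
        # No answer in time: silently dropped packets, an overloaded
        # server, or SYNs lost and being retransmitted.
        return "timeout"
    except OSError as exc:
        return "other: %s" % exc
```

Run it with the same host, port, and ConnectTimeout value as the failing Server entry in plugin-cfg.xml, ideally while the problem is occurring.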

Consequences

A connection-related error always results in a markdown. If these errors happen rarely but the resulting loss of capacity is a problem, you can drastically reduce the RetryInterval to reduce the impact of the markdown.
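To see why a shorter RetryInterval limits the impact, consider a simplified model of markdown: once a server is marked down, it receives no new requests until RetryInterval seconds have elapsed. This Python sketch (a hypothetical MarkdownTracker class, with times in seconds supplied by the caller) is an illustration only, not the plug-in's implementation.

```python
class MarkdownTracker:
    """Simplified markdown model: a marked-down server is skipped until
    retry_interval seconds after the markdown, then tried again."""

    def __init__(self, retry_interval):
        self.retry_interval = retry_interval
        self.marked_down_at = {}

    def mark_down(self, server, now):
        self.marked_down_at[server] = now

    def should_try(self, server, now):
        since = self.marked_down_at.get(server)
        if since is None:
            return True
        if now - since >= self.retry_interval:
            # Retry window reached: clear the markdown and send traffic again.
            del self.marked_down_at[server]
            return True
        return False
```

With a 60-second interval, a server that fails one connect at t=0 is still skipped at t=10; with a 5-second interval it is already back in rotation, so a rare transient error costs far less capacity.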

How do I troubleshoot a ServerIOTimeout?

There are several high-level reasons a ServerIOTimeout might be reported. Rule them out or confirm them in the order presented below, which reflects a combination of ease of checking and likelihood.

  1. Confirm the server was not hung or out of threads at the time

    • The most common cause of a ServerIOTimeout is simply a hung application or hung application server. This can cause a delay at any point in the processing of a response, from establishing a connection, performing a handshake, reading the first line of the response, or reading any subsequent byte.

    • You should be able to track the free threads in the application server over time and correlate with the timeout.

    • It is also helpful to get a sequence of javacores around the time of the timeout to look for suspect blocking operations in the application.

  2. Confirm the timeouts are reasonable for the URL that failed

    • The ServerIOTimeout must exceed the longest gap between successive I/O from the application server, including the time the application takes to start sending its response. There is no end-to-end timeout. Check every server in plugin-cfg.xml; the values are sometimes inconsistent.

  3. Confirm the timeouts are actually occurring

    • There may be a defect where the timeout being used is not the one configured, or it otherwise pops prematurely. This can be monitored by WAS Plug-in trace and/or adding response time to the webserver access log.

  4. Confirm no intermediate network device is dropping connections

    • If there is any load balancer or firewall between IHS and WAS, it could disrupt idle connections in the WAS Plug-in's outbound connection pool in a way that looks like a timeout.

    • If there is such a device, any timeouts configured on it should exceed the configured ServerIOTimeout by at least several seconds. Further, it should be configured to actively close connections on any kind of timeout or resource shortage, rather than silently discarding packets from each side of the connection.
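Checks 2 and 4 can be partly automated by reading the ServerIOTimeout attribute of every <Server> element in plugin-cfg.xml. This Python sketch assumes ServerIOTimeout is an attribute of <Server>, as in generated plugin-cfg.xml files. The helper names, the device_idle_timeout input, and the 5-second default margin (standing in for "several seconds") are hypothetical.

```python
import xml.etree.ElementTree as ET


def server_io_timeouts(plugin_cfg_text):
    """Map each <Server Name=...> to its ServerIOTimeout in seconds (None if unset)."""
    root = ET.fromstring(plugin_cfg_text)
    result = {}
    for server in root.iter("Server"):
        value = server.get("ServerIOTimeout")
        result[server.get("Name")] = int(value) if value is not None else None
    return result


def review_timeouts(timeouts, device_idle_timeout=None, margin=5):
    """Return warnings about inconsistent values or a too-short device timeout."""
    warnings = []
    if len(set(timeouts.values())) > 1:
        warnings.append("inconsistent ServerIOTimeout values: %r" % timeouts)
    if device_idle_timeout is not None:
        for name, t in timeouts.items():
            # Only positive values are compared; unset, zero, or negative
            # values are skipped in this sketch.
            if t is not None and t > 0 and device_idle_timeout < t + margin:
                warnings.append(
                    "%s: device timeout %ss should exceed ServerIOTimeout %ss by at least %ss"
                    % (name, device_idle_timeout, t, margin))
    return warnings
```

An empty list is a useful sanity check, but it does not replace confirming the actual timeouts with WAS Plug-in trace or access log response times.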

TRACE/DEBUG message explanations