sidd troubleshooting

During the initial SSL handshake between the browser and the web server, an SSL session is established, and its characteristics, such as client authentication and the allowable ciphers, are determined. This initial handshake is computationally intensive.

For subsequent TCP connections, the browser normally attempts to resume the prior SSL session instead of establishing a new one, to avoid the expense of a full handshake. To support session resumption, IBM HTTP Server maintains a cache of resumable SSL sessions.

On Windows, only one IBM HTTP Server child process handles client connections, so the in-process cache maintained by the security library is sufficient. The information about sidd in this document does not apply to Windows.

On platforms other than Windows, multiple IBM HTTP Server child processes normally handle client connections, so the session cache must be accessible to all of them. IBM HTTP Server therefore provides a session ID cache daemon, sidd (IHSROOT/bin/sidd), which runs as a separate process and is started automatically when SSL support is enabled.

Disabling sidd

If problems are experienced with sidd, it can be safely disabled in certain circumstances, described below. Otherwise, most problems can be resolved with a configuration change.

AIX, HP-UX, Linux, Solaris

If only one or a few long-lived child processes serve requests, sidd can be disabled and the security library's internal cache used instead.

Disable sidd with the SSLCacheDisable directive, and remove any existing SSLCacheEnable directives from httpd.conf.
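
After editing httpd.conf, it is easy to leave a stray SSLCacheEnable behind. The following minimal POSIX shell sketch checks for that. The function name sidd_disabled_in is hypothetical, and the sketch makes three assumptions: it inspects only the single file it is given (files pulled in with Include are not followed), it treats lines starting with "#" as comments, and it matches directive names case-insensitively, as Apache-based servers do.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if the given httpd.conf has an active
# SSLCacheDisable directive and no active SSLCacheEnable directive.
# Comment lines ("#" after optional whitespace) never match the patterns.
# Include'd files are NOT followed.
sidd_disabled_in() {
    conf=$1
    # Any uncommented SSLCacheEnable means sidd is still requested.
    if grep -iqE '^[[:space:]]*SSLCacheEnable([[:space:]]|$)' "$conf"; then
        return 1
    fi
    # Otherwise require an explicit, uncommented SSLCacheDisable.
    grep -iqE '^[[:space:]]*SSLCacheDisable([[:space:]]|$)' "$conf"
}
```

For example, assuming the configuration file is in its default location: sidd_disabled_in /opt/IBMIHS/conf/httpd.conf && echo "sidd disabled"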

z/OS

The IBM HTTP Server InfoCenter explains how to use the native z/OS equivalent of sidd ("SSL Started Task").

Diagnosing sidd connect failures

For every SSL handshake, the httpd process handling the connection communicates with the session ID cache daemon over a Unix (AF_UNIX) socket.

Certain types of problems can result in a connect failure, and one of the following messages may be seen:

  • ECONNREFUSED

        [crit] (nnn)Connection refused: SSL0600S: Unable to connect to session ID cache

    The session ID cache daemon is not running or is temporarily overloaded, or an operating system-specific issue has occurred.

  • EPERM

        [crit] (13)Permission denied: SSL0600S: Unable to connect to session ID cache

    The filesystem permissions on the path to the session ID cache socket do not permit the web server user id to access it.

  • EMFILE

        [crit] (24)Too many open files: SSL0600S: Unable to connect to session ID cache

    The per-process or the system-wide file descriptor limit is too low.

    This typically affects other web server and plug-in operations as well.

The wording of "Connection refused" or "Permission denied", and the error number shown in parentheses, can vary from one platform to another.
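
When an error_log contains many SSL0600S messages, tallying them by type shows which of the cases above applies. The following is a minimal POSIX shell/awk sketch. The function name classify_sidd_errors is hypothetical. It matches on the English message text shown above, so you may need to adjust the patterns wherever your platform words the errors differently.

```sh
#!/bin/sh
# Hypothetical helper: count SSL0600S lines in an error log, grouped by the
# error text in front of the message. Patterns assume the English wording
# shown above; adjust them if your platform's wording differs.
classify_sidd_errors() {
    awk '
        /SSL0600S/ {
            if (/Connection refused/)        n["ECONNREFUSED"]++
            else if (/Permission denied/)    n["EPERM"]++
            else if (/Too many open files/)  n["EMFILE"]++
            else                             n["other"]++
        }
        END { for (k in n) print k, n[k] }
    ' "$1" | LC_ALL=C sort
}
```

For example: classify_sidd_errors /opt/IBMIHS/logs/error_log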

Because httpd processes must communicate with sidd to reuse SSL sessions (that is, to avoid a full handshake on every TCP connection), a connect error that occurs very frequently causes a substantial increase in CPU utilization: few sessions are reused, so most connections require a full handshake.

General diagnosis steps

These steps do not apply to the EMFILE error.

  1. Make sure that the Unix socket used by sidd resides on a normal, local filesystem, because some network and other special-purpose filesystems don't support Unix sockets. The default location of the socket is

    IHSROOT/logs/siddport

    If that location is not on a local filesystem, use the SSLCachePortFilename directive to place the socket in a directory on a local filesystem.

    Example:

    SSLCachePortFilename /var/run/siddport                         
    
  2. If more than one IBM HTTP Server instance runs on this machine, make sure that each is configured with its own Unix socket. This is usually a problem when two instances share the same server root or installation directory, because both then default to the same IHSROOT/logs/siddport socket.

    Example:

    httpd-app1.conf

    SSLCachePortFilename /var/run/app1-siddport                         
    

    httpd-app2.conf

    SSLCachePortFilename /var/run/app2-siddport                         
    
  3. Make sure that one sidd process is running for every IBM HTTP Server instance. The parent process of sidd will be the parent httpd process of that web server instance.

    If sidd is not running for one or more instances, use the SSLCacheErrorLog directive in httpd.conf to specify the name of a sidd error log, and restart the web server. If sidd exits or fails to start, check the sidd error log.
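
Parts of steps 1 and 3 can be scripted. The following POSIX shell sketch shows one way to do it. The function names check_sidd_socket and list_sidd are hypothetical. df -P reports only the filesystem and mount point, so you still have to judge whether that filesystem is local. ps option support also varies somewhat by platform.

```sh
#!/bin/sh
# Hypothetical helper: report whether the sidd socket path (the value of
# SSLCachePortFilename, or IHSROOT/logs/siddport by default) exists and is a
# Unix-domain socket, then show the filesystem that holds its directory.
check_sidd_socket() {
    sock=$1
    if [ ! -e "$sock" ]; then
        echo "missing: $sock"
        return 1
    fi
    if [ ! -S "$sock" ]; then
        echo "not a socket: $sock"
        return 1
    fi
    echo "ok: $sock is a Unix socket"
    # POSIX df -P prints the filesystem and mount point; confirm by eye that
    # it is a local filesystem rather than NFS or similar.
    df -P "$(dirname "$sock")"
}

# Hypothetical helper: print each sidd process together with its parent's
# ps line, so you can confirm there is one sidd per parent httpd process.
list_sidd() {
    ps -eo pid=,ppid=,args= | awk '
        { line[$1] = $0 }                  # remember every process by PID
        $3 ~ /sidd$/ { sidd[$1] = $2 }     # first word of args ends in "sidd"
        END { for (p in sidd) print "sidd " p " parent: " line[sidd[p]] }'
}
```

For example: check_sidd_socket /opt/IBMIHS/logs/siddport; list_sidd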

Specific diagnosis steps for the EPERM failure

This problem is caused by the web server user id (e.g., "www" or "nobody") not having permission to access the Unix socket used by sidd (the session ID cache daemon). When this error occurs:

  • It is caused by a configuration problem.

  • It can result in a significant increase in CPU usage on busy web sites, because the error occurs for every handshake.

As an example, assume that IBM HTTP Server is installed in /opt/IBMIHS, that the SSLCachePortFilename directive is not used to specify the location of the sidd socket, and that www is the web server user id (the value of the User directive).

When IBM HTTP Server starts, sidd creates the socket /opt/IBMIHS/logs/siddport. When a new client SSL connection is received, mod_ibm_ssl, running as user "www", tries to connect to the sidd socket. User "www" must therefore have read and execute permissions on these directories:

/opt
/opt/IBMIHS
/opt/IBMIHS/logs

User "www" must also have read permission on this socket "file" (on Linux, connecting to a Unix socket also requires write permission on the socket file):

/opt/IBMIHS/logs/siddport

Normally, when IBM HTTP Server is installed, these directories are world readable and executable. If those permissions are changed (on /opt, /opt/IBMIHS, or /opt/IBMIHS/logs), permission errors occur when new SSL connections are established and mod_ibm_ssl tries to connect to the sidd socket. The SSLCachePortFilename directive can be used to place the sidd socket elsewhere.

Example:

SSLCachePortFilename /var/run/siddport                         

The socket file must reside in a directory tree that the web server user id can access on your system.
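
The directory walk described above can be checked with a short POSIX shell sketch. The function name check_socket_path is hypothetical. The sketch assumes an absolute path with no spaces in it, and it must be run under the web server user id to be meaningful. Run as root, every check passes. Following the guidance above, it requires read and execute permission on each directory, starting from the outermost.

```sh
#!/bin/sh
# Hypothetical helper: run it under the web server user id (User directive).
# Checks read and execute permission on every directory leading to the sidd
# socket, outermost first, and reports the first one that blocks access.
# Assumes an absolute path that contains no spaces.
check_socket_path() {
    sock=$1
    case $sock in
        /*) ;;
        *) echo "not an absolute path: $sock"; return 2 ;;
    esac
    dirs=""
    d=$(dirname "$sock")
    # Prepend each ancestor so the list ends up ordered outermost first.
    while [ "$d" != "/" ]; do
        dirs="$d $dirs"
        d=$(dirname "$d")
    done
    for d in / $dirs; do
        if [ ! -r "$d" ] || [ ! -x "$d" ]; then
            echo "blocked: $d is not readable and searchable"
            return 1
        fi
    done
    echo "ok: all directories leading to $sock are accessible"
}
```

For example, save the function in a script that ends with check_socket_path "$1", then run that script as the web server user with a command such as sudo -u www sh check.sh /opt/IBMIHS/logs/siddport, where check.sh is a hypothetical file name.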

If two instances of IBM HTTP Server share an installation directory, each must specify a different value for the SSLCachePortFilename directive.
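
To spot two configurations that point at the same socket, you can compare the SSLCachePortFilename values across the instances' configuration files. The following POSIX shell/awk sketch shows one way. The function name sidd_port_conflicts is hypothetical, and the sketch makes three assumptions. It assumes that the last occurrence of the directive in a file is the one that takes effect. It does not follow Include'd files. It assumes file names and values without spaces. Files with no directive are shown as "(default)". Two "(default)" entries conflict only if those instances share a server root.

```sh
#!/bin/sh
# Hypothetical helper: find the SSLCachePortFilename value in each httpd.conf
# given as an argument ("(default)" when absent), print any value shared by
# more than one file, and exit non-zero if a shared value is found.
sidd_port_conflicts() {
    for conf in "$@"; do
        # Directive names are matched case-insensitively; the last one wins.
        val=$(awk 'tolower($1) == "sslcacheportfilename" { v = $2 } END { print v }' "$conf")
        echo "${val:-(default)} $conf"
    done | awk '
        { n[$1]++; files[$1] = files[$1] " " $2 }
        END {
            bad = 0
            for (v in n) if (n[v] > 1) { print "shared: " v ":" files[v]; bad = 1 }
            exit bad
        }'
}
```

For example, using the file names from the earlier example: sidd_port_conflicts httpd-app1.conf httpd-app2.conf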

Specific diagnosis steps for the ECONNREFUSED failure

There are several classes of this error:

  • solid failure
    The ECONNREFUSED error occurs for every handshake. sidd has exited or some misconfiguration problem exists.

  • a single failure that occurs immediately following an IHS restart

  • intermittent failure

For solid failures, follow the general diagnosis steps above.

A single failure that occurs immediately following an IHS restart is a known problem, addressed by APAR PK78007.
The fix for this APAR is included in fix packs 6.0.2.35, 6.1.0.25, and 7.0.0.5. Refer to the APAR for additional details.
This failure has no ill effects on the server itself and can be safely ignored, but you can apply the appropriate fix pack if desired.
The fix applies only to this "Connection refused" error message, not to other errors such as "Permission denied".
If you see this error message repeatedly, the problem is likely some other error or misconfiguration that this APAR does not address.

For intermittent failures, determine how many handshakes are affected by comparing the number of failures to the total number of handshakes.

Set LogLevel info in the web server configuration file, rename error_log so that a new one is created, and restart. After sufficient data has been gathered:

  1. Find the total number of SSL handshakes

        $ grep " Session ID: " logs/error_log | wc -l
        5073
    
  2. Find the number of sidd connect errors

        $ grep "SSL0600" logs/error_log | wc -l
        49
    
  3. Find the percentage of failures

        49 / 5073 is a little less than 1%
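
The three steps above can be combined into a single pass over the log. The following is a minimal POSIX shell/awk sketch. Like the grep commands, it assumes that each handshake logs one line containing " Session ID: " and that each connect error logs one line containing "SSL0600". The function name sidd_failure_rate is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: compute the sidd connect-failure rate from an
# error_log written with LogLevel info, using the same patterns as the grep
# commands above. Exits 1 if no handshakes were logged.
sidd_failure_rate() {
    awk '
        / Session ID: / { total++ }    # one line per SSL handshake
        /SSL0600/       { failed++ }   # one line per sidd connect error
        END {
            if (total == 0) { print "no handshakes logged"; exit 1 }
            printf "%d of %d handshakes failed (%.2f%%)\n", failed, total, 100 * failed / total
        }
    ' "$1"
}
```

With the sample counts above, it would print "49 of 5073 handshakes failed (0.97%)".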
    

If the failure rate is below 10%, the impact on CPU usage should be small.

If the failure rate is higher, check the operating system-specific notes below for known issues.

Specific diagnosis steps for the EMFILE failure
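
Because EMFILE means that a file descriptor limit is too low, one check is to compare how many descriptors an httpd child process has open with its soft limit. The following sketch is Linux-only because it reads /proc/PID/fd and /proc/PID/limits. On other platforms, use the equivalent operating system tools. The function name fd_usage is hypothetical. A server started from a shell typically inherits that shell's soft limit, which ulimit -n displays.

```sh
#!/bin/sh
# Hypothetical helper (Linux only, relies on /proc): print the number of open
# file descriptors and the soft "Max open files" limit for a process ID.
fd_usage() {
    pid=$1
    # Each entry in /proc/PID/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # Line format: "Max open files  <soft>  <hard>  files"; field 4 is soft.
    soft=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "pid $pid: $open open / soft limit $soft"
}
```

For example, run fd_usage against each httpd child process ID reported by ps, as a user that is permitted to read that process's /proc entries.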

Operating system-specific notes

Solaris

Solaris 10 has an apparent problem, seen on both SPARC and x64 platforms, that results in the ECONNREFUSED failure even under relatively light loads. This issue is tracked by Sun under bug ID 6460268.

Customers encountering the "Connection refused: SSL0600S" message on Solaris 10 should check with Sun on the availability of a fix for this problem.

Linux

On the Linux systems tested (2.4 and 2.6 kernels), the ECONNREFUSED error occurs only because of a configuration problem or because the sidd process has exited. It does not occur intermittently, because the kernel's AF_UNIX support blocks a thread that is connecting to sidd when the connect queue is full, rather than refusing the connection.