sidd troubleshooting¶
During the initial SSL handshake between browser and web server, an SSL session is established, and characteristics such as client authentication and allowable ciphers are determined. This initial handshake is computationally intensive.
For subsequent TCP connections, the browser normally attempts to resume the prior SSL session instead of establishing a new SSL session, in order to avoid the expense of a full handshake. In order to support this resumption of SSL sessions, IBM HTTP Server maintains a cache of SSL sessions which can be resumed.
For the Windows platform, only one IBM HTTP Server child process is used to handle client connections, so an in-process cache maintained by the security library is sufficient. This document about sidd does not apply to the Windows platform.
For platforms other than Windows, multiple IBM HTTP Server child processes are normally used to handle client connections, so the cache of sessions must be accessible to all of those child processes. A session id cache daemon (IHSROOT/bin/sidd) is provided, and it is started automatically when SSL support is enabled. It runs as a separate process.
Disabling sidd¶
If problems are experienced with sidd, there are certain circumstances where it can be safely disabled. Otherwise, most problems can be resolved with a configuration change.
AIX, HP-UX, Linux, Solaris¶
If only one long-lived child process, or just a few, is used to serve requests, sidd can be disabled and the internal security library cache used instead.
Disable the IBM HTTP Server sidd with the SSLCacheDisable directive and remove any existing SSLCacheEnable directives in httpd.conf.
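Before restarting, it can help to confirm which cache directives are actually active in the configuration. The following is a minimal sketch, not an IBM-provided tool: the function name sidd_cache_directives is hypothetical, and the path in the usage comment assumes the /opt/IBMIHS install directory used as an example elsewhere in this document.

```shell
# Hypothetical helper: print every active SSLCacheEnable / SSLCacheDisable
# line in a configuration file. Lines commented out with '#' are skipped
# because the pattern only allows whitespace before the directive name.
sidd_cache_directives() {
    grep -iE '^[[:space:]]*SSLCache(Enable|Disable)([[:space:]]|$)' "$1"
}

# Usage (path is an assumption):
#   sidd_cache_directives /opt/IBMIHS/conf/httpd.conf
```

After the change, the output should show SSLCacheDisable and no SSLCacheEnable lines. Files pulled in with Include would need to be checked separately.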
z/OS¶
The IBM HTTP Server InfoCenter explains how to use the native z/OS equivalent of sidd ("SSL Started Task").
Diagnosing sidd connect failures¶
For every SSL handshake, the httpd process handling the connection will communicate with the session id daemon. The communication takes place over a Unix (AF_UNIX) socket.
Certain types of problems can result in a connect failure, and one of the following messages may be seen:
ECONNREFUSED
[crit] (nnn)Connection refused: SSL0600S: Unable to connect to session ID cache
The session id cache is not running or is temporarily overloaded or an operating system-specific issue has been encountered.
EPERM
[crit] (13)Permission denied: SSL0600S: Unable to connect to session ID cache
The filesystem permissions in the path to the session id cache socket do not permit the web server user id to access it.
EMFILE
[crit] (24)Too many open files: SSL0600S: Unable to connect to session ID cache
The per-process file descriptor limit is too low, or the system-wide file descriptor limit is too low.
This typically affects other web server and plug-in operations as well.
The wording of "Connection refused" or "Permission denied" can vary from one platform to another.
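To see which of these failures is occurring, and how often, the SSL0600S messages in the error log can be grouped by their error number and text. This is a rough sketch: the function name sidd_error_tally is hypothetical, and it assumes the message layout shown above, with "(nnn)error text: SSL0600S" somewhere on the line.

```shell
# Hypothetical helper: count SSL0600S messages per "(errno)text" prefix.
# Output lines have the form "<count> <errno> <error text>", most frequent first.
sidd_error_tally() {
    # Keep only the errno and its text from lines that contain SSL0600S.
    sed -n 's/.*(\([0-9][0-9]*\))\([^:]*\): SSL0600S.*/\1 \2/p' "$1" |
        awk '{ count[$0]++ } END { for (k in count) print count[k], k }' |
        sort -rn
}

# Usage (log path is an assumption):
#   sidd_error_tally logs/error_log
```

Because the tally groups on whatever text the platform prints, it does not depend on the exact wording of "Connection refused" or "Permission denied".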
Communication between httpd processes and sidd is required to reuse SSL sessions (i.e., avoid the expensive handshake on every TCP connection). Thus, if the connect error is occurring very frequently, it will result in a substantial increase in CPU utilization because SSL sessions will be reused infrequently.
General diagnosis steps¶
These steps do not apply to the EMFILE error.
Make sure that the Unix socket used by sidd resides on a normal, local filesystem, as some network or other filesystems don't support Unix sockets. The default location for the Unix socket is IHSROOT/logs/siddport. If that does not reside on a normal filesystem, use the SSLCachePortFilename directive to place the Unix socket in a directory which resides on a local filesystem.
Example:
SSLCachePortFilename /var/run/siddport
If more than one IBM HTTP Server instance is used on this machine, make sure that each has been configured with a specific Unix socket. This is usually a problem when two instances share the same server root or install location.
Example:
httpd-app1.conf
SSLCachePortFilename /var/run/app1-siddport
httpd-app2.conf
SSLCachePortFilename /var/run/app2-siddport
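When several configuration files are in use, a quick scan can show whether two of them name the same socket. The sketch below is illustrative only: the function names are hypothetical, and it only detects explicit SSLCachePortFilename directives, so instances that rely on the default IHSROOT/logs/siddport location are not reported.

```shell
# Hypothetical helper: print "<socket path> <config file>" for every
# SSLCachePortFilename directive in the given configuration files.
sidd_socket_paths() {
    for conf in "$@"; do
        awk -v f="$conf" 'tolower($1) == "sslcacheportfilename" { print $2, f }' "$conf"
    done
}

# Hypothetical helper: print each socket path that is configured more than once.
sidd_shared_sockets() {
    sidd_socket_paths "$@" | awk '{ print $1 }' | sort | uniq -d
}

# Usage (file names taken from the example above):
#   sidd_shared_sockets httpd-app1.conf httpd-app2.conf
```

No output means that no socket path is named more than once in the files given.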
Make sure that one sidd process is running for every IBM HTTP Server instance. The parent process of sidd will be the parent httpd process of that web server instance.
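One way to check this is to pair each sidd process with its parent using ps output. The following is a sketch under assumptions: the function name sidd_parents is hypothetical, and it expects input in the "PID PPID COMMAND" layout produced by ps -eo pid,ppid,comm (the command column may be a bare name or a full path, depending on the platform).

```shell
# Hypothetical helper: read "PID PPID COMMAND" lines on stdin and print,
# for each sidd process, "<sidd pid> <parent pid> <parent command>".
sidd_parents() {
    awk '
        { cmd[$1] = $3; ppid[$1] = $2 }        # remember every process
        $3 ~ /(^|\/)sidd$/ { sidd[$1] = 1 }    # note the sidd processes
        END { for (p in sidd) print p, ppid[p], cmd[ppid[p]] }
    '
}

# Usage:
#   ps -eo pid,ppid,comm | sidd_parents
```

Each web server instance should account for exactly one output line, and the parent command on that line should be httpd.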
If sidd is not running for one or more instances, use the SSLCacheErrorLog directive in the configuration file to specify the name of a sidd error log, then restart the web server. Once sidd exits or fails to start up, check the sidd error log.
Specific diagnosis steps for the EPERM failure¶
This problem is caused by the web server user id (e.g., "www" or "nobody") not having permission to read the Unix socket used by sidd (the session id cache daemon). When this error occurs:
It is caused by a configuration problem.
It can result in significant increase in CPU usage for busy web sites because the error will occur for every handshake.
Consider /opt/IBMIHS as the example IBM HTTP Server install directory, and assume that the customer did not use the SSLCachePortFilename directive to specify the location of the sidd socket, and that www is the web server user id (the value of the User directive).
When IBM HTTP Server starts up, sidd will create the file /opt/IBMIHS/logs/siddport. When a new client SSL connection is received, mod_ibm_ssl will be running as user "www" and will try to connect to the sidd socket. So user "www" must have read and execute permissions to these directories:
/opt
/opt/IBMIHS
/opt/IBMIHS/logs
And user "www" must have read permission to this "file":
/opt/IBMIHS/logs/siddport
Normally, when IBM HTTP Server is installed the directories will be world readable and executable. If the customer changes those permissions (on /opt, /opt/IBMIHS, or /opt/IBMIHS/logs) then permission errors will be received when new SSL connections are being established and mod_ibm_ssl tries to connect to the sidd socket. The SSLCachePortFilename directive can be used to place the sidd socket somewhere else.
Example:
SSLCachePortFilename /var/run/siddport
The actual file needs to be in a directory structure which, on your system, the web server user id can access.
If you have two instances of IBM HTTP Server that share an installation directory, each should specify a different argument to the SSLCachePortFilename directive.
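To review the permissions discussed in this section, each directory leading to the socket can be listed in turn. This is a minimal sketch with hypothetical function names; it only displays the permissions, so deciding whether the web server user id can get through each level is left to the reader.

```shell
# Hypothetical helper: print every leading directory of an absolute path,
# followed by the path itself, one per line.
path_components() {
    printf '%s\n' "$1" |
        awk -F/ '{ p = ""; for (i = 2; i <= NF; i++) { p = p "/" $i; print p } }'
}

# Hypothetical helper: show owner, group, and mode for each component.
show_path_permissions() {
    path_components "$1" | while IFS= read -r p; do ls -ld "$p"; done
}

# Usage (default socket location from the example above):
#   show_path_permissions /opt/IBMIHS/logs/siddport
```

For the example path, the components listed are /opt, /opt/IBMIHS, /opt/IBMIHS/logs, and the socket itself.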
Specific diagnosis steps for the ECONNREFUSED failure¶
There are several classes of this error:
solid failure
The ECONNREFUSED error occurs for every handshake. sidd has exited or some misconfiguration problem exists.
a single failure that occurs immediately following an IHS restart
intermittent failure
For solid failures, follow the general diagnosis steps above.
For a single failure that occurs immediately following an IHS restart, a
problem was identified and a fix provided by APAR
PK78007.
A fix for this APAR is provided in fixpacks 6.0.2.35, 6.1.0.25, and
7.0.0.5. Refer to the APAR for additional details.
The problem can be safely ignored as there are no ill effects on the
server itself, but you can apply the appropriate fixpack as desired.
The fix is pertinent only for this 'Connection Refused' error message
and not for the other errors such as 'Permission denied' error.
If this error message appears repeatedly, then the problem is
likely to be some other error or misconfiguration that is not addressed
by this APAR.
For intermittent failures, find how many handshakes are impacted by comparing the number of failures to the number of total handshakes.
Set LogLevel info in the web server configuration file, rename error_log so that a new one is created, and restart. After sufficient data has been gathered:
Find the total number of SSL handshakes
$ grep " Session ID: " logs/error_log | wc -l
5073
Find the number of sidd connect errors
$ grep "SSL0600" logs/error_log | wc -l
49
Find the percentage of failures
49 / 5073 is a little less than 1%
If the percentage of failure is less than 10%, it should have only a small impact on CPU usage.
If the percentage of failure is higher, check the operating system-specific notes below for known issues.
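The percentage calculation above can be scripted so that it is easy to repeat as more data is gathered. This is a small sketch; the function name sidd_failure_pct is hypothetical, and the usage comment assumes the same log file and search strings as the grep commands above.

```shell
# Hypothetical helper: print the failure percentage with two decimals.
#   $1 = number of sidd connect errors
#   $2 = total number of SSL handshakes
sidd_failure_pct() {
    awk -v f="$1" -v t="$2" 'BEGIN {
        if (t == 0) { print "n/a"; exit }   # no handshakes logged yet
        printf "%.2f\n", 100 * f / t
    }'
}

# Usage with the counts from the example above:
#   sidd_failure_pct 49 5073
# Usage against a log file:
#   sidd_failure_pct "$(grep -c 'SSL0600' logs/error_log)" \
#                    "$(grep -c ' Session ID: ' logs/error_log)"
```

With the example counts, the result is 0.97, which matches "a little less than 1%".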
Specific diagnosis steps for the EMFILE failure¶
Operating system-specific notes¶
Solaris¶
Solaris 10 has an apparent problem, seen on both SPARC and x64 platforms, which results in the ECONNREFUSED failure even under relatively light loads. This issue is tracked by Sun under bug id 6460268.
Customers encountering the "Connection refused: SSL0600S" message on Solaris 10 should check with Sun on the availability of a fix for this problem.
Linux¶
On Linux systems tested (2.4 and 2.6 kernels), the ECONNREFUSED error can only occur due to a configuration problem and/or the sidd process exiting. It will not occur intermittently, because the AF_UNIX support in the kernel will block a thread waiting to connect to sidd once the connect queue becomes full.