Web Servers
Web Servers Recipe
- The maximum concurrency variables (e.g. MaxClients for IHS) are the key tuning variables. Ensure such variables are not saturated through tools such as mpmstats or mod_status, while at the same time ensuring that the backend server resources (e.g. CPU, network) are not saturated (this can be done by scaling up the backend, sizing thread pools to queue, optimizing the backend to be faster, or limiting maximum concurrent incoming connections and the listen backlog).
- Clusters of web servers are often used with IP sprayers or caching proxies balancing to the web servers. Ensure that such IP sprayers are doing "sticky SSL" balancing so that SSL Session ID reuse percentage is higher.
- Load should be balanced evenly into the web servers and back out to the application servers. Compare access log hit rates for the former, and use WAS plugin STATS trace to verify the latter.
- Review snapshots of thread activity to find any bottlenecks. For example, in IHS, increase the frequency of mpmstats and review the state of the largest number of threads.
- Review the keep alive timeout. The ideal value is where server resources (e.g. CPU, network) are not saturated, maximum concurrency is not saturated, and the average number of keepalive requests has peaked (in IHS, review with mpmstats or mod_status).
- Check the access logs for HTTP response codes (e.g. %s for IHS) >= 400.
- Check the access logs for long response times (e.g. %D for IHS).
- For the WebSphere Plugin, consider setting ServerIOTimeoutRetry="0" to avoid retrying requests that time out due to ServerIOTimeout (unless ServerIOTimeout is very short).
- Enable mod_logio and add %^FB to LogFormat for time until first bytes of the response.
- Review access and error logs for any errors, warnings, or high volumes of messages.
- Check http_plugin.log for ERROR: ws_server: serverSetFailoverStatus: Marking .* down
- Use WAS plugin DEBUG or TRACE logging to dive deeper into unusual requests such as slow requests, requests with errors, etc. Use an automated script for this analysis.
Also review the operating systems chapter.
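As a minimal sketch of the http_plugin.log check in the recipe above, the following Python counts markdown messages per server using the pattern given in the recipe. The sample lines, including their timestamp and ID prefix and the server names, are hypothetical; real plugin log lines may be laid out differently.

```python
import re
from collections import Counter

# Pattern from the recipe; the captured group is whatever appears
# between "Marking" and "down" (normally the server name).
MARKDOWN = re.compile(
    r"ERROR: ws_server: serverSetFailoverStatus: Marking (.*) down"
)


def count_markdowns(lines):
    """Return a Counter mapping server name to number of markdown messages."""
    counts = Counter()
    for line in lines:
        match = MARKDOWN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; the prefix layout is an assumption.
sample = [
    "[01/Jan/2024:10:00:00.00000] 00001a2b 3c4d5e6f - ERROR: ws_server: "
    "serverSetFailoverStatus: Marking node1_server1 down",
    "[01/Jan/2024:10:00:05.00000] 00001a2b 3c4d5e6f - ERROR: ws_server: "
    "serverSetFailoverStatus: Marking node2_server1 down",
    "[01/Jan/2024:10:01:00.00000] 00001a2b 3c4d5e6f - ERROR: ws_server: "
    "serverSetFailoverStatus: Marking node1_server1 down",
    "[01/Jan/2024:10:01:30.00000] 00001a2b 3c4d5e6f - STATS: unrelated line",
]

markdowns = count_markdowns(sample)
for server, count in markdowns.most_common():
    print(server, count)
```

In practice the lines would come from iterating over the opened http_plugin.log file rather than from an in-memory list.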
General
"Web servers like IBM HTTP Server are often used in front of WebSphere Application Server deployments to handle static content or to provide workload management (WLM) capabilities. In versions of the WebSphere Application Server prior to V6, Web servers were also needed to effectively handle thousands of incoming client connections, due to the one-to-one mapping between client connections and Web container threads... In WebSphere Application Server V6 and later, this is no longer required with the introduction of NIO and AIO. For environments that use Web servers, the Web server instances should be placed on dedicated systems separate from the WebSphere Application Server instances. If a Web server is collocated on a system with a WebSphere Application Server instance, they will effectively share valuable processor resources, reducing overall throughput for the configuration."
Locating the web server on a different machine from the application servers may cause a significant throughput improvement (in one benchmark, 27%).
IBM HTTP Server
The IBM HTTP Server is based on the open source Apache httpd code with IBM enhancements. General performance tuning guidelines: http://publib.boulder.ibm.com/httpserv/ihsdiag/ihs_performance.html
Multi-Processing Modules (MPM)
Requests are handled by configurable multi-processing modules (MPMs) (http://publib.boulder.ibm.com/httpserv/manual70/mpm.html, http://publib.boulder.ibm.com/httpserv/manual70/mod/). The most common are:
- worker: This is the default, multi-threaded and optionally multi-process MPM. (http://publib.boulder.ibm.com/httpserv/manual70/mod/worker.html)
- event: Built on top of worker and designed to utilize more asynchronous operating system APIs (http://publib.boulder.ibm.com/httpserv/manual70/mod/event.html)
- prefork: A single thread/process for each request. Not recommended. Generally used for non-thread-safe or legacy code.
This is the default configuration on distributed platforms other than Windows:
# ThreadLimit: maximum setting of ThreadsPerChild
# ServerLimit: maximum setting of StartServers
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
<IfModule worker.c>
ThreadLimit 25
ServerLimit 64
StartServers 1
MaxClients 600
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0
</IfModule>
Out of the box, IBM HTTP Server supports a maximum of 600 concurrent connections. Performance will suffer if load dictates more concurrent connections, as incoming requests will be queued up by the host operating system...
First and foremost, you must determine the maximum number of simultaneous connections required for this Web server. Using mod_status or mod_mpmstats (available with ihsdiag) to display the active number of threads throughout the day will provide some starting data.
There are 3 critical aspects to MPM (Multi-processing Module) tuning in IBM HTTP Server.
- Configuring the maximum number of simultaneous connections (MaxClients directive)
- Configuring the maximum number of IBM HTTP Server child processes (ThreadsPerChild directive)
- Less importantly, configuring the ramp-up and ramp-down of IBM HTTP Server child processes (MinSpareThreads, MaxSpareThreads, StartServers)
The first setting (MaxClients) has the largest immediate impact, but the latter 2 settings help tune IBM HTTP Server to accommodate per-process features in Apache modules, such as the WebSphere Application Server Web server plug-in.
This is the default configuration on Windows:
ThreadLimit 600
ThreadsPerChild 600
MaxRequestsPerChild 0
In general, recommendations for a high performance, non-resource constrained environment:
- If using TLS, then ThreadsPerChild=100, decide on MaxClients, and then ServerLimit=MaxClients/ThreadsPerChild; otherwise, ThreadsPerChild=MaxClients and ServerLimit=1.
- StartServers=ServerLimit
- MinSpareThreads=MaxSpareThreads=MaxClients
- MaxRequestsPerChild=0
- Test the box at peak concurrent load (MaxClients); for example: ${IHS}/bin/ab -c ${MaxClients} -n ${MaxClients*10} -i https://localhost/
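The recommendations above can be sketched as a small Python helper that derives the worker MPM directives from a chosen MaxClients. This is an illustration only: the function names are hypothetical, ThreadLimit is set equal to ThreadsPerChild as an assumption (the list above does not mention it), and MaxClients is rounded up to a whole number of processes when it is not a multiple of ThreadsPerChild.

```python
import math


def worker_mpm_settings(max_clients, use_tls, tls_threads_per_child=100):
    """Derive worker MPM directive values following the recommendations."""
    if use_tls:
        threads_per_child = tls_threads_per_child
        server_limit = math.ceil(max_clients / threads_per_child)
        # Assumption of this sketch: round MaxClients up so that
        # MaxClients = ServerLimit * ThreadsPerChild holds exactly.
        max_clients = server_limit * threads_per_child
    else:
        threads_per_child = max_clients
        server_limit = 1
    return {
        "ThreadLimit": threads_per_child,  # assumption: equal to ThreadsPerChild
        "ServerLimit": server_limit,
        "StartServers": server_limit,
        "MaxClients": max_clients,
        "MinSpareThreads": max_clients,
        "MaxSpareThreads": max_clients,
        "ThreadsPerChild": threads_per_child,
        "MaxRequestsPerChild": 0,
    }


def render(settings):
    """Format the settings as httpd.conf directive lines."""
    return "\n".join(f"{name} {value}" for name, value in settings.items())


tls_settings = worker_mpm_settings(1000, use_tls=True)
print(render(tls_settings))
```

With TLS and MaxClients of 1000, this yields ThreadsPerChild 100 and ServerLimit 10; without TLS, it collapses to a single process with ThreadsPerChild equal to MaxClients.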
Note that the default configuration does not follow the MaxClients = (ServerLimit * ThreadsPerChild) formula because it gives the flexibility to dynamically increase MaxClients up to the ceiling of ServerLimit * ThreadsPerChild and gracefully restart IHS without destroying existing connections or waiting for them to drain. This is a useful capability but few customers take advantage of it and it's usually best to follow the formula to reduce any confusion.
Note that the message "Server reached MaxClients setting" in the error_log will only be shown once per running worker process.
IBM HTTP Server typically uses multiple multithreaded processes for serving requests. Specify the following values for the properties in the web server configuration file (httpd.conf) to prevent the IBM HTTP Server from using more than one process for serving requests.
ServerLimit 1
ThreadLimit 1024
StartServers 1
MaxClients 1024
MinSpareThreads 1
MaxSpareThreads 1024
ThreadsPerChild 1024
MaxRequestsPerChild 0
Note that when TLS processing is enabled, there is some contention within each process (buffers, etc.), so more processes and fewer threads per process may be faster: http://publib.boulder.ibm.com/httpserv/ihsdiag/ihs_performance.html#Linux_Unix_ThreadsPerChild
MinSpareThreads, MaxSpareThreads
The MinSpareThreads and MaxSpareThreads options are used to reduce memory utilization during low traffic volumes. Unless this is very important, set both of these equal to MaxClients to avoid time spent destroying and creating threads.
MaxRequestsPerChild
The MaxRequestsPerChild option recycles a child process after it has handled the specified number of requests. Historically, this was used to prevent a leaking process from using too much memory; however, it is generally recommended to set this to 0 and investigate any observed leaks.
Windows
Although IHS is supported on Windows 64-bit, it is only built as a 32-bit executable. So in all cases on Windows, IHS is limited to a 32-bit address space. IHS on Windows also only supports a single child process (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/rprf_plugin.html). IHS on Windows is not /LARGEADDRESSAWARE, so it cannot utilize the extra space afforded by the /3GB switch. After APAR PI04922 Windows services created with the httpd-la.exe binary are large address aware (which does not depend on /3GB boot time option): http://www-01.ibm.com/support/docview.wss?uid=swg1PI04922
Note also on Windows that there is no MaxClients. It is set implicitly to ThreadsPerChild.
IBM HTTP Server for z/OS
Details: http://www-01.ibm.com/support/docview.wss?uid=tss1wp101170&aid=1
Consider using AsyncSockets=yes in httpd.conf
Access Log, LogFormat
The access log is enabled by default and writes one line for every processed request into logs/access.log. The format of the line is controlled with the LogFormat directive in httpd.conf: http://publib.boulder.ibm.com/httpserv/manual70/mod/mod_log_config.html
The access log is defined with the CustomLog directive, for example:
CustomLog logs/access_log common
The last part (e.g. "common") is the name of the LogFormat to use. Here is the default "common" LogFormat:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
You can either modify this line or add a new LogFormat line with a new name and change the CustomLog to point to the new one.
We recommend adding at least %D to give the total response time (in microseconds).
LogFormat "%h %l %u %t \"%r\" %>s %b %D" common
Here are some other commonly useful directives:
- Print the time taken to serve the request, in microseconds: %D
- Print the time taken to serve the request, in seconds: %T
- Print the contents of the cookie JSESSIONID in the request sent to the server (also includes the clone ID that the cookie wants the request to go back to): %{JSESSIONID}C
- Print the Set-Cookie response header sent to the client, which carries any JSESSIONID cookie being set (the %{...}o syntax logs a response header by name): %{Set-Cookie}o
- View and log the SSL cipher negotiated for each connection: \"SSL=%{HTTPS}e\" \"%{HTTPS_CIPHER}e\" \"%{HTTPS_KEYSIZE}e\" \"%{HTTPS_SECRETKEYSIZE}e\"
- Print the host name the request was for (useful when the site serves multiple hosts using virtual hosts): %{Host}i
Time until first response bytes (%^FB)
mod_logio allows the %^FB LogFormat option to print the microseconds between when the request arrived and when the first byte of the response headers is written. For example:
LoadModule logio_module modules/mod_logio.so
LogIOTrackTTFB ON
LogFormat "%h %l %u %t \"%r\" %>s %b %D \"%{WAS}e\" %X %I %O %^FB" common
Access Log Response Times (%D)
It is recommended to use %D in the LogFormat to track response times (in microseconds). The response time includes application time, queue time, and network time from/to the end-user and to/from the application.
Note that the time (%t) represents the time the request arrived for HTTPD >= 2.0 and the time the response was sent back for HTTPD < 2.0.
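As a rough sketch of how %D can be used, the following Python scans access log lines written with the "common" LogFormat plus a trailing %D (as shown earlier) and collects responses with status codes >= 400 and responses slower than a threshold. The sample lines, the one-second threshold, and the function name are hypothetical, and the regular expression assumes the request line contains no embedded double quotes.

```python
import re

# Assumed format: %h %l %u %t "%r" %>s %b %D
LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) (?P<micros>\d+)$'
)


def summarize(lines, slow_micros=1_000_000):
    """Return (errors, slow) lists of (value, request) tuples."""
    errors, slow = [], []
    for line in lines:
        match = LINE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        status = int(match.group("status"))
        micros = int(match.group("micros"))
        if status >= 400:
            errors.append((status, match.group("request")))
        if micros >= slow_micros:
            slow.append((micros, match.group("request")))
    return errors, slow


# Hypothetical sample lines.
sample = [
    '192.0.2.10 - - [08/Jan/2014:16:59:26 -0500] "GET /app/index.jsp HTTP/1.1" 200 5120 250000',
    '192.0.2.11 - - [08/Jan/2014:16:59:27 -0500] "GET /app/report HTTP/1.1" 500 312 4200000',
    '192.0.2.12 - - [08/Jan/2014:16:59:28 -0500] "GET /missing HTTP/1.1" 404 - 900',
]

errors, slow = summarize(sample)
print("errors:", errors)
print("slow:", slow)
```

If the LogFormat carries additional fields after %D, the pattern would need to be extended accordingly.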
Access Log WAS Plugin Server Name (%{WAS}e)
If using the IBM WAS plugin, you can get the name of the application server that handled the request (http://publib.boulder.ibm.com/httpserv/ihsdiag/WebSphere61.html#LOG). The plugin sets an internal, per-request environment variable on the final transport it used to satisfy a request: %{WAS}e. It is fixed length so it has the first N characters of host/IP but always includes the port. The %{WAS}e syntax means log the environment variable (e) named 'WAS'.
Client ephemeral port
The client's ephemeral port is critical to correlate an access log entry to network trace since a socket is uniquely identified by the tuple (client IP, client port, httpd IP, httpd port). The client's ephemeral port may be logged with %{remote}p.
Commonly useful LogFormat
Putting everything together, a commonly useful LogFormat for IBM HTTP Server is:
LoadModule logio_module modules/mod_logio.so
LogIOTrackTTFB ON
LogFormat "%h %l %u %t \"%r\" %>s %b %D \"%{WAS}e\" %X %I %O %^FB %{remote}p %p" common
Edge Side Includes (ESI)
The web server plug-in contains a built-in ESI processor. The ESI processor can cache whole pages, as well as fragments, providing a higher cache hit ratio. The cache implemented by the ESI processor is an in-memory cache, not a disk cache, therefore, the cache entries are not saved when the web server is restarted.
When a request is received by the web server plug-in, it is sent to the ESI processor, unless the ESI processor is disabled. It is enabled by default. If a cache miss occurs, a Surrogate-Capabilities header is added to the request and the request is forwarded to the WebSphere Application Server. If servlet caching is enabled in the application server, and the response is edge cacheable, the application server returns a Surrogate-Control header in response to the WebSphere Application Server plug-in.
The value of the Surrogate-Control response header contains the list of rules that are used by the ESI processor to generate the cache ID. The response is then stored in the ESI cache, using the cache ID as the key. For each ESI "include" tag in the body of the response, a new request is processed so that each nested include results in either a cache hit or another request that forwards to the application server. When all nested includes have been processed, the page is assembled and returned to the client.
The ESI processor is configurable through the WebSphere web server plug-in configuration file plugin-cfg.xml. The following is an example of the beginning of this file, which illustrates the ESI configuration options.
<Property Name="esiEnable" Value="true"/>
<Property Name="esiMaxCacheSize" Value="1024"/>
<Property Name="esiInvalidationMonitor" Value="false"/>
... The second option, esiMaxCacheSize, is the maximum size of the cache in 1K byte units. The default maximum size of the cache is 1 megabyte.
If the first response has a Content-Length response header, the web server plug-in checks for the response size. If the size of the response body is larger than the available ESI caching space, the response passes through without being handled by ESI.
Some parent responses have nested ESI includes. If a parent response is successfully stored in the ESI cache, and any subsequent nested include has a Content-length header that specifies a size larger than the available space in the ESI cache, but smaller than the value specified for esiMaxCacheSize property, the plug-in ESI processor evicts other cache elements until there is enough space for the nested include in the ESI cache.
The third option, esiInvalidationMonitor, specifies if the ESI processor should receive invalidations from the application server... There are three methods by which entries are removed from the ESI cache: first, an entry expiration timeout occurs; second, an entry is purged to make room for newer entries; or third, the application server sends an explicit invalidation for a group of entries. For the third mechanism to be enabled, the esiInvalidationMonitor property must be set to true and the DynaCacheEsi application must be installed on the application server. The DynaCacheEsi application is located in the installableApps directory and is named DynaCacheEsi.ear. If the ESIInvalidationMonitor property is set to true but the DynaCacheEsi application is not installed, then errors occur in the web server plug-in and the request fails.
This ESI processor is monitored through the CacheMonitor application. For the ESI processor cache to be visible in the CacheMonitor, the DynaCacheEsi application must be installed as described above, and the ESIInvalidationMonitor property must be set to true in the plugin-cfg.xml file.
If you're not using the ESI cache, disable it as it has some expensive operations in computing hashes for each request: Administrative Console > Servers > Web Servers > web_server_name > Plug-in properties > Caching > Uncheck "Enable ESI", and then re-generate and re-propagate the plugin configuration. ESI processing can also cause undesirable buffering in the WAS Plug-in.
Elliptic Curve Cryptography (ECC) is available in TLS 1.2 and may be a faster algorithm than RSA for SSL signature and key exchange algorithms. As of 2012, ECC ciphers are not supported by most major web browsers, but they are supported by Java 7, OpenSSL, and GSKit. ECC ciphers start with TLS_EC and are available starting in IHS 8.0.0.6
KeepAlive
KeepAlive allows the client to keep a socket open between requests, thus potentially avoiding TCP connection setup and tear down. For example, let's say a client opens a TCP connection and requests an HTML page. This HTML page contains one image. With KeepAlive, after the HTML response has been parsed and the image found, the client will re-use the previous TCP connection to request the image. (http://publib.boulder.ibm.com/httpserv/manual70/mod/core.html#keepalive)
When using mod_event on Linux for IHS >= 9 or z/OS for IHS >= 8.5.5, KeepAlive sockets do not count towards MaxClients. Elsewhere and with mod_worker, KeepAlive sockets do count towards MaxClients.
In the latter case, KeepAliveTimeout (default 5 seconds) is a balance between latency (a higher KeepAliveTimeout means a higher probability of connection re-use) and the maximum concurrently active requests (because a KeepAlive connection counts towards MaxClients for its lifetime).
Starting with IHS 9 and 8.5.5.18, KeepAliveTimeout may be specified in milliseconds; for example, "KeepAliveTimeout 5999ms". In this format, IHS rounds up and times out in roughly 6 seconds (in this example); however, it sends back a Keep-Alive timeout response header rounded down to 5 seconds (in this example). This is useful to avoid race conditions with clients that do not first try a read on a socket before doing a write, in which case IHS might time out its half of the socket right as the client tries to re-use it, and thus the response will fail.
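The rounding described for a millisecond KeepAliveTimeout can be illustrated with simple arithmetic. This is only a sketch of the behavior as described here, not of the IHS implementation, and the function name is hypothetical.

```python
import math


def keepalive_rounding(timeout_ms):
    """Return (effective_timeout_seconds, advertised_header_seconds).

    The effective timeout is rounded up to whole seconds and the value
    advertised in the Keep-Alive response header is rounded down.
    """
    effective = math.ceil(timeout_ms / 1000)
    advertised = timeout_ms // 1000
    return effective, advertised


effective, advertised = keepalive_rounding(5999)
print(effective, advertised)
```

For 5999ms, the client is told 5 seconds while the server waits roughly 6, which leaves the client a margin of about one second before the socket is closed.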
Checking incoming connection re-use
The %X LogFormat option will show + if a connection is kept-alive and available for re-use.
The %k LogFormat option will show the number of keepalive requests handled by the connection used to serve this response. If this number is consistently 0, then the client is not re-using connections or something is not allowing the client connection to be re-used.
Gzip compression
mod_deflate can be used to apply gzip compression to responses: http://publib.boulder.ibm.com/httpserv/manual70/mod/mod_deflate.html
mod_mpmstats
mpmstats is a very lightweight but powerful httpd extension that periodically prints a line to error_log with a count of the number of threads that are ready, busy, keepalive, etc. Here's an example:
[Wed Jan 08 16:59:26 2014] [notice] mpmstats: rdy 48 bsy 3 rd 0 wr 3 ka 0 log 0 dns 0 cls 0
On z/OS, ensure PI24990 is installed.
The default mpmstats interval is 10 minutes although we recommend setting it to 30 seconds or less:
<IfModule mod_mpmstats.c>
# Write a record every 10 minutes (if server isn't idle).
# Recommendation: Lower this interval to 60 seconds, which will
# result in the error log growing faster but with more accurate
# information about server load.
ReportInterval 600
</IfModule>
As covered in the mod_mpmstats link above, some of the key statistics are:
- rdy (ready): the number of web server threads started and ready to process new client connections
- bsy (busy): the number of web server threads already processing a client connection
- rd (reading): the number of busy web server threads currently reading the request from the client
- wr (writing): the number of busy web server threads that have read the request from the client but are either processing the request (e.g., waiting on a response from WebSphere Application Server) or are writing the response back to the client
- ka (keepalive): the number of busy web server threads that are not processing a request but instead are waiting to see if the client will send another request on the same connection; refer to the KeepAliveTimeout directive to decrease the amount of time that a web server thread remains in this state
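As a minimal sketch, an mpmstats line such as the example above can be parsed into its counters with a few lines of Python. The function names are hypothetical, and treating rdy + bsy as the number of started threads is an assumption of this sketch.

```python
def parse_mpmstats(line):
    """Parse an mpmstats counter line into a dict, or return None."""
    marker = "mpmstats: rdy "
    idx = line.find(marker)
    if idx < 0:
        return None  # not a counter line (e.g. an "approaching MaxClients" message)
    fields = line[idx + len("mpmstats: "):].split()
    # Fields alternate between a name and its integer value.
    return {name: int(value) for name, value in zip(fields[0::2], fields[1::2])}


def busy_fraction(stats):
    """Share of started threads that are busy (assumes started = rdy + bsy)."""
    total = stats["rdy"] + stats["bsy"]
    return stats["bsy"] / total if total else 0.0


example = ("[Wed Jan 08 16:59:26 2014] [notice] mpmstats: "
           "rdy 48 bsy 3 rd 0 wr 3 ka 0 log 0 dns 0 cls 0")
stats = parse_mpmstats(example)
print(stats)
print(round(busy_fraction(stats), 3))
```

Tracking the largest of rd, wr, and ka over time shows which state most busy threads are in, which is the review suggested in the recipe at the top of this chapter.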
If mpmstats is enabled, when the server is approaching MaxClients, a message is printed by mpmstats (this is in addition to the "server reached MaxClients setting" message printed by the server itself).
[notice] mpmstats: approaching MaxClients (48/50)
By default, the mpmstats threshold is 90% and may be increased with MPMStatsBusyThreshold.
The mpmstats message will be repeated if the situation occurs again after clearing, whereas the server message will only appear once per process lifetime.
TrackHooks
In recent versions, TrackHooks may be used to get per module response times, check for long-running modules, and track response times of different parts of the request cycle (http://publib.boulder.ibm.com/httpserv/ihsdiag/mpmstats_module_timing.html#loghooks):
Recommended mpmstats configuration
<IfModule mod_mpmstats.c>
# Write a record to stderr every 10 seconds (if server isn't idle).
ReportInterval 10
TrackHooks allhooks
TrackHooksOptions millis permodule logslow
TrackModules On
SlowThreshold 10
</IfModule>
Add the following to your LogFormat:
%{TRH}e %{TCA}e %{TCU}e %{TPR}e %{TAC}e %{RH}e
The final LogFormat line will most commonly look like this:
LogFormat "%h %l %u %t \"%r\" %>s %b %{TRH}e %{TCA}e %{TCU}e %{TPR}e %{TAC}e %{RH}e %{WAS}e %D" common
The above requires that mod_status and ExtendedStatus are enabled which enables additional statistics-gathering infrastructure in Apache:
LoadModule status_module modules/mod_status.so
<IfModule mod_status.c>
ExtendedStatus On
</IfModule>
As long as the configuration does not use a "<Location /server-status> [...] SetHandler server-status [...] </Location>" block, then there is no additional security exposure by loading mod_status and enabling ExtendedStatus (unless AllowOverride permits handler overrides, for example AllowOverride All or FileInfo, and someone creates a .htaccess file that enables it).
mod_smf
On z/OS, mod_smf provides additional SMF statistics: http://publib.boulder.ibm.com/httpserv/manual70/mod/mod_smf.html
Status Module
There is a status module (mod_status) that can be enabled in IHS. It is not enabled by default (or at least it has not been in the past). However, it presents some interesting real-time statistics which can help in understanding whether requests are backing up or the site is humming along nicely. It provides a second data point when troubleshooting production problems. Most enterprise organizations will want to make sure that the URL used to access the statistics, http://your.server.name/server-status?refresh=N, is protected by a firewall and only available to the system administrators.
IHSDiag
Use ihsdiag to take thread dumps to understand what IHS threads are doing in detail: http://publib.boulder.ibm.com/httpserv/ihsdiag/
Fast Response Cache Accelerator
FRCA is deprecated.
FRCA/AFPA was deprecated starting in V7.0 [and] its use is discouraged. Instead, it is recommended to use the IBM HTTP Server default configuration to serve static files... If CPU usage with the default configuration is too high, the mod_mem_cache module can be configured to cache frequently accessed files in memory, or multiple web servers can be used to scale out horizontally. Additional options include the offloading of static files to a Content Delivery Network (CDN) or caching HTTP appliance, or to use the caching proxy component of WebSphere Edge Server in WebSphere Application Server Network Deployment (ND).
WebSphere Plugin
ServerIOTimeout
Set a timeout value, in seconds, for sending requests to and reading responses from the application server.
If you set the ServerIOTimeout attribute to a positive value, this attempt to contact the server ends when the timeout occurs. However, the server is not [marked down].
If you set the ServerIOTimeout attribute to a negative value, the server is [marked down] whenever a timeout occurs...
If a value is not set for the ServerIOTimeout attribute, the plug-in, by default, uses blocked I/O to write requests to and read responses from the application server, and does not time out the TCP connection...
Setting the ServerIOTimeout attribute to a reasonable value enables the plug-in to timeout the connection sooner, and transfer requests to another application server when possible...
The default value is 900, which is equivalent to 15 minutes.
The ServerIOTimeout limits the amount of time the plug-in waits for each individual read or write operation to return. ServerIOTimeout does not represent a timeout for the overall request.
It is generally recommended to set a non-zero value for ServerIOTimeout. The value should be greater than the maximum expected response time for all legitimate requests.
In recent versions of WAS, the global ServerIOTimeout can be overridden for specific URLs (http://www-01.ibm.com/support/docview.wss?uid=swg1PM94198):
SetEnvIf Request_URI "\.jsp$" websphere-serveriotimeout=10
By default, if a ServerIOTimeout fires, then the plugin will re-send non-affinity (http://www-01.ibm.com/support/docview.wss?uid=swg21450051) requests to the next available server in the cluster. If, for example, the request exercises a bug in the application that causes an OutOfMemoryError, then after each timeout the request will be sent to the next server in the cluster, and if the behavior is the same on each, the result is effectively a complete, cascading failure. This behavior can be controlled with ServerIOTimeoutRetry:
ServerIOTimeoutRetry specifies a limit for the number of times the HTTP plugin retries an HTTP request that has timed out, due to ServerIOTimeout. The default value, -1, indicates that no additional limits apply to the number of retries. A 0 value indicates there are no retries. Retries are always limited by the number of available servers in the cluster. Important: This directive does not apply to connection failures or timeouts due to the HTTP plug-in ConnectTimeout. (https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.nd.doc/ae/rwsv_plugincfg.html)
The resolution of ServerIOTimeout may be affected by MaxSpareThreads. If ServerIOTimeout is taking longer than expected to fire, review the recommendations on MaxSpareThreads above and consider configuring it so that threads are not destroyed.
Retries
When will the WAS Plug-in retry a request: http://publib.boulder.ibm.com/httpserv/ihsdiag/plugin_questions.html#retry
Load Distribution
Use LogLevel="Stats" to print load distribution in the plugin log after each request (see page 28): http://www-01.ibm.com/support/docview.wss?uid=swg27020055&aid=1
There is an option called BackupServers which was used with WAS version 5 for DRS HTTP session failover, so this option is generally not used any more.
Common causes of uneven distribution include differences in network performance, retransmission rates, or packet loss; different network paths; and different DNS resolution times.
MaxConnections
"MaxConnections limits the number of connections the WAS Plug-in will open to a single application server from a single webserver child process. In practice, the per-process limitation severely limits the ability to pick a useful number.
- Crossing MaxConnections does not result in a markdown.
- MaxConnections applies even to affinity requests.
- It is usually better to drastically reduce the TCP listen backlog in the application server and reject workload that way"
https://publib.boulder.ibm.com/httpserv/ihsdiag/plugin_questions.html#maxconns
In addition:
"The use of the MaxConnections parameter in the WebSphere plug-in configuration is most effective when IBM HTTP Server 2.0 and above is used and there is a single IHS child process. However, there are some operational tradeoffs to using it effectively in a multi-process webserver like IHS.
It is usually much more effective to actively prevent backend systems from accepting more connections than they can reliably handle, performing throttling at the TCP level. When this is done at the client (HTTP Plugin) side, there is no cross-system or cross-process coordination which makes the limits ineffective.
Using MaxConnections with more than 1 child process, or across a webserver farm, introduces a number of complications. Each IHS child process must have a high enough MaxConnections value to allow each thread to be able to find a backend server, but in aggregate the child processes should not be able to overrun an individual application server."
https://publib.boulder.ibm.com/httpserv/ihsdiag/ihs_performance.html#MAXCONN
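The tension described above can be made concrete with some arithmetic. Because MaxConnections applies per child process, the worst case seen by one application server is the product of MaxConnections, the number of child processes, and the number of web servers. The following Python is a sketch under that reading; the function names and the topology numbers are hypothetical.

```python
def aggregate_limit(max_connections, child_processes, web_servers):
    """Worst-case connections one application server may receive from the farm."""
    return max_connections * child_processes * web_servers


def per_process_capacity(max_connections, app_servers):
    """Most backend connections one child process may hold across the cluster."""
    return max_connections * app_servers


# Hypothetical topology: 4 web servers, 10 child processes each
# (ServerLimit), 100 threads per child, and 3 application servers.
max_connections = 50
worst_case = aggregate_limit(max_connections, child_processes=10, web_servers=4)
capacity = per_process_capacity(max_connections, app_servers=3)

print("worst case per application server:", worst_case)
print("every thread can find a backend:", capacity >= 100)
```

In this example, a MaxConnections value that lets every thread in a child process find a backend (150 >= 100) still allows up to 2000 connections to a single application server, which illustrates why throttling at the application server's listen backlog is usually preferred.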
When MaxConnections is reached, an affinity or non-affinity request will print the following to http_plugin.log:
WARNING: ws_server_group: serverGroupCheckServerStatus: Server $server has reached maximum connections and is not selected
To monitor MaxConnections usage, consider using LogLevel="Stats". If the resulting logging needs to be filtered, consider using piped logging to a script and filter as needed. An alternative monitoring option is to look at network sockets (e.g. Linux ss); however, connections are pooled so this doesn't give insight into actively used connections.
WebSphere Caching Proxy (WCP)
The WebSphere Caching Proxy (WCP) is optimized to store and serve cacheable responses from a backend application. WCP is primarily configured through the ibmproxy.conf file: https://www.ibm.com/support/knowledgecenter/SSAW57_8.5.5/com.ibm.websphere.edge.doc/cp/admingd45.html
The CacheQueries directive may be specified multiple times with different patterns of URLs whose content may be cached. URL patterns may be excluded with the NoCaching directive.
CacheQueries PUBLIC http://*/*
NoCaching http://*/files/form/anonymous/api/library/*/document/*/media/*
HTTP responses may be GZIP compressed based on MIME type:
CompressionFilterEnable /opt/ibm/edge/cp/lib/mod_z.so
CompressionFilterAddContentType text/html
The CacheMemory directive specifies the maximum amount of native memory each WCP process may use for in-memory caching. This will be limited by the operating system, whether the process is 32-bit or 64-bit, shared libraries, and other constraints.
CacheMemory 1000 M
WCP has a thread pool whose size should, for example, match or exceed MaxClients in the downstream web server(s):
MaxActiveThreads 700
In general, it is recommended to pool the connections to the backend servers (such as web servers) to avoid the cost of constantly establishing and closing those connections.
ServerConnPool on
The time idle connections in this pool are held open is controlled with ServerConnTimeout and ServerConnGCRun.
By default, WCP will not cache responses with expiration times within the CacheTimeMargin. If you have available memory, disable this:
CacheTimeMargin 0
Load Balancers
Some load balancers are configured to keep affinity between the client IP address and particular web servers. This may be useful to simplify problem determination because the set of requests from a user will all be in one particular web server. However, IP addresses do not always uniquely identify a particular user (e.g. NAT), so this type of affinity can distort the distribution of requests coming into the web servers and it is not functionally required because the WAS plugin will independently decide how to route the request, including looking at request headers such as the JSESSIONID cookie if affinity is required to a particular application server.
Load balancers often have a probe function which will mark down back-end services if they are not responsive to periodic TCP or HTTP requests. In one example, services were marked down because the load balancer itself was performing TLS negotiation, exhausted its CPU, and then could not process the response quickly enough.
WebSphere Load Balancer
WebSphere Edge Components Load Balancer balances the load of incoming requests. It is sometimes called eLB for Edge Load Balancer or ULB for Userspace Load Balancer (in contrast to the older "IPv4" version that required kernel modules on every platform).
There is an IBM WebSphere Edge Load Balancer for IPv4 and IPv6 Data Collection Tool to investigate LB issues.
nginx
Containers
Forward Proxy One-liner
docker run --rm --entrypoint /bin/sh --name nginx -p 8080:80 -it nginx -c "printf 'server { listen 80 default_server; listen [::]:80 default_server; server_name _; location / { resolver %s; proxy_pass \$scheme://\$http_host\$request_uri; } }' \$(awk '/nameserver/ {print \$2}' /etc/resolv.conf) > /etc/nginx/conf.d/default.conf; cat /etc/nginx/conf.d/default.conf; /docker-entrypoint.sh nginx -g 'daemon off;';"
curl --proxy http://localhost:8080/ http://example.com/
To print debug to stdout, sed the log level to debug and use the -debug nginx binary:
docker run --rm --entrypoint /bin/sh --name nginx -p 8080:80 -it nginx -c "printf 'server { listen 80 default_server; listen [::]:80 default_server; server_name _; location / { resolver %s; proxy_pass \$scheme://\$http_host\$request_uri; } }' \$(awk '/nameserver/ {print \$2}' /etc/resolv.conf) > /etc/nginx/conf.d/default.conf; cat /etc/nginx/conf.d/default.conf; sed -i 's/notice/debug/g' /etc/nginx/nginx.conf; /docker-entrypoint.sh nginx-debug -g 'daemon off;';"
HAProxy
Keep-Alive
The default http-reuse strategy is safe. Consider whether aggressive or always are acceptable (as discussed in the manual and blog) as they generally provide better performance.