Memory leak mustgather

Known issues to check first

Prior to PH42111 (8.5.5.22, 9.0.5.11), Intelligent Management (odrlib) in the WAS Plug-in can leak memory proportional to the number of HTTP requests handled.

Potential leak with MaxKeepAliveRequests = 0

While not confirmed, memory leaks may be a concern in configurations with MaxKeepAliveRequests 0 that also have one or more of the following additional risk properties:

  • MaxRequestsPerChild non-zero and expected to frequently trigger

  • A proxy in front of IHS that will keep connections open indefinitely

  • HostnameLookups ON or HostnameLookups double

  • Using the event MPM, which is the default with 8.5.5 or 9.0 on z/OS and with 9.0 on Linux

Essentially, some operations result in memory allocated per-request that can only be released when the underlying connection is released.

If setting MaxKeepAliveRequests to a non-zero value solves a problem for you, please contact IBM support.

Leak in "sidd" process

7.0.0.41, 8.0.0.13, 8.5.5.9-8.5.5.11, and 9.0.0.0-9.0.0.2 can leak slowly in the sidd daemon. The fix in PI73661 is needed to address this slow leak. Note, this memory will never show up in an "httpd" process. Be sure not to confuse this APAR with PI66787, which slows, but does not remove, the leak.

Memory allocation errors with high ThreadsPerChild on 64-bit AIX

64-bit IHS builds on AIX mistakenly shipped with a default MAXDATA setting in bin/envvars that limits overall heap size to around 2GB. While this does not cause a leak, it can turn virtual address space size growth into memory allocation failures. The line should be commented out on 64-bit IHS installs that use a non-default ThreadsPerChild or otherwise have high heap memory requirements.

High memory at startup

Higher memory use at startup in one environment compared to others is not necessarily a leak. It may be caused by a missing SSLConsolidate="True" in the higher-memory environment.

Leak with Intelligent Management enabled before 8.5.5.4

Various leaks are present in the Intelligent Management for webservers (IM) feature of the WAS Webserver Plug-in until 8.5.5.4. One particularly large leak occurs when /server-status is accessed with IM enabled.

Leak with SSL enabled w/ GSKit 8.0.50.13/8.0.50.17 PI13422

IHS V8 with SSL enabled can slowly leak memory on Distributed platforms prior to PI13422 when interim fixes that include GSKit 8.0.50.13 or 8.0.50.17 are installed. In that case, also apply PI13422, which provides GSKit 8.0.50.18.

Leak with SSL enabled w/ IHS V8 before PM85211

IHS V8 can slowly leak memory w/ SSL enabled on Distributed, non-Windows platforms prior to PM85211. As a workaround, set SSLCacheDisable at the bottom of your IHS configuration.

Many RewriteCond/RewriteRule directives, compounded by long URLs or large ThreadsPerChild

Any time a variable is expanded or a backreference is captured in a RewriteCond or RewriteRule directive, that string is duplicated and the memory needed for it is reserved for future use on the same Apache thread.

If very large URLs are used, and they match the expression in hundreds or thousands of RewriteRule directives, IHS can appear to leak as each thread accumulates storage to be used later. In extreme situations, this can lead to an out of memory crash.

Solution: If enormous numbers of mod_rewrite directives are needed, there are several ways to minimize the impact.

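  • Do as much matching as possible in the RewriteRule pattern itself. A RewriteCond is evaluated only when the pattern of its RewriteRule matches, so a specific pattern protects the conditions from evaluation (avoid RewriteRule .*)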

  • Limit the unnecessary use of potentially large variables such as %{REQUEST_URI}

  • Use the [L] flag to prevent unnecessary processing of subsequent rules

  • Avoid unnecessary captures. Sometimes, non-capturing groups, such as '(?:foo|bar)', are an alternative.

  • Combine redundant rules and conditions using more robust regular expressions

  • Limit the maximum URI length with LimitRequestLine

  • Set MaxMemFree to avoid saving the storage allocated for large strings.

  • Reduce ThreadsPerChild if using a 32-bit webserver and crashing.

Note, this is not actually a leak, just high memory usage.
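To get a rough sense of how many mod_rewrite directives a configuration carries, a count like the one below may help. This is only a sketch: the function name count_rewrite_directives is made up for illustration, and it scans only the single file it is given (it does not follow Include directives or read .htaccess files).

```shell
# Count active (non-commented) RewriteCond and RewriteRule lines in one file.
# count_rewrite_directives is a hypothetical helper name, not an IHS tool.
count_rewrite_directives() {
    # Directive names are case-insensitive, so match with -i.
    grep -c -i -E '^[[:space:]]*Rewrite(Cond|Rule)[[:space:]]' "$1"
}

# Example (path is an assumption):
#   count_rewrite_directives /opt/IBM/HTTPServer/conf/httpd.conf
```

A count in the hundreds or thousands, combined with long URLs, suggests the suggestions above are worth applying.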

32-bit webserver on any platform / any operating system

Exceeding 2000 ThreadsPerChild puts any 32-bit server at risk of exhausting all address space available in a single process. When no more memory is addressable, allocations will begin to fail and usually result in crashes.

2000 is not a magic number, and the exact limits on address space vary by system just as exact address space usage varies by configuration and workload.

If you have a high memory condition and recently drastically increased ThreadsPerChild, it's no coincidence.

Note, this is not actually a leak, just high memory usage.

AIX, any configuration

The IHSROOT/bin/envvars file on AIX ships with performance tuning for the native heap library. Many software products have found that this configuration can lead to fragmentation and the appearance of a memory leak.

Append the following configuration to IHSROOT/bin/envvars:

unset MALLOCMULTIHEAP
MALLOCTYPE=buckets
export MALLOCTYPE

Leaks with third-party module "BMC Web Access Manager"

The module "BMC Web Access Manager" (ds_wac_module) has demonstrated an unbounded leak under some circumstances (at least 5.7.1 and 5.7.2).

IBM support will not look at any alleged memory leak issue if this module is loaded at the time of the leak.

z/OS with SSLClientAuth configured

To avoid a memory leak, PK79915 is necessary on z/OS with SSLClientAuth required.

Linux, with ThreadsPerChild > 100

On a 64-bit build with no per-process memory limits, high ThreadsPerChild with a large or unlimited ulimit -s will just cause a high "VSZ" (virtual address space size). On a 32-bit build, or a 64-bit build on a system that limits the address space size, this can result in a crash.

Note, this is not actually a leak or even high memory usage, just high (virtual) address space size.
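As a rough illustration of the arithmetic: each thread reserves address space for its stack, and on Linux the default thread stack size is typically derived from ulimit -s. The sketch below simply multiplies the two. The function name stack_reservation_mb is hypothetical, it expects a numeric stack size in KB (not "unlimited"), and the result estimates reserved stack address space per child process, not memory actually in use.

```shell
# Estimate the address space reserved for thread stacks in one child process.
#   $1 = ThreadsPerChild
#   $2 = per-thread stack size in KB (for example, the output of `ulimit -s`)
# stack_reservation_mb is a hypothetical helper name.
stack_reservation_mb() {
    threads=$1
    stack_kb=$2
    echo $(( threads * stack_kb / 1024 ))
}

# Example: 1000 threads with an 8192 KB stack limit
#   stack_reservation_mb 1000 8192
```

Under these assumptions, 1000 threads with an 8192 KB stack limit reserve about 8000 MB of address space, which already exceeds what a 32-bit process can address.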

Any operating system, system-wide memory use appears to grow proportional to access log file size

This is a non-issue. Operating systems buffer recently accessed files in memory, and purge these buffers when the memory is needed.

  • On AIX, check the output of vmstat -v | grep numperm over time. This is the % of memory used by filesystem buffers.

  • On Linux, check the output of free -h over time.

Note, this is not actually a leak, just high memory usage.
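On Linux, the file cache figure can also be read directly from /proc/meminfo. The sketch below prints the "Cached:" value in KB. The function name page_cache_kb is hypothetical, and the optional file argument exists only so the function can be pointed at a saved copy of meminfo.

```shell
# Print the size of the Linux page cache in KB.
#   $1 = optional path to a meminfo-format file (defaults to /proc/meminfo)
# page_cache_kb is a hypothetical helper name.
page_cache_kb() {
    awk '$1 == "Cached:" { print $2 }' "${1:-/proc/meminfo}"
}

# Example: log the value once a minute while the access log grows
#   while :; do date; page_cache_kb; sleep 60; done
```

If this value tracks the growth of the access log while per-process RSS stays flat, the growth is filesystem buffering rather than a leak.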

Confirm that there is a memory leak

If growth of both RSS and VSZ can be demonstrated, it's important to verify that the memory usage never flattens out, even after many tens of thousands of hits. Any report of a memory leak must show both RSS and VSZ, for a fixed set of processes, over time. Reports of growth in "system memory" or "free memory" cannot be worked by support.
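One way to turn a series of samples into a per-process growth figure is sketched below. It assumes a hypothetical samples file with one line per measurement in the form "epoch-seconds pid vsz-kb rss-kb". That format and the function name summarize_growth are illustrative assumptions, not something produced by IHS or ihsdiag.

```shell
# Summarize VSZ and RSS growth per PID from a samples file.
# Assumed (hypothetical) input format, one sample per line:
#   <epoch-seconds> <pid> <vsz-kb> <rss-kb>
summarize_growth() {
    awk '
        # Remember the first sample seen for each PID.
        !($2 in first_vsz) { first_vsz[$2] = $3; first_rss[$2] = $4; first_t[$2] = $1 }
        # Always remember the most recent sample for each PID.
        { last_vsz[$2] = $3; last_rss[$2] = $4; last_t[$2] = $1 }
        END {
            for (pid in first_vsz) {
                printf "%s vsz_delta_kb=%d rss_delta_kb=%d seconds=%d\n",
                    pid, last_vsz[pid] - first_vsz[pid],
                    last_rss[pid] - first_rss[pid],
                    last_t[pid] - first_t[pid]
            }
        }
    ' "$1" | sort -n
}
```

A process whose deltas keep increasing across successive summaries, for the same PID, is the kind of evidence described above. A delta that stops changing suggests the process has reached steady state.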

There are several ways to monitor per-process memory growth, but the overall goals are the same -- quantify the rate of unbounded growth and selectively disable features until the growth stops.

Before you start

Before you start, consult the "known issues to check first" at the top of this page and make sure each has been ruled out.

To properly monitor memory growth, ensure MaxRequestsPerChild is zero and MaxSpareThreads is equal to MaxClients. This will ensure that processes stick around long enough to be measured.
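A quick way to review the relevant settings is to list them with their line numbers, as sketched below. The function name show_mpm_settings is hypothetical. The sketch only lists matching lines in the one file it is given: it does not follow Include directives and does not work out which IfModule block is active, so the output still has to be read by a person.

```shell
# List active (non-commented) MaxRequestsPerChild, MaxSpareThreads, and
# MaxClients lines, with line numbers, from a single configuration file.
# show_mpm_settings is a hypothetical helper name.
show_mpm_settings() {
    grep -n -i -E '^[[:space:]]*(MaxRequestsPerChild|MaxSpareThreads|MaxClients)[[:space:]]' "$1"
}

# Example (path is an assumption):
#   show_mpm_settings /opt/IBM/HTTPServer/conf/httpd.conf
```

For a measurement run, the active block should show MaxRequestsPerChild 0 and the same value for MaxSpareThreads and MaxClients.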

You should identify a process that has been using high and increasing memory over the course of many thousands of requests, then measure its further growth.

Collecting data with ihsdiag

ihsdiag contains a tool that collects system information and monitors memory usage over time. It must be run while httpd memory growth is apparent, long after steady state has been reached. It is not supported on Windows or HP-UX.

Note: The example below will run for 15 minutes. If your server's memory does not grow noticeably in 15 minutes, choose a longer total monitoring time (replacing 900 with some larger number of seconds).

$ java -jar ServerDoc.jar MonitorMemory
usage: java -jar ServerDoc.jar MonitorMemory {ServerInstallPath} [interval total-monitor]

$ java -jar ServerDoc.jar MonitorMemory ~/SRC/2.2.8/built 30 900
Web server version: 8.5.5.3
Available local GSKit version: 8.0.50.13 (32-bit)
Available global GSKit version: 7.0.4.45 (32-bit)
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Sleeping 5 seconds before checking memory use again...
Reports, log files, and configuration files have been saved to directory
  MemoryUse.201407221722
If you have additional log files or configuration files, copy them there
before packing up the directory.
Web server log and conf files other than the default will have to be
copied manually.
WebSphere plug-in conf and log files will have to be copied manually.

Collecting data manually

If you want to measure memory usage of httpd processes over time manually, run the ps command for your platform in a loop.

Platform  Memory command
AIX       ps -A -o pid,ppid,vsz,rssize,pmem,pcpu,args|egrep '(httpd|sidd|rotatelogs|PID)'|grep -v grep
Solaris   ps -A -o pid,ppid,vsz,rss,pmem,comm|egrep '(httpd|sidd|rotatelogs|PID)'|grep -v grep
Linux     ps -eo pid,ppid,vsize,rss,resident,size,share,pcpu,cmd|egrep '(httpd|sidd|rotatelogs|PID)'|grep -v grep
z/OS      ps -A -o pid,ppid,vsz,vsz64,rss,args|egrep '(httpd|sidd|rotatelogs|PID)'|grep -v grep
Windows   pslist -m

These can be run in a loop with a basic shell command, such as:

while ps -eo pid,ppid,vsize,rss,resident,size,share,pcpu,cmd|egrep '(httpd|sidd|rotatelogs)'; do date && sleep 30; done
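A variation that stamps every row with the sample time makes the log easier to analyze later. The sketch below is an assumption-laden example, not part of ihsdiag: the function name stamp_ihs_rows and the log path are hypothetical. It reads ps output on standard input and keeps only rows that mention httpd, sidd, or rotatelogs. The bracketed patterns keep the filter from matching its own command line.

```shell
# Read `ps` output on stdin and print only IHS-related rows, each prefixed
# with the current timestamp. stamp_ihs_rows is a hypothetical helper name.
stamp_ihs_rows() {
    now=$(date '+%Y-%m-%dT%H:%M:%S')
    # [h]ttpd matches "httpd" but not this awk command's own arguments.
    awk -v now="$now" '/[h]ttpd|[s]idd|[r]otatelogs/ { print now, $0 }'
}

# Example loop (Linux), appending one sample every 30 seconds:
#   while :; do
#       ps -eo pid,ppid,vsize,rss,pcpu,cmd | stamp_ihs_rows >> /tmp/ihs-mem.log
#       sleep 30
#   done
```

Stop the loop with Ctrl-C once enough samples have been collected.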

Identifying a culprit

The following steps are required to help support identify the cause of a memory leak. When opening a PMR, be sure to state which (if any) of the diagnostic steps have been taken and their impact on the symptom.

NOTE: Resolving the "known issues to check for first" is step 1! These are frequent issues or configurations that resemble memory leaks.

  • Disable any third-party modules and check for the presence of the leak.

  • Disable any optional features (compression, SSL) that the site can still operate without.

  • Disable the WebSphere Plugin ESI cache.

  • Try to identify the level of IHS that first leaks under your configuration.

  • Check if the presence of the leak is a function of the version of the WebSphere Plugin used.

  • Configure MaxMemFree 0 (see above) and see if there is any change in behavior.

As a last resort, if every preceding bullet in the prior sections yields no progress, generate a core of a high memory process, with a minimal configuration, and run a CrashDoc on it. If the items in the "known issues to check for first" and the preceding bullets have not been vetted and documented, no analysis of the corefile will be done.

valgrind

Valgrind is a general-purpose tool for detecting memory leaks and memory misuse. It is not necessarily suitable for production use, and it has a few downsides:

  • It can be easily confused by the way the core of Apache pools memory

  • It produces a lot of false positives

  • It has a significant startup and runtime performance overhead

Steps to collect valgrind diagnostics to determine the source of a memory leak:

  • Install valgrind on your system. valgrind is available as part of most enterprise Linux distributions in the main repositories.

  • Make a copy of your apachectl script, named "apachectl.valgrind".

  • Edit "apachectl.valgrind" and change the line beginning with "HTTPD=" to add a new command to the front of the string:

    sed -i -e "s@HTTPD='@HTTPD='valgrind --leak-check=full --trace-children=yes --log-file=/tmp/valgrind-%p.log @g" bin/apachectl.valgrind

  • Start IHS via "bin/apachectl.valgrind"

  • After abnormal memory growth is observed, stop IHS with "bin/apachectl stop"

  • Zip up /tmp/valgrind-*.log and send it to support.
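The copy and edit steps above can be combined into one small helper, sketched below. The function name make_valgrind_apachectl is hypothetical. Like the sed command above, it assumes apachectl contains a line beginning with HTTPD=' and it writes the modified copy next to the original without changing the original.

```shell
# Create bin/apachectl.valgrind from bin/apachectl, prefixing the httpd
# command with valgrind. make_valgrind_apachectl is a hypothetical helper name.
#   $1 = path to the IHS bin directory
make_valgrind_apachectl() {
    bindir=$1
    sed -e "s@HTTPD='@HTTPD='valgrind --leak-check=full --trace-children=yes --log-file=/tmp/valgrind-%p.log @" \
        "$bindir/apachectl" > "$bindir/apachectl.valgrind" &&
    chmod +x "$bindir/apachectl.valgrind"
}

# Example (install path is an assumption):
#   make_valgrind_apachectl /opt/IBM/HTTPServer/bin
#   /opt/IBM/HTTPServer/bin/apachectl.valgrind start
```

Writing to a separate file leaves the normal apachectl untouched, so the server can be restarted without valgrind once the logs have been collected.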

Short term relief

  • By default, each request processing thread sets aside the maximum amount of memory it has ever needed to serve a response. The MaxMemFree directive can be used to limit the amount of memory set aside. Setting this value too low (0, 128, 256) causes some extra CPU usage as each thread must fight contention in the native heap library when it needs additional memory. If this has an effect on your memory leak, it implies the memory being leaked is "APR pool memory", and not native heap memory, which can help identify the culprit.

  • Setting MaxRequestsPerChild to a non-zero value (e.g. 10000) causes IHS child processes to routinely be cleaned up, which can alleviate the impact of slow memory leaks.