Diagnosing suspected problems with mod_deflate

Problems with HTTP compression and mod_deflate

IBM HTTP Server mod_deflate defects

None in modern maintenance levels.

Common problems to check for first

Some common configuration problems can lead to uncompressed content.

Some proxy servers, such as WebSEAL, can remove the Accept-Encoding request header, which makes every client request ineligible for compression. Logging the Accept-Encoding (and Via) header tells you whether the request was eligible for compression:

 CustomLog logs/deflate-debug.log "%h %l %u %t \"%r\" %>s %b %D %{RH}e %{WAS}e AE=%{Accept-Encoding}i CT=%{Content-Type}o CE=%{Content-Encoding}o %{Via}i %{no-gzip}e"

Cache poisoning when using BrowserMatch to block compression for some User-Agents.

For some time the Apache manual included a block of directives that encouraged second-guessing which browsers could receive compressed content. Unfortunately, to do this safely you must also add Vary: User-Agent to the response, which makes most caching solutions ineffective. Those directives have been removed from the Apache and IHS manuals and should be removed from any configuration that still uses them.
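
As a rough way to act on the first check above, the AE= field written by that CustomLog format can be tallied with a short script. This is a minimal sketch, not an IBM-provided tool: the function names and the sample log lines are made up for illustration, and it assumes the log format is exactly the one shown above. It also ignores q-values, so "gzip;q=0" would be counted as accepting gzip.

```python
import re
from collections import Counter

# Matches the AE=... field written by the CustomLog format shown above.
# The value may contain spaces ("gzip, deflate"), so read up to " CT=".
AE_FIELD = re.compile(r" AE=(?P<ae>.*?) CT=")


def accept_encoding(line):
    """Return the logged Accept-Encoding value, or None if it was absent ("-")."""
    match = AE_FIELD.search(line)
    if match is None or match.group("ae") == "-":
        return None
    return match.group("ae")


def tally(lines):
    """Count requests by whether they advertised gzip support."""
    counts = Counter()
    for line in lines:
        value = accept_encoding(line)
        if value is None:
            counts["no Accept-Encoding"] += 1
        elif "gzip" in value.lower():  # simplistic: q-values are not parsed
            counts["gzip accepted"] += 1
        else:
            counts["other encodings only"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; real ones come from logs/deflate-debug.log.
    sample = [
        '10.0.0.5 - - [01/Jan/2024:10:00:00 +0000] "GET /a.css HTTP/1.1" 200 512 '
        '1500 - - AE=gzip, deflate CT=text/css CE=gzip - -',
        '10.0.0.9 - - [01/Jan/2024:10:00:01 +0000] "GET /a.css HTTP/1.1" 200 2048 '
        '1400 - - AE=- CT=text/css CE=- 1.1 proxy.example.com -',
    ]
    for label, count in tally(sample).items():
        print(label, count)
```

A large "no Accept-Encoding" count on requests that also carry a Via value suggests that an intermediary is stripping the header.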

Symptoms

Common symptoms are blank pages or error messages from the browser, JavaScript execution failures, or problems displaying PDF files in Adobe Acrobat.

Configuring mod_deflate to refuse to compress certain types of content

It is simple to disable compression based on the URI. Here are directives which disable compression for all URIs ending in .pdf or .jpg (upper or lower case):

 SetEnvIfNoCase Request_URI "\.pdf$" no-gzip
 SetEnvIfNoCase Request_URI "\.jpg$" no-gzip
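
To sanity-check which request paths these two patterns would catch, the same regular expressions can be tried outside the server. The sketch below uses Python's re module as a stand-in for the server's regex engine, which is only an approximation; the function name and the sample paths are invented for illustration.

```python
import re

# The two patterns from the SetEnvIfNoCase directives above.
# re.IGNORECASE approximates the case-insensitive matching of SetEnvIfNoCase.
NO_GZIP_PATTERNS = [
    re.compile(r"\.pdf$", re.IGNORECASE),
    re.compile(r"\.jpg$", re.IGNORECASE),
]


def would_set_no_gzip(request_uri):
    """Return True if either pattern matches the given URI path."""
    return any(p.search(request_uri) for p in NO_GZIP_PATTERNS)


if __name__ == "__main__":
    # Hypothetical paths, chosen to show what the anchored patterns miss.
    for path in ["/docs/manual.pdf", "/img/LOGO.JPG",
                 "/img/photo.jpeg", "/docs/manual.pdf.html"]:
        print(path, would_set_no_gzip(path))
```

Because the patterns are anchored with $, a path ending in .jpeg is not matched by "\.jpg$"; a further directive would be needed if such files should also be excluded.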

Here are directives to disable compression for a specific URI when the client is Internet Explorer:

<Location /path/to/foo.css>
BrowserMatch MSIE no-gzip
</Location>

Confirming that the HTTP response from IBM HTTP Server is correct

Ensure that required IBM HTTP Server fixes have been applied.

The most likely cause is that the browser or a browser plug-in cannot handle compressed data in the specific context. It may be unable to uncompress certain media types at all, unable to uncompress them when they are received in a certain order, or subject to some other limitation. The problem can also depend on whether SSL is in use.

Some theoretical problems that could be caused by mod_deflate are:

the response body isn't compressed properly
the proper HTTP headers aren't specified, so that the browser doesn't realize that the response body should be uncompressed

Here are the steps for determining whether or not mod_deflate generated the proper response to deliver to the browser:

Identify a test case (a particular request) that reproduces the symptom. Ideally the test case shows the symptom consistently rather than intermittently.
Set up mod_net_trace to trace the data flows between IBM HTTP Server and a particular client (IP address) that will be used to reproduce the problem. Configure mod_net_trace according to these instructions. On the NetTrace directive, be sure to specify the IP address of the client that you'll use for reproducing the problem, as well as a large senddata value so that the entire response is traced.

Example:
```
NetTraceFile /tmp/nettrace
NetTrace client 111.222.333.444 dest file event senddata=5000000 event recvdata=1024
```
(The request body is rarely a factor in compression problems, so only the first 1024 bytes of it, if any, are traced.)

Also, it is recommended that you set LogLevel to Debug and use the DeflateFilterNote directive to log the request, compression ratio, and user agent string from the browser (see the DeflateFilterNote documentation).
Many headers and internal variables affect whether a response is compressed. Appending the following string to the LogFormat in use ensures that they are recorded in the access log:

Age=%{AGE}o VIA=%{Via}i AE=%{Accept-Encoding}i CT=%{Content-Type}o CE=%{Content-Encoding}o NG=%{no-gzip}e GOT=%{gzip-only-text/html}e %{ETag}o
With these configuration changes in place, restart IBM HTTP Server and reproduce the problem from the browser. After the problem has been reproduced, close the browser to ensure that the connections from the browser are closed and copy the trace file (/tmp/nettrace in the example above) to another location so that additional browser traffic isn't written to the trace file that we'll examine next.
Disable mod_net_trace and revert LogLevel to the prior setting and then restart IBM HTTP Server.
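
Once the extra LogFormat fields above are in the access log, a script can suggest why a given response was not compressed. The following is a minimal sketch under stated assumptions: it expects the fields in exactly the order of the string above, treats "-" as "not set", and the function names and sample line are hypothetical. The reasons it reports are common ones, not a complete account of how mod_deflate decides.

```python
import re

# Field layout taken from the LogFormat string above; values for VIA, AE
# and CT may contain spaces, so each is read up to the next field label.
FIELDS = re.compile(
    r"Age=(?P<age>\S+) VIA=(?P<via>.*?) AE=(?P<ae>.*?) CT=(?P<ct>.*?) "
    r"CE=(?P<ce>\S+) NG=(?P<ng>\S+) GOT=(?P<got>\S+)"
)


def parse_fields(line):
    """Return the logged fields as a dict, with "-" mapped to None."""
    match = FIELDS.search(line)
    if match is None:
        return None
    return {k: (None if v == "-" else v) for k, v in match.groupdict().items()}


def why_uncompressed(fields):
    """List likely reasons a response carried no Content-Encoding."""
    if fields["ce"] is not None:
        return []  # the response was sent with a Content-Encoding
    reasons = []
    if fields["ae"] is None:
        reasons.append("request had no Accept-Encoding header")
    elif "gzip" not in fields["ae"].lower():
        reasons.append("Accept-Encoding did not list gzip")
    if fields["ng"] is not None:
        reasons.append("no-gzip environment variable was set")
    content_type = (fields["ct"] or "").lower()
    if fields["got"] == "1" and not content_type.startswith("text/html"):
        reasons.append("gzip-only-text/html was set to 1 and "
                       "Content-Type is not text/html")
    return reasons


if __name__ == "__main__":
    # Hypothetical tail of an access log line using the fields above.
    sample = 'Age=- VIA=1.1 proxy.example.com AE=- CT=text/css CE=- NG=- GOT=- "abc123"'
    for reason in why_uncompressed(parse_fields(sample)):
        print(reason)
```

An empty list for a response without Content-Encoding means none of these logged fields explains the result, and the network trace is the next place to look.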

The file with the network trace (e.g., /tmp/nettrace) is the key file to send to IBM HTTP Server support. But additional verification can be done by the customer relatively quickly using the ServerDoc tool to parse the network trace.
(The ServerDoc tool is part of the ihsdiag download package.)

Here is an example where the input file is /tmp/nettrace and the results of parsing it are to be stored in a new directory called /tmp/nettrace.parsed:

$ java -jar /tmp/ServerDoc.jar ParseNetTrace /tmp/nettrace /tmp/nettrace.parsed
 checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/0/sent.body.0
 checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/1/sent.body.0
 checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/1/sent.body.1
 checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/1/sent.body.2

(In this example trace, there were four compressed response bodies.)

If the compressed data is invalid and could cause a problem for the browser, errors will be encountered and displayed by ServerDoc, as in the following example:

$ java -jar /tmp/ServerDoc.jar ParseNetTrace /tmp/nettrace.bad /tmp/nettrace.bad.parsed
 checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/0/sent.body.0
    /tmp/nettrace.bad.parsed/127.0.0.1/0/sent.body.0 is not properly gzipped!
    java.util.zip.ZipException: invalid bit length repeat
 checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/1/sent.body.0
 checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/1/sent.body.1
 checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/1/sent.body.2

(For this example, we took a valid network trace generated by mod_net_trace but replaced some of the hex data in the trace file with a different sequence of bytes to simulate a corrupted response.)

Beyond the automatic gzip integrity checking performed by ServerDoc, the response headers and the uncompressed data may need to be examined as well. The response headers will be created by ServerDoc in files called sent.hdr.0, sent.hdr.1, and so on. The header field Content-Encoding must be present whenever the response body is compressed. ServerDoc will not try to check the gzip integrity of responses that did not contain the Content-Encoding header field, so gzipped bodies that weren't checked by ServerDoc possibly have invalid or missing header information.
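
The gap described above, a gzipped body that ServerDoc skipped because Content-Encoding was missing, can be looked for with a short script. This is a sketch only. It assumes that each sent.hdr.N file holds the raw response header text, that it sits in the same directory as the matching sent.body.N file, and that the body file starts with the raw payload; all three should be confirmed against actual ServerDoc output. The function names are hypothetical. The check relies on the fact that a gzip stream begins with the bytes 0x1f 0x8b.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def has_content_encoding(header_text):
    """Return True if the header text contains a Content-Encoding field."""
    for line in header_text.splitlines():
        name, sep, _value = line.partition(":")
        if sep and name.strip().lower() == "content-encoding":
            return True
    return False


def suspicious_bodies(parsed_dir):
    """Yield sent.body.N paths that look gzipped but lack Content-Encoding."""
    for body in sorted(Path(parsed_dir).rglob("sent.body.*")):
        # Assumed pairing: sent.body.N <-> sent.hdr.N in the same directory.
        hdr = body.with_name(body.name.replace("sent.body.", "sent.hdr.", 1))
        if not hdr.is_file():
            continue
        with body.open("rb") as handle:
            magic = handle.read(2)
        header_text = hdr.read_text(encoding="latin-1")
        if magic == GZIP_MAGIC and not has_content_encoding(header_text):
            yield body


if __name__ == "__main__":
    # Directory name taken from the ParseNetTrace example above.
    for path in suspicious_bodies("/tmp/nettrace.parsed"):
        print("gzip data without Content-Encoding:", path)
```

Any path it prints is a candidate for the "invalid or missing header information" case and is worth including in what is sent to support.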

It is possible that the content was already incomplete when it was gzipped. In that case the response is valid gzip, yet information is missing (e.g., truncated) once the browser uncompresses it. To uncompress the response bodies and see what content was sent, use the gunzip utility.

$ gunzip < /tmp/nettrace.parsed/127.0.0.1/1/sent.body.0 > /tmp/uncompressed

The uncompressed data in the file /tmp/uncompressed will have to be examined by someone who knows what is expected in order to determine whether the data is truncated or otherwise malformed.
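
Where gunzip is not available, or as a quick first pass before manual review, the same decompression can be done in Python. The sketch below is an illustration, not part of ServerDoc: the function names and the closing-tag heuristic are assumptions, and a missing </html> only hints at truncation for HTML responses. The demonstration data is generated in memory rather than read from a trace.

```python
import gzip
import zlib


def inspect_gzip_body(data):
    """Decompress a gzip body and return (status, payload)."""
    try:
        payload = gzip.decompress(data)
    except EOFError:
        # The gzip stream itself ends early (no end-of-stream marker).
        return "gzip stream truncated", b""
    except (OSError, zlib.error) as exc:
        # Bad magic number, corrupt deflate data, or CRC mismatch.
        return "invalid gzip data: %s" % exc, b""
    return "ok", payload


def looks_complete_html(payload):
    """Heuristic only: does the uncompressed content end with </html>?"""
    return payload.rstrip().lower().endswith(b"</html>")


if __name__ == "__main__":
    # Made-up content standing in for a sent.body.N file.
    good = gzip.compress(b"<html><body>hello</body></html>")
    cut_before_gzip = gzip.compress(b"<html><body>hel")  # valid gzip, short content
    cut_after_gzip = good[:-12]                          # damaged gzip stream

    for name, data in [("good", good),
                       ("cut before gzip", cut_before_gzip),
                       ("cut after gzip", cut_after_gzip)]:
        status, payload = inspect_gzip_body(data)
        print(name, "->", status, "| complete HTML:", looks_complete_html(payload))
```

To run it against a parsed trace, read a sent.body.N file in binary mode and pass its bytes to inspect_gzip_body. The "cut before gzip" case is the one described above: the stream decompresses cleanly, so only a check of the content reveals the problem.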

If a problem is discovered in the data written by mod_deflate, or a problem is suspected in the HTTP headers, send the following documentation to IBM:

your modified httpd.conf
the request that led to the problem (i.e., "what URL?")
the network trace generated when recreating the problem
the uncompressed data that should have been sent to the browser
any messages written to IHS log files (e.g., access_log, error_log, any other custom logs) during the test case