Diagnosing suspected problems with mod_deflate¶
Problems with HTTP compression and mod_deflate¶
IBM HTTP Server mod_deflate defects¶
None in modern maintenance levels.
Common problems to check for first¶
Some common configuration problems can lead to uncompressed content.
Some proxy servers, such as WebSeal, can remove the Accept-Encoding request header, which makes all client requests ineligible for compression. Logging the Accept-Encoding (and Via) header tells you if the request was eligible for compression:
CustomLog logs/deflate-debug.log "%h %l %u %t \"%r\" %>s %b %D %{RH}e %{WAS}e AE=%{Accept-Encoding}i CT=%{Content-Type}o CE=%{Content-Encoding}o %{Via}i %{no-gzip}e"
Cache poisoning when using BrowserMatch to block compression for some User-Agents. The Apache manual for some time had a block of directives that encouraged second-guessing which browsers could receive compressed content. Unfortunately, to do this safely you must also add Vary: User-Agent to the response, which makes most caching solutions ineffective. These directives have been removed from the Apache and IHS manuals and should be removed from configurations where they're used.
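As a rough illustration of how the AE=, CT=, and CE= fields written by the CustomLog format above can be read, the following Python sketch classifies a single log line. It is not part of IBM HTTP Server. The function name classify and the category labels are made up for this example, and the sketch assumes the fields appear in the order shown in the format string, with a missing header logged as a hyphen.

```python
import re

# Matches the AE=/CT=/CE= fields of the example CustomLog format.
# Header values may contain spaces (e.g. "gzip, deflate"), so each field
# is matched lazily up to the next marker.
FIELDS = re.compile(r"AE=(?P<ae>.*?) CT=(?P<ct>.*?) CE=(?P<ce>\S+)")


def classify(line):
    """Return a rough label describing why a response was or was not compressed."""
    match = FIELDS.search(line)
    if match is None:
        return "unparsed"
    accept_encoding = match.group("ae").lower()
    content_encoding = match.group("ce").lower()
    if "gzip" in content_encoding:
        return "compressed"
    if accept_encoding == "-":
        # No Accept-Encoding reached the server, for example because an
        # intermediate proxy removed it.
        return "no-accept-encoding"
    if "gzip" not in accept_encoding:
        # Simplistic check: q-values such as "gzip;q=0" are not interpreted.
        return "gzip-not-accepted"
    return "eligible-but-uncompressed"
```

Lines labeled no-accept-encoding that also carry a Via value are the ones worth checking against the proxy configuration first.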
Other problems¶
Other problems can occur even when IBM HTTP Server sends the correct data to the web browser. Customer experiences with HTTP compression depend on what type of data is compressed, because different browsers have problems with certain types of compressed data. Compressing plain HTML works fine on any modern browser, though people have experienced browser problems when HTML with embedded JavaScript is compressed. There is some indication that in Internet Explorer the decompression path changes the timing of JavaScript loading, so that some JavaScript which would otherwise work fails with the changed timing. This has happened not just with mod_deflate but also with mod_gzip, which has been available for Apache 1.3 for a long time.
The Adobe Acrobat plug-in has known problems dealing with PDF files that mod_deflate has compressed. This is not a mod_deflate problem; the same type of problem has occurred with mod_gzip, an independent implementation of HTTP compression.
Some browser levels have had problems processing compressed JavaScript. When browsers do not properly process compressed objects, IBM HTTP Server configuration directives must be used to disable compression for certain URLs and/or certain browsers.
Here's an interesting article regarding an IE 6 bug (there is a similar one for IE 5.5):
http://support.microsoft.com/default.aspx?scid=kb;[LN];Q312496
Here's another article about an IE bug with JavaScript in the presence of cache-control: no-cache:
http://support.microsoft.com/default.aspx?scid=kb%3Ben-us%3B327286
Here's another one, outlining one person's experience with compression changing the timing of JavaScript execution enough that the JavaScript no longer worked:
http://lists.over.net/pipermail/mod_gzip/2001-March/001708.html
Symptoms¶
Common symptoms are blank pages, error messages from the browser, JavaScript execution failures, or problems displaying PDF files in Adobe Acrobat.
Configuring mod_deflate to refuse to compress certain types of content¶
It is simple to disable compression based on the URI. Here are directives which disable compression for all URIs ending in .pdf or .jpg (upper or lower case):
SetEnvIfNoCase Request_URI "\.pdf$" no-gzip
SetEnvIfNoCase Request_URI "\.jpg$" no-gzip
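To see which request paths these two patterns match, the same regular expressions can be tried in Python. This is only an approximation: Apache evaluates the patterns with its own regex engine against Request_URI, which does not include the query string, and the helper name would_set_no_gzip is invented for this sketch.

```python
import re
from urllib.parse import urlsplit

# The two patterns from the SetEnvIfNoCase directives, matched
# case-insensitively as SetEnvIfNoCase does.
NO_GZIP_PATTERNS = [
    re.compile(r"\.pdf$", re.IGNORECASE),
    re.compile(r"\.jpg$", re.IGNORECASE),
]


def would_set_no_gzip(request_target):
    """Return True if the path part of the request target matches a pattern."""
    # Drop the query string, since Request_URI does not include it.
    path = urlsplit(request_target).path
    return any(pattern.search(path) for pattern in NO_GZIP_PATTERNS)
```

Under these assumptions, /docs/Report.PDF?download=1 matches, while /img/a.jpeg and /a.pdf.html do not, because the patterns are anchored to the end of the path.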
Here are directives to disable compression for a specific URI when the client is Internet Explorer:
<Location /path/to/foo.css>
BrowserMatch MSIE no-gzip
</Location>
Confirming that the HTTP response from IBM HTTP Server is correct¶
Ensure that required IBM HTTP Server fixes have been applied.
The most likely cause is that the browser or browser plug-in cannot handle compressed data in the specific context. It may be unable to uncompress certain media types at all, unable to uncompress them when they are received in a certain order, or subject to some other limitation. The problem could also depend on whether or not SSL is used.
Some theoretical problems that could be caused by mod_deflate are:
the response body isn't compressed properly
the proper HTTP headers aren't specified, so that the browser doesn't realize that the response body should be uncompressed
Here are the steps for determining whether or not mod_deflate generated the proper response to deliver to the browser:
Determine a test case (a particular request) that reproduces the symptom. Ideally, this test case shows the symptom consistently rather than intermittently.
Set up mod_net_trace to trace the data flows between IBM HTTP Server and a particular client (IP address) that will be used to reproduce the problem. Configure mod_net_trace according to these instructions. On the NetTrace directive, be sure to specify the IP address of the client that you'll use for reproducing the problem, as well as a large senddata value so that the entire response is traced.
Example:
NetTraceFile /tmp/nettrace
NetTrace client 111.222.333.444 dest file event senddata=5000000 event recvdata=1024
(The data sent to the server in the request body isn't normally a factor in compression problems, so we'll only trace the first 1024 bytes of the request body, if any.)
Also, it is recommended that you set LogLevel to Debug and use the DeflateFilterNote directive to log the request, compression ratio, and user agent string from the browser (see the DeflateFilterNote documentation).
Many headers or internal variables affect whether or not a response will be compressed. Adding the following string to your LogFormat in use will make sure these are recorded in the access log:
Age=%{AGE}o VIA=%{Via}i AE=%{Accept-Encoding}i CT=%{Content-Type}o CE=%{Content-Encoding}o NG=%{no-gzip}e GOT=%{gzip-only-text/html}e %{ETag}o
With these configuration changes in place, restart IBM HTTP Server and reproduce the problem from the browser. After the problem has been reproduced, close the browser to ensure that the connections from the browser are closed and copy the trace file (/tmp/nettrace in the example above) to another location so that additional browser traffic isn't written to the trace file that we'll examine next.
Disable mod_net_trace and revert LogLevel to the prior setting and then restart IBM HTTP Server.
The file with the network trace (e.g., /tmp/nettrace) is the key file to send to IBM HTTP Server support. But additional verification can be done by the customer relatively quickly using the ServerDoc tool to parse the network trace. (The ServerDoc tool is part of the ihsdiag download package.)
Here is an example where the input file is /tmp/nettrace and the results of parsing it are to be stored in a new directory called /tmp/nettrace.parsed:
$ java -jar /tmp/ServerDoc.jar ParseNetTrace /tmp/nettrace /tmp/nettrace.parsed
checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/0/sent.body.0
checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/1/sent.body.0
checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/1/sent.body.1
checking gzip integrity of /tmp/nettrace.parsed/127.0.0.1/1/sent.body.2
(In this example trace, there were four compressed response bodies.)
If the compressed data is invalid and could cause a problem for the browser, errors will be encountered and displayed by ServerDoc, as in the following example:
$ java -jar /tmp/ServerDoc.jar ParseNetTrace /tmp/nettrace.bad /tmp/nettrace.bad.parsed
checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/0/sent.body.0
/tmp/nettrace.bad.parsed/127.0.0.1/0/sent.body.0 is not properly gzipped!
java.util.zip.ZipException: invalid bit length repeat
checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/1/sent.body.0
checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/1/sent.body.1
checking gzip integrity of /tmp/nettrace.bad.parsed/127.0.0.1/1/sent.body.2
(For this example, we took a valid network trace generated by mod_net_trace but replaced some of the hex data in the trace file with a different sequence of bytes to simulate a corrupted response.)
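The internals of ServerDoc are not described here. A comparable integrity check can be sketched with Python's standard gzip module, assuming the bytes passed in are one complete gzip-encoded response body. The helper name gzip_is_valid is hypothetical and is not part of ServerDoc or ihsdiag.

```python
import gzip
import zlib


def gzip_is_valid(data):
    """Return True if data decompresses cleanly as gzip, False otherwise."""
    try:
        gzip.decompress(data)
        return True
    except (OSError, EOFError, zlib.error):
        # OSError covers gzip.BadGzipFile (bad magic bytes, CRC mismatch),
        # EOFError covers a stream that ends early, and zlib.error covers
        # invalid deflate data inside the stream.
        return False
```

A body file could be checked by reading it in binary mode and passing its contents to the function. A False result only says that the bytes are not a clean gzip stream; it does not say where in the delivery path the damage happened.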
Beyond the automatic gzip integrity checking performed by ServerDoc, the response headers and the uncompressed data may need to be examined as well. The response headers will be created by ServerDoc in files called sent.hdr.0, sent.hdr.1, and so on. The header field Content-Encoding must be present whenever the response body is compressed. ServerDoc will not try to check the gzip integrity of responses that did not contain the Content-Encoding header field, so gzipped bodies that weren't checked by ServerDoc possibly have invalid or missing header information.
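Bodies that ServerDoc skipped can be looked for with a short script. The Python sketch below rests on two assumptions that the text above does not confirm: that each sent.hdr.N file sits next to a sent.body.N file with the same index, and that the header files contain the raw HTTP header lines. The function names are invented for this example.

```python
from pathlib import Path

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def has_content_encoding(header_bytes):
    """Return True if any header line is a Content-Encoding field."""
    for line in header_bytes.splitlines():
        name, sep, _value = line.partition(b":")
        if sep and name.strip().lower() == b"content-encoding":
            return True
    return False


def unchecked_gzip_bodies(conn_dir):
    """List body files that look gzipped but have no Content-Encoding header."""
    suspects = []
    for hdr in sorted(Path(conn_dir).glob("sent.hdr.*")):
        index = hdr.name.rsplit(".", 1)[1]
        body = hdr.with_name(f"sent.body.{index}")
        if not body.is_file():
            continue
        with body.open("rb") as handle:
            looks_gzipped = handle.read(2) == GZIP_MAGIC
        if looks_gzipped and not has_content_encoding(hdr.read_bytes()):
            suspects.append(body.name)
    return suspects
```

The function would be pointed at one connection directory, such as /tmp/nettrace.parsed/127.0.0.1/1 in the earlier example. A match on the two magic bytes is only a hint, since uncompressed binary content can start with the same bytes by coincidence.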
It is possible that the response body was already incomplete when it was gzipped, so that the gzipped response is valid gzip data yet information is missing once the browser uncompresses it (e.g., a truncation occurred). To uncompress the response bodies and see what content was sent, use the gunzip utility:
$ gunzip < /tmp/nettrace.parsed/127.0.0.1/1/sent.body.0 > /tmp/uncompressed
The uncompressed data in file /tmp/uncompressed will have to be examined by someone that knows what is expected in order to determine if the data is truncated or is otherwise malformed.
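When a known-good copy of the expected content is available (for example, the same resource obtained without compression), part of this examination can be automated. The following Python sketch compares the two byte strings and reports whether the uncompressed data is identical, a truncated prefix, or different. The function name and the wording of the results are assumptions made for this illustration.

```python
def compare_with_expected(actual, expected):
    """Describe how the uncompressed bytes relate to the expected bytes."""
    if actual == expected:
        return "identical"
    if expected.startswith(actual):
        # Everything received is correct, but the tail is missing.
        return f"truncated after {len(actual)} of {len(expected)} bytes"
    # Otherwise report the first offset at which the two differ.
    limit = min(len(actual), len(expected))
    offset = next(
        (i for i in range(limit) if actual[i] != expected[i]),
        limit,
    )
    return f"differs at byte offset {offset}"
```

A truncation result points toward an incomplete response body, while a difference at some offset calls for a closer look at the content around that position. Dynamic content that legitimately changes between requests will also show up as a difference.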
If a problem is discovered in the data written by mod_deflate or a problem is suspected in the HTTP header, the documentation to send to IBM is:
your modified httpd.conf
the request that led to the problem (i.e., "what URL?")
the network trace generated when recreating the problem
the uncompressed data that should have been sent to the browser
any messages written to IHS log files (e.g., access_log, error_log, any other custom logs) during the test case