FAQ: mod_rewrite

Debugging

Enabling mod_rewrite

Verify that the following directive is uncommented:

LoadModule rewrite_module modules/mod_rewrite.so

(For IBM HTTP Server 2.0.42 and all later releases.)

My rules are ignored. Nothing is written to the rewrite log.

The most common cause of this is placing mod_rewrite directives at global scope (outside of any VirtualHost containers) but expecting the directives to apply to requests which were matched by a VirtualHost container.

In this example, the mod_rewrite configuration will be ignored for requests which are received on port 443:

RewriteEngine On
RewriteRule ^/index\.htm$ /index.html

<VirtualHost *:443>
# ... existing vhost directives
</VirtualHost>

Unlike most configurable features, the mod_rewrite configuration is not inherited by default within a <VirtualHost> container. To have global mod_rewrite directives apply to a VirtualHost, add these two extra directives to the VirtualHost container:

<VirtualHost *:443>
# ... existing vhost directives
RewriteEngine On
RewriteOptions Inherit
</VirtualHost>

Other possibilities:

  • RewriteEngine On is missing from the scope that contains the rules.

  • mod_rewrite is not loaded and the rewrite directives are inside <IfModule>, so they are skipped without any error.
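
When rules appear to be ignored, turning on the rewrite log in the same scope as the rules shows whether the engine sees the request at all. The following is a minimal sketch, assuming a release based on Apache 2.0 or 2.2, where the RewriteLog and RewriteLogLevel directives exist. The log file name is a placeholder.

```apache
<VirtualHost *:443>
    # ... existing vhost directives
    RewriteEngine On
    RewriteOptions Inherit

    # Placeholder log path. Level 0 disables logging; higher levels are more verbose.
    RewriteLog logs/rewrite_443.log
    RewriteLogLevel 3
</VirtualHost>
```

On releases based on Apache 2.4, these two directives no longer exist. The equivalent there is LogLevel rewrite:trace3, and the output goes to the error log.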

Escaped and unescaped URIs and query strings

When mod_rewrite is working on request URIs and query strings:

  • The URI that a RewriteRule implicitly compares against or captures from has been unescaped, so spaces and similar characters are literally present, not escaped.

  • The query string (%{QUERY_STRING}) has NOT been unescaped, so if the request had escaped spaces and such in the query string, they are still escaped at this time.

  • When we get done rewriting things, we need to make sure we leave things in the same state.

So, if we have this rewrite rule:

RewriteRule ^/abc/([a-zA-Z0-9_]*)$ /def?key=$1 [PT,L]

The regex capture comes from the URI path, so it captures unescaped text. But we're substituting that text into the query string part of the result, and because we're using [PT], mod_rewrite won't escape it (see below). So we need to escape that text ourselves as we substitute it. Otherwise we end up with an unescaped query string where an escaped one is expected, and when that gets passed on, it won't look like a valid request.

To do that we can use RewriteMap, like this:

RewriteMap escapemytext int:escape
RewriteRule ^/abc/([a-zA-Z0-9_]*)$ /def?key=${escapemytext:$1} [PT,L]

This would be a little different if we weren't using [PT]. If mod_rewrite is mapping a URL to a filename, then mod_rewrite applies escaping to the result at the end, and we wouldn't need to do this. (In fact, if we didn't want that escaping, we'd need to add [NE].) But with [PT], we're just replacing one URI with another, and mod_rewrite will not escape it because it knows the URI shouldn't be escaped. Because we're also setting a new query string, we have to escape that part ourselves.
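
The character class in the rule above only admits characters that never need escaping, so the effect of the map is easier to see with a broader capture. The following is a sketch only: the (.*) pattern and the sample request are assumptions made for illustration.

```apache
RewriteMap escapemytext int:escape

# Hypothetical broader capture: accepts any characters after /abc/
#   request line:       GET /abc/hello%20world HTTP/1.1
#   URI seen by rule:   /abc/hello world        (already unescaped)
#   $1:                 hello world
#   rewritten to:       /def?key=hello%20world  (escaped by the map)
RewriteRule ^/abc/(.*)$ /def?key=${escapemytext:$1} [PT,L]
```

Without the map, the space would be substituted into the query string as-is.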

Note that one can get access to the non-decoded request line by using %{THE_REQUEST} in a RewriteCond and using the result captures (%1..%N) instead of captures from a RewriteRule ($1..$N).
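
As a sketch of that approach, the rule from this section could take its value from the raw request line instead of from the unescaped URI. The RewriteCond pattern is an assumption about the request shape (any method, a path under /abc/, an optional query string) and would need adapting to real traffic.

```apache
# Capture the still-escaped text after /abc/ from the raw request line.
# The capture stops at "?" or a space, so it excludes any query string.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /abc/([^?\ ]*)
# %1 is the RewriteCond capture. It was never unescaped, so no escape map is used.
RewriteRule ^/abc/ /def?key=%1 [PT,L]
```

Because %1 comes straight from the client, it contains whatever escaping the client chose to send, which may differ from what int:escape would produce.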

mod_rewrite: a character in my new URL is being escaped as %nn. How can I avoid that?

Example: The following RewriteRule is supposed to redirect requests for /datasheets to http://www.example.com/CatalogView?view=01&content=02#readme, but the # is getting translated to %23 in the response to the browser.

RewriteRule /datasheets http://www.example.com/CatalogView?view=01&content=02#readme

By default, mod_rewrite performs URI escaping on special characters such as '#', so the rewritten URL contains a percent sign followed by the hexadecimal code for the character. The escaping can be turned off with the noescape ([NE]) flag. Here is the new RewriteRule, which avoids the escaping:

RewriteRule /datasheets http://www.example.com/CatalogView?view=01&content=02#readme [R,NE]

Consult the mod_rewrite documentation for more information.

Note: If there is a question about what is being sent to the browser, the RewriteLog and RewriteLogLevel directives can be used to turn on logging in mod_rewrite, or the Location header field can be logged in the access log by adding %{Location}o to the access log format.
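
For the access log approach, a minimal sketch follows. It assumes a separate log file is acceptable. The format nickname with_location and the file name are placeholders.

```apache
# Common log format plus the Location response header
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Location}o\"" with_location
CustomLog logs/access_location.log with_location
```

A redirect that was escaped shows %23 in the logged Location value. One sent with [NE] shows the literal #.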

Why is the REMOTE_USER look-ahead variable not set for requests handled by the IHS plugin?

Due to the complex interaction between the authentication, mod_rewrite, and IHS plugin modules, the REMOTE_USER look-ahead variable LA-U:REMOTE_USER is not available in mod_rewrite for requests that the IHS plugin handles.

The workaround is to avoid the look-ahead variable and test REMOTE_USER inside a <Location> container instead. Rules in that context run after authentication, so REMOTE_USER is already set. Note that this approach has limitations: some features, such as relative substitutions, may not work.

For example, the following configuration redirects based on REMOTE_USER:

    <Location />
      # Harmless if the engine is already enabled at server level
      RewriteEngine On
      RewriteCond %{REMOTE_USER} ^user$
      RewriteRule ^(.*) http://example.com [R,L]
    </Location>

HOWTO

How mod_rewrite interacts with the WebSphere plugin

This topic has its own page.

Disabling HTTP TRACE with mod_rewrite

This topic has its own page.

Mapping UTF-8 requests to local codepage

This topic is discussed on the NLS page.

I need to work with the #xxxx part of a request.

The part of the request following a "#" character is called the fragment, and a request URI that includes one is not valid. We see href strings like that all the time, but when a browser is given such a string, it strips off the "#xxxx" part, retrieves the page using the part before it as the URI, then looks in the page for an anchor named xxxx and moves there in the page. The server has nothing to do with it and never sees the "#".

If the server does receive a request with a fragment, it parses it out and then pretty much ignores it. The fragment is not passed to CGI scripts or the plug-in, and it is not included in rewrite processing. It's pretty close to invisible.

If, for some non-standard reason, someone needs to send requests to the server that include fragments and do something with them, the only approach I can see is to use a trick to get at the fragment value and rewrite it into the request without the "#" so it can be seen. Something like this:

RewriteEngine on

# If the request contains a #, change it to %23
# Note that the # part is not accessible in any of the parsed request env vars,
# so we have to use THE_REQUEST to parse the entire request line ourselves to
# find that value
# If not using GET requests, this will need to be adapted.
# Match against request that contains a #, capture part before and after
RewriteCond %{THE_REQUEST} GET\ (.*)#(.*)\ HTTP/.*

# Put together the URI we want
# Need [PT] so IHS looks at it all over again; otherwise CGI, plug-in, etc
# might not handle the request
RewriteRule ^ %1\%23%2 [PT]

After this, the #xxxx part will be replaced by %23xxxx, and hopefully handlers will be able to do something with it. You can of course do anything you want with the fragment value (%2) in the rewrite rule.
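
As one illustration of doing something else with %2, the fragment could be handed to the application as a query parameter. This is a sketch under two assumptions: the parameter name fragment is hypothetical, and the incoming request has no query string of its own (otherwise %1 already contains a "?" and the result would need different handling).

```apache
RewriteEngine on

# Same capture as above: %1 is the part before the "#", %2 is the fragment
RewriteCond %{THE_REQUEST} GET\ (.*)#(.*)\ HTTP/.*

# Pass the fragment along as a query parameter instead of as %23xxxx
RewriteRule ^ %1?fragment=%2 [PT]
```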

What's the overhead of mod_rewrite?

It is hard to give much general guidance here. In one basic test of 100 rules that all match a request, the response was delayed by only a tenth of a millisecond. Some general guidelines help avoid unnecessary overhead:

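  • Make each RewriteRule pattern as strict as possible. Conditions are evaluated only after the pattern matches, so a strict pattern prevents them from being executed for requests that are not of interest.

  • Always code rewrites in vhost context, not directory context.

  • Use the [L] flag early to short-circuit subsequent rules.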

  • Don't use %{REQUEST_URI} in conditions if you have many hundreds of them.

  • Redirects (vs rewrite to a local path or URL) are a round-trip with the browser and add considerable latency.
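
One way to read the first guideline is sketched below. The two alternatives are intended to have the same effect, and the /reports/ and /app/reports paths are hypothetical. The difference is where the URI test happens.

```apache
# Looser: "^" matches every request, so the condition is evaluated every time
RewriteCond %{REQUEST_URI} ^/reports/
RewriteRule ^ /app/reports [PT,L]

# Stricter alternative: the pattern itself rejects unrelated requests,
# and no condition is needed at all
RewriteRule ^/reports/ /app/reports [PT,L]
```

Only one of the two forms would appear in a real configuration. They are shown together here for comparison.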