Apache Module mod_charset_lite

Description:Specify character set translation or recoding
Status:Experimental
Module Identifier:charset_lite_module
Source File:mod_charset_lite.c

Summary

This is an experimental module and should be used with care. Experiment with your mod_charset_lite configuration to ensure that it performs the desired function.

mod_charset_lite allows the administrator to specify the source character set of response bodies as well as the character set they should be translated into before sending to the client.

This module provides a small subset of configuration mechanisms implemented by Russian Apache and its associated mod_charset.

Common Problems

Invalid character set names

The character set name parameters of CharsetSourceEnc and CharsetDefault must be acceptable to the translation mechanism used by APR on the system where mod_charset_lite is deployed. These character set names are not standardized and are usually not the same as the corresponding values used in HTTP headers. Currently, APR can only use iconv(3), so you can test your character set names with the iconv(1) program, as follows:

iconv -f charsetsourceenc-value -t charsetdefault-value

Mismatch between character set of content and translation rules

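If the translation rules don't make sense for the content, translation can fail in various ways. For example, the translation mechanism may return an error, in which case the connection is aborted, or it may silently place substitute characters (such as question marks) in the output when it cannot translate the input.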

CharsetAutoindex Directive

Description:Configures charset translation of output of mod_autoindex
Syntax:CharsetAutoindex fromcharset tocharset
Default:none
Context:server config
Override:FileInfo
Status:Experimental
Module:mod_charset_lite
Compatibility:IBM HTTP Server 6.0.2.35, 6.1.0.35, and 7.0.0.5 and later.

The CharsetAutoindex directive overrides the default translation of the output of mod_autoindex.

By default, translation of the output of mod_autoindex is controlled by the mod_charset_lite configuration that applies to the request's configuration context, e.g. the mod_charset_lite configuration in the Directory or Location sections that apply to the request.

If CharsetAutoindex is configured, then translation of the output of mod_autoindex will be done according to the character sets specified in CharsetAutoindex instead. mod_charset_lite will translate the output of mod_autoindex from fromcharset to tocharset. mod_autoindex generates its output in the native character set of Apache, so the fromcharset should match that, typically ISO8859-1 or IBM-1047.

Configuring CharsetAutoindex does not change any headers in the response. You will probably want to use the Charset= keyword in the IndexOptions configuration of mod_autoindex to make the content type header show the right character set in the response.
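
As a hedged illustration of the two paragraphs above, the fragment below pairs CharsetAutoindex with the Charset= keyword of IndexOptions so that the translated listing and the Content-Type header agree. The character set names ISO8859-1 and UTF-8 are assumptions; substitute names that iconv accepts on your system.

```apache
# Sketch only: the character set names are assumed and must be valid for iconv here.
# mod_autoindex output is generated in the server's native charset
# (assumed to be ISO8859-1) and translated to UTF-8 before it is sent.
CharsetAutoindex ISO8859-1 UTF-8

# CharsetAutoindex does not change response headers, so advertise the
# target charset in the Content-Type header of generated listings.
IndexOptions Charset=UTF-8
```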

If other content is included in the mod_autoindex output, for example by using the HeaderName directive to include a file, then that content must use the same encoding that mod_autoindex uses for the rest of its output, because all of the content undergoes the same charset translation. For example, on EBCDIC systems, any included files must be encoded in EBCDIC.

CharsetCompatDefault Directive

Description:Charset to translate into when the DGWCompat option is specified
Syntax:CharsetCompatDefault charset
Default:CharsetCompatDefault ISO8859-1
Context:server config, virtual host, directory, .htaccess
Override:FileInfo
Status:Experimental
Module:mod_charset_lite

The CharsetCompatDefault directive specifies the charset that content in the associated container should be translated to when the DGWCompat option is specified.

The value of the charset argument must be accepted as a valid character set name by the character set support in APR. Generally, this means that it must be supported by iconv.

Example

<Directory /export/home/trawick/apacheinst/htdocs/convert>
CharsetCompatSourceEnc UTF-16BE
CharsetCompatDefault ISO-8859-1
</Directory>

CharsetCompatSourceEnc Directive

Description:Source charset of files
Syntax:CharsetCompatSourceEnc charset
Default:CharsetCompatSourceEnc IBM1047
Context:server config, virtual host, directory, .htaccess
Override:FileInfo
Status:Experimental
Module:mod_charset_lite

The CharsetCompatSourceEnc directive specifies the source charset of files in the associated container when the DGWCompat option is specified.

The value of the charset argument must be accepted as a valid character set name by the character set support in APR. Generally, this means that it must be supported by iconv.

Example

<Directory /export/home/trawick/apacheinst/htdocs/convert>
CharsetCompatSourceEnc UTF-16BE
CharsetCompatDefault ISO-8859-1
</Directory>

The character set names in this example work with the iconv translation support in Solaris 8.

CharsetDefault Directive

Description:Charset to translate into
Syntax:CharsetDefault charset
Context:server config, virtual host, directory, .htaccess
Override:FileInfo
Status:Experimental
Module:mod_charset_lite

The CharsetDefault directive specifies the charset that content in the associated container should be translated to.

The value of the charset argument must be accepted as a valid character set name by the character set support in APR. Generally, this means that it must be supported by iconv.

Example

<Directory /export/home/trawick/apacheinst/htdocs/convert>
CharsetSourceEnc UTF-16BE
CharsetDefault ISO-8859-1
</Directory>

CharsetOptions Directive

Description:Configures charset translation behavior
Syntax:CharsetOptions option [option] ...
Default:CharsetOptions DebugLevel=0 ImplicitAdd
Context:server config, virtual host, directory, .htaccess
Override:FileInfo
Status:Experimental
Module:mod_charset_lite

The CharsetOptions directive configures certain behaviors of mod_charset_lite. Each option can be one of the following:

DebugLevel=n
The DebugLevel keyword allows you to specify the level of debug messages generated by mod_charset_lite. By default, no messages are generated. This is equivalent to DebugLevel=0. With higher numbers, more debug messages are generated, and server performance will be degraded. The actual meanings of the numeric values are described with the definitions of the DBGLVL_ constants near the beginning of mod_charset_lite.c.
ImplicitAdd | NoImplicitAdd
The ImplicitAdd keyword specifies that mod_charset_lite should implicitly insert its filter when the configuration specifies that the character set of content should be translated. If the filter chain is explicitly configured using the AddOutputFilter directive, NoImplicitAdd should be specified so that mod_charset_lite doesn't add its filter.
TranslateAllMimeTypes | NoTranslateAllMimeTypes
Normally, mod_charset_lite will only perform translation on a small subset of possible mimetypes. When the TranslateAllMimeTypes keyword is specified for a given configuration section, translation is performed without regard for mimetype.
TranslateRequestBodies | NoTranslateRequestBodies
Normally, mod_charset_lite assumes that incoming request bodies (i.e. POST or PUT data) are text and performs the inverse of the output translation. If the incoming request bodies of POST or PUT requests should not be translated, for example for binary files, specify the NoTranslateRequestBodies keyword.
DGWCompat (8.5.5 and later only)
When this option is set, mod_charset_lite activates a mode that is more compatible with DGW default behaviors. If the response contains a Content-Encoding header with one of the special values below, the translation behavior changes.

If the value of Content-Encoding is "ebcdic", CharsetCompatSourceEnc and CharsetCompatDefault override their normal counterparts (CharsetSourceEnc and CharsetDefault), and the Content-Encoding header is removed.

If the value of Content-Encoding is anything else, no translation occurs. Additionally, if the value is "7bit", "8bit", or "binary", the header is removed.

The most basic usage allows a content generator to set Content-Encoding: "binary" to opt out of translation.
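
A minimal sketch of a DGWCompat configuration follows, assuming a hypothetical directory /opt/app/htdocs and character set names that iconv accepts on the target system. Under these assumptions, a response carrying Content-Encoding: ebcdic would be translated with the CharsetCompat pair rather than the normal pair.

```apache
# Hypothetical directory; character set names are assumptions.
<Directory /opt/app/htdocs>
# Normal translation rules
CharsetSourceEnc ISO8859-1
CharsetDefault UTF-8

# Enable DGW-compatible handling of the Content-Encoding response header
CharsetOptions DGWCompat

# Used instead of the pair above when Content-Encoding is "ebcdic"
CharsetCompatSourceEnc IBM1047
CharsetCompatDefault ISO8859-1
</Directory>
```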

Note

Support for TranslateRequestBodies and NoTranslateRequestBodies was added with APAR PK87717, part of fix pack 7.0.0.7.
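
For instance, an upload handler that receives binary files could keep output translation while leaving request bodies untouched. The location /upload and the character set names below are hypothetical; this is a sketch under those assumptions, not a recommended configuration, and it requires a release that supports the NoTranslateRequestBodies keyword.

```apache
# Hypothetical location; character set names are assumed to be valid for iconv.
<Location /upload>
# Translate response bodies
CharsetSourceEnc IBM-1047
CharsetDefault ISO8859-1

# Leave POST and PUT bodies as received, e.g. for binary uploads
CharsetOptions NoTranslateRequestBodies
</Location>
```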

CharsetSourceEnc Directive

Description:Source charset of files
Syntax:CharsetSourceEnc charset
Context:server config, virtual host, directory, .htaccess
Override:FileInfo
Status:Experimental
Module:mod_charset_lite

The CharsetSourceEnc directive specifies the source charset of files in the associated container.

The value of the charset argument must be accepted as a valid character set name by the character set support in APR. Generally, this means that it must be supported by iconv.

Example

<Directory /export/home/trawick/apacheinst/htdocs/convert>
CharsetSourceEnc UTF-16BE
CharsetDefault ISO-8859-1
</Directory>

The character set names in this example work with the iconv translation support in Solaris 8.