Character Data Types (GLS)

Informix database servers support several character data types, including CHAR, NCHAR, and NVARCHAR, the special-use character data type.

Character Data: CHAR(n) and NCHAR(n)

The CHAR(n) data type contains a sequence of n bytes. These characters can be a mixture of English and non-English characters and can be either single byte or multibyte (Asian). The length n ranges from 1 to 32,767.

Whenever the database server retrieves or stores a CHAR(n) value, it transfers exactly n bytes. If an inserted value is shorter than n, the database server extends the value with single-byte ASCII space characters to make up n bytes. If an inserted value exceeds n bytes, the database server truncates the extra characters without returning an error message. Thus the semantic integrity of data for a CHAR(n) column or variable is not enforced when the value that is inserted or updated exceeds n bytes.

Data in CHAR columns is sorted in code-set order. For example, in the ASCII code set, the character a has a code-set value of 97, b has 98, and so forth. The database server sorts CHAR(n) data in this order.

The NCHAR(n) data type also contains a sequence of n bytes. These characters can be a mixture of English and non-English characters and can be either single byte or multibyte (Asian). The length of n has the same limits as the CHAR(n) data type. Whenever an NCHAR(n) value is retrieved or stored, exactly n bytes are transferred. The number of characters transferred can be fewer than the number of bytes if the data contains multibyte characters. If an inserted value is shorter than n, the database server extends the value with space characters to make up n bytes.

The database server sorts data in NCHAR(n) columns according to the order that the locale specifies. For example, the French locale specifies that the character ê is sorted after the value e but before the value f. In other words, the sort order that the French locale dictates is e, ê, f, and so on. For more information on how to use locales, refer to the IBM Informix: GLS User's Guide.

The only difference between CHAR(n) and NCHAR(n) data is how you sort and compare the data. You can store non-English characters in a CHAR(n) column. However, because the database server uses code-set order to perform any sorting or comparison on CHAR(n) columns, you might not obtain the results in the order that you expect.

A CHAR(n) or NCHAR(n) value can include tabs and spaces but normally contains no other nonprinting characters. When you insert rows with INSERT or UPDATE, or when you load rows with a utility program, no means exist for entering nonprintable characters. However, when a program that uses embedded SQL creates rows, the program can insert any character except the null (binary zero) character. It is not a good idea to store nonprintable characters in a character column because standard programs and utilities do not expect them.

The advantage of the CHAR(n) or NCHAR(n) data type is its availability on all database servers. The only disadvantage of CHAR(n) or NCHAR(n) is its fixed length. When the length of data values varies widely from row to row, space is wasted.

Variable-Length Strings: CHARACTER VARYING(m,r), VARCHAR(m,r), NVARCHAR(m,r), and LVARCHAR

Often the items in a character column are different lengths; that is, many are an average length and only a few are the maximum length. For each of the following data types, m represents the maximum number of bytes and r represents the minimum number of bytes. The following data types are designed to save disk space when you store such data:

The difference in the way data is compared distinguishes NVARCHAR(m,r) data from CHARACTER VARYING(m,r) or VARCHAR(m,r) data. For more information about how the locale determines code-set and sort order, see Character Data: CHAR(n) and NCHAR(n).

When you define columns of these data types, you specify m as the maximum number of bytes. If an inserted value consists of fewer than m bytes, the database server does not extend the value with single-byte spaces (as with CHAR(n) and NCHAR(n) values). Instead, it stores only the actual contents on disk with a 1-byte length field. The limit on m is 254 bytes for indexed columns and 255 bytes for non-indexed columns.

The second parameter, r, is an optional reserve length that sets a lower limit on the number of bytes than a value being stored on disk requires. Even if a value requires fewer than r bytes, r bytes are nevertheless allocated to hold it. The purpose is to save time when rows are updated. (See Variable-Length Execution Time.)

The advantages of the CHARACTER VARYING(m,r) or VARCHAR(m,r) data type over the CHAR(n) data type are as follows:

These advantages also apply to the NVARCHAR(m,r) data type in comparison to the NCHAR(n) data type.

The following list describes the disadvantages of using varying-length data types:

Variable-Length Execution Time

When you use any of the CHARACTER VARYING(m,r), VARCHAR(m,r), or NVARCHAR(m,r) data types, the rows of a table have a varying number of bytes instead of a fixed number of bytes. The speed of database operations is affected when the rows of a table have varying numbers of bytes.

Because more rows fit in a disk page, the database server can search the table with fewer disk operations than if the rows were of a fixed number of bytes. As a result, queries can execute more quickly. Insert and delete operations can be a little quicker for the same reason.

When you update a row, the amount of work the database server must perform depends on the number of bytes in the new row as compared with the number of bytes in the old row. If the new row uses the same number of bytes or fewer, the execution time is not significantly different than it is with fixed-length rows. However, if the new row requires a greater number of bytes than the old one, the database server might have to perform several times as many disk operations. Thus, updates of a table that use CHARACTER VARYING(m,r), VARCHAR(m,r), or NVARCHAR(m,r) data can sometimes be slower than updates of a fixed-length field.

To mitigate this effect, specify r as a number of bytes that encompasses a high proportion of the data items. Then most rows use the reserve number of bytes, and padding wastes only a little space. Updates are slow only when a value that uses the reserve number of bytes is replaced with a value that uses more than the reserve number of bytes.

Large Character Objects: TEXT

The TEXT data type stores a block of text. It is designed to store self-contained documents: business forms, program source or data files, or memos. Although you can store any data in a TEXT item, Informix tools expect a TEXT item to be printable, so restrict this data type to printable ASCII text.

Extended Parallel Server supports the TEXT data type in columns but does not allow you to store a TEXT column in a blobspace or use a TEXT value in an SPL routine.

TEXT values are not stored with the rows of which they are a part. They are allocated in whole disk pages, usually in areas separate from rows. For more information, see your IBM Informix: Administrator's Guide.

The advantage of the TEXT data type over CHAR(n) and VARCHAR(m,r) is that the size of a TEXT data item has no limit except the capacity of disk storage to hold it. The disadvantages of the TEXT data type are as follows:

Binary Objects: BYTE

The BYTE data type is designed to hold any data a program can generate: graphic images, program object files, and documents saved by any word processor or spreadsheet. The database server permits any kind of data of any length in a BYTE column.

Extended Parallel Server supports the BYTE data type in columns, but does not allow you to store a BYTE column in a blobspace or use a BYTE value in an SPL routine.

As with TEXT, BYTE data items usually are stored in whole disk pages in disk areas separate from normal row data.

The advantage of the BYTE data type, as opposed to TEXT or CHAR(n), is that it accepts any data. Its disadvantages are the same as those of the TEXT data type.

Using TEXT and BYTE Data Types

The database server stores and retrieves TEXT and BYTE columns. To fetch and store TEXT or BYTE values, you normally use programs written in a language that supports embedded SQL, such as IBM Informix ESQL/C. In such a program, you can fetch, insert, or update a TEXT or BYTE value in a manner similar to the way you read or write a sequential file.

In no SQL statement, interactive or programmed, can a TEXT or BYTE column be used in the following ways:

In a SELECT statement that you enter interactively or in a form or report, you can perform the following operations on a TEXT or BYTE value:

In an interactive INSERT statement, you can use the VALUES clause to insert a TEXT or BYTE value, but the only value that you can give that column is null. However, you can use the SELECT form of the INSERT statement to copy a TEXT or BYTE value from another table.

In an interactive UPDATE statement, you can update a TEXT or BYTE column to null or to a subquery that returns a TEXT or BYTE column.

Changing the Data Type

A fter the table is built, you can use the ALTER TABLE statement to change the data type that is assigned to a column. Although such alterations are sometimes necessary, you should avoid them for the following reasons: