Home | Previous Page | Next Page   Disk, Memory, and Process Management > Loading with External Tables for Extended Parallel Server > Loading Simple Large Objects >

Formats for Simple Large Objects

A format describes the structure of the data in a data file. Before you can load rows from a data file to an Informix database or unload rows from a database to a data file, you must define a format that describes the data file.

The database server supports the following file formats for simple large objects:

Using Informix Internal Format

Informix internal format always loads and unloads simple large objects in internal format and does not use the external TEXT or HEX type definition for simple large objects. In Informix internal format, the data for the simple large object follows the row. The row contains a four-byte integer indicating the length of the simple large object. It is followed by a simple large object of variable length. The length and data are in internal binary format. For example, suppose you want to load the following table:

CREATE TABLE tab1
   (col1 INT,
   largeobj1 BYTE,
   col2 CHAR(10),
   largeobj2 BYTE,
   col3 CHAR(10)) ...

The following diagram shows the layout of the data file.

Figure 78. Simple Large Objects with Informix Internal Format
begin figure description - This figure is described in the surrounding text. - end figure description

You can define an external data file that specifies Informix internal format for simple large objects with the following create external table statement:

CREATE EXTERNAL TABLE extab (
         col1 INT,
         text TEXT,
         byte2 BYTE)
      USING (FORMAT 'INFORMIX',
            DATAFILES('disk:1:/datafile1'));

If you are concerned with the length of time to load or unload a very large table that contains simple large objects, use internal format for the faster execution time. However, you can use internal format only in the following situations:

Using Delimited Format

In delimited format, the data for the simple large object is inserted directly in the row at the point where the TEXT or BYTE column is defined. Delimited format affords slower performance than Informix internal format. Use delimited format when neither of the two preceding conditions applies.

The database server supports two external column types for simple large objects in delimited format:

The database server converts the data between the internal and external column types during loading and unloading.

You can define an external data file that specifies delimited format with either TEXT or HEX column types for simple large objects with the following create external table statement:

CREATE EXTERNAL TABLE extab (
         col1 INT,
         text1 TEXT EXTERNAL 'TEXT', col2 CHAR(10),
         byte2 BYTE EXTERNAL 'HEX', col3 CHAR(10))
      USING (FORMAT 'DELIMITED',
            DATAFILES('disk:1:/datafile1'));

Figure 79 shows the layout of the data file. The external format of these columns is variable length but the internal format is fixed length, except VARCHAR, TEXT, and BYTE data types.

Figure 79. Delimited Format of Simple Large Objects
begin figure description - This figure is described in the surrounding text. - end figure description
External TEXT Type

You can encode a TEXT or BYTE column in the external TEXT type. Use only delimited TEXT and BYTE columns with this external type. Fixed formats are not allowed. If you do not specify the external type in the CREATE external table statement, text columns default to the external text data type and BYTE columns default to the HEX data type.

The database server uses the default field delimiter "|" and record delimiter "\n" (newline). If you define custom delimiters for an external table, the custom delimiters are escaped instead. You can not define the character "\" as a delimiter.

For simple large objects in external text format, the database server always recognizes the "\" (backslash) escape sequences. text data often contains the default record delimiter "\n" (newline), and it is often difficult for a user to tell whether or not text data contains delimiters before a load.

When the database server unloads, it writes each character in the simple large object literally, except for field and record delimiters and backslashes, which are escaped. The database server inserts the escape character (backslash) before it writes the escaped character. When the database server loads, it interprets the escape sequence. A backslash followed by any character is interpreted as a literal character. Nonprintable characters are embedded in the data file unchanged if you specify the external text data type.

The ESCAPE keyword in an external table definition does not affect text data in a simple large object. It affects only character columns. Figure 80 shows this byte sequence represented in the TEXT data file; one cell represents one byte in a data file. Escaped characters are circled.

Figure 80. Expansion of Simple Large Object Loaded in TEXT Data Type
begin figure description - This figure is described in the surrounding text. - end figure description
Multibyte Characters (GLS Only)

In multibyte GLS locales, you can define only characters with a length of exactly one byte as delimiters.

The database server does not alter simple large objects. A simple large object might contain invalid text and incomplete or invalid multibyte characters. In contrast to character columns, the database server does not pad with blanks or truncate invalid multibyte characters in a simple large object. In both single-byte and multibyte locales, the database server always traverses text data byte by byte instead of character by character. If a byte matches one of the delimiters or a backslash, the database server puts an escape character before it.

External HEX Type

You can encode a BYTE or TEXT column in the external HEX type. Use only delimited BYTE and TEXT columns with this external type. Fixed formats are not allowed. If you do not specify the external type in the CREATE external table statement, BYTE columns default to the external HEX data type and TEXT columns default to the external TEXT data type.

The external HEX data type uses two hexadecimal digits (0 through 9, A through F, case insensitive) to represent one byte of data. The digits are continuously written to the data file. No character marks the boundary of bytes. Therefore, one byte of simple-large-object data occupies two bytes in a data file. This expansion might cause a problem if you have large simple large objects and limited disk space.

Because the HEX output always contains printable characters, you cannot escape HEX data. Do not define any hexadecimal character as a field or record delimiter.

For example, suppose you have the following byte sequence where a byte is represented either with its hexadecimal value or a quoted character:

{0x00, 0x01, 0x02, 0x03, '\', 'x', '|', 'y', '\n','z'}

Figure 81 shows this byte sequence represented in hex; one cell represents one byte in a data file.

Figure 81. Expansion of Simple Large Object Loaded in Hex Data Type
begin figure description - This figure is described in the surrounding text. - end figure description
Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]