OLTP applications continually modify tables by updating, adding, and deleting a few rows at a time. However, you can also change data and tables in the following ways:
You might change the structure or properties of a table in the following ways:
This section describes some of the performance implications of global or large changes to table data and the table definition.
For specific information about tuning applications that update data, see Improving Large Data Updates.
Databases for decision-support applications are often updated by periodically loading and restructuring table data that has been extracted and unloaded from active OLTP databases. Data is usually loaded in Express mode through an external table into an OPERATIONAL or RAW table to get the advantage of parallel processing and light appends. For detailed information about using external tables to load and unload data, refer to the IBM Informix: Extended Parallel Server Administrator's Guide.
If you expect to load and unload the same table often to build or refresh a data mart or data warehouse, monitor the progress of the job to estimate how long similar jobs will take in the future.
Execute the SET EXPLAIN ON statement before you execute the SQL statement that loads or unloads data so that performance information is written to the sqexplain.out file. Use ISA to monitor the process as it executes. Enter the SET PLOAD FILE statement to specify the path and name of a file in which to store statistics about the number of rows loaded and a list of rejected rows.
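For example, the following sketch shows the monitoring setup around a load from an external table. The table and file names are hypothetical, the external table is assumed to be defined already, and the exact form of the SET PLOAD FILE clause is an assumption, so verify it in the IBM Informix: Guide to SQL Syntax.

-- Record query plans and performance information in sqexplain.out
SET EXPLAIN ON;
-- Assumed form: write load statistics and the list of rejected rows to this file
SET PLOAD FILE TO '/tmp/sales_load.log';
-- Load from the external table (assumed to exist) into the target table
INSERT INTO sales_history SELECT * FROM ext_sales;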
In some applications, most table updates can be confined to a single time period. You might be able to set up your system so that all updates are applied overnight or on specified dates.
When updates are performed as a batch, drop all nonunique indexes and then recreate the indexes after you update the table. This strategy can have the following positive effects:
To save time, make sure that a batch-updating program calls for rows in the sequence that the primary-key index defines. Then pages of the primary-key index are read in order and only once.
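The following sketch shows this pattern; the table, column, and index names are hypothetical, and the loop that applies each change belongs to the batch program.

-- Drop nonunique indexes before the batch update
DROP INDEX ix_cust_zip;
-- The batch program fetches the changes in primary-key order of the customer table,
-- so each page of the primary-key index is read in order and only once
SELECT customer_num, new_status
   FROM pending_changes
   ORDER BY customer_num;
-- After the program applies the changes, re-create the nonunique indexes
CREATE INDEX ix_cust_zip ON customer (zipcode);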
The presence of indexes also slows the population of tables when you use the LOAD statement or parallel inserts. Loading a table that has no indexes is fast, little more than a disk-to-disk sequential copy, but updating indexes adds a great deal of overhead.
To guarantee that the loaded data satisfies all unique constraints, create unique indexes and then load the rows in DELUXE mode, which modifies the index and checks constraints for each row as it is loaded. You save time if the rows are presented in the correct sequence for at least one of the indexes. If you have a choice, make it the index with the largest key. This strategy minimizes the number of leaf pages that must be read and written.
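For example, assuming a hypothetical target table and an external table that is already defined, you might create the unique index first and then load:

-- Create the unique index first so that the load verifies uniqueness row by row
CREATE UNIQUE INDEX ix_orders_pk ON orders_history (order_num);
-- The load now runs in DELUXE mode; if the source rows are already sorted
-- on order_num, index maintenance touches each leaf page only once
INSERT INTO orders_history SELECT * FROM ext_orders;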
When you use external tables to load and unload data, you can define the table to specify simple-large-object columns as either raw or delimited, as described in the IBM Informix: Guide to SQL Syntax.
The choice of raw or delimited format has the following performance implications:
For detailed information about loading simple large objects from external tables, refer to the IBM Informix: Extended Parallel Server Administrator's Guide.
The ATTACH and DETACH options of the ALTER FRAGMENT statement are often used to update tables fast by adding and removing fragments that contain historical data.
To add a large segment of data to an existing table, you might be able to create the new data fragment and use the ALTER FRAGMENT ATTACH statement to add the data and any attached indexes. The attached table fragments must be the same type as the table to which they are to be attached, such as RAW or OPERATIONAL, and must match the table definition.
To remove table fragments that are no longer used, either to archive or drop them entirely, use the ALTER FRAGMENT DETACH statement.
For information, refer to the IBM Informix: Guide to SQL Syntax.
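As an illustration, the following sketch assumes a sales table that is fragmented by expression on an order_date column; the table, column, and dbspace names are hypothetical, and the complete clause syntax appears in the IBM Informix: Guide to SQL Syntax.

-- Attach a newly loaded table as the fragment that holds the newest quarter
ALTER FRAGMENT ON TABLE sales
   ATTACH sales_q1_2004 AS (order_date >= '01/01/2004' AND order_date < '04/01/2004');
-- Detach the fragment in the oldest dbspace into its own table for archiving
ALTER FRAGMENT ON TABLE sales
   DETACH dbs_q1_2003 sales_q1_2003;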
The database server chooses one of the following algorithms to process an ALTER TABLE statement in SQL: in-place alter, fast alter, or standard alter.
The algorithm chosen depends on the requested alteration, as described in Rules for Determining Alter Algorithm.
When you execute an ALTER TABLE statement that uses the in-place alter algorithm, the database server creates a new version of the table structure. When a row is updated, the in-place alter algorithm moves the altered row from the old definition of the table to the new definition. Rows are not duplicated in the old and new definition of the table.
The database server keeps track of all versions of table definitions. The database server retains the version status as well as all of the version structures and alter structures until the entire table is converted to the final format or a standard-alter operation is performed. For information about the standard-alter method, see Standard ALTER TABLE. For complete information about ALTER TABLE syntax and restrictions, see the IBM Informix: Guide to SQL Syntax.
The in-place alter algorithm provides the following time and disk-space performance advantages:
The table is available for use sooner when the ALTER TABLE operation uses the in-place alter algorithm because the database server locks the table for only the time that it takes to update the table definition and rebuild indexes that contain altered columns.
The table is locked for a shorter time than with the standard-alter algorithm for the following reasons:
This increased table availability can improve system throughput for application systems that require 24-by-7 operations.
These space savings can be substantial for very large tables.
When the database server uses the standard-alter algorithm, the table is locked for a long period of time because the database server:
The database server uses the standard-alter algorithm when the ALTER TABLE statement changes columns that cannot be changed in place. For more information, see Rules for Determining Alter Algorithm.
The database server uses the fast alter algorithm when the ALTER TABLE statement changes attributes of the table but does not affect the data, as it does when the statement changes the following attributes:
When the database server uses the fast alter algorithm, the table is locked for a short time. In some cases, the database server locks the system catalog tables to change the attribute. In either case, the table is unavailable for queries only briefly.
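For example, assuming that the next-extent size and the lock mode are among the attributes that the fast alter algorithm handles, statements such as the following (the table name is hypothetical) complete after only a brief lock:

-- Attribute-only changes; the data rows are not touched
ALTER TABLE orders MODIFY NEXT SIZE 64;
ALTER TABLE orders LOCK MODE (PAGE);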
If the database server detects a version page from a previous level during the execution of DML statements (INSERT, UPDATE, DELETE, SELECT), it performs the following actions:
If a query accesses rows that are not yet converted to the new table definition, you might notice a slight degradation in the performance of individual queries because the database server reformats each row before it is returned.
As long as unconverted data pages exist, performance for updates and queries on the altered table might suffer because the database server must convert the data before processing it as requested.
An in-place alter operation is outstanding when data pages that use the old definition still exist. The onutil CHECK TABLE command displays data page versions for outstanding in-place alter operations. The Count field displays the number of pages that currently use each version of the table definition.
To improve performance, you might convert any remaining unconverted data pages to the latest definition with a dummy UPDATE statement. For example, the following statement, which sets a column value to the existing value, causes the database server to convert data pages to the latest definition:
UPDATE tab1 SET col1 = col1;
After all pages of the table are updated, execute the onutil CHECK TABLE command. The total number of data pages for the current version of the table appears in the Count field.
The database server does not use the in-place alter algorithm in the following situations:
If the ALTER TABLE statement contains more than one change, the database server uses the slowest algorithm that any of the requested changes requires to execute the entire statement.
For example, assume that an ALTER TABLE MODIFY statement extends a CHARACTER column and shrinks a DECIMAL column. Increasing the length of a CHARACTER column is an in-place alter operation that requires no data conversion, but decreasing the length of a DECIMAL column is a standard-alter operation because it might require data conversion. The database server uses the standard-alter algorithm to execute this statement.
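In SQL, such a statement might look like the following sketch; the table and column names and the original column sizes are hypothetical.

-- Widening the CHAR column alone could be done in place, but shrinking the
-- DECIMAL column forces the standard-alter algorithm for the whole statement
ALTER TABLE orders
   MODIFY (ship_instruct CHAR(60), discount DECIMAL(4,2));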
Special considerations apply when you convert a real (fixed-point) decimal number to a floating-point number.
A fixed-point DECIMAL column has the format DECIMAL(p1,s1), where p1 refers to the precision of the column (the total number of significant digits) and s1 refers to its scale (the number of digits to the right of the decimal point).
If you are using an ANSI-mode database and specify DECIMAL(p), the value defaults to DECIMAL(p,0). In a non-ANSI database, DECIMAL(p) is treated as a floating-point number with a precision of p.
If a fixed-point DECIMAL is converted to a floating-point DECIMAL, the standard-alter method is used.
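For example, in a non-ANSI database the following statement (the names are hypothetical, and the column is assumed to be defined as DECIMAL(12,2)) converts a fixed-point column to a floating-point DECIMAL of precision 16, so the database server uses the standard-alter algorithm:

-- balance changes from fixed-point DECIMAL(12,2) to floating-point DECIMAL(16),
-- so the standard-alter algorithm is used
ALTER TABLE accounts MODIFY (balance DECIMAL(16));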
In addition to these restrictions, the standard-alter algorithm is used instead of the in-place alter algorithm in the following circumstances:
If an altered column is indexed, the table is still altered in place, but the database server rebuilds the index or indexes on that column. If you do not need an index on the altered column, you can improve performance by dropping or disabling the index before you perform the alter operation.
However, if the column that you modify is a primary key or foreign key and you want to keep this constraint, specify those keywords again in the ALTER TABLE statement. The database server then rebuilds the index.
For example, suppose that you create tables and alter the parent table with the following SQL statements:
CREATE TABLE parent (si smallint PRIMARY KEY CONSTRAINT pkey);
CREATE TABLE child (si smallint REFERENCES parent ON DELETE CASCADE CONSTRAINT ckey);
INSERT INTO parent (si) VALUES (1);
INSERT INTO parent (si) VALUES (2);
INSERT INTO child (si) VALUES (1);
INSERT INTO child (si) VALUES (2);
ALTER TABLE parent MODIFY (si int PRIMARY KEY CONSTRAINT pkey);
This ALTER TABLE example converts a SMALLINT column to an INT column. The database server retains the primary key because the ALTER TABLE statement specifies the PRIMARY KEY keywords and the pkey constraint. However, the database server drops any referential constraints that reference that primary key. Therefore, you must also specify the following ALTER TABLE statement for the child table:
ALTER TABLE child MODIFY (si int REFERENCES parent ON DELETE CASCADE CONSTRAINT ckey);
Even though the ALTER TABLE operation on a primary key or foreign key column rebuilds the index, the database server still takes advantage of the in-place alter algorithm to provide the following performance benefits: