Redundant Data

Enterprise Edition Home | Express Edition Home | Previous Page | Next Page Table Performance Considerations > Denormalizing the Data Model to Improve Performance >

Redundant Data

Normalized tables contain no redundant data. Every attribute appears in only one table. Normalized tables also contain no derived data. Instead, data that can be computed from existing attributes is selected as an expression based on those attributes.

Normalizing tables minimizes the amount of disk space used and makes updating the tables as easy as possible. However, normalized tables can force you to use joins and aggregate functions often, and those processes can be time consuming.

As an alternative, you can introduce new columns that contain redundant data, provided you understand the trade-offs involved.

Adding Redundant Data

A correct data model avoids redundancy by keeping any attribute only in the table for the entity that it describes. If the attribute data is needed in a different context, you join tables to make the connection. But joining takes time. If a frequently used join affects performance, you can eliminate it by duplicating the joined data in another table.

In the stores_demo database, the manufact table contains the names of manufacturers and their delivery times. An actual working database might contain many other attributes of a supplier, such as address and sales representative name.

The contents of manufact are primarily a supplement to the stock table. Suppose that a time-critical application frequently refers to the delivery lead time of a particular product but to no other column of manufact. For each such reference, the database server must read two or three pages of data to perform the lookup.

You can add a new column, lead_time, to the stock table and fill it with copies of the lead_time column from the corresponding rows of manufact. That arrangement eliminates the lookup and therefore speeds up the application.

Like derived data, redundant data takes space and poses an integrity risk. In the example described in the previous paragraph, many extra copies of the lead time for each manufacturer can exist. (Each manufacturer can appear in stock many times.) The programs that insert or update a row of manufact must also update multiple rows of stock.

The integrity risk is simply that the redundant copies of the data might not be accurate. If a lead time is changed in manufact, the stock column is outdated until it is also updated. As you do with derived data, define the conditions under which redundant data might be wrong.

For more information on database design, see the IBM Informix: Database Design and Implementation Guide.