Home | Previous Page | Next Page   Planning Table and Index Fragmentation > Choosing Fragmentation Schemes >

Fragmentation Methods

This section describes each fragmentation method and lists its advantages and disadvantages for fragment elimination and even data distribution. For syntax information, see the CREATE TABLE section of the IBM Informix: Guide to SQL Syntax.

Round-Robin Fragmentation

A round-robin fragmentation scheme places rows as they are inserted one after another in fragments, rotating through the set of fragments to distribute the rows evenly. If one of the table fragments is full, it is skipped.

Round-robin fragmentation schemes distribute data evenly among table fragments and ensure good performance for applications that require scanning the entire table. Because the database server cannot determine which table fragment contains a specific row, however, round-robin fragmentation schemes have the following significant disadvantages:

Round-robin fragmentation schemes are of limited usefulness. For OLTP applications that execute only short transactions that process few rows through an index, you might use round-robin fragmentation and detach indexes into separate dbspaces. For other applications, consider a fragmentation method that distributes data approximately equally among fragments, but permits the database server to identify fragments that contain specific rows.

Expression-Based Fragmentation

An expression fragmentation scheme puts rows that contain specified values in the same fragment. You specify a fragmentation expression that defines criteria for assigning rows to each fragment, either as a range rule or some other arbitrary rule. You can also specify a remainder fragment that holds all rows that do not match the criteria for any other fragment, but a remainder fragment reduces the efficiency of the expression-based fragmentation scheme.

Before you create an expression fragmentation scheme, analyze the data distribution as described in The Data-Distribution Pattern. If you know the data distribution, you can design a fragmentation rule that distributes data across dbspaces as required to meet your fragmentation goal.

Defining a fragmentation expression

The fragmentation expression can be a simple column comparison that has the general form:

<column_name><comparison><constant>

where <comparison> is one of the operators >, >=, =, <=, or <. The following example demonstrates a simple comparison predicate:

FRAGMENT BY EXPRESSION
colA < 5 IN db1,
colA >= 5 AND colA < 10 db2,
...

The fragmentation can also be a more complex comparison with the general form:

<function(column)><comparison><constant>

In this type of expression, <function> can be one of the following built-in functions:

Time
Functions
Arithmetic
Functions
String-Manipulation
Functions
Conversion Functions
YEAR ABS TRIM UPPER
MONTH POW SUBSTRING LOWER
DAY ROOT SUBSTR INITCAP
MDY ROUND REPLACE HEX
WEEKDAY SQRT LPAD LENGTH
TRUNC RPAD

The following examples demonstrate the types of complex predicates that can be used in fragmentation expressions:

FRAGMENT BY EXPRESSION
colA - colB < 5 in db1,
colA - colB >=5 AND colA - colB < 10 db2,
...
FRAGMENT BY EXPRESSION
YEAR(colA - colB) < 2005 IN db1,
YEAR(colA - colB) >= 2005 AND YEAR(colA-colB) < 2010 db2,

Tip:
Avoid specifying columns in the fragmentation expression if their value changes often. Such updates might cause rows to move, deleting them from one fragment and adding them to another fragment. This activity increases CPU and I/O overhead.

For detailed information about the syntax of expression-based fragmentation, see the IBM Informix: Guide to SQL Syntax.

Fragment elimination

For fragment elimination, the fragmentation scheme should match most query filters. For example, a table might be fragmented so that data for the days of the month, such as the first, the second, the third, and so on, are stored in separate fragments.

Fragment elimination occurs in a way that is analogous to the way fragmentation schemes are handled. The left hand side of the query predicate is extended like the left hand side of a fragmentation expression based on syntactic similarity.

The following examples show queries that have fragments eliminated based on similarities between the query predicate and the fragmentation expression:

FRAGMENT BY EXPRESSION
YEAR(order) < 1900 IN db1,
YEAR(order) >= 1900 AND YEAR(order) < 2000 IN db2,...
SELECT ... WHERE YEAR(order) < 1980;

FRAGMENT BY EXPRESSION
paiddate - shipdate < 30 IN db1,
paiddate - shipdate > 30 IN db2,...
SELECT ... WHERE paiddate - shipdate > 60;
Transforming datetime measures

When a query is expressed using different datetime measurements, it can often be transformed into a comparable type which can be used in fragment elimination. The following examples demonstrate how datetime measures are transformed:

FRAGMENT BY EXPRESSION 
YEAR(order)<1900 IN db1, 
YEAR(order)>=1900 AND YEAR(order)<2000 IN db2,...--unit in year 
SELECT ...WHERE order >'05/31/1998 ' ;--unit in date

In this example the fragmentation scheme uses an extraction function such as YEAR(). Here, the same function is applied to both the left hand side of the query and the left hand side of the fragmentation predicate. In the case, the function can be used for fragmentation elimination because it can be applied to both sides of the query predicate. In other words, WHERE order > '05/31/1998'; becomes WHERE YEAR(order) > YEAR('05/31/1998') for the purpose of fragment elimination.

FRAGMENT BY EXPRESSION
order < '04/31/1900' IN db1,
order >= '06/31/1950' AND order < '01/31/1960' IN db2, ... -- unit in date
SELECT ... WHERE YEAR(order) > 1960; -- unit in year

These examples show a context where the left hand side of the query predicate uses an extraction function which matches the left hand side of the fragmentation predicate. If the unit of measure returned by the extraction function can be represented as a range of values that matches the fragmentation scheme, then query optimizer transforms the right hand side of the query predicate into the corresponding range of values. This predicate can the be used for fragment elimination.

Balanced processing

A fragmentation expression that generates a relatively even distribution of rows across fragments might improve parallel processing of table fragments if all fragments are required by most queries.

Improved concurrency for OLTP applications

Analyze the queries that access the table, as described in The Table-Access Pattern.

If certain rows are accessed more often than others, try to distribute data so that few of these rows are in the same table fragment.

If all columns are used in many transactions, fragmentation expressions based on more than one column can improve fragment elimination. For example, you might create a hybrid fragmentation scheme, fragmenting the table across coservers with an expression-based scheme and across dbspaces within each coserver with a hash-based scheme.

Hash Fragmentation

A hash fragmentation scheme uses an internal hash function to distribute rows among table fragments. If the fragmentation column contains few duplicate values, a hash fragmentation scheme balances the number of rows in each fragment reasonably well.

Because hash fragmentation schemes allow the database server to identify the fragment in which a specific row is placed, such fragmentation schemes are better than round-robin distribution. Table fragments can be identified immediately for OLTP transactions that require only a few rows and eliminated if they are not necessary for DSS queries.

An optimal fragmentation strategy for a DSS-only environment is to fragment tables across all coservers by system-defined hash on a key that is often used to join the tables. When you fragment tables by hash on one column, try to sure that the table fragments are stored in dbslices have been created to take advantage of collocated joins, as explained in Creating Dbslices for Collocated Joins.

Although you can specify a serial column in the hash fragmentation scheme of a fragmented table, it must be the only column that you specify.

Serial values that are assigned to the serial column in a fragmented table and a nonfragmented table might differ. The database server assigns serial values in each fragment, but gaps in the sequence might occur. The values do not overlap. You cannot specify what values to use. The database server controls the values to add.

Hybrid fragmentation schemes, described in the following section, provide additional granularity for fragment elimination.

For detailed information about the syntax of hash fragmentation, see the IBM Informix: Guide to SQL Syntax.

Hybrid Fragmentation

Hybrid fragmentation schemes combine two fragmentation strategies on the same table for increased data granularity and fragment elimination during query processing.

Hybrid fragmentation provides a two-dimensional fragmentation scheme; a table is fragmented by expression or range into specific dbslices or sets of dbspaces and fragmented by hash into the dbspaces in each dbslice or set of dbspaces. This distribution strategy provides finer granularity of table fragments and permits the database server to eliminate fragments based on the hash distribution, the expression distribution, or occasionally both.

For even finer granularity on a single column, you can use the same column as the distribution column for both the hash and expression distribution.

For example, you might use following onutil commands to create the cogroup sales. Then in cogroup sales, you define dbslices across 12 coservers so that each dbslice includes two dbspaces on each coserver for a total of 24 dbspaces:

% onutil
1> CREATE COGROUP sales 
2> FROM xps42t_techpubs.%r(1..12);
Cogroup successfully added.
3> CREATE DBSLICE acct_dbsl FROM 
4> COGROUP sales CHUNK "/dbspaces/dbs1.%c" SIZE 490000,
5> COGROUP sales CHUNK "/dbspaces/dbs2.%c" SIZE 490000;
Dbslice successfully added.
...
36> CREATE DBSLICE acct_dbsl12 FROM 
37> COGROUP sales CHUNK "/dbspaces/dbs23.%c" SIZE 490000,
38> COGROUP sales CHUNK "/dbspaces/dbs24.%c" SIZE 490000;
Dbslice successfully added.

If you have accounting data for a full year, you might fragment account numbers by month, with each month in a separate dbslice. The following CREATE TABLE statement in SQL shows the combination of the expression-based fragmentation scheme to fragment each month into one of the twelve dbslices and the hash fragmentation scheme to fragment the monthly account numbers into the dbspaces in each dbslice:

CREATE TABLE account
(account_num integer,
account_bal integer, 
account_date date, 
account_name char(30)
)
FRAGMENT BY HYBRID (account_num) EXPRESSION
   account_date >= '01/01/1996' 
      and account_date < '02/01/1996' IN acct_dbsl1
   account_date >= '02/01/1996' 
      and account_date < '03/01/96' IN acct_dbsl2
   ...
   account_date >= '12/01/1996' 
      and account_date < '01/01/97' IN acct_dbsl12

Figure 2 illustrates the dbspaces in each dbslice for the table fragments that this hybrid fragmentation scheme defines.

Figure 2. Hybrid Fragmentation
begin figure description - This figure is described in the surrounding text. - end figure description

For detailed information about the syntax of hybrid fragmentation, see the IBM Informix: Guide to SQL Syntax.

Range Fragmentation

Range fragmentation ensures that rows are fragmented evenly across dbspaces and the fragment that contains each row is uniquely identified. Only columns that contain data of type integer or smallint can be used for range fragmentation expressions, and simple fragmentation can be based on only one column.

Range fragmentation is similar to clustering. It is designed to distribute and cluster rows evenly, and it improves access performance for tables with dense, uniform distributions and little or no duplication in the fragmentation column.

Range fragmentation schemes can be used either as a single-level scheme or a hybrid scheme. Hybrid range fragmentation schemes can specify different columns in each fragmentation statement. For small number ranges, a hybrid range fragmentation scheme might avoid data skew, which sometimes occurs with hybrid hash fragmentation schemes.

Range distribution is similar to expression distribution in that rows are distributed by a range of values. You can use range fragmentation in any circumstance in which expression fragmentation is appropriate and the fragmentation columns are INTEGER or SMALLINT data types.

In range distribution, however, the database server balances the distribution of rows evenly among fragments on the basis of the MIN and MAX values if they are provided or on the assumption that the single stated RANGE value is the maximum and 0 is the minimum.

Equality searches on the search key or keys are faster when rows are grouped in range partitions. For example, queries and transactions with filters of the form WHERE a.col1 = b.col1 or WHERE a.col1 = '12345' can take advantage of the range function on col1 if either table a or b is a range-fragmented table.

The following simple example shows how to create a table with the RANGE fragmentation option. This example shows how to fragment an account lookup table evenly across the ten dbspaces in the dbslice accounts so that each dbspace contains approximately 900 rows. The MIN and MAX keywords indicate the total range of expected values, with account numbers beginning at 1000 and ending at 9999.

CREATE TABLE accth(account_num integer,...)
   ...
   FRAGMENT BY RANGE (account_num MIN 1000 MAX 9999) IN accounts;

Range fragmentation can be used in a hybrid fragmentation scheme if the RANGE keyword is used for both fragmentation statements.

For example, assuming that store numbers are evenly distributed from 1 to 1800 and no store number can be greater than 1800, you might enter the following statement to fragment a file on one column in dbslices across coservers and on the same column through dbspaces in the dbslices on each coserver:

CREATE TABLE stores(store_num integer,...)
   ...
   FRAGMENT BY HYBRID (RANGE (store_num))
   RANGE(store_num MIN 1 MAX 1800) 
IN stores1, stores2

Figure 3 shows how the store numbers will be distributed among the dbspaces in the stores1 and stores2 dbslices.

Figure 3. Range Fragmentation
begin figure description - This figure is described in the surrounding text. - end figure description

If any values of store_num fall outside of the specified range, 0 to 1800, the database server returns an error and does not insert the rows that contain those values into the table. To prevent such problems, you can specify a REMAINDER fragment in a single dbspace. However, rows in a REMAINDER fragment reduce the efficiency of a range-fragmented table for queries that require range searches.

For finer granularity, you can also use range fragmentation to create a hybrid range fragmentation scheme on two columns. For detailed information about the rules for hybrid range fragmentation, refer to the IBM Informix: Guide to SQL Syntax.

The ALTER FRAGMENT statement is not supported for range-fragmented tables.

For detailed information about the syntax of range and hybrid range fragmentation expressions, see the IBM Informix: Guide to SQL Syntax.

Altering a Fragmentation Scheme

If you find that the original fragmentation scheme is not efficient before you put your database into production, you can change it completely in two ways:

To change the fragmentation scheme by adding a new fragment, use the ALTER FRAGMENT ATTACH statement, as described in the IBM Informix: Guide to SQL Syntax.

Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]