Data skew occurs when most of the data involved in an SQL operation resides on one coserver instead of being distributed evenly across multiple coservers.
For example, data skew occurs during a hash join when the join column contains a large number of duplicate values. Because rows with the same join value are sent to the same coserver, the bulk of the rows involved in the hash join reside on one or two of the coservers. As a result, those one or two coservers are still processing rows for the hash join while the other coservers have finished and sit idle, and the next SQL operator cannot start processing until the remaining rows arrive.
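The effect of duplicate join values can be illustrated with a small simulation. The following Python sketch is not how Extended Parallel Server assigns rows internally; the function name partition_rows and the modulo hash are assumptions chosen only to show why many duplicate join values pile up on a single coserver.

```python
from collections import Counter

def partition_rows(join_keys, num_coservers):
    """Count how many rows land on each coserver.

    Hypothetical model: a row goes to coserver (key % num_coservers).
    The real hash function used by the database server is not shown here.
    """
    counts = Counter()
    for key in join_keys:
        counts[key % num_coservers] += 1
    return [counts[i] for i in range(num_coservers)]

# 1,000 rows with distinct join values spread evenly over 4 coservers.
uniform = partition_rows(range(1000), 4)

# 1,000 rows where 900 share the join value 7: every one of those rows
# hashes to coserver 7 % 4 == 3, so that coserver receives the bulk of the work.
skewed = partition_rows([7] * 900 + list(range(100)), 4)

print(uniform)  # [250, 250, 250, 250]
print(skewed)   # [25, 25, 25, 925]
```

In the skewed case three coservers finish their 25 rows quickly and then wait while the fourth works through 925 rows.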
RGM can detect data skew during the build or probe phase of the hash join. The database server detects whether one coserver has many more rows to process than the other coservers. If data skew occurs, RGM redistributes rows from the coserver with the most rows to the idle coservers, which then perform part of the hash join.
The database server creates a new SQL operator, the flex join operator, to process the redistributed hash join rows on the other coservers.
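The detect-and-redistribute idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RGM algorithm: the functions detect_skew and redistribute, the threshold of twice the mean row count, and the policy of moving only the rows above the mean are all hypothetical, because the actual heuristics are not described here.

```python
def detect_skew(row_counts, threshold=2.0):
    """Return the index of the most heavily loaded coserver, or None.

    Assumption: a coserver counts as skewed when it holds more than
    `threshold` times the mean number of rows per coserver.
    """
    mean = sum(row_counts) / len(row_counts)
    busiest = max(range(len(row_counts)), key=row_counts.__getitem__)
    return busiest if row_counts[busiest] > threshold * mean else None

def redistribute(row_counts, busiest):
    """Move the rows above the mean from `busiest` to the other coservers.

    Assumption: the excess is split as evenly as possible among the others.
    """
    counts = list(row_counts)
    others = [i for i in range(len(counts)) if i != busiest]
    if not others:
        return counts
    mean = sum(counts) // len(counts)
    excess = counts[busiest] - mean
    share, leftover = divmod(excess, len(others))
    for pos, i in enumerate(others):
        counts[i] += share + (1 if pos < leftover else 0)
    counts[busiest] -= excess
    return counts

load = [25, 25, 25, 925]          # rows per coserver during the hash join
busiest = detect_skew(load)       # 3
if busiest is not None:
    load = redistribute(load, busiest)
print(load)                       # [250, 250, 250, 250]
```

In this model the rows moved off the busiest coserver correspond to the work that the flex join operator processes on the other coservers.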
For more information on the features that Extended Parallel Server provides to balance the workload, refer to your IBM Informix: Performance Guide.