Home | Previous Page | Next Page   Appendix B. How SQL Queries Are Executed > How Query Plans Are Created >

Query Optimization

Before the optimizer can estimate the cost of each possible query plan, it examines the query filters and determines the selectivity of each column referenced by the query statement.

The number of rows is determined by the selectivity of each conditional expression in the WHERE clause. A conditional expression that is used to select particular rows is called a filter. The selectivity is a value between 0 and 1 that indicates the proportion of rows in the table that the meet the requirements of the filter. A very selective filter, one that passes few rows, has a selectivity close to 0; a filter that passes almost all rows has a selectivity near 1. For guidelines on filters, see Evaluate Filter Clauses.

Important:
It is strongly recommended that you run UPDATE STATISTICS in at least MEDIUM mode against tables for all columns used as query filters. The data-distribution created by UPDATE STATISTICS provide the information that the optimizer uses to determine the best join plan, to decide whether to use an index, and other critically important factors in query processing. For guidelines for running UPDATE STATISTICS, see Create and Maintain Data-Distribution Statistics.

Filter-Selectivity Estimates If Data Distributions are Not Available

If data distributions are not available but the relevant columns are indexed, the optimizer will calculate selectivities based on the table indexes. The optimizer uses the formulas listed in the following table to estimate selectivity.

The list is not exhaustive. Selectivities calculated using data distributions are more accurate than those shown in the table.

Filter Expression Selectivity (F)
indexed-col = literal-value indexed-col = host-variable indexed-col IS NULL F = 1/(number of distinct keys in index)
tab1.indexed-col = tab2.indexed-col F = 1/(number of distinct keys in the larger index)
indexed-col > literal-value F = (2nd-max - literal-value)/(2nd-max - 2nd-min)
indexed-col < literal-value F = (literal-value - 2nd-min)/(2nd-max - 2nd-min)
any-col IS NULL any-col = any-expression F = 1/10
EXISTS subquery F = 1 if subquery estimated to return >0 rows, else 0
NOT expression F = 1 - F(expression)
expr1 AND expr2 F = F(expr1) * F(expr2)
expr1 OR expr2 F = F(expr1) + F(expr2) - (F(expr1) * F(expr2))
any-col IN list Treated as any-col = item1 OR ... OR any-col = itemn
any-col relop ANY subquery Treated as any-col relop value1 OR ... OR any-col relop valuen for estimated size of subquery n
Key: indexed-col: first or only column in an index 2nd-max, 2nd-min: second-largest and second-smallest key values in indexed column any-col: any column not covered by a preceding formula

Index Evaluation

The optimizer notes whether an indexed column can be used to evaluate a filter. For this purpose, an indexed column is any single column with an index or the first column named in a composite index.

The optimizer can use an index in the following cases:

For more information, see Choose Appropriate Columns for Indexes.

Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]