Query Optimization

Home | Previous Page | Next Page Appendix B. How SQL Queries Are Executed > How Query Plans Are Created >

Query Optimization

Before the optimizer can estimate the cost of each possible query plan, it examines the query filters and determines the selectivity of each column referenced by the query statement.

The number of rows is determined by the selectivity of each conditional expression in the WHERE clause. A conditional expression that is used to select particular rows is called a filter. The selectivity is a value between 0 and 1 that indicates the proportion of rows in the table that the meet the requirements of the filter. A very selective filter, one that passes few rows, has a selectivity close to 0; a filter that passes almost all rows has a selectivity near 1. For guidelines on filters, see Evaluate Filter Clauses.

Important:

It is strongly recommended that you run UPDATE STATISTICS in at least MEDIUM mode against tables for all columns used as query filters. The data-distribution created by UPDATE STATISTICS provide the information that the optimizer uses to determine the best join plan, to decide whether to use an index, and other critically important factors in query processing. For guidelines for running UPDATE STATISTICS, see Create and Maintain Data-Distribution Statistics.

Filter-Selectivity Estimates If Data Distributions are Not Available

If data distributions are not available but the relevant columns are indexed, the optimizer will calculate selectivities based on the table indexes. The optimizer uses the formulas listed in the following table to estimate selectivity.

The list is not exhaustive. Selectivities calculated using data distributions are more accurate than those shown in the table.

Filter Expression	Selectivity (F)
indexed-col = literal-value indexed-col = host-variable indexed-col IS NULL	F = 1/(number of distinct keys in index)
tab1.indexed-col = tab2.indexed-col	F = 1/(number of distinct keys in the larger index)
indexed-col > literal-value	F = (2nd-max - literal-value)/(2nd-max - 2nd-min)
indexed-col < literal-value	F = (literal-value - 2nd-min)/(2nd-max - 2nd-min)
any-col IS NULL any-col = any-expression	F = 1/10
EXISTS subquery	F = 1 if subquery estimated to return >0 rows, else 0
NOT expression	F = 1 - F(expression)
expr1 AND expr2	F = F(expr1) * F(expr2)
expr1 OR expr2	F = F(expr1) + F(expr2) - (F(expr1) * F(expr2))
any-col IN list	Treated as any-col = item₁ OR ... OR any-col = item_n
any-col relop ANY subquery	Treated as any-col relop value₁ OR ... OR any-col relop value_nfor estimated size of subquery n
Key: indexed-col: first or only column in an index 2nd-max, 2nd-min: second-largest and second-smallest key values in indexed column any-col: any column not covered by a preceding formula

Index Evaluation

The optimizer notes whether an indexed column can be used to evaluate a filter. For this purpose, an indexed column is any single column with an index or the first column named in a composite index.

The optimizer can use an index in the following cases:

When the column is indexed and a value to be compared is a literal, a host variable, or an uncorrelated subquery
To locate relevant rows in the table, the database server first looks for the row in an appropriate index. If an appropriate index is not available and table fragments cannot be eliminated based on the fragmentation scheme, the database server must scan each table.
When the column is indexed and the value to be compared is a column in another table (a join expression)
The database server can use the index to find matching values. The following join expression shows such an example:
```
WHERE customer.customer_num = orders.customer_num
```
If rows of customer are read first, values of customer_num can be applied to an index on orders.customer_num.
When processing an ORDER BY clause
If all the columns in the clause appear in the required sequence in a single index, the database server can use the index to read the rows in their ordered sequence, thus avoiding a sort.
When processing a GROUP BY clause
If all the columns in the clause appear in one index, the database server can read groups with equal keys from the index without requiring additional processing after the rows are retrieved from their tables.

For more information, see Choose Appropriate Columns for Indexes.