Before the optimizer can estimate the cost of each possible query plan, it examines the query filters and determines the selectivity of each column referenced by the query statement.
The number of rows is determined by the selectivity of each conditional expression in the WHERE clause. A conditional expression that is used to select particular rows is called a filter. The selectivity is a value between 0 and 1 that indicates the proportion of rows in the table that the meet the requirements of the filter. A very selective filter, one that passes few rows, has a selectivity close to 0; a filter that passes almost all rows has a selectivity near 1. For guidelines on filters, see Evaluate Filter Clauses.
If data distributions are not available but the relevant columns are indexed, the optimizer will calculate selectivities based on the table indexes. The optimizer uses the formulas listed in the following table to estimate selectivity.
The list is not exhaustive. Selectivities calculated using data distributions are more accurate than those shown in the table.
Filter Expression | Selectivity (F) |
---|---|
indexed-col = literal-value indexed-col = host-variable indexed-col IS NULL | F = 1/(number of distinct keys in index) |
tab1.indexed-col = tab2.indexed-col | F = 1/(number of distinct keys in the larger index) |
indexed-col > literal-value | F = (2nd-max - literal-value)/(2nd-max - 2nd-min) |
indexed-col < literal-value | F = (literal-value - 2nd-min)/(2nd-max - 2nd-min) |
any-col IS NULL any-col = any-expression | F = 1/10 |
EXISTS subquery | F = 1 if subquery estimated to return >0 rows, else 0 |
NOT expression | F = 1 - F(expression) |
expr1 AND expr2 | F = F(expr1) * F(expr2) |
expr1 OR expr2 | F = F(expr1) + F(expr2) - (F(expr1) * F(expr2)) |
any-col IN list | Treated as any-col = item1 OR ... OR any-col = itemn |
any-col relop ANY subquery | Treated as any-col relop value1 OR ... OR any-col relop valuen for estimated size of subquery n |
Key: indexed-col: first or only column in an index 2nd-max, 2nd-min: second-largest and second-smallest key values in indexed column any-col: any column not covered by a preceding formula |
The optimizer notes whether an indexed column can be used to evaluate a filter. For this purpose, an indexed column is any single column with an index or the first column named in a composite index.
The optimizer can use an index in the following cases:
To locate relevant rows in the table, the database server first looks for the row in an appropriate index. If an appropriate index is not available and table fragments cannot be eliminated based on the fragmentation scheme, the database server must scan each table.
The database server can use the index to find matching values. The following join expression shows such an example:
WHERE customer.customer_num = orders.customer_num
If rows of customer are read first, values of customer_num can be applied to an index on orders.customer_num.
If all the columns in the clause appear in the required sequence in a single index, the database server can use the index to read the rows in their ordered sequence, thus avoiding a sort.
If all the columns in the clause appear in one index, the database server can read groups with equal keys from the index without requiring additional processing after the rows are retrieved from their tables.
For more information, see Choose Appropriate Columns for Indexes.
Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]