The optimizer bases query-cost estimates on the number of rows to be retrieved from each table. In turn, the estimated number of rows is based on the selectivity of each conditional expression that is used within the WHERE clause. A conditional expression that is used to select rows is termed a filter.
The selectivity is a value between 0 and 1 that indicates the proportion of rows within the table that the filter can pass. A selective filter, one that passes few rows, has a selectivity near 0, and a filter that passes almost all rows has a selectivity near 1. For guidelines on filters, see Improving Filter Selectivity.
The optimizer can use data distributions to calculate selectivities for the filters in a query. However, in the absence of data distributions, the database server calculates selectivities for filters of different types based on table indexes. The following table lists some of the selectivities that the optimizer assigns to filters of different types. Selectivities calculated using data distributions are even more accurate than the ones that this table shows.
Filter Expression | Selectivity (F) |
---|---|
indexed-col = literal-valueindexed-col = host-variableindexed-col IS NULL | F = 1/(number of distinct keys in index) |
tab1.indexed-col = tab2.indexed-col | F = 1/(number of distinct keys in the larger index) |
indexed-col > literal-value | F = (2nd-max - literal-value)/(2nd-max - 2nd-min) |
indexed-col < literal-value | F = (literal-value - 2nd-min)/(2nd-max - 2nd-min) |
any-col IS NULLany-col = any-expression | F = 1/10 |
any-col > any-expressionany-col < any-expression | F = 1/3 |
any-col MATCHES any-expressionany-col LIKE any-expression | F = 1/5 |
EXISTS subquery | F = 1 if subquery estimated to return >0 rows, else 0 |
NOT expression | F = 1 - F(expression) |
expr1 AND expr2 | F = F(expr1) ∩ F(expr2) |
expr1 OR expr2 | F = F(expr1) + F(expr2) - (F(expr1) ∩ F(expr2)) |
any-col IN list | Treated as any-col = item1 OR°OR any-col = itemn. |
any-col relop ANY subquery | Treated as any-col relop value1 OR ° OR any-col relop valuen for estimated size of subquery n. relop is any relational operator, such as <, >, >=, <=. |
Key:
indexed-col: first or only column in an index 2nd-max, 2nd-min: second-largest and second-smallest key values in indexed column |