The following SELECT statement calls for a three-way join:
SELECT C.customer_num, O.order_num FROM customer C, orders O, items I WHERE C.customer_num = O.customer_num AND O.order_num = I.order_num
Assume also that no indexes are on any of the three tables. Suppose that the optimizer chooses the customer-orders-items path and the nested-loop join for both joins (in reality, the optimizer usually chooses a hash join for two tables without indexes on the join columns). Figure 49 shows the query plan, expressed in pseudocode. For information about interpreting query plan information, see Query Plan Report.
for each row in the customer table do: read the row into C for each row in the orders table do: read the row into O if O.customer_num = C.customer_num then for each row in the items table do: read the row into I if I.order_num = O.order_num then accept the row and send to user end if end for end if end for end for
This procedure reads the following rows:
This example does not describe the only possible query plan. Another plan merely reverses the roles of customer and orders: for each row of orders, it reads all rows of customer, looking for a matching customer_num. It reads the same number of rows in a different order and produces the same set of rows in a different order. In this example, no difference exists in the amount of work that the two possible query plans need to do.
The presence of a column filter changes things. A column filter is a WHERE expression that reduces the number of rows that a table contributes to a join. The following example shows the preceding query with a filter added:
SELECT C.customer_num, O.order_num FROM customer C, orders O, items I WHERE C.customer_num = O.customer_num AND O.order_num = I.order_num AND O.paid_date IS NULL
The expression O.paid_date IS NULL filters out some rows, reducing the number of rows that are used from the orders table. Consider a plan that starts by reading from orders. Figure 50 displays this sample plan in pseudocode.
for each row in the orders table do: read the row into O if O.paid_date is null then for each row in the customer table do: read the row into C if O.customer_num = C.customer_num then for each row in the items table do: read the row into I if I.order_num = O.order_num then accept row and return to user end if end for end if end for end if end for
Let pdnull represent the number of rows in orders that pass the filter. It is the value of COUNT(*) that results from the following query:
SELECT COUNT(*) FROM orders WHERE paid_date IS NULL
If one customer exists for every order, the plan in Figure 50 reads the following rows:
Figure 51 shows an alternative execution plan that reads from the customer table first.
for each row in the customer table do: read the row into C for each row in the orders table do: read the row into O if O.paid_date is null and O.customer_num = C.customer_num then for each row in the items table do: read the row into I if I.order_num = O.order_num then accept row and return to user end if end for end if end for
Because the filter is not applied in the first step that Figure 51 shows, this plan reads the following rows:
The query plans in Figure 50 and Figure 51 produce the same output in a different sequence. They differ in that one reads a table pdnull times, and the other reads a table SELECT COUNT(*) FROM customer times. By choosing the appropriate plan, the optimizer can save thousands of disk accesses in a real application.
The preceding examples do not use indexes or constraints. The presence of indexes and constraints provides the optimizer with options that can greatly improve query-execution time. Figure 52 shows the outline of a query plan for the previous query as it might be constructed using indexes.
for each row in the customer table do: read the row into C look up C.customer_num in index on orders.customer_num for each matching row in the orders index do: read the table row for O if O.paid_date is null then look up O.order_num in index on items.order_num for each matching row in the items index do: read the row for I construct output row and return to user end for end if end for end for
The keys in an index are sorted so that when the database server finds the first matching entry, it can read any other rows with identical keys without further searching, because they are located in physically adjacent positions. This query plan reads only the following rows:
This query plan achieves a great reduction in cost compared with plans that do not use indexes. An inverse plan, reading orders first and looking up rows in the customer table by its index, is also feasible by the same reasoning.
The physical order of rows in a table also affects the cost of index use. To the degree that a table is ordered relative to an index, the overhead of accessing multiple table rows in index order is reduced. For example, if the orders table rows are physically ordered according to the customer number, multiple retrievals of orders for a given customer would proceed more rapidly than if the table were ordered randomly.
In some cases, using an index might incur additional costs. For more information, see Index Lookup Costs.
Enterprise Edition Home | Express Edition Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]