Analyze query plans either dynamically through ISA and onstat utility options or by examining the output of the SET EXPLAIN statement, as follows:
SET EXPLAIN ON AVOID_EXECUTE
By default, the database server overwrites the contents of the output file each time it collects data for a query. To keep information for all queries, add the WITH APPEND clause to the SET EXPLAIN statement, as the following example shows:
SET EXPLAIN ON WITH APPEND;
If you are tuning more than one query, you might specify a different output file for each query. For information about the default name and location of the output file, and how to specify a different name and location, refer to SET EXPLAIN in the IBM Informix: Guide to SQL Syntax.
To generate sqexplain.out without executing the query, use the set explain on statement with the avoid_execute keyword. This is useful when you want to analyze query plans for large and complex queries but do not want to take the time or use the database server resources required to execute them. The following example shows how to turn this feature on and off for a session:
set explain on avoid execute; ... ... set explain off/on;
When AVOID_EXECUTE is on, the server writes information about each SELECT statement to sqexplain.out but does not execute the statement. To begin actually executing queries, execute the set explain AVOID_EXECUTE off statement.
You can also disable execution of for specific queries. The following example shows the syntax for using the avoid_execute optimizer directive:
SELECT {+explain avoid_execute}
emp.job_num, job.job_name
FROM emp, job
WHERE emp.job_num = job.job_num;
The database server generates explain output for this SELECT statement, but does not execute the query. This directive has no effect on subsequent queries.
Figure 22 shows an example of a query plan output from the SET EXPLAIN ON statement.
QUERY: ------ SELECT C.customer_num, O.order_num, SUM (I.total_price) FROM customer C, orders O, items I WHERE C.customer_num = O.customer_num AND O.order_num = I.order_num GROUP BY C.customer_num, O.order_num; Estimated Cost: 102 Estimated # of Rows Returned: 12 Temporary Files Required For: GROUP BY 1) sales.o: SEQUENTIAL SCAN 2) sales.c: INDEX PATH (1) Index Keys: customer_num (Key-Only) Lower Index Filter: sales.c.customer_num = sales.o.customer_num 3) pubs.i: INDEX PATH (1) Index Keys: order_num Lower Index Filter: sales.i.order_num = sales.o.order_num
After the query statement, the query plan includes the following information:
Figure 22 shows that the cost for the query is 102. These units represent a relative time for query execution, in which each unit is assumed to be roughly equivalent to a typical disk access. The optimizer chose this query plan because the estimated cost for its execution was the lowest among all the evaluated plans.
The example in Figure 22 shows that the number of rows returned is estimated to be 12.
The plan in Figure 22 shows that the database server will perform the following actions:
Lower Index Filter shows the key value where the index read begins. If the filter condition contains more than one value, an Upper Index Filter would be shown for the key value where the index read stops.
The following sections explain how to interpret information in the SET EXPLAIN output when you run a query such as the following one:
SET PDQPRIORITY 10; SET EXPLAIN ON; select geo_id, sum(dollars) from customer a, cash_rr b where a.cust_id=b.cust_id group by geo_id order by geo_id;
The SET EXPLAIN output provides the runtime data for each SQL operator in a query plan. For a complete list of SQL operators, a description of the function of each operator, and whether it can be executed in parallel, see SQL Operators. The following table lists the common SQL operators and explains how to interpret the information that appears.
To display the runtime data of a query, use the following monitoring tools:
For more information on how the database server creates a query plan for execution, refer to How Query Plans Are Created.
Figure 23 shows statistics excerpts from a sample sqexplain.out file.
XMP Query Statistics Cosvr_ID: 1 Plan_ID: 9 type segid brid information ---- ----- ---- ----------- scan 6 0 inst cosvr time rows_prod rows_scan ---- ----- ---- --------- --------- 0 1 0 1 1 -------------------------------------- 1 1 1 scan 7 0 inst cosvr time rows_prod rows_scan ---- ----- ---- --------- --------- 0 1 13 758 991161 1 1 12 687 989834 2 1 13 677 975101 3 1 12 691 972258 4 1 11 736 952424 5 1 12 686 981833 6 1 9 330 628271 7 1 8 359 641173 8 1 9 345 671366 9 1 5 402 343214 10 1 4 343 345530 11 1 5 359 334115 -------------------------------------- 12 6373 8826280 hjoin 5 0 inst cosvr time rows_prod rows_bld rows_probe mem ovfl ---- ----- ---- --------- -------- ---------- --- ---- 0 1 13 0 0 0 80 0 1 1 39 6373 1 6373 88 0 2 1 13 0 0 0 80 0 3 1 13 0 0 0 80 0 ------------------------------------------------------------------- 4 6373 1 6373 (6656)
This sqexplain.out file shows the following query-plan information.
The figure below shows a query plan for a distributed query.
QUERY: ------ SELECT l.customer_num, l.lname, l.company, l.phone, r.call_dtime, r.call_descr FROM customer l, stores@california:cust_calls r WHERE l.customer_num = r.customer_num
Estimated Cost: 3 Estimated # of Rows Returned: 7
1) informix.l: SEQUENTIAL SCAN (Parallel, fragments: ALL)
2) informix.r: REMOTE PATH
Remote SQL Request: select x0.call_dtime ,x0.call_descr ,x0.customer_num from stores:"john".cust_calls x0
DYNAMIC HASH JOIN (Build Outer Broadcast) Dynamic Hash Filters: informix.l.customer_num = informix.r.customer_num
# of Secondary Threads = 5
XMP Query Plan
oper segid brid width misc info
scan 2 0 2 l
remote 3 0 1
hjoin 1 0 2
The following table shows the main differences between the chosen query plans for the distributed join and the local join shown in the figure.
Information Displayed
in
Output |
Description of Difference in the
Output
for a Distributed and Local-Only Query |
---|---|
SQL statement | The local database server sends the SQL statement to the remote site. The remote site reoptimizes the statement to choose the actual plan. |
Table name | The remote table name is prefaced with the database and server names. |
Estimated cost | The optimizer estimates a higher cost for the distributed query. |
Remote path | For a distributed query, indicates that the query
involves a scan of objects on a remote database server (remote scan).
In the example in Figure 24, the remote scan is for r or stores@california:cust_calls. The optimizer decided to use the local table as the outer table of the hash join. |
Remote SQL request | The remote SQL request is the SQL statement that
the local database server sends to the remote server for the remote scan.
The statement qualifies the remote table with its owner name.
In the example in Figure 24, the owner name is john.The remote server reoptimizes this statement to choose the actual plan. |
Queries are processed faster if hash joins and other operators execute entirely in shared memory. If not enough memory is available, operator output overflows to temporary disk space and must be read back in when needed. For increased query-processing efficiency, hash-join overflows to temporary space are read back in asynchronously through read-ahead buffers.
The following sample section of sqexplain.out information for a query shows that both instances of the join operator had overflow to disk:
hjoin 2 0 inst cosvr time rows_prod rows_bld rows_probe mem ovfl tmp ---- ----- ---- --------- -------- ---------- --- ---- --- 0 1 5 4941 1489 4973 1304 872 244 1 2 7 4993 1511 5027 1320 784 226 ---------------------------------------------------------------------- 2 9934 3000 10000 (2432)
The mem column shows the number of kilobytes of memory used to fit all the build rows in memory. The number in the ovfl specifies the number of kilobytes of overflow to disk. The number in the tmp column is the number of pages used. The number in parentheses in the last row of the example is the total memory allocated for the hash join across all the coservers.
If analysis tools show that hash joins are overflowing to temporary space, consider the following options to ensure that the query is granted enough memory:
To calculate the PDQPRIORITY value that would give coserver 2 the memory that it requires for this hash join, divide the memory required (1320 kilobytes) by the value of DS_TOTAL_MEM. Then multiply the result by 100.
If more than one memory-consuming instance overflows, to ensure that all instances get enough memory, choose the instance that requires the most memory.
When IMPLCIT_PDQ is in effect, the hash join operators might be granted more memory than other operators, and thus might not overflow to disk.
If the query still overflows to disk and this overflow degrades performance, you can specify more than one hundred percent as an argument to the SET ENVIRONMENT IMPLICIT_PDQ statement. For example, to request that the optimizer increase its estimate by twenty percent, execute the following statement:
SET ENVIRONMENT IMPLICIT_PDQ 120;
For more information, refer to the IBM Informix: Guide to SQL Syntax.