Enterprise Edition Home | Express Edition Home | Previous Page | Next Page   Improving Individual Query Performance > Updating Statistics >

Displaying Distributions

Unless column values change considerably, you do not need to regenerate the data distributions. To verify the accuracy of the distribution, compare dbschema -hd output with the results of appropriately constructed SELECT statements.

For example, the following dbschema command produces a list of distributions for each column of table customer in database vjp_stores with the number of values in each bin, and the number of distinct values:

dbschema -hd customer -d vjp_stores

Figure 72 shows the data distributions for the column zipcode that this dbschema -hd command produces. Because this column heads the zip_ix index, UPDATE STATISTICS HIGH was run on it, as the following output line indicates:

High Mode, 0.500000 Resolution

Figure 72 shows 17 bins with one distinct zipcode value in each bin.

Figure 72. Displaying Data Distributions with dbschema -hd
dbschema -hd customer -d vjp_stores

...
Distribution for virginia.customer.zipcode

Constructed on 09/18/2000

High Mode, 0.500000 Resolution
  
--- DISTRIBUTION ---

     (          02135 )
  1: (  1,   1, 02135 )
  2: (  1,   1, 08002 )
  3: (  1,   1, 08540 )
  4: (  1,   1, 19898 )
  5: (  1,   1, 32256 )
  6: (  1,   1, 60406 )
  7: (  1,   1, 74006 )
  8: (  1,   1, 80219 )
  9: (  1,   1, 85008 )
 10: (  1,   1, 85016 )
 11: (  1,   1, 94026 )
 12: (  1,   1, 94040 )
 13: (  1,   1, 94085 )
 14: (  1,   1, 94117 )
 15: (  1,   1, 94303 )
 16: (  1,   1, 94304 )
 17: (  1,   1, 94609 )


--- OVERFLOW ---

  1: (  2,      94022 )
  2: (  2,      94025 )
  3: (  2,      94062 )
  4: (  3,      94063 )
  5: (  2,      94086 )

The OVERFLOW portion of the output shows the duplicate values that might skew the distribution data, so dbschema moves them from the distribution to a separate list. The number of duplicates in this overflow list must be greater than a critical amount that the following formula determines. Figure 72 shows a resolution value of .0050. Therefore, this formula determines that any value that is duplicated more than one time is listed in the overflow section.

Overflow = .25 * resolution * number_rows
      = .25 * .0050 * 28
      = .035

For more information on the dbschema utility, see the IBM Informix: Migration Guide.

Enterprise Edition Home | Express Edition Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]