Using the DROP DISTRIBUTIONS Option

Use the DROP DISTRIBUTIONS option to force the removal of distribution information from the sysdistrib system catalog table.

When you specify the DROP DISTRIBUTIONS option, the database server removes the existing distribution data for the column or columns that you specify. If you do not specify any columns, the database server removes all the distribution data for that table.

The following example shows how to remove distributions for the customer_num column in the customer table:

As the example shows, you drop the distribution data at the same time you update the statistical data that the low mode option generates.

Using the MEDIUM Mode Option

Use the MEDIUM mode option to update the same statistics that you can perform with the LOW mode option and also generate statistics about the distribution of data values for each specified column. The database server places distribution information in the sysdistrib system catalog table.

If you use the MEDIUM mode option, the database server scans tables at least once and takes longer to execute on a given table than the LOW mode option.

When you use the MEDIUM mode option, the data for the distributions is obtained by sampling a percentage of data rows, using a statistical confidence level that you specify, or else a default confidence level of 95 percent.

Because the MEDIUM sample size is usually much smaller than the actual number of rows, this mode executes more quickly than the HIGH mode.

Because the distribution is obtained by sampling, the results can vary, because different samples of rows might produce different distribution results. If the results vary significantly, you can lower the resolution percent or increase the confidence level to obtain more consistent results.

If you specify no RESOLUTION clause, the default average percentage of the sample in each bin is 2.5, dividing the range into 40 intervals. If you do not specify a value for confidence_level, the default level is 0.95. This value can be roughly interpreted to mean that 95 times out of 100, the estimate is not statistically different from what would be obtained from high distributions.

You must have the DBA privilege or be the owner of the table to create medium distributions. For more on the MEDIUM and HIGH mode options, see the Resolution Clause.

Using the HIGH Mode Option

Use the HIGH mode option to update the same statistics that you can perform with the LOW mode option and also generate statistics about the distribution of data values for each specified column. The database server places distribution information in the sysdistrib system catalog table.

If you do not specify a RESOLUTION clause, the default percentage of data distributed to every bin is 0.5, partitioning the range of values for each column into 200 intervals.

The constructed distribution is exact. Because more information is gathered, this mode executes more slowly than LOW or MEDIUM modes. If you use the HIGH mode option of update statistics, the database server can take considerable time to gather the information across the database, particularly a database with large tables. The HIGH keyword might scan each table several times (for each column). To minimize processing time, specify a table name and column names within that table.

You must have the DBA privilege or be the owner of the table to create HIGH distributions. For more information on the MEDIUM and HIGH mode options, see the Resolution Clause.

Resolution Clause

Use the Resolution clause to adjust the size of the distribution bin, designate whether or not to avoid calculating data on indexes, and with the MEDIUM mode, to adjust the confidence level.

RESOLUTION Clause:
 
|--+-| RESOLUTION Clause for MEDIUM Mode |-+--------------------|
   '-| RESOLUTION Clause for HIGH Mode |---'
 
RESOLUTION Clause for MEDIUM Mode:
 
|--RESOLUTION--percent--+------------------+-------------------->
                        '-confidence_level-'
 
>--+---------------------------+--------------------------------|
   |  (1)                      |
   '--------DISTRIBUTIONS ONLY-'
 
RESOLUTION Clause for HIGH Mode:
 
|--RESOLUTION--percent--+---------------------------+-----------|
                        |  (1)                      |
                        '--------DISTRIBUTIONS ONLY-'

Notes:

Dynamic Server only

Element	Description	Restrictions	Syntax
confidence_level	Estimated fraction of the time that sampling in MEDIUM mode should produce same results as the exact HIGH mode. Default level is 0.95.	Must be within the range from 0.80 (minimum) to 0.99 (maximum)	Literal Number
percent	Percentage of sample in each bin of distribution Default is 2.5 for MEDIUM and 0.5 for HIGH.	Minimum resolution is 1/nrows, for nrows the number of rows in the table	Literal Number

A distribution is a mapping of the data in a column into a set of column values, ordered by magnitude or by collation. The range of these sample values is partitioned into disjunct intervals, called bins, each containing an approximately equal portion of the sample of column values. For example, if one bin holds 2 percent of the data, 50 such intervals hold the entire sample.

Some statistical texts call these bins equivalence categories. Each contains a subset of the range of the data values that are sampled from the column.

The optimizer estimates the effect of a WHERE clause by examining, for each column included in the WHERE clause, the proportionate occurrence of data values contained in the column.

You cannot create distributions for BYTE or TEXT columns. If you include a BYTE or TEXT column in an UPDATE STATISTICS statement that specifies medium or high distributions, no distributions are created for those columns. Distributions are constructed for other columns in the list, however, and the statement does not return an error.

Columns of the VARCHAR data type do not use overflow bins, even when multiple bins are being used for duplicate values.

The amount of space that the DBUPSPACE environment variable specifies determines the number of times the database server scans the designated table to construct a distribution.