Using User-Defined Statistics

Enterprise Edition Home | Express Edition Home | Previous Page | Next Page Creating User-Defined Routines > Extending Data Types > Providing Statistics Data for a Column >

Using User-Defined Statistics

The user-defined statistics information in sysdistrib system catalog table is used in the following ways:

The dbschema utility with its -hd option uses these statistics to display statistics data for tables.
The query optimizer uses statistics to obtain a best guess for queries on the user-defined data type column.

Displaying Statistics Data

The dbschema utility with its -hd option displays statistics data for tables in your database. It can automatically display statistics for all columns with built-in data types (except TEXT and BYTE). It cannot automatically collect statistics for columns with user-defined data types because it does not know the structure of these data types.

For dbschema -hd to display statistics for a column with a user-defined data type, you must write a user-defined function named statprint( ) that generates text output of the statistics collected for your user-defined data type. The dbschema -hd command obtains the user-defined statistics from the encdat column of the sysdistrib system catalog table. The encdat column stores the statistics data in the stat opaque type. Therefore, dbschema must call the statprint( ) function for your user-defined data type to convert the statistics data from the stat data type to an LVARCHAR value that can be displayed.

To provide display statistics for your user-defined data type, you must:

Define a C statistics-display function to implement the statistics display.
Convert the user-defined statistics for the column to text output within statistics-display function.
Register this C function as a statprint( ) user-defined function.

Defining a Statistics-Display Function

When you declare your statistics-display function, it must have the following signature:

mi_lvarchar *statprint(udt_arg, stat_arg)
   udt_type *udt_arg;
   mi_statret *stat_arg;

udt_arg: is a pointer to a dummy argument. The database server uses this argument to resolve the function and to pass in column values.
stat_arg: is a pointer to the mi_statret structure that contains the statistics information for the user-defined data type.

DBDK

BladeSmith automatically generates an OpaqueStatPrint( ) function (in which Opaque is the name of your opaque data type) with the following declaration:

mi_lvarchar *OpaqueStatCollect(Gen_dummy, Gen_bvin)
   void *Gen_dummy;
   mi_lvarchar *Gen_bvin;

If this declaration is not appropriate for your opaque type, you must customize the OpaqueStatPrint( ) function.

End of DBDK

Creating the ASCII Histogram

The statprint( ) function converts the statistics data stored in the stat data type to an LVARCHAR value that the database server can use to display information. The stat data type is a multirepresentational data type that the database server uses to store statistics data in the encdat column of the sysdistrib system catalog table.

Registering the statprint( ) Function

As with any user-defined function, you register the statistics-display function with the CREATE FUNCTION statement. The registration of this function has the following requirements:

You must name this user-defined function statprint.
The database server handles routine resolution based on the data type of the first argument to statprint( ). If the name of your C statistics-collection function is not statprint( ), specify the C function name in the EXTERNAL NAME clause.
The data types of the parameters must be as follows.

Parameter Number

Parameter Data Type

1

SQL name for the user-defined data type

2

stat
The function must return a value of type LVARCHAR.

The following CREATE FUNCTION statement registers a statistics-collection function:

CREATE FUNCTION statprint(ll_arg longlong, num_rows stat)
RETURNING LVARCHAR
EXTERNAL NAME '/usr/udrs/bin/longlong.so(stat_print_ll)'
LANGUAGE C;

After you register the statprint( ) function, make sure those with the DBA privilege and the table owner have the Execute privilege for the function.

Using User-Defined Statistics in a Query

For SQL statements that use user-defined data types, the optimizer can call custom selectivity and cost functions. Selectivity and cost functions might need to use statistics about the nature of the data in a column. When you create the statcollect( ) function that collects statistics for a UDT, the database server executes this function automatically when a user runs the UPDATE STATISTICS statement with the MEDIUM or HIGH keyword.

The statistics that the database server collects might require a smart large object for storage. The configuration parameter SBSSPACENAME specifies an sbspace for storing this information. If SBSSPACENAME is not set, the database server might not be able to collect the specified statistics.

The query optimizer can use data distributions when it assesses the selectivity of a query filter. The selectivity is the number of rows that the filter will return. For queries that involve columns with built-in data types, the database server uses data distributions to automatically determine selectivity for the following kinds of filters:

Relational-operator functions (lessthan( ), ...)
Boolean built-in operator functions: like( ), matches( )

Important:

The query optimizer can only use data distributions if the UPDATE STATISTICS statement has collected these distributions in the sysdistrib system catalog table.

However, if the query involves columns with user-defined data types, you must provide the following information for the query optimizer to be able to determine the filter selectivity:

Write a user-defined function to implement the appropriate operator function.
For user-defined types, these built-in operator functions do not automatically exist. You must write versions of these functions that handle your user-defined type.
Write a selectivity function for the operator function to provide the optimizer with a selectivity value.
Selectivity and cost functions might need to use statistics about the nature of the data in a column. If you want these selectivity functions to use data distributions, take the following actions:
- Provide user-defined statistics so that the UPDATE STATISTICS statement saves the data distributions in the sysdistrib system catalog table.
- Access the sysdistrib table from within the selectivity function to obtain the data distributions for the column.
For more information on how to write and register selectivity functions, see Writing Selectivity and Cost Functions.