Estimating Bitmap Index Size

Home | Previous Page | Next Page Appendix A. Estimating Table and Index Size > Index Page-Size Estimates >

Estimating Bitmap Index Size

You can create bitmap versions of B-tree indexes, which can be used for tables that are updated, and of GK indexes, which can be used only for STATIC tables, which are not updated.

Conventional B-tree index and B-tree indexes with compressed bitmap leaf pages differ only in the way in which the duplicate list for each key is stored at the leaf level. In a B-tree index, duplicate values are stored in the same way as unique values, one in each slot of an index leaf page. In a bitmap index, duplicate values are stored as a bit map in a leaf page, with a bit mapped to each row of the table. For rows that contain the specific value, the corresponding bit is marked ON.

You might create a bitmap index if the column to be indexed contains many identical values. The more identical values that the column contains, the more efficient the index is, both in storage and processing. The more accurately you can estimate the number of identical values in the key columns, the more accurate your estimate of the index size and storage efficiency will be.

For example, if the indexed column can contain only two values, such as M and F, calculating the storage efficiency of a bitmap index is easy. On the other hand, if a column can contain fifty or sixty values, such as the ages of employees, calculating the efficiency of the index is harder, but a bitmap index on this column might still store key values more efficiently than a conventional B-tree index.

Follow these steps to determine the efficiency of a bitmap index on a table column and to estimate the disk space that the index will require:

Use the maximum number of table rows that can fit in a table page of 255 slots to calculate the integer i, which is the first power of 2 that is greater than or equal to the number of rows on a data page.
For example, if a data page can contain a maximum of 35 rows, i is 6, because 2⁵ < 35 <= 2⁶. In fact, the value of i is 6 for any number of rows between 33 and 64.
Estimate the average distance between two rows that have keys with the same value, assuming that keys with identical values are distributed evenly in the table fragment:
- If the table is not fragmented, list the rowids with the following SELECT statement (replacing key with the column on which the index will be added and duplicate_value with the value of the duplicate key):
```
SELECT rowid FROM table WHERE key = duplicate_value
```
- If the table is fragmented, use the following onutil CHECK INDEX command to list all rowids and their associated keys:
```
onutil
>check index with data database database_name 
   index index_name display data
```
- The last byte of the hexadecimal row identifier is the slot number. The first three bytes (leading zeros suppressed) are the page ID. Use the following formula to calculate the distance between any two specific row identifiers:
```
(PageID₂ - PageID₁) * 2ⁱ + (slot#₂ - slot#₁)
```
  For example, if each page contains 35 rows, the distance between row identifier 0x302 and row identifier 0x401 is calculated as (4 - 3) * 64 + (2 - 1) = 65.
To calculate the integer, N, for a data fragment that contains a total number of pages, P, use the following formula:
```
N = P * 2ⁱ
```
For example, if P is 10,000 and i is 6, then N is 640,000.
Given these formulas, use the estimate of the number of rows in the table and the number of duplicate key values to estimate the minimum space that is required for each bitmap at the leaf nodes of the index:
- If the average row identifier distance between duplicate-key records, as calculated by the formula in step 2, is greater than 64, use the following formula to estimate the number of bytes required for the bitmap for a particular key value:
```
bitmap_size = 24 + (8 * Number of duplicate-key
records)
```
- If the average row identifier distance between duplicate-key records is less than or equal to 64, use the following formula to estimate the space required for each compressed bitmap key:
```
bitmap_size = 28 + N/8 bytes
```

Storage efficiency increases as the distance between rows with the same key decreases. For example, if the average row identifier distance is 75, the space required for a key with 100 duplicates is 824 bytes.

bitmap_size = 24 + (8 * 100) = 824 bytes

In this example, the 824 bytes to store the bitmap is more than the space needed for the key in a conventional index (5 * 100 = approximately 500 bytes plus the size of the key value).

However, if the value of N is 640,000, and the row identifier distance between duplicate key records is 40, the space required for a key is 80,028 bytes.

bitmap_size = 28 + 640,000/8 = 80,028 bytes

In this case, the bitmap is much more efficient than the 3,200,000 bytes needed to store the duplicate row identifiers in a conventional index.

If the bitmap would be larger than the row identifier list of a conventional B-tree index, the database server uses the row identifier list representation for the key instead of the bitmap representation.

As with conventional B-tree indexes, only one bitmap value is permitted on an overflow page.

In addition, bitmaps are aligned on the leaf nodes on 4-byte boundaries. On each page, gaps of 0 to 3 bytes might exist between the end of the key and the beginning of its bitmap. Consider the previous example, where N is 640,000, the row identifier distance between duplicate-key records is 40, and the space that is required for the bitmap is 80,028 bytes. If each index leaf page contains 4,000 free bytes and if the key size is 5 bytes, the bitmap is broken up into 3992-byte pieces because the space required for the key is rounded up to 8 bytes.