Home | Previous Page | Next Page   Dimensional Databases > Building a Dimensional Data Model > Handling Common Dimensional Data-Modeling Problems >

Using the Snowflake Schema

A snowflake schema is a variation on the star schema, in which very large dimension tables are normalized into multiple tables. Dimensions with hierarchies can be decomposed into a snowflake structure when you want to avoid joins to big dimension tables when you are using an aggregate of the fact table. For example, if you have brand information that you want to separate out from a product dimension table, you can create a brand snowflake that consists of a single row for each brand and that contains significantly fewer rows than the product dimension table. Figure 46 shows a snowflake structure for the brand and product line elements and the brand_agg aggregate table.

Figure 46. An Example of a Snowflake Schema
begin figure description - This figure is described in the surrounding text. - end figure description

If you create an aggregate, brand_agg, that consists of the brand code and the total revenue per brand, you can use the snowflake schema to avoid the join to the much larger sales table, as the following query on the brand and brand_agg tables shows:

SELECT brand.brand_name, brand_agg.total_revenue
FROM brand, brand_agg
   WHERE brand.brand_code = brand_agg.brand_code 
   AND brand.brand_name = 'Anza'

Without a snowflaked dimension table, you use a SELECT UNIQUE or SELECT DISTINCT statement on the entire product table (potentially, a very large dimension table that includes all the brand and product-line attributes) to eliminate duplicate rows.

While snowflake schemas are unnecessary when the dimension tables are relatively small, a retail or mail-order business that has customer or product dimension tables that contain millions of rows can use snowflake schemas to significantly improve performance.

If an aggregate table is not available, any joins to a dimension element that was normalized with a snowflake schema must now be a three-way join, as the following query shows. A three-way join reduces some of the performance advantages of a dimensional database.

SELECT brand.brand_name, SUM(sales.revenue)
FROM product, brand, sales
   WHERE product.brand_code = brand.brand_code 
   AND brand.brand_name = 'Alltemp'
GROUP BY brand_name
Home | [ Top of Page | Previous Page | Next Page | Contents | Index ]