Thursday, July 16, 2009

Slowly Changing Dimension

Slowly Changing Dimension a.k.a. SCD is literally means Slow..ly Chang..ing Dimension. Simple.

It's a kind of table designated as a Dimension using the OLAP terminology and referred the most in star schema or snowflake schema design. Contents in the table are descriptive in nature, i.e. Name, Code, Description attributes rather than for analysis, i.e. Measures. A SCD can be designed to handle Type 2 changes, loosely speaking, maintaining full history of incremental changes.

The key to understand and correctly use SCD Type 2 feature in your favourite ETL tools is "Slow...ly Chang..ing". The ratio of changes in the dimension over a defined time interval ideally must not be greater than 5% (heuristically) of the original records, meaning 5 million record changes in every 100 million count. If the ratio is more than expected, you must exercise precautionary measures to ensure the performance is still acceptable. Secondly, the method to identify whether a record already changed is critical (Absolute value, hashsum, timestamp or etc. Thirdly, proper indexes in place will help in improving the performance of SCD process.

If changes are anticipated to be widely covering the source data, perhaps keep a complete snapshot of new data would be more efficient for loading, at the expense of storage and IO processing.





No comments: