Saturday 25 January 2014

Surrogate Key

Surrogate Key:

A surrogate key is a numeric number that uniquely identifies records in the dimension table.
The term surrogate literally means substitute, hence a surrogate key is actually a substitute key that is used as the primary key in the dimension tables.  Before we further define surrogate key's, lets understand what natural keys are.

natural key is a set of one or more column in the dimension table that uniquely identifies a record in the table. The values of the natural key column(s) is provided by the source system. Ideally the natural keys must be defined as the primary key of the dimension table, but we refrain from this for the following reasons
  • The Natural Key could contain non numeric data type (timestamp or char) columns. Joining large fact tables with non numeric data types like timestamps could lead to performance issues.
  • They could be more than one in number, hence increasing the size of the Fact Table as we would need more than one column to join the fact table with the dimension table.
  • The format and structure of the natural key's could change in the future. This could happen when new source systems are added.
  • Having Natural keys will make the process of Slowly Changing Dimensions very complex.
Instead we use a surrogate key, which is a running number and  its value has no co-relation what so ever with the values of the natural key's.

"A surrogate key by definition means an artificial key which is a running Integer whose sole purpose is to identify a dimension row uniquely."

No comments:

Post a Comment