Building a Data warehouse – Two approaches – Same or Different?
Industry accepted definition of a data warehouse is – “a subject oriented, integrated, nonvolatile, time variant collection of data for management’s decision making”. Bill Inmon and Ralph Kimball approached the data warehouse architecture differently. Bill Inmon advocated the approach of creating a “Single version of Truth” Data warehouse where as Ralph Kimball took the “Dimensional data warehouse” approach. We can review the 2 approaches at a very high level.
The key aspect of Inmon’s architecture is the integration of all the applications data at a corporate level to create the “Single version of Truth”. In this approach Departmental data marts will be created on top of the data warehouse. The Integration of operational data from all the legacy applications is a complex and time consuming task. The organizations with multiple mergers and acquisitions will make this even more difficult as there will be multiple systems supporting the same business function. The data in the integration layer is in a normalized relational format. These types of implementations are typically multiyear initiatives leveraging MPP appliances such as Teradata, Netezza, Microsoft’s PDW. The biggest advantage of this approach is the creation of the “Single version of version” enterprise wide Centralized data layer acting as a one stop shop for the business groups for all their data needs.
Kimball’s architecture has evolved over time. The simple dimensional model has fact tables and Dimensions that form a star schema based ‘data mart”. The data to populate the dimensional data marts comes directly from applications. Similarly many different data marts are populated from many different source applications directly. Kimball called the collection of these data marts as a data warehouse. The biggest advantage of this approach is the speed with which data marts can be built. And the problem with this approach is application data being replicated many different times for different marts. For large organizations this becomes a huge issue. The business user have many different places to find a particular value and it simply causes confusion as there is no official corporate source for any data. Hence Kimball introduced the concept of conformed dimensions which was the first step to address the issue of integration. Then the architecture further evolved to include Master data management for creation of the single source of corporate data and data marts are created from this single source. This is commonly referred to as Kimball’s Hub and spoke architecture.
Ironically, if you compare the evolved Kimball’s architecture to Inmon’s architecture you would find them to be very similar.
Blogger : Senthil Balakrishnan