Creating and maintaining the data warehouse
- Requirements
- an integrated collection of data copied from multiple sources
- example: a company with branch offices spread across many countries
- problems
- source databases created and maintained by different groups
- semantic mismatches
- different currency units
- different names for the same attribute
- differences in how tables are normalized or structured
- Steps in creating and maintaining a data warehouse
- extract data from operational databases and external sources
- clean data and fill in missing pieces of information
- transform data: create views over the tables and data sources
- load data:
- materialize views
- generate summary information
- partition data and build indexes
- refresh and purge data periodically
- metadata repository: system catalogs
- description of fields
- source for each data warehouse table
- last refresh date for each table