Data Warehouse Data Modeling Standards
*Disclaimer - we do not always have control over standards for adopted systems such as SIS or UCPath
Tool: ErWin
Data Lineage Documentation (source to target): ErWin
ETL: Informatica
SQL Server (lunar/solar)
These standards are up for debate - one shared DB w/multiple schemas or different DBs? Need to discuss with the DB team to make a final decision.
Database: dss
Schema ODS: ods_{subject area}, for UCPath the standards is hcm_ods
Schema for Data Marts: dss_{subject area}
Schema for EDW (conformed dimensions, person, time): edw
Database for staging: staging
Schema for staging: stage_{subject area}
Table Naming Standards:
Dimensions: {subject area}_D_{table name} (KFS), {table_name}_D (UCPath), D_{table name} (EDW)
Facts: {subject area}_F_{table name} (KFS), {table_name}_F (UCPath), F_{table name} (EDW)
ODS Tables: If it is a copy of OLTP system table, leave name the same as the source system. If it is a new table, follow the naming standards of the source system.
ODS Views: {table_name}_V, and also should follow the naming standards of the source system.
Access control tables for row-level and column-level security should be in the dss_* schemas: {source system}_KSAMS_{table name}, for example KBM_KSAMS_ROLES
Column Naming Standards:
Try to maintain the column names from the source system when possible.
All tables should have a "CREATED_EW_DTTM" and "UPDATED_EW_DTTM" columns to show when the record was created and the data was last updated. Upon initial creation, these two values will be the same until the first update load.
ETL Columns:
SID columns should be named {name}_SID (e.g. ACCOUNT_SID), datatype BigInt Identity, primary key.
All fact tables should have a primary key, which should be a combination of all the dimension SIDs plus any date/time columns (fiscal period, snapshot date, etc.)
All dimensions will include type 2 fields, even if it is a type 1 dim:
*List out Type 2 Fields
Add to ETL Standards:
Non-matching Dimension value SID number should be set to -1