Important Issues and Project Abstract

Objective:

The objective of this project is to build a metadata repository: a database that gathers, stores, and distributes contextual information about business data. This contextual information includes the data's meaning and content, the policies that govern it, its technical attributes, the specifications that transform it, and the programs that manipulate it.

Ralph Kimball defines three main categories of metadata: technical metadata, business metadata, and process metadata. Technical metadata are primarily definitional, while business metadata and process metadata are primarily descriptive. Keep in mind that the categories sometimes overlap.

Main Categories of Metadata:

Technical metadata define the objects and processes in a DW/BI system, as seen from a technical point of view. They include the system metadata that define the data structures: tables, fields, data types, indexes, and partitions in the relational engine, as well as databases, dimensions, measures, and data mining models. Technical metadata also define the data model and the way it is displayed to users, along with the reports, schedules, distribution lists, and user security rights.
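As a rough illustration of the relational part of this description, the sketch below models a table's technical metadata as plain Python data classes. The class names (ColumnMeta, TableMeta), the field names, and the dim_customer example are all hypothetical and are not taken from any existing repository design; this is only one possible shape for such a record.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnMeta:
    """Technical metadata for one field (column) of a table."""
    name: str
    data_type: str
    nullable: bool = True


@dataclass
class TableMeta:
    """Technical metadata for one table in the relational engine."""
    name: str
    columns: List[ColumnMeta] = field(default_factory=list)
    indexes: List[str] = field(default_factory=list)
    partitions: List[str] = field(default_factory=list)

    def column_names(self) -> List[str]:
        # Return the column names in their declared order.
        return [c.name for c in self.columns]


# Hypothetical example entry for a customer dimension table.
dim_customer = TableMeta(
    name="dim_customer",
    columns=[
        ColumnMeta("customer_key", "int", nullable=False),
        ColumnMeta("customer_name", "nvarchar(100)"),
    ],
    indexes=["pk_dim_customer"],
)
```

A repository built along these lines would hold one such record per table, which is the kind of definitional information the preliminary scope below proposes to start with.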

Business metadata describe the content of the data warehouse in more user-friendly terms. They tell you what data you have, where they come from, what they mean, and what their relationship is to other data in the data warehouse. Business metadata may also serve as documentation for the DW/BI system. Users who browse the data warehouse are primarily viewing the business metadata.

Process metadata describe the results of various operations in the data warehouse. Within the ETL process, all key data from tasks are logged on execution, including start time, end time, CPU seconds used, disk reads, disk writes, and rows processed. This sort of data becomes valuable when troubleshooting the ETL or query process. Process metadata are the fact measurements of building and using a DW/BI system. Some organizations make a living out of collecting and selling this sort of data to companies; in that case, the process metadata become the business metadata for the fact and dimension tables. Collecting process metadata is in the interest of business people, who can use the data to identify the users of their products, which products they are using, and what level of service they are receiving.
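To make the logged fields concrete, here is a minimal sketch of recording process metadata around a single ETL task. The names TaskRunLog and run_logged are hypothetical, and the sketch assumes the task is a Python callable that returns the number of rows it processed. Disk reads and writes are left unset because they cannot be measured portably from the standard library.

```python
import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class TaskRunLog:
    """One process-metadata record for a single task execution."""
    task_name: str
    start_time: datetime
    end_time: datetime
    cpu_seconds: float
    rows_processed: int
    disk_reads: Optional[int] = None   # not measured in this sketch
    disk_writes: Optional[int] = None  # not measured in this sketch

    @property
    def elapsed_seconds(self) -> float:
        # Wall-clock duration of the task.
        return (self.end_time - self.start_time).total_seconds()


def run_logged(task_name: str, task: Callable[[], int]) -> TaskRunLog:
    """Run a task and capture its start time, end time, CPU time, and row count."""
    start = datetime.now(timezone.utc)
    cpu_start = time.process_time()
    rows = task()  # assumption: the task returns the rows it processed
    cpu_seconds = time.process_time() - cpu_start
    end = datetime.now(timezone.utc)
    return TaskRunLog(task_name, start, end, cpu_seconds, rows)


# Hypothetical task that "loads" three rows.
log = run_logged("load_dim_customer", lambda: len(["r1", "r2", "r3"]))
```

Each returned record corresponds to one row of a process-metadata fact table, which is where the troubleshooting queries mentioned above would look.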

Preliminary Scope:

To limit the scope and start at a manageable scale, I think we should start mainly with the technical metadata and include some essential or overlapping business metadata.

MDS ISSUE: MASTER DATA VS. METADATA

Master data is the high-value, core information used to support critical business processes across the enterprise; its aim is to provide a “single version of the truth”. As a result, the Microsoft MDS (Master Data Services) package contains real data, which end users can edit. They can also define business rules, define the data domain, and control in fine detail who can access what.

Metadata, on the other hand, is data about the data. The traditional approach is to begin by building a repository of metadata; in our case, we do not have that information readily available. Because MS MDS is built for master data, it assumes that loading the data from an Excel spreadsheet will suffice: what MDS requires is data, not data about data. Microsoft used to support a metadata repository in older versions of SQL Server, through SQL Server 2005. It was no longer supported after 2005, although it could have been a good basis for building metadata. This is probably an important feature we should be looking for.

Second, MDS enforces a strict organizational structure designed for master data management, such that model = business function, entity = table, and attribute = column. Since the metadata model in MDS is only a small part of the system, we would be implementing metadata management through a sub-function of MDS. Inevitably, this would put us in an awkward situation. For instance, the MDS Metadata model contains five entities, but the online documentation describes only three of them:

Model Metadata Definition: this entity contains members that represent models.
Entity Metadata Definition: you can set the entity’s Description attribute here.
Attribute Metadata Definition: this entity has members that represent all attributes in all models.
Attribute Group Metadata Definition: (not described in the online documentation)
Hierarchy Metadata Definition: (not described in the online documentation)

So, to track attribute properties, there are two approaches:

Approach 1: Define each attribute as an entity in MDS, so that every attribute and property combination gets its own entity. For example, with A = attribute and P = property (e.g. “Description”), we would need to define these entities:

A₁P₁, A₂P₁, A₃P₁, ...

Approach 2: Add all attributes to the Attribute Metadata Definition entity, which represents all attributes in all models.

You can imagine the number of combinations involved; entering the data (building the repository) and maintaining the information afterwards would be a nightmare.
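The growth in the number of objects can be sketched with a small count. The attribute and property names below are placeholders, and the layout used for the second approach (one member per attribute, with each property stored as a field on that member) is an assumption about how it would be set up, not something confirmed by the MDS documentation.

```python
from itertools import product

# Placeholder names: three attributes and two properties to track.
attributes = ["A1", "A2", "A3"]
properties = ["P1", "P2"]

# First approach: one MDS entity per attribute/property combination.
approach_1_entities = [f"{a}{p}" for a, p in product(attributes, properties)]

# Second approach (assumed layout): a single entity with one member per
# attribute, where each property is a field on that member.
approach_2_members = {a: {p: None for p in properties} for a in attributes}

entity_count_1 = len(approach_1_entities)  # grows as attributes x properties
member_count_2 = len(approach_2_members)   # grows only with attributes
```

With these placeholder inputs the first approach needs six entities, while the second needs one entity holding three members. The gap widens quickly once every attribute in every model has to be covered.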

So this was the solution I proposed weeks ago, and it may not be very practical or workable.
