Convergence of Data Governance and Data Modeling

Data – An Asset or a Liability

Data can be an asset or a liability; it all depends on how you manage it.

According to Wikipedia, “Data governance is a set of processes that ensures important data assets are formally managed throughout the enterprise to verify the data can be trusted and that people can be made accountable for any adverse event that happens because of poor data quality”.

Data models are the visual representation of the important data assets. Data models also contain a wealth of important business and technical details, such as meaning, rules, and organizational structure for the data asset. Unfortunately, organizations tend to treat data modeling as a project deliverable instead of as a program that integrates data governance with repeatable data modeling practices to produce quality results. The challenge is how to develop a best-practices-based data modeling infrastructure that supports data governance goals and is driven by a coherent data modeling strategy.

Integrating Data Modeling and Data Governance

Most books on data governance stress that the data governance processes must be integrated into projects and ongoing data management tasks. How does an organization convert what to do into how to do it? In this case the ‘what’ is data modeling and the ‘how’ is the process that governs data model quality. The integration of data modeling and data governance is a lot like data manufacturing. Anything that is manufactured needs a consistent and repeatable construction process, a bill of materials, tools to construct it, people to do the work, and quality control standards and processes to prevent, identify, and address defects.

Standards

Data standards are the measuring stick for data quality. Data standards can include everything from how something is named and defined to how it can be used. Without data standards you will never be able to measure the quality of your output, and without the ability to measure quality you will not be able to determine whether your model quality is increasing or decreasing. A data model scorecard or similar mechanism is useful for scoring the quality of your data models.
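The article does not prescribe a particular scorecard, so the following is only a minimal Python sketch of the idea, assuming two hypothetical standards: names must be in UPPER_SNAKE_CASE, and every entity and attribute must have a definition. NAME_PATTERN, the WEIGHTS values, and the ModelObject and Entity classes are illustrative assumptions, not part of any published framework. The point is that once a standard is written down, it can be checked mechanically and turned into a number that can be tracked from model to model.

```python
import re
from dataclasses import dataclass, field

# Hypothetical naming standard: UPPER_SNAKE_CASE, e.g. CUSTOMER_ID.
NAME_PATTERN = re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*")

# Hypothetical category weights; a real scorecard would define its own.
WEIGHTS = {"naming": 0.5, "definitions": 0.5}


@dataclass
class ModelObject:
    """An entity or attribute in a data model."""
    name: str
    definition: str = ""


@dataclass
class Entity(ModelObject):
    attributes: list[ModelObject] = field(default_factory=list)


def all_objects(entities):
    """Flatten entities and their attributes into one stream of scored objects."""
    for entity in entities:
        yield entity
        yield from entity.attributes


def score_model(entities):
    """Return per-category pass rates and a weighted overall score (0-100)."""
    objects = list(all_objects(entities))
    if not objects:
        raise ValueError("model has no objects to score")
    named = sum(1 for o in objects if NAME_PATTERN.fullmatch(o.name))
    defined = sum(1 for o in objects if o.definition.strip())
    rates = {
        "naming": 100 * named / len(objects),
        "definitions": 100 * defined / len(objects),
    }
    rates["overall"] = sum(WEIGHTS[k] * rates[k] for k in WEIGHTS)
    return rates


customer = Entity(
    "CUSTOMER",
    "A person or organization that buys from us.",
    [
        ModelObject("CUSTOMER_ID", "Unique customer identifier."),
        ModelObject("custName"),  # breaks the naming standard and has no definition
    ],
)
print(score_model([customer]))
```

Here two of the three objects pass each check, so every category scores about 67. Adding a missing definition or renaming an object would move the score, which is what makes quality trends visible over time.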

Procedures

In many organizations there are no documented steps for when to create a new model or modify an existing model. Data model objects are copied from one model to another without regard for integrity. This leads to error-prone, redundant models that require extra effort to manage.

Having an easy-to-follow procedure for model construction promotes consistency. A set of published standard operating procedures removes the ambiguity around building data models.

Change Control

Data models are not static. Business is constantly evolving, and data needs change all the time. One of the primary tenets of data governance is that change must be managed. With change comes responsibility. Procedures must be in place to define what change is to be made and who is responsible for making it, along with a review process to verify the change.
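As a minimal sketch, assuming change requests are tracked as simple records, the three elements named above (what change is to be made, who is responsible for it, and a review step) can be enforced in Python. The ChangeRequest class, its states, and the rule that an owner may not review their own change are hypothetical choices for illustration, not a prescribed workflow.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"


@dataclass
class ChangeRequest:
    """Hypothetical record of a proposed data model change."""
    description: str              # what change is to be made
    model_object: str             # which model object it affects
    owner: str                    # who is responsible for making the change
    status: Status = Status.PROPOSED
    reviewer: Optional[str] = None

    def review(self, reviewer: str, approved: bool) -> None:
        """Review step: the reviewer must be someone other than the owner."""
        if self.status is not Status.PROPOSED:
            raise ValueError(f"cannot review a change in state {self.status.value}")
        if reviewer == self.owner:
            raise ValueError("owner cannot review their own change")
        self.reviewer = reviewer
        self.status = Status.APPROVED if approved else Status.REJECTED

    def apply(self) -> None:
        """Only approved changes may be applied to the model."""
        if self.status is not Status.APPROVED:
            raise ValueError("change must be approved before it is applied")
        self.status = Status.APPLIED


cr = ChangeRequest("Add EMAIL_ADDRESS to CUSTOMER", "CUSTOMER", owner="alice")
cr.review("bob", approved=True)
cr.apply()
print(cr.status)  # Status.APPLIED
```

Keeping the review as a required state transition, rather than an optional note, means an unreviewed change simply cannot reach the applied state.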

Reusability

It is no secret that development and data management teams are sometimes at odds with each other. A common complaint is that the data team slows down the development team. Part of the problem stems from a lack of reusability. If we go back to our data manufacturing analogy, a reusable data object is part of a warehouse of data components that can be used to assemble new models without starting from scratch. A reusable data object has already undergone the quality control process and is a trusted component, free of defects.

A commonly asked question is “How can we show a return on investment (ROI) on a data modeling program?” Calculating a return on investment is difficult because no two modeling programs are exactly the same, just as no two data governance programs are the same. The value comes from the return on reuse. A modeling program may start out building everything from scratch, using quality control metrics to guide its development. Over time more and more data objects can be reused, reducing the development time required to produce quality data models. For example, a new data modeling program might be able to reuse 5% of existing data objects in a new project model. After one year, perhaps 70% of data objects come from reuse.
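One hedged way to make “return on reuse” measurable is to track, for each new project model, the share of its objects taken from a library of already-approved objects. The sketch below assumes reuse can be detected by matching object names against the library (a simplification; a modeling tool would more likely track object identifiers), and the reuse_rate function and sample models are hypothetical. The sample numbers simply reproduce the illustrative 5% and 70% figures above.

```python
def reuse_rate(project_objects, library):
    """Percentage of a project model's objects drawn from the reusable library.

    Assumes reuse is detected by object name (a simplifying, hypothetical rule).
    """
    if not project_objects:
        return 0.0
    reused = sum(1 for name in project_objects if name in library)
    return 100 * reused / len(project_objects)


# Hypothetical early project: 20 objects, only 1 found in the library.
early_library = {"CUSTOMER"}
early_model = ["CUSTOMER"] + [f"NEW_OBJECT_{i}" for i in range(19)]

# Hypothetical project a year later: 20 objects, 14 found in a larger library.
later_library = {f"LIB_OBJECT_{i}" for i in range(50)}
later_model = [f"LIB_OBJECT_{i}" for i in range(14)] + [f"NEW_OBJECT_{i}" for i in range(6)]

print(reuse_rate(early_model, early_library))  # 5.0
print(reuse_rate(later_model, later_library))  # 70.0
```

Recording this rate for each new model could give a simple trend line for the program's reuse, which is the kind of evidence the ROI question asks for.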

Communication

Communication is the key to promoting the awareness and adoption of the data modeling program. Developing the policies, procedures, data objects, and supporting documentation is a significant investment in time. There is no reason to keep it to yourself. Consumers of data models include both business and technical audiences. Knowledge collaboration software is ideal for publishing content about the data modeling program and allowing people to comment.

Summary

Establishing a quality-driven data modeling program is a necessary foundation for data governance. Data models capture the structure and meaning of data, and data governance controls the processes that create and manage that data. The convergence of governance and modeling provides quality data content. Business management consultant Peter Drucker is famously quoted as having said “You can’t manage what you don’t measure”. I’d like to extend that thought a bit by saying:

  • You can’t find what you can’t name
  • You can’t understand what you can’t define
  • You can’t measure what is not predictably consistent

For more information on how to achieve the necessary foundation for data governance contact us at info@sandhillconsultants.com, or visit: https://www.sandhillconsultants.com/offerings/enterprise-modeling-standards-framework-em-sos/.