Data Quality Improvement

Assessment in Data Quality Improvement Projects

Laura Sebastian-Coleman, in Measuring Data Quality for Ongoing Improvement, 2013

Data Quality Improvement Efforts

Data quality improvement efforts are just what the name implies: work whose goal is to make data better suited to serve organizational purposes. Improvement projects may bring data into closer alignment with expectations, or they may enable data consumers to make decisions about whether to use data for particular purposes, or both. Projects can be large or small, including numerous data elements or just one. They can focus on process improvement, system improvement, or a combination. Other times, such work is part of a larger project or part of ongoing operational improvement efforts. Whatever the approach to improvement, it is important to have specific, measurable improvement goals.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780123970336000080

Directives for Data Quality Strategy

Laura Sebastian-Coleman, in Measuring Data Quality for Ongoing Improvement, 2013

Purpose

This chapter presents a set of 12 directives for establishing an organization's data quality strategy and describes how to assess organizational readiness for such a strategy.

Thought leaders in the product quality movement recognized that the manufacture of quality products depends on many factors, including cultural and environmental ones. They introduced some of the key tools and methods for assessing and maintaining quality—the control chart, the Plan-Do-Study-Act approach, the Pareto diagram, and the Ishikawa (fishbone) diagram. More important than the tools themselves, pioneers in product quality demonstrated how organizations work. They recognized that producing quality products requires the commitment of an enterprise to a quality result. This commitment must come first from the top of the organization. Its goal must be to foster a culture of quality in which all members of the organization are engaged. (The companion web site includes additional background on the origins of product quality.)

Data use, however, also requires knowledge of what the data represents and how it effects this representation. Managing data and ensuring that it is of high quality therefore requires an understanding of knowledge management. Organizations that want to get the most out of their data must be learning organizations. They must capture, make explicit, and actively manage knowledge about their data, as well as the data itself.

Data quality improvement is built on the same foundation as manufacturing quality improvement, but it also recognizes the need to manage data knowledge. Thought leaders in data quality recognize a set of directives necessary to the success of any organization's efforts to improve and sustain data quality. This chapter summarizes those directives and describes how to assess an organization's strategic readiness for data quality management and improvement. 1

The directives break down into three sets. The first set focuses on the importance of data within an enterprise and needs to be driven by senior management. The second applies concepts related to manufacturing physical goods to data and should be driven by a data quality program team. The third focuses on building a culture of quality in order to respond to the fluid nature of data and meet the ongoing challenges of strategic management. A mature organization will plan for its own evolution and the ongoing health of its cultural orientation toward quality.

The directives are numbered and presented in these sets so that they can be better understood (see Table 13.1). However, they are not sequential in the sense that process steps are. Each is important in and of itself, and they are interrelated. (Will you be able to obtain management commitment without recognizing the value of data?) So your strategic plan should account for all of them. Which ones you focus on first tactically, though, will depend on your organization's starting point and its receptivity to data quality improvement.

Table 13.1. Twelve Directives for Data Quality Strategy

Directive 1: Obtain Management Commitment to Data Quality
Actions: Associate data with the organization's vision and mission. Orient the organization toward quality.
Drivers: Recognize the importance of data to the organization's mission. Driven by senior management.

Directive 2: Treat Data as an Asset
Actions: Define the value of data to the organization. Determine the willingness of the organization to invest in data quality. Recognize data as a knowledge asset.
Drivers: Recognize the importance of data to the organization's mission. Driven by senior management.

Directive 3: Apply Resources to Focus on Quality
Actions: Commission a data quality program team to support the achievement of strategic goals related to data quality improvement. Improve processes by which data is managed; put in place measurements to maintain data quality.
Drivers: Recognize the importance of data to the organization's mission. Driven by senior management.

Directive 4: Build Explicit Knowledge of Data
Actions: Recognize that data quality management is a problem of knowledge management, as well as product management. Build knowledge of the data chain and ensure processes are in place to use that knowledge.
Drivers: Recognize the importance of data to the organization's mission. Driven by senior management.

Directive 5: Treat Data as a Product of Processes That Can Be Measured and Improved
Actions: Establish a realistic orientation toward data: it is something people create. Measure its quality. Apply quality and process improvement methodology to improve its production.
Drivers: Apply concepts related to manufacturing physical goods to data. Driven by the data quality program team.

Directive 6: Recognize Quality Is Defined by Data Consumers
Actions: Build a vocabulary around data quality. Ensure that data consumers articulate their data quality requirements.
Drivers: Apply concepts related to manufacturing physical goods to data. Driven by the data quality program team.

Directive 7: Address the Root Causes of Data Problems
Actions: Apply knowledge of data processes and root cause analysis to understand issues and problems. Invest in remediating root causes rather than symptoms.
Drivers: Apply concepts related to manufacturing physical goods to data. Driven by the data quality program team.

Directive 8: Measure Data Quality, Monitor Critical Data
Actions: Broadly assess overall data quality. Regularly monitor critical data.
Drivers: Apply concepts related to manufacturing physical goods to data. Driven by the data quality program team.

Directive 9: Hold Data Producers Accountable for the Quality of Their Data (and Knowledge about That Data)
Actions: Engage data producers to prevent data problems. Improve communications between producers and consumers. Ensure producers have the tools and input they need to deliver high-quality data.
Drivers: Build a culture of quality that can respond to ongoing challenges of strategic data management. Partnership between senior management, the data quality program team, and other governance structures.

Directive 10: Provide Data Consumers with the Knowledge They Require for Data Use
Actions: Ensure data consumers have the tools they need to understand the data they use. Build data consumers' knowledge of the data chain and its risks.
Drivers: Build a culture of quality that can respond to ongoing challenges of strategic data management.

Directive 11: Data Use Will Continue to Evolve—Plan for Evolution
Actions: Recognize the environment is evolving. Plan for constant change.
Drivers: Build a culture of quality that can respond to ongoing challenges of strategic data management.

Directive 12: Data Quality Goes Beyond the Data—Build a Culture Focused on Data Quality
Actions: Recognize high-quality data does not produce itself. Put in place the governance and support structures needed to enable the production and use of high-quality data.
Drivers: Build a culture of quality that can respond to ongoing challenges of strategic data management.

Articulating a strategy can be very clarifying: it helps people understand how the pieces of an organization work together. Because clarification is energizing, it is tempting to try to implement an overall strategy immediately. However, doing so is very difficult. A successful approach to reaching long-term goals depends on many small steps. It also depends on knowing your starting point.

Assessing the current state for the implementation of strategy is separate from assessing the current state of data (as described in Section Three). Like any assessment, assessing strategic readiness consists of asking the right questions and understanding options presented by the answers. A primary goal of the assessment is to help you determine which directives to prioritize and to associate them with measurable actions that support the overall strategy. 2

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780123970336000146

Structuring Your Project

Danette McGilvray, in Executing Data Quality Projects (Second Edition), 2021

Description

A data quality improvement project concentrates on specific data quality issues that are negatively impacting the organization. The goal is to support business needs by improving the quality of specific data where data quality issues are suspected or already known. This can be any set of data, whether internally created or externally acquired.

This type of project selects the applicable steps from the Ten Steps Process to understand the information environment surrounding the data quality issue and business needs, assess the data quality, and show business value. For the most sustained results, the goal should be to identify root causes and improve the data by preventing the issues from arising again, such as by implementing new processes or enhancing existing ones to manage the data. Improving the data also includes correcting the current data errors. To sustain data quality, some of the improvements or controls may be candidates for ongoing monitoring. Communicating, managing the project, and engaging with people are done throughout the project. The Ten Steps Process can provide the foundation for the project plan.

A variation of this type of project is to use the Ten Steps as the basis for creating a data quality improvement methodology customized to your particular organization.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128180150000013

Key Concepts

Danette McGilvray, in Executing Data Quality Projects (Second Edition), 2021

The Data Quality Improvement Cycle and the Ten Steps Process

The Data Quality Improvement Cycle helps illustrate the iterative and ongoing nature of data quality management and can be mapped to the Ten Steps Process (see Figure 3.16). The Ten Steps Process (as introduced in the previous section) describes a set of methods for the continuous assessment, maintenance, and improvement of data and information. Included are processes for:

Figure 3.16. Data quality improvement cycle and the Ten Steps Process.

Determining the most important business needs, associated data, and where to focus efforts

Describing and analyzing the information environment

Assessing data quality

Determining the business impact of poor data quality

Identifying the root causes of data quality problems and their effects on the business

Correcting and preventing data defects

Continuous monitoring of data quality controls

The Ten Steps are a concrete articulation of the Data Quality Improvement Cycle. Like the cycle, they are iterative – when one improvement cycle is completed, start again to expand on the results.

The idea of an improvement cycle comes into play again once controls are being monitored in Step 9, at which point responsibility moves into an operational process. If issues are uncovered by the monitoring, an improvement cycle comprising Steps 5 through 9 begins with identifying root causes and moves through the remaining steps.

If issues require a new project to resolve them, then the improvement cycle may start again at Step 1.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128180150000098

Bringing It All Together

David Loshin, in The Practitioner's Guide to Data Quality Improvement, 2011

20.1.1 Developing the Data Quality Business Case

Communicating the value of data quality improvement involves characterizing the value gap attributable to variance in meeting data quality expectations. The business case is built by quantifying the value gap, proposing alternatives for addressing the underlying issues, and providing cost estimates that demonstrate a positive return on investment. However, this requires some exploration, including:

Reviewing the types of risks relating to the use of information

Considering ways to specify data quality expectations based on those risks

Developing processes and tools for clarifying what data quality means and how it is measured

Defining data validity rules that can be used for inspection and assessment

Measuring data quality

Reporting and tracking data issues

Linking those issues directly to quantifiable business metrics
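The rule-definition and measurement steps above can be sketched as simple predicates applied to records. This is a minimal illustration, not code from the book; the rules, field names, and sample records are all hypothetical:

```python
import re

# Hypothetical validity rules: each maps a rule name to a predicate over a record
RULES = {
    "customer_id is numeric": lambda r: str(r.get("customer_id", "")).isdigit(),
    "state is two letters": lambda r: re.fullmatch(r"[A-Z]{2}", r.get("state") or "") is not None,
    "balance is non-negative": lambda r: r.get("balance", 0) >= 0,
}

def assess(records):
    """Return the conformance percentage for each validity rule."""
    results = {}
    for name, rule in RULES.items():
        passed = sum(1 for r in records if rule(r))
        results[name] = 100.0 * passed / len(records)
    return results

records = [
    {"customer_id": "1001", "state": "CA", "balance": 250.0},
    {"customer_id": "A102", "state": "CA", "balance": -10.0},  # violates two rules
]
print(assess(records))
# -> {'customer_id is numeric': 50.0, 'state is two letters': 100.0, 'balance is non-negative': 50.0}
```

Tracking these conformance percentages over time, and tying each rule back to a business metric, is what links data issues to quantifiable impacts.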

Business issues can be associated in many ways with situations where data quality falls below user expectations. Exploring the different categories of business impacts attributable to poor information quality, and discussing ways to identify and classify the related cost impacts, will guide the data quality practitioner in identifying key data quality opportunities.

Establishing qualitative metrics for data quality as a way of sizing the value gap requires thought about how flawed data leads to material impacts on business processes, driving a need for:

Distinguishing high-impact from low-impact data quality issues

Isolating the source of the introduction of data flaws

Fixing broken processes

Correlating business value with source data quality

Instituting data quality best practices to address flawed information production

Mapping data quality expectations and business expectations involves specifying rules measuring aspects of data validity and then looking at the corresponding relationship to missed business expectations regarding productivity, efficiency, revenue generation and growth, throughput, agility, spend management, as well as other drivers of organizational value.

To determine the true value added by data quality programs, conformance to business expectations (and the corresponding business value) should be measured in relation to its component data quality rules. We do this by identifying how the business impacts of poor data quality can be measured and how they relate to their root causes, then assessing the costs of eliminating those root causes. Characterizing both the business impacts and the data quality problems provides a framework for developing a business case.

Chapter 1 presented an approach for analyzing the degree to which poor data quality impedes business objectives: it detailed business impacts, categorized them, and then prioritized the issues in relation to the severity of the impacts. It offered a simplified scheme for classifying the business impacts associated with data errors. This categorization is intended to support the data quality analysis process and help differentiate between data issues that have serious business ramifications and those that are benign. The taxonomy provides primary categories for evaluating either the negative impacts related to data errors or the potential opportunities for improvement resulting from better data quality, and focuses on four general areas:

Financial impacts, such as increased operating costs, decreased revenues, missed opportunities, reduction or delays in cash flow, or increased penalties, fines, or other charges

Confidence- and satisfaction-based impacts, such as decreased customer, employee, supplier, or general market satisfaction; decreased organizational trust; low confidence in forecasting; inconsistent operational and management reporting; and delayed or improper decisions

Productivity impacts, such as increased workloads, decreased throughput, increased processing time, or decreased end-product quality

Risk and compliance impacts associated with credit assessment, investment risks, competitive risk, capital investment and/or development, fraud, and leakage; compliance with government regulations, industry expectations, or self-imposed policies (such as privacy policies)

The approach in Chapter 1 involves defining your own classification taxonomy for business impacts and then evaluating how known or potential data issues contribute to negative impacts. Use that approach to compartmentalize the evaluation and assessment of critical data issues and align them with data flaws. Breaking the scope of business impacts into small analytic pieces makes building the business case for data quality a much more manageable task. In addition, the categorical hierarchy of impact areas will naturally map to a future performance reporting structure for monitoring how data quality improvement affects the bottom line.
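One way to sketch such a classification taxonomy is a simple mapping from issues to impact categories with severity weights. The four categories follow the areas above, but the example issues and weights are hypothetical, chosen only to show the mechanics of prioritization:

```python
# Severity weights per impact category (illustrative, not from the book)
CATEGORY_WEIGHT = {
    "financial": 4,
    "risk_and_compliance": 4,
    "confidence_and_satisfaction": 2,
    "productivity": 1,
}

# Hypothetical data issues tagged with the impact categories they touch
issues = [
    {"issue": "duplicate customer records", "impacts": ["financial", "productivity"]},
    {"issue": "missing tax identifiers", "impacts": ["risk_and_compliance"]},
    {"issue": "stale contact addresses", "impacts": ["confidence_and_satisfaction"]},
]

def prioritize(issues):
    """Score each issue by the summed weight of its impact categories,
    then sort so the most severe issues come first."""
    scored = [(sum(CATEGORY_WEIGHT[c] for c in i["impacts"]), i["issue"])
              for i in issues]
    return sorted(scored, reverse=True)

for score, issue in prioritize(issues):
    print(score, issue)
# 5 duplicate customer records
# 4 missing tax identifiers
# 2 stale contact addresses
```

The point is the structure, not the numbers: once issues are tagged against a shared taxonomy, prioritization and later impact reporting fall out of the same mapping.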

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780123737175000208

Functions of Measurement

Laura Sebastian-Coleman, in Measuring Data Quality for Ongoing Improvement, 2013

The DQAF and Statistical Process Control

As discussed in Section Five, most approaches to data quality improvement start with a comparison between the production of data and the production of manufactured goods and recognize the value of treating data like a product. Much thinking about data quality is rooted directly in methods for process quality. Statistical process control methods have been successfully applied to the measurement of data quality, both for initial analysis, which identifies special causes, and for ongoing measurement to confirm whether a process remains in control. 4 The DQAF draws directly on this body of work, especially in its approach to automating measurement.

The third basic step in making quality measurements meaningful is to explicitly compare any new measurement to a standard for quality. The DQAF describes how to collect the raw data of measurement (record counts, amounts, etc.) and how to process this data to make it comprehensible and meaningful through the calculation of percentages, ratios, and averages. Most measurement types compare new measurements to the history of past measurements. These comparisons include:

Comparison to the historical mean percentage of row counts.

Comparison to the historical mean total amount.

Comparison to the historical percentage of total amount.

Comparison to the historical average amount.

Comparison to the historical median duration.

Comparisons to historical data can be used to measure the consistency of data content for processes where content is expected to be consistent. For example, tests of reasonability based on the distribution of values can be automated to identify instances where changes in distribution fall outside of three standard deviations from the mean of past measurements. Such comparisons identify changes in data distribution that are statistically unusual (outside the range containing 99.7% of all measurements). Not every measurement that exceeds this threshold represents a problem; such measurements point to data that needs to be reviewed. Because three-standard-deviation thresholds provide a test for potential problems, historical data can be used as the basis for automating an initial level of data quality thresholds. Keep in mind that there is risk in using historical data as the standard. For this data to provide an effective standard, it must come from a process that is both under control (stable, not influenced by special causes) and meeting expectations. If it does not meet both conditions, it can still be used to gauge consistency, but review of the results requires significant skepticism (in the pure sense of that word: "doubt and questioning").
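A minimal sketch of this kind of three-standard-deviation threshold check follows. The measurement series is hypothetical (daily percentages of populated rows), and a real implementation would also verify that the historical process is stable before trusting the limits:

```python
import statistics

def three_sigma_check(history, new_value):
    """Flag a new measurement that falls outside three standard
    deviations of the mean of past measurements."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)  # population std dev of the history
    lower, upper = mean - 3 * stdev, mean + 3 * stdev
    return not (lower <= new_value <= upper)

# Hypothetical daily percentages of rows with a populated field
history = [97.1, 96.8, 97.4, 97.0, 96.9, 97.2, 97.3]
print(three_sigma_check(history, 97.0))  # within control limits -> False
print(three_sigma_check(history, 88.5))  # statistically unusual drop -> True
```

A flagged measurement is a prompt for review, not proof of a problem, matching the caution above about using history as the standard.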

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780123970336000158

Foreword

Ted Friedman, in The Practitioner's Guide to Data Quality Improvement, 2011

I also read a lot of books on the topic of data quality improvement. Many people write about this discipline in the abstract. Much of that writing is highly theoretical or even academic in nature. It's great if you want to understand the high-level principles and the philosophy of how data quality can be optimized in a perfect world. But of course we don't live in a perfect world, and therefore we need practical approaches that can be directly translated into effective action in our organizations – action that will get our data quality improvement initiatives off on the right foot and generate immediate business value.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780123737175000257

Data Quality Management

Mark Allen and Dalton Cervo, in Multi-Domain Master Data Management, 2015

Planning and Oversight

A data-quality program is ongoing. There will be many data-quality improvement projects, but the overall goal is to have a strategically sustainable office to direct and foster initiatives. Strong data-quality leadership needs to work closely with a data governance office to align business needs and the continuing prioritization of efforts. A data-driven company needs to encourage everyone to detect issues and propose data-quality improvements. Of course, any proposed initiative will have to be analyzed, reviewed, and prioritized. A data quality office can oversee the analysis of data issues, evaluate the impact they have on the business, propose solutions to cleanse existing data, and either mitigate or prevent future occurrences.

In multi-domain MDM, it is natural for certain data-quality requirements and activities to be presumed, since they are intrinsic to consolidating and synchronizing master data, as explained earlier in this chapter. Therefore, the data-quality roadmap is somewhat embedded in the overall MDM roadmap. Although MDM provides the opportunity to start a data-quality practice, DQM needs to be built as a strong practice with well-supported capabilities and leadership. To be sure, they need to be collaborating functions. But if companies do not already have a data-quality program, they need to take this opportunity to start one and expand the role of data quality beyond master data.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128008355000099

Data Quality and MDM

David Loshin, in Master Data Management, 2009

5.9 Influence of Data Profiling and Quality on MDM (and Vice Versa)

In many master data management implementations, MDM team members and their internal customers have indicated that data quality improvement is both a driver and a by-product of their MDM or Customer Data Integration (CDI) initiatives, often citing data quality improvement as the program's major driver. Consider these examples:

A large software firm's customer data integration program was driven by the need to improve customer data integrated from legacy systems or migrated from acquired company systems. As customer data instances were brought into the firm's Customer Relationship Management (CRM) system, the MDM team used data profiling and data quality tools to understand what data were available, to evaluate whether the data met business requirements, and to resolve duplicate identities. In turn, the master customer system was adopted as the baseline for matching newly created customer records to determine potential duplication as part of a quality identity management framework.

An industry information product compiler discussed its need to rapidly and effectively deploy the quality integration of new data sources into its master repository because deploying a new data source could take weeks, if not months. By using data profiling tools, the customer could increase the speed of deploying a new data source. As a by-product, the customer stated that one of the ways it could add value to the data was by improving the quality of the source data. This improvement was facilitated when this company worked with its clients to point out source data inconsistencies and anomalies, and then provided services to assist in root-cause analysis and elimination.
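A first pass at the profiling and duplicate detection described in these examples can be sketched in a few lines. The fields, records, and matching key below are hypothetical, and real profiling and identity-resolution tools are far more sophisticated:

```python
from collections import Counter

def profile(records, fields):
    """Basic profiling: null rate and distinct-value count per field."""
    stats = {}
    for f in fields:
        values = [r.get(f) for r in records]
        non_null = [v for v in values if v not in (None, "")]
        stats[f] = {
            "null_pct": 100.0 * (len(values) - len(non_null)) / len(values),
            "distinct": len(set(non_null)),
        }
    return stats

def potential_duplicates(records):
    """Flag records sharing a crude normalized match key (name + zip)."""
    key = lambda r: (r.get("name", "").strip().lower(), r.get("zip", ""))
    counts = Counter(key(r) for r in records)
    return [k for k, n in counts.items() if n > 1]

records = [
    {"name": "Acme Corp", "zip": "02139", "phone": "555-0100"},
    {"name": "acme corp ", "zip": "02139", "phone": None},  # likely duplicate
    {"name": "Widgets Inc", "zip": "10001", "phone": "555-0101"},
]
print(profile(records, ["phone"]))    # phone: one null out of three records
print(potential_duplicates(records))  # [('acme corp', '02139')]
```

Even this crude sketch shows the two roles profiling plays in the examples above: understanding what data is available, and surfacing candidate duplicates for resolution.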

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780123742254000059

Rating Your Data Stewardship Maturity

David Plotkin, in Data Stewardship (Second Edition), 2021

Maturity Level 4: Strategic

Response to data issues: Tools are added for data quality and profiling, with ongoing improvement efforts. Data Stewards are always involved in data quality improvement efforts. Risk assessments for data around projects are done early. Data quality issues and resolutions are measured, monitored, and communicated.

Attitude of management: Data Governance and Data Stewardship metrics have become a primary corporate measurement of success in managing data across the enterprise. Senior management drives Data Governance strategy. Data is seen as a valuable corporate asset. Accountability for quality and understanding of data is practiced across the enterprise. Data quality is a corporate objective, not a business or IT problem. Ongoing investments in managing data and Metadata are supported and championed. Stewardship metrics are included in assessments of projects and employee performance.

Handling of Metadata: Expertise increases in metadata management and Master Data Management. Single sources of the truth for both metadata and data are identified and documented. All key business data elements have full metadata collected quickly and efficiently.

Development of formal organization and structure: All business functions are represented in Data Governance and Data Stewardship, and participation is mandatory. The executive leadership team gets regular updates and handles escalated issues quickly and efficiently. The Data Governance Program office is fully staffed and funded, and reports progress, metrics, and issues to senior leadership on a regular basis.

Read full chapter: https://www.sciencedirect.com/science/article/pii/B9780128221327000097