The Unexpected Benefit of Analytics Pilots – Uncovering Data Quality Issues

Posted On 05/22/2023


Data-driven businesses are hard to build. In fact, any sort of data program is hard to launch in an organization. The first step is often understanding the “business-side” of the data and deciding on one or two use cases that make good pilots or proofs-of-concept. These pilot projects can be critical in demonstrating the current data capabilities of an organization and the possible value of data for driving business outcomes. An unexpected benefit of a data pilot is what the project can reveal about the organization’s data quality.

How Simatree Uses Report Automation Pilot Projects to Improve Data Quality

One simple but powerful initial pilot project is automating a manual report. Manual reports almost certainly contain data quality issues to identify and remediate, whether from human error or staff-developed workarounds. Often, small errors in a single data entry compound across a dataset, or across the multiple datasets behind a complete report (see visual below). Although remediating data quality issues organization-wide can seem daunting, a team can make the task manageable by cleaning up one dataset first and then applying the same data quality improvements across the organization.

When automating a manual report, a team can check data quality by comparing small subsets of the automatically generated and manually generated reports to:

Verify that the data is accurate OR
If errors are found, fix them and apply the fixes across the dataset
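The comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than Simatree's actual tooling: the function name `compare_reports`, the `order_id` and `amount` fields, and the sample rows are all hypothetical, and it assumes both reports can be loaded as lists of rows that share a key field.

```python
def compare_reports(manual, automated, key, fields):
    """Compare a sample of the manual report against the automated one.

    Returns a list of (key_value, field, manual_value, automated_value)
    tuples, one per discrepancy. A row that is absent from the automated
    report is listed with field set to None.
    """
    automated_by_key = {row[key]: row for row in automated}
    mismatches = []
    for row in manual:
        other = automated_by_key.get(row[key])
        if other is None:
            mismatches.append((row[key], None, "present", "missing"))
            continue
        for field in fields:
            if row[field] != other[field]:
                mismatches.append((row[key], field, row[field], other[field]))
    return mismatches


# Hypothetical sample rows drawn from each version of the report.
manual_sample = [
    {"order_id": "A1", "amount": 100},
    {"order_id": "A2", "amount": 250},
    {"order_id": "A3", "amount": 75},
]
automated_sample = [
    {"order_id": "A1", "amount": 100},
    {"order_id": "A2", "amount": 205},
]

mismatches = compare_reports(manual_sample, automated_sample, "order_id", ["amount"])
for mismatch in mismatches:
    print(mismatch)
```

An empty result supports the first outcome (the data is accurate for the sample); any entries in the list are the errors to fix and then apply across the dataset.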

To identify data quality issues, a pilot team should create exception reports that identify and list common types of issues to fix, such as duplicates, missing records, and null values. Exception reports represent a ‘Quick Fix’ solution that may also surface needed process changes to stop recurrences, which calls for preventative solutions (see below for details).
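As a rough sketch of what such an exception report might look like, the Python below lists duplicates, missing records, and null values for one dataset. The function name `build_exception_report`, the field names, and the sample records are assumptions made for illustration; a real report would read from the organization's own source systems.

```python
from collections import Counter


def build_exception_report(source, target, id_field, required_fields):
    """List duplicates, missing records, and null values for one dataset.

    source and target are lists of dict records. A record counts as
    "missing" when its identifier appears in source but not in target.
    """
    ids = [record[id_field] for record in source]

    # Duplicates: identifiers that appear more than once in the source.
    duplicates = sorted(i for i, n in Counter(ids).items() if n > 1)

    # Missing records: identifiers in the source that the target lacks.
    target_ids = {record[id_field] for record in target}
    missing = sorted(set(ids) - target_ids)

    # Null values: required fields that are None or an empty string.
    nulls = [
        (record[id_field], field)
        for record in source
        for field in required_fields
        if record.get(field) in (None, "")
    ]
    return {"duplicates": duplicates, "missing_records": missing, "null_values": nulls}


# Hypothetical records from two systems that should agree.
source_records = [
    {"id": "C1", "region": "East", "amount": 10},
    {"id": "C2", "region": "", "amount": 20},
    {"id": "C2", "region": "West", "amount": 20},
    {"id": "C3", "region": "West", "amount": None},
]
target_records = [{"id": "C1"}, {"id": "C2"}]

exception_report = build_exception_report(
    source_records, target_records, "id", ["region", "amount"]
)
for issue_type, items in exception_report.items():
    print(issue_type, items)
```

Each list in the report is a worklist for the quick fix, and the relative size of each list hints at which process change is most worth making.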

To track progress against identified data quality issues, teams managing data should create trend reports for their leadership team. These reports should track trends (e.g., trending up, trending down) for each type of data quality issue identified, as well as overall progress. Further, remediation plans should be time bound with clear goals.
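One possible shape for such a trend report is sketched below in Python. It assumes the team already records a count per issue type for each reporting period; the `trend_report` function, the weekly labels, and the counts are hypothetical. For simplicity, the direction compares only the first and last periods, which is one design choice among several.

```python
def trend_report(counts_by_period):
    """Summarize the trend for each issue type and overall.

    counts_by_period is a chronological list of (label, {issue_type: count}).
    Direction is judged by comparing the first period with the last.
    """
    def direction(start, end):
        if end < start:
            return "trending down"
        if end > start:
            return "trending up"
        return "flat"

    _, first = counts_by_period[0]
    _, last = counts_by_period[-1]

    by_issue = {}
    for issue in sorted(set(first) | set(last)):
        start, end = first.get(issue, 0), last.get(issue, 0)
        by_issue[issue] = {"start": start, "end": end, "direction": direction(start, end)}

    total_start, total_end = sum(first.values()), sum(last.values())
    overall = {
        "start": total_start,
        "end": total_end,
        "direction": direction(total_start, total_end),
    }
    return {"by_issue": by_issue, "overall": overall}


# Hypothetical weekly counts taken from successive exception reports.
weekly_counts = [
    ("week 1", {"duplicates": 12, "missing_records": 5, "null_values": 30}),
    ("week 2", {"duplicates": 8, "missing_records": 5, "null_values": 33}),
    ("week 3", {"duplicates": 3, "missing_records": 6, "null_values": 18}),
]

trends = trend_report(weekly_counts)
for issue, summary in trends["by_issue"].items():
    print(issue, summary)
print("overall", trends["overall"])
```

A time-bound remediation goal can then be expressed against the same numbers, for example a target count per issue type by a given period.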

How to Roll Out Organization-Wide

Once one dataset is in good shape, the pilot team can apply the learnings to other datasets and reports within their team. After the pilot team demonstrates success, and with organization-wide collaboration, the learnings can be applied throughout the organization. For example, good, clean data used for a finance team’s reporting can also be applied to an operations team’s reporting, and so on. Further, quality data can then be applied to other business initiatives beyond reporting.

Having clean and consistent data across teams requires centralized data assets. Centralizing data access enables better cross-silo collaboration, helps eliminate rework, and ensures shared knowledge of definitions and data quality concerns across teams. It also raises the standard for data quality and management across the entire organization. While data centralization may raise security concerns, there are simple ways to control who sees what – as micro as cell-level security features in a dataset – such that the organization can have these benefits of centralization without the risks.

10 Common Data Quality Issues Identified Through the Pilot Approach

Through pilot projects such as this, Simatree has identified 10 common data quality issues and corresponding preventative solutions, listed in the table below. While exception reports identify data quality issues after the fact, these solutions can prevent data quality problems before they begin. The types of issues that exception reports surface can help organizations prioritize which preventative solutions to implement first.


ISSUE	DESCRIPTION	IMPACT	PREVENTATIVE SOLUTION
Duplicates	The same value used multiple times across records in fields (i.e., columns) that should have unique identifiers	Results in double counting data, making it inaccurate and, therefore, providing unreliable insights	Create an alert on a unique identifier field so that, if the value already exists in the system, the record cannot be saved until that is fixed
Missing Records	Record(s) found in one data source are unexpectedly not found in another data source	Results in undercounting, insufficient and/or incomplete data for product accounting, and therefore unreliable insights	Create business rules to ensure that when a record gets entered into one system, it triggers a chain of events to record it in every relevant system
Null Values	Critical fields are blank	Similar to missing records, nulls can skew data for impacted fields and result in unreliable insights	Lock down fields where null values shouldn’t be allowed so that records cannot be saved if the fields are not filled out
Inconsistent Rules Around Field Use	Lack of clear business rules and/or consistent practice leads to individuals using multiple fields in a data source for the same type of data	Will lead to inaccurate counting and potentially skew key metrics calculated from data impacted by inconsistencies – client metrics in this instance	Create clear business rules for how to enter data into a source system; audit systems regularly to proactively identify misuse of fields
Timing Issues	Depending upon when a report is run, the underlying data may be incomplete	Incomplete daily sales data produces unreliable insights	Analyze the data flows and work with business leaders to set business rules for when data should be pulled in and when a day’s value should be considered final
Siloed Definitions	Different groups within the organization may view/calculate data differently	Results in multiple views of the truth around the organization, potentially paralyzing decision-making	Clearly define and align on key metrics across teams/groups, and create a centralized repository of clearly defined metrics. Explore the development of automated dashboards and reports that lock users into consistent metrics
Incomplete Reference Tables	When a field in a dataset pulls in data from an external table, record(s) in the field will show up as unknown in the dataset if the associated record is not contained in the external table	Results in incomplete product data and can skew the data similarly as with null values; it therefore provides unreliable insights	Create a robust data architecture and work with business leaders to ensure that the reference tables are complete and well defined
Free Text Fields	Free text fields allow users to manually input misspelled or incorrect values	Makes it difficult to perform reliable analysis with all the misplaced and inaccurate data. Unless someone manually cleans up the free text fields, there may be typos in any report you produce	Lock down fields that should only have a few acceptable entries; make it a drop-down list instead of free text
Integration Errors	When acquiring new businesses/clients and loading their data into existing systems, issues and errors may occur if their data is structured differently or of lower quality than the data in existing systems	Results in poorer data quality (insufficient, incomplete, and hard to interpret data), and therefore limits ability to realize underwritten or otherwise assumed synergies	Conduct a gap analysis to understand data quality gaps with newly acquired entity prior to integration. Determine whether to integrate systems or to maintain parallel systems. Set clear business rules for what to do if an acquired company/client’s data is not up to the business’ data quality standard. Rules should cover data cleansing standards and procedures, acceptable alternative data sources for gap-filling, approval process, etc.
Technology Incompatibility	Depending on the system used for data extraction, the data pipeline used may not have the rights to read in the entire dataset	With data read in incompletely, the dataset becomes unreliable for analysis and insight generation	Be agile with technology usage and verify that all necessary data is received. If a certain extraction method isn’t working, consider exploring other options, such as an Excel report instead of a direct connection
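Several of the preventative solutions in the table (blocking duplicate identifiers, locking down fields that must not be null, and replacing free text with a fixed list of entries) amount to validating a record before it is saved. The Python below is a minimal sketch of that idea, not a description of any particular source system: the `RecordStore` class, its parameters, and the sample values are hypothetical.

```python
class RecordStore:
    """In-memory store that refuses to save records failing basic rules."""

    def __init__(self, id_field, required_fields, allowed_values):
        self.id_field = id_field
        self.required_fields = required_fields
        # allowed_values maps a field name to its set of acceptable entries,
        # playing the role of a drop-down list.
        self.allowed_values = allowed_values
        self._records = {}

    def validate(self, record):
        """Return a list of rule violations; an empty list means valid."""
        errors = []
        record_id = record.get(self.id_field)
        if record_id in (None, ""):
            errors.append(f"{self.id_field} is required")
        elif record_id in self._records:
            errors.append(f"duplicate {self.id_field}: {record_id}")
        for field in self.required_fields:
            if record.get(field) in (None, ""):
                errors.append(f"{field} is required")
        for field, allowed in self.allowed_values.items():
            value = record.get(field)
            if value not in (None, "") and value not in allowed:
                errors.append(f"{field} must be one of {sorted(allowed)}")
        return errors

    def save(self, record):
        """Save the record, or raise ValueError listing every violation."""
        errors = self.validate(record)
        if errors:
            raise ValueError("; ".join(errors))
        self._records[record[self.id_field]] = record


# Hypothetical rules: unique id, region required and limited to two values.
store = RecordStore("id", ["region"], {"region": {"East", "West"}})
store.save({"id": "C1", "region": "East"})

print(store.validate({"id": "C1", "region": "Eest"}))  # duplicate id and a typo
print(store.validate({"id": "C2", "region": ""}))      # blank required field
```

Rejecting the record at entry time is what distinguishes these preventative solutions from the after-the-fact exception reports described earlier.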

Small One-Off Data Quality Issues Can Quickly Add Up Across a Dataset

The visual below shows how small errors on individual entries can compound across a dataset and reduce trust in both the complete dataset and the associated reports. See the key below for a description of each data quality issue identified in this dataset.

[Figure: three panels titled "Initial Dataset", "Ingested Data View", and "Key"]

Conclusion

Data analytics pilots have a dual benefit: they spur the identification and remediation of data quality issues, and they demonstrate the value of analytics to the business. With assurance of data quality and a proven analytics use case in the organization, leaders can be confident in making data-driven decisions going forward. More importantly, a successful data analytics pilot enables leaders to develop and execute more analytics use cases. In essence, one successful pilot can set an organization on a roadmap to a data-driven future.

About the Authors

Patrick McCreesh, Catherine Quinn, Daniel Crystal, Sophie Griffiths
