Key factors in a data quality assessment are the data values, data requirements and data subject

Data quality assessments can provide large amounts of useful information. However, to gain a complete perspective on organisational data quality, it is essential to consider three key perspectives:

The data itself – entries in databases and spreadsheets
The requirements for data – arising from processes and organisational objectives
The data subject – the person, product, activity or event represented by the data

Considering data quality from only one or two of these perspectives is insufficient to understand organisational data quality. Having a lot of information about some aspects of data quality may mask the fact that you are missing key data quality dimensions and insights. Activities to improve data quality may therefore not be correctly targeted.

Looking at each of these perspectives in turn:

Data values

Data profiling

There are many data profiling tools available that can rapidly deliver a significant amount of information about a data set. For example, ‘out of the box’ analysis of a data table or data set by a profiling tool will generate many insights:

The number of records
Count of unique values (and the number of duplicates)
Proportion of null values (or blank entries)
Maximum and minimum values and perhaps the average for numeric values
Format patterns e.g. CCNN NCC as a format for a UK post code
When multiple tables in the same database are analysed, the profiling tool can automatically suggest likely relationships between data tables, primary and foreign keys
Etc.
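
As a rough illustration of the statistics listed above, here is a minimal sketch using only the Python standard library. It is not how any particular profiling tool works; the function names and sample values are assumptions made up for this example, and the pattern notation simply follows the C (character) and N (number) convention used above.

```python
from collections import Counter
from statistics import mean


def format_pattern(value: str) -> str:
    """Map letters to C and digits to N, keeping other characters as they are."""
    return "".join(
        "C" if ch.isalpha() else "N" if ch.isdigit() else ch for ch in value
    )


def profile_column(values: list) -> dict:
    """Return basic profiling statistics for one column of values."""
    # Treat None and empty strings as null or blank entries
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    numeric = [v for v in present if isinstance(v, (int, float))]
    text = [v for v in present if isinstance(v, str)]
    return {
        "records": len(values),
        "unique_values": len(counts),
        "duplicates": sum(n - 1 for n in counts.values()),
        "null_proportion": (len(values) - len(present)) / len(values) if values else 0.0,
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "mean": mean(numeric) if numeric else None,
        "patterns": Counter(format_pattern(v) for v in text),
    }


# Illustrative values only, not real data
postcodes = ["AB12 3CD", "EF34 5GH", "", "AB12 3CD", None]
weights = [1.5, 2.0, None, 4.0]

postcode_profile = profile_column(postcodes)
weight_profile = profile_column(weights)
```

Note that every figure this produces describes the data values alone. Nothing in the output says whether the postcodes belong to the right addresses or whether weight is a field the organisation actually needs.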

Specific data profiling tools are not essential to measure and understand data quality. You can also use a wide variety of analytic tools for this purpose, including spreadsheets. Whichever technology is used, the downsides are similar from an organisational perspective.

Data profiling drawbacks

Data profiling results can be delivered quickly and with minimal user input, and they provide a large volume of insights into your data. That volume can give a deceptively incomplete view of your data quality. In isolation, these results are of limited value to an organisation, for example:

The accuracy of your data cannot be assessed without considering the data subject. Just because data is plausible does not mean that it correctly represents the subject. For example, a car that is actually red may be recorded as blue, which is a perfectly plausible value. Only when you compare the data to the car itself will you know that the data is inaccurate
Profiling a data table or database provides one representation of the data subject but is not an enterprise view of your data quality
Assessing data quality without linking to business requirements means that there will be no checks that the data is what the organisation needs. You may have millions of rows of data, but may still not be able to meet your organisational objectives if key fields are not recorded
You cannot identify and deliver appropriate data improvement projects (which could involve improving existing data and/or gathering new data) without considering changes to data requirements to meet organisational objectives

Data Requirements

ISO 9000 (echoed across many other standards) defines quality in terms of how well requirements are fulfilled, a view often summarised as “…conformance to requirements…”.

Assessing data without considering data requirements only illustrates the characteristics of the data, not its quality.

Data profiling tools can be configured to include rules translating organisational requirements into data quality assessments – a linking of the data values with data requirements. Developing data quality rules requires a degree of domain knowledge to ensure that rules are appropriate for the data and context. Such rules enable assessment of the validity and consistency of your data and identify data that is outside agreed limits. You are now developing a broader view of your data quality.
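
To make the idea of data quality rules concrete, the sketch below expresses three requirements as simple checks and reports which records fail each one. The field names, permitted values and the 500 kg limit are hypothetical assumptions for illustration; in practice they would come from people with the relevant domain knowledge.

```python
from typing import Callable


def weight_recorded(record: dict) -> bool:
    """Completeness: the weight field must be populated."""
    return record.get("weight_kg") is not None


def weight_in_limits(record: dict) -> bool:
    """Validity: a recorded weight must sit within the agreed limits (assumed here)."""
    weight = record.get("weight_kg")
    return weight is None or 0 < weight <= 500


def status_permitted(record: dict) -> bool:
    """Validity: status must be one of the permitted values (assumed here)."""
    return record.get("status") in {"active", "discontinued"}


# Hypothetical rule set; the rule names are only labels for reporting
RULES: dict[str, Callable[[dict], bool]] = {
    "weight_kg is recorded": weight_recorded,
    "weight_kg within agreed limits": weight_in_limits,
    "status is a permitted value": status_permitted,
}


def assess(records: list[dict]) -> dict[str, list[str]]:
    """Return, for each rule, the ids of the records that fail it."""
    return {
        name: [record["id"] for record in records if not check(record)]
        for name, check in RULES.items()
    }


# Illustrative records only
products = [
    {"id": "P1", "weight_kg": 12.5, "status": "active"},
    {"id": "P2", "weight_kg": None, "status": "active"},
    {"id": "P3", "weight_kg": 9000, "status": "retired"},
]

failures = assess(products)
```

Here P2 fails the completeness rule, while P3 fails both the weight limit and the status rule. A record that passes every rule is valid against the requirements, but it may still be inaccurate.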

Assessing data alongside data requirements still only provides a partial view of data quality and what is needed to meet the overall requirements of the organisation:

Again, failing to compare the data to the data subject itself stops you gaining any information about the accuracy of your data
If the weight of a product, a key attribute, does not exist in the relevant data tables, then you will not know that the data is missing if you are only analysing from the perspective of the data values
Failing to take account of the data subject and its context may mean data requirements are not realistic. They may be either impossible to meet or inordinately expensive to meet, because of the cost of gathering and assessing the data. For example, requiring the location of road signs to the nearest centimetre will require more expensive GPS equipment than ‘standard’ equipment. The measurement process may also take significantly longer, increasing the data acquisition costs and the later costs of checking the data
Meeting data requirements without considering the data stores existing in an organisation means you may select the wrong data store to record a new data requirement

Data Subject

A Data Subject is the thing or entity that the data represents and could be a product, service, customer, activity, event etc. Considering data quality from the context of the data subject is essential to fully understand your organisational data quality.

It is extremely difficult to assess the accuracy of data without reference to the data subject itself. For example, a date of birth of 3/6/1981 appears valid and plausible, but the value alone does not reveal that it was recorded in US date format (month first) and should actually be 6/3/1981. Such errors are unlikely to be spotted without reference to the data subject, or another authoritative data source.
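
The date example can be demonstrated in a few lines. This is a minimal sketch, and `plausible_readings` is a helper name invented for it: the same text parses without error under both day-first and month-first conventions, so no format check can tell you which date of birth is correct.

```python
from datetime import date, datetime


def plausible_readings(text: str) -> set[date]:
    """Return every calendar date the text could mean under the two conventions."""
    readings = set()
    for fmt in ("%d/%m/%Y", "%m/%d/%Y"):  # day first (UK), month first (US)
        try:
            readings.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # not a valid date under this convention
    return readings


ambiguous = plausible_readings("3/6/1981")     # two different valid dates
unambiguous = plausible_readings("25/6/1981")  # only the day-first reading is valid
```

A value with more than one plausible reading can be flagged for review, but only the person concerned, or another authoritative source, can settle which reading is accurate.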

Inventory accuracy

Reference to the data subject is essential for ‘inventory accuracy’:

Is there one, and only one, entry for each data subject? (Avoiding data duplication or missing data entries)
For each data entry, is there a corresponding data subject? (Avoiding ‘ghost’ data entries)

These key checks ensure, for example:

That a single product is not listed twice (or more times) against different suppliers, descriptions etc.
That Jo Smith, who left the organisation five years ago, is not still listed as an ERP system user
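
The two inventory accuracy questions above amount to comparing the identifiers held in a data store with an authoritative list of data subjects. The sketch below assumes such a list exists (for example, current staff from the HR system); the user ids are made up for illustration.

```python
from collections import Counter


def inventory_check(record_ids: list[str], subject_ids: set[str]) -> dict[str, set[str]]:
    """Compare recorded ids with the authoritative set of data subjects."""
    counts = Counter(record_ids)
    recorded = set(counts)
    return {
        # More than one entry for the same data subject
        "duplicated": {i for i, n in counts.items() if n > 1},
        # A data subject with no entry at all
        "missing": subject_ids - recorded,
        # An entry with no corresponding data subject ('ghost' entries)
        "ghost": recorded - subject_ids,
    }


# Illustrative ids only
erp_users = ["a.khan", "b.jones", "b.jones", "jo.smith"]
current_staff = {"a.khan", "b.jones", "c.patel"}

result = inventory_check(erp_users, current_staff)
```

In this example b.jones is listed twice, c.patel has no ERP account, and jo.smith is a ghost entry. The check is only as good as the authoritative list it is compared against.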

In many situations the data representing a data subject will be stored in multiple data stores. For example, information about an employee is likely to be recorded in:

The ERP systems they use for their normal work
The HR system
The training system
Building access control system
etc.

Employees may also appear in numerous spreadsheets that are outside the control of IT – for example, details of employees involved in a charity event recorded in a local spreadsheet.

Consistency

Assessing the data quality dimension of ‘consistency’ requires analysis across all data stores holding information about the data subject. For example, a particular product with a unique ID may appear in the online store, finance system, procurement systems and warehouse/logistics systems. Past updates to the product description may not have been copied across to all data stores, which represents a consistency issue.
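
A consistency check of this kind can be sketched as a comparison of the same product ID across stores. The store names, field names and values below are hypothetical, and the sketch assumes each store can be read as a mapping from product ID to a record.

```python
def inconsistent_fields(stores: dict[str, dict[str, dict]], subject_id: str) -> dict[str, dict]:
    """For one data subject, return the fields whose values differ between stores."""
    # Collect the subject's record from every store that holds one
    records = {
        name: rows[subject_id] for name, rows in stores.items() if subject_id in rows
    }
    fields = set().union(*(record.keys() for record in records.values()))
    differences = {}
    for field in fields:
        values = {
            store: record[field] for store, record in records.items() if field in record
        }
        if len(set(values.values())) > 1:
            differences[field] = values
    return differences


# Illustrative data only
stores = {
    "online_store": {"P100": {"description": "Steel bracket, 50 mm", "unit": "each"}},
    "finance": {"P100": {"description": "Steel bracket", "unit": "each"}},
    "warehouse": {"P100": {"description": "Steel bracket, 50 mm"}},
}

issues = inconsistent_fields(stores, "P100")
```

Here only the description is reported, because the finance system still holds the older wording. The check shows that the stores disagree, not which of them is right.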

If the data subject is a person and they understand where their data is stored, they will be better able to understand what needs to be changed if they, for example, move house.

Assessing data quality across these perspectives

The sections above demonstrate how the key perspectives of data values, data requirements and data subjects must all be considered when assessing organisational data quality. However, that is not enough!

All these aspects of data quality should be considered together to understand and define the data architecture of your organisation. This should include all data sources, not just those managed by IT, and yes, that also includes user-created spreadsheets. In turn, this informs the approaches to master data management and data flows across and outside the organisation. Your data governance activities should consider all aspects of data and data quality, not just the data stores managed by the IT department.

Benefits of a complete view of your organisation’s data quality

Why should you be concerned about understanding organisational data quality? As a previous blog post explains, the benefits of improving data quality include:

The removal of inefficiencies from your organisation
Improvements to the quality of decision making and effectiveness
Better performance and greater efficiency

Briefly, poor data quality will lead to adverse financial impacts for your organisation. In today’s challenging times, can you afford for your organisation not to be as efficient and effective as it can be?

Why not get in contact to discuss how we can help you deliver greater performance for your organisation, enabled by better data quality management?

Data quality assessment from an organisational perspective