An organisation that works in isolation and shares no data with other organisations is probably rare. Most organisations collaborate with others to some degree and provide data as part of that collaboration. So, what factors are relevant when considering data quality across organisations?
When building a machine or structure, you will usually be assembling components supplied by many manufacturers. For this to work, the interfaces between components and items should meet agreed requirements. For example, if one organisation decided to make bolts with a different type of thread, it would be difficult (or perhaps impossible) to join different components together.
In a recent post I considered what is needed to understand data quality from an organisational perspective. But what are the data quality implications when we have to use data across organisations?
Society, businesses and organisations are increasingly interacting with each other in ever closer ways. Suppliers may be given direct access to key systems of the core organisation, or may supply data files for direct upload or import. Many sectors, such as automotive and aerospace, use complex supply chains with many organisations contributing to the final product. For this to work effectively, the data aspects of these relationships need to be smooth, automatic and ‘frictionless’.
Organisations will work with many suppliers who in turn will be working with many other organisations. Whilst some sectors and situations have commonly agreed data standards, these are probably in the minority. Organisations will therefore have to get used to working with different data standards and approaches whenever they work with other organisations.
First, let's consider data requirements when working across organisations. There are a number of basic scenarios (probably with infinite variations):
- Data requirements defined by a lead authoritative body or standard. For example, grid references defining location in the UK conform to standards set by the Ordnance Survey
- The appointing organisation sets the data requirements and appointed organisations are expected to meet them. For example, a construction company needs to supply designs and documentation in the formats specified by the building owner
- Collaborations between two or more organisations working on a joint venture where the data standards and requirements should be agreed as part of the collaboration
- Informal use of data provided by another organisation, such as data that is openly published or shared, which generally means using the data in the format in which it is supplied
Generally, the higher up this list a scenario sits, the easier it is for all parties to collaborate, share and interact efficiently and effectively.
Interoperability and data quality
The above scenarios collectively relate to interoperability. Interoperability is defined in ISO 27790 as the ‘ability of two or more systems or components to exchange information and to use the information that has been exchanged’.
From a data quality perspective, interoperability relates partly to the format of the data. For example, Excel (in its default 1900 date system) stores a date as a serial number, where 1 represents 1 January 1900. A date stored as text in the format 12/5/21 (or perhaps 12-05-21, 12/5/2021 etc.) may be recognisable as a date to a user, but it is not in the expected numeric format and so does not support interoperability. It is also ambiguous: 12/5/21 is 12 May 2021 under the UK convention but 5 December 2021 under the US convention. This relates to the validity dimension of data quality.
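A minimal Python sketch of the difference between a numeric date and a text date, assuming Excel's default 1900 date system. The 30 December 1899 epoch is a common conversion convention that gives correct dates for serial numbers from 61 (1 March 1900) onwards, because Excel treats 1900 as a leap year. The function names are illustrative only.

```python
from datetime import date, datetime, timedelta

# Common conversion epoch for Excel's 1900 date system. Correct for
# serial numbers >= 61, since Excel counts a non-existent 29 Feb 1900.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_date(serial: float) -> date:
    """Convert an Excel 1900-system serial number to a calendar date."""
    return (EXCEL_EPOCH + timedelta(days=serial)).date()


def parse_text_date(text: str, fmt: str) -> date:
    """Parse a text date; the caller must state which convention it uses."""
    return datetime.strptime(text, fmt).date()


# A numeric serial has one meaning...
from_serial = excel_serial_to_date(44328)   # 12 May 2021

# ...but the same text yields different dates under different conventions.
uk = parse_text_date("12/5/21", "%d/%m/%y")  # 12 May 2021
us = parse_text_date("12/5/21", "%m/%d/%y")  # 5 December 2021
```

The point of the sketch is that the text value cannot be interpreted at all until both parties agree the convention, whereas the serial number is unambiguous once the date system is known.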
One other key aspect of data quality in interoperability scenarios is consistency of definitions. This is particularly an issue for dimensions and ratings. For example, the attribute ‘diameter’ of a pipe may be consistently recorded in mm and stored as a number, yet one organisation may be recording the internal diameter of the pipe, the other the external diameter.
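One way to guard against this is to carry the definition alongside the value rather than assume it. The sketch below is hypothetical: the `basis` field and the uniform-wall formula (external = internal + 2 × wall thickness) are assumptions for illustration, not part of any particular data standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipeDiameter:
    """A pipe diameter in mm, tagged with what was actually measured."""
    value_mm: float
    basis: str  # hypothetical field: "internal" or "external"


def to_external(d: PipeDiameter, wall_thickness_mm: float) -> PipeDiameter:
    """Express a diameter on the external basis.

    Assumes a uniform wall, so external = internal + 2 * wall thickness.
    """
    if d.basis == "external":
        return d
    if d.basis == "internal":
        return PipeDiameter(d.value_mm + 2 * wall_thickness_mm, "external")
    raise ValueError(f"Unknown diameter basis: {d.basis!r}")


supplier_a = PipeDiameter(100.0, "internal")
supplier_b = PipeDiameter(110.0, "external")

# Same unit and number format, but the values only agree once both
# are expressed on a common basis.
same_pipe = to_external(supplier_a, wall_thickness_mm=5.0) == supplier_b
```

Without the `basis` tag, 100 and 110 would look like two different pipes; with it, they can be recognised as the same size.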
Another situation to consider is where two data sets are each valid and accurate in isolation but, when combined, create data scenarios that had not previously been encountered. For example, a wireless temperature sensor might be added to a machine type for which ‘wireless temperature sensor’ had never been defined as a valid component. This is similar to a challenge found in data migration projects, where a merged data set contains use cases that had not previously been defined.
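A minimal sketch of checking a merged data set against the receiving organisation's reference data. The `ALLOWED_COMPONENTS` table and the machine and component names are hypothetical examples.

```python
# Hypothetical reference data held by the receiving organisation:
# which component types are valid for each machine type.
ALLOWED_COMPONENTS = {
    "pump": {"motor", "impeller", "pressure sensor"},
    "compressor": {"motor", "wireless temperature sensor"},
}


def new_combinations(records):
    """Return (machine_type, component) pairs not in the reference data."""
    unknown = set()
    for machine_type, component in records:
        if component not in ALLOWED_COMPONENTS.get(machine_type, set()):
            unknown.add((machine_type, component))
    return sorted(unknown)


ours = [("pump", "motor"), ("pump", "impeller")]
theirs = [("compressor", "motor"), ("pump", "wireless temperature sensor")]

# Each list may be valid in its source system; checking the merged set
# against one set of reference data surfaces the combination nobody defined.
flagged = new_combinations(ours + theirs)
```

Flagged pairs are a prompt for a decision (extend the reference data or reject the record), not something the code should resolve on its own.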
Data mapping can be a challenge if the same concept is recorded differently. For example, if one organisation records the gender of an individual as ‘Male’, ‘Female’ or ‘Prefer not to say’ and another records gender as ‘M’, ‘F’ or ‘Other’, there will be challenges correctly mapping this data from one organisation to the other. Mapping ‘Female’ to ‘F’ and ‘Male’ to ‘M’ is straightforward; however, ‘Other’ and ‘Prefer not to say’ represent different concepts and almost certainly cannot be mapped to each other. The party receiving the data has to decide how to handle such entries, which may mean creating a new allowable value in both its own data dictionary and its receiving systems.
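A minimal sketch of an explicit mapping that only converts values representing the same concept and sets the rest aside for a decision. The mapping table and function name are illustrative assumptions.

```python
# Sending organisation's values -> receiving organisation's values.
# Only values that represent the same concept are mapped.
GENDER_MAP = {"Male": "M", "Female": "F"}


def map_gender(values):
    """Map values, returning mapped results and values needing a decision."""
    mapped, unmapped = [], []
    for v in values:
        if v in GENDER_MAP:
            mapped.append(GENDER_MAP[v])
        else:
            mapped.append(None)  # leave blank rather than guess a mapping
            unmapped.append(v)
    return mapped, unmapped


mapped, unmapped = map_gender(["Female", "Male", "Prefer not to say"])
# mapped == ["F", "M", None]; unmapped == ["Prefer not to say"]
```

Deliberately not mapping ‘Prefer not to say’ to ‘Other’ keeps the unresolved cases visible, so the receiving party can make the data dictionary decision described above.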
Other aspects of data quality that need to be considered are:
- Completeness – are all entities recorded, are all attribute fields populated
- Consistency – are consistent identifiers for the same thing used across data sets and organisations
- Timeliness – is data supplied within agreed timescales, particularly in complex supply chains
- Uniqueness – is there one entry only for each thing being recorded
- Accuracy – does the data correctly represent the entity it relates to
These data quality dimensions are discussed more widely in other blog posts, so we won’t repeat them now.
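As a brief illustration only, two of these dimensions (completeness and uniqueness) can be measured with simple checks like the sketch below. The field names and records are hypothetical.

```python
from collections import Counter


def completeness(records, fields):
    """Fraction of (record, field) cells that are populated."""
    total = len(records) * len(fields)
    if total == 0:
        return 1.0  # no cells to check; treated as vacuously complete
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return filled / total


def duplicate_ids(records, id_field):
    """Identifiers that appear on more than one record."""
    counts = Counter(
        r[id_field] for r in records if r.get(id_field) is not None
    )
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical records received from a supplier.
records = [
    {"asset_id": "P-001", "diameter_mm": 100},
    {"asset_id": "P-002", "diameter_mm": None},
    {"asset_id": "P-001", "diameter_mm": 100},
]

score = completeness(records, ["asset_id", "diameter_mm"])  # 5 of 6 cells
dupes = duplicate_ids(records, "asset_id")                  # ["P-001"]
```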
A consideration when working across organisations is data structure – if the data fields in each organisation represent the same underlying data concept, they can be mapped from one to another enabling data transfer, sharing and aggregation. However, if fields are structured differently, this starts to get interesting.
For example, a postal address in one organisation may put the building number in one field and the road name in another, whilst a different organisation may use a ‘first line of address’ field to record both these items. A person reading two such entries could quickly work out how they relate to each other, but if data is being imported and processed automatically, a ‘rule’ needs to be created to define how such data is to be treated.
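A minimal sketch of such a rule, assuming that a leading number (optionally with a single letter suffix, such as ‘12A’) is the building number and the remainder is the road name. Real addresses are messier than this, so anything that does not fit the pattern is routed for manual review rather than guessed.

```python
import re

# Hypothetical rule: leading digits (optional letter suffix), whitespace,
# then the road name.
LEADING_NUMBER = re.compile(r"^\s*(\d+[A-Za-z]?)\s+(.+?)\s*$")


def split_first_line(first_line: str) -> dict:
    """Split a 'first line of address' into building number and road name."""
    match = LEADING_NUMBER.match(first_line)
    if match is None:
        # e.g. a named building or a flat prefix: needs human review.
        return {"building_number": None, "road_name": None,
                "needs_review": True}
    return {
        "building_number": match.group(1),
        "road_name": match.group(2),
        "needs_review": False,
    }


split_first_line("12 High Street")
# {'building_number': '12', 'road_name': 'High Street', 'needs_review': False}
```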
When considering the effects of data quality and interoperability, the following table illustrates some of the likely scenarios you may face:
| | All data formats and definitions consistent | Some data formats and definitions consistent | Little consistent data |
|---|---|---|---|
| Good data quality, all parties | Fantastic, the sky’s the limit | A bit of work to map data fields and you should be picking up speed | Lots of work needed to map data fields, but eventually you will start to gain speed |
| Some good data quality, some poor or unknown | Caution needed. Mapping the data may be straightforward, but there may be risks ahead | Work needed to map data fields, but once you spot the nuggets of good data, your decision making and insights will be delivering value | Lots of work needed to map data fields, but you may be able to spot the nuggets of good data that support insights and decision making |
| Data quality all poor or unknown | Beware! Apparently easy mapping of data fields may hide lots of risk | Caution needed. Even when you have mapped data fields, decision making and insights from the data are likely to be poor | Oh dear, lots of challenges and hard work ahead! |
There are many industry-standard ways of exchanging information (such as COBie) and probably a near-infinite number of proprietary or organisation- and situation-specific ways. As the table above illustrates, even if mapping data from one system or organisation to another looks straightforward, there is a risk that the resulting data set is poor and unusable. As with any good data migration methodology, subject matter expertise is essential to ensure that different data situations are handled correctly.
This post has provided a quick overview of the challenges when considering data quality across organisations. There will probably be some follow-up posts exploring the topic in more detail; please comment below if there are specific areas you want to explore.
For help understanding your data challenges and developing suitable and pragmatic ways to work across organisations, why not contact us?