There are a number of analogies for issues associated with data quality. Ken O’Connor’s recent blog post likened data quality to the quality of water in a river. I also sometimes use an analogy based on the quality of water in a swimming pool.
A different analogy that I use, and the subject of this blog post, likens your data to a piece of cheese!
As I am half Swiss and also a cheese lover, this is an analogy I am particularly fond of. It also tends to cause a chuckle or two when presented at conferences.
Okay, so why do I think cheese is a suitable analogy for data quality?
Imagine that the cheese illustrated above represents your data set and that the holes represent areas where your information is incomplete. At present, this cheese is edible and you know where the holes are.
Now consider the following:
- If users do not supply the data that arises from business activities, then you will potentially get more holes in your cheese
- If call centre staff use default job codes rather than the correct ones, then mould will develop on your cheese
- If users start to use a data field for a purpose that was not intended, then further mould will develop
- If engineers delivering new assets do not provide information on the assets that they have replaced/demolished, then you will have further holes developing
- If data is lost or corrupted as part of a data migration, then further holes and mould will develop in your data ‘cheese’
- If users start to keep ‘rival’ copies of data in spreadsheets, then this will be similar to a mouse nibbling away the corner of your cheese
- Further inappropriate business behaviours will tend to continue this gradual process of decline in the quality of your cheese
In the real world, if you have a little mould on your cheese, then you would probably cut the mould off and continue to consume the cheese. If the mould is more extensive, then the only option will be to throw the cheese away and start again, or you may risk indigestion.
If the cheese represents a corporate data set, the decline in its quality can be gradual and insidious. Data cleansing can be used to address small quantities of mould on the cheese. However, the ‘indigestion’ arising from poor data quality will result in poor business decision making. If left unchecked, the data set may reach a point where it is no longer fit for purpose and needs to be rebuilt from scratch.
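Because the decline is gradual, it can help to inspect the cheese regularly. The following is only a minimal sketch of how that might look, not a definitive method: it counts ‘holes’ (missing values) and ‘mould’ (values still set to a known default) across a handful of records. The function name `cheese_report`, the field names and the default job code `J000` are all hypothetical and chosen purely for illustration.

```python
def cheese_report(records, required_fields, default_values):
    """Return the share of checked values that are holes or mould.

    holes: values that are missing (None or an empty string)
    mould: values equal to a known default for that field
    All names and codes used here are illustrative assumptions.
    """
    total = holes = mould = 0
    for record in records:
        for field in required_fields:
            total += 1
            value = record.get(field)
            if value is None or value == "":
                holes += 1
            elif value == default_values.get(field):
                mould += 1
    if total == 0:
        return {"holes": 0.0, "mould": 0.0}
    return {"holes": holes / total, "mould": mould / total}


# Hypothetical sample data: four asset records with three fields each.
sample = [
    {"asset_id": "A1", "job_code": "J104", "location": "Site 3"},
    {"asset_id": "A2", "job_code": "J000", "location": ""},
    {"asset_id": "A3", "job_code": "J000", "location": None},
    {"asset_id": "A4", "job_code": "J221", "location": "Site 1"},
]

report = cheese_report(
    sample,
    required_fields=["asset_id", "job_code", "location"],
    default_values={"job_code": "J000"},  # assumed default job code
)
print(report)
```

In this sample, two of the twelve values checked are holes and two are mould. A default code is sometimes the genuinely correct answer, so the mould figure is best read as a prompt to investigate rather than a precise measure. Running a check like this at intervals would show whether the holes and mould are spreading.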
Can you afford to let the cheese get to a state where it is no longer edible?
By the way, before other cheese lovers make the point, this analogy does tend to fall down if the cheese in question is Stilton or Gorgonzola, but hopefully you have now got the concept!