There are a number of analogies for issues associated with data quality. Ken O’Connor’s recent blog post likened data quality to the quality of water in a river. I also sometimes use an analogy based on the quality of water in a swimming pool.

A different analogy that I use, and the subject of this blog post, is to liken your data to a piece of cheese!

Being half Swiss and a cheese lover makes this an analogy I am particularly fond of. It also tends to cause a chuckle or two when presented at conferences.

Okay, so why do I think cheese is a suitable analogy for data quality?

Imagine that the cheese illustrated above represents your data set and the holes represent areas where your information is not complete. At present, this cheese is edible and you know where the holes are.

Now consider the following:

  • If users do not supply data as a result of their business activities, then you will potentially get more holes in your cheese
  • If call centre staff use default job codes rather than the correct ones, then mould will develop on your cheese
  • If users start to use a data field for a purpose that was not intended, then further mould will develop
  • If engineers delivering new assets do not provide information on the assets that they have replaced/demolished, then you will have further holes developing
  • If data is lost or corrupted as part of a data migration, then further holes and mould will develop in your data ‘cheese’
  • If users start to keep ‘rival’ copies of data in spreadsheets, then this will be similar to a mouse nibbling away the corner of your cheese
  • Further inappropriate business behaviours will tend to continue this gradual process of decline in the quality of your cheese

In the real world, if you have a little mould on your cheese, then you would probably cut the mould off and continue to consume the cheese. If the mould is more extensive, then the only option is to throw the cheese away and start again, or risk indigestion.

If the cheese represents a corporate data set, the decline in quality of the data ‘cheese’ can be gradual and insidious. Data cleansing can address small quantities of mould on the cheese. However, the ‘indigestion’ arising from poor data quality will result in poor business decision making. If left unchecked, the data set may reach a point where it is no longer fit for purpose and needs to be rebuilt from scratch.

Can you afford to let the cheese get to a state where it is no longer edible?

By the way, before other cheese lovers make the point, this analogy does tend to fall down if the cheese in question is Stilton or Gorgonzola, but hopefully you have now got the concept!


5 thoughts on “How tasty is your data quality cheese?”

  • 3rd March 2010 at 7:37 pm

    Julian

    Good post. If the cheese has too much mold on it, then no one is going to want to take it. Everyone’s going to pass it around.

    Great analogy!

  • 3rd March 2010 at 9:19 pm

If consumers of your cheese do not trust its provenance, or do not value the cheese, it will be left to dry out on the shelf and become useless.

    Quality is a PERCEPTION of ‘fitness for purpose’ by the consumer. Most people forget the ‘soft systems’ issues in data quality. They are just as important as the ‘hard’ issues of accuracy, relevance, completeness or timeliness.

  • 4th March 2010 at 2:56 pm

    Phil,
    Thanks for the comment.
If the cheese has got too bad, then it will give off a certain odour. Even when you have thrown the cheese away, the odour will tend to linger. Similarly, if your data set has acquired a bad odour (reputation), then it can be very difficult to change this perception.

    Charlie,
    Thanks for the comment, and good to hear from you again.
    As mentioned above, perception is key and if the perception/reputation of the data is poor, then people may use other sources of data which will tend to further worsen the reputation of the core data.

    Julian

  • 4th March 2010 at 7:13 pm

    Hi Julian

    The biggest issue, in my opinion, tends to be the cheese that looks just right, only to afflict those that consume it with some nasty bacterial infection!

    Much data may appear to be of good quality, yet it may later turn out to be seriously wrong, causing untold commercial and reputational damage.

    • 5th March 2010 at 9:09 am

      Hi Paul,
      Good to hear from you again.

      This is probably an area where the analogy works less well. Whilst cheese can harbour listeria etc., testing for the presence of listeria will tend to destroy your cheese, whereas tests of data quality (accuracy, fitness for purpose etc.) should be achievable without destroying your data.

      Whilst the nirvana of having ‘perfect’ data is something organisations sometimes desire, the cost of achieving this level of perfection is something most organisations will not be able to afford.

      Decision makers should be aware of the actual quality of their data in order to understand what additional checks and controls may be required on data analysis.

