How much data is enough data? This question is often debated within organisations, where there tend to be two broad groups:

  • Gather only the data you know that you need – the minimalists
  • Gather all the data you possibly can – the maximalists

There is arguably a third group – the employees involved in a process who would prefer to get by without providing any data at all, since completing call logs, feedback forms etc. is time-consuming.

So what is the correct position?

You will often hear the phrase ‘storage is cheap’, which is broadly correct, so why does this matter? Well, data management as a whole can be far from cheap: the costs of storage, backup, analysis, migration, de-duplication, master data management and so on are substantial for many organisations. This is particularly true of complex data migrations, which can delay business transformation projects or risk degrading the data.

Since most business processes require some data to be effective, we will ignore the third group above; arguably, organisations that do not use data cannot provide a suitable service to their customers or maintain a suitable audit trail, and will not be in business for long. This allows us to focus on the two main groups.

The Minimalists

The Minimalists will insist that the only data that should be collected is that which has an immediate and clear need. Software and databases should be configured to capture only this data and no more. The benefit is that the data aspects of processes should be as efficient as possible and software implementations as simple as possible.

The Maximalists

The Maximalists will suggest that you should collect as much data as possible: not all the future uses that the data could support are currently known, so the extra data helps make you future-proof to an extent. If you have taken the time and effort to access an asset that is normally inaccessible, for example a buried pipe/cable or assets normally submerged at the bottom of a fluid-filled tank, then gathering all the data you can whilst the asset is accessible should minimise the need to re-expose it in future.

Cases where each can be relevant

Factors that may help decide where you should be on the spectrum ranging from Minimalist to Maximalist:

  • Data cost versus transaction cost – If an activity or transaction takes several hours to complete and the data entry only takes a few minutes, then gathering more data will probably not be a major imposition. A ‘rule of thumb’ could be that, depending on the value of the data, a data cost of no more than 20% of the transaction cost is reasonable;
  • Data cost versus availability cost – If it is easy to gain access to the entity that is the subject of a transaction, e.g. checking Google Street View to confirm whether a property has a driveway, then the data cost is low. However, if an asset has to be excavated, or a process stream shut down so that a tank can be drained of fluid, then it is sensible to gather more data than is strictly required to complete the process transaction. Again, a data acquisition cost of no more than 20% of the transaction cost is reasonable;
  • Transaction volumes – If there are relatively few transactions in a business process, then the aggregate effect of spending 10% more time gathering data is unlikely to have a major impact on organisational performance; however, if the process has many thousands of transactions per day, then spending 10% more time gathering data could have a large impact on efficiency;
  • Process maturity – A mature process that has been operated for a number of years will tend to change only gradually, and the data needed to support this change is likely to be small, so the minimalist approach may be more relevant. However, for a new or immature process, where a lot of change and evolution is likely, there is also likely to be a need for more data to support process changes, so a more maximalist approach may be relevant;
  • Innovation – If only relatively small, iterative improvements are likely for a process, then a more minimalist approach to data will be suitable; however, if a lot of innovation will be required to keep ahead of your competition, then extra data may help support this innovation.
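The arithmetic behind the first and third factors can be sketched in a few lines of Python. This is a minimal sketch, assuming time spent is an acceptable proxy for cost; the `ProcessProfile` class, the helper functions and the example figures are all hypothetical, and the 20% threshold is the rule of thumb suggested above rather than any standard.

```python
from dataclasses import dataclass

# Rule of thumb from the text: a data cost of up to 20% of the transaction cost is reasonable.
DATA_COST_THRESHOLD = 0.20


@dataclass
class ProcessProfile:
    """Hypothetical summary of one business process; time is used as a proxy for cost."""
    name: str
    transaction_minutes: float  # time to complete one transaction, including data entry
    data_entry_minutes: float   # portion of that time spent capturing data
    transactions_per_day: int


def data_cost_ratio(p: ProcessProfile) -> float:
    """Data cost as a fraction of the transaction cost."""
    return p.data_entry_minutes / p.transaction_minutes


def within_rule_of_thumb(p: ProcessProfile, threshold: float = DATA_COST_THRESHOLD) -> bool:
    """True if the data cost does not exceed the threshold fraction of the transaction cost."""
    return data_cost_ratio(p) <= threshold


def extra_hours_per_day(p: ProcessProfile, extra_fraction: float) -> float:
    """Aggregate daily effort if extra data capture lengthens every transaction by extra_fraction."""
    return p.transactions_per_day * p.transaction_minutes * extra_fraction / 60


# Illustrative (made-up) processes: a low-volume site visit and a high-volume phone call.
site_visit = ProcessProfile("site visit", transaction_minutes=180, data_entry_minutes=10, transactions_per_day=4)
call = ProcessProfile("call centre call", transaction_minutes=5, data_entry_minutes=2, transactions_per_day=5000)

for p in (site_visit, call):
    print(
        f"{p.name}: data cost {data_cost_ratio(p):.0%} of transaction, "
        f"within rule of thumb: {within_rule_of_thumb(p)}, "
        f"extra hours/day at +10%: {extra_hours_per_day(p, 0.10):.1f}"
    )
```

With these illustrative figures, the three-hour site visit with ten minutes of data entry sits at roughly 6% of transaction time, well within the rule of thumb, whereas the five-minute call with two minutes of data entry sits at 40%. Adding 10% to every transaction costs the low-volume process a little over an hour a day in aggregate, but the high-volume process over forty hours a day.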

Are you a data hoarder?

Capturing lots of data does not necessarily make you a hoarder – a library will contain many books, but they should be both indexed (for example, using the Dewey Decimal Classification) and sorted within categories. Similarly, if you have a clear view of your ‘data landscape’ and suitable governance, then you should not think of yourself as a data hoarder.

If there is a lack of control within your organisation over where data sets are stored (and even over whether new ones should be created at all), then your ‘data landscape’ will become exceedingly complex, and you are likely to become reliant on the knowledge of key individuals to know where data is stored, what it represents and how it could be used.

If you take a maximalist approach to data acquisition, then it becomes even more critical to have suitable governance in place to reduce the likelihood that your data will become an unmanageable hoard.

Is it time for a spring clean?

With the imminent arrival of GDPR, it becomes ever more critical for organisations to know where all their personal data is stored (with spin-off benefits for other sorts of data). If you have a large number of data sets, some of whose provenance or quality you may not know, then it may be prudent to consider a programme of archiving and deletion. Before actually deleting anything, consider archiving the data (or closing down access to it) in order to spot any users of the data that you were not aware of. If no unexpected users ‘pop up’, then you can proceed with archiving or deletion.
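The archive-before-delete check can be sketched in Python as a simple decision per data set. This is a minimal sketch under stated assumptions: the `Dataset` record, the `review` function, the example data set names and the 90-day quiet period are all hypothetical, and in practice the access requests would come from whatever helpdesk tickets or access logs your organisation already keeps.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical quiet period: how long access stays closed before deletion is considered.
QUIET_PERIOD = timedelta(days=90)


@dataclass
class Dataset:
    """Illustrative record for one data set under review (not a real library type)."""
    name: str
    access_closed_on: Optional[date] = None  # date access was withdrawn; None = still open
    access_requests: List[date] = field(default_factory=list)  # dates users asked for access


def review(ds: Dataset, today: date, quiet_period: timedelta = QUIET_PERIOD) -> str:
    """Decide the next step for a data set using the archive-before-delete approach."""
    if ds.access_closed_on is None:
        # Step 1: archive or close access first, rather than deleting straight away.
        return "close access first"
    # Step 2: any request made after access was closed reveals a user we did not know about.
    unexpected = [d for d in ds.access_requests if d >= ds.access_closed_on]
    if unexpected:
        return "keep: unexpected users found"
    # Step 3: only once the quiet period has passed without requests is it safe to proceed.
    if today - ds.access_closed_on < quiet_period:
        return "wait: quiet period not yet over"
    return "proceed with archiving or deletion"


# Hypothetical data sets at different stages of the review.
today = date(2024, 6, 1)
candidates = [
    Dataset("legacy_survey_2012"),
    Dataset("old_crm_export", access_closed_on=date(2024, 1, 15)),
    Dataset("asset_photos", access_closed_on=date(2024, 1, 15), access_requests=[date(2024, 3, 2)]),
    Dataset("tank_inspections", access_closed_on=date(2024, 5, 10)),
]
for ds in candidates:
    print(f"{ds.name}: {review(ds, today)}")
```

The quiet period is the main design choice: too short, and infrequent users (for example, an annual reporting process) may not have had a chance to ‘pop up’; too long, and the clean-up stalls.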
