Clearly, when computers are required to perform straightforward calculations they are accurate: when adding up a series of values, for example, they will get the correct answer. A recent Dataspora blog post postulates that we are not far from the point where data flows around the world making everything happen, without involving humans.
I take a slightly different view, based on real-world experience of data, analysis systems and human behaviour. In summary, I believe that complex analysis systems are inaccurate to a certain degree, so their outputs need to be treated with caution and reviewed for suitability before being acted upon.
I will consider three areas in turn – data, analysis systems and human behaviour.
Firstly, whilst organisations aspire to “perfect” data quality (however that is defined for a particular organisation), the reality is that few organisations can afford to achieve data perfection, even if it is attainable. In most cases, many factors conspire to prevent perfect data quality. For example:
- Manual data entry will always contain some mistakes, however well trained staff and customers are
- Poorly designed user interfaces may increase the likelihood of incorrect value selection
- System interfaces may not be able to handle every data combination
- Data on some events or physical entities may no longer exist and may be impossible to recreate
- Changes in the attributes of physical entities often fail to be updated in the relevant systems
Every organisation is therefore dealing with data that contains some level of defects.
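Defects like those listed above can at least be surfaced by automated validation before the data feeds an analysis. A minimal sketch, using hypothetical field names and rules rather than any real organisation's checks:

```python
# Minimal sketch of automated data-quality checks that catch common
# entry defects before analysis. Field names and rules are illustrative.
records = [
    {"site": "QUARRY_A", "tonnes": 1100.0},
    {"site": "QUARRY_A", "tonnes": -50.0},   # impossible negative tonnage
    {"site": "", "tonnes": 980.0},           # missing site code
]

def validate(record):
    """Return a list of defect descriptions for one record."""
    defects = []
    if not record["site"]:
        defects.append("missing site code")
    if record["tonnes"] < 0:
        defects.append("negative tonnage")
    return defects

for r in records:
    problems = validate(r)
    if problems:
        print(r, "->", problems)
```

Checks like these only catch the defects someone thought to encode, of course; values that are plausible but wrong (a mistyped 1,100 entered as 1,010) sail straight through, which is why defect levels never reach zero.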
Secondly, analysis systems typically look to replicate and automate human logic, intuition and insight. Whilst some decisions are simple, with limited scope for error or misinterpretation, organisations often have far more complex decision and modelling scenarios to run their business: asset investment planning, financial forecasting, climate change analysis, demographic analysis and so on. In these cases the actual decision logic can be fiendishly complex, is sometimes fully understood by only a small number of people, and can be highly sensitive to data quality problems. Because of that complexity, there is a risk of errors arising whenever the logic needs to be changed to meet new business needs. A few years ago I was involved in the development and operation of a rules-based artificial intelligence system which suffered from all of the problems mentioned above, and which also had the unfortunate habit of producing a completely different answer when only small changes were made to its input parameters. Worse, the system operated in such a way that it was not possible to trace the actual decision logic to spot where any issues had arisen: it was a true “black box” system.
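That sensitivity to small input changes is easy to reproduce even in a toy rule set. The sketch below is not the system described above, just a hypothetical illustration of how threshold-based rules can flip a recommendation completely when one input moves a fraction:

```python
# Toy illustration: a chain of threshold rules where a tiny change in
# one input flips the final recommendation. The rules and numbers are
# invented for illustration only.
def recommend(demand, capacity):
    utilisation = demand / capacity
    if utilisation > 0.9:
        return "invest in new capacity"
    if utilisation > 0.5:
        return "maintain existing assets"
    return "consider decommissioning"

print(recommend(900.0, 1000.0))   # utilisation 0.9000 -> "maintain existing assets"
print(recommend(900.1, 1000.0))   # utilisation 0.9001 -> "invest in new capacity"
```

A 0.01% change in one input has swung the output between two very different (and very differently priced) recommendations. In a real system with hundreds of interacting rules, and no way to trace which ones fired, such cliff edges are far harder to spot.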
Finally, we need to consider how human behaviour responds to the outputs of computer systems. Even simple systems, if they present digital outputs, can cause users to be over-trusting of the results. For example, when I worked in the quarrying industry, the quarry I worked at had many belt weighers (see example of similar weighers), which typically have a mechanical accuracy of +/- 10%. One of these, key to the overall control of the quarry, was on a conveyor that typically transported 1,100 t/hour, yet its digital readout was set to display around four decimal places, implying a precision of roughly +/- 0.00001%! Unfortunately, because the readout was digital, a number of staff trusted the data implicitly. Once the readout was set more appropriately, staff treated the results in a more suitable manner.
When we consider more complex analysis systems running on imperfect data, this problem can be magnified. Unfortunately, there is a tendency for humans to be lazy, so they may not “sanity check” the outputs of such complex systems and will implicitly rely on those outputs as if they were correct. I am sure most people will have many examples of users blindly trusting the outputs of systems. For the artificial intelligence system referred to earlier, some senior staff believed that if enough data were processed by the biggest analysis tool they could build, then the results would not need to be checked before being acted on by other staff: an idealistic and incorrect assumption.
In summary, whilst computers are not inaccurate (they do whatever you tell them with the data you provide) there is a high likelihood that the outputs of complex analysis systems using imperfect data will not be as good as intended, therefore such outputs will require sanity checking and possibly further analysis before being used to support decision making.