Garbage In, Gospel Out

Most people will be familiar with the old adage “Garbage In, Garbage Out”, a reminder that if your input data is poor, any outputs will be poor too.

A variant of this cropped up in a discussion recently: “Garbage In, Gospel Out”. So what does it mean, and does it apply to your organisation?

The variant describes a bias in the user: outputs built on poor data are trusted without question because of how the user perceives the tool that generated the analysis.

A good example of this from my early career (which also explains the image above) dates from when I worked in a newly built quarry. This was a state-of-the-art facility using computer controls and many sensors to assess the performance and state of the plant. When running a quarry it is vital to know the throughput of the various parts of the plant, in order to determine how much stone is being processed, by which processing route, and so on. Throughput is measured using a belt weigher, which determines the tonnage of stone being carried by a conveyor belt. A belt weigher has two parts: a mechanical component that does the weighing and a digital readout. The mechanics of a belt weigher mean that its accuracy is around +/-10% of the true value (see below for an illustration of the sensing parts of the belt weigher). The digital readout converts the 4-20mA signals from various sensors and components into a figure that represents the tonnage of stone on the conveyor belt.
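As a rough illustration of that last step, a 4-20mA current loop is normally scaled linearly between a zero point and a full-scale value. The sketch below assumes a single signal that already represents flow rate, with 4mA meaning an empty belt and 20mA meaning a hypothetical full scale of 1100 tonnes per hour. A real belt weigher combines several signals and will be configured differently, and the function name and scaling values here are mine, not taken from the quarry's system.

```python
def current_to_tph(current_ma, full_scale_tph=1100.0):
    """Linearly scale a 4-20mA loop current to tonnes per hour.

    Assumption: 4mA = 0 t/h and 20mA = full_scale_tph (hypothetical values).
    """
    if not 4.0 <= current_ma <= 20.0:
        # A current outside the 4-20mA range usually indicates a fault,
        # for example a broken wire reading close to 0mA.
        raise ValueError(f"signal out of range: {current_ma} mA")
    return (current_ma - 4.0) / 16.0 * full_scale_tph


print(current_to_tph(12.0))  # mid-scale signal -> 550.0
```

Nothing in this conversion knows how accurate the weighing mechanism is; it will happily produce as many decimal places as the display is asked to show.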

In this particular quarry, many of the conveyor belts were rated at 1100 tonnes per hour, yet the digital readouts were set to display this figure to four decimal places and to update every minute. Four decimal places on a figure of around 1100 implies a precision of roughly 0.00001%, far finer than the +/-10% the mechanics could deliver, and the regular updates reinforced the impression among staff that the figure was accurate. Reconfiguring the display to show readings to the nearest 10 tonnes per hour, updated every few seconds, gave staff a far more realistic perception of its accuracy.
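The change to the display amounts to rounding the reading to a step that is closer to what the instrument can actually resolve. A minimal sketch, using hypothetical function names and a reading I have made up, shows the same value presented both ways, along with the band that a +/-10% accuracy puts around it:

```python
def display_as_configured(reading_tph):
    """Original display style: four decimal places."""
    return f"{reading_tph:.4f} t/h"


def display_rounded(reading_tph, step=10):
    """Revised display style: nearest `step` tonnes per hour.

    Python's round() sends exact halves to the even neighbour, which is
    harmless for a display like this.
    """
    return f"{round(reading_tph / step) * step} t/h"


def accuracy_band(reading_tph, accuracy=0.10):
    """Range the true value could lie in for a +/-10% weigher."""
    return reading_tph * (1 - accuracy), reading_tph * (1 + accuracy)


reading = 1043.2718  # made-up example reading
print(display_as_configured(reading))  # 1043.2718 t/h
print(display_rounded(reading))        # 1040 t/h
low, high = accuracy_band(reading)
print(f"true value between about {low:.0f} and {high:.0f} t/h")
```

Even the rounded figure is still much finer than the band of roughly 200 tonnes per hour that the true value could sit in.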

The issue here was the belief that, because the system was digital, it was completely accurate. Many of the same issues are present in current BI and dashboard implementations: just because you have spent a small fortune on visualisation tool X does not mean that its outputs can be trusted implicitly. One organisation I know of used a number of spreadsheets and data sources of variable quality to produce performance metrics and KPIs. These were widely derided because the poor quality of the underlying data was well understood. The organisation then implemented one of the well-known data visualisation and dashboard tools without improving any of the source data, and suddenly the performance metrics were perceived to be more accurate, purely because of an unreasonable belief about what the tool could do! How similar is the situation in your organisation?

The Business case example at the foot of this page goes into a bit more detail about how different presentations of the same data could be interpreted. For a slightly different perspective on how presentation can affect the outputs, see this excellent xkcd cartoon.

Some questions:

Does your organisation present costs, benefits etc. as a range based on the quality of the data?
If costs/benefits etc. were presented as a range, how would they be perceived by the leaders of your organisation?
Would the person presenting such figures be congratulated for indicating the sensitivity of the figures or derided for not being able to provide accurate figures?
What similar examples of Garbage In, Gospel Out have you come across?

Business case example

Consider if the costs and benefits of a business case being submitted for approval were presented as follows:

1. Cost £1,000,000, annualised benefits £250,000, payback period 4 years

2. Cost £1,004,386, annualised benefits £249,625, payback period 4.0236 years

3. Cost £1,000,000 +/-5%, annualised benefits £250,000 +/-10%, payback period 4 years +/-7%

4. Cost £1,000,000 +20%/-5%, annualised benefits £250,000 +10%/-25%, payback period 3.5-6.4 years

These all represent the same business case, but the presentation of the figures, and what they imply, vary greatly. If the internal threshold for investment in projects is a 5 year payback period, here’s how the answers could be interpreted:

1. This could be viewed either as a clear case for investment, as it meets the payback period criterion, or as suspect, because the quoted figures are such round numbers that there may have been little analysis to back them up.

2. Again, this meets the payback period criterion, so it may be approved. However, quoting the figures and payback period to this level of detail may give an impression of greater accuracy than the underlying data supports.

3. Costs, benefits and payback are quoted as ranges, which implies that accuracy has been considered. With the payback period well within the threshold, this would indicate a good project to support.

4. The ranges on costs, benefits and payback are far wider than in example 3, with the range of payback periods straddling the threshold figure: this could be a really good investment or a really bad one. The large ranges on costs and benefits suggest more uncertainty than may be desirable. One outcome would be to approve a £100,000 investment to undertake more research into the business case before committing the rest of the money.
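The payback figures in these examples can be checked with a few lines of arithmetic. The sketch below is my own illustration rather than part of any real business case. It assumes that payback is simply cost divided by annualised benefit, and it takes the worst-case view that the lowest cost coincides with the highest benefit and vice versa. The function name and its parameters are hypothetical.

```python
def payback_range(cost, benefit, cost_up=0.0, cost_down=0.0,
                  benefit_up=0.0, benefit_down=0.0):
    """Return (best, central, worst) payback period in years.

    Tolerances are fractions, e.g. 0.20 for +20%.
    Assumption: payback = cost / annualised benefit, with the extremes
    of cost and benefit combined in the least favourable way.
    """
    best = cost * (1 - cost_down) / (benefit * (1 + benefit_up))
    central = cost / benefit
    worst = cost * (1 + cost_up) / (benefit * (1 - benefit_down))
    return best, central, worst


threshold_years = 5  # internal investment threshold used in the example

examples = {
    "example 2": payback_range(1_004_386, 249_625),
    "example 3": payback_range(1_000_000, 250_000, 0.05, 0.05, 0.10, 0.10),
    "example 4": payback_range(1_000_000, 250_000, 0.20, 0.05, 0.10, 0.25),
}

for name, (best, central, worst) in examples.items():
    verdict = "inside" if worst <= threshold_years else "not wholly inside"
    print(f"{name}: {best:.2f} to {worst:.2f} years "
          f"(central {central:.4f}), {verdict} the threshold")
```

On these assumptions example 4 comes out at roughly 3.5 to 6.4 years, matching the range quoted above. Example 3 comes out at roughly 3.5 to 4.7 years, which is wider than +/-7% around 4 years, although still inside the 5 year threshold. How the individual uncertainties are combined is itself a choice that affects the figure presented.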

