When talking about data quality, it is usual to consider different aspects or ‘dimensions’ of data quality – validity, completeness, uniqueness, consistency, timeliness and accuracy. These six dimensions were agreed as the most relevant and representative in work led by DAMA UK that I contributed to in 2013 and published in this White Paper.

Some recent work and discussions suggest that there may be another dimension to consider – continuity. This blog post explores the idea in a bit more detail.

Continuity will not be relevant as a data quality dimension in all situations. Cases where it may be relevant include:

  • Time series data – for example, temperature measurements taken every minute
  • Data on a progressively changing situation – for example, an evolving building design where rooms may be added, removed and resized across different iterations of the design
  • A complete history – for example, a full employment history in a CV or the complete record of the journey of a freight truck
  • An audit trail of transactions – for example, all purchases against an account and not just the final balance

Each of these circumstances will require relevant data quality criteria to test the continuity of the data (two of these checks are sketched in code after the list):

  • A full set of temperature readings with no missing values
  • All versions of the design model, even those that may not have been submitted for formal review
  • No gaps in dates between roles on a CV, or between sections of a vehicle’s route
  • A complete list of all the transactions against an account that, in total, represent the difference between the opening and closing balance on the account
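
To make the first and last of these checks concrete, here is a minimal sketch in Python. It is purely illustrative – the function names, data shapes and expected interval are assumptions, not taken from any particular tool or standard.

    from datetime import datetime, timedelta

    def find_gaps(timestamps, expected_interval=timedelta(minutes=1)):
        # Return (previous, next) timestamp pairs where readings are missing.
        ordered = sorted(timestamps)
        return [(a, b) for a, b in zip(ordered, ordered[1:])
                if b - a > expected_interval]

    def reconciles(opening_balance, transactions, closing_balance):
        # True if the listed transactions fully account for the balance movement.
        return opening_balance + sum(transactions) == closing_balance

    # Readings at 09:03 and 09:04 are missing, so one gap is reported.
    readings = [datetime(2019, 5, 22, 9, m) for m in (0, 1, 2, 5, 6)]
    print(find_gaps(readings))                      # one gap: 09:02 to 09:05
    print(reconciles(100.0, [25.0, -10.0], 115.0))  # True: no missing transactions

Similar gap checks apply to the other two cases – consecutive design iteration numbers, or the end date of one role matching the start date of the next.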

It could be argued that continuity is the same as consistency. However, we believe that consistency is more about checking that an entity has the same representation across different systems, rather than about determining whether you have a full ‘history’ of an entity. Continuity should therefore be considered as another data quality dimension.

Do you agree? What examples of data where continuity is important are you aware of?


4 thoughts on “Continuity – the new data quality dimension”

  • 22nd May 2019 at 3:26 pm

    Hello Julian,
    Interesting idea. Prior to it, would that not generally have fallen under the facet of “Completeness”?
    Martin Storey
    Perth, Australia
    mstorey@welldataqa.com

    • 31st May 2019 at 1:41 pm

      Martin,
      Thanks for the comment. A distinction I’ve made elsewhere between the two relates to the fourth dimension – time!
      • Completeness can be assessed at a point in time, irrespective of what has happened in the past.
      • Continuity checks that you have, say, a full event history for an asset, or a full audit trail for a decision-making process.
      They do relate to each other, but prompting users to think about continuity may trigger data quality assessments that they would otherwise miss.
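      To make that distinction concrete, a minimal sketch (the field names are illustrative assumptions, not from any particular system):

        # Completeness: are the mandatory fields populated right now?
        def is_complete(record, mandatory=("asset_id", "status", "location")):
            return all(record.get(f) is not None for f in mandatory)

        # Continuity: does the event history cover the asset's life with no gaps,
        # i.e. does each event start where the previous one ended?
        def is_continuous(events):
            ordered = sorted(events, key=lambda e: e["start"])
            return all(a["end"] == b["start"] for a, b in zip(ordered, ordered[1:]))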

  • 15th December 2020 at 8:00 am

    There is a problematic confusion of concepts and terminologies in this article. In the knowledge base of the data quality discipline, the term ‘dimension’ means a different concept from the term ‘dimension’ in the technical terminology of the business intelligence community.

    In data quality, the term ‘continuity’ is a data quality property/attribute – not a dimension. In physical business intelligence data mart terminology, you might have a dimension table with a continuity attribute, e.g. an OLAP dimension. Therefore, you could say that there might be a data quality metric based on a ‘data mart dimension called continuity’. However, this is not a data quality dimension but technical implementation terminology for a specific physical data modelling style or reporting tool implementation style (facts vs dimensions).

    You should read the foundational data quality articles starting from the early 90s, such as:
    Richard Y. Wang and Diane M. Strong, ‘Beyond Accuracy: What Data Quality Means to Data Consumers’, Journal of Management Information Systems, Vol. 12, No. 4 (Spring 1996), pp. 5-33.

    In the above article, the authors collected almost 200 data quality ATTRIBUTES (continuity among them) and then analyzed them to form more theoretically meaningful and unified groupings – dimensions. In other words, data quality dimensions are not individual attributes such as continuity, but a theoretical abstraction on top of those concepts. Continuity is one of the raw attributes that was used to derive the wider dimension concepts that group attributes together.

    At the same time, continuity is a very important data quality issue. For example, I often demonstrate continuity challenges related to business value chains, data lineage or the data lifecycle.

    It’s problematic if a business value chain, like an order-to-sales-to-support event chain, is broken into silos, leading to missing events or unnecessary misunderstandings: the continuity of the customer journey breaks down. For example, in healthcare a person’s care pathway might consist of a preoperative visit, an intraoperative episode and a postoperative visit related to ambulatory surgery. More straightforward cases are time-series continuity issues, such as a lack of daily/hourly measurements, or getting only milestone updates between logistics centers.

    The above problems related to continuity could be described with a data quality attribute, i.e. a data mart dimension, but in data quality terminology continuity is just one of many alternative attributes of the wider completeness dimension. Completeness covers all kinds of ‘lack of data’ issues related to scope, depth, breadth etc. Under the completeness dimension there is a massive number of underlying issues and related metrics, depending on the needs of users and their business cases. In technical data quality terminology, time-series continuity is more specifically a cross-record, and even cross-table, data completeness issue – a slightly more complex issue than the completeness of individual attributes.

    Therefore, I agree on the significance of continuity in data quality assessments, and I also use it a lot in my examples. However, conceptually it is part of the completeness dimension, since completeness covers all continuity issues and even more quality properties. I have implemented continuity quality scores as ‘data mart dimensions’ and grouped these metrics under the completeness ‘data quality dimension’.

    Sometimes there might be a need to create a separate grouping between metrics and dimensions, such as a ‘continuity category’ that groups together all continuity metrics but still separates different completeness metrics into different groups. A data quality dimension is an abstraction, and you rarely actually need dimensions for measurements. You might prefer more specific or business-relevant groupings like ‘continuity metrics’.

    That is because, for business users, it might be better to use more specific categories that relate to their business context. Talk about their business problems and create metrics for them. For example, implement process (data quality) metrics and customer (data quality) metrics. The process metrics grouping can include metrics like continuity of the business value chain. One specific continuity metric could measure the completeness of a patient pathway: a) each event has a valid parent pathway populated, or b) each pathway has all mandatory events if it is a finished pathway. If you have a lot of broken but finished care pathways, or events without a specified care pathway, you have either a technical data quality problem or a business process problem, leading to holes in your process data that could in turn lead to erroneous reports elsewhere.

    You do not need to talk about dimensions to the business. The main point is the completeness of cross-record references between events (visits) and their parent records (pathways). For a technically oriented audience, I would put these under the completeness dimension as used in data quality literature and best practice. Under that dimension I would have a grouping called continuity metrics, containing two metrics:
    PathwayIsComplete: if PathwayStatus = Finished, then PathwayChildEventTypeDistinctCount = 3 (preoperative, operative, postoperative).
    EventHasPathway: PathwayCode is mandatory.
    In reality, they measure the completeness of data from two angles – the existence of a parent record and the completeness of mandatory event types in a chain of events.
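
    As a minimal sketch of those two metrics in Python (the record layout, status value and event type names are assumptions based on the description above):

      REQUIRED_EVENT_TYPES = {"preoperative", "operative", "postoperative"}

      def pathway_is_complete(pathway, events):
          # A finished pathway must contain all three mandatory event types.
          if pathway["status"] != "Finished":
              return True  # only finished pathways are checked
          types = {e["event_type"] for e in events
                   if e["pathway_code"] == pathway["code"]}
          return REQUIRED_EVENT_TYPES <= types

      def event_has_pathway(event, known_pathway_codes):
          # Every event must reference an existing parent pathway.
          return event.get("pathway_code") in known_pathway_codes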

    For business users, it might be better to talk about processes and the continuity of business value chains, so for them I would group these as process metrics. If there are a lot of metrics, I could use subcategories like process.continuity metrics, customer.experience metrics etc. Group them in a way that makes sense for business users.

    • 15th December 2020 at 10:15 am

      Sami,
      Thank you for that long and comprehensive reply – you make a lot of valid points.
      One of my current frustrations is the number of organisations where the ‘pain’ of poor data quality and poor data exploitation practices is seen as ‘just the way things are’, without recognising the drag on profitability, efficiency and quality.
      For me, one of the key things is for organisations to take a measured, repeatable approach to assessing both their data and how they are exploiting it. This is far more than buying a data profiling tool; it is a management process. Our challenge as data professionals is to apply this knowledge to develop repeatable, systemic processes in our organisations. However we describe the concepts and terms, the most important thing is to be able to make a positive difference – particularly one that is sustainable.

