The UCOVI Blog



The Joy of Clunky Data Analogies




Ned Stratton: 14th April 2022

My interview two blog posts ago with Susan Walsh – author of Between the Spreadsheets and creator of the COAT methodology – got me thinking. Susan also has a metaphor for data maturity built around the cycle of cleaning clothes, running from dirty laundry as dirty data all the way to ironed, cleaned, wardrobe-arranged garments as insightful reports. It made me consider how awash (no pun intended) the conversation about data and data analytics is with analogies and metaphors, and whether or not this is useful.

Sometimes the oversimplification of an issue with a kindergarten analogy can be condescending and unhelpful. I had this a few years ago when I asked for flexibility in the monthly spend of a data research budget at work to account for slower months vs months with unforeseen urgent requirements, only to be told by a sales manager in temporary charge of the data team that "that's like a salesman saying 'I don’t need to hit my target this month because I sold loads last month'". Also consider this attempt by Computerworld's blog at condensing corporate data governance into the simple management of household finances (complete with headline straight from 1957).

But frequently in the world of data a good analogy comes in handy. Abstract and dry concepts such as normalisation, many-to-many relationships, statistical significance, and the Central Limit Theorem are as engrossing and intellectually satisfying to committed data nerds as they are utterly boring and remote to business stakeholders and non-data folk. In a situation like this, your aim as a data analyst is to get the message across in a way that maintains the level of complexity necessary to resolve the issue, but without confusing, alienating or talking down to the message receiver. A pithy analogy achieves this by repackaging the abstractness of something like a warehouse load failure or a data science model with a high false positive rate into something everyday, amusing and ideally visual.

This depiction that uses Lego bricks to visualise the process of converting raw data into charts and reports is a fun one, as is the David McCandless flow diagram of raw data to inter-connected knowledge from his book Knowledge is Beautiful, which he explains in a talk through the analogy of atoms, cells, organs and organisms.

The host of this podcast from Half Stack Data Science perfectly encapsulated how odd it is that data analysts and scientists currently seem to specialise by coding language and statistical approach rather than by industry (think SQL Data Analyst, Power BI Consultant etc), explaining 32 minutes in that it's rather like "a builder specialising in hammers".

Paul Daniel Jones, a former Data Governance Head at Nationwide and Barclays, has impressively managed to adapt the art of data analogy formulation into a full-length book: The Data Garden and Other Data Allegories. It features six short stories with morals on how to approach data-related challenges in businesses, including fables about a data literacy driving school and a data governance hospital.

Despite their inherent cheese factor, I think data analogies are good. Getting into the habit of using them is a genuinely effective way of engaging business users in what your analysis is saying, and of explaining any limitations to your conclusions or the length and complexity of the process involved. They are a vital linguistic tool in the task of data translation.

But what is truly precious is when the moment arises to use SQL, databases and data as the analogy itself with which to explain something else.

I had this great fortune four years ago at a funeral of all occasions, when someone started a conversation about Christianity. He candidly owned up to "getting the whole God thing" but being confused about what – if there is a God that is omniscient and controls the universe – the point of Jesus Christ was. Since I knew he had a solid enough grasp of databases, I used that as the basis to make The Lord Our Saviour relevant to his line of work. I told him that just as large databases hosted in SQL Server have simple front-end web interfaces for non-SQL users to interact with in day-to-day tasks such as entering new records or downloading reports, Jesus Christ acts as the conduit by which Christians – the end users of Christianity – interact with God (the database) and receive salvation from sin and purgatory (complex queries and risky update/delete statements). Rather pleasingly, he got the gist, only to then ask me on that basis "what are other religions for?". Less convincingly, I told him to think of those as MySQL, Oracle and PostgreSQL databases.




