Slawomir Laskowski: 31st May 2020
In recent years, data has become a huge sensation. The sheer volume now available makes it harder to understand, analyse, and build business recommendations from. But this is what data analysts do: they are tasked with structuring and extracting sense from vast and seemingly chaotic datasets, and with creating meaningful relationships among outputs from multiple manual or computerised data sources.
In the minds of most business-focussed, non-technical people, data analysts occupy a nebulous grey area encompassing IT, maths, and black magic with spreadsheets. Each of these views misses the point.
A data analyst is someone who can work on large datasets whatever their format or subject matter. The data can relate to anything: traffic in a crowded city, the amount of waste generated in a factory, the behaviour of customers using an application, or the number of used plastic straws in restaurants. The use cases for data are countless, because you can measure almost anything.
But 'large' can be too large when considering a dataset, even for the most technical and talented data analyst. This is where data science comes in.
When confronted with millions of rows and hundreds of input variables, artificial intelligence and machine learning methods support the analyst's work by distilling the data into fewer dimensions; correlation matrices and Principal Component Analysis (PCA) are examples of this. Organising and processing data at this scale is the domain of specialised tools and machines.
The analyst's role, although highly technical, is not to be a have-a-go data scientist or Big Data engineer. Their purpose is not to re-invent the wheel by attempting this type of processing themselves, but to set the processed data in a real-life context, judge which correlations matter, and draw actionable conclusions from it.
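To make the dimensionality-reduction idea concrete, here is a minimal sketch using plain NumPy and entirely invented data: five input variables, three of which are deliberately generated to move together. The correlation matrix reveals which columns are related, and PCA (via eigendecomposition of that matrix) distils the five columns into two components. This is an illustration of the technique, not any particular production toolchain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 1000 observations, 5 variables.
# The first three are noisy copies of one underlying signal;
# the last two are independent noise.
base = rng.normal(size=(1000, 1))
data = np.hstack(
    [base + rng.normal(scale=0.1, size=(1000, 1)) for _ in range(3)]
    + [rng.normal(size=(1000, 2))]
)

# Correlation matrix: a first look at which variables move together
corr = np.corrcoef(data, rowvar=False)

# PCA via eigendecomposition of the correlation matrix:
# standardise the columns, then project onto the leading eigenvectors
standardised = (data - data.mean(axis=0)) / data.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]           # largest variance first
explained = eigvals[order] / eigvals.sum()  # share of variance per component

# The three correlated columns collapse onto one component,
# so five columns can be distilled into two
reduced = standardised @ eigvecs[:, order[:2]]
print(reduced.shape)  # (1000, 2)
print(explained.round(2))
```

The analyst's job then starts where this code stops: deciding what, in business terms, those two components actually represent.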
A report’s primary purpose is to answer a question: either a "yes/no/yes, but…", or the support or rejection of a null hypothesis.
Its additional aims include:
• Informing users about the complexity of a situation and allowing them to broaden their perspective.
• Acting as an interactive tool for more in-depth analysis or investigation.
• Recommending the most preferable next action from the options available to a business.
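As a concrete sketch of the null-hypothesis case: suppose a report must answer whether a new checkout flow changed average order value. The scenario and data below are entirely invented, and the example assumes SciPy is available; it simply shows how a two-sample t-test turns data into a "yes, but…" answer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical A/B question: did a new checkout flow change average order value?
# Null hypothesis: the control and variant groups have the same mean.
control = rng.normal(loc=50.0, scale=12.0, size=1000)
variant = rng.normal(loc=53.0, scale=12.0, size=1000)

t_stat, p_value = stats.ttest_ind(control, variant)

# The report's answer is a "yes, but...": reject the null at the 5% level,
# then set the effect size in its business context.
if p_value < 0.05:
    print(f"Reject the null (p = {p_value:.4f}); "
          f"difference in means = {variant.mean() - control.mean():.2f}")
else:
    print(f"No evidence against the null (p = {p_value:.4f})")
```

Note that the statistical verdict is only the start; the recommendation still depends on whether the measured difference matters to the business.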
So it can be said that a data analyst is not only a "data person" but, above all, a special advisor. This is why the presentational and aesthetic elements of a report (the PowerPoint slides and animations) are as important to data analysis as numerical accuracy and technical work.
What, then, makes a good data analyst? The key skills include:
• Logical thinking
• Resourcefulness: a willingness to research and draw upon additional relevant data sources
• Deductive reasoning and an ability to draw conclusions
• Clear communication with other people and a willingness to work with them
This last skill is not overrated. A data analyst is not the same as a computer programmer. Some developers, when asked "why do you create applications?", might answer, jokingly but with an element of truth, "to avoid dealing with people". In software development, though, the requirements brief is often explicit and detailed from the start. Data analysis is different: when a business user requests a piece of analysis, there is always an element of curiosity and uncertainty. This is natural. If the user knew exactly what was needed to begin with, why would they need the analysis done in the first place?

A good data analyst must talk to and work with people throughout a project, keeping them informed of progress and discoveries as they unfold, and coming up with recommendations for future directions in the creation of products, services and solutions. These are not created for self-learning robots, but for people.
Some recommended reading:
Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information – Danette McGilvray (2008)
This book has helped me to wrangle data quality projects every day, closely tying technical tasks to business motivations. It is detailed, well-written and in plain English. Each step provides a description, business benefit/context and approach, alongside useful examples and templates.
Data Driven: Profiting from Your Most Important Business Asset - Thomas C. Redman (2008)
This is an excellent guide to the principles of data management. It shows comprehensively how to improve data quality, use data to make better decisions and establish management systems that will help nearly any company get the most from its data. This book represents a significant milestone in the evolution of the data quality profession.
The Signal and The Noise – Nate Silver (2012)
Using subject matter ranging from climate change to baseball, Nate Silver brings statistical concepts and predictive methods to life and emphasises how costly making the wrong assumptions from data can be. Silver is one of the best exponents of Bayesian statistics around.
Small Data – The Tiny Clues that Uncover Huge Trends – Martin Lindstrom (2016)
As the title suggests, this is a must-read that highlights the importance of 'small data', gathered from naked-eye observations and interviews with people, to understanding situations and making decisions. Like Nate Silver, Lindstrom makes his case through detailed examples of how the approach he advocates has worked.