Ned Stratton: 31st January 2022
At the start of a new year I get all nostalgic, and this time five years ago I was 18 months into my first proper employment as a data analyst. Because I could write 10 lines of SQL to query Access databases and had bashed out some VBA to identify unique category values in a spreadsheet column and save a new spreadsheet for each category's data, I could see pennies dropping. It was the next level up from the VLOOKUPs and pivot tables that I considered to be advanced the year before.
Furthermore, I'd been signed off by my company to do a three-day immersive in Advanced VBA, I had convinced a Meetup group in London to let me give a ten-minute lightning talk on my Tinder-dating database idea, and had 'Advanced SQL' in my sights with grand ideas of stored procedures and CTEs, having worked out how to download SQL Server Management Studio and load a text file into to a table. It was January 2017, and I was a blazer and a pair of glasses short of being a data scientist.
Yes...but. Being an arts graduate I had no maths background, as was pointed out to me in recruiter rejection feedback. So I Googled standard deviations. Then a contract database developer was on hand to tell me that "little kittens die every time you code something in VBA". Snarky but completely right, he nudged me towards learning Python, the pre-requisite for many data science bootcamps and the opening to faster data wrangling and all-round proper coding. So I got the hang of Python by the end of 2018 through a Udemy course and a self-led project to web scrape football results from a scoring site into a database and run predictions from the data in order to win accumulator bets. It was a complete failure, but an automated and well-coded one.
Switching to a job at a bank in 2019, I consolidated SQL in a role which was mostly writing and debugging stored procedures, as well as cutting my teeth on Power BI to build reports and put the V into UCOVI. All the while I nostalgically volunteered to fix broken reports in Excel that used macros (VBA to me is Blackpool to a stick of rock, to use a saying older than the coding language itself), and picked up some HTML because some of the SQL stored procs I had to debug actually sent colour-formatted reports in HTML emails to managers from the data warehouse. Boy did I think that was the holy grail.
So where exactly am I going with this? Fast forward to present day, and I have solid core skills in SQL, Python, HTML/CSS (look at the lovely website you're reading this on) and Power BI. I also have things I dabble in or have dabbled in (Power Shell scripting, SSIS, JavaScript, statistical methods), and related to that, the ability to learn new tools and languages quicker from experience and practice. This should be pleasing and vindicate a steep five-year learning curve.
However, I wonder whether I'm alone or in good company as a data analyst in feeling anxious that I either don't know enough, that I don'’t know what I do know in great depth, or that what I do know is old hat. Ergo, are my skills sufficiently broad, deep and current?
Starting on breadth, I don't have much of a scooby about Google Analytics, BigQuery, Data Studio or other bits of their sophisticated data offering, which feels like a yawning gap in knowledge. This is because I'm a Microsoft guy, something which cements in me a tribal loyalty to Power BI as opposed to Tableau or Qlik. I have taken this loyalty to the extreme of convincing business decision makers in a company to switch a preference they had for Qlik to Power BI, just to stick to what I know and reduce the learning overhead! I got away with this risky move but they could easily have been thinking "he's a professional data analyst but can’t use our BI tool".
I'm also aware that I don't know my core competencies in full. In Python, I rarely use classes and recursion makes my head hurt. The re library for regex expressions and working with text – something I'm using everyday at work and on personal projects – is a pandoras box of hidden functionality and complex syntax that would take a solid year to learn top to bottom. I thought I had SQL down to a T a year ago with effortless command of user-defined scalar functions, until a DBA showed me how slow they were on large table joins, and how to write a better-performing table function instead.
That SQL anecdote betrays uncertainty around the third consideration – currentness. The version of SQL that I was aiming to master and thought I had done until the table-function lesson was SQL Server 2012, 4 versions behind. The long-term utility of detailed knowledge about SQL Server Agent job scripting, SSIS Packages, and SSRS that I picked up in that job may be questionable in tomorrow's world of data lakes, new ETL tools and cloud-based data architecture.
So the learning curve doesn't stop now, and for a data guy as for a developer, IT admin or any other technologist, it probably never does. I'm okay with this. I have the same thirst for knowledge as I did when Excel was my day-to-day and SQL felt space-age. But how do I keep my learning broad, deep, and current, which of these should I prioritise, and what would I advise younger analysts in the position I was in five years ago to learn?
There are varied schools of thought on this. Data scientist and Youtuber Luke Barousse recently outlined his technical skills learning roadmap on his channel. He suggests going for breadth first via the Google Data Analytics Professional Certificate in order to get a brief overview of SQL, Excel, Tableau and R, then diving deep into Excel and SQL. His advice is worth listening to and well-evidenced (he picks out SQL and Excel as the top two skills referenced on LinkedIn job postings) but the Google certificate takes six months to complete with a time commitment of 10 hours per week; I’ve seen briefer "brief overviews". Furthermore, Excel and SQL are the top two requirements today. I can guarantee that the data job market in five years will reward the analyst who can quickly get a handle on wrangling data from APIs and understands document data structure, rather than the expert on advanced SQL or relational databases.
My approach would be to prioritise learning something in depth first and to start with Python. It's an intuitive, easy-to-read, versatile and widely used coding language with which to build confidence as a developer-analyst. Crucially, its error messages are clear and explicit, and it has an abundance of online support through its own documentation and Stack Overflow posts, both of which make it easy to learn on your own and build important problem-solving habits.
As you progress through Python, you will see how its core data structures – lists, dictionaries and sets – both enable operations that would take longer in SQL and Excel and prepare you for JSON data (the format of data from most APIs). You will also discover it has libraries to scrape websites, connect to databases and read their data into data frames (virtual row/column structured tables with indices), even visualise data in Venn diagrams, which are still ridiculously badly supported in Excel and Power BI considering their usefulness in presenting categorical data.
I believe that with deep knowledge of and experience in something as versatile as Python, you can do pretty much everything expected of a modern, full-stack data analyst, and become a battle-hardened self-learner who can turn their hand quickly to anything Python doesn't cover. From this starting point of depth comes breadth in being able to solve a broad range of problems, as well as the ability to keep current in your knowledge as you identify the gaps in your core skillset and the newest most popular tools or languages to plug them.