Ned Stratton: 13th June 2024
I want to talk about a data analytics project I’ve worked on in as little detail as possible about the timeframe and subject matter (I don't want to piss people off), but as much detail as possible about the technical specifications of it (so that you grasp the magnificence of its engineering). Here goes.
Within the past six years, I've been on a data analytics project that involved a no-code data blending tool. It also involved SQL databases. Three of them: dev, test, and live. All changes to the dev database were version controlled using git. Several experienced IT technicians across different teams were consulted to advise on the SQL permissions group setup to prevent business users – who owned the source data – from causing accidental deletions or changes in the SQL database, even though they likely didn't know SQL.
And what, crucially, was the source data? It was an Excel spreadsheet with about 20 columns and 300 rows. (That's one SQL database for every 100 rows.) And of course, the business users controlling the spreadsheet – themselves unencumbered by devops or change control – could do what they liked with it. They were quite relaxed about breaking changes such as renaming important columns or getting creative with date formats. The solution to this – I think you get the picture by now – was more technical complexity in the blending tool to anticipate those changes.
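For a sense of what "anticipating it" means in practice, here is a minimal Python sketch of the two defences involved: tolerating renamed columns and tolerating inconsistent date formats. It is not the project's actual logic (that lived in a no-code tool), and the column names and date formats are hypothetical, picked purely for illustration.

```python
from datetime import date, datetime

# Hypothetical column names and date formats, for illustration only.
EXPECTED_COLUMNS = {"invoice_id", "amount", "invoice_date"}
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_header(name: str) -> str:
    """Lower-case a header and turn spaces and hyphens into underscores."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


def missing_columns(headers: list[str]) -> set[str]:
    """Return the expected columns that are absent after normalisation."""
    return EXPECTED_COLUMNS - {normalise_header(h) for h in headers}


def parse_date(value: str) -> date:
    """Try each known format in turn; raise ValueError if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {value!r}")
```

Note that every new spelling of a header or a date means another entry in one of those two constants, which is roughly how the complexity creeps in.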
Now, I'm not against devops in data solutions per se. Things need to be tested properly before release into production to prevent embarrassing errors that cause loss of trust. Version control and change tracking are invaluable means both to get to the root of errors and to enable concurrent work by multiple people on the same project.
However, at some point one has to take a bird's eye view of the change-controlled devops mega city with test, dev, git, and a data center in Texas that's been built on the foundations of Debbie-from-accounts's 2023 expenses tracker and think, "isn't this a bit ridiculous?"
One also has to take stock of how the devops imperative weighs on the priorities of data teams.
The grimly accepted reality that 80% of data analytics is finding, cleaning, and organising the data is just about tolerable on the proviso that the other 20% is for actual insight exploration. But if it turns out that 19 of those 20 percentage points go on configuring devops pipelines, agonising over commit messages, and merging pull requests, then it really is 5:28 on a Friday afternoon before anyone considers the afterthought of maybe converting all of this data into something interesting for the business.
Where did this all come from? Well, there is the possibility that data has become so technically advanced and important to the modern business that it requires the same practices that software development teams follow. I don't discount it, but if I thought it told the whole story I wouldn't be writing this blog.
Data folk are sensitive souls who search for love, attention and respect for the work they do and the things they know. The search starts within the business-focussed teams (product/sales/marketing) that they often report into, and ends unsuccessfully. They are not true business stakeholders (despite the business knowledge they acquire in their work). Furthermore, everything they do is requested, prioritised, and validated by these true business stakeholders, who often value politeness and confirmatory answers from their data analysts more than they value nuance or pushback. This inevitably causes feelings of resentment and powerlessness – the customers and waiters effect from my free-roles piece last year.
They are also beset by scope creep, last minute changes, unclear requests, being made to grovel and wait an age to get things installed, and most annoying of all, something breaking all the time that it's their job to keep operational even though they themselves didn't build it.
It is at this point that the appeal of devops and the illusory sense of order it could bring kicks in. "If everything is version controlled and tested on two servers before rollout, not only will breaking changes not happen, but also the business will become so impressed by the technical organisation and architecture of our BI that it will drive the cultural sea-change around data needed to stop time-consuming requests that prevent us from building advanced, maintainable, and lasting things with data." Or so it goes.
As this magical, great-new-world-over-the-choppy-sea, "this is how big companies do it" thinking takes hold, something else takes hold as well that was seeded in the one-sided business-stakeholder relationships. Devops and the desire to operationalise and automate everything become the goal, ahead of the delivery of new insights.
If you're simply bored by building the same sales dashboards all the time and demoralised by the "it doesn't match these numbers" feedback loop, then really leaning into the nitty-gritty of configuring devops pipelines, security administration, and coded automations for manual processes can provide the desired channel for your creativity and curiosity, as well as the sense of there being a finished end-product to your work. In essence, you're more like a software developer, and you have the git repo to prove it.
Except, you're not really a software developer. You're supporting and investigating rather than producing from a design. The next "release" is when a random stakeholder asks for a tweak, or after some upstream change to a column in the raw data you have no control over. You're like a zebra in a safari park that's become jealous of the giraffes because they can pick the high-hanging fruit with their longer necks. All of your efforts to paint your neck yellow with brown spots and time spent on stretching exercises to elongate it will never get you to the point where you, as a zebra, can pick high fruit as effectively as the giraffes, or have the giraffes accept you and grant you the status of giraffe rather than zebra.
There's no stat from a Gartner report about increased spend on Redgate by data teams backing this up; I'm merely stating what I've observed about the data profession from the conversation on LinkedIn, what dominates the agenda at data conferences, and my own experience of working in data teams over 7-8 years and how my job role has changed.
Fundamentally, the over-devops-ification of data is an expensive substitute for having the backbone to tell well-paid people to prove the need for a new report and think through its content properly before committing to produce it on a scheduled basis. It's the making of frustrated developers as opposed to productive data teams that are first-class citizens within their businesses. For that to occur, the thrill of insight discovery needs to come back into fashion.